When we're stopped at syscall entry tracing, ptrace can change the %rax
value from -ENOSYS to something else. If no system call is actually made
because the syscall number (now in orig_rax) is bad, then we now always
reset %rax to -ENOSYS again.
This changes it to leave the return value alone after entry tracing.
That way, the %rax value set by ptrace is there to be seen in user mode
(or in syscall exit tracing). This is consistent with what the 32-bit
kernel does.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by pageexec@freemail.hu:
> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).
To fix this we pass in pt_regs by pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use a common irq_return entry point for all the iret places, which
need the paravirt INTERRUPT return wrapper.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This change broke recovery of exceptions in iret:
commit 72fe485854
Author: Glauber de Oliveira Costa <gcosta@redhat.com>
x86: replace privileged instructions with paravirt macros
The ENTRY(native_iret) macro adds alignment padding before the iretq
instruction, so "iret_label" no longer points exactly at the instruction.
It was sloppy to leave the old "iret_label" label behind when replacing
its nearby use. Removing it would have revealed the other use of the
label later in the file, and upon noticing that use, anyone exercising
the minimum of attention to detail expected of anyone touching this
subtle code would realize it needed to change as well.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the x86-64 behavior for 32-bit processes that set
bogus %cs/%ss values (the only ones that can fault in iret)
match what the native i386 behavior is. (do not kill the task
via do_exit but generate a SIGSEGV signal)
[ tglx@linutronix.de: build fix ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The assembly code in entry_64.S issues a bunch of privileged instructions,
like cli, sti, swapgs, and others. Paravirt guests are forbidden to do so,
and we then replace them with macros that will do the right thing.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use HR-timers (when available) to deliver an accurate preemption tick.
The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.
The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution within the
sched_latency period is important.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Right now register edi is just cleared before calling do_exit.
That is wrong because correct return value will be ignored.
Value from rax should be copied to rdi instead of clearing edi.
AK: changed to 32bit move because it's strictly an int
[ tglx: arch/x86 adaptation ]
Signed-off-by: Andrey Mirkin <major@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Run the lockdep_sys_exit hook after all other C code on the syscall
return path.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>