Commit graph

15837 commits

Author SHA1 Message Date
Oleg Nesterov
b64b9c937a uprobes/x86: Only rep+nop can be emulated correctly
__skip_sstep() correctly detects the "nontrivial" nop insns,
but since it doesn't update regs->ip we can not really skip
"0x0f 0x1f | 0x0f 0x19 | 0x87 0xc0", the probed application
is killed by SIGILL'ed handle_swbp().

Remove these additional checks. If we want to implement this
correctly we need to know the full insn length to update ->ip.

rep* + nop is fine even without updating ->ip.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-10-07 21:19:40 +02:00
Oleg Nesterov
db023ea595 uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
Move clear_thread_flag(TIF_UPROBE) from do_notify_resume() to
uprobe_notify_resume() for !CONFIG_UPROBES case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29 21:21:53 +02:00
Ingo Molnar
f74eb72868 perf/core improvements and fixes:
. Convert the trace builtins to use the growing evsel/evlist
   tracepoint infrastructure, removing several open coded constructs
   like switch like series of strcmp to dispatch events, etc.
   Basically what had already been showcased in 'perf sched'.
 
 . Add evsel constructor for tracepoints, that uses libtraceevent
   just to parse the /format events file, use it in a new 'perf test'
   to make sure the libtraceevent format parsing regressions can
   be more readily caught.
 
 . Some strange errors were happening in some builds, but not on the
   next, reported by several people, problem was some parser related
   files, generated during the build, didn't had proper make deps,
   fix from Eric Sandeen.
 
 . Fix some compiling errors on 32-bit, from Feng Tang.
 
 . Don't use sscanf extension %as, not available on bionic, reimplementation
   by Irina Tirdea.
 
 . Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.
 
 . Introduce struct and cache information about the environment where a
   perf.data file was captured, from Namhyung Kim.
 
 . Fix several error paths in libtraceevent, from Namhyung Kim.
 
   Print event causing perf_event_open() to fail in 'perf record',
   from Stephane Eranian.
 
 . New 'kvm' analysis tool, from Xiao Guangrong.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQYIGnAAoJENZQFvNTUqpApNIP/2yUe3kfZ6Sz5rVwVdxyDBQN
 Y0sv6IwtT31G6WA8o8fvIKylMiTn0XMt5TG3e57ebw01lHH/xQAkQs7EmpmGclIT
 qOE5a1JwD8eyTMhBL92dt9LUHEq103pW/oBS1d6TeQ9XyEh+HThv+wVT/yRKneu6
 IHV7b/8fC++v80bpsWLr8S3+p9OXMclsjR7IfDCeJnicxyHb/yWfRVbQiBwGk5/v
 IfKc0uat60+1qAy7GwBM/nVDJqzekrPI76krNP8tVdftxBjatpXS+VBKiYfo352g
 nr3Gpg2MWEY2oRzkN6jCj5yQAVTxa2OrAYY/I8dSJWkTHXL7UbzNzatNpVpvS0we
 0wFP5gEumyuYE1YwjsR14ICSOJdMaO0pBYO4YMnqXsXGiS0JEM+2o+2bL2Nl9EDz
 LEGWQGWVC23Tu3P1SaHgdb2YnQLZ18AVBwBljESrLvCf4leC/RWoA3HvJ4NOhook
 08ee24r57SLLe7z8W3fBPRsslpmoZnhnLHES/N/qf2y5u4Ig9IkafMS2ARVSwgPX
 6u0uVIf48YQGAlv8p3cNsYa2PITB3R04Nm5GB2KvTxplwuRD1jYqPpR4BaKrXe+s
 SBVIxgDygglY+husTYLBWMBYswWhu5ECVobdfxtrQbQPt+v9hxf5S8UhAbu3AVcV
 IocrCPg3REvuxPwwPmY9
 =FgCY
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

 * Convert the trace builtins to use the growing evsel/evlist
   tracepoint infrastructure, removing several open coded constructs
   like switch like series of strcmp to dispatch events, etc.
   Basically what had already been showcased in 'perf sched'.

 * Add evsel constructor for tracepoints, that uses libtraceevent
   just to parse the /format events file, use it in a new 'perf test'
   to make sure the libtraceevent format parsing regressions can
   be more readily caught.

 * Some strange errors were happening in some builds, but not on the
   next, reported by several people, problem was some parser related
   files, generated during the build, didn't had proper make deps,
   fix from Eric Sandeen.

 * Fix some compiling errors on 32-bit, from Feng Tang.

 * Don't use sscanf extension %as, not available on bionic, reimplementation
   by Irina Tirdea.

 * Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.

 * Introduce struct and cache information about the environment where a
   perf.data file was captured, from Namhyung Kim.

 * Fix several error paths in libtraceevent, from Namhyung Kim.

   Print event causing perf_event_open() to fail in 'perf record',
   from Stephane Eranian.

 * New 'kvm' analysis tool, from Xiao Guangrong.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-24 20:55:26 +02:00
Xiao Guangrong
26bf264e87 KVM: x86: Export svm/vmx exit code and vector code to userspace
Exporting KVM exit information to userspace to be consumed by perf.

Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com>
[ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-21 12:48:09 -03:00
Borislav Petkov
50a011f640 kprobes/x86: Move skip_singlestep up
I get this warning:

  arch/x86/kernel/kprobes.c:544:23: warning: ‘skip_singlestep’ declared ‘static’ but never defined

on tip/auto-latest.

Put the skip_singlestep function declaration up, in
KPROBES_CAN_USE_FTRACE and drop the superfluous forward
declaration.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1348145034-16603-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-20 14:48:16 +02:00
Dan Carpenter
1e6dd8adc7 perf: Fix off by one test in perf_reg_value()
The test should be >= ARRAY_SIZE() instead of > ARRAY_SIZE().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20120905123126.GC6128@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-19 17:08:40 +02:00
Ingo Molnar
d0616c1775 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes fixes + cleanups from Oleg Nesterov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-19 17:03:07 +02:00
Yan, Zheng
314d9f63f3 perf/x86: Add cpumask for uncore pmu
This patch adds a cpumask file to the uncore pmu sysfs directory.  The
cpumask file contains one active cpu for every socket.

Signed-off-by: "Yan, Zheng" <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: "Yan, Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1347263631-23175-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-17 13:11:43 -03:00
Oleg Nesterov
baedbf02b1 uprobes: Make arch_uprobe_task->saved_trap_nr "unsigned int"
Make arch_uprobe_task->saved_trap_nr "unsigned int" and move it down
after ->saved_scratch_register, this changes sizeof() from 24 to 16.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:32 +02:00
Oleg Nesterov
d6a00b35e4 uprobes/x86: Fix arch_uprobe_disable_step() && UTASK_SSTEP_TRAPPED interaction
arch_uprobe_disable_step() should also take UTASK_SSTEP_TRAPPED into
account. In this case the probed insn was not executed, we need to
clear X86_EFLAGS_TF if it was set by us and that is all.

Again, this code will look more clean when we move it into
arch_uprobe_post_xol() and arch_uprobe_abort_xol().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:32 +02:00
Oleg Nesterov
3a4664aa83 uprobes/x86: Xol should send SIGTRAP if X86_EFLAGS_TF was set
arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and
returns to user-mode. But this means the application gets SIGTRAP
only after the next insn.

This means that UPROBE_CLEAR_TF logic is not really right. _enable
should only record the state of X86_EFLAGS_TF, and _disable should
check it separately from UPROBE_FIX_SETF.

Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and
change enable/disable accordingly. This assumes that the probed insn
was not trapped, see the next patch.

arch_uprobe_skip_sstep() logic has the same problem, change it to
check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this
all after we fold enable/disable_step into pre/post_hol hooks.

Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap().
But this needs more changes, handle_swbp() does the same and this is
equally wrong.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:31 +02:00
Oleg Nesterov
9bd1190a11 uprobes/x86: Do not (ab)use TIF_SINGLESTEP/user_*_single_step() for single-stepping
user_enable/disable_single_step() was designed for ptrace, it assumes
a single user and does unnecessary and wrong things for uprobes. For
example:

	- arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an
	  application itself can set X86_EFLAGS_TF which must be
	  preserved after arch_uprobe_disable_step().

	- we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in
	  arch_uprobe_enable_step(), this only makes sense for ptrace.

	- otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step()
	  doesn't do user_disable_single_step(), the application will
	  be killed after the next syscall.

	- arch_uprobe_enable_step() does access_process_vm() we do
	  not need/want.

Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF
directly, this is much simpler and more correct. However, we need to
clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn,
add set_task_blockstep(false).

Note: with or without this patch, there is another (hopefully minor)
problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by
uprobes. Perhaps we should change _disable to update the stack, or
teach arch_uprobe_skip_sstep() to emulate this insn.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:30 +02:00
Oleg Nesterov
95cf00fa5d ptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logic
Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
step.c was always very wrong.

1. update_debugctlmsr() was simply unneeded. The child sleeps
   TASK_TRACED, __switch_to_xtra(next_p => child) should notice
   TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
   needed.

2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
   should always match the state of current's TIF_BLOCKSTEP bit.

3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
   look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
   register or the caller can be preempted in between.

4. It is not safe to play with TIF_BLOCKSTEP if task != current.
   DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
   other if the task is running. The tracee is stopped but it
   can be SIGKILL'ed right before set/clear_tsk_thread_flag().

However, now that uprobes uses user_enable_single_step(current)
we can't simply remove update_debugctlmsr(). So this patch adds
the additional "task == current" check and disables irqs to avoid
the race with interrupts/preemption.

Unfortunately this patch doesn't solve the last problem, we need
another fix. Probably we should teach ptrace_stop() to set/clear
single/block stepping after resume.

And afaics there is yet another problem: perf can play with
MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
__switch_to_xtra() has problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2012-09-15 17:37:29 +02:00
Oleg Nesterov
848e8f5f0a ptrace/x86: Introduce set_task_blockstep() helper
No functional changes, preparation for the next fix and for uprobes
single-step fixes.

Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
new helper, set_task_blockstep().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:28 +02:00
Sebastian Andrzej Siewior
bdc1e47217 uprobes/x86: Implement x86 specific arch_uprobe_*_step
The arch specific implementation behaves like user_enable_single_step()
except that it does not disable single stepping if it was already
enabled by ptrace. This allows the debugger to single step over an
uprobe. The state of block stepping is not restored. It makes only sense
together with TF and if that was enabled then the debugger is notified.

Note: this is still not correct. For example, TIF_SINGLESTEP check
is not right, the application itself can set X86_EFLAGS_TF. And otoh
we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf".
See the next patches, we need the changes in arch/x86/kernel/step.c
first.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15 17:37:28 +02:00
Ingo Molnar
26f45274af Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull tracing updates from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-14 10:06:51 +02:00
Masami Hiramatsu
c6aaf4d0bb kprobes/x86: Fix to support jprobes on ftrace-based kprobe
Fix kprobes/x86 to support jprobes on ftrace-based kprobes.
Because of -mfentry support of ftrace, ftrace is now put
on the beginning of function where jprobes are put.

Originally ftrace-based kprobes doesn't support jprobe
because it will change regs->ip and ftrace doesn't support
changing IP and ftrace itself doesn't conflict jprobe.
However, ftrace -mfentry support moves mcount call on the
top of functions where jprobes are put. This means that
jprobe always conflicts with ftrace-based kprobe and fails.

This patch allows ftrace-based kprobes to support jprobes
by allowing to modify regs->ip and kprobes breakpoint
handler also allows to skip singlestepping because there
is a ftrace call (not an original instruction).

Link: http://lkml.kernel.org/r/20120905143125.10329.90836.stgit@localhost.localdomain

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13 22:52:11 -04:00
Steven Rostedt
47d5a5f88b ftrace/x86-64: Allow to change RIP in handlers
Allow ftrace handlers to change RIP register (regs->ip)
in handlers. This will allow handlers to call another
function instead of original function.

Link: http://lkml.kernel.org/r/20120905143118.10329.5078.stgit@localhost.localdomain

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13 22:52:10 -04:00
Masami Hiramatsu
4b036d54bf kprobes/x86: Fix kprobes to collectly handle IP on ftrace
Current kprobe_ftrace_handler expects regs->ip == ip, but it is
incorrect (originally on x86-64). Actually, ftrace handler sets
regs->ip = ip + MCOUNT_INSN_SIZE.
kprobe_ftrace_handler must take care for that.

Link: http://lkml.kernel.org/r/20120905143112.10329.72069.stgit@localhost.localdomain

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13 22:52:09 -04:00
Masami Hiramatsu
a5e37863ab ftrace/x86: Adjust x86 regs.ip as like as x86-64
Adjust x86 regs.ip to ip + MCOUNT_INSN_SIZE as like as
on x86-64. This helps us to consolidate codes which use
regs->ip on both of x86/x86-64.

Link: http://lkml.kernel.org/r/20120905143100.10329.60109.stgit@localhost.localdomain

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13 22:52:09 -04:00
Ingo Molnar
4553f0b90e Merge branch 'core/rcu' into perf/core
Steve Rostedt asked for the merge of a single commit, into both
the RCU and the perf/tracing tree:

 | Josh made a change to the tracing code that affects both the
 | work Paul McKenney and I are currently doing. At the last
 | Kernel Summit back in August, Linus said when such a case
 | exists, it is best to make a separate branch based off of his
 | tree and place the change there. This way, the repositories
 | that need to share the change can both pull them in and the
 | SHA1 will match for both. Whichever branch is pulled in first
 | by Linus will also pull in the necessary change for the other
 | branch as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13 17:18:38 +02:00
Linus Torvalds
bf71d0e18e Bug-fixes:
* Fix for TLB flushing introduced in v3.6
  * Fix Xen-SWIOTLB not using proper DMA mask - device had 64bit but
    in a 32-bit kernel we need to allocate for coherent pages from a
    32-bit pool.
  * When trying to re-use P2M nodes we had a one-off error and triggered
    a BUG_ON check with specific CONFIG_ option.
  * When doing FLR in Xen-PCI-backend we would first do FLR then save the
    PCI configuration space. We needed to do it the other way around.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQSS7TAAoJEFjIrFwIi8fJOBQH/03JBKBbFvewhop8T5Jww2c9
 SWmMgzDm5HkxeWj5XnM+zlLoHrYFFu7tyuRLiny4weE0LdRl4adJ1TVpStxap/b6
 MSQKe+tZevslaReBOsMpbCk3z7fEWNlAcpm6wMp1xYmLoHcr0JMpOCmzigbf7dwM
 F4UWULheih9ME3UeqDAU8qgvfv6ccZ9vempO4TDWKjxfxfWODCNMRx+Ny+C7NNRk
 QeoInHJUqcRkg0q0OIciF/YYDmn8hIH7HgfqomuMb6rEv2LOieLnWVuniEPWM75s
 lECxro7v2Z9s1Std0WWKcFp8VZpI2scnQYU8aL8TpewHcbzKHQYyipuKPu7FiEQ=
 =GfLE
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
 * Fix for TLB flushing introduced in v3.6
 * Fix Xen-SWIOTLB not using proper DMA mask - device had 64bit but
   in a 32-bit kernel we need to allocate for coherent pages from a
   32-bit pool.
 * When trying to re-use P2M nodes we had a one-off error and triggered
   a BUG_ON check with specific CONFIG_ option.
 * When doing FLR in Xen-PCI-backend we would first do FLR then save the
   PCI configuration space. We needed to do it the other way around.

* tag 'stable/for-linus-3.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Fix proper FLR steps.
  xen: Use correct masking in xen_swiotlb_alloc_coherent.
  xen: fix logical error in tlb flushing
  xen/p2m: Fix one-off error in checking the P2M tree directory.
2012-09-06 17:16:42 -07:00
Alex Shi
ce7184bdbd xen: fix logical error in tlb flushing
While TLB_FLUSH_ALL gets passed as 'end' argument to
flush_tlb_others(), the Xen code was made to check its 'start'
parameter. That may give a incorrect op.cmd to MMUEXT_INVLPG_MULTI
instead of MMUEXT_TLB_FLUSH_MULTI. Then it causes some page can not
be flushed from TLB.

This patch fixed this issue.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-05 10:50:21 -04:00
Konrad Rzeszutek Wilk
593d0a3e9f Merge commit '4cb38750d49010ae72e718d46605ac9ba5a851b4' into stable/for-linus-3.6
* commit '4cb38750d49010ae72e718d46605ac9ba5a851b4': (6849 commits)
  bcma: fix invalid PMU chip control masks
  [libata] pata_cmd64x: whitespace cleanup
  libata-acpi: fix up for acpi_pm_device_sleep_state API
  sata_dwc_460ex: device tree may specify dma_channel
  ahci, trivial: fixed coding style issues related to braces
  ahci_platform: add hibernation callbacks
  libata-eh.c: local functions should not be exposed globally
  libata-transport.c: local functions should not be exposed globally
  sata_dwc_460ex: support hardreset
  ata: use module_pci_driver
  drivers/ata/pata_pcmcia.c: adjust suspicious bit operation
  pata_imx: Convert to clk_prepare_enable/clk_disable_unprepare
  ahci: Enable SB600 64bit DMA on MSI K9AGM2 (MS-7327) v2
  [libata] Prevent interface errors with Seagate FreeAgent GoFlex
  drivers/acpi/glue: revert accidental license-related 6b66d95895 bits
  libata-acpi: add missing inlines in libata.h
  i2c-omap: Add support for I2C_M_STOP message flag
  i2c: Fall back to emulated SMBus if the operation isn't supported natively
  i2c: Add SCCB support
  i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converter
  ...
2012-09-05 10:22:45 -04:00
Konrad Rzeszutek Wilk
50e900417b xen/p2m: Fix one-off error in checking the P2M tree directory.
We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
inclusive) when trying to figure out whether we can re-use some of the
P2M middle leafs.

Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
we would try to use the 512th entry. Fortunately for us the p2m_top_index
has a check for this:

 BUG_ON(pfn >= MAX_P2M_PFN);

which we hit and saw this:

(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff819cadeb>]
(XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
(XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
(XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
(XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
(XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
(XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
(XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000661795000   cr2: 0000000000000000

Fixes-Oracle-Bug: 14570662
CC: stable@vger.kernel.org # only for v3.5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-05 09:47:41 -04:00
Ingo Molnar
508dc4f8ee Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes because upcoming uprobes changes will rely on it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-28 18:05:55 +02:00
Michael S. Tsirkin
1d92128fe9 KVM: x86: fix KVM_GET_MSR for PV EOI
KVM_GET_MSR was missing support for PV EOI,
which is needed for migration.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-27 18:03:05 -03:00
Stephane Eranian
e3e45c01ae perf/x86: Fix microcode revision check for SNB-PEBS
The following patch makes the microcode update code path
actually invoke the perf_check_microcode() function and
thus potentially renabling SNB PEBS.

By default, CONFIG_MICROCODE_OLD_INTERFACE is
forced to Y in arch/x86/Kconfig. There is no
way to disable this. That means that the code
path used in arch/x86/kernel/microcode_core.c
did not include the call to perf_check_microcode().

Thus, even though the microcode was updated to a
version that fixes the SNB PEBS problem, perf_event
would still return EOPNOTSUPP when enabling precise
sampling.

This patch simply adds a call to perf_check_microcode()
in the call path used when OLD_INTERFACE=y.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: peterz@infradead.org
Cc: andi@firstfloor.org
Link: http://lkml.kernel.org/r/20120824133434.GA8014@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-27 08:48:19 +02:00
Linus Torvalds
267560874c Three bug-fixes:
- Revert the kexec fix which caused on non-kexec shutdowns a race.
  - Reuse existing P2M leafs - instead of requiring to allocate a large
    area of bootup virtual address estate.
  - Fix a one-off error when adding PFNs for balloon pages.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQNppKAAoJEFjIrFwIi8fJU/oH/jdWdRqJgC5mCnu9LwrIemEj
 gPTAcKw01A/2vbOY5rfXx7rCpgeU5ZM/XSt0byz/J5q0bmjjKVM106Smq1s7EaQx
 OjsdLglWoZYzKJjXH/FEKRPD39f/hd+KNJu3aGEJM8UZ0htvxlg6ACGzVPJa83Pf
 yrRXSycxvEevbGbuwWdNubxD5WKMMmbzi/HGGfdtL4256d0xIgxMrYgskLek96cR
 cg11llC5QLzH8mX+M5iX0lchASvMITyERXyEKK2opFN8a/766yi16agP75RKZdkP
 kWXp0vyOMrpy9UnOs2V1XLc/ufqNwHLcPVfecScXhz8xZWrZYOBdJQf7HAWxvLE=
 =MgvT
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull three xen bug-fixes from Konrad Rzeszutek Wilk:
 - Revert the kexec fix which caused on non-kexec shutdowns a race.
 - Reuse existing P2M leafs - instead of requiring to allocate a large
   area of bootup virtual address estate.
 - Fix a one-off error when adding PFNs for balloon pages.

* tag 'stable/for-linus-3.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
  xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
  Revert "xen PVonHVM: move shared_info to MMIO before kexec"
2012-08-25 17:31:59 -07:00
Linus Torvalds
6ec9776c28 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86 emulator: use stack size attribute to mask rsp in stack ops
  KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
  ppc: e500_tlb memset clears nothing
  KVM: PPC: Add cache flush on page map
  KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code
  KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value
2012-08-25 17:27:17 -07:00
Steven Rostedt
d57c5d51a3 ftrace/x86: Add support for -mfentry to x86_64
If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
then use that instead of mcount.

With mcount, frame pointers are forced with the -pg option and we
get something like:

<can_vma_merge_before>:
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       53                      push   %rbx
       41 51                   push   %r9
       e8 fe 6a 39 00          callq  ffffffff81483d00 <mcount>
       31 c0                   xor    %eax,%eax
       48 89 fb                mov    %rdi,%rbx
       48 89 d7                mov    %rdx,%rdi
       48 33 73 30             xor    0x30(%rbx),%rsi
       48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi

With -mfentry, frame pointers are no longer forced and the call looks
like this:

<can_vma_merge_before>:
       e8 33 af 37 00          callq  ffffffff81461b40 <__fentry__>
       53                      push   %rbx
       48 89 fb                mov    %rdi,%rbx
       31 c0                   xor    %eax,%eax
       48 89 d7                mov    %rdx,%rdi
       41 51                   push   %r9
       48 33 73 30             xor    0x30(%rbx),%rsi
       48 f7 c6 ff ff ff f7    test   $0xfffffffff7ffffff,%rsi

This adds the ftrace hook at the beginning of the function before a
frame is set up, and allows the function callbacks to be able to access
parameters. As kprobes now can use function tracing (at least on x86)
this speeds up the kprobe hooks that are at the beginning of the
function.

Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.org

Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-08-23 11:26:36 -04:00
Konrad Rzeszutek Wilk
c96aae1f7f xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.
When we are finished with return PFNs to the hypervisor, then
populate it back, and also mark the E820 MMIO and E820 gaps
as IDENTITY_FRAMEs, we then call P2M to set areas that can
be used for ballooning. We were off by one, and ended up
over-writting a P2M entry that most likely was an IDENTITY_FRAME.
For example:

1-1 mapping on 40000->40200
1-1 mapping on bc558->bc5ac
1-1 mapping on bc5b4->bc8c5
1-1 mapping on bc8c6->bcb7c
1-1 mapping on bcd00->100000
Released 614 pages of unused memory
Set 277889 page(s) to 1-1 mapping
Populating 40200-40466 pfn range: 614 pages added

=> here we set from 40466 up to bc559 P2M tree to be
INVALID_P2M_ENTRY. We should have done it up to bc558.

The end result is that if anybody is trying to construct
a PTE for PFN bc558 they end up with ~PAGE_PRESENT.

CC: stable@vger.kernel.org
Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23 10:14:52 -04:00
Andreas Herrmann
36bf50d769 x86, microcode, AMD: Fix broken ucode patch size check
This issue was recently observed on an AMD C-50 CPU where a patch of
maximum size was applied.

Commit be62adb492 ("x86, microcode, AMD: Simplify ucode verification")
added current_size in get_matching_microcode(). This is calculated as
size of the ucode patch + 8 (ie. size of the header). Later this is
compared against the maximum possible ucode patch size for a CPU family.
And of course this fails if the patch has already maximum size.

Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-08-22 16:10:41 -07:00
Avi Kivity
5ad105e569 KVM: x86 emulator: use stack size attribute to mask rsp in stack ops
The sub-register used to access the stack (sp, esp, or rsp) is not
determined by the address size attribute like other memory references,
but by the stack segment's B bit (if not in x86_64 mode).

Fix by using the existing stack_mask() to figure out the correct mask.

This long-existing bug was exposed by a combination of a27685c33a
(emulate invalid guest state by default), which causes many more
instructions to be emulated, and a seabios change (possibly a bug) which
causes the high 16 bits of esp to become polluted across calls to real
mode software interrupts.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-22 18:54:26 -03:00
Takuya Yoshikawa
35f2d16bb9 KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
Although the possible race described in

  commit 85b7059169
  KVM: MMU: fix shrinking page from the empty mmu

was correct, the real cause of that issue was a more trivial bug of
mmu_shrink() introduced by

  commit 1952639665
  KVM: MMU: do not iterate over all VMs in mmu_shrink()

Here is the bug:

	if (kvm->arch.n_used_mmu_pages > 0) {
		if (!nr_to_scan--)
			break;
		continue;
	}

We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
in other words we try to shrink empty ones by mistake.

This patch reverses the logic so that mmu_shrink() can free pages from
the first VM whose n_used_mmu_pages is not zero.  Note that we also add
comments explaining the role of nr_to_scan which is not practically
important now, hoping this will be improved in the future.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22 15:27:13 +03:00
Avi Kivity
cb09cad44f x86/alternatives: Fix p6 nops on non-modular kernels
Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel.  If something later triggers
patching, it will overwrite kernel code with garbage.

Reported-by: Tomas Racek <tracek@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-22 12:09:49 +02:00
Liu, Chuansheng
2530cd4f44 x86/fixup_irq: Use cpu_online_mask instead of cpu_all_mask
When one CPU is going down and this CPU is the last one in irq
affinity, current code is setting cpu_all_mask as the new
affinity for that irq.

But for some systems (such as in Medfield Android mobile) the
firmware sends the interrupt to each CPU in the irq affinity
mask, averaged, and cpu_all_mask includes all potential CPUs,
i.e. offline ones as well.

So replace cpu_all_mask with cpu_online_mask.

Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-22 10:36:08 +02:00
Richard Weinberger
83be4ffa1a x86/spinlocks: Fix comment in spinlock.h
This comment is no longer true.  We support up to 2^16 CPUs
because __ticket_t is an u16 if NR_CPUS is larger than 256.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-22 09:52:47 +02:00
Michal Hocko
eb48c07146 mm: hugetlbfs: correctly populate shared pmd
Each page mapped in a process's address space must be correctly
accounted for in _mapcount.  Normally the rules for this are
straightforward but hugetlbfs page table sharing is different.  The page
table pages at the PMD level are reference counted while the mapcount
remains the same.

If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:

  kernel BUG at mm/filemap.c:135!
  invalid opcode: 0000 [#1] SMP
  CPU 22
  Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
  Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
  RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
  Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
  Call Trace:
    delete_from_page_cache+0x40/0x80
    truncate_hugepages+0x115/0x1f0
    hugetlbfs_evict_inode+0x18/0x30
    evict+0x9f/0x1b0
    iput_final+0xe3/0x1e0
    iput+0x3e/0x50
    d_kill+0xf8/0x110
    dput+0xe2/0x1b0
    __fput+0x162/0x240

During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte.  The logic is if
the PMD page is the same, they must be shared.  This assumes that the
sharing is between the parent and child.  However, if the sharing is
with a different process entirely then this check fails as in this
diagram:

  parent
    |
    ------------>pmd
                 src_pte----------> data page
                                        ^
  other--------->pmd--------------------|
                  ^
  child-----------|
                 dst_pte

For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other.  This is
possible due to the following style of race.

  PROC A                                          PROC B
  copy_hugetlb_page_range                         copy_hugetlb_page_range
    src_pte == huge_pte_offset                      src_pte == huge_pte_offset
    !src_pte so no sharing                          !src_pte so no sharing

  (time passes)

  hugetlb_fault                                   hugetlb_fault
    huge_pte_alloc                                  huge_pte_alloc
      huge_pmd_share                                 huge_pmd_share
        LOCK(i_mmap_mutex)
        find nothing, no sharing
        UNLOCK(i_mmap_mutex)
                                                      LOCK(i_mmap_mutex)
                                                      find nothing, no sharing
                                                      UNLOCK(i_mmap_mutex)
      pmd_alloc                                       pmd_alloc
      LOCK(instantiation_mutex)
      fault
      UNLOCK(instantiation_mutex)
                                                  LOCK(instantiation_mutex)
                                                  fault
                                                  UNLOCK(instantiation_mutex)

These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed.  When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.

This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd.  This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.

Race identified and changelog written mostly by Mel Gorman.

{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-21 16:45:02 -07:00
Ingo Molnar
bcada3d4b8 perf/core improvements and fixes:
. Fix include order for bison/flex-generated C files, from Ben Hutchings
 
  . Build fixes and documentation corrections from David Ahern
 
  . Group parsing support, from Jiri Olsa
 
  . UI/gtk refactorings and improvements from Namhyung Kim
 
  . NULL deref fix for perf script, from Namhyung Kim
 
  . Assorted cleanups from Robert Richter
 
  . Let O= makes handle relative paths, from Steven Rostedt
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJQMkGhAAoJENZQFvNTUqpAqjsQAJE5iD1LFogC8o/WjvRHz0TY
 Y0x+sR/XfW61KYpeq5g+UaKuFU3P44ijCoyks3y5sza97DkYgUwMpEHlLXFSM8Pp
 sNOapqY57s24nq3MLrhH1V9w+cSE+m2u/Gi5fGLCQekio9gkOBwYxNGk7vpKri/n
 LBRsMozBu/mZjMy20uWOb7Uk8xsAToh+TFaAtjyQ9Snn9nNJj49NUAp37uN888H/
 ducMLq32HN5v/6Zd3q6IWdDWgZsHLkIa3R5FIs/GNe3Dih07gtYLmDol4ktPbTFm
 yoaWpP5wbtu/62EZlJwE393vMuoeqN/96394ZZQGFafhHVxN4+rcBhXbejBs0T2b
 wk/0CzntW8bbUAI/cl3SB9aui//FWOxcjG9aDQ7PsmHzPw1Q4VD0F9Mcod4p+dRX
 PsA9q/tST1eAiwzWYthDtj81U7iChINcXKhoZn2xn6+0+aMH+6FFNBmCH8MR5aCU
 BvrXhTJjvau/Ym/sILl4Tf4wfssTq49yMsn/YKCwLJ0hg0XlTObWfQRy2MOayXH9
 NJvUE+9GSXoTEKhmr1AfTYEG9vObaXZyFwAI74xvPPwUYojCb4ZjEKmG0egW+VGk
 IJKFCaJZwwVsGau4aIbFAMP12/L8Qs/Ox91ddCJ0j5TIlSGMaqW5lbV1N1crzlTT
 a0GsN49NvhbFttBXrcNX
 =0a2X
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

 * Fix include order for bison/flex-generated C files, from Ben Hutchings

 * Build fixes and documentation corrections from David Ahern

 * Group parsing support, from Jiri Olsa

 * UI/gtk refactorings and improvements from Namhyung Kim

 * NULL deref fix for perf script, from Namhyung Kim

 * Assorted cleanups from Robert Richter

 * Let O= makes handle relative paths, from Steven Rostedt

 * perf script python fixes, from Feng Tang.

 * Improve 'perf lock' error message when the needed tracepoints
   are not present, from David Ahern.

 * Initial bash completion support, from Frederic Weisbecker

 * Allow building without libelf, from Namhyung Kim.

 * Support DWARF CFI based unwind to have callchains when %bp
   based unwinding is not possible, from Jiri Olsa.

 * Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF
   section was the end goal, several fixes for code that handles all
   architectures and cleanups are included, from Cody Schafer.

 * Add a description for the JIT interface, from Andi Kleen.

 * Assorted fixes for Documentation and build in 32 bit, from Robert Richter

 * Add support for non-tracepoint events in perf script python, from Feng Tang

 * Cache the libtraceevent event_format associated to each evsel early, so that we
   avoid relookups, i.e. calling pevent_find_event repeatedly when processing
   tracepoint events.

   [ This is to reduce the surface contact with libtraceevents and make clear what
     is that the perf tools needs from that lib: so far parsing the common and per
     event fields. ]

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-21 11:27:00 +02:00
Ingo Molnar
26198c21d1 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull ftrace updates from Steve Rostedt:

" This patch series extends ftrace function tracing utility to be
  more dynamic for its users. It allows for data passing to the callback
  functions, as well as reading regs as if a breakpoint were to trigger
  at function entry.

  The main goal of this patch series was to allow kprobes to use ftrace
  as an optimized probe point when a probe is placed on an ftrace nop.
  With lots of help from Masami Hiramatsu, and going through lots of
  iterations, we finally came up with a good solution. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-21 11:23:40 +02:00
Linus Torvalds
c71a35520f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

A x32 socket ABI fix with a -stable backport tag among other fixes.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x32: Use compat shims for {g,s}etsockopt
  Revert "x86-64/efi: Use EFI to deal with platform wall clock"
  x86, apic: fix broken legacy interrupts in the logical apic mode
  x86, build: Globally set -fno-pic
  x86, avx: don't use avx instructions with "noxsave" boot param
2012-08-20 10:36:18 -07:00
Linus Torvalds
f78602ab7c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: disable PEBS on a guest entry.
  perf/x86: Add Intel Westmere-EX uncore support
  perf/x86: Fixes for Nehalem-EX uncore driver
  perf, x86: Fix uncore_types_exit section mismatch
2012-08-20 10:34:21 -07:00
Mike Frysinger
515c7af85e x32: Use compat shims for {g,s}etsockopt
Some of the arguments to {g,s}etsockopt are passed in userland pointers.
If we try to use the 64bit entry point, we end up sometimes failing.

For example, dhcpcd doesn't run in x32:
	# dhcpcd eth0
	dhcpcd[1979]: version 5.5.6 starting
	dhcpcd[1979]: eth0: broadcasting for a lease
	dhcpcd[1979]: eth0: open_socket: Invalid argument
	dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor

The code in particular is getting back EINVAL when doing:
	struct sock_fprog pf;
	setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));

Diving into the kernel code, we can see:
include/linux/filter.h:
	struct sock_fprog {
		unsigned short len;
		struct sock_filter __user *filter;
	};

net/core/sock.c:
	case SO_ATTACH_FILTER:
		ret = -EINVAL;
		if (optlen == sizeof(struct sock_fprog)) {
			struct sock_fprog fprog;

			ret = -EFAULT;
			if (copy_from_user(&fprog, optval, sizeof(fprog)))
				break;

			ret = sk_attach_filter(&fprog, sk);
		}
		break;

arch/x86/syscalls/syscall_64.tbl:
	54 common setsockopt sys_setsockopt
	55 common getsockopt sys_getsockopt

So for x64, sizeof(sock_fprog) is 16 bytes.  For x86/x32, it's 8 bytes.
This comes down to the pointer being 32bit for x32, which means we need
to do structure size translation.  But since x32 comes in directly to
sys_setsockopt, it doesn't get translated like x86.

After changing the syscall table and rebuilding glibc with the new kernel
headers, dhcp runs fine in an x32 userland.

Oddly, it seems like Linus noted the same thing during the initial port,
but I guess that was missed/lost along the way:
	https://lkml.org/lkml/2011/8/26/452

[ hpa: tagging for -stable since this is an ABI fix. ]

Bugzilla: https://bugs.gentoo.org/423649
Reported-by: Mads <mads@ab3.no>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org
Cc: H. J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org> v3.4..v3.5
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-08-18 14:15:39 -07:00
Konrad Rzeszutek Wilk
250a41e0ec xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.
If P2M leaf is completly packed with INVALID_P2M_ENTRY or with
1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
with either a p2m_missing or p2m_identity respectively. The old
page (which was created via extend_brk or was grafted on from the
mfn_list) can be re-used for setting new PFNs.

This also means we can remove git commit:
5bc6f9888d
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
which tried to fix this.

and make the amount that is required to be reserved much smaller.

CC: stable@vger.kernel.org # for 3.5 only.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-17 09:29:15 -04:00
Linus Torvalds
ad54e46113 Fix:
* On machines with large MMIO/PCI E820 spaces we fail to boot b/c
    we failed to pre-allocate large enough virtual space for extend_brk.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQKlV9AAoJEFjIrFwIi8fJZh4H/0ZlRrgG+8mqwCM+pcyYY+2a
 zqnOrfYUO/aO26oqiOQUrn4quLAElhBuJK19uSj8fckMMZ+sr5rTJTaXmT6b7F7N
 pgTXsKQCYAJ2NNGHVSQ73KYjOUeEW3woDSQZo0y/GRzOjiQsxpoFc8PS94ZieUNT
 G6a8ECZBRv3fz8nAuJlhGV/suqHGOLJ0pwum1gHGOzaH3ZoZVtaQv5LhGYctJspU
 yF5bdeD0qjCbseVtJ72tyxzLxMwLpJtdy2MbSwIv5JGuszj0nRmL4oa7Vc4vYdyv
 p+FrNmbDAZ1j61z1PhBZPmgzwba2LTXtIWhR2zsGJgqlJNzMUtlNkff1kT3NeE0=
 =Gl6V
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fix from Konrad Rzeszutek Wilk:
 "Way back in v3.5 we added a mechanism to populate back pages that were
  released (they overlapped with MMIO regions), but neglected to reserve
  the proper amount of virtual space for extend_brk to work properly.

  Coincidentally some other commit aligned the _brk space to larger area
  so I didn't trigger this until it was run on a machine with more than
  2GB of MMIO space."

 * On machines with large MMIO/PCI E820 spaces we fail to boot b/c
   we failed to pre-allocate large enough virtual space for extend_brk.

* tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
2012-08-16 11:31:59 -07:00
Konrad Rzeszutek Wilk
ca08649eb5 Revert "xen PVonHVM: move shared_info to MMIO before kexec"
This reverts commit 00e37bdb01.

During shutdown of PVHVM guests with more than 2VCPUs on certain
machines we can hit the race where the replaced shared_info is not
replaced fast enough and the PV time clock retries reading the same
area over and over without any any success and is stuck in an
infinite loop.

Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-16 13:05:25 -04:00
H. Peter Anvin
f026cfa82f Revert "x86-64/efi: Use EFI to deal with platform wall clock"
This reverts commit bacef661ac.

This commit has been found to cause serious regressions on a number of
ASUS machines at the least.  We probably need to provide a 1:1 map in
addition to the EFI virtual memory map in order for this to work.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reported-and-bisected-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu
2012-08-14 09:58:25 -07:00
Suresh Siddha
f1c6300183 x86, apic: fix broken legacy interrupts in the logical apic mode
Recent commit 332afa656e cleaned up
a workaround that updates irq_cfg domain for legacy irq's that
are handled by the IO-APIC. This was assuming that the recent
changes in assign_irq_vector() were sufficient to remove the workaround.

But this broke couple of AMD platforms. One of them seems to be
sending interrupts to the offline cpu's, resulting in spurious
"No irq handler for vector xx (irq -1)" messages when those cpu's come online.
And the other platform seems to always send the interrupt to the last logical
CPU (cpu-7). Recent changes had an unintended side effect of using only logical
cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts) and this
broke the legacy interrupts not getting routed to the cpu-7 on the AMD
platform, resulting in a boot hang.

For now, reintroduce the removed workaround, (essentially not allowing the
vector to change for legacy irq's when io-apic starts to handle the irq. Which
also addressed the uninteded sife effect of just specifying cpu-0 in the
IO-APIC RTE for those irq's during boot).

Reported-and-tested-by: Robert Richter <robert.richter@amd.com>
Reported-and-tested-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-08-14 09:52:20 -07:00
Gleb Natapov
26a4f3c08d perf/x86: disable PEBS on a guest entry.
If PMU counter has PEBS enabled it is not enough to disable counter
on a guest entry since PEBS memory write can overshoot guest entry
and corrupt guest memory. Disabling PEBS during guest entry solves
the problem.

Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120809085234.GI3341@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 19:01:04 +02:00