Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins
On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can
return a value larger than NR_IRQS. This will lead to probe_irq_on()
overrunning the irq_desc array.
I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a
Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects
5 IOAPICs and returns 240. Here are the log messages:
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24])
Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48])
Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72])
Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95
Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96])
Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119
Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
balancing defaults on NUMA and SMP systems.
Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
why we would not want to have wakeup affinity across nodes as well.
(we already do this in the standard NUMA template.)
lat_ctx on a NUMA box is particularly happy about this change:
before:
| phoenix:~/l> ./lat_ctx -s 0 2
| "size=0k ovr=2.60
| 2 5.70
after:
| phoenix:~/l> ./lat_ctx -s 0 2
| "size=0k ovr=2.65
| 2 2.07
a 2.75x speedup.
pipe-test is similarly happy about it too:
| phoenix:~/sched-tests> ./pipe-test
| 18.26 usecs/loop.
| 14.70 usecs/loop.
| 14.38 usecs/loop.
| 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1
| 8.63 usecs/loop.
| 8.59 usecs/loop.
| 9.03 usecs/loop.
| 8.94 usecs/loop.
| 8.96 usecs/loop.
| 8.63 usecs/loop.
Also:
- disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
- enable SD_WAKE_BALANCE on SMP domains
Sysbench+postgresql improves all around the board, quite significantly:
.28-rc3-11474e2c .28-rc3-11474e2c-tune
-------------------------------------------------
1: 571 688 +17.08%
2: 1236 1206 -2.55%
4: 2381 2642 +9.89%
8: 4958 5164 +3.99%
16: 9580 9574 -0.07%
32: 7128 8118 +12.20%
64: 7342 8266 +11.18%
128: 7342 8064 +8.95%
256: 7519 7884 +4.62%
512: 7350 7731 +4.93%
-------------------------------------------------
SUM: 55412 59341 +6.62%
So it's a win both for the runup portion, the peak area and the tail.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix udelay when "notsc" boot parameter is passed
With notsc passed on commandline, tsc may not be used for
udelays, make sure that we do not use tsc_khz to calculate
the lpj value in such cases.
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
io mapping: clean up #ifdefs
io mapping: improve documentation
i915: use io-mapping interfaces instead of a variety of mapping kludges
resources: add io-mapping functions to dynamically map large device apertures
x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
Impact: cleanup
clean up ifdefs: change #ifdef CONFIG_X86_32/64 to
CONFIG_HAVE_ATOMIC_IOMAP.
flip around the #ifdef sections to clean up the structure.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix x86/Voyager build
Looks like this became static on the rest of x86. Fix it up by adding
an external definition to mach-voyager/setup.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature
x86: build fix
This makes the late e820 resources use 'insert_resource_expand_to_fit()'
instead of doing a 'reserve_region_with_split()', and also avoids
marking them as IORESOURCE_BUSY.
This results in us being perfectly happy to use pre-existing PCI
resources even if they were marked as being in a reserved region, while
still avoiding any _new_ allocations in the reserved regions. It also
makes for a simpler and more accurate resource tree.
Example resource allocation from Jonathan Corbet, who has firmware that
has an e820 reserved entry that covered a big range (e0000000-fed003ff),
and that had various PCI resources in it set up by firmware.
With old kernels, the reserved range would force us to re-allocate all
pre-existing PCI resources, and his reserved range would end up looking
like this:
e0000000-fed003ff : reserved
fec00000-fec00fff : IOAPIC 0
fed00000-fed003ff : HPET 0
where only the pre-allocated special regions (IOAPIC and HPET) were kept
around.
With 2.6.28-rc2, which uses 'reserve_region_with_split()', Jonathan's
resource tree looked like this:
e0000000-fe7fffff : reserved
fe800000-fe8fffff : PCI Bus 0000:01
fe800000-fe8fffff : reserved
fe900000-fe9d9aff : reserved
fe9d9b00-fe9d9bff : 0000:00:1f.3
fe9d9b00-fe9d9bff : reserved
fe9d9c00-fe9d9fff : 0000:00:1a.7
fe9d9c00-fe9d9fff : reserved
fe9da000-fe9dafff : 0000:00:03.3
fe9da000-fe9dafff : reserved
fe9db000-fe9dbfff : 0000:00:19.0
fe9db000-fe9dbfff : reserved
fe9dc000-fe9dffff : 0000:00:1b.0
fe9dc000-fe9dffff : reserved
fe9e0000-fe9fffff : 0000:00:19.0
fe9e0000-fe9fffff : reserved
fea00000-fea7ffff : 0000:00:02.0
fea00000-fea7ffff : reserved
fea80000-feafffff : 0000:00:02.1
fea80000-feafffff : reserved
feb00000-febfffff : 0000:00:02.0
feb00000-febfffff : reserved
fec00000-fed003ff : reserved
fec00000-fec00fff : IOAPIC 0
fed00000-fed003ff : HPET 0
and because the reserved entry had been split and moved into the
individual resources, and because it used the IORESOURCE_BUSY flag, the
drivers that actually wanted to _use_ those resources couldn't actually
attach to them:
e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff]
with this patch, the resource tree instead becomes
e0000000-fed003ff : reserved
fe800000-fe8fffff : PCI Bus 0000:01
fe9d9b00-fe9d9bff : 0000:00:1f.3
fe9d9c00-fe9d9fff : 0000:00:1a.7
fe9d9c00-fe9d9fff : ehci_hcd
fe9da000-fe9dafff : 0000:00:03.3
fe9db000-fe9dbfff : 0000:00:19.0
fe9db000-fe9dbfff : e1000e
fe9dc000-fe9dffff : 0000:00:1b.0
fe9dc000-fe9dffff : ICH HD audio
fe9e0000-fe9fffff : 0000:00:19.0
fe9e0000-fe9fffff : e1000e
fea00000-fea7ffff : 0000:00:02.0
fea80000-feafffff : 0000:00:02.1
feb00000-febfffff : 0000:00:02.0
fec00000-fec00fff : IOAPIC 0
fed00000-fed003ff : HPET 0
ie the one reserved region now ends up surrounding all the PCI resources
that were allocated inside of it by firmware, and because it is not
marked BUSY, drivers have no problem attaching to the pre-allocated
resources.
Reported-and-tested-by: Jonathan Corbet <corbet@lwn.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM
This takes the code used for CONFIG_HIGHMEM memory mappings except that
it's designed for dynamic IO resource mapping.
These fixmaps are available even with CONFIG_HIGHMEM turned off.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix on certain UP configs
fix:
arch/x86/kernel/cpu/common.c: In function 'cpu_init':
arch/x86/kernel/cpu/common.c:1141: error: 'boot_cpu_id' undeclared (first use in this function)
arch/x86/kernel/cpu/common.c:1141: error: (Each undeclared identifier is reported only once
arch/x86/kernel/cpu/common.c:1141: error: for each function it appears in.)
Pull in asm/smp.h on UP, so that we get the definition of
boot_cpu_id.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: fix irq vectors.
lguest: fix early_ioremap.
lguest: fix example launcher compile after moved asm-x86 dir.
do_IRQ: cannot handle IRQ -1 vector 0x20 cpu 0
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/irq_32.c:219!
We're not ISA: we have a 1:1 mapping from vectors to irqs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
dmi_scan_machine breaks under lguest:
lguest: unhandled trap 14 at 0xc04edeae (0xffa00000)
This is because we use current_cr3 for the read_cr3() paravirt
function, and it isn't set until the first cr3 change. We got away
with it until this happened.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
fix:
arch/x86/kernel/cpu/common.c: In function 'early_identify_cpu':
arch/x86/kernel/cpu/common.c:553: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
as cpu_index is only available on SMP.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix /proc/cpuinfo output on x86/Voyager
Ever since
| commit 92cb7612ae
| Author: Mike Travis <travis@sgi.com>
| Date: Fri Oct 19 20:35:04 2007 +0200
|
| x86: convert cpuinfo_x86 array to a per_cpu array
We've had an extra field in cpuinfo_x86 which is cpu_index.
Unfortunately, voyager has never initialised this, although the only
noticeable impact seems to be that /proc/cpuinfo shows all zeros for
the processor ids.
Anyway, fix this by initialising the boot CPU properly and setting the
index when the secondaries update.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix on x86/Voyager
Given commits like this:
| Author: Suresh Siddha <suresh.b.siddha@intel.com>
| Date: Tue Jul 29 10:29:19 2008 -0700
|
| x86, xsave: enable xsave/xrstor on cpus with xsave support
Which deliberately expose boot cpu dependence to pieces of the system,
I think it's time to explicitly have a variable for it to prevent this
continual misassumption that the boot CPU is zero.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: allow /dev/mem mmaps on non-PAT CPUs/platforms
Fix mmap to /dev/mem when CONFIG_X86_PAT is off and CONFIG_STRICT_DEVMEM is
off
mmap to /dev/mem on kernel memory has been failing since the
introduction of PAT (CONFIG_STRICT_DEVMEM=n case). Seems like
the check to avoid cache aliasing with PAT is kicking in even
when PAT is disabled. The bug seems to have crept in 2.6.26.
This patch makes sure that the mmap to regular
kernel memory succeeds if CONFIG_STRICT_DEVMEM=n and
PAT is disabled, and the checks to avoid cache aliasing
still happens if PAT is enabled.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Tested-by: Tim Sirianni <tim@scalemp.com>
Cc: <stable@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix build failure on x86/Voyager
Before:
| commit 329513a35d
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Wed Jul 2 18:54:40 2008 -0700
|
| x86: move prefill_possible_map calling early
prefill_possible_mask() was hidden under CONFIG_HOTPLUG_CPU rendering
it invisitble to voyager. Since this commit it's exposed, but not
provided by the voyager subarch, so add a dummy stub to fix the link
breakage.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix x86/Voyager boot
CONFIG_SMP is used for features which work on *all* x86 boxes.
CONFIG_X86_SMP is used for standard PC like x86 boxes (for things like
multi core and apics)
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: boot up secondary CPUs as well on x86/Voyager systems
This commit:
| commit 3e9704739d
| Author: Glauber Costa <gcosta@redhat.com>
| Date: Wed May 28 13:01:54 2008 -0300
|
| x86: boot secondary cpus through initial_code
removed the use of initialize_secondary. However, it didn't update
voyager, so the secondary cpus no longer boot. Fix this by adding the
initial_code switch to voyager as well.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: include file dependency cleanup
Fix compile errors of files that include asm/uv/uv_hub.h but do
not include linux/timer.h.
[ such files are not mainline right now. ]
Signed-of-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
To the unsuspecting user it is quite annoying that this broken and
inconsistent with x86-64 definition still exists.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove incorrect WARN_ON(1)
Gets rid of dmesg spam created during physical memory hot-add which
will very likely confuse users. The change removes what appears to
be debugging code which I assume was unintentionally included in:
x86: arch/x86/mm/init_64.c printk fixes
commit 10f22dde55
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: some new sparse warnings in e820.c etc, but no functional change.
As with regular ioremap, iounmap etc, annotate with __iomem.
Fixes the following sparse warnings, will produce some new ones
elsewhere in arch/x86 that will get worked out over time.
arch/x86/mm/ioremap.c:402:9: warning: cast removes address space of expression
arch/x86/mm/ioremap.c:406:10: warning: cast adds address space to expression (<asn:2>)
arch/x86/mm/ioremap.c:782:19: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
ftrace: fix current_tracer error return
tracing: fix a build error on alpha
ftrace: use a real variable for ftrace_nop in x86
tracing/ftrace: make boot tracer select the sched_switch tracer
tracepoint: check if the probe has been registered
asm-generic: define DIE_OOPS in asm-generic
trace: fix printk warning for u64
ftrace: warning in kernel/trace/ftrace.c
ftrace: fix build failure
ftrace, powerpc, sparc64, x86: remove notrace from arch ftrace file
ftrace: remove ftrace hash
ftrace: remove mcount set
ftrace: remove daemon
ftrace: disable dynamic ftrace for all archs that use daemon
ftrace: add ftrace warn on to disable ftrace
ftrace: only have ftrace_kill atomic
ftrace: use probe_kernel
ftrace: comment arch ftrace code
ftrace: return error on failed modified text.
ftrace: dynamic ftrace process only text section
...
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix irqs on/off ip tracing
lockdep: minor fix for debug_show_all_locks()
x86: restore the old swiotlb alloc_coherent behavior
x86: use GFP_DMA for 24bit coherent_dma_mask
swiotlb: remove panic for alloc_coherent failure
xen: compilation fix of drivers/xen/events.c on IA64
xen: portability clean up and some minor clean up for xencomm.c
xen: don't reload cr3 on suspend
kernel/resource: fix reserve_region_with_split() section mismatch
printk: remove unused code from kernel/printk.c
Impact: fix AMD Family 11h boot hangs / USB device problems
The AMD Fam11h CPUs have a K8 northbridge. This northbridge is different
from other family's because it lacks GART support (as I just learned).
But the kernel implicitly expects a GART if it finds an AMD northbridge.
Fix this by removing the Fam11h northbridge id from the scan list of K8
northbridges. This patch also changes the message in the GART driver
about missing K8 northbridges to tell that the GART is missing which is
the correct information in this case.
Reported-by: Jouni Malinen <jkmalinen@gmail.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so users are not confused with memhole causing big total ram
we don't need to worry about 32 bit, because memhole is always
above max_low_pfn.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix kdump crash on 32-bit sparsemem kernels
Since linux-2.6.27, kdump has failed on i386 sparsemem kernel.
1st-kernel gets a panic just before switching to 2nd-kernel.
The cause is that a kernel accesses invalid mem_section by
page_to_pfn(image->swap_page) at machine_kexec().
image->swap_page is allocated if kexec for hibernation, but
it is not allocated if kdump. So if kdump, a kernel should
not access the mem_section corresponding to image->swap_page.
The attached patch fixes this invalid access.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: kexec-ml <kexec@lists.infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
APIC_DEBUG is always 2.
need to update inquire_remote_apic to check apic_verbosity with
it instead.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Improve the help text of the X86_PTRACE_BTS config.
Make X86_DS invisible and depend on X86_PTRACE_BTS.
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Every call of kvm_set_irq() should offer an irq_source_id, which is
allocated by kvm_request_irq_source_id(). Based on irq_source_id, we
identify the irq source and implement logical OR for shared level
interrupts.
The allocated irq_source_id can be freed by kvm_free_irq_source_id().
Currently, we support at most sizeof(unsigned long) different irq sources.
[Amit: - rebase to kvm.git HEAD
- move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file
- move kvm_request_irq_source_id to the update_irq ioctl]
[Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The pvmmu TLB flush handler should request a root sync, similarly to
a native read-write CR3.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: fix crash with memory hotplug
Shuahua Li found:
| I just did some experiments on a desktop for memory hotplug and this bug
| triggered a crash in my test.
|
| Yinghai's suggestion also fixed the bug.
We don't need to round it, just remove that extra -1
Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
now that the code is moved and converted to a work queue,
there's some minor cleanups that can be done.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: change the implementation of the debug feature
the periodic corruption checks are better off run from a work queue; there's
nothing time critical about them and this way the amount of
interrupt-context work is reduced.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The corruption check code is rather sizable and it's likely to grow over
time when we add checks for more types of corruptions (there's a few
candidates in kerneloops.org that I want to add checks for)... so lets move
it to its own file
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Before moving the code to it's own file, fix some style issues
in the corruption check code.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: avoid section mismatch warning, clean up
The dynamic ftrace determines which nop is safe to use at start up.
When it finds a safe nop for patching, it sets a pointer called ftrace_nop
to point to the code. All call sites are then patched to this nop.
Later, when tracing is turned on, this ftrace_nop variable is again used
to compare the location to make sure it is a nop before we update it to
an mcount call. If this fails just once, a warning is printed and ftrace
is disabled.
Rakib Mullick noted that the code that sets up the nop is a .init section
where as the nop itself is in the .text section. This is needed because
the nop is used later on after boot up. The problem is that the test of the
nop jumps back to the setup code and causes a "section mismatch" warning.
Rakib first recommended to convert the nop to .init.text, but as stated
above, this would fail since that text is used later.
The real solution is to extend Rabik's patch, and to make the ftrace_nop
into an array, and just save the code from the assembly to this array.
Now the section can stay as an init section, and we have a nop to use
later on.
Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: on SGI UV platforms, fix boot crash
UV initialization is currently called too late to call alloc_bootmem_pages().
The current sequence is:
start_kernel()
mem_init()
free_all_bootmem() <--- discard of bootmem
rest_init()
kernel_init()
smp_prepare_cpus()
native_smp_prepare_cpus()
uv_system_init() <--- uses alloc_bootmem_pages()
It should be calling kmalloc().
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix guest kernel boot crash on certain configs
Recent i686 2.6.27 kernels with a certain amount of memory (between
736 and 855MB) have a problem booting under a hypervisor that supports
batched mprotect (this includes the RHEL-5 Xen hypervisor as well as
any 3.3 or later Xen hypervisor).
The problem ends up being that xen_ptep_modify_prot_commit() is using
virt_to_machine to calculate which pfn to update. However, this only
works for pages that are in the p2m list, and the pages coming from
change_pte_range() in mm/mprotect.c are kmap_atomic pages. Because of
this, we can run into the situation where the lookup in the p2m table
returns an INVALID_MFN, which we then try to pass to the hypervisor,
which then (correctly) denies the request to a totally bogus pfn.
The right thing to do is to use arbitrary_virt_to_machine, so that we
can be sure we are modifying the right pfn. This unfortunately
introduces a performance penalty because of a full page-table-walk,
but we can avoid that penalty for pages in the p2m list by checking if
virt_addr_valid is true, and if so, just doing the lookup in the p2m
table.
The attached patch implements this, and allows my 2.6.27 i686 based
guest with 768MB of memory to boot on a RHEL-5 hypervisor again.
Thanks to Jeremy for the suggestions about how to fix this particular
issue.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Thu, Oct 23, 2008 at 04:09:52PM -0700, Alexander Beregalov wrote:
> arch/x86/kernel/built-in.o: In function `iommu_setup':
> pci-dma.c:(.init.text+0x36ad): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x36cc): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x3711): undefined reference to `forbid_dac
This patch partially reverts a patch to add IOMMU support to ia64. The
forbid_dac variable was incorrectly moved to quirks.c, which isn't built
when PCI is disabled.
Tested-by: "Alexander Beregalov" <a.beregalov@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This restores the old swiotlb alloc_coherent behavior (before the
alloc_coherent rewrite):
http://lkml.org/lkml/2008/8/12/200
The old alloc_coherent avoids GFP_DMA allocation first and if the
allocated address is not fit for the device's coherent_dma_mask, then
dma_alloc_coherent does GFP_DMA allocation. If it fails,
alloc_coherent calls swiotlb_alloc_coherent (in short, we rarely used
swiotlb_alloc_coherent).
After the alloc_coherent rewrite, dma_alloc_coherent
(include/asm-x86/dma-mapping.h) directly calls swiotlb_alloc_coherent.
It means that we possibly can't handle a device having dma_masks >
24bit < 32bits since swiotlb_alloc_coherent doesn't have the above
GFP_DMA retry mechanism.
This patch fixes x86's swiotlb alloc_coherent to use the GFP_DMA retry
mechanism, which dma_generic_alloc_coherent() provides now
(pci-nommu.c and GART IOMMU driver also use
dma_generic_alloc_coherent).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dma_alloc_coherent (include/asm-x86/dma-mapping.h) avoids GFP_DMA
allocation first and if the allocated address is not fit for the
device's coherent_dma_mask, then dma_alloc_coherent does GFP_DMA
allocation. This is because dma_alloc_coherent avoids precious GFP_DMA
zone if possible. This is also how the old dma_alloc_coherent
(arch/x86/kernel/pci-dma.c) works.
However, if the coherent_dma_mask of a device is 24bit, there is no
point to go into the above GFP_DMA retry mechanism. We had better use
GFP_DMA in the first place.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix section mismatch warning - apic_x2apic_phys
x86: fix section mismatch warning - apic_x2apic_cluster
x86: fix section mismatch warning - apic_x2apic_uv_x
x86: fix section mismatch warning - apic_physflat
x86: fix section mismatch warning - apic_flat
x86: memtest fix use of reserve_early()
x86 syscall.h: fix argument order
x86/tlb_uv: remove strange mc146818rtc include
x86: remove redundant KERN_DEBUG on pr_debug
x86: do_boot_cpu - check if we have ESR register
x86: MAINTAINERS change for AMD microcode patch loader
x86/proc: fix /proc/cpuinfo cpu offline bug
x86: call dmi-quirks for HP Laptops after early-quirks are executed
x86, kexec: fix hang on i386 when panic occurs while console_sem is held
MCE: Don't run 32bit machine checks with interrupts on
x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
x86: make variables static
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
hrtimers: add missing docbook comments to struct hrtimer
hrtimers: simplify hrtimer_peek_ahead_timers()
hrtimers: fix docbook comments
DECLARE_PER_CPU needs linux/percpu.h
hrtimers: fix typo
rangetimers: fix the bug reported by Ingo for real
rangetimer: fix BUG_ON reported by Ingo
rangetimer: fix x86 build failure for the !HRTIMERS case
select: fix alpha OSF wrapper
select: fix alpha OSF wrapper
hrtimer: peek at the timer queue just before going idle
hrtimer: make the futex() system call use the per process slack value
hrtimer: make the nanosleep() syscall use the per process slack
hrtimer: fix signed/unsigned bug in slack estimator
hrtimer: show the timer ranges in /proc/timer_list
hrtimer: incorporate feedback from Peter Zijlstra
hrtimer: add a hrtimer_start_range() function
hrtimer: another build fix
hrtimer: fix build bug found by Ingo
hrtimer: make select() and poll() use the hrtimer range feature
...
* 'x86/um-header' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
x86: canonicalize remaining header guards
x86: drop double underscores from header guards
x86: Fix ASM_X86__ header guards
x86, um: get rid of uml-config.h
x86, um: get rid of arch/um/Kconfig.arch
x86, um: get rid of arch/um/os symlink
x86, um: get rid of excessive includes of uml-config.h
x86, um: get rid of header symlinks
x86, um: merge Kconfig.i386 and Kconfig.x86_64
x86, um: get rid of sysdep symlink
x86, um: trim the junk from uml ptrace-*.h
x86, um: take vm-flags.h to sysdep
x86, um: get rid of uml asm/arch
x86, um: get rid of uml highmem.h
x86, um: get rid of uml unistd.h
x86, um: get rid of system.h -> system.h include
x86, um: uml atomic.h is not needed anymore
x86, um: untangle uml ldt.h
x86, um: get rid of more uml asm/arch uses
x86, um: remove dead header (uml module-generic.h; never used these days)
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (123 commits)
dock: make dock driver not a module
ACPI: fix ia64 build warning
ACPI: hack around sysfs warning with link order
ACPI suspend: fix build warning when CONFIG_ACPI_SLEEP=n
intel_menlo: fix build warning
panasonic-laptop: fix build
ACPICA: Update version to 20080926
ACPICA: Add support for zero-length buffer-to-string conversions
ACPICA: New: Validation for predefined ACPI methods/objects
ACPICA: Fix for implicit return compatibility
ACPICA: Fixed a couple memory leaks associated with "implicit return"
ACPICA: Optimize buffer allocation procedure
ACPICA: Fix possible memory leak, error exit path
ACPICA: Fix fault after mem allocation failure in AML parser
ACPICA: Remove unused ACPI register bit definition
ACPICA: Update version to 20080829
ACPICA: Fix possible memory leak in acpi_ns_get_external_pathname
ACPICA: Cleanup for internal Reference Object
ACPICA: Update comments - no functional changes
ACPICA: Update for Reference ACPI_OPERAND_OBJECT
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (21 commits)
OProfile: Fix buffer synchronization for IBS
oprofile: hotplug cpu fix
oprofile: fixing whitespaces in arch/x86/oprofile/*
oprofile: fixing whitespaces in arch/x86/oprofile/*
oprofile: fixing whitespaces in drivers/oprofile/*
x86/oprofile: add the logic for enabling additional IBS bits
x86/oprofile: reordering functions in nmi_int.c
x86/oprofile: removing unused function parameter in add_ibs_begin()
oprofile: more whitespace fixes
oprofile: whitespace fixes
OProfile: Rename IBS sysfs dir into "ibs_op"
OProfile: Rework string handling in setup_ibs_files()
OProfile: Rework oprofile_add_ibs_sample() function
oprofile: discover counters for op ppro too
oprofile: Implement Intel architectural perfmon support
oprofile: Don't report Nehalem as core_2
oprofile: drop const in num counters field
Revert "Oprofile Multiplexing Patch"
x86, oprofile: BUG: using smp_processor_id() in preemptible code
x86/oprofile: fix on_each_cpu build error
...
Manually fixed trivial conflicts in
drivers/oprofile/{cpu_buffer.c,event_buffer.h}
* git://git.infradead.org/iommu-2.6:
Admit to maintaining VT-d, for my sins.
dmar: fix uninitialised 'ret' variable in dmar_parse_dev()
intel-iommu: use coherent_dma_mask in alloc_coherent
amd_iommu: fix nasty bug that caused ILLEGAL_DEVICE_TABLE_ENTRY errors
intel-iommu: IA64 support
dmar: remove the quirk which disables dma-remapping when intr-remapping enabled
dmar: Use queued invalidation interface for IOTLB and context invalidation
dmar: context cache and IOTLB invalidation using queued invalidation
dmar: use spin_lock_irqsave() in qi_submit_sync()
The entire file of ftrace.c in the arch code needs to be marked
as notrace. It is much cleaner to do this from the Makefile with
CFLAGS_REMOVE_ftrace.o.
[ powerpc already had this in its Makefile. ]
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The arch dependent function ftrace_mcount_set was only used by the daemon
start up code. This patch removes it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andrew Morton suggested using the proper API for reading and writing
kernel areas that might fault.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add comments to explain what is happening in the x86 arch ftrace code.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Have the ftrace_modify_code return error values:
-EFAULT on error of reading the address
-EINVAL if what is read does not match what it expected
-EPERM if the write fails to update after a successful match.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Canonicalize a few remaining header guards, with the exception for
those which are still in subarchitecture directories.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Drop double underscores from header guards in arch/x86/include. They
are used inconsistently, and are not necessary.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:
a. the double underscore is ugly and pointless.
b. no leading underscore violates namespace constraints.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup only, no functionality changed
WARNING: vmlinux.o(.data+0xc008): Section mismatch in reference from the variable apic_x2apic_phys to the function .init.text:x2apic_acpi_madt_oem_check()
The variable apic_x2apic_phys references
the function __init x2apic_acpi_madt_oem_check()
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup only, no functionality changed
WARNING: vmlinux.o(.data+0xbf88): Section mismatch in reference from the variable apic_x2apic_cluster to the function .init.text:x2apic_acpi_madt_oem_check()
The variable apic_x2apic_cluster references
the function __init x2apic_acpi_madt_oem_check()
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup only, no functionality changed
WARNING: vmlinux.o(.data+0xbf08): Section mismatch in reference from the variable apic_x2apic_uv_x to the function .init.text:uv_acpi_madt_oem_check()
The variable apic_x2apic_uv_x references
the function __init uv_acpi_madt_oem_check()
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup only, no functionality changed
WARNING: vmlinux.o(.data+0xbe88): Section mismatch in reference from the variable apic_physflat to the function .init.text:physflat_acpi_madt_oem_check()
The variable apic_physflat references
the function __init physflat_acpi_madt_oem_check()
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup only, no functionality changed
WARNING: vmlinux.o(.data+0xbe08): Section mismatch in reference from the variable apic_flat to the function .init.text:flat_acpi_madt_oem_check()
The variable apic_flat references
the function __init flat_acpi_madt_oem_check()
This is harmless, because the .acpi_madt_oem_check is only called
during init time. But we keep the function pointer around in a .data
function pointer template, so it's better we do not keep that stale
- so mark this function non-__init.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hi all,
Wrong usage of 2nd parameter in reserve_early call.
66/75: reserve_early(start_bad, last_bad - start_bad, "BAD RAM");
^^^^^^^^^^^^^^^^^^^^
The correct way is to use 'end' address and not 'size'.
As a bonus a fix to the printk format.
Signed-off-by: Daniele Calore <orkaan@orkaan.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some reason tlb_uv was including linux/mc146818rtc.h. It really
just needs linux/seq_file.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citix.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix APIC IRQ irregularities on certain older boxes
We should touch the APIC ESR register if only we have it.
The patch fixes the problem mentioned by Max Kellermann:
http://lkml.org/lkml/2008/10/17/147
Bisected-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
[ mingo@elte.hu: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix missing CPUs in /proc/cpuinfo after CPU hotunplug/hotreplug
In my test, I found that if a cpu has been offline,
the next cpus may not be shown in the /proc/cpuinfo.
if one read() cannot consume the whole /proc/cpuinfo,
c_start() will be called again in the next read() calls.
And *pos has been increased by 1 by the caller(seq_read()).
if this time the cpu#*pos is offline, c_start() will return
NULL, and the next cpus can not be shown.
this fix use next_cpu_nr(*pos - 1, cpu_online_map) to
search the next unshown cpu.
the most easy way to reproduce this bug:
1) offline cpu#1 (cpu#0 is online)
2) dd ibs=2 if=/proc/cpuinfo
the result is that only cpu#0 is shown.
cpu#2 and cpu#3 .... cannot be shown in /proc/cpuinfo
it's bug.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make warning message disappear - functionality unchanged
Problems with bogus IRQ0 override of those laptops should be fixed
with commits
x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC
that introduce early-quirks based on chipset configuration.
For further information, see
http://bugzilla.kernel.org/show_bug.cgi?id=11516
Instead of removing the related dmi-quirks completely we'd like to
keep them for (at least) one kernel version -- to double-check whether
the early-quirks really took effect. But the dmi-quirks need to be
called after early-quirks are executed. With this patch calling
sequence for dmi-quriks is changed as follows:
acpi_boot_table_init() (dmi-quirks)
...
early_quirks() (detect bogus IRQ0 override)
...
acpi_boot_init() (late dmi-quirks and setup IO APIC)
Note: Plan is to remove the "late dmi-quirks" with next kernel version.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's a corner case in 32 bit x86 kdump at the moment. When the box
panics via nmi, we call bust_spinlocks(1) to disable sensitivity to the
console_sem (allowing us to print to the console in all cases), but we don't
call crash_kexec, until after we call bust_spinlocks(0), which re-enables
console_sem sensitivity.
The result is that, if we get an nmi while the console_sem is held and
kdump is configured, and we try to print something to the console during
kdump shutdown (which we often do) we deadlock the box. The fix is to
simply do what 64 bit die_nmi does which is to not call bust_spinlocks(0)
until after we call crash_kexec.
Patch below tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Running machine checks with interrupt on is a extremly bad idea. The machine
check handler only runs when the system is broken and needs to finish
as quickly as possible.
Remove the respective bogus post 2.6.27 regression and call
the machine check vector directly again.
This removes only code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[Cherry-picked from x86/mce]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: fix hung bootup and other misbehavior on certain laptops
On some more HP laptops BIOS reports an IRQ0 override
but the SB600 chipset is configured such that timer
interrupts go to INT0 of IOAPIC.
Check IRQ0 routing and if it is routed to INT0 of IOAPIC skip the
timer override.
See following bug reports:
http://bugzilla.kernel.org/show_bug.cgi?id=11715http://bugzilla.kernel.org/show_bug.cgi?id=11516
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These variables are only used in their source files, so make them static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Intel 7300 Memory Controller supports dynamic throttling of memory which can
be used to save power when system is idle. This driver does the memory
throttling when all CPUs are idle on such a system.
Refer to "Intel 7300 Memory Controller Hub (MCH)" datasheet
for the config space description.
Signed-off-by: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
needed if the i7300_idle driver is to be modular.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Len Brown <len.brown@intel.com>