setup_IO_APIC_irqs could fail to get vector for some device when you have too
many devices, because at that time only boot cpu is online. So check vector
for irq in setup_ioapic_dest and call setup_IO_APIC_irq to make sure IO-APIC
irq-routing table is initialized.
Also seperate setup_IO_APIC_irq from setup_IO_APIC_irqs.
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
- Remove "Disabling IOMMU" message because it confuses people
- Clarify that the GART IOMMU is refered to in other message
Signed-off-by: Andi Kleen <ak@suse.de>
Idle callbacks has some races when enter_idle() sets isidle and subsequent
interrupts that can happen on that CPU, before CPU goes to idle. Due to this,
an IDLE_END can get called before IDLE_START. To avoid these races, disable
interrupts before enter_idle and make sure that all idle routines do not
enable interrupts before entering idle.
Note that poll_idle() still has a this race as it has to enable interrupts
before going to idle. But, all other idle routines have the race fixed.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This was added as a workaround for the fallback unwinder not supporting
unaligned stack pointers properly. But now it was fixed to do that,
so it's not needed anymore
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Tighten the requirements on both input to and output from the Dwarf2
unwinder.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We're already well protected against module unloads because module
unload uses stop_machine(). The only exception is NMIs, but other
users already risk lockless accesses here.
This avoids some hackery in lockdep and also a potential deadlock
This matches what i386 does.
Signed-off-by: Andi Kleen <ak@suse.de>
On the Core2 cpus, the rdtsc instruction is not serializing (as defined
in the architecture reference since rdtsc exists) and due to the deep
speculation of these cores, it's possible that you can observe time go
backwards between cores due to this speculation. Since the kernel
already deals with this with the SYNC_RDTSC flag, the solution is
simple, only assume that the instruction is serializing on family 15...
The price one pays for this is a slightly slower gettimeofday (by a
dozen or two cycles), but that increase is quite small to pay for a
really-going-forward tsc counter.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There is no guarantee that two RDTSCs in a row are monotonic,
so don't assume it on single core AMD systems.
This will make gettimeofday slower again
Signed-off-by: Andi Kleen <ak@suse.de>
Make mce_remove_device() clean up the kobject in per_cpu(device_mce, cpu)
after it has been unregistered.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
interrupt array is referred for idt vectors instead of NR_IRQS, so change size
to NR_VECTORS - FIRST_EXTERNAL_VECTOR. Also change to static.
Signed-off-by: Yinghai Lu <yinghai@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
idt_table is in the .bss section, so clear_bss need to called at first
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Read/Write APIC_LVTPC and APIC_LVTTHMR only,
if get_maxlvt() returns certain values.
This is done like everywhere else in i386/kernel/apic.c,
so I guess its correct.
Suspends/Resumes to disk fine and eleminates an smp_error_interrupt()
here on a K8.
AK: ported to x86-64 too
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a small patch for x86-64 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.
changelog:
- add CPU_FEATURE_BTS
- add Branch Trace Store detection
signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
quirks.
And add a PCI quirk for these platforms to check(which happens very late
during the boot) if the APIC routing is indeed set to default flat mode.
This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
selects physical mode instead of the logical flat(as needed for this errata
workaround).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The final line of /proc/<pid>/maps on x86_64 for native 64-bit
tasks shows an incorrect ending address and incorrect permissions. There
is only a single page mapped in this vsyscall region, and it is accessible
for both read and execute.
The patch below fixes this. (Since 32-bit-compat tasks have a real vma
with correct perms/range, no change is necessary for that scenario.)
Before the patch, a "cat /proc/self/maps | tail -1" shows this:
ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]
After the patch, this is the output:
ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]
Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch makes it possible to compile Calgary in but not use it by
default. In this mode, use 'iommu=calgary' to activate it.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch cleans up the previous "Use BIOS supplied BBAR information"
patch. Mostly stylistic clenaups, but also check for ioremap failure
when we ioremap the BBAR rather than when trying to use it.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Laurent Vivier <Laurent.Vivier@bull.net>
Find the BBAR register address of each Calgary using the "Extended
BIOS Data Area" rather than calculating it ourselves. Also get the bus
topology (what PHB each bus is on) from Calgary rather than
calculating it ourselves.
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=7407.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Instead of adding all kinds of more quirks try various timer
routing variants in check_timer.
In particular this tries to handle quirks from:
- Nvidia NF2-4 reference BIOS: wrong timer override
- Asus: Wrong timer override but no HPET table
- ATI: require timer disabled in 8259
- Some boards: require timer enabled in 8259
We just try many of the the known variants in the hopefully right order
in check_timer.
Trying pin 0/2 on Nvidia suggested by Tim Hockin.
TBD Experimental. Needs a lot of testing
Signed-off-by: Andi Kleen <ak@suse.de>
Clear the irq releated entries in irq_vector, irq_domain and vector_irq
instead of clearing irq_vector only. So when new irq is created, it
could reuse that vector. (actually is the second loop scanning from
FIRST_DEVICE_VECTOR+8). This could avoid the vectors are used up
with enough module inserting and removing
Cc: Eric W. Biedierman <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-By: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
On modern systems RAM errors don't cause NMIs, but it's usually
caused by PCI SERR. Mention PCI instead of RAM in the printk.
Reported by r_hayashi@ctc-g.co.jp (Ryutaro Hayashi)
Cc: r_hayashi@ctc-g.co.jp
Signed-off-by: Andi Kleen <ak@suse.de>
Resending as I believe the discussion about them established they were
correct.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Currently the idle loop has two nested loops -- one high level
in cpu_idle and in some low level idle functions another one.
Looping in the low level idle functions breaks the idle notifiers
because interrupts waking up sleep states need to execute
exit_idle() which is only in cpu_idle().
So don't do that, only loop in cpu_idle(). This only removes
code.
In some cases e.g. poll_idle the idle loop is a little longer
now because cpu_idle checks more things. I hope that isn't a problem
ACPI idle doesn't change behaviour because it never looped anyways.
Cc: len.brown@intel.com
Cc: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
This patch:
- makes ret_from_sys_call no longer global (all external users were
previously switched to use int_ret_from_sys_call)
- adjusts placement of a CFI_{REMEMBER,RESTORE}_STATE pair to better
fit logic flow
- eliminates an unnecessary pair of CFI_{REMEMBER,RESTORE}_STATE
- glues together function- and unwinder-wise the previously separate
system_call and int_ret_from_sys_call function fragments
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Insert the Local APIC and IO APIC(s) into the resource tree. It allows the
APIC resources to be visible within /proc/iomem. The patch also takes into
account IO APIC(s) mapped in the PCI space by deferring the insertion until
after PCI has allocated its necessary resources.
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.
Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Here is a patch (used by perfmon2) to detect the presence of the
Precise Event Based Sampling (PEBS) feature for Intel 64-bit processors.
The patch also adds the cpu_has_pebs macro.
changelog:
- adds X86_FEATURE_PEBS
- adds cpu_has_pebs to test for X86_FEATURE_PEBS
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The unwinder has some extra newlines, which eat up loads of screen
space when it spews. (See https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=137900
for a nasty example).
warning_symbol-> and warning-> already printk a newline, so don't add one
in the strings passed to them.
[AK: redone for new code]
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86-64: Use stricter in process stack check for unwinder
[PATCH] i386: Fix compilation with UP genericarch
[PATCH] x86-64: Fix warning in io_apic.c
[PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind
[PATCH] x86_64: Align data segment to PAGE_SIZE boundary
Previously it would check for alignment only, which could break
if the stack pointer was unaligned. Now explicitely check if the
stack pointer is in the stack page of the current process.
Ported from i386.
Signed-off-by: Andi Kleen <ak@suse.de>
Commit 2c8c0e6b8d ("[PATCH] Convert x86-64
to early param") broke the earlyprintk=...,keep feature.
This restores that functionality. Tested on x86_64. Must-have for
v2.6.19, no risk.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
config options and tool chain it might be placed on a non PAGE_SIZE aligned
boundary and vmlinux loaders like kexec fail when they encounter a
PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
config options and tool chain it might be placed on a non PAGE_SIZE aligned
boundary and vmlinux loaders like kexec fail when they encounter a
PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The scheduler on Andreas Friedrich's hyperthreading system stopped
working properly: the scheduler would never move tasks to another CPU!
The lask known working kernel was 2.6.8.
After a couple of attempts to corner the bug, the following smoking gun
was found:
BIOS reported wrong ACPI idfor the processor
CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2
[<c0103bbe>] show_trace_log_lvl+0x34/0x4a
[<c0103ceb>] show_trace+0x2c/0x2e
[<c01045f8>] dump_stack+0x2b/0x2d
[<c0116a77>] set_cpus_allowed+0x52/0xec
[<c0101d86>] cpu_idle_wait+0x2e/0x100
[<c0259c57>] acpi_processor_power_exit+0x45/0x58
[<c0259752>] acpi_processor_remove+0x46/0xea
[<c025c6fb>] acpi_start_single_object+0x47/0x54
[<c025cee5>] acpi_bus_register_driver+0xa4/0xd3
[<c04ab2d7>] acpi_processor_init+0x57/0x77
[<c01004d7>] init+0x146/0x2fd
[<c0103a87>] kernel_thread_helper+0x7/0x10
a quick look at cpu_idle_wait() shows how broken that code is
on i386: it changes the init task's affinity map but never
restores it ...
and because all userspace tasks get forked by init, they all
inherited that single-CPU affinity mask. x86_64 cloned this
bug too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com>
Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
the new dwarf2 unwinder crashes while trying to dump the stack:
Leftover inexact backtrace:
Unable to handle kernel paging request at ffffffff82800000 RIP:
[<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
PGD 203027 PUD 205027 PMD 0
Oops: 0000 [2] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 30, comm: khelper Not tainted 2.6.19-rc6-rt1 #11
RIP: 0010:[<ffffffff8026cf26>] [<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
RSP: 0000:ffff81003fb9d848 EFLAGS: 00010006
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff805b3520 RDI: 0000000000000000
RBP: ffffffff827ffff9 R08: ffffffff80aad000 R09: 0000000000000005
R10: ffffffff80aae000 R11: ffffffff8037961b R12: ffff81003fb9d858
R13: 0000000000000000 R14: ffffffff80598460 R15: ffffffff80ab1fc0
FS: 0000000000000000(0000) GS:ffffffff806c4200(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff82800000 CR3: 0000000000201000 CR4: 00000000000006e0
this crash happened because it did not sanitize the dwarf2 data it
got, and got an unaligned stack pointer - which happily walked past
the process stack (and eventually reached the end of kernel memory
and pagefaulted there) due to this naive iteration condition:
HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);
note that i386 is alot more conservative when it comes to trusting
stack pointers:
static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
{
return p > (void *)tinfo &&
p < (void *)tinfo + THREAD_SIZE - 3;
}
but the x86_64 code did not take this bit of i386 code.
The fix is to align the stack pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Komuro reports that ISA interrupts do not work after a disable_irq(),
causing some PCMCIA drivers to not work, with messages like
eth0: Asix AX88190: io 0x300, irq 3, hw_addr xx:xx:xx:xx:xx:xx
eth0: found link beat
eth0: autonegotiation complete: 100baseT-FD selected
eth0: interrupt(s) dropped!
eth0: interrupt(s) dropped!
eth0: interrupt(s) dropped!
...
Linus Torvalds <torvalds@osdl.org> said:
"Now, edge-triggered interrupts are a _lot_ harder to mask, because the
Intel APIC is an unbelievable piece of sh*t, and has the edge-detect logic
_before_ the mask logic, so if a edge happens _while_ the device is
masked, you'll never ever see the edge ever again (unmasking will not
cause a new edge, so you simply lost the interrupt).
So when you "mask" an edge-triggered IRQ, you can't really mask it at all,
because if you did that, you'd lose it forever if the IRQ comes in while
you masked it. Instead, we're supposed to leave it active, and set a flag,
and IF the IRQ comes in, we just remember it, and mask it at that point
instead, and then on unmasking, we have to replay it by sending a
self-IPI."
This trivial patch solves the problem.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>