do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC
in HLT too, not just when going through the ACPI methods.
(the ACPI idle code already does this.)
[ update the 64-bit side too, as noticed by Jiri Slaby. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
scale the sched_clock() cyc_2_nsec scaling factor according to
CPU frequency changes.
[ mingo@elte.hu: simplified it and fixed it for SMP. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cf http://lkml.org/lkml/2007/10/3/41
To summarize: on Linux, SA_ONSTACK decides whether you are already on the
signal stack based on the value of the SP at the time of a signal. If
you are not already inside the range, you are not "on the signal stack"
and so the new signal handler frame starts over at the base of the signal
stack.
sigaltstack (and sigstack before it) was invented in BSD. There, the
SA_ONSTACK behavior has always been different. It uses a kernel state
flag to decide, rather than the SP value. When you first take an
SA_ONSTACK signal and switch to the alternate signal stack, it sets the
SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
Thereafter you are "on the signal stack" and don't switch SP before
pushing a handler frame no matter what the SP value is. Only when you
sigreturn from the original handler context do you clear the SS_ONSTACK
flag so that a new handler frame will start over at the base of the
alternate signal stack.
The undesireable effect of the Linux behavior is that an overflow of the
alternate signal stack can not only go undetected, but lead to a ring
buffer effect of clobbering the original handler frame at the base of the
signal stack for each successive signal that comes just after the
overflow. This is what Shi Weihua's test case demonstrates. Normally
this does not come up because of the signal mask, but the test case uses
SA_NODEFER for its SIGSEGV handler.
The other subtle part of the existing Linux semantics is that a simple
longjmp out of a signal handler serves to take you off the signal stack
in a safe and reliable fashion without having used sigreturn (nor having
just returned from the handler normally, which means the same). After
the longjmp (or even informal stack switching not via any proper libc or
kernel interface), the alternate signal stack stands ready to be used
again.
A paranoid program would allocate a PROT_NONE red zone around its
alternate signal stack. Then a small overflow would trigger a SIGSEGV in
handler setup, and be fatal (core dump) whether or not SIGSEGV is
blocked. As with thread stack red zones, that cannot catch all overflows
(or underflows). e.g., a local array as large as page size allocated in
a function called from a handler, but not actually touched before more
calls push more stack, could cause an overflow that silently pushes into
some unrelated allocated pages.
The BSD behavior does not do anything in particular about overflow. But
it does at least avoid the wraparound or "ring buffer effect", so you'll
just get a straightforward all-out overflow down your address space past
the low end of the alternate signal stack. I don't know what the BSD
behavior is for longjmp out of an SA_ONSTACK handler.
The POSIX wording relating to sigaltstack is pretty minimal. I don't
think it speaks to this issue one way or another. (The program that
overflows its stack is clearly in undefined behavior territory of one
sort or another anyhow.)
Given the longjmp issue and the potential for highly subtle complications
in existing programs relying on this in arcane ways deep in their code, I
am very dubious about changing the behavior to the BSD style persistent
flag. I think Shi Weihua's patches have a similar effect by tracking the
SP used in the last handler setup.
I think it would be sensible for the signal handler setup code to detect
when it would itself be causing a stack overflow. Maybe something like
the following patch (untested). This issue exists in the same way on all
machines, so ideally they would all do a similar check.
When it's the handler function itself or its callees that cause the
overflow, rather than the signal handler frame setup alone crossing the
boundary, this still won't help. But I don't see any way to distinguish
that from the valid longjmp case.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add the DMI strings provided by Islam Amer <pharon@gmail.com>, for
the Compaq Presario V6000 (Quanta/30B7).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
various changes to the in_p/out_p delay details:
- add the io_delay=none method
- make each method selectable from the kernel config
- simplify the delay code a bit by getting rid of an indirect function call
- add the /proc/sys/kernel/io_delay_type sysctl
- change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
- make the io delay config not depend on CONFIG_DEBUG_KERNEL
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "David P. Reed" <dpreed@reed.com>
x86: provide a DMI based port 0x80 I/O delay override.
Certain (HP) laptops experience trouble from our port 0x80 I/O delay
writes. This patch provides for a DMI based switch to the "alternate
diagnostic port" 0xed (as used by some BIOSes as well) for these.
David P. Reed confirmed that port 0xed works for him and provides a
proper delay. The symptoms of _not_ working are a hanging machine,
with "hwclock" use being a direct trigger.
Earlier versions of this attempted to simply use udelay(2), with the
2 being a value tested to be a nicely conservative upper-bound with
help from many on the linux-kernel mailinglist but that approach has
two problems.
First, pre-loops_per_jiffy calibration (which is post PIT init while
some implementations of the PIT are actually one of the historically
problematic devices that need the delay) udelay() isn't particularly
well-defined. We could initialise loops_per_jiffy conservatively (and
based on CPU family so as to not unduly delay old machines) which
would sort of work, but...
Second, delaying isn't the only effect that a write to port 0x80 has.
It's also a PCI posting barrier which some devices may be explicitly
or implicitly relying on. Alan Cox did a survey and found evidence
that additionally some drivers may be racy on SMP without the bus
locking outb.
Switching to an inb() makes the timing too unpredictable and as such,
this DMI based switch should be the safest approach for now. Any more
invasive changes should get more rigid testing first. It's moreover
only very few machines with the problem and a DMI based hack seems
to fit that situation.
This also introduces a command-line parameter "io_delay" to override
the DMI based choice again:
io_delay=<standard|alternate>
where "standard" means using the standard port 0x80 and "alternate"
port 0xed.
This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
command-line ("io_delay=udelay") choice for testing purposes as well.
This does not change the io_delay() in the boot code which is using
the same port 0x80 I/O delay but those do not appear to be a problem
as David P. Reed reported the problem was already gone after using the
udelay version. He moreover reported that booting with "acpi=off" also
fixed things and seeing as how ACPI isn't touched until after this DMI
based I/O port switch I believe it's safe to leave the ones in the boot
code be.
The DMI strings from David's HP Pavilion dv9000z are in there already
and we need to get/verify the DMI info from other machines with the
problem, notably the HP Pavilion dv6000z.
This patch is partly based on earlier patches from Pavel Machek and
David P. Reed.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Document the fact that __save_processor_state() has to save all CPU
registers referred to by the kernel in case a different kernel is
used to load and restore a hibernation image containing it.
Sigend-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Looks like IRQ 31 is assigned to timer 3, even without the patch!
I wonder who wrote the number 31. But the manual says that it is
zero by default.
I think we should check whether the timer has been allocated an IRQ before
proceeding to assign one to it. Here is a patch that does this.
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
device. This patch fixes it by allocating IRQs to timer blocks in the HPET.
arch/x86/kernel/hpet.c | 13 +++++--------
drivers/char/hpet.c | 45 ++++++++++++++++++++++++++++++++++++++-------
include/linux/hpet.h | 2 +-
3 files changed, 44 insertions(+), 16 deletions(-)
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The following scenario might leave PIT as a disfunctional clock source:
PIT is registered as clocksource
PM_TIMER is registered as clocksource and enables highres/dyntick mode
PIT is switched to oneshot mode
-> now the readout of PIT is bogus, but the user might select PIT
via the sysfs override, which would break the box as the time
readout is unusable.
Unregister the PIT clocksource when the PIT clock event device is switched
into shutdown / oneshot mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86 the PIT might become an unusable clocksource. Add an unregister
function to provide a possibilty to remove the PIT from the list of
available clock sources.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
PIT clocksource is registered unconditionally even when HPET is enabled
or when PIT is replaced by the local APIC timer. In both cases PIT can
not be used as it is stopped and the readout would be stale.
Prevent registering PIT in those cases.
patch depends on:
x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I was confused by FSEC = 10^15 NSEC statement, plus small whitespace
fixes. When there's copyright, there should be GPL.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
the logic in this function is just crazy. It's recursive, but we
can circumvent the creation for the kobject and whole creation of the
threshold_block if some conditions are met. That's why we see the
allocate_threshold_blocks so many times in the callstack, yet only a few
kobjects created.
Then we blow up in kobject_uevent_env() on the first debug printk.
Which means that we are just passing in garbage.
Man, this is one time that comments in code would have been very nice to
have, and why forward goto's into major code blocks are just evil...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
LatencyTOP kernel infrastructure; it measures latencies in the
scheduler and tracks it system wide and per process.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use HR-timers (when available) to deliver an accurate preemption tick.
The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.
The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution within the
sched_latency period is important.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use
get_online_cpus and put_online_cpus instead as it highlights the
refcount semantics in these operations.
The new API guarantees protection against the cpu-hotplug operation, but
it doesn't guarantee serialized access to any of the local data
structures. Hence the changes needs to be reviewed.
In case of pseries_add_processor/pseries_remove_processor, use
cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the
cpu_present_map there.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make this kobject dynamic and convert it to not use kobject_register,
which is going away.
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch reorganizes the way suspend and resume notifications are
sent to drivers. The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.
It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we set the MFGPT timer tick, there is a chance that we'll
immediately assert an event. If for some reason the IRQ routing
for this clock has been setup for some other purpose, then we
could end up firing an interrupt into the SMM handler or worse.
This rearranges the timer tick init function to initalize the handler
before we set up the MFGPT clock to make sure that even if we get
an event, it will go to the handler.
Furthermore, in the handler we need to make sure that we clear the
event, even if the timer isn't running.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Arnd Hannemann <hannemann@i4.informatik.rwth-aachen.de>
This reverts commit d4d25deca4.
It tried to fix long standing bugzilla entries, but the solution was
reported to break other systems. The reporter of
http://bugzilla.kernel.org/show_bug.cgi?id=9791
tracked it down to this commit and confirmed that reverting the patch
restores the correct behaviour. It's too late in the release cycle to
find a better solution than reverting the commit to avoid regressions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
In the current code, RTC_AIE doesn't work if the RTC relies on
CONFIG_HPET_EMULATE_RTC because the code sets the RTC_AIE flag in
hpet_set_rtc_irq_bit(). The interrupt handles does accidentally check
for RTC_PIE and not RTC_AIE when comparing the time which was set in
hpet_set_alarm_time().
I now verified on a test system here that without the patch applied,
the attached test program fails on a system that has HPET with
2.6.24-rc7-default. That's not critical since I guess the problem has
been there for several kernel releases, but as the fix is quite
obvious.
Configuration is CONFIG_RTC=y and CONFIG_HPET_EMULATE_RTC=y.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are
already in idle, have no tasks waiting to run and have no interrupts going
to them. This is common on bootup when switching cpu idle governors.
This patch gives those CPUS that don't check in an IPI kick.
Background:
-----------
I notice this while developing the mcount patches, that every once in a
while the system would hang. Looking deeper, the hang was always at boot
up when registering init_menu of the cpu_idle menu governor. Talking
with Thomas Gliexner, we discovered that one of the CPUS had no timer
events scheduled for it and it was in idle (running with NO_HZ). So the
CPU would not set the cpu_idle_state bit.
Hitting sysrq-t a few times would eventually route the interrupt to the
stuck CPU and the system would continue.
Note, I would have used the PDA isidle but that is set after the
cpu_idle_state bit is cleared, and would leave a window open where we
may miss being kicked.
hmm, looking closer at this, we still have a small race window between
clearing the cpu_idle_state and disabling interrupts (hence the RFC).
CPU0: CPU 1:
--------- ---------
cpu_idle_wait(): cpu_idle():
| __cpu_cpu_var(is_idle) = 1;
| if (__get_cpu_var(cpu_idle_state)) /* == 0 */
per_cpu(cpu_idle_state, 1) = 1; |
if (per_cpu(is_idle, 1)) /* == 1 */ |
smp_call_function(1) |
| receives ipi and runs do_nothing.
wait on map == empty idle();
/* waits forever */
So really we need interrupts off for most of this then. One might think
that we could simply clear the cpu_idle_state from do_nothing, but I'm
assuming that cpu_idle governors can be removed, and this might cause a
race that a governor might be used after the module was removed.
Venki said:
I think your RFC patch is the right solution here. As I see it, there is
no race with your RFC patch. As long as you call a dummy smp_call_function
on all CPUs, we should be OK. We can get rid of cpu_idle_state and the
current wait forever logic altogether with dummy smp_call_function. And so
there wont be any wait forever scenario.
The whole point of cpu_idle_wait() is to make all CPUs come out of idle
loop atleast once. The caller will use cpu_idle_wait something like this.
// Want to change idle handler
- Switch global idle handler to always present default_idle
- call cpu_idle_wait so that all cpus come out of idle for an instant
and stop using old idle pointer and start using default idle
- Change the idle handler to a new handler
- optional cpu_idle_wait if you want all cpus to start using the new
handler immediately.
Maybe the below 1s patch is safe bet for .24. But for .25, I would say we
just replace all complicated logic by simple dummy smp_call_function and
remove cpu_idle_state altogether.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI and APM used "pm_active" to guarantee that
they would not be simultaneously active.
But pm_active was recently moved under CONFIG_PM_LEGACY,
so that without CONFIG_PM_LEGACY, pm_active became a NOP --
allowing ACPI and APM to both be simultaneously enabled.
This caused unpredictable results, including boot hangs.
Further, the code under CONFIG_PM_LEGACY is scheduled
for removal.
So replace pm_active with pm_flags.
pm_flags depends only on CONFIG_PM,
which is present for both CONFIG_APM and CONFIG_ACPI.
http://bugzilla.kernel.org/show_bug.cgi?id=9194
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
With CPU_HOTPLUG=n:
WARNING: vmlinux.o(.text+0x104f8): Section mismatch: reference to .init.text:fork_idle (between
'do_fork_idle' and 'lapic_timer_broadcast')
do_fork_idle() needs to be __cpuinit. It can be static as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After 17d57a9206 ("x86: fix x86-32 early
fixmap initialization.") removing lg.ko caused a printk from vunmap:
mm/memory.c:115: bad pgd 004b3027.
On the second use after module load, the kernel crashes.
This fixes the immediate problem (accessed and dirty bits not set as
expected in pmd_none_or_clear_bad). I can't see why this would cause
a crash, but I haven't been able to reproduce it once this is applied.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit fbdcf18df7.
As pointed out by Yanmin Zhang, the problem was already fixed
differently (and correctly), and rather than fix anything, it actually
causes us to create a sub-optimal sched-domains hierarchy (not setting
up the domain belonging to the core) when CONFIG_X86_HT=y.
Requested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a cpu cache info entry for the Intel Tolapai cpu.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Andrew "Eagle Eye" Morton noticed that we use raw_local_save_flags()
instead of raw_local_irq_save(flags) in die(). This allows the
preemption of oopsing contexts - which is highly undesirable. It also
causes CONFIG_DEBUG_PREEMPT to complain, as reported by Miles Lane.
this bug was introduced via:
commit 39743c9ef7
Author: Andi Kleen <ak@suse.de>
Date: Fri Oct 19 20:35:03 2007 +0200
x86: use raw locks during oopses
- spin_lock_irqsave(&die.lock, flags);
+ __raw_spin_lock(&die.lock);
+ raw_local_save_flags(flags);
that is not a correct open-coding of spin_lock_irqsave(): both the
ordering is wrong (irqs should be disabled _first_), and the wrong
flags-saving API was used.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when called by setup_arch) after smp_store_cpu_info() had set it to the
correct value.
The error shows up in 'cat /proc/cpuinfo' will all cpus = 0.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this is the tale of a full day spent debugging an ancient but elusive bug.
after booting up thousands of random .config kernels, i finally happened
to generate a .config that produced the following rare bootup failure
on 32-bit x86:
| ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
| ..MP-BIOS bug: 8254 timer not connected to IO-APIC
| ...trying to set up timer (IRQ0) through the 8259A ... failed.
| ...trying to set up timer as Virtual Wire IRQ... failed.
| ...trying to set up timer as ExtINT IRQ... failed :(.
| Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug
| and send a report. Then try booting with the 'noapic' option
this bug has been reported many times during the years, but it was never
reproduced nor fixed.
the bug that i hit was extremely sensitive to .config details.
First i did a .config-bisection - suspecting some .config detail.
That led to CONFIG_X86_MCE: enabling X86_MCE magically made the bug disappear
and the system would boot up just fine.
Debugging my way through the MCE code ended up identifying two unlikely
candidates: the thing that made a real difference to the hang was that
X86_MCE did two printks:
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
Adding the same printks to a !CONFIG_X86_MCE kernel made the bug go away!
this left timing as the main suspect: i experimented with adding various
udelay()s to the arch/x86/kernel/io_apic_32.c:check_timer() function, and
the race window turned out to be narrower than 30 microseconds (!).
That made debugging especially funny, debugging without having printk
ability before the bug hits is ... interesting ;-)
eventually i started suspecting IRQ activities - those are pretty much the
only thing that happen this early during bootup and have the timescale of
a few dozen microseconds. Also, check_timer() changes the IRQ hardware
in various creative ways, so the main candidate became IRQ0 interaction.
i've added a counter to track timer irqs (on which core they arrived, at
what exact time, etc.) and found that no timer IRQ would arrive after the
bug condition hits - even if we re-enable IRQ0 and re-initialize the i8259A,
but that we'd get a small number of timer irqs right around the time when we
call the check_timer() function.
Eventually i got the following backtrace triggered from debug code in the
timer interrupt:
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ...
Pid: 1, comm: swapper Not tainted (2.6.24-rc5 #57)
EIP: 0060:[<c044d57e>] EFLAGS: 00000246 CPU: 0
EIP is at _spin_unlock_irqrestore+0x5/0x1c
EAX: c0634178 EBX: 00000000 ECX: c4947d63 EDX: 00000246
ESI: 00000002 EDI: 00010031 EBP: c04e0f2e ESP: f7c41df4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffe04000 CR3: 00630000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c05f5784>] setup_IO_APIC+0x9c3/0xc5c
the spin_unlock() was called from init_8259A(). Wait ... we have an IRQ0
entry while we are in the middle of setting up the local APIC, the i8259A
and the PIT??
That is certainly not how it's supposed to work! check_timer() was supposed
to be called with irqs turned off - but this eroded away sometime in the
past. This code would still work most of the time because this code runs
very quickly, but just the right timing conditions are present and IRQ0
hits in this small, ~30 usecs window, timer irqs stop and the system does
not boot up. Also, given how early this is during bootup, the hang is
very deterministic - but it would only occur on certain machines (and
certain configs).
The fix was quite simple: disable/restore interrupts properly in this
function. With that in place the test-system now boots up just fine.
(64-bit x86 io_apic_64.c had the same bug.)
Phew! One down, only 1500 other kernel bugs are left ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Kprobes for x86-64 may cause a kernel crash if it inserted on "iret"
instruction. "call absolute" is invalid on x86-64, so we don't need
treat it.
- Change the processing order as same as x86-32.
- Add "iret"(0xcf) case.
- Remove next_rip local variable.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
jprobe for x86-64 may cause kernel page fault when the jprobe_return()
is called from incorrect function.
- Use jprobe_saved_regs instead getting it from stack.
(Especially on x86-64, it may get incorrect data, because
pt_regs can not be get by using container_of(rsp))
- Change the type of stack pointer to unsigned long *.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Revert commit efa4d2fb04 ("Hibernation:
Use temporary page tables for kernel text mapping on x86_64") because it
causes my t61p to reboot right at the end of resume-from-disk. For
reasons unknown at this time.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Old debugging hack sneaked back during x86 merge, this removes it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
fix this on i386 allnoconfig:
WARNING: vmlinux.o(.text+0x6f2e): Section mismatch: reference to .init.text:register_cpu (between 'arch_register_cpu' and 'text_poke')
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
free_cache_attributes() must be __cpuinit since it calls the
__cpuinit cache_remove_shared_cpu_map().
This patch fixes the following section mismatch reported by
Chris Clayton:
...
WARNING: vmlinux.o(.text+0x90b6): Section mismatch: reference to .init.text:cache_remove_shared_cpu_map (between 'free_cache_attributes' and 'show_level')
...
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Our automated test suite looks for keywords like error, fail, warning in
the boot log. In the case when the nmi watchdog is determined to be
stuck in check_nmi_watchdog(), none of those keywords are displayed.
This patch adds a keyword, "WARNING:", so it makes it easier to notice
when the nmi watchdog isn't working correctly. Also add a proper
KERN_WARNING mark to this printout.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pageexec@freemail.hu writes:
> i've just noticed that the chunk in i386/kernel/head.S ended up in a
> weird place, namely, it's not going to be executed as it's just after
> a 'jmp 3f' and before startup_32_smp, probably not what you intended.
> on a sidenote, the whole thing can be done in a single insn, like:
>
> movl $(swapper_pg_pmd - __PAGE_OFFSET + 0x067), (swapper_pg_dir -
> __PAGE_OFFSET+ 4092)
Thanks for the reminder I thought we had fixed this problem a while ago.
Needed to get fixed virtual address for USB debug and earlycon with mmio.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
we should also add hpet_disable() for kdump.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If HPET was enabled by pci quirks, we use i8253 as initial clockevent
because pci quirks doesn't run until pci is initialized.
The above means the kernel (or something) is assuming HPET legacy
replacement is disabled and can use i8253 at boot.
If we used kexec, it isn't true. So, this patch disables HPET legacy
replacement for kexec in machine_shutdown().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Subdividing the paravirt_ops structure caused a regression in certain
non-GPL modules which try to use mmu_ops and cpu_ops. This restores the
old behaviour, and makes it consistent with the non-CONFIG_PARAVIRT case.
Takashi Iwai <tiwai@suse.de> adds:
> I took at this problem (as I have an nvidia card on one of my
> workstations), and found out that the following suffer from
> EXPORT_SYMBOL_GPL changes:
>
> * local_disable_irq(), local_irq_save*(), etc.
> * MSR-related macros like rdmsr(), wrmsr(), read_cr0(), etc.
> wbinvd(), too.
> * pmd_val(), pgd_val(), etc are all involved with pv_mm_ops.
> pmd_large() and pmd_bad() is also indirectly involved.
> __flush_tlb() and friends suffer, too.
Christoph Hellwig objects to this patch on the grounds that modules
shouldn't be using these operations anyway. I don't think this is a
particularly good reason to reject the patch, for several reasons:
1. These operations are still available to modules when not using
CONFIG_PARAVIRT, since they are implicitly exported as inline
functions via the kernel headers. Exporting the same functionality as
GPL-only symbols just adds a gratuitious difference between
CONFIG_PARAVIRT and non-CONFIG_PARAVIRT configurations. If we really
think these operations are not for module use (or non-GPL module use),
then we should solve the problem in a general way.
2. It's a regression from previous kernels, which would work these
modules even with CONFIG_PARAVIRT enabled.
3. The operations in question seem pretty reasonable for modules to
use. The control registers/MSRs can be accessed directly anyway, so there's
no benefit in preventing modules from using standard interfaces. And it seems
reasonable to allow a graphics driver to create its own mappings if it wants.
Therefore, I think this patch should go in for 2.6.24. If people
really think that these operations should not be available to modules,
then we can address that separately.
Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: Tobias Powalowski <t.powa@gmx.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: fix APIC related bootup crash on Athlon XP CPUs
time: add ADJ_OFFSET_SS_READ
x86: export the symbol empty_zero_page on the 32-bit x86 architecture
x86: fix kprobes_64.c inlining borkage
pci: use pci=bfsort for HP DL385 G2, DL585 G2
x86: correctly set UTS_MACHINE for "make ARCH=x86"
lockdep: annotate do_debug() trap handler
x86: turn off iommu merge by default
x86: fix ACPI compile for LOCAL_APIC=n
x86: printk kernel version in WARN_ON and other dump_stack users
ACPI: Set max_cstate to 1 for early Opterons.
x86: fix NMI watchdog & 'stopped time' problem
warmbloodedcreature@gmail.com reported that an APIC-enabled
Asus a7v8x-x with an Athlon XP reboots early in the bootup:
http://bugzilla.kernel.org/show_bug.cgi?id=8723
after a long marathon of spontaneous-reboot debugging, it turns
out to be caused by sync_Arb_ids(). AMD CPUs never really needed
this sequence anyway, so just return early if we meet an AMD CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The latest KVM driver wants to use the empty_zero_page symbol, and it's
not exported in 32-bit x86 (although it is exported by x86_64, s390, and
uml architectures).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: tglx@linutronix.de
Cc: linux-kernel@vger.kernel.com
Cc: kvm-devel@lists.sourceforge.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/kprobes_64.c: In function 'set_current_kprobe':
arch/x86/kernel/kprobes_64.c:152: sorry, unimplemented: inlining failed in call to 'is_IF_modifier': recursive inlining
arch/x86/kernel/kprobes_64.c:166: sorry, unimplemented: called from here
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Cc: akpm@linux-foundation.org
Cc: tglx@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
today, all oopses contain a version number of the kernel, which is nice
because the people who actually do bother to read the oops get this
vital bit of information always without having to ask the reporter in
another round trip.
However, WARN_ON() and many other dump_stack() users right now lack this
information; the patch below adds this. This information is essential
for getting people to use their time effectively when looking at these
things; in addition, it's essential for tools that try to collect
statistics about defects.
Please consider, since its so simple and important for long term kernel
quality processes.
The code is identical between 32/64 bit; a lot of this code should be
unified over time, the patch keeps the identical-ness intact.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
More than 3 years ago Niclas Gustafsson reported a 'stopped time'
problem:
> Watching the /proc/interrupts with 10s apart after the "stop".
>
> [root@s151 root]# more /proc/interrupts
> CPU0
> 0: 66413955 local-APIC-edge timer
[...]
> LOC: 67355837
> ERR: 0
> MIS: 0
> [root@s151 root]# more /proc/interrupts
> CPU0
> 0: 66413955 local-APIC-edge timer
[...]
> LOC: 67379568
> ERR: 0
> MIS: 0
This may be because buggy SMM firmware messes with the 8259A (configured
for a transparent mode -- yes that rare "local-APIC-edge" mode is tricky
;-) ) insanely.
this should resolve:
http://bugzilla.kernel.org/show_bug.cgi?id=2544http://bugzilla.kernel.org/show_bug.cgi?id=6296
Patch-dusted-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Needed to make the wireless board, WRAP2C reboot.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some BIOSes advertise HPET at 0x0. We really do no want to
allocate a resource there. Check for it and leave early.
Other BIOSes tell us the HPET is at 0xfed0000000000000
instead of 0xfed00000. Add a check and fix it up with a warning
on user request.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Correct potentially unstable PC RTC time register reading in time_64.c
Stop the use of an incorrect technique for reading the standard PC RTC
timer, which is documented to "disconnect" time registers from the bus
while updates are in progress. The use of UIP flag while interrupts
are disabled to protect a 244 microsecond window is one of the
Motorola spec sheet's documented ways to read the RTC time registers
reliably.
tglx: removed locking changes from original patch, as they gain nothing
(read_persistent_clock is only called during boot, suspend, resume - so
no hot path affected) and conflict with the paravirt locking scheme
(see 32bit code), which we do not want to complicate for no benefit.
Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix hard freeze on x86_64 when the ntpd service calls
update_persistent_clock()
A repeatable but randomly timed freeze has been happening in Fedora 6
and 7 for the last year, whenever I run the ntpd service on my AMD64x2
HP Pavilion dv9000z laptop. This freeze is due to the use of
spin_lock(&rtc_lock) under the assumption (per a bad comment) that
set_rtc_mmss is called only with interrupts disabled. The call from
ntp.c to update_persistent_clock is made with interrupts enabled.
Signed-off-by: David P. Reed <dpreed@reed.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
92cb7612ae sets cpu_info->cpu_index to zero
for no reason. Referencing cpu_info->cpu_index now points always to CPU#0,
which is apparently not what we want.
Remove it.
Spotted-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix regressions introduced with 92cb7612ae.
It can happen that cpuinfo is displayed for CPUs that are not online or
even worse for CPUs not present at all. As an example, following was
shown for a "second" CPU of a single core K8 variant:
processor : 0
vendor_id : unknown
cpu family : 0
model : 0
model name : unknown
stepping : 0
cache size : 0 KB
fpu : yes
fpu_exception : yes
cpuid level : 0
wp : yes
flags :
bogomips : 0.00
clflush size : 0
cache_alignment : 0
address sizes : 0 bits physical, 0 bits virtual
power management:
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit d435d862ba
("cpu hotplug: mce: fix cpu hotplug error handling")
changed the error handling in mce_cpu_callback.
In cases where not all CPUs are brought up during
boot (e.g. using maxcpus and additional_cpus parameters)
mce_cpu_callback now returns NOTFIY_BAD because
for such CPUs cpu_data is not completely filled when
the notifier is called. Thus mce_create_device fails right
at its beginning:
if (!mce_available(&cpu_data[cpu]))
return -EIO;
As a quick fix I suggest to check boot_cpu_data for MCE.
To reproduce this regression:
(1) boot with maxcpus=2 addtional_cpus=2 on a 4 CPU x86-64 system
(2) # echo 1 >/sys/devices/system/cpu/cpu2/online
-bash: echo: write error: Invalid argument
dmesg shows:
_cpu_up: attempt to bring up CPU 2 failed
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add throttling control via MSR when T-states uses
the FixHW Control Status registers.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86:
x86: enable "make ARCH=x86"
x86: do not use $(ARCH) when not needed
kconfig: use $K64BIT to set 64BIT with all*config targets
kconfig: add helper to set config symbol from environment variable
kconfig: factor out code in confdata.c
x86: move the rest of the menu's to Kconfig
x86: move all simple arch settings to Kconfig
x86: copy x86_64 specific Kconfig symbols to Kconfig.i386
x86: add X86_64 dependency to x86_64 specific symbols in Kconfig.x86_64
x86: add X86_32 dependency to i386 specific symbols in Kconfig.i386
x86: arch/x86/Kconfig.cpu unification
x86: start unification of arch/x86/Kconfig.*
x86: unification of cfufreq/Kconfig
Fix regression introduced with d435d862ba
("cpu hotplug: mce: fix cpu hotplug error handling").
A CPU which was not brought up during boot (using maxcpus and
additional_cpus parameters) couldn't be onlined anymore. For such a CPU it
seemed that MCE was not supported during CPU_UP_PREPARE-time which caused
mce_cpu_callback to return NOTIFY_BAD to notifier_call_chain. To fix this
we:
- call mce_create_device for CPU_ONLINE event (instead of CPU_UP_PREPARE),
- avoid mce_remove_device() for the CPU that is not correctly initialized
by mce_create_device() failure,
- make mce_cpu_callback always return NOTIFY_OK for CPU_ONLINE event.
Because CPU_ONLINE callback return value is always ignored.
[akinobu.mita@gmail.com: avoid mce_remove_device() for not initialized device]
[akinobu.mita@gmail.com: make mce_cpu_callback always return NOTIFY_OK]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For x86 ARCH may say i386 or x86_64 and soon x86.
Rely on CONFIG_X64_32 to select between 32/64 or just
hardcode the value as appropriate.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Merge the two Kconfig files to a single file.
Checked using make allmodconfig for x86_64.
No changes in build.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
restore sigcontext is taking a DNA exception while restoring FP context
from the user stack, during the sigreturn. Appended patch fixes it by
doing clts() if the app doesn't touch FP during the signal handler
execution. This will stop generating a DNA, during the fxrstor in the
sigreturn.
This improves 64-bit lat_sig numbers by ~30% on my core2 platform.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
prepare for up_smp_call_function() to ensure that the 'func'
pointer is unused. (which is related to a KVM build fix)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch removes the unused EXPORT_SYMBOL(machine_id).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the 4 symbols iommu_hole_init(), iommu_aperture,
iommu_aperture_allowed, iommu_aperture_disabled. All these symbols are only
used for the GART implementation of IOMMUs.
It adds and additional gart_ prefix to them.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes some functions and variables static in pci-gart_64.c which are
not used somewhere else.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the IOMMU config option to GART_IOMMU because in fact it
means the GART and not general support for an IOMMU on x86.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
clear to which IOMMU implementation it belongs. The patch also adds "GART" to
the Kconfig line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Additional CPUID strings (sse4_1, sse4_2, sse5, skinit, wdt); fix the
positioning of the AMD ecx strings (cr8_legacy was duplicated under
two different names, so the alignment of all the other strings were
off by one.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
blk_rq_map_sg doesn't initialize sg->dma_address/length to zero
anymore. Some low level drivers reuse sg lists without initializing so
IOMMUs might get non-zero dma_address/length. If map_sg fails, we need
pass the number of the mapped entries to gart_unmap_sg.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so
that makedumpfile (dump filtering command) supports x86_64 sparsemem
kernel of linux-2.6.24.
makedumpfile creates a small dumpfile by excluding unnecessary pages for
the analysis. It checks attributes in page structures and distinguishes
necessary pages and unnecessary ones. To check them, makedumpfile gets
the vmcoreinfo data which has the minimum debugging information only for
dump filtering.
For older x86_64 kernel (linux-2.6.23 or before), makedumpfile translates
the virtual address of page structure into physical address by subtracting
PAGE_OFFSET from virtual address, but this translation isn't effective for
linux-2.6.24 sparsemem kernel, because its page structures are in virtual
memmap area. makedumpfile should translate their virtual address by 4-levels
paging and it needs the symbol "init_level4_pgt".
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix this warning:
arch/x86/kernel/early-quirks.c:40: warning: nvidia_hpet_check defined but not used
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix !CONFIG_SMP warning:
arch/x86/kernel/acpi/processor.c: In function arch_acpi_processor_init_pdc:
arch/x86/kernel/acpi/processor.c:65: warning: unused variable cpu
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel only ever supports 1 version of the boot protocol
so there is no need to check the boot protocol revision to
see if a feature is supported.
Both x86 and x86_64 support the same boot protocol so we need
to implement the KEEP_SEGMENTS on x86_64 as well. It isn't
just paravirt bootloaders that could use this functionality.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
KVM uses smp_call_function_mask and therefor need smp_ops to be exported.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 6442eea937.
The patch breaks smp_ops and needs to be reverted. The solution to
allow modular build of KVM is to export smp_ops instead.
Pointed-out-by: James Bottomley
<jejb> tglx, so write out 100 times "voyager is a useful architecture" ...
<tglx> yes, Sir
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ensure we fixup the IRQ state before we hit any locking code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
map_sg could copy the last sg element to another position (if merging
some elements). It breaks sg chaining. This copies only
dma_address/length instead of the whole sg element.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>