definition is moved to common header. x86_64 version is now called
native_smp_prepare_cpus
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header. x86_64 version is now called
native_prepare_boot_cpu
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
function definition is moved to common header. x86_64 version
is now called native_cpu_up
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
definition is moved to common header, x86_64 function name
now is native_smp_call_function_mask
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
function definition is moved to common header, x86_64 version is now called
native_smp_send_reschedule
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the smp_ops symbol is temporarily defined in smp_64.c, but it will soon
be unified
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge pointed out that looking at the boot_params
struct to determine if the system is running in a paravirtual
environment is not reliable for the Xen case, currently. He also
points out that there already exists a function to determine if
the system is running in a paravirtual environment. So let's use
that instead. This gets rid of the preprocessor test too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeremy Fitzhardinge pointed out that looking at the boot_params
struct to determine if the system is running in a paravirtual
environment is not reliable for the Xen case, currently. He also
points out that there already exists a function to determine if
the system is running in a paravirtual environment. So let's use
that instead. This gets rid of the preprocessor test too.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current tsc_init() clears the TSC feature bit if the TSC khz
cannot be calculated, causing us to panic in
arch/x86/kernel/cpu/bugs.c check_config(). We should simply mark it
unstable.
Frankly, someone should take an axe to this code. mark_tsc_unstable()
not only marks it unstable, but sets tsc_enabled to 0, which seems
redundant but is actually important here because means it won't be
used by sched_clock() either. Perhaps a tristate enum "UNUSABLE,
UNSTABLE, OK" would be clearer, and separate mark_tsc_unstable() and
mark_tsc_broken() functions?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is an add-on to the 64-bit ebda patch. It makes
the functions reserve_ebda_region (renamed from reserve_ebda)
and copy_e820_map equal to the 32-bit versions of the previous
patch.
Changes:
Use u64 and u32 for local variables in copy_e820_map.
The amount of conventional memory and the start of the EBDA are
detected by reading the BIOS data area directly. Paravirtual
environments do not provide this area, so we bail out early
in that case. They will just have to set up a correct memory
map to start with.
Add a safety net for zeroed out BIOS data area.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Explicitly reserve_early the whole address range from the end of
conventional memory as reported by the bios data area up to the
1Mb mark. Regard the info retrieved from the BIOS data area with
a bit of paranoia, though, because some biosses forget to register
the EBDA correctly.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds explicit detection of the EBDA and reservation
of the rom and adapter address space 0xa0000-0x100000 to the
i386 kernels. Before this patch, the EBDA size was hardcoded
as 4Kb. Also, the reservation of the adapter range was done by
modifying the e820 map which is now not necessary any longer,
and that code is removed from copy_e820_map.
The amount of conventional memory and the start of the EBDA are
detected by reading the BIOS data area directly. Paravirtual
environments do not provide this area, so we bail out early
in that case. They will just have to set up a correct memory
map to start with.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Depends on:
[PATCH 2/3] x86: coding style fixes to arch/x86/kernel/early_printk.c
Remove two:
ERROR: do not initialise statics to 0 or NULL
paolo@paolo-desktop:/tmp/c$ size *
text data bss dec hex filename
1172 280 12 1464 5b8 early_printk.o.after
1172 280 12 1464 5b8 early_printk.o.before
This patch is changing the binary output:
paolo@paolo-desktop:/tmp/c$ md5sum *
dad9a9a881e0eeda62cc5645bd3d7cad early_printk.o.after
da32f5cd8f248970e4809e1005393e95 early_printk.o.before
because the two variables moved to another section. No
change in functionality.
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
early_init_intel() on 64-bit is introduced by
commit 2b16a23538
Author: Andi Kleen <ak@suse.de>
Date: Wed Jan 30 13:32:40 2008 +0100
x86: move X86_FEATURE_CONSTANT_TSC into early cpu feature detection
sets CONSTANT_TSC for intel cpus - but it is already set in init_intel().
don't need to set that two times in early_init_intel() and init_intel().
this patch removes the init_intel() one.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
quad core 8 socket system will have apic id lifting.the apic id range could
be [4, 0x23]. and apic_is_clustered_box will think that need to three clusters
and that is larger than 2. So it is treated as a clustered_box.
and will get:
Marking TSC unstable due to TSCs unsynchronized
even if the CPUs have X86_FEATURE_CONSTANT_TSC set.
this quick fix will check if the cpu is from AMD.
but vsmp still needs that checking...
this patch is fix to make sure that vsmp not to be passed.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when comparing the e820 direct from BIOS, and the one by kexec:
BIOS-provided physical RAM map:
- BIOS-e820: 0000000000000000 - 0000000000097400 (usable)
+ BIOS-e820: 0000000000000100 - 0000000000097400 (usable)
BIOS-e820: 0000000000097400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dffa0000 (usable)
- BIOS-e820: 00000000dffae000 - 00000000dffb0000 type 9
+ BIOS-e820: 00000000dffae000 - 00000000dffb0000 (reserved)
BIOS-e820: 00000000dffb0000 - 00000000dffbe000 (ACPI data)
BIOS-e820: 00000000dffbe000 - 00000000dfff0000 (ACPI NVS)
BIOS-e820: 00000000dfff0000 - 00000000e0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
- BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
=======> that is the local apic address... somewhere we lost it
BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
found one entry about reserved is missing for the kernel by kexec.
it turns out init_apic_mappings is called before e820_reserve_resources
in setup_arch. but e820_reserve_resources is using request_resource.
it will not handle the conflicts.
there are three ways to fix it:
1. change request_resource in e820_reserve_resources to to insert_resource
2. move init_apic_mappings after e820_reserve_resources
3. use late_initcall to insert lapic resource.
this patch is using method 3, that is less intrusive.
in later version could consider to use method 1.
before patch
fed20000-ffffffff : PCI Bus #00
fee00000-fee00fff : Local APIC
fefff000-feffffff : pnp 00:09
ff700000-ffffffff : reserved
with patch will get map in first kernel
fed20000-ffffffff : PCI Bus #00
fee00000-fee00fff : Local APIC
fee00000-fee00fff : reserved
fefff000-feffffff : pnp 00:09
ff700000-ffffffff : reserved
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e820_resource_resources could use insert_resource instead of request_resource
also move code_resource, data_resource, bss_resource, and crashk_res
out of e820_reserve_resources.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Copy x86_64 and add a head32.c so we can start moving early
architecture initialization out of assembly.
[ Sam Ravnborg <sam@ravnborg.org>: updated it to x86 ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
quad core 8 socket system will have apic id lifting.the apic id range could
be [4, 0x23]. and apic_is_clustered_box will think that need to three clusters
and that is large than 2. So it is treated as clustered_box.
and will get
Marking TSC unstable due to TSCs unsynchronized
even the CPUs have X86_FEATURE_CONSTANT_TSC set.
this patch will check if the cpu is from AMD.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now cpu/proc.c and cpu/proc_64.c are same.
So cpu/proc_64.c can be removed.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change /proc/cpuinfo on 32-bit, it will look like on 64-bit.
'power management' line is added and power management information
will be printed at the line.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 /proc/cpuinfo code can be unified.
This is the first step of unification.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch make the file errors free.
Only 4 "WARNING: line over 80 characters" left.
arch/x86/kernel/cpu/mcheck/p5.o:
text data bss dec hex filename
452 0 4 456 1c8 p5.o.before
452 0 4 456 1c8 p5.o.after
md5:
50c945ef150aa95bf0481cc3e1dc3315 p5.o.before.asm
50c945ef150aa95bf0481cc3e1dc3315 p5.o.after.asm
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kills more than 150 errors/warnings
Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix code to access CMOS rtc registers so that it does not use inb_p and
outb_p routines, which are deprecated. Extensive research on all known
CMOS RTC chipset timing shows that there is no need for a delay in
accessing the registers of these chips even on old machines. These chipa
are never on an expansion bus, but have always been "motherboard"
resources, either in the processor chipset or explicitly on the
motherboard, and they are not part of the ISA/LPC or PCI buses, so
delays should not be based on bus timing. The reason to fix it:
1) port 80 writes often hang some laptops that use ENE EC chipsets,
esp. those designed and manufactured by Quanta for HP;
2) RTC accesses are timing sensitive, and extra microseconds may matter;
3) the new "io_delay" function is calibrated by expansion bus timing needs,
thus is not appropriate for access to CMOS rtc registers.
Signed-off-by: David P. Reed <dpreed@reed.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Replace the hardcoded list of initialization functions for each CPU
vendor by a list in an ELF section, which is read at initialization in
arch/x86/kernel/cpu/cpu.c to fill the cpu_devs[] array. The ELF
section, named .x86cpuvendor.init, is reclaimed after boot, and
contains entries of type "struct cpu_vendor_dev" which associates a
vendor number with a pointer to a "struct cpu_dev" structure.
This first modification allows to remove all the VENDOR_init_cpu()
functions.
This patch also removes the hardcoded calls to early_init_amd() and
early_init_intel(). Instead, we add a "c_early_init" member to the
cpu_dev structure, which is then called if not NULL by the generic CPU
initialization code. Unfortunately, in early_cpu_detect(), this_cpu is
not yet set, so we have to use the cpu_devs[] array directly.
This patch is part of the Linux Tiny project, and is needed for
further patch that will allow to disable compilation of unused CPU
support code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It becomes to early for ioremap, so we use early_ioremap
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change Makefile so vsmp_64.o object is dependent
on PARAVIRT, rather than X86_VSMP
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ tglx@linutronix.de: cleanup the other structs as well ]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The extended century readout does not solve the year 2038 problem on
32bit!
v2: Fix compilation on !ACPI, pointed out by tglx
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We assume that the RTC clock is BCD, so print a warning if it claims
to be binary.
[ tglx@linutronix.de: changed to WARN_ON - we want to know that!
If no one reports it we can remove the complete if (RTC_ALWAYS_BCD)
magic, which has RTC_ALWAYS_BCD defined to 1 since Linux 1.0 ... ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We know it is already after 2000. Use the year 2000 offset for both 32
and 64 bit, which removes ifdefs and the 1970 magic.
[ tglx@linutronix.de: remove 1970 magic, replace bogus commit message ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change size to unsigned long, becase caller and user all used unsigned long.
Also make bad_addr take an alignment parameter.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the vDSO was not mapped, don't use it as the "restorer" for a signal
handler. Whether we have a pointer in mm->context.vdso depends on what
happened at exec time, so we shouldn't check any global flags now.
Background:
Currently, every 32-bit exec gets the vDSO mapped even if it's disabled
(the process just doesn't get told about it). Because it's in fact
always there, the bug that this patch fixes cannot happen now. With
the second patch, it won't be mapped at all when it's disabled, which is
one of the things that people might really want when they disable it (so
nothing they didn't ask for goes into their address space).
The 32-bit signal handler setup when SA_RESTORER is not used refers to
current->mm->context.vdso without regard to whether the vDSO has been
disabled when the process was exec'd. This patch fixes this not to use
it when it's null, which becomes possible after the second patch. (This
never happens in normal use, because glibc's sigaction call uses
SA_RESTORER unless glibc detected the vDSO.)
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
people sometimes do crazy stuff like building really large static
arrays into their kernels or building allyesconfig kernels. Give
more space to the kernel and push modules up a bit: kernel has
512 MB and modules have 1.5 GB.
Should be enough for a few years ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs. The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.
Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.
More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function. This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else. It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch also resolves hangs on boot:
http://lkml.org/lkml/2008/2/23/263http://bugzilla.kernel.org/show_bug.cgi?id=10093
The bug was causing once-in-few-reboots 10-15 sec wait during boot on
certain laptops.
Earlier commit 40d6a14662 added
smp_call_function in cpu_idle_wait() to kick cpus that are in tickless
idle. Looking at cpu_idle_wait code at that time, code seemed to be
over-engineered for a case which is rarely used (while changing idle
handler).
Below is a simplified version of cpu_idle_wait, which just makes a dummy
smp_call_function to all cpus, to make them come out of old idle handler
and start using the new idle handler. It eliminates code in the idle
loop to handle cpu_idle_wait.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc expects all toplevel assembly to return to the original section type.
The code in alteranative.c does not do this. This caused some strange bugs
in sched-devel where code would end up in the .rodata section and when
the kernel sets the NX bit on all .rodata, the kernel would crash when
executing this code.
This patch adds a .previous marker to return the code back to the
original section.
Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a
bug in the toplevel asm code in the kernel. ;-)
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it
instead of smp_processor_id() in the call to set_cyc2ns_scale().
This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update
the intended cpu's cyc2ns.
Related mail/thread: http://lkml.org/lkml/2007/12/7/130
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
revert:
| commit 47001d6033
| Author: Thomas Gleixner <tglx@linutronix.de>
| Date: Tue Apr 1 19:45:18 2008 +0200
|
| x86: tsc prevent time going backwards
it has been identified to cause suspend regression - and the
commit fixes a longstanding bug that existed before 2.6.25 was
opened - so it can wait some more until the effects are better
understood.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We handle a broken tsc these days, so no need to panic. We clear the
TSC bit when tsc_init decides it's unreliable (eg. under lguest w/ bad
host TSC), leading to bogus panic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commits:
commit 37a47db8d7
Author: Balaji Rao <balajirrao@gmail.com>
Date: Wed Jan 30 13:30:03 2008 +0100
x86: assign IRQs to HPET timers, fix
and
commit e3f37a54f6
Author: Balaji Rao <balajirrao@gmail.com>
Date: Wed Jan 30 13:30:03 2008 +0100
x86: assign IRQs to HPET timers
have been identified to cause a regression on some platforms due to
the assignement of legacy IRQs which makes the legacy devices
connected to those IRQs disfunctional.
Revert them.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code for ever. This can cause
time jumps in the range of hours.
This was reported in:
http://lkml.org/lkml/2007/8/23/96
and
http://lkml.org/lkml/2008/3/31/23
I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have sychronized
TSCs. The TSCs seems not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremly small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.
CPU 0 updates the clock source variables under xtime/vyscall lock and
CPU1, where the TSC is slighty behind CPU0, is reading the time right
after the seqlock was unlocked.
The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
can not be changed due to the support of wrapping clock sources like
pm timer.
The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.
To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
I pondered to mark the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a real good reason.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix obsolete printks in aperture-64. We used not to handle missing
agpgart, but we handle it okay now.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
right now if there's no CPU support for nmi_watchdog=2 we'll just
refuse it silently.
print a useful warning.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
implement nmi_watchdog=2 on this class of CPUs:
cpu family : 15
model : 6
model name : Intel(R) Pentium(R) D CPU 3.00GHz
the watchdog's ->setup() method is safe anyway, so if the CPU
cannot support it we'll bail out safely.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This avoids using wrmsr on MSR_IA32_DEBUGCTLMSR when it's not needed.
No wrmsr ever needs to be done if noone has ever used block stepping.
Without this change, using ptrace on 2.6.25 on an x86 KVM guest
will tickle KVM's missing support for the MSR and crash the guest
kernel. Though host KVM is the buggy one, this makes for a regression
in the guest behavior from 2.6.24->2.6.25 that we can easily avoid.
I also corrected some bad whitespace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the problem that makedumpfile sometimes fails on x86_64 machine.
This patch adds the symbol "phys_base" to a vmcoreinfo data. The
vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
On x86_64 kernel which compiled with CONFIG_PHYSICAL_START=0x0 and
CONFIG_RELOCATABLE=y, makedumpfile fails like the following:
# makedumpfile -d31 /proc/vmcore dumpfile
The kernel version is not supported.
The created dumpfile may be incomplete.
_exclude_free_page: Can't get next online node.
makedumpfile Failed.
#
The cause is the lack of the symbol "phys_base" in a vmcoreinfo data.
If the symbol "phys_base" does not exist, makedumpfile considers an
x86_64 kernel as non relocatable. As the result, makedumpfile
misunderstands the physical address where the kernel is loaded, and it
cannot translate a kernel virtual address to physical address correctly.
To fix this problem, this patch adds the symbol "phys_base" to a
vmcoreinfo data.
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kernel/ptrace.c:548: warning: 'ptrace_bts_get_size' defined but not used
arch/x86/kernel/ptrace.c:558: warning: 'ptrace_bts_read_record' defined but not used
arch/x86/kernel/ptrace.c:607: warning: 'ptrace_bts_clear' defined but not used
arch/x86/kernel/ptrace.c:617: warning: 'ptrace_bts_drain' defined but not used
arch/x86/kernel/ptrace.c:720: warning: 'ptrace_bts_config' defined but not used
arch/x86/kernel/ptrace.c:788: warning: 'ptrace_bts_status' defined but not used
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.
otherwise setup_memory() is called two times... that is duplicated...
[ mingo@elte.hu: both Thomas and me simulated a double call to
setup_bootmem_allocator() and can confirm that it is a real bug
which can hang in certain configs. It's not been reported yet but
that is probably due to the relatively scarce nature of
MTRR-trimming systems. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Wed, 26 Mar 2008 11:56:22 -0600
Jordan Crouse <jordan.crouse@amd.com> wrote:
> On 26/03/08 14:31 +0100, Stefan Pfetzing wrote:
> > Hello Jordan,
> >
> > I just tried to build your geodwdt driver for the geode watchdog. Therefore
> > I pulled your repository from http://git.infradead.org/geode.git (or more,
> > the git url).
> >
> > I tried to build the geodewdt driver as a module - which didn't work, and
> > it failed with the same problem as earlier mentioned on lkmk [1]. I also
> > checked the fix [2], but that seems to be already in your (or linus) tree -
> > and so I'm unsure what the problem is.
> >
> > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074
> > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174
> >
> > Building directly into the kernel seems to work.
> >
> > Maybe you have some idea?
>
> Hmm - that is strange. Exporting the symbols should work. I recommend
> starting over with a clean tree.
>
> CCing Andres - any thoughts?
>
> Jordan
>
Er, yeah. The patch below should fix it. This should probably go into
2.6.25.
Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header
being missing.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I have found that using SMI to change the cpu's frequency on my DELL
Latitude L400 clobbers the ECX register in speedstep_set_state, causing
unneccessary retries because the "state" variable has changed silently (GCC
assumes it is still present in ECX).
play safe and avoid gcc caching any register across IO port accesses
that trigger SMIs.
Signed-off by: <Stephan.Diestelhorst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert function comment blocks to kernel-doc notation.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Revert
commit f62f1fc9ef
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Fri Mar 7 15:02:50 2008 -0800
x86: reserve dma32 early for gart
The patch has a dependency on bootmem modifications which are not .25
material that late in the -rc cycle. The problem which is addressed by
the patch is limited to machines with 256G and more memory booted with
NUMA disabled. This is not a .25 regression and the audience which is
affected by this problem is very limited, so it's safer to do the
revert than pulling in intrusive bootmem changes right now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix the bug reported here:
http://bugzilla.kernel.org/show_bug.cgi?id=10232
use update_memory_range() instead of add_memory_range() directly
to avoid closing the gap.
( the new code only affects and runs on systems where the MTRR
workaround triggers. )
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
we have seen a little problem in rebooting Dell Optiplex 745 with the
0KW626 board. Here is a small patch enabling reboot with this board,
which forces the default reboot path it into the BIOS reboot mode.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The fault_msg text is not explictly nul terminated now in startup
assembly. Do so by converting .ascii to .asciz.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
aperture_64.c takes a piece of memory and makes it into iommu
window... but such window may not be saved by swsusp -- that leads to
oops during hibernation.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
this patch allows hpet=force on nVidia nForce 430 southbridge.
This patch was tested by me on my old Asus A8N-VM CSM (where bios does not
support hpet and does not advertise it via acpi entry). My nForce430 version:
lspci -nn | grep LPC
00:0a.0 ISA bridge [0601]: nVidia Corporation MCP51 LPC Bridge [10de:0260]
(rev a2)
Kernel 2.6.24.3 after patching and using hpet=force reports this:
dmesg | grep -i hpet
Kernel command line: root=/dev/sda8 ro vga=773 video=vesafb:mtrr:4,ywrap
vt.default_utf8=0 hpet=force
Force enabled HPET at base address 0xfed00000
hpet clockevent registered
Time: hpet clocksource has been installed.
grep -i hpet /proc/timer_list
Clock Event Device: hpet
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
grep Clock /proc/timer_list (before patching)
Clock Event Device: pit
Clock Event Device: lapic
grep Clock /proc/timer_list (after patching)
Clock Event Device: hpet
Clock Event Device: lapic
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33
Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230
the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.
solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.
the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP
will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We recently got some of the "Desktop Form Factor" Optiplex 745's in. I
noticed that there's an entry for the SFF one's, but the BIOS model number
of the DFF differs from that of the SFF. We have been reliably
experiencing the same (as far as I can tell) reboot bug as the SFF boxes.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
when numa disabled I got this compile warning:
arch/x86/kernel/setup64.c: In function setup_per_cpu_areas:
arch/x86/kernel/setup64.c:147: warning: the address of
contig_page_data will always evaluate as true
it seems we missed checking if the node is online before we try to refer
NODE_DATA. Fix it.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
memory-less node support:
this patch uses updated dev_to_node, because dev_to_node already makes sure
it returns an online node.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The code to restart syscalls after signals depends on checking for a
negative orig_ax, and for particular negative -ERESTART* values in ax.
These fields are 64 bits and for a 32-bit task they get zero-extended.
The syscall restart behavior is lost, a regression from a native 32-bit
kernel and from 64-bit tasks' behavior.
This patch fixes the problem by doing sign-extension where it matters.
For orig_ax, the only time the value should be -1 but winds up as
0x0ffffffff is via a 32-bit ptrace call. So the patch changes ptrace to
sign-extend the 32-bit orig_eax value when it's stored; it doesn't
change the checks on orig_ax, though it uses the new current_syscall()
inline to better document the subtle importance of the used of
signedness there.
The ax value is stored a lot of ways and it seems hard to get them all
sign-extended at their origins. So for that, we use the
current_syscall_ret() to sign-extend it only for 32-bit tasks at the
time of the -ERESTART* comparisons.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes 64-bit ptrace calls setting the 64-bit orig_ax field for a
32-bit task sign-extend the low 32 bits up to 64. This matches what a
64-bit debugger expects when tracing a 32-bit task.
This follows on my "x86_64 ia32 syscall restart fix". This didn't
matter until that was fixed.
The debugger ignores or zeros the high half of every register slot it
sets (including the orig_rax pseudo-register) uniformly. It expects
that the setting of the low 32 bits always has the same meaning as a
32-bit debugger setting those same 32 bits with native 32-bit
facilities.
This never arose before because the syscall restart check never
matched any -ERESTART* values due to lack of sign extension. Before
that fix, even 32-bit ptrace setting orig_eax to -1 failed to trigger
the restart check anyway. So this was never noticed as a regression
of 64-bit debuggers vs 32-bit debuggers on the same 64-bit kernel.
Signed-off-by: Roland McGrath <roland@redhat.com>
[ Changed to just do the sign-extension unconditionally on x86-64,
since orig_ax is always just a small integer and doesn't need
the full 64-bit range ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich noticed that the reboot fixups went missing during
reboot.c unification.
(commit 4d022e35fd)
Geode and a few other rare boards with special reboot quirks are
affected.
Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
convert_fxsr_to_user() in 2.6.24's i387_32.c did this, and
convert_to_fxsr() also does the inverse, so I assume it's an oversight
that it is no longer being done.
[ mingo@elte.hu:
we encode it this way because there's no space for the 'FPU Last
Instruction Opcode' (->fop) field in the legacy user_i387_ia32_struct
that PTRACE_GETFPREGS/PTRACE_SETFPREGS uses.
it's probably pure legacy - i'd be surprised if any user-space relied on
the FPU Last Opcode in any way. But indeed we used to do it previously
so the most conservative thing is to preserve that piece of information.
]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Linux kernel currently does not clear the direction flag before
calling a signal handler, whereas the x86/x86-64 ABI requires that.
Linux had this behavior/bug forever, but this becomes a real problem
with gcc version 4.3, which assumes that the direction flag is
correctly cleared at the entry of a function.
This patches changes the setup_frame() functions to clear the
direction before entering the signal handler.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
We don't need to printk a message every time we transition.
Leave the code there, but ifdef'd out, as it's useful when
adding support for new processors.
Reported-by: Petr Titěra <P.Titera@century.cz>
Signed-off-by: Dave Jones <davej@redhat.com>
This bug got introduced by the recent i387 merge:
commit 4421011120
Author: Roland McGrath <roland@redhat.com>
Date: Wed Jan 30 13:31:50 2008 +0100
x86: x86 i387 user_regset
Current usage of unlazy_fpu() in ptrace specific routines is wrong.
unlazy_fpu() will not init fpu if the task never used math. So the
ptrace calls can expose the parent tasks FPU data in some cases.
Replace it with the init_fpu() which will init the math state, if the
task never used math before.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
revert the BTS ptrace extension for now.
based on general objections from Roland McGrath:
http://lkml.org/lkml/2008/2/21/323
we'll let the BTS functionality cook some more and re-enable
it in v2.6.26. We'll leave the dead code around to help the
development of this code.
(X86_BTS is not defined at the moment)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay the removal of this symbol export by one more kernel release,
giving external modules such as VirtualBox a chance to stop using it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a recent fix:
commit ce28b9864b
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Feb 20 23:57:30 2008 +0100
x86: fix vsyscall wreckage
removed the broken /kernel/vsyscall64 handler completely.
This triggers the following debug check:
sysctl table check failed: /kernel/vsyscall64 No proc_handler
Restore the sane part of the proc handler.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a kernel bug (vmware boot problem) reported by Tomasz Grobelny,
which occurs with certain .config variants and gccs.
The x86 TLS cleanup in commit efd1ca52d0
made the sys_set_thread_area and sys_get_thread_area functions ripe for
tail call optimization. If the compiler chooses to use it for them, it
can clobber the user trap frame because these are asmlinkage functions.
Reported-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
> Diffing dmesg between git7 and git8 doesn't sched any light since
> git8 also removed the printouts of the x86 caps as they were being
> initialised and updated. I'm currently adding those printouts back
> in the hope of seeing where and when the caps get broken.
That turned out to be very illuminating:
--- dmesg-2.6.24-git7 2008-02-24 18:01:25.295851000 +0100
+++ dmesg-2.6.24-git8 2008-02-24 18:01:25.530358000 +0100
...
CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Notice how the TSC cap bit goes from Off to On.
(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)
Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
in with cleared_cpu_caps
HOWEVER, at this point c->x86_capability correctly has TSC
Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
sets TSC to On in c->x86_capability, with disastrous results.
The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.
A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.
[ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]
The breakage was introduced via commit 7d851c8d3d.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, c_idle is declared in the stack, and thus, have no static address.
Peter Zijlstra points out this simple solution, in which c_idle.work
is initializated separatedly. Note that the INIT_WORK macro has a static
declaration of a key inside.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Acked-by: Peter Zijlstra <pzijlstr@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, there is no way for print_stack_trace() to determine whether
a given stack trace entry was deemed reliable or not, simply because
save_stack_trace() does not record this information. (Perhaps needless
to say, this makes the saved stack traces A LOT harder to read, and
probably with no other benefits, since debugging features that use
save_stack_trace() most likely also require frame pointers, etc.)
This patch reverts to the old behaviour of only recording the reliable trace
entries for saved stack traces.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There doesn't seem to be any reason for swapper_pg_pmd being global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Inside a KVM virtual machine the MTRRs are usually blank. This confuses Linux
and causes a warning message at boot. This patch removes that warning message
when running Linux as a KVM guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by pageexec@freemail.hu:
> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).
To fix this we pass in pt_regs by pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on a report from Arne Georg Gleditsch about user-space apps
misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
of the code revealed that the "NOP patching" done there is
fundamentally unsafe for a number of reasons:
1) the patching code runs without synchronizing other CPUs
2) it inserts NOPs even if there is no clock source which provides vread
3) when the clock source changes to one without vread we run in
exactly the same problem as in #2
4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out
as a result it is possible to break user-space via this patching.
The only safe thing for now is to remove the patching.
This code was broken since v2.6.21.
Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel
text but data, bss and init sections as well.
That name led me on the wrong path with the KERNEL_TEXT_SIZE regression,
because i knew how big of _text_ my images have and i knew about the 40 MB
"text" limit so i wrongly thought to be on the safe side of the 40 MB limit
with my 29 MB of text, while the total image size was slightly above 40 MB.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.
after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:
#define KERNEL_TEXT_SIZE (40*1024*1024)
which limit my vmlinux just happened to pass:
text data bss dec hex filename
29703744 4222751 8646224 42572719 2899baf vmlinux
40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/
So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)
So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)
We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is
bad, fix it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix mtrr kernel-doc warning:
Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class
based on the notion that they contain all the user-space visible
features of an i686-class chip. However, this is not actually true:
they lack the EA-taking long NOPs (0F 1F /0). Since this is a
userspace-visible incompatibility, downgrade these CPUs to the
manufacturer-defined i586 level.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simple typo fix for regression introduced by the user_regset changes.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove redundant irq_desc[NR_IRQS] element initialization in
init_ISA_irqs(). irq_desc[NR_IRQS] is already statically
initialized with the same values in kernel/irq/handle.c .
besides the clean-up value this also saves some space:
text data bss dec hex filename
1389 356 14 1759 6df i8259_32.o.before
1325 356 14 1695 69f i8259_32.o.after
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Though we use PDA for regular task stack but that
is not acceptable for init_task wich is special
one. We still have to allocate init_task's stack
in that manner.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's much better to use PAGE_SIZE then magic 4096
(though it's almost synonym in most cases on x86 but
not for *all* cases ;)
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
initial_code are initially used to hold a function pointer
from __init and later from __cpuinit. This confuses modpost
and changing initial_code to REFDATA silence the warning.
(But now we do not discard the variable anymore).
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch_register_cpu() is only defined for HOTPLUG_CPU code
so simple fix is to ignore references by annotating the
function __ref.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
srat_detect_node() is only used by __cpuinit init_intel().
So the trivial fix is to annotate srat_detect_node() with __cpuinit.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
nearby_node() were only used by __cpuinit amd_detect_cmp()
So annotating nearby_node() __cpuinit was the trivial fix.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/nmi_64.c:50: warning: 'unknown_nmi_panic_callback' declared 'static' but never defined
This patch also fixes nmi_32.c
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>