Commit graph

1772 commits

Author SHA1 Message Date
Thomas Gleixner
4d2920c9c0 x86_64: prepare shared crypto/aes-x86_64-asm.S
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:14:01 +02:00
Thomas Gleixner
e444d14d63 x86_64: prepare shared crypto/twofish.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:13:59 +02:00
Thomas Gleixner
565b56cc99 i386: prepare shared kernel/vsyscall-note.S
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 11:13:13 +02:00
Thomas Gleixner
f4b927a242 x86_64: simplify cpufreq build
Instead of copying the i386 Makefile and handling path substitutions
just use the i386 cpufreq Makefile.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-11 11:11:33 +02:00
Thomas Gleixner
04c17170ab x86_64: simplify oprofile build
Instead of copying the i386 Makefile and handling path substitutions
just use the i386 oprofile Makefile.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-11 11:11:32 +02:00
Andi Kleen
cf8dc57cba x86_64: increase VDSO_TEXT_OFFSET for ancient binutils
For some reason old binutils genertate larger headers so increase the text
offset of the vdso to avoid linker errors.

Roland McGrath explains:
  "There are extra symbols in the '.dynsym' section that are responsible
   for the size difference (They also cause corresponding inflation in
   '.gnu.version')

   Older ld's wrongly generated these unneeded symbols in .dynsym.  This
   was fixed not all that long ago (2006); binutils-2.17.50.0.6 might be
   the first fixed version, but I have not verified for sure where the
   cutoff was.

   The unneeded symbols et al from old ld add almost 700 bytes excess.
   This limits fairly tightly the amount by which the actual text and
   data in the vDSO can grow in the future without pushing the whole
   file over 4kb.  If it does grow later on, we should consider changing
   the layout with a config option or something to pack it better
   without that padding, when building the kernel with newer binutils."

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-01 19:21:30 -07:00
Linus Torvalds
f7f847b015 Revert "x86-64: Disable local APIC timer use on AMD systems with C1E"
This reverts commit e66485d747, since
Rafael Wysocki noticed that the change only works for his in -mm, not in
mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken
on his hardware, but that's apparently not a regression, just a symptom
of the same issue that causes the automatic apic timer disable to not
work).

It turns out that it really doesn't work correctly on x86-64, since
x86-64 doesn't use the generic clock events for timers yet.

Thanks to Rafal for testing, and here's the ugly details on x86-64 as
per Thomas:

  "I just looked into the code and the logic vs.  noapictimer on SMP is
   completely broken.

   On i386 the noapictimer option not only disables the local APIC
   timer, it also registers the CPUs for broadcasting via IPI on SMP
   systems.

   The x86-64 code uses the broadcast only when the local apic timer is
   active, i.e.  "noapictimer" is not on the command line.  This defeats
   the whole purpose of "noapictimer".  It should be there to make boxen
   work, where the local APIC timer actually has a hardware problem,
   e.g.  the nx6325.

   The current implementation of x86_64 only fixes the ACPI c-states
   related problem where the APIC timer stops in C3(2), nothing else.

   On nx6325 and other AMD X2 equipped systems which have the C1E
   enabled we run into the following:

   PIT keeps jiffies (and the system) running, but the local APIC timer
   interrupts can get out of sync due to this C1E effect.

   I don't think this is a critical problem, but it is wrong
   nevertheless.

   I think it's safe to revert the C1E patch and postpone the fix to the
   clock events conversion."

On further reflection, Thomas noted:

   "It's even worse than I thought on the first check:

    "noapictimer" on the command line of an SMP box prevents _ONLY_ the
    boot CPU apic timer from being used.  But the secondary CPU is still
    unconditionally setting up the APIC timer and uses the non
    calibrated variable calibration_result, which is of course 0, to
    setup the APIC timer.  Wreckage guaranteed."

so we'll just have to wait for the x86 merge to hopefully fix this up
for x86-64.

Tested-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26 15:43:41 -07:00
Thomas Gleixner
e66485d747 x86-64: Disable local APIC timer use on AMD systems with C1E
commit 3556ddfa92 titled

 [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E

solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64
X2) with C1E enabled:

When both cores go into idle at the same time, then the system switches
into C1E state, which is basically the same as C3. This stops the local
apic timer.

This was debugged right after the dyntick merge on i386 and despite the
patch title it fixes only the 32 bit path.

x86_64 is still missing this fix. It seems that mainline is not really
affected by this issue, as the PIT is running and keeps jiffies
incrementing, but that's just waiting for trouble.

-mm suffers from this problem due to the x86_64 high resolution timer
patches.

This is a quick and dirty port of the i386 code to x86_64.

I spent quite a time with Rafael to debug the -mm / hrt wreckage until
someone pointed us to this. I really had forgotten that we debugged this
half a year ago already.

Sigh, is it just me or is there something yelling arch/x86 into my ear?

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26 09:22:04 -07:00
Linus Torvalds
da8f153e51 Revert "x86_64: Quicklist support for x86_64"
This reverts commit 34feb2c83b.

Suresh Siddha points out that this one breaks the fundamental
requirement that you cannot free page table pages before the TLB caches
are flushed.  The quicklists do not give the same kinds of guarantees
that the mmu_gather structure does, at least not in NUMA configurations.

Requested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-21 12:09:41 -07:00
Andi Kleen
176df2457e x86_64: Zero extend all registers after ptrace in 32bit entry path.
Strictly it's only needed for eax.

It actually does a little more than strictly needed -- the other registers
are already zero extended.

Also remove the now unnecessary and non functional compat task check
in ptrace.

This is CVE-2007-4573

Found by Wojciech Purczynski

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-21 09:52:07 -07:00
H. Peter Anvin
91c4b8cb5a [acpi] Correct the decoding of video mode numbers in wakeup.S
wakeup.S looks at the video mode number from the setup header and
looks to see if it is a VESA mode.  Unfortunately, the decoding is
done incorrectly and it will attempt to frob the VESA BIOS for any
mode number 0x0200 or larger.  Correct this, and remove a bunch of #if
0'd code.

Massive thanks to Jeff Chua for reporting the bug, and suffering
though a large number of experiments in order to track this problem
down.

Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-09-20 11:06:58 -07:00
Linus Torvalds
dbe3ed1c07 x86-64: page faults from user mode are always user faults
Randy Dunlap noticed an interesting "crashme" behaviour on his dual
Prescott Xeon setup, where he gets page faults with the error code
having a zero "user" bit, but the register state points back to user
mode.

This may be a CPU microcode buglet triggered by some strange instruction
pattern that crashme generates, and loading a microcode update seems to
possibly have fixed it.

Regardless, we really should trust the register state more than the
error code, since it's really the register state that determines whether
we can actually send a signal, or whether we're in kernel mode and need
to oops/kill the process in the case of a page fault.

Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:37:14 -07:00
Andi Kleen
95b0867996 x86_64: Add missing mask operation to vdso
vdso vgetns() didn't mask the time source offset calculation, which
could lead to time problems with 32bit HPET.  Add the masking.

Thanks to Chuck Ebbert for tracking this down.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-12 09:28:06 -07:00
Rafael J. Wysocki
f3de4be9d5 PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION
Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit
296699de6b "Introduce CONFIG_SUSPEND for
suspend-to-Ram and standby" are incorrect, as they don't cover the facts that
(1) not all architectures support suspend and (2) SMP hibernation is only
possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31 01:42:22 -07:00
Len Brown
5a16eff86d Pull bugzilla-1641 into release branch 2007-08-24 22:19:05 -04:00
Len Brown
61ec7567db ACPI: boot correctly with "nosmp" or "maxcpus=0"
In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled.
However, in ACPI mode, these parameters didn't completely disable
the IO APIC initialization code and boot failed.

init/main.c:
	Disable the IO_APIC if "nosmp" or "maxcpus=0"
	undefine disable_ioapic_setup() when it doesn't apply.

i386:
	delete ioapic_setup(), it was a duplicate of parse_noapic()
	delete undefinition of disable_ioapic_setup()

x86_64:
	rename disable_ioapic_setup() to parse_noapic() to match i386
	define disable_ioapic_setup() in header to match i386

http://bugzilla.kernel.org/show_bug.cgi?id=1641

Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2007-08-21 00:33:35 -04:00
Andi Kleen
f0f12d85af x86_64: Check for .cfi_rel_offset in CFI probe
Very old 64bit binutils have .cfi_startproc/endproc, but
no .cfi_rel_offset. Check for .cfi_rel_offset too.

Cc: Jan Beulich <jbeulich@novell.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 10:25:25 -07:00
Andi Kleen
6e3515352b x86_64: Change PMDS invocation to single macro
Very old binutils (2.12.90...) seem to have trouble with newlines
in assembler macro invocation. They put them into the resulting
argument expansion. In this case this lead to a parse error because
a .rept expression ended up spread over multiple lines. Change the PMDS()
invocation to a single line.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 10:25:25 -07:00
Daniel Gollub
0328ecef90 x86_64: Fix to keep watchdog disabled by default for i386/x86_64
Fixed wrong expression which enabled watchdogs even if nmi_watchdog kernel
parameter wasn't set. This regression got slightly introduced with commit
b7471c6da9.

Introduced NMI_DISABLED (-1) which allows to switch the value of NMI_DEFAULT
without breaking the APIC NMI watchdog code (again).

Fixes:
   https://bugzilla.novell.com/show_bug.cgi?id=298084
   http://bugzilla.kernel.org/show_bug.cgi?id=7839
And likely some more nmi_watchdog=0 related issues.

Signed-off-by: Daniel Gollub <dgollub@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 10:25:25 -07:00
Andi Kleen
8154549cb8 x86_64: Fail dma_alloc_coherent on dma less devices
This should fix an oops with PCMCIA PATA devices

	http://bugzilla.kernel.org/show_bug.cgi?id=8424

This is not a full fix for the problem, but probably
still the right thing to do.

[ I'm almost certain it's *not* the right thing to do, but it avoids an
  oops, and I want comments from others on what the right thing would
  actually be..  I suspect we should just remove the use of dma_mask
  entirely in this function, and just use coherent_dma_mask.  - Linus ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-18 10:25:25 -07:00
Thomas Gleixner
cc75b92d11 genirq: mark io_apic level interrupts to avoid resend
Level type interrupts do not need to be resent.  It was also found that
some chipsets get confused in case of the resend.

Mark the ioapic level type interrupts as such to avoid the resend
functionality in the generic irq code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-12 11:05:45 -07:00
Petr Vandrovec
b8d3f2448b Do not replace whole memcpy in apply alternatives
apply_alternatives uses memcpy() to apply alternatives.  Which has the
unfortunate effect that while applying memcpy alternative to memcpy
itself it tries to overwrite itself with nops - which causes #UD fault
as it overwrites half of an instruction in copy loop, and from this
point on only possible outcome is triplefault and reboot.

So let's overwrite only first two instructions of memcpy - as long as
the main memcpy loop is not in first two bytes it will work fine.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-12 01:42:37 -07:00
Pete Zaitcev
1f1014896d x86_64: vdso.lds in arch/x86_64/vdso/.gitignore
Create arch/x86_64/vdso/.gitignore and put vdso.lds into it.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:14 -07:00
Zachary Amsden
08da5a2ca4 x86_64: Early segment setup for VT
VT is very picky about when it can enter execution.
Get all segments setup and get LDT and TR into valid state to allow
VT execution under VMware and KVM (untested).

This makes the boot decompression run under VT, which makes it several
orders of magnitude faster on 64-bit Intel hardware.

Before, I was seeing times up to a minute or more to decompress a 1.3MB kernel
on a very fast box.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:13 -07:00
Andi Kleen
d3f3c93469 x86: Disable CLFLUSH support again
It turns out CLFLUSH support is still not complete; we
flush the wrong pages.  Again disable it for the release.
Noticed by Jan Beulich who then also noticed a stupid typo later.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:13 -07:00
Murillo Fernandes Bernardes
f055a0619a x86_64: Calgary - Fix mis-handled PCI topology
Current code assumed that devices were directly connected to a Calgary
bridge, as it tried to get the iommu table directly from the parent bus
controller.

When we have another bridge between the Calgary/CalIOC2 bridge and the
device we should look upwards until we get to the top (Calgary/CalIOC2
bridge), where the iommu table resides.

Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:12 -07:00
dean gaudet
3320ad994a x86: Work around mmio config space quirk on AMD Fam10h
Some broken devices have been discovered to require %al/%ax/%eax registers
for MMIO config space accesses.  Modify mmconfig.c to use these registers
explicitly (rather than modify the global readb/writeb/etc inlines).

AK: also changed i386 to always use eax
AK: moved change to extended space probing to different patch
AK: reworked with inlines according to Linus' requirements.
AK: improve comments.

Signed-off-by: dean gaudet <dean@arctic.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:58:12 -07:00
Robin Holt
b291aa7a65 x86_64: fix HPET init race
I have had four seperate system lockups attributable to this exact problem
in two days of testing.  Instead of trying to handle all the weird end
cases and wrap, how about changing it to look for exactly what we appear
to want.

The following patch removes a couple races in setup_APIC_timer.  One occurs
when the HPET advances the COUNTER past the T0_CMP value between the time
the T0_CMP was originally read and when COUNTER is read.  This results in
a delay waiting for the counter to wrap.  The other results from the counter
wrapping.

This change takes a snapshot of T0_CMP at the beginning of the loop and
simply loops until T0_CMP has changed (a tick has happened).

<later>

I have one small concern about the patch.  I am not sure it meets the intent
as well as it should.  I think we are trying to match APIC timer interrupts up
with the hpet counter increment.  The event which appears to be disturbing
this loop in our test environment is the NMI watchdog.  What we believe has
been happening with the existing code is the setup_APIC_timer loop has read
the CMP value, and the NMI watchdog code fires for the first time.  This
results in a series of icache miss slowdowns and by the time we get back to
things it has wrapped.

I think this code is trying to get the CMP as close to the counter value as
possible.  If that is the intent, maybe we should really be testing against a
"window" around the CMP.  Something like COUNTER = CMP+/2.  It appears COUNTER
should get advanced every 89nSec (IIRC).  The above seems like an unreasonably
small window, but may be necessary.  Without documentation, I am not sure of
the original intent with this code.

In summary, this code fixes my boot hangs, but since I am not certain of the
intent of the existing code, I am not certain this has not introduced new bugs
or unexpected behaviors.

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: "Aaron Durbin" <adurbin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:47:39 -07:00
Josh Triplett
dd22a68228 x86_64: include asm/bugs.h in bugs.c for check_bugs prototype
C files should include the header files that prototype their functions.

Eliminates a sparse warning:
warning: symbol 'check_bugs' was not declared. Should it be static?

Signed-off-by: Josh Triplett <josh@kernel.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:42 -07:00
Andrew Morton
57d4810ea0 revert "x86, serial: convert legacy COM ports to platform devices"
Revert 7e92b4fc34.  It broke Sébastien Dugué's
machine and Jeff said (persuasively)

  This seems like it will break decades-long-working stuff, in favor of
  breaking new ground in our favorite area, "trusting the BIOS."

  It's just not worth it for serial ports, IMO.  Serial ports are something
  that just shouldn't break at this late stage in the game.  My new Intel
  platform boxes don't even have serial ports, so I question the value of
  messing with serial port probing even more...  because...  just wait a year,
  and your box won't have a serial port either!  :)

  I certainly don't object to the use of platform devices (or isa_driver),
  but the probe change seems questionable.  That's sorta analagous to
  rewriting the floppy driver probe routine.  Sure you could do it...  but why
  risk all that damage and go through debugging all over again?

  It seems clear from this report that we cannot, should not, trust BIOS for
  something (a) so simple and (b) that has been working for over a decade.

Much discussion ensued and we've decided to have another go at all of this.

Cc: Sébastien Dugué <sebastien.dugue@bull.net>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Sascha Sommer <saschasommer@freenet.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Stephane Eranian
a583f1b542 remove unused TIF_NOTIFY_RESUME flag
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures.  The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Randy Dunlap
e6b3322084 x86-64: Calgary: fix section mismatch warnings in tce
Fix section mismatch warnings:
these functions are called only from __init functions.

WARNING: vmlinux.o(.text+0x1861c): Section mismatch: reference to .init.text:free_bootmem (between 'free_tce_table' and 'build_tce_table')
WARNING: vmlinux.o(.text+0x187e5): Section mismatch: reference to .init.text:__alloc_bootmem_low (between 'alloc_tce_table' and 'kretprobe_trampoline_holder')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:37 -07:00
Alexey Dobriyan
4e950f6f01 Remove fs.h from mm.h
Remove fs.h from mm.h. For this,
 1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
 2) Add back fs.h or less bloated headers (err.h) to files that need it.

As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files
rebuilt down to 3444 (-12.3%).

Cross-compile tested without regressions on my two usual configs and (sigh):

alpha              arm-mx1ads        mips-bigsur          powerpc-ebony
alpha-allnoconfig  arm-neponset      mips-capcella        powerpc-g5
alpha-defconfig    arm-netwinder     mips-cobalt          powerpc-holly
alpha-up           arm-netx          mips-db1000          powerpc-iseries
arm                arm-ns9xxx        mips-db1100          powerpc-linkstation
arm-assabet        arm-omap_h2_1610  mips-db1200          powerpc-lite5200
arm-at91rm9200dk   arm-onearm        mips-db1500          powerpc-maple
arm-at91rm9200ek   arm-picotux200    mips-db1550          powerpc-mpc7448_hpc2
arm-at91sam9260ek  arm-pleb          mips-ddb5477         powerpc-mpc8272_ads
arm-at91sam9261ek  arm-pnx4008       mips-decstation      powerpc-mpc8313_rdb
arm-at91sam9263ek  arm-pxa255-idp    mips-e55             powerpc-mpc832x_mds
arm-at91sam9rlek   arm-realview      mips-emma2rh         powerpc-mpc832x_rdb
arm-ateb9200       arm-realview-smp  mips-excite          powerpc-mpc834x_itx
arm-badge4         arm-rpc           mips-fulong          powerpc-mpc834x_itxgp
arm-carmeva        arm-s3c2410       mips-ip22            powerpc-mpc834x_mds
arm-cerfcube       arm-shannon       mips-ip27            powerpc-mpc836x_mds
arm-clps7500       arm-shark         mips-ip32            powerpc-mpc8540_ads
arm-collie         arm-simpad        mips-jazz            powerpc-mpc8544_ds
arm-corgi          arm-spitz         mips-jmr3927         powerpc-mpc8560_ads
arm-csb337         arm-trizeps4      mips-malta           powerpc-mpc8568mds
arm-csb637         arm-versatile     mips-mipssim         powerpc-mpc85xx_cds
arm-ebsa110        i386              mips-mpc30x          powerpc-mpc8641_hpcn
arm-edb7211        i386-allnoconfig  mips-msp71xx         powerpc-mpc866_ads
arm-em_x270        i386-defconfig    mips-ocelot          powerpc-mpc885_ads
arm-ep93xx         i386-up           mips-pb1100          powerpc-pasemi
arm-footbridge     ia64              mips-pb1500          powerpc-pmac32
arm-fortunet       ia64-allnoconfig  mips-pb1550          powerpc-ppc64
arm-h3600          ia64-bigsur       mips-pnx8550-jbs     powerpc-prpmc2800
arm-h7201          ia64-defconfig    mips-pnx8550-stb810  powerpc-ps3
arm-h7202          ia64-gensparse    mips-qemu            powerpc-pseries
arm-hackkit        ia64-sim          mips-rbhma4200       powerpc-up
arm-integrator     ia64-sn2          mips-rbhma4500       s390
arm-iop13xx        ia64-tiger        mips-rm200           s390-allnoconfig
arm-iop32x         ia64-up           mips-sb1250-swarm    s390-defconfig
arm-iop33x         ia64-zx1          mips-sead            s390-up
arm-ixp2000        m68k              mips-tb0219          sparc
arm-ixp23xx        m68k-amiga        mips-tb0226          sparc-allnoconfig
arm-ixp4xx         m68k-apollo       mips-tb0287          sparc-defconfig
arm-jornada720     m68k-atari        mips-workpad         sparc-up
arm-kafa           m68k-bvme6000     mips-wrppmc          sparc64
arm-kb9202         m68k-hp300        mips-yosemite        sparc64-allnoconfig
arm-ks8695         m68k-mac          parisc               sparc64-defconfig
arm-lart           m68k-mvme147      parisc-allnoconfig   sparc64-up
arm-lpd270         m68k-mvme16x      parisc-defconfig     um-x86_64
arm-lpd7a400       m68k-q40          parisc-up            x86_64
arm-lpd7a404       m68k-sun3         powerpc              x86_64-allnoconfig
arm-lubbock        m68k-sun3x        powerpc-cell         x86_64-defconfig
arm-lusl7200       mips              powerpc-celleb       x86_64-up
arm-mainstone      mips-atlas        powerpc-chrp32

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-29 17:09:29 -07:00
Len Brown
673d5b43da ACPI: restore CONFIG_ACPI_SLEEP
Restore the 2.6.22 CONFIG_ACPI_SLEEP build option, but now shadowing the
new CONFIG_PM_SLEEP option.

Signed-off-by: Len Brown <len.brown@intel.com>
[ Modified to work with the PM config setup changes. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-29 16:53:59 -07:00
Rafael J. Wysocki
b0cb1a19d0 Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION
Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-29 16:45:38 -07:00
Tony Luck
7a6c813594 [IA64] Fix build failure in fs/quota.c
b716395e2b added code to handle
a compatability issue with 32bit quota tools, but the new compat
routines are only needed when CONFIG_COMPAT=y (and with this set
to 'n' there are compilation problems since some new typedefs are
not visible).

Reported by Doug Chapman.  Fix tuned by a cast of thousands (Andi,
Andreas, Arthur, HPA, Willy)

Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-07-27 15:40:13 -07:00
Linus Torvalds
602033ed59 Revert most of "x86: Fix alternatives and kprobes to remap write-protected kernel text"
This reverts most of commit 19d36ccdc3.

The way to DEBUG_RODATA interactions with KPROBES and CPU hotplug is to
just not mark the text as being write-protected in the first place.
Both of those facilities depend on rewriting instructions.

Having "helpful" debug facilities that just cause more problem is not
being helpful.  It just adds complexity and bugs. Not worth it.

Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 12:07:21 -07:00
Thomas Gleixner
8ec3cf7d29 x86_64: cleanup tsc.c merge artifact
tsc_unstable is declared twice.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:20 -07:00
Sam Ravnborg
252c01dc27 x86_64: fix section mismatch warnings in tce
Fix the following two section mismatch warnings:

WARNING: vmlinux.o(.text+0x1ce84): Section mismatch: reference to .init.text:free_bootmem (between 'free_tce_table' and 'build_tce_table')
WARNING: vmlinux.o(.text+0x1d04d): Section mismatch: reference to .init.text:__alloc_bootmem_low (between 'alloc_tce_table' and 'kretprobe_trampoline_holder')

In both cases the functions was used only from __init
context so mark them __init.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:19 -07:00
Roland McGrath
2ebc3cc920 x86_64: fix arch_vma_name
The function arch_vma_name() is declared weak and thus it was
not noticed that x86_64 had two almost identical implementations.

It was introduced in syscall32.c by: c633090e31
It was introduced in mm/init.c by: 2aae950b21

Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:18 -07:00
Len Brown
e8b2fd0122 ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source
As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.

For ia64, just add a few stubs in anticipation of future
S3 or S4 support.

Signed-off-by: Len Brown <len.brown@intel.com>
2007-07-25 01:29:39 -04:00
Andi Kleen
037e20a3c5 x86_64: Rename CF Makefile variable in vdso
This avoids a conflict with sparse builds.

Reported by Alexey Dobriyan, fix suggested by Al Viro

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 12:43:28 -07:00
Andi Kleen
b5d009ca6b x86_64: Remove outdated comment in boot decompressor Makefile
64bit code in there now since some time.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Andi Kleen
92417df076 x86_64: Squash initial_code modpost warnings
Get rid of warnings like

WARNING: vmlinux.o(.bootstrap.text+0x1a8): Section mismatch: reference to .init.text:x86_64_start_kernel (between 'initial_code' and 'init_rsp')

- Move initialization code into .text.head like i386 because modpost knows about this already
- Mark initial_code .initdata

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Sam Ravnborg
dec2e6b7aa x86_64: fix section mismatch warning in init.c
Fix following warning:
WARNING: vmlinux.o(.text+0x188ea): Section mismatch: reference to .init.text:__alloc_bootmem_core (between 'alloc_bootmem_high_node' and 'get_gate_vma')

alloc_bootmem_high_node() is only used from __init scope so declare it __init.
And in addition declare the weak variant __init too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Sam Ravnborg
7aa6ec56b9 x86_64: fix section mismatch warning in hpet.c
Fix following warnings:
WARNING: vmlinux.o(.text+0x945e): Section mismatch: reference to .init.text:__set_fixmap (between 'hpet_arch_init' and 'hpet_mask_rtc_irq_bit')
WARNING: vmlinux.o(.text+0x9474): Section mismatch: reference to .init.text:__set_fixmap (between 'hpet_arch_init' and 'hpet_mask_rtc_irq_bit')

hpet_arch_init is only used from __init context so mark it __init.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Andi Kleen
0bd8acd1a7 x86_64: Set K8 CPUID flag for K8/Fam10h/Fam11h
Previously this flag was only used on 32bit, but some shared code can use
it now.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:38 -07:00
Andi Kleen
8f4e956b31 x86: Stop MCEs and NMIs during code patching
When a machine check or NMI occurs while multiple byte code is patched
the CPU could theoretically see an inconsistent instruction and crash.
Prevent this by temporarily disabling MCEs and returning early in the
NMI handler.

Based on discussion with Mathieu Desnoyers.

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Andi Kleen
19d36ccdc3 x86: Fix alternatives and kprobes to remap write-protected kernel text
Reenable kprobes and alternative patching when the kernel text is write
protected by DEBUG_RODATA

Add a general utility function to change write protected text.  The new
function remaps the code using vmap to write it and takes care of CPU
synchronization.  It also does CLFLUSH to make icache recovery faster.

There are some limitations on when the function can be used, see the
comment.

This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi byte patching now.

Contains bug fixes from Zach Amsden and suggestions from Mathieu
Desnoyers.

Cc: Jan Beulich <jbeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Glauber de Oliveira Costa
f51c94528a x86_64: Use read and write crX in .c files
This patch uses the read and write functions provided at system.h
for control registers instead of writting raw assembly over and
over again in .c files. Functions to manipulate cr2 and cr8 were
provided, as they were lacking.

Also, removed some extra space after closing brackets

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Masoud Asgharifard Sharbiani
abd4f7505b x86: i386-show-unhandled-signals-v3
This patch makes the i386 behave the same way that x86_64 does when a
segfault happens.  A line gets printed to the kernel log so that tools
that need to check for failures can behave more uniformly between
debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
/proc/sys/debug/exception-trace)

Also, all of the lines being printed are now using printk_ratelimit() to
deny the ability of DoS from a local user with a program like the
following:

main()
{
       while (1)
               if (!fork()) *(int *)0 = 0;
}

This new revision also includes the fix that Andrew did which got rid of
new sysctl that was added to the system in earlier versions of this.
Also, 'show-unhandled-signals' sysctl has been renamed back to the old
'exception-trace' to avoid breakage of people's scripts.

AK: Enabling by default for i386 will be likely controversal, but let's see what happens
AK: Really folks, before complaining just fix your segfaults
AK: I bet this will find a lot of silent issues

Signed-off-by: Masoud Sharbiani <masouds@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
[ Personally, I've found the complaints useful on x86-64, so I'm all for
  this. That said, I wonder if we could do it more prettily..   -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:03:37 -07:00
Muli Ben-Yehuda
08f1c192c3 x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.

This lays the groundwork for having other users of sysdata, such as
the PCI domains work.

The Calgary bits are tested, the NUMA bits just look ok.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Joachim Deguara
7557244ba2 x86_64: make k8topology multi-core aware
This makes k8topology multicore aware instead of limited to signle- and
dual-core CPUs.  It uses the CPUID to be more future proof.

Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Jan Beulich
81e02d19b9 x86_64: remove __smp_alt* sections
Leftovers from the removal of the more general (but abandoned) SMP
alternatives.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Dan Aloni
5a3ece79b2 x86_64: arch/x86_64/kernel/e820.c lower printk severity
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Dan Aloni
753811dc82 x86_64: arch/x86_64/kernel/aperture.c lower printk severity
Users that use kernel log filtering (e.g.  via syslogd or a proprietry method)
wouldn't like to see warning prints that are not really warnings.

Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Yinghai Lu
f2cf8e085c x86_64: move iommu declaration from proto to iommu.h
[akpm@linux-foundation.org: build fix]
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
David Rientjes
1c05f093c0 x86_64: disable srat when numa emulation succeeds
When NUMA emulation succeeds, acpi_numa needs to be set to -1 so that
srat_disabled() will always return true.  We won't be calling
acpi_scan_nodes() or registering the true nodes we've found.

[hugh@veritas.com: Fix x86_64 CONFIG_NUMA_EMU build: acpi_numa needs CONFIG_ACPI_NUMA]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
David Rientjes
a7e96629ef x86_64: fix e820_hole_size based on address ranges
e820_hole_size() now uses the newly extracted helper function,
e820_find_active_region(), to determine the size of usable RAM in a range of
PFN's.

This was previously broken because of two reasons:

 - The start and end PFN's of each e820 entry were not properly rounded
   prior to excluding those entries in the range, and

 - Entries smaller than a page were not properly excluded from being
   accumulated.

This resulted in emulated nodes being incorrectly mapped to ranges that
were completely reserved and not candidates for being registered as
active ranges.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:14 -07:00
Yinghai Lu
bc2cea6a34 x86_64: disable the GART in shutdown
For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G
RAM installed.  when using kexec to load second kernel.  In the second
kernel, when mem is allocated for GART, it will do the memset for clear, it
will cause restart, because some device still used that for dma.  solution
will be:

in second kernel: disable that at first before we try to allocate mem for
it.  or in the first kernel: do disable that before shutdown.
Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
Andi also point out need to consider to AGP enable but mem less 4G case
too.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:13 -07:00
Yinghai Lu
1048fa5281 x86_64: change _map_single to static in pci_gart.c etc
This function is called via dma_ops->.., so change it to static

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:13 -07:00
Dan Aloni
2d4fa2f665 x86_64: lower printk severity
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Thomas Gleixner
28318daf79 x86_64: use the global PIT lock
Replace the pcspkr private PIT lock by the global PIT lock to serialize the
PIT access all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Will Schmidt
021daae2c2 x86_64: During VM oom condition, kill all threads in process group
During a VM oom condition, kill all threads in the process group.

We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory condition.

Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or otherwise handled, and makes it very obvious that
something has gone wrong.

This change allows the entire process group to be taken down, rather than just
the one thread.

Signed-off-by: Will <will_schmidt@vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Glauber de Oliveira Costa
99253b8e73 x86_64: Move functions declarations to header file
Some interrupt entry points are currently defined in i8259.c They probably
belong in a header.  Right now, their only user is init_IRQ, justifying
their declaration in-file.  But when virtualization comes in, we may be
interested in using that functions in late initializations.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Muli Ben-Yehuda
3cc39bda26 x86_64: Calgary - fold in redundant functions
After the bitmap changes we can get rid of the unlocked versions of
calgary_unmap_sg and iommu_free. Fold __calgary_unmap_sg and
__iommu_free into their calgary_unmap_sg and iommu_free, respectively.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Yinghai Lu
0b11e1c6a6 x86_64: Calgary - change _map_single, etc to static
there function are called via dma_ops->.., so change them to static

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:12 -07:00
Muli Ben-Yehuda
820a149705 x86_64: Calgary - tighten up the bitmap locking
Currently the IOMMU table's lock protects both the bitmap and access
to the hardware's TCE table. Access to the TCE table is synchronized
through the bitmap; therefore, only hold the lock while modifying the
bitmap. This gives a yummy 10-15% reduction in CPU utilization for
netperf on a large SMP machine.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
7354b07595 x86_64: Calgary - fix few style problems pointed out by checkpatch.pl
No actual code was harmed in the production of this patch.

Thanks to Andrew Morton <akpm@linux-foundation.org> for telling me
about checkpatch.pl.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
12de257b83 x86_64: tidy up debug printks
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
e8f2041471 x86_64: only reserve the first 1MB of IO space for CalIOC2
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
8bcf77055c x86_64: tabify and trim trailing whitespace
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Guillaume Thouvenin
05b48ea61c x86_64: cleanup of unneeded macros
Cleanup unneeded macros used for register space address calculation.
Now we are using the EBDA to find the space address.

Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
07877cf6fd x86_64: reserve TCEs with the same address as MEM regions
This works around a bug where DMAs that have the same addresses as
some MEM regions do not go through. Not clear yet if this is due to a
mis-configuration or something deeper.

[akpm@linux-foundation.org: coding style fixlet]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
ddbd41b4e7 x86_64: grab PLSSR too when a DMA error occurs
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
8cb32dc748 x86_64: make dump_error_regs a chip op
Provide seperate versions for Calgary and CalIOC2

Also print out the PCIe Root Complex Status on CalIOC2 errors

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
00be3fa42f x86_64: implement CalIOC2 TCE cache flush sequence
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
c38601084b x86_64: add chip_ops and a quirk function for CalIOC2
[akpm@linux-foundation.org>: make calioc2_chip_ops static]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
8a244590ca x86_64: introduce CalIOC2 support
CalIOC2 is a PCI-e implementation of the Calgary logic. Most of the
programming details are the same, but some differ, e.g., TCE cache
flush. This patch introduces CalIOC2 support - detection and various
support routines. It's not expected to work yet (but will with
follow-on patches).

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
35b6dfa087 x86_64: abstract how we find the iommu_table for a device
... in preparation for doing it differently for CalIOC2.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
ff297b8c08 x86_64: introduce chipset specific ops
Calgary and CalIOC2 share most of the same logic. Introduce struct
cal_chipset_ops for quirks and tce flush logic which are

[akpm@linux-foundation.org: make calgary_chip_ops static]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
b8d2ea1b87 x86_64: introduce handle_quirks() for various chipset quirks
Move the aic94xx split completion timeout handling there.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
9882234bf2 x86_64: update copyright notice
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Muli Ben-Yehuda
a2b663f672 x86_64: generalize calgary_increase_split_completion_timeout
... will be used by CalIOC2 later

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:11 -07:00
Eric W. Biederman
ef3e28c5b9 x86_64: check remote IRR bit before migrating level triggered irq
On x86_64 kernel, level triggered irq migration gets initiated in the
context of that interrupt(after executing the irq handler) and following
steps are followed to do the irq migration.

1. mask IOAPIC RTE entry;     // write to IOAPIC RTE
2. EOI;                       // processor EOI write
3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and
                              // and interrupt vector due to per cpu vector
                              // allocation.
4. unmask IOAPIC RTE entry;   // write to IOAPIC RTE

Because of the per cpu vector allocation in x86_64 kernels, when the irq
migrates to a different cpu, new vector(corresponding to the new cpu) will
get allocated.

An EOI write to local APIC has a side effect of generating an EOI write for
level trigger interrupts (normally this is a broadcast to all IOAPICs).
The EOI broadcast generated as a side effect of EOI write to processor may
be delayed while the other IOAPIC writes (step 3 and 4) can go through.

Normally, the EOI generated by local APIC for level trigger interrupt
contains vector number.  The IOAPIC will take this vector number and search
the IOAPIC RTE entries for an entry with matching vector number and clear
the remote IRR bit (indicate EOI).  However, if the vector number is
changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI
is received later.  This will cause the remote IRR to get stuck causing the
interrupt hang (no more interrupt from this RTE).

Current x86_64 kernel assumes that remote IRR bit is cleared by the time
IOAPIC RTE is reprogrammed.  Fix this assumption by checking for remote IRR
bit and if it still set, delay the irq migration to the next interrupt
arrival event(hopefully, next time remote IRR bit will get cleared before
the IOAPIC RTE is reprogrammed).

Initial analysis and patch from Nanhai.

Clean up patch from Suresh.

Rewritten to be less intrusive, and to contain a big fat comment by Eric.

[akpm@linux-foundation.org: fix comments]
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nanhai Zou <nanhai.zou@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Keith Packard <keith.packard@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Venki Pallipadi
22293e5806 x86: round_jiffies() for i386 and x86-64 non-critical/corrected MCE polling
This helps to reduce the frequency at which the CPU must be taken out of a
lower-power state.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Tim Hockin <thockin@hockin.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Alan Stern
bb1995d52b x86: Make Alt-SysRq-p display the debug register contents
This patch (as921) adds code to the show_regs() routine in i386 and x86_64
to print the contents of the debug registers along with all the others.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Nigel Cunningham
44bf4cea43 x86: PM_TRACE support
Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Tim Hockin
bd78432c8f x86_64: mcelog tolerant level cleanup
Background:
 The MCE handler has several paths that it can take, depending on various
 conditions of the MCE status and the value of the 'tolerant' knob.  The
 exact semantics are not well defined and the code is a bit twisty.

Description:
 This patch makes the MCE handler's behavior more clear by documenting the
 behavior for various 'tolerant' levels.  It also fixes or enhances
 several small things in the handler.  Specifically:
     * If RIPV is set it is not safe to restart, so set the 'no way out'
       flag rather than the 'kill it' flag.
     * Don't panic() on correctable MCEs.
     * If the _OVER bit is set *and* the _UC bit is set (meaning possibly
       dropped uncorrected errors), set the 'no way out' flag.
     * Use EIPV for testing whether an app can be killed (SIGBUS) rather
       than RIPV.  According to docs, EIPV indicates that the error is
       related to the IP, while RIPV simply means the IP is valid to
       restart from.
     * Don't clear the MCi_STATUS registers until after the panic() path.
       This leaves the status bits set after the panic() so clever BIOSes
       can find them (and dumb BIOSes can do nothing).

 This patch also calls nonseekable_open() in mce_open (as suggested by akpm).

Result:
 Tolerant levels behave almost identically to how they always have, but
 not it's well defined.  There's a slightly higher chance of panic()ing
 when multiple errors happen (a good thing, IMHO).  If you take an MBE and
 panic(), the error status bits are not cleared.

Alternatives:
 None.

Testing:
 I used software to inject correctable and uncorrectable errors.  With
 tolerant = 3, the system usually survives.  With tolerant = 2, the system
 usually panic()s (PCC) but not always.  With tolerant = 1, the system
 always panic()s.  When the system panic()s, the BIOS is able to detect
 that the cause of death was an MC4.  I was not able to reproduce the
 case of a non-PCC error in userspace, with EIPV, with (tolerant < 3).
 That will be rare at best.

Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Tim Hockin
e02e68d31e x86_64: support poll() on /dev/mcelog
Background:
 /dev/mcelog is typically polled manually.  This is less than optimal for
 situations where accurate accounting of MCEs is important.  Calling
 poll() on /dev/mcelog does not work.

Description:
 This patch adds support for poll() to /dev/mcelog.  This results in
 immediate wakeup of user apps whenever the poller finds MCEs.  Because
 the exception handler can not take any locks, it can not call the wakeup
 itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
 caught at the next return from interrupt or exit from idle, calling the
 mce_user_notify() routine.  This patch also disables the "fake panic"
 path of the mce_panic(), because it results in printk()s in the exception
 handler and crashy systems.

 This patch also does some small cleanup for essentially unused variables,
 and moves the user notification into the body of the poller, so it is
 only called once per poll, rather than once per CPU.

Result:
 Applications can now poll() on /dev/mcelog.  When an error is logged
 (whether through the poller or through an exception) the applications are
 woken up promptly.  This should not affect any previous behaviors.  If no
 MCEs are being logged, there is no overhead.

Alternatives:
 I considered simply supporting poll() through the poller and not using
 TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
 happening and the user application being notified is *the*most* critical
 window for us.  Many uncorrectable errors can be logged to the network if
 given a chance.

 I also considered doing the MCE poll directly from the idle notifier, but
 decided that was overkill.

Testing:
 I used an error-injecting DIMM to create lots of correctable DRAM errors
 and verified that my user app is woken up in sync with the polling interval.
 I also used the northbridge to inject uncorrectable ECC errors, and
 verified (printk() to the rescue) that the notify routine is called and the
 user app does wake up.  I built with PREEMPT on and off, and verified
 that my machine survives MCEs.

[wli@holomorphy.com: build fix]
Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Tim Hockin
f528e7ba28 x86_64: O_EXCL on /dev/mcelog
Background:
 /dev/mcelog is a clear-on-read interface.  It is currently possible for
 multiple users to open and read() the device.  Users are protected from
 each other during any one read, but not across reads.

Description:
 This patch adds support for O_EXCL to /dev/mcelog.  If a user opens the
 device with O_EXCL, no other user may open the device (EBUSY).  Likewise,
 any user that tries to open the device with O_EXCL while another user has
 the device will fail (EBUSY).

Result:
 Applications can get exclusive access to /dev/mcelog.  Applications that
 do not care will be unchanged.

Alternatives:
 A simpler choice would be to only allow one open() at all, regardless of
 O_EXCL.

Testing:
 I wrote an application that opens /dev/mcelog with O_EXCL and observed
 that any other app that tried to open /dev/mcelog would fail until the
 exclusive app had closed the device.

Caveats:
 None.

Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
David Rientjes
08705b89ec x86_64: fake apicid_to_node mapping for fake numa
When we are in the emulated NUMA case, we need to make sure that all existing
apicid_to_node mappings that point to real node ID's now point to the
equivalent fake node ID's.

If we simply iterate over all apicid_to_node[] members for each node, we risk
remapping an entry if it shares a node ID with a real node.  Since apicid's
may not be consecutive, we're forced to create an automatic array of
apicid_to_node mappings and then copy it over once we have finished remapping
fake to real nodes.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
David Rientjes
3484d79813 x86_64: fake pxm-to-node mapping for fake numa
For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.

When NUMA emulation has successfully setup fake nodes on the system, a new
function, acpi_fake_nodes() is called.  This function determines the proximity
domain (_PXM) for each true node found on the system.  It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address.  The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.

If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM.  Otherwise, we use REMOTE_DISTANCE.

PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).

Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
David Rientjes
3af044e0f8 x86_64: extract helper function from e820_register_active_regions
The logic in e820_find_active_regions() for determining the true active
regions for an e820 entry given a range of PFN's is needed for
e820_hole_size() as well.

e820_hole_size() is called from the NUMA emulation code to determine the
reserved area within an address range on a per-node basis.  Its logic should
duplicate that of finding active regions in an e820 entry because these are
the only true ranges we may register anyway.

[akpm@linux-foundation.org: cleanup]
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
Christoph Lameter
34feb2c83b x86_64: Quicklist support for x86_64
This adds caching of pgds and puds, pmds, pte.  That way we can avoid costly
zeroing and initialization of special mappings in the pgd.

A second quicklist is useful to separate out PGD handling.  We can carry the
initialized pgds over to the next process needing them.

Also clean up the pgd_list handling to use regular list macros.  There is no
need anymore to avoid the lru field.

Move the add/removal of the pgds to the pgdlist into the constructor /
destructor.  That way the implementation is congruent with i386.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Jan Beulich
d567b6a955 x86_64: remove unused variable maxcpus
.. and adjust documentation to properly reflect options that are
x86-64 specific.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Jan Beulich
74a1ddc597 x86_64: minor exception trace variables cleanup
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Jan Beulich
cdc1793ef7 x86_64: ia32entry adjustments
Consolidate the three 32-bit system call entry points so that they all
treat registers in similar ways.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Thomas Gleixner
2618f86e00 x86_64: time.c white space wreckage cleanup
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Thomas Gleixner
6935d1f922 x86_64: apic.c coding style janitor work
Fix coding style, white space wreckage and remove unused code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Thomas Gleixner
aec8148fda x86_64: fiuxp pt_reqs leftovers
The hpet_rtc_interrupt handler still uses pt_regs. Fix it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Thomas Gleixner
f40f31bfe1 x86_64: Fix APIC typo
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:09 -07:00
Thomas Gleixner
7ff984785c x86_64: Remove dead code and other janitor work in tsc.c
Remove unused code and variables and do some codingstyle / whitespace
cleanups while at it.

Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Thomas Gleixner
ef81ab2c72 x86_64: Use generic xtime init
xtime can be initialized including the cmos update from the generic
timekeeping code. Remove the arch specific implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Thomas Gleixner
af74522ab7 x86_64: use generic cmos update
Use the generic cmos update function in kernel/time/ntp.c

Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Thomas Gleixner
8180a55028 x86_64: hpet tsc calibration fix broken smi detection logic
The current SMI detection logic in read_hpet_tsc() makes sure,
that when a SMI happens between the read of the HPET counter and
the read of the TSC, this wrong value is used for TSC calibration.

This is not the intention of the function. The comparison must ensure,
that we do _NOT_ use such a value.

Fix the check to use calibration values where delta of the two TSC reads
is smaller than a reasonable threshold.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Andi Kleen
d9c6d69145 x86_64: Don't use softirq safe locks in smp_call_function
It is not fully softirq safe anyways.

Can't do a WARN_ON unfortunately because it could trigger in the
panic case.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Andi Kleen
67cddd9479 i386: Add L3 cache support to AMD CPUID4 emulation
With that an L3 cache is correctly reported in the cache information in /sys

With fixes from Andreas Herrmann and Dean Gaudet and Joachim Deguara

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Andi Kleen
2aae950b21 x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
This implements new vDSO for x86-64.  The concept is similar
to the existing vDSOs on i386 and PPC.  x86-64 has had static
vsyscalls before,  but these are not flexible enough anymore.

A vDSO is a ELF shared library supplied by the kernel that is mapped into
user address space.  The vDSO mapping is randomized for each process
for security reasons.

Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.

The vdso can be disabled with vdso=0

It currently includes a new gettimeofday implemention and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall.  clock_gettime is significantly faster
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.

The new calls are generally faster than the old vsyscall.

Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range previously causes
overhead e.g. for Xen because it has to create special page tables for it)

Weak points:
- glibc support still to be written

The VM interface is partly based on Ingo Molnar's i386 version.

Includes compile fix from Joachim Deguara

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
Andi Kleen
5b74e3abb3 x86_64: Use string instruction memcpy/memset on AMD Fam10
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
David Rientjes
ae2c6dcf90 x86_64: various cleanups in NUMA scan node
In acpi_scan_nodes(), we immediately return -1 if acpi_numa <= 0, meaning
we haven't detected any underlying ACPI topology or we have explicitly
disabled its use from the command-line with numa=noacpi.

acpi_table_print_srat_entry() and acpi_table_parse_srat() are only
referenced within drivers/acpi/numa.c, so we can mark them as static and
remove their prototypes from the header file.

Likewise, pxm_to_node_map[] and node_to_pxm_map[] are only used within
drivers/acpi/numa.c, so we mark them as static and remove their externs
from the header file.

The automatic 'result' variable is unused in acpi_numa_init(), so it's
removed.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:08 -07:00
David Rientjes
a2e212dae5 x86_64: Use LOCAL_DISTANCE and REMOTE_DISTANCE in x86_64 ACPI code
Use LOCAL_DISTANCE and  REMOTE_DISTANCE in x86_64 ACPI code

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:07 -07:00
Andi Kleen
78b599aed6 x86_64: Don't rely on a unique IO-APIC ID
Linux 64bit only uses the IO-APIC ID as an internal cookie. In the future
there could be some cases where the IO-APIC IDs are not unique because
they share an 8 bit space with CPUs and if there are enough CPUs
it is difficult to get them that. But Linux needs the io apic ID
internally for its data structures. Assign unique IO APIC ids on
table parsing.

TBD do for 32bit too

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:07 -07:00
Andi Kleen
65d2f0bc65 x86: Always flush pages in change_page_attr
Fix a bug introduced with the CLFLUSH changes: we must always flush pages
changed in cpa(), not just when they are reverted.

Reenable CLFLUSH usage with that now (it was temporarily disabled
for .22)

Add some BUG_ONs

Contains fixes from  Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:07 -07:00
Andi Kleen
5652559761 x86_64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:07 -07:00
Matthew Wilcox
12795067cf Update .gitignore for arch/i386/boot
With the new setup code, we generate a couple more files

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
[ .. and do the same for x86-64 - Alexey ]
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 14:32:38 -07:00
Dave Jiang
c0d1217202 drivers/edac: add new nmi rescan
Provides a way for NMI reported errors on x86 to notify the EDAC
subsystem pending ECC errors by writing to a software state variable.

Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select error detection
mode via module parameter and eliminate the kernel compile option.
Please review/test. Thx!

Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:53 -07:00
Rusty Russell
d7e28ffe6c lguest: the host code
This is the code for the "lg.ko" module, which allows lguest guests to
be launched.

[akpm@linux-foundation.org: update for futex-new-private-futexes]
[akpm@linux-foundation.org: build fix]
[jmorris@namei.org: lguest: use hrtimers]
[akpm@linux-foundation.org: x86_64 build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:52 -07:00
Roland McGrath
2e1d5b8f24 x86_64: Put allocated ELF notes in read-only data segment
This changes the x86_64 linker script to use the asm-generic NOTES macro so
that ELF note sections with SHF_ALLOC set are linked into the kernel image
along with other read-only data.  The PT_NOTE also points to their location.

This paves the way for putting useful build-time information into ELF notes
that can be found easily later in a kernel memory dump.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:47 -07:00
Ollie Wild
b6a2fea393 mm: variable length argument support
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.

We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space.  Once the binfmt code runs and figures out
where the stack should be, we move it downwards.

It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.

[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:45 -07:00
Fenghua Yu
f34e3b61f2 use the new percpu interface for shared data
Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute.  Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.

This will seperate the percpu data which is referenced frequently by other
cpus from the local only percpu data.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:45 -07:00
Fenghua Yu
5fb7dc37dc define new percpu interface for shared data
per cpu data section contains two types of data.  One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus.  In the current kernel, these two sets are
not clearely separated out.  This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.

One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end.  Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.

This patch:

Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
Pavel Machek
77afcf78a2 PM: Integrate beeping flag with existing acpi_sleep flags
Move "debug during resume from s2ram" into the variable we already use
for real-mode flags to simplify code. It also closes nasty trap for
the user in acpi_sleep_setup; order of parameters actually mattered there,
acpi_sleep=s3_bios,s3_mode doing something different from
acpi_sleep=s3_mode,s3_bios.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:43 -07:00
Nigel Cunningham
5a60d6235c PM: Optional beeping during resume from suspend to RAM
Add a feature allowing the user to make the system beep during a resume from
suspend to RAM, on x86_64 and i386.

This is useful for the users with broken resume from RAM, so that they can
verify if the control reaches the kernel after a wake-up event.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:43 -07:00
Nick Piggin
83c54070ee mm: fault feedback #2
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer.  This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).

[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Roland McGrath
29eb51101c Handle bogus %cs selector in single-step instruction decoding
The code for LDT segment selectors was not robust in the face of a bogus
selector set in %cs via ptrace before the single-step was done.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18 12:09:01 -07:00
Linus Torvalds
d756d10e24 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: extent macros cleanup
  Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions.
  ext4: remove extra IS_RDONLY() check
  ext4: Use is_power_of_2()
  Use zero_user_page() in ext4 where possible
  ext4: Remove 65000 subdirectory limit
  ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields 
  ext4: Add nanosecond timestamps
  jbd2: Move jbd2-debug file to debugfs
  jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG
  ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices
  ext4: Make extents code sanely handle on-disk corruption
  ext4: copy i_flags to inode flags on write
  ext4: Enable extents by default
  Change on-disk format to support 2^15 uninitialized extents
  write support for preallocated blocks
  fallocate support in ext4
  sys_fallocate() implementation on i386, x86_64 and powerpc
2007-07-18 10:32:00 -07:00
Jeremy Fitzhardinge
b536b4b962 xen: use the hvc console infrastructure for Xen console
Implement a Xen back-end for hvc console.

* * *
Add early printk support via hvc console, enable using
"earlyprintk=xen" on the kernel command line.

From: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Olof Johansson <olof@lixom.net>
2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge
86313c488a usermodehelper: Tidy up waiting
Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that.  I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
2007-07-18 08:47:40 -07:00
Amit Arora
97ac73506c sys_fallocate() implementation on i386, x86_64 and powerpc
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17 21:42:44 -04:00
Linus Torvalds
49c13b51a1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits)
  KVM: Use CPU_DYING for disabling virtualization
  KVM: Tune hotplug/suspend IPIs
  KVM: Keep track of which cpus have virtualization enabled
  SMP: Allow smp_call_function_single() to current cpu
  i386: Allow smp_call_function_single() to current cpu
  x86_64: Allow smp_call_function_single() to current cpu
  HOTPLUG: Adapt thermal throttle to CPU_DYING
  HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
  HOTPLUG: Add CPU_DYING notifier
  KVM: Clean up #includes
  KVM: Remove kvmfs in favor of the anonymous inodes source
  KVM: SVM: Reliably detect if SVM was disabled by BIOS
  KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
  KVM: MMU: Fix Wrong tlb flush order
  KVM: VMX: Reinitialize the real-mode tss when entering real mode
  KVM: Avoid useless memory write when possible
  KVM: Fix x86 emulator writeback
  KVM: Add support for in-kernel pio handlers
  KVM: VMX: Fix interrupt checking on lightweight exit
  KVM: Adds support for in-kernel mmio handlers
  ...
2007-07-17 11:50:26 -07:00
Andrew Morton
567f3e422a x86_64: speedup touch_nmi_watchdog
Avoid dirtying remote cpu's memory if it already has the correct value.

Cc: Andi Kleen <ak@suse.de>
Cc: Konrad Rzeszutek <konrad@darnok.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:04 -07:00
Konrad Rzeszutek
1c978b935e Inhibit NMI watchdog when Alt-SysRq-T operation is underway
On large memory configuration with not so fast CPUs the NMI watchdog is
triggered when memory addresses are being gathered and printed.  The code
paths for Alt-SysRq-t are sprinkled with touch_nmi_watchdog in various
places but not in this routine (or in the loop that utilizes this
function).  The patch has been tested for regression on large CPU+memory
configuration (128 logical CPUs + 224 GB) and 1,2,4,16-CPU sockets with
various memory sizes (1,2,4,6,20).

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:04 -07:00
Ananth N Mavinakayanahalli
87a7defb0d Kprobes on select architectures no longer EXPERIMENTAL
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.

This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:

	http://marc.info/?l=linux-kernel&m=118396955423812&w=2

Arch maintainers for sparc64, avr32 and s390 need to take a similar call.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Alexey Dobriyan
f284ce7269 PTRACE_POKEDATA consolidation
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.

AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Alexey Dobriyan
7664732315 PTRACE_PEEKDATA consolidation
Identical implementations of PTRACE_PEEKDATA go into generic_ptrace_peekdata()
function.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Pavel Emelianov
bcdcd8e725 Report that kernel is tainted if there was an OOPS
If the kernel OOPSed or BUGed then it probably should be considered as
tainted.  Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel.  This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson  -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Heiko Carstens
608e261968 generic bug: use show_regs() instead of dump_stack()
The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit.  Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning.  This will give more debug informations like register contents,
etc...  In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning.  E.g.  on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():

 [<000000000001517a>] show_trace+0x92/0xe8)
 [<0000000000015270>] show_stack+0xa0/0xd0
 [<00000000000152ce>] dump_stack+0x2e/0x3c
 [<0000000000195450>] report_bug+0x98/0xf8
 [<0000000000016cc8>] illegal_op+0x1fc/0x21c
 [<00000000000227d6>] sysc_return+0x0/0x10

Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:51 -07:00
Vasily Tarasov
b716395e2b diskquota: 32bit quota tools on 64bit architectures
OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
working on 64bit architectures.  In 2.6.10 kernel sys32_quotactl() function
was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
32/64bit clean, enable it for 32bit" However this isn't right.  Look at
if_dqblk structure:

struct if_dqblk {
        __u64 dqb_bhardlimit;
        __u64 dqb_bsoftlimit;
        __u64 dqb_curspace;
        __u64 dqb_ihardlimit;
        __u64 dqb_isoftlimit;
        __u64 dqb_curinodes;
        __u64 dqb_btime;
        __u64 dqb_itime;
        __u32 dqb_valid;
};

For 32 bit quota tools sizeof(if_dqblk) == 0x44.
But for 64 bit kernel its size is 0x48, 'cause of alignment!
Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
that handles this and related situations.

[michal.k.k.piotrowski@gmail.com: build fix]
[akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:48 -07:00
Eric W. Biderman
b1c931e393 x86: initial fixmap support
Needed to get fixed virtual address for USB debug and earlycon with mmio.

Signed-off-by: Eric W. Biderman <ebiderman@xmisson.com>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:35 -07:00
Avi Kivity
4055551bbc x86_64: Allow smp_call_function_single() to current cpu
This removes the requirement for callers to get_cpu() to check in simple
cases.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-07-16 12:05:50 +03:00
Adrian Bunk
68485695e5 [CPUFREQ] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI
This patch contains the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2007-07-13 01:29:51 -04:00
Linus Torvalds
21ba0f88ae Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (34 commits)
  PCI: Only build PCI syscalls on architectures that want them
  PCI: limit pci_get_bus_and_slot to domain 0
  PCI: hotplug: acpiphp: avoid acpiphp "cannot get bridge info" PCI hotplug failure
  PCI: hotplug: acpiphp: remove hot plug parameter write to PCI host bridge
  PCI: hotplug: acpiphp: fix slot poweroff problem on systems without _PS3
  PCI: hotplug: pciehp: wait for 1 second after power off slot
  PCI: pci_set_power_state(): check for PM capabilities earlier
  PCI: cpci_hotplug: Convert to use the kthread API
  PCI: add pci_try_set_mwi
  PCI: pcie: remove SPIN_LOCK_UNLOCKED
  PCI: ROUND_UP macro cleanup in drivers/pci
  PCI: remove pci_dac_dma_... APIs
  PCI: pci-x-pci-express-read-control-interfaces cleanups
  PCI: Fix typo in include/linux/pci.h
  PCI: pci_ids, remove double or more empty lines
  PCI: pci_ids, add atheros and 3com_2 vendors
  PCI: pci_ids, reorder some entries
  PCI: i386: traps, change VENDOR to DEVICE
  PCI: ATM: lanai, change VENDOR to DEVICE
  PCI: Change all drivers to use pci_device->revision
  ...
2007-07-12 13:40:57 -07:00
H. Peter Anvin
91a6c462b0 Use the new x86 setup code for x86-64; unify with i386
This unifies arch/*/boot (except arch/*/boot/compressed) between
i386 and x86-64, and uses the new x86 setup code for x86-64 as well.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:55 -07:00
H. Peter Anvin
77e1dd654b x86-64: add CONFIG_PHYSICAL_ALIGN for consistency with i386
Add CONFIG_PHYSICAL_ALIGN (currently as a hardcoded constant) to provide
consistency with i386.  This value is manifest in the bzImage header.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin
85414b693a Define zero-page offset 0x1e4 as a scratch field, and use it
The relocatable kernel code needs a scratch field for the decompressor
to determine its own location.  It was using a location inside
struct screen_info; reserve a free location and document it as scratch
instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
Venki Pallipadi
1d67953f2b Use a new CPU feature word to cover features that are spread around
Some Intel features are spread around in different CPUID leafs like 0x5,
0x6 and 0xA.  Make this feature detection code common across i386 and
x86_64.

Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature
will be enabled automatically by current acpi-cpufreq driver.

Refer to Intel Software Developer's Manual for more details about the feature.

Thanks to hpa (H Peter Anvin) for the making the actual code detecting the
scattered features data-driven.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
H. Peter Anvin
ec481536b1 Unify the CPU features vectors between i386 and x86-64
Unify the handling of the CPU features vectors between i386 and x86-64.
This also adopts the collapsing of features which are required at
compile-time into constant tests from x86-64 to i386.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-12 10:55:54 -07:00
Jan Beulich
caa5171622 PCI: remove pci_dac_dma_... APIs
Based on replies to a respective query, remove the pci_dac_dma_...() APIs
(except for pci_dac_dma_supported() on Alpha, where this function is used
in non-DAC PCI DMA code).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Jesse Barnes <jesse.barnes@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:11 -07:00
Siddha, Suresh B
48d8d7ee5d x86_64 irq: use mask/unmask and proper locking in fixup_irqs()
Force irq migration path during cpu offline, is not using proper locks and
irq_chip mask/unmask routines.  This will result in some races(especially
the device generating the interrupt can see some inconsistent state,
resulting in issues like stuck irq,..).

Appended patch fixes the issue by taking proper lock and encapsulating
irq_chip set_affinity() with a mask() before and an unmask() after.

This fixes a MSI irq stuck issue reported by Darrick Wong.

There are several more general bugs in this area(irq migration in the
process context). For example,

 1. Possibility of missing edge triggered irq.
 2. Reliable method of migrating level triggered irq in the process context.

We plan to look and close these in the near future.

Eric says:
	In addition even with the fix from Suresh there is still at least one
	nasty hardware race in fixup_irqs().   However we exercise that code
	path rarely enough that we are unlikely to hit it in the real world,
	and that race seems to have existed since the code was merged.  And a
	fix for that is not coming soon as it is an open investigation area
	if we can fix irq migration to work outside of irq context or if
	we have to rework the requirements imposed by the generic cpu hotplug
	and layer on fixup_irqs().  So this may come up again.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Reported-and-tested-by: Darrick Wong <djwong@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-26 16:54:29 -07:00