Commit graph

1608 commits

Author SHA1 Message Date
Andi Kleen
0a4599c894 [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
For i386/x86-64.

Straight forward -- just reuse the Family 0xf code.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:25 +01:00
Andi Kleen
f790cd30d0 [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
Just various new acronyms. The new popcnt bit is in the middle
of Intel space. This looks a little weird, but I've been assured
it's ok.

Also I fixed RDTSCP for i386 which was at the wrong place.

For i386 and x86-64.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:25 +01:00
Eric W. Biederman
2fb12a9bca [PATCH] x86-64: survive having no irq mapping for a vector
Occasionally the kernel has bugs that result in no irq being found for a
given cpu vector.  If we acknowledge the irq the system has a good chance
of continuing even though we dropped an irq message.  If we continue to
simply print a message and not acknowledge the irq the system is likely to
become non-responsive shortly there after.

AK: Fixed compilation for UP kernels

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Luigi Genoni" <luigi.genoni@pirelli.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:25 +01:00
Evgeniy Polyakov
8469adde59 [PATCH] x86-64: Minor patch for compilation warning in x86_64 signal code
If DEBUG_SIG is enbaled in source code, ia32_signal.c compiles with warning
due to wrong format string.  Attached patch fixes that.  It is quite minor
update, since by default DEBUG_SIG is not enabled and can not be turned on
without code modification.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:25 +01:00
Roland Dreier
3e94fb8f54 [PATCH] x86-64: avoid warning message livelock
I've seen my box paralyzed by an endless spew of

    rtc: lost some interrupts at 1024Hz.

messages on the serial console.  What seems to be happening is that
something real causes an interrupt to be lost and triggers the
message.  But then printing the message to the serial console (from
the hpet interrupt handler) takes more than 1/1024th of a second, and
then some more interrupts are lost, so the message triggers again....

Fix this by adding a printk_ratelimit() before printing the warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:25 +01:00
Benjamin Romer
ee4eff6ff6 [PATCH] x86-64: update IO-APIC dest field to 8-bit for xAPIC
On the Unisys ES7000/ONE system, we encountered a problem where performing
a kexec reboot or dump on any cell other than cell 0 causes the system
timer to stop working, resulting in a hang during timer calibration in the
new kernel.

We traced the problem to one line of code in disable_IO_APIC(), which needs
to restore the timer's IO-APIC configuration before rebooting.  The code is
currently using the 4-bit physical destination field, rather than using the
8-bit logical destination field, and it cuts off the upper 4 bits of the
timer's APIC ID.  If we change this to use the logical destination field,
the timer works and we can kexec on the upper cells.  This was tested on
two different cells (0 and 2) in an ES7000/ONE system.

For reference, the relevant Intel xAPIC spec is kept at
ftp://download.intel.com/design/chipsets/e8501/datashts/30962001.pdf,
specifically on page 334.

Signed-off-by: Benjamin M Romer <benjamin.romer@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:25 +01:00
Bob Picco
f0a5a58aa8 [PATCH] x86-64: clean up sparsemem memory_present call
Eliminate arch specific memory_present call x86_64 NUMA by utilizing
sparse_memory_present_with_active_regions.

Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:25 +01:00
Ingo Molnar
5d0e600d90 [PATCH] x86: fix laptop bootup hang in init_acpi()
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():

 Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
 Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
 Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
 Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
 Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
 Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
 Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
 Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()

It's a hard hang that not even an NMI could punch through!  Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...

After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.

So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution in the wrong moment, the machine would
hang.  (my theory is that this must be some sort of chipset setup method
doing stores to chipset mmio registers?)

Unfortunately given the characteristics of the hang it was sheer
impossible to figure out which of the numerous AML methods is impacted
by this problem.

As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang.  I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts.  Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didnt
hang).

I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:24 +01:00
Muli Ben-Yehuda
310adfdd91 [PATCH] x86-64: robustify bad_dma_address handling
- set bad_dma_address explicitly to 0x0
- reserve 32 pages from bad_dma_address and up
- WARN_ON() a driver feeding us bad_dma_address

Thanks to Leo Duran <leo.duran@amd.com> for the suggestion.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Leo Duran <leo.duran@amd.com>
Cc: Job Mason <jdmason@kudzu.us>
2007-02-13 13:26:24 +01:00
Andi Kleen
fc986db4fc [PATCH] x86-64: Don't reserve ROMs
We trust the e820 table, so explicitely reserving ROMs shouldn't
be needed.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Andi Kleen
00edefae05 [PATCH] x86-64: Fix off by one error in IOMMU boundary checking
Should be harmless because there is normally no memory there, but
technically it was incorrect.

Pointed out by Leo Duran

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Zachary Amsden
ffb6017563 [PATCH] x86-64: x86_64 - Fix FS/GS registers for VT execution
Initialize FS and GS to __KERNEL_DS as well.  The actual value of them is not
important, but it is important to reload them in protected mode.  At this time,
they still retain the real mode values from initial boot.  VT disallows
execution of code under such conditions, which means hardware virtualization
can not be used to boot the kernel on Intel platforms, making the boot time
painfully slow.

This requires moving the GS load before the load of GS_BASE, so just move
all the segments loads there to keep them together in the code.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Andi Kleen
9a11ff6827 [PATCH] x86-64: Unexport __supported_pte_mask
The symbol is needed to manipulate page tables, and modules shouldn't
do that.

Leftover from 2.4, but no in tree module should need it now.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Andi Kleen
f49481bc50 [PATCH] x86-64: Check return value of putreg in PTRACE_SETREGS
This means if an illegal value is set for the segment registers there
ptrace will error out now with an errno instead of silently ignoring
it.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Jack Steiner
2f7a2a79c3 [PATCH] x86-64: - Ignore long SMI interrupts in clock calibration code - update 1
Add failsafe mechanism to HPET/TSC clock calibration.

	Signed-off-by: Jack Steiner <steiner@sgi.com>

Updated to include failsafe mechanism & additional community feedback.
Patch built on latest 2.6.20-rc4-mm1 tree.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:24 +01:00
Nicolas Kaiser
edf8dd36b5 [PATCH] x86-64: Kconfig typos
Some typos in Kconfig.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:23 +01:00
Andi Kleen
a98f0dd34d [PATCH] x86-64: Allow to run a program when a machine check event is detected
When a machine check event is detected (including a AMD RevF threshold
overflow event) allow to run a "trigger" program. This allows user space
to react to such events sooner.

The trigger is configured using a new trigger entry in the
machinecheck sysfs interface. It is currently shared between
all CPUs.

I also fixed the AMD threshold handler to run the machine
check polling code immediately to actually log any events
that might have caused the threshold interrupt.

Also added some documentation for the mce sysfs interface.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:23 +01:00
Jan Beulich
24ce0e96f2 [PATCH] x86-64: Tighten mce_amd driver MSR reads
while debugging an unrelated problem in Xen, I noticed odd reads from
non-existent MSRs. Having now found time to look why these happen, I
came up with below patch, which
- prevents accessing MCi_MISCj with j > 0 when the block pointer in
MCi_MISC0 is zero
- accesses only contiguous MCi_MISCj until a non-implemented one is
found
- doesn't touch unimplemented blocks in mce_threshold_interrupt at all
- gives names to two bits previously derived from MASK_VALID_HI (it
took me some time to understand the code without this)

The first three items, besides being apparently closer to the spec, should
namely help cutting down on the time mce_threshold_interrupt() takes.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:23 +01:00
Jan Beulich
9b35589756 [PATCH] x86: simplify notify_page_fault()
Remove all parameters from this function that aren't really variable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:23 +01:00
Venkatesh Pallipadi
1676193937 [PATCH] x86-64: Handle 32 bit PerfMon Counter writes cleanly in x86_64 nmi_watchdog
P6 CPUs and Core/Core 2 CPUs which has 'architectural perf mon' feature,
only supports write of low 32 bits in Performance Monitoring Counters.
Bits 32..39 are sign extended based on bit 31 and bits 40..63 are reserved
and should be zero.

This patch:

Change x86_64 nmi handler to handle this case cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:22 +01:00
Glauber de Oliveira Costa
4c3cbf75b2 [PATCH] x86-64: Use constant instead of raw number in x86_64 ioperm.c
This is a tiny cleanup to increase readability

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:22 +01:00
Glauber de Oliveira Costa
c49c5330c9 [PATCH] x86-64: Remove fastcall references in x86_64 code
Unlike x86, x86_64 already passes arguments in registers.  The use of
regparm attribute makes no difference in produced code, and the use of
fastcall just bloats the code.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:22 +01:00
Rohit Seth
53fee04f31 [PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole
This patch resolves the issue of running with numa=fake=X on kernel command
line on x86_64 machines that have big IO hole.  While calculating the size
of each node now we look at the total hole size in that range.

Previously there were nodes that only had IO holes in them causing kernel
boot problems.  We now use the NODE_MIN_SIZE (64MB) as the minimum size of
memory that any node must have.  We reduce the number of allocated nodes if
the number of nodes specified on kernel command line results in any node
getting memory smaller than NODE_MIN_SIZE.

This change allows the extra memory to be incremented in NODE_MIN_SIZE
granule and uniformly distribute among as many nodes (called big nodes) as
possible.

[akpm@osdl.org: build fix]
Signed-off-by: David Rientjes <reintjes@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:22 +01:00
Catalin Marinas
006e84ee3a [PATCH] x86-64: do not always end the stack trace with ULONG_MAX
It makes more sense to end the stack trace with ULONG_MAX only if
nr_entries < max_entries.  Otherwise, we lose one entry in the long stack
traces and cannot know whether the trace was complete or not.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
Karsten Weiss
5558870bfb [PATCH] x86-64: improved iommu documentation
- add SWIOTLB config help text
- mention Documentation/x86_64/boot-options.txt in
  Documentation/kernel-parameters.txt
- remove the duplication of the iommu kernel parameter documentation.
- Better explanation of some of the iommu kernel parameter options.
- "32MB<<order" instead of "32MB^order".
- Mention the default "order" value.
- list the four existing PCI-DMA mapping implementations of arch x86_64
- group the iommu= option keywords by PCI-DMA mapping implementation.
- Distinguish iommu= option keywords from number arguments.
- Explain the meaning of DAC and SAC.

Signed-off-by: Karsten Weiss <knweiss@science-computing.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:21 +01:00
OGAWA Hirofumi
56829d1982 [PATCH] mmconfig: fix unreachable_devices()
Currently, unreachable_devices() compares value of mmconfig and value
of conf1. But it doesn't check the device is reachable or not.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
OGAWA Hirofumi
429d512e53 [PATCH] mmconfig: minor cleanup in mmconfig code
This just cleans up.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
OGAWA Hirofumi
a4ec1b2c9f [PATCH] mmconfig: remove #define MMCONFIG_APER_XXX
MMCONFIG_APER_XXX is unneeded in arch/x86_64/pci/mmconfig.c.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
OGAWA Hirofumi
44de0203fa [PATCH] mmconfig: Reject a broken MCFG tables on Asus etc
This rejects broken MCFG tables on Asus. When the table
looks bogus just disable mmconfig

Arjan and Andi suggested this.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
OGAWA Hirofumi
faed197b7b [PATCH] mmconfig: Fix x86_64 ioremap base_address
Current mmconfig has some problems of remapped range.

a) In the case of broken MCFG tables on Asus etc., we need to remap 256M
   range, but currently only remap 1M.

b) The base address always corresponds to bus number 0, but currently we
   are assuming it corresponds to start bus number.

This patch fixes the above problems.

(akpm: Arjan suggests that if the MCFG table is broken we just shouldn't use
it, rather than try to work around things).

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:20 +01:00
Olivier Galibert
b78673944b [PATCH] mmconfig: Share parts of mmconfig code between i386 and x86-64
i386 and x86-64 pci mmconfig code have a lot in common.  So share what's
shareable between the two.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:20 +01:00
Amul Shah
54413927f0 [PATCH] x86-64: x86_64-make-the-numa-hash-function-nodemap-allocation fix fix
- Removed an extraneous debug message from allocate_cachealigned_map

- Changed extract_lsb_from_nodes to return 63 for the case where there was
  only one memory node.  The prevents the creation of the dynamic hashmap.

- Changed extract_lsb_from_nodes to use only the starting memory address of
  a node.  On an ES7000, our nodes overlap the starting and ending address,
  meaning, that we see nodes like

	00000 - 10000
	10000 - 20000

  But other systems have nodes whose start and end addresses do not overlap.
   For example:

	00000 - 0FFFF
	10000 - 1FFFF

  In this case, using the ending address will result in an LSB much lower
  than what is possible.  In this case an LSB of 1 when in reality it should
  be 16.

Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:20 +01:00
Amul Shah
076422d2af [PATCH] x86-64: Allocate the NUMA hash function nodemap dynamically
Remove the statically allocated memory to NUMA node hash map in favor of a
dynamically allocated memory to node hash map (it is cache aligned).

This patch has the nice side effect in that it allows the hash map to grow
for systems with large amounts of memory (256GB - 1TB), but suffer from
having small PCI space tacked onto the boot node (which is somewhere
between 192MB to 512MB on the ES7000).

Signed-off-by: Amul Shah <amul.shah@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-02-13 13:26:19 +01:00
Andi Kleen
0812a579c9 [PATCH] x86-64: Add __copy_from_user_nocache
This does user copies in fs write() into the page cache with write combining.
This pushes the destination out of the CPU's cache, but allows higher bandwidth
in some case.

The theory is that the page cache data is usually not touched by the
CPU again and it's better to not pollute the cache with it. Also it is a little
faster.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:19 +01:00
Andi Kleen
287eeb5e02 [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:19 +01:00
Len Brown
7f8f97c3cc ACPI: acpi_table_parse() now returns success/fail, not count
Returning count for tables that are supposed to be unique
was useless and confusing.

Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-13 02:58:52 -05:00
Arjan van de Ven
5dfe4c964a [PATCH] mark struct file_operations const 2
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:44 -08:00
Alon Bar-Lev
7a3a06d0e1 [PATCH] Dynamic kernel command-line: fixups
Remove in-source externs, linux/init.h is included in all cases.
This is a fixups for "Dynamic kernel command-line" patch.

It also includes some uml __init fixups so that we can __initdata also its
command_line.

Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:39 -08:00
Alon Bar-Lev
adf48856db [PATCH] Dynamic kernel command-line: x86_64
1. Rename saved_command_line into boot_command_line.
2. Set command_line as __initdata.

Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:39 -08:00
Kirill Korotaev
cefc8be824 [PATCH] Consolidate bust_spinlocks()
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.

Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock.  ia64 also loses timerlist_lock.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:34 -08:00
Kyle McMartin
d4d23add3a [PATCH] Common compat_sys_sysinfo
I noticed that almost all architectures implemented exactly the same
sys32_sysinfo...  except parisc, where a bug was to be found in handling of
the uptime.  So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.

This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:32 -08:00
Robert P. J. Day
3de3af130b [PATCH] Remove unnecessary memset(0) calls after kzalloc() calls.
Delete the few remaining unnecessary calls to memset(0) after a call to
kzalloc().

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:31 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Jean-Paul Saman
67d38229df [PATCH] disable init/initramfs.c: architectures
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected.  This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).

Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:25 -08:00
Christoph Lameter
5ac6da669e [PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA
As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
channel management.  Other functionality may still expect GFP_DMA to
provide memory below 16M.  So we need to make sure that CONFIG_ZONE_DMA is
set independent of CONFIG_GENERIC_ISA_DMA.  Undo the modifications to
mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
theses explicitly in each arches Kconfig.

Reviews must occur for each arch in order to determine if ZONE_DMA can be
switched off.  It can only be switched off if we know that all devices
supported by a platform are capable of performing DMA transfers to all of
memory (Some arches already support this: uml, avr32, sh sh64, parisc and
IA64/Altix).

In order to switch ZONE_DMA off conditionally, one would have to establish
a scheme by which one can assure that no drivers are enabled that are only
capable of doing I/O to a part of memory, or one needs to provide an
alternate means of performing an allocation from a specific range of memory
(like provided by alloc_pages_range()) and insure that all drivers use that
call.  In that case the arches alloc_dma_coherent() may need to be modified
to call alloc_pages_range() instead of relying on GFP_DMA.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:19 -08:00
Roland McGrath
dc5882b20a [PATCH] x86_64 ia32 vDSO: use install_special_mapping
This patch uses install_special_mapping for the ia32 vDSO setup, consolidating
duplicated code.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:25:47 -08:00
Linus Torvalds
78149df6d5 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (41 commits)
  Revert "PCI: remove duplicate device id from ata_piix"
  msi: Make MSI useable more architectures
  msi: Kill the msi_desc array.
  msi: Remove attach_msi_entry.
  msi: Fix msi_remove_pci_irq_vectors.
  msi: Remove msi_lock.
  msi: Kill msi_lookup_irq
  MSI: Combine pci_(save|restore)_msi/msix_state
  MSI: Remove pci_scan_msi_device()
  MSI: Replace pci_msi_quirk with calls to pci_no_msi()
  PCI: remove duplicate device id from ipr
  PCI: remove duplicate device id from ata_piix
  PCI: power management: remove noise on non-manageable hw
  PCI: cleanup MSI code
  PCI: make isa_bridge Alpha-only
  PCI: remove quirk_sis_96x_compatible()
  PCI: Speed up the Intel SMBus unhiding quirk
  PCI Quirk: 1k I/O space IOBL_ADR fix on P64H2
  shpchp: delete trailing whitespace
  shpchp: remove DBG_XXX_ROUTINE
  ...
2007-02-07 19:23:44 -08:00
Eric W. Biederman
f7feaca77d msi: Make MSI useable more architectures
The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now
responsible for allocating and freeing the linux irq in addition to
setting up the the linux irq to work with the interrupt.

arch_setup_msi_irq now takes a pci_device and a msi_desc and returns
an irq.

With this change in place this code should be useable by all platforms
except those that won't let the OS touch the hardware like ppc RTAS.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 15:50:08 -08:00
Linus Torvalds
21d37bbc65 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits)
  ACPICA: reduce table header messages to fit within 80 columns
  asus-laptop: merge with ACPICA table update
  ACPI: bay: Convert ACPI Bay driver to be compatible with sysfs update.
  ACPI: bay: new driver is EXPERIMENTAL
  ACPI: bay: make drive_bays static
  ACPI: bay: make bay a platform driver
  ACPI: bay: remove prototype procfs code
  ACPI: bay: delete unused variable
  ACPI: bay: new driver adding removable drive bay support
  ACPI: dock: check if parent is on dock
  ACPICA: fix gcc build warnings
  Altix: Add ACPI SSDT PCI device support (hotplug)
  Altix: ACPI SSDT PCI device support
  ACPICA: reduce conflicts with Altix patch series
  ACPI_NUMA: fix HP IA64 simulator issue with extended memory domain
  ACPI: fix HP RX2600 IA64 boot
  ACPI: build fix for IBM x440 - CONFIG_X86_SUMMIT
  ACPICA: Update version to 20070126
  ACPICA: Fix for incorrect parameter passed to AcpiTbDeleteTable during table load.
  ACPICA: Update copyright to 2007.
  ...
2007-02-07 15:36:08 -08:00
Jan Beulich
563aaf064f [IA64] swiotlb cleanup
- add proper __init decoration to swiotlb's init code (and the code calling
  it, where not already the case)

- replace uses of 'unsigned long' with dma_addr_t where appropriate

- do miscellaneous simplicfication and cleanup

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-02-05 18:51:25 -08:00
Alexey Starikovskiy
15a58ed121 ACPICA: Remove duplicate table definitions (non-conflicting), cont
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:29 -05:00
Alexey Starikovskiy
cee324b145 ACPICA: use new ACPI headers.
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:28 -05:00
Alexey Starikovskiy
ad71860a17 ACPICA: minimal patch to integrate new tables into Linux
Signed-off-by: Len Brown <len.brown@intel.com>
2007-02-02 21:14:22 -05:00
Roland McGrath
c633090e31 [PATCH] x86_64 ia32 vDSO: define arch_vma_name
This patch makes x86_64 define arch_vma_name for CONFIG_IA32_EMULATION.  This
makes the ia32 vDSO mapping appear in /proc/PID/maps with "[vdso]" for ia32
processes, as it does on native i386.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Roland McGrath
e03f0ca116 [PATCH] x86_64 ia32 vDSO: use VM_ALWAYSDUMP
This patch fixes ia32 core dumps on x86_64 to include just one phdr for the
vDSO vma.  Currently it writes a confused format with two phdrs for the
address, one without contents and one with.  This patch removes the
special-case core writing macros for the ia32 vDSO.  Instead, it uses
VM_ALWAYSDUMP in the vma.  This changes core dumps so they no longer include
the non-PT_LOAD phdrs from the vDSO, consistent with fixed native i386 core
dumps.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:58 -08:00
Venkatesh Pallipadi
58d9ce7d75 [PATCH] Revert nmi_known_cpu() check during boot option parsing
Commit f2802e7f57 and its x86 version
(b7471c6da9) adds nmi_known_cpu() check
while parsing boot options in x86_64 and i386.

With that, "nmi_watchdog=2" stops working for me on Intel Core 2 CPU
based system.

The problem is, setup_nmi_watchdog is called while parsing the boot
option and identify_cpu is not done yet.  So, the return value of
nmi_known_cpu() is not valid at this point.

So revert that check.  This should not have any adverse effect as the
nmi_known_cpu() check is done again later in enable_lapic_nmi_watchdog().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:05 -08:00
Andi Kleen
7401969907 [PATCH] x86-64: Fix warnings in ia32_aout.c
Fix

linux/arch/x86_64/ia32/ia32_aout.c: In function ‘create_aout_tables’:
linux/arch/x86_64/ia32/ia32_aout.c:244: warning: cast from pointer to integer of different size
linux/arch/x86_64/ia32/ia32_aout.c:253: warning: cast from pointer to integer of different size

with gcc 4.3
Signed-off-by: Andi Kleen <ak@suse.de>
2007-01-11 01:52:45 +01:00
Muli Ben-Yehuda
b92cc55923 [PATCH] x86-64: tighten up printks
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-01-11 01:52:44 +01:00
Jack Steiner
ed5316d445 [PATCH] x86-64: - Ignore long SMI interrupts in clock calibration
Ensure that no SMI interrupts occur between the read of the HPET & TSC
in the clock calibration loop.

I noticed that a 2.66GHz system incorrectly detected the processor
clock speed about 1/7 of the time:

	time.c: Detected 2660.005 MHz processor.	(most of the time)
	time.c: Detected 2988.203 MHz processor.	(sometime)

The problem is caused by an SMI interrupt occuring in hpet_calibrate_tsc()
between the read of the HPET & TSC. Prior to switching the BIOS into
ACPI mode, it appears that every 27msec an SMI interrupt occurs. The
SMI interrupt takes 4.8 msec to process.

Note: On my test system, TICK_MIN had to be >380. I picked 5000
to minimize risk of having a value that is too small for other
platforms.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>

 arch/x86_64/kernel/time.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
2007-01-11 01:52:44 +01:00
Andi Kleen
03c3cc6128 [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2007-01-11 01:52:44 +01:00
Linus Torvalds
fea5f1e196 Revert "[PATCH] x86-64: Try multiple timer variants in check_timer"
This reverts commit b026872601, which has
been linked to several problem reports with IO-APIC and the timer.
Machines either don't boot because the timer doesn't happen, or we get
double timer interrupts because we end up double-routing the timer irq
through multiple interfaces.

See for example

	http://lkml.org/lkml/2006/12/16/101
	http://lkml.org/lkml/2007/1/3/9
	http://bugzilla.kernel.org/show_bug.cgi?id=7789

about some of the discussion.

Patches to fix this cleanup exist (and have been confirmed to work fine
at least for some of the affected cases) and we'll revisit it for
2.6.21, but this late in the -rc series we're better off just reverting
the incomplete commit that caused the problems.

Suggested-by: Adrian Bunk <bunk@stusta.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Yinghai Lu <yinghai.lu@amd.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-08 15:04:46 -08:00
Linus Torvalds
de9e957f12 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] longhaul: Kill off warnings introduced by recent changes.
  [CPUFREQ] Uninitialized use of cmd.val in arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c:acpi_cpufreq_target()
  [CPUFREQ] Longhaul - Always guess FSB
  [CPUFREQ] Longhaul - Fix up powersaver assumptions.
  [CPUFREQ] longhaul: Fix up unreachable code.
  [CPUFREQ] speedstep-centrino: missing space and bracket
  [CPUFREQ] Bug fix for acpi-cpufreq and cpufreq_stats oops on frequency change notification
  [CPUFREQ] select consistently
2007-01-03 17:34:12 -08:00
OGAWA Hirofumi
7523c4dd99 [PATCH] x86_64: Fix dump_trace()
If caller passed the tsk, we should use it to validate a stack ptr.
Otherwise, sysrq-t and other debugging stuff doesn't work.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-03 08:49:59 -08:00
Linus Torvalds
36f696cd7f Revert "[PATCH] x86_64: fix boot hang caused by CALGARY_IOMMU_ENABLED_BY_DEFAULT"
This reverts commit a9622f6219.  Now that
the Calgary code apparently detects itself properly, it's not needed any
more.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-01 10:55:45 -08:00
Soeren Sonnenburg
10f549fa15 [PATCH] make fn_keys work again on power/macbooks
The apple fn keys don't work anymore with 2.6.20-rc1.

The reason is that USB_HID_POWERBOOK appears in several files although
USB_HIDINPUT_POWERBOOK is the thing to be used.

The patch fixes this.

Cc: Greg KH <greg@kroah.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:55:55 -08:00
Randy Dunlap
917325d30a [CPUFREQ] select consistently
Make x86_64 ACPI_CPU_FREQ select CPU_FREQ_TABLE like other methods do.
(although we should still eliminate as much use of 'select' as possible)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2006-12-22 22:45:41 -05:00
Ingo Molnar
0888f06ac9 [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.

After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
  IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
  IRQ_19-1479  1D..1    0us : resched_task (try_to_wake_up)
  IRQ_19-1479  1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
  ...
  <idle>-0     1...1   11us!: default_idle (cpu_idle)
  ...
  <idle>-0     0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
  ...
   <...>-5856  0D..2  618us : __switch_to (__schedule)
   <...>-5856  0D..2  618us : __schedule <<idle>-0> (20 162)
   <...>-5856  0D..2  619us : __spin_unlock_irq (__schedule)
   <...>-5856  0...1  619us : trace_stop_sched_switched (__schedule)
   <...>-5856  0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)

what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!

the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:

    commit 495ab9c045
    Author: Andi Kleen <ak@suse.de>
    Date:   Mon Jun 26 13:59:11 2006 +0200

    [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status

    During some profiling I noticed that default_idle causes a lot of
    memory traffic. I think that is caused by the atomic operations
    to clear/set the polling flag in thread_info. There is actually
    no reason to make this atomic - only the idle thread does it
    to itself, other CPUs only read it. So I moved it into ti->status.

the problem is this type of change:

        if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
-               clear_thread_flag(TIF_POLLING_NRFLAG);
+               current_thread_info()->status &= ~TS_POLLING;
                smp_mb__after_clear_bit();
                while (!need_resched()) {
                        local_irq_disable();

this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:

        clear_bit(flag, &ti->flags);

and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:

  static inline void clear_bit(int nr, volatile unsigned long * addr)
  {
          __asm__ __volatile__( LOCK_PREFIX
                  "btrl %1,%0"

hence smp_mb__after_clear_bit() is defined as a simple compile barrier:

  #define smp_mb__after_clear_bit()       barrier()

but the explicit TS_POLLING clearing introduced by the patch:

+               current_thread_info()->status &= ~TS_POLLING;

is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.

CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]

Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.

( And it also turned out that the reason why these latencies never
  triggered on my testsystems is that i routinely use idle=poll, which
  was the only idle variant not affected by this bug. )

The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:51 -08:00
Ingo Molnar
136f1e7a8c [PATCH] x86_64: fix boot time hang in detect_calgary()
if CONFIG_CALGARY_IOMMU is built into the kernel via
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT, or is enabled via the
iommu=calgary boot option, then the detect_calgary() function runs to
detect the presence of a Calgary IOMMU.

detect_calgary() first searches the BIOS EBDA area for a "rio_table_hdr"
BIOS table. It has this parsing algorithm for the EBDA:

	while (offset) {
		...
		/* The next offset is stored in the 1st word. 0 means no more */
 		offset = *((unsigned short *)(ptr + offset));
	}

got that? Lets repeat it slowly: we've got a BIOS-supplied data
structure, plus Linux kernel code that will only break out of an
infinite parsing loop once the BIOS gives a zero offset. Ok?

Translation: what an excellent opportunity for BIOS writers to lock up
the Linux boot process in an utterly hard to debug place! Indeed the
BIOS jumped on that opportunity on my box, which has the following EBDA
chaining layout:

  384, 65282, 65535, 65535, 65535, 65535, 65535, 65535 ...

see the pattern? So my, definitely non-Calgary system happily locks up
in detect_calgary()!

the patch below fixes the boot hang by trusting the BIOS-supplied data
structure a bit less: the parser always has to make forward progress,
and if it doesnt, we break out of the loop and i get the expected kernel
message:

  Calgary: Unable to locate Rio Grande Table in EBDA - bailing!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-21 00:08:28 -08:00
Ingo Molnar
a9622f6219 [PATCH] x86_64: fix boot hang caused by CALGARY_IOMMU_ENABLED_BY_DEFAULT
one of my boxes didnt boot the 2.6.20-rc1-rt0 kernel rpm, it hung during
early bootup. After an hour or two of happy debugging i narrowed it down
to the CALGARY_IOMMU_ENABLED_BY_DEFAULT option, which was freshly added
to 2.6.20 via the x86_64 tree and /enabled by default/.

commit bff6547bb6 claims:

    [PATCH] Calgary: allow compiling Calgary in but not using it by default

    This patch makes it possible to compile Calgary in but not use it by
    default. In this mode, use 'iommu=calgary' to activate it.

but the change does not actually practice it:

 config CALGARY_IOMMU_ENABLED_BY_DEFAULT
        bool "Should Calgary be enabled by default?"
        default y
        depends on CALGARY_IOMMU
        help
          Should Calgary be enabled by default? if you choose 'y', Calgary
          will be used (if it exists). If you choose 'n', Calgary will not be
          used even if it exists. If you choose 'n' and would like to use
          Calgary anyway, pass 'iommu=calgary' on the kernel command line.
          If unsure, say Y.

it's both 'default y', and says "If unsure, say Y". Clearly not a typo.

disabling this option makes my box boot again. The patch below fixes the
Kconfig entry. Grumble.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-21 00:08:28 -08:00
Linus Torvalds
d1526e2cda Remove stack unwinder for now
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed.  We can put it back when
it's stable and isn't likely to make warning or bug events worse.

In the meantime, enable frame pointers for more readable stack traces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-15 08:47:51 -08:00
Dave Jones
c4366889dd Merge ../linus
Conflicts:

	drivers/cpufreq/cpufreq.c
2006-12-12 17:41:41 -05:00
Brice Goglin
e45116b8d7 [PATCH] Fix typo in 'EXPERIMENTAL' in CC_STACKPROTECTOR on x86_64
Fix typo in 'EXPERIMENTAL' in config CC_STACKPROTECTOR in arch/x86_64/Kconfig.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11 12:29:27 -08:00
Alexey Dobriyan
1f29bcd739 [PATCH] sysctl: remove unused "context" param
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:41 -08:00
Andi Kleen
1bac3b383a [PATCH] x86: Work around gcc 4.2 over aggressive optimizer
The new PDA code uses a dummy _proxy_pda variable to describe
memory references to the PDA. It is never referenced
in inline assembly, but exists as input/output arguments.
gcc 4.2 in some cases can CSE references to this which causes
unresolved symbols.  Define it to zero to avoid this.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:36 +01:00
Ravikiran G Thirumalai
92715e282b [PATCH] x86: Fix boot hang due to nmi watchdog init code
2.6.19  stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19.  check_nmi_watchdog schedules an
IPI on all cpus to  busy wait on a flag, but fails to set the busywait
flag if NMI functionality is disabled.  This causes the secondary cpus
to spin in an endless loop, causing the kernel bootup to hang.
Depending upon the build, the  busywait flag got overwritten (stack variable)
and caused  the kernel to bootup on certain builds.  Following patch fixes
the bug by setting the busywait flag before returning from check_nmi_watchdog.
I guess using a stack variable is not good here as the calling function could
potentially return while the busy wait loop is still spinning on the flag.

AK: I redid the patch significantly to be cleaner

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:35 +01:00
Andi Kleen
9f25441fe6 [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:35 +01:00
David Howells
f0d1b0b30d [PATCH] LOG2: Implement a general integer log2 facility in the kernel
This facility provides three entry points:

	ilog2()		Log base 2 of unsigned long
	ilog2_u32()	Log base 2 of u32
	ilog2_u64()	Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

	int do_something(long q)
	{
		...;
		y = ilog2(x)
		...;
	}

Or can be used to statically initialise global variables with constant values:

	unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant.  They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
Josef "Jeff" Sipek
c941192aaf [PATCH] x86_64: change uses of f_{dentry, vfsmnt} to use f_path
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the x86_64
arch code.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:42 -08:00
Jeremy Fitzhardinge
c31a0bf3e1 [PATCH] Generic BUG for x86-64
This makes x86-64 use the generic BUG machinery.

The main advantage in using the generic BUG machinery for x86-64 is that
the inlined overhead of BUG is just the ud2a instruction; the file+line
information are no longer inlined into the instruction stream.  This
reduces cache pollution.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:39 -08:00
Linus Torvalds
4522d58275 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:59:11 -08:00
Magnus Damm
85916f8166 [PATCH] Kexec / Kdump: Unify elf note code
The elf note saving code is currently duplicated over several
architectures.  This cleanup patch simply adds code to a common file and
then replaces the arch-specific code with calls to the newly added code.

The only drawback with this approach is that s390 doesn't fully support
kexec-on-panic which for that arch leads to introduction of unused code.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:46 -08:00
Ingo Molnar
0231606785 [PATCH] hotplug CPU: clean up hotcpu_notifier() use
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

the compiler can skip truly unused functions just fine:

    text    data     bss     dec     hex filename
 1624412  728710 3674856 6027978  5bfaca vmlinux.before
 1624412  728710 3674856 6027978  5bfaca vmlinux.after

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Andrew Morton
a38a44c1a9 [PATCH] smp_call_function_single() check that local interrupts are enabled
smp_call_function_single() can deadlock if the caller disabled local
interrupts (the target CPU could be spinning on call_lock).  Check for that.

Why on earth do these functions use spin_lock_bh()??

Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:39 -08:00
Masami Hiramatsu
b4c6c34a53 [PATCH] kprobes: enable booster on the preemptible kernel
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer.  The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag.  If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.

However, the processing of this check routine takes a long time.  So, this
patch introduces the garbage collection mechanism of insn_slot.  It also
introduces the "dirty" flag to free_insn_slot because of efficiency.

The "clean" instruction slots (dirty flag is cleared) are released
immediately.  But the "dirty" slots which are used by boosted kprobes, are
marked as garbages.  collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.

Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Magnus Damm
386d9a7edd [PATCH] elf: Always define elf_addr_t in linux/elf.h
Define elf_addr_t in linux/elf.h.  The size of the type is determined using
ELF_CLASS.  This allows us to remove the defines that today are spread all
over .c and .h files.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Henry Nestler
19e5d9c0d2 [PATCH] initrd: remove unused false condition for initrd_start
After LOADER_TYPE && INITRD_START are true, the short if-condition
for INITRD_START can never be false.

Remove unused code from the else condition.

Signed-off-by: Henry Nestler <henry.ne@arcor.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:38 -08:00
Arnd Bergmann
f5738ceed4 [PATCH] remove kernel syscalls
The last thing we agreed on was to remove the macros entirely for 2.6.19,
on all architectures. Unfortunately, I think nobody actually _did_ that,
so they are still there.

[akpm@osdl.org: x86_64 fix]
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Schafer <gschafer@zip.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:37 -08:00
Christoph Lameter
e94b176609 [PATCH] slab: remove SLAB_KERNEL
SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:24 -08:00
Andi Kleen
64a26a7312 [PATCH] x86-64: Export smp_call_function_single
smp_call_function() is exported, makes sense to export this one too.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Jan Beulich
b65780e123 [PATCH] unwinder: move .eh_frame to RODATA
The .eh_frame section contents is never written to, so it can as well
benefit from CONFIG_DEBUG_RODATA.

Diff-ed against firstfloor tree.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:19 +01:00
Yinghai Lu
ad892f5e0d [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
setup_IO_APIC_irqs could fail to get vector for some device when you have too
many devices, because at that time only boot cpu is online.  So check vector
for irq in setup_ioapic_dest and call setup_IO_APIC_irq to make sure IO-APIC
irq-routing table is initialized.

Also seperate setup_IO_APIC_irq from setup_IO_APIC_irqs.

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:19 +01:00
Andi Kleen
3807fd46e9 [PATCH] x86-64: Clarify error message in GART code
- Remove "Disabling IOMMU" message because it confuses people
- Clarify that the GART IOMMU is refered to in other message

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Venkatesh Pallipadi
d331e739f5 [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
Idle callbacks has some races when enter_idle() sets isidle and subsequent
interrupts that can happen on that CPU, before CPU goes to idle. Due to this,
an IDLE_END can get called before IDLE_START. To avoid these races, disable
interrupts before enter_idle and make sure that all idle routines do not
enable interrupts before entering idle.

Note that poll_idle() still has a this race as it has to enable interrupts
before going to idle. But, all other idle routines have the race fixed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Andi Kleen
a0429d0d7a [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
This was added as a workaround for the fallback unwinder not supporting
unaligned stack pointers properly. But now it was fixed to do that,
so it's not needed anymore

Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Jan Beulich
359ad0d401 [PATCH] unwinder: more sanity checks in Dwarf2 unwinder
Tighten the requirements on both input to and output from the Dwarf2
unwinder.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:13 +01:00
Andi Kleen
446f713ba1 [PATCH] unwinder: always use unlocked module list access in unwinder fallback
We're already well protected against module unloads because module
unload uses stop_machine(). The only exception is NMIs, but other
users already risk lockless accesses here.

This avoids some hackery in lockdep and also a potential deadlock

This matches what i386 does.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Arjan van de Ven
f3d73707a1 [PATCH] x86-64: Mark rdtsc as sync only for netburst, not for core2
On the Core2 cpus, the rdtsc instruction is not serializing (as defined
in the architecture reference since rdtsc exists) and due to the deep
speculation of these cores, it's possible that you can observe time go
backwards between cores due to this speculation. Since the kernel
already deals with this with the SYNC_RDTSC flag, the solution is
simple, only assume that the instruction is serializing on family 15...

The price one pays for this is a slightly slower gettimeofday (by a
dozen or two cycles), but that increase is quite small to pay for a
really-going-forward tsc counter.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Muli Ben-Yehuda
e496a0da7f [PATCH] Calgary: remove unused variables
Spotted by d binderman <dcb314@hotmail.com>.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Andi Kleen
6167796569 [PATCH] x86-64: Synchronize RDTSC on single core AMD
There is no guarantee that two RDTSCs in a row are monotonic,
so don't assume it on single core AMD systems.
This will make gettimeofday slower again
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
73ad8355d7 [PATCH] x86-64: remove unused acpi_found_madt in mparse.
remove unused acpi_found_madt in mparse.c

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Rafael J. Wysocki
d4c45718b3 [PATCH] x86-64: Fix kobject_init() WARN_ON on resume from disk
Make mce_remove_device() clean up the kobject in per_cpu(device_mce, cpu)
after it has been unregistered.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Adrian Bunk
026c66bdda [PATCH] x86-64: remove duplicate ARCH_DISCONTIGMEM_ENABLE option
One ARCH_DISCONTIGMEM_ENABLE option is enough.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
8fb6e5f5db [PATCH] x86_64: interrupt array size should be aligned to NR_VECTORS
interrupt array is referred for idt vectors instead of NR_IRQS, so change size
to NR_VECTORS - FIRST_EXTERNAL_VECTOR. Also change to static.

Signed-off-by: Yinghai Lu <yinghai@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Yinghai Lu
3df0af0eb0 [PATCH] x86_64: clear_bss before set_intr_gate with early_idt_handler
idt_table is in the .bss section, so clear_bss need to called at first

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:12 +01:00
Chuck Ebbert
0741f4d207 [PATCH] x86: add sysctl for kstack_depth_to_print
Add sysctl for kstack_depth_to_print. This lets users change
the amount of raw stack data printed in dump_stack() without
having to reboot.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
David Rientjes
86bd58bf4c [PATCH] x86-64: Remove unused GET_APIC_VERSION call from clear_local_APIC
Remove unused GET_APIC_VERSION call from clear_local_APIC() and
__setup_APIC_LVTT().

Reported by D Binderman <dcb314@hotmail.com>.

Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Karsten Wiese
f990fff427 [PATCH] x86: Regard MSRs in lapic_suspend()/lapic_resume()
Read/Write APIC_LVTPC and APIC_LVTTHMR only,
if get_maxlvt() returns certain values.
This is done like everywhere else in i386/kernel/apic.c,
so I guess its correct.
Suspends/Resumes to disk fine and eleminates an smp_error_interrupt()
here on a K8.

AK: ported to x86-64 too

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Stephane Eranian
ee58fad51a [PATCH] x86-64: x86-64 add Intel BTS cpufeature bit and detection (take 2)
Here is a small patch for x86-64 which adds a cpufeature flag and
detection code for Intel's Branch Trace Store (BTS) feature. This
feature can be found on Intel P4 and Core 2 processors among others.
It can also be used by perfmon.

changelog:
	- add CPU_FEATURE_BTS
	- add Branch Trace Store detection

signed-off-by: stephane eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:11 +01:00
Siddha, Suresh B
b0d0a4ba45 [PATCH] x86: fix the irqbalance quirk for E7320/E7520/E7525
Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in
http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
quirks.

And add a PCI quirk for these platforms to check(which happens very late
during the boot) if the APIC routing is indeed set to default flat mode.

This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which
selects physical mode instead of the logical flat(as needed for this errata
workaround).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Siddha, Suresh B
9899f826fc [PATCH] x86-64: add genapic_force
Add genapic_force. Used by the next Intel quirks patch.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:10 +01:00
Andi Kleen
9a45732422 [PATCH] x86-64: Rate limit no irq handler messages
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Ernie Petrides
103efcd9aa [PATCH] x86-64: fix perms/range of vsyscall vma in /proc/*/maps
The final line of /proc/<pid>/maps on x86_64 for native 64-bit
tasks shows an incorrect ending address and incorrect permissions.  There
is only a single page mapped in this vsyscall region, and it is accessible
for both read and execute.

The patch below fixes this.  (Since 32-bit-compat tasks have a real vma
with correct perms/range, no change is necessary for that scenario.)

Before the patch, a "cat /proc/self/maps | tail -1" shows this:

        ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]

After the patch, this is the output:

        ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]

Signed-off-by: Ernie Petrides <petrides@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Andi Kleen
713819989a [PATCH] x86-64: Add option to compile for Core2
Add an option to compile for Intel's Core 2

The Kconfig help is a mouthful due to the inventiveness of Intel's
product naming department.

Mainly for the 64bit cache line sizes because gcc doesn't support
optimizing for core2 yet. However it will and then the kernel
should be ready by passing the right option

Also fix the old MPSC help text to confirm better to reality.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Andi Kleen
b6bcc4bb1c [PATCH] x86-64: Don't force inlining of do_csum
It's two big and used by two callers. Calls should be cheap enough anyways.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Andi Kleen
516d283643 [PATCH] x86-64: Fix race in IO-APIC routing entry setup.
Interrupt could happen between setting the IO-APIC entry
and setting its interrupt data.

Pointed out by Linus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Paolo 'Blaisorblade' Giarrusso
b9a8d94a47 [PATCH] x86-64: Make x86_64 udelay() round up instead of down.
Port two patches from i386 to x86_64 delay.c to make sure all rounding is done
upward instead of downward.

There is no sign in commit messages that the mismatch was done on purpose, and
"delay() guarantees sleeping at least for the specified time" is still a valid
rule IMHO.

The original x86 patches are both from pre-GIT era, i.e.:

"[PATCH] round up  in __udelay()" in commit
54c7e1f5cc6771ff644d7bc21a2b829308bd126f

"[PATCH] add 1 in __const_udelay()" in commit
42c77a9801b8877d8b90f65f75db758822a0bccc

(both commits are from converted BK repository to x86_64).

AK: fixed gcc warning

linux/arch/x86_64/lib/delay.c:43: warning: suggest parentheses around + or - inside shift
(did this actually work?)

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Muli Ben-Yehuda
bff6547bb6 [PATCH] Calgary: allow compiling Calgary in but not using it by default
This patch makes it possible to compile Calgary in but not use it by
default. In this mode, use 'iommu=calgary' to activate it.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:07 +01:00
Muli Ben-Yehuda
eae9375554 [PATCH] Calgary: check BBAR ioremap success when ioremapping
This patch cleans up the previous "Use BIOS supplied BBAR information"
patch. Mostly stylistic clenaups, but also check for ioremap failure
when we ioremap the BBAR rather than when trying to use it.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Laurent Vivier <Laurent.Vivier@bull.net>
2006-12-07 02:14:06 +01:00
Laurent Vivier
b34e90b8f0 [PATCH] Calgary: use BIOS supplied BBARs and topology information
Find the BBAR register address of each Calgary using the "Extended
BIOS Data Area" rather than calculating it ourselves. Also get the bus
topology (what PHB each bus is on) from Calgary rather than
calculating it ourselves.

This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=7407.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Muli Ben-Yehuda
58db854827 [PATCH] calgary: phb_shift can be int
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Albert Cahalan
8e3de538ee [PATCH] x86-64: Support -mregparm arguments for signals with SA_SIGINFO in compat mode
The recent change to make x86_64 support i386 binaries compiled
with -mregparm=3 only covered signal handlers without SA_SIGINFO.
(the 3-arg "real-time" ones)

To be compatible with i386, both types should be supported.

Signed-off-by: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Andi Kleen
b026872601 [PATCH] x86-64: Try multiple timer variants in check_timer
Instead of adding all kinds of more quirks try various timer
routing variants in check_timer.

In particular this tries to handle quirks from:
- Nvidia NF2-4 reference BIOS: wrong timer override
- Asus: Wrong timer override but no HPET table
- ATI: require timer disabled in 8259
- Some boards: require timer enabled in 8259

We just try many of the the known variants in the hopefully right order
in check_timer.

Trying pin 0/2 on Nvidia suggested by Tim Hockin.

TBD Experimental. Needs a lot of testing

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Andi Kleen
ab2bf0c1c6 [PATCH] x86-64: Use probe_kernel_address in arch/x86_64/*
Instead of open coded __get_user

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:06 +01:00
Yinghai Lu
5df0287ecc [PATCH] x86-64: Extend clear_irq_vector
Clear the irq releated entries in irq_vector, irq_domain and vector_irq
instead of clearing irq_vector only. So when new irq is created, it
could reuse that vector. (actually is the second loop scanning from
FIRST_DEVICE_VECTOR+8). This could avoid the vectors are used up
with enough module inserting and removing

Cc: Eric W. Biedierman <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-By: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:05 +01:00
Andi Kleen
ea7322decb [PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
CLFLUSH is a lot faster than WBINVD so avoid the later if at all
possible.

Always pass the complete list of pages to other CPUs to cut down
the number of IPIs.

Minor other cleanup and sync with i386 version.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:05 +01:00
Andi Kleen
9c5f8be462 [PATCH] x86: Mention PCI instead of RAM in NMI parity error message
On modern systems RAM errors don't cause NMIs, but it's usually
caused by PCI SERR. Mention PCI instead of RAM in the printk.

Reported by r_hayashi@ctc-g.co.jp (Ryutaro Hayashi)

Cc:  r_hayashi@ctc-g.co.jp

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:03 +01:00
Alan Cox
7cd8b6861e [PATCH] x86: remove last two pci_find offenders in the core code
Resending as I believe the discussion about them established they were
correct.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:03 +01:00
Andi Kleen
72690a2118 [PATCH] x86: Don't use nested idle loops
Currently the idle loop has two nested loops -- one high level
in cpu_idle and in some low level idle functions another one.

Looping in the low level idle functions breaks the idle notifiers
because interrupts waking up sleep states need to execute
exit_idle() which is only in cpu_idle().

So don't do that, only loop in cpu_idle(). This only removes
code.

In some cases e.g. poll_idle the idle loop is a little longer
now because cpu_idle checks more things. I hope that isn't a problem
ACPI idle doesn't change behaviour because it never looped anyways.

Cc: len.brown@intel.com
Cc: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:03 +01:00
Jan Beulich
bcddc0155f [PATCH] x86-64: miscellaneous entry.S adjustments
This patch:
- makes ret_from_sys_call no longer global (all external users were
  previously switched to use int_ret_from_sys_call)
- adjusts placement of a CFI_{REMEMBER,RESTORE}_STATE pair to better
  fit logic flow
- eliminates an unnecessary pair of CFI_{REMEMBER,RESTORE}_STATE
- glues together function- and unwinder-wise the previously separate
  system_call and int_ret_from_sys_call function fragments

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:02 +01:00
Andrew Morton
da68933e0a [PATCH] x86-64: dump_trace() atomicity fix
Fix

BUG: using smp_processor_id() in preemptible [00000001] code:

in backtracer on preemptible debug kernels.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:02 +01:00
Aaron Durbin
399287229c [PATCH] x86-64: Insert Local and IO APIC(s) into resource map
Insert the Local APIC and IO APIC(s) into the resource tree.  It allows the
APIC resources to be visible within /proc/iomem.  The patch also takes into
account IO APIC(s) mapped in the PCI space by deferring the insertion until
after PCI has allocated its necessary resources.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:01 +01:00
Andrew Morton
bb81a09e55 [PATCH] x86: all cpu backtrace
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.

Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:01 +01:00
Alexey Dobriyan
e2764a1e30 [PATCH] x86-64: use BUILD_BUG_ON in FPU code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07 02:14:01 +01:00
Stephane Eranian
36b2a8d5af [PATCH] x86-64: add X86_FEATURE_PEBS and detection
Here is a patch (used by perfmon2) to detect the presence of the
Precise Event Based Sampling (PEBS) feature for Intel 64-bit processors.
The patch also adds the cpu_has_pebs macro.

changelog:
	- adds X86_FEATURE_PEBS
	- adds cpu_has_pebs to test for X86_FEATURE_PEBS

Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:01 +01:00
Andi Kleen
dd315df176 [PATCH] x86: Compress stack unwinder output
The unwinder has some extra newlines, which eat up loads of screen
space when it spews. (See https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=137900
for a nasty example).

warning_symbol-> and warning-> already printk a newline, so don't add one
in the strings passed to them.

[AK: redone for new code]

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:00 +01:00
Andi Kleen
b615ebdac9 [PATCH] x86: shorten lines in unwinder to be <= 80 characters
Andrew complained about > 80 character lines in the new unwinder.
Fix that.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:00 +01:00
Andi Kleen
8f820e9760 [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:00 +01:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Al Viro
a4f89fb7c0 [NET]: X86_64 checksum annotations and cleanups.
* sanitize prototypes, annotate
* usual ntohs->shift

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:23:14 -08:00
Linus Torvalds
707badb80b Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
  [PATCH] x86-64: Use stricter in process stack check for unwinder
  [PATCH] i386: Fix compilation with UP genericarch
  [PATCH] x86-64: Fix warning in io_apic.c
  [PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind
  [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
2006-11-28 17:28:41 -08:00
Andi Kleen
c547c77ee4 [PATCH] x86-64: Use stricter in process stack check for unwinder
Previously it would check for alignment only, which could break
if the stack pointer was unaligned. Now explicitely check if the
stack pointer is in the stack page of the current process.

Ported from i386.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-28 20:12:59 +01:00
Andi Kleen
f7a23328a7 [PATCH] x86-64: Fix warning in io_apic.c 2006-11-28 20:12:59 +01:00
Ingo Molnar
24d7bb3396 [PATCH] x86_64: fix 'earlyprintk=...,keep' regression
Commit 2c8c0e6b8d ("[PATCH] Convert x86-64
to early param") broke the earlyprintk=...,keep feature.

This restores that functionality.  Tested on x86_64.  Must-have for
v2.6.19, no risk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-28 10:58:21 -08:00
David Howells
65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
David Howells
52bad64d95 WorkStruct: Separate delayable and non-delayable events.
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness.  On a 64-bit
architecture it's nearly 100 bytes in size.  This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:54:01 +00:00
Vivek Goyal
3af9815328 [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
  config options and tool chain it might be placed on a non PAGE_SIZE aligned
  boundary and vmlinux loaders like kexec fail when they encounter a
  PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-21 10:31:21 +01:00
Vivek Goyal
9a14f2964b [PATCH] x86_64: Align data segment to PAGE_SIZE boundary
o Explicitly align data segment to PAGE_SIZE boundary otherwise depending on
  config options and tool chain it might be placed on a non PAGE_SIZE aligned
  boundary and vmlinux loaders like kexec fail when they encounter a
  PT_LOAD type segment which is not aligned to PAGE_SIZE boundary.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-21 10:26:54 +01:00
Yasunori Goto
8243229f09 [PATCH] x86_64: fix memory hotplug build with NUMA=n
This is to fix compile error of x86-64 memory hotplug without any NUMA
option.

  CC      arch/x86_64/mm/init.o
arch/x86_64/mm/init.c:501: error: redefinition of 'memory_add_physaddr_to_nid'
include/linux/memory_hotplug.h:71: error: previous definition of 'memory_add_phys
addr_to_nid' was here
arch/x86_64/mm/init.c:509: error: redefinition of 'memory_add_physaddr_to_nid'
arch/x86_64/mm/init.c:501: error: previous definition of 'memory_add_physaddr_to_
nid' was here

I confirmed compile completion with !NUMA, (NUMA & !ACPI_NUMA),
or (NUMA & ACPI_NUMA).

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-20 09:42:05 -08:00
Ingo Molnar
dc1829a4c3 [PATCH] i386/x86_64: ACPI cpu_idle_wait() fix
The scheduler on Andreas Friedrich's hyperthreading system stopped
working properly: the scheduler would never move tasks to another CPU!
The lask known working kernel was 2.6.8.

After a couple of attempts to corner the bug, the following smoking gun
was found:

  BIOS reported wrong ACPI idfor the processor
  CPU#1: set_cpus_allowed(), swapper:1, 3 -> 2
   [<c0103bbe>] show_trace_log_lvl+0x34/0x4a
   [<c0103ceb>] show_trace+0x2c/0x2e
   [<c01045f8>] dump_stack+0x2b/0x2d
   [<c0116a77>] set_cpus_allowed+0x52/0xec
   [<c0101d86>] cpu_idle_wait+0x2e/0x100
   [<c0259c57>] acpi_processor_power_exit+0x45/0x58
   [<c0259752>] acpi_processor_remove+0x46/0xea
   [<c025c6fb>] acpi_start_single_object+0x47/0x54
   [<c025cee5>] acpi_bus_register_driver+0xa4/0xd3
   [<c04ab2d7>] acpi_processor_init+0x57/0x77
   [<c01004d7>] init+0x146/0x2fd
   [<c0103a87>] kernel_thread_helper+0x7/0x10

a quick look at cpu_idle_wait() shows how broken that code is
on i386: it changes the init task's affinity map but never
restores it ...

and because all userspace tasks get forked by init, they all
inherited that single-CPU affinity mask. x86_64 cloned this
bug too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andreas Friedrich <andreas.friedrich@fujitsu-siemens.com>
Cc: Wolfgang Erig <Wolfgang.Erig@fujitsu-siemens.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-17 08:20:09 -08:00
Ingo Molnar
0796bdb7e9 [PATCH] x86_64: stack unwinder crash fix
the new dwarf2 unwinder crashes while trying to dump the stack:

  Leftover inexact backtrace:

  Unable to handle kernel paging request at ffffffff82800000 RIP:
   [<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
  PGD 203027 PUD 205027 PMD 0
  Oops: 0000 [2] PREEMPT SMP
  CPU 0
  Modules linked in:
  Pid: 30, comm: khelper Not tainted 2.6.19-rc6-rt1 #11
  RIP: 0010:[<ffffffff8026cf26>]  [<ffffffff8026cf26>] dump_trace+0x35b/0x3d2
  RSP: 0000:ffff81003fb9d848  EFLAGS: 00010006
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff805b3520 RDI: 0000000000000000
  RBP: ffffffff827ffff9 R08: ffffffff80aad000 R09: 0000000000000005
  R10: ffffffff80aae000 R11: ffffffff8037961b R12: ffff81003fb9d858
  R13: 0000000000000000 R14: ffffffff80598460 R15: ffffffff80ab1fc0
  FS:  0000000000000000(0000) GS:ffffffff806c4200(0000) knlGS:0000000000000000
  CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  CR2: ffffffff82800000 CR3: 0000000000201000 CR4: 00000000000006e0

this crash happened because it did not sanitize the dwarf2 data it
got, and got an unaligned stack pointer - which happily walked past
the process stack (and eventually reached the end of kernel memory
and pagefaulted there) due to this naive iteration condition:

        HANDLE_STACK (((long) stack & (THREAD_SIZE-1)) != 0);

note that i386 is alot more conservative when it comes to trusting
stack pointers:

  static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
  {
         return  p > (void *)tinfo &&
                 p < (void *)tinfo + THREAD_SIZE - 3;
  }

but the x86_64 code did not take this bit of i386 code.

The fix is to align the stack pointer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-17 08:20:09 -08:00
Ingo Molnar
ccf9ff524c [PATCH] x86_64: fix CONFIG_CC_STACKPROTECTOR build bug
on x86_64, the CONFIG_CC_STACKPROTECTOR build fails if used in a
distcc setup that has "CC" defined to "distcc gcc":

 gcc: gcc: linker input file unused because linking not done
 gcc: gcc: linker input file unused because linking not done
 gcc: gcc: linker input file unused because linking not done

this is because the gcc-x86_64-has-stack-protector.sh script
has a 2-parameters assumption. Fix this by passing $(CC) as
a single parameter.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Please-Use-Me-More: make randconfig
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16 14:00:25 -08:00
Andi Kleen
6b3d1a95ba [PATCH] x86-64: Fix vsyscall.c compilation on UP
Broken by earlier patch by me.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16 13:57:03 -08:00
Eric W. Biederman
45c9953325 [PATCH] Use delayed disable mode of ioapic edge triggered interrupts
Komuro reports that ISA interrupts do not work after a disable_irq(),
causing some PCMCIA drivers to not work, with messages like

	eth0: Asix AX88190: io 0x300, irq 3, hw_addr xx:xx:xx:xx:xx:xx
	eth0: found link beat
	eth0: autonegotiation complete: 100baseT-FD selected
	eth0: interrupt(s) dropped!
	eth0: interrupt(s) dropped!
	eth0: interrupt(s) dropped!
	...

Linus Torvalds <torvalds@osdl.org> said:

  "Now, edge-triggered interrupts are a _lot_ harder to mask, because the
   Intel APIC is an unbelievable piece of sh*t, and has the edge-detect logic
   _before_ the mask logic, so if a edge happens _while_ the device is
   masked, you'll never ever see the edge ever again (unmasking will not
   cause a new edge, so you simply lost the interrupt).

   So when you "mask" an edge-triggered IRQ, you can't really mask it at all,
   because if you did that, you'd lose it forever if the IRQ comes in while
   you masked it. Instead, we're supposed to leave it active, and set a flag,
   and IF the IRQ comes in, we just remember it, and mask it at that point
   instead, and then on unmasking, we have to replay it by sending a
   self-IPI."

This trivial patch solves the problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-15 09:04:32 -08:00
Andi Kleen
9446868b53 [PATCH] x86-64: Fix race in exit_idle
When another interrupt happens in exit_idle the exit idle notifier
could be called an incorrect number of times.

Add a test_and_clear_bit_pda and use it handle the bit
atomically against interrupts to avoid this.

Pointed out by Stephane Eranian

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
8c131af1db [PATCH] x86-64: Fix vgetcpu when CONFIG_HOTPLUG_CPU is disabled
The vgetcpu per CPU initialization previously relied on CPU hotplug
events for all CPUs to initialize the per CPU state. That only
worked only on kernels with CONFIG_HOTPLUG_CPU enabled.  On the
others some CPUs didn't get their state initialized properly
and vgetcpu wouldn't work.

Change the initialization sequence to instead run in a normal
initcall (which runs after the normal CPU bootup) and initialize
all running CPUs there. Later hotplug CPUs are still handled
with an hotplug notifier.

This actually simplifies the code somewhat.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
fa18f477d0 [PATCH] x86: Add acpi_user_timer_override option for Asus boards
Timer overrides are normally disabled on Nvidia board because
they are commonly wrong, except on new ones with HPET support.
Unfortunately there are quite some Asus boards around that
don't have HPET, but need a timer override.

We don't know yet how to handle this transparently,
but at least add a command line option to force the timer override
and let them boot.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Magnus Damm
15803a4328 [PATCH] x86-64: setup saved_max_pfn correctly (kdump)
x86_64: setup saved_max_pfn correctly

2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
to zero instead of the max_pfn value before the user map is setup.

This happens because saved_max_pfn is initialized at parse_early_param() time,
and at this time no active regions have been registered. save_max_pfn is setup
from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
returns 0 because no regions exist.

This patch fixes this by registering before and removing after the call
to e820_end_of_ram().

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
5e58a02a8f [PATCH] x86-64: Handle reserve_bootmem_generic beyond end_pfn
This can happen on kexec kernels with some configurations, in particularly
on Unisys ES7000 systems.

Analysis by Amul Shah

Cc: Amul Shah <amul.shah@unisys.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Steven Rostedt
51d67a488b [PATCH] x86-64: shorten the x86_64 boot setup GDT to what the comment says
Stephen Tweedie, Herbert Xu, and myself have been struggling with a very
nasty bug in Xen.  But it also pointed out a small bug in the x86_64
kernel boot setup.

The GDT limit being setup by the initial bzImage code when entering into
protected mode is way too big.  The comment by the code states that the
size of the GDT is 2048, but the actual size being set up is much bigger
(32768). This happens simply because of one extra '0'.

Instead of setting up a 0x800 size, 0x8000 is set up.  On bare metal this
is fine because the CPU wont load any segments unless  they are
explicitly used.  But unfortunately, this breaks Xen on vmx FV, since it
(for now) blindly loads all the segments into the VMCS if they are less
than the gdt limit. Since the real mode segments are around 0x3000, we are
getting junk into the VMCS and that later causes an exception.

Stephen Tweedie has written up a patch to fix the Xen side and will be
submitting that to those folks. But that doesn't excuse the GDT limit
being a magnitude too big.

AK: changed to compute true gdt size in assembler, fixed comment

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Andi Kleen
14679eb3c5 [PATCH] x86-64: Fix PTRACE_[SG]ET_THREAD_AREA regression with ia32 emulation.
ptrace(PTRACE_[SG]ET_THREAD_AREA) calls from ia32 code
should be passed onto the x86_64 implementation.

The default case in sys32_ptrace used to call to sys_ptrace(), but is
now EINVAL.  This patch fixes a regression caused by that changed.

Signed-off-by: Mike McCormack <mike@codeweavers.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:46 +01:00
Aaron Durbin
14f448e361 [PATCH] x86-64: Fix partial page check to ensure unusable memory is not being marked usable.
Fix partial page check in e820_register_active_regions to ensure
partial pages are
not being marked as active in the memory pool.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-11-14 16:57:45 +01:00
Andi Kleen
64e72e41ac Revert "[PATCH] MMCONFIG and new Intel motherboards"
This reverts 4c6e052adf commit.

Following Linus' i386 change: revert resource reservation
for mmcfg config now. Will be revisited in .20 hopefully.
2006-11-14 16:56:33 +01:00
Eric W. Biederman
ec68307cc5 [PATCH] htirq: refactor so we only have one function that writes to the chip
This refactoring actually optimizes the code a little by caching the value
that we think the device is programmed with instead of reading it back from
the hardware.  Which simplifies the code a little and should speed things up a
bit.

This patch introduces the concept of a ht_irq_msg and modifies the
architecture read/write routines to update this code.

There is a minor consistency fix here as well as x86_64 forgot to initialize
the htirq as masked.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 18:29:24 -08:00
Linus Torvalds
48797ebd9e x86-64: write IO APIC irq routing entries in correct order
This is the x86-64 version of f9dadfa71b
that did the same thing on i386.

Since the "mask" bit is in the low word, when we write a new entry, we
need to write the high word first, before we potentially unmask it.

The exception is when we actually want to mask the interrupt, in which
case we want to write the low word first to make sure that the high word
doesn't change while the interrupt routing is still active.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 10:27:54 -08:00
Linus Torvalds
6c0ffb9d2f x86-64: clean up io-apic accesses
This is just commit 130fe05dbc ported to
x86-64, for all the same reasons.  It cleans up the IO-APIC accesses in
order to then fix the ordering issues.

We move the accessor functions (that were only used by io_apic.c) out of
a header file, and use proper memory-mapped accesses rather than making
up our own "volatile" pointers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08 10:23:03 -08:00
Albert Cahalan
a7aacdf9ea [PATCH] fix i386 regparm=3 RT signal handlers on x86_64
The recent change to make x86_64 support i386 binaries compiled
with -mregparm=3 only covered signal handlers without SA_SIGINFO.
(the 3-arg "real-time" ones) This is useful for klibc at least.

Signed-off-by: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-30 12:12:21 -08:00
Linus Torvalds
fe31eb6797 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
  PCI: Remove quirk_via_abnormal_poweroff
  PCI: reset pci device state to unknown state for resume
  PCI: x86-64: mmconfig missing printk levels
  PCI: fix pci_fixup_video as it blows up on sparc64
  acpiphp: fix latch status
2006-10-27 15:35:28 -07:00
Andrew Morton
61ce1efe6e [PATCH] vmlinux.lds: consolidate initcall sections
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.

This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-27 15:34:51 -07:00
Dave Jones
3095fc0c97 PCI: x86-64: mmconfig missing printk levels
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-10-27 11:20:33 -07:00
Eric W. Biederman
70a0a5357d [PATCH] x86-64: Only look at per_cpu data for online cpus.
When I generalized __assign_irq_vector I failed to pay attention
to what happens when you access a per cpu data structure for
a cpu that is not online.   It is an undefined case making any
code that does it have undefined behavior as well.

The code still needs to be able to allocate a vector across cpus
that are not online to properly handle combinations like lowest
priority interrupt delivery and cpu_hotplug.  Not that we can do
that today but the infrastructure shouldn't prevent it.

So this patch updates the places where we touch per cpu data
to only touch online cpus, it makes cpu vector allocation
an atomic operation with respect to cpu hotplug, and it updates
the cpu start code to properly initialize vector_irq so we
don't have inconsistencies.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-25 01:00:23 +02:00
Eric W. Biederman
d1752aa884 [PATCH] x86-64: Simplify the vector allocator.
There is no reason to remember a per cpu position of which vector
to try.  Keeping a global position is simpler and more likely to
result in a global vector allocation even if I don't need or require
it.  For level triggered interrupts this means we are less likely to
acknowledge another cpus irq, and cause the level triggered irq to
harmlessly refire.

This simplification makes it easier to only access data structures
of  online cpus, by having fewer special cases to deal with.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-25 01:00:22 +02:00
Muli Ben-Yehuda
cb01fc720c [PATCH] x86-64: increase PHB1 split transaction timeout
This patch increases the timeout for PCI split transactions on PHB1 on
the first Calgary to work around an issue with the aic94xx
adapter. Fixes kernel.org bugzilla #7180
(http://bugzilla.kernel.org/show_bug.cgi?id=7180)

Based on excellent debugging and a patch by Darrick J. Wong
<djwong@us.ibm.com>

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2006-10-22 00:41:15 +02:00
Andi Kleen
aa026ede51 [PATCH] x86-64: Fix C3 timer test
There was a typo in the C3 latency test to decide of the TSC
should be used or not. It used the C2 latency threshold, not the
C3 one. Fix that.

This should fix the time on various dual core laptops.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-22 00:41:15 +02:00
Andi Kleen
e70ea8c09d [PATCH] x86-64: Revert timer routing behaviour back to 2.6.16 state
By default route the 8254 over the 8259 and only disable
it on ATI boards where this causes double timer interrupts.

This should unbreak some Nvidia boards where the timer doesn't
seem to tick of it isn't enabled in the 8259. At least one
VIA board also seemed to have a little trouble with the disabled
8259.

For 2.6.20 we'll try both dynamically without black listing, but I think
for .19 this is the safer approach because it has been already well tested
in earlier kernels. This also makes the x86-64 behaviour the same
as i386.

Command line options can change all this of course.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:03 +02:00
Vivek Goyal
dbaab49f92 [PATCH] x86-64: Overlapping program headers in physical addr space fix
o A recent change to vmlinux.ld.S file broke kexec as now resulting vmlinux
  program headers are overlapping in physical address space.

o Now all the vsyscall related sections are placed after data and after
  that mostly init data sections are placed. To avoid physical overlap
  among phdrs, there are three possible solutions.
	- Place vsyscall sections also in data phdrs instead of user
	- move vsyscal sections after init data in bss.
	- create another phdrs say data.init and move all the sections
	  after vsyscall into this new phdr.

o This patch implements the third solution.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Magnus Damm <magnus@valinux.co.jp>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-21 18:37:03 +02:00
Eric W. Biederman
84f404f695 [PATCH] x86-64: Put more than one cpu in TARGET_CPUS
TARGET_CPUS is the default irq routing poicy.  It specifies which cpus the
kernel should aim an irq at.  In physflat delivery mode we can route an irq to
a single cpu.  But that doesn't mean our default policy should only be a
single cpu is allowed.

By allowing the irq routing code to select from multiple cpus this enables
systems with more irqs then we can service on a single processor to actually
work.

I just audited and tested the code and irqbalance doesn't care, and the
io_apic.c doesn't care if we have extra cpus in the mask.  Everything will use
or assume we are using the lowest numbered cpu in the mask if we can't use
them all.

So this should result in no behavior changes except on systems that need it.

Thanks for YH Lu for spotting this problem in his testing.

Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Andi Kleen
8cf2c51927 [PATCH] x86: Revert new unwind kernel stack termination
Jan convinced me that it was unnecessary because the assembly stubs do
this already on the stack.

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Eric W. Biederman
6bf2dafad1 [PATCH] x86-64: Use irq_domain in ioapic_retrigger_irq
Thanks to YH Lu for spotting this.  It appears I missed this function when I
refactored allocate_irq_vector and introduced irq_domain, with the result that
all retriggered irqs would go to cpu 0 even if we were not prepared to receive
them there.

While reviewing YH's patch I also noticed that this function was missing
locking, and since I am now reading two values from two diffrent arrays that
looks like a race we might be able to hit in the real world.

Cc: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Andi Kleen
581910e2eb [PATCH] x86-64: Revert interrupt backlink changes
They break more than they fix
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Jan Beulich
cc7d479fe5 [PATCH] x86-64: Fix ENOSYS in system call tracing
This patch:

- out of range system calls failing to return -ENOSYS under
  system call tracing

[AK: split out from another patch by Jan as separate bugfix]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Andi Kleen
cdfce1f571 [PATCH] x86: Use -maccumulate-outgoing-args
This avoids some problems with gcc 4.x and earlier generating
invalid unwind information. In 4.1 the option is default
when unwind information is enabled.

And it seems to generate smaller code too, so it's probably
a good thing on its own. With gcc 4.0:

i386:
4683198  902112  480868 6066178  5c9002 vmlinux (before)
4449895  902112  480868 5832875  5900ab vmlinux (after)

x86-64:
4939761 1449584  648216 7037561  6b6279 vmlinux (before)
4854193 1449584  648216 6951993  6a1439 vmlinux (after)

On 4.1 it shouldn't make much difference because it is
default when unwind is enabled anyways.

Suggested by Michael Matz and Jan Beulich

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Vivek Goyal
73bb8919b3 [PATCH] x86-64: fix page align in e820 allocator
Currently some code pieces assume that address returned by find_e820_area()
are page aligned.  But looks like find_e820_area() had no such intention
and hence one might end up stomping over some of the data.  One such case
is bootmem allocator initialization code stomped over bss.

This patch modified find_e820_area() to return page aligned address.  This
might be little wasteful of memory but at the same time probably it is
easier to handle page aligned memory.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-21 18:37:01 +02:00
Corey Minyard
469b1d8741 [PATCH] x86-64: Fix for arch/x86_64/pci/Makefile CFLAGS
The arch/x86_64/pci directory was giving problems in a wierd cross-compile
environment.  The exact cause is unknown, but the Makefile used CFLAGS
instead of EXTRA_CFLAGS.  From what I can tell from
Documentation/kbuild/makefiles.txt, CFLAGS should not be used for this, it
should be EXTRA_CFLAGS.  And it solves the cross-compile problem.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-21 18:37:01 +02:00
Yinghai Lu
45edfd1db0 [PATCH] x86-64: typo in __assign_irq_vector when updating pos for vector and offset
typo with cpu instead of new_cpu

Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
keith mannthey
926fafebc4 [PATCH] x86-64: x86_64 hot-add memory srat.c fix
This patch corrects the logic used in srat.c to figure out what
parsing what action to take when registering hot-add areas.  Hot-add
areas should only be added to the node information for the
MEMORY_HOTPLUG_RESERVE case.  When booting MEMORY_HOTPLUG_SPARSE hot-add
areas on everything but the last node are getting include in the node
data and during kernel boot the pages are setup then the kernel dies
when the pages are used. This patch fixes this issue.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:01 +02:00
Andi Kleen
f248b6a34f [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:00 +02:00
Ingo Molnar
a460e745e8 [PATCH] genirq: clean up irq-flow-type naming
Introduce desc->name and eliminate the handle_irq_name() hack.  Add
set_irq_chip_and_handler_name() to set the flow type and name at once.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-17 08:18:45 -07:00
Venkatesh Pallipadi
83d0515bbb [CPUFREQ][4/8] acpi-cpufreq: Mark speedstep-centrino ACPI as deprecated
Mark ACPI hooks in speedstep-centrino as deprecated. Change the order in which
speedstep-centrino and acpi-cpufreq (when both are in kernel) will be
added. First driver to be tried is now acpi-cpufreq, followed by
speedstep-centrino.

Add a note in feature-removal-schedule to mark this deprecation.

Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-15 19:57:10 -04:00
Venkatesh Pallipadi
991528d734 ACPI: Processor native C-states using MWAIT
Intel processors starting with the Core Duo support
support processor native C-state using the MWAIT instruction.
Refer: Intel Architecture Software Developer's Manual
http://www.intel.com/design/Pentium4/manuals/253668.htm

Platform firmware exports the support for Native C-state to OS using
ACPI _PDC and _CST methods.
Refer: Intel Processor Vendor-Specific ACPI: Interface Specification
http://www.intel.com/technology/iapc/acpi/downloads/302223.htm

With Processor Native C-state, we use 'MWAIT' instruction on the processor
to enter different C-states (C1, C2, C3).  We won't use the special IO
ports to enter C-state and no SMM mode etc required to enter C-state.
Overall this will mean better C-state support.

One major advantage of using MWAIT for all C-states is, with this and
"treat interrupt as break event" feature of MWAIT, we can now get accurate
timing for the time spent in C1, C2, ..  states.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2006-10-14 00:35:39 -04:00
Ravikiran Thirumalai
734c4c6739 [PATCH] Fix build breakage with CONFIG_X86_VSMP
Kernel build breaks with CONFIG_X86_VSMP.  Probably due to some header
file cleanups in 2.6.19-rc1.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-12 12:25:27 -07:00
Eric W. Biederman
994bd4f9f5 [PATCH] x86_64 irq: Properly update vector_irq
This patch fixes my one line thinko where I was clearing
the vector_irq entries on the wrong cpus.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-12 07:37:30 -07:00
Aneesh Kumar K.V
c37e108d15 [PATCH] use struct irq_chip instead of struct hw_interrupt_type
hw_interrupt_type is deprecated in favour of struct irq_chip.

[mingo@elte.hu: do x86_64 too]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:14 -07:00
Mel Gorman
6391af174a [PATCH] mm: use symbolic names instead of indices for zone initialisation
Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns).  The unintended
impact is that ZONE_DMA and ZONE_NORMAL is initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result, the
the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.

The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing.  Two users have successfully booted their
powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.

Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix.  Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:14 -07:00
Al Viro
86f9333654 [PATCH] ptrace32 trivial __user annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-10 15:37:23 -07:00
Eric W. Biederman
d3696cf737 [PATCH] x86_64 irq: Scream but don't die if we receive an unexpected irq
Due to code bugs or misbehaving hardware it is possible that we can
receive an interrupt that we have not mapped into a linux irq.  Calling
BUG when that happens is very rude, and if the problem is mild enough
prevents anything else from getting done.

So instead of calling BUG just scream loudly about the problem and
continue running.  We don't have enough knowledge to know which
interrupt triggered this behavior so we don't acknowledge it.  This will
likely prevent a recurrence of the problem by jamming up the works with
an unacknowledged interrupt.

If the interrupt was something important it is quite possible that
nothing productive will happen past this point.  But it is now at least
possible to keep working if the kernel can survive without the interrupt
we dropped on the floor.

Solutions like irqpoll should generally make dropped irqs non-fatal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-09 14:51:43 -07:00
Eric W. Biederman
c7111c1318 [PATCH] x86_64 irq: Allocate a vector across all cpus for genapic_flat.
The problem we can't take advantage of lowest priority delivery mode if
the vectors are allocated for only one cpu at a time.  Nor can we work
around hardware that assumes lowest priority delivery mode is always
used with several cpus.

So this patch introduces the concept of a vector_allocation_domain.  A
set of cpus that will receive an irq on the same vector.  Currently the
code for implementing this is placed in the genapic structure so we can
vary this depending on how we are using the io_apics.

This allows us to restore the previous behaviour of genapic_flat without
removing the benefits of having separate vector allocation for large
machines.

This should also fix the problem report where a hyperthreaded cpu was
receving the irq on the wrong hyperthread when in logical delivery mode
because the previous behaviour is restored.

This patch properly records our allocation of the first 16 irqs to the
first 16 available vectors on all cpus.  This should be fine but it may
run into problems with multiple interrupts at the same interrupt level.
Except for some badly maintained comments in the code and the behaviour
of the interrupt allocator I have no real understanding of that problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-08 12:24:02 -07:00
Eric W. Biederman
b940d22d58 [PATCH] i386/x86_64: Remove global IO_APIC_VECTOR
Which vector an irq is assigned to now varies dynamically and is
not needed outside of io_apic.c.  So remove the possibility
of accessing the information outside of io_apic.c and remove
the silly macro that makes looking for users of irq_vector
difficult.

The fact this compiles ensures there aren't any more pieces
of the old CONFIG_PCI_MSI weirdness that I failed to remove.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-08 12:24:02 -07:00
Andrew Morton
d150ad7bd9 [PATCH] x86_64 irq_regs fix
smp_apic_timer_interrupt() needs to stack the pt_regs* for profile_tick.

If any other of those APIC interrupt handlers want to run get_irq_regs() then
their C entrypoint handlers will need the same treatment.

Cc: Andi Kleen <ak@muc.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06 13:36:52 -07:00
Linus Torvalds
44aefd2706 Merge git://git.infradead.org/~dhowells/irq-2.6
* git://git.infradead.org/~dhowells/irq-2.6:
  IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
  IRQ: Typedef the IRQ handler function type
  IRQ: Typedef the IRQ flow handler function type
2006-10-05 16:32:01 -07:00
Randy Dunlap
4b0ff1a94c [PATCH] x86-64: Fix compilation without CONFIG_KALLSYMS
Include linux/kallsyms.h unconditionally for print_symbol().

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-05 15:55:15 -07:00
Andi Kleen
7d0b0e8ddb [PATCH] x86-64: Annotate interrupt frame backlink in interrupt handlers
Add correct CFI annotation to the backlink on top of the interrupt stack.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Andi Kleen
0a5ace2ab0 [PATCH] x86-64: Fix FPU corruption
This reverts an earlier patch that was found to cause FPU
state corruption. I think the corruption happens because
unlazy_fpu() can cause FPU exceptions and when it happens
after the current switch some processing would affect
the state in the wrong process.

Thanks to  Douglas Crosher and Tom Hughes for testing.

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Andi Kleen
51ec28e1b2 [PATCH] x86: Terminate the kernel stacks for the unwinder
Always make sure RIP/EIP is 0 in the registers stored on the top
of the stack of a kernel thread. This makes sure the unwinder code
won't try a fallback but knows the stack has ended.

AK: this patch is a bit mysterious. in theory they should be terminated
anyways, but it seems to fix at least one crash. Anyways double termination
probably doesn't hurt.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:22 +02:00
Jon Mason
70d666d6ae [PATCH] x86-64: Calgary IOMMU: print PCI bus numbers in hex
Make the references to the bus number in hex instead of decimal, as
that is the way that lspci prints out the bus numbers.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
d8d2bedf60 [PATCH] x86-64: Calgary IOMMU: Update Jon's contact info
Also add copyright for work done after leaving IBM.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
76fd231717 [PATCH] x86-64: Calgary IOMMU: Fix off by one when calculating register space location
The purpose of the code being modified is to determine the location
of the calgary chip address space.  This is done by a magical formula
of FE0MB-8MB*OneBasedChassisNumber+1MB*(RioNodeId-ChassisBase) to
find the offset where BIOS puts it.  In this formula,
OneBasedChassisNumber corresponds to the NUMA node, and rionodeid is
always 2 or 3 depending on which chip in the system it is.  The
problem was that we had an off by one error that caused us to account
some busses to the wrong chip and thus give them the wrong address
space.

Fixes RH bugzilla #203971.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-bu: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Jon Mason
dedc9937e8 [PATCH] x86-64: Calgary IOMMU: deobfuscate calgary_init
calgary_init's for loop does not correspond to the actual device being
checked, which makes its upperbound check for array overflow useless.
Changing this to a do-while loop is the correct way of doing this.
There should be no possibility of spinning forever in this loop, as
pci_get_device states that it will go through all iterations, then
return NULL (thus breaking the loop).

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
Andi Kleen
a7441a39a3 [PATCH] x86-64: Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-05 18:47:21 +02:00
David Howells
7d12e780e0 IRQ: Maintain regs pointer globally rather than passing to IRQ handlers
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.

The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around.  On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).

Where appropriate, an arch may override the generic storage facility and do
something different with the variable.  On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.

Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions.  Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller.  A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.

I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.

This will affect all archs.  Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

	struct pt_regs *old_regs = set_irq_regs(regs);

And put the old one back at the end:

	set_irq_regs(old_regs);

Don't pass regs through to generic_handle_irq() or __do_IRQ().

In timer_interrupt(), this sort of change will be necessary:

	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);

I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().

Some notes on the interrupt handling in the drivers:

 (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
     the input_dev struct.

 (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
     something different depending on whether it's been supplied with a regs
     pointer or not.

 (*) Various IRQ handler function pointers have been moved to type
     irq_handler_t.

Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
2006-10-05 15:10:12 +01:00
Linus Torvalds
fefd26b3b8 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
  Remove all inclusions of <linux/config.h>

Manually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory.
2006-10-04 09:59:57 -07:00
Eric W. Biederman
95d77884c7 [PATCH] htirq: tidy up the htirq code
This moves the declarations for the architecture helpers into
include/linux/htirq.h from the generic include/linux/pci.h.  Hopefully this
will make this distinction clearer.

htirq.h is included where it is needed.

The dependency on the msi code is fixed and removed.

The Makefile is tidied up.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:30 -07:00
Eric W. Biederman
3b7d1921f4 [PATCH] msi: refactor and move the msi irq_chip into the arch code
It turns out msi_ops was simply not enough to abstract the architecture
specific details of msi.  So I have moved the resposibility of constructing
the struct irq_chip to the architectures, and have two architecture specific
functions arch_setup_msi_irq, and arch_teardown_msi_irq.

For simple architectures those functions can do all of the work.  For
architectures with platform dependencies they can call into the appropriate
platform code.

With this msi.c is finally free of assuming you have an apic, and this
actually takes less code.

The helpers for the architecture specific code are declared in the linux/msi.h
to keep them separate from the msi functions used by drivers in linux/pci.h

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
8b955b0ddd [PATCH] Initial generic hypertransport interrupt support
This patch implements two functions ht_create_irq and ht_destroy_irq for
use by drivers.  Several other functions are implemented as helpers for
arch specific irq_chip handlers.

The driver for the card I tested this on isn't yet ready to be merged.
However this code is and hypertransport irqs are in use in a few other
places in the kernel.  Not that any of this will get merged before 2.6.19

Because the ipath-ht400 is slightly out of spec this code will need to be
generalized to work there.

I think all of the powerpc uses are for a plain interrupt controller in a
chipset so support for native hypertransport devices is a little less
interesting.

However I think this is a half way decent model on how to separate arch
specific and generic helper code, and I think this is a functional model of
how to get the architecture dependencies out of the msi code.

[akpm@osdl.org: Kconfig fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
cd1182f56a [PATCH] genirq: x86_64 irq: Kill irq compression
With more irqs in the system we don't need this.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
f023d764cc [PATCH] genirq: x86_64 irq: Kill gsi_irq_sharing
After raising the number of irqs the system supports this function is no
longer necessary.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
550f2299ac [PATCH] genirq: x86_64 irq: make vector_irq per cpu
This refactors the irq handling code to make the vectors a per cpu resource so
the same vector number can be simultaneously used on multiple cpus for
different irqs.

This should make systems that were hitting limits on the total number of irqs
much more livable.

[akpm@osdl.org: build fix]
[akpm@osdl.org: __target_IO_APIC_irq is unneeded on UP]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:29 -07:00
Eric W. Biederman
e500f57436 [PATCH] genirq: x86_64 irq: Make the external irq handlers report their vector, not the irq number
This is a small pessimization but it paves the way for making this information
per cpu.  Which allows the the maximum number of IRQS to become NR_CPUS*224.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
04b9267b15 [PATCH] genirq: x86_64 irq: Remove the msi assumption that irq == vector
This patch removes the change in behavior of the irq allocation code when
CONFIG_PCI_MSI is defined.  Removing all instances of the assumption that irq
== vector.

create_irq is rewritten to first allocate a free irq and then to assign that
irq a vector.

assign_irq_vector is made static and the AUTO_ASSIGN case which allocates an
vector not bound to an irq is removed.

The ioapic vector methods are removed, and everything now works with irqs.

The definition of NR_IRQS no longer depends on CONFIG_PCI_MSI

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
589e367f9b [PATCH] genirq: x86_64 irq: Move msi message composition into io_apic.c
This removes the hardcoded assumption that irq == vector in the msi
composition code, and it allows the msi message composition to setup logical
mode, or lowest priorirty delivery mode as we do for other apic interrupts,
and with the same selection criteria.

Basically this moves the problem of what is in the msi message into the
architecture irq management code where it belongs.  Not in a generic layer
that doesn't have enough information to compose msi messages properly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
c4fa0bbf38 [PATCH] genirq: x86_64 irq: Dynamic irq support
The current implementation of create_irq() is a hack but it is the current
hack that msi.c uses, and unfortunately the ``generic'' apic msi ops depend on
this hack.  Thus we are this hack of assuming irq == vector until the
depencencies in the generic irq code are removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:28 -07:00
Eric W. Biederman
0be6652f1e [PATCH] genirq: x86_64 irq: Reenable migrating irqs to other cpus
In the latest changes the code for migrating x86_64 irqs was dropped.  This
reads it in a fashion that will work even if we change the vector on level
triggered irqs when we migrate them.

[akpm@osdl.org: build fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rajesh Shah <rajesh.shah@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:26 -07:00
Ingo Molnar
f29bd1ba68 [PATCH] genirq: convert the x86_64 architecture to irq-chips
This patch converts all the x86_64 PIC controllers layers to the new and
simpler irq-chip interrupt handling layer.

[mingo@elte.hu: The patch also enables the fasteoi handler for x86_64]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04 07:55:25 -07:00
Dave Jones
038b0a6d8d Remove all inclusions of <linux/config.h>
kbuild explicitly includes this at build time.

Signed-off-by: Dave Jones <davej@redhat.com>
2006-10-04 03:38:54 -04:00
Matt LaPlante
44c09201a4 more misc typo fixes
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:34:14 +02:00
David Howells
afefdbb28a [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system.  They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example.  The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.

Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.

This patch:

Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.

The stat functions then returns the full 64-bit inode number where
available and where possible.  If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.

Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.

Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.

Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.

It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.

[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:40 -07:00
Andrew Morton
0e4a523fa3 [PATCH] revert "insert IOAPIC(s) and Local APIC into resource map"
Commit 54dbc0c9eb is causing various
people's machines to fail to map PCI resources.

Revert it in preparation for addressing the show-APICs-in-/proc/iomem
requirement in a different manner.

Cc: Aaron Durbin <adurbin@google.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 19:46:18 -07:00
Arnd Bergmann
3db03b4afb [PATCH] rename the provided execve functions to kernel_execve
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly.  Rename these to kernel_execve to
get the right semantics there.  Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Serge E. Hallyn
96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn
0437eb594e [PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c.  This
avoids all arches having to be updated.  Compiles and boots on s390.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Serge E. Hallyn
ab516013ad [PATCH] namespaces: add nsproxy
This patch adds a nsproxy structure to the task struct.  Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
bibo,mao
99219a3fbc [PATCH] kretprobe spinlock deadlock patch
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock.  This patch moves kfree function out scope of
kretprobe_lock.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
bibo,mao
62c27be0dd [PATCH] kprobe whitespace cleanup
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Ananth N Mavinakayanahalli
3a872d89ba [PATCH] Kprobes: Make kprobe modules more portable
In an effort to make kprobe modules more portable, here is a patch that:

o Introduces the "symbol_name" field to struct kprobe.
  The symbol->address resolution now happens in the kernel in an
  architecture agnostic manner. 64-bit powerpc users no longer have
  to specify the ".symbols"
o Introduces the "offset" field to struct kprobe to allow a user to
  specify an offset into a symbol.
o The legacy mechanism of specifying the kprobe.addr is still supported.
  However, if both the kprobe.addr and kprobe.symbol_name are specified,
  probe registration fails with an -EINVAL.
o The symbol resolution code uses kallsyms_lookup_name(). So
  CONFIG_KPROBES now depends on CONFIG_KALLSYMS
o Apparantly kprobe modules were the only legitimate out-of-tree user of
  the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
  happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
o Modify tcp_probe.c that uses the kprobe interface so as to make it
  work on multiple platforms (in its earlier form, the code wouldn't
  work, say, on powerpc)

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Haavard Skinnemoen
16c564bb3c [PATCH] Generic ioremap_page_range: x86_64 conversion
Convert x86_64 to use generic ioremap_page_range()

[akpm@osdl.org: build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:32 -07:00
Atsushi Nemoto
8ef386092d [PATCH] kill wall_jiffies
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.

This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1".  This condition is never met so I
suppose it is just a bug.  I just remove that condition only instead of
kill the whole "if" block.

[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:27 -07:00
Keith Mannthey
45e0b78b05 [PATCH] hot-add-mem x86_64: use CONFIG_MEMORY_HOTPLUG_RESERVE
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid.  This patch allows the fucntion to do
something besides return 0.  It uses the nodes_add infomation to lookup to
node info for a hot add event.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
8c2676a587 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid node fixup
In cases where the acpi memory-add event does not containe the pxm (node)
infomation allow the driver to look up node info based on the address.  The
acpi_get_node call returns -1 if it can't decode the pxm info, this causes
add_memory to panic.  acpi_get_node would have to decode the resource from the
handle (a lenghty proposition).  This seems to be the cleanist point to
interject the hook.

[kamezawa.hiroyu@jp.fujitsu.com: build fixes]
[y-goto@jp.fujitsu.com: build fixes]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
4942e998b4 [PATCH] hot-add-mem x86_64: memory_add_physaddr_to_nid enable
The api for hot-add memory already has a construct for finding nodes based on
an address, memory_add_physaddr_to_nid.  This patch allows the fucntion to do
something besides return 0.  It uses the nodes_add infomation to lookup to
node info for a hot add event.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
71efa8fdc5 [PATCH] hot-add-mem x86_64: Enable SPARSEMEM in srat.c
Enable x86_64 srat.c to share code between both reserve and sparsemem based
add memory paths.  Both paths need the hot-add area node locality infomration
(nodes_add).  This code refactors the code path to allow this.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Keith Mannthey
ec69acbb11 [PATCH] hot-add-mem x86_64: Kconfig changes
Create Kconfig namespace for MEMORY_HOTPLUG_RESERVE and MEMORY_HOTPLUG_SPARSE.
 This is needed to create a disticiton between the 2 paths.  Selecting the
high level opiton of MEMORY_HOTPLUG will get you MEMORY_HOTPLUG_SPARSE if you
have sparsemem enabled or MEMORY_HOTPLUG_RESERVE if you are x86_64 with
discontig and ACPI numa support.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:18 -07:00
Andi Kleen
34596dc9e5 [PATCH] Define vsyscall cache as blob to make clearer that user space shouldn't use it
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Vivek Goyal
120b114237 [PATCH] Re-positioning the bss segment
[AK: This apparently broke some systems, but we need it to fix
a compile problem with old binutils and in theory the patch
is correct. So let's trying reenabling it again.]

o Currently bss segment is being placed somewhere in the middle (after .data)
  section and after bss lots of init section and data sections are coming.
  Is it intentional?

o One side affect of placing bss in the middle is that objcopy keeps the
  bss in raw binary image (vmlinux.bin) hence unnecessarily increasing
  the size of raw binary image. (In my case ~600K). It also increases
  the size of generated bzImage, though the increase is very small
  (896 bytes), probably a very high compression ratio for stream
  of zeros.

o This patch moves the bss at the end hence reducing the size of
  bzImage by 896 bytes and size of vmlinux.bin by 600K.

o This change benefits in the context of relocatable kernel patches. If
  kernel bss is not part of compressed data (vmlinux.bin) then it does
  not have to be decompressed and this area can be used by the decompressor
  for its execution hence keeping the memory requirements bounded and
  decompressor code does not stomp over any other data loaded beyond
  kernel image (As might be the case with bootloaders like kexec).

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
9d0ef4fd61 [PATCH] Use ARRAY_SIZE in setup.c
Based on i386 patch from Bjorn.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
29cbc78b90 [PATCH] x86: Clean up x86 NMI sysctls
Use prototypes in headers
Don't define panic_on_unrecovered_nmi for all architectures

Cc: dzickus@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
013bf2c50e [PATCH] Refactor some duplicated code in mpparse.c
No logic changes

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
d802ab981d [PATCH] Document iommu=panic
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ded318ec80 [PATCH] Fix broken indentation in iommu_setup
No functional changes; only white space.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
ece6684012 [PATCH] Allow disabling DAC using command line options
Might or might not work around some reported bugs on VIA systems.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Andi Kleen
81b999b10b [PATCH] Update defconfig
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:54 +02:00
Atsushi Nemoto
3171a0305d [PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem)
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.

Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update.  Passing ticks
get rid of this redundant calculation.  Also there are another redundancy
pointed out by Martin Schwidefsky.

This cleanup make a barrier added by
5aee405c66 needless.  So this patch removes
it.

As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies.  (This patch does not really
remove wall_jiffies.  It would be another cleanup patch)

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:15 -07:00
Sukadev Bhattiprolu
f400e198b2 [PATCH] pidspace: is_init()
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

	There are a lot of places in the kernel where we test for init
	because we give it special properties.  Most  significantly init
	must not die.  This results in code all over the kernel test
	->pid == 1.

	Introduce is_init to capture this case.

	With multiple pid spaces for all of the cases affected we are
	looking for only the first process on the system, not some other
	process that has pid == 1.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:12 -07:00
Rolf Eike Beer
d6bd3a39f7 [PATCH] Move valid_dma_direction() from x86_64 to generic code
As suggested by Muli Ben-Yehuda this function is moved to generic code as
may be useful for all archs.

[akpm@osdl.org: fix]
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:10 -07:00
Jason Baron
df67b3daea [PATCH] make PROT_WRITE imply PROT_READ
Make PROT_WRITE imply PROT_READ for a number of architectures which don't
support write only in hardware.

While looking at this, I noticed that some architectures which do not
support write only mappings already take the exact same approach.  For
example, in arch/alpha/mm/fault.c:

"
        if (cause < 0) {
                if (!(vma->vm_flags & VM_EXEC))
                        goto bad_area;
        } else if (!cause) {
                /* Allow reads even for write-only mappings */
                if (!(vma->vm_flags & (VM_READ | VM_WRITE)))
                        goto bad_area;
        } else {
                if (!(vma->vm_flags & VM_WRITE))
                        goto bad_area;
        }
"

Thus, this patch brings other architectures which do not support write only
mappings in-line and consistent with the rest.  I've verified the patch on
ia64, x86_64 and x86.

Additional discussion:

Several architectures, including x86, can not support write-only mappings.
The pte for x86 reserves a single bit for protection and its two states are
read only or read/write.  Thus, write only is not supported in h/w.

Currently, if i 'mmap' a page write-only, the first read attempt on that page
creates a page fault and will SEGV.  That check is enforced in
arch/blah/mm/fault.c.  However, if i first write that page it will fault in
and the pte will be set to read/write.  Thus, any subsequent reads to the page
will succeed.  It is this inconsistency in behavior that this patch is
attempting to address.  Furthermore, if the page is swapped out, and then
brought back the first read will also cause a SEGV.  Thus, any arbitrary read
on a page can potentially result in a SEGV.

According to the SuSv3 spec, "if the application requests only PROT_WRITE, the
implementation may also allow read access." Also as mentioned, some
archtectures, such as alpha, shown above already take the approach that i am
suggesting.

The counter-argument to this raised by Arjan, is that the kernel is enforcing
the write only mapping the best it can given the h/w limitations.  This is
true, however Alan Cox, and myself would argue that the inconsitency in
behavior, that is applications can sometimes work/sometimes fails is highly
undesireable.  If you read through the thread, i think people, came to an
agreement on the last patch i posted, as nobody has objected to it...

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:05 -07:00
Eric W. Biederman
b89a81712f [PATCH] sysctl: Allow /proc/sys without sys_sysctl
Since sys_sysctl is deprecated start allow it to be compiled out.  This
should catch any remaining user space code that cares, and paves the way
for further sysctl cleanups.

[akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:19 -07:00
Shaohua Li
9a4b9efa1d [PATCH] x86 microcode: add sysfs and hotplug support
Add sysfs support.  Currently each CPU has three microcode related
attributes.  One is 'version' which shows current ucode version of CPU.
Tools can use the attribute do validation or show CPU ucode status.  one is
'reload' which allows manually reloading ucode.  Another is
'processor_flags', which exports processor flags, so we can write tools to
check if CPU has latest ucode.  Also add suspend/resume and CPU hotplug
support.

[akpm@osdl.org: cleanups, build fix]
[bunk@stusta.de: Kconfig fixes]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:18 -07:00
Shaohua Li
9a3110bf4b [PATCH] x86 microcode: microcode driver cleanup.
Clean up microcode update driver and make it more readable.

[akpm@osdl.org: cleanups]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:18 -07:00
Mel Gorman
fb01439c5b [PATCH] Allow an arch to expand node boundaries
Arch-independent zone-sizing determines the size of a node
(pgdat->node_spanned_pages) based on the physical memory that was
registered by the architecture.  However, when
CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
spanned_pages will be much larger and that mem_map will be allocated that
is used lated on memory hot-add.

This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
to call push_node_boundaries() which will set the node beginning and end to
at *least* the requested boundary.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:12 -07:00
Mel Gorman
9c7cd6877c [PATCH] Account for holes that are outside the range of physical memory
absent_pages_in_range() made the assumption that users of the API would not
care about holes beyound the end of physical memory.  This was not the
case.  This patch will account for ranges outside of physical memory as
holes correctly.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Mel Gorman
0e0b864e06 [PATCH] Account for memmap and optionally the kernel image as holes
The x86_64 code accounted for memmap and some portions of the the DMA zone as
holes.  This was because those areas would never be reclaimed and accounting
for them as memory affects min watermarks.  This patch will account for the
memmap as a memory hole.  Architectures may optionally use set_dma_reserve()
if they wish to account for a portion of memory in ZONE_DMA as a hole.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Mel Gorman
5cb248abf5 [PATCH] Have x86_64 use add_active_range() and free_area_init_nodes
Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:11 -07:00
Linus Torvalds
b278240839 Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 13:07:55 -07:00
Rafael J. Wysocki
75534b50cc [PATCH] Change the name of pagedir_nosave
The name of the pagedir_nosave variable does not make sense any more, so it
seems reasonable to change it to something more meaningful.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:49:01 -07:00
Rafael J. Wysocki
e8eff5ac29 [PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64
On x86_64 machines with more than 2 GB of RAM there are large memory gaps
(with no corresponding kernel virtual addresses) and reserved memory
regions between areas of usable physical RAM.  Moreover, if CONFIG_FLATMEM
is set, they appear within the normal zone.  swsusp should not try to save
them, so the corresponding page structs have to be marked as 'nosave'.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:58 -07:00
Andrew Morton
a3bc0dbc81 [PATCH] smp_call_function_single() cleanup
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.

Move it into <linux/smp.h>.

Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:56 -07:00
Clemens Ladisch
1447c27d38 [PATCH] hpet rtc emulation: add watchdog timer
To prevent the emulated RTC timer from stopping when interrupts are delayed
for too long, disable interrupts around all of the register initialization,
and check that the interrupt handler did not schedule the next interrupt in
the past.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:54 -07:00
Dave McCracken
46a82b2d55 [PATCH] Standardize pxx_page macros
One of the changes necessary for shared page tables is to standardize the
pxx_page macros.  pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address.  pud_page and pgd_page, on the
other hand, return the kernel virtual address.

Shared page tables needs pud_page and pgd_page to return the actual page
structures.  There are very few actual users of these functions, so it is
simple to standardize their usage.

Since this is basic cleanup, I am submitting these changes as a standalone
patch.  Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.

Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:51 -07:00
Christoph Lameter
fb0e7942bd [PATCH] reduce MAX_NR_ZONES: make ZONE_DMA32 optional
Make ZONE_DMA32 optional

- Add #ifdefs around ZONE_DMA32 specific code and definitions.

- Add CONFIG_ZONE_DMA32 config option and use that for x86_64
  that alone needs this zone.

- Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL
  for ia64 and fix up the way per node ZVCs are calculated.

- Fall back to prior GFP_ZONEMASK of 0x03 if there is no
  DMA32 zone.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:46 -07:00
Christoph Lameter
776ed98b84 [PATCH] reduce MAX_NR_ZONES: remove two strange uses of MAX_NR_ZONES
I keep seeing zones on various platforms that are never used and wonder why we
compile support for them into the kernel.  Counters show up for HIGHMEM and
DMA32 that are alway zero.

This patch allows the removal of ZONE_DMA32 for non x86_64 architectures and
it will get rid of ZONE_HIGHMEM for arches not using highmem (like 64 bit
architectures).  If an arch does not define CONFIG_HIGHMEM then ZONE_HIGHMEM
will not be defined.  Similarly if an arch does not define CONFIG_ZONE_DMA32
then ZONE_DMA32 will not be defined.

No current architecture uses all the 4 zones (DMA,DMA32,NORMAL,HIGH) that we
have now.  The patchset will reduce the number of zones for all platforms.

On many platforms that do not have DMA32 or HIGHMEM this will reduce the
number of zones by 50%.  F.e.  ia64 only uses DMA and NORMAL.

Large amounts of memory can be saved for larger systemss that may have a few
hundred NUMA nodes.

With ZONE_DMA32 and ZONE_HIGHMEM support optional MAX_NR_ZONES will be 2 for
many non i386 platforms and even for i386 without CONFIG_HIGHMEM set.

Tested on ia64, x86_64 and on i386 with and without highmem.

The patchset consists of 11 patches that are following this message.

One could go even further than this patchset and also make ZONE_DMA optional
because some platforms do not need a separate DMA zone and can do DMA to all
of memory.  This could reduce MAX_NR_ZONES to 1.  Such a patchset will
hopefully follow soon.

This patch:

Fix strange uses of MAX_NR_ZONES

Sometimes we use MAX_NR_ZONES - x to refer to a zone.  Make that explicit.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 08:48:46 -07:00
Andi Kleen
3f75f42d77 [PATCH] Don't set calgary iommu as default y
Most systems don't need it.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dave Jones
dcf10307c3 [PATCH] i386/x86-64: New Intel feature flags
Add supplemental SSE3 instructions flag, and Direct Cache Access flag.
As described in "Intel Processor idenfication and the CPUID instruction
AP485 Sept 2006"

AK: also added for x86-64

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
3222b36f46 [PATCH] x86: Add a cumulative thermal throttle event counter.
The counter is exported to /sys that keeps track of the
number of thermal events, such that the user knows how bad the
thermal problem might be (since the logging to syslog and mcelog
is rate limited).

AK: Fixed cpu hotplug locking

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Dmitriy Zavin
15d5f83983 [PATCH] x86: Refactor thermal throttle processing
Refactor the event processing (syslog messaging and rate limiting)
into separate file therm_throt.c. This allows consistent reporting
of CPU thermal throttle events.

After ACK'ing the interrupt, if the event is current, the user
(p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
the event. If that function returns 1, the user has the option to log
things further (such as to mce_log in x86_64).

AK: minor cleanup

Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:42 +02:00
Andi Kleen
0637a70a5d [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
Some buggy systems can machine check when config space accesses
happen for some non existent devices.  i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.

This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.

Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
8f60774a11 [PATCH] x86: Move direct PCI scanning functions out of line
Saves about 200 bytes of code space.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
f157cbb1eb [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.

Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
658fdbef66 [PATCH] Don't leak NT bit into next task
SYSENTER can cause a NT to be set which might cause crashes on the IRET
in the next task.

Following similar i386 patch from Linus.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Jan Beulich
adf1423698 [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.

This patch works around it by marking asynchronous exception
frames in contrast normal call frames in the unwind information.  Then teach
the unwinder to decode this.

For normal call frames the unwinder now subtracts one from the address which avoids
this problem.  The standard libgcc unwinder uses the same trick.

It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore.

This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame

[AK: added automatic detection of the new binutils and wrote description]

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
ab2e0b46cb [PATCH] Fix some broken white space in ia32_signal.c
No functional changes
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
536e3ee4fe [PATCH] Initialize argument registers for 32bit signal handlers.
In case the user space was compiled with -mregparm=3
Following i386. Pointed out by Albert Cahalan

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
dd54a11004 [PATCH] Remove all traces of signal number conversion
This was old code that was needed for iBCS and x86-64 never supported that.

Pointed out by Albert Cahalan
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
2049336f60 [PATCH] Don't synchronize time reading on single core AMD systems
We do some additional CPU synchronization in gettimeofday et.al. to make
sure the time stamps are always monotonic over multiple CPUs. But on
single core systems that is not needed. So don't do it.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
9ddab42d1e [PATCH] Remove outdated comment in x86-64 mmconfig code
Cc: gregkh@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Andi Kleen
27fbe5b28a [PATCH] Use string instructions for Core2 copy/clear
It is faster than using a unrolled loop for the use cases the kernel
cares about (cached, sizes typically < 4K)

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Matthew Garrett
35d534a3ff [PATCH] x86: - restore i8259A eoi status on resume
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:41 +02:00
Aaron Durbin
4c6e052adf [PATCH] MMCONFIG and new Intel motherboards
On Sat, Sep 09, 2006 at 04:14:29PM +0200, Andi Kleen wrote:
> [patch] Looks reasonable, but probably not for 2.6.18 because this stuff
> is already too fragile and it is probably too risky to do any big changes now
> since not enough testing time is left. Can you please resubmit
> it with proper description and signed-off-by line? I can queue it for .19 then
>
> -Andi

Patch inserts PCI memory mapped config region(s) into the resource map.  This
will allow for the MMCCONFIG regions to be marked as busy in the iomem
address space as well as the regions(s) showing up in /proc/iomem.

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Aaron Durbin
56dd669a13 [PATCH] Insert GART region into resource map
Patch inserts the GART region into the iomem resource map. The GART will then
be visible within /proc/iomem. It will also allow for other users
utilizing the GART to subreserve the region (agp or IOMMU).

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
9abd79280b [PATCH] i386/x86-64: Only do MCFG e820 check when type 1 works
Needs earlier patch to split type 1 probing from use.

This patch should fix the x86 macs where type 1 PCI config space access
doesn't work, but MCFG does. They also don't have a usable e820 table
so the e820 sanity check failed.

Instead assume now that if type 1 doesn't work then MCFG must work
and don't do the e820 check.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
5e544d618f [PATCH] i386/x86-64: PCI: split probing and initialization of type 1 config space access
First probe if type1/2 accesses work, but then only initialize them at the end.

This is useful for a later patch that needs this information inbetween.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
a15da49deb [PATCH] Fix idle notifiers
Previously exit_idle would be called more often than enter_idle

Now instead of using complicated tests just keep track of it
using the per CPU variable as a flip flop.  I moved the idle state into the
PDA to make the access more efficient.

Original bug report and an initial patch from Stephane Eranian,
but redone by AK.

Cc: Stephane Eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Eric W. Biederman
1c9c0a6ca3 [PATCH] Remove experimental mark of kexec
kexec has been marked experimental for a year now and all
of the serious problems have been worked through.  So it
is time (if not past time) to remove the experimental mark.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
b38337a624 [PATCH] Mark per cpu data initialization __initdata again
Before 2.6.16 this was changed to work around code that accessed
CPUs not in the possible map. But that code should be all fixed now,
so mark it __initdata again.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
26c13f2b5b [PATCH] Check return values of __copy_to_user in uname emulation
Quietens some new warnings
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:40 +02:00
Andi Kleen
95912008ba [PATCH] Add __must_check to copy_*_user
Following i386.

And also fix the two occurrences that caused warnings in arch/x86_64/*

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
3022d734a5 [PATCH] Fix zeroing on exception in copy_*_user
- Don't zero for __copy_from_user_inatomic following i386.
This will prevent spurious zeros for parallel file system writers when
one does a exception
- The string instruction version didn't zero the output on
exception. Oops.

Also I cleaned up the code a bit while I was at it and added a minor
optimization to the string instruction path.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
adurbin@google.com
54dbc0c9eb [PATCH] insert IOAPIC(s) and Local APIC into resource map
This patch places the IOAPIC(s) and the Local APIC specified by ACPI
tables into the resource map. The APICs will then be visible within
/proc/iomem

Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
7b0bda74f7 [PATCH] Fix a PDA warning uncovered by the new type checking
Fix
linux/arch/x86_64/kernel/process.c: In function __switch_to:
linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Andi Kleen
96e540492a [PATCH] Fix a irqcount comment in entry.S
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:39 +02:00
Arjan van de Ven
4f7fd4d7a7 [PATCH] Add the -fstack-protector option to the CFLAGS
Add a feature check that checks that the gcc compiler has stack-protector
support and has the bugfix for PR28281 to make this work in kernel mode.
The easiest solution I could find was to have a shell script in scripts/
to do the detection; if needed we can make this fancier in the future
without making the makefile too complex.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
CC: Sam Ravnborg <sam@ravnborg.org>
2006-09-26 10:52:39 +02:00
Arjan van de Ven
0a42540580 [PATCH] Add the canary field to the PDA area and the task struct
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.

Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Arjan van de Ven
b62a5c740d [PATCH] Add the Kconfig option for the stackprotector feature
This patch adds the config options for -fstack-protector.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
e8c7391de4 [PATCH] Don't use kernel_text_address in oops context
Because it can take spinlocks.

Suggested by Mathieu Desnoyers

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Magnus Damm
4bfaaef01a [PATCH] Avoid overwriting the current pgd (V4, x86_64)
kexec: Avoid overwriting the current pgd (V4, x86_64)

This patch upgrades the x86_64-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.

The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Keith Owens
f574164491 [PATCH] Remove most of the special cases for the debug IST stack
Remove most of the special cases for the debug IST stack.  This is a
follow on clean up patch, it requires the bug fix patch that adds
orig_ist.

Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
53ee11ae0d [PATCH] Optimize PDA accesses slightly
Based on a idea by Jeremy Fitzhardinge:

Replace the volatiles and memory clobbers in the PDA access with
telling gcc about access to a proxy PDA structure that doesn't
actually exist. But the dummy accesses give a defined ordering for
read/write accesses.

Also add some memory barriers to the early GS initialization to
make sure no PDA access is moved before it.

Advantage is some .text savings (probably most from better
code for accessing "current"):

   text    data     bss     dec     hex filename
4845647 1223688  615864 6685199  66020f vmlinux
4837780 1223688  615864 6677332  65e354 vmlinux-pda

1.2% smaller code

Cc:  Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Ian Campbell
f2a9e1dec2 [PATCH] Put .note.* sections into a PT_NOTE segment
This patch updates x86_64 linker script to pack any .note.* sections
into a PT_NOTE segment in the output file.

To do this, we tell ld that we need a PT_NOTE segment.  This requires
us to start explicitly mapping sections to segments, so we also need
to explicitly create PT_LOAD segments for text and data, and map the
sections to them appropriately.  Fortunately, each section will
default to its previous section's segment, so it doesn't take many
changes to vmlinux.lds.S.

The corresponding change is already made for i386 in -mm and I'd like
this patch to join it. The section to segment mappings do change as do
the segment flags so some time in -mm would be good for that reason as
well, just in case.

In particular .data and .bss move from the text segment to the data
segment and .data.cacheline_aligned .data.read_mostly are put in the
data segment instead of a separate one.

I think that it would be possible to exactly match the existing section
to segment mapping and flags but it would be a more intrusive change and
I'm not sure there is a reason for the existing layout other than it is
what you get by default if you don't explicitly specify something else.
If there is a reason for the existing layout then I will of course make
the more intrusive change. If there is no reason we could probably drop
the executable or writable flags from some segments but I don't know how
much attention is paid to them anyway so it might not be worth the
effort.

The vsyscall related sections need to go in a different segment to the
normal data segment and so I invented a "user" segment to contain them.
I believe this should appear to be another data segment as far as the
kernel is concerned so the flags are setup accordingly.

The notes will be used in the Xen paravirt_ops backend to provide
additional information to the domain builder. I am in the process of
converting the xen-unstable kernels and tools over to this scheme at the
moment to support this in the future.

It has been suggested to me that the notes segment should have flags 0
(i.e. not readable) since it is only used by the loader and is not used
at runtime. For now I went with a readable segment since that is what
the i386 patch uses.

AK: dropped NOTES addition right now because the needed infrastructure
for that is not merged yet

Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Eric W. Biederman
26374c7b7d [PATCH] Reload CS when startup_64 is used.
In long mode the %cs is largely a relic.  However there are a few cases
like iret where it matters that we have a valid value.  Without this
patch it is possible to enter the kernel in startup_64 without setting
%cs to a valid value.  With this patch we don't care what %cs value
we enter the kernel with, so long as the cs shadow register indicates
it is a privileged code segment.

Thanks to Magnus Damm for finding this problem and posting the
first workable patch.  I have moved the jump to set %cs down a
few instructions so we don't need to take an extra jump.  Which
keeps the code simpler.

Signed-of-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:38 +02:00
Andi Kleen
8380aabb99 [PATCH] Remove non e820 fallbacks in high level code
Drop support for non e820 BIOS calls to get the memory map.

The boot assembler code still has some support, but not the C code now.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Paolo 'Blaisorblade' Giarrusso
b3698c03eb [PATCH] Fix boot code head.S warning
When compiling a 64-bit kernel on an Ubuntu 6.06 32bit system (whose GCC is also
a cross-compiler for x86_64) I've seen that head.o is compiled as a 64-bit file
(while it should not) and ld complaining about this during linking:
[AK: it happens on all systems with new binutils]

ld: warning: i386:x86-64 architecture of input file
`arch/x86_64/boot/compressed/head.o' is incompatible with i386 output

I've verified that removing -m64 from compilation flags to turn
"-m64 -traditional -m32" into "-traditional -m32" fixes the issue.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
7a0a2dff1c [PATCH] Add a missing check for irq flags tracing in NMI
NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
did it anyways. Add a check.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
aecc63615e [PATCH] Fix coding style and output of the mptable parser
Give the printks a consistent prefix.
Add some missing white space.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
e4251e130d [PATCH] Remove some cruft in apic id checking during processor setup
- Remove a define that was used only once
- Remove the too large APIC ID check because we always support
the full 8bit range of APICs.
- Restructure code a bit to be simpler.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
f2c2cca3ac [PATCH] Remove APIC version/cpu capability mpparse checking/printing
ACPI went to great trouble to get the APIC version and CPU capabilities
of different CPUs before passing them to the mpparser. But all
that data was used was to print it out.  Actually it even faked some data
based on the boot cpu, not on the actual CPU being booted.

Remove all this code because it's not needed.

Cc: len.brown@intel.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
5e6b0bfe5b [PATCH] Use proper accessors to change PSE bits in change_page_attr()
Use normal pte accessors in change_page_attr() to access the PSE
bits.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
df992848f5 [PATCH] Fix pte_exec/mkexec and use it in change_page_attr()
Fix the pte_exec/mkexec page table accessor functions to really
use the NX bit. Previously they only checked the USER bit, but
weren't actually used for anything.

Then use them in change_page_attr() to manipulate the NX bit
properly.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
d3cf7f0615 [PATCH] Remove bogus warning from early_ioremap
It is correct for its only caller right now, but not for possible
future others.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
151f8cc116 [PATCH] Remove safe_smp_processor_id()
And replace all users with ordinary smp_processor_id.  The function
was originally added to get some basic oops information out even
if the GS register was corrupted. However that didn't
work for some anymore because printk is needed to print the oops
and it uses smp_processor_id() already. Also GS register corruptions
are not particularly common anymore.

This also helps the Xen port which would otherwise need to
do this in a special way because it can't access the local APIC.

Cc: Chris Wright <chrisw@sous-sol.org>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Rafael J. Wysocki
34464a5b89 [PATCH] Detect clock skew during suspend
Detect the situations in which the time after a resume from disk would
be earlier than the time before the suspend and prevent them from
happening on x86_64.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andrew Morton
1164c9994f [PATCH] make numa_emulation() __init
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:37 +02:00
Andi Kleen
dbf9272e86 [PATCH] Don't force reserve the 640k-1MB range
From i386 x86-64 inherited code to force reserve the 640k-1MB area.
That was needed on some old systems.

But we generally trust the e820 map to be correct on 64bit systems
and mark all areas that are not memory correctly.

This patch will allow to use the real memory in there.

Or rather the only way to find out if it's still needed is to
try. So far I'm optimistic.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Magnus Damm
ed77504b20 [PATCH] mark init_amd() as __cpuinit
The init_amd() function is only called from identify_cpu() which is already
marked as __cpuinit. So let's mark it as __cpuinit.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Keith Mannthey
6ad9165811 [PATCH] x86_64 kernel mapping fix
Fix for the x86_64 kernel mapping code.  Without this patch the update path
only inits one pmd_page worth of memory and tramples any entries on it.  now
the calling convention to phys_pmd_init and phys_init is to always pass a
[pmd/pud] page not an offset within a page.

Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Andrew Morton
abf0f10948 [PATCH] wire up oops_enter()/oops_exit()
Implement pause_on_oops() on x86_64.

AK: I redid the patch to do the oops_enter/exit in the existing
oops_begin()/end(). This makes it much shorter.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Arjan van de Ven
e07e23e1fd [PATCH] non lazy "sleazy" fpu implementation
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily.  This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time).  However for very frequent FPU users...  you
take an extra trap every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch.  If the app indeed uses the FPU, the
trap is avoided.  (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).

After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again).  The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.

[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:36 +02:00
Brice Goglin
40bee2ee73 [PATCH] fix bus numbering format in mmconfig warning
Make an mmconfig warning print the bus id with a regular format.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:35 +02:00
Eric W. Biederman
ba4d40bb5c [PATCH] Auto size the per cpu area.
Now for a completely different but trivial approach.
I just boot tested it with 255 CPUS and everything worked.

Currently everything (except module data) we place in
the per cpu area we know about at compile time.  So
instead of allocating a fixed size for the per_cpu area
allocate the number of bytes we need plus a fixed constant
for to be used for modules.

It isn't perfect but it is much less of a pain to
work with than what we are doing now.

AK: fixed warning

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
273819a2d9 [PATCH] make fault notifier unconditional and export it
It's needed for external debuggers and overhead is very small.

Also make the actual notifier chain they use static

Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
2717941c6a [PATCH] Make boot_param_data pure BSS
Since it's all zero.

Actually I think gcc 4+ will do that automatically, but earlier compilers won't
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Andi Kleen
1edf777803 [PATCH] i386/x86-64: Improve Kconfig description of CRASH_DUMP
Improve Kconfig description of CONFIG_CRASH_DUMP. Previously
it was too brief to be useful.

Cc: vgoyal@in.ibm.com
Cc: ebiederm@xmission.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:35 +02:00
Dimitri Sivanich
cbf9b4bb76 [PATCH] X86_64 monotonic_clock goes backwards
I've noticed some erratic behavior while testing the X86_64 version
of monotonic_clock().

While spinning in a loop reading monotonic clock values (pinned to a
single cpu) I noticed that the difference between subsequent values
occasionally went negative (time going backwards).

I found that in the following code:
                this_offset = get_cycles_sync();
                /* FIXME: 1000 or 1000000? */
-->             offset = (this_offset - last_offset)*1000 / cpu_khz;
        }
        return base + offset;

the offset sometimes turns out to be 0, even though
this_offset > last_offset.

+Added fix From: Toyo Abe <toyoa@mvista.com>

The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
the updating of monotonic_base. It now uses cycles_2_ns().

I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
Because cycles_2_ns() can be called from the timer ISR right after the irq0
is enabled.

Signed-off-by: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Prasanna S.P
d28c4393a7 [PATCH] x86: error_code is not safe for kprobes
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.

AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.

+From: Jeremy Fitzhardinge <jeremy@goop.org>

KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.

Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup.  Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup).  It's
surprising this didn't cause more havok.

The fix is to use .pushsection/.popsection, so this stuff nests
properly.  A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.

+From: Chuck Ebbert <76306.1226@compuserve.com>

Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.

Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
be7a91709b [PATCH] Check for end of stack trace before falling back
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
c0b766f13d [PATCH] Merge stacktrace and show_trace
This unifies the standard backtracer and the new stacktrace
in memory backtracer. The standard one is converted to use callbacks
and then reimplement stacktrace using new callbacks.

The main advantage is that stacktrace can now use the new dwarf2 unwinder
and avoid false positives in many cases.

I kept it simple to make sure the standard backtracer stays reliable.

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
b7f5e3c774 [PATCH] Don't access the APIC in safe_smp_processor_id when it is not mapped yet
Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
uses safe_smp_processor_id which tries to access the local APIC page.
But that doesn't work before the APIC code has set up its fixmap.

Check for this case and always return boot cpu then.

Cc: jbeulich@novell.com
Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Andi Kleen
5a1b3999d6 [PATCH] x86: Some preparationary cleanup for stack trace
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)

Cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:34 +02:00
Muli Ben-Yehuda
4ea8a5d8b5 [PATCH] Calgary IOMMU: eradicate sole remaining 80 chars per line offender
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
4ccf4ae314 [PATCH] remove tce_cache_blast_stress()
tce_cache_blast_stress was useful during bringup to stress the IOMMU's
cache flushing. Now that we quiesce DMAs on every cache flush, using
_stress() brings the machine down to its knees once you put it under
load. Remove this debug / bringup code that isn't useful anymore
completely.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
796e4390e0 [PATCH] only verify the allocation bitmap if CONFIG_IOMMU_DEBUG is on
Introduce new function verify_bit_range(). Define two versions, one
for CONFIG_IOMMU_DEBUG enabled and one for disabled. Previously we
were checking that the bitmap was consistent every time we allocated
or freed an entry in the TCE table, which is good for debugging but
incurs an unnecessary penalty on non debug builds.

Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Muli Ben-Yehuda
de684652f3 [PATCH] print whether CONFIG_IOMMU_DEBUG is enabled
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Chuck Ebbert
2ade2920dc [PATCH] i386/x86-64: rename is_at_popf(), add iret to tests and fix
is_at_popf() needs to test for the iret instruction as well as
popf.  So add that test and rename it to is_setting_trap_flag().

Also change max insn length from 16 to 15 to match reality.

LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Fernando Luis Vázquez Cao
2b94ab2fd5 [PATCH] Replace local_save_flags+local_irq_disable with
The combination of "local_save_flags" and "local_irq_disable" seems to be
equivalent to "local_irq_save" (see code snips below). Consequently, replace
occurrences of local_save_flags+local_irq_disable with local_irq_save.

* local_irq_save
#define raw_local_irq_save(flags) \
                do { (flags) = __raw_local_irq_save(); } while (0)

static inline unsigned long __raw_local_irq_save(void)
{
        unsigned long flags = __raw_local_save_flags();

        raw_local_irq_disable();

        return flags;
}

* local_save_flags
#define raw_local_save_flags(flags) \
                do { (flags) = __raw_local_save_flags(); } while (0)

Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
52d522f53f [PATCH] Fix sparse warnings in compat aout code
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
ddb15ec130 [PATCH] Fix most sparse warnings in sys_ia32.c
Mostly by adding casts.

I didn't touch the "invalid access past ..." which are caused
by the sigset conversion.
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
dd2994f619 [PATCH] Add sparse annotations to quiet sparse in arch/x86_64/mm/fault.c
Fixes

linux/arch/x86_64/mm/fault.c:125:7: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:125:7:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:125:7:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:163:8: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:163:8:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:163:8:    got unsigned char *[assigned] instr
linux/arch/x86_64/mm/fault.c:179:9: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/mm/fault.c:179:9:    expected void [noderef] *<noident><asn:1>
linux/arch/x86_64/mm/fault.c:179:9:    got unsigned long *<noident>

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
131cfd7bd5 [PATCH] Add sparse annotation to vsyscall.c
Fixes

linux/arch/x86_64/kernel/vsyscall.c:276:7: warning: constant 0x0f40000000000 is so big it is long
linux/arch/x86_64/kernel/vsyscall.c:80:14: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:80:14:    expected void const volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:80:14:    got void *<noident>
linux/arch/x86_64/kernel/vsyscall.c:200:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:200:7:    expected unsigned short [usertype] *map1
linux/arch/x86_64/kernel/vsyscall.c:200:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:203:7: warning: incorrect type in assignment (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:203:7:    expected unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:203:7:    got void [noderef] *<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:215:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:215:10:    got unsigned short [usertype] *map2
linux/arch/x86_64/kernel/vsyscall.c:217:10: warning: incorrect type in argument 1 (different address spaces)
linux/arch/x86_64/kernel/vsyscall.c:217:10:    expected void volatile [noderef] *addr<asn:2>
linux/arch/x86_64/kernel/vsyscall.c:217:10:    got unsigned short [usertype] *map1

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
3bd4d18cba [PATCH] Move e820 map into e820.c
Minor cleanup. Keep setup.c free from unrelated clutter.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
c31fbb1ad8 [PATCH] Clean up acpi_numa variable
Move it into srat.c No need to clutter up setup.c for it

And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
df3bb57d2c [PATCH] i386/x86-64: Move acpi_disabled variables into acpi/boot.c
Removes code duplication between i386/x86-64.

Not needed anymore in setup.c since early_param cleanup

Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:33 +02:00
Andi Kleen
43c85c9c5d [PATCH] Remove need for early lockdep init
I think it was only needed for the printks and we can do them later.

I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)

This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.

cc: mingo@elte.hu

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00
Andi Kleen
2c8c0e6b8d [PATCH] Convert x86-64 to early param
Instead of hackish manual parsing

Requires earlier i386 patchkit, but also fixes i386 early_printk again.

I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)

Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)

This cleans up a lot of code.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:32 +02:00