Add a default poll idle state with 0 latency. Provides an option to users
to use poll_idle by using 0 as the latency requirement.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
lockdep just caught this one:
=================================
[ INFO: inconsistent lock state ]
2.6.24 #38
---------------------------------
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
(pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250
{in-softirq-W} state was registered at:
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 394559
hardirqs last enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0
hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0
softirqs last enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0
softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30
other info that might help us debug this:
no locks held by swapper/1.
stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24 #38
Call Trace:
[<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190
[<ffffffff8024f55d>] mark_lock+0x53d/0x560
[<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0
[<ffffffff80250ba8>] lock_acquire+0xa8/0xe0
[<ffffffff8022a9ea>] ? mm_init+0x1da/0x250
[<ffffffff809bcd10>] _spin_lock+0x30/0x70
[<ffffffff8022a9ea>] mm_init+0x1da/0x250
[<ffffffff8022aa99>] mm_alloc+0x39/0x50
[<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0
[<ffffffff8028d12b>] do_execve+0x7b/0x220
[<ffffffff80209776>] sys_execve+0x46/0x70
[<ffffffff8020c214>] kernel_execve+0x64/0xd0
[<ffffffff8020901e>] ? _stext+0x1e/0x20
[<ffffffff802090ba>] init_post+0x9a/0xf0
[<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0
[<ffffffff8020c1a8>] ? child_rip+0xa/0x12
[<ffffffff8020bcbc>] ? restore_args+0x0/0x44
[<ffffffff8020c19e>] ? child_rip+0x0/0x12
turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe
way for almost two years, since commit 8c914cb704.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pavel Emelyanov reported that his networking card did not work
and bisected it down to:
"
The commit
093af8d7f0
x86_32: trim memory by updating e820
broke my e1000 card: on loading driver says that
e1000: probe of 0000:04:03.0 failed with error -5
and the interface doesn't appear.
"
on a 32-bit kernel, base will overflow when try to do PAGE_SHIFT,
and highest_addr will always less 4G.
So use pfn instead of address to avoid the overflow when more than
4g RAM is installed on a 32-bit kernel.
Many thanks to Pavel Emelyanov for reporting and testing it.
Bisected-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay the CPA self-test so that any impact (corruption) of
user-space pagetables can be triggered. Repeat the test
every 30 seconds.
this would have prevented the bug fixed by 8cb2a7c1e9,
at its source.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The .rodata section really should just be read only; the config option
is there to make breaking up the 2Mb page an option (so people whos machines
give more performance for the 2Mb case can opt to do so).
But when the page gets split anyway, this is no longer an issue, so
clean up the code and remove the ifdefs
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The .rodata section shouldn't just be read-only,
but also non-executable. This is free since we've broken
up the 2MB page already anyway.
also update test_nx to check for this.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This change broke recovery of exceptions in iret:
commit 72fe485854
Author: Glauber de Oliveira Costa <gcosta@redhat.com>
x86: replace privileged instructions with paravirt macros
The ENTRY(native_iret) macro adds alignment padding before the iretq
instruction, so "iret_label" no longer points exactly at the instruction.
It was sloppy to leave the old "iret_label" label behind when replacing
its nearby use. Removing it would have revealed the other use of the
label later in the file, and upon noticing that use, anyone exercising
the minimum of attention to detail expected of anyone touching this
subtle code would realize it needed to change as well.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:7: warning: symbol 'hi' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:6: originally declared here
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:15: warning: symbol 'lo' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:14: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This was being used to ensure the proper alignment of the FXSAVE/FXRSTOR data.
This would create a sparse error in the _correct_ cases, hiding further
warnings. Use BUILD_BUG_ON instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In my revamp of the x86 ptrace code for setting register values,
I accidentally omitted a check that was there in the old code.
Allowing %cs to be 0 causes a bad crash in recovery from iret failure.
This patch fixes that regression against 2.6.24, and adds a comment
that should help prevent this subtlety from being overlooked again.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In very rare cases, on certain CPUs, we could end up in the spurious
fault handler and ignore a large pud/pmd mapping. The resulting pte
pointer points into the mapped physical space and dereferencing it
will fault recursively.
Make the code aware of large mappings and do the permission check
on the pmd/pud entry, when a large pud/pmd mapping is detected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the x86-64 behavior for 32-bit processes that set
bogus %cs/%ss values (the only ones that can fault in iret)
match what the native i386 behavior is. (do not kill the task
via do_exit but generate a SIGSEGV signal)
[ tglx@linutronix.de: build fix ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
That patch adds the RTC emulation of the HPET timer to the new RTC_DRV_CMOS.
The old drivers/char/rtc.ko driver had that functionality and it's important
on new systems.
[akpm@linux-foundation.org: unbreak alpha build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
calibrate_delay() must be __cpuinit, not __{dev,}init.
I've verified that this is correct for all users.
While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>
This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:
WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- All implementations can be __devinit
- The function prototypes were in asm/timex.h but they all must be the same,
so create a single declaration in linux/timex.h.
- uninline the sparc64 version to match the other architectures
- Don't bother #defining ARCH_HAS_READ_CURRENT_TIMER to a particular value.
[ezk@cs.sunysb.edu: fix build]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When change_page_attr splits a large page on x86_32 (without PAE), it is
currently corrupting every process's page directory: fix that by removing
the thinko which passes down a physical instead of a virtual address.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp: remove flush_agp_mappings calls from new flush handling code
intel-agp: introduce IS_I915 and do some cleanups..
[intel_agp] fix name for G35 chipset
intel-agp: fixup resource handling in flush code.
intel-agp: add new chipset ID
agp: remove unnecessary pci_dev_put
agp: remove uid comparison as security check
fix AGP warning
agp/intel: Add chipset flushing support for i8xx chipsets.
intel-agp: add chipset flushing support
agp: add chipset flushing support to AGP interface
(with Martin Schwidefsky <schwidefsky@de.ibm.com>)
The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
first argument. The free functions do not get the mm_struct argument. This
is 1) asymmetrical and 2) to do mm related page table allocations the mm
argument is needed on the free function as well.
[kamalesh@linux.vnet.ibm.com: i386 fix]
[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This kills unused __clear_bit_string and find_next_zero_string (they
were used by only gart and calgary IOMMUs).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Muli Ben-Yehuda <mulix@mulix.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wires up the new timerfd API to the x86 family.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kvm/x86.c: In function 'emulator_cmpxchg_emulated':
arch/x86/kvm/x86.c:1746: warning: passing argument 2 of 'vcpu->arch.mmu.gva_to_gpa' makes integer from pointer without a cast
arch/x86/kvm/x86.c:1746: warning: 'addr' is used uninitialized in this function
Is true. Local variable `addr' shadows incoming arg `addr'. Avi is on
vacation for a while, so...
Cc: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for flushing the chipsets on the 915, 945, 965 and G33
families of Intel chips.
The BIOS doesn't seem to always allocate the BAR on the 965 chipsets
so I have to use pci resource code to create a resource
It adds an export for pcibios_align_resource.
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (25 commits)
virtio: balloon driver
virtio: Use PCI revision field to indicate virtio PCI ABI version
virtio: PCI device
virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
virtio_blk: Dont waste major numbers
virtio_blk: provide getgeo
virtio_net: parametrize the napi_weight for virtio receive queue.
virtio: free transmit skbs when notified, not on next xmit.
virtio: flush buffers on open
virtnet: remove double ether_setup
virtio: Allow virtio to be modular and used by modules
virtio: Use the sg_phys convenience function.
virtio: Put the virtio under the virtualization menu
virtio: handle interrupts after callbacks turned off
virtio: reset function
virtio: populate network rings in the probe routine, not open
virtio: Tweak virtio_net defines
virtio: Net header needs hdr_len
virtio: remove unused id field from struct virtio_blk_outhdr
virtio: clarify NO_NOTIFY flag usage
...
Use set_pte() for setting up the 2MB pages in the direct mapping.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pci-gart needs to unmap the IOMMU aperture to prevent cache corruptions.
Switch this over to using set_memory_np() instead of clear_kernel_mapping().
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
pud and pmd entries in the RAM area might be marked as non present.
Do not try to modify them in the selftest.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the readout of the large entry into the spinlock section to
prevent an unlikely but possible race.
Mark the pmd/pud entry present after the split. We preserved the
non present bit in the new split mapping.
Remove the stale gfp_flags double initialization.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
lookup_address() returns a wrong level and a wrong pointer to a non
existing pte, when pmd or pud entries are marked !present. This
happens for example due to boot time mapping of GART into the low
memory space.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
An Athlon 64 X2 test system showed hard hangs shortly after marking
the kernel text read-only, if we tried to preserve largepages and
changed the PSE entry from RW to RO. The pagetable code itself is
correct, it's the CPU that locked up hard (and not even the NMI
watchdog could punch through that hard hang).
So be conservative and always do splitups - like we did in the past.
Signed-off-by: Ingo Molnar <mingo@elte.hu>