lockdep just caught this one:
=================================
[ INFO: inconsistent lock state ]
2.6.24 #38
---------------------------------
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
(pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250
{in-softirq-W} state was registered at:
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 394559
hardirqs last enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0
hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0
softirqs last enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0
softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30
other info that might help us debug this:
no locks held by swapper/1.
stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24 #38
Call Trace:
[<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190
[<ffffffff8024f55d>] mark_lock+0x53d/0x560
[<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0
[<ffffffff80250ba8>] lock_acquire+0xa8/0xe0
[<ffffffff8022a9ea>] ? mm_init+0x1da/0x250
[<ffffffff809bcd10>] _spin_lock+0x30/0x70
[<ffffffff8022a9ea>] mm_init+0x1da/0x250
[<ffffffff8022aa99>] mm_alloc+0x39/0x50
[<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0
[<ffffffff8028d12b>] do_execve+0x7b/0x220
[<ffffffff80209776>] sys_execve+0x46/0x70
[<ffffffff8020c214>] kernel_execve+0x64/0xd0
[<ffffffff8020901e>] ? _stext+0x1e/0x20
[<ffffffff802090ba>] init_post+0x9a/0xf0
[<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0
[<ffffffff8020c1a8>] ? child_rip+0xa/0x12
[<ffffffff8020bcbc>] ? restore_args+0x0/0x44
[<ffffffff8020c19e>] ? child_rip+0x0/0x12
turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe
way for almost two years, since commit 8c914cb704.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix build bug:
drivers/virtio/virtio_balloon.c: In function 'fill_balloon':
drivers/virtio/virtio_balloon.c:98: error: implicit declaration of function 'msleep'
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pavel Emelyanov reported that his networking card did not work
and bisected it down to:
"
The commit
093af8d7f0
x86_32: trim memory by updating e820
broke my e1000 card: on loading driver says that
e1000: probe of 0000:04:03.0 failed with error -5
and the interface doesn't appear.
"
on a 32-bit kernel, base will overflow when try to do PAGE_SHIFT,
and highest_addr will always less 4G.
So use pfn instead of address to avoid the overflow when more than
4g RAM is installed on a 32-bit kernel.
Many thanks to Pavel Emelyanov for reporting and testing it.
Bisected-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay the CPA self-test so that any impact (corruption) of
user-space pagetables can be triggered. Repeat the test
every 30 seconds.
this would have prevented the bug fixed by 8cb2a7c1e9,
at its source.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The .rodata section really should just be read only; the config option
is there to make breaking up the 2Mb page an option (so people whos machines
give more performance for the 2Mb case can opt to do so).
But when the page gets split anyway, this is no longer an issue, so
clean up the code and remove the ifdefs
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The .rodata section shouldn't just be read-only,
but also non-executable. This is free since we've broken
up the 2MB page already anyway.
also update test_nx to check for this.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This change broke recovery of exceptions in iret:
commit 72fe485854
Author: Glauber de Oliveira Costa <gcosta@redhat.com>
x86: replace privileged instructions with paravirt macros
The ENTRY(native_iret) macro adds alignment padding before the iretq
instruction, so "iret_label" no longer points exactly at the instruction.
It was sloppy to leave the old "iret_label" label behind when replacing
its nearby use. Removing it would have revealed the other use of the
label later in the file, and upon noticing that use, anyone exercising
the minimum of attention to detail expected of anyone touching this
subtle code would realize it needed to change as well.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:7: warning: symbol 'hi' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:6: originally declared here
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:830:15: warning: symbol 'lo' shadows an earlier one
arch/x86/kernel/cpu/cpufreq/powernow-k8.c:824:14: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This was being used to ensure the proper alignment of the FXSAVE/FXRSTOR data.
This would create a sparse error in the _correct_ cases, hiding further
warnings. Use BUILD_BUG_ON instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In my revamp of the x86 ptrace code for setting register values,
I accidentally omitted a check that was there in the old code.
Allowing %cs to be 0 causes a bad crash in recovery from iret failure.
This patch fixes that regression against 2.6.24, and adds a comment
that should help prevent this subtlety from being overlooked again.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
based on similar patch from: Pavel Machek <pavel@ucw.cz>
Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free
(but not obliged to) randomize the brk area.
Heap randomization breaks ancient binaries, so we keep COMPAT_BRK
enabled by default.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is a check in sys_brk(), that tries to make sure that we do not
underflow the area that is dedicated to brk heap.
The check is however wrong, as it assumes that brk area starts immediately
after the end of the code (+bss), which is wrong for example in
environments with randomized brk start. The proper way is to check whether
the address is not below the start_brk address.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In very rare cases, on certain CPUs, we could end up in the spurious
fault handler and ignore a large pud/pmd mapping. The resulting pte
pointer points into the mapped physical space and dereferencing it
will fault recursively.
Make the code aware of large mappings and do the permission check
on the pmd/pud entry, when a large pud/pmd mapping is detected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify the x86-64 behavior for 32-bit processes that set
bogus %cs/%ss values (the only ones that can fault in iret)
match what the native i386 behavior is. (do not kill the task
via do_exit but generate a SIGSEGV signal)
[ tglx@linutronix.de: build fix ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'async-tx-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop:
async_tx: allow architecture specific async_tx_find_channel implementations
async_tx: replace 'int_en' with operation preparation flags
async_tx: kill tx_set_src and tx_set_dest methods
async_tx: kill ASYNC_TX_ASSUME_COHERENT
iop-adma: use LIST_HEAD instead of LIST_HEAD_INIT
async_tx: use LIST_HEAD instead of LIST_HEAD_INIT
async_tx: fix compile breakage, mark do_async_xor __always_inline
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
SELinux: Remove security_get_policycaps()
security: allow Kconfig to set default mmap_min_addr protection
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix.c:piix_init_one() must be __devinit
sata_via.c: Remove missleading comment.
libata-core: unblacklist HITACHI drives
sata_nv: fix ATAPI issues with memory over 4GB (v7)
ata: drivers/ata/sata_mv.c needs dmapool.h
libata: kill now unused n_iter and fix sata_fsl
ahci: fix CAP.NP and PI handling
sata_mv: Support SoC controllers
Rename: linux/pata_platform.h to linux/ata_platform.h
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
virtio net: fix oops on interface-up
Fix PHY Lib support for gianfar and ucc_geth
forcedeth: preserve registers
forcedeth: phy status fix
forcedeth: restart tx/rx
ipvs: Make wrr "no available servers" error message rate-limited
[PPPOL2TP]: Label unused warning when CONFIG_PROC_FS is not set.
[NET_SCHED]: cls_flow: support classification based on VLAN tag
[VLAN]: Constify skb argument to vlan_get_tag()
[NET_SCHED]: cls_flow: fix key mask validity check
[NET_SCHED]: em_meta: fix compile warning
b43: Fix DMA for 30/32-bit DMA engines
b43: fix build with CONFIG_SSB_PCIHOST=n
mac80211: Is not EXPERIMENTAL anymore
iwl3945-base.c: fix off-by-one errors
b43legacy: fix DMA slot resource leakage
b43legacy: drop packets we are not able to encrypt
b43legacy: fix suspend/resume
b43legacy: fix PIO crash
Generic HDLC - use random_ether_addr()
...
Warning is reproducible with selected FB_CFB_REV_PIXELS_IN_BYTE.
CC drivers/video/sysfillrect.o
In file included from drivers/video/sysfillrect.c:18:
drivers/video/fb_draw.h: In function `fb_rev_pixels_in_long':
drivers/video/fb_draw.h:94: warning: no return statement in function returning non-void
CC drivers/video/syscopyarea.o
In file included from drivers/video/syscopyarea.c:22:
drivers/video/fb_draw.h: In function `fb_rev_pixels_in_long':
drivers/video/fb_draw.h:94: warning: no return statement in function returning non-void
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Include linux/delay.h to fix compiler error:
drivers/virtio/virtio_balloon.c: In function 'fill_balloon':
drivers/virtio/virtio_balloon.c:98: error: implicit declaration of function 'msleep'
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We cannot start transaction in ext3_direct_IO() and just let it last during
the whole write because dio_get_page() acquires mmap_sem which ranks above
transaction start (e.g. because we have dependency chain
mmap_sem->PageLock->journal_start, or because we update atime while holding
mmap_sem) and thus deadlocks could happen. We solve the problem by
starting a transaction separately for each ext3_get_block() call.
We *could* have a problem that we allocate a block and before its data are
written out the machine crashes and thus we expose stale data. But that
does not happen because for hole-filling generic code falls back to
buffered writes and for file extension, we add inode to orphan list and
thus in case of crash, journal replay will truncate inode back to the
original size.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move a few kernel-only things into __KERNEL__.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ext[234]_bg_has_super() to remove duplicate code.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The argument chain for ext[234]_find_goal() is not used. This patch removes
it and fixes comment as well.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ext[234]_get_group_desc() to get group descriptor from group number.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment in ext[234]_new_blocks() describes about "i". But there is no
local variable called "i" in that scope. I guess it has been renamed to
group_no.
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext3 file system was by default ignoring errors and continuing. This is
not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system readonly. Debian
and ubuntu already does this as the default in their fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes some instances where we were continuing after calling
ext3_error. ext3_error calls panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext3_error call
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__journal_abort_hard() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
None of the callers of this function does actually take the BKL as far as I
can see. So remove the comment refering to the BKL.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: <linux-ext4@vger.kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No BKL used anywhere, so don't mention it.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I checked ext2_ioctl and could not find anything in there that would need the
BKL. So convert it over to use unlocked_ioctl
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a new block bitmap is read from disk in read_block_bitmap() there are a
few bits that should ALWAYS be set. In particular, the blocks given
corresponding to block bitmap, inode bitmap and inode tables. Validate the
block bitmap against these blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a new block bitmap is read from disk in read_block_bitmap() there are a
few bits that should ALWAYS be set. In particular, the blocks given
corresponding to block bitmap, inode bitmap and inode tables. Validate the
block bitmap against these blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some Supermicro BIOSes describe a SATA PCI BAR as a motherboard resource.
The PNP system driver claims motherboard resources, and this prevents the
sata_nv driver from requesting it later.
This patch disables the PNP0C01/PNP0C02 resources so they won't be claimed
by the PNP system driver, so they'll available for sata_nv.
This fixes the bugs below, where sata_nv detects only two out of four SATA
drives. The signature includes dmesg lines similar to these:
pnp: 00:09: iomem range 0xdfefc000-0xdfefcfff has been reserved
pnp: 00:09: iomem range 0xdfefd000-0xdfefd3ff has been reserved
pnp: 00:09: iomem range 0xdfefe000-0xdfefe3ff has been reserved
PCI: Unable to reserve mem region #6:1000@dfefd000 for device 0000:80:07.0
sata_nv: probe of 0000:80:07.0 failed with error -16
PCI: Unable to reserve mem region #6:1000@dfefe000 for device 0000:80:08.0
sata_nv: probe of 0000:80:08.0 failed with error -16
References:
https://bugzilla.redhat.com/show_bug.cgi?id=280641https://bugzilla.redhat.com/show_bug.cgi?id=313491http://lkml.org/lkml/2008/1/9/449http://thread.gmane.org/gmane.linux.acpi.devel/27312
This is post-2.6.24 material.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PNP_DRIVER_RES_DO_NOT_CHANGE flag is meant to signify that the PNP core
should not change resources for the device -- not that it shouldn't
disable/enable the device on suspend/resume.
ALSA ISAPnP drivers set PNP_DRIVER_RES_DO_NOT_CHANAGE (0x0001) through
setting PNP_DRIVER_RES_DISABLE (0x0003). The latter including the former
may in itself be considered rather unexpected but doesn't change that
suspend/resume wouldn't seem to have any business testing the flag.
As reported by Ondrej Zary for snd-cs4236, ALSA driven ISAPnP cards don't
survive swsusp hibernation with the resume skipping setting the resources
due to testing the flag -- the same test in the suspend path isn't enough
to keep hibernation from disabling the card it seems.
These tests were added (in 2005) by Piere Ossman in commit
68094e3251, "alsa: Improved PnP suspend
support" who doesn't remember why. This deletes them.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Tested-by: Ondrej Zary <linux@rainbow-software.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are three kind of parse functions provided by PNP acpi/bios:
- get current resources
- set resources
- get possible resources
The first two may be needed later at runtime.
The possible resource settings should never change dynamically.
And even if this would make any sense (I doubt it), the current implementation
only parses possible resource settings at early init time:
-> declare all the option parsing __init
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-By: Rene Herman <rene.herman@gmail.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make pnp_activate_dev() and pnp_disable_dev() return only 0 (success) or a
negative error value, as pci_enable_device() and pci_disable_device() do.
Previously they returned:
0: device was already active (or disabled)
1: we just activated (or disabled) device
<0: -EBUSY or error from pnp_start_dev() (or pnp_stop_dev())
Now we return only 0 (device is active or disabled) or <0 (error).
All in-tree callers either ignore the return values or check only for
errors (negative values).
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
raid5's 'make_request' function calls generic_make_request on underlying
devices and if we run out of stripe heads, it could end up waiting for one of
those requests to complete. This is bad as recursive calls to
generic_make_request go on a queue and are not even attempted until
make_request completes.
So: don't make any generic_make_request calls in raid5 make_request until all
waiting has been done. We do this by simply setting STRIPE_HANDLE instead of
calling handle_stripe().
If we need more stripe_heads, raid5d will get called to process the pending
stripe_heads which will call generic_make_request from a
This change by itself causes a performance hit. So add a change so that
raid5_activate_delayed is only called at unplug time, never in raid5. This
seems to bring back the performance numbers. Calling it in raid5d was
sometimes too soon...
Neil said:
How about we queue it for 2.6.25-rc1 and then about when -rc2 comes out,
we queue it for 2.6.24.y?
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Tested-by: dean gaudet <dean@arctic.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>