In some situations PAV alias devices on LPAR are not accessible.
The initialization procedure required to enable access to PAV alias
devices has to be performed per storage server subsystem and not
only once per storage server.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd_page_cache should return page addresses and therefore the
cache must be created with an alignment of PAGE_SIZE.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The dasd function dasd_set_uid calls kzalloc while holding the
dasd_devmap_lock. Rearrange the code to do the memory allocation
outside the lock.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The request queue flush function of the dasd driver has to dequeue
the requests first and then call the end request function. Otherwise
a kernel bug in ll_rw_block.c might get triggered.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use __cpuinit for CPU hotplug notifier function.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove system device class for xpram. It creates the directory hierarchy
under /sys/devices/system/xpram/xpram0. The xpram0 directory is empty and
it is always created while xpram1 and following devices are always missing,
independent if the devices exist or not. Since the xpram devices are
listed in /proc/partitions and /sys/block/ as slram<x> the system device
class for xpram is meaningless.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
I/O on a CCW device may stall if a channel path to that device is
logicaly varied off/on. A user I/O interrupt can get misinterpreted
as interrupt for an internal path verification operation due to a
missing check and is therefore never reported to the device driver.
Correct check for pending interruptions before starting path
verification.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Do a retry of read device characteristics / read configuration
data when a deferred condition code 1 is encountered in
ccw_device_wake_up().
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Without this patch register_tape_dev() will always fail, but might
return a value that is not an error number. This will lead to accesses
to already freed memory areas...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa:
[ALSA] Don't reject O_RDWR at opening PCM OSS with read/write-only device
[ALSA] snd-emu10k1: Implement support for Audigy 2 ZS [SB0353]
[ALSA] add MAINTAINERS entry for snd-aoa
[ALSA] aoa: platform function gpio: ignore errors from functions that don't exist
[ALSA] make snd-powermac load even when it can't bind the device
[ALSA] aoa: fix toonie codec
[ALSA] aoa: feature gpio layer: fix IRQ access
[ALSA] Conversions from kmalloc+memset to k(z|c)alloc
[ALSA] snd-emu10k1: Fixes ALSA bug#2190
While busy-waiting for completion, check the hardware after scheduling;
don't schedule and then immediately check the _timeout_. If the yield()
took a long time (as it does on my OLPC prototype board when it's busy),
we'd report a timeout even though the hardware was now ready.
This fixes it, and also switches the yield() for a cond_resched() because
we don't actually want to be _that_ nice about it. I see nice
tightly-packed SMBus transactions now, rather than waiting for milliseconds
between successive phases.
Actually, we shouldn't be busy-waiting here at all. We should be using
interrupts. That's an exercise for another day though.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Christer Weinigel <wingel@nano-system.com>
Cc: <Jordan.Crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I saw an oops down this path when trying to create a new file on a UDF
filesystem which was internally marked as readonly, but mounted rw:
udf_create
udf_new_inode
new_inode
alloc_inode
udf_alloc_inode
udf_new_block
returns EIO due to readonlyness
iput (on error)
udf_put_inode
udf_discard_prealloc
udf_next_aext
udf_current_aext
udf_get_fileshortad
OOPS
the udf_discard_prealloc() path was examining uninitialized fields of the
udf inode.
udf_discard_prealloc() already has this code to short-circuit the discard
path if no extents are preallocated:
if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB ||
inode->i_size == UDF_I_LENEXTENTS(inode))
{
return;
}
so if we initialize UDF_I_LENEXTENTS(inode) = 0 earlier in udf_new_inode,
we won't try to free the (not) preallocated blocks, since this will match
the i_size = 0 set when the inode was initialized.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The recent fixups in futex.c need to be applied to futex_compat.c too. Fixes
a hang reported by Olaf.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A patch in -mm kernel correct the parsing of "address resources" of pnpacpi.
Before we assumed it was memory only, but it could be also IO.
But this change show an hidden bug : some resources could be producer type
that are not handled by pnp layer. So we should ignore the producer
resources.
This patch fixes bug 6292 (http://bugzilla.kernel.org/show_bug.cgi?id=6292).
Some devices like PNP0A03 have 0xd00-0xffff and 0x0-0xcf7 as IO producer
resources.
Before correcting "address resources" parsing, it was seen as memory and was
harmless, because nobody tried to reserve this memory range as it should be
IO.
With the correction it become IO resources, and make failed all others device
that want to register IO in this range and use pnp layer (like a ISA sound
card).
The solution is to ignore producer resources
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Signed-off-by: Uwe Bugla <uwe.bugla@gmx.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: "Brown, Len" <len.brown@intel.com>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs_write_full_page does zero bytes in the file past eof, but it may
call get_block on those buffers as well. On machines where the page size
is larger than the blocksize, this can result in mmaped files incorrectly
growing up to a block boundary during writepage.
The fix is to avoid calling get_block for any blocks that are entirely past
eof
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is for collision check enhancement for memory hot add.
It's better to do resouce collision check before doing memory hot add,
which will touch memory management structures.
And add_section() should check section exists or not before calling
sparse_add_one_section(). (sparse_add_one_section() will do another
check anyway. but checking in memory_hotplug.c will be easy to understand.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: keith mannthey <kmannth@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
both of acpi_memory_enable_device() and acpi_memory_add_device() may evaluate
_CRS method.
We should avoid evaluate device's resource twice if we could get it
successfully in past.
Signed-off-by: KAMEZWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Keith Mannthey <kmannth@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add_memory() does all necessary check to avoid collision. then, acpi layer
doesn't have to check region by itself.
(*) pfn_valid() just returns page struct is valid or not. It returns 0
if a section has been already added even is ioresource is not added.
ioresource collision check in mm/memory_hotplug.c can do more precise
collistion check.
added enabled bit check just for sanity check..
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Keith Mannthey <kmannth@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_next_system_ram() is used to find available memory resource at onlining
newly added memory. This patch fixes following problem.
find_next_system_ram() cannot catch this case.
Resource: (start)-------------(end)
Section : (start)-------------(end)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Keith Mannthey <kmannth@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
find_next_system_ram() returns valid memory range which meets requested area,
only used by memory-hot-add.
This function always rewrite requested resource even if returned area is not
fully fit in requested one. And sometimes the returnd resource is larger than
requested area. This annoyes the caller. This patch changes the returned
value to fit in requested area.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Keith Mannthey <kmannth@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioresouce handling code in memory hotplug allows not-aligned memory hot add.
But when memmap and other memory structures are initialized, parameters should
be aligned. (if not aligned, initialization of mem_map will do wrong, it
assumes parameters are aligned.) This patch fix it.
And this patch allows ioresource collision check to handle -EEXIST.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Keith Mannthey <kmannth@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Cc: Jens Kilian <jjk@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At least Maxtor OneTouch III require a "start stop unit" command after auto
spin-down before the next access can proceed. This patch activates the
responsible code in scsi_mod for all Maxtor SBP-2 disks.
https://bugzilla.novell.com/show_bug.cgi?id=183011
Maybe that should be done for all SBP-2 disks, but better be cautious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Jody McIntyre <scjody@modernduck.com>
Cc: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While helping someone to submit a patch to the stable branch, I noticed
that the stable branch is not listed in the MAINTAINERS file. This was
after I went there to look for the email addresses for the stable branch
list (stable@kernel.org).
This patch adds the stable branch to the maintainers file so that people
can find where to send patches when they have a fix for the stable team.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up proc file removal in sq module for superh arch. currently on a
failed module load or on module unload a proc file is left registered which
can cause a random memory execution or oopses if read after unload. This
patch cleans up that deregistration.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* MODE_MASK is unused in eicon driver.
* Conflicts with a ptrace stuff on arm.
drivers/isdn/hardware/eicon/divasync.h:259:1: warning: "MODE_MASK" redefined
include2/asm/ptrace.h:48:1: warning: this is the location of the previous definition
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Acked-by: Armin Schindler <armin@melware.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A set of tty line discipline cleanup patches were introduced before the
dawn of time, in kernel version 2.4.21. This patch performs that cleanup
for the hvsi driver.
The hvsi driver is used only on IBM pSeries PowerPC boxes. The driver was
originally written by Hollis Blanchard, who has delegated maintainership to
me. So this my first and maybe only patch in this official new role,
because this driver is otherwise bug-free :-)
Alan: "Actually its also a bug fix, tty->ldisc should be locked by refcounting
and the helpers do this for you."
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Under certain rare circumstances, it appears that there can be be a
NULL-pointer deref when a user fiddles with terminal emeulation programs while
outpu is being sent to the console. This patch checks for and avoids a
NULL-pointer deref.
Signed-off-by: Hollis Blanchard <hollisbl@austin.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we don't find the item we are lookng for, we allocate a new one, and
then grab the lock again and search to see if it has been added while we
did the alloc. If it had been added we need to 'cache_put' the newly
created item that we are never going to use. But as it hasn't been
initialised properly, putting it can cause an oops.
So move the ->init call earlier to that it will always be fully initilised
if we have to put it.
Thanks to Philipp Matthias Hahn <pmhahn@svs.Informatik.Uni-Oldenburg.de>
for reporting the problem.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The POSIX_FADV_NOREUSE hint means "the application will use this range of the
file a single time". It seems to be intended that the implementation will use
this hint to perform drop-behind of that part of the file when the application
gets around to reading or writing it.
However for reasons which aren't obvious (or sane?) I mapped
POSIX_FADV_NOREUSE onto POSIX_FADV_WILLNEED. ie: it does readahead.
That's daft. So for now, make POSIX_FADV_NOREUSE a no-op.
This is a non-back-compatible change. If someone was using POSIX_FADV_NOREUSE
to perform readahead, they lose. The likelihood is low.
If/when we later implement POSIX_FADV_NOREUSE things will get interesting - to
do it fully we'll need to maintain file offset/length ranges and peform all
sorts of complex tricks, and managing the lifetime of those ranges' data
structures will be interesting..
A sensible implementation would probably ignore the file range and would
simply mark the entire file as needing some form of drop-behind treatment.
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- fix up the start up sequence.
This new sequence allow you to correctly enable the LCD controller
even if the bootloader has already did it.
- fix up a wrong indentation issue.
Signed-off-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix "info->var.rotate" data settings.
This info should be deduced directly from "fbdev->panel->control_base"
defined into au1100fb.h.
Signed-off-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reported by: Dave Jones
Whilst printk'ing to both console and serial console, I got this...
(2.6.18rc1)
BUG: sleeping function called from invalid context at kernel/sched.c:4438
in_atomic():0, irqs_disabled():1
Call Trace:
[<ffffffff80271db8>] show_trace+0xaa/0x23d
[<ffffffff80271f60>] dump_stack+0x15/0x17
[<ffffffff8020b9f8>] __might_sleep+0xb2/0xb4
[<ffffffff8029232e>] __cond_resched+0x15/0x55
[<ffffffff80267eb8>] cond_resched+0x3b/0x42
[<ffffffff80268c64>] console_conditional_schedule+0x12/0x14
[<ffffffff80368159>] fbcon_redraw+0xf6/0x160
[<ffffffff80369c58>] fbcon_scroll+0x5d9/0xb52
[<ffffffff803a43c4>] scrup+0x6b/0xd6
[<ffffffff803a4453>] lf+0x24/0x44
[<ffffffff803a7ff8>] vt_console_print+0x166/0x23d
[<ffffffff80295528>] __call_console_drivers+0x65/0x76
[<ffffffff80295597>] _call_console_drivers+0x5e/0x62
[<ffffffff80217e3f>] release_console_sem+0x14b/0x232
[<ffffffff8036acd6>] fb_flashcursor+0x279/0x2a6
[<ffffffff80251e3f>] run_workqueue+0xa8/0xfb
[<ffffffff8024e5e0>] worker_thread+0xef/0x122
[<ffffffff8023660f>] kthread+0x100/0x136
[<ffffffff8026419e>] child_rip+0x8/0x12
This can occur when release_console_sem() is called but the log
buffer still has contents that need to be flushed. The console drivers
are called while the console_may_schedule flag is still true. The
might_sleep() is triggered when fbcon calls console_conditional_schedule().
Fix by setting console_may_schedule to zero earlier, before the call to the
console drivers.
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The per cpu variables are used incorrectly in vmstat.h.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Acked-by: Steve Fox <drfickle@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When delivering PTRACE_EVENT_VFORK_DONE, provide pid of the child process
when tracer calls ptrace(PTRACE_GETEVENTMSG). This is already
(accidentally) available when the tracer is tracing VFORK in addition to
VFORK_DONE.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Daniel Jacobowitz <dan@debian.org>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent patch that allowed linear arrays to be reconfigured on-line
allowed in a bug which results in divide by zero - not all
mddev->array_size were converted to conf->array_size.
This patch finished the conversion and fixed the bug.
The offending patch was commit 7c7546ccf6.
Thanks to Simon Kirby <sim@netnation.com> for the bug report.
Cc: Simon Kirby <sim@netnation.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seems like the omap-rng driver in the main tree predates the switch from
<asm/hardware/clock.h> to <linux/clk.h> ... now it builds OK.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current Linus tree crashes in aty128_set_lcd_enable() because par->pdev
is NULL. This happens since at least a week. Call trace is:
aty128_set_lcd_enable
aty128fb_set_par
fbcon_init
visual_init
take_over_console
fbcon_takeover
notifier_call_chain
blocking_notifier_call_chain
register_framebuffer
aty128fb_probe
pci_device_probe
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
aty128fb_init
init
kernel_thread
- info->fix was assigned twice.
- par->vram_size is assigned in aty128_probe(), no need to redo it again
in aty128_init()
- register_framebuffer() uses uninitialized struct members, move it past
par->pdev assignment and past aty128_bl_init().
Signed-off-by: Olaf Hering <olh@suse.de>
Acked-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_get_locked_page is called twice in ufs code, one time in ufs_truncate
path(we allocated last block), and another time when fragments are
reallocated. In ideal world in the second case on allocation/free block
layer we should not know that things like `truncate' exists, but now with
such crutch like ufs_get_locked_page we can (or should?) skip truncated
pages.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As discussed earlier:
http://lkml.org/lkml/2006/6/28/136
this patch fixes such issue:
`ufs_get_locked_page' takes page from cache
after that `vmtruncate' takes page and deletes it from cache
`ufs_get_locked_page' locks page, and reports about EIO error.
Also because of find_lock_page always return valid page or NULL, we have no
need to check it if page not NULL.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a barrier() in futex unqueue_me to avoid aliasing of two
pointers.
On my s390x system I saw the following oops:
Unable to handle kernel pointer dereference at virtual kernel address
0000000000000000
Oops: 0004 [#1]
CPU: 0 Not tainted
Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8)
Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30)
Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000
0000000000000000 0000000000000000 000001fe00002028 00000000000c091f
000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0
00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8
Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04
Call Trace:
([<0000000000144f90>] unqueue_me+0x40/0xe4)
[<0000000000145a0c>] do_futex+0x33c/0xc40
[<000000000014643e>] sys_futex+0x12e/0x144
[<000000000010bb00>] sysc_noemu+0x10/0x16
[<000002000003741c>] 0x2000003741c
The code in question is:
static int unqueue_me(struct futex_q *q)
{
int ret = 0;
spinlock_t *lock_ptr;
/* In the common case we don't take the spinlock, which is nice. */
retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr);
/*
* q->lock_ptr can change between reading it and
* spin_lock(), causing us to take the wrong lock. This
* corrects the race condition.
[...]
and my compiler (gcc 4.1.0) makes the following out of it:
00000000000003c8 <unqueue_me>:
3c8: eb bf f0 70 00 24 stmg %r11,%r15,112(%r15)
3ce: c0 d0 00 00 00 00 larl %r13,3ce <unqueue_me+0x6>
3d0: R_390_PC32DBL .rodata+0x2a
3d4: a7 f1 1e 00 tml %r15,7680
3d8: a7 84 00 01 je 3da <unqueue_me+0x12>
3dc: b9 04 00 ef lgr %r14,%r15
3e0: a7 fb ff d0 aghi %r15,-48
3e4: b9 04 00 b2 lgr %r11,%r2
3e8: e3 e0 f0 98 00 24 stg %r14,152(%r15)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
/* write q->lock_ptr in r12 */
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: a7 84 00 4b je 48e <unqueue_me+0xc6>
/* if r12 is zero then jump over the code.... */
3fc: e3 20 b0 28 00 04 lg %r2,40(%r11)
/* write q->lock_ptr in r2 */
402: c0 e5 00 00 00 00 brasl %r14,402 <unqueue_me+0x3a>
404: R_390_PC32DBL _spin_lock+0x2
/* use r2 as parameter for spin_lock */
So the code becomes more or less:
if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)
Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: b9 04 00 2c lgr %r2,%r12
3fc: a7 84 00 48 je 48c <unqueue_me+0xc4>
400: c0 e5 00 00 00 00 brasl %r14,400 <unqueue_me+0x38>
402: R_390_PC32DBL _spin_lock+0x2
As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q->lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q->lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@timesys.com>
Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>