If the destination pipe is full and we already transferred
data, we break out instead of waiting for more pipe room.
The exit logic looks at spd->nr_pages to see if we moved
everything inside the spd container, but we decrement that
variable in the loop to decide when spd has emptied.
Instead we want to compare to the original page count in
the spd, so cache that in a local variable.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As we have potentially dirtied more than 1 page, we should indicate as
such to the dirty page balancing. So call
balance_dirty_pages_ratelimited_nr() and pass in the approximate number
of pages we dirtied.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Allowing attribute and symlink dentries to be reclaimed means
sd->s_dentry can change dynamically. However, updates to the field
are unsynchronized leading to race conditions. This patch adds
sysfs_lock and use it to synchronize updates to sd->s_dentry.
Due to the locking around ->d_iput, the check in sysfs_drop_dentry()
is complex. sysfs_lock only protect sd->s_dentry pointer itself. The
validity of the dentry is protected by dcache_lock, so whether dentry
is alive or not can only be tested while holding both locks.
This is minimal backport of sysfs_drop_dentry() rewrite in devel
branch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The condition check doesn't make much sense as it basically always
succeeds. This causes NULL dereferencing on certain cases. It seems
that parentheses are put in the wrong place. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Backport of
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch
For regular files in sysfs, sysfs_readdir wants to traverse
sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number.
But, the dentry can be reclaimed under memory pressure, and there is
no synchronization with readdir. This patch follows Tejun's scheme of
allocating and storing an inode number in the new s_ino member of a
sysfs_dirent, when dirents are created, and retrieving it from there
for readdir, so that the pointer chain doesn't have to be traversed.
Tejun's upstream patch uses a new-ish "ida" allocator which brings
along some extra complexity; this -stable patch has a brain-dead
incrementing counter which does not guarantee uniqueness, but because
sysfs doesn't hash inodes as iunique expects, uniqueness wasn't
guaranteed today anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] CIFS should honour umask
[CIFS] Missing flag on negprot needed for some servers to force packet signing
[CIFS] whitespace cleanup part 2
[CIFS] whitespace cleanup
[CIFS] fix mempool destroy done in wrong order in cifs error path
[CIFS] typo in previous patch
[CIFS] Fix oops on failed cifs mount (in kthread_stop)
Report the correct errno for out of memory debug output in binfmt_flat.c
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes CIFS honour a process' umask like other filesystems.
Of course the server is still free to munge the permissions if it wants
to; but the client will send the "right" permissions to begin with.
A few caveats:
1) It only applies to filesystems that have CAP_UNIX (aka support unix
extensions)
2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms()
after remote creation
When mode to CIFS/NTFS ACL mapping is complete we can do the
same thing for that case for servers which do not
support the Unix Extensions.
Signed-off-by: Matt Keenen <matt@opcode-solutions.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Original patch and description from Neil Brown <neilb@suse.de>,
merged and adapted to splice branch by me. Neils text follows:
__generic_file_splice_read() currently samples the i_size at the start
and doesn't do so again unless it needs to call ->readpage to load
a page. After ->readpage it has to re-sample i_size as a truncate
may have caused that page to be filled with zeros, and the read()
call should not see these.
However there are other activities that might cause ->readpage to be
called on a page between the time that __generic_file_splice_read()
samples i_size and when it finds that it has an uptodate page. These
include at least read-ahead and possibly another thread performing a
read
So we must sample i_size *after* it has an uptodate page. Thus the
current sampling at the start and after a read can be replaced with a
sampling before page addition into spd.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
__generic_file_splice_read's partial page check, at eof after readpage,
not only got its calculations wrong, but also reused the loff variable:
causing data corruption when splicing from a non-0 offset in the file's
last page (revealed by ext2 -b 1024 testing on a loop of a tmpfs file).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
I've seen inode related deadlocks, so move this call outside of the
actor itself, which may hold the inode lock.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This bug was caught by LTP testcase fchmod06 on Blackfin platform.
In the manpage of fchmod, "EPERM: The effective UID does not match the
owner of the file, and the process is not privileged (Linux: it does not
have the CAP_FOWNER capability)."
But the ramfs nommu code missed the inode_change_ok POSIX UID/GID
verification. This patch fixed this.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The write path code intends to bug if a math error (or unhandled case)
results in a write outside of the current cluster boundaries. The actual
BUG_ON() statements however are incorrect, leading to a crash on kernels
with 64k page size. Fix those by checking against the right variables.
Also, move the assertions higher up within the functions so that they trip
*before* the code starts to mark buffers.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Some of the sysfs changes inadvertantly broke the simple runtime debug log
filtering employed in ocfs2. Fix this by properly exporting the masklog
category filter names.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
A related signature issue that I came across.
There's a bug in win2k that when NT error codes are not negotiated, the
server doesn't response that signatures are mandatory. Since there's
(currently) no way turn on signatures in such case, I had to force NT
error codes, so that this bug will not occur
Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Various coding style problems found by running the new
checkpatch.pl script against fs/cifs. 3 more files
fixed up.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Various coding style problems found by running fs/cifs
against the new checkpatch.pl script. Since there
were too many to fit in one patch. Updated the first
four files.
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.infradead.org/mtd-2.6:
[JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree()
[MTD] Fix error checking after get_mtd_device() in get_sb_mtd functions
[JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes()
[JFFS2] Fix potential memory leak of dead xattrs on unmount.
[JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files.
[MTD] generalise the handling of MTD-specific superblocks
[MTD] [MAPS] don't force uclinux mtd map to be root dev
We've had several reoprts of the CPU jumping to 0x00000000 is do_ioctl(). I
assume that there's a race and someone is zeroing out the ioctl handler while
this CPU waits for the lock_kernel().
The patch adds code to detect this, then emits stuff which will hopefuly lead
us to the culprit.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab cache used as memory pool can not be destroyed before the memory
pool destruction. Because the memory pool still holds some objects and
kmem_cache_destroy() says "Can't free all objects".
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We should keep the mdata node with higher version number, not just the
one we happen to find latest. Doh.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
update_next_aext() could possibly rewrite values in elen and eloc, possibly
leading to data corruption when rewriting a file. Use temporary variables
instead. Also advance cur_epos as it can also point to an indirect extent
pointer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we have already read enough bytes, no need to call read_more().
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
we should free just the allocated blocks.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds a check for overlap of extents and cuts short the
new extent to be inserted, if there is a chance of overlap.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
mips:
fs/afs/flock.c: In function `afs_lock_may_be_available':
fs/afs/flock.c:55: error: dereferencing pointer to incomplete type
fs/afs/flock.c: In function `afs_lock_work':
fs/afs/flock.c:84: error: dereferencing pointer to incomplete type
fs/afs/flock.c:89: error: dereferencing pointer to incomplete type
fs/afs/flock.c:109: error: dereferencing pointer to incomplete type
fs/afs/flock.c:135: error: dereferencing pointer to incomplete type
fs/afs/flock.c:143: error: dereferencing pointer to incomplete type
fs/afs/flock.c:158: error: dereferencing pointer to incomplete type
fs/afs/flock.c:161: error: dereferencing pointer to incomplete type
fs/afs/flock.c:179: error: `TASK_UNINTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:179: error: (Each undeclared identifier is reported only once
fs/afs/flock.c:179: error: for each function it appears in.)
fs/afs/flock.c:179: error: `TASK_INTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:182: error: dereferencing pointer to incomplete type
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Local variable `i' is a byte-counter. Don't use it as an index into an array
of le32's.
Reported-by: "young dave" <hidave.darkstar@gmail.com>
Cc: "Christoph Lameter" <clameter@sgi.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It should be pass "newsize" to vmtruncate function to modify the
inode->i_size, while the old size is passed to vmtruncate.
This bug was caught by LTP truncate test case on Blackfin platform.
After it was fixed, the LTP truncate test case passed.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current code is leaking a reference to dreq->kref when the calls to
nfs_direct_read_schedule() and nfs_direct_write_schedule() return an
error.
This patch moves the call to kref_put() from nfs_direct_wait() back into
nfs_direct_read() and nfs_direct_write() (which are the functions that
actually took the reference in the first place) fixing the leak.
Thanks to Denis V. Lunev for spotting the bug and proposing the original
fix.
Acked-by: Denis V. Lunev <dlunev@gmail.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The recent fix for preventing NULL files from being left around does not
update the file size corectly in all cases. The missing case is a write
extending the file that does not need to allocate a block.
In that case we used a read mapping of the extent which forced the use of
the read I/O completion handler instead of the write I/O completion
handle. Hence the file size was not updated on I/O completion.
SGI-PV: 965068
SGI-Modid: xfs-linux-melb:xfs-kern:28657a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nscott@aconex.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Why is it that since the 2f1a2ccb9c console
UTF-8 fixes went into 2.6.22-rc1, the PowerMac G5 shows only inverse video
question marks for the text on tty2-6? whereas tty1 is fine, and so is x86.
No fault of that patch: by removing the old fallback behaviour, it reveals
that 32-bit setfont running on 64-bit kernels has only really worked on
the current console, the rest getting faked by that inadequate fallback.
Bring the compat do_unimap_ioctl into line with the main one: PIO_UNIMAP
and GIO_UNIMAP apply to the specified tty, not redirected to fg_console.
Use the same checks, and most particularly, remember to check access_ok:
con_set_unimap and con_get_unimap are using __get_user and __put_user.
And the compat vt_check should ask for the same capability as the main
one, CAP_SYS_TTY_CONFIG rather than CAP_SYS_ADMIN. Added in vt_ioctl's
vc_cons_allocated check for safety, though failure may well be impossible.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We weren't cleaning up our inode reference on error in
ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if
need be. Move the code to set the alloc context inode info to the end of the
function so we don't have any possibility of passing back a bad pointer.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we
can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just
return AOP_TRUNCATED_PAGE if the operation fails.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6:
NFS: Fix nfs_direct_dirty_pages()
NFS: Fix handful of compiler warnings in direct.c
NFS: Avoid a deadlock situation on write
We only need to dirty the pages that were actually read in.
Also convert nfs_direct_dirty_pages() to call set_page_dirty() instead of
set_page_dirty_lock(). A call to lock_page() is unacceptable in an rpciod
callback function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch fixes a couple of signage issues that were causing an Oops
when running the LTP diotest4 test. get_user_pages() returns a signed
error, hence we need to be careful when comparing with the unsigned
number of pages from data->npages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When processes are allowed to attempt to lock a non-contiguous range of nfs
write requests, it is possible for generic_writepages to 'wrap round' the
address space, and call writepage() on a request that is already locked by
the same process.
We avoid the deadlock by checking if the page index is contiguous with the
list of nfs write requests that is already held in our
nfs_pageio_descriptor prior to attempting to lock a new request.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Delay writing 0's out in eCryptfs after a seek past the end of the file
until data is actually written.
http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html
``The lseek() function shall not, by itself, extend the size of a
file.''
Without this fix, applications that lseek() past the end of the file without
writing will experience unexpected behavior.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gathering signals in bulk enables server applications to drain a signal
queue (almost full of realtime signals) more efficiently by reducing the
syscall and file look-up overhead.
Very similar to the sigtimedwait4() call described by Niels Provos, Chuck
Lever, and Stephen Tweedie in a paper entitled "Analyzing the Overload
Behavior of a Simple Web Server". The paper lists more details and
advantages.
Signed-off-by: Davi E. M. Arnaut <davi@haxent.com.br>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When inode is dropped (no more references) delete it from cache.
There's not much point in keeping it cached, when a new lookup will refresh
the attributes anyway.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes O_APPEND in direct IO mode. Also checks writes against file size
limits, notably rlimits.
Reported by Greg Bruno.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't allow loading ELF shared library from noexec points so the
same should apply to sys_uselib aswell.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ulrich Drepper <drepper@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In stree.c, MIN_KEY is declared const. The extern declaration in dir.c
doesn't match...
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Optimize select by a using stack space for small fd sets.
core_sys_select() already has this optimization. This is for compat
version.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The wrong lookup flag was tested in ->create() causing havoc (error or
Oops) when a regular file was created with mknod() in a fuse filesystem.
Thanks to J. Cameijo Cerdeira for the report.
Kernels 2.6.18 onward are affected. Please apply to -stable as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the cifs demultiplex thread wakes up and exits
(zeroing server->tsk) before kthread_stop is called, the
cifs_mount code could pass a null pointer to kthread_stop
Thanks to akpm, Dave Young and Shaggy for suggesting
earlier versions of this patch.
CC: akpm@linux-foundatior.org
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This from a "tested" patch...
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the LDM driver so that it works with Windows Vista dynamic
disks which are subtly different to Windows 2000/XP ones.
The patch was needed to get a Vista formatted dynamic disk to be
recognized and parsed successfully.
Thanks go to Chris Teachworth for the report and testing.
Cc: Richard Russon <ldm@flatcap.org>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug was introduced by 01f2705daf.
It misses to convert the first argument, it should be "new_page".
This became a cause of fatfs corruption.
Cc: Nate Diller <nate.diller@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not really sure where this bogosity came from, but there's certainly
nothing special about sh that lets us use flat files with the MMU on.
Kill the dependency, and leave it as !MMU, like it is for all of the
other nommu-wielding ports.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
An xattr_datum which ends up orphaned should be freed by the GC
thread. But if we umount before the GC thread is finished, or if we
mount read-only and the GC thread never runs, they might never be
freed. Clean them up during unmount, if there are any left.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When we cannot mark nodes as obsolete, such as on NAND flash, we end up
having to delete inodes with !nlink in jffs2_build_remove_unlinked_inode().
However, jffs2_build_xattr_subsystem() runs later than this, and will
attach an xref to the dead inode. Then later when the last nodes of that
dead inode are erased we hit a BUG() in jffs2_del_ino_cache()
because we're not supposed to get there with an xattr still attached to
the inode which is being killed.
The simple fix is to refrain from attaching xattrs to inodes with zero
nlink, in jffs2_build_xattr_subsystem(). It's it's OK to trust nlink
here because the file system isn't actually mounted yet, so there's no
chance that a zero-nlink file could actually be alive still because
it's open.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The timerfd was using the unlocked waitqueue operations, but it was
using a different lock, so poll_wait() would race with it.
This makes timerfd directly use the waitqueue lock.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The eventfd was using the unlocked waitqueue operations, but it was
using a different lock, so poll_wait() would race with it.
This makes eventfd directly use the waitqueue lock.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means
the allocation of radix tree nodes is done with GFP_NOFS and the allocation
of a new page is done using GFP_NOFS.
The mapping has a flags field that contains the necessary allocation flags
for the page cache allocation. These need to be consulted in order to get
DMA and HIGHMEM allocations etc right. And yes a blockdev could be
allowing Highmem allocations if its a ramdisk.
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
call under dqonoff_mutex and save some problems with races...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
filename size and change it to 128, so that extensive patterns such as
'/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
generation.
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just thought this is easier to read.
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
afs_prepare_write() should not mark a page up to date if it only partially
fills it in, in expectation of the caller filling in the rest prior to calling
commit_write(). commit_write(), however, should mark the page up to date.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix AFS to write back dirty on unmounting. This didn't happen because
afs_super_ops.drop_inode was pointing to generic_delete_inode. Now this
pointer is left set to NULL so that the default behaviour occurs instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is no more
needed. For the same reason, the rwlock can become a simple spinlock.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes the epoll single pass code. During the unlocked event delivery (to
userspace) code, the poll callback can re-issue new events, and we must
receive them correctly. Since we loop in a lockless fashion, we want to be
O(nready), and we don't want to flash on/off the spinlock for every event, we
have the poll callback to use a secondary list to queue events while we're
inside the event delivery loop. The rw_semaphore has been turned into a
mutex. This patch also adds the wait-exclusive flag, as suggested by Davi
Arnaut.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- fs/lockd/xdr4.c:140:27: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:141:27: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:432:28: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:433:28: warning: incorrect type in argument 2 (different
explicit signedness)
- fs/lockd/xdr4.c:587:20: warning: symbol 'nlm_version4' was not declared.
Should it be static?
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
- fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared.
Should it be static?
- fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared.
Should it be static?
- fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one
- fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not
declared. Should it be static?
- fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease'
was not declared. Should it be static?
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The XDR code should not depend on the physical allocation size of
structures like nfs4_stateid and nfs4_verifier since those may have to
change at some future date. We therefore replace all uses of
sizeof() with constants like NFS4_VERIFIER_SIZE and NFS4_STATEID_SIZE.
This also has the side-effect of fixing some warnings of the type
format ‘%u’ expects type ‘unsigned int’, but argument X has type
‘long unsigned int’
on 64-bit systems
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush().
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
reclaimer() calls allow_signal() which plays with parent process's ->sighand.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use zero_user_page() instead of open-coding it.
[akpm@linux-foundation.org: kmap-type fixes]
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] Abnormal End of Processes
[PATCH] match audit name data
[PATCH] complete message queue auditing
[PATCH] audit inode for all xattr syscalls
[PATCH] initialize name osid
[PATCH] audit signal recipients
[PATCH] add SIGNAL syscall class (v3)
[PATCH] auditing ptrace
Re-arrange epoll code to avoid static functions pre-declarations, and apply
akpm-filter on it.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Epoll is either compiled it, or not (if EMBEDDED). Remove the module code
and use fs_initcall().
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cut out lots of code from epoll, by reusing the anonymous inode source
patch (fs/anon_inodes.c).
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an example about how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd (hence
compatible with POSIX select/poll). The KAIO code simply signals the eventfd
fd when events are ready, and this triggers a POLLIN in the fd. This patch
uses a reserved for future use member of the struct iocb to pass an eventfd
file descriptor, that KAIO will use to post events every time a request
completes. At that point, an aio_getevents() will return the completed result
to a struct io_event. I made a quick test program to verify the patch, and it
runs fine here:
http://www.xmailserver.org/eventfd-aio-test.c
The test program uses poll(2), but it'd, of course, work with select and epoll
too.
This can allow to schedule both block I/O and other poll-able devices
requests, and wait for results using select/poll/epoll. In a typical
scenario, an application would submit KAIO request using aio_submit(), and
will also use epoll_ctl() on the whole other class of devices (that with the
addition of signals, timers and user events, now it's pretty much complete),
and then would:
epoll_wait(...);
for_each_event {
if (curr_event_is_kaiofd) {
aio_getevents();
dispatch_aio_events();
} else {
dispatch_epoll_event();
}
}
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a very simple and light file descriptor, that can be used as event
wait/dispatch by userspace (both wait and dispatch) and by the kernel
(dispatch only). It can be used instead of pipe(2) in all cases where those
would simply be used to signal events. Their kernel overhead is much lower
than pipes, and they do not consume two fds. When used in the kernel, it can
offer an fd-bridge to enable, for example, functionalities like KAIO or
syslets/threadlets to signal to an fd the completion of certain operations.
But more in general, an eventfd can be used by the kernel to signal readiness,
in a POSIX poll/select way, of interfaces that would otherwise be incompatible
with it. The API is:
int eventfd(unsigned int count);
The eventfd API accepts an initial "count" parameter, and returns an eventfd
fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2).
The POLLIN flag is raised when the internal counter is greater than zero.
The POLLOUT flag is raised when at least a value of "1" can be written to the
internal counter.
The POLLERR flag is raised when an overflow in the counter value is detected.
The write(2) operation can never overflow the counter, since it blocks (unless
O_NONBLOCK is set, in which case -EAGAIN is returned).
But the eventfd_signal() function can do it, since it's supposed to not sleep
during its operation.
The read(2) function reads the __u64 counter value, and reset the internal
value to zero. If the value read is equal to (__u64) -1, an overflow happened
on the internal counter (due to 2^64 eventfd_signal() posts that has never
been retired - unlickely, but possible).
The write(2) call writes an __u64 count value, and adds it to the current
counter. The eventfd fd supports O_NONBLOCK also.
On the kernel side, we have:
struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);
The eventfd_fget() should be called to get a struct file* from an eventfd fd
(this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post an event
to userspace. The eventfd_signal() function can be called from any context.
An eventfd() simple test and bench is available here:
http://www.xmailserver.org/eventfd-bench.c
This is the eventfd-based version of pipetest-4 (pipe(2) based):
http://www.xmailserver.org/pipetest-4.c
Not that performance matters much in the eventfd case, but eventfd-bench
shows almost as double as performance than pipetest-4.
[akpm@linux-foundation.org: fix i386 build]
[akpm@linux-foundation.org: add sys_eventfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch implements the necessary compat code for the timerfd system call.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces a new system call for timers events delivered though
file descriptors. This allows timer event to be used with standard POSIX
poll(2), select(2) and read(2). As a consequence of supporting the Linux
f_op->poll subsystem, they can be used with epoll(2) too.
The system call is defined as:
int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);
The "ufd" parameter allows for re-use (re-programming) of an existing timerfd
w/out going through the close/open cycle (same as signalfd). If "ufd" is -1,
s new file descriptor will be created, otherwise the existing "ufd" will be
re-programmed.
The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time
specified in the "utmr->it_value" parameter is the expiry time for the timer.
If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time,
otherwise it's a relative time.
If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0,
tv_nsec == 0), this is the period at which the following ticks should be
generated.
The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create a
timerfd without the timer enabled.
The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file, or -1 in case of error.
As stated before, the timerfd file descriptor supports poll(2), select(2) and
epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be
returned.
The read(2) call can be used, and it will return a u32 variable holding the
number of "ticks" that happened on the interface since the last call to
read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will
be returned if no ticks happened.
A quick test program, shows timerfd working correctly on my amd64 box:
http://www.xmailserver.org/timerfd-test.c
[akpm@linux-foundation.org: add sys_timerfd to sys_ni.c]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch implements the necessary compat code for the signalfd system call.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how badly it can be
broken :), and I added even more breakage ;) Signals are fetched from the same
signal queue used by the process, so signalfd will compete with standard
kernel delivery in dequeue_signal(). If you want to reliably fetch signals on
the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This
seems to be working fine on my Dual Opteron machine. I made a quick test
program for it:
http://www.xmailserver.org/signafd-test.c
The signalfd() system call implements signal delivery into a file descriptor
receiver. The signalfd file descriptor if created with the following API:
int signalfd(int ufd, const sigset_t *mask, size_t masksize);
The "ufd" parameter allows to change an existing signalfd sigmask, w/out going
to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new
signalfd file.
The "mask" allows to specify the signal mask of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
will return POLLIN when signals are available to be dequeued. As a direct
consequence of supporting the Linux poll subsystem, the signalfd fd can use
used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure in
the userspace supplied buffer. The return value is the number of bytes copied
in the supplied buffer, or -1 in case of error. The read(2) call can also
return 0, in case the sighand structure to which the signalfd was attached,
has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will
return -EAGAIN in case no signal is available.
If the size of the buffer passed to read(2) is lower than sizeof(struct
signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also
return -ERESTARTSYS in case a signal hits the process. The format of the
struct signalfd_siginfo is, and the valid fields depends of the (->code &
__SI_MASK) value, in the same way a struct siginfo would:
struct signalfd_siginfo {
__u32 signo; /* si_signo */
__s32 err; /* si_errno */
__s32 code; /* si_code */
__u32 pid; /* si_pid */
__u32 uid; /* si_uid */
__s32 fd; /* si_fd */
__u32 tid; /* si_fd */
__u32 band; /* si_band */
__u32 overrun; /* si_overrun */
__u32 trapno; /* si_trapno */
__s32 status; /* si_status */
__s32 svint; /* si_int */
__u64 svptr; /* si_ptr */
__u64 utime; /* si_utime */
__u64 stime; /* si_stime */
__u64 addr; /* si_addr */
};
[akpm@linux-foundation.org: fix signalfd_copyinfo() on i386]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch add an anonymous inode source, to be used for files that need
and inode only in order to create a file*. We do not care of having an
inode for each file, and we do not even care of having different names in
the associated dentries (dentry names will be same for classes of file*).
This allow code reuse, and will be used by epoll, signalfd and timerfd
(and whatever else there'll be).
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make autofs container-friendly by caching struct pid reference rather than
pid_t and using pid_nr() to retreive a task's pid_t.
ChangeLog:
- Fix Eric Biederman's comments - Use find_get_pid() to hold a
reference to oz_pgrp and release while unmounting; separate out
changes to autofs and autofs4.
- Fix Cedric's comments: retain old prototype of parse_options()
and move necessary change to its caller.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: containers@lists.osdl.org
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files
being modified for container/pidspace issues.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: <containers@lists.osdl.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
attach_pid() currently takes a pid_t and then uses find_pid() to find the
corresponding struct pid. Sometimes we already have the struct pid. We can
then skip find_pid() if attach_pid() were to take a struct pid parameter.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: <containers@lists.osdl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up massive code duplication between mpage_writepages() and
generic_writepages().
The new generic function, write_cache_pages() takes a function pointer
argument, which will be called for each page to be written.
Maybe cifs_writepages() too can use this infrastructure, but I'm not
touching that with a ten-foot pole.
The upcoming page writeback support in fuse will also want this.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove unused argument in is_pmbr_valid()
Remove unneeded initialization of local variable legacy_mbr
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't enable SYSV68 partition table support on all m68k boxes by default,
only on Motorola VME boards.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the statfs() op for AFS.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a couple of problems with unlinking AFS files.
(1) The parent directory wasn't being updated properly between unlink() and
the following lookup().
It seems that, for some reason, invalidate_remote_inode() wasn't
discarding the directory contents correctly, so this patch calls
invalidate_inode_pages2() instead on non-regular files.
(2) afs_vnode_deleted_remotely() should handle vnodes that don't have a
source server recorded without oopsing.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following bug was uncovered by compiling with '-W' flag:
CC [M] fs/afs/write.o
fs/afs/write.c: In function âafs_write_back_from_locked_pageâ:
fs/afs/write.c:398: warning: comparison of unsigned expression >= 0 is always true
Loop variable 'n' is unsigned, so wraps around happily as far as I can
see. Trival fix attached (compile tested only).
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generalise the handling of MTD-specific superblocks so that JFFS2 and ROMFS
can both share it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Hi,
I have been working on some code that detects abnormal events based on audit
system events. One kind of event that we currently have no visibility for is
when a program terminates due to segfault - which should never happen on a
production machine. And if it did, you'd want to investigate it. Attached is a
patch that collects these events and sends them into the audit system.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Handle the edge cases for POSIX message queue auditing. Collect inode
info when opening an existing mq, and for send/receive operations. Remove
audit_inode_update() as it has really evolved into the equivalent of
audit_inode().
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Collect inode info for the remaining xattr syscalls that operate on a file
descriptor. These don't call a path_lookup variant, so they aren't covered by
the general audit hook.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In 9d6a8c5c21 we changed posix_test_lock
to modify its single file_lock argument instead of taking separate input
and output arguments. This makes it no longer safe to set the output
lock's fl_type to F_UNLCK before looking for a conflict, since that
means searching for a conflict against a lock with type F_UNLCK.
This fixes a regression which causes F_GETLK to incorrectly report no
conflict on most filesystems (including any filesystem that doesn't do
its own locking).
Also fix posix_lock_to_flock() to copy the lock type. This isn't
strictly necessary, since the caller already does this; but it seems
less likely to cause confusion in the future.
Thanks to Doug Chapman for the bug report.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A small regression appears to have been introduced in the recent patch
"cleanup compat ioctl handling", which was included in Linus' tree after
2.6.20.
siocdevprivate_ioctl() is no longer defined if CONFIG_NET is undefined,
whereas previously it was a dummy function in this case.
This causes compilation with CONFIG_COMPAT but without CONFIG_NET to fail.
fs/compat_ioctl.c: In function `compat_sys_ioctl':
fs/compat_ioctl.c:3571: warning: implicit declaration of function `siocdevprivate_ioctl'
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Further fixes for AFS write support:
(1) The afs_send_pages() outer loop must do an extra iteration if it ends
with 'first == last' because 'last' is inclusive in the page set
otherwise it fails to send the last page and complete the RxRPC op under
some circumstances.
(2) Similarly, the outer loop in afs_pages_written_back() must also do an
extra iteration if it ends with 'first == last', otherwise it fails to
clear PG_writeback on the last page under some circumstances.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AFS write support fixes:
(1) Support large files using the 64-bit file access operations if available
on the server.
(2) Use kmap_atomic() rather than kmap() in afs_prepare_page().
(3) Don't do stuff in afs_writepage() that's done by the caller.
[akpm@linux-foundation.org: fix right shift count >= width of type]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it more useful for debugging purposes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The maximum size of an NFSv4 SETATTR compound reply should include the
GETATTR operation that we send.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The check for nfs_attribute_timeout(dir) in nfs_check_verifier is
redundant: nfs_lookup_revalidate() will already call nfs_revalidate_inode()
on the parent dir when necessary.
The only case where this is not done is the case of a negative dentry. Fix
this case by moving up the revalidation code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
dentry verifiers are always set to the parent directory's
cache_change_attribute. There is no reason to be testing for anything other
than equality when we're trying to find out if the dentry has been checked
since the last time the directory was modified.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.infradead.org/mtd-2.6: (21 commits)
[MTD] [CHIPS] Remove MTD_OBSOLETE_CHIPS (jedec, amd_flash, sharp)
[MTD] Delete allegedly obsolete "bank_size" field of mtd_info.
[MTD] Remove unnecessary user space check from mtd.h.
[MTD] [MAPS] Remove flash maps for no longer supported 405LP boards
[MTD] [MAPS] Fix missing printk() parameter in physmap_of.c MTD driver
[MTD] [NAND] platform NAND driver: add driver
[MTD] [NAND] platform NAND driver: update header
[JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more.
[JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree()
[JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree()
[JFFS2] Remember to calculate overlap on nodes which replace older nodes
[JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush
[MTD] [NAND] at91_nand.c: CMDLINE_PARTS support
[MTD] [NAND] Tidy up handling of page number in nand_block_bad()
[MTD] block2mtd_paramline[] mustn't be __initdata
[MTD] [NAND] Support multiple chips in CAFÉ driver
[MTD] [NAND] Rename cafe.c to cafe_nand.c and remove the multi-obj magic
[MTD] [NAND] Use rslib for CAFÉ ECC
[RSLIB] Support non-canonical GF representations
[JFFS2] Remove dead file histo_mips.h
...
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
sound: convert "sound" subdirectory to UTF-8
MAINTAINERS: Add cxacru website/mailing list
include files: convert "include" subdirectory to UTF-8
general: convert "kernel" subdirectory to UTF-8
documentation: convert the Documentation directory to UTF-8
Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
remove broken URLs from net drivers' output
Magic number prefix consistency change to Documentation/magic-number.txt
trivial: s/i_sem /i_mutex/
fix file specification in comments
drivers/base/platform.c: fix small typo in doc
misc doc and kconfig typos
Remove obsolete fat_cvf help text
Fix occurrences of "the the "
Fix minor typoes in kernel/module.c
Kconfig: Remove reference to external mqueue library
Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
Correct comments in genrtc.c to refer to correct /proc file.
Fix more "deprecated" spellos.
Fix "deprecated" typoes.
...
Fix trivial comment conflict in kernel/relay.c.
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's very common for file systems to need to zero part or all of a page,
the simplist way is just to use kmap_atomic() and memset(). There's
actually a library function in include/linux/highmem.h that does exactly
that, but it's confusingly named memclear_highpage_flush(), which is
descriptive of *how* it does the work rather than what the *purpose* is.
So this patchset renames the function to zero_user_page(), and calls it
from the various places that currently open code it.
This first patch introduces the new function call, and converts all the
core kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones. Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a
patch deprecating the old call. The diffstat below shows the entire
patchset.
[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a lookup request arrives, nfsd uses information provided by userspace
(mountd) to find the right filesystem.
It then assumes that the same filehandle type as the incoming filehandle can
be used to create an outgoing filehandle.
However if mountd is buggy, or maybe just being creative, the filesystem may
not support that filesystem type, and the kernel could oops, particularly if
'ex_uuid' is NULL but a FSID_UUID* filehandle type is used.
So add some proper checking that the fsid version/type from the incoming
filehandle is actually supportable, and ignore that information if it isn't
supportable.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ decode_sattr and decode_sattr3 never return NULL, so remove
several checks for that. ditto for xdr_decode_hyper.
2/ replace some open coded XDR_QUADLEN calls with calls to
XDR_QUADLEN
3/ in decode_writeargs, simply an 'if' to use a single
calculation.
.page_len is the length of that part of the packet that did
not fit in the first page (the head).
So the length of the data part is the remainder of the
head, plus page_len.
3/ other minor cleanups.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kbuild directly interprets <modulename>-y as objects to build into a module,
no need to assign it to the old foo-objs variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to zero various parts of 'exp' before any 'goto out', otherwise when
we go to free the contents... we die.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:
RPC request reserved 164 but used 208
While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726
This is probably also a problem for other sec= types and for NFSv4. The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.
This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum. It also fixes up the appropriate callers of svc_reserve to
call the wrapper. For now, it just uses a hardcoded value that I
determined via testing. That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.
Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes
correctly. The specifications indicate that the server should accept the
request, but it should mostly turn into a no-op. Currently, the server
will return an XDR decode error, which it should not.
Attached is a patch which addresses this issue. It also adds some boundary
checking to ensure that the request contains as much data as was requested
to be written. It also correctly handles an NFSv3 request which requests
to write more data than the server has stated that it is prepared to
handle. Previously, there was some support which looked like it should
work, but wasn't quite right.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfs4_acl_add_ace() can now be removed.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginnig, I missed this). So we can unify
flush_work_keventd and flush_work.
Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.
(akpm: this is why the earlier patches bypassed maintainers)
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Migrate AIO over to use flush_work().
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement support for writing to regular AFS files, including:
(1) write
(2) truncate
(3) fsync, fdatasync
(4) chmod, chown, chgrp, utime.
AFS writeback attempts to batch writes into as chunks as large as it can manage
up to the point that it writes back 65535 pages in one chunk or it meets a
locked page.
Furthermore, if a page has been written to using a particular key, then should
another write to that page use some other key, the first write will be flushed
before the second is allowed to take place. If the first write fails due to a
security error, then the page will be scrapped and reread before the second
write takes place.
If a page is dirty and the callback on it is broken by the server, then the
dirty data is not discarded (same behaviour as NFS).
Shared-writable mappings are not supported by this patch.
[akpm@linux-foundation.org: fix a bunch of warnings]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make some miscellaneous changes to the AFS filesystem:
(1) Assert RCU barriers on module exit to make sure RCU has finished with
callbacks in this module.
(2) Correctly handle the AFS server returning a zero-length read.
(3) Split out data zapping calls into one function (afs_zap_data).
(4) Rename some afs_file_*() functions to afs_*() where they apply to
non-regular files too.
(5) Be consistent about the presentation of volume ID:vnode ID in debugging
output.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since path_walk sets the total_link_count to 0 and calls link_path_walk, we
can just call path_walk directly.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup using simple_read_from_buffer() in binfmt_misc, configfs, and sysfs.
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>