Some user space tools need to identify SYSV shared memory when examining
/proc/<pid>/maps. To do so they look for a block device with major zero, a
dentry named SYSV<sysv key>, and having the minor of the internal sysv
shared memory kernel mount.
To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
SYSV<key> dentry naming convention.
User space tools will still have to be aware that hugetlb sysv shared
memory lives on a different internal kernel mount and so has a different
block device minor number from the rest of sysv shared memory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Albert Cahalan <acahalan@gmail.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have to take care that when we call udf_discard_prealloc() from
udf_clear_inode() we have to write inode ourselves afterwards (otherwise,
some changes might be lost leading to leakage of blocks, use of free blocks
or improperly aligned extents).
Also udf_discard_prealloc() does two different things - it removes
preallocated blocks and truncates the last extent to exactly match i_size.
We move the latter functionality to udf_truncate_tail_extent(), call
udf_discard_prealloc() when last reference to a file is dropped and call
udf_truncate_tail_extent() when inode is being removed from inode cache
(udf_clear_inode() call).
We cannot call udf_truncate_tail_extent() earlier as subsequent open+write
would find the last block of the file mapped and happily write to the end
of it, although the last extent says it's shorter.
[akpm@linux-foundation.org: Make checkpatch.pl happier]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Sandeen <sandeen@sandeen.net>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We only ever set do_wakeup to non-zero if the pipe has an inode
backing, so it's pointless to check outside the pipe->inode
check.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If the destination pipe is full and we already transferred
data, we break out instead of waiting for more pipe room.
The exit logic looks at spd->nr_pages to see if we moved
everything inside the spd container, but we decrement that
variable in the loop to decide when spd has emptied.
Instead we want to compare to the original page count in
the spd, so cache that in a local variable.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As we have potentially dirtied more than 1 page, we should indicate as
such to the dirty page balancing. So call
balance_dirty_pages_ratelimited_nr() and pass in the approximate number
of pages we dirtied.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Allowing attribute and symlink dentries to be reclaimed means
sd->s_dentry can change dynamically. However, updates to the field
are unsynchronized leading to race conditions. This patch adds
sysfs_lock and use it to synchronize updates to sd->s_dentry.
Due to the locking around ->d_iput, the check in sysfs_drop_dentry()
is complex. sysfs_lock only protect sd->s_dentry pointer itself. The
validity of the dentry is protected by dcache_lock, so whether dentry
is alive or not can only be tested while holding both locks.
This is minimal backport of sysfs_drop_dentry() rewrite in devel
branch.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The condition check doesn't make much sense as it basically always
succeeds. This causes NULL dereferencing on certain cases. It seems
that parentheses are put in the wrong place. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Backport of
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch
For regular files in sysfs, sysfs_readdir wants to traverse
sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number.
But, the dentry can be reclaimed under memory pressure, and there is
no synchronization with readdir. This patch follows Tejun's scheme of
allocating and storing an inode number in the new s_ino member of a
sysfs_dirent, when dirents are created, and retrieving it from there
for readdir, so that the pointer chain doesn't have to be traversed.
Tejun's upstream patch uses a new-ish "ida" allocator which brings
along some extra complexity; this -stable patch has a brain-dead
incrementing counter which does not guarantee uniqueness, but because
sysfs doesn't hash inodes as iunique expects, uniqueness wasn't
guaranteed today anyway.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] CIFS should honour umask
[CIFS] Missing flag on negprot needed for some servers to force packet signing
[CIFS] whitespace cleanup part 2
[CIFS] whitespace cleanup
[CIFS] fix mempool destroy done in wrong order in cifs error path
[CIFS] typo in previous patch
[CIFS] Fix oops on failed cifs mount (in kthread_stop)
Report the correct errno for out of memory debug output in binfmt_flat.c
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes CIFS honour a process' umask like other filesystems.
Of course the server is still free to munge the permissions if it wants
to; but the client will send the "right" permissions to begin with.
A few caveats:
1) It only applies to filesystems that have CAP_UNIX (aka support unix
extensions)
2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms()
after remote creation
When mode to CIFS/NTFS ACL mapping is complete we can do the
same thing for that case for servers which do not
support the Unix Extensions.
Signed-off-by: Matt Keenen <matt@opcode-solutions.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Original patch and description from Neil Brown <neilb@suse.de>,
merged and adapted to splice branch by me. Neils text follows:
__generic_file_splice_read() currently samples the i_size at the start
and doesn't do so again unless it needs to call ->readpage to load
a page. After ->readpage it has to re-sample i_size as a truncate
may have caused that page to be filled with zeros, and the read()
call should not see these.
However there are other activities that might cause ->readpage to be
called on a page between the time that __generic_file_splice_read()
samples i_size and when it finds that it has an uptodate page. These
include at least read-ahead and possibly another thread performing a
read
So we must sample i_size *after* it has an uptodate page. Thus the
current sampling at the start and after a read can be replaced with a
sampling before page addition into spd.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
__generic_file_splice_read's partial page check, at eof after readpage,
not only got its calculations wrong, but also reused the loff variable:
causing data corruption when splicing from a non-0 offset in the file's
last page (revealed by ext2 -b 1024 testing on a loop of a tmpfs file).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
I've seen inode related deadlocks, so move this call outside of the
actor itself, which may hold the inode lock.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This bug was caught by LTP testcase fchmod06 on Blackfin platform.
In the manpage of fchmod, "EPERM: The effective UID does not match the
owner of the file, and the process is not privileged (Linux: it does not
have the CAP_FOWNER capability)."
But the ramfs nommu code missed the inode_change_ok POSIX UID/GID
verification. This patch fixed this.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The write path code intends to bug if a math error (or unhandled case)
results in a write outside of the current cluster boundaries. The actual
BUG_ON() statements however are incorrect, leading to a crash on kernels
with 64k page size. Fix those by checking against the right variables.
Also, move the assertions higher up within the functions so that they trip
*before* the code starts to mark buffers.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Some of the sysfs changes inadvertantly broke the simple runtime debug log
filtering employed in ocfs2. Fix this by properly exporting the masklog
category filter names.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
A related signature issue that I came across.
There's a bug in win2k that when NT error codes are not negotiated, the
server doesn't response that signatures are mandatory. Since there's
(currently) no way turn on signatures in such case, I had to force NT
error codes, so that this bug will not occur
Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Various coding style problems found by running the new
checkpatch.pl script against fs/cifs. 3 more files
fixed up.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Various coding style problems found by running fs/cifs
against the new checkpatch.pl script. Since there
were too many to fit in one patch. Updated the first
four files.
Signed-off-by: Steve French <sfrench@us.ibm.com>
* git://git.infradead.org/mtd-2.6:
[JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree()
[MTD] Fix error checking after get_mtd_device() in get_sb_mtd functions
[JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes()
[JFFS2] Fix potential memory leak of dead xattrs on unmount.
[JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files.
[MTD] generalise the handling of MTD-specific superblocks
[MTD] [MAPS] don't force uclinux mtd map to be root dev
We've had several reoprts of the CPU jumping to 0x00000000 is do_ioctl(). I
assume that there's a race and someone is zeroing out the ioctl handler while
this CPU waits for the lock_kernel().
The patch adds code to detect this, then emits stuff which will hopefuly lead
us to the culprit.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab cache used as memory pool can not be destroyed before the memory
pool destruction. Because the memory pool still holds some objects and
kmem_cache_destroy() says "Can't free all objects".
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
We should keep the mdata node with higher version number, not just the
one we happen to find latest. Doh.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
update_next_aext() could possibly rewrite values in elen and eloc, possibly
leading to data corruption when rewriting a file. Use temporary variables
instead. Also advance cur_epos as it can also point to an indirect extent
pointer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we have already read enough bytes, no need to call read_more().
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
we should free just the allocated blocks.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds a check for overlap of extents and cuts short the
new extent to be inserted, if there is a chance of overlap.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
mips:
fs/afs/flock.c: In function `afs_lock_may_be_available':
fs/afs/flock.c:55: error: dereferencing pointer to incomplete type
fs/afs/flock.c: In function `afs_lock_work':
fs/afs/flock.c:84: error: dereferencing pointer to incomplete type
fs/afs/flock.c:89: error: dereferencing pointer to incomplete type
fs/afs/flock.c:109: error: dereferencing pointer to incomplete type
fs/afs/flock.c:135: error: dereferencing pointer to incomplete type
fs/afs/flock.c:143: error: dereferencing pointer to incomplete type
fs/afs/flock.c:158: error: dereferencing pointer to incomplete type
fs/afs/flock.c:161: error: dereferencing pointer to incomplete type
fs/afs/flock.c:179: error: `TASK_UNINTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:179: error: (Each undeclared identifier is reported only once
fs/afs/flock.c:179: error: for each function it appears in.)
fs/afs/flock.c:179: error: `TASK_INTERRUPTIBLE' undeclared (first use in this function)
fs/afs/flock.c:182: error: dereferencing pointer to incomplete type
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Local variable `i' is a byte-counter. Don't use it as an index into an array
of le32's.
Reported-by: "young dave" <hidave.darkstar@gmail.com>
Cc: "Christoph Lameter" <clameter@sgi.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It should be pass "newsize" to vmtruncate function to modify the
inode->i_size, while the old size is passed to vmtruncate.
This bug was caught by LTP truncate test case on Blackfin platform.
After it was fixed, the LTP truncate test case passed.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current code is leaking a reference to dreq->kref when the calls to
nfs_direct_read_schedule() and nfs_direct_write_schedule() return an
error.
This patch moves the call to kref_put() from nfs_direct_wait() back into
nfs_direct_read() and nfs_direct_write() (which are the functions that
actually took the reference in the first place) fixing the leak.
Thanks to Denis V. Lunev for spotting the bug and proposing the original
fix.
Acked-by: Denis V. Lunev <dlunev@gmail.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The recent fix for preventing NULL files from being left around does not
update the file size corectly in all cases. The missing case is a write
extending the file that does not need to allocate a block.
In that case we used a read mapping of the extent which forced the use of
the read I/O completion handler instead of the write I/O completion
handle. Hence the file size was not updated on I/O completion.
SGI-PV: 965068
SGI-Modid: xfs-linux-melb:xfs-kern:28657a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nscott@aconex.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Why is it that since the 2f1a2ccb9c console
UTF-8 fixes went into 2.6.22-rc1, the PowerMac G5 shows only inverse video
question marks for the text on tty2-6? whereas tty1 is fine, and so is x86.
No fault of that patch: by removing the old fallback behaviour, it reveals
that 32-bit setfont running on 64-bit kernels has only really worked on
the current console, the rest getting faked by that inadequate fallback.
Bring the compat do_unimap_ioctl into line with the main one: PIO_UNIMAP
and GIO_UNIMAP apply to the specified tty, not redirected to fg_console.
Use the same checks, and most particularly, remember to check access_ok:
con_set_unimap and con_get_unimap are using __get_user and __put_user.
And the compat vt_check should ask for the same capability as the main
one, CAP_SYS_TTY_CONFIG rather than CAP_SYS_ADMIN. Added in vt_ioctl's
vc_cons_allocated check for safety, though failure may well be impossible.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We weren't cleaning up our inode reference on error in
ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if
need be. Move the code to set the alloc context inode info to the end of the
function so we don't have any possibility of passing back a bad pointer.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Use zero_user_page() instead of open-coding it.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we
can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just
return AOP_TRUNCATED_PAGE if the operation fails.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6:
NFS: Fix nfs_direct_dirty_pages()
NFS: Fix handful of compiler warnings in direct.c
NFS: Avoid a deadlock situation on write
We only need to dirty the pages that were actually read in.
Also convert nfs_direct_dirty_pages() to call set_page_dirty() instead of
set_page_dirty_lock(). A call to lock_page() is unacceptable in an rpciod
callback function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>