Commit graph

9059 commits

Author SHA1 Message Date
Jens Axboe
75065ff619 Revert "relay: fix splice problem"
This reverts commit c3270e577c.
2008-05-08 14:06:19 +02:00
Randy Dunlap
ffee0259c9 docbook: fix bio missing parameter
Fix fs/bio.c kernel-doc parameter warning:
Warning(linux-2.6.25-git14//fs/bio.c:972): No description found for parameter 'reading'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 18:35:03 +02:00
Jens Axboe
eeae1d48c0 block: use unitialized_var() in bio_alloc_bioset()
Better than setting idx to some random value and it silences the
same bogus gcc warning.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 13:26:27 +02:00
Jan Kara
9afadc4b1f udf: Fix memory corruption when fs mounted with noadinicb option
When UDF filesystem is mounted with noadinicb mount option, it
happens that we extend an empty directory with a block. A code in
udf_add_entry() didn't count with this possibility and used
uninitialized data leading to memory and filesystem corruption.
Add a check whether file already has some extents before operating
on them.

Signed-off-by: Jan Kara <jack@suse.cz>
2008-05-07 09:49:52 +02:00
Rasmus Rohde
221e583a73 udf: Make udf exportable
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Rasmus Rohde <rohde@duff.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
2008-05-07 09:48:23 +02:00
Miklos Szeredi
7f3d4ee108 vfs: splice remove_suid() cleanup
generic_file_splice_write() duplicates remove_suid() just because it
doesn't hold i_mutex.  But it grabs i_mutex inside splice_from_pipe()
anyway, so this is rather pointless.

Move locking to generic_file_splice_write() and call remove_suid() and
__splice_from_pipe() instead.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 09:29:00 +02:00
Steve French
cf432eb50f [CIFS] cleanup cifsd completion
Was a holdover from the old kernel_thread based cifsd
code. We needed to know that the thread had set the task variable
before proceeding. Now that kthread_run returns the new task, this
doesn't appear to be needed anymore.

As best I can tell, this sleep was intended to try to prevent
cifs_umount from freeing the cifsSesInfo struct before cifsd had
exited. Now that cifsd is using the kthread API, we know that
when kthread_stop returns that cifsd has exited, so I don't
think this is needed any longer.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Christop Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-05-06 22:27:16 +00:00
Steve French
dea570e08a [CIFS] Remove over-indented code in find_unc().
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-05-06 22:05:51 +00:00
Linus Torvalds
6ce07c7b61 VFS: fix unused variable warning
Commit 33dcdac2df ("kill ->put_inode")
removed the final use of i_op->put_inode, but left the now totally
unused "op" variable in iput().

Get rid of it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:13:37 -07:00
Al Viro
0b2bac2f1e [PATCH] fix SMP ordering hole in fcntl_setlk()
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.

As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides.  We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).

Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves.  Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-06 13:58:34 -04:00
Steve French
a815752ac0 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-05-06 17:55:32 +00:00
Christoph Hellwig
33dcdac2df [PATCH] kill ->put_inode
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-06 13:45:34 -04:00
Roman Zippel
dca3c33652 [PATCH] fix reservation discarding in affs
- remove affs_put_inode, so preallocations aren't discared unnecessarily
  often.
- remove affs_drop_inode, it's called with a spinlock held, so it can't
  use a mutex.
- make i_opencnt atomic
- avoid direct b_count manipulations
- a few allocation failure fixes, so that these are more gracefully
  handled now.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-06 13:45:33 -04:00
Bryan Wu
eb28062f13 task_nommu: fix compile failing bug because of spilt file.h
CC      fs/proc/task_nommu.o
fs/proc/task_nommu.c: In function ‘task_mem’:
fs/proc/task_nommu.c:55: error: dereferencing pointer to incomplete type
make[2]: *** [fs/proc/task_nommu.o] Error 1
make[1]: *** [fs/proc] Error 2
make: *** [fs] Error 2

Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-04 17:08:48 -07:00
Ulrich Drepper
d35c7b0e54 unified (weak) sys_pipe implementation
This replaces the duplicated arch-specific versions of "sys_pipe()" with
one unified implementation.  This removes almost 250 lines of duplicated
code.

It's marked __weak, so that *if* an architecture wants to override the
default implementation it can do so by simply having its own replacement
version, since many architectures use alternate calling conventions for
the 'pipe()' system call for legacy reasons (ie traditional UNIX
implementations often return the two file descriptors in registers)

I still haven't changed the cris version even though Linus says the BKL
isn't needed.  The arch maintainer can easily do it if there are really
no obstacles.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-03 13:50:33 -07:00
Linus Torvalds
4f9faaace2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  rose: Wrong list_lock argument in rose_node seqops
  netns: Fix reassembly timer to use the right namespace
  netns: Fix device renaming for sysfs
  bnx2: Update version to 1.7.5.
  bnx2: Update RV2P firmware for 5709.
  bnx2: Zero out context memory for 5709.
  bnx2: Fix register test on 5709.
  bnx2: Fix remote PHY initial link state.
  bnx2: Refine remote PHY locking.
  bridge: forwarding table information for >256 devices
  tg3: Update version to 3.92
  tg3: Add link state reporting to UMP firmware
  tg3: Fix ethtool loopback test for 5761 BX devices
  tg3: Fix 5761 NVRAM sizes
  tg3: Use constant 500KHz MI clock on adapters with a CPMU
  hci_usb.h: fix hard-to-trigger race
  dccp: ccid2.c, ccid3.c use clamp(), clamp_t()
  net: remove NR_CPUS arrays in net/core/dev.c
  net: use get/put_unaligned_* helpers
  bluetooth: use get/put_unaligned_* helpers
  ...
2008-05-03 10:18:21 -07:00
Steve French
5ade9deaaa [CIFS] fix typo
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-05-02 20:56:23 +00:00
Linus Torvalds
be2e88011b Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Use GFP_NOFS in kmalloc during localalloc window move
  ocfs2: Allow uid/gid/perm changes of symlinks
  ocfs2/dlm: dlmdebug.c: make 2 functions static
  ocfs2: make struct o2cb_stack_ops static
  ocfs2: make struct ocfs2_control_device static
  ocfs2: Correct merge of 52f7c21 (Move /sys/o2cb to /sys/fs/o2cb)
2008-05-02 13:53:07 -07:00
Linus Torvalds
b66e1f11eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] fix sysctl_nr_open bugs
  [PATCH] sanitize anon_inode_getfd()
  [PATCH] split linux/file.h
  [PATCH] make osf_select() use core_sys_select()
  [PATCH] remove horrors with irix tty ioctls handling
  [PATCH] fix file and descriptor handling in perfmon
2008-05-02 11:23:14 -07:00
Denis V. Lunev
78e92b99ec netns: assign PDE->data before gluing entry into /proc tree
In this unfortunate case, proc_mkdir_mode wrapper can't be used anymore and
this is no way to reuse proc_create_data due to nlinks assignment. So,
copy the code from proc_mkdir and assign PDE->data at the appropriate
moment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 04:12:41 -07:00
Linus Torvalds
2c4aabcca8 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [MTD][NOR] Add physical address to point() method
  [JFFS2] Track parent inode for directories (for NFS export)
  [JFFS2] Invert last argument of jffs2_gc_fetch_inode(), make it boolean.
  [JFFS2] Quiet lockdep false positive.
  [JFFS2] Clean up jffs2_alloc_inode() and jffs2_i_init_once()
  [MTD] Delete long-unused jedec.h header file.
  [MTD] [NAND] at91_nand: use at91_nand_{en,dis}able consistently.
2008-05-01 11:15:28 -07:00
Jared Hulbert
a98889f3d8 [MTD][NOR] Add physical address to point() method
Adding the ability to get a physical address from point() in addition
to virtual address.  This physical address is required for XIP of
userspace code from flash.

Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 18:59:11 +01:00
David Woodhouse
27c72b040c [JFFS2] Track parent inode for directories (for NFS export)
To support NFS export, we need to know the parent inode of directories.
Rather than growing the jffs2_inode_cache structure, share space with
the nlink field -- which was always set to 1 for directories anyway.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 18:47:17 +01:00
Al Viro
5c598b3428 [PATCH] fix sysctl_nr_open bugs
* if luser with root sets it to something that is not a multiple of
  BITS_PER_LONG, the system is screwed.
* if it gets decreased at the wrong time, we can get expand_files()
  returning success and _not_ increasing the size of table as asked.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:08:57 -04:00
Al Viro
2030a42cec [PATCH] sanitize anon_inode_getfd()
a) none of the callers even looks at inode or file returned by anon_inode_getfd()
b) any caller that would try to look at those would be racy, since by the time
it returns we might have raced with close() from another thread and that
file would be pining for fjords.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:08:50 -04:00
Al Viro
9f3acc3140 [PATCH] split linux/file.h
Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:08:16 -04:00
Al Viro
a2dcb44c3c [PATCH] make osf_select() use core_sys_select()
... instead of open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:07:28 -04:00
David Woodhouse
1b690b4878 [JFFS2] Invert last argument of jffs2_gc_fetch_inode(), make it boolean.
We don't actually care about nlink; we only care whether the inode in
question is unlinked or not.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 17:24:28 +01:00
Harvey Harrison
bd7309677c fuse: use clamp() rather than nested min/max
clamp() exists for this use.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:02 -07:00
Jan Blunck
868eb7a853 autofs: path_{get,put}() cleanups
Here are some more places where path_{get,put}() can be used instead of
dput()/mntput() pair.  Besides that it fixes a bug in autofs4_mount_busy()
where mntput() was called before dput().

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Jeff Moyer
9d2de6ad2a autofs4: fix incorrect return from root.c:try_to_fill_dentry()
Jeff Moyer has identified a case where the autofs4 function
root.c:try_to_fill_dentry() can return -EBUSY when it should return 0.

Jeff's description of the way this happens is:

"automount starts an expire for directory d.  after the callout to the daemon,
but before the rmdir, another process tries to walk into the same directory.
It puts itself onto the waitq, pending the expiration.

When the expire finishes, the second process is woken up.  In
try_to_fill_dentry, it does this check:

                status = d_invalidate(dentry);
                if (status != -EBUSY)
                        return -EAGAIN;

And status is EBUSY.  The dentry still has a non-zero d_inode, and the
flags do not contain LOOKUP_CONTINUE or LOOKUP_DIRECTORY

So, we fall through and return -EBUSY to the caller."

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Jeff Moyer
033790449b autofs4: fix execution order race in mount request code
Jeff Moyer has identified a race in due to an execution order dependency
in the autofs4 function root.c:try_to_fill_dentry().

Jeff's description of this race is:

"P1 does a lookup of /mount/submount/foo.  Since the VFS can't find an entry
for "foo" under /mount/submount, it calls into the autofs4 kernel module to
allocate a new dentry, D1.  The kernel creates a new waitq for this lookup and
calls the daemon to perform the mount.

The daemon performs a mkdir of the "foo" directory under /mount/submount,
which ends up creating a *new* dentry, D2.

Then, P2 does a lookup of /mount/submount/foo.  The VFS path walking logic
finds a dentry in the dcache, D2, and calls the revalidate function with this.
 In the autofs4 revalidate code, we then trigger a mount, since the dentry is
an empty directory that isn't a mountpoint, and so set DCACHE_AUTOFS_PENDING
and call into the wait code to trigger the mount.

The wait code finds our existing waitq entry (since it is keyed off of the
directory name) and adds itself to the list of waiters.

After the daemon finishes the mount, it calls back into the kernel to release
the waiters.  When this happens, P1 is woken up and goes about clearing the
DCACHE_AUTOFS_PENDING flag, but it does this in D1!  So, given that P1 in our
case is a program that will immediately try to access a file under
/mount/submount/foo, we end up finding the dentry D2 which still has the
pending flag set, and we set out to wait for a mount *again*!

So, one way to address this is to re-do the lookup at the end of
try_to_fill_dentry, and to clear the pending flag on the hashed dentry.  This
seems a sane approach to me."

And Jeff's patch does this.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Ian Kent
cab0936aac autofs4: check for invalid dentry in getpath
Catch invalid dentry when calculating its path.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Ian Kent
afec570c32 autofs4: fix sparse warning in waitq.c:autofs4_expire_indirect()
Re-order some code in expire.c:autofs4_expire_indirect() to avoid compile
warning, reported by Harvey Harrison:

 CHECK   fs/autofs4/expire.c
fs/autofs4/expire.c:383:2: warning: context imbalance in
'autofs4_expire_indirect' - unexpected unlock

Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Miklos Szeredi
02c6be615f vfs: fix permission checking in sys_utimensat
If utimensat() is called with both times set to UTIME_NOW or one of them to
UTIME_NOW and the other to UTIME_OMIT, then it will update the file time
without any permission checking.

I don't think this can be used for anything other than a local DoS, but could
be quite bewildering at that (e.g.  "Why was that large source tree rebuilt
when I didn't modify anything???")

This affects all kernels from 2.6.22, when the utimensat() syscall was
introduced.

Fix by doing the same permission checking as for the "times == NULL" case.

Thanks to Michael Kerrisk, whose utimensat-non-conformances-and-fixes.patch in
-mm also fixes this (and breaks other stuff), only he didn't realize the
security implications of this bug.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
David Woodhouse
590fe34c47 [JFFS2] Quiet lockdep false positive.
Don't hold f->sem while calling into jffs2_do_create(). It makes lockdep
unhappy, and we don't really need it -- the _reason_ it's a false
positive is because nobody else can see this inode yet and so nobody
will be trying to lock it anyway.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 15:53:28 +01:00
David Woodhouse
4e571aba7b [JFFS2] Clean up jffs2_alloc_inode() and jffs2_i_init_once()
Ditch a couple of pointless casts from void *, and use the normal
variable name 'f' for jffs2_inode_info pointers -- especially since
it actually shows up in lockdep reports.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 12:29:37 +01:00
Al Viro
214b7049a7 Fix dnotify/close race
We have a race between fcntl() and close() that can lead to
dnotify_struct inserted into inode's list *after* the last descriptor
had been gone from current->files.

Since that's the only point where dnotify_struct gets evicted, we are
screwed - it will stick around indefinitely.  Even after struct file in
question is gone and freed.  Worse, we can trigger send_sigio() on it at
any later point, which allows to send an arbitrary signal to arbitrary
process if we manage to apply enough memory pressure to get the page
that used to host that struct file and fill it with the right pattern...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 20:09:00 -07:00
Sunil Mushran
4ba1c5bfd2 ocfs2: Use GFP_NOFS in kmalloc during localalloc window move
kmalloc() during a localalloc window move can trigger the mm to prune
the dcache which inturn can trigger the fs to delete an inode causing
it start a recursive transaction.

The fix also makes the change in kmalloc during localalloc shutdown
just to be safe.

Fixes oss bugzilla#901
http://oss.oracle.com/bugzilla/show_bug.cgi?id=901

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:09:58 -07:00
Sunil Mushran
bc535809c0 ocfs2: Allow uid/gid/perm changes of symlinks
This patch adds the ability to change attributes of a symlink.
Fixes oss bugzilla#963
http://oss.oracle.com/bugzilla/show_bug.cgi?id=963

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:09:54 -07:00
Adrian Bunk
95642e5664 ocfs2/dlm: dlmdebug.c: make 2 functions static
This patch makes the following needlessly global functions static:
- stringify_lockname()
- dlm_debug_put()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:09:40 -07:00
Adrian Bunk
4af694e672 ocfs2: make struct o2cb_stack_ops static
This patch makes the needlessly global struct o2cb_stack_ops static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:09:25 -07:00
Adrian Bunk
4d8755b5e6 ocfs2: make struct ocfs2_control_device static
This patch makes the needlessly global struct ocfs2_control_device
static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:09:08 -07:00
Joel Becker
9d80f7539a ocfs2: Correct merge of 52f7c21 (Move /sys/o2cb to /sys/fs/o2cb)
Commit 52f7c21b61 was intended to move
/sys/o2cb to /sys/fs/o2cb, providing /sys/o2cb as a symlink for
backwards compatibility.  However, the merge apparently added the
symlink but failed to move the directory, resulting in a duplicate
filename error.  It's a one-line change that was missing.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30 17:07:59 -07:00
Robert P. J. Day
883ce42ec4 DEBUGFS: Correct location of debugfs API documentation.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:47 -07:00
Ben Hutchings
40a2159abf sysfs: Disallow truncation of files in sysfs
sysfs allows attribute files to be truncated, e.g. using ftruncate(), with the
expected effect on their inode.   For most attributes, this doesn't change the
"real" size of the file i.e. how much can be read from it.  However, the
parameter validation for reading and writing binary attribute files is based
on the inode size and not the size specified in the file's bin_attribute, so it
can be broken by this. For example, if we try using dd to write to such a file:

# pwd
/sys/bus/pci/devices/0000:08:00.0
# ls -l config
-rw-r--r--  1 root root 4096 Feb  1 17:35 config
# dd if=/dev/zero of=config bs=4 count=1
1+0 records in
1+0 records out
# ls -l config
-rw-r--r--  1 root root 0 Feb  1 17:50 config
# dd if=/dev/zero of=config bs=4 count=1 seek=128
dd: writing `config': No space left on device
1+0 records in
0+0 records out

Also, after truncation to 0, parameter validation for read and write is
disabled.  Most bin_attribute read and write methods also validate the size and
offset, but for some this will allow out-of-range access.  This may be a
security issue, though access to such files is often limited to root.  In any
case, the validation should remain for safety's sake!)

This was previously reported in Bugzilla as bug 9867.

sysfs should ignore size changes or else refuse them (by returning -EINVAL).
This patch makes it ignore them.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:46 -07:00
Linus Torvalds
d67c6f869c Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] Update default configuration.
  [S390] use generic sys_ptrace
  [S390] Remove self ptrace IEEE_IP hack.
  [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
  [S390] System z large page support.
  [S390] Convert machine feature detection code to C.
  [S390] vmemmap: use clear_table to initialise page tables.
  [S390] Move stfl to system.h and delete duplicated version.
  [S390] uaccess_mvcos: #ifdef config dependent code.
  [S390] cpu topology: Fix possible deadlock.
  [S390] Add topology_core_siblings to topology.h
  [S390] cio: Make isc handling more robust.
  [S390] remove -traditional
  [S390] Automatically detect added cpus.
  [S390] smp: Fix locking order.
  [S390] Add missing ifndef/define to include/asm-s390/sysinfo.h.
  [S390] Move show_regs to traps.c.
  [S390] cio: Use strict_strtoul() for attributes.
2008-04-30 08:38:30 -07:00
Harvey Harrison
8e24eea728 fs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Harvey Harrison
530b641278 afs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Thomas Gleixner
c6f3a97f86 debugobjects: add timer specific object debugging code
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Andrew Morton
487798df6d hfsplus: fix warning with 64k PAGE_SIZE
fs/hfsplus/btree.c: In function 'hfsplus_bmap_alloc':
fs/hfsplus/btree.c:239: warning: comparison is always false due to limited range of data type

But this might hide a real bug?

Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:52 -07:00
Andrew Morton
3e5a509730 hfs: fix warning with 64k PAGE_SIZE
fs/hfs/btree.c: In function 'hfs_bmap_alloc':
fs/hfs/btree.c:263: warning: comparison is always false due to limited range of data type

The patch makes the warning go away, but the code might actually be buggy?

Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:52 -07:00
Marcin Slusarz
07132922aa sysv: [bl]e*_add_cpu conversion
replace all:
big/little_endian_variable = cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	[bl]eX_add_cpu(&big/little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:52 -07:00
Marcin Slusarz
e3592b12f5 quota: le*_add_cpu conversion
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Marcin Slusarz
20c79e785a hfs/hfsplus: be*_add_cpu conversion
replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
					expression_in_cpu_byteorder);
with:
	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Marcin Slusarz
6369a4abb4 affs: be*_add_cpu conversion
replace all:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
					expression_in_cpu_byteorder);
with:
	beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Christoph Hellwig
86098fa011 reiserfs: use open_bdev_excl
Use the proper helper to open a blockdevice by name for filesystem use,
this makes sure it's properly claimed (also added for open-by-number) and
gets rid of the struct file abuse.

Tested by mounting a reiserfs filesystem with external journal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
4dbf930ed6 fuse: fix sparse warnings
fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
5559b8f4d1 fuse: fix race in llseek
Fuse doesn't use i_mutex to protect setting i_size, and so
generic_file_llseek() can be racy: it doesn't use i_size_read().

So do a fuse specific llseek method, which does use i_size_read().

[akpm@linux-foundation.org: make `retval' loff_t]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
b48badf013 fuse: fix node ID type
Node ID is 64bit but it is passed as unsigned long to some functions.  This
breakage wasn't noticed, because libfuse uses unsigned long too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
e5d9a0df07 fuse: fix max i/o size calculation
Fix a bug that Werner Baumann reported: fuse can send a bigger write request
than the maximum specified.  This only affected direct_io operation.

In addition set a sane minimum for the max_read and max_write tunables, so I/O
always makes some progress.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
5c5c5e51b2 fuse: update file size on short read
If the READ request returned a short count, then either

  - cached size is incorrect
  - filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Nick Piggin
ea9b9907b8 fuse: implement perform_write
Introduce fuse_perform_write.  With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).

[mszeredi@suse.cz]:

 - split into smaller functions
 - testing
 - duplicate generic_file_aio_write(), so that there's no need to add a
   new ->perform_write() a_op.  Comment from hch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
854512ec35 fuse: clean up setting i_size in write
Extract common code for setting i_size in write functions into a common
helper.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
3be5a52b30 fuse: support writable mmap
Quoting Linus (3 years ago, FUSE inclusion discussions):

  "User-space filesystems are hard to get right. I'd claim that they
   are almost impossible, unless you limit them somehow (shared
   writable mappings are the nastiest part - if you don't have those,
   you can reasonably limit your problems by limiting the number of
   dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others).  This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address.  It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks.  Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented.  The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding.  This makes sure, there can only be one temporary page used at a
time for one cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write.  It may be buggy or even
malicious, and fail to complete WRITE requests.  We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request.  There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

  - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
  - page migration
  - throttle_vm_writeout (through NR_WRITEBACK)
  - sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safetly measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default.  This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
fc3ba692a4 mm: Add NR_WRITEBACK_TEMP counter
Fuse will use temporary buffers to write back dirty data from memory mappings
(normal writes are done synchronously).  This is needed, because there cannot
be any guarantee about the time in which a write will complete.

By using temporary buffers, from the MM's point if view the page is written
back immediately.  If the writeout was due to memory pressure, this
effectively migrates data from a full zone to a less full zone.

This patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used
as temporary buffers.

[Lee.Schermerhorn@hp.com: add vmstat_text for NR_WRITEBACK_TEMP]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
e4ad08fe64 mm: bdi: add separate writeback accounting capability
Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB.  If this flag is
set, then don't update the per-bdi writeback stats from
test_set_page_writeback() and test_clear_page_writeback().

Misc cleanups:

 - convert bdi_cap_writeback_dirty() and friends to static inline functions
 - create a flag that includes all three dirty/writeback related flags,
   since almst all users will want to have them toghether

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
b6f2fcbcfc mm: bdi: expose the BDI object in sysfs for FUSE
Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

Make the fuse control filesystem use s_dev instead of a fuse specific ID.
This makes it easier to match directories under /sys/fs/fuse/connections/ with
directories under /sys/class/bdi, and with actual mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Miklos Szeredi
fa799759f9 mm: bdi: expose the BDI object in sysfs for NFS
Register NFS' backing_dev_info under sysfs with the name "nfs-MAJOR:MINOR"

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Sukadev Bhattiprolu
718a916338 devpts: factor out PTY index allocation
Factor out the code used to allocate/free a pts index into new interfaces,
devpts_new_index() and devpts_kill_index().  This localizes the external data
structures used in managing the pts indices.

[akpm@linux-foundation.org: undo accidental mutex2sem conversion]
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:48 -07:00
Alan Cox
f34d7a5b70 tty: The big operations rework
- Operations are now a shared const function block as with most other Linux
  objects

- Introduce wrappers for some optional functions to get consistent behaviour

- Wrap put_char which used to be patched by the tty layer

- Document which functions are needed/optional

- Make put_char report success/fail

- Cache the driver->ops pointer in the tty as tty->ops

- Remove various surplus lock calls we no longer need

- Remove proc_write method as noted by Alexey Dobriyan

- Introduce some missing sanity checks where certain driver/ldisc
  combinations would oops as they didn't check needed methods were present

[akpm@linux-foundation.org: fix fs/compat_ioctl.c build]
[akpm@linux-foundation.org: fix isicom]
[akpm@linux-foundation.org: fix arch/ia64/hp/sim/simserial.c build]
[akpm@linux-foundation.org: fix kgdb]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:47 -07:00
Alan Cox
5d0fdf1e01 tty_io: fix remaining pid struct locking
This fixes the last couple of pid struct locking failures I know about.

[oleg@tv-sign.ru: clean up do_task_stat()]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Alan Cox
04f378b198 tty: BKL pushdown
- Push the BKL down into the line disciplines
- Switch the tty layer to unlocked_ioctl
- Introduce a new ctrl_lock spin lock for the control bits
- Eliminate much of the lock_kernel use in n_tty
- Prepare to (but don't yet) call the drivers with the lock dropped
  on the paths that historically held the lock

BKL now primarily protects open/close/ldisc change in the tty layer

[jirislaby@gmail.com: a couple of fixes]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Oleg Nesterov
2800d8d19e document de_thread() with exit_notify() connection
Add a couple of small comments, it is not easy to see what this code does.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:38 -07:00
Roland McGrath
f3de272b82 signals: use HAVE_SET_RESTORE_SIGMASK
Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK.  If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
4e4c22c711 signals: add set_restore_sigmask
This adds the set_restore_sigmask() inline in <linux/thread_info.h> and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it.  No
change, but abstracts the details of the flag protocol from all the calls.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
7a5e873f09 signals: de_thread: simplify the ->child_reaper switching
Now that we rely on SIGNAL_UNKILLABLE flag, de_thread() doesn't need the nasty
hack to kill the old ->child_reaper during the mt-exec.

This also means we can avoid taking tasklist_lock around zap_other_threads().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
06fffb1267 do_task_stat: don't take rcu_read_lock()
lock_task_sighand() was changed, and do_task_stat() doesn't need
rcu_read_lock any longer.  sighand->siglock protects all "interesting"
fields.

Except: it doesn't protect ->tty->pgrp, but neither does rcu_read_lock(), this
should be fixed.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Pavel Emelyanov <xemul@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Jan Kara
2deb1acc65 isofs: fix access to unallocated memory when reading corrupted filesystem
When a directory on isofs is corrupted, we did not check whether length of the
name in a directory entry and the length of the directory entry itself are
consistent.  This could lead to possible access beyond the end of buffer when
the length of the name was too big.  Add this sanity check to directory
reading code.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
David Chinner
64275ea4f3 [XFS] Include linux/random.h in all builds, not just debug.
Noted-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dave Chinner <dgc@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 07:53:50 -07:00
Gerald Schaefer
53492b1de4 [S390] System z large page support.
This adds hugetlbfs support on System z, using both hardware large page
support if available and software large page emulation on older hardware.
Shared (large) page tables are implemented in software emulation mode,
by using page->index of the first tail page from a compound large page
to store page table information.

Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-04-30 13:38:47 +02:00
Linus Torvalds
c4755d16fc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits)
  ext4: fix hot spins in mballoc after err_freebuddy and err_freemeta
  ext4: fix test ext_generic_write_end() copied return value
  ext3: fix test ext_generic_write_end() copied return value
  ext4: Move mballoc headers/structures to a seperate header file mballoc.h
  ext4: cleanup for compiling mballoc with verification and debugging #defines
  ext4: don't use ext4_error in ext4_check_descriptors
  ext4: mark inode dirty after initializing the extent tree
  ext4: update ctime and mtime for truncate with extents.
  ext4: Don't do GFP_NOFS allocations after taking ext4_lock_group
  ext4: move headers out of include/linux
  ext4: fix wrong gfp type under transaction
  ext4: Fix hang on umount with quotas when journal is aborted
  ext4: Fix update of mtime and ctime on rename
  jdb2: replace remaining __FUNCTION__ occurrences
  ext4: replace remaining __FUNCTION__ occurrences
  jbd2: only create debugfs and stats entries if init is successful
  jbd2: fix kernel-doc notation
  jbd2: replace potentially false assertion with if block
  jbd2: eliminate duplicated code in revocation table init/destroy functions
  jbd2: tidy up revoke cache initialisation and destruction
  ...
2008-04-29 20:34:49 -07:00
Linus Torvalds
c15a2434ed Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (24 commits)
  [XFS] Fix build failure after enabling CONFIG_XFS_DEBUG
  [XFS] remove dmapi cruft in xfs_file.c
  [XFS] remove sendfile leftovers
  [XFS] allow enabling CONFIG_XFS_DEBUG
  [XFS] Don't initialise new inode generation numbers to zero
  [XFS] Fix check for block zero access in xfs_write_iomap_allocate()
  [XFS] Don't double count reserved block changes on UP.
  [XFS] remove xfs_log_ticket_zone on rmmod
  [XFS] fix non-smp xfs build
  [XFS] Fix broken HAVE_SPLICE removal commit.
  [XFS] kill XFS_ICSB_SB_LOCKED
  [XFS] split xfs_icsb_balance_counter
  [XFS] Add xfs_icsb_sync_counters_locked for when m_sb_lock already held
  [XFS] Cleanup xfs_attr a bit with xfs_name and remove cred
  [XFS] kill usesless IHOLD calls in xfs_remove and xfs_rmdir
  [XFS] kill parent == child checks in xfs_remove and xfs_rmdir
  [XFS] kill usesless IHOLD calls in xfs_rename
  [XFS] remove manual lookup from xfs_rename and simplify locking
  [XFS] shrink mrlock_t
  [XFS] simplify xfs_lookup
  ...
2008-04-29 20:34:17 -07:00
Roel Kluin
f1fa3342e2 ext4: fix hot spins in mballoc after err_freebuddy and err_freemeta
In ext4_mb_init_backend() 'i' is of type ext4_group_t. Since unsigned, i
>= 0 is always true, so fix hot spins after err_freebuddy: and -meta:
and prevent decrements when zero.

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:01:15 -04:00
Roel Kluin
f8a87d8930 ext4: fix test ext_generic_write_end() copied return value
'copied' is unsigned, whereas 'ret2' is not. The test (copied < 0) fails

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:01:18 -04:00
Roel Kluin
7c2f3d6f89 ext3: fix test ext_generic_write_end() copied return value
'copied' is unsigned, whereas 'ret2' is not. The test (copied < 0) fails

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:01:27 -04:00
Mingming Cao
8f6e39a7ad ext4: Move mballoc headers/structures to a seperate header file mballoc.h
Move function and structure definiations out of mballoc.c and put it under
a new header file mballoc.h

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:01:31 -04:00
Solofo Ramangalahy
60bd63d192 ext4: cleanup for compiling mballoc with verification and debugging #defines
This patch allows compiling mballoc with:
#define AGGRESSIVE_CHECK
#define DOUBLE_CHECK
#define MB_DEBUG

It fixes:
Compilation errors:
fs/ext4/mballoc.c: In function '__mb_check_buddy':
fs/ext4/mballoc.c:605: error: 'struct ext4_prealloc_space' has no member named 'group_list'
fs/ext4/mballoc.c:606: error: 'struct ext4_prealloc_space' has no member named 'pstart'
fs/ext4/mballoc.c:608: error: 'struct ext4_prealloc_space' has no member named 'len'

Compilation warnings:
fs/ext4/mballoc.c: In function 'ext4_mb_normalize_group_request':
fs/ext4/mballoc.c:2863: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
fs/ext4/mballoc.c: In function 'ext4_mb_use_inode_pa':
fs/ext4/mballoc.c:3103: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'

Sparse check:
fs/ext4/mballoc.c:3818:2: warning: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 21:59:59 -04:00
Josef Bacik
c19204b0ae ext4: don't use ext4_error in ext4_check_descriptors
Because ext4_check_descriptors is called at mount time you can't use ext4_error
as it calls ext4_commit_sb, which since the sb isn't all the way initialized
causes bad things to happen (ie a panic).  This patch changes the ext4_error's
to printk's to keep this problem from happening.  Thanks much,

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:28 -04:00
Aneesh Kumar K.V
8753e88f1b ext4: mark inode dirty after initializing the extent tree
We should mark the inode dirty only after initializing the extent
tree.  Also if we fail during extent initialization we need
to call DQUOT_FREE_INODE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:36 -04:00
Solofo Ramangalahy
ef7377289a ext4: update ctime and mtime for truncate with extents.
The recently announced "Linux POSIX file system test suite"
caught a truncate issue when using extents:
mtime and ctime are not updated when truncate is successful.

This is the single issue caught with "default" ext4 (mkfs and mount
with minimal options).
The testsuite does not report failure with -o noextents.

With the following patch, all tests of the testsuite pass.

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com> 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:41 -04:00
Aneesh Kumar K.V
c83617db76 ext4: Don't do GFP_NOFS allocations after taking ext4_lock_group
We can't do GFP_NOFS allocation after taking ext4_lock_group

BUG: sleeping function called from invalid context at mm/slab.c:3054
in_atomic():1, irqs_disabled():0
1 lock held by vi/2426:
#0:  (&ei->i_data_sem){----}, at: [<c01cf665>] ext4_release_file+0x23/0x66
Pid: 2426, comm: vi Not tainted 2.6.25-rc7 #24
[<c011a3dc>] __might_sleep+0xbe/0xc5
[<c01620c9>] kmem_cache_alloc+0x22/0xa6
[<c01e382a>] ext4_mb_release_inode_pa+0x73/0x1b3
[<c01e6adf>] ext4_mb_discard_inode_preallocations+0x22d/0x2d4
[<c013000a>] ? param_set_ushort+0x32/0x39
[<c01ceba1>] ext4_discard_reservation+0x27/0x6a
[<c01cf66c>] ext4_release_file+0x2a/0x66
[<c0165bd6>] __fput+0xae/0x155
[<c0165e46>] fput+0x17/0x19
[<c0163756>] filp_close+0x50/0x5a
[<c01647c0>] sys_close+0x71/0xad
[<c0104aba>] sysenter_past_esp+0x5f/0xa5

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:00:47 -04:00
Christoph Hellwig
3dcf54515a ext4: move headers out of include/linux
Move ext4 headers out of include/linux.  This is just the trivial move,
there's some more thing that could be done later. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 18:13:32 -04:00
Josef Bacik
216553c4b7 ext4: fix wrong gfp type under transaction
This fixes the allocations with GFP_KERNEL while under a transaction problems
in ext4.  This patch is the same as its ext3 counterpart, just switches these
to GFP_NOFS.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:02 -04:00
Jan Kara
2887df139c ext4: Fix hang on umount with quotas when journal is aborted
Call dquot_drop() from ext4_dquot_drop() even if we fail to start a
transaction. Otherwise we never get to dropping references to quota structures
from the inode and umount will hang indefinitely.  Thanks to Payphone LIOU for
spotting the problem.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
CC: Payphone LIOU <lioupayphone@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:07 -04:00
Jan Kara
53b7e9f680 ext4: Fix update of mtime and ctime on rename
The patch below makes ext4 update mtime and ctime of the directory
into which we move file even if the directory entry already exists.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 22:02:11 -04:00
Dave Jones
355a46961b trivial: fix user-visible typo in hfsplus
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 13:23:21 -07:00
Steve French
9b1ec9ecea [CIFS] Remove duplicate call to mode_to_acl
The current logic in cifs_setattr calls mode_to_acl twice on mode
changes if cifsacl is enabled. Remove the duplicate call.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-04-29 20:15:43 +00:00
Linus Torvalds
bd5d435a96 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Skip I/O merges when disabled
  block: add large command support
  block: replace sizeof(rq->cmd) with BLK_MAX_CDB
  ide: use blk_rq_init() to initialize the request
  block: use blk_rq_init() to initialize the request
  block: rename and export rq_init()
  block: no need to initialize rq->cmd with blk_get_request
  block: no need to initialize rq->cmd in prepare_flush_fn hook
  block/blk-barrier.c:blk_ordered_cur_seq() mustn't be inline
  block/elevator.c:elv_rq_merge_ok() mustn't be inline
  block: make queue flags non-atomic
  block: add dma alignment and padding support to blk_rq_map_kern
  unexport blk_max_pfn
  ps3disk: Remove superfluous cast
  block: make rq_init() do a full memset()
  relay: fix splice problem
2008-04-29 08:18:03 -07:00
Jeff Moyer
39fa00311f aio: fix misleading comments
The FIXME comments are inaccurate.
The locking comment over lookup_ioctx() is wrong.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:29 -07:00
Harvey Harrison
97a4feb4a7 ncpfs: use get/put_unaligned_* helpers
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:28 -07:00
Harvey Harrison
58d485d481 isofs: use get/put_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:28 -07:00
Harvey Harrison
8b3789e5d5 hfsplus: use get/put_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:28 -07:00
Harvey Harrison
803f445f17 fat: use get/put_unaligned_* helpers
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:28 -07:00
David Howells
9396d496d7 afs: support the CB.ProbeUuid RPC op
Add support for the CB.ProbeUuid cache manager RPC op.  This allows a modern
OpenAFS server to quickly ask if the client has been rebooted.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
David Howells
7c80bcce34 afs: the AFS RPC op CBGetCapabilities is actually CBTellMeAboutYourself
The AFS RxRPC op CBGetCapabilities is actually CBTellMeAboutYourself.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Robert P. J. Day
0ae52d6fba afs: use the shorter LIST_HEAD for brevity
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Hirofumi Nakagawa
801678c5a3 Remove duplicated unlikely() in IS_ERR()
Some drivers have duplicated unlikely() macros.  IS_ERR() already has
unlikely() in itself.

This patch cleans up such pointless code.

Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:25 -07:00
Pavel Emelyanov
d7321cd624 sysctl: add the ->permissions callback on the ctl_table_root
When reading from/writing to some table, a root, which this table came from,
may affect this table's permissions, depending on who is working with the
table.

The core hunk is at the bottom of this patch.  All the rest is just pushing
the ctl_table_root argument up to the sysctl_perm() function.

This will be mostly (only?) used in the net sysctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Pavel Emelyanov
7708bfb1c8 sysctl: merge equal proc_sys_read and proc_sys_write
Many (most of) sysctls do not have a per-container sense.  E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget and so on
and so forth.  Besides, tuning then from inside a container is not even
secure.  On the other hand, hiding them completely from the container's tasks
sometimes causes user-space to stop working.

When developing net sysctl, the common practice was to duplicate a table and
drop the write bits in table->mode, but this approach was not very elegant,
lead to excessive memory consumption and was not suitable in general.

Here's the alternative solution.  To facilitate the per-container sysctls
ctl_table_root-s were introduced.  Each root contains a list of
ctl_table_header-s that are visible to different namespaces.  The idea of this
set is to add the permissions() callback on the ctl_table_root to allow ctl
root limit permissions to the same ctl_table-s.

The main user of this functionality is the net-namespaces code, but later this
will (should) be used by more and more namespaces, containers and control
groups.

Actually, this idea's core is in a single hunk in the third patch.  First two
patches are cleanups for sysctl code, while the third one mostly extends the
arguments set of some sysctl functions.

This patch:

These ->read and ->write callbacks act in a very similar way, so merge these
paths to reduce the number of places to patch later and shrink the .text size
(a bit).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Denis V. Lunev
79da3664f6 jbd2: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
19b4fc52d6 reiserfs: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

/proc entry owner is also added.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
46fe74f2ae ext4: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
21ac295b42 afs: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
34b37235c6 nfs: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
9ef2db2630 nfsd: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops be setup before gluing PDE to
main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Denis V. Lunev
59b7435149 proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code.  The original OOPS is described in the
commit 2d3a4e3666:

    Typical PDE creation code looks like:

    	pde = create_proc_entry("foo", 0, NULL);
    	if (pde)
    		pde->proc_fops = &foo_proc_fops;

    Notice that PDE is first created, only then ->proc_fops is set up to
    final value. This is a problem because right after creation
    a) PDE is fully visible in /proc , and
    b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
       possible to ->read without ->open (see one class of oopses below).

    The fix is new API called proc_create() which makes sure ->proc_fops are
    set up before gluing PDE to main tree. Typical new code looks like:

    	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
    	if (!pde)
    		return -ENOMEM;

    Fix most networking users for a start.

    In the long run, create_proc_entry() for regular files will go.

In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.

create_proc_entries is replaced in the entire kernel code as new method
is also simply better.

This patch:

The problem is the same as for de->proc_fops.  Right now PDE becomes visible
without data set.  So, the entry could be looked up without data.  This, in
most cases, will simply OOPS.

proc_create_data call is created to address this issue.  proc_create now
becomes a wrapper around it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
b640a89ddd proc: convert /proc/tty/ldiscs to seq_file interface
Note: THIS_MODULE and header addition aren't technically needed because
      this code is not modular, but let's keep it anyway because people
      can copy this code into modular code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
8731f14d37 proc: remove ->get_info infrastructure
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.

P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
      is long-standing crap, BTW, thus
      a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
      b) new such users should be rejected,
      c) everyone is encouraged to convert his favourite ->read_proc user or
         I'll do it, lazy bastards.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:19 -07:00
Alexey Dobriyan
c74c120a21 proc: remove proc_root from drivers
Remove proc_root export.  Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.

So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
928b4d8c89 proc: remove proc_root_driver
Use creation by full path: "driver/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
36a5aeb878 proc: remove proc_root_fs
Use creation by full path instead: "fs/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
9c37066d88 proc: remove proc_bus
Remove proc_bus export and variable itself. Using pathnames works fine
and is slightly more understandable and greppable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
5e971dce0b proc: drop several "PDE valid/invalid" checks
proc-misc code is noticeably full of "if (de)" checks when PDE passed is
always valid.  Remove them.

Addition of such check in proc_lookup_de() is for failed lookup case.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
7cee4e00e0 proc: less special case in xlate code
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail.  However, if NULL is passed as parent then create/remove
accept full path as a argument.  This is arbitrary restriction -- all
infrastructure is in place.

So, patch allows the following to succeed:

	create_proc_entry("foo/bar", 0, pde_baz);
	remove_proc_entry("baz/foo/bar", &proc_root);

Also makes the following to behave identically:

	create_proc_entry("foo/bar", 0, NULL);
	create_proc_entry("foo/bar", 0, &proc_root);

Discrepancy noticed by Den Lunev (IIRC).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
f649d6d326 proc: simplify locking in remove_proc_entry()
proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Roland McGrath
638fa202cd procfs: mem permission cleanup
This cleans up the permission checks done for /proc/PID/mem i/o calls.  It
puts all the logic in a new function, check_mem_permission().

The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
magical expression multiple times.  The new function does all that work in one
place, with clear comments.

The old code called security_ptrace() twice on successful checks, once in
MAY_PTRACE() and once in __ptrace_may_attach().  Now it's only called once,
and only if all other checks have succeeded.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
0d5c9f5f59 proc: switch to proc_create()
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Matt Helsley
925d1c401f procfs task exe symlink
The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA.  Then the path to the file is reconstructed and
reported as the result.

Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.

That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs.  So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped.  This avoids pinning the mounted filesystem.

[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap]
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
Cc:"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Alexey Dobriyan
e93b4ea20a proc: print more information when removing non-empty directories
This usually saves one recompile to insert similar printk like below. :)

Sample nastygram:

remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G   M     2.6.25-rc1 #5

Call Trace:
 [<ffffffff80231974>] warn_on_slowpath+0x64/0x90
 [<ffffffff80232a6e>] printk+0x4e/0x60
 [<ffffffff802d6c8a>] remove_proc_entry+0x18a/0x200
 [<ffffffff8045cd88>] mutex_lock_nested+0x1c8/0x2d0
 [<ffffffff8025f0f0>] __try_stop_module+0x0/0x40
 [<ffffffff8025effd>] sys_delete_module+0x14d/0x200
 [<ffffffff8045df3d>] lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff8031c307>] __up_read+0x27/0xa0
 [<ffffffff8045decc>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b6ab>] system_call_after_swapgs+0x7b/0x80

---[ end trace 10ef850597e89c54 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
WANG Cong
4220b7fe89 elf: fix shadowed variables in fs/binfmt_elf.c
Fix these sparse warings:
fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one
fs/binfmt_elf.c:1734:28: originally declared here
fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one
fs/binfmt_elf.c:1892:24: originally declared here

[akpm@linux-foundation.org: chose better variable name]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Cyrill Gorcunov
6970c8eff8 BINFMT: fill_elf_header cleanup - use straight memset first
This patch does simplify fill_elf_header function by setting
to zero the whole elf header first. So we fillup the fields
we really need only.

before:
   text    data     bss     dec     hex filename
  11735      80       0   11815    2e27 fs/binfmt_elf.o

after:
   text    data     bss     dec     hex filename
  11710      80       0   11790    2e0e fs/binfmt_elf.o

viola, 25 bytes of text is freed

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Balbir Singh
cf475ad28a cgroups: add an owner to the mm_struct
Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage.  The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined.  It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes.  The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>,
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Serge E. Hallyn
08ce5f16ee cgroups: implement device whitelist
Implement a cgroup to track and enforce open and mknod restrictions on device
files.  A device cgroup associates a device access whitelist with each cgroup.
 A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.  Major
and minor are either an integer or * for all.  Access is a composition of r
(read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
the parent.  Admins can then remove devices from the whitelist or add new
entries.  A child cgroup can never receive a device access which is denied its
parent.  However when a device access is removed from a parent it will not
also be removed from the child(ren).

An entry is added using devices.allow, and removed using
devices.deny.  For instance

	echo 'c 1:3 mr' > /cgroups/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

	echo a > /cgroups/1/devices.deny

will remove the default 'a *:* mrw' entry.

CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
has.  Any task can move itself between cgroups.  This won't be sufficient, but
we can decide the best way to adequately restrict movement later.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Michael Halcrow
2f9b12a31f eCryptfs: protect crypt_stat->flags in ecryptfs_open()
Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
6a3fd92e73 eCryptfs: make key module subsystem respect namespaces
Make eCryptfs key module subsystem respect namespaces.

Since I will be removing the netlink interface in a future patch, I just made
changes to the netlink.c code so that it will not break the build.  With my
recent patches, the kernel module currently defaults to the device handle
interface rather than the netlink interface.

[akpm@linux-foundation.org: export free_user_ns()]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
f66e883eb6 eCryptfs: integrate eCryptfs device handle into the module.
Update the versioning information.  Make the message types generic.  Add an
outgoing message queue to the daemon struct.  Make the functions to parse
and write the packet lengths available to the rest of the module.  Add
functions to create and destroy the daemon structs.  Clean up some of the
comments and make the code a little more consistent with itself.

[akpm@linux-foundation.org: printk fixes]
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Michael Halcrow
8bf2debd5f eCryptfs: introduce device handle for userspace daemon communications
A regular device file was my real preference from the get-go, but I went with
netlink at the time because I thought it would be less complex for managing
send queues (i.e., just do a unicast and move on).  It turns out that we do
not really get that much complexity reduction with netlink, and netlink is
more heavyweight than a device handle.

In addition, the netlink interface to eCryptfs has been broken since 2.6.24.
I am assuming this is a bug in how eCryptfs uses netlink, since the other
in-kernel users of netlink do not seem to be having any problems.  I have had
one report of a user successfully using eCryptfs with netlink on 2.6.24, but
for my own systems, when starting the userspace daemon, the initial helo
message sent to the eCryptfs kernel module results in an oops right off the
bat.  I spent some time looking at it, but I have not yet found the cause.
The netlink interface breaking gave me the motivation to just finish my patch
to migrate to a regular device handle.  If I cannot find out soon why the
netlink interface in eCryptfs broke, I am likely to just send a patch to
disable it in 2.6.24 and 2.6.25.  I would like the device handle to be the
preferred means of communicating with the userspace daemon from 2.6.26 on
forward.

This patch:

Functions to facilitate reading and writing to the eCryptfs miscellaneous
device handle.  This will replace the netlink interface as the preferred
mechanism for communicating with the userspace eCryptfs daemon.

Each user has his own daemon, which registers itself by opening the eCryptfs
device handle.  Only one daemon per euid may be registered at any given time.
The eCryptfs module sends a message to a daemon by adding its message to the
daemon's outgoing message queue.  The daemon reads the device handle to get
the oldest message off the queue.

Incoming messages from the userspace daemon are immediately handled.  If the
message is a response, then the corresponding process that is blocked waiting
for the response is awakened.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Miklos Szeredi
9c3580aa52 ecryptfs: add missing lock around notify_change
Callers of notify_change() need to hold i_mutex.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Harvey Harrison
18d1dbf1d4 ecryptfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Adrian Bunk
05db67a4f2 remove ecryptfs_header_cache_0
Remove the no longer used ecryptfs_header_cache_0.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
OGAWA Hirofumi
762873c251 vfs: fix unconditional write_super() call in file_fsync()
We need to check ->s_dirt before calling write_super().  It became the cause
of an unneeded write.

This bug was noticed by Sudhanshu Saxena.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
David Howells
8f0cfa52a1 xattr: add missing consts to function arguments
Add missing consts to xattr function arguments.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Jan Blunck
7ec02ef159 vfs: remove lives_below_in_same_fs()
Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is
providing the same functionality.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Matthias Kaehlcke
c5c8be3ce5 fs/inode.c: use hlist_for_each_entry()
fs/inode.c: use hlist_for_each_entry() in find_inode() and find_inode_fast()

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Jan Kara
af065b8a19 vfs: skip inodes without pages to free in drop_pagecache_sb()
Many inodes have no pagecache, so we can avoid lots of lock-takings.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Jan Kara
eccb95cee4 vfs: fix lock inversion in drop_pagecache_sb()
Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
before calling __invalidate_mapping_pages().  We just have to make sure inode
won't go away from under us by keeping reference to it and putting the
reference only after we have safely resumed the scan of the inode list.  A bit
tricky but not too bad...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
David Howells
e1d2c8b69a fdpic: check that the size returned by kernel_read() is what we asked for
Check that the size of the read returned by kernel_read() is what we asked
for.  If it isn't, then reject the binary as being a badly formatted.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
S.Caglar Onur
2e50b6ccda fs/binfmt_aout.c: use printk_ratelimit()
Use printk_ratelimit() instead of jiffies based arithmetic, suggested by Geert
Uytterhoeven

Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Pavel Emelyanov
3a2e7f47d7 binfmt_misc.c: avoid potential kernel stack overflow
This can be triggered with root help only, but...

Register the ":text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and
try launching the cat.txt file (by anyone) :) The result is - the endless
recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and
stack overflow.

There's a similar problem with binfmt_script, and there's a sh_bang memner on
linux_binprm structure to handle this, but simply raising this in binfmt_misc
may break some setups when the interpreter of some misc binaries is a script.

So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang)
and raise it in load_misc_binary.  After this, even if we set up the misc ->
script -> misc loop for binfmts one of them will step on its own bang and
exit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00