Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir.
It may be changed any time, but is used only on startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the code to create and remove client subdirectories from the
recovery directory, as described in the previous patch comment.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFSv4 clients are required to know what state they have on the server so that
they can reclaim it on server reboot. However, it is possible for
pathalogical combinations of server reboots and network partitions to leave a
client in a state where it cannot know whether it has lost its state on the
server.
For this reason, rfc3530 requires that we store some information about clients
to stable storage.
So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for
each client with active state. We leave open the possibility of including
files underneath each such subdirectory with information about the client, but
for now the subdirectories are empty.
We create a client subdirectory whenever a client makes its first non-reclaim
open_confirm.
We remove a client subdirectory whenever either
a) its lease expires, or
b) the grace period ends without it reclaiming anything.
When handling reclaims, we allow the reclaim if and only if the client doing
the reclaim has a subdirectory.
This patch adds just the code to scan the recovery directory on nfsd startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cb_parsed field is only used by probe_callback, to determine whether the
callback information has been filled in by setclientid. But there is no way
that probe_callback() can be called without that having already happened, so
that check is superfluous, as is cb_parsed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>From the language of rfc3530 section 8.1.3 (e.g., the suggestion that a
"process id" might be a reasonable lockowner value) it's conceivable that a
client might want to use the same lockowner string on multiple files, so we may
as well allow that. We expect each use of open_to_lockowner to create a
distinct seqid stream, though.
For now we're also allowing multiple uses of open_to_lockowner with the same
open, though it seems unlikely clients would actually do that.
Also add a comment reminding myself of some very non-scalable data structures.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Already done for struct nfs4_file; go ahead and do it for the other nfsd4
state structures.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor cleanup, remove some unnecessary printk's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial whitespace and comment fixes.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change from "goto" to "else if" format in setclientid_confirm.
From: Fred Isaman
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFS4_INVAL is not a valid error for setclientid_confirm, and INUSE is the more
logical error here anyway.
From: Fred Isaman
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Setclientid_confirm code confused states 1 and 3 (numbering from the
IMPLEMENTATION section of rfc3530, section 14.2.33). Fix this.
State 1 allows the client to change the callback channel on the fly. We don't
implement this currently, so just turn off the callback channel in this case.
From: Fred Isaman
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Setclientid code assumes there is only one match in unconfirmed list.
Make sure that assumption holds.
From: Fred Isaman
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make needlessly global code static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For the purposes of reboot recovery, we want to do some work during the
transition period at the end of the grace period. Some of that work must be
guaranteed to have a certain relationship with the end of the grace period, so
we want to control the transition there.
Our approach is to modify the in_grace() checks to consult a global variable
instead of checking the time directly, to schedule the first run of the
laundromat thread at the end of the grace period, and to set the global
end-of-grace-period there.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor setclientid cleanup
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For the purposes of reboot recovery we keep a directory with subdirectories
each having a name that is the ascii hex representation of the md5 sum of a
client identifier for an active client.
This adds the code to calculate that name. We also use it for the purposes of
comparing clients, so if someone ever manages to find two client names that
are md5 collisions, then we'll return clid_inuse to the second.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can be a little more concise here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adopt standard kernel style by defining a no-op function instead of putting
ifdef's in the code where the function is called.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfs4_reclaim_init is no longer performing any useful function.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Separate out stuff that needs initialization on startup from stuff that only
needs initialization on module init from static data.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somewhat gratuitous rename to simplify following patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow recovery of delegations after reboot.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only way the protocol gives to change the lease time on the fly is to
simulate a reboot. We don't have that completely right in the current code;
among other things, we should probably put lockd in grace too while we do
this.
For now, let's just keep this simple, and wait till the next time nfsd starts
to register any changes in lease time. If the administrator really wants to
change the lease time *now*, they can go ahead and bring nfsd down and then
back up again after changing the lease time.
Also remove the "if (reclaim_str_hashtbl_size == 0)" case, a shortcut which
skips the grace period if we know of no clients in need of recovery. This
isn't going to work well with nlm.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're running the laundromat work on the default kevent worker thread. But
the laundromat takes the nfsv4 state semaphore, which is used for way too much
stuff, and the potential for deadlocks is high. Better to have this on a
separate workqueue.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor cleanup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
rpc_create_client was modified recently to do its own (synchronous) NULL ping
of the server. We'd rather do that on our own, asynchronously, so that we
don't have to block the nfsd thread doing the probe, and so that setclientid
handling (hence, client mounts) can proceed normally whether the callback is
succesful or not. (We can still function fine without the callback
channel--we just won't be able to give out delegations till it's verified to
work.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're trying to read and write from a struct file that we may not hold a
reference to any more (since a close could be processed as soon as we drop the
state lock).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Silence another annoying "failed to contact portmap (errno -512)" on shutdown.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a struct kref to each nfs4_file and take a reference to it from each
stateid and delegation that refers to it. The atomicity guarantees are
overkill given that all this stuff is done under the single nfsd4 state lock,
but a) we'd like finer-grained locking some day, and b) this simplifies the
cleanup of the structures a bit, something that has previously been a bit
complicated and bug-prone.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Go ahead and do this for struct nfs4_file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These remaining debugging counters haven't proved that useful.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allocate delegations from a slab cache.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allocate stateid's from a slab cache.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The structures the server uses to keep track of various pieces of nfsv4 state
(open files, outstanding delegations, etc.) are likely to be allocated and
deallocated frequently and seem reasonable candidates for slab caches.
While we're at it, the slab code keeps statistics that help catch leaks and
such, so we may as well take this chance to eliminate some debugging counters
that we've been keeping ourselves.
Start with the struct nfs4_file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We currently return err_grace if a user attempts a non-reclaim open during the
grace period. But we also need to prevent renames and removes, at least, to
ensure clients have the chance to recover state on files before they are moved
or deleted.
Of course, local users could also do renames and removes during the lease
period, and there's not much we can do about that. This at least will help
with remote users.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're returning NFS4_FH_NOEXPIRE_WITH_OPEN | NFS4_FH_VOL_RENAME for the
fh_expire_type attribute. This is incorrect:
1. The spec actually only allows NOEXPIRE_WITH_OPEN when
VOLATILE_ANY is also set.
2. Filehandles for open files can expire, if the file is removed
and there is a reboot.
3. Filehandles are only volatile on rename in the nosubtree check
case.
Unfortunately, there's no way to indicate that we only expire on remove. So
our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set
FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually
cause problems in practice and it seems possibly useful to give clients some
way to distinguish that case.
Fix a mispelled #define while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add OPEN claim type NFS4_OPEN_CLAIM_DELEGATE_CUR to nfsd4_open().
A delegation stateid and a name are provided. OPEN with O_CREAT is not legal
with this claim type; otherwise, use the NFS4_OPEN_CLAIM_NULL code path to
lookup the filename to be opened.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
State logic for OPEN with claim type CLAIM_DELEGATE_CUR, which the NFSv4
client uses to report local OPENs on a delegated file back to the NFSv4
server.
nfs4_check_deleg() performs input delegation stateid lookup and sanity check.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't really need to be doing a separate open for every stateid. And in
the case of an open from a client that already has a delegation on a file, it
unnecessarily results in a delegation recall.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Additional minor code reshuffling to prepare for claim_deleg_cur support.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Factor out a bit of common code that will be useful elsewhere.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make reiserfs BUG() when somebody tries to start a larger transaction than
it's allowed (currently the code just silently deadlocks).
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve space
for a quota operation in a transaction only if filesystem was mounted with
some quota option.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve a space
for a quota operation in a transaction only if filesystem was mounted with
some quota options.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check return values of journal_begin() and journal_end() in the quota code
for reiserfs.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
XFS will have to look at iocb->private to fix aio+dio. No other filesystem
is using the blockdev_direct_IO* end_io callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When do_sync_(read|write) encounters an aio method that makes use of the
retry mechanism, they fail to correctly retry the operation. This fixes
that by adding the appropriate sleep and retry mechanism.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/buffer.c::__block_prepare_write() has broken error recovery. It calls
the get_block() callback with "create = 1" and if that succeeds it
immediately clears buffer_new on the just allocated buffer (which has
buffer_new set).
The bug is that if an error occurs and get_block() returns != 0, we break
from this loop and go into recovery code. This code has this comment:
/* Error case: */
/*
* Zero out any newly allocated blocks to avoid exposing stale
* data. If BH_New is set, we know that the block was newly
* allocated in the above loop.
*/
So the intent is obviously good in that it wants to clear just allocated
and hence not zeroed buffers. However the code recognises allocated
buffers by checking for buffer_new being set.
Unfortunately __block_prepare_write() as discussed above already cleared
buffer_new on all allocated buffers thus no buffers will be cleared during
error recovery and old data will be leaked.
The simplest way I can see to fix this is to make the current recovery code
work by _not_ clearing buffer_new after calling get_block() in
__block_prepare_write().
We cannot safely allow buffer_new buffers to "leak out" of
__block_prepare_write(), thus we simply do a quick loop over the buffers
clearing buffer_new on each of them if it is set just before returning
"success" from __block_prepare_write().
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file duplicates <linux/posix_acl_xattr.h>, using slightly different
names.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch removes the f_error field and all checks of f_error.
Trond said:
f_error was introduced for NFS, and made sense when we were guaranteed
always to have a file pointer around when write errors occurred. Since
then, we have (for various reasons) had to introduce the nfs_open_context in
order to track the file read/write state, and it made sense to move our
f_error tracking there too.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows block device drivers to convert their ioctl functions to
unlocked_ioctl() like character devices and other subsystems. All
functions that were called with the BKL held before are still used that
way, but I would not be surprised if it could be removed from the ioctl
functions in drivers/block/ioctl.c themselves.
As a side note, I found that compat_blkdev_ioctl() acquires the BKL as
well, which looks like a bug. I have checked that every user of
disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so
it could easily be removed from compat_blkdev_ioctl().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch gets rid of some macro obfuscation from fs/eventpoll.c by
removing slab allocator wrappers and converting macros to static inline
functions.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This
is problematic for execution of 32 bit processes, which are not largefile
aware, either by SW emulation or by HW execution.
For such processes, the problem is two-fold:
1) When trying to open a file that is larger than 4G
the operation should fail, but it's not
2) Writing to offset larger than 4G should fail, but
it's not
The proposed patch takes advantage of the way 32 bit processes are
identified in ia64 systems. Such processes have PER_LINUX32 for their
personality. With the patch, the ia64 kernel will not enforce the
O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior
for all other architectures remains unchanged.
Signed-off-by: Yoav Zach <yoav.zach@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes O(n^2) super block loops in sync_inodes(),
sync_filesystems() etc. in favour of using __put_super_and_need_restart()
which I introduced earlier. We faced a noticably long freezes on sb
syncing when there are thousands of super blocks in the system.
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In a duplicate of lookup_create in the af_unix code Al commented what's
going on nicely, so let's bring that over to lookup_create before the copy
is going away (I'll send a patch soon)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Henrik Grubbstrom noted:
The 2.6.10 ext3_get_parent attempts to use ext3_find_entry to look up the
entry "..", which fails for dx directories since ".." is not present in the
directory hash table. The patch below solves this by looking up the dotdot
entry in the dx_root block.
Typical symptoms of the above bug are intermittent claims by nfsd that
files or directories are missing on exported ext3 filesystems.
cf https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=3D150759 and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=3D144556
ext3_get_parent() is IMHO the wrong place to fix this bug as it introduces
a lot of internals from htree into that function. Instead, I think this
should be fixed in ext3_find_entry() as in the below patch. This has the
added advantage that it works for any callers of ext3_find_entry() and not
just ext3_lookup_parent().
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Henrik Grubbstrom <grubba@grubba.org>
Cc: <ext2-devel@lists.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use lookup_one_len instead of opencoding a simplified lookup using
lookup_hash with a fake hash.
Also there's no need anymore for the d_invalidate as we have a completely
valid dentry now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some code duplicated in both callers into vfs_quota_on_mount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Various filesystem drivers have grown a get_dentry() function that's a
duplicate of lookup_one_len, except that it doesn't take a maximum length
argument and doesn't check for \0 or / in the passed in filename.
Switch all these places to use lookup_one_len.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Greg KH <greg@kroah.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads
of /proc/devices from spilling over the provided page if more than 4096
bytes of string data are generated from all the registered character and
block devices in a system
Signed-off-by: Neil Horman <nhorman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
There is a race condition that can occur if an inode is allocated and then
released (using iput) during the ->fill_super functions. The race
condition is between kswapd and mount.
For most filesystems this can only happen in an error path when kswapd is
running concurrently. For isofs, however, the error can occur in a more
common code path (which is how the bug was found).
The logic here is "we want final iput() to free inode *now* instead of
letting it sit in cache if fs is going down or had not quite come up". The
problem is with kswapd seeing such inodes in the middle of being killed and
happily taking over.
The clean solution would be to tell kswapd to leave those inodes alone and
let our final iput deal with them. I.e. add a new flag
(I_FORCED_FREEING), set it before write_inode_now() there and make
prune_icache() leave those alone.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Request RDATTR_ERROR as an attribute in readdir to distinguish between a
directory being within an absent filesystem or one (or more) of its entries.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Basically copies the VFS's method for tracking writebacks and applies
it to the struct nfs_page.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Unless we're doing O_APPEND writes, we really don't care about revalidating
the file length. Just make sure that we catch any page cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.
Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we do not hold a valid stateid that is open for writes, there is little
point in doing an extra open of the file, as the RFC does not appear to
mandate this...
Make setattr use the correct stateid if we're holding mandatory byte
range locks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 currently returns the unsigned 64-bit cookie directly to
userspace. The following patch causes the kernel to generate
loff_t offsets for the benefit of userland.
The current server-generated READDIR cookie is cached in the
nfs_open_context instead of in filp->f_pos, so we still end up work
correctly under directory insertions/deletion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The changeset "trond.myklebust@fys.uio.no|ChangeSet|20050322152404|16979"
(RPC: Ensure XDR iovec length is initialized correctly in call_header)
causes the NFSv4 callback code to BUG() due to an incorrectly initialized
scratch buffer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Older gcc's don't like this.
fs/nfs/nfs4proc.c:2194: field `data' has incomplete type
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The Coverity checker noticed that such a simplification was possible.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Pointer arithmetic bug: p is in word units. This fixes a memory
corruption with big acls.
* Initialize pg_class to prevent a NULL pointer access.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Attach acls to inodes in the icache to avoid unnecessary GETACL RPC
round-trips. As long as the client doesn't retrieve any acls itself, only the
default acls of exiting directories and the default and access acls of new
directories will end up in the cache, which preserves some memory compared to
always caching the access and default acl of all files.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 has no concept of a umask on the server side: The client applies
the umask locally, and sends the effective permissions to the server.
This behavior is wrong when files are created in a directory that has a
default ACL. In this case, the umask is supposed to be ignored, and
only the default ACL determines the file's effective permissions.
Usually its the server's task to conditionally apply the umask. But
since the server knows nothing about the umask, we have to do it on the
client side. This patch tries to fetch the parent directory's default
ACL before creating a new file, computes the appropriate create mode to
send to the server, and finally sets the new file's access and default
acl appropriately.
Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial
version of this patch, as well as for arguing why we need this change.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds acl support fo nfs clients via the NFSACL protocol extension, by
implementing the getxattr, listxattr, setxattr, and removexattr iops for the
system.posix_acl_access and system.posix_acl_default attributes. This patch
implements a dumb version that uses no caching (and thus adds some overhead).
(Another patch in this patchset adds caching as well.)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add the missing NFS3ERR_NOTSUPP error code (defined in NFSv3) to the
system-to-protocol-error table in nfsd. The nfsacl extension uses this error
code.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently we return -ENOMEM for every single failure to create a new auth.
This is actually accurate for auth_null and auth_unix, but for auth_gss it's a
bit confusing.
Allow rpcauth_create (and the ->create methods) to return errors. With this
patch, the user may sometimes see an EINVAL instead. Whee.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache
acls of size up to a page. Also prepare for up to a page of acl data even
when the user doesn't pass in a buffer, as when they want to get the acl
length to decide what size buffer to allocate.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side write support for NFSv4 ACLs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
writing acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 ACLs. Exports the raw xdr code via the
system.nfs4_acl extended attribute. It is up to userspace to decode the acl
(and to provide correctly xdr'd acls on setxattr), and to convert to/from
POSIX ACLs if desired.
This patch provides only the read support.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
reading acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make nfs4 fattr size calculations more explicit, revising them downward a
bit in the process.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>