Look for "nodir" in the hostdata mount option which is used to
set the NODIR flag in dlm_new_lockspace().
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This reduces the size of the directory code by about 3k and gets
readdir() to use the functions which were introduced in the previous
directory code update.
Two memory allocations are merged into one. Eliminates zeroing of some
buffers which were never used before they were initialised by
other data.
There is still scope for further improvement in the directory code.
On the logging side, a hand created mutex has been replaced by a
standard Linux mutex in the log allocation code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The various flags on inodes will in future be set and queried
via the extended attributes interface, so this interface is no
longer required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Due to a typo, the dir leaf split operation was (for the first
split in a directory) writing the new hash vaules at the
wrong offset. This is now fixed.
Also some other tidy ups are included:
- We use GFS2's hash function for dentries (see ops_dentry.c) so that
we don't have to keep recalculating the hash values.
- A lot of common code is eliminated between the various directory
lookup routines.
- Better error checking on directory lookup (previously different
routines checked for different errors)
- The leaf split operation has a couple of redundant operations
removed from it, so it should be faster.
There is still further scope for further clean ups in the directory
code, and readdir in particular could do with slimming down a bit.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A user can use nfsservctl() to spam the logs.
This can happen because the arguments to the nfsservctl() system call are
versioned. This is a good thing. However, when a bad version is detected,
the kernel prints a message and then returns an error.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon. This was likely a part of the wierd cwd behavior
folks were seeing.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
in the ext2 filesystem "readdir()" function. It also cleans up the code
to avoid the unnecessary goto mess.
The bugs were related to re-valiating the f_pos value after somebody had
either done an "lseek()" on the directory to an invalid offset, or when
the offset had become invalid due to a file being unlinked in the
directory. The code would not only set the f_version too eagerly, it
would also not update f_pos appropriately for when the offset fixup took
place.
When that happened, we'd occasionally subsequently fail the readdir()
even when we shouldn't (no real harm done, but an ugly printk, and
obviously you would end up not necessarily seeing all entries).
Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
and had a test-case for it, and also fixed up a thinko in the first
version of this patch.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Masoud Sharbiani <masouds@google.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Coverity checker spotted the following bug in dup_namespace():
<-- snip -->
if (!new_ns->root) {
up_write(&namespace_sem);
kfree(new_ns);
goto out;
}
...
out:
return new_ns;
<-- snip -->
Callers expect a non-NULL result to not be freed.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.
However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
and avoid retrying the migration.
migrate_page_remove_references() now needs to return a reason why the
failure occured. So switch migrate_page_remove_references to use -Exx
style error messages.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem due to indeterminism in the
possible return code/lock/unlock combinations. This can cause
a direct read to attempt a double i_mutex unlock inside XFS.
We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.
Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2
Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
This fixes a race where lsn could be cleared before taking the lock
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In theory, NLM specs assure us that the server will only reply LCK_GRANTED or
LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request.
In practice, we should not assume this to be the case, and the code will
currently Oops if we do.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
mapping them to kernel errors. Problem spotted by Neil Horman
<nhorman@tuxdriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on an original patch by Mike O'Connor and Greg Banks of SGI.
Mike states:
A normal user can panic an NFS client and cause a local DoS with
'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the
user buffer starts with a valid mapped page and contains an unmapped page,
will crash in this way. I haven't followed the code, but O_DIRECT reads with
similar user buffers will probably also crash albeit in different ways.
Details: when nfs_get_user_pages() calls get_user_pages(), it detects and
correctly handles get_user_pages() returning an error, which happens if the
first page covered by the user buffer's address range is unmapped. However,
if the first page is mapped but some subsequent page isn't, get_user_pages()
will return a positive number which is less than the number of pages requested
(this behaviour is sort of analagous to a short write() call and appears to be
intentional). nfs_get_user_pages() doesn't detect this and hands off the
array of pages (whose last few elements are random rubbish from the newly
allocated array memory) to it's caller, whence they go to
nfs_direct_write_seg(), which then totally ignores the nr_pages it's given,
and calculates its own idea of how many pages are in the array from the user
buffer length. Needless to say, when it comes to transmit those uninitialised
page* pointers, we see a crash in the network stack.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One can do "chattr +j" on a file to change its journalling mode. Fix
writeback mode with "nobh" handling for it.
Even though, we mount ext3 filesystem in writeback mode with "nobh" option,
some one can do "chattr +j" on a single file to force it to do journalled
mode. In order to do journaling, ext3_block_truncate_page() need to
fallback to default case of creating buffers and adding them to transaction
etc.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes illegal __GFP_FS allocation inside ext3 transaction in
ext3_symlink(). Such allocation may re-enter ext3 code from
try_to_free_pages. But JBD/ext3 code keeps a pointer to current journal
handle in task_struct and, hence, is not reentrable.
This bug led to "Assertion failure in journal_dirty_metadata()" messages.
http://bugzilla.openvz.org/show_bug.cgi?id=115
Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix some bugs in mtd/jffs2 on 64bit platform.
The MEMGETBADBLOCK/MEMSETBADBLOCK ioctl are not listed in compat_ioctl.h.
And some variables in jffs2 are declared as uint32_t but used to hold
size_t values.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent change to compat. dev_ifconf() in fs/compat_ioctl.c
causes ifconf data to be truncated 1 entry too early when copying it
to userspace. The correct amount of data (length) is returned,
but the final entry is empty (zero, not filled in).
The for-loop 'i' check should use <= to allow the final struct
ifreq32 to be copied. I also used the ifconf-corruption program
in kernel bugzilla #4746 to make sure that this change does not
re-introduce the corruption.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miscellaneous fixes related to accessing uninitialized variables or memory
that was already freed.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
DASD allows to open a device as soon as gendisk is registered, which means the
device is a fake device (capacity=0) and we do know nothing about blocksize
and partitions at that point of time. In case the device is opened by
someone, the bdev and inode creation is done with the fake device info and the
following partition detection code is just using the wrong data.
To avoid this modify the DASD state machine to make sure that the open is
rejected until the device analysis is either finished or an unformatted device
was detected.
Signed-off-by: Horst Hummel <horst.hummel@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench. Tested on both x86_64 and powerpc.
The way we do file struct accounting is not very suitable for batched
freeing. For scalability reasons, file accounting was
constructor/destructor based. This meant that nr_files was decremented
only when the object was removed from the slab cache. This is susceptible
to slab fragmentation. With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -
llm22:~ # cat /proc/sys/fs/file-nr
587730 0 758844
At the same time, I see only a 2000+ objects in filp cache. The following
patch I fixes this problem.
This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api. In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.
Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug in udf where it would write uid/gid = 0 to the disk for files
owned by the id given with the uid=/gid= mount options. It also adds 4 new
mount options: uid/gid=forget and uid/gid=ignore. Without any options the
id in core and on disk always match. Giving uid/gid=nnn specifies a
default ID to be used in core when the on disk ID is -1. uid/gid=ignore
forces the in core ID to allways be used no matter what the on disk ID is.
uid/gid=forget forces the on disk ID to always be written out as -1.
The use of these options allows you to override ownerships on a disk or
disable ownwership information from being written, allowing the media to be
used portably between different computers and possibly different users
without permissions issues that would require root to correct.
Signed-off-by: Phillip Susi <psusi@cfl.rr.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't do interruptible waits for the pipe mutex anywhere else any
more either, so don't do it in fifo_open() either.
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The point of the smaps "shared" is to count the number of pages that are
mapped by more than one process, according to Mauricio Lin. However, smaps
uses page_count for this, so it will return a false positive for every page
that is mapped by just that one process, which is also in pagecache or
swapcache. There are false positive situations for anonymous pages not in
swapcache as well: - page reclaim, migration - get_user_pages (eg.
direct-io, ptrace)
Use page_mapcount instead, to count the number of mappings to the page.
Use vm_normal_page so that weird things like /dev/mem aren't counted either.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ramfs neglects to update the directory mtime and ctime fields when creating
a new symbolic link. Ramfs was modified in 2.6.15 to update these fields
when other types of entries are created. The symlink support is separate
from that other support, so that change did not cover quite all of the
possibilities.
All of the directory content manipulation entry points now seem to be
covered with respect to these time field updates.
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of cramfs images created by util-linux containing empty
regular files. Images created by cramfstools 1.x were ok.
Fill out inode contents in cramfs_iget5_set() instead of get_cramfs_inode()
to prevent issues if cramfs_iget5_test() is called with I_LOCK|I_NEW still
set.
Signed-off-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Cc: Olaf Hering <olh@suse.de>
Cc: Chris Mason <mason@suse.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
session when multiply mounted.
Fixes slow response when cifs client is mounted to shares on multiple
servers and oplock break occurs (usually due to attempt to multiply open a
file). When treeids on mutiple mounted shares match and we find the wrong
match first, we searched for the wrong cached files to send oplock break
response for which usually meant that no matching file was found and thus
the server would have to timeout the notification. Oplock break timeout is
about 20 seconds on some servers so this could cause significantly slower
performance on file open calls in a few cases (in particular when multiple
shares are mounted from multiple servers, tree ids match, and we have a
cached file which is later opened multiple times). This was the most
important of the bugs that was found and fixed at Connectathon
(interoperability testing event) this week.
Acked-by: Shaggy (shaggy@austin.ibm.com)
Signed-off-by: Steve French (sfrench@us.ibm.com)
In order to separate out the filesystem's metadata from "normal"
files and directories, a new filesystem type has been created.
It is called gfs2meta and mounting it gives access to the files
that were previously under .gfs2_admin (well still are until mkfs
is altered, which is next on the adgenda).
Its not currently possible to mount both gfs2 and gfs2meta on the
same block device at the same time. A future patch will allow that
to happen.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The bitmaps associated with generation numbers for directory entries
are declared as an array of ints. On some platforms, this causes alignment
exceptions.
The following patch uses the standard bitmap declaration macros to
declare the bitmaps, fixing the problem.
Originally from Takashi Iwai.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes bugs in reiserfs where unsigned integers were checked
whether they are less then 0.
Signed-off-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
v9fs has been plagued by an over-complicated approach trying to map Linux
dentry semantics to Plan 9 fid semantics. Our previous approach called for
aggressive flushing of the dcache resulting in several problems (including
wierd cwd behavior when running /bin/pwd).
This patch dramatically simplifies our handling of this fid management. Fids
will not be clunked as promptly, but the new approach is more functionally
correct. We now clunk un-open fids only when their dentry ref_count reaches 0
(and d_delete is called).
Another simplification is we no longer seek to match fids to the process-id or
uid of the action initiator. The uid-matching will need to be revisited when
we fix the security model.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lucho's atomic create+open fix had a bug in the super block initialization
causing all mounts to fail. He was freeing an fcall too early. This patch
fixes that oversight.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In order to assure atomic create+open v9fs stores the open fid produced by
v9fs_vfs_create in the dentry, from where v9fs_file_open retrieves it and
associates it with the open file.
This patch modifies v9fs to use nameidata.intent.open values to do the atomic
create+open.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a bug I introduced earlier with a kfree() and usage of
a structure in the wrong order. Also try and get the counts
of the journaled data buffers "more correct". Still some work
to do in this area though.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We no longer lookup ".gfs2_admin" in the root directory in order to
find it, but instead use the inode number given in the superblock.
Both the root directory and the admin directory are now looked up using
the same routine, so the redundant code is removed.
Also, there is no longer a reference to the root inode in the
GFS2 super block. When required this can be retreived via
sb->s_root->d_inode instead.
Assuming that we introduce a metadata filesystem type for GFS, then
this is a first step towards that goal.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Switch from list_head to hlist_head. Make the size of the hash dependent
upon the allocated area, rather than a constant.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
to prevent confusion when a virtual ip is created on the same interface
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The extent map code has long noticed when the on-disk extent information
is corrupt. However, so far it has only returned an error. We should
take the filesystem read-only, as it is corrupt.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Orphan dir recovery can deadlock with another process in
ocfs2_delete_inode() in some corner cases. Fix this by tracking recovery
state more closely and allowing it to handle inode wipes which might
deadlock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch finishes cleaning up the node manager allocations if it fails
to initialize.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
The check to determine which format string is appopriate for u64 and
friends works in most cases, but UML on x86_64 doesn't define CONFIG_X86_64,
so it results in screen fulls of compile-time warnings.
This patch fixes it to handle that case.
fs/ocfs2/cluster/masklog.h | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>