Fix RedHat bug 329431
The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.
Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.
The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.
CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.
This patch does the following:
1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.
2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.
3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.
4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.
Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.
Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
request
In SendReceive() function in transport.c - it memcpy's
message payload into a buffer passed via out_buf param. The function
assumes that all buffers are of size (CIFSMaxBufSize +
MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller
(MAX_CIFS_SMALL_BUFFER_SIZE) buffers. There are eight callers
(SMB worker functions) which are primarily affected by this change:
TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes,
Lock and PosixLock
CC: Dave Kleikamp <shaggy@austin.ibm.com>
CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When find_writable_file is racing with close and the session
to the server goes down, Shaggy noticed that there was a
chance that an open file in the list of files off the inode
could have been freed by close since cifs_reconnect can
block (the spinlock thus not held). This means that
we have to start over at the beginning of the list in some
cases.
There is a 2nd change that needs to be made later
(pointed out by Jeremy Allison and Shaggy) in order to
prevent cifs_close ever freeing the cifs per file info
when a write is pending. Although we delay close from
freeing this memory for sufficiently long for all known
cases, ultimately on a very, very slow write
overlapping a close pending we need to allow close to return
(without freeing the cifs file info) and defer freeing the
memory to be the responsibility of the (sloooow) write
thread (presumably have to look at every place wrtPending
is decremented - and add a flag for deferred free for
after wrtPending goes to zero).
Acked-by: Shaggy <shaggy@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Harmless since it only protected turning off caching for the
inode, but cleaner to lock around this in case we have a close
racing with open.
Signed-off-by: Shaggy <shaggy@us.ibm.com>
CC: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
There was a case in which find_writable_file was not waiting long enough
under heavy stress when writepages was racing with close of the file
handle being used by the write.
Signed-off-by: Steve French <sfrench@us.ibm.com>
On a mount without posix extensions enabled, when an unlock request is
made, the client can release more than is intended. To reproduce, on a
CIFS mount without posix extensions enabled:
1) open file
2) do fcntl lock: start=0 len=1
3) do fcntl lock: start=2 len=1
4) do fcntl unlock: start=0 len=1
...on the unlock call the client sends an unlock request to the server
for both locks. The problem is a bad test in cifs_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Caused by unneeded reopen during reconnect while spinlock held.
Fixes kernel bugzilla bug #7903
Thanks to Lin Feng Shen for testing this, and Amit Arora for
some nice problem determination to narrow this down.
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Previously the only way to do this was to umount all mounts to that server,
turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled).
Fixes Samba bugzilla bug number: 4582 (and also 2008)
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's common for file systems to need to zero data on either side of a
write, if a page is not Uptodate during prepare_write. It just so happens
that simple_prepare_write() in libfs.c does exactly that, so we can avoid
duplication and just call that function to zero page data.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This should be the last big batch of whitespace/formatting fixes.
checkpatch warnings for the cifs directory are down about 90% and
many of the remaining ones are harder to remove or make the code
harder to read.
Signed-off-by: Steve French <sfrench@us.ibm.com>
nfsd is passing null nameidata (probably the only one doing that)
on call to create - cifs was missing one check for this.
Note that running nfsd over a cifs mount requires specifying fsid on
the nfs exports entry and requires mounting cifs with serverino mount
option.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally at http://lkml.org/lkml/2006/9/2/86
The recent change to "allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call" introduced a new semaphore in struct cifsFileInfo,
lock_sem. However, semaphores used as mutexes are deprecated these days,
and there's no reason to add a new one to the kernel. Therefore, convert
lock_sem to a struct mutex (and also fix one indentation glitch on one of
the lines changed anyway).
Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Also expand debug entry to show which character on a failed Unicode
mapping.
Acked-by: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
file->f_path.dentry or file->f_path.dentry.d_inode can't be NULL since at
least ten years, similar for all but very few arguments passed in from the
VFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Could cause hangs on smp systems in i_size_read on a cifs inode
whose size has been previously simultaneously updated from
different processes.
Thanks to Brian Wang for some great testing/debugging on this
hard problem.
Fixes kernel bugzilla #7903
CC: Shirish Pargoankar <shirishp@us.ibm.com>
CC: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The two cifs functions that used the most stack according
to "make checkstack" have been changed to use less stack.
Thanks to jra and Shaggy for helpful ideas
Signed-off-by: Steve French <sfrench@us.ibm.com>
cc: jra@samba.org
cc: shaggy@us.ibm.com
This also adds he required page "writeback" flag handling, that cifs
hasn't been doing and that the page dirty flag changes made obvious.
Acked-by: Steve French <smfltc@us.ibm.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CIFS implements ->readpages and doesn't use read_cache_pages(). So wire the
read IO accounting up within CIFS.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the cifs
filesystem.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Informational/debug message was being logged too often. The error
case of logging having to send a close with (presumably stuck on buggy
server) pending writes is still logged.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This just ignore the remaining pages, and will fix a forgot put_pages_list().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove inclusions of linux/mpage.h that are no longer necessary due to the
transfer of generic_writepages().
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call. TODO - restrict this to servers
that support NT_STATUS codes (Win9x will probably
not support this call).
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 570d4d2d895569825d0d017d4e76b51138f68864 commit)
request and do not time out slow requests to a server that is still responding
well to other threads
Suggested by jra of Samba team
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 89b57148115479eef074b8d3f86c4c86c96ac969 commit)
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CIFS takes/releases f_owner.lock - why? It does not change anything in the
fowner state. Remove this locking.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request. Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".
To make all this sane, the patch changes range of writeback_control.
So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.
And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.
This patch does,
- Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
-1 is usually ok for range_end (type is long long). But, if someone did,
range_end += val; range_end is "val - 1"
u64val = range_end >> bits; u64val is "~(0ULL)"
or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
things, and uses LLONG_MAX for range_end.
- All callers of ->writepages() sets range_start/end or range_cyclic.
- Fix updates of ->writeback_index. It seems already bit strange.
If it starts at 0 and ended by check of nr_to_write, this last
index may reduce chance to scan end of file. So, this updates
->writeback_index only if range_cyclic is true or whole-file is
scanned.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes oops to OS/2 on ls and removes redundant NTCreateX calls to servers
which do not support NT SMBs. Key operations to OS/2 work.
Signed-off-by: Steve French <sfrench@us.ibm.com>
cifs should not be overwriting an element of the aops structure, since the
structure is shared by all cifs inodes. Instead define a separate aops
structure to suit each purpose.
I also took the liberty of replacing a hard-coded 4096 with PAGE_CACHE_SIZE
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Wasn't able to reproduce a hard hang, but was able to get an oops if
suspended the machine during a copy to the cifs mount. This led to some
things hanging, including a "sync". Also got I/O errors when trying to
access the mount afterwards (even when didn't see the oops), and had
to unmount and remount in order to access the filesystem.
This patch fixed the oops.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>