Commit graph

24787 commits

Author SHA1 Message Date
Jan Kara
853a0c25ba udf: Mark LVID buffer as uptodate before marking it dirty
When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.

Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:10 +01:00
Jan Kara
33c104d415 ext3: Don't warn from writepage when readonly inode is spotted after error
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when filesystem hits error and is
remounted read-only. This unnecessarily scares users (well, they should be
scared because of filesystem error, but the stack trace distracts them from the
right source of their fear ;-). We could as well just remove the WARN_ON but
it's not hard to fix it to not trip on filesystem with errors and not use more
cycles in the common case so that's what we do.

CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
0048278552 jbd: Remove j_barrier mutex
j_barrier mutex is used for serializing different journal lock operations.  The
problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
with j_barrier mutex held which makes lockdep freak out. Also hibernation code
wants to freeze filesystem but it cannot do so because it then cannot hibernate
the system because of mutex being locked.

So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
Since locking journal is a rare operation we don't have to care about fairness
or such things.

CC: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jeff Mahoney
a9e36da655 reiserfs: Force inode evictions before umount to avoid crash
This patch fixes a crash in reiserfs_delete_xattrs during umount.

When shrink_dcache_for_umount clears the dcache from
generic_shutdown_super, delayed evictions are forced to disk. If an
evicted inode has extended attributes associated with it, it will
need to walk the xattr tree to locate and remove them.

But since shrink_dcache_for_umount will BUG if it encounters active
dentries, the xattr tree must be released before it's called or it will
crash during every umount.

This patch forces the evictions to occur before generic_shutdown_super
by calling shrink_dcache_sb first. The additional evictions caused
by the removal of each associated xattr file and dir will be automatically
handled as they're added to the LRU list.

CC: reiserfs-devel@vger.kernel.org
CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
a06d789b42 reiserfs: Fix quota mount option parsing
When jqfmt mount option is not specified on remount, we mistakenly clear
s_jquota_fmt value stored in superblock. Fix the problem.

CC: stable@kernel.org
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
fef2e9f330 udf: Treat symlink component of type 2 as /
Currently, we ignore symlink component of type 2. But mkisofs and other OS'
seem to treat it as / so do the same for compatibility.

Reported-by: "Gábor S." <otnaccess@hotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
d2eb8c3593 udf: Fix deadlock when converting file from in-ICB one to normal one
During BKL removal in 2.6.38, conversion of files from in-ICB format to normal
format got broken. We call ->writepage with i_data_sem held but udf_get_block()
also acquires i_data_sem thus creating A-A deadlock.

We fix the problem by dropping i_data_sem before calling ->writepage() which is
safe since i_mutex still protects us against any changes in the file. Also fix
pagelock - i_data_sem lock inversion in udf_expand_file_adinicb() by dropping
i_data_sem before calling find_or_create_page().

CC: stable@kernel.org
Reported-by: Matthias Matiak <netzpython@mail-on.us>
Tested-by: Matthias Matiak <netzpython@mail-on.us>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
7b0b0933a3 udf: Cleanup calling convention of inode_getblk()
inode_getblk() always returned NULL and passed results in its parameters.
Make the function return something useful - found block number.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
ef6919c283 ext2: Fix error handling on inode bitmap corruption
When insert_inode_locked() fails in ext2_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by informing about filesystem error and
jumping to fail: (instead of fail_drop:) which doesn't call unlock_new_inode().

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Jan Kara
1415dd8705 ext3: Fix error handling on inode bitmap corruption
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Zheng Liu
d03e1292c4 ext3: replace ll_rw_block with other functions
ll_rw_block() is deprecated. Thus we replace it with other functions.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Dan Carpenter
bcdd0c1600 ext3: NULL dereference in ext3_evict_inode()
This is an fsfuzzer bug.  ->s_journal is set at the end of
ext3_load_journal() but we try to use it in the error handling from
ext3_get_journal() while it's still NULL.

[  337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[  337.040380] IP: [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.041687] PGD 0
[  337.043118] Oops: 0002 [#1] SMP
[  337.044483] CPU 3
[  337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
[  337.047633]
[  337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511    /RV411/RV511/E3511/S3511
[  337.051064] RIP: 0010:[<ffffffff816e6539>]  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.052879] RSP: 0018:ffff8800b1d11ae8  EFLAGS: 00010282
[  337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
[  337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
[  337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
[  337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
[  337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
[  337.063385] FS:  00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
[  337.065110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
[  337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
[  337.073800] Stack:
[  337.075487]  ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
[  337.077255]  ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
[  337.078851]  ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
[  337.080284] Call Trace:
[  337.081706]  [<ffffffff811f48cf>] log_start_commit+0x1f/0x40
[  337.083107]  [<ffffffff8119405d>] ext3_evict_inode+0x1fd/0x2a0
[  337.084490]  [<ffffffff81131e31>] evict+0xa1/0x1a0
[  337.085857]  [<ffffffff81132031>] iput+0x101/0x210
[  337.087220]  [<ffffffff811339d1>] iget_failed+0x21/0x30
[  337.088581]  [<ffffffff811905fc>] ext3_iget+0x15c/0x450
[  337.089936]  [<ffffffff8118b0c1>] ? ext3_rsv_window_add+0x81/0x100
[  337.091284]  [<ffffffff816df9a4>] ext3_get_journal+0x15/0xde
[  337.092641]  [<ffffffff811a2e9b>] ext3_fill_super+0xf2b/0x1c30
[  337.093991]  [<ffffffff810ddf7d>] ? register_shrinker+0x4d/0x60
[  337.095332]  [<ffffffff8111c112>] mount_bdev+0x1a2/0x1e0
[  337.096680]  [<ffffffff811a1f70>] ? ext3_setup_super+0x210/0x210
[  337.098026]  [<ffffffff8119a770>] ext3_mount+0x10/0x20
[  337.099362]  [<ffffffff8111cbee>] mount_fs+0x3e/0x1b0
[  337.100759]  [<ffffffff810eda1b>] ? __alloc_percpu+0xb/0x10
[  337.102330]  [<ffffffff81135385>] vfs_kern_mount+0x65/0xc0
[  337.103889]  [<ffffffff8113611f>] do_kern_mount+0x4f/0x100
[  337.105442]  [<ffffffff811378fc>] do_mount+0x19c/0x890
[  337.106989]  [<ffffffff810e8456>] ? memdup_user+0x46/0x90
[  337.108572]  [<ffffffff810e84f3>] ? strndup_user+0x53/0x70
[  337.110114]  [<ffffffff811383fb>] sys_mount+0x8b/0xe0
[  337.111617]  [<ffffffff816ed93b>] system_call_fastpath+0x16/0x1b
[  337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
[  337.116588] RIP  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.118260]  RSP <ffff8800b1d11ae8>
[  337.119998] CR2: 0000000000000024
[  337.188701] ---[ end trace c36d790becac1615 ]---

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 13:36:15 +01:00
Yongqiang Yang
8c111b3f56 jbd: clear revoked flag on buffers before a new transaction started
Currently, we clear revoked flag only when a block is reused.  However,
this can tigger a false journal error.  Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file.  At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.

We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 01:20:53 +01:00
Eryu Guan
63894ab9f6 ext3: call ext3_mark_recovery_complete() when recovery is really needed
Call ext3_mark_recovery_complete() in ext3_fill_super() only if
needs_recovery is non-zero.

Besides that, print out "recovery complete" message after calling
ext3_mark_recovery_complete().

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-09 12:23:17 +01:00
Al Viro
a3fbbde70a VFS: we need to set LOOKUP_JUMPED on mountpoint crossing
Mountpoint crossing is similar to following procfs symlinks - we do
not get ->d_revalidate() called for dentry we have arrived at, with
unpleasant consequences for NFS4.

Simple way to reproduce the problem in mainline:

    cat >/tmp/a.c <<'EOF'
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdio.h>
    main()
    {
            struct flock fl = {.l_type = F_RDLCK, .l_whence = SEEK_SET, .l_len = 1};
            if (fcntl(0, F_SETLK, &fl))
                    perror("setlk");
    }
    EOF
    cc /tmp/a.c -o /tmp/test

then on nfs4:

    mount --bind file1 file2
    /tmp/test < file1		# ok
    /tmp/test < file2		# spews "setlk: No locks available"...

What happens is the missing call of ->d_revalidate() after mountpoint
crossing and that's where NFS4 would issue OPEN request to server.

The fix is simple - treat mountpoint crossing the same way we deal with
following procfs-style symlinks.  I.e.  set LOOKUP_JUMPED...

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 14:58:06 -08:00
Al Viro
50e696308c vfs: d_invalidate() should leave mountpoints alone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 10:54:10 -08:00
Linus Torvalds
ff4d7fa8c3 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Cleanup byte-range locking code style
  CIFS: Simplify setlk error handling for mandatory locking
2011-11-07 09:56:22 -08:00
Linus Torvalds
e0d65113a7 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (226 commits)
  mtd: tests: annotate as DANGEROUS in Kconfig
  mtd: tests: don't use mtd0 as a default
  mtd: clean up usage of MTD_DOCPROBE_ADDRESS
  jffs2: add compr=lzo and compr=zlib options
  jffs2: implement mount option parsing and compression overriding
  mtd: nand: initialize ops.mode
  mtd: provide an alias for the redboot module name
  mtd: m25p80: don't probe device which has status of 'disabled'
  mtd: nand_h1900 never worked
  mtd: Add DiskOnChip G3 support
  mtd: m25p80: add EON flash EN25Q32B into spi flash id table
  mtd: mark block device queue as non-rotational
  mtd: r852: make r852_pm_ops static
  mtd: m25p80: add support for at25df321a spi data flash
  mtd: mxc_nand: preset_v1_v2: unlock all NAND flash blocks
  mtd: nand: switch `check_pattern()' to standard `memcmp()'
  mtd: nand: invalidate cache on unaligned reads
  mtd: nand: do not scan bad blocks with NAND_BBT_NO_OOB set
  mtd: nand: wait to set BBT version
  mtd: nand: scrub BBT on ECC errors
  ...

Fix up trivial conflicts:
 - arch/arm/mach-at91/board-usb-a9260.c
	Merged into board-usb-a926x.c
 - drivers/mtd/maps/lantiq-flash.c
	add_mtd_partitions -> mtd_device_register vs changed to use
	mtd_device_parse_register.
2011-11-07 09:11:16 -08:00
Linus Torvalds
cf5e15fbd7 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix the dark space calculation
  UBIFS: introduce a helper to dump scanning info
2011-11-07 08:52:19 -08:00
Linus Torvalds
6a6662ced4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
  Btrfs: check for a null fs root when writing to the backup root log
  Btrfs: fix race during transaction joins
  Btrfs: fix a potential btrfs_bio leak on scrub fixups
  Btrfs: rename btrfs_bio multi -> bbio for consistency
  Btrfs: stop leaking btrfs_bios on readahead
  Btrfs: stop the readahead threads on failed mount
  Btrfs: fix extent_buffer leak in the metadata IO error handling
  Btrfs: fix the new inspection ioctls for 32 bit compat
  Btrfs: fix delayed insertion reservation
  Btrfs: ClearPageError during writepage and clean_tree_block
  Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
  Btrfs: make a delayed_block_rsv for the delayed item insertion
  Btrfs: add a log of past tree roots
  btrfs: separate superblock items out of fs_info
  Btrfs: use the global reserve when truncating the free space cache inode
  Btrfs: release metadata from global reserve if we have to fallback for unlink
  Btrfs: make sure to flush queued bios if write_cache_pages waits
  Btrfs: fix extent pinning bugs in the tree log
  Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
  Btrfs: don't wait as long for more batches during SSD log commit
  ...
2011-11-06 20:03:41 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
208bca0860 Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Add a 'reason' to wb_writeback_work
  writeback: send work item to queue_io, move_expired_inodes
  writeback: trace event balance_dirty_pages
  writeback: trace event bdi_dirty_ratelimit
  writeback: fix ppc compile warnings on do_div(long long, unsigned long)
  writeback: per-bdi background threshold
  writeback: dirty position control - bdi reserve area
  writeback: control dirty pause time
  writeback: limit max dirty pause time
  writeback: IO-less balance_dirty_pages()
  writeback: per task dirty rate limit
  writeback: stabilize bdi->dirty_ratelimit
  writeback: dirty rate control
  writeback: add bg_threshold parameter to __bdi_update_bandwidth()
  writeback: dirty position control
  writeback: account per-bdi accumulated dirtied pages
2011-11-06 19:02:23 -08:00
Linus Torvalds
5d5a8d2d9d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph/super.c: quiet sparse noise
  ceph/mds_client.c: quiet sparse noise
  ceph: use new D_COMPLETE dentry flag
  ceph: clear parent D_COMPLETE flag when on dentry prune
2011-11-06 17:28:44 -08:00
Chris Mason
7c7e82a77f Btrfs: check for a null fs root when writing to the backup root log
During log replay, can commit the transaction before the fs_root
pointers are setup, so we have to make sure they are not null before
trying to use them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 18:50:56 -05:00
Chris Mason
d43317dcd0 Btrfs: fix race during transaction joins
While we're allocating ram for a new transaction, we drop our spinlock.
When we get the lock back, we do check to see if a transaction started
while we slept, but we don't check to make sure it isn't blocked
because a commit has already started.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:26:19 -05:00
Ilya Dryomov
56d2a48f81 Btrfs: fix a potential btrfs_bio leak on scrub fixups
In case we were able to map less than we wanted (length < PAGE_SIZE
clause is true) btrfs_bio is still allocated and we have to free it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:29 -05:00
Ilya Dryomov
21ca543efc Btrfs: rename btrfs_bio multi -> bbio for consistency
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:21 -05:00
Ilya Dryomov
9510dc4c62 Btrfs: stop leaking btrfs_bios on readahead
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:11:08 -05:00
Chris Mason
306c8b68c8 Btrfs: stop the readahead threads on failed mount
If we don't stop them, they linger around corrupting
memory by using pointers to freed things.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:09:41 -05:00
Chris Mason
c674e04e1c Btrfs: fix extent_buffer leak in the metadata IO error handling
The scrub readahead branch brought in a new error handling hook,
but it was leaking extent_buffer references.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:09:10 -05:00
Chris Mason
740c3d226c Btrfs: fix the new inspection ioctls for 32 bit compat
The new ioctls to follow backrefs are not clean for 32/64 bit
compat.  This reworks them for u64s everywhere.  They are brand new, so
there are no problems with changing the interface now.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:08:49 -05:00
Chris Mason
806468f8bf Merge git://git.jan-o-sch.net/btrfs-unstable into integration
Conflicts:
	fs/btrfs/Makefile
	fs/btrfs/extent_io.c
	fs/btrfs/extent_io.h
	fs/btrfs/scrub.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:07:10 -05:00
Chris Mason
531f4b1ae5 Merge branch 'for-chris' of git://github.com/sensille/linux into integration
Conflicts:
	fs/btrfs/ctree.h

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:05:08 -05:00
Josef Bacik
c06a0e120a Btrfs: fix delayed insertion reservation
We all keep getting those stupid warnings from use_block_rsv when running
stress.sh, and it's because the delayed insertion stuff is being stupid.  It's
not the delayed insertion stuffs fault, it's all just stupid.  When marking an
inode dirty for oh say updating the time on it, we just do a
btrfs_join_transaction, which doesn't reserve any space.  This is stupid because
we're going to have to have space reserve to make this change, but we do it
because it's fast because chances are we're going to call it over and over again
and it doesn't matter.  Well thanks to the delayed insertion stuff this is
mostly the case, so we do actually need to make this reservation.  So if
trans->bytes_reserved is 0 then try to do a normal reservation.  If not return
ENOSPC which will make the btrfs_dirty_inode start a proper transaction which
will let it do the whole ENOSPC dance and reserve enough space for the delayed
insertion to steal the reservation from the transaction.

The other stupid thing we do is not reserve space for the inode when writing to
the thing.  Usually this is ok since we have to update the time so we'd have
already done all this work before we get to the endio stuff, so it doesn't
matter.  But this is stupid because we could write the data after the
transaction commits where we changed the mtime of the inode so we have to cow
all the way down to the inode anyway.  This used to be masked by the delalloc
reservation stuff, but because we delay the update it doesn't get masked in this
case.  So again the delayed insertion stuff bites us in the ass.  So if our
trans->block_rsv is delalloc, just steal the reservation from the delalloc
reserve.  Hopefully this won't bite us in the ass, but I've said that before.

With this patch stress.sh no longer spits out those stupid warnings (famous last
words).  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:20 -05:00
Chris Mason
bf0da8c183 Btrfs: ClearPageError during writepage and clean_tree_block
Failure testing was tripping up over stale PageError bits in
metadata pages.  If we have an io error on a block, and later on
end up reusing it, nobody ever clears PageError on those pages.

During commit, we'll find PageError and think we had trouble writing
the block, which will lead to aborts and other problems.

This changes clean_tree_block and the btrfs writepage code to
clear the PageError bit.  In both cases we're either completely
done with the page or the page has good stuff and the error bit
is no longer valid.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:20 -05:00
Josef Bacik
663350ac38 Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
Because of the overcommit stuff I had to make it so that we committed the
transaction all the time in reserve_metadata_bytes in case we had overcommitted
because of delayed items.  This was because previously we had no way of knowing
how much space was reserved for delayed items.  Now that we have the
delayed_block_rsv we can check it to see if committing the transaction would get
us anywhere.  This patch breaks out the committing logic into a helper function
that will check to see if committing the transaction would free enough space for
us to get anything done.  With this patch xfstests 83 goes from taking 445
seconds to taking 28 seconds on my box.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:19 -05:00
Josef Bacik
6d668dda0c Btrfs: make a delayed_block_rsv for the delayed item insertion
I've been hitting warnings in use_block_rsv when running the delayed insertion
stuff.  It's because we will readjust global block rsv based on what is in use,
which means we could end up discarding reservations that are for the delayed
insertion stuff.  So instead create a seperate block rsv for the delayed
insertion stuff.  This will also make it easier to debug problems with the
delayed insertion reservations since we will know that only the delayed
insertion code touches this block_rsv.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:18 -05:00
Chris Mason
af31f5e5b8 Btrfs: add a log of past tree roots
This takes some of the free space in the btrfs super block
to record information about most of the roots in the last four
commits.

It also adds a -o recovery to use the root history log when
we're not able to read the tree of tree roots, the extent
tree root, the device tree root or the csum root.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:04:15 -05:00
David Sterba
6c41761fc6 btrfs: separate superblock items out of fs_info
fs_info has now ~9kb, more than fits into one page. This will cause
mount failure when memory is too fragmented. Top space consumers are
super block structures super_copy and super_for_commit, ~2.8kb each.
Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)

Add a wrapper for freeing fs_info and all of it's dynamically allocated
members.

Signed-off-by: David Sterba <dsterba@suse.cz>
2011-11-06 03:04:01 -05:00
Josef Bacik
c8174313a8 Btrfs: use the global reserve when truncating the free space cache inode
We no longer use the orphan block rsv for holding the reservation for truncating
the inode, so instead use the global block rsv and check to make sure it has
enough space for us to truncate the space.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:50 -05:00
Josef Bacik
5a77d76c24 Btrfs: release metadata from global reserve if we have to fallback for unlink
I fixed a problem where we weren't reserving space for an orphan item when we
had to fallback to using the global reserve for an unlink, but I introduced
another problem.  I was migrating the bytes from the transaction reserve to the
global reserve and then releasing from the global reserve in
btrfs_end_transaction().  The problem with this is that a migrate will jack up
the size for the destination, but leave the size alone for the source, with the
idea that you can do a release normally on the source and it all washes out, and
then you can do a release again on the destination and it works out right.  My
way was skipping the release on the trans_block_rsv which still had the jacked
up size from our original reservation.  So instead release manually from the
global reserve if this transaction was using it, and then set the
trans->block_rsv back to the trans_block_rsv so that btrfs_end_transaction
cleans everything up properly.  With this patch xfstest 83 doesn't emit warnings
about leaking space.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:49 -05:00
Chris Mason
01d658f2ca Btrfs: make sure to flush queued bios if write_cache_pages waits
write_cache_pages tries to build up a large bio to stuff down the pipe.
But if it needs to wait for a page lock, it needs to make sure and send
down any pending writes so we don't deadlock with anyone who has the
page lock and is waiting for writeback of things inside the bio.

Dave Sterba triggered this as a deadlock between the autodefrag code and
the extent write_cache_pages

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:48 -05:00
Chris Mason
e688b7252f Btrfs: fix extent pinning bugs in the tree log
The tree log had two important bugs that could cause corruptions after a
crash.  Sometimes we were allowing tree log blocks to be reused after
the tree log was committed but before the transaction commit was done.

This allowed a future metadata write to overwrite the tree log data.  It
is fixed by adding a new variant of freeing reserved extents that always
pins them.  Credit goes to Stefan Behrens and Arne Jansen for many many
hours spent tracking this bug down.

During tree log replay, we do a pass through the tree log and pin all
the extents we find.  This makes sure the replay code won't go in and
use any of those blocks for new allocations during replay.  The problem
is the free space cache isn't honoring these pinned extents.  So the
allocator can end up handing them out, leading to all kinds of problems
during replay.

The fix here is to force any free space cache to load while we pin the
extents, and then to make sure we remove the pinned extents from the
free space rbtree.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-11-06 03:03:48 -05:00
Chris Mason
1eae31e918 Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
btrfs_remove_free_space needs to make sure to set ret back to a
valid return value after setting it to EAGAIN, otherwise we return
it to the callers.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:47 -05:00
Chris Mason
cd354ad613 Btrfs: don't wait as long for more batches during SSD log commit
When we're doing log commits, we try to wait for more writers to come in
and make the commit bigger.  This helps improve performance on rotating
disks, but on SSDs it adds latencies.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06 03:03:47 -05:00
H Hartley Sweeten
0c6d4b4e22 ceph/super.c: quiet sparse noise
Quiet the sparse noise:

warning: symbol 'create_fs_client' was not declared. Should it be static?
warning: symbol 'destroy_fs_client' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-05 21:10:12 -07:00
H Hartley Sweeten
7fd7d101ff ceph/mds_client.c: quiet sparse noise
Quiet the following sparse noise:

warning: symbol 'get_nonsnap_parent' was not declared. Should it be static?
warning: symbol 'done_closing_sessions' was not declared. Should it be static?

Local functions don't need external visability. Make them static.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-05 21:10:11 -07:00
Sage Weil
c6ffe10015 ceph: use new D_COMPLETE dentry flag
We used to use a flag on the directory inode to track whether the dcache
contents for a directory were a complete cached copy.  Switch to a dentry
flag CEPH_D_COMPLETE that is safely updated by ->d_prune().

Signed-off-by: Sage Weil <sage@newdream.net>
2011-11-05 21:10:10 -07:00
Dan McGee
5c8a0fbba5 VFS: fix statfs() automounter semantics regression
No one in their right mind would expect statfs() to not work on a
automounter managed mount point. Fix it.

[ I'm not sure about the "no one in their right mind" part.  It's not
  mounted, and you didn't ask for it to be mounted.  But nobody will
  really care, and this probably makes it match previous semantics, so..
      - Linus ]

This mirrors the fix made to the quota code in 815d405cef.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-04 18:15:59 -07:00
Linus Torvalds
3d0a8d10cf Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
  virtio-blk: use ida to allocate disk index
  hpsa: add small delay when using PCI Power Management to reset for kump
  cciss: add small delay when using PCI Power Management to reset for kump
  xen/blkback: Fix two races in the handling of barrier requests.
  xen/blkback: Check for proper operation.
  xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
  xen/blkback: Report VBD_WSECT (wr_sect) properly.
  xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
  xen-blkfront: plug device number leak in xlblk_init() error path
  xen-blkfront: If no barrier or flush is supported, use invalid operation.
  xen-blkback: use kzalloc() in favor of kmalloc()+memset()
  xen-blkback: fixed indentation and comments
  xen-blkfront: fix a deadlock while handling discard response
  xen-blkfront: Handle discard requests.
  xen-blkback: Implement discard requests ('feature-discard')
  xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
  drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
  drivers/block/loop.c: emit uevent on auto release
  drivers/block/cpqarray.c: use pci_dev->revision
  loop: always allow userspace partitions and optionally support automatic scanning
  ...

Fic up trivial header file includsion conflict in drivers/block/loop.c
2011-11-04 17:22:14 -07:00