Commit graph

261086 commits

Author SHA1 Message Date
KAMEZAWA Hiroyuki
bb2a0de92c memcg: consolidate memory cgroup lru stat functions
In mm/memcontrol.c, there are many lru stat functions as..

  mem_cgroup_zone_nr_lru_pages
  mem_cgroup_node_nr_file_lru_pages
  mem_cgroup_nr_file_lru_pages
  mem_cgroup_node_nr_anon_lru_pages
  mem_cgroup_nr_anon_lru_pages
  mem_cgroup_node_nr_unevictable_lru_pages
  mem_cgroup_nr_unevictable_lru_pages
  mem_cgroup_node_nr_lru_pages
  mem_cgroup_nr_lru_pages
  mem_cgroup_get_local_zonestat

Some of them are under #ifdef MAX_NUMNODES >1 and others are not.
This seems bad. This patch consolidates all functions into

  mem_cgroup_zone_nr_lru_pages()
  mem_cgroup_node_nr_lru_pages()
  mem_cgroup_nr_lru_pages()

For these functions, "which LRU?" information is passed by a mask.

example:
  mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))

And I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.

example:
  mem_cgroup_nr_lru_pages(mem, ALL_LRU)

BTW, considering layout of NUMA memory placement of counters, this patch seems
to be better.

Now, when we gather all LRU information, we scan in following orer
    for_each_lru -> for_each_node -> for_each_zone.

This means we'll touch cache lines in different node in turn.

After patch, we'll scan
    for_each_node -> for_each_zone -> for_each_lru(mask)

Then, we'll gather information in the same cacheline at once.

[akpm@linux-foundation.org: fix warnigns, build error]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:42 -07:00
KAMEZAWA Hiroyuki
1f4c025b5a memcg: export memory cgroup's swappiness with mem_cgroup_swappiness()
Each memory cgroup has a 'swappiness' value which can be accessed by
get_swappiness(memcg).  The major user is try_to_free_mem_cgroup_pages()
and swappiness is passed by argument.  It's propagated by scan_control.

get_swappiness() is a static function but some planned updates will need
to get swappiness from files other than memcontrol.c This patch exports
get_swappiness() as mem_cgroup_swappiness().  With this, we can remove the
argument of swapiness from try_to_free...  and drop swappiness from
scan_control.  only memcg uses it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:42 -07:00
Thomas Gleixner
b830ac1d9a rtc: fix hrtimer deadlock
Ben reported a lockup related to rtc. The lockup happens due to:

CPU0                                        CPU1

rtc_irq_set_state()			    __run_hrtimer()
  spin_lock_irqsave(&rtc->irq_task_lock)    rtc_handle_legacy_irq();
					      spin_lock(&rtc->irq_task_lock);
  hrtimer_cancel()
    while (callback_running);

So the running callback never finishes as it's blocked on
rtc->irq_task_lock.

Use hrtimer_try_to_cancel() instead and drop rtc->irq_task_lock while
waiting for the callback.  Fix this for both rtc_irq_set_state() and
rtc_irq_set_freq().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:42 -07:00
Thomas Gleixner
431e2bcc37 rtc: limit frequency
Due to the hrtimer self rearming mode a user can DoS the machine simply
because it's starved by hrtimer events.

The RTC hrtimer is self rearming.  We really need to limit the frequency
to something sensible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ben Greear <greearb@candelatech.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:42 -07:00
Thomas Gleixner
2c4f57d12d rtc: handle errors correctly in rtc_irq_set_state()
The code checks the correctness of the parameters, but unconditionally
arms/disarms the hrtimer.

The result is that a random task might arm/disarm rtc timer and surprise
the real owner by either generating events or by stopping them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ben Greear <greearb@candelatech.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:41 -07:00
Mathias Krause
b45d59fb92 mn10300, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:41 -07:00
Jonghwan Choi
fc92805a8e drivers/base/power/opp.c: fix dev_opp initial value
Dev_opp initial value shoule be ERR_PTR(), IS_ERR() is used to check
error.

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:41 -07:00
Mathias Krause
adc400f690 frv, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Also removed the dead code in flush_thread().

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:41 -07:00
Tomoya MORINAGA
07e729ce89 i2c-eg20t : Fix the issue of Combined R/W transfer mode
issue-1
In case combined transfer mode fails halfway, the processing must be stopped halfway.
However currently, the processing is continued.
This patch breaks the processing.

issue-2
Currently, pch_i2c_xfer returns read/write size at that time.
However pch_i2c_xfer must return the number of messages to be read/written.
This patch modifies correctly.

Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2011-07-27 00:02:28 +01:00
Tomoya MORINAGA
7a9c42ccc9 i2c-eg20t : Support Combined R/W transfer mode
Currently, Combined R/W transfer mode is not supported.
This patch enables Combined R/W transfer mode.

Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2011-07-27 00:02:28 +01:00
John Bonesio
5c470f39ee i2c: Tegra: Add DeviceTree support
This patch modifies the tegra i2c driver so that it can be initiailized
using the device tree along with the devices connected to the i2c bus.

Signed-off-by: John Bonesio <bones@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: OIof Johansson <olof@lixom.net>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2011-07-27 00:02:28 +01:00
Linus Torvalds
6fd4ce8864 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (31 commits)
  MIPS: Close races in TLB modify handlers.
  MIPS: Add uasm UASM_i_SRL_SAFE macro.
  MIPS: RB532: Use hex_to_bin()
  MIPS: Enable cpu_has_clo_clz for MIPS Technologies' platforms
  MIPS: PowerTV: Provide cpu-feature-overrides.h
  MIPS: Remove pointless return statement from empty void functions.
  MIPS: Limit fixrange_init() to the FIXMAP region
  MIPS: Install handlers for software IRQs
  MIPS: Move FIXADDR_TOP into spaces.h
  MIPS: Add SYNC after cacheflush
  MIPS: pfn_valid() is broken on low memory HIGHMEM systems
  MIPS: HIGHMEM DMA on noncoherent MIPS32 processors
  MIPS: topdown mmap support
  MIPS: Remove redundant addr_limit assignment on exec.
  MIPS: AR7: Replace __attribute__((__packed__)) with __packed
  MIPS: AR7: Remove 'space before tabs' in platform.c
  MIPS: Lantiq: Add missing clk_enable and clk_disable functions.
  MIPS: AR7: Fix trailing semicolon bug in clock.c
  MAINTAINERS: Update MIPS entry.
  MIPS: BCM63xx: Remove duplicate PERF_IRQSTAT_REG definition
  ...
2011-07-26 14:17:28 -07:00
John W. Linville
e5036c2575 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2011-07-26 17:05:55 -04:00
Linus Torvalds
ba5b56cb3e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
  ceph: document unlocked d_parent accesses
  ceph: explicitly reference rename old_dentry parent dir in request
  ceph: document locking for ceph_set_dentry_offset
  ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
  ceph: protect d_parent access in ceph_d_revalidate
  ceph: protect access to d_parent
  ceph: handle racing calls to ceph_init_dentry
  ceph: set dir complete frag after adding capability
  rbd: set blk_queue request sizes to object size
  ceph: set up readahead size when rsize is not passed
  rbd: cancel watch request when releasing the device
  ceph: ignore lease mask
  ceph: fix ceph_lookup_open intent usage
  ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
  ceph: fix bad parent_inode calc in ceph_lookup_open
  ceph: avoid carrying Fw cap during write into page cache
  libceph: don't time out osd requests that haven't been received
  ceph: report f_bfree based on kb_avail rather than diffing.
  ceph: only queue capsnap if caps are dirty
  ceph: fix snap writeback when racing with writes
  ...
2011-07-26 13:38:50 -07:00
Mihai Moldovan
5bc91db893 wireless: fix a typo in ignore_reg_update
Just a typo fix changing regulaotry to regulatory.

Signed-off-by: Mihai Moldovan <ionic@ionic.de>
CC: John W. Linville <linville@tuxdriver.com>
CC: Mohammed Shafi <shafi.wireless@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:32 -04:00
Pavel Roskin
e61b52d130 b43: fix invalid memory access in b43_ssb_remove()
wldev is freed in b43_one_core_detach() and should not be accessed after
that call.  Keep wldev->dev in a local variable.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:31 -04:00
Rafał Miłecki
6f89062a66 b43: bcma: drop BROKEN
We've fixed the last issue with BCMA support which caused memory
corruption on loading and unloading b43. Support for BCMA in b43 was
tested with 14e4:4353, 14e4:4357, 14e4:4727 and 14e4:4331. First two
cards (BCM43224 and BCM43225) are supported.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:31 -04:00
Rafał Miłecki
f76f424353 b43: bus: fix memory corruption when setting driver's data
Fixes bug described in:
https://bugzilla.kernel.org/show_bug.cgi?id=39172

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:30 -04:00
Sven Neumann
a203c2aa4c cfg80211: really ignore the regulatory request
At the beginning of wiphy_update_regulatory() a check is performed
whether the request is to be ignored. Then the request is sent to
the driver nevertheless. This happens even if last_request points
to NULL, leading to a crash in the driver:

 [<bf01d864>] (lbs_set_11d_domain_info+0x28/0x1e4 [libertas]) from [<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4)
 [<c03b714c>] (wiphy_update_regulatory+0x4d0/0x4f4) from [<c03b4008>] (wiphy_register+0x354/0x420)
 [<c03b4008>] (wiphy_register+0x354/0x420) from [<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas])
 [<bf01b17c>] (lbs_cfg_register+0x80/0x164 [libertas]) from [<bf020e64>] (lbs_start_card+0x20/0x88 [libertas])
 [<bf020e64>] (lbs_start_card+0x20/0x88 [libertas]) from [<bf02cbd8>] (if_sdio_probe+0x898/0x9c0 [libertas_sdio])

Fix this by returning early. Also remove the out: label as it is
not any longer needed.

Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Cc: linux-wireless@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:29 -04:00
Dan Carpenter
276556dbd2 NFC: pn533: use after free in pn533_disconnect()
We freed "dev" on the line before.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-26 16:27:24 -04:00
Al Viro
e57712ebeb merge fchmod() and fchmodat() guts, kill ancient broken kludge
The kludge in question is undocumented and doesn't work for 32bit
binaries on amd64, sparc64 and s390.  Passing (mode_t)-1 as
mode had (since 0.99.14v and contrary to behaviour of any
other Unix, prescriptions of POSIX, SuS and our own manpages)
was kinda-sorta no-op.  Note that any software relying on
that (and looking for examples shows none) would be visibly
broken on sparc64, where practically all userland is built
32bit.  No such complaints noticed...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-26 15:07:43 -04:00
Al Viro
03209378b4 xfs: fix misspelled S_IS...()
mode_t is not a bitmap...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-26 15:05:30 -04:00
Al Viro
abbede1b3a xfs: get rid of open-coded S_ISREG(), etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-26 15:05:16 -04:00
Stephen Rothwell
243dd2809a gma500: udelay(20000) it too long again
so replace it with mdelay(20).

Fixes build error:

  ERROR: "__bad_udelay" [drivers/staging/gma500/psb_gfx.ko] undefined!

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 11:55:14 -07:00
Rafael J. Wysocki
9c646cfc3d USB / Renesas: Fix build issue related to struct scatterlist
Fix build issue caused by undefined struct scatterlist in
drivers/usb/renesas_usbhs/fifo.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 11:52:55 -07:00
Rafael J. Wysocki
6c0cbef666 MMC / TMIO: Fix build issue related to struct scatterlist
Fix build issue caused by undefined struct scatterlist in
drivers/mmc/host/tmio_mmc.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 11:52:55 -07:00
Linus Torvalds
2ac232f37f Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
  ext3.txt: update the links in the section "useful links" to the latest ones
  ext3: Fix data corruption in inodes with journalled data
  ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
  ext3: Fix compilation with -DDX_DEBUG
  quota: Remove unused declaration
  jbd: Use WRITE_SYNC in journal checkpoint.
  jbd: Fix oops in journal_remove_journal_head()
  ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
  ext3/ioctl.c: silence sparse warnings about different address spaces
  ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
  ext3: Improve truncate error handling
  ext3: use proper little-endian bitops
  ext2: include fs.h into ext2_fs.h
  ext3: Fix oops in ext3_try_to_allocate_with_rsv()
  jbd: fix a bug of leaking jh->b_jcount
  jbd: remove dependency on __GFP_NOFAIL
  ext3: Convert ext3 to new truncate calling convention
  jbd: Add fixed tracepoints
  ext3: Add fixed tracepoints

Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
new fixed tracepoints.
2011-07-26 11:34:40 -07:00
Sage Weil
d79698da32 ceph: document unlocked d_parent accesses
For the most part we don't care about racing with rename when directing
MDS requests; either the old or new parent is fine.  Document that, and
do some minor cleanup.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:31:26 -07:00
Sage Weil
41b02e1f9b ceph: explicitly reference rename old_dentry parent dir in request
We carry a pin on the parent directory for the rename source and dest
dentries.  For the source it's r_locked_dir; we need to explicitly
reference the old_dentry parent as well, since the dentry's d_parent may
change between when the request was created and pinned and when it is
freed.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:31:14 -07:00
Sage Weil
4f17726452 ceph: document locking for ceph_set_dentry_offset
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:31:08 -07:00
Sage Weil
e5f86dc377 ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug
Have caller pass in a safely-obtained reference to the parent directory
for calculating a dentry's hash valud.

While we're here, simpify the flow through ceph_encode_fh() so that there
is a single exit point and cleanup.

Also fix a bug with the dentry hash calculation: calculate the hash for the
dentry we were given, not its parent.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:30:55 -07:00
Sage Weil
bf1c6aca96 ceph: protect d_parent access in ceph_d_revalidate
Protect d_parent with d_lock.  Carry a reference.  Simplify the flow so
that there is a single exit point and cleanup.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:30:43 -07:00
Sage Weil
5f21c96dd5 ceph: protect access to d_parent
d_parent is protected by d_lock: use it when looking up a dentry's parent
directory inode.  Also take a reference and drop it in the caller to avoid
a use-after-free.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:30:29 -07:00
Sage Weil
48d0cbd124 ceph: handle racing calls to ceph_init_dentry
The ->lookup() and prepopulate_readdir() callers are working with unhashed
dentries, so we don't have to worry.  The export.c callers, though, need
to initialize something they got back from d_obtain_alias() and are
potentially racing with other callers.  Make sure we don't return unless
the dentry is properly initialized (by us or someone else).

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:30:15 -07:00
Sage Weil
dfabbed6fd ceph: set dir complete frag after adding capability
Curretly ceph_add_cap clears the complete bit if we are newly issued the
FILE_SHARED cap, which is normally the case for a newly issue cap on a new
directory.  That means we clear the just-set bit.  Move the check that sets
the flag to after the cap is added/updated.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:30:02 -07:00
Josh Durgin
029bcbd8b0 rbd: set blk_queue request sizes to object size
This improves performance since more requests can be merged.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-07-26 11:29:35 -07:00
Yehuda Sadeh
e985222743 ceph: set up readahead size when rsize is not passed
This should improve the default read performance, as without it
readahead is practically disabled.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-07-26 11:29:14 -07:00
Yehuda Sadeh
79e3057c4c rbd: cancel watch request when releasing the device
We were missing this cleanup, so when a device was released
the osd didn't clean up its watchers list, so following notifications
could be slow as osd needed to timeout on the client.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-07-26 11:29:04 -07:00
Sage Weil
2f90b852e3 ceph: ignore lease mask
The lease mask is no longer used (and it changed a while back).  Instead,
use a non-zero duration to indicate that there is a lease being issued.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:28:25 -07:00
Sage Weil
468640e32c ceph: fix ceph_lookup_open intent usage
We weren't properly calling lookup_instantiate_filp when setting up the
lookup intent, which could lead to file leakage on errors.  So:

 - use separate helper for the hidden snapdir translation, immediately
   following the mds request
 - use ceph_finish_lookup for the final dentry/return value dance in the
   exit path
 - lookup_instantiate_filp on success

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:28:11 -07:00
Sage Weil
9bae113a08 ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC
We only need to put these on the directory unsafe list if they have
side effects that fsync(2) should flush out.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:27:59 -07:00
Sage Weil
acda765788 ceph: fix bad parent_inode calc in ceph_lookup_open
We were always getting NULL here because the intent file f_dentry is always
NULL at this point, which means we were always passing NULL to
ceph_mdsc_do_request.  In reality, this was fine, since this isn't
currently ever a write operation that needs to get strung on the dir's
unsafe list.

Use the dir explicitly, and only pass it if this open has side-effects that
a dir fsync should flush.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:27:48 -07:00
Sage Weil
d8de9ab63a ceph: avoid carrying Fw cap during write into page cache
The generic_file_aio_write call may block on balance_dirty_pages while we
flush data to the OSDs.  If we hold a reference to the FILE_WR cap during
that interval revocation by the MDS (e.g., to do a stat(2)) may be very
slow.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:27:34 -07:00
Sage Weil
4cf9d54463 libceph: don't time out osd requests that haven't been received
Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing).  Time out OSD
requests only if it's been too long since they've been received.

This prevents timeouts and connection thrashing when the OSDs are simply
busy and are throttling the requests they read off the network.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:27:24 -07:00
Greg Farnum
8f04d42276 ceph: report f_bfree based on kb_avail rather than diffing.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-07-26 11:27:06 -07:00
Sage Weil
e77dc3e9c0 ceph: only queue capsnap if caps are dirty
We used to go into this branch if i_wrbuffer_ref_head was non-zero.  This
was an ancient check from before we were careful about dealing with all
kinds of caps (and not just dirty pages).  It is cleaner to only queue a
capsnap if there is an actual dirty cap.  If we are racing with...
something...we will end up here with ci->i_wrbuffer_refs but no dirty
caps.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:26:41 -07:00
Sage Weil
af0ed569d7 ceph: fix snap writeback when racing with writes
There are two problems that come up when we try to queue a capsnap while a
write is in progress:

 - The FILE_WR cap is held, but not yet dirty, so we may queue a capsnap
   with dirty == 0.  That will crash later in __ceph_flush_snaps().  Or
   on the FILE_WR cap if a write is in progress.
 - We may not have i_head_snapc set, which causes problems pretty quickly.
   Look to the snaprealm in this case.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:26:31 -07:00
Sage Weil
9cfa1098dc ceph: use flag bit for at_end readdir flag
This saves us a word of memory per file.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:26:18 -07:00
Sage Weil
4918b6d140 ceph: add F_SYNC file flag to force sync (non-O_DIRECT) io
This allows us to force IO through the sync path which you normally only
get when multiple clients are reading/writing to the same file or by
mounting with -o sync.  Among other things, this lets test programs verify
correctness with a single mount.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:26:07 -07:00
Sage Weil
252c6728de ceph: add flags field to file_info
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-07-26 11:25:27 -07:00