Commit graph

3989 commits

Author SHA1 Message Date
Mikulas Patocka
3ec981e30f loop: fix crash if blk_alloc_queue fails
loop: fix crash if blk_alloc_queue fails

If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
identifier allocated with idr_alloc. That causes crash on module unload in
idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); where we attempt to
remove non-existed device with that id.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
PGD 43d399067 PUD 43d0ad067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor but!
 ton unix
CPU: 7 PID: 2735 Comm: rmmod Tainted: G        W    3.10.15-devel #15
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
RIP: 0010:[<ffffffff812057c9>]  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
RSP: 0018:ffff88043d21fe10  EFLAGS: 00010282
RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
FS:  00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
 00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
 ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
Call Trace:
 [<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
 [<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
 [<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
 [<ffffffff81217b74>] idr_for_each+0x104/0x190
 [<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
 [<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
 [<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
 [<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00
00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
RIP  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
 RSP <ffff88043d21fe10>
CR2: 0000000000000380
---[ end trace 64ec069ec70f1309 ]---

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org	# 3.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:26 -07:00
Heinz Graalfs
7f03b17d5c virtio_blk: verify if queue is broken after virtqueue_get_buf()
In case virtqueue_get_buf() returned with a NULL pointer verify if the
virtqueue is broken in order to leave while loop.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 11:28:17 +10:30
Olof Johansson
43d93947a5 Via Paul Walmsley <paul@pwsan.com>:
Move some of the OMAP2+ CM and System Control Module direct
 register accesses into CM- and System Control
 Module-specific "drivers" underneath arch/arm/mach-omap2/.  This
 is a prerequisite for moving this code out of arch/arm/mach-omap2/ into
 drivers/.
 
 Basic test logs are available here:
 
 http://www.pwsan.com/omap/testlogs/cm_scm_cleanup_a_v3.13/20131019101809/
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSZAZXAAoJEBvUPslcq6VzCWQQAKH4Rj0izwbbLkgBAeeaQz5K
 oJgPJ6UPLOJ2uLIUauCKUSR6+nktrCTfV8P+J4DhCc6OiGrKBXJhSETPgaTbWsNw
 Bd577pmuvXSfNFXUaLwCgkSmafJ1pi6d7kEx/7ZW3TziVE/aUxyeHkrMtWJHrjTP
 28tJVieOxLlO5iK06DfmGcCpLUBKJKtgGRo0h/oqMhLAaN5S8//lyVYgdsto7oCN
 /bes6OpuVVdKiSr78V4rCVtR5Lij5+lVrT8HDiw2BA0V3bYcI7+CVlWBPZ3mYkuy
 oAJDcn9whNyfWS+SsaTIjy6nHsgQkhEJnhrQW3k2skVZobRtWDv7U5LiTjsUhb3o
 pjyWD8zZ7jqrkgyLsai6dm1zsljMQXsIQwH5h++HdCRhtNOXd6bVQZy0KqkpLu0y
 Bhpt8/edh4Bdc305oB05/Y9Uxr7Gr8M377chVZx+JD3rxIDjRRyOJcRIhd27WZEf
 HSMLpO/ayUXWdDuTlKW0IEnImx3PrxT913cnjIY589FhfdahfGQoft4sWDeiQLAX
 +zVYZljeY+GxbUWO6aY4m2PfVN9p/Hwal58NZZgj59wq9iHUuJErK11X7rj+2vwN
 +20IS8sikz6Iym84iC0T+omUeFVY0Zo004DVvpPB+D1C2LpwdI1c6kTz4DYT1EBP
 pvs8Wihkk7xQxQn0rBGP
 =L37r
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v3.13/cm-scm-cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup

From Paul Walmsley <paul@pwsan.com> via Tony Lindgren:

Move some of the OMAP2+ CM and System Control Module direct
register accesses into CM- and System Control
Module-specific "drivers" underneath arch/arm/mach-omap2/.  This
is a prerequisite for moving this code out of arch/arm/mach-omap2/ into
drivers/.

Basic test logs are available here:

http://www.pwsan.com/omap/testlogs/cm_scm_cleanup_a_v3.13/20131019101809/

* tag 'omap-for-v3.13/cm-scm-cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP3: control: add API for setting IVA bootmode
  ARM: OMAP3: CM/control: move CM scratchpad save to CM driver
  ARM: OMAP3: McBSP: do not access CM register directly
  ARM: OMAP3: clock: add API to enable/disable autoidle for a single clock
  ARM: OMAP2: CM/PM: remove direct register accesses outside CM code
  + Linux 3.12-rc4

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-10-28 14:39:03 -07:00
Jens Axboe
f2298c0403 null_blk: multi queue aware block test driver
A driver that simply completes IO it receives, it does no
transfers. Written to fascilitate testing of the blk-mq code.
It supports various module options to use either bio queueing,
rq queueing, or mq mode.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:56:00 +01:00
Jens Axboe
5953316dbf block: make rq->cmd_flags be 64-bit
We have officially run out of flags in a 32-bit space. Extend it
to 64-bit even on 32-bit archs.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
Rusty Russell
855e0c5288 virtio: use size-based config accessors.
This lets the transport do endian conversion if necessary, and insulates
the drivers from the difference.

Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 10:55:37 +10:30
Olof Johansson
ec01791d0f This deletes the Shark SA110-based sub-architecture from
the kernel.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSQuSSAAoJEEEQszewGV1zi00QALIJQt28Ir03WoxpbfLev2WP
 2s/f3zwW76XkNhWhh7FNG+GCl7VUcLcLSXod/nYE7wvbVtx9DVGzpWRbzqmNW/Ti
 5eqkk9EQso1G8jEseBiKaWsO7JxKkwa+nEYkGaqbm9K5Qk9iVlQ1hTF/m9/L64Dq
 I4BazqHJuU8g7sJhNaUaVvJLX5YXEeYTaTl43XoEEHy24HDqnvT2sIkiEj1WXLGE
 EmsOHVYLeJ4z7kZK8Dl0ZW/YFzrP3UAM25KObRYJXKAwGZhbTNreujF0RDdg5CBY
 6etMU7h7W90KkCGXzbgt4AkHwBDHM6EGb81ZiKOWZBGlrQRQFyAas4e8UTgEQh1F
 +I9PKNY/m94F9+RMpV9a35hNHNUj/+XskmXmoIFnCgSsrgYDIoSzZqV+v3iVDHRb
 oeX4R6S5ZbmxT3QC0n1xdctGn8BbgzGpAUnvfcNHG/IypNjCuGk9Femu7CQS2Pka
 R3BuTcizcppbITaFeaLLorRkC8U/GPuh5JmMUpyJh/+aq+wiWDw9oBEVfNclHB67
 Qo44EAN1WPv9M+EyQAhzUndJIFj0Ib4wUM7ILjHeXJLVvwQfpE1dQURDs//6Wwgp
 /hALpZqa5Y9oxPJO3Fgmmmci6q0P3+EFX5PT8GO2ecGWhhDlCSfLWc/lEtYrjzxG
 t77iKH/DdzX5nfrr4AQQ
 =r6/u
 -----END PGP SIGNATURE-----

Merge tag 'del-shark-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into next/cleanup

From Linus Walleij:

This deletes the Shark SA110-based sub-architecture from
the kernel.

* tag 'del-shark-for-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  video: drop code for ARCH_SHARK in cyber2000fb
  input: i8042: drop dependency on ARCH_SHARK
  ide: drop dependency on ARCH_SHARK
  block: drop dependency on ARCH_SHARK
  MAINTAINERS: delete Shark machine entry
  ARM: delete mach-shark

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-25 21:55:46 -07:00
Linus Walleij
3c5710f6a2 block: drop dependency on ARCH_SHARK
With this machine deleted, there is no need to maintain the
MFM block driver for its hard disk either.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-09-25 15:25:17 +02:00
Dan Carpenter
58f09e00ae cciss: fix info leak in cciss_ioctl32_passthru()
The arg64 struct has a hole after ->buf_size which isn't cleared.  Or if
any of the calls to copy_from_user() fail then that would cause an
information leak as well.

This was assigned CVE-2013-2147.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:26 -07:00
Dan Carpenter
627aad1c01 cpqarray: fix info leak in ida_locked_ioctl()
The pciinfo struct has a two byte hole after ->dev_fn so stack
information could be leaked to the user.

This was assigned CVE-2013-2147.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:26 -07:00
Aaron Lu
8910700039 virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
The freeze and restore functions defined in virtio drivers are used
for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than
CONFIG_PM. This patch replace all CONFIG_PM with CONFIG_PM_SLEEP for
virtio drivers that implement freeze and restore callbacks.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-23 15:45:58 +09:30
Russell King
052d0efada DMA-API: block: nvme-core: replace dma_set_mask()+dma_set_coherent_mask() with new helper
Replace the following sequence:

	dma_set_mask(dev, mask);
	dma_set_coherent_mask(dev, mask);

with a call to the new helper dma_set_mask_and_coherent().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-09-21 21:02:24 +01:00
Linus Torvalds
e9ff04dd94 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "These fix several bugs with RBD from 3.11 that didn't get tested in
  time for the merge window: some error handling, a use-after-free, and
  a sequencing issue when unmapping and image races with a notify
  operation.

  There is also a patch fixing a problem with the new ceph + fscache
  code that just went in"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  fscache: check consistency does not decrement refcount
  rbd: fix error handling from rbd_snap_name()
  rbd: ignore unmapped snapshots that no longer exist
  rbd: fix use-after free of rbd_dev->disk
  rbd: make rbd_obj_notify_ack() synchronous
  rbd: complete notifies before cleaning up osd_client and rbd_dev
  libceph: add function to ensure notifies are complete
2013-09-19 12:50:37 -05:00
Martin Schwidefsky
0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Joe Perches
666dc7c90a pktcdvd: fix defective misuses of pkt_<level>
Fix thinkos where pkt_<level> needs a valid pktcdvd_device * and the
pointer is known to be NULL.

Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (go smatch!)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:34 -07:00
Joe Perches
f3ded788bb pktcdvd: add struct pktcdvd_device * to pkt_dump_sense()
Allow the device name to be emitted with pkt_err when logging the sense
data.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:33 -07:00
Joe Perches
0c075d64df pktcdvd: convert pr_info to pkt_info
Add a new pkt_info macro to prefix the name to the logging output.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:33 -07:00
Joe Perches
ca73dabc3d pktcdvd: convert pr_notice to pkt_notice
Add a new pkt_notice macro to prefix the name to the logging output.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:32 -07:00
Joe Perches
fa63c0ab81 pktcdvd: add struct pktcdvd_device.name to pr_err logging where possible
Add a new pkt_err macro to prefix the name to the logging output.  Convert
pr_err where there is a non-null struct pktcdvd_device.

Includes improvements from Andy Shevchenko.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:32 -07:00
Joe Perches
844aa79743 pktcdvd: add struct pktcdvd_device * to pkt_dbg
Add pd->name to output for these debugging messages.

Remove normally compiled out pkt_dbg(2, ...) function entry tracing
equivalents as it's better done via the function tracer.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:32 -07:00
Joe Perches
cd3f2cd05c pktcdvd: consolidate DPRINTK and VPRINTK macros
Use the more common pkt_dbg(level, fmt, ...) form.

These messages are emitted at KERN_NOTICE.

Always emit function name with pkt_dbg(2, ...) uses and remove the
sometimes abbreviated embedded function name.

This form always verifies the format and arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:32 -07:00
Joe Perches
99481334bc pktcdvd: convert printk to pr_<level>
Use a more current logging style and add messages levels to the logging
messages.

Simplify pkt_dump_sense by using %*ph and adding a simple function to emit
the sense string.

Includes improvements from Andy Shevchenko and Dan Carpenter.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:30 -07:00
Joe Perches
5323fb770b pktcdvd: convert ZONE macro to static function get_zone()
Macros should be converted to functions where feasible to
verify arguments and the like.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:30 -07:00
Ed Cashin
fea1b13973 aoe: do not BUG if memory pressure prevented debugfs file creation
If the system has trouble allocating memory for the creation of the aoe
debugfs directory or of a file inside it, the debugfs member of an aoedev
can be NULL.

Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown, avoiding
the user impact of an unecessary panic.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:28 -07:00
Andy Shevchenko
e0ec360597 aoe: suppress compiler warnings
This patch fixes following compiler warnings:

  drivers/block/aoe/aoecmd.c: In function `aoecmd_ata_rw':
  drivers/block/aoe/aoecmd.c:383:17: warning: variable `t' set but not used [-Wunused-but-set-variable]
    struct aoetgt *t;
                   ^
  drivers/block/aoe/aoecmd.c: In function `resend':
  drivers/block/aoe/aoecmd.c:488:21: warning: variable `ah' set but not used [-Wunused-but-set-variable]
    struct aoe_atahdr *ah;
                       ^

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:27 -07:00
Andy Shevchenko
a88c1f0cac aoe: remove custom implementation of kbasename()
In the kernel we have a nice helper that may be used here. This patch
substitutes the custom implementation by the native function call.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:26 -07:00
Ed Cashin
896dcd9a64 aoe: update internal version number to 85
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:26 -07:00
Ed Cashin
ec345120c5 aoe: update copyright date
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:25 -07:00
Ed Cashin
2256c1c51e aoe: fill in per-AoE-target information for debugfs file
This information is presented in a compact format that has evolved for
easy routine scanning by expert humans, mostly developers and support
technicians helping to troubleshoot or test AoE-based systems.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:25 -07:00
Ed Cashin
1cf94797c2 aoe: provide file operations for debugfs files
The place holder in the file contents is filled out in the following
patch.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:24 -07:00
Ed Cashin
e8866cf2b9 aoe: add AoE-target files to debugfs
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:23 -07:00
Ed Cashin
190519cd30 aoe: create and destroy debugfs directory for aoe
This series adds the debugging information that the coraid.com-distributed
aoe driver exports via sysfs, but instead of sysfs, it uses debugfs.

With these patches applied, even without AoE targets on the network, KEDR
reports new possible memory leaks, but these are from callers outside the
aoe driver that have used aoe_devnode to get the name of the character
devices through the aoe_class->devnode callback, and I believe they're
responsible for freeing that memory.

This patch:

Create and destroy the debugfs directory.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:22 -07:00
Jingoo Han
c07303c0af drivers/block/swim.c: remove unnecessary platform_set_drvdata()
The driver core clears the driver data to NULL after device_release or
on probe failure.  Thus, it is not needed to manually clear the device
driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:59 -07:00
Mike Miller
e7b18ede44 cciss: set max scatter gather entries to 32 on P600
At one time we used to set the maximum number of scatter gather elements
on all Smart Array controllers to 32.  At some point in time the
firmware began to write the "appropriate" value for each controller into
the config table.  The cciss driver would then read that and set
h->maxsgentries.

        h->maxsgentries = readl(&(h->cfgtable->MaxSGElements);

On the P600 that value is 544.  Under some workloads a significant
performance reduction may result.  This patch forces the P600 to use
only 32 scatter gather elements.  Other controllers are not affected.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Dwight (Bud) Brown <bubrown@redhat.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Stephen M. Cameron <steve.cameron@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:59 -07:00
Jingoo Han
c86db975c8 drivers/block/mg_disk.c: make mg_times_out() static
mg_times_out() is used only in this file.  Fix the following sparse
warning:

  drivers/block/mg_disk.c:639:6: warning: symbol 'mg_times_out' was not declared. Should it be static?

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:59 -07:00
Jingoo Han
bb8e0e84b3 block: replace strict_strtoul() with kstrtoul()
The use of strict_strtoul() is not preferred, because strict_strtoul() is
obsolete.  Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:56 -07:00
Josh Durgin
da6a6b6397 rbd: fix error handling from rbd_snap_name()
rbd_snap_name() calls rbd_dev_v{1,2}_snap_name() depending on the
format of the image. The format 1 version returns NULL on error, which
is handled by the caller. The format 2 version returns an ERR_PTR,
which the caller of rbd_snap_name() does not expect.

Fortunately this is unlikely to occur in practice because
rbd_snap_id_by_name() is called before rbd_snap_name(). This would hit
similar errors to rbd_snap_name() (like the snapshot not existing) and
return early, so rbd_snap_name() would not hit an error unless the
snapshot was removed between the two calls or memory was exhausted.

Use an ERR_PTR in rbd_dev_v1_snap_name() so that the specific error
can be propagated, and it is consistent with rbd_dev_v2_snap_name().
Handle the ERR_PTR in the only rbd_snap_name() caller.

Suggested-by: Alex Elder <alex.elder@linaro.org>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09 11:16:44 -07:00
Josh Durgin
efadc98aab rbd: ignore unmapped snapshots that no longer exist
This prevents erroring out while adding a device when a snapshot
unrelated to the current mapping is deleted between reading the
snapshot context and reading the snapshot names. If the mapped
snapshot name is not found an error still occurs as usual.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09 11:16:32 -07:00
Josh Durgin
9875201e10 rbd: fix use-after free of rbd_dev->disk
Removing a device deallocates the disk, unschedules the watch, and
finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
from the watch callback, updates the disk size and rbd_dev
structure. With no locking between them, rbd_dev_refresh() may use the
device or rbd_dev after they've been freed.

To fix this, check whether RBD_DEV_FLAG_REMOVING is set before
updating the disk size in rbd_dev_refresh(). In order to prevent a
race where rbd_dev_refresh() is already revalidating the disk when
rbd_remove() is called, move the call to rbd_bus_del_dev() after the
watch is unregistered and all notifies are complete. It's safe to
defer deleting this structure because no new requests can be submitted
once the RBD_DEV_FLAG_REMOVING is set, since the device cannot be
opened.

Fixes: http://tracker.ceph.com/issues/5636
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09 11:16:25 -07:00
Josh Durgin
20e0af67ce rbd: make rbd_obj_notify_ack() synchronous
The only user of rbd_obj_notify_ack() is rbd_watch_cb(). It used
asynchronously with no tracking of when the notify ack completes, so
it may still be in progress when the osd_client is shut down.  This
results in a BUG() since the osd client assumes no requests are in
flight when it stops. Since all notifies are flushed before the
osd_client is stopped, waiting for the notify ack to complete before
returning from the watch callback ensures there are no notify acks in
flight during shutdown.

Rename rbd_obj_notify_ack() to rbd_obj_notify_ack_sync() to reflect
its new synchronous nature.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09 11:16:02 -07:00
Josh Durgin
9abc59908e rbd: complete notifies before cleaning up osd_client and rbd_dev
To ensure rbd_dev is not used after it's released, flush all pending
notify callbacks before calling rbd_dev_image_release(). No new
notifies can be added to the queue at this point because the watch has
already be unregistered with the osd_client.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2013-09-09 11:15:57 -07:00
Linus Torvalds
6cccc7d301 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This includes both the first pile of Ceph patches (which I sent to
  torvalds@vger, sigh) and a few new patches that add support for
  fscache for Ceph.  That includes a few fscache core fixes that David
  Howells asked go through the Ceph tree.  (Thanks go to Milosz Tanski
  for putting this feature together)

  This first batch of patches (included here) had (has) several
  important RBD bug fixes, hole punch support, several different
  cleanups in the page cache interactions, improvements in the truncate
  code (new truncate mutex to avoid shenanigans with i_mutex), and a
  series of fixes in the synchronous striping read/write code.

  On top of that is a random collection of small fixes all across the
  tree (error code checks and error path cleanup, obsolete wq flags,
  etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
  ceph: use d_invalidate() to invalidate aliases
  ceph: remove ceph_lookup_inode()
  ceph: trivial buildbot warnings fix
  ceph: Do not do invalidate if the filesystem is mounted nofsc
  ceph: page still marked private_2
  ceph: ceph_readpage_to_fscache didn't check if marked
  ceph: clean PgPrivate2 on returning from readpages
  ceph: use fscache as a local presisent cache
  fscache: Netfs function for cleanup post readpages
  FS-Cache: Fix heading in documentation
  CacheFiles: Implement interface to check cache consistency
  FS-Cache: Add interface to check consistency of a cached object
  rbd: fix null dereference in dout
  rbd: fix buffer size for writes to images with snapshots
  libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
  rbd: fix I/O error propagation for reads
  ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
  ceph: allow sync_read/write return partial successed size of read/write.
  ceph: fix bugs about handling short-read for sync read mode.
  ceph: remove useless variable revoked_rdcache
  ...
2013-09-09 09:13:22 -07:00
Linus Torvalds
b409624ad5 Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVM Express driver update from Matthew Wilcox.

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Merge issue on character device bring-up
  NVMe: Handle ioremap failure
  NVMe: Add pci suspend/resume driver callbacks
  NVMe: Use normal shutdown
  NVMe: Separate controller init from disk discovery
  NVMe: Separate queue alloc/free from create/delete
  NVMe: Group pci related actions in functions
  NVMe: Disk stats for read/write commands only
  NVMe: Bring up cdev on set feature failure
  NVMe: Fix checkpatch issues
  NVMe: Namespace IDs are unsigned
  NVMe: Update nvme_id_power_state with latest spec
  NVMe: Split header file into user-visible and kernel-visible pieces
  NVMe: Call nvme_process_cq from submission path
  NVMe: Remove "process_cq did something" message
  NVMe: Return correct value from interrupt handler
  NVMe: Disk IO statistics
  NVMe: Restructure MSI / MSI-X setup
  NVMe: Use kzalloc instead of kmalloc+memset
2013-09-07 20:19:02 -07:00
Keith Busch
d82e8bfdef NVMe: Merge issue on character device bring-up
A recent patch made it possible to bring up the character handle when the
device is responsive but not accepting a set-features command. Another
recent patch moved the initialization that requires we move where the
checks for this condition occur. This patch merges these two ideas so
it works much as before.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-06 16:26:58 -04:00
Linus Torvalds
2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Josh Durgin
c35455791c rbd: fix null dereference in dout
The order parameter is sometimes NULL in _rbd_dev_v2_snap_size(), but
the dout() always derefences it. Move this to another dout() protected
by a check that order is non-NULL.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-09-03 22:08:51 -07:00
Josh Durgin
03507db631 rbd: fix buffer size for writes to images with snapshots
rbd_osd_req_create() needs to know the snapshot context size to create
a buffer large enough to send it with the message front. It gets this
from the img_request, which was not set for the obj_request yet. This
resulted in trying to write past the end of the front payload, hitting
this BUG:

libceph: BUG_ON(p > msg->front.iov_base + msg->front.iov_len);

Fix this by associating the obj_request with its img_request
immediately after it's created, before the osd request is created.

Fixes: http://tracker.ceph.com/issues/5760
Suggested-by: Alex Elder <alex.elder@linaro.org>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-09-03 22:08:46 -07:00
Josh Durgin
17c1cc1d92 rbd: fix I/O error propagation for reads
When a request returns an error, the driver needs to report the entire
extent of the request as completed.  Writes already did this, since
they always set xferred = length, but reads were skipping that step if
an error other than -ENOENT occurred.  Instead, rbd would end up
passing 0 xferred to blk_end_request(), which would always report
needing more data.  This resulted in an assert failing when more data
was required by the block layer, but all the object requests were
done:

[ 1868.719077] rbd: obj_request read result -108 xferred 0
[ 1868.719077]
[ 1868.719518] end_request: I/O error, dev rbd1, sector 0
[ 1868.719739]
[ 1868.719739] Assertion failure in rbd_img_obj_callback() at line 1736:
[ 1868.719739]
[ 1868.719739]   rbd_assert(more ^ (which == img_request->obj_request_count));

Without this assert, reads that hit errors would hang forever, since
the block layer considered them incomplete.

Fixes: http://tracker.ceph.com/issues/5647
CC: stable@vger.kernel.org  # v3.10
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-09-03 22:06:10 -07:00
Keith Busch
9d713c2bfb NVMe: Handle ioremap failure
Decrement the number of queues required for doorbell remapping until
the memory is successfully mapped for that size.

Additional checks are done so that we don't call free_irq if it has
already been freed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:44:25 -04:00
Keith Busch
cd63894630 NVMe: Add pci suspend/resume driver callbacks
Used for going in and out of low power states. Resuming reuses the IO
queues from the previous initialization, freeing any allocated queues
that are no longer usable.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:44:16 -04:00
Keith Busch
1894d8f16a NVMe: Use normal shutdown
The NVMe spec recommends using the shutdown normal sequence when safely
taking the controller offline instead of hitting CC.EN on the next
start-up to reset the controller. The spec recommends a minimum of 1
second for the shutdown complete. This patch waits 2 seconds to be on
the safe side.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:40:32 -04:00
Keith Busch
f0b50732a9 NVMe: Separate controller init from disk discovery
This combines the controller initialization into one function, removing
IO queue setup from namespace discovery, and creates symetric functions
for device removal. The controller start and shutdown functions can now
be called from resume/suspend context as well as probe/remove.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:40:28 -04:00
Keith Busch
2240427425 NVMe: Separate queue alloc/free from create/delete
This separates nvme queue allocation from creation, and queue deletion
from freeing. This is so that we may in the future temporarily disable
queues and reuse the same memory when bringing them back online, like
coming back from suspend state.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:39:28 -04:00
Keith Busch
0877cb0d28 NVMe: Group pci related actions in functions
This will make it easier to reuse these outside probe/remove.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:39:25 -04:00
Keith Busch
9e59d091b0 NVMe: Disk stats for read/write commands only
Flush and discard requests would previously mess up the accounting.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:33:35 -04:00
Keith Busch
7e03b12406 NVMe: Bring up cdev on set feature failure
This patch creates the character device as long as a device's admin queues
are usable so a user has an opprotunity to perform administration tasks.
A device may be in a state that does not allow IO and setting the queue
count feature in such a state returns an error. Previously the driver
would bail and the controller would be unusable.

Signed-off-by: Keith Busch <keith.busch@intel.com>
2013-09-03 16:32:26 -04:00
Keith Busch
1b56749e54 NVMe: Fix checkpatch issues
Signed-off-by: Keith Busch <keith.busch@intel.com>
2013-09-03 16:32:26 -04:00
Matthew Wilcox
c3bfe7176c NVMe: Namespace IDs are unsigned
The 'Number of Namespaces' read from the device was being treated as
signed, which would cause us to not scan any namespaces for a device
with more than 2 billion namespaces.  That led to noticing that the
namespace ID was also being treated as signed, which could lead to the
result from NVME_IOCTL_ID being treated as an error code.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03 16:32:26 -04:00
Linus Torvalds
542a086ac7 Driver core patches for 3.12-rc1
Here's the big driver core pull request for 3.12-rc1.
 
 Lots of tiny changes here fixing up the way sysfs attributes are
 created, to try to make drivers simpler, and fix a whole class race
 conditions with creations of device attributes after the device was
 announced to userspace.
 
 All the various pieces are acked by the different subsystem maintainers.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.21 (GNU/Linux)
 
 iEYEABECAAYFAlIlIPcACgkQMUfUDdst+ynUMwCaAnITsxyDXYQ4DqEsz8EcOtMk
 718AoLrgnUZs3B+70AT34DVktg4HSThk
 =USl9
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg KH:
 "Here's the big driver core pull request for 3.12-rc1.

  Lots of tiny changes here fixing up the way sysfs attributes are
  created, to try to make drivers simpler, and fix a whole class race
  conditions with creations of device attributes after the device was
  announced to userspace.

  All the various pieces are acked by the different subsystem
  maintainers"

* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
  firmware loader: fix pending_fw_head list corruption
  drivers/base/memory.c: introduce help macro to_memory_block
  dynamic debug: line queries failing due to uninitialized local variable
  sysfs: sysfs_create_groups returns a value.
  debugfs: provide debugfs_create_x64() when disabled
  rbd: convert bus code to use bus_groups
  firmware: dcdbas: use binary attribute groups
  sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
  driver core: add #include <linux/sysfs.h> to core files.
  HID: convert bus code to use dev_groups
  Input: serio: convert bus code to use drv_groups
  Input: gameport: convert bus code to use drv_groups
  driver core: firmware: use __ATTR_RW()
  driver core: core: use DEVICE_ATTR_RO
  driver core: bus: use DRIVER_ATTR_WO()
  driver core: create write-only attribute macros for devices and drivers
  sysfs: create __ATTR_WO()
  driver-core: platform: convert bus code to use dev_groups
  workqueue: convert bus code to use dev_groups
  MEI: convert bus code to use dev_groups
  ...
2013-09-03 11:37:15 -07:00
Greg Kroah-Hartman
b15a21ddda rbd: convert bus code to use bus_groups
The bus_attrs field of struct bus_type is going away soon, dev_groups
should be used instead.  This converts the RBD bus code to use the
correct field.

Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Acked-by: Alex Elder <elder@linaro.org>
Cc: <ceph-devel@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-27 22:07:49 -07:00
Joe Perches
8be04b9374 treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
Don't emit OOM warnings when k.alloc calls fail when
there there is a v.alloc immediately afterwards.

Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-08-20 13:06:40 +02:00
Sage Weil
ee3e542fec Merge remote-tracking branch 'linus/master' into testing 2013-08-15 11:11:45 -07:00
Ed Cashin
fb32975d1b aoe: adjust ref of head for compound page tails
Fix a BUG which can trigger when direct-IO is used with AOE.

As discussed previously, the fact that some users of the block layer
provide bios that point to pages with a zero _count means that it is not
OK for the network layer to do a put_page on the skb frags during an
skb_linearize, so the aoe driver gets a reference to pages in bios and
puts the reference before ending the bio.  And because it cannot use
get_page on a page with a zero _count, it manipulates the value
directly.

It is not OK to increment the _count of a compound page tail, though,
since the VM layer will VM_BUG_ON a non-zero _count.  Block users that
do direct I/O can result in the aoe driver seeing compound page tails in
bios.  In that case, the same logic works as long as the head of the
compound page is used instead of the tails.  This patch handles compound
pages and does not BUG.

It relies on the block layer user leaving the relationship between the
page tail and its head alone for the duration between the submission of
the bio and its completion, whether successful or not.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13 17:57:48 -07:00
Jingoo Han
a158073c43 block: rbd: use NULL instead of 0
The local variables such as 'bio_list', and 'pages' are pointers;
thus, use NULL instead of 0 to fix the following sparse warnings.

drivers/block/rbd.c:2166:32: warning: Using plain integer as NULL pointer
drivers/block/rbd.c:2168:31: warning: Using plain integer as NULL pointer

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-08-09 17:55:40 -07:00
Linus Torvalds
d4c90b1b9f Merge branch 'for-3.11/drivers' of git://git.kernel.dk/linux-block
Pull block IO driver bits from Jens Axboe:
 "As I mentioned in the core block pull request, due to real life
  circumstances the driver pull request would be late.  Now it looks
  like -rc2 late...  On the plus side, apart form the rsxx update, these
  are all things that I could argue could go in later in the cycle as
  they are fixes and not features.  So even though things are late, it's
  not ALL bad.

  The pull request contains:

   - Updates to bcache, all bug fixes, from Kent.

   - A pile of drbd bug fixes (no big features this time!).

   - xen blk front/back fixes.

   - rsxx driver updates, some of them deferred form 3.10.  So should be
     well cooked by now"

* 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
  bcache: Allocation kthread fixes
  bcache: Fix GC_SECTORS_USED() calculation
  bcache: Journal replay fix
  bcache: Shutdown fix
  bcache: Fix a sysfs splat on shutdown
  bcache: Advertise that flushes are supported
  bcache: check for allocation failures
  bcache: Fix a dumb race
  bcache: Use standard utility code
  bcache: Update email address
  bcache: Delete fuzz tester
  bcache: Document shrinker reserve better
  bcache: FUA fixes
  drbd: Allow online change of al-stripes and al-stripe-size
  drbd: Constants should be UPPERCASE
  drbd: Ignore the exit code of a fence-peer handler if it returns too late
  drbd: Fix rcu_read_lock balance on error path
  drbd: fix error return code in drbd_init()
  drbd: Do not sleep inside rcu
  bcache: Refresh usage docs
  ...
2013-07-22 19:02:52 -07:00
Linus Torvalds
72c1c2f1be Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
Pull microblaze update from Michal Simek:
 "This Microblaze merge window is quite minimal.

  I have also added to my branch one xilinx systemace sparse fix because
  haven't got any reply from block maintainer."

* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
  xilinx systemace: Fix sparse warnings
  microblaze: Move __NR_syscalls from uapi
  microblaze: Enable KGDB in defconfig
  microblaze: Don't mark arch_kgdb_ops as const.
2013-07-10 10:16:07 -07:00
Michal Simek
4937a269cb xilinx systemace: Fix sparse warnings
Fix sysace sparse warnings.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2013-07-10 07:47:12 +02:00
Linus Torvalds
9a5889ae1c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned->signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc->mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
2013-07-09 12:39:10 -07:00
Linus Torvalds
7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Ed Cashin
94ac11833f aoe: update internal version number to v83
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:08:05 -07:00
Ed Cashin
ca47bbd93c aoe: update copyright date
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:08:05 -07:00
Ed Cashin
8030d34397 aoe: perform I/O completions in parallel
Some users have a large AoE target while others like to use many AoE
targets at the same time.  In the latter case, there is an opportunity to
greatly improve aggregate throughput by allowing different threads to
complete the I/O associated with each target.  For 36 targets, 4 KiB read
throughput roughly doubles, for example, with these changes in place.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:08:05 -07:00
Paul Clements
c378f70adb nbd: correct disconnect behavior
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes).  This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.

This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect.  This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Acked-by: Rob Landley <rob@landley.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:08:05 -07:00
Michal Belczyk
9532f149ee nbd: remove bogus BUG_ON in NBD_CLEAR_QUE
The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
is now done by two other ioctls).  We should stop trying to make bogus
assertions in it.  Also, user-level code should remove calls to
NBD_CLEAR_QUE, ASAP.

Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:08:05 -07:00
Kees Cook
f170168b9a drivers: avoid parsing names as kthread_run() format strings
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Kees Cook
ffc8b30866 block: do not pass disk names as format strings
Disk names may contain arbitrary strings, so they must not be
interpreted as format strings.  It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.

CVE-2013-2851

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:25 -07:00
Sage Weil
e976cad0f0 rbd: fix a couple warnings
gcc isn't quite smart enough and generates these warnings:

drivers/block/rbd.c: In function 'rbd_img_request_fill':
drivers/block/rbd.c:1266:22: warning: 'bio_list' may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/block/rbd.c:2186:14: note: 'bio_list' was declared here
drivers/block/rbd.c:2247:10: warning: 'pages' may be used uninitialized in this function [-Wmaybe-uninitialized]

even though they are initialized for their respective code paths.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-03 15:32:50 -07:00
Alex Elder
d552c6191b rbd: take a little credit
Add a name to the list of authors.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:44 -07:00
Alex Elder
cfbf6377b6 rbd: use rwsem to protect header updates
Updating an image header needs to be protected to ensure it's
done consistently.  However distinct headers can be updated
concurrently without a problem.  Instead of using the global
control lock to serialize headder updates, just rely on the header
semaphore.  (It's already used, this just moves it out to cover
a broader section of the code.)

That leaves the control mutex protecting only the creation of rbd
clients, so rename it.

This resolves:
    http://tracker.ceph.com/issues/5222

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:43 -07:00
Alex Elder
1ba0f1e797 rbd: don't hold ctl_mutex to get/put device
When an rbd device is first getting mapped, its device registration
is protected the control mutex.  There is no need to do that though,
because the device has already been assigned an id that's guaranteed
to be unique.

An unmap of an rbd device won't proceed if the device has a non-zero
open count or is already being unmapped.  So there's no need to hold
the control mutex in that case either.

Finally, an rbd device can't be opened if it is being removed, and
it won't go away if there is a non-zero open count.  So here too
there's no need to hold the control mutex while getting or putting a
reference to an rbd device's Linux device structure.

Drop the mutex calls in these cases.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:42 -07:00
Alex Elder
82a442d239 rbd: protect against concurrent unmaps
Make sure two concurrent unmap operations on the same rbd device
won't collide, by only proceeding with the removal and cleanup of a
device if is not already underway.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:41 -07:00
Alex Elder
751cc0e3cf rbd: set removing flag while holding list lock
When unmapping a device, its id is supplied, and that is used to
look up which rbd device should be unmapped.  Looking up the
device involves searching the rbd device list while holding
a spinlock that protects access to that list.

Currently all of this is done under protection of the control lock,
but that protection is going away soon.  To ensure the rbd_dev is
still valid (still on the list) while setting its REMOVING flag, do
so while still holding the list lock.  To do so, get rid of
__rbd_get_dev(), and open code what it did in the one place it
was used.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:41 -07:00
Alex Elder
08f75463c1 rbd: protect against duplicate client creation
If more than one rbd image has the same ceph cluster configuration
(same options, same set of monitors, same keys) they normally share
a single rbd client.

When an image is getting mapped, rbd looks to see if an existing
client can be used, and creates a new one if not.

The lookup and creation are not done under a common lock though, so
mapping two images concurrently could lead to duplicate clients
getting set up needlessly.  This isn't a major problem, but it's
wasteful and different from what's intended.

This patch fixes that by using the control mutex to protect
both the lookup and (if needed) creation of the client.  It
was previously used just when creating.

This resolves:
    http://tracker.ceph.com/issues/3094

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:39 -07:00
Alex Elder
3b5cf2a2f1 rbd: clean up a few things in the refresh path
This includes a few relatively small fixes I found while examining
the code that refreshes image information.

This resolves:
    http://tracker.ceph.com/issues/5040

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:38 -07:00
Alex Elder
e215605417 rbd: flush dcache after zeroing page data
Neither zero_bio_chain() nor zero_pages() contains a call to flush
caches after zeroing a portion of a page.  This can cause problems
on architectures that have caches that allow virtual address
aliasing.

This resolves:
    http://tracker.ceph.com/issues/4777

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:37 -07:00
Linus Torvalds
baa6f82093 Simple warning fix for module sections. If too late to pull, no big deal.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0NO+AAoJENkgDmzRrbjxS0YP/RoI5WguUDVx8tIdS29pwe9L
 Z5U0uIDUBZkNftlvvkdqdGjU1MeejxoMIs6cMz6ScY6YEI7li1AzuNApRadflnMx
 Do9u999NylI6q2oCLrt9y/eliJ/p5hkeCbhDrGdCgB7rPCjEAp/+Eu9Kq7FO8AYr
 Qu0rpLjvJ/5c5lsLAmUn95lZZ/uc/EPHasfZDiR8eGHRoAvPeyJ5qR06MpHFcl33
 2EKwR8RCYP9yUdqWnSKlC4LTuIpNK/BjEpHcQK2UPSn49vR8B3gVz5VXNP/7nIYQ
 hCr3VGoEOBoAlsD38IuKycXQ7vmBTU2hq7bQPq7Z6wyH96Sb6xFmVqT1Dm7uAI4+
 3OU69ugQb7WM3hQjeRRFvn5WjPqu8THwJKU7vhEf+xA09+v09QFjbHoEFabrFDlo
 MqVVhelkT6Mo8bBrxl8sZc/CG+aSl/kLQfyXWIcR2MQEAlfYKOYuDz+5TXtMZHaR
 SwlxgE2wbPP54v27yQWfBWY4d7dtJ+OX25GW1yzJdsNUFg7fQ9zzMnrv1prsUlog
 bxs4P0JcBOMLZx2IWm0R2kmbRFYjPgj6Kr3VnrIhZGesG1cwQB2QNvEPEkCtl6kq
 8HdX8Tua3dpfKAhDxnlztpVcJUmTuLn95jMWjW1KhCcjOqqJbaElq9ubplP2eMtr
 7zlfzFq9RsBKG0+J98Gt
 =vZYj
 -----END PGP SIGNATURE-----
mergetag object b3087e48ce
 type commit
 tag virtio-next-for-linus
 tagger Rusty Russell <rusty@rustcorp.com.au> 1372639977 +0930
 
 Was away, but it's all trivial and been sitting in linux-next.  So if you don't
 pull, no electrons will be harmed.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0NMjAAoJENkgDmzRrbjxf3EQALoIAuQAjiEcHTGkLd/hGMJk
 mDiN14CXNH0siKASlCfcAxudJxKfIRM1tYzMFV1e7DtC7ZAH43m/SYBmXC4hRpIz
 nHsv57xtUg8bGZgb2OZKX/L0rqiE5wxtK7ZWGIPh3aE4uZKif+YvSBNJniO/HRTl
 UVDbrp+sbZzbT8NIHQ8d2vdtfG7rCNRkVaaETxk1wguu2TGJkyt0xuU3zT54We0L
 M8rJuHyTplgEMuDUydmNi9uNPZJZZY7QqoZ7s+ubXELo2FcFeyqWP2oHZnywESok
 fWOfhgdgIgmTED/eXS9wGwB7+K6azKg2chA23QZ91ncX/Ts141sEAd9ezwuQZc+5
 VS/Lo2mRvWbEcG78uYGUz+ukltZv0Og6uOvfSfCrALcVA4pYNFJTjbRu+Io+qAml
 082byeBUT/eurZr7xE30b4aTvZV2bYfqFkVynjF9ugNvTTjMiruw8uAzep6DD1gl
 38XRwtrk6MHLAFuxsyNV1FgunIWB6zvXNjMTUUVDLAQCbZwxZArr9gDd4zUG9VTS
 EbA9TIulSMMLUve5zZQyByqRLlzmzDVghMOZVbYUOQN4uNxbAPoXZ87tpiRhk8Oc
 XW+LRABjwZOjiQ9SDWRxxb8p2WTVyjuwIOTE0pLCuEBw6RFmjGga+PzH9rDygJnF
 zEIbVDRT9Jj7rOtxSabE
 =p6z7
 -----END PGP SIGNATURE-----

Merge tags 'modules-next-for-linus' and 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull trivial module and virtio fixes from Rusty Russell.

Apparently these were meant for 3.10, but came in after the release.

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost.c: Add .text.unlikely to TEXT_SECTIONS

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio: remove virtqueue_add_buf().
  lguest: rename i386_head.S
  virtio_blk: Add missing 'static' qualifiers
  virtio: console: Add emergency writeonly register to config space
  virtio_pci: better macro exported in uapi
2013-07-03 13:09:06 -07:00
Linus Torvalds
0e97456ab5 Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven.

* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/q40: Enable PC parallel port in defconfig
  m68k/q40: Undefine insl/outsl before redefining them
  m68k/uaccess: Fix asm constraints for userspace access
  swim: Release memory region after incorrect return/goto
  m68k/irq: Vector ints need a valid interrupt handler
  m68k/math-emu: unsigned issue, 'unsigned long' will never be less than zero
  m68k: remove CONFIG_EARLY_PRINTK dependency on CONFIG_EMBEDDED, default to n
  m68k/sun3: remove inline marking of EXPORT_SYMBOL functions
  [SCSI] a3000: use module_platform_driver_probe()
  [SCSI] a4000t: use module_platform_driver_probe()
  m68k: Remove inline strcpy() and strcat() implementations
2013-07-03 11:11:23 -07:00
Linus Torvalds
63580e51bb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS patches (part 1) from Al Viro:
 "The major change in this pile is ->readdir() replacement with
  ->iterate(), dealing with ->f_pos races in ->readdir() instances for
  good.

  There's a lot more, but I'd prefer to split the pull request into
  several stages and this is the first obvious cutoff point."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
  [readdir] constify ->actor
  [readdir] ->readdir() is gone
  [readdir] convert ecryptfs
  [readdir] convert coda
  [readdir] convert ocfs2
  [readdir] convert fatfs
  [readdir] convert xfs
  [readdir] convert btrfs
  [readdir] convert hostfs
  [readdir] convert afs
  [readdir] convert ncpfs
  [readdir] convert hfsplus
  [readdir] convert hfs
  [readdir] convert befs
  [readdir] convert cifs
  [readdir] convert freevxfs
  [readdir] convert fuse
  [readdir] convert hpfs
  reiserfs: switch reiserfs_readdir_dentry to inode
  reiserfs: is_privroot_deh() needs only directory inode, actually
  ...
2013-07-02 09:28:37 -07:00
Jens Axboe
5f0e5afa0d Linux 3.10-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
 ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
 DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
 tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
 HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
 pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
 =uUH5
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc7' into for-3.11/drivers

Linux 3.10-rc7

Pull this in early to avoid doing it with the bcache merge,
since there are a number of changes to bcache between my old
base (3.10-rc1) and the new pull request.
2013-07-02 08:31:48 +02:00
Alex Elder
912c317d46 rbd: drop original request earlier for existence check
The reference to the original request dropped at the end of
rbd_img_obj_exists_callback() corresponds to the reference taken
in rbd_img_obj_exists_submit() to account for the stat request
referring to it.  Move the put of that reference up right after
clearing that pointer to make its purpose more obvious.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-01 09:52:02 -07:00
Geert Uytterhoeven
491205a8b4 rbd: Use min_t() to fix comparison of distinct pointer types warning
drivers/block/rbd.c: In function ‘zero_pages’:
drivers/block/rbd.c:1102: warning: comparison of distinct pointer types lacks a cast

Remove the hackish casts and use min_t() to fix this.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-07-01 09:52:01 -07:00
Linus Torvalds
bd2931b5cf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "This is a recently spotted regression in the snapshot behavior...

  It turns out several tests weren't being run in the nightlies so this
  took a while to spot"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: send snapshot context with writes
2013-06-29 10:31:15 -07:00
Al Viro
83a8761142 move linux/loop.h to drivers/block
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:45 +04:00
Philipp Reisner
d752b26960 drbd: Allow online change of al-stripes and al-stripe-size
Allow to change the AL layout with an resize operation. For that
the reisze command gets two new fields: al_stripes and al_stripe_size.

In order to make the operation crash save:
1) Lock out all IO and MD-IO
2) Write the super block with MDF_PRIMARY_IND clear
3) write the bitmap to the new location (all zeros, since
   we allow only while connected)
4) Initialize the new AL-area
5) Write the super block with the restored MDF_PRIMARY_IND.
6) Unfreeze all IO

Since the AL-layout has no influence on the protocol, this operation
needs to be beforemed on both sides of a resource (if intended).

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Philipp Reisner
e96c96333f drbd: Constants should be UPPERCASE
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Philipp Reisner
28e448bb30 drbd: Ignore the exit code of a fence-peer handler if it returns too late
In case the connection was established and lost again before
the a fence-peer handler returns, ignore the exit code of this
instance. (And use the exit code of the later started instance)

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Andreas Gruenbacher
f9eb7bf424 drbd: Fix rcu_read_lock balance on error path
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Wei Yongjun
6110d70bdf drbd: fix error return code in drbd_init()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Andreas Gruenbacher
26ea8f9239 drbd: Do not sleep inside rcu
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-06-28 16:04:36 +02:00
Jens Axboe
f35546e072 Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers
Konrad writes:

It has the 'feature-max-indirect-segments' implemented in both backend
and frontend. The current problem with the backend and frontend is that the
segment size is limited to 11 pages. It means we can at most squeeze in 44kB per
request. The ring can hold 32 (next power of two below 36) requests, meaning we
can do 1.4M of outstanding requests. Nowadays that is not enough.

The problem in the past was addressed in two ways - but neither one went upstream.
The first solution to this proposed by Justin from Spectralogic was to negotiate
the segment size.  This means that the ‘struct blkif_sring_entry’ is now a variable size.
It can expand from 112 bytes (cover 11 pages of data - 44kB) to 1580 bytes
(256 pages of data - so 1MB). It is a simple extension by just making the array in the
request expand from 11 to a variable size negotiated. But it had limits: this extension
still limits the number of segments per request to 255 (as the total number must be
specified in the request, which only has an 8-bit field for that purpose).

The other solution (from Intel - Ronghui) was to create one extra ring that only has the
‘struct blkif_request_segment’ in them. The ‘struct blkif_request’ would be changed to have
an index in said ‘segment ring’. There is only one segment ring. This means that the size of
the initial ring is still the same. The requests would point to the segment and enumerate out
how many of the indexes it wants to use. The limit is of course the size of the segment.
If one assumes a one-page segment this means we can in one request cover ~4MB.

Those patches were posted as RFC and the author never followed up on the ideas on changing
it to be a bit more flexible.

There is yet another mechanism that could be employed  (which these patches implement) - and it
borrows from VirtIO protocol. And that is the ‘indirect descriptors’. This very similar to
what Intel suggests, but with a twist. The twist is to negotiate how many of these
'segment' pages (aka indirect descriptor pages) we want to support (in reality we negotiate
how many entries in the segment we want to cover, and we module the number if it is
bigger than the segment size).

This means that with the existing 36 slots in the ring (single page) we can cover:
32 slots * each blkif_request_indirect covers: 512 * 4096 ~= 64M. Since we ample space
in the blkif_request_indirect to span more than one indirect page, that number (64M)
can be also multiplied by eight = 512MB.

Roger Pau Monne took the idea and implemented them in these patches. They work
great and the corner cases (migration between backends with and without this extension)
work nicely. The backend has a limit right now off how many indirect entries
it can handle: one indirect page, and at maximum 256 entries (out of 512 - so  50% of the page
is used). That comes out to 32 slots * 256 entries in a indirect page * 1 indirect page
per request * 4096 = 32MB.

This is a conservative number that can change in the future. Right now it strikes
a good balance between giving excellent performance, memory usage in the backend, and
balancing the needs of many guests.

In the patchset there is also the split of the blkback structure to be per-VBD.
This means that the spinlock contention we had with many guests trying to do I/O and
all the blkback threads hitting the same lock has been eliminated.

Also there are bug-fixes to deal with oddly sized sectors, insane amounts on
th ring, and also a security fix (posted earlier).
2013-06-28 16:01:14 +02:00