sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.
So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
A bit too eager - we definitely need to clear the sg table
initially, so that we don't accidentally have ->page & 0x01
true and think that is a chain pointer.
This reverts commit f5c0dde4c6.
This reverts sg segment size ifdefs that the current code has in order
to provide a way to reduce sgpool memory consumption.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This option is true if a low-level driver can support sg
chaining. This will be removed eventually when all the drivers are
converted to support sg chaining. q->max_phys_segments is set to
SCSI_MAX_SG_SEGMENTS if false.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is what enables large commands. If we need to allocate an
sgtable that doesn't fit in a single page, allocate several
SCSI_MAX_SG_SEGMENTS sized tables and chain them together.
SCSI defaults to large chained sg tables, if the arch supports it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Just pass in the command, no point in passing in the scatterlist
and scatterlist pool index seperately.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts the SCSI mid layer to using the sg helpers for looking up
sg elements, instead of doing it manually.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The ULD ->done callback moves into the scsi_driver. By moving the call
to scsi_io_completion() from scsi_blk_pc_done() to scsi_finish_command(),
we can eliminate the latter entirely. By returning 'good_bytes' from
the ->done callback (rather than invoking scsi_io_completion()), we can
stop exporting scsi_io_completion().
Also move the prototypes from sd.h to sd.c as they're all internal anyway.
Rename sd_rw_intr to sd_done and rw_intr to sr_done.
Inspired-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Because scsi_print_sense_hdr prefixes with KERN_INFO, the output from
scsi_io_completion looks like:
sd 0:0:0:0: [sdb] Device not ready: <6>: Sense Key : 0x2 [current]
: ASC=0x4 ASCQ=0x3
By using scsi_show_sense_hdr, we can get the much more appealing output:
sd 0:0:0:0: [sdb] Device not ready: Sense Key : 0x2 [current]
sd 0:0:0:0: [sdb] Device not ready: ASC=0x4 ASCQ=0x3
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
One of the intents of the block prep function was to allow ULDs to use
it for preprocessing. The original SCSI model was to have a single prep
function and add a pointer indirect filter to build the necessary
commands. This patch reverses that, does away with the init_command
field of the scsi_driver structure and makes ULDs attach directly to the
prep function instead. The value is really that it allows us to begin
to separate the ULDs from the SCSI mid layer (as long as they don't use
any core functions---which is hard at the moment---a ULD doesn't even
need SCSI to bind).
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A BUSY status returned on a write request results in a stale residual
being returned when the write ultimately successfully completes.
This can be reproduced as follows:
1) issue immediate mode rewind to scsi tape drive
2) issue write request
The tape drive returns busy. The low level driver detects underrun and
sets the residual into the scsi command. The low level driver responds
with (DID_OK << 16) | scsi_status. scsi_status is 8, hence
status_byte(result) == 4, i.e., BUSY.
scsi_softirq_done() calls scsi_decide_disposition() which returns
ADD_TO_MLQUEUE. scsi_softirq_done() then calls scsi_queue_insert()
which, on the way to resubmitting the request to the driver, calls
scsi_init_cmd_errh().
The attached patch modifies scsi_init_cmd_errh() to clear the resid
field. This prevents a "stale" residual from being returned when the
scsi command finally completes without a BUSY status.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sg's may have setup a the buffer with a different length than
the transfer length so we should be using the bufflen passed
in as the request's data len.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
ll_back_merge_fn is currently exported to SCSI where is it used,
together with blk_rq_bio_prep, in exactly the same way these
functions are used in __blk_rq_map_user.
So move the common code into a new function (blk_rq_append_bio), and
don't export ll_back_merge_fn any longer.
Signed-off-by: Neil Brown <neilb@suse.de>
diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Our current implementation has a generic set of barrier functions that
go through the SCSI driver model. Realistically, this is unnecessary,
because the only device that can use barriers (sd) can set the flush
functions up at probe or revalidate time. This patch pulls the barrier
functions out of the mid layer and scsi driver model and relocates them
directly in sd.
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.
There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, lets
scale it down to 2 just to be on the safe side.
This patch saves ~150kb of pinned kernel memory on a 32-bit box.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Some targets can return both valid data and sense information.
Always update the request data_len from the SCSI command residual.
Callers should interpret sense data to determine what parts of the
data are valid in case of a CHECK CONDITION status.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch enhances SCSI error printing by:
- Making use of scsi_print_result() in the completion functions.
- Having scmd_printk() output the disk name (when applicable).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (97 commits)
[SCSI] zfcp: removed wrong comment
[SCSI] zfcp: use of uninitialized variable
[SCSI] zfcp: Invalid locking order
[SCSI] aic79xx: use dma_get_required_mask()
[SCSI] aic79xx: fix bracket mismatch in unused macro
[SCSI] BusLogic: Replace 'boolean' by 'bool'
[SCSI] advansys: clean up warnings
[SCSI] 53c7xx: brackets fix in uncompiled code
[SCSI] nsp_cs: remove old scsi code
[SCSI] aic79xx: make ahd_match_scb() static
[SCSI] DAC960: kmalloc->kzalloc/Casting cleanups
[SCSI] scsi_kmap_atomic_sg(): check that local irqs are disabled
[SCSI] Buslogic: local_irq_disable() is redundant after local_irq_save()
[SCSI] aic94xx: update for v28 firmware
[SCSI] scsi_error: Fix lost EH commands
[SCSI] aic94xx: Add default bus reset handler
[SCSI] aic94xx: Remove TMF result code munging
[SCSI] libsas: Add an LU reset mechanism to the error handler
[SCSI] libsas: Don't BUG when connecting two expanders via wide port
[SCSI] st: fix Tape dies if wrong block size used, bug 7919
...
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The KM_BIO_SRC_IRQ kmap slot must be taken with local irqs disabled. Add a
check into scsi for this.
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_retry_command only has a single caller, so there is no point
in having this function. Additionally the memset of the sense
buffer it does is entirely superflous as scsi_request_fn already
calls scsi_init_cmd_errh to perform this memset before the command
is reissued.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have full flexibility of merging parameters now, so we can remove the
hooks that define back/front/request merge strategies. Nobody is using
them anymore.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's a file system thing, for block requests the only size used in the
io paths is ->data_len as it is in bytes, not sectors.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the needed changes to the scsi-ml for the target
mode support.
Note, per the last review we moved almost all the fields we added
to the scsi_cmnd to our internal data structure which we are going
to try and kill off when we can replace it with support from other
parts of the kernel.
The one field we left on was the offset variable. This is needed to handle
the case where the target gets request that is so large that it cannot
execute it in one dma operation. So max_secotors or a segment limit may
limit the size of the transfer. In this case our tgt core code will
break up the command into managable transfers and send them to the
LLD one at a time. The offset is then used to tell the LLD where in
the command we are at. Is there another field on the scsi_cmd for
that?
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
ATAPI devices transfer fixed number of bytes for CDBs (12 or 16). Some
ATAPI devices choke when shorter CDB is used and the left bytes contain
garbage. Block SG_IO cleared left bytes but SCSI SG_IO didn't. This patch
makes SCSI SG_IO clear it and simplify CDB clearing in block SG_IO.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Mathieu Fluhr <mfluhr@nero.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Douglas Gilbert <dougg@torque.net>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I wanted to add some BUG checks to scsi_prep_fn to make sure no one
sends us a non-sg command, but this function is a horrible mess.
So I decided to detangle the function and document what the valid
cases are. While doing that I found that REQ_TYPE_SPECIAL commands
aren't used by the SCSI layer anymore and we can get rid of the code
handling them.
The new structure of scsi_prep_fn is:
(1) check if we're allowed to send this command
(2) big switch on cmd_type. For the two valid types call into
a function to set the command up, else error
(3) code to handle error cases
Because FS and BLOCK_PC commands are handled entirely separate after
the patch this introduces a tiny amount of code duplication. This
improves readabiulity though and will help to avoid the bidi command
overhead for FS commands so it's a good thing.
I've tested this on both sata and mptsas.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In scsi_execute_async()'s error path, a struct scsi_io_context
allocated with kmem_cache_alloc() is kfree()'d. Obviously
kmem_cache_free() should be used instead.
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Right now ->flags is a bit of a mess: some are request types, and
others are just modifiers. Clean this up by splitting it into
->cmd_type and ->cmd_flags. This allows introduction of generic
Linux block message types, useful for sending generic Linux commands
to block devices.
Signed-off-by: Jens Axboe <axboe@suse.de>
Attached is a patch that should limit a possible recursion that can
lead to a stack overflow like follows:
Kernel stack overflow.
CPU: 3 Not tainted
Process zfcperp0.0.d819
(pid: 13897, task: 000000003e0d8cc8, ksp: 000000003499dbb8)
Krnl PSW : 0404000180000000 000000000030f8b2 (get_device+0x12/0x48)
Krnl GPRS: 00000000135a1980 000000000030f758 000000003ed6c1e8 0000000000000005
0000000000000000 000000000044a780 000000003dbf7000 0000000034e15800
000000003621c048 070000003499c108 000000003499c1a0 000000003ed6c000
0000000040895000 00000000408ab630 000000003499c0a0 000000003499c0a0
Krnl Code: a7 fb ff e8 a7 19 00 00 b9 02 00 22 e3 e0 f0 98 00 24 a7 84
Call Trace:
([<000000004089edc2>] scsi_request_fn+0x13e/0x650 [scsi_mod])
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
...
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod]
[<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod]
[<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod]
[<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4
[<000000004089fa9e>] scsi_run_host_queues+0x196/0x230 [scsi_mod]
[<00000000409eba28>] zfcp_erp_thread+0x2638/0x3080 [zfcp]
[<0000000000107462>] kernel_thread_starter+0x6/0xc
[<000000000010745c>] kernel_thread_starter+0x0/0xc
<0>Kernel panic - not syncing: Corrupt kernel stack, can't continue.
This stack overflow occurred during tests on s390 using zfcp.
Recursion depth for this panic was 19.
Usually recursion between blk_run_queue and a request_fn is avoided
using QUEUE_FLAG_REENTER. But this does not help if the scsi stack
tries to flush the starved_list of a scsi_host.
Limit recursion depth when flushing the starved_list
of a scsi_host.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently struct scsi_cmnd has various fields that are used to backup
original data after the corresponding fields have been overridden for
EH commands. This means drivers can easily get at it and misuse it.
Due to the old_ naming this doesn't happen for most of them, but two
that have different names have been used wrong a lot (see previous
patch). Another downside is that they unessecarily bloat the scsi_cmnd
size.
This patch moves them onstack in scsi_send_eh_cmnd to fix those two
issues aswell as allowing future EH fixes like moving the EH command
submissions to use SG lists like everything else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There was a logic fault in scsi_io_completion() where zero transfer
commands that complete successfully were sent to the block layer as
not up to date. This patch removes the if (good_bytes > 0) gate
around the successful completion, since zero transfer commands do have
good_bytes == 0.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If a device gets offlined as a result of the Inquiry sent
during scanning, the following oops can occur. After the
disk gets put into the SDEV_OFFLINE state, the error handler
sends back the failed inquiry, which wakes the thread doing
the scan. This starts a race between the scanning thread
freeing the scsi device and the error handler calling
scsi_run_host_queues to restart the host. Since the disk
is in the SDEV_OFFLINE state, scsi_device_get will still
work, which results in __scsi_iterate_devices getting
a reference to the scsi disk when it shouldn't.
The following execution thread causes the oops:
CPU 0 (scan) CPU 1 (eh)
---------------------------------------------------------
scsi_probe_and_add_lun
....
scsi_eh_offline_sdevs
scsi_eh_flush_done_q
scsi_destroy_sdev
scsi_device_dev_release
scsi_restart_operations
scsi_run_host_queues
__scsi_iterate_devices
get_device
scsi_device_dev_release_usercontext
scsi_run_queue
<---OOPS--->
The patch fixes this by changing the state of the sdev to SDEV_DEL
before doing the final put_device, which should prevent the race
from occurring.
Original oops follows:
Badness in kref_get at lib/kref.c:32
Call Trace:
[C00000002F4476D0] [C00000000000EE20] .show_stack+0x68/0x1b0 (unreliable)
[C00000002F447770] [C00000000037515C] .program_check_exception+0x1cc/0x5a8
[C00000002F447840] [C00000000000446C] program_check_common+0xec/0x100
Exception: 700 at .kref_get+0x10/0x28
LR = .kobject_get+0x20/0x3c
[C00000002F447B30] [C00000002F447BC0] 0xc00000002f447bc0 (unreliable)
[C00000002F447BB0] [C000000000254BDC] .get_device+0x20/0x3c
[C00000002F447C30] [D000000000063188] .scsi_device_get+0x34/0xdc [scsi_mod]
[C00000002F447CC0] [D0000000000633EC] .__scsi_iterate_devices+0x50/0xbc [scsi_mod]
[C00000002F447D60] [D00000000006A910] .scsi_run_host_queues+0x34/0x5c [scsi_mod]
[C00000002F447DF0] [D000000000069054] .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[C00000002F447EE0] [C00000000007B4E0] .kthread+0x128/0x178
[C00000002F447F90] [C000000000025E84] .kernel_thread+0x4c/0x68
Unable to handle kernel paging request for <7>PCI: Enabling device: (0002:41:01.1), cmd 143
data at address 0x000001b8
Faulting instruction address: 0xd0000000000698e4
sym1: <1010-66> rev 0x1 at pci 0002:41:01.1 irq 216
sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi2 : sym-2.2.2
cpu 0x0: Vector: 300 (Data Access) at [c00000002f447a30]
pc: d0000000000698e4: .scsi_run_queue+0x2c/0x218 [scsi_mod]
lr: d00000000006a904: .scsi_run_host_queues+0x28/0x5c [scsi_mod]
sp: c00000002f447cb0
msr: 9000000000009032
dar: 1b8
dsisr: 40000000
current = 0xc0000000045fecd0
paca = 0xc00000000048ee80
pid = 1123, comm = scsi_eh_1
enter ? for help
[c00000002f447d60] d00000000006a904 .scsi_run_host_queues+0x28/0x5c [scsi_mod]
[c00000002f447df0] d000000000069054 .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
[c00000002f447ee0] c00000000007b4e0 .kthread+0x128/0x178
[c00000002f447f90] c000000000025e84 .kernel_thread+0x4c/0x68
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We have to be able to remove SCSI devices even when they are suspended, so
QUIESCE -> CANCEL must be a legal state transition. This patch (as727)
adds the transition to the state machine.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch simplifies "good_bytes" computation in sd_rw_intr().
sd: "good_bytes" computation is always done in terms of the resolution
of the device's medium, since after that it is the number of good bytes
we pass around and other layers/contexts (as opposed ot sd) can translate
that to their own resolution (block layer:512). It also makes
scsi_io_completion() processing more straightforward, eliminating the
3rd argument to the function.
It also fixes a couple of bugs like not checking return value,
using "break" instead of "return;", etc.
I've been running with this patch for some time now on a
test (do-it-all) system.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>