People do not read the README and seem to like to
unselect the crc32c module even though iscsi_tcp selects
it for them. This patch prints an error that tells the user
that they really do need the module. Hopefully, we will
get fewer people asking about this now.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For a while now, the block layer has separated max sectors
and max hw sectors. Software iscsi has no limit so this patch
increases max hw sectors, so we can support large pass through
commands.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the driver's default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iSCSI RFC-specified default).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The return value of crypto_alloc_hash() should be checked by
IS_ERR().
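
As a rough user-space illustration of why a NULL test is not enough here:
the kernel's error-pointer convention encodes a negative errno in the
pointer value itself, so a failed allocation comes back as a non-NULL
pointer. The helpers below are simplified stand-ins for the ones in
include/linux/err.h, and fake_alloc_hash() and setup_digest() are
hypothetical placeholders, not the real crypto_alloc_hash() or driver code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel's err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns an encoded errno on failure, never NULL. */
static void *fake_alloc_hash(int should_fail)
{
	static int tfm;	/* dummy object standing in for a transform */

	return should_fail ? ERR_PTR(-ENOMEM) : (void *)&tfm;
}

/* Returns 0 on success or the negative errno carried in the pointer. */
static int setup_digest(int should_fail, void **out)
{
	void *tfm = fake_alloc_hash(should_fail);

	if (IS_ERR(tfm))	/* a NULL check would miss this failure */
		return (int)PTR_ERR(tfm);
	*out = tfm;
	return 0;
}
```

Note that IS_ERR(NULL) is false in this scheme, which is why the two kinds
of check are not interchangeable.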
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The transition from crypto_digest_*() to the crypto_hash_*() family
introduced a bug into the data digest calculation: crypto_hash_update() is
called with the number of S/G elements instead of the S/G list's data size.
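
To make the distinction concrete, here is a minimal user-space sketch.
struct sg_entry is a hypothetical, simplified stand-in for the kernel's
scatterlist, and sg_total_bytes() is an illustrative helper, not a kernel
function. The point is only that the element count and the byte count are
different quantities, and a hash update needs the byte count.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct scatterlist. */
struct sg_entry {
	const unsigned char *buf;
	unsigned int length;	/* bytes in this element */
};

/* Total payload bytes described by the list: the size a hash update needs. */
static unsigned int sg_total_bytes(const struct sg_entry *sg, unsigned int nents)
{
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i < nents; i++)
		total += sg[i].length;
	return total;
}
```

With two elements of 512 and 100 bytes, the element count is 2 while the
data size is 612, so passing the former would hash only the first 2 bytes.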
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
XMSTATE_SOL_HDR could be set when the xmit thread tests it, but there may
not be anything on the r2tqueue yet. Move the XMSTATE_SOL_HDR set
before the addition to the queue to make sure that when we pull something
off it, it is valid. This does not add locks around the xmstate test or make
that an atomic_t because this is a fast path and if it is set when we test it
we can handle it there without the overhead. Later on we check the xmitqueue
for all requests with the session lock so we will not miss it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Unconditionally free crypto state, as it is always allocated during
TCP connection creation. Without this, crypto structures leak and
crc32c module refcounts grow as connections are created and
destroyed.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch converts ISCSI to use the new crypto_hash interface instead
of crypto_digest. It's a fairly straightforward substitution.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a digest is spread across two network buffers, we currently
ignore this and try to check the digest with the partial buffer.
Of course this fails. This patch uses iscsi_tcp_copy to
copy the whole digest before testing it.
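
The idea can be sketched in user space as follows. This is not the driver's
iscsi_tcp_copy; struct digest_rx and digest_rx_copy() are hypothetical names
used only to show the accumulate-then-verify pattern for a 4-byte digest
that straddles two buffers.

```c
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 4	/* a CRC32C digest is 4 bytes */

/* Hypothetical receive-side state; not the driver's real structure. */
struct digest_rx {
	uint8_t bytes[DIGEST_LEN];
	size_t have;		/* digest bytes collected so far */
};

/*
 * Copy as much of the digest as this buffer holds.
 * Returns 1 once all DIGEST_LEN bytes are present, 0 if more data is needed.
 * *used reports how many bytes of buf were consumed.
 */
static int digest_rx_copy(struct digest_rx *rx, const uint8_t *buf,
			  size_t len, size_t *used)
{
	size_t want = DIGEST_LEN - rx->have;
	size_t n = len < want ? len : want;

	memcpy(rx->bytes + rx->have, buf, n);
	rx->have += n;
	*used = n;
	return rx->have == DIGEST_LEN;
}
```

The caller compares the digest only after the function returns 1; a return
of 0 means the rest of the digest is still in a later network buffer.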
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we relogin to a target, we have not yet negotiated digests
so we must reset the hdr_size var.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch built over the last ones fixes a bug in the partial header
resend code, where we add on another 4 bytes to the send length on the resend.
We want just the header plus digest.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently allocate separate tfms for data and header digests. There
is no reason for this since we can never calculate an rx header digest and
data digest at the same time. Same for sends. So this patch removes the data
tfms and has the send and recv sides use the rx_tfm or tx_tfm.
I also made the connection creation code preallocate the tfms because I
thought I hit a bug where I changed the digest settings during a
relogin but could not allocate the tfm and then we just failed.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iscsi_tcp calculates padding by using the expected transfer length. This
has the problem where if we have immediate data = no and initial R2T =
yes, and the transfer length ended up needing padding then we send:
1. header
2. padding which should have gone after data
3. data
Besides this bug, we also assume the target will always ask for nice
transfer lengths and the first burst length will always be a nice value.
As far as I can tell from the RFC this is not a requirement. It would be
silly to do this, but if someone did it we will end up doing bad things.
Finally the last bug in that bit of code is in our handling of the
recalculation of data digests when we do not send a whole iscsi_buf in
one try. The bug here is that we call crypto_digest_final on an
iscsi_sendpage error, then when we send the rest of the iscsi_buf, we
do iscsi_data_digest_init and this causes the previous data digest to be
lost.
And to make matters worse, some of these bugs are replicated over and
over and over again for immediate data, solicited data and unsolicited
data. So the attached patch made over the iscsi git tree (see
kernel.org/git for details) which I updated today to include the patches
I said I merged, consolidates the sending of data, padding and digests
and calculation of data digests and fixes the above bugs.
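
For reference, the padding rule the consolidated send path has to respect
can be sketched as below: iSCSI data segments are padded to a 4-byte
boundary, and the pad belongs after the data of the segment actually being
sent, so it is derived from that segment's length rather than from the
expected transfer length. The names ISCSI_PAD_LEN and iscsi_pad_bytes() are
illustrative for this sketch and are not taken from the patch.

```c
#include <stddef.h>

#define ISCSI_PAD_LEN 4	/* data segments are padded to a 4-byte boundary */

/* Pad bytes that must follow a data segment of seg_len bytes. */
static unsigned int iscsi_pad_bytes(unsigned int seg_len)
{
	return (ISCSI_PAD_LEN - (seg_len % ISCSI_PAD_LEN)) % ISCSI_PAD_LEN;
}
```

The outer modulo keeps an already aligned segment at zero pad bytes instead
of four.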
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A couple of targets, like String Bean and MDS, send R2Ts with
a data len greater than the max burst we agreed to. We
were being strict in our enforcing of the iscsi rfc in that
code path, but there is no driver limitation that prevents
us from fulfilling the request. To allow those targets
to work we will ignore the max_burst length and send as
much data as the target asks for assuming it has consciously
decided to override its max burst length.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iSCSI RFC states that the first burst length must be smaller than the
max burst length. We currently assume targets will be good, but that may
not be the case, so this patch adds a check.
This patch also moves the unsol data out offset to the lib so the LLDs
do not have to track it.
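
A minimal sketch of such a check is shown below. It assumes the rule is
that the first burst length may not exceed the max burst length, so equal
values are accepted; struct burst_params and check_burst_params() are
hypothetical names for illustration, not the ones used in the lib.

```c
#include <errno.h>

/* Hypothetical holder for the two negotiated values. */
struct burst_params {
	unsigned int first_burst;
	unsigned int max_burst;
};

/* Returns 0 if the pair is usable, -EINVAL if the target sent a bad pair. */
static int check_burst_params(const struct burst_params *p)
{
	if (p->first_burst > p->max_burst)
		return -EINVAL;
	return 0;
}
```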
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The version info is useful for iscsi tcp, iser and qla4xxx so move to
transport class.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Must pass ISCSI_ERR values from the recv path and propagate them
upwards.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently try to allocate a max_recv_data_segment_length
which can be very large (default is 64K), and common uses
are up to 1MB. It is very difficult to allocate this
much contiguous memory and it turns out we never even use it.
We really only need a couple of pages, so this patch has us
allocate just what we know we need today.
Later if vendors start adding vendor specific data and
we need to handle large buffers we can do this, but for
the last 4 years we have not seen anyone do this or request
it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we enter recovery and flush the running commands
we cannot free the connection before flushing the commands.
Some commands may have a reference to the connection
that needs to be released first. iscsi_stop was forcing
the term and suspend too early and was causing an oops
in iser, so this patch removes those callbacks altogether
and allows the LLD to handle that detail.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If iscsi_data_rsp fails we must bail out. Since the pdu values like
data length are invalid we cannot continue to process the data since
it could overrun buffers.
This fixes a bug with cisco 5428s where that target is sending
too much data.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The iscsi tcp code can pluck multiple R2Ts from the task's r2tqueue
in the xmit code. This can result in the task being queued on the xmit queue
but getting completed at the same time.
This patch fixes the above bug by making the fifo a list so
we always remove the entry on the list del.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In the xmit path we are sending a -EXXX value to iscsi_conn_failure
which is causing userspace to get confused.
We should be sending an ISCSI_ERR_* value that userspace understands.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Convert iscsi_tcp to new lib functions.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We can race and misset the suspend bit if iscsi_write_space is
called then iscsi_send returns with a failure indicating
there is no space.
To handle this, this patch returns an error upwards allowing xmitworker
to decide if we need to try and transmit again. For the no
write space case xmitworker will not retry, and instead
let iscsi_write_space queue it back up if needed (this relies
on the work queue code to properly requeue us if needed).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If recovery failed or we are in recovery only overwrite the state
if we are going to terminate the session or if we logged back in.
STOP_CONN_SUSPEND and conn_cnt are not used. We only support
a single connection session ATM, so clean up that code while
we are working around it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Discovered by steven@hayter.me.uk and patch by michaelc@cs.wisc.edu
The dtask mempool is reserving 261120 items per session! Since we are now
sending headers with sendmsg there is no reason for the mempool and that
was causing us to use crazy amounts of mem. We can preallocate a header in
the r2t and task struct and reuse them.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Zhen and ported by Mike:
Don't use sendpage for the headers. sendpage for the pdu headers
does not seem to have a performance impact, makes life harder
for multiple data pdus to be in flight and still trips up some
network cards when it is from slab mem.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by wrwhitehead@novell.com
patch and analysis by fujita.tomonori@lab.ntt.co.jp
Only tcp_read_sock and recv_actor (iscsi_tcp_data_recv for us) see
desc.count. It is used just for permitting tcp_read_sock to read
the portion of data in the socket.
When iscsi_tcp_data_recv sees a partial header, it sets
desc.count. However, it is possible that the next skb (containing the
rest of the header) still does not come. So I'm not sure that this
scheme is completely correct.
Ideally, we should use the exact length of the data in the socket for
desc.count. However, it is not so simple (see SIOCINQ in
tcp_ioctl). So I think that iscsi_tcp_data_recv can just stop playing
with desc.count and tell tcp_read_sock to read all the skbs. As
proposed already, if iscsi_tcp_data_ready sets desc.count to
non-zero, tcp_read_sock does that.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by Ming and Rohan:
The problem Ming and Rohan debugged was that during a normal session
login, open-iscsi is not incrementing the exp_statsn counter. It was
stuck at zero. From the RFC, it looks like if the login response PDU has
a successful status then we should be incrementing that value. Also from
the RFC, it looks like when we drop a connection then reconnect, we
should be using the exp_statsn from the old connection in the next
relogin attempt.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
align printk output
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
add transport end point callbacks so iscsi drivers that cannot connect
from userspace using sockets, as iscsi tcp does, do not have to
implement their own socket infrastructure.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This just converts iscsi_tcp to the lib
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current iscsi_tcp eh is not nicely setup for dm-multipath
and performs some extra task management functions when they
are not needed.
The attached patch:
- Fixes the TMF issues. If a session is rebuilt
then we do not send aborts.
- Fixes the problem where if the host reset fired, we would
return SUCCESS even though we had not really done anything
yet. This ends up causing problems with scsi_error.c's TUR.
- If someone has turned on the userspace nop daemon code to try
and detect network problems before the scsi command timeout
we can now drop and clean up the session before the scsi command
times out and fires the eh, speeding up the time it takes for a
command to go from one path to another. For network problems
we fail the command with DID_BUS_BUSY so if failfast is set
scsi_decide_disposition fails the command up to dm for it to
try on another path.
- And we had to add some basic iscsi session block code. Previously
if we were trying to repair a session we would return an MLQUEUE code
in the queuecommand. This worked but it was not the most efficient
or pretty thing to do since it would take a while to relogin
to the target. Because a lot of the iscsi error handler for
iscsi_tcp/open-iscsi is in userspace, the block code is pretty bare. We will be
adding to that for qla4xxx.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For iscsi boot when going from initramfs to the real root we
need to stop the userspace iscsi daemon. To later restart it
iscsid needs to be able to rebuild itself and part of that
process is matching a session running in the kernel with the
iscsid representation. To do this the attached patch
adds several required iscsi values. If the LLD does not provide
them because login is done in userspace, then the transport
class and userspace set this up for the LLD.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
from hare@suse.de and michaelc@cs.wisc.edu
hw iscsi like qla4xxx does not allocate a host per session and
for userspace it is difficult to restart iscsid using the
"iscsi handles" for the session and connection, so this
patch just has the class or userspace allocate the id for
the session and connection.
Note: this breaks userspace and requires users to upgrade to the newest
open-iscsi tools. Sorry about this but open-iscsi is still too new to
say we have a stable user-kernel api and we were not good enough
designers to know that other hw iscsi drivers and iscsid itself would
need such changes. Actually we sorta did but at the time we did not
have the HW available to us so we could only guess.
Luckily, the only tools hooking into the class are the open-iscsi ones;
other tools like iscsistart hook into the open-iscsi engine from
userspace, and programs like anaconda call our tools, so they are not affected.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure. However, that is not what happens. The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap(). Cache_reap()
is run every few seconds independently of memory pressure.
Could we remove the whole thing? It's only used by three slabs anyway and
I cannot find a reason for having this option.
There is an additional problem with SLAB_NO_REAP. If set then the recovery
of objects from alien caches is switched off. Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk. (Strangely __cache_shrink does not
check for SLAB_NO_REAP)
Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From ogerlitz@voltaire.com:
mgmtpool should be freed in the immdata_alloc_fail label.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From erezz@voltaire.com:
We are still in ISCSI_STATE_FREE state at create time. The addition
of the first connection puts us in ISCSI_STATE_LOGGED_IN.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From erezz@voltaire.com:
rm conn->lock since it is not used anymore. The dataqueue is protected
by the session lock and xmitmutex.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From:
michaelc@cs.wisc.edu, fujita.tomonori@lab.ntt.co.jp, da-x@monatomic.org
and err path fixup from:
ogerlitz@voltaire.com
This patch cleans up that interface by having the lld and class
pass an iscsi_cls_session or iscsi_cls_conn between each other when
the function is used by HW and SW iscsi llds. This way the lld
does not have to remember if it has to send a handle or pointer
and a handle or pointer to connection, session or host.
This also has the class verify the session handle that gets passed from
userspace instead of using the pointer passed into the kernel directly.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: FUJITA Tomonori <tomof@acm.org> and zhenyu.z.wang@intel.com:
We cannot handle filesystems like XFS because of the pages they
are sending us. We had thought page_count could be used to
work around this, but the correct test is for PageSlab.
The proper solution is to figure out what type of pages
filesystems can use so we do not have to add tests like
this or handle it in the block layer for all network block drivers
but the issue still has not been resolved on fs-devel
so we are sending this patch as a temporary fix.
This is the last patch just in case it is NAKed with the explanation
that we need to push the correct fix through fs-devel, mm
or the block layer. The rest of the patchset can live without
the patch, but the driver will not work with filesystems like
XFS.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we run the xmit code from queuecommand the stack trace
gets too deep. The patch runs the xmit code from the scsi_host
work queue. This fixes 4k stack and xfs support and should
fix the st and sg stack usage bugs.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is the second version of the patch to address Christoph's comments.
Instead of doing the lib, I just kept everything in scsi_transport_iscsi.c
like the FC and SPI class. This was because the driver model and sysfs
class is tied to the session and connection setup so separating did not
buy very much at this time.
The reason for this patch was because HW iscsi LLDs like qla4xxx cannot
use the iscsi class because the scsi_host was tied to the interface and
class code. This patch just separates the session from scsi host so
that LLDs that allocate the host per some resource like pci device
can still use the class.
This also fixes a couple refcount bugs that can be triggered
when users have a sysfs file open, close the session, then
read or write to the file.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>