cfq's add_req_fn callback may invoke q->request_fn directly and
depending on low-level driver used and timing, a queued request may be
finished & deallocated before add_req_fn callback returns. So,
__elv_add_request must not access rq after it's passed to add_req_fn
callback.
This patch moves rq_mergeable test above add_req_fn(). This may
result in q->last_merge pointing to REQ_NOMERGE request if add_req_fn
callback sets it but as RQ_NOMERGE is checked again when blk layer
actually tries to merge requests, this does not cause any problem.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).
Signed-off-by: Jens Axboe <axboe@suse.de>
Right now we do it at queueing time, which works alright for reads
(since they are usually sync), but not for async writes since we can
queue io a lot faster than we can complete it. This makes the vmstat
output look extremely bursty.
Signed-off-by: Jens Axboe <axboe@suse.de>
Tejun Heo notes:
"I'm currently debugging this. The problem is that we are using the
generic dispatch queue directly in the noop sched and merging is NOT
allowed on dispatch queues but generic handling of last_merge tries
to merge requests. I'm still trying to verify this, so I'll be back
with results soon."
In the meantime, disable merging for noop by setting REQ_NOMERGE in
elevator_noop_add_request().
Eventually, we should add a noop_list and do the dispatching like in the
other io schedulers. Merging is still beneficial for noop (and it has
always done it).
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't clear ->elevator_data on exit, if we are switching queues we are
overwriting the data of the new io scheduler.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the requested I/O scheduler is already in place, elevator_switch simply
leaves the queue alone, and returns. However, it forgets to call
elevator_put, so
'echo [current_sched] > /sys/block/[dev]/queue/scheduler'
will leak a reference, causing the current_sched module to be permanently
pinned in memory.
Signed-off-by: Nate Diller <nate@namesys.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a kconfig submenu to select the default I/O scheduler, in case
anticipatory is not compiled in or another default is preferred. Also,
since no-op is always available, we should use it whenever the selected
default is not.
Signed-off-by: Nate Diller <nate@namesys.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The last patch from Jens Axboe for drivers/block/paride/pf.c introduced
pf_end_request() which sets pf_req to NULL.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're trying to get rid of as much as possible tasklist walks, or at
least moving them to core code. This patch falls into the second
category.
Instead of walking the tasklist in cfq-iosched move that into
elv_unregister. The added benefit is that with this change the as
ioscheduler might be might unloadable more easily aswell.
The new code uses read_lock instead of read_lock_irq because the
tasklist_lock only needs irq disabling for writers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert everyone who uses platform_bus_type to include
linux/platform_device.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
as-iosched deals with aliased requests differently from other ioscheds.
It links together aliased requests using rq->queuelist instead of
spilling alises to dispatch queue like other ioscheds do. Requests
linked in this way cannot be merged.
Unfortunately, generic q->last_merge handling patch didn't take this
into account and q->last_merge could be set to an aliased request
resulting in Badness, corrupt list and eventually panic.
This explicitly marks aliased requests to be unmergeable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When building on a 64-bit platform, gcc produces a warning
"cast of a pointer to an integer of a different size".
The scatterlist.offset on the LHS is unsigned int, so I used
that originally.
Signed-off-by: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
drivers/block/ub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A "coldplug + udevstart" can be simple like this:
for i in /sys/block/*/*/uevent; do echo 1 > $i; done
for i in /sys/class/*/*/uevent; do echo 1 > $i; done
for i in /sys/bus/*/devices/*/uevent; do echo 1 > $i; done
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use get_unaligned for possibly-unaligned multi-byte accesses to the
ATA device identify response buffer.
- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which doesn't have REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have its REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
its elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS. New implementation is much simpler and main code
paths are less cluttered, IMHO.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch kills max_back_kb handling from elv_dispatch_sort() and
kills max_back_kb field from struct request_queue.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Currently, both generic elevator code and specific ioscheds
participate in the management and usage of last_merge. This
and the following patches move last_merge handling into
generic elevator code.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch updates all four ioscheds to use generic dispatch
queue. There's one behavior change in as-iosched.
* In as-iosched, when force dispatching
(ELEVATOR_INSERT_BACK), batch_data_dir is reset to REQ_SYNC
and changed_batch and new_batch are cleared to zero. This
prevernts AS from doing incorrect update_write_batch after
the forced dispatched requests are finished.
* In cfq-iosched, cfqd->rq_in_driver currently counts the
number of activated (removed) requests to determine
whether queue-kicking is needed and cfq_max_depth has been
reached. With generic dispatch queue, I think counting
the number of dispatched requests would be more appropriate.
* cfq_max_depth can be lowered to 1 again.
Original from Tejun Heo, modified version applied.
Signed-off-by: Jens Axboe <axboe@suse.de>
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch removes try_module_get race in elevator_find.
try_module_get should always be called with the spinlock protecting
what the module init/cleanup routines register/unregister to held. In
the case of elevators, we should be holding elv_list to avoid it going
away between spin_unlock_irq and try_module_get.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Just set the name field directly in the device_driver structure
contained in the vio_driver struct.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>