weiping zhang [Fri, 18 Aug 2017 16:37:20 +0000 (00:37 +0800)]
block, bfq: fix error handle in bfq_init
If elv_register fails, bfq_pool should be freed.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
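The bfq_init fix above follows the usual unwind-on-error pattern. A minimal userspace sketch of that pattern; pool_create(), pool_destroy(), register_sched() and sched_init() are hypothetical stand-ins (for the pool setup, its teardown, elv_register() and bfq_init()), not the actual bfq code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the bfq source. */
static void *pool;                      /* models bfq_pool */
static int register_should_fail;        /* knob to force the failure path */

static int pool_create(void)
{
	assert(pool == NULL);           /* must not be set up twice */
	pool = malloc(64);
	return pool ? 0 : -1;
}

static void pool_destroy(void)
{
	free(pool);
	pool = NULL;
}

static int register_sched(void)         /* models elv_register() */
{
	return register_should_fail ? -1 : 0;
}

/* Every failure after pool_create() has to undo pool_create(). */
static int sched_init(void)
{
	int ret;

	ret = pool_create();
	if (ret)
		return ret;

	ret = register_sched();
	if (ret)
		goto err_pool;          /* without this jump the pool leaks */

	return 0;

err_pool:
	pool_destroy();
	return ret;
}
```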
Christoph Hellwig [Wed, 23 Aug 2017 17:10:32 +0000 (19:10 +0200)]
block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
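As a rough picture of the bi_bdev replacement described above, the sketch below carries a disk pointer plus a partition index in the I/O descriptor and remaps the sector before the driver sees it. All type, field and function names here are simplified, hypothetical stand-ins; this is not the kernel's struct bio or generic_make_request.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTS 4                     /* hypothetical limit for the sketch */

struct part {
	uint64_t start_sect;
	uint64_t nr_sects;
};

struct disk {                           /* models struct gendisk */
	struct part parts[MAX_PARTS];   /* index 0 is the whole disk */
};

struct io {                             /* models struct bio */
	struct disk *disk;              /* replaces the block_device pointer */
	uint8_t partno;                 /* partition index used for remapping */
	uint64_t sector;
};

/*
 * Translate a partition-relative sector into a whole-disk sector.
 * Returns 0 on success, -1 if the index or sector is out of range.
 */
static int remap_partition(struct io *io)
{
	const struct part *p;

	assert(io && io->disk);
	if (io->partno == 0)
		return 0;               /* already relative to the whole disk */
	if (io->partno >= MAX_PARTS)
		return -1;

	p = &io->disk->parts[io->partno];
	if (io->sector >= p->nr_sects)
		return -1;

	io->sector += p->start_sect;
	io->partno = 0;                 /* after remapping, only index 0 is left */
	return 0;
}
```

Because the remap happens before the driver is called, a driver that looks up the partition start afterwards always sees 0, which is the observation behind the raid5 cleanup further down in this log.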
Christoph Hellwig [Wed, 23 Aug 2017 17:10:31 +0000 (19:10 +0200)]
block: cache the partition index in struct block_device
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Aug 2017 17:10:30 +0000 (19:10 +0200)]
block: add a __disk_get_part helper
This helper allows looking up a partition under RCU protection without
grabbing a reference to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Aug 2017 17:10:29 +0000 (19:10 +0200)]
block: reject attempts to allocate more than DISK_MAX_PARTS partitions
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Aug 2017 17:10:28 +0000 (19:10 +0200)]
raid5: remove a call to get_start_sect
The block layer always remaps partitions before calling into the
->make_request methods of drivers. Thus the call to get_start_sect in
in_chunk_boundary will always return 0 and can be removed.
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Aug 2017 17:10:27 +0000 (19:10 +0200)]
btrfs: index check-integrity state hash by a dev_t
We won't have the struct block_device available in the bio soon, so switch
to the numerical dev_t instead of the block_device pointer for looking up
the check-integrity state.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 23 Aug 2017 17:56:33 +0000 (10:56 -0700)]
skd: Change default interrupt mode to MSI-X
Since MSI support on some motherboards is unreliable, change the
default interrupt mode from MSI to MSI-X. This patch avoids that
the following message appears sporadically in the kernel logs of
my test setup:
do_IRQ: 3.193 No irq handler for vector
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 23 Aug 2017 17:56:32 +0000 (10:56 -0700)]
skd: Avoid double completions in case of a timeout
Avoid that normal request completion and the timeout handler can
run concurrently by calling blk_mq_complete_request() instead of
blk_mq_end_request() from skd_end_request(). Avoid that the block
layer can reuse a request while the firmware is still processing
it. Convert skd_softirq_done() to blk-mq. Pass the pointer to
skd_softirq_done() to the block layer core through
blk_mq_ops.complete instead of by calling blk_queue_softirq_done().
Pass the pointer to skd_timed_out() to the block layer core
through blk_mq_ops.timeout instead of by calling
blk_queue_timed_out(). The timeout handler has been tested as
follows:
echo 1 > /sys/block/skd0/io-timeout-fail &&
(cd /sys/kernel/debug/fail_io_timeout &&
echo 100 > probability &&
echo N > task-filter &&
echo 1 > times)
Fixes: a74d5b76fab9 ("skd: Switch to block layer timeout mechanism")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
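One way to picture why the double-completion fix above matters: if the normal completion path and the timeout handler both have to win a single atomic claim before finishing a request, only one of them ever does. The sketch below illustrates that idea with C11 atomics; it is an assumption-laden model with hypothetical names, not how the block layer implements blk_mq_complete_request().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
	atomic_flag completed;          /* set by whoever finishes the request */
	int finished_by;                /* 1 = completion path, 2 = timeout path */
};

/* Returns true only for the first caller; later callers must back off. */
static bool claim_request(struct req *rq)
{
	assert(rq != NULL);
	return !atomic_flag_test_and_set(&rq->completed);
}

static bool complete_from_irq(struct req *rq)
{
	if (!claim_request(rq))
		return false;           /* the timeout handler got there first */
	rq->finished_by = 1;
	return true;
}

static bool complete_from_timeout(struct req *rq)
{
	if (!claim_request(rq))
		return false;           /* the request already completed normally */
	rq->finished_by = 2;
	return true;
}
```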
Bart Van Assche [Wed, 23 Aug 2017 17:56:31 +0000 (10:56 -0700)]
skd: Inline skd_process_request()
This patch does not change any functionality but makes the skd
driver code more similar to that of other blk-mq kernel drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 23 Aug 2017 17:56:30 +0000 (10:56 -0700)]
skd: Report completion mismatches once
This patch removes one debug statement but otherwise does not change
any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 23 Aug 2017 17:56:29 +0000 (10:56 -0700)]
block: Warn if blk_queue_rq_timed_out() is called for a blk-mq queue
The timeout handler set by blk_queue_rq_timed_out() is only used
in single queue mode. Calling this function for blk-mq drivers is
wrong. Hence issue a warning if this function is called by a blk-mq
driver.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Mon, 14 Aug 2017 22:05:00 +0000 (15:05 -0700)]
nullb: badbblocks support
Sometimes a disk has broken tracks whose data is inaccessible, while
data in other parts of the disk can still be accessed normally. MD RAID
supports such disks, but we don't have a good way to test this because we
can't control which part of a physical disk is bad. For a virtual disk,
this can be easily controlled.
This patch adds a new 'badblock' attribute. Configure it this way:
echo "+1-100" > xxx/badblock
marks sectors [1-100] as bad blocks, and
echo "-20-30" > xxx/badblock
marks sectors [20-30] as good again.
If bad blocks are accessed, the nullb disk returns an I/O error. Other
parts of the disk can be accessed normally.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
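The "+S-E" / "-S-E" syntax of the badblock attribute above can be modeled in a few lines. This is a userspace sketch under stated assumptions: a fixed hypothetical disk size, a plain boolean array rather than whatever the driver uses internally, and made-up function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_SECTORS 256                  /* hypothetical disk size for the sketch */

static bool bad[NR_SECTORS];

/* Parse "+S-E" (mark bad) or "-S-E" (mark good); both ends are inclusive. */
static int badblock_store(const char *cmd)
{
	unsigned int start, end, s;
	char op;

	assert(cmd != NULL);
	if (sscanf(cmd, "%c%u-%u", &op, &start, &end) != 3)
		return -1;
	if ((op != '+' && op != '-') || start > end || end >= NR_SECTORS)
		return -1;

	for (s = start; s <= end; s++)
		bad[s] = (op == '+');
	return 0;
}

/* An I/O fails if any sector in [sector, sector + nr) is marked bad. */
static int submit_io(unsigned int sector, unsigned int nr)
{
	unsigned int s;

	for (s = sector; s < sector + nr && s < NR_SECTORS; s++)
		if (bad[s])
			return -5;      /* -EIO */
	return 0;
}
```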
Shaohua Li [Mon, 14 Aug 2017 22:04:59 +0000 (15:04 -0700)]
nullb: emulate cache
Software must flush the disk cache to guarantee data safety. To check
whether software flushes the disk cache correctly, we must know the
behavior of the disk. But physical disk behavior is uncontrollable: even
if software doesn't issue the flush, the disk probably flushes anyway.
This patch tries to emulate a cache in the test disk.
All writes go to a cache first; when the cache is full, we flush some
data to disk storage. A flush request flushes all data in the cache to
disk storage. A FUA write writes to the memory store directly and
revalidates the data in the cache. If there is a power failure (triggered
by writing to the power attribute, 'echo 0 > disk_name/power'), we discard
all data in the cache but preserve the data in disk storage. Later we can
power on the disk again as usual (write 1 to the 'power' attribute) and
then check data integrity to verify whether software does everything
correctly.
A new attribute 'cache_size' (in MB) is added to configure the cache size.
Based on original patch from Kyungchan Koh
Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
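The cache semantics described above (write-back cache, flush, FUA, power failure) can be sketched as a small userspace model. Everything here is hypothetical: the sizes, the names, the eviction choice, and the assumption that a FUA write simply drops any cached copy of the sector so that later reads see the stored value.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SECTORS 16
#define CACHE_SECTORS 4                 /* models 'cache_size' */

static int store[NR_SECTORS];           /* persistent "disk storage" */
static int cache[NR_SECTORS];           /* volatile write-back cache */
static bool cached[NR_SECTORS];
static int nr_cached;

static void writeback_one(int s)
{
	store[s] = cache[s];
	cached[s] = false;
	nr_cached--;
}

/* Flush request: everything in the cache reaches disk storage. */
static void disk_flush(void)
{
	for (int s = 0; s < NR_SECTORS; s++)
		if (cached[s])
			writeback_one(s);
}

static void disk_write(int s, int val, bool fua)
{
	assert(s >= 0 && s < NR_SECTORS);
	if (fua) {
		/* FUA bypasses the cache; drop any stale cached copy. */
		if (cached[s]) {
			cached[s] = false;
			nr_cached--;
		}
		store[s] = val;
		return;
	}
	if (!cached[s] && nr_cached == CACHE_SECTORS) {
		/* Cache full: write back the lowest-numbered cached sector. */
		for (int v = 0; v < NR_SECTORS; v++)
			if (cached[v]) {
				writeback_one(v);
				break;
			}
	}
	if (!cached[s]) {
		cached[s] = true;
		nr_cached++;
	}
	cache[s] = val;
}

static int disk_read(int s)
{
	assert(s >= 0 && s < NR_SECTORS);
	return cached[s] ? cache[s] : store[s];
}

/* Power failure: cached data is lost, disk storage survives. */
static void disk_power_off(void)
{
	for (int s = 0; s < NR_SECTORS; s++)
		cached[s] = false;
	nr_cached = 0;
}
```

A test built on such a model writes data, optionally flushes, cuts power, and then checks which writes survived; only flushed or FUA writes are guaranteed to.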
Shaohua Li [Mon, 14 Aug 2017 22:04:58 +0000 (15:04 -0700)]
nullb: bandwidth control
In tests, we usually want a controllable disk speed. For example, in a
RAID array, we'd like some disks to be fast and some to be slow. MD RAID
actually has a feature for this. To test the feature, we'd like to make
the disk run at a specific speed.
Block throttling could probably be used for this purpose, but it requires
cgroup setup. Here we just implement a simple throttling mechanism in
the driver. There is slight fluctuation in the mechanism, but it's good
enough for testing.
To configure the bandwidth cap, the user sets the 'mbps' attribute, which
is in MB/s.
Based on original patch from Kyungchan Koh
Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
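A simple in-driver throttle of the kind described above can be approximated with a byte budget that a periodic timer refills. The sketch below is one possible shape, not the driver's actual code: the tick rate, the names, and the rule that a request is held when the remaining budget is too small are all assumptions. It also assumes that no single request is larger than one tick's budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_SEC 50                /* hypothetical refill rate */

struct throttle {
	int64_t per_tick;               /* bytes granted on every tick */
	int64_t budget;                 /* bytes left in the current tick */
};

static void throttle_init(struct throttle *t, unsigned int mbps)
{
	t->per_tick = (int64_t)mbps * 1024 * 1024 / TICKS_PER_SEC;
	t->budget = t->per_tick;
}

/* Called from a periodic timer: refill the budget. */
static void throttle_tick(struct throttle *t)
{
	t->budget = t->per_tick;
}

/*
 * Returns true if the request may be dispatched now. A false return means
 * the caller should hold the request until the next tick.
 */
static bool throttle_admit(struct throttle *t, int64_t bytes)
{
	assert(bytes >= 0);
	if (t->budget < bytes)
		return false;
	t->budget -= bytes;
	return true;
}
```

Granting the budget in per-tick chunks is what produces the slight fluctuation: throughput is exact when averaged over many ticks but bursty within one.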
Shaohua Li [Mon, 14 Aug 2017 22:04:57 +0000 (15:04 -0700)]
nullb: support discard
Discard makes sense for a memory-backed disk. It is also useful for testing
whether the upper layer supports discard correctly.
The user configures the 'discard' attribute to enable/disable discard support.
Based on original patch from Kyungchan Koh
Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Mon, 14 Aug 2017 22:04:56 +0000 (15:04 -0700)]
nullb: support memory backed store
This adds a memory-backed store to nullb.
The user configures the 'memory_backed' attribute for this. By default, a
nullb disk doesn't use a memory-backed store.
Based on original patch from Kyungchan Koh
Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Mon, 14 Aug 2017 22:04:55 +0000 (15:04 -0700)]
nullb: use ida to manage index
We now create disks dynamically. Manage the disk index with ida to
avoid bumping up the index too much.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
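The point of an ID allocator over a plain counter, as in the ida change above, is that freed indexes get reused instead of the index growing forever. A toy lowest-free-ID allocator shows the behavior; the kernel's ida is a far more scalable structure, and the names and the fixed cap below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 64                      /* hypothetical cap for the sketch */

static bool id_used[MAX_IDS];

/* Return the smallest unused id, or -1 if all are taken. */
static int id_alloc(void)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release an id so that a later id_alloc() can hand it out again. */
static void id_free(int id)
{
	assert(id >= 0 && id < MAX_IDS && id_used[id]);
	id_used[id] = false;
}
```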
Shaohua Li [Mon, 14 Aug 2017 22:04:54 +0000 (15:04 -0700)]
nullb: add interface to power on disk
The device created through the nullb configfs interface isn't powered on by
default. After configuring the device, the user can do 'echo 1 >
xxx/nullb/device_name/power' to power on the device, which creates a
disk. xxx/nullb/device_name/index is the disk index, so if the index
is 2, the newly created disk is named /dev/nullb2. Note that
'index' is only valid after the disk is powered on.
'echo 0 > xxx/nullb/device_name/power' removes the disk. Note that this
doesn't remove the device. To remove the device, the user should do 'rmdir
xxx/nullb/device_name'. Removing the device removes the disk too.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Mon, 14 Aug 2017 22:04:53 +0000 (15:04 -0700)]
nullb: add configfs interface
Add a configfs interface for nullb. The configfs interface is more flexible
and makes it easy to configure devices on a per-disk basis.
Configuration is something like this:
mount -t configfs none /mnt
Checking which features the driver supports:
cat /mnt/nullb/features
The 'features' attribute is for future extension. We will probably add
new features to the driver; userspace can check this attribute to find
the supported features.
Create/remove a device:
mkdir/rmdir /mnt/nullb/a
Then configure the device by setting attributes under /mnt/nullb/a. Most
of the module parameters supported by nullb are converted to attributes:
size; /* device size in MB */
completion_nsec; /* time in ns to complete a request */
submit_queues; /* number of submission queues */
home_node; /* home node for the device */
queue_mode; /* block interface */
blocksize; /* block size */
irqmode; /* IRQ completion handler */
hw_queue_depth; /* queue depth */
use_lightnvm; /* register as a LightNVM device */
blocking; /* blocking blk-mq device */
use_per_node_hctx; /* use per-node allocation for hardware context */
Note that creating a device doesn't create a disk immediately. Creating a
disk is done in two phases: create a device and then power on the
device. The next patch will introduce device power on.
Based on original patch from Kyungchan Koh
Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Mon, 14 Aug 2017 22:04:52 +0000 (15:04 -0700)]
nullb: factor disk parameters
When we switch to the configfs interface, each disk can have a different
configuration. To prepare for the change, we move most disk settings to a
separate data structure. The existing module parameter interface is
kept. 'nr_devices' and 'shared_tags' don't make sense as per-disk
settings, so they remain global settings.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan Carpenter [Wed, 23 Aug 2017 11:20:57 +0000 (14:20 +0300)]
skd: error pointer dereference in skd_cons_disk()
My initial impulse was to check for IS_ERR_OR_NULL() but when I looked
at this code a bit more closely, we should only need to check for
IS_ERR().
The blk_mq_alloc_tag_set() returns negative error codes and zero on
success so we can just do an "if (rc) goto err_out;". It's better to
preserve the error code anyhow. The blk_mq_init_queue() returns error
pointers on failure, it never returns NULL. We can also remove the
"q = NULL;" at the start because that's no longer needed.
Fixes: ca33dd92968b ("skd: Convert to blk-mq")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
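The distinction drawn in the skd_cons_disk() fix above, error pointers versus NULL, is easy to see in a userspace re-creation of the encoding. The helpers below only imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() for illustration, and queue_create() is a hypothetical function that, like blk_mq_init_queue() as described above, never returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a valid pointer or an error pointer. */
static void *queue_create(bool fail)
{
	static int queue;

	return fail ? err_ptr(-12 /* -ENOMEM */) : (void *)&queue;
}
```

A caller that only tests for NULL would treat the error pointer as a valid queue and dereference it, which is the bug class the commit title refers to.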
Dan Carpenter [Wed, 23 Aug 2017 10:44:20 +0000 (13:44 +0300)]
skd: Uninitialized variable in skd_isr_completion_posted()
Someone got too aggressive about removing initializations and
accidentally removed the "rc = 0;" which is required.
Fixes: c830da8cbc7b ("skd: Remove superfluous initializations from skd_isr_completion_posted()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Aug 2017 15:22:28 +0000 (08:22 -0700)]
skd: Remove driver version information
Remove the driver version information because this information
is not useful in an upstream kernel driver.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:38 +0000 (13:13 -0700)]
skd: Bump driver version
Bump the driver version. Remove the build ID because build IDs do
not make sense for an upstream kernel driver. Keep the driver
version in the module information but do not report it during every
load, unload or probe.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:37 +0000 (13:13 -0700)]
skd: Optimize locking
Only take skdev->lock if necessary.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:36 +0000 (13:13 -0700)]
skd: Remove several local variables
This patch does not change any functionality but makes the code
more brief.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:35 +0000 (13:13 -0700)]
skd: Reduce memory usage
Every single coherent DMA memory buffer occupies at least one page.
Reduce memory usage by switching from coherent buffers to streaming
DMA for I/O requests (struct skd_fitmsg_context) and S/G-lists
(struct fit_sg_descriptor[]).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:34 +0000 (13:13 -0700)]
skd: Remove skd_device.in_flight
Since skd_device.in_flight is only used to display the number of
in-flight requests in debug messages, remove that member and
introduce skd_in_flight(). That last function relies on the block
layer to determine the number of in flight requests.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:33 +0000 (13:13 -0700)]
skd: Switch to block layer timeout mechanism
Remove the timeout slot variables and rely on the block layer to
detect request timeouts.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:32 +0000 (13:13 -0700)]
skd: Convert to blk-mq
Introduce a tag set and a blk_mq_ops structure. Set .cmd_size such
that struct request and struct skd_request_context are allocated
through a single allocation. Remove the skd_request_context.req
pointer. Make queue starting asynchronous such that this can occur
safely from interrupt context. Use locking to protect skdev->skmsg
and *skdev->skmsg against concurrent access from concurrent
.queue_rq() calls. Introduce the functions skd_init_request() and
skd_exit_request() to set up / clean up the per-request S/G-list.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
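The effect of setting .cmd_size, as in the blk-mq conversion above, is that the driver's per-request context lives in the same allocation as the request, directly behind it. A userspace sketch of that layout follows; the struct contents and function names are hypothetical, and the kernel helpers it imitates are blk_mq_rq_to_pdu() and blk_mq_rq_from_pdu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct request {                        /* stand-in for the block layer's struct */
	int tag;
};

struct drv_ctx {                        /* stand-in for skd_request_context */
	int n_sg;
	int state;
};

/* Allocate the request and cmd_size bytes of driver data in one block. */
static struct request *rq_alloc(size_t cmd_size)
{
	return calloc(1, sizeof(struct request) + cmd_size);
}

/* The driver data starts right after the request. */
static void *rq_to_pdu(struct request *rq)
{
	assert(rq != NULL);
	return rq + 1;
}

/* And back again: step over the request header in the other direction. */
static struct request *rq_from_pdu(void *pdu)
{
	return (struct request *)((char *)pdu - sizeof(struct request));
}
```

With this layout no back pointer from the context to the request is needed, which is consistent with the removal of skd_request_context.req mentioned above.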
Bart Van Assche [Thu, 17 Aug 2017 20:13:31 +0000 (13:13 -0700)]
skd: Coalesce struct request and struct skd_request_context
Set request_queue.cmd_size, introduce skd_init_rq() and skd_exit_rq()
and remove skd_device.skreq_table.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:30 +0000 (13:13 -0700)]
skd: Move skd_free_sg_list() up
Issue a warning if a NULL argument is passed to skd_free_sg_list().
Move this function up to make the blk-mq conversion patch easier
to read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:29 +0000 (13:13 -0700)]
skd: Split skd_recover_requests()
This patch does not change any functionality but makes the blk-mq
conversion patch easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:28 +0000 (13:13 -0700)]
skd: Introduce skd_process_request()
The only functional change in this patch is that the skd_fitmsg_context
in which requests are accumulated is changed from a local variable into
a member of struct skd_device. This patch will make the blk-mq conversion
easier.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:27 +0000 (13:13 -0700)]
skd: Convert several per-device scalar variables into atomics
Convert the per-device scalar variables that are protected by the
queue lock into atomics such that it becomes safe to access these
variables without holding the queue lock.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:26 +0000 (13:13 -0700)]
skd: Enable request tags for the block layer queue
Use the request tag when allocating a skd_fitmsg_context or
skd_request_context such that the lists used to track free elements
can be eliminated. Swap the skd_end_request() and skd_release_req()
calls to avoid triggering a use-after-free. Remove
skd_fitmsg_context.state and .outstanding because FIT messages are
shared among requests and because updating a FIT message after a
request has finished would trigger a use-after-free.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:25 +0000 (13:13 -0700)]
skd: Initialize skd_special_context.req.n_sg to one
The debug code in skd_send_special_fitmsg() assumes that req.n_sg
represents the number of S/G descriptors. However, skd_construct()
initializes that member variable to zero. Set req.n_sg to one such
that the debugging code in skd_send_special_fitmsg() works as
expected.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:24 +0000 (13:13 -0700)]
skd: Remove dead code
Removing the SG IO code also removed the code that sets
SKD_REQ_STATE_ABORTED. Hence also remove the code that checks for
this state.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:23 +0000 (13:13 -0700)]
skd: Remove SG IO support
The skd SG IO support duplicates the functionality of the bsg driver.
Hence remove it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:22 +0000 (13:13 -0700)]
skd: Convert explicit skd_request_fn() calls
This will make it easier to convert this driver to the blk-mq
approach. This patch also reduces interrupt latency by moving
skd_request_fn() calls out of the skd_isr() interrupt.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:21 +0000 (13:13 -0700)]
skd: Rework request failing code path
Move the skd_fail_all_pending() call out of skd_request_fn_not_online()
such that this function can be reused in the blk-mq code path.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:20 +0000 (13:13 -0700)]
skd: Move a function definition
This patch does not change any functionality but makes the next
patch in this series easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:19 +0000 (13:13 -0700)]
skb: Use symbolic names for SCSI opcodes
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:18 +0000 (13:13 -0700)]
skd: Use kcalloc() instead of kzalloc() with multiply
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
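Although the kcalloc() patch above is described as not changing functionality, the usual reason to prefer an array allocator over an open-coded multiplication is that the product n * size can wrap around. A userspace sketch of the check, using a hypothetical array_zalloc() helper built on calloc():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Zeroed array allocation that refuses n * size products that overflow. */
static void *array_zalloc(size_t n, size_t size)
{
	assert(size > 0);
	if (n > SIZE_MAX / size)
		return NULL;            /* n * size would wrap around */
	return calloc(1, n * size);
}
```

With a plain multiply, a wrapped product yields a buffer that is much smaller than the caller believes; with the check, the allocation fails cleanly instead.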
Bart Van Assche [Thu, 17 Aug 2017 20:13:17 +0000 (13:13 -0700)]
skd: Remove superfluous occurrences of the 'volatile' keyword
mem_map[i] is accessed through readl() / writel() hence declaring
mem_map as volatile is not necessary.
Remove the volatile declarations from struct fit_completion_entry_v1
pointers and struct fit_comp_error_info since reading these structures
multiple times is safe.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:16 +0000 (13:13 -0700)]
skd: Remove a redundant init_timer() call
Since setup_timer() invokes init_timer(), invoking init_timer()
just before setup_timer() is redundant. Hence remove the init_timer()
call.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:15 +0000 (13:13 -0700)]
skd: Use for_each_sg()
This change makes skd_preop_sg_list() support chained sg-lists.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
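Why an iterator matters for chained sg-lists, as in the for_each_sg() change above: in a chained list the entries are not one contiguous array, so walking with plain array indexing runs into the link entry and past the end of the first array. The model below is a simplified, hypothetical layout (the kernel's struct scatterlist encodes chain and end markers differently) that shows the traversal idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sg_entry {
	unsigned int len;               /* payload length; unused for chain links */
	struct sg_entry *chain;         /* non-NULL: list continues in another array */
	bool last;                      /* marks the final payload entry */
};

/* Step to the next payload entry, following a chain link if there is one. */
static struct sg_entry *sg_next_entry(struct sg_entry *sg)
{
	if (sg->last)
		return NULL;
	sg++;
	if (sg->chain)
		sg = sg->chain;
	return sg;
}

/* Sum the lengths of nents payload entries, crossing array boundaries. */
static unsigned int sg_total_len(struct sg_entry *sgl, unsigned int nents)
{
	unsigned int total = 0;
	struct sg_entry *sg = sgl;

	assert(sgl != NULL);
	for (unsigned int i = 0; i < nents && sg; i++, sg = sg_next_entry(sg))
		total += sg->len;
	return total;
}
```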
Bart Van Assche [Thu, 17 Aug 2017 20:13:14 +0000 (13:13 -0700)]
skd: Drop second argument of skd_recover_requests()
Since all callers pass zero as second argument to skd_recover_requests(),
drop that second argument.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:13 +0000 (13:13 -0700)]
skd: Remove superfluous initializations from skd_isr_completion_posted()
The value of skcmp, cmp_cntxt etc. is overwritten during every
loop iteration and is not used after the loop has finished. Hence
initializing these variables outside the loop is not necessary.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:12 +0000 (13:13 -0700)]
skd: Simplify the code for handling data direction
Use DMA_FROM_DEVICE and DMA_TO_DEVICE directly instead of
introducing driver-private constants with the same numerical
value.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:11 +0000 (13:13 -0700)]
skd: Use ARRAY_SIZE() where appropriate
Use ARRAY_SIZE() instead of open-coding it. This patch does not
change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
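For readers unfamiliar with the macro named in the ARRAY_SIZE() patch above, here is a simplified userspace form of it together with a typical use. The kernel version additionally rejects non-array arguments at build time; the table and function below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form of the macro; valid only for true arrays, not pointers. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const char *const state_names[] = { "IDLE", "BUSY", "ONLINE" };

static_assert(ARRAY_SIZE(state_names) > 0, "table must not be empty");

static const char *state_name(size_t state)
{
	/* The bound follows the table automatically when entries are added. */
	return state < ARRAY_SIZE(state_names) ? state_names[state] : "UNKNOWN";
}
```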
Bart Van Assche [Thu, 17 Aug 2017 20:13:10 +0000 (13:13 -0700)]
skd: Make the skd_isr() code more brief
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:09 +0000 (13:13 -0700)]
skd: Use __packed only when needed
Since needless use of __packed slows down access to data structures,
only use __packed when needed.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
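The __packed patch above and the build-time size checks in the patch below it fit together: if a structure is already naturally aligned, dropping the packed attribute leaves its layout unchanged, and a compile-time assertion can prove that. The structures below are hypothetical examples, not the skd FIT structures, and static_assert stands in for the kernel's BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Naturally aligned fields: the compiler adds no padding. */
struct msg_hdr {
	uint8_t  protocol_id;
	uint8_t  num_entries;
	uint16_t flags;
	uint32_t length;
};

/* Same fields declared with the GCC/Clang packed attribute. */
struct msg_hdr_packed {
	uint8_t  protocol_id;
	uint8_t  num_entries;
	uint16_t flags;
	uint32_t length;
} __attribute__((packed));

/* Here packing is genuinely needed: a 32-bit field at offset 1. */
struct odd_packed {
	uint8_t  tag;
	uint32_t value;
} __attribute__((packed));

/* Build-time checks: compilation fails if a layout changes. */
static_assert(sizeof(struct msg_hdr) == 8, "msg_hdr must stay 8 bytes");
static_assert(sizeof(struct msg_hdr) == sizeof(struct msg_hdr_packed),
	      "dropping packed must not change the layout");
static_assert(sizeof(struct odd_packed) == 5, "odd_packed is 5 bytes");
```

The cost of needless packing comes from the reduced alignment: the compiler can no longer assume that multi-byte fields are aligned and, on some architectures, has to access them byte by byte.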
Bart Van Assche [Thu, 17 Aug 2017 20:13:08 +0000 (13:13 -0700)]
skd: Check structure sizes at build time
This patch will help to verify the changes made by the next patch.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:07 +0000 (13:13 -0700)]
skd: Use a structure instead of hardcoding structure offsets
This change makes the source code easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:06 +0000 (13:13 -0700)]
skd: Simplify the code for allocating DMA message buffers
dma_alloc_coherent() guarantees alignment on a page boundary so
no explicit alignment is needed to align on a 64 byte boundary.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:05 +0000 (13:13 -0700)]
skd: Simplify the code for deciding whether or not to send a FIT msg
Due to the previous patch it is guaranteed that the FIT msg contains
at least one request after the for-loop has finished. Use this to
simplify the code.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:04 +0000 (13:13 -0700)]
skd: Reorder the code in skd_process_request()
Prepare the S/G-list before allocating a FIT msg such that the FIT
msg always contains at least one request after the for-loop is
finished.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:03 +0000 (13:13 -0700)]
skd: Fix size argument in skd_free_skcomp()
Pass the correct size to pci_free_consistent() in skd_free_skcomp().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:02 +0000 (13:13 -0700)]
skd: Introduce SKD_SKCOMP_SIZE
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:01 +0000 (13:13 -0700)]
skd: Introduce the symbolic constant SKD_MAX_REQ_PER_MSG
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:13:00 +0000 (13:13 -0700)]
skd: Document locking assumptions
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:59 +0000 (13:12 -0700)]
skd: Fix endianness annotations
Ensure that sparse does not report any warnings when building the
skd driver with sparse verification enabled (C=1 or C=2).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:58 +0000 (13:12 -0700)]
skd: Switch from the pr_*() to the dev_*() logging functions
Use dev_err() and dev_info() instead of pr_err() and pr_info().
Since dev_dbg() is able to report file name and line number
information, remove __FILE__ and __LINE__ from the dev_dbg() calls.
Remove the struct skd_device members and the function (skd_name())
that became superfluous due to these changes.
This patch removes the device name and serial number from log
statements. An example of the old log line format:
(skd0:STM000196603:[0000:00:09.0]): Driver state STARTING(3)=>ONLINE(4)
An example of the new log line format:
skd:0000:00:09.0: Driver state STARTING(3)=>ONLINE(4)
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:57 +0000 (13:12 -0700)]
skd: Remove useless barrier() calls
The purpose of barrier() is to prevent reordering by the compiler.
Since the compiler does not reorder calls to non-pure functions,
remove the barrier() calls from skd_reg_{read,write}{32,64}().
Since pr_debug() is able to report file name and line number
information, remove __FILE__ and __LINE__ from the pr_debug() calls.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:56 +0000 (13:12 -0700)]
skd: Remove a set-but-not-used variable from struct skd_device
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:55 +0000 (13:12 -0700)]
skd: Remove set-but-not-used local variables
These variables have been detected by building with W=1. Declare
'acc' as __maybe_unused because most access_ok() implementations
ignore their first argument. This patch does not change any
functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:54 +0000 (13:12 -0700)]
skd: Fix a function name in a comment
There is no function skd_completion_posted_isr() in the skd driver
but there is a function called skd_isr_completion_posted(). Fix
the function name in the comment.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:53 +0000 (13:12 -0700)]
skd: Fix spelling in a source code comment
Change "ptimal" into "optimal" and remove the misleading reference
to sysfs.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:52 +0000 (13:12 -0700)]
skd: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
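For readers unfamiliar with the warning addressed by the commit above: gcc 7
introduced -Wimplicit-fallthrough (part of -Wextra), which flags a case label
that is reached by falling off the end of the previous case. One common way to
silence it is a fall-through comment that gcc recognizes. Whether this patch
used exactly that form is an assumption, and the enum and function below are
hypothetical names that do not come from the skd driver.

```c
#include <assert.h>

/* Hypothetical states for illustration only; not the skd driver's states. */
enum demo_state { DEMO_STARTING, DEMO_ONLINE, DEMO_BUSY };

/* Count the steps that still run when entering the switch at state 's'. */
int demo_steps_remaining(enum demo_state s)
{
	int steps = 0;

	switch (s) {
	case DEMO_STARTING:
		steps++;
		/* fall through */
	case DEMO_ONLINE:
		steps++;
		/* fall through */
	case DEMO_BUSY:
		steps++;
		break;
	default:
		/* Unknown state: treat as a programming error. */
		assert(0 && "unknown state");
	}
	return steps;
}
```

Each comment sits directly before the next case label, which is where gcc
looks for it. Removing one of them should bring the warning back when
building with -Wextra.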
Bart Van Assche [Thu, 17 Aug 2017 20:12:51 +0000 (13:12 -0700)]
skd: Remove unnecessary blank lines
This patch does not change any functionality but makes the skd
driver source code more uniform with that of other kernel drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:50 +0000 (13:12 -0700)]
skd: Remove ESXi code
Since the code guarded by #ifdef SKD_VMK_POLL_HANDLER / #endif
is never built on Linux systems, remove it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:49 +0000 (13:12 -0700)]
skd: Remove unneeded #include directives
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:48 +0000 (13:12 -0700)]
skd: Update maintainer information
E-mails sent to support@stec-inc.com bounce. Hence remove that
e-mail address from the driver. Add an entry to the MAINTAINERS
file instead.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:47 +0000 (13:12 -0700)]
skd: Switch to GPLv2
This change does not affect any skd driver version derived from a
dual licensed code base but makes all code derived from future
upstream skd driver versions GPLv2 only.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:46 +0000 (13:12 -0700)]
skd: Submit requests to firmware before triggering the doorbell
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:45 +0000 (13:12 -0700)]
skd: Avoid that module unloading triggers a use-after-free
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
dump_stack+0x63/0x84
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
refcount_sub_and_test+0x70/0x80
refcount_dec_and_test+0x11/0x20
kobject_put+0x1f/0x50
blk_put_queue+0x15/0x20
disk_release+0xae/0xf0
device_release+0x32/0x90
kobject_release+0x67/0x170
kobject_put+0x2b/0x50
put_disk+0x17/0x20
skd_destruct+0x5c/0x890 [skd]
skd_pci_probe+0x124d/0x13a0 [skd]
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x19e/0x470
worker_thread+0x1dc/0x4a0
kthread+0x125/0x140
ret_from_fork+0x25/0x30
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
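The ordering issue described in the commit above can be modelled in user
space. This is only a sketch with hypothetical demo_* names and a plain
integer in place of the kernel's refcount_t. It assumes the driver has
already dropped its own queue reference by the time it calls put_disk(),
which the commit message implies but does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for request_queue and gendisk (hypothetical). */
struct demo_queue { int refs; };
struct demo_disk { struct demo_queue *queue; };

/* Models blk_put_queue(): drop one reference. */
static void demo_put_queue(struct demo_queue *q)
{
	q->refs--;
}

/* Models put_disk() -> disk_release(): puts the queue only if it is set. */
static void demo_put_disk(struct demo_disk *d)
{
	if (d->queue)
		demo_put_queue(d->queue);
}

/*
 * Tear down a disk. The driver drops its own queue reference first
 * (assumption), then optionally clears disk->queue before put_disk().
 * Returns the final reference count; a negative value is an underflow.
 */
int demo_teardown(struct demo_disk *d, int clear_queue_first)
{
	struct demo_queue *q;

	assert(d && d->queue);
	q = d->queue;
	demo_put_queue(q);
	if (clear_queue_first)
		d->queue = NULL;
	demo_put_disk(d);
	return q->refs;
}
```

With clear_queue_first set, the count ends at zero. Without it, the release
path drops a reference that is no longer held and the count goes negative,
which corresponds to the refcount underflow in the trace above.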
Bart Van Assche [Thu, 17 Aug 2017 20:12:44 +0000 (13:12 -0700)]
block: Relax a check in blk_start_queue()
Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:
WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
skd_unquiesce_dev+0x12a/0x1d0 [skd]
skd_complete_internal+0x1e7/0x5a0 [skd]
skd_complete_other+0xc2/0xd0 [skd]
skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
skd_isr+0x14f/0x180 [skd]
irq_forced_thread_fn+0x2a/0x70
irq_thread+0x144/0x1a0
kthread+0x125/0x140
ret_from_fork+0x2a/0x40
Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:11 +0000 (16:23 -0700)]
xen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:10 +0000 (16:23 -0700)]
xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:09 +0000 (16:23 -0700)]
xen-blkback: Fix indentation
Avoid that smatch reports the following warning when building with
C=2 CHECK="smatch -p=kernel":
drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:08 +0000 (16:23 -0700)]
virtio_blk: Use blk_rq_is_scsi()
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:07 +0000 (16:23 -0700)]
ide-floppy: Use blk_rq_is_scsi()
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:06 +0000 (16:23 -0700)]
genhd: Annotate all part and part_tbl pointer dereferences
Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with
rcu_dereference_protected(). This patch does not change the behavior
of the modified code but ensures that sparse does not complain about
disk->part_tbl manipulations nor about part_tbl->part accesses.
Additionally, improve documentation of the locking requirements of
the modified functions.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:04 +0000 (16:23 -0700)]
blk-mq-debugfs: Declare a local symbol static
This was detected by sparse.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:03 +0000 (16:23 -0700)]
blk-mq: Make blk_mq_reinit_tagset() calls easier to read
Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
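The interface change described in the commit above, passing the callback as
an argument instead of storing it in an ops structure, can be sketched as
follows. All demo_* names and the callback signature are hypothetical and do
not reproduce the real blk_mq_reinit_tagset() prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Toy request; the real struct request is far larger. */
struct demo_request { int reinit_count; };

/* Callback type passed directly by the caller (hypothetical signature). */
typedef int (*demo_reinit_fn)(void *data, struct demo_request *rq);

/*
 * Reinitialize every request in a tag set by calling 'reinit' on it.
 * Stops at the first error and returns it; returns 0 on success or
 * when no callback is given.
 */
int demo_reinit_tagset(struct demo_request *rqs, size_t nr, void *data,
		       demo_reinit_fn reinit)
{
	size_t i;

	assert(rqs || nr == 0);
	if (!reinit)
		return 0;
	for (i = 0; i < nr; i++) {
		int ret = reinit(data, &rqs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: bump a per-request and a caller-owned counter. */
int demo_count_reinit(void *data, struct demo_request *rq)
{
	int *total = data;

	rq->reinit_count++;
	(*total)++;
	return 0;
}
```

A call such as demo_reinit_tagset(rqs, nr, &total, demo_count_reinit) names
the callback at the call site, which is the readability gain the commit
message refers to.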
Bart Van Assche [Thu, 17 Aug 2017 23:23:01 +0000 (16:23 -0700)]
block: Unexport blk_queue_end_tag()
This function is only used inside the block layer core. Hence
unexport it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:00 +0000 (16:23 -0700)]
block: Fix two comments that refer to .queue_rq() return values
Since patch "blk-mq: switch .queue_rq return value to blk_status_t"
.queue_rq() returns a BLK_STS_* value instead of a BLK_MQ_RQ_*
value. Hence refer to the former in comments about .queue_rq()
return values.
Fixes: commit 39a70c76b89b ("blk-mq: clarify dispatch may not be drained/blocked by stopping queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Mon, 14 Aug 2017 18:56:16 +0000 (18:56 +0000)]
nbd: change the default nbd partitions
There's no reason to have partitions disabled for nbd by default. It costs us
nothing to have them enabled, and leaving them disabled is just
confusing/obnoxious to users who try to use partitions with nbd.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Mon, 14 Aug 2017 18:25:33 +0000 (18:25 +0000)]
nbd: allow device creation at a specific index
If users really want to use a particular index for their nbd device and it
doesn't already exist there's no reason we can't just create it for them. Do
this instead of erroring out.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anton Volkov [Mon, 7 Aug 2017 12:37:50 +0000 (15:37 +0300)]
loop: fix to a race condition due to the early registration of device
The early device registration made possible a race leading to allocations
of disks with wrong minors.
This patch moves the device registration further down the loop_init
function to make the race infeasible.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ritesh Harjani [Wed, 9 Aug 2017 12:58:32 +0000 (18:28 +0530)]
cfq: Give a chance for arming slice idle timer in case of group_idle
In the scenario below, blkio cgroups do not work as per their assigned
weights:
1. The underlying device is nonrotational with a single HW queue
with a depth of >= CFQ_HW_QUEUE_MIN.
2. The use case forms two blkio cgroups, cg1 (weight 1000) and
cg2 (weight 100), with two processes (file1 and file2) doing sync IO in
their respective blkio cgroups.
For the above use case, the result of fio (without this patch):
file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan 1 19:41:49 1970
write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
<...>
file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan 1 19:41:49 1970
write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
<...>
// Both processes get equal bandwidth even though they belong to
different cgroups with weights of 1000 (cg1) and 100 (cg2).
In the above case (for nonrotational NCQ devices),
as soon as the request from cg1 is completed, and even
though it is provided with a higher set_slice=10, the CFQ
algorithm expires this group when the driver tries to fetch the next
request, without providing any idle time or weight priority,
and schedules another cfq group (in this case cg2).
Thus both cfq groups (cg1 and cg2) keep alternating to get
disk time, and cgroup weight based scheduling is lost.
This patch gives the cfq algorithm (cfq_arm_slice_timer) a chance
to arm the slice timer when group_idle is enabled.
If group_idle is not required either (including for nonrotational
NCQ drives), group_idle = 0 must be set explicitly via sysfs for
such cases.
With this patch, the result of fio (for the above use case):
file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan 1 00:06:08 1970
write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
<..>
file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan 1 00:06:08 1970
write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
<..>
// Here each process's bandwidth follows its cgroup's weight.
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Fri, 4 Aug 2017 05:35:11 +0000 (07:35 +0200)]
block, bfq: boost throughput with flash-based non-queueing devices
When a queue associated with a process remains empty, there are cases
where throughput gets boosted if the device is idled to await the
arrival of a new I/O request for that queue. Currently, BFQ assumes
that one of these cases is when the device has no internal queueing
(regardless of the properties of the I/O being served). Unfortunately,
this condition has proved to be too general. So, this commit refines it
as "the device has no internal queueing and is rotational".
This refinement provides a significant throughput boost with random
I/O, on flash-based storage without internal queueing. For example, on
a HiKey board, throughput increases by up to 125%, growing, e.g., from
6.9MB/s to 15.6MB/s with two or three random readers in parallel.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
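The refinement described in the commit above amounts to tightening one
boolean condition. The sketch below shows only that sub-condition, with
hypothetical demo_* names. The real BFQ idling decision combines it with
other criteria that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device description (hypothetical fields). */
struct demo_dev {
	bool has_internal_queueing;
	bool rotational;
};

/* Old condition: idling assumed to help on any non-queueing device. */
bool demo_idle_for_throughput_old(const struct demo_dev *dev)
{
	assert(dev);
	return !dev->has_internal_queueing;
}

/* Refined condition: non-queueing and rotational. */
bool demo_idle_for_throughput_new(const struct demo_dev *dev)
{
	assert(dev);
	return !dev->has_internal_queueing && dev->rotational;
}
```

The two predicates differ only for flash-based storage without internal
queueing, which is the case the commit message reports the throughput gain
for.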
Paolo Valente [Fri, 4 Aug 2017 05:35:10 +0000 (07:35 +0200)]
block,bfq: refactor device-idling logic
The logic that decides whether to idle the device is scattered across
three functions. Almost all of the logic is in the function
bfq_bfqq_may_idle, but (1) part of the decision is made in
bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
switch off idling regardless of the output of bfq_bfqq_may_idle. In
addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
their decisions as a function of parameters that are used, for similar
purposes, also in bfq_bfqq_may_idle. This commit addresses these
issues by moving all the logic into bfq_bfqq_may_idle.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Aug 2017 14:25:38 +0000 (08:25 -0600)]
block: remove unused syncfull/asyncfull queue flags
We haven't used these in years, but somehow the definitions still
remained. Kill them, and renumber the QUEUE_FLAG_ space. We had
a hole in the beginning of the space, too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Aug 2017 23:53:33 +0000 (17:53 -0600)]
blk-mq: enable checking two part inflight counts at the same time
Modify blk_mq_in_flight() to count both a partition and root at
the same time. Then we only have to call it once, instead of
potentially looping the tags twice.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
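Counting a partition and the root in a single pass over the tags, as the
commit above describes, can be sketched in user space as follows. The
demo_* names are hypothetical. The sketch assumes that partition number 0
denotes the whole device and that every busy request records the partition
it targets. Zeroing both slots on entry is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tag entry: whether it is in use and which partition it targets. */
struct demo_rq {
	bool busy;
	int partno;
};

/*
 * Walk the tags once. inflight[0] counts busy requests for 'partno';
 * inflight[1] counts all busy requests on the device, but only when
 * 'partno' is a real partition (non-zero), mirroring the description
 * "index 1 ... if the partition passed in is not the root".
 */
void demo_mq_in_flight(const struct demo_rq *tags, size_t nr_tags,
		       int partno, unsigned int inflight[2])
{
	size_t i;

	assert(tags || nr_tags == 0);
	inflight[0] = inflight[1] = 0;
	for (i = 0; i < nr_tags; i++) {
		if (!tags[i].busy)
			continue;
		if (tags[i].partno == partno)
			inflight[0]++;
		if (partno != 0)
			inflight[1]++;
	}
}
```

Because both counters are updated inside the same loop, a caller that needs
the partition and the root figures iterates the tags once rather than twice.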
Jens Axboe [Tue, 8 Aug 2017 23:51:45 +0000 (17:51 -0600)]
blk-mq: provide internal in-flight variant
We don't have to inc/dec some counter, since we can just
iterate the tags. That makes inc/dec a noop, but means we
have to iterate busy tags to get an in-flight count.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Aug 2017 23:49:47 +0000 (17:49 -0600)]
block: make part_in_flight() take an array of two ints
Instead of returning the count that matches the partition, pass
in an array of two ints. Index 0 will be filled with the inflight
count for the partition in question, and index 1 will be filled
with the root inflight count, if the partition passed in is not the
root.
This is in preparation for being able to calculate both in one
go.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 1 Jul 2017 03:55:08 +0000 (21:55 -0600)]
block: pass in queue to inflight accounting
No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>