Bart Van Assche [Fri, 18 Jun 2021 00:44:47 +0000 (17:44 -0700)]
block/mq-deadline: Remove two local variables
Make __dd_dispatch_request() easier to read by removing two local
variables.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210618004456.7280-8-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:46 +0000 (17:44 -0700)]
block/mq-deadline: Add two lockdep_assert_held() statements
Document the locking strategy by adding two lockdep_assert_held()
statements.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210618004456.7280-7-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:45 +0000 (17:44 -0700)]
block/mq-deadline: Add several comments
Make the code easier to read by adding more comments.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210618004456.7280-6-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:44 +0000 (17:44 -0700)]
block: Introduce the ioprio rq-qos policy
Introduce an rq-qos policy that assigns an I/O priority to requests based
on blk-cgroup configuration settings. This policy has the following
advantages over the ioprio_set() system call:
- This policy is cgroup based so it has all the advantages of cgroups.
- While ioprio_set() does not affect page cache writeback I/O, this rq-qos
controller affects page cache writeback I/O for filesystems that support
assiociating a cgroup with writeback I/O. See also
Documentation/admin-guide/cgroup-v2.rst.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210618004456.7280-5-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:43 +0000 (17:44 -0700)]
block/blk-rq-qos: Move a function from a header file into a C file
rq_qos_id_to_name() is only used in blk-mq-debugfs.c so move that function
into in blk-mq-debugfs.c.
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20210618004456.7280-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:42 +0000 (17:44 -0700)]
block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls
Before adding more calls in this function, simplify the error path.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210618004456.7280-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 18 Jun 2021 00:44:41 +0000 (17:44 -0700)]
block/Kconfig: Make the BLK_WBT and BLK_WBT_MQ entries consecutive
These entries were consecutive at the time of their introduction but are no
longer consecutive. Make these again consecutive. Additionally, modify the
help text since it refers to blk-mq and since the legacy block layer has
been removed.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20210618004456.7280-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
lijiazi [Fri, 18 Jun 2021 03:17:20 +0000 (11:17 +0800)]
blk-wbt: remove outdated comment
Now wbt_wait() returns void, so remove now outdated comment.
Signed-off-by: lijiazi <lijiazi@xiaomi.com>
Link: https://lore.kernel.org/r/1623986240-13878-1-git-send-email-lijiazi@xiaomi.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan Carpenter [Fri, 18 Jun 2021 13:45:22 +0000 (16:45 +0300)]
blk-mq: fix an IS_ERR() vs NULL bug
The __blk_mq_alloc_disk() function doesn't return NULLs it returns
error pointers.
Fixes:
b461dfc49eb6 ("blk-mq: add the blk_mq_alloc_disk APIs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/YMyjci35WBqrtqG+@mwanda
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Fri, 18 Jun 2021 01:59:22 +0000 (10:59 +0900)]
block: Remove unnecessary elevator operation checks
The insert_requests and dispatch_request elevator operations are
mandatory for the correct execution of an elevator, and all implemented
elevators (bfq, kyber and mq-deadline) implement them. As a result,
there is no need to check for these operations before calling them when
a queue has an elevator set. This simplifies the code in
__blk_mq_sched_dispatch_requests() and blk_mq_sched_insert_request().
To avoid out-of-tree elevators to crash the kernel in case of bad
implementation, add a check in elv_register() to verify that these
operations are implemented.
A small, probably not significant, IOPS improvement of 0.1% is observed
with this patch applied (4.117 MIOPS to 4.123 MIOPS, average of 20 fio
runs doing 4K random direct reads with psync and 32 jobs).
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210618015922.713999-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 9 Jun 2021 06:30:46 +0000 (14:30 +0800)]
blk-mq: fix use-after-free in blk_mq_exit_sched
tagset can't be used after blk_cleanup_queue() is returned because
freeing tagset usually follows blk_clenup_queue(). Commit
d97e594c5166
("blk-mq: Use request queue-wide tags for tagset-wide sbitmap") adds
check on q->tag_set->flags in blk_mq_exit_sched(), and causes
use-after-free.
Fixes it by using hctx->flags.
Reported-by: syzbot+77ba3d171a25c56756ea@syzkaller.appspotmail.com
Fixes:
d97e594c5166 ("blk-mq: Use request queue-wide tags for tagset-wide sbitmap")
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20210609063046.122843-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kir Kolyshkin [Fri, 11 Jun 2021 03:07:37 +0000 (20:07 -0700)]
docs/cgroup-v1/blkio: update for 5.x kernels
Commit
bf382fb0bcef4 ("block: remove legacy IO schedulers", Oct 12 2018)
removes the CFQ scheduler, together with blkio.weight and
blkio.weight_device described in cgroup v1 documentation. Users are
supposed to use the BFQ scheduler, which cgroup file for setting weight
is blkio.bfq.weight, but there is no way to set per-device weight.
Later, commit
795fe54c2a8 per-device weights for BFQ, meaning that
blkio.bfq.weight and blkio.bfq.weight_device can be used in a way
similar to the old CFQ cgroup interface.
Yet, the cgroup v1 docs were never updated. Fix this:
- use the new file names;
- fix the range for weight (used to be 10..1000, now 1..1000);
- link to BFQ scheduler docs.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kir Kolyshkin [Fri, 11 Jun 2021 03:07:36 +0000 (20:07 -0700)]
docs/cgroup-v1/blkio: stop abusing itemized list
Fix many formatting issues by stop (ab)using itemized lists for
everything (mostly replaced by definition lists).
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kir Kolyshkin [Mon, 14 Jun 2021 21:41:09 +0000 (14:41 -0700)]
docs: block/bfq: describe per-device weight
The functionality of setting per-device weight for BFQ was added
in v5.4 (commit
795fe54c2a828099), but the documentation was never
updated.
While at it, improve formatting a bit.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Link: https://lore.kernel.org/r/20210614214109.207430-1-kolyshkin@gmail.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 9 Jun 2021 01:58:22 +0000 (09:58 +0800)]
block: mark queue init done at the end of blk_register_queue
Mark queue init done when everything is done well in blk_register_queue(),
so that wbt_enable_default() can be run quickly without any RCU period
involved since adding rq qos requires to freeze queue.
Also no any side effect by delaying to mark queue init done.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 9 Jun 2021 01:58:21 +0000 (09:58 +0800)]
block: fix race between adding/removing rq qos and normal IO
Yi reported several kernel panics on:
[16687.001777] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
...
[16687.163549] pc : __rq_qos_track+0x38/0x60
or
[ 997.690455] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000020
...
[ 997.850347] pc : __rq_qos_done+0x2c/0x50
Turns out it is caused by race between adding rq qos(wbt) and normal IO
because rq_qos_add can be run when IO is being submitted, fix this issue
by freezing queue before adding/deleting rq qos to queue.
rq_qos_exit() needn't to freeze queue because it is called after queue
has been frozen.
iolatency calls rq_qos_add() during allocating queue, so freezing won't
add delay because queue usage refcount works at atomic mode at that
time.
iocost calls rq_qos_add() when writing cgroup attribute file, that is
fine to freeze queue at that time since we usually freeze queue when
storing to queue sysfs attribute, meantime iocost only exists on the
root cgroup.
wbt_init calls it in blk_register_queue() and queue sysfs attribute
store(queue_wb_lat_store() when write it 1st time in case of !BLK_WBT_MQ),
the following patch will speedup the queue freezing in wbt_init.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20210609015822.103433-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 16 Jun 2021 07:15:47 +0000 (09:15 +0200)]
loop: fix order of cleaning up the queue and freeing the tagset
We must release the queue before freeing the tagset.
Fixes:
1c99502fae35 ("loop: use blk_mq_alloc_disk and blk_cleanup_disk")
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 16 Jun 2021 07:15:46 +0000 (09:15 +0200)]
mtd_blkdevs: initialze new->rq in add_mtd_blktrans_dev
Various places expect the request_queue in ->rq. Initialize it to
avoid NULL pointer derefences.
Fixes:
6966bb921def ("mtd_blkdevs: use blk_mq_alloc_disk")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:45 +0000 (09:53 +0300)]
z2ram: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-31-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:44 +0000 (09:53 +0300)]
ataflop: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-30-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:43 +0000 (09:53 +0300)]
amiflop: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-29-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:42 +0000 (09:53 +0300)]
scm_blk: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:41 +0000 (09:53 +0300)]
ubi: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:40 +0000 (09:53 +0300)]
xen-blkfront: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:39 +0000 (09:53 +0300)]
sx8: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:38 +0000 (09:53 +0300)]
rnbd: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:37 +0000 (09:53 +0300)]
rbd: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:36 +0000 (09:53 +0300)]
pd: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:35 +0000 (09:53 +0300)]
nullb: use blk_mq_alloc_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:34 +0000 (09:53 +0300)]
nbd: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:33 +0000 (09:53 +0300)]
loop: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:32 +0000 (09:53 +0300)]
floppy: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:31 +0000 (09:53 +0300)]
aoe: use blk_mq_alloc_disk and blk_cleanup_disk
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and
request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:30 +0000 (09:53 +0300)]
blk-mq: remove blk_mq_init_sq_queue
All users are gone now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:29 +0000 (09:53 +0300)]
gdrom: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:28 +0000 (09:53 +0300)]
sunvdc: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:27 +0000 (09:53 +0300)]
swim: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:26 +0000 (09:53 +0300)]
swim3: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:25 +0000 (09:53 +0300)]
ps3disk: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:24 +0000 (09:53 +0300)]
mtd_blkdevs: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:23 +0000 (09:53 +0300)]
mspro: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:22 +0000 (09:53 +0300)]
ms_block: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:21 +0000 (09:53 +0300)]
pf: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:20 +0000 (09:53 +0300)]
pcd: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:19 +0000 (09:53 +0300)]
virtio-blk: use blk_mq_alloc_disk
Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:18 +0000 (09:53 +0300)]
blk-mq: add the blk_mq_alloc_disk APIs
Add a new API to allocate a gendisk including the request_queue for use
with blk-mq based drivers. This is to avoid boilerplate code in drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:17 +0000 (09:53 +0300)]
blk-mq: improve the blk_mq_init_allocated_queue interface
Don't return the passed in request_queue but a normal error code, and
drop the elevator_init argument in favor of just calling elevator_init_mq
directly from dm-rq.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 2 Jun 2021 06:53:16 +0000 (09:53 +0300)]
blk-mq: factor out a blk_mq_alloc_sq_tag_set helper
Factour out a helper to initialize a simple single hw queue tag_set from
blk_mq_init_sq_queue. This will allow to phase out blk_mq_init_sq_queue
in favor of a more symmetric and general API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan Williams [Mon, 7 Jun 2021 23:52:43 +0000 (16:52 -0700)]
libnvdimm/pmem: Fix blk_cleanup_disk() usage
The queue_to_disk() helper can not be used after del_gendisk()
communicate @disk via the pgmap->owner.
Otherwise, queue_to_disk() returns NULL resulting in the splat below.
Kernel attempted to read user page (330) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000330
Faulting instruction address: 0xc000000000906344
Oops: Kernel access of bad area, sig: 11 [#1]
[..]
NIP [
c000000000906344] pmem_pagemap_cleanup+0x24/0x40
LR [
c0000000004701d4] memunmap_pages+0x1b4/0x4b0
Call Trace:
[
c000000022cbb9c0] [
c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable)
[
c000000022cbb9e0] [
c0000000004701d4] memunmap_pages+0x1b4/0x4b0
[
c000000022cbba90] [
c0000000008b28a0] devm_action_release+0x30/0x50
[
c000000022cbbab0] [
c0000000008b39c8] release_nodes+0x2f8/0x3e0
[
c000000022cbbb60] [
c0000000008ac440] device_release_driver_internal+0x190/0x2b0
[
c000000022cbbba0] [
c0000000008a8450] unbind_store+0x130/0x170
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Fixes:
87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
Link: http://lore.kernel.org/r/DFB75BA8-603F-4A35-880B-C5B23EF8FA7D@linux.vnet.ibm.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com
[axboe: fold in compile warning fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Mon, 7 Jun 2021 11:26:13 +0000 (13:26 +0200)]
rq-qos: fix missed wake-ups in rq_qos_throttle try two
Commit
545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:
CPU1 (waiter1) CPU2 (waiter2) CPU3 (waker)
rq_qos_wait() rq_qos_wait()
acquire_inflight_cb() -> fails
acquire_inflight_cb() -> fails
completes IOs, inflight
decreased
prepare_to_wait_exclusive()
prepare_to_wait_exclusive()
has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
has_sleeper = !wq_has_single_sleeper() -> true
io_schedule() io_schedule()
Deadlock as now there's nobody to wakeup the two waiters. The logic
automatically blocking when there are already sleepers is really subtle
and the only way to make it work reliably is that we check whether there
are some waiters in the queue when adding ourselves there. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
Fixes:
545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Long Li [Mon, 7 Jun 2021 19:34:05 +0000 (12:34 -0700)]
block: return the correct bvec when checking for gaps
After commit
07173c3ec276 ("block: enable multipage bvecs"), a bvec can
have multiple pages. But bio_will_gap() still assumes one page bvec while
checking for merging. If the pages in the bvec go across the
seg_boundary_mask, this check for merging can potentially succeed if only
the 1st page is tested, and can fail if all the pages are tested.
Later, when SCSI builds the SG list the same check for merging is done in
__blk_segment_map_sg_merge() with all the pages in the bvec tested. This
time the check may fail if the pages in bvec go across the
seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
BIOs were merged). If this check fails, we end up with a broken SG list
for drivers assuming the SG list not having offsets in intermediate pages.
This results in incorrect pages written to the disk.
Fix this by returning the multi-page bvec when testing gaps for merging.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes:
07173c3ec276 ("block: enable multipage bvecs")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 19 May 2021 17:52:26 +0000 (10:52 -0700)]
block: Update blk_update_request() documentation
Although the original intent was to use blk_update_request() in stacking
block drivers only, it is used much more widely today. Reflect this in the
documentation block above this function. See also:
* commit
32fab448e5e8 ("block: add request update interface").
* commit
2e60e02297cf ("block: clean up request completion API").
* commit
ed6565e73424 ("block: handle partial completions for special
payload requests").
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210519175226.8853-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Thu, 3 Jun 2021 10:47:21 +0000 (12:47 +0200)]
block: Do not pull requests from the scheduler when we cannot dispatch them
Provided the device driver does not implement dispatch budget accounting
(which only SCSI does) the loop in __blk_mq_do_dispatch_sched() pulls
requests from the IO scheduler as long as it is willing to give out any.
That defeats scheduling heuristics inside the scheduler by creating
false impression that the device can take more IO when it in fact
cannot.
For example with BFQ IO scheduler on top of virtio-blk device setting
blkio cgroup weight has barely any impact on observed throughput of
async IO because __blk_mq_do_dispatch_sched() always sucks out all the
IO queued in BFQ. BFQ first submits IO from higher weight cgroups but
when that is all dispatched, it will give out IO of lower weight cgroups
as well. And then we have to wait for all this IO to be dispatched to
the disk (which means lot of it actually has to complete) before the
IO scheduler is queried again for dispatching more requests. This
completely destroys any service differentiation.
So grab request tag for a request pulled out of the IO scheduler already
in __blk_mq_do_dispatch_sched() and do not pull any more requests if we
cannot get it because we are unlikely to be able to dispatch it. That
way only single request is going to wait in the dispatch list for some
tag to free.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Colin Ian King [Wed, 2 Jun 2021 10:06:59 +0000 (11:06 +0100)]
null_blk: Fix null pointer dereference on nullb->disk on blk_cleanup_disk call
The error handling on a nullb->disk allocation currently jumps to
out_cleanup_disk that calls blk_cleanup_disk with a null pointer causing
a null pointer dereference issue. Fix this by jumping to out_cleanup_tags
instead.
Addresses-Coverity: ("Dereference after null check")
Fixes:
132226b301b5 ("null_blk: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210602100659.11058-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:13:01 +0000 (08:13 +0200)]
block: remove bdget_disk
Just opencode the xa_load in the callers, as none of them actually
needs a reference to the bdev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:13:00 +0000 (08:13 +0200)]
block: factor out a part_devt helper
Add a helper to find the dev_t for a disk + partno tuple.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:59 +0000 (08:12 +0200)]
block: move bd_part_count to struct gendisk
The bd_part_count value only makes sense for whole devices, so move it
to struct gendisk and give it a more descriptive name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:58 +0000 (08:12 +0200)]
block: split __blkdev_put
Split __blkdev_put into one helper for the whole device, and one for
partitions as well as another shared helper for flushing the block
device inode mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:57 +0000 (08:12 +0200)]
block: move adjusting bd_part_count out of __blkdev_get
Keep in the callers and thus remove the for_part argument. This mirrors
what is done on the blkdev_get side and slightly simplifies
blkdev_get_part as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@rehat.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:56 +0000 (08:12 +0200)]
block: move bd_mutex to struct gendisk
Replace the per-block device bd_mutex with a per-gendisk open_mutex,
thus simplifying locking wherever we deal with partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:55 +0000 (08:12 +0200)]
block: move sync_blockdev from __blkdev_put to blkdev_put
Do the early unlocked syncing even earlier to move more code out of
the recursive path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210525061301.2242282-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 25 May 2021 06:12:54 +0000 (08:12 +0200)]
block: split __blkdev_get
Split __blkdev_get into one helper for the whole device, and one for
opening partitions. This removes the (bounded) recursion when opening
a partition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210525061301.2242282-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:16 +0000 (07:51 +0200)]
block: unexport blk_alloc_queue
blk_alloc_queue is just an internal helper now, unexport it and remove
it from the public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:15 +0000 (07:51 +0200)]
null_blk: convert to blk_alloc_disk/blk_cleanup_disk
Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation. Note that the
blk-mq mode is left with its own allocations scheme, to be handled later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:14 +0000 (07:51 +0200)]
xpram: convert to blk_alloc_disk/blk_cleanup_disk
Convert the xpram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:13 +0000 (07:51 +0200)]
dcssblk: convert to blk_alloc_disk/blk_cleanup_disk
Convert the dcssblk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:12 +0000 (07:51 +0200)]
ps3vram: convert to blk_alloc_disk/blk_cleanup_disk
Convert the ps3vram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:11 +0000 (07:51 +0200)]
n64cart: convert to blk_alloc_disk
Convert the n64cart driver to use the blk_alloc_disk helper to simplify
gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:10 +0000 (07:51 +0200)]
simdisk: convert to blk_alloc_disk/blk_cleanup_disk
Convert the simdisk driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:09 +0000 (07:51 +0200)]
nfblock: convert to blk_alloc_disk/blk_cleanup_disk
Convert the nfblock driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:08 +0000 (07:51 +0200)]
nvme-multipath: convert to blk_alloc_disk/blk_cleanup_disk
Convert the nvme-multipath driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:07 +0000 (07:51 +0200)]
nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk
Convert the nvdimm-pmem driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:06 +0000 (07:51 +0200)]
nvdimm-btt: convert to blk_alloc_disk/blk_cleanup_disk
Convert the nvdimm-btt driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:05 +0000 (07:51 +0200)]
nvdimm-blk: convert to blk_alloc_disk/blk_cleanup_disk
Convert the nvdimm-blk driver to use the blk_alloc_disk and
blk_cleanup_disk helpers to simplify gendisk and request_queue
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:04 +0000 (07:51 +0200)]
md: convert to blk_alloc_disk/blk_cleanup_disk
Convert the md driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:03 +0000 (07:51 +0200)]
dm: convert to blk_alloc_disk/blk_cleanup_disk
Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:02 +0000 (07:51 +0200)]
bcache: convert to blk_alloc_disk/blk_cleanup_disk
Convert the bcache driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:01 +0000 (07:51 +0200)]
lightnvm: convert to blk_alloc_disk/blk_cleanup_disk
Convert the lightnvm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:51:00 +0000 (07:51 +0200)]
zram: convert to blk_alloc_disk/blk_cleanup_disk
Convert the zram driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:59 +0000 (07:50 +0200)]
rsxx: convert to blk_alloc_disk/blk_cleanup_disk
Convert the rsxx driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:58 +0000 (07:50 +0200)]
pktcdvd: convert to blk_alloc_disk/blk_cleanup_disk
Convert the pktcdvd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:57 +0000 (07:50 +0200)]
drbd: convert to blk_alloc_disk/blk_cleanup_disk
Convert the drbd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:56 +0000 (07:50 +0200)]
brd: convert to blk_alloc_disk/blk_cleanup_disk
Convert the brd driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation. This also
allows to remove the request_queue pointer in struct request_queue,
and to simplify the initialization as blk_cleanup_disk can be called
on any disk returned from blk_alloc_disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:55 +0000 (07:50 +0200)]
block: add blk_alloc_disk and blk_cleanup_disk APIs
Add two new APIs to allocate and free a gendisk including the
request_queue for use with BIO based drivers. This is to avoid
boilerplate code in drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:54 +0000 (07:50 +0200)]
block: add a flag to make put_disk on partially initalized disks safer
Add a flag to indicate that __device_add_disk did grab a queue reference
so that disk_release only drops it if we actually had it. This sort
out one of the major pitfals with partially initialized gendisk that
a lot of drivers did get wrong or still do.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:53 +0000 (07:50 +0200)]
block: automatically enable GENHD_FL_EXT_DEVT
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated
without an explicit number of minors. This is what all new block
drivers should do, so make sure it is the default without boilerplate
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:52 +0000 (07:50 +0200)]
block: move the DISK_MAX_PARTS sanity check into __device_add_disk
Keep this together with the first place that actually looks at
->minors and prepare for not passing a minors argument to
alloc_disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 21 May 2021 05:50:51 +0000 (07:50 +0200)]
block: refactor device number setup in __device_add_disk
Untangle the mess around blk_alloc_devt by moving the check for
the used allocation scheme into the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
John Garry [Thu, 13 May 2021 12:00:58 +0000 (20:00 +0800)]
blk-mq: Use request queue-wide tags for tagset-wide sbitmap
The tags used for an IO scheduler are currently per hctx.
As such, when q->nr_hw_queues grows, so does the request queue total IO
scheduler tag depth.
This may cause problems for SCSI MQ HBAs whose total driver depth is
fixed.
Ming and Yanhui report higher CPU usage and lower throughput in scenarios
where the fixed total driver tag depth is appreciably lower than the total
scheduler tag depth:
https://lore.kernel.org/linux-block/
440dfcfc-1a2c-bd98-1161-
cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b
In that scenario, since the scheduler tag is got first, much contention
is introduced since a driver tag may not be available after we have got
the sched tag.
Improve this scenario by introducing request queue-wide tags for when
a tagset-wide sbitmap is used. The static sched requests are still
allocated per hctx, as requests are initialised per hctx, as in
blk_mq_init_request(..., hctx_idx, ...) ->
set->ops->init_request(.., hctx_idx, ...).
For simplicity of resizing the request queue sbitmap when updating the
request queue depth, just init at the max possible size, so we don't need
to deal with the possibly with swapping out a new sbitmap for old if
we need to grow.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
John Garry [Thu, 13 May 2021 12:00:57 +0000 (20:00 +0800)]
blk-mq: Some tag allocation code refactoring
The tag allocation code to alloc the sbitmap pairs is common for regular
bitmaps tags and shared sbitmap, so refactor into a common function.
Also remove superfluous "flags" argument from blk_mq_init_shared_sbitmap().
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 11 May 2021 15:22:36 +0000 (23:22 +0800)]
blk-mq: clearing flush request reference in tags->rqs[]
Before we free request queue, clearing flush request reference in
tags->rqs[], so that potential UAF can be avoided.
Based on one patch written by David Jeffery.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 11 May 2021 15:22:35 +0000 (23:22 +0800)]
blk-mq: clear stale request in tags->rq[] before freeing one request pool
refcount_inc_not_zero() in bt_tags_iter() still may read one freed
request.
Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()
2) clearing stale request referred via ->rqs[tag] before freeing
request pool, the per-tags spinlock is held for clearing stale
->rq[tag]
So after we cleared stale requests, bt_tags_iter() won't observe
freed request any more, also the clearing will wait for pending
request reference.
The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and one recent David's patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 11 May 2021 15:22:34 +0000 (23:22 +0800)]
blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
this way will prevent the request from being re-used when ->fn is
running. The approach is same as what we do during handling timeout.
Fix request use-after-free(UAF) related with completion race or queue
releasing:
- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.
- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.
However, still one request UAF not covered: refcount_inc_not_zero() may
read one freed request, and it will be handled in next patch.
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Tue, 11 May 2021 15:22:33 +0000 (23:22 +0800)]
block: avoid double io accounting for flush request
For flush request, rq->end_io() may be called two times, one is from
timeout handling(blk_mq_check_expired()), another is from normal
completion(__blk_mq_end_request()).
Move blk_account_io_flush() after flush_rq->ref drops to zero, so
io accounting can be done just once for flush request.
Fixes:
b68663186577 ("block: add iostat counters for flush requests")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Max Gurtovoy [Tue, 11 May 2021 15:53:19 +0000 (15:53 +0000)]
block: remove unneeded parenthesis from blk-sysfs
Align to common code conventions.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210511155319.1885277-1-mgurtovoy@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Tue, 11 May 2021 18:58:04 +0000 (14:58 -0400)]
blkcg: drop CLONE_IO check in blkcg_can_attach()
blkcg has always rejected to attach if any of the member tasks has shared
io_context. The rationale was that io_contexts can be shared across
different cgroups making it impossible to define what the appropriate
control behavior should be. However, this check causes more problems than it
solves:
* The check prevents controller enable and migrations but not CLONE_IO
itself, which can lead to surprises as the outcome changes depending on
the order of operations.
* Sharing within a cgroup is fine but the check can't distinguish that. This
leads to unnecessary conflicts with the recent CLONE_IO usage in io_uring.
io_context sharing doesn't make any difference for rq_qos based controllers
and the way it's used is safe as long as tasks aren't migrated dynamically
which is the vast majority of use cases. While we can try to make the check
more precise to avoid false positives, the added complexity doesn't seem
worthwhile. Let's just drop blkcg_can_attach().
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/YJrTvHbrRDbJjw+S@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yang Yingliang [Tue, 11 May 2021 11:34:40 +0000 (19:34 +0800)]
aoe: remove unnecessary mutex_init()
The mutex ktio_spawn_lock is initialized statically.
It is unnecessary to initialize by mutex_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210511113440.3772053-1-yangyingliang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zhangyi (F) [Sat, 13 Mar 2021 03:01:46 +0000 (11:01 +0800)]
block_dump: remove comments in docs
Now block_dump feature is gone, remove all comments in docs.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210313030146.2882027-4-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zhangyi (F) [Sat, 13 Mar 2021 03:01:45 +0000 (11:01 +0800)]
block_dump: remove block_dump feature
We have already delete block_dump feature in mark_inode_dirty() because
it can be replaced by tracepoints, now we also remove the part in
submit_bio() for the same reason. The part of block dump feature in
submit_bio() dump the write process, write region and sectors on the
target disk into kernel message. it can be replaced by
block_bio_queue tracepoint in submit_bio_checks(), so we do not need
block_dump anymore, remove the whole block_dump feature.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210313030146.2882027-3-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zhangyi (F) [Sat, 13 Mar 2021 03:01:44 +0000 (11:01 +0800)]
block_dump: remove block_dump feature in mark_inode_dirty()
block_dump is an old debugging interface, one of it's functions is used
to print the information about who write which file on disk. If we
enable block_dump through /proc/sys/vm/block_dump and turn on debug log
level, we can gather information about write process name, target file
name and disk from kernel message. This feature is realized in
block_dump___mark_inode_dirty(), it print above information into kernel
message directly when marking inode dirty, so it is noisy and can easily
trigger log storm. At the same time, get the dentry refcount is also not
safe, we found it will lead to deadlock on ext4 file system with
data=journal mode.
After tracepoints has been introduced into the kernel, we got a
tracepoint in __mark_inode_dirty(), which is a better replacement of
block_dump___mark_inode_dirty(). The only downside is that it only trace
the inode number and not a file name, but it probably doesn't matter
because the original printed file name in block_dump is not accurate in
some cases, and we can still find it through the inode number and device
id. So this patch delete the dirting inode part of block_dump feature.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210313030146.2882027-2-yi.zhang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>