blk-mq: avoid double ->queue_rq() because of early timeout
authorDavid Jeffery <djeffery@redhat.com>
Wed, 26 Oct 2022 05:19:57 +0000 (13:19 +0800)
committerJens Axboe <axboe@kernel.dk>
Mon, 31 Oct 2022 13:24:22 +0000 (07:24 -0600)
commit82c229476b8f6afd7e09bc4dc77d89dc19ff7688
treede0b228a4437d3fcfb2f7fb2da59503f359fd269
parent95465318849f7525f4dce1b720e4627f48963327
blk-mq: avoid double ->queue_rq() because of early timeout

David Jeffery found one double ->queue_rq() issue, so far it can
be triggered in VM use case because of long vmexit latency or preempt
latency of vCPU pthread or long page fault in vCPU pthread, then block
IO req could be timed out before queuing the request to hardware but after
calling blk_mq_start_request() during ->queue_rq(), then timeout handler
may handle it by requeue, then double ->queue_rq() is caused, and kernel
panic.

So far, it is driver's responsibility to cover the race between timeout
and completion, so it seems supposed to be solved in driver in theory,
given driver has enough knowledge.

But it is really one common problem, lots of driver could have similar
issue, and could be hard to fix all affected drivers, even it isn't easy
for driver to handle the race. So David suggests this patch by draining
in-progress ->queue_rq() for solving this issue.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: virtualization@lists.linux-foundation.org
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221026051957.358818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
block/blk-mq.c