blk-mq: avoid double ->queue_rq() because of early timeout
authorDavid Jeffery <djeffery@redhat.com>
Wed, 26 Oct 2022 05:19:57 +0000 (13:19 +0800)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Sat, 31 Dec 2022 12:32:59 +0000 (13:32 +0100)
commit8b3d6b029a552d2978bbac275303d11419826a69
tree97b80237c188bd8b3066f60d0f7f4dc925009e49
parent6ec50ce315f47b4d0789e7d61e39886a179c6adb
blk-mq: avoid double ->queue_rq() because of early timeout

[ Upstream commit 82c229476b8f6afd7e09bc4dc77d89dc19ff7688 ]

David Jeffery found one double ->queue_rq() issue, so far it can
be triggered in VM use case because of long vmexit latency or preempt
latency of vCPU pthread or long page fault in vCPU pthread, then block
IO req could be timed out before queuing the request to hardware but after
calling blk_mq_start_request() during ->queue_rq(), then timeout handler
may handle it by requeue, then double ->queue_rq() is caused, and kernel
panic.

So far, it is driver's responsibility to cover the race between timeout
and completion, so it seems supposed to be solved in driver in theory,
given driver has enough knowledge.

But it is really one common problem, lots of driver could have similar
issue, and could be hard to fix all affected drivers, even it isn't easy
for driver to handle the race. So David suggests this patch by draining
in-progress ->queue_rq() for solving this issue.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: virtualization@lists.linux-foundation.org
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221026051957.358818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
block/blk-mq.c