block: init flush rq ref count to 1
author Josef Bacik <josef@toxicpanda.com>
Thu, 7 Mar 2019 21:37:18 +0000 (21:37 +0000)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 31 Jul 2019 05:27:07 +0000 (07:27 +0200)
commit 8a1a3d3839233406eed675b1695019802dc4284a
tree e2a70c6920bb3c61f1fa8ec9d337ba553bdcb6ce
parent 4b9dc73a0d4adc67bdd33b8c60dcbfe1e04c61b0
block: init flush rq ref count to 1

[ Upstream commit b554db147feea39617b533ab6bca247c91c6198a ]

We discovered a problem in newer kernels where disconnecting an NBD
device while a flush request was pending would result in a hang.  This
is because the blk-mq timeout handler does

        if (!refcount_inc_not_zero(&rq->ref))
                return true;

to determine if it's ok to run the timeout handler for the request.
Flush requests (flush_rq) never have a ref count set, so the count is
zero, the check fails, and we'd skip running the timeout handler for
the request.  It would then just sit there in limbo forever.
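
To illustrate the gate described above, here is a minimal userspace
sketch.  It is not kernel code: toy_request, toy_refcount_inc_not_zero(),
toy_check_expired() and toy_rq_init() are hypothetical names, and the
kernel's refcount_t is approximated with C11 atomics.  It only shows
that a request whose ref count is zero never reaches the timeout work,
while one whose count starts at 1 does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request; only the ref count is modeled. */
struct toy_request {
	atomic_uint ref;
	bool timed_out;		/* set when the toy timeout handler runs */
};

/*
 * Userspace approximation of refcount_inc_not_zero(): take a reference
 * only if the count is currently non-zero.
 */
static bool toy_refcount_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Models the gate quoted above; returns true if the handler ran. */
static bool toy_check_expired(struct toy_request *rq)
{
	assert(rq != NULL);

	if (!toy_refcount_inc_not_zero(&rq->ref))
		return false;		/* skipped: request looks unreferenced */

	rq->timed_out = true;		/* stand-in for the real timeout work */
	atomic_fetch_sub(&rq->ref, 1);	/* drop the temporary reference */
	return true;
}

/* Initialize a toy request with a chosen starting ref count. */
static void toy_rq_init(struct toy_request *rq, unsigned int initial_ref)
{
	assert(rq != NULL);
	atomic_init(&rq->ref, initial_ref);
	rq->timed_out = false;
}
```

Calling toy_rq_init(&rq, 0) followed by toy_check_expired(&rq) models
the flush request case, where the handler is skipped.  Starting the
count at 1 instead lets the handler run and leaves the count at 1
afterwards.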

Fix this by always setting the refcount of any request going through
blk_rq_init() to 1.  I tested this with an nbd-server that dropped flush
requests to verify that it hung, and then tested with this patch to
verify I got the timeout as expected and the error handling kicked in.
Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
block/blk-core.c