blk-mq: fix io hung due to missing commit_rqs
author Yu Kuai <yukuai3@huawei.com>
Tue, 26 Jul 2022 12:22:24 +0000 (20:22 +0800)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 31 Aug 2022 15:16:50 +0000 (17:16 +0200)
commit 65fac0d54f374625b43a9d6ad1f2c212bd41f518 upstream.

Currently, in virtio_scsi, if 'bd->last' is not set to true while
dispatching a request, that io stays in the driver's queue, and the
driver waits for the block layer to dispatch more rqs. However, if the
block layer fails to dispatch more rqs, it should call commit_rqs to
inform the driver.
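
The contract described above can be illustrated with a minimal sketch. This is
a toy model, not the virtio_scsi code: 'struct toy_driver' and every 'toy_*'
name are hypothetical, and the "kick" is reduced to moving a counter. It only
shows that a request queued with last == false stays invisible to the device
until either a later request arrives with last == true or commit_rqs is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy driver, used only to illustrate the batching contract. */
struct toy_driver {
	int pending;   /* queued in the driver, device not yet notified */
	int submitted; /* requests the device has been notified about */
};

/* Notify the device about everything queued so far. */
static void toy_kick(struct toy_driver *d)
{
	d->submitted += d->pending;
	d->pending = 0;
}

/* Models ->queue_rq(): batch the request, kick only when 'last' is set. */
static void toy_queue_rq(struct toy_driver *d, bool last)
{
	d->pending++;
	if (last)
		toy_kick(d);
}

/* Models ->commit_rqs(): the caller admits that no more requests follow. */
static void toy_commit_rqs(struct toy_driver *d)
{
	if (d->pending)
		toy_kick(d);
}
```

In this model, calling toy_queue_rq(&d, false) once leaves d.pending at 1 and
d.submitted at 0; only a following toy_commit_rqs(&d) makes the request
visible to the device.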

There is a case in blk_mq_try_issue_list_directly() where commit_rqs
won't be called:

// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
 blk_mq_request_issue_directly
 // dispatch first rq
 // last is false
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // succeed to get first budget
   __blk_mq_issue_directly
    scsi_queue_rq
     cmd->flags |= SCMD_LAST // only if bd->last, so not set here
      virtscsi_queuecommand
       kick = (sc->flags & SCMD_LAST) != 0
       // kick is false, first rq won't issue to disk
 queued++

 blk_mq_request_issue_directly
 // dispatch second rq
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // failed to get second budget
 ret == BLK_STS_RESOURCE
  blk_mq_request_bypass_insert
 // errors is still 0

 if ((!list_empty(list) || errors) && ...)
  // won't pass, commit_rqs won't be called

In this situation, the first rq relies on the second rq being
dispatched, while the second rq relies on the first rq completing, so
both of them hang.

Fix the problem by also treating 'BLK_STS_*RESOURCE' as 'errors', since
it means that the request was not queued successfully.
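
The scenario and the fix can be replayed with a small simulation. This is a
sketch under simplifying assumptions, not the kernel code: 'struct sim_queue'
and the 'sim_*' functions are hypothetical, the budget is a plain counter, and
the 'count_resource' flag switches between the old accounting (a resource
failure is not an error) and the new one (it is).

```c
#include <assert.h>
#include <stdbool.h>

enum sim_status { SIM_OK, SIM_RESOURCE };

/* Hypothetical model of a queue with a dispatch budget. */
struct sim_queue {
	int budget;       /* remaining dispatch budget (queue_depth) */
	int pending;      /* queued in the driver, device not yet kicked */
	int commit_calls; /* how many times commit_rqs ran */
};

/* Issue one request: needs budget, and the driver kicks only on 'last'. */
static enum sim_status sim_issue(struct sim_queue *q, bool last)
{
	if (q->budget == 0)
		return SIM_RESOURCE;
	q->budget--;
	q->pending++;
	if (last)
		q->pending = 0;
	return SIM_OK;
}

static void sim_commit_rqs(struct sim_queue *q)
{
	q->commit_calls++;
	q->pending = 0;
}

/*
 * Models the issue loop over 'nr_rqs' requests.
 * Returns the number of requests left stranded in the driver.
 */
static int sim_issue_list(struct sim_queue *q, int nr_rqs, bool count_resource)
{
	int queued = 0, errors = 0, left = nr_rqs;

	assert(nr_rqs > 0);
	while (left > 0) {
		left--; /* the rq leaves the list before it is issued */
		if (sim_issue(q, left == 0) != SIM_OK) {
			if (count_resource)
				errors++;
			break; /* rq is bypass-inserted, not put back on the list */
		}
		queued++;
	}
	/* Same shape as the check at the end of the real function. */
	if ((left > 0 || errors) && queued)
		sim_commit_rqs(q);
	return q->pending;
}
```

With a budget of 1 and two requests, sim_issue_list(&q, 2, false) leaves one
request stranded and never calls sim_commit_rqs(), while
sim_issue_list(&q, 2, true) calls it once and leaves nothing pending.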

The same problem exists in blk_mq_dispatch_rq_list(). 'BLK_STS_*RESOURCE'
can't be treated as 'errors' there, so fix it by calling commit_rqs
if queue_rq returns 'BLK_STS_*RESOURCE'.

Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
block/blk-mq.c

index 95993c4efa4938b2bf473790767d19ee13761992..1a28ba9017edb260ed5768bdac330d320b074ec1 100644
@@ -1400,7 +1400,8 @@ out:
        /* If we didn't flush the entire list, we could have told the driver
         * there was more coming, but that turned out to be a lie.
         */
-       if ((!list_empty(list) || errors) && q->mq_ops->commit_rqs && queued)
+       if ((!list_empty(list) || errors || needs_resource ||
+            ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
                q->mq_ops->commit_rqs(hctx);
        /*
         * Any items that need requeuing? Stuff them into hctx->dispatch,
@@ -2111,6 +2112,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
                list_del_init(&rq->queuelist);
                ret = blk_mq_request_issue_directly(rq, list_empty(list));
                if (ret != BLK_STS_OK) {
+                       errors++;
                        if (ret == BLK_STS_RESOURCE ||
                                        ret == BLK_STS_DEV_RESOURCE) {
                                blk_mq_request_bypass_insert(rq, false,
@@ -2118,7 +2120,6 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
                                break;
                        }
                        blk_mq_end_request(rq, ret);
-                       errors++;
                } else
                        queued++;
        }