nvme: fix use after free when disconnecting a reconnecting ctrl
authorRuozhu Li <liruozhu@huawei.com>
Thu, 4 Nov 2021 07:13:32 +0000 (15:13 +0800)
committerChristoph Hellwig <hch@lst.de>
Tue, 7 Dec 2021 17:21:16 +0000 (18:21 +0100)
A crash happens when trying to disconnect a reconnecting ctrl:

 1) The network was cut off when the connection was just established,
    scan work hang there waiting for some IOs complete.  Those I/Os were
    retried because we return BLK_STS_RESOURCE to blk in reconnecting.
 2) After a while, I tried to disconnect this connection.  This
    procedure also hangs because it tried to obtain ctrl->scan_lock.
    It should be noted that now we have switched the controller state
    to NVME_CTRL_DELETING.
 3) In nvme_check_ready(), we always return true when ctrl->state is
    NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
    device which was already freed.

To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
device only when queue state is live.  If not, return host path error to
the block layer

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
drivers/nvme/host/core.c
drivers/nvme/host/nvme.h

index 4ee7d2f..1af8a45 100644 (file)
@@ -666,6 +666,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
                struct request *rq)
 {
        if (ctrl->state != NVME_CTRL_DELETING_NOIO &&
+           ctrl->state != NVME_CTRL_DELETING &&
            ctrl->state != NVME_CTRL_DEAD &&
            !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
            !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
index b334af8..9b095ee 100644 (file)
@@ -709,7 +709,7 @@ static inline bool nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
                return true;
        if (ctrl->ops->flags & NVME_F_FABRICS &&
            ctrl->state == NVME_CTRL_DELETING)
-               return true;
+               return queue_live;
        return __nvme_check_ready(ctrl, rq, queue_live);
 }
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,