nvme: fix possible hang when ns scanning fails during error recovery
authorSagi Grimberg <sagi@grimberg.me>
Wed, 6 May 2020 22:44:02 +0000 (15:44 -0700)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 14 May 2020 05:58:18 +0000 (07:58 +0200)
commit56fc76893f875cb86128ab7e48934c6c53331ae2
tree785545dad90b5cf547a2ce1af6c7b578b57c4341
parentfb1b41128c70ce5691a114169aaafe567434818b
nvme: fix possible hang when ns scanning fails during error recovery

[ Upstream commit 59c7c3caaaf8750df4ec3255082f15eb4e371514 ]

When the controller is reconnecting, the host fails I/O and admin
commands as the host cannot reach the controller. ns scanning may
revalidate namespaces during that period and it is wrong to remove
namespaces due to these failures as we may hang (see 205da2434301).

One command that may fail is nvme_identify_ns_descs. Since we return
success due to having ns identify descriptor list optional, we continue
to compare ns identifiers in nvme_revalidate_disk, obviously fail and
return -ENODEV to nvme_validate_ns, which will remove the namespace.

Exactly what we don't want to happen.

Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional")
Tested-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/nvme/host/core.c