nvme: add error message on mismatching controller ids
author James Smart <jsmart2021@gmail.com>
Thu, 21 Nov 2019 17:58:10 +0000 (09:58 -0800)
committer Keith Busch <kbusch@kernel.org>
Tue, 26 Nov 2019 17:48:33 +0000 (02:48 +0900)
We've seen a few devices that return different controller ids to
the Fabric Connect command vs the Identify(controller) command. It's
currently hard to identify this failure from the existing error messages:
it comes across as a (re)connect attempt in the transport that fails with
a -22 (-EINVAL) status. The issue is compounded by older kernels that
either lacked the controller id check or let the identify command
overwrite the fabrics controller id value before it was checked. In both
cases the devices appeared to work fine until more recent kernels.

Clarify the rejection by adding an error message on controller id mismatches.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
drivers/nvme/host/core.c

index 8e85274..e6ee343 100644
@@ -2862,6 +2862,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
                 * admin connect
                 */
                if (ctrl->cntlid != le16_to_cpu(id->cntlid)) {
+                       dev_err(ctrl->device,
+                               "Mismatching cntlid: Connect %u vs Identify "
+                               "%u, rejecting\n",
+                               ctrl->cntlid, le16_to_cpu(id->cntlid));
                        ret = -EINVAL;
                        goto out_free;
                }
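
For readers who want to experiment with the check outside the kernel, the following is a minimal userspace sketch of the comparison made in the hunk above. It is not the driver code: `le16_decode()` and `check_cntlid()` are hypothetical names introduced here for illustration, `le16_decode()` only stands in for the kernel's `le16_to_cpu()`, and `fprintf(stderr, ...)` stands in for `dev_err()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for le16_to_cpu(): decode a little-endian
 * 16-bit field from raw bytes, regardless of host byte order.
 */
static uint16_t le16_decode(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
}

/*
 * Hypothetical model of the check: compare the controller id obtained
 * from the Fabrics Connect response with the one reported in the
 * Identify Controller data. Returns 0 on match, -EINVAL on mismatch.
 */
static int check_cntlid(uint16_t connect_cntlid, const uint8_t identify_raw[2])
{
	uint16_t identify_cntlid = le16_decode(identify_raw);

	if (connect_cntlid != identify_cntlid) {
		/* Userspace substitute for dev_err(ctrl->device, ...) */
		fprintf(stderr,
			"Mismatching cntlid: Connect %u vs Identify %u, rejecting\n",
			(unsigned int)connect_cntlid,
			(unsigned int)identify_cntlid);
		return -EINVAL;
	}
	return 0;
}
```

Calling `check_cntlid(1, raw)` with `raw` holding `{ 0x02, 0x00 }` models a device that answered Connect with id 1 but reports id 2 in Identify. The sketch logs the mismatch and returns -EINVAL, which is the -22 status the commit message describes.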