nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
authorChristoph Hellwig <hch@lst.de>
Thu, 13 Jul 2023 13:30:42 +0000 (15:30 +0200)
committerKeith Busch <kbusch@kernel.org>
Thu, 13 Jul 2023 15:12:50 +0000 (08:12 -0700)
While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turn out they are extremely
common for cheap end user NVMe devices.

Relax our check for them for so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead.  In doubt
we'd still like to see quirk entries to disable the potential for
changing supposed stable device identifier links, but this will at least
allow users how have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.

Fixes: 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: stable@vger.kernel.org # 6.0+
Co-developed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
drivers/nvme/host/core.c

index 47d7ba2827ff29a84dc4c871fae307b701690627..37b6fa7466620436b2ab23cd00eb0de91309bb30 100644 (file)
@@ -3431,10 +3431,40 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 
        ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
        if (ret) {
-               dev_err(ctrl->device,
-                       "globally duplicate IDs for nsid %d\n", info->nsid);
+               /*
+                * We've found two different namespaces on two different
+                * subsystems that report the same ID.  This is pretty nasty
+                * for anything that actually requires unique device
+                * identification.  In the kernel we need this for multipathing,
+                * and in user space the /dev/disk/by-id/ links rely on it.
+                *
+                * If the device also claims to be multi-path capable back off
+                * here now and refuse the probe the second device as this is a
+                * recipe for data corruption.  If not this is probably a
+                * cheap consumer device if on the PCIe bus, so let the user
+                * proceed and use the shiny toy, but warn that with changing
+                * probing order (which due to our async probing could just be
+                * device taking longer to startup) the other device could show
+                * up at any time.
+                */
                nvme_print_device_info(ctrl);
-               return ret;
+               if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || /* !PCIe */
+                   ((ns->ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) &&
+                    info->is_shared)) {
+                       dev_err(ctrl->device,
+                               "ignoring nsid %d because of duplicate IDs\n",
+                               info->nsid);
+                       return ret;
+               }
+
+               dev_err(ctrl->device,
+                       "clearing duplicate IDs for nsid %d\n", info->nsid);
+               dev_err(ctrl->device,
+                       "use of /dev/disk/by-id/ may cause data corruption\n");
+               memset(&info->ids.nguid, 0, sizeof(info->ids.nguid));
+               memset(&info->ids.uuid, 0, sizeof(info->ids.uuid));
+               memset(&info->ids.eui64, 0, sizeof(info->ids.eui64));
+               ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
        }
 
        mutex_lock(&ctrl->subsys->lock);