PCI: Restore ARI Capable Hierarchy before setting numVFs
authorTony Nguyen <anthony.l.nguyen@intel.com>
Wed, 4 Oct 2017 15:52:58 +0000 (08:52 -0700)
committerBjorn Helgaas <bhelgaas@google.com>
Wed, 11 Oct 2017 00:15:29 +0000 (19:15 -0500)
In the restore path, we previously read PCI_SRIOV_VF_OFFSET and
PCI_SRIOV_VF_STRIDE before restoring PCI_SRIOV_CTRL_ARI:

  pci_restore_state
    pci_restore_iov_state
      sriov_restore_state
        pci_iov_set_numvfs
          pci_read_config_word(... PCI_SRIOV_VF_OFFSET, &iov->offset)
          pci_read_config_word(... PCI_SRIOV_VF_STRIDE, &iov->stride)
        pci_write_config_word(... PCI_SRIOV_CTRL, iov->ctrl)

But per SR-IOV r1.1, sec 3.3.3.5, the device can use PCI_SRIOV_CTRL_ARI to
determine PCI_SRIOV_VF_OFFSET and PCI_SRIOV_VF_STRIDE.  Therefore, this
path, which is used for suspend/resume and AER recovery, can corrupt
iov->offset and iov->stride.

Since the iov state is associated with the device, not the driver, if we
reload the driver, it will use the the corrupted data, which may cause
crashes like this:

  kernel BUG at drivers/pci/iov.c:157!
  RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
  Call Trace:
   pci_enable_sriov+0x353/0x440
   ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
   sriov_numvfs_store+0xf7/0x170
   dev_attr_store+0x18/0x30
   sysfs_kf_write+0x37/0x40
   kernfs_fop_write+0x120/0x1b0
   vfs_write+0xb5/0x1a0
   SyS_write+0x55/0xc0

Restore PCI_SRIOV_CTRL_ARI before calling pci_iov_set_numvfs(), then
restore the rest of PCI_SRIOV_CTRL (which may set PCI_SRIOV_CTRL_VFE)
afterwards.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
[bhelgaas: changelog, add comment, also clear ARI if necessary]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Emil Tantilov <emil.s.tantilov@intel.com>
drivers/pci/iov.c

index ce24cf235f010a132915214e4e98f5374fe55a45..6bacb8995e9641dfc0ab98d6a5e9f032bbd7f98e 100644 (file)
@@ -498,6 +498,14 @@ static void sriov_restore_state(struct pci_dev *dev)
        if (ctrl & PCI_SRIOV_CTRL_VFE)
                return;
 
+       /*
+        * Restore PCI_SRIOV_CTRL_ARI before pci_iov_set_numvfs() because
+        * it reads offset & stride, which depend on PCI_SRIOV_CTRL_ARI.
+        */
+       ctrl &= ~PCI_SRIOV_CTRL_ARI;
+       ctrl |= iov->ctrl & PCI_SRIOV_CTRL_ARI;
+       pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, ctrl);
+
        for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
                pci_update_resource(dev, i);