From 068c29a248b6ddbfdf7bb150b547569759620d36 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Mon, 22 Jun 2020 19:35:23 +0800 Subject: [PATCH] PCI/ERR: Clear PCIe Device Status errors only if OS owns AER pcie_clear_device_status() resets the error bits in the PCIe Device Status Register (PCI_EXP_DEVSTA). Previously we did this unconditionally, but on ACPI systems, the _OSC AER bit negotiates control of the AER capability. Per sec 4.5.1 of the System Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers other error enable/status bits including the following: Correctable Error Reporting Enable Non-Fatal Error Reporting Enable Fatal Error Reporting Enable Unsupported Request Reporting Enable These bits are all in the PCIe Device Control register (the ECN omitted "Reporting", but I think that's a typo), so by implication the _OSC AER bit also applies to the error status bits in the PCIe Device Status register: Correctable Error Detected Non-Fatal Error Detected Fatal Error Detected Unsupported Request Detected Clear the PCIe Device Status error bits only when the OS controls the AER capability and related error enable/status bits. If platform firmware controls the AER capability, firmware is responsible for clearing these bits. One call path leading here is: ghes_do_proc ghes_handle_aer aer_recover_queue schedule_work(&aer_recover_work) ... aer_recover_work_func pcie_do_recovery pcie_clear_device_status [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/14076 [bhelgaas: commit log, move test from pcie_clear_device_status() to callers] Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com Signed-off-by: Jonathan Cameron Signed-off-by: Bjorn Helgaas --- drivers/pci/pcie/aer.c | 3 ++- drivers/pci/pcie/err.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6d7783..cccd674 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -939,7 +939,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) if (aer) pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, info->status); - pcie_clear_device_status(dev); + if (pcie_aer_is_native(dev)) + pcie_clear_device_status(dev); } else if (info->severity == AER_NONFATAL) pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); else if (info->severity == AER_FATAL) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 55755bc..c543f41 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -197,7 +197,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast resume message\n"); pci_walk_bus(bus, report_resume, &status); - pcie_clear_device_status(dev); + if (pcie_aer_is_native(dev)) + pcie_clear_device_status(dev); pci_aer_clear_nonfatal_status(dev); pci_info(dev, "device recovery successful\n"); return status; -- 2.7.4