net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
author Roy Novich <royno@nvidia.com>
Wed, 26 Oct 2022 13:51:48 +0000 (14:51 +0100)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 3 Nov 2022 14:59:19 +0000 (23:59 +0900)
[ Upstream commit 416ef713631937cf5452476a7f1041a3ae7b06c6 ]

The devlink health fw_fatal reporter state must be set back to "healthy"
by explicitly calling devlink_health_reporter_state_update() after the
PCI error handler has completed recovery. This is needed when the
fw_fatal reporter was triggered by a PCI error: poll health is called
and sets the reporter state to error, and health recovery then fails
(since EEH has not yet re-enabled the PCI device). The PCI handlers
continue the recovery flow and succeed later, but devlink is never
notified. Fix this by adding a devlink state update at the end of the
PCI handler recovery process.

Fixes: 6181e5cb752e ("devlink: add support for reporter recovery completion")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/net/ethernet/mellanox/mlx5/core/main.c

index 1f0156e..d092261 100644
@@ -1682,6 +1682,10 @@ static void mlx5_pci_resume(struct pci_dev *pdev)
 
        err = mlx5_load_one(dev);
 
+       if (!err)
+               devlink_health_reporter_state_update(dev->priv.health.fw_fatal_reporter,
+                                                    DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+
        mlx5_pci_trace(dev, "Done, err = %d, device %s\n", err,
                       !err ? "recovered" : "Failed");
 }
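
The sequence described in the commit message can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the mlx5
or devlink implementation: every type and function below (mock_dev,
mock_load_one, mock_health_recover, mock_pci_resume, run_scenario,
check_scenarios) is a hypothetical stand-in, and the with_fix flag exists only
to compare the behavior before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the devlink reporter states (not kernel types). */
enum reporter_state { REPORTER_HEALTHY, REPORTER_ERROR };

struct mock_dev {
	enum reporter_state fw_fatal_state; /* models the fw_fatal reporter */
	bool pci_enabled;                   /* models whether EEH re-enabled PCI */
};

/* Models device load: it can only succeed once PCI access is back. */
static int mock_load_one(const struct mock_dev *dev)
{
	return dev->pci_enabled ? 0 : -1;
}

/* Models poll health detecting a fatal error and flagging the reporter. */
static void mock_health_report_fatal(struct mock_dev *dev)
{
	dev->fw_fatal_state = REPORTER_ERROR;
}

/* Models the health recovery attempt: state is cleared only on success. */
static int mock_health_recover(struct mock_dev *dev)
{
	int err = mock_load_one(dev);

	if (!err)
		dev->fw_fatal_state = REPORTER_HEALTHY;
	return err;
}

/* Models the PCI resume handler; with_fix adds the state update on success. */
static int mock_pci_resume(struct mock_dev *dev, bool with_fix)
{
	int err = mock_load_one(dev);

	if (!err && with_fix)
		dev->fw_fatal_state = REPORTER_HEALTHY;
	return err;
}

/* Replays the commit message scenario and returns the final reporter state. */
enum reporter_state run_scenario(bool with_fix)
{
	struct mock_dev dev = { REPORTER_HEALTHY, false };

	mock_health_report_fatal(&dev);  /* PCI error triggers fw_fatal */
	(void)mock_health_recover(&dev); /* fails: PCI not re-enabled yet */
	dev.pci_enabled = true;          /* PCI error handling re-enables PCI */
	(void)mock_pci_resume(&dev, with_fix);
	return dev.fw_fatal_state;
}

/* Without the update the reporter stays in error although the device works. */
void check_scenarios(void)
{
	assert(run_scenario(false) == REPORTER_ERROR);
	assert(run_scenario(true) == REPORTER_HEALTHY);
}
```

Calling check_scenarios() from a test program exercises both paths. In this
model, the reporter is left in the error state after a successful resume
unless the resume path updates it itself, which is the gap the hunk above
closes.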