From 21fc79336b9587fcc251e77246b68b6e20340146 Mon Sep 17 00:00:00 2001 From: Tomer Tayar Date: Wed, 20 Jul 2022 20:02:20 +0300 Subject: [PATCH] habanalabs/gaudi2: mark PCIE access error as fatal F/W events are enabled in a late phase of the device init, so an event for a PCIE access error during the init, can be received after the init is already done and considered as successful. A resulting device reset, which does the same H/W init, can end similarly with this event right after the reset is done and considered as successful, and a loop of this sequence can continue. To avoid it mark the PCIE access error as a fatal event, so after 2 consecutive events no more resets will be done. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/gaudi2/gaudi2.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c index 2c43ed4..68ab407 100644 --- a/drivers/misc/habanalabs/gaudi2/gaudi2.c +++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c @@ -8532,6 +8532,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR: gaudi2_print_pcie_addr_dec_info(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; break; case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR: -- 2.7.4