habanalabs/gaudi2: find decode error root cause
author Koby Elbaz <kelbaz@habana.ai>
Sun, 15 Jan 2023 10:38:53 +0000 (12:38 +0200)
committer Oded Gabbay <ogabbay@kernel.org>
Thu, 26 Jan 2023 09:52:14 +0000 (11:52 +0200)
commit f7d67c1cfdccfe7168d28c26b935c9da18bfdb8c
tree cb8a6c3c1036bd726eeaf06bef1fc43b9678712f
parent ce582bea86bf0c7ae6f8269873bd82dbc0158e53

When a decode error occurs, we often know neither its exact root
cause (the erroneous address that was accessed) nor the exact engine
that issued the erroneous transaction.

To find out, we need to go over all the relevant register blocks
in the ASIC. Once we find the relevant engine, we print its details
and the offending address.
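
The scan described above can be sketched roughly as follows. This is a
minimal illustration only, not the code added by this commit: the
struct layout, the field names, the ERR_CAUSE_DECODE bit position and
both helper functions are hypothetical, and the register blocks are
modeled as a plain in-memory array rather than real MMIO reads.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of one register block; NOT the real Gaudi2 layout. */
struct special_block {
	const char *engine_name; /* illustrative label for the owning engine */
	uint32_t err_cause;      /* non-zero when this block latched an error */
	uint32_t err_addr;       /* address latched together with the error */
};

/* Assumed bit position for "decode error" in the cause register. */
#define ERR_CAUSE_DECODE (1u << 0)

/* Go over all blocks; return the index of the first one that reports a
 * decode error, or -1 if none does. */
static int find_decode_err_block(const struct special_block *blocks, size_t n)
{
	assert(blocks != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (blocks[i].err_cause & ERR_CAUSE_DECODE)
			return (int)i;

	return -1;
}

/* Print the engine details and the offending address, if found. */
static int report_decode_err(const struct special_block *blocks, size_t n)
{
	int idx = find_decode_err_block(blocks, n);

	if (idx < 0) {
		printf("decode error: no block reported a cause\n");
		return idx;
	}

	printf("decode error: engine %s, address 0x%08" PRIx32 "\n",
	       blocks[idx].engine_name, blocks[idx].err_addr);
	return idx;
}
```

In a real driver the per-block values would come from register reads
and the cause would typically be cleared after reporting; both are left
out of this sketch.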

This helps tremendously when debugging an error that was triggered
by running a user workload.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
drivers/accel/habanalabs/common/habanalabs.h
drivers/accel/habanalabs/common/security.c
drivers/accel/habanalabs/common/security.h [new file with mode: 0644]
drivers/accel/habanalabs/gaudi2/gaudi2.c
drivers/accel/habanalabs/gaudi2/gaudi2P.h
drivers/accel/habanalabs/include/gaudi2/gaudi2_special_blocks.h [new file with mode: 0644]