net/mlx5: Fix health error state handling
author Shay Drory <shayd@nvidia.com>
Mon, 23 Nov 2020 06:39:10 +0000 (08:39 +0200)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 4 Mar 2021 10:37:31 +0000 (11:37 +0100)
commit 27c79b3a9212cf4ba634c157e07d29548181a208
tree 6ce5c7f91a5ed1da8e1f9a34dba3517098a8051e
parent ae624d4bd9b6ce159f5b21ba5a321073773e5354

[ Upstream commit 51d138c2610a236c1ed0059d034ee4c74f452b86 ]

Currently, when we discover a fatal error, we queue a work item that
waits for a lock before moving the device to error state. Meanwhile,
FW commands are still being processed and time out. This can block the
driver for a few minutes before the work manages to take the lock and
enter error state.

Set the device to error state before queueing the health work, so that
FW commands are not processed while the work is waiting for the lock.
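
The ordering described above can be illustrated with a small userspace
sketch. It is not the code from health.c: the structure, the function
names (fake_dev, fake_cmd_exec, fake_poll_health, fake_health_work) and
the boolean standing in for the queued work are all hypothetical, and
the -EIO return value is an assumption chosen for illustration. The only
point is that the state flag is set before the work is queued, so the
command path can fail fast instead of waiting for a timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified device model; not the real mlx5 structures. */
enum dev_state { DEV_STATE_UP, DEV_STATE_INTERNAL_ERROR };

struct fake_dev {
	_Atomic int state;        /* read by the command path without a lock */
	unsigned int fatal_error; /* first fatal error seen, 0 if none */
	bool health_work_queued;  /* stands in for a queued work item */
};

/* Command path: refuse to talk to FW once the device is marked errored. */
static int fake_cmd_exec(struct fake_dev *dev)
{
	if (atomic_load(&dev->state) == DEV_STATE_INTERNAL_ERROR)
		return -EIO; /* fail fast instead of waiting for a timeout */
	return 0;            /* pretend FW completed the command */
}

/* Health poll: mark the error state first, then defer the slow part. */
static void fake_poll_health(struct fake_dev *dev, unsigned int fatal_error)
{
	if (!fatal_error || dev->fatal_error)
		return; /* nothing new to report */

	dev->fatal_error = fatal_error;
	atomic_store(&dev->state, DEV_STATE_INTERNAL_ERROR);
	dev->health_work_queued = true; /* a real driver would queue work here */
}

/* Deferred work: by the time it runs, the state must already be set. */
static void fake_health_work(struct fake_dev *dev)
{
	assert(atomic_load(&dev->state) == DEV_STATE_INTERNAL_ERROR);
	/* A real driver would take its lock and run recovery here. */
	dev->health_work_queued = false;
}
```

In this sketch, a command issued between fake_poll_health() and
fake_health_work() is rejected immediately, which is the window the
commit message is concerned with.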

Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/net/ethernet/mellanox/mlx5/core/health.c