IB/mlx5: Fetch soft WQE's on fatal error state
authorErez Shitrit <erezsh@mellanox.com>
Mon, 21 May 2018 08:41:01 +0000 (11:41 +0300)
committerJason Gunthorpe <jgg@mellanox.com>
Thu, 24 May 2018 15:39:25 +0000 (09:39 -0600)
commit7b74a83cf54a3747e22c57e25712bd70eef8acee
treeb31c733af4c47d3a993bc609b8c4e1d0eafca589
parent7a8690ed6f5346f6738971892205e91d39b6b901
IB/mlx5: Fetch soft WQE's on fatal error state

On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.

For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.

Without the fix the next deadlock might appears:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
drivers/infiniband/hw/mlx5/cq.c