IB/mlx5: Fetch soft WQE's on fatal error state
authorErez Shitrit <erezsh@mellanox.com>
Mon, 21 May 2018 08:41:01 +0000 (11:41 +0300)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 3 Jul 2018 09:23:09 +0000 (11:23 +0200)
commite355402cf19cf30e5f659ac21ddf40ccd7e5bcf0
tree4b44b3367abfb16c5b127a36ad8214c32dffbe18
parent9cac0a08e476775df3ae5f90730a766ea01b6232
IB/mlx5: Fetch soft WQE's on fatal error state

commit 7b74a83cf54a3747e22c57e25712bd70eef8acee upstream.

On fatal error the driver simulates CQE's for ULPs that rely on
completion of all their posted work-request.

For the GSI traffic, the mlx5 has its own mechanism that sends the
completions via software CQE's directly to the relevant CQ.

This should be kept in fatal error too, so the driver should simulate
such CQE's with the specified error state in order to complete GSI QP
work requests.

Without the fix the next deadlock might appears:
        schedule_timeout+0x274/0x350
        wait_for_common+0xec/0x240
        mcast_remove_one+0xd0/0x120 [ib_core]
        ib_unregister_device+0x12c/0x230 [ib_core]
        mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
        mlx5_detach_device+0x184/0x1a0 [mlx5_core]
        mlx5_unload_one+0x308/0x340 [mlx5_core]
        mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/infiniband/hw/mlx5/cq.c