scsi: cxlflash: Synchronize reset and remove ops
author Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Mon, 26 Mar 2018 16:35:27 +0000 (11:35 -0500)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 3 Aug 2018 05:50:41 +0000 (07:50 +0200)
commit bb7cccb01c84e0b7c0a4d43fa7b25510e56e92df
tree 29bee2d981dec5c696a8ef0931320a2194ce413c
parent 07b2a0d0018381c7b9760e6d3ceb72f683ad46ba

[ Upstream commit a3feb6ef50def7c91244d7bd15a3625b7b49b81f ]

The following Oops can be encountered if a device removal or system shutdown
is initiated while an EEH recovery is in progress:

[c000000ff2f479c0] c008000015256f18 cxlflash_pci_slot_reset+0xa0/0x100
                                      [cxlflash]
[c000000ff2f47a30] c00800000dae22e0 cxl_pci_slot_reset+0x168/0x290 [cxl]
[c000000ff2f47ae0] c00000000003ef1c eeh_report_reset+0xec/0x170
[c000000ff2f47b20] c00000000003d0b8 eeh_pe_dev_traverse+0x98/0x170
[c000000ff2f47bb0] c00000000003f80c eeh_handle_normal_event+0x56c/0x580
[c000000ff2f47c60] c00000000003fba4 eeh_handle_event+0x2a4/0x338
[c000000ff2f47d10] c0000000000400b8 eeh_event_handler+0x1f8/0x200
[c000000ff2f47dc0] c00000000013da48 kthread+0x1a8/0x1b0
[c000000ff2f47e30] c00000000000b528 ret_from_kernel_thread+0x5c/0xb4

The remove handler frees AFU memory while the EEH recovery is in progress,
leading to a race condition. This can result in a crash if the recovery thread
tries to access this memory.

To resolve this issue, the cxlflash remove handler will evaluate the device
state and yield to any active reset or probing threads.
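
The idea of yielding to an active reset or probe can be illustrated with a
minimal userspace sketch. This is not the driver code: it assumes POSIX
threads in place of kernel primitives, and every name in it (struct dev_cfg,
the DEV_* states, dev_remove, recovery_thread) is hypothetical and chosen
only for illustration. The remove path waits until the state is neither
"reset" nor "probing" before it frees the memory that the recovery path uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical device states; illustrative only, not taken from the driver. */
enum dev_state { DEV_PROBING, DEV_NORMAL, DEV_RESET, DEV_REMOVED };

struct dev_cfg {
	pthread_mutex_t lock;
	pthread_cond_t state_changed;	/* signalled on every state change */
	enum dev_state state;
	int *afu;			/* stands in for the AFU memory */
};

void dev_init(struct dev_cfg *cfg, enum dev_state initial)
{
	pthread_mutex_init(&cfg->lock, NULL);
	pthread_cond_init(&cfg->state_changed, NULL);
	cfg->state = initial;
	cfg->afu = calloc(1, sizeof(*cfg->afu));
}

void dev_set_state(struct dev_cfg *cfg, enum dev_state s)
{
	pthread_mutex_lock(&cfg->lock);
	cfg->state = s;
	pthread_cond_broadcast(&cfg->state_changed);
	pthread_mutex_unlock(&cfg->lock);
}

/* Recovery path: uses the AFU memory, then leaves the reset state. */
void *recovery_thread(void *arg)
{
	struct dev_cfg *cfg = arg;

	/* Remove is blocked while state == DEV_RESET, so this is still valid. */
	assert(cfg->afu != NULL);
	*cfg->afu += 1;
	dev_set_state(cfg, DEV_NORMAL);
	return NULL;
}

/* Remove path: yield to reset/probe before tearing anything down. */
void dev_remove(struct dev_cfg *cfg)
{
	pthread_mutex_lock(&cfg->lock);
	while (cfg->state == DEV_RESET || cfg->state == DEV_PROBING)
		pthread_cond_wait(&cfg->state_changed, &cfg->lock);
	cfg->state = DEV_REMOVED;
	free(cfg->afu);
	cfg->afu = NULL;
	pthread_mutex_unlock(&cfg->lock);
}
```

In this sketch, a caller that starts recovery_thread on a device in DEV_RESET
and then calls dev_remove will block in dev_remove until the recovery thread
has finished with the memory and moved the state to DEV_NORMAL. Without the
wait loop, the free could run first and the recovery thread would touch freed
memory, which is the shape of the race described above.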

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/scsi/cxlflash/main.c