scsi: lpfc: Fix crash when a fabric node is released prematurely
authorJames Smart <jsmart2021@gmail.com>
Mon, 4 Jan 2021 18:02:29 +0000 (10:02 -0800)
committerMartin K. Petersen <martin.petersen@oracle.com>
Fri, 8 Jan 2021 04:02:35 +0000 (23:02 -0500)
commit07aaefdf75c50b55e1f1e1c904fa6d00466e0a75
tree2e9319d64373ce5d132384e8ffade776f3ae1aa7
parentecf041fe9895c3f5c0119fb427fa634a275c9382
scsi: lpfc: Fix crash when a fabric node is released prematurely

The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport.  This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO.  The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI.  If a LIP is issued, the driver attempts to cleanup
on Link Up and ends up executing too many puts.

Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence dev_loss
event.  The driver cannot count on any persistence from fabric controller
nodes.

Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/lpfc/lpfc_hbadisc.c
drivers/scsi/lpfc/lpfc_nportdisc.c