scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
authorJames Smart <jsmart2021@gmail.com>
Wed, 20 Oct 2021 21:14:16 +0000 (14:14 -0700)
committerMartin K. Petersen <martin.petersen@oracle.com>
Thu, 21 Oct 2021 03:33:46 +0000 (23:33 -0400)
commitaf984c87293b19dccbd0b16afc57c5c9a4a279c7
tree3c1ab333dbcece2368161c75758097ca57deeb9a
parent1854f53ccd88ad4e7568ddfafafffe71f1ceb0a6
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss

A link bounce to a slow fabric may observe FDISC response delays lasting
longer than devloss tmo.  Current logic decrements the final fabric node
kref during a devloss tmo event.  This results in a NULL ptr dereference
crash if the FDISC completes for that fabric node after devloss tmo.

Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when
devloss tmo triggers and we've noticed that fabric node recovery has
already started or finished in between the time lpfc_dev_loss_tmo_callbk
queues lpfc_dev_loss_tmo_handler.  If fabric node recovery succeeds, then
the driver reverses the devloss tmo marked kref put with a kref get.  If
fabric node recovery fails, then the final kref put relies on the ELS
timing out or the REG_LOGIN cmpl routine.

Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/lpfc/lpfc_crtn.h
drivers/scsi/lpfc/lpfc_disc.h
drivers/scsi/lpfc/lpfc_els.c
drivers/scsi/lpfc/lpfc_hbadisc.c
drivers/scsi/lpfc/lpfc_init.c
drivers/scsi/lpfc/lpfc_scsi.c