scsi: ufs: Fix a race condition between error handler and runtime PM ops
authorCan Guo <cang@codeaurora.org>
Sun, 9 Aug 2020 12:15:54 +0000 (05:15 -0700)
committerMartin K. Petersen <martin.petersen@oracle.com>
Tue, 18 Aug 2020 00:54:56 +0000 (20:54 -0400)
commit5586dd8ea250ab0851caeab6f461a9dbf57c806f
tree34278294e86a65001c310e545efa4d927dfb929d
parentc3be8d1ee1bff7c6f7e588d2a87f076148f40326
scsi: ufs: Fix a race condition between error handler and runtime PM ops

The current IRQ handler blocks SCSI requests before scheduling eh_work,
when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume
sends a SCSI cmd, most likely the SSU cmd, since SCSI requests are blocked,
pm_runtime_get_sync() will never return because ufshcd_suspend/resume is
blocked by the SCSI cmd.

 - In queuecommand path, hba->ufshcd_state check and ufshcd_send_command
   should stay under the same spin lock. This is to make sure that no more
   commands leak into doorbell after hba->ufshcd_state is changed.

 - Don't block SCSI requests before error handler starts to run, let error
   handler block SCSI requests when it is ready to start error recovery.

 - Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime
   PM ops, let them pass or fail them. Let them pass if eh_work is
   scheduled due to non-fatal errors. Fail them if eh_work is scheduled due
   to fatal errors, otherwise the cmds may eventually time out since UFS is
   in bad state, which gets error handler blocked for too long. If we fail
   the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails
   too, but it does not hurt since error handler can recover HBA runtime PM
   error.

Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/ufs/ufshcd.c