scsi: hisi_sas: Grab sas_dev lock when traversing the members of sas_dev.list
When freeing slots in function slot_complete_v3_hw(), it is possible that
sas_dev.list is being traversed elsewhere, and it may trigger a NULL
pointer exception, such as follows:
==>cq thread ==>scsi_eh_6
==>scsi_error_handler()
==>sas_eh_handle_sas_errors()
==>sas_scsi_find_task()
==>lldd_abort_task()
==>slot_complete_v3_hw() ==>hisi_sas_abort_task()
==>hisi_sas_slot_task_free() ==>dereg_device_v3_hw()
==>list_del_init() ==>list_for_each_entry_safe()
[ 7165.434918] sas: Enter sas_scsi_recover_host busy: 32 failed: 32
[ 7165.434926] sas: trying to find task 0x00000000769b5ba5
[ 7165.434927] sas: sas_scsi_find_task: aborting task 0x00000000769b5ba5
[ 7165.434940] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(
00000000769b5ba5) aborted
[ 7165.434964] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(
00000000c9f7aa07) ignored
[ 7165.434965] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(
00000000e2a1cf01) ignored
[ 7165.434968] Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
[ 7165.434972] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(
0000000022d52d93) ignored
[ 7165.434975] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(
0000000066a7516c) ignored
[ 7165.434976] Mem abort info:
[ 7165.434982] ESR = 0x96000004
[ 7165.434991] Exception class = DABT (current EL), IL = 32 bits
[ 7165.434992] SET = 0, FnV = 0
[ 7165.434993] EA = 0, S1PTW = 0
[ 7165.434994] Data abort info:
[ 7165.434994] ISV = 0, ISS = 0x00000004
[ 7165.434995] CM = 0, WnR = 0
[ 7165.434997] user pgtable: 4k pages, 48-bit VAs, pgdp =
00000000f29543f2
[ 7165.434998] [
0000000000000000] pgd=
0000000000000000
[ 7165.435003] Internal error: Oops:
96000004 [#1] SMP
[ 7165.439863] Process scsi_eh_6 (pid: 4109, stack limit = 0x00000000c43818d5)
[ 7165.468862] pstate:
00c00009 (nzcv daif +PAN +UAO)
[ 7165.473637] pc : dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw]
[ 7165.479443] lr : dereg_device_v3_hw+0x2c/0xa8 [hisi_sas_v3_hw]
[ 7165.485247] sp :
ffff00001d623bc0
[ 7165.488546] x29:
ffff00001d623bc0 x28:
ffffa027d03b9508
[ 7165.493835] x27:
ffff80278ed50af0 x26:
ffffa027dd31e0a8
[ 7165.499123] x25:
ffffa027d9b27f88 x24:
ffffa027d9b209f8
[ 7165.504411] x23:
ffffa027c45b0d60 x22:
ffff80278ec07c00
[ 7165.509700] x21:
0000000000000008 x20:
ffffa027d9b209f8
[ 7165.514988] x19:
ffffa027d9b27f88 x18:
ffffffffffffffff
[ 7165.520276] x17:
0000000000000000 x16:
0000000000000000
[ 7165.525564] x15:
ffff0000091d9708 x14:
ffff0000093b7dc8
[ 7165.530852] x13:
ffff0000093b7a23 x12:
6e7265746e692067
[ 7165.536140] x11:
0000000000000000 x10:
0000000000000bb0
[ 7165.541429] x9 :
ffff00001d6238f0 x8 :
ffffa027d877af00
[ 7165.546718] x7 :
ffffa027d6329600 x6 :
ffff7e809f58ca00
[ 7165.552006] x5 :
0000000000001f8a x4 :
000000000000088e
[ 7165.557295] x3 :
ffffa027d9b27fa8 x2 :
0000000000000000
[ 7165.562583] x1 :
0000000000000000 x0 :
000000003000188e
[ 7165.567872] Call trace:
[ 7165.570309] dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw]
[ 7165.575775] hisi_sas_abort_task+0x248/0x358 [hisi_sas_main]
[ 7165.581415] sas_eh_handle_sas_errors+0x258/0x8e0 [libsas]
[ 7165.586876] sas_scsi_recover_host+0x134/0x458 [libsas]
[ 7165.592082] scsi_error_handler+0xb4/0x488
[ 7165.596163] kthread+0x134/0x138
[ 7165.599380] ret_from_fork+0x10/0x18
[ 7165.602940] Code:
d5033e9f b9000040 aa0103e2 eb03003f (
f9400021)
[ 7165.609004] kernel fault(0x1) notification starting on CPU 75
[ 7165.700728] ---[ end trace
fc042cbbea224efc ]---
[ 7165.705326] Kernel panic - not syncing: Fatal exception
To fix the issue, grab sas_dev lock when traversing the members of
sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid
concurrency of adding and deleting member. When function
hisi_sas_release_tasks() calls hisi_sas_do_release_task() to free slot, the
lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool
parameter need_lock is added.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>