md: fix deadlock caused by sysfs_notify
The following deadlock was captured. The first process is holding
'kernfs_mutex' and is hung waiting for io. That io is staged in
'r1conf.pending_bio_list' of the raid1 device. The pending bio list
should be flushed by the second process, 'md127_raid1', but that process
is itself hung waiting for 'kernfs_mutex'. Using
sysfs_notify_dirent_safe() in place of sysfs_notify() fixes it. There
were other sysfs_notify() calls invoked from the io path; remove all of
them.
PID: 40430  TASK: ffff8ee9c8c65c40  CPU: 29  COMMAND: "probe_file"
 #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec
 #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06
 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6
 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs]
 #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs]
 #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs]
 #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs]
 #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs]
 #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs]
 #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7
#10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93
#11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968
#12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2
#13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445
#14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f
#15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a
#16 [ffffb87c4df37958] new_slab at ffffffff9a251430
#17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85
#18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d
#19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89
#20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10
#21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854
#22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377
#23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b
#24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118
#25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83
#26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619
#27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af
#28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566
#29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787
#30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d
#31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e
#32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949
#33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad
    RIP: 00007f617a5f2905  RSP: 00007f607334f838  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f6064044b20  RCX: 00007f617a5f2905
    RDX: 00007f6064044b20  RSI: 00007f6064044b20  RDI: 00007f6064005890
    RBP: 00007f6064044aa0   R8: 0000000000000030   R9: 000000000000011c
    R10: 0000000000000013  R11: 0000000000000246  R12: 00007f606417e6d0
    R13: 00007f6064044aa0  R14: 00007f6064044b10  R15: 00000000ffffffff
    ORIG_RAX: 0000000000000006  CS: 0033  SS: 002b

PID: 927  TASK: ffff8f15ac5dbd80  CPU: 42  COMMAND: "md127_raid1"
 #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec
 #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06
 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e
 #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc
 #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013
 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f
 #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83
 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a
 #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696
 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5
#10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c
#11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1]
#12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348
#13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005
#14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Song Liu <songliubraving@fb.com>