This patch fixes a race between queue_work() in
_dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can
take the final reference of a dlm_msg and so msg->idx can contain
garbage which is signaled by the following warning:
[ 676.237050] ------------[ cut here ]------------
[ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50
[ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr
[ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546
[ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+
033991b0 04/01/2014
[ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50
[ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 676.249253] RSP: 0018:
ffffa401c18ffc68 EFLAGS:
00010282
[ 676.249855] RAX:
0000000000000001 RBX:
00000000ffff8b76 RCX:
0000000000000006
[ 676.250713] RDX:
0000000000000000 RSI:
ffffffffbccf3a10 RDI:
ffffffffbcc7b62e
[ 676.251610] RBP:
ffffa401c18ffc70 R08:
0000000000000001 R09:
0000000000000001
[ 676.252481] R10:
0000000000000001 R11:
0000000000000001 R12:
0000000000000005
[ 676.253421] R13:
ffff8b76786ec370 R14:
ffff8b76786ec370 R15:
ffff8b76786ec480
[ 676.254257] FS:
0000000000000000(0000) GS:
ffff8b7777800000(0000) knlGS:
0000000000000000
[ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 676.255897] CR2:
00005590205d88b8 CR3:
000000017656c003 CR4:
0000000000770ee0
[ 676.256734] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 676.257567] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 676.258397] PKRU:
55555554
[ 676.258729] Call Trace:
[ 676.259063] <TASK>
[ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110
[ 676.259964] queue_bast+0x8b/0xb0
[ 676.260423] grant_pending_locks+0x166/0x1b0
[ 676.261007] _unlock_lock+0x75/0x90
[ 676.261469] unlock_lock.isra.57+0x62/0xa0
[ 676.262009] dlm_unlock+0x21e/0x330
[ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture]
[ 676.263815] ? preempt_count_sub+0xba/0x100
[ 676.264361] ? complete+0x1d/0x60
[ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture]
[ 676.265555] kthread+0x10a/0x130
[ 676.266007] ? kthread_complete_and_exit+0x20/0x20
[ 676.266616] ret_from_fork+0x22/0x30
[ 676.267097] </TASK>
[ 676.267381] irq event stamp:
9579855
[ 676.267824] hardirqs last enabled at (
9579863): [<
ffffffffbb14e6f8>] __up_console_sem+0x58/0x60
[ 676.268896] hardirqs last disabled at (
9579872): [<
ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60
[ 676.270008] softirqs last enabled at (
9579798): [<
ffffffffbc200349>] __do_softirq+0x349/0x4c7
[ 676.271438] softirqs last disabled at (
9579897): [<
ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0
[ 676.272796] ---[ end trace
0000000000000000 ]---
I reproduced this warning with dlm_locktorture test which is currently
not upstream. However this patch fix the issue by make a additional
refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg().
In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be
the final put.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>