From 7386457af39de4163562ece1fd44b22722805add Mon Sep 17 00:00:00 2001 From: =?utf8?q?Edwin=20T=C3=B6r=C3=B6k?= Date: Mon, 6 Mar 2023 15:48:10 -0500 Subject: [PATCH] DLM: increase socket backlog to avoid hangs with 16 nodes MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit On a 16 node virtual cluster with e1000 NICs joining the 12th node prints SYN flood warnings for the DLM port: Dec 21 01:46:41 localhost kernel: [ 2146.516664] TCP: request_sock_TCP: Possible SYN flooding on port 21064. Sending cookies. Check SNMP counters. And then joining a DLM lockspace hangs: ``` Dec 21 01:49:00 localhost kernel: [ 2285.780913] INFO: task xapi-clusterd:17638 blocked for more than 120 seconds. Dec 21 01:49:00 localhost kernel: [ 2285.786476] Not tainted 4.4.0+10 #1 Dec 21 01:49:00 localhost kernel: [ 2285.789043] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 21 01:49:00 localhost kernel: [ 2285.794611] xapi-clusterd D ffff88001930bc58 0 17638 1 0x00000000 Dec 21 01:49:00 localhost kernel: [ 2285.794615] ffff88001930bc58 ffff880025593800 ffff880022433800 ffff88001930c000 Dec 21 01:49:00 localhost kernel: [ 2285.794617] ffff88000ef4a660 ffff88000ef4a658 ffff880022433800 ffff88000ef4a000 Dec 21 01:49:00 localhost kernel: [ 2285.794619] ffff88001930bc70 ffffffff8159f6b4 7fffffffffffffff ffff88001930bd10 Dec 21 01:49:00 localhost kernel: [ 2285.794644] [] ? printk+0x4d/0x4f Dec 21 01:49:00 localhost kernel: [ 2285.794647] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 Dec 21 01:49:00 localhost kernel: [ 2285.794649] [] wait_for_completion+0x9d/0x110 Dec 21 01:49:00 localhost kernel: [ 2285.794653] [] ? wake_up_q+0x80/0x80 Dec 21 01:49:00 localhost kernel: [ 2285.794661] [] dlm_new_lockspace+0x908/0xac0 [dlm] Dec 21 01:49:00 localhost kernel: [ 2285.794665] [] ? prepare_to_wait_event+0x100/0x100 Dec 21 01:49:00 localhost kernel: [ 2285.794670] [] device_write+0x497/0x6b0 [dlm] Dec 21 01:49:00 localhost kernel: [ 2285.794673] [] ? handle_mm_fault+0x7f0/0x13b0 Dec 21 01:49:00 localhost kernel: [ 2285.794677] [] __vfs_write+0x28/0xd0 Dec 21 01:49:00 localhost kernel: [ 2285.794679] [] ? rw_verify_area+0x6f/0xd0 Dec 21 01:49:00 localhost kernel: [ 2285.794681] [] vfs_write+0xb1/0x190 Dec 21 01:49:00 localhost kernel: [ 2285.794686] [] ? __do_page_fault+0x302/0x420 Dec 21 01:49:00 localhost kernel: [ 2285.794688] [] SyS_write+0x46/0xa0 Dec 21 01:49:00 localhost kernel: [ 2285.794690] [] entry_SYSCALL_64_fastpath+0x12/0x71 ``` The previous limit of 5 seems like an arbitrary number, that doesn't match any known DLM cluster size upper bound limit. Signed-off-by: Edwin Török Cc: Christine Caulfield Cc: David Teigland Cc: cluster-devel@redhat.com Signed-off-by: Alexander Aring Signed-off-by: David Teigland --- fs/dlm/lowcomms.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index c06dd19..d561550 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1814,7 +1814,7 @@ static int dlm_listen_for_all(void) sock->sk->sk_data_ready = lowcomms_listen_data_ready; release_sock(sock->sk); - result = sock->ops->listen(sock, 5); + result = sock->ops->listen(sock, 128); if (result < 0) { dlm_close_sock(&listen_con.sock); return result; -- 2.7.4