From 565b7b2d2e632b5792879c0c9cccdd9eecd31195 Mon Sep 17 00:00:00 2001 From: Konstantin Khorenko Date: Thu, 24 Jun 2010 21:54:58 -0700 Subject: [PATCH] tcp: do not send reset to already closed sockets i've found that tcp_close() can be called for an already closed socket, but still sends reset in this case (tcp_send_active_reset()) which seems to be incorrect. Moreover, a packet with reset is sent with different source port as original port number has been already cleared on socket. Besides that incrementing stat counter for LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case. Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same seems to be true for the current mainstream kernel (checked on 2.6.35-rc3). Please, correct me if i missed something. How that happens: 1) the server receives a packet for socket in TCP_CLOSE_WAIT state that triggers a tcp_reset(): Call Trace: [] tcp_reset+0x12f/0x1e8 [] tcp_rcv_state_process+0x1c0/0xa08 [] tcp_v4_do_rcv+0x310/0x37a [] tcp_v4_rcv+0x74d/0xb43 [] ip_local_deliver_finish+0x0/0x259 [] ip_local_deliver+0x200/0x2f4 [] ip_rcv+0x64c/0x69f [] netif_receive_skb+0x4c4/0x4fa [] process_backlog+0x90/0xec [] net_rx_action+0xbb/0x1f1 [] __do_softirq+0xf5/0x1ce [] handle_IRQ_event+0x56/0xb0 [] call_softirq+0x1c/0x28 [] do_softirq+0x2c/0x85 [] do_IRQ+0x149/0x152 [] ret_from_intr+0x0/0xa [] __handle_mm_fault+0x6cd/0x1303 [] __handle_mm_fault+0x5a2/0x1303 [] cache_free_debugcheck+0x21f/0x22e [] do_page_fault+0x49a/0x7dc [] thread_return+0x89/0x174 [] audit_syscall_exit+0x341/0x35c [] error_exit+0x0/0x84 tcp_rcv_state_process() ... // (sk_state == TCP_CLOSE_WAIT here) ... /* step 2: check RST bit */ if(th->rst) { tcp_reset(sk); goto discard; } ... --------------------------------- tcp_rcv_state_process tcp_reset tcp_done tcp_set_state(sk, TCP_CLOSE); inet_put_port __inet_put_port inet_sk(sk)->num = 0; sk->sk_shutdown = SHUTDOWN_MASK; 2) After that the process (socket owner) tries to write something to that socket and "inet_autobind" sets a _new_ (which differs from the original!) port number for the socket: Call Trace: [] inet_bind_hash+0x33/0x5f [] inet_csk_get_port+0x216/0x268 [] inet_autobind+0x22/0x8f [] inet_sendmsg+0x27/0x57 [] do_sock_write+0xae/0xea [] sock_writev+0xdc/0xf6 [] _spin_lock_irqsave+0x9/0xe [] __pollwait+0x0/0xdd [] default_wake_function+0x0/0xe [] autoremove_wake_function+0x0/0x2e [] do_readv_writev+0x163/0x274 [] thread_return+0x13a/0x174 [] tcp_poll+0x0/0x1c9 [] audit_syscall_entry+0x180/0x1b3 [] sys_writev+0x49/0xe4 [] tracesys+0xd5/0xe0 3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace): F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3 Call Trace: [] tcp_sendmsg+0xcb/0xe87 [] release_sock+0x10/0xae [] vgacon_cursor+0x0/0x1a7 [] inet_autobind+0x8b/0x8f [] do_sock_write+0xae/0xea [] sock_writev+0xdc/0xf6 [] _spin_lock_irqsave+0x9/0xe [] __pollwait+0x0/0xdd [] default_wake_function+0x0/0xe [] autoremove_wake_function+0x0/0x2e [] do_readv_writev+0x163/0x274 [] thread_return+0x13a/0x174 [] tcp_poll+0x0/0x1c9 [] audit_syscall_entry+0x180/0x1b3 [] sys_writev+0x49/0xe4 [] tracesys+0xd5/0xe0 tcp_sendmsg() ... /* Wait for a connection to finish. */ if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { int old_state = sk->sk_state; if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) { if (f_d && (err == -EPIPE)) { printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, " "sk_err=%d, sk_shutdown=%d\n", sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state, sk->sk_err, sk->sk_shutdown); dump_stack(); } goto out_err; } } ... 4) Then the process (socket owner) understands that it's time to close that socket and does that (and thus triggers sending reset packet): Call Trace: ... [] dev_queue_xmit+0x343/0x3d6 [] ip_output+0x351/0x384 [] dst_output+0x0/0xe [] ip_queue_xmit+0x567/0x5d2 [] vprintk+0x21/0x33 [] check_poison_obj+0x2e/0x206 [] poison_obj+0x36/0x45 [] tcp_send_active_reset+0x15/0x14d [] dbg_redzone1+0x1c/0x25 [] tcp_send_active_reset+0x15/0x14d [] cache_alloc_debugcheck_after+0x189/0x1c8 [] tcp_transmit_skb+0x764/0x786 [] tcp_send_active_reset+0xf9/0x14d [] tcp_close+0x39a/0x960 [] inet_release+0x69/0x80 [] sock_release+0x4f/0xcf [] sock_close+0x2c/0x30 [] __fput+0xac/0x197 [] filp_close+0x59/0x61 [] sys_close+0x85/0xc7 [] tracesys+0xd5/0xe0 So, in brief: * a received packet for socket in TCP_CLOSE_WAIT state triggers tcp_reset() which clears inet_sk(sk)->num and put socket into TCP_CLOSE state * an attempt to write to that socket forces inet_autobind() to get a new port (but the write itself fails with -EPIPE) * tcp_close() called for socket in TCP_CLOSE state sends an active reset via socket with newly allocated port This adds an additional check in tcp_close() for already closed sockets. We do not want to send anything to closed sockets. Signed-off-by: Konstantin Khorenko Signed-off-by: David S. Miller --- net/ipv4/tcp.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 779d40c..d5f84bd 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1898,6 +1898,10 @@ void tcp_close(struct sock *sk, long timeout) sk_mem_reclaim(sk); + /* If socket has been already reset (e.g. in tcp_reset()) - kill it. */ + if (sk->sk_state == TCP_CLOSE) + goto adjudge_to_death; + /* As outlined in RFC 2525, section 2.17, we send a RST here because * data was lost. To witness the awful effects of the old behavior of * always doing a FIN, run an older 2.1.x kernel or 2.0.x, start a bulk -- 2.7.4