tcp: do not send reset to already closed sockets
I've found that tcp_close() can be called for an already closed
socket, yet it still sends a reset in this case (via
tcp_send_active_reset()), which seems to be incorrect. Moreover, the
reset packet is sent with a different source port, because the original
port number has already been cleared on the socket. Besides that,
incrementing the LINUX_MIB_TCPABORTONCLOSE stat counter also does not
look correct in this case.

Initially this issue was found on a 2.6.18-x RHEL5 kernel, but the same
seems to be true for the current mainline kernel (checked on
2.6.35-rc3). Please correct me if I missed something.
How it happens:
1) The server receives a packet for a socket in TCP_CLOSE_WAIT state,
which triggers tcp_reset():
Call Trace:
 <IRQ>  [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
 [<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
 [<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
 [<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
 [<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
 [<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
 [<ffffffff8003843c>] ip_rcv+0x64c/0x69f
 [<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
 [<ffffffff80032eca>] process_backlog+0x90/0xec
 [<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
 [<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
 [<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
 [<ffffffff8006334c>] call_softirq+0x1c/0x28
 [<ffffffff80070476>] do_softirq+0x2c/0x85
 [<ffffffff80070441>] do_IRQ+0x149/0x152
 [<ffffffff80062665>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
 [<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
 [<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
 [<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
 [<ffffffff80066487>] thread_return+0x89/0x174
 [<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
 [<ffffffff80062e39>] error_exit+0x0/0x84
tcp_rcv_state_process()
	... // (sk_state == TCP_CLOSE_WAIT here)
	...
	/* step 2: check RST bit */
	if (th->rst) {
		tcp_reset(sk);
		goto discard;
	}
	...
---------------------------------
tcp_rcv_state_process
 tcp_reset
  tcp_done
   tcp_set_state(sk, TCP_CLOSE);
    inet_put_port
     __inet_put_port
      inet_sk(sk)->num = 0;
   sk->sk_shutdown = SHUTDOWN_MASK;
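The effect of this call chain on the socket can be sketched in plain
userspace C. This is only an illustrative model, not kernel code: the
struct model_sock type and the model_tcp_reset() helper are
hypothetical names, and only the numeric values of the state and
shutdown constants are taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Numeric values mirror the kernel's TCP_CLOSE, TCP_CLOSE_WAIT and
 * SHUTDOWN_MASK; every name prefixed with model_/MODEL_ is hypothetical. */
enum { MODEL_TCP_CLOSE = 7, MODEL_TCP_CLOSE_WAIT = 8 };
enum { MODEL_RCV_SHUTDOWN = 1, MODEL_SEND_SHUTDOWN = 2, MODEL_SHUTDOWN_MASK = 3 };

struct model_sock {
    int state;          /* stands in for sk->sk_state */
    uint16_t num;       /* stands in for inet_sk(sk)->num; 0 means unbound */
    unsigned shutdown;  /* stands in for sk->sk_shutdown */
};

/* Rough analogue of the tcp_reset() -> tcp_done() path traced above:
 * the socket ends up closed, without a local port, fully shut down. */
void model_tcp_reset(struct model_sock *sk)
{
    assert(sk != NULL);
    sk->state = MODEL_TCP_CLOSE;        /* tcp_set_state(sk, TCP_CLOSE) */
    sk->num = 0;                        /* __inet_put_port() */
    sk->shutdown = MODEL_SHUTDOWN_MASK; /* tcp_done() */
}
```

After model_tcp_reset() the model socket has state 7 and shutdown 3,
the same values that appear as old_state and sk_shutdown in the debug
output of step 3.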
2) After that, the process (socket owner) tries to write something to
that socket, and inet_autobind() sets a _new_ port number (which
differs from the original!) for the socket:
Call Trace:
 [<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
 [<ffffffff80257180>] inet_csk_get_port+0x216/0x268
 [<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
 [<ffffffff80049140>] inet_sendmsg+0x27/0x57
 [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
 [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
 [<ffffffff8001fb49>] __pollwait+0x0/0xdd
 [<ffffffff8008d533>] default_wake_function+0x0/0xe
 [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
 [<ffffffff80066538>] thread_return+0x13a/0x174
 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
 [<ffffffff800622dd>] tracesys+0xd5/0xe0
3) sendmsg finally fails with -EPIPE (so 'write' fails with EPIPE in userspace):
F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3
Call Trace:
 [<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
 [<ffffffff80033300>] release_sock+0x10/0xae
 [<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
 [<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
 [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
 [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
 [<ffffffff8001fb49>] __pollwait+0x0/0xdd
 [<ffffffff8008d533>] default_wake_function+0x0/0xe
 [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
 [<ffffffff80066538>] thread_return+0x13a/0x174
 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
 [<ffffffff800622dd>] tracesys+0xd5/0xe0
tcp_sendmsg()
	...
	/* Wait for a connection to finish. */
	if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
		int old_state = sk->sk_state;
		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
			if (f_d && (err == -EPIPE)) {
				printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
					"sk_err=%d, sk_shutdown=%d\n",
					sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
					sk->sk_err, sk->sk_shutdown);
				dump_stack();
			}
			goto out_err;
		}
	}
	...
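The state test at the top of that excerpt is a bitmask check: each
state number is turned into a one-bit flag and compared against the
set of states in which data may be sent right away. A minimal
standalone sketch follows, assuming the kernel's state numbering
(TCP_ESTABLISHED = 1, TCP_CLOSE = 7, TCP_CLOSE_WAIT = 8); the
MODEL_TCPF() macro and model_must_wait_for_connect() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* State numbers as assumed from the kernel's TCP state enum; the
 * MODEL_ prefix marks these as illustrative, not kernel definitions. */
enum {
    MODEL_TCP_ESTABLISHED = 1,
    MODEL_TCP_CLOSE       = 7,
    MODEL_TCP_CLOSE_WAIT  = 8,
    MODEL_TCP_MAX_STATE   = 11
};

/* One flag bit per state, in the spirit of the kernel's TCPF_* masks. */
#define MODEL_TCPF(state) (1u << (state))

/* True when the socket is in neither ESTABLISHED nor CLOSE_WAIT, i.e.
 * when tcp_sendmsg() would go through sk_stream_wait_connect(). */
bool model_must_wait_for_connect(int state)
{
    assert(state >= 1 && state <= MODEL_TCP_MAX_STATE);
    unsigned sendable = MODEL_TCPF(MODEL_TCP_ESTABLISHED) |
                        MODEL_TCPF(MODEL_TCP_CLOSE_WAIT);
    return (MODEL_TCPF(state) & ~sendable) != 0;
}
```

With old_state=7 from the debug output above, the check is true, so
the write takes the sk_stream_wait_connect() path that returned -EPIPE
in the trace; for a socket still in TCP_CLOSE_WAIT (8) it would be
false.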
4) Then the process (socket owner) decides that it's time to close
that socket and does so, which triggers sending a reset packet:
Call Trace:
...
 [<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
 [<ffffffff80034698>] ip_output+0x351/0x384
 [<ffffffff80251ae9>] dst_output+0x0/0xe
 [<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
 [<ffffffff80095700>] vprintk+0x21/0x33
 [<ffffffff800070f0>] check_poison_obj+0x2e/0x206
 [<ffffffff80013587>] poison_obj+0x36/0x45
 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
 [<ffffffff80023481>] dbg_redzone1+0x1c/0x25
 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
 [<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
 [<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
 [<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
 [<ffffffff80258ff1>] tcp_close+0x39a/0x960
 [<ffffffff8026be12>] inet_release+0x69/0x80
 [<ffffffff80059b31>] sock_release+0x4f/0xcf
 [<ffffffff80059d4c>] sock_close+0x2c/0x30
 [<ffffffff800133c9>] __fput+0xac/0x197
 [<ffffffff800252bc>] filp_close+0x59/0x61
 [<ffffffff8001eff6>] sys_close+0x85/0xc7
 [<ffffffff800622dd>] tracesys+0xd5/0xe0
So, in brief:
* a packet received for a socket in TCP_CLOSE_WAIT state triggers
tcp_reset(), which clears inet_sk(sk)->num and puts the socket into
TCP_CLOSE state
* an attempt to write to that socket forces inet_autobind() to get a
new port (but the write itself fails with -EPIPE)
* tcp_close() called for a socket in TCP_CLOSE state sends an active
reset via the socket with the newly allocated port
This patch adds a check in tcp_close() for already closed
sockets. We do not want to send anything to closed sockets.
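The idea of the check can be sketched as a small userspace model. This
is not the actual diff, which is not reproduced in this message:
struct model_sock, struct model_stats and model_tcp_close() are
hypothetical, and the model assumes that the reset and the
LINUX_MIB_TCPABORTONCLOSE increment both happen on the "unread data at
close" path of tcp_close().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State numbers follow the kernel's numbering; all names are illustrative. */
enum { MODEL_TCP_ESTABLISHED = 1, MODEL_TCP_CLOSE = 7, MODEL_TCP_CLOSE_WAIT = 8 };

struct model_sock {
    int state;             /* stands in for sk->sk_state */
    bool data_was_unread;  /* receive queue held data when close() ran */
};

struct model_stats {
    unsigned abort_on_close; /* stands in for LINUX_MIB_TCPABORTONCLOSE */
    unsigned resets_sent;    /* counts calls to a reset-sending routine */
};

/* Heavily simplified close path: the real tcp_close() does much more
 * (sending FIN, handling linger, orphaning the socket, ...). */
void model_tcp_close(struct model_sock *sk, struct model_stats *st)
{
    assert(sk != NULL && st != NULL);

    /* The added check: a socket that is already closed (for example by
     * tcp_reset()) gets no reset and does not touch the counter. */
    if (sk->state == MODEL_TCP_CLOSE)
        return;

    if (sk->data_was_unread) {
        st->abort_on_close++;
        st->resets_sent++;
    }
    sk->state = MODEL_TCP_CLOSE;
}
```

In this model, closing a socket that is already in state 7 leaves both
counters untouched, while closing a TCP_CLOSE_WAIT socket with unread
data still counts one abort and one reset, as before.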
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>