tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
author    Liu Jian <liujian56@huawei.com>
          Tue, 12 Oct 2021 05:20:19 +0000 (13:20 +0800)
committer Alexei Starovoitov <ast@kernel.org>
          Tue, 26 Oct 2021 19:25:55 +0000 (12:25 -0700)

With two messages, msgA and msgB, and a user doing nonblocking sendmsg
calls (or multiple cores) on a single socket 'sk', we could get the
following flow:

 msgA, sk                               msgB, sk
 -----------                            ---------------
 tcp_bpf_sendmsg()
 lock(sk)
 psock = sk->psock
                                        tcp_bpf_sendmsg()
                                        lock(sk) ... blocking
 tcp_bpf_send_verdict
 if (psock->eval == NONE)
    psock->eval = sk_psock_msg_verdict
 ..
 < handle SK_REDIRECT case >
   release_sock(sk)                     < lock dropped so grab here >
   ret = tcp_bpf_sendmsg_redir
                                        psock = sk->psock
                                        tcp_bpf_send_verdict
 lock_sock(sk) ... blocking on B
                                        if (psock->eval == NONE) <- boom.
                                         psock->eval will have msgA state

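The problem is that the lock was dropped while handling msgA and was then
taken by msgB. At that point psock still holds msgA's state; in particular,
psock->eval has not been cleared. msgB therefore reuses the verdict that
was computed for msgA, and the verdict program may never see msgB.

Fix this by saving psock->eval and clearing psock->eval and
psock->sk_redir before the sock lock is released, once apply_bytes has been
consumed. The reference on sk_redir is then dropped right after the
redirect completes.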

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index d3e9386..9d06815 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
        bool cork = false, enospc = sk_msg_full(msg);
        struct sock *sk_redir;
        u32 tosend, delta = 0;
+       u32 eval = __SK_NONE;
        int ret;
 
 more_data:
@@ -275,13 +276,24 @@ more_data:
        case __SK_REDIRECT:
                sk_redir = psock->sk_redir;
                sk_msg_apply_bytes(psock, tosend);
+               if (!psock->apply_bytes) {
+                       /* Clean up before releasing the sock lock. */
+                       eval = psock->eval;
+                       psock->eval = __SK_NONE;
+                       psock->sk_redir = NULL;
+               }
                if (psock->cork) {
                        cork = true;
                        psock->cork = NULL;
                }
                sk_msg_return(sk, msg, tosend);
                release_sock(sk);
+
                ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
+
+               if (eval == __SK_REDIRECT)
+                       sock_put(sk_redir);
+
                lock_sock(sk);
                if (unlikely(ret < 0)) {
                        int free = sk_msg_free_nocharge(sk, msg);