tcp: schedule EPOLLOUT after a partial sendmsg
author Soheil Hassas Yeganeh <soheil@google.com>
Mon, 14 Sep 2020 21:52:10 +0000 (17:52 -0400)
committer David S. Miller <davem@davemloft.net>
Mon, 14 Sep 2020 23:58:24 +0000 (16:58 -0700)
commit afb83012cc7236c8f5cefbd0fd4ba628ec34ce02
tree 3c28059b9ec5d3443bc0bd5c73893a5b3d2e0657
parent 8ba3c9d1c6d75d1e6af2087278b30e17f68e1fff

With EPOLLET, applications must keep calling sendmsg until it returns
EAGAIN. Otherwise, there is no guarantee that EPOLLOUT will be sent
if a sendmsg was cut short by a memory allocation failure.
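
The rule above can be sketched from the userspace side as follows. This
is a minimal illustration, not part of the patch: the helper names
send_until_eagain() and set_nonblocking() are hypothetical, and the
socket is assumed to be non-blocking and registered with EPOLLET
elsewhere.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: call sendmsg() repeatedly until the whole buffer
 * is queued or the kernel reports EAGAIN/EWOULDBLOCK.
 * Returns the number of bytes queued, or -1 on any other error.
 * *would_block is set to 1 when the loop stopped on EAGAIN, which is the
 * point where an edge-triggered caller may go back to epoll_wait().
 */
static ssize_t send_until_eagain(int fd, const char *buf, size_t len,
                                 int *would_block)
{
    size_t off = 0;

    *would_block = 0;
    while (off < len) {
        struct iovec iov = {
            .iov_base = (void *)(buf + off),
            .iov_len = len - off,
        };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = sendmsg(fd, &msg, MSG_NOSIGNAL);

        if (n > 0) {
            off += (size_t)n;   /* full or partial send: try again */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before sending: retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *would_block = 1;   /* only now is EPOLLOUT expected */
            break;
        }
        return -1;              /* hard error (or unexpected 0 return) */
    }
    return (ssize_t)off;
}
```

A caller would keep the unsent tail of the buffer and resume from there
once epoll_wait() reports EPOLLOUT.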

As a result, on high-speed NICs, userspace observes many small
sendmsg calls after a partial sendmsg before it finally gets EAGAIN,
since TCP can send 1-2 TSO packets between two sendmsg syscalls:

// One large partial send due to memory allocation failure.
sendmsg(20MB)   = 2MB
// Many small sends until EAGAIN.
sendmsg(18MB)   = 64KB
sendmsg(17.9MB) = 128KB
sendmsg(17.8MB) = 64KB
...
sendmsg(...)    = EAGAIN
// At this point, userspace can assume an EPOLLOUT.

To fix this, set SOCK_NOSPACE in all partial sendmsg scenarios
to guarantee that EPOLLOUT is sent after a partial sendmsg.

After this commit, userspace can assume that it will receive an
EPOLLOUT after the first partial sendmsg. This EPOLLOUT benefits from
the sk_stream_write_space() logic, which delays the EPOLLOUT until
significant space is available in the write queue.
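
What this guarantee allows in userspace can be sketched as a small
decision helper. It is illustrative only and assumes a non-blocking TCP
socket under EPOLLET on a kernel that includes this change; the enum
and the function after_sendmsg() are hypothetical names. On kernels
without the change, a partial send should still be followed by further
sendmsg calls until EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result codes for the caller's event loop. */
enum send_next {
    SEND_DONE,          /* everything was queued */
    SEND_WAIT_EPOLLOUT, /* stop sending and wait for EPOLLOUT */
    SEND_RETRY,         /* interrupted: call sendmsg() again */
    SEND_FAILED         /* hard error */
};

/*
 * Decide the next step after a single sendmsg() call.
 *   ret:       return value of sendmsg()
 *   err:       errno captured right after the call (used when ret < 0)
 *   requested: number of bytes passed to sendmsg()
 * Assumption: the kernel schedules EPOLLOUT after a partial sendmsg,
 * so a short write is treated the same way as EAGAIN.
 */
static enum send_next after_sendmsg(ssize_t ret, int err, size_t requested)
{
    if (ret < 0) {
        if (err == EAGAIN || err == EWOULDBLOCK)
            return SEND_WAIT_EPOLLOUT;
        if (err == EINTR)
            return SEND_RETRY;
        return SEND_FAILED;
    }
    if ((size_t)ret < requested)
        return SEND_WAIT_EPOLLOUT;  /* partial send: no need to spin */
    return SEND_DONE;
}
```

With such a policy, the first line of the trace above (20MB requested,
2MB accepted) would already send the caller back to epoll_wait()
instead of issuing the series of small sendmsg calls.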

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/tcp.c