When TCP_ZEROCOPY_RECEIVE operation has been added,
I made the mistake of automatically un-mapping prior
content before mapping new pages.
This has the unfortunate effect of adding potentially long
MMU operations (like TLB flushes) while socket lock is held.
Using madvise(MADV_DONTNEED) right after pages has been used
has two benefits :
1) This releases pages sooner, allowing pages to be recycled
if they were part of a page pool in a NIC driver.
2) No more long unmap operations while preventing immediate
processing of incoming packets.
The cost of the added system call is small enough.
Arjun will submit a kernel patch allowing to opt out from
the unmap attempt in tcp_zerocopy_receive()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
total_mmap += zc.length;
if (xflg)
hash_zone(addr, zc.length);
+ /* It is more efficient to unmap the pages right now,
+ * instead of doing this in next TCP_ZEROCOPY_RECEIVE.
+ */
+ madvise(addr, zc.length, MADV_DONTNEED);
total += zc.length;
}
if (zc.recv_skip_hint) {