Merge branch 'ipv6_route_sharing'
Martin KaFai Lau says:
====================
ipv6: Only create RTF_CACHE route after encountering pmtu exception
v4 -> v5:
- Patch 1 is new. It cleans up ipv6_select_ident() and ip6_fragment().
- Further simplify the newly added rt6_get_pcpu_route(). If there is a
'prev' after cmpxchg, return prev instead of the newly created percpu
clone.
v3 -> v4:
- Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle
the iface down/unregister event.
- Remove rcu from the newly added rt6i_pcpu variable. It is not needed
because the variable is already protected by the existing reader/writer lock.
- Thanks to 'Julian Anastasov <ja@ssi.bg>' for testing the FLOWI_FLAG_KNOWN_NH
patches.
v2 -> v3:
- Patches 5 to 7 are new. They take care of cases where the daddr in
skb is not the one used to do the route look-up. There are also
related changes to rt6_nexthop() since v2, which are in patch 2/9.
Thanks to 'Julian Anastasov <ja@ssi.bg>' for pointing it out.
- Fix a few problems in __ip6_rt_update_pmtu(), such as setting the expiry
and mtu before inserting into the tree and not calling dst_destroy() after
a tree insertion failure. Also update the rt6i_pmtu in fib6_add_rt2node().
Thanks to 'Steffen Klassert <steffen.klassert@secunet.com>' for pointing
it out.
- Merge ip6_pmtu_rt_cache_alloc() into ip6_rt_cache_alloc().
v1 -> v2:
- Move the /128 route bug fixes to another series (accepted).
- Create a function for checking (rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)).
- Avoid shuffling the skb network_header. Instead, change the function
signature to take iph instead of skb.
- Many thanks to 'Hannes Frederic Sowa <hannes@stressinduktion.org>' for
reviewing v1 and v2 and giving advice.
--Martin
~~~ start: v1 compose message (with the out-dated parts removed) ~~~
This series avoids creating a RTF_CACHE route whenever we consult the
fib6 tree with a new destination. Instead, a RTF_CACHE route is created
only when we see a pmtu exception.
Out of all ipv6 RTF_CACHE routes that are created, only a very small
percentage have a different mtu. On one of our end-user facing proxy servers,
only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC
traffic, there are no mtu exceptions.
A large fib6 tree causes problems: 'ip -6 r show' takes a long time and
gc may kick in too often. Also, when a service has restarted and a lot
of new TCP conn requests come in, it puts pressure on the tree by inserting
a lot of RTF_CACHE routes in a short time, which currently requires a write
lock.
The first few patches are prep work to remove the assumption that the
returned rt is always RTF_CACHE.
The patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception'
does the lazy RTF_CACHE route creation.
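
The lazy creation can be illustrated with a small user-space model. This is
only a sketch, not the kernel code: struct toy_route, struct toy_table,
toy_lookup() and toy_update_pmtu() are hypothetical names, and a fixed-size
array stands in for the fib6 tree. The point it shows is that a lookup never
creates a per-destination entry; only a pmtu exception does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EXCEPTIONS 8   /* hypothetical fixed capacity */

/* Hypothetical stand-in for a route entry; not the kernel's struct rt6_info. */
struct toy_route {
	uint32_t daddr;     /* destination key, used only by exception entries */
	uint32_t mtu;       /* MTU to use for this destination */
	int is_cache;       /* 1 for a per-destination, RTF_CACHE-like entry */
};

struct toy_table {
	struct toy_route shared;                         /* shared route, e.g. a /64 */
	struct toy_route exceptions[TOY_MAX_EXCEPTIONS]; /* created lazily */
	size_t nr_exceptions;
};

void toy_table_init(struct toy_table *t, uint32_t mtu)
{
	assert(t != NULL);
	t->shared.daddr = 0;
	t->shared.mtu = mtu;
	t->shared.is_cache = 0;
	t->nr_exceptions = 0;
}

/* Lookup never creates an entry: it returns the exception for daddr if one
 * exists, otherwise the shared route. */
const struct toy_route *toy_lookup(const struct toy_table *t, uint32_t daddr)
{
	size_t i;

	for (i = 0; i < t->nr_exceptions; i++)
		if (t->exceptions[i].daddr == daddr)
			return &t->exceptions[i];
	return &t->shared;
}

/* Only a pmtu exception creates (or updates) a per-destination entry.
 * Returns 0 on success, -1 if the toy table is full. */
int toy_update_pmtu(struct toy_table *t, uint32_t daddr, uint32_t mtu)
{
	size_t i;

	for (i = 0; i < t->nr_exceptions; i++) {
		if (t->exceptions[i].daddr == daddr) {
			t->exceptions[i].mtu = mtu;
			return 0;
		}
	}
	if (t->nr_exceptions == TOY_MAX_EXCEPTIONS)
		return -1;
	t->exceptions[t->nr_exceptions].daddr = daddr;
	t->exceptions[t->nr_exceptions].mtu = mtu;
	t->exceptions[t->nr_exceptions].is_cache = 1;
	t->nr_exceptions++;
	return 0;
}
```

In this model, looking up any number of new destinations leaves
nr_exceptions at 0, so the table grows only with the number of destinations
that actually reported a smaller mtu.
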
The following patches add a percpu rt to compensate for the performance loss
after doing the RTF_CACHE lazy creation.
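
The percpu idea, including the v4 -> v5 note about returning 'prev' after
cmpxchg, can be sketched in user space with C11 atomics. This is an analogue
under stated assumptions, not the kernel implementation: struct toy_rt,
struct toy_shared_rt and toy_get_pcpu_route() are hypothetical names, the
caller passes the cpu index explicitly, and atomic_compare_exchange_strong()
stands in for the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TOY_NR_CPUS 4   /* hypothetical number of cpu slots */

/* Hypothetical stand-in for a route; not the kernel's struct rt6_info. */
struct toy_rt {
	int mtu;
};

struct toy_shared_rt {
	struct toy_rt base;                         /* the shared route */
	_Atomic(struct toy_rt *) pcpu[TOY_NR_CPUS]; /* one lazily made clone per cpu */
};

void toy_shared_rt_init(struct toy_shared_rt *rt, int mtu)
{
	unsigned int i;

	assert(rt != NULL);
	rt->base.mtu = mtu;
	for (i = 0; i < TOY_NR_CPUS; i++)
		atomic_init(&rt->pcpu[i], NULL);
}

/* Return the clone for this cpu slot, creating it on first use.
 * Returns NULL only if allocation fails. */
struct toy_rt *toy_get_pcpu_route(struct toy_shared_rt *rt, unsigned int cpu)
{
	struct toy_rt *cur, *clone, *expected = NULL;

	assert(cpu < TOY_NR_CPUS);

	cur = atomic_load(&rt->pcpu[cpu]);
	if (cur)
		return cur;                 /* fast path: clone already installed */

	clone = malloc(sizeof(*clone));
	if (!clone)
		return NULL;
	*clone = rt->base;                  /* copy the shared route */

	/* Install the clone only if the slot is still empty. */
	if (atomic_compare_exchange_strong(&rt->pcpu[cpu], &expected, clone))
		return clone;

	/* Someone else installed a clone first: 'expected' now holds that
	 * previous value, so drop ours and return the installed one. */
	free(clone);
	return expected;
}
```

Repeated calls for the same cpu slot return the same clone, so the shared
route itself is not touched on the fast path. The clones are never freed in
this sketch.
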
Here are some numbers from the udpflood test. The udpflood has been
slightly modified to have a time limit instead of a count limit.
A /64 via gateway route is used for the test. Each udpflood uses 10000 dst
addresses. The dst addresses of different udpflood processes do not overlap
with each other.
 1    16M    15M
10    61M    61M
20    65M    62M
40    88M    83M
~~~ end: v1 compose message ~~~
====================
Signed-off-by: David S. Miller <davem@davemloft.net>