Thomas Graf says:
====================
Lightweight & flow based encapsulation
This series combines the work previously posted by Roopa, Robert and
myself. It's according to what we discussed at NFWS. The motivation
of this series is to:
* Consolidate code between OVS and the rest of the kernel and get
rid of OVS vports and instead represent them as pure net_devices.
* Introduce a lightweight tunneling mechanism which enables flow
based encapsulation to improve scalability on both RX and TX.
* Do the above in an encapsulation unspecific way so that the
encapsulation type is eventually abstracted away from the user.
* Use the same forwarding decision for both native forwarding and
encapsulation thus allowing to switch between native IPv6 and
UDP encapsulation based on endpoint without requiring additional
logic
The fundamental changes introduces in this series are:
* A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
instructions. Depending on the specified type, the instructions
apply to UDP encapsulations, MPLS and possible other in the future.
* Depending on the encapsulation type, the output function of the
dst is directly overwritten or the dst merely attaches metadata and
relies on a subsequent net_device to apply it to the packet. The
latter is typically used if an inner and outer IP header exist which
require two subsequent routing lookups to be performed.
* A new metadata_dst structure which can be attached to skbs to
carry metadata in between subsystems. This new metadata transport
is used to provide a single interface for VXLAN, routing and OVS
to communicate through metadata.
The OVS interfaces remain as-is but will transparently create a real
VXLAN net_device in the background. iproute2 is extended with a new
use cases:
VXLAN:
ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0
MPLS:
ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
Performance implications:
The additional memory allocation in the receive path should have
performance implications although it is not observable in standard
throughput tests if GRO is properly done. The correct net_device
model outweights the additional cost of the allocation. Furthermore,
this implication can be relaxed by reintroducing a direct unqueued
path from a software device to a consumer like bridge or OVS if
needed.
$ netperf -t TCP_STREAM -H 15.1.1.201
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 9118.17
Changes since v1:
* Properly initialize tun_id as reported by Julian
* Drop dupliate netif_keep_dst() as reported by Alexei
====================
Signed-off-by: David S. Miller <davem@davemloft.net>