From: David S. Miller
Date: Wed, 23 Jan 2013 18:44:10 +0000 (-0500)
Subject: Merge branch 'soreuseport'
X-Git-Tag: v3.12-rc1~1418^2~284
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=c617f398edd4db2b8567a28e899a88f8f574798d;p=kernel%2Fkernel-generic.git

Merge branch 'soreuseport'

Tom Herbert says:

====================
This series implements so_reuseport (SO_REUSEPORT socket option) for
TCP and UDP.  For TCP, so_reuseport allows multiple listener sockets
to be bound to the same port.  In the case of UDP, so_reuseport allows
multiple sockets to bind to the same port.  To prevent port hijacking,
all sockets bound to the same port using so_reuseport must have the
same uid.  Received packets are distributed to multiple sockets bound
to the same port using a 4-tuple hash.

The motivating case for so_reuseport in TCP would be something like
a web server binding to port 80 running with multiple threads, where
each thread might have its own listener socket.  This could be done
as an alternative to other models:

  1) have one listener thread which dispatches completed connections
     to workers.
  2) accept on a single listener socket from multiple threads.

In case #1 the listener thread can easily become the bottleneck with
a high connection turn-over rate.  In case #2, the proportion of
connections accepted per thread tends to be uneven under high
connection load (assuming a simple event loop:
while (1) { accept(); process() }), because wakeup does not promote
fairness among the sockets.  We have seen the disproportion to be as
high as a 3:1 ratio between the thread accepting the most connections
and the one accepting the fewest.  With so_reuseport the distribution
is uniform.

The TCP implementation has a problem in that the request sockets for
a listener are attached to a listener socket.  If a SYN is received,
a listener socket is chosen and a request structure is created
(SYN-RECV state).
If the subsequent ACK in the three-way handshake (3WHS) is not matched
to the same listener socket by so_reuseport, the connection state is
not found (reset) and the request structure is orphaned.  This
scenario would occur when the number of listener sockets bound to a
port changes (new ones are added, or old ones closed).  We are looking
for a solution to this, maybe allow multiple sockets to share the same
request table...

The motivating case for so_reuseport in UDP would be something like a
DNS server.  An alternative would be to recv on the same socket from
multiple threads.  As in the case of TCP, the load across these
threads tends to be disproportionate and we also see a lot of
contention on the socket lock.  Note that SO_REUSEADDR already allows
multiple UDP sockets to bind to the same port; however, there is no
provision to prevent hijacking and nothing to distribute packets
across all the sockets sharing the same bound port.  This patch does
not change the semantics of SO_REUSEADDR, but provides a usable form
of that functionality for unicast.
====================

Signed-off-by: David S. Miller
---
c617f398edd4db2b8567a28e899a88f8f574798d