epoll: optimize EPOLL_CTL_DEL using rcu
authorJason Baron <jbaron@akamai.com>
Tue, 12 Nov 2013 23:10:16 +0000 (15:10 -0800)
committerJiri Slaby <jslaby@suse.cz>
Wed, 12 Mar 2014 12:25:41 +0000 (13:25 +0100)
commit81ff0d3b03c450601fa29a49baedf826c738fc24
tree96c280e6d8817f10f4d5b6cc7e6ff08184a9dc2d
parentdd7ab8120e6709e90d7cdc5ecb66e18fd78001e4
epoll: optimize EPOLL_CTL_DEL using rcu

commit ae10b2b4eb01bedc91d29d5c5bb9e416fd806c40 upstream.

Nathan Zimmer found that once we get over 10+ cpus, the scalability of
SPECjbb falls over due to the contention on the global 'epmutex', which is
taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.

Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
by using rcu to guard against any concurrent traversals.

Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for
simple topologies.  IE when adding a link from an epoll file descriptor to
a wakeup source, where the epoll file descriptor is not nested.

This patch (of 2):

Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
converting the file->f_ep_links list into an rcu one.  In this way, we can
traverse the epoll network on the add path in parallel with deletes.
Since deletes can't create loops or worse wakeup paths, this is safe.

This patch in combination with the patch "epoll: Do not take global 'epmutex'
for simple topologies", shows a dramatic performance improvement in
scalability for SPECjbb.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
CC: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
fs/eventpoll.c