prlimit: do not grab the tasklist_lock
authorBarret Rhoden <brho@google.com>
Thu, 6 Jan 2022 17:20:41 +0000 (12:20 -0500)
committerEric W. Biederman <ebiederm@xmission.com>
Tue, 8 Mar 2022 20:33:36 +0000 (14:33 -0600)
commit18c91bb2d87268d23868bf13508f5bc9cf04e89a
treee081820aafeec498f7d81513d6ec7411972ab1f9
parentc57bef0287dd5deeabaea5727950559fb9037cd9
prlimit: do not grab the tasklist_lock

Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck
for workloads that also must grab the tasklist_lock for waiting,
killing, and cloning.

The tasklist_lock was grabbed to protect tsk->sighand from disappearing
(becoming NULL).  tsk->signal was already protected by holding a
reference to tsk.

update_rlimit_cpu() assumed tsk->sighand != NULL.  With this commit, it
attempts to lock_task_sighand().  However, this means that
update_rlimit_cpu() can fail.  This only happens when a task is exiting.
Note that during exec, sighand may *change*, but it will not be NULL.

Prior to this commit, the do_prlimit() ensured that update_rlimit_cpu()
would not fail by read locking the tasklist_lock and checking tsk->sighand
!= NULL.

If update_rlimit_cpu() fails, there may be other tasks that are not
exiting that share tsk->signal.  However, the group_leader is the last
task to be released, so if we cannot update_rlimit_cpu(group_leader),
then the entire process is exiting.

The only other caller of update_rlimit_cpu() is
selinux_bprm_committing_creds().  It has tsk == current, so
update_rlimit_cpu() cannot fail (current->sighand cannot disappear
until current exits).

This change resulted in a 14% speedup on a microbenchmark where parents
kill and wait on their children, and children getpriority, setpriority,
and getrlimit.

Signed-off-by: Barret Rhoden <brho@google.com>
Link: https://lkml.kernel.org/r/20220106172041.522167-4-brho@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
include/linux/posix-timers.h
kernel/sys.c
kernel/time/posix-cpu-timers.c