sched/fair: Block nohz tick_stop when cfs bandwidth in use
author Phil Auld <pauld@redhat.com>
Wed, 12 Jul 2023 13:33:57 +0000 (09:33 -0400)
committer Peter Zijlstra <peterz@infradead.org>
Wed, 2 Aug 2023 14:19:26 +0000 (16:19 +0200)
commit 88c56cfeaec4642aee8aac58b38d5708c6aae0d3
tree a931bedfa512ff5828f15bf56d50158b030a1b60
parent c98c18270be115678f4295b10a5af5dcc9c4efa0

CFS bandwidth limits and NOHZ full don't play well together.  Tasks
can easily run well past their quotas before a remote tick does
accounting.  This leads to long, multi-period stalls before such
tasks can run again. Currently, when presented with these conflicting
requirements the scheduler is favoring nohz_full and letting the tick
be stopped. However, nohz tick stopping is already best-effort: there
are a number of conditions that can prevent it, whereas cfs runtime
bandwidth is expected to be enforced.

Make the scheduler favor bandwidth over stopping the tick by setting
TICK_DEP_BIT_SCHED when the only running task is a CFS task with a
runtime limit enabled. We use cfs_b->hierarchical_quota to
determine if the task requires the tick.
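
As an illustration only, the quota test can be modeled in userspace C.
This is a minimal sketch, not the kernel code: struct model_group,
MODEL_RUNTIME_INF, model_hierarchical_quota() and model_task_needs_tick()
are hypothetical names, and the sketch assumes that a hierarchical quota
reflects the tightest limit set anywhere between the task's group and
the root.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "no limit"; not the kernel's definition. */
#define MODEL_RUNTIME_INF UINT64_MAX

/* Hypothetical model of one task group in the cgroup hierarchy. */
struct model_group {
	const struct model_group *parent; /* NULL for the root group */
	uint64_t quota_ns;                /* own quota, or MODEL_RUNTIME_INF */
};

/* Assumption: the effective quota is the tightest one up to the root. */
static uint64_t model_hierarchical_quota(const struct model_group *g)
{
	uint64_t q = MODEL_RUNTIME_INF;

	for (; g != NULL; g = g->parent)
		if (g->quota_ns < q)
			q = g->quota_ns;
	return q;
}

/* A task needs the tick if any group above it imposes a finite quota. */
static bool model_task_needs_tick(const struct model_group *g)
{
	return model_hierarchical_quota(g) != MODEL_RUNTIME_INF;
}
```

In this model a task in an unlimited child group still needs the tick
when its parent group has a finite quota, which is why the hierarchical
value is checked rather than only the group's own quota.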

Add a check in pick_next_task_fair() as well, since that is where
we have a handle on the task that is actually going to be running.

Add a check in sched_can_stop_tick() to cover some edge cases, such
as nr_running going from 2->1 while the remaining task stays the
running task.
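
The 2->1 edge case can be sketched with a simplified userspace model.
Everything here is hypothetical (struct model_rq and the model_*
functions are not kernel names), and the model ignores the other
scheduling classes and conditions that the real sched_can_stop_tick()
considers. It only shows why a check at pick time is not enough: when a
second task leaves the runqueue, no new task is picked, yet the decision
about stopping the tick is evaluated again.

```c
#include <stdbool.h>

/* Hypothetical, simplified runqueue; not the kernel's struct rq. */
struct model_rq {
	unsigned int nr_running;   /* runnable tasks on this CPU */
	bool curr_bw_constrained;  /* running task is under a CFS quota */
};

/* Model without the extra check: one runnable task lets the tick stop. */
static bool model_can_stop_tick_old(const struct model_rq *rq)
{
	return rq->nr_running <= 1;
}

/* Model with the extra check: a lone quota-limited task keeps the tick. */
static bool model_can_stop_tick_new(const struct model_rq *rq)
{
	if (rq->nr_running > 1)
		return false;
	if (rq->nr_running == 1 && rq->curr_bw_constrained)
		return false;
	return true;
}

/*
 * Another task leaves this CPU (for example by migration) while the
 * current task keeps running, so no pick of a next task happens.
 */
static void model_dequeue_other(struct model_rq *rq)
{
	if (rq->nr_running > 0)
		rq->nr_running--;
}
```

Starting from two runnable tasks with a quota-limited current task,
calling model_dequeue_other() leaves one task: the old model would then
allow the tick to stop, while the new model keeps it running.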

Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230712133357.381137-3-pauld@redhat.com
kernel/sched/core.c
kernel/sched/fair.c
kernel/sched/features.h
kernel/sched/sched.h