sched/fair: Add ancestors of unthrottled undecayed cfs_rq
authorMichal Koutný <mkoutny@suse.com>
Fri, 17 Sep 2021 15:30:37 +0000 (17:30 +0200)
committerPeter Zijlstra <peterz@infradead.org>
Fri, 1 Oct 2021 11:57:57 +0000 (13:57 +0200)
Since commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8a2
("sched/fair: Ensure that the CFS parent is added after unthrottling").

I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.

Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Odin Ugedal <odin@uged.al>
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
kernel/sched/fair.c

index ff69f245b939595ab914c70c1f7fa01c1843778a..f6a05d9b54436ae0353d88574b8669e98c468d4e 100644 (file)
@@ -4936,8 +4936,12 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
        /* update hierarchical throttle state */
        walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-       if (!cfs_rq->load.weight)
+       /* Nothing to run but something to decay (on_list)? Complete the branch */
+       if (!cfs_rq->load.weight) {
+               if (cfs_rq->on_list)
+                       goto unthrottle_throttle;
                return;
+       }
 
        task_delta = cfs_rq->h_nr_running;
        idle_task_delta = cfs_rq->idle_h_nr_running;