sched/cfs: fix spurious active migration
author Vincent Guittot <vincent.guittot@linaro.org>
Fri, 29 Nov 2019 14:04:47 +0000 (15:04 +0100)
committer Peter Zijlstra <peterz@infradead.org>
Tue, 17 Dec 2019 12:32:48 +0000 (13:32 +0100)
Load balancing can fail to find a suitable task during the periodic check
because the imbalance is smaller than half of the load of the waiting
tasks. This increases the number of failed load balance attempts, which
can end up triggering an active migration. This active migration is
useless because the currently running task is not a better choice than the
waiting ones. In fact, the current task was probably not running but
waiting for the CPU during one of the previous attempts, and it was
already not selected then.
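
The failure mode described above can be modelled outside the kernel. The
following is only a minimal sketch: struct toy_sd and the helper names are
hypothetical, and the "cache_nice_tries + 2" threshold for requesting an
active migration is an assumption about need_active_balance() at the time,
not something stated in this commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two sched_domain fields used here. */
struct toy_sd {
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Pre-patch check: skip a task when half of its load exceeds the imbalance. */
static bool old_skips_task(unsigned long load, unsigned long imbalance)
{
	return load / 2 > imbalance;
}

/* Assumed trigger for active migration (see the note above). */
static bool wants_active_balance(const struct toy_sd *sd)
{
	return sd->nr_balance_failed > sd->cache_nice_tries + 2;
}

/*
 * Repeat periodic balance attempts over the waiting tasks.
 * Returns the attempt number at which the model asks for an active
 * migration, or 0 if a task could be detached (or max_attempts ran out).
 */
static unsigned int attempts_until_active_balance(struct toy_sd *sd,
						  const unsigned long *loads,
						  unsigned int nr_tasks,
						  unsigned long imbalance,
						  unsigned int max_attempts)
{
	unsigned int attempt, i;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		bool detached = false;

		for (i = 0; i < nr_tasks; i++) {
			if (!old_skips_task(loads[i], imbalance)) {
				detached = true;
				break;
			}
		}

		if (detached) {
			/* A successful balance resets the failure counter. */
			sd->nr_balance_failed = 0;
			return 0;
		}

		sd->nr_balance_failed++;
		if (wants_active_balance(sd))
			return attempt;
	}

	return 0;
}
```

With two waiting tasks of load 300 and 400, an imbalance of 100 and
cache_nice_tries set to 1, every attempt in this model skips both tasks,
so the failure counter keeps growing until the active migration threshold
is crossed.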

When load balance fails too many times to migrate a task, we should relax
the constraint on the maximum load of the tasks that can be migrated,
similarly to what is done with cache hotness.
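
As a rough illustration of the relaxed constraint, the check can be written
as a standalone predicate. This is a sketch, not the kernel code itself:
skip_for_load() is a hypothetical helper, and its parameters stand in for
env->imbalance, env->sd->nr_balance_failed and env->sd->cache_nice_tries.

```c
#include <stdbool.h>

/*
 * Return true when a waiting task should be skipped because migrating it
 * would move too much load. The constraint only applies while the number
 * of failed balance attempts has not exceeded cache_nice_tries.
 */
static bool skip_for_load(unsigned long load, unsigned long imbalance,
			  unsigned int nr_balance_failed,
			  unsigned int cache_nice_tries)
{
	return load / 2 > imbalance &&
	       nr_balance_failed <= cache_nice_tries;
}
```

For a task of load 300 and an imbalance of 100 with cache_nice_tries set to
1, the predicate is true while nr_balance_failed is 0 or 1 and becomes false
once nr_balance_failed reaches 2, at which point the waiting task becomes
eligible for migration.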

Before the rework, load balance used to set the imbalance to the average
load_per_task in order to mitigate such a situation. This increased the
likelihood of migrating a task, but also of selecting a larger task than
needed while more appropriate ones were in the list.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1575036287-6052-1-git-send-email-vincent.guittot@linaro.org
kernel/sched/fair.c

index 146b6c83633f7322e12c2f5b9955d311fe54d04d..ba749f579714cbe02f502b02402b8467aff41c50 100644
@@ -7328,7 +7328,14 @@ static int detach_tasks(struct lb_env *env)
                            load < 16 && !env->sd->nr_balance_failed)
                                goto next;
 
-                       if (load/2 > env->imbalance)
+                       /*
+                        * Make sure that we don't migrate too much load.
+                        * Nevertheless, relax the constraint if the
+                        * scheduler fails to find a good waiting task to
+                        * migrate.
+                        */
+                       if (load/2 > env->imbalance &&
+                           env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
                                goto next;
 
                        env->imbalance -= load;