sched/fair: Improve spreading of utilization
author Vincent Guittot <vincent.guittot@linaro.org>
Thu, 12 Mar 2020 16:54:29 +0000 (17:54 +0100)
committer Peter Zijlstra <peterz@infradead.org>
Fri, 20 Mar 2020 12:06:20 +0000 (13:06 +0100)
During load balancing, a group with spare capacity will try to pull some
utilization from an overloaded group. In that case, the load balancer
looks for the runqueue with the highest utilization. However, it should
also ensure that there are pending tasks to pull; otherwise the load
balance will fail to pull a task and the spreading of the load will be
delayed.
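
As a rough illustration of the selection rule described above, here is a
minimal userspace sketch, not the kernel code itself. The names toy_rq and
toy_find_busiest are hypothetical stand-ins for struct rq and the
migrate_util case of find_busiest_queue(); the util field is assumed to
hold what cpu_util() would report.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct rq. */
struct toy_rq {
	unsigned long util;      /* utilization, as cpu_util() would report */
	unsigned int nr_running; /* number of runnable tasks on this runqueue */
};

/*
 * Return the index of the runqueue with the highest utilization among
 * those that have more than one runnable task, or -1 if none qualifies.
 */
static int toy_find_busiest(const struct toy_rq *rqs, size_t n)
{
	unsigned long busiest_util = 0;
	int busiest = -1;

	for (size_t i = 0; i < n; i++) {
		/* A lone running task cannot be detached, so skip this CPU. */
		if (rqs[i].nr_running <= 1)
			continue;

		if (busiest_util < rqs[i].util) {
			busiest_util = rqs[i].util;
			busiest = (int)i;
		}
	}

	return busiest;
}
```

With runqueues { util 900, 1 task }, { util 600, 3 tasks } and
{ util 400, 2 tasks }, the sketch skips the first entry despite its higher
utilization and selects the second, which has tasks that can be pulled.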

This situation is quite transient, but its effect can be highlighted with
a short run of the sysbench test, where the time taken to spread the tasks
impacts the overall result significantly.

Below are the average results for 15 iterations on an arm64 octo core:
sysbench --test=cpu --num-threads=8  --max-requests=1000 run

                           tip/sched/core  +patchset
total time:                172ms           158ms
per-request statistics:
         avg:                1.337ms         1.244ms
         max:               21.191ms        10.753ms

The average max doesn't fully reflect the wide spread of the values, which
range from 1.350ms to more than 41ms for tip/sched/core and from 1.350ms
to 21ms with the patch.

Other factors, such as waiting for an idle load balance or cache hotness,
can delay the spreading of the tasks, which explains why we can still
see up to 21ms with the patch.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312165429.990-1-vincent.guittot@linaro.org
kernel/sched/fair.c

index c7aaae2b1030d10c460a62478b90b321d74ac136..783356f96b7bbbbcebe8b56285b005368f3e4b70 100644
@@ -9313,6 +9313,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
                case migrate_util:
                        util = cpu_util(cpu_of(rq));
 
+                       /*
+                        * Don't try to pull utilization from a CPU with one
+                        * running task. Whatever its utilization, we will fail
+                        * to detach the task.
+                        */
+                       if (nr_running <= 1)
+                               continue;
+
                        if (busiest_util < util) {
                                busiest_util = util;
                                busiest = rq;