sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance()
authorVladimir Davydov <vdavydov@parallels.com>
Sun, 15 Sep 2013 13:49:14 +0000 (17:49 +0400)
committerIngo Molnar <mingo@kernel.org>
Fri, 20 Sep 2013 09:59:38 +0000 (11:59 +0200)
In busiest->group_imb case we can come to fix_small_imbalance() with
local->avg_load > busiest->avg_load. This can result in wrong imbalance
fix-up, because there is the following check there where all the
members are unsigned:

if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
    (scaled_busy_load_per_task * imbn)) {
env->imbalance = busiest->load_per_task;
return;
}

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix it by substituting the subtraction with an equivalent addition in
the check.

[ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
  belonging to different cores on an HT-enabled machine with N logical
  cpus: just look at se.nr_migrations growth. ]

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ef167822e5c5b2d96cf5b0e3e4f4bdff3f0414a2.1379252740.git.vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel/sched/fair.c

index 0b99aae..2aedacc 100644 (file)
@@ -4823,8 +4823,8 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
                (busiest->load_per_task * SCHED_POWER_SCALE) /
                busiest->group_power;
 
-       if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
-           (scaled_busy_load_per_task * imbn)) {
+       if (busiest->avg_load + scaled_busy_load_per_task >=
+           local->avg_load + (scaled_busy_load_per_task * imbn)) {
                env->imbalance = busiest->load_per_task;
                return;
        }