sched/numa: Simplify load_too_imbalanced()
author Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Wed, 20 Jun 2018 17:02:44 +0000 (22:32 +0530)
committer Ingo Molnar <mingo@kernel.org>
Wed, 25 Jul 2018 09:41:06 +0000 (11:41 +0200)
Currently load_too_imbalanced() cares about the slope of the imbalance.
It doesn't care about the direction of the imbalance.

However, this may not work if the nodes being compared have
dissimilar capacities: some nodes might have more cores than other
nodes in the system. Because the old code swaps the loads to ignore the
direction but leaves the capacities in place, each load can end up
normalised against the other node's capacity. Also, unlike traditional
load balancing at a NUMA sched domain, multiple requests to migrate from
the same source node to the same destination node may run in parallel.
This can cause a huge load imbalance, especially on larger machines with
either many cores per node or many nodes in the system. Hence, allow a
move/swap only if it does not increase the imbalance.

Running SPECjbb2005 on a 4-node machine and comparing bops/JVM:
JVMs  LAST_PATCH  WITH_PATCH  %CHANGE
16    25058.2     25122.9     0.25
1     72950       73850       1.23

(numbers from v1 based on v4.17-rc5)
Testcase       Time:         Min         Max         Avg      StdDev
numa01.sh      Real:      516.14      892.41      739.84      151.32
numa01.sh       Sys:      153.16      192.99      177.70       14.58
numa01.sh      User:    39821.04    69528.92    57193.87    10989.48
numa02.sh      Real:       60.91       62.35       61.58        0.63
numa02.sh       Sys:       16.47       26.16       21.20        3.85
numa02.sh      User:     5227.58     5309.61     5265.17       31.04
numa03.sh      Real:      739.07      917.73      795.75       64.45
numa03.sh       Sys:       94.46      136.08      109.48       14.58
numa03.sh      User:    57478.56    72014.09    61764.48     5343.69
numa04.sh      Real:      442.61      715.43      530.31       96.12
numa04.sh       Sys:      224.90      348.63      285.61       48.83
numa04.sh      User:    35836.84    47522.47    40235.41     3985.26
numa05.sh      Real:      386.13      489.17      434.94       43.59
numa05.sh       Sys:      144.29      438.56      278.80      105.78
numa05.sh      User:    33255.86    36890.82    34879.31     1641.98

Testcase       Time:         Min         Max         Avg      StdDev   %Change
numa01.sh      Real:      435.78      653.81      534.58       83.20   38.39%
numa01.sh       Sys:      121.93      187.18      145.90       23.47   21.79%
numa01.sh      User:    37082.81    51402.80    43647.60     5409.75   31.03%
numa02.sh      Real:       60.64       61.63       61.19        0.40   0.637%
numa02.sh       Sys:       14.72       25.68       19.06        4.03   11.22%
numa02.sh      User:     5210.95     5266.69     5233.30       20.82   0.608%
numa03.sh      Real:      746.51      808.24      780.36       23.88   1.972%
numa03.sh       Sys:       97.26      108.48      105.07        4.28   4.197%
numa03.sh      User:    58956.30    61397.05    60162.95     1050.82   2.661%
numa04.sh      Real:      465.97      519.27      484.81       19.62   9.385%
numa04.sh       Sys:      304.43      359.08      334.68       20.64   -14.6%
numa04.sh      User:    37544.16    41186.15    39262.44     1314.91   2.478%
numa05.sh      Real:      411.57      457.20      433.29       16.58   0.380%
numa05.sh       Sys:      230.05      435.48      339.95       67.58   -17.9%
numa05.sh      User:    33325.54    36896.31    35637.84     1222.64   -2.12%

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1529514181-9842-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel/sched/fair.c

index b10e066..2268379 100644
@@ -1546,28 +1546,12 @@ static bool load_too_imbalanced(long src_load, long dst_load,
        src_capacity = env->src_stats.compute_capacity;
        dst_capacity = env->dst_stats.compute_capacity;
 
-       /* We care about the slope of the imbalance, not the direction. */
-       if (dst_load < src_load)
-               swap(dst_load, src_load);
-
-       /* Is the difference below the threshold? */
-       imb = dst_load * src_capacity * 100 -
-             src_load * dst_capacity * env->imbalance_pct;
-       if (imb <= 0)
-               return false;
+       imb = abs(dst_load * src_capacity - src_load * dst_capacity);
 
-       /*
-        * The imbalance is above the allowed threshold.
-        * Compare it with the old imbalance.
-        */
        orig_src_load = env->src_stats.load;
        orig_dst_load = env->dst_stats.load;
 
-       if (orig_dst_load < orig_src_load)
-               swap(orig_dst_load, orig_src_load);
-
-       old_imb = orig_dst_load * src_capacity * 100 -
-                 orig_src_load * dst_capacity * env->imbalance_pct;
+       old_imb = abs(orig_dst_load * src_capacity - orig_src_load * dst_capacity);
 
        /* Would this change make things worse? */
        return (imb > old_imb);
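One way to read the two new abs() lines: write L_s and L_d for the loads after the proposed move (src_load, dst_load), L_s^o and L_d^o for the current loads (orig_src_load, orig_dst_load), and C_s and C_d for the compute capacities. Assuming both capacities are positive, each imbalance is the gap in load per unit of capacity, multiplied by a factor that is the same on both sides of the comparison:

```latex
\begin{aligned}
\mathrm{imb} &= \lvert L_d C_s - L_s C_d \rvert
  = C_s C_d \left\lvert \frac{L_d}{C_d} - \frac{L_s}{C_s} \right\rvert, \\
\mathrm{old\_imb} &= \lvert L_d^{o} C_s - L_s^{o} C_d \rvert
  = C_s C_d \left\lvert \frac{L_d^{o}}{C_d} - \frac{L_s^{o}}{C_s} \right\rvert, \\
\mathrm{imb} > \mathrm{old\_imb}
  &\iff \left\lvert \frac{L_d}{C_d} - \frac{L_s}{C_s} \right\rvert
  > \left\lvert \frac{L_d^{o}}{C_d} - \frac{L_s^{o}}{C_s} \right\rvert .
\end{aligned}
```

So the function reports "too imbalanced" only when the per-capacity gap grows; an unchanged gap is still allowed, and there is no longer an imbalance_pct tolerance.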