sched/pelt: Relax the sync of util_sum with util_avg
authorVincent Guittot <vincent.guittot@linaro.org>
Tue, 11 Jan 2022 13:46:56 +0000 (14:46 +0100)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Tue, 1 Feb 2022 16:27:10 +0000 (17:27 +0100)
[ Upstream commit 98b0d890220d45418cfbc5157b3382e6da5a12ab ]

Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
    https://bugzilla.kernel.org/show_bug.cgi?id=215045

He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")

This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lost up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.

Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.

In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.

Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
kernel/sched/fair.c
kernel/sched/pelt.h

index d41f966f5866aa710692f9cfd85ed1c09bb43956..6420580f2730b956116f423adf833aba4a7079a2 100644 (file)
@@ -3422,7 +3422,6 @@ void set_task_rq_fair(struct sched_entity *se,
        se->avg.last_update_time = n_last_update_time;
 }
 
-
 /*
  * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to
  * propagate its contribution. The key to this propagation is the invariant
@@ -3490,7 +3489,6 @@ void set_task_rq_fair(struct sched_entity *se,
  * XXX: only do this for the part of runnable > running ?
  *
  */
-
 static inline void
 update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
 {
@@ -3722,7 +3720,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 
                r = removed_util;
                sub_positive(&sa->util_avg, r);
-               sa->util_sum = sa->util_avg * divider;
+               sub_positive(&sa->util_sum, r * divider);
+               /*
+                * Because of rounding, se->util_sum might ends up being +1 more than
+                * cfs->util_sum. Although this is not a problem by itself, detaching
+                * a lot of tasks with the rounding problem between 2 updates of
+                * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+                * cfs_util_avg is not.
+                * Check that util_sum is still above its lower bound for the new
+                * util_avg. Given that period_contrib might have moved since the last
+                * sync, we are only sure that util_sum must be above or equal to
+                *    util_avg * minimum possible divider
+                */
+               sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
 
                r = removed_runnable;
                sub_positive(&sa->runnable_avg, r);
index e06071bf3472c4eb40744ae9be0ffeffa07c9df8..c336f5f481bca25c781991eea01f956fa60c320d 100644 (file)
@@ -37,9 +37,11 @@ update_irq_load_avg(struct rq *rq, u64 running)
 }
 #endif
 
+#define PELT_MIN_DIVIDER       (LOAD_AVG_MAX - 1024)
+
 static inline u32 get_pelt_divider(struct sched_avg *avg)
 {
-       return LOAD_AVG_MAX - 1024 + avg->period_contrib;
+       return PELT_MIN_DIVIDER + avg->period_contrib;
 }
 
 static inline void cfs_se_util_change(struct sched_avg *avg)