sched/psi: Per-cgroup PSI accounting disable/re-enable interface
authorChengming Zhou <zhouchengming@bytedance.com>
Wed, 7 Sep 2022 09:03:32 +0000 (17:03 +0800)
committerPeter Zijlstra <peterz@infradead.org>
Fri, 9 Sep 2022 09:08:33 +0000 (11:08 +0200)
commit34f26a15611afb03c33df6819359d36f5b382589
tree38765bb0112e0ba6c8f173d8a6a129bf0b9f92f0
parentdc86aba751e2867244411adda1562f6664747019
sched/psi: Per-cgroup PSI accounting disable/re-enable interface

PSI accounts stalls for each cgroup separately and aggregates it
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads when under deep level of the hierarchy.

commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
make PSI to skip per-cgroup stall accounting, only account system-wide
to avoid this each level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted for
userspace adjustment on that cgroup, apart from only system-wide adjustment.

So this patch introduce a per-cgroup PSI accounting disable/re-enable
interface "cgroup.pressure", which is a read-write single value file that
allowed values are "0" and "1", the defaults is "1" so per-cgroup
PSI stats is enabled by default.

Implementation details:

It should be relatively straight-forward to disable and re-enable
state aggregation, time tracking, averaging on a per-cgroup level,
if we can live with losing history from while it was disabled.
I.e. the avgs will restart from 0, total= will have gaps.

But it's hard or complex to stop/restart groupc->tasks[] updates,
which is not implemented in this patch. So we always update
groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when
the cgroup PSI stats is disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com
Documentation/admin-guide/cgroup-v2.rst
include/linux/cgroup-defs.h
include/linux/psi.h
include/linux/psi_types.h
kernel/cgroup/cgroup.c
kernel/sched/psi.c