audit: report audit wait metric in audit status reply
authorMax Englander <max.englander@gmail.com>
Sat, 4 Jul 2020 15:15:28 +0000 (15:15 +0000)
committerPaul Moore <paul@paul-moore.com>
Tue, 21 Jul 2020 15:21:44 +0000 (11:21 -0400)
commitb43870c74f3fdf0cd06bf5f1b7a5ed70a2cd4ed2
treee858b27d67516e5769337eaa9e56d07a2cf22f12
parentf1d9b23cabc61e58509164c3c3132556476491d2
audit: report audit wait metric in audit status reply

In environments where the preservation of audit events and predictable
usage of system memory are prioritized, admins may use a combination of
--backlog_wait_time and -b options at the risk of degraded performance
resulting from backlog waiting. In some cases, this risk may be
preferred to lost events or unbounded memory usage. Ideally, this risk
can be mitigated by making adjustments when backlog waiting is detected.

However, detection can be difficult using the currently available
metrics. For example, an admin attempting to debug degraded performance
may falsely believe a full backlog indicates backlog waiting. It may
turn out the backlog frequently fills up but drains quickly.

To make it easier to reliably track degraded performance to backlog
waiting, this patch makes the following changes:

Add a new field backlog_wait_time_total to the audit status reply.
Initialize this field to zero. Add to this field the total time spent
by the current task on scheduled timeouts while the backlog limit is
exceeded. Reset field to zero upon request via AUDIT_SET.

Tested on Ubuntu 18.04 using complementary changes to the
audit-userspace and audit-testsuite:
- https://github.com/linux-audit/audit-userspace/pull/134
- https://github.com/linux-audit/audit-testsuite/pull/97

Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
include/uapi/linux/audit.h
kernel/audit.c