From 47d871f0f93d89815e9b839f50272f6fb2afd856 Mon Sep 17 00:00:00 2001 From: Andrew Birchall Date: Wed, 11 May 2016 18:31:49 -0700 Subject: [PATCH] move ENOMEM comment; create --stack-storage-size arg; misc cleanup Summary: * move ENOMEM comment from tools/offcputime.py to src/cc/export/helpers.h * create --stack-storage-size arg to allow the user to set the stack storage size * requires a positive_nonzero_int type checking function for argparse * clean up the rest of the args a bit * use parser.add_mutually_exclusive_group instead of manually handling the mutual exclusion * use positive_nonzero_int for duration arg * use positive_int for pid arg * only print a warning about increasing the storage size if at least one of the get_stackid errors was a -ENOMEM * remove the debug param (we can add this manually when testing) * fix a bug where all processes are traced when specifying pid of 0 Test Plan: ``` devbig680[bcc](no branch, rebasing abirchall_dev): ~/bcc_run_tool.sh offcputime --help [Running] /data/users/abirchall/bcc/tools/offcputime.py --help usage: offcputime.py [-h] [-u | -p PID] [-v] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [duration] Summarize off-CPU time by kernel stack trace positional arguments: duration duration of trace, in seconds optional arguments: -h, --help show this help message and exit -u, --useronly user threads only (no kernel threads) -p PID, --pid PID trace this PID only -v, --verbose show raw addresses -f, --folded output folded format --stack-storage-size STACK_STORAGE_SIZE the number of unique stack traces that can be stored and displayed examples: ./offcputime # trace off-CPU stack time until Ctrl-C ./offcputime 5 # trace for 5 seconds only ./offcputime -f 5 # 5 seconds, and output in folded format ./offcputime -u # don't include kernel threads (user only) ./offcputime -p 185 # trace fo PID 185 only devbig680[bcc](no branch, rebasing abirchall_dev): ~/bcc_run_tool.sh offcputime -f 0 [Running] /data/users/abirchall/bcc/tools/offcputime.py -f 0 usage: offcputime.py [-h] [-u | -p PID] [-v] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [duration] offcputime.py: error: argument duration: must be positive and nonzero devbig680[bcc](no branch, rebasing abirchall_dev): ~/bcc_run_tool.sh offcputime -f -1 [Running] /data/users/abirchall/bcc/tools/offcputime.py -f -1 usage: offcputime.py [-h] [-u | -p PID] [-v] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [duration] offcputime.py: error: argument duration: must be positive and nonzero devbig680[bcc](no branch, rebasing abirchall_dev): ~/bcc_run_tool.sh offcputime --stack-storage-size 0 -f 1 [Running] /data/users/abirchall/bcc/tools/offcputime.py --stack-storage-size 0 -f 1 usage: offcputime.py [-h] [-u | -p PID] [-v] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [duration] offcputime.py: error: argument --stack-storage-size: must be positive and nonzero devbig680[bcc](no branch, rebasing abirchall_dev): ~/bcc_run_tool.sh offcputime --stack-storage-size 1 -f 1 [Running] /data/users/abirchall/bcc/tools/offcputime.py --stack-storage-size 1 -f 1 swapper/27;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 7 swapper/29;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 102 swapper/25;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 113 swapper/26;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 138 swapper/28;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 164 swapper/30;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 177 swapper/24;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 980 swapper/31;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 1556 swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2038 swapper/3;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2190 swapper/2;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2235 swapper/15;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2260 swapper/13;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2450 swapper/6;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2646 swapper/12;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2675 swapper/11;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2769 swapper/8;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 2885 swapper/4;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 3134 swapper/10;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 3416 swapper/14;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 3702 swapper/7;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 4343 swapper/9;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 5397 swapper/5;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 6836 swapper/17;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 13792 swapper/19;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 14488 swapper/22;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 15298 swapper/18;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 15735 swapper/20;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 17333 swapper/16;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 17645 swapper/23;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 17681 swapper/21;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule 18607 WARNING: 92 stack traces could not be displayed. Consider increasing --stack-storage-size. --- src/cc/export/helpers.h | 7 +++++ tools/offcputime.py | 67 +++++++++++++++++++++++++++++--------------- tools/offcputime_example.txt | 23 +++++++++------ 3 files changed, 65 insertions(+), 32 deletions(-) diff --git a/src/cc/export/helpers.h b/src/cc/export/helpers.h index 2596eb3..2570849 100644 --- a/src/cc/export/helpers.h +++ b/src/cc/export/helpers.h @@ -175,12 +175,19 @@ static int (*bpf_perf_event_output)(void *ctx, void *map, u32 index, void *data, (void *) BPF_FUNC_perf_event_output; static int (*bpf_skb_load_bytes)(void *ctx, int offset, void *to, u32 len) = (void *) BPF_FUNC_skb_load_bytes; + +/* bpf_get_stackid will return a negative value in the case of an error + * + * BPF_STACK_TRACE(_name, _size) will allocate space for _size stack traces. + * -ENOMEM will be returned when this limit is reached. + */ static int (*bpf_get_stackid_)(void *ctx, void *map, u64 flags) = (void *) BPF_FUNC_get_stackid; static inline __attribute__((always_inline)) int bpf_get_stackid(uintptr_t map, void *ctx, u64 flags) { return bpf_get_stackid_(ctx, (void *)map, flags); } + static int (*bpf_csum_diff)(void *from, u64 from_size, void *to, u64 to_size, u64 seed) = (void *) BPF_FUNC_csum_diff; diff --git a/tools/offcputime.py b/tools/offcputime.py index e8f2120..31dc0a3 100755 --- a/tools/offcputime.py +++ b/tools/offcputime.py @@ -21,35 +21,54 @@ from time import sleep, strftime import argparse import signal +# arg validation +def positive_int(val): + try: + ival = int(val) + except ValueError: + raise argparse.ArgumentTypeError("must be an integer") + + if ival < 0: + raise argparse.ArgumentTypeError("must be positive") + return ival + +def positive_nonzero_int(val): + ival = positive_int(val) + if ival == 0: + raise argparse.ArgumentTypeError("must be nonzero") + return ival + # arguments examples = """examples: ./offcputime # trace off-CPU stack time until Ctrl-C ./offcputime 5 # trace for 5 seconds only ./offcputime -f 5 # 5 seconds, and output in folded format ./offcputime -u # don't include kernel threads (user only) - ./offcputime -p 185 # trace fo PID 185 only + ./offcputime -p 185 # trace for PID 185 only """ parser = argparse.ArgumentParser( description="Summarize off-CPU time by kernel stack trace", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) -parser.add_argument("-u", "--useronly", action="store_true", +thread_group = parser.add_mutually_exclusive_group() +thread_group.add_argument("-u", "--useronly", action="store_true", help="user threads only (no kernel threads)") -parser.add_argument("-p", "--pid", +thread_group.add_argument("-p", "--pid", type=positive_int, help="trace this PID only") parser.add_argument("-v", "--verbose", action="store_true", help="show raw addresses") parser.add_argument("-f", "--folded", action="store_true", help="output folded format") +parser.add_argument("--stack-storage-size", default=1024, + type=positive_nonzero_int, + help="the number of unique stack traces that can be stored and " \ + "displayed (default 1024)") parser.add_argument("duration", nargs="?", default=99999999, + type=positive_nonzero_int, help="duration of trace, in seconds") args = parser.parse_args() folded = args.folded duration = int(args.duration) -debug = 0 -if args.pid and args.useronly: - print("ERROR: use either -p or -u.") - exit() # signal handler def signal_ignore(signal, frame): @@ -68,7 +87,7 @@ struct key_t { }; BPF_HASH(counts, struct key_t); BPF_HASH(start, u32); -BPF_STACK_TRACE(stack_traces, 1024) +BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE) int oncpu(struct pt_regs *ctx, struct task_struct *prev) { u32 pid; @@ -104,23 +123,26 @@ int oncpu(struct pt_regs *ctx, struct task_struct *prev) { return 0; } """ -if args.pid: + +# set thread filter +if args.pid is not None: filter = 'pid == %s' % args.pid elif args.useronly: filter = '!(prev->flags & PF_KTHREAD)' else: filter = '1' bpf_text = bpf_text.replace('FILTER', filter) -if debug: - print(bpf_text) + +# set stack storage size +bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) # initialize BPF b = BPF(text=bpf_text) b.attach_kprobe(event="finish_task_switch", fn_name="oncpu") matched = b.num_open_kprobes() if matched == 0: - print("0 functions traced. Exiting.") - exit() + print("error: 0 functions traced. Exiting.", file=stderr) + exit(1) # header if not folded: @@ -140,18 +162,16 @@ if not folded: print() missing_stacks = 0 +has_enomem = False counts = b.get_table("counts") stack_traces = b.get_table("stack_traces") for k, v in sorted(counts.items(), key=lambda counts: counts[1].value): - """ - bpf_get_stackid will return a negative value in the case of an error - - BPF_STACK_TRACE(_name, _size) will allocate space for _size stack traces - on each CPU. -ENOMEM will be returned when this limit is reached within a - single CPU. - """ + # handle get_stackid erorrs if k.stack_id < 0: missing_stacks += 1 + # check for an ENOMEM error + if k.stack_id == -12: + has_enomem = True continue stack = stack_traces.walk(k.stack_id) @@ -169,7 +189,8 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value): print(" %d\n" % v.value) if missing_stacks > 0: - print(("WARNING: %d stack traces could not be displayed. " - "You may be running out of storage space for stack traces.") % - missing_stacks, + enomem_str = "" if not has_enomem else \ + " Consider increasing --stack-storage-size." + print("WARNING: %d stack traces could not be displayed.%s" % + (missing_stacks, enomem_str), file=stderr) diff --git a/tools/offcputime_example.txt b/tools/offcputime_example.txt index deab802..cb42b4b 100644 --- a/tools/offcputime_example.txt +++ b/tools/offcputime_example.txt @@ -11,7 +11,7 @@ Here is some example output. To explain what we are seeing: the very first stack trace looks like a page fault (do_page_fault() etc) from the "chmod" command, and in total was off-CPU for 13 microseconds. -# ./offcputime +# ./offcputime Tracing off-CPU time (us) by kernel stack... Hit Ctrl-C to end. ^C schedule @@ -644,7 +644,7 @@ Tracing off-CPU time (us) by kernel stack for 5 secs. Here, dd was blocked for 4.4 seconds out of 5. Or put differently, likely on-CPU for about 12% of the time. Which matches the ratio seen by time(1): -# time dd if=/dev/md0 iflag=direct of=/dev/null bs=1k +# time dd if=/dev/md0 iflag=direct of=/dev/null bs=1k ^C108115+0 records in 108114+0 records out 110708736 bytes (111 MB) copied, 13.7565 s, 8.0 MB/s @@ -718,19 +718,24 @@ creating your "off-CPU time flame graphs". USAGE message: # ./offcputime -h -usage: offcputime [-h] [-u] [-p PID] [-v] [-f] [duration] +usage: offcputime.py [-h] [-u | -p PID] [-v] [-f] + [--stack-storage-size STACK_STORAGE_SIZE] + [duration] Summarize off-CPU time by kernel stack trace positional arguments: - duration duration of trace, in seconds + duration duration of trace, in seconds optional arguments: - -h, --help show this help message and exit - -u, --useronly user threads only (no kernel threads) - -p PID, --pid PID trace this PID only - -v, --verbose show raw addresses - -f, --folded output folded format + -h, --help show this help message and exit + -u, --useronly user threads only (no kernel threads) + -p PID, --pid PID trace this PID only + -v, --verbose show raw addresses + -f, --folded output folded format + --stack-storage-size STACK_STORAGE_SIZE + the number of unique stack traces that can be stored + and displayed (default 1024) examples: ./offcputime # trace off-CPU stack time until Ctrl-C -- 2.7.4