From 860b649746013acdb60b8d1b577ae17dbf92a676 Mon Sep 17 00:00:00 2001 From: Brendan Gregg Date: Tue, 20 Oct 2015 15:52:23 -0700 Subject: [PATCH] softirq and hardirq --- README.md | 2 + man/man8/hardirqs.8 | 92 +++++++ man/man8/softirqs.8 | 93 +++++++ tools/hardirqs | 155 +++++++++++ tools/hardirqs_example.txt | 638 +++++++++++++++++++++++++++++++++++++++++++++ tools/softirqs | 145 +++++++++++ tools/softirqs_example.txt | 203 +++++++++++++++ 7 files changed, 1328 insertions(+) create mode 100644 man/man8/hardirqs.8 create mode 100644 man/man8/softirqs.8 create mode 100755 tools/hardirqs create mode 100644 tools/hardirqs_example.txt create mode 100755 tools/softirqs create mode 100644 tools/softirqs_example.txt diff --git a/README.md b/README.md index cbb1ff1..b0564a7 100644 --- a/README.md +++ b/README.md @@ -69,9 +69,11 @@ Tools: - tools/[biosnoop](tools/biosnoop): Trace block device I/O with PID and latency. [Examples](tools/biosnoop_example.txt). - tools/[funccount](tools/funccount): Count kernel function calls. [Examples](tools/funccount_example.txt). - tools/[funclatency](tools/funclatency): Time kernel functions and show their latency distribution. [Examples](tools/funclatency_example.txt). +- tools/[hardirqs](tools/hardirqs): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt). - tools/[killsnoop](tools/killsnoop): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt). - tools/[opensnoop](tools/opensnoop): Trace open() syscalls. [Examples](tools/opensnoop_example.txt). - tools/[pidpersec](tools/pidpersec): Count new processes (via fork). [Examples](tools/pidpersec_example.txt). +- tools/[softirqs](tools/softirqs): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt). - tools/[syncsnoop](tools/syncsnoop): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt). - tools/[tcpaccept](tools/tcpaccept): Trace TCP passive connections (accept()). [Examples](tools/tcpaccept_example.txt). - tools/[tcpconnect](tools/tcpconnect): Trace TCP active connections (connect()). [Examples](tools/tcpconnect_example.txt). diff --git a/man/man8/hardirqs.8 b/man/man8/hardirqs.8 new file mode 100644 index 0000000..7276ef7 --- /dev/null +++ b/man/man8/hardirqs.8 @@ -0,0 +1,92 @@ +.TH hardirqs 8 "2015-10-20" "USER COMMANDS" +.SH NAME +hardirqs \- Measure hard IRQ (hard interrupt) event time. Uses Linux eBPF/bcc. +.SH SYNOPSIS +.B hardirqs [\-h] [\-T] [\-N] [\-d] [interval] [count] +.SH DESCRIPTION +This summarizes the time spent servicing hard IRQs (hard interrupts), and can +show this time as either totals or histogram distributions. A system-wide +summary of this time is shown by the %irq column of mpstat(1), and event +counts (but not times) are shown by /proc/interrupts. + +WARNING: This currently uses dynamic tracing of hard interrupts. You should +understand what this means before use. Try in a test environment. Future +versions should switch to tracepoints. + +Since this uses BPF, only the root user can use this tool. +.SH REQUIREMENTS +CONFIG_BPF and bcc. +.SH OPTIONS +.TP +\-h +Print usage message. +.TP +\-T +Include timestamps on output. +.TP +\-N +Output in nanoseconds +.TP +\-d +Show IRQ time distribution as histograms +.SH EXAMPLES +.TP +Sum hard IRQ event time until Ctrl-C: +# +.B hardirqs +.TP +Show hard IRQ event time as histograms: +# +.B hardirqs \-d +.TP +Print 1 second summaries, 10 times: +# +.B hardirqs 1 10 +.TP +1 second summaries, printed in nanoseconds, with timestamps: +# +.B hardirqs \-NT 1 +.SH FIELDS +.TP +HARDIRQ +The irq action name for this hard IRQ. +.TP +TOTAL_usecs +Total time spent in this hard IRQ in microseconds. +.TP +TOTAL_nsecs +Total time spent in this hard IRQ in nanoseconds. +.TP +usecs +Range of microseconds for this bucket. +.TP +nsecs +Range of nanoseconds for this bucket. +.TP +count +Number of hard IRQs in this time range. +.TP +distribution +ASCII representation of the distribution (the count column). +.SH OVERHEAD +This traces kernel functions and maintains in-kernel counts, which +are asynchronously copied to user-space. While the rate of interrupts +be very high (>1M/sec), this is a relatively efficient way to trace these +events, and so the overhead is expected to be small for normal workloads, but +could become noticable for heavy workloads. Measure in a test environment +before use. +.SH SOURCE +This is from bcc. +.IP +https://github.com/iovisor/bcc +.PP +Also look in the bcc distribution for a companion _examples.txt file containing +example usage, output, and commentary for this tool. +.SH OS +Linux +.SH STABILITY +Unstable - in development. +.SH AUTHOR +Brendan Gregg +.SH SEE ALSO +softirqs(8) diff --git a/man/man8/softirqs.8 b/man/man8/softirqs.8 new file mode 100644 index 0000000..011382c --- /dev/null +++ b/man/man8/softirqs.8 @@ -0,0 +1,93 @@ +.TH softirqs 8 "2015-10-20" "USER COMMANDS" +.SH NAME +softirqs \- Measure soft IRQ (soft interrupt) event time. Uses Linux eBPF/bcc. +.SH SYNOPSIS +.B softirqs [\-h] [\-T] [\-N] [\-d] [interval] [count] +.SH DESCRIPTION +This summarizes the time spent servicing soft IRQs (soft interrupts), and can +show this time as either totals or histogram distributions. A system-wide +summary of this time is shown by the %soft column of mpstat(1), and soft IRQ +event counts (but not times) are available in /proc/softirqs. + +WARNING: This currently uses dynamic tracing of various soft interrupt +functions, and can easily not work with different kernel versions. Check and +adjust the code as necessary. Also try in a test environment and ensure this +tool is safe before use. Future versions should switch to tracepoints. + +Since this uses BPF, only the root user can use this tool. +.SH REQUIREMENTS +CONFIG_BPF and bcc. +.SH OPTIONS +.TP +\-h +Print usage message. +.TP +\-T +Include timestamps on output. +.TP +\-N +Output in nanoseconds +.TP +\-d +Show IRQ time distribution as histograms +.SH EXAMPLES +.TP +Sum soft IRQ event time until Ctrl-C: +# +.B softirqs +.TP +Show soft IRQ event time as histograms: +# +.B softirqs \-d +.TP +Print 1 second summaries, 10 times: +# +.B softirqs 1 10 +.TP +1 second summaries, printed in nanoseconds, with timestamps: +# +.B softirqs \-NT 1 +.SH FIELDS +.TP +SOFTIRQ +The kernel function name that performs the soft IRQ action. +.TP +TOTAL_usecs +Total time spent in this soft IRQ function in microseconds. +.TP +TOTAL_nsecs +Total time spent in this soft IRQ function in nanoseconds. +.TP +usecs +Range of microseconds for this bucket. +.TP +nsecs +Range of nanoseconds for this bucket. +.TP +count +Number of soft IRQs in this time range. +.TP +distribution +ASCII representation of the distribution (the count column). +.SH OVERHEAD +This traces kernel functions and maintains in-kernel counts, which +are asynchronously copied to user-space. While the rate of interrupts +be very high (>1M/sec), this is a relatively efficient way to trace these +events, and so the overhead is expected to be small for normal workloads, but +could become noticable for heavy workloads. Measure in a test environment +before use. +.SH SOURCE +This is from bcc. +.IP +https://github.com/iovisor/bcc +.PP +Also look in the bcc distribution for a companion _examples.txt file containing +example usage, output, and commentary for this tool. +.SH OS +Linux +.SH STABILITY +Unstable - in development. +.SH AUTHOR +Brendan Gregg +.SH SEE ALSO +hardirqs(8) diff --git a/tools/hardirqs b/tools/hardirqs new file mode 100755 index 0000000..bff5e51 --- /dev/null +++ b/tools/hardirqs @@ -0,0 +1,155 @@ +#!/usr/bin/python +# +# hardirqs Summarize hard IRQ (interrupt) event time. +# For Linux, uses BCC, eBPF. +# +# USAGE: hardirqs [-h] [-T] [-Q] [-m] [-D] [interval] [count] +# +# Thanks Amer Ather for help understanding irq behavior. +# +# Copyright (c) 2015 Brendan Gregg. +# Licensed under the Apache License, Version 2.0 (the "License") +# +# 19-Oct-2015 Brendan Gregg Created this. + +from __future__ import print_function +from bcc import BPF +from time import sleep, strftime +import argparse + +### arguments +examples = """examples: + ./hardirqs # sum hard irq event time + ./hardirqs -d # show hard irq event time as histograms + ./hardirqs 1 10 # print 1 second summaries, 10 times + ./hardirqs -NT 1 # 1s summaries, nanoseconds, and timestamps +""" +parser = argparse.ArgumentParser( + description="Summarize hard irq event time as histograms", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=examples) +parser.add_argument("-T", "--timestamp", action="store_true", + help="include timestamp on output") +parser.add_argument("-N", "--nanoseconds", action="store_true", + help="output in nanoseconds") +parser.add_argument("-d", "--dist", action="store_true", + help="show distributions as histograms") +parser.add_argument("interval", nargs="?", default=99999999, + help="output interval, in seconds") +parser.add_argument("count", nargs="?", default=99999999, + help="number of outputs") +args = parser.parse_args() +countdown = int(args.count) +if args.nanoseconds: + factor = 1 + label = "nsecs" +else: + factor = 1000 + label = "usecs" +debug = 0 + +### define BPF program +bpf_text = """ +#include +#include +#include +#include + +typedef struct irq_key { + char name[32]; + u64 slot; +} irq_key_t; +BPF_HASH(start, u32); +BPF_HASH(irqdesc, u32, struct irq_desc *); +BPF_HISTOGRAM(dist, irq_key_t); + +// time IRQ +int trace_start(struct pt_regs *ctx, struct irq_desc *desc) +{ + u32 pid = bpf_get_current_pid_tgid(); + u64 ts = bpf_ktime_get_ns(); + start.update(&pid, &ts); + irqdesc.update(&pid, &desc); + return 0; +} + +int trace_completion(struct pt_regs *ctx) +{ + u64 *tsp, delta; + struct irq_desc **descp; + u32 pid = bpf_get_current_pid_tgid(); + + // fetch timestamp and calculate delta + tsp = start.lookup(&pid); + descp = irqdesc.lookup(&pid); + if (tsp == 0 || descp == 0) { + return 0; // missed start + } + // Note: descp is a value from map, so '&' can be done without + // probe_read, but the next level irqaction * needs a probe read. + // Do these steps first after reading the map, otherwise some of these + // pointers may get pushed onto the stack and verifier will fail. + struct irqaction *action = 0; + bpf_probe_read(&action, sizeof(action), &(*descp)->action); + const char **namep = &action->name; + char *name = 0; + bpf_probe_read(&name, sizeof(name), namep); + delta = bpf_ktime_get_ns() - *tsp; + + // store as sum or histogram + STORE + + start.delete(&pid); + irqdesc.delete(&pid); + return 0; +} +""" + +### code substitutions +if args.dist: + bpf_text = bpf_text.replace('STORE', + 'irq_key_t key = {.slot = bpf_log2l(delta)};' + + 'bpf_probe_read(&key.name, sizeof(key.name), name);' + + 'dist.increment(key);') +else: + bpf_text = bpf_text.replace('STORE', + 'irq_key_t key = {.slot = 0 /* ignore */};' + + 'bpf_probe_read(&key.name, sizeof(key.name), name);' + + 'u64 zero = 0, *vp = dist.lookup_or_init(&key, &zero);' + + '(*vp) += delta;') +if debug: + print(bpf_text) + +### load BPF program +b = BPF(text=bpf_text) + +# these should really use irq:irq_handler_entry/exit tracepoints: +b.attach_kprobe(event="handle_irq_event_percpu", fn_name="trace_start") +b.attach_kretprobe(event="handle_irq_event_percpu", fn_name="trace_completion") + +print("Tracing hard irq event time... Hit Ctrl-C to end.") + +### output +exiting = 0 if args.interval else 1 +dist = b.get_table("dist") +while (1): + try: + sleep(int(args.interval)) + except KeyboardInterrupt: + exiting=1 + + print() + if args.timestamp: + print("%-8s\n" % strftime("%H:%M:%S"), end="") + + if args.dist: + dist.print_log2_hist(label, "hardirq") + else: + print("%-26s %11s" % ("HARDIRQ", "TOTAL_" + label)) + for k, v in sorted(dist.items(), key=lambda dist: dist[1].value): + print("%-26s %11d" % (k.name, v.value / factor)) + dist.clear() + + countdown -= 1 + if exiting or countdown == 0: + exit() diff --git a/tools/hardirqs_example.txt b/tools/hardirqs_example.txt new file mode 100644 index 0000000..c2d9fa2 --- /dev/null +++ b/tools/hardirqs_example.txt @@ -0,0 +1,638 @@ +Demonstrations of hardirqs, the Linux eBPF/bcc version. + + +This program traces hard interrupts (irqs), and stores timing statistics +in-kernel for efficiency. For example: + +# ./hardirqs +Tracing hard irq event time... Hit Ctrl-C to end. +^C +HARDIRQ TOTAL_usecs +callfuncsingle0 2 +callfuncsingle5 5 +callfuncsingle6 5 +callfuncsingle7 21 +blkif 66 +timer7 84 +resched5 94 +resched0 97 +resched3 102 +resched7 111 +resched6 255 +timer3 362 +resched4 367 +timer5 474 +timer1 529 +timer6 679 +timer2 746 +timer4 943 +resched1 1048 +timer0 1558 +resched2 1750 +eth0 11441 + +The HARDIRQ column prints the interrupt action name. While tracing, the eth0 +hard irq action ran for 11441 microseconds (11 milliseconds) in total. + +Many other interrupts are visible in the output: this is an 8 CPU system, and +some of these interrupts have a separate action per-CPU (eg, "timer", +"resched"). + + +An interval can be provided, and also optionally a count. Eg, printing output +every 1 second, and including timestamps (-T): + +# ./hardirqs -T 1 3 +Tracing hard irq event time... Hit Ctrl-C to end. + +22:16:14 +HARDIRQ TOTAL_usecs +callfuncsingle0 2 +callfuncsingle7 5 +callfuncsingle3 5 +callfuncsingle2 5 +callfuncsingle6 6 +callfuncsingle1 11 +resched0 32 +blkif 51 +resched5 71 +resched7 71 +resched4 72 +resched6 82 +timer7 172 +resched1 187 +resched2 236 +timer3 252 +resched3 282 +timer1 320 +timer2 374 +timer6 396 +timer5 427 +timer4 470 +timer0 1430 +eth0 7498 + +22:16:15 +HARDIRQ TOTAL_usecs +callfuncsingle7 6 +callfuncsingle5 11 +callfuncsingle4 13 +timer2 17 +callfuncsingle6 18 +resched0 21 +blkif 33 +resched3 40 +resched5 60 +resched4 69 +resched6 70 +resched7 74 +timer7 86 +resched2 91 +timer3 134 +resched1 293 +timer5 354 +timer1 433 +timer6 497 +timer4 1112 +timer0 1768 +eth0 6972 + +22:16:16 +HARDIRQ TOTAL_usecs +callfuncsingle7 5 +callfuncsingle3 5 +callfuncsingle2 6 +timer3 10 +resched0 18 +callfuncsingle4 22 +resched5 27 +resched6 44 +blkif 45 +resched7 65 +resched4 69 +timer4 77 +resched2 97 +timer7 98 +resched3 103 +timer2 169 +resched1 226 +timer5 525 +timer1 691 +timer6 697 +timer0 1415 +eth0 7152 + +This can be useful for quantifying where CPU cycles are spent among the hard +interrupts (summarized as the %irq column from mpstat(1)). The output above +shows that most time was spent processing for eth0 (network interface), which +was around 7 milliseconds per second (total across all CPUs). + +Note that the time spent among the "timer" interrupts was low, and usually less +than one microsecond per second. Here's the hardirq per-second output when the +perf tool is performing a 999 Hertz CPU profile ("perf record -F 999 -a ..."): + +22:13:59 +HARDIRQ TOTAL_usecs +callfuncsingle7 5 +callfuncsingle5 5 +callfuncsingle3 6 +callfuncsingle4 7 +callfuncsingle6 19 +blkif 66 +resched0 66 +resched2 82 +resched7 87 +resched3 96 +resched4 118 +resched5 120 +resched6 130 +resched1 230 +timer3 946 +timer1 1981 +timer7 2618 +timer5 3063 +timer6 3141 +timer4 3511 +timer2 3554 +timer0 5044 +eth0 16015 + +This sheds some light into the CPU overhead of the perf profiler, which cost +around 3 milliseconds per second. Note that I'm usually profiling at a much +lower rate, 99 Hertz, which looks like this: + +22:22:12 +HARDIRQ TOTAL_usecs +callfuncsingle3 5 +callfuncsingle6 5 +callfuncsingle5 22 +blkif 46 +resched6 47 +resched5 57 +resched4 66 +resched7 78 +resched2 97 +resched0 214 +timer2 326 +timer0 498 +timer5 536 +timer6 576 +timer1 600 +timer4 982 +resched1 1315 +timer7 1364 +timer3 1825 +resched3 5708 +eth0 9743 + +Much lower (and remember to compare this to the baseline). Note that perf has +other overheads (non-irq CPU cycles, file system storage). + + +The distribution of interrupt run time can be printed as a histogram with the -d +option. Eg: + +# ./hardirqs -d +Tracing hard irq event time... Hit Ctrl-C to end. +^C + +hardirq = 'callfuncsingle1' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 0 | | + 8192 -> 16383 : 1 |****************************************| + +hardirq = 'callfuncsingle0' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 1 |****************************************| + +hardirq = 'callfuncsingle3' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 3 |****************************************| + +hardirq = 'callfuncsingle2' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 2 |****************************************| + +hardirq = 'callfuncsingle5' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 5 |****************************************| + +hardirq = 'callfuncsingle4' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 6 |****************************************| + +hardirq = 'callfuncsingle7' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 4 |****************************************| + +hardirq = 'callfuncsingle6' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 4 |****************************************| + +hardirq = 'eth0' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 5102 |********* | + 1024 -> 2047 : 20617 |****************************************| + 2048 -> 4095 : 4832 |********* | + 4096 -> 8191 : 12 | | + +hardirq = 'timer7' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 9 |*** | + 2048 -> 4095 : 70 |***************************** | + 4096 -> 8191 : 94 |****************************************| + +hardirq = 'timer6' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 1 | | + 2048 -> 4095 : 86 |*********** | + 4096 -> 8191 : 295 |****************************************| + 8192 -> 16383 : 28 |*** | + +hardirq = 'timer5' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 1 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 137 |****************************************| + 4096 -> 8191 : 123 |*********************************** | + 8192 -> 16383 : 8 |** | + +hardirq = 'timer4' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 46 |********* | + 4096 -> 8191 : 198 |****************************************| + 8192 -> 16383 : 49 |********* | + +hardirq = 'timer3' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 4 | | + 2048 -> 4095 : 210 |****************************************| + 4096 -> 8191 : 186 |*********************************** | + +hardirq = 'timer2' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 245 |****************************************| + 4096 -> 8191 : 227 |************************************* | + 8192 -> 16383 : 6 | | + +hardirq = 'timer1' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 6 |* | + 2048 -> 4095 : 112 |************************ | + 4096 -> 8191 : 181 |****************************************| + 8192 -> 16383 : 7 |* | + +hardirq = 'timer0' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 887 |****************************************| + 8192 -> 16383 : 92 |**** | + +hardirq = 'blkif' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 9 |****************************************| + 8192 -> 16383 : 7 |******************************* | + 16384 -> 32767 : 2 |******** | + +hardirq = 'resched4' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 104 |****************************************| + 2048 -> 4095 : 80 |****************************** | + +hardirq = 'resched5' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 27 |***** | + 1024 -> 2047 : 216 |****************************************| + 2048 -> 4095 : 27 |***** | + 4096 -> 8191 : 1 | | + +hardirq = 'resched6' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 480 |******************* | + 1024 -> 2047 : 1003 |****************************************| + 2048 -> 4095 : 64 |** | + +hardirq = 'resched7' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 46 |********* | + 1024 -> 2047 : 190 |****************************************| + 2048 -> 4095 : 42 |******** | + +hardirq = 'resched0' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 11 |**** | + 1024 -> 2047 : 100 |****************************************| + 2048 -> 4095 : 23 |********* | + +hardirq = 'resched1' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 96 |******** | + 1024 -> 2047 : 462 |****************************************| + 2048 -> 4095 : 36 |*** | + +hardirq = 'resched2' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 120 |************************** | + 1024 -> 2047 : 183 |****************************************| + 2048 -> 4095 : 41 |******** | + +hardirq = 'resched3' + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 789 |****************************************| + 2048 -> 4095 : 39 |* | + + +USAGE message: + +# ./hardirqs -h +usage: hardirqs [-h] [-T] [-N] [-d] [interval] [count] + +Summarize hard irq event time as histograms + +positional arguments: + interval output interval, in seconds + count number of outputs + +optional arguments: + -h, --help show this help message and exit + -T, --timestamp include timestamp on output + -N, --nanoseconds output in nanoseconds + -d, --dist show distributions as histograms + +examples: + ./hardirqs # sum hard irq event time + ./hardirqs -d # show hard irq event time as histograms + ./hardirqs 1 10 # print 1 second summaries, 10 times + ./hardirqs -NT 1 # 1s summaries, nanoseconds, and timestamps diff --git a/tools/softirqs b/tools/softirqs new file mode 100755 index 0000000..b3c2f3b --- /dev/null +++ b/tools/softirqs @@ -0,0 +1,145 @@ +#!/usr/bin/python +# +# softirqs Summarize soft IRQ (interrupt) event time. +# For Linux, uses BCC, eBPF. +# +# USAGE: softirqs [-h] [-T] [-N] [-d] [interval] [count] +# +# Copyright (c) 2015 Brendan Gregg. +# Licensed under the Apache License, Version 2.0 (the "License") +# +# 20-Oct-2015 Brendan Gregg Created this. + +from __future__ import print_function +from bcc import BPF +from time import sleep, strftime +import argparse + +### arguments +examples = """examples: + ./softirqs # sum soft irq event time + ./softirqs -d # show soft irq event time as histograms + ./softirqs 1 10 # print 1 second summaries, 10 times + ./softirqs -NT 1 # 1s summaries, nanoseconds, and timestamps +""" +parser = argparse.ArgumentParser( + description="Summarize soft irq event time as histograms", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=examples) +parser.add_argument("-T", "--timestamp", action="store_true", + help="include timestamp on output") +parser.add_argument("-N", "--nanoseconds", action="store_true", + help="output in nanoseconds") +parser.add_argument("-d", "--dist", action="store_true", + help="show distributions as histograms") +parser.add_argument("interval", nargs="?", default=99999999, + help="output interval, in seconds") +parser.add_argument("count", nargs="?", default=99999999, + help="number of outputs") +args = parser.parse_args() +countdown = int(args.count) +if args.nanoseconds: + factor = 1 + label = "nsecs" +else: + factor = 1000 + label = "usecs" +debug = 0 + +### define BPF program +bpf_text = """ +#include + +typedef struct irq_key { + u64 ip; + u64 slot; +} irq_key_t; +BPF_HASH(start, u32); +BPF_HASH(iptr, u32); +BPF_HISTOGRAM(dist, irq_key_t); + +// time IRQ +int trace_start(struct pt_regs *ctx) +{ + u32 pid = bpf_get_current_pid_tgid(); + u64 ip = ctx->ip, ts = bpf_ktime_get_ns(); + start.update(&pid, &ts); + iptr.update(&pid, &ip); + return 0; +} + +int trace_completion(struct pt_regs *ctx) +{ + u64 *tsp, delta, ip, *ipp; + u32 pid = bpf_get_current_pid_tgid(); + + // fetch timestamp and calculate delta + tsp = start.lookup(&pid); + ipp = iptr.lookup(&pid); + if (tsp == 0 || ipp == 0) { + return 0; // missed start + } + delta = bpf_ktime_get_ns() - *tsp; + ip = *ipp; + + // store as sum or histogram + STORE + + start.delete(&pid); + iptr.delete(&pid); + return 0; +} +""" + +### code substitutions +if args.dist: + bpf_text = bpf_text.replace('STORE', + 'irq_key_t key = {.ip = ip, .slot = bpf_log2l(delta)};' + + 'dist.increment(key);') +else: + bpf_text = bpf_text.replace('STORE', + 'irq_key_t key = {.ip = ip, .slot = 0 /* ignore */};' + + 'u64 zero = 0, *vp = dist.lookup_or_init(&key, &zero);' + + '(*vp) += delta;') +if debug: + print(bpf_text) + +### load BPF program +b = BPF(text=bpf_text) + +# this should really use irq:softirq_entry/exit tracepoints; for now the +# soft irq functions are individually traced (search your kernel for +# open_softirq() calls, and adjust the following list as needed). +for softirqfunc in ("blk_iopoll_softirq", "blk_done_softirq", + "rcu_process_callbacks", "run_rebalance_domains", "tasklet_action", + "tasklet_hi_action", "run_timer_softirq", "net_tx_action", + "net_rx_action"): + b.attach_kprobe(event=softirqfunc, fn_name="trace_start") + b.attach_kretprobe(event=softirqfunc, fn_name="trace_completion") + +print("Tracing soft irq event time... Hit Ctrl-C to end.") + +### output +exiting = 0 if args.interval else 1 +dist = b.get_table("dist") +while (1): + try: + sleep(int(args.interval)) + except KeyboardInterrupt: + exiting=1 + + print() + if args.timestamp: + print("%-8s\n" % strftime("%H:%M:%S"), end="") + + if args.dist: + dist.print_log2_hist(label, "softirq", section_print_fn=b.ksym) + else: + print("%-26s %11s" % ("SOFTIRQ", "TOTAL_" + label)) + for k, v in sorted(dist.items(), key=lambda dist: dist[1].value): + print("%-26s %11d" % (b.ksym(k.ip), v.value / factor)) + dist.clear() + + countdown -= 1 + if exiting or countdown == 0: + exit() diff --git a/tools/softirqs_example.txt b/tools/softirqs_example.txt new file mode 100644 index 0000000..be6ee2d --- /dev/null +++ b/tools/softirqs_example.txt @@ -0,0 +1,203 @@ +Demonstrations of softirqs, the Linux eBPF/bcc version. + + +This program traces soft interrupts (irqs), and stores timing statistics +in-kernel for efficiency. For example: + +# ./softirqs +Tracing soft irq event time... Hit Ctrl-C to end. +^C +SOFTIRQ TOTAL_usecs +rcu_process_callbacks 974 +run_rebalance_domains 1809 +run_timer_softirq 2615 +net_tx_action 14605 +tasklet_action 38692 +net_rx_action 88188 + +The SOFTIRQ column prints the interrupt action function name. While tracing, +the net_rx_action() soft interrupt ran for 20199 microseconds (20 milliseconds) +in total. + +This tool works by dynamic tracing the individual softirq functions, and will +need to be adjusted to match kernel/module changes. Future versions should +use the softirq tracepoints instead. + + +An interval can be provided, and also optionally a count. Eg, printing output +every 1 second, and including timestamps (-T): + +# ./softirqs -T 1 3 +Tracing soft irq event time... Hit Ctrl-C to end. + +22:29:16 +SOFTIRQ TOTAL_usecs +rcu_process_callbacks 456 +run_rebalance_domains 1005 +run_timer_softirq 1196 +net_tx_action 2796 +tasklet_action 5534 +net_rx_action 15075 + +22:29:17 +SOFTIRQ TOTAL_usecs +rcu_process_callbacks 456 +run_rebalance_domains 839 +run_timer_softirq 1142 +net_tx_action 1912 +tasklet_action 4428 +net_rx_action 14652 + +22:29:18 +SOFTIRQ TOTAL_usecs +rcu_process_callbacks 502 +run_rebalance_domains 840 +run_timer_softirq 1192 +net_tx_action 2341 +tasklet_action 5496 +net_rx_action 15656 + +This can be useful for quantifying where CPU cycles are spent among the soft +interrupts (summarized as the %softirq column from mpstat(1), and shown as +event counts in /proc/softirqs). The output above shows that most time was spent +processing net_rx_action(), which was around 15 milleconds per second (total +time across all CPUs). + + +The distribution of interrupt run time can be printed as a histogram with the -d +option. Eg: + +# ./softirqs -d +Tracing soft irq event time... Hit Ctrl-C to end. +^C + +softirq = net_tx_action + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 440 | | + 512 -> 1023 : 27613 |****************************************| + 1024 -> 2047 : 5728 |******** | + 2048 -> 4095 : 439 | | + 4096 -> 8191 : 53 | | + 8192 -> 16383 : 2 | | + +softirq = net_rx_action + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 6 | | + 1024 -> 2047 : 35 | | + 2048 -> 4095 : 3562 |**************** | + 4096 -> 8191 : 7023 |******************************** | + 8192 -> 16383 : 8770 |****************************************| + 16384 -> 32767 : 1780 |******** | + 32768 -> 65535 : 216 | | + 65536 -> 131071 : 4 | | + +softirq = tasklet_action + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 95 | | + 512 -> 1023 : 12521 |****************************************| + 1024 -> 2047 : 1068 |*** | + 2048 -> 4095 : 1077 |*** | + 4096 -> 8191 : 12349 |*************************************** | + 8192 -> 16383 : 464 |* | + 16384 -> 32767 : 1 | | + +softirq = rcu_process_callbacks + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 708 |****************************************| + 1024 -> 2047 : 495 |*************************** | + 2048 -> 4095 : 98 |***** | + 4096 -> 8191 : 62 |*** | + 8192 -> 16383 : 4 | | + +softirq = run_timer_softirq + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 2 | | + 512 -> 1023 : 366 |********* | + 1024 -> 2047 : 1525 |****************************************| + 2048 -> 4095 : 629 |**************** | + 4096 -> 8191 : 87 |** | + 8192 -> 16383 : 1 | | + +softirq = run_rebalance_domains + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 3 | | + 512 -> 1023 : 18 |* | + 1024 -> 2047 : 80 |******** | + 2048 -> 4095 : 374 |****************************************| + 4096 -> 8191 : 257 |*************************** | + 8192 -> 16383 : 50 |***** | + 16384 -> 32767 : 24 |** | + + +USAGE message: + +# ./softirqs -h +usage: softirqs [-h] [-T] [-N] [-d] [interval] [count] + +Summarize soft irq event time as histograms + +positional arguments: + interval output interval, in seconds + count number of outputs + +optional arguments: + -h, --help show this help message and exit + -T, --timestamp include timestamp on output + -N, --nanoseconds output in nanoseconds + -d, --dist show distributions as histograms + +examples: + ./softirqs # sum soft irq event time + ./softirqs -d # show soft irq event time as histograms + ./softirqs 1 10 # print 1 second summaries, 10 times + ./softirqs -NT 1 # 1s summaries, nanoseconds, and timestamps -- 2.7.4