From 423656396e3c8454924daefc9a8dc0496414d748 Mon Sep 17 00:00:00 2001
From: Fei Li
Date: Mon, 31 Aug 2020 21:35:33 +0800
Subject: [PATCH] kvmexit.py: introduce a tool to show kvm exit reasons and counts

Considering virtual machines' frequent exits can cause performance
problems, introduce a tool to show kvm exit reasons and counts, so that
the most frequent exited reasons could be located, reduced, or even
avoided.

For better performance, this tool employs a percpu array and percpu
hash in bpf to store exit reason and its counts. Besides, the bcc python
provides aggregation and various custom output.

For more background, realization and examples, please see
kvmexit_example.txt and man/man8/kvmexit.8 for more reference.

Signed-off-by: Fei Li
---
 README.md                 |   1 +
 man/man8/kvmexit.8        | 115 ++++++++++++++
 tools/kvmexit.py          | 389 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/kvmexit_example.txt | 250 +++++++++++++++++++++++++++++
 4 files changed, 755 insertions(+)
 create mode 100644 man/man8/kvmexit.8
 create mode 100755 tools/kvmexit.py
 create mode 100644 tools/kvmexit_example.txt

diff --git a/README.md b/README.md
index f731c0a..e95532b 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,7 @@ pair of .c and .py files, and some are directories of files.
 - tools/[inject](tools/inject.py): Targeted error injection with call chain and predicates [Examples](tools/inject_example.txt).
 - tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
 - tools/[klockstat](tools/klockstat.py): Traces kernel mutex lock events and display locks statistics. [Examples](tools/klockstat_example.txt).
+- tools/[kvmexit](tools/kvmexit.py): Display the exit_reason and its statistics of each vm exit. [Examples](tools/kvmexit_example.txt).
 - tools/[llcstat](tools/llcstat.py): Summarize CPU cache references and misses by process. [Examples](tools/llcstat_example.txt).
 - tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
 - tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_example.txt).
diff --git a/man/man8/kvmexit.8 b/man/man8/kvmexit.8
new file mode 100644
index 0000000..c0cb4c9
--- /dev/null
+++ b/man/man8/kvmexit.8
@@ -0,0 +1,115 @@
+.TH kvmexit 8 "2021-07-08" "USER COMMANDS"
+.SH NAME
+kvmexit \- Display the exit_reason and its statistics of each vm exit.
+.SH SYNOPSIS
+.B kvmexit [\-h] [\-p PID [\-v VCPU | \-a] ] [\-t TID | \-T 'TID1,TID2'] [duration]
+.SH DESCRIPTION
+Considering virtual machines' frequent exits can cause performance problems,
+this tool aims to locate the most frequent exit reasons and then find solutions
+to reduce or even avoid the exits, by displaying the detailed exit reasons and
+the counts of each vm exit for all vms running on one physical machine.
+
+This tool uses a PERCPU_ARRAY: pcpuArrayA and a percpu_hash: hashA to
+collaboratively store each kvm exit reason and its count. The reason is that
+when one vcpu exits and re-enters, it tends to continue to run on the same
+physical cpu as in the last cycle, which is called a 'cache hit'. Thus we use
+a PERCPU_ARRAY to record the 'cache hit' case to speed things up, and use a
+percpu_hash for the other cases.
+
+As RAW_TRACEPOINT_PROBE(kvm_exit) consumes fewer cpu cycles, when this tool is
+used, it first tries to employ raw tracepoints in modules, and if that fails,
+it falls back to the regular tracepoint.
+
+Limitation: In view of the hardware-assisted virtualization technology of
+different architectures, currently we only support vmx on Intel.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+
+This also requires Linux 4.7+ (BPF_PROG_TYPE_TRACEPOINT support).
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-p PID
+Display process with this PID only, collapse all tids with exit reasons sorted
+in descending order.
+.TP
+\-v VCPU
+Display this VCPU only for this PID.
+.TP
+\-a
+Display all TIDS for this PID.
+.TP
+\-t TID
+Display thread with this TID only with exit reasons sorted in descending order.
+.TP
+\-T 'TID1,TID2'
+Display threads for a union like {395490, 395491}.
+.TP
+duration
+Duration of display, after sleeping several seconds.
+.SH EXAMPLES
+.TP
+Display kvm exit reasons and statistics for all threads... Hit Ctrl-C to end:
+#
+.B kvmexit
+.TP
+Display kvm exit reasons and statistics for all threads after sleeping 6 secs:
+#
+.B kvmexit 6
+.TP
+Display kvm exit reasons and statistics for PID 1273795 after sleeping 5 secs:
+#
+.B kvmexit -p 1273795 5
+.TP
+Display kvm exit reasons and statistics for PID 1273795 and its all threads after sleeping 5 secs:
+#
+.B kvmexit -p 1273795 5 -a
+.TP
+Display kvm exit reasons and statistics for PID 1273795 VCPU 0... Hit Ctrl-C to end:
+#
+.B kvmexit -p 1273795 -v 0
+.TP
+Display kvm exit reasons and statistics for PID 1273795 VCPU 0 after sleeping 4 secs:
+#
+.B kvmexit -p 1273795 -v 0 4
+.TP
+Display kvm exit reasons and statistics for TID 1273819 after sleeping 10 secs:
+#
+.B kvmexit -t 1273819 10
+.TP
+Display kvm exit reasons and statistics for TIDS ['1273820', '1273819']... Hit Ctrl-C to end:
+#
+.B kvmexit -T '1273820,1273819'
+.SH OVERHEAD
+This traces the "kvm_exit" kernel tracepoint, records the exit reason and
+calculates its counts. Compared with adding more vm-exit reason debug entries,
+this tool is easier and more flexible: the bcc python logic can provide nice
+kernel aggregation and custom output, and the bpf in-kernel percpu_array and
+percpu_cache further improve performance.
+
+The impact of using this tool on the host should be negligible.
While this
+tool is very efficient, it does affect the guest virtual machine itself. The
+average test results on the guest vm are as follows:
+               | cpu cycles
+    no TP      | 1127
+    regular TP | 1277 (13% downgrade)
+    RAW TP     | 1187 (5% downgrade)
+
+Host: echo 1 > /proc/sys/net/core/bpf_jit_enable
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Fei Li
diff --git a/tools/kvmexit.py b/tools/kvmexit.py
new file mode 100755
index 0000000..a959efb
--- /dev/null
+++ b/tools/kvmexit.py
@@ -0,0 +1,389 @@
+#!/usr/bin/env python
+#
+# kvmexit.py
+#
+# Display the exit_reason and its statistics of each vm exit
+# for all vcpus of all virtual machines. For example:
+# $./kvmexit.py
+#  PID      TID      KVM_EXIT_REASON                     COUNT
+#  1273551  1273568  EXIT_REASON_MSR_WRITE               6
+#  1274253  1274261  EXIT_REASON_EXTERNAL_INTERRUPT      1
+#  1274253  1274261  EXIT_REASON_HLT                     12
+#  ...
+#
+# Besides, we also allow users to specify one pid, tid(s), or one
+# pid and its vcpu. See kvmexit_example.txt for more examples.
+#
+# @PID: each virtual machine's pid in the user space.
+# @TID: the user space's thread of each vcpu of that virtual machine.
+# @KVM_EXIT_REASON: the reason why the vm exits.
+# @COUNT: the counts of the @KVM_EXIT_REASONS.
+#
+# REQUIRES: Linux 4.7+ (BPF_PROG_TYPE_TRACEPOINT support)
+#
+# Copyright (c) 2021 ByteDance Inc. All rights reserved.
+#
+# Author(s):
+#   Fei Li
+
+
+from __future__ import print_function
+from time import sleep, strftime
+from bcc import BPF
+import argparse
+import multiprocessing
+import os
+import signal
+import subprocess
+
+#
+# Process Arguments
+#
+def valid_args_list(args):
+    args_list = args.split(",")
+    for arg in args_list:
+        try:
+            int(arg)
+        except ValueError:
+            raise argparse.ArgumentTypeError("must be valid integer")
+    return args_list
+
+# arguments
+examples = """examples:
+    ./kvmexit                     # Display kvm_exit_reason and its statistics in real-time until Ctrl-C
+    ./kvmexit 5                   # Display in real-time after sleeping 5s
+    ./kvmexit -p 3195281          # Collapse all tids for pid 3195281 with exit reasons sorted in descending order
+    ./kvmexit -p 3195281 20       # Collapse all tids for pid 3195281 with exit reasons sorted in descending order, and display after sleeping 20s
+    ./kvmexit -p 3195281 -v 0     # Display only vcpu0 for pid 3195281, descending sort by default
+    ./kvmexit -p 3195281 -a       # Display all tids for pid 3195281
+    ./kvmexit -t 395490           # Display only for tid 395490 with exit reasons sorted in descending order
+    ./kvmexit -t 395490 20        # Display only for tid 395490 with exit reasons sorted in descending order after sleeping 20s
+    ./kvmexit -T '395490,395491'  # Display for a union like {395490, 395491}
+"""
+parser = argparse.ArgumentParser(
+    description="Display kvm_exit_reason and its statistics at a timed interval",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("duration", nargs="?", default=99999999, type=int, help="show delta for next several seconds")
+parser.add_argument("-p", "--pid", type=int, help="trace this PID only")
+exgroup = parser.add_mutually_exclusive_group()
+exgroup.add_argument("-t", "--tid", type=int, help="trace this TID only")
+exgroup.add_argument("-T", "--tids", type=valid_args_list, help="trace a comma separated series of tids with no space in between")
+exgroup.add_argument("-v", "--vcpu", type=int, help="trace
this vcpu only")
+exgroup.add_argument("-a", "--alltids", action="store_true", help="trace all tids for this pid")
+args = parser.parse_args()
+duration = int(args.duration)
+
+#
+# Setup BPF
+#
+
+# load BPF program
+bpf_text = """
+#include
+
+#define REASON_NUM 69
+#define TGID_NUM 1024
+
+struct exit_count {
+    u64 exit_ct[REASON_NUM];
+};
+BPF_PERCPU_ARRAY(init_value, struct exit_count, 1);
+BPF_TABLE("percpu_hash", u64, struct exit_count, pcpu_kvm_stat, TGID_NUM);
+
+struct cache_info {
+    u64 cache_pid_tgid;
+    struct exit_count cache_exit_ct;
+};
+BPF_PERCPU_ARRAY(pcpu_cache, struct cache_info, 1);
+
+FUNC_ENTRY {
+    int cache_miss = 0;
+    int zero = 0;
+    u32 er = GET_ER;
+    if (er >= REASON_NUM) {
+        return 0;
+    }
+
+    u64 cur_pid_tgid = bpf_get_current_pid_tgid();
+    u32 tgid = cur_pid_tgid >> 32;
+    u32 pid = cur_pid_tgid;
+
+    if (THREAD_FILTER)
+        return 0;
+
+    struct exit_count *tmp_info = NULL, *initial = NULL;
+    struct cache_info *cache_p;
+    cache_p = pcpu_cache.lookup(&zero);
+    if (cache_p == NULL) {
+        return 0;
+    }
+
+    if (cache_p->cache_pid_tgid == cur_pid_tgid) {
+        //a. If the cur_pid_tgid hit this physical cpu consecutively, save it to pcpu_cache
+        tmp_info = &cache_p->cache_exit_ct;
+    } else {
+        //b. If another pid_tgid matches this pcpu for the last hit, OR it is the first time to hit this physical cpu.
+        cache_miss = 1;
+
+        // b.a Try to load the last cache struct if exists.
+        tmp_info = pcpu_kvm_stat.lookup(&cur_pid_tgid);
+
+        // b.b If it is the first time for the cur_pid_tgid to hit this pcpu, employ a
+        // per_cpu array to initialize pcpu_kvm_stat's exit_count with each exit reason's count is zero
+        if (tmp_info == NULL) {
+            initial = init_value.lookup(&zero);
+            if (initial == NULL) {
+                return 0;
+            }
+
+            pcpu_kvm_stat.update(&cur_pid_tgid, initial);
+            tmp_info = pcpu_kvm_stat.lookup(&cur_pid_tgid);
+            // To pass the verifier
+            if (tmp_info == NULL) {
+                return 0;
+            }
+        }
+    }
+
+    if (er < REASON_NUM) {
+        tmp_info->exit_ct[er]++;
+        if (cache_miss == 1) {
+            if (cache_p->cache_pid_tgid != 0) {
+                // b.*.a Let's save the last hit cache_info into kvm_stat.
+                pcpu_kvm_stat.update(&cache_p->cache_pid_tgid, &cache_p->cache_exit_ct);
+            }
+            // b.* As the cur_pid_tgid meets current pcpu_cache_array for the first time, save it.
+            cache_p->cache_pid_tgid = cur_pid_tgid;
+            bpf_probe_read(&cache_p->cache_exit_ct, sizeof(*tmp_info), tmp_info);
+        }
+        return 0;
+    }
+
+    return 0;
+}
+"""
+
+# format output
+exit_reasons = (
+    "EXCEPTION_NMI",
+    "EXTERNAL_INTERRUPT",
+    "TRIPLE_FAULT",
+    "INIT_SIGNAL",
+    "N/A",
+    "N/A",
+    "N/A",
+    "INTERRUPT_WINDOW",
+    "NMI_WINDOW",
+    "TASK_SWITCH",
+    "CPUID",
+    "N/A",
+    "HLT",
+    "INVD",
+    "INVLPG",
+    "RDPMC",
+    "RDTSC",
+    "N/A",
+    "VMCALL",
+    "VMCLEAR",
+    "VMLAUNCH",
+    "VMPTRLD",
+    "VMPTRST",
+    "VMREAD",
+    "VMRESUME",
+    "VMWRITE",
+    "VMOFF",
+    "VMON",
+    "CR_ACCESS",
+    "DR_ACCESS",
+    "IO_INSTRUCTION",
+    "MSR_READ",
+    "MSR_WRITE",
+    "INVALID_STATE",
+    "MSR_LOAD_FAIL",
+    "N/A",
+    "MWAIT_INSTRUCTION",
+    "MONITOR_TRAP_FLAG",
+    "N/A",
+    "MONITOR_INSTRUCTION",
+    "PAUSE_INSTRUCTION",
+    "MCE_DURING_VMENTRY",
+    "N/A",
+    "TPR_BELOW_THRESHOLD",
+    "APIC_ACCESS",
+    "EOI_INDUCED",
+    "GDTR_IDTR",
+    "LDTR_TR",
+    "EPT_VIOLATION",
+    "EPT_MISCONFIG",
+    "INVEPT",
+    "RDTSCP",
+    "PREEMPTION_TIMER",
+    "INVVPID",
+    "WBINVD",
+    "XSETBV",
+    "APIC_WRITE",
+    "RDRAND",
+    "INVPCID",
+    "VMFUNC",
+    "ENCLS",
+    "RDSEED",
+    "PML_FULL",
+    "XSAVES",
+    "XRSTORS",
+    "N/A",
+    "N/A",
+    "UMWAIT",
+    "TPAUSE"
+)
+
+#
+# Do some checks
+#
+try:
+    # Currently only adapted to the Intel architecture
+    cmd = "cat /proc/cpuinfo | grep vendor_id | head -n 1"
+    arch_info = subprocess.check_output(cmd, shell=True).strip()
+    if b"Intel" in arch_info:
+        pass
+    else:
+        raise Exception("Currently we only support Intel architecture, please do expansion if needs more.")
+
+    # Check if kvm module is loaded
+    if os.access("/dev/kvm", os.R_OK | os.W_OK):
+        pass
+    else:
+        raise Exception("Please insmod kvm module to use kvmexit tool.")
+except Exception as e:
+    raise Exception("Failed to do precondition check, due to: %s." % e)
+
+try:
+    if BPF.support_raw_tracepoint_in_module():
+        # Let's firstly try raw_tracepoint_in_module
+        func_entry = "RAW_TRACEPOINT_PROBE(kvm_exit)"
+        get_er = "ctx->args[0]"
+    else:
+        # If raw_tp_in_module is not supported, fall back to regular tp
+        func_entry = "TRACEPOINT_PROBE(kvm, kvm_exit)"
+        get_er = "args->exit_reason"
+except Exception as e:
+    raise Exception("Failed to catch kvm exit reasons due to: %s" % e)
+
+
+def find_tid(tgt_dir, tgt_vcpu):
+    for tid in os.listdir(tgt_dir):
+        path = tgt_dir + "/" + tid + "/comm"
+        with open(path, "r") as fp:
+            comm = fp.read()
+        if (comm.find(tgt_vcpu) != -1):
+            return tid
+    return -1
+
+# set process/thread filter
+thread_context = ""
+header_format = ""
+need_collapse = not args.alltids
+if args.tid is not None:
+    thread_context = "TID %s" % args.tid
+    thread_filter = 'pid != %s' % args.tid
+elif args.tids is not None:
+    thread_context = "TIDS %s" % args.tids
+    thread_filter = "pid != " + " && pid != ".join(args.tids)
+    header_format = "TIDS "
+elif args.pid is not None:
+    thread_context = "PID %s" % args.pid
+    thread_filter = 'tgid != %s' % args.pid
+    if args.vcpu is not None:
+        thread_context = "PID %s VCPU %s" % (args.pid, args.vcpu)
+        # transfer vcpu to tid
+        tgt_dir = '/proc/' + str(args.pid) + '/task'
+        tgt_vcpu = "CPU " + str(args.vcpu)
+        args.tid = find_tid(tgt_dir,
tgt_vcpu)
+        if args.tid == -1:
+            raise Exception("There's no v%s for PID %d." % (tgt_vcpu, args.pid))
+        thread_filter = 'pid != %s' % args.tid
+    elif args.alltids:
+        thread_context = "PID %s and its all threads" % args.pid
+        header_format = "TID      "
+else:
+    thread_context = "all threads"
+    thread_filter = '0'
+    header_format = "PID      TID      "
+bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter)
+
+# For kernel >= 5.0, use RAW_TRACEPOINT_MODULE for performance consideration
+bpf_text = bpf_text.replace('FUNC_ENTRY', func_entry)
+bpf_text = bpf_text.replace('GET_ER', get_er)
+b = BPF(text=bpf_text)
+
+
+# header
+print("Display kvm exit reasons and statistics for %s" % thread_context, end="")
+if duration < 99999999:
+    print(" after sleeping %d secs." % duration)
+else:
+    print("... Hit Ctrl-C to end.")
+print("%s%-35s %s" % (header_format, "KVM_EXIT_REASON", "COUNT"))
+
+# signal handler
+def signal_ignore(signal, frame):
+    print()
+try:
+    sleep(duration)
+except KeyboardInterrupt:
+    signal.signal(signal.SIGINT, signal_ignore)
+
+
+# Currently, sort multiple tids in descending order is not supported.
+if (args.pid or args.tid):
+    ct_reason = []
+    if args.pid:
+        tgid_exit = [0 for i in range(len(exit_reasons))]
+
+# output
+pcpu_kvm_stat = b["pcpu_kvm_stat"]
+pcpu_cache = b["pcpu_cache"]
+for k, v in pcpu_kvm_stat.items():
+    tgid = k.value >> 32
+    pid = k.value & 0xffffffff
+    for i in range(0, len(exit_reasons)):
+        sum1 = 0
+        for inner_cpu in range(0, multiprocessing.cpu_count()):
+            cachePIDTGID = pcpu_cache[0][inner_cpu].cache_pid_tgid
+            # Take priority to check if it is in cache
+            if cachePIDTGID == k.value:
+                sum1 += pcpu_cache[0][inner_cpu].cache_exit_ct.exit_ct[i]
+            # If not in cache, find from kvm_stat
+            else:
+                sum1 += v[inner_cpu].exit_ct[i]
+        if sum1 == 0:
+            continue
+
+        if (args.pid and args.pid == tgid and need_collapse):
+            tgid_exit[i] += sum1
+        elif (args.tid and args.tid == pid):
+            ct_reason.append((sum1, i))
+        elif not need_collapse or args.tids:
+            print("%-8u %-35s %-8u" % (pid, exit_reasons[i], sum1))
+        else:
+            print("%-8u %-8u %-35s %-8u" % (tgid, pid, exit_reasons[i], sum1))
+
+    # Display only for the target tid in descending sort
+    if (args.tid and args.tid == pid):
+        ct_reason.sort(reverse=True)
+        for i in range(0, len(ct_reason)):
+            if ct_reason[i][0] == 0:
+                continue
+            print("%-35s %-8u" % (exit_reasons[ct_reason[i][1]], ct_reason[i][0]))
+        break
+
+
+# Aggregate all tids' counts for this args.pid in descending sort
+if args.pid and need_collapse:
+    for i in range(0, len(exit_reasons)):
+        ct_reason.append((tgid_exit[i], i))
+    ct_reason.sort(reverse=True)
+    for i in range(0, len(ct_reason)):
+        if ct_reason[i][0] == 0:
+            continue
+        print("%-35s %-8u" % (exit_reasons[ct_reason[i][1]], ct_reason[i][0]))
diff --git a/tools/kvmexit_example.txt b/tools/kvmexit_example.txt
new file mode 100644
index 0000000..6b5b871
--- /dev/null
+++ b/tools/kvmexit_example.txt
@@ -0,0 +1,250 @@
+Demonstrations of kvm exit reasons, the Linux eBPF/bcc version.
+
+
+Considering virtual machines' frequent exits can cause performance problems,
+this tool aims to locate the most frequent exit reasons and then find solutions
+to reduce or even avoid the exits, by displaying the detailed exit reasons and
+the counts of each vm exit for all vms running on one physical machine.
+
+
+Features of this tool
+=====================
+
+- Although there is a patch: [KVM: x86: add full vm-exit reason debug entries]
+  (https://patchwork.kernel.org/project/kvm/patch/1555939499-30854-1-git-send-email-pizhenwei@bytedance.com/)
+  trying to fill more vm-exit reason debug entries, just as the comments said,
+  the code allocates lots of memory that may never be consumed, misses some
+  arch-specific kvm causes, and can not do kernel aggregation. Instead bcc, as
+  a user space tool, can implement all these functions more easily and flexibly.
+- The bcc python logic can provide nice kernel aggregation and custom output,
+  like collapsing all tids for one pid (i.e. one vm's qemu process id) with exit
+  reasons sorted in descending order. For more information, see the following
+  #USAGE message.
+- The bpf in-kernel percpu_array and percpu_cache further improve performance.
+  For more information, see the following #Help to understand.
+
+
+Limitation
+==========
+
+In view of the hardware-assisted virtualization technology of
+different architectures, currently we only support vmx on Intel.
+AMD support is planned.
+
+
+Example output:
+===============
+
+# ./kvmexit.py
+Display kvm exit reasons and statistics for all threads... Hit Ctrl-C to end.
+PID      TID      KVM_EXIT_REASON                     COUNT
+^C1273551  1273568  EXIT_REASON_HLT                     12
+1273551  1273568  EXIT_REASON_MSR_WRITE               6
+1274253  1274261  EXIT_REASON_EXTERNAL_INTERRUPT      1
+1274253  1274261  EXIT_REASON_HLT                     12
+1274253  1274261  EXIT_REASON_MSR_WRITE               4
+
+# ./kvmexit.py 6
+Display kvm exit reasons and statistics for all threads after sleeping 6 secs.
+PID      TID      KVM_EXIT_REASON                     COUNT
+1273903  1273922  EXIT_REASON_EXTERNAL_INTERRUPT      175
+1273903  1273922  EXIT_REASON_CPUID                   10
+1273903  1273922  EXIT_REASON_HLT                     6043
+1273903  1273922  EXIT_REASON_IO_INSTRUCTION          24
+1273903  1273922  EXIT_REASON_MSR_WRITE               15025
+1273903  1273922  EXIT_REASON_PAUSE_INSTRUCTION       11
+1273903  1273922  EXIT_REASON_EOI_INDUCED             12
+1273903  1273922  EXIT_REASON_EPT_VIOLATION           6
+1273903  1273922  EXIT_REASON_EPT_MISCONFIG           380
+1273903  1273922  EXIT_REASON_PREEMPTION_TIMER        194
+1273551  1273568  EXIT_REASON_EXTERNAL_INTERRUPT      18
+1273551  1273568  EXIT_REASON_HLT                     989
+1273551  1273568  EXIT_REASON_IO_INSTRUCTION          10
+1273551  1273568  EXIT_REASON_MSR_WRITE               2205
+1273551  1273568  EXIT_REASON_PAUSE_INSTRUCTION       1
+1273551  1273568  EXIT_REASON_EOI_INDUCED             5
+1273551  1273568  EXIT_REASON_EPT_MISCONFIG           61
+1273551  1273568  EXIT_REASON_PREEMPTION_TIMER        14
+
+# ./kvmexit.py -p 1273795 5
+Display kvm exit reasons and statistics for PID 1273795 after sleeping 5 secs.
+KVM_EXIT_REASON                     COUNT
+MSR_WRITE                           13467
+HLT                                 5060
+PREEMPTION_TIMER                    345
+EPT_MISCONFIG                       264
+EXTERNAL_INTERRUPT                  169
+EPT_VIOLATION                       18
+PAUSE_INSTRUCTION                   6
+IO_INSTRUCTION                      4
+EOI_INDUCED                         2
+
+# ./kvmexit.py -p 1273795 5 -a
+Display kvm exit reasons and statistics for PID 1273795 and its all threads after sleeping 5 secs.
+TID      KVM_EXIT_REASON                     COUNT
+1273819  EXTERNAL_INTERRUPT                  64
+1273819  HLT                                 2802
+1273819  IO_INSTRUCTION                      4
+1273819  MSR_WRITE                           7196
+1273819  PAUSE_INSTRUCTION                   2
+1273819  EOI_INDUCED                         2
+1273819  EPT_VIOLATION                       6
+1273819  EPT_MISCONFIG                       162
+1273819  PREEMPTION_TIMER                    194
+1273820  EXTERNAL_INTERRUPT                  78
+1273820  HLT                                 2054
+1273820  MSR_WRITE                           5199
+1273820  EPT_VIOLATION                       2
+1273820  EPT_MISCONFIG                       77
+1273820  PREEMPTION_TIMER                    102
+
+# ./kvmexit.py -p 1273795 -v 0
+Display kvm exit reasons and statistics for PID 1273795 VCPU 0... Hit Ctrl-C to end.
+KVM_EXIT_REASON                     COUNT
+^CMSR_WRITE                           2076
+HLT                                 795
+PREEMPTION_TIMER                    86
+EXTERNAL_INTERRUPT                  20
+EPT_MISCONFIG                       10
+PAUSE_INSTRUCTION                   2
+IO_INSTRUCTION                      2
+EPT_VIOLATION                       1
+EOI_INDUCED                         1
+
+# ./kvmexit.py -p 1273795 -v 0 4
+Display kvm exit reasons and statistics for PID 1273795 VCPU 0 after sleeping 4 secs.
+KVM_EXIT_REASON                     COUNT
+MSR_WRITE                           4726
+HLT                                 1827
+PREEMPTION_TIMER                    78
+EPT_MISCONFIG                       67
+EXTERNAL_INTERRUPT                  28
+IO_INSTRUCTION                      4
+EOI_INDUCED                         2
+PAUSE_INSTRUCTION                   2
+
+# ./kvmexit.py -p 1273795 -v 4 4
+Traceback (most recent call last):
+  File "tools/kvmexit.py", line 306, in
+    raise Exception("There's no v%s for PID %d." % (tgt_vcpu, args.pid))
+  Exception: There's no vCPU 4 for PID 1273795.
+
+# ./kvmexit.py -t 1273819 10
+Display kvm exit reasons and statistics for TID 1273819 after sleeping 10 secs.
+KVM_EXIT_REASON                     COUNT
+MSR_WRITE                           13318
+HLT                                 5274
+EPT_MISCONFIG                       263
+PREEMPTION_TIMER                    171
+EXTERNAL_INTERRUPT                  109
+IO_INSTRUCTION                      8
+PAUSE_INSTRUCTION                   5
+EOI_INDUCED                         4
+EPT_VIOLATION                       2
+
+# ./kvmexit.py -T '1273820,1273819'
+Display kvm exit reasons and statistics for TIDS ['1273820', '1273819']... Hit Ctrl-C to end.
+TIDS     KVM_EXIT_REASON                     COUNT
+^C1273819  EXTERNAL_INTERRUPT                  300
+1273819  HLT                                 13718
+1273819  IO_INSTRUCTION                      26
+1273819  MSR_WRITE                           37457
+1273819  PAUSE_INSTRUCTION                   13
+1273819  EOI_INDUCED                         13
+1273819  EPT_VIOLATION                       53
+1273819  EPT_MISCONFIG                       654
+1273819  PREEMPTION_TIMER                    958
+1273820  EXTERNAL_INTERRUPT                  212
+1273820  HLT                                 9002
+1273820  MSR_WRITE                           25495
+1273820  PAUSE_INSTRUCTION                   2
+1273820  EPT_VIOLATION                       64
+1273820  EPT_MISCONFIG                       396
+1273820  PREEMPTION_TIMER                    268
+
+
+Help to understand
+==================
+
+We use a PERCPU_ARRAY: pcpuArrayA and a percpu_hash: hashA to collaboratively
+store each kvm exit reason and its count. The reason is there exists a rule when
+one vcpu exits and re-enters, it tends to continue to run on the same physical
+cpu (pcpu as follows) as the last cycle, which is also called 'cache hit'.
Thus
+we use a PERCPU_ARRAY to record the 'cache hit' situation to speed
+things up; for other cases, we use a percpu_hash.
+
+BTW, we originally used a common hash to do this, with a u64(exit_reason)
+key and a struct exit_info {tgid_pid, exit_reason} value. But due to
+the big lock in bpf_hash, each update is quite costly.
+
+Now imagine a pid_tgidA (vcpu A) exits and is going to run on
+pcpuArrayA; the BPF code flow is as follows:
+
+              pid_tgidA keeps running on the same pcpu
+                     //                \\
+                    //                  \\
+                   //       Y     N      \\
+                  //                      \\
+     a. cache_hit                           b. cache_miss
+(cacheA's pid_tgid matches pid_tgidA)               ||
+        |                                           ||
+        |                                           ||
+"increase percpu exit_ct and return"                ||
+            [*Note*]                                ||
+                              pid_tgidA ever been exited on pcpuArrayA?
+                                        //          \\
+                                       //            \\
+                                      //              \\
+                                     //     Y    N     \\
+                                    //                  \\
+                      b.a load_last_hashA     b.b initialize_hashA_with_zero
+                                      \                  /
+                                       \                /
+                                        \              /
+                                    "increase percpu exit_ct"
+                                               ||
+                                               ||
+                          is another pid_tgid been running on pcpuArrayA?
+                                        //          \\
+                                       //  Y     N   \\
+                                      //              \\
+                         b.*.a save_theLastHit_hashB    do_nothing
+                                      \\              //
+                                       \\            //
+                                        \\          //
+                                    b.* save_to_pcpuArrayA
+
+
+[*Note*] we do not update the table in "a." above, in case the vcpu hits the same
+pcpu again on its next exit; instead we only update once this pcpu is hit
+by a different tgidpid(vcpu), which happens in "b.*.a" and "b.*".
+
+
+USAGE message:
+==============
+
+# ./kvmexit.py -h
+usage: kvmexit.py [-h] [-p PID [-v VCPU | -a] ] [-t TID | -T 'TID1,TID2'] [duration]
+
+Display kvm_exit_reason and its statistics at a timed interval
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -p PID, --pid PID     display process with this PID only, collapse all tids with exit reasons sorted in descending order
+  -v VCPU, --vcpu VCPU  display this VCPU only for this PID
+  -a, --alltids         display all TIDS for this PID
+  -t TID, --tid TID     display thread with this TID only with exit reasons sorted in descending order
+  -T 'TID1,TID2', --tids 'TID1,TID2'
+                        display threads for a union like {395490, 395491}
+  duration              duration of display, after sleeping several seconds
+
+examples:
+    ./kvmexit                     # Display kvm_exit_reason and its statistics in real-time until Ctrl-C
+    ./kvmexit 5                   # Display in real-time after sleeping 5s
+    ./kvmexit -p 3195281          # Collapse all tids for pid 3195281 with exit reasons sorted in descending order
+    ./kvmexit -p 3195281 20       # Collapse all tids for pid 3195281 with exit reasons sorted in descending order, and display after sleeping 20s
+    ./kvmexit -p 3195281 -v 0     # Display only vcpu0 for pid 3195281, descending sort by default
+    ./kvmexit -p 3195281 -a       # Display all tids for pid 3195281
+    ./kvmexit -t 395490           # Display only for tid 395490 with exit reasons sorted in descending order
+    ./kvmexit -t 395490 20        # Display only for tid 395490 with exit reasons sorted in descending order after sleeping 20s
+    ./kvmexit -T '395490,395491'  # Display for a union like {395490, 395491}
-- 
2.7.4
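
Reviewer note on the "Help to understand" flow: the cache-plus-hash bookkeeping can be easier to follow as a plain userspace model. The sketch below is not part of the patch and does not use bcc; the class PcpuStore, the function total, and the sample pid_tgid values are hypothetical names chosen for illustration. It assumes one store per physical cpu, mirrors steps a., b.a, b.b, b.*.a and b.* from the diagram, and sums counts the way the output loop in kvmexit.py does (prefer the cache when it currently holds the key, otherwise read the hash).

```python
REASON_NUM = 69  # same bound as REASON_NUM in the BPF program


class PcpuStore:
    """Hypothetical userspace stand-in for one physical cpu's slice of the maps."""

    def __init__(self):
        self.cache_key = 0                    # plays the role of cache_pid_tgid
        self.cache_counts = [0] * REASON_NUM  # plays the role of cache_exit_ct
        self.table = {}                       # plays the role of pcpu_kvm_stat

    def record_exit(self, pid_tgid, reason):
        if reason >= REASON_NUM:
            return
        if self.cache_key == pid_tgid:
            # a. cache hit: only the cached counters are touched
            self.cache_counts[reason] += 1
            return
        # b. cache miss: b.a load the saved counters, or b.b start from zero
        counts = self.table.setdefault(pid_tgid, [0] * REASON_NUM)
        counts[reason] += 1
        if self.cache_key != 0:
            # b.*.a write the previous occupant's cached counters back
            self.table[self.cache_key] = list(self.cache_counts)
        # b.* the current pid_tgid becomes the cached occupant of this cpu
        self.cache_key = pid_tgid
        self.cache_counts = list(counts)


def total(cpus, pid_tgid, reason):
    """Sum one reason's count across cpus, preferring the cache when it is current."""
    result = 0
    for cpu in cpus:
        if cpu.cache_key == pid_tgid:
            result += cpu.cache_counts[reason]
        elif pid_tgid in cpu.table:
            result += cpu.table[pid_tgid][reason]
    return result


HLT, MSR_WRITE = 12, 32  # indexes of "HLT" and "MSR_WRITE" in exit_reasons
VCPU_A, VCPU_B = (100 << 32) | 101, (200 << 32) | 201  # made-up pid_tgid keys
cpus = [PcpuStore(), PcpuStore()]
# vcpu A exits three times on cpu 0, vcpu B once, then vcpu A comes back.
for key, reason in [(VCPU_A, HLT)] * 3 + [(VCPU_B, MSR_WRITE), (VCPU_A, HLT)]:
    cpus[0].record_exit(key, reason)
cpus[1].record_exit(VCPU_A, HLT)  # vcpu A also exits once on cpu 1
print(total(cpus, VCPU_A, HLT), total(cpus, VCPU_B, MSR_WRITE))
```

In this model the hash entry for a key can be stale while that key sits in the cache, which is why the reader has to check the cache first; that is the same ordering the output loop of the tool uses.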