Rob Herring [Mon, 27 Mar 2023 16:20:57 +0000 (11:20 -0500)]
perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store
Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging
Extension (MTE) and memory operations (MOPS). The memory operations
are memcpy and memset. Add support for decoding these new subclasses in
the raw decoding.
Reviewed-by: Leo Yan <leo.yan@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230327162057.4057188-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mike Leach [Fri, 31 Mar 2023 05:56:45 +0000 (06:56 +0100)]
perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
When using dynamically assigned CoreSight trace IDs the drivers can output
the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID
mapping.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mike Leach [Fri, 31 Mar 2023 05:56:44 +0000 (06:56 +0100)]
perf cs-etm: Update record event to use new Trace ID protocol
Trace IDs are now dynamically allocated.
Previously used the static association algorithm that is no longer
used. The 'cpu * 2 + seed' was outdated and broken for systems with high
core counts (>46). as it did not scale and was broken for larger
core counts.
Trace ID will now be sent in PERF_RECORD_AUX_OUTPUT_HW_ID record.
Legacy ID algorithm renamed and retained for limited backward
compatibility use.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mike Leach [Fri, 31 Mar 2023 05:56:43 +0000 (06:56 +0100)]
perf cs-etm: Move mapping of Trace ID and cpu into helper function
The information to associate Trace ID and CPU will be changing.
Drivers will start outputting this as a hardware ID packet in the data
file which if present will be used in preference to the AUXINFO values.
To prepare for this we provide a helper functions to do the individual ID
mapping, and one to extract the IDs from the completed metadata blocks.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 27 Mar 2023 22:57:11 +0000 (15:57 -0700)]
perf lock contention: Show detail failure reason for BPF
It can fail to collect lock stat from BPF for various reasons. For
example, I've got a report that sometimes time calculation seems wrong
in case of contended spinlocks. I suspect the time delta went negative
for some reason.
Count them separately and show in the output like below:
$ sudo perf lock contention -abE5 sleep 10
contended total wait max wait avg wait type caller
13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14
10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27
9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36
114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5
132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62
=== output for debug ===
bad: 1, total: 279
bad rate: 0.36 %
histogram of failure reasons
task: 1
stack: 0
time: 0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 27 Mar 2023 22:57:10 +0000 (15:57 -0700)]
perf lock contention: Fix debug stat if no contention
It should not divide if the total number is 0. Otherwise it'd show
NaN in the bad rate output. Also add a whitespace in the "output
for debug" message.
$ sudo perf lock contention -abv true
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
contended total wait max wait avg wait type caller
=== output for debug===
bad: 0, total: 0
bad rate: -nan % <------------------------- (here)
histogram of events caused bad sequence
acquire: 0
acquired: 0
contended: 0
release: 0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:41:42 +0000 (16:41 -0700)]
perf vendor events intel: Update ivybridge and ivytown
Update to versions 24 and 23 respectively. Adds the event
BR_MISP_EXEC.INDIRECT.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230328234142.1080045-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andreas Herrmann [Thu, 30 Mar 2023 07:42:02 +0000 (09:42 +0200)]
perf bench numa: Fix type of loop iterator in do_work, it should be 'long'
'j' is of type int and start/end are of type 'long'. Thus 'j' might become
negative and cause segfault in access_data(). Fix it by using 'long' for
'j' as well.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Andreas Herrmann <aherrmann@suse.de>
Link: https://lore.kernel.org/r/20230330074202.14052-1-aherrmann@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 30 Mar 2023 13:18:33 +0000 (16:18 +0300)]
perf symbol: Remove unused branch_callstack
branch_callstack was added by commit
8b7bad58efb7 ("perf callchain: Support
handling complete branch stacks as histograms") but never used.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230330131833.12864-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 30 Mar 2023 13:18:32 +0000 (16:18 +0300)]
perf top: Add --branch-history option
Add --branch-history option, to act the same as that option does for
perf report.
Example:
$ cat tcallf.c
volatile a = 10000, b = 100000, c;
__attribute__((noinline)) f2()
{
c = a / b;
}
__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
while (1)
f1();
}
$ gcc -w -g -o tcallf tcallf.c
$ ./tcallf &
[1] 29409
$ perf top -e cycles:u -t $(pidof tcallf) --stdio --no-children --branch-history
PerfTop: 3819 irqs/sec kernel: 0.0% exact: 0.0% lost: 0/0 drop: 0/0 [4000Hz cycles:u], (target_tid: 29409)
--------------------------------------------------------------------------------------------------------------------
49.01% tcallf.c:5 [.] f2 tcallf
|
|--24.91%--f2 tcallf.c:4
| |
| |--17.14%--f1 tcallf.c:11 (cycles:1)
| | f1 tcallf.c:11
| | f2 tcallf.c:6 (cycles:3)
| | f2 tcallf.c:4
| | f1 tcallf.c:10 (cycles:2)
| | f1 tcallf.c:9
| | main tcallf.c:16 (cycles:1)
| | main tcallf.c:16
| | main tcallf.c:16 (cycles:1)
| | main tcallf.c:16
| | f1 tcallf.c:12 (cycles:1)
| | f1 tcallf.c:12
| | f2 tcallf.c:6 (cycles:3)
| | f2 tcallf.c:4
| | f1 tcallf.c:11 (cycles:1 iter:1 avg_cycles:12)
| | f1 tcallf.c:11
| | f2 tcallf.c:6 (cycles:3 iter:1 avg_cycles:12)
| | f2 tcallf.c:4
| | f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12)
| |
| --7.78%--f1 tcallf.c:10 (cycles:2)
| f1 tcallf.c:9
| main tcallf.c:16 (cycles:1)
| main tcallf.c:16
| main tcallf.c:16 (cycles:1)
| main tcallf.c:16
| f1 tcallf.c:12 (cycles:1)
| f1 tcallf.c:12
| f2 tcallf.c:6 (cycles:3)
| f2 tcallf.c:4
| f1 tcallf.c:11 (cycles:1)
| f1 tcallf.c:11
| f2 tcallf.c:6 (cycles:3)
| f2 tcallf.c:4
| f1 tcallf.c:10 (cycles:2 iter:1 avg_cycles:12)
| f1 tcallf.c:9
| main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12)
| main tcallf.c:16
| main tcallf.c:16 (cycles:1 iter:1 avg_cycles:12)
...
$ pkill tcallf
[1]+ Terminated ./tcallf
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230330131833.12864-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 30 Mar 2023 18:38:27 +0000 (11:38 -0700)]
perf build: Conditionally define NDEBUG
When a build is done without DEBUG=1 then define NDEBUG. This will
compile out asserts and other debug code.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230330183827.1412303-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 30 Mar 2023 18:38:26 +0000 (11:38 -0700)]
perf block-range: Move debug code behind ifndef NDEBUG
Make good on a comment and avoid a unused-but-set-variable warning.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230330183827.1412303-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 30 Mar 2023 18:38:25 +0000 (11:38 -0700)]
perf bench: Avoid NDEBUG warning
With NDEBUG set the asserts are compiled out. This yields
"unused-but-set-variable" variables. Move these variables behind
NDEBUG to avoid the warning.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230330183827.1412303-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 29 Mar 2023 16:23:18 +0000 (09:23 -0700)]
perf vendor events: Update Alderlake for E-Core TMA v2.3
https://github.com/intel/perfmon/pull/65
Generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
The PR notes state:
- E-Core TMA version 2.3.
- FP_UOPS changed to FPDIV_Uops
- Added BR_MISP breakdown stats
- Frontend_Bandwidth/Latency changed to Fetch_Bandwidth/Latency
- Load_Store_Bound changed to Memory_Bound
- Icache changed to ICache_Misses
- ITLB changed to ITLB_Misses
- Store_Fwd changed to Store_Fwd_Blk
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230329162318.1227114-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:43 +0000 (16:55 -0700)]
perf symbol: Add command line support for addr2line path
Allow addr2line to be set either on the command line or via the
perfconfig file. This doesn't currently work with llvm-addr2line as
the addr2line code emits two things:
1) the address to decode,
2) a bogus ',' value.
The expectation is the bogus value will generate:
??
??:0
that terminates the addr2line reading. However, the output from
llvm-addr2line is a single line with just the input ',' locking up the
addr2line reading that is expecting a second line.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:42 +0000 (16:55 -0700)]
perf annotate: Allow objdump to be set in perfconfig
Allow the setting of the objdump command in the perfconfig. Update man
page for this new option.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:41 +0000 (16:55 -0700)]
perf annotate: Own objdump_path and disassembler_style strings
Make struct annotation_options own the strings objdump_path and
disassembler_style, freeing them on exit. Add missing strdup for
disassembler_style when read from a config file.
Committer notes:
Converted free(obj->member) to zfree(&obj->member) in
annotation_options__exit()
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:40 +0000 (16:55 -0700)]
perf annotate: Add init/exit to annotation_options remove default
The annotation__default_options global variable was used to initialize
annotation_options. Switch to the init/exit pattern as later changes
will give ownership over strings and this will be necessary to avoid
memory leaks.
Committer note:
Fix the GTK2=1 build, hist_entry__gtk_annotate() needs to receive a
'struct annotation_options' pointer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:39 +0000 (16:55 -0700)]
perf report: Additional config warnings
If the default_sort_order isn't correctly strdup-ed warn and return an
error. Debug warn if no option is matched.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 28 Mar 2023 23:55:38 +0000 (16:55 -0700)]
perf annotate: Delete session for debug builds
Use the debug build indicator as the guide to free the session. This
implements a behavior described in a comment, which is consequentially
removed.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 16 Mar 2023 19:41:56 +0000 (21:41 +0200)]
perf tools: Avoid warning in do_realloc_array_as_needed()
do_realloc_array_as_needed() used memcpy() of zero size with a NULL
pointer. Check the size first to avoid sanitize warning.
Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address".
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com
Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 16 Mar 2023 19:41:55 +0000 (21:41 +0200)]
perf symbols: Fix unaligned access in get_x86_64_plt_disp()
Use memcpy() to avoid unaligned access.
Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address".
Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64")
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com
Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 16 Mar 2023 19:41:54 +0000 (21:41 +0200)]
perf symbols: Fix use-after-free in get_plt_got_name()
Fix use-after-free in get_plt_got_name().
Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address".
Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64")
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com
Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kajol Jain [Tue, 28 Mar 2023 11:29:08 +0000 (16:59 +0530)]
perf vendor events power9: Remove UTF-8 characters from JSON files
Commit
3c22ba5243040c13 ("perf vendor events powerpc: Update POWER9
events") added and updated power9 PMU JSON events. However some of the
JSON events which are part of other.json and pipeline.json files,
contains UTF-8 characters in their brief description. Having UTF-8
character could breaks the perf build on some distros.
Fix this issue by removing the UTF-8 characters from other.json and
pipeline.json files.
Result without the fix:
[command]# file -i pmu-events/arch/powerpc/power9/*
pmu-events/arch/powerpc/power9/cache.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/floating-point.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/frontend.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/marked.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/memory.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/metrics.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/nest_metrics.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/other.json: application/json; charset=utf-8
pmu-events/arch/powerpc/power9/pipeline.json: application/json; charset=utf-8
pmu-events/arch/powerpc/power9/pmc.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/translation.json: application/json; charset=us-ascii
[command]#
Result with the fix:
[command]# file -i pmu-events/arch/powerpc/power9/*
pmu-events/arch/powerpc/power9/cache.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/floating-point.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/frontend.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/marked.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/memory.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/metrics.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/nest_metrics.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/other.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/pipeline.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/pmc.json: application/json; charset=us-ascii
pmu-events/arch/powerpc/power9/translation.json: application/json; charset=us-ascii
[command]#
Fixes: 3c22ba5243040c13 ("perf vendor events powerpc: Update POWER9 events")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/lkml/ZBxP77deq7ikTxwG@kernel.org/
Link: https://lore.kernel.org/r/20230328112908.113158-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yang Jihong [Fri, 24 Mar 2023 03:27:02 +0000 (03:27 +0000)]
perf ftrace: Make system wide the default target for latency subcommand
If no target is specified for 'latency' subcommand, the execution fails
because - 1 (invalid value) is written to set_ftrace_pid tracefs file.
Make system wide the default target, which is the same as the default
behavior of 'trace' subcommand.
Before the fix:
# perf ftrace latency -T schedule
failed to set ftrace pid
After the fix:
# perf ftrace latency -T schedule
^C# DURATION | COUNT | GRAPH |
0 - 1 us | 0 | |
1 - 2 us | 0 | |
2 - 4 us | 0 | |
4 - 8 us | 2828 | #### |
8 - 16 us | 23953 | ######################################## |
16 - 32 us | 408 | |
32 - 64 us | 318 | |
64 - 128 us | 4 | |
128 - 256 us | 3 | |
256 - 512 us | 0 | |
512 - 1024 us | 1 | |
1 - 2 ms | 4 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 4 | |
256 - 512 ms | 2 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
Fixes: 53be50282269b46c ("perf ftrace: Add 'latency' subcommand")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230324032702.109964-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tiezhu Yang [Tue, 21 Mar 2023 06:57:01 +0000 (14:57 +0800)]
perf bench syscall: Add fork syscall benchmark
This is a follow up patch for the execve bench which is actually
fork + execve, it makes sense to add the fork syscall benchmark
to compare the execve part precisely.
Some archs have no __NR_fork definition which is used only as a
check condition to call test_fork(), let us just define it as -1
to avoid build error.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: loongson-kernel@lists.loongnix.cn
Link: https://lore.kernel.org/r/1679381821-22736-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Thu, 16 Mar 2023 07:49:46 +0000 (08:49 +0100)]
perf stat: Suppress warning when using cpum_cf events on s390
Running command perf stat -vv -e cpu_cycles -C0 -- true
displays this warning:
Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,'
that may result in non-fatal errors
Make the PMU cpum_cf selectable and avoid this warning.
While at it also fix this warning for PMUs pai_crypto and pai_ext.
Output before:
# ./perf stat -vv -e cpu_cycles -C0 -- true
Using CPUID IBM,3931,704,A01,3.7,002f
Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,'
that may result in non-fatal errors
After aliases, add event pmu 'cpum_cf' with 'event,'
that may result in non-fatal errors
cpu_cycles -> cpum_cf/event=0/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
type 10
size 128
config 0x1001
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3
cpu_cycles: 0: 290434
2479172 2479172:
cpu_cycles: 290434
2479172 2479172
Performance counter stats for 'CPU(s) 0':
290,434 cpu_cycles
0.
002465617 seconds time elapsed
#
Now the warning "Attempting to add event pmu 'cpum_cf' ..."
does not show up anymore.
Output after:
# ./perf stat -vv -e cpu_cycles -C0 -- true
Using CPUID IBM,3931,704,A01,3.7,002f
After aliases, add event pmu 'cpum_cf' with 'event,'
that may result in non-fatal errors
cpu_cycles -> cpum_cf/event=0/
Control descriptor is not initialized
....
Performance counter stats for 'CPU(s) 0':
357,023 cpu_cycles
0.
002454995 seconds time elapsed
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230316074946.41110-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Patrice Duroux [Fri, 3 Mar 2023 19:20:56 +0000 (20:20 +0100)]
perf tests test_bridge_fdb_stress.sh: Fix redirection of stderr to stdin
It's not 2&>1, the correct is 2>&1.
Signed-off-by: Patrice Duroux <patrice.duroux@gmail.com>
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/r/20230303193058.21274-1-patrice.duroux@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Patrice Duroux [Fri, 3 Mar 2023 19:30:58 +0000 (20:30 +0100)]
perf tests record_offcpu.sh: Fix redirection of stderr to stdin
It's not 2&>1, the correct is 2>&1
Fixes: ade1d0307b2fb3d9 ("perf offcpu: Update offcpu test for child process")
Signed-off-by: Patrice Duroux <patrice.duroux@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230303193058.21274-1-patrice.duroux@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:18 +0000 (00:22 -0700)]
perf vendor events intel: Update metrics to detect pmem at runtime
By detecting whether nvdimms are installed at runtime the number of
events can be reduced if it isn't. These changes come from this PR:
https://github.com/intel/perfmon/pull/63
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:17 +0000 (00:22 -0700)]
perf metrics: Add has_pmem literal
Add literal so that if nvdimms aren't installed we can record fewer
events. The file detection mechanism was suggested by Dan Williams
<dan.j.williams@intel.com> in:
https://lore.kernel.org/linux-perf-users/641bbe1eced26_1b98bb29440@dwillia2-xfh.jf.intel.com.notmuch/
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:16 +0000 (00:22 -0700)]
perf vendor events intel: Sandybridge v19 events
Adds BR_MISP_EXEC.INDIRECT event.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:15 +0000 (00:22 -0700)]
perf vendor events intel: Jaketown v23 events
Adds BR_MISP_EXEC.INDIRECT event.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:14 +0000 (00:22 -0700)]
perf vendor events intel: Haswellx v27 events
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:13 +0000 (00:22 -0700)]
perf vendor events intel: Haswell v33 events
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:12 +0000 (00:22 -0700)]
perf vendor events intel: Broadwellx v20 events
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:11 +0000 (00:22 -0700)]
perf vendor events intel: Broadwellde v9 events
Updates descriptions and encodings. Adds BR_MISP_EXEC.INDIRECT events.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Fri, 24 Mar 2023 07:22:10 +0000 (00:22 -0700)]
perf vendor events intel: Broadwell v27 events
Description updates and formatting changes.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 24 Mar 2023 00:19:22 +0000 (17:19 -0700)]
perf lock contention: Fix msan issue in lock_contention_read()
I got a report of a msan failure like below:
$ sudo perf lock con -ab -- sleep 1
...
==224416==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x5651160d6c96 in lock_contention_read util/bpf_lock_contention.c:290:8
#1 0x565115f90870 in __cmd_contention builtin-lock.c:1919:3
#2 0x565115f90870 in cmd_lock builtin-lock.c:2385:8
#3 0x565115f03a83 in run_builtin perf.c:330:11
#4 0x565115f03756 in handle_internal_command perf.c:384:8
#5 0x565115f02d53 in run_argv perf.c:428:2
#6 0x565115f02d53 in main perf.c:562:3
#7 0x7f43553bc632 in __libc_start_main
#8 0x565115e865a9 in _start
It was because the 'key' variable is not initialized. Actually it'd be set
by bpf_map_get_next_key() but msan didn't seem to understand it. Let's make
msan happy by initializing the variable.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230324001922.937634-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Thu, 23 Mar 2023 12:25:32 +0000 (13:25 +0100)]
perf vendor events s390: Remove UTF-8 characters from JSON file
Commit
7f76b31130680fb3 ("perf list: Add IBM z16 event description for
s390") contains the verbal description for z16 extended counter set.
However some entries of the public description contain UTF-8 characters
which breaks the build on some distros.
Fix this and remove the UTF-8 characters.
Fixes: 7f76b31130680fb3 ("perf list: Add IBM z16 event description for s390")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/ZBwkl77/I31AQk12@osiris
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Thu, 23 Mar 2023 02:50:05 +0000 (19:50 -0700)]
perf hist: Improve srcfile sort key performance (really)
The earlier commit
f0cdde28fecc0d7f ("perf hist: Improve srcfile sort
key performance") updated the srcfile logic but missed to change the
->cmp() callback which is called for every sample.
It should use the same logic like in the srcline to speed up the
processing because it'd return the same information repeatedly for the
same address. The real processing will be done in
sort__srcfile_collapse().
Fixes: f0cdde28fecc0d7f ("perf hist: Improve srcfile sort key performance")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230323025005.191239-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Wed, 22 Mar 2023 09:47:31 +0000 (10:47 +0100)]
perf test: Fix wrong size expectation for 'Setup struct perf_event_attr'
The test case "perf test 'Setup struct perf_event_attr'" is failing.
On s390 this output is observed:
# ./perf test -Fvvvv 17
17: Setup struct perf_event_attr :
--- start ---
running './tests/attr/test-stat-C0'
Using CPUID IBM,8561,703,T01,3.6,002f
.....
Event event:base-stat
fd = 1
group_fd = -1
flags = 0|8
cpu = *
type = 0
size = 128 <<<--- wrong, specified in file base-stat
config = 0
sample_period = 0
sample_type = 65536
...
'PERF_TEST_ATTR=/tmp/tmpgw574wvg ./perf stat -o \
/tmp/tmpgw574wvg/perf.data -e cycles -C 0 kill >/dev/null \
2>&1 ret '1', expected '1'
loading result events
Event event-0-0-4
fd = 4
group_fd = -1
cpu = 0
pid = -1
flags = 8
type = 0
size = 136 <<<--- actual size used in system call
.....
compare
matching [event-0-0-4]
to [event:base-stat]
[cpu] 0 *
[flags] 8 0|8
[type] 0 0
[size] 136 128
->FAIL
match: [event-0-0-4] matches []
expected size=136, got 128
FAILED './tests/attr/test-stat-C0' - match failure
This mismatch is caused by
commit
09519ec3b19e ("perf: Add perf_event_attr::config3")
which enlarges the structure perf_event_attr by 8 bytes.
Fix this by adjusting the expected value of size.
Output after:
# ./perf test -Fvvvv 17
17: Setup struct perf_event_attr :
--- start ---
running './tests/attr/test-stat-C0'
Using CPUID IBM,8561,703,T01,3.6,002f
...
matched
compare
matching [event-0-0-4]
to [event:base-stat]
[cpu] 0 *
[flags] 8 0|8
[type] 0 0
[size] 136 136
....
->OK
match: [event-0-0-4] matches ['event:base-stat']
matched
Fixes: 09519ec3b19e4144 ("perf: Add perf_event_attr::config3")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230322094731.1768281-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 22 Mar 2023 18:31:08 +0000 (11:31 -0700)]
perf build: Add warning for when vmlinux.h generation fails
The warning advises on the NO_BPF_SKEL=1 option.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230322183108.1380882-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Artem Savkov [Thu, 16 Mar 2023 13:35:57 +0000 (14:35 +0100)]
perf report: Append inlines to non-DWARF callchains
Append information about inlined functions to FP and LBR callchains from
DWARF debuginfo when available. Do so by calling append_inlines() from
add_callchain_ip().
Testing it:
Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out'
#include <stdio.h>
#include <stdint.h>
static __attribute__((noinline)) uint32_t func5(uint32_t i)
{
return i + 10;
}
static uint32_t func4(uint32_t i)
{
return func5(i + 5);
}
static inline uint32_t func3(uint32_t i)
{
return func4(i + 4);
}
static __attribute__((noinline)) uint32_t func2(uint32_t i)
{
return func3(i + 3);
}
static uint32_t func1(uint32_t i)
{
return func2(i + 2);
}
__attribute__((noinline)) uint64_t entry(void)
{
uint64_t ret = 0;
uint32_t i = 0;
for (i = 0; i <
1000000; i++) {
ret += func1(i);
ret -= func2(i);
ret += func3(i);
ret += func4(i);
ret -= func5(i);
}
return ret;
}
int main(int argc, char **argv)
{
printf("%s\n", __func__);
return entry();
}
======
Here is the output I get with '--call-graph callee --no-children'
======
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 250 of event 'cycles:u'
# Event count (approx.):
26819859
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... .....................................
#
43.58% a.out a.out [.] func5
|
|--28.93%--entry
| main
| __libc_start_call_main
|
--14.65%--func4 (inlined)
|
|--10.45%--entry
| main
| __libc_start_call_main
|
--4.20%--func3 (inlined)
entry
main
__libc_start_call_main
38.80% a.out a.out [.] entry
|
|--23.27%--func4 (inlined)
| |
| |--20.28%--func3 (inlined)
| | func2
| | main
| | __libc_start_call_main
| |
| --2.99%--entry
| main
| __libc_start_call_main
|
|--8.17%--func5
| main
| __libc_start_call_main
|
|--3.89%--func1 (inlined)
| entry
| main
| __libc_start_call_main
|
--3.48%--entry
main
__libc_start_call_main
13.07% a.out a.out [.] func2
|
---func5
main
__libc_start_call_main
1.54% a.out [unknown] [k] 0xffffffff81e011b7
1.16% a.out [unknown] [k] 0xffffffff81e00193
|
--0.57%--__mmap64 (inlined)
__mmap64 (inlined)
0.34% a.out ld-linux-x86-64.so.2 [.] __tunable_get_val
0.34% a.out ld-linux-x86-64.so.2 [.] strcmp
0.32% a.out libc.so.6 [.] strchr
0.31% a.out ld-linux-x86-64.so.2 [.] _dl_relocate_object
0.22% a.out ld-linux-x86-64.so.2 [.] _dl_init_paths
0.18% a.out ld-linux-x86-64.so.2 [.] get_common_cache_info.constprop.0
0.14% a.out ld-linux-x86-64.so.2 [.] __GI___tunables_init
#
# (Tip: Show individual samples with: perf script)
#
======
It does not seem to be out of order, or at least it is consistent with
what I get with dwarf unwinders.
Committer notes:
Adrian Hunter pointed out that this breaks --branch-history, so don't do
it for branches, see the second Link below.
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: <asavkov@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230316133557.868731-2-asavkov@redhat.com
Link: https://lore.kernel.org/r/54129783-2960-84e1-05e9-97ac70ffb432@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rob Herring [Fri, 17 Feb 2023 22:32:11 +0000 (16:32 -0600)]
perf tools: Add support for perf_event_attr::config3
perf_event_attr has gained a new field, config3, so add support for it
extending the existing configN support.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220914-arm-perf-tool-spe1-2-v2-v5-2-2cf5210b2f77@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Mon, 20 Mar 2023 11:45:59 +0000 (11:45 +0000)]
perf vendor events arm64: Add N1 metrics
Generated from the telemetry solution repo[1] with this command:
./generate.py <linux-repo>/tools/perf/ --telemetry-files \
../../data/pmu/cpu/neoverse/neoverse-n1.json
Since this data source now includes the SPE events for N1, it has
diverged from A76 which means the folder has to be split.
The new data also uses more fine grained grouping, but this will be
consistent for all future products. Long PublicDescriptions are now
included even for common events because this can include product
specific details. For non verbose mode the common BriefDescriptions
remain the same.
[1]: https://gitlab.arm.com/telemetry-solution/telemetry-solution
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: renyu.zj@linux.alibaba.com
Link: https://lore.kernel.org/r/20230320114601.524958-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bernhard M. Wiedemann [Tue, 21 Mar 2023 06:30:32 +0000 (07:30 +0100)]
perf jevents: Sort list of input files
Without this, pmu-events.c would be generated with variations in
ordering depending on non-deterministic filesystem readdir order.
I tested that pmu-events.c still has the same number of lines and that
perf list output works.
This patch was done while working on reproducible builds for openSUSE,
but also solves issues in Debian [1] and other distributions.
[1] https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/linux.html
Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230321063032.19804-1-bwiedemann@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Mon, 20 Mar 2023 06:16:19 +0000 (14:16 +0800)]
perf kvm: Delete histograms entries before exiting
It's good not to release resources for a program when kernel cleans up
memory space, this patch explicitly releases histograms entries with
hists__delete_entries().
Committer notice:
This helps with memory leak checkers, but may delay exiting a tool by
doing needless linked list traversals freeing lots of objects.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230320061619.29520-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Mon, 20 Mar 2023 06:16:18 +0000 (14:16 +0800)]
perf kvm: Reference count 'struct kvm_info'
hists__add_entry_ops() doesn't allocate a new histogram entry if it has
an existing entry for a KVM event, in this case, find_create_kvm_event()
allocates a 'struct kvm_info' but it's not used by any histograms and
never freed.
To fix the memory leak, this patch first introduces a refcnt and a set
of functions for refcnt operations on 'struct kvm_info'. When the data
structure is not anymore used (the refcnt hits zero) kvm_info__zput()
will free the memory used.
Committer:
Provide a nop version of kvm_info__zput() to be used when
HAVE_KVM_STAT_SUPPORT isn't defined as it is used unconditionally in
hists__findnew_entry() and hist_entry__delete().
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230320061619.29520-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
German Gomez [Mon, 20 Mar 2023 15:15:08 +0000 (15:15 +0000)]
perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.
Rows are labeled with the SIMD ISA, and the type of predicate (if any):
- [p] partial predicate
- [e] empty predicate (no elements in the vector being used)
Example with Arm SPE and SVE (Scalable Vector Extension):
#include <arm_sve.h>
double src[1025], dst[1025];
int main(void) {
svfloat64_t vc = svdup_f64(1);
for(;;)
for(int i = 0; i < 1025; i += svcntd())
{
svbool_t pg = svwhilelt_b64(i, 1025);
svfloat64_t vsrc = svld1(pg, &src[i]);
svfloat64_t vdst = svadd_x(pg, vsrc, vc);
svst1(pg, &dst[i], vdst);
}
return 0;
}
... compiled using "gcc-11 -march=armv8-a+sve -O3"
Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:
$ perf record -e arm_spe_0// -- ./a.out
$ perf report --itrace=i1i -s overhead,pid,simd,sym
Overhead Pid:Command Simd Symbol
........ ................ ....... ......................
53.76% 10758:program [.] main
46.14% 10758:program [.] SVE [.] main
0.09% 10758:program [p] SVE [.] main
The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
German Gomez [Mon, 20 Mar 2023 15:15:07 +0000 (15:15 +0000)]
perf arm-spe: Add SVE flags to the SPE samples
Add flags from the Scalable Vector Extension (SVE) to the SPE samples
which are available from Armv8.3 (FEAT_SPEv1p1).
These will be displayed in a new SIMD sort field in a later commit.
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Cc: Anshuman.Khandual@arm.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
German Gomez [Mon, 20 Mar 2023 15:15:06 +0000 (15:15 +0000)]
perf arm-spe: Refactor arm-spe to support operation packet type
Extend the decoder of Arm SPE records to support more fields from the
operation packet type.
Not all fields are being decoded by this commit. Only those needed to
support the use-case SVE load/store/other operations.
Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
German Gomez [Mon, 20 Mar 2023 15:15:05 +0000 (15:15 +0000)]
perf event: Add 'simd_flags' field to 'struct perf_sample'
Add new field to 'struct perf_sample' to store flags related to SIMD
ops.
It will be used to store SIMD information from SVE and NEON when
profiling using ARM SPE.
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 Mar 2023 18:35:17 +0000 (20:35 +0200)]
perf intel-pt: Add support for new branch instructions ERETS and ERETU
Intel Flexible Return and Event Delivery (FRED) adds instructions ERETS
(return to supervisor) and ERETU (return to user). Intel PT instruction
decoder needs to know about these instructions because they are
branch instructions. Similar to IRET instructions, when the decoder
encounters one of these instructions it will match it to a TIP (target
instruction pointer) packet that informs what the branch destination is.
The existing "x86 instruction decoder - new instructions" test can be
used to test the result e.g.
$ perf test -v ins |& grep eret
Decoded ok: f2 0f 01 ca erets
Decoded ok: f3 0f 01 ca eretu
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230320183517.15099-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 Mar 2023 18:35:16 +0000 (20:35 +0200)]
perf intel-pt: Add event type names UINTR and UIRET
UINTR and UIRET are listed in table 32-50 "CFE Packet Type and Vector
Fields Details" in the Intel Processor Trace chapter of The Intel SDM
Volume 3 version 078.
The codes are for "User interrupt delivered" and "Exiting from user
interrupt routine" respectively.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230320183517.15099-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 20 Mar 2023 03:37:53 +0000 (20:37 -0700)]
perf symbol: Sort names under write lock
If finding a name doesn't find the sorted names then they are
allocated and sorted. This shouldn't be done under a read lock as
another reader may access it. Release the read lock and acquire the
write lock, then release the write lock and reacquire the read lock.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 20 Mar 2023 03:37:52 +0000 (20:37 -0700)]
perf test: Fix memory leak in symbols
machine__delete() doesn't delete threads. Add call to delete threads
ahead of deleting the machine.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 20 Mar 2023 03:37:51 +0000 (20:37 -0700)]
perf tests: Add common error route for code-reading
A later change will enforce that the map is put on this path
regardless of success or error.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 20 Mar 2023 03:37:50 +0000 (20:37 -0700)]
perf bpf_counter: Use public cpumap accessors
Avoid the use of internal apis via the cpumap accessor functions.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 20 Mar 2023 03:37:49 +0000 (20:37 -0700)]
perf symbol: Avoid memory leak from abi::__cxa_demangle
Rather than allocate memory, allow abi::__cxa_demangle to do
that. This avoids a problem where on error NULL was returned
triggering a memory leak.
Fixes: 3b4e4efe88f615f1 ("perf symbol: Add abi::__cxa_demangle C++ demangling support")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:12 +0000 (22:51 +0800)]
perf kvm: Update documentation to reflect new changes
Update documentation for new sorting and option '--stdio'.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:11 +0000 (22:51 +0800)]
perf kvm: Add TUI mode for stat report
Since we have supported histograms list and prepared the dimensions in
the tool, this patch adds TUI mode for stat report. It also adds UI
progress for sorting for better user experience.
Committer notes:
kvm_display() is only used by functions enclosed in:
#if defined(HAVE_KVM_STAT_SUPPORT) && defined(HAVE_LIBTRACEEVENT)
So do it with this new function as well.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:10 +0000 (22:51 +0800)]
perf kvm: Add dimensions for percentages
Add dimensions for count and time percentages, it would be useful for
user to review percentage statistics.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:09 +0000 (22:51 +0800)]
perf kvm: Support printing attributions for dimensions
This patch adds header, entry callback and width for every dimension,
thus in TUI mode the tool can print items with the defined attributions.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:08 +0000 (22:51 +0800)]
perf kvm: Polish sorting key
Since histograms supports sorting, the tool doesn't need to maintain the
mapping between the sorting keys and the corresponding comparison
callbacks, therefore, this patch removes structure kvm_event_key.
But we still need to validate the sorting key, this patch uses an array
for sorting keys and renames function select_key() to is_valid_key()
to validate the sorting key passed by user.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:07 +0000 (22:51 +0800)]
perf kvm: Use histograms list to replace cached list
perf kvm tool defines its own cached list which is managed with RB tree,
histograms also provide RB tree to manage data entries. Since now we
have introduced histograms in the tool, it's not necessary to use the
self defined list and we can directly use histograms list to manage
KVM events.
This patch changes to use histograms list to track KVM events, and it
invokes the common function hists__output_resort_cb() to sort result,
this also give us flexibility to extend more sorting key words easily.
After histograms list supported, the cached list is redundant so remove
the relevant code for it.
Committer notes:
kvm_hists__reinit() is only used by functions enclosed in:
#if defined(HAVE_KVM_STAT_SUPPORT) && defined(HAVE_LIBTRACEEVENT)
So do it with this new function as well.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:06 +0000 (22:51 +0800)]
perf kvm: Add dimensions for KVM event statistics
To support KVM event statistics, this patch firstly registers histograms
columns and sorting fields; every column or field has its own format
structure, the format structure is dereferenced to access the dimension,
finally the dimension provides the comparison callback for sorting
result.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:05 +0000 (22:51 +0800)]
perf hist: Add 'kvm_info' field in histograms entry
__hists__add_entry() creates a temporary entry and compare it with
existed histograms entries, if any existed entry equals to the
temporary entry it skips to allocation to avoid duplication.
The problem for support KVM event in histograms is it doesn't contain
any info to identify KVM event and can be used for comparison entries.
This patch adds 'kvm_info' field in the histograms entry which contains
the KVM event's key, this identifier will be used for comparison
histograms entries in later change.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:04 +0000 (22:51 +0800)]
perf kvm: Parse address location for samples
Parse address location for samples and save it into the structure
'perf_kvm_stat', it is to be used by histograms entry.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:03 +0000 (22:51 +0800)]
perf kvm: Pass argument 'sample' to kvm_alloc_init_event()
This patch adds an argument 'sample' for kvm_alloc_init_event(), and its
caller functions are updated as well for passing down the 'sample'
pointer.
This is a preparation change to allow later patch to create histograms
entries for kvm event, no any functionality changes.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:02 +0000 (22:51 +0800)]
perf kvm: Introduce histograms data structures
This is a preparation to support histograms in perf kvm tool. As first
step, this patch defines histograms data structures and initialize them.
Committer notes:
Those are only used by functions enclosed in:
#if efined(HAVE_KVM_STAT_SUPPORT) && defined(HAVE_LIBTRACEEVENT)
So do this for these new functions and struct as well.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:01 +0000 (22:51 +0800)]
perf kvm: Use macro to replace variable 'decode_str_len'
The variable 'decode_str_len' defines the string length for KVM event
name and every arch defines its own values.
This introduces complexity that the variable definition are spreading in
multiple source files under arch folder. This patch refactors code to
use a macro KVM_EVENT_NAME_LEN to define event name length and thus
remove the definitions in arch files.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:51:00 +0000 (22:51 +0800)]
perf kvm: Use subtraction for comparison metrics
Currently the metrics comparison uses greater operator (>), it returns
the boolean value (0 or 1).
This patch changes to use subtraction as comparison result, which can
be used by histograms sorting. Since the subtraction result is u64
type, we change key_cmp_fun's return type to int64_t to avoid overflow.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:50:59 +0000 (22:50 +0800)]
perf kvm: Move up metrics helpers
This patch moves up the helper functions of event's metrics for later
adding code to call them.
No any functionality changes, but has a function renaming from
compare_kvm_event_{metric}() to cmp_event_{metric}().
Committer notes:
Those helper functions are only used if this is true:
if defined(HAVE_KVM_STAT_SUPPORT) && defined(HAVE_LIBTRACEEVENT)
So keep them enclosed with that.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:50:58 +0000 (22:50 +0800)]
perf kvm: Add pointer to 'perf_kvm_stat' in kvm event
Sometimes, handling kvm events needs to base on global variables, e.g.
when read event counts we need to know the target vcpu ID; the global
variables are stored in structure perf_kvm_stat.
This patch adds add a 'perf_kvm_stat' pointer in kvm event structure,
it is to be used by later refactoring.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 15 Mar 2023 14:50:57 +0000 (22:50 +0800)]
perf kvm: Refactor overall statistics
Currently the tool computes overall statistics when sort the results.
This patch refactors overall statistics during events processing,
therefore, the function update_total_coun() is not needed anymore, an
extra benefit is we can de-couple code between the statistics and the
sorting.
This patch is not expected any functionality changes.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:37 +0000 (16:42 -0700)]
perf record: Update documentation for BPF filters
Add more description and examples.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:36 +0000 (16:42 -0700)]
perf bpf filter: Show warning for missing sample flags
For a BPF filter to work properly, users need to provide appropriate
options to enable the sample types. Otherwise the BPF program would
see an invalid value (i.e. always 0) and filter won't work well.
Show a warning message if sample types are missing like below.
$ sudo ./perf record -e cycles --filter 'addr < 100' true
Error: cycles event does not have PERF_SAMPLE_ADDR
Hint: please add -d option to perf record.
failed to set filter "BPF" on event cycles with 22 (Invalid argument)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:35 +0000 (16:42 -0700)]
perf bpf filter: Add logical OR operator
It supports two or more expressions connected as a group and the group
result is considered true when one of them returns true. The new group
operators (GROUP_BEGIN and GROUP_END) are added to setup and check the
condition. As it doesn't allow nested groups, the condition is saved
in local variables.
For example, the following is to get samples only if the data source
memory level is L2 cache or the weight value is greater than 30.
$ sudo ./perf record -adW -e cpu/mem-loads/pp \
> --filter 'mem_lvl == l2 || weight > 30' -- sleep 1
$ sudo ./perf script -F data_src,weight
10668100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 47
11868100242 |OP LOAD|LVL LFB/MAB or LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 57
10668100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 56
10650100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L2 miss|LCK No|BLK N/A 144
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 16
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 20
11868100242 |OP LOAD|LVL LFB/MAB or LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 189
1026a100142 |OP LOAD|LVL L1 or L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK N/A 193
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 18
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:34 +0000 (16:42 -0700)]
perf bpf filter: Add data_src sample data support
The data_src has many entries to express memory behaviors. Add each
term separately so that users can combine them for their purpose.
I didn't add prefix for the constants for simplicity as they are mostly
distinguishable but I had to use l1_miss and l2_hit for mem_dtlb since
mem_lvl has different values for the same names. Note that I decided
mem_lvl to be used as an alias of mem_lvlnum as it's deprecated now.
According to the comment in the UAPI header, users should use the mix of
mem_lvlnum, mem_remote and mem_snoop. Also the SNOOPX bits are
concatenated to mem_snoop for simplicity.
The following terms are used for data_src and the corresponding perf
sample data fields:
* mem_op : { load, store, pfetch, exec }
* mem_lvl: { l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem }
* mem_snoop: { none, hit, miss, hitm, fwd, peer }
* mem_remote: { remote }
* mem_lock: { locked }
* mem_dtlb { l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault }
* mem_blk { by_data, by_addr }
* mem_hops { hops0, hops1, hops2, hops3 }
We can now use a filter expression like below:
'mem_op == load, mem_lvl <= l2, mem_dtlb == l1_hit'
'mem_dtlb == l2_miss, mem_hops > hops1'
'mem_lvl == ram, mem_remote == 1'
Note that 'na' is shared among the terms as it has the same value except
for mem_lvl. I don't have a good idea to handle that for now.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:33 +0000 (16:42 -0700)]
perf bpf filter: Add more weight sample data support
The weight data consists of a couple of fields with the
PERF_SAMPLE_WEIGHT_STRUCT. Add weight{1,2,3} term to select them
separately. Also add their aliases like 'ins_lat', 'p_stage_cyc' and
'retire_lat'.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:32 +0000 (16:42 -0700)]
perf bpf filter: Add 'pid' sample data support
The pid is special because it's saved in the PERF_SAMPLE_TID together.
So it needs to differenciate tid and pid using the 'part' field in the
perf bpf filter entry struct.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:31 +0000 (16:42 -0700)]
perf record: Record dropped sample count
When it uses bpf filters, event might drop some samples. It'd be nice
if it can report how many samples it lost. As LOST_SAMPLES event can
carry the similar information, let's use it for bpf filters.
To indicate it's from BPF filters, add a new misc flag for that and
do not display cpu load warnings.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:30 +0000 (16:42 -0700)]
perf record: Add BPF event filter support
Use --filter option to set BPF filter for generic events other than the
tracepoints or Intel PT. The BPF program will check the sample data and
filter according to the expression.
For example, the below is the typical perf record for frequency mode.
The sample period started from 1 and increased gradually.
$ sudo ./perf record -e cycles true
$ sudo ./perf script
perf-exec
2272336 546683.916875: 1 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916892: 1 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916899: 3 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916905: 17 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916911: 100 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916917: 589 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916924: 3470 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec
2272336 546683.916930: 20465 cycles:
ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
true
2272336 546683.916940: 119873 cycles:
ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
true
2272336 546683.917003: 461349 cycles:
ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
true
2272336 546683.917237: 635778 cycles:
ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
When you add a BPF filter to get samples having periods greater than 1000,
the output would look like below:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec
2273949 546850.708501: 5029 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec
2273949 546850.708508: 32409 cycles:
ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec
2273949 546850.708526: 143369 cycles:
ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec
2273949 546850.708600: 372650 cycles:
ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec
2273949 546850.708791: 482953 cycles:
ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true
2273949 546850.709036: 501985 cycles:
ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true
2273949 546850.709292: 503065 cycles:
7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
Committer notes:
Add stubs for perf_bpf_filter__prepare() and perf_bpf_filter__destroy()
to tools/perf/util/python.c to keep it building.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:29 +0000 (16:42 -0700)]
perf bpf filter: Implement event sample filtering
The BPF program will be attached to a perf_event and be triggered when
it overflows. It'd iterate the filters map and compare the sample
value according to the expression. If any of them fails, the sample
would be dropped.
Also it needs to have the corresponding sample data for the expression
so it compares data->sample_flags with the given value. To access the
sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
in v6.2 kernel.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 14 Mar 2023 23:42:28 +0000 (16:42 -0700)]
perf bpf filter: Introduce basic BPF filter expression
This implements a tiny parser for the filter expressions used for BPF.
Each expression will be converted to struct perf_bpf_filter_expr and
be passed to a BPF map.
For now, I'd like to start with the very basic comparisons like EQ or
GT. The LHS should be a term for sample data and the RHS is a number.
The expressions are connected by a comma. For example,
period > 10000
ip < 0x1000000000000, cpu == 3
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230314234237.3008956-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
liuwenyu [Wed, 15 Mar 2023 06:42:17 +0000 (14:42 +0800)]
perf top: Fix rare segfault in thread__comm_len()
In thread__comm_len(),strlen() is called outside of the
thread->comm_lock critical section,which may cause a UAF
problems if comm__free() is called by the process_thread
concurrently.
backtrace of the core file is as follows:
(gdb) bt
#0 __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
#1 0x000055ad15d31de5 in thread__comm_len (thread=0x7f627d20e300) at util/thread.c:320
#2 0x000055ad15d4fade in hists__calc_col_len (h=0x7f627d295940, hists=0x55ad1772bfe0)
at util/hist.c:103
#3 hists__calc_col_len (hists=0x55ad1772bfe0, h=0x7f627d295940) at util/hist.c:79
#4 0x000055ad15d52c8c in output_resort (hists=hists@entry=0x55ad1772bfe0, prog=0x0,
use_callchain=false, cb=cb@entry=0x0, cb_arg=0x0) at util/hist.c:1926
#5 0x000055ad15d530a4 in evsel__output_resort_cb (evsel=evsel@entry=0x55ad1772bde0,
prog=prog@entry=0x0, cb=cb@entry=0x0, cb_arg=cb_arg@entry=0x0) at util/hist.c:1945
#6 0x000055ad15d53110 in evsel__output_resort (evsel=evsel@entry=0x55ad1772bde0,
prog=prog@entry=0x0) at util/hist.c:1950
#7 0x000055ad15c6ae9a in perf_top__resort_hists (t=t@entry=0x7ffcd9cbf4f0) at builtin-top.c:311
#8 0x000055ad15c6cc6d in perf_top__print_sym_table (top=0x7ffcd9cbf4f0) at builtin-top.c:346
#9 display_thread (arg=0x7ffcd9cbf4f0) at builtin-top.c:700
#10 0x00007f6282fab4fa in start_thread (arg=<optimized out>) at pthread_create.c:443
#11 0x00007f628302e200 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
The reason is that strlen() get a pointer to a memory that has been freed.
The string pointer is stored in the structure comm_str, which corresponds
to a rb_tree node,when the node is erased, the memory of the string is also freed.
In thread__comm_len(),it gets the pointer within the thread->comm_lock critical section,
but passed to strlen() outside of the thread->comm_lock critical section, and the perf
process_thread may called comm__free() concurrently, cause this segfault problem.
The process is as follows:
display_thread process_thread
-------------- --------------
thread__comm_len
-> thread__comm_str
# held the comm read lock
-> __thread__comm_str(thread)
# release the comm read lock
thread__delete
# held the comm write lock
-> comm__free
-> comm_str__put(comm->comm_str)
-> zfree(&cs->str)
# release the comm write lock
# The memory of the string pointed
to by comm has been free.
-> thread->comm_len = strlen(comm);
This patch expand the critical section range of thread->comm_lock in thread__comm_len(),
to make strlen() called safe.
Signed-off-by: Wenyu Liu <liuwenyu7@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yunfeng Ye <yeyunfeng@huawei.com>
Link: https://lore.kernel.org/r/322bfb49-840b-f3b6-9ef1-f9ec3435b07e@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 15 Mar 2023 08:43:21 +0000 (10:43 +0200)]
perf script: Fix Python support when no libtraceevent
Python scripting can be used without libtraceevent. In particular,
scripting for Intel PT does not use tracepoints, and so does not need
libtraceevent support.
Alter the build and employ conditional compilation to allow Python
scripting without libtraceevent.
Example:
Before:
$ ldd `which perf` | grep -i python
$ ldd `which perf` | grep -i libtraceevent
$ perf record -e intel_pt//u uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data ]
$ perf script intel-pt-events.py |& head -3
Error: Couldn't find script `intel-pt-events.py'
See perf script -l for available scripts.
After:
$ ldd `which perf` | grep -i python
libpython3.10.so.1.0 => /lib/x86_64-linux-gnu/libpython3.10.so.1.0 (0x00007f4bac400000)
$ ldd `which perf` | grep -i libtraceevent
$ perf script intel-pt-events.py | head
Intel PT Branch Trace, Power Events, Event Trace and PTWRITE
Switch In 8021/8021 [000] 11234.
097713404 0/0
perf-exec 8021/8021 [000] 11234.
098041726 psb offset: 0x0 0 [unknown] ([unknown])
perf-exec 8021/8021 [000] 11234.
098041726 cbr 45 freq: 4505 MHz (161%) 0 [unknown] ([unknown])
uname 8021/8021 [000] 11234.
098082170 branches:uH tr strt 0 [unknown] ([unknown]) =>
7f3a8b9422b0 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
uname 8021/8021 [000] 11234.
098082379 branches:uH tr end
7f3a8b9422b0 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) => 0 [unknown] ([unknown])
uname 8021/8021 [000] 11234.
098083629 branches:uH tr strt 0 [unknown] ([unknown]) =>
7f3a8b9422b0 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
uname 8021/8021 [000] 11234.
098083629 branches:uH call
7f3a8b9422b3 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) =>
7f3a8b943050 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
uname 8021/8021 [000] 11234.
098083837 branches:uH tr end
7f3a8b943060 _dl_start+0x10 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) => 0 [unknown] ([unknown]) IPC: 0.01 (9/938)
uname 8021/8021 [000] 11234.
098084670 branches:uH tr strt 0 [unknown] ([unknown]) =>
7f3a8b943060 _dl_start+0x10 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230315084321.14563-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 13 Mar 2023 08:02:01 +0000 (09:02 +0100)]
perf vendor events s390: Add metric for TLB and cache
Add metrics for tlb and cache statistics:
- finite_cpi: Cycles per Instructions from Finite cache/memory
- est_cpi: Estimated Instruction Complexity CPI infinite Level 1
- scpl1m: Estimated Sourcing Cycles per Level 1 Miss
- tlb_percent: Estimated TLB CPU percentage of Total CPU
- tlb_miss: Estimated Cycles per TLB Miss
For details about the formulas see this documentation:
https://www.ibm.com/support/pages/system/files/inline-files/CPU%20MF%20Formulas%20including%20z16%20-%20May%202022_1.pdf
Output after:
# ./perf stat -M tlb_miss -- dd if=/dev/zero of=/dev/null bs=1M count=10K
... dd output removed
Performance counter stats for 'dd if=/dev/zero of=/dev/null bs=1M count=10K':
667,726 DTLB2_MISSES # 440.96 tlb_miss
198 ITLB2_WRITES
795,170,260 L1C_TLB2_MISSES
9,478 ITLB2_MISSES
820 DTLB2_WRITES
1,197,126,869 L1D_PENALTY_CYCLES
2,457,447 L1I_PENALTY_CYCLES
1.
249342187 seconds time elapsed
0.
001030000 seconds user
1.
248105000 seconds sys
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-By: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313080201.2440201-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 13 Mar 2023 08:02:00 +0000 (09:02 +0100)]
perf vendor events s390: Add cache metrics for z13
Add metrics for s390 z13
- Percentage sourced from Level 2 cache
- Percentage sourced from Level 3 on same chip cache
- Percentage sourced from Level 4 Local cache on same book
- Percentage sourced from Level 4 Remote cache on different book
- Percentage sourced from memory
For details about the formulas see this documentation:
https://www.ibm.com/support/pages/system/files/inline-files/CPU%20MF%20Formulas%20including%20z16%20-%20May%202022_1.pdf
Output after:
# ./perf stat -M l4rp -- find /
...find output deleted
Performance counter stats for 'find /':
2 L1I_OFFDRAWER_SCOL_L4_SOURCED_WRITES # 0.02 l4rp
252 L1D_ONDRAWER_L4_SOURCED_WRITES
3,465 L1D_ONDRAWER_L3_SOURCED_WRITES_IV
80 L1D_OFFDRAWER_SCOL_L4_SOURCED_WRITES
761 L1D_ONDRAWER_L3_SOURCED_WRITES
0 L1I_OFFDRAWER_SCOL_L3_SOURCED_WRITES
131,817,067 L1I_DIR_WRITES
1 L1I_OFFDRAWER_FCOL_L4_SOURCED_WRITES
447 L1D_OFFDRAWER_SCOL_L3_SOURCED_WRITES
22 L1D_OFFDRAWER_FCOL_L4_SOURCED_WRITES
7 L1I_ONDRAWER_L4_SOURCED_WRITES
0 L1I_OFFDRAWER_FCOL_L3_SOURCED_WRITES
1,071 L1D_OFFDRAWER_FCOL_L3_SOURCED_WRITES
3 L1I_ONDRAWER_L3_SOURCED_WRITES
13,352 L1D_OFFDRAWER_FCOL_L3_SOURCED_WRITES_IV
15,252 L1D_OFFDRAWER_SCOL_L3_SOURCED_WRITES_IV
0 L1I_ONDRAWER_L3_SOURCED_WRITES_IV
0 L1I_OFFDRAWER_FCOL_L3_SOURCED_WRITES_IV
57,431,083 L1D_DIR_WRITES
0 L1I_OFFDRAWER_SCOL_L3_SOURCED_WRITES_IV
15.
386502874 seconds time elapsed
0.
647348000 seconds user
3.
537041000 seconds sys
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-By: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313080201.2440201-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 13 Mar 2023 08:01:59 +0000 (09:01 +0100)]
perf vendor events s390: Add cache metrics for z14
Add metrics for s390 z14
- Percentage sourced from Level 2 cache
- Percentage sourced from Level 3 on same chip cache
- Percentage sourced from Level 4 Local cache on same book
- Percentage sourced from Level 4 Remote cache on different book
- Percentage sourced from memory
For details about the formulas see this documentation:
https://www.ibm.com/support/pages/system/files/inline-files/CPU%20MF%20Formulas%20including%20z16%20-%20May%202022_1.pdf
Output after:
# ./perf stat -M l4rp -- find /
.... find output deleted
Performance counter stats for 'find /':
0 L1I_OFFDRAWER_L4_SOURCED_WRITES # 0.01 l4rp
84 L1D_OFFDRAWER_L4_SOURCED_WRITES
0 L1I_OFFDRAWER_L3_SOURCED_WRITES
71,535,353 L1I_DIR_WRITES
219 L1D_OFFDRAWER_L3_SOURCED_WRITES
16,436 L1D_OFFDRAWER_L3_SOURCED_WRITES_IV
0 L1I_OFFDRAWER_L3_SOURCED_WRITES_IV
46,343,940 L1D_DIR_WRITES
10.
530805537 seconds time elapsed
0.
774396000 seconds user
1.
602714000 seconds sys
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-By: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313080201.2440201-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Mon, 13 Mar 2023 08:01:58 +0000 (09:01 +0100)]
perf vendor events s390: Add cache metrics for z15
Add metrics for s390 z15
- Percentage sourced from Level 2 cache
- Percentage sourced from Level 3 on same chip cache
- Percentage sourced from Level 4 Local cache on same book
- Percentage sourced from Level 4 Remote cache on different book
- Percentage sourced from memory
For details about the formulas see this documentation:
https://www.ibm.com/support/pages/system/files/inline-files/CPU%20MF%20Formulas%20including%20z16%20-%20May%202022_1.pdf
Outpuf after:
# ./perf stat -M l4rp -- find /
.... find output deleted
Performance counter stats for 'find /':
5 L1I_OFFDRAWER_L4_SOURCED_WRITES # 0.01 l4rp
187 L1D_OFFDRAWER_L4_SOURCED_WRITES
0 L1I_OFFDRAWER_L3_SOURCED_WRITES
231,333,165 L1I_DIR_WRITES
3,303 L1D_OFFDRAWER_L3_SOURCED_WRITES
47,461 L1D_OFFDRAWER_L3_SOURCED_WRITES_IV
0 L1I_OFFDRAWER_L3_SOURCED_WRITES_IV
126,706,244 L1D_DIR_WRITES
27.
870355461 seconds time elapsed
0.
521562000 seconds user
12.
494503000 seconds sys
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-By: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20230313080201.2440201-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Mar 2023 05:33:12 +0000 (22:33 -0700)]
perf vendor events intel: Update skylake events
Update from v54 to v55. Addition of OFFCORE_RESPONSE,
FP_ARITH_INST_RETIRED.SCALAR, FP_ARITH_INST_RETIRED.VECTOR and
INT_MISC.CLEARS_COUNT.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230314053312.3237390-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Mar 2023 05:33:11 +0000 (22:33 -0700)]
perf vendor events intel: Update meteorlake events
Update from 1.00 to 1.01. Event description updates. Addition of
IDQ_BUBBLES.CORE, TOPDOWN.BACKEND_BOUND_SLOTS, UOPS_RETIRED.SLOTS.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230314053312.3237390-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Tue, 14 Mar 2023 05:33:10 +0000 (22:33 -0700)]
perf vendor events intel: Update graniterapids events
Update from 1.00 to 1.01, some event description updates.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230314053312.3237390-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Roman Lozko [Fri, 10 Mar 2023 15:04:45 +0000 (15:04 +0000)]
perf scripts intel-pt-events.py: Fix IPC output for Python 2
Integers are not converted to floats during division in Python 2 which
results in incorrect IPC values. Fix by switching to new division
behavior.
Fixes: a483e64c0b62e93a ("perf scripting python: intel-pt-events.py: Add --insn-trace and --src-trace")
Signed-off-by: Roman Lozko <lozko.roma@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20230310150445.2925841-1-lozko.roma@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 14 Mar 2023 11:41:36 +0000 (08:41 -0300)]
perf tools bpf: Add vmlinux.h to .gitignore
Now that BPF skel based tools will be built by default if the toolchain
pieces that are needed are available, building directly on the source
tree will produce a vmlinux.h from the BTF info that needs to get
ignored.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Wed, 8 Mar 2023 00:27:14 +0000 (16:27 -0800)]
perf test: Fix "PMU event table sanity" for NO_JEVENTS=1
A table was renamed and needed to be renamed in the empty case.
Fixes: 62774db2a05dc878 ("perf jevents: Generate metrics and events as separate tables")
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230308002714.1755698-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 13 Mar 2023 20:48:25 +0000 (13:48 -0700)]
perf lock contention: Show lock type with address
Show lock type names after the symbol of locks if any. This can be
useful especially when it doesn't show the lock symbols.
The indentation before the lock type parenthesis is to recognize lock
symbols more easily.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
44 6.13 ms 284.49 us 139.28 us
ffffffff92e06080 tasklist_lock (rwlock)
159 983.38 us 12.38 us 6.18 us
ffff8cc717c90000 siglock (spinlock)
10 679.90 us 153.35 us 67.99 us
ffff8cdc2872aaf8 mmap_lock (rwsem)
9 558.11 us 180.67 us 62.01 us
ffff8cd647914038 mmap_lock (rwsem)
78 228.56 us 7.82 us 2.93 us
ffff8cc700061c00 (spinlock)
5 41.60 us 16.93 us 8.32 us
ffffd853acb41468 (spinlock)
10 37.24 us 5.87 us 3.72 us
ffff8cd560b5c200 siglock (spinlock)
4 11.17 us 3.97 us 2.79 us
ffff8d053ddf0c80 rq_lock (spinlock)
1 7.86 us 7.86 us 7.86 us
ffff8cd64791404c (spinlock)
1 4.13 us 4.13 us 4.13 us
ffff8d053d930c80 rq_lock (spinlock)
7 3.98 us 1.67 us 568 ns
ffff8ccb92479440 (mutex)
2 2.62 us 2.33 us 1.31 us
ffff8cc702e6ede0 (rwlock)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 13 Mar 2023 20:48:24 +0000 (13:48 -0700)]
perf lock contention: Show per-cpu rq_lock with address
Using the BPF_PROG_RUN mechanism, we can run a raw_tp BPF program to
collect some semi-global locks like per-cpu locks. Let's add runqueue
locks using bpf_per_cpu_ptr() helper.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
248 3.25 ms 32.23 us 13.10 us
ffff8cc75cfd2940 siglock
60 217.91 us 9.69 us 3.63 us
ffff8cc700061c00
8 70.23 us 13.86 us 8.78 us
ffff8cc703629484
4 56.32 us 35.81 us 14.08 us
ffff8cc78b66f778 mmap_lock
4 16.70 us 5.18 us 4.18 us
ffff8cc7036a0684
3 4.99 us 2.65 us 1.66 us
ffff8d053da30c80 rq_lock
2 3.44 us 2.28 us 1.72 us
ffff8d053dcf0c80 rq_lock
9 2.51 us 371 ns 278 ns
ffff8ccb92479440
2 2.11 us 1.24 us 1.06 us
ffff8d053db30c80 rq_lock
2 2.06 us 1.69 us 1.03 us
ffff8d053d970c80 rq_lock
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>