Pull perf tools updates from Arnaldo Carvalho de Melo:
"Miscellaneous:
- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
- Add support for retire latency feature (pipeline stall of a
instruction compared to the previous one, in cycles) present on
some Intel processors.
- Add 'perf c2c' report option to show false sharing with adjacent
cachelines, to be used in machines with cacheline prefetching,
where accesses to a cacheline brings the next one too.
- Skip 'perf test bpf' when the required kernel-debuginfo package
isn't installed.
- Avoid d3-flame-graph package dependency in 'perf script flamegraph',
making this feature more generally available.
- Add JSON metric events to present CPI stall cycles in Power10.
- Assorted improvements/refactorings on the JSON metrics parsing
code.
perf lock contention:
- Add -o/--lock-owner option:
$ sudo ./perf lock contention -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
The owner is unknown in most cases. Filtering only for the
mutex locks, it will more likely get the owners.
- -S/--callstack-filter is to limit display entries having the given
string in the callstack:
$ sudo ./perf lock contention -abv -S net sleep 1
...
contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Please note that to have the -b option (BPF) working above one has
to build with BUILD_BPF_SKEL=1.
- Add more 'perf test' entries to test these new features.
perf script:
- Add 'cgroup' field for 'perf script' output:
$ perf record --all-cgroups -- true
$ perf script -F comm,pid,cgroup
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
- Add support for showing branch speculation information in 'perf
script' and in the 'perf report' raw dump (-D).
perf record:
- Fix 'perf record' segfault with --overwrite and --max-size.
perf test/bench:
- Switch basic BPF filtering test to use syscall tracepoint to avoid
the variable number of probes inserted when using the previous
probe point (do_epoll_wait) that happens on different CPU
architectures.
- Fix DWARF unwind test by adding non-inline to expected function in
a backtrace.
- Use 'grep -c' where the longer form 'grep | wc -l' was being used.
- Add getpid and execve benchmarks to 'perf bench syscall'.
Intel PT:
- Add support for synthesizing "cycle" events from Intel PT traces as
we support "instruction" events when Intel PT CYC packets are
available. This enables much more accurate profiles than when using
the regular 'perf record -e cycles' (the default) when the workload
lasts for very short periods (<10ms).
- .plt symbol handling improvements, better handling IBT (in the past
MPX) done in the context of decoding Intel PT processor traces,
IFUNC symbols on x86_64, static executables, understanding .plt.got
symbols on x86_64.
- Add a 'perf test' to test symbol resolution, part of the .plt
improvements series, this tests things like symbol size in contexts
where only the symbol start is available (kallsyms), etc.
- Better handle auxtrace/Intel PT data when using pipe mode (perf
record sleep 1|perf report).
- Fix symbol lookup with kcore with multiple segments match stext,
getting the symbol resolution to just show DSOs as unknown.
ARM:
- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
Macrocell v4).
- Ensure ARM64 CoreSight timestamps don't go backwards.
- Document that ARM64 SPE (Statistical Profiling Extension) is used
with 'perf c2c/mem'.
- Add raw decoding for ARM64 SPEv1.2 previous branch address.
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
TLB, cache, branch, PE utilization and instruction mix metrics.
- Update decoder code for OpenCSD version 1.4, on ARM64 systems.
- Fix command line auto-complete of CPU events on aarch64.
Build:
- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked,
as several tests use tracepoints, those should be skipped.
- More fallout fixes for the removal of tools/lib/traceevent/.
- Fix build error when linking with libpfm"
* tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
perf intel-pt: Synthesize cycle events
perf c2c: Add report option to show false sharing in adjacent cachelines
perf record: Fix segfault with --overwrite and --max-size
perf stat: Avoid merging/aggregating metric counts twice
perf tools: Fix perf tool build error in util/pfm.c
perf tools: Fix auto-complete on aarch64
perf lock contention: Support old rw_semaphore type
perf lock contention: Add -o/--lock-owner option
perf lock contention: Fix to save callstack for the default modified
perf test bpf: Skip test if kernel-debuginfo is not present
perf probe: Update the exit error codes in function try_to_find_probe_trace_event
perf script: Fix missing Retire Latency fields option documentation
perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
perf test x86: Support the retire_lat (Retire Latency) sample_type check
perf test bpf: Check for libtraceevent support
perf script: Support Retire Latency
perf report: Support Retire Latency
perf lock contention: Support filters for different aggregation
...