1 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
3 <book id="oprofile-guide">
5 <title>OProfile manual</title>
9 <firstname>John</firstname>
10 <surname>Levon</surname>
12 <address><email>levon@movementarian.org</email></address>
18 <year>2000-2004</year>
19 <holder>Victoria University of Manchester, John Levon and others</holder>
25 <chapter id="introduction">
26 <title>Introduction</title>
29 This manual applies to OProfile version <oprofileversion />.
OProfile is a profiling system for systems running Linux 2.6 or later on a number of architectures. It is capable of profiling
all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries
to binaries. OProfile can profile the whole system in the background, collecting information at a low overhead. These
features make it ideal for profiling entire systems to find bottlenecks in real-world workloads.
Many CPUs provide "performance counters", hardware registers that can count "events"; for example,
cache misses, or CPU cycles. OProfile provides profiles of code based on the number of these events that occur:
every time a certain (configurable) number of events has occurred, the PC (program counter) value is recorded.
39 This information is aggregated into profiles for each binary image.</para>
41 Some hardware setups do not allow OProfile to use performance counters: in these cases, no
42 events are available so OProfile operates in timer mode, as described in later chapters. Timer
43 mode is only available in "legacy mode" (see <xref linkend="legacy_mode"/>).
45 <sect1 id="legacy_mode">
46 <title>OProfile legacy mode</title>
47 "Legacy" OProfile consists of the <command>opcontrol</command> shell script, the <command>oprofiled</command> daemon, and several post-processing tools (e.g.,
48 <command>opreport</command>). The <command>opcontrol</command> script is used for configuring, starting, and stopping a profiling session. An OProfile
49 kernel driver (usually built as a kernel module) is used for collecting samples, which are then recorded into sample files by
50 <command>oprofiled</command>. Using OProfile in "legacy mode" requires root user authority since the profiling is done on a system-wide basis, which may
51 (if misused) cause adverse effects to the system.
53 Profiling setup parameters that you specify using <command>opcontrol</command> are cached in <filename>/root/.oprofile/daemonrc</filename>.
54 Subsequent runs of <code>opcontrol --start</code> will continue to use these cached values until you
55 override them with new values.
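As a minimal sketch of this caching behaviour, using only options described in this manual (run as root):
<screen># First session: the setup choice is cached in /root/.oprofile/daemonrc
opcontrol --no-vmlinux
opcontrol --start
opcontrol --shutdown

# Later session: no setup options given, so the cached --no-vmlinux value is reused
opcontrol --start
opcontrol --shutdown

# Override the cached value with a new one
opcontrol --vmlinux=/boot/vmlinux-`uname -r`</screen>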
58 <sect1 id="perf_events">
59 <title>OProfile perf_events mode</title>
As of release 0.9.8, OProfile can also profile a single process, in contrast to the system-wide technique
of legacy OProfile. With this technique, the <command>operf</command> program is used to control profiling instead of the
<command>opcontrol</command> script and <command>oprofiled</command> daemon of legacy mode. Also, <command>operf</command> does not require the
special OProfile kernel driver that legacy mode does; instead, it interfaces with the kernel to collect samples via the Linux Kernel
Performance Events Subsystem (hereafter referred to as "perf_events"). Using <command>operf</command> to profile a single
65 process can be done as a normal user; however, root authority <emphasis>is</emphasis> required to run <command>operf</command> in system-wide
69 The same OProfile post-processing tools are used whether you collect your profile with <command>operf</command> or <command>opcontrol</command>.
73 Some older processor models are not supported by the underlying perf_events kernel and, thus, are not supported by <command>operf</command>.
74 If you receive the message
75 <screen> Your kernel's Performance Events Subsystem does not support your processor type</screen>
76 when attempting to use <command>operf</command>, try profiling with <command>opcontrol</command>
77 to see if your processor type may be supported by OProfile's legacy mode.
80 <sect1 id="applications">
81 <title>Applications of OProfile</title>
83 OProfile is useful in a number of situations. You might want to use OProfile when you :
86 <listitem><para>need low overhead</para></listitem>
87 <listitem><para>cannot use highly intrusive profiling methods</para></listitem>
88 <listitem><para>need to profile interrupt handlers</para></listitem>
89 <listitem><para>need to profile an application and its shared libraries</para></listitem>
90 <listitem><para>need to profile dynamically compiled code of supported virtual machines (see <xref linkend="jitsupport"/>)</para></listitem>
<listitem><para>need to capture the performance behaviour of an entire system</para></listitem>
92 <listitem><para>want to examine hardware effects such as cache misses</para></listitem>
93 <listitem><para>want detailed source annotation</para></listitem>
94 <listitem><para>want instruction-level profiles</para></listitem>
95 <listitem><para>want call-graph profiles</para></listitem>
98 OProfile is not a panacea. OProfile might not be a complete solution when you :
101 <listitem><para>require call graph profiles on platforms other than x86, ARM, and PowerPC</para></listitem>
102 <listitem><para>require 100% instruction-accurate profiles</para></listitem>
103 <listitem><para>need function call counts or an interstitial profiling API</para></listitem>
104 <listitem><para>cannot tolerate any disturbance to the system whatsoever</para></listitem>
105 <listitem><para>need to profile interpreted or dynamically compiled code of non-supported virtual machines</para></listitem>
107 <sect2 id="jitsupport">
108 <title>Support for dynamically compiled (JIT) code</title>
110 Older versions of OProfile were not capable of attributing samples to symbols from dynamically
111 compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into
112 anonymous memory regions. OProfile reported the samples from such code, but the attribution
<screen> anon: &lt;tgid&gt;&lt;address range&gt;</screen>
115 Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs)
116 like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code.
117 A development library is provided to allow developers
118 to add support for any VM that produces dynamically compiled code (see the <emphasis>OProfile JIT agent
119 developer guide</emphasis>).
120 In addition, built-in support is included for the following:</para>
121 <itemizedlist><listitem>JVMTI agent library for Java (1.5 and higher)</listitem>
122 <listitem>JVMPI agent library for Java (1.5 and lower)</listitem>
125 For information on how to use OProfile's JIT support, see <xref linkend="setup-jit"/>.
129 <sect2 id="guestsupport">
130 <title>No support for virtual machine guests</title>
OProfile currently does not support event-based profiling (i.e., using hardware events like cache misses or
branch mispredicts) on virtual machine guests running under systems such as VMware. The list of
supported events displayed by <command>ophelp</command> or <command>opcontrol --list-events</command> is based on CPU type and does
not take into account whether the running system is a guest system or a real system. To use
OProfile on such guest systems, you can use timer mode (see <xref linkend="timer" />).
143 <sect1 id="requirements">
144 <title>System requirements</title>
148 <term>Linux kernel</term>
To use OProfile's JIT support, kernel version 2.6.13 or later is required.
In earlier kernel versions, the anonymous memory regions are not reported to OProfile, which results
in profiling reports without any samples in these regions.
156 Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version
157 of 2.6.18 or more recent.
158 Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version
159 of 2.6.22 or more recent. Additionally, full support of SPE profiling requires a BFD library
160 from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run
161 the <code>configure</code> utility with <code>--with-target=cell-be</code>.
163 Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1
166 <note>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the
167 system to crash.</note>
Instruction-Based Sampling (IBS) profiling on AMD family10h processors requires
kernel version 2.6.28-rc2 or later.
177 <term>Supported architecture</term>
179 For Intel IA32, processors as old as P6 generation or Pentium 4 core are
180 supported. The AMD Athlon, Opteron, Phenom, and Turion CPUs are also supported.
181 Older IA32 CPU types can be used with the timer mode of OProfile; please
182 see later in this manual for details. OProfile also supports most processor
183 types of the following architectures: Alpha, MIPS, ARM, x86-64, sparc64, PowerPC,
184 AVR32, and, in timer mode, PA-RISC and s390.
188 <term>Uniprocessor or SMP</term>
190 SMP machines are fully supported.
194 <term>Required libraries</term>
196 These libraries are required : <filename>popt</filename>, <filename>bfd</filename>,
197 <filename>liberty</filename> (debian users: libiberty is provided in binutils-dev package), <filename>dl</filename>,
198 plus the standard C++ libraries.
202 <term>Required kernel headers</term>
204 In order to build the perf_events-enabled <command>operf</command> program, you need to either
205 install the kernel-headers package for your system or use the <code>--with-kernel</code>
210 <term>Required user account</term>
212 For secure processing of sample data from JIT virtual machines (e.g., Java),
213 the special user account "oprofile" must exist on the system. The 'configure'
214 and 'make install' operations will print warning messages if this
215 account is not found. If you intend to profile JITed code, you must create
216 a group account named 'oprofile' and then create the 'oprofile' user account,
217 setting the default group to 'oprofile'. A runtime error message is printed to
218 the oprofile log when processing JIT samples if this special user
219 account cannot be found.
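A minimal sketch of creating the group and user, assuming the common <command>groupadd</command> and <command>useradd</command> utilities are available (account-management tools and preferred options vary between distributions), run as root:
<screen># Create the 'oprofile' group first
groupadd oprofile

# Create the 'oprofile' user with 'oprofile' as its default group
useradd -g oprofile oprofile</screen>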
223 <term>OProfile GUI</term>
225 The use of the GUI to start the profiler requires the <filename>Qt</filename> library.
226 Either <filename>Qt 3</filename> or <filename>Qt 4</filename> should work.
230 <term><acronym>ELF</acronym></term>
232 Probably not too strenuous a requirement, but older <acronym>A.OUT</acronym> binaries/libraries are not supported.
<term>K&amp;R coding style</term>
238 OK, so it's not really a requirement, but I wish it was...
246 <sect1 id="resources">
247 <title>Internet resources</title>
251 <term>Web page</term>
253 There is a web page (which you may be reading now) at
254 <ulink url="http://oprofile.sf.net/">http://oprofile.sf.net/</ulink>.
258 <term>Download</term>
You can download a source tarball or check out code from
the code repository at the SourceForge page,
<ulink url="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</ulink>.
266 <term>Mailing list</term>
268 There is a low-traffic OProfile-specific mailing list, details at
269 <ulink url="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</ulink>.
273 <term>Bug tracker</term>
275 There is a bug tracker for OProfile at SourceForge,
<ulink url="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</ulink>.
280 <term>IRC channel</term>
282 Several OProfile developers and users sometimes hang out on channel <command>#oprofile</command>
283 on the <ulink url="http://oftc.net">OFTC</ulink> network.
291 <title>Installation</title>
294 First you need to build OProfile and install it. <command>./configure</command>, <command>make</command>, <command>make install</command>
295 is often all you need, but note these arguments to <command>./configure</command> :
299 <term><option>--with-java</option></term>
302 Use this option if you need to profile Java applications. Also, see
303 <xref linkend="requirements"/>, "Required user account". This option
304 is used to specify the location of the Java Development Kit (JDK)
305 source tree you wish to use. This is necessary to get the interface description
306 of the JVMPI (or JVMTI) interface to compile the JIT support code successfully.
310 The Java Runtime Environment (JRE) does not include the development
311 files that are required to compile the JIT support code, so the full
312 JDK must be installed in order to use this option.
By default, the OProfile JIT support libraries will be installed in
<filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>. To build
and install OProfile and the JIT support libraries as 64-bit, you can
do something like the following:
321 # CFLAGS="-m64" CXXFLAGS="-m64" ./configure \
322 --with-java={my_jdk_installdir} \
323 --libdir=/usr/local/lib64
328 If you encounter errors building 64-bit, you should
329 install libtool 1.5.26 or later since that release of
330 libtool fixes known problems for certain platforms.
331 If you install libtool into a non-standard location,
332 you'll need to edit the invocation of 'aclocal' in
333 OProfile's autogen.sh as follows (assume an install
334 location of /usr/local):
337 <code>aclocal -I m4 -I /usr/local/share/aclocal</code>
343 <term><option>--with-qt-dir/includes/libraries</option></term>
345 Specify the location of Qt headers and libraries. It defaults to searching in
346 <constant>$QTDIR</constant> if these are not specified.
349 <varlistentry id="disable-werror">
350 <term><option>--disable-werror</option></term>
352 Development versions of OProfile build by
353 default with <option>-Werror</option>. This option turns
354 <option>-Werror</option> off.
357 <varlistentry id="disable-optimization">
358 <term><option>--disable-optimization</option></term>
360 Disable the <option>-O2</option> compiler flag
361 (useful if you discover an OProfile bug and want to give a useful
365 <varlistentry id="with-kernel">
366 <term><option>--with-kernel</option></term>
This option is used to specify the location of the kernel headers <filename>include</filename> directory
needed to build the perf_events-enabled <command>operf</command> program. By default, the OProfile
build system expects to find this directory under <filename>/usr</filename>. Use this option if your
kernel headers are in a non-standard location, if you are building in a cross-compile environment, or if
the host system does not support perf_events but you wish to build binaries for a
target system that does support perf_events.
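For illustration only, a cross-compile style invocation might look like the following. The path is hypothetical, and the sketch assumes the option takes the directory that contains <filename>include</filename> (mirroring the <filename>/usr</filename> default described above):
<screen># Hypothetical location: kernel headers live in /opt/target-sysroot/usr/include
./configure --with-kernel=/opt/target-sysroot/usr</screen>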
It is recommended that if you have a
uniprocessor machine, you enable the local APIC / IO_APIC support for
your kernel (this is automatically enabled for SMP kernels). With many BIOSes (kernel &gt;= 2.6.9 and UP kernel),
it's not sufficient to enable the local APIC -- you must also turn it on explicitly at boot
time by providing the "lapic" option to the kernel.
383 If you use the NMI watchdog, be aware that the watchdog is disabled when profiling starts
384 and not re-enabled until the profiling is stopped.
387 Please note that you must save or have available the <filename>vmlinux</filename> file
388 generated during a kernel compile, as OProfile needs it (you can use
389 <option>--no-vmlinux</option>, but this will prevent kernel profiling).
394 <sect1 id="uninstall">
395 <title>Uninstalling OProfile</title>
397 You must have the source tree available to uninstall OProfile; a <command>make uninstall</command> will
398 remove all installed files except your configuration file in the directory <filename>~/.oprofile</filename>.
404 <chapter id="overview">
405 <title>Overview</title>
406 <sect1 id="getting-started-with-operf">
407 <title>Getting started with OProfile using <command>operf</command></title>
409 Profiling with <command>operf</command> is the recommended profiling mode with OProfile. Using
410 this mode not only allows you to target your profiling more precisely (i.e., single process
411 or system-wide), it also allows OProfile to co-exist better with other tools on your system that
412 may also be using the perf_events kernel subsystem.
415 With <command>operf</command>, there is no initial setup needed -- simply invoke <command>operf</command> with
416 the options you need; then run the OProfile post-processing tool(s). The <command>operf</command> syntax
<screen>operf [ options ] [ --system-wide | --pid=&lt;PID&gt; | [ command [ args ] ] ]</screen>
421 A typical usage might look like this:
423 <screen>operf ./my_test_program my_arg</screen>
425 When <filename>./my_test_program</filename> completes (or when you press Ctrl-C), profiling
426 stops and you're ready to use <command>opreport</command> or other OProfile post-processing tools.
By default, <command>operf</command> stores the sample data in <filename>&lt;cur_dir&gt;/oprofile_data/samples/current</filename>,
428 and <command>opreport</command> and other post-processing tools will look in that location first for profile data,
429 unless you pass the <code>--session-dir</code> option.
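As a short sketch of the non-default case (the directory <filename>/tmp/my_session</filename> is a hypothetical choice), the same session directory is passed to both the collection and the reporting step:
<screen># Collect a profile into an explicit session directory
operf --session-dir=/tmp/my_session ./my_test_program my_arg

# Point the post-processing tool at the same directory
opreport --session-dir=/tmp/my_session</screen>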
433 <sect1 id="getting-started-with-legacy">
434 <title>Getting started with OProfile using legacy mode</title>
436 Before you can use OProfile's legacy mode, you must set it up. The minimum setup required for this
437 is to tell OProfile where the <filename>vmlinux</filename> file corresponding to the
438 running kernel is, for example :
440 <screen>opcontrol --vmlinux=/boot/vmlinux-`uname -r`</screen>
442 If you don't want to profile the kernel itself,
443 you can tell OProfile you don't have a <filename>vmlinux</filename> file :
445 <screen>opcontrol --no-vmlinux</screen>
447 Now we are ready to start the daemon (<command>oprofiled</command>) which collects
450 <screen>opcontrol --start</screen>
452 When you want to stop profiling, you can do so with :
454 <screen>opcontrol --shutdown</screen>
456 Note that unlike <command>gprof</command>, no instrumentation (<option>-pg</option>
457 and <option>-a</option> options to <command>gcc</command>)
461 Periodically (or on <command>opcontrol --shutdown</command> or <command>opcontrol --dump</command>)
462 the profile data is written out into the $SESSION_DIR/samples directory (by default at <filename>/var/lib/oprofile/samples</filename>).
463 These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules.
464 You can clear the profile data (at any time) with <command>opcontrol --reset</command>.
467 To place these sample database files in a specific directory instead of the default location
468 (<filename>/var/lib/oprofile</filename>) use the <option>--session-dir=dir</option> option.
469 You must also specify the <option>--session-dir</option> to tell the tools to continue using this directory.
471 <screen>opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</screen>
472 <screen>opcontrol --start --session-dir=/home/me/tmpsession</screen>
474 You can get summaries of this data in a number of ways at any time. To get a summary of
475 data across the entire system for all of these profiles, you can do :
477 <screen>opreport [--session-dir=dir]</screen>
479 Or to get a more detailed summary, for a particular image, you can do something like :
481 <screen>opreport -l /boot/vmlinux-`uname -r`</screen>
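Putting the steps above together, a complete legacy-mode session might look like the following sketch (run as root; <filename>./my_test_program</filename> stands in for whatever workload you want to measure):
<screen>opcontrol --no-vmlinux      # minimum setup: no kernel profiling
opcontrol --start           # start the daemon and begin collecting samples
./my_test_program my_arg    # hypothetical workload
opcontrol --dump            # force a flush of the collected data
opreport                    # system-wide summary
opcontrol --shutdown        # stop data collection and kill the daemon</screen>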
483 There are also a number of other ways of presenting the data, as described later in this manual.
484 Note that OProfile will choose a default profiling setup for you. However, there are a number
485 of options you can pass to <command>opcontrol</command> if you need to change something,
491 <sect1 id="tools-overview">
492 <title>Tools summary</title>
494 This section gives a brief description of the available OProfile utilities and their purpose.
498 <term><filename>ophelp</filename></term>
500 This utility lists the available events and short descriptions.
505 <term><filename>operf</filename></term>
507 This is the recommended program for collecting profile data.
512 <term><filename>opcontrol</filename></term>
514 Used for controlling OProfile data collection in legacy mode, discussed in <xref linkend="controlling" />.
519 <term><filename>agent libraries</filename></term>
521 Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <xref linkend="setup-jit" />.
526 <term><filename>opreport</filename></term>
528 This is the main tool for retrieving useful profile data, described in
529 <xref linkend="opreport" />.
534 <term><filename>opannotate</filename></term>
536 This utility can be used to produce annotated source, assembly or mixed source/assembly.
537 Source level annotation is available only if the application was compiled with
538 debugging symbols. See <xref linkend="opannotate" />.
543 <term><filename>opgprof</filename></term>
545 This utility can output gprof-style data files for a binary, for use with
546 <command>gprof -p</command>. See <xref linkend="opgprof" />.
551 <term><filename>oparchive</filename></term>
553 This utility can be used to collect executables, debuginfo,
554 and sample files and copy the files into an archive.
555 The archive is self-contained and can be moved to another
556 machine for further analysis.
557 See <xref linkend="oparchive" />.
562 <term><filename>opimport</filename></term>
564 This utility converts sample database files from a foreign binary format (abi) to
565 the native format. This is useful only when moving sample files between hosts,
566 for analysis on platforms other than the one used for collection.
567 See <xref linkend="opimport" />.
576 <chapter id="controlling">
577 <title>Controlling the profiler</title>
579 <sect1 id="controlling-operf">
580 <title>Using <command>operf</command></title>
582 This section describes in detail how <command>operf</command> is used to
583 control profiling. Unless otherwise directed, <command>operf</command> will profile using
584 the default event for your system. For most systems, the default event is some
585 cycles-based event, assuming your processor type supports hardware performance
586 counters. If your hardware <emphasis>does</emphasis> support performance counters, you can specify
587 something other than the default hardware event on which to profile. The performance
588 monitor counters can be programmed to count various hardware events,
589 such as cache misses or MMX operations. The event
590 chosen for each counter is reflected in the profile data collected
591 by OProfile: functions and binaries at the top of the profiles reflect
592 that most of the chosen events happened within that code.
595 Additionally, each counter is programmed with a "count" value, which corresponds to how
596 detailed the profile is. The lower the value, the more frequently profile
597 samples are taken. You can choose to sample only kernel code, user-space code,
598 or both (both is the default). Finally, some events have a "unit mask"
599 -- this is a value that further restricts the types of event that are counted.
600 You can see the event types and unit masks for your CPU using <command>ophelp</command>.
601 More information on event specification can be found at <xref linkend="eventspec"/>.
604 The <command>operf</command> command syntax is:
<screen>operf [ options ] [ --system-wide | --pid=&lt;PID&gt; | [ command [ args ] ] ]</screen>
608 When profiling an application using either the <code>command</code> or <code>--pid</code> option of
609 <command>operf</command>, forks and execs of the profiled process will also be profiled. The samples
610 from an exec'ed process will be attributed to the executable binary run by that process. See
611 <xref linkend="interpreting_operf_results"/>
614 Following is a description of the <command>operf</command> options.
618 <term><option>command</option></term>
The command or application to be profiled. <command>args</command> are the input arguments
that the command or application requires. Exactly one of <code>command</code>, <code>--pid</code>, or
<code>--system-wide</code> is required; they cannot be used simultaneously.
626 <term><option>--pid / -p [PID]</option></term>
This option enables <command>operf</command> to profile a running application. <code>PID</code>
should be the process ID of the process you wish to profile. When
finished profiling (e.g., when the profiled process ends), press
Ctrl-C to stop <command>operf</command>.
635 <term><option>--system-wide / -s</option></term>
This option is for performing a system-wide profile. You must
have root authority to run <command>operf</command> in this mode.
When finished profiling, press Ctrl-C to stop <command>operf</command>. If you run
<code>operf --system-wide</code> as a background job (i.e., with the &amp;), you
<emphasis>must</emphasis> stop it in a controlled manner in order to process
the profile data it has collected. Use <code>kill -SIGINT &lt;operf-PID&gt;</code>
for this purpose. When running <command>operf</command>
with this option, it is recommended that your current working directory be <filename>/root</filename> or a subdirectory
of <filename>/root</filename> to avoid storing sample data files in locations accessible by regular users.
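A minimal sketch of the background-job case, assuming a root shell that supports <code>$!</code> and <code>wait</code> (the workload path is hypothetical):
<screen>cd /root                               # keep sample files away from regular users
operf --system-wide &amp;                   # start system-wide profiling in the background
OPERF_PID=$!                           # remember the PID of the background operf job
/home/me/my_test_program my_arg        # hypothetical workload to observe
kill -SIGINT "$OPERF_PID"              # controlled stop so the collected data is processed
wait "$OPERF_PID"                      # wait for operf to exit before reporting
opreport</screen>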
<term><option>--vmlinux / -k [vmlinux_path]</option></term>
651 A vmlinux file that matches the running kernel that has symbol and/or debuginfo.
652 Kernel samples will be attributed to this binary, allowing post-processing tools
653 (like <command>opreport</command>) to attribute samples to the appropriate kernel symbols.
654 If this option is not specified, all kernel samples will be attributed to a pseudo
655 binary named "no-vmlinux".
659 <term><option>--callgraph / -g</option></term>
661 This option enables the callgraph to be saved during profiling. NOTE: The
662 full callchain is recorded, so there is no depth limit.
666 <term><option>--append / -a</option></term>
By default, <command>operf</command> moves old profile data from
<filename>&lt;session_dir&gt;/samples/current</filename> to
<filename>&lt;session_dir&gt;/samples/previous</filename>.
If a 'previous' profile already existed, it will be replaced. If the
<code>--append</code> option is passed, old profile data in 'current' is left in place and
new profile data will be added to it, and the 'previous' profile (if one existed)
will remain untouched. To access the 'previous' profile, simply add a session
specification to the normal invocation of OProfile post-processing tools; for example:
678 <screen>opreport session:previous</screen>
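The interaction between 'current', 'previous', and <code>--append</code> can be sketched as follows, with <filename>./my_test_program</filename> as a hypothetical workload:
<screen>operf ./my_test_program my_arg            # run 1: data is stored in samples/current
operf --append ./my_test_program my_arg   # run 2: new samples are added to 'current'
opreport                                  # reports the combined data of runs 1 and 2

operf ./my_test_program my_arg            # run 3, no --append: 'current' becomes 'previous'
opreport session:previous                 # reports the combined data of runs 1 and 2
opreport                                  # reports run 3 only</screen>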
682 <term><option>--events / -e [event1[,event2[,...]]]</option></term>
684 This option is for passing a comma-separated list of event specifications
685 for profiling. Each event spec is of the form:
687 <screen>name:count[:unitmask[:kernel[:user]]]</screen>
689 When no event specification is given, the default event for the running
690 processor type will be used for profiling. Use <command>ophelp</command>
691 to list the available events for your processor type.
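As an illustration of the event specification format, the following sketch assumes a processor that offers an event named <code>CPU_CLK_UNHALTED</code> with a unit mask of 0; both are assumptions, so check the output of <command>ophelp</command> for the names, minimum counts, and unit masks valid on your system:
<screen># name:count only; the remaining fields take their defaults
operf --events=CPU_CLK_UNHALTED:100000 ./my_test_program my_arg

# name:count:unitmask:kernel:user, here sampling both kernel and user code
operf --events=CPU_CLK_UNHALTED:100000:0:1:1 ./my_test_program my_arg</screen>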
695 <term><option>--separate-thread / -t</option></term>
This option categorizes samples by thread group ID (tgid) and thread ID (tid).
The <code>--separate-thread</code> option is useful for seeing per-thread samples in
multi-threaded applications. When used in conjunction with the <code>--system-wide</code>
option, <code>--separate-thread</code> is also useful for seeing per-process
(i.e., per-thread group) samples for the case where multiple processes are
executing the same program during a profiling run.
706 <term><option>--separate-cpu / -c</option></term>
708 This option categorizes samples by cpu.
712 <term><option>--session-dir / -d [path]</option></term>
714 This option specifies the session directory to hold the sample data. If not specified,
715 the data is saved in the <filename>oprofile_data</filename> directory on the current path.
<term><option>--lazy-conversion / -l</option></term>
721 Use this option to reduce the overhead of <command>operf</command> during profiling.
722 Normally, profile data received from the kernel is converted to OProfile format
723 during profiling time. This is typically not an issue when profiling a single
724 application. But when using the <code>--system-wide</code> option, this on-the-fly
725 conversion process can cause noticeable overhead, particularly on busy
726 multi-processor systems. The <code>--lazy-conversion</code> option directs
727 <command>operf</command> to wait until profiling is completed to do the conversion
732 <term><option>--verbose / -V [level]</option></term>
734 A comma-separated list of debugging control values used to increase the verbosity of the
735 output. Valid values are: debug, record, convert, misc, sfile, arcs, and the special value, 'all'.
<term><option>--version / -v</option></term>
741 Show <command>operf</command> version.
745 <term><option>--help / -h</option></term>
753 <sect1 id="controlling-daemon">
754 <title>Using <command>opcontrol</command></title>
756 In this section we describe the configuration and control of the profiling system
757 with opcontrol in more depth. See <xref linkend="controlling-operf"/> for a description
758 of the preferred profiling method.
761 The <command>opcontrol</command> script has a default setup, but you
762 can alter this with the options given below. In particular, you can select
763 specific hardware events on which to base your profile. See <xref linkend="controlling-operf"/> for an
764 introduction to hardware events and performance counter configuration.
765 The event types and unit masks for your CPU are listed by <command>opcontrol
766 --list-events</command> or <command>ophelp</command>.
769 The <command>opcontrol</command> script provides the following actions :
773 <term><option>--init</option></term>
775 Loads the OProfile module if required and makes the OProfile driver
780 <term><option>--setup</option></term>
Followed by a list of arguments for profiling setup. The list of arguments
is saved in <filename>/root/.oprofile/daemonrc</filename>.
Giving this option is not necessary; you can just directly pass one
of the setup options, e.g. <command>opcontrol --no-vmlinux</command>.
789 <term><option>--status</option></term>
791 Show configuration information.
795 <term><option>--start-daemon</option></term>
797 Start the oprofile daemon without starting actual profiling. The profiling
798 can then be started using <option>--start</option>. This is useful for avoiding
799 measuring the cost of daemon startup, as <option>--start</option> is a simple
800 write to a file in oprofilefs.
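A sketch of separating daemon startup from the measured interval, using only actions listed in this section (run as root; the workload name is hypothetical):
<screen>opcontrol --no-vmlinux       # setup
opcontrol --start-daemon     # daemon is running, but nothing is profiled yet
opcontrol --start            # begin data collection
./my_test_program my_arg     # hypothetical workload
opcontrol --stop             # stop data collection; the daemon keeps running
opcontrol --shutdown         # stop data collection and kill the daemon</screen>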
804 <term><option>--start</option></term>
Start data collection with either arguments provided by <option>--setup</option>
or information saved in <filename>/root/.oprofile/daemonrc</filename>. Additionally specifying
<option>--verbose</option> makes the daemon generate lots of debug data
while it is running.
813 <term><option>--dump</option></term>
815 Force a flush of the collected profiling data to the daemon.
819 <term><option>--stop</option></term>
821 Stop data collection.
825 <term><option>--shutdown</option></term>
827 Stop data collection and kill the daemon.
831 <term><option>--reset</option></term>
833 Clears out data from current session, but leaves saved sessions.
837 <term><option>--save=</option>session_name</term>
839 Save data from current session to session_name.
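For example, assuming a hypothetical session name <code>baseline</code>, and assuming saved sessions are selected with the same <code>session:</code> specification shown for <command>operf</command> profiles:
<screen>opcontrol --save=baseline     # keep the current data under the name 'baseline'
opreport session:baseline     # report on the saved session later</screen>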
843 <term><option>--deinit</option></term>
845 Shuts down daemon. Unload the OProfile module and oprofilefs.
849 <term><option>--list-events</option></term>
851 List event types and unit masks.
855 <term><option>--help</option></term>
857 Generate usage messages.
863 There are a number of possible settings, of which only
864 <option>--vmlinux</option> (or <option>--no-vmlinux</option>)
865 is required. These settings are stored in <filename>~/.oprofile/daemonrc</filename>.
869 <term><option>--buffer-size=</option>num</term>
871 Number of samples in kernel buffer.
872 Buffer watershed needs to be tweaked when changing this value.
876 <term><option>--buffer-watershed=</option>num</term>
878 Set the kernel buffer watershed to num samples. When only
879 buffer-size - buffer-watershed free entries remain in the kernel buffer, data will be
880 flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size.
884 <term><option>--cpu-buffer-size=</option>num</term>
886 Number of samples in the kernel per-CPU buffer. If you
887 profile at a high rate, it can help to increase this value if the log
888 file shows an excessive count of samples lost due to CPU buffer overflow.
892 <term><option>--event=</option>[eventspec]</term>
894 Use the given performance counter event to profile.
895 See <xref linkend="eventspec" /> below.
899 <term><option>--session-dir=</option>dir_path</term>
901 Create/use the sample database in directory <filename>dir_path</filename> instead of
902 the default location (/var/lib/oprofile).
906 <term><option>--separate=</option>[none,lib,kernel,thread,cpu,all]</term>
908 By default, every profile is stored in a single file. Thus, for example,
909 samples in the C library are all credited to the <filename>/lib/libc.o</filename>
910 profile. However, you can choose to create separate sample files by specifying
911 one of the options below.
913 <informaltable frame="all">
916 <row><entry><option>none</option></entry><entry>No profile separation (default)</entry></row>
917 <row><entry><option>lib</option></entry><entry>Create per-application profiles for libraries</entry></row>
918 <row><entry><option>kernel</option></entry><entry>Create per-application profiles for the kernel and kernel modules</entry></row>
919 <row><entry><option>thread</option></entry><entry>Create profiles for each thread and each task</entry></row>
920 <row><entry><option>cpu</option></entry><entry>Create profiles for each CPU</entry></row>
921 <row><entry><option>all</option></entry><entry>All of the above options</entry></row>
926 Note that <option>--separate=kernel</option> also turns on <option>--separate=lib</option>.
927 <!-- FIXME: update if this change -->
928 When using <option>--separate=kernel</option>, samples in hardware interrupts, soft-irqs, or other
929 asynchronous kernel contexts are credited to the task currently running. This means you will see
930 seemingly nonsense profiles such as <filename>/bin/bash</filename> showing samples for the PPP modules,
934 Using <option>--separate=thread</option> creates a lot
935 of sample files if you leave OProfile running for a while; it's most
936 useful when used for short sessions, or when using image filtering.
941 <term><option>--callgraph=</option>#depth</term>
943 Enable call-graph sample collection with a maximum depth. Use 0 to disable
944 callgraph profiling. NOTE: Callgraph support is available on a limited
945 number of platforms at this time; for example:
948 <listitem><para>x86 with 2.6 or higher kernel</para></listitem>
949 <listitem><para>ARM with 2.6 or higher kernel</para></listitem>
950 <listitem><para>PowerPC with 2.6.17 or higher kernel</para></listitem>
956 <term><option>--image=</option>image,[images]|"all"</term>
958 Image filtering. If you specify one or more absolute
959 paths to binaries, OProfile will only produce profile results for those
960 binary images. This is useful for restricting the sometimes voluminous
961 output you may get otherwise, especially with
962 <option>--separate=thread</option>. Note that if you are using
963 <option>--separate=lib</option> or
964 <option>--separate=kernel</option>, then if you specify an
965 application binary, the shared libraries and kernel code
966 <emphasis>are</emphasis> included. Specify the value
967 "all" to profile everything (the default).
971 <term><option>--vmlinux=</option>file</term>
973 vmlinux kernel image.
977 <term><option>--no-vmlinux</option></term>
979 Use this when you don't have a kernel vmlinux file, and you don't want
980 to profile the kernel. This still counts the total number of kernel samples,
981 but can't give symbol-based results for the kernel or any modules.
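The guidance given above for --buffer-watershed (0.25 to 0.5 times --buffer-size) can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, the midpoint is an arbitrary choice within the documented range, and the opcontrol command is printed rather than run.

```shell
# Hypothetical helper: print the suggested --buffer-watershed range
# (0.25 to 0.5 times the buffer size) for a given --buffer-size.
watershed_range() {
    size=$1
    echo "$((size / 4)) $((size / 2))"
}

# Print (do not run) an opcontrol command that uses the midpoint of that range.
suggest_buffer_options() {
    size=$1
    set -- $(watershed_range "$size")
    mid=$(( ($1 + $2) / 2 ))
    echo "opcontrol --buffer-size=$size --buffer-watershed=$mid"
}

suggest_buffer_options 65536
```

Printing the command first makes it easy to review the values before running it as root.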
986 <sect2 id="opcontrolexamples">
987 <title>Examples</title>
989 <sect3 id="examplesperfctr">
990 <title>Intel performance counter setup</title>
992 Here, we have a Pentium III running at 800MHz, and we want to look at where data memory
993 references are happening most, and also get results for CPU time.
996 # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000
997 # opcontrol --vmlinux=/boot/2.6.0/vmlinux
1002 <sect3 id="examplesstartdaemon">
1003 <title>Starting the daemon separately</title>
1005 Use <option>--start-daemon</option> to avoid
1006 the profiler startup affecting results.
1009 # opcontrol --vmlinux=/boot/2.6.0/vmlinux
1010 # opcontrol --start-daemon
1011 # my_favourite_benchmark --init
1012 # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop
1016 <sect3 id="exampleseparate">
1017 <title>Separate profiles for libraries and the kernel</title>
1019 Here, we want to see a profile of the OProfile daemon itself, including when
1020 it was running inside the kernel driver, and its use of shared libraries.
1023 # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux
1025 # my_favourite_stress_test --run
1026 # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled
1030 <sect3 id="examplessessions">
1031 <title>Profiling sessions</title>
1033 It can often be useful to split up profiling data into several different
1034 time periods. For example, you may want to collect data on an application's
1035 startup separately from the normal runtime data. You can use the simple
1036 command <command>opcontrol --save</command> to do this. For example:
1039 # opcontrol --save=blah
1042 will create a sub-directory in <filename>$SESSION_DIR/samples</filename> containing the samples
1043 up to that point (the current session's sample files are moved into this
1044 directory). You can then pass this session name as a parameter to the post-profiling
1045 analysis tools, to only get data up to the point you named the
1046 session. If you do not want to save a session, you can do
1047 <command>rm -rf $SESSION_DIR/samples/sessionname</command> or, for the
1048 current session, <command>opcontrol --reset</command>.
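The directory layout described above can be sketched in shell. This sketch assumes SESSION_DIR is the default session directory (/var/lib/oprofile) unless it is already set; the helper name and the session name "startup" are hypothetical, and the commands are printed rather than run.

```shell
# Assumption: SESSION_DIR defaults to /var/lib/oprofile, the default
# --session-dir location given earlier in this chapter.
SESSION_DIR=${SESSION_DIR:-/var/lib/oprofile}

# Hypothetical helper: print where "opcontrol --save=NAME" places the samples.
saved_session_path() {
    echo "$SESSION_DIR/samples/$1"
}

# Print (do not run) the commands to save a session and to discard it later.
echo "opcontrol --save=startup"
echo "rm -rf $(saved_session_path startup)"
```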
1054 <sect1 id="eventspec">
1055 <title>Specifying performance counter events</title>
1057 Both methods of profiling (<command>operf</command> and <command>opcontrol</command>)
1058 allow you to give one or more event specifications to provide details of how each
1059 hardware performance counter should be set up. With <command>operf</command>, you
1060 can provide a comma-separated list of event specifications using the <code>--events</code>
1061 option. With <command>opcontrol</command>, you use the <code>--event</code> option
1062 for each desired event specification.
1063 The event specification is a colon-separated string of the form
1064 <option><emphasis>name</emphasis>:<emphasis>count</emphasis>:<emphasis>unitmask</emphasis>:<emphasis>kernel</emphasis>:<emphasis>user</emphasis></option>
1065 as described in the table below.
1068 If no event specs are passed to <command>operf</command> or <command>opcontrol</command>,
1069 the default event will be used for profiling. With <command>opcontrol</command>, if you have
1070 previously specified some non-default event but want to revert to the default event, use
1071 <option>--event=default</option>. Use of this option overrides all previous event selections
1072 that have been cached.
1075 <note>OProfile will allocate hardware counters as necessary, but some processor
1076 types have restrictions as to what hardware events may be counted simultaneously.
1077 The <command>operf</command> program uses a multiplexing technique when such
1078 hardware restrictions are encountered, but <command>opcontrol</command> does
1079 not have this capability; instead, <command>opcontrol</command> will display an
1080 error message if you select an incompatible set of events.
1083 <informaltable frame="all">
1086 <row><entry><option>name</option></entry><entry>The symbolic event name, e.g. <constant>CPU_CLK_UNHALTED</constant></entry></row>
1087 <row><entry><option>count</option></entry><entry>The counter reset value, e.g. 100000</entry></row>
1088 <row><entry><option>unitmask</option></entry><entry>The unit mask, as given in the events list: e.g. 0x0f; or a symbolic name as
1089 given by the first word of the description (only valid for unit masks having an "extra:" parameter)</entry></row>
1090 <row><entry><option>kernel</option></entry><entry>Whether to profile kernel code</entry></row>
1091 <row><entry><option>user</option></entry><entry>Whether to profile userspace code</entry></row>
1096 The last three values are optional; if you omit them (e.g. <option>--event=DATA_MEM_REFS:30000</option>),
1097 they will be set to the default values (a unit mask of 0, and profiling both kernel and
1098 userspace code). Note that some events require a unit mask.
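The way omitted fields fall back to their defaults can be sketched in shell. The helper below is hypothetical (it is not part of OProfile); it only splits a specification on colons and fills in the defaults stated above: a unit mask of 0 and profiling of both kernel and userspace code.

```shell
# Hypothetical helper: split an event specification of the form
# name:count:unitmask:kernel:user and fill in the documented defaults
# (unit mask 0, kernel 1, user 1) for omitted trailing fields.
parse_eventspec() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # unquoted on purpose: split on ":" into $1..$5
    IFS=$old_ifs
    echo "name=$1 count=$2 unitmask=${3:-0} kernel=${4:-1} user=${5:-1}"
}

parse_eventspec DATA_MEM_REFS:30000
parse_eventspec GLOBAL_POWER_EVENTS:100000:1:1:0
```

The sketch does not check whether the event requires a non-zero unit mask; that still has to be looked up in the events list.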
1101 When specifying a unit mask value, it may be either a hexadecimal value (which
1102 <emphasis>must</emphasis> begin with "0x") or a string (i.e., a symbolic name) which matches
1103 the first word in the unit mask description. Specifying a symbolic name for
1104 the unit mask is valid only for unit masks having "extra:" parameters, as
1105 shown by the output of <command>ophelp</command>. Unit masks with "extra:" parameters must be
1106 specified using the symbolic name.
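The distinction between the two unit mask forms can be sketched as a shell classifier. The function is hypothetical and purely syntactic: it does not consult the events list. It also accepts plain decimal values, on the assumption that they are valid because values such as 0 and 1 appear in the default-event table in this section.

```shell
# Hypothetical helper: classify a unit mask argument by its syntax only.
classify_unitmask() {
    case $1 in
        0x|0x*[!0-9a-fA-F]*) echo invalid ;;    # "0x" alone or non-hex digits
        0x*)                 echo hex ;;
        ''|*[!0-9]*)                            # empty or not purely digits
            case $1 in
                [A-Za-z_]*) echo symbolic ;;
                *)          echo invalid ;;
            esac ;;
        *)                   echo decimal ;;
    esac
}

classify_unitmask 0x0f   # hexadecimal: has the required 0x prefix
classify_unitmask f0     # no 0x prefix, so it is read as a symbolic name
classify_unitmask 1      # plain decimal, as in the default-event table
```

The second call shows why the "0x" prefix matters: without it, a string of hex digits that starts with a letter is indistinguishable from a name.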
1109 When using legacy mode <command>opcontrol</command> on PowerPC platforms, all events specified must be in the same group;
1110 i.e., the group number appended to the event name (e.g. <constant>&lt;<emphasis>some-event-name</emphasis>&gt;_GRP9
1111 </constant>) must be the same.
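The same-group rule can be checked before invoking opcontrol. The following shell sketch uses a hypothetical helper and placeholder event names (SOME_EVENT_GRP9 and so on are not real events); it only compares the _GRPn suffixes.

```shell
# Hypothetical helper: succeed only if every event name ends in the same
# _GRPn suffix, as legacy-mode opcontrol requires on PowerPC.
same_group() {
    first=
    for spec in "$@"; do
        name=${spec%%:*}              # drop :count:unitmask:... if present
        case $name in
            *_GRP[0-9]*) grp=${name##*_GRP} ;;
            *)           return 1 ;;  # no group suffix at all
        esac
        if [ -z "$first" ]; then
            first=$grp
        elif [ "$grp" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Placeholder event names, for illustration only.
if same_group SOME_EVENT_GRP9:100000 OTHER_EVENT_GRP9:100000; then
    echo "same group"
else
    echo "mixed groups"
fi
```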
1114 If OProfile is using timer-interrupt mode, there is no event configuration possible.
1117 The table below lists the default event for various processor types:
1119 <informaltable frame="all">
1122 <row><entry>Processor</entry><entry>cpu_type</entry><entry>Default event</entry></row>
1123 <row><entry>Alpha EV4</entry><entry>alpha/ev4</entry><entry>CYCLES:100000:0:1:1</entry></row>
1124 <row><entry>Alpha EV5</entry><entry>alpha/ev5</entry><entry>CYCLES:100000:0:1:1</entry></row>
1125 <row><entry>Alpha PCA56</entry><entry>alpha/pca56</entry><entry>CYCLES:100000:0:1:1</entry></row>
1126 <row><entry>Alpha EV6</entry><entry>alpha/ev6</entry><entry>CYCLES:100000:0:1:1</entry></row>
1127 <row><entry>Alpha EV67</entry><entry>alpha/ev67</entry><entry>CYCLES:100000:0:1:1</entry></row>
1128 <row><entry>ARM/XScale PMU1</entry><entry>arm/xscale1</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1129 <row><entry>ARM/XScale PMU2</entry><entry>arm/xscale2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1130 <row><entry>ARM/MPCore</entry><entry>arm/mpcore</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1131 <row><entry>AVR32</entry><entry>avr32</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1132 <row><entry>Athlon</entry><entry>i386/athlon</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1133 <row><entry>Pentium Pro</entry><entry>i386/ppro</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1134 <row><entry>Pentium II</entry><entry>i386/pii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1135 <row><entry>Pentium III</entry><entry>i386/piii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1136 <row><entry>Pentium M (P6 core)</entry><entry>i386/p6_mobile</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1137 <row><entry>Pentium 4 (non-HT)</entry><entry>i386/p4</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row>
1138 <row><entry>Pentium 4 (HT)</entry><entry>i386/p4-ht</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row>
1139 <row><entry>Hammer</entry><entry>x86-64/hammer</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1140 <row><entry>Family10h</entry><entry>x86-64/family10</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1141 <row><entry>Family11h</entry><entry>x86-64/family11h</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
1142 <row><entry>Itanium</entry><entry>ia64/itanium</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1143 <row><entry>Itanium 2</entry><entry>ia64/itanium2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
1144 <row><entry>TIMER_INT</entry><entry>timer</entry><entry>None selectable</entry></row>
1145 <row><entry>IBM pseries</entry><entry>PowerPC 4/5/6/7/970/Cell</entry><entry>CYCLES:100000:0:1:1</entry></row>
1146 <row><entry>IBM s390</entry><entry>timer</entry><entry>None selectable</entry></row>
1147 <row><entry>IBM s390x</entry><entry>timer</entry><entry>None selectable</entry></row>
1154 <sect1 id="setup-jit">
1155 <title>Setting up the JIT profiling feature</title>
1157 To gather information about JITed code from a virtual machine,
1158 it needs to be instrumented with an agent library. We use the
1159 agent libraries for Java in the following example. To use the
1160 Java profiling feature, you must build OProfile with the "--with-java" option
1161 (<xref linkend="install" />).
1165 <sect2 id="setup-jit-jvm">
1166 <title>JVM instrumentation</title>
1168 Add this to the startup parameters of the JVM (for JVMTI):
1170 <screen><option>-agentpath:&lt;libdir&gt;/libjvmti_oprofile.so[=&lt;options&gt;]</option> </screen>
1172 <screen><option>-agentlib:jvmti_oprofile[=&lt;options&gt;]</option> </screen>
1175 The JVMPI agent implementation is enabled with the command line option
1176 <screen><option>-Xrunjvmpi_oprofile[:&lt;options&gt;]</option> </screen>
1179 Currently, there is just one option available -- <option>debug</option>. For JVMPI,
1180 the convention for specifying an option is <option>option_name=[yes|no]</option>.
1181 For JVMTI, the option specification is simply the option name, implying
1182 "yes"; no option specified implies "no".
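The two option conventions can be sketched as a shell helper that builds the agent argument. The function name is hypothetical; it only assembles the strings shown above and prints them.

```shell
# Hypothetical helper: build the JVM agent argument for the given
# interface (jvmti or jvmpi), with the debug option set to "yes" or "no".
agent_arg() {
    iface=$1
    debug=$2
    case $iface in
        jvmti)
            # JVMTI: naming the option implies "yes"; omitting it implies "no".
            if [ "$debug" = yes ]; then
                echo "-agentlib:jvmti_oprofile=debug"
            else
                echo "-agentlib:jvmti_oprofile"
            fi ;;
        jvmpi)
            # JVMPI: options are written as option_name=[yes|no].
            echo "-Xrunjvmpi_oprofile:debug=$debug" ;;
        *)
            return 1 ;;
    esac
}

agent_arg jvmti yes
agent_arg jvmpi yes
```

The printed string is meant to be added to the JVM startup parameters.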
1185 The agent library (installed in <filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>)
1186 needs to be in the library search path (e.g. add the library directory
1187 to <constant>LD_LIBRARY_PATH</constant>). If the command line of
1188 the JVM is not accessible (for example, because it is buried within shell scripts or a
1189 launcher program), it may be possible to set an environment variable to add
1190 the instrumentation.
1191 For Sun JVMs this is <constant>JAVA_TOOL_OPTIONS</constant>. Please check
1192 your JVM documentation for
1193 further information on the agent startup options.
1199 <sect1 id="oprofile-gui">
1200 <title>Using <command>oprof_start</command></title>
1202 The <command>oprof_start</command> application provides a convenient way to start the profiler.
1203 Note that <command>oprof_start</command> is just a wrapper around the <command>opcontrol</command> script,
1204 so it does not provide more services than the script itself.
1207 After <command>oprof_start</command> is started you can select the event type for each counter;
1208 the sampling rate and other related parameters are explained in <xref linkend="controlling-daemon" />.
1209 The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename
1210 etc. The counter setup interface should be self-explanatory; <xref linkend="hardware-counters" /> and related
1211 links contain information on using unit masks.
1214 A status line shows the current status of the profiler: how long it has been running, the average
1215 number of interrupts received per second, and the total number of interrupts, over all processors.
1216 Note that quitting <command>oprof_start</command> does not stop the profiler.
1219 Your configuration is saved in the same file as <command>opcontrol</command> uses; that is,
1220 <filename>~/.oprofile/daemonrc</filename>.
1223 <note><command>oprof_start</command> does not currently support <command>operf</command>.</note>
1227 <sect1 id="detailed-parameters">
1228 <title>Configuration details</title>
1230 <sect2 id="hardware-counters">
1231 <title>Hardware performance counters</title>
1232 <para>Most processor models include performance monitor units that can be configured to monitor (count)
1233 various types of hardware events. This section is where you can find architecture-specific information
1234 to help you use these events for profiling. You do not really need to read this section unless you are interested in using
1235 events other than the default event chosen by OProfile.
1239 Your CPU type may not include the requisite support for hardware performance counters, in which case
1240 you must use OProfile in timer mode (see <xref linkend="timer" />).
1244 The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available
1245 from <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink>.
1246 The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <ulink
1247 url="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf">
1248 http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</ulink>.
1249 For IBM PowerPC processors, documentation is available at <ulink url="https://www.power.org/">
1250 https://www.power.org/</ulink>. For example, <ulink url="https://www.power.org/events/Power7">
1251 https://www.power.org/events/Power7</ulink> contains specific information on the performance
1252 monitor unit for the IBM POWER7.
1255 These processors are capable of delivering an interrupt when a counter overflows.
1256 This is the basic mechanism on which OProfile is based. The delivery mode is <acronym>NMI</acronym>,
1257 so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called,
1258 the current <acronym>PC</acronym> value and the current task are recorded into the profiling structure.
1259 This allows the overflow event to be attached to a specific assembly instruction in a binary image.
1260 OProfile receives this data from the kernel and writes it to the sample files.
1263 If we use an event such as <constant>CPU_CLK_UNHALTED</constant> or <constant>INST_RETIRED</constant>
1264 (<constant>GLOBAL_POWER_EVENTS</constant> or <constant>INSTR_RETIRED</constant>, respectively, on the Pentium 4), we can
1265 use the overflow counts as an estimate of actual time spent in each part of code. Alternatively, we can profile interesting
1266 data such as the cache behaviour of routines with the other available counters.
1269 However, there are several caveats. First, there are those issues listed in the Intel manual. There is a delay
1270 between the counter overflow and the interrupt delivery that can skew results on a small scale - this means
1271 you cannot rely on the profiles at the instruction level as being perfectly accurate.
1272 If you are using an "event-mode" counter such as the cache counters, a count registered against it doesn't mean
1273 that it is responsible for that event. However, it implies that the counter overflowed in the dynamic
1274 vicinity of that instruction, to within a few instructions. Further details on this problem can be found in
1275 <xref linkend="interpreting" /> and also in the Digital paper "ProfileMe: A Hardware Performance Counter".
1278 Each counter has several configuration parameters.
1279 First, there is the unit mask: this simply further specifies what to count.
1280 Second, there is the counter value, discussed below. Third, there is a parameter controlling whether to increment counts
1281 whilst in kernel or user space. You can configure these separately for each counter.
1284 After each overflow event, the counter will be re-initialized
1285 such that another overflow will occur after the configured number of events (the counter value) has been counted. Thus, higher
1286 values mean less-detailed profiling, and lower values mean more detail, but higher overhead.
1287 Picking a good value for this
1288 parameter is, unfortunately, somewhat of a black art. It is of course dependent on the event
1290 Specifying too large a value will mean not enough interrupts are generated
1291 to give a realistic profile (though this problem can be ameliorated by profiling for <emphasis>longer</emphasis>).
1292 Specifying too small a value can lead to higher performance overhead.
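The trade-off can be made concrete for a cycle-counting event. The shell sketch below estimates the sampling rate from the clock rate and the counter value; the helper name is hypothetical, and the result is an upper bound that assumes the CPU is never halted.

```shell
# Hypothetical helper: estimate samples per second for a cycle-counting
# event, given the clock rate in Hz and the counter reset value.
# Assumption: the CPU is busy all the time, so this is an upper bound.
samples_per_second() {
    hz=$1
    count=$2
    echo $((hz / count))
}

samples_per_second 800000000 400000   # 800 MHz CPU, counter value 400000
samples_per_second 800000000 100000   # same CPU, counter value 100000
```

On the 800MHz machine from the earlier example, a counter value of 400000 gives at most 2000 samples per second, while 100000 gives at most 8000, with correspondingly higher overhead.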
1298 <title>OProfile in timer interrupt mode</title>
1300 Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes
1301 some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix).
1302 On these machines, OProfile falls back to using the timer interrupt
1303 to collect samples. In timer mode, OProfile
1304 is not able to profile code that has interrupts disabled.
1307 You can force use of the timer interrupt by using the <option>timer=1</option> module
1308 parameter (or <option>oprofile.timer=1</option> on the boot command line if OProfile is
1309 built-in). If OProfile was built as a kernel module, then you must pass the 'timer=1'
1310 parameter with the modprobe command. Do this before executing 'opcontrol --init' or
1311 edit the opcontrol command's invocation of modprobe to pass the 'timer=1' parameter.
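The module case can be sketched in shell as follows. The commands are printed rather than run, since they need root. The check at the end assumes that oprofilefs is mounted at /dev/oprofile and that its cpu_type file reads "timer" in timer mode; the helper name is hypothetical.

```shell
# Print (do not run) the commands that force timer mode when OProfile is
# built as a kernel module.
echo "modprobe oprofile timer=1"
echo "opcontrol --init"

# Hypothetical helper: report whether a cpu_type file says "timer".
# Assumption: oprofilefs is mounted at /dev/oprofile.
is_timer_mode() {
    file=${1:-/dev/oprofile/cpu_type}
    [ -r "$file" ] || return 1
    [ "$(cat "$file")" = timer ]
}

if is_timer_mode; then
    echo "timer mode active"
else
    echo "timer mode not detected"
fi
```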
1313 <note>Timer mode is only available using the legacy <command>opcontrol</command> command.</note>
1318 <title>Pentium 4 support</title>
1320 The Pentium 4 / Xeon performance counters are organized around 3 types of model-specific registers (MSRs): 45 event
1321 selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a
1322 particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their
1323 operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one
1324 another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of
1325 one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar
1326 to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU.
1329 There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store
1330 (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described
1336 <title>Intel Itanium 2 support</title>
1338 The Itanium 2 performance monitoring unit (PMU) organizes the counters as four
1339 pairs of performance event monitoring registers. Each pair is composed of a
1340 Performance Monitoring Configuration (PMC) register and Performance Monitoring
1341 Data (PMD) register. The PMC selects the performance event being monitored and
1342 the PMD determines the sampling interval. The IA64 Performance Monitoring Unit
1343 (PMU) triggers sampling with maskable interrupts. Thus, samples will not occur
1344 in sections of the IA64 kernel where interrupts are disabled.
1347 None of the advanced features of the Itanium 2 performance monitoring unit
1348 such as opcode matching, address range matching, or precise event sampling are
1349 supported by this version of OProfile. The Itanium 2 support only maps OProfile's
1350 existing interrupt-based model to the PMU hardware.
1355 <title>PowerPC64 support</title>
1357 The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors
1358 consists of between 4 and 8 counters (depending on the model), plus three
1359 special purpose registers used for programming the counters -- MMCR0, MMCR1,
1360 and MMCRA. Advanced features such as instruction matching and thresholding are
1361 not supported by this version of OProfile.
1362 <note>Later versions of the IBM POWER5+ processor (beginning with revision 3.0)
1363 run the performance monitor unit in POWER6 mode, effectively removing OProfile's
1364 access to counters 5 and 6. These two counters are dedicated to counting
1365 instructions completed and cycles, respectively. In POWER6 mode, however, the
1366 counters do not generate an interrupt on overflow and so are unusable by
1367 OProfile. Kernel versions 2.6.23 and higher will recognize this mode
1368 and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem.
1369 OProfile userspace responds to this cpu_type by removing these counters from
1370 the list of potential events to count. Without this kernel support, attempts
1371 to profile using an event from one of these counters will yield incorrect
1372 results -- typically, zero (or near zero) samples in the generated report.
1378 <sect2 id="cell-be">
1379 <title>Cell Broadband Engine support</title>
1381 The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing
1382 Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each
1383 consist of a processing unit (PPU and SPU, respectively) and other hardware
1384 components, such as memory controllers.
1387 A PPU has two hardware threads (aka "virtual CPUs"). The performance monitor
1388 unit of the CBE collects event information on one hardware thread at a time.
1389 Therefore, when profiling PPE events,
1390 OProfile collects the profile based on the selected events by time slicing the
1391 performance counter hardware between the two threads. The user must ensure the
1392 collection interval is long enough so that the time spent collecting data for
1393 each PPU is sufficient to obtain a good profile.
1396 To profile an SPU application, the user should specify the SPU_CYCLES event.
1397 When starting OProfile with SPU_CYCLES, the opcontrol script enforces certain
1398 separation parameters (separate=cpu,lib) to ensure that sufficient information
1399 is collected in the sample data in order to generate a complete report. The
1400 --merge=cpu option can be used to obtain a more readable report if analyzing
1401 the performance of each separate SPU is not necessary.
1404 Profiling with an SPU event (events 4100 through 4163) is not compatible with any other
1405 event. Furthermore, only one SPU event can be specified at a time. The hardware only
1406 supports profiling on one SPU per node at a time. The OProfile kernel code time slices
1407 between the eight SPUs to collect data on all SPUs.
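The effect of time slicing on the required run length can be estimated with simple arithmetic. The shell sketch below assumes that the slices are of equal length, which the text does not state; the helper name is hypothetical.

```shell
# Hypothetical helper: seconds of observation each unit receives when the
# profiling hardware is time-sliced across a number of units.
# Assumption: all slices are equal in length.
slice_seconds() {
    total=$1
    units=$2
    echo $((total / units))
}

slice_seconds 120 8   # 120 s run shared by the eight SPUs
slice_seconds 120 2   # 120 s run shared by the two PPU hardware threads
```

Under that assumption, a 120-second run observes each SPU for about 15 seconds, so the collection interval should be chosen with the per-unit share in mind.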
1410 SPU profile reports have some unique characteristics compared to reports for
1411 standard architectures:
1414 <listitem>Typically no "app name" column. This is really standard OProfile behavior
1415 when the report contains samples for just a single application, which is
1416 commonly the case when profiling SPUs.</listitem>
1417 <listitem>"CPU" equates to "SPU"</listitem>
1418 <listitem>Specifying '--long-filenames' on the opreport command does not always result
1419 in long filenames. This happens when the SPU application code is embedded in
1420 the PPE executable or shared library. The embedded SPU ELF data contains only the
1421 short filename (i.e., no path information) for the SPU binary file that was used as
1422 the source for embedding. The reason that just the short filename is used is because
1423 the original SPU binary file may not exist or be accessible at runtime. The performance
1424 analyst must have sufficient knowledge of the application to be able to correlate the
1425 SPU binary image names found in the report to the application's source files.
1427 Compile the application with -g and generate the OProfile report
1428 with -g to facilitate finding the right source file(s) on which to focus.
1435 <sect2 id="amd-ibs-support">
1436 <title>AMD64 (x86_64) Instruction-Based Sampling (IBS) support</title>
1439 Instruction-Based Sampling (IBS) is a new performance measurement technique
1440 available on AMD Family 10h processors. Traditional performance counter
1441 sampling is not precise enough to isolate performance issues to individual
1442 instructions. IBS, however, precisely identifies instructions which are not
1443 making the best use of the processor pipeline and memory hierarchy.
1444 For more information, please refer to the "Instruction-Based Sampling:
1445 A New Performance Analysis Technique for AMD Family 10h Processors" (
1446 <ulink url="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf">
1447 http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</ulink>).
1448 There are two IBS profile types, described in the following sections.
1449 <note>Profiling on IBS events is only supported with legacy mode profiling
1450 (i.e., with <command>opcontrol</command>).</note>
1453 <sect3 id="ibs-fetch">
1454 <title>IBS Fetch</title>
1457 IBS fetch sampling is a statistical sampling method which counts completed
1458 fetch operations. When the number of completed fetch operations reaches the
1459 maximum fetch count (the sampling period), IBS tags the fetch operation and
1460 monitors that operation until it either completes or aborts. When a tagged
1461 fetch completes or aborts, a sampling interrupt is generated and an IBS fetch
1462 sample is taken. An IBS fetch sample contains a timestamp, the identifier of
1463 the interrupted process, the virtual fetch address, and several event flags
1464 and values that describe what happened during the fetch operation.
1470 <title>IBS Op</title>
1473 IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64
1474 instructions. Two options are available for selecting ops for sampling:
1479 Cycles-based selection counts CPU clock cycles. The op is tagged and monitored
1480 when the count reaches a threshold (the sampling period) and a valid op is
1485 Dispatched op-based selection counts dispatched macro-ops.
1486 When the count reaches a threshold, the next valid op is tagged and monitored.
1491 In both cases, an IBS sample is generated only if the tagged op retires.
1492 Thus, IBS op event information does not measure speculative execution activity.
1493 The execution stages of the pipeline monitor the tagged macro-op. When the
1494 tagged macro-op retires, a sampling interrupt is generated and an IBS op
1495 sample is taken. An IBS op sample contains a timestamp, the identifier of
1496 the interrupted process, the virtual address of the AMD64 instruction from
1497 which the op was issued, and several event flags and values that describe
1498 what happened when the macro-op executed.
1504 IBS profiling is enabled simply by specifying IBS performance events
1505 through the "--event=" option. These events are listed in the output of
1506 <function>opcontrol --list-events</function>.
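IBS event specifications use the same name:count:unitmask:kernel:user form as other events, with the restriction noted in the listing below that all IBS fetch events share one count and unit mask, and likewise all IBS op events. A shell sketch of that check follows; the helper and the event names IBS_FETCH_AAA, IBS_FETCH_BBB and IBS_OP_AAA are hypothetical placeholders.

```shell
# Hypothetical helper: succeed only if all IBS fetch specs share one
# count:unitmask pair, and all IBS op specs share one as well.
ibs_specs_consistent() {
    fetch_key=
    op_key=
    for spec in "$@"; do
        name=${spec%%:*}
        rest=${spec#*:}
        count=${rest%%:*}
        case $rest in
            *:*) um=${rest#*:}; um=${um%%:*} ;;
            *)   um=0 ;;                 # unit mask omitted: default 0
        esac
        key="$count:$um"
        case $name in
            IBS_FETCH_*)
                [ -z "$fetch_key" ] || [ "$fetch_key" = "$key" ] || return 1
                fetch_key=$key ;;
            IBS_OP_*)
                [ -z "$op_key" ] || [ "$op_key" = "$key" ] || return 1
                op_key=$key ;;
        esac
    done
    return 0
}

# Placeholder event names, for illustration only.
if ibs_specs_consistent IBS_FETCH_AAA:250000:0:1:1 IBS_FETCH_BBB:250000:0:1:1 \
                        IBS_OP_AAA:500000:0:1:1; then
    echo "consistent"
else
    echo "inconsistent"
fi
```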
1510 opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
1511 opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
1513 Note: * All IBS fetch events must have the same event count and unit mask,
1514 as must all IBS op events.
1519 <sect2 id="systemz">
1520 <title>IBM System z hardware sampling support</title>
1522 IBM System z provides a facility which does instruction sampling as
1523 part of the CPU. This has significant advantages over the timer-based
1524 sampling approach, such as better sampling resolution with less overhead
1525 and the ability to get samples within code sections where
1526 interrupts are disabled (useful especially for Linux kernel code).
1528 <note>Profiling with the instruction sampling facility is currently only supported
1529 with legacy mode profiling (i.e., with <command>opcontrol</command>).</note>
1531 A public description of the System z CPU-Measurement Facilities can be
1533 <ulink url="http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a">The Load-Program-Parameter and CPU-Measurement Facilities</ulink>
1536 System z hardware sampling can be used for Linux instances in LPAR
1537 mode. The hardware sampling support used by OProfile was introduced
1538 for System z10 in October 2008.
1541 To enable hardware sampling for an LPAR you must activate the LPAR
1542 with authorization for basic sampling control. See the "Support
1543 Element Operations Guide" for your mainframe system for more
1547 The hardware sampling facility can be enabled and disabled using the
1548 event interface. A `virtual' counter 0 has been defined that only supports
1549 a single event, HWSAMPLING. By default the HWSAMPLING event is
1550 enabled on machines providing the facility. For both events only the
1551 `count', `kernel' and `user' options are evaluated by the kernel
1555 The `count' value is the sampling rate as it is passed to the CPU
1556 measurement facility. A sample will be taken by the hardware every
`count' cycles. Using low values here will quickly fill up the
sampling buffers and generate CPU load, because the OProfile daemon and
the kernel module are kept busy flushing the hardware buffers. This
can considerably affect the workload being profiled.
1563 The unit mask `um' is required to be zero.
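As a rough sketch of how these fields fit together, the shell fragment below assembles a HWSAMPLING event specification. It assumes the generic <command>opcontrol</command> field order (name, count, unit mask, kernel, user); the helper name <function>hwsampling_event</function> and the count of 4000000 are hypothetical, chosen only for illustration. The resulting command is printed rather than executed.

```shell
# Hypothetical helper: build a HWSAMPLING event specification.
# $1 = count (sampling interval in cycles), $2 = kernel (0 or 1), $3 = user (0 or 1)
# The unit mask field is always 0, as required for this event.
hwsampling_event() {
    printf 'HWSAMPLING:%s:0:%s:%s\n' "$1" "$2" "$3"
}

# Illustrative value only: sample every 4000000 cycles, kernel and user space.
spec=$(hwsampling_event 4000000 1 1)
echo "opcontrol --event=$spec"
```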
1566 The opcontrol tool provides a new option specific to System z
1571 <listitem>--s390hwsampbufsize="num": Number of 2MB areas
1572 used per CPU for storing sample data. The best
1573 size for the sample memory depends on the particular system and the
1574 workload to be measured. Providing the sampler with too little memory
1575 results in lost samples. Reserving too much system memory for the
1576 sampler impacts the overall performance and, hence, also the workload
1577 to be measured.</listitem>
A special counter <filename>/dev/oprofile/timer</filename> is provided
by the kernel module; it allows switching back to timer mode sampling
dynamically. The TIMER event can only be used with this
counter. The TIMER event can be specified using the
<option>--event=</option> option, as with every other event.
1587 <screen>opcontrol --event=TIMER:1</screen>
1589 On z10 or later machines the default event is set to TIMER in case the
1590 hardware sampling facility is not available.
1593 Although required, the 'count' parameter of the TIMER event is
1594 ignored. The value may eventually be used for timer based sampling
1595 with a configurable sampling frequency, but this is currently not
1602 <title>Dangerous counter settings</title>
OProfile is a low-level profiler which allows continuous profiling with low overhead.
When using OProfile legacy mode profiling, it may be possible to configure such a low counter reset value
(i.e., such a high sampling rate) that the system becomes overloaded with counter interrupts and your
system's responsiveness is severely impacted. Whilst some validation is done on the <code>count</code>
values you pass to <command>opcontrol</command> with your event specification, it is not foolproof.
1611 This can happen as follows: When the profiler count
1612 reaches zero, an NMI handler is called which stores the sample values in an internal buffer, then resets the counter
1613 to its original value. If the reset count you specified is very low, a pending NMI can be sent before the NMI handler has
1614 completed. Due to the priority of the NMI, the pending interrupt is delivered immediately after
1615 completion of the previous interrupt handler, and control never returns to other parts of the system.
1616 If all processors are stuck in this mode, the system will appear to be frozen.
1618 <para>If this happens, it will be impossible to bring the system back to a workable state.
1619 There is no way to provide real security against this happening, other than making sure to use a reasonable value
for the counter reset. For example, setting the <constant>CPU_CLK_UNHALTED</constant> event type with a ridiculously low reset count (e.g. 500)
1621 is likely to freeze the system.
In short: <command>Don't try a foolish sample count value</command>. Unfortunately the definition of a foolish value
is really dependent on the event type. If ever in doubt, post a message to <address><email>oprofile-list@lists.sf.net</email></address>.
1628 The scenario described above cannot occur if you use <command>operf</command> for profiling instead of
1629 <command>opcontrol</command>, because the perf_events kernel subsystem automatically detects when performance monitor
1630 interrupts are arriving at a dangerous level and will throttle back the sampling rate.
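A back-of-the-envelope calculation shows why a low reset count is risky in legacy mode. The sketch below assumes a hypothetical 2 GHz processor that is never halted, so that <constant>CPU_CLK_UNHALTED</constant> advances about 2000000000 times per second; the helper name is made up for illustration.

```shell
# Hypothetical helper: approximate counter interrupts per second per CPU.
# $1 = events per second (here: clock rate in Hz), $2 = counter reset value
interrupts_per_second() {
    echo $(( $1 / $2 ))
}

low=$(interrupts_per_second 2000000000 500)
sane=$(interrupts_per_second 2000000000 100000)
echo "count 500:    $low interrupts/s"
echo "count 100000: $sane interrupts/s"
```

Under these assumptions a count of 500 asks each CPU to service about four million NMIs per second, with only 500 cycles between them, whereas a count of 100000 brings this down to about twenty thousand.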
1638 <chapter id="results">
1639 <title>Obtaining results</title>
1641 OK, so the profiler has been running, but it's not much use unless we can get some data out. Sometimes,
1642 OProfile does a little <emphasis>too</emphasis> good a job of keeping overhead low, and no data reaches
the profiler. This can happen on lightly-loaded machines. If you're using OProfile legacy mode, you can
force a dump at any time with:
1646 <para><command>opcontrol --dump</command></para>
<para>This ensures that any profile data collected by the <command>oprofiled</command> daemon has been flushed
1648 to disk. Remember to do a <code>dump</code>, <code>stop</code>, <code>shutdown</code>, or <code>deinit</code>
1649 before complaining there is no profiling data!
1652 Now that we've got some data, it has to be processed. That's the job of <command>opreport</command>,
1653 <command>opannotate</command>, or <command>opgprof</command>.
1656 <sect1 id="profile-spec">
1657 <title>Profile specifications</title>
1660 All of the analysis tools take a <emphasis>profile specification</emphasis>.
1661 This is a set of definitions that describe which actual profiles should be
1662 examined. The simplest profile specification is empty: this will match all
1663 the available profile files for the current session (this is what happens
1664 when you do <command>opreport</command>).
1667 Specification parameters are of the form <option>name:value[,value]</option>.
1668 For example, if I wanted to get a combined symbol summary for
1669 <filename>/bin/myprog</filename> and <filename>/bin/myprog2</filename>,
1670 I could do <command>opreport -l image:/bin/myprog,/bin/myprog2</command>.
1671 As a special case, you don't actually need to specify the <option>image:</option>
1672 part here: anything left on the command line is assumed to be an
1673 <option>image:</option> name. Similarly, if no <option>session:</option>
1674 is specified, then <option>session:current</option> is assumed ("current"
1675 is a special name of the current / last profiling session).
1678 In addition to the comma-separated list shown above, some of the
1679 specification parameters can take <command>glob</command>-style
1680 values. For example, if I want to see image summaries for all
1681 binaries profiled in <filename>/usr/bin/</filename>, I could do
1682 <command>opreport image:/usr/bin/\*</command>. Note the necessity
1683 to escape the special character from the shell.
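The effect of the escaping can be checked without running <command>opreport</command> at all. The following sketch, which uses a scratch directory and a made-up <function>count_args</function> helper, counts how many arguments a command would receive for an unescaped, an escaped, and a quoted pattern.

```shell
# Scratch directory with two files standing in for profiled binaries.
demo_dir=$(mktemp -d)
touch "$demo_dir/prog1" "$demo_dir/prog2"

# Hypothetical helper: report how many arguments it was given.
count_args() { echo $#; }

unescaped=$(count_args "$demo_dir"/*)    # the shell expands the pattern itself
escaped=$(count_args "$demo_dir"/\*)     # backslash keeps the * literal
quoted=$(count_args "$demo_dir/*")       # quoting has the same effect

echo "unescaped=$unescaped escaped=$escaped quoted=$quoted"
rm -r "$demo_dir"
```

Only the escaped and quoted forms hand the pattern over as a single argument, leaving the matching to the tool that receives it.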
1686 For <command>opreport</command>, profile specifications can be used to
1687 define two profiles, giving differential output. This is done by
1688 enclosing each of the two specifications within curly braces, as shown
1689 in the examples below. Any specifications outside of curly braces are
1693 <sect2 id="profile-spec-examples">
1694 <title>Examples</title>
1697 Image summaries for all profiles with <constant>DATA_MEM_REFS</constant>
1698 samples in the saved session called "stresstest" :
1701 # opreport session:stresstest event:DATA_MEM_REFS
1705 Symbol summary for the application called "test_sym53c8xx,9xx". Note the
1706 escaping is necessary as <option>image:</option> takes a comma-separated list.
1709 # opreport -l ./test/test_sym53c8xx\,9xx
1713 Image summaries for all binaries in the <filename>test</filename> directory,
1714 excepting <filename>boring-test</filename> :
1717 # opreport image:./test/\* image-exclude:./test/boring-test
1721 Differential profile of a binary stored in two archives :
1724 # opreport -l /bin/bash { archive:./orig } { archive:./new }
1728 Differential profile of an archived binary with the current session :
1731 # opreport -l /bin/bash { archive:./orig } { }
1734 </sect2> <!-- profile spec examples -->
1736 <sect2 id="profile-spec-details">
1737 <title>Profile specification parameters</title>
1741 <term><option>archive:</option><emphasis>archivepath</emphasis></term>
1743 A path to an archive made with <command>oparchive</command>.
1744 Absence of this tag, unlike others, means "the current system",
1745 equivalent to specifying "archive:".
1749 <term><option>session:</option><emphasis>sessionlist</emphasis></term>
1751 A comma-separated list of session names to resolve in. Absence of this
1752 tag, unlike others, means "the current session", equivalent to
1753 specifying "session:current".
1757 <term><option>session-exclude:</option><emphasis>sessionlist</emphasis></term>
1759 A comma-separated list of sessions to exclude.
1763 <term><option>image:</option><emphasis>imagelist</emphasis></term>
A comma-separated list of image names to resolve. Each entry may be a relative
path, <command>glob</command>-style name, or full path, e.g.</para>
1767 <screen>opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</screen>
1772 <term><option>image-exclude:</option><emphasis>imagelist</emphasis></term>
1774 Same as <option>image:</option>, but the matching images are excluded.
1779 <term><option>lib-image:</option><emphasis>imagelist</emphasis></term>
1781 Same as <option>image:</option>, but only for images that are for
1782 a particular primary binary image (namely, an application). This only
1783 makes sense to use if you're using <option>--separate</option>.
1784 This includes kernel modules and the kernel when using
1785 <option>--separate=kernel</option>.
1790 <term><option>lib-image-exclude:</option><emphasis>imagelist</emphasis></term>
1792 Same as <option>lib-image:</option>, but the matching images
1798 <term><option>event:</option><emphasis>eventlist</emphasis></term>
1800 The symbolic event name to match on, e.g. <option>event:DATA_MEM_REFS</option>.
1801 You can pass a list of events for side-by-side comparison with <command>opreport</command>.
1802 When using the timer interrupt, the event is always "TIMER".
1807 <term><option>count:</option><emphasis>eventcountlist</emphasis></term>
1809 The event count to match on, e.g. <option>event:DATA_MEM_REFS count:30000</option>.
1810 Note that this value refers to the count value in the event spec you passed
1811 to <command>opcontrol</command> or <command>operf</command> when setting up to do a
1812 profile run. It has nothing to do with the sample counts in the profile data
You can pass a list of counts for side-by-side comparison with <command>opreport</command>.
1815 When using the timer interrupt, the count is always 0 (indicating it cannot be set).
1820 <term><option>unit-mask:</option><emphasis>masklist</emphasis></term>
1822 The unit mask value of the event to match on, e.g. <option>unit-mask:1</option>.
You can pass a list of unit masks for side-by-side comparison with <command>opreport</command>.
1828 <term><option>cpu:</option><emphasis>cpulist</emphasis></term>
1830 Only consider profiles for the given numbered CPU (starting from zero).
1831 This is only useful when using CPU profile separation.
1836 <term><option>tgid:</option><emphasis>pidlist</emphasis></term>
1838 Only consider profiles for the given task groups. Unless some program
1839 is using threads, the task group ID of a process is the same
1840 as its process ID. This option corresponds to the POSIX
1841 notion of a thread group.
1842 This is only useful when using per-process profile separation.
1847 <term><option>tid:</option><emphasis>tidlist</emphasis></term>
1849 Only consider profiles for the given threads. When using
1850 recent thread libraries, all threads in a process share the
1851 same task group ID, but have different thread IDs. You can
1852 use this option in combination with <option>tgid:</option> to
1853 restrict the results to particular threads within a process.
1854 This is only useful when using per-process profile separation.
1861 <sect2 id="locating-and-managing-binary-images">
1862 <title>Locating and managing binary images</title>
1864 Each session's sample files can be found in the $SESSION_DIR/samples/ directory (default when
1865 using legacy mode: <filename>/var/lib/oprofile/samples/</filename>; default when using
1866 <command>operf</command>: <filename><cur_dir>/oprofile_data/samples/</filename>).
1867 These are used, along with the binary image files, to produce human-readable data.
1868 In some circumstances (e.g., kernel modules), OProfile
1869 will not be able to find the binary images. All the tools have an <option>--image-path</option>
1870 option to which you can pass a comma-separated list of alternate paths to search. For example,
1871 I can let OProfile find my 2.6 modules by using <command>--image-path /lib/modules/2.6.0/kernel/</command>.
1872 It is your responsibility to ensure that the correct images are found when using this
1876 Note that if a binary image changes after the sample file was created, you won't be able to get useful
1877 symbol-based data out. This situation is detected for you. If you replace a binary, you should
1878 make sure to save the old binary if you need to do comparative profiles.
1883 <sect2 id="no-results">
1884 <title>What to do when you don't get any results</title>
1886 When attempting to get output, you may see the error :
1889 error: no sample files found: profile specification too strict ?
1892 What this is saying is that the profile specification you passed in,
1893 when matched against the available sample files, resulted in no matches.
1894 There are a number of reasons this might happen:
1897 <varlistentry><term>spelling</term><listitem><para>
You specified a binary name, but misspelled it. Check your spelling!
1899 </para></listitem></varlistentry>
1900 <varlistentry><term>profiler wasn't running</term><listitem><para>
1901 Make very sure that OProfile was actually up and running when you ran
1902 the application you wish to profile.
1903 </para></listitem></varlistentry>
1904 <varlistentry><term>application didn't run long enough</term><listitem><para>
1905 Remember OProfile is a statistical profiler - you're not guaranteed to
1906 get samples for short-running programs. You can help this by using a
1907 lower count for the performance counter, so there are a lot more samples
1909 </para></listitem></varlistentry>
1910 <varlistentry><term>application spent most of its time in libraries</term><listitem><para>
1911 Similarly, if the application spends little time in the main binary image
1912 itself, with most of it spent in shared libraries it uses, you might
1913 not see any samples for the binary image (i.e., executable) itself. If you're
1914 using OProfile legacy mode profiling, then we recommend using
1915 <command>opcontrol --separate=lib</command> before the
1916 profiling session so that <command>opreport</command> and friends show
1917 the library profiles on a per-application basis. This is done automatically
1918 when profiling with <command>operf</command>, so no special setup is necessary.
1919 </para></listitem></varlistentry>
1920 <varlistentry><term>specification was really too strict</term><listitem><para>
1921 For example, you specified something like <option>tgid:3433</option>,
1922 but no task with that group ID ever ran the code.
1923 </para></listitem></varlistentry>
1924 <varlistentry><term>application didn't generate any events</term><listitem><para>
1925 If you're using a particular event counter, for example counting MMX
1926 operations, the code might simply have not generated any events in the
1927 first place. Verify the code you're profiling does what you expect it
1929 </para></listitem></varlistentry>
1930 <varlistentry><term>you didn't specify kernel module name correctly</term><listitem><para>
1931 If you're trying to get reports for a kernel
1932 module, make sure to use the <option>-p</option> option, and specify the
1933 module name <emphasis>with</emphasis> the <filename>.ko</filename>
1934 extension. Check if the module is one loaded from initrd.
1935 </para></listitem></varlistentry>
1940 </sect1> <!-- profile-spec -->
1942 <sect1 id="opreport">
1943 <title>Image summaries and symbol summaries (<command>opreport</command>)</title>
1945 The <command>opreport</command> utility is the primary utility you will use for
1946 getting formatted data out of OProfile. It produces two types of data: image summaries
1947 and symbol summaries. An image summary lists the number of samples for individual
1948 binary images such as libraries or applications. Symbol summaries provide per-symbol
1949 profile data. In the following example, we're getting an image summary for the whole
1953 $ opreport --long-filenames
1954 CPU: PIII, speed 863.195 MHz (estimated)
1955 Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
1956 905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus
1957 214320 14.1338 /boot/2.6.0/vmlinux
1958 103450 6.8222 /lib/i686/libc-2.3.2.so
1959 60160 3.9674 /usr/local/bin/madplay
1960 31769 2.0951 /usr/local/oprofile-pp/bin/oprofiled
1961 26550 1.7509 /usr/lib/libartsflow.so.1.0.0
1962 23906 1.5765 /usr/bin/as
1963 18770 1.2378 /oprofile
1964 15528 1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
1965 11979 0.7900 /usr/X11R6/bin/XFree86
1966 11328 0.7471 /bin/bash
1970 If we had specified <option>--symbols</option> in the previous command, we would have
1971 gotten a symbol summary of all the images across the entire system. We can restrict this to only
1972 part of the system profile; for example,
1973 below is a symbol summary of the OProfile daemon. Note that as we used
1974 <command>opcontrol --separate=lib,kernel</command>, symbols from images that <command>oprofiled</command>
1975 has used are also shown.
1978 $ opreport -l -p /lib/modules/`uname -r` `which oprofiled` 2>/dev/null | more
1979 CPU: Core 2, speed 2.534e+06 MHz (estimated)
1980 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
1981 samples % image name symbol name
1982 1353 24.9447 vmlinux sidtab_context_to_sid
1983 500 9.2183 vmlinux avtab_hash_eval
1984 154 2.8392 vmlinux __link_path_walk
1985 152 2.8024 vmlinux d_prune_aliases
1986 120 2.2124 vmlinux avtab_search_node
1987 104 1.9174 vmlinux find_next_bit
1988 85 1.5671 vmlinux selinux_file_fcntl
1989 82 1.5118 vmlinux avtab_write
1990 81 1.4934 oprofiled odb_update_node_with_offset
1991 73 1.3459 oprofiled opd_process_samples
1992 72 1.3274 vmlinux avc_has_perm_noaudit
1993 61 1.1246 libc-2.12.so _IO_vfscanf
1994 59 1.0878 ext4.ko ext4_mark_iloc_dirty
These are the two basic forms of output you are most likely to use regularly, but <command>opreport</command>
can do a lot more than that, as described below.
2003 <sect2 id="opreport-merging">
2004 <title>Merging separate profiles</title>
2006 If you have used one of the <option>--separate[*]</option> options
2007 whilst profiling, there can be several separate profiles for
2008 a single binary image within a session. Normally the output
2009 will keep these images separated. So, for example, if you profiled
2010 with separation on a per-cpu basis (<code>opcontrol --separate=cpu</code> or
2011 <code>operf --separate-cpu</code>), you would see separate columns in
2012 the output of <command>opreport</command> for each CPU where samples
2013 were recorded. But it can be useful to merge these results back together
2014 to make the report more readable. The <option>--merge</option> option allows
2018 <sect2 id="opreport-comparison">
2019 <title>Side-by-side multiple results</title>
2020 If you have used multiple events when profiling, by default you get
2021 side-by-side results of each event's sample values from <command>opreport</command>.
2022 You can restrict which events to list by appropriate use of the
2023 <option>event:</option> profile specifications, etc.
2026 <sect2 id="opreport-callgraph">
2027 <title>Callgraph output</title>
2029 This section provides details on how to use the OProfile callgraph feature.
2032 <title>Callgraph details</title>
2034 When using the <option>--callgraph</option> option, you can see what
2035 functions are calling other functions in the output. Consider the
2039 #include <string.h>
2040 #include <stdlib.h>
2041 #include <stdio.h>
2045 static int compare(const void *s1, const void *s2)
2047 return strcmp(s1, s2);
2050 static void repeat(void)
2053 char *strings[SIZE];
2054 char str[] = "abcdefghijklmnopqrstuvwxyz";
2056 for (i = 0; i < SIZE; ++i) {
2057 strings[i] = strdup(str);
2061 qsort(strings, SIZE, sizeof(char *), compare);
2071 When running with the call-graph option, OProfile will
2072 record the function stack every time it takes a sample.
2073 <command>opreport --callgraph</command> outputs an entry for each
2074 function, where each entry looks similar to:
2077 samples % image name symbol name
2079 127036 99.8452 cg repeat
2080 84590 42.5084 libc-2.3.2.so strfry
2081 84590 66.4838 libc-2.3.2.so strfry [self]
2082 39169 30.7850 libc-2.3.2.so random_r
2083 3475 2.7312 libc-2.3.2.so __i686.get_pc_thunk.bx
2084 -------------------------------------------------------------------------------
2087 Here the non-indented line is the function we're focussing upon
2088 (<function>strfry()</function>). This
2089 line is the same as you'd get from a normal <command>opreport</command>
2093 Above the non-indented line we find the functions that called this
2094 function (for example, <function>repeat()</function> calls
2095 <function>strfry()</function>). The samples and percentage values here
2096 refer to the number of times we took a sample where this call was found
2097 in the stack; the percentage is relative to all other callers of the
2098 function we're focussing on. Note that these values are
2099 <emphasis>not</emphasis> call counts; they only reflect the call stack
2100 every time a sample is taken; that is, if a call is found in the stack
2101 at the time of a sample, it is recorded in this count.
2104 Below the line are functions that are called by
2105 <function>strfry()</function> (called <emphasis>callees</emphasis>).
2106 It's clear here that <function>strfry()</function> calls
2107 <function>random_r()</function>. We also see a special entry with a
2108 "[self]" marker. This records the normal samples for the function, but
2109 the percentage becomes relative to all callees. This allows you to
2110 compare time spent in the function itself compared to functions it
2111 calls. Note that if a function calls itself, then it will appear in the
2112 list of callees of itself, but without the "[self]" marker; so recursive
2113 calls are still clearly separable.
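The callee percentages in the example entry can be reproduced by hand, which may help when reading such output. The sketch below assumes that the three lines shown below <function>strfry()</function> are its complete callee list; the helper name <function>callee_percent</function> is made up for illustration.

```shell
# Hypothetical helper: percentage of one callee line relative to all callee
# lines of an entry. $1 = samples on the line of interest, remaining
# arguments = samples on every callee line, including the [self] line.
callee_percent() {
    line=$1
    shift
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    awk -v s="$line" -v t="$total" 'BEGIN { printf "%.4f\n", 100 * s / t }'
}

self_pct=$(callee_percent 84590 84590 39169 3475)     # strfry [self]
random_pct=$(callee_percent 39169 84590 39169 3475)   # random_r
echo "strfry [self]: $self_pct"
echo "random_r:      $random_pct"
```

The results, 66.4838 and 30.7850, match the percentages printed in the entry.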
2116 You may have noticed that the output lists <function>main()</function>
2117 as calling <function>strfry()</function>, but it's clear from the source
2118 that this doesn't actually happen. See <xref
2119 linkend="interpreting-callgraph" /> for an explanation.
2122 <sect3 id="cg-with-jitsupport">
2123 <title>Callgraph and JIT support</title>
2125 Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading.
2126 For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory
2127 named <filename>{anon:anon}/<tgid>.<begin_addr>.<end_addr></filename>.
2128 As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java,
2129 OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code.
2130 However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to <filename>anon (tgid:<tgid> range:<begin_addr>-<end_addr>)</filename>,
2132 even if a <filename>.jo</filename> ELF file had been created for it. See the example below.
2135 -------------------------------------------------------------------------------
2136 1 2.2727 libj9ute23.so java.bin traceV
2137 2 4.5455 libj9ute23.so java.bin utsTraceV
2138 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces
2139 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter
2140 8 0.0154 libj9prt23.so java.bin j9time_hires_clock
2141 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols)
2142 9 20.4545 libc-2.4.so java.bin gettimeofday
2143 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self]
2144 -------------------------------------------------------------------------------
2147 The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of
2148 <code>j9time_hires_clock</code>, even though the ELF file <filename>10014.jo</filename> was
2149 created for this profile run. Unfortunately, there is currently no way to correlate
2150 that anonymous callgraph entry with its corresponding <filename>.jo</filename> file.
2155 </sect2> <!-- opreport-callgraph -->
2157 <sect2 id="opreport-diff">
2158 <title>Differential profiles with <command>opreport</command></title>
2161 Often, we'd like to be able to compare two profiles. For example, when
2162 analysing the performance of an application, we'd like to make code
2163 changes and examine the effect of the change. This is supported in
2164 <command>opreport</command> by giving a profile specification that
identifies two different profiles. The general form is:
2168 $ opreport <shared-spec> { <first-profile> } { <second-profile> }
2171 We lost our Dragon book down the back of the sofa, so you have to be
2172 careful to have spaces around those braces, or things will get
2173 hopelessly confused. We can only apologise.
2176 For each of the profiles, the shared section is prefixed, and then the
2177 specification is analysed. The usual parameters work both within the
2178 shared section, and in the sub-specification within the curly braces.
2181 A typical way to use this feature is with archives created with
2182 <command>oparchive</command>. Let's look at an example:
2186 $ oparchive -o orig ./a
2188 # edit and recompile a
2190 # now compare the current profile of a with the archived profile
2191 $ opreport -xl ./a { archive:./orig } { }
2192 CPU: PIII, speed 863.233 MHz (estimated)
2193 Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
2194 unit mask of 0x00 (No unit mask) count 100000
2195 samples % diff % symbol name
2196 92435 48.5366 +0.4999 a
2199 48787 25.6175 -2.2e-01 b
2202 Note that we specified an empty second profile in the curly braces, as
2203 we wanted to use the current session; alternatively, we could
2204 have specified another archive, or a tgid etc. We specified the binary
2205 <command>a</command> in the shared section, so we matched that in both
2206 the profiles we're diffing.
2209 As in the normal output, the results are sorted by the number of
2210 samples, and the percentage field represents the relative percentage of
2211 the symbol's samples in the second profile.
2214 Notice the new column in the output. This value represents the
2215 percentage change of the relative percent between the first and the
2216 second profile: roughly, "how much more important this symbol is".
2217 Looking at the symbol <function>a()</function>, we can see that it took
2218 roughly the same amount of the total profile in both the first and the
2219 second profile. The function <function>c()</function> was not in the new
2220 profile, so has been marked with <function>---</function>. Note that the
2221 sample value is the number of samples in the first profile; since we're
2222 displaying results for the second profile, we don't list a percentage
2223 value for it, as it would be meaningless. <function>d()</function> is
2224 new in the second profile, and consequently marked with
2225 <function>+++</function>.
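One plausible reading of this definition, stated here as an assumption rather than a description of the <command>opreport</command> source, is that the column shows the change in a symbol's percentage relative to its percentage in the first profile. The sketch below uses invented percentages and a made-up helper name.

```shell
# Hypothetical helper: relative change, in percent, between a symbol's
# share of the first profile ($1) and its share of the second profile ($2).
# Assumed formula: 100 * (second - first) / first.
diff_percent() {
    awk -v first="$1" -v second="$2" \
        'BEGIN { printf "%+.4f\n", 100 * (second - first) / first }'
}

grew=$(diff_percent 20.0 25.0)     # symbol became more prominent
shrank=$(diff_percent 40.0 30.0)   # symbol became less prominent
echo "$grew $shrank"
```

Under this reading, a symbol whose share rises from 20% to 25% of the profile is reported as a 25% increase, not as a change of five percentage points.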
2228 When comparing profiles between different binaries, it should be clear
2229 that functions can change in terms of VMA and size. To avoid this
2230 problem, <command>opreport</command> considers a symbol to be the same
2231 if the symbol name, image name, and owning application name all match;
2232 any other factors are ignored. Note that the check for application name
2233 means that trying to compare library profiles between two different
2234 applications will not work as you might expect: each symbol will be
2235 considered different.
2238 </sect2> <!-- opreport-diff -->
2240 <sect2 id="opreport-anon">
2241 <title>Anonymous executable mappings</title>
2243 Many applications, typically ones involving dynamic compilation into
2244 machine code (just-in-time, or "JIT", compilation), have executable mappings that
2245 are not backed by an ELF file. <command>opreport</command> has basic support for showing the
2246 samples taken in these regions; for example:
2248 $ opreport /usr/bin/mono -l
2249 CPU: ppc64 POWER5, speed 1654.34 MHz (estimated)
2250 Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000
2251 samples % image name symbol name
2252 47 58.7500 mono (no symbols)
2253 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols)
2254 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols)
2259 Note that, since such mappings are dependent upon individual invocations of
2260 a binary, these mappings are always listed as a dependent image,
2261 even when using the legacy mode <option>opcontrol --separate=none</option> command.
2262 Equally, the results are not affected by the <option>--merge</option>
2266 As shown in the opreport output above, OProfile is unable to attribute the samples to any
2267 symbol(s) because there is no ELF file for this code.
2268 Enhanced support for JITed code is now available for some virtual machines;
2269 e.g., the Java Virtual Machine. For details about OProfile output for
2270 JITed code, see <xref linkend="getting-jit-reports" />.
2272 <para>For more information about JIT support in OProfile, see <xref linkend="jitsupport"/>.
2274 </sect2> <!-- opreport-anon -->
2276 <sect2 id="opreport-xml">
2277 <title>XML formatted output</title>
2279 The --xml option can be used to generate XML instead of the usual
2280 text format. This allows opreport to eliminate some of the constraints
2281 dictated by the two dimensional text format. For example, it is possible
to separate the sample data across multiple events, CPUs and threads. The XML
2283 schema implemented by opreport is found in doc/opreport.xsd. It contains
2284 more detailed comments about the structure of the XML generated by opreport.
2287 Since XML is consumed by a client program rather than a user, its structure
2288 is fairly static. In particular, the --sort option is incompatible with the
--xml option. Percentages are not displayed in the XML so the options related
2290 to percentages will have no effect. Full pathnames are always displayed in
2291 the XML so --long-filenames is not necessary. The --details option will cause
2292 all of the individual sample data to be included in the XML as well as the
2293 instruction byte stream for each symbol (for doing disassembly) and can result
2294 in very large XML files.
2296 </sect2> <!-- opreport-xml -->
2298 <sect2 id="opreport-options">
2299 <title>Options for <command>opreport</command></title>
2302 <varlistentry><term><option>--accumulated / -a</option></term><listitem><para>
2303 Accumulate sample and percentage counts in the symbol list.
2304 </para></listitem></varlistentry>
2305 <varlistentry><term><option>--callgraph / -c</option></term><listitem><para>
2306 Show callgraph information.
2307 </para></listitem></varlistentry>
2308 <varlistentry><term><option>--debug-info / -g</option></term><listitem><para>
2309 Show source file and line for each symbol.
2310 </para></listitem></varlistentry>
2311 <varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para>
2312 none: no demangling. normal: use default demangler (default). smart: use
2313 pattern-matching to make C++ symbol demangling more readable.
2314 </para></listitem></varlistentry>
2315 <varlistentry><term><option>--details / -d</option></term><listitem><para>
2316 Show per-instruction details for all selected symbols. Note that, for
2317 binaries without symbol information, the VMA values shown are raw file
2318 offsets for the image binary.
2319 </para></listitem></varlistentry>
2320 <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
2321 Do not include application-specific images for libraries, kernel modules
2322 and the kernel. This option only makes sense if the profile session
2324 </para></listitem></varlistentry>
2325 <varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para>
2326 Exclude all the symbols in the given comma-separated list.
2327 </para></listitem></varlistentry>
2328 <varlistentry><term><option>--global-percent / -%</option></term><listitem><para>
2329 Make all percentages relative to the whole profile.
2330 </para></listitem></varlistentry>
2331 <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
2333 </para></listitem></varlistentry>
2334 <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
2335 Comma-separated list of additional paths to search for binaries.
2336 This is needed to find kernel modules.
2337 </para></listitem></varlistentry>
2338 <varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
2339 A path to a filesystem to search for additional binaries.
2340 </para></listitem></varlistentry>
2341 <varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para>
2342 Only include symbols in the given comma-separated list.
2343 </para></listitem></varlistentry>
2344 <varlistentry><term><option>--long-filenames / -f</option></term><listitem><para>
2345 Output full paths instead of basenames.
2346 </para></listitem></varlistentry>
2347 <varlistentry><term><option>--merge / -m [lib,cpu,tid,tgid,unitmask,all]</option></term><listitem><para>
2348 Merge any profiles separated in a --separate session.
2349 </para></listitem></varlistentry>
2350 <varlistentry><term><option>--no-header</option></term><listitem><para>
2351 Don't output a header detailing profiling parameters.
2352 </para></listitem></varlistentry>
2353 <varlistentry><term><option>--output-file / -o [file]</option></term><listitem><para>
2354 Output to the given file instead of stdout.
2355 </para></listitem></varlistentry>
2356 <varlistentry><term><option>--reverse-sort / -r</option></term><listitem><para>
2357 Reverse the sort from the default.
2358 </para></listitem></varlistentry>
2359 <varlistentry><term><option>--session-dir=</option>dir_path</term><listitem><para>
2360 Use sample database out of directory <filename>dir_path</filename>
2361 instead of the default location (/var/lib/oprofile).
2362 </para></listitem></varlistentry>
2363 <varlistentry><term><option>--show-address / -w</option></term><listitem><para>
2364 Show the VMA address of each symbol (off by default).
2365 </para></listitem></varlistentry>
2366 <varlistentry><term><option>--sort / -s [vma,sample,symbol,debug,image]</option></term><listitem><para>
2367 Sort the list of symbols by, respectively, symbol address,
2368 number of samples, symbol name, debug filename and line number,
2369 binary image filename.
2370 </para></listitem></varlistentry>
2371 <varlistentry><term><option>--symbols / -l</option></term><listitem><para>
2372 List per-symbol information instead of a binary image summary.
2373 </para></listitem></varlistentry>
2374 <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
2375 Only output data for symbols that have more than the given percentage
2377 </para></listitem></varlistentry>
2378 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
2379 Give verbose debugging output.
2380 </para></listitem></varlistentry>
2381 <varlistentry><term><option>--version / -v</option></term><listitem><para>
2383 </para></listitem></varlistentry>
2384 <varlistentry><term><option>--xml / -X</option></term><listitem><para>
2385 Generate XML output.
2386 </para></listitem></varlistentry>
2391 </sect1> <!-- opreport -->
2393 <sect1 id="opannotate">
2394 <title>Outputting annotated source (<command>opannotate</command>)</title>
2396 The <command>opannotate</command> utility generates annotated source files or assembly listings, optionally
2398 If you want to see the source file, the profiled application needs to have debug information, and the source
2399 must be available through this debug information. For GCC, you must use the <option>-g</option> option
2400 when you are compiling.
2401 If the binary doesn't contain sufficient debug information, you can still
2402 use <command>opannotate <option>--assembly</option></command> to get annotated assembly
2403 as long as the binary has (at least) symbol information.
2406 Note that for the reason explained in <xref linkend="hardware-counters" /> the results can be
2407 inaccurate. The debug information itself can add other problems; for example, the line number for a symbol can be
2408 incorrect. Assembly instructions can be re-ordered and moved by the compiler, and this can lead to
2409 crediting a source line with samples not really "owned" by that line. Also see
2410 <xref linkend="interpreting" />.
2413 You can output the annotation to a single file, containing all the source found, using the
2414 <option>--source</option> option. You can use this in conjunction with <option>--assembly</option>
2415 to get combined source/assembly output.
2418 You can also output a directory of annotated source files that maintains the structure of
2419 the original sources. Each line in the annotated source is prepended with the samples
2420 for that line. Additionally, each symbol is annotated giving details for the symbol
2421 as a whole. An example:
2424 $ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled
2425 $ ls annotated/home/moz/src/oprofile-pp/daemon/
2426 opd_cookie.h opd_image.c opd_kernel.c opd_sample_files.c oprofiled.c
2429 Line numbers are maintained in the source files, but each file has
2430 a footer appended describing the profiling details. The actual annotation
2431 looks something like this:
2435 :static uint64_t pop_buffer_value(struct transient * trans)
2436 11510 1.9661 :{ /* pop_buffer_value total: 89901 15.3566 */
2439 10227 1.7469 : if (!trans->remaining) {
2440 : fprintf(stderr, "BUG: popping empty buffer !\n");
2441 : exit(EXIT_FAILURE);
2444 : val = get_buffer_value(trans->buffer, 0);
2445 2281 0.3896 : trans->remaining--;
2446 2296 0.3922 : trans->buffer += kernel_pointer_size;
2453 The first number on each line is the number of samples, whilst the second is
2454 the relative percentage of total samples.
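<para>
When annotated files are post-processed by a script, the two leading fields can be split off
from the source text at the first colon. The following Python sketch assumes lines in the
format shown above. <function>parse_annotated_line</function> is a hypothetical helper, not
part of OProfile, and a line with nothing before the colon is treated as having received
no samples.
</para>

```python
def parse_annotated_line(line):
    """Split one annotated line into (samples, percent, source_text)."""
    prefix, colon, source = line.partition(":")
    if not colon:
        raise ValueError(f"not an annotated source line: {line!r}")
    fields = prefix.split()
    if len(fields) == 2:
        # Some listings print the percentage with a trailing '%' sign.
        return int(fields[0]), float(fields[1].rstrip("%")), source
    if fields:
        raise ValueError(f"unexpected fields before the colon: {line!r}")
    # Nothing before the colon: no samples were credited to this line.
    return 0, 0.0, source


print(parse_annotated_line("10227 1.7469 : if (!trans->remaining) {"))
print(parse_annotated_line(" : val = get_buffer_value(trans->buffer, 0);"))
```

<para>
The footer appended to each annotated file does not follow this layout, so a real script
would have to skip it before calling the helper.
</para>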
2457 <sect2 id="opannotate-finding-source">
2458 <title>Locating source files</title>
2460 Of course, <command>opannotate</command> needs to be able to locate the source files
2461 for the binary image(s) in order to produce output. Some binary images have debug
2462 information where the given source file paths are relative, not absolute. You can
2463 specify search paths to look for these files (similar to <command>gdb</command>'s
2464 <option>dir</option> command) with the <option>--search-dirs</option> option.
2467 Sometimes you may have a binary image which gives absolute paths for the source files,
2468 but you have the actual sources elsewhere (commonly, you've installed an SRPM for
2469 a binary on your system and you want annotation from an existing profile). You can
2470 use the <option>--base-dirs</option> option to redirect OProfile to look somewhere
2471 else for source files. For example, imagine you have a binary generated from a source
2472 file that is given in the debug information as <filename>/tmp/build/libfoo/foo.c</filename>,
2473 and you have the source tree matching that binary installed in <filename>/home/user/libfoo/</filename>.
2474 You can redirect OProfile to find <filename>foo.c</filename> correctly like this:
2477 $ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so
2480 You can specify multiple (comma-separated) paths to both options.
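<para>
The way the two options combine can be pictured with a short Python sketch. It is a
simplified model of the behaviour described above, not the lookup code used by
<command>opannotate</command>: the function name <function>candidate_source_paths</function>
is hypothetical, and returning an unmatched absolute path unchanged is an assumption.
</para>

```python
import posixpath


def candidate_source_paths(debug_path, base_dirs, search_dirs):
    """Return the paths that would be tried for one debug-info source path."""
    relative = debug_path
    if posixpath.isabs(debug_path):
        for base in base_dirs:
            prefix = base.rstrip("/") + "/"
            if debug_path.startswith(prefix):
                # Strip the base-dir prefix, leaving a relative path.
                relative = debug_path[len(prefix):]
                break
        else:
            # No prefix matched (assumption): use the absolute path as is.
            return [debug_path]
    # Relative paths are looked up under each search directory in turn.
    return [posixpath.join(directory, relative) for directory in search_dirs]


print(candidate_source_paths("/tmp/build/libfoo/foo.c",
                             ["/tmp/build/libfoo/"],
                             ["/home/user/libfoo/"]))
# prints ['/home/user/libfoo/foo.c']
```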
2484 <sect2 id="opannotate-details">
2485 <title>Usage of <command>opannotate</command></title>
2488 <varlistentry><term><option>--assembly / -a</option></term><listitem><para>
2489 Output annotated assembly. If this is combined with --source, then mixed
2490 source / assembly annotations are output.
2491 </para></listitem></varlistentry>
2492 <varlistentry><term><option>--base-dirs / -b [paths]</option></term><listitem><para>
2493 Comma-separated list of path prefixes. This can be used to point OProfile to a
2494 different location for source files when the debug information specifies an
2495 absolute path on your system for the source that does not exist. The prefix
2496 is stripped from the debug source file paths, then searched in the search dirs
2497 specified by <option>--search-dirs</option>.
2498 </para></listitem></varlistentry>
2499 <varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para>
2500 none: no demangling. normal: use default demangler (default). smart: use
2501 pattern-matching to make C++ symbol demangling more readable.
2502 </para></listitem></varlistentry>
2503 <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
2504 Do not include application-specific images for libraries, kernel modules
2505 and the kernel. This option only makes sense if the profile session
2507 </para></listitem></varlistentry>
2508 <varlistentry><term><option>--exclude-file [files]</option></term><listitem><para>
2509 Exclude all files in the given comma-separated list of glob patterns.
2510 </para></listitem></varlistentry>
2511 <varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para>
2512 Exclude all the symbols in the given comma-separated list.
2513 </para></listitem></varlistentry>
2514 <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
2516 </para></listitem></varlistentry>
2517 <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
2518 Comma-separated list of additional paths to search for binaries.
2519 This is needed to find kernel modules.
2520 </para></listitem></varlistentry>
2521 <varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
2522 A path to a filesystem to search for additional binaries.
2523 </para></listitem></varlistentry>
2524 <varlistentry><term><option>--include-file [files]</option></term><listitem><para>
2525 Only include files in the given comma-separated list of glob patterns.
2526 </para></listitem></varlistentry>
2527 <varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para>
2528 Only include symbols in the given comma-separated list.
2529 </para></listitem></varlistentry>
2530 <varlistentry><term><option>--objdump-params [params]</option></term><listitem><para>
2531 Pass the given parameters as extra values when calling objdump.
2532 If more than one option is to be passed to objdump, the parameters must be enclosed in a
2536 An example of where this option is useful is when your toolchain does not
2537 automatically recognize instructions that are specific to your processor.
2538 For example, on IBM POWER7/RHEL 6, objdump must be told that a binary file may have
2539 POWER7-specific instructions. The <command>opannotate</command> option to show the POWER7-specific
2542 --objdump-params=-Mpower7
2546 The <command>opannotate</command> option to show the POWER7-specific instructions,
2547 the source code (--source) and the line numbers (-l) would be:
2549 --objdump-params="-Mpower7 -l --source"
2551 </para></listitem></varlistentry>
2552 <varlistentry><term><option>--output-dir / -o [dir]</option></term><listitem><para>
2553 Output directory. This makes opannotate output one annotated file for each
2554 source file. This option can't be used in conjunction with --assembly.
2555 </para></listitem></varlistentry>
2556 <varlistentry><term><option>--search-dirs / -d [paths]</option></term><listitem><para>
2557 Comma-separated list of paths to search for source files. This is useful to find
2558 source files when the debug information only contains relative paths.
2559 </para></listitem></varlistentry>
2560 <varlistentry><term><option>--source / -s</option></term><listitem><para>
2561 Output annotated source. This requires debugging information to be available
2563 </para></listitem></varlistentry>
2564 <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
2565 Only output data for symbols that have more than the given percentage
2567 </para></listitem></varlistentry>
2568 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
2569 Give verbose debugging output.
2570 </para></listitem></varlistentry>
2571 <varlistentry><term><option>--version / -v</option></term><listitem><para>
2573 </para></listitem></varlistentry>
2577 </sect2> <!-- opannotate-details -->
2579 </sect1> <!-- opannotate -->
2581 <sect1 id="getting-jit-reports">
2582 <title>OProfile results with JIT samples</title>
2584 After profiling a Java (or other supported VM) application, the
2585 OProfile JIT support creates ELF binaries from the
2586 intermediate files that were written by the agent library.
2587 The ELF binaries are named <filename><tgid>.jo</filename>.
2588 With the symbol information stored in these ELF files, it is
2589 possible to map samples to the appropriate symbols.
2592 The usual analysis tools (<command>opreport</command> and/or
2593 <command>opannotate</command>) can now be used
2594 to get symbols and assembly code for the instrumented VM processes.
2597 Below is an example of a profile report of a Java application that has been
2598 instrumented with the provided agent library.
2600 $ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java
2601 CPU: Core Solo / Duo, speed 2167 MHz (estimated)
2602 Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
2603 samples % image name symbol name
2604 186020 50.0523 no-vmlinux no-vmlinux (no symbols)
2605 34333 9.2380 7635.jo java void test.f1()
2606 19022 5.1182 libc-2.5.so libc-2.5.so _IO_file_xsputn@@GLIBC_2.1
2607 18762 5.0483 libc-2.5.so libc-2.5.so vfprintf
2608 16408 4.4149 7635.jo java void test$HelloThread.run()
2609 16250 4.3724 7635.jo java void test$test_1.f2(int)
2610 15303 4.1176 7635.jo java void test.f2(int, int)
2611 13252 3.5657 7635.jo java void test.f2(int)
2612 5165 1.3897 7635.jo java void test.f4()
2613 955 0.2570 7635.jo java void test$HelloThread.run()~
2618 Depending on the JVM that is used, certain options of opreport and opannotate
2619 do NOT work since they rely on debug information (e.g. source code line number)
2620 that is not always available. The Sun JVM does provide the necessary debug
2621 information via the JVMTI[PI] interface,
2622 but other JVMs do not.
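<para>
In the report shown earlier in this section, all JIT-compiled methods are listed under the
<filename>7635.jo</filename> image. A per-image total can be computed from the text output
with a few lines of Python. This is only a sketch: it assumes that the third field of each
data row is the image name, as the rows suggest, and <function>samples_per_image</function>
is a hypothetical helper rather than an OProfile tool.
</para>

```python
from collections import Counter

REPORT = """\
samples % image name symbol name
186020 50.0523 no-vmlinux no-vmlinux (no symbols)
34333 9.2380 7635.jo java void test.f1()
19022 5.1182 libc-2.5.so libc-2.5.so _IO_file_xsputn@@GLIBC_2.1
18762 5.0483 libc-2.5.so libc-2.5.so vfprintf
16408 4.4149 7635.jo java void test$HelloThread.run()
16250 4.3724 7635.jo java void test$test_1.f2(int)
15303 4.1176 7635.jo java void test.f2(int, int)
13252 3.5657 7635.jo java void test.f2(int)
5165 1.3897 7635.jo java void test.f4()
955 0.2570 7635.jo java void test$HelloThread.run()~
"""


def samples_per_image(report):
    """Sum the sample counts of all rows that share an image name."""
    totals = Counter()
    for row in report.splitlines():
        fields = row.split()
        # Header lines do not start with a sample count and are skipped.
        if len(fields) >= 3 and fields[0].isdigit():
            totals[fields[2]] += int(fields[0])
    return totals


totals = samples_per_image(REPORT)
for image, samples in totals.most_common():
    print(samples, image)
```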
2625 As you can see in the opreport output, the JIT support agent for Java
2626 generates symbols to include the class and method signature.
2627 A symbol with the suffix ~<n> (e.g.
2628 <code>void test$HelloThread.run()~1</code>) means that this is
2629 the <n>th occurrence of the identical name. This happens if a method is re-JITed.
2630 A symbol with the suffix %<n> means that the address space of this symbol
2631 was reused during the sample session (see <xref linkend="overlapping-symbols" />).
2632 The value <n> is the percentage of time that this symbol/code was present in
2633 relation to the total lifetime of all overlapping other symbols. A symbol of the form
2634 <code><return_val> <class_name>$<method_sig></code> denotes an
2639 <sect1 id="opgprof">
2640 <title><command>gprof</command>-compatible output (<command>opgprof</command>)</title>
2642 If you're familiar with the output produced by <command>GNU gprof</command>,
2643 you may find <command>opgprof</command> useful. It takes a single binary
2644 as an argument, and produces a <filename>gmon.out</filename> file for use
2645 with <command>gprof -p</command>. If call-graph profiling is enabled,
2646 then this is also included.
2649 $ opgprof `which oprofiled` # generates gmon.out file
2650 $ gprof -p `which oprofiled` | head
2653 Each sample counts as 1 samples.
2654 % cumulative self self total
2655 time samples samples calls T1/call T1/call name
2656 33.13 206237.00 206237.00 odb_insert
2657 22.67 347386.00 141149.00 pop_buffer_value
2658 9.56 406881.00 59495.00 opd_put_sample
2659 7.34 452599.00 45718.00 opd_find_image
2660 7.19 497327.00 44728.00 opd_process_samples
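<para>
In the flat profile above, the cumulative column is the running total of the self column.
The following Python sketch recomputes it from the self counts shown in the listing; the
variable names are illustrative only.
</para>

```python
from itertools import accumulate

# Self sample counts copied from the gprof listing above.
SELF_SAMPLES = [
    ("odb_insert", 206237),
    ("pop_buffer_value", 141149),
    ("opd_put_sample", 59495),
    ("opd_find_image", 45718),
    ("opd_process_samples", 44728),
]

# Running total of the self counts, one entry per symbol.
cumulative = list(accumulate(count for _, count in SELF_SAMPLES))

for (name, count), running in zip(SELF_SAMPLES, cumulative):
    print(f"{running:10d} {count:10d} {name}")
```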
2663 <sect2 id="opgprof-details">
2664 <title>Usage of <command>opgprof</command></title>
2667 <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
2669 </para></listitem></varlistentry>
2670 <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
2671 Comma-separated list of additional paths to search for binaries.
2672 This is needed to find kernel modules.
2673 </para></listitem></varlistentry>
2674 <varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
2675 A path to a filesystem to search for additional binaries.
2676 </para></listitem></varlistentry>
2677 <varlistentry><term><option>--output-filename / -o [file]</option></term><listitem><para>
2678 Output to the given file instead of the default, gmon.out.
2679 </para></listitem></varlistentry>
2680 <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
2681 Only output data for symbols that have more than the given percentage
2683 </para></listitem></varlistentry>
2684 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
2685 Give verbose debugging output.
2686 </para></listitem></varlistentry>
2687 <varlistentry><term><option>--version / -v</option></term><listitem><para>
2689 </para></listitem></varlistentry>
2692 </sect2> <!-- opgprof-details -->
2694 </sect1> <!-- opgprof -->
2696 <sect1 id="oparchive">
2697 <title>Archiving measurements (<command>oparchive</command>)</title>
2699 The <command>oparchive</command> utility generates a directory populated
2700 with executable, debug, and oprofile sample files. This directory can be
2701 moved to another machine via <command>tar</command> and analyzed without
2702 further use of the data collection machine.
2706 The following command would collect the sample files, the executables
2707 associated with the sample files, and the debuginfo files associated
2708 with the executables and copy them into
2709 <filename>/tmp/current_data</filename>:
2713 # oparchive -o /tmp/current_data
2716 <sect2 id="oparchive-details">
2717 <title>Usage of <command>oparchive</command></title>
2720 <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
2722 </para></listitem></varlistentry>
2723 <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
2724 Do not include application-specific images for libraries, kernel modules
2725 and the kernel. This option only makes sense if the profile session
2727 </para></listitem></varlistentry>
2728 <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
2729 Comma-separated list of additional paths to search for binaries.
2730 This is needed to find kernel modules.
2731 </para></listitem></varlistentry>
2732 <varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
2733 A path to a filesystem to search for additional binaries.
2734 </para></listitem></varlistentry>
2735 <varlistentry><term><option>--output-directory / -o [directory]</option></term><listitem><para>
2736 Output to the given directory. There is no default. This must be specified.
2737 </para></listitem></varlistentry>
2738 <varlistentry><term><option>--list-files / -l</option></term><listitem><para>
2739 Only list the files that would be archived, don't copy them.
2740 </para></listitem></varlistentry>
2741 <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
2742 Give verbose debugging output.
2743 </para></listitem></varlistentry>
2744 <varlistentry><term><option>--version / -v</option></term><listitem><para>
2746 </para></listitem></varlistentry>
2749 </sect2> <!-- oparchive-details -->
2751 </sect1> <!-- oparchive -->
2753 <sect1 id="opimport">
2754 <title>Converting sample database files (<command>opimport</command>)</title>
2756 This utility converts sample database files from a foreign binary format (abi) to
2757 the native format. This is useful only when moving sample files between systems
2758 for analysis on platforms other than the one used for collection. The <command>
2759 oparchive</command> should be used on the machine where the profile was taken (target)
2760 in order to collect sample files and all other necessary information. The archive
2761 directory that is the output from <command>oparchive</command> should be copied
2762 to the system where you wish to perform your performance analysis (host).
2763 If the architectures of your target and host systems differ, you'll need to
2764 use the <command>opimport</command> command. The abi format of the sample files
2765 to be imported is described in a text file located in <filename>$SESSION_DIR/abi</filename>.
2769 The following command converts an input sample file to the specified
2770 output sample file using the given abi file as a binary description
2771 of the input file and the current platform abi as a binary description
2772 of the output file. (NOTE: The ellipses are used to make the example more
2773 compact and cannot be used in an actual command line.)
2777 # opimport -a /tmp/foreign-abi -o /tmp/imported/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /tmp/archived/var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all
2780 Since opimport converts just one file at a time, an example shell script is provided below
2781 that will perform an import/conversion of all sample files in a samples directory collected
2782 from the target system.
2785 # Usage: my-import.sh <input-abi-pathname>
2787 # NOTE: Start from the "samples" directory containing the "current" directory
2790 mkdir current-imported
2791 cd current-imported; (cd ../current; find . -type d ! -name .) |xargs mkdir
2792 cd ../current; mv stats ../StatsSave; find . -type f | while read -r line; do opimport -a "$1" -o "../current-imported/$line" "$line"; done; mv ../StatsSave stats;
2796 Example usage: Assume that on the target system, a profile was collected using a session-dir of
2797 <filename>/var/lib/oprofile</filename>, and then <command>oparchive -o profile1</command> was run.
2798 Then the <filename>profile1</filename> directory is copied to the host system for analysis. To import
2799 the sample data in <filename>profile1</filename>, you would perform the following steps:
2801 $ cd profile1/var/lib/oprofile/samples
2802 $ my-import.sh `pwd`/../abi
2805 <sect2 id="opimport-details">
2806 <title>Usage of <command>opimport</command></title>
2809 <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
2811 </para></listitem></varlistentry>
2812 <varlistentry><term><option>--abi / -a [filename]</option></term><listitem><para>
2813 Input abi file description location.
2814 </para></listitem></varlistentry>
2815 <varlistentry><term><option>--force / -f</option></term><listitem><para>
2816 Force conversion even if the input and output abi are identical.
2817 </para></listitem></varlistentry>
2818 <varlistentry><term><option>--output / -o [filename]</option></term><listitem><para>
2819 Specify the output filename. If the output file already exists, the file is
2820 not overwritten; instead, the new data are accumulated into it. Sample filenames are meaningful
2821 to the post-profile tools and must be kept identical; in other words, the pathname
2822 from the first path component containing a '{' must be kept as is in the
2824 </para></listitem></varlistentry>
2825 <varlistentry><term><option>--verbose / -V</option></term><listitem><para>
2826 Give verbose debugging output.
2827 </para></listitem></varlistentry>
2828 <varlistentry><term><option>--version / -v</option></term><listitem><para>
2830 </para></listitem></varlistentry>
2833 </sect2> <!-- opimport-details -->
2835 </sect1> <!-- opimport -->
2839 <chapter id="interpreting">
2840 <title>Interpreting profiling results</title>
2842 The standard caveats of profiling apply in interpreting the results from OProfile:
2843 profile realistic situations, profile different scenarios, profile
2844 for as long a time as possible, avoid system-specific artifacts, don't trust
2845 the profile data too much. Also bear in mind the comments on the performance
2846 counters above - you <emphasis>cannot</emphasis> rely on totally accurate
2847 instruction-level profiling. However, for almost all circumstances the data
2848 can be useful. Ideally a utility such as Intel's VTUNE would be available to
2849 allow careful instruction-level analysis; go hassle Intel for this, not me ;)
2851 <sect1 id="irq-latency">
2852 <title>Profiling interrupt latency</title>
2854 This is an example of how the latency of delivery of profiling interrupts
2855 can impact the reliability of the profiling data. This is pretty much a
2856 worst-case-scenario example: these problems are fairly rare.
2859 double fun(double a, double b, double c)
2862 for (int i = 0 ; i < 10000; ++i) {
2871 Here the last instruction of the loop is very costly, and you would expect the result
2872 to reflect that, but (showing only the instructions inside the loop):
2875 $ opannotate -a -t 10 ./a.out
2877 88 15.38% : 8048337: fadd %st(3),%st
2878 48 8.391% : 8048339: fmul %st(2),%st
2879 68 11.88% : 804833b: fdiv %st(1),%st
2880 368 64.33% : 804833d: inc %eax
2881 : 804833e: cmp $0x270f,%eax
2882 : 8048343: jle 8048337
2885 The problem comes from the x86 hardware: when the counter overflows, the IRQ
2886 is asserted, but the hardware has features that can delay the NMI:
2887 x86 hardware is synchronous (i.e. it cannot interrupt during an instruction);
2888 there is also a latency when the IRQ is asserted; and the multiple
2889 execution units and the out-of-order model of modern x86 CPUs also cause
2890 problems. This is the same function, with source annotation:
2893 $ opannotate -s -t 10 ./a.out
2895 :double fun(double a, double b, double c)
2896 :{ /* _Z3funddd total: 572 100.0% */
2897 : double result = 0;
2898 368 64.33% : for (int i = 0 ; i < 10000; ++i) {
2899 88 15.38% : result += a;
2900 48 8.391% : result *= b;
2901 68 11.88% : result /= c;
2907 The conclusion: don't trust samples coming at the end of a loop,
2908 particularly if the last instruction generated by the compiler is costly. This
2909 case can also occur for branches. Always bear in mind that a sample
2910 can be delayed by a few cycles from its real position. That's a hardware
2911 problem and OProfile can do nothing about it.
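<para>
The two listings in this section are consistent with each other: the samples charged to the
<code>for</code> line in the source view are the samples that landed on the
<code>inc</code> instruction following the costly <code>fdiv</code>. The Python sketch below
rolls the per-instruction counts up to source lines. The mapping of instructions to source
lines is an assumption inferred from the matching sample counts; <command>opannotate</command>
itself derives it from the debug information.
</para>

```python
# (instruction, samples, source line) taken from the two listings above.
# The instruction-to-line mapping is inferred from the sample counts.
LOOP = [
    ("fadd %st(3),%st", 88, "result += a;"),
    ("fmul %st(2),%st", 48, "result *= b;"),
    ("fdiv %st(1),%st", 68, "result /= c;"),
    ("inc %eax", 368, "for loop header"),
    ("cmp $0x270f,%eax", 0, "for loop header"),
    ("jle 8048337", 0, "for loop header"),
]

total = sum(samples for _, samples, _ in LOOP)

# Add up the samples of all instructions that belong to one source line.
per_line = {}
for _, samples, line in LOOP:
    per_line[line] = per_line.get(line, 0) + samples

for line, samples in per_line.items():
    print(f"{samples:5d} {100.0 * samples / total:6.2f}%  {line}")
```

<para>
The printed percentages may differ from the listings in the last digit because of rounding.
</para>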
2914 <sect1 id="kernel-profiling">
2915 <title>Kernel profiling</title>
2916 <sect2 id="irq-masking">
2917 <title>Interrupt masking</title>
2919 OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4,
2920 Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the
2921 kernel where interrupts are disabled, allowing collection of samples in virtually
2922 all executable code. The timer interrupt mode and Itanium 2 collection mechanisms
2923 use maskable interrupts; therefore, these profiling mechanisms have "sample
2924 shadows", or blind spots: regions where no samples will be collected. Typically, the samples
2925 will be attributed to the code immediately after the interrupts are re-enabled.
2929 <title>Idle time</title>
2931 Your kernel is likely to support halting the processor when a CPU is idle. As
2932 the typical hardware events like <constant>CPU_CLK_UNHALTED</constant> do not
2933 count when the CPU is halted, the kernel profile will not reflect the actual
2934 amount of time spent idle. You can change this behaviour by booting with
2935 the <option>idle=poll</option> option, which uses a different idle routine. This
2936 will appear as <function>poll_idle()</function> in your kernel profile.
2939 <sect2 id="kernel-modules">
2940 <title>Profiling kernel modules</title>
2942 OProfile profiles kernel modules by default. However, there are a couple of problems
2943 you may have when trying to get results. First, you may have booted via an initrd;
2944 this means that the actual path for the module binaries cannot be determined automatically.
2945 To get around this, you can use the <option>-p</option> option to the profiling tools
2946 to specify where to look for the kernel modules.
2949 In kernel version 2.6, the information on where kernel module binaries are located was removed.
2950 This means OProfile needs guiding with the <option>-p</option> option to find your
2951 modules. Normally, you can just use your standard module top-level directory for this.
2952 Note that due to this problem, OProfile cannot check that the modification times match;
2953 it is your responsibility to make sure you do not modify a binary after a profile
2957 If you have run <command>insmod</command> or <command>modprobe</command> to insert a module
2958 in a particular directory, it is important that you specify this directory with the
2959 <option>-p</option> option first, so that it overrides an older module binary that might
2960 exist in other directories you've specified with <option>-p</option>. It is up to you
2961 to make sure that these values are correct: the kernel simply does not provide enough
2962 information for OProfile to work this out itself.
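<para>
The reason the order matters can be pictured as a first-match search over the listed
directories. The Python sketch below is a simplified model of such a search, not OProfile's
actual lookup code; <function>find_module_binary</function> and the file name
<filename>foo.ko</filename> are hypothetical.
</para>

```python
import os
import tempfile


def find_module_binary(file_name, image_paths):
    """Return the first file called file_name found under image_paths."""
    for directory in image_paths:
        for root, _dirs, files in os.walk(directory):
            if file_name in files:
                return os.path.join(root, file_name)
    return None


# Two directories that both contain foo.ko; the one listed first wins.
with tempfile.TemporaryDirectory() as rebuilt, \
        tempfile.TemporaryDirectory() as installed:
    for directory in (rebuilt, installed):
        open(os.path.join(directory, "foo.ko"), "w").close()
    chosen = find_module_binary("foo.ko", [rebuilt, installed])
    picked_rebuilt = chosen == os.path.join(rebuilt, "foo.ko")

print(picked_rebuilt)
```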
2967 <sect1 id="interpreting-callgraph">
2968 <title>Interpreting call-graph profiles</title>
2970 Sometimes the results from call-graph profiles may be different to what
2971 you expect to see. The first thing to check is whether the target
2972 binaries were compiled with frame pointers enabled (if the binary was
2973 compiled using <command>gcc</command>'s
2974 <option>-fomit-frame-pointer</option> option, you will not get
2975 meaningful results). Note that as of this writing, the GCC developers
2976 plan to disable frame pointers by default. The Linux kernel is built
2977 without frame pointers by default; there is a configuration option you
2978 can use to turn them on under the "Kernel Hacking" menu.
2981 Often you may see a caller of a function that does not actually directly
2982 call the function you're looking at (e.g. if <function>a()</function>
2983 calls <function>b()</function>, which in turn calls
2984 <function>c()</function>, you may see an entry for
2985 <function>a()->c()</function>). What's actually occurring is that we
2986 are taking samples at the very start (or the very end) of
2987 <function>c()</function>; at these few instructions, we haven't yet
2988 created the new function's frame, so it appears as if
2989 <function>a()</function> is calling directly into
2990 <function>c()</function>. Be careful not to be misled by these
2994 Like the rest of OProfile, call-graph profiling uses a statistical
2995 approach; this means that sometimes a backtrace sample is truncated, or
2996 even partially wrong. Bear this in mind when examining results.
2998 <!-- FIXME: what do we need here ? -->
3001 <sect1 id="debug-info">
3002 <title>Inaccuracies in annotated source</title>
3003 <sect2 id="effect-of-optimizations">
3004 <title>Side effects of optimizations</title>
3006 The compiler can introduce some pitfalls in the annotated source output.
3007 The optimizer can move pieces of code in such a manner that two lines of code
3008 are interleaved (instruction scheduling). The debug info generated by the compiler
3009 can also show strange behavior. This is especially true for complex expressions, e.g. inside
3018 Here the problem comes from the position of the line number. The available debug
3019 info does not give enough details for the if condition, so all samples are
3020 accumulated at the position of the right brace of the expression. Using
3021 <command>opannotate <option>-a</option></command> can help to show the real
3022 samples at an assembly level.
3025 <sect2 id="prologues">
3026 <title>Prologues and epilogues</title>
3028 The compiler generally needs to generate "glue" code across function calls, dependent
3029 on the particular calling conventions used. Additionally, other things
3030 need to happen, like stack pointer adjustment for the local variables; this
3031 code is known as the function prologue. Similar code is needed at function return,
3032 and is known as the function epilogue. This will show up in annotations as
3033 samples at the very start and end of a function, where there is no apparent
3034 executable code in the source.
3037 <sect2 id="inlined-function">
3038 <title>Inlined functions</title>
3040 You may see that a function is credited with a certain number of samples, but
3041 the listing does not add up to the correct total. To pick a real example:
3044 :internal_sk_buff_alloc_security(struct sk_buff *skb)
3045 353 2.342% :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */
3047 : sk_buff_security_t *sksec;
3048 15 0.0995% : int rc = 0;
3050 10 0.06633% : sksec = skb->lsm_security;
3051 468 3.104% : if (sksec && sksec->magic == DSI_MAGIC) {
3055 : sksec = (sk_buff_security_t *) get_sk_buff_memory(skb);
3056 3 0.0199% : if (!sksec) {
3057 38 0.2521% : rc = -ENOMEM;
3060 : memset(sksec, 0, sizeof (sk_buff_security_t));
3061 44 0.2919% : sksec->magic = DSI_MAGIC;
3062 32 0.2123% : sksec->skb = skb;
3063 45 0.2985% : sksec->sid = DSI_SID_NORMAL;
3064 31 0.2056% : skb->lsm_security = sksec;
3068 146 0.9685% : return rc;
3073 Here, the function is credited with 1,882 samples, but the per-line annotations
3074 do not account for all of them. This is usually because of inline functions:
3075 the compiler marks such code with debug entries for the inline function
3076 definition, and this is where <command>opannotate</command> annotates
3077 such samples. In the case above, <function>memset</function> is the most
3078 likely candidate for this problem. Examining the mixed source/assembly
3079 output can help identify such results.
3082 This problem is more visible when there is no source file available. In the
3083 following example it is easy to see that the sum of the symbols' samples is less
3084 than the number of samples for this file. The difference must be attributed
3085 to inline functions.
3089 * Total samples for file : "arch/i386/kernel/process.c"
3094 /* default_idle total: 84 1.8970 */
3095 /* cpu_idle total: 21 0.4743 */
3096 /* flush_thread total: 1 0.0226 */
3097 /* prepare_to_copy total: 1 0.0226 */
3098 /* __switch_to total: 18 0.4065 */
3101 The missing samples are not lost; they are credited to another source
3102 location, where the inlined function is defined. Samples from every call site
3103 of an inlined function are merged in that one place in the annotated
3104 source file, so there is no way to see which call site the samples for
3105 an inlined function came from.
3108 When running <command>opannotate</command>, you may get a warning
3109 "some functions compiled without debug information may have incorrect source line attributions".
3110 In some rare cases, OProfile is not able to verify that the derived source line
3111 is correct (when some parts of the binary image are compiled without debugging
3112 information). Be wary of results if this warning appears.
3115 Furthermore, for some languages the compiler can implicitly generate functions,
3116 such as default copy constructors. Such functions are labelled by the compiler
3117 as having a line number of 0, which means the source annotation can be confusing.
3119 <!-- FIXME so what *actually* happens to those samples ? ignored ? -->
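<para>
The merging of inlined call sites described earlier in this section can be pictured with a small,
purely illustrative C fragment. The names <function>clamp_to_byte()</function>,
<function>brighten()</function> and <function>darken()</function> are hypothetical, and whether
the helper is really inlined depends on the compiler and optimization level used.
</para>
<programlisting>
```c
/* Hypothetical helper; an optimizing compiler may inline it into both callers. */
static inline int clamp_to_byte(int v)
{
    if (v > 255)
        return 255;
    if (0 > v)
        return 0;
    return v;
}

/* Call site 1 */
int brighten(int pixel)
{
    return clamp_to_byte(pixel + 40);
}

/* Call site 2 */
int darken(int pixel)
{
    return clamp_to_byte(pixel - 40);
}
```
</programlisting>
<para>
If the helper is inlined, samples taken in its instructions would be annotated on the source lines
of <function>clamp_to_byte()</function>, whichever caller was running. The per-line counts shown
for <function>brighten()</function> and <function>darken()</function> would then fall short of their
totals, and the annotated source would not show how the helper's samples divide between the two call sites.
</para>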
3121 <sect2 id="wrong-linenr-info">
3122 <title>Inaccuracy in line number information</title>
3124 Depending on your compiler you can fall into the following problem:
3127 struct big_object { int a[500]; };
3132 for (int i = 0 ; i != 1000 * 1000; ++i)
3139 Compiled with <command>gcc</command> 3.0.4 the annotated source is clearly inaccurate:
3143 :{ /* main total: 7871 100% */
3145 : for (int i = 0 ; i != 1000 * 1000; ++i)
3147 7871 100% : return 0;
3151 The problem here is distinct from the IRQ latency problem; the debug line number
3152 information is not precise enough; again, looking at the output of <command>opannotate -as</command> can help.
3158 : for (int i = 0 ; i != 1000 * 1000; ++i)
3159 : 80484c0: push %ebp
3160 : 80484c1: mov %esp,%ebp
3161 : 80484c3: sub $0xfac,%esp
3162 : 80484c9: push %edi
3163 : 80484ca: push %esi
3164 : 80484cb: push %ebx
3166 : 80484cc: lea 0xfffff060(%ebp),%edx
3167 : 80484d2: lea 0xfffff830(%ebp),%eax
3168 : 80484d8: mov $0xf423f,%ebx
3169 : 80484dd: lea 0x0(%esi),%esi
3171 3 0.03811% : 80484e0: mov %edx,%edi
3172 : 80484e2: mov %eax,%esi
3173 1 0.0127% : 80484e4: cld
3174 8 0.1016% : 80484e5: mov $0x1f4,%ecx
3175 7850 99.73% : 80484ea: repz movsl %ds:(%esi),%es:(%edi)
3176 9 0.1143% : 80484ec: dec %ebx
3177 : 80484ed: jns 80484e0
3178 : 80484ef: xor %eax,%eax
3186 So here it's clear that copying is correctly credited with nearly all of the samples, but the
3187 line number information is misplaced. <command>objdump -dS</command> exposes the
3188 same problem. Note that maintaining accurate debug information while optimizing is difficult for compilers, so this problem is not surprising.
3189 The problem of debug information
3190 accuracy is also dependent on the binutils version used; some BFD library versions
3191 contain a work-around for known problems of <command>gcc</command>, some others do not. This is unfortunate but we must live with it,
3192 since profiling is pointless when you disable optimisation (which would give better debugging entries).
3196 <sect1 id="symbol-without-debug-info">
3197 <title>Assembly functions</title>
3199 Often the assembler cannot generate debug information automatically.
3200 This means that you cannot get a source report unless
3201 you manually define the necessary debug information; read your assembler documentation for how you might
3203 debugging info needed currently by OProfile is the line-number/filename-VMA association. When profiling assembly
3204 without debugging info you can always get a report for symbols, and optionally for VMA, through <command>opreport -l</command>
3205 or <command>opreport -d</command>, but this works only for symbols with the right attributes.
3206 For <command>gas</command> you can get this by
3213 whilst for <command>nasm</command> you must use
3216 GLOBAL foo:function ; [1]
3219 Note that OProfile does not need the global attribute, only the function attribute.
3224 FIXME: I commented this bit out until we've written something ...
3226 improve this ? but look first why this file is special
3227 <sect2 id="small-functions">
3228 <title>Small functions</title>
3230 Very small functions can show strange behavior. The file in your source
3231 directory of OProfile <filename>$SRC/test-oprofile/understanding/puzzle.c</filename>
3237 <sect1 id="overlapping-symbols">
3238 <title>Overlapping symbols in JITed code</title>
3240 Some virtual machines (e.g., Java) may re-JIT a method, so that space previously
3241 allocated for a piece of compiled code is reused. This means that, at one distinct
3242 code address, multiple symbols/methods may be present during the run time of the application.
3245 Since OProfile samples are buffered and don't have timing information, there is no way
3246 to correlate samples with the (possibly) varying address ranges in which the code for a symbol
3248 An alternative would be flushing the OProfile sampling buffer when we get an unload event,
3249 but this could result in high overhead.
3252 To moderate the problem of overlapping symbols, OProfile tries to select the symbol that was
3253 present at this address range most of the time. Additionally, other overlapping symbols
3254 are truncated in the overlapping area.
3255 This gives reasonable results, because in reality, address reuse typically takes place
3256 during phase changes of the application -- in particular, during application startup.
3257 Thus, for optimum profiling results, start the sampling session after application startup
3262 <sect1 id="interpreting_operf_results">
3263 <title>Using operf to profile fork/execs</title>
3265 When profiling an application that forks one or more new processes, <command>operf</command> will
3266 record samples for both the parent process and forked processes. This is true even if the
3267 forked process performs an exec of some sort (e.g., <code>execvp</code>). If the
3268 process does <emphasis>not</emphasis> perform an exec, you will see that <command>opreport</command>
3269 will attribute samples for the forked process to the main application executable. On the other
3270 hand, if the forked process <emphasis>does</emphasis> perform an exec, then <command>opreport</command>
3271 will attribute samples to the executable being exec'ed.
3274 To demonstrate this, consider the following examples.
3275 When using <command>operf</command> to profile a single application (either with the <code>--pid</code>
3276 option or <code>command</code> option), the normal <command>opreport</command> summary output
3277 (i.e., invoking <command>opreport</command> with no options) looks something like the following:
3282 112342 100.000 sprintft
3286 104209 92.7605 libc-2.12.so
3287 7273 6.4740 sprintft
3288 858 0.7637 no-vmlinux
3293 But if you profile an application that does a fork/exec, the <command>opreport</command> summary output
3294 will show samples for both the main application you profiled, as well as the exec'ed program.
3295 An example is shown below where <code>s-m-fork</code> is the main application being profiled, which
3296 in turn forks a process that does an <code>execvp</code> of the <code>memcpyt</code> program.
3301 133382 70.5031 memcpyt
3305 123852 92.8551 libc-2.12.so
3307 1007 0.7550 no-vmlinux
3308 1 7.5e-04 ld-2.12.so
3309 55804 29.4969 s-m-fork
3313 51801 92.8267 libc-2.12.so
3314 3589 6.4314 s-m-fork
3315 414 0.7419 no-vmlinux
3320 <sect1 id="hidden-cost">
3321 <title>Other discrepancies</title>
3323 Another cause of apparent problems is the hidden cost of instructions. A very
3324 common example is two memory reads, one from L1 cache and the other from memory:
3325 the second memory read is likely to have more samples.
3326 There are many other causes of hidden cost of instructions. A non-exhaustive
3327 list: mis-predicted branch, TLB cache miss, partial register stall,
3328 partial register dependencies, memory mismatch stall, re-executed µops. If you want to write
3329 programs at the assembly level, be sure to take a look at the Intel and
3330 AMD documentation at <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink>
3331 and <ulink url="http://developer.amd.com/devguides.jsp/">http://developer.amd.com/devguides.jsp</ulink>.
3336 <title>Acknowledgments</title>
3338 Thanks to (in no particular order): Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie,
3339 Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu,
3340 Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh,
3341 Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer,
3343 Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro".