======================================
Coresight - HW Assisted Tracing on ARM
======================================

:Author: Mathieu Poirier <mathieu.poirier@linaro.org>
:Date: September 11th, 2014

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

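To make the source, link and sink flow concrete, here is a minimal C sketch of
a trace path modelled as a chain of components. The ``cs_node`` type and the
``cs_find_sink()`` helper are hypothetical and are not the kernel's coresight
structures; the model also assumes a single output per component, so a
replicator feeding several sinks is not represented.

```c
#include <stddef.h>

/* Hypothetical model of a trace path; not the kernel's structures. */
enum cs_kind { CS_SOURCE, CS_LINK, CS_SINK };

struct cs_node {
	const char *name;
	enum cs_kind kind;
	const struct cs_node *out; /* next hop on the ATB; NULL for a sink */
};

/*
 * Follow the path from a source until a sink is reached. Returns the
 * sink, or NULL if the chain ends without one; *hops receives the
 * number of links crossed on the way.
 */
static const struct cs_node *cs_find_sink(const struct cs_node *src, int *hops)
{
	const struct cs_node *n = src;

	*hops = 0;
	while (n != NULL && n->kind != CS_SINK) {
		if (n->kind == CS_LINK)
			(*hops)++;
		n = n->out;
	}
	return n;
}
```

A source wired through one funnel to an ETB reports one link crossed and the
ETB as its sink.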
A typical coresight system would look like this (source: ARM Ltd.)::

    [Diagram: CPU cores with their ETM/PTM tracers and CTIs, an STM and
     system memory, connected through the AMBA AXI, the AMBA Debug APB,
     the Cross Trigger Matrix (CTM) and the AMBA Advanced Trace Bus (ATB)
     to a funnel, a replicator (REP), an ETB and a TPIU leading to the
     trace port, along with the DAP and its SWD/JTAG connection.]

    DAP  = Debug Access Port
    ETM  = Embedded Trace Macrocell
    PTM  = Program Trace Macrocell
    CTI  = Cross Trigger Interface
    ETB  = Embedded Trace Buffer
    TPIU = Trace Port Interface Unit
    SWD  = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more intricate IP
blocks such as STM and CTI.

Acronyms and Classification
---------------------------

Acronyms:

PTM:
    Program Trace Macrocell
ETM:
    Embedded Trace Macrocell
STM:
    System Trace Macrocell
ETB:
    Embedded Trace Buffer
ITM:
    Instrumentation Trace Macrocell
TPIU:
    Trace Port Interface Unit
TMC-ETR:
    Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF:
    Trace Memory Controller, configured as Embedded Trace FIFO
CTI:
    Cross Trigger Interface

Classification:

Source:
    ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
    Funnel, replicator (intelligent or not), TMC-ETF
Sinks:
    ETBv1.0, ETBv1.1, TPIU, TMC-ETR, TMC-ETF

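The classification can be captured in a lookup table. The following is a
hypothetical C sketch, not part of the framework. Roles are a bitmask because a
TMC configured as an ETF is assumed here to serve either as a link (FIFO) or as
a sink; the TMC entries otherwise follow the way ``tmc_etr0`` and ``tmc_etf0``
are used as sinks in the perf examples later in this document.

```c
#include <stddef.h>
#include <string.h>

/* Roles a component can play in a trace path (hypothetical encoding). */
enum cs_role {
	CS_ROLE_NONE   = 0,
	CS_ROLE_SOURCE = 1 << 0,
	CS_ROLE_LINK   = 1 << 1,
	CS_ROLE_SINK   = 1 << 2,
};

struct cs_role_entry {
	const char *component;
	unsigned int roles;
};

static const struct cs_role_entry cs_roles[] = {
	{ "ETMv3.x", CS_ROLE_SOURCE }, { "ETMv4", CS_ROLE_SOURCE },
	{ "PTMv1.0", CS_ROLE_SOURCE }, { "PTMv1.1", CS_ROLE_SOURCE },
	{ "STM", CS_ROLE_SOURCE },     { "STM500", CS_ROLE_SOURCE },
	{ "ITM", CS_ROLE_SOURCE },
	{ "Funnel", CS_ROLE_LINK },    { "replicator", CS_ROLE_LINK },
	{ "TMC-ETF", CS_ROLE_LINK | CS_ROLE_SINK },
	{ "ETBv1.0", CS_ROLE_SINK },   { "ETBv1.1", CS_ROLE_SINK },
	{ "TPIU", CS_ROLE_SINK },      { "TMC-ETR", CS_ROLE_SINK },
};

/* Return the role bitmask for a component name, CS_ROLE_NONE if unknown. */
static unsigned int cs_component_roles(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(cs_roles) / sizeof(cs_roles[0]); i++)
		if (strcmp(cs_roles[i].component, name) == 0)
			return cs_roles[i].roles;
	return CS_ROLE_NONE;
}
```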
See Documentation/devicetree/bindings/arm/coresight.txt for details on the
device tree bindings.

When this document was first written, drivers for ITM, STM and CTI were not
provided but were expected to be added as the solution matured.

Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any coresight compliant device can
register with the framework as long as it uses the right APIs:

.. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc);
.. c:function:: void coresight_unregister(struct coresight_device *csdev);

The registration function takes a ``struct coresight_desc *desc`` and
registers the device with the core framework. The unregister function takes
a reference to the ``struct coresight_device *csdev`` obtained at registration
time.

If everything goes well during the registration process, the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform::

    root:~# ls /sys/bus/coresight/devices/
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm

The ``struct coresight_desc`` passed at registration looks like this::

    struct coresight_desc {
        enum coresight_dev_type type;
        struct coresight_dev_subtype subtype;
        const struct coresight_ops *ops;
        struct coresight_platform_data *pdata;
        struct device *dev;
        const struct attribute_group **groups;
    };

The ``coresight_dev_type`` identifies what the device is, i.e. source, link or
sink, while the ``coresight_dev_subtype`` characterises that type further.

The ``struct coresight_ops`` is mandatory and tells the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that purpose ``struct coresight_ops_sink``,
``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been
provided.

The next field, ``struct coresight_platform_data *pdata``, is acquired by
calling ``of_get_coresight_platform_data()`` as part of the driver's _probe
routine, and ``struct device *dev`` gets the device reference embedded in the
``amba_device``::

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
        ...
        drvdata->dev = &adev->dev;
        ...
    }

Each class of device (source, link or sink) has generic operations that can be
performed on it (see ``struct coresight_ops``). The ``**groups`` field is a
list of sysfs entries pertaining to operations specific to that component
only. "Implementation defined" customisations are expected to be accessed and
controlled using those entries.

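The split between generic per-class operations and a framework that dispatches
to them can be sketched in plain C. Every ``mock_`` type and function below is
a hypothetical stand-in: the real ``struct coresight_ops`` and its sink, link
and source tables have more members and different signatures. The sketch only
shows a driver filling in the table for its class and the framework calling
through it.

```c
struct mock_csdev;

/* Hypothetical, simplified per-class operation tables. */
struct mock_ops_sink   { int (*enable)(struct mock_csdev *csdev); };
struct mock_ops_link   { int (*enable)(struct mock_csdev *csdev, int inport, int outport); };
struct mock_ops_source { int (*enable)(struct mock_csdev *csdev); };

struct mock_ops {
	const struct mock_ops_sink *sink_ops;
	const struct mock_ops_link *link_ops;
	const struct mock_ops_source *source_ops;
};

enum mock_dev_type { MOCK_DEV_SINK, MOCK_DEV_LINK, MOCK_DEV_SOURCE };

struct mock_csdev {
	enum mock_dev_type type;
	const struct mock_ops *ops;
	int enabled;
};

/* Framework side: pick the table matching the device class and call it.
 * Returns -1 when the driver did not provide that operation. */
static int mock_enable(struct mock_csdev *csdev)
{
	const struct mock_ops *ops = csdev->ops;

	switch (csdev->type) {
	case MOCK_DEV_SINK:
		if (ops->sink_ops && ops->sink_ops->enable)
			return ops->sink_ops->enable(csdev);
		break;
	case MOCK_DEV_LINK:
		if (ops->link_ops && ops->link_ops->enable)
			return ops->link_ops->enable(csdev, 0, 0); /* illustrative ports */
		break;
	case MOCK_DEV_SOURCE:
		if (ops->source_ops && ops->source_ops->enable)
			return ops->source_ops->enable(csdev);
		break;
	}
	return -1;
}

/* Driver side: a sink only fills in the sink table. */
static int example_sink_enable(struct mock_csdev *csdev)
{
	csdev->enabled = 1;
	return 0;
}

static const struct mock_ops_sink example_sink_ops = { .enable = example_sink_enable };
static const struct mock_ops example_ops = { .sink_ops = &example_sink_ops };
```

A device registered with the wrong class finds no matching table and the
generic call fails instead of invoking an unrelated operation.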
Device Naming scheme
--------------------

The devices that appear on the "coresight" bus were named the same as their
parent devices, i.e. the real devices that appear on the AMBA bus or the
platform bus. Thus the names were based on the Linux Open Firmware layer
naming convention, which follows the base physical address of the device
followed by the device type, e.g.::

    root:~# ls /sys/bus/coresight/devices/
    20010000.etf  20040000.funnel      20100000.stm     22040000.etm
    22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
    20070000.etr  20120000.replicator  220c0000.funnel
    23040000.etm  23140000.etm         23340000.etm

However, with the introduction of ACPI support, the names of the real
devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
introduced to use more generic names based on the type of the device. The
following rules apply::

    1) Devices that are bound to CPUs are named based on the CPU logical
       number.

       e.g, ETM bound to CPU0 is named "etm0"

    2) All other devices follow a pattern, "<device_type_prefix>N", where :

       <device_type_prefix>  - A prefix specific to the type of the device
       N                     - a sequential number assigned based on the order
                               of probing.

       e.g, tmc_etf0, tmc_etr0, funnel0, funnel1

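A minimal C sketch of the two naming rules follows. The helper names and the
fixed-size prefix table are hypothetical; the kernel's own implementation may
track the per-prefix counters differently.

```c
#include <stdio.h>
#include <string.h>

#define CS_MAX_PREFIXES 16

/* Hypothetical per-prefix counters, bumped in probe order. */
struct cs_namer {
	const char *prefix[CS_MAX_PREFIXES];
	int next[CS_MAX_PREFIXES];
	int count;
};

/* Rule 1: a CPU-bound device is named after the CPU logical number. */
static void cs_name_cpu_bound(char *buf, size_t len, const char *prefix, int cpu)
{
	snprintf(buf, len, "%s%d", prefix, cpu);
}

/* Rule 2: other devices get "<prefix>N", N counting up per prefix in the
 * order devices are probed. Returns -1 if the prefix table is full. */
static int cs_name_next(struct cs_namer *n, char *buf, size_t len, const char *prefix)
{
	int i;

	for (i = 0; i < n->count; i++)
		if (strcmp(n->prefix[i], prefix) == 0)
			break;
	if (i == n->count) {
		if (n->count == CS_MAX_PREFIXES)
			return -1;
		n->prefix[i] = prefix;
		n->next[i] = 0;
		n->count++;
	}
	snprintf(buf, len, "%s%d", prefix, n->next[i]++);
	return 0;
}
```

Probing a funnel, an ETF and then a second funnel yields ``funnel0``,
``tmc_etf0`` and ``funnel1``, matching the examples in rule 2.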
Thus, with the new scheme the devices could appear as::

    root:~# ls /sys/bus/coresight/devices/
    etm0     etm1     etm2         etm3  etm4      etm5      funnel0
    funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

Some of the examples below might refer to the old naming scheme and some to
the newer one, to confirm that what you see on your system is not unexpected.
Always use the device names as they appear on your system under the locations
given.

How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework:

1. using the perf cmd line tools.
2. interacting directly with the Coresight devices using the sysFS interface.

Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment. As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs::

    root:/sys/bus/coresight/devices# ls
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#

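The same steps can be driven from a program instead of a shell. The sketch
below is hypothetical helper code (``cs_attr_path()`` and ``cs_write_attr()``
are not part of any library); it assumes the devices live under
/sys/bus/coresight/devices as shown above and that the caller has the
privileges needed to write to sysfs.

```c
#include <stdio.h>

#define CS_DEVICES_DIR "/sys/bus/coresight/devices"

/* Build "<CS_DEVICES_DIR>/<device>/<attr>" into buf; -1 if it does not fit. */
static int cs_attr_path(char *buf, size_t len, const char *device, const char *attr)
{
	int n = snprintf(buf, len, "%s/%s/%s", CS_DEVICES_DIR, device, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value such as "1" or "0" to an attribute, like `echo 1 > ...`. */
static int cs_write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fprintf(f, "%s\n", value) < 0)
		ret = -1;
	/* stdio buffers the write, so a sysfs store error may only show up here */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Writing ``"1"`` to the path built for ``20010000.etb`` and ``enable_sink`` is
equivalent to the ``echo 1 >`` command above; writing ``"1"`` to a source's
``enable_source`` then starts the capture.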
At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range. As such, "enabling" a source will immediately
trigger a trace capture::

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    root:/sys/bus/coresight/devices#

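When polling the ``status`` file from a program, the write pointer can be
extracted and compared between two reads to confirm that trace is flowing. The
parser below is a hypothetical sketch that relies only on the ``RAM wrt ptr:``
label and hexadecimal value shown in the output above.

```c
#include <stdlib.h>
#include <string.h>

/* Find "RAM wrt ptr:" in the text of an ETB "status" file and return its
 * hexadecimal value through *val. Returns 0 on success, -1 otherwise. */
static int etb_wrt_ptr(const char *status, unsigned long *val)
{
	const char *key = "RAM wrt ptr:";
	const char *p = strstr(status, key);
	const char *num;
	char *end;

	if (!p)
		return -1;
	num = p + strlen(key);
	/* base 16 accepts the "0x" prefix and skips the padding spaces */
	*val = strtoul(num, &end, 16);
	return end == num ? -1 : 0;
}
```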
Trace collection is stopped the same way::

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev::

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#

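Harvesting the buffer amounts to reading the device node until end of file and
saving the bytes. The following C sketch does what the ``dd`` command does,
assuming the device node returns end of file once the buffer has been read out,
as the ``dd`` output suggests. The function name is hypothetical and the 4 KiB
chunk size is an arbitrary choice.

```c
#include <stdio.h>

/* Copy everything readable from src (e.g. the ETB's /dev node) into dst.
 * Returns the number of bytes copied, or -1 on error. */
static long cs_copy_trace(const char *src, const char *dst)
{
	FILE *in = fopen(src, "rb");
	FILE *out;
	char buf[4096];
	size_t n;
	long total = 0;

	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			total = -1;
			break;
		}
		total += (long)n;
	}
	if (total >= 0 && ferror(in))
		total = -1;
	if (fclose(out) != 0)
		total = -1;
	fclose(in);
	return total;
}
```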
The file cstrace.bin can be decoded using "ptm2human", DS-5 or Trace32.

Following is a DS-5 output of an experimental loop that increments a variable
up to a certain value. The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides::

    Instruction  106378866  0x8026B53C  E52DE004  false  PUSH  {lr}
    Instruction  0          0x8026B540  E24DD00C  false  SUB   sp,sp,#0xc
    Instruction  0          0x8026B544  E3A03000  false  MOV   r3,#0
    Instruction  0          0x8026B548  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Timestamp                                                  Timestamp: 17106715833
    Instruction  319        0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  9          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  10         0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  6          0x8026B560  EE1D3F30  false  MRC   p15,#0x0,r3,c13,c0,#1
    Instruction  0          0x8026B564  E1A0100D  false  MOV   r1,sp
    Instruction  0          0x8026B568  E3C12D7F  false  BIC   r2,r1,#0x1fc0
    Instruction  0          0x8026B56C  E3C2203F  false  BIC   r2,r2,#0x3f
    Instruction  0          0x8026B570  E59D1004  false  LDR   r1,[sp,#4]
    Instruction  0          0x8026B574  E59F0010  false  LDR   r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction  0          0x8026B578  E592200C  false  LDR   r2,[r2,#0xc]
    Instruction  0          0x8026B57C  E59221D0  false  LDR   r2,[r2,#0x1d0]
    Instruction  0          0x8026B580  EB07A4CF  true   BL    {pc}+0x1e9344 ; 0x804548c4
    Instruction  13570831   0x8026B584  E28DD00C  false  ADD   sp,sp,#0xc
    Instruction  0          0x8026B588  E8BD8000  true   LDM   sp!,{pc}
    Timestamp                                                  Timestamp: 17107041535

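For readers mapping the trace back to source code, the loop can be written in
C roughly as follows. This is a hypothetical reconstruction from the
disassembly (the original source is not shown) and it leaves out the call that
follows the loop. It agrees with the listing: the ``BLE`` appears six times,
the first five branching back to 0x8026B54C and the last one falling through to
0x8026B560.

```c
/*
 * Hypothetical C equivalent of the traced loop: each pass loads i,
 * compares it with 4, stores i + 1 and branches back while the old
 * value was <= 4. *compares counts the CMP/BLE pairs executed.
 */
static int traced_loop(int *compares)
{
	volatile int i = 0; /* volatile forces a load/store each pass, like the LDR/STR */
	int old;

	*compares = 0;
	do {
		old = i;       /* LDR r3,[sp,#4]             */
		(*compares)++; /* CMP r3,#4                  */
		i = old + 1;   /* ADD r3,r3,#1 / STR r3,[sp] */
	} while (old <= 4);    /* BLE back to the LDR        */
	return i;
}
```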
2) Using the perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled. When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool::

    linaro@linaro-nano:~$ ./perf list pmu

    List of pre-defined events (to be used in -e):

      cs_etm//                                    [Kernel PMU event]

    linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU
is listed along with configuration options within forward slashes '/'. Since a
Coresight system will typically have more than one sink, the name of the sink
to work with needs to be specified as an event option.
On newer kernels the available sinks are listed in sysFS under
($SYSFS)/bus/event_source/devices/cs_etm/sinks/::

    root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
    tmc_etf0  tmc_etr0  tpiu0

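Tools that need to pick a sink can read the same directory programmatically.
This C sketch simply enumerates a directory; ``cs_list_sinks()`` is a
hypothetical helper, and the sinks path is the one shown above for newer
kernels.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Print the entries of a directory such as
 * /sys/bus/event_source/devices/cs_etm/sinks (if out is non-NULL) and
 * return how many there were, skipping "." and "..", or -1 if the
 * directory cannot be opened. */
static int cs_list_sinks(const char *dir, FILE *out)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	int count = 0;

	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
			continue;
		if (out)
			fprintf(out, "%s\n", e->d_name);
		count++;
	}
	closedir(d);
	return count;
}
```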
On older kernels, this may need to be found from the list of coresight devices,
available under ($SYSFS)/bus/coresight/devices/::

    root:~# ls /sys/bus/coresight/devices/
    etm0     etm1     etm2         etm3  etm4      etm5      funnel0
    funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
    root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program

As mentioned above in section "Device Naming scheme", the names of the devices
could look different from what is used in the example above. One must use the
device names as they appear under sysFS.

The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.

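Building the event string programmatically makes the role of each part
explicit. In this hypothetical helper the sink name follows ``@`` between the
slashes, and the optional trailing ``u`` is perf's standard event modifier for
restricting an event to user space, as used in the examples in this section.

```c
#include <stdio.h>

/* Build "cs_etm/@<sink>/" plus an optional "u" (user space only) modifier.
 * Returns 0 on success, -1 if buf is too small. */
static int cs_etm_event(char *buf, size_t len, const char *sink, int user_only)
{
	int n = snprintf(buf, len, "cs_etm/@%s/%s", sink, user_only ? "u" : "");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The result is passed to ``perf record -e``, e.g. ``cs_etm/@tmc_etr0/u`` for the
ETR sink listed above.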
More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the OpenCSD GitHub
repository [#third]_.

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.::

    perf record -e cs_etm/@tmc_etr0/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported; further work is
required to support instruction decode of 32-bit Arm programs.

Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option, in which case tracing data is
removed and replaced with the synthesized events, e.g.::

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for AutoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)::

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements

    $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements

    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements

How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as using the tracers; the
only difference is that clients, rather than the program flow through the
code, drive the trace capture.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs, with more information on each entry found in [#first]_::

    root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
    enable_source   hwevent_select  port_enable  subsystem  uevent
    hwevent_enable  mgmt            port_select  traceid

Like any other source, a sink needs to be identified and the STM enabled before
being used::

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source

From there, user space applications can request and use channels using the
devfs interface provided for that purpose by the generic STM API::

    root@genericarmv8:~# ls -l /dev/stm0
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/stm0

Details on how to use the generic STM API can be found in [#second]_.

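A minimal C sketch of a user space client writing one message through the STM
device node is shown below. ``stm_write_msg()`` is hypothetical, and it assumes
the stm core assigns a master/channel pair from its default policy when the
application has not requested one explicitly; see [#second]_ for how channels
are actually allocated.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write one message to an STM character device such as /dev/stm0.
 * No channel is requested here (assumption: default policy applies).
 * Returns 0 on success, -1 on error. */
static int stm_write_msg(const char *dev, const char *msg)
{
	int fd = open(dev, O_WRONLY);
	size_t len = strlen(msg);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, msg, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```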
.. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm

.. [#second] Documentation/trace/stm.rst

.. [#third] https://github.com/Linaro/perf-opencsd