1 .. SPDX-License-Identifier: GPL-2.0
7 :Author: Tom Zanussi <zanussi@kernel.org>
9 This document attempts to provide a description of how the ftrace
10 histograms work and how the individual pieces map to the data
11 structures used to implement them in trace_events_hist.c and
14 Note: All the ftrace histogram command examples assume the working
15 directory is the ftrace /tracing directory. For example::
17 # cd /sys/kernel/tracing
19 Also, the histogram output displayed for those commands will be
20 generally be truncated - only enough to make the point is displayed.
22 'hist_debug' trace event files
23 ==============================
25 If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an
26 event file named 'hist_debug' will appear in each event's
27 subdirectory. This file can be read at any time and will display some
28 of the hist trigger internals described in this document. Specific
29 examples and output will be described in test cases below.
34 First, basic histograms. Below is pretty much the simplest thing you
35 can do with histograms - create one with a single key on a single
36 event and cat the output::
38 # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger
40 # cat events/sched/sched_waking/hist
42 { pid: 18249 } hitcount: 1
43 { pid: 13399 } hitcount: 1
44 { pid: 17973 } hitcount: 1
45 { pid: 12572 } hitcount: 1
47 { pid: 10 } hitcount: 921
48 { pid: 18255 } hitcount: 1444
49 { pid: 25526 } hitcount: 2055
50 { pid: 5257 } hitcount: 2055
51 { pid: 27367 } hitcount: 2055
52 { pid: 1728 } hitcount: 2161
59 What this does is create a histogram on the sched_waking event using
60 pid as a key and with a single value, hitcount, which even if not
61 explicitly specified, exists for every histogram regardless.
63 The hitcount value is a per-bucket value that's automatically
64 incremented on every hit for the given key, which in this case is the
67 So in this histogram, there's a separate bucket for each pid, and each
68 bucket contains a value for that bucket, counting the number of times
69 sched_waking was called for that pid.
71 Each histogram is represented by a hist_data struct.
73 To keep track of each key and value field in the histogram, hist_data
74 keeps an array of these fields named fields[]. The fields[] array is
75 an array containing struct hist_field representations of each
76 histogram val and key in the histogram (variables are also included
77 here, but are discussed later). So for the above histogram we have one
78 key and one value; in this case the one value is the hitcount value,
79 which all histograms have, regardless of whether they define that
80 value or not, which the above histogram does not.
82 Each struct hist_field contains a pointer to the ftrace_event_field
83 from the event's trace_event_file along with various bits related to
84 that such as the size, offset, type, and a hist_field_fn_t function,
85 which is used to grab the field's data from the ftrace event buffer
86 (in most cases - some hist_fields such as hitcount don't directly map
87 to an event field in the trace buffer - in these cases the function
88 implementation gets its value from somewhere else). The flags field
89 indicates which type of field it is - key, value, variable, variable
90 reference, etc., with value being the default.
92 The other important hist_data data structure in addition to the
93 fields[] array is the tracing_map instance created for the histogram,
94 which is held in the .map member. The tracing_map implements the
95 lock-free hash table used to implement histograms (see
96 kernel/trace/tracing_map.h for much more discussion about the
97 low-level data structures implementing the tracing_map). For the
98 purposes of this discussion, the tracing_map contains a number of
99 buckets, each bucket corresponding to a particular tracing_map_elt
100 object hashed by a given histogram key.
102 Below is a diagram the first part of which describes the hist_data and
103 associated key and value fields for the histogram described above. As
104 you can see, there are two fields in the fields array, one val field
105 for the hitcount and one key field for the pid key.
107 Below that is a diagram of a run-time snapshot of what the tracing_map
108 might look like for a given run. It attempts to show the
109 relationships between the hist_data fields and the tracing_map
110 elements for a couple hypothetical keys and values.::
114 +------------------+ +----------------+
115 | .fields[] |---->| val = hitcount |----------------------------+
116 +----------------+ +----------------+ |
118 +----------------+ +--------------+ |
126 +----------------+ <--- n_vals |
127 | key = pid |----------------------------|--+
128 +----------------+ | |
134 +----------------+ <--- n_fields | |
136 +----------------+ | |
143 n_keys = n_fields - n_vals | |
145 The hist_data n_vals and n_fields delineate the extent of the fields[] | |
146 array and separate keys from values for the rest of the code. | |
148 Below is a run-time representation of the tracing_map part of the | |
149 histogram, with pointers from various parts of the fields[] array | |
150 to corresponding parts of the tracing_map. | |
152 The tracing_map consists of an array of tracing_map_entrys and a set | |
153 of preallocated tracing_map_elts (abbreviated below as map_entry and | |
154 map_elt). The total number of map_entrys in the hist_data.map array = | |
155 map->max_elts (actually map->map_size but only max_elts of those are | |
156 used. This is a property required by the map_insert() algorithm). | |
158 If a map_entry is unused, meaning no key has yet hashed into it, its | |
159 .key value is 0 and its .val pointer is NULL. Once a map_entry has | |
160 been claimed, the .key value contains the key's hash value and the | |
161 .val member points to a map_elt containing the full key and an entry | |
162 for each key or value in the map_elt.fields[] array. There is an | |
163 entry in the map_elt.fields[] array corresponding to each hist_field | |
164 in the histogram, and this is where the continually aggregated sums | |
165 corresponding to each histogram value are kept. | |
167 The diagram attempts to show the relationship between the | |
168 hist_data.fields[] and the map_elt.fields[] with the links drawn | |
175 +---------+ +-----------+ | |
176 | .map |---->| map_entry | | |
177 +---------+ +-----------+ | |
180 | .val |---> NULL | |
184 | .key |---> pid = 999 | |
185 +---------+ +-----------+ | |
186 | .val |--->| map_elt | | |
187 +---------+ +-----------+ | |
188 . | .key |---> full key * | |
189 . +---------+ +---------------+ | |
190 . | .fields |--->| .sum (val) |<-+ |
191 +-----------+ +---------+ | 2345 | | |
192 | map_entry | +---------------+ | |
193 +-----------+ | .offset (key) |<----+
194 | .key |---> 0 | 0 | | |
195 +---------+ +---------------+ | |
196 | .val |---> NULL . | |
199 +-----------+ +---------------+ | |
200 | .key | | .sum (val) or | | |
201 +---------+ +---------+ | .offset (key) | | |
202 | .val |--->| map_elt | +---------------+ | |
203 +-----------+ +---------+ | .sum (val) or | | |
204 | map_entry | | .offset (key) | | |
205 +-----------+ +---------------+ | |
206 | .key |---> pid = 4444 | |
207 +---------+ +-----------+ | |
208 | .val | | map_elt | | |
209 +---------+ +-----------+ | |
210 | .key |---> full key * | |
211 +---------+ +---------------+ | |
212 | .fields |--->| .sum (val) |<-+ |
213 +---------+ | 65523 | |
215 | .offset (key) |<----+
229 Abbreviations used in the diagrams::
231 hist_data = struct hist_trigger_data
232 hist_data.fields = struct hist_field
234 map_entry = struct tracing_map_entry
235 map_elt = struct tracing_map_elt
236 map_elt.fields = struct tracing_map_field
238 Whenever a new event occurs and it has a hist trigger associated with
239 it, event_hist_trigger() is called. event_hist_trigger() first deals
240 with the key: for each subkey in the key (in the above example, there
241 is just one subkey corresponding to pid), the hist_field that
242 represents that subkey is retrieved from hist_data.fields[] and the
243 hist_field_fn_t fn() associated with that field, along with the
244 field's size and offset, is used to grab that subkey's data from the
245 current trace record.
247 Once the complete key has been retrieved, it's used to look that key
248 up in the tracing_map. If there's no tracing_map_elt associated with
249 that key, an empty one is claimed and inserted in the map for the new
250 key. In either case, the tracing_map_elt associated with that key is
253 Once a tracing_map_elt available, hist_trigger_elt_update() is called.
254 As the name implies, this updates the element, which basically means
255 updating the element's fields. There's a tracing_map_field associated
256 with each key and value in the histogram, and each of these correspond
257 to the key and value hist_fields created when the histogram was
258 created. hist_trigger_elt_update() goes through each value hist_field
259 and, as for the keys, uses the hist_field's fn() and size and offset
260 to grab the field's value from the current trace record. Once it has
261 that value, it simply adds that value to that field's
262 continually-updated tracing_map_field.sum member. Some hist_field
263 fn()s, such as for the hitcount, don't actually grab anything from the
264 trace record (the hitcount fn() just increments the counter sum by 1),
265 but the idea is the same.
267 Once all the values have been updated, hist_trigger_elt_update() is
268 done and returns. Note that there are also tracing_map_fields for
269 each subkey in the key, but hist_trigger_elt_update() doesn't look at
270 them or update anything - those exist only for sorting, which can
276 This is a good example to try. It produces 3 value fields and 2 key
277 fields in the output::
279 # echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
281 To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It
282 will show the trigger info of the histogram it corresponds to, along
283 with the address of the hist_data associated with the histogram, which
284 will become useful in later examples. It then displays the number of
285 total hist_fields associated with the histogram along with a count of
286 how many of those correspond to keys and how many correspond to values.
288 It then goes on to display details for each field, including the
289 field's flags and the position of each field in the hist_data's
290 fields[] array, which is useful information for verifying that things
291 internally appear correct or not, and which again will become even
292 more useful in further examples::
294 # cat events/kmem/kmalloc/hist_debug
298 # trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active]
301 hist_data: 000000005e48c9a5
309 hist_data->fields[0]:
311 VAL: HIST_FIELD_FL_HITCOUNT
316 hist_data->fields[1]:
318 VAL: normal u64 value
319 ftrace_event_field name: bytes_req
324 hist_data->fields[2]:
326 VAL: normal u64 value
327 ftrace_event_field name: bytes_alloc
334 hist_data->fields[3]:
337 ftrace_event_field name: common_pid
342 hist_data->fields[4]:
345 ftrace_event_field name: call_site
350 The commands below can be used to clean things up for the next test::
352 # echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger
357 Variables allow data from one hist trigger to be saved by one hist
358 trigger and retrieved by another hist trigger. For example, a trigger
359 on the sched_waking event can capture a timestamp for a particular
360 pid, and later a sched_switch event that switches to that pid event
361 can grab the timestamp and use it to calculate a time delta between
364 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
365 events/sched/sched_waking/trigger
367 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >>
368 events/sched/sched_switch/trigger
370 In terms of the histogram data structures, variables are implemented
371 as another type of hist_field and for a given hist trigger are added
372 to the hist_data.fields[] array just after all the val fields. To
373 distinguish them from the existing key and val fields, they're given a
374 new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also
375 make use of a new .var.idx field member in struct hist_field, which
376 maps them to an index in a new map_elt.vars[] array added to the
377 map_elt specifically designed to store and retrieve variable values.
378 The diagram below shows those new elements and adds a new variable
379 entry, ts0, corresponding to the ts0 variable in the sched_waking
382 sched_waking histogram
383 ----------------------::
386 | hist_data |<-------------------------------------------------------+
387 +------------------+ +-------------------+ |
388 | .fields[] |-->| val = hitcount | |
389 +----------------+ +-------------------+ |
391 +----------------+ +-----------------+ |
393 +-----------------+ |
395 +-----------------+ |
397 +-----------------+ |
399 +-------------------+ |
401 +-------------------+ |
403 +-----------------+ |
405 +-----------------+ |
407 +-----------------+ |
408 | .flags & FL_VAR | |
409 +-----------------+ |
410 | .var.idx |----------------------------+-+ |
411 +-----------------+ | | |
415 +-------------------+ <--- n_vals | | |
417 +-------------------+ | | |
419 +-----------------+ | | |
421 +-----------------+ | | |
423 +-----------------+ | | |
424 | .flags & FL_KEY | | | |
425 +-----------------+ | | |
427 +-------------------+ <--- n_fields | | |
429 +-------------------+ | | |
431 +-----------------+ | | |
433 +-----------------+ | | |
435 +-----------------+ | | |
437 +-----------------+ | | |
439 +-----------------+ | | |
440 n_keys = n_fields - n_vals | | |
443 This is very similar to the basic case. In the above diagram, we can | | |
444 see a new .flags member has been added to the struct hist_field | | |
445 struct, and a new entry added to hist_data.fields representing the ts0 | | |
446 variable. For a normal val hist_field, .flags is just 0 (modulo | | |
447 modifier flags), but if the value is defined as a variable, the .flags | | |
448 contains a set FL_VAR bit. | | |
450 As you can see, the ts0 entry's .var.idx member contains the index | | |
451 into the tracing_map_elts' .vars[] array containing variable values. | | |
452 This idx is used whenever the value of the variable is set or read. | | |
453 The map_elt.vars idx assigned to the given variable is assigned and | | |
454 saved in .var.idx by create_tracing_map_fields() after it calls | | |
455 tracing_map_add_var(). | | |
457 Below is a representation of the histogram at run-time, which | | |
458 populates the map, along with correspondence to the above hist_data and | | |
459 hist_field data structures. | | |
461 The diagram attempts to show the relationship between the | | |
462 hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | |
463 the links drawn between diagrams. For each of the map_elts, you can | | |
464 see that the .fields[] members point to the .sum or .offset of a key | | |
465 or val and the .vars[] members point to the value of a variable. The | | |
466 arrows between the two diagrams show the linkages between those | | |
467 tracing_map members and the field definitions in the corresponding | | |
468 hist_data fields[] members.::
474 +---------+ +-----------+ | | |
475 | .map |---->| map_entry | | | |
476 +---------+ +-----------+ | | |
479 | .val |---> NULL | | |
483 | .key |---> pid = 999 | | |
484 +---------+ +-----------+ | | |
485 | .val |--->| map_elt | | | |
486 +---------+ +-----------+ | | |
487 . | .key |---> full key * | | |
488 . +---------+ +---------------+ | | |
489 . | .fields |--->| .sum (val) | | | |
490 . +---------+ | 2345 | | | |
491 . +--| .vars | +---------------+ | | |
492 . | +---------+ | .offset (key) | | | |
494 . | +---------------+ | | |
498 . | +---------------+ | | |
499 . | | .sum (val) or | | | |
500 . | | .offset (key) | | | |
501 . | +---------------+ | | |
502 . | | .sum (val) or | | | |
503 . | | .offset (key) | | | |
504 . | +---------------+ | | |
506 . +---------------->+---------------+ | | |
508 . | 113345679876 | | | |
509 . +---------------+ | | |
512 . +---------------+ | | |
516 . +---------------+ | | |
519 . +---------------+ | | |
522 . +---------------+ | | |
527 | .key |---> pid = 4444 | | |
528 +---------+ +-----------+ | | |
529 | .val |--->| map_elt | | | |
530 +---------+ +-----------+ | | |
531 . | .key |---> full key * | | |
532 . +---------+ +---------------+ | | |
533 . | .fields |--->| .sum (val) | | | |
534 +---------+ | 2345 | | | |
535 +--| .vars | +---------------+ | | |
536 | +---------+ | .offset (key) | | | |
538 | +---------------+ | | |
542 | +---------------+ | | |
543 | | .sum (val) or | | | |
544 | | .offset (key) | | | |
545 | +---------------+ | | |
546 | | .sum (val) or | | | |
547 | | .offset (key) | | | |
548 | +---------------+ | | |
550 | +---------------+ | | |
551 +---------------->| ts0 |<--+ | |
553 +---------------+ | |
556 +---------------+ | |
560 +---------------+ | |
563 +---------------+ | |
566 +---------------+ | |
568 For each used map entry, there's a map_elt pointing to an array of | |
569 .vars containing the current value of the variables associated with | |
570 that histogram entry. So in the above, the timestamp associated with | |
571 pid 999 is 113345679876, and the timestamp variable in the same | |
572 .var.idx for pid 4444 is 213499240729. | |
574 sched_switch histogram | |
575 ---------------------- | |
577 The sched_switch histogram paired with the above sched_waking | |
578 histogram is shown below. The most important aspect of the | |
579 sched_switch histogram is that it references a variable on the | |
580 sched_waking histogram above. | |
582 The histogram diagram is very similar to the others so far displayed, | |
583 but it adds variable references. You can see the normal hitcount and | |
584 key fields along with a new wakeup_lat variable implemented in the | |
585 same way as the sched_waking ts0 variable, but in addition there's an | |
586 entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | |
588 Associated with the new var ref field are a couple of new hist_field | |
589 members, var.hist_data and var_ref_idx. For a variable reference, the | |
590 var.hist_data goes with the var.idx, which together uniquely identify | |
591 a particular variable on a particular histogram. The var_ref_idx is | |
592 just the index into the var_ref_vals[] array that caches the values of | |
593 each variable whenever a hist trigger is updated. Those resulting | |
594 values are then finally accessed by other code such as trace action | |
595 code that uses the var_ref_idx values to assign param values. | |
597 The diagram below describes the situation for the sched_switch | |
598 histogram referred to before::
600 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | |
601 events/sched/sched_switch/trigger | |
603 +------------------+ | |
605 +------------------+ +-----------------------+ | |
606 | .fields[] |-->| val = hitcount | | |
607 +----------------+ +-----------------------+ | |
608 | .map | | .size | | |
609 +----------------+ +---------------------+ | |
610 +--| .var_refs[] | | .offset | | |
611 | +----------------+ +---------------------+ | |
613 | var_ref_vals[] +---------------------+ | |
614 | +-------------+ | .flags | | |
615 | | $ts0 |<---+ +---------------------+ | |
616 | +-------------+ | | .var.idx | | |
617 | | | | +---------------------+ | |
618 | +-------------+ | | .var.hist_data | | |
619 | | | | +---------------------+ | |
620 | +-------------+ | | .var_ref_idx | | |
621 | | | | +-----------------------+ | |
622 | +-------------+ | | var = wakeup_lat | | |
623 | . | +-----------------------+ | |
625 | . | +---------------------+ | |
626 | +-------------+ | | .offset | | |
627 | | | | +---------------------+ | |
628 | +-------------+ | | .fn() | | |
629 | | | | +---------------------+ | |
630 | +-------------+ | | .flags & FL_VAR | | |
631 | | +---------------------+ | |
633 | | +---------------------+ | |
634 | | | .var.hist_data | | |
635 | | +---------------------+ | |
636 | | | .var_ref_idx | | |
637 | | +---------------------+ | |
641 | | +-----------------------+ <--- n_vals | |
642 | | | key = pid | | |
643 | | +-----------------------+ | |
645 | | +---------------------+ | |
647 | | +---------------------+ | |
649 | | +---------------------+ | |
651 | | +---------------------+ | |
653 | | +-----------------------+ <--- n_fields | |
655 | | +-----------------------+ | |
657 | | +---------------------+ | |
659 | | +---------------------+ | |
661 | | +---------------------+ | |
663 | | +---------------------+ | |
665 | | +---------------------+ | |
666 | | n_keys = n_fields - n_vals | |
669 | | +-----------------------+ | |
670 +---------------------->| var_ref = $ts0 | | |
671 | +-----------------------+ | |
673 | +---------------------+ | |
675 | +---------------------+ | |
677 | +---------------------+ | |
678 | | .flags & FL_VAR_REF | | |
679 | +---------------------+ | |
680 | | .var.idx |--------------------------+ |
681 | +---------------------+ |
682 | | .var.hist_data |----------------------------+
683 | +---------------------+
685 +---------------------+
687 Abbreviations used in the diagrams::
689 hist_data = struct hist_trigger_data
690 hist_data.fields = struct hist_field
692 FL_KEY = HIST_FIELD_FL_KEY
693 FL_VAR = HIST_FIELD_FL_VAR
694 FL_VAR_REF = HIST_FIELD_FL_VAR_REF
696 When a hist trigger makes use of a variable, a new hist_field is
697 created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the
698 var.idx and var.hist_data take the same values as the referenced
699 variable, as well as the referenced variable's size, type, and
700 is_signed values. The VAR_REF field's .name is set to the name of the
701 variable it references. If a variable reference was created using the
702 explicit system.event.$var_ref notation, the hist_field's system and
703 event_name variables are also set.
705 So, in order to handle an event for the sched_switch histogram,
706 because we have a reference to a variable on another histogram, we
707 need to resolve all variable references first. This is done via the
708 resolve_var_refs() calls made from event_hist_trigger(). What this
709 does is grabs the var_refs[] array from the hist_data representing the
710 sched_switch histogram. For each one of those, the referenced
711 variable's var.hist_data along with the current key is used to look up
712 the corresponding tracing_map_elt in that histogram. Once found, the
713 referenced variable's var.idx is used to look up the variable's value
714 using tracing_map_read_var(elt, var.idx), which yields the value of
715 the variable for that element, ts0 in the case above. Note that both
716 the hist_fields representing both the variable and the variable
717 reference have the same var.idx, so this is straightforward.
719 Variable and variable reference test
720 ------------------------------------
722 This example creates a variable on the sched_waking event, ts0, and
723 uses it in the sched_switch trigger. The sched_switch trigger also
724 creates its own variable, wakeup_lat, but nothing yet uses it::
726 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
728 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
730 Looking at the sched_waking 'hist_debug' output, in addition to the
731 normal key and value hist_fields, in the val fields section we see a
732 field with the HIST_FIELD_FL_VAR flag, which indicates that that field
733 represents a variable. Note that in addition to the variable name,
734 contained in the var.name field, it includes the var.idx, which is the
735 index into the tracing_map_elt.vars[] array of the actual variable
736 location. Note also that the output shows that variables live in the
737 same part of the hist_data->fields[] array as normal values::
739 # cat events/sched/sched_waking/hist_debug
743 # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
746 hist_data: 000000009536f554
754 hist_data->fields[0]:
756 VAL: HIST_FIELD_FL_HITCOUNT
761 hist_data->fields[1]:
765 var.idx (into tracing_map_elt.vars[]): 0
772 hist_data->fields[2]:
775 ftrace_event_field name: pid
780 Moving on to the sched_switch trigger hist_debug output, in addition
781 to the unused wakeup_lat variable, we see a new section displaying
782 variable references. Variable references are displayed in a separate
783 section because in addition to being logically separate from
784 variables and values, they actually live in a separate hist_data
787 In this example, the sched_switch trigger has a reference to a
788 variable on the sched_waking trigger, $ts0. Looking at the details,
789 we can see that the var.hist_data value of the referenced variable
790 matches the previously displayed sched_waking trigger, and the var.idx
791 value matches the previously displayed var.idx value for that
792 variable. Also displayed is the var_ref_idx value for that variable
793 reference, which is where the value for that variable is cached for
794 use when the trigger is invoked::
796 # cat events/sched/sched_switch/hist_debug
800 # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active]
803 hist_data: 00000000f4ee8006
811 hist_data->fields[0]:
813 VAL: HIST_FIELD_FL_HITCOUNT
818 hist_data->fields[1]:
822 var.idx (into tracing_map_elt.vars[]): 0
829 hist_data->fields[2]:
832 ftrace_event_field name: next_pid
837 variable reference fields:
839 hist_data->var_refs[0]:
841 HIST_FIELD_FL_VAR_REF
843 var.idx (into tracing_map_elt.vars[]): 0
844 var.hist_data: 000000009536f554
845 var_ref_idx (into hist_data->var_refs[]): 0
850 The commands below can be used to clean things up for the next test::
852 # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger
854 # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
859 Adding onto the previous example, we will now do something with that
860 wakeup_lat variable, namely send it and another field as a synthetic
863 The onmatch() action below basically says that whenever we have a
864 sched_switch event, if we have a matching sched_waking event, in this
865 case if we have a pid in the sched_waking histogram that matches the
866 next_pid field on this sched_switch event, we retrieve the
867 variables specified in the wakeup_latency() trace action, and use
868 them to generate a new wakeup_latency event into the trace stream.
870 Note that the way the trace handlers such as wakeup_latency() (which
871 could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid)
872 are implemented, the parameters specified to the trace handler must be
873 variables. In this case, $wakeup_lat is obviously a variable, but
874 next_pid isn't, since it's just naming a field in the sched_switch
875 trace event. Since this is something that almost every trace() and
876 save() action does, a special shortcut is implemented to allow field
877 names to be used directly in those cases. How it works is that under
878 the covers, a temporary variable is created for the named field, and
879 this variable is what is actually passed to the trace handler. In the
880 code and documentation, this type of variable is called a 'field
883 Fields on other trace event's histograms can be used as well. In that
884 case we have to generate a new histogram and an unfortunately named
885 'synthetic_field' (the use of synthetic here has nothing to do with
886 synthetic events) and use that special histogram field as a variable.
888 The diagram below illustrates the new elements described above in the
889 context of the sched_switch histogram using the onmatch() handler and
892 First, we define the wakeup_latency synthetic event::
894 # echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events
896 Next, the sched_waking hist trigger as before::
898 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >>
899 events/sched/sched_waking/trigger
901 Finally, we create a hist trigger on the sched_switch event that
902 generates a wakeup_latency() trace event. In this case we pass
903 next_pid into the wakeup_latency synthetic event invocation, which
904 means it will be automatically converted into a field variable::
906 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
907 onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >>
908 /sys/kernel/tracing/events/sched/sched_switch/trigger
910 The diagram for the sched_switch event is similar to previous examples
911 but shows the additional field_vars[] array for hist_data and shows
912 the linkages between the field_vars and the variables and references
913 created to implement the field variables. The details are discussed
918 +------------------+ +-----------------------+
919 | .fields[] |-->| val = hitcount |
920 +----------------+ +-----------------------+
922 +----------------+ +---------------------+
923 +---| .field_vars[] | | .offset |
924 | +----------------+ +---------------------+
925 |+--| .var_refs[] | | .offset |
926 || +----------------+ +---------------------+
928 || var_ref_vals[] +---------------------+
929 || +-------------+ | .flags |
930 || | $ts0 |<---+ +---------------------+
931 || +-------------+ | | .var.idx |
932 || | $next_pid |<-+ | +---------------------+
933 || +-------------+ | | | .var.hist_data |
934 ||+>| $wakeup_lat | | | +---------------------+
935 ||| +-------------+ | | | .var_ref_idx |
936 ||| | | | | +-----------------------+
937 ||| +-------------+ | | | var = wakeup_lat |
938 ||| . | | +-----------------------+
940 ||| . | | +---------------------+
941 ||| +-------------+ | | | .offset |
942 ||| | | | | +---------------------+
943 ||| +-------------+ | | | .fn() |
944 ||| | | | | +---------------------+
945 ||| +-------------+ | | | .flags & FL_VAR |
946 ||| | | +---------------------+
948 ||| | | +---------------------+
949 ||| | | | .var.hist_data |
950 ||| | | +---------------------+
951 ||| | | | .var_ref_idx |
952 ||| | | +---------------------+
957 ||| +--------------+ | | .
958 +-->| field_var | | | .
959 || +--------------+ | | .
961 || +------------+ | | .
963 || +--------------+ | | .
964 || | field_var | | | .
965 || +--------------+ | | .
967 || +------------+ | | .
969 || +------------+ | | .
972 || . | | +-----------------------+ <--- n_vals
973 || +--------------+ | | | key = pid |
974 || | field_var | | | +-----------------------+
975 || +--------------+ | | | .size |
976 || | var |--+| +---------------------+
977 || +------------+ ||| | .offset |
978 || | val |-+|| +---------------------+
979 || +------------+ ||| | .fn() |
980 || ||| +---------------------+
982 || ||| +---------------------+
984 || ||| +---------------------+ <--- n_fields
986 || ||| n_keys = n_fields - n_vals
987 || ||| +-----------------------+
988 || |+->| var = next_pid |
989 || | | +-----------------------+
991 || | | +---------------------+
993 || | | +---------------------+
994 || | | | .flags & FL_VAR |
995 || | | +---------------------+
997 || | | +---------------------+
998 || | | | .var.hist_data |
999 || | | +-----------------------+
1000 || +-->| val for next_pid |
1001 || | | +-----------------------+
1003 || | | +---------------------+
1005 || | | +---------------------+
1007 || | | +---------------------+
1009 || | | +---------------------+
1011 || | | +---------------------+
1014 || | | +-----------------------+
1015 +|------------------|-|>| var_ref = $ts0 |
1016 | | | +-----------------------+
1018 | | | +---------------------+
1020 | | | +---------------------+
1022 | | | +---------------------+
1023 | | | | .flags & FL_VAR_REF |
1024 | | | +---------------------+
1025 | | +---| .var_ref_idx |
1026 | | +-----------------------+
1027 | | | var_ref = $next_pid |
1028 | | +-----------------------+
1030 | | +---------------------+
1032 | | +---------------------+
1034 | | +---------------------+
1035 | | | .flags & FL_VAR_REF |
1036 | | +---------------------+
1037 | +-----| .var_ref_idx |
1038 | +-----------------------+
1039 | | var_ref = $wakeup_lat |
1040 | +-----------------------+
1042 | +---------------------+
1044 | +---------------------+
1046 | +---------------------+
1047 | | .flags & FL_VAR_REF |
1048 | +---------------------+
1049 +------------------------| .var_ref_idx |
1050 +---------------------+
1052 As you can see, for a field variable, two hist_fields are created: one
1053 representing the variable, in this case next_pid, and one to actually
1054 get the value of the field from the trace stream, like a normal val
1055 field does. These are created separately from normal variable
1056 creation and are saved in the hist_data->field_vars[] array. See
1057 below for how these are used. In addition, a reference hist_field is
1058 also created, which is needed to reference the field variables such as
1059 $next_pid variable in the trace() action.
1061 Note that $wakeup_lat is also a variable reference, referencing the
1062 value of the expression common_timestamp-$ts0, and so also needs to
1063 have a hist field entry representing that reference created.
1065 When hist_trigger_elt_update() is called to get the normal key and
1066 value fields, it also calls update_field_vars(), which goes through
1067 each field_var created for the histogram, and available from
1068 hist_data->field_vars and calls val->fn() to get the data from the
1069 current trace record, and then uses the var's var.idx to set the
1070 variable at the var.idx offset in the appropriate tracing_map_elt's
1071 variable at elt->vars[var.idx].
1073 Once all the variables have been updated, resolve_var_refs() can be
1074 called from event_hist_trigger(), and not only can our $ts0 and
1075 $next_pid references be resolved but the $wakeup_lat reference as
1076 well. At this point, the trace() action can simply access the values
1077 assembled in the var_ref_vals[] array and generate the trace event.
1079 The same process occurs for the field variables associated with the
1082 Abbreviations used in the diagram::
1084 hist_data = struct hist_trigger_data
1085 hist_data.fields = struct hist_field
1086 field_var = struct field_var
1087 fn = hist_field_fn_t
1088 FL_KEY = HIST_FIELD_FL_KEY
1089 FL_VAR = HIST_FIELD_FL_VAR
1090 FL_VAR_REF = HIST_FIELD_FL_VAR_REF
1092 trace() action field variable test
1093 ----------------------------------
1095 This example adds to the previous test example by finally making use
1096 of the wakeup_lat variable, but in addition also creates a couple of
1097 field variables that then are all passed to the wakeup_latency() trace
1098 action via the onmatch() handler.
1100 First, we create the wakeup_latency synthetic event::
1102 # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
1104 Next, the sched_waking trigger from previous examples::
1106 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1108 Finally, as in the previous test example, we calculate and assign the
1109 wakeup latency using the $ts0 reference from the sched_waking trigger
1110 to the wakeup_lat variable, and finally use it along with a couple
1111 sched_switch event fields, next_pid and next_comm, to generate a
1112 wakeup_latency trace event. The next_pid and next_comm event fields
1113 are automatically converted into field variables for this purpose::
1115 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger
1117 The sched_waking hist_debug output shows the same data as in the
1118 previous test example::
1120 # cat events/sched/sched_waking/hist_debug
1124 # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
1127 hist_data: 00000000d60ff61f
1135 hist_data->fields[0]:
1137 VAL: HIST_FIELD_FL_HITCOUNT
1142 hist_data->fields[1]:
1146 var.idx (into tracing_map_elt.vars[]): 0
1153 hist_data->fields[2]:
1156 ftrace_event_field name: pid
1161 The sched_switch hist_debug output shows the same key and value fields
1162 as in the previous test example - note that wakeup_lat is still in the
1163 val fields section, but that the new field variables are not there -
1164 although the field variables are variables, they're held separately in
1165 the hist_data's field_vars[] array. Although the field variables and
1166 the normal variables are located in separate places, you can see that
1167 the actual variable locations for those variables in the
1168 tracing_map_elt.vars[] do have increasing indices as expected:
1169 wakeup_lat takes the var.idx = 0 slot, while the field variables for
1170 next_pid and next_comm have values var.idx = 1, and var.idx = 2. Note
1171 also that those are the same values displayed for the variable
1172 references corresponding to those variables in the variable reference
1173 fields section. Since there are two triggers and thus two hist_data
1174 addresses, those addresses also need to be accounted for when doing
1175 the matching - you can see that the first variable refers to the 0
1176 var.idx on the previous hist trigger (see the hist_data address
1177 associated with that trigger), while the second variable refers to the
1178 0 var.idx on the sched_switch hist trigger, as do all the remaining
1179 variable references.
1181 Finally, the action tracking variables section just shows the system
1182 and event name for the onmatch() handler::
1184 # cat events/sched/sched_switch/hist_debug
1188 # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active]
1191 hist_data: 0000000008f551b7
1199 hist_data->fields[0]:
1201 VAL: HIST_FIELD_FL_HITCOUNT
1206 hist_data->fields[1]:
1209 var.name: wakeup_lat
1210 var.idx (into tracing_map_elt.vars[]): 0
1217 hist_data->fields[2]:
1220 ftrace_event_field name: next_pid
1225 variable reference fields:
1227 hist_data->var_refs[0]:
1229 HIST_FIELD_FL_VAR_REF
1231 var.idx (into tracing_map_elt.vars[]): 0
1232 var.hist_data: 00000000d60ff61f
1233 var_ref_idx (into hist_data->var_refs[]): 0
1238 hist_data->var_refs[1]:
1240 HIST_FIELD_FL_VAR_REF
1242 var.idx (into tracing_map_elt.vars[]): 0
1243 var.hist_data: 0000000008f551b7
1244 var_ref_idx (into hist_data->var_refs[]): 1
1249 hist_data->var_refs[2]:
1251 HIST_FIELD_FL_VAR_REF
1253 var.idx (into tracing_map_elt.vars[]): 1
1254 var.hist_data: 0000000008f551b7
1255 var_ref_idx (into hist_data->var_refs[]): 2
1260 hist_data->var_refs[3]:
1262 HIST_FIELD_FL_VAR_REF
1264 var.idx (into tracing_map_elt.vars[]): 2
1265 var.hist_data: 0000000008f551b7
1266 var_ref_idx (into hist_data->var_refs[]): 3
1273 hist_data->field_vars[0]:
1279 var.idx (into tracing_map_elt.vars[]): 1
1282 ftrace_event_field name: next_pid
1287 hist_data->field_vars[1]:
1293 var.idx (into tracing_map_elt.vars[]): 2
1296 ftrace_event_field name: next_comm
1301 action tracking variables (for onmax()/onchange()/onmatch()):
1303 hist_data->actions[0].match_data.event_system: sched
1304 hist_data->actions[0].match_data.event: sched_waking
1306 The commands below can be used to clean things up for the next test::
1308 # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger
1310 # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1312 # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
1314 action_data and the trace() action
1315 ----------------------------------
1317 As mentioned above, when the trace() action generates a synthetic
1318 event, all the parameters to the synthetic event either already are
1319 variables or are converted into variables (via field variables), and
1320 finally all those variable values are collected via references to them
1321 into a var_ref_vals[] array.
1323 The values in the var_ref_vals[] array, however, don't necessarily
1324 follow the same ordering as the synthetic event params. To address
1325 that, struct action_data contains another array, var_ref_idx[] that
1326 maps the trace action params to the var_ref_vals[] values. Below is a
1327 diagram illustrating that for the wakeup_latency() synthetic event::
1329 +------------------+ wakeup_latency()
1330 | action_data | event params var_ref_vals[]
1331 +------------------+ +-----------------+ +-----------------+
1332 | .var_ref_idx[] |--->| $wakeup_lat idx |---+ | |
1333 +----------------+ +-----------------+ | +-----------------+
1334 | .synth_event | | $next_pid idx |---|-+ | $wakeup_lat val |
1335 +----------------+ +-----------------+ | | +-----------------+
1336 . | +->| $next_pid val |
1337 . | +-----------------+
1339 +-----------------+ | .
1341 +-----------------+ | +-----------------+
1342 +--->| $wakeup_lat val |
1345 Basically, how this ends up getting used in the synthetic event probe
1346 function, trace_event_raw_event_synth(), is as follows::
1348 for each field i in .synth_event
1349 val_idx = .var_ref_idx[i]
1350 val = var_ref_vals[val_idx]
1352 action_data and the onXXX() handlers
1353 ------------------------------------
1355 The hist trigger onXXX() actions other than onmatch(), such as onmax()
1356 and onchange(), also make use of and internally create hidden
1357 variables. This information is contained in the
1358 action_data.track_data struct, and is also visible in the hist_debug
1359 output as will be described in the example below.
1361 Typically, the onmax() or onchange() handlers are used in conjunction
1362 with the save() and snapshot() actions. For example::
1364 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
1365 onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >>
1366 /sys/kernel/tracing/events/sched/sched_switch/trigger
1370 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
1371 onmax($wakeup_lat).snapshot()' >>
1372 /sys/kernel/tracing/events/sched/sched_switch/trigger
1374 save() action field variable test
1375 ---------------------------------
1377 For this example, instead of generating a synthetic event, the save()
1378 action is used to save field values whenever an onmax() handler
1379 detects that a new max latency has been hit. As in the previous
1380 example, the values being saved are also field values, but in this
1381 case, are kept in a separate hist_data array named save_vars[].
1383 As in previous test examples, we set up the sched_waking trigger::
1385 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1387 In this case, however, we set up the sched_switch trigger to save some
1388 sched_switch field values whenever we hit a new maximum latency. For
1389 both the onmax() handler and save() action, variables will be created,
1390 which we can use the hist_debug files to examine::
1392 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
1394 The sched_waking hist_debug output shows the same data as in the
1395 previous test examples::
1397 # cat events/sched/sched_waking/hist_debug
1400 # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
1403 hist_data: 00000000e6290f48
1411 hist_data->fields[0]:
1413 VAL: HIST_FIELD_FL_HITCOUNT
1418 hist_data->fields[1]:
1422 var.idx (into tracing_map_elt.vars[]): 0
1429 hist_data->fields[2]:
1432 ftrace_event_field name: pid
1437 The output of the sched_switch trigger shows the same val and key
1438 values as before, but also shows a couple new sections.
1440 First, the action tracking variables section now shows the
1441 actions[].track_data information describing the special tracking
1442 variables and references used to track, in this case, the running
1443 maximum value. The actions[].track_data.var_ref member contains the
1444 reference to the variable being tracked, in this case the $wakeup_lat
1445 variable. In order to perform the onmax() handler function, there
1446 also needs to be a variable that tracks the current maximum by getting
1447 updated whenever a new maximum is hit. In this case, we can see that
1448 an auto-generated variable named ' __max' has been created and is
1449 visible in the actions[].track_data.track_var variable.
1451 Finally, in the new 'save action variables' section, we can see that
1452 the 4 params to the save() function have resulted in 4 field variables
1453 being created for the purposes of saving the values of the named
1454 fields when the max is hit. These variables are kept in a separate
1455 save_vars[] array off of hist_data, so are displayed in a separate
1458 # cat events/sched/sched_switch/hist_debug
1462 # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active]
1465 hist_data: 0000000057bcd28d
1473 hist_data->fields[0]:
1475 VAL: HIST_FIELD_FL_HITCOUNT
1480 hist_data->fields[1]:
1483 var.name: wakeup_lat
1484 var.idx (into tracing_map_elt.vars[]): 0
1491 hist_data->fields[2]:
1494 ftrace_event_field name: next_pid
1499 variable reference fields:
1501 hist_data->var_refs[0]:
1503 HIST_FIELD_FL_VAR_REF
1505 var.idx (into tracing_map_elt.vars[]): 0
1506 var.hist_data: 00000000e6290f48
1507 var_ref_idx (into hist_data->var_refs[]): 0
1512 hist_data->var_refs[1]:
1514 HIST_FIELD_FL_VAR_REF
1516 var.idx (into tracing_map_elt.vars[]): 0
1517 var.hist_data: 0000000057bcd28d
1518 var_ref_idx (into hist_data->var_refs[]): 1
1523 action tracking variables (for onmax()/onchange()/onmatch()):
1525 hist_data->actions[0].track_data.var_ref:
1527 HIST_FIELD_FL_VAR_REF
1529 var.idx (into tracing_map_elt.vars[]): 0
1530 var.hist_data: 0000000057bcd28d
1531 var_ref_idx (into hist_data->var_refs[]): 1
1536 hist_data->actions[0].track_data.track_var:
1540 var.idx (into tracing_map_elt.vars[]): 1
1545 save action variables (save() params):
1547 hist_data->save_vars[0]:
1553 var.idx (into tracing_map_elt.vars[]): 2
1556 ftrace_event_field name: next_comm
1561 hist_data->save_vars[1]:
1567 var.idx (into tracing_map_elt.vars[]): 3
1570 ftrace_event_field name: prev_pid
1575 hist_data->save_vars[2]:
1581 var.idx (into tracing_map_elt.vars[]): 4
1584 ftrace_event_field name: prev_prio
1589 hist_data->save_vars[3]:
1595 var.idx (into tracing_map_elt.vars[]): 5
1598 ftrace_event_field name: prev_comm
1603 The commands below can be used to clean things up for the next test::
1605 # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger
1607 # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1609 A couple special cases
1610 ======================
1612 While the above covers the basics of the histogram internals, there
1613 are a couple of special cases that should be discussed, since they
1614 tend to create even more confusion. Those are field variables on other
1615 histograms, and aliases, both described below through example tests
1616 using the hist_debug files.
1618 Test of field variables on other histograms
1619 -------------------------------------------
1621 This example is similar to the previous examples, but in this case,
1622 the sched_switch trigger references a hist trigger field on another
1623 event, namely the sched_waking event. In order to accomplish this, a
1624 field variable is created for the other event, but since an existing
1625 histogram can't be used, as existing histograms are immutable, a new
1626 histogram with a matching variable is created and used, and we'll see
1627 that reflected in the hist_debug output shown below.
1629 First, we create the wakeup_latency synthetic event. Note the
1630 addition of the prio field::
1632 # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
1634 As in previous test examples, we set up the sched_waking trigger::
1636 # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1638 Here we set up a hist trigger on sched_switch to send a wakeup_latency
1639 event using an onmatch handler naming the sched_waking event. Note
1640 that the third param being passed to the wakeup_latency() is prio,
1641 which is a field name that needs to have a field variable created for
1642 it. There isn't however any prio field on the sched_switch event so
1643 it would seem that it wouldn't be possible to create a field variable
1644 for it. The matching sched_waking event does have a prio field, so it
1645 should be possible to make use of it for this purpose. The problem
1646 with that is that it's not currently possible to define a new variable
1647 on an existing histogram, so it's not possible to add a new prio field
1648 variable to the existing sched_waking histogram. It is however
1649 possible to create an additional new 'matching' sched_waking histogram
1650 for the same event, meaning that it uses the same key and filters, and
1651 define the new prio field variable on that.
1653 Here's the sched_switch trigger::
1655 # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
1657 And here's the output of the hist_debug information for the
1658 sched_waking hist trigger. Note that there are two histograms
1659 displayed in the output: the first is the normal sched_waking
1660 histogram we've seen in the previous examples, and the second is the
1661 special histogram we created to provide the prio field variable.
1663 Looking at the second histogram below, we see a variable with the name
1664 synthetic_prio. This is the field variable created for the prio field
1665 on that sched_waking histogram::
1667 # cat events/sched/sched_waking/hist_debug
1671 # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
1674 hist_data: 00000000349570e4
1682 hist_data->fields[0]:
1684 VAL: HIST_FIELD_FL_HITCOUNT
1689 hist_data->fields[1]:
1693 var.idx (into tracing_map_elt.vars[]): 0
1700 hist_data->fields[2]:
1703 ftrace_event_field name: pid
1711 # trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active]
1714 hist_data: 000000006920cf38
1722 hist_data->fields[0]:
1724 VAL: HIST_FIELD_FL_HITCOUNT
1729 hist_data->fields[1]:
1732 ftrace_event_field name: prio
1733 var.name: synthetic_prio
1734 var.idx (into tracing_map_elt.vars[]): 0
1741 hist_data->fields[2]:
1744 ftrace_event_field name: pid
1749 Looking at the sched_switch histogram below, we can see a reference to
1750 the synthetic_prio variable on sched_waking, and looking at the
1751 associated hist_data address we see that it is indeed associated with
1752 the new histogram. Note also that the other references are to a
1753 normal variable, wakeup_lat, and to a normal field variable, next_pid,
1754 the details of which are in the field variables section::
1756 # cat events/sched/sched_switch/hist_debug
1760 # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active]
1763 hist_data: 00000000a73b67df
1771 hist_data->fields[0]:
1773 VAL: HIST_FIELD_FL_HITCOUNT
1778 hist_data->fields[1]:
1781 var.name: wakeup_lat
1782 var.idx (into tracing_map_elt.vars[]): 0
1789 hist_data->fields[2]:
1792 ftrace_event_field name: next_pid
1797 variable reference fields:
1799 hist_data->var_refs[0]:
1801 HIST_FIELD_FL_VAR_REF
1803 var.idx (into tracing_map_elt.vars[]): 0
1804 var.hist_data: 00000000349570e4
1805 var_ref_idx (into hist_data->var_refs[]): 0
1810 hist_data->var_refs[1]:
1812 HIST_FIELD_FL_VAR_REF
1814 var.idx (into tracing_map_elt.vars[]): 0
1815 var.hist_data: 00000000a73b67df
1816 var_ref_idx (into hist_data->var_refs[]): 1
1821 hist_data->var_refs[2]:
1823 HIST_FIELD_FL_VAR_REF
1825 var.idx (into tracing_map_elt.vars[]): 1
1826 var.hist_data: 00000000a73b67df
1827 var_ref_idx (into hist_data->var_refs[]): 2
1832 hist_data->var_refs[3]:
1834 HIST_FIELD_FL_VAR_REF
1835 name: synthetic_prio
1836 var.idx (into tracing_map_elt.vars[]): 0
1837 var.hist_data: 000000006920cf38
1838 var_ref_idx (into hist_data->var_refs[]): 3
1845 hist_data->field_vars[0]:
1851 var.idx (into tracing_map_elt.vars[]): 1
1854 ftrace_event_field name: next_pid
1859 action tracking variables (for onmax()/onchange()/onmatch()):
1861 hist_data->actions[0].match_data.event_system: sched
1862 hist_data->actions[0].match_data.event: sched_waking
1864 The commands below can be used to clean things up for the next test::
1866 # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger
1868 # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1870 # echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events
1875 This example is very similar to previous examples, but demonstrates
1878 First, we create the wakeup_latency synthetic event::
1880 # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events
1882 Next, we create a sched_waking trigger similar to previous examples,
1883 but in this case we save the pid in the waking_pid variable::
1885 # echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
1887 For the sched_switch trigger, instead of using $waking_pid directly in
1888 the wakeup_latency synthetic event invocation, we create an alias of
1889 $waking_pid named $woken_pid, and use that in the synthetic event
1890 invocation instead::
1892 # echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
1894 Looking at the sched_waking hist_debug output, in addition to the
1895 normal fields, we can see the waking_pid variable::
1897 # cat events/sched/sched_waking/hist_debug
1901 # trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active]
1904 hist_data: 00000000a250528c
1912 hist_data->fields[0]:
1914 VAL: HIST_FIELD_FL_HITCOUNT
1919 hist_data->fields[1]:
1922 ftrace_event_field name: pid
1923 var.name: waking_pid
1924 var.idx (into tracing_map_elt.vars[]): 0
1929 hist_data->fields[2]:
1933 var.idx (into tracing_map_elt.vars[]): 1
1940 hist_data->fields[3]:
1943 ftrace_event_field name: pid
1948 The sched_switch hist_debug output shows that a variable named
1949 woken_pid has been created but that it also has the
1950 HIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flag
1951 set, which is why it appears in the val field section.
1953 Despite that implementation detail, an alias variable is actually more
1954 like a variable reference; in fact it can be thought of as a reference
1955 to a reference. The implementation copies the var_ref->fn() from the
1956 variable reference being referenced, in this case, the waking_pid
1957 fn(), which is hist_field_var_ref() and makes that the fn() of the
1958 alias. The hist_field_var_ref() fn() requires the var_ref_idx of the
1959 variable reference it's using, so waking_pid's var_ref_idx is also
1960 copied to the alias. The end result is that when the value of alias
1961 is retrieved, in the end it just does the same thing the original
1962 reference would have done and retrieves the same value from the
1963 var_ref_vals[] array. You can verify this in the output by noting
1964 that the var_ref_idx of the alias, in this case woken_pid, is the same
1965 as the var_ref_idx of the reference, waking_pid, in the variable
1966 reference fields section.
1968 Additionally, once it gets that value, since it is also a variable, it
1969 then saves that value into its var.idx. So the var.idx of the
1970 woken_pid alias is 0, which it fills with the value from var_ref_idx 0
1971 when its fn() is called to update itself. You'll also notice that
1972 there's a woken_pid var_ref in the variable refs section. That is the
1973 reference to the woken_pid alias variable, and you can see that it
1974 retrieves the value from the same var.idx as the woken_pid alias, 0,
1975 and then in turn saves that value in its own var_ref_idx slot, 3, and
1976 the value at this position is finally what gets assigned to the
1977 $woken_pid slot in the trace event invocation::
1979 # cat events/sched/sched_switch/hist_debug
1983 # trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active]
1986 hist_data: 0000000055d65ed0
1994 hist_data->fields[0]:
1996 VAL: HIST_FIELD_FL_HITCOUNT
2001 hist_data->fields[1]:
2006 var.idx (into tracing_map_elt.vars[]): 0
2007 var_ref_idx (into hist_data->var_refs[]): 0
2012 hist_data->fields[2]:
2015 var.name: wakeup_lat
2016 var.idx (into tracing_map_elt.vars[]): 1
2023 hist_data->fields[3]:
2026 ftrace_event_field name: next_pid
2031 variable reference fields:
2033 hist_data->var_refs[0]:
2035 HIST_FIELD_FL_VAR_REF
2037 var.idx (into tracing_map_elt.vars[]): 0
2038 var.hist_data: 00000000a250528c
2039 var_ref_idx (into hist_data->var_refs[]): 0
2044 hist_data->var_refs[1]:
2046 HIST_FIELD_FL_VAR_REF
2048 var.idx (into tracing_map_elt.vars[]): 1
2049 var.hist_data: 00000000a250528c
2050 var_ref_idx (into hist_data->var_refs[]): 1
2055 hist_data->var_refs[2]:
2057 HIST_FIELD_FL_VAR_REF
2059 var.idx (into tracing_map_elt.vars[]): 1
2060 var.hist_data: 0000000055d65ed0
2061 var_ref_idx (into hist_data->var_refs[]): 2
2066 hist_data->var_refs[3]:
2068 HIST_FIELD_FL_VAR_REF
2070 var.idx (into tracing_map_elt.vars[]): 0
2071 var.hist_data: 0000000055d65ed0
2072 var_ref_idx (into hist_data->var_refs[]): 3
2077 hist_data->var_refs[4]:
2079 HIST_FIELD_FL_VAR_REF
2081 var.idx (into tracing_map_elt.vars[]): 2
2082 var.hist_data: 0000000055d65ed0
2083 var_ref_idx (into hist_data->var_refs[]): 4
2090 hist_data->field_vars[0]:
2096 var.idx (into tracing_map_elt.vars[]): 2
2099 ftrace_event_field name: next_comm
2104 action tracking variables (for onmax()/onchange()/onmatch()):
2106 hist_data->actions[0].match_data.event_system: sched
2107 hist_data->actions[0].match_data.event: sched_waking
2109 The commands below can be used to clean things up for the next test::
2111 # echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger
2113 # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger
2115 # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events