docs/design/part-tracing.txt

   1 Tracing
   2 =======
   3
   4 This subsystem will provide a mechanism to get structured tracing info from
   5 GStreamer applications. This can be used for post-run analysis as well as for
   6 live introspection.
   7
   8 Use cases
   9 ---------
  10 * I'd like to get statistics from a running application.
  11
  12 * I'd like to to understand which parts of my pipeline use how many resources.
  13
  14 * I'd like to know which parts of the pipeline use how much memory.
  15
  16 * I'd like to know about ref-counts of parts in the pipeline to find ref-count
  17   issues.
  18
  19 Non use-cases
  20 -------------
  21 * Some element in the pipeline does not play along the rules, find out which
  22   one. This could be done with generic tests.
  23
  24 Design
  25 ------
  26
  27 The system brings the following new items:
  28 core hooks: probes in the core api, that will expose internal state when tracing
  29   is in use
  30 tracers: plugin features that can process data from the hooks and emit a log
  31 tracing front-ends: applications that consume logs from tracers
  32
  33 Like the logging, the tracer hooks can be compiled out and if not use a local
  34 condition to check if active.
  35
  36 Certain GStreamer core function (such as gst_pad_push or gst_element_add_pad)
  37 will call into the tracer subsystem to dispatch into active tracing modules.
  38 Developers will be able to select a list of plugins by setting an environment
  39 variable, such as GST_TRACERS="meminfo;dbus". One can also pass parameters to
  40 plugins:  GST_TRACERS="log(events,buffers);stats(all)".
  41 When then plugins are loaded, we'll add them to certain hooks according to which
  42 they are interested in.
  43
  44 Right now tracing info is logged as GstStructures to the TRACE level.
  45 Idea: Another env var GST_TRACE_CHANNEL could be used to send the tracing to a
  46 file or a socket. See https://bugzilla.gnome.org/show_bug.cgi?id=733188 for
  47 discussion on these environment variables.
  48
  49 Hook api
  50 --------
  51 We'll wrap interesting api calls with two macros, e.g. gst_pad_push():
  52
  53 GstFlowReturn
  54 gst_pad_push (GstPad * pad, GstBuffer * buffer)
  55 {
  56   GstFlowReturn res;
  57
  58   g_return_val_if_fail (GST_IS_PAD (pad), GST_FLOW_ERROR);
  59   g_return_val_if_fail (GST_PAD_IS_SRC (pad), GST_FLOW_ERROR);
  60   g_return_val_if_fail (GST_IS_BUFFER (buffer), GST_FLOW_ERROR);
  61
  62   GST_TRACER_PAD_PUSH_PRE (pad, buffer);
  63   res = gst_pad_push_data (pad,
  64       GST_PAD_PROBE_TYPE_BUFFER | GST_PAD_PROBE_TYPE_PUSH, buffer);
  65   GST_TRACER_PAD_PUSH_POST (pad, res);
  66   return res;
  67 }
  68
  69 TODO(ensonic): gcc has some magic for wrapping functions
  70 - http://gcc.gnu.org/onlinedocs/gcc/Constructing-Calls.html
  71 - http://www.clifford.at/cfun/gccfeat/#gccfeat05.c
  72
  73 TODO(ensonic): we should eval if we can use something like jump_label in the kernel
  74 - http://lwn.net/Articles/412072/ + http://lwn.net/Articles/435215/
  75 - http://lxr.free-electrons.com/source/kernel/jump_label.c
  76 - http://lxr.free-electrons.com/source/include/linux/jump_label.h
  77 - http://lxr.free-electrons.com/source/arch/x86/kernel/jump_label.c
  78 TODO(ensonic): liblttng-ust provides such a mechanism for user-space
  79 - but this is mostly about logging traces
  80 - it is linux specific :/
  81
  82 In addition to api hooks we should also provide timer hooks. Interval timers are
  83 useful to get e.g. resource usage snapshots. Also absolute timers might make
  84 sense. All this could be implemented with a clock thread. We can use another
  85 env-var GST_TRACE_TIMERS="100ms,75ms" to configure timers and then pass them to
  86 the tracers like, GST_TRACERS="rusage(timer=100ms);meminfo(timer=75ms)". Maybe
  87 we can create them ad-hoc and avoid the GST_TRACE_TIMERS var.
  88
  89 Hooks (* already implemented)
  90 -----
  91 * gst_bin_add
  92 * gst_bin_remove
  93 * gst_element_add_pad
  94 * gst_element_post_message
  95 * gst_element_query
  96 * gst_element_remove_pad
  97 * gst_element_factory_make
  98 * gst_pad_link
  99 * gst_pad_pull_range
 100 * gst_pad_push
 101 * gst_pad_push_list
 102 * gst_pad_push_event
 103 * gst_pad_unlink
 104
 105 Tracer api
 106 ----------
 107 Tracers are plugin features. They have a simple api:
 108
 109 class init
 110 Here the tracers describe the data the will emit.
 111
 112 instance init
 113 Tracers attach handlers to one or more hooks using gst_tracing_register_hook().
 114 In case the are configurable, they can read the options from the 'params'
 115 property. This is the extra detail from the environment var.
 116
 117 hook functions
 118 Hooks marshal the parameters given to a trace hook into varargs and also
 119 add some extra into such as a timestamp. Hooks will be called from misc threads.
 120 The trace plugins should only consume (=read) the provided data. Expensive
 121 computation should be avoided to not affect the execution too much.
 122 Most trace plugins will log data to a trace channel.
 123
 124 instance destruction
 125 Tracers can output results and release data. This would ideally be done at the
 126 end of the applications, but gst_deinit() is not mandatory. gst_tracelib was
 127 using a gcc_destructor. Ideally tracer modules log data as they have them and
 128 leave aggregation to a tool that processes the log.
 129
 130 tracer event classes
 131 --------------------
 132 Most tracers will log some kind of 'events' : a data transfer, an event,
 133 a message, a query or a measurement. Every tracers should describe the data
 134 format. This way tools that process tracer logs can show the data in a
 135 meaningful way without having to know about the tracer plugin.
 136
 137 One way would be to introspect the data from the plugin. This has the
 138 disadvantage that the postprocessing app needs to load the plugins or talk to
 139 the gstreamer registry. An alternative is to also log the format description
 140 into the log. Right now we're logging several nested GstStructure from the
 141 _tracer_class_init() function (except in the log tracer).
 142
 143 // the name is the value name + ".class"
 144 // the content describes a single log record
 145 gst_tracer_record_new ("thread-rusage.class",
 146     // value in the log record (order does not matter)
 147     // 'thread-id' is a 'key' to related the record to something as indicated
 148     // by 'scope' substructure
 149     "thread-id", GST_TYPE_STRUCTURE, gst_structure_new ("scope",
 150         "type", G_TYPE_GTYPE, G_TYPE_GUINT64,
 151         "related-to", GST_TYPE_TRACER_VALUE_SCOPE, GST_TRACER_VALUE_SCOPE_THREAD,
 152         NULL),
 153     // next value in the record
 154     // 'average-cpuload' is a measurement as indicated by the 'value'
 155     // substructure
 156     "average-cpuload", GST_TYPE_STRUCTURE, gst_structure_new ("value",
 157         // value type
 158         "type", G_TYPE_GTYPE, G_TYPE_UINT,
 159         // human readable description, that can be used as a graph label
 160         "description", G_TYPE_STRING, "average cpu usage per thread",
 161         // flags that help to use the right graph type
 162         // flags { aggregated, windowed, cumulative, ... }
 163         "flags", GST_TYPE_TRACER_VALUE_FLAGS, GST_TRACER_VALUE_FLAGS_AGGREGATED,
 164         // value range
 165         "min", G_TYPE_UINT, 0,
 166         "max", G_TYPE_UINT, 100,
 167         NULL),
 168     ...
 169     NULL);
 170
 171 A few ideas that are not yet in the above spec:
 172 - it would be nice to describe the unit of values
 173   - putting it into the description is not flexible though, e.g. time would be
 174     a guint64 but a ui would reformat it to e.g. h:m:s.ms
 175   - other units are e.g.: percent, per-mille, or kbit/s
 176 - we'd like to have some metadata on scopes
 177   - e.g. we'd like to log the thread-names, so that a UI can show that instead
 178     of thread-ids
 179   - the stats tracer logs 'new-element' and 'new-pad' messages
 180     - they add a unique 'ix' to each instance as the memory ptr can be reused
 181       for new instances, the data is attached to the objects as qdata
 182     - the latency tracer would like to also reference this metadata
 183 - right now we log the classes as structures
 184   - this is important so that the log is self contained
 185   - it would be nice to add them to the registry, so that gst-inspect can show
 186     them
 187
 188 We could also consider to add each value as a READABLE gobject property. The
 189 property has name/description. We could use qdata for scope and flags (or have
 190 some new property flags).
 191 We would also need a new "notify" signal, so that value-change notifications
 192 would include a time-stamp. This way the tracers would not needs to be aware of
 193 the logging. The core tracer would register the notify handlers and emit the
 194 log.
 195 Or we just add a gst_tracer_class_install_event() and that mimics the
 196 g_object_class_install_property().
 197
 198 Frontends can:
 199 - do an events over time histogram
 200 - plot curves of values over time or deltas
 201 - show gauges
 202 - collect statistics (min, max, avg, ...)
 203
 204 Plugins ideas
 205 =============
 206
 207 We can have some under gstreamer/plugins/tracers/
 208
 209 latency
 210 -------
 211 - register to buffer and event flow
 212 - send custom event on buffer flow at source elements
 213 - catch events on event transfer at sink elements
 214
 215 meminfo (not yet implemented)
 216 -------
 217 - register to an interval-timer hook.
 218 - call mallinfo() and log memory usage
 219
 220 rusage
 221 ------
 222 - register to an interval-timer hook.
 223 - call getrusage() and log resource usage
 224
 225 dbus (not yet implemented)
 226 ----
 227 - provide a dbus iface to announce applications that are traced
 228 - tracing UIs can use the dbus iface to find the channels where logging and
 229   tracing is getting logged to
 230 - one would start the tracing UI first and when the application is started with
 231   tracing activated, the dbus plugin will announce the new application,
 232   upon which the tracing UI can start reading from the log channels, this avoid
 233   missing some data
 234
 235 topology (not yet implemented)
 236 --------
 237 - register to pipeline topology hooks
 238 - tracing UIs can show a live pipeline graph
 239
 240 stats
 241 -----
 242 - register to buffer, event, message and query flow
 243 - tracing apps can do e.g. statistics
 244
 245 refcounts (not yet implemented)
 246 ---------
 247 - log ref-counts of objects
 248 - just logging them outside of glib/gobject would still make it hard to detect
 249   issues though
 250
 251 opengl (not yet implemented)
 252 ------
 253 - upload/download times
 254 - there is not hardware agnostic way to get e.g. memory usage info
 255   (gl extensions)
 256
 257 memory (not yet implemented)
 258 ------
 259 - trace live instance (and pointer to the memory)
 260 - use an atexit handler to dump leaked instance
 261 https://bugzilla.gnome.org/show_bug.cgi?id=756760#c6
 262
 263 leaks
 264 -----
 265 - track creation/destruction of GstObject and GstMiniObject
 266 - log those which are still alive when app is exiting and raise an error if any
 267
 268 User interfaces
 269 ===============
 270
 271 gst-debug-viewer
 272 ----------------
 273 gst-debug-viewer could be given the trace log in addition to the debug log (or a
 274 combined log). Alternatively it would show a dialog that shows all local apps
 275 (if the dbus plugin is loaded) and read the log streams from the sockets/files
 276 that are configured for the app.
 277
 278 gst-tracer
 279 ----------
 280 Counterpart of gst-tracelib-ui.
 281
 282 gst-stats
 283 ---------
 284 A terminal app that shows summary/running stats like the summary gst-tracelib
 285 shows at the end of a run. Currently only shows an aggregated status.
 286
 287 live-graphers
 288 -------------
 289 Maybe we can even feed the log into existing live graphers, with a little driver
 290 * https://github.com/dkogan/feedgnuplot
 291
 292
 293 Problems / Open items
 294 =====================
 295 - should tracers log into the debug.log or into a separate log?
 296   - separate log
 297     - use a binary format?
 298     - worse performance (we're writing two logs at the same time)
 299     - need to be careful when people to GST_DEBUG_CHANNEL=stderr and
 300       GST_TRACE_CHANNEL=stderr (use a shared channel, but what about the
 301       formats?)
 302   - debug log
 303     - the tracer subsystem would need to log the GST_TRACE at a level that is
 304       active
 305     - should the tracer call gst_debug_category_set_threshold() to ensure things
 306       work, even though the levels don't make a lot of sense here
 307   - make logging a tracer (a hook in gst_debug_log_valist, move
 308     gst_debug_log_default() to the tracer module)
 309     - log all debug log to the tracer log, some of the current logging
 310       statements can be replaced by generic logging as shown in the log-tracer
 311     - add tools/gst-debug to extract a human readable debug log from the trace
 312       log
 313   - we could maintain a list of log functions, where gst_tracer_log_trace() is
 314     the default one. This way e.g. gst-validate could consume the traces
 315     directly.
 316
 317 - when hooking into a timer, should we just have some predefined intervals?
 318   - can we add a tracer module that registers the timer hook? then we could do
 319       GST_TRACER="timer(10ms);rusage"
 320     right now the tracer hooks are defined as an enum though.
 321
 322 - when connecting to a running app, we can't easily get the 'current' state if
 323   logging is using a socket, as past events are not explicitly stored, we could
 324   determine the current topology and emit events with GST_CLOCK_TIME_NONE as ts
 325   to indicate that the events are synthetic.
 326
 327 - we need stable ids for scopes (threads, elements, pads)
 328   - the address can be reused
 329   - we can use gst_util_seqnum_next()
 330   - something like gst_object_get_path_string() won't work as objects are
 331     initially without parent
 332
 333 - right now the tracing-hooks are enabled/disabled from configure with
 334   --{enable,disable}-gst-tracer-hooks
 335   The tracer code and the plugins are still built though. We should add a
 336   --{enable,disable}-gst-tracer to disabled the whole system, allthough this
 337   is a bit confusing with the --{enable,disable}-trace option we have already.
 338
 339 Try it
 340 ======
 341 GST_DEBUG="GST_TRACER:7,GST_BUFFER*:7,GST_EVENT:7,GST_MESSAGE:7" GST_TRACERS=log gst-launch-1.0 fakesrc num-buffers=10 ! fakesink
 342 - traces for buffer flow in TRACE level
 343
 344 GST_DEBUG="GST_TRACER:7" GST_TRACERS="stats;rusage" GST_DEBUG_FILE=trace.log gst-launch-1.0 fakesrc num-buffers=10 sizetype=fixed ! queue ! fakesink
 345 gst-stats-1.0 trace.log
 346 - print some pipeline stats on exit
 347
 348 GST_DEBUG="GST_TRACER:7" GST_TRACERS="stats;rusage" GST_DEBUG_FILE=trace.log /usr/bin/gst-play-1.0 --interactive $HOME/Videos/movie.mp4
 349 ./scripts/gst-plot-traces.sh --format=png | gnuplot
 350 eog trace.log.*.png
 351 - get ts, average-cpuload, current-cpuload, time and plot
 352
 353 GST_DEBUG="GST_TRACER:7" GST_TRACERS=latency gst-launch-1.0 audiotestsrc num-buffers=10 ! audioconvert ! volume volume=0.7 ! autoaudiosink
 354 - print processing latencies
 355
 356 GST_TRACERS="leaks" gst-launch-1.0 videotestsrc num-buffers=10 ! fakesink
 357 - Raise a warning if a leak is detected
 358
 359 GST_DEBUG="GST_TRACER:7" GST_TRACERS="leaks(GstEvent,GstMessage)" gst-launch-1.0 videotestsrc num-buffers=10 ! fakesink
 360 - check if any GstEvent or GstMessage is leaked and raise a warning
 361
 362 Performance
 363 ===========
 364 run ./tests/benchmarks/tracing.sh <tracer(s)> <media>
 365
 366 egrep -c "(proc|thread)-rusage" trace.log
 367 658618
 368 grep -c "gst_tracer_log_trace" trace.log
 369 823351
 370
 371 - we can optimize most of it by using quarks in structures or eventually avoid structures totally
 372