ktap is a new script-based dynamic tracing tool for Linux.

ktap uses a scripting language that lets users trace the Linux kernel
dynamically. It is designed to give operational insights with interoperability
that allow users to tune, troubleshoot, and extend the kernel and applications.
It is similar to Linux SystemTap and Solaris DTrace.

ktap follows different design principles from the mainstream Linux dynamic
tracing tools: it is based on bytecode, so it does not depend on GCC and does
not require compiling a kernel module for each script. This makes it safe to
use in production environments and lets it fulfill the tracing needs of the
embedded ecosystem.

* simple but powerful scripting language
* register-based interpreter (heavily optimized) in the Linux kernel
* small and lightweight
* no dependency on GCC for running scripts
* easy to use in embedded environments without debugging info
* support for tracepoint, kprobe, uprobe, function trace, timer, and more
* supported on x86, arm, ppc, mips

* Linux 3.1 or later (kernels earlier than 3.1 need some patches)
* CONFIG_EVENT_TRACING enabled
* CONFIG_PERF_EVENTS enabled
* CONFIG_DEBUG_FS enabled
  (make sure debugfs is mounted before running `insmod ktapvm`;
  to mount debugfs: `mount -t debugfs none /sys/kernel/debug/`)

Install elfutils-libelf-devel on RHEL-based distros, or libelf-dev on

Use `make NO_LIBELF=1` to build without libelf support.
libelf is required for resolving symbols to addresses in DSOs, and for sdt.

Note that these configuration options are normally enabled in Linux
distribution kernels, such as RHEL, Fedora, Ubuntu, etc.
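
Whether debugfs is already mounted can be checked before loading the module. The following is a minimal Python sketch and not part of ktap: it assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options), and the helper name `debugfs_mount_points` is made up for illustration.

```python
def debugfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is debugfs.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, fields separated by whitespace.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points
```

On a live system the input would typically be `open("/proc/mounts").read()`; an empty result suggests that the `mount -t debugfs` command above still needs to be run.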

1. Clone ktap from GitHub

        $ git clone http://github.com/ktap/ktap.git

        $ make       # generates the ktapvm kernel module and the ktap binary

3. Load the ktapvm kernel module (make sure debugfs is mounted)

        $ make load  # requires root or sudo access

        $ ./ktap scripts/helloworld.kp

ktap's syntax is designed to be friendly to people who know C,
to make scripting easy for kernel developers.

1. Variable declaration
The biggest syntax difference from C is that ktap is a dynamically typed
language, so you do not need to add any variable type declaration, just

All functions in ktap must be declared with the keyword "function".

Comments in ktap start with '#'; long comments are not supported yet.

You do not need to place a ';' at the end of a statement in ktap.
ktap uses a free syntax style, so you can choose whether to use ';' or not.

ktap uses nil as NULL; the result of any numeric operation on nil is nil.

ktap does not have an array structure, nor any pointer operations.

if/else in ktap is the same as in C.

There are two forms of for-loop in ktap:

        for (i = init, limit, step) { body }

This is the same as the following in C:

        for (i = init; i < limit; i += step) { body }

The second for-loop form is:

        for (k, v in pairs(t)) { body } # loops over all elements of the table

Note that ktap does not have a "continue" keyword, unlike C.
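
The numeric for-loop can be modelled outside ktap to check which values the loop variable takes. The Python sketch below follows the C equivalence stated above (`i < limit`, `i += step`) and is only an illustration: the generator name `numeric_for` is hypothetical, and the default step of 1 is an assumption borrowed from Lua, on which ktap's syntax is based.

```python
def numeric_for(init, limit, step=1):
    """Yield the values of i for `for (i = init, limit, step)`.

    Mirrors the C loop `for (i = init; i < limit; i += step)`.
    Only positive steps are modelled here.
    """
    if step <= 0:
        raise ValueError("this sketch only models positive steps")
    i = init
    while i < limit:
        yield i
        i += step
```

For example, `list(numeric_for(0, 10, 3))` visits 0, 3, 6 and 9, and stops before reaching the limit.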

Associative arrays are heavily used in ktap; they are also called tables.

        for (k, v in pairs(t)) { body } # loops over all elements of the table

# Built-in functions and libraries

## Built-in functions

Receives any number of arguments and prints their values.
print is not intended for formatted output, but only as a
quick way to show a value, typically for debugging.
For formatted output, use printf.

**printf (fmt, ...)**
Similar to C printf; use it for formatted string output.

Returns three values: the next function, the table t, and nil,
so that the construction

        for (k, v in pairs(t)) { body }

will iterate over all key-value pairs of table t.

If the argument is a string, returns the length of the string;
if the argument is a table, returns the number of pairs in the table.

Checks whether the current context is interrupt context.

Quits ktap execution, similar to the exit syscall.

Returns the current process pid.

Returns the current thread id.

Returns the current process uid.

Returns the current process exec name string.

Returns the current cpu id.

Returns the machine architecture, like x86, arm, etc.

Returns the Linux kernel version string, like 3.9, etc.

**user_string (addr)**
Receives a userspace address, reads a string from userspace, and returns it.

Receives a table and outputs a histogram of the table to the user.
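
To make the idea of a table histogram concrete, here is a small Python sketch that renders a table of counts as text bars. It is an illustration only: the function name `render_histogram`, the bar character and the column widths are assumptions, and the layout does not claim to match what ktap actually prints.

```python
def render_histogram(table, width=40):
    """Render {key: count} as text lines, largest count first.

    Each bar is scaled to the key's share of the total count.
    """
    total = sum(table.values())
    lines = []
    ordered = sorted(table.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, count in ordered:
        bar = "@" * (count * width // total) if total else ""
        lines.append("%-24s |%-*s %d" % (key, width, bar, count))
    return lines
```

Calling `print("\n".join(render_histogram(counts)))` on a table of per-event counts gives a quick view of which events dominate.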

**curr_task_info (offset, fetch_bytes)**
Fetches the value of the field at the given offset in the task_struct
structure. The fetch_bytes argument can be 4 or 8; if it is not given,
the default is 4.

Users may need to get the field offset with gdb, for example:

        (gdb) p &((struct task_struct *)0)->prio
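
The gdb expression above takes the address of a field in a structure placed at address 0, which is the byte offset of that field (the same quantity C's `offsetof` yields). The Python sketch below shows the idea with `ctypes`. The structure `DemoTask` is a hypothetical stand-in and NOT the real task_struct layout; real offsets depend on the kernel version and configuration and have to be taken from gdb.

```python
import ctypes


class DemoTask(ctypes.Structure):
    # Hypothetical three-field layout, used only to illustrate offsets.
    _fields_ = [
        ("state", ctypes.c_long),
        ("prio", ctypes.c_int),
        ("static_prio", ctypes.c_int),
    ]


def field_offset(struct_type, field_name):
    """Return the byte offset of field_name inside struct_type."""
    return getattr(struct_type, field_name).offset
```

The value returned for a field plays the role of the `offset` argument, and a 4-byte `int` field such as `prio` in this stand-in would correspond to the default `fetch_bytes` of 4.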

**print_backtrace ()**
Prints the stack info of the current task.

**kdebug.probe_by_id (eventdef_info, eventfun)**

This function is the underlying representation of the high-level tracing
primitives. Note that eventdef_info is just a userspace memory pointer that
refers to the real eventdef_info structure; the structure definition is:

        struct ktap_eventdef_info {
                int nr; /* the number of ids */
                int *id_arr; /* id array */

These ids are read from /sys/kernel/debug/tracing/events/$SYS/$EVENT/id

The second argument in the above example is a function:

        function eventfun () { action }
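
The id lookup described above can be sketched in a few lines of Python. This is not ktap's own code: the helper name `read_event_ids` is hypothetical, and the events directory is passed in as a parameter so the sketch does not hard-code a debugfs mount point.

```python
import glob
import os


def read_event_ids(events_root, system, event):
    """Collect the numeric ids of all events matching SYSTEM:EVENT.

    events_root is the tracing events directory, for example
    /sys/kernel/debug/tracing/events. Both system and event may
    contain shell-style wildcards such as '*'.
    """
    pattern = os.path.join(events_root, system, event, "id")
    ids = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as id_file:
            ids.append(int(id_file.read().strip()))
    return ids
```

With this helper, `read_event_ids(root, "sched", "*")` would gather one id per sched event, which corresponds to the `id_arr` array and its length `nr` in the structure above.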

**kdebug.probe_end (endfunc)**

This function invokes a function when tracing ends. It waits until the user
presses CTRL+C to stop tracing, then ktap calls the endfunc function; users can
show tracing results in that function, or do other things.

Users do not have to use the kdebug library directly; use the trace/trace_end
keywords instead.

# Linux tracing basics

tracepoints, probe, timer

# Tracing semantics in ktap

**trace EVENTDEF /FILTER/ { ACTION }**

This is the basic tracing block in ktap. You need to provide a specific
EVENTDEF string and your own event function.

There are four types of EVENTDEF: tracepoint, kprobe, uprobe, and sdt.

- tracepoint

        EventDef               Description
        --------------------   -------------------------------
        syscalls:*             trace all syscalls events
        syscalls:sys_enter_*   trace all syscalls entry events
        kmem:*                 trace all kmem related events
        sched:*                trace all sched related events
        sched:sched_switch     trace sched_switch tracepoint
        *:*                    trace all tracepoints in system

        All tracepoint events are based on:
        /sys/kernel/debug/tracing/events/$SYS/$EVENT

- ftrace (kernel newer than 3.3, and must be compiled with CONFIG_FUNCTION_TRACER)

        EventDef               Description
        --------------------   -------------------------------
        ftrace:function        trace kernel functions based on ftrace

        Users need to use a filter (/ip==*/) to trace specific functions.
        Functions must be listed in /sys/kernel/debug/tracing/available_filter_functions

> ***Note*** on function events
>
> perf has supported the ftrace:function tracepoint since Linux 3.3 (see the
> commit below). ktap is based on the perf callback, so the kernel must be
> newer than 3.3 to use this feature.
>
>     commit ced39002f5ea736b716ae233fb68b26d59783912
>     Author: Jiri Olsa <jolsa@redhat.com>
>     Date:   Wed Feb 15 15:51:52 2012 +0100
>
>         ftrace, perf: Add support to use function tracepoint in perf

- kprobe

        EventDef                 Description
        ----------------------   -----------------------------------
        probe:schedule           trace schedule function
        probe:schedule%return    trace schedule function return
        probe:SyS_write          trace SyS_write function
        probe:vfs*               trace functions matching the vfs* wildcard

        kprobe functions must be listed in /proc/kallsyms
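
Before using a wildcard such as `probe:vfs*`, it can help to preview which names in /proc/kallsyms match. The Python sketch below does that on lines in the usual kallsyms layout (address, type, name, optional module). It is an illustration under assumptions: the helper name `kallsyms_matches` is hypothetical, only text symbols (type `t` or `T`) are kept, and being listed does not by itself guarantee that the kernel accepts a probe on that function.

```python
from fnmatch import fnmatchcase


def kallsyms_matches(kallsyms_lines, pattern):
    """Return text-symbol names from kallsyms lines that match pattern.

    Each line is expected to look like: '<address> <type> <name> [module]'.
    """
    names = []
    for line in kallsyms_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        symbol_type, name = fields[1], fields[2]
        # Keep only text (code) symbols; data symbols cannot be functions.
        if symbol_type in ("t", "T") and fnmatchcase(name, pattern):
            names.append(name)
    return names
```

On a live system the lines would come from `open("/proc/kallsyms")`, for example `kallsyms_matches(open("/proc/kallsyms"), "vfs*")`.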

- uprobe

        EventDef                               Description
        ------------------------------------   ---------------------------
        probe:/lib64/libc.so.6:malloc          trace malloc function
        probe:/lib64/libc.so.6:malloc%return   trace malloc function return
        probe:/lib64/libc.so.6:free            trace free function
        probe:/lib64/libc.so.6:0x82000         trace function with file offset 0x82000
        probe:/lib64/libc.so.6:*               trace all libc functions

        symbol resolving needs libelf support

- sdt

        EventDef                               Description
        ------------------------------------   --------------------------
        sdt:/lib64/libc.so.6:lll_futex_wake    trace stapsdt lll_futex_wake
        sdt:/lib64/libc.so.6:*                 trace all static markers in libc

        sdt resolving needs libelf support

**trace_end { ACTION }**

## Tracing built-in variables

The event object. You can print it with print(argevent), which prints the event
as a human-readable string; the result is mostly the same as an entry in
/sys/kernel/debug/tracing/trace

The event name; each event has a name associated with it.

Gets argument 1..9 of the event object.

> ***Note*** on arg offsets
>
> The arg offset (1..9) is determined by the event format shown in debugfs.
>
>     # cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
>     field:char prev_comm[32];         <- arg1
>     field:pid_t prev_pid;             <- arg2
>     field:int prev_prio;              <- arg3
>     field:long prev_state;            <- arg4
>     field:char next_comm[32];         <- arg5
>     field:pid_t next_pid;             <- arg6
>     field:int next_prio;              <- arg7
>
> As shown, the tracepoint event sched:sched_switch has 7 arguments, from arg1 to
>
> Note that arg1 of a syscall event is the syscall number, not the first
> argument of the syscall function. Use arg2 as the first argument of the
> syscall function.
>
>     SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
>                                       <arg2>              <arg3>        <arg4>
>
> This is similar for kprobe and uprobe: arg1 of a kprobe/uprobe event is
> always `__probe_ip`, not the first argument given by the user. For example:
>
>     # ktap -e 'trace probe:/lib64/libc.so.6:malloc size=%di'
>
>     # cat /sys/kernel/debug/tracing/events/ktap_uprobes_3796/malloc/format
>     field:unsigned long __probe_ip;   <- arg1
>     field:u64 size;                   <- arg2
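
The numbering rule in the note above can be applied mechanically to a `format` file. The Python sketch below is a helper outside ktap, with the hypothetical name `map_event_args`: it numbers the `field:` entries in order and, as an assumption consistent with the listing above, skips the shared fields whose names start with `common_`.

```python
import re


def map_event_args(format_text):
    """Map 'arg1', 'arg2', ... to field names from a tracing format file."""
    args = {}
    for line in format_text.splitlines():
        match = re.match(r"\s*field:([^;]+);", line)
        if not match:
            continue
        declaration = match.group(1).strip()
        # The field name is the last token; drop any array suffix like [32].
        name = re.sub(r"\[.*\]$", "", declaration.split()[-1])
        if name.startswith("common_"):
            continue
        args["arg%d" % (len(args) + 1)] = name
    return args
```

Applied to the malloc uprobe format shown above, this yields `__probe_ip` for `arg1` and `size` for `arg2`, which matches the note.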

**tick-Ns { ACTION }**
**tick-Nsec { ACTION }**
**tick-Nms { ACTION }**
**tick-Nmsec { ACTION }**
**tick-Nus { ACTION }**
**tick-Nusec { ACTION }**

**profile-Ns { ACTION }**
**profile-Nsec { ACTION }**
**profile-Nms { ACTION }**
**profile-Nmsec { ACTION }**
**profile-Nus { ACTION }**
**profile-Nusec { ACTION }**
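
The timer names above combine a kind (`tick` or `profile`), a count N, and a unit suffix. As a reading aid, the Python sketch below splits such a name and converts the interval to microseconds. It is not ktap's parser: the function name `parse_timer_spec` is hypothetical, and it assumes the suffixes mean seconds (`s`, `sec`), milliseconds (`ms`, `msec`) and microseconds (`us`, `usec`).

```python
import re

# Assumed meaning of each unit suffix, expressed in microseconds.
_UNIT_TO_US = {
    "s": 1000000, "sec": 1000000,
    "ms": 1000, "msec": 1000,
    "us": 1, "usec": 1,
}


def parse_timer_spec(spec):
    """Split e.g. 'tick-10ms' into ('tick', 10000)."""
    match = re.fullmatch(r"(tick|profile)-(\d+)(s|sec|ms|msec|us|usec)", spec)
    if match is None:
        raise ValueError("not a timer event name: %r" % spec)
    kind, count, unit = match.groups()
    return kind, int(count) * _UNIT_TO_US[unit]
```

For instance, `parse_timer_spec("profile-2s")` reports a `profile` timer with an interval of 2,000,000 microseconds.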

architecture overview picture reference (png format)

# Advanced tracing pattern

Aggregation/Histogram

# Overhead/Performance

ktap has a much faster boot time than SystemTap (try the helloworld script).
ktap uses less memory than SystemTap, and some scripts show that ktap has
slightly less overhead than SystemTap. (We chose two scripts for the
comparison: function profile and stack profile. This does not mean that all
SystemTap scripts have more overhead than ktap.)

**Q: Why use a bytecode design?**
A: Bytecode is a clean and lightweight solution: you do not need the GCC
toolchain to compile every script; all you need is the ktapvm kernel module
and the userspace tool called ktap. Thanks to its language virtual machine
design, ktap is highly portable. Suppose you are working on a multi-arch
cluster and want to run a tracing script on each board: you do not need to
cross-compile the tracing script for every board; you only need to use the
ktap tool to run the script just in time.

A bytecode-based design also makes execution safer than native code

Reality already shows that SystemTap is not widely used in embedded Linux,
because of SystemTap's architectural design choices. It is a natural design
for Red Hat and IBM, because Red Hat and IBM focus on the server area,

**Q: What are the differences from SystemTap and DTrace?**
A: For SystemTap, the answer is already given in the question above:
SystemTap uses a translator design based on GCC, which trades off usability
for performance; that is the problem ktap wants to solve.

For DTrace, one design point in common is that DTrace also uses bytecode, so
basically DTrace and ktap are on the same road. Some projects aim to port
DTrace from Solaris to Linux, but that work is still in progress. DTrace
is rooted in Solaris, and there are many big differences between the Solaris
tracing infrastructure and Linux's.

DTrace is based on the D language, a subset of C. It is a restricted
language (for example, it has no for-loops) for safe use in production systems.
It seems that DTrace for Linux only supports the x86 architecture and does not
work on powerpc or arm/mips, so it is currently not suitable for embedded Linux.

DTrace uses CTF as input for debuginfo handling, compared with vmlinux for

On the license side, DTrace is released under the CDDL, which is incompatible
with the GPL (this is why it is impossible to upstream DTrace into mainline).

**Q: Why use a dynamically typed language rather than a statically typed language?**
A: It is hard to say which one is better. A dynamically typed language brings
efficiency and fast prototyping, but it loses type checking at the compile
phase, makes runtime mistakes easier, and needs many runtime checks. In
contrast, a statically typed language wins on programming safety and
performance. A statically typed language would suit interoperation with the
kernel, as the kernel is written mainly in C. Note that SystemTap and DTrace
both use statically typed languages.

ktap chose a dynamically typed language as its initial implementation.

**Q: Why do we need ktap for event tracing? There is already a built-in ftrace.**
A: This is a common question for all dynamic tracing tools, not only ktap.
ktap provides more flexibility than the built-in tracing infrastructure.
Suppose you need to print a global variable when a tracepoint is hit, or you
want to print a backtrace, or, going further, you want to store some info in an
associative array and display it as a histogram when tracing ends. ftrace can
handle some of these cases, but not all of them.
Overall, ktap gives you great flexibility to script your own trace

**Q: How about the performance? Is ktap slow?**
A: ktap is not slow. The bytecode is very high-level and based on Lua; the
language virtual machine is register-based (as opposed to stack-based) with
few instructions, and the table data structure is heavily optimized in ktapvm.
ktap uses per-cpu allocation in many places, without a global locking scheme,
so it is very fast when executing tracepoint callbacks.
Performance benchmarks show that the overhead of running ktap is nearly
10% (storing the event name into an associative array), compared with running
at full speed without any tracepoint enabled.

ktap will keep optimizing overhead over time; hopefully the overhead will
decrease to less than 5%, or even lower.
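
The register-based versus stack-based distinction mentioned in the answer above can be illustrated with a toy example. The Python sketch below evaluates `a + b * c` with two tiny made-up instruction sets: it is not ktap's bytecode, and all opcode and function names are hypothetical. It only shows why a register design tends to need fewer instructions for the same expression.

```python
def run_stack(program, env):
    """Run a toy stack machine; operands are pushed and popped."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(env[arg])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
    return stack.pop()


def run_register(program, registers):
    """Run a toy register machine; each instruction names its operands."""
    regs = dict(registers)
    for op, dst, lhs, rhs in program:
        if op == "add":
            regs[dst] = regs[lhs] + regs[rhs]
        elif op == "mul":
            regs[dst] = regs[lhs] * regs[rhs]
    return regs


# a + b * c as five stack instructions ...
STACK_PROGRAM = [("push", "a"), ("push", "b"), ("push", "c"),
                 ("mul", None), ("add", None)]
# ... and as two register instructions.
REGISTER_PROGRAM = [("mul", "t", "b", "c"), ("add", "t", "a", "t")]
```

Both programs compute the same value, but the register version dispatches two instructions instead of five, which is one reason interpreter overhead per event can stay low.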

**Q: Why not port a high-level language implementation into the kernel directly?**

A: I take the size of the VM and its memory footprint seriously. The Python VM
is large and is not suitable for embedding in the kernel, and Python has some
functionality

The bytecode sets of other high-level languages are also big: ktap has only 32
bytecodes, while Python/Java/Erlang have nearly two hundred.
There are also problems when porting those languages into the kernel, because
userspace programming differs from kernel programming in many ways: floating
point numbers, careful handling of sleeping code in the kernel, infinite loops
not being allowed in the kernel, multi-thread management, etc. So it is
impossible to port a language implementation into the kernel with little
adaptation work.

**Q: What's the status of ktap now?**
A: Basically it works on x86-32, x86-64, powerpc, and arm. It could also work
on other hardware architectures, but this is not proven yet (I don't have
enough hardware

If you find a bug, fix it yourself if you can, or report it to me.

**Q: How to hack ktap? I want to write some extensions for ktap.**

You can write your own library to fulfill your specific needs,
and you can write any script you want.

**Q: What's the plan for ktap? Is there a roadmap?**
A: The current plan is to deliver a stable ktapvm kernel module, more ktap scripts,

* [Linux Performance Analysis and Tools][LPAT]
* [DTrace Blog][dtraceblog]
* [DTrace User Guide][dug]
* [LWN: ktap -- yet another kernel tracer][lwn1]
* [LWN: Ktap almost gets into 3.13][lwn2]
* [staging: ktap: add to the kernel tree][ktap_commit]
* [ktap introduction in LinuxCon Japan 2013][lcj] (content is out of date)
* [ktap Examples by Brendan Gregg][KEBG]

[LPAT]: http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
[dtraceblog]: http://dtrace.org/blogs/
[dug]: http://docs.huihoo.com/opensolaris/dtrace-user-guide/html/index.html
[lwn1]: http://lwn.net/Articles/551314/
[lwn2]: http://lwn.net/Articles/572788/
[ktap_commit]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c63a164271f81220ff4966d41218a9101f3d0ec4
[lcj]: http://events.linuxfoundation.org/sites/events/files/lcjpcojp13_zhangwei.pdf
[KEBG]: http://www.brendangregg.com/ktap.html

* ktap was invented in 2002
* First RFC sent to LKML on 2012.12.31
* The code was released on GitHub on 2013.01.18
* ktap released v0.1 on 2013.05.21
* ktap released v0.2 on 2013.07.31
* ktap released v0.3 on 2013.10.29

For more release info, please look at RELEASES.txt in the project root directory.

1. simplest one-liner command to enable all tracepoints

        ktap -e "trace *:* { print(argevent) }"

2. syscall tracing on target process

        ktap -e "trace syscalls:* { print(argevent) }" -- ls

3. ftrace (kernel newer than 3.3, and must be compiled with CONFIG_FUNCTION_TRACER)

        ktap -e "trace ftrace:function { print(argevent) }"

        ktap -e "trace ftrace:function /ip==mutex*/ { print(argevent) }"

4. simple syscall tracing

                print(cpu(), pid(), execname(), argevent)

5. syscall tracing in histogram style

        trace syscalls:sys_enter_* {

        trace probe:do_sys_open dfd=%di fname=%dx flags=%cx mode=+4($stack) {
                print("entry:", execname(), argevent)

        trace probe:do_sys_open%return fd=$retval {
                print("exit:", execname(), argevent)

        trace probe:/lib/libc.so.6:malloc {
                print("entry:", execname(), argevent)

        trace probe:/lib/libc.so.6:malloc%return {
                print("exit:", execname(), argevent)

8. stapsdt tracing (userspace static marker)

        trace sdt:/lib64/libc.so.6:lll_futex_wake {
                print("lll_futex_wake", execname(), argevent)

        # trace all static markers in libc
        trace sdt:/lib64/libc.so.6:* {
                print(execname(), argevent)

                printf("time fired on one cpu\n");

                printf("time fired on every cpu\n");

10. FFI (call kernel functions from a ktap script; needs to be compiled with FFI=1)

                int printk(char *fmt, ...);

        C.printk("This message is called from ktap ffi\n")

More examples can be found in the [samples][samples_dir] directory.

[samples_dir]: https://github.com/ktap/ktap/tree/master/samples

Here is the complete syntax of ktap in extended BNF
(based on the Lua syntax: http://www.lua.org/manual/5.1/manual.html#5.1).

        chunk ::= {stat [';']} [laststat [';']]

        stat ::=  varlist '=' explist |
                 while exp { block } |
                 repeat block until exp |
                 if exp { block {elseif exp { block }} [else block] } |
                 for Name '=' exp ',' exp [',' exp] { block } |
                 for namelist in explist { block } |
                 function funcname funcbody |
                 local function Name funcbody |
                 local namelist ['=' explist]

        laststat ::= return [explist] | break

        funcname ::= Name {'.' Name} [':' Name]

        varlist ::= var {',' var}

        var ::=  Name | prefixexp '[' exp ']' | prefixexp '.' Name

        namelist ::= Name {',' Name}

        explist ::= {exp ','} exp

        exp ::=  nil | false | true | Number | String | '...' | function |
                 prefixexp | tableconstructor | exp binop exp | unop exp

        prefixexp ::= var | functioncall | '(' exp ')'

        functioncall ::=  prefixexp args | prefixexp ':' Name args

        args ::=  '(' [explist] ')' | tableconstructor | String

        function ::= function funcbody

        funcbody ::= '(' [parlist] ')' { block }

        parlist ::= namelist [',' '...'] | '...'

        tableconstructor ::= '{' [fieldlist] '}'

        fieldlist ::= field {fieldsep field} [fieldsep]

        field ::= '[' exp ']' '=' exp | Name '=' exp | exp

        fieldsep ::= ',' | ';'

        binop ::= '+' | '-' | '*' | '/' | '^' | '%' | '..' |
                  '<' | '<=' | '>' | '>=' | '==' | '!=' |