doc/libunwind-dynamic.tex

   1 \documentclass{article}
   2 \usepackage[fancyhdr,pdf]{latex2man}
   3
   4 \input{common.tex}
   5
   6 \begin{document}
   7
   8 \begin{Name}{3}{libunwind-dynamic}{David Mosberger-Tang}{Programming Library}{Introduction to dynamic unwind-info}libunwind-dynamic -- libunwind-support for runtime-generated code
   9 \end{Name}
  10
  11 \section{Introduction}
  12
  13 For \Prog{libunwind} to do its job, it needs to be able to reconstruct
  14 the \emph{frame state} of each frame in a call-chain.  The frame state
  15 describes the subset of the machine-state that consists of the
  16 \emph{frame registers} (typically the instruction-pointer and the
  17 stack-pointer) and all callee-saved registers (preserved registers).
  18 The frame state describes each register either by providing its
  19 current value (for frame registers) or by providing the location at
  20 which the current value is stored (callee-saved registers).
  21
  22 For statically generated code, the compiler normally takes care of
  23 emitting \emph{unwind-info} which provides the minimum amount of
  24 information needed to reconstruct the frame-state for each instruction
  25 in a procedure.  For dynamically generated code, the runtime code
  26 generator must use the dynamic unwind-info interface provided by
  27 \Prog{libunwind} to supply the equivalent information.  This manual
  28 page describes the format of this information in detail.
  29
  30 For the purpose of this discussion, a \emph{procedure} is defined to
  31 be an arbitrary piece of \emph{contiguous} code.  Normally, each
  32 procedure directly corresponds to a function in the source-language
  33 but this is not strictly required.  For example, a runtime
  34 code-generator could translate a given function into two separate
  35 (discontiguous) procedures: one for frequently-executed (hot) code and
  36 one for rarely-executed (cold) code.  Similarly, simple
  37 source-language functions (usually leaf functions) may get translated
  38 into code for which the default unwind-conventions apply and for such
  39 code, it is not strictly necessary to register dynamic unwind-info.
  40
  41 A procedure logically consists of a sequence of \emph{regions}.
  42 Regions are nested in the sense that the frame state at the end of one
  43 region is, by default, assumed to be the frame state for the next
  44 region.  Each region is thought of as being divided into a
  45 \emph{prologue}, a \emph{body}, and an \emph{epilogue}.  Each of them
  46 can be empty.  If non-empty, the prologue sets up the frame state for
  47 the body.  For example, the prologue may need to allocate some space
  48 on the stack and save certain callee-saved registers.  The body
  49 performs the actual work of the procedure but does not change the
  50 frame state in any way.  If non-empty, the epilogue restores the
  51 previous frame state and as such it undoes or cancels the effect of
  52 the prologue.  In fact, a single epilogue may undo the effect of the
  53 prologues of several (nested) regions.
  54
  55 We should point out that even though the prologue, body, and epilogue
  56 are logically separate entities, optimizing code-generators will
  57 generally interleave instructions from all three entities.  For this
  58 reason, the dynamic unwind-info interface of \Prog{libunwind} makes no
  59 distinction whatsoever between prologue and body.  Similarly, the
  60 exact set of instructions that make up an epilogue is also irrelevant.
  61 The only point in the epilogue that needs to be described explicitly
  62 by the dynamic unwind-info is the point at which the stack-pointer
  63 gets restored.  The reason this point needs to be described is that
  64 once the stack-pointer is restored, all values saved in the
  65 deallocated portion of the stack frame become invalid and hence
  66 \Prog{libunwind} needs to know about it.  The portion of the frame
  67 state not saved on the stack is assumed to remain valid through the end
  68 of the region.  For this reason, there is usually no need to describe
  69 instructions which restore the contents of callee-saved registers.
  70
  71 Within a region, each instruction that affects the frame state in some
  72 fashion needs to be described with an operation descriptor.  For this
  73 purpose, each instruction in the region is assigned a unique index.
  74 Exactly how this index is derived depends on the architecture.  For
  75 example, on RISC and EPIC-style architectures, instructions have a
  76 fixed size so it's possible to simply number the instructions.  In
  77 contrast, most CISC architectures use variable-length instruction
  78 encodings, so it is usually necessary to use a byte-offset as the index.  Given the
  79 instruction index, the operation descriptor specifies the effect of
  80 the instruction in an abstract manner.  For example, it might express
  81 that the instruction stores callee-saved register \Var{r1} at offset 16
  82 in the stack frame.
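
As a rough illustration of the two indexing schemes described above,
the following C sketch derives an instruction index from a code
address.  The helper names \Func{index\_fixed}() and
\Func{index\_byte\_offset}() are hypothetical and are not part of the
\Prog{libunwind} interface; the actual derivation is
architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the instruction at address ip on an
   architecture whose instructions all occupy insn_size bytes.  */
int32_t
index_fixed (uint64_t region_start, uint64_t ip, uint64_t insn_size)
{
  assert (insn_size > 0 && ip >= region_start);
  return (int32_t) ((ip - region_start) / insn_size);
}

/* Hypothetical helper: with variable-length encodings, the byte offset
   from the start of the region serves as the index.  */
int32_t
index_byte_offset (uint64_t region_start, uint64_t ip)
{
  assert (ip >= region_start);
  return (int32_t) (ip - region_start);
}
```

With an assumed 4-byte instruction size, the instruction 12 bytes into
a region has index 3 under the first scheme and index 12 under the
second.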
  83
  84 \section{Procedures}
  85
  86 A runtime code-generator registers the dynamic unwind-info of a
  87 procedure by setting up a structure of type \Type{unw\_dyn\_info\_t}
  88 and calling \Func{\_U\_dyn\_register}(), passing the address of the
  89 structure as the sole argument.  The members of the
  90 \Type{unw\_dyn\_info\_t} structure are described below:
  91 \begin{itemize}
  92 \item[\Type{void~*}\Var{next}] Private to \Prog{libunwind}.  Must not be used
  93   by the application.
  94 \item[\Type{void~*}\Var{prev}] Private to \Prog{libunwind}.  Must not be used
  95   by the application.
  96 \item[\Type{unw\_word\_t} \Var{start\_ip}] The start-address of the
  97   instructions of the procedure (remember: procedures are defined to be
  98   contiguous pieces of code, so a single code-range is sufficient).
  99 \item[\Type{unw\_word\_t} \Var{end\_ip}] The end-address of the
 100   instructions of the procedure (non-inclusive, that is,
 101   \Var{end\_ip}-\Var{start\_ip} is the size of the procedure in
 102   bytes).
 103 \item[\Type{unw\_word\_t} \Var{gp}] The global-pointer value in use
 104   for this procedure.  The exact meaning of the global-pointer is
 105   architecture-specific and on some architectures, it is not used at
 106   all.
 107 \item[\Type{int32\_t} \Var{format}] The format of the unwind-info.
 108   This member can be one of \Const{UNW\_INFO\_FORMAT\_DYNAMIC},
 109   \Const{UNW\_INFO\_FORMAT\_TABLE}, or
 110   \Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
 111 \item[\Type{union} \Var{u}] This union contains one sub-member
 112   structure for every possible unwind-info format:
 113   \begin{description}
 114   \item[\Type{unw\_dyn\_proc\_info\_t} \Var{pi}] This member is used
 115     for format \Const{UNW\_INFO\_FORMAT\_DYNAMIC}.
 116   \item[\Type{unw\_dyn\_table\_info\_t} \Var{ti}] This member is used
 117     for format \Const{UNW\_INFO\_FORMAT\_TABLE}.
 118   \item[\Type{unw\_dyn\_remote\_table\_info\_t} \Var{rti}] This member
 119     is used for format \Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
 120   \end{description}\
 121   The format of these sub-members is described in detail below.
 122 \end{itemize}
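
The half-open code range given by \Var{start\_ip} and \Var{end\_ip}
can be illustrated with a small C sketch.  The structure
\Type{code\_range} and both helpers are stand-ins invented for this
example; a real program would include the \Prog{libunwind} header and
use \Type{unw\_dyn\_info\_t} directly.

```c
#include <stdint.h>

/* Stand-in for the start_ip/end_ip members of unw_dyn_info_t.  */
struct code_range
{
  uint64_t start_ip;   /* address of the first byte of the procedure */
  uint64_t end_ip;     /* one past the last byte (non-inclusive) */
};

/* Size of the procedure in bytes: end_ip - start_ip.  */
uint64_t
proc_size (const struct code_range *r)
{
  return r->end_ip - r->start_ip;
}

/* Non-zero if ip lies inside the procedure.  Because end_ip is
   non-inclusive, ip == end_ip is NOT part of the procedure.  */
int
covers_ip (const struct code_range *r, uint64_t ip)
{
  return ip >= r->start_ip && ip < r->end_ip;
}
```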
 123
 124 \subsection{Proc-info format}
 125
 126 This is the preferred dynamic unwind-info format and it is generally
 127 the one used by full-blown runtime code-generators.  In this format,
 128 the details of a procedure are described by a structure of type
 129 \Type{unw\_dyn\_proc\_info\_t}.  This structure contains the following
 130 members:
 131 \begin{description}
 132
 133 \item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
 134   (human-readable) name of the procedure or 0 if no such name is
 135   available.  If non-zero, the string stored at this address must be
 136   ASCII NUL terminated.  For source languages that use name-mangling
 137   (such as C++ or Java) the string stored at this address should be
 138   the \emph{demangled} version of the name.
 139
 140 \item[\Type{unw\_word\_t} \Var{handler}] The address of the
 141   personality-routine for this procedure.  Personality-routines are
 142   used in conjunction with exception handling.  See the C++ ABI draft
 143   (http://www.codesourcery.com/cxx-abi/) for an overview and a
 144   description of the personality routine.  If the procedure has no
 145   personality routine, \Var{handler} must be set to 0.
 146
 147 \item[\Type{uint32\_t} \Var{flags}] A bitmask of flags.  At the
 148   moment, no flags have been defined and this member must be
 149   set to 0.
 150
 151 \item[\Type{unw\_dyn\_region\_info\_t~*}\Var{regions}] A NULL-terminated
 152   linked list of region-descriptors.  See section ``Region
 153   descriptors'' below for more details.
 154
 155 \end{description}
 156
 157 \subsection{Table-info format}
 158
 159 This format is generally used when the dynamically generated code was
 160 derived from static code and the unwind-info for the dynamic and the
 161 static versions is identical.  For example, this format can be useful
 162 when loading statically-generated code into an address-space in a
 163 non-standard fashion (i.e., through some means other than
 164 \Func{dlopen}()).  In this format, the details of a group of procedures
 165 are described by a structure of type \Type{unw\_dyn\_table\_info\_t}.
 166 This structure contains the following members:
 167 \begin{description}
 168
 169 \item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
 170   (human-readable) name of the procedure or 0 if no such name is
 171   available.  If non-zero, the string stored at this address must be
 172   ASCII NUL terminated.  For source languages that use name-mangling
 173   (such as C++ or Java) the string stored at this address should be
 174   the \emph{demangled} version of the name.
 175
 176 \item[\Type{unw\_word\_t} \Var{segbase}] The segment-base value
 177   that needs to be added to the segment-relative values stored in the
 178   unwind-info.  The exact meaning of this value is
 179   architecture-specific.
 180
 181 \item[\Type{unw\_word\_t} \Var{table\_len}] The length of the
 182   unwind-info (\Var{table\_data}) counted in units of words
 183   (\Type{unw\_word\_t}).
 184
 185 \item[\Type{unw\_word\_t~*}\Var{table\_data}] A pointer to the actual
 186   data encoding the unwind-info.  The exact format is
 187   architecture-specific (see architecture-specific sections below).
 188
 189 \end{description}
 190
 191 \subsection{Remote table-info format}
 192
 193 The remote table-info format has the same basic purpose as the regular
 194 table-info format.  The only difference is that when \Prog{libunwind}
 195 uses the unwind-info, it will keep the table data in the target
 196 address-space (which may be remote).  Consequently, the type of the
 197 \Var{table\_data} member is \Type{unw\_word\_t} rather than a pointer.
 198 This implies that \Prog{libunwind} will have to access the table-data
 199 via the address-space's \Func{access\_mem}() call-back, rather than
 200 through a direct memory reference.
 201
 202 From the point of view of a runtime-code generator, the remote
 203 table-info format offers no advantage and it is expected that such
 204 generators will describe their procedures either with the proc-info
 205 format or the normal table-info format.  The main reason that the
 206 remote table-info format exists is to enable the
 207 address-space-specific \Func{find\_proc\_info}() callback (see
 208 \SeeAlso{unw\_create\_addr\_space}(3)) to return unwind tables whose
 209 data remains in remote memory.  This can speed up unwinding (e.g., for
 210 a debugger) because it reduces the amount of data that needs to be
 211 loaded from remote memory.
 212
 213 \section{Region descriptors}
 214
 215 A region descriptor is a variable-length structure that describes how
 216 each instruction in the region affects the frame state.  Of course,
 217 most instructions in a region usually do not change the frame state and
 218 for those, nothing needs to be recorded in the region descriptor.  A
 219 region descriptor is a structure of type
 220 \Type{unw\_dyn\_region\_info\_t} and has the following members:
 221 \begin{description}
 222 \item[\Type{unw\_dyn\_region\_info\_t~*}\Var{next}] A pointer to the
 223   next region.  If this is the last region, \Var{next} is \Const{NULL}.
 224 \item[\Type{int32\_t} \Var{insn\_count}] The length of the region in
 225   instructions.  Each instruction is assumed to have a fixed size (see
 226   architecture-specific sections for details).  The value of
 227   \Var{insn\_count} may be negative in the last region of a procedure
 228   (i.e., it may be negative only if \Var{next} is \Const{NULL}).  A
 229   negative value indicates that the region covers the last \emph{N}
 230   instructions of the procedure, where \emph{N} is the absolute value
 231   of \Var{insn\_count}.
 232 \item[\Type{uint32\_t} \Var{op\_count}] The (allocated) length of
 233   the \Var{op} array.
 234 \item[\Type{unw\_dyn\_op\_t} \Var{op}] An array of dynamic unwind
 235   directives.  See Section ``Dynamic unwind directives'' for a
 236   description of the directives.
 237 \end{description}
 238 A region descriptor with an \Var{insn\_count} of zero is an
 239 \emph{empty region} and such regions are perfectly legal.  In fact,
 240 empty regions can be useful to establish a particular frame state
 241 before the start of another region.
 242
 243 A single region list can be shared across multiple procedures provided
 244 those procedures share a common prologue and epilogue (their bodies
 245 may differ, of course).  Normally, such procedures consist of a canned
 246 prologue, the body, and a canned epilogue.  This could be described by
 247 two regions: one covering the prologue and one covering the epilogue.
 248 Since the body length is variable, the latter region would need to
 249 specify a negative value in \Var{insn\_count} such that
 250 \Prog{libunwind} knows that the region covers the end of the procedure
 251 (up to the address specified by \Var{end\_ip}).
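
The effect of a negative \Var{insn\_count} can be sketched in C as
follows.  The structure \Type{region} mirrors only the \Var{next} and
\Var{insn\_count} members described above, and the function
\Func{region\_start\_index}() is a hypothetical helper written for this
example; it is not how \Prog{libunwind} is known to resolve regions
internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the next/insn_count members of unw_dyn_region_info_t.  */
struct region
{
  struct region *next;
  int32_t insn_count;
};

/* Return the procedure-relative index of the first instruction covered
   by target, given the procedure length in instructions.  Returns -1
   if target is not on the list.  */
int32_t
region_start_index (const struct region *list, const struct region *target,
                    int32_t proc_insn_count)
{
  int32_t start = 0;
  for (const struct region *r = list; r != NULL; r = r->next)
    {
      if (r->insn_count < 0)
        /* Negative count: the region covers the last N instructions.  */
        start = proc_insn_count + r->insn_count;
      if (r == target)
        return start;
      start += r->insn_count;
    }
  return -1;
}
```

For a shared list consisting of a 3-instruction prologue region and an
epilogue region with an \Var{insn\_count} of -2, a 10-instruction
procedure places the epilogue region at index 8, while a
20-instruction procedure places it at index 18.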
 252
 253 The region descriptor is a variable-length structure to make it
 254 possible to allocate all the necessary memory with a single
 255 memory-allocation request.  To facilitate the allocation of region
 256 descriptors, \Prog{libunwind} provides a helper routine with the
 257 following synopsis:
 258
 259 \noindent
 260 \Type{size\_t} \Func{\_U\_dyn\_region\_size}(\Type{int} \Var{op\_count});
 261
 262 This routine returns the number of bytes needed to hold a region
 263 descriptor with space for \Var{op\_count} unwind directives.  Note
 264 that the length of the \Var{op} array does not have to match exactly
 265 with the number of directives in a region.  Instead, it is sufficient
 266 if the \Var{op} array contains at least as many entries as there are
 267 directives, since the end of the directives can always be indicated
 268 with the \Const{UNW\_DYN\_STOP} directive.
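
The single-allocation idea can be sketched in C with a flexible array
member.  The types \Type{op} and \Type{region\_info} below are
stand-ins that follow the member lists given in this manual page, and
\Func{region\_size}() and \Func{region\_alloc}() are hypothetical
helpers; the sketch only illustrates the purpose of the helper routine
and is not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for unw_dyn_op_t (64-bit word assumed for val).  */
struct op
{
  int8_t tag;
  int8_t qp;
  int16_t reg;
  int32_t when;
  uint64_t val;
};

/* Stand-in for unw_dyn_region_info_t.  */
struct region_info
{
  struct region_info *next;
  int32_t insn_count;
  uint32_t op_count;
  struct op op[];          /* variable-length tail */
};

/* Bytes needed for a descriptor with room for op_count directives.  */
size_t
region_size (int op_count)
{
  return offsetof (struct region_info, op)
         + (size_t) op_count * sizeof (struct op);
}

/* Allocate a zero-filled descriptor with one request.  Since the STOP
   tag has the value 0, every directive slot initially reads as STOP.  */
struct region_info *
region_alloc (int op_count)
{
  struct region_info *r = calloc (1, region_size (op_count));
  if (r != NULL)
    r->op_count = (uint32_t) op_count;
  return r;
}
```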
 269
 270 \section{Dynamic unwind directives}
 271
 272 A dynamic unwind directive describes how the frame state changes
 273 at a particular point within a region.  The description is in
 274 the form of a structure of type \Type{unw\_dyn\_op\_t}.  This
 275 structure has the following members:
 276 \begin{description}
 277 \item[\Type{int8\_t} \Var{tag}] The operation tag.  Must be one
 278   of the \Type{unw\_dyn\_operation\_t} values described below.
 279 \item[\Type{int8\_t} \Var{qp}] The qualifying predicate that controls
 280   whether or not this directive is active.  This is useful for
 281   predicated architectures such as IA-64 or ARM, where the contents of
 282   another (callee-saved) register determines whether or not an
 283   instruction is executed (takes effect).  If the directive is always
 284   active, this member should be set to the manifest constant
 285   \Const{\_U\_QP\_TRUE} (this constant is defined for all
 286   architectures, predicated or not).
 287 \item[\Type{int16\_t} \Var{reg}] The number of the register affected
 288   by the instruction.
 289 \item[\Type{int32\_t} \Var{when}] The region-relative number of
 290   the instruction to which this directive applies.  For example,
 291   a value of 0 means that the effect described by this directive
 292   has taken place once the first instruction in the region has
 293   executed.
 294 \item[\Type{unw\_word\_t} \Var{val}] The value to be applied by the
 295   operation tag.  The exact meaning of this value varies by tag.  See
 296   Section ``Operation tags'' below.
 297 \end{description}
 298 It is perfectly legitimate to specify multiple dynamic unwind
 299 directives with the same \Var{when} value, if a particular instruction
 300 has a complex effect on the frame state.
 301
 302 Empty regions by definition contain no actual instructions and as such
 303 the directives are not tied to a particular instruction.  By
 304 convention, the \Var{when} member should be set to 0, however.
 305
 306 There is no need for the dynamic unwind directives to appear
 307 in order of increasing \Var{when} values.  If the directives happen to
 308 be sorted in that order, it may result in slightly faster execution,
 309 but a runtime code-generator should not go to extra lengths just to
 310 ensure that the directives are sorted.
 311
 312 IMPLEMENTATION NOTE: should \Prog{libunwind} implementations for
 313 certain architectures prefer the list of unwind directives to be
 314 sorted, it is recommended that such implementations first check
 315 whether the list happens to be sorted already and, if not, sort the
 316 directives explicitly before the first use.  With this approach, the
 317 overhead of explicit sorting is only paid when there is a real benefit
 318 and if the runtime code-generator happens to generate sorted lists
 319 naturally, the performance penalty is limited to a simple O(N) check.
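
The check-then-sort approach from the implementation note could look
roughly like the following C sketch.  The type \Type{op} is a stand-in
for \Type{unw\_dyn\_op\_t} (with a 64-bit word assumed for \Var{val}),
and the function names are hypothetical; \Var{n} is assumed to count
only the directives that precede the terminating stop directive.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for unw_dyn_op_t.  */
struct op
{
  int8_t tag;
  int8_t qp;
  int16_t reg;
  int32_t when;
  uint64_t val;
};

/* O(N) check: non-zero if the when values are non-decreasing.  */
int
ops_sorted (const struct op *ops, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (ops[i - 1].when > ops[i].when)
      return 0;
  return 1;
}

/* qsort comparator ordering directives by their when member.  */
static int
cmp_when (const void *a, const void *b)
{
  int32_t x = ((const struct op *) a)->when;
  int32_t y = ((const struct op *) b)->when;
  return (x > y) - (x < y);
}

/* Sort only if the cheap check fails.  */
void
ops_ensure_sorted (struct op *ops, size_t n)
{
  if (!ops_sorted (ops, n))
    qsort (ops, n, sizeof ops[0], cmp_when);
}
```

Note that \Func{qsort}() is not guaranteed to be stable, so directives
that share a \Var{when} value may change their relative order; an
implementation for which that order matters would need a stable sort
instead.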
 320
 321 \subsection{Operation tags}
 322
 323 The possible operation tags are defined by enumeration type
 324 \Type{unw\_dyn\_operation\_t} which defines the following
 325 values:
 326 \begin{description}
 327
 328 \item[\Const{UNW\_DYN\_STOP}] Marks the end of the dynamic unwind
 329   directive list.  All remaining entries in the \Var{op} array of the
 330   region-descriptor are ignored.  This tag is guaranteed to have a
 331   value of 0.
 332
 333 \item[\Const{UNW\_DYN\_SAVE\_REG}] Marks an instruction which saves
 334   register \Var{reg} to register \Var{val}.
 335
 336 \item[\Const{UNW\_DYN\_SPILL\_FP\_REL}] Marks an instruction which
 337   spills register \Var{reg} to a frame-pointer-relative location.  The
 338   frame-pointer-relative offset is given by the value stored in member
 339   \Var{val}.  See the architecture-specific sections for a description
 340   of the stack frame layout.
 341
 342 \item[\Const{UNW\_DYN\_SPILL\_SP\_REL}] Marks an instruction which
 343   spills register \Var{reg} to a stack-pointer-relative location.  The
 344   stack-pointer-relative offset is given by the value stored in member
 345   \Var{val}.  See the architecture-specific sections for a description
 346   of the stack frame layout.
 347
 348 \item[\Const{UNW\_DYN\_ADD}] Marks an instruction which adds
 349   the constant value \Var{val} to register \Var{reg}.  To subtract
 350   a constant value, store the two's-complement of the value in
 351   \Var{val}.  The set of registers that can be specified for this tag
 352   is described in the architecture-specific sections below.
 353
 354 \item[\Const{UNW\_DYN\_POP\_FRAMES}]
 355
 356 \item[\Const{UNW\_DYN\_LABEL\_STATE}]
 357
 358 \item[\Const{UNW\_DYN\_COPY\_STATE}]
 359
 360 \item[\Const{UNW\_DYN\_ALIAS}]
 361
 362 \end{description}
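
The two's-complement convention used by \Const{UNW\_DYN\_ADD} for
subtraction can be illustrated with a short C sketch.  It assumes a
64-bit \Type{unw\_word\_t}, represented here by \Type{uint64\_t}; the
helper names are hypothetical.

```c
#include <stdint.h>

/* Encode "subtract amount" as the word stored in val: unsigned
   negation yields the two's-complement of amount.  */
uint64_t
encode_sub (uint64_t amount)
{
  return (uint64_t) 0 - amount;
}

/* Read the word back as a signed adjustment (relies on the usual
   two's-complement conversion from uint64_t to int64_t).  */
int64_t
decode_delta (uint64_t val)
{
  return (int64_t) val;
}
```

For example, an instruction that allocates 16 bytes of stack by
decrementing the stack-pointer would be described with a \Var{val} of
0xfffffffffffffff0 under these assumptions.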
 363
 364 unw\_dyn\_op\_t
 365
 366 \_U\_dyn\_op\_save\_reg();
 367 \_U\_dyn\_op\_spill\_fp\_rel();
 368 \_U\_dyn\_op\_spill\_sp\_rel();
 369 \_U\_dyn\_op\_add();
 370 \_U\_dyn\_op\_pop\_frames();
 371 \_U\_dyn\_op\_label\_state();
 372 \_U\_dyn\_op\_copy\_state();
 373 \_U\_dyn\_op\_alias();
 374 \_U\_dyn\_op\_stop();
 375
 376 \section{IA-64 specifics}
 377
 378 - meaning of segbase member in table-info/table-remote-info format
 379 - format of table\_data in table-info/table-remote-info format
 380 - instruction size: each bundle is counted as 3 instructions, regardless
 381   of template (MLX)
 382 - describe stack-frame layout, especially with regards to sp-relative
 383   and fp-relative addressing
 384 - UNW\_DYN\_ADD can only add to ``sp'' (always a negative value); use
 385   POP\_FRAMES otherwise
 386
 387 \section{See Also}
 388
 389 \SeeAlso{libunwind(3)},
 390 \SeeAlso{\_U\_dyn\_register(3)},
 391 \SeeAlso{\_U\_dyn\_cancel(3)}
 392
 393 \section{Author}
 394
 395 \noindent
 396 David Mosberger-Tang\\
 397 Email: \Email{dmosberger@gmail.com}\\
 398 WWW: \URL{http://www.nongnu.org/libunwind/}.
 399 \LatexManEnd
 400
 401 \end{document}