TODO

   1 -*-org-*-
   2 * TODO
   3 ** Automatic prototype discovery:
   4 *** Use debuginfo if available
   5     Alternatively, use debuginfo to generate a config file.
   6 *** Mangled identifiers contain partial prototypes themselves
   7     They don't contain return type info, which can change the
   8     parameter passing convention.  We could use it and hope for the
   9     best.
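
    To illustrate what a mangled identifier does and does not carry,
    here is a minimal sketch that assumes the Itanium C++ ABI mangling
    scheme and handles only unscoped function names with builtin,
    pointer and const parameter types.  The names partial_prototype and
    BUILTIN_CODES are hypothetical and not part of ltrace.  Note that
    no return type can be recovered from such a symbol.

```python
# Sketch only: a tiny subset of Itanium C++ ABI demangling.
# BUILTIN_CODES and partial_prototype are illustrative names, not ltrace API.
BUILTIN_CODES = {
    "v": "void", "b": "bool", "c": "char", "s": "short", "i": "int",
    "j": "unsigned int", "l": "long", "m": "unsigned long",
    "x": "long long", "f": "float", "d": "double",
}


def parse_type(symbol, pos):
    """Parse one parameter type starting at pos; return (text, new_pos)."""
    code = symbol[pos]
    if code == "P":  # pointer to the type that follows
        inner, pos = parse_type(symbol, pos + 1)
        return inner + "*", pos
    if code == "K":  # const-qualified type
        inner, pos = parse_type(symbol, pos + 1)
        return inner + " const", pos
    if code in BUILTIN_CODES:
        return BUILTIN_CODES[code], pos + 1
    raise ValueError("unsupported type code: " + code)


def partial_prototype(symbol):
    """Return (name, [parameter types]); the return type is not encoded."""
    if not symbol.startswith("_Z"):
        raise ValueError("not a mangled name")
    pos = 2
    digits = ""
    while pos < len(symbol) and symbol[pos].isdigit():
        digits += symbol[pos]
        pos += 1
    if not digits:
        raise ValueError("nested or special names are not handled here")
    length = int(digits)
    name = symbol[pos:pos + length]
    pos += length
    params = []
    while pos < len(symbol):
        text, pos = parse_type(symbol, pos)
        params.append(text)
    if params == ["void"]:  # "v" alone means an empty parameter list
        params = []
    return name, params
```

    For example, partial_prototype("_Z3fooiPKc") yields the name foo
    with parameters int and char const*, and nothing about what foo
    returns.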
  10 ** Automatically update list of syscalls?
  11 ** More operating systems (solaris?)
  12 ** Get rid of EVENT_ARCH_SYSCALL and EVENT_ARCH_SYSRET
  13 ** Implement displaced tracing
  14    A technique used in GDB (and in uprobes, I believe), whereby the
  15    instruction under breakpoint is moved somewhere else, and followed
  16    by a jump back to the original place.  When the breakpoint hits, the IP
  17    is moved to the displaced instruction, and the process is
  18    continued.  We avoid all the fuss with singlestepping and
  19    reenablement.
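
   The control flow can be sketched on a toy machine.  This is purely
   an illustration under strong assumptions: the instruction set
   (add, jmp, brk, halt), the function name run and the scratch area
   appended to memory are all hypothetical, and every instruction is
   assumed to be position-independent so that it can be executed from
   its displaced location.

```python
def run(program, breakpoints):
    """Execute a toy program with displaced breakpoints.

    program is a list of (op, arg) tuples; breakpoints is a list of
    addresses.  Returns (accumulator, list of breakpoint hits).
    """
    mem = list(program)
    slot_of = {}
    for addr in breakpoints:
        slot = len(mem)
        mem.append(mem[addr])          # displaced copy of the original instruction
        mem.append(("jmp", addr + 1))  # jump back past the breakpoint
        mem[addr] = ("brk", None)      # breakpoint replaces the original
        slot_of[addr] = slot

    acc, ip, hits = 0, 0, []
    while True:
        op, arg = mem[ip]
        if op == "brk":
            hits.append(ip)
            ip = slot_of[ip]  # move IP to the displaced copy and continue
        elif op == "add":
            acc += arg
            ip += 1
        elif op == "jmp":
            ip = arg
        elif op == "halt":
            return acc, hits
        else:
            raise ValueError("unknown op: " + op)
```

   The breakpoint stays in place for the whole run; there is no
   singlestep over the original instruction and nothing to re-enable.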
  20 ** Create different ltrace processes to trace different children
  21 ** Config file syntax
  22 *** mark some symbols as exported
  23     For PLT hits, only exported prototypes would be considered.  For
  24     symtab entry point hits, all would be.
  25
  26 *** named arguments
  27     This would be useful for replacing the arg1, elt2 etc.
  28
  29 *** parameter pack improvements
  30     The above format tweaks require that packs that expand to no types
  31     at all be supported.  If this works, then it should be relatively
  32     painless to implement conditionals:
  33
  34     | void ptrace(REQ=enum(PTRACE_TRACEME=0,...),
  35     |             if[REQ==0](pack(),pack(pid_t, void*, void *)))
  36
  37     This is of course dangerously close to a programming language, and
  38     I think ltrace should be careful to stay as simple as possible.
  39     (We can hook into Lua, or TinyScheme, or some such if we want more
  40     general scripting capabilities.  Implementing something ad-hoc is
  41     undesirable.)  But the above can be nicely expressed by pattern
  42     matching:
  43
  44     | void ptrace(REQ=enum[int](...)):
  45     |   [REQ==0] => ()
  46     |   [REQ==1 or REQ==2] => (pid_t, void*)
  47     |   [true] => (pid_t, void*, void*);
  48
  49     Or:
  50
  51     | int open(string, FLAGS=flags[int](O_RDONLY=00,...,O_CREAT=0100,...)):
  52     |   [(FLAGS & 0100) != 0] => (flags[int](S_IRWXU,...))
  53
  54     This would still require pretty complete expression evaluation.
  55     _Including_ pointer dereferences and such.  And e.g. in accept, we
  56     need subtraction:
  57
  58     | int accept(int, +struct(short, +array(hex(char), X-2))*, (X=uint)*);
  59
  60     Perhaps we should hook to something after all.
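
    As a rough model of the pattern-matching form above, the clauses
    can be seen as an ordered list of guards, each paired with a
    parameter pack, where the first guard that holds wins.  The
    sketch below encodes the ptrace example that way; PTRACE_CLAUSES
    and select_pack are hypothetical names, and the guards are plain
    Python callables rather than a parsed expression language.

```python
# Hypothetical clause table mirroring the ptrace example above.
# Each entry is (guard on REQ, parameter pack).
PTRACE_CLAUSES = [
    (lambda req: req == 0, ()),
    (lambda req: req in (1, 2), ("pid_t", "void*")),
    (lambda req: True, ("pid_t", "void*", "void*")),
]


def select_pack(clauses, req):
    """Return the pack of the first clause whose guard accepts req."""
    for guard, pack in clauses:
        if guard(req):
            return pack
    raise LookupError("no clause matched")
```

    An empty pack, as selected for REQ==0, is exactly the "pack that
    expands to no types at all" that the syntax would have to support.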
  61
  62 *** system call error returns
  63
  64     This is closely related to the above.  Take the following syscall
  65     prototype:
  66
  67     | long read(int,+string0,ulong);
  68
  69     string0 means the same as string(array(char, zero(retval))*).  But
  70     if read returns a negative value, that signifies errno.  But zero
  71     takes this at face value and is suspicious:
  72
  73     | read@SYS(3 <no return ...>
  74     | error: maximum array length seems negative
  75     | , "\n\003\224\003\n", 4096)                  = -11
  76
  77     Ideally we would do what strace does, e.g.:
  78
  79     | read@SYS(3, 0x12345678, 4096)                = -EAGAIN
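
    One possible shape for this, sketched under the assumption of the
    Linux convention that raw syscall return values in the range
    -4095..-1 encode an errno: format such values symbolically, and
    have the array-length expression treat them as zero instead of
    complaining.  The names MAX_ERRNO, format_sysret and
    max_array_length are hypothetical.

```python
import errno

# Assumption: Linux reserves -4095..-1 for error returns from raw syscalls.
MAX_ERRNO = 4095


def is_error(retval):
    """True if a raw syscall return value encodes an errno."""
    return -MAX_ERRNO <= retval < 0


def format_sysret(retval):
    """Format a raw return value, strace-style, e.g. -ENOENT."""
    if is_error(retval):
        name = errno.errorcode.get(-retval, str(-retval))
        return "-" + name
    return str(retval)


def max_array_length(retval):
    """Length limit for zero(retval): an error return means no data."""
    return 0 if is_error(retval) else retval
```

    With this, a failed read would yield a length limit of 0 for the
    buffer and a symbolic return value instead of the "maximum array
    length seems negative" error.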
  80
  81 *** errno tracking
  82     Some calls result in setting errno.  Somehow mark those, and on
  83     failure, show errno.  System calls return errno as a negative
  84     value (see the previous point).
  85
  86 *** second conversions?
  87     This definitely calls for some general scripting.  The goal is to
  88     have seconds in adjtimex calls show as e.g. 10s, 1m15s or some
  89     such.
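
    The formatting itself is simple; the open question is how to hook
    it into the type description.  A minimal sketch of the conversion,
    with format_seconds as a hypothetical name and the hours unit
    added as an assumption beyond the examples above:

```python
def format_seconds(total):
    """Render a whole number of seconds as e.g. 10s, 1m15s or 1h1m1s."""
    if total < 0:
        return "-" + format_seconds(-total)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if seconds or not parts:  # always print something, even for zero
        parts.append(f"{seconds}s")
    return "".join(parts)
```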
  90
  91 *** format should take arguments like string does
  92     Format should take value argument describing the value that should
  93     be analyzed.  The following rewriting rules would then apply:
  94
  95     | format       | format(array(char, zero)*) |
  96     | format(LENS) | X=LENS, format[X]          |
  97
  98     The latter expanded form would be canonical.
  99
 100     This depends on named arguments and parameter pack improvements
 101     (we need to be able to construct parameter packs that expand to
 102     nothing).
 103
 104 *** More fine-tuned control of right arguments
 105     Combination of named arguments and some extensions could take care
 106     of that:
 107
 108     | void func(X=hide(int*), long*, +pack(X)); |
 109
 110     This would show long* as input argument (i.e. the function could
 111     mangle it), and later show the pre-fetched X.  The "pack" syntax is
 112     utterly undeveloped as of now.  The general idea is to produce
 113     arguments that expand to some mix of types and values.  But maybe
 114     all we need is something like
 115
 116     | void func(out int*, long*); |
 117
 118     ltrace would know that out/inout/in arguments are given in the
 119     right order, but left pass should display in and inout arguments
 120     only, and right pass then out and inout.  + would be
 121     backward-compatible syntactic sugar, expanded like so:
 122
 123     | void func(int*, int*, +long*, long*);              |
 124     | void func(in int*, in int*, out long*, out long*); |
 125
 126     But sometimes we may want to see a different type on the way in and
 127     on the way out.  E.g. in asprintf, what's interesting on the way in
 128     is the address, but on the way out we want to see buffer contents.
 129     Does something like the following make sense?
 130
 131     | void func(X=void*, long*, out string(X)); |
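
    The expansion of + into in/out markers described earlier in this
    item can be sketched as follows.  It assumes that + turns the
    marked parameter and every parameter after it into an out
    parameter, as in the expansion example; expand_plus and passes
    are hypothetical names, and parameters are plain strings.

```python
def expand_plus(params):
    """Rewrite the + shorthand into explicit in/out markers."""
    expanded = []
    direction = "in"
    for param in params:
        if param.startswith("+"):
            direction = "out"  # this and all following parameters
            param = param[1:]
        expanded.append(f"{direction} {param}")
    return expanded


def passes(expanded):
    """Split parameters into those shown on the left and right pass."""
    left = [p for p in expanded if p.split()[0] in ("in", "inout")]
    right = [p for p in expanded if p.split()[0] in ("out", "inout")]
    return left, right
```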
 132
 133 ** Support for functions that never return
 134    This would be useful for __cxa_throw, presumably also for longjmp
 135    (do we handle that at all?) and perhaps a handful of others.
 136
 137 ** Support flag fields
 138    enum-like syntax, except disjunction of several values is assumed.
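
   A sketch of how such a field might be rendered, given a table of
   named bits.  The names format_flags and OPEN_FLAGS are
   hypothetical; O_RDONLY and O_CREAT use the values from the open
   example above, and O_WRONLY=01 is added as an illustrative
   assumption.  Bits without a name are printed in octal.

```python
# Illustrative table; only a few open(2) flags are listed.
OPEN_FLAGS = {"O_RDONLY": 0o0, "O_WRONLY": 0o1, "O_CREAT": 0o100}


def format_flags(value, names):
    """Render value as a |-separated disjunction of named bits."""
    parts = []
    remaining = value
    for name, bits in names.items():
        if bits != 0 and (value & bits) == bits:
            parts.append(name)
            remaining &= ~bits
    if remaining:
        parts.append("0%o" % remaining)  # leftover, unnamed bits
    if not parts:
        # Nothing set: use a zero-valued name if the table has one.
        zero_names = [n for n, bits in names.items() if bits == 0]
        return zero_names[0] if zero_names else "0"
    return "|".join(parts)
```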
 139 ** Support long long
 140    We currently can't define time_t on 32bit machines.  That means we
 141    can't describe a range of time-related functions.
 142
 143 ** Support signed char, unsigned char, char
 144    Also, don't format it as a character by default; the string lens can do
 145    it.  Perhaps introduce byte and ubyte and leave 'char' as alias of
 146    one of those with string lens applied by default.
 147
 148 ** Support fixed-width types
 149    Really we should keep everything as {u,}int{8,16,32,64} internally,
 150    and have long, short and others be translated to one of those
 151    according to architecture rules.  Maybe this could be achieved by a
 152    per-arch config file with typedefs such as:
 153
 154    | typedef ulong = uint64_t; |
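
   The per-arch table could look roughly like the sketch below.  The
   widths shown are the usual ILP32 and LP64 Linux conventions for
   i386 and x86_64; ARCH_TYPEDEFS and resolve are hypothetical names,
   and the table is far from complete.

```python
# Illustrative per-architecture typedefs to fixed-width types.
ARCH_TYPEDEFS = {
    "i386": {
        "short": "int16", "ushort": "uint16",
        "int": "int32", "uint": "uint32",
        "long": "int32", "ulong": "uint32",
    },
    "x86_64": {
        "short": "int16", "ushort": "uint16",
        "int": "int32", "uint": "uint32",
        "long": "int64", "ulong": "uint64",
    },
}


def resolve(type_name, arch):
    """Map a C-level type name to its fixed-width form on arch."""
    return ARCH_TYPEDEFS[arch].get(type_name, type_name)
```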
 155
 156 ** Support for ARM/AARCH64 types
 157    - ARM and AARCH64 both support half-precision floating point
 158      - there are two different half-precision formats, IEEE 754-2008
 159        and "alternative".  Both have 10 bits of mantissa and 5 bits of
 160        exponent, and differ only in how exponent==0x1F is handled.  In
 161        IEEE format, we get NaN's and infinities; in alternative
 162        format, this encodes the normalized value (-1)^S × 2^16 × (1.mant)
 163      - The FPCR.AHP bit of the Floating-Point Control Register (FPCR)
 164        selects the half-precision format where applicable.
 165    - AARCH64 supports fixed-point interpretation of {,double}words
 166      - e.g. fixed(int, X) (int interpreted as a fixed-point number with X
 167        binary digits of fraction).
 168    - AARCH64 supports 128-bit quad words in SIMD
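
   The two half-precision formats and the fixed-point interpretation
   listed above can be sketched as decoders.  decode_half and fixed
   are hypothetical names; the input to decode_half is the raw 16-bit
   pattern, and the exponent bias of 15 is that of IEEE 754-2008
   binary16, assumed to apply to the alternative format as well.

```python
def decode_half(bits, alternative=False):
    """Decode a 16-bit half-precision pattern to a Python float."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Zero and subnormals: no implicit leading one.
        return sign * mantissa * 2.0 ** -24
    if exponent == 0x1F and not alternative:
        # IEEE format: infinities and NaNs.
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normalized value; in the alternative format exponent 0x1F
    # lands here and gives (-1)^S * 2^16 * 1.mant.
    return sign * (1 + mantissa / 1024) * 2.0 ** (exponent - 15)


def fixed(value, fraction_bits):
    """Interpret an integer as fixed-point with the given fraction bits."""
    return value / (1 << fraction_bits)
```

   The two formats agree everywhere except for patterns whose
   exponent field is 0x1F, which is why the decoder needs to know the
   FPCR.AHP setting.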
 169
 170 ** Some more functions in vect might be made to take const*
 171    Or even marked __attribute__((pure)).
 172
 173 ** pretty printer support
 174    GDB supports python pretty printers.  We might want to hook this in
 175    and use it to format certain types.
 176
 177 * BUGS
 178 ** After a clone(), syscalls may be seen as sysrets in s390 (see trace.c:syscall_p())