Optimize tracking of active allocations.
Instead of mapping the pointer to a pair of allocated size and
trace index, we intern the pair and map the pointer to a 32bit index.
This assumes that not more than 4.294.967.295 different pairs occur,
which could theoretically be broken by allocating different sizes
in a loop. Practically, I've never seen this happen. If it really
breaks we can always bump it to a 64bit index later.
Furthermore, this patch introduces a new PointerMap, which drastically
reduces the memory overhead of tracking the allocations. The benchmark
which also gets added here, shows that its overhead is only ~20%
compared to the 100% overhead of a simple hash map. Also note that
even google's sparse_hash_map only gets down to 50% overhead.
Even better, the PointerMap implementation is faster than google's
sparse_hash_map. This is possible by leveraging some information about
memory allocations, which return pointers to pages. Thus the pointers
are clustered and we can shrink a 64bit pointer to an common shared
base pointer and a small 16bit offset.
On my machine, the runtime performance is close to that of the simple
hash map. As such, I decided to move this allocation tracking into
heaptrack_interpret itself. This drastically reduces the file size
of heaptrack data files, sometimes cutting the size into half. Even
better, this speeds some heaptrack benchmarks, as less data is written
to disk. And of course all of this leads to an much improved
performance of heaptrack_gui and heaptrack_print.
13 files changed: