speed up end_fde_sort using radix sort
When registering a dynamic unwinding frame the fde list is sorted.
Previously, we split the list into a sorted and an unsorted part,
sorted the latter using heap sort, and merged both. That can be
quite slow due to the large number of (expensive) comparisons.
This patch replaces that logic with a radix sort instead. The
radix sort uses the same amount of memory as the old logic,
using the second list as auxiliary space, and it includes two
techniques to speed up sorting: First, it computes the pointer
addresses for blocks of values, reducing the decoding overhead.
Second, it recognizes when the data has reached a sorted state,
allowing for early termination. When running out of memory
we fall back to pure heap sort, as before.
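As an illustration of the early-termination idea only, here is a
minimal sketch. It is not the code in unwind-dw2-fde.c: it sorts plain
uintptr_t keys instead of FDE pointers, so the block-wise decoding of
addresses is left out, and the name radix_sort_keys and the 8-bit
digit width are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from libgcc): LSD radix sort of N keys in A,
   using AUX (also N elements) as scratch space.  Stops as soon as the
   current buffer is found to be sorted.  */
static void
radix_sort_keys (uintptr_t *a, uintptr_t *aux, size_t n)
{
  enum { BITS = 8, BUCKETS = 1 << BITS };   /* assumed digit width */
  uintptr_t *src = a, *dst = aux;

  assert (n == 0 || a != aux);

  for (unsigned shift = 0; shift < sizeof (uintptr_t) * 8; shift += BITS)
    {
      size_t count[BUCKETS] = { 0 };
      uintptr_t prev = 0;
      int sorted = 1;

      /* Histogram pass; it also detects an already sorted buffer.  */
      for (size_t i = 0; i < n; i++)
        {
          uintptr_t v = src[i];
          if (v < prev)
            sorted = 0;
          prev = v;
          count[(v >> shift) & (BUCKETS - 1)]++;
        }
      if (sorted)
        break;                  /* early termination */

      /* Turn the counts into starting offsets (exclusive prefix sum).  */
      size_t sum = 0;
      for (size_t b = 0; b < BUCKETS; b++)
        {
          size_t c = count[b];
          count[b] = sum;
          sum += c;
        }

      /* Stable scatter into the other buffer, then swap roles.  */
      for (size_t i = 0; i < n; i++)
        {
          uintptr_t v = src[i];
          dst[count[(v >> shift) & (BUCKETS - 1)]++] = v;
        }
      uintptr_t *tmp = src;
      src = dst;
      dst = tmp;
    }

  /* The sorted data may have ended up in the scratch buffer.  */
  if (src != a)
    memcpy (a, src, n * sizeof *a);
}
```

Because the sortedness check shares the histogram pass, it adds only
one comparison per element, and keys that differ only in their low
bytes are finished after a few passes instead of one per byte.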
For this test program
#include <cstdio>
int main(int argc, char** argv) {
  return 0;
}
compiled with g++ -O -o hello -static hello.c we get with
perf stat -r 200 on a 5950X the following performance numbers:
old logic:
0,20 msec task-clock
930.834 cycles
3.079.765 instructions
0,00030478 +- 0,00000237 seconds time elapsed
new logic:
0,10 msec task-clock
473.269 cycles
1.239.077 instructions
0,00021119 +- 0,00000168 seconds time elapsed
libgcc/ChangeLog:
* unwind-dw2-fde.c: Use radix sort instead of split+sort+merge.