[CSSPGO][llvm-profgen] Instruction symbolization
author wlei <wlei@fb.com>
Mon, 19 Oct 2020 17:19:08 +0000 (10:19 -0700)
committer wlei <wlei@fb.com>
Fri, 20 Nov 2020 22:26:27 +0000 (14:26 -0800)
commit 0196b45ceaf8784eae058e6af4fd943f16a2d071
tree fd38c717d0ec4e40ccaa10c272f0abf3b308dd13
parent 32221694cb927b6acc7a8e16af7155e4e31418a4

This stack of changes introduces the `llvm-profgen` utility, which generates a profile data file from the given perf script data files for sample-based PGO. It is part of, but not limited to, the CSSPGO work. Specifically, to support context-sensitive profiles with or without pseudo probes, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. High throughput is achieved through multiple levels of sample aggregation, and a compatible profile format is generated in one stop at the end. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change adds support for instruction symbolization. Given the RVA of an instruction pointer, the full calling context can be printed side by side with the disassembly.
For example:
```
 Disassembly of section .text [0x0, 0x4a]:

 <funcA>:
     0: mov eax, edi                           funcA:0
     2: mov ecx, dword ptr [rip]               funcLeaf:2 @ funcA:1
     8: lea edx, [rcx + 3]                     fib:2 @ funcLeaf:2 @ funcA:1
     b: cmp ecx, 3                             fib:2 @ funcLeaf:2 @ funcA:1
     e: cmovl edx, ecx                         fib:2 @ funcLeaf:2 @ funcA:1
    11: sub eax, edx                           funcLeaf:2 @ funcA:1
    13: ret                                    funcA:2
    14: nop word ptr cs:[rax + rax]
    1e: nop

 <funcLeaf>:
    20: mov eax, edi                           funcLeaf:1
    22: mov ecx, dword ptr [rip]               funcLeaf:2
    28: lea edx, [rcx + 3]                     fib:2 @ funcLeaf:2
    2b: cmp ecx, 3                             fib:2 @ funcLeaf:2
    2e: cmovl edx, ecx                         fib:2 @ funcLeaf:2
    31: sub eax, edx                           funcLeaf:2
    33: ret                                    funcLeaf:3
    34: nop word ptr cs:[rax + rax]
    3e: nop

 <fib>:
    40: lea eax, [rdi + 3]                     fib:2
    43: cmp edi, 3                             fib:2
    46: cmovl eax, edi                         fib:2
    49: ret                                    fib:8
```
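
The context column in the output above lists inlined frames leaf first, joined by ` @ `. The sketch below shows one way such a string could be assembled. It is only an illustration: `Frame` and `formatContext` are hypothetical names, not the contents of `CallContext.h`, and the frames are assumed to have already been produced by a symbolizer.

```cpp
#include <string>
#include <vector>

// Hypothetical frame record (an assumption for illustration): a function
// name plus the line number shown after the colon in the output above.
struct Frame {
  std::string Func;
  unsigned Line;
};

// Joins frames, ordered leaf first, into "leaf:N @ caller:M @ ...".
// Returns an empty string when there are no frames, which matches the
// padding instructions (nop) that carry no context in the listing.
std::string formatContext(const std::vector<Frame> &Frames) {
  std::string Out;
  for (const Frame &F : Frames) {
    if (!Out.empty())
      Out += " @ ";
    Out += F.Func + ":" + std::to_string(F.Line);
  }
  return Out;
}
```

For instance, the three frames for offset `8` in `funcA` (`fib:2`, `funcLeaf:2`, `funcA:1`) would be joined in that order, so the innermost inlined callee appears first and the enclosing function last.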

Test Plan:
ninja check-llvm

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D89715
llvm/test/tools/llvm-profgen/symbolize.ll [new file with mode: 0644]
llvm/tools/llvm-profgen/CMakeLists.txt
llvm/tools/llvm-profgen/CallContext.h [new file with mode: 0644]
llvm/tools/llvm-profgen/ProfiledBinary.cpp
llvm/tools/llvm-profgen/ProfiledBinary.h