Introducing support for callchain profile-driven optimizations (#47664)
author Tomáš Rylek <trylek@microsoft.com>
Tue, 9 Feb 2021 00:27:58 +0000 (01:27 +0100)
committer GitHub <noreply@github.com>
Tue, 9 Feb 2021 00:27:58 +0000 (01:27 +0100)
commit ea93c3e78a315a1a948504db277bd799c896d4f8
tree 634571e344fe3ada6317df1eea96c22aab4b3805
parent 4c5b208474f438256938917de8e318ee0c22999e

After adding support for calculating callchain profile statistics
as an initial compile-time measure of code layout quality, I have
implemented an initial algorithm for method placement based on
those statistics. The algorithm is trivial: it sorts all
caller-callee pairs resolved from the profile by descending call
count, then walks the list and places methods in the order in
which they are encountered, putting non-profiled methods last.
In the System.Private.CoreLib compilation using the profile file

simple_new_model_two_apps_1_7.json

(not yet the latest one from Siva with signatures), I'm seeing
the following difference in the statistics:

--callchain-method:none

CHARACTERISTIC            | PAIR COUNT | CALL COUNT | PERCENTAGE
----------------------------------------------------------------
ENTRIES TOTAL             |        291 |     266229 |     100.00
RESOLVED ENTRIES          |        267 |     260172 |      97.72
UNRESOLVED ENTRIES        |         24 |       6057 |       2.28
NEAR (INTRA-PAGE) CALLS   |        145 |     109055 |      40.96
FAR (CROSS-PAGE) CALLS    |        122 |     151117 |      56.76

--callchain-method:sort

CHARACTERISTIC            | PAIR COUNT | CALL COUNT | PERCENTAGE
----------------------------------------------------------------
ENTRIES TOTAL             |        291 |     266229 |     100.00
RESOLVED ENTRIES          |        267 |     260172 |      97.72
UNRESOLVED ENTRIES        |         24 |       6057 |       2.28
NEAR (INTRA-PAGE) CALLS   |        237 |     260110 |      97.70
FAR (CROSS-PAGE) CALLS    |         30 |         62 |       0.02
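
For readers who want a concrete picture of the sort-based placement
described above, here is a minimal sketch in Python. It is not the
code from this change (which lives in ReadyToRunFileLayoutOptimizer.cs);
the function name, the dictionary input shape and the tie-breaking
behavior are assumptions made purely for illustration, and it assumes
every profiled method also appears in the full method list.

```python
from typing import Dict, List, Tuple


def sort_placement(
    call_counts: Dict[Tuple[str, str], int],
    all_methods: List[str],
) -> List[str]:
    """Illustrative only: order methods by hottest caller-callee pair first.

    call_counts maps (caller, callee) to a call count (hypothetical shape).
    all_methods is every method being compiled, in its original order.
    """
    # Sort pairs by descending call count. sorted() is stable, so pairs
    # with equal counts keep their input order (an arbitrary choice here).
    hot_pairs = sorted(call_counts.items(), key=lambda item: item[1], reverse=True)

    order: List[str] = []
    placed = set()

    def place(method: str) -> None:
        # Each method is placed once, at its first appearance.
        if method not in placed:
            placed.add(method)
            order.append(method)

    # Walk the sorted pairs and place methods as they are encountered.
    for (caller, callee), _count in hot_pairs:
        place(caller)
        place(callee)

    # Non-profiled methods go last, keeping their original relative order.
    for method in all_methods:
        place(method)

    return order


example_counts = {("A", "B"): 10, ("C", "D"): 500, ("B", "C"): 40}
example_order = sort_placement(example_counts, ["A", "B", "C", "D", "E"])
print(example_order)
```

In this toy input the hottest pair (C, D) is placed first, then B from
the (B, C) pair, then A, and finally the non-profiled method E.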

While the initial results seem encouraging, I suspect that is mostly
because the profile used is very small. With more complex profiles
and / or with composite compilation of the entire framework, things
may become substantially more complicated.

Thanks

Tomas
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunFileLayoutOptimizer.cs
src/coreclr/tools/aot/crossgen2/Program.cs
src/coreclr/tools/aot/crossgen2/Properties/Resources.resx