Documentation/performance/JitOptimizerPlanningGuide.md

JIT Optimizer Planning Guide
============================

The goal of this document is to capture some thinking about the process used to
prioritize and validate optimizer investments.  The overriding goal of such
investments is to help ensure that the dotnet platform satisfies developers'
performance needs.


Benchmarking
------------

There are a number of public benchmarks which evaluate different platforms'
relative performance, so naturally dotnet's scores on such benchmarks give
some indication of how well it satisfies developers' performance needs.  The JIT
team has used some of these benchmarks, particularly [TechEmpower](https://www.techempower.com/benchmarks/)
and [Benchmarks Game](http://benchmarksgame.alioth.debian.org/), for scouting
out optimization opportunities and prioritizing optimization improvements.
While it is important to track scores on such benchmarks to validate performance
changes in the dotnet platform as a whole, when it comes to planning and
prioritizing JIT optimization improvements specifically, they aren't sufficient,
due to a few well-known issues:

 - For macro-benchmarks, such as TechEmpower, compiler optimization is often not
   the dominant factor in performance.  The effects of individual optimizer
   changes are most often in the sub-percent range, well below the noise level
   of the measurements, which will usually be at least 3% or so even for the
   most well-behaved macro-benchmarks.
 - Source-level changes can be made much more rapidly than compiler optimization
   changes.  This means that for anything we're trying to track where the whole
   team is effecting changes in source, runtime, etc., any particular code
   sequence we may target with optimization improvements may well be targeted
   with source changes in the interim, nullifying the measured benefit of the
   optimization change when it is eventually merged.  Source/library/runtime
   changes are in play for both TechEmpower and Benchmarks Game.
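
To put the first issue in rough numbers, the sketch below estimates how many
benchmark runs per build it would take to reliably detect a small improvement.
It is a back-of-the-envelope calculation, not a description of any existing
tooling: it assumes the "3%" noise figure can be treated as the run-to-run
standard deviation, that runs are independent and roughly normally distributed,
and that a two-sample z-test at 5% significance and 80% power is an acceptable
model.  The function name `runs_needed` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def runs_needed(noise_sd_pct, effect_pct, alpha=0.05, power=0.80):
    """Runs per build for a two-sided, two-sample z-test (an assumed model).

    noise_sd_pct: run-to-run standard deviation, as a percent of the mean.
    effect_pct:   true improvement to detect, as a percent of the mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired detection rate
    n = 2 * ((z_alpha + z_power) * noise_sd_pct / effect_pct) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Compare a noise-sized change with sub-percent changes at 3% noise.
    for effect in (3.0, 1.0, 0.5):
        runs = runs_needed(3.0, effect)
        print(f"{effect}% change at 3% noise: {runs} runs per build")
```

Under these assumptions a change as large as the noise is detectable with fewer
than twenty runs per build, while a 0.5% change needs several hundred, which is
why macro-benchmark scores alone are a poor tool for validating individual
optimizer changes.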

Compiler micro-benchmarks (like those in our [test tree](https://github.com/dotnet/coreclr/tree/master/tests/src/JIT/Performance/CodeQuality))
don't share these issues, and adding them as optimizations are implemented is
critical for validation and regression prevention; however, micro-benchmarks
often aren't as representative of real-world code, and therefore not as
reflective of developers' performance needs, so they aren't well suited for
scouting out and prioritizing opportunities.


Benefits of JIT Optimization
----------------------------

While source changes can more rapidly and dramatically effect changes to
targeted hot code sequences in macro-benchmarks, compiler changes have the
advantage that they apply broadly to all compiled code.  One of the best reasons
to invest in compiler optimization improvements is to capitalize on this.  A few
specific benefits:

 - Optimizer changes can effect "peanut-butter" improvements; by making an
   improvement which is small in any particular instance to a code sequence that
   is repeated thousands of times across a codebase, they can produce substantial
   cumulative wins.  These should accrue toward the standard metrics (benchmark
   scores and code size), but identifying the most profitable "peanut-butter"
   opportunities is difficult.  Improving our methodology for identifying such
   opportunities would be helpful; some ideas are below.
 - Optimizer changes can unblock coding patterns that performance-sensitive
   developers want to employ but consider prohibitively expensive.  They may
   have inelegant workarounds in their code, such as gotos for loop-exiting
   returns to work around poor block layout, manually scalarized structs to work
   around poor struct promotion, manually unrolled loops to work around lack of
   loop unrolling, limited use of lambdas to work around inefficient access to
   heap-allocated closures, etc.  The more the optimizer can improve such
   situations, the better, as it both increases developer productivity and
   increases the usefulness of abstractions provided by the language and
   libraries.  Finding a measurable metric to track this type of improvement
   poses a challenge, but would be a big help toward prioritizing and validating
   optimization improvements; again, some ideas are below.


Brainstorm
----------

Listed here are several ideas for undertakings we might pursue to improve our
ability to identify opportunities and validate/track improvements that mesh
with the benefits discussed above.  Thinking here is in the early stages, but
the hope is that with some thought/discussion some of these will surface as
worth investing in.

 - Is there telemetry we can implement/analyze to identify "peanut-butter"
   opportunities, or target "coding patterns"?  It would probably be easier to
   use this to evaluate/prioritize patterns we're considering targeting than to
   identify the patterns in the first place.
 - Can we construct some sort of "peanut-butter profiler"?  The idea would
   roughly be to aggregate samples/counters under particular input constructs
   rather than aggregate them under callstack.  Might it be interesting to
   group by MSIL opcode, or opcode pair, or opcode triplet?
 - It might behoove us to build up some SuperPMI (SPMI) traces that could be
   data-mined for any of these experiments.
 - We should make it easy to view machine code emitted by the JIT, and to
   collect profiles and correlate them with that machine code.  This could
   benefit any developers doing performance analysis of their own code.
   The JIT team has discussed this; options include building something on top of
   the profiler APIs, enabling COMPlus_JitDisasm in release builds, and shipping
   with or making easily available an alt JIT that supports JitDisasm.
 - Hardware companies maintain optimization/performance guides for their ISAs.
   Should we maintain one for MSIL and/or C# (and/or F#)?  If we hosted such a
   thing somewhere publicly votable, we could track which anti-patterns people
   find most frustrating to avoid, and track their subsequent removal.  Does
   such a guide already exist somewhere that we could use as a starting point?
   Should we collate GitHub issues or Stack Overflow questions to create such a
   thing?
 - Maybe we should expand our labels on GitHub so that there are sub-areas
   within "optimization"?  It could help prioritize by letting us compare the
   relative sizes of those buckets.
 - Can we more effectively leverage the legacy JIT codebases for comparative
   analysis?  We've compared micro-benchmark performance against Jit64 and
   manually compared disassembly of hot code; what else can we do?  One concrete
   idea:  run over some large corpus of code (SPMI?), and do a path-length
   comparison, e.g. by looking at each sequence of k MSIL instructions (for some
   small k), and for each combination of k opcodes collect statistics on the
   size of generated machine code (maybe using debug line number info to do the
   correlation?), then look for common sequences which are much longer with
   RyuJIT.
 - Maybe hook RyuJIT up to some sort of superoptimizer to identify opportunities?
 - Microsoft Research has done some experimenting that involved converting RyuJIT
   IR to LLVM IR; perhaps we could use this to identify common expressions that
   could be much better optimized.
 - What's a practical way to establish a metric of "unblocked coding patterns"?
 - How developers give feedback about patterns/performance could use some thought;
   the GitHub issue list is open, but does it need to be publicized somehow?  We
   perhaps should have some regular process where we pull issues over from other
   places where people report/discuss dotnet performance issues, like
   [Stack Overflow](https://stackoverflow.com/questions/tagged/performance+.net).
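
The opcode-sequence ideas in the list above (grouping profile data by opcode
pair or triplet, and comparing code size per k-opcode sequence across JITs)
share one aggregation step, sketched here in Python.  This is an illustration
rather than existing tooling: it assumes some earlier step has already
attributed a metric (machine-code bytes or profile samples) to each MSIL
instruction, which is the hard part the list leaves open, and the function
names, opcode streams, and byte counts are all made up.

```python
from collections import defaultdict


def kgram_totals(methods, k):
    """Sum a per-instruction metric over every window of k consecutive opcodes.

    methods: iterable of method bodies, each a list of (opcode, metric) pairs
             in IL order.  The metric is assumed to be precomputed, e.g.
             machine-code bytes or profile samples for that instruction.
    Returns {opcode k-tuple: [occurrences, total metric]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for body in methods:
        for i in range(len(body) - k + 1):
            window = body[i:i + k]
            key = tuple(op for op, _ in window)
            totals[key][0] += 1
            totals[key][1] += sum(metric for _, metric in window)
    return totals


def biggest_gaps(baseline, candidate, min_count=1):
    """Rank shared k-grams by candidate average metric over baseline average."""
    rows = []
    for key, (count, cand_total) in candidate.items():
        if key not in baseline or count < min_count:
            continue
        base_count, base_total = baseline[key]
        base_avg = base_total / base_count
        cand_avg = cand_total / count
        ratio = cand_avg / base_avg if base_avg else float("inf")
        rows.append((ratio, count, key))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Hypothetical data: one method, bytes of machine code per IL instruction.
    legacy = [[("ldloc", 3), ("ldc.i4", 0), ("add", 3), ("stloc", 3)]]
    ryujit = [[("ldloc", 3), ("ldc.i4", 5), ("add", 4), ("stloc", 3)]]
    ranked = biggest_gaps(kgram_totals(legacy, 2), kgram_totals(ryujit, 2))
    for ratio, count, key in ranked:
        print(f"{' '.join(key)}: {ratio:.2f}x over {count} occurrence(s)")
```

Feeding the same corpus through both JITs and sorting by ratio would surface the
sequences where one JIT's output is proportionally longest; the `min_count`
threshold keeps rare sequences from dominating the ranking.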