Documentation/performance/JitOptimizerTodoAssessment.md

Optimizer Codebase Status/Investments
=====================================

There are a number of areas in the optimizer that we know we would invest in
improving if resources were unlimited.  This document lists them and some
thoughts about their current state and prioritization, in an effort to capture
the thinking about them that comes up in planning discussions.


Improved Struct Handling
------------------------

This is an area that has received recent attention, with the [first-class structs](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/first-class-structs.md)
work and the struct promotion improvements that went in for `Span<T>`.  Work here
is expected to continue and can happen incrementally.  Possible next steps:

 - Struct promotion stress mode (test mode to improve robustness/reliability)
 - Promotion of more structs; relax limits on e.g. field count (should generally
   help performance-sensitive code where structs are increasingly used to avoid
   heap allocations)
 - Improve handling of System V struct passing (I think we currently insert
   some unnecessary round-trips through memory at call boundaries due to
   internal representation issues)
 - Implicit byref parameter promotion without a shadow copy

We don't have specific benchmarks that we know would jump in response to any of
these.  We may well be able to find some with a bit of looking, though this may
be an area where current performance-sensitive code avoids structs.


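As a rough illustration of what promotion buys, the sketch below contrasts a loop that updates a small struct as one aggregate with the same loop after its fields have been split into independent scalar locals, which is the general idea behind struct promotion.  This is a source-level C++ analogy only; the JIT performs the transformation on its own IR, and the `Point` type and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-field struct standing in for a small value type.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Unpromoted shape: the struct is kept as one aggregate and every field
// access goes through it.
std::int32_t sum_aggregate(std::int32_t n) {
    Point p{0, 0};
    for (std::int32_t i = 0; i < n; ++i) {
        p.x += i;
        p.y += 2 * i;
    }
    return p.x + p.y;
}

// Promoted shape: each field is its own scalar local, so each one can be
// tracked (and potentially kept in a register) independently.
std::int32_t sum_promoted(std::int32_t n) {
    std::int32_t p_x = 0;
    std::int32_t p_y = 0;
    for (std::int32_t i = 0; i < n; ++i) {
        p_x += i;
        p_y += 2 * i;
    }
    return p_x + p_y;
}

// Both shapes must compute the same value for promotion to be legal.
void check_promotion() {
    for (std::int32_t n = 0; n < 100; ++n) {
        assert(sum_aggregate(n) == sum_promoted(n));
    }
}
```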
Exception Handling
------------------

This is increasingly important as C# language constructs like async/await and
certain `foreach` incantations are implemented with EH constructs, making them
difficult to avoid at source level.  The recent work on finally cloning, empty
finally removal, and empty try removal targeted this.  [Writethrough](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/eh-writethru.md)
is another key optimization enabler here, and we are actively pursuing it.  Other
things we've discussed include inlining methods with EH and computing funclet
callee-save register usage independently of main function callee-save register
usage, but I don't think we have any particular data pointing to either as a
high priority.


Loop Optimizations
------------------

We haven't been targeting benchmarks that spend a lot of time doing computations
in an inner loop.  Pursuing loop optimizations for the peanut butter effect
(small gains spread thinly across a lot of code) would seem odd.  So this simply
hasn't bubbled up in priority yet, though it's bound to eventually.


More Expression Optimizations
-----------------------------

We again don't have particular benchmarks pointing to key missing cases, and
balancing code quality (CQ) against JIT throughput (TP) will be delicate here,
so it would really help to have an appropriate benchmark suite to evaluate this
work against.


Forward Substitution
--------------------

This too needs an appropriate benchmark suite that I don't think we have at
this time.  The tradeoffs against increased register pressure and reduced
throughput need to be evaluated.  This also might make more sense to do if/when
we can handle SSA renames.


Value Number Conservatism
-------------------------

We have some frustrating phase-ordering issues resulting from this, but the
opt-repeat experiment indicated that they're not prevalent enough to merit
changing this right now.  Also, using SSA def as the proxy for value number
would require handling SSA renaming, so there's a big dependency chained to
this.  Maybe it's worth reconsidering the priority based on throughput?


High Tier Optimizations
-----------------------

We don't have that many knobs we can "crank up" (though we do have the tracked
assertion count and could switch inliner policies), nor do we have any sort of
benchmarking story set up to validate whether tiering changes are helping or
hurting.  We should get that benchmarking story sorted out and at least hook
up those two knobs.


Low Tier Back-Off
-----------------

We have some changes we know we want to make here: morph does more than it needs
to in minopts, and tier 0 should be doing throughput-improving inlines, as
opposed to minopts, which does no inlining.  It would be nice to have the
benchmarking story set up to measure the effect of such changes when they go in;
we should do that.


Async
-----

We've made note of the prevalence of async/await in modern code (and particularly
in web server code such as TechEmpower), and have some opportunities listed in
[#7914](https://github.com/dotnet/coreclr/issues/7914).  Some sort of study of
async peanut butter to find more opportunities is probably in order, but what
would that look like?


Address Mode Building
---------------------

One opportunity that's frequently visible in asm dumps is that more address
expressions could be folded into memory operands' address expressions.  This
would likely give a measurable codesize win.  It needs some thought about where
to run it in the phase list and how aggressive to be about, e.g., analyzing
across statements.


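To make the folding concrete, here is a minimal sketch that models an x64-style memory operand of the form `[base + index*scale + disp]` and shows a constant add being absorbed into the displacement instead of being computed by a separate instruction.  The `AddrMode` struct and all function names are hypothetical and exist only for illustration; they are not the JIT's data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a memory operand: [base + index*scale + disp].
struct AddrMode {
    std::uint64_t base;
    std::uint64_t index;
    std::uint32_t scale;  // 1, 2, 4, or 8 on x64
    std::int32_t  disp;   // signed 32-bit displacement
};

// Sign-extend a 32-bit displacement to 64 bits (wraps like address arithmetic).
std::uint64_t sext(std::int32_t d) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(d));
}

// The address the operand refers to.
std::uint64_t effective_address(const AddrMode& am) {
    return am.base + am.index * am.scale + sext(am.disp);
}

// Unfolded: a separate add computes p + offset first, and the memory
// operand then uses the temporary as its base with no displacement.
std::uint64_t address_unfolded(std::uint64_t p, std::int32_t offset,
                               std::uint64_t i, std::uint32_t scale) {
    const std::uint64_t tmp = p + sext(offset);  // the extra instruction
    return effective_address(AddrMode{tmp, i, scale, 0});
}

// Folded: the constant rides along as the operand's displacement.
std::uint64_t address_folded(std::uint64_t p, std::int32_t offset,
                             std::uint64_t i, std::uint32_t scale) {
    return effective_address(AddrMode{p, i, scale, offset});
}

// Folding is only valid if both forms name the same address.
void check_folding() {
    assert(address_unfolded(0x1000, 16, 3, 8) == address_folded(0x1000, 16, 3, 8));
    assert(address_unfolded(0x1000, -8, 1, 4) == address_folded(0x1000, -8, 1, 4));
}
```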
If-Conversion (cmov formation)
------------------------------

This hits big in microbenchmarks where it hits.  There's some work in flight
on this (see #7447 and #10861).


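For readers unfamiliar with the transformation, the sketch below shows the general idea of if-conversion at source level: a conditional branch is replaced by computing both candidates and selecting one with straight-line data flow, which is the shape a `cmov` implements in hardware.  The function names are hypothetical, and the mask arithmetic is just one way to express a branch-free select in C++; it is not a description of the in-flight work.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form: control flow decides which value is returned.
std::uint32_t max_branchy(std::uint32_t a, std::uint32_t b) {
    if (a < b) {
        return b;
    }
    return a;
}

// If-converted form: both values are available and one is selected by a
// mask, so there is no conditional jump to mispredict.
std::uint32_t max_select(std::uint32_t a, std::uint32_t b) {
    // mask is all ones when a < b, all zeros otherwise.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(a < b);
    return (b & mask) | (a & ~mask);
}

// The two forms must agree on every input.
void check_select() {
    assert(max_select(3, 9) == max_branchy(3, 9));
    assert(max_select(9, 3) == max_branchy(9, 3));
    assert(max_select(5, 5) == max_branchy(5, 5));
}
```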
Mulshift
--------

Replacing multiplication by constants with shift/add/lea sequences is a
classic optimization that keeps coming up in planning.  An [analysis](https://gist.github.com/JosephTremoulet/c1246b17ea2803e93e203b9969ee5a25#file-mulshift-md)
indicates that RyuJIT is already capitalizing on most of the opportunity here.
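
As a reminder of what these sequences look like, the following sketch writes a few constant multiplications as shifts combined with an add or subtract.  It is a source-level illustration with hypothetical function names, not output from RyuJIT; unsigned arithmetic is used so that the identities hold for every input, including ones that wrap.

```cpp
#include <cassert>
#include <cstdint>

// 10*x = 8*x + 2*x: two shifts and an add.
constexpr std::uint32_t mul10_shift_add(std::uint32_t x) {
    return (x << 3) + (x << 1);
}

// 5*x = x + 4*x: the base-plus-scaled-index shape that a single x86
// `lea r, [r + r*4]` can compute.
constexpr std::uint32_t mul5_lea_shape(std::uint32_t x) {
    return x + (x << 2);
}

// 7*x = 8*x - x: a shift followed by a subtract.
constexpr std::uint32_t mul7_shift_sub(std::uint32_t x) {
    return (x << 3) - x;
}

// Each sequence must match the plain multiply, including on wraparound.
void check_mulshift() {
    const std::uint32_t samples[] = {0u, 1u, 7u, 12345u, 0x7FFFFFFFu, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(mul10_shift_add(x) == x * 10u);
        assert(mul5_lea_shape(x) == x * 5u);
        assert(mul7_shift_sub(x) == x * 7u);
    }
}
```

Whether such a sequence actually beats a single multiply instruction depends on the target hardware and on how many operations the constant requires, which is part of why the remaining opportunity can be small.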