# Exception Handling Write Through Optimization.

Write through is an optimization applied to local variables that are live across
exception handling flow, such as a handler, filter, or finally, so that they can be
enregistered (treated as register candidates) throughout a method. For each
variable live across one of these constructs, the minimum requirement is that a
store to the variable's location on the stack is placed between a reaching
definition and any point of control flow leading to the handler, and that a
load is placed between any return from a filter or finally and an upward exposed use.
Conceptually this maintains the value of the variable on the stack across the
exceptional flow, which would kill any live registers. The transformation splits
a local variable into an enregisterable compiler temporary backed by
the local variable on the stack. For local vars that also appear within an EH
construct, a load from the stack local into a temp is inserted so that the temp
can be enregistered within the handler.

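
As a rough illustration of the invariant described above (a toy model, not JIT code), the following C++ sketch treats one local as a register proxy plus its stack home. The names `SplitLocal`, `define`, `clobberRegisters`, `reloadAfterEH`, and `valueSeenInHandler` are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Toy model of one EH-crossing local: `proxy` stands in for the enregistered
// temp, `stackHome` for the local's slot on the stack. Hypothetical names.
struct SplitLocal
{
    int proxy     = 0;
    int stackHome = 0;

    // A definition writes the proxy and is written through to the stack
    // before any control flow can reach a handler.
    void define(int value)
    {
        proxy     = value;
        stackHome = value;
    }

    // Exceptional flow is assumed to kill every live register.
    void clobberRegisters()
    {
        proxy = -1; // stand-in for an unknown register value
    }

    // On handler entry, or after a filter/finally returns, the proxy is
    // reloaded from the stack before any upward exposed use.
    int reloadAfterEH()
    {
        proxy = stackHome;
        return proxy;
    }
};

// Simulates: define, exception raised (registers lost), use in the handler.
int valueSeenInHandler(int value)
{
    SplitLocal local;
    local.define(value);
    local.clobberRegisters();
    return local.reloadAfterEH();
}
```

Because every definition is stored to the stack home, the handler observes the last defined value even though the register copy did not survive the exceptional edge.
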
Historically the JIT has not done this transformation because exception
handling was rare and thus the transformation was not worth the compile time.
Additionally, it was easy to recommend that users remove EH from
performance critical methods, since they had control of where the EH appeared.
Neither of these points remains true as we increase our focus on cloud
workloads. Non-blocking async calls are common in performance
critical paths for these workloads, and async injects exception handling
constructs to implement the feature. This, in combination with the
long-standing use of EH in 'foreach' and 'using' statements, means that we are seeing
EH constructs that are difficult for the user to manage or remove high in the
profile (TechEmpower on Kestrel is a good example). It is also worth noting
that in MSIL basic operations can raise semantically meaningful exceptions
(unlike, say, C++, where an explicit throw is required to raise an exception), so
injected handlers can end up pessimizing a number of local variables in the
method. Given this combination of issues in cloud workloads, doing the
transformation should be a clear benefit.

The goal of the design is to preserve the constraints listed above, i.e. to
preserve a correct value on the stack for any local var that crosses an EH edge
in the flow graph. To ensure that the broad set of global optimizations can act
on the IR shape produced by this transformation, and that phase ordering issues
do not block enregistration opportunities, the write through phase will be
staged after morph, just prior to SSA build. It will do a full walk of the
IR, rewriting appearances to proxies and inserting reloads at the
appropriate blocks in the flow graph as indicated by EH control flow semantics.
To preserve the needed values on the stack, a store will also be inserted after
every definition to copy the new value in the proxy back to the stack location.
This will leave a non-optimal number of stores (too many), with the strategy
that the more expensive analysis to eliminate or better place stores will be
staged as a global optimization in a higher compilation tier.

There are a number of wrinkles informing this design based on how the JIT models EH:
- The JIT does not explicitly model the exception flow, so a given block and
  even a given statement within a block may have multiple exception-raising sites.
- For statements within protected regions, and for all variables live into any
  reachable handler, the JIT assumes all definitions within the region can
  potentially reach uses in the handlers, since the exact interleaving of
  definition points and exception points is not known. Hence every definition
  is a reaching definition, even both values from back-to-back stores with no
  read of the variable in between.
- The JIT does not model which handlers are reachable from a given protected region,
  so it considers a variable live into a handler if it is live into any handler in the method.

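
The conservative treatment in the last bullet can be sketched as a union over handler live-in sets. This is a minimal sketch under stated assumptions: `Block`, `isHandlerEntry`, `liveIn`, and `ehCrossingVars` are hypothetical names for a toy flow graph and do not correspond to the JIT's own data structures.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy flow-graph block. `isHandlerEntry` and `liveIn` are hypothetical fields,
// not the JIT's BasicBlock members.
struct Block
{
    bool                  isHandlerEntry;
    std::set<std::string> liveIn;
};

// Conservative approximation: a variable is treated as EH-crossing if it is
// live into ANY handler in the method, regardless of which protected region
// could actually reach that handler.
std::set<std::string> ehCrossingVars(const std::vector<Block>& blocks)
{
    std::set<std::string> result;
    for (const Block& block : blocks)
    {
        if (block.isHandlerEntry)
        {
            result.insert(block.liveIn.begin(), block.liveIn.end());
        }
    }
    return result;
}
```

A variable that is live only into non-handler blocks is left out, while anything live into at least one handler is included for the whole method.
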
It is possible to do better than the "store every definition" approach outlined
in the design, but the expectation is that this would require possibly
modifying the model in the JIT and staging more throughput intensive analyses.
With these considerations, this design was selected and further improvements
were left to future optimization.

To identify EH crossing local vars, global liveness is necessary. This comes at
the significant cost of the liveness analysis. To mitigate this, the write
through phase is staged immediately before SSA build for the global optimizer.
Since the typical case is that there is no EH, the liveness analysis in write
through can be reused directly by SSA build. For the case where EH local vars
are present, liveness today must be rebuilt for SSA since new local vars have
been added, but an incremental update to the RyuJIT liveness analysis
(worklist based live analysis) can be implemented to improve the throughput.
Additionally, the write through transformation does a full IR walk, which is also
expensive, to replace EH local var appearances with proxies and insert
transfers to and from the stack for EH flow. Given this, initial implementations
may need to be staged as part of AOT (crossgen) compiles until tiering can move
the more expensive analysis out of the startup path.

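
To make the "worklist based live analysis" idea concrete, here is a minimal sketch of backward liveness driven by a worklist. It is a textbook formulation on a toy block summary, not the RyuJIT implementation; `LiveBlock`, `updateLiveness`, and the `seed` parameter are hypothetical. The incremental use assumes that only new locals were added since the previous solution was computed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

using VarSet = std::set<std::string>;

// Toy block summary; these fields are hypothetical, not the JIT's BasicBlock.
struct LiveBlock
{
    VarSet           use;     // variables read before any write in the block
    VarSet           def;     // variables written in the block
    std::vector<int> succs;   // indices of successor blocks
    VarSet           liveIn;  // solution: live on entry
    VarSet           liveOut; // solution: live on exit
};

// Backward liveness driven by a worklist. `seed` lists the blocks to visit
// first: every block for a full analysis, or only the blocks whose use/def
// sets gained new locals for an incremental update on top of an old solution.
void updateLiveness(std::vector<LiveBlock>& blocks, const std::vector<int>& seed)
{
    // Build predecessor lists so a change can be pushed backwards.
    std::vector<std::vector<int>> preds(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); i++)
    {
        for (int succ : blocks[i].succs)
        {
            preds[succ].push_back(static_cast<int>(i));
        }
    }

    std::deque<int>   worklist;
    std::vector<bool> queued(blocks.size(), false);
    for (int b : seed)
    {
        if (!queued[b])
        {
            queued[b] = true;
            worklist.push_back(b);
        }
    }

    while (!worklist.empty())
    {
        int b = worklist.front();
        worklist.pop_front();
        queued[b] = false;
        LiveBlock& block = blocks[b];

        // liveOut = union of the successors' liveIn sets.
        VarSet out;
        for (int succ : block.succs)
        {
            out.insert(blocks[succ].liveIn.begin(), blocks[succ].liveIn.end());
        }

        // liveIn = use + (liveOut - def).
        VarSet in = block.use;
        for (const std::string& var : out)
        {
            if (block.def.count(var) == 0)
            {
                in.insert(var);
            }
        }

        block.liveOut = out;
        if (in != block.liveIn)
        {
            block.liveIn = in;
            // Only predecessors can be affected by a changed liveIn.
            for (int pred : preds[b])
            {
                if (!queued[pred])
                {
                    queued[pred] = true;
                    worklist.push_back(pred);
                }
            }
        }
    }
}
```

Seeding with every block gives the full analysis; seeding with only the blocks that reference a newly created proxy revisits just the part of the flow graph the new local can affect.
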
On the IR directly before SSA build:
- Run global liveness to identify local vars that cross EH boundaries (as a
  byproduct of this, these local vars are marked "do not enregister").
- For each EH local var, create a new local var "proxy" that can be enregistered.
- Iterate each block in the flow graph doing the following:
  * For each tree in the block, do a post order traversal and
    - Replace all appearances of EH local vars with the defined proxy
    - Insert a copy of proxy definition back to the EH local var (on the stack)
  * If EH handler entry block insert reloads from EH local var to proxy at
  * If finally or filter exit, insert reloads from EH local var to proxy at
- For method entry block, insert reloads from parameter EH local vars to

At the end no proxy should be live across EH flow and all value updates will be
written back to the stack location.

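
The block walk above can be sketched on a toy IR as follows. This is an illustration under stated assumptions rather than the prototype's code: `OpKind`, `Op`, `Block`, `proxyFor`, and `writeThru` are hypothetical, the input is assumed to contain only `Def` and `Use` operations, and reloads are placed at the head of a handler entry block, which is an assumption consistent with where they appear in the BB04 dump in the example. Reloads for finally or filter exits and for the method entry block are left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class OpKind
{
    Def,       // write of `var`
    Use,       // read of `var`
    WriteBack, // stack local `var` = its proxy
    Reload     // proxy of `var` = stack local `var`
};

struct Op
{
    OpKind      kind;
    std::string var;
};

struct Block
{
    bool            isHandlerEntry;
    std::vector<Op> ops;
};

// Hypothetical naming scheme for the enregisterable proxy of an EH local.
std::string proxyFor(const std::string& local)
{
    return local + "_proxy";
}

// Rewrites Def/Use of EH locals to their proxies, stores the proxy back to the
// stack after every definition, and reloads proxies at handler entry.
void writeThru(std::vector<Block>& blocks, const std::set<std::string>& ehVars)
{
    for (Block& block : blocks)
    {
        std::vector<Op> rewritten;

        if (block.isHandlerEntry)
        {
            for (const std::string& local : ehVars)
            {
                rewritten.push_back({OpKind::Reload, local});
            }
        }

        for (const Op& op : block.ops)
        {
            if (ehVars.count(op.var) == 0)
            {
                rewritten.push_back(op); // not an EH local, leave untouched
                continue;
            }
            rewritten.push_back({op.kind, proxyFor(op.var)});
            if (op.kind == OpKind::Def)
            {
                rewritten.push_back({OpKind::WriteBack, op.var});
            }
        }
        block.ops = std::move(rewritten);
    }
}
```

Two back-to-back definitions of the same EH local each receive their own write back in this sketch, which is the "too many stores" shape the design accepts for now.
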
### Alternate Algorithm: In LSRA

* Add a flag to identify Intervals as "WriteThru". This would be set on all lclVars
  considered by liveness to be exceptVars.
* Additionally, add a flag to identify RefPositions as "WriteThru". The motivation
  for having both is that in the exception var case we want to create all defs as
  write-thru, but for other purposes we may want to make some defs write-thru
  (i.e. they spill but the target register remains live), but not all defs for a given lclVar.
* During liveness, mark exception vars as `lvLiveInOutOfHndlr`, but not `lvDoNotEnregister`.
* During interval creation, if a lclVar is marked `lvLiveInOutOfHndlr`, set `isWriteThru` on the interval.
* Set handler entry blocks as having no predecessor for register-mapping purposes.
  - Leave the inVarToRegMaps empty (all incoming vars on stack)
* Set the outVarToRegMap to empty for EH exit blocks.
* During allocation, treat isWriteThru interval defs and uses differently:
  - A def is always marked writeThru if it is assigned a register. If it doesn't get a register
    at all, it is marked spillAfter as per usual.
  - A use is never marked spillAfter (as the stack location is always valid at a use).
* During resolution/writeback:
  - Mark all isWriteThru defs with `GTF_SPILL`, as for `spillAfter`, but keep the reg assignment,
    and the interval stays active.
  - Assert that uses of isWriteThru intervals are never marked spillAfter
* During `genFnProlog()`, ensure that incoming reg parameters that have register assignments also
  get stored to stack if they are marked `lvLiveInOutOfHndlr`.

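
The allocation-time rules for write-thru intervals can be sketched as a small decision function. The structs below are simplified stand-ins for illustration only and are not LSRA's actual `Interval` and `RefPosition` classes; `setSpillFlags` and its `allocatorWantsSpill` parameter (the allocator's usual spill decision) are hypothetical.

```cpp
#include <cassert>

// Simplified stand-in: set when the lclVar is marked lvLiveInOutOfHndlr.
struct Interval
{
    bool isWriteThru;
};

// Simplified stand-in for one def or use of an interval.
struct RefPosition
{
    bool isDef;
    bool writeThru;  // store to the stack, register stays live
    bool spillAfter; // store to the stack, register is given up
};

// `gotRegister`: whether allocation assigned a register to this reference.
// `allocatorWantsSpill`: hypothetical stand-in for the normal spill decision.
void setSpillFlags(const Interval& interval, RefPosition& ref, bool gotRegister, bool allocatorWantsSpill)
{
    ref.writeThru  = false;
    ref.spillAfter = allocatorWantsSpill;
    if (!interval.isWriteThru)
    {
        return; // ordinary intervals keep the allocator's decision
    }

    if (ref.isDef)
    {
        // A def with a register is written through; without one it is
        // marked spillAfter as usual.
        ref.writeThru  = gotRegister;
        ref.spillAfter = !gotRegister;
    }
    else
    {
        // The stack location is always valid at a use, so a use never spills.
        ref.spillAfter = false;
    }
}
```
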
#### Challenges/Issues with the LSRA approach above:

* Liveness currently adds all exceptVars to the live-in for blocks where `ehBlockHasExnFlowDsc` returns true.
  This results in more "artificial" liveness than strictly entry to and exit from EH regions.
* In some cases, write-thru may be worse, performance-wise, than always using memory, if the EH local is
  infrequently referenced in non-EH code. This is a slightly different issue than known spill placement
  and allocation issues, but is related (i.e. when to choose not to keep the register live, and simply
  create the value in memory if that doesn't require a register).

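
One way to reason about the second issue is a deliberately crude cost comparison. Everything here is an assumption made for illustration: the `RefCounts` fields, the unit cost of one per memory access, and the idea that register accesses are free are not taken from the JIT.

```cpp
#include <cassert>

// Hypothetical, possibly weighted, reference counts for one EH local.
struct RefCounts
{
    double defs;      // definitions in the method
    double uses;      // uses in the method
    double ehReloads; // reloads needed at EH boundaries under write-thru
};

// Memory only: one store per def and one load per use.
double memoryOnlyCost(const RefCounts& counts)
{
    return counts.defs + counts.uses;
}

// Write-thru: one store per def and one load per reload; uses read the register.
double writeThruCost(const RefCounts& counts)
{
    return counts.defs + counts.ehReloads;
}

// Under this model write-thru only pays off when uses outnumber reloads.
bool writeThruLooksProfitable(const RefCounts& counts)
{
    return writeThruCost(counts) < memoryOnlyCost(counts);
}
```

Under these assumptions a local with few uses but several reloads is better left in memory, which is the situation the bullet above warns about.
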
The initial prototype that produced the example below is currently being
improved to make it production ready. At the same time a more extensive suite
of example tests is being developed.

- [X] Proof of concept prototype.
- [ ] Production implementation of WriteThru phase.
- [ ] Suite of optimization examples/regression tests.
* [ ] Full CI test pass.
* [ ] JIT benchmark diffs.
* [ ] Kestrel TechEmpower numbers.

The following is a simple example that shows enregistration for a local var
that is live, and modified, through a catch.

#### Source code snippet

    public Enreg01(int x) {
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int foo(ref double d) { return (int)d; }
    [MethodImpl(MethodImplOptions.NoInlining)]
    catch (ValueException e)
    Console.WriteLine("Catching {0}", Convert.ToString(e.x));
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int TryValue(int y)
    Console.WriteLine("Throwing 97");
    throw new ValueException(97);

#### Post WriteThru GenTree nodes for Run() method

The Run() method contains the catch and is the only method that EH WriteThru modifies.


    Creating enregisterable proxies:
    lvaGrabTemp returning 8 (V08 tmp5) (a long lifetime temp) called for Add proxy for EH Write Thru..
    Creating proxy V08 for local var V00
    lvaGrabTemp returning 9 (V09 tmp6) (a long lifetime temp) called for Add proxy for EH Write Thru..
    Creating proxy V09 for local var V01
    Trees after EH Write Thru
    ---------------------------------------------------------------------------------------------------------------------------
    BBnum descAddr ref try hnd preds weight [IL range] [jump] [EH region] [flags]
    ---------------------------------------------------------------------------------------------------------------------------
    BB01 [00000263A1C161B8] 1 1 [000..007) i label target
    BB02 [00000263A1C162D0] 1 0 BB01 1 [007..012) T0 try { } keep i try label gcsafe
    BB03 [00000263A1C16500] 2 BB02,BB04 1 [050..052) (return) i label target gcsafe
    BB04 [00000263A1C163E8] 0 0 0 [012..050)-> BB03 ( cret ) H0 F catch { } keep i rare label target gcsafe flet
    -------------------------------------------------------------------------------------------------------------------------------------

    ------------ BB01 [000..007), preds={} succs={BB02}
    ( 3, 3) [000123] ------------ * stmtExpr void (IL ???... ???)
    N001 ( 3, 2) [000120] ------------ | /--* lclVar ref V00 this
    N003 ( 3, 3) [000122] -A------R--- \--* = ref
    N002 ( 1, 1) [000121] D------N---- \--* lclVar ref V08 tmp5
    ( 17, 13) [000005] ------------ * stmtExpr void (IL 0x000...0x006)
    N007 ( 3, 2) [000097] ------------ | /--* lclVar int V09 tmp6
    N009 ( 7, 5) [000098] -A------R--- | /--* = int
    N008 ( 3, 2) [000096] D------N---- | | \--* lclVar int V01 loc0
    N010 ( 17, 13) [000099] -A-XG------- \--* comma void
    N004 ( 6, 5) [000002] ---XG------- | /--* indir int
    N002 ( 1, 1) [000059] ------------ | | | /--* const long 16 field offset Fseq[val]
    N003 ( 4, 3) [000060] -------N---- | | \--* + byref
    N001 ( 3, 2) [000001] ------------ | | \--* lclVar ref V08 tmp5
    N006 ( 10, 8) [000004] -A-XG---R--- \--* = int
    N005 ( 3, 2) [000003] D------N---- \--* lclVar int V09 tmp6

    ------------ BB02 [007..012), preds={BB01} succs={BB03}
    ( 16, 10) [000013] ------------ * stmtExpr void (IL 0x007...0x00F)
    N008 ( 16, 10) [000011] --C-G------- \--* call int Enreg01.TryIncrement
    N004 ( 1, 1) [000009] ------------ this in rcx +--* lclVar ref V08 tmp5
    N005 ( 1, 1) [000010] ------------ arg1 in rdx \--* const int 97

    ------------ BB03 [050..052) (return), preds={BB02,BB04} succs={}
    ( 3, 3) [000119] ------------ * stmtExpr void (IL ???... ???)
    N001 ( 3, 2) [000116] ------------ | /--* lclVar int V01 loc0
    N003 ( 3, 3) [000118] -A------R--- \--* = int
    N002 ( 1, 1) [000117] D------N---- \--* lclVar int V09 tmp6
    ( 4, 3) [000017] ------------ * stmtExpr void (IL 0x050...0x051)
    N002 ( 4, 3) [000016] ------------ \--* return int
    N001 ( 3, 2) [000015] ------------ \--* lclVar int V09 tmp6

    ------------ BB04 [012..050) -> BB03 (cret), preds={} succs={BB03}
    ( 5, 4) [000021] ------------ * stmtExpr void (IL 0x012...0x012)
    N001 ( 1, 1) [000007] -----O------ | /--* catchArg ref
    N003 ( 5, 4) [000020] -A---O--R--- \--* = ref
    N002 ( 3, 2) [000019] D------N---- \--* lclVar ref V03 tmp0
    ( 3, 3) [000111] ------------ * stmtExpr void (IL ???... ???)
    N001 ( 3, 2) [000108] ------------ | /--* lclVar ref V00 this
    N003 ( 3, 3) [000110] -A------R--- \--* = ref
    N002 ( 1, 1) [000109] D------N---- \--* lclVar ref V08 tmp5
    ( 3, 3) [000115] ------------ * stmtExpr void (IL ???... ???)
    N001 ( 3, 2) [000112] ------------ | /--* lclVar int V01 loc0
    N003 ( 3, 3) [000114] -A------R--- \--* = int
    N002 ( 1, 1) [000113] D------N---- \--* lclVar int V09 tmp6
    ( 59, 43) [000034] ------------ * stmtExpr void (IL 0x013...0x037)
    N021 ( 59, 43) [000031] --CXG------- \--* call void System.Console.WriteLine
    N002 ( 5, 12) [000066] ----G------- | /--* indir ref
    N001 ( 3, 10) [000065] ------------ | | \--* const(h) long 0xB3963070 "Catching {0}"
    N004 ( 9, 15) [000076] -A--G---R-L- arg0 SETUP +--* = ref
    N003 ( 3, 2) [000075] D------N---- | \--* lclVar ref V05 tmp2
    N012 ( 20, 14) [000029] --CXG------- | /--* call ref System.Convert.ToString
    N010 ( 6, 8) [000028] ---XG------- arg0 in rcx | | \--* indir int
    N008 ( 1, 4) [000067] ------------ | | | /--* const long 140 field offset Fseq[x]
    N009 ( 4, 6) [000068] -------N---- | | \--* + byref
    N007 ( 3, 2) [000027] ------------ | | \--* lclVar ref V03 tmp0
    N014 ( 24, 17) [000072] -ACXG---R-L- arg1 SETUP +--* = ref
    N013 ( 3, 2) [000071] D------N---- | \--* lclVar ref V04 tmp1
    N017 ( 3, 2) [000073] ------------ arg1 in rdx +--* lclVar ref V04 tmp1 (last use)
    N018 ( 3, 2) [000077] ------------ arg0 in rcx \--* lclVar ref V05 tmp2 (last use)

    ( 18, 19) [000044] ------------ * stmtExpr void (IL 0x028... ???)
    N014 ( 1, 1) [000101] ------------ | /--* lclVar int V09 tmp6
    N016 ( 5, 4) [000102] -A------R--- | /--* = int
    N015 ( 3, 2) [000100] D------N---- | | \--* lclVar int V01 loc0
    N017 ( 18, 19) [000103] -A-XG------- \--* comma void
    N010 ( 6, 8) [000039] ---XG------- | /--* indir int
    N008 ( 1, 4) [000081] ------------ | | | /--* const long 140 field offset Fseq[x]
    N009 ( 4, 6) [000082] -------N---- | | \--* + byref
    N007 ( 3, 2) [000038] ------------ | | \--* lclVar ref V03 tmp0 (last use)
    N011 ( 13, 15) [000041] ---XG------- | /--* + int
    N005 ( 4, 4) [000037] ---XG------- | | | /--* indir int
    N003 ( 1, 1) [000079] ------------ | | | | | /--* const long 16 field offset Fseq[val]
    N004 ( 2, 2) [000080] -------N---- | | | | \--* + byref
    N002 ( 1, 1) [000036] ------------ | | | | \--* lclVar ref V08 tmp5
    N006 ( 6, 6) [000040] ---XG------- | | \--* + int
    N001 ( 1, 1) [000035] ------------ | | \--* lclVar int V09 tmp6
    N013 ( 13, 15) [000043] -A-XG---R--- \--* = int
    N012 ( 1, 1) [000042] D------N---- \--* lclVar int V09 tmp6
    ( 20, 14) [000051] ------------ * stmtExpr void (IL 0x038...0x044)
    N013 ( 20, 14) [000049] --CXGO------ \--* call int Enreg01.foo
    N007 ( 1, 1) [000086] ------------ | /--* const long 8 field offset Fseq[dist]
    N008 ( 3, 3) [000087] ------------ | /--* + byref
    N006 ( 1, 1) [000085] ------------ | | \--* lclVar ref V08 tmp5
    N009 ( 5, 5) [000088] ---XGO-N---- arg1 in rdx +--* comma byref
    N005 ( 2, 2) [000084] ---X-O-N---- | \--* nullcheck byte
    N004 ( 1, 1) [000083] ------------ | \--* lclVar ref V08 tmp5
    N010 ( 1, 1) [000045] ------------ this in rcx \--* lclVar ref V08 tmp5
    ( 11, 10) [000058] ------------ * stmtExpr void (IL 0x045...0x04D)
    N009 ( 1, 1) [000105] ------------ | /--* lclVar int V09 tmp6
    N011 ( 5, 4) [000106] -A------R--- | /--* = int
    N010 ( 3, 2) [000104] D------N---- | | \--* lclVar int V01 loc0
    N012 ( 11, 10) [000107] -A-XG------- \--* comma void
    N005 ( 4, 4) [000054] ---XG------- | /--* indir int
    N003 ( 1, 1) [000094] ------------ | | | /--* const long 16 field offset Fseq[val]
    N004 ( 2, 2) [000095] -------N---- | | \--* + byref
    N002 ( 1, 1) [000053] ------------ | | \--* lclVar ref V08 tmp5
    N006 ( 6, 6) [000055] ---XG------- | /--* + int
    N001 ( 1, 1) [000052] ------------ | | \--* lclVar int V09 tmp6
    N008 ( 6, 6) [000057] -A-XG---R--- \--* = int
    N007 ( 1, 1) [000056] D------N---- \--* lclVar int V09 tmp6

#### Post register allocation and code generation code

    --- base.asmdmp 2017-03-28 20:40:36.000000000 -0700
    +++ wt.asmdmp 2017-03-28 20:41:11.000000000 -0700
    *************** After end code gen, before unwindEmit()
    -G_M16307_IG01: ; func=00, offs=000000H, size=0014H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
    +G_M16307_IG01: ; func=00, offs=000000H, size=0017H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
    -mov qword ptr [V07 rbp-20H], rsp
    +mov qword ptr [V07 rbp-30H], rsp
    mov gword ptr [V00 rbp+10H], rcx
    -G_M16307_IG02: ; offs=000014H, size=000AH, gcVars=0000000000000001 {V00}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
    +G_M16307_IG02: ; offs=000017H, size=000AH, gcVars=0000000000000001 {V00}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
    -mov rcx, gword ptr [V00 rbp+10H]
    -mov ecx, dword ptr [rcx+16]
    -mov dword ptr [V01 rbp-14H], ecx
    +mov rsi, gword ptr [V00 rbp+10H]
    +mov edi, dword ptr [rsi+16]
    +mov dword ptr [V01 rbp-24H], edi
    -G_M16307_IG03: ; offs=00001EH, size=000FH, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
    +G_M16307_IG03: ; offs=000021H, size=000EH, gcrefRegs=00000040 {rsi}, byrefRegs=00000000 {}, byref
    -mov rcx, gword ptr [V00 rbp+10H]
    +mov rcx, rsi ; Elided reload in try region
    call Enreg01:TryIncrement(int):int:this
    -G_M16307_IG04: ; offs=00002DH, size=0003H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
    +G_M16307_IG04: ; offs=00002FH, size=0005H, gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
    -mov eax, dword ptr [V01 rbp-14H]
    +mov edi, dword ptr [V01 rbp-24H]
    -G_M16307_IG05: ; offs=000030H, size=0008H, epilog, nogc, emitadd
    +G_M16307_IG05: ; offs=000034H, size=000BH, epilog, nogc, emitadd

    -G_M16307_IG06: ; func=01, offs=000038H, size=0014H, gcrefRegs=00000004 {rdx}, byrefRegs=00000000 {}, byref, funclet prolog, nogc
    +G_M16307_IG06: ; func=01, offs=00003FH, size=0017H, gcrefRegs=00000004 {rdx}, byrefRegs=00000000 {}, byref, funclet prolog, nogc
    mov rbp, qword ptr [rcx+32]
    mov qword ptr [rsp+20H], rbp
    -G_M16307_IG07: ; offs=00004CH, size=005EH, gcVars=0000000000000001 {V00}, gcrefRegs=00000004 {rdx}, byrefRegs=00000000 {}, gcvars, byref, isz
    +G_M16307_IG07: ; offs=000056H, size=0054H, gcVars=0000000000000001 {V00}, gcrefRegs=00000004 {rdx}, byrefRegs=00000000 {}, gcvars, byref, isz
    -mov rcx, 0x18A3C473070
    -mov rdi, gword ptr [rcx]
    +mov rcx, gword ptr [V00 rbp+10H] ; Reload of proxy register
    +mov rdi, rcx ; Missed peep
    +mov ecx, dword ptr [V01 rbp-24H] ; Reload of proxy register
    +mov ebx, ecx ; Missed peep
    +mov rcx, 0x263B3963070
    +mov r14, gword ptr [rcx] ; Missed addressing mode
    mov ecx, dword ptr [rsi+140]
    call System.Convert:ToString(int):ref
    call System.Console:WriteLine(ref,ref)
    -mov edx, dword ptr [V01 rbp-14H] ; Elided stack access
    -mov rcx, gword ptr [V00 rbp+10H] ; Elided stack access
    -add edx, dword ptr [rcx+16]
    -add edx, dword ptr [rsi+140]
    -mov dword ptr [V01 rbp-14H], edx ; Elided stack access
    -mov rdx, gword ptr [V00 rbp+10H] ; Elided stack access
    -mov rcx, gword ptr [V00 rbp+10H] ; Elided stack access
    +add ebx, dword ptr [rdi+16]
    +add ebx, dword ptr [rsi+140]
    +lea rdx, bword ptr [rdi+8]
    call Enreg01:foo(byref):int:this
    -mov eax, dword ptr [V01 rbp-14H] ; Elided stack access
    -mov rdx, gword ptr [V00 rbp+10H] ; Elided stack access
    -add eax, dword ptr [rdx+16]
    -mov dword ptr [V01 rbp-14H], eax ; Elided stack access
    +add ebx, dword ptr [rdi+16]
    +mov dword ptr [V01 rbp-24H], ebx ; Store of proxy register
    lea rax, G_M16307_IG04
    -G_M16307_IG08: ; offs=0000AAH, size=0008H, funclet epilog, nogc, emitadd
    +G_M16307_IG08: ; offs=0000AAH, size=000BH, funclet epilog, nogc, emitadd

Replaced 6 loads and 2 stores with 2 loads, 1 store, 2 pushes, and 2 pops.