Guarded devirtualization foundations (#21270)
author Andy Ayers <andya@microsoft.com>
Thu, 6 Dec 2018 17:03:31 +0000 (09:03 -0800)
committer GitHub <noreply@github.com>
Thu, 6 Dec 2018 17:03:31 +0000 (09:03 -0800)
Lay the groundwork for guarded devirtualization of virtual and interface
calls in the jit.

Introduce the notion of a guarded devirtualization candidate and identify
these if regular devirtualization fails. Use simple heuristics to produce
a class to guess for. Require that the method that would be invoked if the class
guess is correct be a plausible inline candidate.

Generalize the calli transformer to become an indirect call transformer.
This runs after importation because it needs to introduce control flow and
runs before inlining so that the new direct calls it introduces can be inlined.

Implement the transformation to duplicate the call site, devirtualize on the side
where the class is now known exactly, and turn the resulting direct call into an
inline candidate.

Add a motivation and design document.

19 files changed:
Documentation/design-docs/GuardedDevirtualization.md [new file with mode: 0644]
Documentation/design-docs/ThreeClassesDevirt.JPG [new file with mode: 0644]
Documentation/design-docs/ThreeClassesDevirtDetail.JPG [new file with mode: 0644]
Documentation/design-docs/ThreeClassesDevirtFull.JPG [new file with mode: 0644]
Documentation/design-docs/ThreeClassesInterface.JPG [new file with mode: 0644]
Documentation/design-docs/TwoClassesBaseline.JPG [new file with mode: 0644]
Documentation/design-docs/TwoClassesDevirt.JPG [new file with mode: 0644]
Documentation/design-docs/TwoClassesInterface.JPG [new file with mode: 0644]
src/jit/compiler.cpp
src/jit/compiler.h
src/jit/flowgraph.cpp
src/jit/gentree.cpp
src/jit/gentree.h
src/jit/importer.cpp
src/jit/inline.cpp
src/jit/inline.h
src/jit/jitconfigvalues.h
src/jit/lclvars.cpp
src/jit/morph.cpp

diff --git a/Documentation/design-docs/GuardedDevirtualization.md b/Documentation/design-docs/GuardedDevirtualization.md
new file mode 100644 (file)
index 0000000..ba1f7bf
--- /dev/null
@@ -0,0 +1,533 @@
+# Guarded Devirtualization
+
+## Overview
+
+Guarded devirtualization is a proposed new optimization for the JIT in .NET Core
+3.0. This document describes the motivation and an initial design sketch, and
+highlights various issues needing further investigation.
+
+## Motivation
+
+The .NET Core JIT is able to do a limited amount of devirtualization for virtual
+and interface calls. This ability was added in .NET Core 2.0. To devirtualize
+the JIT must be able to demonstrate one of two things: either that it knows the
+type of some reference exactly (say because it has seen a `newobj`) or that the
+declared type of the reference is a `final` class (aka `sealed`). For virtual
+calls the JIT can also devirtualize if it can prove the method is marked as `final`.
+
+However, most of the time the JIT is unable to determine exactness or finalness
+and so devirtualization fails. Statistics show that currently only around 15% of
+virtual call sites can be devirtualized. Results are even more pessimistic for
+interface calls, where success rates are around 5%.
+
+There are a variety of reasons for this. The JIT analysis is somewhat weak.
+Historically all the JIT cared about was whether some location held **a** reference
+type, not a specific reference type. So the current type propagation has been
+retrofitted and there are places where types just get lost. The JIT analysis
+happens quite early (during importation) and there is only minimal ability to do
+data flow analysis at this stage. So for current devirtualization the source of
+the type information and the consumption must be fairly close in the code. A
+more detailed accounting of some of the shortcomings can be found in 
+[CoreCLR#9908](https://github.com/dotnet/coreclr/issues/9908).
+
+Resolution of these issues will improve the ability of the JIT to devirtualize,
+but even the best analysis possible will still miss out on many cases. Some call
+sites are truly polymorphic. Some others are truly monomorphic but proving this
+would require sophisticated interprocedural analyses that are not practical in
+the JIT or in a system as dynamic as the CLR. And some sites are monomorphic in
+practice but potentially polymorphic.
+
+As an alternative, when devirtualization fails, the JIT can perform *guarded
+devirtualization*. Here the JIT creates an `if-then-else` block set in place of
+a virtual or interface call and inserts a runtime type test (or similar) into
+the `if` -- the "guard". If the guard test succeeds the JIT knows the type of
+the reference, so the `then` block can directly invoke the method corresponding
+to that type. If the test fails then the `else` block is executed and this
+contains the original virtual or interface call.
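+
+In source-level terms the transformation looks roughly like the sketch below
+(the JIT of course works on its own IR; `obj`, `GuessedClass`, and `GetValue`
+are purely illustrative names):
+```C#
+// Before: a virtual or interface call the JIT could not devirtualize.
+int result = obj.GetValue();
+
+// After (conceptually): guard on the guessed exact type, then call directly.
+int result;
+if (obj.GetType() == typeof(GuessedClass))   // the guard: an exact type test
+{
+    result = ((GuessedClass)obj).GetValue(); // type known exactly here, so the
+                                             // JIT can devirtualize and inline
+}
+else
+{
+    result = obj.GetValue();                 // residual virtual/interface call
+}
+```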
+
+The upshot is that the JIT conditionally gains the benefit of devirtualization at
+the expense of increased code size, longer JIT times, and slightly longer code
+paths around the call. So long as the JIT's guess at the type is somewhat
+reasonable, this optimization can improve performance.
+
+## Opportunity
+
+One might imagine that the JIT's guess about the type of the reference has to be
+pretty good for devirtualization to pay off. Somewhat surprisingly, at least based
+on our initial results, that is not the case.
+
+### Virtual Calls: The Two-Class Case
+
+Given these class declarations:
+```C#
+class B
+{
+   public virtual int F() { return 33;  }
+}
+
+class D : B
+{
+    public override int F() { return 44; }
+}
+```
+Suppose we have an array `B[]` that is randomly filled with instances of `B` and
+`D` and each element is class `B` with probability `p`.  We time how long
+it takes to invoke `F` on each member of the array (note the JIT will not ever 
+be able to devirtualize these calls), and plot the times as a function of `p`.
+The result is something like the following:
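+
+A minimal sketch of this kind of measurement, assuming the `B` and `D`
+declarations above and a simple `Stopwatch`-based harness (the array size,
+seed, and value of `p` here are illustrative only):
+```C#
+using System;
+using System.Diagnostics;
+
+class Harness
+{
+    static void Main()
+    {
+        const int N = 1_000_000;
+        double p = 0.5;                 // probability that an element is a B
+        var rng = new Random(42);
+        var arr = new B[N];
+        for (int i = 0; i < N; i++)
+        {
+            arr[i] = (rng.NextDouble() < p) ? new B() : new D();
+        }
+
+        var sw = Stopwatch.StartNew();
+        long sum = 0;
+        for (int i = 0; i < N; i++)
+        {
+            sum += arr[i].F();          // virtual call; the JIT cannot devirtualize this
+        }
+        sw.Stop();
+        Console.WriteLine($"p={p}: {sw.ElapsedMilliseconds} ms (checksum {sum})");
+    }
+}
+```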
+
+![two classes baseline perf](TwoClassesBaseline.JPG)
+
+Modern hardware includes an indirect branch target predictor and we can see it
+in action here. When the array element type is predictable (`p` very close to
+zero or very close to 1) performance is better. When the element type is
+unpredictable (`p` near 0.5) performance is quite a bit worse.
+
+From this we can see that a correctly predicted virtual call requires about
+19 time units, while a worst-case mispredicted call takes around 55 time units.
+There is some timing overhead here too, so the real costs are a bit lower.
+
+Now imagine we update the JIT to do guarded devirtualization and check if the
+element is indeed type `B`. If so the JIT can call `B.F` directly and in our 
+prototype the JIT will also inline the call. So we would expect that if the
+element types are mostly `B`s (that is if `p` is near 1.0) we'd see very good
+performance, and if the element type is mostly `D` (that is `p` near 0.0)
+performance should perhaps be slightly worse than the un-optimized case, as there
+is now extra code to run the check before the call.
+
+![two classes devirt perf](TwoClassesDevirt.JPG)
+
+However, as you can see, the performance of the devirtualized case (blue line) is as
+good as or better than the un-optimized case for all values of `p`. This is perhaps
+unexpected and deserves some explanation.
+
+Recall that modern hardware also includes a branch predictor. For small or large
+values of `p` this predictor will correctly guess whether the test added by the
+JIT will resolve to the `then` or `else` case. For small values of `p` the JIT
+guess will be wrong and control will flow to the `else` block. But unlike the
+original example, the indirect call here will only see instances of type `D` and
+so the indirect branch predictor will work extremely well. So the overhead for
+the small `p` case is similar to the well-predicted indirect case without guarded
+devirtualization. As `p` increases the branch predictor starts to mispredict and
+that costs some cycles. But when it mispredicts control reaches the `then` block
+which executes the inlined call. So the cost of misprediction is offset by the
+faster execution and the cost stays relatively flat. 
+
+As `p` passes 0.5 the branch predictor flips its prediction to prefer the `then`
+case. As before mispredicts are costly and send us down the `else` path but there
+we still execute a correctly predicted indirect call.
+
+And as `p` approaches 1.0 the cost falls as the branch predictor is almost always
+correct and so the cost is simply that of the inlined call.
+
+So oddly enough the guarded devirtualization case shown here does not require any
+sort of perf tradeoff. The JIT is better off guessing the more likely case but
+even guessing the less likely case can pay off and doesn't hurt performance.
+
+One might suspect at this point that the two class case is a special case and that
+the results do not hold up in more complex cases. More on that shortly.
+
+Before moving on, we should point out that virtual calls in the current
+CLR are a bit more expensive than in C++, because the CLR uses a two-level method
+table. That is, the indirect call sequence is something like:
+```asm
+000095 mov      rax, qword ptr [rcx]                 ; fetch method table
+000098 mov      rax, qword ptr [rax+72]              ; fetch proper chunk
+00009C call     qword ptr [rax+32]B:F():int:this     ; call indirect
+```
+This is a chain of 3 dependent loads and so best-case will require at least 3x
+the best cache latency (plus any indirect prediction overhead).
+
+So the virtual call costs for the CLR are high. The chunked method table design
+was adopted to save space (chunks can be shared by different classes) at the
+expense of some performance. And this apparently makes guarded devirtualization
+pay off over a wider range of class distributions than one might expect.
+
+And for completeness, the full guarded `if-then-else` sequence measured above is:
+```asm
+00007A mov      rcx, gword ptr [rsi+8*rcx+16]        ; fetch array element
+00007F mov      rax, 0x7FFC9CFB4A90                  ; B's method table
+000089 cmp      qword ptr [rcx], rax                 ; method table test
+00008C jne      SHORT G_M30756_IG06                  ; jump if class is not B
+
+00008E mov      eax, 33                              ; inlined B.F
+000093 jmp      SHORT G_M30756_IG07
+
+G_M30756_IG06:
+
+000095 mov      rax, qword ptr [rcx]                 ; fetch method table
+000098 mov      rax, qword ptr [rax+72]              ; fetch proper chunk
+00009C call     qword ptr [rax+32]B:F():int:this     ; call indirect
+
+G_M30756_IG07:
+```
+Note there is a redundant load of the method table (hidden in the `cmp`) that
+could be eliminated with a bit more work on the prototype. So guarded
+devirtualization perf could potentially be even better than is shown above,
+especially for smaller values of `p`.
+
+### Virtual Calls: The Three-Class Case
+
+Now to return to the question we asked above: is there something about the two
+class case that made guarded devirtualization especially attractive? Read on.
+
+Suppose we introduce a third class into the mix and repeat the above measurement.
+There are now two probabilities in play: `p`, the probability that the element
+has class `B`, and `p1`, the probability that the element has class `D`; the
+remaining elements have a third class `E`. To avoid introducing a 3D plot we'll
+first simply
+average the results for the various values of `p1` and plot performance as a
+function of `p`:
+
+![three classes devirt perf](ThreeClassesDevirt.JPG)
+
+The right-hand side (`p` near 1.0) looks a lot like the previous chart. This is
+not surprising as there are relatively few instances of that third class. But the
+middle and left hand side differ and are more costly.
+
+For the un-optimized case (orange) the difference is directly attributable to
+the performance of the indirect branch predictor. Even when `p` is small there
+are still two viable branch targets (on average) and some degree of indirect
+misprediction.
+
+For the optimized case we now see that guarded devirtualization performs worse
+than no optimization if the JIT's guess is completely wrong. The penalty is not
+that bad because the JIT-introduced branch is predictable. But even at very
+modest values of `p` guarded devirtualization starts to win out.
+
+Because we've averaged over `p1` you might suspect that we're hiding something.
+The following chart shows the min and max values as well as the average, and also
+shows the two-class result (dashed lines).
+
+![three classes devirt perf ranges](ThreeClassesDevirtFull.JPG)
+
+You can see the minimum values are very similar to the two class case; these
+are cases where `p1` is close to 0 or close to 1. And that makes sense because
+if there really are only two classes, despite the potential of there being three,
+then we'd expect to see results similar to the case where there can only be
+two classes.
+
+And as noted above, if `p` is high enough then the curves also converge to the
+two class case, as the relative mixture of `D` and `E` doesn't matter: the
+predominance of `B` wins out.
+
+For low values of `p` the actual class at the call site is some mixture of `D`
+and `E`. Here's some detail (the x axis now shows `p1` and `p` as upper and
+lower values respectively).
+
+![three classes devirt perf detail](ThreeClassesDevirtDetail.JPG)
+
+The worst case for perf for both is when the mixture of `D` and `E` is
+unpredictably 50-50 and there are no `B`s. Once we mix in just 10% of `B` then
+guarded devirt performs better no matter what distribution we have for the other
+two classes. Worst case overhead -- where the JIT guesses a class that never
+appears, and the other classes are evenly distributed -- is around 20%.
+
+So it seems reasonable to say that so long as the JIT can make a credible guess
+about the possible class -- say a guess that is right at least 10% of the time
+-- then there is quite likely a performance benefit to guarded
+devirtualization for virtual calls.
+
+We'll need to verify this with more scenarios, but these initial results are
+certainly encouraging.
+
+### Virtual Calls: Testing for Multiple Cases
+
+One might deduce from the above that if there are two likely candidates the JIT
+should test for each. This is certainly a possibility and in C++ compilers that
+do indirect call profiling there are cases where multiple tests are considered
+a good idea. But there's also additional code size and another branch. 
+
+This is something we'll look into further.
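+
+For illustration, a two-guess version of the earlier sketch would look roughly
+like this at the source level (using the `B` and `D` classes from the virtual
+call example; the structure, not the exact form, is the point):
+```C#
+int result;
+if (x.GetType() == typeof(B))        // most likely guess
+{
+    result = ((B)x).F();             // exact type known: devirtualizable, inlineable
+}
+else if (x.GetType() == typeof(D))   // second most likely guess
+{
+    result = ((D)x).F();             // also devirtualizable
+}
+else
+{
+    result = x.F();                  // residual virtual call for everything else
+}
+```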
+
+### Interface Calls: The Two Class Case
+
+Interface calls on the CLR are implemented via [Virtual Stub Dispatch](
+https://github.com/dotnet/coreclr/blob/master/Documentation/botr/virtual-stub-dispatch.md
+)  (aka VSD). Calls are made through an indirection cell that initially points
+at a lookup stub. On the first call, the interface target is identified from the
+object's method table and the lookup stub is replaced with a dispatch stub that
+checks for that specific method table in a manner quite similar to guarded
+devirtualization.
+
+If the method table check fails a counter is incremented, and once the counter
+reaches a threshold the dispatch stub is replaced with a resolve stub that looks
+up the right target in a process-wide hash table.
+
+For interface call sites that are monomorphic, the VSD mechanism (via the dispatch
+stub) executes the following code sequence (here for x64)
+```asm
+; JIT-produced code
+;
+; set up R11 with interface target info
+mov R11, ...         ; additional VSD info for call
+mov RCX, ...         ; dispatch target object
+cmp [rcx], rcx       ; null check (unnecessary)
+call [addr]          ; call indirect through indir cell
+
+; dispatch stub
+cmp [RCX], targetMT  ; check for right method table
+jne DISPATCH-FAIL    ; bail to resolve stub if check fails (uses R11 info)
+jmp targetCode       ; else "tail call" the right method
+```
+
+At first glance it might appear that adding guarded devirtualization on top of 
+VSD may not provide much benefit for monomorphic sites. However the guarded
+devirtualization test doesn't use an indirection cell and doesn't require R11
+setup, may be able to optimize away the null check, and opens the door for
+inlining. So it should be slightly cheaper on average and significantly cheaper
+in some cases.
+
+(Note [CoreCLR#14222](https://github.com/dotnet/coreclr/issues/14222) indicates
+we should be able to optimize away the null check in any case).
+
+If the guard test fails we've filtered out one method table, so the dispatch cell
+now works well even if a call site alternates between two classes. So we'd expect
+the combination of guarded devirtualization and VSD to perform well on the two
+class test and only show limitations when faced with mixtures of three or more
+classes.
+
+If the guard test always fails we have the up-front cost for the vtable fetch
+(which should amortize pretty well with the subsequent fetch in the stub) plus
+the predicted not-taken branch. So we'd expect the cost for the two-class cases
+where the JIT's prediction is always wrong to be a bit higher.
+
+The graph below shows the measured results. To make sure we're not overly impacted
+by residual VSD state we use a fresh call site for each value of `p`. The solid
+orange line is the current cost. The dashed orange line is the corresponding cost
+for a virtual call with the same value of `p`. The solid blue line is the cost with
+an up-front guarded test. As noted there is some slowdown when the JIT always
+guesses the wrong class, but the break-even point (not shown) is at a relatively
+small probability of a correct guess.
+
+![two classes interface devirt](TwoClassesInterface.JPG)
+
+### Interface Calls: The Three Class Case
+
+As with virtual calls you may strongly suspect the two class case for interface
+calls is special. And you'd be right.
+
+If we mix a third class in as we did above, we see similar changes in the
+performance mix for interface calls, as seen below. But also as with virtual calls
+the JIT's guess doesn't have to be all that good to see payoffs. At around 10%
+correct, guessing wins on average, and at around 30% correct, guessing is always a
+perf win.
+
+![three classes interface devirt](ThreeClassesInterface.JPG)
+
+### Delegate Speculation
+
+While we have been discussing this topic in the context of virtual calls, the
+method is general and can be applied to indirect calls as well. Here the guard
+test may just test for a particular function rather than a type.
+
+`Delegate.Invoke` is a special method that eventually turns into an indirect
+call. The JIT could speculate about the possible target of this call. Choosing
+a good target here would require some kind of indirect call profiling.
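+
+A source-level sketch of such a guard is shown below (illustrative only:
+`GetCallback`, `Callee`, and `Likely` are hypothetical names, and the real guard
+would compare the delegate's stored code pointer rather than use reflection):
+```C#
+Func<int, int> f = GetCallback();    // hypothetical delegate; usually targets Callee.Likely
+int result;
+if (f.Method == typeof(Callee).GetMethod(nameof(Callee.Likely)))
+{
+    result = Callee.Likely(42);      // direct call that can be inlined
+}
+else
+{
+    result = f(42);                  // normal indirect delegate invocation
+}
+```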
+
+### Calli Speculation
+
+Indirect calls also arise via the `calli` opcode. As with delegates, choosing a
+target here likely requires specialized profiling.
+
+### Costs
+
+Given the optimistic take on performance, it is important to remember that
+there are also some costs involved in guarded devirtualization: increased code
+size and increased JIT time. There may also be some second-order effects on
+the local code generation as we've introduced control flow into the method where
+it didn't exist previously.
+
+A naive implementation that aggressively performs guarded devirtualization
+increases code size overall by about 5% as measured by PMI. JIT time increase
+was not measured but should be in that same ballpark. Some assemblies see code
+size increasing by as much as 12%.
+
+However, guarded devirtualization only kicks in for about 15% of the methods.
+So the average relative size increase in a method with virtual calls is probably
+more like 33% (roughly 5% / 15%, assuming the growth is concentrated in those
+methods).
+
+There may be some inefficiencies in the current prototype that can be fixed to
+reduce the code size impact. Aside from the extra method table fetch noted above,
+the duplicated calls have the same sets of arguments and so we might be able to
+amortize argument evaluation costs better. And there are some complexities around
+handling return values (especially for implicit by-reference structures) that
+likewise might be able to be tightened up.
+
+Nevertheless, blindly optimizing all virtual calls with guarded devirtualization
+is not likely the right approach. Something more selective is almost certainly
+needed.
+
+However we have done code-expanding optimizations somewhat blindly before, and
+we could contain the size growth risk by restricting this optimization to Tier1.
+Also PMI can overstate size impact seen in real scenarios as it may over-count
+the impact of changes in methods that are always inlined. So we should look at
+size increases from some actual scenarios.
+
+And perhaps I'll look at the size impact of loop cloning as a precedent.
+
+## Implementation Considerations
+
+To get the data above and a better feel for the challenges involved we have
+implemented a prototype. It is currently located on this branch:
+[GuardedDevirtFoundations](https://github.com/AndyAyersMS/coreclr/tree/GuardedDevirtFoundations).
+
+The prototype can introduce guarded devirtualization for some virtual and
+interface calls. It supports inlining of the directly invoked method. It uses
+the JIT's "best known type" as the class to predict. It also anticipates being
+able to query the runtime for implementing classes of an interface.
+
+### Phase Ordering
+
+For the most part, devirtualization is done very early on in the JIT, during
+importation. This allows devirtualized calls to subsequently be inlined, and for
+devirtualization of call sites in inlinees to take advantage of type information
+propagating down into the inlinee from inlined arguments.
+
+We want those same properties to hold for guarded devirtualization candidates.
+So conceptually the transformation should happen in the same place. However it is
+not possible to introduce new control flow in the importer (ignoring for the moment
+the possibility of using question ops). So the actual transformation must be
+deferred until sometime after the importer runs and before the inliner runs.
+
+This deferral is a bit problematic as some key bits of importer state are needed
+to query the runtime about the properties of a call target. So if we defer the
+transformation we need to somehow capture the data needed for these queries and
+make it available later. The current prototype uses (abuses?) the inline
+candidate information for this. As part of this we require that all speculative
+devirtualization sites be treated as inline candidates, at least initially.
+This has the side effect of hoisting the call to be a top level (statement)
+expression and introduces a return value placeholder.
+
+We currently already have a similar transformation in the JIT, the "fat calli"
+transformation needed on CoreRT. This transformation runs at the right time --
+after the importer and before the inliner -- and introduces the right kind of 
+`if-then-else` control flow structure. So the thought is to generalize this to
+handle guarded devirtualization as well.
+
+### Recognition
+
+In the prototype, candidates are recognized during the initial importer driven
+call to `impDevirtualizeCall`. If the only reason devirtualization fails is lack
+of exactness, then the call is marked as a guarded devirtualization candidate.
+
+### Devirtualization
+
+To produce the direct call the prototype updates the `this` passed in the `then`
+version of the call so it has the exact predicted type. It then re-invokes 
+`impDevirtualizeCall`, which should now succeed as the type is exactly
+known. The benefit of reuse here is that certain special cases of devirtualization
+are now more likely to be handled.
+
+### Inline Candidacy
+
+The prototype currently sets up all virtual and interface calls as potential
+inline candidates. One open question is whether it is worth doing guarded
+devirtualization simply to introduce a direct call. As an alternative we could 
+insist that the directly called method also be something that is potentially
+inlineable. One can argue that call overhead matters much more for small methods
+that are also likely good inline candidates.
+
+The inline candidate info is based on the apparent method invoked at the virtual
+site. This is the base method, the one that introduces the virtual slot. So if we
+speculatively check for some class and that class overrides the base method, we need
+to somehow update the inline info. How best to do this is unclear.
+
+### Return Values
+
+Because the candidate calls are handled as inline candidates, the JIT hoists the
+call to a top level expression (which is good) during importation and introduces
+a return value placeholder into the place the call occupied in its original tree.
+(Oddly we introduce return value placeholders for some calls that don't return
+a value -- we should fix this). The placeholder points back at the call.
+
+When we split the call into two calls we can't keep this structure intact as there
+needs to be a 1-1 relationship between call and placeholder. So the prototype
+needs to save the return value in a new local and then update the placeholder to
+refer to that local. This can be tricky because in some cases we haven't yet settled
+on what the actual type of the return value is.
+
+The handling of return values in the early stages of the JIT (arguably, in the entire
+JIT) is quite messy. The ABI details bleed through quite early and do so somewhat
+unevenly. This mostly impacts methods that return structures as different ABIs have
+quite different conventions, and the IR is transformed to reflect those conventions
+at different times for un-inlined calls, inlineable calls that end up not getting
+inlined, and for calls that get inlined. In particular, structures that are small
+enough to be returned by value (in a register or set of registers) need careful
+handling.
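+
+For example (purely illustrative declarations), calls like the following return
+a small struct in registers on most ABIs and are the kind of case that needs
+this extra care:
+```C#
+struct Pair
+{
+    public int A;
+    public int B;
+}
+
+interface IProducer
+{
+    Pair Produce();   // small struct returned by value (typically in registers)
+}
+```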
+
+The prototype skips over such by-value-returning struct methods today. Some of
+the logic found in `fgUpdateInlineReturnExpressionPlaceHolder` needs to be pulled
+in to properly type the call return value so we can properly type the temp. Or
+perhaps we could leverage some of the importer-time transformations that are done for
+the fat calli cases.
+
+For larger structs we should arrange so that the call(s) write their return values
+directly into the new temp, instead of copying the value from wherever they
+return it into a temp, to avoid one level of struct copy. Doing so may require
+upstream zero init of the return value struct and this should only happen in one
+place.
+
+## Open Issues
+
+Here are some of the issues that need to be looked into more carefully.
+
+### Policy
+
+- what is the best mechanism for guessing which class to test for?
+  - instrument Tier0 code?
+  - look at types of arguments?
+  - ask runtime for set of known classes?
+  - harvest info from runtime caches (VSD)?
+  - add instrumenting Tier1 to collect data and Tier2 to optimize?
+- is there some efficient way to test for class ranges? Currently the JIT is
+doing an exact type test. But we really care more about what method is going to
+be invoked. So if there is a range of types `D1...DN` that all will invoke some
+particular method, can we test for them all somehow?
+- or should we test the target method after the method lookup (possibly a worse
+tradeoff because of the chunked method table arrangement, and also tricky as a
+method can have multiple addresses over time)? Since many types can share a chunk
+this might allow devirtualization over a wider set of classes (good) but we'd lose
+knowledge of exact types (bad). Not clear how these tradeoffs play out.
+- interaction of guarded devirt with VSD? For interface calls we are sort of 
+inlining the first level of the VSD into the JITted code.
+- revocation or reworking of the guard if the JIT's prediction turns out to be bad?
+- improve regular devirtualization to reduce need for guarded
+devirtualization.
+- should we enable this for preJITted code? In preJITted code the target method
+table is not a JIT-time constant and must be looked up.
+- in the prototype, guarded devirtualization and late devirtualization sometimes
+conflict. Say we fail to devirtualize a site, and so expand via guarded devirtualization
+guessing some class X. The residual virtual call then may be optimizable via late
+devirtualization, and this may discover the actual class. In that case the guarded
+devirtualization is not needed. But currently it can't be undone.
+- we probably don't want to bother with guarded devirtualization if we can't also
+inline. But it takes us several evaluation steps to determine if a call can
+be inlined, some of these happening *after* we've done the guarded expansion.
+Again this expansion can't be undone.
+- so perhaps we need to build an undo capability for the cases where guarded
+devirtualization doesn't lead to inlining and/or where late devirtualization also
+applies.
+
+### Implementation
+
+- avoid re-fetching method table for latent virtual call (should reduce code
+size and improve overall perf win)
+- look at how effectively we are sharing argument setup (might reduce code size
+and JIT time impact) -- perhaps implement head merging?
+- handle return values in full generality
+- IL offsets
+- flag residual calls as not needing null checks
+- properly establish inline candidacy
+- decide if the refactoring of `InlineCandidateInfo` is the right way to pass
+information from importer to the indirect transform phase
+
+### Futures
+
+- can we cover multiple calls with one test? This can happen already if the
+subsequent call is introduced via inlining of the directly called method, as we
+know the exact type along that path. But for back to back calls to virtual 
+methods off of the same object it would be nice to do just one test.
+- should we test for multiple types? Once we've peeled off the "most likely" case
+if the conditional probability of the next most likely case is high it is probably
+worth testing for it too. I believe the C++ compiler will test up to 3 candidates
+this way... but that's a lot of code expansion.
\ No newline at end of file
diff --git a/Documentation/design-docs/ThreeClassesDevirt.JPG b/Documentation/design-docs/ThreeClassesDevirt.JPG
new file mode 100644 (file)
index 0000000..1ae19ba
Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirt.JPG differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtDetail.JPG b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG
new file mode 100644 (file)
index 0000000..0edba74
Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirtDetail.JPG differ
diff --git a/Documentation/design-docs/ThreeClassesDevirtFull.JPG b/Documentation/design-docs/ThreeClassesDevirtFull.JPG
new file mode 100644 (file)
index 0000000..2c09920
Binary files /dev/null and b/Documentation/design-docs/ThreeClassesDevirtFull.JPG differ
diff --git a/Documentation/design-docs/ThreeClassesInterface.JPG b/Documentation/design-docs/ThreeClassesInterface.JPG
new file mode 100644 (file)
index 0000000..cbd3551
Binary files /dev/null and b/Documentation/design-docs/ThreeClassesInterface.JPG differ
diff --git a/Documentation/design-docs/TwoClassesBaseline.JPG b/Documentation/design-docs/TwoClassesBaseline.JPG
new file mode 100644 (file)
index 0000000..3a8b4b2
Binary files /dev/null and b/Documentation/design-docs/TwoClassesBaseline.JPG differ
diff --git a/Documentation/design-docs/TwoClassesDevirt.JPG b/Documentation/design-docs/TwoClassesDevirt.JPG
new file mode 100644 (file)
index 0000000..7c48264
Binary files /dev/null and b/Documentation/design-docs/TwoClassesDevirt.JPG differ
diff --git a/Documentation/design-docs/TwoClassesInterface.JPG b/Documentation/design-docs/TwoClassesInterface.JPG
new file mode 100644 (file)
index 0000000..69063dc
Binary files /dev/null and b/Documentation/design-docs/TwoClassesInterface.JPG differ
index 86a85ce..2851f3d 100644 (file)
@@ -4561,10 +4561,8 @@ void Compiler::compCompile(void** methodCodePtr, ULONG* methodCodeSize, JitFlags
         fgRemovePreds();
     }
 
-    if (IsTargetAbi(CORINFO_CORERT_ABI) && doesMethodHaveFatPointer())
-    {
-        fgTransformFatCalli();
-    }
+    // Transform indirect calls that require control flow expansion.
+    fgTransformIndirectCalls();
 
     EndPhase(PHASE_IMPORTATION);
 
index a130962..a3f46bd 100644 (file)
@@ -3298,6 +3298,13 @@ public:
         return IsTargetAbi(CORINFO_CORERT_ABI) ? TYP_I_IMPL : TYP_REF;
     }
 
+    void impDevirtualizeCall(GenTreeCall*            call,
+                             CORINFO_METHOD_HANDLE*  method,
+                             unsigned*               methodFlags,
+                             CORINFO_CONTEXT_HANDLE* contextHandle,
+                             CORINFO_CONTEXT_HANDLE* exactContextHandle,
+                             bool                    isLateDevirtualization);
+
     //=========================================================================
     //                          PROTECTED
     //=========================================================================
@@ -3355,12 +3362,6 @@ protected:
                             CORINFO_CALL_INFO* callInfo,
                             IL_OFFSET          rawILOffset);
 
-    void impDevirtualizeCall(GenTreeCall*            call,
-                             CORINFO_METHOD_HANDLE*  method,
-                             unsigned*               methodFlags,
-                             CORINFO_CONTEXT_HANDLE* contextHandle,
-                             CORINFO_CONTEXT_HANDLE* exactContextHandle);
-
     CORINFO_CLASS_HANDLE impGetSpecialIntrinsicExactReturnType(CORINFO_METHOD_HANDLE specialIntrinsicHandle);
 
     bool impMethodInfo_hasRetBuffArg(CORINFO_METHOD_INFO* methInfo);
@@ -3865,7 +3866,7 @@ private:
                         bool                  forceInline,
                         InlineResult*         inlineResult);
 
-    void impCheckCanInline(GenTree*               call,
+    void impCheckCanInline(GenTreeCall*           call,
                            CORINFO_METHOD_HANDLE  fncHandle,
                            unsigned               methAttr,
                            CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -3894,6 +3895,11 @@ private:
                                 bool                   exactContextNeedsRuntimeLookup,
                                 CORINFO_CALL_INFO*     callInfo);
 
+    void impMarkInlineCandidateHelper(GenTreeCall*           call,
+                                      CORINFO_CONTEXT_HANDLE exactContextHnd,
+                                      bool                   exactContextNeedsRuntimeLookup,
+                                      CORINFO_CALL_INFO*     callInfo);
+
     bool impTailCallRetTypeCompatible(var_types            callerRetType,
                                       CORINFO_CLASS_HANDLE callerRetTypeClass,
                                       var_types            calleeRetType,
@@ -4133,7 +4139,7 @@ public:
 
     void fgImport();
 
-    void fgTransformFatCalli();
+    void fgTransformIndirectCalls();
 
     void fgInline();
 
@@ -5354,8 +5360,8 @@ private:
 #ifdef DEBUG
     static fgWalkPreFn fgDebugCheckInlineCandidates;
 
-    void               CheckNoFatPointerCandidatesLeft();
-    static fgWalkPreFn fgDebugCheckFatPointerCandidates;
+    void               CheckNoTransformableIndirectCallsRemain();
+    static fgWalkPreFn fgDebugCheckForTransformableIndirectCalls;
 #endif
 
     void fgPromoteStructs();
@@ -6138,6 +6144,7 @@ public:
 #define OMF_HAS_NULLCHECK 0x00000010     // Method contains null check.
 #define OMF_HAS_FATPOINTER 0x00000020    // Method contains call, that needs fat pointer transformation.
 #define OMF_HAS_OBJSTACKALLOC 0x00000040 // Method contains an object allocated on the stack.
+#define OMF_HAS_GUARDEDDEVIRT 0x00000080 // Method contains guarded devirtualization candidate
 
     bool doesMethodHaveFatPointer()
     {
@@ -6156,6 +6163,27 @@ public:
 
     void addFatPointerCandidate(GenTreeCall* call);
 
+    bool doesMethodHaveGuardedDevirtualization()
+    {
+        return (optMethodFlags & OMF_HAS_GUARDEDDEVIRT) != 0;
+    }
+
+    void setMethodHasGuardedDevirtualization()
+    {
+        optMethodFlags |= OMF_HAS_GUARDEDDEVIRT;
+    }
+
+    void clearMethodHasGuardedDevirtualization()
+    {
+        optMethodFlags &= ~OMF_HAS_GUARDEDDEVIRT;
+    }
+
+    void addGuardedDevirtualizationCandidate(GenTreeCall*          call,
+                                             CORINFO_METHOD_HANDLE methodHandle,
+                                             CORINFO_CLASS_HANDLE  classHandle,
+                                             unsigned              methodAttr,
+                                             unsigned              classAttr);
+
     unsigned optMethodFlags;
 
     // Recursion bound controls how far we can go backwards tracking for a SSA value.
index dd18453..ad89317 100644 (file)
@@ -5892,6 +5892,21 @@ void Compiler::fgFindBasicBlocks()
             {
                 // This temp should already have the type of the return value.
                 JITDUMP("\nInliner: re-using pre-existing spill temp V%02u\n", lvaInlineeReturnSpillTemp);
+
+                if (info.compRetType == TYP_REF)
+                {
+                    // We may have co-opted an existing temp for the return spill.
+                    // We likely assumed it was single-def at the time, but now
+                    // we can see it has multiple definitions.
+                    if ((retBlocks > 1) && (lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef == 1))
+                    {
+                        // Make sure it is no longer marked single def. This is only safe
+                        // to do if we haven't ever updated the type.
+                        assert(!lvaTable[lvaInlineeReturnSpillTemp].lvClassInfoUpdated);
+                        JITDUMP("Marked return spill temp V%02u as NOT single def temp\n", lvaInlineeReturnSpillTemp);
+                        lvaTable[lvaInlineeReturnSpillTemp].lvSingleDef = 0;
+                    }
+                }
             }
             else
             {
@@ -21859,43 +21874,58 @@ void Compiler::fgInline()
 
     do
     {
-        /* Make the current basic block address available globally */
-
+        // Make the current basic block address available globally
         compCurBB = block;
 
-        GenTreeStmt* stmt;
-        GenTree*     expr;
-
-        for (stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
+        for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
         {
-            expr = stmt->gtStmtExpr;
 
-            // See if we can expand the inline candidate
-            if ((expr->gtOper == GT_CALL) && ((expr->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0))
+#ifdef DEBUG
+            // In debug builds we want the inline tree to show all failed
+            // inlines. Some inlines may fail very early and never make it to
+            // candidate stage. So scan the tree looking for those early failures.
+            fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
+#endif
+
+            GenTree* expr = stmt->gtStmtExpr;
+
+            // The importer ensures that all inline candidates are
+            // statement expressions.  So see if we have a call.
+            if (expr->IsCall())
             {
                 GenTreeCall* call = expr->AsCall();
-                InlineResult inlineResult(this, call, stmt, "fgInline");
 
-                fgMorphStmt = stmt;
+                // We do. Is it an inline candidate?
+                //
+                // Note we also process GuardedDevirtualizationCandidates here as we've
+                // split off GT_RET_EXPRs for them even when they are not inline candidates
+                // as we need similar processing to ensure they get patched back to where
+                // they belong.
+                if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+                {
+                    InlineResult inlineResult(this, call, stmt, "fgInline");
+
+                    fgMorphStmt = stmt;
 
-                fgMorphCallInline(call, &inlineResult);
+                    fgMorphCallInline(call, &inlineResult);
 
-                if (stmt->gtStmtExpr->IsNothingNode())
-                {
-                    fgRemoveStmt(block, stmt);
-                    continue;
+                    // fgMorphCallInline may have updated the
+                    // statement expression to a GT_NOP if the
+                    // call returned a value, regardless of
+                    // whether the inline succeeded or failed.
+                    //
+                    // If so, remove the GT_NOP and continue
+                    // on with the next statement.
+                    if (stmt->gtStmtExpr->IsNothingNode())
+                    {
+                        fgRemoveStmt(block, stmt);
+                        continue;
+                    }
                 }
             }
-            else
-            {
-#ifdef DEBUG
-                // Look for non-candidates.
-                fgWalkTreePre(&stmt->gtStmtExpr, fgFindNonInlineCandidate, stmt);
-#endif
-            }
 
-            // See if we need to replace the return value place holder.
-            // Also, see if this update enables further devirtualization.
+            // See if we need to replace some return value place holders.
+            // Also, see if this replacement enables further devirtualization.
             //
             // Note we have both preorder and postorder callbacks here.
             //
@@ -22008,6 +22038,11 @@ Compiler::fgWalkResult Compiler::fgFindNonInlineCandidate(GenTree** pTree, fgWal
 
 void Compiler::fgNoteNonInlineCandidate(GenTreeStmt* stmt, GenTreeCall* call)
 {
+    if (call->IsInlineCandidate() || call->IsGuardedDevirtualizationCandidate())
+    {
+        return;
+    }
+
     InlineResult      inlineResult(this, call, nullptr, "fgNotInlineCandidate");
     InlineObservation currentObservation = InlineObservation::CALLSITE_NOT_CANDIDATE;
 
@@ -22471,10 +22506,11 @@ Compiler::fgWalkResult Compiler::fgLateDevirtualization(GenTree** pTree, fgWalkD
             }
 #endif // DEBUG
 
-            CORINFO_METHOD_HANDLE  method      = call->gtCallMethHnd;
-            unsigned               methodFlags = 0;
-            CORINFO_CONTEXT_HANDLE context     = nullptr;
-            comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr);
+            CORINFO_METHOD_HANDLE  method                 = call->gtCallMethHnd;
+            unsigned               methodFlags            = 0;
+            CORINFO_CONTEXT_HANDLE context                = nullptr;
+            const bool             isLateDevirtualization = true;
+            comp->impDevirtualizeCall(call, &method, &methodFlags, &context, nullptr, isLateDevirtualization);
         }
     }
     else if (tree->OperGet() == GT_ASG)
@@ -25507,8 +25543,10 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock*      block,
     return true;
 }
 
-// FatCalliTransformer transforms calli that can use fat function pointer.
-// Fat function pointer is pointer with the second least significant bit set,
+// IndirectCallTransformer transforms indirect calls that involve fat pointers
+// or guarded devirtualization.
+//
+// A fat function pointer is a pointer with the second least significant bit set,
 // if the bit is set, the pointer (after clearing the bit) actually points to
 // a tuple <method pointer, instantiation argument pointer> where
 // instantiationArgument is a hidden first argument required by method pointer.
@@ -25557,38 +25595,60 @@ bool Compiler::fgRetargetBranchesToCanonicalCallFinally(BasicBlock*      block,
 //     subsequent statements
 //   }
 //
-class FatCalliTransformer
+class IndirectCallTransformer
 {
 public:
-    FatCalliTransformer(Compiler* compiler) : compiler(compiler)
+    IndirectCallTransformer(Compiler* compiler) : compiler(compiler)
     {
     }
 
     //------------------------------------------------------------------------
     // Run: run transformation for each block.
     //
-    void Run()
+    // Returns:
+    //   Count of calls transformed.
+    int Run()
     {
+        int count = 0;
+
         for (BasicBlock* block = compiler->fgFirstBB; block != nullptr; block = block->bbNext)
         {
-            TransformBlock(block);
+            count += TransformBlock(block);
         }
+
+        return count;
     }
 
 private:
     //------------------------------------------------------------------------
-    // TransformBlock: look through statements and transform statements with fat pointer calls.
+    // TransformBlock: look through statements and transform statements with
+    //   particular indirect calls
     //
-    void TransformBlock(BasicBlock* block)
+    // Returns:
+    //   Count of calls transformed.
+    //
+    int TransformBlock(BasicBlock* block)
     {
+        int count = 0;
+
         for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
         {
             if (ContainsFatCalli(stmt))
             {
-                StatementTransformer stmtTransformer(compiler, block, stmt);
-                stmtTransformer.Run();
+                FatPointerCallTransformer transformer(compiler, block, stmt);
+                transformer.Run();
+                count++;
+            }
+
+            if (ContainsGuardedDevirtualizationCandidate(stmt))
+            {
+                GuardedDevirtualizationTransformer transformer(compiler, block, stmt);
+                transformer.Run();
+                count++;
             }
         }
+
+        return count;
     }
 
     //------------------------------------------------------------------------
@@ -25609,39 +25669,158 @@ private:
         return fatPointerCandidate->IsCall() && fatPointerCandidate->AsCall()->IsFatPointerCandidate();
     }
 
-    class StatementTransformer
+    //------------------------------------------------------------------------
+    // ContainsGuardedDevirtualizationCandidate: check whether this statement contains a virtual
+    // call that we'd like to guardedly devirtualize.
+    //
+    // Return Value:
+    //    true if contains, false otherwise.
+    //
+    // Notes:
+    //    calls are hoisted to top level ... (we hope)
+    bool ContainsGuardedDevirtualizationCandidate(GenTreeStmt* stmt)
+    {
+        GenTree* candidate = stmt->gtStmtExpr;
+        return candidate->IsCall() && candidate->AsCall()->IsGuardedDevirtualizationCandidate();
+    }
+
+    class Transformer
     {
     public:
-        StatementTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+        Transformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
             : compiler(compiler), currBlock(block), stmt(stmt)
         {
-            remainderBlock  = nullptr;
-            checkBlock      = nullptr;
-            thenBlock       = nullptr;
-            elseBlock       = nullptr;
-            doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
-            origCall        = GetCall(stmt);
-            fptrAddress     = origCall->gtCallAddr;
-            pointerType     = fptrAddress->TypeGet();
+            remainderBlock = nullptr;
+            checkBlock     = nullptr;
+            thenBlock      = nullptr;
+            elseBlock      = nullptr;
+            origCall       = nullptr;
         }
 
         //------------------------------------------------------------------------
         // Run: transform the statement as described above.
         //
-        void Run()
+        virtual void Run()
+        {
+            origCall = GetCall(stmt);
+            Transform();
+        }
+
+        void Transform()
         {
-            ClearFatFlag();
+            JITDUMP("*** %s: transforming [%06u]\n", Name(), compiler->dspTreeID(stmt));
+            FixupRetExpr();
+            ClearFlag();
             CreateRemainder();
             CreateCheck();
             CreateThen();
             CreateElse();
-
             RemoveOldStatement();
             SetWeights();
             ChainFlow();
         }
 
-    private:
+    protected:
+        virtual const char*  Name()                         = 0;
+        virtual void         ClearFlag()                    = 0;
+        virtual GenTreeCall* GetCall(GenTreeStmt* callStmt) = 0;
+        virtual void FixupRetExpr()                         = 0;
+
+        //------------------------------------------------------------------------
+        // CreateRemainder: split current block at the call stmt and
+        // insert statements after the call into remainderBlock.
+        //
+        void CreateRemainder()
+        {
+            remainderBlock          = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
+            unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
+            remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
+        }
+
+        virtual void CreateCheck() = 0;
+
+        //------------------------------------------------------------------------
+        // CreateAndInsertBasicBlock: ask compiler to create new basic block.
+        // and insert in into the basic block list.
+        //
+        // Arguments:
+        //    jumpKind - jump kind for the new basic block
+        //    insertAfter - basic block, after which compiler has to insert the new one.
+        //
+        // Return Value:
+        //    new basic block.
+        BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
+        {
+            BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
+            if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
+            {
+                block->bbFlags &= ~BBF_INTERNAL;
+                block->bbFlags |= BBF_IMPORTED;
+            }
+            return block;
+        }
+
+        virtual void CreateThen() = 0;
+        virtual void CreateElse() = 0;
+
+        //------------------------------------------------------------------------
+        // RemoveOldStatement: remove original stmt from current block.
+        //
+        void RemoveOldStatement()
+        {
+            compiler->fgRemoveStmt(currBlock, stmt);
+        }
+
+        //------------------------------------------------------------------------
+        // SetWeights: set weights for new blocks.
+        //
+        void SetWeights()
+        {
+            remainderBlock->inheritWeight(currBlock);
+            checkBlock->inheritWeight(currBlock);
+            thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
+            elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+        }
+
+        //------------------------------------------------------------------------
+        // ChainFlow: link new blocks into correct cfg.
+        //
+        void ChainFlow()
+        {
+            assert(!compiler->fgComputePredsDone);
+            checkBlock->bbJumpDest = elseBlock;
+            thenBlock->bbJumpDest  = remainderBlock;
+        }
+
+        Compiler*    compiler;
+        BasicBlock*  currBlock;
+        BasicBlock*  remainderBlock;
+        BasicBlock*  checkBlock;
+        BasicBlock*  thenBlock;
+        BasicBlock*  elseBlock;
+        GenTreeStmt* stmt;
+        GenTreeCall* origCall;
+
+        const int HIGH_PROBABILITY = 80;
+    };
+
+    class FatPointerCallTransformer final : public Transformer
+    {
+    public:
+        FatPointerCallTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+            : Transformer(compiler, block, stmt)
+        {
+            doesReturnValue = stmt->gtStmtExpr->OperIs(GT_ASG);
+            fptrAddress     = origCall->gtCallAddr;
+            pointerType     = fptrAddress->TypeGet();
+        }
+
+    protected:
+        virtual const char* Name()
+        {
+            return "FatPointerCall";
+        }
+
         //------------------------------------------------------------------------
         // GetCall: find a call in a statement.
         //
@@ -25650,7 +25829,7 @@ private:
         //
         // Return Value:
         //    call tree node pointer.
-        GenTreeCall* GetCall(GenTreeStmt* callStmt)
+        virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
         {
             GenTree*     tree = callStmt->gtStmtExpr;
             GenTreeCall* call = nullptr;
@@ -25667,28 +25846,22 @@ private:
         }
 
         //------------------------------------------------------------------------
-        // ClearFatFlag: clear fat pointer candidate flag from the original call.
+        // ClearFlag: clear fat pointer candidate flag from the original call.
         //
-        void ClearFatFlag()
+        virtual void ClearFlag()
         {
             origCall->ClearFatPointerCandidate();
         }
 
-        //------------------------------------------------------------------------
-        // CreateRemainder: split current block at the fat call stmt and
-        // insert statements after the call into remainderBlock.
-        //
-        void CreateRemainder()
+        // FixupRetExpr: no action needed as we handle this in the importer.
+        virtual void FixupRetExpr()
         {
-            remainderBlock          = compiler->fgSplitBlockAfterStatement(currBlock, stmt);
-            unsigned propagateFlags = currBlock->bbFlags & BBF_GC_SAFE_POINT;
-            remainderBlock->bbFlags |= BBF_JMP_TARGET | BBF_HAS_LABEL | propagateFlags;
         }
 
         //------------------------------------------------------------------------
         // CreateCheck: create check block, that checks fat pointer bit set.
         //
-        void CreateCheck()
+        virtual void CreateCheck()
         {
             checkBlock               = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
             GenTree* fatPointerMask  = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK);
@@ -25702,19 +25875,20 @@ private:
         }
 
         //------------------------------------------------------------------------
-        // CreateCheck: create then block, that is executed if call address is not fat pointer.
+        // CreateThen: create then block, that is executed if the check succeeds.
+        // This simply executes the original call.
         //
-        void CreateThen()
+        virtual void CreateThen()
         {
-            thenBlock               = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
-            GenTree* nonFatCallStmt = compiler->gtCloneExpr(stmt)->AsStmt();
-            compiler->fgInsertStmtAtEnd(thenBlock, nonFatCallStmt);
+            thenBlock                   = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+            GenTree* copyOfOriginalStmt = compiler->gtCloneExpr(stmt)->AsStmt();
+            compiler->fgInsertStmtAtEnd(thenBlock, copyOfOriginalStmt);
         }
 
         //------------------------------------------------------------------------
-        // CreateCheck: create else block, that is executed if call address is fat pointer.
+        // CreateElse: create else block, that is executed if call address is fat pointer.
         //
-        void CreateElse()
+        virtual void CreateElse()
         {
             elseBlock = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
 
@@ -25727,27 +25901,6 @@ private:
         }
 
         //------------------------------------------------------------------------
-        // CreateAndInsertBasicBlock: ask compiler to create new basic block.
-        // and insert in into the basic block list.
-        //
-        // Arguments:
-        //    jumpKind - jump kind for the new basic block
-        //    insertAfter - basic block, after which compiler has to insert the new one.
-        //
-        // Return Value:
-        //    new basic block.
-        BasicBlock* CreateAndInsertBasicBlock(BBjumpKinds jumpKind, BasicBlock* insertAfter)
-        {
-            BasicBlock* block = compiler->fgNewBBafter(jumpKind, insertAfter, true);
-            if ((insertAfter->bbFlags & BBF_INTERNAL) == 0)
-            {
-                block->bbFlags &= ~BBF_INTERNAL;
-                block->bbFlags |= BBF_IMPORTED;
-            }
-            return block;
-        }
-
-        //------------------------------------------------------------------------
         // GetFixedFptrAddress: clear fat pointer bit from fat pointer address.
         //
         // Return Value:
@@ -25843,49 +25996,306 @@ private:
             iterator->Rest() = compiler->gtNewArgList(hiddenArgument);
         }
 
+    private:
+        const int FAT_POINTER_MASK = 0x2;
+
+        GenTree*  fptrAddress;
+        var_types pointerType;
+        bool      doesReturnValue;
+    };
+
+    class GuardedDevirtualizationTransformer final : public Transformer
+    {
+    public:
+        GuardedDevirtualizationTransformer(Compiler* compiler, BasicBlock* block, GenTreeStmt* stmt)
+            : Transformer(compiler, block, stmt), returnTemp(BAD_VAR_NUM)
+        {
+        }
+
         //------------------------------------------------------------------------
-        // RemoveOldStatement: remove original stmt from current block.
+        // Run: transform the statement as described above.
         //
-        void RemoveOldStatement()
+        virtual void Run()
         {
-            compiler->fgRemoveStmt(currBlock, stmt);
+            origCall = GetCall(stmt);
+
+            JITDUMP("*** %s contemplating [%06u]\n", Name(), compiler->dspTreeID(origCall));
+
+            // We currently need inline candidate info to do guarded devirtualization.
+            if (!origCall->IsInlineCandidate())
+            {
+                JITDUMP("*** %s Bailing on [%06u] -- not an inline candidate\n", Name(), compiler->dspTreeID(origCall));
+                ClearFlag();
+                return;
+            }
+
+            // For now, bail on transforming calls that still appear
+            // to return structs by value as there is deferred work
+            // needed to fix up the return type.
+            //
+            // See for instance fgUpdateInlineReturnExpressionPlaceHolder.
+            if (origCall->TypeGet() == TYP_STRUCT)
+            {
+                JITDUMP("*** %s Bailing on [%06u] -- can't handle by-value struct returns yet\n", Name(),
+                        compiler->dspTreeID(origCall));
+                ClearFlag();
+
+                // For stub calls restore the stub address
+                if (origCall->IsVirtualStub())
+                {
+                    origCall->gtStubCallStubAddr = origCall->gtInlineCandidateInfo->stubAddr;
+                }
+                return;
+            }
+
+            Transform();
+        }
+
+    protected:
+        virtual const char* Name()
+        {
+            return "GuardedDevirtualization";
         }
 
         //------------------------------------------------------------------------
-        // SetWeights: set weights for new blocks.
+        // GetCall: find a call in a statement.
         //
-        void SetWeights()
+        // Arguments:
+        //    callStmt - the statement with the call inside.
+        //
+        // Return Value:
+        //    call tree node pointer.
+        virtual GenTreeCall* GetCall(GenTreeStmt* callStmt)
         {
-            remainderBlock->inheritWeight(currBlock);
-            checkBlock->inheritWeight(currBlock);
-            thenBlock->inheritWeightPercentage(currBlock, HIGH_PROBABILITY);
-            elseBlock->inheritWeightPercentage(currBlock, 100 - HIGH_PROBABILITY);
+            GenTree* tree = callStmt->gtStmtExpr;
+            assert(tree->IsCall());
+            GenTreeCall* call = tree->AsCall();
+            return call;
         }
 
         //------------------------------------------------------------------------
-        // ChainFlow: link new blocks into correct cfg.
+        // ClearFlag: clear guarded devirtualization candidate flag from the original call.
         //
-        void ChainFlow()
+        virtual void ClearFlag()
         {
-            assert(!compiler->fgComputePredsDone);
-            checkBlock->bbJumpDest = elseBlock;
-            thenBlock->bbJumpDest  = remainderBlock;
+            origCall->ClearGuardedDevirtualizationCandidate();
         }
 
-        Compiler*    compiler;
-        BasicBlock*  currBlock;
-        BasicBlock*  remainderBlock;
-        BasicBlock*  checkBlock;
-        BasicBlock*  thenBlock;
-        BasicBlock*  elseBlock;
-        GenTreeStmt* stmt;
-        GenTreeCall* origCall;
-        GenTree*     fptrAddress;
-        var_types    pointerType;
-        bool         doesReturnValue;
+        //------------------------------------------------------------------------
+        // CreateCheck: create check block and check method table
+        //
+        virtual void CreateCheck()
+        {
+            checkBlock = CreateAndInsertBasicBlock(BBJ_COND, currBlock);
 
-        const int FAT_POINTER_MASK = 0x2;
-        const int HIGH_PROBABILITY = 80;
+            // Fetch method table from object arg to call.
+            GenTree* thisTree = compiler->gtCloneExpr(origCall->gtCallObjp);
+
+            // Create a temp for 'this' if the tree is not already a simple local.
+            if (!thisTree->IsLocal())
+            {
+                const unsigned thisTempNum = compiler->lvaGrabTemp(true DEBUGARG("guarded devirt this temp"));
+                // lvaSetClass(thisTempNum, ...);
+                GenTree* asgTree = compiler->gtNewTempAssign(thisTempNum, thisTree);
+                GenTree* asgStmt = compiler->fgNewStmtFromTree(asgTree, stmt->gtStmt.gtStmtILoffsx);
+                compiler->fgInsertStmtAtEnd(checkBlock, asgStmt);
+
+                thisTree = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+
+                // Propagate the new this to the call. Must be a new expr as the call
+                // will live on in the else block and thisTree is used below.
+                origCall->gtCallObjp = compiler->gtNewLclvNode(thisTempNum, TYP_REF);
+            }
+
+            GenTree* methodTable = compiler->gtNewIndir(TYP_I_IMPL, thisTree);
+            methodTable->gtFlags |= GTF_IND_INVARIANT;
+
+            // Find target method table
+            GuardedDevirtualizationCandidateInfo* guardedInfo       = origCall->gtGuardedDevirtualizationCandidateInfo;
+            CORINFO_CLASS_HANDLE                  clsHnd            = guardedInfo->guardedClassHandle;
+            GenTree*                              targetMethodTable = compiler->gtNewIconEmbClsHndNode(clsHnd);
+
+            // Compare and jump to else (which does the indirect call) if NOT equal
+            GenTree* methodTableCompare = compiler->gtNewOperNode(GT_NE, TYP_INT, targetMethodTable, methodTable);
+            GenTree* jmpTree            = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, methodTableCompare);
+            GenTree* jmpStmt            = compiler->fgNewStmtFromTree(jmpTree, stmt->gtStmt.gtStmtILoffsx);
+            compiler->fgInsertStmtAtEnd(checkBlock, jmpStmt);
+        }
+
+        //------------------------------------------------------------------------
+        // FixupRetExpr: set up to repair return value placeholder from call
+        //
+        virtual void FixupRetExpr()
+        {
+            // If call returns a value, we need to copy it to a temp, and
+            // update the associated GT_RET_EXPR to refer to the temp instead
+            // of the call.
+            //
+            // Note implicit by-ref returns should have already been converted
+            // so any struct copy we induce here should be cheap.
+            //
+            // Todo: make sure we understand how this interacts with return type
+            // munging for small structs.
+            InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+            GenTree*             retExpr    = inlineInfo->retExpr;
+
+            // Sanity check the ret expr if non-null: it should refer to the original call.
+            if (retExpr != nullptr)
+            {
+                assert(retExpr->gtRetExpr.gtInlineCandidate == origCall);
+            }
+
+            if (origCall->TypeGet() != TYP_VOID)
+            {
+                returnTemp = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt return temp"));
+                JITDUMP("Reworking call(s) to return value via a new temp V%02u\n", returnTemp);
+
+                if (varTypeIsStruct(origCall))
+                {
+                    compiler->lvaSetStruct(returnTemp, origCall->gtRetClsHnd, false);
+                }
+
+                GenTree* tempTree = compiler->gtNewLclvNode(returnTemp, origCall->TypeGet());
+
+                JITDUMP("Updating GT_RET_EXPR [%06u] to refer to temp V%02u\n", compiler->dspTreeID(retExpr),
+                        returnTemp);
+                retExpr->gtRetExpr.gtInlineCandidate = tempTree;
+            }
+            else if (retExpr != nullptr)
+            {
+                // We still oddly produce GT_RET_EXPRs for some void
+                // returning calls. Just patch the ret expr to a NOP.
+                //
+                // Todo: consider bagging creation of these RET_EXPRs. The only possible
+                // benefit they provide is stitching back larger trees for failed inlines
+                // of void-returning methods. But then the calls likely sit in commas and
+                // the benefit of a larger tree is unclear.
+                JITDUMP("Updating GT_RET_EXPR [%06u] for VOID return to refer to a NOP\n",
+                        compiler->dspTreeID(retExpr));
+                GenTree* nopTree                     = compiler->gtNewNothingNode();
+                retExpr->gtRetExpr.gtInlineCandidate = nopTree;
+            }
+            else
+            {
+                // We do not produce GT_RET_EXPRs for CTOR calls, so there is nothing to patch.
+            }
+        }
+
+        //------------------------------------------------------------------------
+        // CreateThen: create then block with direct call to method
+        //
+        virtual void CreateThen()
+        {
+            thenBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, checkBlock);
+
+            InlineCandidateInfo* inlineInfo = origCall->gtInlineCandidateInfo;
+            CORINFO_CLASS_HANDLE clsHnd     = inlineInfo->clsHandle;
+
+            // copy 'this' to temp with exact type.
+            const unsigned thisTemp  = compiler->lvaGrabTemp(false DEBUGARG("guarded devirt this exact temp"));
+            GenTree*       clonedObj = compiler->gtCloneExpr(origCall->gtCallObjp);
+            GenTree*       assign    = compiler->gtNewTempAssign(thisTemp, clonedObj);
+            compiler->lvaSetClass(thisTemp, clsHnd, true);
+            GenTreeStmt* assignStmt = compiler->gtNewStmt(assign);
+            compiler->fgInsertStmtAtEnd(thenBlock, assignStmt);
+
+            // Clone call
+            GenTreeCall* call = compiler->gtCloneExpr(origCall)->AsCall();
+            call->gtCallObjp  = compiler->gtNewLclvNode(thisTemp, TYP_REF);
+            call->SetIsGuarded();
+
+            JITDUMP("Direct call [%06u] in block BB%02u\n", compiler->dspTreeID(call), thenBlock->bbNum);
+
+            // Then invoke impDevirtualizeCall to actually
+            // transform the call for us. It should succeed, as we have
+            // now provided an exactly typed this.
+            CORINFO_METHOD_HANDLE  methodHnd              = inlineInfo->methInfo.ftn;
+            unsigned               methodFlags            = inlineInfo->methAttr;
+            CORINFO_CONTEXT_HANDLE context                = inlineInfo->exactContextHnd;
+            const bool             isLateDevirtualization = true;
+            compiler->impDevirtualizeCall(call, &methodHnd, &methodFlags, &context, nullptr, isLateDevirtualization);
+
+            // Presumably devirtualization might fail; if so, we should try to avoid
+            // making the call a guarded devirt candidate rather than ending
+            // up here.
+            assert(!call->IsVirtual());
+
+            // Re-establish this call as an inline candidate.
+            GenTree* oldRetExpr         = inlineInfo->retExpr;
+            inlineInfo->clsHandle       = clsHnd;
+            inlineInfo->exactContextHnd = context;
+            call->gtInlineCandidateInfo = inlineInfo;
+
+            // Add the call
+            GenTreeStmt* callStmt = compiler->gtNewStmt(call);
+            compiler->fgInsertStmtAtEnd(thenBlock, callStmt);
+
+            // If there was a ret expr for this call, we need to create a new one
+            // and append it just after the call.
+            //
+            // Note the original GT_RET_EXPR is sitting at the join point of the
+            // guarded expansion and, for non-void calls, now refers to a temp local;
+            // we set all this up in FixupRetExpr().
+            if (oldRetExpr != nullptr)
+            {
+                GenTree* retExpr    = compiler->gtNewInlineCandidateReturnExpr(call, call->TypeGet());
+                inlineInfo->retExpr = retExpr;
+
+                if (returnTemp != BAD_VAR_NUM)
+                {
+                    retExpr = compiler->gtNewTempAssign(returnTemp, retExpr);
+                }
+                else
+                {
+                    // We should always have a return temp if we return results by value
+                    assert(origCall->TypeGet() == TYP_VOID);
+                }
+
+                GenTreeStmt* resultStmt = compiler->gtNewStmt(retExpr);
+                compiler->fgInsertStmtAtEnd(thenBlock, resultStmt);
+            }
+        }
+
+        //------------------------------------------------------------------------
+        // CreateElse: create else block. This executes the unaltered indirect call.
+        //
+        virtual void CreateElse()
+        {
+            elseBlock            = CreateAndInsertBasicBlock(BBJ_NONE, thenBlock);
+            GenTreeCall* call    = origCall;
+            GenTreeStmt* newStmt = compiler->gtNewStmt(call);
+
+            call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+            call->SetIsGuarded();
+
+            JITDUMP("Residual call [%06u] moved to block BB%02u\n", compiler->dspTreeID(call), elseBlock->bbNum);
+
+            if (returnTemp != BAD_VAR_NUM)
+            {
+                GenTree* assign     = compiler->gtNewTempAssign(returnTemp, call);
+                newStmt->gtStmtExpr = assign;
+            }
+
+            // For stub calls, restore the stub address. For everything else,
+            // null out the candidate info field.
+            if (call->IsVirtualStub())
+            {
+                JITDUMP("Restoring stub addr %p from candidate info\n", call->gtInlineCandidateInfo->stubAddr);
+                call->gtStubCallStubAddr = call->gtInlineCandidateInfo->stubAddr;
+            }
+            else
+            {
+                call->gtInlineCandidateInfo = nullptr;
+            }
+
+            compiler->fgInsertStmtAtEnd(elseBlock, newStmt);
+
+            // Set the original statement to a nop.
+            stmt->gtStmtExpr = compiler->gtNewNothingNode();
+        }
+
+    private:
+        unsigned returnTemp;
     };
 
     Compiler* compiler;
@@ -25894,46 +26304,74 @@ private:
 #ifdef DEBUG
 
 //------------------------------------------------------------------------
-// fgDebugCheckFatPointerCandidates: callback to make sure there are no more GTF_CALL_M_FAT_POINTER_CHECK calls.
+// fgDebugCheckForTransformableIndirectCalls: callback to make sure there
+//  are no more GTF_CALL_M_FAT_POINTER_CHECK or GTF_CALL_M_GUARDED_DEVIRT
+//  calls remaining
 //
-Compiler::fgWalkResult Compiler::fgDebugCheckFatPointerCandidates(GenTree** pTree, fgWalkData* data)
+Compiler::fgWalkResult Compiler::fgDebugCheckForTransformableIndirectCalls(GenTree** pTree, fgWalkData* data)
 {
     GenTree* tree = *pTree;
     if (tree->IsCall())
     {
         assert(!tree->AsCall()->IsFatPointerCandidate());
+        assert(!tree->AsCall()->IsGuardedDevirtualizationCandidate());
     }
     return WALK_CONTINUE;
 }
 
 //------------------------------------------------------------------------
-// CheckNoFatPointerCandidatesLeft: walk through blocks and check that there are no fat pointer candidates left.
+// CheckNoTransformableIndirectCallsRemain: walk through blocks and check
+//    that there are no indirect call candidates left to transform.
 //
-void Compiler::CheckNoFatPointerCandidatesLeft()
+void Compiler::CheckNoTransformableIndirectCallsRemain()
 {
     assert(!doesMethodHaveFatPointer());
+    assert(!doesMethodHaveGuardedDevirtualization());
+
     for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
     {
         for (GenTreeStmt* stmt = block->firstStmt(); stmt != nullptr; stmt = stmt->gtNextStmt)
         {
-            fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckFatPointerCandidates);
+            fgWalkTreePre(&stmt->gtStmtExpr, fgDebugCheckForTransformableIndirectCalls);
         }
     }
 }
 #endif
 
 //------------------------------------------------------------------------
-// fgTransformFatCalli: find and transform fat calls.
+// fgTransformIndirectCalls: find and transform various indirect calls
 //
-void Compiler::fgTransformFatCalli()
+// These transformations happen post-import because they may introduce
+// control flow.
+//
+void Compiler::fgTransformIndirectCalls()
 {
-    assert(IsTargetAbi(CORINFO_CORERT_ABI));
-    FatCalliTransformer fatCalliTransformer(this);
-    fatCalliTransformer.Run();
-    clearMethodHasFatPointer();
-#ifdef DEBUG
-    CheckNoFatPointerCandidatesLeft();
-#endif
+    JITDUMP("\n*************** in fgTransformIndirectCalls(%s)\n", compIsForInlining() ? "inlinee" : "root");
+
+    if (doesMethodHaveFatPointer() || doesMethodHaveGuardedDevirtualization())
+    {
+        IndirectCallTransformer indirectCallTransformer(this);
+        int                     count = indirectCallTransformer.Run();
+
+        if (count > 0)
+        {
+            JITDUMP("\n*************** After fgTransformIndirectCalls() [%d calls transformed]\n", count);
+            INDEBUG(if (verbose) { fgDispBasicBlocks(true); });
+        }
+        else
+        {
+            JITDUMP(" -- no transforms done (?)\n");
+        }
+
+        clearMethodHasFatPointer();
+        clearMethodHasGuardedDevirtualization();
+    }
+    else
+    {
+        JITDUMP(" -- no candidates to transform\n");
+    }
+
+    INDEBUG(CheckNoTransformableIndirectCallsRemain(););
 }
 
 //------------------------------------------------------------------------
index 90b6bf8..542a701 100644 (file)
@@ -7343,8 +7343,9 @@ GenTree* Compiler::gtCloneExpr(
             copy->gtCall.setEntryPoint(tree->gtCall.gtEntryPoint);
 #endif
 
-#ifdef DEBUG
+#if defined(DEBUG) || defined(INLINE_DATA)
             copy->gtCall.gtInlineObservation = tree->gtCall.gtInlineObservation;
+            copy->gtCall.gtRawILOffset       = tree->gtCall.gtRawILOffset;
 #endif
 
             copy->AsCall()->CopyOtherRegFlags(tree->AsCall());
@@ -9334,9 +9335,22 @@ void Compiler::gtDispNode(GenTree* tree, IndentStack* indentStack, __in __in_z _
                 goto DASH;
 
             case GT_CALL:
-                if (tree->gtFlags & GTF_CALL_INLINE_CANDIDATE)
+                if (tree->gtCall.IsInlineCandidate())
                 {
-                    printf("I");
+                    if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+                    {
+                        printf("&");
+                    }
+                    else
+                    {
+                        printf("I");
+                    }
+                    --msgLength;
+                    break;
+                }
+                else if (tree->gtCall.IsGuardedDevirtualizationCandidate())
+                {
+                    printf("G");
                     --msgLength;
                     break;
                 }
index c6c5841..360fa1b 100644 (file)
@@ -144,8 +144,8 @@ enum gtCallTypes : BYTE
 /*****************************************************************************/
 
 struct BasicBlock;
-
 struct InlineCandidateInfo;
+struct GuardedDevirtualizationCandidateInfo;
 
 typedef unsigned short AssertionIndex;
 
@@ -3547,6 +3547,8 @@ struct GenTreeCall final : public GenTree
                                                     // the comma result is unused.
 #define GTF_CALL_M_DEVIRTUALIZED         0x00040000 // GT_CALL -- this call was devirtualized
 #define GTF_CALL_M_UNBOXED               0x00080000 // GT_CALL -- this call was optimized to use the unboxed entry point
+#define GTF_CALL_M_GUARDED_DEVIRT        0x00100000 // GT_CALL -- this call is a candidate for guarded devirtualization
+#define GTF_CALL_M_GUARDED               0x00200000 // GT_CALL -- this call was transformed by guarded devirtualization
 
     // clang-format on
 
@@ -3739,6 +3741,11 @@ struct GenTreeCall final : public GenTree
         return (gtCallMoreFlags & GTF_CALL_M_FAT_POINTER_CHECK) != 0;
     }
 
+    bool IsGuardedDevirtualizationCandidate() const
+    {
+        return (gtCallMoreFlags & GTF_CALL_M_GUARDED_DEVIRT) != 0;
+    }
+
     bool IsPure(Compiler* compiler) const;
 
     bool HasSideEffects(Compiler* compiler, bool ignoreExceptions = false, bool ignoreCctors = false) const;
@@ -3758,11 +3765,31 @@ struct GenTreeCall final : public GenTree
         return (gtCallMoreFlags & GTF_CALL_M_DEVIRTUALIZED) != 0;
     }
 
+    bool IsGuarded() const
+    {
+        return (gtCallMoreFlags & GTF_CALL_M_GUARDED) != 0;
+    }
+
     bool IsUnboxed() const
     {
         return (gtCallMoreFlags & GTF_CALL_M_UNBOXED) != 0;
     }
 
+    void ClearGuardedDevirtualizationCandidate()
+    {
+        gtCallMoreFlags &= ~GTF_CALL_M_GUARDED_DEVIRT;
+    }
+
+    void SetGuardedDevirtualizationCandidate()
+    {
+        gtCallMoreFlags |= GTF_CALL_M_GUARDED_DEVIRT;
+    }
+
+    void SetIsGuarded()
+    {
+        gtCallMoreFlags |= GTF_CALL_M_GUARDED;
+    }
+
     unsigned gtCallMoreFlags; // in addition to gtFlags
 
     unsigned char gtCallType : 3;   // value from the gtCallTypes enumeration
@@ -3774,8 +3801,9 @@ struct GenTreeCall final : public GenTree
         // only used for CALLI unmanaged calls (CT_INDIRECT)
         GenTree* gtCallCookie;
         // gtInlineCandidateInfo is only used when inlining methods
-        InlineCandidateInfo*   gtInlineCandidateInfo;
-        void*                  gtStubCallStubAddr;              // GTF_CALL_VIRT_STUB - these are never inlined
+        InlineCandidateInfo*                  gtInlineCandidateInfo;
+        GuardedDevirtualizationCandidateInfo* gtGuardedDevirtualizationCandidateInfo;
+        void*                                 gtStubCallStubAddr; // GTF_CALL_VIRT_STUB - these are never inlined
         CORINFO_GENERIC_HANDLE compileTimeHelperArgumentHandle; // Used to track type handle argument of dynamic helpers
         void*                  gtDirectCallAddress; // Used to pass direct call address between lower and codegen
     };
index eabc3bc..0ddf3b3 100644 (file)
@@ -7740,8 +7740,7 @@ var_types Compiler::impImportCall(OPCODE                  opcode,
                 }
                 else
                 {
-                    // ok, the stub is available at compile type.
-
+                    // The stub address is known at compile time
                     call = gtNewCallNode(CT_USER_FUNC, callInfo->hMethod, callRetTyp, nullptr, ilOffset);
                     call->gtCall.gtStubCallStubAddr = callInfo->stubLookup.constLookup.addr;
                     call->gtFlags |= GTF_CALL_VIRT_STUB;
@@ -8388,8 +8387,9 @@ var_types Compiler::impImportCall(OPCODE                  opcode,
             assert(obj->gtType == TYP_REF);
 
             // See if we can devirtualize.
+            const bool isLateDevirtualization = false;
             impDevirtualizeCall(call->AsCall(), &callInfo->hMethod, &callInfo->methodFlags, &callInfo->contextHandle,
-                                &exactContextHnd);
+                                &exactContextHnd, isLateDevirtualization);
         }
 
         if (impIsThis(obj))
@@ -8717,26 +8717,43 @@ DONE_CALL:
         {
             // Sometimes "call" is not a GT_CALL (if we imported an intrinsic that didn't turn into a call)
 
-            bool fatPointerCandidate = call->AsCall()->IsFatPointerCandidate();
+            GenTreeCall* origCall = call->AsCall();
+
+            const bool isFatPointerCandidate              = origCall->IsFatPointerCandidate();
+            const bool isInlineCandidate                  = origCall->IsInlineCandidate();
+            const bool isGuardedDevirtualizationCandidate = origCall->IsGuardedDevirtualizationCandidate();
+
             if (varTypeIsStruct(callRetTyp))
             {
+                // Need to treat all "split tree" cases here, not just inline candidates
                 call = impFixupCallStructReturn(call->AsCall(), sig->retTypeClass);
             }
 
-            if ((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0)
+            // TODO: consider handling fatcalli cases this way too...?
+            if (isInlineCandidate || isGuardedDevirtualizationCandidate)
             {
+                // We should not have made any adjustments in impFixupCallStructReturn
+                // as we defer those until we know the fate of the call.
+                assert(call == origCall);
+
                 assert(opts.OptEnabled(CLFLG_INLINING));
-                assert(!fatPointerCandidate); // We should not try to inline calli.
+                assert(!isFatPointerCandidate); // We should not try to inline calli.
 
                 // Make the call its own tree (spill the stack if needed).
                 impAppendTree(call, (unsigned)CHECK_SPILL_ALL, impCurStmtOffs);
 
                 // TODO: Still using the widened type.
-                call = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+                GenTree* retExpr = gtNewInlineCandidateReturnExpr(call, genActualType(callRetTyp));
+
+                // Link the retExpr to the call so if necessary we can manipulate it later.
+                origCall->gtInlineCandidateInfo->retExpr = retExpr;
+
+                // Propagate retExpr as the placeholder for the call.
+                call = retExpr;
             }
             else
             {
-                if (fatPointerCandidate)
+                if (isFatPointerCandidate)
                 {
                     // fatPointer candidates should be in statements of the form call() or var = call().
                     // Such form allows to find statements with fat calls without walking through whole trees
@@ -18378,7 +18395,7 @@ void Compiler::impCanInlineIL(CORINFO_METHOD_HANDLE fncHandle,
 /*****************************************************************************
  */
 
-void Compiler::impCheckCanInline(GenTree*               call,
+void Compiler::impCheckCanInline(GenTreeCall*           call,
                                  CORINFO_METHOD_HANDLE  fncHandle,
                                  unsigned               methAttr,
                                  CORINFO_CONTEXT_HANDLE exactContextHnd,
@@ -18391,7 +18408,7 @@ void Compiler::impCheckCanInline(GenTree*               call,
     struct Param
     {
         Compiler*              pThis;
-        GenTree*               call;
+        GenTreeCall*           call;
         CORINFO_METHOD_HANDLE  fncHandle;
         unsigned               methAttr;
         CORINFO_CONTEXT_HANDLE exactContextHnd;
@@ -18523,22 +18540,42 @@ void Compiler::impCheckCanInline(GenTree*               call,
                    (varTypeIsStruct(fncRetType) && (fncRealRetType == TYP_STRUCT)));
 #endif
 
+            // Allocate an InlineCandidateInfo structure,
             //
-            // Allocate an InlineCandidateInfo structure
+            // Or, reuse the existing GuardedDevirtualizationCandidateInfo,
+            // which was pre-allocated to have extra room.
             //
             InlineCandidateInfo* pInfo;
-            pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
-
-            pInfo->dwRestrictions       = dwRestrictions;
-            pInfo->methInfo             = methInfo;
-            pInfo->methAttr             = pParam->methAttr;
-            pInfo->clsHandle            = clsHandle;
-            pInfo->clsAttr              = clsAttr;
-            pInfo->fncRetType           = fncRetType;
-            pInfo->exactContextHnd      = pParam->exactContextHnd;
-            pInfo->ilCallerHandle       = pParam->pThis->info.compMethodHnd;
-            pInfo->initClassResult      = initClassResult;
-            pInfo->preexistingSpillTemp = BAD_VAR_NUM;
+
+            if (pParam->call->IsGuardedDevirtualizationCandidate())
+            {
+                pInfo = pParam->call->gtInlineCandidateInfo;
+            }
+            else
+            {
+                pInfo = new (pParam->pThis, CMK_Inlining) InlineCandidateInfo;
+
+                // Null out bits we don't use when we're just inlining
+                pInfo->guardedClassHandle  = nullptr;
+                pInfo->guardedMethodHandle = nullptr;
+                pInfo->stubAddr            = nullptr;
+            }
+
+            pInfo->methInfo                       = methInfo;
+            pInfo->ilCallerHandle                 = pParam->pThis->info.compMethodHnd;
+            pInfo->clsHandle                      = clsHandle;
+            pInfo->exactContextHnd                = pParam->exactContextHnd;
+            pInfo->retExpr                        = nullptr;
+            pInfo->dwRestrictions                 = dwRestrictions;
+            pInfo->preexistingSpillTemp           = BAD_VAR_NUM;
+            pInfo->clsAttr                        = clsAttr;
+            pInfo->methAttr                       = pParam->methAttr;
+            pInfo->initClassResult                = initClassResult;
+            pInfo->fncRetType                     = fncRetType;
+            pInfo->exactContextNeedsRuntimeLookup = false;
+
+            // Note exactContextNeedsRuntimeLookup is reset later on,
+            // over in impMarkInlineCandidate.
 
             *(pParam->ppInlineCandidateInfo) = pInfo;
 
@@ -19488,6 +19525,60 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree*    ad
 //    callInfo -- call info from VM
 //
 // Notes:
+//    Mostly a wrapper for impMarkInlineCandidateHelper that also undoes
+//    guarded devirtualization for virtual calls where the method we'd
+//    devirtualize to cannot be inlined.
+
+void Compiler::impMarkInlineCandidate(GenTree*               callNode,
+                                      CORINFO_CONTEXT_HANDLE exactContextHnd,
+                                      bool                   exactContextNeedsRuntimeLookup,
+                                      CORINFO_CALL_INFO*     callInfo)
+{
+    GenTreeCall* call = callNode->AsCall();
+
+    // Do the actual evaluation
+    impMarkInlineCandidateHelper(call, exactContextHnd, exactContextNeedsRuntimeLookup, callInfo);
+
+    // If this call is an inline candidate or is not a guarded devirtualization
+    // candidate, we're done.
+    if (call->IsInlineCandidate() || !call->IsGuardedDevirtualizationCandidate())
+    {
+        return;
+    }
+
+    // If we can't inline the call we'd guardedly devirtualize to,
+    // we undo the guarded devirtualization, as the benefit from
+    // just guarded devirtualization alone is likely not worth the
+    // extra jit time and code size.
+    //
+    // TODO: it is possibly interesting to allow this, but requires
+    // fixes elsewhere too...
+    JITDUMP("Revoking guarded devirtualization candidacy for call [%06u]: target method can't be inlined\n",
+            dspTreeID(call));
+
+    call->ClearGuardedDevirtualizationCandidate();
+
+    // If we have a stub address, restore it back into the union that it shares
+    // with the candidate info.
+    if (call->IsVirtualStub())
+    {
+        JITDUMP("Restoring stub addr %p from guarded devirt candidate info\n",
+                call->gtGuardedDevirtualizationCandidateInfo->stubAddr);
+        call->gtStubCallStubAddr = call->gtGuardedDevirtualizationCandidateInfo->stubAddr;
+    }
+}
+
+//------------------------------------------------------------------------
+// impMarkInlineCandidateHelper: determine if this call can be subsequently
+//     inlined
+//
+// Arguments:
+//    callNode -- call under scrutiny
+//    exactContextHnd -- context handle for inlining
+//    exactContextNeedsRuntimeLookup -- true if context required runtime lookup
+//    callInfo -- call info from VM
+//
+// Notes:
 //    If callNode is an inline candidate, this method sets the flag
 //    GTF_CALL_INLINE_CANDIDATE, and ensures that helper methods have
 //    filled in the associated InlineCandidateInfo.
@@ -19497,10 +19588,10 @@ BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree*    ad
 //    method may be marked as "noinline" to short-circuit any
 //    future assessments of calls to this method.
 
-void Compiler::impMarkInlineCandidate(GenTree*               callNode,
-                                      CORINFO_CONTEXT_HANDLE exactContextHnd,
-                                      bool                   exactContextNeedsRuntimeLookup,
-                                      CORINFO_CALL_INFO*     callInfo)
+void Compiler::impMarkInlineCandidateHelper(GenTreeCall*           call,
+                                            CORINFO_CONTEXT_HANDLE exactContextHnd,
+                                            bool                   exactContextNeedsRuntimeLookup,
+                                            CORINFO_CALL_INFO*     callInfo)
 {
     // Let the strategy know there's another call
     impInlineRoot()->m_inlineStrategy->NoteCall();
@@ -19525,7 +19616,6 @@ void Compiler::impMarkInlineCandidate(GenTree*               callNode,
         return;
     }
 
-    GenTreeCall* call = callNode->AsCall();
     InlineResult inlineResult(this, call, nullptr, "impMarkInlineCandidate");
 
     // Don't inline if not optimizing root method
@@ -19562,14 +19652,20 @@ void Compiler::impMarkInlineCandidate(GenTree*               callNode,
 
     if (call->IsVirtual())
     {
-        inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
-        return;
+        // Allow guarded devirt calls to be treated as inline candidates,
+        // but reject all other virtual calls.
+        if (!call->IsGuardedDevirtualizationCandidate())
+        {
+            inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_NOT_DIRECT);
+            return;
+        }
     }
 
     /* Ignore helper calls */
 
     if (call->gtCallType == CT_HELPER)
     {
+        assert(!call->IsGuardedDevirtualizationCandidate());
         inlineResult.NoteFatal(InlineObservation::CALLSITE_IS_CALL_TO_HELPER);
         return;
     }
@@ -19585,17 +19681,27 @@ void Compiler::impMarkInlineCandidate(GenTree*               callNode,
      * restricts the inliner to non-expanding inlines.  I removed the check to allow for non-expanding
      * inlining in throw blocks.  I should consider the same thing for catch and filter regions. */
 
-    CORINFO_METHOD_HANDLE fncHandle = call->gtCallMethHnd;
+    CORINFO_METHOD_HANDLE fncHandle;
     unsigned              methAttr;
 
-    // Reuse method flags from the original callInfo if possible
-    if (fncHandle == callInfo->hMethod)
+    if (call->IsGuardedDevirtualizationCandidate())
     {
-        methAttr = callInfo->methodFlags;
+        fncHandle = call->gtGuardedDevirtualizationCandidateInfo->guardedMethodHandle;
+        methAttr  = info.compCompHnd->getMethodAttribs(fncHandle);
     }
     else
     {
-        methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+        fncHandle = call->gtCallMethHnd;
+
+        // Reuse method flags from the original callInfo if possible
+        if (fncHandle == callInfo->hMethod)
+        {
+            methAttr = callInfo->methodFlags;
+        }
+        else
+        {
+            methAttr = info.compCompHnd->getMethodAttribs(fncHandle);
+        }
     }
 
 #ifdef DEBUG
@@ -19695,14 +19801,13 @@ void Compiler::impMarkInlineCandidate(GenTree*               callNode,
         return;
     }
 
-    // The old value should be NULL
-    assert(call->gtInlineCandidateInfo == nullptr);
+    // The old value should be null OR this call should be a guarded devirtualization candidate.
+    assert((call->gtInlineCandidateInfo == nullptr) || call->IsGuardedDevirtualizationCandidate());
 
-    // The new value should not be NULL.
+    // The new value should not be null.
     assert(inlineCandidateInfo != nullptr);
     inlineCandidateInfo->exactContextNeedsRuntimeLookup = exactContextNeedsRuntimeLookup;
-
-    call->gtInlineCandidateInfo = inlineCandidateInfo;
+    call->gtInlineCandidateInfo                         = inlineCandidateInfo;
 
     // Mark the call node as inline candidate.
     call->gtFlags |= GTF_CALL_INLINE_CANDIDATE;
@@ -19837,6 +19942,7 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
 //     methodFlags -- [IN/OUT] flags for the method to call. Updated iff call devirtualized.
 //     contextHandle -- [IN/OUT] context handle for the call. Updated iff call devirtualized.
 //     exactContextHnd -- [OUT] updated context handle iff call devirtualized
+//     isLateDevirtualization -- if devirtualization is happening after importation
 //
 // Notes:
 //     Virtual calls in IL will always "invoke" the base class method.
@@ -19861,11 +19967,16 @@ bool Compiler::IsMathIntrinsic(GenTree* tree)
 //     to instead make a local copy. If that is doable, the call is
 //     updated to invoke the unboxed entry on the local copy.
 //
+//     When guarded devirtualization is enabled, this method will mark
+//     calls as guarded devirtualization candidates, if the type of `this`
+//     is not exactly known, and there is a plausible guess for the type.
+
 void Compiler::impDevirtualizeCall(GenTreeCall*            call,
                                    CORINFO_METHOD_HANDLE*  method,
                                    unsigned*               methodFlags,
                                    CORINFO_CONTEXT_HANDLE* contextHandle,
-                                   CORINFO_CONTEXT_HANDLE* exactContextHandle)
+                                   CORINFO_CONTEXT_HANDLE* exactContextHandle,
+                                   bool                    isLateDevirtualization)
 {
     assert(call != nullptr);
     assert(method != nullptr);
@@ -20031,16 +20142,67 @@ void Compiler::impDevirtualizeCall(GenTreeCall*            call,
     }
 #endif // defined(DEBUG)
 
-    // Bail if obj class is an interface.
+    // See if the jit's best type for `obj` is an interface.
     // See for instance System.ValueTuple`8::GetHashCode, where lcl 0 is System.IValueTupleInternal
     //   IL_021d:  ldloc.0
     //   IL_021e:  callvirt   instance int32 System.Object::GetHashCode()
+    //
+    // If so, we can't devirtualize, but we may be able to do guarded devirtualization.
     if ((objClassAttribs & CORINFO_FLG_INTERFACE) != 0)
     {
-        JITDUMP("--- obj class is interface, sorry\n");
+        // If we're called during early devirtualization, attempt guarded devirtualization
+        // if there's currently just one implementing class.
+        if (exactContextHandle == nullptr)
+        {
+            JITDUMP("--- obj class is interface...unable to dervirtualize, sorry\n");
+            return;
+        }
+
+        CORINFO_CLASS_HANDLE uniqueImplementingClass = NO_CLASS_HANDLE;
+
+        // info.compCompHnd->getUniqueImplementingClass(objClass);
+
+        if (uniqueImplementingClass == NO_CLASS_HANDLE)
+        {
+            JITDUMP("No unique implementor of interface %p (%s), sorry\n", objClass, objClassName);
+            return;
+        }
+
+        JITDUMP("Only known implementor of interface %p (%s) is %p (%s)!\n", objClass, objClassName,
+                uniqueImplementingClass, eeGetClassName(uniqueImplementingClass));
+
+        bool guessUniqueInterface = true;
+
+        INDEBUG(guessUniqueInterface = (JitConfig.JitGuardedDevirtualizationGuessUniqueInterface() > 0););
+
+        if (!guessUniqueInterface)
+        {
+            JITDUMP("Guarded devirt for unique interface implementor is not enabled, sorry\n");
+            return;
+        }
+
+        // Ask the runtime to determine the method that would be called based on the guessed-for type.
+        CORINFO_CONTEXT_HANDLE ownerType = *contextHandle;
+        CORINFO_METHOD_HANDLE  uniqueImplementingMethod =
+            info.compCompHnd->resolveVirtualMethod(baseMethod, uniqueImplementingClass, ownerType);
+
+        if (uniqueImplementingMethod == nullptr)
+        {
+            JITDUMP("Can't figure out which method would be invoked, sorry\n");
+            return;
+        }
+
+        JITDUMP("Interface call would invoke method %s\n", eeGetMethodName(uniqueImplementingMethod, nullptr));
+        DWORD uniqueMethodAttribs = info.compCompHnd->getMethodAttribs(uniqueImplementingMethod);
+        DWORD uniqueClassAttribs  = info.compCompHnd->getClassAttribs(uniqueImplementingClass);
+
+        addGuardedDevirtualizationCandidate(call, uniqueImplementingMethod, uniqueImplementingClass,
+                                            uniqueMethodAttribs, uniqueClassAttribs);
         return;
     }
 
+    // If we get this far, the jit has a lower bound class type for the `this` object being used for dispatch.
+    // It may or may not know enough to devirtualize...
     if (isInterface)
     {
         assert(call->IsVirtualStub());
@@ -20054,6 +20216,9 @@ void Compiler::impDevirtualizeCall(GenTreeCall*            call,
     // If we failed to get a handle, we can't devirtualize.  This can
     // happen when prejitting, if the devirtualization crosses
     // servicing bubble boundaries.
+    //
+    // Note if we have some way of guessing a better and more likely type we can do something similar to the code
+    // above for the case where the best jit type is an interface type.
     if (derivedMethod == nullptr)
     {
         JITDUMP("--- no derived method, sorry\n");
@@ -20068,7 +20233,7 @@ void Compiler::impDevirtualizeCall(GenTreeCall*            call,
     const char* derivedClassName  = "?derivedClass";
     const char* derivedMethodName = "?derivedMethod";
 
-    const char* note = "speculative";
+    const char* note = "inexact or not final";
     if (isExact)
     {
         note = "exact";
@@ -20093,29 +20258,41 @@ void Compiler::impDevirtualizeCall(GenTreeCall*            call,
     }
 #endif // defined(DEBUG)
 
-    if (!isExact && !objClassIsFinal && !derivedMethodIsFinal)
-    {
-        // Type is not exact, and neither class or method is final.
-        //
-        // We could speculatively devirtualize, but there's no
-        // reason to believe the derived method is the one that
-        // is likely to be invoked.
-        //
-        // If there's currently no further overriding (that is, at
-        // the time of jitting, objClass has no subclasses that
-        // override this method), then perhaps we'd be willing to
-        // make a bet...?
-        JITDUMP("    Class not final or exact, method not final, no devirtualization\n");
-        return;
-    }
+    const bool canDevirtualize = isExact || objClassIsFinal || (!isInterface && derivedMethodIsFinal);
 
-    // For interface calls we must have an exact type or final class.
-    if (isInterface && !isExact && !objClassIsFinal)
+    if (!canDevirtualize)
     {
-        JITDUMP("    Class not final or exact for interface, no devirtualization\n");
+        JITDUMP("    Class not final or exact%s\n", isInterface ? "" : ", and method not final");
+
+        // Have we enabled guarded devirtualization by guessing the jit's best class?
+        bool guessJitBestClass = true;
+        INDEBUG(guessJitBestClass = (JitConfig.JitGuardedDevirtualizationGuessBestClass() > 0););
+
+        if (!guessJitBestClass)
+        {
+            JITDUMP("No guarded devirt: guessing for jit best class disabled\n");
+            return;
+        }
+
+        // Don't try guarded devirtualization when we're doing late devirtualization.
+        if (isLateDevirtualization)
+        {
+            JITDUMP("No guarded devirt during late devirtualization\n");
+            return;
+        }
+
+        // We will use the class that introduced the method as our guess
+        // for the runtime class of the object.
+        CORINFO_CLASS_HANDLE derivedClass = info.compCompHnd->getMethodClass(derivedMethod);
+
+        // Try guarded devirtualization.
+        addGuardedDevirtualizationCandidate(call, derivedMethod, derivedClass, derivedMethodAttribs, objClassAttribs);
         return;
     }
 
+    // All checks done. Time to transform the call.
+    assert(canDevirtualize);
+
     JITDUMP("    %s; can devirtualize\n", note);
 
     // Make the updates.
@@ -20429,10 +20606,25 @@ private:
     {
         GenTree* retExpr = *pRetExpr;
         assert(retExpr->OperGet() == GT_RET_EXPR);
-        JITDUMP("Store return expression %u  as a local var.\n", retExpr->gtTreeID);
-        unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+        const unsigned tmp = comp->lvaGrabTemp(true DEBUGARG("spilling ret_expr"));
+        JITDUMP("Storing return expression [%06u] to a local var V%02u.\n", comp->dspTreeID(retExpr), tmp);
         comp->impAssignTempGen(tmp, retExpr, (unsigned)Compiler::CHECK_SPILL_NONE);
         *pRetExpr = comp->gtNewLclvNode(tmp, retExpr->TypeGet());
+
+        if (retExpr->TypeGet() == TYP_REF)
+        {
+            assert(comp->lvaTable[tmp].lvSingleDef == 0);
+            comp->lvaTable[tmp].lvSingleDef = 1;
+            JITDUMP("Marked V%02u as a single def temp\n", tmp);
+
+            bool                 isExact   = false;
+            bool                 isNonNull = false;
+            CORINFO_CLASS_HANDLE retClsHnd = comp->gtGetClassHandle(retExpr, &isExact, &isNonNull);
+            if (retClsHnd != nullptr)
+            {
+                comp->lvaSetClass(tmp, retClsHnd, isExact);
+            }
+        }
     }
 
 private:
@@ -20448,8 +20640,108 @@ private:
 //
 void Compiler::addFatPointerCandidate(GenTreeCall* call)
 {
+    JITDUMP("Marking call [%06u] as fat pointer candidate\n", dspTreeID(call));
     setMethodHasFatPointer();
     call->SetFatPointerCandidate();
     SpillRetExprHelper helper(this);
     helper.StoreRetExprResultsInArgs(call);
 }
+
+//------------------------------------------------------------------------
+// addGuardedDevirtualizationCandidate: potentially mark the call as a guarded
+//    devirtualization candidate
+//
+// Notes:
+//
+// We currently do not mark calls as candidates when prejitting. This was done
+// to simplify bringing up the associated transformation. It is worth revisiting
+// if we think we can come up with a good guess for the class when prejitting.
+//
+// Call sites in rare or unoptimized code, and calls that require cookies are
+// also not marked as candidates.
+//
+// As part of marking the candidate, the code spills GT_RET_EXPRs anywhere in any
+// child tree, because we need to clone all these trees when we clone the call
+// as part of guarded devirtualization, and these IR nodes can't be cloned.
+//
+// Arguments:
+//    call - potential guarded devirtualization candidate
+//    methodHandle - method that will be invoked if the class test succeeds
+//    classHandle - class that will be tested for at runtime
+//    methodAttr - attributes of the method
+//    classAttr - attributes of the class
+//
+void Compiler::addGuardedDevirtualizationCandidate(GenTreeCall*          call,
+                                                   CORINFO_METHOD_HANDLE methodHandle,
+                                                   CORINFO_CLASS_HANDLE  classHandle,
+                                                   unsigned              methodAttr,
+                                                   unsigned              classAttr)
+{
+    // This transformation only makes sense for virtual calls
+    assert(call->IsVirtual());
+
+    // Only mark calls if the feature is enabled.
+    const bool isEnabled = JitConfig.JitEnableGuardedDevirtualization() > 0;
+
+    if (!isEnabled)
+    {
+        JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- disabled by jit config\n",
+                dspTreeID(call));
+        return;
+    }
+
+    // Bail when prejitting. We only do this for jitted code.
+    // We should revisit this if we think we can come up with good class guesses when prejitting.
+    if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT))
+    {
+        JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- prejitting", dspTreeID(call));
+        return;
+    }
+
+    // Bail if not optimizing or the call site is very likely cold
+    if (compCurBB->isRunRarely() || opts.compDbgCode || opts.MinOpts())
+    {
+        JITDUMP("NOT Marking call [%06u] as guarded devirtualization candidate -- rare / dbg / minopts\n",
+                dspTreeID(call));
+        return;
+    }
+
+    // CT_INDIRECT calls may use the cookie, bail if so...
+    //
+    // If transforming these provides a benefit, we could save this off in the same way
+    // we save the stub address below.
+    if ((call->gtCallType == CT_INDIRECT) && (call->gtCall.gtCallCookie != nullptr))
+    {
+        return;
+    }
+
+    // We're all set, proceed with candidate creation.
+    JITDUMP("Marking call [%06u] as guarded devirtualization candidate; will guess for class %s\n", dspTreeID(call),
+            eeGetClassName(classHandle));
+    setMethodHasGuardedDevirtualization();
+    call->SetGuardedDevirtualizationCandidate();
+
+    // Spill off any GT_RET_EXPR subtrees so we can clone the call.
+    SpillRetExprHelper helper(this);
+    helper.StoreRetExprResultsInArgs(call);
+
+    // Gather some information for later. Note we actually allocate InlineCandidateInfo
+    // here, as the devirtualized half of this call will likely become an inline candidate.
+    GuardedDevirtualizationCandidateInfo* pInfo = new (this, CMK_Inlining) InlineCandidateInfo;
+
+    pInfo->guardedMethodHandle = methodHandle;
+    pInfo->guardedClassHandle  = classHandle;
+
+    // Save off the stub address since it shares a union with the candidate info.
+    if (call->IsVirtualStub())
+    {
+        JITDUMP("Saving stub addr %p in candidate info\n", call->gtStubCallStubAddr);
+        pInfo->stubAddr = call->gtStubCallStubAddr;
+    }
+    else
+    {
+        pInfo->stubAddr = nullptr;
+    }
+
+    call->gtGuardedDevirtualizationCandidateInfo = pInfo;
+}
index a106476..f523d2a 100644 (file)
@@ -337,6 +337,7 @@ InlineContext::InlineContext(InlineStrategy* strategy)
     , m_CodeSizeEstimate(0)
     , m_Success(true)
     , m_Devirtualized(false)
+    , m_Guarded(false)
     , m_Unboxed(false)
 #if defined(DEBUG) || defined(INLINE_DATA)
     , m_Policy(nullptr)
@@ -397,18 +398,19 @@ void InlineContext::Dump(unsigned indent)
         const char* inlineReason  = InlGetObservationString(m_Observation);
         const char* inlineResult  = m_Success ? "" : "FAILED: ";
         const char* devirtualized = m_Devirtualized ? " devirt" : "";
+        const char* guarded       = m_Guarded ? " guarded" : "";
         const char* unboxed       = m_Unboxed ? " unboxed" : "";
 
         if (m_Offset == BAD_IL_OFFSET)
         {
-            printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
-                   inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+            printf("%*s[%u IL=???? TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, m_TreeID, calleeToken,
+                   inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
         }
         else
         {
             IL_OFFSET offset = jitGetILoffs(m_Offset);
-            printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID, calleeToken,
-                   inlineResult, inlineReason, devirtualized, unboxed, calleeName);
+            printf("%*s[%u IL=%04d TR=%06u %08X] [%s%s%s%s%s] %s\n", indent, "", m_Ordinal, offset, m_TreeID,
+                   calleeToken, inlineResult, inlineReason, guarded, devirtualized, unboxed, calleeName);
         }
     }
 
@@ -1205,6 +1207,7 @@ InlineContext* InlineStrategy::NewSuccess(InlineInfo* inlineInfo)
     calleeContext->m_Observation   = inlineInfo->inlineResult->GetObservation();
     calleeContext->m_Success       = true;
     calleeContext->m_Devirtualized = originalCall->IsDevirtualized();
+    calleeContext->m_Guarded       = originalCall->IsGuarded();
     calleeContext->m_Unboxed       = originalCall->IsUnboxed();
 
 #if defined(DEBUG) || defined(INLINE_DATA)
@@ -1266,6 +1269,7 @@ InlineContext* InlineStrategy::NewFailure(GenTreeStmt* stmt, InlineResult* inlin
     failedContext->m_Callee        = inlineResult->GetCallee();
     failedContext->m_Success       = false;
     failedContext->m_Devirtualized = originalCall->IsDevirtualized();
+    failedContext->m_Guarded       = originalCall->IsGuarded();
     failedContext->m_Unboxed       = originalCall->IsUnboxed();
 
     assert(InlIsValidObservation(failedContext->m_Observation));
index 551341e..e94765a 100644 (file)
@@ -494,22 +494,36 @@ private:
     bool                  m_Reported;
 };
 
+// GuardedDevirtualizationCandidateInfo provides information about
+// a potential target of a virtual call.
+
+struct GuardedDevirtualizationCandidateInfo
+{
+    CORINFO_CLASS_HANDLE  guardedClassHandle;
+    CORINFO_METHOD_HANDLE guardedMethodHandle;
+    void*                 stubAddr;
+};
+
 // InlineCandidateInfo provides basic information about a particular
 // inline candidate.
+//
+// It is a superset of GuardedDevirtualizationCandidateInfo: calls
+// can start out as GDV candidates and turn into inline candidates.
 
-struct InlineCandidateInfo
+struct InlineCandidateInfo : public GuardedDevirtualizationCandidateInfo
 {
-    DWORD                  dwRestrictions;
     CORINFO_METHOD_INFO    methInfo;
-    unsigned               methAttr;
+    CORINFO_METHOD_HANDLE  ilCallerHandle; // the logical IL caller of this inlinee.
     CORINFO_CLASS_HANDLE   clsHandle;
+    CORINFO_CONTEXT_HANDLE exactContextHnd;
+    GenTree*               retExpr;
+    DWORD                  dwRestrictions;
+    unsigned               preexistingSpillTemp;
     unsigned               clsAttr;
+    unsigned               methAttr;
+    CorInfoInitClassResult initClassResult;
     var_types              fncRetType;
-    CORINFO_METHOD_HANDLE  ilCallerHandle; // the logical IL caller of this inlinee.
-    CORINFO_CONTEXT_HANDLE exactContextHnd;
     bool                   exactContextNeedsRuntimeLookup;
-    CorInfoInitClassResult initClassResult;
-    unsigned               preexistingSpillTemp;
 };
 
 // InlArgInfo describes inline candidate argument properties.
@@ -683,6 +697,11 @@ public:
         return m_Devirtualized;
     }
 
+    bool IsGuarded() const
+    {
+        return m_Guarded;
+    }
+
     bool IsUnboxed() const
     {
         return m_Unboxed;
@@ -703,6 +722,7 @@ private:
     int               m_CodeSizeEstimate;  // in bytes * 10
     bool              m_Success : 1;       // true if this was a successful inline
     bool              m_Devirtualized : 1; // true if this was a devirtualized call
+    bool              m_Guarded : 1;       // true if this was a guarded call
     bool              m_Unboxed : 1;       // true if this call now invokes the unboxed entry
 
 #if defined(DEBUG) || defined(INLINE_DATA)
index 383264e..649db4e 100644 (file)
@@ -363,6 +363,15 @@ CONFIG_INTEGER(JitEnableRemoveEmptyTry, W("JitEnableRemoveEmptyTry"), 0)
 #endif // defined(FEATURE_CORECLR)
 #endif // DEBUG
 
+// Overall master enable for Guarded Devirtualization. Currently not enabled by default.
+CONFIG_INTEGER(JitEnableGuardedDevirtualization, W("JitEnableGuardedDevirtualization"), 0)
+
+#if defined(DEBUG)
+// Various policies for GuardedDevirtualization
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessUniqueInterface, W("JitGuardedDevirtualizationGuessUniqueInterface"), 1)
+CONFIG_INTEGER(JitGuardedDevirtualizationGuessBestClass, W("JitGuardedDevirtualizationGuessBestClass"), 1)
+#endif // DEBUG
+
 #undef CONFIG_INTEGER
 #undef CONFIG_STRING
 #undef CONFIG_METHODSET
index 8ce2b59..339f7a5 100644 (file)
@@ -2746,6 +2746,11 @@ void Compiler::lvaUpdateClass(unsigned varNum, CORINFO_CLASS_HANDLE clsHnd, bool
     {
         varDsc->lvClassHnd     = clsHnd;
         varDsc->lvClassIsExact = isExact;
+
+#if DEBUG
+        // Note we've modified the type...
+        varDsc->lvClassInfoUpdated = true;
+#endif // DEBUG
     }
 
     return;
index 20e378e..566714f 100644 (file)
@@ -6614,42 +6614,59 @@ GenTree* Compiler::fgMorphField(GenTree* tree, MorphAddrContext* mac)
 
 void Compiler::fgMorphCallInline(GenTreeCall* call, InlineResult* inlineResult)
 {
-    // The call must be a candiate for inlining.
-    assert((call->gtFlags & GTF_CALL_INLINE_CANDIDATE) != 0);
+    bool inliningFailed = false;
 
-    // Attempt the inline
-    fgMorphCallInlineHelper(call, inlineResult);
+    // Is this call an inline candidate?
+    if (call->IsInlineCandidate())
+    {
+        // Attempt the inline
+        fgMorphCallInlineHelper(call, inlineResult);
 
-    // We should have made up our minds one way or another....
-    assert(inlineResult->IsDecided());
+        // We should have made up our minds one way or another....
+        assert(inlineResult->IsDecided());
 
-    // If we failed to inline, we have a bit of work to do to cleanup
-    if (inlineResult->IsFailure())
-    {
+        // If we failed to inline, we have a bit of work to do to cleanup
+        if (inlineResult->IsFailure())
+        {
 
 #ifdef DEBUG
 
-        // Before we do any cleanup, create a failing InlineContext to
-        // capture details of the inlining attempt.
-        m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
+            // Before we do any cleanup, create a failing InlineContext to
+            // capture details of the inlining attempt.
+            m_inlineStrategy->NewFailure(fgMorphStmt, inlineResult);
 
 #endif
 
-        // It was an inline candidate, but we haven't expanded it.
-        if (call->gtCall.gtReturnType != TYP_VOID)
+            inliningFailed = true;
+
+            // Clear the Inline Candidate flag so we can ensure later we tried
+            // inlining all candidates.
+            //
+            call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
+        }
+    }
+    else
+    {
+        // This wasn't an inline candidate. So it must be a GDV candidate.
+        assert(call->IsGuardedDevirtualizationCandidate());
+
+        // We already know we can't inline this call, so don't even bother to try.
+        inliningFailed = true;
+    }
+
+    // If we failed to inline (or didn't even try), do some cleanup.
+    if (inliningFailed)
+    {
+        if (call->gtReturnType != TYP_VOID)
         {
+            JITDUMP("Inlining [%06u] failed, so bashing [%06u] to NOP\n", dspTreeID(call), dspTreeID(fgMorphStmt));
+
             // Detach the GT_CALL tree from the original statement by
             // hanging a "nothing" node to it. Later the "nothing" node will be removed
             // and the original GT_CALL tree will be picked up by the GT_RET_EXPR node.
-
             noway_assert(fgMorphStmt->gtStmtExpr == call);
             fgMorphStmt->gtStmtExpr = gtNewNothingNode();
         }
-
-        // Clear the Inline Candidate flag so we can ensure later we tried
-        // inlining all candidates.
-        //
-        call->gtFlags &= ~GTF_CALL_INLINE_CANDIDATE;
     }
 }
 
@@ -6682,6 +6699,13 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result)
         return;
     }
 
+    // Re-check this because guarded devirtualization may allow these through.
+    if (gtIsRecursiveCall(call) && call->IsImplicitTailCall())
+    {
+        result->NoteFatal(InlineObservation::CALLSITE_IMPLICIT_REC_TAIL_CALL);
+        return;
+    }
+
     // impMarkInlineCandidate() is expected not to mark tail prefixed calls
     // and recursive tail calls as inline candidates.
     noway_assert(!call->IsTailPrefixedCall());