Disable Http2_MultipleConnectionsEnabled_InfiniteRequestsCompletelyBlockOneConnection_RemaningRequestsAreHandledByNewConnection (#55593)
Add leeway to revocation timeout tests.
[wasm][http] Improve compatibility of abort and cancelation of BrowserHttpHandler (#55084)
* fixed handling of cancelation and abort exceptions to match unit test expectations
added [ActiveIssue("https://github.com/dotnet/runtime/issues/55083", TestPlatforms.Browser)] for redirect outerloop tests
* more
* code review feedback
Add Span.TryWrite, StringBuilder.Append, and String.Create interpolated strings support
Use ReflectionOnly as data contract serializer option for iOS/tvOS/MacCatalyst (#55503)
An attempt to address #47114 by making the System.Runtime.Serialization.DataContractSerializer.Option property return the ReflectionOnly value when RuntimeFeature.IsDynamicCodeSupported is false.
Handle nullable primitives before passing them to WritePrimitive which expects non-null values. (#54800)
Look up the ICustomMarshaler implementation methods based on runtime type (#55439)
[hot_reload] Add support for row modifications; CustomAttribute updates (#55445)
This fixes https://github.com/dotnet/runtime/issues/55097 - which allows us to support C# nullability analysis once again in hot reload deltas.
Specifically we allow EnC deltas to include modifications of existing rows in the CustomAttribute table as long as the Parent and Type columns stay the same (that is: a custom attribute modification that still applies to the same element - and uses the same custom attribute constructor, but may have a different value).
To support this, we add a new BaselineInfo:any_modified_rows array that keeps track, for each table, of whether any rows have been modified (as opposed to added) by any EnC delta. When the runtime calls mono_metadata_decode_row, if there have been any deltas that modified a row in the requested table, we call hot_reload_effective_table_slow, which finds the latest delta that modified that row. If there have only been additions, we stop at the first delta that added the row we're looking for. If there are modifications, we look through all the deltas and return the latest one.
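The lookup rule described above can be modeled in a few lines. This is a minimal Python sketch, not Mono's C implementation: `TableModel`, `apply_delta`, and `decode_row` are hypothetical names, and a single table stands in for the per-table `any_modified_rows` array.

```python
class TableModel:
    """Toy model of one metadata table plus its EnC deltas (row indexes are 1-based)."""

    def __init__(self, base_rows):
        self.base_rows = list(base_rows)
        self.deltas = []                 # one {row_index: row} dict per generation
        self.row_count = len(self.base_rows)
        self.any_modified_rows = False   # set once any delta rewrites an existing row

    def apply_delta(self, rows):
        for index in rows:
            if index <= self.row_count:
                self.any_modified_rows = True   # this delta modifies an existing row
        self.row_count = max([self.row_count, *rows])
        self.deltas.append(dict(rows))

    def decode_row(self, index):
        in_base = index <= len(self.base_rows)
        if in_base and not self.any_modified_rows:
            return self.base_rows[index - 1]    # fast path: the baseline is authoritative
        result = self.base_rows[index - 1] if in_base else None
        for rows in self.deltas:                # oldest generation first
            if index in rows:
                result = rows[index]
                if not self.any_modified_rows:
                    break                       # additions only: the first hit is final
        return result


table = TableModel(["A", "B"])
table.apply_delta({3: "C"})            # pure addition
table.apply_delta({1: "A2"})           # modification of a baseline row
table.apply_delta({1: "A3", 3: "C2"})  # later modifications win
print(table.decode_row(1), table.decode_row(2), table.decode_row(3))
```

Once any modification has been seen, even a baseline index takes the slow path, which matches the trade-off the description makes: additions stay cheap, modifications pay for a full scan.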
* [hot_reload] Add test for updating custom attribute ctor values
Just changing the arguments of a custom attribute constructor should generate an update to the CustomAttributes table with the same parent and .ctor. That kind of change should be allowed by Mono and CoreCLR.
* [hot_reload] Allow custom attribute row modifications if Parent and Type unchanged.
Allows updates to the constructor arguments (or property values)
* [hot_reload] Implement table lookup of modified rows
Add a bool array on the base image to keep track of whether each table had any row modifications (as opposed to row additions) in any generation.
If there was a modification, take the slow path in mono_image_effective_table even if the index we're looking at is in the base image.
Update hot_reload_effective_table_slow so that if there was a modified row, we look at all the deltas to see if there's an even later update to that row. (When there are only additions, keep the same behavior as before: only look as far as the generation that added the row we wanted to find.)
Also refine the assertion in hot_reload_relative_delta_index to properly account for EnCMap entries that correspond to modifications. In that case we might stop searching a metadata delta before we hit the end of the table if the EnCMap entries start pointing to rows that are past the one we wanted to look up.
* Update the CustomAttributeUpdates test to check attribute value
Check that we get the updated custom attribute string property value.
* Re-enable nullability for hot reload tests
Mono can now deal with the custom attributes modifications that Roslyn emits
Support filtering ObjectAllocated callback for pinned object heap allocation only (#55448)
* Prototype allocation profiler
* Add callback for pinned objects
* Fix the build issue caused by corprof.idl change
* Improve the test
* Misc changes for the tests
Co-authored-by: Andrew Au <andrewau@microsoft.com>
Co-authored-by: Yauk Jia <yaujia@microsoft.com>
Improve loop cloning, with debugging improvements (#55299)
When loop cloning was creating cloning conditions, it was creating unnecessary bounds checks in some multi-dimensional array index cases. When creating a set of cloning conditions, first a null check is done, then an array length check is done, etc. Thus, the array length expression itself won't fault because we've already done a null check. And a subsequent array index expression won't fault (or need a bounds check) because we've already checked the array length (i.e., we've done a manual bounds check). So, stop creating the unnecessary bounds checks, and mark the appropriate instructions as non-faulting by clearing the GTF_EXCEPT bit.
Note that I did not turn on the code to clear GTF_EXCEPT for array length checks because it leads to negative downstream effects in CSE. Namely, there end up being array length expressions that are identical except for the exception bit. When CSE sees this, it gives up on creating a CSE, which leads to regressions in some cases where we don't CSE the array length expression.
Also, for multi-dimension jagged arrays, when optimizing the fast path, we were not removing as many bounds checks as we could. In particular, we weren't removing outer bounds checks, only inner ones. Add code to handle all the bounds checks.
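To illustrate why the ordering of cloning conditions makes the later checks redundant, here is a small Python model of a cloned inner loop over a jagged array `a[i][j]`. The function name `sum_row` and the exact set of conditions are assumptions made for illustration; the JIT works on its own IR, not on source like this, and Python still performs its own checks underneath.

```python
def sum_row(a, i, m):
    """Sum a[i][0..m): test the cloning conditions once, then pick a loop."""
    fast = (
        a is not None              # 1. null check on the outer array
        and 0 <= i < len(a)        # 2. len(a) cannot fault after step 1
        and a[i] is not None       # 3. a[i] needs no bounds check after step 2
        and 0 <= m <= len(a[i])    # 4. covers every j in [0, m)
    )
    total = 0
    if fast:
        row = a[i]
        for j in range(m):
            total += row[j]        # fast clone: no per-iteration checks modeled
        return total, "fast"
    for j in range(m):             # slow clone: every access is re-checked
        if a is None:
            raise TypeError("null array")
        if not 0 <= i < len(a):
            raise IndexError("outer index out of range")
        row = a[i]
        if row is None:
            raise TypeError("null row")
        if not 0 <= j < len(row):
            raise IndexError("inner index out of range")
        total += row[j]
    return total, "slow"


data = [[1, 2, 3], [4, 5]]
print(sum_row(data, 0, 3))   # all conditions hold, so the fast clone runs
```

Conditions 2 and 3 are the ones the change targets: the length read and the outer index in the condition chain are already guarded by the checks before them, so emitting a second bounds check there is wasted work.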
There are some runtime improvements (measured via BenchmarkDotNet on the JIT microbenchmarks), but also some regressions, due, as far as I can tell, to the Intel jcc erratum performance impact. In particular, benchmark ludcmp shows up to a 9% regression due to a `jae` instruction in the hot loop now crossing a 32-byte boundary due to code changes earlier in the function affecting instruction alignment. The hot loop itself is exactly the same (modulo register allocation differences). Nothing can be done about this short of mitigating the jcc erratum; it's "bad luck".
In addition to those functional changes, there are a number of debugging-related improvements:
1. Loop cloning: (a) Improved dumping of cloning conditions and other things, (b) remove an unnecessary member from `LcOptInfo`, (c) convert the `LoopCloneContext` raw arrays to `jitstd::vector` for easier debugging, as clrjit.natvis can be taught to understand them.
2. CSE improvements: (a) Add `getCSEAvailBit` and `getCSEAvailCrossCallBit` functions to avoid multiple hard-codings of these expressions, (b) stop printing all the details of the CSE dataflow to JitDump; just print the result, (c) add `optPrintCSEDataFlowSet` function to print the CSE dataflow set in symbolic form, not just the raw bits, (d) added `FMT_CSE` string to use for formatting CSE candidates, (e) added `optOptimizeCSEs` to the phase structure for JitDump output, (f) remove unused `optCSECandidateTotal` (remnant of Valnum + lexical CSE)
3. Alignment: (a) Moved printing of alignment boundaries from `emitIssue1Instr` to `emitEndCodeGen`, to avoid the possibility of reading an instruction beyond the basic block. Also, improved the Intel jcc erratum criteria calculations, (b) Change `align` instructions of zero size to have a zero PerfScore throughput number (since they don't generate code), (c) Add `COMPlus_JitDasmWithAlignmentBoundaries` to force disasm output to display alignment boundaries.
4. Codegen / Emitter: (a) Added `emitLabelString` function for constructing a string to display for a bound emitter label. Created `emitPrintLabel` to directly print the label, (b) Add `genInsDisplayName` function to create a string for use when outputting an instruction. For xarch, this prepends the "v" for SIMD instructions, as necessary. This is preferable to calling the raw `genInsName` function, (c) For each insGroup, created a debug-only list of basic blocks that contributed code to that insGroup. Display this set of blocks in the JitDump disasm output, with block ID. This is useful for looking at an IG, and finding the blocks in a .dot flow graph visualization that contributed to it, (d) remove unused `instDisp`
5. Clrjit.natvis: (a) add support for `jitstd::vector`, `JitExpandArray<T>`, `JitExpandArrayStack<T>`, `LcOptInfo`.
6. Misc: (a) When compacting an empty loop preheader block with a subsequent block, clear the preheader flag.
## benchmarks.run.windows.x64.checked.mch:
```
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 25504
Total bytes of diff: 25092
Total bytes of delta: -412 (-1.62% of base)
Total relative delta: -0.31
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (bytes):
-92 : 14861.dasm (-2.57% of base)
-88 : 2430.dasm (-0.77% of base)
-68 : 12182.dasm (-3.82% of base)
-48 : 24678.dasm (-1.61% of base)
-31 : 21598.dasm (-5.13% of base)
-26 : 21601.dasm (-4.57% of base)
-21 : 25069.dasm (-7.14% of base)
-16 : 14859.dasm (-1.38% of base)
-11 : 14862.dasm (-1.35% of base)
-6 : 21600.dasm (-1.83% of base)
-5 : 25065.dasm (-0.58% of base)
11 total files with Code Size differences (11 improved, 0 regressed), 1 unchanged.
Top method improvements (bytes):
-92 (-2.57% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-88 (-0.77% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
-68 (-3.82% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-48 (-1.61% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-31 (-5.13% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-26 (-4.57% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-21 (-7.14% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-16 (-1.38% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-11 (-1.35% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
-6 (-1.83% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-5 (-0.58% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
Top method improvements (percentages):
-21 (-7.14% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-31 (-5.13% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-26 (-4.57% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-68 (-3.82% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-92 (-2.57% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-6 (-1.83% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-48 (-1.61% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-16 (-1.38% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-11 (-1.35% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
-88 (-0.77% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
-5 (-0.58% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
11 total methods with Code Size differences (11 improved, 0 regressed), 1 unchanged.
```
</details>
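As a cross-check, the summary figures above can be recomputed from the per-method rows. The sketch below assumes, based only on the numbers in this table, that "Total relative delta" is the sum of the per-method relative changes; the log itself does not state that definition.

```python
base_bytes, diff_bytes = 25504, 25092

# (byte delta, percent of that method's base), copied from the method table above
methods = [
    (-92, -2.57), (-88, -0.77), (-68, -3.82), (-48, -1.61),
    (-31, -5.13), (-26, -4.57), (-21, -7.14), (-16, -1.38),
    (-11, -1.35), (-6, -1.83), (-5, -0.58),
]

total_delta = diff_bytes - base_bytes                   # total bytes of delta
pct_of_base = 100 * total_delta / base_bytes            # delta as a percent of base
relative_delta = sum(pct for _, pct in methods) / 100   # assumed definition, see lead-in

# The per-method byte deltas should add up to the overall delta.
assert sum(delta for delta, _ in methods) == total_delta

print(total_delta, round(pct_of_base, 2), round(relative_delta, 2))
```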
--------------------------------------------------------------------------------
```
Summary of Perf Score diffs:
(Lower is better)
Total PerfScoreUnits of base: 38374.96
Total PerfScoreUnits of diff: 37914.07000000001
Total PerfScoreUnits of delta: -460.89 (-1.20% of base)
Total relative delta: -0.12
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (PerfScoreUnits):
-220.67 : 24678.dasm (-1.74% of base)
-99.27 : 14861.dasm (-2.09% of base)
-66.30 : 21598.dasm (-1.41% of base)
-18.73 : 2430.dasm (-0.28% of base)
-18.40 : 21601.dasm (-1.37% of base)
-9.73 : 25065.dasm (-0.56% of base)
-9.05 : 14859.dasm (-0.77% of base)
-5.51 : 21600.dasm (-0.77% of base)
-4.15 : 12182.dasm (-0.17% of base)
-3.92 : 14860.dasm (-0.32% of base)
-3.46 : 25069.dasm (-2.31% of base)
-1.70 : 14862.dasm (-0.20% of base)
12 total files with Perf Score differences (12 improved, 0 regressed), 0 unchanged.
Top method improvements (PerfScoreUnits):
-220.67 (-1.74% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-99.27 (-2.09% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-66.30 (-1.41% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-18.73 (-0.28% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
-18.40 (-1.37% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-9.73 (-0.56% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
-9.05 (-0.77% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-5.51 (-0.77% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-4.15 (-0.17% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-3.92 (-0.32% of base) : 14860.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
-3.46 (-2.31% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-1.70 (-0.20% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
Top method improvements (percentages):
-3.46 (-2.31% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-99.27 (-2.09% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-220.67 (-1.74% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-66.30 (-1.41% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-18.40 (-1.37% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-9.05 (-0.77% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-5.51 (-0.77% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-9.73 (-0.56% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
-3.92 (-0.32% of base) : 14860.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
-18.73 (-0.28% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
-1.70 (-0.20% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
-4.15 (-0.17% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
12 total methods with Perf Score differences (12 improved, 0 regressed), 0 unchanged.
```
</details>
--------------------------------------------------------------------------------
## coreclr_tests.pmi.windows.x64.checked.mch:
```
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 25430
Total bytes of diff: 24994
Total bytes of delta: -436 (-1.71% of base)
Total relative delta: -0.42
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (bytes):
-92 : 194668.dasm (-2.57% of base)
-68 : 194589.dasm (-3.82% of base)
-48 : 248565.dasm (-1.61% of base)
-32 : 249053.dasm (-3.58% of base)
-31 : 251012.dasm (-5.13% of base)
-26 : 251011.dasm (-4.57% of base)
-19 : 248561.dasm (-6.76% of base)
-16 : 194667.dasm (-1.38% of base)
-15 : 252241.dasm (-0.72% of base)
-12 : 252242.dasm (-0.81% of base)
-11 : 194669.dasm (-1.35% of base)
-9 : 246308.dasm (-1.06% of base)
-9 : 246307.dasm (-1.06% of base)
-9 : 246245.dasm (-1.06% of base)
-9 : 246246.dasm (-1.06% of base)
-6 : 228622.dasm (-0.77% of base)
-6 : 251010.dasm (-1.83% of base)
-5 : 248557.dasm (-0.61% of base)
-4 : 249054.dasm (-0.50% of base)
-4 : 249052.dasm (-0.47% of base)
22 total files with Code Size differences (22 improved, 0 regressed), 1 unchanged.
Top method improvements (bytes):
-92 (-2.57% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-68 (-3.82% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-48 (-1.61% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-32 (-3.58% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
-31 (-5.13% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-26 (-4.57% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-19 (-6.76% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-16 (-1.38% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-15 (-0.72% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
-12 (-0.81% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
-11 (-1.35% of base) : 194669.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
-9 (-1.06% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-6 (-0.77% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
-6 (-1.83% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-5 (-0.61% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
-4 (-0.50% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
-4 (-0.47% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()
Top method improvements (percentages):
-19 (-6.76% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-31 (-5.13% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-26 (-4.57% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-68 (-3.82% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-32 (-3.58% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
-92 (-2.57% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-6 (-1.83% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-48 (-1.61% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-16 (-1.38% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-11 (-1.35% of base) : 194669.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
-3 (-1.11% of base) : 249057.dasm - SimpleArray_01.Test:Test2()
-9 (-1.06% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-9 (-1.06% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-12 (-0.81% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
-6 (-0.77% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
-15 (-0.72% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
-5 (-0.61% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
-4 (-0.50% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
22 total methods with Code Size differences (22 improved, 0 regressed), 1 unchanged.
```
</details>
--------------------------------------------------------------------------------
```
Summary of Perf Score diffs:
(Lower is better)
Total PerfScoreUnits of base: 161610.68999999997
Total PerfScoreUnits of diff: 160290.10999999996
Total PerfScoreUnits of delta: -1320.58 (-0.82% of base)
Total relative delta: -0.20
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (PerfScoreUnits):
-639.25 : 252241.dasm (-0.97% of base)
-220.67 : 248565.dasm (-1.74% of base)
-132.59 : 252242.dasm (-0.26% of base)
-99.27 : 194668.dasm (-2.09% of base)
-66.30 : 251012.dasm (-1.41% of base)
-62.20 : 249053.dasm (-2.74% of base)
-18.40 : 251011.dasm (-1.37% of base)
-9.33 : 248557.dasm (-0.54% of base)
-9.05 : 194667.dasm (-0.77% of base)
-8.32 : 249054.dasm (-0.42% of base)
-5.85 : 246308.dasm (-0.52% of base)
-5.85 : 246307.dasm (-0.52% of base)
-5.85 : 246245.dasm (-0.52% of base)
-5.85 : 246246.dasm (-0.52% of base)
-5.51 : 251010.dasm (-0.77% of base)
-4.36 : 249052.dasm (-0.22% of base)
-4.16 : 253363.dasm (-0.21% of base)
-4.15 : 194589.dasm (-0.17% of base)
-3.92 : 194666.dasm (-0.32% of base)
-3.41 : 248561.dasm (-2.29% of base)
23 total files with Perf Score differences (23 improved, 0 regressed), 0 unchanged.
Top method improvements (PerfScoreUnits):
-639.25 (-0.97% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
-220.67 (-1.74% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-132.59 (-0.26% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
-99.27 (-2.09% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-66.30 (-1.41% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-62.20 (-2.74% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
-18.40 (-1.37% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-9.33 (-0.54% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
-9.05 (-0.77% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-8.32 (-0.42% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
-5.85 (-0.52% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-5.51 (-0.77% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-4.36 (-0.22% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()
-4.16 (-0.21% of base) : 253363.dasm - MatrixMul.Test:MatrixMul()
-4.15 (-0.17% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
-3.92 (-0.32% of base) : 194666.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
-3.41 (-2.29% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
Top method improvements (percentages):
-62.20 (-2.74% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
-3.41 (-2.29% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
-99.27 (-2.09% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
-220.67 (-1.74% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
-2.70 (-1.71% of base) : 249057.dasm - SimpleArray_01.Test:Test2()
-66.30 (-1.41% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
-18.40 (-1.37% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
-639.25 (-0.97% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
-9.05 (-0.77% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
-5.51 (-0.77% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
-9.33 (-0.54% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
-5.85 (-0.52% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
-5.85 (-0.52% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
-8.32 (-0.42% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
-3.92 (-0.32% of base) : 194666.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
-132.59 (-0.26% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
-1.89 (-0.22% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
-4.36 (-0.22% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()
23 total methods with Perf Score differences (23 improved, 0 regressed), 0 unchanged.
```
</details>
--------------------------------------------------------------------------------
## libraries.crossgen2.windows.x64.checked.mch:
```
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 10828
Total bytes of diff: 10809
Total bytes of delta: -19 (-0.18% of base)
Total relative delta: -0.00
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (bytes):
-19 : 72504.dasm (-0.18% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (bytes):
-19 (-0.18% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
Top method improvements (percentages):
-19 (-0.18% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
1 total methods with Code Size differences (1 improved, 0 regressed), 0 unchanged.
```
</details>
--------------------------------------------------------------------------------
```
Summary of Perf Score diffs:
(Lower is better)
Total PerfScoreUnits of base: 6597.12
Total PerfScoreUnits of diff: 6586.31
Total PerfScoreUnits of delta: -10.81 (-0.16% of base)
Total relative delta: -0.00
diff is an improvement.
relative diff is an improvement.
```
<details>
<summary>Detail diffs</summary>
```
Top file improvements (PerfScoreUnits):
-10.81 : 72504.dasm (-0.16% of base)
1 total files with Perf Score differences (1 improved, 0 regressed), 0 unchanged.
Top method improvements (PerfScoreUnits):
-10.81 (-0.16% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
Top method improvements (percentages):
-10.81 (-0.16% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
1 total methods with Perf Score differences (1 improved, 0 regressed), 0 unchanged.
```
</details>
--------------------------------------------------------------------------------
* Increase loop cloning max allowed condition blocks
Allows inner loop of 3-nested loops (e.g., Array2 benchmark)
to be cloned.
* Clear GTF_INX_RNGCHK bit on loop cloning created index nodes
to avoid unnecessary bounds checks.
Revert max cloning condition blocks to 3; allowing more doesn't
seem to improve performance (probably too many conditions before
a not-sufficiently-executed loop, at least for the Array2 benchmark)
* Remove outer index bounds checks
* Convert loop cloning data structures to `vector` for better debugging
* Improve CSE dump output
1. "#if 0" the guts of the CSE dataflow; that's not useful to most people.
2. Add readable CSE number output to the CSE dataflow set output
3. Add FMT_CSE to commonize CSE number output.
4. Add PHASE_OPTIMIZE_VALNUM_CSES to the pre-phase output "allow list"
and stop doing its own blocks/trees output.
5. Remove unused optCSECandidateTotal
6. Add functions `getCSEAvailBit` and `getCSEAvailCrossCallBit` to avoid
hand-coding these bit calculations in multiple places, for the CSE dataflow set bits.
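A sketch of what such helpers might look like, assuming each CSE candidate (numbered from 1) owns two adjacent bits in the dataflow set: one for "available" and one for "available across a call". That layout, the `.c` suffix, and the Python names are assumptions for illustration; only the names `getCSEAvailBit` and `getCSEAvailCrossCallBit` come from the change description.

```python
def get_cse_avail_bit(cse_num):
    """Bit index meaning 'candidate cse_num is available' (assumed layout)."""
    return (cse_num - 1) * 2


def get_cse_avail_cross_call_bit(cse_num):
    """Bit index meaning 'candidate cse_num is available across a call' (assumed layout)."""
    return get_cse_avail_bit(cse_num) + 1


def describe_cse_set(bits, candidate_count):
    """Render a raw bit set in symbolic form instead of as raw bits."""
    names = []
    for num in range(1, candidate_count + 1):
        if bits & (1 << get_cse_avail_bit(num)):
            names.append(f"CSE #{num:02d}")
        if bits & (1 << get_cse_avail_cross_call_bit(num)):
            names.append(f"CSE #{num:02d}.c")
    return ", ".join(names)


print(describe_cse_set(0b1101, candidate_count=2))
```

Keeping the index arithmetic in one pair of functions means the dataflow code and the dump code cannot drift apart on which bit means what.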
* Mark cloned array indexes as non-faulting
When generating loop cloning conditions, mark array index expressions
as non-faulting, as we have already null- and range-checked the array
before generating an index expression.
I also added similar code to mark array length expressions as non-faulting,
for the same reason. However, that leads to CQ losses because of downstream
CSE effects.
* Don't count zero-sized align instructions in PerfScore
* Add COMPlus_JitDasmWithAlignmentBoundaries
This outputs the alignment boundaries without requiring outputting the actual addresses.
It makes it easier to diff changes.
* Improve bounds check output
* Improve emitter label printing
Create function for printing bound emitter labels.
Also, add debug code to associate a BasicBlock with an insGroup, and
output the block number and ID with the emitter label in JitDump, so it's easier
to find where a group of generated instructions came from.
* Formatting
* Clear BBF_LOOP_PREHEADER bit when compacting empty pre-header block
* Keep track of all basic blocks that contribute code to an insGroup
* Update display of Intel jcc erratum branches in dump
For instructions or instruction sequences which match the Intel jcc
erratum criteria, note that in the alignment boundary dump.
Also, a few fixes:
1. Move the alignment boundary dumping from `emitIssue1Instr` to
`emitEndCodeGen` to avoid the possibility of reading the next instruction in
a group when there is no next instruction.
2. Create `IsJccInstruction` and `IsJmpInstruction` functions for use by the
jcc criteria detection, and fix that detection to fix a few omissions/errors.
3. Change the jcc criteria detection to be hard-coded to 32 byte boundaries
instead of assuming `compJitAlignLoopBoundary` is 32.
An example:
```
cmp r11d, dword ptr [rax+8]
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (cmp: 0 ; jcc erratum) 32B boundary ...............................
jae G_M42486_IG103
```
In this case, the `cmp` doesn't cross the boundary; it is adjacent to it (the zero indicates the number of bytes
of the instruction which cross the boundary), and it is followed by the `jae`, which starts after the boundary.
Indicating the jcc erratum criteria can help point out potential performance issues due to unlucky
alignment of these instructions in asm diffs.
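The criteria can be sketched as follows. This Python model assumes the publicly documented form of the erratum (a jump, or a fused compare-and-jump pair, that crosses or ends on a 32-byte boundary); the function names and the instruction sizes and offsets in the usage lines are hypothetical, not taken from the JIT.

```python
BOUNDARY = 32  # hard-coded rather than derived from the loop alignment setting


def crosses_or_ends_on_boundary(start, size):
    """True if the bytes [start, start + size) span or end exactly on a 32-byte boundary."""
    end = start + size  # offset of the first byte after the range
    spans = start // BOUNDARY != (end - 1) // BOUNDARY
    ends_on = end % BOUNDARY == 0
    return spans or ends_on


def matches_jcc_erratum(cmp_start, cmp_size, jcc_size):
    """Treat a compare and the conditional jump right after it as one fused unit."""
    return crosses_or_ends_on_boundary(cmp_start, cmp_size + jcc_size)


# Hypothetical layout resembling the dump above: a 4-byte cmp at offset 28 ends
# exactly at the boundary (0 bytes cross) and a 6-byte jae starts at offset 32.
print(matches_jcc_erratum(28, 4, 6))        # the fused pair spans the boundary
print(crosses_or_ends_on_boundary(32, 6))   # the jae on its own does not
```

The second call shows why the pair has to be considered together: looking at the `jae` alone would miss the case shown in the dump.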
* Display full instruction name in alignment and other messages
XArch sometimes prepends a "v" to the instruction names from the instruction
table. Add a function `genInsDisplayName` to create the full instruction name
that should be displayed, and use that in most places an instruction name will
be displayed, such as in the alignment messages, and normal disassembly. Use
this instead of the raw `genInsName`.
This could be extended to handle arm32 appending an "s", but I didn't want to
touch arm32 with this change.
* Fix build
* Code review feedback
1. Rename GTF_INX_NONFAULTING to GTF_INX_NOFAULT to increase clarity compared
to existing GTF_IND_NONFAULTING.
2. Minor cleanup in genInsDisplayName.
* Formatting