Remove Monitor asm helpers (#14146)
- Removed asm helpers on Windows and used portable C++ helpers instead
- Rearranged fast path code to improve them a bit and match the asm more closely
Perf:
- The asm helpers are a bit faster. The code generated for the portable helpers is almost the same now, the remaining differences are:
- There were some layout issues where hot paths were in the wrong place and return paths were not cloned. Instrumenting some of the tests below with PGO on x64 resolved all of the layout issues. I couldn't get PGO instrumentation to work on x86 but I imagine it would be the same there.
- Register usage
- x64: All of the Enter functions are using one or two (TryEnter is using two) callee-saved registers for no apparent reason, forcing them to be saved and restored. r10 and r11 seem to be available but they're not being used.
- x86: Similarly to x64, the compiled functions are pushing and popping 2-3 additional registers in the hottest fast paths.
- I believe this is the main remaining gap and PGO is not helping with this
- On Linux, perf is >= before for the most part
- Perf tests used for below are updated in PR https://github.com/dotnet/coreclr/pull/13670
14 files changed: