Optimize integer div/mod by const power of 2 in lowering
Optimizing GT_DIV/GT_UDIV/GT_MOD/GT_UMOD by a power of 2 in codegen is problematic because the xarch DIV instruction has special register requirements. By the time codegen decides to perform the optimization, LSRA has already allocated the rax and rdx registers even though they are not always needed (for example, unsigned division does not use CDQ).
Since the JIT can't represent a CDQ instruction in its IR, an arithmetic shift (GT_RSH) is used instead to extract the dividend sign. xarch's SAR is larger than CDQ but it has the advantage that it doesn't require specific registers. Also, arithmetic shifts are available on architectures other than xarch.
Example: method "static int foo(int x) => x / 8;" is now compiled to
mov eax, ecx
mov edx, eax
sar edx, 31
and edx, 7
add edx, eax
mov eax, edx
sar eax, 3
instead of
mov eax, ecx
cdq
and edx, 7
add eax, edx
sar eax, 3
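For reference, the new sequence can be modelled in C roughly as follows. This is only a sketch of the arithmetic, not the JIT code: the helper names sar32 and div_pow2 are hypothetical, and the divisor is assumed to be 1 << k with k between 1 and 30.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling SAR. C leaves >> on negative values
   implementation-defined, so negative inputs are shifted via ~. */
static int32_t sar32(int32_t x, unsigned n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Hypothetical helper: computes x / (1 << k), truncating toward zero. */
static int32_t div_pow2(int32_t x, unsigned k)
{
    assert(k >= 1 && k <= 30);                        /* assumption of this sketch */
    int32_t sign = sar32(x, 31);                      /* sar edx, 31 : 0 or -1     */
    int32_t bias = sign & (int32_t)((1u << k) - 1u);  /* and edx, 7  (for k == 3)  */
    return sar32(x + bias, k);                        /* add edx, eax ; sar eax, 3 */
}
```

The bias of (1 << k) - 1 is added only for negative dividends, which turns the round-toward-negative-infinity behaviour of the shift into the truncation that integer division requires.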
As a side-effect of this change the optimization now also works when the divisor is too large to be contained. Previously this wasn't possible because the divisor constant needed to be modified during codegen but the constant was already loaded into a register.
Example: method "static ulong foo(ulong x) => x / 4294967296;" is now compiled to
mov rax, rcx
shr rax, 32
whereas before a DIV instruction was used.
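The unsigned case needs no sign fix-up at all, which a small C model makes explicit. The helper names below are hypothetical and the code only illustrates the standard identities (shift for division, mask for remainder); it is not taken from the JIT sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns k such that d == 1 << k. */
static unsigned log2_pow2(uint64_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);  /* d must be a power of 2 */
    unsigned k = 0;
    while ((d >> k) != 1)
        k++;
    return k;
}

/* x / d for a power-of-2 d is a logical right shift (shr rax, 32). */
static uint64_t udiv_pow2(uint64_t x, uint64_t d)
{
    return x >> log2_pow2(d);
}

/* x % d for a power-of-2 d keeps only the low bits. */
static uint64_t umod_pow2(uint64_t x, uint64_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);
    return x & (d - 1);
}
```

With d = 4294967296 the shift count is 32, matching the "shr rax, 32" above.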
This change also fixes an issue in fgShouldUseMagicNumberDivide. The optimization done in lowering can handle negative power of 2 divisors, but fgShouldUseMagicNumberDivide took over those cases because it didn't check the absolute value of the divisor, so the magic number sequence was used instead.
Example: method "static int foo(int x) => x / -2;" is now compiled to
mov eax, ecx
mov edx, eax
shr edx, 31
add edx, eax
sar edx, 1
mov eax, edx
neg eax
instead of
mov eax, 0x7FFFFFFF
imul edx:eax, ecx
mov eax, edx
sub eax, ecx
mov edx, eax
shr edx, 31
add eax, edx
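A C model of the new sequence for x / -2, given only as a sketch: div_neg2 is a hypothetical name, and the code mirrors the shr/add/sar/neg instructions rather than any JIT source. For a divisor of -2 the bias mask is 1, so a logical shift by 31 yields the bias directly.

```c
#include <stdint.h>

/* Hypothetical helper: computes x / -2 as -(x / 2). */
static int32_t div_neg2(int32_t x)
{
    int32_t bias = (int32_t)((uint32_t)x >> 31);  /* shr edx, 31 : 1 if x < 0 */
    int32_t sum = x + bias;                       /* add edx, eax             */
    /* sar edx, 1, written portably because >> on negative values is
       implementation-defined in C. */
    int32_t q = sum < 0 ? ~(~sum >> 1) : sum >> 1;
    return -q;                                    /* neg eax                  */
}
```

The negation cannot overflow here because the quotient of a division by 2 always has magnitude at most 2^30.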
Commit migrated from https://github.com/dotnet/coreclr/commit/d3647c10d7f01daa1f6b38fd601cd9606a08b687