Do magic division optimization in lowering

Currently this optimization is done during global morph. Global morph
takes place very early, so cases that rely on constant propagation and
folding (e.g. "int d = 3; return n / d;") aren't handled.

Also, the IR generated during morph is complex and unlikely to enable
further optimizations. On the contrary, the GT_ASG and GT_MULHI nodes
it introduces currently block other optimizations such as CSE and LICM.
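
For readers unfamiliar with the transformation, here is a minimal C
sketch of what "magic division" computes for the n / 3 case: a
multiply-high by a precomputed constant followed by a sign correction.
The function name div3_magic is hypothetical and this is not the JIT's
code; the constant 0x55555556 is the commonly used magic number for
signed 32-bit division by 3. The sketch assumes that right-shifting a
negative signed value is an arithmetic shift, which holds on mainstream
compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical illustration: signed 32-bit n / 3 without a divide. */
int32_t div3_magic(int32_t n)
{
    /* High 32 bits of the 64-bit product n * 0x55555556 (the "mulhi"). */
    int32_t q = (int32_t)(((int64_t)n * INT64_C(0x55555556)) >> 32);

    /* Add the sign bit so that negative results round toward zero. */
    return q + (int32_t)((uint32_t)q >> 31);
}
```

The multiply-high is the operation that the GT_MULHI node mentioned
above stands for.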

The generated code is identical to the code generated by the morph
implementation with one exception: when two shifts are needed
(e.g. for x / 7), both are now computed from the same value instead of
the second being applied to the result of the first:

    mov eax, edx
    sar eax, 2
    shr edx, 31
    add eax, edx

instead of:

    mov eax, edx
    sar eax, 2
    mov edx, eax
    shr edx, 31
    add eax, edx

This results in shorter code and avoids creating an additional temp.
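
The two sequences give the same result because an arithmetic right
shift preserves the sign bit, so the sign can be taken either before or
after the "sar". A minimal C sketch of both forms for x / 7 follows. The
names mulhi32, div7_independent and div7_chained are hypothetical, the
magic constant 0x92492493 with a shift of 2 is the commonly used pair for
signed 32-bit division by 7, and the code assumes that right-shifting a
negative signed value is an arithmetic shift (implementation-defined in
C, true on mainstream compilers).

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product a * b. */
static int32_t mulhi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* -1840700269 is 0x92492493 reinterpreted as a signed 32-bit value. */
#define MAGIC_DIV7 (-1840700269)

/* New form: both shifts read the same value q. */
int32_t div7_independent(int32_t x)
{
    int32_t q = mulhi32(x, MAGIC_DIV7) + x;       /* value held in edx */
    return (q >> 2) + (int32_t)((uint32_t)q >> 31);
}

/* Old form: the sign bit is taken from the already shifted value. */
int32_t div7_chained(int32_t x)
{
    int32_t q = mulhi32(x, MAGIC_DIV7) + x;
    int32_t t = q >> 2;                           /* sar eax, 2 */
    return t + (int32_t)((uint32_t)t >> 31);      /* needs a copy of t */
}
```

In the chained form the shifted value has to be copied before its sign
bit is extracted, which corresponds to the extra "mov edx, eax" in the
second listing.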