ARM64: Fix for Multiplication with Overflow Check
For 4-byte integer multiplication, the JIT emits incorrect code that is valid
only for 8-byte (i8) multiplication.
For the fix, I use the ```smull```(signed)/```umull```(unsigned) instructions,
which produce an 8-byte result from a 4-byte by 4-byte multiplication.
So only one multiplication is needed instead of two for this case.
Shifting that result right by 32 bits yields the upper half, which is
used to detect overflow.
A similar transformation is made for the unsigned case.
Lower is also changed to reserve a register for the overflow check.
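As a rough illustration only (not the JIT code itself), the check can be modeled in C as below. The function names are hypothetical, and the unsigned condition (upper half must be zero) is my assumption about what the "similar" unsigned transform tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models the signed sequence smull / lsr #32 / cmp ASR #31.
   Returns true on overflow; *result is written only when there is no overflow. */
static bool mul_i32_overflows(int32_t a, int32_t b, int32_t *result)
{
    int64_t  wide = (int64_t)a * (int64_t)b;   /* smull: full 64-bit product */
    uint64_t bits = (uint64_t)wide;
    uint32_t lo   = (uint32_t)bits;            /* low 32 bits (w8) */
    uint32_t hi   = (uint32_t)(bits >> 32);    /* lsr #32: upper 32 bits (w10) */

    /* Equivalent of "lo ASR #31": all ones if lo is negative, else zero. */
    uint32_t sign = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;

    bool overflow = (hi != sign);              /* cmp w10, w8, ASR #31 */
    if (!overflow)
        *result = (int32_t)wide;               /* value fits, conversion is exact */
    return overflow;
}

/* Hypothetical helper for the unsigned case. Assumption: overflow is
   reported whenever the upper 32 bits of the umull result are nonzero. */
static bool mul_u32_overflows(uint32_t a, uint32_t b, uint32_t *result)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b; /* umull: full 64-bit product */
    *result = (uint32_t)wide;                  /* low 32 bits */
    return (wide >> 32) != 0;
}
```

The signed product fits in 32 bits exactly when the upper half is the sign extension of the lower half, which is why a single widening multiply is enough.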
Before
smulh w10, w8, w9 --> Incorrect use: smulh is for obtaining the upper
bits [127:64] of x8 * x9
mul w8, w8, w9
cmp x10, x8, ASR #63
After
smull x8, x8, x9 --> x8 = w8 * w9
lsr x10, x8, #32 --> move the upper 32 bits of x8 into w10
cmp w10, w8, ASR #31 --> overflow unless the upper half equals the sign
extension of the lower half