This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9.
* Imported GotoBLAS2 1.13 BSD version
x86/x86_64:
+ * On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps
+ in zdot_sse2.S line 191. This would cause zdotu & zdotc failures.
+ Use movlpd instead to work around it. (Refs issue #8 #9 on github)
* Modified ?axpy functions to return same netlib BLAS results
when incx==0 or incy==0 (Refs issue #7 on github)
* Modified ?swap functions to return same netlib BLAS results
testl $1, N
jle .L48
- movlps -16 * SIZE(X), %xmm4
- movlps -16 * SIZE(Y), %xmm6
+ movlpd -16 * SIZE(X), %xmm4
+ movlpd -16 * SIZE(Y), %xmm6
pshufd $0x4e, %xmm6, %xmm3
mulpd %xmm4, %xmm6