GBE: fixed a predication bug for DW multiplication.
Per bspec:
mul (8) acc0:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
mach (8) rTemp<1>:d r2.0<8;8,1>:d r3.0<8;8,1>:d //All channels must be enabled
mov (8) r5.0<1>:d rTemp<8;8,1>:d // High 32 bits
mov (8) r6.0<1>:d acc0:d // Low 32 bits
The mul and mach instructions must have all channels enabled.
The first mov should have channel enable from the destHI of IMUL,
the second mov should have the channel enable from the destLO of IMUL.
We need to disable both the predication and the mask on mul/mach, rather than
only setting noMask to 1.
Strangely, the first quarter does not seem to need this. Changing both
quarters to this style would use extra registers, which causes some kernels
(compiler_box_blur.cl) to fail to compile, so I only change the second quarter
to fully comply with the bspec here. In practice, this works fine with all
unit test cases and Homer's specific test case.
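
For illustration only, the intended semantics of the sequence above can be
sketched as a scalar C model. This is not GBE code: the function name
mul_d_64, the SIMD_WIDTH constant and the 8-bit enable_mask are assumptions
made for the sketch. The mul/mach step runs on every lane, and only the two
movs honour the channel enables.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_WIDTH 8 /* assumed: one SIMD8 quarter */

/* Hypothetical model of the bspec sequence, one bit of enable_mask per lane. */
void mul_d_64(const int32_t *a, const int32_t *b, uint8_t enable_mask,
              uint32_t *dst_hi, uint32_t *dst_lo)
{
    uint32_t acc_lo[SIMD_WIDTH]; /* stands in for acc0  */
    uint32_t tmp_hi[SIMD_WIDTH]; /* stands in for rTemp */

    assert(a && b && dst_hi && dst_lo);

    /* mul + mach: all channels enabled, the mask is ignored here. */
    for (int lane = 0; lane < SIMD_WIDTH; ++lane) {
        uint64_t product = (uint64_t)((int64_t)a[lane] * (int64_t)b[lane]);
        acc_lo[lane] = (uint32_t)product;
        tmp_hi[lane] = (uint32_t)(product >> 32);
    }

    /* The two movs: only enabled channels are written back. */
    for (int lane = 0; lane < SIMD_WIDTH; ++lane) {
        if (enable_mask & (1u << lane)) {
            dst_hi[lane] = tmp_hi[lane]; /* high 32 bits */
            dst_lo[lane] = acc_lo[lane]; /* low 32 bits  */
        }
    }
}
```

Disabled lanes keep their previous destination values, while the temporaries
are always fully computed, which is why the mul/mach pair needs both the
predication and the mask turned off.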
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Xing, Homer <homer.xing@intel.com>