POWER10: Change the packing format for bfloat16
author Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Tue, 13 Oct 2020 21:05:10 +0000 (16:05 -0500)
committer Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Tue, 13 Oct 2020 21:05:10 +0000 (16:05 -0500)
commit 0826d68f93ef1fed021c426911c464728d60ccb3
tree 0ed4ace473a0e51d63d6ed2d0e495e82580049a2
parent 602a0c7a699ba1f2167119591ba2798ef506a73b

The new MMA instructions require bfloat16 inputs in 4x2 order, so change
the format in the copy/packing code accordingly.  Packing the data this way
up front avoids permute instructions in the gemm kernel inner loop.
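
To illustrate what a 4x2 packing order could look like, here is a minimal sketch. It is not the code from this commit: the names `bf16_t` and `pack_4x2` are hypothetical, the source panel is assumed to be 4 rows stored column-major with leading dimension `lda`, and an odd `k` is assumed to be zero-padded. For each pair of k indices it emits the two values of row 0, then row 1, and so on, so that 8 consecutive bfloat16 values form one 4x2 block.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias: raw bfloat16 bit patterns stored as 16-bit integers. */
typedef uint16_t bf16_t;

/*
 * Pack a 4-row panel of a column-major matrix into 4x2 order.
 *
 * a   : source panel, element (i, p) is a[i + p * lda]
 * lda : leading dimension of the source (assumed >= 4)
 * k   : number of columns in the panel
 * out : destination, must hold 8 * ((k + 1) / 2) elements
 *
 * Returns the number of elements written.
 */
static size_t pack_4x2(const bf16_t *a, size_t lda, size_t k, bf16_t *out)
{
    size_t n = 0;

    for (size_t p = 0; p < k; p += 2) {          /* two k values per block */
        for (size_t i = 0; i < 4; i++) {         /* four rows per block */
            out[n++] = a[i + p * lda];
            /* Assumption: pad with zero when k is odd. */
            out[n++] = (p + 1 < k) ? a[i + (p + 1) * lda] : 0;
        }
    }
    return n;
}
```

With the data already in this order, a kernel could load each 8-element block directly as one 128-bit vector operand, which is the kind of inner-loop permute the commit message says the new format avoids.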
kernel/power/KERNEL.POWER10
kernel/power/sbgemm_kernel_power10.c
kernel/power/sbgemm_ncopy_16_power10.c [new file with mode: 0644]
kernel/power/sbgemm_ncopy_8_power10.c [new file with mode: 0644]
kernel/power/sbgemm_tcopy_16_power10.c [new file with mode: 0644]
kernel/power/sbgemm_tcopy_8_power10.c [new file with mode: 0644]