RFC: Add half precision gemm for bfloat16 in OpenBLAS
author Rajalakshmi Srinivasaraghavan <raji@linux.ibm.com>
Tue, 14 Apr 2020 19:55:08 +0000 (14:55 -0500)
committer Rajalakshmi Srinivasaraghavan <raji@linux.ibm.com>
Tue, 14 Apr 2020 19:55:08 +0000 (14:55 -0500)
commit 7eb55504b1727eebcb0f451fa5b148dbea303b69
tree 1de8d9b68acf46139b1e01dc36664e220aac0b6d
parent c861b2a7bda3c88ad30aac105e473b46fd940dd7

This patch adds a matrix multiplication (GEMM) kernel for the bfloat16 data type.
On architectures that do not support bfloat16 natively, it is defined as
unsigned short (2 bytes). As with SGEMM, the default unroll sizes can be tuned
per architecture; for now, 8 and 4 are used for M and N, respectively. The
ncopy/tcopy sizes can likewise be changed to suit each architecture; for now,
size 2 is used.

Added shgemm to kernel/power/KERNEL.POWER9 and tested it on powerpc64le and
powerpc64. For reference, also added a small test, compare_sgemm_shgemm.c, which
compares the output of sgemm and shgemm.

This patch does not add shgemm to the OpenBLAS tests, benchmarks, or LAPACK tests.
A complex-type implementation can be discussed and added once this is approved.
27 files changed:
Makefile.system
Makefile.tail
cmake/prebuild.cmake
cmake/system.cmake
common.h
common_interface.h
common_level3.h
common_macro.h
common_param.h
common_sh.h [new file with mode: 0644]
driver/level3/Makefile
driver/level3/level3.c
driver/level3/level3_thread.c
driver/others/parameter.c
getarch_2nd.c
interface/Makefile
interface/gemm.c
kernel/Makefile.L3
kernel/generic/gemm_beta.c
kernel/generic/gemm_ncopy_2.c
kernel/generic/gemm_tcopy_2.c
kernel/generic/gemmkernel_2x2.c
kernel/power/KERNEL.POWER9
kernel/setparam-ref.c
lapack/getrf/potrf_parallel.c
param.h
test/compare_sgemm_shgemm.c [new file with mode: 0644]