s390x: enable S/DGEMM block with explicit loop unrolling + interleaving with clang
author Marius Hillenbrand <mhillen@linux.ibm.com>
Tue, 1 Sep 2020 14:16:53 +0000 (16:16 +0200)
committer Marius Hillenbrand <mhillen@linux.ibm.com>
Wed, 2 Sep 2020 11:49:31 +0000 (13:49 +0200)
commit 2ee5b899ce9777c63710de1ede75c362db5bcd47
tree 21cc4993947946764653c757e524b1197989203d
parent 095f4e6964ba150b1293747d842a60294836be45

The code for the SGEMM 16x4 and DGEMM 8x4 blocks on z14 and z15 uses
explicit unrolling and interleaving to improve performance. It employs
an empty inline asm statement whose operands constrain the compiler's
instruction scheduling and thereby enforce proper overlapping of the
load and compute phases. Fix an ifdef so that this also applies to
clang builds.
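
As a rough illustration of the technique, the sketch below applies an
empty asm statement to a dot product that is unrolled by two. It is not
the code from gemm_vec.c: the macro SCHED_FENCE and the function
dot_interleaved are hypothetical names, the generic "r" constraints
stand in for whatever register constraints the real kernel uses, and a
64-bit GCC or clang target is assumed.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from gemm_vec.c. The asm body is empty,
 * so it emits no instructions. Its operands tell the compiler that the
 * accumulators and the freshly loaded values must all be available at
 * this point, which limits how far loads and multiply-adds can be moved
 * across it. clang understands the same extended asm syntax as GCC.
 */
#if defined(__GNUC__) || defined(__clang__)
#define SCHED_FENCE(acc0, acc1, a0, b0, a1, b1)      \
    __asm__("" : "+r"(acc0), "+r"(acc1)              \
               : "r"(a0), "r"(b0), "r"(a1), "r"(b1))
#else
#define SCHED_FENCE(acc0, acc1, a0, b0, a1, b1) ((void)0)
#endif

/* Dot product, unrolled by two, with loads for the next step issued
 * alongside the multiply-adds of the current step. */
double dot_interleaved(const double *a, const double *b, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0;
    size_t k = 0;

    if (n >= 2) {
        /* Prologue: load the first pair of operands. */
        double a0 = a[0], b0 = b[0], a1 = a[1], b1 = b[1];

        for (k = 2; k + 1 < n; k += 2) {
            /* Load phase for the next step. */
            double na0 = a[k], nb0 = b[k];
            double na1 = a[k + 1], nb1 = b[k + 1];

            /* Compute phase for the current step. */
            acc0 += a0 * b0;
            acc1 += a1 * b1;

            /* Keep both phases inside this iteration. */
            SCHED_FENCE(acc0, acc1, na0, nb0, na1, nb1);

            a0 = na0; b0 = nb0; a1 = na1; b1 = nb1;
        }

        /* Epilogue: consume the last loaded pair. */
        acc0 += a0 * b0;
        acc1 += a1 * b1;
    }

    /* Tail for odd n (and for n == 1). */
    for (; k < n; k++)
        acc0 += a[k] * b[k];

    return acc0 + acc1;
}
```

Because the statement has outputs and no volatile qualifier, the
compiler may still optimize around it. It only ties the listed values
to one point in the instruction stream.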

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
kernel/zarch/gemm_vec.c