powerpc: Optimized SGEMM/DGEMM/CGEMM for POWER10
This patch introduces new optimized version of SGEMM, CGEMM and DGEMM
using power10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
Cycles count reduced by 30-50% compared to POWER9 version depending on
M/N/K sizes.
MMA GCC patch for reference:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=
8ee2640bfdc62f835ec9740278f948034bc7d9f1