platform/upstream/openblas.git

Martin Kroeker [Sat, 11 Dec 2021 19:35:22 +0000 (20:35 +0100)]
Merge pull request #3464 from binebrank/arm_sve_sgemm

Add sgemm part for Arm SVE

Bine Brank [Sat, 11 Dec 2021 15:37:23 +0000 (16:37 +0100)]
fix UNROLL_MN and add to targets for SVE

Bine Brank [Sat, 11 Dec 2021 15:35:08 +0000 (16:35 +0100)]
adjust Makefile.L3 for SVE

Martin Kroeker [Thu, 9 Dec 2021 06:28:57 +0000 (07:28 +0100)]
Merge pull request #3472 from kavanabhat/p10_aixas_p8

Fallback for Power kernels

Martin Kroeker [Wed, 8 Dec 2021 21:19:32 +0000 (22:19 +0100)]
Merge pull request #3469 from martin-frbg/issue2986

Roll back SkylakeX DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH

Martin Kroeker [Wed, 8 Dec 2021 21:18:44 +0000 (22:18 +0100)]
Fix ar path in ARMV7 Darwin NDK build on Azure (#3473)

* Adjust ar command in ARMV7 Darwin NDK build after homebrew update to NDK 23b

kavanabhat [Wed, 8 Dec 2021 09:52:23 +0000 (03:52 -0600)]
Fallback for Power kernels

Martin Kroeker [Mon, 6 Dec 2021 18:43:54 +0000 (19:43 +0100)]
roll back DGEMM kernels to 4x8 when compiling for DYNAMIC_ARCH

Martin Kroeker [Mon, 6 Dec 2021 18:42:51 +0000 (19:42 +0100)]
switch DGEMM unroll parameters for SkylakeX if DYNAMIC_ARCH

Bine Brank [Sun, 5 Dec 2021 17:47:29 +0000 (18:47 +0100)]
sgemm v2x8 SVE kernel

Martin Kroeker [Sun, 5 Dec 2021 14:52:44 +0000 (15:52 +0100)]
Merge pull request #3468 from martin-frbg/issue3467

Fix hardcoded library name in cpp_thread_test Makefile

Martin Kroeker [Sun, 5 Dec 2021 13:38:41 +0000 (14:38 +0100)]
Fix hardcoded library name

Bine Brank [Sun, 5 Dec 2021 13:03:08 +0000 (14:03 +0100)]
strmm sve v1x8 kernel

Martin Kroeker [Sat, 4 Dec 2021 21:24:02 +0000 (22:24 +0100)]
Fix DYNAMIC_ARCH builds with CMAKE on OSX and add corresponding test to Azure CI (#3409)

* Use linker response files and a custom link command to get around ARG_MAX limitations on OSX
* Reconfigure a redundant job on Azure to test shared library builds with CMAKE and DYNAMIC_ARCH on OSX

Martin Kroeker [Fri, 3 Dec 2021 11:12:20 +0000 (12:12 +0100)]
Merge pull request #3466 from rafaelcfsousa/rafael/small_matrix_p10

[POWER] Add small matrix for sgemm/dgemm on Power10

Martin Kroeker [Fri, 3 Dec 2021 11:11:43 +0000 (12:11 +0100)]
Merge pull request #3465 from kavanabhat/develop

Fix truncated assembler checks used to build Power10 Kernels

Martin Kroeker [Fri, 3 Dec 2021 10:41:53 +0000 (11:41 +0100)]
Delete test_zhemv.c

Martin Kroeker [Fri, 3 Dec 2021 09:01:20 +0000 (10:01 +0100)]
Merge pull request #3455 from cenewcombe/develop

Fix unsafe read during final iteration of zsymv_L_sse2.S

kavanabhat [Thu, 2 Dec 2021 07:59:38 +0000 (13:29 +0530)]
Update Makefile.system

kavanabhat [Wed, 1 Dec 2021 15:00:43 +0000 (20:30 +0530)]
Merge pull request #1 from kavanabhat/as_check_fix

Fix truncated assembler checks used for Power10 kernel build

kavanabhat [Wed, 1 Dec 2021 14:00:40 +0000 (19:30 +0530)]
Fix truncated assembler checks

Bine Brank [Mon, 29 Nov 2021 20:25:05 +0000 (21:25 +0100)]
trmm sve copy functions for single precision

Rafael Cardoso Fernandes Sousa [Tue, 16 Nov 2021 20:47:41 +0000 (14:47 -0600)]
[POWER] Add support for SMALL_MATRIX_OPT

Bine Brank [Sun, 28 Nov 2021 17:12:47 +0000 (18:12 +0100)]
add sgemm kernel and copy functions for sgemm and ssymm

Martin Kroeker [Fri, 26 Nov 2021 15:14:55 +0000 (16:14 +0100)]
Merge pull request #3425 from binebrank/arm_sve_dgemm

Add dgemm kernel for arm64 SVE

Martin Kroeker [Fri, 26 Nov 2021 14:19:24 +0000 (15:19 +0100)]
Merge pull request #3459 from rafaelcfsousa/fix_cmake

Fix issues when building OpenBLAS with cmake

Martin Kroeker [Fri, 26 Nov 2021 12:40:23 +0000 (13:40 +0100)]
Merge pull request #3462 from martin-frbg/azure-alpine2

Azure CI: Update alpine-chroot-install again

Martin Kroeker [Fri, 26 Nov 2021 12:39:49 +0000 (13:39 +0100)]
Update alpine-chroot-install again

Bine Brank [Fri, 26 Nov 2021 12:11:19 +0000 (13:11 +0100)]
update CONTRIBUTORS.md

Bine Brank [Fri, 26 Nov 2021 09:35:01 +0000 (10:35 +0100)]
Adapt CMake for SVE

Martin Kroeker [Fri, 26 Nov 2021 09:30:47 +0000 (10:30 +0100)]
Merge pull request #3457 from wjc404/optimize-A53-dgemm

MOD: optimize zgemm on cortex-A53/cortex-A55

Martin Kroeker [Fri, 26 Nov 2021 09:29:28 +0000 (10:29 +0100)]
Merge pull request #3456 from martin-frbg/issue3444

Add/restore a GENERIC target for MIPS32 and support MIPS32 cross-compilation using CMAKE

Martin Kroeker [Fri, 26 Nov 2021 08:38:41 +0000 (09:38 +0100)]
AzureCI: Fetch alpine-chroot-install from master to get key updates (#3460)

* Fetch alpine-chroot-install from master to get key updates

Jia-Chen [Thu, 25 Nov 2021 14:48:48 +0000 (22:48 +0800)]
MOD: add comments to a53 zgemm kernel

Rafael Cardoso Fernandes Sousa [Thu, 25 Nov 2021 02:07:20 +0000 (20:07 -0600)]
Modify the order that cmake set the KERNEL variables (generic now is fallback)

Rafael Cardoso Fernandes Sousa [Wed, 24 Nov 2021 20:07:28 +0000 (14:07 -0600)]
Fix the cmake parser to identify more patterns

Jia-Chen [Wed, 24 Nov 2021 13:51:45 +0000 (21:51 +0800)]
MOD: optimize zgemm on cortex-A53/cortex-A55

Bine Brank [Tue, 23 Nov 2021 20:18:08 +0000 (21:18 +0100)]
reduced dgemm_unroll_m to work with 128-bit sve

Bine Brank [Mon, 22 Nov 2021 09:12:34 +0000 (10:12 +0100)]
removed unused code (compiler warnings)

Bine Brank [Mon, 22 Nov 2021 08:54:20 +0000 (09:54 +0100)]
modify Makefile for SVE copy

Bine Brank [Sun, 21 Nov 2021 17:33:43 +0000 (18:33 +0100)]
configure SVE Makefile

Bine Brank [Sun, 21 Nov 2021 13:56:27 +0000 (14:56 +0100)]
some clean-up & commentary

Martin Kroeker [Sat, 20 Nov 2021 22:54:48 +0000 (23:54 +0100)]
Fix unintended reversion of recent CortexA53 changes

Martin Kroeker [Sat, 20 Nov 2021 16:34:28 +0000 (17:34 +0100)]
Add CMAKE support for cross-compiling to MIPS32

Martin Kroeker [Sat, 20 Nov 2021 16:31:51 +0000 (17:31 +0100)]
Add generic mips32 target

Martin Kroeker [Sat, 20 Nov 2021 16:31:11 +0000 (17:31 +0100)]
Add generic MIPS32 target

Bine Brank [Sat, 20 Nov 2021 15:35:29 +0000 (16:35 +0100)]
symm SVE copy routines

Caroline Newcombe [Fri, 19 Nov 2021 20:29:32 +0000 (14:29 -0600)]
Fix unsafe read during final iteration of zsymv_L_sse2.S

Martin Kroeker [Thu, 18 Nov 2021 17:17:27 +0000 (18:17 +0100)]
Merge pull request #3451 from wjc404/optimize-A53-dgemm

MOD: optimize DGEMM of large matrices on cortex A53 & A55

Jia-Chen [Thu, 18 Nov 2021 13:14:43 +0000 (21:14 +0800)]
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55

Martin Kroeker [Tue, 16 Nov 2021 22:58:09 +0000 (23:58 +0100)]
Merge pull request #3450 from mmuetzel/suffix-nofortran

cmake: Set SUFFIX64 also for NOFORTRAN

Markus Mützel [Mon, 15 Nov 2021 07:53:52 +0000 (08:53 +0100)]
cmake: Set SUFFIX64 also for NOFORTRAN

Bine Brank [Sun, 14 Nov 2021 15:00:10 +0000 (16:00 +0100)]
add remaining trmm copy routines for SVE

Martin Kroeker [Sun, 14 Nov 2021 11:01:53 +0000 (12:01 +0100)]
Merge pull request #3449 from martin-frbg/mips_msa

Fix MIPS/MIPS64 compilation querying compiler rather than cpu for MSA capability

Martin Kroeker [Sat, 13 Nov 2021 22:32:26 +0000 (23:32 +0100)]
Ignore compiler support for MIPS MSA if the cpu lacks this capability

Martin Kroeker [Sat, 13 Nov 2021 22:26:48 +0000 (23:26 +0100)]
MIPS P5600 and 24KC,1004K cpus do not support MSA

Martin Kroeker [Sat, 13 Nov 2021 22:25:34 +0000 (23:25 +0100)]
get MSA capability from feature flags

Bine Brank [Sat, 13 Nov 2021 17:48:53 +0000 (18:48 +0100)]
dtrmm_utcopy sve function

Martin Kroeker [Thu, 11 Nov 2021 08:29:36 +0000 (09:29 +0100)]
Merge pull request #3447 from martin-frbg/issue3446

Fix potentially wrong HOSTARCH definition in cross-compilation

Martin Kroeker [Wed, 10 Nov 2021 21:27:14 +0000 (22:27 +0100)]
Fix potentially wrong HOSTARCH definition in cross-compilation

Bine Brank [Sun, 7 Nov 2021 19:37:51 +0000 (20:37 +0100)]
add v2x8 kernel + fix sve dtrmm

Martin Kroeker [Fri, 5 Nov 2021 11:23:47 +0000 (12:23 +0100)]
Merge pull request #3443 from martin-frbg/issue3441

Fix NULL pointer checks in blas_memory_alloc

Martin Kroeker [Fri, 5 Nov 2021 09:43:17 +0000 (10:43 +0100)]
Fix NULL pointer checks in blas_memory_alloc

Martin Kroeker [Thu, 4 Nov 2021 22:48:02 +0000 (23:48 +0100)]
Merge pull request #3431 from MehdiChinoune/export-shared-only

Fix exported OpenBLASTargets.cmake

Martin Kroeker [Thu, 4 Nov 2021 22:47:11 +0000 (23:47 +0100)]
Merge pull request #3442 from martin-frbg/cpuid_x86

Add CPUID recognition of Intel Alder Lake

Martin Kroeker [Thu, 4 Nov 2021 19:36:39 +0000 (20:36 +0100)]
Add CPUIDs for Alder Lake and other recent Intel cpus

Martin Kroeker [Thu, 4 Nov 2021 19:35:41 +0000 (20:35 +0100)]
Add CPUIDs for Alder Lake and some other recent Intel cpus

Martin Kroeker [Thu, 4 Nov 2021 11:13:22 +0000 (12:13 +0100)]
Merge pull request #3429 from martin-frbg/issue3428

Adjust compiler options for nvc after 21.9 (and fix typo in DYNAMIC_ARCH settings)

Martin Kroeker [Thu, 4 Nov 2021 11:11:50 +0000 (12:11 +0100)]
Merge pull request #3440 from mhillenbrand/fix_gemv_indices

Fix flipped indices in benchmark for gemv

Martin Kroeker [Thu, 4 Nov 2021 11:11:16 +0000 (12:11 +0100)]
Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)

* return OMP places (if available, or SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
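
As a rough illustration of the counting rule described in this commit, the C sketch below prefers the number of OpenMP places and falls back to the configured processor count. The helper name `max_thread_count` is hypothetical and is not taken from the OpenBLAS sources; the sketch assumes a POSIX-like system that provides `sysconf(_SC_NPROCESSORS_CONF)`, and it only consults OpenMP when the compiler reports OpenMP 4.5 or newer.

```c
#include <unistd.h>
#if defined(_OPENMP) && _OPENMP >= 201511
#include <omp.h>
#endif

/* Hypothetical helper (not the OpenBLAS function): upper bound for the
 * thread pool size. Uses the OpenMP place count when one is reported,
 * otherwise the number of configured processors. */
static int max_thread_count(void)
{
#if defined(_OPENMP) && _OPENMP >= 201511
    /* omp_get_num_places() may report 0 when no place list is defined. */
    int places = omp_get_num_places();
    if (places > 0)
        return places;
#endif
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    return configured > 0 ? (int)configured : 1;
}
```

When built without OpenMP, the sketch simply returns the configured processor count, which is independent of the affinity mask of the calling thread.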

Marius Hillenbrand [Wed, 3 Nov 2021 11:45:09 +0000 (12:45 +0100)]
Fix flipped indices in benchmark for gemv

Fixes #3439

Bine Brank [Mon, 1 Nov 2021 21:53:21 +0000 (22:53 +0100)]
add ARMV8SVE target

Martin Kroeker [Mon, 1 Nov 2021 20:45:33 +0000 (21:45 +0100)]
Merge pull request #3427 from mhillenbrand/zarch-detection-notes

cpuid_zarch/hwcaps: add documentation and dump hwcaps in init

Martin Kroeker [Mon, 1 Nov 2021 20:44:49 +0000 (21:44 +0100)]
Merge pull request #3434 from gxw-loongson/develop

Add cblas_{c/z}srot cblas_{c/z}rotg support

gxw [Mon, 1 Nov 2021 12:15:42 +0000 (20:15 +0800)]
Add cblas_{c/z}srot cblas_{c/z}rotg support

Bine Brank [Sun, 31 Oct 2021 09:24:25 +0000 (10:24 +0100)]
fix sve dgemm kernel + sve dtrmm

Martin Kroeker [Sat, 30 Oct 2021 15:31:19 +0000 (17:31 +0200)]
Fix nvidia HPC version checks

Bine Brank [Sat, 30 Oct 2021 10:11:44 +0000 (12:11 +0200)]
added SVE ncopy and tcopy

Mehdi Chinoune [Fri, 29 Oct 2021 20:28:21 +0000 (21:28 +0100)]
Fix exported OpenBLASTargets.cmake

When both BUILD_SHARED_LIBS and BUILD_STATIC_LIBS are enabled,
cmake exports both of them to OpenBLASTargets under the same name `OpenBLAS::OpenBLAS`,
which leads to a fatal error about OpenBLAS::OpenBLAS being both a static and a shared target.
This change makes cmake export only the shared library in that case.
Another solution would be to treat them as components,
but I am afraid that would break backward compatibility.

Martin Kroeker [Fri, 29 Oct 2021 14:39:03 +0000 (16:39 +0200)]
Adjust compiler options for nvidia hpc 21.9 (and fix a long-standing typo in dynamic_arch settings)

Marius Hillenbrand [Wed, 27 Oct 2021 15:26:28 +0000 (17:26 +0200)]
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init

Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.

To aid identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.
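
The verbosity-gated dump described above could look roughly like the following C sketch. It is not the code from this commit: the helper names `parse_verbosity` and `dump_hwcaps_if_verbose` are hypothetical, the output format is invented for illustration, and the sketch assumes Linux with a C library that provides `getauxval` in `<sys/auxv.h>`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Hypothetical helper: parse a verbosity setting such as the value of
 * OPENBLAS_VERBOSE. Unset, malformed, or negative values count as 0. */
static int parse_verbosity(const char *value)
{
    if (value == NULL)
        return 0;
    char *end = NULL;
    long level = strtol(value, &end, 10);
    if (end == value || level < 0)
        return 0;
    return level > 1000 ? 1000 : (int)level;
}

/* Hypothetical helper: print the raw hardware capability word to stderr
 * when the verbosity is 2 or higher. Returns 1 if something was printed. */
static int dump_hwcaps_if_verbose(const char *verbose_setting)
{
    if (parse_verbosity(verbose_setting) < 2)
        return 0;
    unsigned long hwcap = getauxval(AT_HWCAP);
    fprintf(stderr, "hwcap: 0x%lx\n", hwcap);
    return 1;
}
```

A caller would typically pass the environment value, for example `dump_hwcaps_if_verbose(getenv("OPENBLAS_VERBOSE"))`, once during initialization.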

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Martin Kroeker [Thu, 28 Oct 2021 05:31:12 +0000 (07:31 +0200)]
Merge pull request #3426 from martin-frbg/pr3424

Add runtime DYNAMIC_ARCH cpu detection for Tiger Lake H

Martin Kroeker [Wed, 27 Oct 2021 20:17:58 +0000 (22:17 +0200)]
Add model number for Tiger Lake H (mobile variant)

Bine Brank [Wed, 27 Oct 2021 14:37:18 +0000 (16:37 +0200)]
add sve dgemm prototype

Martin Kroeker [Wed, 27 Oct 2021 14:28:12 +0000 (16:28 +0200)]
Merge pull request #3424 from Neutron3529/patch-1

auto-detect for Intel i7-11800H

Martin Kroeker [Wed, 27 Oct 2021 14:27:47 +0000 (16:27 +0200)]
Merge pull request #3423 from mhillenbrand/fix-static-detection

s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice

Neutron3529 [Wed, 27 Oct 2021 06:16:37 +0000 (14:16 +0800)]
auto-detect for Intel i7-11800H

Marius Hillenbrand [Tue, 26 Oct 2021 13:19:49 +0000 (15:19 +0200)]
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice

On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch both future-proof and easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).
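
A minimal sketch of the idea, assuming the mapping works as the message suggests: one function derives the kernel level from the hwcap word, so that the compile-time probe and the run-time dispatcher can share it. All names below are hypothetical and not copied from cpuid_zarch.h. The bit values are assumed to mirror glibc's `HWCAP_S390_VX` (2048) and `HWCAP_S390_VXE` (8192), and the rule that the Z13 level needs the vector facility while the Z14 level additionally needs the vector enhancements is likewise an assumption.

```c
#include <stdio.h>

/* Assumed bit values, mirroring glibc's HWCAP_S390_VX and HWCAP_S390_VXE. */
#define SKETCH_HWCAP_VX  2048UL
#define SKETCH_HWCAP_VXE 8192UL

/* Hypothetical kernel levels, ordered from least to most capable. */
enum zarch_level { LEVEL_GENERIC, LEVEL_Z13, LEVEL_Z14 };

/* Derive the kernel level from a hwcap word. Both the compile-time and
 * the run-time detection could call this one function. */
static enum zarch_level level_from_hwcap(unsigned long hwcap)
{
    if ((hwcap & SKETCH_HWCAP_VX) == 0)
        return LEVEL_GENERIC;   /* no vector facility at all */
    if (hwcap & SKETCH_HWCAP_VXE)
        return LEVEL_Z14;       /* vector facility plus enhancements */
    return LEVEL_Z13;           /* vector facility only */
}
```

Keeping the decision in a single pure function means a new facility bit only has to be handled in one place, which is the maintenance benefit the commit message describes.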

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>

Martin Kroeker [Mon, 25 Oct 2021 21:37:28 +0000 (23:37 +0200)]
Merge pull request #3422 from martin-frbg/issue3421

Revert invalid trsv shortcut from PR #3252

Martin Kroeker [Sun, 24 Oct 2021 21:57:06 +0000 (23:57 +0200)]
Revert #3252

Martin Kroeker [Wed, 20 Oct 2021 10:00:06 +0000 (12:00 +0200)]
Merge pull request #3420 from martin-frbg/issue3419

Revert wrong ZTRSV optimization from #3252

Martin Kroeker [Wed, 20 Oct 2021 08:50:02 +0000 (10:50 +0200)]
Remove dangerous optimization from previous #3252 - buffer is never unused here

Martin Kroeker [Wed, 20 Oct 2021 06:23:53 +0000 (08:23 +0200)]
Merge pull request #3418 from martin-frbg/issue2927-2

Enable SVE for A64FX

Martin Kroeker [Tue, 19 Oct 2021 21:23:40 +0000 (23:23 +0200)]
Enable SVE for A64FX

Martin Kroeker [Mon, 18 Oct 2021 13:00:19 +0000 (15:00 +0200)]
Add basic support for the Fujitsu A64FX (#3415)

* Add initial support for Fujitsu A64FX as generic ARMV8

Martin Kroeker [Mon, 18 Oct 2021 12:59:21 +0000 (14:59 +0200)]
Merge pull request #3416 from guowangy/spr-bf16

sbgemm: add AMX-BF16 based kernel for Sapphire Rapids

Wangyang Guo [Tue, 12 Oct 2021 08:18:37 +0000 (01:18 -0700)]
sbgemm: spr: disable small matrix path by default

Wangyang Guo [Thu, 23 Sep 2021 08:08:40 +0000 (01:08 -0700)]
sbgemm: spr: implement otcopy_16

Wangyang Guo [Sat, 18 Sep 2021 08:11:31 +0000 (01:11 -0700)]
sbgemm: spr: reuse ncopy_16 from cooperlake as incopy

Wangyang Guo [Sat, 18 Sep 2021 06:59:32 +0000 (23:59 -0700)]
sbgemm: spr: optimization for tmp_c buffer