Martin Kroeker [Fri, 26 Nov 2021 12:39:49 +0000 (13:39 +0100)]
Update alpine-chroot-install again
Martin Kroeker [Fri, 26 Nov 2021 09:30:47 +0000 (10:30 +0100)]
Merge pull request #3457 from wjc404/optimize-A53-dgemm
MOD: optimize zgemm on cortex-A53/cortex-A55
Martin Kroeker [Fri, 26 Nov 2021 09:29:28 +0000 (10:29 +0100)]
Merge pull request #3456 from martin-frbg/issue3444
Add/restore a GENERIC target for MIPS32 and support MIPS32 cross-compilation using CMAKE
Martin Kroeker [Fri, 26 Nov 2021 08:38:41 +0000 (09:38 +0100)]
AzureCI: Fetch alpine-chroot-install from master to get key updates (#3460)
* Fetch alpine-chroot-install from master to get key updates
Jia-Chen [Thu, 25 Nov 2021 14:48:48 +0000 (22:48 +0800)]
MOD: add comments to a53 zgemm kernel
Jia-Chen [Wed, 24 Nov 2021 13:51:45 +0000 (21:51 +0800)]
MOD: optimize zgemm on cortex-A53/cortex-A55
Martin Kroeker [Sat, 20 Nov 2021 22:54:48 +0000 (23:54 +0100)]
Fix unintended reversion of recent CortexA53 changes
Martin Kroeker [Sat, 20 Nov 2021 16:34:28 +0000 (17:34 +0100)]
Add CMAKE support for cross-compiling to MIPS32
Martin Kroeker [Sat, 20 Nov 2021 16:31:51 +0000 (17:31 +0100)]
Add generic mips32 target
Martin Kroeker [Sat, 20 Nov 2021 16:31:11 +0000 (17:31 +0100)]
Add generic MIPS32 target
Martin Kroeker [Thu, 18 Nov 2021 17:17:27 +0000 (18:17 +0100)]
Merge pull request #3451 from wjc404/optimize-A53-dgemm
MOD: optimize DGEMM of large matrices on cortex A53 & A55
Jia-Chen [Thu, 18 Nov 2021 13:14:43 +0000 (21:14 +0800)]
MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55
Martin Kroeker [Tue, 16 Nov 2021 22:58:09 +0000 (23:58 +0100)]
Merge pull request #3450 from mmuetzel/suffix-nofortran
cmake: Set SUFFIX64 also for NOFORTRAN
Markus Mützel [Mon, 15 Nov 2021 07:53:52 +0000 (08:53 +0100)]
cmake: Set SUFFIX64 also for NOFORTRAN
Martin Kroeker [Sun, 14 Nov 2021 11:01:53 +0000 (12:01 +0100)]
Merge pull request #3449 from martin-frbg/mips_msa
Fix MIPS/MIPS64 compilation querying compiler rather than cpu for MSA capability
Martin Kroeker [Sat, 13 Nov 2021 22:32:26 +0000 (23:32 +0100)]
Ignore compiler support for MIPS MSA if the cpu lacks this capability
Martin Kroeker [Sat, 13 Nov 2021 22:26:48 +0000 (23:26 +0100)]
MIPS P5600 and 24KC,1004K cpus do not support MSA
Martin Kroeker [Sat, 13 Nov 2021 22:25:34 +0000 (23:25 +0100)]
get MSA capability from feature flags
Martin Kroeker [Thu, 11 Nov 2021 08:29:36 +0000 (09:29 +0100)]
Merge pull request #3447 from martin-frbg/issue3446
Fix potentially wrong HOSTARCH definition in cross-compilation
Martin Kroeker [Wed, 10 Nov 2021 21:27:14 +0000 (22:27 +0100)]
Fix potentially wrong HOSTARCH definition in cross-compilation
Martin Kroeker [Fri, 5 Nov 2021 11:23:47 +0000 (12:23 +0100)]
Merge pull request #3443 from martin-frbg/issue3441
Fix NULL pointer checks in blas_memory_alloc
Martin Kroeker [Fri, 5 Nov 2021 09:43:17 +0000 (10:43 +0100)]
Fix NULL pointer checks in blas_memory_alloc
Martin Kroeker [Thu, 4 Nov 2021 22:48:02 +0000 (23:48 +0100)]
Merge pull request #3431 from MehdiChinoune/export-shared-only
Fix exported OpenBLASTargets.cmake
Martin Kroeker [Thu, 4 Nov 2021 22:47:11 +0000 (23:47 +0100)]
Merge pull request #3442 from martin-frbg/cpuid_x86
Add CPUID recognition of Intel Alder Lake
Martin Kroeker [Thu, 4 Nov 2021 19:36:39 +0000 (20:36 +0100)]
Add CPUIDs for Alder Lake and other recent Intel cpus
Martin Kroeker [Thu, 4 Nov 2021 19:35:41 +0000 (20:35 +0100)]
Add CPUIDs for Alder Lake and some other recent Intel cpus
Martin Kroeker [Thu, 4 Nov 2021 11:13:22 +0000 (12:13 +0100)]
Merge pull request #3429 from martin-frbg/issue3428
Adjust compiler options for nvc after 21.9 (and fix typo in DYNAMIC_ARCH settings)
Martin Kroeker [Thu, 4 Nov 2021 11:11:50 +0000 (12:11 +0100)]
Merge pull request #3440 from mhillenbrand/fix_gemv_indices
Fix flipped indices in benchmark for gemv
Martin Kroeker [Thu, 4 Nov 2021 11:11:16 +0000 (12:11 +0100)]
Fix miscounting of threadpool size on Linux with OMP_PROC_BIND=TRUE (#3437)
* return OMP places (if available, or SC_NPROCESSORS_CONF) for maximum thread count when built with OpenMP
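The fallback order described in this entry can be sketched as below. This is a minimal illustration, not the code from the pull request: `max_thread_count` and its `omp_places` parameter are hypothetical names, and the caller is assumed to obtain the place count from the OpenMP runtime (for example via `omp_get_num_places()`), passing 0 when it is unavailable.

```c
#include <unistd.h>

/* Hypothetical helper: pick the upper bound for the thread pool size.
 * omp_places is the number of places reported by the OpenMP runtime;
 * pass 0 (or a negative value) when that information is unavailable. */
long max_thread_count(int omp_places)
{
    if (omp_places > 0)
        return omp_places;

    /* Fall back to the number of configured processors. */
    long configured = sysconf(_SC_NPROCESSORS_CONF);

    /* sysconf() returns -1 on error; never report fewer than one thread. */
    return configured > 0 ? configured : 1;
}
```

Taking the place count as a parameter keeps the sketch buildable without OpenMP; a real build would query the runtime directly.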
Marius Hillenbrand [Wed, 3 Nov 2021 11:45:09 +0000 (12:45 +0100)]
Fix flipped indices in benchmark for gemv
Fixes #3439
Martin Kroeker [Mon, 1 Nov 2021 20:45:33 +0000 (21:45 +0100)]
Merge pull request #3427 from mhillenbrand/zarch-detection-notes
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Martin Kroeker [Mon, 1 Nov 2021 20:44:49 +0000 (21:44 +0100)]
Merge pull request #3434 from gxw-loongson/develop
Add cblas_{c/z}srot cblas_{c/z}rotg support
gxw [Mon, 1 Nov 2021 12:15:42 +0000 (20:15 +0800)]
Add cblas_{c/z}srot cblas_{c/z}rotg support
Martin Kroeker [Sat, 30 Oct 2021 15:31:19 +0000 (17:31 +0200)]
Fix nvidia HPC version checks
Mehdi Chinoune [Fri, 29 Oct 2021 20:28:21 +0000 (21:28 +0100)]
Fix exported OpenBLASTargets.cmake
When both BUILD_SHARED_LIBS and BUILD_STATIC_LIBS are enabled,
cmake exports both of them to OpenBLASTargets under the same name `OpenBLAS::OpenBLAS`,
which leads to a fatal error about OpenBLAS::OpenBLAS being both a static and a shared target.
This change makes cmake export only the shared library in that case.
There is another solution to treat them as components,
but I am afraid that will make it backward incompatible.
Martin Kroeker [Fri, 29 Oct 2021 14:39:03 +0000 (16:39 +0200)]
Adjust compiler options for nvidia hpc 21.9 (and fix a long-standing typo in dynamic_arch settings)
Marius Hillenbrand [Wed, 27 Oct 2021 15:26:28 +0000 (17:26 +0200)]
cpuid_zarch/hwcaps: add documentation and dump hwcaps in init
Add pointers to the definition of the hardware capability flags in glibc
and describe how they relate to the levels CPU_Z13 and CPU_Z14 for
optimized kernels.
To aid identifying available hardware capabilities and in debugging
potential build issues, dump their value in dynamic_arch_init() when
OPENBLAS_VERBOSE is set to 2 or higher.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
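The verbosity gate described in this entry might look roughly like the sketch below. It assumes a Linux system where `getauxval(AT_HWCAP)` from `<sys/auxv.h>` is available. The function names `should_dump_hwcaps` and `maybe_dump_hwcaps` are made up for illustration and are not taken from the OpenBLAS sources.

```c
#include <stdio.h>
#include <stdlib.h>
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Hypothetical helper: returns 1 when the verbosity setting
 * (the value of OPENBLAS_VERBOSE, or NULL if unset) is 2 or higher. */
int should_dump_hwcaps(const char *verbose_env)
{
    if (verbose_env == NULL)
        return 0;
    return atoi(verbose_env) >= 2;
}

/* Hypothetical helper: print the raw hwcap word to stderr when requested. */
void maybe_dump_hwcaps(void)
{
    if (!should_dump_hwcaps(getenv("OPENBLAS_VERBOSE")))
        return;
#if defined(__linux__)
    fprintf(stderr, "hwcap: 0x%lx\n", getauxval(AT_HWCAP));
#endif
}
```

Running a program with `OPENBLAS_VERBOSE=2` set in the environment would then print the hwcap word once `maybe_dump_hwcaps()` is called during initialization.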
Martin Kroeker [Thu, 28 Oct 2021 05:31:12 +0000 (07:31 +0200)]
Merge pull request #3426 from martin-frbg/pr3424
Add runtime DYNAMIC_ARCH cpu detection for Tiger Lake H
Martin Kroeker [Wed, 27 Oct 2021 20:17:58 +0000 (22:17 +0200)]
Add model number for Tiger Lake H (mobile variant)
Martin Kroeker [Wed, 27 Oct 2021 14:28:12 +0000 (16:28 +0200)]
Merge pull request #3424 from Neutron3529/patch-1
auto-detect for Intel i7-11800H
Martin Kroeker [Wed, 27 Oct 2021 14:27:47 +0000 (16:27 +0200)]
Merge pull request #3423 from mhillenbrand/fix-static-detection
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
Neutron3529 [Wed, 27 Oct 2021 06:16:37 +0000 (14:16 +0800)]
auto-detect for Intel i7-11800H
Marius Hillenbrand [Tue, 26 Oct 2021 13:19:49 +0000 (15:19 +0200)]
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice
On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch both future-proof and easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
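Deriving a kernel level from hwcap flags, as this entry describes, can be sketched as follows. All identifiers here are hypothetical, and the bit positions are assumptions for illustration (vector facility at bit 11, vector enhancements at bit 13). The authoritative flag definitions are the glibc ones that the related documentation commit points to.

```c
/* Illustrative levels; the entries above refer to them as CPU_Z13 / CPU_Z14. */
enum zarch_level { LEVEL_GENERIC, LEVEL_Z13, LEVEL_Z14 };

/* Assumed bit positions, for illustration only. */
#define EX_HWCAP_VX  (1UL << 11)  /* vector facility */
#define EX_HWCAP_VXE (1UL << 13)  /* vector enhancements facility */

/* Hypothetical helper: map a hwcap word to the highest usable level. */
enum zarch_level level_from_hwcap(unsigned long hwcap)
{
    /* Without the base vector facility only generic kernels apply. */
    if (!(hwcap & EX_HWCAP_VX))
        return LEVEL_GENERIC;

    /* The higher level additionally needs the vector enhancements. */
    if (hwcap & EX_HWCAP_VXE)
        return LEVEL_Z14;

    return LEVEL_Z13;
}
```

Because the mapping is a pure function of the hwcap word, the same routine could serve both the compile-time probe and the run-time dispatcher, which is the sharing this entry describes.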
Martin Kroeker [Mon, 25 Oct 2021 21:37:28 +0000 (23:37 +0200)]
Merge pull request #3422 from martin-frbg/issue3421
Revert invalid trsv shortcut from PR #3252
Martin Kroeker [Sun, 24 Oct 2021 21:57:06 +0000 (23:57 +0200)]
Revert #3252
Martin Kroeker [Wed, 20 Oct 2021 10:00:06 +0000 (12:00 +0200)]
Merge pull request #3420 from martin-frbg/issue3419
Revert wrong ZTRSV optimization from #3252
Martin Kroeker [Wed, 20 Oct 2021 08:50:02 +0000 (10:50 +0200)]
Remove dangerous optimization from previous #3252 - buffer is never unused here
Martin Kroeker [Wed, 20 Oct 2021 06:23:53 +0000 (08:23 +0200)]
Merge pull request #3418 from martin-frbg/issue2927-2
Enable SVE for A64FX
Martin Kroeker [Tue, 19 Oct 2021 21:23:40 +0000 (23:23 +0200)]
Enable SVE for A64FX
Martin Kroeker [Mon, 18 Oct 2021 13:00:19 +0000 (15:00 +0200)]
Add basic support for the Fujitsu A64FX (#3415)
* Add initial support for Fujitsu A64FX as generic ARMV8
Martin Kroeker [Mon, 18 Oct 2021 12:59:21 +0000 (14:59 +0200)]
Merge pull request #3416 from guowangy/spr-bf16
sbgemm: add AMX-BF16 based kernel for Sapphire Rapids
Wangyang Guo [Tue, 12 Oct 2021 08:18:37 +0000 (01:18 -0700)]
sbgemm: spr: disable small matrix path by default
Wangyang Guo [Thu, 23 Sep 2021 08:08:40 +0000 (01:08 -0700)]
sbgemm: spr: implement otcopy_16
Wangyang Guo [Sat, 18 Sep 2021 08:11:31 +0000 (01:11 -0700)]
sbgemm: spr: reuse ncopy_16 from cooperlake as incopy
Wangyang Guo [Sat, 18 Sep 2021 06:59:32 +0000 (23:59 -0700)]
sbgemm: spr: optimization for tmp_c buffer
Wangyang Guo [Fri, 17 Sep 2021 07:48:52 +0000 (00:48 -0700)]
sbgemm: spr: kernel handle alpha != 1.0
Wangyang Guo [Fri, 17 Sep 2021 03:08:42 +0000 (20:08 -0700)]
sbgemm: spr: oncopy: use tile load/store instead
Wangyang Guo [Thu, 16 Sep 2021 08:04:01 +0000 (01:04 -0700)]
sbgemm: spr: only load A once in tail_k handling
Wangyang Guo [Thu, 16 Sep 2021 06:59:38 +0000 (23:59 -0700)]
sbgemm: spr: process k2 and odd k at the same time
Wangyang Guo [Thu, 16 Sep 2021 03:29:49 +0000 (20:29 -0700)]
sbgemm: spr: enlarge P to 256 for performance
Wangyang Guo [Thu, 16 Sep 2021 02:36:02 +0000 (19:36 -0700)]
sbgemm: spr: oncopy: avoid handling too many pointers at a time
Wangyang Guo [Wed, 15 Sep 2021 08:11:15 +0000 (01:11 -0700)]
sbgemm: spr: reduce tile conf loading by separate tail k handling
Wangyang Guo [Mon, 13 Sep 2021 08:44:53 +0000 (01:44 -0700)]
sbgemm: spr: tuning for blocking params
Wangyang Guo [Mon, 13 Sep 2021 02:22:58 +0000 (19:22 -0700)]
sbgemm: spr: kernel works for NN case when alpha is 1.0
Wangyang Guo [Fri, 10 Sep 2021 08:14:05 +0000 (01:14 -0700)]
sbgemm: spr: kernel works for m32 in NN case
Wangyang Guo [Thu, 9 Sep 2021 02:41:12 +0000 (19:41 -0700)]
sbgemm: spr: implement oncopy_16
Wangyang Guo [Tue, 7 Sep 2021 02:48:23 +0000 (19:48 -0700)]
sbgemm: spr: add dummy source files
Martin Kroeker [Sun, 17 Oct 2021 22:26:14 +0000 (00:26 +0200)]
Add march/mtune flags for clang builds on ARM64 as well (#3414)
* Add march/mtune flags for clang as well
Martin Kroeker [Sun, 17 Oct 2021 21:05:11 +0000 (23:05 +0200)]
Merge pull request #3404 from guowangy/spr-build
Initial build support for Sapphire Rapids
Martin Kroeker [Sun, 17 Oct 2021 20:46:48 +0000 (22:46 +0200)]
Merge pull request #3413 from MehdiChinoune/cmake-readibiltiy
[NFC] Improve CMakeLists.txt file readability
Mehdi Chinoune [Sun, 17 Oct 2021 04:19:30 +0000 (05:19 +0100)]
[NFC] Improve CMakeLists.txt file readability
Add some extra lines and indentations.
Martin Kroeker [Sun, 17 Oct 2021 18:07:14 +0000 (20:07 +0200)]
Merge pull request #3411 from MehdiChinoune/both_shared_static
Support building both static and shared libraries
Mehdi Chinoune [Sat, 16 Oct 2021 07:33:47 +0000 (08:33 +0100)]
Support building both static and shared libraries
Martin Kroeker [Sat, 16 Oct 2021 11:52:41 +0000 (13:52 +0200)]
Merge pull request #3410 from MehdiChinoune/mingw-clang-64
Fix MinGW/Clang 64 bits detection.
مهدي شينون (Mehdi Chinoune) [Sat, 16 Oct 2021 06:55:10 +0000 (07:55 +0100)]
Fix MinGW/Clang 64 bits detection.
CMAKE_COMPILER_IS_GNUCC is only valid for GCC.
Wangyang Guo [Tue, 12 Oct 2021 09:01:20 +0000 (02:01 -0700)]
Fix build error in legacy gcc
Wangyang Guo [Tue, 12 Oct 2021 08:39:09 +0000 (01:39 -0700)]
Add NO_AVX=1 fallbacks to Sapphire Rapids build
Wangyang Guo [Fri, 3 Sep 2021 07:39:50 +0000 (00:39 -0700)]
initial support for Sapphire Rapids platform
Martin Kroeker [Sun, 10 Oct 2021 21:24:52 +0000 (23:24 +0200)]
Update conda in Appveyor CI and move jobs from Appveyor to Azure (#3400)
* Fix clang/cl builds on Appveyor and move them to Azure
* Add clang/flang and mingw builds on Windows to Azure
Martin Kroeker [Thu, 7 Oct 2021 06:09:34 +0000 (08:09 +0200)]
Merge pull request #3399 from martin-frbg/issue2814
Improve performance on Apple M1 Vortex
Martin Kroeker [Wed, 6 Oct 2021 09:10:19 +0000 (11:10 +0200)]
Use "big arm server" GEMM defaults for Vortex
Martin Kroeker [Wed, 6 Oct 2021 09:06:43 +0000 (11:06 +0200)]
Use Neoverse's current mix of ThunderX2 kernels for Vortex as well
Martin Kroeker [Tue, 5 Oct 2021 16:59:47 +0000 (18:59 +0200)]
Merge pull request #3398 from kavanabhat/aix_p10_gnuas
Big Endian Changes for Power10 kernels
Martin Kroeker [Mon, 4 Oct 2021 20:23:19 +0000 (22:23 +0200)]
Merge pull request #3396 from martin-frbg/makesys_typo
Fix minor typo in Makefile.system
Martin Kroeker [Mon, 4 Oct 2021 20:22:57 +0000 (22:22 +0200)]
Merge pull request #3397 from martin-frbg/m1detect
Fix detection of Apple M1 "Vortex"
Martin Kroeker [Mon, 4 Oct 2021 15:58:29 +0000 (17:58 +0200)]
Fix cache reporting for Apple M1
Martin Kroeker [Mon, 4 Oct 2021 14:46:41 +0000 (16:46 +0200)]
Fix detection of Apple M1 "Vortex"
Martin Kroeker [Mon, 4 Oct 2021 14:14:32 +0000 (16:14 +0200)]
Fix minor typo
Martin Kroeker [Sat, 2 Oct 2021 19:15:39 +0000 (21:15 +0200)]
Update version to 0.3.18.dev
Martin Kroeker [Sat, 2 Oct 2021 19:15:00 +0000 (21:15 +0200)]
Update version to 0.3.18.dev
Martin Kroeker [Sat, 2 Oct 2021 19:14:11 +0000 (21:14 +0200)]
Merge pull request #3395 from xianyi/release-0.3.0
Merge 0.3.18 back into develop to copy tag
Martin Kroeker [Sat, 2 Oct 2021 17:38:09 +0000 (19:38 +0200)]
Update version to 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:35:27 +0000 (19:35 +0200)]
Merge pull request #3394 from xianyi/develop
Merge from develop for 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:35:03 +0000 (19:35 +0200)]
Merge branch 'release-0.3.0' into develop
Martin Kroeker [Sat, 2 Oct 2021 17:29:59 +0000 (19:29 +0200)]
Update version to 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:25:58 +0000 (19:25 +0200)]
Update Changelog for 0.3.18 (#3388)
* Update Changelog for 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:07:04 +0000 (19:07 +0200)]
Merge pull request #3393 from martin-frbg/azurealpine
Update Alpine version used in Azure CI
Martin Kroeker [Sat, 2 Oct 2021 14:27:34 +0000 (16:27 +0200)]
Update Alpine version
Martin Kroeker [Fri, 1 Oct 2021 12:56:09 +0000 (14:56 +0200)]
Merge pull request #3392 from martin-frbg/lapack625
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
kavanabhat [Fri, 1 Oct 2021 10:18:35 +0000 (05:18 -0500)]
AIX changes for P10 with GNU Compiler