Marek Pikuła [Mon, 14 Nov 2022 11:10:23 +0000 (12:10 +0100)]
riscv64: Add RISC-V target
Change-Id: I58a8241b6f91946adb2a9506217928f31cf6d34e
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Marek Pikuła [Mon, 14 Nov 2022 11:04:04 +0000 (12:04 +0100)]
Upgrade to version 0.3.21
Change-Id: I47149364b077965d2dbb3dd39d31187704b1866d
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Marek Pikuła [Mon, 14 Nov 2022 11:00:13 +0000 (12:00 +0100)]
Merge tag 'v0.3.21' into tizen_riscv_base
Change-Id: I01b8b8b366852eb996598f79bd9a93976cbfa8e0
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Martin Kroeker [Sun, 7 Aug 2022 20:36:26 +0000 (22:36 +0200)]
Update version to 0.3.21
Martin Kroeker [Sun, 7 Aug 2022 20:35:20 +0000 (22:35 +0200)]
Merge pull request #3717 from xianyi/develop
Update from develop for 0.3.21 release
Martin Kroeker [Sun, 7 Aug 2022 20:32:11 +0000 (22:32 +0200)]
Update version to 0.3.21
Martin Kroeker [Sun, 7 Aug 2022 20:30:58 +0000 (22:30 +0200)]
Merge pull request #3716 from martin-frbg/0321changes
Update Changelog for 0.3.21
Martin Kroeker [Sun, 7 Aug 2022 20:21:23 +0000 (22:21 +0200)]
Update with 0.3.21 changes
Martin Kroeker [Sun, 7 Aug 2022 06:45:06 +0000 (08:45 +0200)]
Merge pull request #3715 from martin-frbg/issue3648
Increase thresholds for STFSM and CTFSM in the LAPACK testsuite
Martin Kroeker [Sat, 6 Aug 2022 22:03:50 +0000 (00:03 +0200)]
Increase threshold
Martin Kroeker [Sat, 6 Aug 2022 22:03:20 +0000 (00:03 +0200)]
Increase threshold
Martin Kroeker [Sat, 6 Aug 2022 12:31:56 +0000 (14:31 +0200)]
Merge pull request #3609 from martin-frbg/lapack3101
Update LAPACK/LAPACKE to Reference-LAPACK 3.10.1
Martin Kroeker [Sat, 6 Aug 2022 07:29:09 +0000 (09:29 +0200)]
resync gensymbol with develop
Martin Kroeker [Thu, 4 Aug 2022 21:58:21 +0000 (23:58 +0200)]
Merge pull request #3714 from martin-frbg/crosscmake
Add more x86_64 target definitions for CMAKE cross-compiling
Martin Kroeker [Thu, 4 Aug 2022 18:42:18 +0000 (20:42 +0200)]
Support compilation with the Cray C and Fortran compilers (#3712)
* Add support for the Cray Fortran compiler
Martin Kroeker [Thu, 4 Aug 2022 17:18:32 +0000 (19:18 +0200)]
Add more x86_64 target definitions for cross-compiling
Martin Kroeker [Wed, 3 Aug 2022 13:43:27 +0000 (15:43 +0200)]
Merge pull request #3709 from nursik/develop
Add TCORE Generic
Martin Kroeker [Wed, 3 Aug 2022 13:38:39 +0000 (15:38 +0200)]
Merge pull request #3703 from martin-frbg/omp_adaptive
Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour
Martin Kroeker [Wed, 3 Aug 2022 13:38:14 +0000 (15:38 +0200)]
Merge pull request #3693 from Mayank-Raj3/Mayank-Raj3-patch-1
Correct indentation of for and if statements in dgemv_thread_safety.cpp
Nursultan Zarlyk [Tue, 2 Aug 2022 08:50:58 +0000 (10:50 +0200)]
Add TCORE Generic in prebuild.cmake
When cross-compiling for ARMv8 with MSVC on an x64 host, the build
fails because there are no define directives for the Generic core.
Martin Kroeker [Sun, 31 Jul 2022 08:13:38 +0000 (10:13 +0200)]
Merge pull request #3707 from martin-frbg/getarch_risc
Fix crash in RISCV autodetection when pmodel is not present in /proc/cpuinfo
Martin Kroeker [Sat, 30 Jul 2022 22:41:04 +0000 (00:41 +0200)]
Really fix compilation; fix crash when pmodel is not present in cpuinfo
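This commit guards against a field (pmodel) being absent from /proc/cpuinfo. The sketch below shows defensive lookup of a "key : value" field in a cpuinfo-style buffer, where a missing key yields a failure code the caller can act on (for example by falling back to a generic target). `cpuinfo_field()` is a hypothetical helper written for illustration; it is not the OpenBLAS getarch code.

```c
#include <string.h>

/* Hypothetical helper (not OpenBLAS code): copy the value of the
 * "key : value" line in a /proc/cpuinfo-style buffer into out.
 * Returns 1 on success and 0 if the key is absent or malformed,
 * so the caller never dereferences a NULL search result.
 * Uses a simple substring match; a real parser would anchor the
 * key at the start of a line. */
static int cpuinfo_field(const char *buf, const char *key,
                         char *out, size_t outlen)
{
    const char *p = strstr(buf, key);
    if (p == NULL || outlen == 0)
        return 0;                      /* field missing: report, don't crash */
    p = strchr(p, ':');
    if (p == NULL)
        return 0;                      /* no separator on the line */
    p++;
    while (*p == ' ' || *p == '\t')    /* skip whitespace after ':' */
        p++;
    size_t n = strcspn(p, "\n");       /* value runs to end of line */
    if (n >= outlen)
        n = outlen - 1;                /* truncate to fit the buffer */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

For example, looking up "isa" in a buffer containing `isa : rv64imafdc` yields "rv64imafdc", while looking up "pmodel" in a buffer that lacks it returns 0.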
Martin Kroeker [Sat, 30 Jul 2022 12:11:45 +0000 (14:11 +0200)]
Merge pull request #3706 from martin-frbg/czifunding
Acknowledge past CZI EOSS 1/EOSS 3 funding
Martin Kroeker [Sat, 30 Jul 2022 10:34:09 +0000 (12:34 +0200)]
Acknowledge past CZI EOSS 1/EOSS 3 funding
Martin Kroeker [Thu, 28 Jul 2022 18:31:20 +0000 (20:31 +0200)]
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
LoongArch64: Add DYNAMIC_ARCH support
Martin Kroeker [Thu, 28 Jul 2022 16:38:14 +0000 (18:38 +0200)]
Merge pull request #3705 from RajalakshmiSR/bf16ppc
POWER: Enable bfloat16 kernels by default
Rajalakshmi Srinivasaraghavan [Thu, 28 Jul 2022 12:43:53 +0000 (07:43 -0500)]
POWER: Enable bfloat16 kernels by default
This patch enables bfloat16 kernels by default for POWER processors.
Tested on Linux POWER8, POWER9, POWER10 and AIX POWER10 systems.
gxw [Thu, 28 Jul 2022 05:47:20 +0000 (13:47 +0800)]
LoongArch64: Add DYNAMIC_ARCH support
Martin Kroeker [Wed, 27 Jul 2022 21:43:20 +0000 (23:43 +0200)]
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
Martin Kroeker [Wed, 27 Jul 2022 21:41:47 +0000 (23:41 +0200)]
Add environment variable OMP_ADAPTIVE
Martin Kroeker [Wed, 27 Jul 2022 18:57:50 +0000 (20:57 +0200)]
Merge pull request #3702 from martin-frbg/issue3687
Add openblas_getaffinity() extension (Linux-only)
Martin Kroeker [Wed, 27 Jul 2022 17:15:18 +0000 (19:15 +0200)]
add openblas_getaffinity()
Martin Kroeker [Wed, 27 Jul 2022 17:14:36 +0000 (19:14 +0200)]
add openblas_getaffinity()
Martin Kroeker [Wed, 27 Jul 2022 07:17:43 +0000 (09:17 +0200)]
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)
* fix detection of Neoverse V1 and user-enforced selection of N2
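ARM64 core detection of this kind typically keys off the MIDR_EL1 register. The sketch below decodes the implementer field (bits [31:24]) and primary part number (bits [15:4]) and maps the Arm Ltd. (implementer 0x41) part numbers for Neoverse N1 (0xd0c), V1 (0xd40) and N2 (0xd49) to target-style names. It illustrates the decoding only and is not the OpenBLAS DYNAMIC_ARCH detection code; `neoverse_name()` is a hypothetical function.

```c
#include <stdint.h>

/* Sketch: classify an Arm Ltd. Neoverse core from a MIDR_EL1 value.
 * Implementer is bits [31:24]; primary part number is bits [15:4]. */
static const char *neoverse_name(uint32_t midr)
{
    uint32_t implementer = (midr >> 24) & 0xffu;
    uint32_t part        = (midr >> 4) & 0xfffu;

    if (implementer != 0x41u)        /* not an Arm Ltd. design */
        return "unknown";
    switch (part) {
    case 0xd0c: return "NEOVERSEN1";
    case 0xd40: return "NEOVERSEV1";
    case 0xd49: return "NEOVERSEN2";
    default:    return "unknown";
    }
}
```

Because the part number, not the variant or revision fields, identifies the core, a V1 at any revision maps to the same name.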
Martin Kroeker [Tue, 26 Jul 2022 18:06:26 +0000 (20:06 +0200)]
Merge pull request #3684 from imzhuhl/neoversen2_dynamic_arch
Neoverse N2: DYNAMIC_ARCH
Martin Kroeker [Tue, 26 Jul 2022 14:36:43 +0000 (16:36 +0200)]
Merge pull request #3699 from martin-frbg/issue3692
Add c_check recognition of Fujitsu fcc for Fugaku A64FX
Martin Kroeker [Tue, 26 Jul 2022 11:55:41 +0000 (13:55 +0200)]
Merge pull request #3696 from XiWeiGu/loongson2k1000
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
Martin Kroeker [Mon, 25 Jul 2022 19:59:03 +0000 (21:59 +0200)]
typo fix
Martin Kroeker [Mon, 25 Jul 2022 18:25:23 +0000 (20:25 +0200)]
Merge pull request #3698 from martin-frbg/issue3697
utest needs to be linked against libm on QNX as well
Martin Kroeker [Mon, 25 Jul 2022 17:48:59 +0000 (19:48 +0200)]
Treat Fujitsu fcc on Fugaku like clang
Martin Kroeker [Mon, 25 Jul 2022 17:42:59 +0000 (19:42 +0200)]
Add Fujitsu compiler
Martin Kroeker [Mon, 25 Jul 2022 17:39:17 +0000 (19:39 +0200)]
Add Fujitsu compiler (fcc)
Martin Kroeker [Mon, 25 Jul 2022 17:34:16 +0000 (19:34 +0200)]
Add Fujitsu compiler
Martin Kroeker [Mon, 25 Jul 2022 15:02:16 +0000 (17:02 +0200)]
utest needs to be linked against libm on QNX as well
Martin Kroeker [Mon, 25 Jul 2022 13:41:15 +0000 (15:41 +0200)]
Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
gxw [Fri, 22 Jul 2022 09:23:43 +0000 (17:23 +0800)]
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
Martin Kroeker [Mon, 25 Jul 2022 04:14:30 +0000 (06:14 +0200)]
Merge pull request #3695 from martin-frbg/ppc6nrm2
PPC6: Fix DNRM2 returning INF instead of zero due to intermediate overflow
Martin Kroeker [Sun, 24 Jul 2022 20:13:08 +0000 (22:13 +0200)]
Merge pull request #3694 from martin-frbg/traviswait
Add back travis_wait to keep ppc jobs from getting cancelled
Martin Kroeker [Sun, 24 Jul 2022 15:42:31 +0000 (17:42 +0200)]
Fix DNRM2 returning INF instead of zero due to intermediate overflow
Martin Kroeker [Sun, 24 Jul 2022 14:44:16 +0000 (16:44 +0200)]
Add back travis_wait to keep ppc jobs from getting cancelled
Mayank Raj [Sun, 24 Jul 2022 06:21:25 +0000 (11:51 +0530)]
Update dgemv_thread_safety.cpp
Martin Kroeker [Tue, 19 Jul 2022 11:59:16 +0000 (13:59 +0200)]
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
Martin Kroeker [Tue, 19 Jul 2022 08:25:01 +0000 (10:25 +0200)]
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
Martin Kroeker [Tue, 19 Jul 2022 08:24:28 +0000 (10:24 +0200)]
Merge pull request #3682 from XiWeiGu/develop
Fix dnrm2_tiny testcase failure
Martin Kroeker [Tue, 19 Jul 2022 08:19:27 +0000 (10:19 +0200)]
fix DNRM2 returning INF instead of zero due to intermediate overflow
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 19:48:43 +0000 (14:48 -0500)]
POWER: Fix complex dot function failures
There are some test failures in complex dot functions when compiling with gcc12.
The machine constraints currently used do not update all four elements of the
expected result array. Fix this by reducing the optimization level.
This does not change any performance numbers; the code will be converted to C in the future.
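The commit mentions a four-element result array and a future conversion to C. A complex dot product can be accumulated in four real partial sums (re*re, im*im, re*im, im*re), from which both the unconjugated and conjugated results follow. The plain-C sketch below illustrates that arithmetic under the assumption of interleaved storage (x[2k] real, x[2k+1] imaginary); it is not the OpenBLAS POWER kernel, and `zdot_sketch()` and `zpair` are hypothetical names.

```c
#include <stddef.h>

typedef struct { double re, im; } zpair;

/* Double-complex dot products from four real partial sums.
 * dotu = sum x[k] * y[k];  dotc = sum conj(x[k]) * y[k]. */
static void zdot_sketch(size_t n, const double *x, const double *y,
                        zpair *dotu, zpair *dotc)
{
    double rr = 0.0, ii = 0.0, ri = 0.0, ir = 0.0;
    for (size_t k = 0; k < n; k++) {
        double xr = x[2*k], xi = x[2*k + 1];
        double yr = y[2*k], yi = y[2*k + 1];
        rr += xr * yr;
        ii += xi * yi;
        ri += xr * yi;
        ir += xi * yr;
    }
    /* All four partial sums must be complete before combining them. */
    dotu->re = rr - ii;  dotu->im = ri + ir;
    dotc->re = rr + ii;  dotc->im = ri - ir;
}
```

For x = (1+2i, 3+4i) and y = (5+6i, 7+8i), this gives dotu = -18+68i and dotc = 70-8i; a result array with any of the four sums left stale would produce wrong values for both.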
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 14:48:01 +0000 (09:48 -0500)]
POWER10: dgemv builtin rename
Add a check to use the correct builtin name with older
versions of gcc 10.
gxw [Fri, 15 Jul 2022 03:18:59 +0000 (11:18 +0800)]
LoongArch64: Fix dnrm2_tiny testcase failure
Martin Kroeker [Wed, 13 Jul 2022 06:24:15 +0000 (08:24 +0200)]
Merge pull request #3686 from martin-frbg/issue3685
Fix Fortran-less CTEST build option
Martin Kroeker [Tue, 12 Jul 2022 17:37:30 +0000 (19:37 +0200)]
Fix function prototypes and INTERFACE64 support
Martin Kroeker [Tue, 12 Jul 2022 17:35:31 +0000 (19:35 +0200)]
Fix switching between Fortran and C build
Honglin Zhu [Mon, 11 Jul 2022 16:40:22 +0000 (00:40 +0800)]
Neoverse N2: DYNAMIC_ARCH
gxw [Thu, 7 Jul 2022 12:39:01 +0000 (20:39 +0800)]
MIPS64: Fix dnrm2_tiny testcase failure
Martin Kroeker [Thu, 7 Jul 2022 09:38:24 +0000 (11:38 +0200)]
Merge pull request #3680 from martin-frbg/issue3636-2
Guard against sysconf(_SC_NPROCESSORS_CONF) returning zero at runtime
Martin Kroeker [Wed, 6 Jul 2022 15:22:18 +0000 (17:22 +0200)]
Guard against sysconf returning zero processors
Martin Kroeker [Wed, 6 Jul 2022 15:21:10 +0000 (17:21 +0200)]
Guard against system call returning zero processors
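The guard described in these commits amounts to clamping the processor count before it is used, for example to size per-thread arrays or as a divisor. A minimal sketch, assuming a platform that provides `_SC_NPROCESSORS_CONF` (a widely supported extension, not part of base POSIX); `configured_cpus()` is a hypothetical name, and this is not the OpenBLAS patch itself.

```c
#include <unistd.h>

/* sysconf() returns -1 on error, and per the issue above may also
 * report 0; never let either value reach thread-count arithmetic. */
static int configured_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    if (n < 1)
        n = 1;   /* treat "unknown" or "zero" as a single CPU */
    return (int)n;
}
```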
Martin Kroeker [Tue, 5 Jul 2022 08:40:32 +0000 (10:40 +0200)]
Merge pull request #3678 from martin-frbg/issue3677
Eliminate uses of CREAL on left-hand side of assignments
Martin Kroeker [Mon, 4 Jul 2022 22:01:09 +0000 (00:01 +0200)]
Eliminate uses of CREAL on left-hand side of assignments
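In standard C, `creal()` yields a value rather than an lvalue, so `creal(z) = x` is not valid. One portable alternative relies on the C11 guarantee that a complex type has the same representation and alignment as a two-element array of its real type (real part first). The sketch below uses that idiom; `set_real()` and `set_imag()` are hypothetical helpers, and the sketch does not reproduce how OpenBLAS defines or replaced its CREAL macro.

```c
#include <complex.h>

/* Write the parts of a double complex through its array
 * representation instead of assigning to creal()/cimag(). */
static void set_real(double complex *z, double re) { ((double *)z)[0] = re; }
static void set_imag(double complex *z, double im) { ((double *)z)[1] = im; }
```

Where a fresh value is wanted instead of an in-place update, C11 also offers the `CMPLX(re, im)` macro, which avoids the pitfalls of `re + im * I` with infinities.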
Martin Kroeker [Mon, 4 Jul 2022 06:37:18 +0000 (08:37 +0200)]
Merge pull request #3676 from martin-frbg/dnrm2-utest
Add DNRM2 regression test for issues 2998 and 3654
Martin Kroeker [Sun, 3 Jul 2022 21:48:30 +0000 (23:48 +0200)]
properly embed test_dnrm2
Martin Kroeker [Sun, 3 Jul 2022 18:19:24 +0000 (20:19 +0200)]
use HUGE_VAL, not HUGE_VALF, for portability
Martin Kroeker [Sun, 3 Jul 2022 16:23:51 +0000 (18:23 +0200)]
old systems may not have inf in math.h
Martin Kroeker [Sun, 3 Jul 2022 15:56:49 +0000 (17:56 +0200)]
Add DNRM2 regression test for issues 2998 and 3654
Martin Kroeker [Sun, 3 Jul 2022 06:45:45 +0000 (08:45 +0200)]
Merge pull request #3675 from martin-frbg/issue3654
work around ThunderX2 DNRM2 fault with ssq=inf,scale=0
Martin Kroeker [Sat, 2 Jul 2022 21:47:17 +0000 (23:47 +0200)]
work around fault with ssq=inf,scale=0
Martin Kroeker [Fri, 1 Jul 2022 10:13:42 +0000 (12:13 +0200)]
Merge pull request #3672 from imzhuhl/neoversen2_bf16
sbgemm support for ARM Neoverse N2
Martin Kroeker [Wed, 29 Jun 2022 06:31:04 +0000 (08:31 +0200)]
Merge pull request #3670 from martin-frbg/osxvermin
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
Honglin Zhu [Wed, 29 Jun 2022 02:08:06 +0000 (10:08 +0800)]
Add gfortran parameters
Honglin Zhu [Wed, 22 Jun 2022 15:00:40 +0000 (23:00 +0800)]
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
Honglin Zhu [Thu, 16 Jun 2022 11:36:22 +0000 (19:36 +0800)]
format code
Honglin Zhu [Wed, 15 Jun 2022 06:20:25 +0000 (14:20 +0800)]
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
Honglin Zhu [Mon, 13 Jun 2022 09:05:43 +0000 (17:05 +0800)]
neoverse n2 sbgemm: init file
Martin Kroeker [Tue, 28 Jun 2022 21:13:11 +0000 (23:13 +0200)]
Merge pull request #3673 from martin-frbg/azuredynmingw
AzureCI: drop cpus from the DYNAMIC_LIST for Windows/mingw to save time
Martin Kroeker [Tue, 28 Jun 2022 19:40:04 +0000 (21:40 +0200)]
mingw-dynamic arch: drop Haswell too
Martin Kroeker [Tue, 28 Jun 2022 18:12:11 +0000 (20:12 +0200)]
drop NEHALEM from the DYNLIST for Windows/mingw to save time
Martin Kroeker [Tue, 28 Jun 2022 14:09:36 +0000 (16:09 +0200)]
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
Martin Kroeker [Tue, 28 Jun 2022 14:05:11 +0000 (16:05 +0200)]
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
Martin Kroeker [Tue, 28 Jun 2022 09:52:48 +0000 (11:52 +0200)]
Add C versions of the CBLAS test sources (#3656)
* Add C conversions of the CBLAS tests for NOFORTRAN=1 builds
* Enable CTEST without Fortran and fix passing of BUILD_vartype options to exports/gensymbol
Martin Kroeker [Tue, 28 Jun 2022 09:46:25 +0000 (11:46 +0200)]
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
VFerrari [Sat, 25 Jun 2022 06:28:23 +0000 (03:28 -0300)]
Power: Enable SMALL_MATRIX OPT as default for dynamic arch
VFerrari [Sat, 25 Jun 2022 06:21:18 +0000 (03:21 -0300)]
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
i.e. when SMP is defined, so its use must be guarded by that check.
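The fix described above is a preprocessor guard: code that queries the thread pool must compile away when SMP is not defined. The sketch assumes, as the commit message states, that SMP is defined only for USE_THREAD=1 builds. `num_cpu_avail_stub()` is a hypothetical stand-in for OpenBLAS's `num_cpu_avail`, and `threads_for_small_kernel()` is likewise an illustrative name, not OpenBLAS code.

```c
#ifdef SMP
/* Placeholder for the real thread-pool query in threaded builds. */
static int num_cpu_avail_stub(void) { return 4; }
#endif

/* Decide how many threads a kernel may use, without referencing
 * thread-pool functions in single-threaded builds. */
static int threads_for_small_kernel(void)
{
#ifdef SMP
    return num_cpu_avail_stub();   /* USE_THREAD=1: ask the pool */
#else
    return 1;                      /* USE_THREAD=0: no thread pool exists */
#endif
}
```

Without the guard, a USE_THREAD=0 build would fail to compile or link because the thread-pool function is never declared.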
Martin Kroeker [Sat, 18 Jun 2022 18:52:26 +0000 (20:52 +0200)]
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
Martin Kroeker [Sat, 18 Jun 2022 18:51:59 +0000 (20:51 +0200)]
Merge pull request #3653 from RajalakshmiSR/dgemvp10
POWER10: convert dgemv inline assembly
Rajalakshmi Srinivasaraghavan [Fri, 17 Jun 2022 13:18:08 +0000 (08:18 -0500)]
POWER10: Fix ZGEMM testcase failures
This patch fixes the storing and restoring of non-volatile registers
in the zgemm POWER10 kernel.
Martin Kroeker [Fri, 10 Jun 2022 06:58:00 +0000 (08:58 +0200)]
Merge pull request #3647 from martin-frbg/exports_3.10.0
Amend gensymbol with some LAPACK 3.10.0 additions
Martin Kroeker [Thu, 9 Jun 2022 17:31:08 +0000 (19:31 +0200)]
Amend some LAPACK 3.10.0 additions
Nursultan Zarlyk [Thu, 9 Jun 2022 16:49:49 +0000 (18:49 +0200)]
Replace with ARM64 intrinsics
Rajalakshmi Srinivasaraghavan [Thu, 9 Jun 2022 15:42:57 +0000 (10:42 -0500)]
POWER10: convert dgemv inline assembly
This patch uses compiler builtins and matches the performance of the
assembly version. Tested with clang14 and gcc12.
Martin Kroeker [Wed, 8 Jun 2022 17:29:07 +0000 (19:29 +0200)]
Merge pull request #3645 from martin-frbg/issue3644
Fix quotes around compiler args in C11 check
Martin Kroeker [Wed, 8 Jun 2022 09:22:20 +0000 (11:22 +0200)]
Fix quotes around compiler args in C11 check