platform/upstream/openblas.git
5 months ago Update Makefile.system 35/312435/1 accepted/tizen_9.0_unified accepted/tizen_unified accepted/tizen_unified_dev accepted/tizen_unified_toolchain accepted/tizen_unified_x accepted/tizen_unified_x_asan tizen tizen_9.0 accepted/tizen/9.0/unified/20241030.232516 accepted/tizen/unified/20240611.123207 accepted/tizen/unified/dev/20240620.010958 accepted/tizen/unified/toolchain/20240610.172756 accepted/tizen/unified/x/20240610.223337 accepted/tizen/unified/x/asan/20240625.092441 tizen_9.0_m2_release
Donghak PARK [Mon, 10 Jun 2024 06:28:34 +0000 (15:28 +0900)]
Update Makefile.system

Update Makefile.system file for gcc-14 update

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Change-Id: I70088470cb0a8ebf26d0267c89e77e02a829d1e2
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
23 months ago riscv64: Add RISC-V target 44/284644/1 accepted/tizen_8.0_unified tizen_8.0 accepted/tizen/8.0/unified/20231005.095258 accepted/tizen/unified/20221209.124048 tizen_8.0_m2_release
Marek Pikuła [Mon, 14 Nov 2022 11:10:23 +0000 (12:10 +0100)]
riscv64: Add RISC-V target

Change-Id: I58a8241b6f91946adb2a9506217928f31cf6d34e
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
23 months ago Upgrade to version 0.3.21 43/284643/1
Marek Pikuła [Mon, 14 Nov 2022 11:04:04 +0000 (12:04 +0100)]
Upgrade to version 0.3.21

Change-Id: I47149364b077965d2dbb3dd39d31187704b1866d
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
23 months ago Merge tag 'v0.3.21' into tizen_riscv_base 42/284642/1
Marek Pikuła [Mon, 14 Nov 2022 11:00:13 +0000 (12:00 +0100)]
Merge tag 'v0.3.21' into tizen_riscv_base

Change-Id: I01b8b8b366852eb996598f79bd9a93976cbfa8e0
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
2 years ago Update version to 0.3.21 sandbox/lstelmach/upstream-0.3.21 upstream upstream/0.3.21
Martin Kroeker [Sun, 7 Aug 2022 20:36:26 +0000 (22:36 +0200)]
Update version to 0.3.21

2 years ago Merge pull request #3717 from xianyi/develop
Martin Kroeker [Sun, 7 Aug 2022 20:35:20 +0000 (22:35 +0200)]
Merge pull request #3717 from xianyi/develop

Update from develop for 0.3.21 release

2 years ago Update version to 0.3.21
Martin Kroeker [Sun, 7 Aug 2022 20:32:11 +0000 (22:32 +0200)]
Update version to 0.3.21

2 years ago Merge pull request #3716 from martin-frbg/0321changes
Martin Kroeker [Sun, 7 Aug 2022 20:30:58 +0000 (22:30 +0200)]
Merge pull request #3716 from martin-frbg/0321changes

Update Changelog for 0.3.21

2 years ago Update with 0.3.21 changes
Martin Kroeker [Sun, 7 Aug 2022 20:21:23 +0000 (22:21 +0200)]
Update with 0.3.21 changes

2 years ago Merge pull request #3715 from martin-frbg/issue3648
Martin Kroeker [Sun, 7 Aug 2022 06:45:06 +0000 (08:45 +0200)]
Merge pull request #3715 from martin-frbg/issue3648

Increase thresholds for STFSM and CTFSM in the LAPACK testsuite

2 years ago Increase threshold
Martin Kroeker [Sat, 6 Aug 2022 22:03:50 +0000 (00:03 +0200)]
Increase threshold

2 years ago Increase threshold
Martin Kroeker [Sat, 6 Aug 2022 22:03:20 +0000 (00:03 +0200)]
Increase threshold

2 years ago Merge pull request #3609 from martin-frbg/lapack3101
Martin Kroeker [Sat, 6 Aug 2022 12:31:56 +0000 (14:31 +0200)]
Merge pull request #3609 from martin-frbg/lapack3101

Update LAPACK/LAPACKE to Reference-LAPACK 3.10.1

2 years ago resync gensymbol with develop
Martin Kroeker [Sat, 6 Aug 2022 07:29:09 +0000 (09:29 +0200)]
resync gensymbol with develop

2 years ago Merge pull request #3714 from martin-frbg/crosscmake
Martin Kroeker [Thu, 4 Aug 2022 21:58:21 +0000 (23:58 +0200)]
Merge pull request #3714 from martin-frbg/crosscmake

Add more x86_64 target definitions for CMAKE cross-compiling

2 years ago Support compilation with the Cray C and Fortran compilers (#3712)
Martin Kroeker [Thu, 4 Aug 2022 18:42:18 +0000 (20:42 +0200)]
Support compilation with the Cray C and Fortran compilers (#3712)

* Add support for the Cray Fortran compiler

2 years ago Add more x86_64 target definitions for cross-compiling
Martin Kroeker [Thu, 4 Aug 2022 17:18:32 +0000 (19:18 +0200)]
Add more x86_64 target definitions for cross-compiling

2 years ago Merge pull request #3709 from nursik/develop
Martin Kroeker [Wed, 3 Aug 2022 13:43:27 +0000 (15:43 +0200)]
Merge pull request #3709 from nursik/develop

Add TCORE Generic

2 years ago Merge pull request #3703 from martin-frbg/omp_adaptive
Martin Kroeker [Wed, 3 Aug 2022 13:38:39 +0000 (15:38 +0200)]
Merge pull request #3703 from martin-frbg/omp_adaptive

Add env variable OMP_ADAPTIVE to control OMP threadpool behaviour

2 years ago Merge pull request #3693 from Mayank-Raj3/Mayank-Raj3-patch-1
Martin Kroeker [Wed, 3 Aug 2022 13:38:14 +0000 (15:38 +0200)]
Merge pull request #3693 from Mayank-Raj3/Mayank-Raj3-patch-1

corrected indentation of for and if statement dgemv_thread_safety.cpp

2 years ago Add TCORE Generic in prebuild.cmake
Nursultan Zarlyk [Tue, 2 Aug 2022 08:50:58 +0000 (10:50 +0200)]
Add TCORE Generic in prebuild.cmake

During cross-compilation on an x64 host with MSVC for ARMv8, the
build fails because there are no define directives for the Generic core.

2 years ago Merge pull request #3707 from martin-frbg/getarch_risc
Martin Kroeker [Sun, 31 Jul 2022 08:13:38 +0000 (10:13 +0200)]
Merge pull request #3707 from martin-frbg/getarch_risc

Fix crash in RISCV autodetection when pmodel is not present in /proc/cpuinfo

2 years ago Really fix compilation; fix crash when pmodel is not present in cpuinfo
Martin Kroeker [Sat, 30 Jul 2022 22:41:04 +0000 (00:41 +0200)]
Really fix compilation; fix crash when pmodel is not present in cpuinfo

2 years ago Merge pull request #3706 from martin-frbg/czifunding
Martin Kroeker [Sat, 30 Jul 2022 12:11:45 +0000 (14:11 +0200)]
Merge pull request #3706 from martin-frbg/czifunding

Acknowledge past CZI EOSS 1/EOSS 3 funding

2 years ago Acknowledge past CZI EOSS 1/EOSS 3 funding
Martin Kroeker [Sat, 30 Jul 2022 10:34:09 +0000 (12:34 +0200)]
Acknowledge past CZI EOSS 1/EOSS 3 funding

2 years ago Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch
Martin Kroeker [Thu, 28 Jul 2022 18:31:20 +0000 (20:31 +0200)]
Merge pull request #3704 from XiWeiGu/loongarch64_dynamic_arch

LoongArch64: Add DYNAMIC_ARCH support

2 years ago Merge pull request #3705 from RajalakshmiSR/bf16ppc
Martin Kroeker [Thu, 28 Jul 2022 16:38:14 +0000 (18:38 +0200)]
Merge pull request #3705 from RajalakshmiSR/bf16ppc

POWER: Enable bfloat16 kernels by default

2 years ago POWER: Enable bfloat16 kernels by default
Rajalakshmi Srinivasaraghavan [Thu, 28 Jul 2022 12:43:53 +0000 (07:43 -0500)]
POWER: Enable bfloat16 kernels by default

This patch enables bfloat16 kernels by default for POWER processors.
Tested on Linux POWER8, POWER9, POWER10 and AIX POWER10 systems.

2 years ago LoongArch64: Add DYNAMIC_ARCH support
gxw [Thu, 28 Jul 2022 05:47:20 +0000 (13:47 +0800)]
LoongArch64: Add DYNAMIC_ARCH support

2 years ago Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size
Martin Kroeker [Wed, 27 Jul 2022 21:43:20 +0000 (23:43 +0200)]
Use OMP_ADAPTIVE setting to choose between static and dynamic OMP threadpool size

2 years ago Add environment variable OMP_ADAPTIVE
Martin Kroeker [Wed, 27 Jul 2022 21:41:47 +0000 (23:41 +0200)]
Add environment variable OMP_ADAPTIVE
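
As a rough illustration of how a library might consult an environment variable such as OMP_ADAPTIVE, the C sketch below parses the value as an integer and treats missing, empty, non-numeric, or negative values as 0. The helper names `parse_flag` and `read_env_flag` are hypothetical and this is not OpenBLAS's actual code; which value selects the static or the dynamic threadpool is not stated in this log.

```c
#include <stdlib.h>

/* Hypothetical helper (not an OpenBLAS function): turn the text of an
 * environment variable into a non-negative integer setting.
 * NULL, empty, non-numeric, and negative inputs all map to 0. */
int parse_flag(const char *value)
{
    if (value == NULL || *value == '\0')
        return 0;
    int parsed = atoi(value);   /* non-numeric text parses as 0 */
    return parsed < 0 ? 0 : parsed;
}

/* Hypothetical wrapper: look up `name` in the environment and parse it. */
int read_env_flag(const char *name)
{
    return parse_flag(getenv(name));
}
```

A caller would use `read_env_flag("OMP_ADAPTIVE")` once at start-up and cache the result, since the environment is normally not re-read while the program runs.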

2 years ago Merge pull request #3702 from martin-frbg/issue3687
Martin Kroeker [Wed, 27 Jul 2022 18:57:50 +0000 (20:57 +0200)]
Merge pull request #3702 from martin-frbg/issue3687

Add openblas_getaffinity() extension (Linux-only)

2 years ago add openblas_getaffinity()
Martin Kroeker [Wed, 27 Jul 2022 17:15:18 +0000 (19:15 +0200)]
add openblas_getaffinity()

2 years ago add openblas_getaffinity()
Martin Kroeker [Wed, 27 Jul 2022 17:14:36 +0000 (19:14 +0200)]
add openblas_getaffinity()

2 years ago fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH...
Martin Kroeker [Wed, 27 Jul 2022 07:17:43 +0000 (09:17 +0200)]
fix detection of Neoverse V1 and user-enforced selection of N2 in ARM64 DYNAMIC_ARCH (#3700)

* fix detection of Neoverse V1 and user-enforced selection of N2

2 years ago Merge pull request #3684 from imzhuhl/neoversen2_dynamic_arch
Martin Kroeker [Tue, 26 Jul 2022 18:06:26 +0000 (20:06 +0200)]
Merge pull request #3684 from imzhuhl/neoversen2_dynamic_arch

Neoverse N2: DYNAMIC_ARCH

2 years ago Merge pull request #3699 from martin-frbg/issue3692
Martin Kroeker [Tue, 26 Jul 2022 14:36:43 +0000 (16:36 +0200)]
Merge pull request #3699 from martin-frbg/issue3692

Add c_check recognition of Fujitsu fcc for Fugaku A64FX

2 years ago Merge pull request #3696 from XiWeiGu/loongson2k1000
Martin Kroeker [Tue, 26 Jul 2022 11:55:41 +0000 (13:55 +0200)]
Merge pull request #3696 from XiWeiGu/loongson2k1000

LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC

2 years ago typo fix
Martin Kroeker [Mon, 25 Jul 2022 19:59:03 +0000 (21:59 +0200)]
typo fix

2 years ago Merge pull request #3698 from martin-frbg/issue3697
Martin Kroeker [Mon, 25 Jul 2022 18:25:23 +0000 (20:25 +0200)]
Merge pull request #3698 from martin-frbg/issue3697

utest needs to be linked against libm on QNX as well

2 years ago Treat Fujitsu fcc on Fugaku like clang
Martin Kroeker [Mon, 25 Jul 2022 17:48:59 +0000 (19:48 +0200)]
Treat Fujitsu fcc on Fugaku like clang

2 years ago Add Fujitsu compiler
Martin Kroeker [Mon, 25 Jul 2022 17:42:59 +0000 (19:42 +0200)]
Add Fujitsu compiler

2 years ago Add Fujitsu compiler (fcc)
Martin Kroeker [Mon, 25 Jul 2022 17:39:17 +0000 (19:39 +0200)]
Add Fujitsu compiler (fcc)

2 years ago Add Fujitsu compiler
Martin Kroeker [Mon, 25 Jul 2022 17:34:16 +0000 (19:34 +0200)]
Add Fujitsu compiler

2 years ago utest needs to be linked against libm on QNX as well
Martin Kroeker [Mon, 25 Jul 2022 15:02:16 +0000 (17:02 +0200)]
utest needs to be linked against libm on QNX as well

2 years ago Merge pull request #3691 from martin-frbg/issue3679-sparc
Martin Kroeker [Mon, 25 Jul 2022 13:41:15 +0000 (15:41 +0200)]
Merge pull request #3691 from martin-frbg/issue3679-sparc

SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow

2 years ago LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC
gxw [Fri, 22 Jul 2022 09:23:43 +0000 (17:23 +0800)]
LoongArch64: Add core LOONGSON2K1000 and LOONGSONGENERIC

2 years ago Merge pull request #3695 from martin-frbg/ppc6nrm2
Martin Kroeker [Mon, 25 Jul 2022 04:14:30 +0000 (06:14 +0200)]
Merge pull request #3695 from martin-frbg/ppc6nrm2

PPC6: Fix DNRM2 returning INF instead of zero due to intermediate overflow

2 years ago Merge pull request #3694 from martin-frbg/traviswait
Martin Kroeker [Sun, 24 Jul 2022 20:13:08 +0000 (22:13 +0200)]
Merge pull request #3694 from martin-frbg/traviswait

Add back travis_wait to keep ppc jobs from getting cancelled

2 years ago Fix DNRM2 returning INF instead of zero due to intermediate overflow
Martin Kroeker [Sun, 24 Jul 2022 15:42:31 +0000 (17:42 +0200)]
Fix DNRM2 returning INF instead of zero due to intermediate overflow

2 years ago Add back travis_wait to keep ppc jobs from getting cancelled
Martin Kroeker [Sun, 24 Jul 2022 14:44:16 +0000 (16:44 +0200)]
Add back travis_wait to keep ppc jobs from getting cancelled

2 years ago Update dgemv_thread_safety.cpp
Mayank Raj [Sun, 24 Jul 2022 06:21:25 +0000 (11:51 +0530)]
Update dgemv_thread_safety.cpp

2 years ago Merge pull request #3690 from RajalakshmiSR/cdotp10
Martin Kroeker [Tue, 19 Jul 2022 11:59:16 +0000 (13:59 +0200)]
Merge pull request #3690 from RajalakshmiSR/cdotp10

POWER: Fix complex dot function failures

2 years ago Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
Martin Kroeker [Tue, 19 Jul 2022 08:25:01 +0000 (10:25 +0200)]
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10

POWER10: dgemv builtin rename

2 years ago Merge pull request #3682 from XiWeiGu/develop
Martin Kroeker [Tue, 19 Jul 2022 08:24:28 +0000 (10:24 +0200)]
Merge pull request #3682 from XiWeiGu/develop

Fix dnrm2_tiny testcase failure

2 years ago fix DNRM2 returning INF instead of zero due to intermediate overflow
Martin Kroeker [Tue, 19 Jul 2022 08:19:27 +0000 (10:19 +0200)]
fix DNRM2 returning INF instead of zero due to intermediate overflow

2 years ago POWER: Fix complex dot function failures
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 19:48:43 +0000 (14:48 -0500)]
POWER: Fix complex dot function failures

The complex dot functions fail some tests when compiled with gcc12.
The machine constraints currently used do not update all four elements in the
expected result array. This is fixed by reducing the level of optimization.
The change does not affect performance numbers; the code will be converted to C in the future.

2 years ago POWER10: dgemv builtin rename
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 14:48:01 +0000 (09:48 -0500)]
POWER10: dgemv builtin rename

Add check to use correct builtin name for older versions
of gcc10 compilers.

2 years ago LoongArch64: Fix dnrm2_tiny testcase failure
gxw [Fri, 15 Jul 2022 03:18:59 +0000 (11:18 +0800)]
LoongArch64: Fix dnrm2_tiny testcase failure

2 years ago Merge pull request #3686 from martin-frbg/issue3685
Martin Kroeker [Wed, 13 Jul 2022 06:24:15 +0000 (08:24 +0200)]
Merge pull request #3686 from martin-frbg/issue3685

Fix Fortran-less CTEST build option

2 years ago Fix function prototypes and INTERFACE64 support
Martin Kroeker [Tue, 12 Jul 2022 17:37:30 +0000 (19:37 +0200)]
Fix function prototypes and INTERFACE64 support

2 years ago Fix switching between Fortran and C build
Martin Kroeker [Tue, 12 Jul 2022 17:35:31 +0000 (19:35 +0200)]
Fix switching between Fortran and C build

2 years ago Neoverse N2: DYNAMIC_ARCH
Honglin Zhu [Mon, 11 Jul 2022 16:40:22 +0000 (00:40 +0800)]
Neoverse N2: DYNAMIC_ARCH

2 years ago MIPS64: Fix dnrm2_tiny testcase failure
gxw [Thu, 7 Jul 2022 12:39:01 +0000 (20:39 +0800)]
MIPS64: Fix dnrm2_tiny testcase failure

2 years ago Merge pull request #3680 from martin-frbg/issue3636-2
Martin Kroeker [Thu, 7 Jul 2022 09:38:24 +0000 (11:38 +0200)]
Merge pull request #3680 from martin-frbg/issue3636-2

Guard against sysconf(_SC_NPROCESSORS_CONF) returning zero at runtime

2 years ago Guard against sysconf returning zero processors
Martin Kroeker [Wed, 6 Jul 2022 15:22:18 +0000 (17:22 +0200)]
Guard against sysconf returning zero processors

2 years ago Guard against system call returning zero processors
Martin Kroeker [Wed, 6 Jul 2022 15:21:10 +0000 (17:21 +0200)]
Guard against system call returning zero processors
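
The kind of guard these commits describe can be sketched in a few lines of C: query the configured processor count and fall back to 1 if the result is zero or negative (sysconf() returns -1 on error). The helper names `clamp_cpu_count` and `configured_cpu_count` are hypothetical and the fallback value of 1 is an assumption; the actual OpenBLAS change is not shown in this log. `_SC_NPROCESSORS_CONF` is a widely available extension rather than a name required by POSIX.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper (not an OpenBLAS function): make sure a raw
 * processor count is usable as a divisor or an array size. */
static int clamp_cpu_count(long raw)
{
    if (raw < 1)
        return 1;            /* zero or error (-1): assume one processor */
    if (raw > INT_MAX)
        return INT_MAX;
    return (int)raw;
}

/* Hypothetical wrapper: number of configured processors, never below 1. */
int configured_cpu_count(void)
{
    return clamp_cpu_count(sysconf(_SC_NPROCESSORS_CONF));
}
```

Clamping in one place means every later use of the count (thread pool sizing, work partitioning) can rely on it being at least 1.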

2 years ago Merge pull request #3678 from martin-frbg/issue3677
Martin Kroeker [Tue, 5 Jul 2022 08:40:32 +0000 (10:40 +0200)]
Merge pull request #3678 from martin-frbg/issue3677

Eliminate uses of CREAL on left-hand side of assignments

2 years ago Eliminate uses of CREAL on left-hand side of assignments
Martin Kroeker [Mon, 4 Jul 2022 22:01:09 +0000 (00:01 +0200)]
Eliminate uses of CREAL on left-hand side of assignments
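
To illustrate why an accessor on the left-hand side of an assignment is a problem, consider the standard C99 `creal()`: it returns a value, not an lvalue, so `creal(z) = x;` does not compile in standard C. How OpenBLAS defines its `CREAL` macro is not shown in this log, so the sketch below uses only `<complex.h>`, and the helper name `with_real_part` is hypothetical. It rebuilds the complex value instead of assigning through the accessor.

```c
#include <complex.h>

/* Hypothetical helper (not OpenBLAS code): return z with its real part
 * replaced by new_real. Writing `creal(z) = new_real;` is not valid
 * standard C because creal() yields a plain value. For simplicity this
 * assumes a finite imaginary part. */
double complex with_real_part(double complex z, double new_real)
{
    return new_real + cimag(z) * I;
}
```

A call such as `z = with_real_part(z, 0.0);` then takes the place of the non-portable assignment.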

2 years ago Merge pull request #3676 from martin-frbg/dnrm2-utest
Martin Kroeker [Mon, 4 Jul 2022 06:37:18 +0000 (08:37 +0200)]
Merge pull request #3676 from martin-frbg/dnrm2-utest

Add DNRM2 regression test for issues 2998 and 3654

2 years ago properly embed test_dnrm2
Martin Kroeker [Sun, 3 Jul 2022 21:48:30 +0000 (23:48 +0200)]
properly embed test_dnrm2

2 years ago use huge_val not huge_valf for portability
Martin Kroeker [Sun, 3 Jul 2022 18:19:24 +0000 (20:19 +0200)]
use huge_val not huge_valf for portability

2 years ago old systems may not have inf in math.h
Martin Kroeker [Sun, 3 Jul 2022 16:23:51 +0000 (18:23 +0200)]
old systems may not have inf in math.h

2 years ago Add DNRM2 regression test for issues 2998 and 3654
Martin Kroeker [Sun, 3 Jul 2022 15:56:49 +0000 (17:56 +0200)]
Add DNRM2 regression test for issues 2998 and 3654

2 years ago Merge pull request #3675 from martin-frbg/issue3654
Martin Kroeker [Sun, 3 Jul 2022 06:45:45 +0000 (08:45 +0200)]
Merge pull request #3675 from martin-frbg/issue3654

workaround ThunderX2 DNRM2 fault with ssq=inf,scale=0

2 years ago workaround fault with ssq=inf,scale=0
Martin Kroeker [Sat, 2 Jul 2022 21:47:17 +0000 (23:47 +0200)]
workaround fault with ssq=inf,scale=0

2 years ago Merge pull request #3672 from imzhuhl/neoversen2_bf16
Martin Kroeker [Fri, 1 Jul 2022 10:13:42 +0000 (12:13 +0200)]
Merge pull request #3672 from imzhuhl/neoversen2_bf16

sbgemm support for ARM Neoverse N2

2 years ago Merge pull request #3670 from martin-frbg/osxvermin
Martin Kroeker [Wed, 29 Jun 2022 06:31:04 +0000 (08:31 +0200)]
Merge pull request #3670 from martin-frbg/osxvermin

Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs

2 years ago Add gfortran parameters
Honglin Zhu [Wed, 29 Jun 2022 02:08:06 +0000 (10:08 +0800)]
Add gfortran parameters

2 years ago Neoverse N2 sbgemm:
Honglin Zhu [Wed, 22 Jun 2022 15:00:40 +0000 (23:00 +0800)]
Neoverse N2 sbgemm:

    1. Modify the algorithm to resolve multithreading failures
    2. No memory allocation in sbgemm kernel
    3. Optimize when alpha == 1.0f

2 years ago format code
Honglin Zhu [Thu, 16 Jun 2022 11:36:22 +0000 (19:36 +0800)]
format code

2 years ago neoverse n2 sbgemm:
Honglin Zhu [Wed, 15 Jun 2022 06:20:25 +0000 (14:20 +0800)]
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4

2 years ago neoverse n2 sbgemm: init file
Honglin Zhu [Mon, 13 Jun 2022 09:05:43 +0000 (17:05 +0800)]
neoverse n2 sbgemm: init file

2 years ago Merge pull request #3673 from martin-frbg/azuredynmingw
Martin Kroeker [Tue, 28 Jun 2022 21:13:11 +0000 (23:13 +0200)]
Merge pull request #3673 from martin-frbg/azuredynmingw

AzureCI: drop cpus from the DYNAMIC_LIST for Windows/mingw to save time

2 years ago mingw-dynamic arch: drop Haswell too
Martin Kroeker [Tue, 28 Jun 2022 19:40:04 +0000 (21:40 +0200)]
mingw-dynamic arch: drop Haswell too

2 years ago drop NEHALEM from the DYNLIST for Windows/mingw to save time
Martin Kroeker [Tue, 28 Jun 2022 18:12:11 +0000 (20:12 +0200)]
drop NEHALEM from the DYNLIST for Windows/mingw to save time

2 years ago Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
Martin Kroeker [Tue, 28 Jun 2022 14:09:36 +0000 (16:09 +0200)]
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel

POWER: fix issues with the small matrix kernel

2 years ago Merge pull request #3642 from nursik/develop
Martin Kroeker [Tue, 28 Jun 2022 14:05:11 +0000 (16:05 +0200)]
Merge pull request #3642 from nursik/develop

Add ARM64 support for Windows

2 years ago Add C versions of the CBLAS test sources (#3656)
Martin Kroeker [Tue, 28 Jun 2022 09:52:48 +0000 (11:52 +0200)]
Add C versions of the CBLAS test sources (#3656)

* Add C conversions of the CBLAS tests for NOFORTRAN=1 builds

* Enable CTEST without Fortran and fix passing of BUILD_vartype options to exports/gensymbol

2 years ago Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
Martin Kroeker [Tue, 28 Jun 2022 09:46:25 +0000 (11:46 +0200)]
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs

2 years ago Power: Enable SMALL_MATRIX OPT as default for dynamic arch
VFerrari [Sat, 25 Jun 2022 06:28:23 +0000 (03:28 -0300)]
Power: Enable SMALL_MATRIX OPT as default for dynamic arch

2 years ago POWER10: Fix multithreading check when USE_THREAD=0
VFerrari [Sat, 25 Jun 2022 06:21:18 +0000 (03:21 -0300)]
POWER10: Fix multithreading check when USE_THREAD=0

This patch fixes an issue when OpenBLAS is compiled for TARGET=POWER10
and the flag USE_THREAD is set to 0.

The function `num_cpu_avail` is only available when USE_THREAD=1,
so SMP is defined.
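
The general shape of such a fix is a preprocessor guard around the thread-aware call, with a single-threaded fallback otherwise. The C sketch below only illustrates that pattern: `cpu_count_stub` is a hypothetical stand-in for a function like `num_cpu_avail` (whose real signature is not shown in this log), and `threads_for_kernel` is likewise an invented name.

```c
/* Hypothetical stand-in for a thread-aware CPU query. In this sketch it
 * only exists when SMP is defined, mirroring the situation described in
 * the commit message. The value 4 is arbitrary. */
#ifdef SMP
static int cpu_count_stub(void)
{
    return 4;
}
#endif

/* Hypothetical caller: how many threads a kernel should plan for.
 * Without SMP the thread-aware function does not exist, so the call
 * must be compiled out and a single thread assumed. */
int threads_for_kernel(void)
{
#ifdef SMP
    return cpu_count_stub();
#else
    return 1;
#endif
}
```

Compiled without `-DSMP`, the function never references the missing symbol, which is the property a USE_THREAD=0 build needs.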

2 years ago Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
Martin Kroeker [Sat, 18 Jun 2022 18:52:26 +0000 (20:52 +0200)]
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10

POWER10: Fix ZGEMM testcase failures

2 years ago Merge pull request #3653 from RajalakshmiSR/dgemvp10
Martin Kroeker [Sat, 18 Jun 2022 18:51:59 +0000 (20:51 +0200)]
Merge pull request #3653 from RajalakshmiSR/dgemvp10

POWER10: convert dgemv inline assembly

2 years ago POWER10: Fix ZGEMM testcase failures
Rajalakshmi Srinivasaraghavan [Fri, 17 Jun 2022 13:18:08 +0000 (08:18 -0500)]
POWER10: Fix ZGEMM testcase failures

This patch fixes the saving and restoring of non-volatile registers
in the zgemm POWER10 kernel.

2 years ago Merge pull request #3647 from martin-frbg/exports_3.10.0
Martin Kroeker [Fri, 10 Jun 2022 06:58:00 +0000 (08:58 +0200)]
Merge pull request #3647 from martin-frbg/exports_3.10.0

Amend gensymbol with some LAPACK 3.10.0 additions

2 years ago Amend some LAPACK 3.10.0 additions
Martin Kroeker [Thu, 9 Jun 2022 17:31:08 +0000 (19:31 +0200)]
Amend some LAPACK 3.10.0 additions

2 years ago Replace with ARM64 intrinsics
Nursultan Zarlyk [Thu, 9 Jun 2022 16:49:49 +0000 (18:49 +0200)]
Replace with ARM64 intrinsics

2 years ago POWER10: convert dgemv inline assembly
Rajalakshmi Srinivasaraghavan [Thu, 9 Jun 2022 15:42:57 +0000 (10:42 -0500)]
POWER10: convert dgemv inline assembly

This patch uses compiler builtins instead of inline assembly and matches the
assembly version's performance. Tested with clang14 and gcc12.

2 years ago Merge pull request #3645 from martin-frbg/issue3644
Martin Kroeker [Wed, 8 Jun 2022 17:29:07 +0000 (19:29 +0200)]
Merge pull request #3645 from martin-frbg/issue3644

Fix quotes around compiler args in C11 check