Martin Kroeker [Sun, 24 Jul 2022 20:13:08 +0000 (22:13 +0200)]
Merge pull request #3694 from martin-frbg/traviswait
Add back travis_wait to keep ppc jobs from getting cancelled
Martin Kroeker [Sun, 24 Jul 2022 14:44:16 +0000 (16:44 +0200)]
Add back travis_wait to keep ppc jobs from getting cancelled
Martin Kroeker [Tue, 19 Jul 2022 11:59:16 +0000 (13:59 +0200)]
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
Martin Kroeker [Tue, 19 Jul 2022 08:25:01 +0000 (10:25 +0200)]
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
Martin Kroeker [Tue, 19 Jul 2022 08:24:28 +0000 (10:24 +0200)]
Merge pull request #3682 from XiWeiGu/develop
Fix dnrm2_tiny testcase failure
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 19:48:43 +0000 (14:48 -0500)]
POWER: Fix complex dot function failures
Some complex dot function tests fail when compiling with gcc12.
The machine constraints currently used do not update all four elements of the
expected result array. This is fixed by reducing the optimization level.
The change does not affect performance numbers; the code will be converted to C in the future.
Rajalakshmi Srinivasaraghavan [Mon, 18 Jul 2022 14:48:01 +0000 (09:48 -0500)]
POWER10: dgemv builtin rename
Add a check to use the correct builtin name with older versions
of the gcc10 compiler.
gxw [Fri, 15 Jul 2022 03:18:59 +0000 (11:18 +0800)]
LoongArch64: Fix dnrm2_tiny testcase failure
Martin Kroeker [Wed, 13 Jul 2022 06:24:15 +0000 (08:24 +0200)]
Merge pull request #3686 from martin-frbg/issue3685
Fix Fortran-less CTEST build option
Martin Kroeker [Tue, 12 Jul 2022 17:37:30 +0000 (19:37 +0200)]
Fix function prototypes and INTERFACE64 support
Martin Kroeker [Tue, 12 Jul 2022 17:35:31 +0000 (19:35 +0200)]
Fix switching between Fortran and C build
gxw [Thu, 7 Jul 2022 12:39:01 +0000 (20:39 +0800)]
MIPS64: Fix dnrm2_tiny testcase failure
Martin Kroeker [Thu, 7 Jul 2022 09:38:24 +0000 (11:38 +0200)]
Merge pull request #3680 from martin-frbg/issue3636-2
Guard against sysconf(_SC_NPROCESSORS_CONF) returning zero at runtime
Martin Kroeker [Wed, 6 Jul 2022 15:22:18 +0000 (17:22 +0200)]
Guard against sysconf returning zero processors
Martin Kroeker [Wed, 6 Jul 2022 15:21:10 +0000 (17:21 +0200)]
Guard against system call returning zero processors
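A minimal sketch of the kind of guard these commits describe, assuming a POSIX-like system where `sysconf(_SC_NPROCESSORS_CONF)` is available. The helper names `clamp_cpu_count` and `configured_cpus` are hypothetical and are not taken from the OpenBLAS sources.

```c
#include <unistd.h>

/* Hypothetical helper: treat zero or negative counts (including the -1
   that sysconf returns on error) as a single processor. */
int clamp_cpu_count(long reported)
{
    return reported < 1 ? 1 : (int)reported;
}

/* Hypothetical wrapper around the real sysconf call. */
int configured_cpus(void)
{
    return clamp_cpu_count(sysconf(_SC_NPROCESSORS_CONF));
}
```

Callers that size buffers or thread pools from `configured_cpus()` then never see a count below one.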
Martin Kroeker [Tue, 5 Jul 2022 08:40:32 +0000 (10:40 +0200)]
Merge pull request #3678 from martin-frbg/issue3677
Eliminate uses of CREAL on left-hand side of assignments
Martin Kroeker [Mon, 4 Jul 2022 22:01:09 +0000 (00:01 +0200)]
Eliminate uses of CREAL on left-hand side of assignments
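For context: in standard C, `creal()` returns a value rather than an lvalue, so it cannot be assigned to portably. The sketch below shows one portable way to update only the real part. It is not the actual change; the helper `set_real` is hypothetical, and it relies on the C99 guarantee that a complex type has the same representation as an array of two reals.

```c
#include <complex.h>
#include <string.h>

/* Hypothetical helper: return z with its real part replaced by re.
   The imaginary part is copied untouched, so infinities and NaNs in
   it are preserved exactly. */
double complex set_real(double complex z, double re)
{
    double parts[2];                 /* parts[0] = real, parts[1] = imaginary */
    memcpy(parts, &z, sizeof parts);
    parts[0] = re;
    memcpy(&z, parts, sizeof parts);
    return z;
}
```

A statement such as `z = set_real(z, 0.0);` then takes the place of an assignment to the real-part accessor.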
Martin Kroeker [Mon, 4 Jul 2022 06:37:18 +0000 (08:37 +0200)]
Merge pull request #3676 from martin-frbg/dnrm2-utest
Add DNRM2 regression test for issues 2998 and 3654
Martin Kroeker [Sun, 3 Jul 2022 21:48:30 +0000 (23:48 +0200)]
properly embed test_dnrm2
Martin Kroeker [Sun, 3 Jul 2022 18:19:24 +0000 (20:19 +0200)]
use HUGE_VAL not HUGE_VALF for portability
Martin Kroeker [Sun, 3 Jul 2022 16:23:51 +0000 (18:23 +0200)]
old systems may not have inf in math.h
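As background for the two portability notes above: `HUGE_VAL` has been in `math.h` since C89, while `HUGE_VALF` and `INFINITY` were only added in C99. A minimal sketch of a fallback follows; the helper name `positive_huge` is hypothetical and this is not the code used in the test.

```c
#include <math.h>

/* Hypothetical helper: prefer the C99 INFINITY macro when the header
   provides it, otherwise fall back to the C89 HUGE_VAL macro. */
double positive_huge(void)
{
#ifdef INFINITY
    return INFINITY;
#else
    return HUGE_VAL;
#endif
}
```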
Martin Kroeker [Sun, 3 Jul 2022 15:56:49 +0000 (17:56 +0200)]
Add DNRM2 regression test for issues 2998 and 3654
Martin Kroeker [Sun, 3 Jul 2022 06:45:45 +0000 (08:45 +0200)]
Merge pull request #3675 from martin-frbg/issue3654
workaround ThunderX2 DNRM2 fault with ssq=inf,scale=0
Martin Kroeker [Sat, 2 Jul 2022 21:47:17 +0000 (23:47 +0200)]
workaround fault with ssq=inf,scale=0
Martin Kroeker [Fri, 1 Jul 2022 10:13:42 +0000 (12:13 +0200)]
Merge pull request #3672 from imzhuhl/neoversen2_bf16
sbgemm support for ARM Neoverse N2
Martin Kroeker [Wed, 29 Jun 2022 06:31:04 +0000 (08:31 +0200)]
Merge pull request #3670 from martin-frbg/osxvermin
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
Honglin Zhu [Wed, 29 Jun 2022 02:08:06 +0000 (10:08 +0800)]
Add gfortran parameters
Honglin Zhu [Wed, 22 Jun 2022 15:00:40 +0000 (23:00 +0800)]
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
Honglin Zhu [Thu, 16 Jun 2022 11:36:22 +0000 (19:36 +0800)]
format code
Honglin Zhu [Wed, 15 Jun 2022 06:20:25 +0000 (14:20 +0800)]
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
Honglin Zhu [Mon, 13 Jun 2022 09:05:43 +0000 (17:05 +0800)]
neoverse n2 sbgemm: init file
Martin Kroeker [Tue, 28 Jun 2022 21:13:11 +0000 (23:13 +0200)]
Merge pull request #3673 from martin-frbg/azuredynmingw
AzureCI: drop cpus from the DYNAMIC_LIST for Windows/mingw to save time
Martin Kroeker [Tue, 28 Jun 2022 19:40:04 +0000 (21:40 +0200)]
mingw-dynamic arch: drop Haswell too
Martin Kroeker [Tue, 28 Jun 2022 18:12:11 +0000 (20:12 +0200)]
drop NEHALEM from the DYNLIST for Windows/mingw to save time
Martin Kroeker [Tue, 28 Jun 2022 14:09:36 +0000 (16:09 +0200)]
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
Martin Kroeker [Tue, 28 Jun 2022 14:05:11 +0000 (16:05 +0200)]
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
Martin Kroeker [Tue, 28 Jun 2022 09:52:48 +0000 (11:52 +0200)]
Add C versions of the CBLAS test sources (#3656)
* Add C conversions of the CBLAS tests for NOFORTRAN=1 builds
* Enable CTEST without Fortran and fix passing of BUILD_vartype options to exports/gensymbol
Martin Kroeker [Tue, 28 Jun 2022 09:46:25 +0000 (11:46 +0200)]
Increase MACOSX_DEPLOYMENT_TARGET to 11 on ARM macs
VFerrari [Sat, 25 Jun 2022 06:28:23 +0000 (03:28 -0300)]
Power: Enable SMALL_MATRIX OPT as default for dynamic arch
VFerrari [Sat, 25 Jun 2022 06:21:18 +0000 (03:21 -0300)]
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue that occurs when OpenBLAS is compiled for
TARGET=POWER10 with the flag USE_THREAD set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
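A minimal sketch of the guard pattern this describes, not the actual patch. The wrapper `threads_to_use` is hypothetical, and the prototype shown for `num_cpu_avail` is an assumption made for illustration.

```c
/* Assumed prototype; only declared in threaded (SMP) builds. */
#ifdef SMP
int num_cpu_avail(int level);
#endif

/* Hypothetical wrapper: only reference num_cpu_avail when SMP is
   defined, and report a single thread in USE_THREAD=0 builds. */
int threads_to_use(void)
{
#ifdef SMP
    return num_cpu_avail(1);
#else
    return 1;
#endif
}
```

Compiled without `SMP`, the single-threaded branch is the only code the compiler sees, so no reference to the missing function remains.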
Martin Kroeker [Sat, 18 Jun 2022 18:52:26 +0000 (20:52 +0200)]
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
Martin Kroeker [Sat, 18 Jun 2022 18:51:59 +0000 (20:51 +0200)]
Merge pull request #3653 from RajalakshmiSR/dgemvp10
POWER10: convert dgemv inline assembly
Rajalakshmi Srinivasaraghavan [Fri, 17 Jun 2022 13:18:08 +0000 (08:18 -0500)]
POWER10: Fix ZGEMM testcase failures
This patch fixes the saving and restoring of non-volatile registers
in the POWER10 zgemm kernel.
Martin Kroeker [Fri, 10 Jun 2022 06:58:00 +0000 (08:58 +0200)]
Merge pull request #3647 from martin-frbg/exports_3.10.0
Amend gensymbol with some LAPACK 3.10.0 additions
Martin Kroeker [Thu, 9 Jun 2022 17:31:08 +0000 (19:31 +0200)]
Amend some LAPACK 3.10.0 additions
Nursultan Zarlyk [Thu, 9 Jun 2022 16:49:49 +0000 (18:49 +0200)]
Replace with ARM64 intrinsics
Rajalakshmi Srinivasaraghavan [Thu, 9 Jun 2022 15:42:57 +0000 (10:42 -0500)]
POWER10: convert dgemv inline assembly
This patch uses compiler builtins and matches the performance of the
assembly version. Tested with clang14 and gcc12.
Martin Kroeker [Wed, 8 Jun 2022 17:29:07 +0000 (19:29 +0200)]
Merge pull request #3645 from martin-frbg/issue3644
Fix quotes around compiler args in C11 check
Martin Kroeker [Wed, 8 Jun 2022 09:22:20 +0000 (11:22 +0200)]
Fix quotes around compiler args in C11 check
Martin Kroeker [Wed, 8 Jun 2022 09:18:46 +0000 (11:18 +0200)]
Merge pull request #3643 from martin-frbg/fixgensymbol
Fix LAPACK path in new gensymbol script
Xianyi Zhang [Mon, 6 Jun 2022 06:12:09 +0000 (14:12 +0800)]
Merge branch 'develop' into risc-v
Xianyi Zhang [Mon, 6 Jun 2022 06:11:28 +0000 (14:11 +0800)]
Add PLCT to contributors.
Xianyi Zhang [Mon, 6 Jun 2022 05:56:05 +0000 (13:56 +0800)]
Merge branch 'risc-v_fix_intrinsic' into risc-v
Xianyi Zhang [Mon, 28 Feb 2022 12:33:11 +0000 (20:33 +0800)]
Update RISC-V Intrinsic API.
Martin Kroeker [Sun, 5 Jun 2022 21:28:12 +0000 (23:28 +0200)]
Fix LAPACK path in new gensymbol script
Martin Kroeker [Sun, 5 Jun 2022 09:23:29 +0000 (11:23 +0200)]
Merge pull request #3641 from RajalakshmiSR/ppc_build
power10: Fix build issues due to perl scripts conversion
Nursultan Zarlyk [Thu, 2 Jun 2022 14:53:54 +0000 (16:53 +0200)]
Fix MSVC ARM64 build. Add generic kernel for ARM64
Rajalakshmi Srinivasaraghavan [Thu, 2 Jun 2022 13:11:10 +0000 (08:11 -0500)]
power10: Fix build issues due to perl scripts conversion
Due to the recent perl script conversion, there are some build
errors when compiling OpenBLAS with the Advance Toolchain compilers.
Martin Kroeker [Fri, 27 May 2022 08:23:02 +0000 (10:23 +0200)]
Merge pull request #3637 from martin-frbg/issue3636
Add fallback value for bogus sc_nprocessors_conf in getarch
Martin Kroeker [Thu, 26 May 2022 22:29:17 +0000 (00:29 +0200)]
Add fallback value for bogus sc_nprocessors_conf
Martin Kroeker [Thu, 26 May 2022 09:57:53 +0000 (11:57 +0200)]
Merge pull request #3635 from martin-frbg/issue3634
Support compilation with the Intel ifx compiler
Martin Kroeker [Thu, 26 May 2022 07:31:49 +0000 (09:31 +0200)]
Add Intel ifx compiler
Martin Kroeker [Sun, 22 May 2022 19:18:44 +0000 (21:18 +0200)]
Merge pull request #3633 from martin-frbg/perl_fallback
Add back original PERL-based build scripts and add option USE_PERL
Martin Kroeker [Sun, 22 May 2022 16:36:24 +0000 (18:36 +0200)]
Support USE_PERL fallback for gensymbol
Martin Kroeker [Sun, 22 May 2022 16:35:23 +0000 (18:35 +0200)]
Add USE_PERL fallback option for gensymbol script
Martin Kroeker [Sun, 22 May 2022 16:33:24 +0000 (18:33 +0200)]
Add back original PERL-based script under new name
Martin Kroeker [Sun, 22 May 2022 16:32:19 +0000 (18:32 +0200)]
Add USE_PERL fallback option for create script used with FUNCTION_PROFILE
Martin Kroeker [Sun, 22 May 2022 16:29:01 +0000 (18:29 +0200)]
Add back original PERL-based script under new name
Martin Kroeker [Sun, 22 May 2022 16:27:45 +0000 (18:27 +0200)]
Add back PERL-based scripts under new name
Martin Kroeker [Sun, 22 May 2022 16:27:02 +0000 (18:27 +0200)]
Add fallback option USE_PERL for original PERL-based build scripts
Martin Kroeker [Sun, 22 May 2022 16:21:17 +0000 (18:21 +0200)]
Merge pull request #3624 from ioraff/no-perl
rewrite perl scripts in universal shell
Martin Kroeker [Fri, 20 May 2022 11:47:09 +0000 (13:47 +0200)]
Merge pull request #3631 from martin-frbg/revertdynskx
Revert selection of a different DGEMM kernel for SkylakeX in DYNAMIC_ARCH builds
Martin Kroeker [Fri, 20 May 2022 09:28:23 +0000 (11:28 +0200)]
revert "switch DGEMM parameters for SkylakeX if DYNAMIC_ARCH"
Martin Kroeker [Fri, 20 May 2022 09:23:30 +0000 (11:23 +0200)]
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
Martin Kroeker [Fri, 20 May 2022 04:37:37 +0000 (06:37 +0200)]
Merge pull request #3630 from martin-frbg/fixpr3629
Fix compilation of cpuid_riscv
Martin Kroeker [Thu, 19 May 2022 16:57:46 +0000 (18:57 +0200)]
Fix compilation
Zhang Xianyi [Thu, 19 May 2022 09:57:19 +0000 (17:57 +0800)]
Merge pull request #3629 from Rabenda/riscv-c910
riscv: Fix machine recognition for c910v
Han Gao [Thu, 19 May 2022 09:32:48 +0000 (17:32 +0800)]
riscv: Fix machine recognition for c910v
Signed-off-by: Han Gao <gaohan@uniontech.com>
Owen Rafferty [Thu, 12 May 2022 23:58:10 +0000 (18:58 -0500)]
rewrite perl scripts in universal shell
Martin Kroeker [Wed, 18 May 2022 22:03:55 +0000 (00:03 +0200)]
Merge pull request #3628 from martin-frbg/issue3620
DYNAMIC_ARCH: Improve mapping for future AMD cpus
Martin Kroeker [Wed, 18 May 2022 13:35:30 +0000 (15:35 +0200)]
Expand cpu mapping for future Zen cpus and use feature-based fallback for unknown AMD family codes
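A minimal sketch of a feature-based fallback of this kind. The family codes are AMD's published values (0x17 for Zen, Zen+ and Zen 2; 0x19 for Zen 3 and Zen 4), but the function name, the target strings and the fallback order are assumptions for illustration, not the logic in the OpenBLAS sources.

```c
/* Hypothetical mapping from an AMD family code to a kernel target name. */
const char *pick_amd_target(unsigned family, int has_avx2, int has_fma3)
{
    switch (family) {
    case 0x17:   /* Zen, Zen+, Zen 2 */
    case 0x19:   /* Zen 3, Zen 4 */
        return "ZEN";
    default:
        break;
    }
    /* Unknown family code: decide from CPU features instead. */
    if (has_avx2 && has_fma3)
        return "ZEN";
    return "GENERIC";
}
```

With this shape, a future family code that still reports AVX2 and FMA3 gets the optimized kernels without needing a new table entry.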
Martin Kroeker [Sat, 14 May 2022 22:24:35 +0000 (00:24 +0200)]
Merge pull request #3625 from RajalakshmiSR/P10_store
POWER10: Changing store instructions for Level1 functions
Rajalakshmi Srinivasaraghavan [Thu, 12 May 2022 16:17:33 +0000 (11:17 -0500)]
POWER10: Changing store instructions for Level1 functions
This patch replaces 32-byte stores with two 16-byte stores
to fix a recent degradation caused by the 32-byte stores.
Martin Kroeker [Wed, 4 May 2022 13:12:22 +0000 (15:12 +0200)]
Merge pull request #3619 from martin-frbg/fixup-3613
Initial attempt at proper cpu detection on RISCV
Martin Kroeker [Wed, 4 May 2022 06:58:56 +0000 (08:58 +0200)]
Initial attempt at proper cpu detection on RISCV
Martin Kroeker [Wed, 4 May 2022 05:22:47 +0000 (07:22 +0200)]
Merge pull request #3613 from Rabenda/fix-riscv
Fix riscv64 detection
Martin Kroeker [Wed, 4 May 2022 05:22:25 +0000 (07:22 +0200)]
Merge pull request #3618 from martin-frbg/issue3606
Automatically downgrade C910V to RISCV64_GENERIC if the compiler lacks vector support
Martin Kroeker [Tue, 3 May 2022 21:29:55 +0000 (23:29 +0200)]
Have getarch downgrade the RISCV C910V target to GENERIC if compiler lacks vector support
Martin Kroeker [Tue, 3 May 2022 21:27:50 +0000 (23:27 +0200)]
Add compiler check for RISCV vector support
Martin Kroeker [Sat, 30 Apr 2022 22:09:20 +0000 (00:09 +0200)]
Merge pull request #3616 from martin-frbg/issue3615
Fix CMAKE generator rules for ?laswp_ncopy and ?neg_tcopy kernels
Martin Kroeker [Sat, 30 Apr 2022 18:38:09 +0000 (20:38 +0200)]
rename lapack subtarget to lapack_overrides to avoid name clash with netlib in case-insensitive settings
Martin Kroeker [Sat, 30 Apr 2022 18:35:17 +0000 (20:35 +0200)]
Merge pull request #3614 from martin-frbg/clapackfix
Makefile fixes related to C_LAPACK, plus Travis CI fixes
Martin Kroeker [Sat, 30 Apr 2022 16:49:04 +0000 (18:49 +0200)]
Update .travis.yml
Martin Kroeker [Sat, 30 Apr 2022 16:33:00 +0000 (18:33 +0200)]
try to fix assembler errors on z13
Martin Kroeker [Sat, 30 Apr 2022 13:28:38 +0000 (15:28 +0200)]
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
Martin Kroeker [Wed, 27 Apr 2022 20:18:22 +0000 (22:18 +0200)]
fix arch tags
Martin Kroeker [Wed, 27 Apr 2022 19:59:45 +0000 (21:59 +0200)]
Remove leftover debug output
Martin Kroeker [Wed, 27 Apr 2022 18:31:42 +0000 (20:31 +0200)]
Avoid adding -lgfortran with NOFORTRAN
Martin Kroeker [Wed, 27 Apr 2022 18:26:45 +0000 (20:26 +0200)]
Update NOFORTRAN message for fallback to C_LAPACK
Han Gao [Tue, 26 Apr 2022 18:29:43 +0000 (02:29 +0800)]
Fix riscv64 arch detection
Signed-off-by: Han Gao <gaohan@uniontech.com>