Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite
Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot
Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot
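For readers unfamiliar with csrot/zdrot: these routines apply a plane rotation with real cosine and sine to a pair of complex vectors. The following is a minimal reference sketch of that operation in plain C, not the code added by these commits; the name `csrot_ref` and its signature are hypothetical and are not the actual `cblas_csrot` prototype.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference version of the csrot operation:
 *   x[i] <- c*x[i] + s*y[i]
 *   y[i] <- c*y[i] - s*x[i]
 * with real c and s applied to single-precision complex vectors. */
void csrot_ref(int n, float complex *x, int incx,
               float complex *y, int incy, float c, float s)
{
    if (n <= 0) return;

    /* BLAS convention: a negative increment walks the vector backwards. */
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        float complex t = c * x[ix] + s * y[iy];
        y[iy] = c * y[iy] - s * x[ix];
        x[ix] = t;
        ix += incx;
        iy += incy;
    }
}
```

The double-precision zdrot variant would follow the same pattern with `double complex` vectors and `double` values for c and s.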
Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop
rebase
Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le
Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
Martin Kroeker [Tue, 29 Dec 2020 20:59:40 +0000 (21:59 +0100)]
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Aurelien Jarno [Tue, 29 Dec 2020 12:06:39 +0000 (12:06 +0000)]
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Martin Kroeker [Sun, 27 Dec 2020 21:54:20 +0000 (22:54 +0100)]
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
Martin Kroeker [Sun, 27 Dec 2020 20:55:08 +0000 (21:55 +0100)]
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
Martin Kroeker [Sun, 27 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Merge pull request #6 from xianyi/develop
rebase
Martin Kroeker [Sun, 27 Dec 2020 20:26:52 +0000 (21:26 +0100)]
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
Martin Kroeker [Mon, 21 Dec 2020 12:30:08 +0000 (13:30 +0100)]
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
Martin Kroeker [Mon, 21 Dec 2020 06:45:13 +0000 (07:45 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:42:51 +0000 (07:42 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:41:18 +0000 (07:41 +0100)]
Temporarily revert to the old nrm2 kernel
Martin Kroeker [Sun, 20 Dec 2020 10:55:53 +0000 (11:55 +0100)]
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
Martin Kroeker [Sat, 19 Dec 2020 22:21:22 +0000 (23:21 +0100)]
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
Martin Kroeker [Sat, 19 Dec 2020 22:19:05 +0000 (23:19 +0100)]
NVIDIA compiler does not yet support POWER10
Martin Kroeker [Sat, 19 Dec 2020 22:17:40 +0000 (23:17 +0100)]
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
Martin Kroeker [Sat, 19 Dec 2020 22:14:02 +0000 (23:14 +0100)]
Merge pull request #3045 from martin-frbg/nvidiasdk
Support NVIDIA HPC SDK 20.11 compilers on x86_64
Martin Kroeker [Sat, 19 Dec 2020 21:15:58 +0000 (22:15 +0100)]
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
Martin Kroeker [Sat, 19 Dec 2020 21:11:49 +0000 (22:11 +0100)]
Amend SkylakeX options to support the NVIDIA compiler
Martin Kroeker [Sat, 19 Dec 2020 21:09:57 +0000 (22:09 +0100)]
Add nvfortran
Martin Kroeker [Sat, 19 Dec 2020 21:08:37 +0000 (22:08 +0100)]
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
Martin Kroeker [Sat, 19 Dec 2020 21:06:56 +0000 (22:06 +0100)]
Add version printout for PGI/NVIDIA compiler
Martin Kroeker [Sat, 19 Dec 2020 19:11:06 +0000 (20:11 +0100)]
Merge pull request #5 from xianyi/develop
rebase
Martin Kroeker [Sat, 19 Dec 2020 19:04:19 +0000 (20:04 +0100)]
Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
Martin Kroeker [Thu, 17 Dec 2020 10:34:05 +0000 (11:34 +0100)]
Conditionally add -mfma to compiler options where needed
Martin Kroeker [Thu, 17 Dec 2020 10:32:27 +0000 (11:32 +0100)]
Move -fma option setting to kernel/Makefile.L1

Martin Kroeker [Tue, 15 Dec 2020 23:05:04 +0000 (00:05 +0100)]
Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
Martin Kroeker [Tue, 15 Dec 2020 23:04:45 +0000 (00:04 +0100)]
Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
Martin Kroeker [Mon, 14 Dec 2020 18:21:52 +0000 (19:21 +0100)]
Fix undefined CC variable in clang check
Martin Kroeker [Sun, 13 Dec 2020 20:28:01 +0000 (21:28 +0100)]
Fix spurious removal of a trailing character from the hostarch string on x86_64
Martin Kroeker [Sun, 13 Dec 2020 20:22:41 +0000 (21:22 +0100)]
Merge pull request #4 from xianyi/develop
rebase
Martin Kroeker [Sun, 13 Dec 2020 20:21:34 +0000 (21:21 +0100)]
Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
Rajalakshmi Srinivasaraghavan [Sun, 13 Dec 2020 16:41:45 +0000 (10:41 -0600)]
POWER10: Improve copy performance
This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, it changes the store
instructions to stxv to improve performance in unaligned cases.
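The alignment step described above is a form of loop peeling: copy elements one at a time until the destination address reaches a 32-byte boundary, then run the wide loop, then finish the tail. A minimal portable sketch follows, assuming a 4-double (32-byte) block stands in for the POWER10 vector pair store; the name `dcopy_aligned_stores` is hypothetical and this is not the assembly from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of aligning stores before a wide copy loop. */
void dcopy_aligned_stores(size_t n, const double *src, double *dst)
{
    size_t i = 0;

    /* Peel: scalar copies until dst + i sits on a 32-byte boundary. */
    while (i < n && ((uintptr_t)(dst + i) % 32) != 0) {
        dst[i] = src[i];
        i++;
    }

    /* Main loop: every store now starts at a 32-byte aligned address.
     * memcpy of a fixed 32-byte block stands in for a vector pair store. */
    for (; i + 4 <= n; i += 4)
        memcpy(dst + i, src + i, 4 * sizeof(double));

    /* Tail: remaining elements that do not fill a whole block. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Only the stores are aligned here; the loads may still be unaligned, which matches the emphasis on store alignment in the commit message.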
Joshie [Sun, 13 Dec 2020 09:06:14 +0000 (09:06 +0000)]
Define BLAS acronym in README
Martin Kroeker [Sat, 12 Dec 2020 22:28:49 +0000 (23:28 +0100)]
Update version to 0.3.13.dev
Martin Kroeker [Sat, 12 Dec 2020 22:28:20 +0000 (23:28 +0100)]
Update version to 0.3.13.dev
Martin Kroeker [Sat, 12 Dec 2020 22:27:40 +0000 (23:27 +0100)]
Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
Martin Kroeker [Sat, 12 Dec 2020 17:19:29 +0000 (18:19 +0100)]
Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
Martin Kroeker [Sat, 12 Dec 2020 17:15:33 +0000 (18:15 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:14:49 +0000 (18:14 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:13:23 +0000 (18:13 +0100)]
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
Martin Kroeker [Sat, 12 Dec 2020 13:27:37 +0000 (14:27 +0100)]
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
Martin Kroeker [Sat, 12 Dec 2020 09:01:45 +0000 (10:01 +0100)]
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
Martin Kroeker [Sat, 12 Dec 2020 00:25:20 +0000 (01:25 +0100)]
Update Changelog.txt for 0.3.13
Martin Kroeker [Fri, 11 Dec 2020 22:41:17 +0000 (23:41 +0100)]
Make fallback from P10 to P9 conditional on suitable compiler
Martin Kroeker [Fri, 11 Dec 2020 22:38:42 +0000 (23:38 +0100)]
Merge pull request #3 from xianyi/develop
rebase
Martin Kroeker [Fri, 11 Dec 2020 22:37:30 +0000 (23:37 +0100)]
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
Martin Kroeker [Thu, 10 Dec 2020 21:49:28 +0000 (22:49 +0100)]
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
Martin Kroeker [Thu, 10 Dec 2020 18:42:54 +0000 (19:42 +0100)]
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
Rajalakshmi Srinivasaraghavan [Thu, 10 Dec 2020 17:51:42 +0000 (11:51 -0600)]
POWER10: Improve axpy performance
This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, it changes the store
instructions to stxv to improve performance in unaligned cases.
Martin Kroeker [Thu, 10 Dec 2020 15:29:41 +0000 (16:29 +0100)]
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
Martin Kroeker [Thu, 10 Dec 2020 15:27:30 +0000 (16:27 +0100)]
Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
gxw [Thu, 10 Dec 2020 02:48:53 +0000 (10:48 +0800)]
Keep LOONGSON3A and LOONGSON3B for loongson
gxw [Thu, 26 Nov 2020 06:59:41 +0000 (14:59 +0800)]
Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
Martin Kroeker [Tue, 8 Dec 2020 20:07:57 +0000 (21:07 +0100)]
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
Martin Kroeker [Tue, 8 Dec 2020 20:01:36 +0000 (21:01 +0100)]
remove extra/intermediate size step for min_jj introduced in PR747
Martin Kroeker [Tue, 8 Dec 2020 19:59:56 +0000 (20:59 +0100)]
remove extra/intermediate size step of min_jj from PR747
Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop
rebase
gxw [Tue, 8 Dec 2020 11:16:39 +0000 (19:16 +0800)]
Remove the '-msched-weight' option, which gcc does not recognize, when checking for MSA support
Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.
Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failing cgemv and zgemv test cases after enabling MSA optimization
The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the value of OP2 so that the corresponding tests pass.
Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failing sswap and dswap test cases when using MSA optimization
The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, both functions produce wrong results.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
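A zero increment means every iteration addresses the same element, so a kernel that loads and stores whole blocks at once cannot be used for that case. One common way to handle this is to keep the block-wise path for unit strides and fall back to an element-by-element loop otherwise. The sketch below illustrates that idea in plain C; it is not the MSA code from this patch, and the name `sswap_ref` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical reference swap that stays correct for zero increments. */
void sswap_ref(int n, float *x, int inc_x, float *y, int inc_y)
{
    if (n <= 0) return;

    if (inc_x == 1 && inc_y == 1) {
        /* Contiguous case: this is where a SIMD kernel could be used. */
        for (int i = 0; i < n; i++) {
            float t = x[i];
            x[i] = y[i];
            y[i] = t;
        }
        return;
    }

    /* General case, including inc_x == 0 or inc_y == 0: swap one element
     * at a time so repeated accesses to the same address see the value
     * written by the previous iteration. */
    ptrdiff_t ix = inc_x < 0 ? (ptrdiff_t)(1 - n) * inc_x : 0;
    ptrdiff_t iy = inc_y < 0 ? (ptrdiff_t)(1 - n) * inc_y : 0;
    for (int i = 0; i < n; i++) {
        float t = x[ix];
        x[ix] = y[iy];
        y[iy] = t;
        ix += inc_x;
        iy += inc_y;
    }
}
```

With inc_x equal to zero, the single x element is exchanged with each y element in turn, which is the result the sequential reference loop produces.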
Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc
Fix 32 and 64bit builds on SPARC with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:20:50 +0000 (19:20 +0100)]
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:15:37 +0000 (19:15 +0100)]
Work around DOT and SWAP test failures
Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio
Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility
Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds
Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc
Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers
Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop
rebase
Jin Bo [Sat, 5 Dec 2020 07:06:12 +0000 (15:06 +0800)]
Fix test errors reported by cblas_cgemm & cblas_ctrmm
The file cgemm_kernel_8x4_msa.c holds the MSA-optimized code
for cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array indices in the two macros should be 0 and 1.
Gordon Fossum [Fri, 4 Dec 2020 23:07:06 +0000 (17:07 -0600)]
Added special unrolled vectorized versions of "Solve" for specific sizes,
in DTRSM and STRSM, to improve performance in Power9 and Power10.
Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015
Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds
Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum
Improve the performance of zasum and casum with AVX512 intrinsic
Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46
Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)
Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link
link math lib on FreeBSD
Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param
POWER10: Update param.h
Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check
Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h
Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q improves
DGEMM performance by ~10%.
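Parameters of this kind typically control cache blocking: the matrices are processed in panels sized so that the working set stays in cache. The sketch below shows the general idea with two block sizes, under the assumption that P blocks the row dimension and Q blocks the inner (k) dimension. The function `dgemm_blocked` and its parameters are hypothetical and far simpler than the packed, vectorized kernels OpenBLAS actually uses.

```c
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical blocked update C += A * B for row-major matrices:
 * A is m x k, B is k x n, C is m x n.
 * block_p tiles the rows of A and C, block_q tiles the inner dimension. */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *a, const double *b, double *c,
                   size_t block_p, size_t block_q)
{
    if (block_p == 0 || block_q == 0) return; /* avoid a non-advancing loop */

    for (size_t i0 = 0; i0 < m; i0 += block_p) {
        size_t i1 = min_sz(i0 + block_p, m);
        for (size_t l0 = 0; l0 < k; l0 += block_q) {
            size_t l1 = min_sz(l0 + block_q, k);
            /* Work on one block_p x block_q tile of A at a time. */
            for (size_t i = i0; i < i1; i++) {
                for (size_t l = l0; l < l1; l++) {
                    double ail = a[i * k + l];
                    for (size_t j = 0; j < n; j++)
                        c[i * n + j] += ail * b[l * n + j];
                }
            }
        }
    }
}
```

Larger blocks reuse each loaded tile more often but must still fit in the targeted cache level, which is why the best values differ between CPU generations.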
Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds
Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12
Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml
Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines
Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test
Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode
Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
Fix incorrect declaration of function blas_level1_thread_with_return_value