Martin Kroeker [Wed, 20 Jan 2021 14:38:30 +0000 (15:38 +0100)]
Merge pull request #8 from xianyi/develop
rebase
Martin Kroeker [Sat, 16 Jan 2021 14:47:34 +0000 (15:47 +0100)]
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches can no longer cause this problem.
Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable
Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems
Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite
Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot
Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot
Martin Kroeker [Tue, 12 Jan 2021 22:02:05 +0000 (23:02 +0100)]
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
Martin Kroeker [Tue, 12 Jan 2021 15:51:35 +0000 (16:51 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:49:39 +0000 (16:49 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:47:15 +0000 (16:47 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:39:35 +0000 (16:39 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
Martin Kroeker [Tue, 12 Jan 2021 15:38:51 +0000 (16:38 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
Martin Kroeker [Tue, 12 Jan 2021 15:36:12 +0000 (16:36 +0100)]
Support NVIDIA HPC compiler
Martin Kroeker [Tue, 12 Jan 2021 15:34:18 +0000 (16:34 +0100)]
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
Martin Kroeker [Tue, 12 Jan 2021 15:32:29 +0000 (16:32 +0100)]
Support compilation with nvfortran
Gordon Fossum [Tue, 12 Jan 2021 02:13:53 +0000 (21:13 -0500)]
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
Martin Kroeker [Mon, 11 Jan 2021 18:05:29 +0000 (19:05 +0100)]
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop
rebase
Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le
Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
Martin Kroeker [Tue, 29 Dec 2020 20:59:40 +0000 (21:59 +0100)]
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Aurelien Jarno [Tue, 29 Dec 2020 12:06:39 +0000 (12:06 +0000)]
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Martin Kroeker [Sun, 27 Dec 2020 21:54:20 +0000 (22:54 +0100)]
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
Martin Kroeker [Sun, 27 Dec 2020 20:55:08 +0000 (21:55 +0100)]
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
Martin Kroeker [Sun, 27 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Merge pull request #6 from xianyi/develop
rebase
Martin Kroeker [Sun, 27 Dec 2020 20:26:52 +0000 (21:26 +0100)]
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
Martin Kroeker [Mon, 21 Dec 2020 12:30:08 +0000 (13:30 +0100)]
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
Martin Kroeker [Mon, 21 Dec 2020 06:45:13 +0000 (07:45 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:42:51 +0000 (07:42 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:41:18 +0000 (07:41 +0100)]
Temporarily revert to the old nrm2 kernel
Martin Kroeker [Sun, 20 Dec 2020 10:55:53 +0000 (11:55 +0100)]
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
Martin Kroeker [Sat, 19 Dec 2020 22:21:22 +0000 (23:21 +0100)]
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
Martin Kroeker [Sat, 19 Dec 2020 22:19:05 +0000 (23:19 +0100)]
NVIDIA compiler does not yet support POWER10
Martin Kroeker [Sat, 19 Dec 2020 22:17:40 +0000 (23:17 +0100)]
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
Martin Kroeker [Sat, 19 Dec 2020 22:14:02 +0000 (23:14 +0100)]
Merge pull request #3045 from martin-frbg/nvidiasdk
Support NVIDIA HPC SDK 20.11 compilers on x86_64
Martin Kroeker [Sat, 19 Dec 2020 21:15:58 +0000 (22:15 +0100)]
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
Martin Kroeker [Sat, 19 Dec 2020 21:11:49 +0000 (22:11 +0100)]
Amend SkylakeX options to support the NVIDIA compiler
Martin Kroeker [Sat, 19 Dec 2020 21:09:57 +0000 (22:09 +0100)]
Add nvfortran
Martin Kroeker [Sat, 19 Dec 2020 21:08:37 +0000 (22:08 +0100)]
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
Martin Kroeker [Sat, 19 Dec 2020 21:06:56 +0000 (22:06 +0100)]
Add version printout for PGI/NVIDIA compiler
Martin Kroeker [Sat, 19 Dec 2020 19:11:06 +0000 (20:11 +0100)]
Merge pull request #5 from xianyi/develop
rebase
Martin Kroeker [Sat, 19 Dec 2020 19:04:19 +0000 (20:04 +0100)]
Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
Martin Kroeker [Thu, 17 Dec 2020 10:34:05 +0000 (11:34 +0100)]
Conditionally add -mfma to compiler options where needed
Martin Kroeker [Thu, 17 Dec 2020 10:32:27 +0000 (11:32 +0100)]
Move -fma option setting to kernel/Makefile.L1
Martin Kroeker [Tue, 15 Dec 2020 23:05:04 +0000 (00:05 +0100)]
Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
Martin Kroeker [Tue, 15 Dec 2020 23:04:45 +0000 (00:04 +0100)]
Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
Martin Kroeker [Mon, 14 Dec 2020 21:40:23 +0000 (22:40 +0100)]
Add Intel Rocket Lake
Martin Kroeker [Mon, 14 Dec 2020 21:30:36 +0000 (22:30 +0100)]
Add Intel Rocket Lake
Martin Kroeker [Mon, 14 Dec 2020 18:21:52 +0000 (19:21 +0100)]
Fix undefined CC variable in clang check
Martin Kroeker [Sun, 13 Dec 2020 20:28:01 +0000 (21:28 +0100)]
Fix spurious removal of a trailing character from the hostarch string on x86_64
Martin Kroeker [Sun, 13 Dec 2020 20:22:41 +0000 (21:22 +0100)]
Merge pull request #4 from xianyi/develop
rebase
Martin Kroeker [Sun, 13 Dec 2020 20:21:34 +0000 (21:21 +0100)]
Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
Rajalakshmi Srinivasaraghavan [Sun, 13 Dec 2020 16:41:45 +0000 (10:41 -0600)]
POWER10: Improve copy performance
This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, the store
instructions were changed to stxv to improve performance of unaligned cases.
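The alignment step described above can be sketched in portable C: copy a few leading elements one at a time until the destination reaches a 32-byte boundary, then run the wide loop. This is only an illustration under stated assumptions. The real kernel uses POWER10 vector pair instructions, and `peel_count` and `scopy_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading floats to copy one at a time so
 * that dst + count sits on a 32-byte boundary (capped at n). */
size_t peel_count(const float *dst, size_t n)
{
    uintptr_t addr = (uintptr_t)dst;
    uintptr_t mis = addr & 31u; /* byte offset past the last 32-byte boundary */
    size_t peel;

    /* Assumption: dst is aligned for float, so mis is a multiple of 4. */
    assert(addr % sizeof(float) == 0);

    peel = mis ? (size_t)(32u - mis) / sizeof(float) : 0;
    return peel < n ? peel : n;
}

/* Hypothetical stand-in for scopy with unit strides. */
void scopy_sketch(size_t n, const float *x, float *y)
{
    size_t i = 0;
    size_t head = peel_count(y, n);

    /* Scalar head: advance until the store address is 32-byte aligned. */
    for (; i < head; i++)
        y[i] = x[i];

    /* Main loop: 8 floats = 32 bytes per iteration, stores now aligned.
     * A vectorized kernel would issue one wide store here. */
    for (; i + 8 <= n; i += 8)
        for (size_t k = 0; k < 8; k++)
            y[i + k] = x[i + k];

    /* Scalar tail for the remaining elements. */
    for (; i < n; i++)
        y[i] = x[i];
}
```

Only the stores are aligned here; the loads from `x` may still be unaligned, which matches the commit's focus on store alignment.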
Joshie [Sun, 13 Dec 2020 09:06:14 +0000 (09:06 +0000)]
Define BLAS acronym in README
Martin Kroeker [Sat, 12 Dec 2020 22:28:49 +0000 (23:28 +0100)]
Update version to 0.3.13.dev
Martin Kroeker [Sat, 12 Dec 2020 22:28:20 +0000 (23:28 +0100)]
Update version to 0.3.13.dev
Martin Kroeker [Sat, 12 Dec 2020 22:27:40 +0000 (23:27 +0100)]
Merge pull request #3034 from xianyi/release-0.3.0
Merge back the release branch into develop to copy tag
Martin Kroeker [Sat, 12 Dec 2020 17:19:29 +0000 (18:19 +0100)]
Merge pull request #3033 from xianyi/develop
Update branch from develop to release 0.3.13
Martin Kroeker [Sat, 12 Dec 2020 17:15:33 +0000 (18:15 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:14:49 +0000 (18:14 +0100)]
Update version to 0.3.13 for release
Martin Kroeker [Sat, 12 Dec 2020 17:13:23 +0000 (18:13 +0100)]
Merge pull request #3031 from martin-frbg/changelog13
Update Changelog.txt
Martin Kroeker [Sat, 12 Dec 2020 13:27:37 +0000 (14:27 +0100)]
Update Changelog.txt
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
Martin Kroeker [Sat, 12 Dec 2020 09:01:45 +0000 (10:01 +0100)]
Merge pull request #3030 from martin-frbg/fix2994
Make fallback from POWER10 to POWER9 depend on new enough compiler
Martin Kroeker [Sat, 12 Dec 2020 00:25:20 +0000 (01:25 +0100)]
Update Changelog.txt for 0.3.13
Martin Kroeker [Fri, 11 Dec 2020 22:41:17 +0000 (23:41 +0100)]
Make fallback from P10 to P9 conditional on suitable compiler
Martin Kroeker [Fri, 11 Dec 2020 22:38:42 +0000 (23:38 +0100)]
Merge pull request #3 from xianyi/develop
rebase
Martin Kroeker [Fri, 11 Dec 2020 22:37:30 +0000 (23:37 +0100)]
Merge pull request #2994 from antonblanchard/power10-fixes
Power10 fixes
Martin Kroeker [Thu, 10 Dec 2020 21:49:28 +0000 (22:49 +0100)]
Merge pull request #3029 from RajalakshmiSR/axpyp10
POWER10: Improve axpy performance
Martin Kroeker [Thu, 10 Dec 2020 18:42:54 +0000 (19:42 +0100)]
Merge pull request #3021 from austinpagan/trsm_p10
POWER: Added special unrolled vectorized versions of "Solve" for specific si…
Rajalakshmi Srinivasaraghavan [Thu, 10 Dec 2020 17:51:42 +0000 (11:51 -0600)]
POWER10: Improve axpy performance
This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, the store
instructions were changed to stxv to improve performance of unaligned cases.
Martin Kroeker [Thu, 10 Dec 2020 15:29:41 +0000 (16:29 +0100)]
Merge pull request #3026 from martin-frbg/revert747
Revert PR747 - SYRK parameter changes for Haswell and related targets
Martin Kroeker [Thu, 10 Dec 2020 15:27:30 +0000 (16:27 +0100)]
Merge pull request #3027 from gxw-loongson/develop
Add msa support for loongson
gxw [Thu, 10 Dec 2020 02:48:53 +0000 (10:48 +0800)]
Keep LOONGSON3A and LOONGSON3B for loongson
gxw [Thu, 26 Nov 2020 06:59:41 +0000 (14:59 +0800)]
Add msa support for loongson
1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
Martin Kroeker [Tue, 8 Dec 2020 20:07:57 +0000 (21:07 +0100)]
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)
Martin Kroeker [Tue, 8 Dec 2020 20:01:36 +0000 (21:01 +0100)]
remove extra/intermediate size step for min_jj introduced in PR747
Martin Kroeker [Tue, 8 Dec 2020 19:59:56 +0000 (20:59 +0100)]
remove extra/intermediate size step of min_jj from PR747
Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop
rebase
gxw [Tue, 8 Dec 2020 11:16:39 +0000 (19:16 +0800)]
Remove the option '-msched-weight', which gcc does not recognize, when checking for msa
Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop
MIPS: Fix two bugs
Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.
Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop
Fix test errors reported by cblas_cgemm & cblas_ctrmm
Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failing cgemv and zgemv test cases after using msa optimization
The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c kernels in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the definition of OP2 so that the corresponding tests pass.