Martin Kroeker [Thu, 11 Feb 2021 08:26:15 +0000 (09:26 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:25:36 +0000 (09:25 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:51 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:24:16 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well
Martin Kroeker [Thu, 11 Feb 2021 08:23:05 +0000 (09:23 +0100)]
Enable optimized srot/drot kernels from Haswell
Martin Kroeker [Wed, 10 Feb 2021 13:17:24 +0000 (14:17 +0100)]
Merge pull request #12 from xianyi/develop
rebase
Martin Kroeker [Tue, 2 Feb 2021 12:36:17 +0000 (13:36 +0100)]
Merge pull request #3094 from xoviat/patch-1
build openmp on appveyor
Martin Kroeker [Tue, 2 Feb 2021 12:33:15 +0000 (13:33 +0100)]
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
Martin Kroeker [Tue, 2 Feb 2021 09:53:46 +0000 (10:53 +0100)]
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
Martin Kroeker [Mon, 1 Feb 2021 20:02:53 +0000 (21:02 +0100)]
remove spurious lines (probably editor malfunction)
Martin Kroeker [Mon, 1 Feb 2021 19:18:53 +0000 (20:18 +0100)]
handle AppleClang in Cooperlake support condition
Martin Kroeker [Mon, 1 Feb 2021 18:45:25 +0000 (19:45 +0100)]
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
xoviat [Sun, 31 Jan 2021 03:28:12 +0000 (21:28 -0600)]
appveyor: cleanup and add openmp run
Martin Kroeker [Sun, 31 Jan 2021 17:02:41 +0000 (18:02 +0100)]
Merge pull request #3073 from xoviat/embedded
add embedded option
Martin Kroeker [Sat, 30 Jan 2021 21:21:28 +0000 (22:21 +0100)]
Merge pull request #3093 from martin-frbg/fix3064
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:46:25 +0000 (16:46 +0100)]
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg
Martin Kroeker [Sat, 30 Jan 2021 15:23:37 +0000 (16:23 +0100)]
Merge pull request #3092 from RajalakshmiSR/cscal_p10
Optimize cscal function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 29 Jan 2021 19:51:43 +0000 (13:51 -0600)]
Optimize cscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Fri, 29 Jan 2021 12:37:23 +0000 (13:37 +0100)]
Merge pull request #3091 from martin-frbg/lapack477-2
Fix calculation of the non-exceptional shift values in LAPACK complex QZ
Martin Kroeker [Fri, 29 Jan 2021 09:45:36 +0000 (10:45 +0100)]
fix data type
Martin Kroeker [Fri, 29 Jan 2021 08:56:12 +0000 (09:56 +0100)]
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477)
Martin Kroeker [Fri, 29 Jan 2021 08:52:21 +0000 (09:52 +0100)]
Merge pull request #11 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 18:11:55 +0000 (19:11 +0100)]
Merge pull request #3087 from martin-frbg/lapack477
Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ
Martin Kroeker [Wed, 27 Jan 2021 12:41:45 +0000 (13:41 +0100)]
Add exceptional shift to fix rare convergence problems
Martin Kroeker [Wed, 27 Jan 2021 12:39:26 +0000 (13:39 +0100)]
Merge pull request #10 from xianyi/develop
rebase
Martin Kroeker [Wed, 27 Jan 2021 12:25:45 +0000 (13:25 +0100)]
Merge pull request #3076 from martin-frbg/dyn-thunderx
Add CI job for ARM64/gcc10 DYNAMIC_ARCH
Martin Kroeker [Tue, 26 Jan 2021 19:11:42 +0000 (20:11 +0100)]
Merge pull request #3085 from alexhenrie/memory_alloc
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Tue, 26 Jan 2021 14:13:35 +0000 (15:13 +0100)]
Merge pull request #3083 from martin-frbg/develop
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Mon, 25 Jan 2021 18:02:21 +0000 (19:02 +0100)]
Remove the VORTEX support bits again for now
Martin Kroeker [Mon, 25 Jan 2021 12:13:20 +0000 (13:13 +0100)]
Add DYNAMIC_LIST support for ARM64
Alex Henrie [Mon, 25 Jan 2021 05:20:44 +0000 (22:20 -0700)]
Fix null pointer check in blas_memory_alloc
Martin Kroeker [Sun, 24 Jan 2021 22:18:52 +0000 (23:18 +0100)]
Add DYNAMIC_LIST support for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:18:01 +0000 (23:18 +0100)]
Add DYNAMIC_LIST option for ARM64
Martin Kroeker [Sun, 24 Jan 2021 22:14:45 +0000 (23:14 +0100)]
Merge pull request #9 from xianyi/develop
rebase
Martin Kroeker [Sun, 24 Jan 2021 18:03:40 +0000 (19:03 +0100)]
Merge pull request #3082 from RajalakshmiSR/scalp10
Optimize s/dscal function for POWER10
Rajalakshmi Srinivasaraghavan [Sun, 24 Jan 2021 13:48:28 +0000 (07:48 -0600)]
Optimize s/dscal function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
xoviat [Sun, 24 Jan 2021 04:12:17 +0000 (22:12 -0600)]
add functions for embedded
Martin Kroeker [Sat, 23 Jan 2021 18:08:05 +0000 (19:08 +0100)]
Merge pull request #3059 from Guobing-Chen/BF16_gemm
Initial code for Cooperlake BF16 GEMM kernel
Martin Kroeker [Sat, 23 Jan 2021 18:06:29 +0000 (19:06 +0100)]
Merge pull request #3068 from alexhenrie/scan-build
scan-build fixes
Martin Kroeker [Fri, 22 Jan 2021 07:26:00 +0000 (08:26 +0100)]
Merge pull request #3079 from RajalakshmiSR/rotp10
Optimize s/drot function for POWER10
Rajalakshmi Srinivasaraghavan [Thu, 21 Jan 2021 19:24:45 +0000 (13:24 -0600)]
Optimize s/drot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Thu, 21 Jan 2021 07:51:30 +0000 (08:51 +0100)]
Merge pull request #3075 from martin-frbg/issue3074
Fix DYNAMIC_ARCH compilation on POWER with gcc <11
Martin Kroeker [Wed, 20 Jan 2021 20:34:36 +0000 (21:34 +0100)]
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
Martin Kroeker [Wed, 20 Jan 2021 19:21:27 +0000 (20:21 +0100)]
Update .drone.yml
Martin Kroeker [Wed, 20 Jan 2021 17:30:05 +0000 (18:30 +0100)]
Add gcc10/arm64 DYNAMIC_ARCH build
Martin Kroeker [Wed, 20 Jan 2021 14:41:04 +0000 (15:41 +0100)]
Require gcc 11 for builtin_cpu_is(power10)
fixes #3074
Martin Kroeker [Wed, 20 Jan 2021 14:38:30 +0000 (15:38 +0100)]
Merge pull request #8 from xianyi/develop
rebase
xoviat [Tue, 19 Jan 2021 14:57:44 +0000 (08:57 -0600)]
add cortex-m platform
Martin Kroeker [Sat, 16 Jan 2021 14:47:34 +0000 (15:47 +0100)]
Merge pull request #3070 from RajalakshmiSR/cdot
Optimize cdot function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Alex Henrie [Fri, 15 Jan 2021 02:40:32 +0000 (19:40 -0700)]
Remove dead assignment to dflag in rotmg functions
Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Don't define the mode variable when not needed in gemm functions
Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Fix uninitialized argument value in dasum_k
Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake
Fix building "generic" TRMM kernel with CMake
Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063
Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix
Fix compile-time setting of the GEMM buffer size for gmake builds
Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…
Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi
Support NVIDIA HPC SDK on ARM64
Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake
Add CPUID information for Intel Rocket Lake
Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake
The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared against an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches cannot cause this problem again in the future.
Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor
Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable
Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems
Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG
Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite
Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well
Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot
Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot
Martin Kroeker [Tue, 12 Jan 2021 22:02:05 +0000 (23:02 +0100)]
Merge pull request #3060 from martin-frbg/dyn_arm64
Label the assembly part of the ARMV8 dynamic arch detection as volatile
Martin Kroeker [Tue, 12 Jan 2021 15:51:35 +0000 (16:51 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:49:39 +0000 (16:49 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:47:15 +0000 (16:47 +0100)]
Add workaround for NVIDIA HPC
Martin Kroeker [Tue, 12 Jan 2021 15:39:35 +0000 (16:39 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
Martin Kroeker [Tue, 12 Jan 2021 15:38:51 +0000 (16:38 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
Martin Kroeker [Tue, 12 Jan 2021 15:36:12 +0000 (16:36 +0100)]
Support NVIDIA HPC compiler
Martin Kroeker [Tue, 12 Jan 2021 15:34:18 +0000 (16:34 +0100)]
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
Martin Kroeker [Tue, 12 Jan 2021 15:32:29 +0000 (16:32 +0100)]
Support compilation with nvfortran
Gordon Fossum [Tue, 12 Jan 2021 02:13:53 +0000 (21:13 -0500)]
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.
Martin Kroeker [Mon, 11 Jan 2021 18:05:29 +0000 (19:05 +0100)]
Label get_cpu_ftr as volatile to keep gcc from rearranging the code
Chen, Guobing [Sun, 10 Jan 2021 18:15:21 +0000 (02:15 +0800)]
Initial code for Cooperlake BF16 GEMM kernel
Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop
rebase
Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le
Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
Martin Kroeker [Tue, 29 Dec 2020 20:59:40 +0000 (21:59 +0100)]
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Aurelien Jarno [Tue, 29 Dec 2020 12:06:39 +0000 (12:06 +0000)]
getarch.c: define OPENBLAS_SUPPORTED for riscv64
Martin Kroeker [Sun, 27 Dec 2020 21:54:20 +0000 (22:54 +0100)]
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
Martin Kroeker [Sun, 27 Dec 2020 20:55:08 +0000 (21:55 +0100)]
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
Martin Kroeker [Sun, 27 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Merge pull request #6 from xianyi/develop
rebase
Martin Kroeker [Sun, 27 Dec 2020 20:26:52 +0000 (21:26 +0100)]
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
Martin Kroeker [Mon, 21 Dec 2020 12:30:08 +0000 (13:30 +0100)]
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
Martin Kroeker [Mon, 21 Dec 2020 06:45:13 +0000 (07:45 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:42:51 +0000 (07:42 +0100)]
Temporarily revert to the old nrm2 kernels
Martin Kroeker [Mon, 21 Dec 2020 06:41:18 +0000 (07:41 +0100)]
Temporarily revert to the old nrm2 kernel
Martin Kroeker [Sun, 20 Dec 2020 10:55:53 +0000 (11:55 +0100)]
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
Martin Kroeker [Sat, 19 Dec 2020 22:21:22 +0000 (23:21 +0100)]
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers