platform/upstream/openblas.git
xoviat [Wed, 27 Jan 2021 20:15:59 +0000 (14:15 -0600)]
Merge branch 'develop' into msvc

Martin Kroeker [Wed, 27 Jan 2021 18:11:55 +0000 (19:11 +0100)]
Merge pull request #3087 from martin-frbg/lapack477

Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ

Martin Kroeker [Wed, 27 Jan 2021 12:41:45 +0000 (13:41 +0100)]
Add exceptional shift to fix rare convergence problems

Martin Kroeker [Wed, 27 Jan 2021 12:39:26 +0000 (13:39 +0100)]
Merge pull request #10 from xianyi/develop

rebase

Martin Kroeker [Wed, 27 Jan 2021 12:25:45 +0000 (13:25 +0100)]
Merge pull request #3076 from martin-frbg/dyn-thunderx

Add CI job for ARM64/gcc10 DYNAMIC_ARCH

Martin Kroeker [Tue, 26 Jan 2021 19:11:42 +0000 (20:11 +0100)]
Merge pull request #3085 from alexhenrie/memory_alloc

Fix null pointer check in blas_memory_alloc

Martin Kroeker [Tue, 26 Jan 2021 14:13:35 +0000 (15:13 +0100)]
Merge pull request #3083 from martin-frbg/develop

Add DYNAMIC_LIST support for ARM64

Martin Kroeker [Mon, 25 Jan 2021 18:02:21 +0000 (19:02 +0100)]
Remove the VORTEX support bits again for now

Martin Kroeker [Mon, 25 Jan 2021 12:13:20 +0000 (13:13 +0100)]
Add DYNAMIC_LIST support for ARM64

Alex Henrie [Mon, 25 Jan 2021 05:20:44 +0000 (22:20 -0700)]
Fix null pointer check in blas_memory_alloc

Martin Kroeker [Sun, 24 Jan 2021 22:18:52 +0000 (23:18 +0100)]
Add DYNAMIC_LIST support for ARM64

Martin Kroeker [Sun, 24 Jan 2021 22:18:01 +0000 (23:18 +0100)]
Add DYNAMIC_LIST option for ARM64

Martin Kroeker [Sun, 24 Jan 2021 22:14:45 +0000 (23:14 +0100)]
Merge pull request #9 from xianyi/develop

rebase

Martin Kroeker [Sun, 24 Jan 2021 18:03:40 +0000 (19:03 +0100)]
Merge pull request #3082 from RajalakshmiSR/scalp10

Optimize s/dscal function for POWER10

Rajalakshmi Srinivasaraghavan [Sun, 24 Jan 2021 13:48:28 +0000 (07:48 -0600)]
Optimize s/dscal function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 23 Jan 2021 18:08:05 +0000 (19:08 +0100)]
Merge pull request #3059 from Guobing-Chen/BF16_gemm

Initial code for Cooperlake BF16 GEMM kernel

Martin Kroeker [Sat, 23 Jan 2021 18:06:29 +0000 (19:06 +0100)]
Merge pull request #3068 from alexhenrie/scan-build

scan-build fixes

Martin Kroeker [Fri, 22 Jan 2021 07:26:00 +0000 (08:26 +0100)]
Merge pull request #3079 from RajalakshmiSR/rotp10

Optimize s/drot function for POWER10

Rajalakshmi Srinivasaraghavan [Thu, 21 Jan 2021 19:24:45 +0000 (13:24 -0600)]
Optimize s/drot function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Thu, 21 Jan 2021 07:51:30 +0000 (08:51 +0100)]
Merge pull request #3075 from martin-frbg/issue3074

Fix DYNAMIC_ARCH compilation on POWER with gcc <11

Martin Kroeker [Wed, 20 Jan 2021 20:34:36 +0000 (21:34 +0100)]
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
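For illustration, a compile-time gate of the kind described by this commit and by "Require gcc 11 for builtin_cpu_is(power10)" might look like the sketch below. It assumes, per the commit message, that GCC's __builtin_cpu_is("power10") is usable from gcc 10.2 onward, and only when targeting 64-bit POWER. The macro HAVE_CPU_IS_POWER10 and the function cpu_is_power10 are hypothetical names, not OpenBLAS identifiers.

```c
/* Hypothetical compile-time gate: only reference "power10" when the
   compiler is GCC >= 10.2 targeting 64-bit POWER. clang reports
   __GNUC__ == 4 and would fail the version test anyway; it is
   excluded explicitly here for clarity. */
#if defined(__powerpc64__) && defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 10 || (__GNUC__ == 10 && __GNUC_MINOR__ >= 2))
#define HAVE_CPU_IS_POWER10 1
#else
#define HAVE_CPU_IS_POWER10 0
#endif

/* Returns 1 if the running CPU is detected as POWER10, 0 otherwise
   (including when the compiler cannot perform the check at all). */
static int cpu_is_power10(void)
{
#if HAVE_CPU_IS_POWER10
    return __builtin_cpu_is("power10") ? 1 : 0;
#else
    return 0; /* caller falls back to an older kernel set */
#endif
}
```

Keeping the version test in the preprocessor means older compilers never see the "power10" string, which is what avoids the build failure reported in #3074.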

Martin Kroeker [Wed, 20 Jan 2021 19:21:27 +0000 (20:21 +0100)]
Update .drone.yml

Martin Kroeker [Wed, 20 Jan 2021 17:30:05 +0000 (18:30 +0100)]
Add gcc10/arm64 DYNAMIC_ARCH build

Martin Kroeker [Wed, 20 Jan 2021 14:41:04 +0000 (15:41 +0100)]
Require gcc 11 for builtin_cpu_is(power10)

fixes #3074

Martin Kroeker [Wed, 20 Jan 2021 14:38:30 +0000 (15:38 +0100)]
Merge pull request #8 from xianyi/develop

rebase

Martin Kroeker [Sat, 16 Jan 2021 14:47:34 +0000 (15:47 +0100)]
Merge pull request #3070 from RajalakshmiSR/cdot

Optimize cdot function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Alex Henrie [Fri, 15 Jan 2021 02:40:32 +0000 (19:40 -0700)]
Remove dead assignment to dflag in rotmg functions

Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Don't define the mode variable when not needed in gemm functions

Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Fix uninitialized argument value in dasum_k

Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake

Fix building "generic" TRMM kernel with CMake

Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063

Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot
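As a reminder of what csrot (and its double-precision twin zdrot) computes, here is a plain-C model of the single-precision case: a real plane rotation (c, s) applied elementwise to two complex vectors, x = c*x + s*y and y = c*y - s*x, following the reference BLAS definition. csrot_ref is a hypothetical name; it handles only positive strides and is neither the OpenBLAS kernel nor the cblas_csrot prototype.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference model of csrot: apply the real plane rotation
   (c, s) to complex vectors x and y. Positive strides only. */
static void csrot_ref(size_t n, float complex *x, size_t incx,
                      float complex *y, size_t incy, float c, float s)
{
    for (size_t i = 0; i < n; i++) {
        float complex xi = x[i * incx];
        float complex yi = y[i * incy];
        x[i * incx] = c * xi + s * yi;  /* new x uses the old y */
        y[i * incy] = c * yi - s * xi;  /* new y uses the old x */
    }
}
```

With c = 0 and s = 1 the call maps (x, y) to (y, -x), which makes a quick sanity check.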

Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix

Fix compile-time setting of the GEMM buffer size for gmake builds

Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3

Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…

Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi

Support NVIDIA HPC SDK on ARM64

Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake

Add CPUID information for Intel Rocket Lake

Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake

The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to uppercase before comparing its value, so that case mismatches cannot cause this problem again.
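In CMake the conversion can be done with string(TOUPPER <string> <output_variable>). The comparison logic itself is sketched in C below for illustration only; core_name_equals is a hypothetical helper, and it uppercases only the core name, assuming the constant side is already uppercase (as "GENERIC" is).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if core, converted to uppercase,
   equals upper_const exactly; 0 otherwise. */
static int core_name_equals(const char *core, const char *upper_const)
{
    size_t i = 0;
    for (; core[i] != '\0' && upper_const[i] != '\0'; i++) {
        /* cast to unsigned char: toupper is undefined for negative chars */
        if (toupper((unsigned char)core[i]) != (unsigned char)upper_const[i])
            return 0;
    }
    /* equal only if both strings ended at the same position */
    return core[i] == '\0' && upper_const[i] == '\0';
}
```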

Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor

Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable

Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems

Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG

Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite

Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:02:05 +0000 (23:02 +0100)]
Merge pull request #3060 from martin-frbg/dyn_arm64

Label the assembly part of the ARMV8 dynamic arch detection as volatile

Martin Kroeker [Tue, 12 Jan 2021 15:51:35 +0000 (16:51 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:49:39 +0000 (16:49 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:47:15 +0000 (16:47 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:39:35 +0000 (16:39 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:38:51 +0000 (16:38 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:36:12 +0000 (16:36 +0100)]
Support NVIDIA HPC compiler

Martin Kroeker [Tue, 12 Jan 2021 15:34:18 +0000 (16:34 +0100)]
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)

Martin Kroeker [Tue, 12 Jan 2021 15:32:29 +0000 (16:32 +0100)]
Support compilation with nvfortran

Gordon Fossum [Tue, 12 Jan 2021 02:13:53 +0000 (21:13 -0500)]
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.

Martin Kroeker [Mon, 11 Jan 2021 18:05:29 +0000 (19:05 +0100)]
Label get_cpu_ftr as volatile to keep gcc from rearranging the code

Chen, Guobing [Sun, 10 Jan 2021 18:15:21 +0000 (02:15 +0800)]
Initial code for Cooperlake BF16 GEMM kernel

Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop

rebase

Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10

Optimize swap function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1

Fix build on FreeBSD/powerpc64le

pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le

Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2

arm64: Fix nrm2 for input vectors with Inf

Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf

Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.

Martin Kroeker [Tue, 29 Dec 2020 20:59:40 +0000 (21:59 +0100)]
Merge pull request #3050 from aurel32/riscv64-openblas-supported

getarch.c: define OPENBLAS_SUPPORTED for riscv64

Aurelien Jarno [Tue, 29 Dec 2020 12:06:39 +0000 (12:06 +0000)]
getarch.c: define OPENBLAS_SUPPORTED for riscv64

Martin Kroeker [Sun, 27 Dec 2020 21:54:20 +0000 (22:54 +0100)]
Merge pull request #3049 from martin-frbg/readme

Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos

Martin Kroeker [Sun, 27 Dec 2020 20:55:08 +0000 (21:55 +0100)]
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers

Martin Kroeker [Sun, 27 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Merge pull request #6 from xianyi/develop

rebase

Martin Kroeker [Sun, 27 Dec 2020 20:26:52 +0000 (21:26 +0100)]
Merge pull request #3035 from Joshua-Ashton/patch-1

Define BLAS acronym in README

Martin Kroeker [Mon, 21 Dec 2020 12:30:08 +0000 (13:30 +0100)]
Merge pull request #3048 from martin-frbg/issue2998

Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1

Martin Kroeker [Mon, 21 Dec 2020 06:45:13 +0000 (07:45 +0100)]
Temporarily revert to the old nrm2 kernels

Martin Kroeker [Mon, 21 Dec 2020 06:42:51 +0000 (07:42 +0100)]
Temporarily revert to the old nrm2 kernels

Martin Kroeker [Mon, 21 Dec 2020 06:41:18 +0000 (07:41 +0100)]
Temporarily revert to the old nrm2 kernel

Martin Kroeker [Sun, 20 Dec 2020 10:55:53 +0000 (11:55 +0100)]
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc

Support NVIDIA HPC SDK on POWERPC

Martin Kroeker [Sat, 19 Dec 2020 22:21:22 +0000 (23:21 +0100)]
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers

Martin Kroeker [Sat, 19 Dec 2020 22:19:05 +0000 (23:19 +0100)]
NVIDIA compiler does not yet support POWER10

Martin Kroeker [Sat, 19 Dec 2020 22:17:40 +0000 (23:17 +0100)]
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers

Martin Kroeker [Sat, 19 Dec 2020 22:14:02 +0000 (23:14 +0100)]
Merge pull request #3045 from martin-frbg/nvidiasdk

Support NVIDIA HPC SDK 20.11 compilers on x86_64

Martin Kroeker [Sat, 19 Dec 2020 21:15:58 +0000 (22:15 +0100)]
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA

Martin Kroeker [Sat, 19 Dec 2020 21:11:49 +0000 (22:11 +0100)]
Amend SkylakeX options to support the NVIDIA compiler

Martin Kroeker [Sat, 19 Dec 2020 21:09:57 +0000 (22:09 +0100)]
Add nvfortran

Martin Kroeker [Sat, 19 Dec 2020 21:08:37 +0000 (22:08 +0100)]
Add/modify "PGI" compiler options for NVIDIA SDK 20.11

Martin Kroeker [Sat, 19 Dec 2020 21:06:56 +0000 (22:06 +0100)]
Add version printout for PGI/NVIDIA compiler

Martin Kroeker [Sat, 19 Dec 2020 19:11:06 +0000 (20:11 +0100)]
Merge pull request #5 from xianyi/develop

rebase

Martin Kroeker [Sat, 19 Dec 2020 19:04:19 +0000 (20:04 +0100)]
Merge pull request #3042 from martin-frbg/develop

Move FMA3 option setting to the kernel makefile

Martin Kroeker [Thu, 17 Dec 2020 10:34:05 +0000 (11:34 +0100)]
Conditionally add -mfma to compiler options where needed

Martin Kroeker [Thu, 17 Dec 2020 10:32:27 +0000 (11:32 +0100)]
Move -fma option setting to kernel/Makefile.L1

Martin Kroeker [Tue, 15 Dec 2020 23:05:04 +0000 (00:05 +0100)]
Merge pull request #3040 from martin-frbg/fixfcheck

Fix undefined CC variable in check for clang+gfortran combo

Martin Kroeker [Tue, 15 Dec 2020 23:04:45 +0000 (00:04 +0100)]
Merge pull request #3038 from martin-frbg/issue3037

Fix spurious assumption of cross-compilation on some architectures

Martin Kroeker [Mon, 14 Dec 2020 21:40:23 +0000 (22:40 +0100)]
Add Intel Rocket Lake

Martin Kroeker [Mon, 14 Dec 2020 21:30:36 +0000 (22:30 +0100)]
Add Intel Rocket Lake

Martin Kroeker [Mon, 14 Dec 2020 18:21:52 +0000 (19:21 +0100)]
Fix undefined CC variable in clang check

Martin Kroeker [Sun, 13 Dec 2020 20:28:01 +0000 (21:28 +0100)]
Fix spurious removal of a trailing character from the hostarch string on x86_64

Martin Kroeker [Sun, 13 Dec 2020 20:22:41 +0000 (21:22 +0100)]
Merge pull request #4 from xianyi/develop

rebase

Martin Kroeker [Sun, 13 Dec 2020 20:21:34 +0000 (21:21 +0100)]
Merge pull request #3036 from RajalakshmiSR/p10copyalign

POWER10: Improve copy performance

Rajalakshmi Srinivasaraghavan [Sun, 13 Dec 2020 16:41:45 +0000 (10:41 -0600)]
POWER10: Improve copy performance

This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, it changes the store
instructions to stxv to improve performance in unaligned cases.
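The alignment idea can be sketched in portable C: peel scalar copies until the destination pointer reaches a 32-byte boundary, run the main loop on 32-byte blocks, then finish the tail. Here a memcpy of 8 floats stands in for the POWER10 vector pair load/store; scopy_aligned_sketch is a hypothetical name, and the sketch assumes contiguous (unit-stride) arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n floats from x to y, aligning the stores to 32 bytes first.
   Since float pointers are 4-byte aligned, the peel loop runs at most
   7 times. */
static void scopy_aligned_sketch(size_t n, const float *x, float *y)
{
    size_t i = 0;

    /* Peel: scalar stores until y + i sits on a 32-byte boundary. */
    while (i < n && ((uintptr_t)(y + i) % 32) != 0) {
        y[i] = x[i];
        i++;
    }
    /* Main loop: each 32-byte block store now starts aligned. */
    for (; i + 8 <= n; i += 8)
        memcpy(y + i, x + i, 8 * sizeof(float));
    /* Tail: fewer than 8 elements remain. */
    for (; i < n; i++)
        y[i] = x[i];
}
```

Only the destination is aligned; the loads from x may remain unaligned, which matches the commit's focus on the stores.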

Joshie [Sun, 13 Dec 2020 09:06:14 +0000 (09:06 +0000)]
Define BLAS acronym in README

Martin Kroeker [Sat, 12 Dec 2020 22:28:49 +0000 (23:28 +0100)]
Update version to 0.3.13.dev