platform/upstream/openblas.git
Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake

Fix building "generic" TRMM kernel with CMake

Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063

Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot

Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix

Fix compile-time setting of the GEMM buffer size for gmake builds

Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3

Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…

Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi

Support NVIDIA HPC SDK on ARM64

Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake

Add CPUID information for Intel Rocket Lake

Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake

The CMake "TARGET_CORE" variable stores the "generic" target name in all lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to all uppercase before comparing its value, so that case mismatches can no longer cause this problem.
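
The fix itself lives in the CMake build files; as a language-neutral illustration of the same idea, the C sketch below normalizes a target name to upper case before comparing it. The helper names `to_upper_copy` and `is_generic_target` are hypothetical and do not appear in OpenBLAS.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity cap), converting each character to upper case.
   Hypothetical helper for illustration; the copy is truncated if src is too long. */
static void to_upper_copy(char *dst, size_t cap, const char *src)
{
    assert(cap > 0);
    size_t i = 0;
    for (; src[i] != '\0' && i + 1 < cap; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
}

/* Return 1 when target_core names the generic target, whatever its case. */
static int is_generic_target(const char *target_core)
{
    char upper[32];
    to_upper_copy(upper, sizeof upper, target_core);
    return strcmp(upper, "GENERIC") == 0;
}
```

A plain `strcmp("generic", "GENERIC")` reports a mismatch, which is the kind of case difference the commit message describes; normalizing first makes the comparison independent of how the name was spelled.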

Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor

Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable

Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems

Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG

Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite

Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:02:05 +0000 (23:02 +0100)]
Merge pull request #3060 from martin-frbg/dyn_arm64

Label the assembly part of the ARMV8 dynamic arch detection as volatile

Martin Kroeker [Tue, 12 Jan 2021 15:51:35 +0000 (16:51 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:49:39 +0000 (16:49 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:47:15 +0000 (16:47 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:39:35 +0000 (16:39 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:38:51 +0000 (16:38 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:36:12 +0000 (16:36 +0100)]
Support NVIDIA HPC compiler

Martin Kroeker [Tue, 12 Jan 2021 15:34:18 +0000 (16:34 +0100)]
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)

Martin Kroeker [Tue, 12 Jan 2021 15:32:29 +0000 (16:32 +0100)]
Support compilation with nvfortran

Gordon Fossum [Tue, 12 Jan 2021 02:13:53 +0000 (21:13 -0500)]
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.

Martin Kroeker [Mon, 11 Jan 2021 18:05:29 +0000 (19:05 +0100)]
Label get_cpu_ftr as volatile to keep gcc from rearranging the code

Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop

rebase

Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10

Optimize swap function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1

Fix build on FreeBSD/powerpc64le

pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le

Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2

arm64: Fix nrm2 for input vectors with Inf

Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf

Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.

Martin Kroeker [Tue, 29 Dec 2020 20:59:40 +0000 (21:59 +0100)]
Merge pull request #3050 from aurel32/riscv64-openblas-supported

getarch.c: define OPENBLAS_SUPPORTED for riscv64

Aurelien Jarno [Tue, 29 Dec 2020 12:06:39 +0000 (12:06 +0000)]
getarch.c: define OPENBLAS_SUPPORTED for riscv64

Martin Kroeker [Sun, 27 Dec 2020 21:54:20 +0000 (22:54 +0100)]
Merge pull request #3049 from martin-frbg/readme

Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos

Martin Kroeker [Sun, 27 Dec 2020 20:55:08 +0000 (21:55 +0100)]
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers

Martin Kroeker [Sun, 27 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Merge pull request #6 from xianyi/develop

rebase

Martin Kroeker [Sun, 27 Dec 2020 20:26:52 +0000 (21:26 +0100)]
Merge pull request #3035 from Joshua-Ashton/patch-1

Define BLAS acronym in README

Martin Kroeker [Mon, 21 Dec 2020 12:30:08 +0000 (13:30 +0100)]
Merge pull request #3048 from martin-frbg/issue2998

Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1

Martin Kroeker [Mon, 21 Dec 2020 06:45:13 +0000 (07:45 +0100)]
Temporarily revert to the old nrm2 kernels

Martin Kroeker [Mon, 21 Dec 2020 06:42:51 +0000 (07:42 +0100)]
Temporarily revert to the old nrm2 kernels

Martin Kroeker [Mon, 21 Dec 2020 06:41:18 +0000 (07:41 +0100)]
Temporarily revert to the old nrm2 kernel

Martin Kroeker [Sun, 20 Dec 2020 10:55:53 +0000 (11:55 +0100)]
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc

Support NVIDIA HPC SDK on POWERPC

Martin Kroeker [Sat, 19 Dec 2020 22:21:22 +0000 (23:21 +0100)]
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers

Martin Kroeker [Sat, 19 Dec 2020 22:19:05 +0000 (23:19 +0100)]
NVIDIA compiler does not yet support POWER10

Martin Kroeker [Sat, 19 Dec 2020 22:17:40 +0000 (23:17 +0100)]
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers

Martin Kroeker [Sat, 19 Dec 2020 22:14:02 +0000 (23:14 +0100)]
Merge pull request #3045 from martin-frbg/nvidiasdk

Support NVIDIA HPC SDK 20.11 compilers on x86_64

Martin Kroeker [Sat, 19 Dec 2020 21:15:58 +0000 (22:15 +0100)]
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA

Martin Kroeker [Sat, 19 Dec 2020 21:11:49 +0000 (22:11 +0100)]
Amend SkylakeX options to support the NVIDIA compiler

Martin Kroeker [Sat, 19 Dec 2020 21:09:57 +0000 (22:09 +0100)]
Add nvfortran

Martin Kroeker [Sat, 19 Dec 2020 21:08:37 +0000 (22:08 +0100)]
Add/modify "PGI" compiler options for NVIDIA SDK 20.11

Martin Kroeker [Sat, 19 Dec 2020 21:06:56 +0000 (22:06 +0100)]
Add version printout for PGI/NVIDIA compiler

Martin Kroeker [Sat, 19 Dec 2020 19:11:06 +0000 (20:11 +0100)]
Merge pull request #5 from xianyi/develop

rebase

Martin Kroeker [Sat, 19 Dec 2020 19:04:19 +0000 (20:04 +0100)]
Merge pull request #3042 from martin-frbg/develop

Move FMA3 option setting to the kernel makefile

Martin Kroeker [Thu, 17 Dec 2020 10:34:05 +0000 (11:34 +0100)]
Conditionally add -mfma to compiler options where needed

Martin Kroeker [Thu, 17 Dec 2020 10:32:27 +0000 (11:32 +0100)]
Move -fma option setting to kernel/Makefile.L1

Martin Kroeker [Tue, 15 Dec 2020 23:05:04 +0000 (00:05 +0100)]
Merge pull request #3040 from martin-frbg/fixfcheck

Fix undefined CC variable in check for clang+gfortran combo

Martin Kroeker [Tue, 15 Dec 2020 23:04:45 +0000 (00:04 +0100)]
Merge pull request #3038 from martin-frbg/issue3037

Fix spurious assumption of cross-compilation on some architectures

Martin Kroeker [Mon, 14 Dec 2020 21:40:23 +0000 (22:40 +0100)]
Add Intel Rocket Lake

Martin Kroeker [Mon, 14 Dec 2020 21:30:36 +0000 (22:30 +0100)]
Add Intel Rocket Lake

Martin Kroeker [Mon, 14 Dec 2020 18:21:52 +0000 (19:21 +0100)]
Fix undefined CC variable in clang check

Martin Kroeker [Sun, 13 Dec 2020 20:28:01 +0000 (21:28 +0100)]
Fix spurious removal of a trailing character from the hostarch string on x86_64

Martin Kroeker [Sun, 13 Dec 2020 20:22:41 +0000 (21:22 +0100)]
Merge pull request #4 from xianyi/develop

rebase

Martin Kroeker [Sun, 13 Dec 2020 20:21:34 +0000 (21:21 +0100)]
Merge pull request #3036 from RajalakshmiSR/p10copyalign

POWER10: Improve copy performance

Rajalakshmi Srinivasaraghavan [Sun, 13 Dec 2020 16:41:45 +0000 (10:41 -0600)]
POWER10: Improve copy performance

This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, the store
instructions are changed to stxv to improve performance in unaligned cases.
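
The alignment strategy described above can be sketched in portable C, assuming a "head, body, tail" structure: scalar copies until the destination is 32-byte aligned, then whole 32-byte blocks, then the remainder. The function name `copy_aligned_stores` and the constant `BLOCK_BYTES` are hypothetical, and the inner loop stands in for the POWER10 vector pair store that the real kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 32u   /* assumed width of one vector pair store */

/* Copy n doubles from x to y so that the block loop only ever stores
   to 32-byte-aligned addresses (illustration, not the OpenBLAS kernel). */
static void copy_aligned_stores(size_t n, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    const size_t per_block = BLOCK_BYTES / sizeof(double);
    size_t i = 0;

    /* Head: scalar copies until the destination reaches a 32-byte boundary. */
    while (i < n && ((uintptr_t)(y + i) % BLOCK_BYTES) != 0) {
        y[i] = x[i];
        i++;
    }

    /* Body: whole 32-byte blocks; a real kernel would issue one
       vector pair store per block here. */
    for (; i + per_block <= n; i += per_block)
        for (size_t k = 0; k < per_block; k++)
            y[i + k] = x[i + k];

    /* Tail: whatever is left after the last full block. */
    for (; i < n; i++)
        y[i] = x[i];
}
```

Only the destination is aligned in this sketch; the source may still be unaligned, which matches the commit's focus on store alignment.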

Joshie [Sun, 13 Dec 2020 09:06:14 +0000 (09:06 +0000)]
Define BLAS acronym in README

Martin Kroeker [Sat, 12 Dec 2020 22:28:49 +0000 (23:28 +0100)]
Update version to 0.3.13.dev

Martin Kroeker [Sat, 12 Dec 2020 22:28:20 +0000 (23:28 +0100)]
Update version to 0.3.13.dev

Martin Kroeker [Sat, 12 Dec 2020 22:27:40 +0000 (23:27 +0100)]
Merge pull request #3034 from xianyi/release-0.3.0

Merge back the release branch into develop to copy tag

Martin Kroeker [Sat, 12 Dec 2020 17:19:29 +0000 (18:19 +0100)]
Merge pull request #3033 from xianyi/develop

Update branch from develop to release 0.3.13

Martin Kroeker [Sat, 12 Dec 2020 17:15:33 +0000 (18:15 +0100)]
Update version to 0.3.13 for release

Martin Kroeker [Sat, 12 Dec 2020 17:14:49 +0000 (18:14 +0100)]
Update version to 0.3.13 for release

Martin Kroeker [Sat, 12 Dec 2020 17:13:23 +0000 (18:13 +0100)]
Merge pull request #3031 from martin-frbg/changelog13

Update Changelog.txt

Martin Kroeker [Sat, 12 Dec 2020 13:27:37 +0000 (14:27 +0100)]
Update Changelog.txt

Co-authored-by: h-vetinari <h.vetinari@gmx.com>

Martin Kroeker [Sat, 12 Dec 2020 09:01:45 +0000 (10:01 +0100)]
Merge pull request #3030 from martin-frbg/fix2994

Make fallback from POWER10 to POWER9 depend on new enough compiler

Martin Kroeker [Sat, 12 Dec 2020 00:25:20 +0000 (01:25 +0100)]
Update Changelog.txt for 0.3.13

Martin Kroeker [Fri, 11 Dec 2020 22:41:17 +0000 (23:41 +0100)]
Make fallback from P10 to P9 conditional on suitable compiler

Martin Kroeker [Fri, 11 Dec 2020 22:38:42 +0000 (23:38 +0100)]
Merge pull request #3 from xianyi/develop

rebase

Martin Kroeker [Fri, 11 Dec 2020 22:37:30 +0000 (23:37 +0100)]
Merge pull request #2994 from antonblanchard/power10-fixes

Power10 fixes

Martin Kroeker [Thu, 10 Dec 2020 21:49:28 +0000 (22:49 +0100)]
Merge pull request #3029 from RajalakshmiSR/axpyp10

POWER10: Improve axpy performance

Martin Kroeker [Thu, 10 Dec 2020 18:42:54 +0000 (19:42 +0100)]
Merge pull request #3021 from austinpagan/trsm_p10

POWER: Added special unrolled vectorized versions of "Solve" for specific si…

Rajalakshmi Srinivasaraghavan [Thu, 10 Dec 2020 17:51:42 +0000 (11:51 -0600)]
POWER10: Improve axpy performance

This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, the store
instructions are changed to stxv to improve performance in unaligned cases.

Martin Kroeker [Thu, 10 Dec 2020 15:29:41 +0000 (16:29 +0100)]
Merge pull request #3026 from martin-frbg/revert747

Revert PR747 - SYRK parameter changes for Haswell and related targets

Martin Kroeker [Thu, 10 Dec 2020 15:27:30 +0000 (16:27 +0100)]
Merge pull request #3027 from gxw-loongson/develop

Add msa support for loongson

gxw [Thu, 10 Dec 2020 02:48:53 +0000 (10:48 +0800)]
Keep LOONGSON3A and LOONGSON3B for loongson

gxw [Thu, 26 Nov 2020 06:59:41 +0000 (14:59 +0800)]
Add msa support for loongson

1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson

Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1

Martin Kroeker [Tue, 8 Dec 2020 20:07:57 +0000 (21:07 +0100)]
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)

Martin Kroeker [Tue, 8 Dec 2020 20:01:36 +0000 (21:01 +0100)]
remove extra/intermediate size step for min_jj introduced in PR747

Martin Kroeker [Tue, 8 Dec 2020 19:59:56 +0000 (20:59 +0100)]
remove extra/intermediate size step of min_jj from PR747

Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop

rebase

gxw [Tue, 8 Dec 2020 11:16:39 +0000 (19:16 +0800)]
Remove gcc unrecognized option '-msched-weight' when check msa

Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop

MIPS: Fix two bugs

Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.

Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop

Fix test errors reported by cblas_cgemm & cblas_ctrmm

Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failed cgemv and zgemv test case after using msa optimization

The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the value of OP2, and the corresponding tests now pass.

Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failed sswap and dswap case by using msa optimization

The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, the result computed by the two functions is wrong.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
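
To show why a zero increment needs separate handling, here is a minimal C sketch of a sequential, reference-style strided swap. With a zero increment the same element is touched on every iteration, so the result depends on processing the elements strictly one after another; a kernel that loads several elements at once may not reproduce that. The function name `swap_strided` is hypothetical, and the sketch assumes the usual BLAS convention that a negative increment starts from the far end of the vector.

```c
#include <assert.h>
#include <stddef.h>

/* Swap n elements of x and y with strides inc_x and inc_y, one element
   at a time (illustration only, not the MSA kernel). */
static void swap_strided(ptrdiff_t n, double *x, ptrdiff_t inc_x,
                         double *y, ptrdiff_t inc_y)
{
    assert(n <= 0 || (x != NULL && y != NULL));
    if (n <= 0) return;

    /* Negative strides begin at the last logical element. */
    ptrdiff_t ix = inc_x < 0 ? (1 - n) * inc_x : 0;
    ptrdiff_t iy = inc_y < 0 ? (1 - n) * inc_y : 0;

    for (ptrdiff_t i = 0; i < n; i++) {
        double tmp = x[ix];
        x[ix] = y[iy];
        y[iy] = tmp;
        ix += inc_x;   /* stays put when inc_x == 0 */
        iy += inc_y;
    }
}
```

For example, with x = {10}, inc_x = 0 and y = {1, 2, 3}, inc_y = 1, the single x element is exchanged with each y element in turn, leaving x = {3} and y = {10, 1, 2}.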

Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc

Fix 32 and 64bit builds on SPARC with SolarisStudio compilers