platform/upstream/openblas.git

Martin Kroeker [Thu, 18 Feb 2021 14:45:25 +0000 (15:45 +0100)]
Merge pull request #3110 from martin-frbg/issue3108

Fix get_num_procs() in the USE_TLS branch for non-glibc systems

Martin Kroeker [Thu, 18 Feb 2021 10:14:05 +0000 (11:14 +0100)]
Fix get_num_procs() in the USE_TLS branch for non-glibc systems

Martin Kroeker [Fri, 12 Feb 2021 12:29:53 +0000 (13:29 +0100)]
Merge pull request #3105 from martin-frbg/tigerlake

Recognize Intel Tiger Lake CPUID as SkylakeX

Martin Kroeker [Fri, 12 Feb 2021 12:29:23 +0000 (13:29 +0100)]
Merge pull request #3106 from RajalakshmiSR/ppcbe

Fix build issue on POWER8 with DYNAMIC_ARCH

Rajalakshmi Srinivasaraghavan [Fri, 12 Feb 2021 03:28:03 +0000 (21:28 -0600)]
Fix build issue on POWER8 with DYNAMIC_ARCH

Running make DYNAMIC_ARCH=1 on big-endian POWER8 with gcc 10.2 fails with
the following error, caused by the difference in UNROLL_M/N:
"No rule to make target 'dgemm_incopy_POWER10.o', needed by kernel"

Martin Kroeker [Thu, 11 Feb 2021 19:17:11 +0000 (20:17 +0100)]
Recognize Intel Tiger Lake as SkylakeX

Martin Kroeker [Thu, 11 Feb 2021 19:16:27 +0000 (20:16 +0100)]
Recognize Intel Tiger Lake as SkylakeX

Martin Kroeker [Thu, 11 Feb 2021 14:42:47 +0000 (15:42 +0100)]
Merge pull request #3104 from martin-frbg/issue3103

Enable optimized Haswell/AVX2 kernels for sasum/dasum and srot/drot on Ryzen

Martin Kroeker [Thu, 11 Feb 2021 14:42:18 +0000 (15:42 +0100)]
Merge pull request #3101 from jake-arkinstall/issue-3100

Addressed issue #3100 - removing an unnecessary write to the include directory

Martin Kroeker [Thu, 11 Feb 2021 08:26:15 +0000 (09:26 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:25:36 +0000 (09:25 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:24:51 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:24:16 +0000 (09:24 +0100)]
Use Haswell optimizations for Zen as well

Martin Kroeker [Thu, 11 Feb 2021 08:23:05 +0000 (09:23 +0100)]
Enable optimized srot/drot kernels from Haswell

Martin Kroeker [Thu, 11 Feb 2021 07:56:46 +0000 (08:56 +0100)]
Merge pull request #3102 from martin-frbg/issue3099

Strip pkgversion info from compiler version string before comparing

Martin Kroeker [Wed, 10 Feb 2021 13:22:59 +0000 (14:22 +0100)]
Strip parenthesized (pkgversion) data from GCC version string to avoid misinterpretation
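
The stripping step described above can be sketched in C. This is only an illustration: strip_pkgversion and parse_major_minor are hypothetical helpers, not code from OpenBLAS, and the banner in the usage note is an example of the kind of string a distribution compiler may print. Dropping the parenthesized group first prevents the digits inside it from being read as the compiler version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, dropping every parenthesized
   group (nested groups included). dst needs strlen(src) + 1 bytes. */
void strip_pkgversion(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    int depth = 0;
    for (; *src != '\0'; ++src) {
        if (*src == '(') {
            ++depth;
        } else if (*src == ')') {
            if (depth > 0) --depth;      /* a stray ')' is simply dropped */
        } else if (depth == 0) {
            *dst++ = *src;               /* keep text outside parentheses */
        }
    }
    *dst = '\0';
}

/* Hypothetical helper: read the first "major.minor" pair in s.
   Returns 1 on success, 0 if no such pair is found. */
int parse_major_minor(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    const char *p = s + strcspn(s, "0123456789");   /* skip to first digit */
    return sscanf(p, "%d.%d", major, minor) == 2;
}
```

Given a banner such as "gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0", strip_pkgversion leaves "gcc  9.3.0", and parse_major_minor then reads 9 and 3 from the real version field.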

Martin Kroeker [Wed, 10 Feb 2021 13:17:24 +0000 (14:17 +0100)]
Merge pull request #12 from xianyi/develop

rebase

Jake Arkinstall [Wed, 10 Feb 2021 12:11:17 +0000 (12:11 +0000)]
Addressed issue #3100, removing an unnecessary write to the include directory

Martin Kroeker [Tue, 2 Feb 2021 12:36:17 +0000 (13:36 +0100)]
Merge pull request #3094 from xoviat/patch-1

build openmp on appveyor

Martin Kroeker [Tue, 2 Feb 2021 12:33:15 +0000 (13:33 +0100)]
Merge pull request #3096 from martin-frbg/fixclangcmake

Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows

Martin Kroeker [Tue, 2 Feb 2021 09:53:46 +0000 (10:53 +0100)]
fix case in compiler name check

Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>

Martin Kroeker [Mon, 1 Feb 2021 20:02:53 +0000 (21:02 +0100)]
remove spurious lines (probably editor malfunction)

Martin Kroeker [Mon, 1 Feb 2021 19:18:53 +0000 (20:18 +0100)]
handle AppleClang in Cooperlake support condition

Martin Kroeker [Mon, 1 Feb 2021 18:45:25 +0000 (19:45 +0100)]
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)

xoviat [Sun, 31 Jan 2021 03:28:12 +0000 (21:28 -0600)]
appveyor: cleanup and add openmp run

Martin Kroeker [Sun, 31 Jan 2021 17:02:41 +0000 (18:02 +0100)]
Merge pull request #3073 from xoviat/embedded

add embedded option

Martin Kroeker [Sat, 30 Jan 2021 21:21:28 +0000 (22:21 +0100)]
Merge pull request #3093 from martin-frbg/fix3064

fix copy-paste error in build rules for cblas_crotg and cblas_zrotg

Martin Kroeker [Sat, 30 Jan 2021 15:46:25 +0000 (16:46 +0100)]
fix copy-paste error in build rules for cblas_crotg and cblas_zrotg

Martin Kroeker [Sat, 30 Jan 2021 15:23:37 +0000 (16:23 +0100)]
Merge pull request #3092 from RajalakshmiSR/cscal_p10

Optimize cscal function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 29 Jan 2021 19:51:43 +0000 (13:51 -0600)]
Optimize cscal function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Fri, 29 Jan 2021 12:37:23 +0000 (13:37 +0100)]
Merge pull request #3091 from martin-frbg/lapack477-2

Fix calculation of the non-exceptional shift values in LAPACK complex QZ

Martin Kroeker [Fri, 29 Jan 2021 09:45:36 +0000 (10:45 +0100)]
fix data type

Martin Kroeker [Fri, 29 Jan 2021 08:56:12 +0000 (09:56 +0100)]
fix calculation of non-exceptional shift (from Reference-LAPACK PR 477)

Martin Kroeker [Fri, 29 Jan 2021 08:52:21 +0000 (09:52 +0100)]
Merge pull request #11 from xianyi/develop

rebase

Martin Kroeker [Wed, 27 Jan 2021 18:11:55 +0000 (19:11 +0100)]
Merge pull request #3087 from martin-frbg/lapack477

Apply Reference-LAPACK PR 477 for convergence problems in CHGEQZ/ZHGEQZ

Martin Kroeker [Wed, 27 Jan 2021 12:41:45 +0000 (13:41 +0100)]
Add exceptional shift to fix rare convergence problems

Martin Kroeker [Wed, 27 Jan 2021 12:39:26 +0000 (13:39 +0100)]
Merge pull request #10 from xianyi/develop

rebase

Martin Kroeker [Wed, 27 Jan 2021 12:25:45 +0000 (13:25 +0100)]
Merge pull request #3076 from martin-frbg/dyn-thunderx

Add CI job for ARM64/gcc10 DYNAMIC_ARCH

Martin Kroeker [Tue, 26 Jan 2021 19:11:42 +0000 (20:11 +0100)]
Merge pull request #3085 from alexhenrie/memory_alloc

Fix null pointer check in blas_memory_alloc

Martin Kroeker [Tue, 26 Jan 2021 14:13:35 +0000 (15:13 +0100)]
Merge pull request #3083 from martin-frbg/develop

Add DYNAMIC_LIST support for ARM64

Martin Kroeker [Mon, 25 Jan 2021 18:02:21 +0000 (19:02 +0100)]
Remove the VORTEX support bits again for now

Martin Kroeker [Mon, 25 Jan 2021 12:13:20 +0000 (13:13 +0100)]
Add DYNAMIC_LIST support for ARM64

Alex Henrie [Mon, 25 Jan 2021 05:20:44 +0000 (22:20 -0700)]
Fix null pointer check in blas_memory_alloc

Martin Kroeker [Sun, 24 Jan 2021 22:18:52 +0000 (23:18 +0100)]
Add DYNAMIC_LIST support for ARM64

Martin Kroeker [Sun, 24 Jan 2021 22:18:01 +0000 (23:18 +0100)]
Add DYNAMIC_LIST option for ARM64

Martin Kroeker [Sun, 24 Jan 2021 22:14:45 +0000 (23:14 +0100)]
Merge pull request #9 from xianyi/develop

rebase

Martin Kroeker [Sun, 24 Jan 2021 18:03:40 +0000 (19:03 +0100)]
Merge pull request #3082 from RajalakshmiSR/scalp10

Optimize s/dscal function for POWER10

Rajalakshmi Srinivasaraghavan [Sun, 24 Jan 2021 13:48:28 +0000 (07:48 -0600)]
Optimize s/dscal function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

xoviat [Sun, 24 Jan 2021 04:12:17 +0000 (22:12 -0600)]
add functions for embedded

Martin Kroeker [Sat, 23 Jan 2021 18:08:05 +0000 (19:08 +0100)]
Merge pull request #3059 from Guobing-Chen/BF16_gemm

Initial code for Cooperlake BF16 GEMM kernel

Martin Kroeker [Sat, 23 Jan 2021 18:06:29 +0000 (19:06 +0100)]
Merge pull request #3068 from alexhenrie/scan-build

scan-build fixes

Martin Kroeker [Fri, 22 Jan 2021 07:26:00 +0000 (08:26 +0100)]
Merge pull request #3079 from RajalakshmiSR/rotp10

Optimize s/drot function for POWER10

Rajalakshmi Srinivasaraghavan [Thu, 21 Jan 2021 19:24:45 +0000 (13:24 -0600)]
Optimize s/drot function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Thu, 21 Jan 2021 07:51:30 +0000 (08:51 +0100)]
Merge pull request #3075 from martin-frbg/issue3074

Fix DYNAMIC_ARCH compilation on POWER with gcc <11

Martin Kroeker [Wed, 20 Jan 2021 20:34:36 +0000 (21:34 +0100)]
patch to support power10 in builtin_cpu_is was backported to gcc 10.2, so allow that as well
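
Taken together with the "Require gcc 11 for builtin_cpu_is(power10)" change, the resulting version rule can be sketched in C. This is a minimal illustration under stated assumptions: gcc_supports_power10_cpu_is and HAVE_POWER10_CPU_IS are hypothetical names that do not come from the OpenBLAS source, and the thresholds are only those stated in the two commit messages (GCC 11 or later, plus 10.2 or later because of the backport).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: true when a GCC of the given version is
   expected to accept __builtin_cpu_is("power10"), per the rule above. */
bool gcc_supports_power10_cpu_is(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    if (major >= 11)
        return true;                     /* GCC 11 and later */
    return major == 10 && minor >= 2;    /* backport in the 10.x series */
}

/* The same rule evaluated at preprocessing time with GCC's predefined
   version macros. HAVE_POWER10_CPU_IS is a hypothetical macro name. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ >= 11 || (__GNUC__ == 10 && __GNUC_MINOR__ >= 2))
#define HAVE_POWER10_CPU_IS 1
#else
#define HAVE_POWER10_CPU_IS 0
#endif
```

Code that probes for POWER10 at run time could then be wrapped in #if HAVE_POWER10_CPU_IS, so that older compilers never see the unsupported argument.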

Martin Kroeker [Wed, 20 Jan 2021 19:21:27 +0000 (20:21 +0100)]
Update .drone.yml

Martin Kroeker [Wed, 20 Jan 2021 17:30:05 +0000 (18:30 +0100)]
Add gcc10/arm64 DYNAMIC_ARCH build

Martin Kroeker [Wed, 20 Jan 2021 14:41:04 +0000 (15:41 +0100)]
Require gcc 11 for builtin_cpu_is(power10)

fixes #3074

Martin Kroeker [Wed, 20 Jan 2021 14:38:30 +0000 (15:38 +0100)]
Merge pull request #8 from xianyi/develop

rebase

xoviat [Tue, 19 Jan 2021 14:57:44 +0000 (08:57 -0600)]
add cortex-m platform

Martin Kroeker [Sat, 16 Jan 2021 14:47:34 +0000 (15:47 +0100)]
Merge pull request #3070 from RajalakshmiSR/cdot

Optimize cdot function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 15 Jan 2021 19:40:34 +0000 (13:40 -0600)]
Optimize cdot function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Alex Henrie [Fri, 15 Jan 2021 02:40:32 +0000 (19:40 -0700)]
Remove dead assignment to dflag in rotmg functions

Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Don't define the mode variable when not needed in gemm functions

Alex Henrie [Fri, 15 Jan 2021 02:40:31 +0000 (19:40 -0700)]
Fix uninitialized argument value in dasum_k

Martin Kroeker [Thu, 14 Jan 2021 20:35:19 +0000 (21:35 +0100)]
Merge pull request #3067 from albertziegenhagel/fix-generic-cmake

Fix building "generic" TRMM kernel with CMake

Martin Kroeker [Thu, 14 Jan 2021 15:47:59 +0000 (16:47 +0100)]
Merge pull request #3064 from martin-frbg/issue3063

Add cblas_crotg, cblas_zrotg, cblas_csrot and cblas_zdrot

Martin Kroeker [Thu, 14 Jan 2021 15:00:38 +0000 (16:00 +0100)]
Merge pull request #3066 from martin-frbg/buffsizefix

Fix compile-time setting of the GEMM buffer size for gmake builds

Martin Kroeker [Thu, 14 Jan 2021 14:59:53 +0000 (15:59 +0100)]
Merge pull request #3062 from austinpagan/GemmPreferedSize3

Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWE…

Martin Kroeker [Thu, 14 Jan 2021 14:59:21 +0000 (15:59 +0100)]
Merge pull request #3061 from martin-frbg/arm64-pgi

Support NVIDIA HPC SDK on ARM64

Martin Kroeker [Thu, 14 Jan 2021 14:56:25 +0000 (15:56 +0100)]
Merge pull request #3051 from martin-frbg/rocketlake

Add CPUID information for Intel Rocket Lake

Albert Ziegenhagel [Thu, 14 Jan 2021 09:00:49 +0000 (10:00 +0100)]
Fix building "generic" TRMM kernel with CMake

The CMake "TARGET_CORE" variable stores the "generic" target name in all-lowercase letters, but it is compared to an all-uppercase string, which results in the wrong TRMM kernel being selected.
This commit converts TARGET_CORE to uppercase before comparing its value, so that case mismatches can no longer cause this problem.
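
The actual change is in the CMake build files; the same normalize-then-compare idea is sketched below in C purely for illustration. The function names to_upper_copy and is_generic_target are hypothetical, and the literal "GENERIC" stands in for the uppercase string mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst in uppercase. At most dst_size - 1 characters are
   written and dst is always NUL-terminated. */
void to_upper_copy(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    size_t i = 0;
    for (; src[i] != '\0' && i + 1 < dst_size; ++i)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
}

/* Hypothetical check mirroring the fix: normalize the target name first,
   then compare it against the all-uppercase constant. */
bool is_generic_target(const char *target_core)
{
    char upper[32];
    to_upper_copy(target_core, upper, sizeof upper);
    return strcmp(upper, "GENERIC") == 0;
}
```

A direct strcmp("generic", "GENERIC") reports a mismatch, which corresponds to the bug described above; is_generic_target("generic") returns true because the comparison happens after normalization.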

Martin Kroeker [Wed, 13 Jan 2021 21:36:04 +0000 (22:36 +0100)]
Make compile-time BUFFERSIZE setting actually reach the compiler/preprocessor

Martin Kroeker [Wed, 13 Jan 2021 11:30:26 +0000 (12:30 +0100)]
Workaround for cmake having its own C_COMPILER variable

Martin Kroeker [Wed, 13 Jan 2021 08:46:53 +0000 (09:46 +0100)]
try to work around gcc update problems

Martin Kroeker [Tue, 12 Jan 2021 23:30:27 +0000 (00:30 +0100)]
Add prototypes for CBLAS_CROTG and CBLAS_ZROTG

Martin Kroeker [Tue, 12 Jan 2021 23:29:38 +0000 (00:29 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 23:28:43 +0000 (00:28 +0100)]
restore Makefile after accidental overwrite

Martin Kroeker [Tue, 12 Jan 2021 23:27:42 +0000 (00:27 +0100)]
Build CBLAS interfaces for CROTG and ZROTG as well

Martin Kroeker [Tue, 12 Jan 2021 22:22:00 +0000 (23:22 +0100)]
Add CBLAS interfaces for csrot and zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:20:07 +0000 (23:20 +0100)]
Add prototypes for cblas_csrot and cblas_zdrot

Martin Kroeker [Tue, 12 Jan 2021 22:02:05 +0000 (23:02 +0100)]
Merge pull request #3060 from martin-frbg/dyn_arm64

Label the assembly part of the ARMV8 dynamic arch detection as volatile

Martin Kroeker [Tue, 12 Jan 2021 15:51:35 +0000 (16:51 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:49:39 +0000 (16:49 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:47:15 +0000 (16:47 +0100)]
Add workaround for NVIDIA HPC

Martin Kroeker [Tue, 12 Jan 2021 15:39:35 +0000 (16:39 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:38:51 +0000 (16:38 +0100)]
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels

Martin Kroeker [Tue, 12 Jan 2021 15:36:12 +0000 (16:36 +0100)]
Support NVIDIA HPC compiler

Martin Kroeker [Tue, 12 Jan 2021 15:34:18 +0000 (16:34 +0100)]
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)

Martin Kroeker [Tue, 12 Jan 2021 15:32:29 +0000 (16:32 +0100)]
Support compilation with nvfortran

Gordon Fossum [Tue, 12 Jan 2021 02:13:53 +0000 (21:13 -0500)]
Added definitions for GEMM_PREFERED_SIZE and SWITCH_RATIO to the POWER9 and POWER10 specific sections of param.h.

Martin Kroeker [Mon, 11 Jan 2021 18:05:29 +0000 (19:05 +0100)]
Label get_cpu_ftr as volatile to keep gcc from rearranging the code

Chen, Guobing [Sun, 10 Jan 2021 18:15:21 +0000 (02:15 +0800)]
Initial code for Cooperlake BF16 GEMM kernel

Martin Kroeker [Sun, 10 Jan 2021 16:09:46 +0000 (17:09 +0100)]
Merge pull request #7 from xianyi/develop

rebase

Martin Kroeker [Fri, 8 Jan 2021 23:11:44 +0000 (00:11 +0100)]
Merge pull request #3055 from RajalakshmiSR/swapp10

Optimize swap function for POWER10

Rajalakshmi Srinivasaraghavan [Fri, 8 Jan 2021 14:01:36 +0000 (08:01 -0600)]
Optimize swap function for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

Martin Kroeker [Sat, 2 Jan 2021 15:14:07 +0000 (16:14 +0100)]
Merge pull request #3053 from pkubaj/patch-1

Fix build on FreeBSD/powerpc64le

pkubaj [Fri, 1 Jan 2021 21:19:57 +0000 (21:19 +0000)]
Fix build on FreeBSD/powerpc64le

Martin Kroeker [Fri, 1 Jan 2021 14:51:07 +0000 (15:51 +0100)]
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2

arm64: Fix nrm2 for input vectors with Inf

Ashwin Sekhar T K [Fri, 1 Jan 2021 10:09:40 +0000 (02:09 -0800)]
arm64: Fix nrm2 for input vectors with Inf

Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.