platform/upstream/openblas.git
Martin Kroeker [Sat, 12 Dec 2020 22:27:40 +0000 (23:27 +0100)]
Merge pull request #3034 from xianyi/release-0.3.0

Merge back the release branch into develop to copy tag

Martin Kroeker [Sat, 12 Dec 2020 17:19:29 +0000 (18:19 +0100)]
Merge pull request #3033 from xianyi/develop

Update branch from develop to release 0.3.13

Martin Kroeker [Sat, 12 Dec 2020 17:15:33 +0000 (18:15 +0100)]
Update version to 0.3.13 for release

Martin Kroeker [Sat, 12 Dec 2020 17:14:49 +0000 (18:14 +0100)]
Update version to 0.3.13 for release

Martin Kroeker [Sat, 12 Dec 2020 17:13:23 +0000 (18:13 +0100)]
Merge pull request #3031 from martin-frbg/changelog13

Update Changelog.txt

Martin Kroeker [Sat, 12 Dec 2020 13:27:37 +0000 (14:27 +0100)]
Update Changelog.txt

Co-authored-by: h-vetinari <h.vetinari@gmx.com>

Martin Kroeker [Sat, 12 Dec 2020 09:01:45 +0000 (10:01 +0100)]
Merge pull request #3030 from martin-frbg/fix2994

Make fallback from POWER10 to POWER9 depend on new enough compiler

Martin Kroeker [Sat, 12 Dec 2020 00:25:20 +0000 (01:25 +0100)]
Update Changelog.txt for 0.3.13

Martin Kroeker [Fri, 11 Dec 2020 22:41:17 +0000 (23:41 +0100)]
Make fallback from P10 to P9 conditional on suitable compiler

Martin Kroeker [Fri, 11 Dec 2020 22:38:42 +0000 (23:38 +0100)]
Merge pull request #3 from xianyi/develop

rebase

Martin Kroeker [Fri, 11 Dec 2020 22:37:30 +0000 (23:37 +0100)]
Merge pull request #2994 from antonblanchard/power10-fixes

Power10 fixes

Martin Kroeker [Thu, 10 Dec 2020 21:49:28 +0000 (22:49 +0100)]
Merge pull request #3029 from RajalakshmiSR/axpyp10

POWER10: Improve axpy performance

Martin Kroeker [Thu, 10 Dec 2020 18:42:54 +0000 (19:42 +0100)]
Merge pull request #3021 from austinpagan/trsm_p10

POWER: Added special unrolled vectorized versions of "Solve" for specific si…

Rajalakshmi Srinivasaraghavan [Thu, 10 Dec 2020 17:51:42 +0000 (11:51 -0600)]
POWER10: Improve axpy performance

This patch aligns the stores to a 32-byte boundary for saxpy and daxpy
before entering the vector pair loop. For caxpy, it changes the store
instructions to stxv to improve performance of unaligned cases.
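The alignment idea described above can be sketched in portable C. This is only an illustration under stated assumptions, not the POWER10 kernel from this commit: the function name `saxpy_aligned_stores` is hypothetical, and the inner 8-float loop merely stands in for the vector pair stores of the real code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not the OpenBLAS POWER10 kernel:
 * computes y[i] += alpha * x[i], using a scalar prologue that runs
 * until the store address y + i reaches a 32-byte boundary. */
void saxpy_aligned_stores(size_t n, float alpha, const float *x, float *y)
{
    size_t i = 0;

    /* Prologue: peel elements until the store address is 32-byte aligned. */
    while (i < n && ((uintptr_t)(y + i) & 31u) != 0) {
        y[i] += alpha * x[i];
        i++;
    }

    /* Main loop: 8 floats (32 bytes) per iteration, so each block of
     * stores starts on a 32-byte boundary. A real kernel would use
     * vector instructions here. */
    for (; i + 8 <= n; i += 8) {
        for (size_t j = 0; j < 8; j++)
            y[i + j] += alpha * x[i + j];
    }

    /* Epilogue: remaining tail elements. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

The prologue costs at most seven scalar iterations for float data, which is why peeling tends to pay off only for longer vectors.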

Martin Kroeker [Thu, 10 Dec 2020 15:29:41 +0000 (16:29 +0100)]
Merge pull request #3026 from martin-frbg/revert747

Revert PR747 - SYRK parameter changes for Haswell and related targets

Martin Kroeker [Thu, 10 Dec 2020 15:27:30 +0000 (16:27 +0100)]
Merge pull request #3027 from gxw-loongson/develop

Add msa support for loongson

gxw [Thu, 10 Dec 2020 02:48:53 +0000 (10:48 +0800)]
Keep LOONGSON3A and LOONGSON3B for loongson

gxw [Thu, 26 Nov 2020 06:59:41 +0000 (14:59 +0800)]
Add msa support for loongson

1. Using core loongson3r3 and loongson3r4 for loongson
2. Add DYNAMIC_ARCH for loongson

Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1

Martin Kroeker [Tue, 8 Dec 2020 20:07:57 +0000 (21:07 +0100)]
Remove GEMM_DEFAULT_UNROLL_MN parameters for Haswell and ZEN (introduced in PR747)

Martin Kroeker [Tue, 8 Dec 2020 20:01:36 +0000 (21:01 +0100)]
remove extra/intermediate size step for min_jj introduced in PR747

Martin Kroeker [Tue, 8 Dec 2020 19:59:56 +0000 (20:59 +0100)]
remove extra/intermediate size step of min_jj from PR747

Martin Kroeker [Tue, 8 Dec 2020 19:53:35 +0000 (20:53 +0100)]
Merge pull request #2 from xianyi/develop

rebase

gxw [Tue, 8 Dec 2020 11:16:39 +0000 (19:16 +0800)]
Remove gcc unrecognized option '-msched-weight' when check msa

Martin Kroeker [Tue, 8 Dec 2020 08:39:27 +0000 (09:39 +0100)]
Merge pull request #3025 from TiredNotTear/develop

MIPS: Fix two bugs

Xianyi Zhang [Mon, 7 Dec 2020 08:55:05 +0000 (16:55 +0800)]
Add PingTouGe contribution credit.

Martin Kroeker [Mon, 7 Dec 2020 07:09:11 +0000 (08:09 +0100)]
Merge pull request #3022 from jinboson/develop

Fix test errors reported by cblas_cgemm & cblas_ctrmm

Hao Chen [Mon, 7 Dec 2020 02:18:51 +0000 (10:18 +0800)]
Fix failed cgemv and zgemv test case after using msa optimization

The cgemv and zgemv test cases call the cgemv_n/t_msa.c and zgemv_n/t_msa.c files in a MIPS environment.
When the macro CONJ is defined, the calculation result is wrong because OP2 is defined incorrectly.
This patch corrects the definition of OP2 so that the corresponding tests pass.

Hao Chen [Mon, 7 Dec 2020 02:04:00 +0000 (10:04 +0800)]
Fix failed sswap and dswap case by using msa optimization

The swap test case calls the sswap_msa.c and dswap_msa.c files in a MIPS environment.
When inc_x or inc_y is equal to zero, both functions compute a wrong result.
This patch adds handling for inc_x or inc_y equal to zero, and the swap test case now passes.
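As a rough illustration of the zero-increment case, the sketch below sends everything except the unit-stride case through a scalar loop that follows the element-by-element semantics of the reference BLAS swap. The function name `sswap_sketch` and its signature are assumptions made for this example; the actual MSA kernels are structured differently.

```c
#include <stddef.h>

/* Hypothetical sketch, not the MSA kernel: strided swap in which any
 * non-unit increment, including zero, takes a plain scalar loop. */
void sswap_sketch(size_t n, float *x, ptrdiff_t inc_x,
                  float *y, ptrdiff_t inc_y)
{
    if (n == 0)
        return;

    if (inc_x == 1 && inc_y == 1) {
        /* Contiguous fast path: a vectorized kernel would go here. */
        for (size_t i = 0; i < n; i++) {
            float t = x[i];
            x[i] = y[i];
            y[i] = t;
        }
        return;
    }

    /* General path. Negative increments start from the far end, as in
     * the reference BLAS. A zero increment keeps the index fixed, so
     * the same element takes part in every swap. */
    ptrdiff_t ix = inc_x < 0 ? (1 - (ptrdiff_t)n) * inc_x : 0;
    ptrdiff_t iy = inc_y < 0 ? (1 - (ptrdiff_t)n) * inc_y : 0;
    for (size_t i = 0; i < n; i++) {
        float t = x[ix];
        x[ix] = y[iy];
        y[iy] = t;
        ix += inc_x;
        iy += inc_y;
    }
}
```

With inc_x equal to zero, x[0] is exchanged with each element of y in turn, so a vector path that loads several "consecutive" x elements at once would not reproduce this result.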

Martin Kroeker [Sun, 6 Dec 2020 21:34:36 +0000 (22:34 +0100)]
Merge pull request #3024 from martin-frbg/sparc

Fix 32 and 64bit builds on SPARC with SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 18:20:50 +0000 (19:20 +0100)]
Fix compiler options for 32 and 64bit SPARC builds with SolarisStudio

Martin Kroeker [Sun, 6 Dec 2020 18:15:37 +0000 (19:15 +0100)]
Work around DOT and SWAP test failures

Martin Kroeker [Sun, 6 Dec 2020 18:14:16 +0000 (19:14 +0100)]
Fix compilation with SolarisStudio

Martin Kroeker [Sun, 6 Dec 2020 18:12:56 +0000 (19:12 +0100)]
Fix utest build with SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 18:12:02 +0000 (19:12 +0100)]
Change comments to C style for compatibility

Martin Kroeker [Sun, 6 Dec 2020 18:08:43 +0000 (19:08 +0100)]
Fix complex ABI for 32bit SolarisStudio builds

Martin Kroeker [Sun, 6 Dec 2020 18:07:45 +0000 (19:07 +0100)]
Fix hostarch detection for sparc

Martin Kroeker [Sun, 6 Dec 2020 18:05:27 +0000 (19:05 +0100)]
Fix build options for SolarisStudio compilers

Martin Kroeker [Sun, 6 Dec 2020 17:52:51 +0000 (18:52 +0100)]
Merge pull request #1 from xianyi/develop

rebase

Jin Bo [Sat, 5 Dec 2020 07:06:12 +0000 (15:06 +0800)]
Fix test errors reported by cblas_cgemm & cblas_ctrmm

The file cgemm_kernel_8x4_msa.c holds the MSA optimization
codes of cblas_cgemm and cblas_ctrmm. It defines two
macros: CGEMM_SCALE_1X2 and CGEMM_TRMM_SCALE_1X2. The pc1
array index in the two macros should be 0 and 1.

Gordon Fossum [Fri, 4 Dec 2020 23:07:06 +0000 (17:07 -0600)]
Added special unrolled vectorized versions of "Solve" for specific sizes,
in DTRSM and STRSM, to improve performance in Power9 and Power10.

Martin Kroeker [Fri, 4 Dec 2020 21:08:17 +0000 (22:08 +0100)]
Merge pull request #3018 from martin-frbg/issue3015

Avoid concurrent inclusion of libgomp and libomp in clang+gfortran builds

Martin Kroeker [Fri, 4 Dec 2020 21:07:16 +0000 (22:07 +0100)]
Merge pull request #3016 from xiegengxin/complex-asum

Improve the performance of zasum and casum with AVX512 intrinsic

Martin Kroeker [Fri, 4 Dec 2020 07:54:11 +0000 (08:54 +0100)]
Merge pull request #3013 from martin-frbg/gcc46

Fix 32bit x86 builds and add workaround for x86_64 miscompilations by gcc 4.6 (including our Travis setup)

Martin Kroeker [Fri, 4 Dec 2020 07:50:59 +0000 (08:50 +0100)]
Merge pull request #3011 from cyyever/fix_link

link math lib on FreeBSD

Martin Kroeker [Fri, 4 Dec 2020 07:49:28 +0000 (08:49 +0100)]
Merge pull request #3019 from RajalakshmiSR/dgemm_param

POWER10: Update param.h

Martin Kroeker [Thu, 3 Dec 2020 22:43:17 +0000 (23:43 +0100)]
Update f_check

Rajalakshmi Srinivasaraghavan [Thu, 3 Dec 2020 20:40:11 +0000 (14:40 -0600)]
POWER10: Update param.h

Increasing the values of DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q improves
DGEMM performance by ~10%.

Martin Kroeker [Thu, 3 Dec 2020 20:28:10 +0000 (21:28 +0100)]
Add libomp to the LAPACK(-test) dependencies in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 20:25:57 +0000 (21:25 +0100)]
Avoid linking both GNU libgomp and LLVM libomp in clang/gfortran builds

Martin Kroeker [Thu, 3 Dec 2020 13:32:21 +0000 (14:32 +0100)]
use gfortran-10 with xcode 12

Martin Kroeker [Thu, 3 Dec 2020 08:17:27 +0000 (09:17 +0100)]
Update .travis.yml

Martin Kroeker [Wed, 2 Dec 2020 22:13:13 +0000 (23:13 +0100)]
fix misplaced lines

Martin Kroeker [Wed, 2 Dec 2020 14:56:21 +0000 (15:56 +0100)]
fix gfortran requirement in osx interface64 test

Martin Kroeker [Wed, 2 Dec 2020 06:49:43 +0000 (07:49 +0100)]
Disable deprecated 32bit xcode

Gengxin Xie [Wed, 2 Dec 2020 01:51:52 +0000 (09:51 +0800)]
fix error declare function blas_level1_thread_with_return_value

Martin Kroeker [Tue, 1 Dec 2020 21:05:35 +0000 (22:05 +0100)]
Update an overlooked instance of xcode 10.0 as well

Martin Kroeker [Tue, 1 Dec 2020 11:23:30 +0000 (12:23 +0100)]
Update OSX xcode version to 11.5

Gengxin Xie [Tue, 1 Dec 2020 08:49:26 +0000 (16:49 +0800)]
Improve the performance of zasum and casum with AVX512 intrinsic

Martin Kroeker [Mon, 30 Nov 2020 20:41:51 +0000 (21:41 +0100)]
Suppress -mfma as well for gcc 4.6

Martin Kroeker [Mon, 30 Nov 2020 16:24:27 +0000 (17:24 +0100)]
Move the version check to avoid overwriting unprocessed compiler data

Martin Kroeker [Mon, 30 Nov 2020 07:18:24 +0000 (08:18 +0100)]
Merge pull request #3014 from RajalakshmiSR/dgemvnp10

POWER10: Optimize dgemv_n

Rajalakshmi Srinivasaraghavan [Sun, 29 Nov 2020 21:28:28 +0000 (15:28 -0600)]
POWER10: Optimize dgemv_n

Handling as 4x8 with vector pairs gives better performance than
the existing code on POWER10.

Martin Kroeker [Sun, 29 Nov 2020 14:33:07 +0000 (15:33 +0100)]
Add SSE flags for x86

Martin Kroeker [Sun, 29 Nov 2020 14:32:17 +0000 (15:32 +0100)]
Add workaround for gcc 4.6 miscompiling assembly kernels with -mavx

Martin Kroeker [Sun, 29 Nov 2020 12:27:47 +0000 (13:27 +0100)]
Merge pull request #3012 from martin-frbg/restore-getarch

Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 12:19:51 +0000 (13:19 +0100)]
Restore RISCV entries accidentally trashed by my PR 3005

Martin Kroeker [Sun, 29 Nov 2020 10:36:43 +0000 (11:36 +0100)]
Merge pull request #3010 from ggouaillardet/topic/fj_compilers

add Fujitsu compilers

cyy [Sun, 29 Nov 2020 09:17:07 +0000 (17:17 +0800)]
link math lib on FreeBSD

Gilles Gouaillardet [Sun, 29 Nov 2020 04:57:57 +0000 (13:57 +0900)]
add Fujitsu compilers

Co-authored-by: Tomoki Karatsu <karatsu.spack@gmail.com>

Martin Kroeker [Mon, 23 Nov 2020 07:35:32 +0000 (08:35 +0100)]
Merge pull request #3005 from martin-frbg/ssefix

Add -msse for x86 and silence build warning in getarch

Martin Kroeker [Mon, 23 Nov 2020 07:35:12 +0000 (08:35 +0100)]
Merge pull request #3004 from martin-frbg/bsd_getauxval

ARM64 DYNAMIC_ARCH build fix for BSD/OSX

Martin Kroeker [Sun, 22 Nov 2020 21:51:26 +0000 (22:51 +0100)]
Merge pull request #3002 from martin-frbg/issue3000

Ensure that all targets in a DYNAMIC_ARCH build on POWER use the same buffer size

Martin Kroeker [Sun, 22 Nov 2020 21:50:41 +0000 (22:50 +0100)]
Merge pull request #3001 from martin-frbg/issue2996

Fix ambiguous ifdefs in tests for user-defined options in Makefiles

Martin Kroeker [Sun, 22 Nov 2020 20:16:07 +0000 (21:16 +0100)]
Avoid redefinition warning

Martin Kroeker [Sun, 22 Nov 2020 20:15:08 +0000 (21:15 +0100)]
Add -msse if supported

Martin Kroeker [Sun, 22 Nov 2020 19:20:28 +0000 (20:20 +0100)]
Build fix for systems that do not support getauxval

Martin Kroeker [Sun, 22 Nov 2020 16:41:44 +0000 (17:41 +0100)]
Fix syntax mixup

Martin Kroeker [Sun, 22 Nov 2020 16:16:22 +0000 (17:16 +0100)]
Restore proper Makefile

Martin Kroeker [Sun, 22 Nov 2020 15:48:22 +0000 (16:48 +0100)]
Ensure that the same (large) BUFFERSIZE is used for all cpus in DYNAMIC_ARCH builds

Martin Kroeker [Sun, 22 Nov 2020 15:33:34 +0000 (16:33 +0100)]
Use ifneq instead of ifdef for CROSS option

Martin Kroeker [Sun, 22 Nov 2020 15:31:44 +0000 (16:31 +0100)]
Use ifeq instead of ifdef for user-definable build options

Martin Kroeker [Sun, 22 Nov 2020 15:29:56 +0000 (16:29 +0100)]
Use ifeq instead of ifdef for user-definable options

Martin Kroeker [Sun, 22 Nov 2020 15:27:17 +0000 (16:27 +0100)]
Convert ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:25:36 +0000 (16:25 +0100)]
Change ifndef CROSS to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:19:31 +0000 (16:19 +0100)]
Change ifndefs to ifneq

Martin Kroeker [Sun, 22 Nov 2020 15:17:19 +0000 (16:17 +0100)]
Merge pull request #112 from xianyi/develop

rebase

Martin Kroeker [Sun, 22 Nov 2020 11:25:33 +0000 (12:25 +0100)]
Merge pull request #2965 from epsilon-0/develop

allow setting soname without suffix or prefix

Martin Kroeker [Sun, 22 Nov 2020 11:24:13 +0000 (12:24 +0100)]
Merge pull request #2988 from xiegengxin/smp-asum

Improve the performance of dasum and sasum when SMP is defined

Martin Kroeker [Sun, 22 Nov 2020 11:22:57 +0000 (12:22 +0100)]
Merge pull request #2997 from Flamefire/reproduce_crash

Add reproducer test for crash after fork

Xianyi Zhang [Sun, 22 Nov 2020 08:05:32 +0000 (16:05 +0800)]
Merge branch 'risc-v' into develop

Xianyi Zhang [Sun, 22 Nov 2020 08:04:50 +0000 (16:04 +0800)]
Merge branch 'develop' into risc-v

Xianyi Zhang [Sun, 22 Nov 2020 08:02:19 +0000 (16:02 +0800)]
Update doc for C910.

Martin Kroeker [Fri, 20 Nov 2020 08:42:10 +0000 (09:42 +0100)]
Merge pull request #2995 from Flamefire/fix_thread_buffer_init

Don't overwrite blas_thread_buffer if already set

Alexander Grund [Thu, 19 Nov 2020 14:24:57 +0000 (15:24 +0100)]
Add reproducer test for crash after fork

See #2993 for an analysis

Alexander Grund [Thu, 19 Nov 2020 13:39:00 +0000 (14:39 +0100)]
Don't overwrite blas_thread_buffer if already set

After a fork, blas_thread_buffer may already hold allocated memory
buffers: goto_set_num_threads allocates them, and it may be called by
num_cpu_avail when the OpenBLAS NUM_THREADS differs from the OMP
thread count.
Overwriting those pointers leaks the memory, which can cause
subsequent execution of BLAS kernels to fail.

Fixes #2993
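The guard described in this commit message can be sketched as follows. This is a minimal illustration, not the OpenBLAS source: `thread_buffer_sketch`, `MAX_THREADS_SKETCH` and `ensure_thread_buffers` are hypothetical names standing in for blas_thread_buffer and its allocation code.

```c
#include <stdlib.h>

#define MAX_THREADS_SKETCH 4 /* hypothetical limit for this example */

/* Hypothetical stand-in for blas_thread_buffer: one slot per thread. */
static void *thread_buffer_sketch[MAX_THREADS_SKETCH];

/* Make sure the first nthreads slots hold a buffer, allocating only
 * where a slot is still empty. Returns the number of new allocations. */
int ensure_thread_buffers(int nthreads, size_t bytes)
{
    int allocated = 0;

    for (int i = 0; i < nthreads && i < MAX_THREADS_SKETCH; i++) {
        if (thread_buffer_sketch[i] != NULL)
            continue; /* already set: overwriting would leak the old buffer */

        thread_buffer_sketch[i] = malloc(bytes);
        if (thread_buffer_sketch[i] != NULL)
            allocated++;
    }
    return allocated;
}
```

Checking each slot before assigning makes the routine safe to call more than once, which is the situation the commit describes after a fork.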

Anton Blanchard [Thu, 19 Nov 2020 10:04:10 +0000 (21:04 +1100)]
POWER10: Use POWER9 as a fallback

If the toolchain is too old, or the MMA feature isn't set on a POWER10,
fall back to the POWER9 loops.

Anton Blanchard [Thu, 19 Nov 2020 09:50:42 +0000 (20:50 +1100)]
POWER10: Fix ld version detection

LDVERSIONGTEQ35 needs to escape the '>' character.

LDVERSIONGTEQ35 is checking the system ld version which may be different
to the toolchain being used to compile OpenBLAS. We don't have a path
to the linker in our Makefiles, so (ab)use gcc -Wl,--version to get the
version of ld in our toolchain.

Martin Kroeker [Mon, 16 Nov 2020 07:40:46 +0000 (08:40 +0100)]
Merge pull request #2981 from Qiyu8/fix-sum

Fix sum optimize issues

Martin Kroeker [Mon, 16 Nov 2020 07:38:37 +0000 (08:38 +0100)]
Merge pull request #2983 from Qiyu8/optimize-srot

Optimize the performance of rot by using universal intrinsics

Qiyu8 [Mon, 16 Nov 2020 01:14:56 +0000 (09:14 +0800)]
remove the -mfma flag when the host has AVX.