platform/upstream/openblas.git
3 years agoRefs #2899
Xianyi Zhang [Tue, 10 Nov 2020 01:38:04 +0000 (09:38 +0800)]
Refs #2899
Merge branch 'openblas-open-910' of git://github.com/damonyu1989/OpenBLAS into damonyu1989-openblas-open-910

3 years agoMerge branch 'develop' into risc-v
Xianyi Zhang [Tue, 10 Nov 2020 01:18:25 +0000 (09:18 +0800)]
Merge branch 'develop' into risc-v

3 years agoMerge pull request #2972 from xiegengxin/rot-intrinsic
Martin Kroeker [Sun, 8 Nov 2020 21:43:00 +0000 (22:43 +0100)]
Merge pull request #2972 from xiegengxin/rot-intrinsic

Improve the performance of rot by using AVX512 and AVX2 intrinsic

3 years agoMerge pull request #2980 from martin-frbg/fixgetarch
Martin Kroeker [Sun, 8 Nov 2020 16:39:05 +0000 (17:39 +0100)]
Merge pull request #2980 from martin-frbg/fixgetarch

Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

3 years agoFix missing AVX2 and FMA3 capabilities in FORCE_target mode
Martin Kroeker [Sun, 8 Nov 2020 12:15:40 +0000 (13:15 +0100)]
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode

3 years agoMerge pull request #2979 from RajalakshmiSR/dot_power10
Martin Kroeker [Sun, 8 Nov 2020 09:19:34 +0000 (10:19 +0100)]
Merge pull request #2979 from RajalakshmiSR/dot_power10

Optimize sdot/ddot for POWER10

3 years agoMerge pull request #2978 from martin-frbg/fixdynfeatures
Martin Kroeker [Sun, 8 Nov 2020 09:19:17 +0000 (10:19 +0100)]
Merge pull request #2978 from martin-frbg/fixdynfeatures

Fix handling of cpu capability flags in DYNAMIC_ARCH builds

3 years agoStay compatible with old gmake that did not support undefine
Martin Kroeker [Sat, 7 Nov 2020 23:12:55 +0000 (00:12 +0100)]
Stay compatible with old gmake that did not support undefine

3 years agoUpdate Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 23:01:36 +0000 (00:01 +0100)]
Update Makefile.system

3 years agoUpdate Makefile.system
Martin Kroeker [Sat, 7 Nov 2020 22:37:21 +0000 (23:37 +0100)]
Update Makefile.system

3 years agoOptimize sdot/ddot for POWER10
Rajalakshmi Srinivasaraghavan [Sat, 7 Nov 2020 21:21:58 +0000 (15:21 -0600)]
Optimize sdot/ddot for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

3 years agoRemove previous workaround for compiler flags related to cpu capabilities in x86_64...
Martin Kroeker [Sat, 7 Nov 2020 19:39:56 +0000 (20:39 +0100)]
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds

3 years agoReset cpu property flags between build cycles in DYNAMIC_ARCH mode
Martin Kroeker [Sat, 7 Nov 2020 19:37:03 +0000 (20:37 +0100)]
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode

3 years agoFix propagation of cpu properties to compiler options
Martin Kroeker [Sat, 7 Nov 2020 19:30:15 +0000 (20:30 +0100)]
Fix propagation of cpu properties to compiler options

3 years agoRemove extraneous quotes that caused a cmake policy warning
Martin Kroeker [Sat, 7 Nov 2020 19:27:42 +0000 (20:27 +0100)]
Remove extraneous quotes that caused a cmake policy warning

3 years agoFix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH...
Martin Kroeker [Sat, 7 Nov 2020 19:26:12 +0000 (20:26 +0100)]
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds

3 years agoMerge pull request #110 from xianyi/develop
Martin Kroeker [Sat, 7 Nov 2020 19:22:41 +0000 (20:22 +0100)]
Merge pull request #110 from xianyi/develop

rebase

3 years agoMerge pull request #2977 from martin-frbg/issue2976
Martin Kroeker [Sat, 7 Nov 2020 13:41:34 +0000 (14:41 +0100)]
Merge pull request #2977 from martin-frbg/issue2976

Fix macro name used in ifdef for POWERPC/PGI

3 years agoFix macro name used in ifdef
Martin Kroeker [Sat, 7 Nov 2020 11:17:49 +0000 (12:17 +0100)]
Fix macro name used in ifdef

3 years agofix typo
Gengxin Xie [Thu, 5 Nov 2020 08:25:17 +0000 (16:25 +0800)]
fix typo

3 years agoImprove the performance of rot by using AVX512 and AVX2 intrinsic
Gengxin Xie [Sun, 27 Sep 2020 02:38:19 +0000 (10:38 +0800)]
Improve the performance of rot by using AVX512 and AVX2 intrinsic
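
For context, rot applies a plane rotation to two vectors: x[i] becomes c*x[i] + s*y[i] and y[i] becomes c*y[i] - s*x[i]. The following is only a minimal sketch of how such a loop could be vectorized with 256-bit intrinsics; it is not the kernel from this commit. The name srot_sketch is hypothetical, and unit stride is assumed.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper, not the OpenBLAS kernel.
 * Applies x[i] = c*x[i] + s*y[i], y[i] = c*y[i] - s*x[i]
 * to unit-stride float vectors of length n. */
static void srot_sketch(size_t n, float *x, float *y, float c, float s)
{
    size_t i = 0;
    assert(n == 0 || (x != NULL && y != NULL));
#ifdef __AVX2__
    /* Vector path, compiled only when the compiler targets AVX2.
     * (The float operations used here are already part of AVX.) */
    const __m256 vc = _mm256_set1_ps(c);
    const __m256 vs = _mm256_set1_ps(s);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        __m256 nx = _mm256_add_ps(_mm256_mul_ps(vc, vx), _mm256_mul_ps(vs, vy));
        __m256 ny = _mm256_sub_ps(_mm256_mul_ps(vc, vy), _mm256_mul_ps(vs, vx));
        _mm256_storeu_ps(x + i, nx);
        _mm256_storeu_ps(y + i, ny);
    }
#endif
    /* Scalar path: handles the tail, or everything when AVX2 is not enabled. */
    for (; i < n; i++) {
        float xi = x[i], yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```

Each vector iteration handles eight floats, and the scalar loop picks up whatever remains, so the function behaves the same for any n.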

3 years agoMerge pull request #2966 from martin-frbg/issue2964
Martin Kroeker [Wed, 4 Nov 2020 15:02:46 +0000 (16:02 +0100)]
Merge pull request #2966 from martin-frbg/issue2964

Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds

3 years agoExport NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
Martin Kroeker [Tue, 3 Nov 2020 22:47:04 +0000 (23:47 +0100)]
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target

3 years agoAdd -msse3 where needed for DYNAMIC_ARCH builds
Martin Kroeker [Tue, 3 Nov 2020 22:45:49 +0000 (23:45 +0100)]
Add -msse3 where needed for DYNAMIC_ARCH builds

3 years agoFix target test
Martin Kroeker [Mon, 2 Nov 2020 22:17:46 +0000 (23:17 +0100)]
Fix target test

3 years agoAdd files via upload
Martin Kroeker [Mon, 2 Nov 2020 21:43:50 +0000 (22:43 +0100)]
Add files via upload

3 years agoMerge pull request #2967 from RajalakshmiSR/dgemm88
Martin Kroeker [Mon, 2 Nov 2020 17:54:36 +0000 (18:54 +0100)]
Merge pull request #2967 from RajalakshmiSR/dgemm88

POWER10: Change dgemm unroll factors

3 years agotypo fix
Martin Kroeker [Sun, 1 Nov 2020 21:25:43 +0000 (22:25 +0100)]
typo fix

3 years agoDisable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
Martin Kroeker [Sun, 1 Nov 2020 21:11:48 +0000 (22:11 +0100)]
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target

3 years agoDisable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
Martin Kroeker [Sun, 1 Nov 2020 20:58:26 +0000 (21:58 +0100)]
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC

NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default

3 years agoMerge pull request #2962 from brada4/develop
Martin Kroeker [Sun, 1 Nov 2020 13:24:40 +0000 (14:24 +0100)]
Merge pull request #2962 from brada4/develop

add openbsd 68+ gfortran name

3 years agoMerge pull request #2963 from martin-frbg/issue2959
Martin Kroeker [Sun, 1 Nov 2020 08:14:54 +0000 (09:14 +0100)]
Merge pull request #2963 from martin-frbg/issue2959

Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode

3 years agoPOWER10: Change dgemm unroll factors
Rajalakshmi Srinivasaraghavan [Sat, 31 Oct 2020 23:28:57 +0000 (18:28 -0500)]
POWER10: Change dgemm unroll factors

Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.

3 years agoReunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
Martin Kroeker [Sat, 31 Oct 2020 23:00:43 +0000 (00:00 +0100)]
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH

3 years agoadd openbsd gfortran
User User-User [Sat, 31 Oct 2020 22:38:08 +0000 (00:38 +0200)]
add openbsd gfortran

3 years agoMerge pull request #109 from xianyi/develop
Martin Kroeker [Sat, 31 Oct 2020 21:33:52 +0000 (22:33 +0100)]
Merge pull request #109 from xianyi/develop

rebase

3 years agoMerge pull request #2960 from thrasibule/avx2_detection
Martin Kroeker [Sat, 31 Oct 2020 19:24:21 +0000 (20:24 +0100)]
Merge pull request #2960 from thrasibule/avx2_detection

fix avx2 detection

3 years agofix avx2 detection
Guillaume Horel [Sat, 31 Oct 2020 14:00:48 +0000 (10:00 -0400)]
fix avx2 detection

reword commits to make it clearer

3 years agoMerge pull request #2956 from RajalakshmiSR/caxpy_p10
Martin Kroeker [Fri, 30 Oct 2020 07:54:10 +0000 (08:54 +0100)]
Merge pull request #2956 from RajalakshmiSR/caxpy_p10

Optimize caxpy for POWER10

3 years agoOptimize caxpy for POWER10
Rajalakshmi Srinivasaraghavan [Thu, 29 Oct 2020 19:57:51 +0000 (14:57 -0500)]
Optimize caxpy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

3 years agoMerge pull request #2940 from Qiyu8/optimize-benchmark
Martin Kroeker [Thu, 29 Oct 2020 19:28:37 +0000 (20:28 +0100)]
Merge pull request #2940 from Qiyu8/optimize-benchmark

Refactor the performance measurement system

3 years agoMerge pull request #2954 from Guobing-Chen/BF16_gemv_support
Martin Kroeker [Thu, 29 Oct 2020 08:22:33 +0000 (09:22 +0100)]
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support

Implementation of BF16 based gemv

3 years agoMerge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Martin Kroeker [Thu, 29 Oct 2020 08:22:07 +0000 (09:22 +0100)]
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue

Fix cooperlake compile issue

3 years agoFix cooperlake compile issue
Chen, Guobing [Wed, 28 Oct 2020 19:37:51 +0000 (03:37 +0800)]
Fix cooperlake compile issue

Add a macro that Makefile.x86_64 requires after the recent cleanup;
its absence caused the cooperlake platform build to fail.

3 years agoImplementation of BF16 based gemv
Chen, Guobing [Wed, 28 Oct 2020 00:49:12 +0000 (08:49 +0800)]
Implementation of BF16 based gemv

1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
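
As a rough illustration of item 2, a generic bfloat16 gemv can widen each bfloat16 input to float and accumulate in single precision. This is a sketch under stated assumptions, not the sbgemv kernel from this commit: the bf16_t typedef, the function names and the simplified argument list (no transpose flag, unit increments, column-major A) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* raw bfloat16 bits; hypothetical typedef */

/* bfloat16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float bf16_to_float(bf16_t v)
{
    uint32_t bits = (uint32_t)v << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* y = alpha * A * x + beta * y
 * A is m-by-n, column-major, leading dimension lda, stored as bfloat16.
 * x has n bfloat16 entries; y has m float entries. */
static void sbgemv_sketch(size_t m, size_t n, float alpha,
                          const bf16_t *a, size_t lda,
                          const bf16_t *x, float beta, float *y)
{
    assert(lda >= m);
    for (size_t i = 0; i < m; i++)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    for (size_t j = 0; j < n; j++) {
        float xj = alpha * bf16_to_float(x[j]);
        for (size_t i = 0; i < m; i++)
            y[i] += bf16_to_float(a[i + j * lda]) * xj;
    }
}
```

Accumulating in float keeps the output at single precision even though the inputs carry only 8 significand bits.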

3 years agoMerge pull request #2939 from thrasibule/Makefile_cleanup
Martin Kroeker [Wed, 28 Oct 2020 08:38:40 +0000 (09:38 +0100)]
Merge pull request #2939 from thrasibule/Makefile_cleanup

reuse variables defined in Makefile.system

3 years agoMerge pull request #2951 from martin-frbg/cleanup_make
Martin Kroeker [Wed, 28 Oct 2020 08:37:56 +0000 (09:37 +0100)]
Merge pull request #2951 from martin-frbg/cleanup_make

Minor Makefile cleanup

3 years agoMerge pull request #2952 from martin-frbg/issue2931
Martin Kroeker [Wed, 28 Oct 2020 08:37:32 +0000 (09:37 +0100)]
Merge pull request #2952 from martin-frbg/issue2931

Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails

3 years agoMerge pull request #2948 from martin-frbg/issue2947
Martin Kroeker [Wed, 28 Oct 2020 08:37:09 +0000 (09:37 +0100)]
Merge pull request #2948 from martin-frbg/issue2947

Expressly enable neon for use with intrinsics if available

3 years agoOutput predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
Martin Kroeker [Tue, 27 Oct 2020 22:01:19 +0000 (23:01 +0100)]
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET

3 years agoTry to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
Martin Kroeker [Tue, 27 Oct 2020 16:51:32 +0000 (17:51 +0100)]
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
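
A minimal sketch of what such a fallback could look like, assuming the CPU ID is exposed as a hexadecimal MIDR value in a text file below the cpu0 directory (on arm64 Linux this is typically regs/identification/midr_el1, which the commit message itself does not name). The function names are hypothetical, and this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse text such as "0x00000000410fd034\n"; returns 0 on failure. */
static uint64_t parse_midr_text(const char *text)
{
    unsigned long long v = 0;
    assert(text != NULL);
    if (sscanf(text, "%llx", &v) != 1)
        return 0;
    return (uint64_t)v;
}

/* Read the MIDR value from a sysfs file; returns 0 if it cannot be read.
 * The caller supplies the path (assumed location, see above). */
static uint64_t read_midr_from_sysfs(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return parse_midr_text(buf);
}

/* MIDR_EL1 layout: implementer in bits 31:24, part number in bits 15:4. */
static unsigned midr_implementer(uint64_t midr)
{
    return (unsigned)((midr >> 24) & 0xff);
}

static unsigned midr_part(uint64_t midr)
{
    return (unsigned)((midr >> 4) & 0xfff);
}
```

A caller would try the HWCAP_CPUID route first and only call read_midr_from_sysfs when that fails; a return value of 0 means neither source was usable.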

3 years agoMerge pull request #2950 from RajalakshmiSR/saxpy
Martin Kroeker [Mon, 26 Oct 2020 23:02:18 +0000 (00:02 +0100)]
Merge pull request #2950 from RajalakshmiSR/saxpy

Optimize saxpy for POWER10

3 years agoRemove debug printout of object list
Martin Kroeker [Mon, 26 Oct 2020 20:37:04 +0000 (21:37 +0100)]
Remove debug printout of object list

3 years agoRemove spurious expr in flang version check
Martin Kroeker [Mon, 26 Oct 2020 20:35:40 +0000 (21:35 +0100)]
Remove spurious expr in flang version check

3 years agoOptimize saxpy for POWER10
Rajalakshmi Srinivasaraghavan [Mon, 26 Oct 2020 18:24:59 +0000 (13:24 -0500)]
Optimize saxpy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores.

3 years agoRefactoring remaining benchmark cases.
Qiyu8 [Mon, 26 Oct 2020 02:25:05 +0000 (10:25 +0800)]
Refactoring remaining benchmark cases.

3 years agoMerge pull request #2946 from martin-frbg/issue2945
Martin Kroeker [Sun, 25 Oct 2020 23:43:44 +0000 (00:43 +0100)]
Merge pull request #2946 from martin-frbg/issue2945

Move definitions that are neither needed nor supported on Solaris

3 years agoExpressly enable neon for use with intrinsics if available
Martin Kroeker [Sun, 25 Oct 2020 23:21:56 +0000 (00:21 +0100)]
Expressly enable neon for use with intrinsics if available

3 years agoMove definitions that are neither needed nor supported on SUNOS
Martin Kroeker [Sun, 25 Oct 2020 11:01:50 +0000 (12:01 +0100)]
Move definitions that are neither needed nor supported on SUNOS

3 years agoUpdate version to 0.3.12.dev
Martin Kroeker [Sat, 24 Oct 2020 21:29:05 +0000 (23:29 +0200)]
Update version to 0.3.12.dev

3 years agoUpdate version to 0.3.12.dev
Martin Kroeker [Sat, 24 Oct 2020 21:28:29 +0000 (23:28 +0200)]
Update version to 0.3.12.dev

3 years agoMerge pull request #2944 from xianyi/release-0.3.0
Martin Kroeker [Sat, 24 Oct 2020 11:10:51 +0000 (13:10 +0200)]
Merge pull request #2944 from xianyi/release-0.3.0

Merge back 0.3.12 tag (and Changelog typo fixes) from release

3 years agoFix typos
Martin Kroeker [Sat, 24 Oct 2020 11:03:28 +0000 (13:03 +0200)]
Fix typos

3 years agoMerge pull request #2943 from xianyi/develop
Martin Kroeker [Sat, 24 Oct 2020 10:52:59 +0000 (12:52 +0200)]
Merge pull request #2943 from xianyi/develop

Merge from develop for 0.3.12 release

3 years agoUpdate Changelog with 0.3.12 changes
Martin Kroeker [Sat, 24 Oct 2020 10:50:04 +0000 (12:50 +0200)]
Update Changelog with 0.3.12 changes

3 years agoUpdate version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 10:15:33 +0000 (12:15 +0200)]
Update version to 0.3.12 for release

3 years agoUpdate version to 0.3.12 for release
Martin Kroeker [Sat, 24 Oct 2020 10:14:45 +0000 (12:14 +0200)]
Update version to 0.3.12 for release

3 years agoMerge pull request #2942 from martin-frbg/makebuildtypes
Martin Kroeker [Sat, 24 Oct 2020 07:26:50 +0000 (09:26 +0200)]
Merge pull request #2942 from martin-frbg/makebuildtypes

Comment out BUILD_SINGLE etc. in Makefile.rule and add a short explanation

3 years agoComment out BUILD_SINGLE etc. and add a short explanation
Martin Kroeker [Fri, 23 Oct 2020 21:32:06 +0000 (23:32 +0200)]
Comment out BUILD_SINGLE etc. and add a short explanation

3 years agoMerge pull request #2941 from martin-frbg/exportsfix
Martin Kroeker [Fri, 23 Oct 2020 18:47:35 +0000 (20:47 +0200)]
Merge pull request #2941 from martin-frbg/exportsfix

Fix grouping of sladiv1/dladiv1/ilaenv2stage in gensymbol

3 years agoFix wrong grouping of dcombssq
Martin Kroeker [Fri, 23 Oct 2020 13:53:40 +0000 (15:53 +0200)]
Fix wrong grouping of dcombssq

3 years agofix missing split of sladiv1/dladiv/ilaenv2stage by build type
Martin Kroeker [Fri, 23 Oct 2020 13:31:25 +0000 (15:31 +0200)]
fix missing split of sladiv1/dladiv/ilaenv2stage by build type

3 years agoMerge pull request #108 from xianyi/develop
Martin Kroeker [Fri, 23 Oct 2020 13:29:48 +0000 (15:29 +0200)]
Merge pull request #108 from xianyi/develop

rebase

3 years agoMerge pull request #2937 from martin-frbg/pwr-buffersz
Martin Kroeker [Fri, 23 Oct 2020 05:15:32 +0000 (07:15 +0200)]
Merge pull request #2937 from martin-frbg/pwr-buffersz

Increase and unify BUFFERSIZE on POWER; fix gcc inline warning

3 years agoRefactor the performance measurement system
Qiyu8 [Fri, 23 Oct 2020 02:32:03 +0000 (10:32 +0800)]
Refactor the performance measurement system

3 years agoreuse variables defined in Makefile.system
Guillaume Horel [Fri, 23 Oct 2020 02:00:00 +0000 (22:00 -0400)]
reuse variables defined in Makefile.system

3 years agoMerge pull request #2938 from martin-frbg/2934-3
Martin Kroeker [Thu, 22 Oct 2020 22:19:49 +0000 (00:19 +0200)]
Merge pull request #2938 from martin-frbg/2934-3

Fix twisted spelling that broke the gfortran version test again

3 years agoFix twisted spelling that broke the gfortran version test again
Martin Kroeker [Thu, 22 Oct 2020 22:18:29 +0000 (00:18 +0200)]
Fix twisted spelling that broke the gfortran version test again

3 years agoIncrease BUFFERSIZE further
Martin Kroeker [Thu, 22 Oct 2020 22:12:06 +0000 (00:12 +0200)]
Increase BUFFERSIZE further

3 years agolabel always_inline function as inline to silence a gcc warning
Martin Kroeker [Thu, 22 Oct 2020 20:14:26 +0000 (22:14 +0200)]
label always_inline function as inline to silence a gcc warning
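
For illustration, GCC can warn that an always_inline function might not be inlinable when the attribute is used without the inline keyword. A minimal sketch of the corrected form, using a hypothetical helper and the GCC/Clang attribute syntax:

```c
/* Hypothetical helper; the attribute syntax is specific to GCC and Clang.
 * Declaring the function "inline" as well as always_inline avoids the
 * warning that the attribute alone can trigger. */
static inline __attribute__((always_inline)) int clamp_index_sketch(int i, int n)
{
    if (i < 0)
        return 0;
    if (i >= n)
        return n - 1;
    return i;
}
```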

3 years agoMerge pull request #2936 from martin-frbg/issue2934-2
Martin Kroeker [Thu, 22 Oct 2020 20:08:46 +0000 (22:08 +0200)]
Merge pull request #2936 from martin-frbg/issue2934-2

Fix compiler version check for -mavx2 support (DYNAMIC_ARCH case)

3 years agoMerge pull request #2935 from martin-frbg/lapack458
Martin Kroeker [Thu, 22 Oct 2020 17:25:58 +0000 (19:25 +0200)]
Merge pull request #2935 from martin-frbg/lapack458

Fix macro used in argument conversion (LAPACK PR 458)

3 years agoIncrease BUFFERSIZE for POWER8-10 and use same value for POWER6
Martin Kroeker [Thu, 22 Oct 2020 16:47:07 +0000 (18:47 +0200)]
Increase BUFFERSIZE for POWER8-10 and use same value for POWER6

to fix overflow warning for PWR8 ZGEMM and PWR9 C/ZGEMM and avoid size mismatches in DYNAMIC_ARCH

3 years agoFix compiler version check
Martin Kroeker [Thu, 22 Oct 2020 14:23:29 +0000 (16:23 +0200)]
Fix compiler version check

3 years agoMerge pull request #106 from xianyi/develop
Martin Kroeker [Thu, 22 Oct 2020 14:21:09 +0000 (16:21 +0200)]
Merge pull request #106 from xianyi/develop

rebase

3 years agoFix macro used in argument conversion (LAPACK PR 458)
Martin Kroeker [Thu, 22 Oct 2020 14:19:26 +0000 (16:19 +0200)]
Fix macro used in argument conversion (LAPACK PR 458)

3 years agoMerge pull request #2932 from RajalakshmiSR/copyp10
Martin Kroeker [Wed, 21 Oct 2020 22:29:46 +0000 (00:29 +0200)]
Merge pull request #2932 from RajalakshmiSR/copyp10

Optimize scopy/ccopy for POWER10

3 years agoMerge pull request #2934 from thrasibule/improve_version_check
Martin Kroeker [Wed, 21 Oct 2020 22:29:02 +0000 (00:29 +0200)]
Merge pull request #2934 from thrasibule/improve_version_check

actually check that version is greater than 4.7

3 years agoactually check that version is greater than 4.7
Guillaume Horel [Wed, 21 Oct 2020 20:42:37 +0000 (16:42 -0400)]
actually check that version is greater than 4.7
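
The actual check lives in the Makefiles; the comparison logic can be sketched in C as below. The function name is hypothetical, and treating 4.7 itself as acceptable is an assumption, since the commit message does not say whether the bound is inclusive.

```c
#include <assert.h>

/* Returns 1 when major.minor is at least req_major.req_minor, else 0.
 * Components are compared as integers, so 10.2 correctly ranks above 4.7,
 * which a string or decimal comparison would get wrong. */
static int version_at_least(int major, int minor, int req_major, int req_minor)
{
    assert(major >= 0 && minor >= 0);
    if (major != req_major)
        return major > req_major;
    return minor >= req_minor;
}
```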

3 years agoOptimize scopy/ccopy for POWER10
Rajalakshmi Srinivasaraghavan [Wed, 21 Oct 2020 14:53:45 +0000 (09:53 -0500)]
Optimize scopy/ccopy for POWER10

This patch makes use of new POWER10 vector pair instructions for
loads and stores. Also reorganized all variants of copy functions
to make use of the same kernel.

3 years agoMerge pull request #2930 from ismail/fix-no-return
Martin Kroeker [Wed, 21 Oct 2020 09:43:01 +0000 (11:43 +0200)]
Merge pull request #2930 from ismail/fix-no-return

Fix build with -Werror=return-type

3 years agoMerge pull request #2928 from martin-frbg/issue2917
Martin Kroeker [Wed, 21 Oct 2020 08:11:02 +0000 (10:11 +0200)]
Merge pull request #2928 from martin-frbg/issue2917

Enable -mavx2 for flang as well where supported

3 years agoFix build with -Werror=return-type
İsmail Dönmez [Wed, 21 Oct 2020 06:43:39 +0000 (08:43 +0200)]
Fix build with -Werror=return-type

The CNAME function in dgemm_tcopy_16_skylakex.c should return an int;
add a return 0, as in other files.
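
The shape of the fix, sketched with a hypothetical stand-in for a copy kernel (not the actual dgemm_tcopy code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a copy kernel that is declared to return int. */
static int copy_block_sketch(size_t n, const double *src, double *dst)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return 0; /* omitting this makes -Werror=return-type reject the function */
}
```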

3 years agoEnable -mavx2 for flang as well
Martin Kroeker [Tue, 20 Oct 2020 21:56:30 +0000 (23:56 +0200)]
Enable -mavx2 for flang as well

3 years agoMerge pull request #105 from xianyi/develop
Martin Kroeker [Tue, 20 Oct 2020 21:48:53 +0000 (23:48 +0200)]
Merge pull request #105 from xianyi/develop

rebase

3 years agoMerge pull request #2925 from martin-frbg/issue2911-2
Martin Kroeker [Tue, 20 Oct 2020 09:27:36 +0000 (11:27 +0200)]
Merge pull request #2925 from martin-frbg/issue2911-2

Add binutils version check as prerequisite for POWER10 in DYNAMIC_ARCH build

3 years agoMerge pull request #2926 from bartoldeman/vzeroupper-clobber-all
Martin Kroeker [Tue, 20 Oct 2020 07:24:47 +0000 (09:24 +0200)]
Merge pull request #2926 from bartoldeman/vzeroupper-clobber-all

x86_64: clobber all xmm registers after vzeroupper

3 years agoFix missing backquotes
Martin Kroeker [Tue, 20 Oct 2020 06:37:53 +0000 (08:37 +0200)]
Fix missing backquotes

3 years agox86_64: clobber all xmm registers after vzeroupper
Bart Oldeman [Tue, 20 Oct 2020 02:16:47 +0000 (02:16 +0000)]
x86_64: clobber all xmm registers after vzeroupper

As observed with GCC 10 using -march=native -ftree-vectorize
on Knights Landing, the compiler is now smart enough to find clobbers
inside non-inlined static functions.

In particular, sgemv counted on a kernel to preserve the whole
%ymm2 register (since it was not in the clobber list), but the top
part was destroyed by vzeroupper. This caused many tests to fail.

This patch makes sure all xmm (and ymm/zmm by extension) registers
are listed as clobbered to avoid this happening, as most kernels
already did correctly in fact.
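
A minimal sketch of the pattern described above, using GCC/Clang extended inline assembly. The function name and the runtime AVX check are additions for illustration only; the real kernels embed vzeroupper in larger assembly blocks.

```c
/* Returns 1 if vzeroupper was executed, 0 if it was skipped
 * (non-x86_64 build, or a CPU without AVX). Hypothetical helper. */
static int zero_upper_sketch(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (!__builtin_cpu_supports("avx"))
        return 0;
    __asm__ __volatile__(
        "vzeroupper"
        : /* no outputs */
        : /* no inputs */
        /* vzeroupper zeroes the upper bits of ymm0-ymm15, so every one of
         * these registers is listed as clobbered. Naming xmmN tells the
         * compiler not to keep live data in the wider ymmN/zmmN either. */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15");
    return 1;
#else
    return 0;
#endif
}
```

With the full clobber list the compiler saves or recomputes any vector value it needs across the statement, instead of assuming a register such as %ymm2 survives intact.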

3 years agoAdd POWER10 support flag (unconditionally for now)
Martin Kroeker [Mon, 19 Oct 2020 23:09:49 +0000 (01:09 +0200)]
Add POWER10 support flag (unconditionally for now)