platform/upstream/openblas.git
Bine Brank [Fri, 26 Nov 2021 09:35:01 +0000 (10:35 +0100)]
Adapt CMake for SVE

Bine Brank [Tue, 23 Nov 2021 20:18:08 +0000 (21:18 +0100)]
reduced dgemm_unroll_m to work with 128-bit sve

Bine Brank [Mon, 22 Nov 2021 09:12:34 +0000 (10:12 +0100)]
removed unused code (compiler warnings)

Bine Brank [Mon, 22 Nov 2021 08:54:20 +0000 (09:54 +0100)]
modify Makefile for SVE copy

Bine Brank [Sun, 21 Nov 2021 17:33:43 +0000 (18:33 +0100)]
configure SVE Makefile

Bine Brank [Sun, 21 Nov 2021 13:56:27 +0000 (14:56 +0100)]
some clean-up & commentary

Bine Brank [Sat, 20 Nov 2021 15:35:29 +0000 (16:35 +0100)]
symm SVE copy routines

Bine Brank [Sun, 14 Nov 2021 15:00:10 +0000 (16:00 +0100)]
add remaining trmm copy routines for SVE

Bine Brank [Sat, 13 Nov 2021 17:48:53 +0000 (18:48 +0100)]
dtrmm_utcopy sve function

Bine Brank [Sun, 7 Nov 2021 19:37:51 +0000 (20:37 +0100)]
add v2x8 kernel + fix sve dtrmm

Bine Brank [Mon, 1 Nov 2021 21:53:21 +0000 (22:53 +0100)]
add ARMV8SVE target

Bine Brank [Sun, 31 Oct 2021 09:24:25 +0000 (10:24 +0100)]
fix sve dgemm kernel + sve dtrmm

Bine Brank [Sat, 30 Oct 2021 10:11:44 +0000 (12:11 +0200)]
added SVE ncopy and tcopy

Bine Brank [Wed, 27 Oct 2021 14:37:18 +0000 (16:37 +0200)]
add sve dgemm prototype

Martin Kroeker [Wed, 27 Oct 2021 14:28:12 +0000 (16:28 +0200)]
Merge pull request #3424 from Neutron3529/patch-1

auto-detect for Intel i7-11800H

Martin Kroeker [Wed, 27 Oct 2021 14:27:47 +0000 (16:27 +0200)]
Merge pull request #3423 from mhillenbrand/fix-static-detection

s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice

Neutron3529 [Wed, 27 Oct 2021 06:16:37 +0000 (14:16 +0800)]
auto-detect for Intel i7-11800H

Marius Hillenbrand [Tue, 26 Oct 2021 13:19:49 +0000 (15:19 +0200)]
s390x: use DYNAMIC_ARCH's cpu detection for compile-time choice

On s390x, the run-time detection for DYNAMIC_ARCH and the compile-time
choice in cpuid_zarch use different methods for identifying the
supported CPU features. To make cpuid_zarch future-proof and easier
to maintain, switch cpuid_zarch to the same mechanism as DYNAMIC_ZARCH
(i.e., derive the supported CPU features from hwcap flags) and share
code between both (in a new header cpuid_zarch.h).

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
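
The shared-mechanism idea described in this commit message can be sketched roughly as follows: one function maps hwcap flags to a feature level, and both the compile-time probe and the run-time dispatcher call it. This is a minimal illustration, not the contents of cpuid_zarch.h; the names `zarch_level_from_hwcap`, `zarch_level_name` and the `SKETCH_` constants are hypothetical, and the bit values and the Z13/Z14 mapping are assumptions that should be checked against the kernel headers.

```c
#include <assert.h>
#include <string.h>

/* Assumed hwcap bit values for the s390x vector facilities. On a real
   system these should come from the kernel headers, not be hard-coded. */
#define SKETCH_HWCAP_S390_VX  2048UL /* vector facility (assumed: z13) */
#define SKETCH_HWCAP_S390_VXE 8192UL /* vector enhancements 1 (assumed: z14) */

/* Hypothetical feature levels, ordered from least to most capable. */
enum zarch_level { ZARCH_GENERIC = 0, ZARCH_Z13 = 1, ZARCH_Z14 = 2 };

/* Single mapping from hwcap flags to a feature level. Sharing this one
   function between compile-time and run-time detection keeps the two
   from drifting apart. */
static enum zarch_level zarch_level_from_hwcap(unsigned long hwcap)
{
    /* Without the base vector facility, fall back to generic code,
       even if a later extension bit happens to be set. */
    if (!(hwcap & SKETCH_HWCAP_S390_VX))
        return ZARCH_GENERIC;
    if (hwcap & SKETCH_HWCAP_S390_VXE)
        return ZARCH_Z14;
    return ZARCH_Z13;
}

/* Human-readable name for a level, e.g. for a build-time probe to print. */
static const char *zarch_level_name(enum zarch_level level)
{
    switch (level) {
    case ZARCH_Z14: return "Z14";
    case ZARCH_Z13: return "Z13";
    default:        return "ZARCH_GENERIC";
    }
}
```

On Linux the `hwcap` argument would typically be obtained with `getauxval(AT_HWCAP)` from `<sys/auxv.h>`; keeping the mapping itself free of system calls makes it easy to exercise on any host.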
Martin Kroeker [Mon, 25 Oct 2021 21:37:28 +0000 (23:37 +0200)]
Merge pull request #3422 from martin-frbg/issue3421

Revert invalid trsv shortcut from PR #3252

Martin Kroeker [Sun, 24 Oct 2021 21:57:06 +0000 (23:57 +0200)]
Revert #3252

Martin Kroeker [Wed, 20 Oct 2021 10:00:06 +0000 (12:00 +0200)]
Merge pull request #3420 from martin-frbg/issue3419

Revert wrong ZTRSV optimization from #3252

Martin Kroeker [Wed, 20 Oct 2021 08:50:02 +0000 (10:50 +0200)]
Remove dangerous optimization from previous #3252 - buffer is never unused here

Martin Kroeker [Wed, 20 Oct 2021 06:23:53 +0000 (08:23 +0200)]
Merge pull request #3418 from martin-frbg/issue2927-2

Enable SVE for A64FX

Martin Kroeker [Tue, 19 Oct 2021 21:23:40 +0000 (23:23 +0200)]
Enable SVE for A64FX

Martin Kroeker [Mon, 18 Oct 2021 13:00:19 +0000 (15:00 +0200)]
Add basic support for the Fujitsu A64FX (#3415)

* Add initial support for Fujitsu A64FX as generic ARMV8

Martin Kroeker [Mon, 18 Oct 2021 12:59:21 +0000 (14:59 +0200)]
Merge pull request #3416 from guowangy/spr-bf16

sbgemm: add AMX-BF16 based kernel for Sapphire Rapids

Wangyang Guo [Tue, 12 Oct 2021 08:18:37 +0000 (01:18 -0700)]
sbgemm: spr: disable small matrix path by default

Wangyang Guo [Thu, 23 Sep 2021 08:08:40 +0000 (01:08 -0700)]
sbgemm: spr: implement otcopy_16

Wangyang Guo [Sat, 18 Sep 2021 08:11:31 +0000 (01:11 -0700)]
sbgemm: spr: reuse ncopy_16 from cooperlake as incopy

Wangyang Guo [Sat, 18 Sep 2021 06:59:32 +0000 (23:59 -0700)]
sbgemm: spr: optimization for tmp_c buffer

Wangyang Guo [Fri, 17 Sep 2021 07:48:52 +0000 (00:48 -0700)]
sbgemm: spr: kernel handle alpha != 1.0

Wangyang Guo [Fri, 17 Sep 2021 03:08:42 +0000 (20:08 -0700)]
sbgemm: spr: oncopy: use tile load/store instead

Wangyang Guo [Thu, 16 Sep 2021 08:04:01 +0000 (01:04 -0700)]
sbgemm: spr: only load A once in tail_k handling

Wangyang Guo [Thu, 16 Sep 2021 06:59:38 +0000 (23:59 -0700)]
sbgemm: spr: process k2 and odd k at the same time

Wangyang Guo [Thu, 16 Sep 2021 03:29:49 +0000 (20:29 -0700)]
sbgemm: spr: enlarge P to 256 for performance

Wangyang Guo [Thu, 16 Sep 2021 02:36:02 +0000 (19:36 -0700)]
sbgemm: spr: oncopy: avoid handling too many pointers at a time

Wangyang Guo [Wed, 15 Sep 2021 08:11:15 +0000 (01:11 -0700)]
sbgemm: spr: reduce tile conf loading by separate tail k handling

Wangyang Guo [Mon, 13 Sep 2021 08:44:53 +0000 (01:44 -0700)]
sbgemm: spr: tuning for blocking params

Wangyang Guo [Mon, 13 Sep 2021 02:22:58 +0000 (19:22 -0700)]
sbgemm: spr: kernel works for NN case when alpha is 1.0

Wangyang Guo [Fri, 10 Sep 2021 08:14:05 +0000 (01:14 -0700)]
sbgemm: spr: kernel works for m32 in NN case

Wangyang Guo [Thu, 9 Sep 2021 02:41:12 +0000 (19:41 -0700)]
sbgemm: spr: implement oncopy_16

Wangyang Guo [Tue, 7 Sep 2021 02:48:23 +0000 (19:48 -0700)]
sbgemm: spr: add dummy source files

Martin Kroeker [Sun, 17 Oct 2021 22:26:14 +0000 (00:26 +0200)]
Add march/mtune flags for clang builds on ARM64 as well (#3414)

* Add march/mtune flags for clang as well

Martin Kroeker [Sun, 17 Oct 2021 21:05:11 +0000 (23:05 +0200)]
Merge pull request #3404 from guowangy/spr-build

Initial build support for Sapphire Rapids

Martin Kroeker [Sun, 17 Oct 2021 20:46:48 +0000 (22:46 +0200)]
Merge pull request #3413 from MehdiChinoune/cmake-readibiltiy

[NFC] Improve CMakeLists.txt file readability

Mehdi Chinoune [Sun, 17 Oct 2021 04:19:30 +0000 (05:19 +0100)]
[NFC] Improve CMakeLists.txt file readability

Add some extra lines and indentations.

Martin Kroeker [Sun, 17 Oct 2021 18:07:14 +0000 (20:07 +0200)]
Merge pull request #3411 from MehdiChinoune/both_shared_static

Support building both static and shared libraries

Mehdi Chinoune [Sat, 16 Oct 2021 07:33:47 +0000 (08:33 +0100)]
Support building both static and shared libraries

Martin Kroeker [Sat, 16 Oct 2021 11:52:41 +0000 (13:52 +0200)]
Merge pull request #3410 from MehdiChinoune/mingw-clang-64

Fix MinGW/Clang 64 bits detection.

مهدي شينون (Mehdi Chinoune) [Sat, 16 Oct 2021 06:55:10 +0000 (07:55 +0100)]
Fix MinGW/Clang 64 bits detection.

CMAKE_COMPILER_IS_GNUCC is only valid for GCC.

Wangyang Guo [Tue, 12 Oct 2021 09:01:20 +0000 (02:01 -0700)]
Fix build error in legacy gcc

Wangyang Guo [Tue, 12 Oct 2021 08:39:09 +0000 (01:39 -0700)]
Add NO_AVX=1 fallbacks to Sapphire Rapids build

Wangyang Guo [Fri, 3 Sep 2021 07:39:50 +0000 (00:39 -0700)]
initial support for Sapphire Rapids platform

Martin Kroeker [Sun, 10 Oct 2021 21:24:52 +0000 (23:24 +0200)]
Update conda in Appveyor CI and move jobs from Appveyor to Azure (#3400)

* Fix clang/cl builds on Appveyor and move them to Azure

* Add clang/flang and mingw builds on Windows to Azure

Martin Kroeker [Thu, 7 Oct 2021 06:09:34 +0000 (08:09 +0200)]
Merge pull request #3399 from martin-frbg/issue2814

Improve performance on Apple M1 Vortex

Martin Kroeker [Wed, 6 Oct 2021 09:10:19 +0000 (11:10 +0200)]
Use "big arm server" GEMM defaults for Vortex

Martin Kroeker [Wed, 6 Oct 2021 09:06:43 +0000 (11:06 +0200)]
Use Neoverse's current mix of ThunderX2 kernels for Vortex as well

Martin Kroeker [Tue, 5 Oct 2021 16:59:47 +0000 (18:59 +0200)]
Merge pull request #3398 from kavanabhat/aix_p10_gnuas

Big Endian Changes for Power10 kernels

Martin Kroeker [Mon, 4 Oct 2021 20:23:19 +0000 (22:23 +0200)]
Merge pull request #3396 from martin-frbg/makesys_typo

Fix minor typo in Makefile.system

Martin Kroeker [Mon, 4 Oct 2021 20:22:57 +0000 (22:22 +0200)]
Merge pull request #3397 from martin-frbg/m1detect

Fix detection of Apple M1 "Vortex"

Martin Kroeker [Mon, 4 Oct 2021 15:58:29 +0000 (17:58 +0200)]
Fix cache reporting for Apple M1

Martin Kroeker [Mon, 4 Oct 2021 14:46:41 +0000 (16:46 +0200)]
Fix detection of Apple M1 "Vortex"

Martin Kroeker [Mon, 4 Oct 2021 14:14:32 +0000 (16:14 +0200)]
Fix minor typo

Martin Kroeker [Sat, 2 Oct 2021 19:15:39 +0000 (21:15 +0200)]
Update version to 0.3.18.dev

Martin Kroeker [Sat, 2 Oct 2021 19:15:00 +0000 (21:15 +0200)]
Update version to 0.3.18.dev

Martin Kroeker [Sat, 2 Oct 2021 19:14:11 +0000 (21:14 +0200)]
Merge pull request #3395 from xianyi/release-0.3.0

Merge 0.3.18 back into develop to copy tag

Martin Kroeker [Sat, 2 Oct 2021 17:38:09 +0000 (19:38 +0200)]
Update version to 0.3.18

Martin Kroeker [Sat, 2 Oct 2021 17:35:27 +0000 (19:35 +0200)]
Merge pull request #3394 from xianyi/develop

Merge from develop for 0.3.18

Martin Kroeker [Sat, 2 Oct 2021 17:35:03 +0000 (19:35 +0200)]
Merge branch 'release-0.3.0' into develop

Martin Kroeker [Sat, 2 Oct 2021 17:29:59 +0000 (19:29 +0200)]
Update version to 0.3.18

Martin Kroeker [Sat, 2 Oct 2021 17:25:58 +0000 (19:25 +0200)]
Update Changelog for 0.3.18 (#3388)

* Update Changelog for 0.3.18

Martin Kroeker [Sat, 2 Oct 2021 17:07:04 +0000 (19:07 +0200)]
Merge pull request #3393 from martin-frbg/azurealpine

Update Alpine version used in Azure CI

Martin Kroeker [Sat, 2 Oct 2021 14:27:34 +0000 (16:27 +0200)]
Update Alpine version

Martin Kroeker [Fri, 1 Oct 2021 12:56:09 +0000 (14:56 +0200)]
Merge pull request #3392 from martin-frbg/lapack625

Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)

kavanabhat [Fri, 1 Oct 2021 10:18:35 +0000 (05:18 -0500)]
AIX changes for P10 with GNU Compiler

Martin Kroeker [Fri, 1 Oct 2021 09:19:53 +0000 (11:19 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)

Martin Kroeker [Fri, 1 Oct 2021 09:19:07 +0000 (11:19 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)

Martin Kroeker [Fri, 1 Oct 2021 09:18:20 +0000 (11:18 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)

Martin Kroeker [Fri, 1 Oct 2021 09:17:21 +0000 (11:17 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)

Martin Kroeker [Fri, 1 Oct 2021 07:11:12 +0000 (09:11 +0200)]
Merge pull request #3390 from Keno/patch-4

Make sure that Netlib LAPACK respects FFLAGS

kavanabhat [Thu, 30 Sep 2021 11:06:27 +0000 (06:06 -0500)]
AIX changes for P10 with GNU Compiler

Keno Fischer [Thu, 30 Sep 2021 07:14:15 +0000 (03:14 -0400)]
Make sure that Netlib LAPACK respects FFLAGS

OpenBLAS allows users to specify `FFLAGS` and then uses `override` to append additional
options. However, without such an override in lapack's make.inc, lapack would use the external
FFLAGS, rather than the ones being computed by OpenBLAS. For example, the `DEBUG=1` flag
would not apply to LAPACK code. This is all a bit messy but forced by the integration with netlib
lapack. Note that `CFLAGS` already has this override for the same reason. It is possible that
other variables here should have a similar override, but I think for most of the other ones, OpenBLAS's
build system does not append to the flags passed in by the user.

Martin Kroeker [Tue, 28 Sep 2021 17:44:36 +0000 (19:44 +0200)]
Merge pull request #3389 from guowangy/bf16-build-warn-fix

x86_64: BFLOAT16: fix build warning

Wangyang Guo [Tue, 28 Sep 2021 10:22:15 +0000 (18:22 +0800)]
x86_64: BFLOAT16: fix build warning

Martin Kroeker [Sun, 26 Sep 2021 12:16:50 +0000 (14:16 +0200)]
Merge pull request #3387 from commodo/adjust-mips-el-archs

Makefile.system: adjust mipsel/mips64el ARCH variables

Alexandru Ardelean [Sun, 26 Sep 2021 09:17:21 +0000 (12:17 +0300)]
Makefile.system: adjust mipsel/mips64el ARCH variables

When building for MIPS{64} little-endian variants, the included makefiles
should be the same as for the big-endian.

There are already some adjustments being done for some ARCH names.
This change adds the ones for the `mipsel` and `mips64el` names, so that
the Makefile.mips{64} files get included.

This comes as a result of: https://github.com/openwrt/packages/issues/16649

Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Martin Kroeker [Sun, 19 Sep 2021 16:16:02 +0000 (18:16 +0200)]
Merge pull request #3385 from martin-frbg/update_readme

Update README.md

Martin Kroeker [Sun, 19 Sep 2021 12:54:35 +0000 (14:54 +0200)]
Update README.md

Martin Kroeker [Fri, 17 Sep 2021 12:53:39 +0000 (14:53 +0200)]
Merge pull request #3384 from martin-frbg/issue3383

Modify ARMV8 kernels to leave x18 unused as it is reserved on OSX

Martin Kroeker [Fri, 17 Sep 2021 07:53:18 +0000 (09:53 +0200)]
Move alphaI to x22 to leave x18 unused (reserved on OSX)

Martin Kroeker [Fri, 17 Sep 2021 07:42:17 +0000 (09:42 +0200)]
move alpha to x19/x20 to leave x18 unused for OSX

Martin Kroeker [Fri, 17 Sep 2021 07:28:19 +0000 (09:28 +0200)]
Move temp to x21 to leave x18 unused (reserved on OSX)

Martin Kroeker [Fri, 17 Sep 2021 07:24:11 +0000 (09:24 +0200)]
Move temp to x21 to leave x18 unused (reserved on OSX)

Martin Kroeker [Fri, 17 Sep 2021 07:19:51 +0000 (09:19 +0200)]
Use x21 for I to leave x18 unused (reserved on OSX)

Martin Kroeker [Fri, 17 Sep 2021 07:18:25 +0000 (09:18 +0200)]
Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX)

Martin Kroeker [Fri, 17 Sep 2021 07:15:16 +0000 (09:15 +0200)]
Merge pull request #3382 from rafaelcfsousa/rafael/cwarnings

[POWER] Remove unused variable warnings.

Martin Kroeker [Thu, 16 Sep 2021 05:14:49 +0000 (07:14 +0200)]
Merge pull request #3381 from martin-frbg/issue3371

Silence compiler warnings about uninitialized variables

Rafael Cardoso Fernandes Sousa [Wed, 15 Sep 2021 22:18:48 +0000 (22:18 +0000)]
Remove unused commented code (#if directive)

Martin Kroeker [Wed, 15 Sep 2021 20:11:35 +0000 (22:11 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning

Martin Kroeker [Wed, 15 Sep 2021 20:10:43 +0000 (22:10 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning

Actual initialization is via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s.
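
For readers unfamiliar with the idiom mentioned above, here is a minimal sketch of producing an all-ones xmm register by comparing a register with itself. It is not the kernel code touched by this commit: the helper name `fill_all_ones` is hypothetical, and instead of the self-initialization used in the commit, the sketch starts from `_mm_setzero_si128()` so that the input is well defined. Non-SSE2 builds fall back to `memset`.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: writes 16 bytes of 0xFF to out. */
static void fill_all_ones(unsigned char out[16])
{
#if defined(__SSE2__)
    /* Start from a defined value so the compiler has nothing to warn
       about (the commit instead initializes the variable with itself). */
    __m128i mask = _mm_setzero_si128();
    /* Every byte equals itself, so each byte lane compares true and is
       set to 0xFF, whatever the register held before. */
    mask = _mm_cmpeq_epi8(mask, mask);
    _mm_storeu_si128((__m128i *)out, mask);
#else
    /* Portable fallback for targets without SSE2. */
    memset(out, 0xFF, 16);
#endif
}
```

Because the comparison result does not depend on the register's previous contents, the initial value only matters to the compiler's diagnostics, which is why the warning could be silenced without changing behavior.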