platform/upstream/openblas.git
2 years agoInitialize abs_mask1 with itself to silence a gcc warning
Martin Kroeker [Wed, 15 Sep 2021 20:11:35 +0000 (22:11 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning

2 years agoInitialize abs_mask1 with itself to silence a gcc warning
Martin Kroeker [Wed, 15 Sep 2021 20:10:43 +0000 (22:10 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning

The actual initialization is done via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s
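A minimal sketch of the idiom this message refers to, for x86 targets with SSE2. Comparing a register with itself bytewise is always true, so every byte becomes 0xFF. The helper names `all_ones` and `abs4` are hypothetical, and the use of the result as an absolute-value mask is an assumption suggested only by the variable name `abs_mask1`. The sketch starts from a zeroed register rather than the self-initialization used in the commit.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86_64 only */
#include <stddef.h>

/* Build a register with all 128 bits set by comparing a value with itself.
 * A bytewise self-comparison is always true, so every byte becomes 0xFF.
 * The starting value is zeroed here instead of being self-initialized. */
static __m128i all_ones(void)
{
    __m128i v = _mm_setzero_si128();
    return _mm_cmpeq_epi8(v, v);
}

/* Hypothetical helper: compute |in[i]| for four floats. The all-ones value
 * is shifted right by one bit in each 32-bit lane, giving 0x7FFFFFFF, and
 * the AND with that mask clears the sign bit of every float. */
static void abs4(const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    __m128 mask = _mm_castsi128_ps(_mm_srli_epi32(all_ones(), 1));
    _mm_storeu_ps(out, _mm_and_ps(_mm_loadu_ps(in), mask));
}
```

Calling `abs4` on four floats writes their magnitudes to `out`; no constant has to be loaded from memory to build the mask.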

2 years agoMerge pull request #3380 from martin-frbg/structwarn
Martin Kroeker [Wed, 15 Sep 2021 05:19:09 +0000 (07:19 +0200)]
Merge pull request #3380 from martin-frbg/structwarn

Remove extraneous qualifiers from struct definition

2 years agoMerge pull request #3379 from martin-frbg/issue3369-2
Martin Kroeker [Wed, 15 Sep 2021 05:18:57 +0000 (07:18 +0200)]
Merge pull request #3379 from martin-frbg/issue3369-2

Add casts to fix compiler warnings for SkylakeX sasum/dasum

2 years agoMerge pull request #3378 from martin-frbg/issue3368-2
Martin Kroeker [Wed, 15 Sep 2021 05:18:38 +0000 (07:18 +0200)]
Merge pull request #3378 from martin-frbg/issue3368-2

Rework generation of BFLOAT16 objects in CMAKE builds and fix missing CBLAS_XERBLA

2 years agoRemove extraneous qualifiers from struct definition
Martin Kroeker [Tue, 14 Sep 2021 19:52:26 +0000 (21:52 +0200)]
Remove extraneous qualifiers from struct definition

2 years agoAdd casts
Martin Kroeker [Tue, 14 Sep 2021 19:41:53 +0000 (21:41 +0200)]
Add casts

2 years agoAdd dedicated entries for BFLOAT16 kernels
Martin Kroeker [Tue, 14 Sep 2021 14:17:18 +0000 (16:17 +0200)]
Add dedicated entries for BFLOAT16 kernels

2 years agoAdd separate entries for BFLOAT16 functions and fix missing cblas_xerbla
Martin Kroeker [Tue, 14 Sep 2021 14:15:57 +0000 (16:15 +0200)]
Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla

2 years agoAdd sbgemm
Martin Kroeker [Tue, 14 Sep 2021 14:14:43 +0000 (16:14 +0200)]
Add sbgemm

2 years agoAdd sbgemv
Martin Kroeker [Tue, 14 Sep 2021 14:13:57 +0000 (16:13 +0200)]
Add sbgemv

2 years agoPropagate BUILD_BFLOAT16 to CFLAGS
Martin Kroeker [Tue, 14 Sep 2021 14:12:27 +0000 (16:12 +0200)]
Propagate BUILD_BFLOAT16 to CFLAGS

2 years agoAdd defaults for SBGEMV kernels
Martin Kroeker [Tue, 14 Sep 2021 14:10:58 +0000 (16:10 +0200)]
Add defaults for SBGEMV kernels

2 years agoRemove BFLOAT16 from the task list of GenerateNamedObject
Martin Kroeker [Tue, 14 Sep 2021 14:09:46 +0000 (16:09 +0200)]
Remove BFLOAT16 from the task list of GenerateNamedObject

2 years agoMerge pull request #3376 from martin-frbg/issue3370
Martin Kroeker [Sat, 11 Sep 2021 22:01:31 +0000 (00:01 +0200)]
Merge pull request #3376 from martin-frbg/issue3370

Fix a few harmless compiler warnings

2 years agoMerge pull request #3375 from martin-frbg/issue3369
Martin Kroeker [Sat, 11 Sep 2021 22:01:20 +0000 (00:01 +0200)]
Merge pull request #3375 from martin-frbg/issue3369

Add casts to eliminate compiler warnings for Haswell sasum/dasum

2 years agoOne instance of kernel_4x1 is used even on SKX
Martin Kroeker [Sat, 11 Sep 2021 13:30:19 +0000 (15:30 +0200)]
One instance of kernel_4x1 is used even on SKX

2 years agoreally remove the unused variable
Martin Kroeker [Sat, 11 Sep 2021 13:05:55 +0000 (15:05 +0200)]
really remove the unused variable

2 years agoAdd ifdefs around conditionally used functions
Martin Kroeker [Sat, 11 Sep 2021 12:38:47 +0000 (14:38 +0200)]
Add ifdefs around conditionally used functions

2 years agoMove a conditionally used variable
Martin Kroeker [Sat, 11 Sep 2021 12:37:44 +0000 (14:37 +0200)]
Move a conditionally used variable

2 years agoRemove unused variable
Martin Kroeker [Sat, 11 Sep 2021 12:36:27 +0000 (14:36 +0200)]
Remove unused variable

2 years agoAdd casts
Martin Kroeker [Sat, 11 Sep 2021 11:38:28 +0000 (13:38 +0200)]
Add casts

2 years agoMerge pull request #3367 from RajalakshmiSR/makesyntax
Martin Kroeker [Wed, 8 Sep 2021 18:19:39 +0000 (20:19 +0200)]
Merge pull request #3367 from RajalakshmiSR/makesyntax

POWER: Fixing syntax error in makefile

2 years agoFixing syntax error in makefile
Rajalakshmi Srinivasaraghavan [Wed, 8 Sep 2021 12:04:13 +0000 (07:04 -0500)]
Fixing syntax error in makefile

Fixing syntax issue in Makefile.power added by recent commit
af19cda65aef4d033ae33213013c88b0a99f9da2

2 years agoMerge pull request #3366 from martin-frbg/azure-ubuntu
Martin Kroeker [Wed, 8 Sep 2021 11:57:35 +0000 (13:57 +0200)]
Merge pull request #3366 from martin-frbg/azure-ubuntu

migrate Azure CI jobs from deprecated ubuntu-16.04 vmImage

2 years agomigrate from deprecated ubuntu-16.04 vmImage
Martin Kroeker [Wed, 8 Sep 2021 08:51:59 +0000 (10:51 +0200)]
migrate from deprecated ubuntu-16.04 vmImage

2 years agoMerge pull request #3365 from martin-frbg/travis-lx
Martin Kroeker [Tue, 7 Sep 2021 14:24:33 +0000 (16:24 +0200)]
Merge pull request #3365 from martin-frbg/travis-lx

Disable the remaining x86_64 job on Travis

2 years agoMerge pull request #3364 from guowangy/bf16-cooperlake
Martin Kroeker [Tue, 7 Sep 2021 11:57:40 +0000 (13:57 +0200)]
Merge pull request #3364 from guowangy/bf16-cooperlake

Add SBGEMM kernel for Cooperlake

2 years agosbgemm: fix build error when BFLOAT16 is disabled
Wangyang Guo [Tue, 7 Sep 2021 15:37:08 +0000 (23:37 +0800)]
sbgemm: fix build error when BFLOAT16 is disabled

2 years agosbgemm: avoid falling into SGEMM_KERNEL_DIRECT
Wangyang Guo [Tue, 7 Sep 2021 10:34:26 +0000 (18:34 +0800)]
sbgemm: avoid falling into SGEMM_KERNEL_DIRECT

2 years agosbgemm: cooperlake: tuning for small matrix
Wangyang Guo [Tue, 7 Sep 2021 10:12:40 +0000 (18:12 +0800)]
sbgemm: cooperlake: tuning for small matrix

2 years agosbgemm: cooperlake: implement ncopy_16
Wangyang Guo [Fri, 20 Aug 2021 14:01:00 +0000 (22:01 +0800)]
sbgemm: cooperlake: implement ncopy_16

2 years agosbgemm: cooperlake: add n24 kernel for tcopy_4
Wangyang Guo [Thu, 19 Aug 2021 11:46:08 +0000 (19:46 +0800)]
sbgemm: cooperlake: add n24 kernel for tcopy_4

2 years agosbgemm: cooperlake: implement tcopy_4
Wangyang Guo [Wed, 18 Aug 2021 16:08:06 +0000 (00:08 +0800)]
sbgemm: cooperlake: implement tcopy_4

2 years agosbgemm: cooperlake: prefetch A & B
Wangyang Guo [Wed, 18 Aug 2021 13:17:08 +0000 (21:17 +0800)]
sbgemm: cooperlake: prefetch A & B

2 years agosbgemm: cooperlake: unroll core loop by 2
Wangyang Guo [Tue, 17 Aug 2021 15:21:19 +0000 (23:21 +0800)]
sbgemm: cooperlake: unroll core loop by 2

2 years agosbgemm: cooperlake: reorder ptr increase for performance
Wangyang Guo [Tue, 17 Aug 2021 14:08:24 +0000 (22:08 +0800)]
sbgemm: cooperlake: reorder ptr increase for performance

2 years agosbgemm: cooperlake: fix bug in m64n12
Wangyang Guo [Tue, 17 Aug 2021 13:13:29 +0000 (21:13 +0800)]
sbgemm: cooperlake: fix bug in m64n12

2 years agosbgemm: cooperlake: tuning for block params
Wangyang Guo [Tue, 17 Aug 2021 11:35:40 +0000 (19:35 +0800)]
sbgemm: cooperlake: tuning for block params

2 years agosbgemm: cooperlake: kernel works for NN
Wangyang Guo [Mon, 16 Aug 2021 11:39:24 +0000 (19:39 +0800)]
sbgemm: cooperlake: kernel works for NN

2 years agosbgemm: cooperlake: change kernel size to 16x4
Wangyang Guo [Thu, 12 Aug 2021 01:46:49 +0000 (01:46 +0000)]
sbgemm: cooperlake: change kernel size to 16x4

2 years agosbgemm: cooperlake: implement sbgemm_tcopy_32
Wangyang Guo [Tue, 10 Aug 2021 06:14:45 +0000 (06:14 +0000)]
sbgemm: cooperlake: implement sbgemm_tcopy_32

2 years agosbgemm: cooperlake: add dummy source files
Wangyang Guo [Tue, 10 Aug 2021 03:23:45 +0000 (03:23 +0000)]
sbgemm: cooperlake: add dummy source files

2 years agoUpdate .travis.yml
Martin Kroeker [Tue, 7 Sep 2021 09:40:40 +0000 (11:40 +0200)]
Update .travis.yml

2 years agoDisable the remaining x86_64 job on Travis
Martin Kroeker [Tue, 7 Sep 2021 09:19:51 +0000 (11:19 +0200)]
Disable the remaining x86_64 job on Travis

2 years agoMerge pull request #3363 from martin-frbg/fixpr3360
Martin Kroeker [Tue, 7 Sep 2021 06:02:53 +0000 (08:02 +0200)]
Merge pull request #3363 from martin-frbg/fixpr3360

Correct misplaced ifdef lines from PR 3360

2 years agoCorrect misplaced ifdef lines
Martin Kroeker [Mon, 6 Sep 2021 21:44:20 +0000 (23:44 +0200)]
Correct misplaced ifdef lines

2 years agoAdd NO_AVX=1 fallbacks to newer generation x86_64 for completeness (#3360)
Martin Kroeker [Sun, 5 Sep 2021 18:35:48 +0000 (20:35 +0200)]
Add NO_AVX=1 fallbacks to newer generation x86_64 for completeness (#3360)

* Add NO_AVX=1 fallbacks to newer generation x86_64 for completeness

* Update .travis.yml

2 years agoAdd "recursive" option for IBM xlf compiler (#3359)
Martin Kroeker [Sat, 4 Sep 2021 16:26:59 +0000 (18:26 +0200)]
Add "recursive" option for IBM xlf compiler (#3359)

* Add correct "recursive" option for xlf (from reference-lapack issue 606)

2 years agoMerge pull request #3355 from martin-frbg/smallgemmcr
Martin Kroeker [Wed, 1 Sep 2021 22:27:23 +0000 (00:27 +0200)]
Merge pull request #3355 from martin-frbg/smallgemmcr

Add workaround for Windows10 macro name clash in small gemm kernel build rules

2 years agoAdd workaround for Windows10 macro name clash
Martin Kroeker [Wed, 1 Sep 2021 19:36:50 +0000 (21:36 +0200)]
Add workaround for Windows10 macro name clash

2 years agoMerge pull request #3352 from martin-frbg/3321-2n
Martin Kroeker [Wed, 1 Sep 2021 11:52:40 +0000 (13:52 +0200)]
Merge pull request #3352 from martin-frbg/3321-2n

Allocate an auxiliary struct when running out of preconfigured threads

2 years agoMerge pull request #3354 from nsait-linaro/fix_gmemm_align_win_arm
Martin Kroeker [Tue, 31 Aug 2021 19:47:21 +0000 (21:47 +0200)]
Merge pull request #3354 from nsait-linaro/fix_gmemm_align_win_arm

[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value

2 years agoMake explicit conversion condition on _WIN64 flag
Niyas Sait [Tue, 31 Aug 2021 13:36:44 +0000 (14:36 +0100)]
Make explicit conversion condition on _WIN64 flag

2 years ago[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value
Niyas Sait [Tue, 24 Aug 2021 05:09:29 +0000 (06:09 +0100)]
[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value

Win64 uses the LLP64 data model, in which unsigned long is only 32 bits wide. On a 64-bit
architecture a 64-bit mask is needed to generate addresses correctly.
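The failure mode described here can be reproduced on any platform with fixed-width types. The sketch below is an illustration under assumptions, not the OpenBLAS code: the function names and the mask value 0x3fff are hypothetical, and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down using a 32-bit mask, as happens when the mask has the
 * width of unsigned long under LLP64. The complement is computed in 32
 * bits and then zero-extended, so the upper 32 address bits are lost. */
static uint64_t align_down_narrow(uint64_t addr, uint32_t align_mask)
{
    assert((align_mask & (align_mask + 1u)) == 0); /* mask must be 2^n - 1 */
    return addr & ~align_mask;
}

/* Same operation with the mask widened to 64 bits before complementing,
 * which keeps the upper address bits intact. */
static uint64_t align_down_wide(uint64_t addr, uint32_t align_mask)
{
    assert((align_mask & (align_mask + 1u)) == 0); /* mask must be 2^n - 1 */
    return addr & ~(uint64_t)align_mask;
}
```

For an address above 4 GiB such as 0x123456789, the narrow version returns 0x23454000 while the wide version returns 0x123454000.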

2 years agoMerge pull request #3353 from guowangy/bf16-small-matrix-cooperlake
Martin Kroeker [Mon, 30 Aug 2021 18:39:51 +0000 (20:39 +0200)]
Merge pull request #3353 from guowangy/bf16-small-matrix-cooperlake

Enable existing SBGEMM kernel for Cooperlake by small-matrix path

2 years agoFix typo
Martin Kroeker [Mon, 30 Aug 2021 12:38:28 +0000 (14:38 +0200)]
Fix typo

2 years agoClean up debug messages
Martin Kroeker [Mon, 30 Aug 2021 12:21:25 +0000 (14:21 +0200)]
Clean up debug messages

2 years agosbgemm: remove unnecessary b0 files
Wangyang Guo [Mon, 30 Aug 2021 09:48:11 +0000 (17:48 +0800)]
sbgemm: remove unnecessary b0 files

2 years agosbgemm: cooperlake: make sure hot buffer aligned to 64
Wangyang Guo [Fri, 13 Aug 2021 10:43:41 +0000 (18:43 +0800)]
sbgemm: cooperlake: make sure hot buffer aligned to 64

2 years agosbgemm: add missing cblas_sbgemm definition
Wangyang Guo [Thu, 12 Aug 2021 16:51:24 +0000 (00:51 +0800)]
sbgemm: add missing cblas_sbgemm definition

2 years agosbgemm: cooperlake: enable SBGEMM by small matrix path
Wangyang Guo [Thu, 12 Aug 2021 06:10:51 +0000 (06:10 +0000)]
sbgemm: cooperlake: enable SBGEMM by small matrix path

2 years agoSmall Matrix: support BFLOAT16 data type
Wangyang Guo [Thu, 12 Aug 2021 03:14:18 +0000 (03:14 +0000)]
Small Matrix: support BFLOAT16 data type

2 years agoMerge pull request #3335 from guowangy/small-matrix-latest
Martin Kroeker [Sun, 29 Aug 2021 20:33:33 +0000 (22:33 +0200)]
Merge pull request #3335 from guowangy/small-matrix-latest

Add GEMM optimization for small matrix and single/double kernel for skylakex

2 years agoFix unmap logic
Martin Kroeker [Sun, 29 Aug 2021 17:50:24 +0000 (19:50 +0200)]
Fix unmap logic

2 years agoAdd likely() hints for gcc
Martin Kroeker [Sun, 29 Aug 2021 11:54:51 +0000 (13:54 +0200)]
Add likely() hints for gcc
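For readers unfamiliar with the idiom, branch hints of this kind are commonly built on GCC's `__builtin_expect`. The sketch below shows the usual macro definitions with a fallback for other compilers; the function `first_free_slot` is a hypothetical example and is not taken from the OpenBLAS sources.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
/* Other compilers: the hints degrade to plain conditions. */
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: return the index of the first zero entry, or -1. */
static int first_free_slot(const int *used, int n)
{
    assert(used != NULL);
    for (int i = 0; i < n; i++) {
        if (likely(used[i] == 0)) /* hint: a free slot is usually found early */
            return i;
    }
    return -1;
}
```

The hints only influence code layout and branch prediction heuristics; the result of the function is the same with or without them.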

2 years agoFix typo
Martin Kroeker [Sat, 28 Aug 2021 15:14:59 +0000 (17:14 +0200)]
Fix typo

2 years agoAdd auxiliary tracking space for thread buffer frees too
Martin Kroeker [Sat, 28 Aug 2021 15:03:53 +0000 (17:03 +0200)]
Add auxiliary tracking space for thread buffer frees too

2 years agoAllocate an auxiliary struct when running out of preconfigured threads
Martin Kroeker [Sat, 28 Aug 2021 12:18:36 +0000 (14:18 +0200)]
Allocate an auxiliary struct when running out of preconfigured threads

2 years agoMerge pull request #3348 from guowangy/skylakex-sgemv_t-fix
Martin Kroeker [Wed, 25 Aug 2021 20:43:45 +0000 (22:43 +0200)]
Merge pull request #3348 from guowangy/skylakex-sgemv_t-fix

skylakex sgemv_t kernel fix

2 years agoMerge pull request #3345 from nsait-linaro/windows_on_arm64
Martin Kroeker [Wed, 25 Aug 2021 13:49:55 +0000 (15:49 +0200)]
Merge pull request #3345 from nsait-linaro/windows_on_arm64

Add support for windows/arm64 targets with clang

2 years agosgemv: skylakex: fix build warning
Wangyang Guo [Wed, 25 Aug 2021 07:13:00 +0000 (07:13 +0000)]
sgemv: skylakex: fix build warning

2 years agosgemv: skylakex: bug fix for sgemv_t kernel in corner case
Wangyang Guo [Wed, 25 Aug 2021 07:07:27 +0000 (07:07 +0000)]
sgemv: skylakex: bug fix for sgemv_t kernel in corner case

2 years agoFix ctest.h to build using clang on windows
Niyas Sait [Mon, 16 Aug 2021 10:25:07 +0000 (11:25 +0100)]
Fix ctest.h to build using clang on windows

2 years agoadd support for building on windows/arm64 target
Niyas Sait [Mon, 16 Aug 2021 10:22:51 +0000 (11:22 +0100)]
add support for building on windows/arm64 target

2 years agoAdd more OSX build jobs to Azure CI (#3338)
Martin Kroeker [Sat, 14 Aug 2021 22:17:23 +0000 (00:17 +0200)]
Add more OSX build jobs to Azure CI (#3338)

* Add OSX build job with Homebrew OpenMP in a CMAKE build
* Check install step on OSX/gcc to make sure all include files are generated and installed as intended
* Add mixed clang/gfortran build with cmake on OSX
* Move iOS ARMV7/ARMV8 cross-builds from Travis to Azure

2 years agoSmall Matrix: skylakex: remove unnecessary b0 source files
Wangyang Guo [Fri, 13 Aug 2021 03:28:44 +0000 (03:28 +0000)]
Small Matrix: skylakex: remove unnecessary b0 source files

2 years agoSmall Matrix: reduce generic kernel source files
Wangyang Guo [Fri, 13 Aug 2021 03:17:38 +0000 (03:17 +0000)]
Small Matrix: reduce generic kernel source files

2 years agoMerge pull request #3344 from gxw-loongson/develop
Martin Kroeker [Thu, 12 Aug 2021 13:16:46 +0000 (15:16 +0200)]
Merge pull request #3344 from gxw-loongson/develop

Delete the macro instruction "li" and use "li.d" instead

2 years agoDelete the macro instruction "li" and use "li.d" instead
gxw [Tue, 10 Aug 2021 08:42:57 +0000 (16:42 +0800)]
Delete the macro instruction "li" and use "li.d" instead

Change-Id: Icff7981e2eb7df29ba5af1f8eb5be8443c67450f

2 years agoMerge pull request #3343 from cianciosa/develop
Martin Kroeker [Wed, 11 Aug 2021 23:28:18 +0000 (01:28 +0200)]
Merge pull request #3343 from cianciosa/develop

Fix undefined behavior checking the size of ARGC

2 years agoFix a small syntax error. A ( was accidentally deleted.
cianciosa [Wed, 11 Aug 2021 16:08:34 +0000 (12:08 -0400)]
Fix a small syntax error. A ( was accidentally deleted.

2 years agoCheck the total number of arguments passed instead of whether ARGV# is defined. This...
cianciosa [Wed, 11 Aug 2021 16:00:07 +0000 (12:00 -0400)]
Check the total number of arguments passed instead of whether ARGV# is defined. This fixes a problem when compiling OpenBLAS as a subproject of another project.

2 years agoMerge pull request #3341 from RajalakshmiSR/dasump10
Martin Kroeker [Wed, 11 Aug 2021 07:39:10 +0000 (09:39 +0200)]
Merge pull request #3341 from RajalakshmiSR/dasump10

POWER10: Improving dasum performance

2 years agoPOWER10: Improving dasum performance
Rajalakshmi Srinivasaraghavan [Wed, 11 Aug 2021 03:06:04 +0000 (22:06 -0500)]
POWER10: Improving dasum performance

Unroll a loop in the dasum micro code to improve
POWER10 performance.
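As a generic illustration of loop unrolling in a sum of absolute values, here is a portable C sketch. It is not the POWER10 kernel, which the commit does not show; the function name, the unroll factor of four and the scalar form are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without relying on the math library. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Sum of |x[i]| with the main loop unrolled by four. Four independent
 * accumulators shorten the dependency chain between additions. */
static double dasum_unrolled(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += abs_d(x[i]);
        s1 += abs_d(x[i + 1]);
        s2 += abs_d(x[i + 2]);
        s3 += abs_d(x[i + 3]);
    }
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++) /* remainder: fewer than four elements left */
        sum += abs_d(x[i]);
    return sum;
}
```

Because the additions are reordered, the result can differ from a strictly sequential sum in the last bits for general floating-point input.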

2 years agoMerge pull request #3336 from martin-frbg/traviscom
Zhang Xianyi [Thu, 5 Aug 2021 11:13:19 +0000 (19:13 +0800)]
Merge pull request #3336 from martin-frbg/traviscom

Disable all x86 jobs on Travis

2 years agoDisable all x86 jobs
Martin Kroeker [Thu, 5 Aug 2021 09:08:18 +0000 (11:08 +0200)]
Disable all x86 jobs

2 years agoMerge pull request #3332 from martin-frbg/travisbadge
Martin Kroeker [Thu, 5 Aug 2021 07:36:59 +0000 (09:36 +0200)]
Merge pull request #3332 from martin-frbg/travisbadge

Update Travis badge in README

2 years agoMerge pull request #3334 from Guobing-Chen/BF16_gemm_full_kernel
Martin Kroeker [Thu, 5 Aug 2021 06:01:13 +0000 (08:01 +0200)]
Merge pull request #3334 from Guobing-Chen/BF16_gemm_full_kernel

Add all SBGEMM kernels for IA AVX512-BF16 based platforms

2 years agoSmall Matrix: skip compilation for unimplemented data types
Wangyang Guo [Thu, 5 Aug 2021 05:46:13 +0000 (05:46 +0000)]
Small Matrix: skip compilation for unimplemented data types

2 years agoSmall Matrix: skylakex: fix build error with old compilers
Wangyang Guo [Thu, 5 Aug 2021 04:43:47 +0000 (04:43 +0000)]
Small Matrix: skylakex: fix build error with old compilers

2 years agoAdd all SBGEMM kernels for IA AVX512-BF16 based platforms
Chen, Guobing [Thu, 5 Aug 2021 03:11:14 +0000 (11:11 +0800)]
Add all SBGEMM kernels for IA AVX512-BF16 based platforms

Added all SBGEMM kernels including NN/NT/TN/TT for both ColMajor and
RowMajor, based on AVX512-BF16 ISA set on IA.
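To make the operation concrete, here is a plain-C reference for the NN, column-major case only. It is a sketch under assumptions, not the AVX512-BF16 kernel: bfloat16 values are held as raw 16-bit integers, accumulation is in float, and the names `bf16_t`, `bf16_to_float`, `float_to_bf16_trunc` and `sbgemm_nn_ref` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* hypothetical name: raw bfloat16 bit pattern */

/* bfloat16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Conversion by truncation (no rounding), enough for the illustration. */
static bf16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* C = alpha * A * B + beta * C, column-major, neither operand transposed.
 * A is m x k, B is k x n, C is m x n; C is not read when beta is zero. */
static void sbgemm_nn_ref(int m, int n, int k, float alpha,
                          const bf16_t *a, int lda,
                          const bf16_t *b, int ldb,
                          float beta, float *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += bf16_to_float(a[i + p * lda]) *
                       bf16_to_float(b[p + j * ldb]);
            float *cij = &c[i + j * ldc];
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

The NT, TN and TT variants differ only in how `a` and `b` are indexed, and a row-major call can be expressed by swapping the roles of the two operands.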

Signed-off-by: Chen, Guobing <guobing.chen@intel.com>

2 years agoSmall Matrix: enable by default for x86_64 arch
Wangyang Guo [Thu, 5 Aug 2021 02:57:58 +0000 (02:57 +0000)]
Small Matrix: enable by default for x86_64 arch

If no customized GEMM_SMALL_M_PERMIT kernel is defined, the call simply bypasses the small-matrix path and takes the normal path.
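The dispatch behavior described in this message can be sketched as follows. This is a hypothetical illustration, not the OpenBLAS implementation: the type and function names and the size threshold are invented for the example, and only the idea that a default "permit" check sends every call to the normal path comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

enum gemm_path { PATH_NORMAL = 0, PATH_SMALL = 1 };

/* Hypothetical signature of a per-architecture permit check. */
typedef int (*small_permit_fn)(int m, int n, int k);

/* Default check: never permit, so every call takes the normal path. */
static int default_permit(int m, int n, int k)
{
    (void)m; (void)n; (void)k;
    return 0;
}

/* Example of a customized check with an invented size threshold. */
static int example_permit(int m, int n, int k)
{
    return (long long)m * n * k <= 100LL * 100LL * 100LL;
}

/* Choose the path; a missing custom check falls back to the default. */
static enum gemm_path choose_path(small_permit_fn permit, int m, int n, int k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    if (permit == NULL)
        permit = default_permit;
    return permit(m, n, k) ? PATH_SMALL : PATH_NORMAL;
}
```

With this structure, enabling the feature by default is harmless on targets without a customized check, since they never leave the normal path.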

2 years agoSmall Matrix: better handling of GEMM3M macro
Wangyang Guo [Thu, 5 Aug 2021 02:45:53 +0000 (02:45 +0000)]
Small Matrix: better handling of GEMM3M macro

2 years agoSmall Matrix: support cmake build
Wangyang Guo [Wed, 4 Aug 2021 08:50:15 +0000 (08:50 +0000)]
Small Matrix: support cmake build

2 years agoSmall Matrix: support DYNAMIC_ARCH build
Wangyang Guo [Wed, 4 Aug 2021 03:12:41 +0000 (03:12 +0000)]
Small Matrix: support DYNAMIC_ARCH build

2 years agoUpdate Travis badge in README
Martin Kroeker [Tue, 3 Aug 2021 08:45:45 +0000 (10:45 +0200)]
Update Travis badge in README

2 years agoSmall Matrix: disable low performance default kernel
Wangyang Guo [Tue, 15 Jun 2021 16:09:51 +0000 (16:09 +0000)]
Small Matrix: disable low performance default kernel

2 years agoMerge pull request #3330 from xianyi/issue3321
Martin Kroeker [Mon, 2 Aug 2021 20:36:05 +0000 (22:36 +0200)]
Merge pull request #3330 from xianyi/issue3321

Improve the "tried to allocate too many buffers" error message

2 years agoActually add the message to the TLS section
Martin Kroeker [Mon, 2 Aug 2021 12:50:14 +0000 (14:50 +0200)]
Actually add the message to the TLS section