Martin Kroeker [Mon, 4 Oct 2021 14:46:41 +0000 (16:46 +0200)]
Fix detection of Apple M1 "Vortex"
Martin Kroeker [Sat, 2 Oct 2021 19:15:39 +0000 (21:15 +0200)]
Update version to 0.3.18.dev
Martin Kroeker [Sat, 2 Oct 2021 19:15:00 +0000 (21:15 +0200)]
Update version to 0.3.18.dev
Martin Kroeker [Sat, 2 Oct 2021 19:14:11 +0000 (21:14 +0200)]
Merge pull request #3395 from xianyi/release-0.3.0
Merge 0.3.18 back into develop to copy tag
Martin Kroeker [Sat, 2 Oct 2021 17:38:09 +0000 (19:38 +0200)]
Update version to 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:35:27 +0000 (19:35 +0200)]
Merge pull request #3394 from xianyi/develop
Merge from develop for 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:35:03 +0000 (19:35 +0200)]
Merge branch 'release-0.3.0' into develop
Martin Kroeker [Sat, 2 Oct 2021 17:29:59 +0000 (19:29 +0200)]
Update version to 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:25:58 +0000 (19:25 +0200)]
Update Changelog for 0.3.18 (#3388)
* Update Changelog for 0.3.18
Martin Kroeker [Sat, 2 Oct 2021 17:07:04 +0000 (19:07 +0200)]
Merge pull request #3393 from martin-frbg/azurealpine
Update Alpine version used in Azure CI
Martin Kroeker [Sat, 2 Oct 2021 14:27:34 +0000 (16:27 +0200)]
Update Alpine version
Martin Kroeker [Fri, 1 Oct 2021 12:56:09 +0000 (14:56 +0200)]
Merge pull request #3392 from martin-frbg/lapack625
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
Martin Kroeker [Fri, 1 Oct 2021 09:19:53 +0000 (11:19 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
Martin Kroeker [Fri, 1 Oct 2021 09:19:07 +0000 (11:19 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
Martin Kroeker [Fri, 1 Oct 2021 09:18:20 +0000 (11:18 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
Martin Kroeker [Fri, 1 Oct 2021 09:17:21 +0000 (11:17 +0200)]
Fix out of bounds read in ?llarv (Reference-LAPACK PR 625)
Martin Kroeker [Fri, 1 Oct 2021 07:11:12 +0000 (09:11 +0200)]
Merge pull request #3390 from Keno/patch-4
Make sure that Netlib LAPACK respects FFLAGS
Keno Fischer [Thu, 30 Sep 2021 07:14:15 +0000 (03:14 -0400)]
Make sure that Netlib LAPACK respects FFLAGS
OpenBLAS allows users to specify `FFLAGS` and then uses `override` to append additional
options. However, without such an override in lapack's make.inc, lapack would use the external
FFLAGS, rather than the ones being computed by OpenBLAS. For example, the `DEBUG=1` flag
would not apply to LAPACK code. This is all a bit messy but forced by the integration with netlib
lapack. Note that `CFLAGS` already has this override for the same reason. It is possible that
other variables here should have a similar override, but I think for most of the other ones, OpenBLAS's
build system does not append to the flags passed in by the user.
Martin Kroeker [Tue, 28 Sep 2021 17:44:36 +0000 (19:44 +0200)]
Merge pull request #3389 from guowangy/bf16-build-warn-fix
x86_64: BFLOAT16: fix build warning
Wangyang Guo [Tue, 28 Sep 2021 10:22:15 +0000 (18:22 +0800)]
x86_64: BFLOAT16: fix build warning
Martin Kroeker [Sun, 26 Sep 2021 12:16:50 +0000 (14:16 +0200)]
Merge pull request #3387 from commodo/adjust-mips-el-archs
Makefile.system: adjust mipsel/mips64el ARCH variables
Alexandru Ardelean [Sun, 26 Sep 2021 09:17:21 +0000 (12:17 +0300)]
Makefile.system: adjust mipsel/mips64el ARCH variables
When building for MIPS{64} little-endian variants, the included makefiles
should be the same as for the big-endian.
There are already some adjustments being done for some ARCH names.
This change adds the ones for the `mipsel` and `mips64el` names, so that
the Makefile.mips{64} files get included.
This comes as a result of: https://github.com/openwrt/packages/issues/16649
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Martin Kroeker [Sun, 19 Sep 2021 16:16:02 +0000 (18:16 +0200)]
Merge pull request #3385 from martin-frbg/update_readme
Update README.md
Martin Kroeker [Sun, 19 Sep 2021 12:54:35 +0000 (14:54 +0200)]
Update README.md
Martin Kroeker [Fri, 17 Sep 2021 12:53:39 +0000 (14:53 +0200)]
Merge pull request #3384 from martin-frbg/issue3383
Modify ARMV8 kernels to leave x18 unused as it is reserved on OSX
Martin Kroeker [Fri, 17 Sep 2021 07:53:18 +0000 (09:53 +0200)]
Move alphaI to x22 to leave x18 unused (reserved on OSX)
Martin Kroeker [Fri, 17 Sep 2021 07:42:17 +0000 (09:42 +0200)]
move alpha to x19/x20 to leave x18 unused for OSX
Martin Kroeker [Fri, 17 Sep 2021 07:28:19 +0000 (09:28 +0200)]
Move temp to x21 to leave x18 unused (reserved on OSX)
Martin Kroeker [Fri, 17 Sep 2021 07:24:11 +0000 (09:24 +0200)]
Move temp to x21 to leave x18 unused (reserved on OSX)
Martin Kroeker [Fri, 17 Sep 2021 07:19:51 +0000 (09:19 +0200)]
Use x21 for I to leave x18 unused (reserved on OSX)
Martin Kroeker [Fri, 17 Sep 2021 07:18:25 +0000 (09:18 +0200)]
Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX)
Martin Kroeker [Fri, 17 Sep 2021 07:15:16 +0000 (09:15 +0200)]
Merge pull request #3382 from rafaelcfsousa/rafael/cwarnings
[POWER] Remove unused variable warnings.
Martin Kroeker [Thu, 16 Sep 2021 05:14:49 +0000 (07:14 +0200)]
Merge pull request #3381 from martin-frbg/issue3371
Silence compiler warnings about uninitialized variables
Rafael Cardoso Fernandes Sousa [Wed, 15 Sep 2021 22:18:48 +0000 (22:18 +0000)]
Remove unused commented code (#if directive)
Martin Kroeker [Wed, 15 Sep 2021 20:11:35 +0000 (22:11 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning
Martin Kroeker [Wed, 15 Sep 2021 20:10:43 +0000 (22:10 +0200)]
Initialize abs_mask1 with itself to silence a gcc warning
The actual initialization is done via _mm_cmpeq_epi8, which I've seen claimed to be the fastest way to set an xmm register to all 1s.
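As a rough illustration of the trick mentioned in this commit message: comparing a register with itself for equality yields all 1s in every byte, and shifting each 32-bit lane right by one bit turns that into a sign-clearing mask for floats. The sketch below is not the OpenBLAS kernel code. The function name `abs4` is hypothetical, the starting value is a zeroed register rather than the self-initialization the commit describes, and a scalar fallback is included on the assumption that the code may be built without SSE2.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: writes |in[i]| to out[i] for four floats. */
void abs4(const float in[4], float out[4])
{
#if defined(__SSE2__)
    /* Any defined value works as the comparison operand. */
    __m128i ones = _mm_setzero_si128();
    /* Every byte equals itself, so every bit of the result is set. */
    ones = _mm_cmpeq_epi8(ones, ones);
    /* Shift each 32-bit lane right by one: 0x7fffffff per lane. */
    __m128i abs_mask = _mm_srli_epi32(ones, 1);
    __m128 x = _mm_loadu_ps(in);
    /* Clearing the sign bit gives the absolute value. */
    _mm_storeu_ps(out, _mm_and_ps(x, _mm_castsi128_ps(abs_mask)));
#else
    /* Portable fallback: clear the sign bit of each float by hand. */
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        bits &= 0x7fffffffu;
        memcpy(&out[i], &bits, sizeof bits);
    }
#endif
}
```

Starting from a zeroed register avoids reading an uninitialized variable, at the cost of one extra instruction that a compiler may well fold away.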
Rafael Cardoso Fernandes Sousa [Wed, 15 Sep 2021 18:36:07 +0000 (13:36 -0500)]
Fix unused variable warnings on Power
Martin Kroeker [Wed, 15 Sep 2021 05:19:09 +0000 (07:19 +0200)]
Merge pull request #3380 from martin-frbg/structwarn
Remove extraneous qualifiers from struct definition
Martin Kroeker [Wed, 15 Sep 2021 05:18:57 +0000 (07:18 +0200)]
Merge pull request #3379 from martin-frbg/issue3369-2
Add casts to fix compiler warnings for SkylakeX sasum/dasum
Martin Kroeker [Wed, 15 Sep 2021 05:18:38 +0000 (07:18 +0200)]
Merge pull request #3378 from martin-frbg/issue3368-2
Rework generation of BFLOAT16 objects in CMAKE builds and fix missing CBLAS_XERBLA
Martin Kroeker [Tue, 14 Sep 2021 19:52:26 +0000 (21:52 +0200)]
Remove extraneous qualifiers from struct definition
Martin Kroeker [Tue, 14 Sep 2021 19:41:53 +0000 (21:41 +0200)]
Add casts
Martin Kroeker [Tue, 14 Sep 2021 14:17:18 +0000 (16:17 +0200)]
Add dedicated entries for BFLOAT16 kernels
Martin Kroeker [Tue, 14 Sep 2021 14:15:57 +0000 (16:15 +0200)]
Add separate entries for BFLOAT16 functions and fix missing cblas_xerbla
Martin Kroeker [Tue, 14 Sep 2021 14:14:43 +0000 (16:14 +0200)]
Add sbgemm
Martin Kroeker [Tue, 14 Sep 2021 14:13:57 +0000 (16:13 +0200)]
Add sbgemv
Martin Kroeker [Tue, 14 Sep 2021 14:12:27 +0000 (16:12 +0200)]
Propagate BUILD_BFLOAT16 to CFLAGS
Martin Kroeker [Tue, 14 Sep 2021 14:10:58 +0000 (16:10 +0200)]
Add defaults for SBGEMV kernels
Martin Kroeker [Tue, 14 Sep 2021 14:09:46 +0000 (16:09 +0200)]
Remove BFLOAT16 from the task list of GenerateNamedObject
Martin Kroeker [Sat, 11 Sep 2021 22:01:31 +0000 (00:01 +0200)]
Merge pull request #3376 from martin-frbg/issue3370
Fix a few harmless compiler warnings
Martin Kroeker [Sat, 11 Sep 2021 22:01:20 +0000 (00:01 +0200)]
Merge pull request #3375 from martin-frbg/issue3369
Add casts to eliminate compiler warnings for Haswell sasum/dasum
Martin Kroeker [Sat, 11 Sep 2021 13:30:19 +0000 (15:30 +0200)]
One instance of kernel_4x1 is used even on SKX
Martin Kroeker [Sat, 11 Sep 2021 13:05:55 +0000 (15:05 +0200)]
really remove the unused variable
Martin Kroeker [Sat, 11 Sep 2021 12:38:47 +0000 (14:38 +0200)]
Add ifdefs around conditionally used functions
Martin Kroeker [Sat, 11 Sep 2021 12:37:44 +0000 (14:37 +0200)]
Move a conditionally used variable
Martin Kroeker [Sat, 11 Sep 2021 12:36:27 +0000 (14:36 +0200)]
Remove unused variable
Martin Kroeker [Sat, 11 Sep 2021 11:38:28 +0000 (13:38 +0200)]
Add casts
Martin Kroeker [Wed, 8 Sep 2021 18:19:39 +0000 (20:19 +0200)]
Merge pull request #3367 from RajalakshmiSR/makesyntax
POWER: Fixing syntax error in makefile
Rajalakshmi Srinivasaraghavan [Wed, 8 Sep 2021 12:04:13 +0000 (07:04 -0500)]
Fixing syntax error in makefile
Fixing syntax issue in Makefile.power added by recent commit
af19cda65aef4d033ae33213013c88b0a99f9da2
Martin Kroeker [Wed, 8 Sep 2021 11:57:35 +0000 (13:57 +0200)]
Merge pull request #3366 from martin-frbg/azure-ubuntu
migrate Azure CI jobs from deprecated ubuntu-16.04 vmImage
Martin Kroeker [Wed, 8 Sep 2021 08:51:59 +0000 (10:51 +0200)]
migrate from deprecated ubuntu-16.04 vmImage
Martin Kroeker [Tue, 7 Sep 2021 14:24:33 +0000 (16:24 +0200)]
Merge pull request #3365 from martin-frbg/travis-lx
Disable the remaining x86_64 job on Travis
Martin Kroeker [Tue, 7 Sep 2021 11:57:40 +0000 (13:57 +0200)]
Merge pull request #3364 from guowangy/bf16-cooperlake
Add SBGEMM kernel for Cooperlake
Wangyang Guo [Tue, 7 Sep 2021 15:37:08 +0000 (23:37 +0800)]
sbgemm: fix build error in BFLOAT16 disabled
Wangyang Guo [Tue, 7 Sep 2021 10:34:26 +0000 (18:34 +0800)]
sbgemm: avoid falling into SGEMM_KERNEL_DIRECT
Wangyang Guo [Tue, 7 Sep 2021 10:12:40 +0000 (18:12 +0800)]
sbgemm: cooperlake: tuning for small matrix
Wangyang Guo [Fri, 20 Aug 2021 14:01:00 +0000 (22:01 +0800)]
sbgemm: cooperlake: implement ncopy_16
Wangyang Guo [Thu, 19 Aug 2021 11:46:08 +0000 (19:46 +0800)]
sbgemm: cooperlake: add n24 kernel for tcopy_4
Wangyang Guo [Wed, 18 Aug 2021 16:08:06 +0000 (00:08 +0800)]
sbgemm: cooperlake: implement tcopy_4
Wangyang Guo [Wed, 18 Aug 2021 13:17:08 +0000 (21:17 +0800)]
sbgemm: cooperlake: prefetch A & B
Wangyang Guo [Tue, 17 Aug 2021 15:21:19 +0000 (23:21 +0800)]
sbgemm: cooperlake: unroll core loop by 2
Wangyang Guo [Tue, 17 Aug 2021 14:08:24 +0000 (22:08 +0800)]
sbgemm: cooperlake: reorder ptr increase for performance
Wangyang Guo [Tue, 17 Aug 2021 13:13:29 +0000 (21:13 +0800)]
sbgemm: cooperlake: fix bug in m64n12
Wangyang Guo [Tue, 17 Aug 2021 11:35:40 +0000 (19:35 +0800)]
sbgemm: cooperlake: tuning for block params
Wangyang Guo [Mon, 16 Aug 2021 11:39:24 +0000 (19:39 +0800)]
sbgemm: cooperlake: kernel works for NN
Wangyang Guo [Thu, 12 Aug 2021 01:46:49 +0000 (01:46 +0000)]
sbgemm: cooperlake: change kernel size to 16x4
Wangyang Guo [Tue, 10 Aug 2021 06:14:45 +0000 (06:14 +0000)]
sbgemm: cooperlake: implement sbgemm_tcopy_32
Wangyang Guo [Tue, 10 Aug 2021 03:23:45 +0000 (03:23 +0000)]
sbgemm: cooperlake: add dummy source files
Martin Kroeker [Tue, 7 Sep 2021 09:40:40 +0000 (11:40 +0200)]
Update .travis.yml
Martin Kroeker [Tue, 7 Sep 2021 09:19:51 +0000 (11:19 +0200)]
Disable the remaining x86_64 job on Travis
Martin Kroeker [Tue, 7 Sep 2021 06:02:53 +0000 (08:02 +0200)]
Merge pull request #3363 from martin-frbg/fixpr3360
Correct misplaced ifdef lines from PR 3360
Martin Kroeker [Mon, 6 Sep 2021 21:44:20 +0000 (23:44 +0200)]
Correct misplaced ifdef lines
Martin Kroeker [Sun, 5 Sep 2021 18:35:48 +0000 (20:35 +0200)]
Add NO_AVX=1 fallbacks to newer generation x86_64 for completeness (#3360)
* Add NO_AVX=1 fallbacks to newer generation x86_64 for completeness
* Update .travis.yml
Martin Kroeker [Sat, 4 Sep 2021 16:26:59 +0000 (18:26 +0200)]
Add "recursive" option for IBM xlf compiler (#3359)
* Add correct "recursive" option for xlf (from reference-lapack issue 606)
Martin Kroeker [Wed, 1 Sep 2021 22:27:23 +0000 (00:27 +0200)]
Merge pull request #3355 from martin-frbg/smallgemmcr
Add workaround for Windows10 macro name clash in small gemm kernel build rules
Martin Kroeker [Wed, 1 Sep 2021 19:36:50 +0000 (21:36 +0200)]
Add workaround for Windows10 macro name clash
Martin Kroeker [Wed, 1 Sep 2021 11:52:40 +0000 (13:52 +0200)]
Merge pull request #3352 from martin-frbg/3321-2n
Allocate an auxiliary struct when running out of preconfigured threads
Martin Kroeker [Tue, 31 Aug 2021 19:47:21 +0000 (21:47 +0200)]
Merge pull request #3354 from nsait-linaro/fix_gmemm_align_win_arm
[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value
Niyas Sait [Tue, 31 Aug 2021 13:36:44 +0000 (14:36 +0100)]
Make explicit conversion condition on _WIN64 flag
Niyas Sait [Tue, 24 Aug 2021 05:09:29 +0000 (06:09 +0100)]
[win/arm64]: Explicit casting for GEMM_DEFAULT_ALIGN to create 64-bit value
Win64 uses the LLP64 data model, in which unsigned long is only 32 bits wide. On a 64-bit
architecture we need a 64-bit mask to generate addresses correctly.
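The failure mode described here can be reproduced on any platform by standing in `uint32_t` for the 32-bit `unsigned long` of LLP64. The sketch below is illustrative only: the mask value `0x03fff`, the macro names, and the helper functions are assumptions made for the example, not the definitions used in OpenBLAS.

```c
#include <stdint.h>

/* Illustrative alignment mask (assumed value), once as a 32-bit
 * constant (what an unsigned long literal becomes under LLP64)
 * and once explicitly widened to 64 bits. */
#define ALIGN_MASK_32 ((uint32_t)0x03fff)
#define ALIGN_MASK_64 ((uint64_t)0x03fff)

/* Rounds addr up using the narrow mask. The complement is computed
 * in 32 bits (0xffffc000) and then zero-extended, so the upper
 * 32 bits of the address are cleared as well. */
uint64_t align_up_narrow(uint64_t addr)
{
    return (addr + ALIGN_MASK_32) & ~ALIGN_MASK_32;
}

/* Rounds addr up using the 64-bit mask. The complement keeps all
 * upper bits set, so only the low 14 bits are cleared. */
uint64_t align_up_wide(uint64_t addr)
{
    return (addr + ALIGN_MASK_64) & ~ALIGN_MASK_64;
}
```

For addresses below 4 GiB both helpers agree, which is presumably why a problem of this kind can go unnoticed until a buffer lands at a higher address.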
Martin Kroeker [Mon, 30 Aug 2021 18:39:51 +0000 (20:39 +0200)]
Merge pull request #3353 from guowangy/bf16-small-matrix-cooperlake
Enable existing SBGEMM kernel for Cooperlake by small-matrix path
Martin Kroeker [Mon, 30 Aug 2021 12:38:28 +0000 (14:38 +0200)]
Fix typo
Martin Kroeker [Mon, 30 Aug 2021 12:21:25 +0000 (14:21 +0200)]
Clean up debug messages
Wangyang Guo [Mon, 30 Aug 2021 09:48:11 +0000 (17:48 +0800)]
sbgemm: remove unnecessary b0 files
Wangyang Guo [Fri, 13 Aug 2021 10:43:41 +0000 (18:43 +0800)]
sbgemm: cooperlake: make sure hot buffer aligned to 64
Wangyang Guo [Thu, 12 Aug 2021 16:51:24 +0000 (00:51 +0800)]
sbgemm: add missing cblas_sbgemm definition
Wangyang Guo [Thu, 12 Aug 2021 06:10:51 +0000 (06:10 +0000)]
sbgemm: cooperlake: enable SBGEMM by small matrix path
Wangyang Guo [Thu, 12 Aug 2021 03:14:18 +0000 (03:14 +0000)]
Small Matrix: support BFLOAT16 data type
Martin Kroeker [Sun, 29 Aug 2021 20:33:33 +0000 (22:33 +0200)]
Merge pull request #3335 from guowangy/small-matrix-latest
Add GEMM optimization for small matrix and single/double kernel for skylakex
Martin Kroeker [Sun, 29 Aug 2021 17:50:24 +0000 (19:50 +0200)]
Fix unmap logic