Martin Kroeker [Wed, 12 Feb 2020 18:16:14 +0000 (19:16 +0100)]
Merge pull request #27 from xianyi/develop
rebase
Martin Kroeker [Wed, 12 Feb 2020 14:38:37 +0000 (15:38 +0100)]
Merge pull request #2410 from bartoldeman/fix-dscal-inline-asm
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
Bart Oldeman [Wed, 12 Feb 2020 14:11:44 +0000 (14:11 +0000)]
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
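A minimal sketch of the constraint issue described above, assuming GCC or Clang extended asm on x86-64. The helper name advance8 is hypothetical and is not the dscal kernel itself; it only shows a pointer that leaq modifies being declared with a "+r" (input/output) constraint. On other targets the sketch falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from OpenBLAS: advance a pointer by
 * eight doubles (64 bytes). */
static double *advance8(double *x)
{
    assert(x != NULL);
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* leaq writes its result back into the register holding x, so x is
     * listed as an input/output operand ("+r"). Listing it as input only
     * ("r") would let the compiler assume the register is unchanged. */
    __asm__ ("leaq 64(%0), %0" : "+r"(x));
#else
    /* Portable fallback for other compilers or architectures. */
    x += 8;
#endif
    return x;
}
```

With an input-only constraint the code might still appear to work at low optimization levels, which is why this kind of bug tends to show up only with particular compilers or flags.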
Martin Kroeker [Tue, 11 Feb 2020 12:04:44 +0000 (13:04 +0100)]
Merge pull request #2407 from susilehtola/patch-2
Patch out instances of Z15 in dynamic_zarch.c
Martin Kroeker [Tue, 11 Feb 2020 12:03:35 +0000 (13:03 +0100)]
Merge pull request #2405 from susilehtola/patch-1
Fix typo in dynamic_zarch.c
Martin Kroeker [Tue, 11 Feb 2020 12:00:36 +0000 (13:00 +0100)]
Merge pull request #2404 from martin-frbg/issue2395
Fix spurious application of USE_TRMM in cmake builds
Martin Kroeker [Tue, 11 Feb 2020 12:00:16 +0000 (13:00 +0100)]
Merge pull request #2403 from martin-frbg/issue2400
Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont
Martin Kroeker [Tue, 11 Feb 2020 11:59:53 +0000 (12:59 +0100)]
Merge pull request #2402 from gxw-loongson/develop
Avoid printing "unrecognized command line option ‘-mmsa’" on mips and mips64 when checking for MSA support
Martin Kroeker [Tue, 11 Feb 2020 11:56:56 +0000 (12:56 +0100)]
Merge pull request #2399 from martin-frbg/buffersize
Make BUFFER_SIZE configurable at build time
Susi Lehtola [Tue, 11 Feb 2020 02:07:33 +0000 (15:07 +1300)]
Patch out instances of Z15 in dynamic_zarch.c
There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.
Susi Lehtola [Tue, 11 Feb 2020 01:46:30 +0000 (14:46 +1300)]
Fix typo in dynamic_zarch.c
Martin Kroeker [Mon, 10 Feb 2020 20:17:39 +0000 (21:17 +0100)]
Fix bad conditional syntax that caused spurious application of USE_TRMM
Martin Kroeker [Mon, 10 Feb 2020 18:17:32 +0000 (19:17 +0100)]
Fix coretype detection for Intel extended models 6 and 7
affecting Goldmont, Cannon Lake, Ice Lake autodetection
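For context, a sketch of how the family, model and extended-model fields are laid out in the EAX value returned by CPUID leaf 1. The struct and function names are hypothetical and this is not the OpenBLAS detection code; it only illustrates why a missing extended-model case leaves a CPU unrecognized.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 EAX fields. */
struct cpu_signature {
    unsigned family;     /* bits 11:8  */
    unsigned model;      /* bits 7:4   */
    unsigned ext_family; /* bits 27:20 */
    unsigned ext_model;  /* bits 19:16 */
};

static struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    s.family     = (eax >> 8)  & 0xf;
    s.model      = (eax >> 4)  & 0xf;
    s.ext_family = (eax >> 20) & 0xff;
    s.ext_model  = (eax >> 16) & 0xf;
    return s;
}

/* For family 6 and family 15 the displayed model combines both fields. */
static unsigned display_model(struct cpu_signature s)
{
    if (s.family == 6 || s.family == 15)
        return (s.ext_model << 4) | s.model;
    return s.model;
}
```

As an illustration, the sample value 0x000706E5 decodes to family 6, extended model 7, model 14, which gives a displayed model of 0x7E. Detection code that switches on the extended model needs a case for 7 before it can look at the model at all.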
gxw [Mon, 10 Feb 2020 11:11:45 +0000 (19:11 +0800)]
Avoid printing the following message on mips and mips64 when checking for MSA support:
"unrecognized command line option ‘-mmsa’"
Martin Kroeker [Sun, 9 Feb 2020 22:32:57 +0000 (23:32 +0100)]
Make BUFFER_SIZE configurable
Martin Kroeker [Sun, 9 Feb 2020 22:30:22 +0000 (23:30 +0100)]
Make BUFFER_SIZE configurable
Martin Kroeker [Sun, 9 Feb 2020 22:28:04 +0000 (23:28 +0100)]
Add configuration option for BUFFER_SIZE
Martin Kroeker [Sun, 9 Feb 2020 22:23:55 +0000 (23:23 +0100)]
Merge pull request #26 from xianyi/develop
rebase
Martin Kroeker [Sun, 9 Feb 2020 22:18:44 +0000 (23:18 +0100)]
Increment version to 0.3.9.dev
Martin Kroeker [Sun, 9 Feb 2020 22:18:07 +0000 (23:18 +0100)]
Increment version to 0.3.9.dev
Martin Kroeker [Sun, 9 Feb 2020 22:16:06 +0000 (23:16 +0100)]
Merge branch 'release-0.3.0' into develop
Martin Kroeker [Sun, 9 Feb 2020 22:01:52 +0000 (23:01 +0100)]
Merge pull request #2397 from martin-frbg/038changes
Update Changelog with changes from 0.3.8
Martin Kroeker [Sun, 9 Feb 2020 22:00:36 +0000 (23:00 +0100)]
Update with changes from 0.3.8
Martin Kroeker [Sun, 9 Feb 2020 21:48:15 +0000 (22:48 +0100)]
Merge pull request #25 from xianyi/develop
rebase
Martin Kroeker [Sun, 9 Feb 2020 00:06:40 +0000 (01:06 +0100)]
typo fixes
Martin Kroeker [Sun, 9 Feb 2020 00:00:33 +0000 (01:00 +0100)]
Merge pull request #2393 from martin-frbg/issue2388
Provide more documentation in README.md
Martin Kroeker [Sat, 8 Feb 2020 23:13:40 +0000 (00:13 +0100)]
Merge pull request #2390 from martin-frbg/pgi
Small corrections for compilation with PGI compilers
Martin Kroeker [Sat, 8 Feb 2020 23:06:07 +0000 (00:06 +0100)]
Update CPU and OS support and document DYNAMIC_ARCH option in README.md
prompted by #2388
Martin Kroeker [Sat, 8 Feb 2020 09:20:13 +0000 (10:20 +0100)]
Remove PGI from list again as it is actually still not capable
Martin Kroeker [Fri, 7 Feb 2020 15:05:46 +0000 (16:05 +0100)]
Merge pull request #2389 from Zeyiii/develop
Fix bugs in benchmark of gemv
Martin Kroeker [Fri, 7 Feb 2020 15:03:51 +0000 (16:03 +0100)]
Remove OpenMP libraries from link list
Martin Kroeker [Fri, 7 Feb 2020 15:02:17 +0000 (16:02 +0100)]
Remove OpenMP libraries from link list
Martin Kroeker [Fri, 7 Feb 2020 12:47:12 +0000 (13:47 +0100)]
Merge pull request #2384 from wjc404/develop
Optimize AVX512 DGEMM (& DTRMM)
Martin Kroeker [Fri, 7 Feb 2020 12:01:31 +0000 (13:01 +0100)]
Add PGI to avx512-supporting compilers
Martin Kroeker [Fri, 7 Feb 2020 09:15:18 +0000 (10:15 +0100)]
Fix utest compilation with PGI
Martin Kroeker [Fri, 7 Feb 2020 09:09:25 +0000 (10:09 +0100)]
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test
Martin Kroeker [Fri, 7 Feb 2020 09:03:02 +0000 (10:03 +0100)]
Merge pull request #24 from xianyi/develop
rebase
wjc404 [Thu, 6 Feb 2020 02:14:10 +0000 (02:14 +0000)]
Update dgemm_kernel_16x2_skylakex.c
wjc404 [Thu, 6 Feb 2020 01:47:46 +0000 (01:47 +0000)]
Update sgemm_kernel_8x4_haswell.c
wjc404 [Thu, 6 Feb 2020 01:46:36 +0000 (01:46 +0000)]
Update dgemm_kernel_16x2_skylakex.c
w00421467 [Wed, 5 Feb 2020 07:07:18 +0000 (15:07 +0800)]
Fix another branch
w00421467 [Wed, 5 Feb 2020 06:53:37 +0000 (14:53 +0800)]
Fix bugs in benchmark of gemv
wjc404 [Wed, 5 Feb 2020 05:36:57 +0000 (13:36 +0800)]
Update dgemm_kernel_16x2_skylakex.c
wjc404 [Wed, 5 Feb 2020 02:15:02 +0000 (10:15 +0800)]
Update trmm_R.c
wjc404 [Wed, 5 Feb 2020 02:09:41 +0000 (10:09 +0800)]
Update trmm_L.c
wjc404 [Tue, 4 Feb 2020 12:33:08 +0000 (20:33 +0800)]
Update level3_thread.c
wjc404 [Tue, 4 Feb 2020 12:30:23 +0000 (20:30 +0800)]
Update level3.c
wjc404 [Tue, 4 Feb 2020 11:55:26 +0000 (19:55 +0800)]
Update param.h
wjc404 [Mon, 3 Feb 2020 13:38:08 +0000 (21:38 +0800)]
Update KERNEL.SKYLAKEX
wjc404 [Mon, 3 Feb 2020 13:34:12 +0000 (21:34 +0800)]
Update param.h
wjc404 [Mon, 3 Feb 2020 13:32:56 +0000 (21:32 +0800)]
AVX512 16x2 DGEMM kernel
Martin Kroeker [Thu, 30 Jan 2020 16:07:19 +0000 (17:07 +0100)]
Merge pull request #2378 from martin-frbg/issue2377
Add -march option for AVX512 in cmake as well
Martin Kroeker [Thu, 30 Jan 2020 11:41:18 +0000 (12:41 +0100)]
Add -march option for AVX512
Martin Kroeker [Thu, 30 Jan 2020 09:27:29 +0000 (10:27 +0100)]
Merge pull request #2375 from ewanglong/master
Fix performance drops at some matrix sizes for certain data types
Martin Kroeker [Thu, 23 Jan 2020 20:50:19 +0000 (21:50 +0100)]
Merge pull request #2376 from wjc404/develop
Fix remaining bugs in parallel GEMM3M
wjc404 [Wed, 22 Jan 2020 17:40:03 +0000 (17:40 +0000)]
Update level3_gemm3m_thread.c
Wang,Long [Wed, 22 Jan 2020 15:07:50 +0000 (15:07 +0000)]
Fix performance drops at some matrix sizes for certain data types
Signed-off-by: Wang,Long <long1.wang@intel.com>
Martin Kroeker [Tue, 21 Jan 2020 14:05:38 +0000 (15:05 +0100)]
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Optimize general GEMM beta
Martin Kroeker [Tue, 21 Jan 2020 13:56:45 +0000 (14:56 +0100)]
Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
Qiyu8 [Mon, 20 Jan 2020 03:49:42 +0000 (11:49 +0800)]
Optimize general GEMM beta
Martin Kroeker [Sun, 19 Jan 2020 12:28:27 +0000 (13:28 +0100)]
Do not run any cleanup if the program is exiting anyway
From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain
Martin Kroeker [Sat, 18 Jan 2020 19:39:34 +0000 (20:39 +0100)]
Merge pull request #2371 from martin-frbg/issue2370
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
Martin Kroeker [Sat, 18 Jan 2020 19:39:04 +0000 (20:39 +0100)]
Merge pull request #2253 from thrasibule/xerbla
fix error messages
Martin Kroeker [Sat, 18 Jan 2020 14:06:39 +0000 (15:06 +0100)]
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
as suggested by hjmndv in #2370
Martin Kroeker [Wed, 15 Jan 2020 20:13:43 +0000 (21:13 +0100)]
Merge pull request #2367 from wjc404/develop
Improve parallel SGEMM performance on SKYLAKEX CPUs
wjc404 [Mon, 13 Jan 2020 08:59:23 +0000 (16:59 +0800)]
Update sgemm_direct_skylakex.c
wjc404 [Mon, 13 Jan 2020 08:58:54 +0000 (16:58 +0800)]
Update sgemm_kernel_16x4_skylakex_2.c
wjc404 [Mon, 13 Jan 2020 08:28:41 +0000 (16:28 +0800)]
Make the SKYLAKEX SGEMM code more readable
Some kernels were also adjusted to improve performance
wjc404 [Mon, 13 Jan 2020 08:26:03 +0000 (16:26 +0800)]
Improve SKYLAKEX parallel SGEMM performance
Martin Kroeker [Mon, 13 Jan 2020 08:00:21 +0000 (09:00 +0100)]
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
Martin Kroeker [Sun, 12 Jan 2020 21:00:50 +0000 (22:00 +0100)]
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
Martin Kroeker [Sun, 12 Jan 2020 20:57:23 +0000 (21:57 +0100)]
Merge pull request #23 from xianyi/develop
rebase
Martin Kroeker [Thu, 9 Jan 2020 22:23:09 +0000 (23:23 +0100)]
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
wjc404 [Thu, 9 Jan 2020 05:48:41 +0000 (13:48 +0800)]
Update KERNEL.SKYLAKEX
Martin Kroeker [Wed, 8 Jan 2020 15:20:28 +0000 (16:20 +0100)]
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
wjc404 [Tue, 7 Jan 2020 03:22:46 +0000 (11:22 +0800)]
Update sgemm_kernel_8x4_haswell.c
wjc404 [Mon, 6 Jan 2020 12:11:36 +0000 (20:11 +0800)]
Update sgemm_kernel_8x4_haswell.c
wjc404 [Mon, 6 Jan 2020 04:28:43 +0000 (12:28 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 6 Jan 2020 04:16:09 +0000 (12:16 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:11:21 +0000 (12:11 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:09:14 +0000 (12:09 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:07:02 +0000 (12:07 +0800)]
optimize AVX2 SGEMM
Martin Kroeker [Fri, 3 Jan 2020 14:03:30 +0000 (15:03 +0100)]
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
Martin Kroeker [Fri, 3 Jan 2020 10:10:00 +0000 (11:10 +0100)]
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
Martin Kroeker [Fri, 3 Jan 2020 09:23:25 +0000 (10:23 +0100)]
Merge pull request #22 from xianyi/develop
rebase
Martin Kroeker [Fri, 3 Jan 2020 08:02:03 +0000 (09:02 +0100)]
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
shengyang [Fri, 3 Jan 2020 02:03:33 +0000 (10:03 +0800)]
modified: ctest/din3 ctest/sin3
Martin Kroeker [Thu, 2 Jan 2020 21:28:36 +0000 (22:28 +0100)]
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
Martin Kroeker [Thu, 2 Jan 2020 21:27:44 +0000 (22:27 +0100)]
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
shengyang [Thu, 2 Jan 2020 03:01:57 +0000 (11:01 +0800)]
update
chenxuqiang [Thu, 2 Jan 2020 02:50:45 +0000 (21:50 -0500)]
kernel/arm64/dgemm_beta.S: add beta == zero branch
Add a beta == zero branch so that the C matrix does not need to be loaded.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
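The idea behind a beta == zero branch can be sketched in plain C. This is an illustration only, assuming a column-major matrix and a hypothetical helper name scale_c; the actual change is written in arm64 assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale an m x n column-major matrix C by beta,
 * where ldc is the leading dimension (ldc >= m). */
static void scale_c(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* Store-only path: C is overwritten with zeros and its old
             * contents are never read. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            /* General path: each element is loaded, scaled and stored. */
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

Besides skipping the loads, the store-only path writes exact zeros even if C previously held NaN or Inf, which a multiplication by zero would not do.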
Martin Kroeker [Wed, 1 Jan 2020 21:14:16 +0000 (22:14 +0100)]
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
Martin Kroeker [Wed, 1 Jan 2020 21:13:37 +0000 (22:13 +0100)]
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
Martin Kroeker [Wed, 1 Jan 2020 12:18:53 +0000 (13:18 +0100)]
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
Martin Kroeker [Tue, 31 Dec 2019 17:08:10 +0000 (18:08 +0100)]
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
Martin Kroeker [Tue, 31 Dec 2019 17:07:37 +0000 (18:07 +0100)]
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
int_13h [Tue, 31 Dec 2019 17:03:27 +0000 (22:33 +0530)]
add in runtime cpu detection for zarch (#2349)
shengyang [Tue, 31 Dec 2019 09:06:35 +0000 (17:06 +0800)]
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
shengyang [Tue, 31 Dec 2019 07:59:52 +0000 (15:59 +0800)]
modified: ctest/din3
modified: ctest/sin3
w00421467 [Tue, 31 Dec 2019 02:31:07 +0000 (10:31 +0800)]
Use arm neon instructions to optimize sgemm_beta operation