Martin Kroeker [Fri, 7 Feb 2020 09:09:25 +0000 (10:09 +0100)]
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test
Martin Kroeker [Fri, 7 Feb 2020 09:03:02 +0000 (10:03 +0100)]
Merge pull request #24 from xianyi/develop
rebase
Martin Kroeker [Thu, 30 Jan 2020 16:07:19 +0000 (17:07 +0100)]
Merge pull request #2378 from martin-frbg/issue2377
Add -march option for AVX512 in cmake as well
Martin Kroeker [Thu, 30 Jan 2020 11:41:18 +0000 (12:41 +0100)]
Add -march option for AVX512
Martin Kroeker [Thu, 30 Jan 2020 09:27:29 +0000 (10:27 +0100)]
Merge pull request #2375 from ewanglong/master
Fix a few performance drops for some matrix sizes per data type
Martin Kroeker [Thu, 23 Jan 2020 20:50:19 +0000 (21:50 +0100)]
Merge pull request #2376 from wjc404/develop
Fix remaining bugs in parallel GEMM3M
wjc404 [Wed, 22 Jan 2020 17:40:03 +0000 (17:40 +0000)]
Update level3_gemm3m_thread.c
Wang,Long [Wed, 22 Jan 2020 15:07:50 +0000 (15:07 +0000)]
Fix a few performance drops for some matrix sizes per data type
Signed-off-by: Wang,Long <long1.wang@intel.com>
Martin Kroeker [Tue, 21 Jan 2020 14:05:38 +0000 (15:05 +0100)]
Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Optimize general Gemm Beta
Martin Kroeker [Tue, 21 Jan 2020 13:56:45 +0000 (14:56 +0100)]
Merge pull request #2372 from martin-frbg/winexit
Do not run any cleanup if the program is exiting anyway
Qiyu8 [Mon, 20 Jan 2020 03:49:42 +0000 (11:49 +0800)]
Optimize general Gemm Beta
Martin Kroeker [Sun, 19 Jan 2020 12:28:27 +0000 (13:28 +0100)]
Do not run any cleanup if the program is exiting anyway
From keno's PR #2350. This avoids the potential hang in blas_thread_shutdown, where we may wait for threads to exit while they are waiting on the loader lock from DllMain.
Martin Kroeker [Sat, 18 Jan 2020 19:39:34 +0000 (20:39 +0100)]
Merge pull request #2371 from martin-frbg/issue2370
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
Martin Kroeker [Sat, 18 Jan 2020 19:39:04 +0000 (20:39 +0100)]
Merge pull request #2253 from thrasibule/xerbla
fix error messages
Martin Kroeker [Sat, 18 Jan 2020 14:06:39 +0000 (15:06 +0100)]
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
as suggested by hjmndv in #2370
Martin Kroeker [Wed, 15 Jan 2020 20:13:43 +0000 (21:13 +0100)]
Merge pull request #2367 from wjc404/develop
Improve parallel SGEMM performance on SKYLAKEX CPUs
wjc404 [Mon, 13 Jan 2020 08:59:23 +0000 (16:59 +0800)]
Update sgemm_direct_skylakex.c
wjc404 [Mon, 13 Jan 2020 08:58:54 +0000 (16:58 +0800)]
Update sgemm_kernel_16x4_skylakex_2.c
wjc404 [Mon, 13 Jan 2020 08:28:41 +0000 (16:28 +0800)]
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
wjc404 [Mon, 13 Jan 2020 08:26:03 +0000 (16:26 +0800)]
improve skylakex parallel sgemm performance
Martin Kroeker [Mon, 13 Jan 2020 08:00:21 +0000 (09:00 +0100)]
Merge pull request #2366 from martin-frbg/install390
Add new file lapack.h from LAPACK 3.9.0 to installable headers
Martin Kroeker [Sun, 12 Jan 2020 21:00:50 +0000 (22:00 +0100)]
Install new lapack.h
new file in LAPACK 3.9.0, split off from lapacke.h
Martin Kroeker [Sun, 12 Jan 2020 20:57:23 +0000 (21:57 +0100)]
Merge pull request #23 from xianyi/develop
rebase
Martin Kroeker [Thu, 9 Jan 2020 22:23:09 +0000 (23:23 +0100)]
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
wjc404 [Thu, 9 Jan 2020 05:48:41 +0000 (13:48 +0800)]
Update KERNEL.SKYLAKEX
Martin Kroeker [Wed, 8 Jan 2020 15:20:28 +0000 (16:20 +0100)]
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
wjc404 [Tue, 7 Jan 2020 03:22:46 +0000 (11:22 +0800)]
Update sgemm_kernel_8x4_haswell.c
wjc404 [Mon, 6 Jan 2020 12:11:36 +0000 (20:11 +0800)]
Update sgemm_kernel_8x4_haswell.c
wjc404 [Mon, 6 Jan 2020 04:28:43 +0000 (12:28 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 6 Jan 2020 04:16:09 +0000 (12:16 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:11:21 +0000 (12:11 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:09:14 +0000 (12:09 +0800)]
optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:07:02 +0000 (12:07 +0800)]
optimize AVX2 SGEMM
Martin Kroeker [Fri, 3 Jan 2020 14:03:30 +0000 (15:03 +0100)]
Merge pull request #2359 from martin-frbg/lapack-pr330
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
Martin Kroeker [Fri, 3 Jan 2020 10:10:00 +0000 (11:10 +0100)]
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
from Reference-LAPACK PR 330
Martin Kroeker [Fri, 3 Jan 2020 09:23:25 +0000 (10:23 +0100)]
Merge pull request #22 from xianyi/develop
rebase
Martin Kroeker [Fri, 3 Jan 2020 08:02:03 +0000 (09:02 +0100)]
Merge pull request #2358 from shengyang-3390/develop
Test all 7 declared values of N in float and double cblas3 tests
shengyang [Fri, 3 Jan 2020 02:03:33 +0000 (10:03 +0800)]
modified: ctest/din3 ctest/sin3
Martin Kroeker [Thu, 2 Jan 2020 21:28:36 +0000 (22:28 +0100)]
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
kernel/arm64/dgemm_beta.S: add beta == zero branch
Martin Kroeker [Thu, 2 Jan 2020 21:27:44 +0000 (22:27 +0100)]
Merge pull request #2356 from shengyang-3390/develop
Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)
shengyang [Thu, 2 Jan 2020 03:01:57 +0000 (11:01 +0800)]
update
chenxuqiang [Thu, 2 Jan 2020 02:50:45 +0000 (21:50 -0500)]
kernel/arm64/dgemm_beta.S: add beta == zero branch
Added a beta == zero branch, which does not need to load the C matrix.
Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
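The idea behind the beta == zero branch can be sketched in portable C. This is only an illustration, not the arm64 assembly from the commit: the function name scale_c_by_beta and its signature are hypothetical. When beta is zero the result is stored without reading C at all, which saves the loads and also keeps stale NaN or Inf values in C out of the output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the OpenBLAS kernel: scale an m x n
   column-major matrix C (leading dimension ldc) by beta in place. */
void scale_c_by_beta(size_t m, size_t n, double beta,
                     double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* Store-only path: C is never read, so no loads are
               issued and old NaN/Inf values cannot propagate. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            /* General path: load, multiply, store. */
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```

Without the separate branch, multiplying a NaN entry by zero would leave NaN in C, so the branch matters for correctness as well as speed.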
Martin Kroeker [Wed, 1 Jan 2020 21:14:16 +0000 (22:14 +0100)]
Merge pull request #2355 from Zeyiii/dev-zeyi2
Use arm neon instructions to optimize sgemm_beta operation
Martin Kroeker [Wed, 1 Jan 2020 21:13:37 +0000 (22:13 +0100)]
Merge pull request #2354 from ZuoQ3/develop
[WIP] Use arm neon instructions to optimize tcopy operation
Martin Kroeker [Wed, 1 Jan 2020 12:18:53 +0000 (13:18 +0100)]
[WIP] Update LAPACK to 3.9.0 (#2353)
* Update make.inc entries for LAPACK 3.9.0
Reference-LAPACK PR 347 changed some variable names and relative paths
* Update LAPACK to 3.9.0
* Add new functions from LAPACK 3.9.0
* Add new functions from LAPACK 3.9.0
* Restore LOADER command
as it makes it easier to specify pthread as needed
* Restore LOADER
* Restore EIG/LIN prefixes in cmdbase
* add binary path to lapack_testing.py call
* Restore OpenMP version check
* Restore OpenMP version check
* Restore fix for out-of-bounds array accesses
from #2096
Martin Kroeker [Tue, 31 Dec 2019 17:08:10 +0000 (18:08 +0100)]
Merge pull request #2352 from wjc404/develop
AVX2 ZGEMM3M kernel
Martin Kroeker [Tue, 31 Dec 2019 17:07:37 +0000 (18:07 +0100)]
Merge pull request #2351 from Zeyiii/develop
prefetching for dgemm_beta
int_13h [Tue, 31 Dec 2019 17:03:27 +0000 (22:33 +0530)]
add in runtime cpu detection for zarch (#2349)
add in runtime cpu detection for zarch
shengyang [Tue, 31 Dec 2019 09:06:35 +0000 (17:06 +0800)]
Use arm neon instructions to optimize ncopy operation
modified: KERNEL.ARMV8
modified: KERNEL.TSV110
new file: sgemm_ncopy_4.S
shengyang [Tue, 31 Dec 2019 07:59:52 +0000 (15:59 +0800)]
modified: ctest/din3
modified: ctest/sin3
w00421467 [Tue, 31 Dec 2019 02:31:07 +0000 (10:31 +0800)]
Use arm neon instructions to optimize sgemm_beta operation
zq [Tue, 31 Dec 2019 02:21:23 +0000 (10:21 +0800)]
[WIP] Use arm neon instructions to optimize tcopy operation
w00421467 [Tue, 31 Dec 2019 02:13:24 +0000 (10:13 +0800)]
Merge remote-tracking branch 'pub/develop' into develop
wjc404 [Mon, 30 Dec 2019 09:33:42 +0000 (17:33 +0800)]
Update zgemm3m_kernel_4x4_haswell.c
wjc404 [Mon, 30 Dec 2019 09:18:59 +0000 (17:18 +0800)]
Add files via upload
wjc404 [Mon, 30 Dec 2019 08:11:37 +0000 (16:11 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 30 Dec 2019 08:10:08 +0000 (16:10 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 30 Dec 2019 08:08:19 +0000 (16:08 +0800)]
Update param.h
wjc404 [Mon, 30 Dec 2019 08:04:23 +0000 (16:04 +0800)]
Update KERNEL.ZEN
wjc404 [Mon, 30 Dec 2019 08:03:24 +0000 (16:03 +0800)]
Update KERNEL.HASWELL
wjc404 [Mon, 30 Dec 2019 08:02:51 +0000 (16:02 +0800)]
Create zgemm3m_kernel_4x4_haswell.c
w00421467 [Mon, 30 Dec 2019 03:45:49 +0000 (11:45 +0800)]
prefetching for dgemm_beta
Martin Kroeker [Sun, 29 Dec 2019 20:27:18 +0000 (21:27 +0100)]
Update LAPACK to 3.9.0
Martin Kroeker [Sun, 29 Dec 2019 17:08:55 +0000 (18:08 +0100)]
Merge pull request #21 from xianyi/develop
rebase
Martin Kroeker [Sat, 28 Dec 2019 19:07:56 +0000 (20:07 +0100)]
Merge pull request #2348 from wjc404/develop
AVX2 CGEMM3M kernel
wjc404 [Fri, 27 Dec 2019 15:36:13 +0000 (23:36 +0800)]
Update CONTRIBUTORS.md
wjc404 [Fri, 27 Dec 2019 10:23:29 +0000 (18:23 +0800)]
Update cgemm3m_kernel_8x4_haswell.c
wjc404 [Fri, 27 Dec 2019 10:06:42 +0000 (18:06 +0800)]
Update param.h
wjc404 [Fri, 27 Dec 2019 10:04:08 +0000 (18:04 +0800)]
Update KERNEL.ZEN
wjc404 [Fri, 27 Dec 2019 10:03:01 +0000 (18:03 +0800)]
Update gemm3m_level3.c
wjc404 [Fri, 27 Dec 2019 10:01:38 +0000 (18:01 +0800)]
Update KERNEL.HASWELL
wjc404 [Fri, 27 Dec 2019 10:00:55 +0000 (18:00 +0800)]
Create cgemm3m_kernel_8x4_haswell.c
Martin Kroeker [Wed, 25 Dec 2019 21:26:41 +0000 (22:26 +0100)]
Merge pull request #2345 from wjc404/develop
Optimize AVX2 CGEMM
wjc404 [Mon, 23 Dec 2019 16:40:16 +0000 (00:40 +0800)]
Update cgemm_kernel_8x2_haswell.c
wjc404 [Mon, 23 Dec 2019 16:30:16 +0000 (00:30 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 23 Dec 2019 16:24:40 +0000 (00:24 +0800)]
Update CONTRIBUTORS.md
wjc404 [Mon, 23 Dec 2019 15:44:55 +0000 (23:44 +0800)]
Update param.h
wjc404 [Mon, 23 Dec 2019 15:42:30 +0000 (23:42 +0800)]
Update KERNEL.ZEN
wjc404 [Mon, 23 Dec 2019 15:41:44 +0000 (23:41 +0800)]
Update KERNEL.HASWELL
wjc404 [Mon, 23 Dec 2019 15:40:03 +0000 (23:40 +0800)]
Fast Haswell CGEMM kernel
Martin Kroeker [Sat, 21 Dec 2019 11:16:55 +0000 (12:16 +0100)]
Merge pull request #2344 from wjc404/develop
Optimize AVX2 ZGEMM
wjc404 [Sat, 21 Dec 2019 06:38:51 +0000 (14:38 +0800)]
Adjust Haswell ZGEMM blocking parameters
wjc404 [Sat, 21 Dec 2019 06:37:06 +0000 (14:37 +0800)]
Fast Haswell ZGEMM kernel
wjc404 [Sat, 21 Dec 2019 06:35:15 +0000 (14:35 +0800)]
Fast Haswell ZGEMM kernel
Martin Kroeker [Fri, 20 Dec 2019 07:38:57 +0000 (08:38 +0100)]
Merge pull request #2340 from Zeyiii/develop
[WIP] Use arm neon instructions to optimize gemm beta operation
w00421467 [Fri, 20 Dec 2019 02:11:50 +0000 (10:11 +0800)]
declare DGEMM_BETA in KERNEL.ARMV8 rather than the generic KERNEL
w00421467 [Tue, 17 Dec 2019 02:00:13 +0000 (10:00 +0800)]
use arm neon instructions to optimize gemm beta operation
Martin Kroeker [Fri, 13 Dec 2019 13:57:26 +0000 (14:57 +0100)]
Merge pull request #2339 from Jehan/wip/Jehan/fix-timeout
driver: more reasonable thread wait timeout on Windows.
Jehan [Wed, 11 Dec 2019 16:51:42 +0000 (17:51 +0100)]
driver: more reasonable thread wait timeout on Windows.
It used to be 5ms, which might not be long enough in some cases for the
thread to exit well, but then when set to 5000 (5s), it would slow down
any program depending on OpenBLAS.
Let's just set it to 50ms, which is at least 10 times longer than
originally, but still reasonable in case of failed thread termination.
Martin Kroeker [Mon, 9 Dec 2019 16:54:49 +0000 (17:54 +0100)]
Merge pull request #2338 from kavanabhat/aix_mod
Changes to build on AIX in POWER8 mode
Martin Kroeker [Sat, 7 Dec 2019 08:38:06 +0000 (09:38 +0100)]
Merge pull request #2337 from martin-frbg/issue2336
Support two-digit version numbers in gcc version check
Martin Kroeker [Fri, 6 Dec 2019 20:23:56 +0000 (21:23 +0100)]
Support two-digit version numbers in gcc version check
fixes #2336 (non-recognition of gcc 10) with patch provided by JeffreyALaw.
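As an illustration of this class of bug, the C sketch below shows one plausible way a version check can miss a two-digit major version. It is an assumption, not the actual check, which lives in the OpenBLAS build scripts and may differ; both function names are hypothetical. Reading only the first character turns "10.0.1" into major version 1, while parsing the full leading number gives 10.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the leading number of a version string
   such as "9.2.1" or "10.0.1", or -1 if no non-negative number can
   be parsed from its start. */
int parse_major_version(const char *version)
{
    assert(version != NULL);
    char *end = NULL;
    long major = strtol(version, &end, 10);
    if (end == version || major < 0)
        return -1;
    return (int)major;
}

/* Hypothetical single-digit check that breaks at version 10:
   it only looks at the first character of the string. */
int first_digit_only(const char *version)
{
    assert(version != NULL);
    if (version[0] < '0' || version[0] > '9')
        return -1;
    return version[0] - '0';
}
```

A comparison such as "major >= 6" then behaves correctly for gcc 10 with the first helper and fails with the second.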
Kavana Bhat [Fri, 6 Dec 2019 10:33:32 +0000 (04:33 -0600)]
AIX changes for Power8
Martin Kroeker [Wed, 4 Dec 2019 10:06:03 +0000 (11:06 +0100)]
Update DYNAMIC_ARCH support for ARM64 and PPC (#2332)
* Update DYNAMIC_ARCH list of ARM64 targets for gmake
* Update arm64 cpu list for runtime detection
* Update DYNAMIC_ARCH list of ARM64 targets for cmake and add POWERPC targets
Kavana Bhat [Wed, 4 Dec 2019 06:23:46 +0000 (00:23 -0600)]
AIX changes for Power8
Martin Kroeker [Tue, 3 Dec 2019 21:23:52 +0000 (22:23 +0100)]
Merge pull request #2334 from martin-frbg/fix2228
Remove misplaced file
Martin Kroeker [Tue, 3 Dec 2019 07:32:29 +0000 (08:32 +0100)]
Add Intel Goldmont+ cpuid
was originally in #2228 but that PR had misplaced the file in the toplevel directory
Martin Kroeker [Tue, 3 Dec 2019 07:24:10 +0000 (08:24 +0100)]
Delete stray copy of dynamic.c from PR 2228
Martin Kroeker [Tue, 3 Dec 2019 07:22:40 +0000 (08:22 +0100)]
Merge pull request #20 from xianyi/develop
Rebase
Martin Kroeker [Mon, 2 Dec 2019 07:30:43 +0000 (08:30 +0100)]
Merge pull request #2329 from isuruf/patch-1
Workaround an ICE in clang 9.0.0