platform/upstream/openblas.git
4 years ago AVX2 STRSM kernel
wjc404 [Mon, 16 Mar 2020 16:34:08 +0000 (00:34 +0800)]
AVX2 STRSM kernel

4 years ago Update cgemm_kernel_8x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:26:15 +0000 (22:26 +0800)]
Update cgemm_kernel_8x2_haswell.c

4 years ago Update zgemm_kernel_4x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:25:19 +0000 (22:25 +0800)]
Update zgemm_kernel_4x2_haswell.c

4 years ago Update zgemm_kernel_4x2_haswell.c
wjc404 [Wed, 26 Feb 2020 10:38:12 +0000 (18:38 +0800)]
Update zgemm_kernel_4x2_haswell.c

4 years ago Update cgemm_kernel_8x2_haswell.c
wjc404 [Wed, 26 Feb 2020 10:36:54 +0000 (18:36 +0800)]
Update cgemm_kernel_8x2_haswell.c

4 years ago Add files via upload
wjc404 [Sat, 22 Feb 2020 15:40:02 +0000 (23:40 +0800)]
Add files via upload

4 years ago Update KERNEL.ZEN
wjc404 [Sat, 22 Feb 2020 15:39:43 +0000 (23:39 +0800)]
Update KERNEL.ZEN

4 years ago Update KERNEL.HASWELL
wjc404 [Sat, 22 Feb 2020 15:39:20 +0000 (23:39 +0800)]
Update KERNEL.HASWELL

4 years ago Delete sgemm_kernel_8x4_haswell_2.c
wjc404 [Sat, 22 Feb 2020 15:38:48 +0000 (23:38 +0800)]
Delete sgemm_kernel_8x4_haswell_2.c

4 years ago Fix performance bug when LDC is a multiple of 1024
wjc404 [Sat, 22 Feb 2020 15:37:45 +0000 (23:37 +0800)]
Fix performance bug when LDC is a multiple of 1024

4 years ago Update param.h
wjc404 [Sun, 16 Feb 2020 15:01:31 +0000 (23:01 +0800)]
Update param.h

4 years ago Update KERNEL.SKYLAKEX
wjc404 [Sun, 16 Feb 2020 14:58:44 +0000 (22:58 +0800)]
Update KERNEL.SKYLAKEX

4 years ago AVX512 STRMM kernel
wjc404 [Sun, 16 Feb 2020 14:58:00 +0000 (22:58 +0800)]
AVX512 STRMM kernel

4 years ago Update dgemm_kernel_16x2_skylakex.c
wjc404 [Thu, 6 Feb 2020 02:14:10 +0000 (02:14 +0000)]
Update dgemm_kernel_16x2_skylakex.c

4 years ago Update sgemm_kernel_8x4_haswell.c
wjc404 [Thu, 6 Feb 2020 01:47:46 +0000 (01:47 +0000)]
Update sgemm_kernel_8x4_haswell.c

4 years ago Update dgemm_kernel_16x2_skylakex.c
wjc404 [Thu, 6 Feb 2020 01:46:36 +0000 (01:46 +0000)]
Update dgemm_kernel_16x2_skylakex.c

4 years ago Update dgemm_kernel_16x2_skylakex.c
wjc404 [Wed, 5 Feb 2020 05:36:57 +0000 (13:36 +0800)]
Update dgemm_kernel_16x2_skylakex.c

4 years ago Update trmm_R.c
wjc404 [Wed, 5 Feb 2020 02:15:02 +0000 (10:15 +0800)]
Update trmm_R.c

4 years ago Update trmm_L.c
wjc404 [Wed, 5 Feb 2020 02:09:41 +0000 (10:09 +0800)]
Update trmm_L.c

4 years ago Update level3_thread.c
wjc404 [Tue, 4 Feb 2020 12:33:08 +0000 (20:33 +0800)]
Update level3_thread.c

4 years ago Update level3.c
wjc404 [Tue, 4 Feb 2020 12:30:23 +0000 (20:30 +0800)]
Update level3.c

4 years ago Update param.h
wjc404 [Tue, 4 Feb 2020 11:55:26 +0000 (19:55 +0800)]
Update param.h

4 years ago Update KERNEL.SKYLAKEX
wjc404 [Mon, 3 Feb 2020 13:38:08 +0000 (21:38 +0800)]
Update KERNEL.SKYLAKEX

4 years ago Update param.h
wjc404 [Mon, 3 Feb 2020 13:34:12 +0000 (21:34 +0800)]
Update param.h

4 years ago AVX512 16x2 DGEMM kernel
wjc404 [Mon, 3 Feb 2020 13:32:56 +0000 (21:32 +0800)]
AVX512 16x2 DGEMM kernel

4 years ago Merge pull request #2378 from martin-frbg/issue2377
Martin Kroeker [Thu, 30 Jan 2020 16:07:19 +0000 (17:07 +0100)]
Merge pull request #2378 from martin-frbg/issue2377

Add -march option for AVX512 in cmake as well

4 years ago Add -march option for AVX512
Martin Kroeker [Thu, 30 Jan 2020 11:41:18 +0000 (12:41 +0100)]
Add -march option for AVX512

4 years ago Merge pull request #2375 from ewanglong/master
Martin Kroeker [Thu, 30 Jan 2020 09:27:29 +0000 (10:27 +0100)]
Merge pull request #2375 from ewanglong/master

fix a few performance drop in some matrix size per data type

4 years ago Merge pull request #2376 from wjc404/develop
Martin Kroeker [Thu, 23 Jan 2020 20:50:19 +0000 (21:50 +0100)]
Merge pull request #2376 from wjc404/develop

Fix remaining bugs in parallel GEMM3M

4 years ago Update level3_gemm3m_thread.c
wjc404 [Wed, 22 Jan 2020 17:40:03 +0000 (17:40 +0000)]
Update level3_gemm3m_thread.c

4 years ago fix a few performance drop in some matrix size per data type
Wang,Long [Wed, 22 Jan 2020 15:07:50 +0000 (15:07 +0000)]
fix a few performance drop in some matrix size per data type

Signed-off-by: Wang,Long <long1.wang@intel.com>

4 years ago Merge pull request #2373 from Qiyu8/optimize#gemmbeta
Martin Kroeker [Tue, 21 Jan 2020 14:05:38 +0000 (15:05 +0100)]
Merge pull request #2373 from Qiyu8/optimize#gemmbeta

Optimize general Gemm Beta

4 years ago Merge pull request #2372 from martin-frbg/winexit
Martin Kroeker [Tue, 21 Jan 2020 13:56:45 +0000 (14:56 +0100)]
Merge pull request #2372 from martin-frbg/winexit

Do not run any cleanup if the program is exiting anyway

4 years ago Optimize general Gemm Beta
Qiyu8 [Mon, 20 Jan 2020 03:49:42 +0000 (11:49 +0800)]
Optimize general Gemm Beta

4 years ago Do not run any cleanup if the program is exiting anyway
Martin Kroeker [Sun, 19 Jan 2020 12:28:27 +0000 (13:28 +0100)]
Do not run any cleanup if the program is exiting anyway

From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain

4 years ago Merge pull request #2371 from martin-frbg/issue2370
Martin Kroeker [Sat, 18 Jan 2020 19:39:34 +0000 (20:39 +0100)]
Merge pull request #2371 from martin-frbg/issue2370

Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT

4 years ago Merge pull request #2253 from thrasibule/xerbla
Martin Kroeker [Sat, 18 Jan 2020 19:39:04 +0000 (20:39 +0100)]
Merge pull request #2253 from thrasibule/xerbla

fix error messages

4 years ago Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT
Martin Kroeker [Sat, 18 Jan 2020 14:06:39 +0000 (15:06 +0100)]
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT

as suggested by hjmndv in #2370

4 years ago Merge pull request #2367 from wjc404/develop
Martin Kroeker [Wed, 15 Jan 2020 20:13:43 +0000 (21:13 +0100)]
Merge pull request #2367 from wjc404/develop

Improve paralleled SGEMM performance on SKYLAKEX CPUs

4 years ago Update sgemm_direct_skylakex.c
wjc404 [Mon, 13 Jan 2020 08:59:23 +0000 (16:59 +0800)]
Update sgemm_direct_skylakex.c

4 years ago Update sgemm_kernel_16x4_skylakex_2.c
wjc404 [Mon, 13 Jan 2020 08:58:54 +0000 (16:58 +0800)]
Update sgemm_kernel_16x4_skylakex_2.c

4 years ago make skylakex sgemm code more friendly for readers
wjc404 [Mon, 13 Jan 2020 08:28:41 +0000 (16:28 +0800)]
make skylakex sgemm code more friendly for readers

BTW some kernels were adjusted to improve performance

4 years ago improve skylakex paralleled sgemm performance
wjc404 [Mon, 13 Jan 2020 08:26:03 +0000 (16:26 +0800)]
improve skylakex paralleled sgemm performance

4 years ago Merge pull request #2366 from martin-frbg/install390
Martin Kroeker [Mon, 13 Jan 2020 08:00:21 +0000 (09:00 +0100)]
Merge pull request #2366 from martin-frbg/install390

Add new file lapack.h from LAPACK 3.9.0 to installable headers

4 years ago Install new lapack.h
Martin Kroeker [Sun, 12 Jan 2020 21:00:50 +0000 (22:00 +0100)]
Install new lapack.h

new file in LAPACK 3.9.0, split off from lapacke.h

4 years ago Merge pull request #23 from xianyi/develop
Martin Kroeker [Sun, 12 Jan 2020 20:57:23 +0000 (21:57 +0100)]
Merge pull request #23 from xianyi/develop

rebase

4 years ago Merge pull request #2365 from wjc404/develop
Martin Kroeker [Thu, 9 Jan 2020 22:23:09 +0000 (23:23 +0100)]
Merge pull request #2365 from wjc404/develop

Fix SKYLAKEX STRMM issues

4 years ago Update KERNEL.SKYLAKEX
wjc404 [Thu, 9 Jan 2020 05:48:41 +0000 (13:48 +0800)]
Update KERNEL.SKYLAKEX

4 years ago Merge pull request #2361 from wjc404/develop
Martin Kroeker [Wed, 8 Jan 2020 15:20:28 +0000 (16:20 +0100)]
Merge pull request #2361 from wjc404/develop

Optimize AVX2 SGEMM & STRMM

4 years ago Update sgemm_kernel_8x4_haswell.c
wjc404 [Tue, 7 Jan 2020 03:22:46 +0000 (11:22 +0800)]
Update sgemm_kernel_8x4_haswell.c

4 years ago Update sgemm_kernel_8x4_haswell.c
wjc404 [Mon, 6 Jan 2020 12:11:36 +0000 (20:11 +0800)]
Update sgemm_kernel_8x4_haswell.c

4 years ago Update CONTRIBUTORS.md
wjc404 [Mon, 6 Jan 2020 04:28:43 +0000 (12:28 +0800)]
Update CONTRIBUTORS.md

4 years ago optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:16:09 +0000 (12:16 +0800)]
optimize AVX2 SGEMM

4 years ago optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:11:21 +0000 (12:11 +0800)]
optimize AVX2 SGEMM

4 years ago optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:09:14 +0000 (12:09 +0800)]
optimize AVX2 SGEMM

4 years ago optimize AVX2 SGEMM
wjc404 [Mon, 6 Jan 2020 04:07:02 +0000 (12:07 +0800)]
optimize AVX2 SGEMM

4 years ago Merge pull request #2359 from martin-frbg/lapack-pr330
Martin Kroeker [Fri, 3 Jan 2020 14:03:30 +0000 (15:03 +0100)]
Merge pull request #2359 from martin-frbg/lapack-pr330

Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers

4 years ago Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers
Martin Kroeker [Fri, 3 Jan 2020 10:10:00 +0000 (11:10 +0100)]
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers

from Reference-LAPACK PR 330

4 years ago Merge pull request #22 from xianyi/develop
Martin Kroeker [Fri, 3 Jan 2020 09:23:25 +0000 (10:23 +0100)]
Merge pull request #22 from xianyi/develop

rebase

4 years ago Merge pull request #2358 from shengyang-3390/develop
Martin Kroeker [Fri, 3 Jan 2020 08:02:03 +0000 (09:02 +0100)]
Merge pull request #2358 from shengyang-3390/develop

Test all 7 declared values of N in float and double cblas3 tests

4 years ago modified: ctest/din3 ctest/sin3
shengyang [Fri, 3 Jan 2020 02:03:33 +0000 (10:03 +0800)]
modified:   ctest/din3 ctest/sin3

4 years ago Merge pull request #2357 from chenxuqiang/dgemm_beta_zero
Martin Kroeker [Thu, 2 Jan 2020 21:28:36 +0000 (22:28 +0100)]
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero

kernel/arm64/dgemm_beta.S: add beta == zero branch

4 years ago Merge pull request #2356 from shengyang-3390/develop
Martin Kroeker [Thu, 2 Jan 2020 21:27:44 +0000 (22:27 +0100)]
Merge pull request #2356 from shengyang-3390/develop

Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)

4 years ago update
shengyang [Thu, 2 Jan 2020 03:01:57 +0000 (11:01 +0800)]
update

4 years ago kernel/arm64/dgemm_beta.S: add beta == zero branch
chenxuqiang [Thu, 2 Jan 2020 02:50:45 +0000 (21:50 -0500)]
kernel/arm64/dgemm_beta.S: add beta == zero branch

added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>

4 years ago Merge pull request #2355 from Zeyiii/dev-zeyi2
Martin Kroeker [Wed, 1 Jan 2020 21:14:16 +0000 (22:14 +0100)]
Merge pull request #2355 from Zeyiii/dev-zeyi2

Use arm neon instructions to optimize sgemm_beta operation

4 years ago Merge pull request #2354 from ZuoQ3/develop
Martin Kroeker [Wed, 1 Jan 2020 21:13:37 +0000 (22:13 +0100)]
Merge pull request #2354 from ZuoQ3/develop

[WIP] Use arm neon instructions to optimize tcopy operation

4 years ago [WIP] Update LAPACK to 3.9.0 (#2353)
Martin Kroeker [Wed, 1 Jan 2020 12:18:53 +0000 (13:18 +0100)]
[WIP] Update LAPACK to 3.9.0 (#2353)

* Update make.inc entries for LAPACK 3.9.0

Reference-LAPACK PR 347 changed some variable names and relative paths

* Update LAPACK to 3.9.0

* Add new functions from LAPACK 3.9.0

* Add new functions from LAPACK 3.9.0

* Restore LOADER command

as it makes it easier to specify pthread as needed

* Restore LOADER

* Restore EIG/LIN prefixes in cmdbase

* add binary path to lapack_testing.py call

* Restore OpenMP version check

* Restore OpenMP version check

* Restore fix for out-of-bounds array accesses

from #2096

4 years ago Merge pull request #2352 from wjc404/develop
Martin Kroeker [Tue, 31 Dec 2019 17:08:10 +0000 (18:08 +0100)]
Merge pull request #2352 from wjc404/develop

AVX2 ZGEMM3M kernel

4 years ago Merge pull request #2351 from Zeyiii/develop
Martin Kroeker [Tue, 31 Dec 2019 17:07:37 +0000 (18:07 +0100)]
Merge pull request #2351 from Zeyiii/develop

prefetching for dgemm_beta

4 years ago add in runtime cpu detection for zarch (#2349)
int_13h [Tue, 31 Dec 2019 17:03:27 +0000 (22:33 +0530)]
add in runtime cpu detection for zarch (#2349)

 add in runtime cpu detection for zarch

4 years ago Use arm neon instructions to optimize ncopy operation
shengyang [Tue, 31 Dec 2019 09:06:35 +0000 (17:06 +0800)]
Use arm neon instructions to optimize ncopy operation

modified:   KERNEL.ARMV8
modified:   KERNEL.TSV110
new file:   sgemm_ncopy_4.S

4 years ago modified: ctest/din3
shengyang [Tue, 31 Dec 2019 07:59:52 +0000 (15:59 +0800)]
modified:   ctest/din3
modified:   ctest/sin3

4 years ago Use arm neon instructions to optimize sgemm_beta operation
w00421467 [Tue, 31 Dec 2019 02:31:07 +0000 (10:31 +0800)]
Use arm neon instructions to optimize sgemm_beta operation

4 years ago [WIP] Use arm neon instructions to optimize tcopy operation
zq [Tue, 31 Dec 2019 02:21:23 +0000 (10:21 +0800)]
[WIP] Use arm neon instructions to optimize tcopy operation

4 years ago Merge remote-tracking branch 'pub/develop' into develop
w00421467 [Tue, 31 Dec 2019 02:13:24 +0000 (10:13 +0800)]
Merge remote-tracking branch 'pub/develop' into develop

4 years ago Update zgemm3m_kernel_4x4_haswell.c
wjc404 [Mon, 30 Dec 2019 09:33:42 +0000 (17:33 +0800)]
Update zgemm3m_kernel_4x4_haswell.c

4 years ago Add files via upload
wjc404 [Mon, 30 Dec 2019 09:18:59 +0000 (17:18 +0800)]
Add files via upload

4 years ago Update CONTRIBUTORS.md
wjc404 [Mon, 30 Dec 2019 08:11:37 +0000 (16:11 +0800)]
Update CONTRIBUTORS.md

4 years ago Update CONTRIBUTORS.md
wjc404 [Mon, 30 Dec 2019 08:10:08 +0000 (16:10 +0800)]
Update CONTRIBUTORS.md

4 years ago Update param.h
wjc404 [Mon, 30 Dec 2019 08:08:19 +0000 (16:08 +0800)]
Update param.h

4 years ago Update KERNEL.ZEN
wjc404 [Mon, 30 Dec 2019 08:04:23 +0000 (16:04 +0800)]
Update KERNEL.ZEN

4 years ago Update KERNEL.HASWELL
wjc404 [Mon, 30 Dec 2019 08:03:24 +0000 (16:03 +0800)]
Update KERNEL.HASWELL

4 years ago Create zgemm3m_kernel_4x4_haswell.c
wjc404 [Mon, 30 Dec 2019 08:02:51 +0000 (16:02 +0800)]
Create zgemm3m_kernel_4x4_haswell.c

4 years ago prefetching for dgemm_beta
w00421467 [Mon, 30 Dec 2019 03:45:49 +0000 (11:45 +0800)]
prefetching for dgemm_beta

4 years ago Update LAPACK to 3.9.0
Martin Kroeker [Sun, 29 Dec 2019 20:27:18 +0000 (21:27 +0100)]
Update LAPACK to 3.9.0

4 years ago Merge pull request #21 from xianyi/develop
Martin Kroeker [Sun, 29 Dec 2019 17:08:55 +0000 (18:08 +0100)]
Merge pull request #21 from xianyi/develop

rebase

4 years ago Merge pull request #2348 from wjc404/develop
Martin Kroeker [Sat, 28 Dec 2019 19:07:56 +0000 (20:07 +0100)]
Merge pull request #2348 from wjc404/develop

AVX2 CGEMM3M kernel

4 years ago Update CONTRIBUTORS.md
wjc404 [Fri, 27 Dec 2019 15:36:13 +0000 (23:36 +0800)]
Update CONTRIBUTORS.md

4 years ago Update cgemm3m_kernel_8x4_haswell.c
wjc404 [Fri, 27 Dec 2019 10:23:29 +0000 (18:23 +0800)]
Update cgemm3m_kernel_8x4_haswell.c

4 years ago Update param.h
wjc404 [Fri, 27 Dec 2019 10:06:42 +0000 (18:06 +0800)]
Update param.h

4 years ago Update KERNEL.ZEN
wjc404 [Fri, 27 Dec 2019 10:04:08 +0000 (18:04 +0800)]
Update KERNEL.ZEN

4 years ago Update gemm3m_level3.c
wjc404 [Fri, 27 Dec 2019 10:03:01 +0000 (18:03 +0800)]
Update gemm3m_level3.c

4 years ago Update KERNEL.HASWELL
wjc404 [Fri, 27 Dec 2019 10:01:38 +0000 (18:01 +0800)]
Update KERNEL.HASWELL

4 years ago Create cgemm3m_kernel_8x4_haswell.c
wjc404 [Fri, 27 Dec 2019 10:00:55 +0000 (18:00 +0800)]
Create cgemm3m_kernel_8x4_haswell.c

4 years ago Merge pull request #2345 from wjc404/develop
Martin Kroeker [Wed, 25 Dec 2019 21:26:41 +0000 (22:26 +0100)]
Merge pull request #2345 from wjc404/develop

Optimize AVX2 CGEMM

4 years ago Update cgemm_kernel_8x2_haswell.c
wjc404 [Mon, 23 Dec 2019 16:40:16 +0000 (00:40 +0800)]
Update cgemm_kernel_8x2_haswell.c

4 years ago Update CONTRIBUTORS.md
wjc404 [Mon, 23 Dec 2019 16:30:16 +0000 (00:30 +0800)]
Update CONTRIBUTORS.md

4 years ago Update CONTRIBUTORS.md
wjc404 [Mon, 23 Dec 2019 16:24:40 +0000 (00:24 +0800)]
Update CONTRIBUTORS.md

4 years ago Update param.h
wjc404 [Mon, 23 Dec 2019 15:44:55 +0000 (23:44 +0800)]
Update param.h