platform/upstream/openblas.git
Martin Kroeker [Wed, 12 Feb 2020 22:57:48 +0000 (23:57 +0100)]
Update isamin_power8.S

Martin Kroeker [Wed, 12 Feb 2020 22:56:57 +0000 (23:56 +0100)]
Update isamax_power8.S

Martin Kroeker [Wed, 12 Feb 2020 19:00:29 +0000 (20:00 +0100)]
Fix syntax of endianness conditional

Martin Kroeker [Wed, 12 Feb 2020 18:58:42 +0000 (19:58 +0100)]
Fix syntax of endianness conditional

Martin Kroeker [Wed, 12 Feb 2020 18:56:52 +0000 (19:56 +0100)]
Fix syntax of endianness conditional and add gcc version check for workaround

Martin Kroeker [Wed, 12 Feb 2020 18:16:14 +0000 (19:16 +0100)]
Merge pull request #27 from xianyi/develop

rebase

Martin Kroeker [Wed, 12 Feb 2020 14:38:37 +0000 (15:38 +0100)]
Merge pull request #2410 from bartoldeman/fix-dscal-inline-asm

Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408

Bart Oldeman [Wed, 12 Feb 2020 14:11:44 +0000 (14:11 +0000)]
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408

The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
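The constraint issue described above can be illustrated with a minimal sketch. This is not the OpenBLAS dscal kernel: `advance_ptr` is a hypothetical helper, and the sketch assumes GCC-style extended inline asm on x86-64, with a plain C fallback elsewhere. The `"+r"` constraint tells the compiler that the register holding `x` is both read and written by the `leaq`; with an input-only `"r"` constraint the compiler would be free to assume `x` still holds its old value afterwards.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not the OpenBLAS dscal kernel.
   Advances x by `bytes` bytes and returns the new pointer. */
static double *advance_ptr(double *x, ptrdiff_t bytes)
{
#if defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
    __asm__ volatile(
        "leaq (%0,%1,1), %0"   /* x = x + bytes */
        : "+r"(x)              /* "+r": x is read AND written by the asm */
        : "r"(bytes)           /* plain input, left unchanged */
    );
#else
    /* Portable fallback with the same effect on other targets. */
    x = (double *)((char *)x + bytes);
#endif
    return x;
}
```

Declaring the modified operand as an input/output operand lets the compiler track the new value itself, rather than relying on the old value surviving the asm statement.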

Martin Kroeker [Tue, 11 Feb 2020 12:04:44 +0000 (13:04 +0100)]
Merge pull request #2407 from susilehtola/patch-2

Patch out instances of Z15 in dynamic_zarch.c

Martin Kroeker [Tue, 11 Feb 2020 12:03:35 +0000 (13:03 +0100)]
Merge pull request #2405 from susilehtola/patch-1

Fix typo in dynamic_zarch.c

Martin Kroeker [Tue, 11 Feb 2020 12:00:36 +0000 (13:00 +0100)]
Merge pull request #2404 from martin-frbg/issue2395

Fix spurious application of USE_TRMM in cmake builds

Martin Kroeker [Tue, 11 Feb 2020 12:00:16 +0000 (13:00 +0100)]
Merge pull request #2403 from martin-frbg/issue2400

Fix coretype identification of Intel Cannon Lake, Ice Lake and Goldmont

Martin Kroeker [Tue, 11 Feb 2020 11:59:53 +0000 (12:59 +0100)]
Merge pull request #2402 from gxw-loongson/develop

Avoid printing the following information on mips and mips64 when check msa

Martin Kroeker [Tue, 11 Feb 2020 11:56:56 +0000 (12:56 +0100)]
Merge pull request #2399 from martin-frbg/buffersize

Make BUFFER_SIZE configurable at build time

Susi Lehtola [Tue, 11 Feb 2020 02:07:33 +0000 (15:07 +1300)]
Patch out instances of Z15 in dynamic_zarch.c

There does not appear to be a Z15 kernel yet, causing link errors from the code. This patch fixes the issue.

Susi Lehtola [Tue, 11 Feb 2020 01:46:30 +0000 (14:46 +1300)]
Fix typo in dynamic_zarch.c

Martin Kroeker [Mon, 10 Feb 2020 20:17:39 +0000 (21:17 +0100)]
Fix bad conditional syntax that caused spurious application of USE_TRMM

Martin Kroeker [Mon, 10 Feb 2020 18:17:32 +0000 (19:17 +0100)]
Fix coretype detection for Intel extended models 6 and 7

affecting Goldmont, Cannon Lake, Ice Lake autodetection

gxw [Mon, 10 Feb 2020 11:11:45 +0000 (19:11 +0800)]
Avoid printing the following information on mips and mips64 when check msa:
"unrecognized command line option ‘-mmsa’"

Martin Kroeker [Sun, 9 Feb 2020 22:32:57 +0000 (23:32 +0100)]
Make BUFFER_SIZE configurable

Martin Kroeker [Sun, 9 Feb 2020 22:30:22 +0000 (23:30 +0100)]
Make BUFFER_SIZE configurable

Martin Kroeker [Sun, 9 Feb 2020 22:28:04 +0000 (23:28 +0100)]
Add configuration option for BUFFER_SIZE

Martin Kroeker [Sun, 9 Feb 2020 22:23:55 +0000 (23:23 +0100)]
Merge pull request #26 from xianyi/develop

rebase

Martin Kroeker [Sun, 9 Feb 2020 22:18:44 +0000 (23:18 +0100)]
Increment version to 0.3.9.dev

Martin Kroeker [Sun, 9 Feb 2020 22:18:07 +0000 (23:18 +0100)]
Increment version to 0.3.9.dev

Martin Kroeker [Sun, 9 Feb 2020 22:16:06 +0000 (23:16 +0100)]
Merge branch 'release-0.3.0' into develop

Martin Kroeker [Sun, 9 Feb 2020 22:01:52 +0000 (23:01 +0100)]
Merge pull request #2397 from martin-frbg/038changes

Update Changelog with changes from 0.3.8

Martin Kroeker [Sun, 9 Feb 2020 22:00:36 +0000 (23:00 +0100)]
Update with changes from 0.3.8

Martin Kroeker [Sun, 9 Feb 2020 21:48:15 +0000 (22:48 +0100)]
Merge pull request #25 from xianyi/develop

rebase

Martin Kroeker [Sun, 9 Feb 2020 00:06:40 +0000 (01:06 +0100)]
typo fixes

Martin Kroeker [Sun, 9 Feb 2020 00:00:33 +0000 (01:00 +0100)]
Merge pull request #2393 from martin-frbg/issue2388

Provide more documentation in README.md

Martin Kroeker [Sat, 8 Feb 2020 23:13:40 +0000 (00:13 +0100)]
Merge pull request #2390 from martin-frbg/pgi

Small corrections for compilation with PGI compilers

Martin Kroeker [Sat, 8 Feb 2020 23:06:07 +0000 (00:06 +0100)]
Update CPU and OS support and document DYNAMIC_ARCH option in README.md

prompted by #2388

Martin Kroeker [Sat, 8 Feb 2020 09:20:13 +0000 (10:20 +0100)]
Remove PGI from list again as it is actually still not capable

Martin Kroeker [Fri, 7 Feb 2020 15:05:46 +0000 (16:05 +0100)]
Merge pull request #2389 from Zeyiii/develop

Fix bugs in benchmark of gemv

Martin Kroeker [Fri, 7 Feb 2020 15:03:51 +0000 (16:03 +0100)]
Remove OpenMP libraries from link list

Martin Kroeker [Fri, 7 Feb 2020 15:02:17 +0000 (16:02 +0100)]
Remove OpenMP libraries from link list

Martin Kroeker [Fri, 7 Feb 2020 12:47:12 +0000 (13:47 +0100)]
Merge pull request #2384 from wjc404/develop

Optimize AVX512 DGEMM (& DTRMM)

Martin Kroeker [Fri, 7 Feb 2020 12:01:31 +0000 (13:01 +0100)]
Add PGI to avx512-supporting compilers

Martin Kroeker [Fri, 7 Feb 2020 09:15:18 +0000 (10:15 +0100)]
Fix utest compilation with PGI

Martin Kroeker [Fri, 7 Feb 2020 09:09:25 +0000 (10:09 +0100)]
Set SUFFIX in tempfile commands, fix bad architecture option for PGI compiler in avx512 test

Martin Kroeker [Fri, 7 Feb 2020 09:03:02 +0000 (10:03 +0100)]
Merge pull request #24 from xianyi/develop

rebase

wjc404 [Thu, 6 Feb 2020 02:14:10 +0000 (02:14 +0000)]
Update dgemm_kernel_16x2_skylakex.c

wjc404 [Thu, 6 Feb 2020 01:47:46 +0000 (01:47 +0000)]
Update sgemm_kernel_8x4_haswell.c

wjc404 [Thu, 6 Feb 2020 01:46:36 +0000 (01:46 +0000)]
Update dgemm_kernel_16x2_skylakex.c

w00421467 [Wed, 5 Feb 2020 07:07:18 +0000 (15:07 +0800)]
Fix another branch

w00421467 [Wed, 5 Feb 2020 06:53:37 +0000 (14:53 +0800)]
Fix bugs in benchmark of gemv

wjc404 [Wed, 5 Feb 2020 05:36:57 +0000 (13:36 +0800)]
Update dgemm_kernel_16x2_skylakex.c

wjc404 [Wed, 5 Feb 2020 02:15:02 +0000 (10:15 +0800)]
Update trmm_R.c

wjc404 [Wed, 5 Feb 2020 02:09:41 +0000 (10:09 +0800)]
Update trmm_L.c

wjc404 [Tue, 4 Feb 2020 12:33:08 +0000 (20:33 +0800)]
Update level3_thread.c

wjc404 [Tue, 4 Feb 2020 12:30:23 +0000 (20:30 +0800)]
Update level3.c

wjc404 [Tue, 4 Feb 2020 11:55:26 +0000 (19:55 +0800)]
Update param.h

wjc404 [Mon, 3 Feb 2020 13:38:08 +0000 (21:38 +0800)]
Update KERNEL.SKYLAKEX

wjc404 [Mon, 3 Feb 2020 13:34:12 +0000 (21:34 +0800)]
Update param.h

wjc404 [Mon, 3 Feb 2020 13:32:56 +0000 (21:32 +0800)]
AVX512 16x2 DGEMM kernel

Martin Kroeker [Thu, 30 Jan 2020 16:07:19 +0000 (17:07 +0100)]
Merge pull request #2378 from martin-frbg/issue2377

Add -march option for AVX512 in cmake as well

Martin Kroeker [Thu, 30 Jan 2020 11:41:18 +0000 (12:41 +0100)]
Add -march option for AVX512

Martin Kroeker [Thu, 30 Jan 2020 09:27:29 +0000 (10:27 +0100)]
Merge pull request #2375 from ewanglong/master

fix a few performance drop in some matrix size per data type

Martin Kroeker [Thu, 23 Jan 2020 20:50:19 +0000 (21:50 +0100)]
Merge pull request #2376 from wjc404/develop

Fix remaining bugs in parallel GEMM3M

wjc404 [Wed, 22 Jan 2020 17:40:03 +0000 (17:40 +0000)]
Update level3_gemm3m_thread.c

Wang,Long [Wed, 22 Jan 2020 15:07:50 +0000 (15:07 +0000)]
fix a few performance drop in some matrix size per data type

Signed-off-by: Wang,Long <long1.wang@intel.com>

Martin Kroeker [Tue, 21 Jan 2020 14:05:38 +0000 (15:05 +0100)]
Merge pull request #2373 from Qiyu8/optimize#gemmbeta

Optimize genenal Gemm Beta

Martin Kroeker [Tue, 21 Jan 2020 13:56:45 +0000 (14:56 +0100)]
Merge pull request #2372 from martin-frbg/winexit

Do not run any cleanup if the program is exiting anyway

Qiyu8 [Mon, 20 Jan 2020 03:49:42 +0000 (11:49 +0800)]
Optimize genenal Gemm Beta

Martin Kroeker [Sun, 19 Jan 2020 12:28:27 +0000 (13:28 +0100)]
Do not run any cleanup if the program is exiting anyway

From keno's PR #2350 - this avoids the potential hang in blas_thread_shutdown where we may wait for threads to exit while they are waiting on the loader lock from DllMain

Martin Kroeker [Sat, 18 Jan 2020 19:39:34 +0000 (20:39 +0100)]
Merge pull request #2371 from martin-frbg/issue2370

Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT

Martin Kroeker [Sat, 18 Jan 2020 19:39:04 +0000 (20:39 +0100)]
Merge pull request #2253 from thrasibule/xerbla

fix error messages

Martin Kroeker [Sat, 18 Jan 2020 14:06:39 +0000 (15:06 +0100)]
Free Windows thread memory with MEM_RELEASE rather than MEM_DECOMMIT

as suggested by hjmndv in #2370

Martin Kroeker [Wed, 15 Jan 2020 20:13:43 +0000 (21:13 +0100)]
Merge pull request #2367 from wjc404/develop

Improve paralleled SGEMM performance on SKYLAKEX CPUs

wjc404 [Mon, 13 Jan 2020 08:59:23 +0000 (16:59 +0800)]
Update sgemm_direct_skylakex.c

wjc404 [Mon, 13 Jan 2020 08:58:54 +0000 (16:58 +0800)]
Update sgemm_kernel_16x4_skylakex_2.c

wjc404 [Mon, 13 Jan 2020 08:28:41 +0000 (16:28 +0800)]
make skylakex sgemm code more friendly for readers

BTW some kernels were adjusted to improve performance

wjc404 [Mon, 13 Jan 2020 08:26:03 +0000 (16:26 +0800)]
improve skylakex paralleled sgemm performance

Martin Kroeker [Mon, 13 Jan 2020 08:00:21 +0000 (09:00 +0100)]
Merge pull request #2366 from martin-frbg/install390

Add new file lapack.h from LAPACK 3.9.0 to installable headers

Martin Kroeker [Sun, 12 Jan 2020 21:00:50 +0000 (22:00 +0100)]
Install new lapack.h

new file in LAPACK 3.9.0, split off from lapacke.h

Martin Kroeker [Sun, 12 Jan 2020 20:57:23 +0000 (21:57 +0100)]
Merge pull request #23 from xianyi/develop

rebase

Martin Kroeker [Thu, 9 Jan 2020 22:23:09 +0000 (23:23 +0100)]
Merge pull request #2365 from wjc404/develop

Fix SKYLAKEX STRMM issues

wjc404 [Thu, 9 Jan 2020 05:48:41 +0000 (13:48 +0800)]
Update KERNEL.SKYLAKEX

Martin Kroeker [Wed, 8 Jan 2020 15:20:28 +0000 (16:20 +0100)]
Merge pull request #2361 from wjc404/develop

Optimize AVX2 SGEMM & STRMM

wjc404 [Tue, 7 Jan 2020 03:22:46 +0000 (11:22 +0800)]
Update sgemm_kernel_8x4_haswell.c

wjc404 [Mon, 6 Jan 2020 12:11:36 +0000 (20:11 +0800)]
Update sgemm_kernel_8x4_haswell.c

wjc404 [Mon, 6 Jan 2020 04:28:43 +0000 (12:28 +0800)]
Update CONTRIBUTORS.md

wjc404 [Mon, 6 Jan 2020 04:16:09 +0000 (12:16 +0800)]
optimize AVX2 SGEMM

wjc404 [Mon, 6 Jan 2020 04:11:21 +0000 (12:11 +0800)]
optimize AVX2 SGEMM

wjc404 [Mon, 6 Jan 2020 04:09:14 +0000 (12:09 +0800)]
optimize AVX2 SGEMM

wjc404 [Mon, 6 Jan 2020 04:07:02 +0000 (12:07 +0800)]
optimize AVX2 SGEMM

Martin Kroeker [Fri, 3 Jan 2020 14:03:30 +0000 (15:03 +0100)]
Merge pull request #2359 from martin-frbg/lapack-pr330

Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers

Martin Kroeker [Fri, 3 Jan 2020 10:10:00 +0000 (11:10 +0100)]
Apply LAPACKE fix for eigenvector transposition in symmetric eigensolvers

from Reference-LAPACK PR 330

Martin Kroeker [Fri, 3 Jan 2020 09:23:25 +0000 (10:23 +0100)]
Merge pull request #22 from xianyi/develop

rebase

Martin Kroeker [Fri, 3 Jan 2020 08:02:03 +0000 (09:02 +0100)]
Merge pull request #2358 from shengyang-3390/develop

Test all 7 declared values of N in float and double cblas3 tests

shengyang [Fri, 3 Jan 2020 02:03:33 +0000 (10:03 +0800)]
modified:   ctest/din3 ctest/sin3

Martin Kroeker [Thu, 2 Jan 2020 21:28:36 +0000 (22:28 +0100)]
Merge pull request #2357 from chenxuqiang/dgemm_beta_zero

kernel/arm64/dgemm_beta.S: add beta == zero branch

Martin Kroeker [Thu, 2 Jan 2020 21:27:44 +0000 (22:27 +0100)]
Merge pull request #2356 from shengyang-3390/develop

Use arm neon instructions to optimize ncopy operation (and enable 7th column in float+complex cblas3 test drivers)

shengyang [Thu, 2 Jan 2020 03:01:57 +0000 (11:01 +0800)]
update

chenxuqiang [Thu, 2 Jan 2020 02:50:45 +0000 (21:50 -0500)]
kernel/arm64/dgemm_beta.S: add beta == zero branch

added beta == zero branch, and no need to load C matrix.

Signed by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
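
The idea behind the beta == zero branch can be sketched in plain C. This is an assumption-laden illustration, not the arm64 assembly from the commit: `scale_c_by_beta` is a hypothetical reference routine operating on a column-major m x n matrix with leading dimension `ldc`. When beta is zero the routine only stores zeros and never reads C, which saves the loads; the BLAS convention is likewise that C need not be set on input in that case.

```c
#include <stddef.h>

/* Hypothetical reference routine for illustration; not the arm64 kernel.
   Scales the m x n column-major matrix c (leading dimension ldc) by beta. */
static void scale_c_by_beta(size_t m, size_t n, double beta,
                            double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        double *col = c + j * ldc;
        if (beta == 0.0) {
            /* beta == 0: store zeros without loading the old contents of C. */
            for (size_t i = 0; i < m; i++)
                col[i] = 0.0;
        } else {
            /* General case: load, scale, store. */
            for (size_t i = 0; i < m; i++)
                col[i] *= beta;
        }
    }
}
```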

Martin Kroeker [Wed, 1 Jan 2020 21:14:16 +0000 (22:14 +0100)]
Merge pull request #2355 from Zeyiii/dev-zeyi2

Use arm neon instructions to optimize sgemm_beta operation

Martin Kroeker [Wed, 1 Jan 2020 21:13:37 +0000 (22:13 +0100)]
Merge pull request #2354 from ZuoQ3/develop

[WIP] Use arm neon instructions to optimize tcopy operation

Martin Kroeker [Wed, 1 Jan 2020 12:18:53 +0000 (13:18 +0100)]
[WIP] Update LAPACK to 3.9.0 (#2353)

* Update make.inc entries for LAPACK 3.9.0

Reference-LAPACK PR 347 changed some variable names and relative paths

* Update LAPACK to 3.9.0

* Add new functions from LAPACK 3.9.0

* Add new functions from LAPACK 3.9.0

* Restore LOADER command

as it makes it easier to specify pthread as needed

* Restore LOADER

* Restore EIG/LIN prefixes in cmdbase

* add binary path to lapack_testing.py call

* Restore OpenMP version check

* Restore OpenMP version check

* Restore fix for out-of-bounds array accesses

from #2096

Martin Kroeker [Tue, 31 Dec 2019 17:08:10 +0000 (18:08 +0100)]
Merge pull request #2352 from wjc404/develop

AVX2 ZGEMM3M kernel