platform/upstream/openblas.git
jayfely@qq.com [Wed, 11 Mar 2020 08:36:45 +0000 (16:36 +0800)]
Modify Makefile in interface to remove the error that occurred in Travis CI

jayfely@qq.com [Wed, 11 Mar 2020 07:48:58 +0000 (15:48 +0800)]
Only keep spmv.goto and spmv.atlas

jayfely@qq.com [Wed, 11 Mar 2020 02:30:09 +0000 (10:30 +0800)]
Update spmv.c: solve segmentation fault when m and n are larger than 50000

jayfely@qq.com [Tue, 10 Mar 2020 06:32:18 +0000 (14:32 +0800)]
Modify Makefile in Benchmark

jayfely@qq.com [Tue, 10 Mar 2020 06:22:18 +0000 (14:22 +0800)]
Add benchmark for SPMV

Zhang Xianyi [Mon, 9 Mar 2020 08:04:33 +0000 (16:04 +0800)]
Merge pull request #2498 from njutcz/develop

Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.

s00548429 [Mon, 9 Mar 2020 07:36:50 +0000 (15:36 +0800)]
Fix the functional bugs for zamax.

s00548429 [Mon, 9 Mar 2020 06:59:03 +0000 (14:59 +0800)]
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.

njutcz [Mon, 9 Mar 2020 02:39:40 +0000 (10:39 +0800)]
Merge pull request #1 from xianyi/develop

update

Martin Kroeker [Sun, 8 Mar 2020 07:09:58 +0000 (08:09 +0100)]
Merge pull request #2495 from ZuoQ3/develop

add benchmark for axpby test

Martin Kroeker [Sat, 7 Mar 2020 22:04:21 +0000 (23:04 +0100)]
Merge pull request #2494 from shengyang-3390/develop

add benchmark for csrot and zdrot

Martin Kroeker [Sat, 7 Mar 2020 21:26:00 +0000 (22:26 +0100)]
Merge pull request #2489 from jijiwawa/brightness

Remove redundant code

s00527847 [Sat, 7 Mar 2020 18:09:19 +0000 (13:09 -0500)]
add trmm.c

s00527847 [Wed, 4 Mar 2020 22:44:50 +0000 (17:44 -0500)]
Remove redundant code

Martin Kroeker [Sat, 7 Mar 2020 15:55:53 +0000 (16:55 +0100)]
Merge pull request #2493 from martin-frbg/plainmake

Fix use of make vs $(MAKE) in building lapack-testing

Martin Kroeker [Sat, 7 Mar 2020 15:52:29 +0000 (16:52 +0100)]
Merge pull request #2488 from liujingjue/develop

Modify the main Makefile in OpenBLAS

zq [Sat, 7 Mar 2020 09:48:55 +0000 (17:48 +0800)]
Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby

zq [Sat, 7 Mar 2020 09:04:59 +0000 (17:04 +0800)]
Merge pull request #1 from xianyi/develop

update

shengyang [Sat, 7 Mar 2020 07:17:49 +0000 (15:17 +0800)]
add benchmark for csrot and zdrot
modified:   benchmark/Makefile
modified:   benchmark/rot.c

l00546269 [Sat, 7 Mar 2020 02:14:33 +0000 (10:14 +0800)]
[OpenBLAS]: modified the Makefile
[Description]: check the compiler version and show detailed info

Martin Kroeker [Fri, 6 Mar 2020 14:37:26 +0000 (15:37 +0100)]
Fix another spot where make was used instead of $(MAKE)

This broke lapack-testing on BSD, whose default "make" does not support GNU Makefile syntax.

Martin Kroeker [Fri, 6 Mar 2020 14:32:27 +0000 (15:32 +0100)]
Merge pull request #36 from xianyi/develop

rebase

Martin Kroeker [Fri, 6 Mar 2020 14:06:42 +0000 (15:06 +0100)]
Merge pull request #2491 from chenxuqiang/hbmv_benchmark

benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c

Martin Kroeker [Fri, 6 Mar 2020 14:05:55 +0000 (15:05 +0100)]
Merge pull request #2490 from shengyang-3390/develop

Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm

Martin Kroeker [Fri, 6 Mar 2020 13:42:25 +0000 (14:42 +0100)]
Merge pull request #2487 from jijiwawa/develop

add benchmark for spr/spr2

Martin Kroeker [Fri, 6 Mar 2020 13:41:40 +0000 (14:41 +0100)]
Merge branch 'develop' into develop

Martin Kroeker [Fri, 6 Mar 2020 13:30:09 +0000 (14:30 +0100)]
Merge pull request #2486 from qqqil/develop

add benchmark for trsv

Martin Kroeker [Fri, 6 Mar 2020 13:29:27 +0000 (14:29 +0100)]
Merge pull request #2485 from Darkness303/develop

Add syr2 benchmark

Martin Kroeker [Fri, 6 Mar 2020 13:28:58 +0000 (14:28 +0100)]
Merge pull request #2469 from AGSaidi/acq-rel-2

Use acq/rel semantics to pass flags/pointers in getrf_parallel.

Ali Saidi [Mon, 24 Feb 2020 05:45:30 +0000 (05:45 +0000)]
Use acq/rel semantics to pass flags/pointers in getrf_parallel.

The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

As with the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the thread that last held the lock can keep reacquiring
it, starving another thread even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
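
The idea of replacing a one-variable critical section with an atomic flag can be sketched in C11 as follows. This is a minimal illustration, not the code from getrf_parallel: the `mailbox_t` type and the `mailbox_*` function names are hypothetical, and it assumes a compiler that provides `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: `ready` is the flag passed between
 * threads, `payload` is the plain data that the flag guards. */
typedef struct {
    int payload;
    atomic_bool ready;
} mailbox_t;

/* Initialise an empty mailbox before it is shared between threads. */
static void mailbox_init(mailbox_t *box) {
    box->payload = 0;
    atomic_init(&box->ready, false);
}

/* Producer: write the data first, then publish it with a release store.
 * The release ordering keeps the payload write from moving after the flag. */
static void mailbox_post(mailbox_t *box, int value) {
    box->payload = value;
    atomic_store_explicit(&box->ready, true, memory_order_release);
}

/* Consumer: one poll of the flag with an acquire load. Returns true and
 * fills *out once the flag has been observed; the acquire ordering makes
 * the producer's payload write visible at that point. */
static bool mailbox_poll(mailbox_t *box, int *out) {
    if (!atomic_load_explicit(&box->ready, memory_order_acquire))
        return false;
    *out = box->payload;
    return true;
}
```

In a threaded setting the producer would call `mailbox_post` once and the consumer would call `mailbox_poll` in its spin loop; no mutex is taken on either side, so there is no lock for a spinning thread to monopolise.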

chenxuqiang [Fri, 6 Mar 2020 06:02:02 +0000 (01:02 -0500)]
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c

Signed-off-by: Xuqiang Chen <chenxuqiang3@hisilicon.com>

shengyang [Thu, 5 Mar 2020 01:55:16 +0000 (09:55 +0800)]
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm

modified:   benchmark/Makefile
new file:   benchmark/rotm.c

s00527847 [Wed, 4 Mar 2020 20:50:19 +0000 (15:50 -0500)]
add benchmark for spr/spr2

q00437336 [Wed, 4 Mar 2020 08:54:40 +0000 (03:54 -0500)]
change clock to CLOCK_PROCESS_CPUTIME_ID

l00546269 [Wed, 4 Mar 2020 08:47:23 +0000 (16:47 +0800)]
[OpenBLAS]: modified the Makefile
[Description]: add C/Fortran compiler version information in final note

q00437336 [Wed, 4 Mar 2020 07:57:33 +0000 (02:57 -0500)]
add benchmark for trsv

Martin Kroeker [Wed, 4 Mar 2020 07:06:06 +0000 (08:06 +0100)]
Merge pull request #2484 from RajalakshmiSR/power-dynamic

Fix DYNAMIC_ARCH build for POWER9

Martin Kroeker [Wed, 4 Mar 2020 06:59:56 +0000 (07:59 +0100)]
Merge pull request #2483 from aaawuanjun/develop

Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv

Martin Kroeker [Wed, 4 Mar 2020 06:59:31 +0000 (07:59 +0100)]
Merge pull request #2466 from AGSaidi/acq-rel-1

Switch blas_server to use acq/rel semantics

Darkness303 [Wed, 4 Mar 2020 06:09:10 +0000 (14:09 +0800)]
1. Add syr2 benchmark
2. Fixed some errors

Martin Kroeker [Tue, 3 Mar 2020 20:37:48 +0000 (21:37 +0100)]
Fix cut/paste glitch

Martin Kroeker [Tue, 3 Mar 2020 20:04:12 +0000 (21:04 +0100)]
Restore initializers for mutex and conditional

Rajalakshmi Srinivasaraghavan [Tue, 3 Mar 2020 18:35:10 +0000 (12:35 -0600)]
Fix DYNAMIC_ARCH build for POWER9

Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks.  This patch fixes some of the macros that are used
to check compiler version.  On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.

wuanjun 00447568 [Tue, 3 Mar 2020 11:03:57 +0000 (19:03 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop

wuanjun 00447568 [Tue, 3 Mar 2020 09:39:26 +0000 (17:39 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop

wuanjun 00447568 [Tue, 3 Mar 2020 09:13:49 +0000 (17:13 +0800)]
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv

Martin Kroeker [Tue, 3 Mar 2020 07:46:49 +0000 (08:46 +0100)]
Merge pull request #2479 from Darkness303/develop

Fix potential index overflows at large matrix sizes in the benchmark codes
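
To illustrate the kind of overflow this refers to: with 32-bit `int`, a product such as 50000 * 50000 exceeds `INT_MAX`, so element counts and offsets need to be computed in a wider type. The sketch below shows one common way to do that; the helper names `matrix_elems` and `colmajor_index` are hypothetical and are not taken from the benchmark sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in an m-by-n matrix, computed in size_t so that
 * e.g. 50000 * 50000 does not wrap around a 32-bit int.
 * Returns false for negative sizes or if the product cannot fit in size_t. */
static bool matrix_elems(int m, int n, size_t *out) {
    if (m < 0 || n < 0)
        return false;
    size_t sm = (size_t)m;
    size_t sn = (size_t)n;
    if (sn != 0 && sm > SIZE_MAX / sn)
        return false;
    *out = sm * sn;
    return true;
}

/* Column-major offset of element (i, j) with leading dimension lda.
 * Each operand is widened before the multiplication, not after it. */
static size_t colmajor_index(int i, int j, int lda) {
    return (size_t)j * (size_t)lda + (size_t)i;
}
```

The important detail is where the cast happens: `(size_t)(j * lda)` would still multiply in `int` and overflow before the conversion, whereas casting each operand first performs the multiplication in `size_t`.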

Martin Kroeker [Tue, 3 Mar 2020 07:43:00 +0000 (08:43 +0100)]
Merge pull request #2436 from marxin/improve-utest-coverage

Improve test coverage for utests.

Martin Kroeker [Mon, 2 Mar 2020 20:21:29 +0000 (21:21 +0100)]
Merge pull request #2481 from ChinouneMehdi/fix2480

Fix #2480

Martin Kroeker [Mon, 2 Mar 2020 20:20:51 +0000 (21:20 +0100)]
Merge pull request #2478 from MacChen02/develop

Update benchmark statistical time function

مهدي شينون (Mehdi Chinoune) [Mon, 2 Mar 2020 16:22:28 +0000 (17:22 +0100)]
fixes #2480

Martin Liska [Wed, 19 Feb 2020 17:24:01 +0000 (18:24 +0100)]
Improve test coverage for utests.

jianghesong [Mon, 2 Mar 2020 11:13:45 +0000 (19:13 +0800)]
fix core dumped error

MacChen02 [Mon, 2 Mar 2020 06:36:27 +0000 (14:36 +0800)]
Update benchmark statistical time function

The gettimeofday function fails to register any elapsed time when testing the axpy use case with small data volumes.
Replace gettimeofday with clock_gettime to measure the time.
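
A minimal sketch of timing with clock_gettime is shown below. The helper names `timespec_diff_sec` and `time_call` are hypothetical, and the choice of `CLOCK_MONOTONIC` is an assumption made for this illustration; the commit message does not say which clock ID the benchmarks use.

```c
/* Feature-test macro for clock_gettime under strict ISO C modes. */
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Elapsed seconds between two timespec samples. The nanosecond
 * difference may be negative; adding it to the seconds difference
 * still yields the correct total. */
static double timespec_diff_sec(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Time a single call of fn(arg). Returns the elapsed time in seconds,
 * or -1.0 if the clock could not be read. */
static double time_call(void (*fn)(void *), void *arg) {
    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    return timespec_diff_sec(t0, t1);
}
```

A benchmark loop would wrap the routine under test in a small function matching `void (*)(void *)` and pass it to `time_call`, accumulating the returned seconds over its repetitions.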

Ali Saidi [Sat, 22 Feb 2020 05:31:07 +0000 (05:31 +0000)]
Switch blas_server to use acq/rel semantics

Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.

We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.

The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
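
Passing a work queue pointer with acquire/release semantics could look roughly like the following C11 sketch. The `work_item_t` and `worker_slot_t` types and the `slot_*` functions are hypothetical stand-ins for this illustration and do not mirror the blas_server data structures; the sketch also assumes one dispatcher and one worker per slot.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work descriptor. */
typedef struct work_item {
    int id;
} work_item_t;

/* Hypothetical per-worker slot: NULL means "no work pending". */
typedef struct {
    _Atomic(work_item_t *) queue;
} worker_slot_t;

/* Mark the slot as empty before any thread uses it. */
static void slot_init(worker_slot_t *s) {
    atomic_init(&s->queue, NULL);
}

/* Dispatcher: fill in the work item first, then hand over the pointer
 * with a release store so the item's contents are visible to the worker. */
static void slot_submit(worker_slot_t *s, work_item_t *w) {
    atomic_store_explicit(&s->queue, w, memory_order_release);
}

/* Worker: one poll of the slot with an acquire load. Returns the pending
 * item and clears the slot, or returns NULL if nothing is queued. */
static work_item_t *slot_take(worker_slot_t *s) {
    work_item_t *w = atomic_load_explicit(&s->queue, memory_order_acquire);
    if (w != NULL)
        atomic_store_explicit(&s->queue, NULL, memory_order_release);
    return w;
}
```

A worker would call `slot_take` in its polling loop and, as the message above describes, fall back to a mutex and condition variable only to sleep after many unsuccessful polls.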

Martin Kroeker [Sun, 1 Mar 2020 23:04:26 +0000 (00:04 +0100)]
Merge pull request #2475 from martin-frbg/039changes

Update ChangeLog for 0.3.9

Martin Kroeker [Sun, 1 Mar 2020 23:04:08 +0000 (00:04 +0100)]
Merge pull request #2474 from martin-frbg/p9be

Use POWER8 kernels on big-endian POWER9 for now

Martin Kroeker [Sun, 1 Mar 2020 23:02:36 +0000 (00:02 +0100)]
Add Ampere EMAG8180

Martin Kroeker [Sun, 1 Mar 2020 23:01:22 +0000 (00:01 +0100)]
Update with 0.3.9 changes

Martin Kroeker [Sun, 1 Mar 2020 22:45:58 +0000 (23:45 +0100)]
Use POWER8 kernels on big-endian POWER9 for now

Martin Kroeker [Sun, 1 Mar 2020 22:44:10 +0000 (23:44 +0100)]
Merge pull request #35 from xianyi/develop

rebase

Martin Kroeker [Sun, 1 Mar 2020 18:41:07 +0000 (19:41 +0100)]
Merge pull request #2471 from AGSaidi/l3-fix-2

Fix barriers in level3_thread

Martin Kroeker [Sun, 1 Mar 2020 18:40:46 +0000 (19:40 +0100)]
Merge pull request #2468 from AGSaidi/wfe

Use wait-for-event to not spin in the blas_lock

Martin Kroeker [Sun, 1 Mar 2020 12:02:34 +0000 (13:02 +0100)]
Merge pull request #2464 from Darkness303/develop

Add syr benchmark

Martin Kroeker [Sat, 29 Feb 2020 21:43:02 +0000 (22:43 +0100)]
Merge pull request #2467 from AGSaidi/rpcc

Make rpcc() on arm64 get closer to what x86 returns

Martin Kroeker [Sat, 29 Feb 2020 18:08:03 +0000 (19:08 +0100)]
Merge pull request #2463 from martin-frbg/mingwfix

Apply MinGW AVX512 compilation fix to fortran options as well

Martin Kroeker [Sat, 29 Feb 2020 18:07:35 +0000 (19:07 +0100)]
Merge pull request #2422 from wjc404/develop

Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM

Ali Saidi [Sat, 29 Feb 2020 17:27:18 +0000 (17:27 +0000)]
Fix barriers in level3_thread

Martin Kroeker [Sat, 29 Feb 2020 12:24:44 +0000 (13:24 +0100)]
Merge pull request #2465 from AGSaidi/neoverse-n1

Add Neoverse-N1 core

Ali Saidi [Fri, 21 Feb 2020 23:43:43 +0000 (23:43 +0000)]
Use wait-for-event to not spin in the blas_lock

Ali Saidi [Sat, 22 Feb 2020 05:07:55 +0000 (05:07 +0000)]
Make rpcc() on arm64 get closer to what x86 returns

The Arm implementation of rpcc() uses the architected timer,
which is defined by the SBSA to run at between 10 and 400 MHz. These
frequencies are much lower than the cycle counter frequency used by
x86. Make the numbers closer by shifting the cycle counter up by the
number of leading zeros in the cntfrq_el0 register, which gets us
closer to a normal CPU clock cycle range.
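
The arithmetic of that scaling can be sketched in portable C. This is only an illustration of the shift: the actual routine reads the counter and cntfrq_el0 with Arm system-register instructions, which are omitted here, and the names `clz32` and `scale_ticks` are hypothetical. It also assumes the frequency is treated as a 32-bit value.

```c
#include <stdint.h>

/* Count the leading zero bits of a 32-bit value with a portable loop.
 * Returns 32 when x is 0. */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scale a raw timer count by shifting it left by the number of leading
 * zeros in the timer frequency. A frequency of 0 is treated as "unknown"
 * and leaves the count unscaled. */
static uint64_t scale_ticks(uint64_t raw_ticks, uint32_t timer_hz) {
    unsigned shift = clz32(timer_hz);
    if (shift >= 32)
        return raw_ticks;
    return raw_ticks << shift;
}
```

For a 100 MHz timer the shift is 5, so each raw tick counts as 32 scaled ticks and the effective rate becomes 3.2 GHz. Because the shift moves the top set bit of the frequency to bit 31, any nonzero 32-bit frequency ends up with an effective rate between 2^31 and 2^32 ticks per second.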

Ali Saidi [Fri, 21 Feb 2020 22:46:58 +0000 (22:46 +0000)]
Add Neoverse-N1 core

The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines, along with specifying -march=v8.2-a

j00520245 [Fri, 28 Feb 2020 08:36:53 +0000 (16:36 +0800)]
Add new syr benchmark

Martin Kroeker [Thu, 27 Feb 2020 22:09:40 +0000 (23:09 +0100)]
Apply MinGW AVX512 compilation fix to fortran options as well

The original issue was #1708; I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.

wjc404 [Thu, 27 Feb 2020 14:26:15 +0000 (22:26 +0800)]
Update cgemm_kernel_8x2_haswell.c

wjc404 [Thu, 27 Feb 2020 14:25:19 +0000 (22:25 +0800)]
Update zgemm_kernel_4x2_haswell.c

Martin Kroeker [Thu, 27 Feb 2020 14:07:02 +0000 (15:07 +0100)]
Merge pull request #2447 from martin-frbg/issue2446

Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180

Martin Kroeker [Wed, 26 Feb 2020 21:19:57 +0000 (22:19 +0100)]
Always assume server-class cpu count for TSV110 and EMAG8180

Martin Kroeker [Wed, 26 Feb 2020 21:16:28 +0000 (22:16 +0100)]
Merge pull request #34 from xianyi/develop

rebase

wjc404 [Wed, 26 Feb 2020 10:38:12 +0000 (18:38 +0800)]
Update zgemm_kernel_4x2_haswell.c

wjc404 [Wed, 26 Feb 2020 10:36:54 +0000 (18:36 +0800)]
Update cgemm_kernel_8x2_haswell.c

Martin Kroeker [Tue, 25 Feb 2020 17:42:52 +0000 (18:42 +0100)]
Merge pull request #2437 from martin-frbg/issue2434

[WIP] Add support for Ampere EMAG8180 ARMV8 cpu

Martin Kroeker [Tue, 25 Feb 2020 13:30:00 +0000 (14:30 +0100)]
Add Ampere EMAG8180

Martin Kroeker [Mon, 24 Feb 2020 19:23:18 +0000 (20:23 +0100)]
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake

Martin Kroeker [Mon, 24 Feb 2020 19:16:18 +0000 (20:16 +0100)]
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake

Martin Kroeker [Mon, 24 Feb 2020 19:15:04 +0000 (20:15 +0100)]
Typo fix

Martin Kroeker [Mon, 24 Feb 2020 18:23:46 +0000 (19:23 +0100)]
Add EMAG8180 to DYNAMIC_CORE list for ARM64

Martin Kroeker [Mon, 24 Feb 2020 18:20:00 +0000 (19:20 +0100)]
Add DYNAMIC_ARCH support for ARMV8 EMAG8180

Martin Kroeker [Mon, 24 Feb 2020 12:14:51 +0000 (13:14 +0100)]
Merge pull request #2443 from aaawuanjun/develop

[OpenBlas]: benchmark/copy.c has time, x, y data loop problems

Martin Kroeker [Mon, 24 Feb 2020 11:27:01 +0000 (12:27 +0100)]
Merge pull request #2442 from martin-frbg/lapackpr390

Apply fix from Reference-LAPACK PR 390

wuanjun 00447568 [Mon, 24 Feb 2020 03:23:39 +0000 (11:23 +0800)]
[OpenBlas]: benchmark/copy.c has time, x, y data loop problems

Martin Kroeker [Sun, 23 Feb 2020 21:40:40 +0000 (22:40 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating

Martin Kroeker [Sun, 23 Feb 2020 21:39:01 +0000 (22:39 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating

wjc404 [Sat, 22 Feb 2020 15:40:02 +0000 (23:40 +0800)]
Add files via upload

wjc404 [Sat, 22 Feb 2020 15:39:43 +0000 (23:39 +0800)]
Update KERNEL.ZEN

wjc404 [Sat, 22 Feb 2020 15:39:20 +0000 (23:39 +0800)]
Update KERNEL.HASWELL

wjc404 [Sat, 22 Feb 2020 15:38:48 +0000 (23:38 +0800)]
Delete sgemm_kernel_8x4_haswell_2.c

wjc404 [Sat, 22 Feb 2020 15:37:45 +0000 (23:37 +0800)]
Fix performance bug when LDC is a multiple of 1024