platform/upstream/openblas.git
4 years agoMerge pull request #2516 from wjc404/develop
Martin Kroeker [Mon, 16 Mar 2020 20:58:34 +0000 (21:58 +0100)]
Merge pull request #2516 from wjc404/develop

AVX2 STRSM kernels

4 years agoUpdate KERNEL.ZEN
wjc404 [Mon, 16 Mar 2020 16:39:37 +0000 (16:39 +0000)]
Update KERNEL.ZEN

4 years agoAVX2 STRSM kernel
wjc404 [Mon, 16 Mar 2020 16:34:08 +0000 (00:34 +0800)]
AVX2 STRSM kernel

4 years agoMerge pull request #2508 from liujingjue/develop
Martin Kroeker [Sat, 14 Mar 2020 13:21:30 +0000 (14:21 +0100)]
Merge pull request #2508 from liujingjue/develop

[OpenBLAS]:fix the iamax benchmark error

4 years agoMerge pull request #2512 from martin-frbg/lapackh
Martin Kroeker [Sat, 14 Mar 2020 12:27:40 +0000 (13:27 +0100)]
Merge pull request #2512 from martin-frbg/lapackh

Move declarations of lapack_complex_custom types outside the extern C

4 years agoMerge pull request #2506 from xiaofengF/develop
Martin Kroeker [Sat, 14 Mar 2020 12:08:36 +0000 (13:08 +0100)]
Merge pull request #2506 from xiaofengF/develop

Add benchmark for SPMV and fix segmentation fault when data size >= 50000

4 years agoMerge pull request #2505 from aaawuanjun/develop
Martin Kroeker [Fri, 13 Mar 2020 22:16:35 +0000 (23:16 +0100)]
Merge pull request #2505 from aaawuanjun/develop

[OpenBlas]:Add benchmark tpmv.c and modify benchmark/Makefile

4 years agoMerge pull request #2511 from martin-frbg/fixppctest
Martin Kroeker [Fri, 13 Mar 2020 22:04:01 +0000 (23:04 +0100)]
Merge pull request #2511 from martin-frbg/fixppctest

Prevent attempts to run ctest or test when fortran is not available

4 years agoMove declarations of lapack_complex_custom types outside the extern C
Martin Kroeker [Fri, 13 Mar 2020 19:34:13 +0000 (20:34 +0100)]
Move declarations of lapack_complex_custom types outside the extern C

fixes #2510

4 years agoDo not attempt to run test without fortran
Martin Kroeker [Fri, 13 Mar 2020 19:11:19 +0000 (20:11 +0100)]
Do not attempt to run test without fortran

4 years agoDo not attempt to run ctest without fortran
Martin Kroeker [Fri, 13 Mar 2020 19:10:26 +0000 (20:10 +0100)]
Do not attempt to run ctest without fortran

The main Makefile takes care of this in the build process, but users or CI jobs may try to run this directly

4 years ago[OpenBLAS]:fix the iamax benchmark error
l00546269 [Fri, 13 Mar 2020 02:58:39 +0000 (10:58 +0800)]
[OpenBLAS]:fix the iamax benchmark error
[Description]:the result for i?amax is not MFlops, it is MBytes
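
The unit distinction above can be sketched in C. This is only an illustration, assuming the benchmark reports the volume of data read (element count times element size) per second; the helper name `mbytes_per_second` is hypothetical and does not come from the OpenBLAS benchmark sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from OpenBLAS): data throughput in MBytes/s for a
 * kernel that reads n elements of elem_size bytes each in `seconds` seconds.
 * An index search such as i?amax is reported here by data volume rather than
 * by a floating-point operation count. */
double mbytes_per_second(size_t n, size_t elem_size, double seconds)
{
    assert(seconds > 0.0);
    return ((double)n * (double)elem_size) / 1.0e6 / seconds;
}
```

For example, scanning one million 4-byte floats in one second corresponds to 4 MBytes/s under this assumption.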

4 years agoRemove cspmv and zspmv to remove the error occurred in travis CI
jayfely@qq.com [Wed, 11 Mar 2020 09:02:34 +0000 (17:02 +0800)]
Remove cspmv and zspmv to remove the error occurred in travis CI

4 years agoModify Makefile in interface to remove the error occurred in travis CI
jayfely@qq.com [Wed, 11 Mar 2020 08:36:45 +0000 (16:36 +0800)]
Modify Makefile in interface to remove the error occurred in travis CI

4 years agoOnly keep spmv.goto and spmv.atlas
jayfely@qq.com [Wed, 11 Mar 2020 07:48:58 +0000 (15:48 +0800)]
Only keep spmv.goto and spmv.atlas

4 years ago[OpenBlas]:Add benchmark tpmv.c and modify Makefile
wuanjun 00447568 [Wed, 11 Mar 2020 04:31:48 +0000 (12:31 +0800)]
[OpenBlas]:Add benchmark tpmv.c and modify Makefile
[Description]:Solve the problem of missing tpmv.c benchmark file

4 years agoUpdate spmv.c: solve segmentation fault when m and n are larger than 50000
jayfely@qq.com [Wed, 11 Mar 2020 02:30:09 +0000 (10:30 +0800)]
Update spmv.c: solve segmentation fault when m and n are larger than 50000

4 years agoMerge pull request #2503 from martin-frbg/xerbl
Martin Kroeker [Tue, 10 Mar 2020 22:38:07 +0000 (23:38 +0100)]
Merge pull request #2503 from martin-frbg/xerbl

Apply fix for LAPACK issue 394 (fixed-form code beyond column 72)

4 years agoMerge pull request #2502 from martin-frbg/issue2497
Martin Kroeker [Tue, 10 Mar 2020 19:01:23 +0000 (20:01 +0100)]
Merge pull request #2502 from martin-frbg/issue2497

Fix INTERFACE64 not propagating to the fortran codes on ARMV8

4 years agoMerge pull request #2501 from jijiwawa/Fix_mistakes
Martin Kroeker [Tue, 10 Mar 2020 15:44:40 +0000 (16:44 +0100)]
Merge pull request #2501 from jijiwawa/Fix_mistakes

Fix pr #2487 error

4 years agoUse the correct unit of measure
s00527847 [Tue, 10 Mar 2020 23:26:06 +0000 (19:26 -0400)]
Use the correct unit of measure

4 years agoApply fix for Reference-LAPACK issue 394
Martin Kroeker [Tue, 10 Mar 2020 12:37:41 +0000 (13:37 +0100)]
Apply fix for Reference-LAPACK issue 394

reference to XERBLA extending beyond column 72, breaking builds with compilers that default to traditional punch card format

4 years agoRestore INTERFACE64 for arm64
Martin Kroeker [Tue, 10 Mar 2020 11:51:07 +0000 (12:51 +0100)]
Restore INTERFACE64 for arm64

4 years agoMerge pull request #37 from xianyi/develop
Martin Kroeker [Tue, 10 Mar 2020 11:49:21 +0000 (12:49 +0100)]
Merge pull request #37 from xianyi/develop

rebase

4 years agoModify Makefile in Benchmark
jayfely@qq.com [Tue, 10 Mar 2020 06:32:18 +0000 (14:32 +0800)]
Modify Makefile in Benchmark

4 years agoAdd benchmark for SPMV
jayfely@qq.com [Tue, 10 Mar 2020 06:22:18 +0000 (14:22 +0800)]
Add benchmark for SPMV

4 years agoMerge pull request #2498 from njutcz/develop
Zhang Xianyi [Mon, 9 Mar 2020 08:04:33 +0000 (16:04 +0800)]
Merge pull request #2498 from njutcz/develop

Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.

4 years agoFix the functional bugs for zamax.
s00548429 [Mon, 9 Mar 2020 07:36:50 +0000 (15:36 +0800)]
Fix the functional bugs for zamax.

4 years agoAdd benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
s00548429 [Mon, 9 Mar 2020 06:59:03 +0000 (14:59 +0800)]
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.

4 years agoMerge pull request #1 from xianyi/develop
njutcz [Mon, 9 Mar 2020 02:39:40 +0000 (10:39 +0800)]
Merge pull request #1 from xianyi/develop

update

4 years agoMerge pull request #2495 from ZuoQ3/develop
Martin Kroeker [Sun, 8 Mar 2020 07:09:58 +0000 (08:09 +0100)]
Merge pull request #2495 from ZuoQ3/develop

add benchmark for axpby test

4 years agoMerge pull request #2494 from shengyang-3390/develop
Martin Kroeker [Sat, 7 Mar 2020 22:04:21 +0000 (23:04 +0100)]
Merge pull request #2494 from shengyang-3390/develop

add benchmark for csrot and zdrot

4 years agoMerge pull request #2489 from jijiwawa/brightness
Martin Kroeker [Sat, 7 Mar 2020 21:26:00 +0000 (22:26 +0100)]
Merge pull request #2489 from jijiwawa/brightness

Remove redundant code

4 years agoadd trmm.c
s00527847 [Sat, 7 Mar 2020 18:09:19 +0000 (13:09 -0500)]
add trmm.c

4 years agoRemove redundant code
s00527847 [Wed, 4 Mar 2020 22:44:50 +0000 (17:44 -0500)]
Remove redundant code

4 years agoMerge pull request #2493 from martin-frbg/plainmake
Martin Kroeker [Sat, 7 Mar 2020 15:55:53 +0000 (16:55 +0100)]
Merge pull request #2493 from martin-frbg/plainmake

Fix use of make vs $(MAKE) in building lapack-testing

4 years agoMerge pull request #2488 from liujingjue/develop
Martin Kroeker [Sat, 7 Mar 2020 15:52:29 +0000 (16:52 +0100)]
Merge pull request #2488 from liujingjue/develop

Modify the main Makefile in OpenBLAS

4 years agoAdd benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby
zq [Sat, 7 Mar 2020 09:48:55 +0000 (17:48 +0800)]
Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby

4 years agoMerge pull request #1 from xianyi/develop
zq [Sat, 7 Mar 2020 09:04:59 +0000 (17:04 +0800)]
Merge pull request #1 from xianyi/develop

update

4 years agoadd benchmark for csrot and zdrot
shengyang [Sat, 7 Mar 2020 07:17:49 +0000 (15:17 +0800)]
add benchmark for csrot and zdrot
modified:   benchmark/Makefile
modified:   benchmark/rot.c

4 years ago[OpenBLAS]:modified the Makefile
l00546269 [Sat, 7 Mar 2020 02:14:33 +0000 (10:14 +0800)]
[OpenBLAS]:modified the Makefile
[Description]: check the compiler version and show the detail info

4 years agoFix another spot where make was used instead of $(MAKE)
Martin Kroeker [Fri, 6 Mar 2020 14:37:26 +0000 (15:37 +0100)]
Fix another spot where make was used instead of $(MAKE)

Broke lapack-testing on BSD as their default "make" does not support GNU Makefile syntax

4 years agoMerge pull request #36 from xianyi/develop
Martin Kroeker [Fri, 6 Mar 2020 14:32:27 +0000 (15:32 +0100)]
Merge pull request #36 from xianyi/develop

rebase

4 years agoMerge pull request #2491 from chenxuqiang/hbmv_benchmark
Martin Kroeker [Fri, 6 Mar 2020 14:06:42 +0000 (15:06 +0100)]
Merge pull request #2491 from chenxuqiang/hbmv_benchmark

benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c

4 years agoMerge pull request #2490 from shengyang-3390/develop
Martin Kroeker [Fri, 6 Mar 2020 14:05:55 +0000 (15:05 +0100)]
Merge pull request #2490 from shengyang-3390/develop

Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm

4 years agoMerge pull request #2487 from jijiwawa/develop
Martin Kroeker [Fri, 6 Mar 2020 13:42:25 +0000 (14:42 +0100)]
Merge pull request #2487 from jijiwawa/develop

add benchmark for spr/spr2

4 years agoMerge branch 'develop' into develop
Martin Kroeker [Fri, 6 Mar 2020 13:41:40 +0000 (14:41 +0100)]
Merge branch 'develop' into develop

4 years agoMerge pull request #2486 from qqqil/develop
Martin Kroeker [Fri, 6 Mar 2020 13:30:09 +0000 (14:30 +0100)]
Merge pull request #2486 from qqqil/develop

add benchmark for trsv

4 years agoMerge pull request #2485 from Darkness303/develop
Martin Kroeker [Fri, 6 Mar 2020 13:29:27 +0000 (14:29 +0100)]
Merge pull request #2485 from Darkness303/develop

Add syr2 benchmark

4 years agoMerge pull request #2469 from AGSaidi/acq-rel-2
Martin Kroeker [Fri, 6 Mar 2020 13:28:58 +0000 (14:28 +0100)]
Merge pull request #2469 from AGSaidi/acq-rel-2

Use acq/rel semantics to pass flags/pointers in getrf_parallel.

4 years agoUse acq/rel semantics to pass flags/pointers in getrf_parallel.
Ali Saidi [Mon, 24 Feb 2020 05:45:30 +0000 (05:45 +0000)]
Use acq/rel semantics to pass flags/pointers in getrf_parallel.

The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
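
The pattern described above, where a lock that guards a single variable is replaced by an atomic write and read with barriers, can be sketched with C11 atomics. This is a minimal illustration, not the getrf_parallel code: the `mailbox` type and both function names are hypothetical, and in real use the two functions would be called from different threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox; the names are illustrative only. */
typedef struct {
    int payload;        /* plain data, written before the flag is raised */
    atomic_bool ready;  /* flag that publishes the payload */
} mailbox;

/* Producer side: write the data, then raise the flag with release ordering
 * so the payload write cannot be reordered after the flag store. */
void mailbox_publish(mailbox *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer side: one polling step. The acquire load pairs with the release
 * store above, so a reader that sees ready == true also sees the payload.
 * Returns false, leaving *out untouched, when nothing has been published. */
bool mailbox_try_read(mailbox *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

A spinning consumer would call `mailbox_try_read` in a loop. Because no mutex is involved, the writer never has to compete with the spinning reader for a lock.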

4 years agobenchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
chenxuqiang [Fri, 6 Mar 2020 06:02:02 +0000 (01:02 -0500)]
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c

Signed-off-by: Xuqiang Chen chenxuqiang3@hisilicon.com

4 years agoAdd benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
shengyang [Thu, 5 Mar 2020 01:55:16 +0000 (09:55 +0800)]
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm

modified:   benchmark/Makefile
new file:   benchmark/rotm.c

4 years agoadd benchmark for spr/spr2
s00527847 [Wed, 4 Mar 2020 20:50:19 +0000 (15:50 -0500)]
add benchmark for spr/spr2

4 years agochange clock to CLOCK_PROCESS_CPUTIME_ID
q00437336 [Wed, 4 Mar 2020 08:54:40 +0000 (03:54 -0500)]
change clock to CLOCK_PROCESS_CPUTIME_ID

4 years ago[OpenBLAS]:modified the Makefile
l00546269 [Wed, 4 Mar 2020 08:47:23 +0000 (16:47 +0800)]
[OpenBLAS]:modified the Makefile
[Description]:add c/fortran compiler version information in final note

4 years agoadd benchmark for trsv
q00437336 [Wed, 4 Mar 2020 07:57:33 +0000 (02:57 -0500)]
add benchmark for trsv

4 years agoMerge pull request #2484 from RajalakshmiSR/power-dynamic
Martin Kroeker [Wed, 4 Mar 2020 07:06:06 +0000 (08:06 +0100)]
Merge pull request #2484 from RajalakshmiSR/power-dynamic

Fix DYNAMIC_ARCH build for POWER9

4 years agoMerge pull request #2483 from aaawuanjun/develop
Martin Kroeker [Wed, 4 Mar 2020 06:59:56 +0000 (07:59 +0100)]
Merge pull request #2483 from aaawuanjun/develop

Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv

4 years agoMerge pull request #2466 from AGSaidi/acq-rel-1
Martin Kroeker [Wed, 4 Mar 2020 06:59:31 +0000 (07:59 +0100)]
Merge pull request #2466 from AGSaidi/acq-rel-1

Switch blas_server to use acq/rel semantics

4 years ago1.Add syr2 benchmark
Darkness303 [Wed, 4 Mar 2020 06:09:10 +0000 (14:09 +0800)]
1.Add syr2 benchmark
2.Fixed some errors

4 years agoFix cut/paste glitch
Martin Kroeker [Tue, 3 Mar 2020 20:37:48 +0000 (21:37 +0100)]
Fix cut/paste glitch

4 years agoRestore initializers for mutex and conditional
Martin Kroeker [Tue, 3 Mar 2020 20:04:12 +0000 (21:04 +0100)]
Restore initializers for mutex and conditional

4 years agoFix DYNAMIC_ARCH build for POWER9
Rajalakshmi Srinivasaraghavan [Tue, 3 Mar 2020 18:35:10 +0000 (12:35 -0600)]
Fix DYNAMIC_ARCH build for POWER9

Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks.  This patch fixes some of the macros that are used
to check compiler version.  On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.

4 years agoMerge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 11:03:57 +0000 (19:03 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop

4 years agoMerge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:39:26 +0000 (17:39 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop

4 years ago[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c...
wuanjun 00447568 [Tue, 3 Mar 2020 09:13:49 +0000 (17:13 +0800)]
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv

4 years agoMerge pull request #2479 from Darkness303/develop
Martin Kroeker [Tue, 3 Mar 2020 07:46:49 +0000 (08:46 +0100)]
Merge pull request #2479 from Darkness303/develop

Fix potential index overflows at large matrix sizes in the benchmark codes

4 years agoMerge pull request #2436 from marxin/improve-utest-coverage
Martin Kroeker [Tue, 3 Mar 2020 07:43:00 +0000 (08:43 +0100)]
Merge pull request #2436 from marxin/improve-utest-coverage

Improve test coverage for utests.

4 years agoMerge pull request #2481 from ChinouneMehdi/fix2480
Martin Kroeker [Mon, 2 Mar 2020 20:21:29 +0000 (21:21 +0100)]
Merge pull request #2481 from ChinouneMehdi/fix2480

Fix #2480

4 years agoMerge pull request #2478 from MacChen02/develop
Martin Kroeker [Mon, 2 Mar 2020 20:20:51 +0000 (21:20 +0100)]
Merge pull request #2478 from MacChen02/develop

Update benchmark statistical time function

4 years agofixes #2480
مهدي شينون (Mehdi Chinoune) [Mon, 2 Mar 2020 16:22:28 +0000 (17:22 +0100)]
fixes #2480

4 years agoImprove test coverage for utests.
Martin Liska [Wed, 19 Feb 2020 17:24:01 +0000 (18:24 +0100)]
Improve test coverage for utests.

4 years agofix core dumped error
jianghesong [Mon, 2 Mar 2020 11:13:45 +0000 (19:13 +0800)]
fix core dumped error

4 years agoUpdate benchmark statistical time function
MacChen02 [Mon, 2 Mar 2020 06:36:27 +0000 (14:36 +0800)]
Update benchmark statistical time function

The gettimeofday function does not register any elapsed time when testing the axpy use case with small data volumes.
Use the clock_gettime function in place of gettimeofday to measure the time.
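
The change described above can be sketched as follows. This is a minimal illustration, assuming a POSIX system; the helper names are hypothetical, and the choice of `CLOCK_MONOTONIC` is an assumption, since the commit message does not say which clock ID the benchmark uses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical helper: difference end - start in seconds. struct timespec
 * carries nanoseconds, whereas gettimeofday's struct timeval stops at
 * microseconds, which can be too coarse for very short kernels. */
double timespec_diff_seconds(const struct timespec *start,
                             const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1.0e-9;
}

/* Hypothetical helper: seconds elapsed since *start.
 * CLOCK_MONOTONIC is an assumption made for this sketch. */
double elapsed_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return timespec_diff_seconds(start, &now);
}
```

A benchmark loop would read the clock once before the kernel call and call `elapsed_since` afterwards.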

4 years agoSwitch blas_server to use acq/rel semantics
Ali Saidi [Sat, 22 Feb 2020 05:31:07 +0000 (05:31 +0000)]
Switch blas_server to use acq/rel semantics

Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.

We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.

The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
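
Passing a work queue pointer between threads with acquire/release semantics, as described above, might look like the following C11 sketch. The `work_item` and `worker_slot` types and both function names are hypothetical, not the blas_server data structures, and the mutex/condition variable path that lets idle threads sleep is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work descriptor and per-thread slot (illustrative names). */
typedef struct work_item { int id; } work_item;

typedef struct {
    _Atomic(work_item *) queue;  /* NULL means no work has been posted */
} worker_slot;

/* Main thread: fill in the work item first, then publish the pointer with
 * release ordering so the worker sees fully initialized contents. */
void slot_post(worker_slot *s, work_item *w)
{
    atomic_store_explicit(&s->queue, w, memory_order_release);
}

/* Worker thread: atomically take whatever is posted and clear the slot.
 * The acquire ordering pairs with the release store in slot_post.
 * Returns NULL when there is nothing to do. */
work_item *slot_take(worker_slot *s)
{
    return atomic_exchange_explicit(&s->queue, (work_item *)NULL,
                                    memory_order_acquire);
}
```

A worker would poll `slot_take` for a bounded number of iterations and fall back to sleeping on a lock only after that, which matches the role the commit message leaves to the remaining locks.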

4 years agoMerge pull request #2475 from martin-frbg/039changes
Martin Kroeker [Sun, 1 Mar 2020 23:04:26 +0000 (00:04 +0100)]
Merge pull request #2475 from martin-frbg/039changes

Update ChangeLog for 0.3.9

4 years agoMerge pull request #2474 from martin-frbg/p9be
Martin Kroeker [Sun, 1 Mar 2020 23:04:08 +0000 (00:04 +0100)]
Merge pull request #2474 from martin-frbg/p9be

Use POWER8 kernels on big-endian POWER9 for now

4 years agoAdd Ampere EMAG8180
Martin Kroeker [Sun, 1 Mar 2020 23:02:36 +0000 (00:02 +0100)]
Add Ampere EMAG8180

4 years agoUpdate with 0.3.9 changes
Martin Kroeker [Sun, 1 Mar 2020 23:01:22 +0000 (00:01 +0100)]
Update with 0.3.9 changes

4 years agoUse POWER8 kernels on big-endian POWER9 for now
Martin Kroeker [Sun, 1 Mar 2020 22:45:58 +0000 (23:45 +0100)]
Use POWER8 kernels on big-endian POWER9 for now

4 years agoMerge pull request #35 from xianyi/develop
Martin Kroeker [Sun, 1 Mar 2020 22:44:10 +0000 (23:44 +0100)]
Merge pull request #35 from xianyi/develop

rebase

4 years agoMerge pull request #2471 from AGSaidi/l3-fix-2
Martin Kroeker [Sun, 1 Mar 2020 18:41:07 +0000 (19:41 +0100)]
Merge pull request #2471 from AGSaidi/l3-fix-2

Fix barriers in level3_thread

4 years agoMerge pull request #2468 from AGSaidi/wfe
Martin Kroeker [Sun, 1 Mar 2020 18:40:46 +0000 (19:40 +0100)]
Merge pull request #2468 from AGSaidi/wfe

Use wait-for-event to not spin in the blas_lock

4 years agoMerge pull request #2464 from Darkness303/develop
Martin Kroeker [Sun, 1 Mar 2020 12:02:34 +0000 (13:02 +0100)]
Merge pull request #2464 from Darkness303/develop

Add syr benchmark

4 years agoMerge pull request #2467 from AGSaidi/rpcc
Martin Kroeker [Sat, 29 Feb 2020 21:43:02 +0000 (22:43 +0100)]
Merge pull request #2467 from AGSaidi/rpcc

Make rpcc() on arm64 get closer to what x86 returns

4 years agoMerge pull request #2463 from martin-frbg/mingwfix
Martin Kroeker [Sat, 29 Feb 2020 18:08:03 +0000 (19:08 +0100)]
Merge pull request #2463 from martin-frbg/mingwfix

Apply MinGW AVX512 compilation fix to fortran options as well

4 years agoMerge pull request #2422 from wjc404/develop
Martin Kroeker [Sat, 29 Feb 2020 18:07:35 +0000 (19:07 +0100)]
Merge pull request #2422 from wjc404/develop

Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM

4 years agoFix barriers in level3_thread
Ali Saidi [Sat, 29 Feb 2020 17:27:18 +0000 (17:27 +0000)]
Fix barriers in level3_thread

4 years agoMerge pull request #2465 from AGSaidi/neoverse-n1
Martin Kroeker [Sat, 29 Feb 2020 12:24:44 +0000 (13:24 +0100)]
Merge pull request #2465 from AGSaidi/neoverse-n1

Add Neoverse-N1 core

4 years agoUse wait-for-event to not spin in the blas_lock
Ali Saidi [Fri, 21 Feb 2020 23:43:43 +0000 (23:43 +0000)]
Use wait-for-event to not spin in the blas_lock

4 years agoMake rpcc() on arm64 get closer to what x86 returns
Ali Saidi [Sat, 22 Feb 2020 05:07:55 +0000 (05:07 +0000)]
Make rpcc() on arm64 get closer to what x86 returns

The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal cpu clock cycle range.
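
The scaling described above can be worked through in portable C. This sketch takes the counter value and the timer frequency as plain arguments instead of reading `cntvct_el0` and `cntfrq_el0`, and it assumes the leading zeros are counted over a 32-bit view of the frequency; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of leading zero bits in a 32-bit value (32 for x == 0). */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical stand-in for the scaling step: shift the raw timer count up
 * by the number of leading zeros in the timer frequency. A 100 MHz timer
 * has 5 leading zeros, so each tick counts as 32 and the result advances
 * at an effective 3.2 GHz. */
uint64_t scale_timer_count(uint64_t count, uint32_t freq_hz)
{
    return count << clz32(freq_hz);
}
```

With a 25 MHz timer the shift is 7, which again gives an effective rate of 3.2 GHz, so timers anywhere in the 10-400 MHz range land between 2^31 and 2^32 ticks per second.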

4 years agoAdd Neoverse-N1 core
Ali Saidi [Fri, 21 Feb 2020 22:46:58 +0000 (22:46 +0000)]
Add Neoverse-N1 core

The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a

4 years agoNew add syr benchmark
j00520245 [Fri, 28 Feb 2020 08:36:53 +0000 (16:36 +0800)]
New add syr benchmark

4 years agoApply MinGW AVX512 compilation fix to fortran options as well
Martin Kroeker [Thu, 27 Feb 2020 22:09:40 +0000 (23:09 +0100)]
Apply MinGW AVX512 compilation fix to fortran options as well

The original issue was #1708; the same problem also affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.

4 years agoUpdate cgemm_kernel_8x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:26:15 +0000 (22:26 +0800)]
Update cgemm_kernel_8x2_haswell.c

4 years agoUpdate zgemm_kernel_4x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:25:19 +0000 (22:25 +0800)]
Update zgemm_kernel_4x2_haswell.c

4 years agoMerge pull request #2447 from martin-frbg/issue2446
Martin Kroeker [Thu, 27 Feb 2020 14:07:02 +0000 (15:07 +0100)]
Merge pull request #2447 from martin-frbg/issue2446

Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180