Martin Kroeker [Fri, 13 Mar 2020 19:11:19 +0000 (20:11 +0100)]
Do not attempt to run test without fortran
Martin Kroeker [Fri, 13 Mar 2020 19:10:26 +0000 (20:10 +0100)]
Do not attempt to run ctest without fortran
The main Makefile takes care of this in the build process, but users or CI jobs may try to run this directly
Martin Kroeker [Tue, 10 Mar 2020 11:49:21 +0000 (12:49 +0100)]
Merge pull request #37 from xianyi/develop
rebase
Zhang Xianyi [Mon, 9 Mar 2020 08:04:33 +0000 (16:04 +0800)]
Merge pull request #2498 from njutcz/develop
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
s00548429 [Mon, 9 Mar 2020 07:36:50 +0000 (15:36 +0800)]
Fix the functional bugs for zamax.
s00548429 [Mon, 9 Mar 2020 06:59:03 +0000 (14:59 +0800)]
Add benchmark for ?amax, ?max, ?amin, ?min, i?max, i?amin and i?min.
njutcz [Mon, 9 Mar 2020 02:39:40 +0000 (10:39 +0800)]
Merge pull request #1 from xianyi/develop
update
Martin Kroeker [Sun, 8 Mar 2020 07:09:58 +0000 (08:09 +0100)]
Merge pull request #2495 from ZuoQ3/develop
add benchmark for axpby test
Martin Kroeker [Sat, 7 Mar 2020 22:04:21 +0000 (23:04 +0100)]
Merge pull request #2494 from shengyang-3390/develop
add benchmark for csrot and zdrot
Martin Kroeker [Sat, 7 Mar 2020 21:26:00 +0000 (22:26 +0100)]
Merge pull request #2489 from jijiwawa/brightness
Remove redundant code
s00527847 [Sat, 7 Mar 2020 18:09:19 +0000 (13:09 -0500)]
add trmm.c
s00527847 [Wed, 4 Mar 2020 22:44:50 +0000 (17:44 -0500)]
Remove redundant code
Martin Kroeker [Sat, 7 Mar 2020 15:55:53 +0000 (16:55 +0100)]
Merge pull request #2493 from martin-frbg/plainmake
Fix use of make vs $(MAKE) in building lapack-testing
Martin Kroeker [Sat, 7 Mar 2020 15:52:29 +0000 (16:52 +0100)]
Merge pull request #2488 from liujingjue/develop
Modify the main Makefile in OpenBLAS
zq [Sat, 7 Mar 2020 09:48:55 +0000 (17:48 +0800)]
Add benchmark file axpby.c and modify benchmark/Makefile to test s/d/c/zaxpby
zq [Sat, 7 Mar 2020 09:04:59 +0000 (17:04 +0800)]
Merge pull request #1 from xianyi/develop
update
shengyang [Sat, 7 Mar 2020 07:17:49 +0000 (15:17 +0800)]
add benchmark for csrot and zdrot
modified: benchmark/Makefile
modified: benchmark/rot.c
l00546269 [Sat, 7 Mar 2020 02:14:33 +0000 (10:14 +0800)]
[OpenBLAS]: modified the Makefile
[Description]: check the compiler version and show detailed info
Martin Kroeker [Fri, 6 Mar 2020 14:37:26 +0000 (15:37 +0100)]
Fix another spot where make was used instead of $(MAKE)
This broke lapack-testing on BSD, whose default "make" does not support GNU Makefile syntax
Martin Kroeker [Fri, 6 Mar 2020 14:32:27 +0000 (15:32 +0100)]
Merge pull request #36 from xianyi/develop
rebase
Martin Kroeker [Fri, 6 Mar 2020 14:06:42 +0000 (15:06 +0100)]
Merge pull request #2491 from chenxuqiang/hbmv_benchmark
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
Martin Kroeker [Fri, 6 Mar 2020 14:05:55 +0000 (15:05 +0100)]
Merge pull request #2490 from shengyang-3390/develop
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
Martin Kroeker [Fri, 6 Mar 2020 13:42:25 +0000 (14:42 +0100)]
Merge pull request #2487 from jijiwawa/develop
add benchmark for spr/spr2
Martin Kroeker [Fri, 6 Mar 2020 13:41:40 +0000 (14:41 +0100)]
Merge branch 'develop' into develop
Martin Kroeker [Fri, 6 Mar 2020 13:30:09 +0000 (14:30 +0100)]
Merge pull request #2486 from qqqil/develop
add benchmark for trsv
Martin Kroeker [Fri, 6 Mar 2020 13:29:27 +0000 (14:29 +0100)]
Merge pull request #2485 from Darkness303/develop
Add syr2 benchmark
Martin Kroeker [Fri, 6 Mar 2020 13:28:58 +0000 (14:28 +0100)]
Merge pull request #2469 from AGSaidi/acq-rel-2
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
Ali Saidi [Mon, 24 Feb 2020 05:45:30 +0000 (05:45 +0000)]
Use acq/rel semantics to pass flags/pointers in getrf_parallel.
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
As noted in the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the thread that last held the lock can keep reacquiring it,
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this improves performance by 20x on sgesv.goto.
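A minimal sketch of the idea described in this commit message, using C11 atomics: a single flag is handed between threads with release/acquire ordering instead of being guarded by a mutex. The slot_t type and both function names are hypothetical illustrations, not the data structures or code used in getrf_parallel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox; not the OpenBLAS data structure. */
typedef struct {
    int payload;       /* plain data, written before the flag is published */
    atomic_int ready;  /* flag passed between threads with acq/rel ordering */
} slot_t;

/* Producer: write the data, then publish it with a release store.
 * The release store orders the payload write before the flag write. */
static void slot_publish(slot_t *s, int value) {
    assert(s != NULL);
    s->payload = value;
    atomic_store_explicit(&s->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and fills *out if data was published, else 0.
 * The acquire load pairs with the producer's release store, so the
 * payload read below sees the value written before publication. */
static int slot_try_consume(slot_t *s, int *out) {
    assert(s != NULL && out != NULL);
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return 0;
    *out = s->payload;
    /* Hand the slot back to the producer. */
    atomic_store_explicit(&s->ready, 0, memory_order_release);
    return 1;
}
```

A waiting thread would call slot_try_consume in a loop; because no lock is taken, the writer can never be starved by a spinning reader that keeps reacquiring a mutex.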
chenxuqiang [Fri, 6 Mar 2020 06:02:02 +0000 (01:02 -0500)]
benchmark/hpmv&hbmv: add benchmark/hpmv.c and benchmark/hbmv.c
Signed-off-by: Xuqiang Chen <chenxuqiang3@hisilicon.com>
shengyang [Thu, 5 Mar 2020 01:55:16 +0000 (09:55 +0800)]
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
modified: benchmark/Makefile
new file: benchmark/rotm.c
s00527847 [Wed, 4 Mar 2020 20:50:19 +0000 (15:50 -0500)]
add benchmark for spr/spr2
q00437336 [Wed, 4 Mar 2020 08:54:40 +0000 (03:54 -0500)]
change clock to CLOCK_PROCESS_CPUTIME_ID
l00546269 [Wed, 4 Mar 2020 08:47:23 +0000 (16:47 +0800)]
[OpenBLAS]: modified the Makefile
[Description]: add C/Fortran compiler version information to the final note
q00437336 [Wed, 4 Mar 2020 07:57:33 +0000 (02:57 -0500)]
add benchmark for trsv
Martin Kroeker [Wed, 4 Mar 2020 07:06:06 +0000 (08:06 +0100)]
Merge pull request #2484 from RajalakshmiSR/power-dynamic
Fix DYNAMIC_ARCH build for POWER9
Martin Kroeker [Wed, 4 Mar 2020 06:59:56 +0000 (07:59 +0100)]
Merge pull request #2483 from aaawuanjun/develop
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
Martin Kroeker [Wed, 4 Mar 2020 06:59:31 +0000 (07:59 +0100)]
Merge pull request #2466 from AGSaidi/acq-rel-1
Switch blas_server to use acq/rel semantics
Darkness303 [Wed, 4 Mar 2020 06:09:10 +0000 (14:09 +0800)]
1. Add syr2 benchmark
2. Fixed some errors
Martin Kroeker [Tue, 3 Mar 2020 20:37:48 +0000 (21:37 +0100)]
Fix cut/paste glitch
Martin Kroeker [Tue, 3 Mar 2020 20:04:12 +0000 (21:04 +0100)]
Restore initializers for mutex and conditional
Rajalakshmi Srinivasaraghavan [Tue, 3 Mar 2020 18:35:10 +0000 (12:35 -0600)]
Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
wuanjun 00447568 [Tue, 3 Mar 2020 11:03:57 +0000 (19:03 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:39:26 +0000 (17:39 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:39:26 +0000 (17:39 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:13:49 +0000 (17:13 +0800)]
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
wuanjun 00447568 [Tue, 3 Mar 2020 09:13:49 +0000 (17:13 +0800)]
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
Martin Kroeker [Tue, 3 Mar 2020 07:46:49 +0000 (08:46 +0100)]
Merge pull request #2479 from Darkness303/develop
Fix potential index overflows at large matrix sizes in the benchmark codes
Martin Kroeker [Tue, 3 Mar 2020 07:43:00 +0000 (08:43 +0100)]
Merge pull request #2436 from marxin/improve-utest-coverage
Improve test coverage for utests.
Martin Kroeker [Mon, 2 Mar 2020 20:21:29 +0000 (21:21 +0100)]
Merge pull request #2481 from ChinouneMehdi/fix2480
Fix #2480
Martin Kroeker [Mon, 2 Mar 2020 20:20:51 +0000 (21:20 +0100)]
Merge pull request #2478 from MacChen02/develop
Update benchmark statistical time function
مهدي شينون (Mehdi Chinoune) [Mon, 2 Mar 2020 16:22:28 +0000 (17:22 +0100)]
fixes #2480
Martin Liska [Wed, 19 Feb 2020 17:24:01 +0000 (18:24 +0100)]
Improve test coverage for utests.
jianghesong [Mon, 2 Mar 2020 11:13:45 +0000 (19:13 +0800)]
fix core dump error
MacChen02 [Mon, 2 Mar 2020 06:36:27 +0000 (14:36 +0800)]
Update benchmark statistical time function
gettimeofday does not register the elapsed time when benchmarking axpy with small data volumes.
Use clock_gettime instead of gettimeofday to measure the time.
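A minimal sketch of timing a short-running call with clock_gettime, which reports nanoseconds rather than the microseconds of gettimeofday. The helper names elapsed_seconds and time_call are hypothetical, and the choice of CLOCK_MONOTONIC is an assumption; the commit message does not say which clock ID the benchmarks use.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Elapsed seconds between two timespec samples (end minus start). */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Time a single call of fn(arg).
 * CLOCK_MONOTONIC is an assumption made for this sketch. */
static double time_call(void (*fn)(void *), void *arg) {
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```

For very small problem sizes a benchmark would typically call the kernel several times inside fn and divide the result by the repeat count.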
Ali Saidi [Sat, 22 Feb 2020 05:31:07 +0000 (05:31 +0000)]
Switch blas_server to use acq/rel semantics
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
Martin Kroeker [Sun, 1 Mar 2020 23:04:26 +0000 (00:04 +0100)]
Merge pull request #2475 from martin-frbg/039changes
Update ChangeLog for 0.3.9
Martin Kroeker [Sun, 1 Mar 2020 23:04:08 +0000 (00:04 +0100)]
Merge pull request #2474 from martin-frbg/p9be
Use POWER8 kernels on big-endian POWER9 for now
Martin Kroeker [Sun, 1 Mar 2020 23:02:36 +0000 (00:02 +0100)]
Add Ampere EMAG8180
Martin Kroeker [Sun, 1 Mar 2020 23:01:22 +0000 (00:01 +0100)]
Update with 0.3.9 changes
Martin Kroeker [Sun, 1 Mar 2020 22:45:58 +0000 (23:45 +0100)]
Use POWER8 kernels on big-endian POWER9 for now
Martin Kroeker [Sun, 1 Mar 2020 22:44:10 +0000 (23:44 +0100)]
Merge pull request #35 from xianyi/develop
rebase
Martin Kroeker [Sun, 1 Mar 2020 18:41:07 +0000 (19:41 +0100)]
Merge pull request #2471 from AGSaidi/l3-fix-2
Fix barriers in level3_thread
Martin Kroeker [Sun, 1 Mar 2020 18:40:46 +0000 (19:40 +0100)]
Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
Martin Kroeker [Sun, 1 Mar 2020 12:02:34 +0000 (13:02 +0100)]
Merge pull request #2464 from Darkness303/develop
Add syr benchmark
Martin Kroeker [Sat, 29 Feb 2020 21:43:02 +0000 (22:43 +0100)]
Merge pull request #2467 from AGSaidi/rpcc
Make rpcc() on arm64 get closer to what x86 returns
Martin Kroeker [Sat, 29 Feb 2020 18:08:03 +0000 (19:08 +0100)]
Merge pull request #2463 from martin-frbg/mingwfix
Apply MinGW AVX512 compilation fix to fortran options as well
Martin Kroeker [Sat, 29 Feb 2020 18:07:35 +0000 (19:07 +0100)]
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
Ali Saidi [Sat, 29 Feb 2020 17:27:18 +0000 (17:27 +0000)]
Fix barriers in level3_thread
Martin Kroeker [Sat, 29 Feb 2020 12:24:44 +0000 (13:24 +0100)]
Merge pull request #2465 from AGSaidi/neoverse-n1
Add Neoverse-N1 core
Ali Saidi [Fri, 21 Feb 2020 23:43:43 +0000 (23:43 +0000)]
Use wait-for-event to not spin in the blas_lock
Ali Saidi [Sat, 22 Feb 2020 05:07:55 +0000 (05:07 +0000)]
Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal cpu clock cycle range.
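A portable sketch of the scaling described above: the counter value is shifted left by the number of leading zero bits in the timer frequency. The function names are hypothetical, the counter and frequency are passed in as plain arguments rather than read from the Arm system registers, and treating the frequency as a 32-bit value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading zero bits of a 32-bit value (32 for zero). */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scale a raw timer count so that a low-frequency timer yields values
 * in a range closer to a CPU cycle counter. */
static uint64_t scale_timer(uint64_t counter, uint32_t freq_hz) {
    assert(freq_hz != 0);
    return counter << clz32(freq_hz);
}
```

With a 100 MHz timer the frequency has 5 leading zero bits, so each tick is multiplied by 32, which corresponds to an effective rate of 3.2 GHz.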
Ali Saidi [Fri, 21 Feb 2020 22:46:58 +0000 (22:46 +0000)]
Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines, along with specifying -march=armv8.2-a
j00520245 [Fri, 28 Feb 2020 08:36:53 +0000 (16:36 +0800)]
Add new syr benchmark
Martin Kroeker [Thu, 27 Feb 2020 22:09:40 +0000 (23:09 +0100)]
Apply MinGW AVX512 compilation fix to fortran options as well
The original issue was #1708; I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.
wjc404 [Thu, 27 Feb 2020 14:26:15 +0000 (22:26 +0800)]
Update cgemm_kernel_8x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:25:19 +0000 (22:25 +0800)]
Update zgemm_kernel_4x2_haswell.c
Martin Kroeker [Thu, 27 Feb 2020 14:07:02 +0000 (15:07 +0100)]
Merge pull request #2447 from martin-frbg/issue2446
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
Martin Kroeker [Wed, 26 Feb 2020 21:19:57 +0000 (22:19 +0100)]
Always assume server-class cpu count for TSV110 and EMAG8180
Martin Kroeker [Wed, 26 Feb 2020 21:16:28 +0000 (22:16 +0100)]
Merge pull request #34 from xianyi/develop
rebase
wjc404 [Wed, 26 Feb 2020 10:38:12 +0000 (18:38 +0800)]
Update zgemm_kernel_4x2_haswell.c
wjc404 [Wed, 26 Feb 2020 10:36:54 +0000 (18:36 +0800)]
Update cgemm_kernel_8x2_haswell.c
Martin Kroeker [Tue, 25 Feb 2020 17:42:52 +0000 (18:42 +0100)]
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
Martin Kroeker [Tue, 25 Feb 2020 13:30:00 +0000 (14:30 +0100)]
Add Ampere EMAG8180
Martin Kroeker [Mon, 24 Feb 2020 19:23:18 +0000 (20:23 +0100)]
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake
Martin Kroeker [Mon, 24 Feb 2020 19:16:18 +0000 (20:16 +0100)]
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake
Martin Kroeker [Mon, 24 Feb 2020 19:15:04 +0000 (20:15 +0100)]
Typo fix
Martin Kroeker [Mon, 24 Feb 2020 18:23:46 +0000 (19:23 +0100)]
Add EMAG8180 to DYNAMIC_CORE list for ARM64
Martin Kroeker [Mon, 24 Feb 2020 18:20:00 +0000 (19:20 +0100)]
Add DYNAMIC_ARCH support for ARMV8 EMAG8180
Martin Kroeker [Mon, 24 Feb 2020 12:14:51 +0000 (13:14 +0100)]
Merge pull request #2443 from aaawuanjun/develop
[OpenBlas]: benchmark/copy.c has time, x, y data loop problems
Martin Kroeker [Mon, 24 Feb 2020 11:27:01 +0000 (12:27 +0100)]
Merge pull request #2442 from martin-frbg/lapackpr390
Apply fix from Reference-LAPACK PR 390
wuanjun 00447568 [Mon, 24 Feb 2020 03:23:39 +0000 (11:23 +0800)]
[OpenBlas]: benchmark/copy.c has time, x, y data loop problems
Martin Kroeker [Sun, 23 Feb 2020 21:40:40 +0000 (22:40 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating
Martin Kroeker [Sun, 23 Feb 2020 21:39:01 +0000 (22:39 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating
wjc404 [Sat, 22 Feb 2020 15:40:02 +0000 (23:40 +0800)]
Add files via upload
wjc404 [Sat, 22 Feb 2020 15:39:43 +0000 (23:39 +0800)]
Update KERNEL.ZEN
wjc404 [Sat, 22 Feb 2020 15:39:20 +0000 (23:39 +0800)]
Update KERNEL.HASWELL
wjc404 [Sat, 22 Feb 2020 15:38:48 +0000 (23:38 +0800)]
Delete sgemm_kernel_8x4_haswell_2.c
wjc404 [Sat, 22 Feb 2020 15:37:45 +0000 (23:37 +0800)]
Fix performance bug when LDC is a multiple of 1024
Martin Kroeker [Sat, 22 Feb 2020 10:21:03 +0000 (11:21 +0100)]
Merge pull request #2441 from martin-frbg/ismin2
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
Martin Kroeker [Fri, 21 Feb 2020 10:58:15 +0000 (11:58 +0100)]
Add proper defaults for IxMIN/IxMAX kernels
The fallbacks from Makefile.L1 assume a combined source for the absolute-value and non-absolute variants (with ifdef USE_ABS), but here we have separate implementations