shengyang [Thu, 5 Mar 2020 01:55:16 +0000 (09:55 +0800)]
Add benchmark file rotm.c and modify benchmark/Makefile to test s/drotm
modified: benchmark/Makefile
new file: benchmark/rotm.c
Martin Kroeker [Wed, 4 Mar 2020 07:06:06 +0000 (08:06 +0100)]
Merge pull request #2484 from RajalakshmiSR/power-dynamic
Fix DYNAMIC_ARCH build for POWER9
Martin Kroeker [Wed, 4 Mar 2020 06:59:56 +0000 (07:59 +0100)]
Merge pull request #2483 from aaawuanjun/develop
Add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
Martin Kroeker [Wed, 4 Mar 2020 06:59:31 +0000 (07:59 +0100)]
Merge pull request #2466 from AGSaidi/acq-rel-1
Switch blas_server to use acq/rel semantics
Martin Kroeker [Tue, 3 Mar 2020 20:37:48 +0000 (21:37 +0100)]
Fix cut/paste glitch
Martin Kroeker [Tue, 3 Mar 2020 20:04:12 +0000 (21:04 +0100)]
Restore initializers for mutex and conditional
Rajalakshmi Srinivasaraghavan [Tue, 3 Mar 2020 18:35:10 +0000 (12:35 -0600)]
Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
wuanjun 00447568 [Tue, 3 Mar 2020 11:03:57 +0000 (19:03 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:39:26 +0000 (17:39 +0800)]
Merge branch 'develop' of https://github.com/aaawuanjun/OpenBLAS into develop
wuanjun 00447568 [Tue, 3 Mar 2020 09:13:49 +0000 (17:13 +0800)]
[OpenBlas]: add benchmark file trmv.c and modify benchmark/Makefile to test s/d/c/ztrmv
Martin Kroeker [Tue, 3 Mar 2020 07:46:49 +0000 (08:46 +0100)]
Merge pull request #2479 from Darkness303/develop
Fix potential index overflows at large matrix sizes in the benchmark codes
Martin Kroeker [Tue, 3 Mar 2020 07:43:00 +0000 (08:43 +0100)]
Merge pull request #2436 from marxin/improve-utest-coverage
Improve test coverage for utests.
Martin Kroeker [Mon, 2 Mar 2020 20:21:29 +0000 (21:21 +0100)]
Merge pull request #2481 from ChinouneMehdi/fix2480
Fix #2480
Martin Kroeker [Mon, 2 Mar 2020 20:20:51 +0000 (21:20 +0100)]
Merge pull request #2478 from MacChen02/develop
Update benchmark statistical time function
مهدي شينون (Mehdi Chinoune) [Mon, 2 Mar 2020 16:22:28 +0000 (17:22 +0100)]
fixes #2480
Martin Liska [Wed, 19 Feb 2020 17:24:01 +0000 (18:24 +0100)]
Improve test coverage for utests.
jianghesong [Mon, 2 Mar 2020 11:13:45 +0000 (19:13 +0800)]
Fix core dump error
MacChen02 [Mon, 2 Mar 2020 06:36:27 +0000 (14:36 +0800)]
Update benchmark statistical time function
The function gettimeofday does not register any elapsed time when testing axpy with small data sizes.
Use clock_gettime instead of gettimeofday to measure the time.
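A minimal sketch of the timing approach described above, assuming a POSIX system. The helper names now_monotonic and elapsed_seconds are hypothetical and are not taken from the benchmark sources; CLOCK_MONOTONIC is also an assumption, since the commit message does not say which clock id was chosen.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: read a nanosecond-resolution timestamp.
   CLOCK_MONOTONIC is an assumption; the commit does not name a clock id. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);   /* clock_gettime returns 0 on success */
    (void)rc;
    return ts;
}

/* Hypothetical helper: seconds elapsed between two readings.
   tv_nsec may be smaller in stop than in start; the signed
   subtraction handles that without an explicit borrow. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec)
         + (double)(stop->tv_nsec - start->tv_nsec) * 1e-9;
}
```

A benchmark loop would call now_monotonic() before and after the routine under test and pass both readings to elapsed_seconds(). The struct timespec fields carry nanoseconds, whereas struct timeval from gettimeofday carries microseconds.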
Ali Saidi [Sat, 22 Feb 2020 05:31:07 +0000 (05:31 +0000)]
Switch blas_server to use acq/rel semantics
Heavy-weight locking isn't required to pass the work queue
pointer between threads and simple atomic acquire/release
semantics can be used instead. This is especially important as
pthread_mutex_lock() isn't fair.
We've observed substantial variation in runtime because of the
unfairness of these locks, which completely goes away with
this implementation.
The locks themselves are left to provide a portable way for
idling threads to sleep/wakeup after many unsuccessful iterations
waiting.
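The handoff described above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the blas_server code: the types work_item and worker_slot and the functions post_work and take_work are hypothetical names, and the sleep/wakeup path that the commit leaves to the existing locks is omitted.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; not OpenBLAS's actual queue type. */
typedef struct work_item {
    int routine_id;
    int payload;
} work_item;

/* Hypothetical per-worker slot; NULL means no work is posted. */
typedef struct worker_slot {
    _Atomic(work_item *) queue;
} worker_slot;

/* Producer side: the caller fills in *item first, then publishes the
   pointer with release ordering so those writes become visible to
   the thread that acquires the pointer. */
static void post_work(worker_slot *slot, work_item *item)
{
    atomic_store_explicit(&slot->queue, item, memory_order_release);
}

/* Consumer side: take the pointer with acquire ordering and leave
   NULL behind. Returns NULL if nothing was posted. */
static work_item *take_work(worker_slot *slot)
{
    return atomic_exchange_explicit(&slot->queue, (work_item *)NULL,
                                    memory_order_acquire);
}
```

A worker would poll take_work() for a bounded number of iterations and fall back to a mutex and condition variable to sleep only after repeated misses, which matches the role the commit message assigns to the remaining locks.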
Martin Kroeker [Sun, 1 Mar 2020 23:04:26 +0000 (00:04 +0100)]
Merge pull request #2475 from martin-frbg/039changes
Update ChangeLog for 0.3.9
Martin Kroeker [Sun, 1 Mar 2020 23:04:08 +0000 (00:04 +0100)]
Merge pull request #2474 from martin-frbg/p9be
Use POWER8 kernels on big-endian POWER9 for now
Martin Kroeker [Sun, 1 Mar 2020 23:02:36 +0000 (00:02 +0100)]
Add Ampere EMAG8180
Martin Kroeker [Sun, 1 Mar 2020 23:01:22 +0000 (00:01 +0100)]
Update with 0.3.9 changes
Martin Kroeker [Sun, 1 Mar 2020 22:45:58 +0000 (23:45 +0100)]
Use POWER8 kernels on big-endian POWER9 for now
Martin Kroeker [Sun, 1 Mar 2020 22:44:10 +0000 (23:44 +0100)]
Merge pull request #35 from xianyi/develop
rebase
Martin Kroeker [Sun, 1 Mar 2020 18:41:07 +0000 (19:41 +0100)]
Merge pull request #2471 from AGSaidi/l3-fix-2
Fix barriers in level3_thread
Martin Kroeker [Sun, 1 Mar 2020 18:40:46 +0000 (19:40 +0100)]
Merge pull request #2468 from AGSaidi/wfe
Use wait-for-event to not spin in the blas_lock
Martin Kroeker [Sun, 1 Mar 2020 12:02:34 +0000 (13:02 +0100)]
Merge pull request #2464 from Darkness303/develop
Add syr benchmark
Martin Kroeker [Sat, 29 Feb 2020 21:43:02 +0000 (22:43 +0100)]
Merge pull request #2467 from AGSaidi/rpcc
Make rpcc() on arm64 get closer to what x86 returns
Martin Kroeker [Sat, 29 Feb 2020 18:08:03 +0000 (19:08 +0100)]
Merge pull request #2463 from martin-frbg/mingwfix
Apply MinGW AVX512 compilation fix to fortran options as well
Martin Kroeker [Sat, 29 Feb 2020 18:07:35 +0000 (19:07 +0100)]
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
Ali Saidi [Sat, 29 Feb 2020 17:27:18 +0000 (17:27 +0000)]
Fix barriers in level3_thread
Martin Kroeker [Sat, 29 Feb 2020 12:24:44 +0000 (13:24 +0100)]
Merge pull request #2465 from AGSaidi/neoverse-n1
Add Neoverse-N1 core
Ali Saidi [Fri, 21 Feb 2020 23:43:43 +0000 (23:43 +0000)]
Use wait-for-event to not spin in the blas_lock
Ali Saidi [Sat, 22 Feb 2020 05:07:55 +0000 (05:07 +0000)]
Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer
which is defined by the SBSA to be between 10-400MHz. These numbers
are much smaller than the cycle counter frequency used by x86. Make
the numbers closer by shifting the cycle counter up by the number of
leading zeros in the cntfrq_el0 register which gets us closer to a
normal cpu clock cycle range.
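The scaling described above can be illustrated in portable C. This is a sketch, not the arm64 rpcc() itself: the real routine reads the counter and cntfrq_el0 from system registers, while the hypothetical helpers leading_zeros32 and scale_timer_count below take both values as plain arguments so the arithmetic can be checked on any machine.

```c
#include <stdint.h>

/* Count the leading zero bits of a 32-bit value (portable stand-in
   for a hardware count-leading-zeros instruction). */
static unsigned leading_zeros32(uint32_t v)
{
    unsigned n = 0;
    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: shift a raw timer count left by the number of
   leading zeros in the 32-bit timer frequency. */
static uint64_t scale_timer_count(uint64_t raw_count, uint32_t timer_hz)
{
    unsigned shift = leading_zeros32(timer_hz);
    if (shift >= 32)
        return raw_count;   /* frequency of 0: left unscaled (assumption) */
    return raw_count << shift;
}
```

For a 100 MHz timer the frequency has 5 leading zeros, so each tick is counted as 32 and the effective rate is 3.2 GHz. Because the shift always moves the top set bit of the frequency to bit 31, the effective rate lands between 2^31 and 2^32 ticks per second, roughly 2.1 to 4.3 GHz.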
Ali Saidi [Fri, 21 Feb 2020 22:46:58 +0000 (22:46 +0000)]
Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
j00520245 [Fri, 28 Feb 2020 08:36:53 +0000 (16:36 +0800)]
Add new syr benchmark
Martin Kroeker [Thu, 27 Feb 2020 22:09:40 +0000 (23:09 +0100)]
Apply MinGW AVX512 compilation fix to fortran options as well
The original issue was #1708; I see now that the same problem affects gfortran compilation. The underlying issue is said to be fixed (but not yet released) on all branches of gcc as of a few days ago, but it will certainly take time to reach mingw/msys.
wjc404 [Thu, 27 Feb 2020 14:26:15 +0000 (22:26 +0800)]
Update cgemm_kernel_8x2_haswell.c
wjc404 [Thu, 27 Feb 2020 14:25:19 +0000 (22:25 +0800)]
Update zgemm_kernel_4x2_haswell.c
Martin Kroeker [Thu, 27 Feb 2020 14:07:02 +0000 (15:07 +0100)]
Merge pull request #2447 from martin-frbg/issue2446
Always select ARMV8 parameters for big servers when cpu is TSV110 or EMAG8180
Martin Kroeker [Wed, 26 Feb 2020 21:19:57 +0000 (22:19 +0100)]
Always assume server-class cpu count for TSV110 and EMAG8180
Martin Kroeker [Wed, 26 Feb 2020 21:16:28 +0000 (22:16 +0100)]
Merge pull request #34 from xianyi/develop
rebase
wjc404 [Wed, 26 Feb 2020 10:38:12 +0000 (18:38 +0800)]
Update zgemm_kernel_4x2_haswell.c
wjc404 [Wed, 26 Feb 2020 10:36:54 +0000 (18:36 +0800)]
Update cgemm_kernel_8x2_haswell.c
Martin Kroeker [Tue, 25 Feb 2020 17:42:52 +0000 (18:42 +0100)]
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
Martin Kroeker [Tue, 25 Feb 2020 13:30:00 +0000 (14:30 +0100)]
Add Ampere EMAG8180
Martin Kroeker [Mon, 24 Feb 2020 19:23:18 +0000 (20:23 +0100)]
Add parameters for EMAG8180 DYNAMIC_ARCH support with cmake
Martin Kroeker [Mon, 24 Feb 2020 19:16:18 +0000 (20:16 +0100)]
Add EMAG8180 to arm64 DYNAMIC_ARCH list for cmake
Martin Kroeker [Mon, 24 Feb 2020 19:15:04 +0000 (20:15 +0100)]
Typo fix
Martin Kroeker [Mon, 24 Feb 2020 18:23:46 +0000 (19:23 +0100)]
Add EMAG8180 to DYNAMIC_CORE list for ARM64
Martin Kroeker [Mon, 24 Feb 2020 18:20:00 +0000 (19:20 +0100)]
Add DYNAMIC_ARCH support for ARMV8 EMAG8180
Martin Kroeker [Mon, 24 Feb 2020 12:14:51 +0000 (13:14 +0100)]
Merge pull request #2443 from aaawuanjun/develop
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
Martin Kroeker [Mon, 24 Feb 2020 11:27:01 +0000 (12:27 +0100)]
Merge pull request #2442 from martin-frbg/lapackpr390
Apply fix from Reference-LAPACK PR 390
wuanjun 00447568 [Mon, 24 Feb 2020 03:23:39 +0000 (11:23 +0800)]
[OpenBlas]:benchmark/copy.c has time,x,y data loop problems
Martin Kroeker [Sun, 23 Feb 2020 21:40:40 +0000 (22:40 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating
Martin Kroeker [Sun, 23 Feb 2020 21:39:01 +0000 (22:39 +0100)]
Apply fix from Reference-LAPACK PR390, NaN not propagating
wjc404 [Sat, 22 Feb 2020 15:40:02 +0000 (23:40 +0800)]
Add files via upload
wjc404 [Sat, 22 Feb 2020 15:39:43 +0000 (23:39 +0800)]
Update KERNEL.ZEN
wjc404 [Sat, 22 Feb 2020 15:39:20 +0000 (23:39 +0800)]
Update KERNEL.HASWELL
wjc404 [Sat, 22 Feb 2020 15:38:48 +0000 (23:38 +0800)]
Delete sgemm_kernel_8x4_haswell_2.c
wjc404 [Sat, 22 Feb 2020 15:37:45 +0000 (23:37 +0800)]
Fix performance bug when LDC is a multiple of 1024
Martin Kroeker [Sat, 22 Feb 2020 10:21:03 +0000 (11:21 +0100)]
Merge pull request #2441 from martin-frbg/ismin2
Add proper defaults for the IxMIN/IxMAX kernels on mips64 and power
Martin Kroeker [Fri, 21 Feb 2020 10:58:15 +0000 (11:58 +0100)]
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
Martin Kroeker [Fri, 21 Feb 2020 10:55:52 +0000 (11:55 +0100)]
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
Martin Kroeker [Fri, 21 Feb 2020 08:56:05 +0000 (09:56 +0100)]
Merge pull request #33 from xianyi/develop
rebase
Martin Kroeker [Thu, 20 Feb 2020 23:01:58 +0000 (00:01 +0100)]
Merge pull request #2435 from martin-frbg/issue2433
Fix handling of ppc endianness
Martin Kroeker [Wed, 19 Feb 2020 18:00:28 +0000 (19:00 +0100)]
Add preliminary support for EMAG8180
Martin Kroeker [Wed, 19 Feb 2020 17:57:26 +0000 (18:57 +0100)]
Add preliminary support for EMAG8180 ARMV8 processor
Martin Kroeker [Wed, 19 Feb 2020 17:49:13 +0000 (18:49 +0100)]
Recognize Ampere EMAG8180
Martin Kroeker [Wed, 19 Feb 2020 17:09:54 +0000 (18:09 +0100)]
Fix endianness conditionals
Martin Kroeker [Wed, 19 Feb 2020 17:08:20 +0000 (18:08 +0100)]
Get endianness into Makefile variable
Martin Kroeker [Wed, 19 Feb 2020 17:06:39 +0000 (18:06 +0100)]
Merge pull request #32 from xianyi/develop
rebase
Martin Kroeker [Wed, 19 Feb 2020 07:14:28 +0000 (08:14 +0100)]
Merge pull request #2432 from isuruf/install_name
Fix install name on osx again
Isuru Fernando [Tue, 18 Feb 2020 18:22:49 +0000 (10:22 -0800)]
Fix install name on osx again
Martin Kroeker [Tue, 18 Feb 2020 11:09:15 +0000 (12:09 +0100)]
Merge pull request #2426 from zbeekman/nightly-homebrew-check
Nightly homebrew check
Martin Kroeker [Tue, 18 Feb 2020 07:15:02 +0000 (08:15 +0100)]
Merge pull request #2427 from martin-frbg/powermin
Fix ISMIN and ISMAX kernel choices for POWER8
Martin Kroeker [Mon, 17 Feb 2020 18:55:39 +0000 (19:55 +0100)]
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
Izaak Beekman [Mon, 17 Feb 2020 18:32:33 +0000 (13:32 -0500)]
Fix bottle upload problem & typo
Izaak Beekman [Mon, 17 Feb 2020 18:12:50 +0000 (13:12 -0500)]
Test push & PRs only when workflow file changes
Also, add comments to clarify what the test is testing
Izaak Beekman [Mon, 17 Feb 2020 16:49:53 +0000 (11:49 -0500)]
Add Github Action to build development branch nightly with Homebrew
Martin Kroeker [Mon, 17 Feb 2020 16:00:08 +0000 (17:00 +0100)]
Merge pull request #2424 from isuruf/osx
Fix building on osx
Martin Kroeker [Mon, 17 Feb 2020 13:53:46 +0000 (14:53 +0100)]
Merge pull request #30 from xianyi/develop
rebase
Martin Kroeker [Mon, 17 Feb 2020 13:50:18 +0000 (14:50 +0100)]
Merge pull request #2414 from marxin/fix-iamax_sse-implementation
Fix iamax sse implementation and add utests
Martin Liska [Thu, 13 Feb 2020 13:42:45 +0000 (14:42 +0100)]
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
Martin Liska [Thu, 13 Feb 2020 13:32:24 +0000 (14:32 +0100)]
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
Martin Liska [Fri, 14 Feb 2020 09:35:51 +0000 (10:35 +0100)]
Add missing USE_MIN in kernel/CMakeLists.txt.
Martin Kroeker [Mon, 17 Feb 2020 06:24:02 +0000 (07:24 +0100)]
Merge pull request #2423 from xianyi/issue2419
Restore -march flag for Android builds
Isuru Fernando [Sun, 16 Feb 2020 21:11:40 +0000 (15:11 -0600)]
Pass CFLAGS from env to Makefile.prebuild and remove iOS hack
Martin Kroeker [Sun, 16 Feb 2020 16:32:13 +0000 (17:32 +0100)]
Restore -march flag for Android builds
fixes #2419 - renewed discussion in #2112 suggests removal of the option was primarily aimed at non-Android builds
Martin Kroeker [Sun, 16 Feb 2020 16:29:35 +0000 (17:29 +0100)]
Update KERNEL.POWER8
Martin Kroeker [Sun, 16 Feb 2020 16:28:10 +0000 (17:28 +0100)]
Merge pull request #29 from xianyi/develop
rebase
wjc404 [Sun, 16 Feb 2020 15:01:31 +0000 (23:01 +0800)]
Update param.h
wjc404 [Sun, 16 Feb 2020 14:58:44 +0000 (22:58 +0800)]
Update KERNEL.SKYLAKEX
wjc404 [Sun, 16 Feb 2020 14:58:00 +0000 (22:58 +0800)]
AVX512 STRMM kernel
Martin Kroeker [Sat, 15 Feb 2020 22:07:50 +0000 (23:07 +0100)]
Update KERNEL.POWER8
Martin Kroeker [Sat, 15 Feb 2020 22:06:51 +0000 (23:06 +0100)]
Update KERNEL.POWER8
Martin Kroeker [Sat, 15 Feb 2020 20:57:41 +0000 (21:57 +0100)]
Merge pull request #2417 from marxin/make-ctest-verbose-for-drone
Make ctest verbose for drone