Martin Kroeker [Fri, 27 Apr 2018 21:10:21 +0000 (23:10 +0200)]
Merge pull request #1539 from martin-frbg/ztrmv-1332
Disable multithreading in ztrmv
Martin Kroeker [Fri, 27 Apr 2018 21:09:57 +0000 (23:09 +0200)]
Merge pull request #1486 from martin-frbg/atomic
Use _Atomic instead of volatile for thread safety where C11 is supported
Martin Kroeker [Wed, 25 Apr 2018 21:23:00 +0000 (23:23 +0200)]
Merge pull request #1540 from martin-frbg/mips32-zasum
Fix typo in MIPS P5600 complex ASUM code selection
Martin Kroeker [Wed, 25 Apr 2018 20:50:10 +0000 (22:50 +0200)]
Fix typo in MIPS P5600 complex ASUM code selection
Martin Kroeker [Wed, 25 Apr 2018 20:35:46 +0000 (22:35 +0200)]
Disable multithreading in ztrmv
BLAS-Tester shows that the same problem exists as with DTRMV (issue #1332)
Martin Kroeker [Wed, 25 Apr 2018 06:38:58 +0000 (08:38 +0200)]
Merge pull request #1538 from martin-frbg/arm7utest
Fix handling of zero INCX, INCY in ArmV7 AXPY and ROT
Martin Kroeker [Tue, 24 Apr 2018 20:43:00 +0000 (22:43 +0200)]
Move the test for zero incx,incy in ARMV7 ROT
to pass the related utest (see #1469)
Martin Kroeker [Tue, 24 Apr 2018 20:39:50 +0000 (22:39 +0200)]
Drop test for zero incx,incy in armv7 AXPY
...to pass the related utest (see #1469)
Martin Kroeker [Mon, 23 Apr 2018 17:05:49 +0000 (19:05 +0200)]
Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (#1535)
* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
Martin Kroeker [Sun, 22 Apr 2018 21:34:17 +0000 (23:34 +0200)]
Merge pull request #1534 from xianyi/revert-1333-haswell32
Revert "Fix 32bit HASWELL builds"
Martin Kroeker [Sun, 22 Apr 2018 18:20:04 +0000 (20:20 +0200)]
Revert "Fix 32bit HASWELL builds"
Martin Kroeker [Fri, 20 Apr 2018 21:44:15 +0000 (23:44 +0200)]
Merge pull request #1532 from martin-frbg/utest-cblas
Do not try to build the fork utest when NO_CBLAS=1
Martin Kroeker [Fri, 20 Apr 2018 13:43:59 +0000 (15:43 +0200)]
fork utest depends on CBLAS
Martin Kroeker [Fri, 20 Apr 2018 13:42:13 +0000 (15:42 +0200)]
fork utest depends on CBLAS
Martin Kroeker [Thu, 19 Apr 2018 12:10:57 +0000 (14:10 +0200)]
Merge pull request #1530 from ashwinyes/develop_20180419_Tx2AutoDetect
ARM64: Enable Auto Detection of ThunderX2T99
Ashwin Sekhar T K [Thu, 19 Apr 2018 09:05:25 +0000 (09:05 +0000)]
ARM64: Enable Auto Detection of ThunderX2T99
Martin Kroeker [Sun, 15 Apr 2018 11:09:30 +0000 (13:09 +0200)]
Merge pull request #1523 from martin-frbg/utest_waith
Include sys/types.h for proper typedefs related to wait()
Martin Kroeker [Sat, 14 Apr 2018 20:24:34 +0000 (22:24 +0200)]
Merge pull request #1520 from martin-frbg/cpucounts
Catch invalid cpu count returned by CPU_COUNT_S
Martin Kroeker [Sat, 14 Apr 2018 16:59:46 +0000 (18:59 +0200)]
Include sys/types.h for proper typedefs related to wait()
Should fix #1519
Martin Kroeker [Sat, 14 Apr 2018 16:29:10 +0000 (18:29 +0200)]
Catch invalid cpu count returned by CPU_COUNT_S
On mips32 it was seen to return zero, driving nthreads to zero with a subsequent floating-point exception (FPE) in blas_quickdivide
Martin Kroeker [Wed, 11 Apr 2018 06:21:25 +0000 (08:21 +0200)]
Merge pull request #1515 from martin-frbg/mipsdot
Correct precision of mips dsdot
Martin Kroeker [Tue, 10 Apr 2018 21:30:59 +0000 (23:30 +0200)]
Fix precision of mips dsdot
Martin Kroeker [Sat, 7 Apr 2018 21:31:26 +0000 (23:31 +0200)]
Merge pull request #1512 from ararslan/aa/travis-macos-2
Add macOS to the Travis testing matrix: Take 2!
Alex Arslan [Sat, 7 Apr 2018 19:29:57 +0000 (12:29 -0700)]
Add a BINARY=32 build to macOS
Alex Arslan [Sat, 7 Apr 2018 17:56:34 +0000 (10:56 -0700)]
Add macOS to the Travis testing matrix
Martin Kroeker [Sat, 7 Apr 2018 11:29:31 +0000 (13:29 +0200)]
Merge pull request #1511 from xianyi/revert-1510-aa/travis-macos
Revert "Add macOS to the Travis testing matrix"
Martin Kroeker [Sat, 7 Apr 2018 11:27:24 +0000 (13:27 +0200)]
Revert "Add macOS to the Travis testing matrix"
Martin Kroeker [Sat, 7 Apr 2018 10:09:39 +0000 (12:09 +0200)]
Merge branch 'develop' into atomic
Martin Kroeker [Sat, 7 Apr 2018 10:07:12 +0000 (12:07 +0200)]
Merge pull request #1510 from ararslan/aa/travis-macos
Add macOS to the Travis testing matrix
Martin Kroeker [Sat, 7 Apr 2018 10:06:57 +0000 (12:06 +0200)]
Merge pull request #1509 from ararslan/aa/dragonfly
Add DragonFly to exports/Makefile
Alex Arslan [Sat, 7 Apr 2018 00:53:58 +0000 (17:53 -0700)]
Add macOS to the Travis testing matrix
Alex Arslan [Sat, 7 Apr 2018 00:30:10 +0000 (17:30 -0700)]
Add DragonFly to exports/Makefile
Its exclusion was an oversight on my part.
Martin Kroeker [Thu, 5 Apr 2018 21:46:36 +0000 (23:46 +0200)]
Merge pull request #1506 from martin-frbg/issue1497
Fix thread races and infinite looping on systems with many cpus
Martin Kroeker [Thu, 5 Apr 2018 06:54:07 +0000 (08:54 +0200)]
Merge pull request #1507 from martin-frbg/threads_usage
Underline importance of NUM_THREADS setting for BUFFER allocation
Martin Kroeker [Thu, 5 Apr 2018 06:53:38 +0000 (08:53 +0200)]
Merge pull request #1508 from ararslan/aa/wording
Minor changes to wording and formatting in the README
Alex Arslan [Wed, 4 Apr 2018 21:30:32 +0000 (14:30 -0700)]
Minor changes to wording and formatting in the README
The wording in some places is not grammatically correct. This change
also provides minor adjustments to the Markdown formatting which provide
modest improvements to readability.
Martin Kroeker [Wed, 4 Apr 2018 20:45:33 +0000 (22:45 +0200)]
Merge pull request #1505 from ararslan/aa/compiler
Compile with cc rather than gcc whenever possible
Martin Kroeker [Wed, 4 Apr 2018 20:40:30 +0000 (22:40 +0200)]
Remove unguarded use of _Atomic and fix tabbing
Martin Kroeker [Wed, 4 Apr 2018 20:26:51 +0000 (22:26 +0200)]
Underline importance of NUM_THREADS setting for BUFFER allocation
following augray's suggestion from #1451, and incorporating ashwinyes' comments from #1141 on the importance of NUM_THREADS even for single-threaded builds.
Alex Arslan [Wed, 4 Apr 2018 18:41:45 +0000 (11:41 -0700)]
Reinstate macOS logic
Alex Arslan [Tue, 3 Apr 2018 22:09:25 +0000 (15:09 -0700)]
Compile with cc rather than gcc whenever possible
Martin Kroeker [Wed, 4 Apr 2018 16:16:52 +0000 (18:16 +0200)]
Fix thread races and infinite looping on systems with many cpus
On systems with more than 64 cpus, blas_quickdivide will sometimes return zero, which creates bogus workloads when used for the stride calculation. This then leads to threads spinning endlessly while waiting for a status change that never happens, as seen in #1497.
This patch also fixes several data races that were found by helgrind and/or tsan while debugging the issue.
Martin Kroeker [Wed, 4 Apr 2018 13:26:46 +0000 (15:26 +0200)]
Merge pull request #1504 from ararslan/aa/openbsd
Allow building on OpenBSD
Martin Kroeker [Wed, 4 Apr 2018 13:26:21 +0000 (15:26 +0200)]
Merge pull request #1501 from martin-frbg/issue875
Add workaround for old gcc and clang versions
Alex Arslan [Tue, 3 Apr 2018 23:42:01 +0000 (16:42 -0700)]
Add OpenBSD and DragonFly to community supported platforms
Alex Arslan [Tue, 3 Apr 2018 23:39:29 +0000 (16:39 -0700)]
Add support for DragonFly BSD
Alex Arslan [Mon, 2 Apr 2018 17:48:22 +0000 (10:48 -0700)]
Allow building on OpenBSD
With this change, OpenBLAS builds and all tests pass on OpenBSD 6.2
using Clang. Tested on x86-64 only, with and without DYNAMIC_ARCH=1.
Martin Kroeker [Sat, 31 Mar 2018 20:32:06 +0000 (22:32 +0200)]
Update memory.c
Martin Kroeker [Thu, 29 Mar 2018 11:13:49 +0000 (13:13 +0200)]
Update memory.c
Martin Kroeker [Thu, 29 Mar 2018 09:56:56 +0000 (11:56 +0200)]
Add workaround for old gcc and clang versions
Old gcc and clang versions do not handle constructor arguments; this finally fixes #875 as discussed there, using the Fedora patch.
Martin Kroeker [Wed, 28 Mar 2018 07:15:34 +0000 (09:15 +0200)]
Merge pull request #1500 from martin-frbg/issue1474
Correct index variables used in MFlops calculation
Martin Kroeker [Tue, 27 Mar 2018 19:52:29 +0000 (21:52 +0200)]
Correct index variables used in MFlops calculation
Fixes #1474
Martin Kroeker [Tue, 27 Mar 2018 19:43:23 +0000 (21:43 +0200)]
Merge pull request #1499 from quickwritereader/develop
Implemented missing vsx simd kernels for power8 blas1/2 double. z13 modifications
Martin Kroeker [Tue, 27 Mar 2018 19:43:05 +0000 (21:43 +0200)]
Merge pull request #1491 from martin-frbg/ddot_mt
Add multithreading support for Haswell DDOT
QWR QWR [Wed, 7 Mar 2018 15:01:03 +0000 (10:01 -0500)]
power8:Added initial zgemv_(t|n) ,i(d|z)amax,i(d|z)amin,dgemv_t(transposed),zrot
z13: improved zgemv_(t|n)_4,zscal,zaxpy
Martin Kroeker [Mon, 19 Mar 2018 17:03:25 +0000 (18:03 +0100)]
Merge pull request #1495 from martin-frbg/aff
Disable CPU affinity by default again
Martin Kroeker [Mon, 19 Mar 2018 17:02:23 +0000 (18:02 +0100)]
Disable CPU affinity by default again
This setting must have been changed unintentionally by my PR #1214 (probably leftover from unrelated tests)
Martin Kroeker [Sat, 17 Mar 2018 14:26:47 +0000 (15:26 +0100)]
Merge pull request #1494 from martin-frbg/x86_dsdot
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
Martin Kroeker [Sat, 17 Mar 2018 12:49:15 +0000 (13:49 +0100)]
Use generic/dot.c instead of the inferior arm/dot.c for x86 DSDOT
to resolve dsdot utest failure seen in #1492
Martin Kroeker [Fri, 16 Mar 2018 21:23:36 +0000 (22:23 +0100)]
Declare dot_compute static to avoid conflicts in multiarch builds
Martin Kroeker [Fri, 16 Mar 2018 15:58:47 +0000 (16:58 +0100)]
Add multithreading support for Haswell DDOT
copied from ashwinyes' implementation in dot_thunderx2t99.c
Martin Kroeker [Fri, 9 Mar 2018 23:15:44 +0000 (00:15 +0100)]
Use _Atomic instead of volatile for thread safety where C11 is supported
Martin Kroeker [Fri, 9 Mar 2018 23:03:49 +0000 (00:03 +0100)]
Use _Atomic instead of volatile for thread safety where C11 is supported
Suggested by dodomorandi in #660
Martin Kroeker [Sun, 4 Mar 2018 21:21:18 +0000 (22:21 +0100)]
Merge pull request #1482 from martin-frbg/haswell_axpy
Re-enable DAXPY AVX microkernels for x86_64
Martin Kroeker [Sun, 4 Mar 2018 18:37:03 +0000 (19:37 +0100)]
Re-enable DAXPY microkernels for x86_64
as the inaccuracies seen in the original testcase for #1332 appear to be due to an artefact that amplifies the very small rounding differences between FMA and discrete multiply+add
Martin Kroeker [Sun, 4 Mar 2018 16:39:56 +0000 (17:39 +0100)]
Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
* Rewrite ROTMG following the new implementation in GONUM, which is based on the algorithm proposed by Tim Hopkins; see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
Martin Kroeker [Sat, 3 Mar 2018 21:43:56 +0000 (22:43 +0100)]
Merge pull request #1481 from martin-frbg/utest-fixup
Fix transposition of expected and computed values in error message
Martin Kroeker [Sat, 3 Mar 2018 17:01:51 +0000 (18:01 +0100)]
Fix transposition of expected and computed values in error message
Martin Kroeker [Wed, 28 Feb 2018 17:47:57 +0000 (18:47 +0100)]
Merge pull request #1476 from xsacha/patch-1
Fix CMake cross-compiling
Martin Kroeker [Wed, 28 Feb 2018 17:46:54 +0000 (18:46 +0100)]
Merge pull request #1477 from quickwritereader/develop
Power8 blas3 copy-pack routines
Martin Kroeker [Wed, 28 Feb 2018 17:40:31 +0000 (18:40 +0100)]
Merge pull request #1468 from martin-frbg/martin-frbg-patch-1
Limit the additional locking from PRs 1052,1299 to non-OpenMP cases
Sacha [Wed, 28 Feb 2018 00:25:25 +0000 (10:25 +1000)]
Fix CMake cross-compiling
Without specifying thread count, NUM_THREADS would not be defined and CMake would fail.
This is because core count cannot be determined when cross-compiling.
Martin Kroeker [Tue, 27 Feb 2018 13:04:16 +0000 (14:04 +0100)]
Merge pull request #1475 from ashwinyes/develop_20180227_utest_dsdot_fixes
ARM64: Fix utest dsdot errors
Ashwin Sekhar T K [Tue, 27 Feb 2018 10:47:55 +0000 (10:47 +0000)]
ARM64: Fix utest dsdot errors
Martin Kroeker [Tue, 27 Feb 2018 07:28:20 +0000 (08:28 +0100)]
Merge pull request #1473 from martin-frbg/p2align
Replace .align with .p2align in dscal.c and the Nehalem microkernels as well
Martin Kroeker [Mon, 26 Feb 2018 21:48:07 +0000 (22:48 +0100)]
Merge pull request #1472 from martin-frbg/utest-fixes
Fix limited DSDOT precision on arm,aarch64 and zarch
Martin Kroeker [Mon, 26 Feb 2018 19:58:33 +0000 (20:58 +0100)]
Replace .align with .p2align in the Nehalem microkernels
Martin Kroeker [Mon, 26 Feb 2018 19:48:03 +0000 (20:48 +0100)]
Convert .align to .p2align for OSX compatibility
Martin Kroeker [Mon, 26 Feb 2018 11:28:01 +0000 (12:28 +0100)]
Merge pull request #1471 from martin-frbg/p2align
Use .p2align instead of .align for portability on Haswell and Sandybridge
Martin Kroeker [Sun, 25 Feb 2018 18:57:23 +0000 (19:57 +0100)]
Use generic/dot.c for DSDOT on ARMV5 and above
The default arm/dot.c is less precise when used for DSDOT, as shown by utest
Martin Kroeker [Sun, 25 Feb 2018 18:52:14 +0000 (19:52 +0100)]
Use generic/dot.c for DSDOT on zarch
Martin Kroeker [Sun, 25 Feb 2018 18:51:25 +0000 (19:51 +0100)]
Use generic/dot.c for DSDOT on z13
The implementation in arm/dot.c has lower precision, as shown by the utest for dsdot.
Martin Kroeker [Sun, 25 Feb 2018 18:48:09 +0000 (19:48 +0100)]
Use dot.S also for DSDOT on CORTEXA57
Martin Kroeker [Sun, 25 Feb 2018 18:45:16 +0000 (19:45 +0100)]
Use dot.S also for DSDOT on ARMV8
Martin Kroeker [Sat, 24 Feb 2018 18:43:15 +0000 (19:43 +0100)]
Use .p2align instead of .align for compatibility on Sandybridge as well
Martin Kroeker [Sat, 24 Feb 2018 16:50:13 +0000 (17:50 +0100)]
Use .p2align instead of .align for portability
The OSX assembler apparently mishandles the decimal argument to .align, leading to a significant loss of performance
as observed in #730, #901 and most recently #1470
Martin Kroeker [Wed, 21 Feb 2018 10:45:33 +0000 (11:45 +0100)]
Limit the additional locking from PRs 1052,1299 to non-OpenMP multithreading
Martin Kroeker [Tue, 20 Feb 2018 16:17:38 +0000 (17:17 +0100)]
Merge pull request #1466 from xianyi/revert-1464-issue1461
Revert "Add locks only for non-OPENMP multithreading"
Martin Kroeker [Tue, 20 Feb 2018 16:17:12 +0000 (17:17 +0100)]
Revert "Add locks only for non-OPENMP multithreading"
Martin Kroeker [Tue, 20 Feb 2018 13:08:24 +0000 (14:08 +0100)]
Merge pull request #1464 from martin-frbg/issue1461
Add locks only for non-OPENMP multithreading
Martin Kroeker [Tue, 20 Feb 2018 11:17:18 +0000 (12:17 +0100)]
Add locks only for non-OPENMP multithreading
to mitigate performance problems caused by #1052 and #1299 as seen in #1461
Martin Kroeker [Tue, 20 Feb 2018 09:07:17 +0000 (10:07 +0100)]
Restore the remaining utests (#1462)
* Restore the remaining utests
* Try the fork test on Cygwin and Linux only; it hangs on at least ARMv8/Android as well
* Use generic sswap/dswap kernels for NEHALEM 32bit to fix fault found by the restored swap utest
* Disable zdotu test for MS cl to work around runtime error -1073741819 on AppVeyor for now
(probably a coding error in the initialization of the complex numbers or a wrong choice of zdotu API)
Martin Kroeker [Sun, 18 Feb 2018 14:12:42 +0000 (15:12 +0100)]
Merge pull request #1463 from martin-frbg/rotmg2
Fix wrong conditionals in scaling loops of rotmg and update BLAS1 tests from netlib
Martin Kroeker [Sun, 18 Feb 2018 14:12:20 +0000 (15:12 +0100)]
Merge pull request #1460 from martin-frbg/issue1425
Revert insidious suppression of the -fopenmp flag in the LAPACK subtree
Martin Kroeker [Sun, 18 Feb 2018 11:54:52 +0000 (12:54 +0100)]
Fix conditionals in the rescaling against GAMSQ
Martin Kroeker [Sun, 18 Feb 2018 11:44:14 +0000 (12:44 +0100)]
Update single and double precision BLAS1 tests from LAPACK 3.8.0
adding tests for SROTMG, SROTM, SDSDOT, DROTMG, DROTM, DSDOT
Martin Kroeker [Sun, 18 Feb 2018 11:37:09 +0000 (12:37 +0100)]
Fix condition in both second scaling loops
the mslm [Sun, 18 Feb 2018 00:16:38 +0000 (04:16 +0400)]
dgemm_ncopy_4_ save/restore
the mslm [Fri, 16 Feb 2018 05:56:08 +0000 (09:56 +0400)]
power8 ?gemm_tcopy save/restore
Martin Kroeker [Wed, 14 Feb 2018 21:58:14 +0000 (22:58 +0100)]
Make "OMP task depend" sections conditional on OpenMP4, not just OpenMP
To allow compiling with gcc versions older than 4.9