Martin Kroeker [Thu, 16 Aug 2018 07:46:34 +0000 (09:46 +0200)]
Merge pull request #1738 from sharkcz/s390x
detect z14 arch on s390x
Dan Horák [Tue, 14 Aug 2018 10:30:38 +0000 (12:30 +0200)]
detect z14 arch on s390x
Martin Kroeker [Sun, 12 Aug 2018 22:01:37 +0000 (00:01 +0200)]
Merge pull request #1731 from fenrus75/readme
Add a short blurb about AVX512 and the required compiler to the README
Martin Kroeker [Sun, 12 Aug 2018 16:18:36 +0000 (18:18 +0200)]
Merge pull request #1733 from fenrus75/dsymv
Add an AVX512 enabled DSYMV (L) function
Martin Kroeker [Sun, 12 Aug 2018 16:17:42 +0000 (18:17 +0200)]
Merge pull request #1732 from fenrus75/dgemv
Add an AVX512 enabled DGEMV (n) function
Martin Kroeker [Sun, 12 Aug 2018 16:17:01 +0000 (18:17 +0200)]
Merge pull request #1730 from fenrus75/fix-sdot
Fix typo in sdot function
Martin Kroeker [Sun, 12 Aug 2018 16:16:45 +0000 (18:16 +0200)]
Merge pull request #1729 from fenrus75/dscal
Add an AVX512 enabled DSCAL function
Martin Kroeker [Sat, 11 Aug 2018 19:08:45 +0000 (21:08 +0200)]
Merge pull request #1723 from maamountki/develop
Disable zgemv scale in gemv benchmark by default
Arjan van de Ven [Sat, 11 Aug 2018 17:46:24 +0000 (17:46 +0000)]
Add an AVX512 enabled DSYMV (L) function
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
Arjan van de Ven [Sat, 11 Aug 2018 17:38:12 +0000 (17:38 +0000)]
Add an AVX512 enabled DGEMV (n) function
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
Arjan van de Ven [Sat, 11 Aug 2018 17:21:46 +0000 (17:21 +0000)]
Add a short blurb about AVX512 and the required compiler to the README
Arjan van de Ven [Sat, 11 Aug 2018 17:16:45 +0000 (17:16 +0000)]
Fix typo in sdot function
It looks like my previous pull request was missing the final commit;
this fixes a typo in sdot.
Arjan van de Ven [Sat, 11 Aug 2018 17:14:57 +0000 (17:14 +0000)]
Add an AVX512 enabled DSCAL function
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
Martin Kroeker [Sat, 11 Aug 2018 09:01:20 +0000 (11:01 +0200)]
Merge pull request #1725 from fenrus75/axpy
Add AVX512 enabled SAXPY/DAXPY functions
Martin Kroeker [Sat, 11 Aug 2018 09:00:56 +0000 (11:00 +0200)]
Merge pull request #1724 from fenrus75/sdot
Add an AVX512 enabled SDOT function
Martin Kroeker [Fri, 10 Aug 2018 11:24:36 +0000 (13:24 +0200)]
Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
Martin Kroeker [Fri, 10 Aug 2018 11:23:47 +0000 (13:23 +0200)]
Add changes from the 0.3.x releases
fixes #1727
Arjan van de Ven [Fri, 10 Aug 2018 02:58:32 +0000 (02:58 +0000)]
Add AVX512 enabled SAXPY/DAXPY functions
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
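A minimal sketch of the intrinsics-with-fallback pattern described above, applied to DAXPY (y += alpha * x). The name `daxpy_kernel` is hypothetical, only unit strides are handled, and the plain C loop merely stands in for the real Haswell AVX2 fallback; this is not the committed kernel.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical kernel: y[i] += alpha * x[i] for unit-stride vectors. */
static void daxpy_kernel(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
#if defined(__AVX512F__)
    /* Broadcast alpha once, then process 8 doubles per iteration
       with a fused multiply-add: y = alpha * x + y. */
    __m512d va = _mm512_set1_pd(alpha);
    for (; i + 8 <= n; i += 8) {
        __m512d vx = _mm512_loadu_pd(x + i);
        __m512d vy = _mm512_loadu_pd(y + i);
        _mm512_storeu_pd(y + i, _mm512_fmadd_pd(va, vx, vy));
    }
#endif
    /* Scalar tail; also the whole loop when AVX512 is not enabled. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Keeping the scalar tail outside the `#if` means the same function compiles and gives correct results with or without `-mavx512f`.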
Arjan van de Ven [Fri, 10 Aug 2018 02:31:48 +0000 (02:31 +0000)]
Add an AVX512 enabled SDOT function
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
maamountki [Thu, 9 Aug 2018 22:54:18 +0000 (01:54 +0300)]
Disable scal to benchmark zgemv separately by default
Martin Kroeker [Thu, 9 Aug 2018 13:39:06 +0000 (15:39 +0200)]
Merge pull request #1721 from fenrus75/ddot2
Add an AVX512 enabled DDOT function
Arjan van de Ven [Wed, 8 Aug 2018 02:59:11 +0000 (02:59 +0000)]
Add an AVX512 enabled DDOT function
Written in C intrinsics for readability
(the same C code also works on Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
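One way to express "falls back if the compiler is not new enough" is a preprocessor guard on the compiler version. The sketch below assumes thresholds of GCC 7 and Clang 6 purely for illustration (the commit does not state the exact versions), and `ddot_kernel` / `USE_AVX512_DDOT` are hypothetical names.

```c
#include <stddef.h>

/* Assumed thresholds, for illustration only: take the AVX512 path only
   when AVX512F is enabled and the compiler is assumed recent enough to
   provide _mm512_reduce_add_pd. */
#if defined(__AVX512F__) && \
    ((defined(__clang__) && __clang_major__ >= 6) || \
     (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 7))
#define USE_AVX512_DDOT 1
#include <immintrin.h>
#else
#define USE_AVX512_DDOT 0
#endif

/* Hypothetical kernel: dot product of two unit-stride vectors. */
static double ddot_kernel(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    size_t i = 0;
#if USE_AVX512_DDOT
    /* Accumulate 8 partial sums in one vector register. */
    __m512d acc = _mm512_setzero_pd();
    for (; i + 8 <= n; i += 8)
        acc = _mm512_fmadd_pd(_mm512_loadu_pd(x + i),
                              _mm512_loadu_pd(y + i), acc);
    sum = _mm512_reduce_add_pd(acc); /* horizontal sum of the 8 lanes */
#endif
    /* Remaining elements, or the whole vector on the fallback path. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```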
Martin Kroeker [Mon, 6 Aug 2018 20:05:47 +0000 (22:05 +0200)]
Merge pull request #1717 from martin-frbg/issue1708
Add workaround for avx512 compilations on Cygwin
Martin Kroeker [Mon, 6 Aug 2018 14:40:32 +0000 (16:40 +0200)]
Add workaround for avx512 compilations on Cygwin
fixes #1708
Martin Kroeker [Sun, 5 Aug 2018 20:48:44 +0000 (22:48 +0200)]
Merge pull request #1715 from stevengj/patch-1
fix blasabs for windows
Steven G. Johnson [Sun, 5 Aug 2018 12:18:51 +0000 (08:18 -0400)]
fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
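The reason `labs` is not enough on Windows is the data model: on LP64 Unix systems `long` is 64 bits, but on LLP64 Windows it is only 32 bits, so 64-bit integers need `llabs`. A simplified sketch of such a macro, assuming `INTERFACE64` and `_WIN64` select the integer width; the `blasint` typedefs here are illustrative, not copied from OpenBLAS headers.

```c
#include <stdlib.h>

#ifdef INTERFACE64
  #ifdef _WIN64
    /* LLP64: long is 32 bits, so 64-bit integers need llabs. */
    typedef long long blasint;
    #define blasabs(x) llabs(x)
  #else
    /* LP64: long is already 64 bits, so labs suffices. */
    typedef long blasint;
    #define blasabs(x) labs(x)
  #endif
#else
  /* 32-bit integer interface. */
  typedef int blasint;
  #define blasabs(x) abs(x)
#endif
```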
Martin Kroeker [Sat, 4 Aug 2018 21:51:31 +0000 (23:51 +0200)]
Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
Martin Kroeker [Sat, 4 Aug 2018 21:51:10 +0000 (23:51 +0200)]
Merge pull request #1709 from stevengj/patch-1
fabs -> fabsl
Martin Kroeker [Sat, 4 Aug 2018 18:14:51 +0000 (20:14 +0200)]
fabs -> fabsl
Martin Kroeker [Sat, 4 Aug 2018 18:07:59 +0000 (20:07 +0200)]
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
Martin Kroeker [Sat, 4 Aug 2018 18:06:49 +0000 (20:06 +0200)]
Use blasabs to switch between abs and labs as needed for INTERFACE64
Steven G. Johnson [Fri, 3 Aug 2018 17:00:10 +0000 (13:00 -0400)]
fabs -> fabsl
Fixes two calls that used `fabs` on a `long double` argument instead of `fabsl`, which appears to truncate the value to `double` precision unintentionally.
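A small illustration of the truncation: `fabs` takes a `double`, so a `long double` argument is rounded to `double` before the absolute value is taken. The two wrapper names are hypothetical.

```c
#include <math.h>
#include <float.h>

/* The long double argument is converted (rounded) to double here,
   so any extra precision is lost. */
static long double abs_via_fabs(long double v)  { return fabs(v); }

/* fabsl works on long double directly and keeps full precision. */
static long double abs_via_fabsl(long double v) { return fabsl(v); }
```

On x86-64 Linux, where `long double` is the 80-bit x87 format, `abs_via_fabs(-(1.0L + LDBL_EPSILON))` returns exactly 1.0 while `abs_via_fabsl` returns `1.0L + LDBL_EPSILON`. On platforms where `long double` and `double` are the same type (for example MSVC), both agree.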
Martin Kroeker [Thu, 2 Aug 2018 21:48:42 +0000 (23:48 +0200)]
Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
Martin Kroeker [Thu, 2 Aug 2018 20:27:00 +0000 (22:27 +0200)]
Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
Scott Thornton [Thu, 2 Aug 2018 19:58:52 +0000 (14:58 -0500)]
Added target_include_directories()
Zoltán Mizsei [Thu, 2 Aug 2018 18:49:14 +0000 (20:49 +0200)]
Haiku supporting patches
Martin Kroeker [Thu, 2 Aug 2018 16:53:34 +0000 (18:53 +0200)]
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
Craig Donner [Thu, 2 Aug 2018 15:21:19 +0000 (16:21 +0100)]
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
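The run_bench loop itself is not shown in this log. The sketch below, with hypothetical helper names, only illustrates the general idea: count the pages needed by rounding the allocation size up to a whole number of pages, and stop once that many have been found, instead of waiting for a byte total that never lands exactly on a size that is not a multiple of PAGESIZE.

```c
#include <stddef.h>

/* Pages needed to cover `size` bytes; a final partial page counts. */
static size_t pages_for(size_t size, size_t pagesize)
{
    return (size + pagesize - 1) / pagesize;
}

/* Hypothetical helper: record the start of each page backing a buffer
   and stop as soon as enough pages have been found. */
static size_t collect_pages(char *base, size_t size, size_t pagesize,
                            char **pages, size_t max_pages)
{
    size_t needed = pages_for(size, pagesize);
    size_t found = 0;
    while (found < needed && found < max_pages) {
        pages[found] = base + found * pagesize;
        found++;
    }
    return found;
}
```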
Scott Thornton [Mon, 30 Jul 2018 20:18:29 +0000 (15:18 -0500)]
Set EXPORT_NAME to match OpenBLASConfig.cmake
Martin Kroeker [Mon, 30 Jul 2018 06:23:13 +0000 (08:23 +0200)]
Set version to 0.3.3.dev
Martin Kroeker [Mon, 30 Jul 2018 06:22:38 +0000 (08:22 +0200)]
Set version to 0.3.3.dev
Martin Kroeker [Sun, 29 Jul 2018 20:37:09 +0000 (22:37 +0200)]
Merge branch 'release-0.3.0' into develop
Martin Kroeker [Wed, 25 Jul 2018 17:55:29 +0000 (19:55 +0200)]
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWP builds as cross-compiling
Martin Kroeker [Tue, 24 Jul 2018 15:46:33 +0000 (17:46 +0200)]
Do not treat Windows UWP builds as cross-compiling
Martin Kroeker [Sun, 22 Jul 2018 14:34:09 +0000 (16:34 +0200)]
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
Martin Kroeker [Sun, 22 Jul 2018 07:19:19 +0000 (09:19 +0200)]
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix a crash with multiple instances of OpenBLAS (#1692)
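The difference between clearing a temporary pointer and clearing the table entry can be shown with a hypothetical global table (`memory_table` and `release_slot` are illustrative names, not OpenBLAS internals). Setting only a local copy to NULL leaves the table holding a dangling pointer that a second shutdown could free again.

```c
#include <stdlib.h>

#define NUM_SLOTS 4

/* Hypothetical global table of buffers shared across the library. */
static void *memory_table[NUM_SLOTS];

/* Buggy variant (not shown as code): copy memory_table[i] into a local
   pointer, free it, and set only the local copy to NULL. The table
   entry keeps the stale address. */

/* Fixed variant: free the buffer and clear the table entry itself. */
static void release_slot(int i)
{
    free(memory_table[i]);
    memory_table[i] = NULL;
}
```

Because `free(NULL)` is a no-op, calling `release_slot` twice on the same slot is harmless once the entry itself is cleared.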
Martin Kroeker [Thu, 19 Jul 2018 17:03:45 +0000 (19:03 +0200)]
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
Martin Kroeker [Thu, 19 Jul 2018 06:57:56 +0000 (08:57 +0200)]
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
Martin Kroeker [Mon, 16 Jul 2018 20:47:05 +0000 (22:47 +0200)]
Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
Martin Kroeker [Mon, 16 Jul 2018 20:46:49 +0000 (22:46 +0200)]
Merge pull request #1684 from martin-frbg/issue1672
Work around utest failures in the MIPS64 SICORTEX target
Martin Kroeker [Mon, 16 Jul 2018 10:56:39 +0000 (12:56 +0200)]
typo fix
Martin Kroeker [Sun, 15 Jul 2018 15:11:40 +0000 (17:11 +0200)]
Fix precision problem in DSDOT
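The commit does not describe the precision problem. For reference, DSDOT takes single-precision vectors but is defined to accumulate the result in double precision; a minimal reference version (hypothetical name, unit strides only, not the fixed kernel) looks like this:

```c
#include <stddef.h>

/* Reference DSDOT: float inputs, each product and the running sum
   carried in double precision. */
static double dsdot_ref(size_t n, const float *x, const float *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i] * (double)y[i];
    return sum;
}
```

With `x = {1e8f, 1.0f, -1e8f}` and `y` all ones, a strictly single-precision accumulator loses the 1.0 (1e8 + 1 rounds back to 1e8 in `float`) and returns 0, while this version returns 1.0.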
Martin Kroeker [Sun, 15 Jul 2018 15:09:55 +0000 (17:09 +0200)]
Use C kernels for default c/zAXPY, xROT, c/zSWAP
Martin Kroeker [Thu, 12 Jul 2018 21:39:00 +0000 (23:39 +0200)]
Add cpu identification via mfpvr call for the BSDs
fixes #1671
Martin Kroeker [Thu, 12 Jul 2018 12:05:13 +0000 (14:05 +0200)]
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
Martin Kroeker [Thu, 12 Jul 2018 09:47:52 +0000 (11:47 +0200)]
Fix declaration of snprintf for older MSVC
_snprintf_s takes an additional (size) argument, so it is not a direct replacement.
(Note that this code is currently unused - the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC)
Martin Kroeker [Thu, 12 Jul 2018 09:42:25 +0000 (11:42 +0200)]
Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so it is not a direct replacement (utest/ctest.h, from which I copied the definition, was wrong).
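A sketch of a compatibility macro that accounts for the extra argument, assuming that "older MSVC" means versions before Visual Studio 2015 (`_MSC_VER` 1900), which first shipped a C99 `snprintf`. This is an illustration, not necessarily the definition that was committed.

```c
#include <stdio.h>

/* _snprintf_s(buffer, sizeOfBuffer, count, format, ...): the extra
   `count` argument is set to _TRUNCATE so that as much as fits is
   written and the result is always NUL-terminated. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf(buf, size, ...) \
    _snprintf_s((buf), (size), _TRUNCATE, __VA_ARGS__)
#endif
```

One remaining difference: on truncation `_snprintf_s` with `_TRUNCATE` returns -1, whereas C99 `snprintf` returns the length the full output would have had, so callers that inspect the return value may need care.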
Martin Kroeker [Thu, 12 Jul 2018 07:21:34 +0000 (09:21 +0200)]
Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
Martin Kroeker [Thu, 12 Jul 2018 05:30:58 +0000 (07:30 +0200)]
Define snprintf for older versions of MSVC
for #1677
Martin Kroeker [Wed, 4 Jul 2018 06:27:21 +0000 (08:27 +0200)]
Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
Martin Kroeker [Wed, 4 Jul 2018 06:19:40 +0000 (08:19 +0200)]
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
Martin Kroeker [Wed, 4 Jul 2018 06:19:11 +0000 (08:19 +0200)]
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
Martin Kroeker [Tue, 3 Jul 2018 19:42:28 +0000 (21:42 +0200)]
Revert "Rewrite &= -> = and simplify the initial blocking phase."
Martin Kroeker [Tue, 3 Jul 2018 19:03:24 +0000 (21:03 +0200)]
Add cpuid for AMD Ryzen 2
Martin Kroeker [Tue, 3 Jul 2018 19:01:35 +0000 (21:01 +0200)]
Add cpuid for AMD Ryzen 2
for #1664
Martin Kroeker [Tue, 3 Jul 2018 15:40:09 +0000 (17:40 +0200)]
Merge pull request #1662 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
Martin Kroeker [Tue, 3 Jul 2018 15:35:54 +0000 (17:35 +0200)]
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
Martin Kroeker [Tue, 3 Jul 2018 12:41:44 +0000 (14:41 +0200)]
Add -march=skylake-avx512 to AVX512 compile check and suppress its output
Martin Kroeker [Mon, 2 Jul 2018 15:48:19 +0000 (17:48 +0200)]
Merge pull request #1660 from martin-frbg/issue1659
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
Martin Kroeker [Mon, 2 Jul 2018 12:40:41 +0000 (14:40 +0200)]
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes #1659
Martin Kroeker [Sun, 1 Jul 2018 10:03:07 +0000 (12:03 +0200)]
Merge pull request #1657 from martin-frbg/release-0.3.0
Release 0.3.1
Martin Kroeker [Sun, 1 Jul 2018 10:01:51 +0000 (12:01 +0200)]
set version number to 0.3.2.dev
Martin Kroeker [Sun, 1 Jul 2018 10:01:16 +0000 (12:01 +0200)]
set version number to 0.3.2.dev
Martin Kroeker [Sun, 1 Jul 2018 09:59:47 +0000 (11:59 +0200)]
remove dev suffix from version number
Martin Kroeker [Sun, 1 Jul 2018 09:58:57 +0000 (11:58 +0200)]
remove dev suffix from version number
Martin Kroeker [Sun, 1 Jul 2018 09:56:40 +0000 (11:56 +0200)]
Merge pull request #1648 from martin-frbg/nofort
Handle NOFORTRAN=0
Martin Kroeker [Sun, 1 Jul 2018 09:55:21 +0000 (11:55 +0200)]
Merge pull request #1656 from xianyi/develop
Update the 0.3 branch from develop
Martin Kroeker [Sun, 1 Jul 2018 06:41:22 +0000 (08:41 +0200)]
Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
Martin Kroeker [Sat, 30 Jun 2018 23:17:03 +0000 (01:17 +0200)]
Merge pull request #1654 from martin-frbg/avx512check
Add compiler option to avx512 test and hide test output
Martin Kroeker [Sat, 30 Jun 2018 21:57:50 +0000 (23:57 +0200)]
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
Martin Kroeker [Sat, 30 Jun 2018 21:47:44 +0000 (23:47 +0200)]
Add compiler option to avx512 test and hide test output
Martin Kroeker [Sat, 30 Jun 2018 15:48:03 +0000 (17:48 +0200)]
Merge pull request #1651 from martin-frbg/avx512-nodgemm
Disable the 16x2 DTRMM kernel on SkylakeX as well
Martin Kroeker [Sat, 30 Jun 2018 15:31:06 +0000 (17:31 +0200)]
Disable the 16x2 DTRMM kernel on SkylakeX as well
Martin Kroeker [Sat, 30 Jun 2018 11:05:46 +0000 (13:05 +0200)]
Merge pull request #1650 from martin-frbg/avx512-nodgemm
Disable the AVX512 DGEMM kernel for now
Martin Kroeker [Sat, 30 Jun 2018 11:05:30 +0000 (13:05 +0200)]
Merge pull request #1639 from martin-frbg/dyn_list
Add DYNAMIC_LIST option for user-defined list of dynamic targets
Martin Kroeker [Sat, 30 Jun 2018 09:34:48 +0000 (11:34 +0200)]
Disable the AVX512 DGEMM kernel for now
due to #1643
Martin Kroeker [Tue, 26 Jun 2018 22:09:21 +0000 (00:09 +0200)]
Update Makefile
Martin Kroeker [Tue, 26 Jun 2018 22:07:32 +0000 (00:07 +0200)]
Merge branch 'develop' into nofort
Martin Kroeker [Tue, 26 Jun 2018 22:00:27 +0000 (00:00 +0200)]
Handle NOFORTRAN=0
Martin Kroeker [Tue, 26 Jun 2018 20:27:30 +0000 (22:27 +0200)]
Merge pull request #1647 from martin-frbg/armv7-dot
Remove premature exits from ARMV7 xdot codes
Martin Kroeker [Tue, 26 Jun 2018 18:46:42 +0000 (20:46 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:45:57 +0000 (20:45 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:45:00 +0000 (20:45 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:44:13 +0000 (20:44 +0200)]
Remove premature exit for INC_X or INC_Y zero
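In BLAS a zero increment is legal: the same element is reused on every iteration, so a dot kernel must not return early for INC_X or INC_Y equal to zero. The ARMV7 kernels are assembly; this C sketch (hypothetical name `dot_strided`) only illustrates the expected semantics, including reference-BLAS handling of negative increments, which start from the far end of the vector.

```c
/* Strided dot product with reference-BLAS increment semantics. */
static double dot_strided(long n, const double *x, long inc_x,
                          const double *y, long inc_y)
{
    double sum = 0.0;
    if (n <= 0)
        return sum;
    /* Negative increments begin at the last logical element. */
    long ix = (inc_x < 0) ? (1 - n) * inc_x : 0;
    long iy = (inc_y < 0) ? (1 - n) * inc_y : 0;
    for (long i = 0; i < n; i++) {
        sum += x[ix] * y[iy]; /* inc == 0 keeps reusing the same element */
        ix += inc_x;
        iy += inc_y;
    }
    return sum;
}
```

For example, with `x = {2, 5, 7}`, `y = {1, 2, 3}` and `inc_x = 0`, the result is 2 * (1 + 2 + 3) = 12, not 0 as an early exit would return.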
Martin Kroeker [Tue, 26 Jun 2018 08:15:15 +0000 (10:15 +0200)]
Merge pull request #1644 from martin-frbg/revert-filterout
Revert changes to NOFORTRAN handling in Makefile
Martin Kroeker [Tue, 26 Jun 2018 06:09:52 +0000 (08:09 +0200)]
Revert changes to NOFORTRAN handling from 952541e
Martin Kroeker [Mon, 25 Jun 2018 19:02:31 +0000 (21:02 +0200)]
Try gradual fallback for cores not in the dynamic core list
Martin Kroeker [Mon, 25 Jun 2018 18:48:10 +0000 (20:48 +0200)]
Merge pull request #2 from martin-frbg/develop
merge develop
Martin Kroeker [Mon, 25 Jun 2018 18:45:56 +0000 (20:45 +0200)]
Merge pull request #1 from xianyi/develop
Merge xianyi:develop into develop
Martin Kroeker [Mon, 25 Jun 2018 17:23:40 +0000 (19:23 +0200)]
Merge pull request #1642 from oon3m0oo/develop
Rewrite &= -> = and simplify the initial blocking phase.