Martin Kroeker [Fri, 10 Aug 2018 11:24:36 +0000 (13:24 +0200)]
Merge pull request #1728 from martin-frbg/changelog
Add changes from the 0.3.x releases
Martin Kroeker [Fri, 10 Aug 2018 11:23:47 +0000 (13:23 +0200)]
Add changes from the 0.3.x releases
fixes #1727
Martin Kroeker [Thu, 9 Aug 2018 13:39:06 +0000 (15:39 +0200)]
Merge pull request #1721 from fenrus75/ddot2
Add an AVX512 enabled DDOT function
Arjan van de Ven [Wed, 8 Aug 2018 02:59:11 +0000 (02:59 +0000)]
Add an AVX512 enabled DDOT function
Written in C intrinsics for readability
(the same C code also works for Haswell).
For logistical reasons, the code falls back to the existing
Haswell AVX2 implementation if the GCC or LLVM compiler is not new enough.
Martin Kroeker [Mon, 6 Aug 2018 20:05:47 +0000 (22:05 +0200)]
Merge pull request #1717 from martin-frbg/issue1708
Add workaround for avx512 compilations on Cygwin
Martin Kroeker [Mon, 6 Aug 2018 14:40:32 +0000 (16:40 +0200)]
Add workaround for avx512 compilations on Cygwin
fixes #1708
Martin Kroeker [Sun, 5 Aug 2018 20:48:44 +0000 (22:48 +0200)]
Merge pull request #1715 from stevengj/patch-1
fix blasabs for windows
Steven G. Johnson [Sun, 5 Aug 2018 12:18:51 +0000 (08:18 -0400)]
fix blasabs for windows
Bugfix in #1713 for Windows (LLP64), where `blasabs` needs to be `llabs` rather than `labs` for the 64-bit API.
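The LLP64 point above can be illustrated with a minimal sketch. This is not the OpenBLAS source: `USE64BITINT_DEMO`, `blasint_demo`, `BLASABS_DEMO` and `abs_stride` are hypothetical names chosen for illustration. On LLP64 Windows `long` is only 32 bits wide, so `labs` cannot hold a 64-bit index, whereas `llabs` operates on `long long`, which is at least 64 bits everywhere.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the 64-bit interface build switch. */
#define USE64BITINT_DEMO 1

#if USE64BITINT_DEMO
typedef int64_t blasint_demo;
/* llabs takes long long, which is at least 64 bits on every platform,
   including LLP64 Windows where long is only 32 bits. */
#define BLASABS_DEMO(x) ((blasint_demo)llabs((long long)(x)))
#else
typedef int blasint_demo;
#define BLASABS_DEMO(x) abs(x)
#endif

/* Absolute value of a stride, as a BLAS-style routine might need it. */
blasint_demo abs_stride(blasint_demo inc) {
    return BLASABS_DEMO(inc);
}
```

With the 64-bit switch enabled, a stride such as -5000000000 keeps its full magnitude; routed through a 32-bit `labs` it would not fit.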
Martin Kroeker [Sat, 4 Aug 2018 21:51:31 +0000 (23:51 +0200)]
Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
Martin Kroeker [Sat, 4 Aug 2018 21:51:10 +0000 (23:51 +0200)]
Merge pull request #1709 from stevengj/patch-1
fabs -> fabsl
Martin Kroeker [Sat, 4 Aug 2018 18:14:51 +0000 (20:14 +0200)]
fabs -> fabsl
Martin Kroeker [Sat, 4 Aug 2018 18:07:59 +0000 (20:07 +0200)]
Introduce blasabs() to switch between abs() and labs() for INTERFACE64
Martin Kroeker [Sat, 4 Aug 2018 18:06:49 +0000 (20:06 +0200)]
Use blasabs to switch between abs and labs as needed for INTERFACE64
Steven G. Johnson [Fri, 3 Aug 2018 17:00:10 +0000 (13:00 -0400)]
fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
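A small sketch of the difference described above, assuming plain C without `<tgmath.h>`. The function names are hypothetical and do not come from the patched files. `fabs` is declared for `double`, so a `long double` argument is converted to `double` before the call. On platforms where `long double` is the same type as `double`, the two functions behave identically.

```c
#include <math.h>

/* Magnitude computed in full long double precision. */
long double magnitude_ld(long double x) {
    return fabsl(x);
}

/* Same computation through fabs: the argument is first converted to
   double, so any extra long double precision is dropped. */
long double magnitude_truncated(long double x) {
    return fabs(x);
}
```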
Martin Kroeker [Thu, 2 Aug 2018 21:48:42 +0000 (23:48 +0200)]
Merge pull request #1703 from wsttiger/cmake_fix
Set EXPORT_NAME to match OpenBLASConfig.cmake
Martin Kroeker [Thu, 2 Aug 2018 20:27:00 +0000 (22:27 +0200)]
Merge pull request #1707 from extrowerk/haiku_support
Haiku supporting patches
Scott Thornton [Thu, 2 Aug 2018 19:58:52 +0000 (14:58 -0500)]
Added target_include_directories()
Zoltán Mizsei [Thu, 2 Aug 2018 18:49:14 +0000 (20:49 +0200)]
Haiku supporting patches
Martin Kroeker [Thu, 2 Aug 2018 16:53:34 +0000 (18:53 +0200)]
Merge pull request #1706 from oon3m0oo/develop
Fix #1705 where we incorrectly calculate page locations.
Craig Donner [Thu, 2 Aug 2018 15:21:19 +0000 (16:21 +0100)]
Fix #1705 where we incorrectly calculate page locations.
Since we now use an allocation size that isn't a multiple of PAGESIZE, finding
the pages for run_bench wasn't terminating properly. Now we detect if we've
found enough pages for the allocation and terminate the loop.
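The termination condition described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the actual patch: `pages_needed` and `collect_page_offsets` are hypothetical helpers, and the real code works on addresses inside the benchmark buffer rather than plain offsets. The point is that the loop counts the pages it has found instead of waiting for a remainder to reach exactly zero, which never happens when the size is not a multiple of the page size.

```c
#include <stddef.h>

/* Number of pages needed to cover `size` bytes, rounding up. */
size_t pages_needed(size_t size, size_t page_size) {
    return (size + page_size - 1) / page_size;
}

/* Record the start offset of each page covering the allocation and stop
   as soon as enough pages have been found (or the output array is full).
   Returns the number of offsets written. */
size_t collect_page_offsets(size_t size, size_t page_size,
                            size_t *offsets, size_t max_offsets) {
    size_t needed = pages_needed(size, page_size);
    size_t found = 0;
    for (size_t off = 0; found < needed && found < max_offsets;
         off += page_size) {
        offsets[found++] = off;
    }
    return found;
}
```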
Scott Thornton [Mon, 30 Jul 2018 20:18:29 +0000 (15:18 -0500)]
Set EXPORT_NAME to match OpenBLASConfig.cmake
Martin Kroeker [Mon, 30 Jul 2018 06:23:13 +0000 (08:23 +0200)]
Set version to 0.3.3.dev
Martin Kroeker [Mon, 30 Jul 2018 06:22:38 +0000 (08:22 +0200)]
Set version to 0.3.3.dev
Martin Kroeker [Sun, 29 Jul 2018 20:37:09 +0000 (22:37 +0200)]
Merge branch 'release-0.3.0' into develop
Martin Kroeker [Wed, 25 Jul 2018 17:55:29 +0000 (19:55 +0200)]
Merge pull request #1697 from martin-frbg/issue1696
Do not treat Windows UWP builds as cross-compiling
Martin Kroeker [Tue, 24 Jul 2018 15:46:33 +0000 (17:46 +0200)]
Do not treat Windows UWP builds as cross-compiling
Martin Kroeker [Sun, 22 Jul 2018 14:34:09 +0000 (16:34 +0200)]
Merge pull request #1695 from martin-frbg/issue1692
Unset memory table entry, not just the local pointer to it on shutdown
Martin Kroeker [Sun, 22 Jul 2018 07:19:19 +0000 (09:19 +0200)]
Unset memory table entry, not just the temporary pointer to it on shutdown
to fix crash with multiple instances of OpenBLAS, #1692
Martin Kroeker [Thu, 19 Jul 2018 17:03:45 +0000 (19:03 +0200)]
Merge pull request #1688 from martin-frbg/issue1673
Temporarily disable special handling of OPENMP thread memory allocation
Martin Kroeker [Thu, 19 Jul 2018 06:57:56 +0000 (08:57 +0200)]
Temporarily disable special handling of OPENMP thread memory allocation
for issue #1673
Martin Kroeker [Mon, 16 Jul 2018 20:47:05 +0000 (22:47 +0200)]
Merge pull request #1681 from martin-frbg/issue1671
Add cpu identification via mfpvr call for the BSDs
Martin Kroeker [Mon, 16 Jul 2018 20:46:49 +0000 (22:46 +0200)]
Merge pull request #1684 from martin-frbg/issue1672
Work around utest failures in the MIPS64 SICORTEX target
Martin Kroeker [Mon, 16 Jul 2018 10:56:39 +0000 (12:56 +0200)]
typo fix
Martin Kroeker [Sun, 15 Jul 2018 15:11:40 +0000 (17:11 +0200)]
Fix precision problem in DSDOT
Martin Kroeker [Sun, 15 Jul 2018 15:09:55 +0000 (17:09 +0200)]
Use C kernels for default c/zAXPY, xROT, c/zSWAP
Martin Kroeker [Thu, 12 Jul 2018 21:39:00 +0000 (23:39 +0200)]
Add cpu identification via mfpvr call for the BSDs
fixes #1671
Martin Kroeker [Thu, 12 Jul 2018 12:05:13 +0000 (14:05 +0200)]
Merge pull request #1680 from martin-frbg/snprint
Fix wrong redefinitions of snprintf for older MSVC
Martin Kroeker [Thu, 12 Jul 2018 09:47:52 +0000 (11:47 +0200)]
Fix declaration of snprintf for older MSVC
_snprintf_s takes an additional (size) argument, so it is not a direct replacement.
(Note that this code is currently unused - the two instances of snprintf here are within ifdef blocks that are not compiled for MSVC)
Martin Kroeker [Thu, 12 Jul 2018 09:42:25 +0000 (11:42 +0200)]
Fix definition of snprintf for MSVC
MS _snprintf_s takes an additional argument for the size of the buffer, so is not a direct replacement (utest/ctest.h from which I copied was wrong)
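To make the argument mismatch concrete, here is a hedged sketch of one possible shim. It is not the definition that was committed: `demo_snprintf` and `format_version` are hypothetical names, and the `_MSC_VER < 1900` cutoff assumes that C99 `snprintf` first shipped with Visual Studio 2015. `_snprintf_s` expects both the buffer size and a maximum character count, so a plain `#define snprintf _snprintf_s` would shift every argument by one position.

```c
#include <stdio.h>

#if defined(_MSC_VER) && _MSC_VER < 1900
/* Older MSVC has no C99 snprintf. _snprintf_s takes the buffer size AND a
   maximum character count, so the arguments cannot be passed through as-is. */
#define demo_snprintf(buf, size, ...) \
    _snprintf_s((buf), (size), _TRUNCATE, __VA_ARGS__)
#else
#define demo_snprintf snprintf
#endif

/* Format a version string into buf; returns whatever the underlying call
   returns. */
int format_version(char *buf, size_t size, int major, int minor, int patch) {
    return demo_snprintf(buf, size, "%d.%d.%d", major, minor, patch);
}
```

Note that the return values still differ on truncation: C99 `snprintf` reports the length that would have been written, while `_snprintf_s` with `_TRUNCATE` returns -1.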
Martin Kroeker [Thu, 12 Jul 2018 07:21:34 +0000 (09:21 +0200)]
Merge pull request #1678 from martin-frbg/issue1677
Define snprintf for older versions of MSVC
Martin Kroeker [Thu, 12 Jul 2018 05:30:58 +0000 (07:30 +0200)]
Define snprintf for older versions of MSVC
for #1677
Martin Kroeker [Wed, 4 Jul 2018 06:27:21 +0000 (08:27 +0200)]
Merge pull request #1667 from xianyi/revert-1642-develop
Revert "Rewrite &= -> = and simplify the initial blocking phase."
Martin Kroeker [Wed, 4 Jul 2018 06:19:40 +0000 (08:19 +0200)]
Merge pull request #1665 from martin-frbg/cpuid-ryzen2
Add cpuid for AMD Ryzen 2
Martin Kroeker [Wed, 4 Jul 2018 06:19:11 +0000 (08:19 +0200)]
Merge pull request #1663 from martin-frbg/issue1641
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
Martin Kroeker [Tue, 3 Jul 2018 19:42:28 +0000 (21:42 +0200)]
Revert "Rewrite &= -> = and simplify the initial blocking phase."
Martin Kroeker [Tue, 3 Jul 2018 19:03:24 +0000 (21:03 +0200)]
Add cpuid for AMD Ryzen 2
Martin Kroeker [Tue, 3 Jul 2018 19:01:35 +0000 (21:01 +0200)]
Add cpuid for AMD Ryzen 2
for #1664
Martin Kroeker [Tue, 3 Jul 2018 15:40:09 +0000 (17:40 +0200)]
Merge pull request #1662 from martin-frbg/cmake-avx512
Add -march=skylake-avx512 to AVX512 compile check and suppress its ou…
Martin Kroeker [Tue, 3 Jul 2018 15:35:54 +0000 (17:35 +0200)]
Double MAX_ALLOCATING_THREADS to fix segfaults with Go and Octave
for #1641
Martin Kroeker [Tue, 3 Jul 2018 12:41:44 +0000 (14:41 +0200)]
Add -march=skylake-avx512 to AVX512 compile check and suppress its output
Martin Kroeker [Mon, 2 Jul 2018 15:48:19 +0000 (17:48 +0200)]
Merge pull request #1660 from martin-frbg/issue1659
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
Martin Kroeker [Mon, 2 Jul 2018 12:40:41 +0000 (14:40 +0200)]
Fix typo that broke compilation with DYNAMIC_ARCH and NO_AVX2
fixes #1659
Martin Kroeker [Sun, 1 Jul 2018 10:03:07 +0000 (12:03 +0200)]
Merge pull request #1657 from martin-frbg/release-0.3.0
Release 0.3.1
Martin Kroeker [Sun, 1 Jul 2018 10:01:51 +0000 (12:01 +0200)]
set version number to 0.3.2.dev
Martin Kroeker [Sun, 1 Jul 2018 10:01:16 +0000 (12:01 +0200)]
set version number to 0.3.2.dev
Martin Kroeker [Sun, 1 Jul 2018 09:59:47 +0000 (11:59 +0200)]
remove dev suffix from version number
Martin Kroeker [Sun, 1 Jul 2018 09:58:57 +0000 (11:58 +0200)]
remove dev suffix from version number
Martin Kroeker [Sun, 1 Jul 2018 09:56:40 +0000 (11:56 +0200)]
Merge pull request #1648 from martin-frbg/nofort
Handle NOFORTRAN=0
Martin Kroeker [Sun, 1 Jul 2018 09:55:21 +0000 (11:55 +0200)]
Merge pull request #1656 from xianyi/develop
Update the 0.3 branch from develop
Martin Kroeker [Sun, 1 Jul 2018 06:41:22 +0000 (08:41 +0200)]
Merge pull request #1655 from martin-frbg/issue1641
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
Martin Kroeker [Sat, 30 Jun 2018 23:17:03 +0000 (01:17 +0200)]
Merge pull request #1654 from martin-frbg/avx512check
Add compiler option to avx512 test and hide test output
Martin Kroeker [Sat, 30 Jun 2018 21:57:50 +0000 (23:57 +0200)]
Fix apparent off-by-one error in calculation of MAX_ALLOCATING_THREADS
fixes #1641
Martin Kroeker [Sat, 30 Jun 2018 21:47:44 +0000 (23:47 +0200)]
Add compiler option to avx512 test and hide test output
Martin Kroeker [Sat, 30 Jun 2018 15:48:03 +0000 (17:48 +0200)]
Merge pull request #1651 from martin-frbg/avx512-nodgemm
Disable the 16x2 DTRMM kernel on SkylakeX as well
Martin Kroeker [Sat, 30 Jun 2018 15:31:06 +0000 (17:31 +0200)]
Disable the 16x2 DTRMM kernel on SkylakeX as well
Martin Kroeker [Sat, 30 Jun 2018 11:05:46 +0000 (13:05 +0200)]
Merge pull request #1650 from martin-frbg/avx512-nodgemm
Disable the AVX512 DGEMM kernel for now
Martin Kroeker [Sat, 30 Jun 2018 11:05:30 +0000 (13:05 +0200)]
Merge pull request #1639 from martin-frbg/dyn_list
Add DYNAMIC_LIST option for user-defined list of dynamic targets
Martin Kroeker [Sat, 30 Jun 2018 09:34:48 +0000 (11:34 +0200)]
Disable the AVX512 DGEMM kernel for now
due to #1643
Martin Kroeker [Tue, 26 Jun 2018 22:09:21 +0000 (00:09 +0200)]
Update Makefile
Martin Kroeker [Tue, 26 Jun 2018 22:07:32 +0000 (00:07 +0200)]
Merge branch 'develop' into nofort
Martin Kroeker [Tue, 26 Jun 2018 22:00:27 +0000 (00:00 +0200)]
Handle NOFORTRAN=0
Martin Kroeker [Tue, 26 Jun 2018 20:27:30 +0000 (22:27 +0200)]
Merge pull request #1647 from martin-frbg/armv7-dot
Remove premature exits from ARMV7 xdot codes
Martin Kroeker [Tue, 26 Jun 2018 18:46:42 +0000 (20:46 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:45:57 +0000 (20:45 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:45:00 +0000 (20:45 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 18:44:13 +0000 (20:44 +0200)]
Remove premature exit for INC_X or INC_Y zero
Martin Kroeker [Tue, 26 Jun 2018 08:15:15 +0000 (10:15 +0200)]
Merge pull request #1644 from martin-frbg/revert-filterout
Revert changes to NOFORTRAN handling in Makefile
Martin Kroeker [Tue, 26 Jun 2018 06:09:52 +0000 (08:09 +0200)]
Revert changes to NOFORTRAN handling from 952541e
Martin Kroeker [Mon, 25 Jun 2018 19:02:31 +0000 (21:02 +0200)]
Try gradual fallback for cores not in the dynamic core list
Martin Kroeker [Mon, 25 Jun 2018 18:48:10 +0000 (20:48 +0200)]
Merge pull request #2 from martin-frbg/develop
merge develop
Martin Kroeker [Mon, 25 Jun 2018 18:45:56 +0000 (20:45 +0200)]
Merge pull request #1 from xianyi/develop
Merge xianyi:develop into develop
Martin Kroeker [Mon, 25 Jun 2018 17:23:40 +0000 (19:23 +0200)]
Merge pull request #1642 from oon3m0oo/develop
Rewrite &= -> = and simplify the initial blocking phase.
Craig Donner [Mon, 25 Jun 2018 12:53:11 +0000 (13:53 +0100)]
Rewrite &= -> = and simplify the initial blocking phase.
Martin Kroeker [Sat, 23 Jun 2018 17:42:15 +0000 (19:42 +0200)]
Add support for a user-defined list of dynamic targets
Martin Kroeker [Sat, 23 Jun 2018 17:41:32 +0000 (19:41 +0200)]
Add support for a user-defined list of dynamic targets
Martin Kroeker [Sat, 23 Jun 2018 13:01:02 +0000 (15:01 +0200)]
Merge pull request #1638 from martin-frbg/issue1637
Expose the CBLAS interface to the IxAMIN functions and have make build it
Martin Kroeker [Sat, 23 Jun 2018 11:31:09 +0000 (13:31 +0200)]
Expose CBLAS interface to BLAS extensions iXamin
Martin Kroeker [Sat, 23 Jun 2018 11:27:30 +0000 (13:27 +0200)]
Build cblas_iXamin interfaces
Martin Kroeker [Thu, 21 Jun 2018 19:01:03 +0000 (21:01 +0200)]
Merge pull request #1634 from oon3m0oo/develop
Fix data races reported by TSAN.
oon3m0oo [Thu, 21 Jun 2018 16:47:45 +0000 (17:47 +0100)]
Use BLAS rather than CBLAS in test_fork.c (#1626)
This is handy for people not using lapack.
Craig Donner [Thu, 21 Jun 2018 10:13:57 +0000 (11:13 +0100)]
Fix data races reported by TSAN.
oon3m0oo [Wed, 20 Jun 2018 20:04:03 +0000 (21:04 +0100)]
Further improvements to memory.c. (#1625)
- Compiler TLS is now used only when the compiler supports it
- If compiler TLS is unsupported, we use platform-specific TLS
- Only one variable (an index) is now in TLS
- We only access TLS once per alloc, and never when freeing
- Allocation / release info is now stored within the allocation itself, by
over-allocating; this saves having external structures do the bookkeeping, and
reduces some of the redundant data that was being stored (such as addresses)
- We never hit the alloc lock when not using SMP or when using OpenMP (that was
my fault)
- Now that there are fewer tracking structures I think this is a bit easier to
read than before
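The over-allocation idea in the list above can be sketched as follows. This is a generic illustration of storing bookkeeping inside the allocation, not the layout used in memory.c: `demo_header`, `demo_alloc`, `demo_size` and `demo_free` are hypothetical, and `malloc`/`free` stand in for whatever backing allocator is really used.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record stored in front of the user's memory. */
typedef struct {
    size_t size;                 /* bytes requested by the caller */
    void (*release)(void *base); /* how to free the underlying block */
} demo_header;

/* Over-allocate by sizeof(demo_header) and hand out the memory after it. */
void *demo_alloc(size_t size) {
    demo_header *h = malloc(sizeof(demo_header) + size);
    if (h == NULL) return NULL;
    h->size = size;
    h->release = free;
    return h + 1;
}

/* Recover the header from the user pointer; no external table lookup. */
size_t demo_size(const void *p) {
    return ((const demo_header *)p - 1)->size;
}

/* Release the whole block, header included. */
void demo_free(void *p) {
    if (p == NULL) return;
    demo_header *h = (demo_header *)p - 1;
    h->release(h);
}
```

Because the header travels with the block, freeing needs neither a lookup table nor thread-local state. A production version would also pad the header so that the returned pointer keeps the alignment the kernels require.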
Martin Kroeker [Wed, 20 Jun 2018 19:51:57 +0000 (21:51 +0200)]
Merge pull request #1630 from martin-frbg/x86-march
Add -march=skylake-avx512 to flags if target is skylake x
Martin Kroeker [Wed, 20 Jun 2018 19:51:38 +0000 (21:51 +0200)]
Merge pull request #1631 from oon3m0oo/stack
Avoid declaring arrays of size 0 when making large stack allocations.
Craig Donner [Wed, 20 Jun 2018 16:03:18 +0000 (17:03 +0100)]
Avoid declaring arrays of size 0 when making large stack allocations.
Martin Kroeker [Wed, 20 Jun 2018 14:41:13 +0000 (16:41 +0200)]
Merge pull request #1629 from martin-frbg/issue1628
Make gfortran link libomp for clang in the tests; avoid two typical gotchas with NOFORTRAN
Martin Kroeker [Wed, 20 Jun 2018 13:16:19 +0000 (15:16 +0200)]
Add -march=skylake-avx512 to flags if target is skylake x
Martin Kroeker [Wed, 20 Jun 2018 11:20:30 +0000 (13:20 +0200)]
Need to use filter-out to handle NOFORTRAN not set
Martin Kroeker [Tue, 19 Jun 2018 21:28:06 +0000 (23:28 +0200)]
Modify NOFORTRAN tests to always check the value; fix rewriting of NO_FORTRAN
Martin Kroeker [Tue, 19 Jun 2018 18:53:19 +0000 (20:53 +0200)]
Handle erroneous user settings NOFORTRAN=0 and NO_FORTRAN