Martin Kroeker [Sun, 11 Aug 2019 21:31:36 +0000 (23:31 +0200)]
Update with changes from 0.3.7
Martin Kroeker [Sun, 11 Aug 2019 21:28:47 +0000 (23:28 +0200)]
Increment version to 0.3.8.dev
Martin Kroeker [Sun, 11 Aug 2019 21:28:13 +0000 (23:28 +0200)]
Increment version to 0.3.8.dev
Martin Kroeker [Sun, 11 Aug 2019 18:26:34 +0000 (20:26 +0200)]
Merge pull request #2212 from martin-frbg/nofort-nolib
Avoid spurious dependency on the fortran runtime despite NOFORTRAN=1
Martin Kroeker [Sun, 11 Aug 2019 14:24:39 +0000 (16:24 +0200)]
Avoid adding a spurious dependency on the fortran runtime despite NOFORTRAN=1
for cases where a fortran compiler is present but not wanted (e.g. not fully functional)
Martin Kroeker [Sun, 11 Aug 2019 14:08:05 +0000 (16:08 +0200)]
Merge pull request #2211 from martin-frbg/arm64_gcc_trivial
Silence two nuisance warnings from gcc
Martin Kroeker [Sun, 11 Aug 2019 10:46:05 +0000 (12:46 +0200)]
Silence two nuisance warnings from gcc
Martin Kroeker [Fri, 9 Aug 2019 05:55:35 +0000 (07:55 +0200)]
Merge pull request #2208 from martin-frbg/munmap-debug
Provide more information on mmap/munmap failure
Martin Kroeker [Fri, 9 Aug 2019 05:55:20 +0000 (07:55 +0200)]
Merge pull request #2206 from martin-frbg/zen-dtrmm
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
Martin Kroeker [Fri, 9 Aug 2019 05:55:02 +0000 (07:55 +0200)]
Merge pull request #2199 from martin-frbg/zen-dtrsm
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
Martin Kroeker [Thu, 8 Aug 2019 22:08:11 +0000 (00:08 +0200)]
Add files via upload
Martin Kroeker [Thu, 8 Aug 2019 21:15:35 +0000 (23:15 +0200)]
Provide more information on mmap/munmap failure
for #2207
Martin Kroeker [Sat, 3 Aug 2019 10:40:13 +0000 (12:40 +0200)]
Replace most vpermpd calls in the Haswell DTRSM_RN kernel
Martin Kroeker [Fri, 2 Aug 2019 06:36:14 +0000 (08:36 +0200)]
Merge pull request #2198 from martin-frbg/icelake
Update CPUID recognition for Intel Ice Lake
Martin Kroeker [Thu, 1 Aug 2019 20:52:35 +0000 (22:52 +0200)]
Add CPUID identification of Intel Ice Lake
Martin Kroeker [Thu, 1 Aug 2019 20:51:09 +0000 (22:51 +0200)]
Autodetect Intel Ice Lake (as SKYLAKEX target)
Martin Kroeker [Sun, 28 Jul 2019 21:17:28 +0000 (23:17 +0200)]
Replace vpermpd with vpermilpd in the Haswell DTRMM kernel
to improve performance on AMD Zen (#2180) applying wjc404's improvement of the DGEMM kernel from #2186
Martin Kroeker [Sun, 28 Jul 2019 21:11:40 +0000 (23:11 +0200)]
Merge pull request #2196 from wjc404/develop
Add vbroadcastsd kernel to dgemm_kernel_4x8_haswell.S
wjc404 [Sat, 27 Jul 2019 23:39:09 +0000 (07:39 +0800)]
Add files via upload
Martin Kroeker [Sat, 27 Jul 2019 11:00:13 +0000 (13:00 +0200)]
Merge pull request #2112 from ffontaine/develop
Makefile.arm: remove -march flags
Martin Kroeker [Wed, 24 Jul 2019 18:19:21 +0000 (20:19 +0200)]
Merge pull request #2193 from martin-frbg/makeutest
Override special make variables
Martin Kroeker [Wed, 24 Jul 2019 13:26:09 +0000 (15:26 +0200)]
Unset special make variables in ctest Makefile as well
Martin Kroeker [Tue, 23 Jul 2019 14:56:40 +0000 (16:56 +0200)]
Override special make variables
as seen in https://github.com/xianyi/OpenBLAS/issues/1912#issuecomment-514183900, any external setting of TARGET_ARCH (which could result from building OpenBLAS as part of a larger project that actually uses this variable) would cause the utest build to fail.
(Other subtargets appear to be unaffected as they do not use implicit make rules)
Martin Kroeker [Tue, 23 Jul 2019 14:20:39 +0000 (16:20 +0200)]
Merge pull request #2191 from tylerjereddy/conditional_updates
MAINT: remove legacy CMake endif()
Martin Kroeker [Tue, 23 Jul 2019 14:15:08 +0000 (16:15 +0200)]
Merge pull request #2190 from martin-frbg/zdot-zen
Replace vpermpd with vpermilpd in the Haswell/Zen zdot microkernel
Martin Kroeker [Tue, 23 Jul 2019 06:32:56 +0000 (08:32 +0200)]
Merge pull request #2189 from wjc404/develop
Update dgemm_kernel_4x8_haswell.S for reducing cache misses
Tyler Reddy [Tue, 23 Jul 2019 03:24:57 +0000 (21:24 -0600)]
MAINT: remove legacy CMake endif()
* clean up a case where CMake endif()
contained the conditional used in the
if(), which is no longer needed /
discouraged since our minimum required
CMake version supports the modern syntax
Martin Kroeker [Mon, 22 Jul 2019 06:28:16 +0000 (08:28 +0200)]
Replace vpermpd with vpermilpd
to improve performance on Zen/Zen2 (as demonstrated by wjc404 in #2180)
wjc404 [Sat, 20 Jul 2019 17:10:32 +0000 (01:10 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Sat, 20 Jul 2019 16:47:45 +0000 (00:47 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Sat, 20 Jul 2019 14:08:22 +0000 (22:08 +0800)]
Add files via upload
wjc404 [Sat, 20 Jul 2019 14:04:41 +0000 (22:04 +0800)]
Add files via upload
wjc404 [Sat, 20 Jul 2019 06:33:37 +0000 (14:33 +0800)]
Add files via upload
wjc404 [Fri, 19 Jul 2019 15:58:24 +0000 (23:58 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Fri, 19 Jul 2019 15:47:58 +0000 (23:47 +0800)]
Add files via upload
Martin Kroeker [Thu, 18 Jul 2019 14:04:44 +0000 (16:04 +0200)]
Merge pull request #2186 from wjc404/develop
Update "dgemm_kernel_4x8_haswell.S" for improving performance on zen2 chips
wjc404 [Wed, 17 Jul 2019 15:50:03 +0000 (23:50 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 15:47:30 +0000 (23:47 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 14:39:15 +0000 (22:39 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 13:27:41 +0000 (21:27 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 09:02:35 +0000 (17:02 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Tue, 16 Jul 2019 16:55:06 +0000 (00:55 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Tue, 16 Jul 2019 16:46:51 +0000 (00:46 +0800)]
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
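The substitution works because some shuffles never move data across the two 128-bit halves of a YMM register, or move whole halves only. The following is a minimal scalar model in C of the selection rules of the three instructions. It is a sketch, not the kernel code: the function names and the two immediates compared are illustrative assumptions and are not taken from dgemm_kernel_4x8_haswell.S.

```c
#include <stdint.h>

/* Scalar model of VPERMPD ymm, ymm, imm8: each destination element picks
   any of the four source elements through a 2-bit field, so data may
   cross the 128-bit lane boundary. */
void model_vpermpd(double dst[4], const double src[4], uint8_t imm)
{
    double tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Scalar model of VPERMILPD ymm, ymm, imm8: one bit per element selects
   the low or high double of the same 128-bit lane; lanes never mix. */
void model_vpermilpd(double dst[4], const double src[4], uint8_t imm)
{
    double tmp[4];
    for (int i = 0; i < 4; i++) {
        int lane_base = (i / 2) * 2;           /* 0 for low lane, 2 for high */
        tmp[i] = src[lane_base + ((imm >> i) & 1)];
    }
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Scalar model of VPERM2F128 ymm, ymm, ymm, imm8: each 128-bit half of the
   result is a whole half of source a or b, or zero if bit 3 of its
   4-bit control field is set. */
void model_vperm2f128(double dst[4], const double a[4], const double b[4],
                      uint8_t imm)
{
    double tmp[4];
    for (int half = 0; half < 2; half++) {
        uint8_t ctl = (imm >> (4 * half)) & 0xF;
        const double *s = (ctl & 2) ? b : a;   /* which source register */
        int base = (ctl & 1) ? 2 : 0;          /* which half of that source */
        for (int k = 0; k < 2; k++)
            tmp[2 * half + k] = (ctl & 8) ? 0.0 : s[base + k];
    }
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

Under this model, model_vpermpd with immediate 0xB1 (swap neighbours within each lane) gives the same result as model_vpermilpd with 0x05, and model_vpermpd with 0x4E (swap the two halves) matches model_vperm2f128 with 0x01 when both sources are the same register.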
Martin Kroeker [Tue, 9 Jul 2019 18:08:52 +0000 (20:08 +0200)]
Merge pull request #2181 from isuruf/install_name
Change install_name on osx to match linux
Isuru Fernando [Mon, 8 Jul 2019 22:13:21 +0000 (17:13 -0500)]
Change install_name on osx to match linux
Martin Kroeker [Sun, 7 Jul 2019 16:28:21 +0000 (18:28 +0200)]
Merge pull request #2177 from martin-frbg/noaff
Fix surprising behaviour of NO_AFFINITY=0
Martin Kroeker [Sun, 7 Jul 2019 14:04:45 +0000 (16:04 +0200)]
Fix surprising behaviour of NO_AFFINITY=0
Martin Kroeker [Sat, 6 Jul 2019 16:07:19 +0000 (18:07 +0200)]
Merge pull request #2175 from martin-frbg/cmake-mingw-fixes
Fix CMAKE compilation with MinGW32 and add it to Appveyor
Martin Kroeker [Sat, 6 Jul 2019 13:07:15 +0000 (15:07 +0200)]
Mingw32 needs leading underscore on object names
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
Martin Kroeker [Sat, 6 Jul 2019 13:05:04 +0000 (15:05 +0200)]
Make disabling DYNAMIC_ARCH on unsupported systems work
needs to be unset in the cache for the change to have any effect
Martin Kroeker [Sat, 6 Jul 2019 13:02:39 +0000 (15:02 +0200)]
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
Martin Kroeker [Sat, 6 Jul 2019 12:30:33 +0000 (14:30 +0200)]
Add mingw builds to Appveyor config
Martin Kroeker [Sat, 6 Jul 2019 12:29:47 +0000 (14:29 +0200)]
Utest needs CBLAS but not necessarily FORTRAN
Martin Kroeker [Wed, 3 Jul 2019 17:16:30 +0000 (19:16 +0200)]
Merge pull request #2162 from martin-frbg/pgi
Fixes for PGI compiler
Martin Kroeker [Mon, 1 Jul 2019 19:06:02 +0000 (21:06 +0200)]
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
AbdelRauf [Tue, 18 Jun 2019 15:55:56 +0000 (15:55 +0000)]
cgemm/ctrmm power9
Martin Kroeker [Sun, 30 Jun 2019 21:29:02 +0000 (23:29 +0200)]
Merge pull request #2170 from pkubaj/patch-1
Fix build on PPC970 for FreeBSD
pkubaj [Fri, 28 Jun 2019 10:31:45 +0000 (10:31 +0000)]
Fix build for PPC970 on FreeBSD pt.2
FreeBSD needs those macros too.
pkubaj [Fri, 28 Jun 2019 10:29:44 +0000 (10:29 +0000)]
Fix build for PPC970 on FreeBSD pt. 1
FreeBSD needs DCBT_ARG=0 as well.
Martin Kroeker [Tue, 25 Jun 2019 10:56:33 +0000 (12:56 +0200)]
Merge pull request #2169 from pkubaj/develop
Fix build on FreeBSD/powerpc64.
Piotr Kubaj [Tue, 25 Jun 2019 08:58:56 +0000 (10:58 +0200)]
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
Martin Kroeker [Thu, 20 Jun 2019 17:56:01 +0000 (19:56 +0200)]
PGI compiler does not like -march=native
Martin Kroeker [Wed, 19 Jun 2019 12:38:01 +0000 (14:38 +0200)]
Merge pull request #2167 from kavanabhat/dtrmm_power8_segfault
Fix DTRMMKERNEL register save for power8 64-bit mode (Fix for #2166)
kavanabhat [Wed, 19 Jun 2019 09:57:14 +0000 (15:27 +0530)]
Update dtrmm_kernel_16x4_power8.S
AbdelRauf [Mon, 17 Jun 2019 15:33:38 +0000 (15:33 +0000)]
new sgemm 8x16
Martin Kroeker [Sun, 16 Jun 2019 16:35:43 +0000 (18:35 +0200)]
Fix mov syntax
Martin Kroeker [Sun, 16 Jun 2019 13:04:10 +0000 (15:04 +0200)]
Zero ecx with a mov instruction
PGI assembler does not like the initialization in the constraints.
Martin Kroeker [Fri, 14 Jun 2019 06:08:11 +0000 (08:08 +0200)]
Update Makefile.x86_64
Martin Kroeker [Thu, 13 Jun 2019 21:01:35 +0000 (23:01 +0200)]
Do not force gcc options on non-gcc compilers
fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
Martin Kroeker [Mon, 10 Jun 2019 17:12:45 +0000 (19:12 +0200)]
Merge pull request #2159 from martin-frbg/issue2149
Avoid unintentional activation of TLS codepath via USE_TLS=0
Martin Kroeker [Mon, 10 Jun 2019 15:24:15 +0000 (17:24 +0200)]
Avoid unintentional activation of TLS code via USE_TLS=0
fixes #2149
Martin Kroeker [Mon, 10 Jun 2019 12:08:11 +0000 (14:08 +0200)]
Merge pull request #2158 from martin-frbg/issue2143
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
Martin Kroeker [Mon, 10 Jun 2019 07:50:13 +0000 (09:50 +0200)]
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
As discussed in #2143, -march=native precludes the use of more specific options such as -march=skylake-avx512 in individual kernels, and defeats the purpose of DYNAMIC_ARCH anyway.
Martin Kroeker [Sun, 9 Jun 2019 10:19:08 +0000 (12:19 +0200)]
Merge pull request #2157 from martin-frbg/2154-2
Add gfortran workaround for potential ABI violation
Martin Kroeker [Sun, 9 Jun 2019 07:31:13 +0000 (09:31 +0200)]
Update fc.cmake
Martin Kroeker [Sat, 8 Jun 2019 21:17:03 +0000 (23:17 +0200)]
Add gfortran workaround for potential ABI violation
for #2154
Martin Kroeker [Fri, 7 Jun 2019 11:23:07 +0000 (13:23 +0200)]
Merge pull request #2148 from TiborGY/cpp_thread_test_2
Thread safety tester using C++11 threading (cleaned history)
Martin Kroeker [Thu, 6 Jun 2019 11:43:12 +0000 (13:43 +0200)]
Merge pull request #2156 from martin-frbg/issue2154
Add gfortran workaround for C->FORTRAN ABI violation
Martin Kroeker [Thu, 6 Jun 2019 08:24:16 +0000 (10:24 +0200)]
Add gfortran workaround for ABI violations
for #2154 (see gcc bug 90329)
Martin Kroeker [Thu, 6 Jun 2019 08:18:40 +0000 (10:18 +0200)]
Add gfortran workaround for ABI violations in LAPACKE
for #2154 (see gcc bug 90329)
Martin Kroeker [Thu, 6 Jun 2019 05:42:56 +0000 (07:42 +0200)]
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
AbdelRauf [Wed, 5 Jun 2019 20:50:50 +0000 (20:50 +0000)]
conflict resolve
AbdelRauf [Wed, 5 Jun 2019 10:30:57 +0000 (10:30 +0000)]
power9 zgemm ztrmm optimized
Martin Kroeker [Wed, 5 Jun 2019 18:27:45 +0000 (20:27 +0200)]
Merge pull request #2145 from martin-frbg/1912-3
Separate implementations of AMAX and IAMAX on arm
Martin Kroeker [Wed, 5 Jun 2019 18:27:05 +0000 (20:27 +0200)]
Merge pull request #2110 from pc2/cpu-detection
Fix detection of Skylake processors when using GCC
Michael Lass [Fri, 3 May 2019 19:22:27 +0000 (21:22 +0200)]
c_check: Unlink correct file
Michael Lass [Fri, 3 May 2019 19:07:14 +0000 (21:07 +0200)]
Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test if the compiler is capable of
AVX512. This check currently fails, since the used __AVX2__ macro is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
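The behaviour behind this fix is that feature macros such as __AVX2__ reflect the flags used for the current translation unit, not what the compiler could be asked to target. A minimal sketch in C follows; the function name is hypothetical and this is not the actual check in getarch.c.

```c
/* Reports whether THIS translation unit was built with AVX2 code
   generation enabled (for example with -mavx2, or -march=native on an
   AVX2-capable host). It says nothing about what other targets the
   compiler would accept if asked. */
int built_with_avx2(void)
{
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}
```

Built with default flags on x86_64 this typically returns 0, and built with -mavx2 it returns 1, which is why a probe relying on the macro needs the matching flags on its own compile line.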
AbdelRauf [Fri, 31 May 2019 22:48:16 +0000 (22:48 +0000)]
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
Martin Kroeker [Mon, 3 Jun 2019 09:01:33 +0000 (11:01 +0200)]
Document NO_AVX512
for #2151
TiborGY [Sat, 1 Jun 2019 19:36:41 +0000 (21:36 +0200)]
add c++ thread test option to Makefile.rule
TiborGY [Sat, 1 Jun 2019 19:32:52 +0000 (21:32 +0200)]
hook up c++ thread safety test (main Makefile)
TiborGY [Sat, 1 Jun 2019 19:30:06 +0000 (21:30 +0200)]
upload thread safety test folder
AbdelRauf [Thu, 23 May 2019 04:23:43 +0000 (04:23 +0000)]
improved zgemm power9 based on power8
Martin Kroeker [Thu, 30 May 2019 09:38:11 +0000 (11:38 +0200)]
Use generic kernels for complex (I)AMAX to support softfp
Martin Kroeker [Thu, 30 May 2019 09:25:43 +0000 (11:25 +0200)]
Ensure correct output for DAMAX with softfp
Martin Kroeker [Wed, 29 May 2019 13:02:51 +0000 (15:02 +0200)]
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and in a comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp, where they would have to share the return register.
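To illustrate the split, here is a minimal reference-style sketch in C of the two operations as separate functions: one returns the largest absolute value, the other the 1-based position of its first occurrence. The names ref_damax and ref_idamax and their signatures are hypothetical and do not reproduce the ARM kernels.

```c
#include <stddef.h>

/* Absolute value without pulling in the math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Largest absolute value among n elements of x taken with stride incx.
   Returns 0.0 for an empty or invalid input. */
double ref_damax(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0)
        return 0.0;
    double best = abs_d(x[0]);
    for (size_t i = 1; i < n; i++) {
        double a = abs_d(x[i * incx]);
        if (a > best)
            best = a;
    }
    return best;
}

/* 1-based index of the first element with the largest absolute value.
   Returns 0 for an empty or invalid input. */
size_t ref_idamax(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0)
        return 0;
    size_t best_i = 0;
    double best = abs_d(x[0]);
    for (size_t i = 1; i < n; i++) {
        double a = abs_d(x[i * incx]);
        if (a > best) {        /* strict comparison keeps the first maximum */
            best = a;
            best_i = i;
        }
    }
    return best_i + 1;
}
```

Each function returns a single scalar, so neither relies on a floating-point result and an integer result being delivered through separate registers.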
Martin Kroeker [Wed, 29 May 2019 12:09:10 +0000 (14:09 +0200)]
Merge pull request #2144 from xianyi/revert-2142-issue1912-2
Revert "Add softfp support in min/max kernels"
Martin Kroeker [Wed, 29 May 2019 12:07:17 +0000 (14:07 +0200)]
Revert "Add softfp support in min/max kernels"
Martin Kroeker [Tue, 28 May 2019 20:56:08 +0000 (22:56 +0200)]
Merge pull request #2142 from martin-frbg/issue1912-2
Add softfp support in min/max kernels
Martin Kroeker [Tue, 28 May 2019 18:50:40 +0000 (20:50 +0200)]
Merge pull request #2141 from martin-frbg/issue1912
Build and run utests independently of fortran