Martin Kroeker [Thu, 18 Jul 2019 14:04:44 +0000 (16:04 +0200)]
Merge pull request #2186 from wjc404/develop
Update "dgemm_kernel_4x8_haswell.S" for improving performance on zen2 chips
wjc404 [Wed, 17 Jul 2019 15:50:03 +0000 (23:50 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 15:47:30 +0000 (23:47 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 14:39:15 +0000 (22:39 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 13:27:41 +0000 (21:27 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Wed, 17 Jul 2019 09:02:35 +0000 (17:02 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Tue, 16 Jul 2019 16:55:06 +0000 (00:55 +0800)]
Update dgemm_kernel_4x8_haswell.S
wjc404 [Tue, 16 Jul 2019 16:46:51 +0000 (00:46 +0800)]
Update dgemm_kernel_4x8_haswell.S for zen2
replaced a bunch of vpermpd instructions with vpermilpd and vperm2f128
Martin Kroeker [Tue, 9 Jul 2019 18:08:52 +0000 (20:08 +0200)]
Merge pull request #2181 from isuruf/install_name
Change install_name on osx to match linux
Isuru Fernando [Mon, 8 Jul 2019 22:13:21 +0000 (17:13 -0500)]
Change install_name on osx to match linux
Martin Kroeker [Sun, 7 Jul 2019 16:28:21 +0000 (18:28 +0200)]
Merge pull request #2177 from martin-frbg/noaff
Fix surprising behaviour of NO_AFFINITY=0
Martin Kroeker [Sun, 7 Jul 2019 14:04:45 +0000 (16:04 +0200)]
Fix surprising behaviour of NO_AFFINITY=0
Martin Kroeker [Sat, 6 Jul 2019 16:07:19 +0000 (18:07 +0200)]
Merge pull request #2175 from martin-frbg/cmake-mingw-fixes
Fix CMAKE compilation with MinGW32 and add it to Appveyor
Martin Kroeker [Sat, 6 Jul 2019 13:07:15 +0000 (15:07 +0200)]
Mingw32 needs leading underscore on object names
(also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
Martin Kroeker [Sat, 6 Jul 2019 13:05:04 +0000 (15:05 +0200)]
Make disabling DYNAMIC_ARCH on unsupported systems work
needs to be unset in the cache for the change to have any effect
Martin Kroeker [Sat, 6 Jul 2019 13:02:39 +0000 (15:02 +0200)]
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
Martin Kroeker [Sat, 6 Jul 2019 12:30:33 +0000 (14:30 +0200)]
Add mingw builds to Appveyor config
Martin Kroeker [Sat, 6 Jul 2019 12:29:47 +0000 (14:29 +0200)]
Utest needs CBLAS but not necessarily FORTRAN
Martin Kroeker [Wed, 3 Jul 2019 17:16:30 +0000 (19:16 +0200)]
Merge pull request #2162 from martin-frbg/pgi
Fixes for PGI compiler
Martin Kroeker [Mon, 1 Jul 2019 19:06:02 +0000 (21:06 +0200)]
Merge pull request #2172 from quickwritereader/develop
power9 cgemm/ctrmm. new sgemm 8x16
AbdelRauf [Tue, 18 Jun 2019 15:55:56 +0000 (15:55 +0000)]
cgemm/ctrmm power9
Martin Kroeker [Sun, 30 Jun 2019 21:29:02 +0000 (23:29 +0200)]
Merge pull request #2170 from pkubaj/patch-1
Fix build on PPC970 for FreeBSD
pkubaj [Fri, 28 Jun 2019 10:31:45 +0000 (10:31 +0000)]
Fix build for PPC970 on FreeBSD pt. 2
FreeBSD needs those macros too.
pkubaj [Fri, 28 Jun 2019 10:29:44 +0000 (10:29 +0000)]
Fix build for PPC970 on FreeBSD pt. 1
FreeBSD needs DCBT_ARG=0 as well.
Martin Kroeker [Tue, 25 Jun 2019 10:56:33 +0000 (12:56 +0200)]
Merge pull request #2169 from pkubaj/develop
Fix build on FreeBSD/powerpc64.
Piotr Kubaj [Tue, 25 Jun 2019 08:58:56 +0000 (10:58 +0200)]
Fix build on FreeBSD/powerpc64.
Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
Martin Kroeker [Thu, 20 Jun 2019 17:56:01 +0000 (19:56 +0200)]
PGI compiler does not like -march=native
Martin Kroeker [Wed, 19 Jun 2019 12:38:01 +0000 (14:38 +0200)]
Merge pull request #2167 from kavanabhat/dtrmm_power8_segfault
Fix DTRMMKERNEL register save for power8 64-bit mode (Fix for #2166)
kavanabhat [Wed, 19 Jun 2019 09:57:14 +0000 (15:27 +0530)]
Update dtrmm_kernel_16x4_power8.S
AbdelRauf [Mon, 17 Jun 2019 15:33:38 +0000 (15:33 +0000)]
new sgemm 8x16
Martin Kroeker [Sun, 16 Jun 2019 16:35:43 +0000 (18:35 +0200)]
Fix mov syntax
Martin Kroeker [Sun, 16 Jun 2019 13:04:10 +0000 (15:04 +0200)]
Zero ecx with a mov instruction
PGI assembler does not like the initialization in the constraints.
Martin Kroeker [Fri, 14 Jun 2019 06:08:11 +0000 (08:08 +0200)]
Update Makefile.x86_64
Martin Kroeker [Thu, 13 Jun 2019 21:01:35 +0000 (23:01 +0200)]
Do not force gcc options on non-gcc compilers
fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
Martin Kroeker [Mon, 10 Jun 2019 17:12:45 +0000 (19:12 +0200)]
Merge pull request #2159 from martin-frbg/issue2149
Avoid unintentional activation of TLS codepath via USE_TLS=0
Martin Kroeker [Mon, 10 Jun 2019 15:24:15 +0000 (17:24 +0200)]
Avoid unintentional activation of TLS code via USE_TLS=0
fixes #2149
Martin Kroeker [Mon, 10 Jun 2019 12:08:11 +0000 (14:08 +0200)]
Merge pull request #2158 from martin-frbg/issue2143
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
Martin Kroeker [Mon, 10 Jun 2019 07:50:13 +0000 (09:50 +0200)]
Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds
From #2143: -march=native precludes the use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
Martin Kroeker [Sun, 9 Jun 2019 10:19:08 +0000 (12:19 +0200)]
Merge pull request #2157 from martin-frbg/2154-2
Add gfortran workaround for potential ABI violation
Martin Kroeker [Sun, 9 Jun 2019 07:31:13 +0000 (09:31 +0200)]
Update fc.cmake
Martin Kroeker [Sat, 8 Jun 2019 21:17:03 +0000 (23:17 +0200)]
Add gfortran workaround for potential ABI violation
for #2154
Martin Kroeker [Fri, 7 Jun 2019 11:23:07 +0000 (13:23 +0200)]
Merge pull request #2148 from TiborGY/cpp_thread_test_2
Thread safety tester using C++11 threading (cleaned history)
Martin Kroeker [Thu, 6 Jun 2019 11:43:12 +0000 (13:43 +0200)]
Merge pull request #2156 from martin-frbg/issue2154
Add gfortran workaround for C->FORTRAN ABI violation
Martin Kroeker [Thu, 6 Jun 2019 08:24:16 +0000 (10:24 +0200)]
Add gfortran workaround for ABI violations
for #2154 (see gcc bug 90329)
Martin Kroeker [Thu, 6 Jun 2019 08:18:40 +0000 (10:18 +0200)]
Add gfortran workaround for ABI violations in LAPACKE
for #2154 (see gcc bug 90329)
Martin Kroeker [Thu, 6 Jun 2019 05:42:56 +0000 (07:42 +0200)]
Merge pull request #2153 from quickwritereader/develop
improved power9 zgemm,sgemm
AbdelRauf [Wed, 5 Jun 2019 20:50:50 +0000 (20:50 +0000)]
conflict resolve
AbdelRauf [Wed, 5 Jun 2019 10:30:57 +0000 (10:30 +0000)]
power9 zgemm ztrmm optimized
Martin Kroeker [Wed, 5 Jun 2019 18:27:45 +0000 (20:27 +0200)]
Merge pull request #2145 from martin-frbg/1912-3
Separate implementations of AMAX and IAMAX on arm
Martin Kroeker [Wed, 5 Jun 2019 18:27:05 +0000 (20:27 +0200)]
Merge pull request #2110 from pc2/cpu-detection
Fix detection of Skylake processors when using GCC
Michael Lass [Fri, 3 May 2019 19:22:27 +0000 (21:22 +0200)]
c_check: Unlink correct file
Michael Lass [Fri, 3 May 2019 19:07:14 +0000 (21:07 +0200)]
Fix detection of AVX512 capable compilers in getarch
21eda8b5 introduced a check in getarch.c to test whether the compiler is capable of
AVX512. This check currently fails, since the __AVX2__ macro it relies on is only
defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this
is the case by building getarch with -march=native on x86_64. It is only
supposed to run on the build host anyway.
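The failure mode described above can be illustrated with a minimal sketch. It is not the actual check in getarch.c; the function names are hypothetical. It only shows that __AVX2__ and __AVX512F__ are predefined by the compiler according to the target flags in effect for this translation unit, so the result says nothing about the CPU unless the file is built with matching flags such as -march=native.

```c
/* Hypothetical helper, not taken from getarch.c.
 * Reports whether THIS file was compiled with AVX2 enabled.
 * GCC and Clang predefine __AVX2__ only when the target flags
 * (e.g. -mavx2 or -march=native on an AVX2 host) enable it. */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: same idea for the AVX512 foundation subset. */
int built_with_avx512f(void)
{
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}
```

Compiled without any -m or -march option on a generic x86_64 toolchain, both helpers would typically return 0 even on an AVX512-capable host, which is why a check based on these macros depends on how the checking program itself is built.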
AbdelRauf [Fri, 31 May 2019 22:48:16 +0000 (22:48 +0000)]
sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
Martin Kroeker [Mon, 3 Jun 2019 09:01:33 +0000 (11:01 +0200)]
Document NO_AVX512
for #2151
TiborGY [Sat, 1 Jun 2019 19:36:41 +0000 (21:36 +0200)]
add c++ thread test option to Makefile.rule
TiborGY [Sat, 1 Jun 2019 19:32:52 +0000 (21:32 +0200)]
hook up c++ thread safety test (main Makefile)
TiborGY [Sat, 1 Jun 2019 19:30:06 +0000 (21:30 +0200)]
upload thread safety test folder
AbdelRauf [Thu, 23 May 2019 04:23:43 +0000 (04:23 +0000)]
improved zgemm power9 based on power8
Martin Kroeker [Thu, 30 May 2019 09:38:11 +0000 (11:38 +0200)]
Use generic kernels for complex (I)AMAX to support softfp
Martin Kroeker [Thu, 30 May 2019 09:25:43 +0000 (11:25 +0200)]
Ensure correct output for DAMAX with softfp
Martin Kroeker [Wed, 29 May 2019 13:02:51 +0000 (15:02 +0200)]
Separate implementations of AMAX and IAMAX on arm
As noted in #1912 and in a comment on #1942, the combined implementation happens to "do the right thing" on hardfp, but it cannot return both value and index on softfp, where they would have to share the return register.
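A plain C sketch of the separation, for illustration only: the names ref_damax and ref_idamax are hypothetical and this is not the arm kernel code. Each routine returns exactly one result, so neither depends on a floating-point value and an integer index coming back in different registers.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_val(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical reference routine: largest absolute value in x[0..n-1],
 * or 0.0 for an empty vector. Returns only the value. */
double ref_damax(size_t n, const double *x)
{
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        double a = abs_val(x[i]);
        if (a > best)
            best = a;
    }
    return best;
}

/* Hypothetical reference routine: 1-based index of the first element
 * with the largest absolute value, or 0 for an empty vector.
 * Returns only the index. */
size_t ref_idamax(size_t n, const double *x)
{
    if (n == 0)
        return 0;
    size_t idx = 0;
    double best = abs_val(x[0]);
    for (size_t i = 1; i < n; i++) {
        double a = abs_val(x[i]);
        if (a > best) {   /* strict comparison keeps the first maximum */
            best = a;
            idx = i;
        }
    }
    return idx + 1;
}
```

For the vector {1.0, -7.5, 3.0, 7.5}, ref_damax returns 7.5 and ref_idamax returns 2, since the first element of maximal magnitude wins.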
Martin Kroeker [Wed, 29 May 2019 12:09:10 +0000 (14:09 +0200)]
Merge pull request #2144 from xianyi/revert-2142-issue1912-2
Revert "Add softfp support in min/max kernels"
Martin Kroeker [Wed, 29 May 2019 12:07:17 +0000 (14:07 +0200)]
Revert "Add softfp support in min/max kernels"
Martin Kroeker [Tue, 28 May 2019 20:56:08 +0000 (22:56 +0200)]
Merge pull request #2142 from martin-frbg/issue1912-2
Add softfp support in min/max kernels
Martin Kroeker [Tue, 28 May 2019 18:50:40 +0000 (20:50 +0200)]
Merge pull request #2141 from martin-frbg/issue1912
Build and run utests independently of fortran
Martin Kroeker [Tue, 28 May 2019 18:34:22 +0000 (20:34 +0200)]
Add softfp support in min/max kernels
fix for #1912
Martin Kroeker [Sun, 26 May 2019 10:39:20 +0000 (12:39 +0200)]
Merge pull request #2140 from martin-frbg/pgi19
Do not try ancient PGI hacks with recent versions of that compiler
Martin Kroeker [Fri, 24 May 2019 11:02:23 +0000 (13:02 +0200)]
Build and run utests in any case; they do their own checks for fortran availability
Martin Kroeker [Wed, 22 May 2019 11:48:27 +0000 (13:48 +0200)]
Do not try ancient PGI hacks with recent versions of that compiler
should fix #2139
Martin Kroeker [Thu, 16 May 2019 10:08:16 +0000 (12:08 +0200)]
Merge pull request #2136 from martin-frbg/issue2126
Add option to allow combining USE_THREAD=0 with thread locking support
Martin Kroeker [Wed, 15 May 2019 21:40:06 +0000 (23:40 +0200)]
Merge pull request #2134 from tylerjereddy/skylake_regress_guard_may14
TST: add SkylakeX AVX512 CI test
Martin Kroeker [Wed, 15 May 2019 21:38:12 +0000 (23:38 +0200)]
Remove unrelated change
Martin Kroeker [Wed, 15 May 2019 21:36:17 +0000 (23:36 +0200)]
Add option USE_LOCKING but keep default settings intact
Martin Kroeker [Wed, 15 May 2019 21:21:20 +0000 (23:21 +0200)]
Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds
Martin Kroeker [Wed, 15 May 2019 21:19:30 +0000 (23:19 +0200)]
Add option USE_LOCKING for single-threaded build with locking support
Martin Kroeker [Wed, 15 May 2019 21:18:43 +0000 (23:18 +0200)]
Add option USE_LOCKING for single-threaded build with locking support
for calling from concurrent threads
Tyler Reddy [Tue, 14 May 2019 18:32:23 +0000 (11:32 -0700)]
TST: add SkylakeX AVX512 CI test
* adapt the C-level reproducer code for some
  recent SkylakeX AVX512 kernel issues, provided
  by Isuru Fernando and modified by Martin Kroeker,
  for usage in the utest suite
* add an Intel SDE SkylakeX emulation utest run to
  the Azure CI matrix; a custom Docker build was required
  because the Ubuntu image provided by Azure does not support
  AVX512VL instructions
Martin Kroeker [Tue, 14 May 2019 07:37:00 +0000 (09:37 +0200)]
Merge pull request #2130 from isuruf/drone
Drone CI for arm64 native builds
Isuru Fernando [Sun, 12 May 2019 20:25:45 +0000 (15:25 -0500)]
Fix typo
Isuru Fernando [Sun, 12 May 2019 20:14:46 +0000 (15:14 -0500)]
arm32 build
Isuru Fernando [Sun, 12 May 2019 20:09:53 +0000 (15:09 -0500)]
Remove qemu armv8 builds
Isuru Fernando [Sun, 12 May 2019 19:28:48 +0000 (14:28 -0500)]
See if ubuntu 19.04 fixes the ICE
Isuru Fernando [Sun, 12 May 2019 19:22:36 +0000 (14:22 -0500)]
parallel build
Isuru Fernando [Sun, 12 May 2019 19:17:12 +0000 (14:17 -0500)]
build without lapack on cmake
Isuru Fernando [Sun, 12 May 2019 19:09:29 +0000 (14:09 -0500)]
Add cmake builds and print options
Isuru Fernando [Sun, 12 May 2019 19:06:04 +0000 (14:06 -0500)]
Add a cmake build as well
Isuru Fernando [Sun, 12 May 2019 19:02:39 +0000 (14:02 -0500)]
no need of gcc in clang build
Isuru Fernando [Sun, 12 May 2019 18:56:59 +0000 (13:56 -0500)]
update yes
Isuru Fernando [Sun, 12 May 2019 18:55:38 +0000 (13:55 -0500)]
Fix typo
Isuru Fernando [Sun, 12 May 2019 18:55:04 +0000 (13:55 -0500)]
apt update
Isuru Fernando [Sun, 12 May 2019 18:53:58 +0000 (13:53 -0500)]
Switch to ubuntu and parallel jobs
Isuru Fernando [Sun, 12 May 2019 18:50:37 +0000 (13:50 -0500)]
gfortran->gcc-gfortran
Isuru Fernando [Sun, 12 May 2019 18:47:49 +0000 (13:47 -0500)]
Install gfortran and add a clang job
Isuru Fernando [Sun, 12 May 2019 18:44:15 +0000 (13:44 -0500)]
Install perl
Isuru Fernando [Sun, 12 May 2019 18:42:16 +0000 (13:42 -0500)]
Install gcc
Isuru Fernando [Sun, 12 May 2019 18:40:23 +0000 (13:40 -0500)]
remove sudo
Isuru Fernando [Sun, 12 May 2019 18:39:51 +0000 (13:39 -0500)]
install make
Isuru Fernando [Sun, 12 May 2019 18:35:07 +0000 (13:35 -0500)]
Test drone CI
Martin Kroeker [Sun, 12 May 2019 07:55:57 +0000 (09:55 +0200)]
Merge pull request #2129 from martin-frbg/armv8azure
Move ARMv8/gcc CI job from Travis to Azure
Martin Kroeker [Sat, 11 May 2019 20:37:06 +0000 (22:37 +0200)]
Update .travis.yml