Martin Kroeker [Sat, 27 Apr 2019 20:45:47 +0000 (22:45 +0200)]
Merge pull request #2094 from martin-frbg/issue2066
Fix ReLAPACK integration problems
Martin Kroeker [Sat, 27 Apr 2019 17:06:00 +0000 (19:06 +0200)]
Add support for INTERFACE64 and fix XERBLA calls
1. Replaced all instances of "int" with "blasint"
2. Added string length as "hidden" third parameter in calls to Fortran XERBLA
Martin Kroeker [Sat, 27 Apr 2019 16:55:47 +0000 (18:55 +0200)]
Support INTERFACE64=1
Martin Kroeker [Tue, 23 Apr 2019 18:12:06 +0000 (20:12 +0200)]
Merge pull request #2092 from jeffbaylor/snprintf_with_MSC_VER
snprintf define consolidated to common.h
Martin Kroeker [Tue, 23 Apr 2019 18:11:36 +0000 (20:11 +0200)]
Merge pull request #2072 from martin-frbg/sum
Add (C)BLAS extension ?sum
Jeff Baylor [Tue, 23 Apr 2019 00:01:34 +0000 (17:01 -0700)]
snprintf define consolidated to common.h
Martin Kroeker [Sun, 14 Apr 2019 19:40:07 +0000 (21:40 +0200)]
Merge pull request #2084 from RashmicaG/develop
Add in runtime CPU detection for POWER.
Rashmica Gupta [Tue, 9 Apr 2019 04:13:24 +0000 (14:13 +1000)]
Add in runtime CPU detection for POWER.
Martin Kroeker [Tue, 2 Apr 2019 19:40:58 +0000 (21:40 +0200)]
Merge pull request #2080 from martin-frbg/issue2075
Add -lm and disable EXPRECISION support on *BSD
Martin Kroeker [Tue, 2 Apr 2019 07:38:18 +0000 (09:38 +0200)]
Add -lm and disable EXPRECISION support on *BSD
fixes #2075
Martin Kroeker [Sun, 31 Mar 2019 20:12:23 +0000 (22:12 +0200)]
Add declarations for ?sum
Martin Kroeker [Sun, 31 Mar 2019 11:56:08 +0000 (13:56 +0200)]
Merge pull request #2073 from martin-frbg/issue2056-2
Detect 32bit environment on 64bit ARM hardware
Martin Kroeker [Sun, 31 Mar 2019 11:55:49 +0000 (13:55 +0200)]
Add ?sum definitions for generic kernel
Martin Kroeker [Sun, 31 Mar 2019 11:55:05 +0000 (13:55 +0200)]
Add ?sum
Martin Kroeker [Sun, 31 Mar 2019 09:57:01 +0000 (11:57 +0200)]
Add cmake defaults for ?sum kernels
Martin Kroeker [Sun, 31 Mar 2019 08:50:43 +0000 (10:50 +0200)]
Detect 32bit environment on 64bit ARM hardware
for #2056, using the same approach as #2058
Martin Kroeker [Sat, 30 Mar 2019 21:49:05 +0000 (22:49 +0100)]
Add ZARCH implementation of ?sum
as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:27:04 +0000 (22:27 +0100)]
Add x86_64 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:26:10 +0000 (22:26 +0100)]
Add x86 implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:25:06 +0000 (22:25 +0100)]
Add SPARC implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
Martin Kroeker [Sat, 30 Mar 2019 21:23:42 +0000 (22:23 +0100)]
Add POWER implementation of ?sum
as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
Martin Kroeker [Sat, 30 Mar 2019 21:22:15 +0000 (22:22 +0100)]
Add MIPS64 implementation of ?sum
as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
Martin Kroeker [Sat, 30 Mar 2019 21:20:14 +0000 (22:20 +0100)]
Add MIPS implementation of ?sum
as trivial copy of ?asum with the fabs calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:18:03 +0000 (22:18 +0100)]
Add ia64 implementation of ?sum
as trivial copy of asum with the fabs calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:13:36 +0000 (22:13 +0100)]
Add ARM64 implementations of ?sum
as trivial copies of the respective ?asum kernels with the fabs calls removed
Martin Kroeker [Sat, 30 Mar 2019 21:11:38 +0000 (22:11 +0100)]
Add ARM implementations of ?sum
(trivial copies of the respective ?asum with the fabs calls removed)
Martin Kroeker [Sat, 30 Mar 2019 21:05:11 +0000 (22:05 +0100)]
Add implementations of ssum/dsum and csum/zsum
as trivial copies of asum/zasum with the fabs calls replaced by fmov to preserve code structure
Martin Kroeker [Sat, 30 Mar 2019 21:01:13 +0000 (22:01 +0100)]
Add ?sum
Martin Kroeker [Sat, 30 Mar 2019 20:59:18 +0000 (21:59 +0100)]
Add interface for ?sum (derived from ?asum)
Martin Kroeker [Sat, 30 Mar 2019 20:58:03 +0000 (21:58 +0100)]
Add declarations for ?sum and cblas_?sum
Martin Kroeker [Sat, 30 Mar 2019 20:21:38 +0000 (21:21 +0100)]
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
Disable the AVX512 DGEMM kernel (again)
Martin Kroeker [Sat, 30 Mar 2019 13:54:28 +0000 (14:54 +0100)]
Merge pull request #2071 from martin-frbg/issue2068
Provide CBLAS interfaces to I?MIN and I?MAX
Martin Kroeker [Sat, 30 Mar 2019 11:38:41 +0000 (12:38 +0100)]
Build CBLAS interfaces for I?MIN and I?MAX
Martin Kroeker [Sat, 30 Mar 2019 11:37:13 +0000 (12:37 +0100)]
Expose CBLAS interfaces for I?MIN and I?MAX
Martin Kroeker [Fri, 29 Mar 2019 20:46:21 +0000 (21:46 +0100)]
Merge pull request #2070 from quickwritereader/develop
power9 makefile. dgemm based on power8 kernel with following changes …
Martin Kroeker [Fri, 29 Mar 2019 18:36:29 +0000 (19:36 +0100)]
Merge branch 'develop' into develop
AbdelRauf [Thu, 14 Mar 2019 10:42:04 +0000 (10:42 +0000)]
power9 makefile. DGEMM based on the POWER8 kernel with the following changes: 32x-unrolled 16x4 kernel and 8x4 kernel using lxv/stxv and a butterfly rank-1 update. Improves performance from 17 to 22-23 GFLOPS. DTRMM cases were added into DGEMM itself.
Martin Kroeker [Mon, 25 Mar 2019 20:34:30 +0000 (21:34 +0100)]
Merge pull request #2069 from aixoss/aix-asm-change
AIX asm syntax changes needed for shared object creation
Ayappan P [Mon, 25 Mar 2019 13:23:25 +0000 (18:53 +0530)]
AIX asm syntax changes needed for shared object creation
Martin Kroeker [Tue, 19 Mar 2019 21:12:51 +0000 (22:12 +0100)]
Merge pull request #2064 from embray/cygwin/use-tls-thread-memory-cleanup
Fix for #2063
Erik M. Bray [Tue, 19 Mar 2019 09:22:02 +0000 (10:22 +0100)]
Also call CloseHandle on each thread, as well as on the event, so as not to leak thread handles.
Erik M. Bray [Mon, 18 Mar 2019 19:32:48 +0000 (20:32 +0100)]
Fix for #2063: The DllMain used in Cygwin did not run the thread memory
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.
Martin Kroeker [Sat, 16 Mar 2019 10:57:23 +0000 (11:57 +0100)]
Merge pull request #2058 from xsacha/patch-3
Change 64-bit detection as explained in #2056
Martin Kroeker [Sat, 16 Mar 2019 10:56:51 +0000 (11:56 +0100)]
Merge pull request #2060 from embray/cygwin/readenv
Use POSIX getenv on Cygwin
Erik M. Bray [Fri, 15 Mar 2019 14:06:30 +0000 (15:06 +0100)]
Use POSIX getenv on Cygwin
The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
Martin Kroeker [Wed, 13 Mar 2019 21:10:28 +0000 (22:10 +0100)]
Disable the AVX512 DGEMM kernel (again)
Due to as yet unresolved errors seen in #1955 and #2029
Martin Kroeker [Wed, 13 Mar 2019 18:20:23 +0000 (19:20 +0100)]
Trivial typo fix
as suggested in #2022
Sacha [Wed, 13 Mar 2019 13:21:54 +0000 (23:21 +1000)]
Change 64-bit detection as explained in #2056
Martin Kroeker [Tue, 12 Mar 2019 21:57:39 +0000 (22:57 +0100)]
Merge pull request #2042 from maomao194313/develop
add TARGET support for HiSilicon tsv110 CPUs
Martin Kroeker [Tue, 12 Mar 2019 21:57:07 +0000 (22:57 +0100)]
Merge pull request #2055 from martin-frbg/atomid
Add CPUID data for Intel Denverton (as Nehalem)
Martin Kroeker [Tue, 12 Mar 2019 15:09:55 +0000 (16:09 +0100)]
Add Intel Denverton
Martin Kroeker [Tue, 12 Mar 2019 15:03:56 +0000 (16:03 +0100)]
Add Intel Denverton
for #2048
maomao194313 [Tue, 12 Mar 2019 08:11:01 +0000 (16:11 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110
maomao194313 [Tue, 12 Mar 2019 08:05:19 +0000 (16:05 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110.
Martin Kroeker [Sat, 9 Mar 2019 15:39:35 +0000 (16:39 +0100)]
Merge pull request #2051 from martin-frbg/issue2048
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
Martin Kroeker [Sat, 9 Mar 2019 15:39:08 +0000 (16:39 +0100)]
Merge pull request #2050 from kencu/PowerMacFix
PowerMac 970 fixes
Martin Kroeker [Sat, 9 Mar 2019 10:21:16 +0000 (11:21 +0100)]
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
for issue #2048
ken-cunningham-webuse [Thu, 7 Mar 2019 19:41:58 +0000 (11:41 -0800)]
common_power.h: force DCBT_ARG 0 on PPC970 Darwin
Without this, we see
../kernel/power/gemv_n.S:427:Parameter syntax error
and many more similar errors, all relating to this assembly instruction:
dcbt 8, r24, r18
This change sets DCBT_ARG to 0,
and OpenBLAS then builds through to completion on the PowerMac 970.
Tests pass.
ken-cunningham-webuse [Thu, 7 Mar 2019 19:36:35 +0000 (11:36 -0800)]
param.h : enable defines for PPC970 on DarwinOS
fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
#define SGEMM_P SGEMM_DEFAULT_P
^
Martin Kroeker [Thu, 7 Mar 2019 18:28:06 +0000 (19:28 +0100)]
Merge pull request #2049 from Celelibi/fix_crash_sgemm_sse_x64
Fix crash in sgemm SSE/nano kernel on x86_64
Celelibi [Thu, 7 Mar 2019 15:39:41 +0000 (16:39 +0100)]
Fix crash in sgemm SSE/nano kernel on x86_64
Fix bug #2047.
Signed-off-by: Celelibi <celelibi@gmail.com>
Martin Kroeker [Thu, 7 Mar 2019 13:51:41 +0000 (14:51 +0100)]
Merge pull request #2046 from kencu/powermac
ctest.c : add __POWERPC__ for PowerMac
ken-cunningham-webuse [Thu, 7 Mar 2019 04:55:06 +0000 (20:55 -0800)]
ctest.c : add __POWERPC__ for PowerMac
Martin Kroeker [Wed, 6 Mar 2019 21:40:26 +0000 (22:40 +0100)]
Merge pull request #2045 from martin-frbg/2033-3
Do not compile in AVX512 check if AVX support is disabled
Martin Kroeker [Tue, 5 Mar 2019 15:04:25 +0000 (16:04 +0100)]
Do not compile in AVX512 check if AVX support is disabled
The xgetbv function depends on NO_AVX being undefined. We could change that too, but that combination is unlikely to work anyway.
Martin Kroeker [Tue, 5 Mar 2019 11:11:32 +0000 (12:11 +0100)]
Merge pull request #2044 from martin-frbg/issue2043
Fix module definition conflicts between LAPACK and ReLAPACK
Martin Kroeker [Tue, 5 Mar 2019 11:11:15 +0000 (12:11 +0100)]
Merge pull request #2039 from brada4/meminit
Address warning in memory.c
Martin Kroeker [Mon, 4 Mar 2019 20:17:08 +0000 (21:17 +0100)]
Fix module definition conflicts between LAPACK and ReLAPACK
for #2043
Martin Kroeker [Mon, 4 Mar 2019 14:08:31 +0000 (15:08 +0100)]
Merge pull request #2026 from martin-frbg/trmv_threads
Correct range limiting in trmv_thread and re-enable TRMV multithreading
Martin Kroeker [Mon, 4 Mar 2019 14:07:48 +0000 (15:07 +0100)]
Merge pull request #2038 from martin-frbg/issue2035
Improve handling of NO_STATIC and NO_SHARED
Martin Kroeker [Mon, 4 Mar 2019 14:07:14 +0000 (15:07 +0100)]
Merge pull request #2040 from martin-frbg/locks2002
Restore locking optimizations for OpenMP case
maomao194313 [Mon, 4 Mar 2019 08:48:49 +0000 (16:48 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:45:22 +0000 (16:45 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:41:21 +0000 (16:41 +0800)]
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 [Mon, 4 Mar 2019 08:30:50 +0000 (16:30 +0800)]
HiSilicon tsv110 CPUs optimization branch
add HiSilicon tsv110 CPUs optimization branch
Martin Kroeker [Sun, 3 Mar 2019 13:17:07 +0000 (14:17 +0100)]
Restore locking optimizations for OpenMP case
Restore another accidentally dropped part of #1468 that was missed in #2004, to address the performance regression reported in #1461.
Andrew [Sun, 3 Mar 2019 07:05:11 +0000 (09:05 +0200)]
Address warning introduced with #1814 et al.
Andrew [Sun, 3 Mar 2019 06:59:27 +0000 (08:59 +0200)]
init
Martin Kroeker [Sat, 2 Mar 2019 22:36:36 +0000 (23:36 +0100)]
Improve handling of NO_STATIC and NO_SHARED
to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422
Martin Kroeker [Fri, 1 Mar 2019 10:45:02 +0000 (11:45 +0100)]
Merge pull request #2037 from martin-frbg/issue2033-2
Make sure that AVX512 is disabled in 32bit builds
Martin Kroeker [Fri, 1 Mar 2019 08:23:03 +0000 (09:23 +0100)]
Make sure that AVX512 is disabled in 32bit builds
for #2033
Martin Kroeker [Thu, 28 Feb 2019 21:10:12 +0000 (22:10 +0100)]
Merge pull request #2034 from martin-frbg/issue2033
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
Martin Kroeker [Thu, 28 Feb 2019 09:51:54 +0000 (10:51 +0100)]
Keep xcode8.3 for osx BINARY=32 build
as Xcode 10 deprecated i386
Martin Kroeker [Thu, 28 Feb 2019 08:58:25 +0000 (09:58 +0100)]
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
fixes #2033
Martin Kroeker [Mon, 25 Feb 2019 16:58:31 +0000 (17:58 +0100)]
Fix AVX512 test always returning false due to missing compiler option
Martin Kroeker [Mon, 25 Feb 2019 16:55:36 +0000 (17:55 +0100)]
Fix missing -c option in AVX512 test
Martin Kroeker [Sun, 24 Feb 2019 18:50:23 +0000 (19:50 +0100)]
Merge pull request #2028 from brada4/mv
Move one of clobber fixes to right place
Andrew [Sun, 24 Feb 2019 18:41:02 +0000 (20:41 +0200)]
move fix to right place
Andrew [Sun, 24 Feb 2019 18:39:25 +0000 (20:39 +0200)]
init
Martin Kroeker [Wed, 20 Feb 2019 09:27:48 +0000 (10:27 +0100)]
Reduce list of kernels in the dynamic arch build
to make compilation complete reliably within the 1h limit again
Martin Kroeker [Tue, 19 Feb 2019 21:16:33 +0000 (22:16 +0100)]
Fix error introduced during cleanup
Martin Kroeker [Tue, 19 Feb 2019 20:03:30 +0000 (21:03 +0100)]
Allow multithreading TRMV again
Revert the workaround introduced for issue #1332, as the actual cause appears to be my incorrect fix from #1262 (see #1388).
Martin Kroeker [Tue, 19 Feb 2019 19:59:48 +0000 (20:59 +0100)]
Correct range_n limiting
Same bug as seen in #1388, somehow missed in the corresponding PR #1389.
Martin Kroeker [Sun, 17 Feb 2019 10:49:15 +0000 (11:49 +0100)]
Merge pull request #2024 from martin-frbg/gcc9fixes4
Fix inline assembly constraints in Bulldozer TRSM kernels
Martin Kroeker [Sun, 17 Feb 2019 10:48:57 +0000 (11:48 +0100)]
Merge pull request #2023 from martin-frbg/gcc9fixes3
Fix inline assembly constraints in various x86_64 GEMVN kernels
Martin Kroeker [Sun, 17 Feb 2019 10:36:04 +0000 (11:36 +0100)]
Merge pull request #1988 from TiborGY/patch-1
Reword/expand comments in Makefile.rule
TiborGY [Sat, 16 Feb 2019 22:26:13 +0000 (23:26 +0100)]
fix the the
Martin Kroeker [Sat, 16 Feb 2019 19:06:48 +0000 (20:06 +0100)]
Fix inline assembly constraints in Bulldozer TRSM kernels
Rework indices to allow marking i, as and bs as both input and output (operand n1 marked as well for simplicity). For #2009
Martin Kroeker [Sat, 16 Feb 2019 17:51:09 +0000 (18:51 +0100)]
Fix inline assembly constraints
Martin Kroeker [Sat, 16 Feb 2019 17:46:17 +0000 (18:46 +0100)]
Fix inline assembly constraints