platform/upstream/openblas.git
Martin Kroeker [Sat, 27 Apr 2019 20:49:04 +0000 (22:49 +0200)]
Correct length of name string in xerbla call

Martin Kroeker [Sat, 27 Apr 2019 20:45:47 +0000 (22:45 +0200)]
Merge pull request #2094 from martin-frbg/issue2066

Fix ReLAPACK integration problems

Martin Kroeker [Sat, 27 Apr 2019 17:06:00 +0000 (19:06 +0200)]
Add support for INTERFACE64 and fix XERBLA calls

1. Replaced all instances of "int" with "blasint"
2. Added string length as "hidden" third parameter in calls to fortran XERBLA
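The hidden length parameter from item 2 can be illustrated with a minimal C sketch. It assumes the common Fortran calling convention in which every CHARACTER argument is accompanied by an extra by-value length argument appended to the argument list. The names `xerbla_sketch`, `report_illegal_arg`, `last_message` and the `SKETCH_INTERFACE64` macro are hypothetical and exist only for this illustration; they are not OpenBLAS symbols.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: stand-in for the integer type the library selects at build time. */
#ifdef SKETCH_INTERFACE64
typedef long long blasint;
#else
typedef int blasint;
#endif

/* Hypothetical buffer that records the last formatted error message. */
static char last_message[128];

/* Hypothetical Fortran-style error handler. The routine name is not required
   to be NUL-terminated; its length arrives as the trailing third argument. */
static void xerbla_sketch(const char *name, const blasint *info, blasint name_len)
{
    snprintf(last_message, sizeof last_message,
             "On entry to %.*s parameter number %lld had an illegal value",
             (int)name_len, name, (long long)*info);
}

/* Caller side: pass the name length explicitly instead of relying on a
   terminator the callee may never look for. */
static void report_illegal_arg(const char *routine, blasint position)
{
    xerbla_sketch(routine, &position, (blasint)strlen(routine));
}
```

Passing a wrong length makes the callee read too few or too many characters of the name, which is the kind of problem a corrected length argument avoids.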

Martin Kroeker [Sat, 27 Apr 2019 16:55:47 +0000 (18:55 +0200)]
Support INTERFACE64=1

Martin Kroeker [Tue, 23 Apr 2019 18:12:06 +0000 (20:12 +0200)]
Merge pull request #2092 from jeffbaylor/snprintf_with_MSC_VER

snprintf define consolidated to common.h

Martin Kroeker [Tue, 23 Apr 2019 18:11:36 +0000 (20:11 +0200)]
Merge pull request #2072 from martin-frbg/sum

Add (C)BLAS extension ?sum

Jeff Baylor [Tue, 23 Apr 2019 00:01:34 +0000 (17:01 -0700)]
snprintf define consolidated to common.h

Martin Kroeker [Sun, 14 Apr 2019 19:40:07 +0000 (21:40 +0200)]
Merge pull request #2084 from RashmicaG/develop

Add in runtime CPU detection for POWER.

Rashmica Gupta [Tue, 9 Apr 2019 04:13:24 +0000 (14:13 +1000)]
Add in runtime CPU detection for POWER.

Martin Kroeker [Tue, 2 Apr 2019 19:40:58 +0000 (21:40 +0200)]
Merge pull request #2080 from martin-frbg/issue2075

Add -lm and disable EXPRECISION support on *BSD

Martin Kroeker [Tue, 2 Apr 2019 07:38:18 +0000 (09:38 +0200)]
Add -lm and disable EXPRECISION support on *BSD

fixes #2075

Martin Kroeker [Sun, 31 Mar 2019 20:12:23 +0000 (22:12 +0200)]
Add declarations for ?sum

Martin Kroeker [Sun, 31 Mar 2019 11:56:08 +0000 (13:56 +0200)]
Merge pull request #2073 from martin-frbg/issue2056-2

Detect 32bit environment on 64bit ARM hardware

Martin Kroeker [Sun, 31 Mar 2019 11:55:49 +0000 (13:55 +0200)]
Add ?sum definitions for generic kernel

Martin Kroeker [Sun, 31 Mar 2019 11:55:05 +0000 (13:55 +0200)]
Add ?sum

Martin Kroeker [Sun, 31 Mar 2019 09:57:01 +0000 (11:57 +0200)]
Add cmake defaults for ?sum kernels

Martin Kroeker [Sun, 31 Mar 2019 08:50:43 +0000 (10:50 +0200)]
Detect 32bit environment on 64bit ARM hardware

for #2056, using same approach as #2058

Martin Kroeker [Sat, 30 Mar 2019 21:49:05 +0000 (22:49 +0100)]
Add ZARCH implementation of ?sum

as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed

Martin Kroeker [Sat, 30 Mar 2019 21:27:04 +0000 (22:27 +0100)]
Add x86_64 implementation of ?sum

as trivial copy of ?asum with the fabs calls removed
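The relationship between ?asum and ?sum described in these commits can be sketched in plain C. This is an illustration only and not the assembly kernel the commit touches; the function names `dasum_sketch` and `dsum_sketch` are hypothetical, and the early return for non-positive `n` or `incx` is an assumption about how such a routine would treat degenerate input.

```c
#include <math.h>
#include <stddef.h>

/* Sum of absolute values over a strided vector (the ?asum idea). */
static double dasum_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0) return acc;
    for (ptrdiff_t i = 0; i < n; i++) {
        acc += fabs(x[i * incx]);
    }
    return acc;
}

/* Same loop with the fabs call removed: a plain signed sum (the ?sum idea). */
static double dsum_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    double acc = 0.0;
    if (n <= 0 || incx <= 0) return acc;
    for (ptrdiff_t i = 0; i < n; i++) {
        acc += x[i * incx];
    }
    return acc;
}
```

For the vector {1, -2, 3, -4} the first function yields 10 and the second yields -2, which shows why the two routines cannot substitute for each other.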

Martin Kroeker [Sat, 30 Mar 2019 21:26:10 +0000 (22:26 +0100)]
Add x86 implementation of ?sum

as trivial copy of ?asum with the fabs calls removed

Martin Kroeker [Sat, 30 Mar 2019 21:25:06 +0000 (22:25 +0100)]
Add SPARC implementation of ?sum

as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure

Martin Kroeker [Sat, 30 Mar 2019 21:23:42 +0000 (22:23 +0100)]
Add POWER implementation of ?sum

as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure

Martin Kroeker [Sat, 30 Mar 2019 21:22:15 +0000 (22:22 +0100)]
Add MIPS64 implementation of ?sum

as trivial copy of ?asum with the fabs replaced by mov to preserve code structure

Martin Kroeker [Sat, 30 Mar 2019 21:20:14 +0000 (22:20 +0100)]
Add MIPS implementation of ?sum

as trivial copy of ?asum with the fabs calls removed

Martin Kroeker [Sat, 30 Mar 2019 21:18:03 +0000 (22:18 +0100)]
Add ia64 implementation of ?sum

as trivial copy of ?asum with the fabs calls removed

Martin Kroeker [Sat, 30 Mar 2019 21:13:36 +0000 (22:13 +0100)]
Add ARM64 implementations of ?sum

as trivial copies of the respective ?asum kernels with the fabs calls removed

Martin Kroeker [Sat, 30 Mar 2019 21:11:38 +0000 (22:11 +0100)]
Add ARM implementations of ?sum

(trivial copies of the respective ?asum with the fabs calls removed)

Martin Kroeker [Sat, 30 Mar 2019 21:05:11 +0000 (22:05 +0100)]
Add implementations of ssum/dsum and csum/zsum

as trivial copies of asum/zasum with the fabs calls replaced by fmov to preserve code structure

Martin Kroeker [Sat, 30 Mar 2019 21:01:13 +0000 (22:01 +0100)]
Add ?sum

Martin Kroeker [Sat, 30 Mar 2019 20:59:18 +0000 (21:59 +0100)]
Add interface for ?sum (derived from ?asum)

Martin Kroeker [Sat, 30 Mar 2019 20:58:03 +0000 (21:58 +0100)]
Add declarations for ?sum and cblas_?sum

Martin Kroeker [Sat, 30 Mar 2019 20:21:38 +0000 (21:21 +0100)]
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1

Disable the AVX512 DGEMM kernel (again)

Martin Kroeker [Sat, 30 Mar 2019 13:54:28 +0000 (14:54 +0100)]
Merge pull request #2071 from martin-frbg/issue2068

Provide CBLAS interfaces to I?MIN and I?MAX
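What an index-of-extreme-value routine computes can be sketched as follows. The sketch rests on two assumptions that the log does not confirm: that the result is a 0-based index, as is conventional for C-style BLAS wrappers, and that I?MAX and I?MIN compare signed values rather than absolute values. The names `idmax_sketch` and `idmin_sketch` are hypothetical.

```c
#include <stddef.h>

/* Assumption: returns the 0-based index of the first largest element,
   comparing signed values. Returns 0 for empty or invalid input. */
static size_t idmax_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    ptrdiff_t best = 0;
    if (n <= 0 || incx <= 0) return 0;
    for (ptrdiff_t i = 1; i < n; i++) {
        if (x[i * incx] > x[best * incx]) best = i;
    }
    return (size_t)best;
}

/* Assumption: same convention, but for the first smallest element. */
static size_t idmin_sketch(ptrdiff_t n, const double *x, ptrdiff_t incx)
{
    ptrdiff_t best = 0;
    if (n <= 0 || incx <= 0) return 0;
    for (ptrdiff_t i = 1; i < n; i++) {
        if (x[i * incx] < x[best * incx]) best = i;
    }
    return (size_t)best;
}
```

The strict comparisons keep the first occurrence when several elements tie, which is a design choice of this sketch rather than a documented property of the library.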

Martin Kroeker [Sat, 30 Mar 2019 11:38:41 +0000 (12:38 +0100)]
Build CBLAS interfaces for I?MIN and I?MAX

Martin Kroeker [Sat, 30 Mar 2019 11:37:13 +0000 (12:37 +0100)]
Expose CBLAS interfaces for I?MIN and I?MAX

Martin Kroeker [Fri, 29 Mar 2019 20:46:21 +0000 (21:46 +0100)]
Merge pull request #2070 from quickwritereader/develop

power9 makefile. dgemm based on power8 kernel with following changes …

Martin Kroeker [Fri, 29 Mar 2019 18:36:29 +0000 (19:36 +0100)]
Merge branch 'develop' into develop

AbdelRauf [Thu, 14 Mar 2019 10:42:04 +0000 (10:42 +0000)]
Add power9 makefile. dgemm is based on the power8 kernel with the following changes: 32x unrolled 16x4 kernel and 8x4 kernel using lxv/stxv butterfly rank-1 update. Performance improves from 17 to 22-23 GFLOPS. dtrmm cases were added into dgemm itself.

Martin Kroeker [Mon, 25 Mar 2019 20:34:30 +0000 (21:34 +0100)]
Merge pull request #2069 from aixoss/aix-asm-change

AIX asm syntax changes needed for shared object creation

Ayappan P [Mon, 25 Mar 2019 13:23:25 +0000 (18:53 +0530)]
AIX asm syntax changes needed for shared object creation

Martin Kroeker [Tue, 19 Mar 2019 21:12:51 +0000 (22:12 +0100)]
Merge pull request #2064 from embray/cygwin/use-tls-thread-memory-cleanup

Fix for #2063

Erik M. Bray [Tue, 19 Mar 2019 09:22:02 +0000 (10:22 +0100)]
Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.

Erik M. Bray [Mon, 18 Mar 2019 19:32:48 +0000 (20:32 +0100)]
Fix for #2063: The DllMain used in Cygwin did not run the thread memory
pool cleanup upon THREAD_DETACH which is needed when compiled with
USE_TLS=1.

Martin Kroeker [Sat, 16 Mar 2019 10:57:23 +0000 (11:57 +0100)]
Merge pull request #2058 from xsacha/patch-3

Change 64-bit detection as explained in #2056

Martin Kroeker [Sat, 16 Mar 2019 10:56:51 +0000 (11:56 +0100)]
Merge pull request #2060 from embray/cygwin/readenv

Use POSIX getenv on Cygwin

Erik M. Bray [Fri, 15 Mar 2019 14:06:30 +0000 (15:06 +0100)]
Use POSIX getenv on Cygwin

The Windows-native GetEnvironmentVariable cannot be relied on, as
Cygwin does not always copy environment variables set through Cygwin
to the Windows environment block, particularly after fork().
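Reading a setting through the POSIX environment, as this commit prefers on Cygwin, can be sketched as below. The helper names `parse_int_or` and `read_env_int` are hypothetical, and the fallback behaviour for missing or malformed values is an assumption made for this illustration, not a description of the library's own parsing.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert text to int, or return fallback when the
   text is missing, empty, not a whole decimal number, or out of range. */
static int parse_int_or(const char *text, int fallback)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX) {
        return fallback;
    }
    return (int)value;
}

/* Hypothetical helper: look the variable up with POSIX getenv, which sees
   variables set inside the Cygwin process even after fork(). */
static int read_env_int(const char *name, int fallback)
{
    return parse_int_or(getenv(name), fallback);
}
```

A call such as `read_env_int("OPENBLAS_NUM_THREADS", 1)` would then return the configured value, or 1 when the variable is unset or malformed.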

Martin Kroeker [Wed, 13 Mar 2019 21:10:28 +0000 (22:10 +0100)]
Disable the AVX512 DGEMM kernel (again)

Due to as yet unresolved errors seen in #1955 and #2029

Martin Kroeker [Wed, 13 Mar 2019 18:20:23 +0000 (19:20 +0100)]
Trivial typo fix

as suggested in #2022

Sacha [Wed, 13 Mar 2019 13:21:54 +0000 (23:21 +1000)]
Change 64-bit detection as explained in #2056

Martin Kroeker [Tue, 12 Mar 2019 21:57:39 +0000 (22:57 +0100)]
Merge pull request #2042 from maomao194313/develop

add TARGET support for HiSilicon tsv110 CPUs

Martin Kroeker [Tue, 12 Mar 2019 21:57:07 +0000 (22:57 +0100)]
Merge pull request #2055 from martin-frbg/atomid

Add CPUID data for Intel Denverton (as Nehalem)

Martin Kroeker [Tue, 12 Mar 2019 15:09:55 +0000 (16:09 +0100)]
Add Intel Denverton

Martin Kroeker [Tue, 12 Mar 2019 15:03:56 +0000 (16:03 +0100)]
Add Intel Denverton

for #2048

maomao194313 [Tue, 12 Mar 2019 08:11:01 +0000 (16:11 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110

maomao194313 [Tue, 12 Mar 2019 08:05:19 +0000 (16:05 +0800)]
make DYNAMIC_ARCH=1 package work on TSV110.

Martin Kroeker [Sat, 9 Mar 2019 15:39:35 +0000 (16:39 +0100)]
Merge pull request #2051 from martin-frbg/issue2048

Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1

Martin Kroeker [Sat, 9 Mar 2019 15:39:08 +0000 (16:39 +0100)]
Merge pull request #2050 from kencu/PowerMacFix

PowerMac 970 fixes

Martin Kroeker [Sat, 9 Mar 2019 10:21:16 +0000 (11:21 +0100)]
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1

for issue #2048

ken-cunningham-webuse [Thu, 7 Mar 2019 19:41:58 +0000 (11:41 -0800)]
common_power.h: force DCBT_ARG 0 on PPC970 Darwin

without this, we see
../kernel/power/gemv_n.S:427:Parameter syntax error
and many more similar entries

that relates to this assembly command
dcbt 8, r24, r18

this change makes the DCBT_ARG = 0
and openblas builds through to completion on PowerMac 970
Tests pass

ken-cunningham-webuse [Thu, 7 Mar 2019 19:36:35 +0000 (11:36 -0800)]
param.h : enable defines for PPC970 on DarwinOS

fixes:
gemm.c: In function 'sgemm_':
../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function)
 #define SGEMM_P  SGEMM_DEFAULT_P
                  ^

Martin Kroeker [Thu, 7 Mar 2019 18:28:06 +0000 (19:28 +0100)]
Merge pull request #2049 from Celelibi/fix_crash_sgemm_sse_x64

Fix crash in sgemm SSE/nano kernel on x86_64

Celelibi [Thu, 7 Mar 2019 15:39:41 +0000 (16:39 +0100)]
Fix crash in sgemm SSE/nano kernel on x86_64

Fix bug #2047.

Signed-off-by: Celelibi <celelibi@gmail.com>

Martin Kroeker [Thu, 7 Mar 2019 13:51:41 +0000 (14:51 +0100)]
Merge pull request #2046 from kencu/powermac

ctest.c : add __POWERPC__ for PowerMac

ken-cunningham-webuse [Thu, 7 Mar 2019 04:55:06 +0000 (20:55 -0800)]
ctest.c : add __POWERPC__ for PowerMac

Martin Kroeker [Wed, 6 Mar 2019 21:40:26 +0000 (22:40 +0100)]
Merge pull request #2045 from martin-frbg/2033-3

Do not compile in AVX512 check if AVX support is disabled

Martin Kroeker [Tue, 5 Mar 2019 15:04:25 +0000 (16:04 +0100)]
Do not compile in AVX512 check if AVX support is disabled

The xgetbv function depends on NO_AVX being undefined. We could change that too, but that combination is unlikely to work anyway.

Martin Kroeker [Tue, 5 Mar 2019 11:11:32 +0000 (12:11 +0100)]
Merge pull request #2044 from martin-frbg/issue2043

Fix module definition conflicts between LAPACK and ReLAPACK

Martin Kroeker [Tue, 5 Mar 2019 11:11:15 +0000 (12:11 +0100)]
Merge pull request #2039 from brada4/meminit

Address warning in memory.c

Martin Kroeker [Mon, 4 Mar 2019 20:17:08 +0000 (21:17 +0100)]
Fix module definition conflicts between LAPACK and ReLAPACK

for #2043

Martin Kroeker [Mon, 4 Mar 2019 14:08:31 +0000 (15:08 +0100)]
Merge pull request #2026 from martin-frbg/trmv_threads

Correct range limiting in trmv_thread and re-enable TRMV multithreading

Martin Kroeker [Mon, 4 Mar 2019 14:07:48 +0000 (15:07 +0100)]
Merge pull request #2038 from martin-frbg/issue2035

Improve handling of NO_STATIC and NO_SHARED

Martin Kroeker [Mon, 4 Mar 2019 14:07:14 +0000 (15:07 +0100)]
Merge pull request #2040 from martin-frbg/locks2002

Restore locking optimizations for OpenMP case

maomao194313 [Mon, 4 Mar 2019 08:48:49 +0000 (16:48 +0800)]
add TARGET support for HiSilicon tsv110 CPUs

maomao194313 [Mon, 4 Mar 2019 08:45:22 +0000 (16:45 +0800)]
add TARGET support for HiSilicon tsv110 CPUs

maomao194313 [Mon, 4 Mar 2019 08:41:21 +0000 (16:41 +0800)]
add TARGET support for HiSilicon tsv110 CPUs

maomao194313 [Mon, 4 Mar 2019 08:30:50 +0000 (16:30 +0800)]
HiSilicon tsv110 CPUs optimization branch

add HiSilicon tsv110 CPUs optimization branch

Martin Kroeker [Sun, 3 Mar 2019 13:17:07 +0000 (14:17 +0100)]
Restore locking optimizations for OpenMP case

restore another accidentally dropped part of #1468 that was missed in #2004 to address performance regression reported in #1461

Andrew [Sun, 3 Mar 2019 07:05:11 +0000 (09:05 +0200)]
address warning introduced with #1814 et al

Andrew [Sun, 3 Mar 2019 06:59:27 +0000 (08:59 +0200)]
init

Martin Kroeker [Sat, 2 Mar 2019 22:36:36 +0000 (23:36 +0100)]
Improve handling of NO_STATIC and NO_SHARED

to avoid surprises from defining either as zero. Fixes #2035 by addressing some concerns from #1422

Martin Kroeker [Fri, 1 Mar 2019 10:45:02 +0000 (11:45 +0100)]
Merge pull request #2037 from martin-frbg/issue2033-2

Make sure that AVX512 is disabled in 32bit builds

Martin Kroeker [Fri, 1 Mar 2019 08:23:03 +0000 (09:23 +0100)]
Make sure that AVX512 is disabled in 32bit builds

for #2033

Martin Kroeker [Thu, 28 Feb 2019 21:10:12 +0000 (22:10 +0100)]
Merge pull request #2034 from martin-frbg/issue2033

Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX

Martin Kroeker [Thu, 28 Feb 2019 09:51:54 +0000 (10:51 +0100)]
Keep xcode8.3 for osx BINARY=32 build

as xcode10 deprecated i386

Martin Kroeker [Thu, 28 Feb 2019 08:58:25 +0000 (09:58 +0100)]
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX

fixes #2033

Martin Kroeker [Mon, 25 Feb 2019 16:58:31 +0000 (17:58 +0100)]
Fix AVX512 test always returning false due to missing compiler option

Martin Kroeker [Mon, 25 Feb 2019 16:55:36 +0000 (17:55 +0100)]
Fix missing -c option in AVX512 test

Martin Kroeker [Sun, 24 Feb 2019 18:50:23 +0000 (19:50 +0100)]
Merge pull request #2028 from brada4/mv

Move one of clobber fixes to right place

Andrew [Sun, 24 Feb 2019 18:41:02 +0000 (20:41 +0200)]
move fix to right place

Andrew [Sun, 24 Feb 2019 18:39:25 +0000 (20:39 +0200)]
init

Martin Kroeker [Wed, 20 Feb 2019 09:27:48 +0000 (10:27 +0100)]
Reduce list of kernels in the dynamic arch build

to make compilation complete reliably within the 1h limit again

Martin Kroeker [Tue, 19 Feb 2019 21:16:33 +0000 (22:16 +0100)]
Fix error introduced during cleanup

Martin Kroeker [Tue, 19 Feb 2019 20:03:30 +0000 (21:03 +0100)]
Allow multithreading TRMV again

revert workaround introduced for issue #1332 as the actual cause appears to be my incorrect fix from #1262 (see #1388)

Martin Kroeker [Tue, 19 Feb 2019 19:59:48 +0000 (20:59 +0100)]
Correct range_n limiting

same bug as seen in #1388, somehow missed in corresponding PR #1389

Martin Kroeker [Sun, 17 Feb 2019 10:49:15 +0000 (11:49 +0100)]
Merge pull request #2024 from martin-frbg/gcc9fixes4

Fix inline assembly constraints in Bulldozer TRSM kernels

Martin Kroeker [Sun, 17 Feb 2019 10:48:57 +0000 (11:48 +0100)]
Merge pull request #2023 from martin-frbg/gcc9fixes3

Fix inline assembly constraints in various x86_64 GEMVN kernels

Martin Kroeker [Sun, 17 Feb 2019 10:36:04 +0000 (11:36 +0100)]
Merge pull request #1988 from TiborGY/patch-1

Reword/expand comments in Makefile.rule

TiborGY [Sat, 16 Feb 2019 22:26:13 +0000 (23:26 +0100)]
fix the the

Martin Kroeker [Sat, 16 Feb 2019 19:06:48 +0000 (20:06 +0100)]
Fix inline assembly constraints in Bulldozer TRSM kernels

Rework indices to allow marking i, as and bs as both input and output (operand n1 is marked as well for simplicity). For #2009
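Why operands that the assembly modifies must be declared as both input and output can be shown with a small GCC-style extended asm sketch. This is a generic illustration, not the Bulldozer TRSM kernel; the function name `sum_i64_sketch` and its operand names are hypothetical, and the assembly path assumes an x86_64 target with a GCC-compatible compiler using the default AT&T syntax. Other targets take the plain C fallback.

```c
#include <stdint.h>

/* Hypothetical example: add up count 64-bit integers. */
static int64_t sum_i64_sketch(const int64_t *values, int64_t count)
{
    int64_t total = 0;
    if (count <= 0) return 0;
#if defined(__x86_64__) && defined(__GNUC__)
    {
        const int64_t *cursor = values;
        int64_t remaining = count;
        /* The loop advances cursor and decrements remaining, so both are
           declared read-write with "+r". Listing them as inputs only would
           let the compiler assume their registers still hold the old values. */
        __asm__ volatile(
            "1:\n\t"
            "addq (%[cursor]), %[total]\n\t"
            "addq $8, %[cursor]\n\t"
            "decq %[remaining]\n\t"
            "jnz 1b\n\t"
            : [total] "+r"(total), [cursor] "+r"(cursor), [remaining] "+r"(remaining)
            :
            : "cc", "memory");
    }
#else
    /* Portable fallback for other targets and compilers. */
    for (int64_t i = 0; i < count; i++) total += values[i];
#endif
    return total;
}
```

The "memory" clobber is included because the assembly reads the array through a pointer that is not itself described as a memory operand.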

Martin Kroeker [Sat, 16 Feb 2019 17:51:09 +0000 (18:51 +0100)]
Fix inline assembly constraints