Andrew [Thu, 16 Mar 2017 12:13:31 +0000 (13:13 +0100)]
Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
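The messages above do not say which code path leaked. The sketch below is a hypothetical illustration of the usual pattern behind such a fix: an in-place transpose of a column-major complex matrix done through a temporary buffer, where the buffer is released on every path that allocated it. `zimatcopy_t_sketch` is an invented name, not the OpenBLAS interface.

```c
#include <complex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (not the OpenBLAS source): scale and transpose a
 * column-major rows x cols complex matrix in place via a temporary buffer.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
static int zimatcopy_t_sketch(size_t rows, size_t cols, double complex alpha,
                              double complex *a)
{
    size_t n = rows * cols;
    if (n == 0)
        return 0;                    /* early return before any allocation */

    double complex *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;                   /* nothing was allocated, nothing to free */

    /* a(i,j) lives at a[i + j*rows]; its transpose goes to tmp[j + i*cols]. */
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            tmp[i * cols + j] = alpha * a[j * rows + i];

    memcpy(a, tmp, n * sizeof *tmp);
    free(tmp);                       /* released on the success path as well */
    return 0;
}
```

Any later early return added between `malloc` and `free` would need its own `free(tmp)`; that is the kind of path an "unlikely" leak tends to hide in.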
Martin Kroeker [Wed, 15 Mar 2017 09:00:52 +0000 (10:00 +0100)]
Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
Martin Kroeker [Tue, 14 Mar 2017 16:17:39 +0000 (17:17 +0100)]
Merge pull request #1126 from martin-frbg/pgi
Fix compilation with PGI by replacing the GNU `__real__`, `__imag__` extensions and updating macro definitions for modern, C99-capable versions of the PGI compiler
Martin Kroeker [Mon, 13 Mar 2017 17:08:00 +0000 (18:08 +0100)]
Update zdot.c
Martin Kroeker [Mon, 13 Mar 2017 16:49:07 +0000 (17:49 +0100)]
Update zdot.c
Martin Kroeker [Sun, 12 Mar 2017 23:40:11 +0000 (00:40 +0100)]
Replace GNU `__real__`, `__imag__` extensions in initializers
Martin Kroeker [Sun, 12 Mar 2017 23:38:37 +0000 (00:38 +0100)]
Replace GNU `__real__`, `__imag__` extensions in initializers
Martin Kroeker [Sun, 12 Mar 2017 23:36:01 +0000 (00:36 +0100)]
Fix CREAL, CIMAG macros for PGI
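GCC's `__real__` and `__imag__` are GNU C operators (usable even as lvalues) that other compilers need not accept. A minimal sketch of the portable C99 alternative, assuming `<complex.h>` is available: read parts with `creal()`/`cimag()` and build values as `re + im * I`. The macro and function names are hypothetical, not OpenBLAS's actual CREAL/CIMAG definitions.

```c
#include <complex.h>

/* Hypothetical portable accessors built on C99 creal()/cimag(). */
#define CREAL_SKETCH(z) creal(z)
#define CIMAG_SKETCH(z) cimag(z)

/* Build a complex value without assigning through __real__/__imag__. */
static double complex make_complex(double re, double im)
{
    return re + im * I;
}
```

For infinite or NaN parts, `re + im * I` can produce surprising results; C11's `CMPLX(re, im)` exists for that case but may not be available on older compilers.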
Abdurrauf [Sun, 12 Mar 2017 21:23:16 +0000 (01:23 +0400)]
strmm and ctrmm
Martin Kroeker [Fri, 10 Mar 2017 11:58:38 +0000 (12:58 +0100)]
Merge pull request #1124 from martin-frbg/c_check-ppc
Update c_check.cmake to label ppc64 as power ARCH
Martin Kroeker [Fri, 10 Mar 2017 10:45:48 +0000 (11:45 +0100)]
Update c_check.cmake
Martin Kroeker [Fri, 10 Mar 2017 08:51:34 +0000 (09:51 +0100)]
Merge pull request #1122 from martin-frbg/zlasyf
Fix misspelling of zlasyf_aa from previous commit
Martin Kroeker [Fri, 10 Mar 2017 07:44:49 +0000 (08:44 +0100)]
Fix misspelling of zlasyf_aa from previous commit
Martin Kroeker [Fri, 10 Mar 2017 07:33:36 +0000 (08:33 +0100)]
Merge pull request #1121 from staticfloat/sf/Xsymv_export
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
Elliot Saba [Thu, 9 Mar 2017 23:30:43 +0000 (15:30 -0800)]
Whitespace cleanup/reformatting
Elliot Saba [Thu, 9 Mar 2017 23:22:40 +0000 (15:22 -0800)]
Add `csymv` and `zsymv` into `@lapackobjs2` for exporting
Abdurrauf [Mon, 6 Mar 2017 00:27:40 +0000 (04:27 +0400)]
initial strmm(sgemm). not tuned yet
Martin Kroeker [Thu, 2 Mar 2017 17:43:59 +0000 (18:43 +0100)]
Merge pull request #1111 from martin-frbg/kaby-no-avx
Fix core detection for Kaby Lake without AVX (G4560)
Martin Kroeker [Thu, 2 Mar 2017 16:36:16 +0000 (17:36 +0100)]
Fix core detection for Kaby Lake without AVX (G4560)
Should fix #1109.
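A sketch of the check such a fix needs, assuming the raw register values are already obtained: AVX kernels are safe only if CPUID leaf 1 reports both AVX (ECX bit 28) and OSXSAVE (ECX bit 27), and XGETBV shows the OS saves SSE and YMM state (XCR0 bits 1 and 2). The helper names and the HASWELL/NEHALEM fallback mapping are illustrative assumptions, not the detection code from this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: values are passed in so the logic is testable
 * without executing CPUID/XGETBV. */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t OSXSAVE = 1u << 27;  /* OS enabled XSAVE; XGETBV allowed */
    const uint32_t AVX     = 1u << 28;  /* CPU implements AVX */
    const uint64_t SSE_YMM = 0x6;       /* XCR0 bit 1 (SSE) and bit 2 (YMM) */

    if ((cpuid1_ecx & (OSXSAVE | AVX)) != (OSXSAVE | AVX))
        return false;                   /* e.g. a Pentium G4560: no AVX */
    return (xcr0 & SSE_YMM) == SSE_YMM;
}

/* Hypothetical kernel choice for a Kaby Lake core. */
static const char *kaby_lake_kernel(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    return avx_usable(cpuid1_ecx, xcr0) ? "HASWELL" : "NEHALEM";
}
```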
Martin Kroeker [Wed, 1 Mar 2017 22:08:07 +0000 (23:08 +0100)]
Merge pull request #1110 from quickwritereader/develop
Conventional usage of the register save area.
Abdurrauf [Tue, 28 Feb 2017 23:13:57 +0000 (03:13 +0400)]
conventional usage of the register save area
Abdurrauf [Tue, 28 Feb 2017 23:13:21 +0000 (03:13 +0400)]
changed to conventional register save area
Martin Kroeker [Tue, 28 Feb 2017 15:02:19 +0000 (16:02 +0100)]
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
Ashwin Sekhar T K [Tue, 28 Feb 2017 14:03:26 +0000 (14:03 +0000)]
THUNDERX2T99: Performance fix for ZGEMM
Ashwin Sekhar T K [Tue, 28 Feb 2017 09:11:38 +0000 (01:11 -0800)]
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
Ashwin Sekhar T K [Mon, 27 Feb 2017 11:22:50 +0000 (11:22 +0000)]
THUNDERX2T99: Add Optimized ZGEMM Implementation
Martin Kroeker [Sun, 26 Feb 2017 08:49:01 +0000 (09:49 +0100)]
Merge pull request #1107 from quickwritereader/develop
ztrmm(zgemm) complex double precision kernel for ibm z13
Abdurrauf [Sun, 26 Feb 2017 02:17:33 +0000 (06:17 +0400)]
Merge branch 'z13' into develop
Abdurrauf [Sun, 26 Feb 2017 01:59:24 +0000 (05:59 +0400)]
ztrmm kernel.
Martin Kroeker [Thu, 23 Feb 2017 19:00:22 +0000 (20:00 +0100)]
Merge pull request #915 from mdong/small_fix_for_icc
remove input from clobbered list
Ashwin Sekhar T K [Wed, 22 Feb 2017 10:26:51 +0000 (02:26 -0800)]
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
Martin Kroeker [Wed, 22 Feb 2017 21:42:52 +0000 (22:42 +0100)]
Merge pull request #1105 from martin-frbg/testing-eig-typos
TESTING/EIG: fix spurious EXTERNAL references to nonexistent functions
Martin Kroeker [Wed, 22 Feb 2017 20:48:35 +0000 (21:48 +0100)]
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) functions ZHETRD_SY2SB and ZHETRD_SB2ST from comment and EXTERNAL declaration
Martin Kroeker [Wed, 22 Feb 2017 20:45:27 +0000 (21:45 +0100)]
Remove spurious names from EXTERNAL list
Remove unused (and nonexistent) ZHETRD_SY2SB and ZHETRD_SB2ST
Martin Kroeker [Wed, 22 Feb 2017 20:41:07 +0000 (21:41 +0100)]
Fix typo in EXTERNAL declaration
ZHBTRD_HB2ST should be ZHETRD_HB2ST
Martin Kroeker [Wed, 22 Feb 2017 09:31:39 +0000 (10:31 +0100)]
Merge pull request #1104 from martin-frbg/lapack-comma
LAPACK: fix missing comma on continued lines
Martin Kroeker [Wed, 22 Feb 2017 07:40:39 +0000 (08:40 +0100)]
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
Martin Kroeker [Wed, 22 Feb 2017 07:39:06 +0000 (08:39 +0100)]
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
Martin Kroeker [Wed, 22 Feb 2017 07:34:20 +0000 (08:34 +0100)]
Fix missing comma on continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
Martin Kroeker [Wed, 22 Feb 2017 07:32:20 +0000 (08:32 +0100)]
Fix missing comma in continued line
EXTERNAL declaration of subroutines missed a comma before the continuation line,
causing a strange run-together name to appear in the object when compiled with ifort.
Ashwin Sekhar T K [Wed, 22 Feb 2017 05:42:32 +0000 (21:42 -0800)]
ARM64: Remove unused code
Martin Kroeker [Tue, 21 Feb 2017 21:58:30 +0000 (22:58 +0100)]
Merge pull request #1103 from vladimir-ch/fix-lapacke-ormbr
LAPACKE: fix wrong matrix size in ?ormbr
Vladimir Chalupecky [Tue, 21 Feb 2017 20:57:18 +0000 (21:57 +0100)]
LAPACKE: fix wrong matrix size in ?ormbr
Changes made upstream in Reference LAPACK in
https://github.com/Reference-LAPACK/lapack/pull/128
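For reference, the Fortran ?ORMBR documentation gives A a shape that depends on both VECT and SIDE, with nq = m for SIDE = 'L' and nq = n for SIDE = 'R': A is nq x min(nq, k) when VECT = 'Q', and min(nq, k) x nq when VECT = 'P'. A row-major wrapper must size its transposed copy accordingly. The helper below is a hypothetical sketch of that rule, not the LAPACKE code.

```c
/* Hypothetical helper: logical rows x cols of A in ?ormbr. */
typedef struct { int rows, cols; } shape;

static shape ormbr_a_shape(char vect, char side, int m, int n, int k)
{
    int nq = (side == 'L' || side == 'l') ? m : n;
    int r = nq < k ? nq : k;                 /* min(nq, k) */
    shape s;
    if (vect == 'Q' || vect == 'q') {
        s.rows = nq;
        s.cols = r;
    } else {                                 /* vect == 'P' */
        s.rows = r;
        s.cols = nq;
    }
    return s;
}
```

In row-major layout, `lda` must then be at least `cols` rather than `rows`.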
Martin Kroeker [Tue, 21 Feb 2017 14:26:14 +0000 (15:26 +0100)]
Merge pull request #1098 from martin-frbg/amodra-power8
Power8 inline assembly fixes
Martin Kroeker [Tue, 21 Feb 2017 14:19:56 +0000 (15:19 +0100)]
Merge pull request #1101 from martin-frbg/martin-frbg-patch-1
LAPACKE: fix wrong number of columns in ?ormlq
Ashwin Sekhar T K [Tue, 21 Feb 2017 11:25:00 +0000 (03:25 -0800)]
THUNDERX2T99: Add Optimized C/Z DOT Implementation
Ashwin Sekhar T K [Mon, 20 Feb 2017 07:12:27 +0000 (23:12 -0800)]
THUNDERX2T99: Add Optimized SDOT Implementation
Martin Kroeker [Tue, 21 Feb 2017 07:26:39 +0000 (08:26 +0100)]
Merge pull request #1102 from brada4/develop
Correct Apollo Lake CPUID identification in dynamic_arch builds
Andrew [Mon, 20 Feb 2017 22:54:59 +0000 (23:54 +0100)]
detect apollo lake for real
Martin Kroeker [Mon, 20 Feb 2017 15:20:43 +0000 (16:20 +0100)]
LAPACKE: fix wrong number of columns in ?ormlq
Copied from Reference LAPACK https://github.com/Reference-LAPACK/lapack/pull/127 by vladimir-ch, together with the earlier changes from echeresh's
PR 115 ("lapacke_*ormlq_work: move declarations under if"), as they touched some of the same files.
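In Fortran ?ORMLQ, A holds k reflectors and has dimension (LDA, M) for SIDE = 'L' or (LDA, N) for SIDE = 'R'. A row-major wrapper that transposes A into column-major scratch therefore has to copy that SIDE-dependent number of columns. The helpers below are a hypothetical sketch of this logic, not the LAPACKE source.

```c
#include <stddef.h>

/* Hypothetical: number of columns of the k x nq matrix A in ?ormlq. */
static int ormlq_a_cols(char side, int m, int n)
{
    return (side == 'L' || side == 'l') ? m : n;
}

/* Copy a row-major rows x cols matrix (lda >= cols) into column-major
 * storage (ldt >= rows). */
static void rowmajor_to_colmajor(int rows, int cols, const double *a, int lda,
                                 double *t, int ldt)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            t[i + (size_t)j * ldt] = a[(size_t)i * lda + j];
}
```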
Ashwin Sekhar T K [Mon, 20 Feb 2017 07:11:50 +0000 (23:11 -0800)]
THUNDERX2T99: Bug fix in C/Z IAMAX
Ashwin Sekhar T K [Fri, 17 Feb 2017 11:06:32 +0000 (03:06 -0800)]
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
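A reference oracle helps when validating an optimized complex IAMAX. Per the reference BLAS semantics, I?AMAX on complex data ranks elements by |Re| + |Im| (not the true modulus), returns the 1-based index of the first maximum, and returns 0 for n < 1 or a non-positive increment. This sketch assumes interleaved (re, im) storage, with `incx` counted in complex elements.

```c
#include <math.h>
#include <stddef.h>

/* Reference-style IZAMAX. */
static size_t izamax_ref(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0)
        return 0;
    size_t best = 1;
    double best_val = fabs(x[0]) + fabs(x[1]);
    for (size_t i = 1; i < n; i++) {
        const double *z = x + 2 * i * incx;
        double v = fabs(z[0]) + fabs(z[1]);
        if (v > best_val) {          /* strict '>' keeps the first maximum */
            best_val = v;
            best = i + 1;
        }
    }
    return best;
}
```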
Martin Kroeker [Fri, 17 Feb 2017 09:30:09 +0000 (10:30 +0100)]
Merge pull request #1091 from staticfloat/sf/corei5_7600k
CPUID mappings for Core i5-7600K (Kaby Lake)
Ashwin Sekhar T K [Tue, 14 Feb 2017 12:10:06 +0000 (04:10 -0800)]
THUNDERX2T99: Add parallel SCNRM2 Implementation
Martin Kroeker [Mon, 13 Feb 2017 22:38:50 +0000 (23:38 +0100)]
Power8 inline assembly fixes
Quoting patch author amodra from #1078
Lots of issues here.
- The vsx regs weren't listed as clobbered.
- Poor choice of vsx regs, which along with the lack of clobbers led to
trashing v0..v21 and fr14..fr23. Ideally you'd let gcc choose all
temp vsx regs, but asms currently have a limit of 30 i/o parms.
- Other regs were clobbered unnecessarily, seemingly in an attempt to
clobber inputs, with gcc-7 complaining about the clobber of r2.
(Changed inputs should be also listed as outputs or as an i/o.)
- "r" constraint used instead of "b" for gprs used in insns where the
r0 encoding means zero rather than r0.
- There were unused asm inputs too.
- All memory was clobbered rather than hooking up memory outputs with
proper memory constraints, and that and the lack of proper memory
input constraints meant the asms needed to be volatile and their
containing function noinline.
- Some parameters were being passed unnecessarily via memory.
- When a copy of a
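Several of the quoted rules apply to GCC extended asm on any architecture. The sketch below is a hypothetical x86-64 illustration, not the POWER8 code from the patch. A register the asm modifies is declared as an in/out operand ("+r") rather than clobbered, and the memory it writes is named with a "+m" operand instead of a blanket "memory" clobber, so the asm needs neither `volatile` nor a `noinline` wrapper. (The "b" versus "r" constraint point is PowerPC-specific and is not shown.) On other targets, a plain C fallback is used.

```c
#include <stdint.h>

/* x is read and written by the asm, so it is an in/out operand. */
static inline int64_t shl1(int64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("shlq $1, %0" : "+r"(x) : : "cc");
#else
    x = (int64_t)((uint64_t)x << 1);   /* portable fallback */
#endif
    return x;
}

/* Only *p is written, and it is named as a memory operand. */
static inline void add_to_mem(int64_t *p, int64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %1, %0" : "+m"(*p) : "r"(v) : "cc");
#else
    *p += v;
#endif
}
```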
Martin Kroeker [Sun, 12 Feb 2017 16:00:17 +0000 (17:00 +0100)]
Merge pull request #1096 from martin-frbg/pkg-config
Build only openblas.pc for pkg-config and install it from cmake as well
Martin Kroeker [Sun, 12 Feb 2017 13:38:32 +0000 (14:38 +0100)]
Add cmake template for openblas.pc
Martin Kroeker [Sun, 12 Feb 2017 13:37:33 +0000 (14:37 +0100)]
Create and install openblas.pc in cmake builds
Martin Kroeker [Sun, 12 Feb 2017 13:35:48 +0000 (14:35 +0100)]
Create and install only a single openblas.pc file
Martin Kroeker [Sun, 12 Feb 2017 13:34:03 +0000 (14:34 +0100)]
Rename blas.pc.in to openblas.pc.in
Martin Kroeker [Sun, 12 Feb 2017 13:30:29 +0000 (14:30 +0100)]
Merge pull request #1095 from martin-frbg/lapack370-cmake
Update cmakefiles for netlib 3.7.0
Martin Kroeker [Sun, 12 Feb 2017 12:49:49 +0000 (13:49 +0100)]
Add zlasyf_aa to lapack.cmake
Martin Kroeker [Sun, 12 Feb 2017 00:59:30 +0000 (01:59 +0100)]
Add another bunch of lapack 3.7 functions to cmake list
Martin Kroeker [Sun, 12 Feb 2017 00:37:35 +0000 (01:37 +0100)]
Add LAPACK 3.7 files not mentioned in announcement
Martin Kroeker [Sat, 11 Feb 2017 23:40:16 +0000 (00:40 +0100)]
Update cmake file list for lapacke 3.7.0
Martin Kroeker [Sat, 11 Feb 2017 22:11:26 +0000 (23:11 +0100)]
Update cmake file list for lapack 3.7.0
Martin Kroeker [Sat, 11 Feb 2017 19:48:41 +0000 (20:48 +0100)]
Merge pull request #1094 from martin-frbg/cmake-1
Update cmakefiles with changes from netlib 3.6.1
Martin Kroeker [Sat, 11 Feb 2017 18:56:02 +0000 (19:56 +0100)]
Reflect name change of lapacke_mangling.h template
Martin Kroeker [Sat, 11 Feb 2017 18:54:02 +0000 (19:54 +0100)]
Add new functions from LAPACK 3.6.1
Martin Kroeker [Sat, 11 Feb 2017 16:41:39 +0000 (17:41 +0100)]
Merge pull request #1093 from martin-frbg/restore-cmakeinstall
Restore cmake install target
Martin Kroeker [Sat, 11 Feb 2017 15:43:46 +0000 (16:43 +0100)]
Add cmake install target
Add CMAKE install target (based on patch provided by PrimarchOfTheSpaceWolves in #957)
This was originally merged as #988 but was accidentally reverted by my subsequent PR the following day.
Elliot Saba [Fri, 10 Feb 2017 23:23:34 +0000 (15:23 -0800)]
Add `exfamily == 9` case (Kaby Lake) to dynamic arch detection
Elliot Saba [Fri, 10 Feb 2017 22:47:10 +0000 (14:47 -0800)]
CPUID mappings for Core i5-7600K (Kaby Lake)
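CPU mappings like these key off the signature in EAX of CPUID leaf 1: stepping is bits 3:0, model 7:4, family 11:8, extended model 19:16, and extended family 27:20. For family 6, the extended model supplies the high nibble of the displayed model. A Kaby Lake desktop signature such as 0x906E9 decodes to family 6, model 0xE, extended model 9. Reading the commit's `exfamily == 9` as that extended-model nibble is an assumption. The decoder below is a hypothetical sketch, not OpenBLAS's detection code.

```c
#include <stdint.h>

typedef struct {
    unsigned family, model, exmodel, exfamily, display_model;
} cpu_sig;

/* Hypothetical decoder for the CPUID leaf 1 EAX signature. */
static cpu_sig decode_cpuid1_eax(uint32_t eax)
{
    cpu_sig s;
    s.model    = (eax >> 4)  & 0xF;
    s.family   = (eax >> 8)  & 0xF;
    s.exmodel  = (eax >> 16) & 0xF;
    s.exfamily = (eax >> 20) & 0xFF;
    /* For families 6 and 15 the extended model is the high model nibble. */
    s.display_model = (s.family == 6 || s.family == 15)
                          ? (s.exmodel << 4) | s.model
                          : s.model;
    return s;
}
```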
Martin Kroeker [Wed, 8 Feb 2017 00:01:18 +0000 (01:01 +0100)]
Merge pull request #1084 from isuruf/develop
Install pkg-config files
Martin Kroeker [Wed, 8 Feb 2017 00:00:32 +0000 (01:00 +0100)]
Merge pull request #1087 from grisuthedragon/enable-a12
Enable EXCAVATOR kernels for A12-9800
Martin Koehler [Tue, 7 Feb 2017 20:38:28 +0000 (21:38 +0100)]
Enable EXCAVATOR kernels for A12-9800
Martin Kroeker [Tue, 7 Feb 2017 10:40:41 +0000 (11:40 +0100)]
Merge pull request #1085 from vladimir-ch/lapacke_laswp_work
LAPACKE: fix incorrect value of lda_t in lapacke_?laswp_work
Ashwin Sekhar T K [Tue, 7 Feb 2017 10:14:33 +0000 (02:14 -0800)]
THUNDERX2T99: Fix bug in SNRM2
Ashwin Sekhar T K [Mon, 6 Feb 2017 04:57:54 +0000 (20:57 -0800)]
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
Vladimir Chalupecky [Tue, 7 Feb 2017 08:21:46 +0000 (09:21 +0100)]
LAPACKE: fix incorrect value of lda_t in lapacke_?laswp_work
Fixed in Reference LAPACK in commit:
https://github.com/Reference-LAPACK/lapack/pull/118/commits/07e1fbd89752bed74d35c48e92287d467646a158
Isuru Fernando [Mon, 6 Feb 2017 06:29:48 +0000 (11:59 +0530)]
Install pkg-config files
Martin Kroeker [Sat, 4 Feb 2017 16:25:43 +0000 (17:25 +0100)]
Merge pull request #1076 from ashwinyes/develop_20170130_thunderx2t99
More optimized implementations for ThunderX2T99
Ashwin Sekhar T K [Fri, 3 Feb 2017 10:09:17 +0000 (02:09 -0800)]
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
Ashwin Sekhar T K [Thu, 2 Feb 2017 06:10:35 +0000 (22:10 -0800)]
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
Ashwin Sekhar T K [Wed, 1 Feb 2017 07:25:41 +0000 (23:25 -0800)]
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
Ashwin Sekhar T K [Tue, 31 Jan 2017 06:10:45 +0000 (11:40 +0530)]
LAPACK: Fix lapack-test errors in ARM64 threaded version
Ashwin Sekhar T K [Fri, 27 Jan 2017 09:26:00 +0000 (01:26 -0800)]
THUNDERX2T99: Add optimized CASUM Implementation
Ashwin Sekhar T K [Mon, 30 Jan 2017 06:39:04 +0000 (12:09 +0530)]
THUNDERX2T99: Rename labels in DDOT and SNRM2
Ashwin Sekhar T K [Fri, 27 Jan 2017 09:11:58 +0000 (01:11 -0800)]
THUNDERX2T99: Remove Duplicate Code
Ashwin Sekhar T K [Wed, 25 Jan 2017 11:14:59 +0000 (03:14 -0800)]
THUNDERX2T99: Add Optimized CGEMM Implementation
Zhang Xianyi [Wed, 25 Jan 2017 14:17:05 +0000 (22:17 +0800)]
Merge pull request #1074 from ashwinyes/develop_20170116_thunderx2t99_sgemm
Add more THUNDERX2T99 Optimized APIs
Ashwin Sekhar T K [Wed, 25 Jan 2017 07:14:09 +0000 (23:14 -0800)]
Update .gitignore
Ashwin Sekhar T K [Wed, 25 Jan 2017 07:13:47 +0000 (23:13 -0800)]
Benchmark: Add MFlops print in iamax benchmark
Ashwin Sekhar T K [Wed, 25 Jan 2017 04:50:23 +0000 (20:50 -0800)]
Benchmarks: Avoid building lapack benchmarks when NO_LAPACK=1
Ashwin Sekhar T K [Tue, 24 Jan 2017 16:09:29 +0000 (21:39 +0530)]
THUNDERX2T99: Add threaded SNRM2 Implementation
Ashwin Sekhar T K [Tue, 24 Jan 2017 09:19:49 +0000 (14:49 +0530)]
ARM64: Rename kernel files to have consistent naming
Ashwin Sekhar T K [Thu, 19 Jan 2017 10:27:13 +0000 (15:57 +0530)]
THUNDERX2T99: Add Optimized CNRM2 Implementation
Ashwin Sekhar T K [Thu, 19 Jan 2017 08:57:02 +0000 (00:57 -0800)]
THUNDERX2T99: Add Optimized SNRM2 Implementation
Ashwin Sekhar T K [Wed, 18 Jan 2017 08:39:04 +0000 (00:39 -0800)]
Update .gitignore
Ashwin Sekhar T K [Thu, 19 Jan 2017 05:26:17 +0000 (10:56 +0530)]
THUNDERX2T99: Add threaded DDOT Implementation