Ashwin Sekhar T K [Tue, 28 Feb 2017 09:11:38 +0000 (01:11 -0800)]
THUNDERX2T99: Bug Fixes in D/Z NRM2 and ZGEMM
Ashwin Sekhar T K [Mon, 27 Feb 2017 11:22:50 +0000 (11:22 +0000)]
THUNDERX2T99: Add Optimized ZGEMM Implementation
Ashwin Sekhar T K [Wed, 22 Feb 2017 10:26:51 +0000 (02:26 -0800)]
THUNDERX2T99: Add Optimized D/Z NRM2 Implementation
Ashwin Sekhar T K [Wed, 22 Feb 2017 05:42:32 +0000 (21:42 -0800)]
ARM64: Remove unused code
Ashwin Sekhar T K [Tue, 21 Feb 2017 11:25:00 +0000 (03:25 -0800)]
THUNDERX2T99: Add Optimized C/Z DOT Implementation
Ashwin Sekhar T K [Mon, 20 Feb 2017 07:12:27 +0000 (23:12 -0800)]
THUNDERX2T99: Add Optimized SDOT Implementation
Ashwin Sekhar T K [Mon, 20 Feb 2017 07:11:50 +0000 (23:11 -0800)]
THUNDERX2T99: Bug fix in C/Z IAMAX
Ashwin Sekhar T K [Fri, 17 Feb 2017 11:06:32 +0000 (03:06 -0800)]
THUNDERX2T99: Add Optimized C/Z IAMAX Implementation
Ashwin Sekhar T K [Tue, 14 Feb 2017 12:10:06 +0000 (04:10 -0800)]
THUNDERX2T99: Add parallel SCNRM2 Implementation
Ashwin Sekhar T K [Tue, 7 Feb 2017 10:14:33 +0000 (02:14 -0800)]
THUNDERX2T99: Fix bug in SNRM2
Ashwin Sekhar T K [Mon, 6 Feb 2017 04:57:54 +0000 (20:57 -0800)]
THUNDERX2T99: Add Optimized S/D IAMAX Implementation
Ashwin Sekhar T K [Fri, 3 Feb 2017 10:09:17 +0000 (02:09 -0800)]
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
Ashwin Sekhar T K [Thu, 2 Feb 2017 06:10:35 +0000 (22:10 -0800)]
THUNDERX2T99: Add optimized S/D/C/Z COPY Implementations
Ashwin Sekhar T K [Wed, 1 Feb 2017 07:25:41 +0000 (23:25 -0800)]
THUNDERX2T99: Add optimized D/C/Z ASUM Implementations
Ashwin Sekhar T K [Tue, 31 Jan 2017 06:10:45 +0000 (11:40 +0530)]
LAPACK: Fix lapack-test errors in ARM64 threaded version
Ashwin Sekhar T K [Fri, 27 Jan 2017 09:26:00 +0000 (01:26 -0800)]
THUNDERX2T99: Add optimized CASUM Implementation
Ashwin Sekhar T K [Mon, 30 Jan 2017 06:39:04 +0000 (12:09 +0530)]
THUNDERX2T99: Rename labels in DDOT and SNRM2
Ashwin Sekhar T K [Fri, 27 Jan 2017 09:11:58 +0000 (01:11 -0800)]
THUNDERX2T99: Remove Duplicate Code
Ashwin Sekhar T K [Wed, 25 Jan 2017 11:14:59 +0000 (03:14 -0800)]
THUNDERX2T99: Add Optimized CGEMM Implementation
Ashwin Sekhar T K [Wed, 25 Jan 2017 07:14:09 +0000 (23:14 -0800)]
Update .gitignore
Ashwin Sekhar T K [Wed, 25 Jan 2017 07:13:47 +0000 (23:13 -0800)]
Benchmark: Add MFlops print in iamax benchmark
Ashwin Sekhar T K [Wed, 25 Jan 2017 04:50:23 +0000 (20:50 -0800)]
Benchmarks: Avoid building lapack benchmarks when NO_LAPACK=1
Ashwin Sekhar T K [Tue, 24 Jan 2017 16:09:29 +0000 (21:39 +0530)]
THUNDERX2T99: Add threaded SNRM2 Implementation
Ashwin Sekhar T K [Tue, 24 Jan 2017 09:19:49 +0000 (14:49 +0530)]
ARM64: Rename kernel files to have consistent naming
Ashwin Sekhar T K [Thu, 19 Jan 2017 10:27:13 +0000 (15:57 +0530)]
THUNDERX2T99: Add Optimized CNRM2 Implementation
Ashwin Sekhar T K [Thu, 19 Jan 2017 08:57:02 +0000 (00:57 -0800)]
THUNDERX2T99: Add Optimized SNRM2 Implementation
Ashwin Sekhar T K [Wed, 18 Jan 2017 08:39:04 +0000 (00:39 -0800)]
Update .gitignore
Ashwin Sekhar T K [Thu, 19 Jan 2017 05:26:17 +0000 (10:56 +0530)]
THUNDERX2T99: Add threaded DDOT Implementation
Ashwin Sekhar T K [Thu, 19 Jan 2017 05:23:48 +0000 (10:53 +0530)]
THUNDERX2T99: Add Optimized DDOT Implementation
Ashwin Sekhar T K [Wed, 18 Jan 2017 08:57:11 +0000 (00:57 -0800)]
THUNDERX2T99: Improve SGEMM
Ashwin Sekhar T K [Tue, 17 Jan 2017 07:16:23 +0000 (23:16 -0800)]
THUNDERX2T99: Improve DGEMM
Ashwin Sekhar T K [Tue, 17 Jan 2017 08:28:54 +0000 (00:28 -0800)]
THUNDERX2T99: Add Optimized DAXPY Implementation
Ashwin Sekhar T K [Wed, 11 Jan 2017 09:37:11 +0000 (15:07 +0530)]
THUNDERX2T99: Add Optimized SGEMM Implementation
Ashwin Sekhar T K [Wed, 11 Jan 2017 07:47:10 +0000 (13:17 +0530)]
ARM64: Let target VULCAN inherit THUNDERX2T99 properties
Martin Kroeker [Mon, 16 Jan 2017 15:03:53 +0000 (16:03 +0100)]
Merge pull request #1067 from martin-frbg/msysinst
Fix DESTDIR support for cygwin/msys2 install
Martin Kroeker [Mon, 16 Jan 2017 14:15:46 +0000 (15:15 +0100)]
Fix DESTDIR support for cygwin/msys2 install
fixes #1066
Zhang Xianyi [Mon, 16 Jan 2017 05:20:10 +0000 (13:20 +0800)]
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
Martin Kroeker [Wed, 11 Jan 2017 16:40:06 +0000 (17:40 +0100)]
Update Makefile.install (#1064)
* Update Makefile.install to reflect name change of lapacke_mangling.h source
Werner Saar [Wed, 11 Jan 2017 11:37:45 +0000 (12:37 +0100)]
Merge pull request #1063 from wernsaar/develop
prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two
Werner Saar [Wed, 11 Jan 2017 10:56:50 +0000 (11:56 +0100)]
prepared kernel/setparam-ref.c for UNROLL values, that are not a power of two
Werner Saar [Wed, 11 Jan 2017 09:30:46 +0000 (10:30 +0100)]
Merge pull request #1062 from wernsaar/develop
prepared parameter.c for UNROLL values, that are not a power of two
Werner Saar [Wed, 11 Jan 2017 08:50:28 +0000 (09:50 +0100)]
prepared parameter.c for UNROLL values, that are not a power of two
Werner Saar [Wed, 11 Jan 2017 06:29:17 +0000 (07:29 +0100)]
prepared lapack/lauum for UNROLL values, that are not a power of two
Ashwin Sekhar T K [Tue, 10 Jan 2017 08:55:55 +0000 (14:25 +0530)]
ARM64: Add Cavium THUNDERX2T99 Target
Ashwin Sekhar T K [Tue, 10 Jan 2017 07:23:47 +0000 (12:53 +0530)]
ARM64: Fix auto detect of ARM64 cpus
Andrew Pinski [Fri, 17 Jul 2015 04:08:03 +0000 (00:08 -0400)]
THUNDERX: Add optimized version of daxpy
This is better for single core but does not change anything for multiple cores
Martin Kroeker [Tue, 10 Jan 2017 18:09:49 +0000 (19:09 +0100)]
Merge pull request #1060 from martin-frbg/lapacke-mingw
Split LAPACKE 3.7.0 obj list (take 2, missed splitting the actual ar command invocation)
Martin Kroeker [Tue, 10 Jan 2017 16:11:35 +0000 (17:11 +0100)]
Split LAPACKE 3.7.0 obj list (take 2)
Missed the splitting of the actual ar call
Werner Saar [Tue, 10 Jan 2017 15:00:28 +0000 (16:00 +0100)]
Merge pull request #1059 from wernsaar/develop
updated some level1 functions that are not thread safe
Werner Saar [Tue, 10 Jan 2017 13:05:07 +0000 (14:05 +0100)]
updated some level1 functions that are not thread safe
Werner Saar [Tue, 10 Jan 2017 10:30:08 +0000 (11:30 +0100)]
Merge pull request #1058 from wernsaar/develop
prepared lapack/potrf functions for UNROLL values, that are not a pow…
Werner Saar [Tue, 10 Jan 2017 09:50:28 +0000 (10:50 +0100)]
prepared lapack/potrf functions for UNROLL values, that are not a power of two
Andrew Pinski [Thu, 16 Jul 2015 07:30:16 +0000 (03:30 -0400)]
THUNDERX: Add an optimized version of ddot
Andrew Pinski [Tue, 10 Jan 2017 06:27:36 +0000 (11:57 +0530)]
ARM64: Add Cavium THUNDERX Target
Ashwin Sekhar T K [Mon, 9 Jan 2017 13:18:39 +0000 (18:48 +0530)]
VULCAN: Add optimized DGEMM implementation
Ashwin Sekhar T K [Tue, 4 Oct 2016 08:50:20 +0000 (01:50 -0700)]
ARM64: Add the VULCAN Target
Ashwin Sekhar T K [Tue, 4 Oct 2016 08:24:28 +0000 (01:24 -0700)]
CORTEXA57: Add assembly kernels for copy routines
Zhang Xianyi [Tue, 10 Jan 2017 05:58:26 +0000 (13:58 +0800)]
Merge pull request #1055 from ksraste/develop
Add msa optimization for AXPY, COPY, SCALE, SWAP
jiahaipeng [Sun, 11 Dec 2016 09:09:50 +0000 (09:09 +0000)]
Adding multi-threading for copy, dot, rot, and asum functions
jiahaipeng [Sun, 11 Dec 2016 09:02:18 +0000 (09:02 +0000)]
modify blas_l1_thread.c to support multi-threading for L1 functions with a return value
Martin Kroeker [Mon, 9 Jan 2017 19:45:26 +0000 (20:45 +0100)]
Merge pull request #1057 from martin-frbg/lapacke-mingw
Split the obj list of LAPACKE 3.7.0
Martin Kroeker [Mon, 9 Jan 2017 17:29:53 +0000 (18:29 +0100)]
Split the obj list of LAPACKE 3.7.0
Split obj list to allow building with mingw (argument list too long for the msys ar)
kaustubh [Mon, 9 Jan 2017 12:57:23 +0000 (18:27 +0530)]
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
kaustubh [Mon, 9 Jan 2017 12:52:09 +0000 (18:22 +0530)]
Add msa optimization for AXPY, COPY, SCALE, SWAP
Signed-off-by: kaustubh <kaustubh.raste@imgtec.com>
Werner Saar [Mon, 9 Jan 2017 12:38:56 +0000 (13:38 +0100)]
Merge pull request #1054 from wernsaar/develop
prepared lapack/getrf functions for UNROLL values, that are not a pow…
Werner Saar [Mon, 9 Jan 2017 11:57:26 +0000 (12:57 +0100)]
prepared lapack/getrf functions for UNROLL values, that are not a power of two
Zhang Xianyi [Mon, 9 Jan 2017 10:52:42 +0000 (05:52 -0500)]
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
Zhang Xianyi [Mon, 9 Jan 2017 10:48:09 +0000 (05:48 -0500)]
Add USE_TRMM=1 for IBM z13 in kernel/Makefile.L3
Werner Saar [Mon, 9 Jan 2017 10:17:38 +0000 (11:17 +0100)]
Merge pull request #1053 from wernsaar/develop
prepared driver/level3 functions for UNROLL values, that are not a po…
Werner Saar [Mon, 9 Jan 2017 09:38:15 +0000 (10:38 +0100)]
prepared driver/level3 functions for UNROLL values, that are not a power of two
Zhang Xianyi [Mon, 9 Jan 2017 08:23:22 +0000 (16:23 +0800)]
Merge pull request #1050 from martin-frbg/fflags
Apply COMMON_OPT to default FFLAGS
Zhang Xianyi [Mon, 9 Jan 2017 08:22:58 +0000 (16:22 +0800)]
Merge pull request #1052 from martin-frbg/locking
Fix thread data races detected by helgrind 3.12
Martin Kroeker [Mon, 9 Jan 2017 00:10:43 +0000 (01:10 +0100)]
Relocate declaration of alloc_lock outside ifdef block
Martin Kroeker [Sun, 8 Jan 2017 22:33:51 +0000 (23:33 +0100)]
Fix thread data races detected by helgrind 3.12
Ref. #995, may help solve issues seen in #660 and #883
Martin Kroeker [Sun, 8 Jan 2017 20:17:22 +0000 (21:17 +0100)]
Apply COMMON_OPT to default FFLAGS to avoid building non-optimized LAPACK by mistake
Werner Saar [Sun, 8 Jan 2017 08:30:19 +0000 (09:30 +0100)]
Merge pull request #1049 from wernsaar/develop
removed blas_thread_shutdown from gensymbol
Werner Saar [Sun, 8 Jan 2017 07:51:30 +0000 (08:51 +0100)]
removed blas_thread_shutdown from gensymbol
Zhang Xianyi [Sun, 8 Jan 2017 03:19:06 +0000 (11:19 +0800)]
Merge pull request #1047 from brada4/erre
Improve R benchmark timing
Zhang Xianyi [Sun, 8 Jan 2017 03:18:38 +0000 (11:18 +0800)]
Merge pull request #1040 from martin-frbg/develop
Use appropriate int32/int64 format for error number in message string
Zhang Xianyi [Sun, 8 Jan 2017 03:18:05 +0000 (11:18 +0800)]
Merge pull request #1036 from sva-img/develop
Added prefetch to CGEMV and ZGEMV.
Andrew [Sat, 7 Jan 2017 18:01:42 +0000 (19:01 +0100)]
anti GC and reflow
Andrew [Sat, 7 Jan 2017 18:01:21 +0000 (19:01 +0100)]
init
Werner Saar [Sat, 7 Jan 2017 14:09:56 +0000 (15:09 +0100)]
Merge pull request #1046 from wernsaar/develop
updated lapack to version 3.7.0 with latest patches from git
Werner Saar [Sat, 7 Jan 2017 13:27:08 +0000 (14:27 +0100)]
fix for appveyor test
Werner Saar [Sat, 7 Jan 2017 12:20:28 +0000 (13:20 +0100)]
updated exports/gensymbol for lapack-3.7.0
Werner Saar [Sat, 7 Jan 2017 07:41:42 +0000 (08:41 +0100)]
filtered out -fopenmp and fix for mingw
Werner Saar [Fri, 6 Jan 2017 15:35:20 +0000 (16:35 +0100)]
removed xerbla and lsame for Makefile
Werner Saar [Fri, 6 Jan 2017 15:14:53 +0000 (16:14 +0100)]
removed obj-files, that are moved to lapack 3.7.0
Werner Saar [Fri, 6 Jan 2017 12:42:31 +0000 (13:42 +0100)]
filtered out optimized functions
Werner Saar [Fri, 6 Jan 2017 10:48:40 +0000 (11:48 +0100)]
added lapack 3.7.0 with latest patches from git
Werner Saar [Fri, 6 Jan 2017 10:46:58 +0000 (11:46 +0100)]
removed lapack-devel.log
Werner Saar [Fri, 6 Jan 2017 10:44:57 +0000 (11:44 +0100)]
removed lapack 3.6.0
Martin Kroeker [Thu, 5 Jan 2017 18:15:36 +0000 (19:15 +0100)]
Merge pull request #1043 from quickwritereader/z13
Z13
Martin Kroeker [Wed, 4 Jan 2017 22:16:48 +0000 (23:16 +0100)]
Update xerbla.c
Abdurrauf [Wed, 4 Jan 2017 15:41:24 +0000 (19:41 +0400)]
Update README.md
Abdurrauf [Wed, 4 Jan 2017 15:32:33 +0000 (19:32 +0400)]
dtrmm and dgemm for z13
Martin Kroeker [Thu, 29 Dec 2016 23:45:59 +0000 (00:45 +0100)]
Use appropriate int32/int64 format for error number in message string
Shivraj Patil [Tue, 27 Dec 2016 06:03:51 +0000 (11:33 +0530)]
Added prefetch to CGEMV and ZGEMV.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Sun, 18 Dec 2016 06:48:22 +0000 (14:48 +0800)]
Merge pull request #1032 from kiwifb/OSX_target
Do not override MACOSX_DEPLOYMENT_TARGET if it is already defined.
Zhang Xianyi [Sun, 18 Dec 2016 06:47:59 +0000 (14:47 +0800)]
Merge pull request #1025 from mfoster96/develop
Fix for issue #1024: arm-linux-androideabi-g++ Compiler Error in /cpu…