Shivraj Patil [Fri, 15 Jul 2016 13:08:25 +0000 (18:38 +0530)]
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Thu, 14 Jul 2016 20:09:36 +0000 (13:09 -0700)]
Refs #917 Avoid detecting gfortran bug on IBM POWER + Ubuntu
Zhang Xianyi [Thu, 14 Jul 2016 19:49:33 +0000 (15:49 -0400)]
Merge pull request #926 from vriera/develop
Complete support for MIPS n32 ABI
Zhang Xianyi [Thu, 14 Jul 2016 19:48:58 +0000 (15:48 -0400)]
Merge pull request #925 from martin-frbg/develop
Update zgetrf2.f, cpuid_x86.c, dynamic.c
Zhang Xianyi [Thu, 14 Jul 2016 19:47:55 +0000 (15:47 -0400)]
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
Zhang Xianyi [Thu, 14 Jul 2016 19:45:39 +0000 (15:45 -0400)]
Merge pull request #918 from sva-img/develop
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM.
Vicente Olivert Riera [Thu, 14 Jul 2016 16:20:51 +0000 (17:20 +0100)]
Complete support for MIPS n32 ABI
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Martin Kroeker [Thu, 14 Jul 2016 15:29:34 +0000 (17:29 +0200)]
Update cpuid_x86.c
Martin Kroeker [Thu, 14 Jul 2016 10:25:17 +0000 (12:25 +0200)]
Update cpuid_x86.c
Martin Kroeker [Thu, 14 Jul 2016 10:22:55 +0000 (12:22 +0200)]
Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
Martin Kroeker [Thu, 14 Jul 2016 09:41:57 +0000 (11:41 +0200)]
Update zgetrf2.f
Trivial typo correction (ZERBLA => XERBLA) to fix #910
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:21:17 +0000 (13:51 +0530)]
Improvements to TRMM and GEMM kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:20:38 +0000 (13:50 +0530)]
Improvements to GEMV kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:19:15 +0000 (13:49 +0530)]
Improvements to COPY and IAMAX kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:18:13 +0000 (13:48 +0530)]
Add time prints in benchmark output
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:16:01 +0000 (13:46 +0530)]
Add IAMAX and NRM2 benchmarks
Shivraj Patil [Tue, 28 Jun 2016 12:21:10 +0000 (17:51 +0530)]
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Mon, 27 Jun 2016 14:05:30 +0000 (10:05 -0400)]
Merge pull request #913 from dpfoose/develop
Small change to allow compiling with USE_OPENMP on MSVC
Zhang Xianyi [Mon, 27 Jun 2016 14:04:54 +0000 (10:04 -0400)]
Merge pull request #907 from jeromerobert/bug786
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
Zhang Xianyi [Mon, 27 Jun 2016 14:04:18 +0000 (10:04 -0400)]
Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
Daniel Patrick Foose [Tue, 14 Jun 2016 18:37:28 +0000 (14:37 -0400)]
Change to allow compiling with USE_OPENMP on MSVC
MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition.
Jerome Robert [Tue, 7 Jun 2016 14:11:09 +0000 (16:11 +0200)]
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
* Hopefully, because this was found by trial and error (dark magic)
* Ref #786
Werner Saar [Tue, 31 May 2016 12:13:52 +0000 (14:13 +0200)]
Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
Werner Saar [Tue, 31 May 2016 10:53:07 +0000 (12:53 +0200)]
added directory for optimized lapack Fortran codes and added dlaqr5.f
Kaustubh Raste [Tue, 31 May 2016 04:47:23 +0000 (10:17 +0530)]
STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Kaustubh Raste [Mon, 30 May 2016 15:47:00 +0000 (21:17 +0530)]
STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Mon, 30 May 2016 06:52:58 +0000 (14:52 +0800)]
Merge pull request #893 from biddisco/develop
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
Zhang Xianyi [Mon, 30 May 2016 06:52:40 +0000 (14:52 +0800)]
Merge pull request #891 from rndfax/develop
mips64/axpy: fix error when INCY == 0
John Biddiscombe [Wed, 25 May 2016 07:13:28 +0000 (09:13 +0200)]
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
Aleksey Kuleshov [Mon, 23 May 2016 10:24:15 +0000 (13:24 +0300)]
mips64/axpy: fix error when INCY == 0
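The log does not say how the mips64 kernel was changed. As a rough illustration of why INCY == 0 is a special case, the sketch below shows reference-style AXPY semantics: with a zero increment every update lands on the same element of y, so the updates must be applied one after another rather than loaded and stored in batches. The name axpy_ref is hypothetical and is not the OpenBLAS kernel; only non-negative increments are handled here.

```c
#include <stddef.h>

/* Reference-style AXPY sketch: y := alpha * x + y over n strided elements.
 * Hypothetical helper for illustration, not the OpenBLAS mips64 kernel.
 * Assumes incx >= 0 and incy >= 0. */
static void axpy_ref(size_t n, double alpha,
                     const double *x, size_t incx,
                     double *y, size_t incy)
{
    size_t ix = 0;
    size_t iy = 0;

    for (size_t i = 0; i < n; i++) {
        /* Read-modify-write one element at a time, so that with
         * incy == 0 each update sees the result of the previous one. */
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

With incy == 0 all n products accumulate into y[0]; an unrolled kernel that loads several y values before storing them would lose all but the last update.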
Werner Saar [Mon, 23 May 2016 09:20:41 +0000 (11:20 +0200)]
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
Werner Saar [Sun, 22 May 2016 14:01:35 +0000 (16:01 +0200)]
Merge pull request #890 from wernsaar/develop
optimized dtrsm_kernel_LT for POWER8
Werner Saar [Sun, 22 May 2016 13:20:04 +0000 (15:20 +0200)]
optimized dtrsm_kernel_LT for POWER8
Werner Saar [Sun, 22 May 2016 11:51:47 +0000 (13:51 +0200)]
added trsm benchmarks for POWER8 to benchmark/Makefile
Werner Saar [Sun, 22 May 2016 11:09:05 +0000 (13:09 +0200)]
added optimized dtrsm_LT kernel for POWER8
Zhang Xianyi [Sat, 21 May 2016 17:08:44 +0000 (01:08 +0800)]
Merge the patch for musl libc.
Zhang Xianyi [Fri, 20 May 2016 23:17:21 +0000 (07:17 +0800)]
Merge pull request #887 from ksraste/develop
STRSM optimization for MIPS P5600 and I6400 using MSA
Kaustubh Raste [Fri, 20 May 2016 05:29:03 +0000 (10:59 +0530)]
STRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Thu, 19 May 2016 11:59:09 +0000 (19:59 +0800)]
Merge pull request #886 from vriera/develop
Makefile.system: P5600 and I6400 cores need -mmsa
Zhang Xianyi [Thu, 19 May 2016 11:58:32 +0000 (19:58 +0800)]
Merge pull request #885 from sva-img/develop
SGEMM optimization for MIPS P5600 and I6400 using MSA.
Vicente Olivert Riera [Thu, 19 May 2016 09:35:45 +0000 (10:35 +0100)]
Makefile.system: P5600 and I6400 cores need -mmsa
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Shivraj Patil [Thu, 19 May 2016 05:34:42 +0000 (11:04 +0530)]
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Thu, 19 May 2016 03:16:43 +0000 (11:16 +0800)]
Merge pull request #878 from ksraste/develop
DTRSM bug fix for MIPS P5600 and I6400
Werner Saar [Tue, 17 May 2016 15:10:36 +0000 (17:10 +0200)]
Merge pull request #879 from wernsaar/develop
optimized dgemm and dgetrf for POWER8
Werner Saar [Tue, 17 May 2016 14:19:53 +0000 (16:19 +0200)]
optimized getrf_single.c for POWER8
Werner Saar [Tue, 17 May 2016 12:45:27 +0000 (14:45 +0200)]
optimized dgemm and dgetrf for POWER8
Kaustubh Raste [Tue, 17 May 2016 10:18:02 +0000 (15:48 +0530)]
DTRSM bug fix for MIPS P5600 and I6400
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Mon, 16 May 2016 15:21:56 +0000 (23:21 +0800)]
Merge pull request #877 from jeromerobert/bug873
Disable multi-threading in swap
Jerome Robert [Mon, 16 May 2016 13:07:55 +0000 (13:07 +0000)]
Disable multi-threading in swap
* Close #873
Werner Saar [Mon, 16 May 2016 12:52:40 +0000 (14:52 +0200)]
Merge pull request #876 from wernsaar/develop
optimized dgemm on power8 for 20 threads
Werner Saar [Mon, 16 May 2016 12:14:25 +0000 (14:14 +0200)]
optimized dgemm for 20 threads
Zhang Xianyi [Mon, 9 May 2016 14:54:55 +0000 (10:54 -0400)]
Merge pull request #869 from ksraste/develop
DTRSM optimization for MIPS P5600 and I6400 using MSA
Zhang Xianyi [Mon, 9 May 2016 14:54:30 +0000 (10:54 -0400)]
Merge pull request #868 from sva-img/develop
build fix for MIPS 32 bit
Kaustubh Raste [Mon, 9 May 2016 09:45:26 +0000 (15:15 +0530)]
DTRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Shivraj Patil [Mon, 9 May 2016 09:15:12 +0000 (14:45 +0530)]
build fix for MIPS 32 bit
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Fri, 6 May 2016 14:53:22 +0000 (10:53 -0400)]
Merge pull request #866 from sva-img/develop
DGEMM optimization for MIPS P5600 and I6400 using MSA
Shivraj Patil [Wed, 4 May 2016 05:37:14 +0000 (11:07 +0530)]
conflict resolved by syncing with 'xianyi:develop'
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Tue, 3 May 2016 21:06:31 +0000 (17:06 -0400)]
Merge pull request #867 from IvanUkhov/space
Wrap CURDIR and DESTDIR in quotes
Ivan Ukhov [Tue, 3 May 2016 19:31:32 +0000 (21:31 +0200)]
Wrap CURDIR and DESTDIR in quotes
Shivraj Patil [Tue, 3 May 2016 09:12:26 +0000 (14:42 +0530)]
DGEMM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Fri, 29 Apr 2016 15:46:24 +0000 (11:46 -0400)]
Merge pull request #863 from ashwinyes/develop_20160429_update_numa_binding
Update NUMA CPU binding
Zhang Xianyi [Fri, 29 Apr 2016 15:44:36 +0000 (11:44 -0400)]
Merge pull request #847 from sva-img/develop
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Werner Saar [Fri, 29 Apr 2016 11:33:45 +0000 (13:33 +0200)]
Merge pull request #864 from wernsaar/develop
optimized dgemm for POWER8
Werner Saar [Fri, 29 Apr 2016 10:52:47 +0000 (12:52 +0200)]
optimized dgemm for POWER8
Ashwin Sekhar T K [Fri, 29 Apr 2016 06:28:15 +0000 (11:58 +0530)]
Update NUMA CPU binding
When all processes can be
accommodated within the current node,
use cores from the current node only.
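The binding rule described in this commit message can be sketched roughly as follows. This is an illustration only: pick_core and its parameters are hypothetical, and the uniform topology (every node owning the same number of consecutively numbered cores) is an assumption made to keep the example short. The actual OpenBLAS code queries the system topology instead.

```c
/* Hypothetical sketch of the node-local binding rule.
 * Assumed topology: num_nodes NUMA nodes, each owning cores_per_node
 * consecutively numbered cores. Returns the core index for a thread. */
static int pick_core(int thread, int num_threads,
                     int current_node, int num_nodes, int cores_per_node)
{
    if (num_threads <= cores_per_node) {
        /* Everything fits on the current node: stay local. */
        return current_node * cores_per_node + thread % cores_per_node;
    }
    /* Otherwise spread the threads over all cores of all nodes. */
    return thread % (num_nodes * cores_per_node);
}
```

Keeping a small job on a single node avoids remote memory accesses; the fallback branch here is just one plausible way to spread a larger job.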
Zhang Xianyi [Thu, 28 Apr 2016 15:42:18 +0000 (11:42 -0400)]
Merge pull request #858 from buffer51/develop
Fixed cross-suffix detection for path that contains dashes
buffer51 [Thu, 28 Apr 2016 05:23:02 +0000 (22:23 -0700)]
Use CROSS_SUFFIX only if CROSS is set
buffer51 [Wed, 27 Apr 2016 19:09:44 +0000 (12:09 -0700)]
Fixed cross-suffix detection for path that contains dashes when the compiler itself doesn't
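The problem named in this commit can be illustrated with a small sketch: if the toolchain prefix is derived from the last dash anywhere in the compiler path, a directory such as /opt/my-tools/ is mistaken for part of the prefix. Looking for the dash only in the file name avoids that. The function cross_prefix below is hypothetical and is not the code used by the OpenBLAS build scripts; it also does not handle versioned compiler names such as gcc-4.8.

```c
#include <string.h>

/* Hypothetical sketch: derive a cross-toolchain prefix from a compiler path.
 * Only dashes in the file name count; dashes in directories are ignored.
 * On success writes the directory part plus the file name up to and
 * including its last '-' into out and returns 1. Returns 0 with an empty
 * out if the file name has no dash or the buffer is too small. */
static int cross_prefix(const char *path, char *out, size_t out_size)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;  /* file name only */
    const char *dash = strrchr(base, '-');
    size_t len;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    if (dash == NULL) {
        return 0;  /* e.g. /opt/my-tools/bin/gcc: no prefix */
    }
    len = (size_t)(dash - path) + 1;
    if (len >= out_size) {
        return 0;
    }
    memcpy(out, path, len);
    out[len] = '\0';
    return 1;
}
```

For /opt/my-tools/bin/arm-linux-gnueabi-gcc this yields /opt/my-tools/bin/arm-linux-gnueabi-, while /opt/my-tools/bin/gcc yields no prefix at all.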
Werner Saar [Wed, 27 Apr 2016 14:34:15 +0000 (16:34 +0200)]
Merge pull request #856 from wernsaar/develop
optimized dgemm for POWER8
Werner Saar [Wed, 27 Apr 2016 13:48:09 +0000 (15:48 +0200)]
optimized param.h for POWER8
Werner Saar [Wed, 27 Apr 2016 12:01:08 +0000 (14:01 +0200)]
optimized dgemm for POWER8
Zhang Xianyi [Tue, 26 Apr 2016 14:24:33 +0000 (10:24 -0400)]
Merge pull request #852 from buffer51/develop
Added Android as a community-supported OS
Zhang Xianyi [Tue, 26 Apr 2016 14:24:13 +0000 (10:24 -0400)]
Merge pull request #851 from rndfax/develop
allow building tests when CROSS compiling but don't run them
buffer51 [Tue, 26 Apr 2016 10:14:03 +0000 (03:14 -0700)]
Added Android as a community-supported OS
Aleksey Kuleshov [Fri, 22 Apr 2016 15:21:18 +0000 (18:21 +0300)]
allow building tests when CROSS compiling but don't run them
Werner Saar [Mon, 25 Apr 2016 10:00:43 +0000 (12:00 +0200)]
Merge pull request #850 from wernsaar/develop
Bugfixes and enhancements for EXCAVATOR
Werner Saar [Mon, 25 Apr 2016 08:40:04 +0000 (10:40 +0200)]
updated param.h for EXCAVATOR
Werner Saar [Mon, 25 Apr 2016 08:36:23 +0000 (10:36 +0200)]
updated some kernel files for EXCAVATOR
Werner Saar [Mon, 25 Apr 2016 08:13:30 +0000 (10:13 +0200)]
bugfix for EXCAVATOR and DYNAMIC_ARCH
Werner Saar [Mon, 25 Apr 2016 07:08:38 +0000 (09:08 +0200)]
bugfix in dynamic.c
Werner Saar [Sat, 23 Apr 2016 14:25:27 +0000 (16:25 +0200)]
Merge pull request #849 from wernsaar/develop
optimized gemm for POWER8
Werner Saar [Sat, 23 Apr 2016 12:26:24 +0000 (14:26 +0200)]
updated param.h for POWER8
Werner Saar [Sat, 23 Apr 2016 08:04:41 +0000 (10:04 +0200)]
added sgemm_tcopy_8_power8.S
Werner Saar [Sat, 23 Apr 2016 05:37:18 +0000 (07:37 +0200)]
added cgemm_tcopy_8_power8.S
Werner Saar [Fri, 22 Apr 2016 11:46:22 +0000 (13:46 +0200)]
Merge pull request #848 from wernsaar/develop
Optimized zgemm for POWER8 and tested zgemm again
Werner Saar [Fri, 22 Apr 2016 11:07:12 +0000 (13:07 +0200)]
Optimized zgemm and tested zgemm again
Shivraj Patil [Fri, 22 Apr 2016 08:33:18 +0000 (14:03 +0530)]
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Werner Saar [Thu, 21 Apr 2016 11:52:24 +0000 (13:52 +0200)]
Merge pull request #846 from wernsaar/develop
Optimized sgemm and dgemm for POWER8
Werner Saar [Thu, 21 Apr 2016 10:54:32 +0000 (12:54 +0200)]
added bugfixes for some make files and smallscaling.c
Werner Saar [Thu, 21 Apr 2016 09:37:57 +0000 (11:37 +0200)]
Optimized sgemm and dgemm and tested again.
Werner Saar [Wed, 20 Apr 2016 13:28:28 +0000 (15:28 +0200)]
optimized Makefile.power for POWER8
wernsaar [Wed, 20 Apr 2016 11:44:22 +0000 (13:44 +0200)]
Merge pull request #845 from wernsaar/develop
optimized sgemm for power8
Werner Saar [Wed, 20 Apr 2016 11:06:38 +0000 (13:06 +0200)]
optimized sgemm
Werner Saar [Tue, 19 Apr 2016 14:08:54 +0000 (16:08 +0200)]
added optimized sgemm_tcopy for power8
Zhang Xianyi [Tue, 12 Apr 2016 19:32:10 +0000 (15:32 -0400)]
Bump to 0.2.19.dev.
Zhang Xianyi [Tue, 12 Apr 2016 19:28:31 +0000 (15:28 -0400)]
Update doc for 0.2.18 version.
Zhang Xianyi [Tue, 12 Apr 2016 15:49:28 +0000 (11:49 -0400)]
Delete LOCAL_BUFFER_SIZE for other architectures.
Zhang Xianyi [Tue, 12 Apr 2016 14:26:11 +0000 (22:26 +0800)]
Refs #834. Fix zgemv config bug on Steamroller.
Werner Saar [Mon, 11 Apr 2016 09:21:36 +0000 (11:21 +0200)]
bugfix for arm scal.c and zscal.c
Werner Saar [Sun, 10 Apr 2016 09:28:20 +0000 (11:28 +0200)]
added cholesky benchmarks to Makefile for ESSL