Zhang Xianyi [Thu, 13 Oct 2016 02:13:12 +0000 (10:13 +0800)]
Merge pull request #980 from kiwifb/utest_ldflags
make utest/Makefile respect LDFLAGS
Zhang Xianyi [Thu, 13 Oct 2016 02:12:35 +0000 (10:12 +0800)]
Merge pull request #973 from vladimir-ch/fix-lapacke-xlarfb
LAPACKE: fix wrong direction check in LAPACKE_?larfb_work
François Bissey [Wed, 12 Oct 2016 20:32:25 +0000 (09:32 +1300)]
make utest/Makefile respect LDFLAGS
Vladimir Chalupecky [Fri, 30 Sep 2016 20:31:30 +0000 (05:31 +0900)]
LAPACKE: fix wrong direction check in LAPACKE_?larfb_work
Closes #971
Zhang Xianyi [Thu, 22 Sep 2016 15:34:57 +0000 (11:34 -0400)]
Merge pull request #968 from buffer51/develop
Updated CROSS_SUFFIX regex to work with CC containing arguments
Zhang Xianyi [Thu, 22 Sep 2016 15:33:51 +0000 (11:33 -0400)]
Merge pull request #969 from sva-img/develop
DGEMM function split and data prefetch
Shivraj Patil [Thu, 22 Sep 2016 11:55:46 +0000 (17:25 +0530)]
DGEMM function split and data prefetch
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Paul MUSTIÈRE [Wed, 14 Sep 2016 18:42:22 +0000 (11:42 -0700)]
Updated CROSS_SUFFIX regex to work with CC containing arguments
Zhang Xianyi [Tue, 13 Sep 2016 20:15:37 +0000 (16:15 -0400)]
Merge pull request #958 from intelfx/remove-stabs
common_arm.h, common_mips.h: get rid of .func directives
Ivan Shapovalov [Fri, 9 Sep 2016 00:36:49 +0000 (03:36 +0300)]
common_arm.h, common_mips.h: get rid of .func directives
.func/.endfunc are gcc/gas-specific directives for generating stabs
debug information (and nothing more). They are near-useless now because
DWARF is commonly used, and they are not implemented in Clang. Hence building
OpenBLAS with Clang fails, and there is no sane way to detect GCC vs.
anything else with preprocessor definitions.
Hence, just remove these directives.
Zhang Xianyi [Thu, 1 Sep 2016 04:01:23 +0000 (00:01 -0400)]
Update develop for 0.2.20.dev.
Zhang Xianyi [Thu, 1 Sep 2016 03:58:29 +0000 (23:58 -0400)]
Update doc for 0.2.19.
Zhang Xianyi [Fri, 19 Aug 2016 01:59:43 +0000 (18:59 -0700)]
Refs #946. Use nrm2 reference implementation for Power8.
Zhang Xianyi [Thu, 18 Aug 2016 17:24:42 +0000 (10:24 -0700)]
Refs #929. Deal with zero and NaNs for scale.
Zhang Xianyi [Thu, 18 Aug 2016 13:31:31 +0000 (09:31 -0400)]
Merge pull request #941 from sva-img/develop
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Zhang Xianyi [Fri, 12 Aug 2016 21:20:59 +0000 (17:20 -0400)]
Merge pull request #943 from ibmsoe/IBMMASS_Support
Added support of IBM's MASS library that optimizes performance on Pow…
nishidha@us.ibm.com [Thu, 11 Aug 2016 09:13:26 +0000 (14:43 +0530)]
Added support of IBM's MASS library that optimizes performance on Power architectures
Shivraj Patil [Wed, 10 Aug 2016 12:14:22 +0000 (17:44 +0530)]
MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Shivraj Patil [Mon, 8 Aug 2016 06:28:01 +0000 (11:58 +0530)]
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Tue, 26 Jul 2016 13:54:31 +0000 (09:54 -0400)]
Merge pull request #933 from ashwinyes/develop_aarch64_20160726_Dgemm_8x4_Opts
Cortex A57: Improvements to DGEMM 8x4 kernel
Ashwin Sekhar T K [Mon, 25 Jul 2016 09:03:25 +0000 (14:33 +0530)]
Cortex A57: Improvements to DGEMM 8x4 kernel
Zhang Xianyi [Fri, 22 Jul 2016 15:42:30 +0000 (11:42 -0400)]
Merge pull request #930 from sva-img/develop
P6600/I6400 Build fix.
Shivraj Patil [Fri, 22 Jul 2016 13:15:06 +0000 (18:45 +0530)]
P6600/I6400 Build fix. Reverted the changes that were made to support the MIPS n32 ABI
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Fri, 15 Jul 2016 15:17:30 +0000 (11:17 -0400)]
Merge pull request #927 from sva-img/develop
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Shivraj Patil [Fri, 15 Jul 2016 13:08:25 +0000 (18:38 +0530)]
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Thu, 14 Jul 2016 20:09:36 +0000 (13:09 -0700)]
Refs #917 Avoid detecting gfortran bug on IBM POWER + Ubuntu
Zhang Xianyi [Thu, 14 Jul 2016 19:49:33 +0000 (15:49 -0400)]
Merge pull request #926 from vriera/develop
Complete support for MIPS n32 ABI
Zhang Xianyi [Thu, 14 Jul 2016 19:48:58 +0000 (15:48 -0400)]
Merge pull request #925 from martin-frbg/develop
Update zgetrf2.f, cpuid_x86.c, dynamic.c
Zhang Xianyi [Thu, 14 Jul 2016 19:47:55 +0000 (15:47 -0400)]
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
Zhang Xianyi [Thu, 14 Jul 2016 19:45:39 +0000 (15:45 -0400)]
Merge pull request #918 from sva-img/develop
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM.
Vicente Olivert Riera [Thu, 14 Jul 2016 16:20:51 +0000 (17:20 +0100)]
Complete support for MIPS n32 ABI
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Martin Kroeker [Thu, 14 Jul 2016 15:29:34 +0000 (17:29 +0200)]
Update cpuid_x86.c
Martin Kroeker [Thu, 14 Jul 2016 10:25:17 +0000 (12:25 +0200)]
Update cpuid_x86.c
Martin Kroeker [Thu, 14 Jul 2016 10:22:55 +0000 (12:22 +0200)]
Update dynamic.c
Add Braswell (extended model 4, model 12) N3150 as Nehalem
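The "extended model 4, model 12" wording refers to fields of the x86 CPUID leaf 1 signature. A minimal sketch of how those fields are usually unpacked follows. The function name and the example value `0x406C0` are illustrative assumptions (the value is constructed here from family 6, extended model 4, model 12, stepping 0), and this is not the code from dynamic.c or cpuid_x86.c.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX value into its family/model fields."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # For family 6 and family 15 the extended model supplies the high nibble
    # of the model number that vendors document.
    display_model = model
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) | model

    return {
        "stepping": stepping,
        "model": model,
        "family": family,
        "ext_model": ext_model,
        "ext_family": ext_family,
        "display_model": display_model,
    }


# Hypothetical signature built from the fields named in the commit message.
fields = decode_cpuid_signature(0x406C0)
print(fields["ext_model"], fields["model"], hex(fields["display_model"]))
```

A dispatch table keyed on (extended model, model) can then map such a CPU to an existing kernel set, which is what treating Braswell "as Nehalem" amounts to.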
Martin Kroeker [Thu, 14 Jul 2016 09:41:57 +0000 (11:41 +0200)]
Update zgetrf2.f
Trivial typo correction (ZERBLA => XERBLA) to fix #910
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:21:17 +0000 (13:51 +0530)]
Improvements to TRMM and GEMM kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:20:38 +0000 (13:50 +0530)]
Improvements to GEMV kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:19:15 +0000 (13:49 +0530)]
Improvements to COPY and IAMAX kernels
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:18:13 +0000 (13:48 +0530)]
Add time prints in benchmark output
Ashwin Sekhar T K [Thu, 14 Jul 2016 08:16:01 +0000 (13:46 +0530)]
Add IAMAX and NRM2 benchmarks
Shivraj Patil [Tue, 28 Jun 2016 12:21:10 +0000 (17:51 +0530)]
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Mon, 27 Jun 2016 14:05:30 +0000 (10:05 -0400)]
Merge pull request #913 from dpfoose/develop
Small change to allow compiling with USE_OPENMP on MSVC
Zhang Xianyi [Mon, 27 Jun 2016 14:04:54 +0000 (10:04 -0400)]
Merge pull request #907 from jeromerobert/bug786
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
Zhang Xianyi [Mon, 27 Jun 2016 14:04:18 +0000 (10:04 -0400)]
Merge pull request #897 from ksraste/develop
STRSM optimized for MSA
Daniel Patrick Foose [Tue, 14 Jun 2016 18:37:28 +0000 (14:37 -0400)]
Change to allow compiling with USE_OPENMP on MSVC
MSVC treats the declaration of omp_in_parallel and omp_get_num_procs without the modifiers __declspec(dllimport) and __cdecl as a redefinition.
Jerome Robert [Tue, 7 Jun 2016 14:11:09 +0000 (16:11 +0200)]
Fix z/ctrmv stack allocation on AMD bulldozer and barcelona target
* Hopefully, because this was found by trial and error (dark magic)
* Ref #786
Werner Saar [Tue, 31 May 2016 12:13:52 +0000 (14:13 +0200)]
Merge pull request #898 from wernsaar/develop
added experimental support for optimized lapack fortran functions
Werner Saar [Tue, 31 May 2016 10:53:07 +0000 (12:53 +0200)]
added directory for optimized lapack Fortran codes and added dlaqr5.f
Kaustubh Raste [Tue, 31 May 2016 04:47:23 +0000 (10:17 +0530)]
STRSM optimized for MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Kaustubh Raste [Mon, 30 May 2016 15:47:00 +0000 (21:17 +0530)]
STRSM optimized
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Mon, 30 May 2016 06:52:58 +0000 (14:52 +0800)]
Merge pull request #893 from biddisco/develop
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PRO…
Zhang Xianyi [Mon, 30 May 2016 06:52:40 +0000 (14:52 +0800)]
Merge pull request #891 from rndfax/develop
mips64/axpy: fix error when INCY == 0
John Biddiscombe [Wed, 25 May 2016 07:13:28 +0000 (09:13 +0200)]
Replace CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR with PROJECT_SOURCE_DIR/PROJECT_BINARY_DIR
If OpenBLAS is built using add_subdirectory(OpenBlas) as part of another project
then the paths set by CMAKE_XXX_DIR are relative to the parent project
and not the OpenBLAS project.
Aleksey Kuleshov [Mon, 23 May 2016 10:24:15 +0000 (13:24 +0300)]
mips64/axpy: fix error when INCY == 0
Werner Saar [Mon, 23 May 2016 09:20:41 +0000 (11:20 +0200)]
optimized dtrsm_logic_LT_16x4_power8.S and dtrsm_macros_LT_16x4_power8.S
Werner Saar [Sun, 22 May 2016 14:01:35 +0000 (16:01 +0200)]
Merge pull request #890 from wernsaar/develop
optimized dtrsm_kernel_LT for POWER8
Werner Saar [Sun, 22 May 2016 13:20:04 +0000 (15:20 +0200)]
optimized dtrsm_kernel_LT for POWER8
Werner Saar [Sun, 22 May 2016 11:51:47 +0000 (13:51 +0200)]
added trsm benchmarks for POWER8 to benchmark/Makefile
Werner Saar [Sun, 22 May 2016 11:09:05 +0000 (13:09 +0200)]
added optimized dtrsm_LT kernel for POWER8
Zhang Xianyi [Sat, 21 May 2016 17:08:44 +0000 (01:08 +0800)]
Merge the patch for musl libc.
Zhang Xianyi [Fri, 20 May 2016 23:17:21 +0000 (07:17 +0800)]
Merge pull request #887 from ksraste/develop
STRSM optimization for MIPS P5600 and I6400 using MSA
Kaustubh Raste [Fri, 20 May 2016 05:29:03 +0000 (10:59 +0530)]
STRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Thu, 19 May 2016 11:59:09 +0000 (19:59 +0800)]
Merge pull request #886 from vriera/develop
Makefile.system: P5600 and I6400 cores need -mmsa
Zhang Xianyi [Thu, 19 May 2016 11:58:32 +0000 (19:58 +0800)]
Merge pull request #885 from sva-img/develop
SGEMM optimization for MIPS P5600 and I6400 using MSA.
Vicente Olivert Riera [Thu, 19 May 2016 09:35:45 +0000 (10:35 +0100)]
Makefile.system: P5600 and I6400 cores need -mmsa
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Shivraj Patil [Thu, 19 May 2016 05:34:42 +0000 (11:04 +0530)]
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Thu, 19 May 2016 03:16:43 +0000 (11:16 +0800)]
Merge pull request #878 from ksraste/develop
DTRSM bug fix for MIPS P5600 and I6400
Werner Saar [Tue, 17 May 2016 15:10:36 +0000 (17:10 +0200)]
Merge pull request #879 from wernsaar/develop
optimized dgemm and dgetrf for POWER8
Werner Saar [Tue, 17 May 2016 14:19:53 +0000 (16:19 +0200)]
optimized getrf_single.c for POWER8
Werner Saar [Tue, 17 May 2016 12:45:27 +0000 (14:45 +0200)]
optimized dgemm and dgetrf for POWER8
Kaustubh Raste [Tue, 17 May 2016 10:18:02 +0000 (15:48 +0530)]
DTRSM bug fix for MIPS P5600 and I6400
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Zhang Xianyi [Mon, 16 May 2016 15:21:56 +0000 (23:21 +0800)]
Merge pull request #877 from jeromerobert/bug873
Disable multi-threading in swap
Jerome Robert [Mon, 16 May 2016 13:07:55 +0000 (13:07 +0000)]
Disable multi-threading in swap
* Close #873
Werner Saar [Mon, 16 May 2016 12:52:40 +0000 (14:52 +0200)]
Merge pull request #876 from wernsaar/develop
optimized dgemm on power8 for 20 threads
Werner Saar [Mon, 16 May 2016 12:14:25 +0000 (14:14 +0200)]
optimized dgemm for 20 threads
Zhang Xianyi [Mon, 9 May 2016 14:54:55 +0000 (10:54 -0400)]
Merge pull request #869 from ksraste/develop
DTRSM optimization for MIPS P5600 and I6400 using MSA
Zhang Xianyi [Mon, 9 May 2016 14:54:30 +0000 (10:54 -0400)]
Merge pull request #868 from sva-img/develop
build fix for MIPS 32 bit
Kaustubh Raste [Mon, 9 May 2016 09:45:26 +0000 (15:15 +0530)]
DTRSM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Shivraj Patil [Mon, 9 May 2016 09:15:12 +0000 (14:45 +0530)]
build fix for MIPS 32 bit
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Fri, 6 May 2016 14:53:22 +0000 (10:53 -0400)]
Merge pull request #866 from sva-img/develop
DGEMM optimization for MIPS P5600 and I6400 using MSA
Shivraj Patil [Wed, 4 May 2016 05:37:14 +0000 (11:07 +0530)]
conflict resolved by syncing with 'xianyi:develop'
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Tue, 3 May 2016 21:06:31 +0000 (17:06 -0400)]
Merge pull request #867 from IvanUkhov/space
Wrap CURDIR and DESTDIR in quotes
Ivan Ukhov [Tue, 3 May 2016 19:31:32 +0000 (21:31 +0200)]
Wrap CURDIR and DESTDIR in quotes
Shivraj Patil [Tue, 3 May 2016 09:12:26 +0000 (14:42 +0530)]
DGEMM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Zhang Xianyi [Fri, 29 Apr 2016 15:46:24 +0000 (11:46 -0400)]
Merge pull request #863 from ashwinyes/develop_20160429_update_numa_binding
Update NUMA CPU binding
Zhang Xianyi [Fri, 29 Apr 2016 15:44:36 +0000 (11:44 -0400)]
Merge pull request #847 from sva-img/develop
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Werner Saar [Fri, 29 Apr 2016 11:33:45 +0000 (13:33 +0200)]
Merge pull request #864 from wernsaar/develop
optimized dgemm for POWER8
Werner Saar [Fri, 29 Apr 2016 10:52:47 +0000 (12:52 +0200)]
optimized dgemm for POWER8
Ashwin Sekhar T K [Fri, 29 Apr 2016 06:28:15 +0000 (11:58 +0530)]
Update NUMA CPU binding
When the requested number of processes can all be
accommodated within the current node,
use cores from the current node only.
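The binding rule described in this commit message can be sketched as below. This is an illustration only: the function name, the dictionary layout for the node topology, and the fallback ordering used when the current node is too small are all assumptions, not the logic of the actual patch.

```python
def pick_cores(num_procs: int, nodes: dict, current_node: int) -> list:
    """Choose core ids for num_procs workers.

    nodes maps a NUMA node id to the list of core ids on that node.
    """
    local = nodes[current_node]

    # Rule from the commit message: if everything fits on the current node,
    # stay on the current node.
    if num_procs <= len(local):
        return local[:num_procs]

    # Assumed fallback: fill the current node first, then spill over to the
    # remaining nodes in ascending node id order.
    ordered = list(local)
    for node_id in sorted(nodes):
        if node_id != current_node:
            ordered.extend(nodes[node_id])
    return ordered[:num_procs]


topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}  # hypothetical two-node machine
print(pick_cores(3, topology, current_node=1))
print(pick_cores(6, topology, current_node=1))
```

Keeping a small job on one node avoids cross-node memory traffic, which is the usual motivation for a rule of this kind.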
Zhang Xianyi [Thu, 28 Apr 2016 15:42:18 +0000 (11:42 -0400)]
Merge pull request #858 from buffer51/develop
Fixed cross-suffix detection for a path that contains dashes
buffer51 [Thu, 28 Apr 2016 05:23:02 +0000 (22:23 -0700)]
Use CROSS_SUFFIX only if CROSS is set
buffer51 [Wed, 27 Apr 2016 19:09:44 +0000 (12:09 -0700)]
Fixed cross-suffix detection for a path that contains dashes when the compiler name itself doesn't
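The failure mode named here (dashes in the directory part of the compiler path, none in the compiler name) can be illustrated with a small sketch. It assumes the cross prefix is everything up to the last dash of the compiler's base name, with the directory kept in front; the function name and the example paths are hypothetical, and the project's own build scripts are not reproduced here.

```python
import posixpath
import shlex


def cross_prefix(cc: str) -> str:
    """Derive a toolchain prefix from a CC value such as 'arm-linux-gcc -O2'."""
    # Drop any compiler arguments and keep only the executable.
    compiler = shlex.split(cc)[0]

    # Look for dashes in the base name only, so a dashed directory name
    # cannot be mistaken for a target triplet.
    directory, name = posixpath.split(compiler)
    prefix = posixpath.join(directory, "") if directory else ""

    head, dash, _tail = name.rpartition("-")
    if dash:
        prefix += head + "-"
    return prefix


print(cross_prefix("/opt/my-tools/bin/gcc"))
print(cross_prefix("arm-linux-gnueabihf-gcc -march=armv7-a"))
```

A known limit of this approach is that a versioned native compiler name such as `gcc-4.9` would still yield the prefix `gcc-`, so a caller may want to apply the prefix only when cross compiling was requested explicitly.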
Werner Saar [Wed, 27 Apr 2016 14:34:15 +0000 (16:34 +0200)]
Merge pull request #856 from wernsaar/develop
optimized dgemm for POWER8
Werner Saar [Wed, 27 Apr 2016 13:48:09 +0000 (15:48 +0200)]
optimized param.h for POWER8
Werner Saar [Wed, 27 Apr 2016 12:01:08 +0000 (14:01 +0200)]
optimized dgemm for POWER8
Zhang Xianyi [Tue, 26 Apr 2016 14:24:33 +0000 (10:24 -0400)]
Merge pull request #852 from buffer51/develop
Added Android as a community-supported OS
Zhang Xianyi [Tue, 26 Apr 2016 14:24:13 +0000 (10:24 -0400)]
Merge pull request #851 from rndfax/develop
allow building tests when CROSS compiling but don't run them
buffer51 [Tue, 26 Apr 2016 10:14:03 +0000 (03:14 -0700)]
Added Android as a community-supported OS
Aleksey Kuleshov [Fri, 22 Apr 2016 15:21:18 +0000 (18:21 +0300)]
allow building tests when CROSS compiling but don't run them
Werner Saar [Mon, 25 Apr 2016 10:00:43 +0000 (12:00 +0200)]
Merge pull request #850 from wernsaar/develop
Bugfixes and enhancements for EXCAVATOR