platform/upstream/openblas.git
Werner Saar [Thu, 23 Apr 2015 08:23:13 +0000 (10:23 +0200)]
added optimized ssymv kernels for haswell

wernsaar [Wed, 22 Apr 2015 10:36:13 +0000 (12:36 +0200)]
Merge pull request #549 from wernsaar/develop

added optimized dsymv kernels for haswell and sandybridge

Werner Saar [Wed, 22 Apr 2015 10:09:43 +0000 (12:09 +0200)]
added optimized dsymv kernels for sandybridge

Werner Saar [Wed, 22 Apr 2015 08:42:50 +0000 (10:42 +0200)]
added optimized dsymv kernels for haswell

Zhang Xianyi [Tue, 21 Apr 2015 04:22:40 +0000 (23:22 -0500)]
Refs #478,#482, Enable stack alloc for s/dgemv_t.(revert 9798491)

Werner Saar [Sun, 19 Apr 2015 09:24:07 +0000 (11:24 +0200)]
added asum benchmark

Werner Saar [Sat, 18 Apr 2015 06:41:41 +0000 (08:41 +0200)]
added scal benchmark

wernsaar [Thu, 16 Apr 2015 09:36:51 +0000 (11:36 +0200)]
Merge pull request #546 from wernsaar/develop

added optimized zaxpy-kernels

Werner Saar [Thu, 16 Apr 2015 09:19:37 +0000 (11:19 +0200)]
added optimized zaxpy-kernels

Zhang Xianyi [Wed, 15 Apr 2015 16:18:14 +0000 (11:18 -0500)]
Merge pull request #543 from jeromerobert/develop

Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t

wernsaar [Wed, 15 Apr 2015 15:04:02 +0000 (17:04 +0200)]
Merge pull request #544 from wernsaar/develop

Optimized caxpy-kernels

Werner Saar [Wed, 15 Apr 2015 14:29:25 +0000 (16:29 +0200)]
added optimized caxpy-kernel for sandybridge

Werner Saar [Wed, 15 Apr 2015 13:16:31 +0000 (15:16 +0200)]
added optimized caxpy-kernel for haswell

Werner Saar [Wed, 15 Apr 2015 11:49:23 +0000 (13:49 +0200)]
added optimized caxpy-kernel for steamroller

Werner Saar [Wed, 15 Apr 2015 09:59:38 +0000 (11:59 +0200)]
updated caxpy_microk_bulldozer-2.c and caxpy.c

Jerome Robert [Wed, 15 Apr 2015 07:41:45 +0000 (09:41 +0200)]
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t

Refs #478, #482, 9798481, fd9fd42

wernsaar [Tue, 14 Apr 2015 13:53:09 +0000 (15:53 +0200)]
Merge pull request #540 from wernsaar/develop

Optimized dot- and axpy-kernels

Werner Saar [Tue, 14 Apr 2015 13:09:13 +0000 (15:09 +0200)]
add optimized ddot-kernel for piledriver

Werner Saar [Tue, 14 Apr 2015 12:23:29 +0000 (14:23 +0200)]
add optimized daxpy-kernel for piledriver

Werner Saar [Tue, 14 Apr 2015 07:09:39 +0000 (09:09 +0200)]
added optimized saxpy kernel for steamroller

Werner Saar [Tue, 14 Apr 2015 06:34:11 +0000 (08:34 +0200)]
optimized saxpy for piledriver

Zhang Xianyi [Tue, 14 Apr 2015 04:23:40 +0000 (23:23 -0500)]
Enable MAX_STACK_ALLOC by default.

Zhang Xianyi [Tue, 14 Apr 2015 04:22:27 +0000 (23:22 -0500)]
Refs #478, #482. Fixed bug on previous commit.

Zhang Xianyi [Tue, 14 Apr 2015 00:45:27 +0000 (19:45 -0500)]
Refs #478, #482. Fix segfault bug for gemv_t with MAX_STACK_ALLOC flag.

For gemv_t, directly use malloc to create the buffer.

Werner Saar [Mon, 13 Apr 2015 11:19:21 +0000 (13:19 +0200)]
optimized sdot-kernel for piledriver

Werner Saar [Mon, 13 Apr 2015 10:22:43 +0000 (12:22 +0200)]
add optimized daxpy-kernel for steamroller

Werner Saar [Sat, 11 Apr 2015 06:48:18 +0000 (08:48 +0200)]
added optimized sdot-kernel for steamroller

Werner Saar [Fri, 10 Apr 2015 14:18:03 +0000 (16:18 +0200)]
added optimized ddot kernel for steamroller

wernsaar [Fri, 10 Apr 2015 14:03:37 +0000 (16:03 +0200)]
Merge pull request #538 from wernsaar/develop

Added optimized cdot- and zdot-kernels

Werner Saar [Fri, 10 Apr 2015 09:10:31 +0000 (11:10 +0200)]
updated cdot and zdot

Werner Saar [Fri, 10 Apr 2015 07:37:26 +0000 (09:37 +0200)]
add optimized cdot- and zdot-kernel for sandybridge

Werner Saar [Thu, 9 Apr 2015 13:13:52 +0000 (15:13 +0200)]
add optimized cdot- and zdot-kernel for haswell

Werner Saar [Thu, 9 Apr 2015 08:33:46 +0000 (10:33 +0200)]
updated cdot and zdot for piledriver

Werner Saar [Thu, 9 Apr 2015 07:45:23 +0000 (09:45 +0200)]
added optimized cdot- and zdot-kernel for steamroller

Werner Saar [Wed, 8 Apr 2015 14:29:55 +0000 (16:29 +0200)]
added optimized cdot- and zdot-kernels for bulldozer

Zhang Xianyi [Tue, 7 Apr 2015 19:55:49 +0000 (03:55 +0800)]
Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel.

Zhang Xianyi [Tue, 7 Apr 2015 17:48:11 +0000 (12:48 -0500)]
Merge pull request #534 from wernsaar/develop

Refs #533. added optimized saxpy- and daxpy-kernel for haswell and sandybridge

Werner Saar [Tue, 7 Apr 2015 09:56:06 +0000 (11:56 +0200)]
added cdot- and zdot benchmark

Werner Saar [Mon, 6 Apr 2015 16:47:16 +0000 (18:47 +0200)]
updated some lines for bulldozer

Werner Saar [Mon, 6 Apr 2015 14:05:16 +0000 (16:05 +0200)]
added optimized saxpy- and daxpy-kernel for sandybridge

Werner Saar [Mon, 6 Apr 2015 10:33:16 +0000 (12:33 +0200)]
added optimized saxpy- and daxpy-kernel for haswell

Zhang Xianyi [Sun, 5 Apr 2015 21:42:39 +0000 (16:42 -0500)]
Merge pull request #531 from wernsaar/develop

added optimized sdot- and ddot-kernels for Haswell and Sandybridge

Werner Saar [Sun, 5 Apr 2015 18:19:38 +0000 (20:19 +0200)]
added optimized ddot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 17:47:05 +0000 (19:47 +0200)]
add optimized sdot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 16:35:34 +0000 (18:35 +0200)]
removed double definition line

Werner Saar [Sun, 5 Apr 2015 15:57:53 +0000 (17:57 +0200)]
added optimized sdot- and ddot-kernel for HASWELL

Zhang Xianyi [Thu, 2 Apr 2015 16:08:03 +0000 (11:08 -0500)]
Refs #529. Support Intel Broadwell by Haswell kernels.

Zhang Xianyi [Mon, 30 Mar 2015 15:16:11 +0000 (10:16 -0500)]
Merge pull request #527 from xantares/patch-1

fix mingw install

xantares [Mon, 30 Mar 2015 07:30:55 +0000 (09:30 +0200)]
fix mingw install

Zhang Xianyi [Tue, 24 Mar 2015 20:27:17 +0000 (15:27 -0500)]
Fix build bug for ARM64.

Zhang Xianyi [Tue, 24 Mar 2015 20:05:59 +0000 (15:05 -0500)]
Update the doc for 0.2.14.

Zhang Xianyi [Tue, 24 Mar 2015 17:17:12 +0000 (12:17 -0500)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop

Zhang Xianyi [Tue, 24 Mar 2015 17:17:04 +0000 (12:17 -0500)]
Add ARM targets.

Zhang Xianyi [Tue, 24 Mar 2015 17:15:33 +0000 (17:15 +0000)]
Fix compiling bug for ARM with setting BINARY.

Zhang Xianyi [Sat, 21 Mar 2015 17:26:35 +0000 (12:26 -0500)]
Merge pull request #521 from maxlevesque/patch-1

Correct typo /proc/ instead of /pros/

Maximilien Levesque [Fri, 20 Mar 2015 22:25:11 +0000 (23:25 +0100)]
Correct typo /proc/ instead of /pros/

Zhang Xianyi [Thu, 19 Mar 2015 20:57:22 +0000 (15:57 -0500)]
Refs #519. Avoid calling strncpy.

Zhang Xianyi [Thu, 19 Mar 2015 16:51:36 +0000 (11:51 -0500)]
Refs #520. Fixed ONLY_CBLAS=1 compiling bug on OSX.

Zhang Xianyi [Wed, 18 Mar 2015 18:00:07 +0000 (13:00 -0500)]
Merge pull request #518 from ton/issue-508

Fix issue #508

Ton van den Heuvel [Wed, 18 Mar 2015 12:22:43 +0000 (13:22 +0100)]
Fix issue #508

Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().

Zhang Xianyi [Wed, 25 Feb 2015 22:37:03 +0000 (06:37 +0800)]
Refs #492. Fixed c/zsyr bug with negative incx.

Zhang Xianyi [Wed, 25 Feb 2015 17:47:11 +0000 (01:47 +0800)]
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.

Zhang Xianyi [Wed, 25 Feb 2015 17:44:19 +0000 (01:44 +0800)]
Refs #509. Merge branch 'grisuthedragon-develop' into develop

Martin Koehler [Mon, 16 Feb 2015 12:46:20 +0000 (13:46 +0100)]
Add ATLAS-style ?geadd function

Zhang Xianyi [Sun, 8 Feb 2015 07:42:48 +0000 (01:42 -0600)]
Detect the wrong combined flags of USE_OPENMP=1 and USE_THREAD=0.

Zhang Xianyi [Sun, 8 Feb 2015 07:30:12 +0000 (01:30 -0600)]
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.

Zhang Xianyi [Wed, 4 Feb 2015 05:09:38 +0000 (23:09 -0600)]
Merge pull request #497 from eschnett/develop

Introduce openblas_get_num_threads and openblas_get_num_procs

Erik Schnetter [Tue, 3 Feb 2015 17:23:34 +0000 (12:23 -0500)]
Introduce openblas_get_num_threads and openblas_get_num_procs

Zhang Xianyi [Thu, 29 Jan 2015 10:23:50 +0000 (18:23 +0800)]
Merge pull request #495 from jeromerobert/develop

Fix a segfault in gemv when MAX_STACK_ALLOC is set

Jerome Robert [Thu, 29 Jan 2015 08:55:57 +0000 (09:55 +0100)]
Fix a segfault in gemv when MAX_STACK_ALLOC is set

* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reason, even though the bug
has not been observed there.
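
The hazard described above can be sketched in portable C. This is only an illustration under stated assumptions: the commit message does not quote the fix, so the use of `volatile` here is an assumed remedy, and `demo_kernel`, `demo_double_all` and `DEMO_STACK_LIMIT` are hypothetical names that do not come from OpenBLAS. The idea is that a `volatile` local is re-read from memory after the kernel call instead of being trusted to survive in a register.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_STACK_LIMIT 256  /* hypothetical limit, in elements */

/* Hypothetical stand-in for an assembly kernel that might clobber registers. */
static void demo_kernel(const double *x, size_t n, double *buffer)
{
    for (size_t i = 0; i < n; i++) {
        buffer[i] = 2.0 * x[i];
    }
}

/* Writes 2*x[i] into y[i] through a scratch buffer.
 * Returns 1 if the stack buffer was used, 0 if the heap was used
 * (or n was 0), and -1 if the heap allocation failed. */
int demo_double_all(const double *x, size_t n, double *y)
{
    assert(x != NULL && y != NULL);
    if (n == 0) {
        return 0;  /* nothing to do */
    }

    /* Assumed remedy: volatile makes every read below a load from memory. */
    volatile size_t stack_alloc_size = (n <= DEMO_STACK_LIMIT) ? n : 0;
    double stack_buffer[DEMO_STACK_LIMIT];
    double *buffer = stack_alloc_size ? stack_buffer
                                      : malloc(n * sizeof(double));
    if (buffer == NULL) {
        return -1;
    }

    demo_kernel(x, n, buffer);
    for (size_t i = 0; i < n; i++) {
        y[i] = buffer[i];
    }

    /* The size is needed again here, after the call, to decide on free(). */
    if (!stack_alloc_size) {
        free(buffer);
        return 0;
    }
    return 1;
}
```

A pure C kernel like the one above cannot reproduce the register clobbering itself; the sketch only shows where the value is read again after the call.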

Zhang Xianyi [Tue, 13 Jan 2015 07:43:56 +0000 (15:43 +0800)]
Merge pull request #490 from eschnett/develop

Move #include statements outside extern "C" blocks

Erik Schnetter [Tue, 13 Jan 2015 02:27:52 +0000 (21:27 -0500)]
Move #include statements outside extern "C" blocks
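
A header laid out as this commit title describes might look like the sketch below. The function `demo_count_positive` is hypothetical and exists only for illustration; it is not an OpenBLAS symbol. Keeping `#include` lines outside the `extern "C"` block means that, when the header is compiled as C++, the included system headers are not forced into C linkage.

```c
#include <assert.h>
#include <stddef.h>  /* includes come first, outside the linkage block */

#ifdef __cplusplus
extern "C" {         /* only this library's own declarations get C linkage */
#endif

/* Hypothetical declaration, for illustration only. */
size_t demo_count_positive(const double *x, size_t n);

#ifdef __cplusplus
}
#endif

/* Counts the strictly positive entries of x[0..n-1]. */
size_t demo_count_positive(const double *x, size_t n)
{
    size_t count = 0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0.0) {
            count++;
        }
    }
    return count;
}
```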

Zhang Xianyi [Mon, 12 Jan 2015 09:35:16 +0000 (09:35 +0000)]
Fix cortex-a15 detecting bug.

Zhang Xianyi [Mon, 12 Jan 2015 08:55:29 +0000 (08:55 +0000)]
Add cortex-a9 and cortex-a15 targets.

Zhang Xianyi [Wed, 7 Jan 2015 06:13:11 +0000 (14:13 +0800)]
Merge pull request #487 from kortschak/dromtg-test

Add test for drotmg bug fixed by 692b14c

kortschak [Tue, 6 Jan 2015 23:36:55 +0000 (10:06 +1030)]
Add test for drotmg bug fixed by 692b14c

Test requested in issue xianyi/OpenBLAS#484.

Run tests by applying the following change and then make:

diff --git a/Makefile.rule b/Makefile.rule
index bea1fe1..9852ff3 100644
--- a/Makefile.rule
+++ b/Makefile.rule
@@ -140,7 +140,7 @@ NO_AFFINITY = 1

-# UTEST_CHECK = 1
+UTEST_CHECK = 1

Zhang Xianyi [Thu, 1 Jan 2015 18:42:32 +0000 (02:42 +0800)]
Add configuration options.

Zhang Xianyi [Thu, 1 Jan 2015 18:26:17 +0000 (02:26 +0800)]
Merge pull request #482 from jeromerobert/develop

Allow to do gemv and ger buffer allocation on the stack

Zhang Xianyi [Tue, 30 Dec 2014 18:36:23 +0000 (02:36 +0800)]
Merge pull request #486 from wernsaar/develop

Optimizations for steamroller

Werner Saar [Tue, 30 Dec 2014 12:16:53 +0000 (20:16 +0800)]
Merge branch 'develop' of github.com:wernsaar/OpenBLAS into develop

Werner Saar [Tue, 30 Dec 2014 12:14:45 +0000 (20:14 +0800)]
added optimizations for steamroller

Zhang Xianyi [Mon, 29 Dec 2014 04:00:16 +0000 (12:00 +0800)]
Merge pull request #483 from wernsaar/develop

added Steamroller as a cpu target

Werner Saar [Sun, 28 Dec 2014 16:15:42 +0000 (17:15 +0100)]
bugfix in dynamic.c

Werner Saar [Sun, 28 Dec 2014 12:45:19 +0000 (13:45 +0100)]
added Steamroller as a target processor

Werner Saar [Sun, 28 Dec 2014 12:16:46 +0000 (20:16 +0800)]
added target processor STEAMROLLER

Jerome Robert [Fri, 26 Dec 2014 13:42:00 +0000 (14:42 +0100)]
Allow to do gemv and ger buffer allocation on the stack

ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be high enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
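
The trade-off described in this commit message can be sketched as follows. This is a minimal illustration, not the OpenBLAS implementation: `scaled_sum` is a hypothetical routine, plain `malloc` stands in for the locked allocator, and the 2048-byte default simply mirrors the value in the message. Small requests are served from a fixed-size stack array and larger ones fall back to the heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Threshold in bytes; 2048 mirrors the value in the commit message. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Hypothetical routine that needs n doubles of scratch space.
 * Stores alpha * sum(x) in *out.
 * Returns 1 if the scratch buffer lived on the stack, 0 if it came
 * from the heap, and -1 if the heap allocation failed. */
int scaled_sum(const double *x, size_t n, double alpha, double *out)
{
    /* Fixed-size array: no lock is taken and no allocator is called. */
    double stack_buf[MAX_STACK_ALLOC / sizeof(double)];
    size_t capacity = sizeof(stack_buf) / sizeof(stack_buf[0]);
    int on_stack = (n <= capacity);
    double *buf;
    double sum = 0.0;

    assert(out != NULL);

    /* Fall back to the heap when the request exceeds the threshold. */
    buf = on_stack ? stack_buf : malloc(n * sizeof(double));
    if (buf == NULL) {
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        buf[i] = alpha * x[i];
    }
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    *out = sum;

    if (!on_stack) {
        free(buf);
    }
    return on_stack;
}
```

With the default threshold the stack buffer holds 256 doubles, so a 4-element call stays on the stack while a 1000-element call goes to the heap.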

Zhang Xianyi [Fri, 26 Dec 2014 02:09:19 +0000 (10:09 +0800)]
Merge pull request #481 from eschnett/develop

Correct ilaver C declaration

Erik Schnetter [Thu, 25 Dec 2014 22:41:17 +0000 (17:41 -0500)]
Correct ilaver C declaration

Zhang Xianyi [Mon, 22 Dec 2014 16:59:41 +0000 (00:59 +0800)]
Merge pull request #479 from wernsaar/develop

workaround for sandybridge zgemm kernel

Werner Saar [Mon, 22 Dec 2014 16:01:18 +0000 (17:01 +0100)]
Ref #458: Backport, sandybridge uses nehalem zgemm kernel

Werner Saar [Mon, 22 Dec 2014 13:04:27 +0000 (14:04 +0100)]
increased NMAX to 128

Werner Saar [Fri, 19 Dec 2014 11:40:46 +0000 (12:40 +0100)]
modified sources for OS Darwin

Werner Saar [Thu, 18 Dec 2014 19:35:51 +0000 (20:35 +0100)]
small optimization on dgemm_kernel for N=1

Werner Saar [Wed, 17 Dec 2014 14:02:11 +0000 (15:02 +0100)]
added code for the size of n

Werner Saar [Wed, 17 Dec 2014 13:12:21 +0000 (14:12 +0100)]
modified makefile for acml6.1

Zhang Xianyi [Sat, 13 Dec 2014 05:05:06 +0000 (13:05 +0800)]
Fixed installation bug on Mac OSX.

Werner Saar [Thu, 11 Dec 2014 13:57:41 +0000 (14:57 +0100)]
Increased the Threshold value in sep.in

Werner Saar [Thu, 11 Dec 2014 12:53:59 +0000 (13:53 +0100)]
added tests to sep.as as workaround for gfortran-4.8.x

Zhang Xianyi [Tue, 9 Dec 2014 09:57:43 +0000 (17:57 +0800)]
Merge pull request #475 from xantares/patch-2

add OpenBLAS_VERSION to cmake config file

Zhang Xianyi [Tue, 9 Dec 2014 09:57:16 +0000 (17:57 +0800)]
Merge pull request #474 from xantares/patch-1

set OPENBLAS_CMAKE_DIR to <prefix>/lib/cmake/<package_name>