platform/upstream/openblas.git
Werner Saar [Wed, 6 May 2015 08:41:53 +0000 (10:41 +0200)]
use only 1 thread if m or n < 2*GEMM_MULTITHREAD_THRESHOLD
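
The rule in this commit title can be sketched as a small helper. This is an illustration only: `choose_gemm_threads` is a hypothetical name, the threshold value of 4 is an assumed placeholder for the build-time setting, and the log does not say which routine applies the check.

```c
#include <assert.h>

/* Assumed placeholder; a real build would take this from its configuration. */
#ifndef GEMM_MULTITHREAD_THRESHOLD
#define GEMM_MULTITHREAD_THRESHOLD 4
#endif

/* Hypothetical helper: fall back to one thread when either dimension is small. */
static int choose_gemm_threads(long m, long n, int max_threads)
{
    assert(max_threads >= 1);
    if (m < 2 * GEMM_MULTITHREAD_THRESHOLD || n < 2 * GEMM_MULTITHREAD_THRESHOLD)
        return 1;   /* presumably too little work to justify threading */
    return max_threads;
}
```

With the assumed threshold of 4, any problem with m or n below 8 runs single-threaded.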

Werner Saar [Wed, 6 May 2015 07:21:19 +0000 (09:21 +0200)]
added loops to trsm.c

wernsaar [Tue, 5 May 2015 10:13:35 +0000 (12:13 +0200)]
Merge pull request #563 from wernsaar/develop

Bugfix for gemm3m tests

Werner Saar [Tue, 5 May 2015 09:58:59 +0000 (11:58 +0200)]
bugfix for gemm3m tests

Werner Saar [Tue, 5 May 2015 09:39:43 +0000 (11:39 +0200)]
removed gemm3m functions from normal checks

wernsaar [Mon, 4 May 2015 09:11:13 +0000 (11:11 +0200)]
Merge pull request #561 from wernsaar/develop

updated dgemv_n sgemv_n kernels

Werner Saar [Thu, 30 Apr 2015 12:38:06 +0000 (14:38 +0200)]
updated dgemv_n kernel for nehalem

Werner Saar [Thu, 30 Apr 2015 10:11:39 +0000 (12:11 +0200)]
optimized dgemv_n kernel for haswell

Zhang Xianyi [Wed, 29 Apr 2015 16:36:47 +0000 (11:36 -0500)]
Merge pull request #560 from sebastien-villemot/develop

Fix detection of ARM architectures in c_check.

Sébastien Villemot [Wed, 29 Apr 2015 16:14:21 +0000 (18:14 +0200)]
Fix detection of ARM architectures in c_check.

This is necessary to avoid the false detection of a cross-compiling environment.

wernsaar [Tue, 28 Apr 2015 15:30:16 +0000 (17:30 +0200)]
Merge pull request #558 from wernsaar/develop

optimizations for sandybridge

Werner Saar [Tue, 28 Apr 2015 14:58:11 +0000 (16:58 +0200)]
optimized dger kernel for sandybridge

Werner Saar [Tue, 28 Apr 2015 13:33:38 +0000 (15:33 +0200)]
added optimized sger kernel for sandybridge

Werner Saar [Tue, 28 Apr 2015 08:18:32 +0000 (10:18 +0200)]
optimized saxpy and daxpy for sandybridge

Zhang Xianyi [Sat, 25 Apr 2015 13:11:36 +0000 (08:11 -0500)]
Merge pull request #554 from wernsaar/develop

added benchmarks for zgeru and cgeru

Werner Saar [Sat, 25 Apr 2015 12:53:07 +0000 (14:53 +0200)]
add benchmarks for zgeru and cgeru

Zhang Xianyi [Fri, 24 Apr 2015 19:12:12 +0000 (14:12 -0500)]
Merge pull request #552 from jeromerobert/develop

gemv: Ensure stack buffer is large enough to handle memory alignment

Werner Saar [Fri, 24 Apr 2015 12:30:44 +0000 (14:30 +0200)]
bugfixes: replaced int with BLASLONG

wernsaar [Fri, 24 Apr 2015 11:57:48 +0000 (13:57 +0200)]
Merge pull request #553 from wernsaar/develop

optimized some blas level1 kernels for increments != 1

Werner Saar [Fri, 24 Apr 2015 11:13:20 +0000 (13:13 +0200)]
optimized sdot.c for increments != 1

Werner Saar [Fri, 24 Apr 2015 09:52:59 +0000 (11:52 +0200)]
optimized saxpy.c for increments != 1

Werner Saar [Fri, 24 Apr 2015 09:39:17 +0000 (11:39 +0200)]
optimized daxpy kernel for increments != 1

Werner Saar [Fri, 24 Apr 2015 08:56:55 +0000 (10:56 +0200)]
optimized ddot.c for increments != 1

Jerome Robert [Tue, 21 Apr 2015 08:12:01 +0000 (10:12 +0200)]
gemv: Ensure stack buffer is large enough to handle memory alignment

Ref #478
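
The sizing concern named in this commit title can be illustrated as follows. This is a sketch under stated assumptions, not the code from the commit: `BUF_ALIGN`, `padded_size`, and `align_up` are hypothetical names, and the 32-byte alignment is an assumed value. The point is that a buffer that will be rounded up to an alignment boundary needs up to `BUF_ALIGN - 1` extra bytes of room.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_ALIGN 32u  /* assumed alignment in bytes; must be a power of two */

/* Bytes to reserve so that an aligned region of `payload` bytes always fits. */
static size_t padded_size(size_t payload)
{
    return payload + (BUF_ALIGN - 1);
}

/* Round a raw pointer up to the next BUF_ALIGN boundary. */
static void *align_up(void *raw)
{
    uintptr_t p = (uintptr_t)raw;
    assert((BUF_ALIGN & (BUF_ALIGN - 1)) == 0);
    p = (p + (BUF_ALIGN - 1)) & ~(uintptr_t)(BUF_ALIGN - 1);
    return (void *)p;
}
```

A caller would reserve `padded_size(bytes)` on the stack and hand `align_up(raw)` to the kernel; reserving only `bytes` risks running past the end once the pointer has been rounded up.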

wernsaar [Thu, 23 Apr 2015 11:27:38 +0000 (13:27 +0200)]
Merge pull request #550 from wernsaar/develop

added optimized ssymv kernels for haswell and sandybridge

Werner Saar [Thu, 23 Apr 2015 10:19:24 +0000 (12:19 +0200)]
added optimized ssymv kernels for sandybridge

Werner Saar [Thu, 23 Apr 2015 08:23:13 +0000 (10:23 +0200)]
added optimized ssymv kernels for haswell

wernsaar [Wed, 22 Apr 2015 10:36:13 +0000 (12:36 +0200)]
Merge pull request #549 from wernsaar/develop

added optimized dsymv kernels for haswell and sandybridge

Werner Saar [Wed, 22 Apr 2015 10:09:43 +0000 (12:09 +0200)]
added optimized dsymv kernels for sandybridge

Werner Saar [Wed, 22 Apr 2015 08:42:50 +0000 (10:42 +0200)]
added optimized dsymv kernels for haswell

Zhang Xianyi [Tue, 21 Apr 2015 04:22:40 +0000 (23:22 -0500)]
Refs #478,#482, Enable stack alloc for s/dgemv_t.(revert 9798491)

Werner Saar [Sun, 19 Apr 2015 09:24:07 +0000 (11:24 +0200)]
added asum benchmark

Werner Saar [Sat, 18 Apr 2015 06:41:41 +0000 (08:41 +0200)]
added scal benchmark

wernsaar [Thu, 16 Apr 2015 09:36:51 +0000 (11:36 +0200)]
Merge pull request #546 from wernsaar/develop

added optimized zaxpy-kernels

Werner Saar [Thu, 16 Apr 2015 09:19:37 +0000 (11:19 +0200)]
added optimized zaxpy-kernels

Zhang Xianyi [Wed, 15 Apr 2015 16:18:14 +0000 (11:18 -0500)]
Merge pull request #543 from jeromerobert/develop

Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t

wernsaar [Wed, 15 Apr 2015 15:04:02 +0000 (17:04 +0200)]
Merge pull request #544 from wernsaar/develop

Optimized caxpy-kernels

Werner Saar [Wed, 15 Apr 2015 14:29:25 +0000 (16:29 +0200)]
added optimized caxpy-kernel for sandybridge

Werner Saar [Wed, 15 Apr 2015 13:16:31 +0000 (15:16 +0200)]
added optimized caxpy-kernel for haswell

Werner Saar [Wed, 15 Apr 2015 11:49:23 +0000 (13:49 +0200)]
added optimized caxpy-kernel for steamroller

Werner Saar [Wed, 15 Apr 2015 09:59:38 +0000 (11:59 +0200)]
updated caxpy_microk_bulldozer-2.c and caxpy.c

Jerome Robert [Wed, 15 Apr 2015 07:41:45 +0000 (09:41 +0200)]
Fix a buffer overflow with MAX_STACK_ALLOC size in dgemv_t

Refs #478, #482, 9798481, fd9fd42

wernsaar [Tue, 14 Apr 2015 13:53:09 +0000 (15:53 +0200)]
Merge pull request #540 from wernsaar/develop

Optimized dot- and axpy-kernels

Werner Saar [Tue, 14 Apr 2015 13:09:13 +0000 (15:09 +0200)]
add optimized ddot-kernel for piledriver

Werner Saar [Tue, 14 Apr 2015 12:23:29 +0000 (14:23 +0200)]
add optimized daxpy-kernel for piledriver

Werner Saar [Tue, 14 Apr 2015 07:09:39 +0000 (09:09 +0200)]
added optimized saxpy kernel for steamroller

Werner Saar [Tue, 14 Apr 2015 06:34:11 +0000 (08:34 +0200)]
optimized saxpy for piledriver

Zhang Xianyi [Tue, 14 Apr 2015 04:23:40 +0000 (23:23 -0500)]
Enable MAX_STACK_ALLOC by default.

Zhang Xianyi [Tue, 14 Apr 2015 04:22:27 +0000 (23:22 -0500)]
Refs #478, #482. Fixed bug on previous commit.

Zhang Xianyi [Tue, 14 Apr 2015 00:45:27 +0000 (19:45 -0500)]
Refs #478, #482. Fix segfault bug for gemv_t with MAX_ALLOC_STACK flag.

For gemv_t, directly use malloc to create the buffer.

Werner Saar [Mon, 13 Apr 2015 11:19:21 +0000 (13:19 +0200)]
optimized sdot-kernel for piledriver

Werner Saar [Mon, 13 Apr 2015 10:22:43 +0000 (12:22 +0200)]
add optimized daxpy-kernel for steamroller

Werner Saar [Sat, 11 Apr 2015 06:48:18 +0000 (08:48 +0200)]
added optimized sdot-kernel for steamroller

Werner Saar [Fri, 10 Apr 2015 14:18:03 +0000 (16:18 +0200)]
added optimized ddot kernel for steamroller

wernsaar [Fri, 10 Apr 2015 14:03:37 +0000 (16:03 +0200)]
Merge pull request #538 from wernsaar/develop

Added optimized cdot- and zdot-kernels

Werner Saar [Fri, 10 Apr 2015 09:10:31 +0000 (11:10 +0200)]
updated cdot and zdot

Werner Saar [Fri, 10 Apr 2015 07:37:26 +0000 (09:37 +0200)]
add optimized cdot- and zdot-kernel for sandybridge

Werner Saar [Thu, 9 Apr 2015 13:13:52 +0000 (15:13 +0200)]
add optimized cdot- and zdot-kernel for haswell

Werner Saar [Thu, 9 Apr 2015 08:33:46 +0000 (10:33 +0200)]
updated cdot and zdot for piledriver

Werner Saar [Thu, 9 Apr 2015 07:45:23 +0000 (09:45 +0200)]
added optimized cdot- and zdot-kernel for steamroller

Werner Saar [Wed, 8 Apr 2015 14:29:55 +0000 (16:29 +0200)]
added optimized cdot- and zdot-kernels for bulldozer

Zhang Xianyi [Tue, 7 Apr 2015 19:55:49 +0000 (03:55 +0800)]
Refs #535. Fix the wrong vector instruction in sgemm sandy bridge kernel.

Zhang Xianyi [Tue, 7 Apr 2015 17:48:11 +0000 (12:48 -0500)]
Merge pull request #534 from wernsaar/develop

Refs #533. added optimized saxpy- and daxpy-kernel for haswell and sandybridge

Werner Saar [Tue, 7 Apr 2015 09:56:06 +0000 (11:56 +0200)]
added cdot- and zdot benchmark

Werner Saar [Mon, 6 Apr 2015 16:47:16 +0000 (18:47 +0200)]
updated some lines for bulldozer

Werner Saar [Mon, 6 Apr 2015 14:05:16 +0000 (16:05 +0200)]
added optimized saxpy- and daxpy-kernel for sandybridge

Werner Saar [Mon, 6 Apr 2015 10:33:16 +0000 (12:33 +0200)]
added optimized saxpy- and daxpy-kernel for haswell

Zhang Xianyi [Sun, 5 Apr 2015 21:42:39 +0000 (16:42 -0500)]
Merge pull request #531 from wernsaar/develop

added optimized sdot- and ddot-kernels for Haswell and Sandybridge

Werner Saar [Sun, 5 Apr 2015 18:19:38 +0000 (20:19 +0200)]
added optimized ddot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 17:47:05 +0000 (19:47 +0200)]
add optimized sdot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 16:35:34 +0000 (18:35 +0200)]
removed double definition line

Werner Saar [Sun, 5 Apr 2015 15:57:53 +0000 (17:57 +0200)]
added optimized sdot- and ddot-kernel for HASWELL

Zhang Xianyi [Thu, 2 Apr 2015 16:08:03 +0000 (11:08 -0500)]
Refs #529. Support Intel Broadwell by Haswell kernels.

Zhang Xianyi [Mon, 30 Mar 2015 15:16:11 +0000 (10:16 -0500)]
Merge pull request #527 from xantares/patch-1

fix mingw install

xantares [Mon, 30 Mar 2015 07:30:55 +0000 (09:30 +0200)]
fix mingw install

Zhang Xianyi [Tue, 24 Mar 2015 20:27:17 +0000 (15:27 -0500)]
Fix build bug for ARM64.

Zhang Xianyi [Tue, 24 Mar 2015 20:05:59 +0000 (15:05 -0500)]
Update the doc for 0.2.14.

Zhang Xianyi [Tue, 24 Mar 2015 17:17:12 +0000 (12:17 -0500)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop

Zhang Xianyi [Tue, 24 Mar 2015 17:17:04 +0000 (12:17 -0500)]
Add ARM targets.

Zhang Xianyi [Tue, 24 Mar 2015 17:15:33 +0000 (17:15 +0000)]
Fix compiling bug for ARM with setting BINARY.

Zhang Xianyi [Sat, 21 Mar 2015 17:26:35 +0000 (12:26 -0500)]
Merge pull request #521 from maxlevesque/patch-1

Correct typo /proc/ instead of /pros/

Maximilien Levesque [Fri, 20 Mar 2015 22:25:11 +0000 (23:25 +0100)]
Correct typo /proc/ instead of /pros/

Zhang Xianyi [Thu, 19 Mar 2015 20:57:22 +0000 (15:57 -0500)]
Refs #519. Avoid calling strncpy.

Zhang Xianyi [Thu, 19 Mar 2015 16:51:36 +0000 (11:51 -0500)]
Refs #520. Fixed ONLY_CBLAS=1 compiling bug on OSX.

Zhang Xianyi [Wed, 18 Mar 2015 18:00:07 +0000 (13:00 -0500)]
Merge pull request #518 from ton/issue-508

Fix issue #508

Ton van den Heuvel [Wed, 18 Mar 2015 12:22:43 +0000 (13:22 +0100)]
Fix issue #508

Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().

Zhang Xianyi [Wed, 25 Feb 2015 22:37:03 +0000 (06:37 +0800)]
Refs #492. Fixed c/zsyr bug with negative incx.

Zhang Xianyi [Wed, 25 Feb 2015 17:47:11 +0000 (01:47 +0800)]
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.

Zhang Xianyi [Wed, 25 Feb 2015 17:44:19 +0000 (01:44 +0800)]
Refs #509. Merge branch 'grisuthedragon-develop' into develop

Martin Koehler [Mon, 16 Feb 2015 12:46:20 +0000 (13:46 +0100)]
Add ATLAS-style ?geadd function

Zhang Xianyi [Sun, 8 Feb 2015 07:42:48 +0000 (01:42 -0600)]
Detect the wrong combined flags of USE_OPENMP=1 and USE_THREAD=0.

Zhang Xianyi [Sun, 8 Feb 2015 07:30:12 +0000 (01:30 -0600)]
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.

Zhang Xianyi [Wed, 4 Feb 2015 05:09:38 +0000 (23:09 -0600)]
Merge pull request #497 from eschnett/develop

Introduce openblas_get_num_threads and openblas_get_num_procs

Erik Schnetter [Tue, 3 Feb 2015 17:23:34 +0000 (12:23 -0500)]
Introduce openblas_get_num_threads and openblas_get_num_procs

Zhang Xianyi [Thu, 29 Jan 2015 10:23:50 +0000 (18:23 +0800)]
Merge pull request #495 from jeromerobert/develop

Fix a segfault in gemv when MAX_STACK_ALLOC is set

Jerome Robert [Thu, 29 Jan 2015 08:55:57 +0000 (09:55 +0100)]
Fix a segfault in gemv when MAX_STACK_ALLOC is set

* stack_alloc_size is needed after the implementation call,
but it may be overwritten if it is optimized into a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* Do the same in ger.c for the same reason, even though the bug
has not been observed there.
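
The pattern described in the two points above can be sketched as follows. The log does not state the actual change, so this is an assumption: declaring the size `volatile` is one common way to ask the compiler to keep a value in memory instead of a register. `run_with_buffer`, `zero_kernel`, and `MAX_STACK_BYTES` are hypothetical names and values, and `n` is assumed to be greater than zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STACK_BYTES 2048  /* assumed limit for the stack buffer */

typedef void (*kernel_fn)(double *buf, size_t n);

/* Stand-in for a compute kernel that may use many registers. */
static void zero_kernel(double *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0.0;
}

/* Returns 1 if the stack buffer was used, 0 if the heap was used,
 * and -1 if the heap allocation failed. */
static int run_with_buffer(size_t n, kernel_fn kernel)
{
    /* volatile: the value is re-read from memory after the kernel call */
    volatile size_t stack_alloc_size = n * sizeof(double);
    double stack_buf[MAX_STACK_BYTES / sizeof(double)];
    double *buf;

    assert(kernel != NULL);
    if (stack_alloc_size > MAX_STACK_BYTES)
        stack_alloc_size = 0;              /* too large: use the heap */
    buf = stack_alloc_size ? stack_buf : malloc(n * sizeof(double));
    if (buf == NULL)
        return -1;

    kernel(buf, n);

    /* The size is needed again here to decide whether to free. */
    if (!stack_alloc_size) {
        free(buf);
        return 0;
    }
    return 1;
}
```

If the size were held only in a register that the kernel clobbers, the check after the call could take the wrong branch and either free a stack address or leak the heap block.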

Zhang Xianyi [Tue, 13 Jan 2015 07:43:56 +0000 (15:43 +0800)]
Merge pull request #490 from eschnett/develop

Move #include statements outside extern "C" blocks

Erik Schnetter [Tue, 13 Jan 2015 02:27:52 +0000 (21:27 -0500)]
Move #include statements outside extern "C" blocks
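
A header laid out as this commit title describes might look like the sketch below. `example_sum` is a hypothetical function used only for illustration; the point is that the `#include` lines sit at file scope, before the `extern "C"` block opens, so a C++ compiler does not apply C linkage to whatever the included headers declare.

```c
/* Includes come first, at file scope, outside the extern "C" block. */
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration, for illustration only. */
double example_sum(const double *x, size_t n);

#ifdef __cplusplus
}
#endif

/* The definition would normally live in a separate .c file. */
double example_sum(const double *x, size_t n)
{
    double s = 0.0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```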

Zhang Xianyi [Mon, 12 Jan 2015 09:35:16 +0000 (09:35 +0000)]
Fix cortex-a15 detecting bug.

Zhang Xianyi [Mon, 12 Jan 2015 08:55:29 +0000 (08:55 +0000)]
Add cortex-a9 and cortex-a15 targets.