platform/upstream/openblas.git
Werner Saar [Thu, 9 Apr 2015 13:13:52 +0000 (15:13 +0200)]
add optimized cdot- and zdot-kernel for haswell
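The cdot/zdot kernels compute complex dot products in two flavours: unconjugated (?dotu, the sum of x[i]*y[i]) and conjugated (?dotc, the sum of conj(x[i])*y[i]). A plain reference loop is a handy cross-check when benchmarking an optimized kernel. The sketch below is hypothetical (ref_zdotu and ref_zdotc are not OpenBLAS functions) and assumes positive strides and C99 complex types.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference loops, not OpenBLAS code. Positive strides assumed. */

/* Unconjugated dot product: sum of x[i] * y[i] (the ?dotu operation). */
double complex ref_zdotu(size_t n, const double complex *x, size_t incx,
                         const double complex *y, size_t incy)
{
    double complex acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];
    return acc;
}

/* Conjugated dot product: sum of conj(x[i]) * y[i] (the ?dotc operation). */
double complex ref_zdotc(size_t n, const double complex *x, size_t incx,
                         const double complex *y, size_t incy)
{
    double complex acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += conj(x[i * incx]) * y[i * incy];
    return acc;
}
```

For x = {1+2i, 3-i} and y = {2, 1+i}, ref_zdotu gives 6+6i and ref_zdotc gives 4+0i.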

Werner Saar [Thu, 9 Apr 2015 08:33:46 +0000 (10:33 +0200)]
updated cdot and zdot for piledriver

Werner Saar [Thu, 9 Apr 2015 07:45:23 +0000 (09:45 +0200)]
added optimized cdot- and zdot-kernel for steamroller

Werner Saar [Wed, 8 Apr 2015 14:29:55 +0000 (16:29 +0200)]
added optimized cdot- and zdot-kernels for bulldozer

Werner Saar [Tue, 7 Apr 2015 09:56:06 +0000 (11:56 +0200)]
added cdot- and zdot benchmark

Werner Saar [Mon, 6 Apr 2015 16:47:16 +0000 (18:47 +0200)]
updated some lines for bulldozer

Werner Saar [Mon, 6 Apr 2015 14:05:16 +0000 (16:05 +0200)]
added optimized saxpy- and daxpy-kernel for sandybridge

Werner Saar [Mon, 6 Apr 2015 10:33:16 +0000 (12:33 +0200)]
added optimized saxpy- and daxpy-kernel for haswell

Zhang Xianyi [Sun, 5 Apr 2015 21:42:39 +0000 (16:42 -0500)]
Merge pull request #531 from wernsaar/develop

added optimized sdot- and ddot-kernels for Haswell and Sandybridge

Werner Saar [Sun, 5 Apr 2015 18:19:38 +0000 (20:19 +0200)]
added optimized ddot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 17:47:05 +0000 (19:47 +0200)]
add optimized sdot-kernel for sandybridge

Werner Saar [Sun, 5 Apr 2015 16:35:34 +0000 (18:35 +0200)]
removed double definition line

Werner Saar [Sun, 5 Apr 2015 15:57:53 +0000 (17:57 +0200)]
added optimized sdot- and ddot-kernel for HASWELL

Zhang Xianyi [Thu, 2 Apr 2015 16:08:03 +0000 (11:08 -0500)]
Refs #529. Support Intel Broadwell by Haswell kernels.

Zhang Xianyi [Mon, 30 Mar 2015 15:16:11 +0000 (10:16 -0500)]
Merge pull request #527 from xantares/patch-1

fix mingw install

xantares [Mon, 30 Mar 2015 07:30:55 +0000 (09:30 +0200)]
fix mingw install

Zhang Xianyi [Tue, 24 Mar 2015 20:27:17 +0000 (15:27 -0500)]
Fix build bug for ARM64.

Zhang Xianyi [Tue, 24 Mar 2015 20:05:59 +0000 (15:05 -0500)]
Update the doc for 0.2.14.

Zhang Xianyi [Tue, 24 Mar 2015 17:17:12 +0000 (12:17 -0500)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop

Zhang Xianyi [Tue, 24 Mar 2015 17:17:04 +0000 (12:17 -0500)]
Add ARM targets.

Zhang Xianyi [Tue, 24 Mar 2015 17:15:33 +0000 (17:15 +0000)]
Fix compiling bug for ARM with setting BINARY.

Zhang Xianyi [Sat, 21 Mar 2015 17:26:35 +0000 (12:26 -0500)]
Merge pull request #521 from maxlevesque/patch-1

Correct typo /proc/ instead of /pros/

Maximilien Levesque [Fri, 20 Mar 2015 22:25:11 +0000 (23:25 +0100)]
Correct typo /proc/ instead of /pros/

Zhang Xianyi [Thu, 19 Mar 2015 20:57:22 +0000 (15:57 -0500)]
Refs #519. Avoid calling strncpy.

Zhang Xianyi [Thu, 19 Mar 2015 16:51:36 +0000 (11:51 -0500)]
Refs #520. Fixed ONLY_CBLAS=1 compiling bug on OSX.

Zhang Xianyi [Wed, 18 Mar 2015 18:00:07 +0000 (13:00 -0500)]
Merge pull request #518 from ton/issue-508

Fix issue #508

Ton van den Heuvel [Wed, 18 Mar 2015 12:22:43 +0000 (13:22 +0100)]
Fix issue #508

Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().

Zhang Xianyi [Wed, 25 Feb 2015 22:37:03 +0000 (06:37 +0800)]
Refs #492. Fixed c/zsyr bug with negative incx.
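Negative increments are a classic source of bugs such as the c/zsyr one fixed in #492. In the reference-BLAS convention, a vector with incx < 0 is stored backwards: its first logical element sits at offset (n-1)*|incx| and each further element steps back by |incx|. The hypothetical helpers below (not OpenBLAS code) sketch that index mapping; ?syr uses x as both the row and the column factor of alpha*x*x^T, so it must apply the mapping consistently in both places.

```c
#include <stddef.h>

/* Offset of logical element i (0-based) of an n-element BLAS vector with
 * stride incx. For incx < 0 the vector starts at the END of the storage
 * and is walked backwards (reference-BLAS convention). */
static ptrdiff_t blas_index(ptrdiff_t n, ptrdiff_t incx, ptrdiff_t i)
{
    ptrdiff_t start = (incx < 0) ? (1 - n) * incx : 0;
    return start + i * incx;
}

/* Hypothetical helper: copy the logical vector into a contiguous buffer. */
void gather_vector(ptrdiff_t n, const double *x, ptrdiff_t incx, double *out)
{
    for (ptrdiff_t i = 0; i < n; i++)
        out[i] = x[blas_index(n, incx, i)];
}
```

With x = {1, 2, 3} in memory, n = 3 and incx = -1, gather_vector yields {3, 2, 1}.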

Zhang Xianyi [Wed, 25 Feb 2015 17:47:11 +0000 (01:47 +0800)]
Refs #509. Fixed geadd building bug with DYNAMIC_ARCH=1.

Zhang Xianyi [Wed, 25 Feb 2015 17:44:19 +0000 (01:44 +0800)]
Refs#509. Merge branch 'grisuthedragon-develop' into develop

Martin Koehler [Mon, 16 Feb 2015 12:46:20 +0000 (13:46 +0100)]
Add ATLAS-style ?geadd function
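ATLAS-style geadd is a matrix analogue of axpby. Assuming it computes C := alpha*A + beta*C on an m-by-n column-major matrix (the log does not spell out the semantics), a naive reference loop would look like this; ref_dgeadd is a hypothetical name, not the OpenBLAS entry point.

```c
#include <stddef.h>

/* Hypothetical column-major reference: C := alpha * A + beta * C for an
 * m-by-n matrix. lda and ldc are leading dimensions (column strides). */
void ref_dgeadd(size_t m, size_t n, double alpha, const double *a, size_t lda,
                double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++)          /* columns outermost: unit stride */
        for (size_t i = 0; i < m; i++)
            c[i + j * ldc] = alpha * a[i + j * lda] + beta * c[i + j * ldc];
}
```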

Zhang Xianyi [Sun, 8 Feb 2015 07:42:48 +0000 (01:42 -0600)]
Detect the wrong combined flags of USE_OPENMP=1 and USE_THREAD=0.

Zhang Xianyi [Sun, 8 Feb 2015 07:30:12 +0000 (01:30 -0600)]
Fix openblas_get_num_threads and openblas_get_num_procs bug with single thread.

Zhang Xianyi [Wed, 4 Feb 2015 05:09:38 +0000 (23:09 -0600)]
Merge pull request #497 from eschnett/develop

Introduce openblas_get_num_threads and openblas_get_num_procs

Erik Schnetter [Tue, 3 Feb 2015 17:23:34 +0000 (12:23 -0500)]
Introduce openblas_get_num_threads and openblas_get_num_procs

Zhang Xianyi [Thu, 29 Jan 2015 10:23:50 +0000 (18:23 +0800)]
Merge pull request #495 from jeromerobert/develop

Fix a segfault in gemv when MAX_STACK_ALLOC is set

Jerome Robert [Thu, 29 Jan 2015 08:55:57 +0000 (09:55 +0100)]
Fix a segfault in gemv when MAX_STACK_ALLOC is set

* stack_alloc_size is needed after the implementation call,
but it may be overwritten if the compiler keeps it in a register,
because some gemv implementations (e.g. dgemv_n.S) do not
restore all registers (e.g. r10).
* do the same in ger.c for the same reason, even though the bug
has not been observed there.

Zhang Xianyi [Tue, 13 Jan 2015 07:43:56 +0000 (15:43 +0800)]
Merge pull request #490 from eschnett/develop

Move #include statements outside extern "C" blocks

Erik Schnetter [Tue, 13 Jan 2015 02:27:52 +0000 (21:27 -0500)]
Move #include statements outside extern "C" blocks
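The usual reason for this change is that a C++ compiler applies C linkage to everything inside extern "C" { ... }, including declarations pulled in by #include; system headers that declare C++ templates or overloads then fail to compile (GCC reports "template with C linkage"). A sketch of the resulting header layout, with a hypothetical example_sum function standing in for the library's own declarations, written here as a single C file:

```c
/* --- header part --- */
#include <stddef.h>   /* system headers stay OUTSIDE the extern "C" block */

#ifdef __cplusplus
extern "C" {
#endif

/* Only the library's own declarations get C linkage. */
double example_sum(const double *x, size_t n);

#ifdef __cplusplus
}
#endif

/* --- implementation part --- */
double example_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```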

Zhang Xianyi [Mon, 12 Jan 2015 09:35:16 +0000 (09:35 +0000)]
Fix cortex-a15 detecting bug.

Zhang Xianyi [Mon, 12 Jan 2015 08:55:29 +0000 (08:55 +0000)]
Add cortex-a9 and cortex-a15 targets.

Zhang Xianyi [Wed, 7 Jan 2015 06:13:11 +0000 (14:13 +0800)]
Merge pull request #487 from kortschak/dromtg-test

Add test for drotmg bug fixed by 692b14c

kortschak [Tue, 6 Jan 2015 23:36:55 +0000 (10:06 +1030)]
Add test for drotmg bug fixed by 692b14c

Test requested in issue xianyi/OpenBLAS#484.

Run tests by applying the following change and then make:

diff --git a/Makefile.rule b/Makefile.rule
index bea1fe1..9852ff3 100644
--- a/Makefile.rule
+++ b/Makefile.rule
@@ -140,7 +140,7 @@ NO_AFFINITY = 1

-# UTEST_CHECK = 1
+UTEST_CHECK = 1

Zhang Xianyi [Thu, 1 Jan 2015 18:42:32 +0000 (02:42 +0800)]
Add configuration options.

Zhang Xianyi [Thu, 1 Jan 2015 18:26:17 +0000 (02:26 +0800)]
Merge pull request #482 from jeromerobert/develop

Allow to do gemv and ger buffer allocation on the stack

Zhang Xianyi [Tue, 30 Dec 2014 18:36:23 +0000 (02:36 +0800)]
Merge pull request #486 from wernsaar/develop

Optimizations for steamroller

Werner Saar [Tue, 30 Dec 2014 12:16:53 +0000 (20:16 +0800)]
Merge branch 'develop' of github.com:wernsaar/OpenBLAS into develop

Werner Saar [Tue, 30 Dec 2014 12:14:45 +0000 (20:14 +0800)]
added optimizations for steamroller

Zhang Xianyi [Mon, 29 Dec 2014 04:00:16 +0000 (12:00 +0800)]
Merge pull request #483 from wernsaar/develop

added Steamroller as a CPU target

Werner Saar [Sun, 28 Dec 2014 16:15:42 +0000 (17:15 +0100)]
bugfix in dynamic.c

Werner Saar [Sun, 28 Dec 2014 12:45:19 +0000 (13:45 +0100)]
added Steamroller as a target processor

Werner Saar [Sun, 28 Dec 2014 12:16:46 +0000 (20:16 +0800)]
added target processor STEAMROLLER

Jerome Robert [Fri, 26 Dec 2014 13:42:00 +0000 (14:42 +0100)]
Allow to do gemv and ger buffer allocation on the stack

ger and gemv call blas_memory_alloc/free, which in turn
call blas_lock. blas_lock creates thread contention when matrices
are small and the number of threads is high enough. We avoid
calling blas_memory_alloc by replacing it with stack allocation.
This can be enabled with:
make -DMAX_STACK_ALLOC=2048
The given size (in bytes) must be large enough to avoid thread contention
and small enough to avoid stack overflow.

Fix #478
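The pattern described in this commit, together with the later #495 fix for the register-clobbering problem, can be sketched as follows. This is a hypothetical illustration, not OpenBLAS's code: kernel_with_scratch and sum_of_squares are made-up names, malloc stands in for blas_memory_alloc, and a C99 variable-length array provides the stack buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Threshold in bytes; the commit suggests a value such as 2048. */
#ifndef MAX_STACK_ALLOC
#define MAX_STACK_ALLOC 2048
#endif

/* Hypothetical stand-in for a kernel that needs n doubles of scratch. */
static double kernel_with_scratch(const double *x, size_t n, double *scratch)
{
    double s = 0.0;
    memcpy(scratch, x, n * sizeof(double));
    for (size_t i = 0; i < n; i++)
        s += scratch[i] * scratch[i];
    return s;
}

/* Hypothetical caller: small scratch buffers come from the stack (no lock
 * taken), large ones from the heap. */
double sum_of_squares(const double *x, size_t n)
{
    if (n == 0)
        return 0.0;
    /* volatile forces the size to be re-read from memory after the kernel
     * call, so a kernel that fails to restore a register cannot corrupt it. */
    volatile size_t stack_n = n;
    if (n > MAX_STACK_ALLOC / sizeof(double))
        stack_n = 0;                              /* too large: use the heap */
    double stack_buf[stack_n ? stack_n : 1];      /* C99 variable-length array */
    double *buf = stack_n ? stack_buf : malloc(n * sizeof(double));
    if (buf == NULL)
        abort();
    double r = kernel_with_scratch(x, n, buf);
    if (stack_n == 0)                             /* free only heap memory */
        free(buf);
    return r;
}
```

Buffers up to MAX_STACK_ALLOC bytes never touch the allocator, so no lock is taken on the hot path; larger requests fall back to the heap instead of risking a stack overflow.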

Zhang Xianyi [Fri, 26 Dec 2014 02:09:19 +0000 (10:09 +0800)]
Merge pull request #481 from eschnett/develop

Correct ilaver C declaration

Erik Schnetter [Thu, 25 Dec 2014 22:41:17 +0000 (17:41 -0500)]
Correct ilaver C declaration

Zhang Xianyi [Mon, 22 Dec 2014 16:59:41 +0000 (00:59 +0800)]
Merge pull request #479 from wernsaar/develop

workaround for sandybridge zgemm kernel

Werner Saar [Mon, 22 Dec 2014 16:01:18 +0000 (17:01 +0100)]
Ref #458: Backport, sandybridge uses nehalem zgemm kernel

Werner Saar [Mon, 22 Dec 2014 13:04:27 +0000 (14:04 +0100)]
increased NMAX to 128

Werner Saar [Fri, 19 Dec 2014 11:40:46 +0000 (12:40 +0100)]
modified sources for OS Darwin

Werner Saar [Thu, 18 Dec 2014 19:35:51 +0000 (20:35 +0100)]
small optimization on dgemm_kernel for N=1

Werner Saar [Wed, 17 Dec 2014 14:02:11 +0000 (15:02 +0100)]
added code for the size of n

Werner Saar [Wed, 17 Dec 2014 13:12:21 +0000 (14:12 +0100)]
modified makefile for acml6.1

Zhang Xianyi [Sat, 13 Dec 2014 05:05:06 +0000 (13:05 +0800)]
Fixed installation bug on Mac OSX.

Werner Saar [Thu, 11 Dec 2014 13:57:41 +0000 (14:57 +0100)]
Increased the Threshold value in sep.in

Werner Saar [Thu, 11 Dec 2014 12:53:59 +0000 (13:53 +0100)]
added tests to sep.as as workaround for gfortran-4.8.x

Zhang Xianyi [Tue, 9 Dec 2014 09:57:43 +0000 (17:57 +0800)]
Merge pull request #475 from xantares/patch-2

add OpenBLAS_VERSION to cmake config file

Zhang Xianyi [Tue, 9 Dec 2014 09:57:16 +0000 (17:57 +0800)]
Merge pull request #474 from xantares/patch-1

set OPENBLAS_CMAKE_DIR to <prefix>/lib/cmake/<package_name>

xantares [Tue, 9 Dec 2014 09:34:41 +0000 (10:34 +0100)]
add OpenBLAS_VERSION to cmake config file

xantares [Tue, 9 Dec 2014 09:18:18 +0000 (10:18 +0100)]
set OPENBLAS_CMAKE_DIR to <prefix>/lib/cmake/<package_name>

these files are usually located in this subdirectory

Zhang Xianyi [Mon, 8 Dec 2014 05:22:18 +0000 (13:22 +0800)]
Merge pull request #473 from wernsaar/develop

changed inline assembler labels to short form

Werner Saar [Sun, 7 Dec 2014 11:38:54 +0000 (12:38 +0100)]
changed inline assembler labels to short form

Zhang Xianyi [Wed, 3 Dec 2014 15:03:48 +0000 (23:03 +0800)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop

Zhang Xianyi [Wed, 3 Dec 2014 15:00:29 +0000 (23:00 +0800)]
Update the doc for 0.2.13 version.

Zhang Xianyi [Wed, 3 Dec 2014 09:38:41 +0000 (17:38 +0800)]
Fixed a bug of sgemm sandy bridge kernel.

Reported by Julia project. JuliaLang/julia#9084

Zhang Xianyi [Wed, 3 Dec 2014 04:53:20 +0000 (12:53 +0800)]
Merge pull request #471 from nolta/patch-4

c_check: set $hostarch to x86_64 instead of amd64

Zhang Xianyi [Wed, 3 Dec 2014 04:50:46 +0000 (12:50 +0800)]
Merge pull request #470 from nolta/patch-3

fix fortran compiler detection on FreeBSD

Mike Nolta [Wed, 3 Dec 2014 02:23:23 +0000 (21:23 -0500)]
c_check: set $hostarch to x86_64 instead of amd64

`uname -m` returns "amd64" on some systems.
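c_check is a Perl script, so the following C is only an illustration of the idea, with a hypothetical helper name: normalize the machine name once (for example, FreeBSD reports "amd64") so that every later architecture test only has to match "x86_64".

```c
#include <string.h>

/* Hypothetical helper: map the "amd64" spelling of `uname -m` to "x86_64";
 * any other machine name is returned unchanged. */
const char *normalize_hostarch(const char *uname_m)
{
    if (strcmp(uname_m, "amd64") == 0)
        return "x86_64";
    return uname_m;
}
```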

Mike Nolta [Wed, 3 Dec 2014 01:47:40 +0000 (20:47 -0500)]
fix fortran compiler detection on FreeBSD

On FreeBSD, passing extra options to `which` causes it to report a non-zero status:

```
$ which gfortran48 -m64
/usr/local/bin/gfortran48
$ echo $?
1
```

```
$ which gfortran48
/usr/local/bin/gfortran48
$ echo $?
0
```
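One way to avoid this, assuming the detection script can simply drop the options before calling `which`, is to look up only the first whitespace-separated word of the compiler string. The log does not show the actual change; compiler_command_name below is a hypothetical C sketch of the idea, not the detection script itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy only the command name (first whitespace-separated
 * token) of a compiler string such as "gfortran48 -m64" into out, so that
 * only "gfortran48" is passed to `which`.
 * Returns 0 on success, -1 if the token does not fit in out. */
int compiler_command_name(const char *compiler, char *out, size_t out_size)
{
    size_t start = strspn(compiler, " \t");         /* skip leading blanks */
    size_t len = strcspn(compiler + start, " \t");  /* length of first word */
    if (len + 1 > out_size)
        return -1;
    memcpy(out, compiler + start, len);
    out[len] = '\0';
    return 0;
}
```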

Zhang Xianyi [Fri, 28 Nov 2014 18:16:40 +0000 (02:16 +0800)]
Refs #461. Provide OpenBLASConfig.cmake to support CMake.

If you run "make PREFIX=/path/to/OpenBLAS install",
the config file will be located in /path/to/OpenBLAS/cmake.

Then you can use "find_package(OpenBLAS)" in CMake:
cmake -DOpenBLAS_DIR=/path/to/OpenBLAS/cmake ..

Zhang Xianyi [Tue, 25 Nov 2014 07:28:58 +0000 (15:28 +0800)]
Update organization info.

Zhang Xianyi [Mon, 24 Nov 2014 07:34:48 +0000 (15:34 +0800)]
Refs #467. Added generic kernel file for x86_64.

Zhang Xianyi [Tue, 11 Nov 2014 14:21:04 +0000 (22:21 +0800)]
Fixed #456. Merged the optimizations for APM's
xgene-1 (aarch64).
Merge branch 'benedikt-huber-dave-patch' into develop

Benedikt Huber [Thu, 9 Oct 2014 13:52:10 +0000 (06:52 -0700)]
Optimizations for APM's xgene-1 (aarch64).

1) general system updates to support ARMv8 better. "make all" did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
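A 4x4 kernel keeps a 4x4 block of C in registers and updates it with one rank-1 (outer product) update per step along k, reading four packed elements of A and four of B each time. The scalar C sketch below mirrors that structure with a hypothetical packed layout (4 values of A, then 4 values of B, per k step); the actual kernel does the same work with SIMD registers in assembly. Because the trmm kernels consume the same packed panels, they need the same 4-wide blocking, which is presumably why strmm had to follow.

```c
#include <stddef.h>

/* Hypothetical reference micro-kernel: C[0:4, 0:4] += alpha * Apanel * Bpanel.
 * a: for each p in [0, k), 4 consecutive values (rows 0..3 of A's column p).
 * b: for each p in [0, k), 4 consecutive values (columns 0..3 of B's row p).
 * c: column-major with leading dimension ldc. */
void sgemm_kernel_4x4_ref(size_t k, float alpha, const float *a,
                          const float *b, float *c, size_t ldc)
{
    float acc[4][4] = {{0}};  /* acc[j][i] accumulates C(i, j); "registers" */
    for (size_t p = 0; p < k; p++)          /* one rank-1 update per p */
        for (size_t j = 0; j < 4; j++)
            for (size_t i = 0; i < 4; i++)
                acc[j][i] += a[p * 4 + i] * b[p * 4 + j];
    for (size_t j = 0; j < 4; j++)          /* scale and write back once */
        for (size_t i = 0; i < 4; i++)
            c[i + j * ldc] += alpha * acc[j][i];
}
```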

Added Dave Nuechterlein to the contributors list.

Zhang Xianyi [Mon, 10 Nov 2014 09:15:34 +0000 (17:15 +0800)]
refs #464. Fixed the bug of detecting L2 associativity on x86.

Zhang Xianyi [Mon, 10 Nov 2014 06:39:56 +0000 (14:39 +0800)]
#463 fixed a compiling bug on AIX.

Zhang Xianyi [Sat, 25 Oct 2014 11:49:03 +0000 (19:49 +0800)]
Merge pull request #459 from tkelman/symbol-rename

add SYMBOLPREFIX and SYMBOLSUFFIX makefile options

Tony Kelman [Sat, 25 Oct 2014 05:27:00 +0000 (22:27 -0700)]
add SYMBOLPREFIX and SYMBOLSUFFIX makefile options

for adding a prefix or suffix to all exported symbol names in the shared library.
Useful to avoid conflicts with other BLAS libraries, especially when using
64-bit integer interfaces in OpenBLAS.

Note that since OSX does not have the objcopy utility, setting these options
to non-empty values on Mac requires the objconv tool, available (GPL license)
from http://www.agner.org/optimize/#objconv

Zhang Xianyi [Mon, 13 Oct 2014 09:10:12 +0000 (17:10 +0800)]
Update dot to 0.2.12 version.

wernsaar [Tue, 23 Sep 2014 09:34:29 +0000 (11:34 +0200)]
Ref #454: fixed bug in common_param.h

Zhang Xianyi [Mon, 22 Sep 2014 08:47:54 +0000 (16:47 +0800)]
Merge pull request #453 from wernsaar/develop

Enabled GEMM3M functions
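The GEMM3M routines use the "3M" method: a complex product (Ar + i*Ai)(Br + i*Bi) is formed from three real matrix products instead of four, using T1 = Ar*Br, T2 = Ai*Bi and T3 = (Ar + Ai)(Br + Bi), so that the real part is T1 - T2 and the imaginary part is T3 - T1 - T2. The naive C sketch below (hypothetical names, small square row-major matrices only) shows the arithmetic, not OpenBLAS's blocked implementation.

```c
#include <stddef.h>

#define MAXN 8  /* hypothetical size limit so scratch fits on the stack */

/* Naive real product c = a * b for n-by-n row-major matrices. */
static void real_gemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t p = 0; p < n; p++)
                s += a[i * n + p] * b[p * n + j];
            c[i * n + j] = s;
        }
}

/* (cr + i*ci) = (ar + i*ai) * (br + i*bi) using three real products.
 * Returns 0 on success, -1 if n exceeds MAXN. */
int complex_gemm_3m(size_t n, const double *ar, const double *ai,
                    const double *br, const double *bi,
                    double *cr, double *ci)
{
    double t1[MAXN * MAXN], t2[MAXN * MAXN], t3[MAXN * MAXN];
    double sa[MAXN * MAXN], sb[MAXN * MAXN];
    if (n > MAXN)
        return -1;
    for (size_t i = 0; i < n * n; i++) {
        sa[i] = ar[i] + ai[i];               /* Ar + Ai */
        sb[i] = br[i] + bi[i];               /* Br + Bi */
    }
    real_gemm(n, ar, br, t1);                /* T1 = Ar*Br */
    real_gemm(n, ai, bi, t2);                /* T2 = Ai*Bi */
    real_gemm(n, sa, sb, t3);                /* T3 = (Ar+Ai)(Br+Bi) */
    for (size_t i = 0; i < n * n; i++) {
        cr[i] = t1[i] - t2[i];
        ci[i] = t3[i] - t1[i] - t2[i];       /* = Ar*Bi + Ai*Br */
    }
    return 0;
}
```

Saving one of the four real multiplications cuts the flop count by about a quarter for large matrices, at the cost of extra additions and a somewhat weaker error bound on the imaginary part.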

wernsaar [Sun, 21 Sep 2014 11:39:15 +0000 (13:39 +0200)]
updated cblas.h and cblas_noconst.h

wernsaar [Sun, 21 Sep 2014 10:00:41 +0000 (12:00 +0200)]
added benchmark for gemm3m functions

wernsaar [Sun, 21 Sep 2014 09:41:43 +0000 (11:41 +0200)]
bugfix for GEMM3M functions

wernsaar [Sun, 21 Sep 2014 08:55:08 +0000 (10:55 +0200)]
added GEMM3M tests

wernsaar [Sat, 20 Sep 2014 15:20:02 +0000 (17:20 +0200)]
enabled cblas gemm3m functions

wernsaar [Sat, 20 Sep 2014 13:27:40 +0000 (15:27 +0200)]
disabled SYMM3M and HEMM3M functions because of segmentation violations

wernsaar [Sat, 20 Sep 2014 12:53:30 +0000 (14:53 +0200)]
added test for CGEMM3M function

wernsaar [Sat, 20 Sep 2014 12:27:10 +0000 (14:27 +0200)]
enabled use of GEMM3M functions

wernsaar [Sat, 20 Sep 2014 12:21:42 +0000 (14:21 +0200)]
added test for GEMM3M functions

wernsaar [Wed, 17 Sep 2014 14:01:07 +0000 (16:01 +0200)]
updated README.md