Werner Saar [Mon, 22 Dec 2014 16:01:18 +0000 (17:01 +0100)]
Ref #458: Backport, sandybridge uses nehalem zgemm kernel
Werner Saar [Mon, 22 Dec 2014 13:04:27 +0000 (14:04 +0100)]
increased NMAX to 128
Werner Saar [Fri, 19 Dec 2014 11:40:46 +0000 (12:40 +0100)]
modified sources for OS Darwin
Werner Saar [Thu, 18 Dec 2014 19:35:51 +0000 (20:35 +0100)]
small optimization on dgemm_kernel for N=1
Werner Saar [Wed, 17 Dec 2014 14:02:11 +0000 (15:02 +0100)]
added code for the size of n
Werner Saar [Wed, 17 Dec 2014 13:12:21 +0000 (14:12 +0100)]
modified makefile for acml6.1
Werner Saar [Thu, 11 Dec 2014 13:57:41 +0000 (14:57 +0100)]
Increased the Threshold value in sep.in
Werner Saar [Thu, 11 Dec 2014 12:53:59 +0000 (13:53 +0100)]
added tests to sep.as as workaround for gfortran-4.8.x
Werner Saar [Sun, 7 Dec 2014 11:38:54 +0000 (12:38 +0100)]
changed inline assembler labels to short form
Zhang Xianyi [Wed, 3 Dec 2014 15:03:48 +0000 (23:03 +0800)]
Merge branch 'develop' of github.com:xianyi/OpenBLAS into develop
Zhang Xianyi [Wed, 3 Dec 2014 15:00:29 +0000 (23:00 +0800)]
Update the doc for 0.2.13 version.
Zhang Xianyi [Wed, 3 Dec 2014 09:38:41 +0000 (17:38 +0800)]
Fixed a bug of sgemm sandy bridge kernel.
Reported by Julia project. JuliaLang/julia#9084
Zhang Xianyi [Wed, 3 Dec 2014 04:53:20 +0000 (12:53 +0800)]
Merge pull request #471 from nolta/patch-4
c_check: set $hostarch to x86_64 instead of amd64
Zhang Xianyi [Wed, 3 Dec 2014 04:50:46 +0000 (12:50 +0800)]
Merge pull request #470 from nolta/patch-3
fix fortran compiler detection on FreeBSD
Mike Nolta [Wed, 3 Dec 2014 02:23:23 +0000 (21:23 -0500)]
c_check: set $hostarch to x86_64 instead of amd64
`uname -m` returns "amd64" on some systems.
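A minimal sketch of the kind of normalization described above, assuming a shell build script. The helper name `normalize_arch` is hypothetical and is not taken from c_check:

```shell
# Hypothetical helper: map the machine name reported by `uname -m`
# to the spelling the rest of a build script expects.
normalize_arch() {
    case "$1" in
        amd64) echo "x86_64" ;;   # some systems report amd64 for x86_64
        *)     echo "$1" ;;       # leave every other name unchanged
    esac
}

hostarch=$(normalize_arch "$(uname -m)")
echo "$hostarch"
```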
Mike Nolta [Wed, 3 Dec 2014 01:47:40 +0000 (20:47 -0500)]
fix fortran compiler detection on FreeBSD
On FreeBSD, passing extra options to `which` causes it to report a non-zero status:
```
$ which gfortran48 -m64
/usr/local/bin/gfortran48
$ echo $?
1
```
```
$ which gfortran48
/usr/local/bin/gfortran48
$ echo $?
0
```
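One way to avoid the problem shown above is to look up only the program name and drop any trailing options. This is a sketch under that assumption, not necessarily the change made in this commit; `compiler_exists` is a hypothetical helper, and it uses the POSIX `command -v` instead of `which`:

```shell
# Hypothetical helper: check whether the compiler named in a command
# string exists, ignoring any options that follow the program name.
compiler_exists() {
    # Deliberately unquoted: word-split the string so that $1 becomes
    # the program name only (assumes no glob characters in the string).
    set -- $1
    command -v "$1" >/dev/null 2>&1
}

if compiler_exists "sh -m64"; then
    echo "found"
else
    echo "missing"
fi
```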
Zhang Xianyi [Fri, 28 Nov 2014 18:16:40 +0000 (02:16 +0800)]
Refs #461. Provide OpenBLASConfig.cmake to support CMake.
If you run "make PREFIX=/path/to/OpenBLAS install",
the config file will be located in /path/to/OpenBLAS/cmake.
Then you can use "find_package(OpenBLAS)" in CMake:
cmake -DOpenBLAS_DIR=/path/to/OpenBLAS/cmake ..
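A rough sketch of a consuming project, written as a shell script that generates a small CMakeLists.txt. The project name `demo`, the source file `main.c`, and the variable names `OpenBLAS_INCLUDE_DIRS` and `OpenBLAS_LIBRARIES` are assumptions for illustration; check the generated OpenBLASConfig.cmake for the variables it actually sets.

```shell
# Sketch only: OpenBLAS_INCLUDE_DIRS and OpenBLAS_LIBRARIES are assumed
# variable names; "demo" and main.c are hypothetical.
demo_dir=$(mktemp -d)

cat > "$demo_dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 2.8)
project(demo C)
find_package(OpenBLAS REQUIRED)
include_directories(${OpenBLAS_INCLUDE_DIRS})
add_executable(demo main.c)
target_link_libraries(demo ${OpenBLAS_LIBRARIES})
EOF

# Configure command, printed rather than executed:
echo "cd $demo_dir && cmake -DOpenBLAS_DIR=/path/to/OpenBLAS/cmake ."
```

The script only writes the file and prints the configure command, so it can be run without OpenBLAS or CMake installed.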
Zhang Xianyi [Tue, 25 Nov 2014 07:28:58 +0000 (15:28 +0800)]
Update organization info.
Zhang Xianyi [Mon, 24 Nov 2014 07:34:48 +0000 (15:34 +0800)]
Refs #467. Added generic kernel file for x86_64.
Zhang Xianyi [Tue, 11 Nov 2014 14:21:04 +0000 (22:21 +0800)]
Fixed #456. Merged the optimizations for APM's xgene-1 (aarch64).
Merge branch 'benedikt-huber-dave-patch' into develop
Benedikt Huber [Thu, 9 Oct 2014 13:52:10 +0000 (06:52 -0700)]
Optimizations for APM's xgene-1 (aarch64).
1) General system updates to better support armv8. `make all` did not work; one needed to supply TARGET=ARMV8.
2) sgemm 4x4 kernel in assembler using SIMD, and configuration changes to use it.
3) strmm 4x4 kernel in C. Since the sgemm kernel does 4x4, the trmm kernel must also do 4xN.
Added Dave Nuechterlein to the contributors list.
Zhang Xianyi [Mon, 10 Nov 2014 09:15:34 +0000 (17:15 +0800)]
Refs #464. Fixed the bug in detecting L2 associativity on x86.
Zhang Xianyi [Mon, 10 Nov 2014 06:39:56 +0000 (14:39 +0800)]
#463 fixed a compiling bug on AIX.
Zhang Xianyi [Sat, 25 Oct 2014 11:49:03 +0000 (19:49 +0800)]
Merge pull request #459 from tkelman/symbol-rename
add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
Tony Kelman [Sat, 25 Oct 2014 05:27:00 +0000 (22:27 -0700)]
add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
for adding a prefix or suffix to all exported symbol names in the shared library.
Useful to avoid conflicts with other BLAS libraries, especially when using
64-bit integer interfaces in OpenBLAS.
Note that since OS X does not have the objcopy utility, setting these options
to non-empty values on Mac requires the objconv tool, available (GPL license)
from http://www.agner.org/optimize/#objconv
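To illustrate the renaming idea, here is a sketch that builds an "old new" symbol map, the format accepted by GNU `objcopy --redefine-syms`. The helper `make_rename_map` and the sample symbol names are hypothetical, and this is not necessarily how the OpenBLAS makefiles perform the rename:

```shell
# Hypothetical helper: read symbol names on stdin and print
# "old new" pairs, one per line, with the given prefix and suffix applied.
make_rename_map() {
    prefix=$1
    suffix=$2
    while IFS= read -r sym; do
        # Skip blank lines so the map contains only real symbols.
        if [ -n "$sym" ]; then
            printf '%s %s%s%s\n' "$sym" "$prefix" "$sym" "$suffix"
        fi
    done
}

# Example: append the suffix "64_" to two sample symbol names.
printf 'dgemm_\ncblas_dgemm\n' | make_rename_map "" "64_"
```

A map like this could then be passed to `objcopy --redefine-syms=<file>` on platforms where objcopy is available.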
Zhang Xianyi [Mon, 13 Oct 2014 09:10:12 +0000 (17:10 +0800)]
Update doc to 0.2.12 version.
wernsaar [Tue, 23 Sep 2014 09:34:29 +0000 (11:34 +0200)]
Ref #454: fixed bug in common_param.h
Zhang Xianyi [Mon, 22 Sep 2014 08:47:54 +0000 (16:47 +0800)]
Merge pull request #453 from wernsaar/develop
Enabled GEMM3M functions
wernsaar [Sun, 21 Sep 2014 11:39:15 +0000 (13:39 +0200)]
updated cblas.h and cblas_noconst.h
wernsaar [Sun, 21 Sep 2014 10:00:41 +0000 (12:00 +0200)]
added benchmark for gemm3m functions
wernsaar [Sun, 21 Sep 2014 09:41:43 +0000 (11:41 +0200)]
bugfix for GEMM3M functions
wernsaar [Sun, 21 Sep 2014 08:55:08 +0000 (10:55 +0200)]
added GEMM3M tests
wernsaar [Sat, 20 Sep 2014 15:20:02 +0000 (17:20 +0200)]
enabled cblas gemm3m functions
wernsaar [Sat, 20 Sep 2014 13:27:40 +0000 (15:27 +0200)]
disabled SYMM3M and HEMM3M functions because of segmentation faults
wernsaar [Sat, 20 Sep 2014 12:53:30 +0000 (14:53 +0200)]
added test for CGEMM3M function
wernsaar [Sat, 20 Sep 2014 12:27:10 +0000 (14:27 +0200)]
enabled use of GEMM3M functions
wernsaar [Sat, 20 Sep 2014 12:21:42 +0000 (14:21 +0200)]
added test for GEMM3M functions
wernsaar [Wed, 17 Sep 2014 14:01:07 +0000 (16:01 +0200)]
updated README.md
Zhang Xianyi [Wed, 17 Sep 2014 06:29:21 +0000 (14:29 +0800)]
Update the doc for target list.
Zhang Xianyi [Wed, 17 Sep 2014 06:20:06 +0000 (14:20 +0800)]
Merge pull request #451 from eshelman/patch-1
Add HASWELL to TargetList.txt
Eliot Eshelman [Tue, 16 Sep 2014 22:26:45 +0000 (18:26 -0400)]
Add HASWELL to TargetList.txt
The Intel "Haswell" architecture is missing from the list of build targets.
Zhang Xianyi [Tue, 16 Sep 2014 06:33:48 +0000 (14:33 +0800)]
Merge pull request #449 from wernsaar/develop
optimized multithreading lower limits
wernsaar [Mon, 15 Sep 2014 09:38:25 +0000 (11:38 +0200)]
optimized multithreading lower limits
Zhang Xianyi [Mon, 15 Sep 2014 05:12:14 +0000 (13:12 +0800)]
Merge pull request #448 from wernsaar/develop
Optimized cgemv and zgemv kernels
wernsaar [Sun, 14 Sep 2014 09:00:53 +0000 (11:00 +0200)]
removed obsolete gemv kernel files
wernsaar [Sun, 14 Sep 2014 08:21:22 +0000 (10:21 +0200)]
optimized zgemv_n_microk_sandy-4.c
wernsaar [Sun, 14 Sep 2014 07:02:05 +0000 (09:02 +0200)]
added optimized zgemv_n kernel for sandybridge
wernsaar [Sat, 13 Sep 2014 14:26:53 +0000 (16:26 +0200)]
bugfix in KERNEL.PILEDRIVER
wernsaar [Sat, 13 Sep 2014 14:13:27 +0000 (16:13 +0200)]
optimized cgemv_t kernel for haswell
wernsaar [Sat, 13 Sep 2014 13:14:12 +0000 (15:14 +0200)]
added optimized cgemv_t kernel for haswell
wernsaar [Sat, 13 Sep 2014 10:23:27 +0000 (12:23 +0200)]
updated KERNEL.HASWELL
wernsaar [Sat, 13 Sep 2014 07:48:34 +0000 (09:48 +0200)]
updated zgemv_t_4.c
wernsaar [Sat, 13 Sep 2014 07:47:07 +0000 (09:47 +0200)]
added optimized zgemv_t kernel for haswell
wernsaar [Fri, 12 Sep 2014 17:18:23 +0000 (19:18 +0200)]
optimized interface/zgemv.c for multithreading
wernsaar [Fri, 12 Sep 2014 15:43:47 +0000 (17:43 +0200)]
enabled optimized zgemv_t kernel for bulldozer
wernsaar [Fri, 12 Sep 2014 15:42:25 +0000 (17:42 +0200)]
optimized zgemv_t for bulldozer
wernsaar [Fri, 12 Sep 2014 15:04:22 +0000 (17:04 +0200)]
added optimized zgemv_t kernel for bulldozer
wernsaar [Fri, 12 Sep 2014 12:12:24 +0000 (14:12 +0200)]
bugfix in cgemv_t_4.c
wernsaar [Fri, 12 Sep 2014 11:38:01 +0000 (13:38 +0200)]
added optimized cgemv_t kernel
wernsaar [Fri, 12 Sep 2014 10:35:20 +0000 (12:35 +0200)]
added optimized zgemv_t routine
wernsaar [Thu, 11 Sep 2014 11:44:55 +0000 (13:44 +0200)]
optimized zgemv_n_microk_haswell-4.c for small size
wernsaar [Thu, 11 Sep 2014 11:18:00 +0000 (13:18 +0200)]
bugfix in zgemv_n_4.c
wernsaar [Thu, 11 Sep 2014 10:34:57 +0000 (12:34 +0200)]
added optimized zgemv_n kernel
wernsaar [Thu, 11 Sep 2014 09:12:44 +0000 (11:12 +0200)]
bugfix in cgemv_n_microk_haswell-4.c
wernsaar [Thu, 11 Sep 2014 08:25:48 +0000 (10:25 +0200)]
more optimizations
wernsaar [Wed, 10 Sep 2014 17:26:14 +0000 (19:26 +0200)]
optimized cgemv_n_4.c
wernsaar [Wed, 10 Sep 2014 12:11:24 +0000 (14:11 +0200)]
added optimized cgemv_kernel for haswell
wernsaar [Wed, 10 Sep 2014 11:45:13 +0000 (13:45 +0200)]
added cgemv_n kernel, optimized for small sizes
Zhang Xianyi [Wed, 10 Sep 2014 08:31:31 +0000 (16:31 +0800)]
Merge pull request #446 from grisuthedragon/cblas_matcopy
Add a CBLAS interface for the BLAS extension s/d/c/z*matcopy routines.
Zhang Xianyi [Wed, 10 Sep 2014 08:28:14 +0000 (16:28 +0800)]
Merge pull request #445 from wernsaar/develop
A lot of optimizations for gemv kernels
wernsaar [Tue, 9 Sep 2014 14:17:45 +0000 (16:17 +0200)]
added and tested optimized dgemv_n kernel for haswell
wernsaar [Tue, 9 Sep 2014 13:32:32 +0000 (15:32 +0200)]
added optimized dgemv_n kernel for haswell
wernsaar [Tue, 9 Sep 2014 12:38:08 +0000 (14:38 +0200)]
optimized dgemv_t kernel for haswell
wernsaar [Tue, 9 Sep 2014 12:04:44 +0000 (14:04 +0200)]
bugfix in KERNEL.HASWELL
wernsaar [Tue, 9 Sep 2014 11:54:55 +0000 (13:54 +0200)]
added optimized gemv kernels
wernsaar [Tue, 9 Sep 2014 11:34:22 +0000 (13:34 +0200)]
added optimized dgemv_t kernel for haswell
Martin Koehler [Tue, 9 Sep 2014 07:52:13 +0000 (09:52 +0200)]
add CBLAS interface for s/d/c/zimatcopy
wernsaar [Mon, 8 Sep 2014 17:15:31 +0000 (19:15 +0200)]
removed obsolete files
Martin Köhler [Mon, 8 Sep 2014 15:57:44 +0000 (17:57 +0200)]
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
wernsaar [Mon, 8 Sep 2014 13:22:35 +0000 (15:22 +0200)]
optimized dgemv_n kernel for small sizes
wernsaar [Mon, 8 Sep 2014 10:27:32 +0000 (12:27 +0200)]
modified multithreading threshold
wernsaar [Mon, 8 Sep 2014 10:25:16 +0000 (12:25 +0200)]
added haswell optimized kernel
wernsaar [Mon, 8 Sep 2014 08:54:33 +0000 (10:54 +0200)]
bugfix in sgemv_n_microk_haswell-4.c
wernsaar [Mon, 8 Sep 2014 08:13:39 +0000 (10:13 +0200)]
added optimized sgemv_t kernel for haswell
wernsaar [Sun, 7 Sep 2014 19:48:42 +0000 (21:48 +0200)]
bugfix for windows
wernsaar [Sun, 7 Sep 2014 19:13:57 +0000 (21:13 +0200)]
enabled optimized sgemv kernels for piledriver
wernsaar [Sun, 7 Sep 2014 18:53:30 +0000 (20:53 +0200)]
optimized sgemv_n kernel for sandybridge
wernsaar [Sun, 7 Sep 2014 17:20:08 +0000 (19:20 +0200)]
optimized sgemv_n kernel for nehalem
wernsaar [Sun, 7 Sep 2014 16:23:48 +0000 (18:23 +0200)]
optimized sgemv_n for very small size of m
wernsaar [Sun, 7 Sep 2014 11:45:03 +0000 (13:45 +0200)]
optimizations for very small sizes
wernsaar [Sat, 6 Sep 2014 19:28:57 +0000 (21:28 +0200)]
better optimizations for sgemv_t kernel
wernsaar [Sat, 6 Sep 2014 17:41:57 +0000 (19:41 +0200)]
optimized sgemv_t_4 kernel for very small sizes
wernsaar [Sat, 6 Sep 2014 16:34:25 +0000 (18:34 +0200)]
optimized sgemv_t
wernsaar [Sat, 6 Sep 2014 11:17:56 +0000 (13:17 +0200)]
optimization for small size
wernsaar [Sat, 6 Sep 2014 10:08:48 +0000 (12:08 +0200)]
added optimized sgemv_n kernel for haswell
wernsaar [Sat, 6 Sep 2014 09:01:42 +0000 (11:01 +0200)]
undef WHEREAMI
wernsaar [Sat, 6 Sep 2014 06:41:53 +0000 (08:41 +0200)]
added optimized sgemv_n kernel for sandybridge
wernsaar [Fri, 5 Sep 2014 13:05:53 +0000 (15:05 +0200)]
experimentally removed expensive function calls
wernsaar [Fri, 5 Sep 2014 08:22:50 +0000 (10:22 +0200)]
optimized sgemv_t for sandybridge
wernsaar [Thu, 4 Sep 2014 16:55:52 +0000 (18:55 +0200)]
bugfix for sgemv_n_4.c