wernsaar [Fri, 16 Aug 2013 18:23:34 +0000 (20:23 +0200)]
removed unnecessary instructions
Zhang Xianyi [Fri, 23 Aug 2013 08:27:17 +0000 (16:27 +0800)]
Refs #282. Fixed zgemv_n typo bug on Win64.
Zhang Xianyi [Wed, 21 Aug 2013 15:21:51 +0000 (08:21 -0700)]
Merge pull request #280 from ViralBShah/develop
Patch LAPACK XLASD4.f as discussed in JuliaLang/julia#2340
Viral B. Shah [Wed, 21 Aug 2013 13:44:07 +0000 (19:14 +0530)]
Patch LAPACK XLASD4.f as discussed in JuliaLang/julia#2340
Zhang Xianyi [Tue, 20 Aug 2013 16:03:25 +0000 (00:03 +0800)]
Refs #279. Provide ONLY_CBLAS flag. If you only need CBLAS without
a Fortran compiler, please try make ONLY_CBLAS=1.
This mode compiles only CBLAS, without the BLAS Fortran interface and LAPACK.
Zhang Xianyi [Mon, 12 Aug 2013 15:22:10 +0000 (23:22 +0800)]
Merge branch 'bulldozer' into develop
Zhang Xianyi [Fri, 9 Aug 2013 02:49:44 +0000 (10:49 +0800)]
Fixed #276. Merge branch 'wernsaar-develop' into bulldozer
Zhang Xianyi [Fri, 9 Aug 2013 02:48:46 +0000 (10:48 +0800)]
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
wernsaar [Thu, 8 Aug 2013 15:49:30 +0000 (17:49 +0200)]
modified KERNEL.BULLDOZER
wernsaar [Thu, 8 Aug 2013 05:14:08 +0000 (07:14 +0200)]
added dtrsm_kernel_RN_8x2_bulldozer.S
wernsaar [Mon, 5 Aug 2013 09:27:16 +0000 (11:27 +0200)]
dtrsm_kernel_LT_8x2_bulldozer.S performance optimization
Zhang Xianyi [Mon, 5 Aug 2013 08:17:15 +0000 (16:17 +0800)]
Refs #270 #268. Merge branch 'wernsaar-develop' into bulldozer
Zhang Xianyi [Mon, 5 Aug 2013 08:09:47 +0000 (16:09 +0800)]
Merge branch 'develop' of https://github.com/wernsaar/OpenBLAS into wernsaar-develop
Zhang Xianyi [Mon, 5 Aug 2013 08:07:54 +0000 (16:07 +0800)]
Enable bulldozer kernels.
Zhang Xianyi [Mon, 5 Aug 2013 07:51:53 +0000 (15:51 +0800)]
Merge branch 'develop' into bulldozer
wernsaar [Sun, 4 Aug 2013 10:16:12 +0000 (12:16 +0200)]
modified dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sun, 4 Aug 2013 08:15:33 +0000 (10:15 +0200)]
modified dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sun, 4 Aug 2013 07:54:40 +0000 (09:54 +0200)]
added dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 13:40:51 +0000 (15:40 +0200)]
removed dtrsm_kernel_LT_8x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 10:19:29 +0000 (12:19 +0200)]
fixed bug in dgemv_t_bulldozer.S
wernsaar [Sat, 3 Aug 2013 09:43:25 +0000 (11:43 +0200)]
repaired trmm bug in sgemm_kernel_16x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 08:32:51 +0000 (10:32 +0200)]
repaired trmm bug in cgemm_kernel_4x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 08:17:08 +0000 (10:17 +0200)]
repaired trmm bug in zgemm_kernel_2x2_bulldozer.S
wernsaar [Sat, 3 Aug 2013 07:35:39 +0000 (09:35 +0200)]
repaired trmm bug in dgemm_kernel_8x2_bulldozer.S
Zhang Xianyi [Thu, 1 Aug 2013 15:57:19 +0000 (23:57 +0800)]
Merge branch 'hotfix-v0.2.8' into develop
Zhang Xianyi [Thu, 1 Aug 2013 15:52:43 +0000 (23:52 +0800)]
Update the doc for 0.2.8 version.
Zhang Xianyi [Wed, 31 Jul 2013 06:49:16 +0000 (14:49 +0800)]
OpenBLAS 0.2.8 rc1.
Zhang Xianyi [Wed, 31 Jul 2013 06:46:56 +0000 (14:46 +0800)]
Merge branch 'hotfix-v0.2.8' into develop
Zhang Xianyi [Wed, 31 Jul 2013 06:41:39 +0000 (14:41 +0800)]
Refs #266. Fixed the compiling bug with Open64 5.0.
wernsaar [Tue, 30 Jul 2013 18:18:57 +0000 (20:18 +0200)]
added generic trmm kernels and modified Makefile.L3
Zhang Xianyi [Mon, 29 Jul 2013 15:21:10 +0000 (23:21 +0800)]
Fixed #264 the memory leak bug in dtrtri_U.
Zhang Xianyi [Sat, 27 Jul 2013 14:37:57 +0000 (22:37 +0800)]
Fixed the FMA3 detection bug.
Zhang Xianyi [Fri, 26 Jul 2013 15:43:54 +0000 (23:43 +0800)]
Fixed #261. Use strncmp instead of a comparing trick.
Zhang Xianyi [Mon, 29 Jul 2013 07:42:00 +0000 (15:42 +0800)]
Fixed typo in getarch_2nd.c.
wernsaar [Sun, 28 Jul 2013 14:47:58 +0000 (16:47 +0200)]
added dtrsm_kernel_LT_8x2_bulldozer.S
Zhang Xianyi [Sun, 28 Jul 2013 09:39:24 +0000 (17:39 +0800)]
Refs #263. Rollback bulldozer and piledriver kernels to barcelona kernels.
Zhang Xianyi [Sun, 28 Jul 2013 04:38:25 +0000 (06:38 +0200)]
Merge branch 'develop' into bulldozer
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
Zhang Xianyi [Sat, 27 Jul 2013 16:09:40 +0000 (00:09 +0800)]
Refs #262. Added executable stack markings.
Zhang Xianyi [Sat, 27 Jul 2013 15:03:07 +0000 (23:03 +0800)]
Merge branch 'sfabbro-ldflags' into develop
Zhang Xianyi [Sat, 27 Jul 2013 15:01:36 +0000 (23:01 +0800)]
Fixed #260. Fixed generating 32-bit shared library on previous commit.
Zhang Xianyi [Sat, 27 Jul 2013 14:37:57 +0000 (22:37 +0800)]
Fixed the FMA3 detection bug.
Zhang Xianyi [Sat, 27 Jul 2013 14:19:54 +0000 (22:19 +0800)]
Merge branch 'ldflags' of https://github.com/sfabbro/OpenBLAS into sfabbro-ldflags
Zhang Xianyi [Fri, 26 Jul 2013 15:43:54 +0000 (23:43 +0800)]
Fixed #261. Use strncmp instead of a comparing trick.
Sebastien Fabbro [Wed, 24 Jul 2013 16:37:16 +0000 (09:37 -0700)]
Respect user's LDFLAGS
Zhang Xianyi [Thu, 25 Jul 2013 17:34:45 +0000 (01:34 +0800)]
Merge branch 'develop'
Zhang Xianyi [Thu, 25 Jul 2013 17:32:32 +0000 (01:32 +0800)]
Refs #259. Fixed missing LAPACK functions in shared library.
Zhang Xianyi [Tue, 23 Jul 2013 05:40:08 +0000 (13:40 +0800)]
Merge branch 'develop'
Zhang Xianyi [Tue, 23 Jul 2013 05:35:29 +0000 (22:35 -0700)]
Merge pull request #257 from staticfloat/develop
Add in return value for `interface/trtri.c`
Elliot Saba [Tue, 23 Jul 2013 00:02:06 +0000 (17:02 -0700)]
Fix xianyi/OpenBLAS#256
Zhang Xianyi [Mon, 22 Jul 2013 03:34:43 +0000 (11:34 +0800)]
Refs #255. Didn't use f77 compiler.
Zhang Xianyi [Sat, 20 Jul 2013 15:32:23 +0000 (23:32 +0800)]
Update CONTRIBUTORS.md
Zhang Xianyi [Sat, 20 Jul 2013 15:05:36 +0000 (23:05 +0800)]
Merge branch 'develop'
Zhang Xianyi [Sat, 20 Jul 2013 15:05:12 +0000 (23:05 +0800)]
Fixed #253. Update doc for v0.2.7 version.
Zhang Xianyi [Sat, 20 Jul 2013 14:33:35 +0000 (22:33 +0800)]
Merge branch 'loongson3b' into develop
Zhang Xianyi [Sat, 20 Jul 2013 14:32:38 +0000 (22:32 +0800)]
Merge branch 'loongson3a' into develop
Conflicts:
Makefile.system
Zhang Xianyi [Sat, 20 Jul 2013 03:35:27 +0000 (11:35 +0800)]
Fixed #254. Added the date of changes in contributors file.
Zhang Xianyi [Fri, 19 Jul 2013 00:38:03 +0000 (08:38 +0800)]
create contributor file.
wangqian [Thu, 18 Jul 2013 12:23:21 +0000 (20:23 +0800)]
Fixed a computational error in zgemm_kernel_4x4_sandy.S file.
Zhang Xianyi [Wed, 17 Jul 2013 07:19:07 +0000 (15:19 +0800)]
Ensure the correct stack alignment on Win32.
Zhang Xianyi [Tue, 16 Jul 2013 15:18:18 +0000 (23:18 +0800)]
Fixed typo in generating shared library on x86_64.
Zhang Xianyi [Tue, 16 Jul 2013 14:44:27 +0000 (22:44 +0800)]
Modified Makefile to avoid redundant echo.
Zhang Xianyi [Tue, 16 Jul 2013 09:45:00 +0000 (17:45 +0800)]
Modified Makefile.install
Zhang Xianyi [Mon, 15 Jul 2013 01:56:19 +0000 (09:56 +0800)]
Refs #225. Fixed a bug in GEMM OpenMP threading.
Zhang Xianyi [Sun, 14 Jul 2013 14:16:30 +0000 (22:16 +0800)]
Refs #191. A workaround for the dtrtri_U single-thread bug.
This function caused the failure of ERKALE serial test.
I replaced it with LAPACK source code.
Zhang Xianyi [Sun, 14 Jul 2013 02:41:54 +0000 (10:41 +0800)]
Changed makefile for lapack.
Zhang Xianyi [Fri, 12 Jul 2013 13:41:12 +0000 (21:41 +0800)]
Updated travis.
Zhang Xianyi [Thu, 11 Jul 2013 15:49:29 +0000 (23:49 +0800)]
Update build matrix for Travis CI.
Zhang Xianyi [Thu, 11 Jul 2013 15:47:07 +0000 (23:47 +0800)]
Fixed the typo.
Zhang Xianyi [Thu, 11 Jul 2013 14:24:50 +0000 (22:24 +0800)]
Fixed generating dll bug in last commit.
Zhang Xianyi [Thu, 11 Jul 2013 13:41:44 +0000 (21:41 +0800)]
Fixed #251. Merge branch 'grisuthedragon-develop' into develop
grisuthedragon [Thu, 11 Jul 2013 11:39:27 +0000 (13:39 +0200)]
create openblas_get_parallel to retrieve information which
parallelization model is used by OpenBLAS.
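A minimal sketch of how a caller might interpret the value returned by openblas_get_parallel. The three return codes below are the ones found in current OpenBLAS cblas.h headers; that they match this commit exactly is an assumption, and parallel_model_name is a hypothetical helper, not part of OpenBLAS.

```c
/* Return codes as defined in current OpenBLAS cblas.h (assumed to match
 * this commit); repeated here so the sketch compiles without the header. */
#ifndef OPENBLAS_SEQUENTIAL
#define OPENBLAS_SEQUENTIAL 0
#define OPENBLAS_THREAD     1
#define OPENBLAS_OPENMP     2
#endif

/* Hypothetical helper: map the parallelization model code to a label. */
static const char *parallel_model_name(int model) {
  switch (model) {
    case OPENBLAS_SEQUENTIAL: return "sequential";
    case OPENBLAS_THREAD:     return "pthreads";
    case OPENBLAS_OPENMP:     return "OpenMP";
    default:                  return "unknown";
  }
}
```

When linked against OpenBLAS, the helper would be called as parallel_model_name(openblas_get_parallel()).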
Zhang Xianyi [Wed, 10 Jul 2013 19:20:02 +0000 (03:20 +0800)]
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
Zhang Xianyi [Wed, 10 Jul 2013 08:02:27 +0000 (16:02 +0800)]
Refs #248. Support LAPACK and LAPACKE with lsbcc.
For LAPACKE, use LAPACK_COMPLEX_STRUCTURE.
The reason is that lsbcc doesn't define complex I in complex.h.
Zhang Xianyi [Wed, 10 Jul 2013 08:01:03 +0000 (01:01 -0700)]
Merge pull request #249 from wernsaar/develop
replaced defined(DOUBLE) by !defined(XDOUBLE)
wernsaar [Tue, 9 Jul 2013 16:17:50 +0000 (18:17 +0200)]
replaced defined(DOUBLE) by !defined(XDOUBLE)
Zhang Xianyi [Tue, 9 Jul 2013 09:00:02 +0000 (17:00 +0800)]
Refs #247. Included lapack source codes. Avoid downloading tar.gz from netlib.org
Based on 3.4.2 version, apply patch.for_lapack-3.4.2.
Zhang Xianyi [Tue, 9 Jul 2013 08:26:59 +0000 (16:26 +0800)]
Fixed the typo in getarch.c
Zhang Xianyi [Tue, 9 Jul 2013 07:38:03 +0000 (15:38 +0800)]
Refs #248. Fixed the LSB compatibility issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
Zhang Xianyi [Sun, 7 Jul 2013 17:07:05 +0000 (01:07 +0800)]
Refs #221 #246. Fixed the stack overflow bug in multithreaded BLAS3.
This occurs when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256.
typedef struct {
volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array then takes 8MB.
Thus, we use malloc instead of stack allocation.
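The 8MB figure can be checked with a small sketch. The values CACHE_LINE_SIZE = 8, DIVIDE_RATE = 2 and an 8-byte BLASLONG are assumptions chosen for illustration (they reproduce the stated size for MAX_CPU_NUMBER = 256), and job_array_bytes and alloc_jobs are hypothetical helpers, not the code from this commit.

```c
#include <stdlib.h>

/* Assumed values for illustration; the real ones come from the build. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef long long BLASLONG;  /* assumption: 64-bit integer type */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Size in bytes of job_t job[MAX_CPU_NUMBER]:
 * 256 jobs * 256 rows * 16 entries * 8 bytes = 8 MB. */
static size_t job_array_bytes(void) {
  return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Hypothetical helper: allocate the job array on the heap instead of the
 * stack. Returns NULL on failure; the caller releases it with free(). */
static job_t *alloc_jobs(void) {
  return (job_t *)calloc(MAX_CPU_NUMBER, sizeof(job_t));
}
```

An array of this size exceeds the 1MB Windows stack mentioned in the getrf fix, whereas a heap allocation is not bound by the stack limit.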
Zhang Xianyi [Sat, 6 Jul 2013 15:06:43 +0000 (12:06 -0300)]
Support AMD Piledriver by bulldozer kernels.
Zhang Xianyi [Fri, 5 Jul 2013 07:28:41 +0000 (15:28 +0800)]
Added Travis CI status image.
Zhang Xianyi [Fri, 5 Jul 2013 06:52:57 +0000 (14:52 +0800)]
Use quiet make for Travis CI.
Zhang Xianyi [Fri, 5 Jul 2013 03:11:18 +0000 (11:11 +0800)]
Install gfortran in Travis CI.
Zhang Xianyi [Thu, 4 Jul 2013 15:30:53 +0000 (23:30 +0800)]
Added travis.yml file.
Zhang Xianyi [Tue, 2 Jul 2013 06:37:30 +0000 (14:37 +0800)]
Improved make clean on Mac OS X.
Zhang Xianyi [Tue, 2 Jul 2013 06:17:55 +0000 (14:17 +0800)]
Refs #221. Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
Zhang Xianyi [Mon, 1 Jul 2013 08:09:05 +0000 (16:09 +0800)]
Use ALIGN_5 instead of .align 32 in assembly kernels. Added ALIGN_5 for 32-bit OSX.
Zhang Xianyi [Sun, 30 Jun 2013 16:40:32 +0000 (09:40 -0700)]
Merge pull request #242 from danluu/readme.haswell
Update README to reflect Haswell support, etc.
Dan Luu [Sun, 30 Jun 2013 16:36:13 +0000 (11:36 -0500)]
Fix miscellaneous typos
Zhang Xianyi [Sun, 30 Jun 2013 16:35:14 +0000 (00:35 +0800)]
Fixed #217 openblas_config.h bug on Windows 64.
Dan Luu [Sun, 30 Jun 2013 16:35:00 +0000 (11:35 -0500)]
Add Haswell support
Dan Luu [Sat, 29 Jun 2013 22:26:56 +0000 (17:26 -0500)]
Refs #241. Add Haswell support (using sandybridge optimizations)
Zhang Xianyi [Sat, 29 Jun 2013 02:36:01 +0000 (10:36 +0800)]
Fixed #239 bug in param.h about BARCELONA and BULLDOZER.
Zhang Xianyi [Fri, 28 Jun 2013 14:43:41 +0000 (22:43 +0800)]
Fixed #238 bug in lsame on x86.
Zhang Xianyi [Sat, 22 Jun 2013 00:59:26 +0000 (17:59 -0700)]
Merge pull request #235 from wernsaar/develop
Added ddot, daxpy, dcopy kernels for AMD bulldozer.
wernsaar [Fri, 21 Jun 2013 14:06:51 +0000 (16:06 +0200)]
added dcopy_bulldozer.S
wernsaar [Thu, 20 Jun 2013 14:15:09 +0000 (16:15 +0200)]
added ddot_bulldozer.S
wernsaar [Thu, 20 Jun 2013 12:07:54 +0000 (14:07 +0200)]
added daxpy_bulldozer.S
wernsaar [Wed, 19 Jun 2013 17:31:38 +0000 (19:31 +0200)]
cleanup of dgemm_ncopy_8_bulldozer.S
wernsaar [Wed, 19 Jun 2013 15:32:42 +0000 (17:32 +0200)]
added dgemv_t_bulldozer.S