platform/upstream/openblas.git
traz [Sat, 14 May 2011 22:00:57 +0000 (22:00 +0000)]
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.

traz [Wed, 11 May 2011 10:44:23 +0000 (10:44 +0000)]
Finish dtrsm_kernel_Rx.S on Loongson3A.

traz [Tue, 10 May 2011 12:48:43 +0000 (12:48 +0000)]
Finish dtrsm_kernel_Lx.S on Loongson3A.

traz [Mon, 9 May 2011 17:31:58 +0000 (17:31 +0000)]
Modify dtrsm compiler options

traz [Mon, 9 May 2011 17:28:20 +0000 (17:28 +0000)]
Fixed #24 dtrmm error on Loongson3A

traz [Sun, 17 Apr 2011 20:26:49 +0000 (20:26 +0000)]
Completed the dtrmm function.

traz [Fri, 15 Apr 2011 21:56:25 +0000 (21:56 +0000)]
Added handling for the trmm part, without edge handling. Test sizes (M and N) must be multiples of 4.

traz [Mon, 11 Apr 2011 22:46:36 +0000 (22:46 +0000)]
Modify prefetching C.

traz [Mon, 11 Apr 2011 22:17:57 +0000 (22:17 +0000)]
Adjust kc size from 112 to 116.

Xianyi Zhang [Mon, 11 Apr 2011 21:46:48 +0000 (21:46 +0000)]
Changed default page size to 16KB on Loongson 3A.

traz [Wed, 6 Apr 2011 10:39:31 +0000 (10:39 +0000)]
Change BLOCK SIZE of LOONGSON3A TARGET.

traz [Wed, 6 Apr 2011 10:38:34 +0000 (10:38 +0000)]
Add dgemm compiler Options in KERNEL.LOONGSON3A.

traz [Wed, 6 Apr 2011 10:36:44 +0000 (10:36 +0000)]
New kernel in LOONGSON3A.

Xianyi Zhang [Tue, 22 Mar 2011 06:16:18 +0000 (14:16 +0800)]
Merge branch 'master' of github.com:xianyi/OpenBLAS into x86

Xianyi Zhang [Tue, 22 Mar 2011 06:09:47 +0000 (14:09 +0800)]
Fixed the detection bug on Intel Core i5. Thanks to ggl329 for the patch. (tagged v0.1alpha1)

Xianyi Zhang [Sun, 20 Mar 2011 15:35:31 +0000 (23:35 +0800)]
Updated the developing version to v0.1 alpha2.

Xianyi Zhang [Sun, 20 Mar 2011 15:30:09 +0000 (23:30 +0800)]
Init Changelog file for next release version(v0.1alpha2).

Xianyi Zhang [Sun, 20 Mar 2011 14:44:57 +0000 (22:44 +0800)]
OpenBLAS 0.1 alpha version 1.

Xianyi Zhang [Sun, 20 Mar 2011 13:57:58 +0000 (21:57 +0800)]
Merge remote branch 'origin/loongson3a' into x86

Xianyi Zhang [Sun, 20 Mar 2011 13:57:09 +0000 (21:57 +0800)]
Merge remote branch 'origin/loongson3a' into x86

Xianyi Zhang [Sun, 20 Mar 2011 13:56:40 +0000 (21:56 +0800)]
Detect Intel Core Clarkdale & Arrandale

Xianyi Zhang [Fri, 18 Mar 2011 23:05:56 +0000 (23:05 +0000)]
Fixed the bug involving the Loongson3A gsLQC1 & gsSQC1 instructions in the daxpy kernel. daxpy is now correct.

Xianyi Zhang [Fri, 18 Mar 2011 01:20:15 +0000 (01:20 +0000)]
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a

Xianyi Zhang [Fri, 18 Mar 2011 01:10:58 +0000 (01:10 +0000)]
Added detection support for the new kernel (2.6.36) and the new Loongson3A03 CPU.

Wang Qian [Mon, 7 Mar 2011 11:22:32 +0000 (11:22 +0000)]
Modified the default kernel makefile in MIPS64 arch.

Xianyi Zhang [Sat, 5 Mar 2011 02:17:10 +0000 (10:17 +0800)]
Support unaligned addresses in daxpy on Loongson3A SIMD.

Xianyi Zhang [Fri, 4 Mar 2011 09:50:17 +0000 (17:50 +0800)]
Unroll to 16 in daxpy on loongson3a.

Xianyi Zhang [Fri, 4 Mar 2011 14:11:52 +0000 (14:11 +0000)]
Merge commit 'origin/x86' into loongson3a

Xianyi Zhang [Fri, 4 Mar 2011 03:53:04 +0000 (11:53 +0800)]
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86

Xianyi Zhang [Fri, 4 Mar 2011 03:51:32 +0000 (11:51 +0800)]
Support NO_LAPACK=1 to build the lib without LAPACK functions.

Xianyi [Wed, 2 Mar 2011 16:46:39 +0000 (00:46 +0800)]
Changed movlps macro name in capital in x86/zdot_sse2.S file.

Xianyi [Wed, 2 Mar 2011 10:45:30 +0000 (18:45 +0800)]
On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 191.
This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9.

Xianyi Zhang [Wed, 2 Mar 2011 10:03:40 +0000 (18:03 +0800)]
Added zdotu with x & y offset=1 test case.

Xianyi Zhang [Wed, 2 Mar 2011 05:52:05 +0000 (13:52 +0800)]
Merge remote branch 'origin/x86' into loongson3a

Xianyi Zhang [Wed, 2 Mar 2011 05:40:55 +0000 (13:40 +0800)]
updated the changelog.

Xianyi Zhang [Wed, 2 Mar 2011 05:38:32 +0000 (13:38 +0800)]
Fixed random SEGFAULT when nodemask==NULL on Linux kernels above 2.6.34. Fixed #12. Thanks to Mr. Ei-ji Nakama for providing this patch.

Xianyi Zhang [Sat, 26 Feb 2011 04:27:56 +0000 (12:27 +0800)]
Added Changelog. Fixed #11.

Xianyi Zhang [Sat, 26 Feb 2011 03:51:39 +0000 (11:51 +0800)]
Enable Debug flags in memory alloc and init functions.

Xianyi Zhang [Sat, 26 Feb 2011 03:19:54 +0000 (11:19 +0800)]
Added DEBUG option in Makefile.rule. Fixed DEBUG typos.

Xianyi Zhang [Thu, 24 Feb 2011 09:02:52 +0000 (17:02 +0800)]
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86

Xianyi Zhang [Thu, 24 Feb 2011 07:16:21 +0000 (15:16 +0800)]
Fixed #10. Supported GOTO_NUM_THREADS & GOTO_THREADS_TIMEOUT environment variables.

Xianyi [Wed, 23 Feb 2011 12:08:34 +0000 (20:08 +0800)]
Fixed #7. Modified axpy kernel codes to avoid unloop with incx==0 or incy==0 in x86 32bits arch.

Xianyi Zhang [Tue, 22 Feb 2011 06:16:46 +0000 (14:16 +0800)]
Added unit test case (zdotu, N=1).

Xianyi Zhang [Tue, 22 Feb 2011 05:40:40 +0000 (13:40 +0800)]
Supported building debug version.

Xianyi Zhang [Sun, 20 Feb 2011 16:42:46 +0000 (00:42 +0800)]
Improved code quality in the unit tests.
Thanks to José Luis García Pallero.

Xianyi Zhang [Sun, 20 Feb 2011 16:24:21 +0000 (00:24 +0800)]
Fixed #7. 1) Disabled multi-threading and 2) modified kernel code to avoid unloop in the axpy function when incx==0 or incy==0.

Xianyi Zhang [Sun, 20 Feb 2011 16:17:33 +0000 (00:17 +0800)]
Added axpy unit test with incx==0 and incy==0.

Xianyi Zhang [Sun, 20 Feb 2011 09:14:38 +0000 (17:14 +0800)]
Fixed #6. Disable multi-thread swap when incx==0 or incy==0.

Xianyi Zhang [Sun, 20 Feb 2011 09:13:12 +0000 (17:13 +0800)]
Added swap unit test with incx==0 and incy==0.

Xianyi Zhang [Fri, 18 Feb 2011 16:18:17 +0000 (00:18 +0800)]
Updated readme file.

Xianyi Zhang [Fri, 18 Feb 2011 14:08:10 +0000 (22:08 +0800)]
Fixed #5. Detected Intel Westmere (using Nehalem code) in build and dynamic arch build.
Thanks to Cao He from Dawning for providing the Intel Xeon 5660 testbed.

Xianyi [Thu, 17 Feb 2011 19:00:58 +0000 (03:00 +0800)]
fixed #4 csrot & drot returned the wrong result when incx==incy==0 on i686 arch.

Xianyi [Thu, 17 Feb 2011 18:50:32 +0000 (02:50 +0800)]
Disable quad and x precision objs in reference.

Xianyi Zhang [Wed, 16 Feb 2011 16:39:09 +0000 (00:39 +0800)]
Merge branch 'master' into loongson3a

Xianyi Zhang [Wed, 16 Feb 2011 15:41:15 +0000 (23:41 +0800)]
Merge branch 'x86' of github.com:xianyi/OpenBLAS into x86

Xianyi Zhang [Wed, 16 Feb 2011 15:39:43 +0000 (23:39 +0800)]
fixed #4 csrot returned the wrong result when incx==incy==0.

Xianyi Zhang [Wed, 16 Feb 2011 15:32:13 +0000 (23:32 +0800)]
Added rot testcase when incx == incy ==1.

Xianyi Zhang [Tue, 15 Feb 2011 16:18:45 +0000 (00:18 +0800)]
fixed a bug in drot when incx or incy equals zero.

Xianyi Zhang [Wed, 16 Feb 2011 09:42:12 +0000 (17:42 +0800)]
Merge branch 'master' into x86

Xianyi Zhang [Wed, 16 Feb 2011 09:37:48 +0000 (17:37 +0800)]
Updated gitignore.

Xianyi Zhang [Wed, 16 Feb 2011 09:33:06 +0000 (17:33 +0800)]
Added utest frame using CUnit (http://cunit.sourceforge.net/).

Xianyi Zhang [Tue, 15 Feb 2011 16:18:45 +0000 (00:18 +0800)]
fixed a bug in drot when incx or incy equals zero.

Xianyi Zhang [Fri, 28 Jan 2011 19:05:27 +0000 (03:05 +0800)]
Did the experiment with Loongson 3A 128bit load & store instruction.

Xianyi Zhang [Fri, 28 Jan 2011 19:03:34 +0000 (03:03 +0800)]
changed prefetch order.

Xianyi Zhang [Fri, 28 Jan 2011 03:18:50 +0000 (11:18 +0800)]
load x & y contiguously in axpy.

Xianyi Zhang [Thu, 27 Jan 2011 15:07:06 +0000 (23:07 +0800)]
Modified aligned size. Added an additional prefetch instruction because the cache line is 32 bytes on Loongson 3A.

Xianyi Zhang [Wed, 26 Jan 2011 14:34:33 +0000 (22:34 +0800)]
added axpy kernel with prefetch for Loongson3A. To-Do: tuning prefetch distance & instruction order.

Xianyi Zhang [Tue, 25 Jan 2011 09:34:47 +0000 (17:34 +0800)]
Modified the unsupported instruction on Loongson3A. Closed #1. OpenBLAS can now run on Loongson3A.

Xianyi Zhang [Tue, 25 Jan 2011 08:52:36 +0000 (16:52 +0800)]
Updated readme for cross compiling.

Xianyi Zhang [Tue, 25 Jan 2011 07:55:56 +0000 (15:55 +0800)]
fixed a typo.

Xianyi Zhang [Mon, 24 Jan 2011 22:45:35 +0000 (22:45 +0000)]
Added the configuration for Loongson 3A. refs #1

Xianyi Zhang [Mon, 24 Jan 2011 20:03:04 +0000 (20:03 +0000)]
Updated readme file.

Xianyi Zhang [Mon, 24 Jan 2011 19:59:52 +0000 (19:59 +0000)]
Added .gitignore files.

Xianyi Zhang [Mon, 24 Jan 2011 19:59:15 +0000 (19:59 +0000)]
Added mailing list.

Xianyi Zhang [Mon, 24 Jan 2011 18:12:06 +0000 (18:12 +0000)]
Updated readme file.

Xianyi Zhang [Mon, 24 Jan 2011 18:11:35 +0000 (18:11 +0000)]
Used the environment variable OPENBLAS_NUM_THREADS to set the number of threads in test.

Xianyi Zhang [Mon, 24 Jan 2011 17:58:05 +0000 (17:58 +0000)]
changed library name to openblas and modified environment variable.

Xianyi Zhang [Mon, 24 Jan 2011 16:05:00 +0000 (16:05 +0000)]
Added OpenBLAS docs.

Xianyi Zhang [Mon, 24 Jan 2011 16:04:17 +0000 (16:04 +0000)]
Fixed a bug when compiling dynamic ARCH x86 in GotoBLAS2.

Xianyi Zhang [Mon, 24 Jan 2011 15:57:23 +0000 (15:57 +0000)]
rename documents in GotoBLAS.

Xianyi Zhang [Mon, 24 Jan 2011 14:54:24 +0000 (14:54 +0000)]
Import GotoBLAS2 1.13 BSD version code.