traz [Thu, 10 Nov 2011 15:38:48 +0000 (15:38 +0000)]
Add conjugate condition to gemv.
traz [Fri, 4 Nov 2011 19:32:21 +0000 (19:32 +0000)]
Fix the computation error in gemv when incx and incy are negative.
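Note: the reference BLAS convention for a negative increment is that the first logical element of an n-element vector sits at 0-based offset (1 - n) * inc, and the vector is then walked backwards through memory. The sketch below illustrates that convention for a non-transposed, column-major gemv. It is not the Loongson3a assembly kernel; the names ref_dgemv_n and start_offset are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: 0-based offset of the first logical element of an
 * n-element vector stored with stride inc (reference BLAS convention). */
static ptrdiff_t start_offset(int n, int inc)
{
    return inc < 0 ? (ptrdiff_t)(1 - n) * inc : 0;
}

/* Illustrative reference routine: y := alpha * A * x + y, where A is an
 * m-by-n column-major matrix with leading dimension lda. */
void ref_dgemv_n(int m, int n, double alpha, const double *a, int lda,
                 const double *x, int incx, double *y, int incy)
{
    ptrdiff_t ix = start_offset(n, incx);
    for (int j = 0; j < n; j++, ix += incx) {
        double t = alpha * x[ix];
        /* Restart the walk over y for every column of A. */
        ptrdiff_t iy = start_offset(m, incy);
        for (int i = 0; i < m; i++, iy += incy)
            y[iy] += t * a[(ptrdiff_t)j * lda + i];
    }
}
```

With incx = -1 and incy = -1, both vectors are read and written in reverse storage order, so a kernel that always starts at offset 0 would pair the wrong elements.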
traz [Thu, 3 Nov 2011 13:53:48 +0000 (13:53 +0000)]
Add complete gemv function on Loongson3a platform.
traz [Mon, 26 Sep 2011 15:21:45 +0000 (15:21 +0000)]
Add conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.
traz [Fri, 23 Sep 2011 20:59:48 +0000 (20:59 +0000)]
Modify the aligned addresses of sa and sb to improve multi-threaded performance.
traz [Fri, 16 Sep 2011 17:50:40 +0000 (17:50 +0000)]
Complete all the complex single-precision functions of level3, but the performance needs further improvement.
traz [Fri, 16 Sep 2011 16:08:39 +0000 (16:08 +0000)]
Add ctrmm part in cgemm_kernel_loongson3a_4x2_ps.S.
traz [Thu, 15 Sep 2011 16:08:23 +0000 (16:08 +0000)]
Complete cgemm function, but no optimization.
traz [Wed, 14 Sep 2011 20:00:35 +0000 (20:00 +0000)]
Fix some computation errors.
traz [Wed, 14 Sep 2011 16:32:36 +0000 (16:32 +0000)]
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
traz [Wed, 14 Sep 2011 15:32:25 +0000 (15:32 +0000)]
Use ps instructions in cgemm.
Xianyi Zhang [Thu, 8 Sep 2011 16:39:34 +0000 (16:39 +0000)]
Refs #47. Fixed the parameter-setting bug on the Loongson 3A single-threaded version.
Xianyi Zhang [Tue, 6 Sep 2011 18:27:33 +0000 (18:27 +0000)]
Check the return value of pthread_create. Update the docs with known issue on Loongson 3A.
Xianyi Zhang [Tue, 6 Sep 2011 18:19:50 +0000 (18:19 +0000)]
Merge branch 'develop' into loongson3a
traz [Mon, 5 Sep 2011 16:31:40 +0000 (16:31 +0000)]
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
traz [Mon, 5 Sep 2011 16:30:55 +0000 (16:30 +0000)]
Fixed #46. Initialize variables in cblat3.f and zblat3.f.
Xianyi Zhang [Mon, 5 Sep 2011 15:13:05 +0000 (15:13 +0000)]
Refs #47. On Loongson 3A, set the DGEMM_R parameter depending on the number of threads. This improves double-precision BLAS3 performance with multiple threads.
traz [Fri, 2 Sep 2011 19:41:06 +0000 (19:41 +0000)]
Fix an illegal instruction for strmm_RTLU.
traz [Fri, 2 Sep 2011 16:57:33 +0000 (16:57 +0000)]
Fix an error for strmm_LLTN.
traz [Fri, 2 Sep 2011 16:50:50 +0000 (16:50 +0000)]
Fix an error for strmm_LLTN.
traz [Fri, 2 Sep 2011 16:00:04 +0000 (16:00 +0000)]
Fix a compute error for strmm.
traz [Fri, 2 Sep 2011 15:28:01 +0000 (15:28 +0000)]
Fix stack-pointer bug for strmm.
traz [Fri, 2 Sep 2011 09:15:09 +0000 (09:15 +0000)]
Add strmm part.
traz [Thu, 1 Sep 2011 17:15:28 +0000 (17:15 +0000)]
Tuning mb, kb, nb size to get the best performance.
traz [Wed, 31 Aug 2011 21:24:03 +0000 (21:24 +0000)]
Use PS instructions to improve the performance of sgemm; it is 4.2 GFLOPS now.
Xianyi Zhang [Wed, 31 Aug 2011 10:21:37 +0000 (18:21 +0800)]
Fixed the installation bug. f77blas.h works OK now.
traz [Tue, 30 Aug 2011 20:57:00 +0000 (20:57 +0000)]
Modify compile options.
traz [Tue, 30 Aug 2011 20:54:19 +0000 (20:54 +0000)]
Using ps instruction.
traz [Mon, 18 Jul 2011 17:06:53 +0000 (17:06 +0000)]
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
traz [Mon, 18 Jul 2011 17:03:38 +0000 (17:03 +0000)]
Complete all the plural single-precision functions of level3 on Loongson3a; the performance is 2.3 GFLOPS.
traits [Wed, 13 Jul 2011 17:09:21 +0000 (01:09 +0800)]
Merge branch 'hotfix-0.1alpha2.2' into develop
traits [Wed, 13 Jul 2011 17:02:19 +0000 (01:02 +0800)]
Update the documents for 0.1alpha2.2 version.
traits [Wed, 13 Jul 2011 16:54:23 +0000 (00:54 +0800)]
Fixed #44 a makefile bug when DYNAMIC_ARCH=1 and INTERFACE64=1.
Xianyi Zhang [Thu, 7 Jul 2011 06:25:51 +0000 (14:25 +0800)]
Merge branch 'develop' into loongson3a
traits [Tue, 28 Jun 2011 07:46:55 +0000 (15:46 +0800)]
Merge branch 'hotfix-0.1alpha2.1' into develop
traits [Tue, 28 Jun 2011 07:43:08 +0000 (15:43 +0800)]
Merge branch 'hotfix-0.1alpha2.1'
traits [Tue, 28 Jun 2011 07:42:09 +0000 (15:42 +0800)]
Refs #42. Output an error message when Fortran compiler detection fails.
traz [Fri, 24 Jun 2011 09:28:12 +0000 (09:28 +0000)]
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
traz [Fri, 24 Jun 2011 09:27:41 +0000 (09:27 +0000)]
Fix compute error in ztrmm.
traz [Thu, 23 Jun 2011 21:11:00 +0000 (21:11 +0000)]
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2 GFLOPS.
traz [Thu, 23 Jun 2011 10:46:58 +0000 (10:46 +0000)]
Change prefetch length of A and B; the performance is 2.1 GFLOPS now.
Xianyi Zhang [Thu, 23 Jun 2011 08:08:23 +0000 (16:08 +0800)]
Merge branch 'release-v0.1alpha2' into loongson3a
Xianyi Zhang [Thu, 23 Jun 2011 08:07:34 +0000 (16:07 +0800)]
Merge branch 'release-v0.1alpha2' into develop
traits [Thu, 23 Jun 2011 07:18:40 +0000 (15:18 +0800)]
Merge branch 'release-v0.1alpha2'
traits [Thu, 23 Jun 2011 07:16:24 +0000 (15:16 +0800)]
Fixed #38. Released v0.1 alpha2.
traits [Thu, 23 Jun 2011 07:09:34 +0000 (15:09 +0800)]
Refs #37. Updated README about the compatibility issue with the EKOPath compiler.
Xianyi Zhang [Wed, 22 Jun 2011 05:19:39 +0000 (13:19 +0800)]
Refs #39. Moved the shared lib (dll) to top directory in MingW64 compiler environment.
traz [Tue, 21 Jun 2011 22:16:23 +0000 (22:16 +0000)]
Improve zgemm performance from 1 GFLOPS to 1.8 GFLOPS; change block size in param.h.
Xianyi Zhang [Tue, 21 Jun 2011 17:52:20 +0000 (01:52 +0800)]
Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c.
Xianyi Zhang [Tue, 21 Jun 2011 10:06:13 +0000 (18:06 +0800)]
Refs #38. Prepare the docs with v0.1alpha2.
Xianyi Zhang [Tue, 21 Jun 2011 09:50:00 +0000 (17:50 +0800)]
Merge branch 'loongson3a' into release-v0.1alpha2
Xianyi Zhang [Tue, 21 Jun 2011 09:40:16 +0000 (17:40 +0800)]
Merge branch 'add_install_target' into develop
Xianyi Zhang [Tue, 21 Jun 2011 09:39:08 +0000 (17:39 +0800)]
Refs #20. Fixed the installation bug with DYNAMIC_ARCH=1.
Xianyi Zhang [Mon, 20 Jun 2011 10:40:05 +0000 (18:40 +0800)]
Merge branch 'add_install_target' into develop
Conflicts:
Changelog.txt
Xianyi Zhang [Mon, 20 Jun 2011 10:36:29 +0000 (18:36 +0800)]
Refs #20. Updated the docs.
Xianyi Zhang [Mon, 20 Jun 2011 10:35:35 +0000 (18:35 +0800)]
Fixed #20. Added install target in makefile. You can use "make install PREFIX=your_installation_directory".
Xianyi Zhang [Sun, 19 Jun 2011 04:07:31 +0000 (12:07 +0800)]
Updated gitignore file.
Xianyi Zhang [Sun, 19 Jun 2011 03:59:38 +0000 (11:59 +0800)]
Merge branch 'master' of github.com:xianyi/OpenBLAS into develop
Xianyi Zhang [Sun, 19 Jun 2011 03:55:29 +0000 (11:55 +0800)]
Fixed #27. Temporarily work around axpy's low performance issue with small input sizes and multiple threads.
Xianyi Zhang [Sat, 11 Jun 2011 12:59:00 +0000 (05:59 -0700)]
Merge pull request #36 from pipping/master
Fixed the bug about USE_OPENMP=0 enabling OpenMP
Elias Pipping [Sat, 11 Jun 2011 12:36:16 +0000 (14:36 +0200)]
Make USE_OPENMP=0 disable openmp
Xianyi Zhang [Thu, 9 Jun 2011 14:59:49 +0000 (22:59 +0800)]
Fixed #35 a build bug with NO_LAPACK=1 DYNAMIC_ARCH=1 FC=gfortran. I forgot to test it with gfortran in the last bug-fix commit.
Xianyi Zhang [Thu, 9 Jun 2011 03:38:59 +0000 (11:38 +0800)]
Fixed #35 a build bug with NO_LAPACK=1 & DYNAMIC_ARCH=1.
Xianyi Zhang [Thu, 9 Jun 2011 02:40:15 +0000 (10:40 +0800)]
Print the wall time (cycles) when FUNCTION_PROFILE is enabled.
Wang Qian [Tue, 7 Jun 2011 04:53:25 +0000 (12:53 +0800)]
Fixed #33 ztrmm bug on Nehalem.
Xianyi [Fri, 3 Jun 2011 05:19:54 +0000 (13:19 +0800)]
Fixed #32 a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.
Xianyi Zhang [Mon, 30 May 2011 04:42:17 +0000 (12:42 +0800)]
Fixed #31 Shared library placement on Mac. Thanks to Mr. Viral B. Shah for this patch.
traz [Sat, 28 May 2011 09:48:34 +0000 (09:48 +0000)]
Fixed #30 strmm computational error on Loongson3A.
Xianyi Zhang [Fri, 27 May 2011 13:15:30 +0000 (21:15 +0800)]
Fixed the makefile bug about openblas_set_num_threads.
Xianyi Zhang [Fri, 27 May 2011 10:16:19 +0000 (18:16 +0800)]
Fixed a bug about detecting underscore prefix in c_check.
Xianyi Zhang [Fri, 27 May 2011 10:12:45 +0000 (18:12 +0800)]
Ignore *.obj files in git.
traz [Fri, 27 May 2011 09:47:17 +0000 (09:47 +0000)]
Modify the single-precision compile conditions and add single-precision kernel code on Loongson3a.
traz [Wed, 18 May 2011 10:54:51 +0000 (10:54 +0000)]
Remove unused code; update code comments and formatting.
Xianyi Zhang [Tue, 17 May 2011 21:24:00 +0000 (21:24 +0000)]
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel.
traz [Sat, 14 May 2011 22:00:57 +0000 (22:00 +0000)]
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
Xianyi Zhang [Thu, 12 May 2011 18:41:39 +0000 (02:41 +0800)]
Added the missing test code for dsdot.
Xianyi Zhang [Thu, 12 May 2011 18:34:30 +0000 (02:34 +0800)]
Fixed #28. Convert the result to double precision at the end of the dsdot kernel.
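Note: by its BLAS definition, dsdot takes two single-precision vectors and returns their dot product as a double. The sketch below is a plain C reference version that accumulates in double precision throughout. It is only meant to show the expected result type and increment handling, not how the optimized kernels in this repository compute it; the name ref_dsdot is hypothetical.

```c
#include <stddef.h>

/* Illustrative reference routine: dot product of two float vectors,
 * accumulated and returned in double precision. */
double ref_dsdot(int n, const float *x, int incx, const float *y, int incy)
{
    double acc = 0.0;
    /* Reference BLAS convention: a negative increment starts at the far end. */
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++, ix += incx, iy += incy)
        acc += (double)x[ix] * (double)y[iy];

    return acc;
}
```

A unit test for an optimized dsdot can compare its output against a reference like this one.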
Xianyi Zhang [Thu, 12 May 2011 18:19:55 +0000 (02:19 +0800)]
Added the unit testcase for dsdot.
Xianyi Zhang [Thu, 12 May 2011 17:21:39 +0000 (01:21 +0800)]
Added the unit test for drotmg.
Xianyi Zhang [Thu, 12 May 2011 11:06:31 +0000 (19:06 +0800)]
Merge branch 'hotfix-readme_about_branches' into develop
Xianyi Zhang [Thu, 12 May 2011 11:06:02 +0000 (19:06 +0800)]
Merge branch 'hotfix-readme_about_branches'
Xianyi Zhang [Thu, 12 May 2011 11:05:20 +0000 (19:05 +0800)]
Added the specification of this project's git branches.
traz [Wed, 11 May 2011 10:44:23 +0000 (10:44 +0000)]
Finish dtrsm_kernel_Rx.S on Loongson3A.
Xianyi Zhang [Tue, 10 May 2011 17:12:32 +0000 (01:12 +0800)]
Fixed #26 the wrong result of rotmg. Used fabs() instead of abs().
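Note: in C, abs() takes an int, so a double argument is converted to int before the absolute value is taken, and any fractional magnitude is lost. fabs() operates on double. The sketch below contrasts the two on a magnitude comparison. It is an illustration only, not the rotmg source; both function names are hypothetical, and the explicit casts spell out the conversion that otherwise happens implicitly.

```c
#include <math.h>
#include <stdlib.h>

/* Returns 1 if |a| > |b|, using the floating-point fabs(). */
int larger_magnitude_fabs(double a, double b)
{
    return fabs(a) > fabs(b);
}

/* Same comparison with the integer abs(): both values are truncated
 * to int first, so magnitudes below 1 all collapse to 0. */
int larger_magnitude_abs(double a, double b)
{
    return abs((int)a) > abs((int)b);
}
```

For a = 0.75 and b = -0.25 the fabs() version reports that a is larger, while the abs() version sees 0 on both sides and reports that it is not.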
traz [Tue, 10 May 2011 12:48:43 +0000 (12:48 +0000)]
Finish dtrsm_kernel_Lx.S on Loongson3A.
traz [Mon, 9 May 2011 17:31:58 +0000 (17:31 +0000)]
Modify dtrsm compiler options
traz [Mon, 9 May 2011 17:28:20 +0000 (17:28 +0000)]
Fixed #24 dtrmm error on Loongson3A
Xianyi Zhang [Fri, 6 May 2011 09:03:35 +0000 (17:03 +0800)]
Added openblas_set_num_threads for Fortran.
Xianyi Zhang [Wed, 4 May 2011 05:03:10 +0000 (13:03 +0800)]
Fixed #23. Fixed a bug of f_check script about generating link flags.
Xianyi Zhang [Tue, 3 May 2011 09:19:36 +0000 (17:19 +0800)]
Fixed a bug when detecting Intel CPU.
traits [Tue, 3 May 2011 06:42:11 +0000 (14:42 +0800)]
Fixed a build bug with NO_LAPACK=1 and SANITY_CHECK=1.
Xianyi Zhang [Fri, 22 Apr 2011 14:14:06 +0000 (22:14 +0800)]
Fixed #16. Print a user-friendly message when CPU detection fails.
Xianyi Zhang [Fri, 22 Apr 2011 14:07:46 +0000 (22:07 +0800)]
Added docs for make TARGET=your_cpu_target.
Xianyi Zhang [Fri, 22 Apr 2011 12:21:42 +0000 (20:21 +0800)]
Fixed #19. Provided an error msg when the arch is not supported.
Xianyi Zhang [Wed, 20 Apr 2011 05:41:38 +0000 (13:41 +0800)]
Fixed #21. Added extern "C" to support C++. Thanks to Tasio for the patch.
traz [Sun, 17 Apr 2011 20:26:49 +0000 (20:26 +0000)]
Complete the dtrmm function.
traz [Fri, 15 Apr 2011 21:56:25 +0000 (21:56 +0000)]
Add handling for the trmm part, without edge handling. Test sizes (M and N) must be multiples of 4.
traz [Mon, 11 Apr 2011 22:46:36 +0000 (22:46 +0000)]
Modify prefetching C.
traz [Mon, 11 Apr 2011 22:17:57 +0000 (22:17 +0000)]
Adjust kc size from 112 to 116.
Xianyi Zhang [Mon, 11 Apr 2011 21:46:48 +0000 (21:46 +0000)]
Changed default page size to 16KB on Loongson 3A.