review.tizen.org Git - platform/upstream/openblas.git/commit

author	Arjan van de Ven <arjan@linux.intel.com>
	Wed, 12 Dec 2018 16:45:57 +0000 (16:45 +0000)
committer	Arjan van de Ven <arjan@linux.intel.com>
	Thu, 13 Dec 2018 13:47:31 +0000 (13:47 +0000)
commit	cdc668d82b7afd6a2ddee33987ecfebcaccebc2d
tree	fbf24b02f82425220ccac6191609768c3640ac19
parent	87718807f0f53ea394af694285f8c810a6033d0c

Add a "sgemm direct" mode for small matrices

OpenBLAS has a fancy algorithm for copying the input data while laying
it out in a more CPU-friendly memory layout.

This is great for large matrices; the cost of the copy is easily
amortized by the gains from the better memory layout.

But for small matrices (on CPUs that can do efficient unaligned loads) this
copy can be a net loss.

This patch adds (for SKYLAKEX initially) a "sgemm direct" mode that bypasses
the whole copy machinery for ALPHA=1/BETA=0/... standard arguments,
for small matrices only.
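
To illustrate what "direct" means here, the following is a minimal sketch of a multiply that reads A and B in place, with no packing or copy step, for the ALPHA=1/BETA=0 case. It is not the SKYLAKEX kernel from this patch: the function name, the row-major layout, and the plain scalar loops are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not the kernel added by this patch.
 * Computes C = A * B (alpha = 1, beta = 0) for row-major float matrices,
 * reading A and B directly from the caller's buffers (no copy or repack).
 * A is m x k, B is k x n, C is m x n; lda/ldb/ldc are row strides. */
static void sgemm_direct_sketch(size_t m, size_t n, size_t k,
                                const float *a, size_t lda,
                                const float *b, size_t ldb,
                                float *c, size_t ldc)
{
    for (size_t i = 0; i < m; i++) {
        /* beta = 0: overwrite C instead of accumulating into it */
        for (size_t j = 0; j < n; j++)
            c[i * ldc + j] = 0.0f;

        /* i-p-j loop order keeps the inner loop streaming over rows of B and C */
        for (size_t p = 0; p < k; p++) {
            const float aip = a[i * lda + p];
            for (size_t j = 0; j < n; j++)
                c[i * ldc + j] += aip * b[p * ldb + j];
        }
    }
}
```

A real kernel would vectorize the inner loop with unaligned loads, which is why the commit message limits this mode to CPUs that handle such loads efficiently.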

What is small? For the non-threaded case this has been measured to be
in the M*N*K = 28 * 512 * 512 range, while in the threaded case it's
less, around M*N*K = 1 * 512 * 512.
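
The size test described above could be sketched as a simple predicate on the product M*N*K. This is an illustration only: the function and macro names are hypothetical, and treating the measured crossover points as hard cutoffs is an assumption, since the message only says the limits are "in the range" of these values.

```c
#include <stdbool.h>

/* Approximate crossover points quoted in the commit message. */
#define DIRECT_LIMIT_SINGLE   (28LL * 512 * 512)  /* non-threaded case */
#define DIRECT_LIMIT_THREADED (1LL * 512 * 512)   /* threaded case */

/* Hypothetical predicate: returns true when the problem is small enough
 * that skipping the copy step is expected to pay off. */
static bool sgemm_direct_is_small(long m, long n, long k, bool threaded)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return false;

    /* Widen before multiplying so the product cannot overflow a 32-bit long. */
    long long mnk = (long long)m * n * k;
    long long limit = threaded ? DIRECT_LIMIT_THREADED : DIRECT_LIMIT_SINGLE;

    return mnk < limit;
}
```

With these limits, a 64 x 64 x 64 multiply would qualify in the non-threaded case but sit exactly at the cutoff, and so not qualify, in the threaded case.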

common_level3.h
interface/gemm.c
kernel/x86_64/sgemm_kernel_16x4_skylakex.c
param.h