review.tizen.org Git - platform/upstream/openblas.git/commit

author	Arjan van de Ven <arjan@linux.intel.com>
	Thu, 1 Nov 2018 01:43:20 +0000 (01:43 +0000)
committer	Arjan van de Ven <arjan@linux.intel.com>
	Thu, 1 Nov 2018 01:43:20 +0000 (01:43 +0000)
commit	5b708e5eb1b17af9c45e0da2993da8a4756cb912
tree	738f8b3e43a330e3d17f4002ed508c3687835c8e
parent	dcc5d6291e7b02761acfb6161c04ba1f8f25b502

sgemm/dgemm: add a way for an arch kernel to specify preferred sizes

The current gemm threading code can make very unfortunate choices. For
example, on my 10-core system a 1024x1024x1024 matrix multiply ends up
chunked into blocks of 102, which is not a vector-friendly size, and
performance ends up horrible.

This patch adds a helper define with which an architecture can specify
a preference for size multiples.
This differs from the existing defines, which specify minimum sizes and such.
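
To illustrate the idea, here is a minimal sketch of splitting one matrix
dimension across threads while rounding each chunk up to a preferred
multiple. It is not the code from this patch: the names PREFERRED_MULTIPLE,
round_up_to and partition_dim are hypothetical, and the value 16 is only an
assumed example of a vector-friendly multiple.

```c
#include <assert.h>

/* Hypothetical name and value; stands in for a per-architecture define. */
#ifndef PREFERRED_MULTIPLE
#define PREFERRED_MULTIPLE 16
#endif

/* Round width up to the next multiple of `multiple` (multiple >= 1). */
static long round_up_to(long width, long multiple)
{
    return (width + multiple - 1) / multiple * multiple;
}

/*
 * Split `total` rows or columns across at most `nthreads` workers.
 * Chunk widths are written to `out`, which must have room for
 * `nthreads` entries. Returns the number of chunks actually used.
 */
static int partition_dim(long total, int nthreads, long multiple, long *out)
{
    assert(total >= 0 && nthreads > 0 && multiple > 0);

    long remaining = total;
    int used = 0;

    while (remaining > 0 && used < nthreads) {
        int left = nthreads - used;

        /* Even share of what is left, rounded up. */
        long width = (remaining + left - 1) / left;

        /* Prefer a multiple of the architecture's favoured size. */
        width = round_up_to(width, multiple);

        /* Never hand out more than what remains. */
        if (width > remaining)
            width = remaining;

        out[used++] = width;
        remaining -= width;
    }
    return used;
}
```

Under these assumptions, partition_dim(1024, 10, PREFERRED_MULTIPLE, out)
yields four chunks of 112 followed by six chunks of 96, all multiples of 16,
whereas a multiple of 1 yields chunks of 103 and 102. The real threading code
may apply the rounding under different conditions.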

The performance increase with this patch for the 1024x1024x1024 sgemm
is 2.3x (!!)

driver/level3/level3_thread.c
param.h