review.tizen.org Git - platform/upstream/libvpx.git/commit

author	Scott LaVarnway <slavarnway@google.com>
	Thu, 30 Jul 2015 12:02:04 +0000 (05:02 -0700)
committer	Scott LaVarnway <slavarnway@google.com>
	Fri, 31 Jul 2015 21:51:51 +0000 (14:51 -0700)
commit	a5e97d874b16ae5826b68515f1e35ffb44361cf8
tree	90355c3fad36aee0de850ea3f1d16359cef0263c
parent	6025c6d65bacea0c72e02ee498bd3e82f92c9141

VP9_COPY_CONVOLVE_SSE2 optimization

This function suffers from several problems on small-core CPUs (tablets):
- The load of the next iteration is blocked by the store of the previous iteration
- 4K aliasing (between future stores and older loads)
- Current small-core machines are in-order, so the store spins in the rehabQ until the load has finished
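As a rough illustration of the 4K aliasing item: the effect is commonly described as a load being treated as dependent on an earlier store because the two addresses match in their low 12 bits, even though the full addresses differ. The helper below is a hypothetical diagnostic, not part of libvpx, and it assumes both accesses are `len` bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in libvpx): returns 1 when a len-byte load and a
 * len-byte store overlap once only address bits 11:0 are compared.
 * Genuinely overlapping accesses also return 1. */
static int may_alias_4k(uintptr_t load_addr, uintptr_t store_addr, size_t len) {
  assert(len > 0 && len <= 4096);
  /* Distance between the two addresses modulo 4096. */
  const uintptr_t d = (load_addr - store_addr) & 0xFFF;
  /* Overlap if the load starts less than len bytes after the store,
   * or less than len bytes before it (wrapping within the 4 KiB window). */
  return d < len || (4096 - d) < len;
}
```

With 16-byte accesses, a source row at 0x1000 and a destination row at 0x3000 would be flagged, while a source row at 0x1800 would not.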
Fixed by:
- Prefetching 2 lines ahead
- Unrolling the copy to handle 2 rows of the block per iteration
- Pre-loading all xmm registers before the loop, with the final stores after the loop
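The fixes listed above can be sketched in C with SSE2 intrinsics. This is a simplified illustration, not the assembly from this change: `copy16xh_sketch` is a hypothetical name, it handles only 16-pixel-wide blocks with an even height, and it orders loads before stores within each iteration rather than software-pipelining loads ahead of the loop as the commit describes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not the libvpx routine): copy a 16-pixel-wide block,
 * two rows per iteration, prefetching the two rows that follow. */
static void copy16xh_sketch(const uint8_t *src, ptrdiff_t src_stride,
                            uint8_t *dst, ptrdiff_t dst_stride, int h) {
  assert(h > 0 && h % 2 == 0); /* the 2-row unroll needs an even height */
  for (int y = 0; y < h; y += 2) {
    const uint8_t *s = src + (ptrdiff_t)y * src_stride;
    uint8_t *d = dst + (ptrdiff_t)y * dst_stride;

    /* Issue both loads before any store, so the second load does not
     * have to wait behind the first row's store. */
    const __m128i r0 = _mm_loadu_si128((const __m128i *)s);
    const __m128i r1 = _mm_loadu_si128((const __m128i *)(s + src_stride));

    /* Prefetch the next two source rows while they still lie in the block. */
    if (y + 2 < h) {
      _mm_prefetch((const char *)(s + 2 * src_stride), _MM_HINT_T0);
      _mm_prefetch((const char *)(s + 3 * src_stride), _MM_HINT_T0);
    }

    /* Stores come last in the iteration. */
    _mm_storeu_si128((__m128i *)d, r0);
    _mm_storeu_si128((__m128i *)(d + dst_stride), r1);
  }
}
```

Unaligned loads and stores are used here so the sketch makes no alignment assumptions about either buffer.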
Performance improvement by block size:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
Credit goes to Tom Craver (tom.r.craver@intel.com) and Ilya Albrekht (ilya.albrekht@intel.com).

Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671