review.tizen.org Git - platform/upstream/libvpx.git/commit

author	levytamar82 <tamar.levy@intel.com>
	Wed, 29 Apr 2015 18:54:11 +0000 (11:54 -0700)
committer	levytamar82 <tamar.levy@intel.com>
	Thu, 2 Jul 2015 18:56:11 +0000 (11:56 -0700)
commit	3c5256d572152f2937a741076491ab5cf22eafa0
tree	7c73c6f9172c34c60d7eaca2d535b9872434b529
parent	8565a1c99a18d29a72fb716ad64614dc8e92499f

VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization

The vp9_lpf_vertical_16_dual function was optimized for the x86 32-bit target. The hot code in that function came from the call to transpose8x16.
The assembly generated by gcc contained unneeded fills and spills to the stack. By interleaving two loads with the unpack instructions, and by hoisting the consumer
instructions closer to the producer instructions, we eliminated most of the fills and spills and improved function-level performance by 17%.
Credit for writing the function, as well as for finding the root cause, goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
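
The load/unpack interleaving described above can be illustrated with a small sketch. This is not the code from this commit: transpose_4x8_interleaved is a hypothetical helper that transposes a 4-row by 8-byte block, chosen only to show the ordering idea. Each pair of row loads is consumed by its unpack right away, so few values are live at once and, on a target with only eight XMM registers, the compiler has less reason to spill. The compiler may still reschedule these instructions, so the source order is a hint rather than a guarantee.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper, not part of libvpx: transpose a block of 4 rows by
 * 8 bytes into 8 rows by 4 bytes, so that out[c * 4 + r] == in[r * stride + c].
 * Each pair of loads feeds its unpack immediately (producer next to consumer)
 * instead of loading all rows first and unpacking afterwards. */
void transpose_4x8_interleaved(const uint8_t *in, int stride, uint8_t *out) {
  assert(stride >= 8);

  /* Load rows 0 and 1 and interleave their bytes right away. */
  const __m128i x0 =
      _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)(in + 0 * stride)),
                        _mm_loadl_epi64((const __m128i *)(in + 1 * stride)));

  /* Load rows 2 and 3 and interleave their bytes right away. */
  const __m128i x1 =
      _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)(in + 2 * stride)),
                        _mm_loadl_epi64((const __m128i *)(in + 3 * stride)));

  /* Interleave 16-bit pairs: the low half yields columns 0..3,
   * the high half yields columns 4..7, four bytes per column. */
  _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(x0, x1));
  _mm_storeu_si128((__m128i *)(out + 16), _mm_unpackhi_epi16(x0, x1));
}
```

Whether this ordering actually removes spills depends on the compiler version and flags, so the generated assembly should be inspected rather than assumed.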

Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46