review.tizen.org Git - platform/upstream/libvpx.git/commit

[NEON] Optimize and homogenize Butterfly DCT functions

Provide a set of commonly used butterfly DCT functions for use in the
4x4, 8x8, 16x16 and 32x32 DCT functions. They come in several
variants; the _fast variants use vqrdmulh_s16/vqrdmulh_s32 and are
unfortunately only usable in pass 1 of most DCTs, as they do not
provide the precision required in pass 2.
This gives a performance gain of 5% to 15% in the 16x16 case.
For 32x32, the loads were also rearranged; together with the butterfly
optimizations, this gives a 10% gain in the 32x32_rd function.
This refactoring was necessary to ease porting of the highbd
32x32 functions, which follows this patchset.
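
To illustrate why a vqrdmulh-based butterfly can be less precise, here is a
minimal scalar sketch in portable C. It is not the code from this commit: all
sketch_* names are hypothetical, sketch_vqrdmulh_s16 only models a single lane
of the vqrdmulh_s16 intrinsic, and the assumption that the precision loss comes
from rounding each product separately (instead of rounding the full-width sum
once) is one plausible mechanism that the commit message does not confirm. The
14-bit constants are the usual cospi_8_64 and cospi_24_64 values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 14-bit fixed-point DCT constants (hypothetical macro names). */
#define SKETCH_CONST_BITS 14
#define SKETCH_COSPI_8_64 15137 /* round(16384 * cos(8 * pi / 64)) */
#define SKETCH_COSPI_24_64 6270 /* round(16384 * cos(24 * pi / 64)) */

/* Scalar model of one lane of vqrdmulh_s16: saturating, rounding,
 * doubling multiply that returns the high 16 bits.
 * Assumes >> on negative values is an arithmetic shift. */
static int16_t sketch_vqrdmulh_s16(int16_t a, int16_t b) {
  int64_t r = (2 * (int64_t)a * (int64_t)b + (1 << 15)) >> 16;
  if (r > INT16_MAX) r = INT16_MAX;
  if (r < INT16_MIN) r = INT16_MIN;
  return (int16_t)r;
}

/* Precise form: accumulate both products at full width, round once. */
static int32_t sketch_two_coeff_precise(int16_t a, int16_t b, int16_t c0,
                                        int16_t c1) {
  int64_t sum = (int64_t)a * c0 + (int64_t)b * c1;
  return (int32_t)((sum + (1 << (SKETCH_CONST_BITS - 1))) >>
                   SKETCH_CONST_BITS);
}

/* Fast form (assumed shape): each product is rounded to 16 bits on its
 * own, then the two rounded values are added. Doubling the constant
 * turns the 15-bit shift of vqrdmulh into the 14-bit shift above. */
static int32_t sketch_two_coeff_fast(int16_t a, int16_t b, int16_t c0,
                                     int16_t c1) {
  /* 2 * c must still fit in an int16_t lane. */
  assert(c0 >= -16384 && c0 < 16384);
  assert(c1 >= -16384 && c1 < 16384);
  return (int32_t)sketch_vqrdmulh_s16(a, (int16_t)(2 * c0)) +
         (int32_t)sketch_vqrdmulh_s16(b, (int16_t)(2 * c1));
}
```

With a = 4, b = 2 and the two constants above, the precise form returns 4 and
the fast form returns 5. In this model the two forms never differ by more than
one unit, an error that may be tolerable in a first pass but not in a second.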

Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3

author	Konstantinos Margaritis <konstantinos@vectorcamp.gr>
	Wed, 26 Oct 2022 21:37:31 +0000 (21:37 +0000)
committer	Konstantinos Margaritis <konstantinos@vectorcamp.gr>
	Tue, 1 Nov 2022 23:07:27 +0000 (23:07 +0000)
commit	3121783fec60d0ce4551d472d1acbd1f1a8253be
tree	95a53f73adccf711003346860021ddf9ed1a2e0f
parent	dcb566e69f03eb046180dabf41c4118b249af96f

vp9/encoder/arm/neon/vp9_dct_neon.c
vpx_dsp/arm/fdct16x16_neon.c
vpx_dsp/arm/fdct16x16_neon.h
vpx_dsp/arm/fdct32x32_neon.c
vpx_dsp/arm/fdct32x32_neon.h	[new file with mode: 0644]
vpx_dsp/arm/fdct4x4_neon.c
vpx_dsp/arm/fdct4x4_neon.h	[new file with mode: 0644]
vpx_dsp/arm/fdct8x8_neon.c
vpx_dsp/arm/fdct8x8_neon.h	[new file with mode: 0644]
vpx_dsp/arm/fdct_neon.h
vpx_dsp/arm/transpose_neon.h