[NEON] Optimize FHT functions, add highbd FHT 4x4
author Konstantinos Margaritis <konstantinos@vectorcamp.gr>
Wed, 9 Nov 2022 09:30:58 +0000 (09:30 +0000)
committer Konstantinos Margaritis <konstantinos@vectorcamp.gr>
Fri, 11 Nov 2022 13:53:54 +0000 (13:53 +0000)
commit f951514a40554e55715d7a31f182581cdd2bf971
tree f4715ebbe4934eee5c300931dcad0c86d81874c2
parent fb2d1616f657f6617a6d3ea1cf6e06100f92cddd

Refactor and further optimize the FHT functions using the new butterfly
functions. The 4x4 version is 5% faster, and the 8x8 and 16x16 versions
are 10% faster, than the previous versions.
The highbd 4x4 FHT is 2.27x faster than the C version for --rt.

Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94
test/dct_test.cc
vp9/common/vp9_rtcd_defs.pl
vp9/encoder/arm/neon/vp9_dct_neon.c
vpx_dsp/arm/fdct_neon.h