Add AVX2 intrinsic for vpx_fdct16x16() function
author Anupam Pandey <anupam.pandey@ittiam.com>
Tue, 4 Apr 2023 11:54:27 +0000 (17:24 +0530)
committer Anupam Pandey <anupam.pandey@ittiam.com>
Mon, 17 Apr 2023 09:53:51 +0000 (15:23 +0530)
commit e15c2e34451b4177e4119cce47bf73dac9864de8
tree 3ea54e0fbb82c6985f4877bc0fdc62d75936df3f
parent 0f42bd3fb81cd79dfaef5c6aef26c643c48be909

Introduce an AVX2 intrinsic implementation of the forward DCT (FDCT)
for the 16x16 block size. This is a bit-exact change.
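
As an illustration of what "bit-exact" means here, the following is a minimal sketch of a comparison harness, not the project's actual test. The two transforms (`toy_txfm_ref`, `toy_txfm_opt`) are hypothetical stand-ins that apply one butterfly stage per row; they are not the real FDCT, and the names, the `int32_t` output type, and the input range of [-255, 255] are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { kSize = 16, kCoeffs = kSize * kSize };

typedef void (*txfm_fn)(const int16_t *in, int32_t *out, int stride);

/* Hypothetical stand-in for a C reference: one butterfly stage per row. */
void toy_txfm_ref(const int16_t *in, int32_t *out, int stride) {
  for (int r = 0; r < kSize; ++r) {
    for (int c = 0; c < kSize / 2; ++c) {
      const int32_t a = in[r * stride + c];
      const int32_t b = in[r * stride + kSize - 1 - c];
      out[r * kSize + c] = a + b;
      out[r * kSize + kSize - 1 - c] = a - b;
    }
  }
}

/* Hypothetical stand-in for an optimized path: same math, different order. */
void toy_txfm_opt(const int16_t *in, int32_t *out, int stride) {
  for (int r = kSize - 1; r >= 0; --r) {
    const int16_t *row = in + r * stride;
    int32_t *dst = out + r * kSize;
    for (int c = 0; c < kSize / 2; ++c) dst[c] = row[c] + row[kSize - 1 - c];
    for (int c = kSize / 2; c < kSize; ++c) dst[c] = row[kSize - 1 - c] - row[c];
  }
}

/* Returns the number of random trials whose outputs differ in any bit. */
int count_mismatches(txfm_fn ref, txfm_fn opt, int trials, unsigned seed) {
  int16_t in[kCoeffs];
  int32_t out_ref[kCoeffs];
  int32_t out_opt[kCoeffs];
  int mismatches = 0;
  assert(trials > 0);
  srand(seed);
  for (int t = 0; t < trials; ++t) {
    /* Residual-like inputs in [-255, 255] (an assumption of this sketch). */
    for (int i = 0; i < kCoeffs; ++i) in[i] = (int16_t)(rand() % 511 - 255);
    ref(in, out_ref, kSize);
    opt(in, out_opt, kSize);
    if (memcmp(out_ref, out_opt, sizeof(out_ref)) != 0) ++mismatches;
  }
  return mismatches;
}
```

A bit-exact pair yields zero mismatches for any seed and trial count; a single differing coefficient in any trial counts as a failure.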

Module-level scaling relative to the C function (timer-based)
for the existing SSE2 and the new AVX2 intrinsics:

   Scaling
SSE2      AVX2
3.88x     5.95x
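
For readers unfamiliar with the metric, a rough sketch of how such a timer-based scaling figure can be derived is shown below. It assumes scaling means reference time divided by optimized time; the helper names (`time_calls`, `scaling`, `relative_gain`) are hypothetical and the use of `clock()` is an assumption, not the measurement method used for the numbers above.

```c
#include <assert.h>
#include <time.h>

typedef void (*bench_fn)(void);

/* Elapsed CPU seconds for `iters` back-to-back calls of fn. */
double time_calls(bench_fn fn, long iters) {
  assert(iters > 0);
  const clock_t start = clock();
  for (long i = 0; i < iters; ++i) fn();
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Scaling of an optimized path relative to a reference: t_ref / t_opt. */
double scaling(double t_ref, double t_opt) {
  assert(t_opt > 0.0);
  return t_ref / t_opt;
}

/* Relative gain between two paths measured against the same baseline. */
double relative_gain(double scaling_new, double scaling_old) {
  assert(scaling_old > 0.0);
  return scaling_new / scaling_old;
}
```

Because both figures are measured against the same C baseline, `relative_gain(5.95, 3.88)` gives roughly 1.53, i.e. the AVX2 path is about 1.53x faster than the SSE2 path under these measurements.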

Change-Id: I02299c3746fcb52d808e2a75d30aa62652c816dc
test/dct16x16_test.cc
vpx_dsp/vpx_dsp_rtcd_defs.pl
vpx_dsp/x86/fwd_txfm_avx2.c