Add AVX2 intrinsic for idct16x16 and idct32x32 functions
author Anupam Pandey <anupam.pandey@ittiam.com>
Tue, 18 Apr 2023 09:16:56 +0000 (14:46 +0530)
committer Anupam Pandey <anupam.pandey@ittiam.com>
Fri, 5 May 2023 10:25:16 +0000 (15:55 +0530)
commit 255ee1888589aa15ae909b992fe123c0358b1730
tree d46b2799a29b05c325497d01d2b44b33d456ff1d
parent 24802201acd7dfa15928bcc47c1e270e7db5afac

Added AVX2 intrinsic optimizations for the following functions:
1. vpx_idct16x16_256_add
2. vpx_idct32x32_1024_add
3. vpx_idct32x32_135_add
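
As a rough illustration of the kind of step such an optimization vectorizes, the sketch below shows one fixed-point butterfly on 16 coefficients at a time. It is not the code from inv_txfm_avx2.c. All `sketch_*` names are hypothetical, and the constants (14 fractional bits, 11585 as the scaled value of cos(pi/4)) are assumptions about the fixed-point convention. The AVX2 path is compiled only when the compiler defines `__AVX2__`; otherwise the function falls back to the scalar loop.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

#define SKETCH_CONST_BITS 14     /* assumed fixed-point precision */
#define SKETCH_COSPI_16_64 11585 /* assumed round(2^14 * cos(pi/4)) */

/* Round to nearest, then shift back down to integer scale. */
static int16_t sketch_round_shift(int32_t x) {
  return (int16_t)((x + (1 << (SKETCH_CONST_BITS - 1))) >> SKETCH_CONST_BITS);
}

/* Scalar reference butterfly: sum and difference scaled by the constant. */
static void sketch_butterfly_c(const int16_t *a, const int16_t *b,
                               int16_t *sum, int16_t *diff, int n) {
  for (int i = 0; i < n; ++i) {
    sum[i] = sketch_round_shift((a[i] + b[i]) * SKETCH_COSPI_16_64);
    diff[i] = sketch_round_shift((a[i] - b[i]) * SKETCH_COSPI_16_64);
  }
}

/* Same butterfly on exactly 16 lanes; uses AVX2 when it is available. */
static void sketch_butterfly_16(const int16_t *a, const int16_t *b,
                                int16_t *sum, int16_t *diff) {
  assert(a && b && sum && diff);
#if defined(__AVX2__)
  const __m256i k_c = _mm256_set1_epi16(SKETCH_COSPI_16_64);
  const __m256i k_neg = _mm256_set1_epi16(-SKETCH_COSPI_16_64);
  const __m256i k_pm = _mm256_unpacklo_epi16(k_c, k_neg); /* (c, -c) pairs */
  const __m256i rnd = _mm256_set1_epi32(1 << (SKETCH_CONST_BITS - 1));
  const __m256i va = _mm256_loadu_si256((const __m256i *)a);
  const __m256i vb = _mm256_loadu_si256((const __m256i *)b);
  /* Interleave so every 32-bit element holds one (a[i], b[i]) pair. */
  const __m256i lo = _mm256_unpacklo_epi16(va, vb);
  const __m256i hi = _mm256_unpackhi_epi16(va, vb);
  /* madd yields a*c + b*c with k_c and a*c - b*c with k_pm, in 32 bits. */
  __m256i s_lo = _mm256_madd_epi16(lo, k_c);
  __m256i s_hi = _mm256_madd_epi16(hi, k_c);
  __m256i d_lo = _mm256_madd_epi16(lo, k_pm);
  __m256i d_hi = _mm256_madd_epi16(hi, k_pm);
  s_lo = _mm256_srai_epi32(_mm256_add_epi32(s_lo, rnd), SKETCH_CONST_BITS);
  s_hi = _mm256_srai_epi32(_mm256_add_epi32(s_hi, rnd), SKETCH_CONST_BITS);
  d_lo = _mm256_srai_epi32(_mm256_add_epi32(d_lo, rnd), SKETCH_CONST_BITS);
  d_hi = _mm256_srai_epi32(_mm256_add_epi32(d_hi, rnd), SKETCH_CONST_BITS);
  /* Pack back to 16 bits; per-lane packing restores the original order. */
  _mm256_storeu_si256((__m256i *)sum, _mm256_packs_epi32(s_lo, s_hi));
  _mm256_storeu_si256((__m256i *)diff, _mm256_packs_epi32(d_lo, d_hi));
#else
  sketch_butterfly_c(a, b, sum, diff, 16);
#endif
}
```

The pack step saturates while the scalar path truncates, so the two agree only while results fit in 16 bits; a real implementation has to match the reference behaviour in that range as well.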

Module-level speedup relative to the C functions (timer based) for the
existing SSE2 and the new AVX2 intrinsics:

                          Scaling vs. C
Function Name             SSE2      AVX2
vpx_idct32x32_1024_add    3.62x     7.49x
vpx_idct32x32_135_add     4.85x     9.41x
vpx_idct16x16_256_add     4.82x     7.70x
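
The commit does not say how the timing was taken. As one possible reading of "timer based" scaling, the sketch below times repeated calls to a reference and an optimized routine and reports the ratio. The `sketch_*` names and the two dummy workloads are hypothetical stand-ins, and `clock()` is only an assumed timer.

```c
#include <assert.h>
#include <time.h>

/* Volatile sink so the compiler cannot remove the dummy work. */
static volatile unsigned sketch_sink;

/* Hypothetical stand-ins for a C function and its optimized version. */
static void sketch_work_slow(void) {
  for (unsigned i = 0; i < 4000; ++i) sketch_sink += i;
}
static void sketch_work_fast(void) {
  for (unsigned i = 0; i < 1000; ++i) sketch_sink += i;
}

/* CPU time in seconds spent on `iters` back-to-back calls of fn. */
static double sketch_time_calls(void (*fn)(void), int iters) {
  assert(iters > 0);
  const clock_t start = clock();
  for (int i = 0; i < iters; ++i) fn();
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Scaling as used in the table: reference time over optimized time. */
static double sketch_scaling(double t_ref, double t_opt) {
  return t_ref / t_opt;
}
```

A caller would time both routines with the same iteration count, for example `sketch_scaling(sketch_time_calls(sketch_work_slow, n), sketch_time_calls(sketch_work_fast, n))`, choosing `n` large enough that the optimized time is well above the timer resolution.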

This is a bit-exact change.
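
"Bit-exact" here means the optimized functions produce byte-identical output to the C reference. A minimal sketch of such a check follows, assuming a simplified `(input, dest, stride)` signature with 16-bit coefficients. The two toy "add and clip" routines and every `sketch_*` name are hypothetical; this is not the code in test/dct16x16_test.cc or test/dct32x32_test.cc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_N = 16, SKETCH_AREA = SKETCH_N * SKETCH_N };

/* Hypothetical stand-in for an inverse-transform "add" signature. */
typedef void (*sketch_add_fn)(const int16_t *input, uint8_t *dest, int stride);

/* Reference: add the residual to the prediction and clip to 8 bits. */
static void sketch_add_ref(const int16_t *input, uint8_t *dest, int stride) {
  for (int r = 0; r < SKETCH_N; ++r) {
    for (int c = 0; c < SKETCH_N; ++c) {
      int v = dest[r * stride + c] + input[r * SKETCH_N + c];
      if (v < 0) v = 0;
      if (v > 255) v = 255;
      dest[r * stride + c] = (uint8_t)v;
    }
  }
}

/* "Optimized" variant: same arithmetic, different loop structure. */
static void sketch_add_opt(const int16_t *input, uint8_t *dest, int stride) {
  for (int i = 0; i < SKETCH_AREA; ++i) {
    uint8_t *p = dest + (i / SKETCH_N) * stride + (i % SKETCH_N);
    const int v = *p + input[i];
    *p = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}

/* Returns 1 if both functions give byte-identical output on every trial. */
static int sketch_bit_exact(sketch_add_fn ref, sketch_add_fn opt, int trials) {
  uint32_t state = 1u; /* deterministic LCG so failures are reproducible */
  int16_t input[SKETCH_AREA];
  uint8_t base[SKETCH_AREA], out_ref[SKETCH_AREA], out_opt[SKETCH_AREA];
  assert(trials > 0);
  for (int t = 0; t < trials; ++t) {
    for (int i = 0; i < SKETCH_AREA; ++i) {
      state = state * 1664525u + 1013904223u;
      input[i] = (int16_t)((int)((state >> 16) & 1023) - 512);
      base[i] = (uint8_t)(state >> 24);
    }
    memcpy(out_ref, base, sizeof(base));
    memcpy(out_opt, base, sizeof(base));
    ref(input, out_ref, SKETCH_N);
    opt(input, out_opt, SKETCH_N);
    if (memcmp(out_ref, out_opt, sizeof(base)) != 0) return 0;
  }
  return 1;
}
```

Comparing whole buffers with `memcmp` rather than a tolerance is the point: any single differing byte counts as a failure.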

Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2

test/dct16x16_test.cc
test/dct32x32_test.cc
vp9/common/vp9_idct.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vpx_dsp/vpx_dsp.mk
vpx_dsp/vpx_dsp_rtcd_defs.pl
vpx_dsp/x86/inv_txfm_avx2.c [new file with mode: 0644]