Add AVX2 intrinsics for idct16x16 and idct32x32 functions
Add AVX2 intrinsic optimizations for the following functions:
1. vpx_idct16x16_256_add
2. vpx_idct32x32_1024_add
3. vpx_idct32x32_135_add
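
As we understand the libvpx naming convention, the numeric suffix is the largest end-of-block position (eob) a variant handles. A decoder can therefore skip work on blocks whose nonzero coefficients sit in a low-frequency corner. The sketch below illustrates that selection for 32x32 blocks. The helper name and the exact thresholds (1, 34, 135, 1024) are assumptions for illustration, not code from this change.

```c
#include <assert.h>

/* Hypothetical selector: returns the numeric suffix of the 32x32 inverse
 * transform variant to use for a block, given its end-of-block position
 * (one past the last nonzero coefficient in scan order). The thresholds
 * are an assumption based on the function names. */
int idct32x32_variant_for_eob(int eob) {
  assert(eob >= 1 && eob <= 1024); /* eob == 0 means nothing to add */
  if (eob == 1) return 1;          /* DC only: vpx_idct32x32_1_add */
  if (eob <= 34) return 34;        /* upper-left 8x8: vpx_idct32x32_34_add */
  if (eob <= 135) return 135;      /* upper-left 16x16: vpx_idct32x32_135_add */
  return 1024;                     /* full block: vpx_idct32x32_1024_add */
}
```

For example, an eob of 100 would route to the 135 variant, which only has to process the upper-left 16x16 coefficients.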
Module-level scaling relative to the C functions (timer based) for the
existing SSE2 and the new AVX2 intrinsics:
                          Scaling (vs. C)
Function Name             SSE2     AVX2
vpx_idct32x32_1024_add    3.62x    7.49x
vpx_idct32x32_135_add     4.85x    9.41x
vpx_idct16x16_256_add     4.82x    7.70x
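
Because both columns are measured against the C baseline, their ratio gives the speedup of AVX2 over SSE2 directly: (t_C / t_AVX2) / (t_C / t_SSE2) = t_SSE2 / t_AVX2. This is a minimal sketch that assumes both figures come from comparable timer runs against the same C reference. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: converts two "scaling vs. C" figures into the
 * speedup of the AVX2 path relative to the SSE2 path. Assumes both
 * figures were measured against the same C reference timing. */
double avx2_over_sse2(double sse2_scaling, double avx2_scaling) {
  assert(sse2_scaling > 0.0 && avx2_scaling > 0.0);
  /* (t_c / t_avx2) / (t_c / t_sse2) == t_sse2 / t_avx2 */
  return avx2_scaling / sse2_scaling;
}
```

Applied to the table, this gives roughly 2.07x for vpx_idct32x32_1024_add, 1.94x for vpx_idct32x32_135_add and 1.60x for vpx_idct16x16_256_add.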
This is a bit-exact change.
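
Bit-exact means the AVX2 output matches the C reference byte for byte on every input. The final "add" stage is easy to get subtly wrong. In that stage the transform output is rounded, added to the prediction and clamped to 8 bits. The sketch below uses a scalar model loosely patterned on 16-bit SIMD lanes (saturating add, arithmetic shift, unsigned-saturating pack) and checks that stage exhaustively against a plain C version. Several details are assumptions for illustration: the rounding shift of 6 for 16x16 and 32x32 blocks, and all function names. The real functions operate on whole blocks, not single pixels.

```c
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Reference model (hypothetical name): round the residual by 2^6, add it
 * to the prediction in full int precision, then clamp. Relies on >> being
 * an arithmetic shift for negative ints, as on GCC/Clang. */
uint8_t recon_ref(uint8_t pred, int16_t res) {
  return clip_pixel(pred + ((res + 32) >> 6));
}

/* Signed 16-bit saturating add, modelling one lane of a SIMD adds_epi16. */
static int16_t sat_add16(int a, int b) {
  int s = a + b;
  return (int16_t)(s > INT16_MAX ? INT16_MAX : (s < INT16_MIN ? INT16_MIN : s));
}

/* SIMD-lane model (hypothetical name): saturating rounding add, arithmetic
 * shift, 16-bit add (cannot overflow here), then unsigned-saturating pack. */
uint8_t recon_simd_model(uint8_t pred, int16_t res) {
  int16_t rounded = (int16_t)(sat_add16(res, 32) >> 6);
  int16_t sum = (int16_t)(pred + rounded); /* range [-512, 766] */
  return clip_pixel(sum);                  /* like packus_epi16 */
}

/* Exhaustively compare both models over every 8-bit prediction and every
 * 16-bit residual. Returns 1 if they agree on all inputs, 0 otherwise. */
int recon_is_bit_exact(void) {
  for (int pred = 0; pred <= 255; ++pred)
    for (int res = INT16_MIN; res <= INT16_MAX; ++res)
      if (recon_ref((uint8_t)pred, (int16_t)res) !=
          recon_simd_model((uint8_t)pred, (int16_t)res))
        return 0;
  return 1;
}
```

The two models disagree before clamping whenever the 16-bit saturating add clips (residuals above 32735), but the final clamp to 255 hides the difference. An exhaustive or randomized comparison against the C function is meant to confirm exactly this kind of case.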
Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2