Implement highbd_d63_predictor using Neon

Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16
and 32x32 block sizes. Also add corresponding test cases.
This re-lands commit
7cdf139e3d6237386e0f93bdb0bdc1b459c663bf,
previously reverted in
7478b7e4e481562a4a13f233acb66a60462e1934.
Compared to the previous attempt, this version correctly matches the
behaviour of the C code when handling the final element loaded from
the 'above' input array. In particular:
- The C code for a 4x4 block performs a full average of the last element
  rather than duplicating the final element from the input 'above'
  array.
- The C code for other block sizes performs a full average for
  stride=0 and stride=1, and otherwise shifts in duplicates of the final
  element from the input 'above' array. Notably, this shifting for later
  strides _replaces_ the final element on which we previously performed
  an average (see {d0,d1}_ext in the code).
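The second bullet can be illustrated with a minimal scalar sketch. It is reconstructed from the description above and is not the libvpx source. The function name d63_sketch is hypothetical, the AVG2/AVG3 definitions are assumed to be the usual rounding averages, and "stride=0 and stride=1" is read here as rows 0 and 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed rounding-average definitions; the message names AVG2/AVG3
 * but does not define them. */
#define AVG2(a, b) (((a) + (b) + 1) >> 1)
#define AVG3(a, b, c) (((a) + 2 * (b) + (c) + 2) >> 2)

/* Hypothetical scalar model of the behaviour described for block
 * sizes larger than 4x4. Reads above[0 .. bs + 1]. */
static void d63_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                       const uint16_t *above) {
  int r, c;

  /* Rows 0 and 1: full averages in every column, so the last column
   * reads above[bs] and above[bs + 1]. */
  for (c = 0; c < bs; ++c) {
    dst[c] = AVG2(above[c], above[c + 1]);
    dst[stride + c] = AVG3(above[c], above[c + 1], above[c + 2]);
  }

  /* Later rows: take the row of matching parity shifted left by r / 2.
   * The averaged last column of rows 0 and 1 is never carried over;
   * the tail is filled with copies of above[bs - 1] instead. */
  for (r = 2; r < bs; ++r) {
    const uint16_t *src = dst + (r & 1) * stride;
    const int shift = r >> 1;
    for (c = 0; c < bs; ++c) {
      dst[r * stride + c] =
          (c + shift < bs - 1) ? src[c + shift] : above[bs - 1];
    }
  }
}
```

With an 'above' row whose entries beyond bs - 1 differ from above[bs - 1], the last column of rows 0 and 1 holds averaged values, while the last column of every later row holds above[bs - 1] itself.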
It is worth noting that this difference is not caught by the existing
VP9HighbdIntraPredTest test cases since the test vector initialisation
contains this loop:

    for (int x = block_size; x < 2 * block_size; x++) {
      above_row_[x] = above_row_[block_size - 1];
    }

Since AVG2(a, a) and AVG3(a, a, a) are simply 'a', such differences in
behaviour for the final element are not observed.
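The identity above can be checked with a small sketch. The AVG2/AVG3 definitions are assumed to be the usual rounding averages, and the helper name last_column_is_indistinguishable is hypothetical, introduced only for illustration.

```c
#include <stdint.h>

/* Assumed rounding-average definitions. */
#define AVG2(a, b) (((a) + (b) + 1) >> 1)
#define AVG3(a, b, c) (((a) + 2 * (b) + (c) + 2) >> 2)

/* Returns 1 if fully averaging the last column yields the same value
 * as duplicating above[bs - 1], i.e. a test using this 'above' row
 * cannot tell the two behaviours apart. Reads above[bs - 1 .. bs + 1]. */
static int last_column_is_indistinguishable(const uint16_t *above, int bs) {
  const uint16_t last = above[bs - 1];
  return AVG2(above[bs - 1], above[bs]) == last &&
         AVG3(above[bs - 1], above[bs], above[bs + 1]) == last;
}
```

For a row padded as in the loop above, such as {1, 2, 3, 4, 4, 4, 4, 4} with bs = 4, the helper returns 1. For a row with distinct trailing values, such as {1, 2, 3, 4, 100, 200, 300, 400}, it returns 0, so only inputs of the second kind would expose a mismatch in the final element.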
Tested on AArch64 with:
- ./test_libvpx --gtest_filter="*VP9HighbdIntraPredTest*"
- ./test_libvpx --gtest_filter="*VP9/TestVectorTest.MD5Match*"
- ./test_libvpx --gtest_filter="*VP9/ExternalFrameBufferMD5Test*"
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 2.43
Neoverse N1 | LLVM 15 | 8x8 | 3.92
Neoverse N1 | LLVM 15 | 16x16 | 3.19
Neoverse N1 | LLVM 15 | 32x32 | 4.13
Neoverse N1 | GCC 12 | 4x4 | 2.92
Neoverse N1 | GCC 12 | 8x8 | 6.51
Neoverse N1 | GCC 12 | 16x16 | 4.55
Neoverse N1 | GCC 12 | 32x32 | 3.18
Neoverse V1 | LLVM 15 | 4x4 | 1.99
Neoverse V1 | LLVM 15 | 8x8 | 3.65
Neoverse V1 | LLVM 15 | 16x16 | 3.72
Neoverse V1 | LLVM 15 | 32x32 | 3.26
Neoverse V1 | GCC 12 | 4x4 | 2.39
Neoverse V1 | GCC 12 | 8x8 | 4.76
Neoverse V1 | GCC 12 | 16x16 | 3.24
Neoverse V1 | GCC 12 | 32x32 | 2.44
Change-Id: Iefaa774d6a20388b523eaa7f5df6bc5f5cf249e4