Optimize Neon implementation of high bitdepth SAD functions
author Salome Thirot <salome.thirot@arm.com>
Wed, 1 Feb 2023 16:37:24 +0000 (16:37 +0000)
committer Salome Thirot <salome.thirot@arm.com>
Mon, 6 Feb 2023 15:51:43 +0000 (15:51 +0000)
commit e3028ddbb408381601ab8d2c67be37124a9726e5
tree 13533af79d25d09c914030ba41f1ed26dcd78bba
parent 858a8c611f4c965078485860a6820e2135e6611b

Optimizations take a similar form to those implemented for standard
bitdepth SAD:

- Use UABD and UADALP instead of UABAL and UABAL2, which doubles the
  throughput on modern out-of-order Arm-designed cores.
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
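
The two points above can be illustrated with a minimal sketch for a
16-sample-wide block. This is not the code from this change: the function
names highbd_sad_ref and highbd_sad16xh_sketch are hypothetical, the
pointers are assumed to be plain uint16_t buffers with strides counted in
samples, and a scalar fallback is used when AArch64 Neon is unavailable.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: sum of absolute differences over a w x h block of
 * 16-bit samples. Strides are in samples, not bytes. */
static uint32_t highbd_sad_ref(const uint16_t *src, int src_stride,
                               const uint16_t *ref, int ref_stride,
                               int w, int h) {
  uint32_t sad = 0;
  for (int i = 0; i < h; i++) {
    for (int j = 0; j < w; j++) {
      sad += (src[j] > ref[j]) ? (uint32_t)(src[j] - ref[j])
                               : (uint32_t)(ref[j] - src[j]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Hypothetical sketch of a 16-wide high bitdepth SAD kernel. */
static uint32_t highbd_sad16xh_sketch(const uint16_t *src, int src_stride,
                                      const uint16_t *ref, int ref_stride,
                                      int h) {
  assert(h > 0);
#if defined(__aarch64__) && defined(__ARM_NEON)
  /* Two independent accumulators, so consecutive UADALP instructions do
   * not all depend on the same register. */
  uint32x4_t sum[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
  for (int i = 0; i < h; i++) {
    uint16x8_t s0 = vld1q_u16(src);
    uint16x8_t r0 = vld1q_u16(ref);
    uint16x8_t d0 = vabdq_u16(s0, r0);  /* UABD: |s - r| per 16-bit lane */
    sum[0] = vpadalq_u16(sum[0], d0);   /* UADALP: pairwise widen + add */

    uint16x8_t s1 = vld1q_u16(src + 8);
    uint16x8_t r1 = vld1q_u16(ref + 8);
    uint16x8_t d1 = vabdq_u16(s1, r1);
    sum[1] = vpadalq_u16(sum[1], d1);

    src += src_stride;
    ref += ref_stride;
  }
  /* Combine the accumulators and reduce across lanes. */
  return vaddvq_u32(vaddq_u32(sum[0], sum[1]));
#else
  /* Fallback so the sketch also builds on targets without AArch64 Neon. */
  return highbd_sad_ref(src, src_stride, ref, ref_stride, 16, h);
#endif
}
```

Wider blocks would presumably follow the same pattern with more
accumulators per row, which is where CPUs with four Neon pipes stand to
benefit most.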

Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230
vpx_dsp/arm/highbd_sad_neon.c