Optimize vpx_sum_squares_2d_i16_neon
author     Jonathan Wright <jonathan.wright@arm.com>
           Mon, 6 Mar 2023 17:52:13 +0000 (17:52 +0000)
committer  Jonathan Wright <jonathan.wright@arm.com>
           Mon, 6 Mar 2023 18:34:23 +0000 (18:34 +0000)
commit     6b783c6975a5fc2ee21579cc3c48e59184bf3295
tree       e8bb11dec2636045de4984b079b3108e2e2d3afc
parent     5fae248f2a8af49bc82590ec1d397c8535859b0e

Add an additional 32-bit vector accumulator so that independent
multiply-accumulate operations can issue in parallel on CPUs that have
more than one Neon multiply-accumulate pipeline. Also use the
horizontal-add helpers from sum_neon.h for the final reduction.
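
The two-accumulator idea can be illustrated with a minimal C sketch. It is not the code from this commit: the functions sum_squares_2acc and sum_squares_ref are hypothetical, and the sketch assumes the block width is a multiple of 8 (at most 64) and that every input satisfies |x| < 2^14, so no 32-bit lane can overflow before it is widened. On Arm targets it uses Neon intrinsics with two independent int32x4_t accumulators. Elsewhere, a scalar path models the same lane layout so the result can be checked against a plain reference.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical reference: plain scalar sum of squares of a w x h block. */
static uint64_t sum_squares_ref(const int16_t *src, int stride, int w, int h) {
  uint64_t total = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      int64_t v = src[y * stride + x];
      total += (uint64_t)(v * v);
    }
  }
  return total;
}

/* Hypothetical two-accumulator version (not the libvpx function).
 * Assumptions: w is a multiple of 8, w <= 64 and |src[i]| < 2^14, so each
 * 32-bit lane holds at most 8 squares (< 2^31) before it is widened. */
static uint64_t sum_squares_2acc(const int16_t *src, int stride, int w, int h) {
  assert(w % 8 == 0 && w <= 64);
#if defined(__ARM_NEON)
  uint64x2_t total = vdupq_n_u64(0);
  for (int y = 0; y < h; ++y) {
    /* Two independent accumulators: each vmlal_s16 depends only on its own
     * accumulator, so two MAC pipelines can work in parallel. */
    int32x4_t acc0 = vdupq_n_s32(0);
    int32x4_t acc1 = vdupq_n_s32(0);
    for (int x = 0; x < w; x += 8) {
      const int16x8_t s = vld1q_s16(src + x);
      acc0 = vmlal_s16(acc0, vget_low_s16(s), vget_low_s16(s));
      acc1 = vmlal_s16(acc1, vget_high_s16(s), vget_high_s16(s));
    }
    /* Squares are non-negative: reinterpret as unsigned and widen each
     * accumulator pairwise to 64 bits, one accumulator at a time. */
    total = vaddq_u64(total, vpaddlq_u32(vreinterpretq_u32_s32(acc0)));
    total = vaddq_u64(total, vpaddlq_u32(vreinterpretq_u32_s32(acc1)));
    src += stride;
  }
  /* Horizontal add of the two 64-bit lanes. */
  return vgetq_lane_u64(total, 0) + vgetq_lane_u64(total, 1);
#else
  /* Portable model of the same lane layout: acc0 mirrors the low four
   * int16 values of each 8-wide chunk, acc1 the high four. */
  uint64_t total = 0;
  for (int y = 0; y < h; ++y) {
    uint32_t acc0[4] = {0, 0, 0, 0};
    uint32_t acc1[4] = {0, 0, 0, 0};
    for (int x = 0; x < w; x += 8) {
      for (int lane = 0; lane < 4; ++lane) {
        const int32_t lo = src[x + lane];
        const int32_t hi = src[x + 4 + lane];
        acc0[lane] += (uint32_t)(lo * lo);
        acc1[lane] += (uint32_t)(hi * hi);
      }
    }
    /* Widen every lane to 64 bits and reduce. */
    for (int lane = 0; lane < 4; ++lane) {
      total += (uint64_t)acc0[lane] + acc1[lane];
    }
    src += stride;
  }
  return total;
#endif
}
```

With a single accumulator, every vmlal_s16 must wait for the previous one to complete. Splitting each 8-wide load into low and high halves that feed acc0 and acc1 gives two independent dependency chains. Each accumulator is widened to 64 bits separately, which keeps every 32-bit lane below 2^31 under the stated bounds.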

Change-Id: Ibcb48a738f5dee1430c3ebcd305b5ea8ea344c40
vpx_dsp/arm/sum_neon.h
vpx_dsp/arm/sum_squares_neon.c