Optimize Neon implementation of vpx_int_pro_row
author Jonathan Wright <jonathan.wright@arm.com>
Tue, 30 May 2023 16:31:18 +0000 (17:31 +0100)
committer Jonathan Wright <jonathan.wright@arm.com>
Wed, 31 May 2023 13:34:43 +0000 (14:34 +0100)
commit c36aa2e9c4a610dd7f5467126c894ac4dcbded02
tree 8f3ebabb7de5d4a1eb3af856801ff1b49bb44b94
parent c738e87f27ef8e12dd28b9052f446a5f69abf3c9

Double the number of accumulator registers to remove the bottleneck.
Also peel the first loop iteration.
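
For illustration only, the two techniques named above can be sketched in
portable scalar C. This is not the code from vpx_dsp/arm/avg_neon.c, which
uses Neon intrinsics; the function names, the COLS constant, and the
even-height precondition are assumptions made for this sketch, and the
normalization step of the real function is omitted. The sketch assumes the
bottleneck is the serial dependency of each add on a single set of
accumulators.

```c
#include <assert.h>
#include <stdint.h>

#define COLS 16 /* hypothetical: number of columns summed per call */

/* Baseline: one accumulator per column, so each row's add has to wait
 * for the previous row's add on the same accumulator. */
void col_sums_single(const uint8_t *ref, int stride, int height,
                     uint16_t out[COLS]) {
  for (int c = 0; c < COLS; ++c) out[c] = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < COLS; ++c) out[c] += ref[r * stride + c];
  }
}

/* Two independent accumulator sets, with the first iteration peeled. */
void col_sums_dual_peeled(const uint8_t *ref, int stride, int height,
                          uint16_t out[COLS]) {
  uint16_t acc0[COLS], acc1[COLS];

  /* Assumption of this sketch: at least two rows, and an even row count. */
  assert(height >= 2 && height % 2 == 0);

  /* Peeled first iteration: seed the accumulators directly from rows 0
   * and 1 instead of zeroing them and then adding. */
  for (int c = 0; c < COLS; ++c) {
    acc0[c] = ref[c];
    acc1[c] = ref[stride + c];
  }

  /* Remaining rows, two per iteration; the adds into acc0 and acc1 do
   * not depend on each other, so they can overlap in the pipeline. */
  for (int r = 2; r < height; r += 2) {
    const uint8_t *row0 = ref + r * stride;
    const uint8_t *row1 = row0 + stride;
    for (int c = 0; c < COLS; ++c) {
      acc0[c] += row0[c];
      acc1[c] += row1[c];
    }
  }

  /* Combine the two partial sums once, after the loop. */
  for (int c = 0; c < COLS; ++c) out[c] = (uint16_t)(acc0[c] + acc1[c]);
}
```

Both functions produce the same column sums. The second trades a few extra
registers and one final combining add for shorter dependency chains, and
peeling removes the need to zero the accumulators before the loop.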

Change-Id: I6a90680369f9c33cdfe14ea547ac1569ec3f50de
vpx_dsp/arm/avg_neon.c