vp9: neon: optimise convolve8_horiz functions
author Mans Rullgard <mans@mansr.com>
Sun, 11 Aug 2013 14:34:24 +0000 (15:34 +0100)
committer Mans Rullgard <mans@mansr.com>
Sun, 11 Aug 2013 15:21:55 +0000 (16:21 +0100)
commit b84dc949c8a4a520ef1e9121d72a6250fb8f8e47
tree 69327d7aa4728e8d3505d886b967aef737fcbe25
parent e7c5ca8983b3b59f3f68208038c817798f2be7d5

Each iteration of the horizontal loop reuses 7 of the 11 source
values.  Loading only the 4 new values saves some time.
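
The reuse described above can be illustrated with a scalar C sketch. This is
not the NEON code from this commit; it is a minimal model assuming an 8-tap
filter that produces 4 output pixels per iteration, so each iteration needs
4 + 8 - 1 = 11 source values, 7 of which were already loaded by the previous
iteration. The function names, the 7-bit rounding (add 64, shift right by 7)
and the requirement that the width be a multiple of 4 are assumptions made
for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAPS 8   /* filter length */
#define STEP 4   /* output pixels produced per loop iteration (assumed) */

/* Apply the 8-tap filter to p[0..7], round and clamp to 8 bits.
 * Assumes 7-bit filter precision (taps sum to 128). */
static uint8_t filter_at(const uint8_t *p, const int16_t *filter)
{
    int sum = 0;
    for (int k = 0; k < TAPS; k++)
        sum += p[k] * filter[k];
    sum = (sum + 64) / 128 - ((sum + 64) % 128 < 0); /* floor((sum + 64) / 128) */
    return sum < 0 ? 0 : sum > 255 ? 255 : (uint8_t)sum;
}

/* Straightforward version: every output re-reads all of its inputs. */
static void convolve8_horiz_ref(const uint8_t *src, uint8_t *dst, int w,
                                const int16_t *filter)
{
    for (int x = 0; x < w; x++)
        dst[x] = filter_at(src + x, filter);
}

/* Sliding-window version: keep the 11 source values in a local buffer
 * (standing in for registers) and load only the 4 new ones each time. */
static void convolve8_horiz_window(const uint8_t *src, uint8_t *dst, int w,
                                   const int16_t *filter)
{
    uint8_t win[TAPS + STEP - 1];           /* 11 values */

    assert(w % STEP == 0);
    memcpy(win, src, TAPS - 1);             /* prime with the first 7 */
    src += TAPS - 1;

    for (int x = 0; x < w; x += STEP) {
        memcpy(win + TAPS - 1, src, STEP);  /* load only the 4 new values */
        src += STEP;

        for (int i = 0; i < STEP; i++)
            dst[x + i] = filter_at(win + i, filter);

        memmove(win, win + STEP, TAPS - 1); /* carry 7 values forward */
    }
}
```

Both functions read w + 7 source values in total and should produce identical
output; the second one simply avoids fetching the overlapping 7 values again
on every iteration.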

Also add preload for source data.

Overall 4% faster on Chromebook.

Change-Id: I8f69e749f2b7f79e9734620dcee51dbfcd716b44
vp9/common/arm/neon/vp9_convolve8_avg_neon.asm
vp9/common/arm/neon/vp9_convolve8_neon.asm