review.tizen.org Git - platform/upstream/libvpx.git/commit

author	George Steed <george.steed@arm.com>
	Wed, 22 Mar 2023 08:44:26 +0000 (08:44 +0000)
committer	George Steed <george.steed@arm.com>
	Wed, 29 Mar 2023 08:39:35 +0000 (08:39 +0000)
commit	83def747ff316d283c949458a4b890b23e5e0b8b
tree	6428e3fcc0af77372b7be9f92f9d4e8c0544912d
parent	4cf9819282aa123e8b126731ef5629ee5144cd86

Avoid interleaving loads/stores in Neon for highbd dc predictor

The interleaving load/store instructions (LD2/LD3/LD4 and ST2/ST3/ST4)
are useful when dealing with interleaved data (e.g. real/imag
components of complex numbers), but for loading or storing larger
quantities of contiguous data it is preferable to use two or more of
the normal load/store instructions.
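
As a rough illustration of the difference (this is not the code from the
patch), the sketch below fills one row of 16 high-bitdepth pixels with a
single value in two ways: with one interleaving store (vst2q_u16, which
maps to ST2) and with two normal stores (vst1q_u16). The helper names
store_row16_interleaved and store_row16_plain are hypothetical, and the
scalar fallback exists only so the sketch also builds on non-Arm hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: write 16 pixels using one interleaving store. */
static void store_row16_interleaved(uint16_t *dst, uint16_t dc) {
  assert(dst != NULL);
#if defined(__ARM_NEON)
  uint16x8x2_t v;
  v.val[0] = vdupq_n_u16(dc);
  v.val[1] = vdupq_n_u16(dc);
  /* ST2: lanes of val[0] and val[1] are interleaved as they are stored.
   * The interleaving is wasted work here because both registers are equal. */
  vst2q_u16(dst, v);
#else
  /* Scalar fallback (assumption: only for building on non-Neon targets). */
  for (int i = 0; i < 16; ++i) dst[i] = dc;
#endif
}

/* Hypothetical helper: write the same 16 pixels using two normal stores. */
static void store_row16_plain(uint16_t *dst, uint16_t dc) {
  assert(dst != NULL);
#if defined(__ARM_NEON)
  const uint16x8_t v = vdupq_n_u16(dc);
  vst1q_u16(dst, v);     /* pixels 0..7 */
  vst1q_u16(dst + 8, v); /* pixels 8..15 */
#else
  /* Scalar fallback (assumption: only for building on non-Neon targets). */
  for (int i = 0; i < 16; ++i) dst[i] = dc;
#endif
}
```

Both helpers leave identical bytes in memory when every lane holds the
same value, which is why the interleaving form can be dropped for a DC
fill without changing the output.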

This patch replaces such occurrences in the two larger block sizes:
vpx_highbd_dc_predictor_16x16_neon, vpx_highbd_dc_predictor_32x32_neon,
and related helper functions.
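
A minimal sketch of what a 16x16 high-bitdepth DC predictor looks like
when written with normal loads and stores only. It is an assumption-laden
illustration, not the code in this patch: the function name
dc_predictor_16x16_sketch is hypothetical, its parameter list merely
mirrors the usual predictor shape (dst, stride, above, left, bd), and
the scalar fallback is there only so the sketch builds on non-Arm hosts.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical sketch: dc = round((sum(above[0..15]) + sum(left[0..15])) / 32),
 * then every pixel of the 16x16 block is set to dc. */
static void dc_predictor_16x16_sketch(uint16_t *dst, ptrdiff_t stride,
                                      const uint16_t *above,
                                      const uint16_t *left, int bd) {
  (void)bd; /* bit depth is not needed to compute the average */
#if defined(__ARM_NEON)
  /* Four normal loads instead of interleaving (LD2) loads. */
  const uint16x8_t a0 = vld1q_u16(above);
  const uint16x8_t a1 = vld1q_u16(above + 8);
  const uint16x8_t l0 = vld1q_u16(left);
  const uint16x8_t l1 = vld1q_u16(left + 8);
  /* Each lane sums four samples: at most 4 * 4095 for 12-bit input,
   * so the 16-bit lanes cannot overflow. */
  const uint16x8_t sum16 = vaddq_u16(vaddq_u16(a0, a1), vaddq_u16(l0, l1));
  /* Widening pairwise adds reduce 8 lanes to 2. */
  const uint32x4_t sum32 = vpaddlq_u16(sum16);
  const uint64x2_t sum64 = vpaddlq_u32(sum32);
  const uint32_t sum =
      (uint32_t)(vgetq_lane_u64(sum64, 0) + vgetq_lane_u64(sum64, 1));
  const uint16x8_t dc = vdupq_n_u16((uint16_t)((sum + 16) >> 5));
  for (int r = 0; r < 16; ++r) {
    /* Two normal stores per row instead of one interleaving (ST2) store. */
    vst1q_u16(dst, dc);
    vst1q_u16(dst + 8, dc);
    dst += stride;
  }
#else
  /* Scalar fallback (assumption: only for building on non-Neon targets). */
  uint32_t sum = 0;
  for (int i = 0; i < 16; ++i) sum += (uint32_t)above[i] + left[i];
  const uint16_t dc = (uint16_t)((sum + 16) >> 5);
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) dst[c] = dc;
    dst += stride;
  }
#endif
}
```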

Speedups over the original Neon code (higher is better):

Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 | 16x16 |    1.25
Neoverse N1 |  LLVM 15 | 32x32 |    1.13
Neoverse N1 |   GCC 12 | 16x16 |    1.56
Neoverse N1 |   GCC 12 | 32x32 |    1.52
Neoverse V1 |  LLVM 15 | 16x16 |    1.63
Neoverse V1 |  LLVM 15 | 32x32 |    1.08
Neoverse V1 |   GCC 12 | 16x16 |    1.59
Neoverse V1 |   GCC 12 | 32x32 |    1.37

Change-Id: If5ec220aba9dd19785454eabb0f3d6affec0cc8b