combine loopfilter data access
author Johann <johannkoenig@google.com>
Tue, 27 Sep 2011 00:17:20 +0000 (17:17 -0700)
committer Johann <johannkoenig@google.com>
Fri, 30 Sep 2011 14:38:35 +0000 (07:38 -0700)
commit 3556deaca3e499a42b1696a6a75a73ba3a0671d1
tree 8779ed009f79649947905eff4ffe02ddbc74c93e
parent 6f9457ec12a98b3aceefbcb79783c084268d0b36

The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
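
The overlap can be illustrated with a small scalar sketch. It is not the SSE2 code from this change. It assumes the three inner vertical edges of a 16x16 block sit at x = 4, 8 and 12 and that each edge filter reads 4 pixels on either side, giving a 16-row by 8-column window per edge. All names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { MB_SIZE = 16, EDGE_SPAN = 8 }; /* assumed: 4 pixels each side of an edge */

/* Transpose `cols` columns of a 16-row block, starting at column x0, so that
 * each source column becomes one contiguous 16-byte row in dst. */
static void transpose_cols(const uint8_t *src, int stride, int x0, int cols,
                           uint8_t *dst)
{
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < MB_SIZE; ++r)
            dst[c * MB_SIZE + r] = src[r * stride + x0 + c];
}

/* Returns 1 if one 16x16 transpose already contains every row that the three
 * per-edge 16x8 transposes would produce, 0 otherwise. */
static int grouped_covers_per_edge(const uint8_t *src, int stride)
{
    uint8_t full[MB_SIZE * MB_SIZE];
    transpose_cols(src, stride, 0, MB_SIZE, full);

    for (int edge = 4; edge <= 12; edge += 4) {
        uint8_t window[EDGE_SPAN * MB_SIZE];
        int x0 = edge - EDGE_SPAN / 2; /* first column the edge filter reads */
        transpose_cols(src, stride, x0, EDGE_SPAN, window);
        if (memcmp(window, full + x0 * MB_SIZE, sizeof window) != 0)
            return 0;
    }
    return 1;
}

/* Columns transposed when each edge is handled separately: 3 windows of 8. */
static int columns_transposed_per_edge(void) { return 3 * EDGE_SPAN; }

/* Columns transposed when the edges are grouped: the block's 16 columns. */
static int columns_transposed_grouped(void) { return MB_SIZE; }
```

Under these assumptions the per-edge approach transposes 24 columns where the grouped approach transposes 16, because columns 4 through 11 are each read by two adjacent windows.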

This implementation is x86_64 only. We retain the previous
implementation for x86.
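
A minimal sketch of how such a split could be selected at compile time. It uses the compilers' predefined macros `__x86_64__` (GCC/Clang) and `_M_X64` (MSVC), not the project's own configuration macros, and the path names are hypothetical placeholders, not the functions touched by this change.

```c
/* Hypothetical identifiers for the two code paths. */
enum loopfilter_path {
    LOOPFILTER_PER_EDGE = 0, /* previous implementation, kept for x86 */
    LOOPFILTER_GROUPED = 1   /* combined data access, x86_64 only */
};

/* Pick the path from compiler-predefined architecture macros. */
static enum loopfilter_path selected_loopfilter_path(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return LOOPFILTER_GROUPED;
#else
    return LOOPFILTER_PER_EDGE;
#endif
}
```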

Improvements are obviously material dependent, but they seem to be ~1% in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
vp8/common/x86/loopfilter_block_sse2.asm [new file with mode: 0644]
vp8/common/x86/loopfilter_sse2.asm
vp8/common/x86/loopfilter_x86.c
vp8/vp8_common.mk