Do horizontal loopfiltering in parallel
author Yunqing Wang <yunqingwang@google.com>
Wed, 13 Nov 2013 00:51:15 +0000 (16:51 -0800)
committer Frank Galligan <fgalligan@google.com>
Sat, 16 Nov 2013 00:18:43 +0000 (16:18 -0800)
commit 64f728caef5d9f019222c6989a9c6df17464dd69
tree 5e5994eb65821008e78d7677d5abffbb3908f1fc
parent 60d1a5299576649f6db38714319b5845683ff0ab

This patch follows the "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit and adds an x86 SSE2 optimization that filters
16 pixels in parallel. It also corrects the declaration of aligned
arrays. For the 8-pixels-in-parallel case, it improves the calculation
of the masks and filters. Threshold loading is updated because the
thresholds are already duplicated. The NEON C functions are updated to
call the NEON loopfilters twice.
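
The "call the 8-pixel filter twice" approach used for the NEON C
functions can be sketched roughly as below. This is only an
illustration under stated assumptions: toy_filter_edge_8 and
toy_filter_edge_16 are hypothetical names, the smoothing rule is a toy
stand-in and not the VP9 loop filter, and passing one threshold per
8-pixel half is an assumption, not something this commit message
states.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an 8-pixel-wide horizontal-edge filter.
 * s points at the first row below the edge; pitch is the row stride.
 * Toy rule (NOT the VP9 filter): if the two pixels straddling the edge
 * differ by at most thresh, replace both with their rounded average. */
static void toy_filter_edge_8(uint8_t *s, int pitch, uint8_t thresh) {
  for (int i = 0; i < 8; ++i) {
    uint8_t *p0 = s + i - pitch; /* pixel just above the edge */
    uint8_t *q0 = s + i;         /* pixel just below the edge */
    if (abs(*p0 - *q0) <= thresh) {
      const uint8_t avg = (uint8_t)((*p0 + *q0 + 1) >> 1);
      *p0 = avg;
      *q0 = avg;
    }
  }
}

/* 16-pixel wrapper: run the 8-pixel filter on each half in turn.
 * The per-half thresholds are an assumption made for illustration. */
static void toy_filter_edge_16(uint8_t *s, int pitch,
                               uint8_t thresh0, uint8_t thresh1) {
  toy_filter_edge_8(s, pitch, thresh0);     /* columns 0..7  */
  toy_filter_edge_8(s + 8, pitch, thresh1); /* columns 8..15 */
}
```

A wrapper like this keeps a single 16-pixel entry point for the caller,
while platforms that lack a native 16-wide kernel reuse their existing
8-wide one. An SSE2 version would instead process all 16 columns in one
128-bit register.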

Tests using the tulip clip showed a ~1.5% decoder speed gain.

Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
vp9/common/arm/neon/vp9_loopfilter_16_neon.c [new file with mode: 0644]
vp9/common/vp9_loopfilter.c
vp9/common/vp9_loopfilter_filters.c
vp9/common/vp9_rtcd_defs.sh
vp9/common/x86/vp9_loopfilter_intrin_sse2.c
vp9/vp9_common.mk