Do vertical loopfiltering in parallel
author     Yunqing Wang <yunqingwang@google.com>
           Fri, 22 Nov 2013 00:43:37 +0000 (16:43 -0800)
committer  Yunqing Wang <yunqingwang@google.com>
           Fri, 22 Nov 2013 18:04:51 +0000 (10:04 -0800)
commit     ed36720b66ca71438a8e14a41f05e837d030da61
tree       bd3d5325cfd2029fbe6e033abcb99e579c082b4d
parent     5925ba08a33cafceb0a2d21ca6d30923dc58f372

This patch follows the "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit and adds an x86 SSE2 optimization
that filters 16 pixels in parallel. For the other optimizations
(NEON and DSPr2), the 16-pixel functions currently call the
8-pixel functions twice; true 16-pixel implementations can be
added later.
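The "call the 8-pixel function twice" fallback described above can be
sketched in plain C. This is a minimal illustration under stated
assumptions: the names lpf_vertical_8 and lpf_vertical_16_dual are
hypothetical, and the per-row filter is a toy (average the two pixels
straddling the edge when their difference is within a limit), not the
VP9 loop filter arithmetic. It only shows how a 16-row vertical edge is
split into rows 0..7 and rows 8..15, each with its own parameter.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8-pixel vertical-edge filter (toy, NOT VP9's filter).
 * `s` points at the first pixel right of the edge in the top row.
 * For each of 8 rows, if |s[-1] - s[0]| <= limit, both pixels are
 * replaced by their rounded average. */
static void lpf_vertical_8(uint8_t *s, int pitch, int limit) {
  for (int row = 0; row < 8; ++row) {
    uint8_t *p = s + row * pitch;
    const int p0 = p[-1]; /* last pixel left of the edge */
    const int q0 = p[0];  /* first pixel right of the edge */
    if (abs(p0 - q0) <= limit) {
      const uint8_t avg = (uint8_t)((p0 + q0 + 1) >> 1);
      p[-1] = avg;
      p[0] = avg;
    }
  }
}

/* Hypothetical 16-pixel ("dual") wrapper: filters 16 rows by calling
 * the 8-pixel function twice, once for rows 0..7 (limit0) and once for
 * rows 8..15 (limit1). A SIMD version would instead handle all 16 rows
 * in one pass. */
static void lpf_vertical_16_dual(uint8_t *s, int pitch,
                                 int limit0, int limit1) {
  lpf_vertical_8(s, pitch, limit0);
  lpf_vertical_8(s + 8 * pitch, pitch, limit1);
}
```

With a 16x16 buffer whose left half is 100 and right half is 104,
calling lpf_vertical_16_dual(buf + 8, 16, 10, 2) smooths the edge in
rows 0..7 (both edge pixels become 102) and leaves rows 8..15 untouched,
because their step of 4 exceeds the second limit.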

Decoder speedup:
tulip clip:     2% speed gain;
old_town_cross: 1.2% speed gain;
bus:            2% speed gain.

Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
vp9/common/arm/neon/vp9_loopfilter_16_neon.c
vp9/common/mips/dspr2/vp9_loopfilter_filters_dspr2.c
vp9/common/vp9_loopfilter.c
vp9/common/vp9_loopfilter_filters.c
vp9/common/vp9_rtcd_defs.sh
vp9/common/x86/vp9_loopfilter_intrin_sse2.c