vp9 decoder: row-based multi-threaded loopfilter
author Yunqing Wang <yunqingwang@google.com>
Fri, 27 Dec 2013 23:25:54 +0000 (15:25 -0800)
committer Yunqing Wang <yunqingwang@google.com>
Fri, 31 Jan 2014 22:44:53 +0000 (14:44 -0800)
commit 903801f1ef7ac8d13d4f57571d048b604e8aaafd
tree 23567c0947d8492ea9333ff924ed02e0d505c8bb
parent e78c174e540117dcfcdff505d38478d4ac6df844

Implemented parallel loop filtering that reuses the existing
tile-decoding threads. Each thread filters one superblock row and,
when that row is done, moves on to the next unclaimed row. To
preserve the correct filtering order, the threads are synchronized:
a superblock is filtered only after the superblocks it depends on
have been filtered.
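
The scheme described above can be sketched roughly as follows. This is
a minimal illustration using POSIX threads, not the code from this
commit: LfSketch, lf_worker, run_lf_sketch and the other names are
hypothetical, the "filter" is a stand-in that only marks a superblock
as done, and the dependency is assumed to be the superblock above and
to the right in the previous row. The nsync parameter is a batching
factor; 1 means the threads synchronize on every superblock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_ROWS 64
#define MAX_COLS 64
#define MAX_THREADS 16

typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int rows, cols;     /* frame size in superblocks */
  int nsync;          /* synchronize once every nsync columns */
  int next_row;       /* next row not yet claimed by any thread */
  int done[MAX_ROWS]; /* published count of finished columns per row */
  unsigned char filtered[MAX_ROWS][MAX_COLS];
  int violations;     /* superblocks filtered before their dependency */
} LfSketch;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Block until the row above has published enough finished columns to
 * cover columns c .. c + nsync - 1 of row r (assumed dependency). */
static void wait_for_row_above(LfSketch *s, int r, int c) {
  int need;
  if (r == 0 || c % s->nsync != 0) return;
  need = min_int(c + s->nsync + 1, s->cols);
  pthread_mutex_lock(&s->mutex);
  while (s->done[r - 1] < need) pthread_cond_wait(&s->cond, &s->mutex);
  pthread_mutex_unlock(&s->mutex);
}

/* Publish progress every nsync columns and at the end of the row. */
static void publish_progress(LfSketch *s, int r, int c) {
  if ((c + 1) % s->nsync != 0 && c != s->cols - 1) return;
  pthread_mutex_lock(&s->mutex);
  s->done[r] = c + 1;
  pthread_cond_broadcast(&s->cond);
  pthread_mutex_unlock(&s->mutex);
}

static void *lf_worker(void *arg) {
  LfSketch *s = (LfSketch *)arg;
  int bad = 0;
  for (;;) {
    int r, c;
    pthread_mutex_lock(&s->mutex);
    r = s->next_row++; /* claim the next unattended row */
    pthread_mutex_unlock(&s->mutex);
    if (r >= s->rows) break;
    for (c = 0; c < s->cols; ++c) {
      wait_for_row_above(s, r, c);
      /* Stand-in for the real filter: verify the dependency, mark done. */
      if (r > 0 && !s->filtered[r - 1][min_int(c + 1, s->cols - 1)]) ++bad;
      s->filtered[r][c] = 1;
      publish_progress(s, r, c);
    }
  }
  pthread_mutex_lock(&s->mutex);
  s->violations += bad;
  pthread_mutex_unlock(&s->mutex);
  return NULL;
}

/* Returns the number of ordering violations, or -1 if any superblock
 * was left unfiltered. */
static int run_lf_sketch(int rows, int cols, int nsync, int num_threads) {
  pthread_t threads[MAX_THREADS];
  LfSketch *s;
  int i, r, c, result;
  assert(rows >= 1 && rows <= MAX_ROWS && cols >= 1 && cols <= MAX_COLS);
  assert(nsync >= 1 && num_threads >= 1 && num_threads <= MAX_THREADS);
  s = (LfSketch *)calloc(1, sizeof(*s));
  assert(s != NULL);
  s->rows = rows;
  s->cols = cols;
  s->nsync = nsync;
  pthread_mutex_init(&s->mutex, NULL);
  pthread_cond_init(&s->cond, NULL);
  for (i = 0; i < num_threads; ++i)
    pthread_create(&threads[i], NULL, lf_worker, s);
  for (i = 0; i < num_threads; ++i) pthread_join(threads[i], NULL);
  result = s->violations;
  for (r = 0; r < rows; ++r)
    for (c = 0; c < cols; ++c)
      if (!s->filtered[r][c]) result = -1;
  pthread_cond_destroy(&s->cond);
  pthread_mutex_destroy(&s->mutex);
  free(s);
  return result;
}
```

Because rows are claimed in increasing order and row 0 never waits,
the lowest unfinished row can always make progress, so the sketch
should not deadlock. A call such as run_lf_sketch(8, 16, 1, 4) is
expected to return 0, meaning every superblock was filtered after its
dependency.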

To reduce synchronization overhead and speed up the decoder, we use
nsync > 1 for high-resolution video, so that threads synchronize
once every nsync superblocks instead of after every superblock.
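
To make the saving concrete, the arithmetic can be sketched as below.
The helper names are hypothetical, and the width thresholds in
pick_nsync are illustrative assumptions, not values taken from this
commit. The only fixed fact used is that VP9 superblocks are 64x64
pixels.

```c
#include <assert.h>

/* VP9 superblocks are 64x64 pixels. */
#define SB_SIZE 64

/* Number of superblock columns needed to cover a frame width. */
static int sb_cols_for_width(int width) {
  assert(width > 0);
  return (width + SB_SIZE - 1) / SB_SIZE;
}

/* Hypothetical policy: larger frames synchronize less often.
 * The thresholds are assumptions chosen for illustration only. */
static int pick_nsync(int width) {
  if (width < 640) return 1;
  if (width <= 1280) return 2;
  if (width <= 4096) return 4;
  return 8;
}

/* Synchronization checks a thread performs per superblock row when it
 * checks the row above once every nsync columns. */
static int sync_checks_per_row(int sb_cols, int nsync) {
  assert(nsync >= 1);
  return (sb_cols + nsync - 1) / nsync;
}
```

Under these assumptions a 3840-pixel-wide frame has 60 superblock
columns, so nsync = 4 cuts the checks per row from 60 to 15.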

Performance tests:
1. On desktop:
   8-tile 4k video using 8 threads, speedup: 70% - 80%
   4-tile HD video using 4 threads, speedup: ~35%
2. On mobile device (Nexus 7):
   4-tile 1080p video using 4 threads, speedup: 18% - 25%
   4-tile 1080p video using 2 threads, speedup: 10% - 15%

Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
vp9/common/vp9_loopfilter.c
vp9/common/vp9_loopfilter.h
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_dthread.c [new file with mode: 0644]
vp9/decoder/vp9_dthread.h [new file with mode: 0644]
vp9/decoder/vp9_onyxd_if.c
vp9/decoder/vp9_onyxd_int.h
vp9/decoder/vp9_thread.c
vp9/decoder/vp9_thread.h
vp9/vp9dx.mk