platform/upstream/libvpx.git
6 years agoMerge "vpx postproc: rewrite in intrinsics"
Johann Koenig [Fri, 2 Nov 2018 18:13:14 +0000 (18:13 +0000)]
Merge "vpx postproc: rewrite in intrinsics"

6 years agoMerge "Add highbd Hadamard transform C implementations"
Sai Deng [Fri, 2 Nov 2018 16:46:46 +0000 (16:46 +0000)]
Merge "Add highbd Hadamard transform C implementations"

6 years agoMerge "fix snprintf error on windows"
Johann Koenig [Fri, 2 Nov 2018 16:23:23 +0000 (16:23 +0000)]
Merge "fix snprintf error on windows"

6 years agoMerge "clang-tidy: normalize variance functions"
Johann Koenig [Fri, 2 Nov 2018 14:41:24 +0000 (14:41 +0000)]
Merge "clang-tidy: normalize variance functions"

6 years agofix snprintf error on windows
Johann [Fri, 2 Nov 2018 14:34:12 +0000 (07:34 -0700)]
fix snprintf error on windows

Include vpx_ports/msvc.h to handle snprintf on older
versions of Visual Studio.

Change-Id: I06cd99b32bbae82b3df079d41ff20a9a07f6fe1c
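
For illustration, a minimal sketch of the kind of shim such a header can provide. It assumes that Visual Studio releases before 2015 (_MSC_VER < 1900) only offer _snprintf; the actual contents of vpx_ports/msvc.h are not shown in this log, and format_label is a hypothetical helper.

```c
#include <stdio.h>

/* Assumption: compilers with _MSC_VER < 1900 lack a C99 snprintf, so the
 * name is mapped to the older _snprintf. Unlike snprintf, _snprintf does not
 * guarantee NUL termination when the output is truncated. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* Hypothetical helper: formats "<prefix>-<index>" into buf and always
 * terminates the result. Returns 0 on success, -1 on truncation or error. */
static int format_label(char *buf, size_t size, const char *prefix,
                        int index) {
  int n;
  if (size == 0) return -1;
  n = snprintf(buf, size, "%s-%d", prefix, index);
  buf[size - 1] = '\0'; /* terminate even if the shim truncated silently */
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Terminating the buffer by hand keeps callers safe whether the real snprintf or the mapped _snprintf ends up being used.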

6 years agoAdd highbd Hadamard transform C implementations
sdeng [Tue, 30 Oct 2018 22:35:44 +0000 (15:35 -0700)]
Add highbd Hadamard transform C implementations

Change-Id: Ibec078c80ca1dfe6fbbc4288db89d719dac453a7

6 years agoMerge "vp8 boolcoder: normalize to "bc""
Johann Koenig [Wed, 31 Oct 2018 22:39:08 +0000 (22:39 +0000)]
Merge "vp8 boolcoder: normalize to "bc""

6 years agoMerge "vp8: fix to address overflow in decoder."
Jerome Jiang [Wed, 31 Oct 2018 22:27:44 +0000 (22:27 +0000)]
Merge "vp8: fix to address overflow in decoder."

6 years agoMerge "vp8dx_get_quantizer: normalize VP8D_COMP"
Johann Koenig [Wed, 31 Oct 2018 22:19:43 +0000 (22:19 +0000)]
Merge "vp8dx_get_quantizer: normalize VP8D_COMP"

6 years agoclang-tidy: normalize variance functions
Johann [Tue, 30 Oct 2018 21:43:36 +0000 (14:43 -0700)]
clang-tidy: normalize variance functions

Always use src/ref and _ptr/_stride suffixes.

Normalize to [xy]_offset and second_pred.

Drop some stray source/recon_strides.

BUG=webm:1444

Change-Id: I32362a50988eb84464ab78686348610ea40e5c80

6 years agoMerge "clang-tidy: fix vp9/encoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:43:05 +0000 (21:43 +0000)]
Merge "clang-tidy: fix vp9/encoder parameters"

6 years agoMerge "clang-tidy: fix vp9/decoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:42:16 +0000 (21:42 +0000)]
Merge "clang-tidy: fix vp9/decoder parameters"

6 years agoMerge "clang-tidy: fix vp9/common parameters"
Johann Koenig [Wed, 31 Oct 2018 21:42:07 +0000 (21:42 +0000)]
Merge "clang-tidy: fix vp9/common parameters"

6 years agoMerge "clang-tidy: fix vp8/encoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:41:53 +0000 (21:41 +0000)]
Merge "clang-tidy: fix vp8/encoder parameters"

6 years agovp8 boolcoder: normalize to "bc"
Johann [Wed, 31 Oct 2018 21:24:31 +0000 (14:24 -0700)]
vp8 boolcoder: normalize to "bc"

"bc" maps to BOOL_CODER better than "br"

Change-Id: Idefd03e79ccc1851a1b26f8206a159b0e5c5fb2d

6 years agoMerge "clang-tidy: fix vp8/decoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:14:46 +0000 (21:14 +0000)]
Merge "clang-tidy: fix vp8/decoder parameters"

6 years agovp8dx_get_quantizer: normalize VP8D_COMP
Johann [Wed, 31 Oct 2018 21:13:45 +0000 (14:13 -0700)]
vp8dx_get_quantizer: normalize VP8D_COMP

Use "pbi" like the rest of the functions

Change-Id: I5f3036b8f8361c30353be378d83455b83b82ac9f

6 years agoMerge "Add SSE2 support for hbd 4-tap interpolation filter."
Chi Yo Tsai [Wed, 31 Oct 2018 20:38:52 +0000 (20:38 +0000)]
Merge "Add SSE2 support for hbd 4-tap interpolation filter."

6 years agovp8: fix to address overflow in decoder.
Jerome Jiang [Tue, 7 Aug 2018 18:10:26 +0000 (11:10 -0700)]
vp8: fix to address overflow in decoder.

Can't call internal error from the decoder thread.

Add vpx_internal_error_info to MACROBLOCKD. When a corrupted frame is
detected, the decoder thread returns to its own context and signals
completion of decoding for the current frame.

The main decoding thread will detect the error too and return an error
code to the decoding API call.

Each thread signals the end of decoding of the frame. The main thread
waits for the signal from all other threads before decoding the next frame.

BUG=875626,webm:1496
Change-Id: Icd05fbc558893a4e7d8532c1e7177e7550283a64
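
The control flow described above can be sketched with setjmp/longjmp. This is a single-threaded illustration under stated assumptions, not the libvpx code: WorkerError, Worker, worker_error, decode_row and worker_decode_frame are hypothetical names, and the frame_done flag stands in for the real thread synchronization.

```c
#include <setjmp.h>

/* Hypothetical per-worker error record, kept next to the worker's data. */
typedef struct {
  int error_code; /* 0 means no error */
  int has_jmp;    /* nonzero while jmp is a valid jump target */
  jmp_buf jmp;
} WorkerError;

typedef struct {
  WorkerError err;
  int rows_decoded;
  int frame_done; /* stands in for the completion signal to the main thread */
} Worker;

/* Report an error by jumping back into the worker that owns err. */
static void worker_error(WorkerError *err, int code) {
  err->error_code = code;
  if (err->has_jmp) longjmp(err->jmp, 1);
}

/* Hypothetical row decoder: a negative value marks corrupted data. */
static void decode_row(Worker *w, int row_data) {
  if (row_data < 0) worker_error(&w->err, 1);
  w->rows_decoded++;
}

/* Worker entry point. On error, control returns here (the worker's own
 * context) and completion is still signaled, so a waiting main thread is
 * never left blocked. Returns the error code for the caller to report. */
static int worker_decode_frame(Worker *w, const int *rows, int num_rows) {
  int i;
  w->err.error_code = 0;
  w->err.has_jmp = 0;
  w->rows_decoded = 0;
  w->frame_done = 0;
  if (setjmp(w->err.jmp)) {
    w->err.has_jmp = 0;
    w->frame_done = 1; /* signal end of decoding despite the error */
    return w->err.error_code;
  }
  w->err.has_jmp = 1;
  for (i = 0; i < num_rows; ++i) decode_row(w, rows[i]);
  w->err.has_jmp = 0;
  w->frame_done = 1;
  return 0;
}
```

Keeping the jump buffer per worker means an error never unwinds into another thread's stack, which is the hazard the commit message points at.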

6 years agoMerge "clang-tidy: fix vp8/common parameters"
Johann Koenig [Tue, 30 Oct 2018 22:11:59 +0000 (22:11 +0000)]
Merge "clang-tidy: fix vp8/common parameters"

6 years agoclang-tidy: fix vp9/encoder parameters
Johann [Tue, 30 Oct 2018 19:46:39 +0000 (12:46 -0700)]
clang-tidy: fix vp9/encoder parameters

BUG=webm:1444

Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac

6 years agoclang-tidy: fix vp9/decoder parameters
Johann [Tue, 30 Oct 2018 19:17:22 +0000 (12:17 -0700)]
clang-tidy: fix vp9/decoder parameters

BUG=webm:1444

Change-Id: I9c7c0a4161aaf52436bd5c01d30b035b2ff5508c

6 years agoAdd SSE2 support for hbd 4-tap interpolation filter.
chiyotsai [Mon, 29 Oct 2018 23:12:05 +0000 (16:12 -0700)]
Add SSE2 support for hbd 4-tap interpolation filter.

Unit test performance on bitdepth 10:
    | 4X4 | 8X8 |16X16|64X64|
 2D |1.582|1.461|1.425|1.572|
HORZ|1.643|1.247|1.346|1.345|
VERT|1.378|1.695|2.020|1.763|

Unit test performance on bitdepth 12:

    | 4X4 | 8X8 |16X16|64X64|
 2D |1.578|1.409|1.426|1.497|
HORZ|1.625|1.153|1.323|1.259|
VERT|1.392|1.707|2.030|1.787|

Change-Id: I6df85330ac33fcb17d46e4302b41415dda1219f5

6 years agoclang-tidy: fix vp9/common parameters
Johann [Tue, 30 Oct 2018 18:56:17 +0000 (11:56 -0700)]
clang-tidy: fix vp9/common parameters

BUG=webm:1444

Change-Id: I1a14ad119b3bcbaddcf2291a7521513cf6425635

6 years agoclang-tidy: fix vp8/encoder parameters
Johann [Tue, 30 Oct 2018 18:24:40 +0000 (11:24 -0700)]
clang-tidy: fix vp8/encoder parameters

BUG=webm:1444

Change-Id: I57a305cdab0d62b0745116272fbd5d9257c6e679

6 years agoclang-tidy: fix vp8/decoder parameters
Johann [Tue, 30 Oct 2018 17:55:21 +0000 (10:55 -0700)]
clang-tidy: fix vp8/decoder parameters

BUG=webm:1444

Change-Id: I3dfc56f7f6430d36a1c447d8999733015a001101

6 years agoclang-tidy: fix vp8/common parameters
Johann [Tue, 30 Oct 2018 17:21:58 +0000 (10:21 -0700)]
clang-tidy: fix vp8/common parameters

Match function definitions to declarations

BUG=webm:1444

Change-Id: Ib96d3b735eaf81cece5406c89cc5156bc2cde462

6 years agoMerge "Add AVX2 support for hbd 4-tap interpolation filter."
Chi Yo Tsai [Tue, 30 Oct 2018 16:50:00 +0000 (16:50 +0000)]
Merge "Add AVX2 support for hbd 4-tap interpolation filter."

6 years agoMerge "Properly space qp in q mode for multi-layer ARF"
Jingning Han [Tue, 30 Oct 2018 04:44:09 +0000 (04:44 +0000)]
Merge "Properly space qp in q mode for multi-layer ARF"

6 years agovpx postproc: rewrite in intrinsics
Johann [Thu, 25 Oct 2018 20:37:50 +0000 (13:37 -0700)]
vpx postproc: rewrite in intrinsics

About 10% faster on 64-bit but about 10% slower on 32-bit.

Removes the assembly usage of vpx_rv.

Change-Id: I214698fb5677f615dee0a8f5f5bb8f64daf2565e

6 years agoProperly space qp in q mode for multi-layer ARF
Jingning Han [Mon, 29 Oct 2018 21:08:20 +0000 (14:08 -0700)]
Properly space qp in q mode for multi-layer ARF

Space the quantization parameter distribution according to the
layer depth for multi-layer ARF coding structure. This allows
lower layers to have relatively smaller quantization parameters
than higher layers. It improves the compression performance
in constant q mode for multi-layer ARF system:

        avg PSNR      overall PSNR      SSIM
lowres  -0.33%         -0.31%          -1.44%
midres  -0.29%         -0.38%          -1.14%
hdres   -0.27%         -0.49%          -1.02%

Change-Id: I9cfe2f27e6c0029c30614970a46de3045840264e

6 years agoMerge "vp8 bilinear: ensure non-16x16 arrays are aligned"
Johann Koenig [Mon, 29 Oct 2018 22:26:42 +0000 (22:26 +0000)]
Merge "vp8 bilinear: ensure non-16x16 arrays are aligned"

6 years agoAdd AVX2 support for hbd 4-tap interpolation filter.
chiyotsai [Fri, 26 Oct 2018 21:14:28 +0000 (14:14 -0700)]
Add AVX2 support for hbd 4-tap interpolation filter.

Speed gain:

BIT DEPTH | 8TAP FPS | 4TAP FPS | PCT INC |
    10    |   1.69   |   1.85   |  9.46%  |
    12    |   1.64   |   1.78   |  8.54%  |

The speed test was run on jet.y4m at speed 1, profile 2, over 100 frames
with br=500.

Change-Id: I411e122553e2c466be7a26e64b4dd144efb884a9

6 years agovp8 bilinear: ensure non-16x16 arrays are aligned
Johann Koenig [Mon, 29 Oct 2018 18:59:56 +0000 (18:59 +0000)]
vp8 bilinear: ensure non-16x16 arrays are aligned

The 16x16 array was changed to aligned. The 8xN and 4x4 functions
use aligned loads/stores on their internal arrays as well.

BUG=webm:1570

Change-Id: I9cfe53d7c8ed76e8854c2688eb9a509b876471d8

6 years agoMerge "vp8 bilinear: ensure temp array is aligned"
Johann Koenig [Mon, 29 Oct 2018 18:55:52 +0000 (18:55 +0000)]
Merge "vp8 bilinear: ensure temp array is aligned"

6 years agoMerge "Enable 10 bit tpl support"
Sai Deng [Mon, 29 Oct 2018 17:14:09 +0000 (17:14 +0000)]
Merge "Enable 10 bit tpl support"

6 years agovp8 bilinear: ensure temp array is aligned
Johann [Mon, 29 Oct 2018 16:21:15 +0000 (09:21 -0700)]
vp8 bilinear: ensure temp array is aligned

Loads and stores to this array require 16 byte alignment.

BUG=webm:1570

Change-Id: I82c7d21c9539a108930fd030d79caaa0bcd1eeb3
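
As an illustration of the alignment requirement, here is a minimal C11 sketch. It uses alignas from <stdalign.h> rather than any libvpx macro, and the buffer size and function names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `alignment` bytes (must be a power of two). */
static int is_aligned(const void *p, uintptr_t alignment) {
  return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical scratch buffer for a two-pass filter. Without alignas(16),
 * a stack array of uint16_t is only guaranteed 2-byte alignment, and an
 * aligned 128-bit SIMD load or store on it could fault. */
static int temp_buffer_is_aligned(void) {
  alignas(16) uint16_t temp[17 * 16];
  temp[0] = 0;
  return is_aligned(temp, 16);
}
```

An unaligned load intrinsic would also work on an unaligned array, at a possible speed cost; declaring the array aligned keeps the aligned loads and stores valid.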

6 years agoMerge "remove "register" keyword"
Johann Koenig [Mon, 29 Oct 2018 02:07:20 +0000 (02:07 +0000)]
Merge "remove "register" keyword"

6 years agoMerge "Remove unused macros from vp9_firstpass.c"
Jingning Han [Sat, 27 Oct 2018 03:56:18 +0000 (03:56 +0000)]
Merge "Remove unused macros from vp9_firstpass.c"

6 years agoEnable 10 bit tpl support
sdeng [Wed, 24 Oct 2018 23:23:24 +0000 (16:23 -0700)]
Enable 10 bit tpl support

         lowres_bd10   midres_bd10
avg_psnr      -0.897        -1.261
ovr_psnr      -0.975        -1.349

Change-Id: Id54f2c419f4edaa91e89ffea52b4038b1d94e563

6 years agoremove "register" keyword
Johann [Fri, 26 Oct 2018 21:55:26 +0000 (14:55 -0700)]
remove "register" keyword

This has been deprecated for a long time. C++17 is trying to recover the name.

Change-Id: Iade6bebce03a50b76061695f9e634a107cd989cd

6 years agoMerge "Add Memory to Enable Row Decode"
Harish Mahendrakar [Fri, 26 Oct 2018 18:31:41 +0000 (18:31 +0000)]
Merge "Add Memory to Enable Row Decode"

6 years agoRemove unused macros from vp9_firstpass.c
Jingning Han [Fri, 26 Oct 2018 18:03:31 +0000 (11:03 -0700)]
Remove unused macros from vp9_firstpass.c

Change-Id: If5267a8c71113b171b7bddda5b49f0326c4266b8

6 years agovp8 bilinear: rewrite 4x4
Johann [Thu, 25 Oct 2018 19:23:03 +0000 (12:23 -0700)]
vp8 bilinear: rewrite 4x4

~20% faster than the MMX. Removes the last usage of
vp8_bilinear_filters_x86_[48].

Change-Id: Iee976fab9655d0020440f26c4403ce50103af913

6 years agoMerge "vp8 bilinear: rewrite 16x16"
Johann Koenig [Thu, 25 Oct 2018 19:59:06 +0000 (19:59 +0000)]
Merge "vp8 bilinear: rewrite 16x16"

6 years agoMerge "Add AVX2 support for 4-tap interpolation filter."
Chi Yo Tsai [Thu, 25 Oct 2018 18:25:09 +0000 (18:25 +0000)]
Merge "Add AVX2 support for 4-tap interpolation filter."

6 years agovp8 bilinear: rewrite 16x16
Johann [Wed, 24 Oct 2018 22:48:32 +0000 (15:48 -0700)]
vp8 bilinear: rewrite 16x16

Marginally faster. Most importantly it drops a dependency on an
external symbol (vp8_bilinear_filters_x86_8).

Change-Id: Iff022e718720f1f0eeced6201a1ad69a9c9c4f45

6 years agoMerge "vp8 bilinear: rewrite in intrinsics"
Johann Koenig [Thu, 25 Oct 2018 17:13:50 +0000 (17:13 +0000)]
Merge "vp8 bilinear: rewrite in intrinsics"

6 years agoAdd Memory to Enable Row Decode
Ritu Baldwa [Wed, 10 Oct 2018 10:55:51 +0000 (16:25 +0530)]
Add Memory to Enable Row Decode

Row-based multi-threading needs extra memory to store the parsed
coefficients, partitions and eob. This commit adds that memory.

Change-Id: I13fa4a6ada2ec3048bc973e465055b832429388f

6 years agoMerge "Enable tpl model to support multi-layer ARF"
Jingning Han [Thu, 25 Oct 2018 00:01:52 +0000 (00:01 +0000)]
Merge "Enable tpl model to support multi-layer ARF"

6 years agoMerge "Reset frame udpate flags after qp estimate in tpl"
Jingning Han [Thu, 25 Oct 2018 00:01:46 +0000 (00:01 +0000)]
Merge "Reset frame udpate flags after qp estimate in tpl"

6 years agoMerge "Bypass processing on use existing frame"
Jingning Han [Thu, 25 Oct 2018 00:01:41 +0000 (00:01 +0000)]
Merge "Bypass processing on use existing frame"

6 years agoMerge "Fix frame offset computation for GOP extension"
Jingning Han [Thu, 25 Oct 2018 00:01:35 +0000 (00:01 +0000)]
Merge "Fix frame offset computation for GOP extension"

6 years agoMerge "Refactor gop_length use case in tpl model"
Jingning Han [Thu, 25 Oct 2018 00:01:29 +0000 (00:01 +0000)]
Merge "Refactor gop_length use case in tpl model"

6 years agovp8 bilinear: rewrite in intrinsics
Johann [Wed, 24 Oct 2018 19:22:35 +0000 (12:22 -0700)]
vp8 bilinear: rewrite in intrinsics

8x8 is 15% faster than the assembly. 8x4 is 200% faster than MMX.

Remove MMX version.

Change-Id: I55642ebd276db265911f2c79616177a3a9a7e04f

6 years agoMerge "Clean up vpx_dsp/x86/convolve_sse2.h"
Chi Yo Tsai [Wed, 24 Oct 2018 16:36:20 +0000 (16:36 +0000)]
Merge "Clean up vpx_dsp/x86/convolve_sse2.h"

6 years agoEnable tpl model to support multi-layer ARF
Jingning Han [Wed, 24 Oct 2018 03:30:35 +0000 (20:30 -0700)]
Enable tpl model to support multi-layer ARF

Enable temporal dependency model for the base layer ARF. It
improves the multi-layer ARF compression performance (results
are tested in speed 0 vbr mode):

         avg PSNR    overall PSNR     SSIM
lowres   -0.40%       -0.46%         -0.32%
midres   -0.59%       -0.68%         -0.45%
720p     -0.55%       -0.59%         -1.07%

Change-Id: I7790b89ccfb6e61f9b7965f34d348c7440220dd0

6 years agoAdd AVX2 support for 4-tap interpolation filter.
chiyotsai [Tue, 23 Oct 2018 19:42:21 +0000 (12:42 -0700)]
Add AVX2 support for 4-tap interpolation filter.

Performance:
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.491|1.902|1.772|1.479|
 HORZ|1.145|1.521|1.757|1.497|
 VERT|1.176|1.614|1.707|1.467|

Each number in the chart above is 8-tap function time / 4-tap function time.

The framerate tested on jets.y4m for 100 frames on speed 1 increased from 3.72
fps to 3.91 fps (about 5% increase).

Change-Id: Ic0ad275cf32fafeefd0a89811badd8adff2134a0
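
Where the ratios above come from can be shown with a scalar sketch. It assumes a 4-tap kernel is stored as an 8-tap kernel whose four outer taps are zero, that taps sum to 128, and that results are rounded with a 7-bit shift; the function names are hypothetical and this is not the AVX2 code.

```c
#include <stdint.h>

#define FILTER_BITS 7 /* assumption: taps sum to 1 << FILTER_BITS = 128 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Round and scale a filter sum back to pixel range. Relies on arithmetic
 * right shift for negative sums, as most compilers provide. */
static uint8_t round_shift_clip(int sum) {
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Reference 8-tap filter for one output pixel. src points at the pixel
 * being filtered and the taps cover src[-3] .. src[4]. */
static uint8_t filter8(const uint8_t *src, const int16_t *k) {
  int sum = 0, i;
  for (i = 0; i < 8; ++i) sum += src[i - 3] * k[i];
  return round_shift_clip(sum);
}

/* 4-tap shortcut: valid only when k[0], k[1], k[6] and k[7] are zero, so
 * only the middle taps over src[-1] .. src[2] contribute. Half the
 * multiply-adds, same result. */
static uint8_t filter4(const uint8_t *src, const int16_t *k) {
  int sum = 0, i;
  for (i = 2; i < 6; ++i) sum += src[i - 3] * k[i];
  return round_shift_clip(sum);
}
```

With a kernel such as {0, 0, -8, 72, 72, -8, 0, 0} (a made-up example), both functions return the same pixel, so a caller can dispatch to the shorter loop whenever the outer taps are zero.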

6 years agoClean up vpx_dsp/x86/convolve_sse2.h
chiyotsai [Thu, 18 Oct 2018 16:51:56 +0000 (09:51 -0700)]
Clean up vpx_dsp/x86/convolve_sse2.h

Removes unnecessary includes and rewords some functions/comments.

Change-Id: Ied557d7faa9d845d38255e6e3e0e3fe1395276e1

6 years agoReset frame udpate flags after qp estimate in tpl
Jingning Han [Tue, 23 Oct 2018 23:24:28 +0000 (16:24 -0700)]
Reset frame udpate flags after qp estimate in tpl

After the frame quantizer estimate has run in the tpl model, reset the
actual value assigned to the current coding frame. This avoids
certain frame update flags being overwritten by a different
frame type's update.

Change-Id: Idde2ba1108f1f68747b14149b211f882965c99f0

6 years agoMerge "Use 8-tap interp filter in temporal filtering"
Yunqing Wang [Tue, 23 Oct 2018 22:29:46 +0000 (22:29 +0000)]
Merge "Use 8-tap interp filter in temporal filtering"

6 years agoUse 8-tap interp filter in temporal filtering
Yunqing Wang [Tue, 23 Oct 2018 19:30:13 +0000 (12:30 -0700)]
Use 8-tap interp filter in temporal filtering

Use the 8-tap interp filter in temporal filtering to achieve a more accurate
motion search result. Using 8-tap sharp gave slightly better results than
using 8-tap regular.

Speed 0 borg test showed that
        avg_psnr:  ovr_psnr:    ssim:
hdres:  -0.160      -0.157     -0.173
midres: -0.083      -0.061     -0.183
lowres: -0.077      -0.099     -0.204

The speed test showed no noticeable change in encoder time.

Change-Id: I97dc3c4864b5a5675a6c1e3952799b81eedd7d93

6 years agoMerge "Remove empty else branch in mode_estimation"
Jingning Han [Tue, 23 Oct 2018 19:16:41 +0000 (19:16 +0000)]
Merge "Remove empty else branch in mode_estimation"

6 years agoBypass processing on use existing frame
Jingning Han [Mon, 22 Oct 2018 23:25:43 +0000 (16:25 -0700)]
Bypass processing on use existing frame

The use of show existing frame requires no further operation on
that coding frame. Bypass the corresponding process.

Change-Id: Ia092027a8a543be0ca54c00b4d51e453039712b8

6 years agoFix frame offset computation for GOP extension
Jingning Han [Mon, 22 Oct 2018 21:21:48 +0000 (14:21 -0700)]
Fix frame offset computation for GOP extension

Properly compute the extended GOP frames' buffer offsets.

Change-Id: I9aed14f4b8d623f1832e782828dce07aa546507d

6 years agoRefactor gop_length use case in tpl model
Jingning Han [Mon, 22 Oct 2018 17:37:09 +0000 (10:37 -0700)]
Refactor gop_length use case in tpl model

Make it support both single- and multi-layer ARF GOP structure.

Change-Id: I760a95804d1b583b057120f6d6be65195a0e6c19

6 years agoRemove empty else branch in mode_estimation
Jingning Han [Tue, 23 Oct 2018 05:51:48 +0000 (22:51 -0700)]
Remove empty else branch in mode_estimation

Change-Id: Iefa184aae80b920b054e3e922a77244c2b0d4b61

6 years agoMerge "Use the proper gfu_boost factor to compute rd_mult"
Jingning Han [Tue, 23 Oct 2018 02:28:22 +0000 (02:28 +0000)]
Merge "Use the proper gfu_boost factor to compute rd_mult"

6 years agoUse the proper gfu_boost factor to compute rd_mult
Jingning Han [Mon, 22 Oct 2018 16:28:04 +0000 (09:28 -0700)]
Use the proper gfu_boost factor to compute rd_mult

Update the Lagrangian multiplier according to the gfu_boost factor
assigned per frame. It improves the multi-layer ARF compression
performance (results below shown for speed 0):

         avg PSNR      overall PSNR      SSIM
lowres    -0.08%          0.02%         -0.28%
midres    -0.08%          0.03%         -0.22%
hdres     -0.19%         -0.10%         -0.39%
nflx2k    -0.29%         -0.18%         -0.85%

Change-Id: Ifeb4b14918f880ba011ea41c1454ab00504f8855

6 years agoMerge "ML_VAR_PARTITION: enable at speed 5"
Hui Su [Fri, 19 Oct 2018 16:48:40 +0000 (16:48 +0000)]
Merge "ML_VAR_PARTITION: enable at speed 5"

6 years agoML_VAR_PARTITION: enable at speed 5
Hui Su [Tue, 16 Oct 2018 03:45:07 +0000 (20:45 -0700)]
ML_VAR_PARTITION: enable at speed 5

When the ML_VAR_PARTITION experiment is turned on, replace
REFERENCE_PARTITION with ML_BASED_PARTITION at speed 5.

Coding gains (avg_psnr) compared to baseline:
ytlivehr  1.63%
ytlivelr  0.07%

Tested encoding speed with several clips from ytlivehr and ytlivelr
on a Linux desktop (rt, vbr, 4 threads). The encoder is on average
faster than baseline:
360p:   14% faster
720p:    7% faster
1080p: 1.5% faster

Change-Id: I39b00078176ff516f7306818f33ba2b1ea53dfa1

6 years agoChanges 4-tap SSSE3 filter to 8-tap AVX2 filter.
chiyotsai [Thu, 18 Oct 2018 16:34:20 +0000 (09:34 -0700)]
Changes 4-tap SSSE3 filter to 8-tap AVX2 filter.

AVX2's 8-tap filter is slightly faster than the 4-tap SSSE3 filter.

Change-Id: I5fc37c431670780108706b206b32c791828555c9

6 years agoMerge "Add SSSE3 support for 4-tap interpolation filter"
Chi Yo Tsai [Thu, 18 Oct 2018 18:19:41 +0000 (18:19 +0000)]
Merge "Add SSSE3 support for 4-tap interpolation filter"

6 years agoMerge "Enable rect partition search for HBD at speed 1"
Hui Su [Thu, 18 Oct 2018 16:46:15 +0000 (16:46 +0000)]
Merge "Enable rect partition search for HBD at speed 1"

6 years agoAdd SSSE3 support for 4-tap interpolation filter
chiyotsai [Wed, 17 Oct 2018 21:52:26 +0000 (14:52 -0700)]
Add SSSE3 support for 4-tap interpolation filter

Performance:
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.526|1.827|1.844|1.906|
 HORZ|1.336|1.795|1.886|1.654|
 VERT|1.443|1.539|2.139|2.190|

The ratio is SSSE3 8-tap time / SSSE3 4-tap time.

Change-Id: I01ed2ab494428256e918875774a459afecc5ec6a

6 years agoMerge "Replace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size"
Jingning Han [Thu, 18 Oct 2018 16:25:37 +0000 (16:25 +0000)]
Merge "Replace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size"

6 years agoMerge "Optimize vp9_highbd_temporal_filter_apply_c"
Yunqing Wang [Wed, 17 Oct 2018 23:11:46 +0000 (23:11 +0000)]
Merge "Optimize vp9_highbd_temporal_filter_apply_c"

6 years agoReplace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size
Jingning Han [Wed, 17 Oct 2018 23:04:21 +0000 (16:04 -0700)]
Replace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size

MAX_ARF_GOP_SIZE accurately reflects the maximum number of frames
operated on per group of pictures. Use it to replace MAX_LAG_BUFFERS in
such use cases.

Change-Id: Id26f9b1b2b0c38f255dee19795356c387d06d033

6 years agoMerge changes I6d5c77af,I6bf504b4,Ie5dc5ea7,Ie6024b1a,If45fba8a, ...
Angie Chiang [Wed, 17 Oct 2018 23:10:50 +0000 (23:10 +0000)]
Merge changes I6d5c77af,I6bf504b4,Ie5dc5ea7,Ie6024b1a,If45fba8a, ...

* changes:
  Add do_motion_search
  Preserve code of doing mv search in raster order
  Variant implementation of changing mv search order
  Add feature_score_loc_sort
  Init mv_[dist/cost]_sum in init_tpl_stats
  Change mv search order according to feature_score

6 years agoAdd do_motion_search
Angie Chiang [Wed, 17 Oct 2018 21:45:08 +0000 (14:45 -0700)]
Add do_motion_search

This will make the code cleaner.

Change-Id: I6d5c77af7261c39656b35ec40ac1451bbdbfb7a7

6 years agoMerge "Adds SSE2 support for interpolation filter for width 4 and 8"
Chi Yo Tsai [Wed, 17 Oct 2018 21:35:14 +0000 (21:35 +0000)]
Merge "Adds SSE2 support for interpolation filter for width 4 and 8"

6 years agoPreserve code of doing mv search in raster order
Angie Chiang [Wed, 17 Oct 2018 20:56:42 +0000 (13:56 -0700)]
Preserve code of doing mv search in raster order

With this change, there will be three versions of the mv search scheme
in the codebase simultaneously.
We will run further experiments to evaluate which version is better
in terms of visual quality and coding performance.

Change-Id: I6bf504b4551316ef10b8a341ab3ba14d0ec977ce

6 years agoEnable rect partition search for HBD at speed 1
Hui Su [Wed, 17 Oct 2018 15:40:26 +0000 (08:40 -0700)]
Enable rect partition search for HBD at speed 1

This patch enables rectangular partition search on speed 1 for high
bit depth encoding. The encoding speed loss is reduced thanks to
recently added speed features.

This only affects speed 1 high bit-depth encoding.

Coding gains:
                      avg_psnr     ovr_psnr
lowres_bd10(480p)      1.34%        1.40%
midres_bd10(720p)      1.28%        1.33%

Average speed loss:
        QP=30    QP=40    QP=50    average
480p     2.5%     2.3%     2.6%     2.5%
720p     4.0%     3.9%     3.2%     3.7%

Change-Id: Id9cac4eea0769d94e093c9d170194659b3342d89

6 years agoAdds SSE2 support for interpolation filter for width 4 and 8
chiyotsai [Tue, 16 Oct 2018 22:45:05 +0000 (15:45 -0700)]
Adds SSE2 support for interpolation filter for width 4 and 8

Performance:
The chart below shows the speed relative to baseline
(baseline_time/new_time)
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.889|1.780|1.811|1.963|
 HORZ|2.266|1.834|1.617|1.595|
 VERT|2.043|2.190|2.373|2.485|

Change-Id: Ic4262222db78f013b94a8c61b46efb8520722927

6 years agoMerge "For keyframe-only coding do not boost in q mode"
Urvang Joshi [Wed, 17 Oct 2018 20:25:02 +0000 (20:25 +0000)]
Merge "For keyframe-only coding do not boost in q mode"

6 years agoFor keyframe-only coding do not boost in q mode
Urvang Joshi [Wed, 17 Oct 2018 18:48:10 +0000 (11:48 -0700)]
For keyframe-only coding do not boost in q mode

If we are using keyframe only coding - either coding a
single frame, or a sequence of keyframes - in the end-usage=q
mode, use the cq_level directly as the quality of each
coded frame, rather than boost them.

Ported from AV1: 563a0d1eb92bdc1e987df071a568d8406c4ffa92

Change-Id: I6dc929b8b4f0aa18e279139077f3a87958c92245

6 years agoRefactor SSE2 Code for 4-tap interpolation filter on width 16.
chiyotsai [Tue, 16 Oct 2018 19:26:34 +0000 (12:26 -0700)]
Refactor SSE2 Code for 4-tap interpolation filter on width 16.

Some repeated code is refactored into inline functions. No performance
degradation is observed. These inline functions can also be used for
width 8 and width 4.

Change-Id: Ibf08cc9ebd2dd47bd2a6c2bcc1616f9d4c252d4d

6 years agoOptimize vp9_highbd_temporal_filter_apply_c
Yunqing Wang [Sat, 13 Oct 2018 00:21:23 +0000 (17:21 -0700)]
Optimize vp9_highbd_temporal_filter_apply_c

Following the previous patch
(https://chromium-review.googlesource.com/c/webm/libvpx/+/1277913),
this patch modifies the highbd version of the temporal filter apply
function in a similar way.

Change-Id: I2bb6f1fff6e32bca86f7139a497181d34aa9f3ec

6 years agoAdd SSE2 support for 4-tap interpolation filter for width 16.
chiyotsai [Wed, 17 Oct 2018 00:50:37 +0000 (17:50 -0700)]
Add SSE2 support for 4-tap interpolation filter for width 16.

Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.

Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c

6 years agoVariant implementation of changing mv search order
Angie Chiang [Tue, 16 Oct 2018 19:31:13 +0000 (12:31 -0700)]
Variant implementation of changing mv search order

We start the mv search from the block with the highest feature score, then
move on to the block's neighbors in a search order determined by
their feature scores.

We use a max heap to achieve this.

This feature is under flag USE_PQSORT

Change-Id: Ie5dc5ea715b0f9a7a594e5080a7cb4f5309f5597
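
A minimal sketch of the max heap idea, under stated assumptions: BlockScore, MaxHeap, heap_push and heap_pop are hypothetical names, storage is supplied by the caller, and the neighbor handling from the commit is left out. A search loop would pop the best block, run the mv search on it, then push its unvisited neighbors.

```c
#include <stddef.h>

/* Hypothetical record: a block position and its feature score. */
typedef struct {
  int row, col;
  int score;
} BlockScore;

typedef struct {
  BlockScore *items; /* caller-provided storage */
  size_t size;
  size_t capacity;
} MaxHeap;

static void heap_swap(BlockScore *a, BlockScore *b) {
  BlockScore t = *a;
  *a = *b;
  *b = t;
}

/* Insert b and sift it up. Returns 0 on success, -1 if the heap is full. */
static int heap_push(MaxHeap *h, BlockScore b) {
  size_t i;
  if (h->size == h->capacity) return -1;
  i = h->size++;
  h->items[i] = b;
  while (i > 0) {
    size_t parent = (i - 1) / 2;
    if (h->items[parent].score >= h->items[i].score) break;
    heap_swap(&h->items[parent], &h->items[i]);
    i = parent;
  }
  return 0;
}

/* Remove the highest-scoring block into *out and sift the new root down.
 * Returns 0 on success, -1 if the heap is empty. */
static int heap_pop(MaxHeap *h, BlockScore *out) {
  size_t i = 0;
  if (h->size == 0) return -1;
  *out = h->items[0];
  h->items[0] = h->items[--h->size];
  for (;;) {
    size_t l = 2 * i + 1, r = l + 1, largest = i;
    if (l < h->size && h->items[l].score > h->items[largest].score) largest = l;
    if (r < h->size && h->items[r].score > h->items[largest].score) largest = r;
    if (largest == i) break;
    heap_swap(&h->items[i], &h->items[largest]);
    i = largest;
  }
  return 0;
}
```

Compared with sorting every block up front, a heap lets candidates be added while the search is in progress, at O(log n) per push or pop.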

6 years agoAdd feature_score_loc_sort
Angie Chiang [Mon, 15 Oct 2018 19:25:22 +0000 (12:25 -0700)]
Add feature_score_loc_sort

This CL facilitates the upcoming change,
a variant implementation of changing the mv search order according to
feature score.

Change-Id: Ie6024b1a5ec02343aea6aa81fc14f94e2e515d06

6 years agoInit mv_[dist/cost]_sum in init_tpl_stats
Angie Chiang [Fri, 12 Oct 2018 22:37:26 +0000 (15:37 -0700)]
Init mv_[dist/cost]_sum in init_tpl_stats

Change-Id: If45fba8a74186803eec09da7dbaf2e1fe4e9e156

6 years agoChange mv search order according to feature_score
Angie Chiang [Thu, 11 Oct 2018 00:43:22 +0000 (17:43 -0700)]
Change mv search order according to feature_score

Sort the feature_score values in descending order.
Do the mv search from the block with the highest score to the block with
the lowest score.

Change-Id: I47a87cd66ea3e40d8c8fc55a7517ab8aa10fdb94
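
The ordering step can be sketched with the standard qsort. ScoredBlock and both function names are hypothetical, and this is not the code from the change.

```c
#include <stdlib.h>

/* Hypothetical record: a block position and its feature score. */
typedef struct {
  int row, col;
  int score;
} ScoredBlock;

/* qsort comparator for descending score. Comparing instead of subtracting
 * avoids integer overflow for extreme values. */
static int compare_score_desc(const void *a, const void *b) {
  const ScoredBlock *x = (const ScoredBlock *)a;
  const ScoredBlock *y = (const ScoredBlock *)b;
  return (y->score > x->score) - (y->score < x->score);
}

/* Sort blocks so the highest feature score comes first; the mv search
 * would then visit blocks[0], blocks[1], ... in that order. */
static void sort_blocks_by_score(ScoredBlock *blocks, size_t count) {
  qsort(blocks, count, sizeof(*blocks), compare_score_desc);
}
```

qsort is not guaranteed to be stable, so blocks with equal scores may come out in any relative order.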

6 years agoMerge "Reduce the cpi->scaled_ref_idx array size by 1."
Wan-Teh Chang [Wed, 17 Oct 2018 14:43:22 +0000 (14:43 +0000)]
Merge "Reduce the cpi->scaled_ref_idx array size by 1."

6 years agoMerge "Refactor tpl dependency model to support multi-layer ARF updates"
Jingning Han [Tue, 16 Oct 2018 21:24:17 +0000 (21:24 +0000)]
Merge "Refactor tpl dependency model to support multi-layer ARF updates"

6 years agoMerge "Refactor GOP reference frame ordering for tpl model"
Jingning Han [Tue, 16 Oct 2018 21:23:52 +0000 (21:23 +0000)]
Merge "Refactor GOP reference frame ordering for tpl model"

6 years agoMerge "Record gop size"
Jingning Han [Tue, 16 Oct 2018 21:07:55 +0000 (21:07 +0000)]
Merge "Record gop size"

6 years agoMerge "Fix a bug in ml_prune_rect_partition()"
Hui Su [Tue, 16 Oct 2018 20:58:45 +0000 (20:58 +0000)]
Merge "Fix a bug in ml_prune_rect_partition()"

6 years agoMerge "Fix the filter tap calculation in mips optimizations"
Yunqing Wang [Tue, 16 Oct 2018 17:55:37 +0000 (17:55 +0000)]
Merge "Fix the filter tap calculation in mips optimizations"

6 years agoFix a bug in ml_prune_rect_partition()
Hui Su [Tue, 16 Oct 2018 16:50:13 +0000 (09:50 -0700)]
Fix a bug in ml_prune_rect_partition()

The quantization step size should be scaled properly for high bit depth
settings.

This only affects speed 0.
Encoder speed change is almost neutral.
There is a small coding gain of 0.09%.

Change-Id: I96b2bae03a53ce8ccd6428e3a050cfe18e06a024
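
One way to picture the scaling is the sketch below. It assumes that step sizes at bit depth bd are roughly 2^(bd - 8) times their 8-bit counterparts, so a value meant to be compared against 8-bit thresholds is shifted back down. normalize_q_step is a hypothetical name and the actual fix may differ.

```c
/* Assumption: q_step at bit depth bd is about 2^(bd - 8) times the 8-bit
 * step size. Returns the step size on the 8-bit scale, or -1 for an
 * unsupported bit depth. */
static int normalize_q_step(int q_step, int bit_depth) {
  if (bit_depth != 8 && bit_depth != 10 && bit_depth != 12) return -1;
  return q_step >> (bit_depth - 8);
}
```

Under that assumption, a 10-bit step of 400 and a 12-bit step of 1600 both map to 100, matching the 8-bit value, so a model trained on 8-bit inputs sees a consistent range.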