Johann Koenig [Fri, 2 Nov 2018 18:13:14 +0000 (18:13 +0000)]
Merge "vpx postproc: rewrite in intrinsics"
Sai Deng [Fri, 2 Nov 2018 16:46:46 +0000 (16:46 +0000)]
Merge "Add highbd Hadamard transform C implementations"
Johann Koenig [Fri, 2 Nov 2018 16:23:23 +0000 (16:23 +0000)]
Merge "fix snprintf error on windows"
Johann Koenig [Fri, 2 Nov 2018 14:41:24 +0000 (14:41 +0000)]
Merge "clang-tidy: normalize variance functions"
Johann [Fri, 2 Nov 2018 14:34:12 +0000 (07:34 -0700)]
fix snprintf error on windows
Include vpx_ports/msvc.h to handle snprintf on older
versions of Visual Studio
Change-Id: I06cd99b32bbae82b3df079d41ff20a9a07f6fe1c
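Visual Studio releases before 2015 have no C99 `snprintf` and only provide `_snprintf`. A compatibility header can map one name to the other. The fragment below is a minimal sketch of that pattern, not a copy of vpx_ports/msvc.h, and `format_frame_label` is a hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(_MSC_VER) && _MSC_VER < 1900
/* Visual Studio before 2015 has no C99 snprintf, only _snprintf.
 * Caveat: _snprintf does not NUL-terminate when it truncates. */
#define snprintf _snprintf
#endif

/* Hypothetical caller: writes "frame-NNNN" into buf and returns the
 * length reported by snprintf (negative on error). */
static int format_frame_label(char *buf, size_t size, int frame) {
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "frame-%04d", frame);
}
```

Callers that must also build with the old `_snprintf` should size buffers generously or terminate the buffer themselves.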
sdeng [Tue, 30 Oct 2018 22:35:44 +0000 (15:35 -0700)]
Add highbd Hadamard transform C implementations
Change-Id: Ibec078c80ca1dfe6fbbc4288db89d719dac453a7
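A Hadamard transform uses only additions and subtractions, so a C version is a chain of butterflies. With 10- or 12-bit residuals, sums over a block can exceed the `int16_t` range. That is presumably one reason a separate high bit depth path is useful, though the commit does not say so. The sketch below is a 4x4 transform with 32-bit intermediates. `hadamard4_i32` and `hadamard4x4_i32` are hypothetical names, not libvpx functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-point Hadamard butterfly (natural order, unnormalized). */
static void hadamard4_i32(const int32_t in[4], int32_t out[4]) {
  const int32_t a0 = in[0] + in[1];
  const int32_t a1 = in[0] - in[1];
  const int32_t a2 = in[2] + in[3];
  const int32_t a3 = in[2] - in[3];
  out[0] = a0 + a2;
  out[1] = a1 + a3;
  out[2] = a0 - a2;
  out[3] = a1 - a3;
}

/* Hypothetical 4x4 2-D Hadamard: rows first, then columns. Residuals fit
 * in int16_t, but intermediates and outputs are int32_t because sums of
 * 10/12-bit residuals can exceed INT16_MAX. */
static void hadamard4x4_i32(const int16_t *src_diff, int src_stride,
                            int32_t coeff[16]) {
  int32_t tmp[16];
  assert(src_diff && coeff);
  for (int r = 0; r < 4; ++r) {
    int32_t in[4];
    for (int c = 0; c < 4; ++c) in[c] = src_diff[r * src_stride + c];
    hadamard4_i32(in, &tmp[r * 4]);
  }
  for (int c = 0; c < 4; ++c) {
    int32_t in[4], out[4];
    for (int r = 0; r < 4; ++r) in[r] = tmp[r * 4 + c];
    hadamard4_i32(in, out);
    for (int r = 0; r < 4; ++r) coeff[r * 4 + c] = out[r];
  }
}
```

A flat block of 12-bit residuals equal to 4095 produces a DC coefficient of 16 * 4095 = 65520, which does not fit in `int16_t`.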
Johann Koenig [Wed, 31 Oct 2018 22:39:08 +0000 (22:39 +0000)]
Merge "vp8 boolcoder: normalize to "bc""
Jerome Jiang [Wed, 31 Oct 2018 22:27:44 +0000 (22:27 +0000)]
Merge "vp8: fix to address overflow in decoder."
Johann Koenig [Wed, 31 Oct 2018 22:19:43 +0000 (22:19 +0000)]
Merge "vp8dx_get_quantizer: normalize VP8D_COMP"
Johann [Tue, 30 Oct 2018 21:43:36 +0000 (14:43 -0700)]
clang-tidy: normalize variance functions
Always use src/ref and _ptr/_stride suffixes.
Normalize to [xy]_offset and second_pred.
Drop some stray source/recon_strides.
BUG=webm:1444
Change-Id: I32362a50988eb84464ab78686348610ea40e5c80
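The naming convention can be illustrated with a plain C variance routine written against the normalized parameter names. This is a minimal sketch. `variance_wxh` is a hypothetical generic helper, not one of libvpx's per-size functions. It computes the usual variance = sse - sum * sum / N and returns the sum of squared differences through `*sse`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic variance using src_ptr/src_stride and
 * ref_ptr/ref_stride. Writes the sum of squared differences to *sse and
 * returns sse - sum^2 / (w * h). */
static unsigned int variance_wxh(const uint8_t *src_ptr, int src_stride,
                                 const uint8_t *ref_ptr, int ref_stride,
                                 int w, int h, unsigned int *sse) {
  int64_t sum = 0;
  uint64_t sse64 = 0;
  assert(w > 0 && h > 0 && sse);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff =
          src_ptr[y * src_stride + x] - ref_ptr[y * ref_stride + x];
      sum += diff;
      sse64 += (uint64_t)(diff * diff);
    }
  }
  *sse = (unsigned int)sse64;
  return (unsigned int)(sse64 - (uint64_t)((sum * sum) / (w * h)));
}
```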
Johann Koenig [Wed, 31 Oct 2018 21:43:05 +0000 (21:43 +0000)]
Merge "clang-tidy: fix vp9/encoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:42:16 +0000 (21:42 +0000)]
Merge "clang-tidy: fix vp9/decoder parameters"
Johann Koenig [Wed, 31 Oct 2018 21:42:07 +0000 (21:42 +0000)]
Merge "clang-tidy: fix vp9/common parameters"
Johann Koenig [Wed, 31 Oct 2018 21:41:53 +0000 (21:41 +0000)]
Merge "clang-tidy: fix vp8/encoder parameters"
Johann [Wed, 31 Oct 2018 21:24:31 +0000 (14:24 -0700)]
vp8 boolcoder: normalize to "bc"
"bc" maps to BOOL_CODER better than "br"
Change-Id: Idefd03e79ccc1851a1b26f8206a159b0e5c5fb2d
Johann Koenig [Wed, 31 Oct 2018 21:14:46 +0000 (21:14 +0000)]
Merge "clang-tidy: fix vp8/decoder parameters"
Johann [Wed, 31 Oct 2018 21:13:45 +0000 (14:13 -0700)]
vp8dx_get_quantizer: normalize VP8D_COMP
Use "pbi" like the rest of the functions
Change-Id: I5f3036b8f8361c30353be378d83455b83b82ac9f
Chi Yo Tsai [Wed, 31 Oct 2018 20:38:52 +0000 (20:38 +0000)]
Merge "Add SSE2 support for hbd 4-tap interpolation filter."
Jerome Jiang [Tue, 7 Aug 2018 18:10:26 +0000 (11:10 -0700)]
vp8: fix to address overflow in decoder.
The internal error handler can't be called from the decoder thread.
Add vpx_internal_error_info to MACROBLOCKD. When a corrupted frame is
detected, the decoder thread returns to its own context and signals
completion of decoding for the current frame.
The main decoding thread will also detect the error and return an error
code to the decoding API call.
Each thread signals the end of decoding for the frame. The main thread
waits for the signals from all other threads before starting to decode
the next frame.
BUG=875626,webm:1496
Change-Id: Icd05fbc558893a4e7d8532c1e7177e7550283a64
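The idea of giving each decoder thread its own error context can be sketched with `setjmp`/`longjmp`. The worker arms a jump target before decoding. A corruption check then jumps back to that target instead of into the main thread's context, and the worker marks itself done either way so the main thread is not left waiting. All names below (`thread_error_info`, `worker_ctx`, `worker_decode_frame`) are hypothetical. The sketch is single-threaded and omits the real synchronization primitives.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical per-thread error context (one per worker). */
typedef struct {
  int error_code;   /* 0 means no error */
  int setjmp_armed; /* non-zero while jmp holds a valid target */
  jmp_buf jmp;
} thread_error_info;

typedef struct {
  thread_error_info error_info;
  int done; /* set once the worker has finished with the frame */
} worker_ctx;

/* Jump back to the worker's own context rather than the main thread's. */
static void raise_thread_error(thread_error_info *info, int code) {
  info->error_code = code;
  if (info->setjmp_armed) longjmp(info->jmp, 1);
}

/* Stand-in for real row decoding: row == corrupt_row simulates corruption. */
static void decode_row(worker_ctx *w, int row, int corrupt_row) {
  if (row == corrupt_row) raise_thread_error(&w->error_info, 1);
}

/* Decodes `rows` rows; returns 0 or the error code. Always sets done. */
static int worker_decode_frame(worker_ctx *w, int rows, int corrupt_row) {
  assert(w && rows >= 0);
  w->error_info.error_code = 0;
  w->done = 0;
  if (setjmp(w->error_info.jmp)) {
    /* Landed here from raise_thread_error(): stop decoding this frame. */
    w->error_info.setjmp_armed = 0;
    w->done = 1; /* still signal completion so the main thread can proceed */
    return w->error_info.error_code;
  }
  w->error_info.setjmp_armed = 1;
  for (int row = 0; row < rows; ++row) decode_row(w, row, corrupt_row);
  w->error_info.setjmp_armed = 0;
  w->done = 1;
  return 0;
}
```

In a multi-threaded decoder, the main thread would wait until every worker's `done` signal arrives and then inspect each worker's `error_code`.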
Johann Koenig [Tue, 30 Oct 2018 22:11:59 +0000 (22:11 +0000)]
Merge "clang-tidy: fix vp8/common parameters"
Johann [Tue, 30 Oct 2018 19:46:39 +0000 (12:46 -0700)]
clang-tidy: fix vp9/encoder parameters
BUG=webm:1444
Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
Johann [Tue, 30 Oct 2018 19:17:22 +0000 (12:17 -0700)]
clang-tidy: fix vp9/decoder parameters
BUG=webm:1444
Change-Id: I9c7c0a4161aaf52436bd5c01d30b035b2ff5508c
chiyotsai [Mon, 29 Oct 2018 23:12:05 +0000 (16:12 -0700)]
Add SSE2 support for hbd 4-tap interpolation filter.
Unit test performance on bitdepth 10:
    | 4X4 | 8X8 |16X16|64X64|
2D  |1.582|1.461|1.425|1.572|
HORZ|1.643|1.247|1.346|1.345|
VERT|1.378|1.695|2.020|1.763|
Unit test performance on bitdepth 12:
    | 4X4 | 8X8 |16X16|64X64|
2D  |1.578|1.409|1.426|1.497|
HORZ|1.625|1.153|1.323|1.259|
VERT|1.392|1.707|2.030|1.787|
Change-Id: I6df85330ac33fcb17d46e4302b41415dda1219f5
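A 4-tap path is possible when the outer taps of an 8-tap kernel are zero, so only the four middle taps need to be applied. The scalar sketch below shows what such a filter computes for high bit depth. It assumes libvpx's 7-bit filter precision (taps summing to 128) and clamps the result to (1 << bd) - 1. The kernel values and function names are hypothetical, and the SIMD versions in these commits are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* taps sum to 1 << FILTER_BITS (128) */

static uint16_t clip_pixel_bd(int32_t v, int bd) {
  const int32_t max = (1 << bd) - 1;
  return (uint16_t)(v < 0 ? 0 : (v > max ? max : v));
}

/* Non-zero when an 8-tap kernel only uses its four middle taps. */
static int kernel_is_4tap(const int16_t k[8]) {
  return !(k[0] | k[1] | k[6] | k[7]);
}

/* Horizontal pass over one row of w pixels using taps 2..5 only.
 * Output x reads src[x - 1] .. src[x + 2]. */
static void highbd_convolve_horiz_4tap_row(const uint16_t *src, uint16_t *dst,
                                           int w, const int16_t k[8], int bd) {
  assert(kernel_is_4tap(k));
  for (int x = 0; x < w; ++x) {
    int32_t sum = 0;
    for (int t = 2; t < 6; ++t) sum += k[t] * src[x + t - 3];
    /* Round, drop the filter precision, then clamp to the bit depth. */
    dst[x] = clip_pixel_bd((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS, bd);
  }
}
```

Skipping the four zero taps halves the multiply-adds per output pixel compared with the 8-tap path.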
Johann [Tue, 30 Oct 2018 18:56:17 +0000 (11:56 -0700)]
clang-tidy: fix vp9/common parameters
BUG=webm:1444
Change-Id: I1a14ad119b3bcbaddcf2291a7521513cf6425635
Johann [Tue, 30 Oct 2018 18:24:40 +0000 (11:24 -0700)]
clang-tidy: fix vp8/encoder parameters
BUG=webm:1444
Change-Id: I57a305cdab0d62b0745116272fbd5d9257c6e679
Johann [Tue, 30 Oct 2018 17:55:21 +0000 (10:55 -0700)]
clang-tidy: fix vp8/decoder parameters
BUG=webm:1444
Change-Id: I3dfc56f7f6430d36a1c447d8999733015a001101
Johann [Tue, 30 Oct 2018 17:21:58 +0000 (10:21 -0700)]
clang-tidy: fix vp8/common parameters
Match function definitions to declarations
BUG=webm:1444
Change-Id: Ib96d3b735eaf81cece5406c89cc5156bc2cde462
Chi Yo Tsai [Tue, 30 Oct 2018 16:50:00 +0000 (16:50 +0000)]
Merge "Add AVX2 support for hbd 4-tap interpolation filter."
Jingning Han [Tue, 30 Oct 2018 04:44:09 +0000 (04:44 +0000)]
Merge "Properly space qp in q mode for multi-layer ARF"
Johann [Thu, 25 Oct 2018 20:37:50 +0000 (13:37 -0700)]
vpx postproc: rewrite in intrinsics
About 10% faster on 64-bit but about 10% slower on 32-bit.
Removes the assembly usage of vpx_rv.
Change-Id: I214698fb5677f615dee0a8f5f5bb8f64daf2565e
Jingning Han [Mon, 29 Oct 2018 21:08:20 +0000 (14:08 -0700)]
Properly space qp in q mode for multi-layer ARF
Space the quantization parameter distribution according to the
layer depth for multi-layer ARF coding structure. This allows
lower layers to have relatively smaller quantization parameters
than higher layers. It improves the compression performance
in constant q mode for multi-layer ARF system:
        avg PSNR  overall PSNR   SSIM
lowres   -0.33%      -0.31%     -1.44%
midres   -0.29%      -0.38%     -1.14%
hdres    -0.27%      -0.49%     -1.02%
Change-Id: I9cfe2f27e6c0029c30614970a46de3045840264e
Johann Koenig [Mon, 29 Oct 2018 22:26:42 +0000 (22:26 +0000)]
Merge "vp8 bilinear: ensure non-16x16 arrays are aligned"
chiyotsai [Fri, 26 Oct 2018 21:14:28 +0000 (14:14 -0700)]
Add AVX2 support for hbd 4-tap interpolation filter.
Speed gain:
BIT DEPTH | 8TAP FPS | 4TAP FPS | PCT INC |
10 | 1.69 | 1.85 | 9.46% |
12 | 1.64 | 1.78 | 8.54% |
Speed test was done on jet.y4m at speed 1, profile 2, over 100 frames with
br=500.
Change-Id: I411e122553e2c466be7a26e64b4dd144efb884a9
Johann Koenig [Mon, 29 Oct 2018 18:59:56 +0000 (18:59 +0000)]
vp8 bilinear: ensure non-16x16 arrays are aligned
The 16x16 array was already made aligned. The 8xN and 4x4 functions
use aligned loads/stores on their internal arrays as well.
BUG=webm:1570
Change-Id: I9cfe53d7c8ed76e8854c2688eb9a509b876471d8
Johann Koenig [Mon, 29 Oct 2018 18:55:52 +0000 (18:55 +0000)]
Merge "vp8 bilinear: ensure temp array is aligned"
Sai Deng [Mon, 29 Oct 2018 17:14:09 +0000 (17:14 +0000)]
Merge "Enable 10 bit tpl support"
Johann [Mon, 29 Oct 2018 16:21:15 +0000 (09:21 -0700)]
vp8 bilinear: ensure temp array is aligned
Loads and stores to this array require 16-byte alignment.
BUG=webm:1570
Change-Id: I82c7d21c9539a108930fd030d79caaa0bcd1eeb3
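SSE2 aligned loads and stores such as `_mm_load_si128` and `_mm_store_si128` fault on addresses that are not 16-byte aligned, so any stack temporary they touch must be declared with that alignment. libvpx normally uses its own `DECLARE_ALIGNED` macro for this. The sketch below uses C11 `_Alignas` instead so it stays self-contained. The two-pass 8-wide bilinear filter, its names, and the 7-bit filter shift are illustrative assumptions, not the vp8 code itself.

```c
#include <assert.h>
#include <stdint.h>

#define BILINEAR_SHIFT 7 /* assumed: 2-tap kernels sum to 1 << 7 */

static int is_aligned16(const void *p) { return ((uintptr_t)p & 15u) == 0; }

/* Hypothetical 8-wide, two-pass bilinear filter. The horizontal pass writes
 * h + 1 rows into an aligned stack buffer; the vertical pass reads it back.
 * src must provide h + 1 rows of 9 readable pixels. In a SIMD version both
 * passes could use aligned 16-byte accesses on temp. */
static void bilinear_8xh(const uint8_t *src, int src_stride,
                         const int h_filter[2], const int v_filter[2],
                         uint8_t *dst, int dst_stride, int h) {
  _Alignas(16) uint16_t temp[(8 + 1) * 8];
  assert(h >= 1 && h <= 8);
  assert(is_aligned16(temp));
  for (int r = 0; r < h + 1; ++r) {
    for (int c = 0; c < 8; ++c) {
      const uint8_t *s = src + r * src_stride + c;
      temp[r * 8 + c] = (uint16_t)((s[0] * h_filter[0] + s[1] * h_filter[1] +
                                    (1 << (BILINEAR_SHIFT - 1))) >>
                                   BILINEAR_SHIFT);
    }
  }
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < 8; ++c) {
      const uint16_t *t = temp + r * 8 + c;
      dst[r * dst_stride + c] =
          (uint8_t)((t[0] * v_filter[0] + t[8] * v_filter[1] +
                     (1 << (BILINEAR_SHIFT - 1))) >>
                    BILINEAR_SHIFT);
    }
  }
}
```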
Johann Koenig [Mon, 29 Oct 2018 02:07:20 +0000 (02:07 +0000)]
Merge "remove "register" keyword"
Jingning Han [Sat, 27 Oct 2018 03:56:18 +0000 (03:56 +0000)]
Merge "Remove unused macros from vp9_firstpass.c"
sdeng [Wed, 24 Oct 2018 23:23:24 +0000 (16:23 -0700)]
Enable 10 bit tpl support
          lowres_bd10  midres_bd10
avg_psnr    -0.897       -1.261
ovr_psnr    -0.975       -1.349
Change-Id: Id54f2c419f4edaa91e89ffea52b4038b1d94e563
Johann [Fri, 26 Oct 2018 21:55:26 +0000 (14:55 -0700)]
remove "register" keyword
This has been deprecated for a long time. C++17 removes it as a storage-class specifier and reserves the keyword for future use.
Change-Id: Iade6bebce03a50b76061695f9e634a107cd989cd
Harish Mahendrakar [Fri, 26 Oct 2018 18:31:41 +0000 (18:31 +0000)]
Merge "Add Memory to Enable Row Decode"
Jingning Han [Fri, 26 Oct 2018 18:03:31 +0000 (11:03 -0700)]
Remove unused macros from vp9_firstpass.c
Change-Id: If5267a8c71113b171b7bddda5b49f0326c4266b8
Johann [Thu, 25 Oct 2018 19:23:03 +0000 (12:23 -0700)]
vp8 bilinear: rewrite 4x4
About 20% faster than the MMX version. Removes the last usage of
vp8_bilinear_filters_x86_[48].
Change-Id: Iee976fab9655d0020440f26c4403ce50103af913
Johann Koenig [Thu, 25 Oct 2018 19:59:06 +0000 (19:59 +0000)]
Merge "vp8 bilinear: rewrite 16x16"
Chi Yo Tsai [Thu, 25 Oct 2018 18:25:09 +0000 (18:25 +0000)]
Merge "Add AVX2 support for 4-tap interpolation filter."
Johann [Wed, 24 Oct 2018 22:48:32 +0000 (15:48 -0700)]
vp8 bilinear: rewrite 16x16
Marginally faster. Most importantly it drops a dependency on an
external symbol (vp8_bilinear_filters_x86_8).
Change-Id: Iff022e718720f1f0eeced6201a1ad69a9c9c4f45
Johann Koenig [Thu, 25 Oct 2018 17:13:50 +0000 (17:13 +0000)]
Merge "vp8 bilinear: rewrite in intrinsics"
Ritu Baldwa [Wed, 10 Oct 2018 10:55:51 +0000 (16:25 +0530)]
Add Memory to Enable Row Decode
Row based multi-threading needs extra memory to store the parsed
coefficients, partitions and eob. This commit adds that memory.
Change-Id: I13fa4a6ada2ec3048bc973e465055b832429388f
Jingning Han [Thu, 25 Oct 2018 00:01:52 +0000 (00:01 +0000)]
Merge "Enable tpl model to support multi-layer ARF"
Jingning Han [Thu, 25 Oct 2018 00:01:46 +0000 (00:01 +0000)]
Merge "Reset frame update flags after qp estimate in tpl"
Jingning Han [Thu, 25 Oct 2018 00:01:41 +0000 (00:01 +0000)]
Merge "Bypass processing on use existing frame"
Jingning Han [Thu, 25 Oct 2018 00:01:35 +0000 (00:01 +0000)]
Merge "Fix frame offset computation for GOP extension"
Jingning Han [Thu, 25 Oct 2018 00:01:29 +0000 (00:01 +0000)]
Merge "Refactor gop_length use case in tpl model"
Johann [Wed, 24 Oct 2018 19:22:35 +0000 (12:22 -0700)]
vp8 bilinear: rewrite in intrinsics
8x8 is 15% faster than the assembly. 8x4 is 200% faster than MMX.
Remove MMX version.
Change-Id: I55642ebd276db265911f2c79616177a3a9a7e04f
Chi Yo Tsai [Wed, 24 Oct 2018 16:36:20 +0000 (16:36 +0000)]
Merge "Clean up vpx_dsp/x86/convolve_sse2.h"
Jingning Han [Wed, 24 Oct 2018 03:30:35 +0000 (20:30 -0700)]
Enable tpl model to support multi-layer ARF
Enable temporal dependency model for the base layer ARF. It
improves the multi-layer ARF compression performance (results
are tested in speed 0 vbr mode):
        avg PSNR  overall PSNR   SSIM
lowres   -0.40%      -0.46%     -0.32%
midres   -0.59%      -0.68%     -0.45%
720p     -0.55%      -0.59%     -1.07%
Change-Id: I7790b89ccfb6e61f9b7965f34d348c7440220dd0
chiyotsai [Tue, 23 Oct 2018 19:42:21 +0000 (12:42 -0700)]
Add AVX2 support for 4-tap interpolation filter.
Performance:
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.491|1.902|1.772|1.479|
HORZ |1.145|1.521|1.757|1.497|
VERT |1.176|1.614|1.707|1.467|
Each number in the chart above is 8-tap function time / 4-tap function time.
The frame rate measured on jets.y4m over 100 frames at speed 1 increased from
3.72 fps to 3.91 fps (about a 5% increase).
Change-Id: Ic0ad275cf32fafeefd0a89811badd8adff2134a0
chiyotsai [Thu, 18 Oct 2018 16:51:56 +0000 (09:51 -0700)]
Clean up vpx_dsp/x86/convolve_sse2.h
Removes unnecessary includes and rewords some functions/comments.
Change-Id: Ied557d7faa9d845d38255e6e3e0e3fe1395276e1
Jingning Han [Tue, 23 Oct 2018 23:24:28 +0000 (16:24 -0700)]
Reset frame update flags after qp estimate in tpl
After the frame quantizer estimate runs in the tpl model, reset the
flags to the values actually assigned to the current coding frame. This
avoids certain frame update flags being overwritten by the updates of
other frame types.
Change-Id: Idde2ba1108f1f68747b14149b211f882965c99f0
Yunqing Wang [Tue, 23 Oct 2018 22:29:46 +0000 (22:29 +0000)]
Merge "Use 8-tap interp filter in temporal filtering"
Yunqing Wang [Tue, 23 Oct 2018 19:30:13 +0000 (12:30 -0700)]
Use 8-tap interp filter in temporal filtering
Use the 8-tap interpolation filter in temporal filtering to get a more
accurate motion search result. 8-tap sharp gave slightly better results
than 8-tap regular.
Speed 0 borg test showed that
        avg_psnr  ovr_psnr   ssim
hdres:   -0.160    -0.157   -0.173
midres:  -0.083    -0.061   -0.183
lowres:  -0.077    -0.099   -0.204
Speed tests showed no noticeable change in encoder time.
Change-Id: I97dc3c4864b5a5675a6c1e3952799b81eedd7d93
Jingning Han [Tue, 23 Oct 2018 19:16:41 +0000 (19:16 +0000)]
Merge "Remove empty else branch in mode_estimation"
Jingning Han [Mon, 22 Oct 2018 23:25:43 +0000 (16:25 -0700)]
Bypass processing on use existing frame
Using show existing frame requires no further operation on that
coding frame, so bypass the corresponding processing.
Change-Id: Ia092027a8a543be0ca54c00b4d51e453039712b8
Jingning Han [Mon, 22 Oct 2018 21:21:48 +0000 (14:21 -0700)]
Fix frame offset computation for GOP extension
Properly compute the extended GOP frames' buffer offsets.
Change-Id: I9aed14f4b8d623f1832e782828dce07aa546507d
Jingning Han [Mon, 22 Oct 2018 17:37:09 +0000 (10:37 -0700)]
Refactor gop_length use case in tpl model
Make it support both single- and multi-layer ARF GOP structure.
Change-Id: I760a95804d1b583b057120f6d6be65195a0e6c19
Jingning Han [Tue, 23 Oct 2018 05:51:48 +0000 (22:51 -0700)]
Remove empty else branch in mode_estimation
Change-Id: Iefa184aae80b920b054e3e922a77244c2b0d4b61
Jingning Han [Tue, 23 Oct 2018 02:28:22 +0000 (02:28 +0000)]
Merge "Use the proper gfu_boost factor to compute rd_mult"
Jingning Han [Mon, 22 Oct 2018 16:28:04 +0000 (09:28 -0700)]
Use the proper gfu_boost factor to compute rd_mult
Update the Lagrangian multiplier according to the gfu_boost factor
assigned per frame. It improves the multi-layer ARF compression
performance (results below shown for speed 0):
        avg PSNR  overall PSNR   SSIM
lowres   -0.08%       0.02%     -0.28%
midres   -0.08%       0.03%     -0.22%
hdres    -0.19%      -0.10%     -0.39%
nflx2k   -0.29%      -0.18%     -0.85%
Change-Id: Ifeb4b14918f880ba011ea41c1454ab00504f8855
Hui Su [Fri, 19 Oct 2018 16:48:40 +0000 (16:48 +0000)]
Merge "ML_VAR_PARTITION: enable at speed 5"
Hui Su [Tue, 16 Oct 2018 03:45:07 +0000 (20:45 -0700)]
ML_VAR_PARTITION: enable at speed 5
When the ML_VAR_PARTITION experiment is turned on, replace
REFERENCE_PARTITION with ML_BASED_PARTITION at speed 5.
Coding gains (avg_psnr) compared to baseline:
ytlivehr  1.63%
ytlivelr  0.07%
Tested encoding speed with several clips from ytlivehr and ytlivelr
on a Linux desktop (rt, vbr, 4 threads). The encoder is on average
faster than baseline:
360p:   14% faster
720p:    7% faster
1080p: 1.5% faster
Change-Id: I39b00078176ff516f7306818f33ba2b1ea53dfa1
chiyotsai [Thu, 18 Oct 2018 16:34:20 +0000 (09:34 -0700)]
Changes 4-tap SSSE3 filter to 8-tap AVX2 filter.
AVX2's 8-tap filter is slightly faster than the 4-tap SSSE3 filter.
Change-Id: I5fc37c431670780108706b206b32c791828555c9
Chi Yo Tsai [Thu, 18 Oct 2018 18:19:41 +0000 (18:19 +0000)]
Merge "Add SSSE3 support for 4-tap interpolation filter"
Hui Su [Thu, 18 Oct 2018 16:46:15 +0000 (16:46 +0000)]
Merge "Enable rect partition search for HBD at speed 1"
chiyotsai [Wed, 17 Oct 2018 21:52:26 +0000 (14:52 -0700)]
Add SSSE3 support for 4-tap interpolation filter
Performance:
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.526|1.827|1.844|1.906|
HORZ |1.336|1.795|1.886|1.654|
VERT |1.443|1.539|2.139|2.190|
The ratio is SSSE3 8-tap time / SSSE3 4-tap time.
Change-Id: I01ed2ab494428256e918875774a459afecc5ec6a
Jingning Han [Thu, 18 Oct 2018 16:25:37 +0000 (16:25 +0000)]
Merge "Replace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size"
Yunqing Wang [Wed, 17 Oct 2018 23:11:46 +0000 (23:11 +0000)]
Merge "Optimize vp9_highbd_temporal_filter_apply_c"
Jingning Han [Wed, 17 Oct 2018 23:04:21 +0000 (16:04 -0700)]
Replace MAX_LAG_BUFFERS with MAX_ARF_GOP_SIZE for gop size
MAX_ARF_GOP_SIZE accurately reflects the maximum number of frames
processed per group of pictures. Use it to replace MAX_LAG_BUFFERS in
such use cases.
Change-Id: Id26f9b1b2b0c38f255dee19795356c387d06d033
Angie Chiang [Wed, 17 Oct 2018 23:10:50 +0000 (23:10 +0000)]
Merge changes I6d5c77af,I6bf504b4,Ie5dc5ea7,Ie6024b1a,If45fba8a, ...
* changes:
Add do_motion_search
Preserve code of doing mv search in raster order
Variant implementation of changing mv search order
Add feature_score_loc_sort
Init mv_[dist/cost]_sum in init_tpl_stats
Change mv search order according to feature_score
Angie Chiang [Wed, 17 Oct 2018 21:45:08 +0000 (14:45 -0700)]
Add do_motion_search
This will make the code cleaner.
Change-Id: I6d5c77af7261c39656b35ec40ac1451bbdbfb7a7
Chi Yo Tsai [Wed, 17 Oct 2018 21:35:14 +0000 (21:35 +0000)]
Merge "Adds SSE2 support for interpolation filter for width 4 and 8"
Angie Chiang [Wed, 17 Oct 2018 20:56:42 +0000 (13:56 -0700)]
Preserve code of doing mv search in raster order
With this change, there will be three versions of the mv search scheme
in the codebase simultaneously.
We will run further experiments to evaluate which version is better
in terms of visual quality and coding performance.
Change-Id: I6bf504b4551316ef10b8a341ab3ba14d0ec977ce
Hui Su [Wed, 17 Oct 2018 15:40:26 +0000 (08:40 -0700)]
Enable rect partition search for HBD at speed 1
This patch enables rectangular partition search on speed 1 for high
bit depth encoding. The encoding speed loss is reduced thanks to
recently added speed features.
This only affects speed 1 high bit-depth encoding.
Coding gains:
                   avg_psnr  ovr_psnr
lowres_bd10(480p)   1.34%     1.40%
midres_bd10(720p)   1.28%     1.33%
Average speed loss:
      QP=30  QP=40  QP=50  average
480p   2.5%   2.3%   2.6%   2.5%
720p   4.0%   3.9%   3.2%   3.7%
Change-Id: Id9cac4eea0769d94e093c9d170194659b3342d89
chiyotsai [Tue, 16 Oct 2018 22:45:05 +0000 (15:45 -0700)]
Adds SSE2 support for interpolation filter for width 4 and 8
Performance:
The chart below shows the speed relative to baseline
(baseline_time / new_time):
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.889|1.780|1.811|1.963|
HORZ |2.266|1.834|1.617|1.595|
VERT |2.043|2.190|2.373|2.485|
Change-Id: Ic4262222db78f013b94a8c61b46efb8520722927
Urvang Joshi [Wed, 17 Oct 2018 20:25:02 +0000 (20:25 +0000)]
Merge "For keyframe-only coding do not boost in q mode"
Urvang Joshi [Wed, 17 Oct 2018 18:48:10 +0000 (11:48 -0700)]
For keyframe-only coding do not boost in q mode
If we are using keyframe only coding - either coding a
single frame, or a sequence of keyframes - in the end-usage=q
mode, use the cq_level directly as the quality of each
coded frame, rather than boost them.
Ported from AV1:
563a0d1eb92bdc1e987df071a568d8406c4ffa92
Change-Id: I6dc929b8b4f0aa18e279139077f3a87958c92245
chiyotsai [Tue, 16 Oct 2018 19:26:34 +0000 (12:26 -0700)]
Refactor SSE2 Code for 4-tap interpolation filter on width 16.
Some repeated code is refactored into inline functions. No performance
degradation is observed. These inline functions can be reused for width 8
and width 4.
Change-Id: Ibf08cc9ebd2dd47bd2a6c2bcc1616f9d4c252d4d
Yunqing Wang [Sat, 13 Oct 2018 00:21:23 +0000 (17:21 -0700)]
Optimize vp9_highbd_temporal_filter_apply_c
Following the previous patch:
(https://chromium-review.googlesource.com/c/webm/libvpx/+/1277913),
this patch modifies the highbd version of the temporal filter apply
function in a similar way.
Change-Id: I2bb6f1fff6e32bca86f7139a497181d34aa9f3ec
chiyotsai [Wed, 17 Oct 2018 00:50:37 +0000 (17:50 -0700)]
Add SSE2 support for 4-tap interpolation filter for width 16.
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c
Angie Chiang [Tue, 16 Oct 2018 19:31:13 +0000 (12:31 -0700)]
Variant implementation of changing mv search order
We start the mv search from the block with the highest feature score,
then move on to that block's neighbors, ordering the search by their
feature scores.
We use a max heap to achieve this.
This feature is under the flag USE_PQSORT.
Change-Id: Ie5dc5ea715b0f9a7a594e5080a7cb4f5309f5597
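A max heap keyed on feature score yields the "visit the best remaining candidate next" order described above. The sketch below is a generic binary max heap over (score, row, col) entries. The struct and function names are hypothetical and this is not the USE_PQSORT code itself. The neighbor-expansion step is left to the caller.

```c
#include <assert.h>

typedef struct {
  int score;
  int row, col;
} heap_entry; /* hypothetical: block location keyed by feature score */

typedef struct {
  heap_entry *data;
  int size, capacity;
} max_heap;

static void heap_swap(heap_entry *a, heap_entry *b) {
  const heap_entry t = *a;
  *a = *b;
  *b = t;
}

/* Insert e, then sift it up until its parent's score is not smaller. */
static void heap_push(max_heap *h, heap_entry e) {
  assert(h->size < h->capacity);
  int i = h->size++;
  h->data[i] = e;
  while (i > 0) {
    const int parent = (i - 1) / 2;
    if (h->data[parent].score >= h->data[i].score) break;
    heap_swap(&h->data[parent], &h->data[i]);
    i = parent;
  }
}

/* Remove and return the highest-score entry, then sift the new root down. */
static heap_entry heap_pop(max_heap *h) {
  assert(h->size > 0);
  const heap_entry top = h->data[0];
  h->data[0] = h->data[--h->size];
  int i = 0;
  for (;;) {
    const int l = 2 * i + 1, r = l + 1;
    int largest = i;
    if (l < h->size && h->data[l].score > h->data[largest].score) largest = l;
    if (r < h->size && h->data[r].score > h->data[largest].score) largest = r;
    if (largest == i) break;
    heap_swap(&h->data[i], &h->data[largest]);
    i = largest;
  }
  return top;
}
```

Popping the top entry, searching that block, and pushing its not-yet-visited neighbors produces the neighbor-aware order. A visited map (not shown) keeps blocks from being pushed twice.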
Angie Chiang [Mon, 15 Oct 2018 19:25:22 +0000 (12:25 -0700)]
Add feature_score_loc_sort
This CL facilitates the upcoming change, a variant implementation
of changing the mv search order according to feature score.
Change-Id: Ie6024b1a5ec02343aea6aa81fc14f94e2e515d06
Angie Chiang [Fri, 12 Oct 2018 22:37:26 +0000 (15:37 -0700)]
Init mv_[dist/cost]_sum in init_tpl_stats
Change-Id: If45fba8a74186803eec09da7dbaf2e1fe4e9e156
Angie Chiang [Thu, 11 Oct 2018 00:43:22 +0000 (17:43 -0700)]
Change mv search order according to feature_score
Sort the feature scores in descending order.
Do the mv search from blocks with higher scores to blocks with
lower scores.
Change-Id: I47a87cd66ea3e40d8c8fc55a7517ab8aa10fdb94
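The descending sort can be expressed with the standard `qsort` and a comparator that places higher scores first. `score_loc` and its fields are hypothetical stand-ins for whatever struct pairs a feature score with a block position.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int feature_score;
  int mi_row, mi_col;
} score_loc; /* hypothetical: block position with its feature score */

/* qsort comparator: higher feature_score sorts first. */
static int compare_score_desc(const void *a, const void *b) {
  const int sa = ((const score_loc *)a)->feature_score;
  const int sb = ((const score_loc *)b)->feature_score;
  return (sb > sa) - (sb < sa);
}

static void sort_by_score_desc(score_loc *locs, size_t n) {
  assert(n == 0 || locs != NULL);
  qsort(locs, n, sizeof(*locs), compare_score_desc);
}
```

`qsort` is not stable, so blocks with equal scores come out in an unspecified order. A tie-break on position could be added to the comparator if a deterministic order matters.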
Wan-Teh Chang [Wed, 17 Oct 2018 14:43:22 +0000 (14:43 +0000)]
Merge "Reduce the cpi->scaled_ref_idx array size by 1."
Jingning Han [Tue, 16 Oct 2018 21:24:17 +0000 (21:24 +0000)]
Merge "Refactor tpl dependency model to support multi-layer ARF updates"
Jingning Han [Tue, 16 Oct 2018 21:23:52 +0000 (21:23 +0000)]
Merge "Refactor GOP reference frame ordering for tpl model"
Jingning Han [Tue, 16 Oct 2018 21:07:55 +0000 (21:07 +0000)]
Merge "Record gop size"
Hui Su [Tue, 16 Oct 2018 20:58:45 +0000 (20:58 +0000)]
Merge "Fix a bug in ml_prune_rect_partition()"
Yunqing Wang [Tue, 16 Oct 2018 17:55:37 +0000 (17:55 +0000)]
Merge "Fix the filter tap calculation in mips optimizations"
Hui Su [Tue, 16 Oct 2018 16:50:13 +0000 (09:50 -0700)]
Fix a bug in ml_prune_rect_partition()
The quantization step size should be scaled properly for high bit depth
settings.
This only affects speed 0.
Encoder speed change is almost neutral.
There is a small coding gain of 0.09%.
Change-Id: I96b2bae03a53ce8ccd6428e3a050cfe18e06a024
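The commit does not show the exact scaling, but one natural normalization (assumed here) uses the fact that a pixel value at bit depth bd is about 2^(bd - 8) times its 8-bit equivalent. Dividing the quantization step by that factor keeps features derived from it on the scale the pruning model expects. `qstep_to_8bit_scale` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: bring a high bit depth quantization step back to the
 * 8-bit scale by dividing by 2^(bit_depth - 8). */
static int qstep_to_8bit_scale(int qstep, int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  assert(qstep >= 0);
  return qstep >> (bit_depth - 8);
}
```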