platform/upstream/libvpx.git
Jerome Jiang [Tue, 7 Nov 2017 21:00:01 +0000 (13:00 -0800)]
vp9: Fix mem rel for non-ref for external buffer.

Release frame buffers for non-ref when the decoder is destroyed.

Enable the non ref test.

BUG=b/68819248

Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa

Jerome Jiang [Wed, 8 Nov 2017 01:20:34 +0000 (17:20 -0800)]
vp9: Add nonref frame buffer test.

The new test runs an SVC bitstream which has non-ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.

Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.

Disabled for now. Will be enabled with the fix.

BUG=b/68819248

Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
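
The acquire/release accounting described in this commit can be sketched as follows. This is a minimal illustration, not the test's actual code: the types frame_buffer and buffer_counts and the three functions are hypothetical stand-ins for the external frame buffer callbacks and the test's bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an external frame buffer handle. */
typedef struct {
  unsigned char *data;
  size_t size;
} frame_buffer;

/* Hypothetical bookkeeping shared by both callbacks. */
typedef struct {
  int num_acquired;
  int num_released;
} buffer_counts;

/* Called when the decoder needs a buffer; returns 0 on success. */
static int get_buffer(buffer_counts *counts, size_t min_size,
                      frame_buffer *fb) {
  assert(min_size > 0);
  fb->data = (unsigned char *)calloc(min_size, 1);
  if (fb->data == NULL) return -1;
  fb->size = min_size;
  ++counts->num_acquired;
  return 0;
}

/* Called when the decoder no longer references the buffer. */
static int release_buffer(buffer_counts *counts, frame_buffer *fb) {
  free(fb->data);
  fb->data = NULL;
  fb->size = 0;
  ++counts->num_released;
  return 0;
}

/* Nonzero when every acquired buffer has been released. */
static int all_buffers_released(const buffer_counts *counts) {
  return counts->num_acquired == counts->num_released;
}
```

A leak of the kind this test targets would show up as all_buffers_released() returning 0 after the decoder is destroyed.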

Marco [Wed, 25 Oct 2017 22:45:11 +0000 (15:45 -0700)]
Compound prediction mode for nonrd pickmode.

Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.

Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.

avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.

Small/negligible decrease in speed.

Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d

Jerome Jiang [Wed, 1 Nov 2017 16:58:01 +0000 (16:58 +0000)]
Merge "vp9: Move allocation of vt2 after early exits."

Jerome Jiang [Tue, 31 Oct 2017 23:53:46 +0000 (16:53 -0700)]
vp9: Move allocation of vt2 after early exits.

Remove the memory deallocation on the early exits.

Change-Id: I00b4a814ae6705105ecab89644d055ca3311d9f4
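
A minimal sketch of the pattern this commit describes: when the heap allocation happens after the early exits, those exits have nothing to free. The names scratch_tree and process_block are hypothetical; scratch_tree only stands in for vt2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical large scratch structure, standing in for vt2. */
typedef struct {
  int data[4096];
} scratch_tree;

/* Returns 0 if the early exit was taken, 1 if the full path ran,
 * -1 on allocation failure. *allocations counts heap allocations. */
static int process_block(int skip_early, int *allocations) {
  scratch_tree *scratch;

  assert(allocations != NULL);
  if (skip_early) return 0; /* nothing allocated yet, nothing to free */

  scratch = (scratch_tree *)calloc(1, sizeof(*scratch));
  if (scratch == NULL) return -1;
  ++*allocations;
  scratch->data[0] = 1; /* placeholder for the real work */
  free(scratch);
  return 1;
}
```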

Jerome Jiang [Tue, 31 Oct 2017 21:42:18 +0000 (21:42 +0000)]
Merge "vp9: Reduce stack usage of choose_partitioning."

Jerome Jiang [Tue, 31 Oct 2017 02:21:24 +0000 (19:21 -0700)]
vp9: Reduce stack usage of choose_partitioning.

Move vt2 to heap.
Reduce the stack usage from ~87K to ~44K.

BUG=b/68362457

Change-Id: I8f5f93712934d59a8cc4564378172d409a736a2e

Jerome Jiang [Mon, 30 Oct 2017 23:39:41 +0000 (23:39 +0000)]
Merge "vp9: Reduce stack usage of choose_partioning."

Jerome Jiang [Mon, 30 Oct 2017 20:24:29 +0000 (13:24 -0700)]
vp9: Reduce stack usage of choose_partioning.

Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.

This reduces the stack usage from ~131K to ~87K.

BUG=b/68362457

Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
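
The effect of the narrower types can be illustrated with two struct layouts. This is a sketch under assumptions: wide_node and narrow_node are hypothetical, and the two int fields are included only to make the layout plausible. Narrowing is only safe if the accumulated values fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout before the change: 64-bit accumulators. */
typedef struct {
  int64_t sum_square_error;
  int64_t sum_error;
  int log2_count;
  int variance;
} wide_node;

/* Hypothetical node layout after the change: 32-bit accumulators. */
typedef struct {
  uint32_t sum_square_error;
  int32_t sum_error;
  int log2_count;
  int variance;
} narrow_node;

/* Bytes saved for each node kept on the stack. */
static size_t bytes_saved_per_node(void) {
  return sizeof(wide_node) - sizeof(narrow_node);
}
```

Because a variance tree holds many such nodes, a per-node saving multiplies across the whole structure.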

James Zern [Wed, 20 Jul 2016 03:56:25 +0000 (20:56 -0700)]
vp8: correct if/else '{' placement

swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.

Change-Id: I67a45596f80a12389faca49c5be440875092a7df

Scott LaVarnway [Thu, 26 Oct 2017 16:45:06 +0000 (09:45 -0700)]
vpx: hadamard: use ptrdiff_t instead of int for stride

Eliminates the following instruction for the x86 (64 bit)
intrinsic code:

movslq %esi,%rax

Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
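
A minimal sketch of the idea, assuming a 64-bit target: an int stride has to be sign-extended to pointer width before it is used in address arithmetic (the movslq above), while a ptrdiff_t stride already has that width. The function sum_first_column is a hypothetical example, not the hadamard code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the first column of an 8-row block. The stride is pointer-width,
 * so it can be used for addressing without widening first. */
static int sum_first_column(const int16_t *src, ptrdiff_t stride) {
  int sum = 0;
  int r;
  assert(src != NULL);
  for (r = 0; r < 8; ++r) sum += src[r * stride];
  return sum;
}
```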

Kyle Siefring [Tue, 24 Oct 2017 19:22:36 +0000 (19:22 +0000)]
Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"

Kyle Siefring [Sun, 22 Oct 2017 23:34:19 +0000 (19:34 -0400)]
Optimize convolve8 SSSE3 and AVX2 intrinsics

Changed the intrinsics to perform summation similar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668

Scott LaVarnway [Mon, 23 Oct 2017 22:02:59 +0000 (22:02 +0000)]
Merge "vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix"

Marco [Mon, 23 Oct 2017 17:58:28 +0000 (10:58 -0700)]
vp9-svc: Allow for adapt_rd_thresh with row-mt.

Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allows the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.

~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.

Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc

Scott LaVarnway [Fri, 20 Oct 2017 21:46:41 +0000 (14:46 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix

Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.

Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b

Scott LaVarnway [Fri, 20 Oct 2017 12:21:15 +0000 (05:21 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() improvements

~10% performance gain.  Fixed the cosmetics noted in the
previous commit.

Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13

Scott LaVarnway [Thu, 19 Oct 2017 23:32:10 +0000 (23:32 +0000)]
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"

Paul Wilkins [Thu, 19 Oct 2017 10:07:45 +0000 (10:07 +0000)]
Merge "Corpus VBR tweak for undershoot."

Paul Wilkins [Thu, 19 Oct 2017 10:07:30 +0000 (10:07 +0000)]
Merge "Increase precision of some debug stats output for corpus VBR."

Paul Wilkins [Thu, 19 Oct 2017 10:06:33 +0000 (10:06 +0000)]
Merge "Prevent double application of min rate in two pass."

Scott LaVarnway [Thu, 19 Oct 2017 00:12:57 +0000 (17:12 -0700)]
vpx: [x86] add vpx_hadamard_16x16_avx2()

This version is ~1.91x faster than the sse2 version.  When
highbitdepth is enabled, it is ~1.74x.

Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd

Jerome Jiang [Wed, 18 Oct 2017 19:39:26 +0000 (19:39 +0000)]
Merge "Add datarate test for vp8 ROI."

Jerome Jiang [Tue, 17 Oct 2017 21:43:07 +0000 (14:43 -0700)]
Add datarate test for vp8 ROI.

BUG=webm:1470

Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13

Jerome Jiang [Wed, 18 Oct 2017 18:16:44 +0000 (18:16 +0000)]
Merge "vp8: Enable use of ROI map."

Kyle Siefring [Wed, 18 Oct 2017 16:19:52 +0000 (16:19 +0000)]
Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"

Shiyou Yin [Wed, 18 Oct 2017 00:55:36 +0000 (00:55 +0000)]
Merge "vp8: [loongson] optimize idct with mmi"

Jerome Jiang [Thu, 12 Oct 2017 22:03:22 +0000 (15:03 -0700)]
vp8: Enable use of ROI map.

Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.

BUG=webm:1470

Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda

Linfeng Zhang [Tue, 17 Oct 2017 16:03:29 +0000 (16:03 +0000)]
Merge changes I17fff122,Ic149e3cb

* changes:
  Add 4 to 3 scaling SSSE3 optimization
  Test extreme inputs in frame scale functions

Linfeng Zhang [Tue, 17 Oct 2017 16:03:07 +0000 (16:03 +0000)]
Merge "Generalize CheckScalingFiltering in ConvolveTest"

Kyle Siefring [Sat, 14 Oct 2017 20:26:35 +0000 (16:26 -0400)]
Refactor x86/vpx_subpixel_8t_intrin_avx2.c

Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305

Shiyou Yin [Wed, 13 Sep 2017 08:20:21 +0000 (16:20 +0800)]
vp8: [loongson] optimize idct with mmi

1. vp8_dequant_idct_add_y_block_mmi
2. vp8_dequant_idct_add_uv_block_mmi

Change-Id: I9987147be2685ac79d4b045d1d56f6709ee1223c

Linfeng Zhang [Wed, 11 Oct 2017 18:59:04 +0000 (11:59 -0700)]
Add 4 to 3 scaling SSSE3 optimization

Note this change will trigger the different C version on SSSE3 and
generate different scaled output.

Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().

Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194

Marco [Fri, 13 Oct 2017 22:31:02 +0000 (15:31 -0700)]
Adjust threshold in gf_boost for 1 pass vbr

Small increase in sad_thresh1; avoids some false
detection of possible scene changes within lag.

Small improvement in few clips on ytlive, otherwise neutral change.

Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b

Paul Wilkins [Fri, 13 Oct 2017 15:59:58 +0000 (15:59 +0000)]
Merge "Further Corpus VBR change."

Paul Wilkins [Fri, 13 Oct 2017 15:59:45 +0000 (15:59 +0000)]
Merge "Corpus Wide VBR test implementation."

paulwilkins [Wed, 11 Oct 2017 09:12:20 +0000 (10:12 +0100)]
Corpus VBR tweak for undershoot.

In cases of strong undershoot adjust Q range down faster.

Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03

Shiyou Yin [Fri, 13 Oct 2017 00:37:57 +0000 (00:37 +0000)]
Merge "vp8: [loongson] optimize dct with mmi"

Marco Paniconi [Thu, 12 Oct 2017 19:36:33 +0000 (19:36 +0000)]
Merge "Adjust to scene detection for 1 pass vbr."

Marco [Tue, 10 Oct 2017 22:30:32 +0000 (15:30 -0700)]
Adjust to scene detection for 1 pass vbr.

Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.

No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.

Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026

Jerome Jiang [Thu, 12 Oct 2017 17:29:27 +0000 (17:29 +0000)]
Merge "vp9: use nonrd pick_intra for small blocks on keyframes."

Kyle Siefring [Thu, 12 Oct 2017 16:12:38 +0000 (16:12 +0000)]
Merge changes I38783d97,If5160c0c

* changes:
  Extend 16 wide AVX2 convolve8 code to support averaging.
  Add AVX2 version of vpx_convolve8_avg.

paulwilkins [Wed, 11 Oct 2017 10:56:44 +0000 (11:56 +0100)]
Increase precision of some debug stats output for corpus VBR.

Change-Id: I75841797cc0c215781b5b36e3a3e9f4b0e35ba63

Jerome Jiang [Thu, 12 Oct 2017 00:13:39 +0000 (17:13 -0700)]
vp9: use nonrd pick_intra for small blocks on keyframes.

Keyframe encoding is more than 2x faster.
Disabled on Speed 8.

Change-Id: I2157318b6ac8253fa5398322c72d98cd7fa9b2b6

Shiyou Yin [Wed, 13 Sep 2017 06:03:11 +0000 (14:03 +0800)]
vp8: [loongson] optimize dct with mmi

1. vp8_short_fdct4x4_mmi
2. vp8_short_fdct8x4_mmi
3. vp8_short_walsh4x4_mmi

Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb

Shiyou Yin [Thu, 12 Oct 2017 00:33:17 +0000 (00:33 +0000)]
Merge "vp8: [loongson] optimize quantize with mmi"

Marco [Wed, 11 Oct 2017 18:06:34 +0000 (11:06 -0700)]
Adjust threshold in datarate tests for 1 pass VBR

Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>

Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16

Linfeng Zhang [Wed, 11 Oct 2017 18:35:19 +0000 (11:35 -0700)]
Test extreme inputs in frame scale functions

Change-Id: Ic149e3cb59be2ee0f98a3fcfd83226ad5ea30c99

paulwilkins [Wed, 11 Oct 2017 09:31:57 +0000 (10:31 +0100)]
Prevent double application of min rate in two pass.

The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.

Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.

For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150):  +0.751, +1.099
New code (50-150):  +0.241, -0.009

Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
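
The change described above can be sketched as follows. This is an illustration under assumptions, not the two pass code: the function names and the idea of a single per-frame arf_share value are hypothetical. The point is that the limits are applied once, to the initial allocation, and the minimum is not re-applied after the ARF cost has been deducted.

```c
#include <assert.h>

/* Limits value to the range [low, high]. */
static int clamp_int(int value, int low, int high) {
  assert(low <= high);
  if (value < low) return low;
  if (value > high) return high;
  return value;
}

/* Initial per-frame allocation, limited to the min/max section limits. */
static int initial_frame_bits(int target_bits, int min_bits, int max_bits) {
  return clamp_int(target_bits, min_bits, max_bits);
}

/* Bits left after the frame's share of the ARF cost is deducted.
 * The minimum is deliberately not re-applied here, so the result may
 * fall below min_bits. */
static int bits_after_arf_share(int frame_bits, int arf_share) {
  const int residual = frame_bits - arf_share;
  return residual > 0 ? residual : 0;
}
```

Re-clamping the residual to the minimum would hand back bits that were already spent on the ARF, which is how a low rate section could overshoot its target.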

Shiyou Yin [Mon, 11 Sep 2017 10:07:25 +0000 (18:07 +0800)]
vp8: [loongson] optimize quantize with mmi

1. vp8_fast_quantize_b_mmi
2. vp8_regular_quantize_b_mmi

Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646

Linfeng Zhang [Wed, 4 Oct 2017 16:55:56 +0000 (09:55 -0700)]
Add 4 to 1 scaling x86 optimization

Change-Id: I51c190f0a88685867df36912522e67bdae58a673

Jerome Jiang [Tue, 10 Oct 2017 23:02:05 +0000 (23:02 +0000)]
Merge "Fix alignment in vpx_image without external allocation."

Jerome Jiang [Tue, 10 Oct 2017 02:33:03 +0000 (19:33 -0700)]
Fix alignment in vpx_image without external allocation.

This restores behaviors prior to
<40c8fde Fix image width alignment. Enable ImageSizeSetting test.>.

BUG=b/64710201

Change-Id: I559557afe80d5ff5ea6ac24021561715068e7786

Linfeng Zhang [Tue, 10 Oct 2017 19:13:55 +0000 (12:13 -0700)]
Generalize CheckScalingFiltering in ConvolveTest

Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.

Change-Id: I1042564d1d390589ca203070fe332c6da3315d75

Marco [Tue, 10 Oct 2017 00:53:21 +0000 (17:53 -0700)]
Adjustment to scene detection and key frame.

For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).

Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.

Change-Id: I12626f7627419ca14f9d0d249df86c7104438162

Linfeng Zhang [Tue, 10 Oct 2017 17:29:50 +0000 (17:29 +0000)]
Merge changes I9d4c1af5,I882da3a0

* changes:
  Rename some inline functions in NEON scaling
  Generalize 2:1 vp9_scale_and_extend_frame_ssse3()

paulwilkins [Thu, 17 Aug 2017 13:13:29 +0000 (14:13 +0100)]
Further Corpus VBR change.

Change to the bit allocation within a GF/ARF group.

Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).

This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.

Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
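
The two allocation policies contrasted in this commit can be sketched as below. This is a simplified illustration: allocate_even and allocate_by_score are hypothetical names, the special handling of the middle and last frames is left out, and integer division remainders are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Even split: every frame in the group receives the same share. */
static void allocate_even(int group_bits, int n, int *out) {
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) out[i] = group_bits / n;
}

/* Complexity-weighted split: each frame receives bits in proportion
 * to its individual complexity score. */
static void allocate_by_score(int group_bits, const int *score, int n,
                              int *out) {
  int64_t total = 0;
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) total += score[i];
  for (i = 0; i < n; ++i) {
    out[i] = total > 0 ? (int)((int64_t)group_bits * score[i] / total)
                       : group_bits / n;
  }
}
```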

paulwilkins [Fri, 19 May 2017 14:08:15 +0000 (15:08 +0100)]
Corpus Wide VBR test implementation.

This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.

At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef.  Ultimately this
would need to be controlled by command line parameters.

Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873

Kyle Siefring [Sun, 8 Oct 2017 03:25:03 +0000 (23:25 -0400)]
Extend 16 wide AVX2 convolve8 code to support averaging.

Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf

Linfeng Zhang [Wed, 4 Oct 2017 20:04:40 +0000 (13:04 -0700)]
Rename some inline functions in NEON scaling

Change-Id: I9d4c1af53d57f72fc716bacbe3b0965719c045ac

Linfeng Zhang [Mon, 9 Oct 2017 16:20:00 +0000 (16:20 +0000)]
Merge "Update vp9_scale_and_extend_frame_ssse3()"

Kyle Siefring [Sat, 7 Oct 2017 20:02:02 +0000 (16:02 -0400)]
Add AVX2 version of vpx_convolve8_avg.

vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
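
A scalar sketch of the final averaging step only, assuming the filtered pixel is combined with the existing destination pixel using a round-to-nearest average. The names average_round and average_into_dst are hypothetical, and the 8-tap filtering itself is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Average of two pixels, rounded to nearest (ties round up). */
static uint8_t average_round(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Averages a row of already-filtered pixels into the destination row. */
static void average_into_dst(const uint8_t *filtered, uint8_t *dst, int n) {
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) dst[i] = average_round(dst[i], filtered[i]);
}
```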

James Zern [Sat, 7 Oct 2017 19:08:26 +0000 (19:08 +0000)]
Merge "ppc: Add vpx_idct32x32_1024_add_vsx"

Marco Paniconi [Fri, 6 Oct 2017 22:41:34 +0000 (22:41 +0000)]
Merge "Revert "Speed >=5 real-time: add TM intra mode for high_source_sad.""

Marco Paniconi [Fri, 6 Oct 2017 22:14:56 +0000 (22:14 +0000)]
Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."

This reverts commit 9311ef18b4b4eff0da3adf9d702a34f489a270ff.

Reason for revert:
Notice small regression in some clips.
Will revisit in another change.

Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
>
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
>
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

TBR=marpan@google.com,builds@webmproject.org,jianj@google.com

Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true

Marco [Fri, 6 Oct 2017 17:53:40 +0000 (10:53 -0700)]
Adjust threshold in scene detection

For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.

Reduces possible false detection for scene cut detection.

Neutral/small change in metrics or speed for speed 5.

Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76

Marco Paniconi [Fri, 6 Oct 2017 06:29:46 +0000 (06:29 +0000)]
Merge "Speed >=5 real-time: add TM intra mode for high_source_sad."

Marco [Thu, 5 Oct 2017 19:58:51 +0000 (12:58 -0700)]
Speed >=5 real-time: add TM intra mode for high_source_sad.

Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.

Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

James Zern [Fri, 6 Oct 2017 05:30:16 +0000 (05:30 +0000)]
Merge "vpx_codec.h: namespace local defines"

James Zern [Thu, 5 Oct 2017 22:09:33 +0000 (15:09 -0700)]
vpx_codec.h: namespace local defines

add VPX_ to UNUSED/*DEPRECATED to avoid conflicts with other headers.

Change-Id: Ie16bdac3575bc1af57a05d37e65b994370585377

James Zern [Thu, 5 Oct 2017 22:02:51 +0000 (15:02 -0700)]
vp9_ethread_test: abort early/add more detailed output

If compare_fp_stats fails, report the 2 values and their index.

Change-Id: I927a832b7a1e24c392961093b7caee1134223def

Marco Paniconi [Thu, 5 Oct 2017 03:28:06 +0000 (03:28 +0000)]
Merge "Adjust threshold for adapt_partition for speed 6."

Marco [Thu, 5 Oct 2017 01:01:37 +0000 (18:01 -0700)]
Adjust threshold for adapt_partition for speed 6.

Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.

Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc

Marco Paniconi [Wed, 4 Oct 2017 23:36:27 +0000 (23:36 +0000)]
Merge "Avoid nonrd_pick_partition for speed >= 6."

Marco [Wed, 4 Oct 2017 22:27:45 +0000 (15:27 -0700)]
Avoid nonrd_pick_partition for speed >= 6.

For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.

Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308

Marco Paniconi [Wed, 4 Oct 2017 19:38:49 +0000 (19:38 +0000)]
Merge "Modify early exit for alt_ref in nonrd_pickmode."

Linfeng Zhang [Tue, 3 Oct 2017 23:09:19 +0000 (16:09 -0700)]
Generalize 2:1 vp9_scale_and_extend_frame_ssse3()

Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5

Linfeng Zhang [Tue, 3 Oct 2017 16:59:11 +0000 (09:59 -0700)]
Update vp9_scale_and_extend_frame_ssse3()

Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4

Marco [Wed, 4 Oct 2017 18:41:52 +0000 (11:41 -0700)]
Modify early exit for alt_ref in nonrd_pickmode.

For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.

Small gain in metrics, ~0.18%, no change in speed.

Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e

Linfeng Zhang [Wed, 4 Oct 2017 16:15:14 +0000 (16:15 +0000)]
Merge changes Id6a8c549,Ib1e0650b,Ic369dd86

* changes:
  Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
  Add vpx_dsp/x86/mem_sse2.h
  Add transpose_8bit_{4x4,8x8}() x86 optimization

Jerome Jiang [Wed, 4 Oct 2017 14:48:03 +0000 (14:48 +0000)]
Merge "Fix image width alignment. Enable ImageSizeSetting test."

Marco [Wed, 4 Oct 2017 00:14:24 +0000 (17:14 -0700)]
Enable arf usage for speed >= 6, 1 pass vbr.

For speed 6 on ytlive set:
On average, speed slowdown ~5%, quality gain ~2%.

Change-Id: Ia18237cc1d52c54d7e2cb3c71f571cf37ef61b44

Marco [Fri, 29 Sep 2017 18:34:00 +0000 (11:34 -0700)]
vp9: 1 pass vbr: Limit qpdelta on high_source_sad.

For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turn off the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altref_onepass is enabled.

Neutral/no change on metrics.

Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea

James Zern [Tue, 3 Oct 2017 21:47:49 +0000 (21:47 +0000)]
Merge "vpx: fix nasm build errors"

Scott LaVarnway [Sat, 30 Sep 2017 12:51:24 +0000 (05:51 -0700)]
vpx: fix nasm build errors

BUG=webm:1462,766721

Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98

Linfeng Zhang [Mon, 2 Oct 2017 21:29:06 +0000 (14:29 -0700)]
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c

Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac

Linfeng Zhang [Mon, 2 Oct 2017 20:46:15 +0000 (13:46 -0700)]
Add vpx_dsp/x86/mem_sse2.h

Add some load and store sse2 inline functions.

Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222

Marco [Tue, 3 Oct 2017 17:55:55 +0000 (10:55 -0700)]
Use adapt_partition for ARF in 1 pass.

For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.

Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.

Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536

Linfeng Zhang [Mon, 2 Oct 2017 20:01:56 +0000 (13:01 -0700)]
Add transpose_8bit_{4x4,8x8}() x86 optimization

Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878

Marco Paniconi [Tue, 3 Oct 2017 03:01:14 +0000 (03:01 +0000)]
Merge "ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode."

Marco [Mon, 2 Oct 2017 21:00:18 +0000 (14:00 -0700)]
ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode.

Speedup of ~2-3% on 1080p clips speed 6.
Neutral/negligible loss in metrics on ytlive.

Change-Id: I7ac47a4d8b58c566920bae29a94a0e8d59c36dee

Linfeng Zhang [Tue, 12 Sep 2017 18:49:58 +0000 (11:49 -0700)]
Add 4 to 3 scaling NEON optimization

Speed compared with the version calling vpx_scaled_2d_neon():
  ~1.7x in general
  ~2.8x for BILINEAR filter

BUG=webm:1419

Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6

Linfeng Zhang [Wed, 20 Sep 2017 17:58:39 +0000 (10:58 -0700)]
Specialize 4 to 3 frame scaling in C

Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.

Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
   Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
   (The drifting is 1/(3*16) in each step.)

BUG=webm:1419

Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
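
The two figures in the benefits list can be reproduced with a short calculation. The sketch assumes positions are tracked in 1/16-pel units, that the 4:3 step is truncated from 64/3 to 21, and that the phase starts at 0 and is reset at the start of each block; the names below are hypothetical.

```c
#include <assert.h>

/* Assumption: 1/16-pel positions, 4:3 step truncated from 64/3 to 21. */
enum { SUBPEL_UNITS = 16, STEP_Q4 = (16 * 4) / 3 };

typedef struct {
  int num;
  int den;
} fraction;

static int gcd_int(int a, int b) {
  while (b != 0) {
    const int t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Maximum drift, in pixels, inside a block of block_size outputs.
 * Each step loses (64/3 - 21)/16 = 1/(3*16) of a pixel. */
static fraction max_drift(int block_size) {
  fraction f;
  int g;
  assert(block_size >= 1);
  f.num = (block_size - 1) * (16 * 4 - 3 * STEP_Q4);
  f.den = 3 * SUBPEL_UNITS;
  g = gcd_int(f.num, f.den);
  f.num /= g;
  f.den /= g;
  return f;
}

/* Number of distinct phases (position mod 16) seen within one block. */
static int distinct_phases(int block_size) {
  int seen[SUBPEL_UNITS] = { 0 };
  int count = 0;
  int i;
  for (i = 0; i < block_size; ++i) {
    const int phase = (i * STEP_Q4) % SUBPEL_UNITS;
    if (!seen[phase]) {
      seen[phase] = 1;
      ++count;
    }
  }
  return count;
}
```

With a 16-wide block there are 15 steps, giving 15/48 = 5/16; with a 3-wide block there are 2 steps, giving 2/48 = 1/24.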

Scott LaVarnway [Mon, 2 Oct 2017 15:00:19 +0000 (15:00 +0000)]
Merge "vpxdsp: [x86] add highbd_d135_predictor functions"

Alexandra Hájková [Mon, 31 Jul 2017 19:07:22 +0000 (19:07 +0000)]
ppc: Add vpx_idct32x32_1024_add_vsx

Change-Id: I55cd0a1569ccc47a53d0ecf751aac259d510e10d

Marco [Fri, 29 Sep 2017 21:54:56 +0000 (14:54 -0700)]
Fix partition selection in speed features for arf overlay frame.

For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.

Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).

Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee

Marco [Fri, 29 Sep 2017 17:53:59 +0000 (10:53 -0700)]
Enable use_altref_onepass for speed 4 real-time mode.

Used for VBR mode with lag-in-frames > 0.
On ytlive set at speed 4: ~3% average gain.

Change-Id: I45dad1700bf8be9d8f177815dc062774f6f2f0de

Scott LaVarnway [Fri, 29 Sep 2017 13:34:16 +0000 (06:34 -0700)]
vpxdsp: [x86] add highbd_d135_predictor functions

C vs SSE2 speed gains:
_4x4 : ~1.81x

C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x

BUG=webm:1411

Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0

Scott LaVarnway [Wed, 20 Sep 2017 12:21:23 +0000 (05:21 -0700)]
vpxdsp: [x86] add highbd_d117_predictor functions

C vs SSE2 speed gains:
_4x4 : ~2.04x

C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x

BUG=webm:1411

Change-Id: I31d949695991c067dac89d91e0bed3e666c94993

Jerome Jiang [Wed, 27 Sep 2017 18:08:37 +0000 (11:08 -0700)]
Fix image width alignment. Enable ImageSizeSetting test.

BUG=b/64710201

Change-Id: I5465f6c6481d3c9a5e00fcab024cf4ae562b6b01