platform/upstream/libvpx.git
6 years agoDisable allow_partition_search_skip for speed 2.
paulwilkins [Thu, 16 Nov 2017 16:15:06 +0000 (16:15 +0000)]
Disable allow_partition_search_skip for speed 2.

When allow_partition_search_skip is set the two pass code
can optionally skip the partition search in the rd loop if the image
appears static (based on selection of 0,0 motion).

Unfortunately 0,0 motion does not necessarily mean that there are
no meaningful changes or that motion or intra modes will not be selected
in the second pass.

Disabling "allow_partition_search_skip" may hurt the encode speed a little
for a small number of clips but can have a big impact on compression.
The most notable example of this in our test sets is "bridge_close_cif"
where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
psnr-hvs.

Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024

6 years agoCode cleanup.
paulwilkins [Wed, 15 Nov 2017 17:07:28 +0000 (17:07 +0000)]
Code cleanup.

Removal of parameters to and code in calc_frame_boost() that is no
longer required.

No change to results from previous patch.

Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017

6 years agoRemove decay_accumulator clause from alt ref breakout.
paulwilkins [Wed, 15 Nov 2017 16:58:05 +0000 (16:58 +0000)]
Remove decay_accumulator clause from alt ref breakout.

The decay accumulator clause covers similar ground to the
new clause that tests the accumulated second reference error
so it has been removed to reduce complexity.

Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525

6 years agoAdd clause to alt ref group breakout.
paulwilkins [Wed, 15 Nov 2017 16:39:54 +0000 (16:39 +0000)]
Add clause to alt ref group breakout.

Add a clause to the breakout test for alt ref groups that
examines the size of the accumulated second reference
frame error compared to the cost of intra coding.

This clause causes a reduction in the average group length for many
clips. Alongside the change to the group length the minimum
boost is increased.

On balance the results are positive for psnr and psnr-hvs
but negative for ssim/fast ssim for the smaller image formats.

Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
husky (lowres) 6-17%). Most of the negative cases are lower motion
clips. A subsequent patch will hopefully help with those.

Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
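
The breakout clause described above can be sketched roughly as follows. This is only an illustration: the struct, the function name, the ratio parameter and the choice to accumulate the intra cost alongside the second reference error are assumptions, not the code in the patch.

```c
#include <stddef.h>

/* Hypothetical per-frame first pass statistics (names are illustrative). */
typedef struct {
  double sr_error;   /* prediction error against the second reference */
  double intra_cost; /* cost of coding the frame as intra */
} frame_stats;

/* Returns the number of frames to place in the alt ref group. The group is
 * ended once the accumulated second reference error exceeds `ratio` times
 * the accumulated intra cost, but never before `min_length` frames. */
static size_t alt_ref_group_length(const frame_stats *stats, size_t count,
                                   size_t min_length, double ratio) {
  double acc_sr_error = 0.0;
  double acc_intra_cost = 0.0;
  size_t i;
  for (i = 0; i < count; ++i) {
    acc_sr_error += stats[i].sr_error;
    acc_intra_cost += stats[i].intra_cost;
    if (i + 1 >= min_length && acc_sr_error > ratio * acc_intra_cost) {
      return i + 1;
    }
  }
  return count;
}
```

A clip whose second reference error stays small relative to intra cost keeps its full group length in this sketch, while a hard clip breaks out early, which matches the shorter average group length noted above.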

6 years agotiny_ssim.c : clang compile error fix
Scott LaVarnway [Tue, 14 Nov 2017 12:38:00 +0000 (04:38 -0800)]
tiny_ssim.c : clang compile error fix

Change-Id: Ic10ba580fd5da7d6ff7fa0f33db72fb0c1a97801

6 years agoMerge "add 10 and 12 bit to tiny_ssim"
James Bankoski [Tue, 14 Nov 2017 00:15:24 +0000 (00:15 +0000)]
Merge "add 10 and 12 bit to tiny_ssim"

6 years agoMerge "vp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION."
Jerome Jiang [Mon, 13 Nov 2017 21:04:41 +0000 (21:04 +0000)]
Merge "vp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION."

6 years agovp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION.
Jerome Jiang [Mon, 13 Nov 2017 19:05:20 +0000 (11:05 -0800)]
vp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION.

VPX_ENCODER_ABI_VERSION was bumped up in 93e83f.

Change-Id: Id5707f9f9db56fa96549bc8f54e1cfa04e7fa4cd

6 years agoadd 10 and 12 bit to tiny_ssim
Jim Bankoski [Mon, 13 Nov 2017 14:44:17 +0000 (06:44 -0800)]
add 10 and 12 bit to tiny_ssim

Change-Id: I92e4dba2d1682a0d77ad9a214ec4312b1cf4d42e

6 years agoNew content type to improve grain retention.
paulwilkins [Wed, 27 Sep 2017 17:17:18 +0000 (18:17 +0100)]
New content type to improve grain retention.

For the new VP9-only content type, adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.

With regard to the RD loop, the method favors modes where the
reconstruction variance is similar to the source variance. However it
is currently only applied to regions where the source variance is quite
low.

For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the latter in particular limit
the usefulness of the loop filter).

The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.

This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but causes a significant
drop in PSNR on many clips. This is to be expected because similar
but misaligned noise or texture will give a lower PSNR than a flat
noise free reconstruction. However, it is worth noting that most clips
show a strong gain in FAST SSIM.

The features are enabled on the vpxenc command line by setting
--tune-content=film.

VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.

Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a

6 years agoSmall parameter clean up.
paulwilkins [Mon, 6 Nov 2017 11:24:34 +0000 (11:24 +0000)]
Small parameter clean up.

Removed three parameters that are no longer needed in calls
to calc_arf_boost() and associated minor changes.

No impact on encode results.

Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb

6 years agoMerge "Fix to frames considered in arf boost calculation."
Paul Wilkins [Mon, 13 Nov 2017 16:36:43 +0000 (16:36 +0000)]
Merge "Fix to frames considered in arf boost calculation."

6 years agoMerge "CVBR command line option."
Paul Wilkins [Mon, 13 Nov 2017 16:32:39 +0000 (16:32 +0000)]
Merge "CVBR command line option."

6 years agovpx: [x86] add vpx_satd_avx2()
Scott LaVarnway [Fri, 10 Nov 2017 18:19:52 +0000 (10:19 -0800)]
vpx: [x86] add vpx_satd_avx2()

SSE2 intrinsic vs AVX2 intrinsic speed gains:
blocksize   16: ~1.33
blocksize   64: ~1.51
blocksize  256: ~3.03
blocksize 1024: ~3.71

Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
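
For orientation, a scalar reference for the quantity being accelerated might look like the sketch below. It assumes SATD here means the sum of absolute values of already transformed coefficients; the function name and the 16-bit coefficient type are illustrative, not taken from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute coefficient values. */
static int satd_ref(const int16_t *coeff, int length) {
  int satd = 0;
  int i;
  for (i = 0; i < length; ++i) {
    satd += abs(coeff[i]);
  }
  return satd;
}
```

A loop of this shape is a straightforward reduction, so wider vectors pay off more as the block size grows, in line with the speed gains listed above.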

6 years agoMerge "vpx: [x86] add vp9_block_error_fp_avx2()"
Scott LaVarnway [Fri, 10 Nov 2017 00:45:47 +0000 (00:45 +0000)]
Merge "vpx: [x86] add vp9_block_error_fp_avx2()"

6 years agoMerge "vp9-svc: Avoid minmax variance for non-reference frames."
Marco Paniconi [Fri, 10 Nov 2017 00:30:04 +0000 (00:30 +0000)]
Merge "vp9-svc: Avoid minmax variance for non-reference frames."

6 years agovp9-svc: Avoid minmax variance for non-reference frames.
Marco [Thu, 9 Nov 2017 23:24:10 +0000 (15:24 -0800)]
vp9-svc: Avoid minmax variance for non-reference frames.

For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.

Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.

Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.

Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5

6 years agoMerge "runtime error fix: bitdepth_conversion_avx2.h"
James Zern [Fri, 10 Nov 2017 00:15:03 +0000 (00:15 +0000)]
Merge "runtime error fix: bitdepth_conversion_avx2.h"

6 years agoMerge "vp9: SVC feature to use partition from lower resolution."
Jerome Jiang [Thu, 9 Nov 2017 23:28:44 +0000 (23:28 +0000)]
Merge "vp9: SVC feature to use partition from lower resolution."

6 years agovp9: SVC feature to use partition from lower resolution.
Jerome Jiang [Wed, 8 Nov 2017 23:12:44 +0000 (15:12 -0800)]
vp9: SVC feature to use partition from lower resolution.

For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.

Enabled for speed >= 7 and only for non-reference frames.

Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.

Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d

6 years agoruntime error fix: bitdepth_conversion_avx2.h
Scott LaVarnway [Thu, 9 Nov 2017 20:26:43 +0000 (12:26 -0800)]
runtime error fix: bitdepth_conversion_avx2.h

Change-Id: I7364a157de39eb7137b599808474b8d46d19d376

6 years agoMerge "fail early on oversize frames"
Johann Koenig [Thu, 9 Nov 2017 19:50:04 +0000 (19:50 +0000)]
Merge "fail early on oversize frames"

6 years agovpx: [x86] add vp9_block_error_fp_avx2()
Scott LaVarnway [Thu, 9 Nov 2017 00:06:29 +0000 (16:06 -0800)]
vpx: [x86] add vp9_block_error_fp_avx2()

SSE2 asm vs AVX2 intrinsics speed gains:
blocksize   16: ~1.00
blocksize   64: ~1.17
blocksize  256: ~1.67
blocksize 1024: ~1.81

Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
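
As a point of reference, a scalar version of a block error computation could be sketched as below. It assumes the error is the sum of squared differences between the original and dequantized coefficients; the function name and the 16-bit coefficient type are illustrative.

```c
#include <stdint.h>

/* Hypothetical scalar reference: sum of squared coefficient differences. */
static int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                               int block_size) {
  int64_t error = 0;
  int i;
  for (i = 0; i < block_size; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
  }
  return error;
}
```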

6 years agoFix to frames considered in arf boost calculation.
paulwilkins [Wed, 1 Nov 2017 14:21:39 +0000 (14:21 +0000)]
Fix to frames considered in arf boost calculation.

For a chosen interval "i" the existing arf boost calculation examined frames
+/- (i-1) frames from the current location in the second pass.

This change checks to make sure that the forward search does not extend
beyond the next key frame in the event that the distance to the next key
frame is < (i - 1).

Small metrics gains on all our test sets but these are localized to a few clips
(e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)

Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
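
The clamp described above amounts to something like the following sketch. The function and parameter names are hypothetical; only the rule (do not search forward past the next key frame) comes from the description.

```c
/* Number of frames to examine in the forward direction for interval i,
 * limited so the search does not run past the next key frame. */
static int forward_frames_to_check(int interval, int frames_to_key) {
  int forward = interval - 1;
  if (forward < 0) forward = 0;
  if (frames_to_key < forward) forward = frames_to_key;
  return forward;
}
```

With an interval of 10 and a key frame 4 frames away, this looks at 4 frames rather than 9.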

6 years agoMerge "vp9: Add nonref frame buffer test."
Jerome Jiang [Thu, 9 Nov 2017 04:41:10 +0000 (04:41 +0000)]
Merge "vp9: Add nonref frame buffer test."

6 years agovp9: Add nonref frame buffer test.
Jerome Jiang [Wed, 8 Nov 2017 01:20:34 +0000 (17:20 -0800)]
vp9: Add nonref frame buffer test.

The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.

Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.

Disabled for now. Will be enabled with the fix.

BUG=b/68819248

Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
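
The bookkeeping behind a test like this can be sketched as a pair of counters. The type and function names below are hypothetical and stand in for whatever the external frame buffer callbacks do in the real test.

```c
/* Hypothetical counters updated from the get/release callbacks. */
typedef struct {
  int acquired;
  int released;
} buffer_counter;

static void on_buffer_acquired(buffer_counter *c) { ++c->acquired; }
static void on_buffer_released(buffer_counter *c) { ++c->released; }

/* True when every acquired buffer has been handed back. */
static int all_buffers_released(const buffer_counter *c) {
  return c->acquired == c->released;
}
```

A leak on non ref frames would show up as `acquired` running ahead of `released` once decoding has finished.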

6 years agoMerge "Support building AVX-512 and implement sadx4 for AVX-512"
Johann Koenig [Wed, 8 Nov 2017 16:28:40 +0000 (16:28 +0000)]
Merge "Support building AVX-512 and implement sadx4 for AVX-512"

6 years agoCVBR command line option.
paulwilkins [Tue, 10 Oct 2017 18:49:59 +0000 (19:49 +0100)]
CVBR command line option.

Added command line control of Corpus VBR.

The new corpus vbr mode is a variant of standard
VBR (end-usage=0) where the complexity distribution
mid point is passed in rather than calculated for a specific
clip or chunk.

The new variant is enabled by setting a new command line
parameter --corpus-complexity to a non-zero value. Omitting
this parameter or setting it to 0 will cause the codec to use
standard vbr mode.

The correct value for a given corpus needs to be derived
experimentally using a training set such that the average
rate for the corpus is close to the target value.

For example, using our low res test set with upper and lower
vbr limits of 50%-150% and a corpus complexity value of 650
gives a similar average data rate across the set to using standard
vbr. However, with the corpus mode easier clips will be allocated
fewer bits and harder clips more bits rather than having the same
rate target for all.

Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589

6 years agoNonrd_pickmode: avoid computing UV cost when early_term is set.
Marco [Fri, 3 Nov 2017 18:29:35 +0000 (11:29 -0700)]
Nonrd_pickmode: avoid computing UV cost when early_term is set.

For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).

Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.

Very small speed gain (~0.5%) on some clips with strong color content.

Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07

6 years agoSupport building AVX-512 and implement sadx4 for AVX-512
Kyle Siefring [Tue, 31 Oct 2017 15:19:19 +0000 (11:19 -0400)]
Support building AVX-512 and implement sadx4 for AVX-512

The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.

Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7

6 years agoCompound prediction mode for nonrd pickmode.
Marco [Wed, 25 Oct 2017 22:45:11 +0000 (15:45 -0700)]
Compound prediction mode for nonrd pickmode.

Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.

Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.

avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.

Small/negligible decrease in speed.

Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d

6 years agofail early on oversize frames
Johann [Fri, 3 Nov 2017 16:49:13 +0000 (09:49 -0700)]
fail early on oversize frames

Even though frame_size is calculated in uint64_t, it winds up in an int
size value.

This was exposed with the msan test because the memset is called with
(int)frame_size, leading to a segfault.

Change-Id: I7fd930360dca274adb8f3e43e5e6785204808861
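
The failure mode and the early check can be illustrated with the sketch below. The function name and the simplified size formula (width times height) are assumptions made for the example; the point is only that the 64-bit value is range checked before it is narrowed to int.

```c
#include <limits.h>
#include <stdint.h>

/* Computes a frame size in 64 bits and refuses values that do not fit in
 * an int. Returns 0 on success and -1 if the frame is too large. */
static int frame_size_to_int(uint32_t width, uint32_t height, int *size_out) {
  const uint64_t frame_size = (uint64_t)width * height;
  if (frame_size > (uint64_t)INT_MAX) return -1;
  *size_out = (int)frame_size;
  return 0;
}
```

Without the check, the narrowing conversion can yield a negative or wrapped size, which is then passed on to calls such as memset.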

6 years agoMerge "vp9: Move allocation of vt2 after early exits."
Jerome Jiang [Wed, 1 Nov 2017 16:58:01 +0000 (16:58 +0000)]
Merge "vp9: Move allocation of vt2 after early exits."

6 years agovp9: Move allocation of vt2 after early exits.
Jerome Jiang [Tue, 31 Oct 2017 23:53:46 +0000 (16:53 -0700)]
vp9: Move allocation of vt2 after early exits.

Remove the memory deallocation on the early exits.

Change-Id: I00b4a814ae6705105ecab89644d055ca3311d9f4
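
The pattern can be sketched as follows, with a hypothetical scratch struct and function. Because the allocation happens only after the early exit, the early path has nothing to free.

```c
#include <stdlib.h>

/* Hypothetical large scratch structure, kept on the heap. */
typedef struct {
  int node[4096];
} scratch_tree;

static int choose_partitioning_sketch(int exit_early) {
  scratch_tree *vt2;
  int result;

  /* Early exit: nothing has been allocated yet, so no cleanup is needed. */
  if (exit_early) return 0;

  vt2 = (scratch_tree *)calloc(1, sizeof(*vt2));
  if (vt2 == NULL) return -1;

  vt2->node[0] = 1; /* stand-in for the real work */
  result = vt2->node[0];

  free(vt2);
  return result;
}
```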

6 years agoMerge "vp9: Reduce stack usage of choose_partitioning."
Jerome Jiang [Tue, 31 Oct 2017 21:42:18 +0000 (21:42 +0000)]
Merge "vp9: Reduce stack usage of choose_partitioning."

6 years agovp9: Reduce stack usage of choose_partitioning.
Jerome Jiang [Tue, 31 Oct 2017 02:21:24 +0000 (19:21 -0700)]
vp9: Reduce stack usage of choose_partitioning.

Move vt2 to heap.
Reduce the stack usage from ~87K to ~44K.

BUG=b/68362457

Change-Id: I8f5f93712934d59a8cc4564378172d409a736a2e

6 years agoMerge "vp9: Reduce stack usage of choose_partioning."
Jerome Jiang [Mon, 30 Oct 2017 23:39:41 +0000 (23:39 +0000)]
Merge "vp9: Reduce stack usage of choose_partioning."

6 years agovp9: Reduce stack usage of choose_partioning.
Jerome Jiang [Mon, 30 Oct 2017 20:24:29 +0000 (13:24 -0700)]
vp9: Reduce stack usage of choose_partioning.

Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.

This reduces the stack usage from ~131K to ~87K.

BUG=b/68362457

Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
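
The effect of the narrower types can be checked with sizeof. In the sketch below the two sum fields follow the description above, while the struct names and the remaining two int fields are assumptions added to make a plausible node layout.

```c
#include <stdint.h>

/* Node layout before the change (64-bit sums). */
typedef struct {
  int64_t sum_square_error;
  int64_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} var_wide;

/* Node layout after the change (32-bit sums). */
typedef struct {
  uint32_t sum_square_error;
  int32_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} var_narrow;
```

On a typical ABI with 4-byte int this takes each node from 24 to 16 bytes, a one third reduction that is in line with the ~131K to ~87K figure.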

6 years agovp8: correct if/else '{' placement
James Zern [Wed, 20 Jul 2016 03:56:25 +0000 (20:56 -0700)]
vp8: correct if/else '{' placement

swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.

Change-Id: I67a45596f80a12389faca49c5be440875092a7df

6 years agovpx: hadamard: use ptrdiff_t instead of int for stride
Scott LaVarnway [Thu, 26 Oct 2017 16:45:06 +0000 (09:45 -0700)]
vpx: hadamard: use ptrdiff_t instead of int for stride

Eliminates the following instruction for the x86 (64 bit)
intrinsic code:

movslq %esi,%rax

Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
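
movslq sign-extends a 32-bit value to 64 bits, which is needed when an int stride is used in 64-bit address arithmetic. The sketch below, with a hypothetical function, shows a stride parameter declared as ptrdiff_t so that it already has pointer width on x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the first column of a 2-D block. The stride is ptrdiff_t, so it can
 * be used directly in pointer arithmetic without widening. */
static int sum_first_column(const int16_t *src, ptrdiff_t stride, int rows) {
  int sum = 0;
  int r;
  for (r = 0; r < rows; ++r) {
    sum += src[r * stride];
  }
  return sum;
}
```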

6 years agoMerge "Optimize convolve8 SSSE3 and AVX2 intrinsics"
Kyle Siefring [Tue, 24 Oct 2017 19:22:36 +0000 (19:22 +0000)]
Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"

6 years agoOptimize convolve8 SSSE3 and AVX2 intrinsics
Kyle Siefring [Sun, 22 Oct 2017 23:34:19 +0000 (19:34 -0400)]
Optimize convolve8 SSSE3 and AVX2 intrinsics

Changed the intrinsics to perform summation similar to the way the assembly does.

The new code diverges from the assembly by preferring unsaturated additions.

Results for haswell

SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%

AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%

BUG=webm:1471

Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668

6 years agoMerge "vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix"
Scott LaVarnway [Mon, 23 Oct 2017 22:02:59 +0000 (22:02 +0000)]
Merge "vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix"

6 years agovp9-svc: Allow for adapt_rd_thresh with row-mt.
Marco [Mon, 23 Oct 2017 17:58:28 +0000 (10:58 -0700)]
vp9-svc: Allow for adapt_rd_thresh with row-mt.

Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allows the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.

~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.

Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc

6 years agovpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix
Scott LaVarnway [Fri, 20 Oct 2017 21:46:41 +0000 (14:46 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix

Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.

Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b

6 years agovpx: [x86] vpx_hadamard_16x16_avx2() improvements
Scott LaVarnway [Fri, 20 Oct 2017 12:21:15 +0000 (05:21 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() improvements

~10% performance gain.  Fixed the cosmetics noted in the
previous commit.

Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13

6 years agoMerge "vpx: [x86] add vpx_hadamard_16x16_avx2()"
Scott LaVarnway [Thu, 19 Oct 2017 23:32:10 +0000 (23:32 +0000)]
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"

6 years agoMerge "Corpus VBR tweak for undershoot."
Paul Wilkins [Thu, 19 Oct 2017 10:07:45 +0000 (10:07 +0000)]
Merge "Corpus VBR tweak for undershoot."

6 years agoMerge "Increase precision of some debug stats output for corpus VBR."
Paul Wilkins [Thu, 19 Oct 2017 10:07:30 +0000 (10:07 +0000)]
Merge "Increase precision of some debug stats output for corpus VBR."

6 years agoMerge "Prevent double application of min rate in two pass."
Paul Wilkins [Thu, 19 Oct 2017 10:06:33 +0000 (10:06 +0000)]
Merge "Prevent double application of min rate in two pass."

6 years agovpx: [x86] add vpx_hadamard_16x16_avx2()
Scott LaVarnway [Thu, 19 Oct 2017 00:12:57 +0000 (17:12 -0700)]
vpx: [x86] add vpx_hadamard_16x16_avx2()

This version is ~1.91x faster than the sse2 version.  When
highbitdepth is enabled, it is ~1.74x.

Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd

6 years agoMerge "Add datarate test for vp8 ROI."
Jerome Jiang [Wed, 18 Oct 2017 19:39:26 +0000 (19:39 +0000)]
Merge "Add datarate test for vp8 ROI."

6 years agoAdd datarate test for vp8 ROI.
Jerome Jiang [Tue, 17 Oct 2017 21:43:07 +0000 (14:43 -0700)]
Add datarate test for vp8 ROI.

BUG=webm:1470

Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13

6 years agoMerge "vp8: Enable use of ROI map."
Jerome Jiang [Wed, 18 Oct 2017 18:16:44 +0000 (18:16 +0000)]
Merge "vp8: Enable use of ROI map."

6 years agoMerge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"
Kyle Siefring [Wed, 18 Oct 2017 16:19:52 +0000 (16:19 +0000)]
Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"

6 years agoMerge "vp8: [loongson] optimize idct with mmi"
Shiyou Yin [Wed, 18 Oct 2017 00:55:36 +0000 (00:55 +0000)]
Merge "vp8: [loongson] optimize idct with mmi"

6 years agovp8: Enable use of ROI map.
Jerome Jiang [Thu, 12 Oct 2017 22:03:22 +0000 (15:03 -0700)]
vp8: Enable use of ROI map.

Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.

BUG=webm:1470

Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda

6 years agoMerge changes I17fff122,Ic149e3cb
Linfeng Zhang [Tue, 17 Oct 2017 16:03:29 +0000 (16:03 +0000)]
Merge changes I17fff122,Ic149e3cb

* changes:
  Add 4 to 3 scaling SSSE3 optimization
  Test extreme inputs in frame scale functions

6 years agoMerge "Generalize CheckScalingFiltering in ConvolveTest"
Linfeng Zhang [Tue, 17 Oct 2017 16:03:07 +0000 (16:03 +0000)]
Merge "Generalize CheckScalingFiltering in ConvolveTest"

6 years agoRefactor x86/vpx_subpixel_8t_intrin_avx2.c
Kyle Siefring [Sat, 14 Oct 2017 20:26:35 +0000 (16:26 -0400)]
Refactor x86/vpx_subpixel_8t_intrin_avx2.c

Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305

6 years agovp8: [loongson] optimize idct with mmi
Shiyou Yin [Wed, 13 Sep 2017 08:20:21 +0000 (16:20 +0800)]
vp8: [loongson] optimize idct with mmi

1. vp8_dequant_idct_add_y_block_mmi
2. vp8_dequant_idct_add_uv_block_mmi

Change-Id: I9987147be2685ac79d4b045d1d56f6709ee1223c

6 years agoAdd 4 to 3 scaling SSSE3 optimization
Linfeng Zhang [Wed, 11 Oct 2017 18:59:04 +0000 (11:59 -0700)]
Add 4 to 3 scaling SSSE3 optimization

Note this change will trigger the different C version on SSSE3 and
generate different scaled output.

Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().

Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194

6 years agoAdjust threshold in gf_boost for 1 pass vbr
Marco [Fri, 13 Oct 2017 22:31:02 +0000 (15:31 -0700)]
Adjust threshold in gf_boost for 1 pass vbr

A small increase in sad_thresh1 avoids some false
detection of possible scene changes within lag.

Small improvement in a few clips on ytlive, otherwise neutral change.

Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b

6 years agoMerge "Further Corpus VBR change."
Paul Wilkins [Fri, 13 Oct 2017 15:59:58 +0000 (15:59 +0000)]
Merge "Further Corpus VBR change."

6 years agoMerge "Corpus Wide VBR test implementation."
Paul Wilkins [Fri, 13 Oct 2017 15:59:45 +0000 (15:59 +0000)]
Merge "Corpus Wide VBR test implementation."

6 years agoCorpus VBR tweak for undershoot.
paulwilkins [Wed, 11 Oct 2017 09:12:20 +0000 (10:12 +0100)]
Corpus VBR tweak for undershoot.

In cases of strong undershoot adjust Q range down faster.

Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03

6 years agoMerge "vp8: [loongson] optimize dct with mmi"
Shiyou Yin [Fri, 13 Oct 2017 00:37:57 +0000 (00:37 +0000)]
Merge "vp8: [loongson] optimize dct with mmi"

6 years agoMerge "Adjust to scene detection for 1 pass vbr."
Marco Paniconi [Thu, 12 Oct 2017 19:36:33 +0000 (19:36 +0000)]
Merge "Adjust to scene detection for 1 pass vbr."

6 years agoAdjust to scene detection for 1 pass vbr.
Marco [Tue, 10 Oct 2017 22:30:32 +0000 (15:30 -0700)]
Adjust to scene detection for 1 pass vbr.

Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.

No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.

Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026

6 years agoMerge "vp9: use nonrd pick_intra for small blocks on keyframes."
Jerome Jiang [Thu, 12 Oct 2017 17:29:27 +0000 (17:29 +0000)]
Merge "vp9: use nonrd pick_intra for small blocks on keyframes."

6 years agoMerge changes I38783d97,If5160c0c
Kyle Siefring [Thu, 12 Oct 2017 16:12:38 +0000 (16:12 +0000)]
Merge changes I38783d97,If5160c0c

* changes:
  Extend 16 wide AVX2 convolve8 code to support averaging.
  Add AVX2 version of vpx_convolve8_avg.

6 years agoIncrease precision of some debug stats output for corpus VBR.
paulwilkins [Wed, 11 Oct 2017 10:56:44 +0000 (11:56 +0100)]
Increase precision of some debug stats output for corpus VBR.

Change-Id: I75841797cc0c215781b5b36e3a3e9f4b0e35ba63

6 years agovp9: use nonrd pick_intra for small blocks on keyframes.
Jerome Jiang [Thu, 12 Oct 2017 00:13:39 +0000 (17:13 -0700)]
vp9: use nonrd pick_intra for small blocks on keyframes.

Keyframe encoding is more than 2x faster.
Disabled on Speed 8.

Change-Id: I2157318b6ac8253fa5398322c72d98cd7fa9b2b6

6 years agovp8: [loongson] optimize dct with mmi
Shiyou Yin [Wed, 13 Sep 2017 06:03:11 +0000 (14:03 +0800)]
vp8: [loongson] optimize dct with mmi

1. vp8_short_fdct4x4_mmi
2. vp8_short_fdct8x4_mmi
3. vp8_short_walsh4x4_mmi

Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb

6 years agoMerge "vp8: [loongson] optimize quantize with mmi"
Shiyou Yin [Thu, 12 Oct 2017 00:33:17 +0000 (00:33 +0000)]
Merge "vp8: [loongson] optimize quantize with mmi"

6 years agoAdjust threshold in datarate tests for 1 pass VBR
Marco [Wed, 11 Oct 2017 18:06:34 +0000 (11:06 -0700)]
Adjust threshold in datarate tests for 1 pass VBR

Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>

Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16

6 years agoTest extreme inputs in frame scale functions
Linfeng Zhang [Wed, 11 Oct 2017 18:35:19 +0000 (11:35 -0700)]
Test extreme inputs in frame scale functions

Change-Id: Ic149e3cb59be2ee0f98a3fcfd83226ad5ea30c99

6 years agoPrevent double application of min rate in two pass.
paulwilkins [Wed, 11 Oct 2017 09:31:57 +0000 (10:31 +0100)]
Prevent double application of min rate in two pass.

The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.

Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.

For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150): +0.751, +1.099
New code (50-150): +0.241, -0.009

Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
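
The ordering issue can be sketched as below. All names are hypothetical and the arithmetic is simplified; the sketch only shows the minimum being applied once, before the ARF cost is deducted, and not again afterwards.

```c
/* Per-frame target after sharing the ARF cost. The minimum is applied to
 * the initial allocation only; the residual may legitimately fall below it. */
static int frame_target_bits(int allocated_bits, int arf_share_bits,
                             int min_bits) {
  const int clamped = allocated_bits < min_bits ? min_bits : allocated_bits;
  const int residual = clamped - arf_share_bits;
  return residual < 0 ? 0 : residual;
}
```

Re-applying the minimum to the residual would hand back bits that were already spent on the ARF, which is how low rate sections could overshoot.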

6 years agovp8: [loongson] optimize quantize with mmi
Shiyou Yin [Mon, 11 Sep 2017 10:07:25 +0000 (18:07 +0800)]
vp8: [loongson] optimize quantize with mmi

1. vp8_fast_quantize_b_mmi
2. vp8_regular_quantize_b_mmi

Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646

6 years agoAdd 4 to 1 scaling x86 optimization
Linfeng Zhang [Wed, 4 Oct 2017 16:55:56 +0000 (09:55 -0700)]
Add 4 to 1 scaling x86 optimization

Change-Id: I51c190f0a88685867df36912522e67bdae58a673

6 years agoMerge "Fix alignment in vpx_image without external allocation."
Jerome Jiang [Tue, 10 Oct 2017 23:02:05 +0000 (23:02 +0000)]
Merge "Fix alignment in vpx_image without external allocation."

6 years agoFix alignment in vpx_image without external allocation.
Jerome Jiang [Tue, 10 Oct 2017 02:33:03 +0000 (19:33 -0700)]
Fix alignment in vpx_image without external allocation.

This restores behaviors prior to
<40c8fde Fix image width alignment. Enable ImageSizeSetting test.>.

BUG=b/64710201

Change-Id: I559557afe80d5ff5ea6ac24021561715068e7786

6 years agoGeneralize CheckScalingFiltering in ConvolveTest
Linfeng Zhang [Tue, 10 Oct 2017 19:13:55 +0000 (12:13 -0700)]
Generalize CheckScalingFiltering in ConvolveTest

Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.

Change-Id: I1042564d1d390589ca203070fe332c6da3315d75

6 years agoAdjustment to scene detection and key frame.
Marco [Tue, 10 Oct 2017 00:53:21 +0000 (17:53 -0700)]
Adjustment to scene detection and key frame.

For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).

Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.

Change-Id: I12626f7627419ca14f9d0d249df86c7104438162

6 years agoMerge changes I9d4c1af5,I882da3a0
Linfeng Zhang [Tue, 10 Oct 2017 17:29:50 +0000 (17:29 +0000)]
Merge changes I9d4c1af5,I882da3a0

* changes:
  Rename some inline functions in NEON scaling
  Generalize 2:1 vp9_scale_and_extend_frame_ssse3()

6 years agoFurther Corpus VBR change.
paulwilkins [Thu, 17 Aug 2017 13:13:29 +0000 (14:13 +0100)]
Further Corpus VBR change.

Change to the bit allocation within a GF/ARF group.

Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).

This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.

Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
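
Allocation by individual complexity score can be sketched as a proportional split. The function name and types are hypothetical, and the sketch ignores the special handling of the middle and last frames mentioned above.

```c
#include <stddef.h>

/* Splits group_bits across frames in proportion to their complexity
 * scores. Falls back to an even split if all scores are zero. */
static void allocate_by_complexity(const double *score, size_t count,
                                   long group_bits, long *bits_out) {
  double total = 0.0;
  size_t i;
  for (i = 0; i < count; ++i) total += score[i];
  for (i = 0; i < count; ++i) {
    if (total > 0.0) {
      bits_out[i] = (long)((double)group_bits * (score[i] / total));
    } else {
      bits_out[i] = group_bits / (long)count;
    }
  }
}
```

The even-split branch corresponds to the normal VBR behavior for the "normal" frames in a group.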

6 years agoCorpus Wide VBR test implementation.
paulwilkins [Fri, 19 May 2017 14:08:15 +0000 (15:08 +0100)]
Corpus Wide VBR test implementation.

This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.

At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef.  Ultimately this
would need to be controlled by command line parameters.

Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873

6 years agoExtend 16 wide AVX2 convolve8 code to support averaging.
Kyle Siefring [Sun, 8 Oct 2017 03:25:03 +0000 (23:25 -0400)]
Extend 16 wide AVX2 convolve8 code to support averaging.

Also adds vpx_convolve8_avg_horiz_avx2.

Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf

6 years agoRename some inline functions in NEON scaling
Linfeng Zhang [Wed, 4 Oct 2017 20:04:40 +0000 (13:04 -0700)]
Rename some inline functions in NEON scaling

Change-Id: I9d4c1af53d57f72fc716bacbe3b0965719c045ac

6 years agoMerge "Update vp9_scale_and_extend_frame_ssse3()"
Linfeng Zhang [Mon, 9 Oct 2017 16:20:00 +0000 (16:20 +0000)]
Merge "Update vp9_scale_and_extend_frame_ssse3()"

6 years agoAdd AVX2 version of vpx_convolve8_avg.
Kyle Siefring [Sat, 7 Oct 2017 20:02:02 +0000 (16:02 -0400)]
Add AVX2 version of vpx_convolve8_avg.

vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.

The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.

vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.

Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
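
The averaging step at the end of the vertical pass can be sketched as below. The function name is hypothetical and the round-to-nearest behavior is an assumption; the filtered values stand in for the output of the two filter passes.

```c
#include <stdint.h>

/* Averages freshly filtered pixels into the existing destination pixels,
 * rounding to nearest. */
static void average_into_dst(const uint8_t *filtered, uint8_t *dst, int n) {
  int i;
  for (i = 0; i < n; ++i) {
    dst[i] = (uint8_t)((dst[i] + filtered[i] + 1) >> 1);
  }
}
```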

6 years agoMerge "ppc: Add vpx_idct32x32_1024_add_vsx"
James Zern [Sat, 7 Oct 2017 19:08:26 +0000 (19:08 +0000)]
Merge "ppc: Add vpx_idct32x32_1024_add_vsx"

6 years agoMerge "Revert "Speed >=5 real-time: add TM intra mode for high_source_sad.""
Marco Paniconi [Fri, 6 Oct 2017 22:41:34 +0000 (22:41 +0000)]
Merge "Revert "Speed >=5 real-time: add TM intra mode for high_source_sad.""

6 years agoRevert "Speed >=5 real-time: add TM intra mode for high_source_sad."
Marco Paniconi [Fri, 6 Oct 2017 22:14:56 +0000 (22:14 +0000)]
Revert "Speed >=5 real-time: add TM intra mode for high_source_sad."

This reverts commit 9311ef18b4b4eff0da3adf9d702a34f489a270ff.

Reason for revert:
Notice small regression in some clips.
Will revisit in another change.

Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
>
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
>
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

TBR=marpan@google.com,builds@webmproject.org,jianj@google.com

Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true

6 years agoAdjust threshold in scene detection
Marco [Fri, 6 Oct 2017 17:53:40 +0000 (10:53 -0700)]
Adjust threshold in scene detection

For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.

Reduces possible false detection for scene cut detection.

Neutral/small change in metrics or speed for speed 5.

Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76

6 years agoMerge "Speed >=5 real-time: add TM intra mode for high_source_sad."
Marco Paniconi [Fri, 6 Oct 2017 06:29:46 +0000 (06:29 +0000)]
Merge "Speed >=5 real-time: add TM intra mode for high_source_sad."

6 years agoSpeed >=5 real-time: add TM intra mode for high_source_sad.
Marco [Thu, 5 Oct 2017 19:58:51 +0000 (12:58 -0700)]
Speed >=5 real-time: add TM intra mode for high_source_sad.

Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.

Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d

6 years agoMerge "vpx_codec.h: namespace local defines"
James Zern [Fri, 6 Oct 2017 05:30:16 +0000 (05:30 +0000)]
Merge "vpx_codec.h: namespace local defines"

6 years agovpx_codec.h: namespace local defines
James Zern [Thu, 5 Oct 2017 22:09:33 +0000 (15:09 -0700)]
vpx_codec.h: namespace local defines

add VPX_ to UNUSED/*DEPRECATED to avoid conflicts with other headers.

Change-Id: Ie16bdac3575bc1af57a05d37e65b994370585377

6 years agovp9_ethread_test: abort early/add more detailed output
James Zern [Thu, 5 Oct 2017 22:02:51 +0000 (15:02 -0700)]
vp9_ethread_test: abort early/add more detailed output

In case compare_fp_stats fails, report the 2 values and their index.

Change-Id: I927a832b7a1e24c392961093b7caee1134223def
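
The kind of reporting described can be sketched as follows. The function name and the use of double values are assumptions for the example; it stops at the first difference and prints the index together with both values.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the index of the first differing element, or -1 if the arrays
 * match. On a mismatch the index and both values are printed to stderr. */
static long first_stats_mismatch(const double *a, const double *b, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) {
    if (a[i] != b[i]) {
      fprintf(stderr, "stats mismatch at index %zu: %f vs %f\n", i, a[i],
              b[i]);
      return (long)i;
    }
  }
  return -1;
}
```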