Marco [Thu, 30 Nov 2017 20:08:00 +0000 (12:08 -0800)]
Nonrd-pickmode: avoid duplicate computation of UV predictor.
Avoids duplicate computation of UV predictor.
Bit-exact when static_threshold is zero.
Small/neutral difference on RTC set with nonzero static_threshold
(since UV predictor won't be skipped with this change).
Small speed gain, ~1-2%, at speed 8.
Change-Id: Iba8d22a307768b391e29d63c9826aac5a4d9c285
Shiyou Yin [Thu, 30 Nov 2017 00:53:50 +0000 (00:53 +0000)]
Merge changes Icd9c866b,I81717e47
* changes:
vp8: [loongson] optimize regular quantize v2.
vp8: [loongson] optimize vp8_short_fdct4x4_mmi v2.
Shiyou Yin [Thu, 30 Nov 2017 00:53:37 +0000 (00:53 +0000)]
Merge "vpx: [loongson] fix bug in var_filter_block2d_bil_16x"
Jingning Han [Wed, 29 Nov 2017 22:56:47 +0000 (22:56 +0000)]
Merge "Add PSNR Cb and Cr metric to opsnr.stt"
Marco Paniconi [Wed, 29 Nov 2017 22:41:31 +0000 (22:41 +0000)]
Merge "vp9-svc: Don't allow encode_breakout on golden ref."
Marco [Wed, 29 Nov 2017 21:37:21 +0000 (13:37 -0800)]
vp9-svc: Don't allow encode_breakout on golden ref.
For 1 pass cbr SVC: GOLDEN is the spatial reference,
better not to check for encode_breakout on this reference.
Small positive ~0.075% (mostly neutral) gain in avgPSNR/SSIM metrics.
No observed change in encoder speed.
Change-Id: Ib337f16d6771105bf06384c6a23ad047fc690418
Marco [Wed, 29 Nov 2017 20:48:20 +0000 (12:48 -0800)]
vp9-svc: Clean condition for allowing copy_partition.
Make condition explicit on non_reference_frame.
No change in behavior.
Change-Id: Iec5068bccd93c7c7be67634c5c090580b2dbb20d
Kyle Siefring [Wed, 29 Nov 2017 19:14:45 +0000 (19:14 +0000)]
Merge "Remove unnecessary includes of emmintrin_compat.h"
Kyle Siefring [Wed, 29 Nov 2017 16:48:24 +0000 (11:48 -0500)]
Remove unnecessary includes of emmintrin_compat.h
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
Shiyou Yin [Wed, 29 Nov 2017 08:59:22 +0000 (16:59 +0800)]
vp8: [loongson] optimize regular quantize v2.
1. Optimize the memset with mmi.
2. Optimize macro REGULAR_SELECT_EOB.
Change-Id: Icd9c866b0e6aef08874b2f123e9b0e09919445ff
Shiyou Yin [Wed, 29 Nov 2017 03:58:38 +0000 (11:58 +0800)]
vp8: [loongson] optimize vp8_short_fdct4x4_mmi v2.
Optimize the calculation of a, b, c, d.
Change-Id: I81717e47bc988ace1412d478513e7dd3cb6b0cc9
James Zern [Tue, 28 Nov 2017 02:35:37 +0000 (18:35 -0800)]
vpx{enc,dec}: add --help
only output short usage to stderr on error, with --help use stdout
Change-Id: I7089f3bca829817e14b14c766f4f3eaee6f54e5c
Jingning Han [Mon, 27 Nov 2017 23:04:32 +0000 (15:04 -0800)]
Add PSNR Cb and Cr metric to opsnr.stt
Change-Id: I24e1741c00f9514647c7db2758a7ababd4e96932
Shiyou Yin [Fri, 22 Sep 2017 07:29:21 +0000 (15:29 +0800)]
vpx: [loongson] fix bug in var_filter_block2d_bil_16x
This bug caused the following test failures:
1. MMI/VpxSubpelVarianceTest.Ref/6
2. MMI/VpxSubpelVarianceTest.Ref/7
3. MMI/VpxSubpelVarianceTest.ExtremeRef/6
4. MMI/VpxSubpelVarianceTest.ExtremeRef/7
Change-Id: I122ca20089e14ac324edd61295cf8f506e06afc8
Marco [Tue, 28 Nov 2017 22:28:26 +0000 (14:28 -0800)]
vp9-svc: Fix condition for setting downsampling filter.
Use (width * height) for setting downsampling filter type.
Change-Id: If4acfde7ff9339e0584155f8a4d15b2f134211f2
Johann [Wed, 23 Aug 2017 22:27:25 +0000 (15:27 -0700)]
quantize x86: dedup some parts
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
Marco [Tue, 21 Nov 2017 23:04:53 +0000 (15:04 -0800)]
vp9-svc: Fix to the layer buffer settings.
For the case when the number of temporal layers > 1,
the buffer levels (starting/optimal_buffer_level,
and maximum_buffer_size) were not scaled properly.
In vp9_update_layer_context_change_config():
when setting the layer-buffer levels, fix is to scale
the layer-target_bandwidth by the target_bandwidth
(which is the full stream bandwidth) instead of the
spatial_layer_target.
This is needed because prior to the call
vp9_update_layer_context_change_config(), set_rc_buffer_sizes()
is called which sets the buffer levels based on target bandwidth
(which is the full bandwidth for the SVC stream).
This fix properly sets the layer-buffer levels based on the
layer-bandwidth, and leads to better rate targeting.
Small/neutral change in avgPSNR/SSIM metrics on RTC set.
Change-Id: Ic0f4f7f3487c37b9a9adb4781ae5edfed7140a57
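The scaling described in the "Fix to the layer buffer settings" entry above can be sketched roughly as follows. This is a minimal illustration, not the libvpx code: the rate_ctl struct and scale_layer_buffers() are hypothetical names, and only the idea of scaling by layer bandwidth over full-stream bandwidth is taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical, simplified rate-control state; not the libvpx struct. */
typedef struct {
  int64_t starting_buffer_level;
  int64_t optimal_buffer_level;
  int64_t maximum_buffer_size;
} rate_ctl;

/* Derive per-layer buffer levels from the stream-level ones.
 * The stream-level values were sized for the full SVC bandwidth, so the
 * layer's share is layer_bw / stream_bw (not layer_bw / spatial-layer bw). */
static void scale_layer_buffers(const rate_ctl *stream, int64_t layer_bw,
                                int64_t stream_bw, rate_ctl *layer) {
  const double alloc = (double)layer_bw / (double)stream_bw;
  layer->starting_buffer_level =
      (int64_t)(stream->starting_buffer_level * alloc);
  layer->optimal_buffer_level =
      (int64_t)(stream->optimal_buffer_level * alloc);
  layer->maximum_buffer_size =
      (int64_t)(stream->maximum_buffer_size * alloc);
}
```

With a layer that carries a quarter of the stream bandwidth, each buffer level comes out at a quarter of the stream-level value.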
Peter Collingbourne [Tue, 21 Nov 2017 18:42:40 +0000 (18:42 +0000)]
Merge "[CFI] Remove function pointer casts"
Jerome Jiang [Tue, 21 Nov 2017 01:22:46 +0000 (01:22 +0000)]
Merge "vp8 simulcast: fix compile warnings."
Vlad Tsyrklevich [Mon, 20 Nov 2017 21:40:54 +0000 (13:40 -0800)]
[CFI] Remove function pointer casts
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts to make libvpx CFI-safe.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I7e08522d195a43c88cda06fa20414426c8c4372c
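As a generic illustration of the "[CFI] Remove function pointer casts" entry above (not the actual libvpx change), the sketch below replaces a function pointer cast with a thin wrapper whose signature matches the pointer type exactly. All names here (process_fn, counter, count_bytes, count_bytes_thunk, run) are hypothetical.

```c
#include <stddef.h>

/* Callback type used for the indirect call. */
typedef int (*process_fn)(void *ctx, const unsigned char *data, size_t len);

typedef struct {
  int total;
} counter;

/* Implementation with a typed context. Its signature differs from
 * process_fn, so calling it through a cast pointer such as
 *   process_fn f = (process_fn)count_bytes;
 * is what a CFI indirect-call check would reject. */
static int count_bytes(counter *c, const unsigned char *data, size_t len) {
  (void)data;
  c->total += (int)len;
  return c->total;
}

/* CFI-friendly wrapper: matches process_fn exactly and converts the
 * context inside the function body instead of casting the pointer. */
static int count_bytes_thunk(void *ctx, const unsigned char *data,
                             size_t len) {
  return count_bytes((counter *)ctx, data, len);
}

/* Indirect call site; the type check happens on this call. */
static int run(process_fn f, void *ctx, const unsigned char *data,
               size_t len) {
  return f(ctx, data, len);
}
```

The wrapper costs one extra direct call, which a compiler can usually inline.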
Jerome Jiang [Tue, 21 Nov 2017 00:18:31 +0000 (16:18 -0800)]
vp8 simulcast: fix compile warnings.
Clean up some prints.
Change-Id: I199350e34a8b6fbff9601fcbd11ec68d24da5073
Kyle Siefring [Mon, 20 Nov 2017 22:37:57 +0000 (22:37 +0000)]
Merge "Optimize AVX2 get16x16var and get32x16var functions"
Jerome Jiang [Mon, 20 Nov 2017 18:52:58 +0000 (18:52 +0000)]
Merge "vp9 svc: fix a few compile warnings."
Marco [Fri, 17 Nov 2017 17:47:03 +0000 (09:47 -0800)]
vp9-svc: Enable scale partition reference frames.
For reference frames: enable scale partition for
superblocks with low source sad or if bsize on lower-resoln
is at least 32x32.
Keep feature disabled for base temporal layer.
Small regression in avgPSNR/SSIM metrics, ~0.5-1%.
Speedup ~2-3% on mac for SVC (3 spatial/3 temporal layers) at speed 7.
Change-Id: I5987eb7763845b680059128b538bb5188be0cca5
Jerome Jiang [Fri, 17 Nov 2017 22:33:21 +0000 (14:33 -0800)]
vp9 svc: fix a few compile warnings.
Change-Id: I4cb878600038066513ab73f3658990d1245ff2fb
Kyle Siefring [Fri, 17 Nov 2017 18:43:05 +0000 (13:43 -0500)]
Optimize AVX2 get16x16var and get32x16var functions
Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2
Paul Wilkins [Fri, 17 Nov 2017 10:34:56 +0000 (10:34 +0000)]
Merge "Disable allow_partition_search_skip for speed 2."
Paul Wilkins [Fri, 17 Nov 2017 10:34:46 +0000 (10:34 +0000)]
Merge "Code cleanup."
Paul Wilkins [Fri, 17 Nov 2017 10:34:37 +0000 (10:34 +0000)]
Merge "Remove decay_accumulator clause from alt ref breakout."
Paul Wilkins [Fri, 17 Nov 2017 10:34:26 +0000 (10:34 +0000)]
Merge "Add clause to alt ref group breakout."
Jerome Jiang [Fri, 17 Nov 2017 00:31:16 +0000 (00:31 +0000)]
Merge "vp9: Fix mem rel for non-ref for external buffer."
paulwilkins [Thu, 16 Nov 2017 16:15:06 +0000 (16:15 +0000)]
Disable allow_partition_search_skip for speed 2.
When allow_partition_search_skip is set the two pass code
can optionally skip the partition search in the rd loop if the image
appears static (based on selection of 0,0 motion).
Unfortunately 0,0 motion does not necessarily mean that there are
no meaningful changes or that motion or intra modes will not be selected
in the second pass.
Disabling "allow_partition_search_skip" may hurt the encode speed a little
for a small number of clips but can have a big impact on compression.
The most notable example of this in our test sets is "bridge_close_cif"
where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
psnr-hvs.
Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
Jerome Jiang [Tue, 14 Nov 2017 01:21:26 +0000 (17:21 -0800)]
vp9 svc: Rework/fix scale partitioning on boundary.
Enable partition copy on boundary and scale blocks along the boundary.
Rename copy_partition_svc to scale_partition_svc.
Do not copy if the block crosses the boundary.
Change-Id: I37a04d48f11b15c4ea67facd7631193ec2f62150
Johann [Wed, 15 Nov 2017 21:01:44 +0000 (13:01 -0800)]
fwd txfm ssse3: use GLOBAL() for loading constants
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
paulwilkins [Wed, 15 Nov 2017 17:07:28 +0000 (17:07 +0000)]
Code cleanup.
Removal of parameters to and code in calc_frame_boost() that is no
longer required.
No change to results from previous patch.
Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
paulwilkins [Wed, 15 Nov 2017 16:58:05 +0000 (16:58 +0000)]
Remove decay_accumulator clause from alt ref breakout.
The decay accumulator clause covers similar ground to the
new clause that tests the accumulated second reference error
so it has been removed to reduce complexity.
Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
paulwilkins [Wed, 15 Nov 2017 16:39:54 +0000 (16:39 +0000)]
Add clause to alt ref group breakout.
Add a clause to the breakout test for alt ref groups that
examines the size of the accumulated second reference
frame error compared to the cost of intra coding.
This clause causes a reduction in the average group length for many
clips. Alongside the change to the group length the minimum
boost is increased.
On balance the results are positive for psnr and psnr-hvs
but are negative for ssim/fast ssim for the smaller image formats.
Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
husky (lowres) 6-17%). Most of the negative cases are lower motion
clips. Subsequent patch hopefully will help with those.
Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
Marco [Wed, 15 Nov 2017 03:52:54 +0000 (19:52 -0800)]
vp9-svc: Fix flag for usage of reuse-lowres partition
Fix/cleanup the conditioning for usage of the reuse-lowres
partition feature.
Replace the non-reference condition with the top temporal
layer, and put this condition in the speed feature.
This prevents doing update_partition_svc() on every
VGA frame, instead it will now only do update for VGA in
the top temporal layer frames.
Also this makes it easier to test/enable this feature
for lower layer temporal frames.
Change-Id: Ia897afbc6fe5c84c5693e310bcaa6a87ce017be5
Scott LaVarnway [Tue, 14 Nov 2017 12:38:00 +0000 (04:38 -0800)]
tiny_ssim.c : clang compile error fix
Change-Id: Ic10ba580fd5da7d6ff7fa0f33db72fb0c1a97801
James Bankoski [Tue, 14 Nov 2017 00:15:24 +0000 (00:15 +0000)]
Merge "add 10 and 12 bit to tiny_ssim"
Jerome Jiang [Mon, 13 Nov 2017 21:04:41 +0000 (21:04 +0000)]
Merge "vp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION."
Jerome Jiang [Mon, 13 Nov 2017 19:05:20 +0000 (11:05 -0800)]
vp9 svc: Change conditions on VPX_ENCODER_ABI_VERSION.
VPX_ENCODER_ABI_VERSION was bumped up in 93e83f.
Change-Id: Id5707f9f9db56fa96549bc8f54e1cfa04e7fa4cd
Jim Bankoski [Mon, 13 Nov 2017 14:44:17 +0000 (06:44 -0800)]
add 10 and 12 bit to tiny_ssim
Change-Id: I92e4dba2d1682a0d77ad9a214ec4312b1cf4d42e
paulwilkins [Wed, 27 Sep 2017 17:17:18 +0000 (18:17 +0100)]
New content type to improve grain retention.
For new VP9 only content type adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.
In regards to the RD loop the method favors modes where the
reconstruction variance is similar to the source variance. However it
is currently only applied to regions where the source variance is quite
low.
For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the latter in particular limit
the usefulness of the loop filter).
The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.
This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but as expected causes a significant
drop in PSNR on many clips. This is to be expected because similar
but misaligned noise or texture will give a lower PSNR than a flat
noise free reconstruction. However, it is worth noting that most clips
show a strong gain in FAST SSIM.
The features are enabled on the vpxenc command line by setting
--tune-content=film.
VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
paulwilkins [Mon, 6 Nov 2017 11:24:34 +0000 (11:24 +0000)]
Small parameter clean up.
Removed three parameters that are no longer needed in calls
to calc_arf_boost() and associated minor changes.
No impact on encode results.
Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
Paul Wilkins [Mon, 13 Nov 2017 16:36:43 +0000 (16:36 +0000)]
Merge "Fix to frames considered in arf boost calculation."
Paul Wilkins [Mon, 13 Nov 2017 16:32:39 +0000 (16:32 +0000)]
Merge "CVBR command line option."
Scott LaVarnway [Fri, 10 Nov 2017 18:19:52 +0000 (10:19 -0800)]
vpx: [x86] add vpx_satd_avx2()
SSE2 intrinsic vs AVX2 intrinsic speed gains:
blocksize 16: ~1.33
blocksize 64: ~1.51
blocksize 256: ~3.03
blocksize 1024: ~3.71
Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
Scott LaVarnway [Fri, 10 Nov 2017 00:45:47 +0000 (00:45 +0000)]
Merge "vpx: [x86] add vp9_block_error_fp_avx2()"
Marco Paniconi [Fri, 10 Nov 2017 00:30:04 +0000 (00:30 +0000)]
Merge "vp9-svc: Avoid minmax variance for non-reference frames."
Marco [Thu, 9 Nov 2017 23:24:10 +0000 (15:24 -0800)]
vp9-svc: Avoid minmax variance for non-reference frames.
For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.
Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.
Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.
Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
James Zern [Fri, 10 Nov 2017 00:15:03 +0000 (00:15 +0000)]
Merge "runtime error fix: bitdepth_conversion_avx2.h"
Jerome Jiang [Tue, 7 Nov 2017 21:00:01 +0000 (13:00 -0800)]
vp9: Fix mem rel for non-ref for external buffer.
Release frame buffers for non-ref when the decoder is destroyed.
Enable the non ref test.
BUG=b/68819248
Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
Jerome Jiang [Thu, 9 Nov 2017 23:28:44 +0000 (23:28 +0000)]
Merge "vp9: SVC feature to use partition from lower resolution."
Jerome Jiang [Wed, 8 Nov 2017 23:12:44 +0000 (15:12 -0800)]
vp9: SVC feature to use partition from lower resolution.
For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.
Enabled for speed >= 7 and only for non-reference frames.
Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
Scott LaVarnway [Thu, 9 Nov 2017 20:26:43 +0000 (12:26 -0800)]
runtime error fix: bitdepth_conversion_avx2.h
Change-Id: I7364a157de39eb7137b599808474b8d46d19d376
Johann Koenig [Thu, 9 Nov 2017 19:50:04 +0000 (19:50 +0000)]
Merge "fail early on oversize frames"
Scott LaVarnway [Thu, 9 Nov 2017 00:06:29 +0000 (16:06 -0800)]
vpx: [x86] add vp9_block_error_fp_avx2()
SSE2 asm vs AVX2 intrinsics speed gains:
blocksize 16: ~1.00
blocksize 64: ~1.17
blocksize 256: ~1.67
blocksize 1024: ~1.81
Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
paulwilkins [Wed, 1 Nov 2017 14:21:39 +0000 (14:21 +0000)]
Fix to frames considered in arf boost calculation.
For a chosen interval "i" the existing arf boost calculation examined frames
+/- (i-1) frames from the current location in the second pass.
This change checks to make sure that the forward search does not extend
beyond the next key frame in the event that the distance to the next key
frame is < (i - 1).
Small metrics gains on all our test sets but these are localized to a few clips
(e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)
Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
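The bound described in the "Fix to frames considered in arf boost calculation" entry above amounts to a simple clamp. The helper below is a hypothetical sketch, not the libvpx function; in particular, whether the key frame itself counts toward the distance is an assumption left to the caller.

```c
/* Hypothetical helper: how many frames ahead the boost calculation may
 * examine for a chosen interval, without crossing the next key frame.
 * Without the clamp the search would always look (interval - 1) ahead. */
static int forward_search_frames(int interval, int frames_to_key) {
  int n = interval - 1;
  if (n < 0) n = 0;
  if (frames_to_key < 0) frames_to_key = 0;
  if (frames_to_key < n) n = frames_to_key;
  return n;
}
```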
Jerome Jiang [Thu, 9 Nov 2017 04:41:10 +0000 (04:41 +0000)]
Merge "vp9: Add nonref frame buffer test."
Jerome Jiang [Wed, 8 Nov 2017 01:20:34 +0000 (17:20 -0800)]
vp9: Add nonref frame buffer test.
The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.
Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.
Disabled for now. Will be enabled with the fix.
BUG=b/68819248
Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
Johann Koenig [Wed, 8 Nov 2017 16:28:40 +0000 (16:28 +0000)]
Merge "Support building AVX-512 and implement sadx4 for AVX-512"
paulwilkins [Tue, 10 Oct 2017 18:49:59 +0000 (19:49 +0100)]
CVBR command line option.
Added command line control of Corpus VBR.
The new corpus vbr mode is a variant of standard
VBR (end-usage=0) where the complexity distribution
mid point is passed in rather than calculated for a specific
clip or chunk.
The new variant is enabled by setting a new command line
parameter --corpus-complexity to a non-zero value. Omitting
this parameter or setting it to 0 will cause the codec to use
standard vbr mode.
The correct value for a given corpus needs to be derived
experimentally using a training set such that the average
rate for the corpus is close to the target value.
For example, using our low res test set with upper and lower
vbr limits of 50%-150% and a corpus complexity value of 650
gives a similar average data rate across the set to using standard
vbr. However, with the corpus mode easier clips will be allocated
fewer bits and harder clips more bits rather than having the same
rate target for all.
Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
Marco [Fri, 3 Nov 2017 18:29:35 +0000 (11:29 -0700)]
Nonrd_pickmode: avoid computing UV cost when early_term is set.
For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).
Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.
Very small speed gain (~0.5%) on some clips with strong color content.
Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
Kyle Siefring [Tue, 31 Oct 2017 15:19:19 +0000 (11:19 -0400)]
Support building AVX-512 and implement sadx4 for AVX-512
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
Marco [Wed, 25 Oct 2017 22:45:11 +0000 (15:45 -0700)]
Compound prediction mode for nonrd pickmode.
Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.
Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.
avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.
Small/negligible decrease in speed.
Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
Johann [Fri, 3 Nov 2017 16:49:13 +0000 (09:49 -0700)]
fail early on oversize frames
Even though frame_size is calculated in uint64_t, it winds up in an int
size value.
This was exposed with the msan test because the memset is called with
(int)frame_size, leading to a segfault.
Change-Id: I7fd930360dca274adb8f3e43e5e6785204808861
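The failure mode in the "fail early on oversize frames" entry above can be sketched as follows. This is an illustrative check under assumed names (checked_frame_size is hypothetical), not the code from the commit: the size is computed in 64 bits and rejected before it is narrowed to int.

```c
#include <limits.h>
#include <stdint.h>

/* Compute width * height in 64 bits and refuse sizes that would not
 * survive the later conversion to int. Returns 0 on success and stores
 * the size in *size_out; returns -1 if the frame is oversize. */
static int checked_frame_size(uint32_t width, uint32_t height,
                              int *size_out) {
  const uint64_t frame_size = (uint64_t)width * (uint64_t)height;
  if (frame_size > (uint64_t)INT_MAX) return -1; /* fail early */
  *size_out = (int)frame_size;
  return 0;
}
```

Rejecting the frame at this point avoids passing a truncated or negative length to memset() later on.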
Jerome Jiang [Wed, 1 Nov 2017 16:58:01 +0000 (16:58 +0000)]
Merge "vp9: Move allocation of vt2 after early exits."
Jerome Jiang [Tue, 31 Oct 2017 23:53:46 +0000 (16:53 -0700)]
vp9: Move allocation of vt2 after early exits.
Remove the memory deallocation on the early exits.
Change-Id: I00b4a814ae6705105ecab89644d055ca3311d9f4
Jerome Jiang [Tue, 31 Oct 2017 21:42:18 +0000 (21:42 +0000)]
Merge "vp9: Reduce stack usage of choose_partitioning."
Jerome Jiang [Tue, 31 Oct 2017 02:21:24 +0000 (19:21 -0700)]
vp9: Reduce stack usage of choose_partitioning.
Move vt2 to heap.
Reduce the stack usage from ~87K to ~44K.
BUG=b/68362457
Change-Id: I8f5f93712934d59a8cc4564378172d409a736a2e
Jerome Jiang [Mon, 30 Oct 2017 23:39:41 +0000 (23:39 +0000)]
Merge "vp9: Reduce stack usage of choose_partitioning."
Jerome Jiang [Mon, 30 Oct 2017 20:24:29 +0000 (13:24 -0700)]
vp9: Reduce stack usage of choose_partitioning.
Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.
This reduces the stack usage from ~131K to ~87K.
BUG=b/68362457
Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
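To see why narrowing the two fields in the entry above shrinks a large on-stack structure, compare the two layouts below. This is a hedged sketch: sum_square_error and sum_error come from the commit message, while the remaining fields and the names var_wide, var_narrow and bytes_saved are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the change: 64-bit accumulators. */
typedef struct {
  int64_t sum_square_error;
  int64_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} var_wide;

/* Layout after the change: 32-bit accumulators. */
typedef struct {
  uint32_t sum_square_error;
  int32_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} var_narrow;

/* Bytes saved when `count` such nodes live in one stack frame. */
static size_t bytes_saved(size_t count) {
  return count * (sizeof(var_wide) - sizeof(var_narrow));
}
```

On common ABIs with a 4-byte int the node shrinks from 24 to 16 bytes, which is in line with the roughly one-third reduction reported above.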
James Zern [Wed, 20 Jul 2016 03:56:25 +0000 (20:56 -0700)]
vp8: correct if/else '{' placement
swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.
Change-Id: I67a45596f80a12389faca49c5be440875092a7df
Scott LaVarnway [Thu, 26 Oct 2017 16:45:06 +0000 (09:45 -0700)]
vpx: hadamard: use ptrdiff_t instead of int for stride
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:
movslq %esi,%rax
Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
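A small sketch of the stride change in the entry above, using a hypothetical sum_column() helper rather than the hadamard code itself. With an int stride on x86-64 the compiler has to sign-extend the 32-bit value to 64 bits before using it in address arithmetic (the movslq noted above); a ptrdiff_t stride is already pointer-width.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the first element of each of `rows` rows. The stride is in
 * elements and is pointer-width, so no widening is needed before it is
 * used to index src. */
static int sum_column(const int16_t *src, ptrdiff_t stride, int rows) {
  int sum = 0;
  for (int r = 0; r < rows; ++r) {
    sum += src[r * stride];
  }
  return sum;
}
```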
Kyle Siefring [Tue, 24 Oct 2017 19:22:36 +0000 (19:22 +0000)]
Merge "Optimize convolve8 SSSE3 and AVX2 intrinsics"
Kyle Siefring [Sun, 22 Oct 2017 23:34:19 +0000 (19:34 -0400)]
Optimize convolve8 SSSE3 and AVX2 intrinsics
Changed the intrinsics to perform summation similar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for Haswell:
SSSE3
  Horiz/Vert  Size  Speedup
  Horiz       x4    ~32%
  Horiz       x8    ~6%
  Vert        x8    ~4%
AVX2
  Horiz/Vert  Size  Speedup
  Horiz       x16   ~16%
  Vert        x16   ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Scott LaVarnway [Mon, 23 Oct 2017 22:02:59 +0000 (22:02 +0000)]
Merge "vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix"
Marco [Mon, 23 Oct 2017 17:58:28 +0000 (10:58 -0700)]
vp9-svc: Allow for adapt_rd_thresh with row-mt.
Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allows the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.
~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.
Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
Scott LaVarnway [Fri, 20 Oct 2017 21:46:41 +0000 (14:46 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() highbitdepth fix
Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.
Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b
Scott LaVarnway [Fri, 20 Oct 2017 12:21:15 +0000 (05:21 -0700)]
vpx: [x86] vpx_hadamard_16x16_avx2() improvements
~10% performance gain. Fixed the cosmetics noted in the
previous commit.
Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13
Scott LaVarnway [Thu, 19 Oct 2017 23:32:10 +0000 (23:32 +0000)]
Merge "vpx: [x86] add vpx_hadamard_16x16_avx2()"
Paul Wilkins [Thu, 19 Oct 2017 10:07:45 +0000 (10:07 +0000)]
Merge "Corpus VBR tweak for undershoot."
Paul Wilkins [Thu, 19 Oct 2017 10:07:30 +0000 (10:07 +0000)]
Merge "Increase precision of some debug stats output for corpus VBR."
Paul Wilkins [Thu, 19 Oct 2017 10:06:33 +0000 (10:06 +0000)]
Merge "Prevent double application of min rate in two pass."
Scott LaVarnway [Thu, 19 Oct 2017 00:12:57 +0000 (17:12 -0700)]
vpx: [x86] add vpx_hadamard_16x16_avx2()
This version is ~1.91x faster than the sse2 version. When
highbitdepth is enabled, it is ~1.74x.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
Jerome Jiang [Wed, 18 Oct 2017 19:39:26 +0000 (19:39 +0000)]
Merge "Add datarate test for vp8 ROI."
Jerome Jiang [Tue, 17 Oct 2017 21:43:07 +0000 (14:43 -0700)]
Add datarate test for vp8 ROI.
BUG=webm:1470
Change-Id: Icbc848837e64eacc49491dcc26b4c5802af2ee13
Jerome Jiang [Wed, 18 Oct 2017 18:16:44 +0000 (18:16 +0000)]
Merge "vp8: Enable use of ROI map."
Kyle Siefring [Wed, 18 Oct 2017 16:19:52 +0000 (16:19 +0000)]
Merge "Refactor x86/vpx_subpixel_8t_intrin_avx2.c"
Shiyou Yin [Wed, 18 Oct 2017 00:55:36 +0000 (00:55 +0000)]
Merge "vp8: [loongson] optimize idct with mmi"
Jerome Jiang [Thu, 12 Oct 2017 22:03:22 +0000 (15:03 -0700)]
vp8: Enable use of ROI map.
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.
BUG=webm:1470
Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
Linfeng Zhang [Tue, 17 Oct 2017 16:03:29 +0000 (16:03 +0000)]
Merge changes I17fff122,Ic149e3cb
* changes:
Add 4 to 3 scaling SSSE3 optimization
Test extreme inputs in frame scale functions
Linfeng Zhang [Tue, 17 Oct 2017 16:03:07 +0000 (16:03 +0000)]
Merge "Generalize CheckScalingFiltering in ConvolveTest"
Kyle Siefring [Sat, 14 Oct 2017 20:26:35 +0000 (16:26 -0400)]
Refactor x86/vpx_subpixel_8t_intrin_avx2.c
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
Shiyou Yin [Wed, 13 Sep 2017 08:20:21 +0000 (16:20 +0800)]
vp8: [loongson] optimize idct with mmi
1. vp8_dequant_idct_add_y_block_mmi
2. vp8_dequant_idct_add_uv_block_mmi
Change-Id: I9987147be2685ac79d4b045d1d56f6709ee1223c
Linfeng Zhang [Wed, 11 Oct 2017 18:59:04 +0000 (11:59 -0700)]
Add 4 to 3 scaling SSSE3 optimization
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.
Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
Marco [Fri, 13 Oct 2017 22:31:02 +0000 (15:31 -0700)]
Adjust threshold in gf_boost for 1 pass vbr
Slightly increase sad_thresh1 to avoid some false
detection of possible scene changes within lag.
Small improvement in a few clips on ytlive, otherwise neutral change.
Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
Paul Wilkins [Fri, 13 Oct 2017 15:59:58 +0000 (15:59 +0000)]
Merge "Further Corpus VBR change."
Paul Wilkins [Fri, 13 Oct 2017 15:59:45 +0000 (15:59 +0000)]
Merge "Corpus Wide VBR test implementation."