platform/upstream/libvpx.git
6 years agoMerge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."
Shiyou Yin [Tue, 22 Aug 2017 00:37:23 +0000 (00:37 +0000)]
Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."

7 years agoquantize test: test _fp_ version of quantize
Johann [Wed, 2 Aug 2017 21:28:05 +0000 (14:28 -0700)]
quantize test: test _fp_ version of quantize

None of the x86 optimizations pass the tests.

Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909

7 years agoRemove skip_block from quantize
Johann [Wed, 16 Aug 2017 20:34:14 +0000 (13:34 -0700)]
Remove skip_block from quantize

This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.

Add assert() and comments regarding the usage of skip_block.

Removing the parameter is a fairly involved process so leave it be for
the moment.

Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
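
A minimal sketch of the pattern this message describes, assuming a much simplified scalar quantizer. The name quantize_sketch and its parameter list are hypothetical and are not the libvpx prototype. The skip_block parameter stays in the signature, and an assert() documents that callers must deal with that case before calling in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified quantizer; not the libvpx prototype.
 * skip_block is kept in the signature but must always be 0 here,
 * because callers are expected to handle that case earlier. */
static int quantize_sketch(const int16_t *coeff, int n, int skip_block,
                           int16_t quant_step, int16_t *qcoeff) {
  int i, eob = 0;
  assert(!skip_block); /* handled before this code is reached */
  (void)skip_block;    /* avoids an unused warning when NDEBUG is set */
  for (i = 0; i < n; ++i) {
    qcoeff[i] = (int16_t)(coeff[i] / quant_step); /* truncating division */
    if (qcoeff[i] != 0) eob = i + 1;
  }
  return eob; /* one past the index of the last nonzero quantized value */
}
```

Keeping the parameter avoids touching every caller and every optimized variant at once, which matches the stated reason for leaving it in place for now.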

7 years agoMerge "vpx_dsp: vpx_get16x16var_avx2() cleanup"
Scott LaVarnway [Fri, 18 Aug 2017 20:30:59 +0000 (20:30 +0000)]
Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"

7 years agovpx_dsp: vpx_get16x16var_avx2() cleanup
Scott LaVarnway [Mon, 7 Aug 2017 18:56:42 +0000 (11:56 -0700)]
vpx_dsp: vpx_get16x16var_avx2() cleanup

BUG=webm:1404

Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a

7 years agoMerge "quantize: normalize intermediate types"
Johann Koenig [Fri, 18 Aug 2017 16:00:28 +0000 (16:00 +0000)]
Merge "quantize: normalize intermediate types"

7 years agovpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi.
Shiyou Yin [Wed, 2 Aug 2017 06:17:09 +0000 (14:17 +0800)]
vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi.

Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e

7 years agohighbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo
James Zern [Thu, 17 Aug 2017 22:37:38 +0000 (15:37 -0700)]
highbd_idct32x32*,idct32_34_4x32_quarter_1_2: fix typo

135 -> 34

fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]

Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682

7 years agoquantize: normalize intermediate types
Johann [Wed, 16 Aug 2017 17:22:48 +0000 (10:22 -0700)]
quantize: normalize intermediate types

Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.

HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.

Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
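
To illustrate why a signed abs_coeff simplifies restoring the sign, here is a small sketch. It assumes a toy quantizer in which a plain right shift (the hypothetical quant_shift) stands in for the real zbin, rounding and multiply steps; the function name is also hypothetical.

```c
#include <stdint.h>

/* Simplified illustration; real quantizers also apply zbin, rounding
 * and a fixed-point multiply. quant_shift is a hypothetical stand-in
 * for that arithmetic. */
static int16_t quantize_keep_sign(int16_t coeff, int quant_shift) {
  /* 0 for non-negative input, -1 (all ones) for negative input. */
  const int coeff_sign = coeff < 0 ? -1 : 0;
  /* Signed on purpose, even though the value is never negative. */
  const int abs_coeff = (coeff ^ coeff_sign) - coeff_sign;
  const int tmp = abs_coeff >> quant_shift;
  /* The same xor/subtract pair restores the sign without any casts
   * between signed and unsigned intermediates. */
  return (int16_t)((tmp ^ coeff_sign) - coeff_sign);
}
```

Because every intermediate is a signed int, the xor/subtract pair that strips the sign can be reused unchanged to put it back.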

7 years agoinv_txfm_sse2.h: correct idct*/iadst* prototypes
James Zern [Thu, 17 Aug 2017 06:06:09 +0000 (23:06 -0700)]
inv_txfm_sse2.h: correct idct*/iadst* prototypes

fixes mismatch between prototypes and definitions

Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652

7 years agoMerge "Prevent parameters that can cause invalid ARF groups."
Paul Wilkins [Wed, 16 Aug 2017 18:25:57 +0000 (18:25 +0000)]
Merge "Prevent parameters that can cause invalid ARF groups."

7 years agoMerge "Fix corrupt arf groups due to low "lag_in_frames""
Paul Wilkins [Wed, 16 Aug 2017 18:25:29 +0000 (18:25 +0000)]
Merge "Fix corrupt arf groups due to low "lag_in_frames""

7 years agoMerge changes I08b562b6,Ia275940a,I51106e90
Linfeng Zhang [Wed, 16 Aug 2017 16:36:37 +0000 (16:36 +0000)]
Merge changes I08b562b6,Ia275940a,I51106e90

* changes:
  Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
  Update highbd idct x86 optimizations.
  Update 32x32 idct sse2 and ssse3 optimizations.

7 years agoPrevent parameters that can cause invalid ARF groups.
paulwilkins [Wed, 16 Aug 2017 12:34:49 +0000 (13:34 +0100)]
Prevent parameters that can cause invalid ARF groups.

Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.

This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).

See also BUG=webm:1454

Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c

7 years agoFix corrupt arf groups due to low "lag_in_frames"
paulwilkins [Wed, 16 Aug 2017 13:07:24 +0000 (14:07 +0100)]
Fix corrupt arf groups due to low "lag_in_frames"

Having a very small value for "lag_in_frames" can result in
corrupt arf groups including displayed frames that update
the arf buffer and fake overlay frames that are not in fact
overlays of real arfs but are nevertheless starved of bits.

Leaving lag_in_frames at the default of 25 for these 5 frame two
pass VBR tests should now give rise to a valid ARF coding pattern
as follows:-  K(ey), A(rf), N(ormal), N, N, O(verlay).

This change is part of a response to BUG=webm:1454 where broken
arf groups interacted badly with a change that corrects for large rate
misses. However, it may still in some cases increase encode time by
virtue of the fact that the unit test now codes a correct coding pattern
with "hidden" ARF frames.

Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b

7 years agoMerge "Fix for encoder slowdown (for speeds >= 3)"
Paul Wilkins [Wed, 16 Aug 2017 13:01:38 +0000 (13:01 +0000)]
Merge "Fix for encoder slowdown (for speeds >= 3)"

7 years agoFix for encoder slowdown (for speeds >= 3)
paulwilkins [Mon, 14 Aug 2017 15:11:34 +0000 (16:11 +0100)]
Fix for encoder slowdown (for speeds >= 3)

Some clips in the nightly unit test exhibit a significant encoder slowdown,
which appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.

The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.

This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.

For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no up to
date estimate of the current frame's size to use as a basis for assessing the
requirement for a recode. In practice it was using the previous frame's size (or 0
for the first frame), which could cause odd behavior.

If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings it will be necessary to enable the dummy pack
which will in turn hurt encode speed.

BUG=webm:1454

Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f

7 years agoMerge "Clean up writing YUV files for debug purpose."
Jerome Jiang [Tue, 15 Aug 2017 18:28:54 +0000 (18:28 +0000)]
Merge "Clean up writing YUV files for debug purpose."

7 years agoMerge "vp9: Denoiser fix: use correct bsize for skin detection."
Marco Paniconi [Tue, 15 Aug 2017 17:53:08 +0000 (17:53 +0000)]
Merge "vp9: Denoiser fix: use correct bsize for skin detection."

7 years agoClean up writing YUV files for debug purpose.
Jerome Jiang [Mon, 14 Aug 2017 20:57:51 +0000 (13:57 -0700)]
Clean up writing YUV files for debug purpose.

Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.

To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising

For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'

Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528

7 years agoMerge changes I1f1edeaa,I89313cac
Johann Koenig [Tue, 15 Aug 2017 17:37:59 +0000 (17:37 +0000)]
Merge changes I1f1edeaa,I89313cac

* changes:
  quantize: silence unsigned overflow warning
  quantize test: quiet overflow warning

7 years agovp9: Denoiser fix: use correct bsize for skin detection.
Marco [Tue, 15 Aug 2017 17:01:09 +0000 (10:01 -0700)]
vp9: Denoiser fix: use correct bsize for skin detection.

Change-Id: I9d201fa3a4b00ebd147b57ed519fab8d59b0a802

7 years agoquantize: silence unsigned overflow warning
Johann [Tue, 15 Aug 2017 16:48:24 +0000 (09:48 -0700)]
quantize: silence unsigned overflow warning

The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.

Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0

7 years agoMerge "vp9: strip temporal filter code"
Scott LaVarnway [Tue, 15 Aug 2017 15:35:33 +0000 (15:35 +0000)]
Merge "vp9: strip temporal filter code"

7 years agoquantize test: quiet overflow warning
Johann [Tue, 15 Aug 2017 15:28:09 +0000 (08:28 -0700)]
quantize test: quiet overflow warning

Promote the result of RandRange to signed

Change-Id: I89313cace3bcbe9af96946bef00b6857fc48b128

7 years agoMerge "Patch relating to Issue 1456."
Paul Wilkins [Tue, 15 Aug 2017 14:57:56 +0000 (14:57 +0000)]
Merge "Patch relating to Issue 1456."

7 years agoMerge "Enable emergency fast Q adaptation for VBR test case."
Paul Wilkins [Tue, 15 Aug 2017 14:57:22 +0000 (14:57 +0000)]
Merge "Enable emergency fast Q adaptation for VBR test case."

7 years agoAdd vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
Linfeng Zhang [Tue, 15 Aug 2017 00:05:22 +0000 (17:05 -0700)]
Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}

BUG=webm:1412

Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f

7 years agoUpdate highbd idct x86 optimizations.
Linfeng Zhang [Mon, 14 Aug 2017 23:47:24 +0000 (16:47 -0700)]
Update highbd idct x86 optimizations.

BUG=webm:1412

Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069

7 years agoUpdate 32x32 idct sse2 and ssse3 optimizations.
Linfeng Zhang [Thu, 10 Aug 2017 22:17:48 +0000 (15:17 -0700)]
Update 32x32 idct sse2 and ssse3 optimizations.

Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70

7 years agovp9: strip temporal filter code
Scott LaVarnway [Thu, 10 Aug 2017 23:19:18 +0000 (16:19 -0700)]
vp9: strip temporal filter code

when CONFIG_REALTIME_ONLY is enabled.

BUG=webm:1446

Change-Id: Id547783ec75383966c40ab5cf6abb4a0f7984f52

7 years agoMerge changes I4b4beab1,I02f74dec
Johann Koenig [Mon, 14 Aug 2017 20:52:52 +0000 (20:52 +0000)]
Merge changes I4b4beab1,I02f74dec

* changes:
  quantize test: check skip_block
  quantize test: use negative input

7 years agoMerge "temporal filter test: adjust inputs and runtime"
Johann Koenig [Mon, 14 Aug 2017 20:46:22 +0000 (20:46 +0000)]
Merge "temporal filter test: adjust inputs and runtime"

7 years agovp9 svc: Fix the stats output when sl = 1.
Jerome Jiang [Mon, 14 Aug 2017 18:55:42 +0000 (11:55 -0700)]
vp9 svc: Fix the stats output when sl = 1.

Actual frame size and bitrate are all 0 when using the SVC sample encoder
with sl = 1 because the stats are set in parse_superframe_index, which
will not calculate properly when sl = 1 since there is no superframe.

Use pkt->data.frame.sz instead when sl = 1.

Change-Id: I93f5e98a4c779e32b007e1564ba5396af9e34ad6

7 years agoMerge "vp9: strip mb graph code"
Scott LaVarnway [Mon, 14 Aug 2017 18:01:44 +0000 (18:01 +0000)]
Merge "vp9: strip mb graph code"

7 years agotemporal filter test: adjust inputs and runtime
Johann [Tue, 28 Mar 2017 22:19:55 +0000 (15:19 -0700)]
temporal filter test: adjust inputs and runtime

Use input with a narrow range because the filter only applies when the
frames are similar.

Run CompareReferenceRandom more times. Especially before narrowing the
input range, the filter frequently did not apply.

Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38

7 years agodisable SSSE3/VP9QuantizeTest* in hbd builds
James Zern [Mon, 14 Aug 2017 16:31:14 +0000 (09:31 -0700)]
disable SSSE3/VP9QuantizeTest* in hbd builds

this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics

BUG=webm:1458

Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf

7 years agovp9: strip mb graph code
Scott LaVarnway [Fri, 11 Aug 2017 19:24:33 +0000 (12:24 -0700)]
vp9: strip mb graph code

when CONFIG_REALTIME_ONLY is enabled.

BUG=webm:1446

Change-Id: I4b1b8e9a456830ba1b1bd3a8882e038d37ee7903

7 years agoRename vp8 quantize file
Johann [Fri, 11 Aug 2017 17:44:36 +0000 (10:44 -0700)]
Rename vp8 quantize file

BUG=webm:1457

Change-Id: Ie8fae018ad8417724fde087055b90228850d631d

7 years agoMerge "vp9 SVC: Fix the denoiser frame buffer management."
Jerome Jiang [Fri, 11 Aug 2017 00:54:35 +0000 (00:54 +0000)]
Merge "vp9 SVC: Fix the denoiser frame buffer management."

7 years agovp9 SVC: Fix the denoiser frame buffer management.
Jerome Jiang [Mon, 7 Aug 2017 23:32:26 +0000 (16:32 -0700)]
vp9 SVC: Fix the denoiser frame buffer management.

Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).

This change is only for SVC with denoising and is bitexact.

Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae

7 years agoMerge "Clean highbd idct x86 code with inline functions"
Linfeng Zhang [Thu, 10 Aug 2017 20:25:18 +0000 (20:25 +0000)]
Merge "Clean highbd idct x86 code with inline functions"

7 years agoMerge "neon: vpx_quantize_b_32x32"
Johann Koenig [Thu, 10 Aug 2017 15:42:49 +0000 (15:42 +0000)]
Merge "neon: vpx_quantize_b_32x32"

7 years agoMerge "quantize: copy ssse3 optimizations to intrinsics"
Johann Koenig [Thu, 10 Aug 2017 15:42:20 +0000 (15:42 +0000)]
Merge "quantize: copy ssse3 optimizations to intrinsics"

7 years agoPatch relating to Issue 1456.
paulwilkins [Tue, 8 Aug 2017 11:01:46 +0000 (12:01 +0100)]
Patch relating to Issue 1456.

Testing of 4k videos encoded with a fixed arbitrary chunking interval
uncovered a bug whereby, if a chunk ends 1 frame before a real scene cut,
the next chunk may be encoded with two consecutive key frames at the start
with the first being assigned 0 bits.

This fix ensures that where there is a key frame group of length 1 it is
assigned at least 1 frame's worth of bits, not 0.

See also patch Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
which by virtue of allowing fast adaptation of Q made this bug more visible.

BUG=webm:1456

Change-Id: Ic9e016cb66d489b829412052273238975dc6f6ab
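
The guarantee described in this message can be sketched as a simple lower bound. The helper below is hypothetical: its name and parameters do not come from the encoder source, and the real fix may be expressed quite differently inside the rate control code.

```c
#include <stdint.h>

/* Hypothetical helper; the names do not come from the encoder source.
 * Guarantees that a key frame group, however short, is allotted at
 * least one average frame's worth of bits instead of 0. */
static int64_t clamp_kf_group_bits(int64_t kf_group_bits,
                                   int64_t avg_frame_bits) {
  return kf_group_bits < avg_frame_bits ? avg_frame_bits : kf_group_bits;
}
```

With this bound, a key frame group of length 1 that was computed as 0 bits would instead receive avg_frame_bits, while larger budgets pass through unchanged.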

7 years agoClean highbd idct x86 code with inline functions
Linfeng Zhang [Wed, 9 Aug 2017 00:39:04 +0000 (17:39 -0700)]
Clean highbd idct x86 code with inline functions

Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()

BUG=webm:1412

Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a

7 years agoMerge "vp9: Partition logic adjustment for speed 6 feature."
Marco Paniconi [Tue, 8 Aug 2017 23:08:10 +0000 (23:08 +0000)]
Merge "vp9: Partition logic adjustment for speed 6 feature."

7 years agoquantize test: check skip_block
Johann [Tue, 8 Aug 2017 21:21:58 +0000 (14:21 -0700)]
quantize test: check skip_block

Not all sizes were tested previously, only 4x4 and 32x32.

Change-Id: I4b4beab1b92a810a097a7306de04cc9e0e260315

7 years agoquantize test: use negative input
Johann [Tue, 8 Aug 2017 21:19:56 +0000 (14:19 -0700)]
quantize test: use negative input

coeff contains signed values.

Change-Id: I02f74decf30379a28122169ab3e844d0f3bd7d23

7 years agoneon: vpx_quantize_b_32x32
Johann [Tue, 8 Aug 2017 21:05:16 +0000 (14:05 -0700)]
neon: vpx_quantize_b_32x32

With skip block the neon is about twice as fast as C.

The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.

BUG=webm:1426

Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6

7 years agoquantize: copy ssse3 optimizations to intrinsics
Johann [Thu, 3 Aug 2017 17:22:07 +0000 (10:22 -0700)]
quantize: copy ssse3 optimizations to intrinsics

Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.

Fixes test failures in HBD and OS X builds.

Allows using it in 32bit builds, where it is about 40% faster than sse2.

Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.

Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
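
For readers unfamiliar with the two instructions credited above, here is a scalar model of what pabsw and psignw do to a single 16-bit lane. The helper names are hypothetical; intrinsics code would normally use _mm_abs_epi16 and _mm_sign_epi16 from <tmmintrin.h> on eight lanes at once.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pabsw (_mm_abs_epi16).
 * INT16_MIN has no positive counterpart and is returned unchanged. */
static int16_t lane_abs(int16_t a) {
  if (a == INT16_MIN) return a;
  return (int16_t)(a < 0 ? -a : a);
}

/* Scalar model of one lane of psignw (_mm_sign_epi16): negate a where
 * b is negative, zero it where b is zero, pass it through otherwise. */
static int16_t lane_sign(int16_t a, int16_t b) {
  if (b < 0) return a == INT16_MIN ? a : (int16_t)-a;
  if (b == 0) return 0;
  return a;
}
```

In a quantizer the pair fits naturally: lane_abs() yields the magnitude to quantize, and lane_sign(quantized, original) restores the sign in one step, which SSE2 has to emulate with several instructions.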

7 years agovp9: Partition logic adjustment for speed 6 feature.
Marco [Tue, 8 Aug 2017 17:34:47 +0000 (10:34 -0700)]
vp9: Partition logic adjustment for speed 6 feature.

When adapt_partition_source_sad is enabled (currently only at
speed 6 for resoln <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.

And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.

Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.

Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae

7 years agoUpdate 32x32 idct sse2 funcs, add partial case 135
Linfeng Zhang [Tue, 8 Aug 2017 00:37:02 +0000 (17:37 -0700)]
Update 32x32 idct sse2 funcs, add partial case 135

Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a

7 years agoRename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()
Linfeng Zhang [Fri, 4 Aug 2017 00:50:03 +0000 (17:50 -0700)]
Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()

in idct x86 code

Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083

7 years agoReplace multiplication_and_add() with butterfly() in idct x86 code
Linfeng Zhang [Fri, 4 Aug 2017 00:46:21 +0000 (17:46 -0700)]
Replace multiplication_and_add() with butterfly() in idct x86 code

Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e

7 years agoUpdate butterfly() in idct x86 optimizations.
Linfeng Zhang [Fri, 4 Aug 2017 00:42:54 +0000 (17:42 -0700)]
Update butterfly() in idct x86 optimizations.

Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701

7 years agoAdd vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1
Linfeng Zhang [Thu, 3 Aug 2017 00:48:40 +0000 (17:48 -0700)]
Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1

BUG=webm:1412

Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca

7 years agoUpdate for loop increment of idct x86 functions
Linfeng Zhang [Fri, 4 Aug 2017 22:29:19 +0000 (15:29 -0700)]
Update for loop increment of idct x86 functions

Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4

7 years agoUpdate high bitdepth 16x16 idct x86 code
Linfeng Zhang [Fri, 4 Aug 2017 22:10:12 +0000 (15:10 -0700)]
Update high bitdepth 16x16 idct x86 code

Prepare for high bitdepth 16x16 idct sse4.1 code.
Functions are only moved and renamed.

BUG=webm:1412

Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a

7 years agoMerge "quantize test: consolidate sizes"
Johann Koenig [Fri, 4 Aug 2017 20:34:50 +0000 (20:34 +0000)]
Merge "quantize test: consolidate sizes"

7 years agoquantize test: consolidate sizes
Johann [Wed, 2 Aug 2017 17:24:19 +0000 (10:24 -0700)]
quantize test: consolidate sizes

Pass a max txfm size parameter and combine the base quantize
test with the 32x32 test.

Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b

7 years agovpx_dsp: merge avx2 variance files
Scott LaVarnway [Fri, 4 Aug 2017 14:48:46 +0000 (07:48 -0700)]
vpx_dsp: merge avx2 variance files

BUG=webm:1404

Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4

7 years agoFix mips dspr2 6 tap filter clobber list
Kaustubh Raste [Fri, 4 Aug 2017 05:26:56 +0000 (10:56 +0530)]
Fix mips dspr2 6 tap filter clobber list

Change-Id: Ib7c07e6ce00a5c7e59113b16e6661a8369f9e646

7 years agoMerge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"
Linfeng Zhang [Fri, 4 Aug 2017 01:16:35 +0000 (01:16 +0000)]
Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"

7 years agoMerge "vpx_dsp: Use correct check for halfpel in"
Scott LaVarnway [Thu, 3 Aug 2017 23:17:09 +0000 (23:17 +0000)]
Merge "vpx_dsp: Use correct check for halfpel in"

7 years agoRewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function
Linfeng Zhang [Wed, 2 Aug 2017 23:28:13 +0000 (16:28 -0700)]
Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function

BUG=webm:1412

Change-Id: I945f0fb6807b8948747243794dc7352b959221f7

7 years agoMerge changes I76727df0,I66297d78,I1d000c6b
Linfeng Zhang [Thu, 3 Aug 2017 20:51:02 +0000 (20:51 +0000)]
Merge changes I76727df0,I66297d78,I1d000c6b

* changes:
  Extract inlined 16x16 idct sse2 code into header file
  Add transpose_32bit_8x4() sse2 optimization
  Update x86 idct optimization

7 years agovpx_dsp: Use correct check for halfpel in
Scott LaVarnway [Wed, 2 Aug 2017 19:19:19 +0000 (12:19 -0700)]
vpx_dsp: Use correct check for halfpel in

vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2

see:
17fae3a Change to use correct check for halfpel

Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c

7 years agoEnable emergency fast Q adaptation for VBR test case.
paulwilkins [Tue, 1 Aug 2017 16:06:29 +0000 (17:06 +0100)]
Enable emergency fast Q adaptation for VBR test case.

Enable fast adaptation of Q when there is a large overshoot
for the #ifdef AGGRESSIVE_VBR test case.

AGGRESSIVE_VBR is not currently enabled by default.

Change-Id: I7240bb6589795964b6b0b66df4468e4f21504e0f

7 years agoMerge "Force the bit exactness in the first pass"
Yunqing Wang [Thu, 3 Aug 2017 00:03:10 +0000 (00:03 +0000)]
Merge "Force the bit exactness in the first pass"

7 years agoExtract inlined 16x16 idct sse2 code into header file
Linfeng Zhang [Wed, 2 Aug 2017 23:17:43 +0000 (16:17 -0700)]
Extract inlined 16x16 idct sse2 code into header file

Will be called by high bitdepth functions.

Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c

7 years agoAdd transpose_32bit_8x4() sse2 optimization
Linfeng Zhang [Wed, 2 Aug 2017 23:15:58 +0000 (16:15 -0700)]
Add transpose_32bit_8x4() sse2 optimization

Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955

7 years agoForce the bit exactness in the first pass
Yunqing Wang [Wed, 2 Aug 2017 22:47:09 +0000 (15:47 -0700)]
Force the bit exactness in the first pass

Originally, for the purpose of keeping a fast first pass, the first-pass
stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but
that difference is so small that it doesn't cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.

BUG=webm:1453

Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8

7 years agoMerge "quantize test: add speed comparison"
Johann Koenig [Wed, 2 Aug 2017 21:16:35 +0000 (21:16 +0000)]
Merge "quantize test: add speed comparison"

7 years agovp8: Drop due to overshoot for non-screen content.
Marco [Fri, 30 Jun 2017 15:51:31 +0000 (08:51 -0700)]
vp8: Drop due to overshoot for non-screen content.

For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.

For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.

Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.

This feature is to avoid large frame sizes on big content
changes following a low-content period.

No change in behavior for screen_content_mode = 2.

Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6

7 years agoMerge "vpxdsp: variance_impl_avx2.c cleanup"
Scott LaVarnway [Wed, 2 Aug 2017 19:08:10 +0000 (19:08 +0000)]
Merge "vpxdsp: variance_impl_avx2.c cleanup"

7 years agoquantize test: add speed comparison
Johann [Thu, 27 Jul 2017 21:14:20 +0000 (14:14 -0700)]
quantize test: add speed comparison

Test some possible scenarios.

Change-Id: I1a612e7153b31756be66390ceea55877856d5a33

7 years agovpxdsp: variance_impl_avx2.c cleanup
Scott LaVarnway [Tue, 25 Jul 2017 20:26:46 +0000 (13:26 -0700)]
vpxdsp: variance_impl_avx2.c cleanup

BUG=webm:1404

Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02

7 years agoMerge "loongson mmi configuration patch."
shiyou yin [Wed, 2 Aug 2017 01:08:43 +0000 (01:08 +0000)]
Merge "loongson mmi configuration patch."

7 years agoUpdate x86 idct optimization
Linfeng Zhang [Tue, 1 Aug 2017 00:46:20 +0000 (17:46 -0700)]
Update x86 idct optimization

Move constant coefficients preparation into inline function.

Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1

7 years agoMerge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"
Linfeng Zhang [Tue, 1 Aug 2017 21:39:39 +0000 (21:39 +0000)]
Merge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"

7 years agoMerge "neon: vpx_quantize_b"
Johann Koenig [Tue, 1 Aug 2017 16:44:31 +0000 (16:44 +0000)]
Merge "neon: vpx_quantize_b"

7 years agoMerge "Respond more rapidly to excessive local overshoot."
Paul Wilkins [Tue, 1 Aug 2017 08:58:36 +0000 (08:58 +0000)]
Merge "Respond more rapidly to excessive local overshoot."

7 years agoMerge "vp9: Adjust noise estimation for 360p."
Marco Paniconi [Tue, 1 Aug 2017 02:48:13 +0000 (02:48 +0000)]
Merge "vp9: Adjust noise estimation for 360p."

7 years agovp9: Adjust noise estimation for 360p.
Marco [Tue, 1 Aug 2017 00:06:14 +0000 (17:06 -0700)]
vp9: Adjust noise estimation for 360p.

Change-Id: Ib76875232491b14f7114061e8e913e87004427a0

7 years agoRewrite vpx_highbd_idct8x8_{12,64}_add_sse2
Linfeng Zhang [Mon, 31 Jul 2017 23:36:13 +0000 (16:36 -0700)]
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2

This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.

The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().

Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
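
The message does not show the faulty code, so the following is only a generic illustration of why rounding a negated value is risky: with an add-half-then-shift rounding, negating before rounding and negating after rounding can give different results. All names are hypothetical.

```c
/* Round-to-nearest right shift: add half, then shift. Requires
 * bits >= 1. For a negative sum this relies on >> being an arithmetic
 * shift, which is implementation-defined in C but common in practice. */
static int round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

/* Negate first, then round. */
static int negate_then_round(int x, int bits) {
  return round_shift(-x, bits);
}

/* Round first, then negate. */
static int round_then_negate(int x, int bits) {
  return -round_shift(x, bits);
}
```

For x = 1 and bits = 1 the first form evaluates (-1 + 1) >> 1 and the second evaluates -((1 + 1) >> 1), so the two orders disagree on exact half-way values.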

7 years agoMerge "highbd_inv_txfm_sse4: make << of neg. val a multiply"
James Zern [Mon, 31 Jul 2017 22:43:41 +0000 (22:43 +0000)]
Merge "highbd_inv_txfm_sse4: make << of neg. val a multiply"

7 years agoneon: vpx_quantize_b
Johann [Thu, 27 Jul 2017 20:25:38 +0000 (13:25 -0700)]
neon: vpx_quantize_b

With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
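
The "coeff < zbin" shortcut mentioned above can be sketched as a backward scan that trims trailing coefficients lying inside the dead zone. This is a simplified sketch under stated assumptions: one zbin threshold for all positions and coefficients already in scan order. The function name is hypothetical and the real C implementation is more involved.

```c
#include <stdint.h>

/* Returns how many leading coefficients still need quantizing.
 * Trailing coefficients strictly inside (-zbin, zbin) quantize to zero
 * and are skipped. Assumes a single threshold and scan-ordered input. */
static int count_to_quantize(const int16_t *coeff, int n, int zbin) {
  int i;
  for (i = n - 1; i >= 0; --i) {
    if (coeff[i] >= zbin || coeff[i] <= -zbin) break;
  }
  return i + 1;
}
```

A scalar loop benefits from this trimming when most coefficients are small, whereas a vector version that processes every lane regardless takes the same time either way. That is consistent with the gap reported above growing when most values exceed zbin.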

7 years agoloongson mmi configuration patch.
YinShiyou [Fri, 23 Jun 2017 08:26:30 +0000 (16:26 +0800)]
loongson mmi configuration patch.

enable loongson mmi optimization: ../configure --enable-mmi

Change-Id: I7792c3adeac1d5b573917d7857bba6c1cc05fea5

7 years agoMerge "Revert "Revert "vp9: Speed feature to adapt partition based on source_sad."""
Marco Paniconi [Mon, 31 Jul 2017 14:58:15 +0000 (14:58 +0000)]
Merge "Revert "Revert "vp9: Speed feature to adapt partition based on source_sad."""

7 years agovp9: Fix denoising condition when pickmode partition is used.
Marco [Sat, 29 Jul 2017 02:11:53 +0000 (19:11 -0700)]
vp9: Fix denoising condition when pickmode partition is used.

When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. The current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).

Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04

7 years agoRevert "Revert "vp9: Speed feature to adapt partition based on source_sad.""
Jerome Jiang [Mon, 31 Jul 2017 01:57:44 +0000 (18:57 -0700)]
Revert "Revert "vp9: Speed feature to adapt partition based on source_sad.""

This reverts commit c9266b85476aadf078238b7bde3c36bf7953e11c.

Disable source_sad when resolution > 1080P. The test should
pass now.

BUG=webm:1452

Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082

7 years agohighbd_inv_txfm_sse4: make << of neg. val a multiply
James Zern [Sun, 30 Jul 2017 19:48:28 +0000 (12:48 -0700)]
highbd_inv_txfm_sse4: make << of neg. val a multiply

left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.

Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
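
A minimal sketch of the substitution named in the title, using a hypothetical helper: multiplying by a power of two gives the intended result for negative operands, whereas left-shifting a negative value is undefined behavior in C.

```c
/* Scale a possibly negative value by 2^shift without left-shifting a
 * negative operand. Only the non-negative constant 1 is shifted.
 * Assumes shift is small enough that the product does not overflow. */
static int scale_pow2(int value, int shift) {
  return value * (1 << shift);
}
```

When value and shift are compile-time constants, a compiler can fold either form to the same constant, which fits the note that the generated code does not change.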

7 years agoMerge "Revert "vp9: Speed feature to adapt partition based on source_sad.""
James Zern [Sun, 30 Jul 2017 03:26:10 +0000 (03:26 +0000)]
Merge "Revert "vp9: Speed feature to adapt partition based on source_sad.""

7 years agoRevert "vp9: Speed feature to adapt partition based on source_sad."
James Zern [Sat, 29 Jul 2017 18:34:57 +0000 (11:34 -0700)]
Revert "vp9: Speed feature to adapt partition based on source_sad."

This reverts commit 064fc570ff8399536563e3846500fd99b273b034.

This causes an assertion failure in vp9_mcomp.c when running
gtest_filter=VP9/MotionVectorTestLarge.OverallTest/41:
`mv->col >= -((1 << (11 + 1 + 2)) - 1) && mv->col < ((1 << (11 + 1 + 2))
- 1)'

Change-Id: I449e777bf18b661cb3f1d82253610c55c51687f6
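
To make the failing assertion easier to read, the sketch below evaluates the bound from the assertion text and checks one motion vector component against it. The constant and function names are hypothetical; only the arithmetic is taken from the message.

```c
/* Bound taken from the assertion text: (1 << (11 + 1 + 2)) - 1. */
enum { MV_BOUND_SKETCH = (1 << (11 + 1 + 2)) - 1 }; /* 16383 */

/* Mirrors the asserted condition for one component: the lower bound
 * is inclusive and the upper bound is exclusive. */
static int mv_col_in_range(int col) {
  return col >= -MV_BOUND_SKETCH && col < MV_BOUND_SKETCH;
}
```

The condition therefore accepts column values from -16383 up to and including 16382.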

7 years agoRevert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"
James Zern [Sat, 29 Jul 2017 18:07:01 +0000 (11:07 -0700)]
Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"

This reverts commit aa1c4cd140007ea5b4be99732fbb23d1fd8cf2b5.

This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2

previously the optimized path was skipped in this range

Change-Id: I9af015a46eba96208834a219fafd651d37556a80

7 years agoMerge "vp9: Adjust logic in source sad for screen content."
Marco Paniconi [Sat, 29 Jul 2017 01:46:58 +0000 (01:46 +0000)]
Merge "vp9: Adjust logic in source sad for screen content."

7 years agoMerge "vp9: Speed feature to adapt partition based on source_sad."
Marco Paniconi [Sat, 29 Jul 2017 01:45:19 +0000 (01:45 +0000)]
Merge "vp9: Speed feature to adapt partition based on source_sad."

7 years agovp9: Adjust logic in source sad for screen content.
Jerome Jiang [Fri, 28 Jul 2017 23:34:04 +0000 (16:34 -0700)]
vp9: Adjust logic in source sad for screen content.

Change-Id: I917d106f4c95ea44e413e23881f6303982e1a6a3

7 years agovp9: Speed feature to adapt partition based on source_sad.
Marco [Fri, 28 Jul 2017 17:29:12 +0000 (10:29 -0700)]
vp9: Speed feature to adapt partition based on source_sad.

Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.

Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.

Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c