platform/upstream/libvpx.git
Johann [Tue, 8 Aug 2017 21:05:16 +0000 (14:05 -0700)]
neon: vpx_quantize_b_32x32

With skip block the neon is about twice as fast as C.

The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even when the C can take the shortcut, the neon is
over twice as fast. When the C can't, the gap increases to over 10x.

BUG=webm:1426

Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6

Linfeng Zhang [Tue, 8 Aug 2017 00:37:02 +0000 (17:37 -0700)]
Update 32x32 idct sse2 funcs, add partial case 135

Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a

Linfeng Zhang [Fri, 4 Aug 2017 00:50:03 +0000 (17:50 -0700)]
Rename highbd_multiplication_and_add_xx() to highbd_butterfly_xx()

in idct x86 code

Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083

Linfeng Zhang [Fri, 4 Aug 2017 00:46:21 +0000 (17:46 -0700)]
Replace multiplication_and_add() with butterfly() in idct x86 code

Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e

Linfeng Zhang [Fri, 4 Aug 2017 00:42:54 +0000 (17:42 -0700)]
Update butterfly() in idct x86 optimizations.

Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701

Linfeng Zhang [Thu, 3 Aug 2017 00:48:40 +0000 (17:48 -0700)]
Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1

BUG=webm:1412

Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca

Linfeng Zhang [Fri, 4 Aug 2017 22:29:19 +0000 (15:29 -0700)]
Update for loop increment of idct x86 functions

Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4

Linfeng Zhang [Fri, 4 Aug 2017 22:10:12 +0000 (15:10 -0700)]
Update high bitdepth 16x16 idct x86 code

Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.

BUG=webm:1412

Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a

Johann Koenig [Fri, 4 Aug 2017 20:34:50 +0000 (20:34 +0000)]
Merge "quantize test: consolidate sizes"

Johann [Wed, 2 Aug 2017 17:24:19 +0000 (10:24 -0700)]
quantize test: consolidate sizes

Pass a max txfm size parameter and combine the base quantize
test with the 32x32 test.

Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b

Scott LaVarnway [Fri, 4 Aug 2017 14:48:46 +0000 (07:48 -0700)]
vpx_dsp: merge avx2 variance files

BUG=webm:1404

Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4

Kaustubh Raste [Fri, 4 Aug 2017 05:26:56 +0000 (10:56 +0530)]
Fix mips dspr2 6 tap filter clobber list

Change-Id: Ib7c07e6ce00a5c7e59113b16e6661a8369f9e646

Linfeng Zhang [Fri, 4 Aug 2017 01:16:35 +0000 (01:16 +0000)]
Merge "Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function"

Scott LaVarnway [Thu, 3 Aug 2017 23:17:09 +0000 (23:17 +0000)]
Merge "vpx_dsp: Use correct check for halfpel in"

Linfeng Zhang [Wed, 2 Aug 2017 23:28:13 +0000 (16:28 -0700)]
Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function

BUG=webm:1412

Change-Id: I945f0fb6807b8948747243794dc7352b959221f7

Linfeng Zhang [Thu, 3 Aug 2017 20:51:02 +0000 (20:51 +0000)]
Merge changes I76727df0,I66297d78,I1d000c6b

* changes:
  Extract inlined 16x16 idct sse2 code into header file
  Add transpose_32bit_8x4() sse2 optimization
  Update x86 idct optimization

Scott LaVarnway [Wed, 2 Aug 2017 19:19:19 +0000 (12:19 -0700)]
vpx_dsp: Use correct check for halfpel in

vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2

see:
17fae3a Change to use correct check for halfpel

Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c

Yunqing Wang [Thu, 3 Aug 2017 00:03:10 +0000 (00:03 +0000)]
Merge "Force the bit exactness in the first pass"

Linfeng Zhang [Wed, 2 Aug 2017 23:17:43 +0000 (16:17 -0700)]
Extract inlined 16x16 idct sse2 code into header file

Will be called by high bitdepth functions.

Change-Id: I76727df00941b5a27adceaba8347f275475fcd8c

Linfeng Zhang [Wed, 2 Aug 2017 23:15:58 +0000 (16:15 -0700)]
Add transpose_32bit_8x4() sse2 optimization

Change-Id: I66297d78b38db718cfe3ebb8ea972f5a72c17955

Yunqing Wang [Wed, 2 Aug 2017 22:47:09 +0000 (15:47 -0700)]
Force the bit exactness in the first pass

Originally, to keep the first pass fast, the first-pass stats for
row_mt_mode = 0 and row_mt_mode = 1 were not bit exact, but the
difference was so small that it did not cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.

BUG=webm:1453

Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8

Johann Koenig [Wed, 2 Aug 2017 21:16:35 +0000 (21:16 +0000)]
Merge "quantize test: add speed comparison"

Marco [Fri, 30 Jun 2017 15:51:31 +0000 (08:51 -0700)]
vp8: Drop due to overshoot for non-screen content.

For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.

For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.

Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.

This feature avoids large frame sizes on big content changes
that follow a low content period.

No change in behavior for screen_content_mode = 2.

Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6

Scott LaVarnway [Wed, 2 Aug 2017 19:08:10 +0000 (19:08 +0000)]
Merge "vpxdsp: variance_impl_avx2.c cleanup"

Johann [Thu, 27 Jul 2017 21:14:20 +0000 (14:14 -0700)]
quantize test: add speed comparison

Test some possible scenarios.

Change-Id: I1a612e7153b31756be66390ceea55877856d5a33

Scott LaVarnway [Tue, 25 Jul 2017 20:26:46 +0000 (13:26 -0700)]
vpxdsp: variance_impl_avx2.c cleanup

BUG=webm:1404

Change-Id: I8d8498009e5ef7bf1137e4ff16ec81738a020b02

shiyou yin [Wed, 2 Aug 2017 01:08:43 +0000 (01:08 +0000)]
Merge "loongson mmi configuration patch."

Linfeng Zhang [Tue, 1 Aug 2017 00:46:20 +0000 (17:46 -0700)]
Update x86 idct optimization

Move constant coefficients preparation into inline function.

Change-Id: I1d000c6b161794c8828ff70768439b767e2afea1

Linfeng Zhang [Tue, 1 Aug 2017 21:39:39 +0000 (21:39 +0000)]
Merge "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"

Johann Koenig [Tue, 1 Aug 2017 16:44:31 +0000 (16:44 +0000)]
Merge "neon: vpx_quantize_b"

Paul Wilkins [Tue, 1 Aug 2017 08:58:36 +0000 (08:58 +0000)]
Merge "Respond more rapidly to excessive local overshoot."

Marco Paniconi [Tue, 1 Aug 2017 02:48:13 +0000 (02:48 +0000)]
Merge "vp9: Adjust noise estimation for 360p."

Marco [Tue, 1 Aug 2017 00:06:14 +0000 (17:06 -0700)]
vp9: Adjust noise estimation for 360p.

Change-Id: Ib76875232491b14f7114061e8e913e87004427a0

Linfeng Zhang [Mon, 31 Jul 2017 23:36:13 +0000 (16:36 -0700)]
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2

This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.

The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().

Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8

James Zern [Mon, 31 Jul 2017 22:43:41 +0000 (22:43 +0000)]
Merge "highbd_inv_txfm_sse4: make << of neg. val a multiply"

Johann [Thu, 27 Jul 2017 20:25:38 +0000 (13:25 -0700)]
neon: vpx_quantize_b

With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
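
The "coeff < zbin" shortcut mentioned above can be sketched in scalar C.
This is a simplified, hypothetical single-coefficient helper written for
illustration, not the libvpx function; the name quantize_one and its
parameter list are assumptions. A scalar loop can return early for small
coefficients, while a vector implementation typically processes every
lane regardless, which would be consistent with the constant run time
reported for neon.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: quantize one coefficient, skipping the arithmetic
 * entirely when its magnitude is below the zero bin (zbin). */
static int16_t quantize_one(int16_t coeff, int16_t zbin, int16_t round,
                            int16_t quant, int16_t quant_shift) {
  const int abs_coeff = abs(coeff);
  int tmp;
  assert(zbin >= 0 && round >= 0 && quant >= 0 && quant_shift >= 0);
  if (abs_coeff < zbin) return 0; /* the shortcut: nothing else to do */
  tmp = abs_coeff + round;
  if (tmp > INT16_MAX) tmp = INT16_MAX; /* keep the products inside int */
  tmp = ((((tmp * quant) >> 16) + tmp) * quant_shift) >> 16;
  return (int16_t)(coeff < 0 ? -tmp : tmp); /* restore the sign */
}
```

When most coefficients fall below zbin the scalar path does very little
work per element, which is why that case is the most favorable one for C.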

YinShiyou [Fri, 23 Jun 2017 08:26:30 +0000 (16:26 +0800)]
loongson mmi configuration patch.

enable loongson mmi optimization: ../configure --enable-mmi

Change-Id: I7792c3adeac1d5b573917d7857bba6c1cc05fea5

Marco Paniconi [Mon, 31 Jul 2017 14:58:15 +0000 (14:58 +0000)]
Merge "Revert "Revert "vp9: Speed feature to adapt partition based on source_sad."""

Marco [Sat, 29 Jul 2017 02:11:53 +0000 (19:11 -0700)]
vp9: Fix denoising condition when pickmode partition is used.

When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).

Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04

Jerome Jiang [Mon, 31 Jul 2017 01:57:44 +0000 (18:57 -0700)]
Revert "Revert "vp9: Speed feature to adapt partition based on source_sad.""

This reverts commit c9266b85476aadf078238b7bde3c36bf7953e11c.

Disable source_sad when resolution > 1080P. The test should
pass now.

BUG=webm:1452

Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082

James Zern [Sun, 30 Jul 2017 19:48:28 +0000 (12:48 -0700)]
highbd_inv_txfm_sse4: make << of neg. val a multiply

left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.

Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
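
The shift-to-multiply change described above can be illustrated with a
small C sketch. The helper name scale_by_pow2 is hypothetical and does
not appear in libvpx; it only shows the general pattern of replacing a
left shift of a possibly negative value with a multiplication by a power
of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale v by 2^shift without shifting a negative
 * value. The caller must ensure the product fits in int32_t. */
static int32_t scale_by_pow2(int32_t v, int shift) {
  assert(shift >= 0 && shift < 31);
  /* "v << shift" is undefined behavior in C when v is negative;
   * "v * (1 << shift)" is well defined as long as it does not overflow. */
  return v * (1 << shift);
}
```

For a constant operand a compiler can fold either form at compile time,
which matches the note that the generated code does not change.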

James Zern [Sun, 30 Jul 2017 03:26:10 +0000 (03:26 +0000)]
Merge "Revert "vp9: Speed feature to adapt partition based on source_sad.""

James Zern [Sat, 29 Jul 2017 18:34:57 +0000 (11:34 -0700)]
Revert "vp9: Speed feature to adapt partition based on source_sad."

This reverts commit 064fc570ff8399536563e3846500fd99b273b034.

This causes an assertion failure in vp9_mcomp.c when running
gtest_filter=VP9/MotionVectorTestLarge.OverallTest/41:
`mv->col >= -((1 << (11 + 1 + 2)) - 1) && mv->col < ((1 << (11 + 1 + 2)) - 1)'

Change-Id: I449e777bf18b661cb3f1d82253610c55c51687f6

James Zern [Sat, 29 Jul 2017 18:07:01 +0000 (11:07 -0700)]
Revert "Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2"

This reverts commit aa1c4cd140007ea5b4be99732fbb23d1fd8cf2b5.

This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2

previously the optimized path was skipped in this range

Change-Id: I9af015a46eba96208834a219fafd651d37556a80

Marco Paniconi [Sat, 29 Jul 2017 01:46:58 +0000 (01:46 +0000)]
Merge "vp9: Adjust logic in source sad for screen content."

Marco Paniconi [Sat, 29 Jul 2017 01:45:19 +0000 (01:45 +0000)]
Merge "vp9: Speed feature to adapt partition based on source_sad."

Jerome Jiang [Fri, 28 Jul 2017 23:34:04 +0000 (16:34 -0700)]
vp9: Adjust logic in source sad for screen content.

Change-Id: I917d106f4c95ea44e413e23881f6303982e1a6a3

Marco [Fri, 28 Jul 2017 17:29:12 +0000 (10:29 -0700)]
vp9: Speed feature to adapt partition based on source_sad.

Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.

Currently used only for speed 6 for resolution <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.

Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c

Urvang Joshi [Fri, 28 Jul 2017 22:57:22 +0000 (15:57 -0700)]
Remove the DP version of vp9_optimize_b().

The greedy version was already enabled by default here:
https://chromium-review.googlesource.com/c/546848/

And the speed+compression gains from greedy version were already
mentioned here:
https://chromium-review.googlesource.com/c/531675/

Change-Id: Iad9f7d03490c845ad1e230af028c9d39edddca97

Linfeng Zhang [Fri, 28 Jul 2017 22:43:00 +0000 (22:43 +0000)]
Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34

* changes:
  Update vpx_idct16x16_10_add_sse2()
  Add vpx_idct16x16_38_add_sse2()
  Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2
  Refactor highbd idct 4x4 and 8x8 x86 functions

James Zern [Fri, 28 Jul 2017 08:21:28 +0000 (01:21 -0700)]
Revert "quantize ssse3: declare all variables"

This reverts commit 03f5e300d69d368290305e19cc66bac8b0ea1ff8.

This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0

Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b

Linfeng Zhang [Fri, 21 Jul 2017 21:56:42 +0000 (14:56 -0700)]
Update vpx_idct16x16_10_add_sse2()

Change-Id: Ia0e20f5fa47382af5785221eebb05212b40bd35c

Linfeng Zhang [Thu, 20 Jul 2017 23:53:19 +0000 (16:53 -0700)]
Add vpx_idct16x16_38_add_sse2()

Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898

Linfeng Zhang [Fri, 30 Jun 2017 23:55:17 +0000 (16:55 -0700)]
Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2

BUG=webm:1412

Change-Id: I35df041b757d42278ac7a5cdbd909e8ffcee1455

Linfeng Zhang [Fri, 30 Jun 2017 20:55:38 +0000 (13:55 -0700)]
Refactor highbd idct 4x4 and 8x8 x86 functions

BUG=webm:1412

Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec

Johann Koenig [Thu, 27 Jul 2017 21:18:35 +0000 (21:18 +0000)]
Merge "quantize ssse3: declare all variables"

Jerome Jiang [Thu, 27 Jul 2017 20:24:08 +0000 (20:24 +0000)]
Merge "vp8: Remove isolated skin & non skin blocks."

Jerome Jiang [Wed, 19 Jul 2017 20:02:53 +0000 (13:02 -0700)]
vp8: Remove isolated skin & non skin blocks.

Neutral on RTC metrics and speed on Pixel.

Change-Id: I26b907483fe133e6e4c1009d147631f0d0e0f2fb

James Zern [Wed, 26 Jul 2017 03:13:49 +0000 (20:13 -0700)]
inv_txfm_{sse2,ssse3}: clear conversion warnings

visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16

Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745

James Zern [Wed, 26 Jul 2017 03:11:09 +0000 (20:11 -0700)]
highbd_idct*_sse*.c: clear conversion warnings

visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32

Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05

James Zern [Tue, 25 Jul 2017 23:40:21 +0000 (16:40 -0700)]
vpx_variance16x16_sse2: correct cast order

allow the right shift to operate on 64-bits, this matches the rest of
the implementations

previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order

Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
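
The difference between the two cast orders can be sketched as follows.
Both function names are hypothetical, and the input in the usage note
(a sum of 255 * 1024, as a 32x32 block of 8-bit samples could produce)
is chosen only to make the difference visible; it is not taken from the
commit, which concerns the 16x16 case.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the full 64-bit product first, then narrow to 32 bits. */
static uint32_t mean_sq_shift_then_cast(int sum, int shift) {
  assert(shift >= 0 && shift < 63);
  return (uint32_t)(((int64_t)sum * sum) >> shift);
}

/* Narrow to 32 bits first, then shift: any bits of the product above
 * bit 31 are discarded before the shift can bring them into range. */
static uint32_t mean_sq_cast_then_shift(int sum, int shift) {
  assert(shift >= 0 && shift < 32);
  return ((uint32_t)((int64_t)sum * sum)) >> shift;
}
```

With sum = 261120 and shift = 10 the first form yields 66585600 while
the second yields 3671040, because the product no longer fits in 32 bits.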

James Zern [Mon, 24 Jul 2017 23:29:44 +0000 (16:29 -0700)]
vpx_get16x16var_avx2: correct cast order

allow the right shift to operate on 64-bits, this matches the rest of
the implementations

missed in:
6acd061aa variance_avx2: sync variance functions with c-code

Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617

James Zern [Sat, 22 Jul 2017 20:01:49 +0000 (13:01 -0700)]
set_var_thresh_from_histogram: prevent negative variance

For 8-bit the subtrahend is small enough to fit into uint32_t.

For 10/12-bit apply:
63a37d16f Prevent negative variance

previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.

Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
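
One common way to prevent a negative variance is to do the subtraction
in a signed 64-bit type and clamp at zero. The sketch below assumes that
approach; the helper name variance_clamped and its parameters are
hypothetical and it is not a copy of the referenced change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: variance = sse - sum^2 / 2^log2_count, computed
 * in 64 bits and clamped so that it never goes below zero. */
static uint32_t variance_clamped(uint32_t sse, int64_t sum, int log2_count) {
  assert(log2_count >= 0 && log2_count < 32);
  const int64_t var = (int64_t)sse - ((sum * sum) >> log2_count);
  return var >= 0 ? (uint32_t)var : 0;
}
```

Without the clamp, a subtrahend larger than sse would wrap around to a
very large unsigned value instead of a small one.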

Marco [Thu, 20 Jul 2017 20:43:55 +0000 (13:43 -0700)]
vp8: Fix compile warning in vp8_multi_resolution_encoder.c

Change-Id: I49c960179dfc1902aa5e5c99915789878c06bc3d

Johann Koenig [Thu, 20 Jul 2017 19:46:05 +0000 (19:46 +0000)]
Merge "quantize test: promote RandRange() result to signed"

Johann Koenig [Thu, 20 Jul 2017 19:45:59 +0000 (19:45 +0000)]
Merge "quantize test: lowbd functions do not pass in highbd"

Jerome Jiang [Thu, 20 Jul 2017 16:58:01 +0000 (16:58 +0000)]
Merge "vp9: Removed unused skin detection function."

Johann [Wed, 19 Jul 2017 21:33:00 +0000 (14:33 -0700)]
quantize test: promote RandRange() result to signed

Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'

Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
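
The promotion to signed can be sketched in C using the values from the
warning above. The helper name signed_diff is hypothetical; it only
shows converting both unsigned operands to int before subtracting.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: subtract two unsigned values as signed ints so a
 * smaller minuend gives a negative result instead of wrapping around. */
static int signed_diff(unsigned int a, unsigned int b) {
  assert(a <= INT_MAX && b <= INT_MAX); /* both must be representable */
  return (int)a - (int)b;
}
```

In unsigned arithmetic 19974 - 32703 wraps to a large positive number;
with the conversion the result is -12729.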

Johann [Wed, 19 Jul 2017 21:20:13 +0000 (14:20 -0700)]
quantize test: lowbd functions do not pass in highbd

qcoeff output looks OK but dqcoeff is no good.

BUG=webm:1448

Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd

Johann Koenig [Thu, 20 Jul 2017 15:17:26 +0000 (15:17 +0000)]
Merge "quantize test: eob is output"

Johann Koenig [Wed, 19 Jul 2017 21:35:57 +0000 (21:35 +0000)]
Merge "Earmark extra space for VSX."

Jerome Jiang [Wed, 19 Jul 2017 21:30:21 +0000 (21:30 +0000)]
Merge "Roll libwebm: Fix android build failure with NDK r15b."

Johann [Tue, 18 Jul 2017 21:20:14 +0000 (14:20 -0700)]
quantize test: eob is output

eob values are generated by the function.

Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3

Han Shen [Wed, 12 Jul 2017 19:56:19 +0000 (12:56 -0700)]
Earmark extra space for VSX.

Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only read <= 8 bytes. Although the extra bytes read are never actually
used, that does not justify reading out of bounds. Fixed by allocating more when
building for VSX. This was reported by asan.

Also note - PPC does have an instruction that loads 64-bit content from memory:
lxsdx loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doublewords).
However, we only have "vec_vsx_ld" builtins that map to lxvd2x, and no builtins
for lxsdx. The only way to access lxsdx is through inline assembly, which does
not fit well in the original paradigm.

Refer:
  vsx:
    vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
  neon:
    vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
  sse2:
    tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm

BUG=b/63112600

Tested:
  asan tests passed.

Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
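
The idea of earmarking extra space for wide loads can be sketched as a
size calculation. The helper padded_size is hypothetical and is not the
allocation code from this change; it gives a conservative bound under
the assumption that a vector load may start at any valid byte index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes to allocate so that a load of
 * load_bytes starting at any index below used_bytes stays in bounds. */
static size_t padded_size(size_t used_bytes, size_t load_bytes) {
  assert(used_bytes > 0 && load_bytes > 0);
  return used_bytes + load_bytes - 1;
}
```

For example, 8 useful bytes read with 16-byte loads would need 23 bytes
under this bound, while byte-wide loads need no padding at all.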

Johann Koenig [Wed, 19 Jul 2017 20:34:13 +0000 (20:34 +0000)]
Merge "variance: call C comp_avg_pred"

Jerome Jiang [Mon, 17 Jul 2017 20:59:14 +0000 (13:59 -0700)]
Roll libwebm: Fix android build failure with NDK r15b.

BUG=webm:1447

Change-Id: I8defe45cb94eb9c209ba72ce446786f24c14c0b8

Jerome Jiang [Tue, 18 Jul 2017 21:52:04 +0000 (14:52 -0700)]
vp9: Removed unused skin detection function.

Change-Id: I6702b7b11aa4ac9aac5fd54deef4377cdcb29c64

Jerome Jiang [Tue, 18 Jul 2017 21:30:04 +0000 (21:30 +0000)]
Merge "vp9: Allocate alt-ref in denoiser for SVC."

Jerome Jiang [Tue, 18 Jul 2017 20:48:32 +0000 (20:48 +0000)]
Merge "vp9: Remove isolated skin & non-skin blocks."

Johann Koenig [Tue, 18 Jul 2017 20:42:39 +0000 (20:42 +0000)]
Merge changes I62c2e313,Ibd7a0337,I94e1d886

* changes:
  quantize test: test sse2 and avx optimizations
  quantize test: extend arrays
  quantize test: restrict and correct input

Johann [Fri, 14 Jul 2017 18:29:32 +0000 (11:29 -0700)]
variance: call C comp_avg_pred

Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.

Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689

Jerome Jiang [Mon, 17 Jul 2017 23:29:16 +0000 (16:29 -0700)]
vp9: Allocate alt-ref in denoiser for SVC.

When SVC is used, allocate alt-ref in denoiser.

Change-Id: I1b17221b55b9444cd23b97d481b54ff8d296d857

Johann [Tue, 18 Jul 2017 19:32:57 +0000 (12:32 -0700)]
quantize ssse3: declare all variables

Copy missing line from avx implementation.

Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27

Johann [Tue, 18 Jul 2017 17:06:23 +0000 (10:06 -0700)]
quantize test: test sse2 and avx optimizations

ssse3 does not pass either of the tests.

avx 32x32 does not pass.

Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44

Jerome Jiang [Tue, 11 Jul 2017 18:31:01 +0000 (11:31 -0700)]
vp9: Remove isolated skin & non-skin blocks.

0.007% regression on rtc and 0.004% gain on rtc_derf.
On Google Pixel, 1 thread on QVGA, VGA and HD has a ~0.2% speed regression
while 2 threads have a ~0.2% speed gain.

Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc

Johann [Tue, 18 Jul 2017 16:55:45 +0000 (09:55 -0700)]
quantize test: extend arrays

Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.

If all the optimizations were unified, the structure sizes could be
greatly reduced.

Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
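
The 8-element layout described above (one dc element followed by 7
repeated ac elements) can be sketched as a small fill routine. The names
QUANT_ELEMS and fill_quant_row are hypothetical and chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUANT_ELEMS 8 /* one dc value plus seven copies of the ac value */

/* Hypothetical helper: lay out a quantizer row so that a single 128-bit
 * load of eight 16-bit lanes picks up dc in lane 0 and ac elsewhere. */
static void fill_quant_row(int16_t row[QUANT_ELEMS], int16_t dc, int16_t ac) {
  int i;
  assert(row != NULL);
  row[0] = dc;
  for (i = 1; i < QUANT_ELEMS; ++i) row[i] = ac;
}
```

Storing only { dc, ac } and duplicating the values at load time would be
the smaller layout that the last paragraph of the message alludes to.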

Johann [Tue, 18 Jul 2017 16:40:45 +0000 (09:40 -0700)]
quantize test: restrict and correct input

Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.

This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.

Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e

Marco [Tue, 18 Jul 2017 16:15:13 +0000 (09:15 -0700)]
vp9: Disable usage of sb_use_mv_part for SVC.

To fix valgrind issues with SVC tests.
SVC encoding uses prune_evenmore, which is causing an uninitialized value.

Will re-enable later when issue is resolved.

Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44

Marco [Mon, 17 Jul 2017 23:04:04 +0000 (16:04 -0700)]
vp9: Fix to setting content_state for real-time mode.

When content_state_sb is set to LowVarHighSumdiff, don't reset
it to VeryHighSad. Visually better on clips with strong lighting changes.

Small/negligible change in RTC metrics and speed.

Change-Id: I20c383e3c4cf8d1149de5f9260449c0b7cf7c6aa

Marco [Thu, 13 Jul 2017 21:49:39 +0000 (14:49 -0700)]
vp9: Reuse motion from choose_partitioning in NEWMV search.

When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.

For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.

Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b

James Zern [Sat, 15 Jul 2017 18:37:10 +0000 (18:37 +0000)]
Merge "fix 'make exampletest' w/CONFIG_REALTIME_ONLY"

Jerome Jiang [Fri, 14 Jul 2017 20:45:33 +0000 (13:45 -0700)]
vp9: Compute skin only for blocks eligible for noise estimation.

Change-Id: Iddcb83a5968db57cfd312c5bc44b2a226a2a3264

Marco [Thu, 13 Jul 2017 23:09:11 +0000 (16:09 -0700)]
vp9: Adjust minmax threshold for variance partitioning.

Only affects speed 7. Improvement on high motion clips.

Change-Id: Ibddb68fed9c63207df29ffd790f9205b1cecf687

Johann [Thu, 13 Jul 2017 16:14:37 +0000 (09:14 -0700)]
quantize test: use Buffer

Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.

Simplify buffer usage by using Buffer and removing unnecessarily aligned
variables.

eob is a single element and never written using aligned instructions.

BUG=webm:1426

Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35

James Zern [Thu, 13 Jul 2017 17:47:20 +0000 (10:47 -0700)]
fix 'make exampletest' w/CONFIG_REALTIME_ONLY

for tests that aren't explicitly testing 2-pass behavior use --passes=1
with this configuration

Change-Id: I6a1520ecc65d0f626486604310af29dacb9f197f

James Zern [Wed, 12 Jul 2017 23:30:04 +0000 (23:30 +0000)]
Merge "remove vp9_firstpass.c w/CONFIG_REALTIME_ONLY"

Johann Koenig [Wed, 12 Jul 2017 20:15:00 +0000 (20:15 +0000)]
Merge "sad4d neon: 64x[32,64]"

Marco Paniconi [Wed, 12 Jul 2017 19:13:05 +0000 (19:13 +0000)]
Merge "vp9: Fix to SVC and denoising for fixed pattern case."

Johann Koenig [Wed, 12 Jul 2017 15:01:30 +0000 (15:01 +0000)]
Merge changes Ibf5e61dc,I44b48512,I7de2500c,I5081b5ce

* changes:
  sad4d neon: 32x[16,32,64]
  sad4d neon: 16x[8,16,32]
  sad4d neon: 8x[4,8,16]
  sad4d neon: 4x4, 4x8

Johann [Tue, 11 Jul 2017 16:15:09 +0000 (09:15 -0700)]
sad4d neon: 64x[32,64]

Rewrite 64x64.

BUG=webm:1425

Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf