platform/upstream/libvpx.git
7 years agoMerge "Add vpx_highbd_idct32x32_135_add_c()"
Linfeng Zhang [Mon, 13 Mar 2017 18:49:01 +0000 (18:49 +0000)]
Merge "Add vpx_highbd_idct32x32_135_add_c()"

7 years agoMerge "vp9: Fix condition for intra search in non-rd pickmode."
Marco Paniconi [Mon, 13 Mar 2017 06:11:12 +0000 (06:11 +0000)]
Merge "vp9: Fix condition for intra search in non-rd pickmode."

7 years agovp9: Fix condition for intra search in non-rd pickmode.
Marco [Sat, 11 Mar 2017 06:50:43 +0000 (22:50 -0800)]
vp9: Fix condition for intra search in non-rd pickmode.

Fixes an issue when LAST and golden are not used as references,
in which case it's possible that no encoding mode is set (since intra may be
skipped under certain conditions). The fix is to make sure intra is searched
if no inter mode is checked.

The issue can occur with temporal layer pattern #7 in vpx_temporal_svc_encoder.c
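
The guard described above can be sketched roughly as follows. This is a minimal illustration rather than the encoder's actual code: should_search_intra, inter_modes_checked and intra_skip_heuristic are hypothetical names introduced here.

```c
/* Hypothetical sketch of the guard; the names are illustrative and are
 * not taken from libvpx. */
static int should_search_intra(int inter_modes_checked,
                               int intra_skip_heuristic) {
  /* If no inter mode was evaluated, intra is the only way to set a mode,
   * so it must not be skipped. */
  if (inter_modes_checked == 0) return 1;
  /* Otherwise the usual skip heuristic may still apply. */
  return !intra_skip_heuristic;
}
```

With zero inter modes checked, the function returns 1 regardless of the skip heuristic.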

Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b

7 years agoinv_txfm_ssse3,butterfly: fix win32 abi compatibility
James Zern [Fri, 10 Mar 2017 07:29:54 +0000 (23:29 -0800)]
inv_txfm_ssse3,butterfly: fix win32 abi compatibility

only the first 3 parameters can be aligned to 16 bytes as required by __m128i;
make them all pointers for consistency.
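
A rough sketch of the pointer-passing pattern, assuming an x86 target with SSE2. butterfly_sketch and butterfly_demo_lane0 are hypothetical names, and the body is a plain add/subtract rather than the real butterfly arithmetic:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Every vector argument is passed by pointer, so the callee never relies
 * on a by-value __m128i parameter being 16-byte aligned. */
static void butterfly_sketch(const __m128i *in0, const __m128i *in1,
                             __m128i *out_sum, __m128i *out_diff) {
  *out_sum = _mm_add_epi16(*in0, *in1);
  *out_diff = _mm_sub_epi16(*in0, *in1);
}

/* Helper for illustration: runs the sketch on splatted inputs and returns
 * lane 0 of either the sum or the difference. */
static int16_t butterfly_demo_lane0(int16_t a, int16_t b, int want_diff) {
  const __m128i va = _mm_set1_epi16(a);
  const __m128i vb = _mm_set1_epi16(b);
  __m128i sum, diff;
  butterfly_sketch(&va, &vb, &sum, &diff);
  return (int16_t)_mm_extract_epi16(want_diff ? diff : sum, 0);
}
```

Passing pointers leaves alignment to the caller's local variables, which the compiler can place on an aligned stack slot.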

since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance

BUG=webm:1384

Change-Id: I0324f701e723a27cb470036a180693ba8829d01d

7 years agoMerge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt"
Marco Paniconi [Fri, 10 Mar 2017 18:26:06 +0000 (18:26 +0000)]
Merge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt"

7 years agovp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt
Marco [Fri, 10 Mar 2017 16:46:23 +0000 (08:46 -0800)]
vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt

Enable row-mt in the sample encoder vpx_temporal_svc_encoder.c,
under certain conditions.

Change-Id: Ic103ee81a9d80be5bf6e5778cc21fc3199db909d

7 years agoMerge "Improve idct32x32_135_add SSSE3 intrinsics performance"
Yi Luo [Fri, 10 Mar 2017 17:14:30 +0000 (17:14 +0000)]
Merge "Improve idct32x32_135_add SSSE3 intrinsics performance"

7 years agoImprove idct32x32_135_add SSSE3 intrinsics performance
Yi Luo [Fri, 3 Mar 2017 00:52:41 +0000 (16:52 -0800)]
Improve idct32x32_135_add SSSE3 intrinsics performance

- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.

Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee

7 years agoMerge "ppc: include ppc.h for ppc_simd_caps()"
Johann Koenig [Thu, 9 Mar 2017 23:12:36 +0000 (23:12 +0000)]
Merge "ppc: include ppc.h for ppc_simd_caps()"

7 years agoMerge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c"
James Zern [Thu, 9 Mar 2017 22:51:08 +0000 (22:51 +0000)]
Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c"

7 years agoppc: include ppc.h for ppc_simd_caps()
Johann [Thu, 9 Mar 2017 17:26:45 +0000 (09:26 -0800)]
ppc: include ppc.h for ppc_simd_caps()

Change-Id: Idc829eb066cf4e905d062cb9c08424e0f1b7e1a7

7 years agomove vp9_scale_and_extend_frame_c to vp9_frame_scale.c
James Zern [Thu, 9 Mar 2017 04:42:35 +0000 (20:42 -0800)]
move vp9_scale_and_extend_frame_c to vp9_frame_scale.c

this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
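
The total above can be checked from the two section sizes, which are hexadecimal. A small sketch (section_delta and total_delta are hypothetical helper names):

```c
#include <stdint.h>

/* Signed size change of one section, in bytes. */
static int64_t section_delta(uint32_t before, uint32_t after) {
  return (int64_t)after - (int64_t)before;
}

/* .text shrinks by 0x30000 bytes and .data by 0x6000 bytes. */
static int64_t total_delta(void) {
  return section_delta(0x255B000, 0x252B000) + section_delta(0x7B000, 0x75000);
}
```

0x30000 is 196608 and 0x6000 is 24576, which sum to the 221184 bytes reported.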

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c

7 years agoMerge "vp9: Enable two speed features for SVC real-time mode."
Marco Paniconi [Thu, 9 Mar 2017 03:58:14 +0000 (03:58 +0000)]
Merge "vp9: Enable two speed features for SVC real-time mode."

7 years agovp9: Enable two speed features for SVC real-time mode.
Marco [Thu, 9 Mar 2017 00:10:45 +0000 (16:10 -0800)]
vp9: Enable two speed features for SVC real-time mode.

Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e

7 years agovp9: Add control to vpx_temporal_svc_encoder for row-mt.
Marco [Thu, 9 Mar 2017 00:01:58 +0000 (16:01 -0800)]
vp9: Add control to vpx_temporal_svc_encoder for row-mt.

Keep it off as default for now.

Change-Id: Ia2518a8ce96c9735c3fe67215dde25a35e8620af

7 years agoMerge "Shift speed 2 from non-large VP9 tests to large ones."
Jerome Jiang [Wed, 8 Mar 2017 23:14:27 +0000 (23:14 +0000)]
Merge "Shift speed 2 from non-large VP9 tests to large ones."

7 years agoMerge "Add support for POWER8/VSX"
Johann Koenig [Wed, 8 Mar 2017 22:38:21 +0000 (22:38 +0000)]
Merge "Add support for POWER8/VSX"

7 years agoMerge "Make the partition search early termination feature to be frame size dependent"
Yunqing Wang [Wed, 8 Mar 2017 22:31:30 +0000 (22:31 +0000)]
Merge "Make the partition search early termination feature to be frame size dependent"

7 years agoMake the partition search early termination feature to be frame size dependent
Yunqing Wang [Wed, 8 Mar 2017 20:24:15 +0000 (12:24 -0800)]
Make the partition search early termination feature to be frame size dependent

The two thresholds (partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) implement the partition search
early termination speed feature. This refactoring patch makes the
feature frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9

7 years agoUpdate vpx_idct32x32_1024_add_neon()
Linfeng Zhang [Tue, 7 Mar 2017 21:06:06 +0000 (13:06 -0800)]
Update vpx_idct32x32_1024_add_neon()

Most are cosmetic changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4

Tried the strategy used in 8x8 and 16x16 (where the order of operations is
similar to the C code): speed improves with gcc but gets worse
with clang.

Tried to remove store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e

7 years agoAdd support for POWER8/VSX
Rafael de Lucena Valle [Thu, 20 Oct 2016 00:21:09 +0000 (22:21 -0200)]
Add support for POWER8/VSX

Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST

Add VSX flags and check for -mvsx

Define empty setup_rtcd_internal

Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux

Detect VSX at runtime when enabled

Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>

7 years agoAdd vpx_highbd_idct32x32_135_add_c()
Linfeng Zhang [Wed, 8 Mar 2017 18:46:33 +0000 (10:46 -0800)]
Add vpx_highbd_idct32x32_135_add_c()

When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
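
A minimal sketch of an eob-based dispatch of this kind. Only the 135 threshold comes from the message above. The 34 threshold is an assumption read off the _34 function suffix, and the enum and function names are hypothetical:

```c
/* Hypothetical variant tags; libvpx calls the functions directly. */
enum idct32_variant { IDCT32_34, IDCT32_135, IDCT32_1024 };

static enum idct32_variant pick_idct32_variant(int eob) {
  if (eob <= 34) return IDCT32_34;   /* assumed threshold, from the suffix */
  if (eob <= 135) return IDCT32_135; /* threshold stated in the commit */
  return IDCT32_1024;                /* full 32x32 transform otherwise */
}
```

The point of the split is that a low eob means few nonzero coefficients, so a reduced transform can skip work.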

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6

7 years agoMerge "vp9: Fix for denoising with SVC."
Marco Paniconi [Wed, 8 Mar 2017 18:26:11 +0000 (18:26 +0000)]
Merge "vp9: Fix for denoising with SVC."

7 years agovp9: Fix for denoising with SVC.
Marco [Wed, 8 Mar 2017 01:35:45 +0000 (17:35 -0800)]
vp9: Fix for denoising with SVC.

Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42

7 years agocosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
Linfeng Zhang [Tue, 7 Mar 2017 23:29:15 +0000 (15:29 -0800)]
cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()

No speed changes and disassembly is almost identical.

Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f

7 years agocosmetics,dsp/arm/: rename a variable
Linfeng Zhang [Wed, 1 Mar 2017 23:11:46 +0000 (15:11 -0800)]
cosmetics,dsp/arm/: rename a variable

Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a

7 years agoShift speed 2 from non-large VP9 tests to large ones.
Jerome Jiang [Tue, 7 Mar 2017 21:58:11 +0000 (13:58 -0800)]
Shift speed 2 from non-large VP9 tests to large ones.

This may fix the timeout failure of the valgrind tests in the nightly run,
since more row-mt coverage was added.

Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc

7 years agoMerge "tiny_ssim.c : adds y4m support to tiny_ssim."
James Bankoski [Tue, 7 Mar 2017 18:49:13 +0000 (18:49 +0000)]
Merge "tiny_ssim.c : adds y4m support to tiny_ssim."

7 years agotiny_ssim.c : adds y4m support to tiny_ssim.
Jim Bankoski [Thu, 9 Feb 2017 22:12:55 +0000 (14:12 -0800)]
tiny_ssim.c : adds y4m support to tiny_ssim.

Change-Id: I7a13b7e3a1e11ddbe4be3009edf03528e1bc7647

7 years agoMerge "vp8_create_decoder_instances: correct pbi[] memset"
James Zern [Sat, 4 Mar 2017 00:47:17 +0000 (00:47 +0000)]
Merge "vp8_create_decoder_instances: correct pbi[] memset"

7 years agoMerge "Narrow cat6_high_cost tables to uint16_t"
Alex Converse [Fri, 3 Mar 2017 23:45:39 +0000 (23:45 +0000)]
Merge "Narrow cat6_high_cost tables to uint16_t"

7 years agovp8_create_decoder_instances: correct pbi[] memset
James Zern [Fri, 3 Mar 2017 23:23:32 +0000 (15:23 -0800)]
vp8_create_decoder_instances: correct pbi[] memset

clear the entire array on error. the size used previously was equal to
the number of elements.
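
The class of bug described here can be illustrated with a small sketch. It is not the libvpx code: NUM_INSTANCES and all function names are hypothetical, and the array holds generic pointers.

```c
#include <stddef.h>
#include <string.h>

#define NUM_INSTANCES 8 /* hypothetical array length */

/* Buggy pattern: n is the element count, so only n bytes are cleared. */
static void clear_by_count(void **arr, size_t n) { memset(arr, 0, n); }

/* Fixed pattern: scale by the element size to clear the whole array. */
static void clear_by_bytes(void **arr, size_t n) {
  memset(arr, 0, n * sizeof(arr[0]));
}

/* Returns how many entries are still non-NULL after clearing. */
static int entries_left(int use_fix) {
  static int dummy;
  void *pbi[NUM_INSTANCES];
  int i, left = 0;
  for (i = 0; i < NUM_INSTANCES; ++i) pbi[i] = &dummy;
  if (use_fix) {
    clear_by_bytes(pbi, NUM_INSTANCES);
  } else {
    clear_by_count(pbi, NUM_INSTANCES);
  }
  for (i = 0; i < NUM_INSTANCES; ++i) left += (pbi[i] != NULL);
  return left;
}
```

With pointers wider than one byte, the count-based call leaves trailing entries untouched, while the byte-based call clears them all.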

BUG=webm:1364

Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46

7 years agoNarrow cat6_high_cost tables to uint16_t
Alex Converse [Fri, 3 Mar 2017 23:02:56 +0000 (15:02 -0800)]
Narrow cat6_high_cost tables to uint16_t

Saves 2688 bytes of rodata.
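
As a rough check of the figure, assuming the tables previously held 32-bit integers (an assumption, as the old element type is not stated here), each entry shrinks by 2 bytes. The helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes saved per table entry when narrowing int32_t to uint16_t. */
static size_t bytes_saved(size_t entries) {
  return entries * (sizeof(int32_t) - sizeof(uint16_t));
}

/* Number of entries implied by a given total saving. */
static size_t entries_for_saving(size_t bytes) {
  return bytes / (sizeof(int32_t) - sizeof(uint16_t));
}
```

Under that assumption, 2688 bytes corresponds to 1344 table entries in total.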

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56

7 years agoMerge "vp9,realtime: Enable row multithreading for non-rd"
Vignesh Venkatasubramanian [Fri, 3 Mar 2017 19:05:52 +0000 (19:05 +0000)]
Merge "vp9,realtime: Enable row multithreading for non-rd"

7 years agoMerge "vp9: Speed 8: reduce the adaptive_rd_thresh level."
Marco Paniconi [Thu, 2 Mar 2017 22:25:03 +0000 (22:25 +0000)]
Merge "vp9: Speed 8: reduce the adaptive_rd_thresh level."

7 years agovp9: Speed 8: reduce the adaptive_rd_thresh level.
Marco [Thu, 2 Mar 2017 21:01:53 +0000 (13:01 -0800)]
vp9: Speed 8: reduce the adaptive_rd_thresh level.

Reduce the level from 4 to 2.
This gives a ~1-2% quality gain on the RTC set, with a small decrease in speed (~1-2% on Mac).

Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13

7 years agovp9,realtime: Enable row multithreading for non-rd
Vignesh Venkatasubramanian [Mon, 13 Feb 2017 19:36:02 +0000 (11:36 -0800)]
vp9,realtime: Enable row multithreading for non-rd

Enable row level multithreading for realtime encodes where non-rd
path is used (speed >= 5).

Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41

7 years agoImprove idct32x32_34_add SSSE3 intrinsics performance
Yi Luo [Wed, 1 Mar 2017 00:38:41 +0000 (16:38 -0800)]
Improve idct32x32_34_add SSSE3 intrinsics performance

- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.

Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35

7 years agoMerge "VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface"
Chrome Cunningham [Wed, 1 Mar 2017 18:01:13 +0000 (18:01 +0000)]
Merge "VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface"

7 years agoVPX_CODEC_CAP_HIGHBITDEPTH for decoder interface
Chris Cunningham [Thu, 16 Feb 2017 23:02:30 +0000 (15:02 -0800)]
VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface

Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value
is changed as part of this move.

Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH.

Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419

7 years agoRevert "Fix for max qindex calculation of a gf interval"
James Zern [Wed, 1 Mar 2017 00:17:49 +0000 (16:17 -0800)]
Revert "Fix for max qindex calculation of a gf interval"

This reverts commit d3db846cc50b1b0a9f6efcbe2b36c9c1943bc528.

This change causes a large drop in PSNR (4-5 dB) on low framerate
difficult content (tested at 360/480p)

BUG=b/35804225

Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0

7 years agovp9_ethread_test,cosmetics: s/new-mt/row-mt/
James Zern [Tue, 28 Feb 2017 23:13:11 +0000 (15:13 -0800)]
vp9_ethread_test,cosmetics: s/new-mt/row-mt/

Change-Id: I8c145337adf49d30b88a17ff31501b8751ed1fa0

7 years agostress.sh: add vp9_stress_test_row_mt
James Zern [Fri, 24 Feb 2017 08:55:01 +0000 (00:55 -0800)]
stress.sh: add vp9_stress_test_row_mt

vp9_stress_test now forces --row-mt=0 to cover both versions

Change-Id: I8d134879435bf1d8e76ab3fd89e698efba0e86b2

7 years agostress.sh: parameterize thread count
James Zern [Fri, 24 Feb 2017 08:54:02 +0000 (00:54 -0800)]
stress.sh: parameterize thread count

Change-Id: Iae45266cea86585f0935af4012335198cf93719f

7 years agostress.sh: add one pass encodes
James Zern [Fri, 24 Feb 2017 08:30:08 +0000 (00:30 -0800)]
stress.sh: add one pass encodes

Change-Id: I38e6c988f17c56fbfacd95378b27ef8d77c75f90

7 years agoAdd a comment in encoder thread test
Yunqing Wang [Tue, 28 Feb 2017 19:13:09 +0000 (11:13 -0800)]
Add a comment in encoder thread test

Added a comment.

Change-Id: I82f71c72598ad6f1eaa0b57b0b8ec56ab9658e81

7 years agoSet row_mt to 0 by default
Yunqing Wang [Tue, 28 Feb 2017 19:00:56 +0000 (11:00 -0800)]
Set row_mt to 0 by default

Set row_mt to 0 for now.

Change-Id: I922536a6d71a765e435daeaf4d932ef14363d19a

7 years agovp9: Fix an issue with setting variance thresholds.
Marco [Mon, 27 Feb 2017 20:03:12 +0000 (12:03 -0800)]
vp9: Fix an issue with setting variance thresholds.

From commit:
https://chromium-review.googlesource.com/c/441393/

On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.

Small change in RTC metrics and speed.

Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e

7 years agovp9_ethread_test: Rename new_mt to row_mt
Vignesh Venkatasubramanian [Mon, 27 Feb 2017 18:50:02 +0000 (10:50 -0800)]
vp9_ethread_test: Rename new_mt to row_mt

Rename leftover occurrences of new_mt.

Change-Id: Ib884e84c801fcd366ca4b57ec912ac5972023375

7 years agovp9: Rename new_mt to row_mt
Vignesh Venkatasubramanian [Fri, 24 Feb 2017 19:40:22 +0000 (11:40 -0800)]
vp9: Rename new_mt to row_mt

new_mt is a very generic name that will become obsolete soon enough.
Since this is exposed as a codec control, rename it to row_mt to
signify row-level parallelism. Also rename the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558

7 years agoRemove an old leftover comment
Yunqing Wang [Sat, 25 Feb 2017 02:31:21 +0000 (18:31 -0800)]
Remove an old leftover comment

Removed an old comment that wasn't true anymore.

Change-Id: I286ad8d7cb2843070a55e45a599d26bc226d6bd7

7 years agoget_prob(): rationalize int types
James Zern [Fri, 24 Feb 2017 23:36:52 +0000 (15:36 -0800)]
get_prob(): rationalize int types

promote the unsigned int calculation to uint64_t rather than int64_t for
type consistency
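
A sketch of why the 64-bit promotion matters, assuming the usual rounded num/den probability scaled to the range 1..255. get_prob_sketch is a hypothetical name and the plain clamp is a simplification; the real function may differ in detail.

```c
#include <stdint.h>

/* den must be nonzero. The multiply is done in uint64_t because
 * num * 256 no longer fits in 32 bits once num reaches 2^24. */
static uint8_t get_prob_sketch(unsigned int num, unsigned int den) {
  const uint64_t p = ((uint64_t)num * 256 + (den >> 1)) / den;
  if (p > 255) return 255; /* clamp to the largest probability */
  if (p < 1) return 1;     /* zero is not a valid probability */
  return (uint8_t)p;
}
```

Keeping every operand unsigned avoids mixing signed and unsigned types in the same expression.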

Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438

7 years agoMerge "Improve VP9 encoder threading test for better coverage"
Yunqing Wang [Fri, 24 Feb 2017 23:26:22 +0000 (23:26 +0000)]
Merge "Improve VP9 encoder threading test for better coverage"

7 years agoImprove VP9 encoder threading test for better coverage
Yunqing Wang [Wed, 22 Feb 2017 20:24:16 +0000 (12:24 -0800)]
Improve VP9 encoder threading test for better coverage

Re-organized the encoder threading tests and grouped tests into
4 parts. Added PSNR checking test to make sure the PSNR variation
is within a small range.

BUG=webm:1376

Change-Id: I09edb990236a87a4d2b2b0e1ceaf6c6435a35eff

7 years agoMerge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8."
Jerome Jiang [Fri, 24 Feb 2017 16:56:33 +0000 (16:56 +0000)]
Merge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8."

7 years agoconsolidate block_error functions
Johann [Fri, 17 Feb 2017 01:57:44 +0000 (17:57 -0800)]
consolidate block_error functions

vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.

In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
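
For context, a block error function of this kind sums squared differences between original and dequantized coefficients. The following is a minimal sketch under that assumption: block_error_sketch is a hypothetical name and int32_t stands in for the coefficient type.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the sum of squared differences between coeff and dqcoeff and
 * stores the sum of squared coefficients in *ssz. Accumulating in 64 bits
 * avoids the overflow that a narrower SIMD accumulator has to detect. */
static int64_t block_error_sketch(const int32_t *coeff, const int32_t *dqcoeff,
                                  size_t n, int64_t *ssz) {
  int64_t error = 0, sqcoeff = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
    sqcoeff += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```

An implementation that accumulates in narrower lanes has to rerun the computation when it detects overflow, which matches the slowdown described above.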

Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1

7 years agoMerge "block error sse2: use tran_low_t"
Johann Koenig [Fri, 24 Feb 2017 05:24:34 +0000 (05:24 +0000)]
Merge "block error sse2: use tran_low_t"

7 years agoMake vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.
Jerome Jiang [Wed, 22 Feb 2017 22:24:02 +0000 (14:24 -0800)]
Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.

Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.

Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.

BUG=webm:1371

Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712

7 years agoblock error sse2: use tran_low_t
Johann [Thu, 16 Feb 2017 20:44:49 +0000 (12:44 -0800)]
block error sse2: use tran_low_t

Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb

7 years agoMerge "vp8_fdct4x4 test: fix segfault again"
Johann Koenig [Thu, 23 Feb 2017 07:41:20 +0000 (07:41 +0000)]
Merge "vp8_fdct4x4 test: fix segfault again"

7 years agoMerge "vp9: 1pass CBR: modify condition for reducing loop filter."
Marco Paniconi [Thu, 23 Feb 2017 03:24:26 +0000 (03:24 +0000)]
Merge "vp9: 1pass CBR: modify condition for reducing loop filter."

7 years agoMerge "vp9: Non-rd pickmode: use simple block_yrd under some conditons."
Jerome Jiang [Wed, 22 Feb 2017 23:19:29 +0000 (23:19 +0000)]
Merge "vp9: Non-rd pickmode: use simple block_yrd under some conditons."

7 years agovp9: 1pass CBR: modify condition for reducing loop filter.
Marco [Wed, 22 Feb 2017 23:06:28 +0000 (15:06 -0800)]
vp9: 1pass CBR: modify condition for reducing loop filter.

The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.

Only affects 1 pass CBR.

Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664

7 years agovp9: Non-rd pickmode: use simple block_yrd under some conditons.
Marco [Fri, 17 Feb 2017 16:44:50 +0000 (08:44 -0800)]
vp9: Non-rd pickmode: use simple block_yrd under some conditons.

For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.

Disabled for now.

Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a

7 years agoMerge "vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0."
Marco Paniconi [Wed, 22 Feb 2017 19:52:24 +0000 (19:52 +0000)]
Merge "vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0."

7 years agovp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0.
Marco [Wed, 22 Feb 2017 18:45:21 +0000 (10:45 -0800)]
vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0.

This prevents a possible reduction of cyclic refresh after a key frame.

Change-Id: Idd4e49b69cd95476e7eccfa31b2bd8669569e9e8

7 years agovp8_fdct4x4 test: fix segfault again
Johann [Tue, 21 Feb 2017 19:12:45 +0000 (11:12 -0800)]
vp8_fdct4x4 test: fix segfault again

The output needs to be aligned. Input is read with 'movq', not 'movdqa',
so it is not expected to be aligned.
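
A small sketch of how a test might guarantee an aligned output buffer, using the C11 _Alignas specifier as a stand-in for whatever alignment macro the test actually uses. The function names are hypothetical.

```c
#include <stdint.h>

/* True when p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) { return ((uintptr_t)p & 15) == 0; }

/* Declares a 16-byte aligned 4x4 output block and checks its address. */
static int output_buffer_is_aligned(void) {
  _Alignas(16) int16_t output[16];
  output[0] = 0;
  return is_aligned16(output);
}
```

A store that requires 16-byte alignment faults on a misaligned address, which is consistent with the segfault being fixed.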

Change-Id: Ibd48a84c1785917a6a97c3689a05322abba486b4

7 years agovp9: Only compute y_sad for golden in variance partition for speed < 8.
Jerome Jiang [Wed, 22 Feb 2017 17:49:17 +0000 (09:49 -0800)]
vp9: Only compute y_sad for golden in variance partition for speed < 8.

Only affects speed 8. No obvious quality regression. Systematic speed
ups by ~1% on Nexus 6.

Change-Id: Ia904ca28ea041c3281c532911ec38fb7d7f46a17

7 years agoMerge "Refactored the row based multi-threading code"
Yunqing Wang [Wed, 22 Feb 2017 16:55:03 +0000 (16:55 +0000)]
Merge "Refactored the row based multi-threading code"

7 years agoMerge "Fix segmentation fault caused by denoiser working with spatial SVC."
Jerome Jiang [Wed, 22 Feb 2017 04:44:55 +0000 (04:44 +0000)]
Merge "Fix segmentation fault caused by denoiser working with spatial SVC."

7 years agovp9: Incorporate source sum_diff into non-rd partition thresholds.
Marco [Mon, 13 Feb 2017 18:16:42 +0000 (10:16 -0800)]
vp9: Incorporate source sum_diff into non-rd partition thresholds.

Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.

Small change in metrics for the RTC set: less than ~0.1 avgPSNR decrease
for both speed 7 and 8.

Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1

7 years agoFollowing SSSE3 intrinsics functions also work for HBD
Yi Luo [Tue, 21 Feb 2017 20:07:47 +0000 (12:07 -0800)]
Following SSSE3 intrinsics functions also work for HBD

- vpx_idct8x8_12_add_ssse3
  vpx_idct8x8_64_add_ssse3
  vpx_idct32x32_34_add_ssse3
  vpx_idct32x32_135_add_ssse3
  vpx_idct32x32_1024_add_ssse3
- turn on unit tests.

Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7

7 years agoMerge "Drop zbin_ptr and quant_shift_ptr"
Johann Koenig [Tue, 21 Feb 2017 18:16:38 +0000 (18:16 +0000)]
Merge "Drop zbin_ptr and quant_shift_ptr"

7 years agoFix segmentation fault caused by denoiser working with spatial SVC.
Jerome Jiang [Sat, 18 Feb 2017 01:56:08 +0000 (17:56 -0800)]
Fix segmentation fault caused by denoiser working with spatial SVC.

Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb

7 years agoMerge "Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests"
Yi Luo [Tue, 21 Feb 2017 16:36:05 +0000 (16:36 +0000)]
Merge "Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests"

7 years agoMerge "Change to prediction decay calculation."
Paul Wilkins [Tue, 21 Feb 2017 09:42:37 +0000 (09:42 +0000)]
Merge "Change to prediction decay calculation."

7 years agoMerge "vp9: Fix for non-rd pickmode for high-bitdepth build."
Marco Paniconi [Tue, 21 Feb 2017 05:37:22 +0000 (05:37 +0000)]
Merge "vp9: Fix for non-rd pickmode for high-bitdepth build."

7 years agovp9: Fix for non-rd pickmode for high-bitdepth build.
Marco [Tue, 21 Feb 2017 04:15:40 +0000 (20:15 -0800)]
vp9: Fix for non-rd pickmode for high-bitdepth build.

Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.

For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).

This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.

Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754

7 years agoRefactored the row based multi-threading code
Ranjit Kumar Tulabandu [Thu, 16 Feb 2017 13:37:41 +0000 (19:07 +0530)]
Refactored the row based multi-threading code

Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness

Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df

7 years agovp8_fdct4x4_test: align input and output buffers
James Zern [Sat, 18 Feb 2017 21:24:32 +0000 (13:24 -0800)]
vp8_fdct4x4_test: align input and output buffers

fixes segfault in 32-bit builds

Change-Id: I5b3cc5a335cb236a6ec4cb11fa8feb54ae0182c7

7 years agodatarate_test: disable OnePassCbrSvc2SpatialLayersDenoiserOn
James Zern [Sat, 18 Feb 2017 00:23:22 +0000 (16:23 -0800)]
datarate_test: disable OnePassCbrSvc2SpatialLayersDenoiserOn

segfaults

BUG=webm:1374

Change-Id: I3790c6cb8a539d13dee6a8225ef09b1575dea26c

7 years agoMerge "vp8_short_fdct4x4: verify optimized functions"
Johann Koenig [Fri, 17 Feb 2017 22:11:08 +0000 (22:11 +0000)]
Merge "vp8_short_fdct4x4: verify optimized functions"

7 years agoFix idct8x8 SSSE3 SingleExtremeCoeff unit tests
Yi Luo [Fri, 17 Feb 2017 18:59:46 +0000 (10:59 -0800)]
Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests

- In SSSE3 optimization, 16-bit addition and subtraction would
  overflow when input coefficients are extreme 16-bit signed values.
- Function-level speed becomes slower (unit ms):
  idct8x8_64: 284 -> 294
  idct8x8_12: 145 -> 158.
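
The overflow in the first bullet can be sketched in scalar C by emulating one 16-bit lane. The function names are hypothetical, and widening is shown only as one common remedy. How the commit itself avoids the overflow is not stated here.

```c
#include <stdint.h>

/* Emulates one lane of a wrapping (non-saturating) 16-bit add. */
static int add16_wrapping(int16_t a, int16_t b) {
  const uint16_t r = (uint16_t)((uint16_t)a + (uint16_t)b);
  return r >= 0x8000 ? (int)r - 0x10000 : (int)r;
}

/* The same add carried out at 32-bit precision keeps the true result. */
static int add_widened(int16_t a, int16_t b) { return (int)a + (int)b; }
```

The extra precision costs instructions, which fits the slowdown reported in the second bullet.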

BUG=webm:1332

Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b

7 years agoMerge "Add vpx_highbd_idct16x16_10_add_neon()"
James Zern [Fri, 17 Feb 2017 20:29:36 +0000 (20:29 +0000)]
Merge "Add vpx_highbd_idct16x16_10_add_neon()"

7 years agoChange to prediction decay calculation.
paulwilkins [Wed, 15 Feb 2017 16:41:38 +0000 (16:41 +0000)]
Change to prediction decay calculation.

This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here is that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.

The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)

Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c

7 years agovp8_short_fdct4x4: verify optimized functions
Johann [Fri, 23 Sep 2016 23:45:03 +0000 (16:45 -0700)]
vp8_short_fdct4x4: verify optimized functions

Change-Id: I7c7f5dfabde65c09f111fb0ced0e3ad231ee716e

7 years agotiny_ssim: clean up on failure
Johann [Tue, 31 Jan 2017 23:58:43 +0000 (15:58 -0800)]
tiny_ssim: clean up on failure

Clears up clang static analysis warnings about memory leaks.

Change-Id: I60d4d0f3794735a8b81d9da4a30d19e7a9cba9cf

7 years agoReplace idct32x32_1024_add_ssse3 assembly with intrinsics
Yi Luo [Thu, 16 Feb 2017 21:15:22 +0000 (13:15 -0800)]
Replace idct32x32_1024_add_ssse3 assembly with intrinsics

- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
  i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.

Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc

7 years agoMerge "cosmetics: Fix spelling mistake in compile flag name."
James Zern [Fri, 17 Feb 2017 00:04:42 +0000 (00:04 +0000)]
Merge "cosmetics: Fix spelling mistake in compile flag name."

7 years agoMerge "block error avx2: use tran_low_t"
Johann Koenig [Thu, 16 Feb 2017 23:51:14 +0000 (23:51 +0000)]
Merge "block error avx2: use tran_low_t"

7 years agoAdd vpx_highbd_idct16x16_10_add_neon()
Linfeng Zhang [Tue, 14 Feb 2017 18:24:51 +0000 (10:24 -0800)]
Add vpx_highbd_idct16x16_10_add_neon()

BUG=webm:1301

Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab

7 years agoMerge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"
James Zern [Thu, 16 Feb 2017 23:02:10 +0000 (23:02 +0000)]
Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"

7 years agoMerge "disable VP9MultiThreadedFrameParallel tests"
James Zern [Thu, 16 Feb 2017 22:56:02 +0000 (22:56 +0000)]
Merge "disable VP9MultiThreadedFrameParallel tests"

7 years agocosmetics: Fix spelling mistake in compile flag name.
paulwilkins [Thu, 16 Feb 2017 12:36:56 +0000 (12:36 +0000)]
cosmetics: Fix spelling mistake in compile flag name.

agressive -> aggressive

after:
ce7b38459 Aggressive VBR method.

Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5

7 years agoMerge "correct bitdepth_conversion_sse2.h header guard"
Johann Koenig [Thu, 16 Feb 2017 21:41:27 +0000 (21:41 +0000)]
Merge "correct bitdepth_conversion_sse2.h header guard"

7 years agoDrop zbin_ptr and quant_shift_ptr
Johann [Tue, 14 Feb 2017 00:29:49 +0000 (16:29 -0800)]
Drop zbin_ptr and quant_shift_ptr

vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.

scan is used for C code and iscan is used for SIMD implementations.

Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5

7 years agodisable VP9MultiThreadedFrameParallel tests
James Zern [Thu, 16 Feb 2017 20:56:04 +0000 (12:56 -0800)]
disable VP9MultiThreadedFrameParallel tests

these are flaky and cause TSan warnings with clang-3.9.1

BUG=webm:1372

Change-Id: I8a7047552ba2ccd2d8c45f8795818c74562e5990

7 years agocorrect bitdepth_conversion_sse2.h header guard
Johann [Thu, 16 Feb 2017 20:43:33 +0000 (12:43 -0800)]
correct bitdepth_conversion_sse2.h header guard

Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519

7 years agoMerge "Add idct32x32_135_add SSSE3 intrinsics"
Yi Luo [Thu, 16 Feb 2017 20:43:28 +0000 (20:43 +0000)]
Merge "Add idct32x32_135_add SSSE3 intrinsics"

7 years agoblock error avx2: use tran_low_t
Johann [Thu, 16 Feb 2017 19:12:31 +0000 (11:12 -0800)]
block error avx2: use tran_low_t

Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c