platform/upstream/libvpx.git
Yunqing Wang [Sat, 25 Feb 2017 02:31:21 +0000 (18:31 -0800)]
Remove an old leftover comment

Removed an old comment that wasn't true anymore.

Change-Id: I286ad8d7cb2843070a55e45a599d26bc226d6bd7

James Zern [Fri, 24 Feb 2017 23:36:52 +0000 (15:36 -0800)]
get_prob(): rationalize int types

promote the unsigned int calculation to uint64_t rather than int64_t for
type consistency

Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
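
The promotion described above can be sketched as follows. This is a minimal illustration, not the library's code: the name `sketch_get_prob`, the scale factor of 256, and the clamp to [1, 255] are assumptions made for the example. The point is only that the product is formed in `uint64_t`, so it cannot wrap in 32 bits and never mixes signed and unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: scale num/den to an 8-bit probability in [1, 255]. */
uint8_t sketch_get_prob(unsigned int num, unsigned int den) {
  uint64_t p;
  assert(den != 0);
  /* Promote before multiplying so num * 256 cannot wrap in 32 bits. */
  p = ((uint64_t)num * 256 + (den >> 1)) / den;
  if (p > 255) p = 255;
  if (p < 1) p = 1;
  return (uint8_t)p;
}
```

Keeping every operand unsigned avoids the implicit signed/unsigned conversion that an `int64_t` cast would introduce.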

Yunqing Wang [Fri, 24 Feb 2017 23:26:22 +0000 (23:26 +0000)]
Merge "Improve VP9 encoder threading test for better coverage"

Yunqing Wang [Wed, 22 Feb 2017 20:24:16 +0000 (12:24 -0800)]
Improve VP9 encoder threading test for better coverage

Re-organized the encoder threading tests and grouped tests into
4 parts. Added PSNR checking test to make sure the PSNR variation
is within a small range.

BUG=webm:1376

Change-Id: I09edb990236a87a4d2b2b0e1ceaf6c6435a35eff

Jerome Jiang [Fri, 24 Feb 2017 16:56:33 +0000 (16:56 +0000)]
Merge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8."

Johann [Fri, 17 Feb 2017 01:57:44 +0000 (17:57 -0800)]
consolidate block_error functions

vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.

In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.

Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
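
For readers unfamiliar with what a block error function computes, here is a scalar sketch. It is an assumption-laden illustration, not the consolidated function: the name `sketch_block_error` and the use of `int32_t` for coefficients are hypothetical. It accumulates in 64 bits, so it needs no overflow check and no second run of the kind the message describes for the AVX version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: returns the sum of squared differences and
 * writes the sum of squared source coefficients to *ssz. */
int64_t sketch_block_error(const int32_t *coeff, const int32_t *dqcoeff,
                           size_t n, int64_t *ssz) {
  int64_t error = 0, sqcoeff = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    /* Widen before multiplying so the products are formed in 64 bits. */
    const int64_t c = coeff[i];
    const int64_t diff = c - dqcoeff[i];
    error += diff * diff;
    sqcoeff += c * c;
  }
  *ssz = sqcoeff;
  return error;
}
```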

Johann Koenig [Fri, 24 Feb 2017 05:24:34 +0000 (05:24 +0000)]
Merge "block error sse2: use tran_low_t"

Jerome Jiang [Wed, 22 Feb 2017 22:24:02 +0000 (14:24 -0800)]
Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8.

Only works for bitdepth = 8 when compiled with the high bitdepth flag.
4x speedup for 1:2 down/upsampling.

Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.

BUG=webm:1371

Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712

Johann [Thu, 16 Feb 2017 20:44:49 +0000 (12:44 -0800)]
block error sse2: use tran_low_t

Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb

Johann Koenig [Thu, 23 Feb 2017 07:41:20 +0000 (07:41 +0000)]
Merge "vp8_fdct4x4 test: fix segfault again"

Marco Paniconi [Thu, 23 Feb 2017 03:24:26 +0000 (03:24 +0000)]
Merge "vp9: 1pass CBR: modify condition for reducing loop filter."

Jerome Jiang [Wed, 22 Feb 2017 23:19:29 +0000 (23:19 +0000)]
Merge "vp9: Non-rd pickmode: use simple block_yrd under some conditions."

Marco [Wed, 22 Feb 2017 23:06:28 +0000 (15:06 -0800)]
vp9: 1pass CBR: modify condition for reducing loop filter.

The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.

Only affects 1 pass CBR.

Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664

Marco [Fri, 17 Feb 2017 16:44:50 +0000 (08:44 -0800)]
vp9: Non-rd pickmode: use simple block_yrd under some conditions.

For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.

Disabled for now.

Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a

Marco Paniconi [Wed, 22 Feb 2017 19:52:24 +0000 (19:52 +0000)]
Merge "vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0."

Marco [Wed, 22 Feb 2017 18:45:21 +0000 (10:45 -0800)]
vp9: aq-mode=3: On key frame reset cr->reduce_refresh to 0.

This prevents a possible reduction of cyclic refresh after a key frame.

Change-Id: Idd4e49b69cd95476e7eccfa31b2bd8669569e9e8

Johann [Tue, 21 Feb 2017 19:12:45 +0000 (11:12 -0800)]
vp8_fdct4x4 test: fix segfault again

The output needs to be aligned. Input is read with 'movq', not 'movdqa',
so it is not expected to be aligned.

Change-Id: Ibd48a84c1785917a6a97c3689a05322abba486b4
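
An aligned 16-byte store faults on an address that is not a multiple of 16, which is why the output buffer needs care. One portable way to obtain such an address from an over-allocated buffer is sketched below. The helper name `sketch_align_up` is hypothetical, and the test in the commit may well use a different mechanism (for example an alignment macro or an aligned allocator).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of align,
 * where align is a power of two. The caller must over-allocate by
 * align - 1 bytes so the result stays inside the buffer. */
void *sketch_align_up(void *p, size_t align) {
  uintptr_t addr = (uintptr_t)p;
  assert(align != 0 && (align & (align - 1)) == 0);
  addr = (addr + (align - 1)) & ~(uintptr_t)(align - 1);
  return (void *)addr;
}
```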

Jerome Jiang [Wed, 22 Feb 2017 17:49:17 +0000 (09:49 -0800)]
vp9: Only compute y_sad for golden in variance partition for speed < 8.

Only affects speed 8. No obvious quality regression. Consistent speedup
of ~1% on Nexus 6.

Change-Id: Ia904ca28ea041c3281c532911ec38fb7d7f46a17

Yunqing Wang [Wed, 22 Feb 2017 16:55:03 +0000 (16:55 +0000)]
Merge "Refactored the row based multi-threading code"

Jerome Jiang [Wed, 22 Feb 2017 04:44:55 +0000 (04:44 +0000)]
Merge "Fix segmentation fault caused by denoiser working with spatial SVC."

Marco [Mon, 13 Feb 2017 18:16:42 +0000 (10:16 -0800)]
vp9: Incorporate source sum_diff into non-rd partition thresholds.

Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.

Small change on metrics for rtc set: less than ~0.1 avgPSNR decrease
on RTC set, for both speed 7 and 8.

Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1

Yi Luo [Tue, 21 Feb 2017 20:07:47 +0000 (12:07 -0800)]
Following SSSE3 intrinsics functions also work for HBD

- vpx_idct8x8_12_add_ssse3
  vpx_idct8x8_64_add_ssse3
  vpx_idct32x32_34_add_ssse3
  vpx_idct32x32_135_add_ssse3
  vpx_idct32x32_1024_add_ssse3
- turn on unit tests.

Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7

Johann Koenig [Tue, 21 Feb 2017 18:16:38 +0000 (18:16 +0000)]
Merge "Drop zbin_ptr and quant_shift_ptr"

Jerome Jiang [Sat, 18 Feb 2017 01:56:08 +0000 (17:56 -0800)]
Fix segmentation fault caused by denoiser working with spatial SVC.

Re-enable the affected test.
BUG=webm:1374

Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb

Yi Luo [Tue, 21 Feb 2017 16:36:05 +0000 (16:36 +0000)]
Merge "Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests"

Paul Wilkins [Tue, 21 Feb 2017 09:42:37 +0000 (09:42 +0000)]
Merge "Change to prediction decay calculation."

Marco Paniconi [Tue, 21 Feb 2017 05:37:22 +0000 (05:37 +0000)]
Merge "vp9: Fix for non-rd pickmode for high-bitdepth build."

Marco [Tue, 21 Feb 2017 04:15:40 +0000 (20:15 -0800)]
vp9: Fix for non-rd pickmode for high-bitdepth build.

Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.

For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).

This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.

Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754

Ranjit Kumar Tulabandu [Thu, 16 Feb 2017 13:37:41 +0000 (19:07 +0530)]
Refactored the row based multi-threading code

Modified the code to facilitate bit-match tests in the first pass.
Added unit tests to check the row based multi-threading behavior for bit-exactness.

Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df

James Zern [Sat, 18 Feb 2017 21:24:32 +0000 (13:24 -0800)]
vp8_fdct4x4_test: align input and output buffers

fixes segfault in 32-bit builds

Change-Id: I5b3cc5a335cb236a6ec4cb11fa8feb54ae0182c7

James Zern [Sat, 18 Feb 2017 00:23:22 +0000 (16:23 -0800)]
datarate_test: disable OnePassCbrSvc2SpatialLayersDenoiserOn

segfaults

BUG=webm:1374

Change-Id: I3790c6cb8a539d13dee6a8225ef09b1575dea26c

Johann Koenig [Fri, 17 Feb 2017 22:11:08 +0000 (22:11 +0000)]
Merge "vp8_short_fdct4x4: verify optimized functions"

Yi Luo [Fri, 17 Feb 2017 18:59:46 +0000 (10:59 -0800)]
Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests

- In SSSE3 optimization, 16-bit addition and subtraction would
  overflow when input coefficient is 16-bit signed extreme values.
- Function-level speed becomes slower (unit ms):
  idct8x8_64: 284 -> 294
  idct8x8_12: 145 -> 158.

BUG=webm:1332

Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
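
The overflow described above can be modeled in scalar C. The sketch below contrasts a wrapping 16-bit add (the behavior of `paddw`) with a saturating one (the behavior of `paddsw`). Both function names are hypothetical, and the commit message does not say which remedy was actually applied, so the saturating variant is shown only as one possible approach.

```c
#include <stdint.h>

/* Models a lane-wise 16-bit add that wraps on overflow (as paddw does). */
int16_t sketch_wrap_add16(int16_t a, int16_t b) {
  const uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);
  /* Map the unsigned bit pattern back to two's complement by hand. */
  return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Models a saturating 16-bit add (as paddsw does). */
int16_t sketch_sat_add16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) s = INT16_MAX;
  if (s < INT16_MIN) s = INT16_MIN;
  return (int16_t)s;
}
```

With extreme inputs such as 32767 + 1, the wrapping form flips sign while the saturating form clamps, which is the kind of divergence a SingleExtremeCoeff-style test would expose.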

James Zern [Fri, 17 Feb 2017 20:29:36 +0000 (20:29 +0000)]
Merge "Add vpx_highbd_idct16x16_10_add_neon()"

paulwilkins [Wed, 15 Feb 2017 16:41:38 +0000 (16:41 +0000)]
Change to prediction decay calculation.

This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here is that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.

The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)

Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c

Johann [Fri, 23 Sep 2016 23:45:03 +0000 (16:45 -0700)]
vp8_short_fdct4x4: verify optimized functions

Change-Id: I7c7f5dfabde65c09f111fb0ced0e3ad231ee716e

Johann [Tue, 31 Jan 2017 23:58:43 +0000 (15:58 -0800)]
tiny_ssim: clean up on failure

Clears up clang static analysis warnings about memory leaks.

Change-Id: I60d4d0f3794735a8b81d9da4a30d19e7a9cba9cf

Yi Luo [Thu, 16 Feb 2017 21:15:22 +0000 (13:15 -0800)]
Replace idct32x32_1024_add_ssse3 assembly with intrinsics

- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
  i7-6700, no obvious user-level slowdown.
- Passed unit tests.

Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc

James Zern [Fri, 17 Feb 2017 00:04:42 +0000 (00:04 +0000)]
Merge "cosmetics: Fix spelling mistake in compile flag name."

Johann Koenig [Thu, 16 Feb 2017 23:51:14 +0000 (23:51 +0000)]
Merge "block error avx2: use tran_low_t"

Linfeng Zhang [Tue, 14 Feb 2017 18:24:51 +0000 (10:24 -0800)]
Add vpx_highbd_idct16x16_10_add_neon()

BUG=webm:1301

Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab

James Zern [Thu, 16 Feb 2017 23:02:10 +0000 (23:02 +0000)]
Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function"

James Zern [Thu, 16 Feb 2017 22:56:02 +0000 (22:56 +0000)]
Merge "disable VP9MultiThreadedFrameParallel tests"

paulwilkins [Thu, 16 Feb 2017 12:36:56 +0000 (12:36 +0000)]
cosmetics: Fix spelling mistake in compile flag name.

agressive -> aggressive

after:
ce7b38459 Aggressive VBR method.

Change-Id: Ie0f30b1bbc77ed9f32bec047b4a9b3d0cf4853f5

Johann Koenig [Thu, 16 Feb 2017 21:41:27 +0000 (21:41 +0000)]
Merge "correct bitdepth_conversion_sse2.h header guard"

Johann [Tue, 14 Feb 2017 00:29:49 +0000 (16:29 -0800)]
Drop zbin_ptr and quant_shift_ptr

vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.

scan is used for C code and iscan is used for SIMD implementations.

Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5

James Zern [Thu, 16 Feb 2017 20:56:04 +0000 (12:56 -0800)]
disable VP9MultiThreadedFrameParallel tests

these are flaky and cause TSan warnings with clang-3.9.1

BUG=webm:1372

Change-Id: I8a7047552ba2ccd2d8c45f8795818c74562e5990

Johann [Thu, 16 Feb 2017 20:43:33 +0000 (12:43 -0800)]
correct bitdepth_conversion_sse2.h header guard

Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519

Yi Luo [Thu, 16 Feb 2017 20:43:28 +0000 (20:43 +0000)]
Merge "Add idct32x32_135_add SSSE3 intrinsics"

Johann [Thu, 16 Feb 2017 19:12:31 +0000 (11:12 -0800)]
block error avx2: use tran_low_t

Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c

Johann Koenig [Thu, 16 Feb 2017 20:34:48 +0000 (20:34 +0000)]
Merge changes I267050a5,Iebade0ef,Id96a8df3

* changes:
  quantize_fp_32x32 highbd ssse3: enable existing function
  quantize_fp highbd ssse3: use tran_low_t for coeff
  quantize_fp highbd sse2: use tran_low_t for coeff

Yi Luo [Wed, 15 Feb 2017 01:09:59 +0000 (17:09 -0800)]
Add idct32x32_135_add SSSE3 intrinsics

- Replace the corresponding assembly code.
- No user-level speed degradation.
- Unit tests passed.

Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5

Yunqing Wang [Thu, 16 Feb 2017 16:22:54 +0000 (16:22 +0000)]
Merge "Structured the mode ordering code to avoid redundant memcpy"

Johann [Thu, 16 Feb 2017 15:29:32 +0000 (07:29 -0800)]
quantize_fp_32x32 highbd ssse3: enable existing function

This was created as part of the quantize_fp_ssse3 change. Both
functions use the same source file with different macro parameters.

Change-Id: I267050a559426a85955d215aa0aaca270439c5ab

Johann [Thu, 16 Feb 2017 03:01:38 +0000 (19:01 -0800)]
quantize_fp highbd ssse3: use tran_low_t for coeff

Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8

Johann [Fri, 3 Feb 2017 23:57:28 +0000 (15:57 -0800)]
quantize_fp highbd sse2: use tran_low_t for coeff

Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa

Johann [Thu, 16 Feb 2017 01:17:45 +0000 (17:17 -0800)]
bitdepth conversion: really use num elements

The previous implementation confused bits/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.

Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
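
The distinction between element counts and byte counts can be made explicit with a small helper. This is a generic sketch, not the conversion code itself: `sketch_byte_offset` is a hypothetical name, and the element sizes in the usage note are only examples.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of element `index` in an array whose
 * elements are elem_size bytes each. */
size_t sketch_byte_offset(size_t index, size_t elem_size) {
  return index * elem_size;
}
```

For instance, 8 elements of a 4-byte type span 32 bytes while 8 elements of a 2-byte type span 16, so a hard-coded multiplier is only right for one element width.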

Ranjit Kumar Tulabandu [Thu, 16 Feb 2017 14:07:39 +0000 (19:37 +0530)]
Structured the mode ordering code to avoid redundant memcpy

Change-Id: I4f5d6b54018bd1928cd9e5e42619e6f55b334803

Paul Wilkins [Thu, 16 Feb 2017 10:02:09 +0000 (10:02 +0000)]
Merge "Disconnect ARF breakout from frame boost."

Paul Wilkins [Thu, 16 Feb 2017 10:01:57 +0000 (10:01 +0000)]
Merge "Remove unnecessary factor."

Paul Wilkins [Thu, 16 Feb 2017 10:01:45 +0000 (10:01 +0000)]
Merge "Bug in scale_sse_threshold()"

Paul Wilkins [Thu, 16 Feb 2017 09:39:29 +0000 (09:39 +0000)]
Merge "Additional first pass stats."

Kaustubh Raste [Thu, 16 Feb 2017 06:42:24 +0000 (12:12 +0530)]
Fix mips vpx_post_proc_down_and_across_mb_row_msa function

Added fix to handle non-multiple of 16 cols case for size 16

Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c

Johann Koenig [Thu, 16 Feb 2017 02:40:59 +0000 (02:40 +0000)]
Merge "Use 'packssdw' for loading tran_low_t values"

Johann Koenig [Thu, 16 Feb 2017 01:00:44 +0000 (01:00 +0000)]
Merge "vp8_dx_iface: remove unused 'else' condition"

James Zern [Thu, 16 Feb 2017 00:21:19 +0000 (00:21 +0000)]
Merge "vpx_temporal_svc_encoder.sh: remove FUNCNAME bashism"

Marco Paniconi [Wed, 15 Feb 2017 23:03:27 +0000 (23:03 +0000)]
Merge "vp9: Some code cleanup for aq-mode = 3."

Marco [Wed, 15 Feb 2017 21:51:14 +0000 (13:51 -0800)]
vp9: Some code cleanup for aq-mode = 3.

The weight segment needs to only be computed once per frame,
so remove it from the function vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.

Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db

Jerome Jiang [Wed, 15 Feb 2017 03:09:15 +0000 (19:09 -0800)]
vpx_temporal_svc_encoder: Expose error resilient control to cmd line.

Change-Id: Ic74a8690b136ffbc370080f70b2d5a6b1572bf63

Linfeng Zhang [Wed, 15 Feb 2017 20:18:23 +0000 (20:18 +0000)]
Merge "cosmetics,dsp/inv_txfm.c: reorder functions"

Marco Paniconi [Wed, 15 Feb 2017 20:07:19 +0000 (20:07 +0000)]
Merge "vp9. Use same source_sad threshold for all speeds."

Linfeng Zhang [Wed, 15 Feb 2017 00:27:30 +0000 (16:27 -0800)]
cosmetics,dsp/inv_txfm.c: reorder functions

Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6

Linfeng Zhang [Wed, 15 Feb 2017 19:34:18 +0000 (19:34 +0000)]
Merge "Add vpx_highbd_idct16x16_38_add_neon()"

Marco [Wed, 15 Feb 2017 19:26:29 +0000 (11:26 -0800)]
vp9. Use same source_sad threshold for all speeds.

Only affects real-time mode.

Change-Id: Iba836f110c4da936f5173cc0f54424d5b6121bff

Marco [Wed, 15 Feb 2017 17:18:34 +0000 (09:18 -0800)]
Vp9: Speed 8 aq-mode=3: Reduce computation in estimating bits per mb.

vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.

Change-Id: If25933715783f31104a18a5092ea347b1221b5f5

Linfeng Zhang [Wed, 8 Feb 2017 00:58:12 +0000 (16:58 -0800)]
Add vpx_highbd_idct16x16_38_add_neon()

BUG=webm:1301

Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe

Linfeng Zhang [Wed, 15 Feb 2017 17:06:16 +0000 (17:06 +0000)]
Merge "Add vpx_highbd_idct16x16_38_add_c()"

paulwilkins [Wed, 15 Feb 2017 10:33:10 +0000 (10:33 +0000)]
Disconnect ARF breakout from frame boost.

This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.

The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.

The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.

Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c

paulwilkins [Tue, 14 Feb 2017 10:35:12 +0000 (10:35 +0000)]
Remove unnecessary factor.

Removed unnecessary scaling factor to simplify.

Change-Id: I3fc9c5975a2597e72f1324e09dd586dea1facfa7

paulwilkins [Thu, 9 Feb 2017 16:30:38 +0000 (16:30 +0000)]
Bug in scale_sse_threshold()

The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.

SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.

Hence the threshold used for the test should also be scaled up.

Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
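
The reasoning above can be sketched as a small function. This is an illustration under a stated assumption, not the actual fix: it assumes sample amplitudes grow by a factor of 2^(bit_depth - 8), so squared errors grow by 2^(2 * (bit_depth - 8)). The name `sketch_scale_sse_threshold` and its signature are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: scale an 8-bit-baseline SSE threshold to bit_depth.
 * Assumes squared errors grow by 2^(2 * (bit_depth - 8)). */
unsigned int sketch_scale_sse_threshold(unsigned int thresh, int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return thresh << (2 * (bit_depth - 8));
}
```

Under that assumption a baseline of 100 becomes 1600 at 10 bits and 25600 at 12 bits, that is, the threshold moves up with the expected error rather than down.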

paulwilkins [Mon, 12 Dec 2016 14:05:19 +0000 (14:05 +0000)]
Additional first pass stats.

Added counts that split the intra coded blocks into low and high variance.

Change-Id: Ic540144b34d5141659081bb22f7ee16fd6861f14

Paul Wilkins [Wed, 15 Feb 2017 10:37:02 +0000 (10:37 +0000)]
Merge "Aggressive VBR method."

James Zern [Wed, 15 Feb 2017 07:44:00 +0000 (23:44 -0800)]
vpx_temporal_svc_encoder.sh: remove FUNCNAME bashism

replace with an explicit output file prefix that matches the function
name

Change-Id: I7f6a4105adb34327b1099a5fbf132aa8d1ad5b90

Johann Koenig [Wed, 15 Feb 2017 01:33:00 +0000 (01:33 +0000)]
Merge "vp9 fdct highbd neon: connect existing highbd calls"

Linfeng Zhang [Tue, 14 Feb 2017 23:39:37 +0000 (15:39 -0800)]
Add vpx_highbd_idct16x16_38_add_c()

When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.

BUG=webm:1301

Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
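
The dispatch rule described above might look like the sketch below. Only the cutoff of 38 comes from the message; the 10-coefficient variant appears elsewhere in this log, and the cutoffs for the DC-only and full transforms are assumptions. The function name is hypothetical and it returns a variant label rather than calling any real transform.

```c
/* Hypothetical selector: returns how many coefficients the chosen 16x16
 * inverse-transform variant is specialized for. Only the 38 cutoff comes
 * from the commit message; the other cutoffs are assumptions. */
int sketch_pick_idct16x16_variant(int eob) {
  if (eob == 1) return 1;   /* DC only */
  if (eob <= 10) return 10; /* few low-frequency coefficients */
  if (eob <= 38) return 38; /* the case this commit adds */
  return 256;               /* full 16x16 transform */
}
```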

Yunqing Wang [Wed, 15 Feb 2017 00:54:10 +0000 (00:54 +0000)]
Merge "Row based multi-threading of encoding stage"

Ranjit Kumar Tulabandu [Fri, 10 Feb 2017 10:55:50 +0000 (16:25 +0530)]
Row based multi-threading of encoding stage

(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.

Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedup of the encoding pass (the second pass in 2-pass
encoding) is 7% with 2 threads, 16% with 4 threads,
85% with 8 threads, and 116% with 16 threads.

Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de

Linfeng Zhang [Wed, 15 Feb 2017 00:46:29 +0000 (00:46 +0000)]
Merge "Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts"

Johann [Tue, 31 Jan 2017 23:40:58 +0000 (15:40 -0800)]
vp8_dx_iface: remove unused 'else' condition

Clears up static clang analysis warning regarding a dead store.

Change-Id: If4fe7a9a7f94c6e2001d46136944f90712e543b4

Johann [Tue, 14 Feb 2017 22:26:09 +0000 (14:26 -0800)]
Use 'packssdw' for loading tran_low_t values

This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.

Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
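
For reference, 'packssdw' narrows signed 32-bit lanes to signed 16-bit lanes with saturation. A portable scalar model of that operation is sketched below so it can be run without SSE2. The function name is hypothetical, and the real code operates on xmm registers rather than in a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a signed saturating 32-bit to 16-bit pack. */
void sketch_pack_i32_to_i16(const int32_t *src, int16_t *dst, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i) {
    int32_t v = src[i];
    /* Clamp to the int16_t range instead of truncating the high bits. */
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    dst[i] = (int16_t)v;
  }
}
```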

Johann [Fri, 3 Feb 2017 23:25:50 +0000 (15:25 -0800)]
vp9 fdct highbd neon: connect existing highbd calls

Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c

Johann Koenig [Tue, 14 Feb 2017 21:28:22 +0000 (21:28 +0000)]
Merge "quantize_fp highbd neon: use tran_low_t for coeff"

Linfeng Zhang [Tue, 14 Feb 2017 20:44:57 +0000 (12:44 -0800)]
Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts

Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
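
The benefit of naming the shift amount can be seen in a scalar rounding shift. This is a sketch only: the value 14 is taken from the literal the commit title says it replaces, and the `SKETCH_`/`sketch_` names are hypothetical stand-ins rather than the library's identifiers.

```c
#include <stdint.h>

/* Assumed value, taken from the literal 14 that the commit replaces. */
#define SKETCH_DCT_CONST_BITS 14

/* Round-to-nearest right shift by the named constant. For negative inputs
 * this relies on an arithmetic right shift, which is typical but
 * implementation-defined in C. */
int32_t sketch_dct_round_shift(int64_t x) {
  return (int32_t)((x + ((int64_t)1 << (SKETCH_DCT_CONST_BITS - 1))) >>
                   SKETCH_DCT_CONST_BITS);
}
```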

clang-format [Tue, 14 Feb 2017 04:06:18 +0000 (20:06 -0800)]
apply clang-format

Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce

James Zern [Wed, 1 Feb 2017 02:06:43 +0000 (18:06 -0800)]
.clang-format: update to 3.9.1

Change-Id: Ia51f2201df897651067d09122075953382b59139

Yi Luo [Tue, 14 Feb 2017 20:13:26 +0000 (20:13 +0000)]
Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics"

Yi Luo [Wed, 8 Feb 2017 19:09:03 +0000 (11:09 -0800)]
Replace idct32x32_34_add_ssse3 assembly with intrinsics

- No user-level speed performance change.
- Pass unit tests.

Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6

Johann [Fri, 3 Feb 2017 22:24:32 +0000 (14:24 -0800)]
quantize_fp highbd neon: use tran_low_t for coeff

Change-Id: I90fd815f15884490ad138f35df575a00d31e8c95

Johann [Tue, 31 Jan 2017 23:51:15 +0000 (15:51 -0800)]
vp8 onyx_if: assert divide by zero

Clears up static clang analysis warning regarding divide by zero.

Trying to explain to the compiler how it's impossible to avoid
incrementing num_blocks at least once is difficult.

Change-Id: Ibaae43be572e5cd7a689b440dcd341c17d33443b
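
The pattern described above, stating an invariant with an assert instead of restructuring the code for the analyzer, can be sketched as follows. The helper name and signature are hypothetical; only the variable name num_blocks comes from the message.

```c
#include <assert.h>

/* Hypothetical helper: average a total over num_blocks. */
int sketch_average_per_block(int total, int num_blocks) {
  /* States the invariant for readers and static analyzers. */
  assert(num_blocks > 0);
  return total / num_blocks;
}
```

Note that assert compiles away when NDEBUG is defined, so this documents the invariant rather than enforcing it in release builds.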

Johann Koenig [Tue, 14 Feb 2017 03:02:50 +0000 (03:02 +0000)]
Merge "Remove UNINITIALIZED_IS_SAFE"