platform/upstream/libvpx.git
10 years agoMerge "Fix loopfilter bug"
Yaowu Xu [Tue, 9 Jul 2013 08:34:25 +0000 (01:34 -0700)]
Merge "Fix loopfilter bug"

10 years agoMerge "Using mi_cols instead of mb_cols."
Dmitry Kovalev [Tue, 9 Jul 2013 03:09:19 +0000 (20:09 -0700)]
Merge "Using mi_cols instead of mb_cols."

10 years agoMerge "Refactoring setup_pre_planes function."
Dmitry Kovalev [Tue, 9 Jul 2013 03:08:05 +0000 (20:08 -0700)]
Merge "Refactoring setup_pre_planes function."

10 years agoMerge "Calling set_partition_seg_context() instead of code duplication."
Dmitry Kovalev [Tue, 9 Jul 2013 03:07:06 +0000 (20:07 -0700)]
Merge "Calling set_partition_seg_context() instead of code duplication."

10 years agoFix loopfilter bug
John Koleszar [Mon, 8 Jul 2013 23:39:37 +0000 (16:39 -0700)]
Fix loopfilter bug

In the rare case where 4x4 interior filtering was called for but no
8x8 or larger filtering takes place, the previous code was skipping
the filtering. This patch fixes the issue by including the interior
mask in the overall mask for the filter application loops.

Change-Id: I4a0b65056c64f97478827c2ff41e0914fc7779d0

10 years agoDon't call encode_sb() for the final of 4-split subpartitions.
Ronald S. Bultje [Mon, 8 Jul 2013 21:38:40 +0000 (14:38 -0700)]
Don't call encode_sb() for the final of 4-split subpartitions.

The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45

10 years agoInline vp9_get_mv_joint().
Ronald S. Bultje [Wed, 3 Jul 2013 19:04:30 +0000 (12:04 -0700)]
Inline vp9_get_mv_joint().

Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
prevent the call overhead.

Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5

10 years agoDon't recalculate mv_ref costs for each block/partition.
Ronald S. Bultje [Wed, 3 Jul 2013 17:54:36 +0000 (10:54 -0700)]
Don't recalculate mv_ref costs for each block/partition.

Changes cost_mv_ref() to do a table lookup (LUT) into pre-calculated
cost arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a

10 years agoRemove unnecessary memset(best_index, 0) from trellis/optimize.
Ronald S. Bultje [Wed, 3 Jul 2013 17:09:15 +0000 (10:09 -0700)]
Remove unnecessary memset(best_index, 0) from trellis/optimize.

First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e

10 years agoRemove memcpy() in handle_inter_mode() filter selection.
Ronald S. Bultje [Mon, 8 Jul 2013 21:49:48 +0000 (14:49 -0700)]
Remove memcpy() in handle_inter_mode() filter selection.

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0

10 years agoMake frame-wide filter-type decision fully RD-based.
Ronald S. Bultje [Mon, 8 Jul 2013 21:49:33 +0000 (14:49 -0700)]
Make frame-wide filter-type decision fully RD-based.

Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics); I'm
not quite sure why yet. It may be interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78

10 years agoUsing mi_cols instead of mb_cols.
Dmitry Kovalev [Mon, 8 Jul 2013 21:54:04 +0000 (14:54 -0700)]
Using mi_cols instead of mb_cols.

Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.

Change-Id: I2491c969f713207c062011878b57e4e531818607
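
Illustration of the alignment macro mentioned above. The commit message does not show the macro body, so the definition below is one common way to write it and should be read as an assumption, as should MI_SIZE_LOG2 (8-pixel mi units) and the helper mi_units().

```c
#include <assert.h>

/* Round value up to the next multiple of 2^n (assumed definition). */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

enum { MI_SIZE_LOG2 = 3 }; /* assumes one mi unit covers 8 pixels */

/* Number of mi units needed to cover a dimension given in pixels. */
static int mi_units(int pixels) {
  assert(pixels >= 0);
  return ALIGN_POWER_OF_TWO(pixels, MI_SIZE_LOG2) >> MI_SIZE_LOG2;
}
```

For example, a 352-pixel-wide frame needs 44 mi columns under these assumptions, and a 353-pixel-wide one needs 45.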

10 years agoImplements several heuristics to prune mode search
Deb Mukherjee [Wed, 3 Jul 2013 21:47:54 +0000 (14:47 -0700)]
Implements several heuristics to prune mode search

Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions)
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
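
Illustration of the flag-driven pruning described above. The flag names are taken from the commit message; the bit values, the SearchState struct, and both helper functions are assumptions made for this sketch and are not the encoder's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from the commit message; bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

typedef enum { BEST_NONE, BEST_INTRA, BEST_INTER } BestKind;

/* Hypothetical summary of the search state so far. */
typedef struct {
  BestKind best_kind;  /* kind of the best mode found so far */
  int best_single_ref; /* ref of best single-ref inter mode, -1 if none */
} SearchState;

/* Should a compound mode using refs (ref0, ref1) be skipped? */
static int skip_compound(unsigned flags, const SearchState *s, int ref0,
                         int ref1) {
  assert(s != NULL);
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && s->best_kind == BEST_INTRA)
    return 1;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && s->best_single_ref >= 0 &&
      s->best_single_ref != ref0 && s->best_single_ref != ref1)
    return 1;
  return 0;
}

/* Should oblique directional intra modes and TM_PRED be skipped? */
static int skip_oblique_intra(unsigned flags, const SearchState *s) {
  assert(s != NULL);
  return (flags & FLAG_SKIP_INTRA_BESTINTER) && s->best_kind == BEST_INTER;
}
```

Keeping each heuristic behind its own bit lets a speed setting combine them freely, which fits the quality/speed trade-offs listed per flag.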

11 years agoMerge "Refactor SSE2 8x8 functional units"
Jingning Han [Fri, 5 Jul 2013 18:18:18 +0000 (11:18 -0700)]
Merge "Refactor SSE2 8x8 functional units"

11 years agoMerge "Fix to comp_inter_joint_search_thresh feature."
Paul Wilkins [Thu, 4 Jul 2013 10:27:00 +0000 (03:27 -0700)]
Merge "Fix to comp_inter_joint_search_thresh feature."

11 years agoRefactoring setup_pre_planes function.
Dmitry Kovalev [Thu, 4 Jul 2013 00:42:01 +0000 (17:42 -0700)]
Refactoring setup_pre_planes function.

Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63

11 years agoMerge "Adding write_skip_coeff function."
Dmitry Kovalev [Wed, 3 Jul 2013 23:33:58 +0000 (16:33 -0700)]
Merge "Adding write_skip_coeff function."

11 years agoMerge "Enable early termination in rd search"
Jingning Han [Wed, 3 Jul 2013 21:20:41 +0000 (14:20 -0700)]
Merge "Enable early termination in rd search"

11 years agoMerge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE."
Dmitry Kovalev [Wed, 3 Jul 2013 21:16:02 +0000 (14:16 -0700)]
Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE."

11 years agoAdding write_skip_coeff function.
Dmitry Kovalev [Wed, 3 Jul 2013 20:23:47 +0000 (13:23 -0700)]
Adding write_skip_coeff function.

Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd

11 years agoMerge "Inline a few intra predictors"
Yaowu Xu [Wed, 3 Jul 2013 20:21:22 +0000 (13:21 -0700)]
Merge "Inline a few intra predictors"

11 years agoEnable early termination in rd search
Jingning Han [Tue, 2 Jul 2013 23:48:15 +0000 (16:48 -0700)]
Enable early termination in rd search

This commit allows the encoder to track the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the remaining
operations and continues to the next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
to 266 seconds, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
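
Illustration of the early-termination idea above. This is a minimal sketch: the function name, its signature, and the use of INT64_MAX as an "abandoned" marker are assumptions, not the encoder's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates per-block rd costs for one candidate mode. Returns the
 * total, or INT64_MAX if the running total exceeded best_rd and the
 * remaining blocks were not evaluated. *blocks_done reports how many
 * blocks were actually processed. */
static int64_t accumulate_rd(const int64_t *block_rd, size_t n,
                             int64_t best_rd, size_t *blocks_done) {
  int64_t total = 0;
  size_t i;

  assert(block_rd != NULL && blocks_done != NULL);
  for (i = 0; i < n; ++i) {
    total += block_rd[i];
    if (total > best_rd) { /* cannot beat the best mode any more */
      *blocks_done = i + 1;
      return INT64_MAX;
    }
  }
  *blocks_done = n;
  return total;
}
```

Because the running total only grows, stopping at the first overshoot cannot change which mode wins, which is consistent with the "no performance penalty" claim.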

11 years agoCalling set_partition_seg_context() instead of code duplication.
Dmitry Kovalev [Wed, 3 Jul 2013 18:15:58 +0000 (11:15 -0700)]
Calling set_partition_seg_context() instead of code duplication.

Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555

11 years agoReplacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Dmitry Kovalev [Wed, 3 Jul 2013 17:54:50 +0000 (10:54 -0700)]
Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.

Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c

11 years agoMerge "Adding write_selected_txfm_size function."
Dmitry Kovalev [Wed, 3 Jul 2013 17:33:55 +0000 (10:33 -0700)]
Merge "Adding write_selected_txfm_size function."

11 years agoInline a few intra predictors
Yaowu Xu [Wed, 3 Jul 2013 17:20:41 +0000 (10:20 -0700)]
Inline a few intra predictors

Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710

11 years agoRefactor SSE2 8x8 functional units
Jingning Han [Wed, 3 Jul 2013 16:05:01 +0000 (09:05 -0700)]
Refactor SSE2 8x8 functional units

These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d

11 years agoMerge "Use pmovmskb to skip quantize loops over empty coefficients."
Ronald S. Bultje [Wed, 3 Jul 2013 16:05:48 +0000 (09:05 -0700)]
Merge "Use pmovmskb to skip quantize loops over empty coefficients."

11 years agoMerge "Remove unused function vp9_build_inter4x4_predictors_mbuv()."
Ronald S. Bultje [Wed, 3 Jul 2013 16:05:20 +0000 (09:05 -0700)]
Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()."

11 years agoFix to comp_inter_joint_search_thresh feature.
Paul Wilkins [Wed, 3 Jul 2013 11:53:36 +0000 (12:53 +0100)]
Fix to comp_inter_joint_search_thresh feature.

When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88

11 years agoAdded two new skip experiments.
Paul Wilkins [Mon, 1 Jul 2013 15:27:12 +0000 (16:27 +0100)]
Added two new skip experiments.

sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd

11 years agoMerge "Adjust Speed 0 settings."
Paul Wilkins [Wed, 3 Jul 2013 09:47:18 +0000 (02:47 -0700)]
Merge "Adjust Speed 0 settings."

11 years agoMerge "Removing redundant struct from union b_mode_info."
Dmitry Kovalev [Wed, 3 Jul 2013 01:09:31 +0000 (18:09 -0700)]
Merge "Removing redundant struct from union b_mode_info."

11 years agoMerge "Added a speed feature use_square_partition_only"
Yaowu Xu [Wed, 3 Jul 2013 00:16:11 +0000 (17:16 -0700)]
Merge "Added a speed feature use_square_partition_only"

11 years agoRemoving redundant struct from union b_mode_info.
Dmitry Kovalev [Tue, 2 Jul 2013 23:51:57 +0000 (16:51 -0700)]
Removing redundant struct from union b_mode_info.

Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83

11 years agoAdding write_selected_txfm_size function.
Dmitry Kovalev [Tue, 2 Jul 2013 23:41:22 +0000 (16:41 -0700)]
Adding write_selected_txfm_size function.

Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214

11 years agoAdded a speed feature use_square_partition_only
Yaowu Xu [Thu, 27 Jun 2013 19:07:07 +0000 (12:07 -0700)]
Added a speed feature use_square_partition_only

This commit adds a speed feature where only square partitions are
evaluated in partition picking. Enabling this feature in cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9

11 years agoUse pmovmskb to skip quantize loops over empty coefficients.
Ronald S. Bultje [Mon, 1 Jul 2013 19:03:20 +0000 (12:03 -0700)]
Use pmovmskb to skip quantize loops over empty coefficients.

If none of the 16 coefficients that we quantize per loop iteration
are larger than the zbin, directly skip to the next round of coeffs,
rather than doing a full quantize loop that will eventually result
in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
no significantly positive results for smaller transforms, so is not
used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).

Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
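
Illustration of the group-skip idea above. The commit uses pmovmskb in SIMD code; the sketch below swaps that for a plain scalar check so it stays portable, and its quantizer (a right shift) is a simplified stand-in. All names and parameters are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { GROUP = 16 }; /* coefficients handled per loop iteration */

/* Quantizes n coefficients in groups of 16 and returns how many groups
 * were skipped because no coefficient exceeded zbin. */
static int quantize_with_group_skip(const int16_t *coeff, int16_t *qcoeff,
                                    int n, int zbin, int shift) {
  int skipped = 0, g, i;

  assert(coeff != NULL && qcoeff != NULL);
  assert(n % GROUP == 0 && shift >= 0);
  for (g = 0; g < n; g += GROUP) {
    int any_above = 0;
    /* Scalar stand-in for the SIMD compare + movemask test. */
    for (i = 0; i < GROUP; ++i) any_above |= abs(coeff[g + i]) > zbin;
    if (!any_above) {
      for (i = 0; i < GROUP; ++i) qcoeff[g + i] = 0;
      ++skipped;
      continue; /* the jump that saves the full quantize pass */
    }
    for (i = 0; i < GROUP; ++i) {
      const int a = abs(coeff[g + i]);
      const int q = a > zbin ? a >> shift : 0;
      qcoeff[g + i] = (int16_t)(coeff[g + i] < 0 ? -q : q);
    }
  }
  return skipped;
}
```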

11 years agoRemove unused function vp9_build_inter4x4_predictors_mbuv().
Ronald S. Bultje [Tue, 2 Jul 2013 23:34:10 +0000 (16:34 -0700)]
Remove unused function vp9_build_inter4x4_predictors_mbuv().

Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b

11 years agonew unit test for cpu-speed
Jim Bankoski [Tue, 2 Jul 2013 21:14:16 +0000 (14:14 -0700)]
new unit test for cpu-speed

Tests q0 (lossless), very high bitrate and low bitrates at cpu speed
0, 1 and 2.

Change-Id: I0c5cdca00acd8d01e7b13f124b3b08d4b1ae9f6d

11 years agoSpeed feature to binary search dir intramodes
Deb Mukherjee [Tue, 2 Jul 2013 18:18:00 +0000 (11:18 -0700)]
Speed feature to binary search dir intramodes

This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
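
Illustration of the angular "binary search" above. The mode names follow the commit message, but the enum and, in particular, the neighbour pairs chosen for each oblique mode are assumptions made for this sketch.

```c
#include <assert.h>

/* Mode names follow the commit message; the enum itself is assumed. */
typedef enum {
  DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, TM_PRED, NUM_MODES
} IntraMode;

/* Returns 1 if the oblique mode should be skipped because the best
 * intra mode so far is not one of its two assumed nearest
 * horizontal/vertical/diagonal neighbours. */
static int skip_directional_mode(IntraMode mode, IntraMode best) {
  assert(mode < NUM_MODES && best < NUM_MODES);
  switch (mode) {
    case D117_PRED: return best != V_PRED && best != D135_PRED;
    case D63_PRED:  return best != V_PRED && best != D45_PRED;
    case D27_PRED:  return best != H_PRED && best != D45_PRED;
    case D153_PRED: return best != H_PRED && best != D135_PRED;
    default:        return 0; /* other modes are always searched */
  }
}
```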

11 years agoMerge "Clean-up in forward update to use mapping tables"
Deb Mukherjee [Tue, 2 Jul 2013 21:02:53 +0000 (14:02 -0700)]
Merge "Clean-up in forward update to use mapping tables"

11 years agoTx size selection enhancements
Deb Mukherjee [Fri, 21 Jun 2013 23:31:12 +0000 (16:31 -0700)]
Tx size selection enhancements

(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and removing it will eliminate a lot of
complications in the code.

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform are
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936

11 years agoClean-up in forward update to use mapping tables
Deb Mukherjee [Tue, 2 Jul 2013 19:48:20 +0000 (12:48 -0700)]
Clean-up in forward update to use mapping tables

Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.

No bit-stream or output change.

Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546

11 years agoMerge "Removing unused implicit segmentation code."
Dmitry Kovalev [Tue, 2 Jul 2013 18:58:48 +0000 (11:58 -0700)]
Merge "Removing unused implicit segmentation code."

11 years agoMerge "Make get_coef_context() branchless."
Ronald S. Bultje [Tue, 2 Jul 2013 18:48:15 +0000 (11:48 -0700)]
Merge "Make get_coef_context() branchless."

11 years agoMerge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h."
Dmitry Kovalev [Tue, 2 Jul 2013 18:31:35 +0000 (11:31 -0700)]
Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h."

11 years agoMerge "Additional vp9_decodemv.c cleanup."
Dmitry Kovalev [Tue, 2 Jul 2013 18:31:23 +0000 (11:31 -0700)]
Merge "Additional vp9_decodemv.c cleanup."

11 years agoRemoving unused implicit segmentation code.
Dmitry Kovalev [Tue, 2 Jul 2013 18:16:42 +0000 (11:16 -0700)]
Removing unused implicit segmentation code.

Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905

11 years agoMerge "Add speed feature to disable splitmv"
Yunqing Wang [Tue, 2 Jul 2013 17:54:22 +0000 (10:54 -0700)]
Merge "Add speed feature to disable splitmv"

11 years agoAdd speed feature to disable splitmv
Yunqing Wang [Sun, 30 Jun 2013 00:34:51 +0000 (17:34 -0700)]
Add speed feature to disable splitmv

Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6

11 years agoCalculate rd cost per transformed block
Jingning Han [Mon, 1 Jul 2013 23:50:58 +0000 (16:50 -0700)]
Calculate rd cost per transformed block

Compute the rate-distortion cost per transformed block, and accumulate
the cost through all blocks inside a partition. This allows the encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac

11 years agoMerge "Update quantize SSSE3 SIMD to cover 32x32 transform case also."
Ronald S. Bultje [Tue, 2 Jul 2013 16:38:08 +0000 (09:38 -0700)]
Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also."

11 years agoAdjust Speed 0 settings.
Paul Wilkins [Tue, 2 Jul 2013 14:42:14 +0000 (15:42 +0100)]
Adjust Speed 0 settings.

Remove the use of sf->comp_inter_joint_search_thresh
from the baseline speed 0. Approx +0.4% on derf.

Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71

11 years agoRevert "New motion threshold factor - speed feature."
Paul Wilkins [Tue, 2 Jul 2013 11:34:41 +0000 (12:34 +0100)]
Revert "New motion threshold factor - speed feature."

This reverts commit 13772781807ebff8b5c7d100e90d0eac6c61cbd4.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f

11 years agofix the mismatch again in cpu_used 2
Yaowu Xu [Tue, 2 Jul 2013 02:13:18 +0000 (19:13 -0700)]
fix the mismatch again in cpu_used 2

Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679

11 years agouse partitioning from last frame
Jim Bankoski [Tue, 2 Jul 2013 01:18:50 +0000 (18:18 -0700)]
use partitioning from last frame

This cl converts use partition from last frame to do the following:

if part is none, horz, or vert -> try split
if part != none and one of the children is not split -> try none

Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>

11 years agoRemoving vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Dmitry Kovalev [Tue, 2 Jul 2013 00:28:08 +0000 (17:28 -0700)]
Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.

Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c

11 years agoMake get_coef_context() branchless.
Ronald S. Bultje [Mon, 1 Jul 2013 17:40:00 +0000 (10:40 -0700)]
Make get_coef_context() branchless.

This should significantly speed up cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56

11 years agoAdditional vp9_decodemv.c cleanup.
Dmitry Kovalev [Mon, 1 Jul 2013 23:14:13 +0000 (16:14 -0700)]
Additional vp9_decodemv.c cleanup.

Change-Id: I5b413bc0884af0bda38c05332d86490103905b3b

11 years agoMerge "Quantize (64-bit only, for now) SSSE3 SIMD."
Yaowu Xu [Mon, 1 Jul 2013 22:58:57 +0000 (15:58 -0700)]
Merge "Quantize (64-bit only, for now) SSSE3 SIMD."

11 years agoMerge "Removing vp9_modecont.{h, c}."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:48 +0000 (14:58 -0700)]
Merge "Removing vp9_modecont.{h, c}."

11 years agoMerge "Moving encoder subexp encoding functions to subexp.{h, c}."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:36 +0000 (14:58 -0700)]
Merge "Moving encoder subexp encoding functions to subexp.{h, c}."

11 years agoMerge "Adding vp9_rb_read_signed_literal function."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:20 +0000 (14:58 -0700)]
Merge "Adding vp9_rb_read_signed_literal function."

11 years agoMerge "Inlining decode_atom, decode_sb_intra, and decode_sb."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:06 +0000 (14:58 -0700)]
Merge "Inlining decode_atom, decode_sb_intra, and decode_sb."

11 years agoMerge "Cleanup inside vp9_decodemv.c."
Dmitry Kovalev [Mon, 1 Jul 2013 21:50:32 +0000 (14:50 -0700)]
Merge "Cleanup inside vp9_decodemv.c."

11 years agoUpdate quantize SSSE3 SIMD to cover 32x32 transform case also.
Ronald S. Bultje [Mon, 1 Jul 2013 18:36:33 +0000 (11:36 -0700)]
Update quantize SSSE3 SIMD to cover 32x32 transform case also.

Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87

11 years agoQuantize (64-bit only, for now) SSSE3 SIMD.
Ronald S. Bultje [Mon, 1 Jul 2013 18:36:07 +0000 (11:36 -0700)]
Quantize (64-bit only, for now) SSSE3 SIMD.

Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904

11 years agoRemoving vp9_modecont.{h, c}.
Dmitry Kovalev [Mon, 1 Jul 2013 17:17:15 +0000 (10:17 -0700)]
Removing vp9_modecont.{h, c}.

Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.

Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de

11 years agoMerge "New motion threshold factor - speed feature."
Paul Wilkins [Mon, 1 Jul 2013 16:39:02 +0000 (09:39 -0700)]
Merge "New motion threshold factor - speed feature."

11 years agofix a mismatch in cpuused 2
Yaowu Xu [Mon, 1 Jul 2013 15:54:50 +0000 (08:54 -0700)]
fix a mismatch in cpuused 2

Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540

11 years agoNew motion threshold factor - speed feature.
Paul Wilkins [Thu, 27 Jun 2013 12:16:33 +0000 (13:16 +0100)]
New motion threshold factor - speed feature.

Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1. This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5

11 years agoAdding vp9_rb_read_signed_literal function.
Dmitry Kovalev [Mon, 1 Jul 2013 09:09:36 +0000 (02:09 -0700)]
Adding vp9_rb_read_signed_literal function.

Change-Id: I30ea91561ffac7e5065ba41b2d3ab7dedb720593

11 years agoMerge "Enable SSE2 4x4 ADST/DCT transform"
Jingning Han [Sat, 29 Jun 2013 22:57:03 +0000 (15:57 -0700)]
Merge "Enable SSE2 4x4 ADST/DCT transform"

11 years agoSSE2 version of vp9_short_fdct32x32_rd.
Christian Duvivier [Tue, 18 Jun 2013 22:23:25 +0000 (15:23 -0700)]
SSE2 version of vp9_short_fdct32x32_rd.

43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0

11 years agoMoving encoder subexp encoding functions to subexp.{h, c}.
Dmitry Kovalev [Sat, 29 Jun 2013 18:50:45 +0000 (11:50 -0700)]
Moving encoder subexp encoding functions to subexp.{h, c}.

Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca

11 years agoMerge "fixed a bug where sse is not populated"
Ronald S. Bultje [Sat, 29 Jun 2013 14:42:41 +0000 (07:42 -0700)]
Merge "fixed a bug where sse is not populated"

11 years agoMerge "add Neon optimized add constant residual functions"
Johann [Sat, 29 Jun 2013 02:50:38 +0000 (19:50 -0700)]
Merge "add Neon optimized add constant residual functions"

11 years agoMerge "fix test compile error"
James Zern [Sat, 29 Jun 2013 02:48:05 +0000 (19:48 -0700)]
Merge "fix test compile error"

11 years agoMerge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."
Ronald S. Bultje [Sat, 29 Jun 2013 02:37:11 +0000 (19:37 -0700)]
Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."

11 years agoMerge "Minor change to prevent one level of dereference in cost_coeffs()."
Ronald S. Bultje [Sat, 29 Jun 2013 02:36:56 +0000 (19:36 -0700)]
Merge "Minor change to prevent one level of dereference in cost_coeffs()."

11 years agoadd Neon optimized add constant residual functions
chm [Thu, 27 Jun 2013 12:47:56 +0000 (20:47 +0800)]
add Neon optimized add constant residual functions

- Add add_constant_residual_8x8 16x16 32x32 functions
- Tested under RealView debugger environment

Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e

11 years agoMerge "Cosmetic reordering of FRAME_CONTEXT members."
Dmitry Kovalev [Sat, 29 Jun 2013 01:38:02 +0000 (18:38 -0700)]
Merge "Cosmetic reordering of FRAME_CONTEXT members."

11 years agoInlining decode_atom, decode_sb_intra, and decode_sb.
Dmitry Kovalev [Sat, 29 Jun 2013 01:34:30 +0000 (18:34 -0700)]
Inlining decode_atom, decode_sb_intra, and decode_sb.

Change-Id: I41711bb994f542c5ba3d0cefd9b2e79db3c2c3a1

11 years agofix test compile error
James Zern [Sat, 29 Jun 2013 01:07:37 +0000 (18:07 -0700)]
fix test compile error

since:
92479d9 Make update_partition_context faster

fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
  char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                     ^~~~~~~~~~~~~~~~~

Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b

11 years agoEnable SSE2 4x4 ADST/DCT transform
Jingning Han [Fri, 28 Jun 2013 20:37:19 +0000 (13:37 -0700)]
Enable SSE2 4x4 ADST/DCT transform

This commit enables SSE2 4x4 forward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.

Change-Id: Iad4d526346e05c7be896466c05500711bb763660

11 years agofixed a bug where sse is not populated
Yaowu Xu [Sat, 29 Jun 2013 00:10:22 +0000 (17:10 -0700)]
fixed a bug where sse is not populated

Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc

11 years agoMerge "Fix switch statement in 8x8 transform"
Jingning Han [Fri, 28 Jun 2013 23:49:59 +0000 (16:49 -0700)]
Merge "Fix switch statement in 8x8 transform"

11 years agoCosmetic reordering of FRAME_CONTEXT members.
Dmitry Kovalev [Fri, 28 Jun 2013 23:16:03 +0000 (16:16 -0700)]
Cosmetic reordering of FRAME_CONTEXT members.

Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2

11 years agoCleanup inside vp9_decodemv.c.
Dmitry Kovalev [Fri, 28 Jun 2013 22:32:01 +0000 (15:32 -0700)]
Cleanup inside vp9_decodemv.c.

Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with other function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.

Change-Id: Ifee05d333da4cd331d4aff40ce41ccd9b70e494a

11 years agoMerge "Removing CONFIG_DEBUG checks on assertions."
Dmitry Kovalev [Fri, 28 Jun 2013 21:03:28 +0000 (14:03 -0700)]
Merge "Removing CONFIG_DEBUG checks on assertions."

11 years agoFix switch statement in 8x8 transform
Jingning Han [Fri, 28 Jun 2013 20:39:32 +0000 (13:39 -0700)]
Fix switch statement in 8x8 transform

Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f

11 years agoMerge "Some minor optimizations for cost_coeffs()."
Ronald S. Bultje [Fri, 28 Jun 2013 18:54:50 +0000 (11:54 -0700)]
Merge "Some minor optimizations for cost_coeffs()."

11 years agoMerge "Make coefficient skip condition an explicit RD choice."
Ronald S. Bultje [Fri, 28 Jun 2013 18:54:28 +0000 (11:54 -0700)]
Merge "Make coefficient skip condition an explicit RD choice."

11 years agoInline vp9_get_coef_context() (and remove vp9_ prefix).
Ronald S. Bultje [Fri, 28 Jun 2013 17:40:21 +0000 (10:40 -0700)]
Inline vp9_get_coef_context() (and remove vp9_ prefix).

Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896

11 years agoMerge "Decoder's code cleanup."
Dmitry Kovalev [Fri, 28 Jun 2013 17:38:54 +0000 (10:38 -0700)]
Merge "Decoder's code cleanup."

11 years agoRemoving CONFIG_DEBUG checks on assertions.
Dmitry Kovalev [Fri, 28 Jun 2013 17:36:20 +0000 (10:36 -0700)]
Removing CONFIG_DEBUG checks on assertions.

Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6

11 years agoMinor change to prevent one level of dereference in cost_coeffs().
Ronald S. Bultje [Fri, 28 Jun 2013 17:21:25 +0000 (10:21 -0700)]
Minor change to prevent one level of dereference in cost_coeffs().

4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles

Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78

11 years agoSome minor optimizations for cost_coeffs().
Ronald S. Bultje [Fri, 28 Jun 2013 03:57:37 +0000 (20:57 -0700)]
Some minor optimizations for cost_coeffs().

Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 4906 -> 3664 cycles
32x32: 23426 -> 18134 cycles

Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.

Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95

11 years agoMake coefficient skip condition an explicit RD choice.
Ronald S. Bultje [Fri, 28 Jun 2013 00:41:54 +0000 (17:41 -0700)]
Make coefficient skip condition an explicit RD choice.

This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
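
Illustration of the explicit skip-block choice above. This is a sketch with a simplified Lagrangian cost J = D + lambda * R; the encoder's real cost macro and scaling are not shown in the commit message, so the function names and parameters are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified rate-distortion cost (assumed form). */
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t dist) {
  assert(lambda >= 0 && rate >= 0 && dist >= 0);
  return dist + lambda * rate;
}

/* Returns 1 if zeroing the whole block is at least as good, in RD
 * terms, as coding its coefficients. When skipping, the distortion is
 * the prediction error (sse) and only the skip flag costs rate. */
static int choose_skip(int64_t lambda, int64_t rate_coded, int64_t dist_coded,
                       int64_t rate_skip, int64_t sse) {
  return rd_cost(lambda, rate_skip, sse) <=
         rd_cost(lambda, rate_coded, dist_coded);
}
```

A single comparison per block replaces biasing every coefficient after a zero run, which is the efficiency argument the commit message makes.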