platform/upstream/libvpx.git
11 years agoAdd neon optimize vp9_dc_only_idct_add.
hkuang [Tue, 9 Jul 2013 19:06:21 +0000 (12:06 -0700)]
Add neon optimize vp9_dc_only_idct_add.

Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423

11 years agoMerge "Wide loopfilter 16 pix at a time"
Jim Bankoski [Thu, 11 Jul 2013 13:44:02 +0000 (06:44 -0700)]
Merge "Wide loopfilter 16 pix at a time"

11 years agoMerge "Fix tx_type bug in intra4x4 rd loop"
Jingning Han [Thu, 11 Jul 2013 03:13:25 +0000 (20:13 -0700)]
Merge "Fix tx_type bug in intra4x4 rd loop"

11 years agoReplace copy_memNxM functions with a generic copy/avg function.
Ronald S. Bultje [Wed, 10 Jul 2013 18:17:19 +0000 (11:17 -0700)]
Replace copy_memNxM functions with a generic copy/avg function.

Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa

11 years agoRemove unused fwalsh/fdct x86 SIMD implementations.
Ronald S. Bultje [Wed, 10 Jul 2013 17:34:58 +0000 (10:34 -0700)]
Remove unused fwalsh/fdct x86 SIMD implementations.

Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e

11 years agoMerge "Remove unused iwalsh4x4 MMX/SSE2 functions."
Ronald S. Bultje [Thu, 11 Jul 2013 00:08:46 +0000 (17:08 -0700)]
Merge "Remove unused iwalsh4x4 MMX/SSE2 functions."

11 years agoMerge "Remove unused 16x3/3x16 sad SSE2 functions."
Ronald S. Bultje [Thu, 11 Jul 2013 00:08:43 +0000 (17:08 -0700)]
Merge "Remove unused 16x3/3x16 sad SSE2 functions."

11 years agoWide loopfilter 16 pix at a time
John Koleszar [Wed, 12 Jun 2013 21:37:01 +0000 (14:37 -0700)]
Wide loopfilter 16 pix at a time

Where possible, do the 16 pixel wide filter while doing the horizontal
filtering pass. The same approach can be taken for the mbloop_filter
when that's implemented. Doing so on the vertical pass is a little more
involved, but possible.

Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320

11 years agoFix tx_type bug in intra4x4 rd loop
Jingning Han [Wed, 10 Jul 2013 22:45:34 +0000 (15:45 -0700)]
Fix tx_type bug in intra4x4 rd loop

This commit fixes the misuse of tx_type for the inverse transform
in the intra4x4 rate-distortion optimization loop. It improves
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980

11 years agoMerge "Prunes out full-rd computation based on modeled rd"
Deb Mukherjee [Wed, 10 Jul 2013 22:37:11 +0000 (15:37 -0700)]
Merge "Prunes out full-rd computation based on modeled rd"

11 years agoMerge "Adding read_compressed_header function."
Dmitry Kovalev [Wed, 10 Jul 2013 22:11:08 +0000 (15:11 -0700)]
Merge "Adding read_compressed_header function."

11 years agoconfigure with internal stats not working
Jim Bankoski [Wed, 10 Jul 2013 22:07:53 +0000 (15:07 -0700)]
configure with internal stats not working

Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a

11 years agoRemove unused iwalsh4x4 MMX/SSE2 functions.
Ronald S. Bultje [Wed, 10 Jul 2013 17:27:42 +0000 (10:27 -0700)]
Remove unused iwalsh4x4 MMX/SSE2 functions.

Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652

11 years agoRemove unused 16x3/3x16 sad SSE2 functions.
Ronald S. Bultje [Wed, 10 Jul 2013 17:23:41 +0000 (10:23 -0700)]
Remove unused 16x3/3x16 sad SSE2 functions.

Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4

11 years agoMerge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction."
Ronald S. Bultje [Wed, 10 Jul 2013 21:52:23 +0000 (14:52 -0700)]
Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction."

11 years agoMerge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction."
Ronald S. Bultje [Wed, 10 Jul 2013 21:52:19 +0000 (14:52 -0700)]
Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction."

11 years agoMerge "remove warnings when NDEBUG is set"
Jim Bankoski [Wed, 10 Jul 2013 21:39:39 +0000 (14:39 -0700)]
Merge "remove warnings when NDEBUG is set"

11 years agoremove warnings when NDEBUG is set
Jim Bankoski [Wed, 10 Jul 2013 21:27:20 +0000 (14:27 -0700)]
remove warnings when NDEBUG is set

Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136

11 years agoPrunes out full-rd computation based on modeled rd
Deb Mukherjee [Mon, 8 Jul 2013 23:01:01 +0000 (16:01 -0700)]
Prunes out full-rd computation based on modeled rd

Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is not within
the same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Results on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)
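
The breakout test described above can be sketched as a single comparison.
This is a minimal illustration, not the encoder's code: the function name
and the 3/2 factor are assumptions chosen for the example, since the
message does not state the actual factor.

```c
#include <stdint.h>

/* Hypothetical breakout factor (3/2); the real value is not given in the
 * commit message. */
#define RD_BREAKOUT_NUM 3
#define RD_BREAKOUT_DEN 2

/* Returns 1 when the cheap, modeled rd cost is already so far above the
 * best rd cost found so far that the full rd computation can be skipped. */
int skip_full_rd(int64_t modeled_rd, int64_t best_rd) {
  /* Equivalent to modeled_rd > best_rd * 3/2, written without division. */
  return modeled_rd * RD_BREAKOUT_DEN > best_rd * RD_BREAKOUT_NUM;
}
```

A larger factor prunes less and costs less quality; a smaller one prunes
more aggressively.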

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc

11 years agoMerge "msvc: set a more useful debug format"
James Zern [Wed, 10 Jul 2013 20:02:22 +0000 (13:02 -0700)]
Merge "msvc: set a more useful debug format"

11 years agoMerge "test_libvpx: disable pthreads in gtest for win targets"
James Zern [Wed, 10 Jul 2013 20:01:52 +0000 (13:01 -0700)]
Merge "test_libvpx: disable pthreads in gtest for win targets"

11 years agoSSE2 16x16 ADST/DCT hybrid transform
Jingning Han [Wed, 3 Jul 2013 16:05:01 +0000 (09:05 -0700)]
SSE2 16x16 ADST/DCT hybrid transform

This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230

11 years agoMerge "Adding encode_tiles function to vp9_bitstream.c."
Dmitry Kovalev [Wed, 10 Jul 2013 18:43:50 +0000 (11:43 -0700)]
Merge "Adding encode_tiles function to vp9_bitstream.c."

11 years agoMerge "Add a feature to reduce chrome intra mode search"
Yaowu Xu [Wed, 10 Jul 2013 18:35:47 +0000 (11:35 -0700)]
Merge "Add a feature to reduce chrome intra mode search"

11 years agoMerge "Add unit test for 16x16 forward ADST/DCT"
Jingning Han [Wed, 10 Jul 2013 18:16:39 +0000 (11:16 -0700)]
Merge "Add unit test for 16x16 forward ADST/DCT"

11 years agoMerge "Bug fix: set frame_parallel_decoding_mode"
Scott LaVarnway [Wed, 10 Jul 2013 18:09:30 +0000 (11:09 -0700)]
Merge "Bug fix: set frame_parallel_decoding_mode"

11 years agoMerge "Fix intermediate height in convolve"
John Koleszar [Wed, 10 Jul 2013 18:04:40 +0000 (11:04 -0700)]
Merge "Fix intermediate height in convolve"

11 years agoAdding read_compressed_header function.
Dmitry Kovalev [Mon, 8 Jul 2013 18:54:36 +0000 (11:54 -0700)]
Adding read_compressed_header function.

Splitting setup_txfm_mode into read_tx_mode and read_tx_probs.

Change-Id: I5b4fe48698d56490857d32eafcaeb4291f208479

11 years agoMerge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction."
Ronald S. Bultje [Wed, 10 Jul 2013 17:24:16 +0000 (10:24 -0700)]
Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction."

11 years agoMerge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction."
Ronald S. Bultje [Wed, 10 Jul 2013 17:13:16 +0000 (10:13 -0700)]
Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction."

11 years agoMerge "Remove memcpy() in handle_inter_mode() filter selection."
Ronald S. Bultje [Wed, 10 Jul 2013 17:13:07 +0000 (10:13 -0700)]
Merge "Remove memcpy() in handle_inter_mode() filter selection."

11 years agoMerge "Removing vp9_maskingmv.c and corresponding assembly file."
Dmitry Kovalev [Wed, 10 Jul 2013 17:05:06 +0000 (10:05 -0700)]
Merge "Removing vp9_maskingmv.c and corresponding assembly file."

11 years agoAdd unit test for 16x16 forward ADST/DCT
Jingning Han [Tue, 9 Jul 2013 23:16:49 +0000 (16:16 -0700)]
Add unit test for 16x16 forward ADST/DCT

Unit tests on the functional accuracy of forward ADST/DCT.

Change-Id: I81afff866bdeacbd457b0af96993a035741657f6

11 years agoSSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.
Ronald S. Bultje [Tue, 9 Jul 2013 23:18:28 +0000 (16:18 -0700)]
SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.

Change-Id: Iad70966b986f65259329070e258f76ef0af816b4

11 years agoSSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction.
Ronald S. Bultje [Wed, 10 Jul 2013 02:46:01 +0000 (19:46 -0700)]
SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction.

Change-Id: I3441c059214c2956e8261331bbf521525a617a86

11 years agoSSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction.
Ronald S. Bultje [Tue, 9 Jul 2013 21:54:20 +0000 (14:54 -0700)]
SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction.

Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997

11 years agoSSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction.
Ronald S. Bultje [Tue, 9 Jul 2013 21:52:20 +0000 (14:52 -0700)]
SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction.

Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0

11 years agoRemove memcpy() in handle_inter_mode() filter selection.
Ronald S. Bultje [Wed, 10 Jul 2013 16:26:32 +0000 (09:26 -0700)]
Remove memcpy() in handle_inter_mode() filter selection.

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83

11 years agoAdd a feature to reduce chrome intra mode search
Yaowu Xu [Wed, 10 Jul 2013 15:59:18 +0000 (08:59 -0700)]
Add a feature to reduce chrome intra mode search

Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8

11 years agomi_width_log2 & mi_height_log2
Jim Bankoski [Wed, 10 Jul 2013 14:26:08 +0000 (07:26 -0700)]
mi_width_log2 & mi_height_log2

converted to lookup to avoid unnecessary code

Change-Id: I2ee6a01f06984cc2c4ba74b3fffd215318f749d2

11 years agob_width_log2 and b_height_log2 lookups
Jim Bankoski [Wed, 10 Jul 2013 14:19:09 +0000 (07:19 -0700)]
b_width_log2 and b_height_log2 lookups

    Replace case statement with lookup.
    Small speed gain at low speed settings but at speed 2+ where the
    number of motion searches etc. falls the impact rises to ~3-4%.

    Change-Id: Idff639b7b302ee65e042b7bf836943ac0a06fad8

Change-Id: I5940719a4a161f8c26ac9a6753f1678494cec644
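
As a rough sketch of the case-statement-to-lookup change, the two forms
below return the same value. The shortened enum, the table name and the
function names are hypothetical stand-ins; the real block-size list is
longer.

```c
/* Hypothetical, shortened block-size enum used only for this sketch. */
typedef enum {
  BLK_4X4, BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64, BLK_COUNT
} BlockSize;

/* log2 of the block width in units of 4 pixels, indexed by block size. */
const int kWidthLog2[BLK_COUNT] = { 0, 1, 2, 3, 4 };

/* Before: one branch per block size. */
int width_log2_switch(BlockSize b) {
  switch (b) {
    case BLK_4X4: return 0;
    case BLK_8X8: return 1;
    case BLK_16X16: return 2;
    case BLK_32X32: return 3;
    case BLK_64X64: return 4;
    default: return -1;
  }
}

/* After: a single indexed load, with no branching. */
int width_log2_lookup(BlockSize b) {
  return kWidthLog2[b];
}
```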

11 years agoremoving case statements around prediction entropy coding
Jim Bankoski [Wed, 10 Jul 2013 02:55:07 +0000 (19:55 -0700)]
removing case statements around prediction entropy coding

Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b

11 years agoMerge "Revert "Remove memcpy() in handle_inter_mode() filter selection.""
Yaowu Xu [Wed, 10 Jul 2013 03:10:06 +0000 (20:10 -0700)]
Merge "Revert "Remove memcpy() in handle_inter_mode() filter selection.""

11 years agoMerge "remove unused VP8 com/dec asm offsets"
James Zern [Wed, 10 Jul 2013 02:13:49 +0000 (19:13 -0700)]
Merge "remove unused VP8 com/dec asm offsets"

11 years agoMerge "Remove all asm offset files from VP9"
James Zern [Wed, 10 Jul 2013 02:13:37 +0000 (19:13 -0700)]
Merge "Remove all asm offset files from VP9"

11 years agoMerge "Loop filter code cleanup."
Dmitry Kovalev [Wed, 10 Jul 2013 01:56:19 +0000 (18:56 -0700)]
Merge "Loop filter code cleanup."

11 years agoRevert "Remove memcpy() in handle_inter_mode() filter selection."
Yaowu Xu [Wed, 10 Jul 2013 00:40:39 +0000 (17:40 -0700)]
Revert "Remove memcpy() in handle_inter_mode() filter selection."

This reverts commit fcf7998a47f7e1ec27fe93f99e488d345560a9be.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb

11 years agomsvc: set a more useful debug format
James Zern [Wed, 10 Jul 2013 00:28:22 +0000 (17:28 -0700)]
msvc: set a more useful debug format

pdb vs. c7; works better with test_libvpx

Change-Id: I67d18e328dd8e7734d3710f3912e9b179d368a62

11 years agoMerge "Added a lossless test"
Yaowu Xu [Wed, 10 Jul 2013 00:15:09 +0000 (17:15 -0700)]
Merge "Added a lossless test"

11 years agoAdding encode_tiles function to vp9_bitstream.c.
Dmitry Kovalev [Tue, 9 Jul 2013 22:59:19 +0000 (15:59 -0700)]
Adding encode_tiles function to vp9_bitstream.c.

Change-Id: Ie44824ec25fd8fdb25d7c8124a9b28c26d802029

11 years agoMerge "Add Neon horizontal and vertical vp9_mbloop_filter"
Frank Galligan [Tue, 9 Jul 2013 22:38:44 +0000 (15:38 -0700)]
Merge "Add Neon horizontal and vertical vp9_mbloop_filter"

11 years agoAdded a lossless test
Yaowu Xu [Tue, 9 Jul 2013 17:54:36 +0000 (10:54 -0700)]
Added a lossless test

It encodes with min and max q set to 0, and checks that the output
PSNR is MAX_PSNR (100).

Change-Id: Ia2418353cccf6e487204ea4ff874a7e71e55cb3e

11 years agoremove unused VP8 com/dec asm offsets
James Zern [Tue, 9 Jul 2013 21:33:49 +0000 (14:33 -0700)]
remove unused VP8 com/dec asm offsets

Change-Id: Ib3b26ee27f04b2dcbbd32b3127afb45e9f50cfcf

11 years agoRemove all asm offset files from VP9
John Koleszar [Fri, 26 Apr 2013 05:07:29 +0000 (22:07 -0700)]
Remove all asm offset files from VP9

The files are empty and unused.

Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a

11 years agoMerge "Removed unnecessary xd->mode_info_context assignment"
Scott LaVarnway [Tue, 9 Jul 2013 19:45:32 +0000 (12:45 -0700)]
Merge "Removed unnecessary xd->mode_info_context assignment"

11 years agoAdd Neon horizontal and vertical vp9_mbloop_filter
Frank Galligan [Mon, 1 Jul 2013 19:52:38 +0000 (12:52 -0700)]
Add Neon horizontal and vertical vp9_mbloop_filter

- The vp9 mbfilter C code will branch on flat and mask. This CL
  will perform both branches and combine the data. A later CL will
  perform a check to see if all patch will take one branch.
- These functions are about 1.75 times faster than the C code on
  Nexus 7.

PS #3
- Changed all functions to dup limit, blimit, and thresh from
  vld {dx[]}, freeing up r4-r6.
- Changed code to use vbif to reduce one instruction and free
  up a d register.
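
A scalar C analogue of "perform both branches and combine the data" is
sketched below. It is not the Neon code: the two candidate outputs are
placeholders, and the function names are hypothetical. The bitwise select
plays the role that vbif plays in the assembly.

```c
#include <stdint.h>

/* Bitwise select: bits of a where mask is set, bits of b elsewhere.
 * mask is expected to be 0xFF or 0x00 for each pixel. */
uint8_t select_u8(uint8_t mask, uint8_t a, uint8_t b) {
  return (uint8_t)((a & mask) | (b & (uint8_t)~mask));
}

/* Computes both candidate results for every pixel, then merges them by
 * mask instead of branching. The two "filters" are stand-ins only. */
void filter_both_branches(const uint8_t *src, const uint8_t *flat_mask,
                          uint8_t *dst, int n) {
  int i;
  for (i = 0; i < n; i++) {
    uint8_t flat_result = (uint8_t)(src[i] >> 1); /* placeholder branch A */
    uint8_t normal_result = src[i];               /* placeholder branch B */
    dst[i] = select_u8(flat_mask[i], flat_result, normal_result);
  }
}
```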

Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777

11 years agoMerge "Adding update_tx_ct function, removing duplicated code."
Dmitry Kovalev [Tue, 9 Jul 2013 19:26:11 +0000 (12:26 -0700)]
Merge "Adding update_tx_ct function, removing duplicated code."

11 years agoRemoving vp9_maskingmv.c and corresponding assembly file.
Dmitry Kovalev [Tue, 9 Jul 2013 18:22:56 +0000 (11:22 -0700)]
Removing vp9_maskingmv.c and corresponding assembly file.

Change-Id: I9842d02d61d78d17dc3449bae8ffbe60f4b3ecb3

11 years agoLoop filter code cleanup.
Dmitry Kovalev [Tue, 9 Jul 2013 18:17:36 +0000 (11:17 -0700)]
Loop filter code cleanup.

Using MAX_LOOP_FILTER constant instead of number 63.

Change-Id: If91e0c198331b3041e7cd0707a5948479e9209d8

11 years agoRemoved unnecessary xd->mode_info_context assignment
Scott LaVarnway [Tue, 9 Jul 2013 17:41:34 +0000 (13:41 -0400)]
Removed unnecessary xd->mode_info_context assignment

mi is xd->mode_info_context

Change-Id: Ib101be922b695205ec57b5ce1828ba19bde5b41c

11 years agoMerge "Unbreak lossless."
Ronald S. Bultje [Tue, 9 Jul 2013 16:54:48 +0000 (09:54 -0700)]
Merge "Unbreak lossless."

11 years agoMerge "Make intra prediction pointers RTCD-based."
Ronald S. Bultje [Tue, 9 Jul 2013 16:54:43 +0000 (09:54 -0700)]
Merge "Make intra prediction pointers RTCD-based."

11 years agoUnbreak lossless.
Ronald S. Bultje [Tue, 9 Jul 2013 16:46:37 +0000 (09:46 -0700)]
Unbreak lossless.

Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1

11 years agocleanup read_mode_info if (1)
Jim Bankoski [Tue, 9 Jul 2013 16:04:45 +0000 (09:04 -0700)]
cleanup read_mode_info if (1)

Change-Id: I851af23c787a2d3637d84244b9f75063cbf782f1

11 years agodecoder speedup - get-segment-id only if segmentation enabled
Jim Bankoski [Tue, 9 Jul 2013 15:52:30 +0000 (08:52 -0700)]
decoder speedup - get-segment-id only if segmentation enabled

Change-Id: I9355f8446660aeb7dfdbc5ee56635c791ac35e95

11 years agoMerge "Fix loopfilter bug"
Yaowu Xu [Tue, 9 Jul 2013 08:34:25 +0000 (01:34 -0700)]
Merge "Fix loopfilter bug"

11 years agoMerge "Using mi_cols instead of mb_cols."
Dmitry Kovalev [Tue, 9 Jul 2013 03:09:19 +0000 (20:09 -0700)]
Merge "Using mi_cols instead of mb_cols."

11 years agoMerge "Refactoring setup_pre_planes function."
Dmitry Kovalev [Tue, 9 Jul 2013 03:08:05 +0000 (20:08 -0700)]
Merge "Refactoring setup_pre_planes function."

11 years agoMerge "Calling set_partition_seg_context() instead of code duplication."
Dmitry Kovalev [Tue, 9 Jul 2013 03:07:06 +0000 (20:07 -0700)]
Merge "Calling set_partition_seg_context() instead of code duplication."

11 years agoMake intra prediction pointers RTCD-based.
Ronald S. Bultje [Tue, 9 Jul 2013 00:25:51 +0000 (17:25 -0700)]
Make intra prediction pointers RTCD-based.

This probably has a mildly negative impact on performance, but will
(in future commits - or possibly merged with this one) allow SIMD
implementations of individual intra prediction functions. We may
perhaps want to consider having separate functions per txfm-size
also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
each intra prediction mode), but I haven't played much with that
yet.
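
The general shape of pointer-based dispatch is sketched below, assuming a
simplified predictor signature. The typedef, setup_rtcd() and its
arguments are hypothetical and do not reproduce the generated RTCD
headers.

```c
#include <stdint.h>

/* Hypothetical signature for a square intra predictor. */
typedef void (*intra_pred_fn)(uint8_t *dst, int stride, int size,
                              const uint8_t *above);

/* Plain C vertical predictor: copy the row above into every row. */
void v_predictor_c(uint8_t *dst, int stride, int size,
                   const uint8_t *above) {
  int r, c;
  for (r = 0; r < size; r++)
    for (c = 0; c < size; c++)
      dst[r * stride + c] = above[c];
}

/* Callers go through this pointer; it defaults to the C version. */
intra_pred_fn v_predictor = v_predictor_c;

/* Hypothetical init hook: cpu_has_simd stands in for real CPU detection,
 * simd_impl for an optimized implementation added in a later commit. */
void setup_rtcd(int cpu_has_simd, intra_pred_fn simd_impl) {
  v_predictor = v_predictor_c;
  if (cpu_has_simd && simd_impl != 0) v_predictor = simd_impl;
}
```

The indirect call is the likely source of the mildly negative impact
mentioned above; the benefit is that each predictor can be swapped
independently.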

Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269

11 years agoFix loopfilter bug
John Koleszar [Mon, 8 Jul 2013 23:39:37 +0000 (16:39 -0700)]
Fix loopfilter bug

In the rare case where 4x4 interior filtering was called for but no
8x8 or larger filtering takes place, the previous code was skipping
the filtering. This patch fixes the issue by including the interior
mask in the overall mask for the filter application loops.
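
A simplified model of the problem, assuming one mask bit per column and a
loop that runs only while the overall mask has bits left. The function
and parameter names are hypothetical; this is not the actual filter loop.

```c
/* Counts how many columns get interior filtering.
 * edge_mask:        columns needing 8x8-or-larger edge filtering
 * interior_mask:    columns needing 4x4 interior filtering
 * include_interior: 0 models the old loop, 1 models the fixed loop */
int interior_columns_filtered(unsigned edge_mask, unsigned interior_mask,
                              int include_interior) {
  unsigned loop_mask =
      include_interior ? (edge_mask | interior_mask) : edge_mask;
  unsigned bit = 1;
  int filtered = 0;
  /* The loop ends as soon as loop_mask runs out of set bits, so any
   * interior bit not covered by loop_mask is never reached. */
  for (; loop_mask; loop_mask >>= 1, bit <<= 1) {
    if (interior_mask & bit) filtered++;
  }
  return filtered;
}
```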

Change-Id: I4a0b65056c64f97478827c2ff41e0914fc7779d0

11 years agoDon't call encode_sb() for the final of 4-split subpartitions.
Ronald S. Bultje [Mon, 8 Jul 2013 21:38:40 +0000 (14:38 -0700)]
Don't call encode_sb() for the final of 4-split subpartitions.

The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.
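
In outline, the change amounts to one extra condition in the loop over
the four sub-partitions. The sketch below uses a stub and a counter in
place of the real encode_sb(); all names are hypothetical.

```c
/* Counts how often the (stubbed) reconstruction step runs. */
int encode_calls = 0;

/* Stand-in for the real reconstruction of one sub-partition. */
void encode_sb_stub(int index) {
  (void)index;
  encode_calls++;
}

/* Visits the four sub-partitions of a split. Each one is searched, but
 * the last one is not reconstructed because nothing reads the result. */
void pick_split_modes(void) {
  int i;
  for (i = 0; i < 4; i++) {
    /* The mode search for sub-partition i would run here. */
    if (i != 3) encode_sb_stub(i);
  }
}
```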

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45

11 years agoInline vp9_get_mv_joint().
Ronald S. Bultje [Wed, 3 Jul 2013 19:04:30 +0000 (12:04 -0700)]
Inline vp9_get_mv_joint().

Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
prevent the call overhead.

Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5

11 years agoDon't recalculate mv_ref costs for each block/partition.
Ronald S. Bultje [Wed, 3 Jul 2013 17:54:36 +0000 (10:54 -0700)]
Don't recalculate mv_ref costs for each block/partition.

Changes cost_mv_ref() to look up pre-calculated cost arrays
instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.
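
The precompute-then-look-up pattern can be sketched as follows. The table
dimensions, the placeholder cost model and the function names are
assumptions made for the example and are not taken from the encoder.

```c
#define NUM_CONTEXTS 4 /* assumed number of mode contexts */
#define NUM_MODES 4    /* assumed number of inter modes */

int mode_cost_table[NUM_CONTEXTS][NUM_MODES];

/* Placeholder for the per-call cost computation being replaced. */
int compute_cost_slow(int ctx, int mode) {
  return 100 * (ctx + 1) + 10 * mode;
}

/* Runs once (for example per frame) instead of once per block. */
void init_mode_costs(void) {
  int ctx, mode;
  for (ctx = 0; ctx < NUM_CONTEXTS; ctx++)
    for (mode = 0; mode < NUM_MODES; mode++)
      mode_cost_table[ctx][mode] = compute_cost_slow(ctx, mode);
}

/* Per-block query: a plain array read. */
int cost_lookup(int ctx, int mode) {
  return mode_cost_table[ctx][mode];
}
```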

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a

11 years agoRemove unnecessary memset(best_index, 0) from trellis/optimize.
Ronald S. Bultje [Wed, 3 Jul 2013 17:09:15 +0000 (10:09 -0700)]
Remove unnecessary memset(best_index, 0) from trellis/optimize.

First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e

11 years agoRemove memcpy() in handle_inter_mode() filter selection.
Ronald S. Bultje [Mon, 8 Jul 2013 21:49:48 +0000 (14:49 -0700)]
Remove memcpy() in handle_inter_mode() filter selection.

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0

11 years agoMake frame-wide filter-type decision fully RD-based.
Ronald S. Bultje [Mon, 8 Jul 2013 21:49:33 +0000 (14:49 -0700)]
Make frame-wide filter-type decision fully RD-based.

Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78

11 years agoUsing mi_cols instead of mb_cols.
Dmitry Kovalev [Mon, 8 Jul 2013 21:54:04 +0000 (14:54 -0700)]
Using mi_cols instead of mb_cols.

Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.
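
For reference, a power-of-two alignment macro is commonly written as
shown below. The macro body and the 8-pixel mi unit are assumptions for
illustration; the message names the macro but does not show its
definition.

```c
/* Round value up to the next multiple of 2^n. */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

/* Assumed for this sketch: one mode-info (mi) unit covers 8 pixels. */
#define MI_SIZE_LOG2 3

/* Number of mi columns needed to cover a frame of the given width. */
int width_to_mi_cols(int width) {
  return ALIGN_POWER_OF_TWO(width, MI_SIZE_LOG2) >> MI_SIZE_LOG2;
}
```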

Change-Id: I2491c969f713207c062011878b57e4e531818607

11 years agoImplements several heuristics to prune mode search
Deb Mukherjee [Wed, 3 Jul 2013 21:47:54 +0000 (14:47 -0700)]
Implements several heuristics to prune mode search

Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions)
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup
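
A minimal sketch of how such bit flags might gate the compound-mode
search. The flag names come from the list above; the bit values, the
helper function and its parameters are assumptions for the example.

```c
/* Flag names as listed above; the bit assignments are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Returns 1 if a compound candidate using (ref_a, ref_b) should be
 * skipped. best_single_ref is the reference frame of the best single-ref
 * inter mode so far, or -1 if there is none yet. */
int skip_compound_mode(unsigned flags, int best_is_intra,
                       int best_single_ref, int ref_a, int ref_b) {
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra) return 1;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && best_single_ref >= 0 &&
      best_single_ref != ref_a && best_single_ref != ref_b)
    return 1;
  return 0;
}
```

Because each heuristic is a separate bit, a speed setting can enable any
combination by OR-ing flags together.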

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495

11 years agoMerge "Refactor SSE2 8x8 functional units"
Jingning Han [Fri, 5 Jul 2013 18:18:18 +0000 (11:18 -0700)]
Merge "Refactor SSE2 8x8 functional units"

11 years agoFix intermediate height in convolve
Tero Rintaluoma [Fri, 5 Jul 2013 10:53:36 +0000 (13:53 +0300)]
Fix intermediate height in convolve

intermediate_height for horizontal filtering must be at least 8
pixels to be able to do vertical filtering correctly. Currently
it can be less for small block and y_step_q4 sizes.

Change-Id: I2ee28b0591b2041c2fa9844d0ae2ff8a1a59cc21

11 years agoMerge "Fix to comp_inter_joint_search_thresh feature."
Paul Wilkins [Thu, 4 Jul 2013 10:27:00 +0000 (03:27 -0700)]
Merge "Fix to comp_inter_joint_search_thresh feature."

11 years agoAdding update_tx_ct function, removing duplicated code.
Dmitry Kovalev [Thu, 4 Jul 2013 01:24:13 +0000 (18:24 -0700)]
Adding update_tx_ct function, removing duplicated code.

Change-Id: I8882fe3cd247a5a8304ab8ab2ee9abdb92830133

11 years agoRefactoring setup_pre_planes function.
Dmitry Kovalev [Thu, 4 Jul 2013 00:42:01 +0000 (17:42 -0700)]
Refactoring setup_pre_planes function.

Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63

11 years agoMerge "Adding write_skip_coeff function."
Dmitry Kovalev [Wed, 3 Jul 2013 23:33:58 +0000 (16:33 -0700)]
Merge "Adding write_skip_coeff function."

11 years agoMerge "Enable early termination in rd search"
Jingning Han [Wed, 3 Jul 2013 21:20:41 +0000 (14:20 -0700)]
Merge "Enable early termination in rd search"

11 years agoMerge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE."
Dmitry Kovalev [Wed, 3 Jul 2013 21:16:02 +0000 (14:16 -0700)]
Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE."

11 years agoAdding write_skip_coeff function.
Dmitry Kovalev [Wed, 3 Jul 2013 20:23:47 +0000 (13:23 -0700)]
Adding write_skip_coeff function.

Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd

11 years agoMerge "Inline a few intra predictors"
Yaowu Xu [Wed, 3 Jul 2013 20:21:22 +0000 (13:21 -0700)]
Merge "Inline a few intra predictors"

11 years agoEnable early termination in rd search
Jingning Han [Tue, 2 Jul 2013 23:48:15 +0000 (16:48 -0700)]
Enable early termination in rd search

This commit allows the encoder to track the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it skips the remaining
operations and continues to the next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
to 266 seconds, i.e., about a 13% speed-up at no performance penalty.
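
The early-termination idea can be sketched as a running sum with a
breakout. The function name and signature are hypothetical, and the
per-block costs are supplied by the caller rather than computed.

```c
#include <stdint.h>

/* Accumulates per-transform-block rd costs and stops as soon as the
 * running total exceeds best_rd. Returns the number of blocks evaluated
 * and writes the accumulated cost to *total_rd. */
int accumulate_rd(const int64_t *block_rd, int num_blocks, int64_t best_rd,
                  int64_t *total_rd) {
  int i;
  *total_rd = 0;
  for (i = 0; i < num_blocks; i++) {
    *total_rd += block_rd[i];
    /* This mode can no longer beat the best one; stop early. */
    if (*total_rd > best_rd) return i + 1;
  }
  return num_blocks;
}
```

The result is unchanged because a mode whose partial cost already exceeds
the best cost would have been rejected anyway, which matches the "no
performance penalty" note above.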

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a

11 years agoCalling set_partition_seg_context() instead of code duplication.
Dmitry Kovalev [Wed, 3 Jul 2013 18:15:58 +0000 (11:15 -0700)]
Calling set_partition_seg_context() instead of code duplication.

Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555

11 years agoReplacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Dmitry Kovalev [Wed, 3 Jul 2013 17:54:50 +0000 (10:54 -0700)]
Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.

Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c

11 years agoMerge "Adding write_selected_txfm_size function."
Dmitry Kovalev [Wed, 3 Jul 2013 17:33:55 +0000 (10:33 -0700)]
Merge "Adding write_selected_txfm_size function."

11 years agoInline a few intra predictors
Yaowu Xu [Wed, 3 Jul 2013 17:20:41 +0000 (10:20 -0700)]
Inline a few intra predictors

Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710

11 years agoRefactor SSE2 8x8 functional units
Jingning Han [Wed, 3 Jul 2013 16:05:01 +0000 (09:05 -0700)]
Refactor SSE2 8x8 functional units

These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d

11 years agoMerge "Use pmovmskb to skip quantize loops over empty coefficients."
Ronald S. Bultje [Wed, 3 Jul 2013 16:05:48 +0000 (09:05 -0700)]
Merge "Use pmovmskb to skip quantize loops over empty coefficients."

11 years agoMerge "Remove unused function vp9_build_inter4x4_predictors_mbuv()."
Ronald S. Bultje [Wed, 3 Jul 2013 16:05:20 +0000 (09:05 -0700)]
Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()."

11 years agoFix to comp_inter_joint_search_thresh feature.
Paul Wilkins [Wed, 3 Jul 2013 11:53:36 +0000 (12:53 +0100)]
Fix to comp_inter_joint_search_thresh feature.

When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88

11 years agoAdded two new skip experiments.
Paul Wilkins [Mon, 1 Jul 2013 15:27:12 +0000 (16:27 +0100)]
Added two new skip experiments.

sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd

11 years agoBug fix: set frame_parallel_decoding_mode
Scott LaVarnway [Wed, 3 Jul 2013 14:25:29 +0000 (10:25 -0400)]
Bug fix: set frame_parallel_decoding_mode

This patch allows the frame_parallel_decoding_mode flag
to be set instead of returning a codec error.

Change-Id: I4a1631c625723ac8873290d0fd0211074a87d112