Dmitry Kovalev [Tue, 2 Jul 2013 00:28:08 +0000 (17:28 -0700)]
Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
Yaowu Xu [Mon, 1 Jul 2013 22:58:57 +0000 (15:58 -0700)]
Merge "Quantize (64-bit only, for now) SSSE3 SIMD."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:48 +0000 (14:58 -0700)]
Merge "Removing vp9_modecont.{h, c}."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:36 +0000 (14:58 -0700)]
Merge "Moving encoder subexp encoding functions to subexp.{h, c}."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:20 +0000 (14:58 -0700)]
Merge "Adding vp9_rb_read_signed_literal function."
Dmitry Kovalev [Mon, 1 Jul 2013 21:58:06 +0000 (14:58 -0700)]
Merge "Inlining decode_atom, decode_sb_intra, and decode_sb."
Dmitry Kovalev [Mon, 1 Jul 2013 21:50:32 +0000 (14:50 -0700)]
Merge "Cleanup inside vp9_decodemv.c."
Ronald S. Bultje [Mon, 1 Jul 2013 18:36:07 +0000 (11:36 -0700)]
Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only; it needs some minor modifications to be 32-bit compatible,
because it uses 15 xmm registers, whereas 32-bit mode only has 8.
Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
Dmitry Kovalev [Mon, 1 Jul 2013 17:17:15 +0000 (10:17 -0700)]
Removing vp9_modecont.{h, c}.
Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.
Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
Paul Wilkins [Mon, 1 Jul 2013 16:39:02 +0000 (09:39 -0700)]
Merge "New motion threshold factor - speed feature."
Yaowu Xu [Mon, 1 Jul 2013 15:54:50 +0000 (08:54 -0700)]
fix a mismatch in cpuused 2
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
Paul Wilkins [Thu, 27 Jun 2013 12:16:33 +0000 (13:16 +0100)]
New motion threshold factor - speed feature.
Added a speed feature that focuses only on thresholds
for new motion modes.
Moved sf->comp_inter_joint_search_thresh into speed
1. This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.
Slight adjustment to baseline thresholds.
Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
Dmitry Kovalev [Mon, 1 Jul 2013 09:09:36 +0000 (02:09 -0700)]
Adding vp9_rb_read_signed_literal function.
Change-Id: I30ea91561ffac7e5065ba41b2d3ab7dedb720593
Jingning Han [Sat, 29 Jun 2013 22:57:03 +0000 (15:57 -0700)]
Merge "Enable SSE2 4x4 ADST/DCT transform"
Christian Duvivier [Tue, 18 Jun 2013 22:23:25 +0000 (15:23 -0700)]
SSE2 version of vp9_short_fdct32x32_rd.
43,000 -> 5,750 cycles, about 7.5x faster.
Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
Dmitry Kovalev [Sat, 29 Jun 2013 18:50:45 +0000 (11:50 -0700)]
Moving encoder subexp encoding functions to subexp.{h, c}.
Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca
Ronald S. Bultje [Sat, 29 Jun 2013 14:42:41 +0000 (07:42 -0700)]
Merge "fixed a bug where sse is not populated"
Johann [Sat, 29 Jun 2013 02:50:38 +0000 (19:50 -0700)]
Merge "add Neon optimized add constant residual functions"
James Zern [Sat, 29 Jun 2013 02:48:05 +0000 (19:48 -0700)]
Merge "fix test compile error"
Ronald S. Bultje [Sat, 29 Jun 2013 02:37:11 +0000 (19:37 -0700)]
Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."
Ronald S. Bultje [Sat, 29 Jun 2013 02:36:56 +0000 (19:36 -0700)]
Merge "Minor change to prevent one level of dereference in cost_coeffs()."
chm [Thu, 27 Jun 2013 12:47:56 +0000 (20:47 +0800)]
add Neon optimized add constant residual functions
- Add add_constant_residual_8x8 16x16 32x32 functions
- Tested under RealView debugger environment
Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
Dmitry Kovalev [Sat, 29 Jun 2013 01:38:02 +0000 (18:38 -0700)]
Merge "Cosmetic reordering of FRAME_CONTEXT members."
Dmitry Kovalev [Sat, 29 Jun 2013 01:34:30 +0000 (18:34 -0700)]
Inlining decode_atom, decode_sb_intra, and decode_sb.
Change-Id: I41711bb994f542c5ba3d0cefd9b2e79db3c2c3a1
James Zern [Sat, 29 Jun 2013 01:07:37 +0000 (18:07 -0700)]
fix test compile error
since:
92479d9 Make update_partition_context faster
fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
^~~~~~~~~~~~~~~~~
Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
Jingning Han [Fri, 28 Jun 2013 20:37:19 +0000 (13:37 -0700)]
Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 forward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.
Change-Id: Iad4d526346e05c7be896466c05500711bb763660
Yaowu Xu [Sat, 29 Jun 2013 00:10:22 +0000 (17:10 -0700)]
fixed a bug where sse is not populated
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
Jingning Han [Fri, 28 Jun 2013 23:49:59 +0000 (16:49 -0700)]
Merge "Fix switch statement in 8x8 transform"
Dmitry Kovalev [Fri, 28 Jun 2013 23:16:03 +0000 (16:16 -0700)]
Cosmetic reordering of FRAME_CONTEXT members.
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
Dmitry Kovalev [Fri, 28 Jun 2013 22:32:01 +0000 (15:32 -0700)]
Cleanup inside vp9_decodemv.c.
Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with other function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.
Change-Id: Ifee05d333da4cd331d4aff40ce41ccd9b70e494a
Dmitry Kovalev [Fri, 28 Jun 2013 21:03:28 +0000 (14:03 -0700)]
Merge "Removing CONFIG_DEBUG checks on assertions."
Jingning Han [Fri, 28 Jun 2013 20:39:32 +0000 (13:39 -0700)]
Fix switch statement in 8x8 transform
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
Ronald S. Bultje [Fri, 28 Jun 2013 18:54:50 +0000 (11:54 -0700)]
Merge "Some minor optimizations for cost_coeffs()."
Ronald S. Bultje [Fri, 28 Jun 2013 18:54:28 +0000 (11:54 -0700)]
Merge "Make coefficient skip condition an explicit RD choice."
Ronald S. Bultje [Fri, 28 Jun 2013 17:40:21 +0000 (10:40 -0700)]
Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles
Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
Dmitry Kovalev [Fri, 28 Jun 2013 17:38:54 +0000 (10:38 -0700)]
Merge "Decoder's code cleanup."
Dmitry Kovalev [Fri, 28 Jun 2013 17:36:20 +0000 (10:36 -0700)]
Removing CONFIG_DEBUG checks on assertions.
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.
Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
Ronald S. Bultje [Fri, 28 Jun 2013 17:21:25 +0000 (10:21 -0700)]
Minor change to prevent one level of dereference in cost_coeffs().
4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles
Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
Ronald S. Bultje [Fri, 28 Jun 2013 03:57:37 +0000 (20:57 -0700)]
Some minor optimizations for cost_coeffs().
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 4906 -> 3664 cycles
32x32: 23426 -> 18134 cycles
Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
Ronald S. Bultje [Fri, 28 Jun 2013 00:41:54 +0000 (17:41 -0700)]
Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
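A minimal sketch of the explicit skip-block check described in the commit above, assuming a conventional Lagrangian cost (distortion plus lambda times rate). The names `rd_cost` and `choose_skip_block` and their parameters are hypothetical and are not taken from libvpx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Lagrangian cost: distortion plus lambda-weighted rate. */
static int64_t rd_cost(int64_t distortion, int64_t rate, int64_t lambda) {
  return distortion + lambda * rate;
}

/* Returns 1 if coding the whole block as zero (skip) is no worse, in RD
 * terms, than coding its quantized coefficients; returns 0 otherwise.
 * This is a single comparison per block, with no per-coefficient bias. */
static int choose_skip_block(int64_t dist_coded, int64_t rate_coded,
                             int64_t dist_skip, int64_t rate_skip,
                             int64_t lambda) {
  assert(lambda >= 0);
  return rd_cost(dist_skip, rate_skip, lambda) <=
         rd_cost(dist_coded, rate_coded, lambda);
}
```

Under these assumptions, a block whose coefficients are expensive to code but contribute little to distortion ends up skipped, while individual coefficient rounding is left to the trellis/optimize loop.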
Yaowu Xu [Fri, 28 Jun 2013 16:56:11 +0000 (09:56 -0700)]
Merge "Minor cleanups"
Yaowu Xu [Fri, 28 Jun 2013 16:29:39 +0000 (09:29 -0700)]
Merge "Optimize partition search order"
Yaowu Xu [Fri, 28 Jun 2013 16:19:50 +0000 (09:19 -0700)]
Minor cleanups
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
Yaowu Xu [Thu, 27 Jun 2013 19:07:07 +0000 (12:07 -0700)]
Optimize partition search order
This commit changes the partition search order so that rectangular
partitions are checked after square partitions. It also adds a speed
feature that skips the rectangular partition check when NONE is
better than SPLIT in the RD sense.
This feature speeds up the encoder by roughly 1.5x, with a compression
loss of:
-0.91% on cif set
-0.56% on stdhd set
Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
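The search order from the commit above ("Optimize partition search order") can be sketched roughly as follows. The enum, the struct, and the function `pick_partition` are hypothetical; the costs are passed in precomputed for brevity, whereas a real encoder would presumably evaluate the rectangular partitions only when they are not skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition types and per-partition RD costs. */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } PartitionType;

typedef struct {
  int64_t cost[4]; /* indexed by PartitionType */
} PartitionCosts;

/* Square partitions (NONE, SPLIT) are compared first. When the speed
 * feature is on and NONE beats SPLIT, the rectangular checks are skipped. */
static PartitionType pick_partition(const PartitionCosts *c,
                                    int skip_rect_when_none_wins) {
  PartitionType best;
  assert(c != 0);
  best = c->cost[PART_NONE] < c->cost[PART_SPLIT] ? PART_NONE : PART_SPLIT;
  if (skip_rect_when_none_wins && best == PART_NONE) return best;
  if (c->cost[PART_HORZ] < c->cost[best]) best = PART_HORZ;
  if (c->cost[PART_VERT] < c->cost[best]) best = PART_VERT;
  return best;
}
```

The trade-off is visible in the first case of a quick check: with the feature on, a cheaper rectangular partition can be missed, which is consistent with the small compression loss the commit reports.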
Ronald S. Bultje [Fri, 28 Jun 2013 14:01:15 +0000 (07:01 -0700)]
Merge "Fix tile independence with both column tiling and static_thresh set."
James Zern [Fri, 28 Jun 2013 05:25:33 +0000 (22:25 -0700)]
Merge "variance_test: add missing ClearSystemState..."
Ronald S. Bultje [Fri, 28 Jun 2013 02:50:23 +0000 (19:50 -0700)]
Fix tile independence with both column tiling and static_thresh set.
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
Dmitry Kovalev [Thu, 27 Jun 2013 23:15:43 +0000 (16:15 -0700)]
Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.
Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
Frank Galligan [Fri, 21 Jun 2013 19:58:46 +0000 (12:58 -0700)]
Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
functions.
- Matches x86 md5 checksum.
Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
Dmitry Kovalev [Thu, 27 Jun 2013 21:57:07 +0000 (14:57 -0700)]
Merge "General cleanup in segmentation-related code."
Dmitry Kovalev [Thu, 27 Jun 2013 21:55:18 +0000 (14:55 -0700)]
Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file."
Ronald S. Bultje [Wed, 26 Jun 2013 00:28:24 +0000 (17:28 -0700)]
Inline quantize so idiv instruction gets removed from inner loop.
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.
Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
Paul Wilkins [Thu, 27 Jun 2013 09:28:41 +0000 (02:28 -0700)]
Merge "Auto adapt step size feature."
Paul Wilkins [Thu, 27 Jun 2013 09:28:36 +0000 (02:28 -0700)]
Merge "Start adaptive threshold for each mode at max."
Paul Wilkins [Thu, 27 Jun 2013 09:28:21 +0000 (02:28 -0700)]
Merge "Change meaning of cpi->sf.first_step and rename."
Jingning Han [Thu, 27 Jun 2013 02:02:02 +0000 (19:02 -0700)]
Merge "Make intra predictor reference buffer configurable"
Jingning Han [Thu, 27 Jun 2013 02:01:45 +0000 (19:01 -0700)]
Merge "Make update_partition_context faster"
James Zern [Thu, 27 Jun 2013 01:32:21 +0000 (18:32 -0700)]
variance_test: add missing ClearSystemState...
...to recently added SubpelVarianceTest
Change-Id: I8775e39fd5dbfba81ad42b79b47bf6dd6ca8cc0e
Yaowu Xu [Thu, 27 Jun 2013 00:19:47 +0000 (17:19 -0700)]
Merge "Change to use LUT for mode-to-txfm conversion"
Jingning Han [Wed, 26 Jun 2013 02:41:56 +0000 (19:41 -0700)]
Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
Jingning Han [Thu, 27 Jun 2013 00:06:56 +0000 (17:06 -0700)]
Merge "Remove empty function vp9_build_block_offsets"
Jingning Han [Wed, 26 Jun 2013 18:50:14 +0000 (11:50 -0700)]
Make update_partition_context faster
Use vpx_memset for updating the partition contexts. Thanks to Noah
for pointing out the need for refactoring in this part.
Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
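As an illustration of the idea in the commit above, the sketch below fills a run of context entries with one call per array instead of a per-entry loop. It uses the standard `memset` as a stand-in for `vpx_memset`; the function name `set_partition_context` and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: write the same context value into `count`
 * consecutive above/left entries using memset rather than a loop. */
static void set_partition_context(unsigned char *above, unsigned char *left,
                                  int count, int above_val, int left_val) {
  assert(above != 0 && left != 0 && count >= 0);
  memset(above, above_val, (size_t)count);
  memset(left, left_val, (size_t)count);
}
```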
Ronald S. Bultje [Wed, 26 Jun 2013 21:52:56 +0000 (14:52 -0700)]
Remove unused macro RDTRUNC_8x8 from encodemb.c.
Change-Id: I0c097567adab24215d807963ccb34810a2afe007
Jingning Han [Wed, 26 Jun 2013 21:55:47 +0000 (14:55 -0700)]
Remove empty function vp9_build_block_offsets
This function is empty, hence is removed.
Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74
Yaowu Xu [Wed, 26 Jun 2013 01:15:42 +0000 (18:15 -0700)]
Change to use LUT for mode-to-txfm conversion
Change-Id: Ieb989830f49e6708ee7728eddebf7a2144c37c6f
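A lookup table of the kind the commit above refers to might look like the sketch below. The enums are reduced, hypothetical stand-ins, and the particular mode-to-transform pairs are illustrative assumptions, not the mapping libvpx uses.

```c
#include <assert.h>

/* Hypothetical, reduced enums for illustration only. */
typedef enum { DC_PRED, V_PRED, H_PRED, TM_PRED, MODE_COUNT } PredMode;
typedef enum { DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST } TxType;

/* One table entry per prediction mode replaces a switch statement. */
static const TxType mode_to_txfm[MODE_COUNT] = {
  DCT_DCT,   /* DC_PRED */
  ADST_DCT,  /* V_PRED  */
  DCT_ADST,  /* H_PRED  */
  ADST_ADST  /* TM_PRED */
};

static TxType txfm_for_mode(PredMode mode) {
  assert((int)mode >= 0 && (int)mode < MODE_COUNT);
  return mode_to_txfm[mode];
}
```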
Jingning Han [Wed, 26 Jun 2013 19:21:05 +0000 (12:21 -0700)]
Merge "Fix aligned memory allocation in unit tests"
Jingning Han [Wed, 26 Jun 2013 18:59:46 +0000 (11:59 -0700)]
Fix aligned memory allocation in unit tests
Change-Id: I38fac90e0ed25cb747453ab1d6396187cf5ef3b9
Paul Wilkins [Wed, 26 Jun 2013 18:58:16 +0000 (11:58 -0700)]
Merge "fixed a compiling problem with MSVC win32 build"
Paul Wilkins [Wed, 26 Jun 2013 16:06:25 +0000 (17:06 +0100)]
Auto adapt step size feature.
Also tweaks to other features and experiments with
what is on and off at different speed settings.
Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
James Zern [Wed, 26 Jun 2013 18:09:08 +0000 (11:09 -0700)]
test/fdct*: fix some warnings
comment out some unused parameters and adjust the format to avoid:
./test/fdct4x4_test.cc|27| warning C4138: '*/' found outside of comment
Change-Id: I60f93b4c3cd7e8d61f0de80019f3404b40161f03
Dmitry Kovalev [Wed, 26 Jun 2013 17:27:28 +0000 (10:27 -0700)]
General cleanup in segmentation-related code.
Using consistent function and variable names.
Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7
Dmitry Kovalev [Wed, 26 Jun 2013 17:23:27 +0000 (10:23 -0700)]
Merge "Using get_plane_block_{width, height} instead of custom code."
Yaowu Xu [Wed, 26 Jun 2013 16:33:16 +0000 (09:33 -0700)]
fixed a compiling problem with MSVC win32 build
The aligned array in the parameter list caused the win32 build to report
a C2719 error. This commit fixes the issue by making the parameter
type a pointer instead of an array.
Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992
Paul Wilkins [Tue, 25 Jun 2013 13:59:18 +0000 (14:59 +0100)]
Start adaptive threshold for each mode at max.
Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.
Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
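The per-frame reset described in the commit above could be sketched as follows. The constants, the struct, and both function names are hypothetical, and the halving rule is only an assumed way for a threshold to "drop down"; the commit does not state the actual decay.

```c
#include <assert.h>

#define NUM_MODES 4   /* hypothetical mode count */
#define THRESH_MAX 32 /* hypothetical maximum threshold */
#define THRESH_MIN 1  /* hypothetical floor */

typedef struct {
  int thresh[NUM_MODES];
} ModeThresholds;

/* At the start of each frame every adaptive threshold returns to MAX. */
static void reset_thresholds_for_frame(ModeThresholds *t) {
  int i;
  for (i = 0; i < NUM_MODES; ++i) t->thresh[i] = THRESH_MAX;
}

/* When a mode is picked its threshold drops (assumed rule: halve it). */
static void on_mode_picked(ModeThresholds *t, int mode) {
  assert(mode >= 0 && mode < NUM_MODES);
  t->thresh[mode] >>= 1;
  if (t->thresh[mode] < THRESH_MIN) t->thresh[mode] = THRESH_MIN;
}
```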
Paul Wilkins [Mon, 24 Jun 2013 14:19:16 +0000 (15:19 +0100)]
Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.
The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.
Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
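One reading of the "(>> x)" wording in the commit above is that the speed feature is a right-shift applied to a default step size that was already chosen from the image dimensions. The sketch below follows that reading; the function `first_step_size` and the clamp to 1 are assumptions, not libvpx code.

```c
#include <assert.h>

/* Hypothetical: apply the speed-feature delta as a right shift to a
 * default first step size, never letting the result fall below 1. */
static int first_step_size(int default_step, int reduce_first_step_size) {
  int step;
  assert(default_step > 0);
  assert(reduce_first_step_size >= 0 && reduce_first_step_size < 31);
  step = default_step >> reduce_first_step_size;
  return step > 0 ? step : 1;
}
```

With a delta of 0 the default is used unchanged, which is what distinguishes this from the earlier absolute-value meaning.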
John Koleszar [Wed, 26 Jun 2013 05:44:39 +0000 (22:44 -0700)]
Merge "vpxenc: send usage to stderr"
John Koleszar [Wed, 26 Jun 2013 05:44:26 +0000 (22:44 -0700)]
Merge ".gitignore: add gcov files"
John Koleszar [Wed, 26 Jun 2013 05:44:21 +0000 (22:44 -0700)]
Merge "Move vp9_counts_to_nmv_context to encoder"
John Koleszar [Wed, 26 Jun 2013 05:44:16 +0000 (22:44 -0700)]
Merge "Move vp9_full_to_model_counts to encoder"
John Koleszar [Wed, 26 Jun 2013 05:30:50 +0000 (22:30 -0700)]
Merge "make: add libvpx_test_srcs.txt target"
John Koleszar [Wed, 26 Jun 2013 05:29:37 +0000 (22:29 -0700)]
Merge "tests/*source: test file pointer before reading"
John Koleszar [Wed, 26 Jun 2013 05:27:39 +0000 (22:27 -0700)]
Merge "encode_test_driver: check for fatal failures"
Jingning Han [Wed, 26 Jun 2013 02:46:55 +0000 (19:46 -0700)]
Merge "Refactor intra predictor block"
James Zern [Wed, 26 Jun 2013 00:55:28 +0000 (17:55 -0700)]
tests/*source: test file pointer before reading
If the caller did not abort after an ASSERT failure in Begin(),
FillFrame() would segfault.
Change-Id: I2d3f5a0918611bbd081be6f686dea19c56695073
James Zern [Wed, 26 Jun 2013 00:53:20 +0000 (17:53 -0700)]
encode_test_driver: check for fatal failures
Make the base test be:
!(fatal || abort_) removing some redundancy in the encode tests
Change-Id: I8ffaf33fcf9a3030b38ea3e8eb94704cdc2fc920
Jingning Han [Tue, 25 Jun 2013 23:01:48 +0000 (16:01 -0700)]
Refactor intra predictor block
Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.
Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
Dmitry Kovalev [Tue, 25 Jun 2013 22:19:18 +0000 (15:19 -0700)]
Renaming "nmv" to "mv".
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
Dmitry Kovalev [Tue, 25 Jun 2013 21:11:18 +0000 (14:11 -0700)]
Using get_plane_block_{width, height} instead of custom code.
Change-Id: I453ed11b965e857a14c18ea5c0f4a0a48e7dc0d9
Ronald S. Bultje [Tue, 25 Jun 2013 20:51:18 +0000 (13:51 -0700)]
Merge "Only do metrics on cropped (visible) area of picture."
Ronald S. Bultje [Tue, 25 Jun 2013 20:51:04 +0000 (13:51 -0700)]
Merge "Don't skip right/bottom border pixels in SSIM calculations."
Ronald S. Bultje [Tue, 25 Jun 2013 20:50:53 +0000 (13:50 -0700)]
Merge "Add averaging-SAD functions for 8-point comp-inter motion search."
James Zern [Tue, 25 Jun 2013 20:50:30 +0000 (13:50 -0700)]
make: add libvpx_test_srcs.txt target
same application as libvpx_srcs.txt
Change-Id: I1f096cc3c180d205365663c1aa5533b52561d811
Jingning Han [Tue, 25 Jun 2013 20:17:23 +0000 (13:17 -0700)]
Merge "Cosmetic changes in 4x4 fwd transform unit test"
Jingning Han [Tue, 25 Jun 2013 20:17:05 +0000 (13:17 -0700)]
Merge "Tune the rounding operations in 8x8 ADST/DCT sse2"
James Zern [Tue, 25 Jun 2013 19:57:49 +0000 (12:57 -0700)]
Merge "I420VideoSource: normalize framerate types"
Ronald S. Bultje [Mon, 10 Jun 2013 18:47:22 +0000 (11:47 -0700)]
Only do metrics on cropped (visible) area of picture.
The part where we align it by 8 or 16 is an implementation detail that
shouldn't matter to the outside world.
Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
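To illustrate the commit above, the sketch below accumulates a sum of squared errors over the visible width and height only, so any padding introduced by aligning the buffer never enters the metric. The function name `sse_visible` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared errors over the visible (cropped) region only. The
 * strides may be larger than visible_w because of alignment padding. */
static uint64_t sse_visible(const uint8_t *a, int a_stride,
                            const uint8_t *b, int b_stride,
                            int visible_w, int visible_h) {
  uint64_t sse = 0;
  int x, y;
  assert(visible_w <= a_stride && visible_w <= b_stride);
  for (y = 0; y < visible_h; ++y) {
    for (x = 0; x < visible_w; ++x) {
      const int d = a[x] - b[x];
      sse += (uint64_t)(d * d);
    }
    a += a_stride;
    b += b_stride;
  }
  return sse;
}
```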
Ronald S. Bultje [Mon, 10 Jun 2013 18:36:04 +0000 (11:36 -0700)]
Don't skip right/bottom border pixels in SSIM calculations.
Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1
Ronald S. Bultje [Tue, 25 Jun 2013 18:26:49 +0000 (11:26 -0700)]
Add averaging-SAD functions for 8-point comp-inter motion search.
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.8 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.
Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
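A plain-C sketch of an averaging SAD in the spirit of the commit above: the reference block is averaged with a second predictor, and the SAD is taken against that average. The name `sad_avg`, the rounding, and the assumption that the second predictor is stored contiguously (stride equal to the block width) are illustrative and may differ from the libvpx functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD between `src` and the rounded average of `ref` and `second_pred`.
 * `second_pred` is assumed to be a contiguous w*h block. */
static unsigned int sad_avg(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride,
                            const uint8_t *second_pred, int w, int h) {
  unsigned int sad = 0;
  int x, y;
  assert(w > 0 && h > 0);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += w;
  }
  return sad;
}
```

Folding the average into the SAD avoids building the compound predictor in a separate pass before each comparison, which is one plausible source of the speedup the commit reports.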
James Zern [Tue, 25 Jun 2013 19:56:40 +0000 (12:56 -0700)]
Merge "intrapred_test: add virtual dtor to IntraPredBase"
Jingning Han [Fri, 21 Jun 2013 22:56:24 +0000 (15:56 -0700)]
Tune the rounding operations in 8x8 ADST/DCT sse2
Improve the round-trip precision to meet the unit test settings.
Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79