Johann [Tue, 21 Sep 2010 18:56:42 +0000 (14:56 -0400)]
Fix typo
Also, move with other ppc32 options
Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
John Koleszar [Tue, 21 Sep 2010 14:02:43 +0000 (07:02 -0700)]
Merge "configure: support for ppc32-linux-gcc"
John Koleszar [Tue, 21 Sep 2010 12:36:46 +0000 (05:36 -0700)]
Merge "Add high limit check for unsigned parameters"
Yunqing Wang [Tue, 21 Sep 2010 12:00:30 +0000 (05:00 -0700)]
Merge "Restructure multi-threaded decoder"
Fritz Koenig [Mon, 20 Sep 2010 16:30:49 +0000 (09:30 -0700)]
Use movq instead of movdqu.
Movdqu is more expensive (throughput, uops) than movq. Minimal
impact for newer big cores, but ~2.25% gain on Atom.
Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
Fritz Koenig [Mon, 20 Sep 2010 18:01:51 +0000 (11:01 -0700)]
Merge "Better choice of instruction filter mask comparision."
Johann [Mon, 20 Sep 2010 17:47:33 +0000 (10:47 -0700)]
Merge "reorder data to use wider instructions"
Johann [Mon, 20 Sep 2010 17:47:22 +0000 (10:47 -0700)]
Merge "Update NEON wide idcts"
Fritz Koenig [Wed, 15 Sep 2010 21:07:32 +0000 (14:07 -0700)]
Better choice of instruction filter mask comparision.
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.
Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
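The equivalence behind this change can be checked one byte lane at a time. The following is a minimal portable C sketch, not the libvpx assembly: it emulates a single unsigned byte lane of psubusb and pmaxub, and the helper names (sat_sub_u8, max_u8, over_limit_sub_or, over_limit_max) are hypothetical. It assumes the mask tests several differences against one shared limit.

```c
#include <assert.h>
#include <stdint.h>

/* One byte lane of psubusb: unsigned subtract, saturating at 0. */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : 0);
}

/* One byte lane of pmaxub: unsigned maximum. */
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Older pattern: one saturating subtract per difference, OR-ed together. */
static int over_limit_sub_or(const uint8_t *diff, int n, uint8_t limit) {
  uint8_t acc = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) acc |= sat_sub_u8(diff[i], limit);
  return acc != 0;
}

/* Newer pattern: running maximum, then a single saturating subtract. */
static int over_limit_max(const uint8_t *diff, int n, uint8_t limit) {
  uint8_t m = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) m = max_u8(m, diff[i]);
  return sat_sub_u8(m, limit) != 0;
}
```

Both functions report "over the limit" exactly when the largest difference exceeds the limit, so the running maximum can stand in for the subtract/OR chain.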
Guillermo Ballester Valor [Fri, 11 Jun 2010 18:33:49 +0000 (14:33 -0400)]
Add high limit check for unsigned parameters
The patch for issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It was actually a trick to confuse the compiler
rather than a fix.
This patch fixes it by adding a new macro for cases that need only a
high limit check on an unsigned value.
Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
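A minimal sketch of the idea, assuming the warning in question is the usual "comparison is always true" diagnostic for testing an unsigned value against a lower bound of 0. The struct, the macro names (CHECK_RANGE, CHECK_HI) and the limits are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config struct; field names are illustrative only. */
struct enc_cfg {
  unsigned int threads;
  int speed;
};

/* Two-sided check: fine for signed members. On an unsigned member with
 * lo == 0, "memb >= 0" is always true and compilers may warn about it. */
#define CHECK_RANGE(p, memb, lo, hi) \
  ((p)->memb >= (lo) && (p)->memb <= (hi))

/* High-limit-only check, intended for unsigned members. */
#define CHECK_HI(p, memb, hi) ((p)->memb <= (hi))

/* Returns 0 when the configuration is valid, -1 otherwise.
 * The limits 64 and +/-16 are made up for this sketch. */
static int validate_cfg(const struct enc_cfg *cfg) {
  assert(cfg != NULL);
  if (!CHECK_HI(cfg, threads, 64u)) return -1;
  if (!CHECK_RANGE(cfg, speed, -16, 16)) return -1;
  return 0;
}
```

Keeping a separate macro avoids casts or dummy lower bounds whose only purpose is to quiet the compiler.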
Johann [Thu, 9 Sep 2010 19:55:19 +0000 (15:55 -0400)]
reorder data to use wider instructions
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
downshift as well as workarounds for a function which only accepted
signed data
looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
Yunqing Wang [Thu, 16 Sep 2010 18:08:52 +0000 (14:08 -0400)]
Restructure multi-threaded decoder
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.
The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.
Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.
Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
John Koleszar [Thu, 16 Sep 2010 17:13:31 +0000 (13:13 -0400)]
cleanup: remove unused xprintf
These files aren't currently used, and we can get them back if we
need them.
Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
John Koleszar [Thu, 16 Sep 2010 14:00:04 +0000 (10:00 -0400)]
Reduce size of tokenizer tables
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.
Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
Fritz Koenig [Tue, 14 Sep 2010 22:46:37 +0000 (15:46 -0700)]
Modify GET_GOT macro for performance.
GET_GOT was producing a zero length call. This resulted in
pipeline flushes occurring when returning from the assembly
functions. Masked on out of order cores, but evident on
Atom cores.
Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
Fritz Koenig [Tue, 14 Sep 2010 01:34:34 +0000 (18:34 -0700)]
Removed unnecessary pxor.
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.
Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
Fritz Koenig [Mon, 13 Sep 2010 18:04:22 +0000 (11:04 -0700)]
Merge "Make block access to frame buffer sequential"
John Koleszar [Mon, 13 Sep 2010 13:04:55 +0000 (09:04 -0400)]
configure: support for ppc32-linux-gcc
Fixes issue 89. Thanks to josejx for the patch.
Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
John Koleszar [Mon, 13 Sep 2010 13:00:24 +0000 (09:00 -0400)]
cosmetics: expand tabs in configure
Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
Fritz Koenig [Fri, 10 Sep 2010 23:27:28 +0000 (16:27 -0700)]
Make block access to frame buffer sequential
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.
Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
Scott LaVarnway [Thu, 9 Sep 2010 18:51:29 +0000 (11:51 -0700)]
Merge "Improved subset block search"
Scott LaVarnway [Thu, 9 Sep 2010 18:42:48 +0000 (14:42 -0400)]
Improved subset block search
Improved the subset block search and fill. (about 3% improvement for
32 bit) Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.
Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
Johann [Tue, 7 Sep 2010 18:21:27 +0000 (14:21 -0400)]
Update NEON wide idcts
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost
Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
John Koleszar [Thu, 9 Sep 2010 16:57:23 +0000 (12:57 -0400)]
Fix GF interval for non-lagged ARFs
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.
Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
Fritz Koenig [Thu, 9 Sep 2010 15:54:21 +0000 (08:54 -0700)]
Merge branch 'master' of git://review.webmproject.org/libvpx
John Koleszar [Thu, 9 Sep 2010 12:16:39 +0000 (08:16 -0400)]
Use WebM in copyright notice for consistency
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
Jim Bankoski [Thu, 22 Jul 2010 20:07:13 +0000 (16:07 -0400)]
Skip unnecessary search of identical frames
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.
Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
Jim Bankoski [Thu, 22 Jul 2010 20:07:13 +0000 (16:07 -0400)]
Enable ARFs for non-lagged compress
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.
Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
Fritz Koenig [Tue, 7 Sep 2010 17:52:54 +0000 (10:52 -0700)]
Bilinear subpixel optimizations for ssse3.
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.
Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
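For readers unfamiliar with pmaddubsw, here is a scalar C sketch of what one 16-bit lane computes and how that maps to a two-tap filter. It is not the libvpx assembly. The function names are hypothetical, and the 7-bit filter scale (taps summing to 128, rounding by 64) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of pmaddubsw: two unsigned bytes times two signed bytes,
 * with the two products added using signed 16-bit saturation. */
static int16_t maddubs_lane(uint8_t p0, int8_t t0, uint8_t p1, int8_t t1) {
  int32_t sum = (int32_t)p0 * t0 + (int32_t)p1 * t1;
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return (int16_t)sum;
}

/* Two-tap bilinear filter on one pixel pair.
 * Assumes non-negative taps that sum to 128, so the sum cannot saturate. */
static uint8_t bilinear_2tap(uint8_t p0, uint8_t p1, int8_t t0, int8_t t1) {
  assert(t0 >= 0 && t1 >= 0 && t0 + t1 == 128);
  int32_t acc = maddubs_lane(p0, t0, p1, t1);
  return (uint8_t)((acc + 64) >> 7); /* round, then drop the 7-bit scale */
}
```

Because the taps are signed bytes, a tap of 128 is not representable; a (128, 0) filter would need a separate copy path in this model.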
Scott LaVarnway [Thu, 2 Sep 2010 20:17:52 +0000 (16:17 -0400)]
Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK. Also reduced the size of other members
of the MB_MODE_INFO struct. For 1080p, the memory was reduced
by 1,209,516 bytes. The decoder performance appeared to improve
by 3% for the clip used.
Note: The main goal for this change is to improve the decoder
performance. The encoder will be revisited at a later date for
further structure cleanup.
Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
John Koleszar [Thu, 2 Sep 2010 18:56:47 +0000 (14:56 -0400)]
Update CHANGELOG for v0.9.2 release
Change-Id: I184e927987544e9f34f890249b589ea13a93a330
John Koleszar [Thu, 2 Sep 2010 17:40:43 +0000 (13:40 -0400)]
Update AUTHORS
Change-Id: I0395ffa107651a773fd11d12682ab9372f76a90b
John Koleszar [Thu, 2 Sep 2010 17:33:01 +0000 (13:33 -0400)]
Whitespace: nuke CRLFs
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
John Koleszar [Thu, 2 Sep 2010 16:03:51 +0000 (12:03 -0400)]
Use native win32 timers on mingw
Changed to use QueryPerformanceCounter on Windows rather than only
when building with MSVC, so that MSVC can link libs built with
MinGW.
Fixes issue #149.
Change-Id: Ie2dc7edc8f4d096cf95ec5ffb1ab00f2d67b3e7d
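A sketch of the selection logic, under the assumption that the switch is made on the _WIN32 macro (defined by both MSVC and MinGW) instead of a compiler-specific macro. The type and function names (tick_t, tick_now, tick_diff_usec) are hypothetical and not the libvpx timer API.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_WIN32)
/* Taken by both MSVC and MinGW builds. */
#include <windows.h>
typedef LARGE_INTEGER tick_t;

static void tick_now(tick_t *t) {
  assert(t != NULL);
  QueryPerformanceCounter(t);
}

static double tick_diff_usec(const tick_t *a, const tick_t *b) {
  LARGE_INTEGER freq;
  QueryPerformanceFrequency(&freq);
  return (double)(b->QuadPart - a->QuadPart) * 1e6 / (double)freq.QuadPart;
}
#else
/* POSIX fallback. */
#include <sys/time.h>
typedef struct timeval tick_t;

static void tick_now(tick_t *t) {
  assert(t != NULL);
  gettimeofday(t, NULL);
}

static double tick_diff_usec(const tick_t *a, const tick_t *b) {
  return (double)(b->tv_sec - a->tv_sec) * 1e6 +
         (double)(b->tv_usec - a->tv_usec);
}
#endif
```

When every Windows toolchain takes the same branch, objects built with MinGW and linked by MSVC agree on which timer implementation they reference.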
John Koleszar [Thu, 2 Sep 2010 15:51:45 +0000 (11:51 -0400)]
Fix target detection on mingw32
gcc -dumpmachine returns only 'mingw32'
Change-Id: I774d05a97c5131fc12009e436712c319e54490a5
John Koleszar [Thu, 2 Sep 2010 15:17:05 +0000 (11:17 -0400)]
Use -fno-common for mingw
Fixes http://code.google.com/p/webm/issues/detail?id=112
Thanks to Ramiro Polla for the issue/fix.
Change-Id: I7f7b547a4ea3270e183f59280510066cc29a619e
James Zern [Fri, 20 Aug 2010 20:06:56 +0000 (16:06 -0400)]
encoder: remove postproc dependency
Remove the dependency on postproc.c for the encoder in general. The only
unchecked need for it is when CONFIG_PSNR is enabled; all other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.
Additionally, using VP8_SET_POSTPROC with the encoder when post
processing has been disabled now returns an error.
This addresses issue #153.
Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
John Koleszar [Thu, 2 Sep 2010 15:42:29 +0000 (08:42 -0700)]
Merge "added separate rounding/zbin constants for 2nd order"
John Koleszar [Thu, 2 Sep 2010 15:41:46 +0000 (08:41 -0700)]
Merge "Disable frame dropping by default"
Yaowu Xu [Mon, 16 Aug 2010 23:16:24 +0000 (16:16 -0700)]
added separate rounding/zbin constants for 2nd order
This allows experiments with different rounding and
zerobin constants for 2nd order blocks.
Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
John Koleszar [Thu, 2 Sep 2010 13:32:03 +0000 (09:32 -0400)]
Disable frame dropping by default
This is not the behavior that most users expect.
Change-Id: I226126ea400c22cf1f7918e80ea7fe0771c569cb
Frank Galligan [Wed, 1 Sep 2010 20:40:18 +0000 (16:40 -0400)]
Fix rare deadlock before loop filter
There was an extremely rare deadlock that happened when one thread
was waiting to start the loop filter on frame n while the other
threads were starting to work on frame n+1.
Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
Paul Wilkins [Wed, 1 Sep 2010 09:45:12 +0000 (02:45 -0700)]
Merge "Improved Force Key Frame Behaviour"
Yunqing Wang [Mon, 30 Aug 2010 22:16:04 +0000 (18:16 -0400)]
Replace sleep(0) calls in multi-threaded decoder
This is a workaround for gLucid problem.
Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
Paul Wilkins [Fri, 20 Aug 2010 11:27:26 +0000 (12:27 +0100)]
Improved Force Key Frame Behaviour
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.
The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.
Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
Johann [Thu, 26 Aug 2010 20:11:30 +0000 (16:11 -0400)]
followup arm patch
make the arm asm detokenizer work with the new structures
Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
Scott LaVarnway [Tue, 31 Aug 2010 14:49:57 +0000 (10:49 -0400)]
Changed above and left context data layout
The main reason for the change was to reduce cycles in the token
decoder. (~1.5% gain for 32 bit) This layout should be more
cache friendly.
As a result of this change, the encoder had to be updated.
Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837
Note: dixie uses a similar layout
John Koleszar [Mon, 30 Aug 2010 19:40:42 +0000 (12:40 -0700)]
Merge "Fix harmless off-by-1 error."
John Koleszar [Mon, 30 Aug 2010 19:40:37 +0000 (12:40 -0700)]
Merge "Fix two-pass framrate for Y4M input."
John Koleszar [Mon, 30 Aug 2010 14:49:35 +0000 (07:49 -0700)]
Merge "increase rate control buffer level precision"
Timothy B. Terriberry [Wed, 5 May 2010 23:14:36 +0000 (19:14 -0400)]
Fix harmless off-by-1 error.
The memory being zeroed in vp8_update_mode_info_border() was just
allocated with calloc, and so the entire function is actually
redundant, but it should be made correct in case someone expects
it to actually work in the future.
Change-Id: If7a84e489157ab34ab77ec6e2fe034fb71cf8c79
Timothy B. Terriberry [Fri, 27 Aug 2010 22:21:22 +0000 (15:21 -0700)]
Fix two-pass framrate for Y4M input.
The timebase was being set to the value in the Y4M file on each
pass, but only doubled to account for the altref placement on
the first pass.
This avoids resetting it on the second pass.
Change-Id: Ie342639bad1ffe9c2214fbbaaded72cfed835b42
Fritz Koenig [Wed, 25 Aug 2010 18:39:59 +0000 (11:39 -0700)]
Merge "Allow --cpu= to work for x86."
Fritz Koenig [Tue, 24 Aug 2010 23:27:49 +0000 (16:27 -0700)]
Allow --cpu= to work for x86.
--cpu was already implemented for most of our embedded
platforms, this just extends it to x86. Corner case for
Atom processor as it doesn't respond to the --march=
option under icc.
Change-Id: I2d57a7a6e9d0b55c0059e9bc46cfc9bf9468c185
Johann [Tue, 24 Aug 2010 22:23:16 +0000 (18:23 -0400)]
clean up compiler warnings
did a test compile with clang and got rid of some warnings that have
been annoying me for a while:
vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer':
vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type
vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type
Change-Id: I78ddab176fe47cbeed30379709dc7bab01c0c2e4
Johann [Mon, 23 Aug 2010 17:35:26 +0000 (13:35 -0400)]
update structures
mbmi and eob moved in previous commits
Change-Id: I30a2eba36addf89ee50b406ad4afdd059a832711
Fritz Koenig [Fri, 20 Aug 2010 17:58:19 +0000 (10:58 -0700)]
Rework idct calling structure.
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data. This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.
SSE2 implementation gives 3% gain on Atom.
Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
John Koleszar [Fri, 20 Aug 2010 15:04:10 +0000 (11:04 -0400)]
increase rate control buffer level precision
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.
Change-Id: If8dd2a87614c05747f81432cbe75dd9e6ed2f04e
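The effect of the truncation is easy to see with plain integer arithmetic. This is a sketch only: the function names are hypothetical, and the real conversion in the encoder may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating variant: the level is reduced to whole seconds first,
 * so anything below one second is lost. */
static int64_t level_bits_truncated(int64_t level_ms, int64_t bits_per_sec) {
  assert(level_ms >= 0 && bits_per_sec >= 0);
  return (level_ms / 1000) * bits_per_sec;
}

/* Full-precision variant: multiply first, divide last. */
static int64_t level_bits_full(int64_t level_ms, int64_t bits_per_sec) {
  assert(level_ms >= 0 && bits_per_sec >= 0);
  return level_ms * bits_per_sec / 1000;
}
```

For example, a 500 ms buffer level at 256000 bits per second becomes 0 bits in the truncating variant and 128000 bits in the full-precision one.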
Jim Bankoski [Thu, 19 Aug 2010 19:50:29 +0000 (15:50 -0400)]
Revert "Removed ssse3 sixtap code"
This reverts commit 6ea5bb85cd1547b846f4c794e8684de5abcf9f62.
Johann [Thu, 19 Aug 2010 17:37:40 +0000 (13:37 -0400)]
cleanup simple loop filter
move some things around, reorder some instructions
constant 0 is used several times. load it once per call in horiz,
once per loop in vert.
separate saturating instructions to avoid stalls.
just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0
document some stalls for further consideration
Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
Johann [Thu, 19 Aug 2010 15:31:57 +0000 (08:31 -0700)]
Merge "fix armv6 simpleloop filter"
Johann [Thu, 19 Aug 2010 15:29:21 +0000 (11:29 -0400)]
fix armv6 simpleloop filter
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.
Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
Scott LaVarnway [Wed, 18 Aug 2010 19:29:38 +0000 (15:29 -0400)]
Removed ssse3 sixtap code
Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4
John Koleszar [Mon, 16 Aug 2010 14:54:48 +0000 (07:54 -0700)]
Merge "store more vars than we removed"
Johann [Mon, 16 Aug 2010 14:32:15 +0000 (10:32 -0400)]
store more vars than we removed
only saved r4-11+lr, but were storing r4-r12+lr
Change-Id: If77df1998af50e9badee7d99ef53543046434675
John Koleszar [Mon, 16 Aug 2010 13:34:30 +0000 (09:34 -0400)]
arm: fix missing dependency with --enable-shared
The C version of the dequant/idct/add function depends on the C
version of the IDCT, but this isn't compiled in on ARM. Since this
code has asm version, we can just remove this file to eliminate the
link error.
Change-Id: I21de74d89d3765a1db2da27292b20727c53178e9
John Koleszar [Fri, 13 Aug 2010 18:50:51 +0000 (14:50 -0400)]
move segmentation_common to encoder
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.
Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
Johann [Thu, 12 Aug 2010 13:05:37 +0000 (09:05 -0400)]
framework for assembly version of the detokenizer
adds a compile time option: --enable-arm-asm-detok which pulls in
vp8/decoder/arm/detokenize.asm
currently about break even speed wise, but changes are pending to
the fill code (branch and load 3 bytes versus conditionally always
load one) and the error handling. Currently it doesn't handle zero
runs or overrunning the buffer.
this is really just so i don't have to rebase my changes all the
time to run benchmarks - now just need to replace one file!
Change-Id: I56d0e2354dc0ca3811bffd0e88fe1f952fa6c797
Johann [Thu, 12 Aug 2010 17:27:07 +0000 (13:27 -0400)]
update structure
mode_info_context->mbmi no longer gets copied up a level
Change-Id: Icd2d27d381909721326c34594a1ccdc26d48a995
Johann [Thu, 12 Aug 2010 17:06:47 +0000 (13:06 -0400)]
remove unused definition
asm_offsets contains some definitions which are no longer used. this
was one of them. v6 build works now
Change-Id: If370cfa8acd145de4fead2d9a11b048fccc090df
Scott LaVarnway [Thu, 12 Aug 2010 20:25:43 +0000 (16:25 -0400)]
Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
The temp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result,
a large number of compile errors had to be fixed.
Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
Scott LaVarnway [Wed, 11 Aug 2010 19:23:24 +0000 (12:23 -0700)]
Merge "Finished vp8_sixtap_predict4x4_ssse3 function"
John Koleszar [Mon, 9 Aug 2010 17:48:04 +0000 (13:48 -0400)]
cosmetics: add missing 2D array braces
Silences compile warning.
Change-Id: I4b207d97f8570fe29aa2710e4ce4f02e7e43b57a
John Koleszar [Mon, 9 Aug 2010 17:27:26 +0000 (13:27 -0400)]
avoid negative array subscript warnings
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.
Change-Id: I3699c3f84111209ecfb91097c4b900773e9a3ad5
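One way to express a table whose first valid index is not zero, shown as a sketch. The enum values and table contents are made up for illustration, and this is not necessarily how the patch itself resolves the warning.

```c
#include <assert.h>

/* Hypothetical mode enum; the numeric values are illustrative. */
enum pred_mode { DC_PRED, V_PRED, H_PRED, NEARESTMV, NEARMV, ZEROMV, NEWMV };

#define MV_REF_COUNT (NEWMV - NEARESTMV + 1)

/* Zero-based table with one entry per motion-vector reference mode.
 * The values are placeholders. */
static const int mv_ref_bits[MV_REF_COUNT] = { 1, 2, 3, 4 };

/* Warning-prone form: a pointer pre-biased below the array start, e.g.
 *   const int *biased = mv_ref_bits - NEARESTMV;  ... biased[mode] ...
 * Explicit form: subtract the base mode at the point of use. */
static int mv_ref_cost(enum pred_mode mode) {
  assert(mode >= NEARESTMV && mode <= NEWMV);
  return mv_ref_bits[mode - NEARESTMV];
}
```

Subtracting the base at the point of use keeps every subscript non-negative and makes the valid range easy to assert.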
Scott LaVarnway [Wed, 11 Aug 2010 17:49:00 +0000 (13:49 -0400)]
Finished vp8_sixtap_predict4x4_ssse3 function
Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3
assembly routines. Also removed unused assembly.
Change-Id: I01c1021835f2edda9da706822345f217087ca0d0
Johann [Wed, 11 Aug 2010 17:36:35 +0000 (13:36 -0400)]
rename DETOK_[AL]
everything else uses lowercase detok
Change-Id: I9671e2e90eb2961208dfa81c00b3accb5749ec04
Scott LaVarnway [Wed, 11 Aug 2010 15:02:31 +0000 (11:02 -0400)]
Moved gf_active code to encoder only
The gf_active code is only used by the encoder, so it was moved from
common and decoder.
Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
Yaowu Xu [Wed, 11 Aug 2010 04:45:34 +0000 (21:45 -0700)]
Removed duplicate functions
Change-Id: Ie587972ccefd3c762b8cdf8ef39345cd22924b9b
Yaowu Xu [Wed, 11 Aug 2010 04:12:04 +0000 (21:12 -0700)]
Normalize quantizer's zero bin and rounding factors
This patch changes a few numbers in the two constant arrays
for quantizer's zerobin and rounding factors, in general to
make the sum of the two factors for any Q to be 128. While
it might be beneficial to calibrate the two arrays for best
quantizer performance, it is not the purpose of this patch.
Normalizing the two arrays will enable quick optimization
of the current faster quantizer, i.e. the zerobin check can be
removed.
Change-Id: If9abfd7929bf4b8e9ecd64a79d817c6728c820bd
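Why a sum of 128 makes the zero bin check redundant can be illustrated with a simplified scalar model. The sketch assumes both factors are fractions of the quantizer step in units of 1/128 and uses plain integer division. The real quantizer uses lookup tables and multiply-shift arithmetic, and the function names here are hypothetical.

```c
#include <assert.h>

/* Simplified quantizer with an explicit zero-bin test.
 * x is the coefficient magnitude, q the quantizer step. */
static int quantize_with_zbin(int x, int q, int zbin_factor, int round_factor) {
  assert(x >= 0 && q > 0);
  int zbin = q * zbin_factor / 128;
  int round = q * round_factor / 128;
  if (x < zbin) return 0; /* inside the zero bin */
  return (x + round) / q;
}

/* Same quantizer without the zero-bin test. */
static int quantize_no_zbin(int x, int q, int round_factor) {
  assert(x >= 0 && q > 0);
  int round = q * round_factor / 128;
  return (x + round) / q;
}
```

When zbin_factor + round_factor == 128, zbin + round is at most q, so any x below zbin gives x + round < q and the division already yields 0. In this model the two functions then return the same value for every x.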
Timothy B. Terriberry [Fri, 2 Jul 2010 21:35:53 +0000 (14:35 -0700)]
Add trellis quantization.
Replace the exponential search for optimal rounding during
quantization with a linear Viterbi trellis and enable it
by default when using --best.
Right now this operates on top of the output of the adaptive
zero-bin quantizer in vp8_regular_quantize_b() and gives a small
gain.
It can be tested as a replacement for that quantizer by
enabling the call to vp8_strict_quantize_b(), which uses
normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
in order to take advantage of activity masking, since there is
limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
quantizer (which is lambda-dependent) loses to
vp8_regular_quantize_b() alone (which is not) on my test clip.
Patch Set 3:
Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.
Patch Set 2:
Overall, the goal of this patch set is to make the "trellis"
search produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disables it altogether
for 2nd order blocks.
b. Patch set 1 was not consistent on when to enable
the quantization optimization. Patch set 2 restores the
condition to be consistent.
c. Patch set 1 checked quantized levels L-1 and L for any
input coefficient that was quantized to L. Patch set 2 limits
the candidate coefficients to those that were rounded up
to L. It is worth noting here that a strategy of checking
L and L+1 for coefficients that were truncated down to L
might work.
(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on the cif set; without
a and b, trellis quant can hurt the psnr by 0.2 to 0.3 dB at
200kbps for some cif clips)
(c gets trellis quant to match the exponential search
at Q0 encoding; without c, trellis quant can be
1.5 to 2 dB lower for encodings with fixed Q at 0 on most
derf cif clips)
Change-Id: Ib1a043b665d75fbf00cb0257b7c18e90eebab95e
Scott LaVarnway [Tue, 10 Aug 2010 21:06:05 +0000 (17:06 -0400)]
Added ssse3 version of sixtap filters
Improved decoder performance by 9% for the clip used.
Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9
Yunqing Wang [Thu, 29 Jul 2010 20:24:26 +0000 (16:24 -0400)]
First modification of multi-thread decoder
This is the first modification of the VP8 multi-thread decoder, which uses
the same threads to decode macroblocks and then do loopfiltering for each
frame.
Inspired by Rob Clark, synchronization was done on every 8 macroblocks
instead of every macroblock to reduce lock contention.
Compared with the original code, this implementation gave about a 15%-20%
performance gain while decoding my test clips on a Core2 Quad
platform (Linux).
The work is not done yet.
Tests on other platforms are needed.
Change-Id: Ice9ddb0b511af1359b9f71e65066143c04fef3b5
John Koleszar [Mon, 9 Aug 2010 13:33:00 +0000 (09:33 -0400)]
Mark loopfilter C functions as static
Clang defaults to C99 mode, and inline works differently in C99.
(gcc, on the other hand, defaults to a special gnu-style inlining,
which uses different syntax.) Making the functions static makes sure
clang doesn't decide to discard a function because it's too large to
inline.
Thanks to eli.friedman for the patch.
Fixes http://code.google.com/p/webm/issues/detail?id=114
Change-Id: If3c1c3c176eb855a584a60007237283b0cc631a4
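A small sketch of the linkage point, using hypothetical helpers rather than the actual loopfilter functions. Whether the patched functions keep the inline keyword alongside static is not stated above; static inline is shown here as one common spelling.

```c
#include <assert.h>

/* A plain C99 "inline" definition (no static, no extern) emits no external
 * symbol, so a call the compiler chooses not to inline can fail at link
 * time. Adding "static" gives the function internal linkage, so this
 * translation unit always has a definition to call. */
static inline signed char clamp_s8(int v) {
  if (v < -128) return -128;
  if (v > 127) return 127;
  return (signed char)v;
}

/* Clamp a row of intermediate values to the signed 8-bit range. */
static void clamp_row(const int *in, signed char *out, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) out[i] = clamp_s8(in[i]);
}
```

With internal linkage the result no longer depends on the compiler's inlining decisions or on which inline semantics (C99 or gnu-style) are in effect.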
John Koleszar [Mon, 2 Aug 2010 16:35:05 +0000 (09:35 -0700)]
Merge "Issue 150: Fixing linker warning in extend.c."
John Koleszar [Mon, 2 Aug 2010 14:21:52 +0000 (10:21 -0400)]
configure: support directories containing .o
Fixes http://code.google.com/p/webm/issues/detail?id=96
The regex which postprocesses the gcc make-deps (-M) output was too
greedy and matched in the dependencies part of the rule rather than
the target only. The patch provided with the issue was not correct, as
it tried to match the .o at the end of the line, which isn't correct
at least for my GCC version. This patch matches word characters
instead of .*
Thanks to raimue and the MacPorts community for isolating this issue.
Change-Id: I28510da2252e03db910c017101d9db12e5945a27
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: avoid space before the :data symbol type.
global label:data
^^
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: end labels with colon (':')
Labels should end with a colon (':'); nasm requires it.
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: use OWORD vs DQWORD
nasm knows only OWORD. yasm knows both OWORD and DQWORD.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I62151390089e90df9a7667822fa594ac20b00e78
John Koleszar [Mon, 2 Aug 2010 13:16:26 +0000 (06:16 -0700)]
Merge "Replace pinsrw (SSE) with MMX instructions"
Philip Jägenstedt [Tue, 13 Jul 2010 09:43:51 +0000 (11:43 +0200)]
Replace pinsrw (SSE) with MMX instructions
Fixes http://code.google.com/p/webm/issues/detail?id=136
Change-Id: I5a3e294061644a1a9718e8ba4a39548ede25cc42
John Koleszar [Thu, 29 Jul 2010 21:04:39 +0000 (17:04 -0400)]
apple: include proper mach primitives
Fixes implicit declaration warning for 'mach_task_self'.
Patch courtesy of timeless at gmail.com
Change-Id: I9991dedd1ccfddc092eca86705ecbc3b764b799d
Yaowu Xu [Thu, 29 Jul 2010 14:17:40 +0000 (07:17 -0700)]
Merge "Enable the switch between two versions of quantizer"
Frank Galligan [Wed, 28 Jul 2010 21:25:09 +0000 (17:25 -0400)]
Removed two unused global variables.
Removed the global variables vp8_an and vp8_cd. vp8_an was causing problems
because it was increasing the .bss by 1572864 bytes.
Change-Id: I6c12e294133c7fb6e770c0e4536d8287a5720a87
Yaowu Xu [Wed, 28 Jul 2010 17:44:17 +0000 (10:44 -0700)]
Enable the switch between two versions of quantizer
To facilitate more testing related to the quantizer and rate
control, the old version of the quantizer is added back. The old and
new quantizers can be switched back and forth by defining or
undefining the macro "EXACT_QUANT".
Change-Id: Ia77e687622421550f10e9d65a9884128a79a65ff
John Koleszar [Tue, 22 Jun 2010 13:53:23 +0000 (09:53 -0400)]
configure: pass original arguments through to make dist
When running configure automatically through the make dist target,
reuse the arguments passed to the original configure command.
Change-Id: I40e5b8384d6485a565b91e6d2356d5bc9c4c5928
John Koleszar [Tue, 27 Jul 2010 18:21:42 +0000 (11:21 -0700)]
Merge "msvs: fix install of codec sources"
Johann [Tue, 27 Jul 2010 16:10:48 +0000 (12:10 -0400)]
x86/sse2: disable asm quantizer
follow up to Change I0e51492d: neon: disable asm quantizer
Now x86 doesn't segfault with --disable-runtime-cpu-detect and -p=2
Change-Id: I8ca127bb299198efebbcbd5a661e81788361933f
Johann [Tue, 27 Jul 2010 15:56:19 +0000 (11:56 -0400)]
Fix build w/o RTCD
So many places to update ...
Change-Id: Ide957b40cc833f99c2d1849acade6850fbf7585d
John Koleszar [Tue, 27 Jul 2010 15:12:21 +0000 (11:12 -0400)]
neon: disable asm quantizer
The assembly version of the quantizer has not been updated to match
the new exact quantizer introduced in commit e04e2935. That commit tried
to disable this code but missed the non-RTCD case.
Thanks to David Baker <david.baker at openmarket.com> for isolating the
issue and testing this fix.
Change-Id: I0e51492dc6f8e44d2c10b587427448bf94135c65
Fritz Koenig [Mon, 26 Jul 2010 13:05:39 +0000 (06:05 -0700)]
Merge "update arm idct functions"