Scott LaVarnway [Mon, 1 Nov 2010 20:24:15 +0000 (16:24 -0400)]
SSSE3 version of fast quantizer
(test clip: tulip)
For good quality mode with speed=1, this gave the encoder
a small (2 - 3%) performance boost.
Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
Scott LaVarnway [Wed, 27 Oct 2010 18:38:33 +0000 (14:38 -0400)]
Finding first label
Using tables for the label count and label offset.
Change-Id: Iac3d5b292c37341a881be0af282f5cac3b3e01eb
Yunqing Wang [Thu, 28 Oct 2010 20:59:03 +0000 (16:59 -0400)]
Save XMM registers in asm functions
XMM6/7 are used in these functions, and need to be saved.
Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
Yunqing Wang [Thu, 28 Oct 2010 20:46:35 +0000 (13:46 -0700)]
Merge "Fix full-search SAD function crash in Visual Studio"
John Koleszar [Thu, 28 Oct 2010 20:01:03 +0000 (16:01 -0400)]
Merge branch 'aylesbury'
Yunqing Wang [Thu, 28 Oct 2010 19:26:58 +0000 (15:26 -0400)]
Fix full-search SAD function crash in Visual Studio
Unlike GCC, the Visual Studio compiler doesn't 16-byte align the SAD output
array, which causes a crash in Visual Studio builds.
Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
John Koleszar [Thu, 28 Oct 2010 13:14:14 +0000 (09:14 -0400)]
CHANGELOG: correct date
Change-Id: I146a7f241efad4f0684cf8613c7fa42bd5cf42f3
John Koleszar [Wed, 27 Oct 2010 20:27:56 +0000 (16:27 -0400)]
Update CHANGELOG for v0.9.5 (Aylesbury) release
Change-Id: Ic9f05dbbe90480d5b172233c87eaf1d4e2f1b48e
Timothy B. Terriberry [Wed, 27 Oct 2010 23:04:02 +0000 (16:04 -0700)]
Eliminate more warnings.
This eliminates a large set of warnings exposed by the Mozilla build
system (Use of C++ comments in ISO C90 source, commas at the end of
enum lists, a couple incomplete initializers, and signed/unsigned
comparisons).
It also eliminates many (but not all) of the warnings exposed by newer
GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
without checking the return values).
There are a few spurious warnings left on my system:
../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
change between the two if blocks that test it here.
../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum,
and the C standard does not mandate that enums be unsigned, so the
checks can't be removed.
Change-Id: Iead6cd561a2afaa3d801fd63f1d8d58953da7426
Fritz Koenig [Wed, 27 Oct 2010 19:50:16 +0000 (12:50 -0700)]
postproc: Tweaks to line drawing and blending.
Turned down the blending level to make colored blocks obscure
the video less.
Not blending the entire block to give distinction to macro
block edges.
Added configuration so that macro block blending function can
be optimized.
Change to constrain line as to when dx and dy are computed.
Now draw two lines to form an arrow.
Change-Id: I986784e6abff65ea3e0d1437dfca7d06d44ede71
Frank Galligan [Wed, 27 Oct 2010 15:28:56 +0000 (11:28 -0400)]
Output the PSNR for the entire file.
If the --psnr option is enabled, vpxenc will output PSNR values for the
entire file. Added a \n before the final output to make sure the output
is on its own line. The Overall and Avg PSNR values match the values
written to the opsnr.stt file.
Change-Id: Ibac5fa9baf8d5a626ea0d6ba161b484e6e8427ee
Timothy B. Terriberry [Wed, 27 Oct 2010 23:04:02 +0000 (16:04 -0700)]
Eliminate more warnings.
This eliminates a large set of warnings exposed by the Mozilla build
system (Use of C++ comments in ISO C90 source, commas at the end of
enum lists, a couple incomplete initializers, and signed/unsigned
comparisons).
It also eliminates many (but not all) of the warnings exposed by newer
GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
without checking the return values).
There are a few spurious warnings left on my system:
../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
change between the two if blocks that test it here.
../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
standard does not mandate that enums be unsigned, so the checks can't be
removed.
Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
Fritz Koenig [Wed, 27 Oct 2010 20:20:56 +0000 (13:20 -0700)]
Merge "postproc: Tweaks to line drawing and blending."
Fritz Koenig [Wed, 27 Oct 2010 19:50:16 +0000 (12:50 -0700)]
postproc: Tweaks to line drawing and blending.
Turned down the blending level to make colored blocks obscure
the video less.
Not blending the entire block to give distinction to macro
block edges.
Added configuration so that macro block blending function can
be optimized.
Change to constrain line as to when dx and dy are computed.
Now draw two lines to form an arrow.
Change-Id: Id3ef0fdeeab2949a6664b2c63e2a3e1a89503f6c
John Koleszar [Wed, 27 Oct 2010 19:06:23 +0000 (12:06 -0700)]
Merge "Output the PSNR for the entire file."
Frank Galligan [Wed, 27 Oct 2010 15:28:56 +0000 (11:28 -0400)]
Output the PSNR for the entire file.
If the --psnr option is enabled, vpxenc will output PSNR values for the
entire file. Added a \n before the final output to make sure the output
is on its own line. The Overall and Avg PSNR values match the values
written to the opsnr.stt file.
Change-Id: I869268b704fe8b0c8389d318cceb6072fea102f8
Yunqing Wang [Wed, 27 Oct 2010 12:45:24 +0000 (08:45 -0400)]
Full search SAD function optimization in SSE4.1
Use mpsadbw, and calculate 8 SADs at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
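The "x8" functions return eight SADs, one for each of eight consecutive horizontal reference offsets, which is the pattern mpsadbw accelerates. A plain-C reference like the sketch below is one way to check such SIMD output; the function name and signature are hypothetical, not the libvpx prototypes.

```c
#include <stdlib.h>

/* Hypothetical scalar reference for an "x8" SAD: the SAD between a w x h
 * source block and the reference block at 8 consecutive horizontal
 * offsets (ref + 0 .. ref + 7). Reference rows must be at least w + 7
 * bytes wide. */
static void sad_wxh_x8_ref(const unsigned char *src, int src_stride,
                           const unsigned char *ref, int ref_stride,
                           int w, int h, unsigned int sad_array[8]) {
  for (int i = 0; i < 8; i++) {
    unsigned int sad = 0;
    for (int r = 0; r < h; r++)
      for (int c = 0; c < w; c++)
        sad += (unsigned int)abs(src[r * src_stride + c] -
                                 ref[r * ref_stride + c + i]);
    sad_array[i] = sad;
  }
}
```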
John Koleszar [Wed, 27 Oct 2010 15:28:43 +0000 (11:28 -0400)]
Fix half-pixel variance RTCD functions
This patch fixes the system dependent entries for the half-pixel
variance functions in both the RTCD and non-RTCD cases:
- The generic C versions of these functions are now correct.
Before all three cases called the hv code.
- Wire up the ARM functions in RTCD mode
- Created stubs for x86 to call the optimized subpixel functions
with the correct parameters, rather than falling back to C
code.
Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
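The three half-pixel cases differ only in which neighbors are averaged. The sketch below, with a hypothetical name and signature, models them as an equal-tap bilinear filter applied horizontally and then vertically with rounding at each pass, followed by the usual variance = SSE - sum^2 / N. It illustrates why a single "hv" path gives wrong results for the h-only and v-only cases.

```c
/* Hypothetical sketch of the three half-pixel variance cases:
 *   (dx, dy) = (1, 0) horizontal, (0, 1) vertical, (1, 1) both.
 * The reference must provide one extra column and/or row. */
static unsigned int halfpix_variance_ref(const unsigned char *src, int src_stride,
                                         const unsigned char *ref, int ref_stride,
                                         int w, int h, int dx, int dy,
                                         unsigned int *sse) {
  int sum = 0;
  unsigned int sq = 0;
  for (int r = 0; r < h; r++) {
    for (int c = 0; c < w; c++) {
      const unsigned char *p = ref + r * ref_stride + c;
      /* First pass: horizontal half-pel average, or a plain copy. */
      int top = dx ? (p[0] + p[1] + 1) >> 1 : p[0];
      int pred = top;
      if (dy) {
        const unsigned char *q = p + ref_stride;
        int bottom = dx ? (q[0] + q[1] + 1) >> 1 : q[0];
        /* Second pass: vertical half-pel average. */
        pred = (top + bottom + 1) >> 1;
      }
      int d = src[r * src_stride + c] - pred;
      sum += d;
      sq += (unsigned int)(d * d);
    }
  }
  *sse = sq;
  /* variance = SSE - sum^2 / N */
  return sq - (unsigned int)(((long long)sum * sum) / (w * h));
}
```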
John Koleszar [Wed, 27 Oct 2010 14:08:17 +0000 (10:08 -0400)]
vpxdec: don't require -o with --noblit
Specifying the output file is meaningless when we're not writing to
it.
Change-Id: I271e1d3ae1994d79f0773747477124600f98ca58
John Koleszar [Wed, 27 Oct 2010 14:06:45 +0000 (10:06 -0400)]
makefile: remove ivf{enc,dec} on make clean
Prior clean-up removed the object files, but not the binaries
themselves.
Change-Id: Ic2332188cea88094c14457ebb8b77680a60d581b
John Koleszar [Wed, 27 Oct 2010 14:05:55 +0000 (10:05 -0400)]
vpxenc: add unique track id
MKV requires a unique(ish) TrackID element in the track info header.
Instead of the current hard-coded ID, take a hash of the video track
and use that. This value is not written in the deterministic output
mode, despite being a deterministic value itself, to give flexibility
to change the hash algorithm and not affect bisecting across the
change.
Change-Id: I807fc3ea6d1427a151c3ef703269b67e80aef860
Johann [Wed, 27 Oct 2010 16:59:28 +0000 (09:59 -0700)]
Merge "fix implicit declarations"
Johann [Wed, 27 Oct 2010 16:59:01 +0000 (09:59 -0700)]
Merge "RTCD build is bringing old errors to light"
Fritz Koenig [Tue, 26 Oct 2010 20:26:17 +0000 (13:26 -0700)]
vpxdec : Change --pp-debug-info to be a bit field.
This allows multiple post processor debug levels to be overlaid,
e.g. showing colored reference blocks and visual motion vectors together.
Change-Id: Ic4a1df438445b9f5780fe73adb3126e803472e53
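Treating the option as a bit field means each visualization owns one bit, so values can be OR-ed together. A minimal sketch; the flag names and bit assignments below are hypothetical, not the values vpxdec actually uses.

```c
/* Hypothetical flag names and values: one bit per visualization. */
enum {
  PP_DBG_FRAME_INFO = 1 << 0,
  PP_DBG_REF_FRAME_COLOR = 1 << 1,
  PP_DBG_MB_MODE_COLOR = 1 << 2,
  PP_DBG_MOTION_VECTORS = 1 << 3
};

/* Nonzero if the given visualization bit is set in the option value. */
static int pp_debug_enabled(unsigned int flags, unsigned int which) {
  return (flags & which) != 0;
}
```

With these assumed values, an option value of 10 (2 | 8) would enable both the reference-frame coloring and the motion vector overlay.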
Fritz Koenig [Wed, 27 Oct 2010 16:04:39 +0000 (09:04 -0700)]
Merge "postproc: Add mode and reference frame visualizers."
Johann [Wed, 27 Oct 2010 15:21:02 +0000 (11:21 -0400)]
fix implicit declarations
ARM used to explicitly remove this file from the build. With the RTCD
changes, that's no longer possible. These errors also exist for x86 w/o
RTCD, but that's not the default configuration.
Change-Id: I3e10e5553ddf3278e8d3c9365ca6fb84f52f5066
Johann [Wed, 27 Oct 2010 14:47:48 +0000 (10:47 -0400)]
RTCD build is bringing old errors to light
needs to be _recon_ not _recon_recon_
Change-Id: I7a8b9ddcb4fb72c2b723c563932c9ea52ff15982
John Koleszar [Wed, 27 Oct 2010 13:50:02 +0000 (06:50 -0700)]
Merge "vpxenc: add deterministic output option"
John Koleszar [Wed, 27 Oct 2010 03:05:02 +0000 (20:05 -0700)]
Merge "Add half-pixel variance RTCD functions"
John Koleszar [Wed, 27 Oct 2010 03:02:57 +0000 (20:02 -0700)]
Merge "make vp8_recon16x16mb{,y} RTCD functions"
John Koleszar [Wed, 27 Oct 2010 03:02:37 +0000 (20:02 -0700)]
Merge "make arm hex search the generic implementation"
John Koleszar [Wed, 27 Oct 2010 03:02:18 +0000 (20:02 -0700)]
Merge "arm: move unrolled loops back to generic code"
John Koleszar [Wed, 27 Oct 2010 03:01:54 +0000 (20:01 -0700)]
Merge "arm: remove duplicate functions"
John Koleszar [Tue, 26 Oct 2010 19:34:16 +0000 (15:34 -0400)]
Add half-pixel variance RTCD functions
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.
A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.
Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
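The change from "two function pointers per call" to "one pointer to a per-block-size table" can be sketched as follows. The struct, its fields, and the function names are simplified and hypothetical; the real table holds more entries (SAD, sub-pixel variance, and so on).

```c
#include <stddef.h>

/* Signature shared by every variance function in this sketch. */
typedef unsigned int (*variance_fn_t)(const unsigned char *src, int src_stride,
                                      const unsigned char *ref, int ref_stride,
                                      unsigned int *sse);

/* Hypothetical, simplified per-block-size table. */
typedef struct {
  variance_fn_t vf;         /* full-pel variance */
  variance_fn_t halfpix_h;  /* half-pel special case, may be NULL */
} variance_fn_table_t;

static unsigned int variance4x4_c(const unsigned char *src, int src_stride,
                                  const unsigned char *ref, int ref_stride,
                                  unsigned int *sse) {
  int sum = 0;
  unsigned int sq = 0;
  for (int r = 0; r < 4; r++)
    for (int c = 0; c < 4; c++) {
      int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += d;
      sq += (unsigned int)(d * d);
    }
  *sse = sq;
  return sq - (unsigned int)((sum * sum) >> 4); /* N = 16 */
}

/* Built once (e.g. at encoder init), not reconstructed per macroblock. */
static const variance_fn_table_t fn_4x4 = { variance4x4_c, NULL };

/* Callers take a single table pointer instead of several function pointers. */
static unsigned int block_variance(const variance_fn_table_t *fn,
                                   const unsigned char *src, int src_stride,
                                   const unsigned char *ref, int ref_stride) {
  unsigned int sse;
  return fn->vf(src, src_stride, ref, ref_stride, &sse);
}
```

Adding a new special case then means adding a field to the table rather than another parameter to every caller.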
Fritz Koenig [Tue, 26 Oct 2010 19:58:51 +0000 (12:58 -0700)]
postproc: Add mode and reference frame visualizers.
Post process option to color the block for either the mode
of the macro block, or the frame that the macro block references.
Change-Id: Ie498175497f2d20e3319924d352dc4ddc16f4134
John Koleszar [Tue, 26 Oct 2010 20:22:22 +0000 (16:22 -0400)]
vpxenc: add deterministic output option
By baking the version number into the output file, a hash of the file
will vary from commit to commit, even if the output is otherwise bit
exact. Add a -D option to suppress this behavior, for use when
bisecting or doing other debugging.
Change-Id: I5089a8ce5719920ffaf47620fa9069b81fa15673
John Koleszar [Tue, 26 Oct 2010 20:10:59 +0000 (13:10 -0700)]
Merge "Update AUTHORS"
John Koleszar [Tue, 26 Oct 2010 20:10:22 +0000 (16:10 -0400)]
Update AUTHORS
Change-Id: I18e0a9e00731c23a2bdd1a978c8cb38f71e9029d
John Koleszar [Tue, 26 Oct 2010 15:37:23 +0000 (11:37 -0400)]
make vp8_recon16x16mb{,y} RTCD functions
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.
Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
John Koleszar [Tue, 26 Oct 2010 14:46:31 +0000 (10:46 -0400)]
make arm hex search the generic implementation
The ARM version of vp8_hex_search() is a faster implementation
of the same algorithm. Since it doesn't use any ARM specific
code, it can be made the default implementation. This removes
a linking error.
Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
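For readers unfamiliar with the algorithm, a generic hexagon search can be sketched in portable C as below. This is an illustrative version with a hypothetical cost callback and types, not the vp8_hex_search() code itself: it walks a six-point hexagon until the center is cheapest, then refines with the four nearest neighbors.

```c
/* Hypothetical cost callback: matching cost of the candidate at (row, col). */
typedef unsigned int (*cost_fn_t)(int row, int col, void *ctx);

typedef struct {
  int row, col;
} mv_t;

static mv_t hex_search_sketch(mv_t start, cost_fn_t cost, void *ctx,
                              int max_iters) {
  static const int hex[6][2] = {
    { 0, -2 }, { -2, -1 }, { -2, 1 }, { 0, 2 }, { 2, 1 }, { 2, -1 }
  };
  static const int nbr[4][2] = { { -1, 0 }, { 1, 0 }, { 0, -1 }, { 0, 1 } };
  mv_t best = start;
  unsigned int best_cost = cost(best.row, best.col, ctx);

  /* Large pattern: move to the cheapest hexagon point until the center wins. */
  for (int it = 0; it < max_iters; it++) {
    mv_t center = best;
    for (int i = 0; i < 6; i++) {
      int r = center.row + hex[i][0], c = center.col + hex[i][1];
      unsigned int cc = cost(r, c, ctx);
      if (cc < best_cost) {
        best_cost = cc;
        best.row = r;
        best.col = c;
      }
    }
    if (best.row == center.row && best.col == center.col) break;
  }

  /* Small pattern refinement; terminates because the cost strictly drops. */
  for (;;) {
    mv_t center = best;
    for (int i = 0; i < 4; i++) {
      int r = center.row + nbr[i][0], c = center.col + nbr[i][1];
      unsigned int cc = cost(r, c, ctx);
      if (cc < best_cost) {
        best_cost = cc;
        best.row = r;
        best.col = c;
      }
    }
    if (best.row == center.row && best.col == center.col) break;
  }
  return best;
}

/* Example cost: squared distance to a target vector passed via ctx. */
static unsigned int dist2_cost(int row, int col, void *ctx) {
  const mv_t *t = (const mv_t *)ctx;
  int dr = row - t->row, dc = col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}
```

Nothing here is architecture specific, which is the point the commit makes about the ARM version.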
John Koleszar [Tue, 26 Oct 2010 14:05:21 +0000 (07:05 -0700)]
Merge "add missing GET_GOT/RESTORE_GOT pairs"
John Koleszar [Tue, 26 Oct 2010 13:51:35 +0000 (09:51 -0400)]
arm: move unrolled loops back to generic code
Some of the ARM functions differed from their generic counterparts
only by unrolling their loops. Since this change may be useful
on other platforms, or might even supersede the looped version
in the generic case, move it back to the generic file.
This code is left under #if ARCH_ARM for now, but it may be worth
considering a different (possibly new) conditional for these. If
it turns out that this should be runtime selectable, these
functions will have to move to the RTCD infrastructure. Don't want
to take that step at this time without more profile data.
Change-Id: I4612fdbc606fbebba4971a690fb743ad184ff15f
John Koleszar [Tue, 26 Oct 2010 13:37:44 +0000 (09:37 -0400)]
arm: remove duplicate functions
These functions were true duplicates of functions present in the
generic code. This fixes some of the link errors when building
with --enable-shared --enable-pic.
Change-Id: Idff26599d510d954e439207883607ad6b74df20c
Jim Bankoski [Tue, 26 Oct 2010 11:34:57 +0000 (07:34 -0400)]
Merge commit 'refs/changes/09/809/1' of https://review.webmproject.org/p/libvpx
John Koleszar [Tue, 26 Oct 2010 03:45:02 +0000 (23:45 -0400)]
add missing GET_GOT/RESTORE_GOT pairs
These functions made global references but did not set up the GOT,
causing compilation failures in PIC mode.
Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
John Koleszar [Tue, 26 Oct 2010 02:59:23 +0000 (22:59 -0400)]
Merge WebM input/output branch
Change-Id: I83a6f18d2314e5d97759b4ae49afaa52fd8b3c44
John Koleszar [Fri, 22 Oct 2010 18:57:02 +0000 (14:57 -0400)]
vpxenc: warn against webm output to pipes
The WebM writer requires a seekable stream.
Change-Id: I192e00706a0685362d41b8d2faf80add63d564b9
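One portable way to detect an unseekable output stream is to ask for the current position and seek back to it; on a pipe, ftell() fails (errno is ESPIPE on POSIX systems). A minimal sketch; the helper names and warning text are hypothetical, not vpxenc's.

```c
#include <stdio.h>

/* Returns nonzero if the stream appears to support seeking. */
static int stream_is_seekable(FILE *f) {
  long pos = ftell(f);
  if (pos < 0) return 0;
  return fseek(f, pos, SEEK_SET) == 0;
}

static void warn_if_unseekable_webm(FILE *out) {
  if (!stream_is_seekable(out))
    fprintf(stderr, "Warning: WebM output to a pipe is not supported "
                    "(the writer needs to seek).\n");
}
```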
John Koleszar [Fri, 22 Oct 2010 18:48:21 +0000 (14:48 -0400)]
vpxenc: specify output file with -o
Requiring the output file to be specified with the -o option opens up
the possibility of supporting multiple input files in the future.
Change-Id: I14c9b75e9b21184b47081e1ccf30cf4c91315964
John Koleszar [Fri, 22 Oct 2010 03:40:42 +0000 (20:40 -0700)]
vpxdec: rework default output parameters
This patch reworks the default behavior of the tool to output Y4M
instead of writing individual raw frames. The relevant controls are
now:
--yv12, --i420 - These options change the output format to be
raw planar data. The output will be Y4M unless
one of these options is specified.
--flipuv - Swaps the chroma planes. Works with Y4M output.
-o, --output - Sets the output filename. Defaults to stdout if
not specified. Supports escape character
expansion for frame width (%w), height (%h), and
sequence number (%1..%9). The --prefix option
has been removed in favor of this escape
expansion.
Since the output defaults to stdout if -o is not specified, an
error will be thrown if stdout is not connected to a pipe. This
can be overridden by specifying '-o -'.
Change-Id: I94e42c57ca75721fdd57a6129e79bcdb2afe5d4d
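The escape expansion for -o can be sketched as a small pattern expander. This assumes %N means the frame number zero-padded to N digits; the function name is hypothetical and the real vpxdec code may differ.

```c
#include <stdio.h>
#include <string.h>

/* Expand %w, %h and %1..%9 in an output-file pattern into out. */
static void expand_pattern(char *out, size_t out_len, const char *pat,
                           unsigned int w, unsigned int h, unsigned int frame) {
  size_t n = 0;
  while (*pat && n + 1 < out_len) {
    int written;
    if (pat[0] == '%' && pat[1] == 'w')
      written = snprintf(out + n, out_len - n, "%u", w);
    else if (pat[0] == '%' && pat[1] == 'h')
      written = snprintf(out + n, out_len - n, "%u", h);
    else if (pat[0] == '%' && pat[1] >= '1' && pat[1] <= '9')
      /* Assumed meaning: frame number padded to N digits. */
      written = snprintf(out + n, out_len - n, "%0*u", pat[1] - '0', frame);
    else {
      out[n++] = *pat++;  /* ordinary character, copied as-is */
      continue;
    }
    if (written < 0) break;
    n += (size_t)written;
    if (n >= out_len) {  /* truncated; snprintf already terminated it */
      n = out_len - 1;
      break;
    }
    pat += 2;
  }
  out[n] = '\0';
}
```

Under these assumptions, the pattern "out-%wx%h-%4.i420" for a 352x288 stream at frame 7 expands to "out-352x288-0007.i420".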
John Koleszar [Fri, 22 Oct 2010 03:35:12 +0000 (20:35 -0700)]
vpxdec: replace --quiet with --verbose
Be quiet by default, to play nicer with scripts.
Change-Id: I68f6c88411fd5487566f268fb73b4e55ae64410c
John Koleszar [Thu, 21 Oct 2010 21:28:34 +0000 (17:28 -0400)]
vpxdec: use the same output for --progress and --summary
Update the timing information in-place for the --progress
option.
Change-Id: I8efea57050db72963c0bc5c994425e7e692d1502
John Koleszar [Thu, 21 Oct 2010 20:53:52 +0000 (16:53 -0400)]
usage: fix horizontal alignment of options
When showing the command usage information for vpxenc and vpxdec,
options with both a short and long version that do not take an
argument were not properly aligned.
Change-Id: I8d65b5ab85bcb5a5dc8bc0d4b293b5189d56dedb
John Koleszar [Thu, 21 Oct 2010 20:52:14 +0000 (16:52 -0400)]
vpxenc: change --framerate to --fps
Saves a little typing. FPS is a well known abbreviation.
Change-Id: I53730ea36afb9309732eb1c72c52d824d5365fec
John Koleszar [Thu, 21 Oct 2010 20:23:20 +0000 (16:23 -0400)]
vpxenc: output webm by default
WebM should be preferred to IVF output, since it has wider tool support.
Change-Id: I5ac3d5cb68722e6c8af917cdba32ac01dd5e0ea2
John Koleszar [Thu, 21 Oct 2010 19:02:10 +0000 (15:02 -0400)]
rename ivf{enc,dec} to vpx{enc,dec}
The new WebM output support should be preferred to IVF, but we can't
change the default behavior of the ivf* tools. There are a few other
default behaviors for these tools that are counterintuitive for
historical reasons, and changing the binary name provides the
opportunity to clean those up as well. This patch takes the first
step by renaming the binaries.
Change-Id: I647008ae37cc352dd27ec1da7ed13489e0609b24
John Koleszar [Wed, 20 Oct 2010 16:05:48 +0000 (12:05 -0400)]
ivfenc: webm output support
This patch adds the --webm option, to allow the creation of WebM streams
without having to remux ivf into webm.
Change-Id: Ief93c114a6913c55a04cf51bce38f594372d0ad0
John Koleszar [Wed, 20 Oct 2010 15:06:48 +0000 (11:06 -0400)]
Import webmquicktime webm writer
Initial import of the libmkv directory from the webmquicktime[1]
project, at commit fedbda1.
[1]: git://review.webmproject.org/webmquicktime.git
commit fedbda18de899ff94855cb334de7e471036fbf1d
Change-Id: I1564a0ebfa72293fc296ee02178196530dfd90e4
Frank Galligan [Wed, 6 Oct 2010 16:51:00 +0000 (12:51 -0400)]
Fixed the timebase parameter of ivfenc.
Ivfenc will use the timebase if it is set. If it is not set, ivfenc will
still double the timebase so that altref frames have a unique pts.
Patch Set #3: Use integer math to generate source pts. Added a
framerate parameter. Increased the default timebase to milliseconds to
remove the *2 everywhere.
Change-Id: I8d25b5b2cb26deef7eb72d74b5f76c98cafaf4db
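Generating source pts with integer math amounts to converting a frame index from frame-rate units into timebase units, multiplying before dividing. A minimal sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* pts of frame `frame`, in units of tb_num/tb_den seconds, for a frame
 * rate of fps_num/fps_den frames per second. */
static int64_t frame_pts(int64_t frame, int fps_num, int fps_den,
                         int tb_num, int tb_den) {
  return frame * fps_den * tb_den / ((int64_t)fps_num * tb_num);
}
```

With a millisecond timebase (1/1000), 30 fps gives pts 0, 33, 66, 100, ..., and frame 30 lands exactly on 1000, so rounding error does not accumulate from frame to frame.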
John Koleszar [Wed, 20 Oct 2010 14:49:12 +0000 (10:49 -0400)]
ivfdec: support y4m output from raw input
The width and height needed to write the Y4M header can be found by
probing the stream with vpx_codec_peek_stream_info(). This also
has the consequence of supporting multiple codecs from raw files
with automatic detection, should we add additional codecs in the
future.
Change-Id: I7522a8f4c7577b6ed9876d744c59cd86d30c6049
John Koleszar [Tue, 19 Oct 2010 21:20:17 +0000 (17:20 -0400)]
ivfdec: webm reader support
This patch enables ivfdec to decode WebM files. WebM demuxing is
provided by Matthew Gregan's nestegg library.
This patch also makes minor changes to the timebase->framerate
handling when doing Y4M output. For WebM files, the framerate is
guessed by looking at the first second of video. For IVF files,
the timebase=1/(2*fps) hack is still in place, but is only used
if the timebase denominator is less than 1000. This is in anticipation
of change I8d25b5b, which introduces the distinction between
framerate and timebase to ivfenc. In the case of high resolution
timebases, like 100ns, we would have to guess the framerate
like we do for WebM, but since WebM support in ivfenc will
deprecate IVF output, we just assume 30fps rather than writing the
lookahead code.
Change-Id: I1dd8600f13bf6071533d2816f005da9ede4f60a2
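The IVF rule described above can be written out directly: a timebase of 1/(2*fps) implies fps = tb_den / (2 * tb_num), and high-resolution timebases fall back to 30 fps. A sketch with hypothetical names:

```c
/* Guess the frame rate of an IVF file from its timebase (tb_num/tb_den). */
static void guess_ivf_framerate(int tb_num, int tb_den,
                                int *fps_num, int *fps_den) {
  if (tb_den < 1000) {
    /* Legacy ivfenc files: timebase = 1/(2*fps). */
    *fps_num = tb_den;
    *fps_den = 2 * tb_num;
  } else {
    /* High-resolution timebase: no lookahead, just assume 30 fps. */
    *fps_num = 30;
    *fps_den = 1;
  }
}
```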
Fritz Koenig [Mon, 25 Oct 2010 22:40:22 +0000 (15:40 -0700)]
Merge "Debug option for drawing motion vectors."
Fritz Koenig [Fri, 22 Oct 2010 22:41:06 +0000 (15:41 -0700)]
Debug option for drawing motion vectors.
Postproc level that uses Bresenham's line algorithm
to draw motion vectors onto the postproc buffer.
Change-Id: I34c7daa324f2bdfee71e84fcb1c50b90fa06f6fb
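Bresenham's algorithm draws a line using only integer additions and comparisons, which suits plotting motion vectors into a frame buffer. A minimal all-octant sketch that writes into an 8-bit plane; the function name and clipping behavior are illustrative, not the postproc code.

```c
#include <stdlib.h>

/* Draw a line from (x0,y0) to (x1,y1) into an 8-bit plane; pixels
 * outside the width x height area are skipped. */
static void draw_line(unsigned char *buf, int stride, int width, int height,
                      int x0, int y0, int x1, int y1, unsigned char value) {
  int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
  int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
  int err = dx + dy;  /* error term combining both axes */
  for (;;) {
    if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height)
      buf[y0 * stride + x0] = value;
    if (x0 == x1 && y0 == y1) break;
    int e2 = 2 * err;
    if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
    if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
  }
}
```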
Johann [Mon, 25 Oct 2010 20:26:55 +0000 (13:26 -0700)]
Merge "quiet compiler"
John Koleszar [Mon, 25 Oct 2010 20:23:19 +0000 (13:23 -0700)]
Merge "Remove legacy release.sh script"
Aaron Watry [Thu, 30 Sep 2010 19:36:00 +0000 (15:36 -0400)]
Add sparc-solaris-gcc as a build target.
Solaris 10 requires -lposix4 to build successfully on gcc. I only have a
Sparc machine to test with on Solaris 10, but this change leaves
OpenSolaris x86 in a usable state w/ gnu-generic.
I am of the belief that this change should fix Solaris 10 on Sparc, but
will leave other Solaris architectures as is. If someone has an x86
Solaris 10 machine to test on, they may add x86-solaris-gcc to
libvpx/configure and give it a go.
Change-Id: I17a282028bb4d3e9fd8764159f95665160f7b62a
Martin Ettl [Mon, 25 Oct 2010 17:14:11 +0000 (13:14 -0400)]
Fix leaked file descriptor with ENTROPY_STATS
cppcheck found a leaked file descriptor in the debugging code
enabled by defining ENTROPY_STATS. Fixes issue #60.
Change-Id: I0c1d0669cb94d44fed77860f97b82763be06b7cb
John Koleszar [Mon, 25 Oct 2010 14:28:45 +0000 (10:28 -0400)]
NASM: trailing slash for ASFLAGS includes
Fix out-of-tree builds using NASM. NASM expects its include paths to
have a trailing slash. These aren't used when doing in-tree builds
(./configure)
Change-Id: I38d469d15acb1b7e65733a2e5ca8c9d86fa4ad86
Johann [Mon, 25 Oct 2010 14:07:35 +0000 (10:07 -0400)]
quiet compiler
clean up compiler warnings, man in the yellow hat warnings, and start to
remove unused #includes
Change-Id: I6267e98d9b3024b6fb1ef2732b29067a33cb96f6
Johann [Mon, 18 Oct 2010 18:57:40 +0000 (14:57 -0400)]
reuse common loopfilter code
there were four versions for the regular and
macroblock loopfilters:
horizontal [y|uv]
vertical [y|uv]
this moves all the common code into 2 functions:
vp8_loop_filter_neon
vp8_mbloop_filter_neon
this provides no gain in performance. there's a bit
of jitter, but it trends down ~0.25-0.5%. however,
this is a huge gain in maintainability. also, there is the
potential to drop some stack usage in the macroblock
loopfilter.
Change-Id: I91506f07d2f449631ff67ad6f1b3f3be63b81a92
Timothy B. Terriberry [Wed, 20 Oct 2010 22:39:11 +0000 (15:39 -0700)]
Add runtime CPU detection support for ARM.
The primary goal is to allow a binary to be built which supports
NEON, but can fall back to non-NEON routines, since some Android
devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
which versions of each function to build, and when
CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
at run time.
In order for this to work, the CFLAGS must be set to something
appropriate (e.g., without -mfpu=neon for ARMv7, and with
appropriate -march and -mcpu for even earlier configurations), or
the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
required at build time, since the ARM assembler will refuse to emit
them otherwise.
I have not attempted to make any changes to configure to do this
automatically.
Doing so will probably require the addition of new configure options.
Many of the hooks for RTCD on ARM were already there, but a lot of
the code had bit-rotted, and a good deal of the ARM-specific code
is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
site were expanded to check the RTCD flags at that site, but they
should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
these should be moved into an RTCD struct for thread safety (I
believe every platform currently supported has atomic pointer
stores, but this is not guaranteed).
The encoder's boolhuff functions did not even have _c and armv7
suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
used was rbit, and this was completely superfluous, so I reworked
them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
least ARMv5TE, I did not try to detect these at runtime, and simply
enable them for ARMv5 and above.
Finally, the NEON register saving code was completely non-reentrant,
since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
and produced identical output, while using the correct accelerated
functions on each.
I did not test on any earlier processors.
Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
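The pattern described above has two layers: a build-time switch decides which variants are compiled, and runtime flags choose among the compiled ones, with the result stored in a per-instance table rather than a global pointer. A sketch of that shape; the flag, macro, and function names are hypothetical stand-ins for the HAVE_* macros and RTCD structs.

```c
/* Hypothetical runtime CPU flag; real flags would come from detection code. */
#define CPU_HAS_NEON 0x1u

/* Hypothetical stand-in for a configure-generated HAVE_* macro. */
#ifndef HAVE_NEON_SKETCH
#define HAVE_NEON_SKETCH 1
#endif

typedef int (*sum_fn_t)(const unsigned char *p, int n);

/* Portable C version: always built, runs on any CPU. */
static int sum_c(const unsigned char *p, int n) {
  int s = 0;
  for (int i = 0; i < n; i++) s += p[i];
  return s;
}

#if HAVE_NEON_SKETCH
/* Stand-in for an assembly routine that is only safe on NEON CPUs. */
static int sum_neon_standin(const unsigned char *p, int n) {
  return sum_c(p, n);
}
#endif

/* Per-instance dispatch table, filled once at init. */
typedef struct {
  sum_fn_t sum;
} rtcd_table_t;

static void rtcd_init(rtcd_table_t *rtcd, unsigned int cpu_flags) {
  rtcd->sum = sum_c;  /* safe default */
#if HAVE_NEON_SKETCH
  if (cpu_flags & CPU_HAS_NEON) rtcd->sum = sum_neon_standin;
#else
  (void)cpu_flags;
#endif
}
```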
Johann [Wed, 20 Oct 2010 20:27:33 +0000 (16:27 -0400)]
isolate new temporal filtering code
onyx_if is getting pretty big. split out the temporal code to make it
easier to look at.
Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
John Koleszar [Fri, 22 Oct 2010 15:54:07 +0000 (11:54 -0400)]
Merge "Improve handling of invalid frames."
Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c
Timothy B. Terriberry [Tue, 19 Oct 2010 22:40:46 +0000 (15:40 -0700)]
Improve handling of invalid frames.
The code was not checking for frame sizes smaller than 3 bytes, and the
partition size checks might have failed if the input buffer was within
16MB of the top of the heap.
In addition, the reference count on the current frame buffer was not
being decremented on error, so after a small number of errors, no new
frame buffer could be found and it would run off the list of them.
Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
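Both size problems can be avoided by checking the 3-byte minimum first and comparing remaining lengths instead of forming `p + len`, which can overflow near the top of the address space. The sketch below uses the frame tag layout from RFC 6386 (bit 0 is the frame type, 0 meaning key frame; bits 5..23 hold the first partition size); the helper names are hypothetical and it does not show the reference-count fix.

```c
#include <stddef.h>

/* Overflow-safe: do `len` bytes starting at p fit before end? */
static int read_is_valid(const unsigned char *p, size_t len,
                         const unsigned char *end) {
  return p <= end && len <= (size_t)(end - p);
}

/* Returns 0 if the frame tag and first partition fit in the buffer. */
static int check_frame_sizes(const unsigned char *data, size_t data_sz) {
  if (data_sz < 3) return -1;  /* no room for the frame tag */
  unsigned long tag = data[0] | (data[1] << 8) | ((unsigned long)data[2] << 16);
  int key_frame = !(tag & 1);
  size_t first_part_size = (tag >> 5) & 0x7FFFF;
  size_t header = 3 + (key_frame ? 7 : 0);  /* key frames: start code + dims */
  if (data_sz < header) return -1;
  return read_is_valid(data + header, first_part_size, data + data_sz) ? 0 : -1;
}
```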
Timothy B. Terriberry [Fri, 22 Oct 2010 00:04:30 +0000 (17:04 -0700)]
Convert [4][4] matrices to [16] arrays.
Most of the code that actually uses these matrices indexes them as
if they were a single contiguous array, and coverity produces
reports about the resulting accesses that overflow the static
bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
arrays should eliminate the report, and removes a good deal of
extraneous indexing and address operators from the code.
Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
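With a `short m[4][4]`, code that walks `m[0][i]` for i up to 15 is what static analyzers flag; with a flat `short m[16]`, element (row, col) is simply `m[row * 4 + col]` and a single index covers the block. A small hypothetical dequantization helper shows the resulting style:

```c
/* Hypothetical helper: dequantize a 4x4 block stored as a flat [16]
 * array; row r, column c lives at index r * 4 + c. */
static void dequantize_4x4(const short q[16], const short dq_factor[16],
                           short out[16]) {
  for (int i = 0; i < 16; i++)
    out[i] = (short)(q[i] * dq_factor[i]);
}
```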
Frank Galligan [Tue, 5 Oct 2010 21:46:37 +0000 (17:46 -0400)]
Change altref times to preceding pts+1.
Change the pts of the altref frame to be as close as possible to the
pts of the preceding frame and still be strictly increasing.
Change-Id: Iae3033a4c89ae5a9d0e5c4198e9196e5f3ee57c7
John Koleszar [Thu, 21 Oct 2010 18:09:02 +0000 (11:09 -0700)]
Merge "Move firstpass motion map to stats packet"
John Koleszar [Thu, 14 Oct 2010 20:40:12 +0000 (16:40 -0400)]
Move firstpass motion map to stats packet
The first implementation of the firstpass motion map for motion
compensated temporal filtering created a file, fpmotionmap.stt,
in the current working directory. This was not safe for multiple
encoder instances. This patch merges this data into the first pass
stats packet interface, so that it is handled like the other
(numerical) firstpass stats.
The new stats packet is defined as follows:
Numerical Stats (16 doubles) -- 128 bytes
Motion Map -- 1 byte / Macroblock
Padding -- to align packet to 8 bytes
The fpmotionmap.stt file can still be generated for debugging
purposes in the same way that the textual version of the stats
are available (defining OUTPUT_FPF in firstpass.c)
Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
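The packet layout above fixes the packet size as a function of the frame's macroblock count. A sketch of that arithmetic (helper name hypothetical):

```c
#include <stddef.h>

#define FP_NUM_STATS 16 /* numerical stats: 16 doubles = 128 bytes */

/* Bytes in one first-pass stats packet: numerical stats, one motion-map
 * byte per macroblock, then padding to a multiple of 8 bytes. */
static size_t fp_stats_packet_size(int mb_rows, int mb_cols) {
  size_t sz = FP_NUM_STATS * sizeof(double) + (size_t)mb_rows * (size_t)mb_cols;
  return (sz + 7) & ~(size_t)7;
}
```

For example, a 352x288 frame has 22x18 = 396 macroblocks, so 128 + 396 = 524 bytes pad to a 528-byte packet.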
Yunqing Wang [Thu, 21 Oct 2010 17:42:24 +0000 (13:42 -0400)]
Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm
Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
Yunqing Wang [Thu, 21 Oct 2010 17:31:22 +0000 (10:31 -0700)]
Merge "Rewrite vp8_short_walsh4x4_sse2()"
Yunqing Wang [Thu, 21 Oct 2010 17:30:27 +0000 (10:30 -0700)]
Merge "Add SSE2 subtract functions"
Yunqing Wang [Thu, 21 Oct 2010 14:26:50 +0000 (10:26 -0400)]
Rewrite vp8_short_walsh4x4_sse2()
This rewriting reflects changes made in commit "Improve the
accuracy of forward walsh-hadamard transform". Since this function
is not called much, only a small encoder performance gain (~0.5%)
is seen.
Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
John Koleszar [Tue, 19 Oct 2010 18:40:07 +0000 (14:40 -0400)]
Import nestegg webm/mkv parser
Initial import of nestegg[1] parser lib, at commit 0d51131.
[1]: http://github.com/kinetiknz/nestegg
commit 0d51131519a1014660b5e111e28a78785d76600f
Change-Id: I191d388b7e5140ef96624511ccdd65d0e183076d
John Koleszar [Wed, 20 Oct 2010 03:20:31 +0000 (20:20 -0700)]
Merge "Update arnr strength range from 1-6 to 0-6."
Frank Galligan [Tue, 5 Oct 2010 01:12:22 +0000 (21:12 -0400)]
Update arnr strength range from 1-6 to 0-6.
Change-Id: I8eb49c56f7509f0a8074d440e8345b9e3344b85b
Yaowu Xu [Tue, 19 Oct 2010 23:23:31 +0000 (16:23 -0700)]
Merge "fixed a typo that mis-used Y plane stride for UV blocks."
Yaowu Xu [Tue, 19 Oct 2010 15:11:52 +0000 (08:11 -0700)]
Merge "change to make use of more trellis quantization"
Yunqing Wang [Mon, 18 Oct 2010 18:15:15 +0000 (14:15 -0400)]
Add SSE2 subtract functions
Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring a noticeable gain since
these functions are not called frequently.
Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
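The psubb/pcmpgtb approach can be modeled one lane at a time in portable C. The 8-bit difference supplies the low byte, a "borrow" mask supplies the high byte, and interleaving the two gives the signed 16-bit difference. Because pcmpgtb compares signed bytes, XOR-ing both inputs with 0x80 is one way to get an unsigned comparison; the commit does not say whether the actual assembly does exactly this, and the function name is hypothetical.

```c
#include <stdint.h>

/* Interpret a byte as a signed 8-bit lane, as pcmpgtb does. */
static int as_signed_byte(uint8_t b) { return b < 128 ? b : b - 256; }

/* Per-lane model: diff = src - pred as 16-bit values, for 16 bytes. */
static void subtract16_model(const uint8_t src[16], const uint8_t pred[16],
                             int16_t diff[16]) {
  for (int i = 0; i < 16; i++) {
    /* psubb: 8-bit subtraction that wraps modulo 256. */
    uint8_t lo = (uint8_t)(src[i] - pred[i]);
    /* Biased signed compare == unsigned "pred > src"; lane mask 0xFF/0x00. */
    int borrow = as_signed_byte((uint8_t)(pred[i] ^ 0x80)) >
                 as_signed_byte((uint8_t)(src[i] ^ 0x80));
    /* Interleaving lo with the mask yields the two's-complement result. */
    diff[i] = (int16_t)(borrow ? lo - 256 : lo);
  }
}
```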
Johann [Mon, 18 Oct 2010 17:23:39 +0000 (13:23 -0400)]
copy compiler warning fixes
generic version got fixed, but not the arm version. fixes:
vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
argument 5 of 'fn_ptr->sdx3f' differ in signedness
vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
argument is of type 'int *'
and another unsigned change to keep the files similar
Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
Johann [Fri, 15 Oct 2010 19:25:19 +0000 (15:25 -0400)]
remove dead code
vp8_diamond_search_sadx4 isn't used in arm because there is no
corresponding sdx4df as in x86. rather than keep it in sync with
../mcomp.c, delete it
vp8_hex_search had the original, more readable/understandable code #if'd
out. it's also available in ../mcomp.c, so remove the dead copy
Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
Yaowu Xu [Fri, 15 Oct 2010 01:58:34 +0000 (18:58 -0700)]
change to make use of more trellis quantization
when a subsequent frame is encoded as an alt reference frame, it is
unlikely that any mb in current frame will be used as reference for
future frames, so we can enable quantization optimization even when
the RD constant is slightly rate-biased. The change has an overall
benefit between 0.1% and 0.2% bit savings on the test sets based on
vpxssim scores.
Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
Jim Bankoski [Thu, 14 Oct 2010 20:19:06 +0000 (16:19 -0400)]
safety check to avoid divide by 0s
Yunqing Wang [Thu, 14 Oct 2010 19:20:25 +0000 (12:20 -0700)]
Merge "Fix one gcc compiler warning"
Yunqing Wang [Thu, 14 Oct 2010 18:25:03 +0000 (14:25 -0400)]
Fix one gcc compiler warning
../libvpx/vp8/encoder/bitstream.c: In function ‘pack_inter_mode_mvs’:
../libvpx/vp8/encoder/bitstream.c:1026: warning: array subscript has type ‘char’
Change-Id: Ic77491e0a172fa1821e5b3e914d0dc41fe87c00f
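GCC warns about `char` subscripts because plain `char` may be signed, making values of 128 and above negative indices. The actual change to bitstream.c is not shown in this log; the hypothetical helper below shows the general remedy of converting through `unsigned char`.

```c
/* Hypothetical helper: index a 256-entry table by a char safely. */
static int lookup_by_char(const int table[256], char c) {
  return table[(unsigned char)c];  /* always 0..255, even if char is signed */
}
```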
Yunqing Wang [Thu, 14 Oct 2010 18:29:24 +0000 (11:29 -0700)]
Merge "Improve bounds checking in vp8_diamond_search_sadx4()"
Yunqing Wang [Thu, 14 Oct 2010 15:06:37 +0000 (11:06 -0400)]
Improve bounds checking in vp8_diamond_search_sadx4()
In order to know if all 4/8 neighbor points are within the bounds,
4 bound checks are enough, instead of checking 4 bounds for
each point (16/32 checks). This improvement reduces the cost of
vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5%
performance gain (test options: 1 pass, good, speed=4).
Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
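The idea reduces to one observation: if the search center is at least the step distance away from every limit, every neighbor at that distance is in bounds, so four comparisons replace four per candidate. A sketch with a hypothetical helper; when it returns 0 the caller would fall back to per-point checks.

```c
/* Nonzero if every point within +/-range of (row, col) is inside the
 * inclusive limits [row_min, row_max] x [col_min, col_max]. */
static int all_neighbors_in_bounds(int row, int col, int range,
                                   int row_min, int row_max,
                                   int col_min, int col_max) {
  return row - range >= row_min && row + range <= row_max &&
         col - range >= col_min && col + range <= col_max;
}
```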
Fritz Koenig [Thu, 14 Oct 2010 00:08:13 +0000 (17:08 -0700)]
Fix compiler warning about vp8_fast_quantize_b_impl_ssse2.
Typo had function defined as _ssse2 and prototyped as _sse2.
Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
Fritz Koenig [Wed, 13 Oct 2010 23:57:57 +0000 (16:57 -0700)]
Correct QWORD usage in assembly files
QWORD was being undefined because it was being used
incorrectly.
Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
Fritz Koenig [Tue, 12 Oct 2010 21:55:31 +0000 (14:55 -0700)]
Add processor detection for x86.
Use cpuid to check the vendor string against known
architectures.
Change-Id: I3fbd7f73638d71857a0c4a44a6275eb295fb4cef
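CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order). The sketch below uses GCC/Clang's `<cpuid.h>` helper `__get_cpuid` for brevity, which is an assumption of this example: libvpx uses its own inline assembly (see the following commit about `=r` constraints and ebx). The enum and function names are hypothetical.

```c
#include <string.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

typedef enum { VENDOR_UNKNOWN, VENDOR_INTEL, VENDOR_AMD, VENDOR_VIA } cpu_vendor_t;

/* Map a 12-byte vendor string to a known vendor. */
static cpu_vendor_t vendor_from_string(const char v[12]) {
  if (!memcmp(v, "GenuineIntel", 12)) return VENDOR_INTEL;
  if (!memcmp(v, "AuthenticAMD", 12)) return VENDOR_AMD;
  if (!memcmp(v, "CentaurHauls", 12)) return VENDOR_VIA;
  return VENDOR_UNKNOWN;
}

static cpu_vendor_t detect_vendor(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  unsigned int eax, ebx, ecx, edx;
  char v[12];
  if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return VENDOR_UNKNOWN;
  memcpy(v, &ebx, 4);      /* e.g. "Genu" */
  memcpy(v + 4, &edx, 4);  /* "ineI" */
  memcpy(v + 8, &ecx, 4);  /* "ntel" */
  return vendor_from_string(v);
#else
  return VENDOR_UNKNOWN;   /* not an x86 GCC/Clang build */
#endif
}
```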
Fritz Koenig [Tue, 12 Oct 2010 16:42:03 +0000 (09:42 -0700)]
GCC inline restrictions were not adequate.
=r was not restrictive enough and the compiler was not returning
ebx correctly.
Change-Id: I7606e384067bd5fb69189802f1ff64ccc5aa02d6
John Koleszar [Thu, 7 Oct 2010 05:39:16 +0000 (22:39 -0700)]
Centralize mb skip state calculation
This patch moves the scattered updates to the mb skip state
(mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent
changes to the quantizer exposed a bug where if a macroblock
could be coded as a skip but isn't, the encoder would run the
loopfilter but the decoder wouldn't, causing a reference buffer
mismatch.
The loopfilter is controlled by a flag called dc_diff. The decoder
looks at the number of decoded coefficients when setting this flag.
The encoder sets this flag based on the skip state, since any
skippable macroblock should be transmitted as a skip. The coefficient
optimization pass (vp8_optimize_b()) could change the coefficients
such that a block that was not a skip becomes one. The encoder was
not updating the skip state in this situation for intra coded blocks.
The underlying issue predates it, but this bug was recently triggered
by enabling trellis quantization on the Y2 block in commit dcd29e3,
and by changing the quantizer range control in commit 305be4e.
Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
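The core of the fix is ordering: derive the skip flag in one place, after every pass that can change coefficients. A simplified sketch; the struct and helper names are hypothetical, and it treats a block as empty only when its end-of-block position is 0, ignoring the Y2/DC special cases the real encoder must handle.

```c
/* Hypothetical, simplified macroblock state (libvpx keeps the flag in
 * mode_info_context->mbmi.mb_skip_coeff). */
typedef struct {
  int mb_skip_coeff;
} mb_info_sketch_t;

/* Nonzero if no block has coefficients left; eobs[i] is block i's
 * end-of-block position after quantization and optimization. */
static int mb_has_no_coeffs(const int *eobs, int n_blocks) {
  for (int i = 0; i < n_blocks; i++)
    if (eobs[i] != 0) return 0;
  return 1;
}

/* Called once, at tokenization time, after all coefficient changes. */
static void update_skip_state(mb_info_sketch_t *mbmi, const int *eobs,
                              int n_blocks) {
  mbmi->mb_skip_coeff = mb_has_no_coeffs(eobs, n_blocks);
}
```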