Søren Sandmann Pedersen [Wed, 12 Jan 2011 08:02:59 +0000 (03:02 -0500)]
Add a test program, tri-test
This program tests whether the new triangle support works.
Søren Sandmann Pedersen [Tue, 11 Jan 2011 15:15:21 +0000 (10:15 -0500)]
Add support for triangles to pixman.
The Render X extension can draw triangles as well as trapezoids, but
the implementation has always converted them to trapezoids. This patch
moves the X server's triangle conversion code into pixman, where we
can reuse the pixman_composite_trapezoid() code.
Søren Sandmann Pedersen [Thu, 10 Feb 2011 15:37:08 +0000 (10:37 -0500)]
Add a test program for pixman_composite_trapezoids().
A CRC32 based test program to check that pixman_composite_trapezoids()
actually works.
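A CRC32-based test typically renders into a buffer, checksums the resulting bytes, and compares the checksum with a previously recorded value. The following is a minimal sketch of such a checksum, assuming the common reflected CRC-32 polynomial 0xEDB88320; the name crc32_bytes is hypothetical and is not claimed to match the helper used by pixman's test suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320.
 * Hypothetical helper for illustration only. Pass 0 as the initial crc. */
static uint32_t
crc32_bytes (uint32_t crc, const uint8_t *data, size_t len)
{
    size_t i;
    int bit;

    assert (data != NULL || len == 0);

    crc = ~crc;
    for (i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
        {
            /* Mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

A test would call this on the destination image bytes after compositing and fail if the value differs from the expected one.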
Søren Sandmann Pedersen [Tue, 11 Jan 2011 14:23:43 +0000 (09:23 -0500)]
Add pixman_composite_trapezoids().
This function is an implementation of the X server request
Trapezoids. That request is what the X backend of cairo is using all
the time; by moving it into pixman we can hopefully make it faster.
Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:40:53 +0000 (19:40 -0500)]
test/Makefile.am: Move all the TEST_LDADD into a new global LDADD.
This gets rid of a bunch of replicated *_LDADD clauses
Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:20:18 +0000 (19:20 -0500)]
Add @TESTPROGS_EXTRA_LDFLAGS@ to AM_LDFLAGS
Instead of explicitly adding it to each test program.
Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:16:39 +0000 (19:16 -0500)]
Move all the GTK+ based test programs to a new subdir, "demos"
This separates the test suite from the random gtk+ using test
programs. "demos" is somewhat misleading because the programs there
are not particularly exciting (with the possible exception of
composite-test which shows off all the compositing operators).
Siarhei Siamashka [Thu, 3 Feb 2011 22:47:36 +0000 (00:47 +0200)]
SSE2 optimization for nearest scaled over_8888_n_8888
This operation shows up a little bit in some of the html5 based
games from http://www.kesiev.com/akihabara/
=== Cairo trace of the game intro animation for 'Legend of Sadness' ===
before:
[ 0] image firefox-legend-of-sadness 46.286 46.298 0.01% 5/6
after:
[ 0] image firefox-legend-of-sadness 45.088 45.102 0.04% 6/6
=== Microbenchmark (scaling ~2000x~2000 -> ~2000x~2000) ===
before:
translucent: op=3, src=8888, mask=s dst=8888, speed=131.30 MPix/s
transparent: op=3, src=8888, mask=s dst=8888, speed=132.38 MPix/s
opaque: op=3, src=8888, mask=s dst=8888, speed=167.90 MPix/s
after:
translucent: op=3, src=8888, mask=s dst=8888, speed=301.93 MPix/s
transparent: op=3, src=8888, mask=s dst=8888, speed=770.70 MPix/s
opaque: op=3, src=8888, mask=s dst=8888, speed=301.80 MPix/s
Siarhei Siamashka [Wed, 3 Nov 2010 13:22:28 +0000 (15:22 +0200)]
ARM: NEON optimization for nearest scaled over_0565_8_0565
In some cases may be used for html5 video when hardware acceleration
is not available.
Siarhei Siamashka [Wed, 3 Nov 2010 13:16:28 +0000 (15:16 +0200)]
ARM: NEON optimization for nearest scaled over_8888_8_0565
In some cases may be used for html5 video when hardware acceleration
is not available.
Siarhei Siamashka [Wed, 3 Nov 2010 13:15:15 +0000 (15:15 +0200)]
ARM: new macro template for using scaled fast paths with a8 mask
Siarhei Siamashka [Wed, 2 Feb 2011 16:14:56 +0000 (18:14 +0200)]
Better support for NONE repeat in nearest scaling main loop template
Scaling function now gets an extra boolean argument, which is set
to TRUE when we are fetching padding pixels for NONE repeat. This
makes it possible to decide whether to interpret alpha as 0xFF or 0x00
for such pixels when working with formats which don't have an alpha
channel (for example x8r8g8b8 and r5g6b5).
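The decision described above can be sketched as follows. This is only an illustration: expand_x888 and its fully_transparent parameter are hypothetical names, not the signature of the actual scaling function.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an x8r8g8b8 pixel to a8r8g8b8.
 * 'fully_transparent' plays the role of the extra boolean argument:
 * nonzero when the pixel is padding outside a NONE-repeat image.
 * Hypothetical helper for illustration only. */
static uint32_t
expand_x888 (uint32_t pixel, int fully_transparent)
{
    if (fully_transparent)
        return 0;                  /* padding: alpha 0x00, transparent black */

    return pixel | 0xFF000000u;    /* real pixel: missing alpha means 0xFF */
}
```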
Siarhei Siamashka [Fri, 22 Oct 2010 14:54:41 +0000 (17:54 +0300)]
Support for a8 and solid mask in nearest scaling main loop template
In addition to the most common case of not having any mask at all, two
variants of scaling with mask show up in cairo traces:
1. non-scaled a8 mask with SAMPLES_COVER_CLIP flag
2. solid mask
This patch extends the nearest scaling main loop template to also
support these cases.
Siarhei Siamashka [Fri, 22 Oct 2010 13:29:01 +0000 (16:29 +0300)]
test: Extend scaling-test to support a8/solid mask and ADD operation
Image width also has been increased because SIMD optimizations typically
do more unrolling in the inner loops, and this needs to be tested.
Siarhei Siamashka [Mon, 17 Jan 2011 00:29:43 +0000 (02:29 +0200)]
Use const modifiers for source buffers in nearest scaling fast paths
Siarhei Siamashka [Fri, 30 Jul 2010 15:37:51 +0000 (18:37 +0300)]
C fast paths for a simple 90/270 degrees rotation
Depending on CPU architecture, performance is in the range of 1.5 to 4 times
slower than a simple nonrotated copy (which would be the ideal case, perfectly
utilizing memory bandwidth), but is still more than 7 times faster than
the general path.
This implementation sets a performance baseline for rotation. The use
of SIMD instructions may further improve memory bandwidth utilization.
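For reference, a 90 degree rotation of a 32bpp image reduces to an index remapping. The sketch below is a naive loop under assumed conventions (clockwise rotation, strides in pixels); it is not the fast path from this commit, which may well be organized differently to make better use of the cache.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32bpp image 90 degrees clockwise.
 * src is src_w x src_h pixels; dst must hold src_h x src_w pixels.
 * Strides are given in pixels. Naive loop for illustration only. */
static void
rotate_90_cw_8888 (uint32_t *dst, int dst_stride,
                   const uint32_t *src, int src_stride,
                   int src_w, int src_h)
{
    int x, y;

    assert (src_w >= 0 && src_h >= 0);

    for (y = 0; y < src_h; y++)
    {
        for (x = 0; x < src_w; x++)
        {
            /* Source (x, y) lands at column (src_h - 1 - y), row x */
            dst[x * dst_stride + (src_h - 1 - y)] = src[y * src_stride + x];
        }
    }
}
```

The writes walk down a destination column while the reads walk along a source row, which is why a rotated copy cannot match the memory bandwidth of a plain copy.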
Siarhei Siamashka [Thu, 29 Jul 2010 14:58:13 +0000 (17:58 +0300)]
New flags for 90/180/270 rotation
These flags are set when the transform is a simple nonscaled 90/180/270
degrees rotation.
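One way to recognize such a transform is to compare the linear part of the 16.16 fixed point matrix against the four exact rotation matrices. The sketch below assumes a row-major 2x2 matrix and a counterclockwise, y-up convention; classify_rotation is a hypothetical helper and does not reproduce how pixman actually computes its flags.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_1 ((int32_t) 1 << 16)   /* 1.0 in 16.16 fixed point */

/* m holds the 2x2 linear part of a transform in row-major order:
 * { m00, m01, m10, m11 }. Returns 0, 90, 180 or 270 when the matrix is
 * exactly that nonscaled rotation, and -1 otherwise. Translation is
 * ignored. Hypothetical helper for illustration only. */
static int
classify_rotation (const int32_t m[4])
{
    if (m[0] == FIXED_1 && m[1] == 0 && m[2] == 0 && m[3] == FIXED_1)
        return 0;
    if (m[0] == 0 && m[1] == -FIXED_1 && m[2] == FIXED_1 && m[3] == 0)
        return 90;
    if (m[0] == -FIXED_1 && m[1] == 0 && m[2] == 0 && m[3] == -FIXED_1)
        return 180;
    if (m[0] == 0 && m[1] == FIXED_1 && m[2] == -FIXED_1 && m[3] == 0)
        return 270;

    return -1;
}
```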
Siarhei Siamashka [Tue, 26 Oct 2010 12:40:01 +0000 (15:40 +0300)]
test: affine-test updated to stress 90/180/270 degrees rotation more
Søren Sandmann Pedersen [Thu, 10 Feb 2011 10:21:42 +0000 (05:21 -0500)]
Add pixman-conical-gradient.c to Makefile.win32.
Pointed out by Kirill Tishin.
Søren Sandmann Pedersen [Sun, 23 Jan 2011 21:53:26 +0000 (16:53 -0500)]
Add SSE2 fetcher for 0565
Before:
add_0565_0565 = L1: 61.08 L2: 61.03 M: 60.57 ( 10.95%) HT: 46.85 VT: 45.25 R: 39.99 RT: 20.41 ( 233Kops/s)
After:
add_0565_0565 = L1: 77.84 L2: 76.25 M: 75.38 ( 13.71%) HT: 55.99 VT: 54.56 R: 45.41 RT: 21.95 ( 255Kops/s)
Søren Sandmann Pedersen [Fri, 31 Dec 2010 05:57:46 +0000 (00:57 -0500)]
Improve performance of sse2_combine_over_u()
Split this function into two, one that has a mask, and one that
doesn't. This is a fairly substantial speed-up in many cases.
New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 63.76 L2: 62.75 M: 59.37 ( 21.55%) HT: 45.89 VT: 43.55 R: 34.51 RT: 16.80 ( 201Kops/s)
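The idea of the split can be illustrated in plain C: the mask test is done once per scanline, and each loop body contains only the work it needs. This is a scalar sketch for premultiplied a8r8g8b8, not the SSE2 code; all function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded a * b / 255 for 8-bit values */
static uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Multiply all four 8-bit channels of a pixel by an 8-bit factor */
static uint32_t
scale_pixel (uint32_t p, uint32_t f)
{
    return (mul_un8 (p >> 24, f) << 24) |
           (mul_un8 ((p >> 16) & 0xFF, f) << 16) |
           (mul_un8 ((p >> 8) & 0xFF, f) << 8) |
            mul_un8 (p & 0xFF, f);
}

/* OVER: dest = src + dest * (255 - src alpha) / 255.
 * Channels cannot overflow as long as src is premultiplied. */
static uint32_t
over (uint32_t s, uint32_t d)
{
    return s + scale_pixel (d, 255 - (s >> 24));
}

static void
combine_over_no_mask (uint32_t *dest, const uint32_t *src, int width)
{
    int i;
    for (i = 0; i < width; i++)
        dest[i] = over (src[i], dest[i]);
}

static void
combine_over_mask (uint32_t *dest, const uint32_t *src,
                   const uint32_t *mask, int width)
{
    int i;
    /* Unified alpha: only the alpha channel of the mask is used */
    for (i = 0; i < width; i++)
        dest[i] = over (scale_pixel (src[i], mask[i] >> 24), dest[i]);
}

/* Dispatch once per scanline instead of testing the mask per pixel */
static void
combine_over_u (uint32_t *dest, const uint32_t *src,
                const uint32_t *mask, int width)
{
    if (mask)
        combine_over_mask (dest, src, mask, width);
    else
        combine_over_no_mask (dest, src, width);
}
```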
Søren Sandmann Pedersen [Sun, 23 Jan 2011 21:17:17 +0000 (16:17 -0500)]
Add SSE2 fetcher for a8
New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 57.85 L2: 56.80 M: 54.14 ( 19.50%) HT: 42.64 VT: 40.56 R: 32.67 RT: 16.22 ( 195Kops/s)
Based in part on code by Steve Snyder from
https://bugs.freedesktop.org/show_bug.cgi?id=21173
Søren Sandmann Pedersen [Wed, 12 Jan 2011 11:38:54 +0000 (06:38 -0500)]
Add SSE2 fetcher for x8r8g8b8
New output of lowlevel-blt-bench over_x888_8_0565:
over_x888_8_0565 = L1: 55.68 L2: 55.11 M: 52.83 ( 19.04%) HT: 39.62 VT: 37.70 R: 30.88 RT: 14.62 ( 174Kops/s)
The fetcher is looked up in a table, so that other fetchers can easily
be added.
See also https://bugs.freedesktop.org/show_bug.cgi?id=20709
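A table-driven lookup of this kind might look like the sketch below. It is written in plain C rather than SSE2, and the format enum, the fetcher signature, and lookup_fetcher are all assumptions made for illustration. Adding a fetcher for another format only requires a new table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative format codes, not pixman's pixman_format_code_t */
typedef enum { FMT_X8R8G8B8, FMT_A8, FMT_R5G6B5 } format_t;

typedef void (*fetch_fn_t) (uint32_t *out, const void *bits, int width);

static void
fetch_x8r8g8b8 (uint32_t *out, const void *bits, int width)
{
    const uint32_t *p = bits;
    int i;
    for (i = 0; i < width; i++)
        out[i] = p[i] | 0xFF000000u;      /* no alpha channel: force opaque */
}

static void
fetch_a8 (uint32_t *out, const void *bits, int width)
{
    const uint8_t *p = bits;
    int i;
    for (i = 0; i < width; i++)
        out[i] = (uint32_t) p[i] << 24;   /* alpha only, color channels zero */
}

static const struct
{
    format_t   format;
    fetch_fn_t fetch;
} fetchers[] =
{
    { FMT_X8R8G8B8, fetch_x8r8g8b8 },
    { FMT_A8,       fetch_a8 },
};

/* Returns NULL when no specialized fetcher exists, so that the caller
 * can fall back to a generic path. */
static fetch_fn_t
lookup_fetcher (format_t format)
{
    size_t i;
    for (i = 0; i < sizeof (fetchers) / sizeof (fetchers[0]); i++)
    {
        if (fetchers[i].format == format)
            return fetchers[i].fetch;
    }
    return NULL;
}
```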
Søren Sandmann Pedersen [Sat, 22 Jan 2011 22:13:19 +0000 (17:13 -0500)]
Add a test for over_x888_8_0565 in lowlevel_blt_bench().
The next few commits will speed this up quite a bit.
Current output:
---
reference memcpy speed = 2217.5MB/s (554.4MP/s for 32bpp fills)
---
over_x888_8_0565 = L1: 54.67 L2: 54.01 M: 52.33 ( 18.88%) HT: 37.19 VT: 35.54 R: 29.40 RT: 13.63 ( 162Kops/s)
Søren Sandmann Pedersen [Mon, 24 Jan 2011 17:24:42 +0000 (12:24 -0500)]
Move fallback decisions from implementations into pixman-cpu.c.
Instead of having each individual implementation decide which fallback
to use, move it into pixman-cpu.c, where a more global decision can be
made.
This is accomplished by adding a "fallback" argument to all the
pixman_implementation_create_*() implementations, and then in
_pixman_choose_implementation() pass in the desired fallback.
Søren Sandmann Pedersen [Fri, 21 Jan 2011 19:47:33 +0000 (14:47 -0500)]
Print a warning when a development snapshot is being configured.
It seems to be relatively common for people to use development
snapshots of pixman thinking they are ordinary releases. This patch
makes it such that if the current minor version is odd, configure will
print a banner explaining the version number scheme plus information
about where to report bugs.
Rolland Dudemaine [Tue, 25 Jan 2011 13:08:26 +0000 (15:08 +0200)]
Fix "variable was set but never used" warnings
Removes useless variable declarations. This can only result in more
efficient code, as these variables were sometimes assigned, but
their values were never used.
Rolland Dudemaine [Tue, 25 Jan 2011 12:14:57 +0000 (14:14 +0200)]
test: Use the right enum types instead of int to fix warnings
Green Hills Software MULTI compiler was producing a number
of warnings due to incorrect uses of int instead of the correct
corresponding pixman_*_t type.
Rolland Dudemaine [Tue, 25 Jan 2011 12:52:49 +0000 (14:52 +0200)]
Correct the initialization of 'max_vx'
http://lists.freedesktop.org/archives/pixman/2011-January/000937.html
Rolland Dudemaine [Tue, 25 Jan 2011 11:55:28 +0000 (13:55 +0200)]
test: Fix for mismatched 'fence_malloc' prototype/implementation
Solves compilation problem when 'mprotect' is not available. For
example, when using Green Hills Software MULTI compiler or mingw:
http://lists.freedesktop.org/archives/pixman/2011-January/000939.html
Siarhei Siamashka [Mon, 10 Jan 2011 19:01:16 +0000 (21:01 +0200)]
The code in 'bitmap_addrect' already assumes non-null 'reg->data'
So the check of 'reg->data' pointer can be safely removed.
Søren Sandmann Pedersen [Wed, 19 Jan 2011 12:47:52 +0000 (07:47 -0500)]
Post-release version bump to 0.21.5
Søren Sandmann Pedersen [Wed, 19 Jan 2011 12:38:24 +0000 (07:38 -0500)]
Pre-release version bump to 0.21.4
Søren Sandmann Pedersen [Mon, 17 Jan 2011 19:12:20 +0000 (14:12 -0500)]
Fix dangling-pointer bug in bits_image_fetch_bilinear_no_repeat_8888().
The mask_bits variable is only declared in a limited scope, so the
pointer to it becomes invalid instantly. Somehow this didn't actually
trigger any bugs, but Brent Fulgham reported that Bounds Checker was
complaining about it.
Fix the bug by moving mask_bits to the function scope.
Andrea Canciani [Wed, 12 Jan 2011 16:43:40 +0000 (17:43 +0100)]
Add a test for radial gradients
radial-test is a port of the radial-gradient test from the cairo test
suite. It has been modified so that some pixels have 0 in both the a
and b coefficients of the quadratic equation solved by the rasterizer,
to expose a division by zero in the original implementation.
Søren Sandmann Pedersen [Sun, 12 Dec 2010 12:34:42 +0000 (07:34 -0500)]
Fix destination fetching
When fetching from destinations, we need to ignore transformations,
repeat and filtering. Currently we don't ignore them, which means all
kinds of bad things can happen.
This patch fixes this problem by directly calling the scanline fetchers
for destinations instead of going through the full
get_scanline_32/64().
Søren Sandmann Pedersen [Sun, 12 Dec 2010 14:19:13 +0000 (09:19 -0500)]
Turn on testing for destination transformation
Søren Sandmann Pedersen [Sat, 11 Dec 2010 13:10:04 +0000 (08:10 -0500)]
Skip fetching pixels when possible
Add two new iterator flags, ITER_IGNORE_ALPHA and ITER_IGNORE_RGB that
are set when the alpha and rgb values are not needed. If both are set,
then we can skip fetching entirely and just use
_pixman_iter_get_scanline_noop.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 21:55:55 +0000 (16:55 -0500)]
Add direct-write optimization back
Introduce a new ITER_LOCALIZED_ALPHA flag that indicates that the
alpha value computed is used only for the alpha channel of the output;
it doesn't affect the RGB channels.
Then in pixman-bits-image.c, if a destination is either a8r8g8b8 or
x8r8g8b8 with localized alpha, the iterator will return a pointer
directly into the image.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 20:18:48 +0000 (15:18 -0500)]
Get rid of the classify methods
They are not used anymore, and the linear gradient is now doing the
optimization in a different way.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 20:14:24 +0000 (15:14 -0500)]
Linear: Optimize for horizontal gradients
If the gradient is horizontal, we can reuse the same scanline over and
over. Add support for this optimization to
_pixman_linear_gradient_iter_init().
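The optimization can be sketched with a small iterator that fills its buffer once and hands the same buffer back for every later scanline. The struct, the placeholder color ramp, and the fill_count counter below are hypothetical and exist only to make the reuse visible; they are not the pixman iterator.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    uint32_t *buffer;      /* space for 'width' pixels */
    int       width;
    int       y;           /* current scanline */
    int       filled;      /* nonzero once buffer holds a valid scanline */
    int       horizontal;  /* nonzero if every scanline is identical */
} grad_iter_t;

static int fill_count;     /* counts scanline computations, for demonstration */

/* Placeholder for the real gradient walker: an opaque blue ramp along x */
static void
fill_scanline (grad_iter_t *it)
{
    int x;

    assert (it->width > 1);

    for (x = 0; x < it->width; x++)
        it->buffer[x] = 0xFF000000u | (uint32_t) (x * 255 / (it->width - 1));

    it->filled = 1;
    fill_count++;
}

static uint32_t *
get_scanline (grad_iter_t *it)
{
    /* Recompute only when scanlines differ or nothing is cached yet */
    if (!it->horizontal || !it->filled)
        fill_scanline (it);

    it->y++;
    return it->buffer;
}
```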
Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:59:20 +0000 (14:59 -0500)]
Consolidate the various get_scanline_32() into get_scanline_narrow()
The separate get_scanline_32() functions in solid, linear, radial and
conical images are no longer necessary because all access to these
images now goes through iterators.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:44:22 +0000 (14:44 -0500)]
Allow NULL property_changed function
Initialize the field to NULL, and then delete the empty functions from
the solid, linear, radial, and conical images.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:39:01 +0000 (14:39 -0500)]
Move get_scanline_32/64 to the bits part of the image struct
At this point these functions are basically a cache that the bits
image uses for its fetchers, so they can be moved to the bits image.
With the scanline getters only being initialized in the bits image,
the _pixman_image_get_scanline_generic_64 can be moved to
pixman-bits-image.c. That gets rid of the final user of
_pixman_image_get_scanline_32/64, so these can be deleted.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 15:53:02 +0000 (10:53 -0500)]
Use an iterator in pixman_image_get_solid()
This is a step towards getting rid of the
_pixman_image_get_scanline_32/64() functions.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 18:26:53 +0000 (13:26 -0500)]
Virtualize iterator initialization
Make src_iter_init() and dest_iter_init() virtual methods in the
implementation struct. This allows individual implementations to plug
in their own CPU specific scanline fetchers.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:40:26 +0000 (12:40 -0500)]
Move iterator initialization to the respective image files
Instead of calling _pixman_image_get_scanline_32/64(), move the
iterator initialization into the respective image implementations and
call the scanline generators directly.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:31:29 +0000 (12:31 -0500)]
Eliminate the _pixman_image_store_scanline_32/64 functions
They were only called from next_line_write_narrow/wide, so they could
simply be absorbed into those functions.
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:19:50 +0000 (12:19 -0500)]
Move initialization of iterators for bits images to pixman-bits-image.c
pixman_iter_t is now defined in pixman-private.h, and iterators for
bits images are being initialized in pixman-bits-image.c
Søren Sandmann Pedersen [Fri, 10 Dec 2010 16:30:27 +0000 (11:30 -0500)]
Add iterators in the general implementation
We add a new structure called a pixman_iter_t that encapsulates the
information required to read scanlines from an image. It contains two
functions, get_scanline() and write_back(). The get_scanline()
function will generate pixels for the current scanline. For iterators
for source images, it will also advance to the next scanline. The
write_back() function is only called for destination images. Its
function is to write back the modified pixels to the image and then
advance to the next scanline.
When an iterator is initialized, it is passed this information:
- The image to iterate
- The rectangle to be iterated
- A buffer that the iterator may (but is not required to) use. This
buffer is guaranteed to have space for at least width pixels.
- A flag indicating whether a8r8g8b8 or a16r16g16b16 pixels should
be fetched
There are a number of (eventual) benefits to the iterators:
- The initialization of the iterator can be virtualized such that
implementations can plug in their own CPU specific get_scanline()
and write_back() functions.
- If an image is horizontal, it can simply plug in an appropriate
get_scanline(). This way we can get rid of the annoying
classify() virtual function.
- In general, iterators can remember what they did on the last
scanline, so for example a REPEAT_NONE image might reuse the same
data for all the empty scanlines generated by the zero-extension.
- More detailed information can be passed to the iterator, allowing
more specialized fetchers to be used.
- We can fix the bug where destination filters and transformations
are not currently being ignored as they should be.
However, this initial implementation is not optimized at all. We lose
several existing optimizations:
- The ability to composite directly in the destination
- The ability to only fetch one scanline for horizontal images
- The ability to avoid fetching the src and mask for the CLEAR
operator
Later patches will re-introduce these optimizations.
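The get_scanline()/write_back() protocol described in this commit can be sketched roughly as follows. This is a simplified illustration for a8r8g8b8 only: the field names, iter_init, and composite_src are assumptions, the narrow/wide flag is omitted, and the real pixman_iter_t may be laid out differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct iter iter_t;

struct iter
{
    uint32_t *bits;      /* image pixels, a8r8g8b8 */
    int       stride;    /* in pixels */
    int       x, y;      /* current position of the iterated rectangle */
    int       width;     /* width of the iterated rectangle */
    uint32_t *buffer;    /* scratch space for at least 'width' pixels */

    uint32_t *(*get_scanline) (iter_t *iter);
    void      (*write_back)   (iter_t *iter);
};

/* Source iterator: generate the current scanline, then advance */
static uint32_t *
src_get_scanline (iter_t *iter)
{
    memcpy (iter->buffer, iter->bits + iter->y * iter->stride + iter->x,
            (size_t) iter->width * sizeof (uint32_t));
    iter->y++;
    return iter->buffer;
}

/* Destination iterator: fetch without advancing */
static uint32_t *
dest_get_scanline (iter_t *iter)
{
    memcpy (iter->buffer, iter->bits + iter->y * iter->stride + iter->x,
            (size_t) iter->width * sizeof (uint32_t));
    return iter->buffer;
}

/* Store the modified pixels, then advance to the next scanline */
static void
dest_write_back (iter_t *iter)
{
    memcpy (iter->bits + iter->y * iter->stride + iter->x, iter->buffer,
            (size_t) iter->width * sizeof (uint32_t));
    iter->y++;
}

static void
iter_init (iter_t *iter, uint32_t *bits, int stride,
           int x, int y, int width, uint32_t *buffer, int is_dest)
{
    iter->bits = bits;
    iter->stride = stride;
    iter->x = x;
    iter->y = y;
    iter->width = width;
    iter->buffer = buffer;
    iter->get_scanline = is_dest ? dest_get_scanline : src_get_scanline;
    iter->write_back = is_dest ? dest_write_back : NULL;
}

/* SRC operator (plain copy) over 'height' scanlines */
static void
composite_src (iter_t *src, iter_t *dest, int height)
{
    while (height-- > 0)
    {
        uint32_t *s = src->get_scanline (src);
        uint32_t *d = dest->get_scanline (dest);

        memcpy (d, s, (size_t) dest->width * sizeof (uint32_t));
        dest->write_back (dest);
    }
}
```

Because the function pointers are filled in at initialization time, an implementation could install CPU specific variants there without changing the compositing loop.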
Siarhei Siamashka [Tue, 11 Jan 2011 12:36:24 +0000 (14:36 +0200)]
ARM: do /proc/self/auxv based cpu features detection only in linux
This method is linux specific, but earlier it was tried for any platform
that did not have _MSC_VER macro defined.
Siarhei Siamashka [Mon, 13 Sep 2010 01:21:33 +0000 (04:21 +0300)]
A new configure option --enable-static-testprogs
This option can be used for building fully static binaries of the test
programs so that they can be easily run using qemu-user. With binfmt-misc
configured, 'make check' works fine for crosscompiled pixman builds.
Siarhei Siamashka [Mon, 10 Jan 2011 16:29:33 +0000 (18:29 +0200)]
Make 'fast_composite_scaled_nearest_*' less suspicious
Taking address of a variable and then using it as an array looks suspicious
to static code analyzers. So change it into an array with 1 element to make
them happy. Both old and new variants of this code are correct because 'vx'
and 'unit_x' arguments are set to 0 and it means that the called scanline
function can only access a single element of 'zero' buffer.
Siarhei Siamashka [Mon, 10 Jan 2011 16:09:16 +0000 (18:09 +0200)]
Bugfix for a corner case in 'pixman_transform_is_inverse'
When 'pixman_transform_multiply' fails, the result of the multiplication
could not have been the identity matrix (one of the values in the resulting
matrix can't be represented as a 16.16 fixed point value). So it is safe
to return FALSE.
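The reasoning can be sketched with a simplified 16.16 matrix multiply that reports failure on overflow. The types and functions below are stand-ins written for this illustration (rounding and the identity test are simplified) and are not the pixman implementations.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t m[3][3]; } transform_t;   /* 16.16 fixed point */

/* dst = l * r. Returns 0 when an entry of the product does not fit
 * into a 16.16 value, 1 otherwise. Simplified stand-in. */
static int
transform_multiply (transform_t *dst,
                    const transform_t *l, const transform_t *r)
{
    transform_t d;
    int i, j, k;

    for (i = 0; i < 3; i++)
    {
        for (j = 0; j < 3; j++)
        {
            int64_t v = 0;

            for (k = 0; k < 3; k++)
                v += (int64_t) l->m[i][k] * r->m[k][j] / 65536;

            if (v > INT32_MAX || v < INT32_MIN)
                return 0;

            d.m[i][j] = (int32_t) v;
        }
    }

    *dst = d;
    return 1;
}

static int
transform_is_identity (const transform_t *t)
{
    int i, j;

    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            if (t->m[i][j] != (i == j ? 65536 : 0))
                return 0;

    return 1;
}

static int
transform_is_inverse (const transform_t *a, const transform_t *b)
{
    transform_t t;

    /* An unrepresentable product cannot be the identity matrix */
    if (!transform_multiply (&t, a, b))
        return 0;

    return transform_is_identity (&t);
}
```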
Siarhei Siamashka [Tue, 4 Jan 2011 11:42:29 +0000 (13:42 +0200)]
Workaround for a preprocessor issue in old Sun Studio
Patch from Peter O'Gorman with some modifications
https://bugs.freedesktop.org//show_bug.cgi?id=32764
Siarhei Siamashka [Tue, 4 Jan 2011 06:41:02 +0000 (08:41 +0200)]
Fix for "syntax error: empty declaration" Solaris Studio warnings
Siarhei Siamashka [Tue, 4 Jan 2011 06:18:38 +0000 (08:18 +0200)]
Revert "Fix "syntax error: empty declaration" warnings."
This reverts commit b924bb1f8191cc7c386d8211d9822aeeaadcab44.
There is a better fix for these Solaris Studio warnings.
Andrea Canciani [Tue, 23 Nov 2010 10:37:54 +0000 (11:37 +0100)]
Improve handling of tangent circles
When b is 0, avoid the division by zero and just return transparent
black.
When the solution t would have an invalid radius (negative or outside
[0,1] for non-extended gradients), return transparent black.
Søren Sandmann Pedersen [Mon, 20 Dec 2010 21:11:48 +0000 (16:11 -0500)]
sse2: Skip src pixels that are zero in sse2_composite_over_8888_n_8888()
This is a big speed-up in the SVG helicopter game:
http://ie.microsoft.com/testdrive/Performance/Helicopter/Default.xhtml
when rendered by Firefox 4 since it is compositing big images
consisting almost entirely of zeros.
Søren Sandmann Pedersen [Sat, 18 Dec 2010 11:06:39 +0000 (06:06 -0500)]
Fix divide-by-zero in set_lum().
When (l - min) or (max - l) are zero, simply set all the channels to
the limit, 0 in the case of (l - min), and a in the case of (max - l).
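The guarded clipping step can be sketched as below. This is a simplified integer version written for illustration: the 30/59/11 percent luminosity weights and the function names are assumptions, and the real set_lum() works on differently scaled values.

```c
#include <assert.h>

static int
min3 (int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

static int
max3 (int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Luminosity with integer percent weights; a simplification */
static int
lum (const int c[3])
{
    return (30 * c[0] + 59 * c[1] + 11 * c[2]) / 100;
}

/* Clip the channels into [0, a] while keeping the luminosity, guarding
 * both divisions against a zero denominator. */
static void
clip_color (int c[3], int a)
{
    int l = lum (c);
    int min = min3 (c[0], c[1], c[2]);
    int max = max3 (c[0], c[1], c[2]);
    int i;

    if (min < 0)
    {
        for (i = 0; i < 3; i++)
            c[i] = (l - min == 0) ? 0 : l + (c[i] - l) * l / (l - min);
    }
    if (max > a)
    {
        for (i = 0; i < 3; i++)
            c[i] = (max - l == 0) ? a : l + (c[i] - l) * (a - l) / (max - l);
    }
}
```

The zero denominators occur when all three channels are equal, so that the luminosity coincides with both the minimum and the maximum.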
Søren Sandmann Pedersen [Sat, 18 Dec 2010 11:05:52 +0000 (06:05 -0500)]
Add a test compositing with the various PDF operators.
The test has floating point exceptions enabled, and currently fails
with a divide-by-zero.
Cyril Brulebois [Sun, 19 Dec 2010 18:37:26 +0000 (19:37 +0100)]
Fix linking issues when HAVE_FEENABLEEXCEPT is set.
All objects using test/utils.c fail to link:
| CCLD region-test
| /usr/bin/ld: utils.o: in function enable_fp_exceptions:utils.c(.text+0x939): error: undefined reference to 'feenableexcept'
There's indeed no explicit dependency on -lm, and if HAVE_FEENABLEEXCEPT
happens to be set, test/utils.c uses feenableexcept(), which is nowhere
to be found while linking.
Fix this by adding -lm to TEST_LDADD, although two alternatives could be
thought of:
- Only specifying -lm for objects using util.c.
- Introducing a conditional to add -lm only when configure detects
have_feenableexcept=yes.
Signed-off-by: Cyril Brulebois <kibi@debian.org>
Jon TURNEY [Sat, 18 Dec 2010 18:32:39 +0000 (18:32 +0000)]
Remove stray #include <fenv.h>
Remove a stray #include <fenv.h> added in commit
2444b2265abeaf6dcf3df1763bc2711684e63bb8, to fix compilation on
platforms which don't have fenv.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Søren Sandmann Pedersen [Tue, 24 Aug 2010 01:55:02 +0000 (21:55 -0400)]
Add a stress-test program.
This test program tries to use as many rarely-used features as
possible, including alpha maps, accessor functions, oddly-sized
images, strange transformations, conical gradients, etc.
The hope is to provoke crashes or irregular behavior in pixman.
Søren Sandmann Pedersen [Tue, 12 Oct 2010 14:56:26 +0000 (10:56 -0400)]
Make the argument to fence_malloc() an int64_t
That way we can detect if someone attempts to allocate a negative size
and abort instead of just returning NULL and segfaulting later.
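The benefit of a signed 64-bit parameter can be sketched with an ordinary allocator wrapper. Note that checked_malloc is a hypothetical stand-in: it only shows the argument check and does not set up the guard pages that fence_malloc() is about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate 'len' bytes. A negative length is a caller bug, so abort
 * loudly instead of returning NULL. With an unsigned size parameter the
 * negative value would silently wrap around to a huge positive size. */
static void *
checked_malloc (int64_t len)
{
    if (len < 0)
        abort ();

    /* Reject sizes that do not fit into size_t on this platform */
    if ((uint64_t) len > SIZE_MAX)
        return NULL;

    return malloc ((size_t) len);
}
```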
Søren Sandmann Pedersen [Sun, 29 Aug 2010 22:02:02 +0000 (18:02 -0400)]
test/utils.c: Initialize palette->rgba to 0.
That way it can be used with palettes that are not statically
allocated, without causing valgrind issues.
Søren Sandmann Pedersen [Tue, 24 Aug 2010 01:02:02 +0000 (21:02 -0400)]
test: Move palette initialization to utils.[ch]
Søren Sandmann Pedersen [Wed, 20 Oct 2010 17:12:37 +0000 (13:12 -0400)]
Extend gradient-crash-test
Test the gradients with various transformations, and test cases where
the gradients are specified with two identical points.
Søren Sandmann Pedersen [Wed, 20 Oct 2010 17:53:07 +0000 (13:53 -0400)]
Add enable_fp_exceptions() function in utils.[ch]
This function enables floating point traps if possible.
Søren Sandmann Pedersen [Tue, 24 Aug 2010 00:56:11 +0000 (20:56 -0400)]
test: Make composite test use some existing macros instead of defining its own
Also move the ARRAY_LENGTH macro into utils.h so it can be used elsewhere.
Siarhei Siamashka [Fri, 17 Dec 2010 13:29:58 +0000 (15:29 +0200)]
COPYING: added Nokia to the list of copyright holders
Siarhei Siamashka [Mon, 29 Nov 2010 22:31:06 +0000 (00:31 +0200)]
Fix for potential unaligned memory accesses
The temporary scanline buffer allocated on stack was declared
as uint8_t array. As a result, the compiler was free to select
any arbitrary alignment for it (even though there is typically
no reason to use really weird alignments here and the stack is
normally at least 4 bytes aligned on most platforms). Having
improper alignment is non-portable and can impact performance
or even make the code misbehave depending on the target platform.
Using uint64_t type for this array should ensure that any possible
memory accesses done by pixman code are going to be handled correctly
(pixman-combine64.c can access this buffer via uint64_t * pointer).
Some alignment related problem was reported in:
http://lists.freedesktop.org/archives/pixman/2010-November/000747.html
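The fix amounts to declaring the scratch storage with the widest type that will be used to access it and viewing it through a byte pointer where needed. A minimal sketch, with an assumed buffer size and hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCANLINE_BYTES 256   /* illustrative size */

/* Used only to query the alignment requirement of uint64_t */
struct align_probe { char c; uint64_t v; };

/* Returns nonzero if a stack buffer declared as uint64_t[] is suitably
 * aligned for uint64_t access when viewed through a byte pointer. */
static int
scanline_buffer_is_aligned (void)
{
    uint64_t storage[SCANLINE_BYTES / sizeof (uint64_t)];
    uint8_t *bytes = (uint8_t *) storage;   /* same memory, byte view */
    size_t required = offsetof (struct align_probe, v);

    bytes[0] = 0;   /* touch the buffer through the byte view */

    return ((uintptr_t) bytes % required) == 0;
}
```

Had the buffer been declared as a uint8_t array, the compiler would have been allowed to place it at any address, and the same check could fail.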
Siarhei Siamashka [Thu, 25 Nov 2010 00:28:29 +0000 (02:28 +0200)]
ARM: added 'neon_src_rpixbuf_8888' fast path
With this optimization added, pixman assisted conversion from
non-premultiplied to premultiplied alpha format is now fully
NEON optimized (both with and without R/B color components
swapping in the process).
Siarhei Siamashka [Mon, 29 Nov 2010 07:11:29 +0000 (09:11 +0200)]
ARM: added 'neon_composite_in_n_8' fast path
Siarhei Siamashka [Mon, 29 Nov 2010 07:00:46 +0000 (09:00 +0200)]
ARM: added flags parameter to some asm fast path wrapper macros
Not all types of operations can be skipped when having transparent
solid source or transparent solid mask. Add an extra flags parameter
for providing this information to the wrappers.
Siarhei Siamashka [Mon, 29 Nov 2010 01:31:32 +0000 (03:31 +0200)]
ARM: added 'neon_composite_add_8888_n_8888' fast path
Siarhei Siamashka [Mon, 29 Nov 2010 00:38:52 +0000 (02:38 +0200)]
ARM: added 'neon_composite_add_n_8_8888' fast path
Siarhei Siamashka [Mon, 29 Nov 2010 00:10:22 +0000 (02:10 +0200)]
ARM: better NEON instructions scheduling for add_8888_8888_8888
Provides a minor performance improvement by using pipelining and hiding
instruction latencies. Also do not clobber d0-d3 registers (source
image pixels) while doing calculations in order to allow the use of
the same macro for add_n_8_8888 fast path later.
Benchmark from ARM Cortex-A8 @500MHz:
== before ==
add_8888_8888_8888 = L1: 95.94 L2: 42.27 M: 25.60 (121.09%)
HT: 14.54 VT: 13.13 R: 12.77 RT: 4.49 (48Kops/s)
add_8888_8_8888 = L1: 104.51 L2: 57.81 M: 36.06 (106.62%)
HT: 19.24 VT: 16.45 R: 14.71 RT: 4.80 (51Kops/s)
== after ==
add_8888_8888_8888 = L1: 106.66 L2: 47.82 M: 27.32 (129.30%)
HT: 15.44 VT: 13.96 R: 12.86 RT: 4.48 (48Kops/s)
add_8888_8_8888 = L1: 107.72 L2: 61.02 M: 38.26 (113.16%)
HT: 19.48 VT: 16.72 R: 14.82 RT: 4.80 (51Kops/s)
Siarhei Siamashka [Sun, 28 Nov 2010 20:05:53 +0000 (22:05 +0200)]
ARM: added 'neon_composite_add_8888_8_8888' fast path
Siarhei Siamashka [Sat, 27 Nov 2010 13:53:54 +0000 (15:53 +0200)]
ARM: added 'neon_composite_over_0565_n_0565' fast path
Siarhei Siamashka [Sat, 27 Nov 2010 02:47:39 +0000 (04:47 +0200)]
ARM: reuse common NEON code for over_{n_8|8888_n|8888_8}_0565
Renamed supplementary macros from 'over_n_8_0565' to 'over_8888_8_0565',
because they can actually support all variants of this operation:
over_8888_8_0565/over_n_8_0565/over_8888_n_0565.
Also 'over_8888_8_0565' now uses more optimized common code instead of its
own variant, improving performance a bit. Even though this operation is
still memory bandwidth limited, scaled variants of these fast paths may
put more stress on CPU later.
Benchmarked on ARM Cortex-A8 @500MHz:
== before ==
over_8888_8_0565 = L1: 67.10 L2: 53.82 M: 44.70 (105.17%)
HT: 18.73 VT: 16.91 R: 14.25 RT: 4.80 (52Kops/s)
== after ==
over_8888_8_0565 = L1: 77.83 L2: 58.14 M: 44.82 (105.52%)
HT: 20.58 VT: 17.44 R: 15.05 RT: 4.88 (52Kops/s)
Siarhei Siamashka [Sat, 27 Nov 2010 01:53:12 +0000 (03:53 +0200)]
ARM: added 'neon_composite_over_8888_n_0565' fast path
Siarhei Siamashka [Fri, 26 Nov 2010 15:06:58 +0000 (17:06 +0200)]
ARM: better NEON instructions scheduling for over_n_8_0565
Code rearranged to get better instructions scheduling for ARM Cortex-A8/A9.
Now it is ~30% faster for the pixel data in L1 cache and makes better use
of memory bandwidth when running at lower clock frequencies (ex. 500MHz).
Also register d24 (pixels from the mask image) is now not clobbered by
supplementary macros, which allows them to be reused for the other variants
of compositing operations later.
Benchmark from ARM Cortex-A8 @500MHz:
== before ==
over_n_8_0565 = L1: 63.90 L2: 63.15 M: 60.97 ( 73.53%)
HT: 28.89 VT: 24.14 R: 21.33 RT: 6.78 ( 67Kops/s)
== after ==
over_n_8_0565 = L1: 82.64 L2: 75.19 M: 71.52 ( 84.14%)
HT: 30.49 VT: 25.56 R: 22.36 RT: 6.89 ( 68Kops/s)
Siarhei Siamashka [Sun, 28 Nov 2010 19:45:06 +0000 (21:45 +0200)]
ARM: introduced 'fetch_mask_pixblock' macro to simplify code
This macro hides the implementation details of pixels fetching
for the mask image just like 'fetch_src_pixblock' does for the
source image. This provides more possibilities for reusing the
same code blocks in different compositing functions.
This patch does not introduce any functional changes and the
resulting code in the compiled object file is exactly the same.
Siarhei Siamashka [Fri, 26 Nov 2010 06:55:49 +0000 (08:55 +0200)]
ARM: added 'neon_composite_over_n_8_8' fast path
Siarhei Siamashka [Mon, 15 Nov 2010 16:26:43 +0000 (18:26 +0200)]
C fast path for a1 fill operation
Can be used as one of the solutions to fix bug
https://bugs.freedesktop.org/show_bug.cgi?id=31604
Alan Coopersmith [Sun, 21 Nov 2010 19:42:22 +0000 (11:42 -0800)]
Sun's copyrights belong to Oracle now
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Cyril Brulebois [Wed, 17 Nov 2010 15:16:56 +0000 (16:16 +0100)]
Fix argument quoting for AC_INIT.
One gets rid of this accordingly:
| autoreconf -vfi
| autoreconf: Entering directory `.'
| autoreconf: configure.ac: not using Gettext
| autoreconf: running: aclocal --force
| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"
| autoreconf: configure.ac: tracing
| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"
Signed-off-by: Cyril Brulebois <kibi@debian.org>
Søren Sandmann Pedersen [Tue, 16 Nov 2010 22:14:47 +0000 (17:14 -0500)]
Post-release version bump to 0.21.3
Søren Sandmann Pedersen [Tue, 16 Nov 2010 21:43:26 +0000 (16:43 -0500)]
Pre-release version bump
Søren Sandmann Pedersen [Wed, 3 Nov 2010 03:38:10 +0000 (23:38 -0400)]
Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images
There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/repeat/none/normal repeat modes. The bulk of each function is an
inline function that takes a format and a repeat mode as parameters.
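The generation scheme can be sketched as one inline body plus a macro that stamps out thin wrappers with constant arguments, so that the compiler can specialize each wrapper. The sketch below is simplified to integer source coordinates on a single row and covers three repeat modes; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD } repeat_t;

/* Map coordinate *c into [0, size) according to the repeat mode.
 * Returns 0 when the sample falls outside a REPEAT_NONE image. */
static inline int
apply_repeat (repeat_t mode, int *c, int size)
{
    if (mode == REPEAT_NONE)
        return *c >= 0 && *c < size;

    if (mode == REPEAT_NORMAL)
    {
        *c %= size;
        if (*c < 0)
            *c += size;
    }
    else   /* REPEAT_PAD: clamp to the edge pixel */
    {
        if (*c < 0)
            *c = 0;
        else if (*c >= size)
            *c = size - 1;
    }
    return 1;
}

/* Generic body; 'has_alpha' and 'mode' are constants in every wrapper */
static inline void
fetch_nearest (uint32_t *out, const uint32_t *bits, int size,
               int x, int step, int width, int has_alpha, repeat_t mode)
{
    int i;

    for (i = 0; i < width; i++, x += step)
    {
        int sx = x;

        if (!apply_repeat (mode, &sx, size))
            out[i] = 0;   /* outside the image: transparent black */
        else
            out[i] = has_alpha ? bits[sx] : (bits[sx] | 0xFF000000u);
    }
}

#define MAKE_FETCHER(name, has_alpha, mode)                               \
    static void                                                           \
    fetch_ ## name (uint32_t *out, const uint32_t *bits, int size,        \
                    int x, int step, int width)                           \
    {                                                                     \
        fetch_nearest (out, bits, size, x, step, width, has_alpha, mode); \
    }

MAKE_FETCHER (a8r8g8b8_none,   1, REPEAT_NONE)
MAKE_FETCHER (x8r8g8b8_pad,    0, REPEAT_PAD)
MAKE_FETCHER (a8r8g8b8_normal, 1, REPEAT_NORMAL)
```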
Andrea Canciani [Tue, 2 Nov 2010 16:04:35 +0000 (17:04 +0100)]
Improve conical gradients opacity check
Conical gradients are completely opaque if all of their stops are
opaque and the repeat mode is not 'none'.
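The rule stated above translates directly into a small predicate. In this sketch the stop type, the 16-bit alpha range, and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_t;

/* Illustrative gradient stop; alpha is assumed to be 16 bits wide */
typedef struct { uint16_t alpha; } stop_t;

/* Opaque only if the repeat mode is not 'none' and every stop is opaque */
static int
conical_is_opaque (const stop_t *stops, int n_stops, repeat_t repeat)
{
    int i;

    if (repeat == REPEAT_NONE || n_stops <= 0)
        return 0;

    for (i = 0; i < n_stops; i++)
    {
        if (stops[i].alpha != 0xFFFF)
            return 0;
    }
    return 1;
}
```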
Andrea Canciani [Tue, 2 Nov 2010 16:02:01 +0000 (17:02 +0100)]
Fix opacity check
Radial gradients are "conical", thus they can have some non-opaque
parts even if all of their stops are completely opaque.
To guarantee that a radial gradient is actually opaque, it needs to
also have one of the two circles containing the other one. In this
case when extrapolating, the whole plane is completely covered (as
explained in the comment in pixman-radial-gradient.c).
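The containment condition is a purely geometric test: one circle contains the other when the distance between the centers plus the smaller radius does not exceed the larger radius. A sketch in squared form, avoiding a square root; the types and names are hypothetical and the check in pixman itself may be written differently.

```c
#include <assert.h>

typedef struct { double x, y, r; } circle_t;

/* Nonzero if 'outer' fully contains 'inner', that is
 * distance (centers) + inner->r <= outer->r, compared in squared form. */
static int
circle_contains (const circle_t *outer, const circle_t *inner)
{
    double dx = outer->x - inner->x;
    double dy = outer->y - inner->y;
    double dr = outer->r - inner->r;

    if (dr < 0)
        return 0;

    return dx * dx + dy * dy <= dr * dr;
}

/* 'stops_opaque' stands for the other opacity conditions already checked */
static int
radial_is_opaque (const circle_t *c1, const circle_t *c2, int stops_opaque)
{
    return stops_opaque &&
           (circle_contains (c1, c2) || circle_contains (c2, c1));
}
```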
Andrea Canciani [Sun, 31 Oct 2010 15:59:45 +0000 (16:59 +0100)]
Remove unused stop_range field
Siarhei Siamashka [Sun, 3 Oct 2010 22:56:59 +0000 (01:56 +0300)]
ARM: optimization for scaled src_0565_0565 with nearest filter
The performance improvement is only in the ballpark of 5% when
compared against C code built with a reasonably good compiler
(gcc 4.5.1). But gcc 4.4 produces approximately 30% slower code
here, so assembly optimization makes sense to avoid dependency
on the compiler quality and/or optimization options.
Benchmark from ARM11:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=34.86 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=36.62 MPix/s
Benchmark from ARM Cortex-A8:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=94.91 MPix/s
Siarhei Siamashka [Tue, 2 Nov 2010 14:12:42 +0000 (16:12 +0200)]
ARM: NEON optimization for scaled src_0565_8888 with nearest filter
Benchmark from ARM Cortex-A8 @720MHz:
== before ==
op=1, src_fmt=10020565, dst_fmt=20028888, speed=8.99 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=20028888, speed=76.98 MPix/s
== unscaled ==
op=1, src_fmt=10020565, dst_fmt=20028888, speed=137.78 MPix/s
Siarhei Siamashka [Tue, 2 Nov 2010 13:25:51 +0000 (15:25 +0200)]
ARM: NEON optimization for scaled src_8888_0565 with nearest filter
Benchmark from ARM Cortex-A8 @720MHz:
== before ==
op=1, src_fmt=20028888, dst_fmt=10020565, speed=42.51 MPix/s
== after ==
op=1, src_fmt=20028888, dst_fmt=10020565, speed=55.61 MPix/s
== unscaled ==
op=1, src_fmt=20028888, dst_fmt=10020565, speed=117.99 MPix/s
Siarhei Siamashka [Tue, 2 Nov 2010 12:39:02 +0000 (14:39 +0200)]
ARM: NEON optimization for scaled over_8888_0565 with nearest filter
Benchmark from ARM Cortex-A8 @720MHz:
== before ==
op=3, src_fmt=20028888, dst_fmt=10020565, speed=10.29 MPix/s
== after ==
op=3, src_fmt=20028888, dst_fmt=10020565, speed=36.36 MPix/s
== unscaled ==
op=3, src_fmt=20028888, dst_fmt=10020565, speed=79.40 MPix/s
Siarhei Siamashka [Tue, 2 Nov 2010 12:29:57 +0000 (14:29 +0200)]
ARM: NEON optimization for scaled over_8888_8888 with nearest filter
Benchmark from ARM Cortex-A8 @720MHz:
== before ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=12.73 MPix/s
== after ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=28.75 MPix/s
== unscaled ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=53.03 MPix/s
Siarhei Siamashka [Tue, 2 Nov 2010 17:16:46 +0000 (19:16 +0200)]
ARM: performance tuning of NEON nearest scaled pixel fetcher
Interleaving the use of NEON registers helps to avoid some stalls
in NEON pipeline and provides a small performance improvement.