Søren Sandmann Pedersen [Sat, 7 Jan 2012 19:32:08 +0000 (14:32 -0500)]
test: In the alphamap test, also test that we get the right red value
There is a bug where the red channel of the alpha map of the
destination image is used instead of the red channel of the
destination image itself.
Alan Coopersmith [Sat, 24 Dec 2011 00:32:57 +0000 (16:32 -0800)]
Make mmx code compatible with Solaris Studio 12.3 compilers
Rearranged some of the existing gcc & Intel compiler checks to allow
easier sharing of common cases among the compilers.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Søren Sandmann Pedersen [Tue, 20 Dec 2011 11:32:26 +0000 (06:32 -0500)]
Fix rounding for DIV_UNc()
We need to compute floor (a/b * 255 + 0.5), not floor (a / b * 255),
so add b/2 to the numerator in the DIV_UNc() macro.
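The effect of the extra b/2 term can be seen in a minimal sketch, assuming
8 bit components. div_un8_trunc() and div_un8_round() are hypothetical
names used only for illustration; the real DIV_UNc() macro is generic over
the component width.

```c
#include <stdint.h>

/* Truncating version: computes floor (a / b * 255). */
static uint8_t
div_un8_trunc (uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint32_t) a * 255) / b);
}

/* Rounding version: adding b / 2 to the numerator before dividing
 * gives floor (a / b * 255 + 0.5). Assumes 0 < b and a <= b. */
static uint8_t
div_un8_round (uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint32_t) a * 255 + b / 2) / b);
}
```

For a = 1 and b = 2 the exact value is 127.5, which the first version
truncates and the second rounds up.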
Søren Sandmann Pedersen [Thu, 22 Dec 2011 16:37:26 +0000 (11:37 -0500)]
Reject trapezoids where top (bottom) is above (below) the edges
When a trapezoid has a top/bottom that is above/below the left/right
edges, degenerate trapezoids become possible. For example the edge
could be very short and close to horizontal. If the bottom edge is far
below the bottom point of such a short edge, the result is that the
lower right corner of the trapezoid will be extremely far to the left.
This kind of trapezoid causes overflows in the rasterization code, so
change pixman_trapezoid_valid() to reject them.
Søren Sandmann Pedersen [Tue, 20 Dec 2011 11:34:41 +0000 (06:34 -0500)]
In MUL_UNc() cast to comp2_t
Otherwise, when comp1_t is 16 bits wide, we can end up with a signed
integer overflow.
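A minimal sketch of the problem, assuming comp1_t is uint16_t, comp2_t is
uint32_t and int is 32 bits wide. mul_un16() is a hypothetical stand-in
for the 16 bit instantiation of MUL_UNc(), not the macro itself.

```c
#include <stdint.h>

/* Multiply two 16 bit normalized values, rounding to nearest. */
static uint16_t
mul_un16 (uint16_t a, uint16_t b)
{
    /* Without the cast, a and b would both be promoted to (signed)
     * int, and 0xffff * 0xffff does not fit in a 32 bit int.
     * Casting one operand makes the product unsigned 32 bit. */
    uint32_t t = a * (uint32_t) b + 0x8000;

    return (uint16_t) (((t >> 16) + t) >> 16);
}
```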
Søren Sandmann Pedersen [Wed, 21 Dec 2011 13:19:05 +0000 (08:19 -0500)]
Fix a bunch of signed overflow issues
In pixman-fast-path.c: (1 << 31) - 1 causes a signed overflow, so
change to (1U << n) - 1.
In pixman-image.c: The check for whether m10 == -m01 will overflow
when -m01 == INT_MIN. Instead just check whether the variables are 1
and -1.
In pixman-utils.c: When the depth of the topmost channel is 0, we can
end up shifting by 32.
In blitters-test.c: Replicating the mask would end up shifting more
than 32.
In region-contains-test.c: Computing the average of two large integers
could overflow. Instead add half the difference between them to the
first integer.
In stress-test.c: Masking the value in fake_reader() would sometimes
shift by 32. Instead just use the most significant bits instead of
the least significant.
All these issues were found by the IOC tool:
http://embed.cs.utah.edu/ioc/
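Two of the patterns above can be sketched as follows. This is only an
illustration, not the code from the tree: low_bits_mask() and midpoint()
are hypothetical helpers, and doing the subtraction in unsigned
arithmetic is an assumption made here so that the sketch is defined for
any a <= b.

```c
#include <stdint.h>

/* Mask with the n low bits set, for 0 <= n <= 32. The shift is done
 * on an unsigned 1, and a shift by 32 is avoided explicitly. */
static uint32_t
low_bits_mask (int n)
{
    if (n >= 32)
        return 0xffffffffu;

    return (1U << n) - 1;
}

/* Midpoint of a and b, assuming a <= b. Instead of (a + b) / 2, which
 * can overflow, add half the difference to the first integer. */
static int32_t
midpoint (int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t) b - (uint32_t) a;

    return a + (int32_t) (diff / 2);
}
```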
Søren Sandmann Pedersen [Sun, 18 Dec 2011 14:54:47 +0000 (09:54 -0500)]
Add missing cast in _pixman_edge_multi_init()
nx and e->dy are both 32 bit quantities, so a cast is needed to make
sure their product is 64 bit before subtracting it from a 64 bit
quantity.
Søren Sandmann Pedersen [Sun, 18 Dec 2011 13:16:45 +0000 (08:16 -0500)]
Fix some signed overflow bugs
In the macros for the PDF blend modes, two comp1_t variables are
multiplied together and then used as if the result were a
comp4_t. When comp1_t is a uint8_t, this is fine because they are
promoted to int, and the product of two uint8_ts fits in an
int. However, when comp1_t is uint16, the product does not necessarily
fit in an int, so casts are necessary.
Fix for bug 43906, reported by Siarhei Siamashka.
Søren Sandmann Pedersen [Thu, 5 Jan 2012 15:37:51 +0000 (10:37 -0500)]
pixman-image.c: Fix typo in pixman_image_set_transform()
A parenthesis was misplaced so that the size argument to memcmp() was
always 0. The bug is harmless except that the flags might be
unnecessarily recomputed in some cases.
A bug report about this in Mozilla's fork can be found here:
https://bugzilla.mozilla.org/show_bug.cgi?id=710992
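The kind of typo described above looks roughly like this. transform_t
and same_transform() are hypothetical names for illustration; only the
placement of the parenthesis matters.

```c
#include <string.h>

typedef struct
{
    int matrix[3][3];
} transform_t;

/* Returns nonzero if the two transforms are bytewise identical. */
static int
same_transform (const transform_t *a, const transform_t *b)
{
    /* With the parenthesis misplaced, the call would read
     *
     *     memcmp (a, b, sizeof (transform_t) == 0)
     *
     * so the size argument is the result of the comparison, which
     * is 0, and no bytes are compared at all.
     */
    return memcmp (a, b, sizeof (transform_t)) == 0;
}
```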
Colin Walters [Wed, 4 Jan 2012 13:06:05 +0000 (08:06 -0500)]
autogen.sh: Support GNOME Build API
http://people.gnome.org/~walters/docs/build-api.txt
Søren Sandmann Pedersen [Sun, 18 Dec 2011 12:29:59 +0000 (07:29 -0500)]
gradient-walker: For NONE repeats, when x < 0 or x > 1, set both colors to 0
ec7c9c2b6865b48b8bd14e4 introduced a bug where NONE gradients would be
misrendered, causing the area outside the gradient to be treated as a
(very) long fade to transparent. The problem was that a check for
positions outside the gradient was dropped in favor of relying on
the sentinels.
Aside from misrendering, this also caused a signed integer overflow
when the code would compute a stepper size based on MIN_INT32.
This patch fixes the issue by reinstating a check for these cases
and setting both the right and left colors to transparent black.
Søren Sandmann Pedersen [Wed, 21 Dec 2011 10:19:00 +0000 (05:19 -0500)]
Modify gradient-test to show a bug in NONE processing
This patch modifies demos/gradient-test to display a bug in gradients
with a repeat mode of NONE. With the current gradient code, the left
side will be a solid red (actually an extremely long fade from solid
red to transparent) instead of a sharp transition from red to green.
Søren Sandmann Pedersen [Fri, 9 Dec 2011 08:59:04 +0000 (03:59 -0500)]
region: Add pixman_region{,32}_clear() functions.
These functions simply reset the region to empty. They are equivalent
to
pixman_region_fini (&region);
pixman_region_init (&region);
Bobby Salazar [Tue, 13 Dec 2011 07:03:16 +0000 (02:03 -0500)]
Android Runtime Detection Support For ARM NEON
This patch adds runtime detection support for the ARM NEON fast paths
for code compiled with the Android NDK. This is the only code change
needed to enable the ARM NEON pixman fast paths for the ever growing
Android platform (200 million+ smartphones, tablets, etc.). Just make
sure to #define USE_ARM_NEON in your makefile.
Naohiro Aota [Thu, 24 Nov 2011 12:12:15 +0000 (13:12 +0100)]
Don't use non-POSIX test
test "$test_CFLAGS" == "" && \
may cause an error on some POSIX shells and uses a style which is not
consistent with the other tests in configure.ac
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=42588 and
https://bugs.gentoo.org/show_bug.cgi?id=387087
Andrea Canciani [Tue, 8 Nov 2011 21:00:46 +0000 (22:00 +0100)]
test: Produce autotools-looking report in the win32 build system
Tweak the commands used to run the tests on win32 to make the output
look mostly like that produced by the autotools test system.
In addition to this, make sure that the exit status of the test target
is success (0) if and only if no failure occurred.
Andrea Canciani [Thu, 3 Nov 2011 10:07:25 +0000 (11:07 +0100)]
demos: Consistently use G_N_ELEMENTS()
Instead of open-coding G_N_ELEMENTS(), just use it.
Andrea Canciani [Thu, 3 Nov 2011 09:53:10 +0000 (10:53 +0100)]
test: Reuse the ARRAY_LENGTH() macro
It is provided by utils.h, there is no need to redefine it.
Andrea Canciani [Thu, 3 Nov 2011 09:51:27 +0000 (10:51 +0100)]
Use the ARRAY_LENGTH() macro when possible
This patch has been generated by the following Coccinelle semantic patch:
// Use the ARRAY_LENGTH() macro when possible
//
// Replace open-coded array length computations with the
// ARRAY_LENGTH() macro
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(T))
+ ARRAY_LENGTH (E)
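For reference, a minimal sketch of how such a macro is typically defined
and used. The definition below is an assumption and may differ from the
one in the tree; sizes[] and sum_sizes() are made up for the example.

```c
/* Number of elements in an array. Only valid for real arrays, not
 * for pointers. */
#define ARRAY_LENGTH(A) ((int) (sizeof (A) / sizeof ((A)[0])))

static const int sizes[] = { 1, 2, 7, 16, 64 };

static int
sum_sizes (void)
{
    int i, sum = 0;

    /* Previously: i < sizeof (sizes) / sizeof (int) */
    for (i = 0; i < ARRAY_LENGTH (sizes); i++)
        sum += sizes[i];

    return sum;
}
```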
Andrea Canciani [Thu, 3 Nov 2011 09:40:24 +0000 (10:40 +0100)]
test: Cleanup includes
All the tests are linked to libutil, hence it makes sense to always
include utils.h and reuse what it provides (config.h inclusion, access
to private pixman APIs, ARRAY_LENGTH, ...).
Andrea Canciani [Thu, 3 Nov 2011 09:21:41 +0000 (10:21 +0100)]
Remove useless checks for NULL before freeing
This patch has been generated by the following Coccinelle semantic patch:
// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it
@@
expression E;
@@
+ free (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
- free(E);
(
- E = NULL;
|
- E = 0;
)
...
- }
@@
expression E;
@@
+ free (E);
- if (unlikely (E != NULL)) {
- free (E);
- }
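In plain C the result of the first rule looks like the sketch below.
buffer_t and buffer_fini() are hypothetical names used for illustration.

```c
#include <stdlib.h>

typedef struct
{
    int *data;
} buffer_t;

static void
buffer_fini (buffer_t *buf)
{
    /* free (NULL) is defined to do nothing, so no check for NULL is
     * needed before the call. */
    free (buf->data);
    buf->data = NULL;
}
```

Calling buffer_fini() twice in a row is safe, since the second call just
passes NULL to free().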
Søren Sandmann Pedersen [Sun, 6 Nov 2011 21:36:01 +0000 (16:36 -0500)]
Post-release version bump to 0.25.1
Søren Sandmann Pedersen [Sun, 6 Nov 2011 21:10:33 +0000 (16:10 -0500)]
Pre-release version bump to 0.24.0
Alan Coopersmith [Sun, 30 Oct 2011 16:12:06 +0000 (09:12 -0700)]
Change MMX ldq_u to return _m64 instead of forcing all callers to cast
Sun/Oracle Studio compilers allow the pointers to be cast, but not the
non-pointer forms, causing pixman compiles to fail with many errors of:
"pixman-mmx.c", line 1411: invalid cast expression
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Jeff Muizelaar [Wed, 2 Nov 2011 22:49:58 +0000 (18:49 -0400)]
Add definitions of INT64_MIN and INT64_MAX
Søren Sandmann Pedersen [Sat, 29 Oct 2011 09:51:44 +0000 (05:51 -0400)]
Post-release version bump to 0.23.9
Søren Sandmann Pedersen [Sat, 29 Oct 2011 09:33:44 +0000 (05:33 -0400)]
Pre-release version bump to 0.23.8
Søren Sandmann Pedersen [Tue, 25 Oct 2011 12:45:34 +0000 (08:45 -0400)]
Fix use of uninitialized fields reported by valgrind
In pixman-noop.c and pixman-sse2.c, we are accessing
image->bits.width/height without first making sure the image is a bits
image. The warning is harmless because we never act on this
information without checking that the image is a8r8g8b8, but valgrind
does warn about it.
In pixman-noop.c, just reorder the clauses in the if statement; in
pixman-sse2.c require images to have the FAST_PATH_BITS_IMAGE flag
set.
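The reordering relies on && evaluating left to right and stopping at the
first false operand. A minimal sketch, with a hypothetical image_t that
only stands in for the real image struct:

```c
typedef enum { IMAGE_BITS, IMAGE_SOLID } image_type_t;

typedef struct
{
    image_type_t type;
    int          width, height; /* only set when type == IMAGE_BITS */
} image_t;

static int
image_covers (const image_t *image, int w, int h)
{
    /* The type check comes first, so width and height are never
     * read for images that are not bits images. */
    return image->type == IMAGE_BITS &&
           image->width >= w         &&
           image->height >= h;
}
```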
Søren Sandmann Pedersen [Thu, 20 Oct 2011 13:13:12 +0000 (09:13 -0400)]
Merge branch 'gradients'
Taekyun Kim [Tue, 18 Oct 2011 12:50:18 +0000 (21:50 +0900)]
ARM: NEON: Fix assembly typo error in src_n_8_8888
Binutils 2.21 does not complain about missing comma between ARM
register and alignment specifier in vld/vst instructions which
causes build error on binutils 2.20.
Taekyun Kim [Mon, 26 Sep 2011 09:33:27 +0000 (18:33 +0900)]
ARM: NEON: Standard fast path src_n_8_8
Performance numbers of before/after on cortex-a8 @ 1GHz
- before
L1: 28.05 L2: 28.26 M: 26.97 ( 4.48%) HT: 19.79 VT: 19.14 R: 17.61 RT: 9.88 ( 101Kops/s)
- after
L1:1430.28 L2:1252.10 M:421.93 ( 75.48%) HT:170.16 VT:138.03 R:145.86 RT: 35.51 ( 255Kops/s)
Taekyun Kim [Mon, 26 Sep 2011 08:03:54 +0000 (17:03 +0900)]
ARM: NEON: Standard fast path src_n_8_8888
Performance numbers of before/after on cortex-a8 @ 1GHz
- before
L1: 32.39 L2: 31.79 M: 30.84 ( 13.77%) HT: 21.58 VT: 19.75 R: 18.83 RT: 10.46 ( 106Kops/s)
- after
L1: 516.25 L2: 372.00 M:193.49 ( 85.59%) HT:136.93 VT:109.10 R:104.48 RT: 34.77 ( 253Kops/s)
Taekyun Kim [Mon, 26 Sep 2011 10:04:53 +0000 (19:04 +0900)]
ARM: NEON: Instruction scheduling of bilinear over_8888_8_8888
Instructions are reordered to eliminate pipeline stalls and get
better memory access.
Performance of before/after on cortex-a8 @ 1GHz
<< 2000 x 2000 with scale factor close to 1.x >>
before : 40.53 Mpix/s
after : 50.76 Mpix/s
Taekyun Kim [Wed, 21 Sep 2011 06:52:13 +0000 (15:52 +0900)]
ARM: NEON: Instruction scheduling of bilinear over_8888_8888
Instructions are reordered to eliminate pipeline stalls and get
better memory access.
Performance of before/after on cortex-a8 @ 1GHz
<< 2000 x 2000 with scale factor close to 1.x >>
before : 50.43 Mpix/s
after : 61.09 Mpix/s
Taekyun Kim [Thu, 22 Sep 2011 15:03:22 +0000 (00:03 +0900)]
ARM: NEON: Replace old bilinear scanline generator with new template
Bilinear scanline functions in pixman-arm-neon-asm-bilinear.S can
be replaced with new template just by wrapping existing macros.
Taekyun Kim [Tue, 20 Sep 2011 12:32:35 +0000 (21:32 +0900)]
ARM: NEON: Bilinear macro template for instruction scheduling
This macro template takes 6 code blocks.
1. process_last_pixel
2. process_two_pixels
3. process_four_pixels
4. process_pixblock_head
5. process_pixblock_tail
6. process_pixblock_tail_head
process_last_pixel does not need to update the horizontal weight; this
is done by the template. The two and four pixel code blocks should
update the horizontal weight themselves. The head/tail/tail_head blocks
make up the unrolled core loop. You can apply instruction scheduling
to the tail_head blocks.
You can also specify the size of the pixel block. Supported sizes are 4
and 8. If you want to use a mask, pass the BILINEAR_FLAG_USE_MASK flag
to the template, then you can use register MASK. When using d8~d15
registers, pass BILINEAR_FLAG_USE_ALL_NEON_REGS to make sure the
registers are properly saved on the stack and later restored.
Taekyun Kim [Tue, 20 Sep 2011 10:46:25 +0000 (19:46 +0900)]
ARM: NEON: Some cleanup of bilinear scanline functions
Use STRIDE, and do the initial horizontal weight update before
entering the interpolation loop. Add cache preload for mask and dst.
Søren Sandmann Pedersen [Fri, 14 Oct 2011 13:04:48 +0000 (09:04 -0400)]
Simplify gradient_walker_reset()
The code that searches for the closest color stop to the given
position is duplicated across the various repeat modes. Replace the
switch with two if/else constructions, and put the search code between
them.
Søren Sandmann Pedersen [Fri, 14 Oct 2011 13:02:14 +0000 (09:02 -0400)]
Use sentinels instead of special casing first and last stops
When storing the gradient stops internally, allocate two more stops,
one before the beginning of the stop list and one after the
end. Initialize those stops based on the repeat property of the
gradient.
This allows gradient_walker_reset() to be simplified because it can
now simply pick the two closest stops to the position without special
casing the first and last stops.
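The idea can be sketched as follows. stop_t, find_right_stop() and the
example data are hypothetical and much simpler than the real walker;
the sentinel positions and colors shown are assumptions for one repeat
mode, not the values the code uses.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    int32_t  x;     /* position, 16.16 fixed point */
    uint32_t color; /* a8r8g8b8 */
} stop_t;

/* stops[0] and stops[n + 1] are sentinels; stops[1] .. stops[n] are
 * the real stops, sorted by position. Returns the index of the stop
 * to the right of pos. The stop to the left is the one before it,
 * which always exists thanks to the sentinels. */
static size_t
find_right_stop (const stop_t *stops, size_t n, int32_t pos)
{
    size_t i;

    for (i = 1; i <= n; i++)
    {
        if (pos < stops[i].x)
            break;
    }

    return i;
}

/* Red at 0.0 and green at 1.0, surrounded by two sentinels. */
static const stop_t example_stops[] =
{
    { INT32_MIN, 0x00000000 },
    { 0x00000,   0xffff0000 },
    { 0x10000,   0xff00ff00 },
    { INT32_MAX, 0x00000000 },
};
```

A position before the first real stop yields index 1 and one after the
last real stop yields index n + 1, with no special cases in the search.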
Søren Sandmann Pedersen [Fri, 14 Oct 2011 11:42:00 +0000 (07:42 -0400)]
gradient walker: Correct types and fix formatting
The type of pos in gradient_walker_reset() and gradient_walker_pixel()
is pixman_fixed_48_16_t and not pixman_fixed_32_32. The types of the
positions in the walker struct are pixman_fixed_t and not int32_t, and
need_reset is a boolean, not an integer. The spread field should be
called repeat and have the type pixman_repeat_t.
Also fix some formatting issues, make gradient_walker_reset() static,
and delete the pointless PIXMAN_GRADIENT_WALKER_NEED_RESET() macro.
Søren Sandmann Pedersen [Tue, 11 Oct 2011 20:12:24 +0000 (16:12 -0400)]
Add stable release / development snapshot to draft release notes
This will hopefully serve as a reminder to me that I should put this
information in the release notes.
Søren Sandmann Pedersen [Tue, 11 Oct 2011 10:10:39 +0000 (06:10 -0400)]
Post-release version bump to 0.23.7
Søren Sandmann Pedersen [Tue, 11 Oct 2011 10:00:51 +0000 (06:00 -0400)]
Pre-release version bump to 0.23.6
Taekyun Kim [Thu, 22 Sep 2011 09:42:38 +0000 (18:42 +0900)]
Simple repeat: Extend too short source scanlines into temporary buffer
Scanlines that are too short cause repeat handling overhead, and since
optimized pixman composite functions usually process a bunch of pixels
in a single loop iteration, it can be beneficial to pre-extend source
scanlines. The temporary buffers will usually reside in cache, so
accessing them should be quite efficient.
Taekyun Kim [Mon, 29 Aug 2011 12:44:36 +0000 (21:44 +0900)]
Simple repeat fast path
We can implement simple repeat by stitching existing fast path
functions. First lookup COVER_CLIP function for given input and
then stitch horizontally using the function.
Taekyun Kim [Thu, 22 Sep 2011 07:33:02 +0000 (16:33 +0900)]
Move _pixman_lookup_composite_function() to pixman-utils.c
Søren Sandmann Pedersen [Mon, 27 Jun 2011 21:17:04 +0000 (21:17 +0000)]
Add src, mask, and dest flags to the composite args struct.
These flags are useful in the various compositing routines, and the
flags stored in the image structs are missing some bits of information
that can only be computed when pixman_image_composite() is called.
Taekyun Kim [Thu, 22 Sep 2011 07:26:55 +0000 (16:26 +0900)]
Add new fast path flag FAST_PATH_BITS_IMAGE
This fast path flag indicates that the image is a bits image.
Taekyun Kim [Thu, 22 Sep 2011 07:20:03 +0000 (16:20 +0900)]
init/fini functions for pixman_image_t
pixman_image_t itself can be on stack or heap. So segregating
init/fini from create/unref can be useful when we want to use
pixman_image_t on stack or other memory.
Taekyun Kim [Wed, 7 Sep 2011 14:00:29 +0000 (23:00 +0900)]
sse2: Bilinear scaled over_8888_8_8888
Taekyun Kim [Wed, 7 Sep 2011 13:57:29 +0000 (22:57 +0900)]
sse2: Bilinear scaled over_8888_8888
Taekyun Kim [Wed, 7 Sep 2011 13:51:46 +0000 (22:51 +0900)]
sse2: Macros for assembling bilinear interpolation code fractions
Primitive bilinear interpolation code is reusable to implement other
bilinear functions.
BILINEAR_DECLARE_VARIABLES
- Declare variables needed to interpolate src pixels.
BILINEAR_INTERPOLATE_ONE_PIXEL
- Interpolate one pixel and advance to next pixel
BILINEAR_SKIP_ONE_PIXEL
- Skip interpolation and just advance to next pixel
This is useful for skipping zero mask
Matt Turner [Thu, 6 Oct 2011 21:56:09 +0000 (17:56 -0400)]
Correct the minimum gcc version needed for iwmmxt
Spotted by Søren Sandmann.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Thu, 6 Oct 2011 02:54:36 +0000 (22:54 -0400)]
Make sure iwMMXt is only detected on ARM
iwMMXt is incorrectly detected on x86 and amd64. This happens because
the test uses standard _mm_* intrinsic functions which it compiles with
-march=iwmmxt, but when the user has set CFLAGS=-march=k8 for instance,
no error is generated from -march=iwmmxt, even though it's not a valid
flag on x86/amd64. Passing CFLAGS=-march=native does not override the
-march=iwmmxt flag though, which is why it wasn't noticed before.
So, just #error out in the test if the __arm__ preprocessor directive
isn't defined.
Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179
Signed-off-by: Matt Turner <mattst88@gmail.com>
Søren Sandmann Pedersen [Tue, 27 Sep 2011 15:32:13 +0000 (11:32 -0400)]
Don't include stdint.h in scaling-helpers-test.
Fixes bug 41257.
Benjamin Otte [Wed, 14 Sep 2011 15:52:03 +0000 (17:52 +0200)]
build: replace @VAR@ with $(VAR) in makefiles
Benjamin Otte [Wed, 14 Sep 2011 15:01:51 +0000 (17:01 +0200)]
tests: Add PNG_CFLAGS/LIBS to tests
PNG flags were accidentally included by gdk-pixbuf. This has been fixed
recently, so we need to make sure to include it ourselves.
Matt Turner [Thu, 22 Sep 2011 19:28:00 +0000 (15:28 -0400)]
mmx: optimize unaligned 64-bit ARM/iwmmxt loads
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Mon, 1 Aug 2011 02:42:24 +0000 (22:42 -0400)]
mmx: compile on ARM for iwmmxt optimizations
Check in configure for at least gcc-4.6, since gcc-4.7 (and hopefully
4.6) will be the earliest version capable of compiling the _mm_*
intrinsics on ARM/iwmmxt. Even for suitable compiler versions I use
_mm_srli_si64 which is known to cause unpatched compilers to fail.
Select iwmmxt at runtime only after NEON, since we expect the NEON
optimizations to be more capable and faster than iwmmxt.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Sun, 4 Sep 2011 18:11:46 +0000 (14:11 -0400)]
mmx: prepare pixman-mmx.c to be compiled for ARM/iwmmxt
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Thu, 8 Sep 2011 18:33:45 +0000 (20:33 +0200)]
mmx: fix unaligned accesses
Simply return *p in the unaligned access functions, since alignment
constraints are very relaxed on x86 and this allows us to generate
identical code as before.
Tested with the test suite, lowlevel-blit-test, and cairo-perf-trace on
ARM and Alpha with no unaligned accesses found.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Thu, 22 Sep 2011 19:39:53 +0000 (15:39 -0400)]
mmx: wrap x86/MMX inline assembly in ifdef USE_X86_MMX
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Sun, 31 Jul 2011 20:20:12 +0000 (20:20 +0000)]
mmx: rename USE_MMX to USE_X86_MMX
This will make upcoming ARM usage of pixman-mmx.c unambiguous.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Fri, 23 Sep 2011 18:10:52 +0000 (14:10 -0400)]
mmx: convert while (w) to if (w) when possible
gcc isn't able to see that w is no greater than 1, so it generates
unnecessary loop instructions with while (w).
Signed-off-by: Matt Turner <mattst88@gmail.com>
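A minimal sketch of the pattern, with a hypothetical sum_pixels() that
handles two pixels per iteration in its main loop. The real functions do
MMX compositing, but the shape of the tail handling is the same.

```c
#include <stdint.h>

static uint32_t
sum_pixels (const uint32_t *src, int w)
{
    uint32_t sum = 0;

    /* Main loop: two pixels per iteration. */
    while (w >= 2)
    {
        sum += src[0] + src[1];
        src += 2;
        w -= 2;
    }

    /* At most one pixel is left at this point, so a plain if (w)
     * is enough and no loop needs to be generated. */
    if (w)
        sum += src[0];

    return sum;
}
```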
Matt Turner [Fri, 9 Sep 2011 13:33:14 +0000 (15:33 +0200)]
mmx: fix formats in commented code
b8r8g8 has apparently stopped being supported at some point since this
code was commented out.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Matt Turner [Fri, 9 Sep 2011 13:34:04 +0000 (15:34 +0200)]
lowlevel-blt: add over_x888_8_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
Siarhei Siamashka [Sun, 22 May 2011 19:51:00 +0000 (22:51 +0300)]
BILINEAR->NEAREST filter optimization for simple rotation and translation
Simple rotation and translation are the additional cases when BILINEAR
filter can be safely reduced to NEAREST.
Søren Sandmann Pedersen [Sun, 4 Sep 2011 06:53:39 +0000 (02:53 -0400)]
Strength-reduce BILINEAR filter to NEAREST filter for identity transforms
An image with a bilinear filter and an identity transform is
equivalent to one with a nearest filter, so there is no reason the
standard fast paths shouldn't be usable.
But because a BILINEAR filter samples a 2x2 pixel block in the source
image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the
source area is the entire image, because some compositing operations
might then read pixels outside the image.
This patch fixes the problem by splitting the
FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip
covers the samples taking into account NEAREST/BILINEAR filters
respectively.
All the existing compositing operations that require
FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick
either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which
filter they depend on.
In compute_image_info() both COVER_CLIP_NEAREST and
COVER_CLIP_BILINEAR can be set depending on how much room there is
around the clip rectangle.
Finally, images with an identity transform and a bilinear filter get
FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
Performance measurements with render_bench against Xephyr:
Before:
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.
After:
*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.
Søren Sandmann Pedersen [Mon, 5 Sep 2011 18:43:25 +0000 (14:43 -0400)]
test: Occasionally use a BILINEAR filter in blitters-test
To test that reductions of BILINEAR->NEAREST for identity
transformations happen correctly, occasionally use a bilinear filter
in blitters test.
Siarhei Siamashka [Sun, 22 May 2011 19:16:38 +0000 (22:16 +0300)]
test: better coverage for BILINEAR->NEAREST filter optimization
The upcoming optimization which is going to be able to replace BILINEAR filter
with NEAREST where appropriate needs to analyze the transformation matrix
and not to make any mistakes.
The changes to affine-test include:
1. Higher chance of using the same scale factor for x and y axes. This can help
to stress some special cases (for example the case when both x and y scale
factors are integer). The same applies to x/y translation.
2. Introduced a small chance for "corrupting" transformation matrix by flipping
random bits. This supposedly can help to identify the cases when some of the
fast paths or other code logic is wrongly activated due to insufficient checks.
Søren Sandmann Pedersen [Mon, 5 Sep 2011 04:19:51 +0000 (00:19 -0400)]
Eliminate compute_sample_extents() function
In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear->nearest optimization
will do something different with these two pieces of code.
Søren Sandmann Pedersen [Sun, 4 Sep 2011 21:43:29 +0000 (17:43 -0400)]
Split computation of sample area into own function
compute_sample_extents() has two parts: one that computes the
transformed extents, and one that checks whether the computed extents
fit within the 16.16 coordinate space.
Split the first part into its own function
compute_transformed_extents().
Søren Sandmann Pedersen [Sun, 4 Sep 2011 21:17:53 +0000 (17:17 -0400)]
Remove x and y coordinates from analyze_extents() and compute_sample_extents()
These coordinates were only ever used for subtracting from the extents
box to put it into the coordinate space of the image, so we might as
well do this coordinate translation only once before entering the
functions.
Søren Sandmann Pedersen [Tue, 16 Aug 2011 10:13:59 +0000 (06:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for paletted formats
Add support in convert_pixel_from_a8r8g8b8() and
convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats,
then use MAKE_ACCESSORS() to generate accessors for the indexed
formats: c8, g8, g4, c4, g1
Søren Sandmann Pedersen [Sun, 30 May 2010 16:36:58 +0000 (12:36 -0400)]
Use MAKE_ACCESSORS() to generate accessors for the a1 format.
Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp
pixels to fetch_and_convert_pixel() and convert_and_store_pixel(),
then use MAKE_ACCESSORS() to generate the accessors for the a1
format. (Not the g1 format as it is indexed).
Søren Sandmann Pedersen [Tue, 16 Aug 2011 18:38:44 +0000 (14:38 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 24bpp formats
Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp
pixels in fetch_and_convert_pixel() and
convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate
accessors for the 24 bpp formats:
r8g8b8
b8g8r8
Søren Sandmann Pedersen [Thu, 18 Aug 2011 09:09:07 +0000 (05:09 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats
Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to
fetch_and_convert_pixel() and convert_and_store_pixel(), then use
MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and
c4 which are indexed:
a4
r1g2b1
b1g2r1
a1r1g1b1
a1b1g1r1
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:58 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats
Add support for 8 bpp formats to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the
accessors for all the 8 bpp formats, except g8 and c8, which are
indexed:
a8
r3g3b2
b2g3r3
a2r2g2b2
a2b2g2r2
x4a4
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:44 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats
Add support for 16bpp pixels to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 16bpp formats:
r5g6b5
b5g6r5
a1r5g5b5
x1r5g5b5
a1b5g5r5
x1b5g5r5
a4r4g4b4
x4r4g4b4
a4b4g4r4
x4b4g4r4
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:30 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate all the 32 bit accessors
Add support for 32bpp formats in fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 32 bpp formats:
a8r8g8b8
x8r8g8b8
a8b8g8r8
x8b8g8r8
x14r6g6b6
b8g8r8a8
b8g8r8x8
r8g8b8x8
r8g8b8a8
Søren Sandmann Pedersen [Wed, 17 Aug 2011 21:27:58 +0000 (17:27 -0400)]
Add initial version of the MAKE_ACCESSORS() macro
This macro will eventually allow the fetchers and storers to be
generated automatically. For now, it's just a skeleton that doesn't
actually do anything.
Søren Sandmann Pedersen [Mon, 15 Aug 2011 22:42:38 +0000 (18:42 -0400)]
Add general pixel converter
This function can convert between any <= 32 bpp formats. Nothing uses
it yet.
Søren Sandmann Pedersen [Mon, 15 Aug 2011 14:22:05 +0000 (10:22 -0400)]
Add a generic unorm_to_unorm() conversion utility
This function can convert between normalized numbers of different
depths. When converting to higher bit depths, it will replicate the
existing bits, when converting to lower bit depths, it will simply
truncate.
This function replaces the expand16() function in pixman-utils.c
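The behaviour described above can be sketched like this. It is not the
implementation from pixman-utils.c: unorm_to_unorm_sketch() is a
hypothetical name, and the sketch assumes 1 <= from_bits, to_bits <= 16.

```c
#include <stdint.h>

static uint32_t
unorm_to_unorm_sketch (uint32_t val, int from_bits, int to_bits)
{
    uint32_t result;
    int filled;

    /* Ignore anything above the low from_bits bits. */
    val &= (1U << from_bits) - 1;

    /* Lower or equal depth: truncate, keeping the top bits. */
    if (to_bits <= from_bits)
        return val >> (from_bits - to_bits);

    /* Higher depth: move the value to the top, then replicate it
     * downwards. Each step doubles the number of filled bits. */
    result = val << (to_bits - from_bits);
    for (filled = from_bits; filled < to_bits; filled *= 2)
        result |= result >> filled;

    return result;
}
```

Replicating the bits rather than padding with zeros maps the maximum
value of the source depth to the maximum value of the target depth, for
example 0x1f at 5 bits to 0xff at 8 bits.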
Søren Sandmann Pedersen [Mon, 19 Sep 2011 13:08:33 +0000 (09:08 -0400)]
A few tweaks to a comment in pixman-combine.c.template
Include a link to
http://marc.info/?l=xfree-render&m=99792000027857&w=2
where Keith explains how the disjoint/conjoint operators work.
Jon TURNEY [Mon, 19 Sep 2011 10:17:58 +0000 (06:17 -0400)]
Fix build on cygwin after commit
efdf65c0c4fff551fb3cd9104deda9adb6261e22
libutils depends on pixman and so needs to precede it in the link order
Found by tinderbox, see [1]
[1] http://tinderbox.freedesktop.org/builds/2011-09-15-0005/logs/pixman/#build
Signed-off-by: Jon TURNEY <jon.turney at dronecode.org.uk>
Søren Sandmann Pedersen [Tue, 13 Sep 2011 03:17:39 +0000 (23:17 -0400)]
test: Use smaller boxes in region_contains_test()
The boxes used in region_contains_test() sometimes overflow causing
*** BUG ***
In pixman_region32_union_rect: Invalid rectangle passed
Set a breakpoint on '_pixman_log_error' to debug
messages to be printed when pixman is compiled with DEBUG. Fix this by
dividing the x, y, w, h coordinates by 4 to prevent overflows.
Andrea Canciani [Sun, 4 Sep 2011 19:33:05 +0000 (21:33 +0200)]
build-win32: Add 'check' target
On win32 the tests are built but they are not run automatically by the
build system.
A minimal 'check' target (depending on the tests being built) can
simply run them and log to the console their success/failure.
Andrea Canciani [Sun, 4 Sep 2011 20:52:53 +0000 (13:52 -0700)]
test: Do not include config.h unless HAVE_CONFIG_H is defined
The win32 build system does not generate config.h and correctly runs
the compiler without defining HAVE_CONFIG_H. Nevertheless some files
include config.h without checking for its availability, breaking the
build from a clean directory:
test\utils.h(2) : fatal error C1083: Cannot open include file:
'config.h': No such file or directory
...
Andrea Canciani [Sun, 4 Sep 2011 19:56:20 +0000 (21:56 +0200)]
build-win32: Add root Makefile.win32
Add Makefile.win32 to the pixman root. This makefile can recursively
run the other ones to compile the library or the test suite.
Andrea Canciani [Sun, 4 Sep 2011 16:00:38 +0000 (18:00 +0200)]
build-win32: Share targets and variables across win32 makefiles
The win32 build system repeatedly defines some basic variables
(notably program names and flags) and C sources compilation rules.
They can be factored out to a common Makefile, to be included in every
other Makefile.win32.
Andrea Canciani [Sun, 4 Sep 2011 18:07:42 +0000 (20:07 +0200)]
build: Reuse test sources
Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.
This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.
In order to further simplify the test makefiles, the utility functions
are now in a static library, which gets linked to all the tests and
benchmarks.
Andrea Canciani [Sun, 4 Sep 2011 16:41:41 +0000 (09:41 -0700)]
build: Reuse sources and pixman-combine build rules
Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.
This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.
Andrea Canciani [Sun, 4 Sep 2011 18:07:57 +0000 (20:07 +0200)]
test: Fix compilation on win32
Adding scaling-helpers-test to the testsuite on win32 makes MSVC
complain about int64_t being used as an expression:
scaling-helpers-test.c(27) : error C2275: 'int64_t' : illegal use of
this type as an expression
Søren Sandmann Pedersen [Sun, 11 Sep 2011 23:44:06 +0000 (19:44 -0400)]
Use pkg-config to determine the flags to use with libpng
Previously we would unconditionally link with -lpng leading to build
failures on systems without libpng.
Søren Sandmann Pedersen [Tue, 22 Feb 2011 10:20:36 +0000 (05:20 -0500)]
test: New function to save a pixman image to .png
When debugging it is often very useful to be able to save an image as
a png file. This commit adds a function "write_png()" that does that.
If libpng is not available, then the function becomes a noop.
Søren Sandmann Pedersen [Sat, 10 Sep 2011 03:59:20 +0000 (23:59 -0400)]
Post-release version bump to 0.23.5
Søren Sandmann Pedersen [Sat, 10 Sep 2011 03:51:11 +0000 (23:51 -0400)]
Pre-release version bump to 0.23.4
Chris Wilson [Mon, 22 Aug 2011 14:29:25 +0000 (15:29 +0100)]
bits: optimise fetching width==1 repeats
Profiling ign.com, 20% of the entire render time was absorbed in this
single operation:
<< /content //COLOR_ALPHA /width 480 /height 800 >> surface context
<< /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern
//EXTEND_REPEAT set-extend
set-source
n 0 0 480 677 rectangle
fill+
pop
which is a simple composition of a single pixel wide image. Sadly this
is a workaround for lack of independent repeat-x/y handling in cairo and
pixman. Worse still is that the worst-case behaviour of the general repeat
path is for width 1 images...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Taekyun Kim [Fri, 19 Aug 2011 12:20:08 +0000 (21:20 +0900)]
ARM: NEON better instruction scheduling of over_n_8888
New head, tail, tail/head blocks are added and instructions
are reordered to eliminate pipeline stalls
Performance numbers of before/after
- cortex a8 -
before : L1: 375.39 L2: 391.93 M:114.39 ( 40.99%) HT: 99.37 VT: 98.20 R: 90.24 RT: 32.87 ( 240Kops/s)
after : L1: 481.90 L2: 483.46 M:114.29 ( 40.69%) HT:106.91 VT: 93.38 R: 90.74 RT: 29.51 ( 236Kops/s)
- cortex a9 -
before : L1: 324.50 L2: 332.79 M:155.55 ( 47.51%) HT:111.93 VT: 93.58 R: 71.92 RT: 28.21 ( 233Kops/s)
after : L1: 355.87 L2: 364.49 M:156.90 ( 47.59%) HT:111.52 VT: 91.76 R: 72.16 RT: 28.22 ( 234Kops/s)
Taekyun Kim [Tue, 23 Aug 2011 06:00:11 +0000 (15:00 +0900)]
ARM: NEON better instruction scheduling of over_n_8_8888
tail/head block is expanded and reordered to eliminate stalls
Performance numbers of before/after
- cortex a8 -
before : L1: 201.35 L2: 190.48 M:101.94 ( 54.85%) HT: 78.41 VT: 63.83 R: 58.25 RT: 21.74 ( 191Kops/s)
after : L1: 257.65 L2: 255.49 M:102.04 ( 55.33%) HT: 79.19 VT: 65.46 R: 59.23 RT: 21.12 ( 189Kops/s)
- cortex a9 -
before : L1: 157.35 L2: 159.81 M:133.00 ( 60.94%) HT: 82.44 VT: 63.64 R: 51.66 RT: 19.15 ( 179Kops/s)
after : L1: 216.83 L2: 219.40 M:135.83 ( 61.80%) HT: 85.60 VT: 64.80 R: 52.23 RT: 19.16 ( 179Kops/s)