profile/ivi/pixman.git
14 years agoRemove broken optimizations in combine_disjoint_over_u()
Søren Sandmann Pedersen [Tue, 5 Oct 2010 15:08:42 +0000 (11:08 -0400)]
Remove broken optimizations in combine_disjoint_over_u()

The first broken optimization is that it checks "a != 0x00" where it
should check "s != 0x00". The other is that it skips the computation
when alpha is 0xff. That is wrong because in the formula:

     min (1, (1 - Aa)/Ab)

the render specification states that if Ab is 0, the quotient is
defined to be positive infinity. That is the case even if (1 - Aa) is 0.
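
As an illustration only (this is a sketch, not the pixman source), the factor above can be computed on 8-bit alpha values as follows. The function name and the rounding choice are assumptions made for this example.

```c
#include <stdint.h>

/* Returns min(1, (1 - Aa) / Ab) scaled to the range 0x00..0xff.
 * aa and ab are 8-bit alpha values (0x00 = transparent, 0xff = opaque).
 * The name disjoint_out_factor is hypothetical. */
static uint8_t disjoint_out_factor(uint8_t aa, uint8_t ab)
{
    uint8_t inv_aa = (uint8_t)(0xff - aa);   /* 1 - Aa */

    /* Ab == 0 makes the quotient +infinity, even when 1 - Aa is also 0,
     * so min() clamps the result to 1. The same branch covers every
     * case where (1 - Aa) / Ab >= 1. */
    if (ab <= inv_aa)
        return 0xff;

    /* Here ab > inv_aa >= 0, so the division is safe; round to nearest. */
    return (uint8_t)((inv_aa * 0xff + ab / 2) / ab);
}
```

Note that the fully opaque case (aa == 0xff) is not skipped: with ab == 0 it still yields a factor of 0xff.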

14 years agoARM: restore fallback to ARMv6 implementation from NEON in the delegate chain
Siarhei Siamashka [Mon, 4 Oct 2010 01:49:08 +0000 (04:49 +0300)]
ARM: restore fallback to ARMv6 implementation from NEON in the delegate chain

After fast path cache introduction, the overhead of having this fallback is
insignificant. On the other hand, some of the ARM assembly optimizations (for
example nearest neighbor scaling) do not need NEON.

14 years agoUse more unrolling for scaled src_0565_0565 with nearest filter
Siarhei Siamashka [Wed, 8 Sep 2010 06:30:23 +0000 (09:30 +0300)]
Use more unrolling for scaled src_0565_0565 with nearest filter

Benchmark from Intel Core i7 860:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1335.29 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1550.96 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=2401.31 MPix/s

Benchmark from ARM Cortex-A8:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=81.79 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=197.44 MPix/s

14 years agoARM: added 'neon_composite_out_reverse_8_0565' fast path
Siarhei Siamashka [Thu, 23 Sep 2010 20:41:50 +0000 (23:41 +0300)]
ARM: added 'neon_composite_out_reverse_8_0565' fast path

== before ==

    outrev_8_0565 =  L1:  22.91  L2:  22.40  M: 18.75 ( 10.47%)
                     HT: 12.62   VT: 12.22   R: 11.32  RT:  5.30 (  58Kops/s)

== after ==

    outrev_8_0565 =  L1: 176.27  L2: 151.70  M:108.79 ( 60.81%)
                     HT: 50.43   VT: 37.16   R: 32.26  RT:  9.62 (  97Kops/s)

14 years agoARM: added 'neon_composite_add_0565_8_0565' fast path
Siarhei Siamashka [Thu, 23 Sep 2010 19:28:55 +0000 (22:28 +0300)]
ARM: added 'neon_composite_add_0565_8_0565' fast path

== before ==

    add_0565_8_0565 =  L1:  14.05  L2:  14.03  M: 11.57 ( 12.94%)
                       HT:  8.31   VT:  8.10   R:  7.47  RT:  3.64 (  42Kops/s)

== after ==

    add_0565_8_0565 =  L1: 123.36  L2:  94.70  M: 74.36 ( 83.15%)
                       HT: 31.17   VT: 23.97   R: 21.06  RT:  6.42 (  70Kops/s)

14 years agoARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565
Siarhei Siamashka [Fri, 21 May 2010 13:31:03 +0000 (16:31 +0300)]
ARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565

Prefetch provides up to 40-50% better performance when working
with large images and/or when having lots of L2 cache misses
on ARM Cortex-A8 @ 720MHz:

== before ==

    over_n_8888 =  L1: 225.83  L2: 181.02  M: 55.57 ( 41.41%)
                   HT: 38.96   VT: 36.92   R: 32.84  RT: 14.15 ( 123Kops/s)

    over_n_0565 =  L1: 153.91  L2: 149.69  M: 83.17 ( 30.95%)
                   HT: 50.41   VT: 49.15   R: 40.56  RT: 15.45 ( 131Kops/s)

== after ==

    over_n_8888 =  L1: 222.39  L2: 170.95  M: 76.86 ( 57.27%)
                   HT: 58.80   VT: 53.03   R: 45.51  RT: 14.13 ( 124Kops/s)

    over_n_0565 =  L1: 151.87  L2: 149.54  M:125.63 ( 46.80%)
                   HT: 67.85   VT: 57.54   R: 50.21  RT: 15.32 ( 130Kops/s)

14 years agoFix "syntax error: empty declaration" warnings.
Mika Yrjola [Fri, 1 Oct 2010 13:17:50 +0000 (16:17 +0300)]
Fix "syntax error: empty declaration" warnings.

These minor changes should fix a large number of
macro-declaration-related "syntax error: empty declaration" warnings
seen when compiling the code with the Solaris Studio
compiler.

14 years agoDelete simple repeat code
Søren Sandmann Pedersen [Tue, 28 Sep 2010 04:51:07 +0000 (00:51 -0400)]
Delete simple repeat code

This was supposedly an optimization, but it has pathological cases
where it definitely isn't. For example a 1 x n image will cause it to
have terrible memory access patterns and to generate a ton of modulus
operations.

Since no one has ever measured whether it actually is an improvement,
and since it is doing the repeating at the wrong stage in the
pipeline, and since with the previous commit it can't be triggered
anymore because we now require SAMPLES_COVER_CLIP for regular fast
paths, just delete it.

14 years agoFix bug in FAST_PATH_STD_FAST_PATH
Søren Sandmann Pedersen [Tue, 28 Sep 2010 04:42:25 +0000 (00:42 -0400)]
Fix bug in FAST_PATH_STD_FAST_PATH

The standard fast paths deal with two kinds of images: solids and
bits. These two image types require different flags, but
PIXMAN_STD_FAST_PATH uses the same ones for both.

This patch makes it so that solid images just get the standard flags,
while bits images must be untransformed and contain the destination clip
within the sample grid.

This means that the old FAST_PATH_COVERS_CLIP flag is now not used
anymore, so it can be deleted.

14 years agoSome clean-ups in fence_malloc() and fence_free()
Dmitri Vorobiev [Tue, 28 Sep 2010 11:42:02 +0000 (14:42 +0300)]
Some clean-ups in fence_malloc() and fence_free()

This patch removes an unnecessary typecast of MAP_FAILED,
replaces an erroneous free() with the correct munmap() in the
error path for a failing mprotect(), and, finally, removes
redundant calls to mprotect(), which are unnecessary because
munmap() does not require any specific memory protection.

14 years agoFix search-and-replace issue in lowlevel-blt-bench.c
Søren Sandmann Pedersen [Tue, 28 Sep 2010 06:52:02 +0000 (02:52 -0400)]
Fix search-and-replace issue in lowlevel-blt-bench.c

14 years agoRename all the fast paths with _8000 in their names to _8
Søren Sandmann Pedersen [Fri, 17 Sep 2010 13:21:09 +0000 (09:21 -0400)]
Rename all the fast paths with _8000 in their names to _8

This inconsistent naming somehow survived the refactoring from a while
back.

14 years agoRemove cache prefetch code.
Liu Xinyun [Sat, 25 Sep 2010 06:56:38 +0000 (14:56 +0800)]
Remove cache prefetch code.

Performance decreases with cache prefetch, especially on
Atom, so remove this code. The experiment follows.

old: 0.19.5-with-cache-prefetch
new: 0.19.5-without-cache-prefetch

CPU: Intel Atom N270@1.6GHz
OS: MeeGo (32 bits)
Speedups
========
image-rgba                    poppler-0    17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%):  1.16x speedup
image-rgba                  ocitysmap-0    9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%):  1.09x speedup
image-rgba          xfce4-terminal-a1-0    18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%):  1.08x speedup
image-rgba         gnome-terminal-vim-0    25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%):  1.07x speedup
image-rgba          firefox-talos-gfx-0    57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%):  1.06x speedup
image-rgba       firefox-planet-gnome-0    102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%):  1.06x speedup
image-rgba         swfdec-giant-steps-0    12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%):  1.06x speedup

CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz
OS: Ubuntu 10.04 (64bits)
Speedups
========
image-rgba                  ocitysmap-0    2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%):  1.16x speedup
image-rgba         swfdec-giant-steps-0    1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%):  1.05x speedup

Signed-off-by: Liu Xinyun <xinyun.liu@intel.com>
Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>

14 years agoUse <sys/mman.h> macros only when they are available
Dmitri Vorobiev [Wed, 22 Sep 2010 09:34:57 +0000 (12:34 +0300)]
Use <sys/mman.h> macros only when they are available

Not all systems are regular Unices, so let's be careful with the
mmap()-related stuff, which might be unavailable. This patch makes
sure that mmap() and friends are used only when the <sys/mman.h>
header is found.

14 years agoRevert "add enable-cache-prefetch option"
Søren Sandmann Pedersen [Tue, 21 Sep 2010 18:20:43 +0000 (14:20 -0400)]
Revert "add enable-cache-prefetch option"

Revert this accidentally committed patch.

This reverts commit 19ea0e16b958e5abe491365c203293ab372f3586.

14 years agoIf MAP_ANONYMOUS is not defined, define it to MAP_ANON.
Søren Sandmann Pedersen [Tue, 21 Sep 2010 18:12:00 +0000 (14:12 -0400)]
If MAP_ANONYMOUS is not defined, define it to MAP_ANON.

This hopefully fixes the build failure on OS X.

14 years agoadd enable-cache-prefetch option
Liu Xinyun [Tue, 21 Sep 2010 16:15:10 +0000 (00:15 +0800)]
add enable-cache-prefetch option

OK. here is the work to clear all cache prefetch. Please review it. 3x

On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote:
> Liu Xinyun <xinyun.liu@intel.com> writes:
>
> >    This patch is to add a new configuration option: enable-cache-prefetch,
> > which is default yes.
> >
> >    Here is a link which talks on cache issue.
> >    http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
> >
> >    When disable it on Atom CPU(configured with --enable-cache-prefetch=no),
> > it will have a little performance gain. Here is the patch.
>
> I think the cache prefetch code should just be deleted outright. No
> benchmarks that I'm aware of show it to be an improvement.
>
>
> Thanks,
> Soren

From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001
From: Liu Xinyun <xinyun.liu@intel.com>
Date: Wed, 22 Sep 2010 00:11:56 +0800
Subject: [PATCH] remove cache prefetch

14 years agoPost-release version bump to 0.19.5
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:18:44 +0000 (10:18 -0400)]
Post-release version bump to 0.19.5

14 years agoPre-release version bump to 0.19.4
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:11:34 +0000 (10:11 -0400)]
Pre-release version bump to 0.19.4

14 years agocompute_composite_region32: Zero extents before returning FALSE.
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:05:52 +0000 (10:05 -0400)]
compute_composite_region32: Zero extents before returning FALSE.

If the extents of the composite region are broken such that x2 <= x1
or y2 <= y1, then we need to zero the extents before returning so that
the region won't be completely broken when calling
pixman_region32_fini().
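
A minimal sketch of the idea, using a hypothetical box type and helper rather than the real pixman structures: when clipping leaves nothing, the extents are reset to a well-formed empty box before the failure is reported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a region's extents box. */
typedef struct { int32_t x1, y1, x2, y2; } box_t;

/* Intersects *extents with *clip. Returns false when nothing is left,
 * and in that case leaves *extents as an all-zero (empty) box so that
 * later cleanup code never sees x2 <= x1 or y2 <= y1 with stale values. */
static bool clip_extents(box_t *extents, const box_t *clip)
{
    if (extents->x1 < clip->x1) extents->x1 = clip->x1;
    if (extents->y1 < clip->y1) extents->y1 = clip->y1;
    if (extents->x2 > clip->x2) extents->x2 = clip->x2;
    if (extents->y2 > clip->y2) extents->y2 = clip->y2;

    if (extents->x2 <= extents->x1 || extents->y2 <= extents->y1) {
        extents->x1 = extents->y1 = 0;
        extents->x2 = extents->y2 = 0;
        return false;
    }
    return true;
}
```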

14 years agoAdd a lowlevel blitter benchmark
Jonathan Morton [Fri, 17 Sep 2010 14:52:23 +0000 (17:52 +0300)]
Add a lowlevel blitter benchmark

This test is a modified version of Siarhei's compositor throughput
benchmark.  It's expanded with explicit reporting of memory bandwidth
consumption for the M-test, and with an additional 8x8-random test
intended to determine peak ops/sec capability.  There are also quite a
lot more operations tested for.

14 years agoAdd noinline macro
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:22 +0000 (17:52 +0300)]
Add noinline macro

This patch adds a noinline macro, which expands to compiler-dependent
keywords that tell the compiler to never inline a function.
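
One way such a macro can be written is sketched below. The macro is spelled NO_INLINE here (an assumption for this example) so that it cannot interfere with headers that use the attribute name directly, and add_one() is only a demonstration function.

```c
/* Pick the compiler-specific spelling of "never inline this function". */
#if defined(_MSC_VER)
#  define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define NO_INLINE __attribute__((noinline))
#else
#  define NO_INLINE   /* unknown compiler: expand to nothing */
#endif

/* Demonstration function: the compiler is asked to keep it as a real call. */
static NO_INLINE int add_one(int x)
{
    return x + 1;
}
```

On an unknown compiler the macro expands to nothing, so the code still builds, just without the guarantee.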

14 years agoAdd gettime() routine to test utils
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:21 +0000 (17:52 +0300)]
Add gettime() routine to test utils

Impending benchmark code will need a function to get the current time
in seconds, and this patch introduces such a routine. We try to use
the POSIX gettimeofday() function when available, and fall back to
clock() when not.
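
The fallback logic can be sketched roughly as below. HAVE_GETTIMEOFDAY is assumed to be defined by the build system when the function is available, and now_seconds() is a name chosen for this example.

```c
#include <stddef.h>
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Returns a timestamp in seconds as a double. */
static double now_seconds(void)
{
#ifdef HAVE_GETTIMEOFDAY
    /* Wall-clock time with microsecond resolution. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
#else
    /* Portable fallback: processor time used by the program. */
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}
```

The two branches do not measure the same thing: clock() reports CPU time rather than wall-clock time, which is usually acceptable for a CPU-bound benchmark loop.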

14 years agoMove aligned_malloc() to utils
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:20 +0000 (17:52 +0300)]
Move aligned_malloc() to utils

The aligned_malloc() routine will be used in more than one test utility.
At least, a low-level blitter benchmark needs it. Therefore, let's make
this function a part of common test utilities code.

14 years agoEnable bits_image_fetch_bilinear_affine_normal_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:23 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:10 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_none_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:00 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_none_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_pad_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:44 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_normal_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:27 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_a8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:12 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_a8

14 years agoEnable bits_image_fetch_bilinear_affine_none_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:31:57 +0000 (10:31 -0400)]
Enable bits_image_fetch_bilinear_affine_none_a8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:31:45 +0000 (10:31 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_a8

14 years agoEnable bits_image_fetch_bilinear_affine_normal_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:41:20 +0000 (02:41 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:41:08 +0000 (02:41 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_none_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:56 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:46 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_normal_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:16 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:03 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_none_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:39:51 +0000 (02:39 -0400)]
Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:39:37 +0000 (02:39 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8

14 years agoUse a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.
Søren Sandmann Pedersen [Sun, 23 May 2010 08:44:33 +0000 (04:44 -0400)]
Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.

There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/reflect/none/normal repeat modes. The bulk of each scaler is an
inline function that takes a format and a repeat mode as parameters.

The new scalers are all commented out, but the next commits will
enable them one at a time to facilitate bisecting.
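
The macro technique can be sketched as follows. For brevity this example fetches a single nearest pixel instead of doing bilinear interpolation, and all names in it are hypothetical; it only shows how one inline body plus a macro yields one specialized function per repeat mode.

```c
#include <stdint.h>

typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_t;

/* Generic body. 'mode' is a compile-time constant at every call site
 * generated below, so an optimizing compiler can drop unused branches. */
static inline uint32_t fetch_generic(const uint32_t *row, int width,
                                     int x, repeat_t mode)
{
    switch (mode) {
    case REPEAT_NONE:                 /* outside the image: transparent */
        if (x < 0 || x >= width)
            return 0;
        break;
    case REPEAT_NORMAL:               /* tile the image */
        x = ((x % width) + width) % width;
        break;
    case REPEAT_PAD:                  /* clamp to the edge pixel */
        if (x < 0) x = 0;
        else if (x >= width) x = width - 1;
        break;
    case REPEAT_REFLECT:              /* mirror with period 2 * width */
        x = ((x % (2 * width)) + 2 * width) % (2 * width);
        if (x >= width) x = 2 * width - x - 1;
        break;
    }
    return row[x];
}

/* Stamp out one specialized fetcher per repeat mode. */
#define MAKE_FETCHER(name, mode)                                         \
    static uint32_t fetch_##name(const uint32_t *row, int width, int x)  \
    {                                                                    \
        return fetch_generic(row, width, x, mode);                       \
    }

MAKE_FETCHER(none, REPEAT_NONE)
MAKE_FETCHER(normal, REPEAT_NORMAL)
MAKE_FETCHER(pad, REPEAT_PAD)
MAKE_FETCHER(reflect, REPEAT_REFLECT)
```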

14 years agotest: Add affine-test
Søren Sandmann Pedersen [Wed, 14 Jul 2010 20:27:27 +0000 (16:27 -0400)]
test: Add affine-test

This test tests compositing with various affine transformations. It is
almost identical to scaling-test, except that it also applies a random
rotation in addition to the random scaling and translation.

14 years agoanalyze_extents: Fast path for non-transformed BITS images
Søren Sandmann Pedersen [Sun, 12 Sep 2010 10:07:41 +0000 (06:07 -0400)]
analyze_extents: Fast path for non-transformed BITS images

Profiling various cairo traces showed that we were spending a lot of
time in analyze_extents and compute_sample_extents(). This was
especially bad for glyphs where all this computation was completely
unnecessary.

This patch adds a fast path for the case of non-transformed BITS
images. The result is approximately a 6% improvement on the
firefox-talos-gfx benchmark:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   13.797   13.848   0.20%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   12.946   13.018   0.39%    6/6

14 years agoMove some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c
Søren Sandmann Pedersen [Thu, 16 Sep 2010 12:35:05 +0000 (08:35 -0400)]
Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c

When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag
can be set in compute_image_info().

Also the code that turned this flag off in pixman.c was not correct;
it didn't take transformations into account. With this patch, pixman.c
doesn't set the flag by default, but instead relies on the call to
compute_sample_extents() to set it when possible.

14 years agoSupport __thread on MINGW 4.5
Tor Lillqvist [Wed, 15 Sep 2010 15:53:47 +0000 (11:53 -0400)]
Support __thread on MINGW 4.5

By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse
and mmx work fine.

I added the below to pixman 0.18 and as far as I can see, it works.
make check reports no problems. (Earlier I had to use --disable-mmx
and --disable-sse2.) Also gtk-demo and gimp run fine.

(Also a change to get rid of the warnings about -fvisibility being ignored.)

14 years agoClip composite region against the destination alpha map extents.
Søren Sandmann Pedersen [Mon, 30 Aug 2010 02:46:09 +0000 (22:46 -0400)]
Clip composite region against the destination alpha map extents.

Otherwise we can end up writing outside the alpha map.

14 years agoRemove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map
Søren Sandmann Pedersen [Sun, 29 Aug 2010 21:07:40 +0000 (17:07 -0400)]
Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map

If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.

In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.

14 years agoRename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT
Søren Sandmann Pedersen [Sun, 29 Aug 2010 21:03:01 +0000 (17:03 -0400)]
Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT

This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.

14 years agoUpdate and extend the alphamap test
Søren Sandmann Pedersen [Sun, 29 Aug 2010 20:59:02 +0000 (16:59 -0400)]
Update and extend the alphamap test

- Test many more combinations of formats

- Test destination alpha maps

- Test various different alpha origins

Also add a transformation to the destination, but comment it out
because it is actually broken at the moment (and pretty difficult to
fix).

14 years agoAdd fence_malloc() and fence_free().
Søren Sandmann Pedersen [Mon, 13 Sep 2010 18:34:34 +0000 (14:34 -0400)]
Add fence_malloc() and fence_free().

These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accesses will cause
a segmentation fault.

If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().
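
A rough sketch of such a guarded allocator is shown below. It is not the pixman implementation: the names, the one-page guards, the page layout, and the platform check standing in for configure-time detection are all assumptions of this example.

```c
#define _DEFAULT_SOURCE            /* ask glibc to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__unix__) || defined(__APPLE__)   /* stand-in for configure checks */
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

typedef struct { void *base; size_t total; } guard_info_t;

/* Layout: [info page][guard page][payload pages][guard page] */
static void *guarded_alloc(size_t n_bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t payload_len, total;
    uint8_t *base, *payload;

    if (n_bytes == 0 || n_bytes > SIZE_MAX - 4 * page)
        return NULL;
    payload_len = (n_bytes + page - 1) / page * page;
    total = payload_len + 3 * page;

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Remember the mapping in the first page so it can be released later. */
    ((guard_info_t *)base)->base = base;
    ((guard_info_t *)base)->total = total;
    payload = base + 2 * page;

    if (mprotect(base + page, page, PROT_NONE) == -1 ||
        mprotect(payload + payload_len, page, PROT_NONE) == -1) {
        munmap(base, total);   /* mmap()ed memory is released with munmap() */
        return NULL;
    }
    return payload;
}

static void guarded_free(void *p)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    guard_info_t *info;

    if (!p)
        return;
    info = (guard_info_t *)((uint8_t *)p - 2 * page);
    munmap(info->base, info->total);
}
#else
/* No page protection available: plain malloc()/free(). */
static void *guarded_alloc(size_t n_bytes) { return malloc(n_bytes); }
static void guarded_free(void *p) { free(p); }
#endif
```

Because the payload is rounded up to whole pages, an overrun is only caught once it crosses into the trailing guard page; an underrun hits the leading guard page immediately.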

14 years agoDo opacity computation with shifts instead of comparing with 0
Søren Sandmann Pedersen [Sun, 12 Sep 2010 08:35:08 +0000 (04:35 -0400)]
Do opacity computation with shifts instead of comparing with 0

Also add a COMPILE_TIME_ASSERT() macro and use it to assert that the
shift is correct.
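
The idea might look like the sketch below. The flag value, the shift constant, and the function name are hypothetical; the assertion macro uses the common pre-C11 trick of declaring an array type with a negative size when the condition is false.

```c
#include <stdint.h>

/* A false condition yields an array of size -1, which fails to compile. */
#define COMPILE_TIME_ASSERT(x) \
    do { typedef int compile_time_assertion[(x) ? 1 : -1]; } while (0)

#define OPAQUE_SHIFT   13            /* hypothetical bit position */
#define FLAG_IS_OPAQUE (1u << 13)    /* hypothetical flag value   */

/* Builds a 2-bit index: bit 0 = source opaque, bit 1 = destination opaque.
 * The bits are moved into place with shifts instead of "!= 0" comparisons. */
static unsigned opaque_index(uint32_t src_flags, uint32_t dst_flags)
{
    /* The shifts below are only right if the flag sits at OPAQUE_SHIFT. */
    COMPILE_TIME_ASSERT(FLAG_IS_OPAQUE == (1u << OPAQUE_SHIFT));

    unsigned src_bit = (src_flags & FLAG_IS_OPAQUE) >> OPAQUE_SHIFT;
    unsigned dst_bit = (dst_flags & FLAG_IS_OPAQUE) >> (OPAQUE_SHIFT - 1);

    return dst_bit | src_bit;
}
```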

14 years agoSSE2 optimization for scaled over_8888_8888 operation with nearest filter
Siarhei Siamashka [Wed, 8 Sep 2010 06:16:12 +0000 (09:16 +0300)]
SSE2 optimization for scaled over_8888_8888 operation with nearest filter

This is a first demo implementation; it should be possible to
generalize it later to cover more operations with fewer lines of code.

It should also be possible to use the '__builtin_constant_p'
gcc builtin function as an efficient way of checking whether 'unit_x' is
known to be zero at compile time (when processing padding pixels for NONE
or PAD repeat).

Benchmarks from Intel Core i7 860:

== before (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=142.01 MPix/s

== after (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=314.99 MPix/s

== performance of nonscaled operation as a reference ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=652.09 MPix/s

14 years agoNONE repeat support for fast scaling with nearest filter
Siarhei Siamashka [Thu, 16 Sep 2010 15:25:40 +0000 (18:25 +0300)]
NONE repeat support for fast scaling with nearest filter

Implemented very similarly to PAD repeat.

And gcc also seems to be able to completely eliminate the
code responsible for left and right padding pixels for OVER
operation with NONE repeat.

14 years agoPAD repeat support for fast scaling with nearest filter
Siarhei Siamashka [Thu, 16 Sep 2010 14:10:40 +0000 (17:10 +0300)]
PAD repeat support for fast scaling with nearest filter

When processing pixels from the left and right padding, the same
scanline function is used with 'unit_x' set to 0.

It actually appears that gcc can handle this quite efficiently. When
using the 'restrict' keyword, it is able to optimize the whole operation
performed on left or right padding pixels to a small unrolled loop
(the code is reduced to a simple fill implementation):

    9b30:       89 08                   mov    %ecx,(%rax)
    9b32:       89 48 04                mov    %ecx,0x4(%rax)
    9b35:       48 83 c0 08             add    $0x8,%rax
    9b39:       49 39 c0                cmp    %rax,%r8
    9b3c:       75 f2                   jne    9b30

Without the 'restrict' keyword, there is one more instruction: reloading
source pixel data from memory at the beginning of each iteration. That
is slower, but also acceptable.
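
A simplified sketch of the scheme, for a plain SRC copy with 16.16 fixed-point coordinates. The function names and the way the padding widths are passed in are assumptions of this example, not the pixman interface.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16 fixed point */

/* Nearest-neighbour scanline: one source sample per destination pixel. */
static void scanline_nearest(uint32_t *restrict dst,
                             const uint32_t *restrict src,
                             int w, fixed_t vx, fixed_t unit_x)
{
    while (w-- > 0) {
        *dst++ = src[vx >> 16];
        vx += unit_x;
    }
}

/* PAD repeat: the same scanline function handles the padding, called
 * with unit_x = 0 so that it keeps sampling a single edge pixel. */
static void scanline_nearest_pad(uint32_t *dst, const uint32_t *src,
                                 int src_width, int left_pad, int width,
                                 int right_pad, fixed_t vx, fixed_t unit_x)
{
    /* Left padding: repeat the first source pixel. */
    scanline_nearest(dst, src, left_pad, 0, 0);
    /* Middle part: regular scaling. */
    scanline_nearest(dst + left_pad, src, width, vx, unit_x);
    /* Right padding: repeat the last source pixel. */
    scanline_nearest(dst + left_pad + width, src + src_width - 1,
                     right_pad, 0, 0);
}
```

With unit_x equal to 0 the loop reads the same source pixel on every iteration, which is what allows the compiler to reduce it to a fill.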

14 years agoIntroduce a fake PIXMAN_REPEAT_COVER constant
Siarhei Siamashka [Fri, 17 Sep 2010 13:22:25 +0000 (16:22 +0300)]
Introduce a fake PIXMAN_REPEAT_COVER constant

We need to implement true PIXMAN_REPEAT_NONE support later (padding
the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE
for handling the FAST_PATH_SAMPLES_COVER_CLIP special case.

14 years agoNearest scaling fast path macro split into two parts
Siarhei Siamashka [Thu, 16 Sep 2010 10:02:18 +0000 (13:02 +0300)]
Nearest scaling fast path macro split into two parts

Scanline processing is now split into a separate function. This provides
an easy way of overriding it with a platform specific implementation,
which may use SIMD optimizations. Only basic C data types are used as
the arguments for this function, so it may be implemented entirely in
assembly or be generated by some JIT engine.

Also as a result of this split, the complexity of code is reduced a
bit and now it should be easier to introduce support for the currently
missing NONE, PAD and REFLECT repeat types.

14 years agoNearest scaling fast path macros moved to 'pixman-fast-path.h'
Siarhei Siamashka [Thu, 16 Sep 2010 09:31:27 +0000 (12:31 +0300)]
Nearest scaling fast path macros moved to 'pixman-fast-path.h'

These macros, with some modifications, can be reused later by
various platform specific implementations, introducing SIMD
optimizations for nearest scaling fast paths.

14 years agoAdd FAST_PATH_NO_ALPHA_MAP to the standard destination flags.
Søren Sandmann Pedersen [Sun, 29 Aug 2010 20:26:45 +0000 (16:26 -0400)]
Add FAST_PATH_NO_ALPHA_MAP to the standard destination flags.

We can't in general take a fast path if the destination has an alpha
map.

14 years agotest: detection of possible floating point registers corruption
Siarhei Siamashka [Thu, 9 Sep 2010 09:02:59 +0000 (12:02 +0300)]
test: detection of possible floating point registers corruption

Added a pair of macros which can help to detect corruption
of floating point registers after a function call. This may
happen if _mm_empty() call is forgotten in MMX/SSE2 fast
path code, or ARM NEON assembly optimized function
forgets to save/restore d8-d15 registers before use.
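
A reduced sketch of how such a pair of macros could work is given below. The macro and function names are made up for this example, and there is no guarantee that the compiler keeps the canaries in floating point registers across the call, so this is a heuristic check at best.

```c
/* Canaries are loaded from volatile constants so that the compiler
 * cannot fold the later comparison away at compile time. */
#define FP_CANARY_START()                              \
    volatile double fp_canary_const1 = 123451.0;       \
    volatile double fp_canary_const2 = 123452.0;       \
    double fp_canary_var1 = fp_canary_const1;          \
    double fp_canary_var2 = fp_canary_const2

/* Evaluates to 1 if both canaries still hold their original values. */
#define FP_CANARY_INTACT()                             \
    (fp_canary_var1 == fp_canary_const1 &&             \
     fp_canary_var2 == fp_canary_const2)

/* Demonstration workload that touches floating point state. */
static double demo_sink;
static void demo_work(void)
{
    demo_sink = 1.5 * 2.0;
}

/* Calls fn() between the two macros and reports whether the canaries
 * survived the call. */
static int call_with_canaries(void (*fn)(void))
{
    FP_CANARY_START();
    fn();
    return FP_CANARY_INTACT();
}
```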

14 years agoARM: added 'neon_composite_over_0565_8_0565' fast path
Siarhei Siamashka [Mon, 6 Sep 2010 22:15:57 +0000 (01:15 +0300)]
ARM: added 'neon_composite_over_0565_8_0565' fast path

14 years agoARM: helper macros for conversion between 8888/x888/0565 formats
Siarhei Siamashka [Mon, 6 Sep 2010 22:10:43 +0000 (01:10 +0300)]
ARM: helper macros for conversion between 8888/x888/0565 formats

14 years agoARM: common init/cleanup macro for saving/restoring NEON registers
Siarhei Siamashka [Mon, 6 Sep 2010 22:05:44 +0000 (01:05 +0300)]
ARM: common init/cleanup macro for saving/restoring NEON registers

This is a typical prologue/epilogue for many NEON fast path functions, so
it makes sense to provide common reusable macros for it in the header file.

14 years agoSilence some warnings about uninitialized variables
Søren Sandmann Pedersen [Thu, 2 Sep 2010 23:43:08 +0000 (19:43 -0400)]
Silence some warnings about uninitialized variables

Neither was a real problem, but GCC was complaining about them.

14 years agoWhen pixman_compute_composite_region32() returns FALSE, don't fini the region.
Søren Sandmann Pedersen [Tue, 31 Aug 2010 04:30:54 +0000 (00:30 -0400)]
When pixman_compute_composite_region32() returns FALSE, don't fini the region.

The rule is that the region passed in must be initialized and that the
region returned will still be valid. I.e., the lifecycle is the
responsibility of the caller, regardless of what the function returns.

Previously, compute_composite_region32() would finalize the region and
then return FALSE, and then the caller would finalize the region
again, leading to memory corruption in some cases.

14 years agoStore a2b2g2r2 pixel through the WRITE macro
Søren Sandmann Pedersen [Mon, 30 Aug 2010 04:16:07 +0000 (00:16 -0400)]
Store a2b2g2r2 pixel through the WRITE macro

Otherwise, accessor functions won't work.

14 years agoARM: added 'neon_composite_over_8888_8_0565' fast path
Siarhei Siamashka [Mon, 23 Aug 2010 15:24:32 +0000 (18:24 +0300)]
ARM: added 'neon_composite_over_8888_8_0565' fast path

14 years agoAdd *.exe to .gitignore
Maarten Bosmans [Mon, 30 Aug 2010 06:55:00 +0000 (08:55 +0200)]
Add *.exe to .gitignore

14 years agoUse windows.h directly for mingw32 build
Maarten Bosmans [Sun, 29 Aug 2010 04:28:42 +0000 (06:28 +0200)]
Use windows.h directly for mingw32 build

This patch addresses the issue discussed in
http://lists.freedesktop.org/archives/pixman/2010-April/000163.html

There were only two clashing identifiers.  The first one is IN, which
obviously causes problems in Pixman for lines like

    PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8),

Fortunately the mingw headers provide a solution: by defining
_NO_W32_PSEUDO_MODIFIERS, these stupid symbols are skipped.

The other name is UINT64, used in pixman-mmx.c. I renamed that
function to to_uint64, but maybe another name is more appropriate.

14 years agoBe more paranoid about checking for GTK+
Søren Sandmann Pedersen [Mon, 23 Aug 2010 13:27:38 +0000 (09:27 -0400)]
Be more paranoid about checking for GTK+

From time to time people run into issues where the configure script
detects GTK+ when it is either not installed, or not functional due to
a missing pixman. Most recently:

  https://bugs.freedesktop.org/show_bug.cgi?id=29736

This patch makes the configure script more paranoid by

- always using PKG_CHECK_MODULES and not PKG_CHECK_EXISTS, since it
seems PKG_CHECK_EXISTS will sometimes return true even if a dependency
of GTK+, such as pixman-1, is missing.

- explicitly checking that pixman-1 is installed before enabling GTK+.

Cc: my.somewhat.lengthy.loginname@gmail.com

14 years agoMerge pixman_image_composite32() and do_composite().
Søren Sandmann Pedersen [Sun, 22 Aug 2010 15:09:45 +0000 (11:09 -0400)]
Merge pixman_image_composite32() and do_composite().

There is not much point having a separate function that just validates
the images. Also add a boolean return to lookup_composite_function()
so that we can return if no composite function is found.

14 years agoregion: Fix pixman_region_translate() clipping bug
Benjamin Otte [Mon, 23 Aug 2010 16:20:09 +0000 (18:20 +0200)]
region: Fix pixman_region_translate() clipping bug

Fixes the region-translate test case by clipping region translations to
the newly defined PIXMAN_REGION_MIN/MAX and using the newly introduced
type overflow_int_t to check for the overflow.
Also uses INT16_MAX or INT32_MAX for these values instead of relying on
the size of short and int types.
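
The clamping step can be illustrated for a single 32-bit coordinate as below. The definitions of overflow_int_t, REGION_MIN and REGION_MAX here are assumptions for the 32-bit case, and the real function also has to deal with boxes that end up empty, which this sketch leaves out.

```c
#include <stdint.h>

/* A type wide enough to hold the sum of two 32-bit coordinates. */
typedef int64_t overflow_int_t;

#define REGION_MIN INT32_MIN
#define REGION_MAX INT32_MAX

/* Translates one coordinate, clamping instead of wrapping on overflow. */
static int32_t translate_coord(int32_t c, int32_t delta)
{
    overflow_int_t r = (overflow_int_t)c + delta;

    if (r < REGION_MIN)
        return REGION_MIN;
    if (r > REGION_MAX)
        return REGION_MAX;
    return (int32_t)r;
}
```

Using the fixed-width limits from <stdint.h> keeps the bounds independent of how wide short and int happen to be on a given platform.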

14 years agoregion: Add a new test region-translate
Benjamin Otte [Tue, 24 Aug 2010 10:17:18 +0000 (12:17 +0200)]
region: Add a new test region-translate

This test exercises a bug in pixman_region32_translate(). The function
clips the region to int16 coordinates SHRT_MIN/SHRT_MAX.

14 years agoPost-release version bump to 0.19.3
Søren Sandmann Pedersen [Sat, 21 Aug 2010 10:39:44 +0000 (06:39 -0400)]
Post-release version bump to 0.19.3

14 years agoPre-release version bump to 0.19.2
Søren Sandmann Pedersen [Sat, 21 Aug 2010 10:33:19 +0000 (06:33 -0400)]
Pre-release version bump to 0.19.2

14 years agoOnly try to compute the FAST_SAMPLES_COVER_CLIP for bits images
Søren Sandmann Pedersen [Mon, 16 Aug 2010 11:24:48 +0000 (07:24 -0400)]
Only try to compute the FAST_SAMPLES_COVER_CLIP for bits images

It doesn't make sense in other cases, and the computation would make
use of image->bits.{width,height} which led to uninitialized memory
accesses when the image wasn't of type BITS.

14 years agoIntroduce new FAST_PATH_SAMPLES_OPAQUE flag
Søren Sandmann Pedersen [Tue, 10 Aug 2010 00:54:49 +0000 (20:54 -0400)]
Introduce new FAST_PATH_SAMPLES_OPAQUE flag

This flag is set whenever the pixels of a bits image don't have an
alpha channel. Together with FAST_PATH_SAMPLES_COVER_CLIP it implies
that the image effectively is opaque, so we can do operator reductions
such as OVER->SRC.
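
The reduction can be sketched like this, with hypothetical flag values and a two-operator enum. It relies on OVER computing src + (1 - alpha_src) * dst, which collapses to src when every sample has alpha 1.

```c
#include <stdint.h>

typedef enum { OP_SRC, OP_OVER } op_t;

/* Hypothetical bit assignments for this example. */
#define FLAG_SAMPLES_OPAQUE     (1u << 0)
#define FLAG_SAMPLES_COVER_CLIP (1u << 1)

/* OVER with an effectively opaque source gives the same result as SRC. */
static op_t reduce_operator(op_t op, uint32_t src_flags)
{
    const uint32_t opaque = FLAG_SAMPLES_OPAQUE | FLAG_SAMPLES_COVER_CLIP;

    if (op == OP_OVER && (src_flags & opaque) == opaque)
        return OP_SRC;
    return op;
}
```

Both flags are needed: pixels without an alpha channel are only opaque where the image actually supplies samples, so the samples must also cover the clip.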

14 years agopixman_image_set_alpha_map(): Disallow alpha map cycles
Søren Sandmann Pedersen [Wed, 4 Aug 2010 21:51:49 +0000 (17:51 -0400)]
pixman_image_set_alpha_map(): Disallow alpha map cycles

If someone tries to set an alpha map that itself has an alpha map,
simply return. Also, if someone tries to add an alpha map to an image
that is being _used_ as an alpha map, simply return.

This ensures that an alpha map can never have an alpha map.
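
A toy version of the two checks, without reference counting, might look like this. The struct layout and names are assumptions, and the self-reference guard is an extra added in this sketch.

```c
#include <stddef.h>

typedef struct image image_t;
struct image {
    image_t *alpha_map;    /* this image's alpha map, or NULL       */
    int      alpha_count;  /* how many images use this one as a map */
};

static void set_alpha_map(image_t *image, image_t *alpha_map)
{
    if (alpha_map) {
        if (alpha_map == image)       /* extra guard added in this sketch */
            return;
        if (image->alpha_count > 0)   /* image is in use as an alpha map */
            return;
        if (alpha_map->alpha_map)     /* candidate has its own alpha map */
            return;
    }

    if (image->alpha_map == alpha_map)
        return;

    if (image->alpha_map)
        image->alpha_map->alpha_count--;
    image->alpha_map = alpha_map;
    if (alpha_map)
        alpha_map->alpha_count++;
}
```

With both checks in place a chain of alpha maps can never be longer than one link, so a loop cannot be formed.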

14 years agoAdd alpha-loop test program
Søren Sandmann Pedersen [Wed, 4 Aug 2010 21:55:14 +0000 (17:55 -0400)]
Add alpha-loop test program

This tests what happens if you attempt to make an image with an alpha
map that has the image as its alpha map. This results in an infinite
loop in _pixman_image_validate(), so the test sets up a SIGALRM to
exit if it runs for more than five seconds.

14 years agoARM: 'neon_combine_out_reverse_u' combiner
Siarhei Siamashka [Mon, 31 May 2010 16:24:43 +0000 (19:24 +0300)]
ARM: 'neon_combine_out_reverse_u' combiner

This operation was seen in mozilla browser profiling logs.
Implemented so that 'over' and 'out_reverse' operations
now reuse common parts of code.

14 years agoCode simplification (no need advancing 'vx' at the end of scanline)
Siarhei Siamashka [Fri, 19 Mar 2010 10:21:32 +0000 (12:21 +0200)]
Code simplification (no need advancing 'vx' at the end of scanline)

14 years agoStore the various bits image fetchers in a table with formats and flags.
Søren Sandmann Pedersen [Fri, 2 Jul 2010 18:14:21 +0000 (14:14 -0400)]
Store the various bits image fetchers in a table with formats and flags.

Similarly to how the fast paths are done, put the various bits_image
fetchers in a table, so that we can quickly find the best one based on
the image's flags and format.
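
As a rough illustration of the table lookup, with invented format codes,
flag bits and fetcher names (this is not pixman's actual table):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not pixman's actual table. */
#define FORMAT_ANY         0u
#define FORMAT_A8R8G8B8    1u
#define FLAG_NO_TRANSFORM  (1u << 0)
#define FLAG_AFFINE        (1u << 1)

typedef void (*fetch_func_t) (void);

typedef struct
{
    uint32_t     format;  /* required format, or FORMAT_ANY */
    uint32_t     flags;   /* flags the image must have set */
    fetch_func_t fetch;
} fetcher_info_t;

static void fetch_untransformed_8888 (void) {}
static void fetch_affine (void) {}
static void fetch_general (void) {}

/* Ordered from most specific to most general; the last entry
 * matches every image. */
static const fetcher_info_t fetchers[] =
{
    { FORMAT_A8R8G8B8, FLAG_NO_TRANSFORM, fetch_untransformed_8888 },
    { FORMAT_ANY,      FLAG_AFFINE,       fetch_affine },
    { FORMAT_ANY,      0,                 fetch_general },
};

/* Return the first fetcher whose format and required flags match. */
static fetch_func_t
find_fetcher (uint32_t format, uint32_t flags)
{
    size_t i;

    for (i = 0; i < sizeof fetchers / sizeof fetchers[0]; i++)
    {
        const fetcher_info_t *f = &fetchers[i];

        if ((f->format == FORMAT_ANY || f->format == format) &&
            (f->flags & flags) == f->flags)
        {
            return f->fetch;
        }
    }

    return NULL; /* not reached while the catch-all entry exists */
}
```

Because the first match wins, the order of the entries decides which
fetcher counts as "best".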

14 years agoAdd some new FAST_PATH flags
Søren Sandmann Pedersen [Fri, 2 Jul 2010 16:53:56 +0000 (12:53 -0400)]
Add some new FAST_PATH flags

The flags are:

 *  AFFINE_TRANSFORM, for affine transforms

 *  Y_UNIT_ZERO, for when the [1][0] entry in the transformation matrix is zero

 *  FILTER_BILINEAR, for when the image has a bilinear filter

 *  NO_NORMAL_REPEAT, for when the repeat mode is not NORMAL

 *  HAS_TRANSFORM, for when the transform is not NULL

Also add some new FAST_PATH_REPEAT_* macros. These are just shorthands
for the image not having any of the other repeat modes. For example
REPEAT_NORMAL is (NO_NONE | NO_PAD | NO_REFLECT).
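
The shorthands can be pictured as follows. The short names mirror the
ones used above, while the bit positions and helper functions are made
up for this sketch:

```c
#include <stdint.h>

/* Bit positions are made up for this sketch. */
#define NO_NONE     (1u << 0)
#define NO_PAD      (1u << 1)
#define NO_REFLECT  (1u << 2)
#define NO_NORMAL   (1u << 3)

/* Each shorthand means "none of the other repeat modes". */
#define REPEAT_NONE     (NO_PAD  | NO_REFLECT | NO_NORMAL)
#define REPEAT_PAD      (NO_NONE | NO_REFLECT | NO_NORMAL)
#define REPEAT_REFLECT  (NO_NONE | NO_PAD     | NO_NORMAL)
#define REPEAT_NORMAL   (NO_NONE | NO_PAD     | NO_REFLECT)

typedef enum { MODE_NONE, MODE_PAD, MODE_REFLECT, MODE_NORMAL } repeat_mode_t;

/* An image gets every "NO_x" flag except the one for its own mode. */
static uint32_t
repeat_flags (repeat_mode_t mode)
{
    switch (mode)
    {
    case MODE_NONE:    return REPEAT_NONE;
    case MODE_PAD:     return REPEAT_PAD;
    case MODE_REFLECT: return REPEAT_REFLECT;
    case MODE_NORMAL:  return REPEAT_NORMAL;
    }

    return 0;
}

/* A fast path applies when all of its required flags are present. */
static int
has_flags (uint32_t image_flags, uint32_t required)
{
    return (image_flags & required) == required;
}
```

Expressing the modes as negative flags lets a fast path require a
specific repeat mode with the same "all required bits set" test used
for every other flag.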

14 years agoRemove "_raw_" from all the accessors.
Søren Sandmann Pedersen [Fri, 2 Jul 2010 16:45:44 +0000 (12:45 -0400)]
Remove "_raw_" from all the accessors.

There are no non-raw accessors anymore.

14 years agoEliminate the store_scanline_{32,64} function pointers.
Søren Sandmann Pedersen [Fri, 2 Jul 2010 16:34:42 +0000 (12:34 -0400)]
Eliminate the store_scanline_{32,64} function pointers.

Now that we can't recurse on alpha maps, they are not needed anymore.

14 years agoSplit bits_image_fetch_transformed() into two functions.
Søren Sandmann Pedersen [Fri, 2 Jul 2010 16:31:50 +0000 (12:31 -0400)]
Split bits_image_fetch_transformed() into two functions.

One function deals with the common affine, no-alpha-map case. The
other deals with perspective transformations and alpha maps.

14 years agoEliminate get_pixel_32() and get_pixel_64() from bits_image.
Søren Sandmann Pedersen [Fri, 2 Jul 2010 16:11:44 +0000 (12:11 -0400)]
Eliminate get_pixel_32() and get_pixel_64() from bits_image.

These functions can simply be passed as arguments to the various pixel
fetchers. We don't need to store them. Since they are known at compile
time and the pixel fetchers are force_inline, this is not a
performance issue.
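
A sketch of the idea, using plain "static inline" in place of pixman's
force_inline and invented function names. When the getter passed in is
a compile-time constant, the compiler can inline it into the loop:

```c
#include <stdint.h>

typedef uint32_t (*get_pixel_t) (const uint32_t *bits, int stride,
                                 int x, int y);

/* Hypothetical getter for a 32 bpp image; stride is in pixels. */
static inline uint32_t
get_pixel_8888 (const uint32_t *bits, int stride, int x, int y)
{
    return bits[y * stride + x];
}

/* Generic fetcher: the pixel getter is an argument, not a stored
 * function pointer in the image. */
static inline void
fetch_row (const uint32_t *bits, int stride, int y, int width,
           uint32_t *out, get_pixel_t get_pixel)
{
    int x;

    for (x = 0; x < width; x++)
        out[x] = get_pixel (bits, stride, x, y);
}

/* Specialization: the getter is known at compile time here. */
static void
fetch_row_8888 (const uint32_t *bits, int stride, int y, int width,
                uint32_t *out)
{
    fetch_row (bits, stride, y, width, out, get_pixel_8888);
}
```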

Also temporarily make all pixel access go through the alpha path.

14 years agoEliminate recursion from alpha map code
Søren Sandmann Pedersen [Fri, 2 Jul 2010 15:58:23 +0000 (11:58 -0400)]
Eliminate recursion from alpha map code

Alpha maps with alpha maps are no longer supported. It's not a useful
feature and it could lead to infinite recursion.

14 years agoReplace compute_src_extent_flags() with analyze_extents()
Søren Sandmann Pedersen [Thu, 22 Jul 2010 08:27:45 +0000 (04:27 -0400)]
Replace compute_src_extent_flags() with analyze_extents()

This commit fixes two separate problems: 1. Incorrect computation of
the FAST_PATH_SAMPLES_COVER_CLIP flag, and 2. FAST_PATH_16BIT_SAFE is
a nonsensical thing to compute.

== 1. Incorrect computation of SAMPLES_COVER_CLIP:

Previously we were using pixman_transform_bounds() to compute which
source samples would be used for a composite operation. This is
incorrect for several reasons:

(a) pixman_transform_bounds() is transforming the integer bounding box
of the destination samples, where it should be transforming the
bounding box of the samples themselves. In other words, it is too
pessimistic in some cases.

(b) pixman_transform_bounds() is not rounding the same way as we do
during sampling. For example, for a NEAREST filter we subtract
pixman_fixed_e before rounding off to the nearest sample so that a
transformed value of 1 will round to the sample at 0.5 and not to the
one at 1.5. However, pixman_transform_bounds() would simply truncate
to 1 which would imply that the first sample to be used was the one at
1.5. In other words, it is too optimistic in some cases.

(c) The result of pixman_transform_bounds() does not account for the
interpolation filter applied to the source.
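
The rounding difference in (b) can be reproduced with a few lines of
16.16 arithmetic. The type and helper names below are local to this
sketch, and it assumes the usual arithmetic right shift for negative
values:

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;             /* 16 integer, 16 fractional bits */
#define FIXED_E  ((fixed_16_16_t) 1)       /* smallest positive 16.16 value */
#define FIXED_1  ((fixed_16_16_t) 1 << 16) /* the value 1.0 */

/* Index of the sample picked by a NEAREST filter. Sample i sits at
 * i + 0.5, and an exact tie such as 1.0 goes to the lower sample. */
static int
nearest_sample (fixed_16_16_t x)
{
    return (int) ((x - FIXED_E) >> 16);
}

/* Plain truncation, as a naive bounding-box computation would do. */
static int
truncated (fixed_16_16_t x)
{
    return (int) (x >> 16);
}
```

For a transformed value of exactly 1.0, nearest_sample () selects
sample 0 (located at 0.5) while truncated () reports 1, so a bounding
box built by truncation misses the sample that is actually read.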

== 2. FAST_PATH_16BIT_SAFE is nonsensical

FAST_PATH_16BIT_SAFE is a flag that indicates that various
computations can be safely done within a 16.16 fixed-point
variable. It was used by certain fast paths that relied on those
computations succeeding. The problem is that many other compositing
functions were making similar assumptions but not actually requiring
the flag to be set. Notably, all the general compositing functions
simply walk the source region using 16.16 variables. If the
transformation happens to overflow, strange things will happen.

So instead of computing this flag in certain cases, it is better to
simply detect that overflows will happen and not try to composite at
all in that case. This has the advantage that most compositing
functions can be written in a natural way.

It does have the disadvantage that we are giving up on some cases that
previously worked, but those are all corner cases where the areas
involved were very close to the limits of the coordinate
system. Relying on these working reliably was always a somewhat
dubious proposition. The most important case that might have worked
previously was untransformed compositing involving images larger than
32 bits. But even in those cases, if you had REPEAT_PAD or
REPEAT_REFLECT turned on, you would hit bits_image_fetch_transformed()
which has the 16 bit limitations.

== Fixes

This patch fixes both problems by introducing a new function called
analyze_extents() that has the responsibility to reject corner cases,
and to compute flags based on the extents.

It does this through a new compute_sample_extents() function that will
compute a conservative (but tight) approximation to the bounding box
of the samples that will actually be needed. By basing the computation
on the positions of the _sample_ locations in the destination, and by
taking the interpolation filter into account, it fixes problem one.

The same function is also used with a one-pixel expanded version of
the destination extents. By checking if the transformed bounding box
will overflow 16.16 fixed point, it fixes problem two.
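
The overflow check can be sketched as below, simplified to a
scale-plus-translate transform so that checking the corners is enough.
All names are hypothetical and the real function handles general
transforms and filters:

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;
typedef int64_t fixed_48_16_t;  /* wide enough for intermediate results */

typedef struct { int x1, y1, x2, y2; } box_t;

/* Does a 48.16 value fit in a 16.16 variable? */
static int
fits_16_16 (fixed_48_16_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* One axis of a scale-plus-translate transform:
 * src = scale * dst + offset, computed without overflow. */
static fixed_48_16_t
transform_coord (fixed_16_16_t scale, fixed_16_16_t offset, int dst)
{
    return (fixed_48_16_t) scale * dst + offset;
}

/* Returns 1 if every transformed coordinate of the one-pixel expanded
 * destination box is representable in 16.16 fixed point. */
static int
extents_are_safe (const box_t *dst,
                  fixed_16_16_t sx, fixed_16_16_t sy,
                  fixed_16_16_t tx, fixed_16_16_t ty)
{
    box_t e;

    e.x1 = dst->x1 - 1;
    e.y1 = dst->y1 - 1;
    e.x2 = dst->x2 + 1;
    e.y2 = dst->y2 + 1;

    return fits_16_16 (transform_coord (sx, tx, e.x1)) &&
           fits_16_16 (transform_coord (sx, tx, e.x2)) &&
           fits_16_16 (transform_coord (sy, ty, e.y1)) &&
           fits_16_16 (transform_coord (sy, ty, e.y2));
}
```

With an identity scale, a destination 40000 pixels wide already fails
the check, since 40001 does not fit in the 16 integer bits.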

14 years agoExtend scaling-crash-test in various ways
Søren Sandmann Pedersen [Wed, 28 Jul 2010 06:11:08 +0000 (02:11 -0400)]
Extend scaling-crash-test in various ways

This extends scaling-crash-test to test some more things:

- All combinations of NEAREST/BILINEAR/CONVOLUTION filters and
  NORMAL/PAD/REFLECT repeat modes.

- Tests various scale factors very close to 1/7th such that the source
  area is very close to the edge of the source image.

- The same things, only with scale factors very close to 1/32767th.

- Enables the commented-out tests for accessing memory outside the
  source buffer.

Also there is now a border around the source buffer which has a
different color than the source buffer itself so that if we sample
outside, it will show up.

Finally, the test now allows the destination buffer to not be changed
at all. This allows pixman to simply bail out in cases where the
transformation is too strange.

14 years agoFix Altivec/OpenBSD patch
Søren Sandmann Pedersen [Thu, 5 Aug 2010 23:00:56 +0000 (19:00 -0400)]
Fix Altivec/OpenBSD patch

As Brad pointed out, I pushed the wrong version of this patch.

14 years agoAdd support for AltiVec detection for OpenBSD/PowerPC.
Brad Smith [Sat, 31 Jul 2010 09:07:02 +0000 (05:07 -0400)]
Add support for AltiVec detection for OpenBSD/PowerPC.

Bug 29331.

14 years agoCODING_STYLE: Delete the stuff about trailing spaces
Søren Sandmann Pedersen [Wed, 4 Aug 2010 13:50:30 +0000 (09:50 -0400)]
CODING_STYLE: Delete the stuff about trailing spaces

Also fix various other minor issues.

14 years agoIf we bail out of do_composite, make sure to undo any workarounds.
Søren Sandmann Pedersen [Wed, 28 Jul 2010 07:17:35 +0000 (03:17 -0400)]
If we bail out of do_composite, make sure to undo any workarounds.

The workaround for an old X bug has to be undone if we bail from
do_composite, so we can't just return.
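
The usual C pattern for this is a single exit label. The sketch below
uses a stand-in image type and stub workaround functions, all of them
hypothetical:

```c
/* Stand-in image type; the field only records the workaround state. */
typedef struct { int workaround_applied; } image_t;

static void
apply_workaround (image_t *image)
{
    image->workaround_applied = 1;
}

static void
unapply_workaround (image_t *image)
{
    image->workaround_applied = 0;
}

/* Returns 1 if compositing took place, 0 if it bailed out. */
static int
do_composite (image_t *image, int extents_ok)
{
    int done = 0;

    apply_workaround (image);

    if (!extents_ok)
        goto out;  /* bail out, but still undo the workaround */

    /* ... the actual compositing would happen here ... */
    done = 1;

out:
    unapply_workaround (image);
    return done;
}
```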

14 years agoAdd x14r6g6b6 format to blitters-test
Søren Sandmann Pedersen [Wed, 4 Aug 2010 12:58:51 +0000 (08:58 -0400)]
Add x14r6g6b6 format to blitters-test

14 years agoAdd support for 32bpp X14R6G6B6 format.
Marek Vasut [Sun, 1 Aug 2010 00:18:52 +0000 (02:18 +0200)]
Add support for 32bpp X14R6G6B6 format.

This format is used on PXA framebuffer with some boards. It uses only 18 bits
from the 32 bit framebuffer to interpret color.
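
Assuming the channels are laid out as the format name suggests (14
unused high bits, then 6 bits each of red, green and blue), a pixel can
be widened to x8r8g8b8 as in this sketch. The function names are
invented here:

```c
#include <stdint.h>

/* Expand a 6-bit channel to 8 bits by replicating its top bits,
 * so that 0x3f maps to 0xff and 0 maps to 0. */
static uint8_t
expand_6_to_8 (uint32_t v)
{
    return (uint8_t) ((v << 2) | (v >> 4));
}

/* Convert one x14r6g6b6 pixel to x8r8g8b8; the 14 unused high bits
 * are ignored. */
static uint32_t
x14r6g6b6_to_x8r8g8b8 (uint32_t p)
{
    uint32_t r = (p >> 12) & 0x3f;
    uint32_t g = (p >> 6) & 0x3f;
    uint32_t b = p & 0x3f;

    return ((uint32_t) expand_6_to_8 (r) << 16) |
           ((uint32_t) expand_6_to_8 (g) << 8) |
           (uint32_t) expand_6_to_8 (b);
}
```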

Signed-off-by: Marek Vasut <marek.vasut@gmail.com>

14 years agotest: 'scaling-test' updated to provide better coverage
Siarhei Siamashka [Wed, 14 Jul 2010 13:43:16 +0000 (16:43 +0300)]
test: 'scaling-test' updated to provide better coverage

Negative scale factors are now also tested. A small additional
translate transform helps to stress the use of fractional
coordinates better.

Also, the number of iterations run by default was increased in order
to compensate for the increased variety of operations to be tested.

14 years agotest: 'scaling-crash-test' added
Siarhei Siamashka [Mon, 19 Jul 2010 17:25:05 +0000 (20:25 +0300)]
test: 'scaling-crash-test' added

This test tries to exploit some corner cases and previously known
bugs in nearest neighbor scaling fast path code, attempting to
crash pixman or cause some other nasty effect.

14 years agobits: Fix potential divide-by-zero in projective code
Søren Sandmann Pedersen [Fri, 16 Jul 2010 03:40:28 +0000 (23:40 -0400)]
bits: Fix potential divide-by-zero in projective code

If the homogeneous coordinate is 0, just set the coordinates to 0.
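
In outline, using plain integers instead of pixman's fixed-point types
and hypothetical type names:

```c
#include <stdint.h>

typedef struct { int64_t x, y, w; } hvec_t;  /* homogeneous coordinate */
typedef struct { int64_t x, y; } point_t;

/* Divide by the homogeneous coordinate; when it is 0 the result is
 * defined as (0, 0) instead of dividing by zero. */
static point_t
project (hvec_t v)
{
    point_t p = { 0, 0 };

    if (v.w != 0)
    {
        p.x = v.x / v.w;
        p.y = v.y / v.w;
    }

    return p;
}
```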

14 years ago[sse2] Add sse2_composite_add_n_8()
Søren Sandmann Pedersen [Mon, 26 Apr 2010 00:25:50 +0000 (20:25 -0400)]
[sse2] Add sse2_composite_add_n_8()

This shows up when epiphany displays the "ImageTest" on
glimr.rubyforge.org/cake/canvas.html

14 years ago[sse2] Add sse2_composite_in_n_8()
Søren Sandmann Pedersen [Sun, 25 Apr 2010 23:54:28 +0000 (19:54 -0400)]
[sse2] Add sse2_composite_in_n_8()

This shows up when epiphany displays the "ImageTest" on
glimr.rubyforge.org/cake/canvas.html