platform/upstream/pixman.git
13 years agoPost-release version bump to 0.21.3
Søren Sandmann Pedersen [Tue, 16 Nov 2010 22:14:47 +0000 (17:14 -0500)]
Post-release version bump to 0.21.3

13 years agoPre-release version bump pixman-0.21.2
Søren Sandmann Pedersen [Tue, 16 Nov 2010 21:43:26 +0000 (16:43 -0500)]
Pre-release version bump

13 years agoGenerate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images
Søren Sandmann Pedersen [Wed, 3 Nov 2010 03:38:10 +0000 (23:38 -0400)]
Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images

There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/reflect/none/normal repeat modes. The bulk of each function is an
inline function that takes a format and a repeat mode as parameters.
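
As a rough illustration of this pattern (not the actual pixman code), the
sketch below has one inline helper that takes the repeat mode as a
parameter, plus thin wrappers that pass the mode as a constant so the
compiler can specialize each one. The names repeat_mode_t, map_coord and
MAKE_COORD_MAPPER are made up for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the four repeat modes. */
typedef enum
{
    REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT
} repeat_mode_t;

/* Floored modulus, so that negative coordinates wrap correctly. */
static inline int
mod_floor (int a, int b)
{
    int r = a % b;
    return r < 0 ? r + b : r;
}

/* General helper: maps *c into [0, size) according to 'mode'.
 * Returns false when the coordinate falls outside the image,
 * which can only happen for REPEAT_NONE. */
static inline bool
map_coord (repeat_mode_t mode, int *c, int size)
{
    switch (mode)
    {
    case REPEAT_NONE:
        return *c >= 0 && *c < size;
    case REPEAT_NORMAL:
        *c = mod_floor (*c, size);
        return true;
    case REPEAT_PAD:
        *c = *c < 0 ? 0 : (*c >= size ? size - 1 : *c);
        return true;
    case REPEAT_REFLECT:
        *c = mod_floor (*c, 2 * size);
        if (*c >= size)
            *c = 2 * size - *c - 1;
        return true;
    }
    return false;
}

/* One specialized wrapper per repeat mode; 'mode' is a compile-time
 * constant inside each wrapper. */
#define MAKE_COORD_MAPPER(name, mode)                   \
    static bool map_coord_ ## name (int *c, int size)   \
    { return map_coord (mode, c, size); }

MAKE_COORD_MAPPER (none, REPEAT_NONE)
MAKE_COORD_MAPPER (normal, REPEAT_NORMAL)
MAKE_COORD_MAPPER (pad, REPEAT_PAD)
MAKE_COORD_MAPPER (reflect, REPEAT_REFLECT)
```

With a 4 pixel wide image, coordinate -1 maps to 3 under normal repeat
and to 0 under reflect, while coordinate 5 maps to 3 under pad.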

14 years agoImprove conical gradients opacity check
Andrea Canciani [Tue, 2 Nov 2010 16:04:35 +0000 (17:04 +0100)]
Improve conical gradients opacity check

Conical gradients are completely opaque if all of their stops are
opaque and the repeat mode is not 'none'.

14 years agoFix opacity check
Andrea Canciani [Tue, 2 Nov 2010 16:02:01 +0000 (17:02 +0100)]
Fix opacity check

Radial gradients are "conical", thus they can have some non-opaque
parts even if all of their stops are completely opaque.

To guarantee that a radial gradient is actually opaque, it needs to
also have one of the two circles containing the other one. In this
case when extrapolating, the whole plane is completely covered (as
explained in the comment in pixman-radial-gradient.c).

14 years agoRemove unused stop_range field
Andrea Canciani [Sun, 31 Oct 2010 15:59:45 +0000 (16:59 +0100)]
Remove unused stop_range field

14 years agoARM: optimization for scaled src_0565_0565 with nearest filter
Siarhei Siamashka [Sun, 3 Oct 2010 22:56:59 +0000 (01:56 +0300)]
ARM: optimization for scaled src_0565_0565 with nearest filter

The performance improvement is only in the ballpark of 5% when
compared against C code built with a reasonably good compiler
(gcc 4.5.1). But gcc 4.4 produces approximately 30% slower code
here, so assembly optimization makes sense to avoid dependency
on the compiler quality and/or optimization options.

Benchmark from ARM11:
    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=34.86 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=36.62 MPix/s

Benchmark from ARM Cortex-A8:
    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=94.91 MPix/s

14 years agoARM: NEON optimization for scaled src_0565_8888 with nearest filter
Siarhei Siamashka [Tue, 2 Nov 2010 14:12:42 +0000 (16:12 +0200)]
ARM: NEON optimization for scaled src_0565_8888 with nearest filter

Benchmark from ARM Cortex-A8 @720MHz:
    == before ==
    op=1, src_fmt=10020565, dst_fmt=20028888, speed=8.99 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=20028888, speed=76.98 MPix/s

    == unscaled ==
    op=1, src_fmt=10020565, dst_fmt=20028888, speed=137.78 MPix/s

14 years agoARM: NEON optimization for scaled src_8888_0565 with nearest filter
Siarhei Siamashka [Tue, 2 Nov 2010 13:25:51 +0000 (15:25 +0200)]
ARM: NEON optimization for scaled src_8888_0565 with nearest filter

Benchmark from ARM Cortex-A8 @720MHz:
    == before ==
    op=1, src_fmt=20028888, dst_fmt=10020565, speed=42.51 MPix/s

    == after ==
    op=1, src_fmt=20028888, dst_fmt=10020565, speed=55.61 MPix/s

    == unscaled ==
    op=1, src_fmt=20028888, dst_fmt=10020565, speed=117.99 MPix/s

14 years agoARM: NEON optimization for scaled over_8888_0565 with nearest filter
Siarhei Siamashka [Tue, 2 Nov 2010 12:39:02 +0000 (14:39 +0200)]
ARM: NEON optimization for scaled over_8888_0565 with nearest filter

Benchmark from ARM Cortex-A8 @720MHz:
    == before ==
    op=3, src_fmt=20028888, dst_fmt=10020565, speed=10.29 MPix/s

    == after ==
    op=3, src_fmt=20028888, dst_fmt=10020565, speed=36.36 MPix/s

    == unscaled ==
    op=3, src_fmt=20028888, dst_fmt=10020565, speed=79.40 MPix/s

14 years agoARM: NEON optimization for scaled over_8888_8888 with nearest filter
Siarhei Siamashka [Tue, 2 Nov 2010 12:29:57 +0000 (14:29 +0200)]
ARM: NEON optimization for scaled over_8888_8888 with nearest filter

Benchmark from ARM Cortex-A8 @720MHz:
    == before ==
    op=3, src_fmt=20028888, dst_fmt=20028888, speed=12.73 MPix/s

    == after ==
    op=3, src_fmt=20028888, dst_fmt=20028888, speed=28.75 MPix/s

    == unscaled ==
    op=3, src_fmt=20028888, dst_fmt=20028888, speed=53.03 MPix/s

14 years agoARM: performance tuning of NEON nearest scaled pixel fetcher
Siarhei Siamashka [Tue, 2 Nov 2010 17:16:46 +0000 (19:16 +0200)]
ARM: performance tuning of NEON nearest scaled pixel fetcher

Interleaving the use of NEON registers helps to avoid some stalls
in NEON pipeline and provides a small performance improvement.

14 years agoARM: macro template in C code to simplify using scaled fast paths
Siarhei Siamashka [Tue, 2 Nov 2010 12:26:13 +0000 (14:26 +0200)]
ARM: macro template in C code to simplify using scaled fast paths

This template can be used to instantiate scaled fast path functions
by providing main loop code and calling NEON assembly optimized
scanline processing functions from it. Another macro can be used
to simplify adding entries to fast path tables.

14 years agoARM: nearest scaling support for NEON scanline compositing functions
Siarhei Siamashka [Mon, 1 Nov 2010 08:03:59 +0000 (10:03 +0200)]
ARM: nearest scaling support for NEON scanline compositing functions

Now it is possible to generate scanline processing functions
for the case when the source image is scaled with NEAREST filter.

Only 16bpp and 32bpp pixel formats are supported for now. But the
others can also be added later when needed. All the existing NEON
fast path functions should be quite easy to reuse for implementing
fast paths which can work with scaled source images.
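
For readers unfamiliar with nearest scaling, here is a minimal sketch of
such a scanline function in plain C rather than NEON assembly. It assumes
16.16 fixed point source coordinates; the type fixed_16_16_t and the
function name are hypothetical, and the caller is assumed to guarantee
that every sampled position lies inside the source row.

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* assumed stand-in for a 16.16 fixed point type */

/* Copies 'width' r5g6b5 pixels to dst, stepping through src in
 * increments of unit_x and taking the integer part of the source
 * coordinate as the index of the pixel to fetch. */
static void
scaled_nearest_scanline_565 (uint16_t       *dst,
                             const uint16_t *src,
                             int             width,
                             fixed_16_16_t   vx,
                             fixed_16_16_t   unit_x)
{
    int i;

    for (i = 0; i < width; i++)
    {
        dst[i] = src[vx >> 16];
        vx += unit_x;
    }
}
```

With vx = 0 and unit_x = 0x8000 (0.5), each source pixel is written
twice, which is a 2x upscale.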

14 years agoARM: NEON: source image pixel fetcher can be overridden now
Siarhei Siamashka [Mon, 1 Nov 2010 03:10:34 +0000 (05:10 +0200)]
ARM: NEON: source image pixel fetcher can be overridden now

Added a special macro 'pixld_src' which is now responsible for fetching
pixels from the source image. Right now it just passes all its arguments
directly to 'pixld' macro, but it can be used in the future to provide
a special pixel fetcher for implementing nearest scaling.

The 'pixld_src' has a lot of arguments which define its behavior. But
for each particular fast path implementation, we already know NEON
registers allocation and how many pixels are processed in a single block.
That's why a higher level macro 'fetch_src_pixblock' is also introduced
(it's easier to use because it has no arguments) and used everywhere
in 'pixman-arm-neon-asm.S' instead of VLD instructions.

This patch does not introduce any functional changes and the resulting code
in the compiled object file is exactly the same.

14 years agoARM: fix 'vld1.8'->'vld1.32' typo in add_8888_8888 NEON fast path
Siarhei Siamashka [Tue, 2 Nov 2010 20:53:55 +0000 (22:53 +0200)]
ARM: fix 'vld1.8'->'vld1.32' typo in add_8888_8888 NEON fast path

This was mostly harmless and had no effect on little endian systems.
But the wrong vector element size is at least inconsistent and could
in theory cause problems on big endian ARM systems.

14 years agoDo CPU features detection from 'constructor' function when compiled with gcc
Siarhei Siamashka [Fri, 24 Sep 2010 13:36:16 +0000 (16:36 +0300)]
Do CPU features detection from 'constructor' function when compiled with gcc

The 'constructor' attribute, supported since gcc 2.7, allows a
constructor function to be used for library initialization. This eliminates
an extra branch for each composite operation and also helps to avoid
complaints from race condition detection tools like helgrind.

Other compilers may or may not support this attribute properly.
Ideally, a compiler should fail to compile code with an unknown
attribute, so the configure check should do the right job. In
practice, problems are certainly possible. Fortunately such problems
should be quite easy to find, because a NULL pointer dereference should
happen almost immediately if the constructor fails to run.

clang 2.7:
  supports __attribute__((constructor)) properly and pretends to be gcc

tcc 0.9.25:
  ignores __attribute__((constructor)), but does not pretend to be gcc
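
A minimal sketch of the idea, assuming a gcc-compatible compiler. The
names composite_impl, library_init and composite are invented for this
example and do not correspond to pixman's real symbols.

```c
#include <stddef.h>

typedef int (*composite_func_t) (int a, int b);

/* Placeholder for the implementation that CPU detection would choose. */
static int
composite_generic (int a, int b)
{
    return a + b;
}

/* Stays NULL until the constructor has run. */
static composite_func_t composite_impl = NULL;

/* Runs automatically before main() when the attribute is honored. */
#if defined(__GNUC__)
__attribute__((constructor))
#endif
static void
library_init (void)
{
    composite_impl = composite_generic;
}

/* No "is it initialized yet?" branch is needed here. If the constructor
 * did not run, this call dereferences NULL right away. */
int
composite (int a, int b)
{
    return composite_impl (a, b);
}
```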

14 years agoDelete the source_image_t struct.
Søren Sandmann Pedersen [Sun, 31 Oct 2010 05:40:57 +0000 (01:40 -0400)]
Delete the source_image_t struct.

It serves no purpose anymore now that the source_class_t field is gone.

14 years ago[mmx] Mark some of the output variables as early-clobber.
Søren Sandmann Pedersen [Sat, 30 Oct 2010 21:20:22 +0000 (17:20 -0400)]
[mmx] Mark some of the output variables as early-clobber.

GCC assumes that input variables in inline assembly are fully consumed
before any output variable is written. This means it may allocate the
variables in the same register unless the output variables are marked
as early-clobber.
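
The same rule can be shown with general purpose registers instead of MMX
ones. This is only a sketch for x86 with a gcc-compatible compiler;
fill_pair is a hypothetical function, and other targets get a plain C
fallback so the example still builds.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))

/* Copies 'fill' into two outputs using inline assembly. */
static void
fill_pair (unsigned long fill, unsigned long *a, unsigned long *b)
{
    unsigned long v1, v2;

    __asm__ ("mov %2, %0\n\t"
             "mov %2, %1\n\t"
             /* %0 is written before the input %2 is read for the last
              * time, so it must be early-clobber ("=&r"). Without the
              * '&' the compiler may place v1 and fill in the same
              * register. %1 is written by the final instruction, so a
              * plain "=r" is enough for it. */
             : "=&r" (v1), "=r" (v2)
             : "r" (fill));

    *a = v1;
    *b = v2;
}

#else

/* Fallback for targets where the assembly above does not apply. */
static void
fill_pair (unsigned long fill, unsigned long *a, unsigned long *b)
{
    *a = fill;
    *b = fill;
}

#endif
```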

From Jeremy Huddleston:

    I noticed a problem building pixman with clang and reported it to
    the clang developers.  They responded back with a comment about
    the inline asm in pixman-mmx.c and suggested a fix:

    """
    Incidentally, Jeremy, in the asm that reads
    __asm__ (
    "movq %7, %0\n"
    "movq %7, %1\n"
    "movq %7, %2\n"
    "movq %7, %3\n"
    "movq %7, %4\n"
    "movq %7, %5\n"
    "movq %7, %6\n"
    : "=y" (v1), "=y" (v2), "=y" (v3),
      "=y" (v4), "=y" (v5), "=y" (v6), "=y" (v7)
    : "y" (vfill));

    all the output operands except the last one should be marked as
    earlyclobber ("=&y"). This is working by accident with gcc.
    """

Cc: jeremyhu@apple.com
Reviewed-by: Matt Turner <mattst88@gmail.com>

14 years agoRemove workaround for a bug in the 1.6 X server.
Søren Sandmann Pedersen [Fri, 29 Oct 2010 00:14:03 +0000 (20:14 -0400)]
Remove workaround for a bug in the 1.6 X server.

There used to be a bug in the X server where it would rely on
out-of-bounds accesses when it was asked to composite with a
window as the source. It would create a pixman image pointing
to some bogus position in memory, but then set a clip region
to the position where the actual bits were.

Due to a bug in old versions of pixman, where it would not clip
against the image bounds when a clip region was set, this would
actually work. So when the pixman bug was fixed, a workaround was
added to allow certain out-of-bound accesses.

However, the 1.6 X server is so old now that we can remove this
workaround. This does mean that if you update pixman to 0.22 or later,
you will need to use a 1.7 X server or later.

14 years agoFixed broken configure check for __thread support
Siarhei Siamashka [Sat, 30 Oct 2010 12:51:30 +0000 (15:51 +0300)]
Fixed broken configure check for __thread support

Somehow the patch from [1] was not applied correctly, fixing that.

1. http://lists.cairographics.org/archives/cairo/2010-September/020826.html

14 years agoCOPYING: Stop saying that a modification is currently under discussion.
Søren Sandmann Pedersen [Mon, 1 Nov 2010 21:52:29 +0000 (17:52 -0400)]
COPYING: Stop saying that a modification is currently under discussion.

Also put the copyright text into a C comment for easier cut and paste.

14 years agoVersion bump 0.21.1.
Søren Sandmann Pedersen [Wed, 27 Oct 2010 21:21:06 +0000 (17:21 -0400)]
Version bump 0.21.1.

The previous bump to 0.20.1 was a mistake; it belongs on the 0.20 branch.

14 years agoPost-release version bump to 0.20.1
Søren Sandmann Pedersen [Wed, 27 Oct 2010 20:58:29 +0000 (16:58 -0400)]
Post-release version bump to 0.20.1

14 years agoPre-release version bump to 0.20.0 pixman-0.20.0
Søren Sandmann Pedersen [Wed, 27 Oct 2010 20:51:40 +0000 (16:51 -0400)]
Pre-release version bump to 0.20.0

14 years agoAdded check to find pthread on Haiku.
Scott McCreary [Wed, 27 Oct 2010 19:31:27 +0000 (12:31 -0700)]
Added check to find pthread on Haiku.

14 years agoPlug another leak in alphamap test
Jon TURNEY [Sun, 24 Oct 2010 14:58:39 +0000 (15:58 +0100)]
Plug another leak in alphamap test

Even after commit e46be417cebac984a858da05e61d924889695c9e alphamap
test is still leaking the alphamap pixmap, leading to mmap() failures
on cygwin

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>

14 years agoPost-release version bump to 0.19.7
Søren Sandmann Pedersen [Wed, 20 Oct 2010 20:31:57 +0000 (16:31 -0400)]
Post-release version bump to 0.19.7

14 years agoPre-release version bump to 0.19.6 pixman-0.19.6
Søren Sandmann Pedersen [Wed, 20 Oct 2010 20:25:55 +0000 (16:25 -0400)]
Pre-release version bump to 0.19.6

14 years agoFix an overflow in the new radial gradient code
Andrea Canciani [Tue, 12 Oct 2010 13:38:20 +0000 (15:38 +0200)]
Fix an overflow in the new radial gradient code

huge-radial in the cairo test suite pointed out an undocumented
overflow in the radial gradient code.
By casting to pixman_fixed_48_16_t before doing the operations,
the overflow can be avoided.
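
To see why the cast matters, consider multiplying two 16.16 fixed point
values. The sketch below uses local typedefs as stand-ins for
pixman_fixed_t and pixman_fixed_48_16_t (assumed here to be 32 and 64 bit
signed integers); dot_wide is a hypothetical helper, not a function from
the radial gradient code.

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* stand-in for pixman_fixed_t */
typedef int64_t fixed_48_16_t;   /* stand-in for pixman_fixed_48_16_t */

/* Dot product of two vectors with 16.16 components. Each operand is
 * widened to 64 bits before the multiplication, so the intermediate
 * products cannot overflow 32 bits. The result carries 32 fractional
 * bits. */
static fixed_48_16_t
dot_wide (fixed_16_16_t x1, fixed_16_16_t y1,
          fixed_16_16_t x2, fixed_16_16_t y2)
{
    return (fixed_48_16_t) x1 * x2 + (fixed_48_16_t) y1 * y2;
}
```

For example, 300.0 in 16.16 is 300 * 65536; squaring that value in 32 bit
arithmetic would overflow, while the widened product is representable.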

14 years agoRemove the class field from source_image_t
Søren Sandmann Pedersen [Wed, 20 Oct 2010 20:09:44 +0000 (16:09 -0400)]
Remove the class field from source_image_t

The linear gradient was the only image type that relied on the class
being stored in the image struct itself. With the previous changes, it
doesn't need that anymore, so we can delete the field.

14 years agoRemove unused enum value
Andrea Canciani [Wed, 20 Oct 2010 19:24:32 +0000 (21:24 +0200)]
Remove unused enum value

The new linear gradient code doesn't use SOURCE_IMAGE_CLASS_VERTICAL
anymore and it was not used anywhere else.

14 years agoMake classification consistent with rasterization
Andrea Canciani [Mon, 18 Oct 2010 20:21:52 +0000 (22:21 +0200)]
Make classification consistent with rasterization

Use the same computations to classify the gradient and to
rasterize it.
This improves the correctness of the classification by
avoiding integer division.

14 years agoImprove precision of linear gradients
Andrea Canciani [Wed, 11 Aug 2010 07:58:05 +0000 (09:58 +0200)]
Improve precision of linear gradients

Integer division (without keeping the remainder) can discard a lot
of information. Doing the division maths in floating point (and
paying attention to error propagation) greatly improves
the precision of linear gradients.
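
A toy illustration of the information loss, not taken from the gradient
code itself: both hypothetical functions below compute num / den as a
16.16 fixed point value, one dividing in integers first and one dividing
in floating point.

```c
#include <stdint.h>

/* Integer division first: the fractional part of the quotient is
 * already gone by the time the result is scaled to 16.16. */
static int32_t
ratio_int_div (int64_t num, int64_t den)
{
    return (int32_t) ((num / den) * 65536);
}

/* Floating point division keeps the fraction until the final
 * conversion to 16.16. */
static int32_t
ratio_float_div (int64_t num, int64_t den)
{
    return (int32_t) ((double) num / (double) den * 65536.0);
}
```

For num = 1 and den = 3, the integer version yields 0 while the floating
point version yields 21845, which is about 1/3 in 16.16.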

14 years agoAdd comments about errors
Andrea Canciani [Tue, 12 Oct 2010 07:52:53 +0000 (09:52 +0200)]
Add comments about errors

Explain how errors are introduced in the computation performed for
radial gradients.

14 years agoDraw radial gradients with PDF semantics
Andrea Canciani [Sun, 15 Aug 2010 07:07:33 +0000 (09:07 +0200)]
Draw radial gradients with PDF semantics

Change radial gradient computations and definition to reflect the
radial gradients in PDF specifications (see section 8.7.4.5.4,
Type 3 (Radial) Shadings of the PDF Reference Manual).

Instead of having a valid interpolation parameter value for every
point of the plane, define it only for points within the area
covered by the family of circles generated by interpolating or
extrapolating the start and end circles.

Points outside this area are now transparent black (rgba 0 0 0 0).
Points within this area have the color associated with the maximum
value of the interpolation parameter in that point (if multiple
solutions exist within the range specified by the extend mode).
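
The condition can be written out as follows. The notation is introduced
here for illustration and is not quoted from the PDF reference or from
the pixman sources: c_1, r_1 and c_2, r_2 are the centers and radii of
the start and end circles, p is the point being shaded and t is the
interpolation parameter.

```latex
\begin{aligned}
c(t) &= c_1 + t\,(c_2 - c_1), \qquad r(t) = r_1 + t\,(r_2 - r_1) \\
\lVert p - c(t) \rVert^2 &= r(t)^2, \qquad r(t) \ge 0 \\
A\,t^2 - 2B\,t + C &= 0 \\
A &= \lVert c_2 - c_1 \rVert^2 - (r_2 - r_1)^2 \\
B &= (p - c_1)\cdot(c_2 - c_1) + r_1\,(r_2 - r_1) \\
C &= \lVert p - c_1 \rVert^2 - r_1^2 \\
t &= \frac{B \pm \sqrt{B^2 - A\,C}}{A} \qquad (A \ne 0)
\end{aligned}
```

A point for which the quadratic has no real root, or for which no root
gives a non-negative radius within the allowed range of t, lies outside
the covered area. Otherwise the larger admissible root is the one used.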

14 years agoPlug leak in the alphamap test.
Søren Sandmann Pedersen [Fri, 8 Oct 2010 11:44:20 +0000 (07:44 -0400)]
Plug leak in the alphamap test.

The images are being created with non-NULL data, so we have to free it
ourselves. This is important because the Cygwin tinderbox is running
out of memory and produces this:

    mmap failed on 20000 1507328
    mmap failed on 40000 1507328
    mmap failed on 20000 1507328
    mmap failed on 40000 1507328
    mmap failed on 40000 1507328
    mmap failed on 40000 1507328

http://tinderbox.x.org/builds/2010-10-05-0014/logs/pixman/#check

14 years agoAdd no-op combiners for DST and the CA versions of the HSL operators.
Søren Sandmann Pedersen [Wed, 6 Oct 2010 06:40:39 +0000 (02:40 -0400)]
Add no-op combiners for DST and the CA versions of the HSL operators.

We already exit early for DST, but for the HSL operators with
component alpha, we crash at the moment. Fix that by adding a dummy
combine_dst() function.

14 years agotest: Add some more colors to the color table in composite.c
Søren Sandmann Pedersen [Tue, 5 Oct 2010 15:05:25 +0000 (11:05 -0400)]
test: Add some more colors to the color table in composite.c

Specifically, add transparent black and superluminescent white with
alpha = 0.

14 years agotest: Parallelize composite.c with OpenMP
Søren Sandmann Pedersen [Tue, 5 Oct 2010 13:49:45 +0000 (09:49 -0400)]
test: Parallelize composite.c with OpenMP

Each test uses the test number as the random number seed; if it
didn't, all the threads would run the same tests since they would all
start from the same seed.
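
The seeding scheme can be sketched like this. The generator and the
function names are hypothetical stand-ins; the real test uses its own
random number source and its own test body.

```c
#include <stdint.h>

/* Small linear congruential generator used as a per-test random source. */
static uint32_t
lcg_next (uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state >> 16;
}

/* Stand-in for one test: seeds the generator with the test number and
 * folds a few random values into a checksum. */
static uint32_t
run_test (int testnum)
{
    uint32_t state = (uint32_t) testnum;
    uint32_t checksum = 0;
    int i;

    for (i = 0; i < 16; i++)
        checksum = checksum * 31u + lcg_next (&state);

    return checksum;
}

/* The iterations are independent because each one derives all of its
 * randomness from its own index, so they can be spread over threads. */
static void
run_all_tests (uint32_t *results, int n_tests)
{
    int i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n_tests; i++)
        results[i] = run_test (i);
}
```

Because the seed depends only on the test number, a given test produces
the same result regardless of which thread runs it.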

14 years agotest: Change composite so that it tests randomly generated images
Søren Sandmann Pedersen [Sun, 7 Mar 2010 16:26:16 +0000 (11:26 -0500)]
test: Change composite so that it tests randomly generated images

Previously this test would try to exhaustively test all combinations
of formats and operators, which meant that it would take hours to run.
Instead, generate images randomly and test compositing those.

Cc: chris@chris-wilson.co.uk

14 years agotest: Fix eval_diff() so that it provides useful error values.
Søren Sandmann Pedersen [Sun, 7 Mar 2010 16:24:30 +0000 (11:24 -0500)]
test: Fix eval_diff() so that it provides useful error values.

Previously, this function would evaluate the error under the
assumption that the format was 565 or wider. This patch changes it to
take the actual format into account.

With that fixed, we can turn on testing for the rest of the formats.

Cc: chris@chris-wilson.co.uk

14 years agotest: Fix bug in color_correct() in composite.c
Søren Sandmann Pedersen [Sun, 7 Mar 2010 15:31:04 +0000 (10:31 -0500)]
test: Fix bug in color_correct() in composite.c

This function was using the number of bits in a channel as if it were
a mask, which led to many spurious errors. With that fixed, we can
turn on testing for all formats where all channels have 5 or more
bits.
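
The class of bug is easy to reproduce in isolation. The helpers below are
hypothetical and only illustrate the difference between a bit count and
the mask derived from it; they are not the code from composite.c.

```c
#include <stdint.h>

/* Mask covering the low 'bits' bits of a channel (bits in 1..31). */
static uint32_t
channel_mask (int bits)
{
    return (1u << bits) - 1u;
}

/* Extracts a channel that is 'bits' wide and starts at bit 'shift'.
 * The buggy variant would be (pixel >> shift) & bits, which for a
 * 5 bit channel masks with binary 00101 instead of 11111. */
static uint32_t
get_channel (uint32_t pixel, int shift, int bits)
{
    return (pixel >> shift) & channel_mask (bits);
}
```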

Cc: chris@chris-wilson.co.uk

14 years agoRemove broken optimizations in combine_disjoint_over_u()
Søren Sandmann Pedersen [Tue, 5 Oct 2010 15:08:42 +0000 (11:08 -0400)]
Remove broken optimizations in combine_disjoint_over_u()

The first broken optimization is that it checks "a != 0x00" where it
should check "s != 0x00". The other is that it skips the computation
when alpha is 0xff. That is wrong because in the formula:

     min (1, (1 - Aa)/Ab)

the render specification states that if Ab is 0, the quotient is
defined to be positive infinity. That is the case even if (1 - Aa) is 0.
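
A sketch of the factor with 8 bit alpha values, under the assumption that
Aa is the source alpha and Ab the destination alpha. The function name
and the rounding are choices made for this example, not pixman's combiner.

```c
#include <stdint.h>

/* Returns min (1, (1 - Aa) / Ab) scaled to the range 0..255, where
 * sa and da are the 8 bit alpha values for Aa and Ab. */
static uint8_t
disjoint_out_factor (uint8_t sa, uint8_t da)
{
    uint32_t inv_sa = 255u - sa;

    /* Ab == 0: the quotient is defined as +infinity, so the minimum
     * is 1 even when (1 - Aa) is also 0. The second condition covers
     * every other case where the quotient is at least 1. */
    if (da == 0 || inv_sa >= da)
        return 255;

    /* Here inv_sa < da, so the rounded quotient is below 255. */
    return (uint8_t) ((inv_sa * 255u + da / 2) / da);
}
```

Note that sa = 0xff with da = 0 returns 255, which is exactly the case an
early exit on "alpha is 0xff" would get wrong.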

14 years agoARM: restore fallback to ARMv6 implementation from NEON in the delegate chain
Siarhei Siamashka [Mon, 4 Oct 2010 01:49:08 +0000 (04:49 +0300)]
ARM: restore fallback to ARMv6 implementation from NEON in the delegate chain

After fast path cache introduction, the overhead of having this fallback is
insignificant. On the other hand, some of the ARM assembly optimizations (for
example nearest neighbor scaling) do not need NEON.

14 years agoUse more unrolling for scaled src_0565_0565 with nearest filter
Siarhei Siamashka [Wed, 8 Sep 2010 06:30:23 +0000 (09:30 +0300)]
Use more unrolling for scaled src_0565_0565 with nearest filter

Benchmark from Intel Core i7 860:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1335.29 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1550.96 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=2401.31 MPix/s

Benchmark from ARM Cortex-A8:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=81.79 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=197.44 MPix/s

14 years agoARM: added 'neon_composite_out_reverse_8_0565' fast path
Siarhei Siamashka [Thu, 23 Sep 2010 20:41:50 +0000 (23:41 +0300)]
ARM: added 'neon_composite_out_reverse_8_0565' fast path

== before ==

    outrev_8_0565 =  L1:  22.91  L2:  22.40  M: 18.75 ( 10.47%)
                     HT: 12.62   VT: 12.22   R: 11.32  RT:  5.30 (  58Kops/s)

== after ==

    outrev_8_0565 =  L1: 176.27  L2: 151.70  M:108.79 ( 60.81%)
                     HT: 50.43   VT: 37.16   R: 32.26  RT:  9.62 (  97Kops/s)

14 years agoARM: added 'neon_composite_add_0565_8_0565' fast path
Siarhei Siamashka [Thu, 23 Sep 2010 19:28:55 +0000 (22:28 +0300)]
ARM: added 'neon_composite_add_0565_8_0565' fast path

== before ==

    add_0565_8_0565 =  L1:  14.05  L2:  14.03  M: 11.57 ( 12.94%)
                       HT:  8.31   VT:  8.10   R:  7.47  RT:  3.64 (  42Kops/s)

== after ==

    add_0565_8_0565 =  L1: 123.36  L2:  94.70  M: 74.36 ( 83.15%)
                       HT: 31.17   VT:  23.97  R: 21.06  RT:  6.42 (  70Kops/s)

14 years agoARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565
Siarhei Siamashka [Fri, 21 May 2010 13:31:03 +0000 (16:31 +0300)]
ARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565

Prefetch provides up to 40-50% better performance when working
with large images and/or when having lots of L2 cache misses
on ARM Cortex-A8 @ 720MHz:

== before ==

    over_n_8888 =  L1: 225.83  L2: 181.02  M: 55.57 ( 41.41%)
                   HT: 38.96   VT: 36.92   R: 32.84  RT: 14.15 ( 123Kops/s)

    over_n_0565 =  L1: 153.91  L2: 149.69  M: 83.17 ( 30.95%)
                   HT: 50.41   VT: 49.15   R: 40.56  RT: 15.45 ( 131Kops/s)

== after ==

    over_n_8888 =  L1: 222.39  L2: 170.95  M: 76.86 ( 57.27%)
                   HT: 58.80   VT: 53.03   R: 45.51  RT: 14.13 ( 124Kops/s)

    over_n_0565 =  L1: 151.87  L2: 149.54  M:125.63 ( 46.80%)
                   HT: 67.85   VT: 57.54   R: 50.21  RT: 15.32 ( 130Kops/s)

14 years agoFix "syntax error: empty declaration" warnings.
Mika Yrjola [Fri, 1 Oct 2010 13:17:50 +0000 (16:17 +0300)]
Fix "syntax error: empty declaration" warnings.

These minor changes should fix a large number of
macro-declaration-related "syntax error: empty declaration" warnings
which are seen while compiling the code with the Solaris Studio
compiler.

14 years agoDelete simple repeat code
Søren Sandmann Pedersen [Tue, 28 Sep 2010 04:51:07 +0000 (00:51 -0400)]
Delete simple repeat code

This was supposedly an optimization, but it has pathological cases
where it definitely isn't. For example a 1 x n image will cause it to
have terrible memory access patterns and to generate a ton of modulus
operations.

Since no one has ever measured whether it actually is an improvement,
and since it is doing the repeating at the wrong stage in the
pipeline, and since with the previous commit it can't be triggered
anymore because we now require SAMPLES_COVER_CLIP for regular fast
paths, just delete it.

14 years agoFix bug in FAST_PATH_STD_FAST_PATH
Søren Sandmann Pedersen [Tue, 28 Sep 2010 04:42:25 +0000 (00:42 -0400)]
Fix bug in FAST_PATH_STD_FAST_PATH

The standard fast paths deal with two kinds of images: solids and
bits. These two image types require different flags, but
PIXMAN_STD_FAST_PATH uses the same ones for both.

This patch makes it so that solid images just get the standard flags,
while bits images must be untransformed and contain the destination clip
within the sample grid.

This means that the old FAST_PATH_COVERS_CLIP flag is now not used
anymore, so it can be deleted.

14 years agoSome clean-ups in fence_malloc() and fence_free()
Dmitri Vorobiev [Tue, 28 Sep 2010 11:42:02 +0000 (14:42 +0300)]
Some clean-ups in fence_malloc() and fence_free()

This patch removes an unnecessary typecast of MAP_FAILED,
replaces an erroneous free() by the correct munmap() in the
error path for a failing mprotect(), and, finally, removes
redundant calls to mprotect(), because munmap() doesn't
require any specific memory protection.

14 years agoFix search-and-replace issue in lowlevel-blt-bench.c
Søren Sandmann Pedersen [Tue, 28 Sep 2010 06:52:02 +0000 (02:52 -0400)]
Fix search-and-replace issue in lowlevel-blt-bench.c

14 years agoRename all the fast paths with _8000 in their names to _8
Søren Sandmann Pedersen [Fri, 17 Sep 2010 13:21:09 +0000 (09:21 -0400)]
Rename all the fast paths with _8000 in their names to _8

This inconsistent naming somehow survived the refactoring from a while
back.

14 years agoRemove cache prefetch code.
Liu Xinyun [Sat, 25 Sep 2010 06:56:38 +0000 (14:56 +0800)]
Remove cache prefetch code.

Performance decreases with cache prefetch, especially on
Atom, so remove this code. The experiment follows.

old: 0.19.5-with-cache-prefetch
new: 0.19.5-without-cache-prefetch

CPU: Intel Atom N270@1.6GHz
OS: MeeGo (32 bits)
Speedups
========
image-rgba                    poppler-0    17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%):  1.16x speedup
image-rgba                  ocitysmap-0    9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%):  1.09x speedup
image-rgba          xfce4-terminal-a1-0    18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%):  1.08x speedup
image-rgba         gnome-terminal-vim-0    25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%):  1.07x speedup
image-rgba          firefox-talos-gfx-0    57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%):  1.06x speedup
image-rgba       firefox-planet-gnome-0    102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%):  1.06x speedup
image-rgba         swfdec-giant-steps-0    12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%):  1.06x speedup

CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz
OS: Ubuntu 10.04 (64bits)
Speedups
========
image-rgba                  ocitysmap-0    2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%):  1.16x speedup
image-rgba         swfdec-giant-steps-0    1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%):  1.05x speedup

Signed-off-by: Liu Xinyun <xinyun.liu@intel.com>
Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>

14 years agoUse <sys/mman.h> macros only when they are available
Dmitri Vorobiev [Wed, 22 Sep 2010 09:34:57 +0000 (12:34 +0300)]
Use <sys/mman.h> macros only when they are available

Not all systems are regular Unices, so let's be careful with the
mmap()-related stuff, which might be unavailable. This patch makes
sure that mmap() and friends are used only when the <sys/mman.h>
header is found.

14 years agoRevert "add enable-cache-prefetch option"
Søren Sandmann Pedersen [Tue, 21 Sep 2010 18:20:43 +0000 (14:20 -0400)]
Revert "add enable-cache-prefetch option"

Revert this accidentally committed patch.

This reverts commit 19ea0e16b958e5abe491365c203293ab372f3586.

14 years agoIf MAP_ANONYMOUS is not defined, define it to MAP_ANON.
Søren Sandmann Pedersen [Tue, 21 Sep 2010 18:12:00 +0000 (14:12 -0400)]
If MAP_ANONYMOUS is not defined, define it to MAP_ANON.

This hopefully fixes the build failure on OS X.

14 years agoadd enable-cache-prefetch option
Liu Xinyun [Tue, 21 Sep 2010 16:15:10 +0000 (00:15 +0800)]
add enable-cache-prefetch option

OK, here is the patch that removes all the cache prefetch code. Please review it. Thanks.

On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote:
> Liu Xinyun <xinyun.liu@intel.com> writes:
>
> >    This patch is to add a new configuration option: enable-cache-prefetch,
> > which is default yes.
> >
> >    Here is a link which talks on cache issue.
> >    http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
> >
> >    When disable it on Atom CPU(configured with --enable-cache-prefetch=no),
> > it will have a little performance gain. Here is the patch.
>
> I think the cache prefetch code should just be deleted outright. No
> benchmarks that I'm aware of show it to be an improvement.
>
>
> Thanks,
> Soren

>From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001
From: Liu Xinyun <xinyun.liu@intel.com>
Date: Wed, 22 Sep 2010 00:11:56 +0800
Subject: [PATCH] remove cache prefetch

14 years agoPost-release version bump to 0.19.5
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:18:44 +0000 (10:18 -0400)]
Post-release version bump to 0.19.5

14 years agoPre-release version bump to 0.19.4 pixman-0.19.4
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:11:34 +0000 (10:11 -0400)]
Pre-release version bump to 0.19.4

14 years agocompute_composite_region32: Zero extents before returning FALSE.
Søren Sandmann Pedersen [Tue, 21 Sep 2010 14:05:52 +0000 (10:05 -0400)]
compute_composite_region32: Zero extents before returning FALSE.

If the extents of the composite region are broken such that x2 <= x1
or y2 <= y1, then we need to zero the extents before returning so that
the region won't be completely broken when calling
pixman_region32_fini().

14 years agoAdd a lowlevel blitter benchmark
Jonathan Morton [Fri, 17 Sep 2010 14:52:23 +0000 (17:52 +0300)]
Add a lowlevel blitter benchmark

This test is a modified version of Siarhei's compositor throughput
benchmark.  It's expanded with explicit reporting of memory bandwidth
consumption for the M-test, and with an additional 8x8-random test
intended to determine peak ops/sec capability.  There are also quite a
lot more operations tested for.

14 years agoAdd noinline macro
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:22 +0000 (17:52 +0300)]
Add noinline macro

This patch adds a noinline macro, which expands to compiler-dependent
keywords that tell the compiler to never inline a function.
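
Such a macro typically looks like the following sketch. The exact set of
compilers handled by the real patch is not shown in this log, so treat
the branches as an assumption; add_one is just a demonstration function.

```c
/* Expands to whatever the compiler needs to suppress inlining, or to
 * nothing when no such keyword is known. */
#if defined(__GNUC__)
#   define noinline __attribute__((noinline))
#elif defined(_MSC_VER)
#   define noinline __declspec(noinline)
#else
#   define noinline
#endif

/* Kept as a real call so that it shows up separately in profiles
 * and benchmarks. */
static noinline int
add_one (int x)
{
    return x + 1;
}
```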

14 years agoAdd gettime() routine to test utils
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:21 +0000 (17:52 +0300)]
Add gettime() routine to test utils

Impending benchmark code will need a function to get current time
in seconds, and this patch introduces such a routine. We try to use
the POSIX gettimeofday() function when available, and fall back to
clock() when not.
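
A sketch of that fallback logic. HAVE_GETTIMEOFDAY is assumed to be a
macro defined by the configure script, and gettime_sketch is a
hypothetical name chosen to keep it distinct from the real gettime().

```c
#include <stddef.h>
#include <time.h>

#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Returns the current time in seconds as a double. */
static double
gettime_sketch (void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;

    gettimeofday (&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
#else
    /* clock() measures processor time rather than wall clock time,
     * which is why it is only the fallback. */
    return (double) clock () / (double) CLOCKS_PER_SEC;
#endif
}
```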

14 years agoMove aligned_malloc() to utils
Dmitri Vorobiev [Fri, 17 Sep 2010 14:52:20 +0000 (17:52 +0300)]
Move aligned_malloc() to utils

The aligned_malloc() routine will be used in more than one test utility.
At least, a low-level blitter benchmark needs it. Therefore, let's make
this function a part of common test utilities code.

14 years agoEnable bits_image_fetch_bilinear_affine_normal_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:23 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:10 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_none_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:33:00 +0000 (10:33 -0400)]
Enable bits_image_fetch_bilinear_affine_none_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_pad_r5g6b5
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:44 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_r5g6b5

14 years agoEnable bits_image_fetch_bilinear_affine_normal_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:27 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_a8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:32:12 +0000 (10:32 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_a8

14 years agoEnable bits_image_fetch_bilinear_affine_none_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:31:57 +0000 (10:31 -0400)]
Enable bits_image_fetch_bilinear_affine_none_a8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_a8
Søren Sandmann Pedersen [Thu, 16 Sep 2010 14:31:45 +0000 (10:31 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_a8

14 years agoEnable bits_image_fetch_bilinear_affine_normal_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:41:20 +0000 (02:41 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:41:08 +0000 (02:41 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_none_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:56 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_x8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:46 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_normal_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:16 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:40:03 +0000 (02:40 -0400)]
Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_none_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:39:51 +0000 (02:39 -0400)]
Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8

14 years agoEnable bits_image_fetch_bilinear_affine_pad_a8r8g8b8
Søren Sandmann Pedersen [Sat, 28 Aug 2010 06:39:37 +0000 (02:39 -0400)]
Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8

14 years agoUse a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.
Søren Sandmann Pedersen [Sun, 23 May 2010 08:44:33 +0000 (04:44 -0400)]
Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.

There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/reflect/none/normal repeat modes. The bulk of each scaler is an
inline function that takes a format and a repeat mode as parameters.

The new scalers are all commented out, but the next commits will
enable them one at a time to facilitate bisecting.
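
The pattern can be sketched as follows. This is a simplified illustration,
not the code from the commit: it fetches a single row with nearest sampling
instead of doing bilinear affine interpolation, and every name in it
(repeat_t, apply_repeat, fetch_row, MAKE_FETCHER) is hypothetical. The point
is that the shared inline body receives the format and repeat mode as
constants, so the compiler can specialize each generated function.

```c
#include <stdint.h>

/* Hypothetical repeat modes, standing in for pixman's PIXMAN_REPEAT_* values. */
typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_t;

/* Map *c into [0, size) according to the repeat mode.
 * Returns 0 when the sample falls outside the image (NONE only). */
static inline int
apply_repeat (repeat_t mode, int *c, int size)
{
    int period = 2 * size;

    switch (mode)
    {
    case REPEAT_NONE:
        return *c >= 0 && *c < size;
    case REPEAT_NORMAL:
        *c %= size;
        if (*c < 0) *c += size;
        return 1;
    case REPEAT_PAD:
        if (*c < 0) *c = 0;
        else if (*c >= size) *c = size - 1;
        return 1;
    case REPEAT_REFLECT:
        *c %= period;
        if (*c < 0) *c += period;
        if (*c >= size) *c = period - 1 - *c;
        return 1;
    }
    return 0;
}

/* Shared body: mode and has_alpha are compile-time constants at each call. */
static inline void
fetch_row (const uint32_t *row, int size, int x0, int n, uint32_t *out,
           repeat_t mode, int has_alpha)
{
    int i;

    for (i = 0; i < n; i++)
    {
        int x = x0 + i;
        uint32_t p = 0;              /* transparent outside a NONE image */

        if (apply_repeat (mode, &x, size))
        {
            p = row[x];
            if (!has_alpha)
                p |= 0xff000000u;    /* x8r8g8b8: force opaque alpha */
        }
        out[i] = p;
    }
}

/* Generate one specialized fetcher per (repeat mode, format) pair. */
#define MAKE_FETCHER(name, mode, has_alpha)                           \
    static void                                                       \
    fetch_ ## name (const uint32_t *row, int size, int x0, int n,     \
                    uint32_t *out)                                    \
    {                                                                 \
        fetch_row (row, size, x0, n, out, mode, has_alpha);           \
    }

MAKE_FETCHER (pad_a8r8g8b8, REPEAT_PAD, 1)
MAKE_FETCHER (none_x8r8g8b8, REPEAT_NONE, 0)
MAKE_FETCHER (reflect_a8r8g8b8, REPEAT_REFLECT, 1)
MAKE_FETCHER (normal_a8r8g8b8, REPEAT_NORMAL, 1)
```

Commenting out individual MAKE_FETCHER lines and re-enabling them one per
commit would give the bisectable history described above.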

14 years agotest: Add affine-test
Søren Sandmann Pedersen [Wed, 14 Jul 2010 20:27:27 +0000 (16:27 -0400)]
test: Add affine-test

This test tests compositing with various affine transformations. It is
almost identical to scaling-test, except that it also applies a random
rotation in addition to the random scaling and translation.

14 years agoanalyze_extents: Fast path for non-transformed BITS images
Søren Sandmann Pedersen [Sun, 12 Sep 2010 10:07:41 +0000 (06:07 -0400)]
analyze_extents: Fast path for non-transformed BITS images

Profiling various cairo traces showed that we were spending a lot of
time in analyze_extents and compute_sample_extents(). This was
especially bad for glyphs where all this computation was completely
unnecessary.

This patch adds a fast path for the case of non-transformed BITS
images. The result is approximately a 6% improvement on the
firefox-talos-gfx benchmark:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   13.797   13.848   0.20%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   12.946   13.018   0.39%    6/6

14 years agoMove some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c
Søren Sandmann Pedersen [Thu, 16 Sep 2010 12:35:05 +0000 (08:35 -0400)]
Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c

When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag
can be set in compute_image_info().

Also the code that turned this flag off in pixman.c was not correct;
it didn't take transformations into account. With this patch, pixman.c
doesn't set the flag by default, but instead relies on the call to
compute_sample_extents() to set it when possible.

14 years agoSupport __thread on MINGW 4.5
Tor Lillqvist [Wed, 15 Sep 2010 15:53:47 +0000 (11:53 -0400)]
Support __thread on MINGW 4.5

By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse
and mmx work fine.

I added the below to pixman 0.18 and as far as I can see, it works.
make check reports no problems. (Earlier I had to use --disable-mmx
and --disable-sse2.) Also gtk-demo and gimp run fine.

(Also a change to get rid of the warnings about -fvisibility being ignored.)

14 years agoClip composite region against the destination alpha map extents.
Søren Sandmann Pedersen [Mon, 30 Aug 2010 02:46:09 +0000 (22:46 -0400)]
Clip composite region against the destination alpha map extents.

Otherwise we can end up writing outside the alpha map.
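
As a rough illustration of the clipping step, the sketch below intersects
a composite box with the rectangle covered by an alpha map. The box_t type,
the function name, and the assumption that the alpha map covers
[origin, origin + size) in destination coordinates are all made up for this
example; pixman itself works on regions rather than a single box.

```c
/* Half-open box: x1 <= x < x2, y1 <= y < y2 (hypothetical type). */
typedef struct { int x1, y1, x2, y2; } box_t;

/* Shrink *box to the part covered by the alpha map.
 * Returns 0 if nothing is left to composite. */
static int
clip_to_alpha_map (box_t *box, int origin_x, int origin_y,
                   int alpha_width, int alpha_height)
{
    int ax2 = origin_x + alpha_width;
    int ay2 = origin_y + alpha_height;

    if (box->x1 < origin_x) box->x1 = origin_x;
    if (box->y1 < origin_y) box->y1 = origin_y;
    if (box->x2 > ax2)      box->x2 = ax2;
    if (box->y2 > ay2)      box->y2 = ay2;

    return box->x1 < box->x2 && box->y1 < box->y2;
}
```

Any pixel left in the box after this step has a corresponding alpha map
pixel, so writes to the alpha map stay in bounds.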

14 years agoRemove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map
Søren Sandmann Pedersen [Sun, 29 Aug 2010 21:07:40 +0000 (17:07 -0400)]
Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map

If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.

In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.

14 years agoRename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT
Søren Sandmann Pedersen [Sun, 29 Aug 2010 21:03:01 +0000 (17:03 -0400)]
Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT

This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.

14 years agoUpdate and extend the alphamap test
Søren Sandmann Pedersen [Sun, 29 Aug 2010 20:59:02 +0000 (16:59 -0400)]
Update and extend the alphamap test

- Test many more combinations of formats

- Test destination alpha maps

- Test various different alpha origins

Also add a transformation to the destination, but comment it out
because it is actually broken at the moment (and pretty difficult to
fix).

14 years agoAdd fence_malloc() and fence_free().
Søren Sandmann Pedersen [Mon, 13 Sep 2010 18:34:34 +0000 (14:34 -0400)]
Add fence_malloc() and fence_free().

These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accesses will cause
a segmentation fault.

If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().
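
A minimal sketch of the idea, assuming a POSIX system with mmap() and
mprotect(). It is not the implementation from this commit: the names
fence_malloc_sketch() and fence_free_sketch() and the page layout are
assumptions, it uses sysconf(_SC_PAGESIZE) rather than getpagesize(), and
it places a single guard page on each side of the payload. Size overflow
for very large requests is not handled.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stored in the first page so the free routine knows how much to unmap. */
typedef struct { size_t total; } fence_header_t;

/* Layout: [header page][guard page][payload pages][guard page] */
static void *
fence_malloc_sketch (size_t len)
{
    size_t page = (size_t) sysconf (_SC_PAGESIZE);
    size_t payload = ((len + page - 1) / page) * page;
    size_t total;
    uint8_t *base;

    if (payload == 0)
        payload = page;
    total = payload + 3 * page;

    base = mmap (NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    ((fence_header_t *) base)->total = total;

    /* Make the pages on both sides of the payload inaccessible. */
    if (mprotect (base + page, page, PROT_NONE) != 0 ||
        mprotect (base + 2 * page + payload, page, PROT_NONE) != 0)
    {
        munmap (base, total);
        return NULL;
    }

    return base + 2 * page;
}

static void
fence_free_sketch (void *ptr)
{
    size_t page = (size_t) sysconf (_SC_PAGESIZE);
    uint8_t *base;

    if (!ptr)
        return;

    base = (uint8_t *) ptr - 2 * page;
    munmap (base, ((fence_header_t *) base)->total);
}
```

With this layout a read or write just before the returned pointer faults
immediately. An overrun faults only once it passes the end of the last
payload page, because the payload is rounded up to whole pages.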

14 years agoDo opacity computation with shifts instead of comparing with 0
Søren Sandmann Pedersen [Sun, 12 Sep 2010 08:35:08 +0000 (04:35 -0400)]
Do opacity computation with shifts instead of comparing with 0

Also add a COMPILE_TIME_ASSERT() macro and use it to assert that the
shift is correct.
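
The technique can be sketched like this. The flag value, the shift constant
and the function name are hypothetical; only the general idea (shift the
opaque bit down to build a small table index, and assert at compile time
that the shift matches the flag) is taken from the message above.

```c
#include <stdint.h>

/* Fails to compile (negative array size) when the condition is false. */
#define COMPILE_TIME_ASSERT(x)                                          \
    do { typedef int compile_time_assertion [(x) ? 1 : -1]; } while (0)

#define FLAG_IS_OPAQUE (1u << 13)   /* hypothetical flag bit */
#define OPAQUE_SHIFT   13

/* Build a 2-bit index: bit 0 = source opaque, bit 1 = destination opaque. */
static unsigned
opacity_index (uint32_t src_flags, uint32_t dst_flags)
{
    uint32_t src_bit, dst_bit;

    COMPILE_TIME_ASSERT (FLAG_IS_OPAQUE == (1u << OPAQUE_SHIFT));

    src_bit = (src_flags & FLAG_IS_OPAQUE) >> OPAQUE_SHIFT;        /* 0 or 1 */
    dst_bit = (dst_flags & FLAG_IS_OPAQUE) >> (OPAQUE_SHIFT - 1);  /* 0 or 2 */

    return src_bit | dst_bit;
}
```

The result can index a four-entry table directly, with no comparisons
against 0 and no branches. If someone later moves the flag to a different
bit without updating the shift, the build breaks instead of the index
silently becoming wrong.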

14 years agoSSE2 optimization for scaled over_8888_8888 operation with nearest filter
Siarhei Siamashka [Wed, 8 Sep 2010 06:16:12 +0000 (09:16 +0300)]
SSE2 optimization for scaled over_8888_8888 operation with nearest filter

This is the first demo implementation. It should be possible to
generalize it later to cover more operations with fewer lines of code.

It should also be possible to introduce the use of '__builtin_constant_p'
gcc builtin function for an efficient way of checking if 'unit_x' is known
to be zero at compile time (when processing padding pixels for NONE, or
PAD repeat).

Benchmarks from Intel Core i7 860:

== before (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=142.01 MPix/s

== after (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=314.99 MPix/s

== performance of nonscaled operation as a reference ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=652.09 MPix/s

14 years agoNONE repeat support for fast scaling with nearest filter
Siarhei Siamashka [Thu, 16 Sep 2010 15:25:40 +0000 (18:25 +0300)]
NONE repeat support for fast scaling with nearest filter

Implemented very similarly to PAD repeat.

And gcc also seems to be able to completely eliminate the
code responsible for left and right padding pixels for OVER
operation with NONE repeat.

14 years agoPAD repeat support for fast scaling with nearest filter
Siarhei Siamashka [Thu, 16 Sep 2010 14:10:40 +0000 (17:10 +0300)]
PAD repeat support for fast scaling with nearest filter

When processing pixels from the left and right padding, the same
scanline function is used with 'unit_x' set to 0.

It actually appears that gcc can handle this quite efficiently. When
using the 'restrict' keyword, it is able to optimize the whole operation
performed on left or right padding pixels to a small unrolled loop
(the code is reduced to a simple fill implementation):

    9b30:       89 08                   mov    %ecx,(%rax)
    9b32:       89 48 04                mov    %ecx,0x4(%rax)
    9b35:       48 83 c0 08             add    $0x8,%rax
    9b39:       49 39 c0                cmp    %rax,%r8
    9b3c:       75 f2                   jne    9b30

Without the 'restrict' keyword, there is one more instruction: reloading
the source pixel data from memory at the beginning of each iteration. That
is slower, but also acceptable.
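
For illustration, a scanline function of the kind described above might
look like the sketch below. The function name and the plain SRC copy are
assumptions, and the coordinates are taken to be 16.16 fixed point. With
'unit_x' set to 0 the source index never changes, so the loop stores the
same pixel 'w' times, which is why the compiler can reduce it to a fill.

```c
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* 16.16 fixed point, assumed here */

/* Nearest-neighbour scanline: copy w pixels, stepping through src by
 * unit_x per destination pixel (hypothetical helper). */
static void
scaled_nearest_scanline_src (uint32_t *restrict dst,
                             const uint32_t *restrict src,
                             int w,
                             fixed_16_16_t vx,
                             fixed_16_16_t unit_x)
{
    while (w-- > 0)
    {
        *dst++ = src[vx >> 16];   /* integer part selects the source pixel */
        vx += unit_x;
    }
}
```

Padding pixels on the right would be produced by pointing 'src' at the last
pixel of the source row and passing vx = 0 and unit_x = 0; the left padding
works the same way with the first pixel.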

14 years agoIntroduce a fake PIXMAN_REPEAT_COVER constant
Siarhei Siamashka [Fri, 17 Sep 2010 13:22:25 +0000 (16:22 +0300)]
Introduce a fake PIXMAN_REPEAT_COVER constant

We need to implement true PIXMAN_REPEAT_NONE support later (padding
the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE
for handling the FAST_PATH_SAMPLES_COVER_CLIP special case.

14 years agoNearest scaling fast path macro split into two parts
Siarhei Siamashka [Thu, 16 Sep 2010 10:02:18 +0000 (13:02 +0300)]
Nearest scaling fast path macro split into two parts

Scanline processing is now split into a separate function. This provides
an easy way of overriding it with a platform specific implementation,
which may use SIMD optimizations. Only basic C data types are used as
the arguments for this function, so it may be implemented entirely in
assembly or be generated by some JIT engine.

Also as a result of this split, the complexity of code is reduced a
bit and now it should be easier to introduce support for the currently
missing NONE, PAD and REFLECT repeat types.

14 years agoNearest scaling fast path macros moved to 'pixman-fast-path.h'
Siarhei Siamashka [Thu, 16 Sep 2010 09:31:27 +0000 (12:31 +0300)]
Nearest scaling fast path macros moved to 'pixman-fast-path.h'

With some modifications, these macros can be reused later by
various platform specific implementations, introducing SIMD
optimizations for nearest scaling fast paths.