platform/upstream/pixman.git
13 years agommx: convert while (w) to if (w) when possible
Matt Turner [Fri, 23 Sep 2011 18:10:52 +0000 (14:10 -0400)]
mmx: convert while (w) to if (w) when possible

gcc isn't able to see that w is no greater than 1, so it generates
unnecessary loop instructions with while (w).
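
As an illustration of the pattern (a minimal sketch, not the actual MMX
code; copy_pixels_pairwise is a hypothetical helper), a main loop that
consumes two pixels per iteration leaves at most one pixel behind, so
the tail can be a plain branch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not pixman code: copy w pixels, two per iteration. */
void
copy_pixels_pairwise (uint32_t *dst, const uint32_t *src, size_t w)
{
    /* Main loop consumes two pixels at a time. */
    while (w >= 2)
    {
        dst[0] = src[0];
        dst[1] = src[1];
        dst += 2;
        src += 2;
        w -= 2;
    }

    /* At most one pixel is left here, so a plain branch is enough.
     * Writing "while (w)" instead would ask the compiler to emit loop
     * code for a body that can run at most once.
     */
    if (w)
        *dst = *src;
}
```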

Signed-off-by: Matt Turner <mattst88@gmail.com>
13 years agommx: fix formats in commented code
Matt Turner [Fri, 9 Sep 2011 13:33:14 +0000 (15:33 +0200)]
mmx: fix formats in commented code

b8r8g8 has apparently stopped being supported at some point since this
code was commented out.

Signed-off-by: Matt Turner <mattst88@gmail.com>
13 years agolowlevel-blt: add over_x888_8_8888
Matt Turner [Fri, 9 Sep 2011 13:34:04 +0000 (15:34 +0200)]
lowlevel-blt: add over_x888_8_8888

Signed-off-by: Matt Turner <mattst88@gmail.com>
13 years agoBILINEAR->NEAREST filter optimization for simple rotation and translation
Siarhei Siamashka [Sun, 22 May 2011 19:51:00 +0000 (22:51 +0300)]
BILINEAR->NEAREST filter optimization for simple rotation and translation

Simple rotation and translation are the additional cases when BILINEAR
filter can be safely reduced to NEAREST.

13 years agoStrength-reduce BILINEAR filter to NEAREST filter for identity transforms
Søren Sandmann Pedersen [Sun, 4 Sep 2011 06:53:39 +0000 (02:53 -0400)]
Strength-reduce BILINEAR filter to NEAREST filter for identity transforms

An image with a bilinear filter and an identity transform is
equivalent to one with a nearest filter, so there is no reason the
standard fast paths shouldn't be usable.

But because a BILINEAR filter samples a 2x2 pixel block in the source
image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the
source area is the entire image, because some compositing operations
might then read pixels outside the image.

This patch fixes the problem by splitting the
FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip
covers the samples taking into account NEAREST/BILINEAR filters
respectively.

All the existing compositing operations that require
FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick
either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which
filter they depend on.

In compute_image_info() both COVER_CLIP_NEAREST and
COVER_CLIP_BILINEAR can be set depending on how much room there is
around the clip rectangle.

Finally, images with an identity transform and a bilinear filter get
FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.
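
The split can be pictured with a simplified sketch. It assumes integer
pixel coordinates and an identity transform, models the BILINEAR 2x2
footprint as one extra pixel of room on every side, and uses made-up
flag values and a hypothetical cover_clip_flags() helper; the real
code works on transformed 16.16 extents.

```c
#include <stdint.h>

/* Flag values are invented for this sketch; only the names come from
 * the commit message.
 */
#define FAST_PATH_SAMPLES_COVER_CLIP_NEAREST  (1u << 0)
#define FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR (1u << 1)

/* Half-open box: [x1, x2) x [y1, y2), in source image pixels. */
typedef struct { int x1, y1, x2, y2; } box_t;

/* Hypothetical helper: decide which "samples cover clip" flags hold.
 * NEAREST reads exactly the pixels inside the box. BILINEAR is modelled
 * as needing a one pixel margin around the box.
 */
uint32_t
cover_clip_flags (box_t clip, int width, int height)
{
    uint32_t flags = 0;

    if (clip.x1 >= 0 && clip.y1 >= 0 &&
        clip.x2 <= width && clip.y2 <= height)
    {
        flags |= FAST_PATH_SAMPLES_COVER_CLIP_NEAREST;
    }

    if (clip.x1 >= 1 && clip.y1 >= 1 &&
        clip.x2 <= width - 1 && clip.y2 <= height - 1)
    {
        flags |= FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR;
    }

    return flags;
}
```

With a clip that spans the whole image only the NEAREST flag is set,
which is the case the old single flag could not express.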

Performance measurements with render_bench against Xephyr:

Before:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.

After:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.

13 years agotest: Occasionally use a BILINEAR filter in blitters-test
Søren Sandmann Pedersen [Mon, 5 Sep 2011 18:43:25 +0000 (14:43 -0400)]
test: Occasionally use a BILINEAR filter in blitters-test

To test that reductions of BILINEAR->NEAREST for identity
transformations happen correctly, occasionally use a bilinear filter
in blitters test.

13 years agotest: better coverage for BILINEAR->NEAREST filter optimization
Siarhei Siamashka [Sun, 22 May 2011 19:16:38 +0000 (22:16 +0300)]
test: better coverage for BILINEAR->NEAREST filter optimization

The upcoming optimization, which will replace the BILINEAR filter with
NEAREST where appropriate, needs to analyze the transformation matrix
and must not make any mistakes.

The changes to affine-test include:
1. Higher chance of using the same scale factor for x and y axes. This can help
   to stress some special cases (for example the case when both x and y scale
   factors are integer). The same applies to x/y translation.
2. Introduced a small chance for "corrupting" transformation matrix by flipping
   random bits. This supposedly can help to identify the cases when some of the
   fast paths or other code logic is wrongly activated due to insufficient checks.
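
A rough sketch of the bit-flipping idea in item 2 (not the affine-test
code; corrupt_matrix() and the small LCG are hypothetical and exist only
to keep the example deterministic):

```c
#include <stdint.h>

/* Tiny LCG used only to make this sketch deterministic. */
static uint32_t sketch_seed = 1;

static uint32_t
sketch_rand (void)
{
    sketch_seed = sketch_seed * 1103515245u + 12345u;
    return (sketch_seed >> 16) & 0x7fff;
}

/* Flip `count` randomly chosen bits in a 3x3 matrix of 16.16
 * fixed-point values. The same bit may be picked more than once.
 */
void
corrupt_matrix (int32_t m[3][3], int count)
{
    while (count-- > 0)
    {
        uint32_t r = sketch_rand ();
        int row = r % 3;
        int col = (r / 3) % 3;
        int bit = (r / 9) % 32;

        m[row][col] = (int32_t) ((uint32_t) m[row][col] ^ (1u << bit));
    }
}
```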

13 years agoEliminate compute_sample_extents() function
Søren Sandmann Pedersen [Mon, 5 Sep 2011 04:19:51 +0000 (00:19 -0400)]
Eliminate compute_sample_extents() function

In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear->nearest optimization
will do something different with these two pieces of code.

13 years agoSplit computation of sample area into own function
Søren Sandmann Pedersen [Sun, 4 Sep 2011 21:43:29 +0000 (17:43 -0400)]
Split computation of sample area into own function

compute_sample_extents() has two parts: one that computes the
transformed extents, and one that checks whether the computed extents
fit within the 16.16 coordinate space.

Split the first part into its own function
compute_transformed_extents().

13 years agoRemove x and y coordinates from analyze_extents() and compute_sample_extents()
Søren Sandmann Pedersen [Sun, 4 Sep 2011 21:17:53 +0000 (17:17 -0400)]
Remove x and y coordinates from analyze_extents() and compute_sample_extents()

These coordinates were only ever used for subtracting from the extents
box to put it into the coordinate space of the image, so we might as
well do this coordinate translation only once before entering the
functions.

13 years agoUse MAKE_ACCESSORS() to generate accessors for paletted formats
Søren Sandmann Pedersen [Tue, 16 Aug 2011 10:13:59 +0000 (06:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for paletted formats

Add support in convert_pixel_from_a8r8g8b8() and
convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats,
then use MAKE_ACCESSORS() to generate accessors for the indexed
formats: c8, g8, g4, c4, g1

13 years agoUse MAKE_ACCESSORS() to generate accessors for the a1 format.
Søren Sandmann Pedersen [Sun, 30 May 2010 16:36:58 +0000 (12:36 -0400)]
Use MAKE_ACCESSORS() to generate accessors for the a1 format.

Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp
pixels to fetch_and_convert_pixel() and convert_and_store_pixel(),
then use MAKE_ACCESSORS() to generate the accessors for the a1
format. (Not the g1 format as it is indexed).

13 years agoUse MAKE_ACCESSORS() to generate accessors for 24bpp formats
Søren Sandmann Pedersen [Tue, 16 Aug 2011 18:38:44 +0000 (14:38 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 24bpp formats

Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp
pixels in fetch_and_convert_pixel() and
convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate
accessors for the 24 bpp formats:

    r8g8b8
    b8g8r8

13 years agoUse MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats
Søren Sandmann Pedersen [Thu, 18 Aug 2011 09:09:07 +0000 (05:09 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats

Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to
fetch_and_convert_pixel() and convert_and_store_pixel(), then use
MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and
c4 which are indexed:

    a4
    r1g2b1
    b1g2r1
    a1r1g1b1
    a1b1g1r1

13 years agoUse MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:58 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats

Add support for 8 bpp formats to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the
accessors for all the 8 bpp formats, except g8 and c8, which are
indexed:

    a8
    r3g3b2
    b2g3r3
    a2r2g2b2
    a2b2g2r2
    x4a4

13 years agoUse MAKE_ACCESSORS() to generate accessors for all the 16bpp formats
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:44 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats

Add support for 16bpp pixels to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 16bpp formats:

    r5g6b5
    b5g6r5
    a1r5g5b5
    x1r5g5b5
    a1b5g5r5
    x1b5g5r5
    a4r4g4b4
    x4r4g4b4
    a4b4g4r4
    x4b4g4r4

13 years agoUse MAKE_ACCESSORS() to generate all the 32 bit accessors
Søren Sandmann Pedersen [Thu, 18 Aug 2011 12:13:30 +0000 (08:13 -0400)]
Use MAKE_ACCESSORS() to generate all the 32 bit accessors

Add support for 32bpp formats in fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 32 bpp formats:

    a8r8g8b8
    x8r8g8b8
    a8b8g8r8
    x8b8g8r8
    x14r6g6b6
    b8g8r8a8
    b8g8r8x8
    r8g8b8x8
    r8g8b8a8

13 years agoAdd initial version of the MAKE_ACCESSORS() macro
Søren Sandmann Pedersen [Wed, 17 Aug 2011 21:27:58 +0000 (17:27 -0400)]
Add initial version of the MAKE_ACCESSORS() macro

This macro will eventually allow the fetchers and storers to be
generated automatically. For now, it's just a skeleton that doesn't
actually do anything.

13 years agoAdd general pixel converter
Søren Sandmann Pedersen [Mon, 15 Aug 2011 22:42:38 +0000 (18:42 -0400)]
Add general pixel converter

This function can convert between any <= 32 bpp formats. Nothing uses
it yet.

13 years agoAdd a generic unorm_to_unorm() conversion utility
Søren Sandmann Pedersen [Mon, 15 Aug 2011 14:22:05 +0000 (10:22 -0400)]
Add a generic unorm_to_unorm() conversion utility

This function can convert between normalized numbers of different
depths. When converting to higher bit depths, it will replicate the
existing bits; when converting to lower bit depths, it will simply
truncate.
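
The replicate/truncate behaviour described above can be sketched as
follows. This is an illustration under stated assumptions (bit depths
between 1 and 16, a hypothetical name unorm_to_unorm_sketch), not
necessarily how the function in pixman is written.

```c
#include <stdint.h>

/* Sketch only; assumes 1 <= from_bits <= 16 and 1 <= to_bits <= 16. */
uint32_t
unorm_to_unorm_sketch (uint32_t val, int from_bits, int to_bits)
{
    uint32_t result;
    int have;

    /* Discard anything above the source depth. */
    val &= (1u << from_bits) - 1;

    /* Narrowing (or same depth): keep the most significant bits. */
    if (from_bits >= to_bits)
        return val >> (from_bits - to_bits);

    /* Widening: move the value to the top of the wider field, then
     * keep copying the bits we already have into the space below
     * until the whole field is filled.
     */
    result = val << (to_bits - from_bits);
    for (have = from_bits; have < to_bits; have *= 2)
        result |= result >> have;

    return result;
}
```

Replication maps the maximum value to the maximum value (5-bit 0x1f
becomes 8-bit 0xff), which a plain left shift would not do.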

This function replaces the expand16() function in pixman-utils.c

13 years agoA few tweaks to a comment in pixman-combine.c.template
Søren Sandmann Pedersen [Mon, 19 Sep 2011 13:08:33 +0000 (09:08 -0400)]
A few tweaks to a comment in pixman-combine.c.template

Include a link to

http://marc.info/?l=xfree-render&m=99792000027857&w=2

where Keith explains how the disjoint/conjoint operators work.

13 years agoFix build on cygwin after commit efdf65c0c4fff551fb3cd9104deda9adb6261e22
Jon TURNEY [Mon, 19 Sep 2011 10:17:58 +0000 (06:17 -0400)]
Fix build on cygwin after commit efdf65c0c4fff551fb3cd9104deda9adb6261e22

libutils depends on pixman and so needs to precede it in the link order

Found by tinderbox, see [1]

[1] http://tinderbox.freedesktop.org/builds/2011-09-15-0005/logs/pixman/#build

Signed-off-by: Jon TURNEY <jon.turney at dronecode.org.uk>
13 years agotest: Use smaller boxes in region_contains_test()
Søren Sandmann Pedersen [Tue, 13 Sep 2011 03:17:39 +0000 (23:17 -0400)]
test: Use smaller boxes in region_contains_test()

The boxes used in region_contains_test() sometimes overflow causing

    *** BUG ***
    In pixman_region32_union_rect: Invalid rectangle passed
    Set a breakpoint on '_pixman_log_error' to debug

messages to be printed when pixman is compiled with DEBUG. Fix this by
dividing the x, y, w, h coordinates by 4 to prevent overflows.

13 years agobuild-win32: Add 'check' target
Andrea Canciani [Sun, 4 Sep 2011 19:33:05 +0000 (21:33 +0200)]
build-win32: Add 'check' target

On win32 the tests are built but they are not run automatically by the
build system.

A minimal 'check' target (depending on the tests being built) can
simply run them and log to the console their success/failure.

13 years agotest: Do not include config.h unless HAVE_CONFIG_H is defined
Andrea Canciani [Sun, 4 Sep 2011 20:52:53 +0000 (13:52 -0700)]
test: Do not include config.h unless HAVE_CONFIG_H is defined

The win32 build system does not generate config.h and correctly runs
the compiler without defining HAVE_CONFIG_H. Nevertheless some files
include config.h without checking for its availability, breaking the
build from a clean directory:

test\utils.h(2) : fatal error C1083: Cannot open include file:
'config.h': No such file or directory
...

13 years agobuild-win32: Add root Makefile.win32
Andrea Canciani [Sun, 4 Sep 2011 19:56:20 +0000 (21:56 +0200)]
build-win32: Add root Makefile.win32

Add Makefile.win32 to the pixman root. This makefile can recursively
run the other ones to compile the library or the test suite.

13 years agobuild-win32: Share targets and variables across win32 makefiles
Andrea Canciani [Sun, 4 Sep 2011 16:00:38 +0000 (18:00 +0200)]
build-win32: Share targets and variables across win32 makefiles

The win32 build system repeatedly defines some basic variables
(notably program names and flags) and C sources compilation rules.

They can be factored out to a common Makefile, to be included in every
other Makefile.win32.

13 years agobuild: Reuse test sources
Andrea Canciani [Sun, 4 Sep 2011 18:07:42 +0000 (20:07 +0200)]
build: Reuse test sources

Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.

This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.

In order to further simplify the test makefiles, the utility functions
are now in a static library, which gets linked to all the tests and
benchmarks.

13 years agobuild: Reuse sources and pixman-combine build rules
Andrea Canciani [Sun, 4 Sep 2011 16:41:41 +0000 (09:41 -0700)]
build: Reuse sources and pixman-combine build rules

Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.

This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.

13 years agotest: Fix compilation on win32
Andrea Canciani [Sun, 4 Sep 2011 18:07:57 +0000 (20:07 +0200)]
test: Fix compilation on win32

Adding scaling-helpers-test to the testsuite on win32 makes MSVC
complain about int64_t being used as an expression:

scaling-helpers-test.c(27) : error C2275: 'int64_t' : illegal use of
this type as an expression

13 years agoUse pkg-config to determine the flags to use with libpng
Søren Sandmann Pedersen [Sun, 11 Sep 2011 23:44:06 +0000 (19:44 -0400)]
Use pkg-config to determine the flags to use with libpng

Previously we would unconditionally link with -lpng leading to build
failures on systems without libpng.

13 years agotest: New function to save a pixman image to .png
Søren Sandmann Pedersen [Tue, 22 Feb 2011 10:20:36 +0000 (05:20 -0500)]
test: New function to save a pixman image to .png

When debugging it is often very useful to be able to save an image as
a png file. This commit adds a function "write_png()" that does that.

If libpng is not available, then the function becomes a noop.

13 years agoPost-release version bump to 0.23.5
Søren Sandmann Pedersen [Sat, 10 Sep 2011 03:59:20 +0000 (23:59 -0400)]
Post-release version bump to 0.23.5

13 years agoPre-release version bump to 0.23.4 pixman-0.23.4
Søren Sandmann Pedersen [Sat, 10 Sep 2011 03:51:11 +0000 (23:51 -0400)]
Pre-release version bump to 0.23.4

13 years agobits: optimise fetching width==1 repeats
Chris Wilson [Mon, 22 Aug 2011 14:29:25 +0000 (15:29 +0100)]
bits: optimise fetching width==1 repeats

Profiling ign.com, 20% of the entire render time was absorbed in this
single operation:

<< /content //COLOR_ALPHA /width 480 /height 800 >> surface context
<< /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern
  //EXTEND_REPEAT set-extend
  set-source
n 0 0 480 677 rectangle
fill+
pop

which is a simple composition of a single pixel wide image. Sadly this
is a workaround for lack of independent repeat-x/y handling in cairo and
pixman. Worse still is that the worst-case behaviour of the general repeat
path is for width 1 images...
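
To illustrate why a dedicated path helps (a simplified sketch with a
hypothetical fetch_repeat_row() helper, not the code added by this
commit): when the source row is one pixel wide, every sample is the
same pixel, so no per-pixel coordinate wrapping is needed.

```c
#include <stdint.h>

/* Hypothetical fetcher for one row of a horizontally repeating image.
 * Writes `width` pixels, starting at source coordinate x, into out.
 */
void
fetch_repeat_row (const uint32_t *row, int src_width,
                  int x, int width, uint32_t *out)
{
    int i;

    if (src_width == 1)
    {
        /* Every sample maps to the same pixel: just replicate it. */
        uint32_t p = row[0];

        for (i = 0; i < width; i++)
            out[i] = p;
        return;
    }

    /* General path: wrap each coordinate into [0, src_width). */
    for (i = 0; i < width; i++)
    {
        int sx = (x + i) % src_width;

        if (sx < 0)
            sx += src_width;
        out[i] = row[sx];
    }
}
```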

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
13 years agoARM: NEON better instruction scheduling of over_n_8888
Taekyun Kim [Fri, 19 Aug 2011 12:20:08 +0000 (21:20 +0900)]
ARM: NEON better instruction scheduling of over_n_8888

New head, tail, tail/head blocks are added and instructions
are reordered to eliminate pipeline stalls

Performance numbers of before/after

- cortex a8 -
before : L1: 375.39  L2: 391.93  M:114.39 ( 40.99%)  HT: 99.37  VT: 98.20  R: 90.24  RT: 32.87 ( 240Kops/s)
after  : L1: 481.90  L2: 483.46  M:114.29 ( 40.69%)  HT:106.91  VT: 93.38  R: 90.74  RT: 29.51 ( 236Kops/s)

- cortex a9 -
before : L1: 324.50  L2: 332.79  M:155.55 ( 47.51%)  HT:111.93  VT: 93.58  R: 71.92  RT: 28.21 ( 233Kops/s)
after  : L1: 355.87  L2: 364.49  M:156.90 ( 47.59%)  HT:111.52  VT: 91.76  R: 72.16  RT: 28.22 ( 234Kops/s)

13 years agoARM: NEON better instruction scheduling of over_n_8_8888
Taekyun Kim [Tue, 23 Aug 2011 06:00:11 +0000 (15:00 +0900)]
ARM: NEON better instruction scheduling of over_n_8_8888

tail/head block is expanded and reordered to eliminate stalls

Performance numbers of before/after

- cortex a8 -
before : L1: 201.35  L2: 190.48  M:101.94 ( 54.85%)  HT: 78.41  VT: 63.83  R: 58.25  RT: 21.74 ( 191Kops/s)
after  : L1: 257.65  L2: 255.49  M:102.04 ( 55.33%)  HT: 79.19  VT: 65.46  R: 59.23  RT: 21.12 ( 189Kops/s)

- cortex a9 -
before : L1: 157.35  L2: 159.81  M:133.00 ( 60.94%)  HT: 82.44  VT: 63.64  R: 51.66  RT: 19.15 ( 179Kops/s)
after  : L1: 216.83  L2: 219.40  M:135.83 ( 61.80%)  HT: 85.60  VT: 64.80  R: 52.23  RT: 19.16 ( 179Kops/s)

13 years agoWorkaround bug in llvm-gcc
Andrea Canciani [Sat, 13 Aug 2011 14:18:17 +0000 (16:18 +0200)]
Workaround bug in llvm-gcc

llvm-gcc (shipped in Apple XCode 4.1.1 as the default compiler or in
the 2.9 release of LLVM) performs an invalid optimization which
unifies the empty_region and the bad_region structures because they
have the same content.

A bug report has been filed against Apple Developer Tools for this
issue. This commit works around this bug by making one of the two
structures volatile, so that it cannot be merged.
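
A sketch of the workaround pattern, with hypothetical type and variable
names, and with the choice of which object is made volatile being an
assumption. Code that tells the two sentinels apart by address relies
on them staying separate objects:

```c
/* Hypothetical stand-ins for the two sentinel structures. They have
 * identical contents and are told apart only by their addresses.
 */
typedef struct { long size; long num_rects; } region_data_t;

static const region_data_t          empty_data  = { 0, 0 };

/* volatile keeps the compiler from folding this object into
 * empty_data (assumption: the commit may have marked the other one).
 */
static const volatile region_data_t broken_data = { 0, 0 };

int
data_is_broken (const volatile region_data_t *data)
{
    return data == &broken_data;
}

int
data_is_empty (const volatile region_data_t *data)
{
    return data == &empty_data;
}
```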

Fixes region-contains-test.

13 years agowin32: Build benchmarks
Andrea Canciani [Wed, 29 Jun 2011 12:14:38 +0000 (14:14 +0200)]
win32: Build benchmarks

Add the makefile rules needed to compile lowlevel-blt-bench on win32
and fix the compilation errors.

13 years agoMove bilinear interpolation to pixman-inlines.h
Søren Sandmann Pedersen [Fri, 11 Mar 2011 22:09:34 +0000 (17:09 -0500)]
Move bilinear interpolation to pixman-inlines.h

13 years agoUse repeat() function from pixman-inlines.h in pixman-bits-image.c
Søren Sandmann Pedersen [Fri, 11 Mar 2011 21:09:21 +0000 (16:09 -0500)]
Use repeat() function from pixman-inlines.h in pixman-bits-image.c

The repeat() functionality was duplicated between pixman-bits-image.c
and pixman-inlines.h

13 years agoRename pixman-fast-path.h to pixman-inlines.h
Søren Sandmann Pedersen [Fri, 11 Mar 2011 21:07:24 +0000 (16:07 -0500)]
Rename pixman-fast-path.h to pixman-inlines.h

It is not really specific to pixman-fast-path.c.

13 years agoIn pixman_image_create_bits() allow images larger than 2GB
Søren Sandmann Pedersen [Thu, 11 Aug 2011 10:30:43 +0000 (06:30 -0400)]
In pixman_image_create_bits() allow images larger than 2GB

There is no reason for pixman_image_create_bits() to check that the
image size fits in int32_t. The correct check is against size_t since
that is what the argument to calloc() is.

This patch fixes this by adding a new _pixman_multiply_overflows_size()
and using it in create_bits(). Also prepend an underscore to the names
of other similar functions since they are internal to pixman.

V2: Use int, not ssize_t for the arguments in create_bits() since
width/height are still limited to 32 bits, as pointed out by Chris
Wilson.
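
A check of this kind can be sketched as below. The function name is
hypothetical and the body is one common way to write the test, not
necessarily the one used in pixman:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if a * b cannot be represented in size_t.
 * Dividing instead of multiplying avoids overflowing in the check.
 */
int
multiply_overflows_size_sketch (size_t a, size_t b)
{
    return b != 0 && a > SIZE_MAX / b;
}
```

A caller would run this on the stride and the height before passing
their product to calloc().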

13 years agoDon't include stdint.h in lowlevel-blt-bench.c
Søren Sandmann Pedersen [Mon, 8 Aug 2011 14:18:07 +0000 (10:18 -0400)]
Don't include stdint.h in lowlevel-blt-bench.c

Some systems don't have the file, and the types are already defined in
pixman.h.

https://bugs.freedesktop.org//show_bug.cgi?id=37422

13 years agoUse find_box_for_y() in pixman_region_contains_point() too
Søren Sandmann Pedersen [Tue, 2 Aug 2011 07:03:48 +0000 (03:03 -0400)]
Use find_box_for_y() in pixman_region_contains_point() too

The same binary search from the previous commit can be used in this
function too.

V2: Remove check from loop that is not needed anymore, pointed out by
Andrea Canciani.

13 years agoSpeed up pixman_region{,32}_contains_rectangle()
Søren Sandmann Pedersen [Tue, 2 Aug 2011 02:32:09 +0000 (22:32 -0400)]
Speed up pixman_region{,32}_contains_rectangle()

When someone selects some text in Firefox under a non-composited X
server and initiates a drag, a shaped window is created with a complex
shape corresponding to the outline of the text. Then, on every mouse
movement pixman_region_contains_rectangle() is called many times on
that complicated region. And pixman_region_contains_rectangle() is
doing a linear scan through the rectangles in the region, although the
scan does exit when it finds the first box that can't possibly
intersect the passed-in rectangle.

This patch changes the loop so that it uses a binary search to skip
boxes that don't overlap the current y position.  The performance
improvement for the text dragging case is easily noticeable.

V2: Use the binary search for the "getting up to speed or skipping
remainder of band" as well.
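
The search can be sketched as follows. It assumes the boxes are stored
in y-x banded order, so that y2 never decreases along the array; the
box_t type and the iterative form are choices made for this example
and may differ from the helper in the patch.

```c
/* Half-open box: [x1, x2) x [y1, y2). */
typedef struct { int x1, y1, x2, y2; } box_t;

/* Return the first box in [begin, end) whose y2 is greater than y,
 * or end if there is none. Every box before the returned one lies
 * entirely above y and can be skipped.
 */
const box_t *
find_box_for_y_sketch (const box_t *begin, const box_t *end, int y)
{
    while (begin != end)
    {
        const box_t *mid = begin + (end - begin) / 2;

        if (mid->y2 > y)
            end = mid;          /* mid may be the answer; look left */
        else
            begin = mid + 1;    /* mid is above y; look right */
    }

    return begin;
}
```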

13 years agoNew test of pixman_region_contains_{rectangle,point}
Søren Sandmann Pedersen [Tue, 2 Aug 2011 05:32:15 +0000 (01:32 -0400)]
New test of pixman_region_contains_{rectangle,point}

This test generates random regions and checks whether random boxes and
points are contained within them. The results are combined and a CRC32
value is computed and compared to a known-correct one.

13 years agoFix lcg_rand_u32() to return 32 random bits.
Søren Sandmann Pedersen [Wed, 3 Aug 2011 22:38:20 +0000 (18:38 -0400)]
Fix lcg_rand_u32() to return 32 random bits.

The lcg_rand() function only returns 15 random bits, so lcg_rand_u32()
would always have 0 in bit 31 and bit 15. Fix that by calling
lcg_rand() three times, to generate 15, 15, and 2 random bits
respectively.

V2: Use the 10/11 most significant bits from the 3 lcg results and mix
them with the low ones from the adjacent one, as suggested by Andrea
Canciani.
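
The problem and the first version of the fix (15 + 15 + 2 bits) can be
sketched like this. The names are hypothetical, the LCG constants are
assumed, and the V2 mixing of the most significant bits is not shown.

```c
#include <stdint.h>

static uint32_t lcg_seed_sketch = 0;

/* A 15-bit LCG output; the constants are assumptions for this sketch. */
uint32_t
lcg_rand_sketch (void)
{
    lcg_seed_sketch = lcg_seed_sketch * 1103515245u + 12345u;
    return (lcg_seed_sketch >> 16) & 0x7fff;
}

/* Two 15-bit values glued at a 16-bit boundary: bits 31 and 15 of the
 * result can never be set.
 */
uint32_t
combine_naive (uint32_t hi15, uint32_t lo15)
{
    return (hi15 << 16) | lo15;
}

/* 15 + 15 + 2 bits cover all 32 bit positions. */
uint32_t
combine_15_15_2 (uint32_t a15, uint32_t b15, uint32_t c15)
{
    return (a15 << 17) | (b15 << 2) | (c15 & 3);
}

uint32_t
lcg_rand_u32_sketch (void)
{
    /* Separate statements keep the call order well defined. */
    uint32_t a = lcg_rand_sketch ();
    uint32_t b = lcg_rand_sketch ();
    uint32_t c = lcg_rand_sketch ();

    return combine_15_15_2 (a, b, c);
}
```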

13 years agoARM NEON: Standard fast path out_reverse_8_8888
Taekyun Kim [Thu, 4 Aug 2011 13:21:04 +0000 (22:21 +0900)]
ARM NEON: Standard fast path out_reverse_8_8888

This fast path is frequently used by cairo to do polygon rendering.
Existing NEON code generation framework is used.

13 years agoradial: Fix typos and trailing whitespace
Andrea Canciani [Mon, 18 Jul 2011 06:15:23 +0000 (08:15 +0200)]
radial: Fix typos and trailing whitespace

Correct a typo reported by James Cloos and some reported by automatic
spellchecking.

Remove trailing whitespace.

13 years agoARM: workaround binutils bug #12931 (code sections alignment)
Siarhei Siamashka [Fri, 22 Jul 2011 21:27:34 +0000 (00:27 +0300)]
ARM: workaround binutils bug #12931 (code sections alignment)

More details in binutils bugtracker:
  http://sourceware.org/bugzilla/show_bug.cgi?id=12931

The problem was encountered in the wild by Mozilla:
  https://bugzilla.mozilla.org/show_bug.cgi?id=672787

13 years agoC fast path for scaled src_x888_8888 with nearest filter
Siarhei Siamashka [Fri, 15 Jul 2011 20:35:21 +0000 (23:35 +0300)]
C fast path for scaled src_x888_8888 with nearest filter

The necessity is justified by a message in the pixman mailing list:
  http://lists.freedesktop.org/archives/pixman/2011-July/001330.html

NONE repeat is not supported, but could be added by tweaking
the interpretation and making use of 'fully_transparent_src'
scanline function argument.

13 years agoradial: Improve documentation and naming
Andrea Canciani [Fri, 15 Jul 2011 20:02:01 +0000 (22:02 +0200)]
radial: Improve documentation and naming

Add a comment to explain why the tests guarantee that the code always
computes the greatest valid root.

Rename "det" as "discr" to make it match the mathematical name
"discriminant".

Based on a patch by Jeff Muizelaar <jmuizelaar@mozilla.com>.

13 years agoMakefile.am: Add pixman@lists.freedesktop.org to RELEASE_ANNOUNCE_LIST
Søren Sandmann Pedersen [Mon, 4 Jul 2011 19:55:52 +0000 (15:55 -0400)]
Makefile.am: Add pixman@lists.freedesktop.org to RELEASE_ANNOUNCE_LIST

13 years agoPost-release version bump to 0.23.3
Søren Sandmann Pedersen [Mon, 4 Jul 2011 19:35:17 +0000 (15:35 -0400)]
Post-release version bump to 0.23.3

13 years agoPre-release version bump to 0.23.2 pixman-0.23.2
Søren Sandmann Pedersen [Mon, 4 Jul 2011 12:13:19 +0000 (08:13 -0400)]
Pre-release version bump to 0.23.2

13 years agoBilinear REPEAT_NORMAL source line extension for too short src_width
Taekyun Kim [Mon, 13 Jun 2011 10:53:49 +0000 (19:53 +0900)]
Bilinear REPEAT_NORMAL source line extension for too short src_width

To avoid function call and other calculation overhead, extend the
source scanline into a temporary buffer when the source width is too
small. The temporary buffer will be accessed repeatedly, so the cost
of the extension is very small thanks to caching.

13 years agoEnable REPEAT_NORMAL bilinear fast path entries
Taekyun Kim [Wed, 8 Jun 2011 08:17:42 +0000 (17:17 +0900)]
Enable REPEAT_NORMAL bilinear fast path entries

13 years agoARM: Add REPEAT_NORMAL functions to bilinear BIND macros
Taekyun Kim [Wed, 8 Jun 2011 08:14:29 +0000 (17:14 +0900)]
ARM: Add REPEAT_NORMAL functions to bilinear BIND macros

Now that the bilinear template supports REPEAT_NORMAL, functions for
it are added to the PIXMAN_ARM_BIND_SCALED_BILINEAR_ macros. Fast path
entries are not enabled yet.

13 years agosse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function
Taekyun Kim [Wed, 8 Jun 2011 08:11:24 +0000 (17:11 +0900)]
sse2: Declare bilinear src_8888_8888 REPEAT_NORMAL composite function

Now that the bilinear template supports REPEAT_NORMAL, declare
composite functions using it. The function is only declared, not used
yet.

13 years agoREPEAT_NORMAL support for bilinear fast path template
Taekyun Kim [Wed, 8 Jun 2011 06:58:01 +0000 (15:58 +0900)]
REPEAT_NORMAL support for bilinear fast path template

The basic idea is to break down normal repeat into a set of
non-repeat scanline compositions and stitch them together.

Bilinear may interpolate between the last and first pixels of the
source scanline.
In this case, we can use a temporary wrap-around buffer.
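
The splitting step can be sketched as below, in whole-pixel
coordinates and with hypothetical names. The wrap-around buffer needed
for bilinear samples that straddle the last and first pixels is not
shown.

```c
/* One run of source pixels that does not cross the wrap point. */
typedef struct { int start; int len; } segment_t;

/* Split `width` samples of a REPEAT_NORMAL scanline, starting at
 * source coordinate x, into runs that stay inside [0, src_width).
 * Returns the number of runs written (at most max_segs).
 */
int
split_repeat_normal (int src_width, int x, int width,
                     segment_t *segs, int max_segs)
{
    int n = 0;
    int sx = x % src_width;

    if (sx < 0)
        sx += src_width;

    while (width > 0 && n < max_segs)
    {
        int len = src_width - sx;

        if (len > width)
            len = width;

        segs[n].start = sx;
        segs[n].len = len;
        n++;

        width -= len;
        sx = 0;     /* every later run starts at the left edge */
    }

    return n;
}
```

Each run can then be handed to an ordinary non-repeating scanline
function.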

13 years agoReplace boolean arguments with flags for bilinear fast path template
Taekyun Kim [Wed, 8 Jun 2011 06:37:31 +0000 (15:37 +0900)]
Replace boolean arguments with flags for bilinear fast path template

By replacing boolean arguments with flags, the code can be more
readable and flags can be extended to do some more things later.

Currently the following flags are defined:

FLAG_NONE
    - No flags are turned on.

FLAG_HAVE_SOLID_MASK
    - Template will generate solid mask composite functions.

FLAG_HAVE_NON_SOLID_MASK
    - Template will generate bits mask composite functions.

FLAG_HAVE_SOLID_MASK and FLAG_HAVE_NON_SOLID_MASK should be mutually
exclusive.
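
A minimal sketch of such a flag set. The numeric values and the
flags_are_valid() helper are assumptions made for this example; only
the flag names come from the list above.

```c
/* Bit values are assumed for this sketch. */
enum
{
    FLAG_NONE                = 0,
    FLAG_HAVE_SOLID_MASK     = (1 << 1),
    FLAG_HAVE_NON_SOLID_MASK = (1 << 2)
};

/* Hypothetical check: the two mask flags must not be combined. */
int
flags_are_valid (int flags)
{
    int both = FLAG_HAVE_SOLID_MASK | FLAG_HAVE_NON_SOLID_MASK;

    return (flags & both) != both;
}
```

Unlike a list of boolean arguments, a flags word can gain new bits
later without changing the template's argument list.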

13 years agotest: Make fuzzer-find-diff.pl executable
Søren Sandmann [Sat, 25 Jun 2011 14:16:25 +0000 (10:16 -0400)]
test: Make fuzzer-find-diff.pl executable

13 years agoARM: Fix two bugs in neon_composite_over_n_8888_0565_ca().
Søren Sandmann [Mon, 20 Jun 2011 00:29:08 +0000 (20:29 -0400)]
ARM: Fix two bugs in neon_composite_over_n_8888_0565_ca().

The first bug is that a vmull.u8 instruction would store its result in
the q1 register, clobbering the d2 register used later on. The second
is that a vraddhn instruction would overwrite d25, corrupting the q12
register used later.

Fixing the second bug caused a pipeline bubble where the d18 register
would be unavailable for a clock cycle. This is fixed by swapping the
instruction with its successor.

13 years agoblitters-test: Make common formats more likely to be tested.
Søren Sandmann Pedersen [Sun, 19 Jun 2011 23:10:45 +0000 (19:10 -0400)]
blitters-test: Make common formats more likely to be tested.

Move the eight most common formats to the top of the list of image
formats and make create_random_image() much more likely to select one
of those eight formats.

This should help catch more bugs in SIMD optimized operations.

13 years agoSilence autoconf warnings
Andrea Canciani [Fri, 10 Jun 2011 06:56:10 +0000 (08:56 +0200)]
Silence autoconf warnings

Autoconf 2.68 reports:

warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body

Every code fragment must be wrapped in [AC_LANG_SOURCE([...])]

13 years agoReplace arguments to composite functions with a pointer to a struct
Søren Sandmann Pedersen [Fri, 25 Mar 2011 19:09:17 +0000 (15:09 -0400)]
Replace arguments to composite functions with a pointer to a struct

This allows more information, such as flags or the composite region,
to be passed to the composite functions.

13 years agoIn pixman-general.c rename image_parameters to {src, mask, dest}_image
Søren Sandmann Pedersen [Fri, 25 Mar 2011 18:20:43 +0000 (14:20 -0400)]
In pixman-general.c rename image_parameters to {src, mask, dest}_image

All the fast paths generally use these names as well.

13 years agoReplace instances of "dst_*" with "dest_*"
Søren Sandmann Pedersen [Fri, 25 Mar 2011 18:17:08 +0000 (14:17 -0400)]
Replace instances of "dst_*" with "dest_*"

The variables in question were dst_x, dst_y, dst_image. The majority
of _x and _y uses were already dest_x and dest_y, while the majority
of _image uses were dst_image.

13 years agodemos: Comment out some unused variables
Søren Sandmann [Sat, 28 May 2011 16:32:35 +0000 (12:32 -0400)]
demos: Comment out some unused variables

13 years agosse2: Delete some unused variables
Søren Sandmann [Sat, 28 May 2011 15:56:32 +0000 (11:56 -0400)]
sse2: Delete some unused variables

13 years agommx: Delete some unused variables
Søren Sandmann [Sat, 28 May 2011 15:51:31 +0000 (11:51 -0400)]
mmx: Delete some unused variables

13 years agoInclude noop in win32 builds
Andrea Canciani [Mon, 23 May 2011 10:08:54 +0000 (12:08 +0200)]
Include noop in win32 builds

13 years agoFix a few typos in pixman-combine.c.template
Nis Martensen [Mon, 2 May 2011 19:43:58 +0000 (21:43 +0200)]
Fix a few typos in pixman-combine.c.template

Some equations have too much multiplication with alpha.

13 years agoMove NOP src iterator into noop implementation.
Søren Sandmann Pedersen [Sat, 23 Apr 2011 14:26:49 +0000 (10:26 -0400)]
Move NOP src iterator into noop implementation.

The iterator for sources where neither RGB nor ALPHA is needed really
belongs in the noop implementation.

13 years agoMove NULL iterator into pixman-noop.c
Søren Sandmann Pedersen [Sat, 23 Apr 2011 14:24:41 +0000 (10:24 -0400)]
Move NULL iterator into pixman-noop.c

Iterating a NULL image returns NULL for all scanlines. We may as well
do this in the noop iterator.

13 years agoAdd a noop src iterator
Søren Sandmann Pedersen [Wed, 9 Feb 2011 04:42:36 +0000 (23:42 -0500)]
Add a noop src iterator

When the image is a8r8g8b8 and not transformed, and the fetched
rectangle is within the image bounds, scanlines can be fetched by
simply returning a pointer instead of copying the bits.
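
The idea can be sketched as follows, with a hypothetical image_t type
and function name; the real iterator interface is different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical a8r8g8b8 image; stride is in uint32_t units. */
typedef struct
{
    uint32_t *bits;
    int       width;
    int       height;
    int       stride;
} image_t;

/* Return a pointer to pixels [x, x + width) of row y without copying,
 * or NULL when the request is not fully inside the image (a caller
 * would then fall back to a copying fetcher).
 */
const uint32_t *
noop_get_scanline (const image_t *img, int x, int y, int width)
{
    if (x < 0 || y < 0 || width < 0 ||
        x > img->width || width > img->width - x || y >= img->height)
    {
        return NULL;
    }

    return img->bits + (size_t) y * (size_t) img->stride + x;
}
```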

13 years agoMove noop dest fetching to noop implementation
Søren Sandmann Pedersen [Mon, 24 Jan 2011 17:16:03 +0000 (12:16 -0500)]
Move noop dest fetching to noop implementation

It will at some point become useful to have CPU specific destination
iterators. However, a problem with that is that such iterators should
not be used if we can composite directly in the destination image.

By moving the noop destination iterator to the noop implementation, we
can ensure that it will be chosen before any CPU specific iterator.

13 years agoAdd a noop composite function for the DST operator
Søren Sandmann Pedersen [Mon, 24 Jan 2011 16:35:27 +0000 (11:35 -0500)]
Add a noop composite function for the DST operator

The DST operator doesn't actually do anything, so add a noop "fast
path" for it, instead of checking in pixman_image_composite32().

The performance tradeoff here is that we get rid of a test for DST in
the common case where the operator is not DST, in return for an extra
walk over the clip rectangles in the uncommon case where the operator
actually is DST.

13 years agoAdd a "noop" implementation.
Søren Sandmann Pedersen [Mon, 24 Jan 2011 16:31:49 +0000 (11:31 -0500)]
Add a "noop" implementation.

This new implementation is ahead of all other implementations in the
fallback chain and is supposed to contain operations that are "noops",
i.e., they don't require any work. For example, it might contain a
"fast path" for the DST operator that doesn't actually do anything or
an iterator for a8r8g8b8 that just returns a pointer into the image.
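
A rough C sketch of such a fallback chain is shown below. All names (sketch_impl_t, sketch_noop_try, sketch_general_try, sketch_composite) are hypothetical and the structure is heavily simplified compared to pixman's real implementation objects; it only illustrates why putting "noop" first lets it claim the DST operator before any other implementation is consulted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SKETCH_OP_SRC, SKETCH_OP_OVER, SKETCH_OP_DST } sketch_op_t;

typedef struct sketch_impl sketch_impl_t;
struct sketch_impl
{
    /* Returns 1 if this implementation handled the operation. */
    int            (*try_composite) (sketch_op_t op);
    sketch_impl_t   *fallback;   /* next implementation in the chain */
};

/* The DST operator leaves the destination unchanged, so "handling"
 * it means doing nothing at all. */
int
sketch_noop_try (sketch_op_t op)
{
    return op == SKETCH_OP_DST;
}

/* Stand-in for a general implementation that can handle anything. */
int
sketch_general_try (sketch_op_t op)
{
    (void) op;
    return 1;
}

/* Walk the chain from the top and return the first implementation
 * that accepts the operation, or NULL if none does. */
const sketch_impl_t *
sketch_composite (const sketch_impl_t *top, sketch_op_t op)
{
    assert (top != NULL);

    for (const sketch_impl_t *i = top; i != NULL; i = i->fallback)
    {
        if (i->try_composite (op))
            return i;
    }
    return NULL;
}
```

With a chain built as noop -> general, a DST request stops at the first link, while every other operator passes through to the fallback.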

13 years agotest: Fix compilation on win32
Andrea Canciani [Thu, 5 May 2011 08:17:08 +0000 (10:17 +0200)]
test: Fix compilation on win32

MSVC complains about uint32_t being used as an expression:

composite.c(902) : error C2275: 'uint32_t' : illegal use of this type
as an expression

13 years agoCheck for working mmap()
Dave Yeo [Mon, 9 May 2011 10:38:44 +0000 (12:38 +0200)]
Check for working mmap()

OS/2 doesn't have a working mmap().

13 years agoPost-release version bump to 0.23.1
Søren Sandmann Pedersen [Mon, 2 May 2011 09:11:49 +0000 (05:11 -0400)]
Post-release version bump to 0.23.1

13 years agoPre-release version bump to 0.22.0 pixman-0.22.0
Søren Sandmann Pedersen [Mon, 2 May 2011 09:06:33 +0000 (05:06 -0400)]
Pre-release version bump to 0.22.0

13 years agoPost-release version bump to 0.21.9
Søren Sandmann Pedersen [Tue, 19 Apr 2011 04:22:29 +0000 (00:22 -0400)]
Post-release version bump to 0.21.9

13 years agoPre-release version bump to 0.21.8 pixman-0.21.8
Søren Sandmann Pedersen [Tue, 19 Apr 2011 04:00:37 +0000 (00:00 -0400)]
Pre-release version bump to 0.21.8

13 years agoARM: Enable bilinear fast paths using scanline functions in pixman-arm-neon-asm-bilin...
Taekyun Kim [Wed, 13 Apr 2011 02:57:35 +0000 (11:57 +0900)]
ARM: Enable bilinear fast paths using scanline functions in pixman-arm-neon-asm-bilinear.S

Enable the fast paths that are supported by the scanline functions in
pixman-arm-neon-asm-bilinear.S

13 years agoARM: NEON scanline functions for bilinear scaling
Taekyun Kim [Wed, 13 Apr 2011 02:48:40 +0000 (11:48 +0900)]
ARM: NEON scanline functions for bilinear scaling

General fetch->combine->store based bilinear scanline functions.
They need further optimization and will eventually be replaced with
optimal functions one by one.
General functions should be located in pixman-arm-neon-asm-bilinear.S and
optimal functions in pixman-arm-neon-asm.S

The following general bilinear scanline functions are implemented:
    over_8888_8888
    add_8888_8888
    src_8888_8_8888
    src_8888_8_0565
    src_0565_8_x888
    src_0565_8_0565
    over_8888_8_8888
    add_8888_8_8888

13 years agoARM: Common macro for scaled bilinear scanline function with A8 mask
Taekyun Kim [Wed, 13 Apr 2011 02:43:44 +0000 (11:43 +0900)]
ARM: Common macro for scaled bilinear scanline function with A8 mask

Define the PIXMAN_ARM_BIND_SCALED_BILINEAR_SRC_A8_DST macro for declaring
scaled bilinear scanline functions in the common header.

13 years agoOffset rendering in pixman_composite_trapezoids() by (x_dst, y_dst)
Søren Sandmann Pedersen [Fri, 11 Mar 2011 12:52:57 +0000 (07:52 -0500)]
Offset rendering in pixman_composite_trapezoids() by (x_dst, y_dst)

Previously, this function would do coordinate calculations in such a
way that (x_dst, y_dst) would only affect the alignment of the source
image, but not of the traps, which would always be considered to be in
absolute destination coordinates. This is unlike the
pixman_image_composite() function which also registers the mask to the
destination.

This patch makes it so that traps are also offset by (x_dst, y_dst).

Also add a comment explaining how this function is supposed to
operate, and update tri-test.c and composite-trap-test.c to deal with
the new semantics.
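
The new behaviour amounts to translating every trapezoid coordinate by the destination offset. The C sketch below illustrates that with hypothetical, simplified types (sketch_trap_t and friends); it assumes 16.16 fixed-point coordinates and is not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t;   /* assumed 16.16 fixed point */

typedef struct { sketch_fixed_t x, y; } sketch_point_t;

/* Hypothetical trapezoid: two horizontal edges plus the two points
 * that define each of the left and right side lines. */
typedef struct
{
    sketch_fixed_t top, bottom;
    sketch_point_t left_p1, left_p2;
    sketch_point_t right_p1, right_p2;
} sketch_trap_t;

sketch_fixed_t
sketch_int_to_fixed (int i)
{
    assert (i >= -32768 && i <= 32767);
    return (sketch_fixed_t) (i * 65536);
}

/* Move the trapezoid by (x_dst, y_dst) so that it is registered to
 * the destination in the same way as the source image. The caller is
 * assumed to keep all coordinates within the 16.16 range. */
void
sketch_offset_trap (sketch_trap_t *t, int x_dst, int y_dst)
{
    sketch_fixed_t dx = sketch_int_to_fixed (x_dst);
    sketch_fixed_t dy = sketch_int_to_fixed (y_dst);

    t->top    += dy;
    t->bottom += dy;

    t->left_p1.x  += dx;  t->left_p1.y  += dy;
    t->left_p2.x  += dx;  t->left_p2.y  += dy;
    t->right_p1.x += dx;  t->right_p1.y += dy;
    t->right_p2.x += dx;  t->right_p2.y += dy;
}
```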

13 years agoARM: Add 'neon_composite_over_n_8888_0565_ca' fast path
Søren Sandmann Pedersen [Sun, 3 Apr 2011 03:24:48 +0000 (23:24 -0400)]
ARM: Add 'neon_composite_over_n_8888_0565_ca' fast path

This improves the performance of the firefox-talos-gfx benchmark with
the image16 backend. Benchmark on an 800 MHz ARM Cortex A8:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]  image16            firefox-talos-gfx  121.773  122.218   0.15%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]  image16            firefox-talos-gfx   85.247   85.563   0.22%    6/6

V2: Slightly better instruction scheduling based on comments from Taekyun Kim.
V3: Eliminate all stalls from the inner loop. Also based on comments from Taekyun Kim.

13 years agoFix OpenMP not supported case
Gilles Espinasse [Tue, 12 Apr 2011 20:44:56 +0000 (22:44 +0200)]
Fix OpenMP not supported case

PIXMAN_LINK_WITH_ENV did not fail unless -Wall -Werror was used, so
USE_OPENMP was defined even when the compiler did not support OpenMP.
Fix that by running the second OpenMP test only when the first AC_OPENMP
check finds support.

configure was tested in the following cases:
gcc without libgomp support: no openmp option, --enable-openmp and --disable-openmp
gcc with libgomp support: no openmp option, --enable-openmp and --disable-openmp

Not tested with autoconf versions that do not know about OpenMP (<2.62).

Warn when --enable-openmp is requested but no support is found.

Signed-off-by: Gilles Espinasse <g.esp@free.fr>

13 years agoFix missing AC_MSG_RESULT value from Werror test
Gilles Espinasse [Tue, 12 Apr 2011 20:44:25 +0000 (22:44 +0200)]
Fix missing AC_MSG_RESULT value from Werror test

Use the correct variable name

Signed-off-by: Gilles Espinasse <g.esp@free.fr>

13 years agoARM: pipelined NEON implementation of bilinear scaled 'src_8888_0565'
Siarhei Siamashka [Mon, 21 Mar 2011 18:25:27 +0000 (20:25 +0200)]
ARM: pipelined NEON implementation of bilinear scaled 'src_8888_0565'

Benchmark on ARM Cortex-A8 r1p3 @600MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=10020565, speed=33.59 MPix/s
  after:  op=1, src=20028888, dst=10020565, speed=46.25 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=10020565, speed=63.86 MPix/s
  after:  op=1, src=20028888, dst=10020565, speed=84.22 MPix/s

13 years agoARM: pipelined NEON implementation of bilinear scaled 'src_8888_8888'
Siarhei Siamashka [Wed, 16 Mar 2011 15:24:49 +0000 (17:24 +0200)]
ARM: pipelined NEON implementation of bilinear scaled 'src_8888_8888'

Performance of the inner loop when working with the data in L1 cache:
    ARM Cortex-A8: 41 cycles per 4 pixels (no stalls and partial dual issue)
    ARM Cortex-A9: 48 cycles per 4 pixels (no stalls)

It might still be possible to improve performance further on ARM Cortex-A8
with better use of dual issue.

Benchmark on ARM Cortex-A8 r1p3 @600MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=40.38 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=48.47 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=79.68 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=93.11 MPix/s

13 years agoARM: support different levels of loop unrolling in bilinear scaler
Siarhei Siamashka [Thu, 17 Mar 2011 17:42:01 +0000 (19:42 +0200)]
ARM: support different levels of loop unrolling in bilinear scaler

An extra 'flag' parameter is now supported in the bilinear scanline scaling
function generation macro. It can be used to enable unrolling to 4 or 8
pixels per loop iteration and to provide save/restore code for the d8-d15
registers.

13 years agoARM: use less ARM instructions in NEON bilinear scaling code
Siarhei Siamashka [Mon, 21 Mar 2011 16:41:53 +0000 (18:41 +0200)]
ARM: use less ARM instructions in NEON bilinear scaling code

This reduces code size and also puts less pressure on the
instruction decoder.

13 years agoARM: support for software pipelining in bilinear macros
Siarhei Siamashka [Wed, 16 Mar 2011 14:33:41 +0000 (16:33 +0200)]
ARM: support for software pipelining in bilinear macros

Now it's possible to override the main loop of the bilinear scaling code
with an optimized pipelined implementation.

13 years agoARM: use aligned memory writes in NEON bilinear scaling code
Siarhei Siamashka [Thu, 10 Mar 2011 14:12:23 +0000 (16:12 +0200)]
ARM: use aligned memory writes in NEON bilinear scaling code

13 years agoARM: tweaked horizontal weights update in NEON bilinear scaling code
Siarhei Siamashka [Thu, 10 Mar 2011 13:34:10 +0000 (15:34 +0200)]
ARM: tweaked horizontal weights update in NEON bilinear scaling code

Moving the horizontal interpolation weight update instructions from the
beginning of the loop to its end makes it possible to hide some pipeline
stalls and improve performance.