Søren Sandmann Pedersen [Sun, 13 Mar 2011 00:06:02 +0000 (19:06 -0500)]
In delegate_{src,dest}_iter_init() call delegate directly.

There is no reason to go through
_pixman_implementation_{src,dest}_iter_init(), especially since
_pixman_implementation_src_iter_init() is doing various other checks
that only need to be done once.

Also call delegate->src_iter_init() directly in pixman-sse2.c

Siarhei Siamashka [Wed, 9 Mar 2011 11:55:48 +0000 (13:55 +0200)]
ARM: a bit faster NEON bilinear scaling for r5g6b5 source images

Instruction scheduling is improved in the code responsible for fetching r5g6b5
pixels and converting them to the intermediate x8r8g8b8 color format used in
the interpolation part of the code. Many NEON stalls still remain; they can be
resolved later by pipelining.
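
For readers unfamiliar with the conversion step mentioned above, here is a scalar
C sketch of expanding one r5g6b5 pixel to x8r8g8b8. It is not the NEON code from
this commit; the function name is made up for illustration, and the bit
replication used to fill the low bits is a common technique assumed here rather
than taken from the patch.

```c
#include <stdint.h>

/* Expand one r5g6b5 pixel to x8r8g8b8. The top bits of each channel are
 * replicated into the low bits so that full intensity maps to 0xff.
 * The unused x8 byte is left as zero in this sketch. */
uint32_t
sketch_convert_0565_to_x888 (uint16_t s)
{
    uint32_t r = (s >> 11) & 0x1f;
    uint32_t g = (s >> 5) & 0x3f;
    uint32_t b = s & 0x1f;

    r = (r << 3) | (r >> 2);   /* 5 bits -> 8 bits */
    g = (g << 2) | (g >> 4);   /* 6 bits -> 8 bits */
    b = (b << 3) | (b >> 2);   /* 5 bits -> 8 bits */

    return (r << 16) | (g << 8) | b;
}
```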

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s
          op=1, src=10020565, dst=20020888, speed=36.82 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=41.35 MPix/s
          op=1, src=10020565, dst=20020888, speed=49.16 MPix/s

Siarhei Siamashka [Wed, 9 Mar 2011 11:27:41 +0000 (13:27 +0200)]
ARM: NEON optimization for bilinear scaled 'src_0565_0565'

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=3.30 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=32.29 MPix/s

Siarhei Siamashka [Wed, 9 Mar 2011 11:21:53 +0000 (13:21 +0200)]
ARM: NEON optimization for bilinear scaled 'src_0565_x888'

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=20020888, speed=3.39 MPix/s
  after:  op=1, src=10020565, dst=20020888, speed=36.82 MPix/s

Siarhei Siamashka [Wed, 9 Mar 2011 09:53:04 +0000 (11:53 +0200)]
ARM: NEON optimization for bilinear scaled 'src_8888_0565'

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=10020565, speed=6.56 MPix/s
  after:  op=1, src=20028888, dst=10020565, speed=61.65 MPix/s

Siarhei Siamashka [Wed, 9 Mar 2011 09:46:48 +0000 (11:46 +0200)]
ARM: use common macro template for bilinear scaled 'src_8888_8888'

This is a cleanup of old and now duplicated code. The performance improvement
comes mostly from enabling software prefetch, but instruction scheduling is
also slightly better.

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=53.24 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=74.36 MPix/s

Siarhei Siamashka [Wed, 9 Mar 2011 09:34:15 +0000 (11:34 +0200)]
ARM: NEON: common macro template for bilinear scanline scalers

This allows generating bilinear scanline scaling functions targeting
various source and destination color formats. Right now a8r8g8b8/x8r8g8b8
and r5g6b5 color formats are supported. More formats can be added if needed.

Siarhei Siamashka [Wed, 9 Mar 2011 08:59:46 +0000 (10:59 +0200)]
ARM: new bilinear fast path template macro in 'pixman-arm-common.h'

It can be reused in different ARM NEON bilinear scaling fast path functions.

Siarhei Siamashka [Sun, 6 Mar 2011 20:16:32 +0000 (22:16 +0200)]
ARM: assembly optimized nearest scaled 'src_8888_8888'

Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=44.36 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=39.79 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=102.36 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=163.12 MPix/s

Siarhei Siamashka [Mon, 7 Mar 2011 01:10:43 +0000 (03:10 +0200)]
ARM: common macro for nearest scaling fast paths

The code of nearest scaled 'src_0565_0565' function was generalized
and moved to a common macro, so that it can be reused for other
fast paths.

Siarhei Siamashka [Sun, 6 Mar 2011 14:17:12 +0000 (16:17 +0200)]
ARM: use prefetch in nearest scaled 'src_0565_0565'

Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=75.02 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=73.63 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=176.12 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=267.50 MPix/s

Søren Sandmann Pedersen [Fri, 4 Mar 2011 20:51:18 +0000 (15:51 -0500)]
test: Do endian swapping of the source and destination images.

Otherwise the test fails on big endian. Fix for bug 34767, reported by
Siarhei Siamashka.

Søren Sandmann Pedersen [Mon, 7 Mar 2011 18:45:54 +0000 (13:45 -0500)]
test: In image_endian_swap() use pixman_image_get_format() to get the bpp.

There is no reason to pass in the bpp as an argument; it can be gotten
directly from the image.
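
To illustrate why the bpp does not need to be passed separately, the sketch below
derives it from a format code and uses it to byte-swap a scanline. The SKETCH_
macros are redefined locally and are assumed to mirror the way pixman.h of this
era packs format codes (bpp in the top byte); the swap routine is hypothetical and
is not the test suite's image_endian_swap(). Under that assumption, values such
as src=20028888 in the benchmark output elsewhere in this log read as these
codes printed in hex.

```c
#include <stdint.h>

/* Local stand-ins, assumed to match pixman.h's format encoding. */
#define SKETCH_FORMAT(bpp, type, a, r, g, b)                          \
    (((uint32_t) (bpp) << 24) | ((type) << 16) | ((a) << 12) |        \
     ((r) << 8) | ((g) << 4) | (b))
#define SKETCH_FORMAT_BPP(f) ((f) >> 24)
#define SKETCH_TYPE_ARGB     2

/* Swap the bytes of every pixel in one scanline. The pixel size is
 * taken from the format code instead of a separate bpp argument. */
void
sketch_endian_swap_scanline (void *bits, int width, uint32_t format)
{
    int bpp = (int) SKETCH_FORMAT_BPP (format);
    int i;

    if (bpp == 16)
    {
        uint16_t *p = bits;

        for (i = 0; i < width; i++)
            p[i] = (uint16_t) ((p[i] >> 8) | (p[i] << 8));
    }
    else if (bpp == 32)
    {
        uint32_t *p = bits;

        for (i = 0; i < width; i++)
        {
            p[i] = (p[i] >> 24) | ((p[i] >> 8) & 0xff00) |
                   ((p[i] << 8) & 0xff0000) | (p[i] << 24);
        }
    }
    /* 8 bpp needs no swapping; other depths are left out of this sketch. */
}
```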

Siarhei Siamashka [Tue, 22 Feb 2011 16:45:03 +0000 (18:45 +0200)]
ARM: NEON optimization for bilinear scaled 'src_8888_8888'

Initial NEON optimization for bilinear scaling. It can probably be
improved further.

Benchmark on ARM Cortex-A8:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=44.27 MPix/s

Siarhei Siamashka [Mon, 21 Feb 2011 18:18:02 +0000 (20:18 +0200)]
SSE2 optimization for bilinear scaled 'src_8888_8888'

A primitive, naive implementation of bilinear scaling using SSE2 intrinsics,
which only handles one pixel at a time. It is approximately 2x faster than
the pixman general compositing path. Single-pass processing without an
intermediate temporary buffer contributes ~15% and loop unrolling contributes
~20% of this speedup.
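
As a point of reference for what "one pixel at a time" means here, the following
is a scalar C sketch of bilinear interpolation between four a8r8g8b8 pixels. It
is not the SSE2 code from this commit. The function name and the choice of 8-bit
weights (distx and disty in the range 0..255) are assumptions made for the
example.

```c
#include <stdint.h>

/* Interpolate between the top-left, top-right, bottom-left and
 * bottom-right pixels. distx and disty are weights in 0..255 that say
 * how far the sample point is from the left column and the top row. */
uint32_t
sketch_bilinear_8888 (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                      uint32_t distx, uint32_t disty)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c_tl = (tl >> shift) & 0xff;
        uint32_t c_tr = (tr >> shift) & 0xff;
        uint32_t c_bl = (bl >> shift) & 0xff;
        uint32_t c_br = (br >> shift) & 0xff;

        /* Horizontal pass: each result fits in 16 bits. */
        uint32_t top = c_tl * (256 - distx) + c_tr * distx;
        uint32_t bot = c_bl * (256 - distx) + c_br * distx;

        /* Vertical pass, then drop the 16 fractional bits. */
        uint32_t c = (top * (256 - disty) + bot * disty) >> 16;

        result |= c << shift;
    }

    return result;
}
```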

Benchmark on Intel Core i7 (x86-64):
 Using cairo-perf-trace:
  before: image        firefox-planet-gnome   12.566   12.610   0.23%    6/6
  after:  image        firefox-planet-gnome   10.961   11.013   0.19%    5/6

 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=165.38 MPix/s

Siarhei Siamashka [Mon, 21 Feb 2011 00:07:09 +0000 (02:07 +0200)]
test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds'

An individual correctness check for the new supplementary function related
to bilinear scaling. This test program uses a somewhat wider range
of input arguments than the other tests cover.

Siarhei Siamashka [Sun, 20 Feb 2011 23:29:02 +0000 (01:29 +0200)]
Main loop template for fast single pass bilinear scaling

Can be used for implementing SIMD optimized fast path
functions which work with bilinear scaled source images.

Similar to the template for nearest scaling main loop, the
following types of mask are supported:
1. no mask
2. non-scaled a8 mask with SAMPLES_COVER_CLIP flag
3. solid mask

PAD repeat is fully supported. NONE repeat is partially
supported (right now only works if source image has alpha
channel or when alpha channel of the source image does not
have any effect on the compositing operation).

Andrea Canciani [Thu, 24 Feb 2011 11:53:39 +0000 (12:53 +0100)]
test: Silence MSVC warnings

MSVC does not notice non-returning functions (abort() / assert(0))
and warns about paths which end with them in non-void functions:

c:\cygwin\home\ranma42\code\fdo\pixman\test\fetch-test.c(114) :
warning C4715: 'reader' : not all control paths return a value
c:\cygwin\home\ranma42\code\fdo\pixman\test\stress-test.c(133) :
warning C4715: 'real_reader' : not all control paths return a value
c:\cygwin\home\ranma42\code\fdo\pixman\test\composite.c(431) :
warning C4715: 'calc_op' : not all control paths return a value

These warnings can be silenced by adding a return after the
termination call.
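
A minimal sketch of the pattern, using a made-up function rather than the actual
test code: the return after assert(0) is never reached, but it gives every
control path a return value, which is what warning C4715 asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader that loads 1, 2 or 4 bytes from src. */
uint32_t
sketch_reader (const void *src, int size)
{
    switch (size)
    {
    case 1:
        return *(const uint8_t *) src;
    case 2:
        return *(const uint16_t *) src;
    case 4:
        return *(const uint32_t *) src;
    default:
        assert (0);
        return 0; /* not reached; silences MSVC warning C4715 */
    }
}
```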

Andrea Canciani [Tue, 22 Feb 2011 21:43:48 +0000 (22:43 +0100)]
Do not include unused headers

pixman-combine32.h is included without being used both in
pixman-image.c and in pixman-general.c.

Andrea Canciani [Tue, 22 Feb 2011 21:04:49 +0000 (22:04 +0100)]
test: Add Makefile for Win32

Andrea Canciani [Tue, 22 Feb 2011 20:46:37 +0000 (21:46 +0100)]
test: Fix tests for compilation on Windows

The Microsoft C compiler cannot handle subobject initialization and
Win32 does not provide snprintf.

Work around these limitations by using normal struct initialization
and using sprintf (a manual check shows that the buffer size is
sufficient).

Andrea Canciani [Thu, 24 Feb 2011 09:44:04 +0000 (10:44 +0100)]
Fix compilation on Win32

Makefile.win32 contained a typo and was missing the dependency from
the built sources.

Søren Sandmann Pedersen [Tue, 22 Feb 2011 21:13:32 +0000 (16:13 -0500)]
Post-release version bump to 0.21.7

Søren Sandmann Pedersen [Tue, 22 Feb 2011 20:43:41 +0000 (15:43 -0500)]
Pre-release version bump to 0.21.6

Søren Sandmann Pedersen [Tue, 22 Feb 2011 20:40:34 +0000 (15:40 -0500)]
Minor fix to the RELEASING file

Søren Sandmann Pedersen [Tue, 22 Feb 2011 20:28:17 +0000 (15:28 -0500)]
Delete pixman-x64-mmx-emulation.h from pixman/Makefile.am

Siarhei Siamashka [Tue, 22 Feb 2011 17:28:08 +0000 (19:28 +0200)]
Ensure that tests run as the last step of a build for 'make check'

Previously 'make check' would compile and run the tests first, and only
then proceed to compiling the demos. This is not very convenient
because the console output has to be scrolled back to see the
test verdicts. Swapping the order of the SUBDIRS variable entries in
Makefile.am resolves this.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 12:38:49 +0000 (07:38 -0500)]
sse2: Minor coding style cleanups.

Also make pixman_fill_sse2() static.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 12:40:02 +0000 (07:40 -0500)]
sse2: Remove pixman-x64-mmx-emulation.h

Also stop including mmintrin.h

Søren Sandmann Pedersen [Fri, 18 Feb 2011 12:38:03 +0000 (07:38 -0500)]
sse2: Delete obsolete or redundant comments

Søren Sandmann Pedersen [Fri, 18 Feb 2011 12:07:45 +0000 (07:07 -0500)]
sse2: Remove all the core_combine_* functions

Now that _mm_empty() is not used anymore, they are no longer different
from the sse2_combine_* functions, so they can be consolidated.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 10:15:50 +0000 (05:15 -0500)]
sse2: Don't compile pixman-sse2.c with -mmmx anymore

It's not necessary now that the file doesn't use MMX instructions.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 10:07:08 +0000 (05:07 -0500)]
sse2: Delete unused MMX functions and constants and all _mm_empty()s

These are not needed because the SSE2 implementation doesn't use MMX
anymore.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 08:56:20 +0000 (03:56 -0500)]
sse2: Convert all uses of MMX registers to use SSE2 registers instead.

By avoiding use of MMX registers we won't need to call emms all over
the place, which avoids various miscompilation issues.

Søren Sandmann Pedersen [Fri, 18 Feb 2011 08:57:55 +0000 (03:57 -0500)]
Coding style:  core_combine_in_u_pixelsse2 -> core_combine_in_u_pixel_sse2

Søren Sandmann Pedersen [Tue, 15 Feb 2011 14:11:44 +0000 (09:11 -0500)]
In pixman_image_set_transform() allow NULL for transform

Previously, this would crash unless the existing transform were also
NULL.

Søren Sandmann Pedersen [Tue, 15 Feb 2011 09:55:02 +0000 (04:55 -0500)]
Avoid marking images dirty when properties are reset

When an image property is set to the same value that it already is,
there is no reason to mark the image dirty and incur a recomputation
of the flags.
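
The idea can be sketched as follows. The struct and setter below are
hypothetical stand-ins, not pixman's image type or API; they only show the
early return that skips the dirty marking when nothing changes.

```c
#include <stdbool.h>

/* Hypothetical image with one property and a dirty bit. */
typedef struct
{
    int  repeat;
    bool dirty;   /* set when cached flags must be recomputed */
} sketch_image_t;

void
sketch_image_set_repeat (sketch_image_t *image, int repeat)
{
    /* Same value as before: nothing to recompute. */
    if (image->repeat == repeat)
        return;

    image->repeat = repeat;
    image->dirty = true;
}
```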

Søren Sandmann Pedersen [Fri, 11 Feb 2011 13:57:42 +0000 (08:57 -0500)]
Add new public function pixman_add_triangles()

This allows some more code to be deleted from the X server. The
implementation consists of converting to trapezoids, and is shared
with pixman_composite_triangles().

Søren Sandmann Pedersen [Fri, 14 Jan 2011 11:19:08 +0000 (06:19 -0500)]
Optimize adding opaque trapezoids onto a8 destination.

When the source is opaque and the destination is alpha only, we can
avoid the temporary mask and just add the trapezoids directly.

Søren Sandmann Pedersen [Wed, 12 Jan 2011 08:02:59 +0000 (03:02 -0500)]
Add a test program, tri-test

This program tests whether the new triangle support works.

Søren Sandmann Pedersen [Tue, 11 Jan 2011 15:15:21 +0000 (10:15 -0500)]
Add support for triangles to pixman.

The Render X extension can draw triangles as well as trapezoids, but
the implementation has always converted them to trapezoids. This patch
moves the X server's triangle conversion code into pixman, where we
can reuse the pixman_composite_trapezoid() code.

Søren Sandmann Pedersen [Thu, 10 Feb 2011 15:37:08 +0000 (10:37 -0500)]
Add a test program for pixman_composite_trapezoids().

A CRC32 based test program to check that pixman_composite_trapezoids()
actually works.

Søren Sandmann Pedersen [Tue, 11 Jan 2011 14:23:43 +0000 (09:23 -0500)]
Add pixman_composite_trapezoids().

This function is an implementation of the X server request
Trapezoids. That request is what the X backend of cairo is using all
the time; by moving it into pixman we can hopefully make it faster.

Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:40:53 +0000 (19:40 -0500)]
test/Makefile.am: Move all the TEST_LDADD into a new global LDADD.

This gets rid of a bunch of replicated *_LDADD clauses

Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:20:18 +0000 (19:20 -0500)]
Add @TESTPROGS_EXTRA_LDFLAGS@ to AM_LDFLAGS

Instead of explicitly adding it to each test program.

Søren Sandmann Pedersen [Wed, 19 Jan 2011 00:16:39 +0000 (19:16 -0500)]
Move all the GTK+ based test programs to a new subdir, "demos"

This separates the test suite from the random GTK+-using test
programs. "demos" is somewhat misleading because the programs there
are not particularly exciting (with the possible exception of
composite-test which shows off all the compositing operators).

Siarhei Siamashka [Thu, 3 Feb 2011 22:47:36 +0000 (00:47 +0200)]
SSE2 optimization for nearest scaled over_8888_n_8888

This operation shows up a little bit in some of the html5 based
games from http://www.kesiev.com/akihabara/

=== Cairo trace of the game intro animation for 'Legend of Sadness' ===

before:
[  0]    image    firefox-legend-of-sadness   46.286   46.298   0.01%    5/6

after:
[  0]    image    firefox-legend-of-sadness   45.088   45.102   0.04%    6/6

=== Microbenchmark (scaling ~2000x~2000 -> ~2000x~2000) ===

before:
    translucent: op=3, src=8888, mask=s dst=8888, speed=131.30 MPix/s
    transparent: op=3, src=8888, mask=s dst=8888, speed=132.38 MPix/s
    opaque:      op=3, src=8888, mask=s dst=8888, speed=167.90 MPix/s
after:
    translucent: op=3, src=8888, mask=s dst=8888, speed=301.93 MPix/s
    transparent: op=3, src=8888, mask=s dst=8888, speed=770.70 MPix/s
    opaque:      op=3, src=8888, mask=s dst=8888, speed=301.80 MPix/s

Siarhei Siamashka [Wed, 3 Nov 2010 13:22:28 +0000 (15:22 +0200)]
ARM: NEON optimization for nearest scaled over_0565_8_0565

This may be used in some cases for HTML5 video when hardware acceleration
is not available.

Siarhei Siamashka [Wed, 3 Nov 2010 13:16:28 +0000 (15:16 +0200)]
ARM: NEON optimization for nearest scaled over_8888_8_0565

This may be used in some cases for HTML5 video when hardware acceleration
is not available.

Siarhei Siamashka [Wed, 3 Nov 2010 13:15:15 +0000 (15:15 +0200)]
ARM: new macro template for using scaled fast paths with a8 mask

Siarhei Siamashka [Wed, 2 Feb 2011 16:14:56 +0000 (18:14 +0200)]
Better support for NONE repeat in nearest scaling main loop template

The scaling function now gets an extra boolean argument, which is set
to TRUE when we are fetching padding pixels for NONE repeat. This
makes it possible to decide whether to interpret alpha as 0xFF or 0x00
for such pixels when working with formats which don't have an alpha
channel (for example x8r8g8b8 and r5g6b5).

Siarhei Siamashka [Fri, 22 Oct 2010 14:54:41 +0000 (17:54 +0300)]
Support for a8 and solid mask in nearest scaling main loop template

In addition to the most common case of not having any mask at all, two
variants of scaling with mask show up in cairo traces:
1. non-scaled a8 mask with SAMPLES_COVER_CLIP flag
2. solid mask

This patch extends the nearest scaling main loop template to also
support these cases.

Siarhei Siamashka [Fri, 22 Oct 2010 13:29:01 +0000 (16:29 +0300)]
test: Extend scaling-test to support a8/solid mask and ADD operation

The image width has also been increased because SIMD optimizations typically
do more unrolling in the inner loops, and this needs to be tested.

Siarhei Siamashka [Mon, 17 Jan 2011 00:29:43 +0000 (02:29 +0200)]
Use const modifiers for source buffers in nearest scaling fast paths

Siarhei Siamashka [Fri, 30 Jul 2010 15:37:51 +0000 (18:37 +0300)]
C fast paths for a simple 90/270 degrees rotation

Depending on the CPU architecture, this is 1.5 to 4 times slower than a
simple nonrotated copy (which would be the ideal case, perfectly
utilizing memory bandwidth), but still more than 7 times faster than
the general path.
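
For orientation, a naive 90 degree rotation of 32-bit pixels can be sketched as
below. This is not the code from this commit: the function name, the clockwise
direction and the strides counted in pixels are all assumptions for the example,
and a fast path that cares about memory bandwidth would likely process the image
in cache-sized tiles instead of walking it pixel by pixel like this.

```c
#include <stdint.h>

/* Rotate a w x h block of 32-bit pixels by 90 degrees clockwise.
 * The destination block is h pixels wide and w pixels tall.
 * Strides are given in pixels, not bytes. */
void
sketch_rotate_90_8888 (uint32_t       *dst,
                       int             dst_stride,
                       const uint32_t *src,
                       int             src_stride,
                       int             w,
                       int             h)
{
    int x, y;

    for (y = 0; y < h; y++)
    {
        for (x = 0; x < w; x++)
        {
            /* Source (x, y) lands in destination column h - 1 - y, row x. */
            dst[x * dst_stride + (h - 1 - y)] = src[y * src_stride + x];
        }
    }
}
```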

This implementation sets a performance baseline for rotation. The use
of SIMD instructions may further improve memory bandwidth utilization.

Siarhei Siamashka [Thu, 29 Jul 2010 14:58:13 +0000 (17:58 +0300)]
New flags for 90/180/270 rotation

These flags are set when the transform is a simple nonscaled 90/180/270
degrees rotation.
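
One way such a check could look is sketched below. The function, the 16.16
fixed-point 3x3 matrix layout and the sign convention (90 degrees meaning the
matrix with rows (0, -1) and (1, 0)) are assumptions for illustration; the
commit's actual flag names and detection code are not shown in this log.
Translation components are ignored.

```c
#include <stdint.h>

#define SKETCH_FIXED_1 ((int32_t) 1 << 16)   /* 1.0 in 16.16 fixed point */

/* Returns 90, 180 or 270 if m is an unscaled rotation by that angle
 * (under the convention stated above), and 0 otherwise. The identity
 * matrix also yields 0. */
int
sketch_classify_rotation (int32_t m[3][3])
{
    const int32_t one = SKETCH_FIXED_1;

    /* The last row must be (0, 0, 1), i.e. the transform is affine. */
    if (m[2][0] != 0 || m[2][1] != 0 || m[2][2] != one)
        return 0;

    if (m[0][0] == 0 && m[1][1] == 0)
    {
        if (m[0][1] == -one && m[1][0] == one)
            return 90;
        if (m[0][1] == one && m[1][0] == -one)
            return 270;
    }
    else if (m[0][1] == 0 && m[1][0] == 0)
    {
        if (m[0][0] == -one && m[1][1] == -one)
            return 180;
    }

    return 0;
}
```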

Siarhei Siamashka [Tue, 26 Oct 2010 12:40:01 +0000 (15:40 +0300)]
test: affine-test updated to stress 90/180/270 degrees rotation more

Søren Sandmann Pedersen [Thu, 10 Feb 2011 10:21:42 +0000 (05:21 -0500)]
Add pixman-conical-gradient.c to Makefile.win32.

Pointed out by Kirill Tishin.

Søren Sandmann Pedersen [Sun, 23 Jan 2011 21:53:26 +0000 (16:53 -0500)]
Add SSE2 fetcher for 0565

Before:

add_0565_0565 = L1:  61.08  L2:  61.03  M: 60.57 ( 10.95%)  HT: 46.85  VT: 45.25  R: 39.99  RT: 20.41 ( 233Kops/s)

After:

add_0565_0565 = L1:  77.84  L2:  76.25  M: 75.38 ( 13.71%)  HT: 55.99  VT: 54.56  R: 45.41  RT: 21.95 ( 255Kops/s)

Søren Sandmann Pedersen [Fri, 31 Dec 2010 05:57:46 +0000 (00:57 -0500)]
Improve performance of sse2_combine_over_u()

Split this function into two, one that has a mask, and one that
doesn't. This is a fairly substantial speed-up in many cases.

New output of lowlevel-blt-bench over_x888_8_0565:

over_x888_8_0565 =  L1:  63.76  L2:  62.75  M: 59.37 ( 21.55%)  HT: 45.89  VT: 43.55  R: 34.51  RT: 16.80 ( 201Kops/s)

Søren Sandmann Pedersen [Sun, 23 Jan 2011 21:17:17 +0000 (16:17 -0500)]
Add SSE2 fetcher for a8

New output of lowlevel-blt-bench over_x888_8_0565:

over_x888_8_0565 =  L1:  57.85  L2:  56.80  M: 54.14 ( 19.50%)  HT: 42.64  VT: 40.56  R: 32.67  RT: 16.22 ( 195Kops/s)

Based in part on code by Steve Snyder from

    https://bugs.freedesktop.org/show_bug.cgi?id=21173

Søren Sandmann Pedersen [Wed, 12 Jan 2011 11:38:54 +0000 (06:38 -0500)]
Add SSE2 fetcher for x8r8g8b8

New output of lowlevel-blt-bench over_x888_8_0565:

over_x888_8_0565 =  L1:  55.68  L2:  55.11  M: 52.83 ( 19.04%)  HT: 39.62  VT: 37.70  R: 30.88  RT: 14.62 ( 174Kops/s)

The fetcher is looked up in a table, so that other fetchers can easily
be added.
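
A table-driven lookup of this kind might be structured as in the sketch below.
All names, the format constants and the scalar fetchers are invented for the
example; the real table holds SSE2 fetchers keyed by pixman format codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format identifiers for this sketch only. */
enum { SKETCH_NULL = 0, SKETCH_X8R8G8B8 = 1, SKETCH_A8 = 2 };

typedef void (*sketch_fetcher_t) (const void *src, uint32_t *dst, int width);

/* x8r8g8b8 has no alpha channel, so force alpha to 0xff. */
static void
fetch_x8r8g8b8 (const void *src, uint32_t *dst, int width)
{
    const uint32_t *s = src;
    int i;

    for (i = 0; i < width; i++)
        dst[i] = s[i] | 0xff000000;
}

/* a8 carries only alpha; place it in the top byte. */
static void
fetch_a8 (const void *src, uint32_t *dst, int width)
{
    const uint8_t *s = src;
    int i;

    for (i = 0; i < width; i++)
        dst[i] = (uint32_t) s[i] << 24;
}

typedef struct
{
    int              format;
    sketch_fetcher_t fetcher;
} sketch_fetcher_info_t;

/* Adding support for another format means adding one row here. */
static const sketch_fetcher_info_t sketch_fetchers[] =
{
    { SKETCH_X8R8G8B8, fetch_x8r8g8b8 },
    { SKETCH_A8,       fetch_a8 },
    { SKETCH_NULL,     NULL }
};

/* Returns NULL when no specialized fetcher exists, in which case a
 * caller would fall back to a generic one. */
sketch_fetcher_t
sketch_lookup_fetcher (int format)
{
    const sketch_fetcher_info_t *f;

    for (f = sketch_fetchers; f->fetcher != NULL; f++)
    {
        if (f->format == format)
            return f->fetcher;
    }

    return NULL;
}
```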

See also https://bugs.freedesktop.org/show_bug.cgi?id=20709

Søren Sandmann Pedersen [Sat, 22 Jan 2011 22:13:19 +0000 (17:13 -0500)]
Add a test for over_x888_8_0565 in lowlevel_blt_bench().

The next few commits will speed this up quite a bit.

Current output:

---
reference memcpy speed = 2217.5MB/s (554.4MP/s for 32bpp fills)
---
over_x888_8_0565 =  L1:  54.67  L2:  54.01  M: 52.33 ( 18.88%)  HT: 37.19  VT: 35.54  R: 29.40  RT: 13.63 ( 162Kops/s)

Søren Sandmann Pedersen [Mon, 24 Jan 2011 17:24:42 +0000 (12:24 -0500)]
Move fallback decisions from implementations into pixman-cpu.c.

Instead of having each individual implementation decide which fallback
to use, move it into pixman-cpu.c, where a more global decision can be
made.

This is accomplished by adding a "fallback" argument to all the
pixman_implementation_create_*() implementations, and then in
_pixman_choose_implementation() pass in the desired fallback.
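
The shape of this change can be sketched with stand-in types. Nothing below is
pixman's actual code: the struct, the three create functions and the have_sse2
parameter are hypothetical and only show the chain being assembled in one place.

```c
#include <stddef.h>

typedef struct sketch_impl sketch_impl_t;
struct sketch_impl
{
    const char    *name;
    sketch_impl_t *delegate;   /* where unsupported operations go */
};

static sketch_impl_t general_impl, fast_path_impl, sse2_impl;

sketch_impl_t *
sketch_create_general (void)
{
    general_impl.name = "general";
    general_impl.delegate = NULL;   /* end of the chain */
    return &general_impl;
}

/* Each create function takes its fallback instead of choosing one. */
sketch_impl_t *
sketch_create_fast_path (sketch_impl_t *fallback)
{
    fast_path_impl.name = "fast-path";
    fast_path_impl.delegate = fallback;
    return &fast_path_impl;
}

sketch_impl_t *
sketch_create_sse2 (sketch_impl_t *fallback)
{
    sse2_impl.name = "sse2";
    sse2_impl.delegate = fallback;
    return &sse2_impl;
}

/* The one place that knows the order of the chain. */
sketch_impl_t *
sketch_choose_implementation (int have_sse2)
{
    sketch_impl_t *imp = sketch_create_general ();

    imp = sketch_create_fast_path (imp);

    if (have_sse2)
        imp = sketch_create_sse2 (imp);

    return imp;
}
```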

Søren Sandmann Pedersen [Fri, 21 Jan 2011 19:47:33 +0000 (14:47 -0500)]
Print a warning when a development snapshot is being configured.

It seems to be relatively common for people to use development
snapshots of pixman thinking they are ordinary releases. This patch
makes it such that if the current minor version is odd, configure will
print a banner explaining the version number scheme plus information
about where to report bugs.

Rolland Dudemaine [Tue, 25 Jan 2011 13:08:26 +0000 (15:08 +0200)]
Fix "variable was set but never used" warnings

Removes useless variable declarations. This can only result in more
efficient code, as these variables were sometimes assigned, but
their values were never used.

Rolland Dudemaine [Tue, 25 Jan 2011 12:14:57 +0000 (14:14 +0200)]
test: Use the right enum types instead of int to fix warnings

Green Hills Software MULTI compiler was producing a number
of warnings due to incorrect uses of int instead of the correct
corresponding pixman_*_t type.

Rolland Dudemaine [Tue, 25 Jan 2011 12:52:49 +0000 (14:52 +0200)]
Correct the initialization of 'max_vx'

http://lists.freedesktop.org/archives/pixman/2011-January/000937.html

Rolland Dudemaine [Tue, 25 Jan 2011 11:55:28 +0000 (13:55 +0200)]
test: Fix for mismatched 'fence_malloc' prototype/implementation

Solves compilation problem when 'mprotect' is not available. For
example, when using Green Hills Software MULTI compiler or mingw:
http://lists.freedesktop.org/archives/pixman/2011-January/000939.html

Siarhei Siamashka [Mon, 10 Jan 2011 19:01:16 +0000 (21:01 +0200)]
The code in 'bitmap_addrect' already assumes non-null 'reg->data'

So the check of 'reg->data' pointer can be safely removed.

Søren Sandmann Pedersen [Wed, 19 Jan 2011 12:47:52 +0000 (07:47 -0500)]
Post-release version bump to 0.21.5

Søren Sandmann Pedersen [Wed, 19 Jan 2011 12:38:24 +0000 (07:38 -0500)]
Pre-release version bump to 0.21.4

Søren Sandmann Pedersen [Mon, 17 Jan 2011 19:12:20 +0000 (14:12 -0500)]
Fix dangling-pointer bug in bits_image_fetch_bilinear_no_repeat_8888().

The mask_bits variable is only declared in a limited scope, so the
pointer to it becomes invalid instantly. Somehow this didn't actually
trigger any bugs, but Brent Fulgham reported that Bounds Checker was
complaining about it.

Fix the bug by moving mask_bits to the function scope.
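
The class of bug can be shown with a small made-up function (not the pixman
fetcher itself). Had mask_bits been declared inside the if block, the pointer
stored in mask would refer to an object whose lifetime ends at the closing
brace.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the first mask value, or an all-ones default when the
 * caller passes no mask. */
uint32_t
sketch_first_mask_value (const uint32_t *mask)
{
    /* Declared at function scope so that it outlives the block below. */
    uint32_t mask_bits = 0xffffffff;

    if (mask == NULL)
    {
        /* Safe: mask_bits is still alive when mask is dereferenced. */
        mask = &mask_bits;
    }

    return *mask;
}
```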

Andrea Canciani [Wed, 12 Jan 2011 16:43:40 +0000 (17:43 +0100)]
Add a test for radial gradients

radial-test is a port of the radial-gradient test from the cairo test
suite. It has been modified so that some pixels have 0 in both the a
and b coefficients of the quadratic equation solved by the rasterizer,
to expose a division by zero in the original implementation.

Søren Sandmann Pedersen [Sun, 12 Dec 2010 12:34:42 +0000 (07:34 -0500)]
Fix destination fetching

When fetching from destinations, we need to ignore transformations,
repeat and filtering. Currently we don't ignore them, which means all
kinds of bad things can happen.

This patch fixes this problem by directly calling the scanline fetchers
for destinations instead of going through the full
get_scanline_32/64().

Søren Sandmann Pedersen [Sun, 12 Dec 2010 14:19:13 +0000 (09:19 -0500)]
Turn on testing for destination transformation

Søren Sandmann Pedersen [Sat, 11 Dec 2010 13:10:04 +0000 (08:10 -0500)]
Skip fetching pixels when possible

Add two new iterator flags, ITER_IGNORE_ALPHA and ITER_IGNORE_RGB that
are set when the alpha and rgb values are not needed. If both are set,
then we can skip fetching entirely and just use
_pixman_iter_get_scanline_noop.
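
A sketch of how the two flags could select a no-op getter follows. The flag
values, the iterator struct and the fetch_count field are hypothetical and exist
only to make the effect observable; they are not pixman's definitions.

```c
#include <stdint.h>

enum
{
    SKETCH_ITER_IGNORE_ALPHA = (1 << 0),
    SKETCH_ITER_IGNORE_RGB   = (1 << 1)
};

typedef struct sketch_iter sketch_iter_t;
struct sketch_iter
{
    uint32_t *buffer;
    int       width;
    int       fetch_count;   /* counts real fetches, for illustration */
    uint32_t *(*get_scanline) (sketch_iter_t *iter);
};

/* Neither alpha nor RGB is needed: hand back the buffer untouched. */
static uint32_t *
get_scanline_noop (sketch_iter_t *iter)
{
    return iter->buffer;
}

static uint32_t *
get_scanline_fetch (sketch_iter_t *iter)
{
    int i;

    for (i = 0; i < iter->width; i++)
        iter->buffer[i] = 0xff000000;   /* stand-in for a real pixel fetch */

    iter->fetch_count++;
    return iter->buffer;
}

void
sketch_iter_init (sketch_iter_t *iter, uint32_t *buffer, int width,
                  unsigned flags)
{
    const unsigned ignore_all =
        SKETCH_ITER_IGNORE_ALPHA | SKETCH_ITER_IGNORE_RGB;

    iter->buffer = buffer;
    iter->width = width;
    iter->fetch_count = 0;

    if ((flags & ignore_all) == ignore_all)
        iter->get_scanline = get_scanline_noop;
    else
        iter->get_scanline = get_scanline_fetch;
}
```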

Søren Sandmann Pedersen [Fri, 10 Dec 2010 21:55:55 +0000 (16:55 -0500)]
Add direct-write optimization back

Introduce a new ITER_LOCALIZED_ALPHA flag that indicates that the
alpha value computed is used only for the alpha channel of the output;
it doesn't affect the RGB channels.

Then in pixman-bits-image.c, if a destination is either a8r8g8b8 or
x8r8g8b8 with localized alpha, the iterator will return a pointer
directly into the image.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 20:18:48 +0000 (15:18 -0500)]
Get rid of the classify methods

They are not used anymore, and the linear gradient is now doing the
optimization in a different way.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 20:14:24 +0000 (15:14 -0500)]
Linear: Optimize for horizontal gradients

If the gradient is horizontal, we can reuse the same scanline over and
over. Add support for this optimization to
_pixman_linear_gradient_iter_init().
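
The optimization can be sketched as below, assuming that "horizontal" means
every scanline of the gradient is identical. The iterator struct, the integer
step_x/step_y parametrization and the scanlines_computed counter are invented
for the example and are much simpler than a real linear gradient.

```c
#include <stdint.h>

typedef struct sketch_grad_iter sketch_grad_iter_t;
struct sketch_grad_iter
{
    int       step_x, step_y;   /* value change per pixel and per scanline */
    int       y;
    int       width;
    int       scanlines_computed;
    uint32_t *buffer;
    uint32_t *(*get_scanline) (sketch_grad_iter_t *iter);
};

static void
compute_scanline (sketch_grad_iter_t *iter)
{
    int x;

    for (x = 0; x < iter->width; x++)
        iter->buffer[x] = (uint32_t) (x * iter->step_x + iter->y * iter->step_y);

    iter->scanlines_computed++;
}

static uint32_t *
get_scanline_compute (sketch_grad_iter_t *iter)
{
    compute_scanline (iter);
    iter->y++;
    return iter->buffer;
}

/* Every scanline is the same, so keep returning the one in the buffer. */
static uint32_t *
get_scanline_reuse (sketch_grad_iter_t *iter)
{
    return iter->buffer;
}

void
sketch_grad_iter_init (sketch_grad_iter_t *iter, uint32_t *buffer, int width,
                       int step_x, int step_y)
{
    iter->step_x = step_x;
    iter->step_y = step_y;
    iter->y = 0;
    iter->width = width;
    iter->scanlines_computed = 0;
    iter->buffer = buffer;

    if (step_y == 0)
    {
        /* Generate the scanline once, up front. */
        compute_scanline (iter);
        iter->get_scanline = get_scanline_reuse;
    }
    else
    {
        iter->get_scanline = get_scanline_compute;
    }
}
```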

Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:59:20 +0000 (14:59 -0500)]
Consolidate the various get_scanline_32() into get_scanline_narrow()

The separate get_scanline_32() functions in solid, linear, radial and
conical images are no longer necessary because all access to these
images now goes through iterators.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:44:22 +0000 (14:44 -0500)]
Allow NULL property_changed function

Initialize the field to NULL, and then delete the empty functions from
the solid, linear, radial, and conical images.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 19:39:01 +0000 (14:39 -0500)]
Move get_scanline_32/64 to the bits part of the image struct

At this point these functions are basically a cache that the bits
image uses for its fetchers, so they can be moved to the bits image.

With the scanline getters only being initialized in the bits image,
the _pixman_image_get_scanline_generic_64 can be moved to
pixman-bits-image.c. That gets rid of the final user of
_pixman_image_get_scanline_32/64, so these can be deleted.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 15:53:02 +0000 (10:53 -0500)]
Use an iterator in pixman_image_get_solid()

This is a step towards getting rid of the
_pixman_image_get_scanline_32/64() functions.

Søren Sandmann Pedersen [Fri, 10 Dec 2010 18:26:53 +0000 (13:26 -0500)]
Virtualize iterator initialization

Make src_iter_init() and dest_iter_init() virtual methods in the
implementation struct. This allows individual implementations to plug
in their own CPU specific scanline fetchers.
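
As a rough sketch of what such a virtual method could look like (the struct layout, enum values, and function names here are assumptions made for illustration, not pixman's real definitions), a CPU-specific implementation handles the formats it knows about and hands everything else to its delegate:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants and types, for illustration only. */
enum { FORMAT_OTHER, FORMAT_ACCELERATED };
enum { FETCHER_NONE, FETCHER_GENERAL, FETCHER_CPU_SPECIFIC };

typedef struct
{
    int format;    /* what kind of image is being iterated */
    int fetcher;   /* which scanline fetcher was plugged in */
} sketch_iter_t;

typedef struct sketch_impl sketch_impl_t;
struct sketch_impl
{
    sketch_impl_t *delegate;   /* next, more general implementation */
    void (*src_iter_init) (sketch_impl_t *imp, sketch_iter_t *iter);
};

/* The general implementation accepts every image. */
static void
general_src_iter_init (sketch_impl_t *imp, sketch_iter_t *iter)
{
    (void) imp;
    iter->fetcher = FETCHER_GENERAL;
}

/* A CPU-specific implementation plugs in its own fetcher when it can,
 * and otherwise calls its delegate's method directly.
 */
static void
cpu_src_iter_init (sketch_impl_t *imp, sketch_iter_t *iter)
{
    if (iter->format == FORMAT_ACCELERATED)
    {
        iter->fetcher = FETCHER_CPU_SPECIFIC;
        return;
    }

    assert (imp->delegate != NULL);
    imp->delegate->src_iter_init (imp->delegate, iter);
}
```

The caller only ever invokes `src_iter_init` on the topmost implementation; which fetcher ends up in the iterator depends on how far down the delegate chain the request travels.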

13 years ago Move iterator initialization to the respective image files
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:40:26 +0000 (12:40 -0500)]
Move iterator initialization to the respective image files

Instead of calling _pixman_image_get_scanline_32/64(), move the
iterator initialization into the respective image implementations and
call the scanline generators directly.

13 years ago Eliminate the _pixman_image_store_scanline_32/64 functions
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:31:29 +0000 (12:31 -0500)]
Eliminate the _pixman_image_store_scanline_32/64 functions

They were only called from next_line_write_narrow/wide, so they could
simply be absorbed into those functions.

13 years ago Move initialization of iterators for bits images to pixman-bits-image.c
Søren Sandmann Pedersen [Fri, 10 Dec 2010 17:19:50 +0000 (12:19 -0500)]
Move initialization of iterators for bits images to pixman-bits-image.c

pixman_iter_t is now defined in pixman-private.h, and iterators for
bits images are being initialized in pixman-bits-image.c

13 years ago Add iterators in the general implementation
Søren Sandmann Pedersen [Fri, 10 Dec 2010 16:30:27 +0000 (11:30 -0500)]
Add iterators in the general implementation

We add a new structure called a pixman_iter_t that encapsulates the
information required to read scanlines from an image. It contains two
functions, get_scanline() and write_back(). The get_scanline()
function will generate pixels for the current scanline. For iterators
for source images, it will also advance to the next scanline. The
write_back() function is only called for destination images. Its
function is to write back the modified pixels to the image and then
advance to the next scanline.

When an iterator is initialized, it is passed this information:

   - The image to iterate

   - The rectangle to be iterated

   - A buffer that the iterator may (but is not required to) use. This
     buffer is guaranteed to have space for at least width pixels.

   - A flag indicating whether a8r8g8b8 or a16r16g16b16 pixels should
     be fetched

There are a number of (eventual) benefits to the iterators:

   - The initialization of the iterator can be virtualized such that
     implementations can plug in their own CPU specific get_scanline()
     and write_back() functions.

   - If an image is horizontal, it can simply plug in an appropriate
     get_scanline(). This way we can get rid of the annoying
     classify() virtual function.

   - In general, iterators can remember what they did on the last
     scanline, so for example a REPEAT_NONE image might reuse the same
     data for all the empty scanlines generated by the zero-extension.

   - More detailed information can be passed to the iterator, allowing
     more specialized fetchers to be used.

   - We can fix the bug where destination filters and transformations
     are not currently being ignored as they should be.

However, this initial implementation is not optimized at all. We lose
several existing optimizations:

   - The ability to composite directly in the destination
   - The ability to only fetch one scanline for horizontal images
   - The ability to avoid fetching the src and mask for the CLEAR
     operator

Later patches will re-introduce these optimizations.
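
The get_scanline()/write_back() contract described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real pixman_iter_t: the `demo_*` types and functions are hypothetical, the narrow/wide pixel flag is left out, and a bitwise OR stands in for a real combiner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative types only; pixman's real structures differ. */
typedef struct
{
    uint32_t *bits;
    int       stride;   /* in pixels */
} demo_image_t;

typedef struct demo_iter demo_iter_t;
struct demo_iter
{
    demo_image_t *image;
    int           x, y, width;   /* left edge, current line, line width */
    uint32_t     *buffer;        /* room for at least `width` pixels */
    uint32_t   *(*get_scanline) (demo_iter_t *iter);
    void        (*write_back)   (demo_iter_t *iter);
};

static uint32_t *
current_line (demo_iter_t *iter)
{
    return iter->image->bits + iter->y * iter->image->stride + iter->x;
}

/* Source iterators fetch the pixels and advance to the next scanline. */
static uint32_t *
src_get_scanline (demo_iter_t *iter)
{
    uint32_t *line = current_line (iter);
    int i;

    for (i = 0; i < iter->width; i++)
        iter->buffer[i] = line[i];
    iter->y++;
    return iter->buffer;
}

/* Destination iterators fetch without advancing ... */
static uint32_t *
dest_get_scanline (demo_iter_t *iter)
{
    uint32_t *line = current_line (iter);
    int i;

    for (i = 0; i < iter->width; i++)
        iter->buffer[i] = line[i];
    return iter->buffer;
}

/* ... because write_back() stores the modified pixels and then advances. */
static void
dest_write_back (demo_iter_t *iter)
{
    uint32_t *line = current_line (iter);
    int i;

    for (i = 0; i < iter->width; i++)
        line[i] = iter->buffer[i];
    iter->y++;
}

static void
demo_iter_init (demo_iter_t *iter, demo_image_t *image,
                int x, int y, int width, uint32_t *buffer, int is_dest)
{
    assert (image != NULL && buffer != NULL && width > 0);

    iter->image = image;
    iter->x = x;
    iter->y = y;
    iter->width = width;
    iter->buffer = buffer;
    iter->get_scanline = is_dest ? dest_get_scanline : src_get_scanline;
    iter->write_back = is_dest ? dest_write_back : NULL;
}

/* Stand-in compositing loop; OR replaces a real combiner. */
static void
demo_composite (demo_iter_t *src, demo_iter_t *dest, int height)
{
    int i, j;

    for (i = 0; i < height; i++)
    {
        uint32_t *s = src->get_scanline (src);
        uint32_t *d = dest->get_scanline (dest);

        for (j = 0; j < dest->width; j++)
            d[j] |= s[j];
        dest->write_back (dest);
    }
}
```

Because the loop only talks to the two function pointers, a specialized fetcher can be substituted at init time without touching the loop itself.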

13 years ago ARM: do /proc/self/auxv based cpu features detection only in linux
Siarhei Siamashka [Tue, 11 Jan 2011 12:36:24 +0000 (14:36 +0200)]
ARM: do /proc/self/auxv based cpu features detection only in linux

This method is Linux-specific, but earlier it was tried on any platform
that did not have the _MSC_VER macro defined.

13 years ago A new configure option --enable-static-testprogs
Siarhei Siamashka [Mon, 13 Sep 2010 01:21:33 +0000 (04:21 +0300)]
A new configure option --enable-static-testprogs

This option can be used for building fully static binaries of the test
programs so that they can be easily run using qemu-user. With binfmt-misc
configured, 'make check' works fine for cross-compiled pixman builds.

13 years ago Make 'fast_composite_scaled_nearest_*' less suspicious
Siarhei Siamashka [Mon, 10 Jan 2011 16:29:33 +0000 (18:29 +0200)]
Make 'fast_composite_scaled_nearest_*' less suspicious

Taking the address of a variable and then using it as an array looks suspicious
to static code analyzers. So change it into an array with 1 element to make
them happy. Both old and new variants of this code are correct because the 'vx'
and 'unit_x' arguments are set to 0, which means that the called scanline
function can only access a single element of the 'zero' buffer.
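
The pattern can be sketched as below. This is a simplified illustration; the function names and signature are hypothetical and much smaller than pixman's real scanline functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nearest-neighbour scanline function: `vx` is a 16.16
 * fixed point source position and `unit_x` the step per output pixel.
 */
static void
sketch_scaled_nearest_scanline (uint32_t *dst, const uint32_t *src, int w,
                                int32_t vx, int32_t unit_x)
{
    assert (vx >= 0 && unit_x >= 0);

    while (w-- > 0)
    {
        *dst++ = src[vx >> 16];
        vx += unit_x;
    }
}

static void
sketch_fill_with_zero (uint32_t *dst, int w)
{
    /* A one-element array rather than `&some_scalar`: same memory
     * layout, but it states the intent to index it as an array.
     */
    static const uint32_t zero[1] = { 0 };

    /* With vx == 0 and unit_x == 0 only zero[0] is ever read. */
    sketch_scaled_nearest_scanline (dst, zero, w, 0, 0);
}
```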

13 years ago Bugfix for a corner case in 'pixman_transform_is_inverse'
Siarhei Siamashka [Mon, 10 Jan 2011 16:09:16 +0000 (18:09 +0200)]
Bugfix for a corner case in 'pixman_transform_is_inverse'

When 'pixman_transform_multiply' fails, the result of the multiplication
cannot have been the identity matrix (at least one value in the resulting
matrix can't be represented as a 16.16 fixed point value), so it is safe
to return FALSE.
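
A self-contained sketch of that reasoning follows. The `sketch_*` names are hypothetical stand-ins, and the exact identity comparison and the rounding of partial products are simplifying assumptions; pixman's own functions may differ in both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t;              /* 16.16 fixed point */
#define SKETCH_FIXED_1 ((sketch_fixed_t) 0x10000)

typedef struct
{
    sketch_fixed_t m[3][3];
} sketch_transform_t;

/* Returns false if any element of l * r does not fit in 16.16. */
static bool
sketch_transform_multiply (sketch_transform_t       *dst,
                           const sketch_transform_t *l,
                           const sketch_transform_t *r)
{
    sketch_transform_t d;
    int dy, dx, o;

    assert (dst != NULL && l != NULL && r != NULL);

    for (dy = 0; dy < 3; dy++)
    {
        for (dx = 0; dx < 3; dx++)
        {
            int64_t v = 0;

            for (o = 0; o < 3; o++)
                v += ((int64_t) l->m[dy][o] * r->m[o][dx]) / 65536;

            if (v > INT32_MAX || v < INT32_MIN)
                return false;
            d.m[dy][dx] = (sketch_fixed_t) v;
        }
    }

    *dst = d;
    return true;
}

static bool
sketch_transform_is_identity (const sketch_transform_t *t)
{
    int y, x;

    for (y = 0; y < 3; y++)
        for (x = 0; x < 3; x++)
            if (t->m[y][x] != (x == y ? SKETCH_FIXED_1 : 0))
                return false;
    return true;
}

static bool
sketch_transform_is_inverse (const sketch_transform_t *a,
                             const sketch_transform_t *b)
{
    sketch_transform_t t;

    /* An unrepresentable product cannot be the identity matrix. */
    if (!sketch_transform_multiply (&t, a, b))
        return false;

    return sketch_transform_is_identity (&t);
}
```

For example, a scale by 2 and a scale by 0.5 are reported as inverses, while two very large scales whose product overflows 16.16 are reported as not inverse instead of being compared against an uninitialized result.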

13 years ago Workaround for a preprocessor issue in old Sun Studio
Siarhei Siamashka [Tue, 4 Jan 2011 11:42:29 +0000 (13:42 +0200)]
Workaround for a preprocessor issue in old Sun Studio

Patch from Peter O'Gorman with some modifications

https://bugs.freedesktop.org//show_bug.cgi?id=32764

13 years ago Fix for "syntax error: empty declaration" Solaris Studio warnings
Siarhei Siamashka [Tue, 4 Jan 2011 06:41:02 +0000 (08:41 +0200)]
Fix for "syntax error: empty declaration" Solaris Studio warnings

13 years ago Revert "Fix "syntax error: empty declaration" warnings."
Siarhei Siamashka [Tue, 4 Jan 2011 06:18:38 +0000 (08:18 +0200)]
Revert "Fix "syntax error: empty declaration" warnings."

This reverts commit b924bb1f8191cc7c386d8211d9822aeeaadcab44.

There is a better fix for these Solaris Studio warnings.

13 years ago Improve handling of tangent circles
Andrea Canciani [Tue, 23 Nov 2010 10:37:54 +0000 (11:37 +0100)]
Improve handling of tangent circles

When b is 0, avoid the division by zero and just return transparent
black.

When the solution t would have an invalid radius (negative or outside
[0,1] for non-extended gradients), return transparent black.
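
A hedged sketch of the degenerate case, assuming the gradient parameter t solves a*t^2 - 2*b*t + c = 0 and that tangent circles correspond to a == 0, so that t = c / (2*b). The function name, the parameters, and the exact form of the validity checks are illustrative assumptions rather than pixman's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores t when the pixel takes a gradient color;
 * returns false when the pixel should be transparent black.
 *   r1, dr   : start radius and radius delta (radius at t is r1 + t * dr)
 *   extended : whether the gradient extends beyond t in [0, 1]
 */
static bool
sketch_solve_tangent (double b, double c, double r1, double dr,
                      bool extended, double *t_out)
{
    double t;

    assert (t_out != NULL);

    if (b == 0.0)
        return false;               /* avoid dividing by zero */

    t = 0.5 * c / b;

    if (r1 + t * dr < 0.0)
        return false;               /* negative radius */

    if (!extended && (t < 0.0 || t > 1.0))
        return false;               /* outside the non-extended range */

    *t_out = t;
    return true;
}
```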

13 years ago sse2: Skip src pixels that are zero in sse2_composite_over_8888_n_8888()
Søren Sandmann Pedersen [Mon, 20 Dec 2010 21:11:48 +0000 (16:11 -0500)]
sse2: Skip src pixels that are zero in sse2_composite_over_8888_n_8888()

This is a big speed-up in the SVG helicopter game:

   http://ie.microsoft.com/testdrive/Performance/Helicopter/Default.xhtml

when rendered by Firefox 4 since it is compositing big images
consisting almost entirely of zeros.
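
The shortcut can be illustrated with a scalar sketch (the real change is in SSE2 code; this plain C version and all of its names are assumptions for illustration). With premultiplied alpha, an all-zero source pixel leaves the destination unchanged under OVER, so the per-pixel arithmetic can be skipped entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded multiplication of two 8-bit values treated as fractions of 255. */
static uint32_t
sketch_mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;

    return (t + (t >> 8)) >> 8;
}

/* (src IN mask_alpha) OVER dst for one premultiplied a8r8g8b8 pixel. */
static uint32_t
sketch_over_with_mask (uint32_t s, uint32_t mask_alpha, uint32_t d)
{
    uint32_t sa = sketch_mul_un8 (s >> 24, mask_alpha);
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sc = sketch_mul_un8 ((s >> shift) & 0xff, mask_alpha);
        uint32_t dc = sketch_mul_un8 ((d >> shift) & 0xff, 255 - sa);
        uint32_t c = sc + dc;

        if (c > 255)
            c = 255;
        result |= c << shift;
    }
    return result;
}

/* Returns the number of pixels that actually needed blending. */
static int
sketch_over_8888_n_8888 (uint32_t *dst, const uint32_t *src,
                         uint32_t mask_alpha, int width)
{
    int blended = 0;
    int i;

    assert (width >= 0 && mask_alpha <= 255);

    for (i = 0; i < width; i++)
    {
        if (src[i] == 0)
            continue;               /* zero source: dst stays as it is */

        dst[i] = sketch_over_with_mask (src[i], mask_alpha, dst[i]);
        blended++;
    }
    return blended;
}
```

For an image consisting almost entirely of zeros, nearly every iteration takes the `continue` branch, which is where the saving comes from.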

13 years ago Fix divide-by-zero in set_lum().
Søren Sandmann Pedersen [Sat, 18 Dec 2010 11:06:39 +0000 (06:06 -0500)]
Fix divide-by-zero in set_lum().

When (l - min) or (max - l) is zero, simply set all the channels to
the limit, 0 in the case of (l - min), and a in the case of (max - l).
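
A sketch of the guarded clipping step for a single channel, assuming the usual clip-color shape where a channel is pulled toward the luminosity l. The function name and signature are hypothetical; l, min, and max are taken as inputs so that the two guards are easy to see.

```c
#include <assert.h>

/* Clip one channel value c into [0, a].
 *   l        : luminosity of the color
 *   min, max : smallest and largest channel of the color
 *   a        : alpha, the upper limit for premultiplied channels
 */
static double
sketch_clip_channel (double c, double l, double min, double max, double a)
{
    assert (a >= 0.0);

    if (min < 0.0)
    {
        if (l - min == 0.0)
            c = 0.0;                               /* lower limit */
        else
            c = l + (c - l) * l / (l - min);
    }

    if (max > a)
    {
        if (max - l == 0.0)
            c = a;                                 /* upper limit */
        else
            c = l + (c - l) * (a - l) / (max - l);
    }

    return c;
}
```

Without the two equality checks, a color whose channels are all equal and out of range would divide by zero, which is the case a test with floating point exceptions enabled would trip over.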

13 years ago Add a test compositing with the various PDF operators.
Søren Sandmann Pedersen [Sat, 18 Dec 2010 11:05:52 +0000 (06:05 -0500)]
Add a test compositing with the various PDF operators.

The test has floating point exceptions enabled, and currently fails
with a divide-by-zero.