platform/upstream/pixman.git
5 years ago meson: store ARM SIMD and NEON tests as text files
Dylan Baker [Mon, 25 Mar 2019 23:28:06 +0000 (16:28 -0700)]
meson: store ARM SIMD and NEON tests as text files

This is unfortunately required to make the tests work correctly, as
otherwise meson assumes that the files are C code rather than assembly.
I've opened https://github.com/mesonbuild/meson/issues/5151 to discuss
fixing the issue in meson upstream.

Fixes #29

5 years ago meson: simplify and fix mmx library compilation
Dylan Baker [Mon, 25 Mar 2019 23:10:11 +0000 (16:10 -0700)]
meson: simplify and fix mmx library compilation

This simplifies the logic and fixes the loongson-mmi implementation to
build correctly.

5 years ago meson: Add proper include paths for the loongson check
Dylan Baker [Mon, 25 Mar 2019 23:05:33 +0000 (16:05 -0700)]
meson: Add proper include paths for the loongson check

5 years ago meson: fix copy-n-paste error for arm simd assembly
Dylan Baker [Mon, 25 Mar 2019 22:39:18 +0000 (15:39 -0700)]
meson: fix copy-n-paste error for arm simd assembly

mentioned in #29

5 years ago meson: fix typo which breaks loongson checks
Dylan Baker [Mon, 25 Mar 2019 22:24:12 +0000 (15:24 -0700)]
meson: fix typo which breaks loongson checks

mach -> march

5 years ago meson: work around meson issue #5115
Dylan Baker [Mon, 25 Mar 2019 22:22:13 +0000 (15:22 -0700)]
meson: work around meson issue #5115

This issue causes openmp arguments to be injected into compilers that
can support openmp, even if they don't. It will be fixed in 0.51 (the
code has already landed in mesonbuild#5116); for older versions, let's
work around the issue.

5 years ago Bump version to 0.38.0 pixman-0.38.0
Maarten Lankhorst [Mon, 11 Feb 2019 12:25:14 +0000 (13:25 +0100)]
Bump version to 0.38.0

And update RELEASING for the new meson build system.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
5 years ago pixman: Use maximum precision for pixman-bits-image, v2.
Maarten Lankhorst [Thu, 6 Dec 2018 14:42:26 +0000 (15:42 +0100)]
pixman: Use maximum precision for pixman-bits-image, v2.

pixman-bits-image's wide helpers first obtain the 8-bit image,
then convert it to float. This destroys all the precision that
the wide path was offering.

Fix this by making get_pixel() take a pointer instead of returning
a value. Floating point will fill in an argb_t, while the 8-bit path
will fill a 32-bit ARGB value. This also requires writing a floating
point bilinear interpolator. With this change pixman can use the full
floating point precision internally in all paths.

Changes since v1:
- Make accum and reduce an argument to convolution functions,
  to remove duplication.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Basile Clement <basile-pixman@clement.pm>
5 years ago Implement floating point gradient computation, v2.
Basile Clement [Mon, 3 Dec 2018 14:55:28 +0000 (15:55 +0100)]
Implement floating point gradient computation, v2.

This patch modifies the gradient walker to be able to generate floating
point values directly in addition to a8r8g8b8 32 bit values.  This is
then used by the various gradient implementations to render in floating
point when asked to do so, instead of rendering to a8r8g8b8 and then
expanding to floating point as they were doing previously.

Changes since v1 (mlankhorst):
- Implement pixman_gradient_walker_pixel_32 without calling
  pixman_gradient_walker_pixel_float, to prevent performance degradation.
  Suggested by Adam Jackson.
- Fix whitespace errors.
- Remove unnecessary function prototypes in pixman-private.h

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[mlankhorst: Add comment about pixman_contract_from_float,
             based on Basile's suggestion]
Acked-by: Basile Clement <basile-pixman@clement.pm>
5 years ago build: Add meson files to EXTRA_DIST
Dylan Baker [Thu, 29 Nov 2018 21:48:22 +0000 (13:48 -0800)]
build: Add meson files to EXTRA_DIST

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago editorconfig: use tabs for Makefiles
Dylan Baker [Thu, 29 Nov 2018 21:48:12 +0000 (13:48 -0800)]
editorconfig: use tabs for Makefiles

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago Merge remote-tracking branch 'origin/master'
Maarten Lankhorst [Fri, 7 Dec 2018 13:18:00 +0000 (14:18 +0100)]
Merge remote-tracking branch 'origin/master'

And bump meson version to 37.1 as well. Seems my push to upstream failed.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
5 years ago Post release version bump to 37.1
Maarten Lankhorst [Fri, 7 Dec 2018 12:44:38 +0000 (13:44 +0100)]
Post release version bump to 37.1

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
5 years ago gitlab-ci: Add meson build to pipeline test
Dylan Baker [Fri, 31 Aug 2018 20:05:02 +0000 (13:05 -0700)]
gitlab-ci: Add meson build to pipeline test

5 years ago meson: Add a meson build system
Dylan Baker [Thu, 30 Aug 2018 22:07:51 +0000 (15:07 -0700)]
meson: Add a meson build system

This commit adds a meson build system for pixman. It carries the usual
improvements of meson: better clean build times and much better
incremental build times, while being simpler and easier to understand.

This takes advantage of some features from the most recent versions of
meson: the builtin openmp dependency and the feature option type.

There are a couple of things that I've done a bit differently than the
autotools build system: I've built a libdemos, which is the utilities
from the demos folder, and I've linked the demos with libtestutils from
tests. Otherwise I expect that most things will be the same.

I've tested so far cross compiling from x86_64 -> x86, x86_64 ->
Aarch64, and Linux to Windows via mingw, as well as native x86_64 Linux
builds, which all work. I've also built with mingw natively; there are
some test failures there. An MSVC build can be generated, but fails.

v2: - set WORDS_BIGENDIAN in the config for big endian systems.

5 years ago Add .editorconfig file
Dylan Baker [Wed, 29 Aug 2018 23:14:54 +0000 (16:14 -0700)]
Add .editorconfig file

This sets the style for meson (which uses the upstream style, 2 space
indent with no tabs), and sets the tab_width to 8 per the CODING_STYLE
document.

5 years ago Bump version to 0.36.0 pixman-0.36.0
Maarten Lankhorst [Wed, 21 Nov 2018 11:40:21 +0000 (12:40 +0100)]
Bump version to 0.36.0

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
5 years ago pixman: Update git repository to the one at gitlab.
Maarten Lankhorst [Wed, 21 Nov 2018 11:39:11 +0000 (12:39 +0100)]
pixman: Update git repository to the one at gitlab.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
5 years ago pixman: Add tests for (a)rgb floating point formats.
Maarten Lankhorst [Wed, 11 Jul 2018 10:10:41 +0000 (12:10 +0200)]
pixman: Add tests for (a)rgb floating point formats.

Add some basic tests to ensure that the newly added formats work as
intended.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
5 years ago pixman: Add support for argb/xrgb float formats, v5.
Maarten Lankhorst [Wed, 30 May 2018 14:07:10 +0000 (16:07 +0200)]
pixman: Add support for argb/xrgb float formats, v5.

Pixman is already using the floating point formats internally; expose
this capability in case someone wants to support formats with more
bits per component.

This is useful for igt which depends on cairo to do the rendering.
It can use it to convert floats internally to planar Y'CbCr formats,
or to F16.

We add a new type PIXMAN_TYPE_RGBA_FLOAT for this format, which is an
all float array of R, G, B, and A. Formats that use mixed float/int
RGBA aren't supported, and will probably need their own type.
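
The memory layouts described in this message (128 bits for RGBA, 96 bits
for RGB, all components float) can be pictured with the sketch below. The
type and function names are hypothetical and are not pixman's own; the
stride assertion only mirrors the idea of the alignment asserts mentioned
in the v2 notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration types, not part of pixman's API. */
typedef struct { float r, g, b, a; } rgba_float_pixel;  /* 4 x 32 = 128 bits */
typedef struct { float r, g, b; }    rgb_float_pixel;   /* 3 x 32 =  96 bits */

/* Byte offset of pixel (x, y) in an RGBA float image. The stride must
 * keep every row aligned to a float boundary. */
static size_t rgba_float_offset (size_t stride_bytes, size_t x, size_t y)
{
    assert (stride_bytes % sizeof (float) == 0);
    return y * stride_bytes + x * sizeof (rgba_float_pixel);
}
```

With 32-bit floats, one RGBA pixel takes 16 bytes and one RGB pixel takes
12 bytes, so rows of either layout stay float-aligned as long as the
stride is a multiple of 4.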

Changes since v1:
- Use RGBA 128 bits and RGB 96 bits memory layouts, to better match the opengl format.
Changes since v2:
- Add asserts in accessor and for strides to force alignment.
- Move test changes to their own commit.
Changes since v3:
- Define 32bpc as PIXMAN_FORMAT_PACKED_C32
- Rename pixman accessors from rgb*_float_float to rgb*f_float
Changes since v4:
- Create a new PIXMAN_FORMAT_BYTE for fitting up to 64 bits per component.
  (based on Siarhei Siamashka's suggestion)
- Use new format type PIXMAN_TYPE_RGBA_FLOAT

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v4
[mlankhorst: Fix missing braces in PIXMAN_FORMAT_RESHIFT macro]

6 years ago test: Fix stride calculation in stress-test
Siarhei Siamashka [Tue, 12 Jun 2018 14:38:57 +0000 (17:38 +0300)]
test: Fix stride calculation in stress-test

Currently the number of bits per pixel is used instead of the
number of bytes per pixel when calculating image strides. This
does not cause any real problems, but the gaps between scanlines
are excessively large.

This patch actually converts bits to bytes and rounds up the result
to the nearest byte boundary.
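
A minimal sketch of the bits-to-bytes conversion described above. The
function name is hypothetical, and the real test may apply additional
alignment on top of this.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest number of bytes that holds one scanline of width_pixels
 * pixels at bits_per_pixel bits each, rounded up to a whole byte. */
static size_t min_stride_bytes (size_t width_pixels, size_t bits_per_pixel)
{
    assert (bits_per_pixel > 0);
    return (width_pixels * bits_per_pixel + 7) / 8;
}
```

For example, five 4-bit pixels occupy 20 bits and therefore need 3 bytes,
whereas using the bit count directly as a byte count would reserve 20.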

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: soren.sandmann@gmail.com
6 years ago test: Adjust for clang's removal of __builtin_shuffle
Vladimir Smirnov [Mon, 4 Jun 2018 17:04:15 +0000 (10:04 -0700)]
test: Adjust for clang's removal of __builtin_shuffle

__builtin_shuffle was removed in clang 5.0.

Build log says:
test/utils-prng.c:207:27: error: use of unknown builtin '__builtin_shuffle' [-Wimplicit-function-declaration]
            randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
                          ^
test/utils-prng.c:207:25: error: assigning to 'uint8x16' (vector of 16 'uint8_t' values) from incompatible type 'int'
            randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
                        ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated

Link to original discussion:
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055140.html

It's possible to build pixman if the attached patch is applied. Basically,
the patch adds a check for __builtin_shuffle support and, in case there is
none, falls back to the clang-specific __builtin_shufflevector, which does
the same but has a different API.
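
The difference between the two builtins can be sketched as follows. This
is an illustration only, not the code from the patch: it selects the
builtin with __has_builtin at compile time (the patch describes a check
for __builtin_shuffle instead), and the helper names are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t uint8x16 __attribute__ ((vector_size (16)));

#ifndef __has_builtin
#define __has_builtin(x) 0   /* older compilers: assume the GCC builtin */
#endif

/* Reverse the bytes inside each 32-bit lane of a 16-byte vector. */
static uint8x16 bswap32_lanes (uint8x16 v)
{
#if __has_builtin (__builtin_shufflevector)
    /* Indices are compile-time constants passed as arguments. */
    return __builtin_shufflevector (v, v,
                                    3, 2, 1, 0, 7, 6, 5, 4,
                                    11, 10, 9, 8, 15, 14, 13, 12);
#else
    /* Indices are passed as a mask vector. */
    const uint8x16 mask = { 3, 2, 1, 0, 7, 6, 5, 4,
                            11, 10, 9, 8, 15, 14, 13, 12 };
    return __builtin_shuffle (v, mask);
#endif
}

/* Hypothetical helper: byte-swap one 32-bit word through the vector path. */
static uint32_t bswap32_via_vector (uint32_t x)
{
    uint8x16 v;
    uint32_t words[4] = { x, x, x, x };

    assert (sizeof (v) == sizeof (words));
    memcpy (&v, words, sizeof (v));
    v = bswap32_lanes (v);
    memcpy (words, &v, sizeof (words));
    return words[0];
}
```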

Bugzilla: https://bugs.gentoo.org/646360
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104886
Tested-by: Philip Chimento <philip.chimento@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago Merge branch 'ci' into 'master'
Adam Jackson [Tue, 5 Jun 2018 16:33:50 +0000 (16:33 +0000)]
Merge branch 'ci' into 'master'

ci: Add .gitlab-ci.yml

See merge request pixman/pixman!1

6 years ago ci: Add .gitlab-ci.yml
Adam Jackson [Thu, 31 May 2018 16:32:18 +0000 (12:32 -0400)]
ci: Add .gitlab-ci.yml

Just builds on Fedora 28 for x86_64 at the moment, but it's a start.
Credit to Daniel Stone for eliminating the nested docker image.

Signed-off-by: Adam Jackson <ajax@redhat.com>
6 years ago vmx: Fix vector loads on ppc64le
Dan Horák [Thu, 10 May 2018 14:47:09 +0000 (10:47 -0400)]
vmx: Fix vector loads on ppc64le

Use vector intrinsic for loading possibly unaligned data instead of a
typecast.

Bugzilla: https://bugzilla.redhat.com/1572540
Signed-off-by: Dan Horák <dan@danny.cz>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
6 years ago Promote unsigned short to unsigned int explicitly
Behdad Esfahbod [Tue, 9 Jan 2018 09:26:29 +0000 (10:26 +0100)]
Promote unsigned short to unsigned int explicitly

...to avoid default promotion to signed int, which causes undefined
behaviour in the shift expression.
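
A small sketch of the pattern, assuming a 32-bit int; the function name
is hypothetical and the expression is not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 16-bit halves into one 32-bit value. Writing hi << 16
 * directly would first promote hi to signed int; for hi >= 0x8000 the
 * shifted result does not fit in a 32-bit int, which is undefined.
 * Casting to unsigned int first keeps the shift well defined. */
static uint32_t pack_hi_lo (unsigned short hi, unsigned short lo)
{
    assert (sizeof (unsigned int) >= 4);
    return ((unsigned int) hi << 16) | lo;
}
```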

7 years ago Revert "demos/scale: Added pulldown to choose PIXMAN_FILTER_* value"
Søren Sandmann Pedersen [Sat, 3 Sep 2016 19:09:12 +0000 (15:09 -0400)]
Revert "demos/scale: Added pulldown to choose PIXMAN_FILTER_* value"

This reverts commit 375f5ec5c5d2a6cc3586f57e36fdf08a3d0ac4e4.

This patch was accidentally pushed.

7 years ago pixman-filter: Made Gaussian a bit wider
Bill Spitzak [Wed, 31 Aug 2016 05:03:15 +0000 (22:03 -0700)]
pixman-filter: Made Gaussian a bit wider

Expanded the size slightly (from ~4.25 to 5) to make the cutoff less
noticeable. Previously the value at the cutoff was
gaussian_filter(sqrt(2)*3/2) = 0.00626, which is larger than the
difference between 8-bit pixels (1/255 = 0.003921). The new cutoff is
gaussian_filter(2.5) = 0.001089, which is smaller.

v11: added some math to commit message
v14: left SIGMA in there
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago pixman-filter: Nested polynomial for cubic
Bill Spitzak [Wed, 31 Aug 2016 05:03:14 +0000 (22:03 -0700)]
pixman-filter: Nested polynomial for cubic

v11: Restored range checks

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
7 years ago pixman-filter: Fix several issues related to normalization
Søren Sandmann Pedersen [Sat, 9 Apr 2016 02:32:30 +0000 (22:32 -0400)]
pixman-filter: Fix several issues related to normalization

There are a few bugs in the current normalization code:

(1) The normalization is based on the sum of the *floating point*
    values generated by integral(). But in order to get the sum to be
    close to pixman_fixed_1, the sum of the rounded fixed point values
    should be used.

(2) The multiplications in the normalization loops often round the
    same way, so the residual error can be fairly large.

(3) The residual error is added to the sample located at index
    (width - width / 2), which is not the midpoint for odd widths (and
    for width 1 is in fact outside the array).

This patch fixes these issues by (1) using the sum of the fixed point
values as the total to divide by, (2) doing error diffusion in the
normalization loop, and (3) putting any residual error (which is now
guaranteed to be less than pixman_fixed_e) at the first sample, which
is the only one that didn't get any error diffused into it.
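
The three steps can be sketched as below. This is an illustration under
assumptions rather than the code from pixman-filter.c: the names are
hypothetical, weights are taken as plain doubles, and rounding uses a
small local helper instead of floor().

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_1 0x10000   /* 1.0 in 16.16 fixed point, like pixman_fixed_1 */

/* Round to the nearest integer without depending on libm. */
static int32_t round_nearest (double v)
{
    return (int32_t) (v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Convert n weights to 16.16 fixed point values that sum to exactly
 * FIXED_1. Returns the final sum of out[] so callers can verify it. */
static int32_t normalize_fixed (const double *w, int32_t *out, int n)
{
    double total = 0.0, scale, err = 0.0;
    int32_t sum = 0;
    int i;

    assert (n > 0);

    /* (1) Round each weight to fixed point; total the rounded values. */
    for (i = 0; i < n; i++)
    {
        out[i] = round_nearest (w[i] * FIXED_1);
        total += out[i];
    }
    assert (total != 0.0);
    scale = FIXED_1 / total;

    /* (2) Rescale, carrying each sample's rounding error into the next. */
    for (i = 0; i < n; i++)
    {
        double v = out[i] * scale + err;

        out[i] = round_nearest (v);
        err = v - out[i];
        sum += out[i];
    }

    /* (3) Whatever is still missing goes to the first sample. */
    out[0] += FIXED_1 - sum;

    sum = 0;
    for (i = 0; i < n; i++)
        sum += out[i];
    return sum;
}
```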

Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago pixman-filter: Speed up BOX/BOX filter
Søren Sandmann Pedersen [Wed, 31 Aug 2016 05:03:12 +0000 (22:03 -0700)]
pixman-filter: Speed up BOX/BOX filter

The convolution of two BOX filters is simply the length of the
interval where both are non-zero, so we can simply return width from
the integral() function because the integration region has already
been restricted to be such that both functions are non-zero on it.

This is both faster and more accurate than doing numerical integration.
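
To illustrate the claim, the sketch below compares the exact overlap
length of two unit-wide boxes with a midpoint-rule estimate. All names
are hypothetical and the setup is simplified (unit widths, no scale
factor); it is not the integral() function from pixman-filter.c.

```c
#include <assert.h>

/* Unit box: 1 inside (-0.5, 0.5), 0 elsewhere. */
static double box (double x)
{
    return (x > -0.5 && x < 0.5) ? 1.0 : 0.0;
}

/* Exact integral of box(x) * box(x - d): the length of the interval
 * on which both boxes are non-zero. */
static double box_overlap_exact (double d)
{
    double lo = (d > 0.0 ? d : 0.0) - 0.5;
    double hi = (d < 0.0 ? d : 0.0) + 0.5;

    return hi > lo ? hi - lo : 0.0;
}

/* The same integral estimated with n midpoint samples over [-1.5, 1.5]. */
static double box_overlap_numeric (double d, int n)
{
    double h, acc = 0.0;
    int i;

    assert (n > 0);
    h = 3.0 / n;
    for (i = 0; i < n; i++)
    {
        double x = -1.5 + (i + 0.5) * h;

        acc += box (x) * box (x - d);
    }
    return acc * h;
}
```

The exact form costs two comparisons and a subtraction, while the
numerical estimate needs many samples and is still only approximate at
the box edges.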

This patch is based on one by Bill Spitzak

    https://lists.freedesktop.org/archives/pixman/2016-March/004446.html

with these changes:

- Rebased to not assume any changes in the arguments to integral().

- Dropped the multiplication by scale

- Added more details in the commit message.

Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
7 years ago pixman-filter: integral splitting is only needed for triangle filter
Bill Spitzak [Wed, 31 Aug 2016 05:03:11 +0000 (22:03 -0700)]
pixman-filter: integral splitting is only needed for triangle filter

Only the triangle has a discontinuous derivative at 0. The other filters
resemble a cubic closely enough that Simpson's integration works without
splitting.

Changes by Søren: Rebase without the changes to the integral function,
update comment to match the new code.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago pixman-filter: Correct Simpsons integration
Bill Spitzak [Wed, 31 Aug 2016 05:03:10 +0000 (22:03 -0700)]
pixman-filter: Correct Simpsons integration

Simpson's rule fits a parabola through each group of 3 samples (which
also integrates cubics exactly). This puts the weights of the samples in
a pattern of 1,4,2,4,2...4,1, with the result then divided by 3.

The previous code was using weights of 1,2,0,6,0,6...,2,1.

With this fix the integration is accurate enough that the number of
samples could be reduced a lot. Multiples of 12 seem to work best.
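
The 1,4,2,4,...,4,1 weighting can be sketched as a generic composite
Simpson integrator. This is an illustration with hypothetical names, not
the code in pixman-filter.c; here n counts intervals, so n + 1 samples
are evaluated.

```c
#include <assert.h>

typedef double (*real_fn) (double);

/* Composite Simpson integration of f over [a, b] using n intervals
 * (n must be even). */
static double simpson (real_fn f, double a, double b, int n)
{
    double h, acc;
    int i;

    assert (n >= 2 && n % 2 == 0);
    h = (b - a) / n;
    acc = f (a) + f (b);                             /* end weights: 1 */
    for (i = 1; i < n; i++)
        acc += f (a + i * h) * ((i & 1) ? 4.0 : 2.0);  /* 4,2,4,...,4 */
    return acc * h / 3.0;
}

/* Example integrand: x^3, which Simpson's rule integrates exactly. */
static double cube (double x)
{
    return x * x * x;
}
```

Integrating cube over [0, 2] with 12 intervals gives 4 up to floating
point rounding, which is why a small sample count is enough for filters
that are close to cubic.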

v7: Merged with patch to reduce from 128 samples to 16
v9: Changed samples from 16 to 12
v10: Fixed rebase error that made it not compile
v11: minor whitespace change
v14: more whitespace changes

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago pixman-filter: reduce amount of malloc/free/memcpy to generate filter
Bill Spitzak [Wed, 31 Aug 2016 05:03:09 +0000 (22:03 -0700)]
pixman-filter: reduce amount of malloc/free/memcpy to generate filter

Rearranged so that the entire block of memory for the filter pair
is allocated first, and then filled in. Previous version allocated
and freed two temporary buffers for each filter and did an extra
memcpy.

v8: small refactor to remove the filter_width function

v10: Restored filter_width function but with arguments changed to
     match later patches

v11: Removed unused arg and pointer from filter_width function
     Whitespace fixes.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago pixman-image: Added enable-gnuplot config to view filters in gnuplot
Bill Spitzak [Wed, 31 Aug 2016 05:03:08 +0000 (22:03 -0700)]
pixman-image: Added enable-gnuplot config to view filters in gnuplot

If enable-gnuplot is configured, then you can pipe the output of a
pixman-using program to gnuplot and get a continuously-updated plot of
the horizontal filter. This works well with demos/scale to test the
filter generation.

The plot is all the different subposition filters shuffled
together. This is misleading in a few cases:

  IMPULSE.BOX - goes up and down as the subfilters have different
                numbers of non-zero samples

  IMPULSE.TRIANGLE - somewhat crooked for the same reason

  1-wide filters - looks triangular, but a 1-wide box would be more
                   accurate

Changes by Søren: Rewrote the pixman-filter.c part to
     - make it generate correct coordinates
     - add a comment on how coordinates are generated
     - in rounding.txt, add a ceil() variant of the first-sample
       formula
     - make the gnuplot output slightly prettier

v7: First time this ability was included

v8: Use config option
    Moved code to the filter generator
    Modified scale demo to not call filter generator a second time.

v10: Only print if successful generation of plots
     Use #ifdef, not #if

v11: small whitespace fixes
v12: output range from -width/2 to width/2 and include y==0, to avoid misleading plots
     for subsample_bits==0 and for box filters which may have no small values.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
7 years ago demos/scale: Added pulldown to choose PIXMAN_FILTER_* value
Bill Spitzak [Wed, 31 Aug 2016 05:03:07 +0000 (22:03 -0700)]
demos/scale: Added pulldown to choose PIXMAN_FILTER_* value

This is very useful for comparing the results of SEPARABLE_CONVOLUTION
with BILINEAR and NEAREST.

v14: Removed good/best items
v15: Skip filter generation so gnuplot output continues showing previous value

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
7 years ago demos/scale: Default to locked axis
Bill Spitzak [Wed, 31 Aug 2016 05:03:06 +0000 (22:03 -0700)]
demos/scale: Default to locked axis

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago demos/scale: fix blank subsamples spin box
Bill Spitzak [Wed, 31 Aug 2016 05:03:05 +0000 (22:03 -0700)]
demos/scale: fix blank subsamples spin box

It now shows the initial value of 4 when the demo is started

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
7 years ago demos/scale: Compute filter size using boundary of xformed ellipse
Bill Spitzak [Wed, 31 Aug 2016 05:03:04 +0000 (22:03 -0700)]
demos/scale: Compute filter size using boundary of xformed ellipse

Instead of using the boundary of xformed rectangle, use the boundary
of xformed ellipse. This is much more accurate and less blurry. In
particular the filtering does not change as the image is rotated.

Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Soren Sandmann <soren.sandmann@gmail.com>
7 years ago More general BILINEAR=>NEAREST reduction
Søren Sandmann Pedersen [Wed, 31 Aug 2016 05:03:03 +0000 (22:03 -0700)]
More general BILINEAR=>NEAREST reduction

Generalize and simplify the code that reduces BILINEAR to NEAREST so
that the reduction happens for all affine transformations where
t00...t12 are integers and (t00 + t01) and (t10 + t11) are both
odd. This is a sufficient condition for the resulting transformed
coordinates to be exactly at the center of a pixel so that BILINEAR
becomes identical to NEAREST.
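
The condition can be sketched as a predicate over 16.16 fixed point
matrix entries. The names below are hypothetical and this is not the
code from pixman; it only encodes the rule as stated above. Pixel
centers sit at half-integer coordinates, so with integer coefficients
the transformed x is an integer plus (t00 + t01) / 2, which is again a
pixel center exactly when the sum is odd.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;   /* stand-in for pixman_fixed_t */

/* True if v is a whole number in 16.16 fixed point. */
static int is_integer (int64_t v)
{
    return (v & 0xffff) == 0;
}

/* True if v is a whole number whose integer part is odd. */
static int is_odd_integer (int64_t v)
{
    return (v & 0x1ffff) == 0x10000;
}

/* Sufficient condition for BILINEAR to behave exactly like NEAREST
 * under the affine transform with rows (t00 t01 t02) and (t10 t11 t12). */
static int bilinear_reduces_to_nearest (fixed_16_16 t00, fixed_16_16 t01,
                                        fixed_16_16 t02, fixed_16_16 t10,
                                        fixed_16_16 t11, fixed_16_16 t12)
{
    return is_integer (t00) && is_integer (t01) && is_integer (t02) &&
           is_integer (t10) && is_integer (t11) && is_integer (t12) &&
           is_odd_integer ((int64_t) t00 + t01) &&
           is_odd_integer ((int64_t) t10 + t11);
}
```

The identity and a 90 degree rotation satisfy the predicate; a uniform
scale by 2 or a half-pixel translation does not.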

V2: Address some comments by Bill Spitzak

Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
7 years ago Add new test of filter reduction from BILINEAR to NEAREST
Søren Sandmann Pedersen [Wed, 31 Aug 2016 05:03:02 +0000 (22:03 -0700)]
Add new test of filter reduction from BILINEAR to NEAREST

This new test tests a bunch of bilinear downscalings, where many have
a transformation such that the BILINEAR filter can be reduced to
NEAREST (and many don't).

A CRC32 is computed for all the resulting images and compared to a
known-good value for both 4-bit and 7-bit interpolation.

V2: Remove leftover comment, some minor formatting fixes, use a
timestamp as the PRNG seed.

Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
7 years ago pixman-fast-path.c: Pick NEAREST affine fast paths before BILINEAR ones
Søren Sandmann Pedersen [Wed, 31 Aug 2016 05:03:01 +0000 (22:03 -0700)]
pixman-fast-path.c: Pick NEAREST affine fast paths before BILINEAR ones

When a BILINEAR filter is reduced to NEAREST, it is possible for both
types of fast paths to run; in this case, the NEAREST ones should be
preferred as that is the simpler filter.

Signed-off-by: Soren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
8 years ago pixman-private: include <float.h> only in C code
Thomas Petazzoni [Sun, 17 Jan 2016 14:22:50 +0000 (15:22 +0100)]
pixman-private: include <float.h> only in C code

<float.h> is included unconditionally by pixman-private.h, which in
turn gets included by assembler files. Unfortunately, with certain C
libraries (like the musl C library), <float.h> cannot be included in
assembler files:

  CCLD     libpixman-arm-simd.la
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'

It turns out however that <float.h> is not needed by assembly files,
so we move its inclusion within the #ifndef __ASSEMBLER__ condition,
which solves the problem.
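
The guard pattern looks roughly like this. It is a sketch of a header
shared between C and assembler sources, with hypothetical names, and is
not an excerpt from pixman-private.h.

```c
#ifndef __ASSEMBLER__
/* C-only section: declarations here are never seen by the assembler. */
#include <assert.h>
#include <float.h>

/* Hypothetical helper showing that <float.h> is usable from C code. */
static int float_limits_visible (void)
{
    assert (FLT_RADIX >= 2);
    return FLT_MIN > 0.0f;
}
#endif /* __ASSEMBLER__ */

/* Plain preprocessor constants remain usable from both C and .S files. */
#define EXAMPLE_SHARED_CONSTANT 16   /* hypothetical */
```

GCC defines __ASSEMBLER__ while preprocessing assembly sources, so the
C-only section is skipped there.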

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
8 years ago build: Distinguish SKIP and FAIL on Win32
Andrea Canciani [Wed, 23 Dec 2015 22:22:02 +0000 (23:22 +0100)]
build: Distinguish SKIP and FAIL on Win32

The `check` target in test/Makefile.win32 assumed that any non-0 exit
code from the tests was an error, but the testsuite is currently using
77 as a SKIP exit code (based on the convention used in autotools).
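
The exit code convention can be sketched as a small classifier. The
names are hypothetical; the actual fix lives in the Makefile rather than
in C.

```c
#include <assert.h>

enum test_result { TEST_PASS, TEST_SKIP, TEST_FAIL };

/* Map a test program's exit code to a result, treating 77 as SKIP as
 * in the autotools convention. */
static enum test_result classify_exit_code (int code)
{
    assert (code >= 0 && code <= 255);
    if (code == 0)
        return TEST_PASS;
    if (code == 77)
        return TEST_SKIP;
    return TEST_FAIL;
}
```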

Fixes fence-image-self-test and cover-test (now reported as SKIP).

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago build: Use `del` instead of `rm` on `cmd.exe` shells
Simon Richter [Tue, 22 Dec 2015 21:45:33 +0000 (22:45 +0100)]
build: Use `del` instead of `rm` on `cmd.exe` shells

The `rm` command is not usually available when running on Win32 in a
`cmd.exe` shell. Instead the shell provides the `del` builtin, which
has somewhat more limited wildcard expansion and error handling.

This makes all of the Makefile targets work on Win32 both using
`cmd.exe` and using the MSYS environment.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago build: Do not use `mkdir -p` on Windows
Andrea Canciani [Tue, 22 Dec 2015 21:46:05 +0000 (22:46 +0100)]
build: Do not use `mkdir -p` on Windows

When the build is performed using `cmd.exe` as shell, the `mkdir`
command does not support the `-p` flag. The ability to create multiple
nested folders is not used, hence it can be easily replaced by only
creating the directory if it does not exist.

This makes the build work on the `cmd.exe` shell, except for the
`clean` targets.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago build: Avoid phony `pixman` target in test/Makefile.win32
Andrea Canciani [Wed, 23 Dec 2015 10:15:59 +0000 (11:15 +0100)]
build: Avoid phony `pixman` target in test/Makefile.win32

Instead of explicitly depending on "pixman" for the "all" and "check"
targets, rely on the dependency on the .lib file.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago build: Remove use of BUILT_SOURCES from Makefile.win32
Andrea Canciani [Tue, 22 Dec 2015 20:53:14 +0000 (21:53 +0100)]
build: Remove use of BUILT_SOURCES from Makefile.win32

Since 3d81d89c292058522cce91338028d9b4c4a23c24 BUILT_SOURCES is not
used anymore, but it was unintentionally left in Win32 Makefiles.

Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago Post 0.34 branch creation version bump to 0.35.1
Oded Gabbay [Wed, 23 Dec 2015 08:46:40 +0000 (10:46 +0200)]
Post 0.34 branch creation version bump to 0.35.1

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago Post-release version bump to 0.33.7
Oded Gabbay [Tue, 22 Dec 2015 13:55:32 +0000 (15:55 +0200)]
Post-release version bump to 0.33.7

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago Pre-release version bump to 0.33.6 pixman-0.33.6
Oded Gabbay [Tue, 22 Dec 2015 13:30:10 +0000 (15:30 +0200)]
Pre-release version bump to 0.33.6

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago configura.ac: fix test for SSE2 & SSSE3 assembler support
Oded Gabbay [Tue, 15 Dec 2015 12:53:18 +0000 (14:53 +0200)]
configura.ac: fix test for SSE2 & SSSE3 assembler support

This patch modifies the SSE2 & SSSE3 tests in configure.ac to use a
global variable to initialize vector variables. In addition, we now
return the value of the computation instead of 0.

This is done so gcc 4.9 (and lower) won't optimize away the SSE assembly
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE instructions (the test will pass because the compiler does support
the intrinsics).

v2: instead of using volatile, use a global variable as input

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago mmx: Improve detection of support for "K" constraint
Andrea Canciani [Sun, 11 Oct 2015 07:45:57 +0000 (09:45 +0200)]
mmx: Improve detection of support for "K" constraint

Older versions of clang emitted an error on the "K" constraint, but at
least since version 3.7 it is supported. Just like gcc, this
constraint is only allowed for constants, but apparently clang
requires them to be known before inlining.

Using the macro definition _mm_shuffle_pi16(A, N) ensures that the "K"
constraint is always applied to a literal constant, independently from
the compiler optimizations and allows building pixman-mmx on modern
clang.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
8 years ago Revert "mmx: Use MMX2 intrinsics from xmmintrin.h directly."
Matt Turner [Wed, 18 Nov 2015 22:16:24 +0000 (14:16 -0800)]
Revert "mmx: Use MMX2 intrinsics from xmmintrin.h directly."

This reverts commit 7de61d8d14e84623b6fa46506eb74f938287f536.

Newer versions of gcc allow inclusion of xmmintrin.h without -msse, but
still won't allow usage of the intrinsics.

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=564024

8 years ago Post-release version bump to 0.33.5
Oded Gabbay [Fri, 23 Oct 2015 15:33:55 +0000 (18:33 +0300)]
Post-release version bump to 0.33.5

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago Pre-release version bump to 0.33.4 pixman-0.33.4
Oded Gabbay [Fri, 23 Oct 2015 14:58:49 +0000 (17:58 +0300)]
Pre-release version bump to 0.33.4

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago test: Fix fence-image-self-test on Mac
Andrea Canciani [Tue, 13 Oct 2015 11:35:59 +0000 (13:35 +0200)]
test: Fix fence-image-self-test on Mac

On MacOS X, according to the manpage of mprotect(), "When a program
violates the protections of a page, it gets a SIGBUS or SIGSEGV
signal.", but fence-image-self-test was only accepting a SIGSEGV as
notification of invalid access.
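
A sketch of the relaxed check, with a hypothetical function name; the
test's actual signal handling is not reproduced here.

```c
#include <assert.h>
#include <signal.h>

/* Return 1 if signo is one of the signals that may report an access to
 * a protected page, 0 otherwise. */
static int is_protection_fault (int signo)
{
    assert (signo > 0);
    if (signo == SIGSEGV)
        return 1;
#ifdef SIGBUS
    if (signo == SIGBUS)
        return 1;
#endif
    return 0;
}
```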

Fixes fence-image-self-test

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
8 years ago mmx: Use MMX2 intrinsics from xmmintrin.h directly.
Matt Turner [Sun, 11 Oct 2015 21:44:46 +0000 (14:44 -0700)]
mmx: Use MMX2 intrinsics from xmmintrin.h directly.

We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.

Evidently allows gcc 4.9.3 to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after

Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
8 years ago vmx: implement fast path vmx_composite_over_n_8888
Siarhei Siamashka [Fri, 4 Sep 2015 12:39:00 +0000 (15:39 +0300)]
vmx: implement fast path vmx_composite_over_n_8888

Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
Gentoo ppc (32-bit userland) gave the following results:

before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48

Cairo non-trimmed benchmarks on POWER8, 3.4GHz 8 Cores:

ocitysmap          659.69  -> 611.71   :  1.08x speedup
xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years ago affine-bench: remove 8e margin from COVER area
Ben Avison [Fri, 4 Sep 2015 02:09:20 +0000 (03:09 +0100)]
affine-bench: remove 8e margin from COVER area

Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced
the required image area for setting the COVER flags in
pixman.c:analyze_extent(). Do the same reduction in affine-bench.

Leaving the old calculations in place would be very confusing for anyone
reading the code.

Also add a comment that explains how affine-bench wants to hit the COVER
paths. This explains why the intricate extent calculations are copied
from pixman.c.

[Pekka: split patch, change comments, write commit message]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agoRemove the 8e extra safety margin in COVER_CLIP analysis
Ben Avison [Fri, 4 Sep 2015 02:09:20 +0000 (03:09 +0100)]
Remove the 8e extra safety margin in COVER_CLIP analysis

As discussed in
http://lists.freedesktop.org/archives/pixman/2015-August/003905.html

the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed
coordinates is a legacy of rounding errors which used to occur in old
versions of Pixman, but which no longer apply. For any affine transform,
you are now guaranteed to get the same result by transforming the upper
coordinate as by transforming the lower coordinate and adding (size-1)
steps of the increment in source coordinate space. No projective
transform routines use the COVER_CLIP flags, so they cannot be affected.

Proof by Siarhei Siamashka:

Let's take a look at the following affine transformation matrix (with 16.16
fixed point values) and two vectors:

         | a   b     c    |
M      = | d   e     f    |
         | 0   0  0x10000 |

         |  x_dst  |
P     =  |  y_dst  |
         | 0x10000 |

         | 0x10000 |
ONE_X  = |    0    |
         |    0    |

The current matrix multiplication code does the following calculations:

             | (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
    M * P =  | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
             |                   0x10000                      |

These calculations are not perfectly exact, and we may get rounding errors
because the integer coordinates are adjusted by 0.5 (or 0x8000 in the
16.16 fixed point format) before doing matrix multiplication. For
example, if the 'a' coefficient is an odd number and 'b' is zero,
then we are losing some of the least significant bits when dividing by
0x10000.

So we need to strictly prove that the following expression is always
true even though we have to deal with rounding:

                                          | a |
    M * (P + ONE_X) - M * P = M * ONE_X = | d |
                                          | 0 |

or

   ((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
  -
   ((a * x_dst             + b * y_dst + 0x8000) / 0x10000 + c)
  =
    a

It's easy to see that this is equivalent to

    a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
      - ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
  =
    a

Which means that stepping exactly by one pixel horizontally in the
destination image space (advancing 'x_dst' by 0x10000) is the same as
changing the transformed 'x_src' coordinate in the source image space
exactly by 'a'. The same applies to the vertical direction too.
Repeating these steps, we can reach any pixel in the source image
space and get exactly the same fixed point coordinates as doing
matrix multiplications per each pixel.

By the way, the older matrix multiplication implementation, which was
relying on less accurate calculations with three intermediate roundings
"((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)",
also has the same properties. However reverting
    http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
and applying this "Remove the 8e extra safety margin in COVER_CLIP
analysis" patch makes the cover test fail. The real reason why it fails
is that the old pixman code was using "pixman_transform_point_3d()"
function
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
for getting the transformed coordinate of the top left corner pixel
in the image scaling code, but at the same time using a different
"pixman_transform_point()" function
    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
in the extents calculation code for setting the cover flag. And these
functions did the intermediate rounding differently. That's why the 8e
safety margin was needed.

** proof ends
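
The stepping property argued in the proof can be checked numerically with a
small sketch. This is not Pixman code: the helper names are hypothetical, only
the x row of the matrix is modelled, and the division by 0x10000 is assumed to
round towards negative infinity (as an arithmetic ">> 16" would), which is what
makes the identity hold for negative intermediate values too.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 0x10000 (assumption: models an arithmetic ">> 16"). */
static int64_t floor_div_65536(int64_t v)
{
    int64_t q = v / 0x10000;
    if (v % 0x10000 < 0)
        q -= 1;
    return q;
}

/* x_src = (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c, as in the proof. */
static int64_t transform_x(int32_t a, int32_t b, int32_t c,
                           int32_t x_dst, int32_t y_dst)
{
    int64_t acc = (int64_t)a * x_dst + (int64_t)b * y_dst + 0x8000;
    return floor_div_65536(acc) + c;
}

/* Returns 1 if adding 'a' per destination pixel reproduces a full
 * transform of every one of the n pixels in the row, 0 otherwise. */
static int stepping_matches(int32_t a, int32_t b, int32_t c,
                            int32_t x_dst, int32_t y_dst, int n)
{
    assert(n >= 1);
    int64_t stepped = transform_x(a, b, c, x_dst, y_dst);
    for (int i = 1; i < n; i++) {
        stepped += a;                          /* one pixel to the right */
        int32_t x_i = x_dst + i * 0x10000;     /* same pixel, transformed fully */
        if (stepped != transform_x(a, b, c, x_i, y_dst))
            return 0;
    }
    return 1;
}
```

Calling stepping_matches() with an odd 'a' and a pixel-centre start such as
x_dst = 0x8000 exercises the case where low bits are lost in the division.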

However, for COVER_CLIP_NEAREST, the actual margins added were not 8e.
Because the half-way cases round down, that is, coordinate 0 hits pixel
index -1 while coordinate e hits pixel index 0, the extra safety margins
were actually 7e to the left and up, and 9e to the right and down. This
patch removes the 7e and 9e margins and restores the -e adjustment
required for NEAREST sampling in Pixman. For reference, see
pixman/rounding.txt.

For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no
additional offsets to be restored, so simply removing the 8e additions
is enough.

Proof:

All implementations must give the same numerical results as
bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().

The former does
    int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
which maps directly to the new test for the nearest flag, when you consider
that x0 must fall in the interval [0,width).

The latter does
    x1 = x - pixman_fixed_1 / 2;
    x1 = pixman_fixed_to_int (x1);
    x2 = x1 + 1;
When you write a COVER path, you take advantage of the assumption that
both x1 and x2 fall in the interval [0, width).

As samplers are allowed to fetch the pixel at x2 unconditionally, we
require
    x1 >= 0
    x2 < width
so
    x - pixman_fixed_1 / 2 >= 0
    x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1
so
    pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
    pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width
which matches the source code lines for the bilinear case, once you delete
the lines that add the 8e margin.
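
The two bounds derived above can be written out as a sketch. The type and
macro names below are local stand-ins rather than Pixman's own definitions,
and the integer conversion is assumed to floor, as pixman_fixed_to_int does
with an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;              /* stand-in for pixman_fixed_t */
#define FIXED_E ((fixed_16_16)1)          /* stand-in for pixman_fixed_e */
#define FIXED_1 ((fixed_16_16)0x10000)    /* stand-in for pixman_fixed_1 */

/* Integer part, rounding towards negative infinity. */
static int fixed_to_int_floor(fixed_16_16 f)
{
    int q = f / FIXED_1;
    if (f % FIXED_1 < 0)
        q -= 1;
    return q;
}

/* NEAREST: the single sample x0 must fall in [0, width). */
static int nearest_sample_in_bounds(fixed_16_16 x, int width)
{
    assert(width > 0);
    int x0 = fixed_to_int_floor(x - FIXED_E);
    return x0 >= 0 && x0 < width;
}

/* BILINEAR: both x1 and x2 = x1 + 1 must fall in [0, width). */
static int bilinear_samples_in_bounds(fixed_16_16 x, int width)
{
    assert(width > 0);
    int x1 = fixed_to_int_floor(x - FIXED_1 / 2);
    int x2 = x1 + 1;
    return x1 >= 0 && x2 < width;
}
```

For NEAREST, coordinate 0 maps to pixel index -1 and coordinate e to index 0,
which is the half-way rounding behaviour described earlier in this message.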

Signed-off-by: Ben Avison <bavison@riscosopen.org>
[Pekka: adjusted commit message, left affine-bench changes for another patch]
[Pekka: add commit message parts from Siarhei]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agopixman-general: Tighten up calculation of temporary buffer sizes
Ben Avison [Tue, 22 Sep 2015 11:43:25 +0000 (12:43 +0100)]
pixman-general: Tighten up calculation of temporary buffer sizes

Each of the aligns can only add a maximum of 15 bytes to the space
requirement. This permits some edge cases to use the stack buffer where
previously it would have deduced that a heap buffer was required.
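
A minimal sketch of why one 16-byte alignment step costs at most 15 bytes.
The helper names are hypothetical and the worst-case formula is an
illustration of the bound, not the actual size computation in
pixman-general.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of 16. */
static uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)15;
}

/* Padding introduced by one alignment step: always in the range 0..15. */
static size_t align_padding_16(uintptr_t addr)
{
    return (size_t)(align_up_16(addr) - addr);
}

/* Upper bound on the space needed for n_buffers consecutive buffers of
 * 'each' bytes when every buffer start is aligned to 16 bytes. */
static size_t worst_case_bytes(size_t each, size_t n_buffers)
{
    assert(n_buffers > 0);
    return n_buffers * (each + 15);
}
```

Budgeting 16 bytes per alignment instead of 15 overestimates the requirement
by one byte per buffer, which is enough to push an exact-fit case off the
stack buffer.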

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
8 years agopixman-general: Fix stack related pointer arithmetic overflow
Siarhei Siamashka [Tue, 22 Sep 2015 01:25:40 +0000 (04:25 +0300)]
pixman-general: Fix stack related pointer arithmetic overflow

As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
the stack is allocated at the very top of the process address space
in some configurations (32-bit x86 systems with ASLR disabled).
And the careless computations done with the 'dest_buffer' pointer
may overflow, failing the buffer upper limit check.

The problem can be reproduced using the 'stress-test' program,
which segfaults when executed via setarch:

    export CFLAGS="-O2 -m32" && ./autogen.sh
    ./configure --disable-libpng --disable-gtk && make
    setarch i686 -R test/stress-test

This patch introduces the required corrections. The extra check
for negative 'width' may be redundant (the invalid 'width' value
is not supposed to reach here), but it's better to play safe
when dealing with the buffers allocated on stack.
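
The kind of check involved can be sketched as follows. This is an
illustration under assumptions, not the patch itself: the function name and
parameters are hypothetical. The point is that forming "cur + size" and
comparing it with the end pointer can wrap around when the buffer sits at the
very top of the address space, whereas comparing sizes cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if 'width' pixels of 'bytes_per_pixel' bytes each fit between
 * 'cur' and 'end' (both pointing into the same buffer), 0 otherwise.
 * No pointer beyond 'end' is ever formed, so nothing can overflow. */
static int scanline_fits(const unsigned char *cur, const unsigned char *end,
                         int width, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    if (width <= 0 || cur > end)
        return 0;                       /* reject invalid widths early */
    size_t avail = (size_t)(end - cur); /* bytes left in the buffer */
    return (size_t)width <= avail / bytes_per_pixel;
}
```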

Reported-by: Ludovic Courtès <ludo@gnu.org>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: soren.sandmann@gmail.com
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years agotest: add a check for FE_DIVBYZERO
Thomas Petazzoni [Thu, 17 Sep 2015 13:43:27 +0000 (15:43 +0200)]
test: add a check for FE_DIVBYZERO

Some architectures, such as Microblaze and Nios2, currently do not
implement FE_DIVBYZERO, even though they have <fenv.h> and
feenableexcept(). This commit adds a configure.ac check to verify
whether FE_DIVBYZERO is defined or not, and if not, disables the
problematic code in test/utils.c.
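
On the C side, guarding on the macro might look like the sketch below. It is
a simplification: it tests FE_DIVBYZERO directly with the preprocessor instead
of relying on a configure-generated define, and the function name is
hypothetical.

```c
#include <assert.h>
#include <fenv.h>

/* Returns the exception mask for division by zero, or 0 on architectures
 * whose <fenv.h> does not define FE_DIVBYZERO. */
static int divbyzero_mask(void)
{
#ifdef FE_DIVBYZERO
    int mask = FE_DIVBYZERO;
#else
    int mask = 0;   /* not available: callers skip the trapping setup */
#endif
    /* Whatever we return must be a subset of the supported exceptions. */
    assert((mask & ~FE_ALL_EXCEPT) == 0);
    return mask;
}
```

A caller would enable trapping only when the returned mask is non-zero.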

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years agovmx: Remove unused expensive functions
Oded Gabbay [Sun, 6 Sep 2015 08:45:20 +0000 (11:45 +0300)]
vmx: Remove unused expensive functions

Now that we replaced the expensive functions with better performing
alternatives, we should remove them so they will not be used again.

Running Cairo benchmark on trimmed traces gave the following results:

POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.

Speedups
========
t-firefox-scrolling     1232.30 -> 1096.55 :  1.12x
t-gnome-terminal-vim    613.86  -> 553.10  :  1.11x
t-evolution             405.54  -> 371.02  :  1.09x
t-firefox-talos-gfx     919.31  -> 862.27  :  1.07x
t-gvim                  653.02  -> 616.85  :  1.06x
t-firefox-canvas-alpha  941.29  -> 890.42  :  1.06x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
8 years agovmx: implement fast path vmx_composite_over_n_8_8888
Oded Gabbay [Sun, 28 Jun 2015 10:17:41 +0000 (13:17 +0300)]
vmx: implement fast path vmx_composite_over_n_8_8888

POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.

reference memcpy speed = 25008.9MB/s (6252.2MP/s for 32bpp fills)

                Before         After           Change
              ---------------------------------------------
L1              91.32          182.84         +100.22%
L2              94.94          182.83         +92.57%
M               95.55          181.51         +89.96%
HT              88.96          162.09         +82.21%
VT              87.4           168.35         +92.62%
R               83.37          146.23         +75.40%
RT              66.4           91.5           +37.80%
Kops/s          683            859            +25.77%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
8 years agovmx: optimize vmx_composite_over_n_8888_8888_ca
Oded Gabbay [Sun, 6 Sep 2015 08:46:15 +0000 (11:46 +0300)]
vmx: optimize vmx_composite_over_n_8888_8888_ca

This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.

Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:

reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)

                Before          After           Change
              --------------------------------------------
L1              244.97          474.05         +93.51%
L2              243.74          473.05         +94.08%
M               243.29          467.16         +92.02%
HT              144.03          252.79         +75.51%
VT              174.24          279.03         +60.14%
R               109.86          149.98         +36.52%
RT              47.96           53.18          +10.88%
Kops/s          524             576            +9.92%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
8 years agovmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER
Oded Gabbay [Sun, 6 Sep 2015 07:58:30 +0000 (10:58 +0300)]
vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER

This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
the functions it calls (combine1, combine4 and
core_combine_over_u_pixel_vmx).

The optimization is done by removing use of expand_alpha_1x128 and
expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
pixman_combine32.h.

Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:

reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)

                Before          After           Change
              --------------------------------------------
L1              182.05          210.22         +15.47%
L2              180.6           208.92         +15.68%
M               180.52          208.22         +15.34%
HT              130.17          178.97         +37.49%
VT              145.82          184.22         +26.33%
R               104.51          129.38         +23.80%
RT              48.3            61.54          +27.41%
Kops/s          430             504            +17.21%

v2: Check *pm is not NULL before dereferencing it in combine1()

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
8 years agoarmv6: enable over_n_8888
Pekka Paalanen [Mon, 7 Sep 2015 11:40:49 +0000 (14:40 +0300)]
armv6: enable over_n_8888

Enable the fast path added in the previous patch by moving the lookup
table entries to their proper locations.

Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
effect of adding this one patch on top of
"armv6: Add over_n_8888 fast path (disabled)", which was applied on
fd595692941f3d9ddea8934462bd1d18aed07c65.

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    12.5   0.04     45.2   0.10    100.00%    +263.1%
L2    11.1   0.02     43.2   0.03    100.00%    +289.3%
M      9.4   0.00     42.4   0.02    100.00%    +351.7%
HT     8.5   0.02     25.4   0.10    100.00%    +198.8%
VT     8.4   0.02     22.3   0.07    100.00%    +167.0%
R      8.2   0.02     23.1   0.09    100.00%    +183.6%
RT     5.4   0.05     11.4   0.21    100.00%    +110.3%

At most 3 outliers rejected per test per set.

Iterating here means that lowlevel-blt-bench was executed 30 times, and
the statistics above were computed from the output.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
8 years agoarmv6: Add over_n_8888 fast path (disabled)
Ben Avison [Mon, 7 Sep 2015 11:40:48 +0000 (14:40 +0300)]
armv6: Add over_n_8888 fast path (disabled)

This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table size
will include the new fast path. When the follow-up patch then enables
the new fast path, the binary layout (alignments, size, etc.) will stay
the same compared to the disabled case.

Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.
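
The trick of parking entries after the sentinel can be illustrated with a toy
lookup table. Everything here (types, operator names, functions) is
hypothetical and far simpler than Pixman's real fast path tables.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_func)(int);

typedef struct {
    int     op;
    op_func func;
} fast_path_entry;

enum { OP_NONE = 0, OP_SRC, OP_OVER, OP_ADD };

static int do_src(int v)  { return v; }
static int do_over(int v) { return v + 1; }
static int do_add(int v)  { return v + 2; }

static const fast_path_entry table[] = {
    { OP_SRC,  do_src  },
    { OP_OVER, do_over },
    { OP_NONE, NULL    },   /* sentinel: the lookup stops here */
    { OP_ADD,  do_add  },   /* still linked in, but never reached */
};

/* Walk the table up to the sentinel and return the matching function. */
static op_func lookup(int op)
{
    assert(op != OP_NONE);
    for (const fast_path_entry *e = table; e->op != OP_NONE; e++) {
        if (e->op == op)
            return e->func;
    }
    return NULL;
}
```

Because do_add is referenced from the table, it stays in the binary and the
table keeps its size; moving that entry above the sentinel enables it without
changing the layout.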

Benchmark results are included in the patch enabling this fast path.

[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
8 years agotest: Add cover-test v5
Ben Avison [Wed, 2 Sep 2015 19:35:59 +0000 (20:35 +0100)]
test: Add cover-test v5

This test aims to verify both numerical correctness and the honouring of
array bounds for scaled plots (both nearest-neighbour and bilinear) at or
close to the boundary conditions for applicability of "cover" type fast paths
and iter fetch routines.

It has a secondary purpose: by setting the env var EXACT (to any value) it
will only test plots that are exactly on the boundary condition. This makes
it possible to ensure that "cover" routines are being used to the maximum,
although this requires the use of a debugger or code instrumentation to
verify.

Changes in v4:

  Check the fence page size and skip the test if it is too large. Since
  we need to deal with pixman_fixed_t coordinates that go beyond the
  real image width, make the page size limit 16 kB. A 32 kB or larger
  page size would cause an a8 image width to be 32k or more, which is no
  longer representable in pixman_fixed_t.

  Use a shorthand variable 'filter' in test_cover().

  Whitespace adjustments.

Changes in v5:

  Skip if fenced memory is not supported. Do you know of any such
  platform?
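
The 16 kB limit in the v4 notes follows from the range of a signed 16.16
fixed point value, which can be sketched as below. The helper names are
hypothetical and only the integer-part range is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if 'width' can be converted to signed 16.16 fixed point
 * without overflowing a 32-bit integer. */
static int width_fits_in_fixed_16_16(int64_t width)
{
    return width >= 0 && width * 0x10000 <= INT32_MAX;
}

/* Convert an integer to 16.16 fixed point, refusing values out of range. */
static int32_t int_to_fixed_16_16(int i)
{
    assert(width_fits_in_fixed_16_16(i));
    return (int32_t)((int64_t)i * 0x10000);
}
```

A width of 32768 already needs bit 31, so it is rejected, while 16384 leaves
headroom for coordinates that go beyond the real image width.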

Signed-off-by: Ben Avison <bavison@riscosopen.org>
[Pekka: changes in v4 and v5]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years agoimplementation: add PIXMAN_DISABLE=wholeops
Pekka Paalanen [Tue, 8 Sep 2015 10:35:33 +0000 (13:35 +0300)]
implementation: add PIXMAN_DISABLE=wholeops

Add a new option to PIXMAN_DISABLE: "wholeops". This option disables all
whole-operation fast paths regardless of implementation level, except
the general path (general_composite_rect).

The purpose is to add a debug option that allows us to test optimized
iterator paths specifically. With this, it is possible to:
- see whether fast paths mask bugs in iterators
- compare fast paths with iterator paths for performance

The effect was tested on x86_64 by running:
$ PIXMAN_DISABLE='' ./test/lowlevel-blt-bench over_8888_8888
$ PIXMAN_DISABLE='wholeops' ./test/lowlevel-blt-bench over_8888_8888

In the first case time is spent in sse2_composite_over_8888_8888(), and
in the latter in sse2_combine_over_u().
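
How an option such as "wholeops" might be recognised in the variable can be
sketched like this, assuming PIXMAN_DISABLE holds a space-separated list of
names. The function is hypothetical and is not the parser Pixman uses.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if 'name' appears as a whole token in the space-separated
 * string 'list', 0 otherwise. A NULL list matches nothing. */
static int token_in_list(const char *list, const char *name)
{
    assert(name != NULL);
    size_t n = strlen(name);
    if (list == NULL || n == 0)
        return 0;

    const char *p = list;
    while (*p) {
        while (*p == ' ')               /* skip separators */
            p++;
        const char *start = p;
        while (*p && *p != ' ')         /* scan one token */
            p++;
        if ((size_t)(p - start) == n && memcmp(start, name, n) == 0)
            return 1;
    }
    return 0;
}
```

A caller could pass the result of getenv("PIXMAN_DISABLE") as the list; an
unset variable yields NULL, which disables nothing.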

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years agoutils.[ch]: add fence_get_page_size()
Pekka Paalanen [Tue, 8 Sep 2015 06:36:48 +0000 (09:36 +0300)]
utils.[ch]: add fence_get_page_size()

Add a function to get the page size used for memory fence purposes, and
use it everywhere where getpagesize() was used.

This offers a single point in code to override the page size, in case
one wants to experiment with how the tests work with a higher page size
than what the developer's machine has.

This also offers a clean API, without adding #ifdefs, to tests for
checking the page size.
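
The idea of a single override point can be sketched as below. The names are
hypothetical and this is not the code added to utils.c; it uses POSIX
sysconf() rather than getpagesize(), and the override variable is an invented
knob for experimentation.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical knob: set to a positive value to pretend the machine has a
 * larger page size; 0 means "ask the system". */
static long page_size_override = 0;

/* Single place where tests obtain the page size used for fencing. */
static long get_fence_page_size(void)
{
    if (page_size_override > 0)
        return page_size_override;

    long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);
    return size;
}
```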

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agoutils.c: fix fallback code for fence_image_create_bits()
Pekka Paalanen [Tue, 8 Sep 2015 06:20:46 +0000 (09:20 +0300)]
utils.c: fix fallback code for fence_image_create_bits()

Used a wrong variable name, causing:
/home/pq/git/pixman/demos/../test/utils.c: In function ‘fence_image_create_bits’:
/home/pq/git/pixman/demos/../test/utils.c:562:46: error: ‘width’ undeclared (first use in this function)

Use the correct variable.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agotest: add fence-image-self-test
Pekka Paalanen [Thu, 7 May 2015 14:16:05 +0000 (17:16 +0300)]
test: add fence-image-self-test

Tests that fence_malloc and fence_image_create_bits actually work: that
out-of-bounds and out-of-row (unused stride area) accesses trigger
SIGSEGV.

If fence_malloc is a dummy (FENCE_MALLOC_ACTIVE not defined), this test
is skipped.

Changes in v2:

- check FENCE_MALLOC_ACTIVE value, not whether it is defined
- test that reading bytes near the fence pages does not cause a
  segmentation fault

Changes in v3:

- Do not print progress messages unless VERBOSE environment variable is
  set. Avoid spamming the terminal output of 'make check' on some
  versions of autotools.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agoutils.[ch]: add fence_image_create_bits ()
Pekka Paalanen [Thu, 7 May 2015 13:46:01 +0000 (16:46 +0300)]
utils.[ch]: add fence_image_create_bits ()

Useful for detecting out-of-bounds accesses in composite operations.

This will be used by follow-up patches adding new tests.

Changes in v2:

- fix style on fence_image_create_bits args
- add page to stride only if stride_fence
- add comment on the fallback definition about freeing storage

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agoutils.[ch]: add FENCE_MALLOC_ACTIVE
Pekka Paalanen [Thu, 7 May 2015 11:21:30 +0000 (14:21 +0300)]
utils.[ch]: add FENCE_MALLOC_ACTIVE

Define a new token to simplify checking whether fence_malloc() actually
can catch out-of-bounds access.

This will be used in the future to skip tests that rely on fence_malloc
checking functionality.

Changes in v2:

- #define FENCE_MALLOC_ACTIVE always, but change its value to help catch
  use of it without including utils.h

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agoscaling-test: list more details when verbose
Ben Avison [Thu, 20 Aug 2015 12:07:48 +0000 (13:07 +0100)]
scaling-test: list more details when verbose

Add mask details to the output.

[Pekka: redo whitespace and print src,dst,mask x and y.]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
8 years agolowlevel-blt-bench: make extra arguments an error
Pekka Paalanen [Tue, 7 Jul 2015 08:31:20 +0000 (11:31 +0300)]
lowlevel-blt-bench: make extra arguments an error

If a user gives multiple patterns or extra arguments, only the last one
was used as the pattern while the former were just ignored. This is a
user error silently converted to something possibly unexpected.

In presence of extra arguments, complain and quit.

Cc: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
8 years agoPost-release version bump to 0.33.3
Oded Gabbay [Sat, 1 Aug 2015 20:01:43 +0000 (23:01 +0300)]
Post-release version bump to 0.33.3

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
8 years agoPre-release version bump to 0.33.2 pixman-0.33.2
Oded Gabbay [Sat, 1 Aug 2015 19:34:53 +0000 (22:34 +0300)]
Pre-release version bump to 0.33.2

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
9 years agovmx: implement fast path iterator vmx_fetch_a8
Oded Gabbay [Wed, 1 Jul 2015 11:34:07 +0000 (14:34 +0300)]
vmx: implement fast path iterator vmx_fetch_a8

No changes were observed when running cairo trimmed benchmarks.

Running "lowlevel-blt-bench src_8_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.1 ppc64le gave the following results:

reference memcpy speed = 25197.2MB/s (6299.3MP/s for 32bpp fills)

                Before          After           Change
              --------------------------------------------
L1              965.34          3936           +307.73%
L2              942.99          3436.29        +264.40%
M               902.24          2757.77        +205.66%
HT              448.46          784.99         +75.04%
VT              430.05          819.78         +90.62%
R               412.9           717.04         +73.66%
RT              168.93          220.63         +30.60%
Kops/s          1025            1303           +27.12%

It was benchmarked against commit id e2d211a from pixman/master

Siarhei Siamashka reported that on PlayStation 3, it shows the following
results:

== before ==

              src_8_8888 =  L1: 194.37  L2: 198.46  M:155.90 (148.35%)
              HT: 59.18  VT: 36.71  R: 38.93  RT: 12.79 ( 106Kops/s)

== after ==

              src_8_8888 =  L1: 373.96  L2: 391.10  M:245.81 (233.88%)
              HT: 80.81  VT: 44.33  R: 48.10  RT: 14.79 ( 122Kops/s)

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path iterator vmx_fetch_x8r8g8b8
Oded Gabbay [Mon, 29 Jun 2015 12:31:02 +0000 (15:31 +0300)]
vmx: implement fast path iterator vmx_fetch_x8r8g8b8

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

cairo trimmed benchmarks :

Speedups
========
t-firefox-asteroids  533.92  -> 489.94 :  1.09x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path scaled nearest vmx_8888_8888_OVER
Oded Gabbay [Sun, 28 Jun 2015 20:25:24 +0000 (23:25 +0300)]
vmx: implement fast path scaled nearest vmx_8888_8888_OVER

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              134.36          181.68          +35.22%
L2              135.07          180.67          +33.76%
M               134.6           180.51          +34.11%
HT              121.77          128.79          +5.76%
VT              120.49          145.07          +20.40%
R               93.83           102.3           +9.03%
RT              50.82           46.93           -7.65%
Kops/s          448             422             -5.80%

cairo trimmed benchmarks :

Speedups
========
t-firefox-asteroids  533.92 -> 497.92 :  1.07x
    t-midori-zoomed  692.98 -> 651.24 :  1.06x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path vmx_composite_src_x888_8888
Oded Gabbay [Sun, 28 Jun 2015 19:23:44 +0000 (22:23 +0300)]
vmx: implement fast path vmx_composite_src_x888_8888

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              1115.4          5006.49         +348.85%
L2              1112.26         4338.01         +290.02%
M               1110.54         2524.15         +127.29%
HT              745.41          1140.03         +52.94%
VT              749.03          1287.13         +71.84%
R               423.91          547.6           +29.18%
RT              205.79          194.98          -5.25%
Kops/s          1414            1361            -3.75%

cairo trimmed benchmarks :

Speedups
========
t-gnome-system-monitor  1402.62  -> 1212.75 :  1.16x
   t-firefox-asteroids   533.92  ->  474.50 :  1.13x

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path vmx_composite_over_n_8888_8888_ca
Oded Gabbay [Sun, 28 Jun 2015 07:14:20 +0000 (10:14 +0300)]
vmx: implement fast path vmx_composite_over_n_8888_8888_ca

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.

reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              61.92            244.91          +295.53%
L2              62.74            243.3           +287.79%
M               63.03            241.94          +283.85%
HT              59.91            144.22          +140.73%
VT              59.4             174.39          +193.59%
R               53.6             111.37          +107.78%
RT              37.99            46.38           +22.08%
Kops/s          436              506             +16.06%

cairo trimmed benchmarks :

Speedups
========
t-xfce4-terminal-a1  1540.37 -> 1226.14 :  1.26x
t-firefox-talos-gfx  1488.59 -> 1209.19 :  1.23x

Slowdowns
=========
        t-evolution  553.88  -> 581.63  :  1.05x
          t-poppler  364.99  -> 383.79  :  1.05x
t-firefox-scrolling  1223.65 -> 1304.34 :  1.07x

The slowdowns can be explained by cases where the images are small and
not aligned to a 16-byte boundary. In that case, the function first
works on the un-aligned area, even in operations of 1 byte. For small
images, the overhead of such operations can exceed the savings we get
from the vmx instructions that operate on the aligned part of the
image.

In the C fast-path implementation, there is no special treatment for the
un-aligned part, as it works in 4 byte quantities on the entire image.

Because llbb is a synthetic test, I would assume it has far fewer
alignment issues than a "real-world" scenario, such as the cairo
benchmarks, which are basically recorded traces of real application
activity.
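
The head/body/tail split behind this explanation can be sketched as follows.
The struct and function are hypothetical and only count pixels; they assume
32-bit pixels and a vector path that handles four pixels per 16-byte aligned
store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t head;   /* pixels handled one at a time before alignment */
    size_t body;   /* pixels handled four at a time by vector code */
    size_t tail;   /* leftover pixels handled one at a time */
} split_t;

/* Split a run of n_pixels 32-bit pixels starting at dst_addr. */
static split_t split_for_vector(uintptr_t dst_addr, size_t n_pixels)
{
    assert(dst_addr % 4 == 0);          /* pixels are 4-byte aligned */
    split_t s = { 0, 0, 0 };

    /* Scalar steps until the destination reaches a 16-byte boundary. */
    while (n_pixels > 0 && (dst_addr & 15) != 0) {
        s.head++;
        dst_addr += 4;
        n_pixels--;
    }

    s.body = n_pixels & ~(size_t)3;     /* whole vectors only */
    s.tail = n_pixels - s.body;
    return s;
}
```

For a 10-pixel row starting 4 bytes past a boundary, only 4 of the 10 pixels
reach the vector path, which shows how small unaligned images can end up
dominated by scalar work.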

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path composite_add_8888_8888
Oded Gabbay [Thu, 18 Jun 2015 12:05:49 +0000 (15:05 +0300)]
vmx: implement fast path composite_add_8888_8888

Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              248.76          3284.48         +1220.34%
L2              264.09          2826.47         +970.27%
M               261.24          2405.06         +820.63%
HT              217.27          857.3           +294.58%
VT              213.78          980.09          +358.46%
R               176.61          442.95          +150.81%
RT              107.54          150.08          +39.56%
Kops/s          917             1125            +22.68%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path composite_add_8_8
Oded Gabbay [Thu, 18 Jun 2015 11:56:47 +0000 (14:56 +0300)]
vmx: implement fast path composite_add_8_8

Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              687.63          9140.84         +1229.33%
L2              715             7495.78         +948.36%
M               717.39          8460.14         +1079.29%
HT              569.56          1020.12         +79.11%
VT              520.3           1215.56         +133.63%
R               514.81          874.35          +69.84%
RT              341.28          305.42          -10.51%
Kops/s          1621            1579            -2.59%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path composite_over_8888_8888
Oded Gabbay [Thu, 18 Jun 2015 11:12:05 +0000 (14:12 +0300)]
vmx: implement fast path composite_over_8888_8888

Copied impl. from sse2 file and edited to use vmx functions

It was benchmarked against commit id 2be523b from pixman/master

POWER8, 16 cores, 3.4GHz, ppc64le :

reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)

                Before           After           Change
              ---------------------------------------------
L1              129.47          1054.62         +714.57%
L2              138.31          1011.02         +630.98%
M               139.99          1008.65         +620.52%
HT              122.11          468.45          +283.63%
VT              121.06          532.21          +339.62%
R               108.48          240.5           +121.70%
RT              77.87           116.7           +49.87%
Kops/s          758             981             +29.42%

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: implement fast path vmx_fill
Oded Gabbay [Sun, 28 Jun 2015 06:42:19 +0000 (09:42 +0300)]
vmx: implement fast path vmx_fill

Based on sse2 impl.

It was benchmarked against commit id e2d211a from pixman/master

Tested cairo trimmed benchmarks on POWER8, 8 cores, 3.4GHz,
RHEL 7.1 ppc64le :

speedups
========
     t-swfdec-giant-steps  1383.09 ->  718.63  :  1.92x speedup
   t-gnome-system-monitor  1403.53 ->  918.77  :  1.53x speedup
              t-evolution  552.34  ->  415.24  :  1.33x speedup
      t-xfce4-terminal-a1  1573.97 ->  1351.46 :  1.16x speedup
      t-firefox-paintball  847.87  ->  734.50  :  1.15x speedup
      t-firefox-asteroids  565.99  ->  492.77  :  1.15x speedup
t-firefox-canvas-swscroll  1656.87 ->  1447.48 :  1.14x speedup
          t-midori-zoomed  724.73  ->  642.16  :  1.13x speedup
   t-firefox-planet-gnome  975.78  ->  911.92  :  1.07x speedup
          t-chromium-tabs  292.12  ->  274.74  :  1.06x speedup
     t-firefox-chalkboard  690.78  ->  653.93  :  1.06x speedup
      t-firefox-talos-gfx  1375.30 ->  1303.74 :  1.05x speedup
   t-firefox-canvas-alpha  1016.79 ->  967.24  :  1.05x speedup

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: add helper functions
Oded Gabbay [Sun, 28 Jun 2015 06:42:08 +0000 (09:42 +0300)]
vmx: add helper functions

This patch adds the following helper functions to enable code reuse,
hide BE/LE differences, and improve maintainability.

All of the functions were defined as static force_inline.

Names were copied from pixman-sse2.c to make converting fast paths between
sse2 and vmx easier from now on. For the same reason, I tried to keep the
inputs and outputs of the functions as close as possible to the sse2
definitions.

The functions are:

- load_128_aligned       : load 128-bit from a 16-byte aligned memory
                           address into a vector

- load_128_unaligned     : load 128-bit from memory into a vector,
                           without guarantee of alignment for the
                           source pointer

- save_128_aligned       : save 128-bit vector into a 16-byte aligned
                           memory address

- create_mask_16_128     : fill a new vector with a 16-bit value

- create_mask_1x32_128   : take a 32-bit pointer and fill a new
                           vector with the 32-bit value from that pointer

- create_mask_32_128     : fill a new vector with a 32-bit value

- unpack_32_1x128        : unpack 32-bit value into a vector

- unpacklo_128_16x8      : unpack the eight low 8-bit values of a vector

- unpackhi_128_16x8      : unpack the eight high 8-bit values of a vector

- unpacklo_128_8x16      : unpack the four low 16-bit values of a vector

- unpackhi_128_8x16      : unpack the four high 16-bit values of a vector

- unpack_128_2x128       : unpack the eight low 8-bit values of a vector
                           into one vector and the eight high 8-bit
                           values into another vector

- unpack_128_2x128_16    : unpack the four low 16-bit values of a vector
                           into one vector and the four high 16-bit
                           values into another vector

- unpack_565_to_8888     : unpack an RGB_565 vector to 8888 vector

- pack_1x128_32          : pack a vector and return the LSB 32-bit of it

- pack_2x128_128         : pack two vectors into one and return it

- negate_2x128           : xor two vectors with mask_00ff (separately)

- is_opaque              : returns whether all the pixels contained in
                           the vector are opaque

- is_zero                : returns whether the vector equals 0

- is_transparent         : returns whether all the pixels
                           contained in the vector are transparent

- expand_pixel_8_1x128   : expand an 8-bit pixel into lower 8 bytes of a
                           vector

- expand_alpha_1x128     : expand alpha from vector and return the new
                           vector

- expand_alpha_2x128     : expand alpha from one vector and another alpha
                           from a second vector

- expand_alpha_rev_2x128 : expand a reversed alpha from one vector and
                           another reversed alpha from a second vector

- pix_multiply_2x128     : do pix_multiply for two vectors (separately)

- over_2x128             : perform over op. on two vectors

- in_over_2x128          : perform in-over op. on two vectors
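
To make a few of the descriptions above concrete, here is a scalar sketch
of what some helpers compute. It is not the vmx code: the real helpers
operate on 128-bit vector types, while this sketch uses plain arrays. All
names ending in _scalar are hypothetical, pixels are assumed to be
a8r8g8b8 with alpha in the top byte, and "low" is taken to mean array
indices 0..7.

```c
#include <stdint.h>

#define PIXELS_PER_VECTOR 4 /* four 32-bit pixels fit in 128 bits */

/* 1 if every pixel has alpha == 0xff, else 0. */
int is_opaque_scalar (const uint32_t px[PIXELS_PER_VECTOR])
{
    int i;

    for (i = 0; i < PIXELS_PER_VECTOR; i++)
        if ((px[i] >> 24) != 0xff)
            return 0;
    return 1;
}

/* 1 if every pixel has alpha == 0, else 0. */
int is_transparent_scalar (const uint32_t px[PIXELS_PER_VECTOR])
{
    int i;

    for (i = 0; i < PIXELS_PER_VECTOR; i++)
        if ((px[i] >> 24) != 0)
            return 0;
    return 1;
}

/* 1 if all 128 bits are zero, else 0. */
int is_zero_scalar (const uint32_t px[PIXELS_PER_VECTOR])
{
    int i;

    for (i = 0; i < PIXELS_PER_VECTOR; i++)
        if (px[i] != 0)
            return 0;
    return 1;
}

/* Unpack: widen eight 8-bit values into eight 16-bit lanes. */
void unpack_8x8_to_8x16_scalar (const uint8_t in[8], uint16_t out[8])
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = in[i];
}

/* Negate: xor each 16-bit lane with 0x00ff, i.e. 255 - v for v in 0..255. */
void negate_8x16_scalar (uint16_t lanes[8])
{
    int i;

    for (i = 0; i < 8; i++)
        lanes[i] ^= 0x00ff;
}
```

Working on unpacked 16-bit lanes leaves headroom for the intermediate
products of 8-bit multiplies, which is why the unpack and pack helpers
bracket most of the arithmetic helpers.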

v2: removed expand_pixel_32_1x128 as it was not used by any function and
its implementation was erroneous

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agovmx: add LOAD_VECTOR macro
Oded Gabbay [Thu, 2 Jul 2015 08:04:20 +0000 (11:04 +0300)]
vmx: add LOAD_VECTOR macro

This patch adds a macro for loading a single vector.
It also makes the other LOAD_VECTORx macros use this macro as a base so
that code is re-used.

In addition, I fixed minor coding style issues.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
9 years agoMIPS: update author's e-mail address
Nemanja Lukic [Fri, 27 Jun 2014 16:05:39 +0000 (18:05 +0200)]
MIPS: update author's e-mail address

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
9 years agolowlevel-blt-bench: add option to skip memcpy measurement
Pekka Paalanen [Wed, 10 Jun 2015 10:54:01 +0000 (13:54 +0300)]
lowlevel-blt-bench: add option to skip memcpy measurement

The memcpy speed measurement takes several seconds. When you are running
single tests in a harness that iterates dozens or hundreds of times, the
repeated measurements are redundant and take a lot of time. It is also
an open question whether the measured speed changes over long test runs
due to unidentified platform reasons (Raspberry Pi).

Add a command line option to set the reference memcpy speed, skipping
the measuring.

The speed is mainly used to compute how many iterations to run inside
the bench_*() functions, so for repeated testing on the same hardware,
it makes sense to lock that number to a constant.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
9 years agolowlevel-blt-bench: add CSV output mode
Pekka Paalanen [Wed, 10 Jun 2015 10:20:47 +0000 (13:20 +0300)]
lowlevel-blt-bench: add CSV output mode

Add a command line option for choosing CSV output mode.

In CSV mode, only the results in Mpixels/s are printed in an easily
machine-parseable format. All user-friendly printing is suppressed.

This is intended for cases where you benchmark one particular operation
at a time. Running the "all" set of benchmarks will print just fine, but
you may have trouble matching rows to operations as you have to look at
the tests_tbl[] to see what row is which.
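
A sketch of what producing one machine-parseable row could look like. The
helper name format_csv_row, the "%.2f" precision and the row layout are
assumptions for illustration, not the output format of the actual tool.
The only property taken from this commit is that values are separated by
a comma with no following space.

```c
#include <stdio.h>

/*
 * Format 'count' Mpixels/s values as one CSV row terminated by '\n'.
 * Returns the number of characters written, or -1 if 'buf' is too small.
 */
int format_csv_row (char *buf, size_t size, const double *mpx, int count)
{
    size_t used = 0;
    int    i;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < count; i++)
    {
        /* Separator is a bare comma: no space after it. */
        int n = snprintf (buf + used, size - used, "%s%.2f",
                          i ? "," : "", mpx[i]);

        if (n < 0 || (size_t) n >= size - used)
            return -1;
        used += (size_t) n;
    }

    if (used + 2 > size) /* room for '\n' and the terminating '\0' */
        return -1;
    buf[used++] = '\n';
    buf[used] = '\0';
    return (int) used;
}
```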

Reviewed-by: Ben Avison <bavison@riscosopen.org>
v2: don't add a space after comma in CSV.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
9 years agolowlevel-blt-bench: refactor to Mpx_per_sec()
Pekka Paalanen [Wed, 10 Jun 2015 09:41:57 +0000 (12:41 +0300)]
lowlevel-blt-bench: refactor to Mpx_per_sec()

Refactor the Mpixels/s computations into a function. Easier to read and
better documents what is being computed.
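
The computation being factored out is essentially a unit conversion. A
minimal sketch, with a hypothetical name and signature that need not match
the function added by this commit:

```c
/*
 * Convert a pixel count and an elapsed time in seconds into Mpixels/s.
 * Returns 0.0 for a non-positive duration to avoid dividing by zero.
 */
double mpx_per_sec_sketch (double pix_cnt, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return pix_cnt / seconds / 1e6;
}
```

For example, 2,000,000 pixels processed in 1 second is 2 Mpixels/s.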

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
9 years agolowlevel-blt-bench: all bench funcs to return pix_cnt
Pekka Paalanen [Wed, 10 Jun 2015 09:53:09 +0000 (12:53 +0300)]
lowlevel-blt-bench: all bench funcs to return pix_cnt

The bench_* functions that did not already do so are modified to
return the number of pixels processed during the benchmark. This moves
the computation to the site that actually determines the number, and
simplifies bench_composite() a bit.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
9 years agolowlevel-blt-bench: move speed and scaling printing
Pekka Paalanen [Wed, 10 Jun 2015 09:02:17 +0000 (12:02 +0300)]
lowlevel-blt-bench: move speed and scaling printing

Move the printing of the memory speed and scaling mode into a new
function. This will help with implementing a machine-readable output
option.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
9 years agolowlevel-blt-bench: print single pattern details
Pekka Paalanen [Wed, 10 Jun 2015 08:56:39 +0000 (11:56 +0300)]
lowlevel-blt-bench: print single pattern details

When given just a single test pattern instead of "all", print the test
details. This can be used to verify the pattern parser agrees with the
user, just like scaling settings are printed.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
9 years agolowlevel-blt-bench: make test_entry::testname const
Pekka Paalanen [Wed, 10 Jun 2015 08:34:45 +0000 (11:34 +0300)]
lowlevel-blt-bench: make test_entry::testname const

We assign string literals to it, so it better be const.
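
A small sketch of the pattern, using a hypothetical struct and table
rather than the real test_entry definition. Declaring the member as
const char * lets the compiler reject accidental writes through the
pointer, which would be undefined behaviour on a string literal.

```c
#include <stddef.h>

/* Hypothetical cut-down entry; the real struct has more fields. */
struct test_entry_sketch
{
    const char *testname; /* points at a string literal, read-only */
};

static const struct test_entry_sketch tests_sketch[] =
{
    { "over_8888_8888" },
    { "add_8888_8888" },
};

/* Return the name at 'index', or NULL if the index is out of range. */
const char *testname_at (size_t index)
{
    size_t count = sizeof tests_sketch / sizeof tests_sketch[0];

    return index < count ? tests_sketch[index].testname : NULL;
}
```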

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>