review.tizen.org Git - platform/upstream/mesa.git/log

mesa/st/glsl_to_tgsi: fixup copy-paste mistake

This is clearly a copy-paste error; if we validate the reladdr2-pointer,
we don't want to traverse to the reladdr-pointer. Especially since the
check above shows that reladdr could be NULL here.

Noticed by Coverity.

CID: 1438389, 1438390
Fixes: 568bda2f2d3 ("mesa/st/glsl_to_tgsi: Split arrays whose elements are only accessed directly")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>

i965/nir: Use the nir copy of shader_info to handle gl_PatchVerticesIn

Instead of using the copy of shader_info stored in gl_program, it now
uses the one in nir_shader. This is needed for SPIR-V because the
info.tess.tcs_vertices_out is filled in via _mesa_spirv_to_nir which
happens much later than with a GLSL shader. The copy of shader_info in
gl_program is only updated later via brw_shader_gather_info but that
is too late.

For GLSL this shouldn't create any problems because the nir copy of
the shader_info is immediately copied from the gl_program in
glsl_to_nir.

v2: updated after commit "i965: Combine both gl_PatchVerticesIn
lowering passes." (488972) (Alejandro Piñeiro)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa/glspirv: Set separate_shader on shader_info

The value is copied from the gl_program. If we don’t do this then it
will get reset back to zero in brw_shader_gather_info. This isn’t a
problem for GLSL because in that case the nir_shader is initialised
with a copy of the shader_info from the gl_program.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa/glspirv: pick off the only entry point we need

This is the same thing we do for the Vulkan drivers.

This is needed to pass the following CTS test:
KHR-GL45.gl_spirv.spirv_modules_shader_binary_multiple_shader_objects_test

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

mesa/glspirv: compute double inputs and remap attributes

Input locations used by input attributes are not handled in the same
way in OpenGL and Vulkan. There is a detailed explanation of these
differences in the following commit:

c2acf97fcc9b32eaa9778771282758e5652a8ad4

So with this commit, the same adjustment that is done after
glsl_to_nir is also done after spirv_to_nir when it is used for
OpenGL (ARB_gl_spirv).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/glsl: make nir_remap_attributes public

As we plan to reuse it for ARB_gl_spirv implementation.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/lower_samplers: don't assume a deref for both texture and sampler srcs

Since commit "nir: Use derefs in nir_lower_samplers"
(75286c2d083cdbdfb202a93349e567df0441d5f7), nir_lower_samplers assumes
one deref for both the texture and the sampler. However, there are
cases (on OpenGL, using ARB_gl_spirv) where SPIR-V does not provide a
sampler, such as texture query levels ops. Although we could make
spirv_to_nir provide a sampler deref for those cases, it is not really
needed, and it would be wrong from the Vulkan point of view.

This patch fixes the following (borrowed) tests run in SPIR-V mode:
  arb_compute_shader/execution/basic-texelFetch.shader_test
  arb_gpu_shader5/execution/sampler_array_indexing/fs-simple-texture-size.shader_test
  arb_texture_query_levels/execution/fs-baselevel.shader_test
  arb_texture_query_levels/execution/fs-maxlevel.shader_test
  arb_texture_query_levels/execution/fs-miptree.shader_test
  arb_texture_query_levels/execution/fs-nomips.shader_test
  arb_texture_query_levels/execution/vs-baselevel.shader_test
  arb_texture_query_levels/execution/vs-maxlevel.shader_test
  arb_texture_query_levels/execution/vs-miptree.shader_test
  arb_texture_query_levels/execution/vs-nomips.shader_test
  glsl-1.30/execution/fs-textureSize-compare.shader_test

v2: merge lower_tex_src_to_offset and calc_sampler_offsets together,
    update texture/sampler index and texture_array_size directly on
    lower_tex_src_to_offset (Jason)
v3: clarify one comment (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir/linker: take into account hidden uniforms

So they are not exposed through the introspection API.

It is worth noting that the number of hidden uniforms with GLSL linking
and with SPIR-V linking can differ somewhat, due to the different order
of the nir lowerings/optimizations.

For example: gl_FbWposYTransform. This is introduced as part of
nir_lower_wpos_ytransform. On GLSL that is executed after the IR-based
linking. So that means that on GLSL the UniformStorage will not
include this uniform. With the SPIR-V linking, that uniform is already
present, but marked as hidden. So it will be included in the
UniformStorage, but as hidden.

One alternative would be to create a special how_declared for that
case, but that seemed like overkill. Using hidden should be ok as long
as it is used properly.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: add how_declared to nir_variable.data

Equivalent to the already existing how_declared in GLSL IR. The only
difference is that we are not adding all the declaration_type values
available in GLSL, only the ones that we will use in the short term. We
can add more modes if needed in the future.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

spirv: Make VertexIndex and VertexId both non-zero-based

GLSL has gl_VertexID which is supposed to be non-zero-based.

SPIR-V has both VertexIndex and VertexId builtins whose meanings are
defined by the APIs.

Vulkan defines VertexIndex as being non-zero-based. In Vulkan VertexId
and InstanceId have no meaning and are pretty much just reserved for
OpenGL at this point.

GL_ARB_gl_spirv removes VertexIndex and defines VertexId to be the same
as gl_VertexID (which is also non-zero-based).

Previously Mesa treated VertexIndex as non-zero-based and VertexId as
zero-based, which broke GL. This behaviour was apparently based on
Khronos bug 14255. However that bug doesn’t seem to have reached a
final decision for VertexId.

Assuming there really is no other definition for VertexId for Vulkan
it seems better to just make them both have the same value.

v2: update comment and commit descriptions, based on Jason Ekstrand's
explanation of the meaning/rationale behind all those builtins
(Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: fill info.gs.input_primitive too

info.gs.output_primitive was already being filled. Not sure why this
is not needed on Vulkan, but we found it to be needed for
ARB_gl_spirv. Specifically, this is needed to get the following test
passing:

KHR-GL45.gl_spirv.spirv_validation_builtin_variable_decorations_test

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

docs/features: mark GL_EXT_render_snorm as done for i965

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>

i965: enable EXT_render_snorm

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

mesa: enable EXT_render_snorm extension

This patch marks additional formats as renderable and enables the
extension when OpenGL ES 3.1 is supported.

v2: instead of dummy_true, have a separate toggle for extension
    (Eric Anholt)

v3: add missing checks, simplify some existing checks and fix
    glCopyTexImage2D check (Nanley Chery)

    add SHORT and BYTE support in read_pixels_es3_error_check

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

blorp: Properly handle Z24X8 blits.

One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS
destinations were broken was that an earlier layer was swapping it
out for B8G8R8A8_UNORM.  That made Z24X8 -> Z24X8 blits work.

However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken.
The old code only considered one format at a time, without thinking
that format conversion may need to occur.

This patch moves the translation out to a place where it can consider
both formats.  If both are Z24X8, we continue using B8G8R8A8_UNORM to
avoid having to do shader math workarounds.  If we have a Z24X8
destination, but a non-matching source, we use our shader hacks to
actually render to it properly.

Fixes: 804856fa5735164cc0733ad0ea62adad39b00ae2 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

blorp: Don't try to use R32_UNORM for R24_UNORM_X8_TYPELESS rendering.

The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so
Jason decided to fake it with a bit of shader math and R32_UNORM RTs.

The only problem is that R32_UNORM isn't renderable either...so we've
just traded one bad format for another.

This patch makes us use R32_UINT instead.

Fixes: 804856fa5735164cc0733ad0ea62adad39b00ae2 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel: Switch the order of the 2x MSAA sample positions

The Vulkan 1.1.82 spec flipped the order to better match D3D.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

mesa/st/tests: Add array life range estimation and renumbering tests

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/tests: Add array life range tests infrastructure to common test class

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Expose array live range tracking and merging

This patch ties in the array split, merge, and interleave code.

shader-db changes in the TGSI code are:

               original code  |  array-merge  |       change
                mean     max  |  mean    max  | best   mean % worst
  -----------------------------------------------------------------
  arrays        0.05       2  |  0.00      0  |   -2     -100     0
  total temps   5.05      21  |  4.92     20  |  -15    -2.59     1
  instr        55.33     988  | 55.20    988  |  -15    -0.24     0

Evaluation:

Run shader-db in single thread mode (otherwise the output is
not ordered and the best and worst column don't make sense) to
get results pre-stats.txt and post-stats.txt. Then using
python pandas:

import pandas as pd

# Per-shader statistics before and after the change (single-threaded runs)
old_stats = pd.read_csv('pre-stats.txt')
new_stats = pd.read_csv('post-stats.txt')
omean = old_stats.mean()
omax = old_stats.max()
nmean = new_stats.mean()
nmax = new_stats.max()
# Row-wise difference; relies on both files listing shaders in the same order
delta = new_stats - old_stats
print(pd.concat([omean, omax, nmean, nmax, delta.min(),
                 delta.mean()/old_stats.mean()*100, delta.max()],
                axis=1, keys=['mean', 'max', 'mean', 'max', 'best',
                              'avg change %', 'worst']))

v4: - Correct typo and add bugs that are fixed by this series.
    - Update stats and describe stats evaluation

Bugzilla:
  https://bugs.freedesktop.org/show_bug.cgi?id=105371
  https://bugs.freedesktop.org/show_bug.cgi?id=100200
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: add array life range evaluation into tracking code

v4: Also track the register given in inst->resource. (thanks: Benedikt Schemmer
for testing the patches on radeonsi, which revealed that I was missing
tracking this)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: add class for array access tracking

Because of indirect access, it is impossible to obtain accurate tracking
per component and per array element. Therefore, the tracking is simplified
to only record whether any element was accessed and whether this happened
conditionally in a loop. In addition, while tracking of temporaries requires
per-component tracking that is later fused, for arrays only the component
access mask is needed. The resulting tracking code and evaluation of the array
live range are sufficiently different from the evaluation of the live range of
temporaries to justify implementing this in a different class instead of
adding more complexity to the already existing code for temporary live
range evaluation.

v4: Update commit message to make it clearer why this class is separate from
the tracking of temporaries.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: move evaluation of read mask up in the call hierarchy

In preparation for the array live range tracking, the evaluation of the read
mask is moved out of the register live range tracking to the enclosing call
of the generalized read access tracking.

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: rename access_record to register_merge_record and some more renames

In preparation for adding the tracking of the array live range, the classes
that refer to temporary registers are renamed.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/tests: Add tests for array merge helper classes.

v2: - Define tests also in the meson.build file.
v4: - Check no-op mapping of all bits.
    - Convert tests to the new class layout used in the merge evaluation.
    - Remove dependency on llvm in meson build (thanks Dylan Baker for pointing
       out that this might not be needed)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Add array merge logic

v4: - Update the code to use the new merge logic.
- Use a cleaner, class-based approach for the evaluation of merges.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Add helper classes to apply array merging and interleaving

v4: - Remove logic for evaluation of swizzles and merges since this
was moved to array_live_range. This class now only handles the
actual remapping.

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Add helper class for array live range merging and interleaving

This class holds the array length, live range, and accessed components, and
it implements the logic for evaluating how arrays are merged and interleaved.

v4: - Add logic to evaluate merge and interleave of a pair of arrays to
      the class array_live_range.
    - document class
    - update commit message

Thanks Nicolai Hähnle for the pointers given.

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi:rename lifetime to register_live_range

On one hand "live range" is the term used in the literature, and on the
other hand a distinction is needed from the array live ranges.

v4: Fix indentation and whitespace

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Properly resolve life times simple if/else + use constructs

In constructs like the one below, the live range estimation currently extends
the live range of t unnecessarily to the whole loop, because it does not detect
that t is unconditionally written and later read only in the "if (a)" scope.

  while (foo)  {
    ...
    if (a) {
       ...
       if (b)
         t = ...
       else
         t = ...
       x = t;
       ...
    }
     ...
  }

This patch adds a unit test for this case and corrects the minimal live range estimation
accordingly.

v4: update comments
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Split arrays whose elements are only accessed directly

Arrays whose elements are only accessed directly are replaced by the
corresponding number of temporary registers. By doing so the otherwise
reserved register range becomes subject to further optimizations like
copy propagation and register merging.
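
The eligibility rule can be sketched as follows. This is a toy model in
Python, not the C++ code in glsl_to_tgsi: the access-record format and the
name splittable_arrays are hypothetical and exist only for illustration.

```python
# Hypothetical access records: (array_id, index), where index is an int for
# a direct access and None for an indirect (relative-addressed) access.
def splittable_arrays(accesses):
    """Return the ids of arrays that are never accessed indirectly."""
    seen = set()
    indirect = set()
    for array_id, index in accesses:
        seen.add(array_id)
        if index is None:
            indirect.add(array_id)
    return seen - indirect


accesses = [(1, 0), (1, 2), (2, 1), (2, None), (3, 0)]
# Array 2 has one indirect access, so only arrays 1 and 3 qualify.
print(sorted(splittable_arrays(accesses)))
```

Only arrays in the returned set could be turned into plain temporaries in
this model; a single indirect access keeps the whole array reserved.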

Thanks to the resulting reduced register pressure this patch makes
the piglits

  spec/glsl-1.50/execution -
      variable-indexing/vs-output-array-vec3-index-wr-before-gs
      geometry/max-input-components

pass on r600 (barts) where they would fail before with a "GPR limit exceeded"
error (even with the spilling that was recently added).

v2: * rename method dissolve_arrays to split_arrays
    * unify the tracking and remapping methods for src and dst registers
    * also track access to arrays via reladdr*

v3: * enable this optimization only if the driver requests register merge

v4: * Correct comments
    * Also update inst->resource if it is an array element
      (thanks: Benedikt Schemmer for testing the patches on radeonsi, which
       revealed that I was missing tracking this)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

mesa/st/glsl_to_tgsi: Add method to collect some TGSI statistics

When mesa is compiled in debug mode then this adds the possibility
to print out some statistics about the translated and optimized TGSI
shaders to a file.

The functionality is enabled by setting the environment variable

GLSL_TO_TGSI_PRINT_STATS

to the file name where the statistics should be collected. The file is
opened in append mode so that statistics from various runs will be
accumulated.
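
As a rough illustration of this behaviour (not the Mesa implementation,
which is C++), the following Python sketch appends to the file named by the
environment variable and serializes writers with a lock. The function name
log_stats and the format of the logged line are assumptions.

```python
import os
import threading

_stats_lock = threading.Lock()  # serializes writers within one process


def log_stats(line, env_var="GLSL_TO_TGSI_PRINT_STATS"):
    """Append one line of statistics to the file named by env_var, if set."""
    path = os.environ.get(env_var)
    if not path:
        return False  # variable unset: statistics are disabled
    with _stats_lock:
        # Append mode, so statistics from several runs accumulate.
        with open(path, "a") as stats_file:
            stats_file.write(line + "\n")
    return True
```

With the variable unset the call is a no-op, so the logging costs nothing
unless it is explicitly requested.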

v4: Make access to the log file thread-safe (thanks for pointing this out,
Nicolai Hähnle)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>

Gallium/tgsi: Correct signdness of return value of bit operations

The GLSL operations findLSB, findMSB, and bitCount always return
a signed integer type. Let TGSI reflect this.
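
One reason a signed result matters is that GLSL's findLSB and findMSB return
-1 when no bit is set. A small Python model of the unsigned 32-bit variants
is sketched below; the helper names are hypothetical, and this models only
the GLSL semantics, not the TGSI opcodes.

```python
def find_lsb(value, bits=32):
    """Index of the least significant set bit, or -1 if value is 0."""
    value &= (1 << bits) - 1
    if value == 0:
        return -1
    return (value & -value).bit_length() - 1


def find_msb_unsigned(value, bits=32):
    """Index of the most significant set bit, or -1 if value is 0."""
    value &= (1 << bits) - 1
    return value.bit_length() - 1


def bit_count(value, bits=32):
    """Number of set bits in the low `bits` bits of value."""
    return bin(value & ((1 << bits) - 1)).count("1")
```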

v2: Properly set values in infer_(src|dst)_type   (Thanks Roland
    Scheidegger for pointing out problems with my 1st approach)
v2: Set values in the common infer_type code path, and only add
    the correct source type for UMSB (Roland Scheidegger)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

meson: Build with Python 3

Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Rework bytes/unicode string handling

In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').

On Python 2, the read() method of a file object opened in mode 'r' will
return byte strings, while on Python 3 it will return unicode strings.

Explicitly specifying the binary mode ('rb') then decoding the byte
string means we always handle unicode strings on both Python 2 and 3.

Which in turn means all re.match(line) will return unicode strings as
well.

If we also make expandCString return unicode strings, we don't need the
call to the unicode() constructor any more.

We were using the ugettext() method because it always returns unicode
strings in Python 2, unlike the gettext() one which returns
byte strings. The ugettext() method doesn't exist on Python 3, so we
must use the right method on each version of Python.

The last hurdles are that Python 3 doesn't let us concatenate unicode
and byte strings directly, and that Python 2's stdout wants encoded byte
strings while Python 3's wants unicode strings.

With these changes, the script gives the same output on both Python 2
and 3.
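
The two central ideas can be sketched as below. This is not the script's
actual code: the helper name read_text and the UTF-8 encoding are
assumptions made for the example.

```python
import gettext


def read_text(path, encoding="utf-8"):
    """Return the file's contents as a unicode string on Python 2 and 3."""
    with open(path, "rb") as f:  # binary mode yields bytes on both versions
        return f.read().decode(encoding)


# ugettext() exists only on Python 2; on Python 3 gettext() already
# returns unicode strings, so pick whichever method is available.
translations = gettext.NullTranslations()
translate = getattr(translations, "ugettext", translations.gettext)
```

NullTranslations is used here only so the sketch runs without a message
catalog; it returns the message unchanged.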

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Fix inequality comparisons

On Python 3, executing `foo != bar` will first try to call
foo.__ne__(bar), and fall back on the opposite result of foo.__eq__(bar).

Python 2 does not do that.

As a result, those __eq__ methods were never called when we were
testing for inequality.

Explicitly adding the __ne__ methods fixes this issue, in a way that is
compatible with both Python 2 and 3.

However, this means the __eq__ methods are now called when testing for
`foo != None`, so they need to be guarded correctly.
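
A minimal sketch of the pattern, using a hypothetical Version class rather
than any class from the Mesa scripts:

```python
class Version(object):
    """Hypothetical value class used only to illustrate the pattern."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        # Guard: comparing against None (or any other type) must not
        # touch attributes that the other object does not have.
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result
```

Returning NotImplemented lets Python fall back to its default comparison, so
`Version(1, 2) != None` evaluates to True instead of raising AttributeError.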

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

mesa/st: ETC2 now uses R8G8B8A8_SRGB as fallback

The check for ETC2 compatibility was not updated when the fallback
format was changed.

Fixes: 71867a0a61cea20bf3f6115692e70b0d60f0b70d ("st/mesa: Fall back to R8G8B8A8_SRGB for ETC2")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

python: Simplify list sorting

Instead of copying the list, then sorting the copy in-place, we can just
get a new sorted copy directly.
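
For illustration, with a made-up list of names:

```python
names = ["mesa", "gallium", "anv"]

# Before: copy the list, then sort the copy in place.
copied = list(names)
copied.sort()

# After: sorted() returns a new sorted list and leaves the input untouched.
ordered = sorted(names)
```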

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Use key-functions when sorting containers

In Python 2, the traditional way to sort containers was to use a
comparison function (which returned either -1, 0 or 1 when passed two
objects) and pass that as the "cmp" argument to the container's sort()
method.

Python 2.4 introduced key-functions, which instead only operate on a
given item, and return a sorting key for this item.

In general, this runs faster, because the cmp-function has to get run
multiple times for each item of the container.

Python 3 removed the cmp-function, enforcing usage of key-functions
instead.

This change makes the script compatible with Python 2 and Python 3.
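
A short sketch of the difference follows. The data is made up, and
functools.cmp_to_key is shown only as the standard bridge for comparison
functions that cannot easily be rewritten as key functions.

```python
import functools

# Hypothetical (name, priority) pairs.
extensions = [("GL_EXT_render_snorm", 2), ("GL_ARB_gl_spirv", 1)]

# Python 2 only:
#   extensions.sort(cmp=lambda a, b: cmp(a[1], b[1]))

# Works on Python 2.4+ and Python 3: the key is computed once per item.
by_key = sorted(extensions, key=lambda item: item[1])


def compare(a, b):
    """Old-style comparison function returning -1, 0 or 1."""
    return (a[1] > b[1]) - (a[1] < b[1])


# Wrap a legacy comparison function so it can be passed as a key.
by_cmp = sorted(extensions, key=functools.cmp_to_key(compare))
```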

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Better check for integer types

Python 3 lost the long type: now everything is an int, with the right
size.

This commit makes the script compatible with Python 2 (where we check
for both int and long) and Python 3 (where we only check for int).
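
One common way to write such a check is sketched below; it is not
necessarily how the script does it, and the names integer_types and
is_integer are hypothetical.

```python
try:
    integer_types = (int, long)  # Python 2: both types exist
except NameError:
    integer_types = (int,)       # Python 3: long is gone


def is_integer(value):
    """True for any integer value on either Python version."""
    return isinstance(value, integer_types)
```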

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Do not mix bytes and unicode strings

Mixing the two is a long-standing recipe for errors in Python 2, so much
so that Python 3 now completely separates them.

This commit stops treating both as if they were the same, and in the
process makes the script compatible with both Python 2 and 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Explicitly use a list

On Python 2, the builtin function filter() returns a list.

On Python 3, it returns an iterator.

Since we want to use those objects in contexts where we need lists, we
need to explicitly turn them into lists.

This makes the code compatible with both Python 2 and Python 3.
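
For example, with made-up values:

```python
values = [3, 0, 5, 0]

# On Python 3 this is a lazy iterator: it has no len() and cannot be indexed.
lazy = filter(lambda v: v != 0, values)

# Wrapping the call in list() gives a real list on both Python 2 and 3.
nonzero = list(filter(lambda v: v != 0, values))
print(len(nonzero), nonzero[0])
```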

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Use the right function for the job

The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.

This does change the order of the results slightly, but it should be ok.
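
The sketch below compares the library function with a hand-written loop.
The loop is a hypothetical stand-in, not the code that was replaced; it
yields the same pairs in a different order.

```python
import itertools


def pairs_by_hand(items):
    """Hypothetical hand-rolled equivalent, iterating column by column."""
    result = []
    for i in range(len(items)):
        for j in range(i + 1):
            result.append((items[j], items[i]))
    return result


items = ["x", "y", "z"]
pairs = list(itertools.combinations_with_replacement(items, 2))
# pairs == [('x', 'x'), ('x', 'y'), ('x', 'z'),
#           ('y', 'y'), ('y', 'z'), ('z', 'z')]
manual = pairs_by_hand(items)
```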

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

egl: Fix leak of X11 pixmaps backing pbuffers in DRI3.

This is basically copied from the DRI2 destroy path. Without this,
Raspberry Pi would quickly run out of CMA during the EGL tests in the CTS
due to all the pixmaps lying around.

Fixes: f35198badeb9 ("egl/x11: Implement dri3 support with loader's dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.

When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal: instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
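
The register-number arithmetic can be checked with a few lines of Python.
This models only the (nr | 1) versus (nr + 1) computation described above,
not the hardware itself, and the function names are hypothetical.

```python
def second_half_intended(nr):
    """Register the second mov(8) should read from."""
    return nr + 1


def second_half_hardware(nr):
    """Register the second mov(8) reads, per the (nr | 1) rule."""
    return nr | 1


for nr in (4, 5):
    print(nr, second_half_intended(nr), second_half_hardware(nr))
```

For an even base such as g4 both expressions give g5. For the odd base g5
the intended second register is g6, but (5 | 1) is still 5, so g5 is read
twice.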

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>

i965: Only enable depth IZ signals if there's an actual depthbuffer.

According to the G45 PRM Volume 2 Page 265 we're supposed to only set
these signals when there is an actual depth buffer. Note that we
already do this for the stencil buffer by virtue of brw->stencil_enabled
invoking _mesa_is_stencil_enabled(ctx) which checks whether the current
drawbuffer's visual has stencil bits (which is updated based on what
buffers are bound). We just need to do it for depth as well.

Not observed to fix anything.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

glx: GLX_MESA_multithread_makecurrent is direct-only

This extension is not defined for indirect contexts. Marking it as
"client only", as the old code did here, would make the extension
available in indirect contexts, even though the server would certainly
not have it in its extension list.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

anv: set error in all failure paths

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 5b196f39bddc689742d3 "anv/pipeline: Compile to NIR in compile_graphics"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools: add missing variable initialisation

Fixes: 6a60beba4089315685b8 "intel/tools: Add an error state to aub translator"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drirc: Allow extension midshader for Metro Redux

This fixes both Metro 2033 Redux and Metro Last Light Redux

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99730
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

glsl: handle error case with ast_post_inc, ast_post_dec

Return ir_rvalue::error_value with ast_post_inc, ast_post_dec if
parser error was emitted previously. This way process_array_size
won't see bogus IR generated like with commit 9c676a64273.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98699
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

vc4: Implement texture_subdata() to directly upload tiled data.

This avoids a memcpy into a temporary in the upload path.

Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)

vc4: Handle partial loads/stores of tiled textures.

Previously, we would load out the tile-aligned area, update the raster
copy, and store it back. This was a huge cost for XPutImage calls to the
screen under glamor.

Instead, implement a general load/store path that walks over the source
x/y writing into the corresponding pixel of the destination (using clever
math from
https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/).
If things are aligned, we go through the previous utile-at-a-time loop.

Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)

vc4: Compile the LT image helper per cpp we might load/store.

For the partial load/store support I'm about to add, we want the memcpy to
be compiled out to a single load/store. This should also eliminate the
calls to vc4_utile_width/height().

Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)

vc4: Refactor to reuse the LT tile walking code.

wayland/egl: update surface size on window resize

According to EGL 1.5 spec, section 3.10.1.1 ("Native Window Resizing"):

  "If the native window corresponding to _surface_ has been resized
   prior to the swap, _surface_ must be resized to match. _surface_ will
   normally be resized by the EGL implementation at the time the native
   window is resized. If the implementation cannot do this transparently
   to the client, then *eglSwapBuffers* must detect the change and
   resize surface prior to copying its pixels to the native window."

So far, resizing a native window in Wayland/EGL was interpreted in Mesa
as a request to resize, which is not executed until the first draw call.
Hence, the surface size is not updated until that call, and
querying the surface size with eglQuerySurface() after a window resize
still returns the old values.

This commit updates the surface size values as soon as the native window
is resized, even though the real resize is done in the draw call. As a
result, any native window resize request takes effect immediately,
and if the user calls eglQuerySurface() it will return the new resized
values.

v2: update surface size if there isn't a back surface (Daniel)

CC: Daniel Stone <daniel@fooishbar.org>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>

wayland/egl: initialize window surface size to window size

When creating a window surface with eglCreateWindowSurface(), the
width and height returned by eglQuerySurface(EGL_{WIDTH,HEIGHT}) are
invalid until buffers are updated (for example by calling glClear()).

But according to EGL 1.5 spec, section 3.5.6 ("Surface Attributes"):

  "Querying EGL_WIDTH and EGL_HEIGHT returns respectively the width and
   height, in pixels, of the surface. For a window or pixmap surface,
   these values are initially equal to the width and height of the
   native window or pixmap with respect to which the surface was
   created"

This fixes dEQP-EGL.functional.color_clears.* CTS tests

v2:
- Do not modify attached_{width,height} (Daniel)
- Do not update size on resizing window (Brendan)

CC: Daniel Stone <daniel@fooishbar.org>
CC: Brendan King <brendan.king@imgtec.com>
CC: mesa-stable@lists.freedesktop.org
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>

travis: make drivers explicit in Meson targets

Like in the autotools target, make the list of drivers to be built in
each of the Meson targets explicit.

This will help to identify missing dependencies and other issues more
easily.

CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

svga: use pipe_sampler_view::target in svga_set_sampler_views()

instead of the underlying texture's target. This fixes an issue
where the TGSI sampler type was not agreeing with the sampler view
target/type. In particular, this fixes a Mint 19 XFCE desktop
scaling issue because the TGSI code was using a RECT sampler but
the sampler view's underlying texture was PIPE_TEXTURE_2D.

We want to use the sampler view's type rather than the underlying
resource, as we do for the view's surface format.

No piglit regressions.

VMware issue 2156696.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: use SVGA3D_RS_FILLMODE for vgpu9

I'm not sure why we didn't support this in the past, but fillmode
is supported by all renderers nowadays.

Also fix the logic in svga_create_rasterizer_state() to avoid a few
swtnl cases.

No piglit regressions

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: add TGSI_SEMANTIC_FACE switch case in svga_swtnl_update_vdecl()

Fixes failed assertion running Piglit polygon-mode-face test.
Though, the test still does not pass.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

xlib: remove unused Fake_glXGetAGPOffsetMESA() function

To silence compiler warning.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gl.h: define GLeglImageOES depending on GL_EXT_EGL_image_storage

To avoid duplicate typedef with the definition in glext.h

V2: test for both GL_OES_EGL_image and GL_EXT_EGL_image_storage in
case both the GL and GLES headers are included. Per Emil.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107488
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>

Android: copy -fno*math* options from the autotools build

Add -fno-math-errno and -fno-trapping-math to the build.

Mesa does not depend on the functionality provided, thus this should
result in slightly faster code and smaller binaries.

Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>

autotools: use correct gl.pc LIBS when using glvnd

This is more of a hack, since glvnd itself should be providing the file.
Until that happens, ensure that the libs are correctly set to -lGL.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

glx: automake: add egl.pc/headers TODO when using glvnd

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

egl: automake: add egl.pc/headers TODO when using glvnd

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

autotools: error out when building with mangling and glvnd

It's not a thing that can work, nor is it a wise idea to attempt.

v2: Tweak error message (Dylan)

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)

autotools: error out when using the broken --with-{gl, osmesa}-lib-name

The toggles were broken with the introduction of --enable-mangling.
Fixing that up might be possible, but it's not worth the complexity
since one can rename the libraries at any point.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

meson: recommend building the surfaceless platform

It has no special requirements, and its size and build time are effectively zero.

v2: Rebase

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

automake: require shared glapi when using DRI based libGL

This has been a requirement for ages, yet it seems like we never
explicitly errored out during configure.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>

ttn: remove {varying_slot, frag_result}_to_tgsi_semantic helpers

The respective drivers have been updated and the helpers are no longer
needed.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

travis: remove libedit-dev dependency in LLVM 6.0 targets

In LLVM <6.0 we explicitly added libedit-dev, as it was required to
satisfy apt dependencies.

In LLVM 6.0, this is not required anymore, so let's remove it.

CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

glsl_to_tgsi: plumb image writable through to driver

The virgl driver cares about the writable-flag on image definitions,
because it re-emits GLSL from the TGSI. However, so far it was hardcoded
to true in glsl_to_tgsi, which causes problems when virglrenderer is
running on top of GLES 3.1, where not all formats are supported for
writable images.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels.

We won't have an FD if we're just having the server wait on a fence
created by eglCreateSyncKHR(). Our seqno fences will happen in order, so
server-side waits are no-ops in that case. Fixes
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete

Fixes: b0acc3a5628c ("broadcom/vc4: Native fence fd support")

vc4: Ignore samplers for finding uniform offsets.

Fixes:
dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex

Cc: mesa-stable@lists.freedesktop.org

vc4: Extend dumping of uniforms in QIR and in the command stream.

Similar to what I did for V3D, provide some description of the uniforms.

vc4: Pull uinfo->data[i] dereference out to the top of the loop.

Reduces the size of vc4_uniforms.o by about 10%. We would basically
always end up loading the cacheline of uinfo->data[i] anyway, so it should
be good for performance as well as making the code a bit cleaner.

vc4: Make sure to emit a tile coordinates between two MSAA loads.

The HW only executes a load once the tile coordinates packet happens, and
only tracks one at a time, so by emitting our two MSAA loads back to back
we would end up with an undefined color or Z buffer. The simulator
doesn't seem to care, but sync up the RCL generation with the kernel
anyway.

Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window

vc4: Respect a sampler view's first_layer field.

Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture

Cc: mesa-stable@lists.freedesktop.org

virgl: add ARB_shader_clock support

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

python: Specify the template output encoding

We're trying to write a unicode string (i.e. decoded) to a file opened
in binary (i.e. encoded) mode.

In Python 2 this works, because of the automatic conversion between
byte and unicode strings.

In Python 3 this fails though, as no automatic conversion is attempted.

This change makes the scripts compatible with both versions of Python.
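
A minimal sketch of the pattern, using only the standard library rather
than the actual build scripts. The write_rendered() helper, the sample
text and the file name are hypothetical. The text is encoded explicitly
before being written to a file opened in binary mode, which behaves the
same way on Python 2 and Python 3:

```python
import os
import tempfile


def write_rendered(path, text, encoding="utf-8"):
    """Write a unicode string to a file opened in binary mode.

    Encoding explicitly avoids relying on Python 2's implicit
    unicode-to-bytes conversion, which Python 3 does not perform.
    """
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


# Hypothetical rendered output containing a non-ASCII character.
rendered = u"/* Copyright \u00a9 example */\n"
out_path = os.path.join(tempfile.mkdtemp(), "generated.h")
write_rendered(out_path, rendered)

# Read the bytes back to confirm what landed on disk.
with open(out_path, "rb") as f:
    data = f.read()
```

Template engines such as Mako expose the same idea through an
output_encoding argument, so that rendering returns bytes directly.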

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Fix rich comparisons

Python 3 doesn't call objects' __cmp__() methods any more to compare
them. Instead, it requires implementing the rich comparison methods
explicitly: __eq__(), __ne__(), __lt__(), __le__(), __gt__() and __ge__().

Fortunately Python 2 also supports those.

This commit only implements the comparison methods which are actually
used by the build scripts.
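
As an illustration only, here is a sketch of a class that defines just
the comparisons it needs. The Version class is hypothetical and is not
taken from the build scripts:

```python
class Version(object):
    """Hypothetical value type ordered by a (major, minor) tuple."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    # Only the comparisons this example actually uses are defined.
    def __eq__(self, other):
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive __ne__ from __eq__, so define it.
        return not self == other

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ removes the default __hash__ on Python 3.
        return hash(self._key())


versions = [Version(4, 5), Version(3, 3), Version(4, 0)]
ordered = sorted(versions)  # sorted() only relies on __lt__
```

Sorting works on both Python 2 and Python 3 because sorted() only needs
__lt__(); no __cmp__() method is involved.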

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Use explicit integer divisions

In Python 2, divisions of integers return an integer:

    >>> 32 / 4
    8

In Python 3 though, they return floats:

    >>> 32 / 4
    8.0

However, Python 3 has an explicit integer division operator:

    >>> 32 // 4
    8

That operator exists on Python >= 2.2, so let's use it everywhere to
make the scripts compatible with both Python 2 and 3.

In addition, using __future__.division tells Python 2 to behave the same
way as Python 3, which helps ensure the scripts produce the same output
in both versions of Python.
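
A small sketch of why this matters for generated output. The align_up()
and dword_count() helpers are hypothetical and not part of the scripts.
With "//" the results stay integers on both Python versions, so they
can still be used as sizes or indices:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def dword_count(bits):
    """Number of 32-bit words needed to hold the given number of bits."""
    return align_up(bits, 32) // 32


aligned = align_up(33, 32)
words = dword_count(33)

# Note that // floors toward negative infinity rather than truncating.
floored = -7 // 2
```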

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

egl/main: Add bits for EGL_KHR_mutable_render_buffer

A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
This patch is separate from the Android patch because I think it's
easier to review the platform-independent bits separately.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

dri: Add param driCreateConfigs(mutable_render_buffer)

If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.

Not used yet.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

dri: Define DRI_MutableRenderBuffer extensions

Define extensions DRI_MutableRenderBufferDriver and
DRI_MutableRenderBufferLoader. These are the two halves for
EGL_KHR_mutable_render_buffer.

Outside the DRI code there is one additional change. Add
gl_config::mutableRenderBuffer to match
__DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither is used yet.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

egl/dri2: In dri2_make_current, return early on failure

This pulls an 'else' block into the function's main body, making the
code easier to follow.

Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
transforms dri2_make_current() into spaghetti.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

egl: Simplify queries for EGL_RENDER_BUFFER

There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
eglQuerySurface(EGL_RENDER_BUFFER) and
eglQueryContext(EGL_RENDER_BUFFER).

These changes eliminate potentially very fragile code in the upcoming
EGL_KHR_mutable_render_buffer implementation.

* eglQuerySurface(EGL_RENDER_BUFFER)

  The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
  abstruse logic which required comprehending the specification
  complexities of how the two EGL_RENDER_BUFFER states interact.  The
  function sometimes returned _EGLContext::WindowRenderBuffer, sometimes
  _EGLSurface::RenderBuffer. Why? The function tried to encode the
  actual logic from the EGL spec. When did the function return which
  variable? Go study the EGL spec, hope you understand it, then hope
  Mesa mutated the EGL_RENDER_BUFFER state in all the correct places.
  Have fun.

  To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
  confidence in its correctness, flatten its indirect logic. For pixmap
  and pbuffer surfaces, simply return a hard-coded literal value, as the
  spec suggests. For window surfaces, simply return
  _EGLSurface::RequestedRenderBuffer.  Nothing difficult here.

* eglQueryContext(EGL_RENDER_BUFFER)

  The implementation of this suffered from the same issues as
  eglQuerySurface, and the solution is the same.  To improve confidence
  in its correctness, flatten its indirect logic. For pixmap and pbuffer
  surfaces, simply return a hard-coded literal value, as the spec
  suggests. For window surfaces, simply return
  _EGLSurface::ActiveRenderBuffer.
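
The flattened logic of both queries can be sketched as follows. This is
a simplified Python model, not Mesa's C code: the Surface class and the
function names are hypothetical stand-ins. The fixed values for pixmap
and pbuffer surfaces, and EGL_NONE for a context with no bound surface,
are assumptions taken from the EGL spec wording rather than from this
patch:

```python
from enum import Enum


class SurfaceType(Enum):
    WINDOW = "window"
    PIXMAP = "pixmap"
    PBUFFER = "pbuffer"


class RenderBuffer(Enum):
    BACK = "EGL_BACK_BUFFER"
    SINGLE = "EGL_SINGLE_BUFFER"
    NONE = "EGL_NONE"


class Surface(object):
    """Simplified stand-in for _EGLSurface; not Mesa's real layout."""

    def __init__(self, surface_type,
                 requested=RenderBuffer.BACK, active=RenderBuffer.BACK):
        self.type = surface_type
        self.requested_render_buffer = requested
        self.active_render_buffer = active


# Hard-coded answers for non-window surfaces (assumed from the EGL spec).
_FIXED = {
    SurfaceType.PIXMAP: RenderBuffer.SINGLE,
    SurfaceType.PBUFFER: RenderBuffer.BACK,
}


def query_surface_render_buffer(surface):
    """Model of eglQuerySurface(EGL_RENDER_BUFFER)."""
    if surface.type in _FIXED:
        return _FIXED[surface.type]
    return surface.requested_render_buffer


def query_context_render_buffer(draw_surface):
    """Model of eglQueryContext(EGL_RENDER_BUFFER)."""
    if draw_surface is None:
        return RenderBuffer.NONE
    if draw_surface.type in _FIXED:
        return _FIXED[draw_surface.type]
    return draw_surface.active_render_buffer


window = Surface(SurfaceType.WINDOW,
                 requested=RenderBuffer.SINGLE, active=RenderBuffer.BACK)
pixmap = Surface(SurfaceType.PIXMAP)
pbuffer = Surface(SurfaceType.PBUFFER)
```

Each query reads exactly one field for window surfaces, so the two
states can differ (requested versus active) without either function
needing to know about the other.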

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

radeonsi: set GLC=1 for all write-only shader resources

radeonsi: don't load block dimensions into SGPRs if they are not variable

travis: meson/Vulkan requires LLVM 6.0

RADV now requires LLVM 6.0.

Fixes: fd1121e8399 ("amd: remove support for LLVM 5.0")
CC: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>

travis: add ubuntu-toolchain-r-test

LLVM 6.0 requires libstdc++4.9, which is not available in the main Travis
repository.

v2: LLVM 6.0 requires libstdc++4.9, rather than GCC 4.9 (Jan Vesely)

Fixes: fd1121e8399 ("amd: remove support for LLVM 5.0")
CC: Marek Olšák <marek.olsak@amd.com>
CC: Emil Velikov <emil.velikov@collabora.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

egl: set EGL_BAD_NATIVE_PIXMAP in the copy_buffers fallback

As the spec says:

EGL_BAD_NATIVE_PIXMAP is generated if the implementation
does not support native pixmaps.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

egl/x11: use the no-op dri2_fallback_copy_buffers for swrast

Currently dri2_copy_buffers is used for swrast, which depends on the
DRI2_FLUSH extension. Since that's not a thing on software based
drivers we crash out.

Do the slightly more graceful thing of returning EGL_FALSE.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

egl: remove unneeded _eglGetNativePlatform check

There's little point in calling _eglGetNativePlatform() in
eglCopyBuffers. The platform returned should be identical to the one
already stored in our _EGLDisplay.

In the following corner case, the check is incorrect.

The function _eglGetNativePlatform effectively invokes the old-style
eglGetDisplay platform selection. Thus if the EGL_PLATFORM platform does
not match with the EGL_EXT_platform_* used to create the display we'll
error out.

Addresses the egl-copy-buffers piglit test.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

travis: use https for all the links

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

autoconf: stop exporting internal wayland details

With version v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".

Previously, the generated interface symbol was exported, which is a bad
idea since it's an internal implementation detail that others may misuse.

That was the case with libva approx. 1 year ago. Since then libva was
fixed, so we can finally hide it by using "private-code"

Inspired by similar xserver patch by Adam Jackson.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

meson: stop exporting internal wayland details

With version v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".

Previously, the generated interface symbol was exported, which is a bad
idea since it's an internal implementation detail that others may misuse.

That was the case with libva approx. 1 year ago. Since then libva was
fixed, so we can finally hide it by using "private-code"

Inspired by similar xserver patch by Adam Jackson.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

meson: use dependency()+find_program() for wayland-scanner

Helps when the native wayland-scanner is located outside of PATH.
Inspired by the xserver code ;-)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>