review.tizen.org Git - platform/upstream/mesa.git/log

llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info

I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I
actually thought it should not be necessary and a piglit run didn't show
any differences, but this shouldn't have been in there.
draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER.
The new polygon-mode-facing test indeed shows why this is necessary, there's
lots of invalid reads and writes with valgrind (also crashes without
valgrind), because the pre-pipeline vertex size doesn't match the
post-pipeline vertex size (note this won't help much with stages which don't
have the prepare hook which can grow the vertex size, in particular the wide
point stage, but this isn't used by llvmpipe). The test still won't pass, of
course, but it is only usage of uninitialized values now, which is much
less dangerous...
(Albeit I'm pretty sure for i915 it really is not needed anymore as it
doesn't care about the extra outputs and doesn't call
draw_prepare_shader_outputs().)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiers

Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm))
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gk110/ir: fix load from shared memory

It was accidentally using the store opcode.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gk110/ir: add partial BAR support

This is enough for the plain TGSI BARRIER implementation.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

Revert "glsl: move uniform calculation to link_uniforms"

This reverts commit 4475d8f9169195baefa893b9b147fe20414cda7c.

glsl: move uniform calculation to link_uniforms

Patch moves uniform calculation to happen during link_uniforms, this
is possible with help of UniformRemapTable that has all the reserved
locations.

Location assignment for implicit locations is changed so that we
utilize also the 'holes' that explicit uniform location assignment
might have left in UniformRemapTable, this makes it possible to fit
more uniforms as previously we were lazy here and wasting space.

Fixes following CTS tests:
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
   ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array

v2: code cleanups, increment NumUniformRemapTable correctly, fix
    find_empty_block to work properly and add some more comments.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>

glsl: add missing explicit_image_format flag to has_layout()

Fixes piglit regression after fixes to duplicate layout rules.

Previously catching multiple layouts was relying on the code
meant to catch duplicates within a single layout(...), this
change triggers the rules for multiple layouts.

Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats

If we have a d24x8 format, there is no stencil. Therefore, we can always
clear these bits too, which means this will be some kind of memset rather
than read-modify-write.
This is good for some 7% increase or so in gears with huge window size -
seems to have a bigger effect if things aren't in caches. Of course, any
real app won't spend nearly as much time comparatively in clearing
depth buffer in the first place, so the speedup will be much lower.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

i965: Implement compute sampler state atom.

Fixes a number of GLES31 CTS failures and hangs on various hardware:

ES31-CTS.texture_gather.plain-gather-depth-2d
ES31-CTS.texture_gather.plain-gather-depth-2darray
ES31-CTS.texture_gather.plain-gather-depth-cube
ES31-CTS.texture_gather.offset-gather-depth-2d
ES31-CTS.texture_gather.offset-gather-depth-2darray
ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader
ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader
ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers
ES31-CTS.compute_shader.resources-texture

Some of them were actually passing by luck on some generations even
though we weren't uploading sampler state tables explicitly for the
compute stage, most likely because they relied on the cached sampler
state left from previous rendering to be close enough.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725
Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965: Trigger CS state reemission when new sampler state is uploaded.

This reuses the NEW_SAMPLER_STATE_TABLE state bit (currently only used
on pre-Gen7 hardware) to signal that the sampler state tables have
changed in order to make sure that the GPGPU interface descriptor is
updated.

Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

glsl: Don't abbreviate tessellation shader stage names.

I have a patch that writes shaders as .shader_test files, and it uses
this function to create the headers (i.e. [vertex shader]).

[tess ctrl shader] isn't a valid shader_runner header - it's spelled
out as [tessellation control shader].

There's no real reason to abbreviate it, so spell it out.

v2: Rebase on Rob's patches to move the code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

mesa: remove link validation that should be done elsewhere

Even if re-linking fails rendering shouldn't fail as the previous
succesfully linked program will still be available. It also shouldn't
be possible to have an unlinked program as part of the current rendering
state.

This fixes a subtest in:
ES31-CTS.sepshaderobjs.StateInteraction

This change should improve performance on CPU limited benchmarks as noted
in commit d6c6b186cf308f.

>From Section 7.3 (Program Objects) of the OpenGL 4.5 spec:

   "If a program object that is active for any shader stage is re-linked
    unsuccessfully, the link status will be set to FALSE, but any existing
    executables and associated state will remain part of the current rendering
    state until a subsequent call to UseProgram, UseProgramStages, or
    BindProgramPipeline removes them from use. If such a program is attached to
    any program pipeline object, the existing executables and associated state
    will remain part of the program pipeline object until a subsequent call to
    UseProgramStages removes them from use. An unsuccessfully linked program may
    not be made part of the current rendering state by UseProgram or added to
    program pipeline objects by UseProgramStages until it is successfully
    re-linked."

   "void UseProgram(uint program);

   ...

   An INVALID_OPERATION error is generated if program has not been linked, or
   was last linked unsuccessfully.  The current rendering state is not modified."

V2: apply the rule to both core and compat.

Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

glsl: allow multiple layout qualifiers for a single declaration

From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single
   declaration. If the same layout-qualifier-name occurs in
   multiple layout qualifiers for the same declaration, the
   last one overrides the former ones."

The parser was already failing correctly when the extension is
not available but testing for duplicates within a single layout
qualifier was still causing this to fail when available as both
cases share the same function for merging.

Here we add a parameter to differentiate between the two uses
and apply it to the duplicate test.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: update parser to allow duplicate default layout qualifiers

In order to only create a single node for each default declaration
we add a new boolean parameter to the in/out merge function to
only create one once we reach the rightmost layout qualifier.

From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single
   declaration. If the same layout-qualifier-name occurs in
   multiple layout qualifiers for the same declaration, the
   last one overrides the former ones."

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: move default layout qualifier rules out of the parser

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: split layout_defaults into specific types

This will allow merging of duplicate layout qualifiers as allowed
by ARB_shading_language_420pack

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: allow duplicate layout-qualifier-names

This is added by ARB_enhanced_layouts although it doesn't fit
into any of the six main changes so we enable this independently.

From the ARB_enhanced_layouts spec:

   "More than one layout qualifier may appear in a single
   declaration. Additionally, the same layout-qualifier-name
   can occur multiple times within a layout qualifier or across
   multiple layout qualifiers in the  same declaration"

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

i965/vec4: Spaces around operators.

i965: Inform compiler of variable range to silence warning.

Extends commit 6531ccb70 to silence the warning in release builds as
well.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

glsl: Restore Mesa-style to shader_enums.c/h.

st/va: fix motion adaptive deinterlacing

Signed-off-by: Christian König <christian.koenig@amd.com>

util/u_pstipple.c: copy immediates during transformation

Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...

Fixes a bug in Kodi with radeonsi reported by Christian König.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Move sanity check of BindVertexBuffer for OpenGL ES 3.1

Sanity check of BindVertexBuffer for OpenGL ES in
_mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

glsl: fix interface block error message

Print the stream value not the pointer to the expression,
also use the unsigned format specifier.

Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv50/ir: swap the least-ref'd source into src1 when both const/imm

The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.

This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
total local used in shared programs   : 30372 -> 30288 (-0.28%)
total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)

                local        gpr       inst      bytes
    helped          21         822        3332        3332
      hurt           0         278         565         565

And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.

On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
total local used in shared programs   : 3552 -> 3552 (0.00%)
total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)

                local        gpr       inst      bytes
    helped           0        1380        3252        3774
      hurt           0         287        1710        1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

st/mesa: restore the stObj's size if it was cleared out

An issue could still occur if the base level is set, but fixing that
would require a lot more logic.

This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

freedreno/a4xx: use smaller threadsize for more registers

Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize. (The vertex shader already
runs at TWO_QUADS, which is the minimum.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno: per-generation OUT_IB packet

Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per
generation. Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).

Signed-off-by: Rob Clark <robclark@freedesktop.org>

gallium: bundle the compat header u_pwr8.h in the tarball

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

mapi: include gl.xml in the tarball

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

i965: adding missing headers to the dist tarball

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

st/va: add motion adaptive deinterlacing v2

v2: minor cleanup

Signed-off-by: Christian König <christian.koenig@amd.com>

gallium/radeon: Rename do_invalidate_resource to invalidate_buffer

And only call it from r600_invalidate_resource for buffer resources.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/dri: Don't call invalidate_resource for NULL depth/stencil buffers

Fixes crash in 4 EGL piglit tests with radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

radeonsi: Print "LLVM emitted unknown config register" warning only once

Say "LLVM" instead of "Compiler" for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

llvmpipe: use vpkswss when dst is signed

This patch fixes a bug when building a pack instruction.

For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.

This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale

I've also corrected some coding style issues in the function.

v2: Returned else statements to vmware style

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

glsl: fix subroutine lowering reusing actual parmaters

One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).

this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>

glsl: remove special case for detecting stream duplicates

Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

glsl: add missing explicit_stream flag to has_layout()

This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

mesa: fix segfault in glUniformSubroutinesuiv()

From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:

   "The command

       void UniformSubroutinesuiv(enum shadertype, sizei count,
                                  const uint *indices);

   will load all active subroutine uniforms for shader stage
   shadertype with subroutine indices from indices, storing
   indices[i] into the uniform at location i. The indices for
   any locations between zero and the value of
   ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
   used will be ignored."

V2: simplify NULL check suggested by Jason.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731

glsl: fix segfault linking subroutine uniform with explicit location

Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org

gm107/ir: don't do indirect frag shader inputs on GM107

Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

tgsi: initialize Atomic field in tgsi_default_declaration

Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nvc0: bsp_bo can't be null

We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

llvmpipe: fix arguments order given to vec_andc

This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

freedreno/ir3: fix mad 3rd src delay calc

In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: better array register allocation

Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: array offset can be negative

It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: workaround bug/feature

Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions. This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

ttn: use writemask for store_var

Only user is freedreno, and after array-rework it can cope. Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: array rework

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: refactor/simplify cp

If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: fix incorrect decoding of mov instructions

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: remove unused tgsi tokens ptr

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: bit of ra refactor

Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/ir3: cosmetic de-indent

Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

ttn: add missing writemask on store_output

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir/print: const_index is signed

Noticed this with $piglit/bin/vp-address-01

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: few missing struct names

nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'. But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.

So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>

nv50/ir: add saturate support on ex2

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gallivm: avoid crashing in mod by 0 with llvmpipe

This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).

The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions. Implementations of
current language revisions may or may not perform them.

This patch makes Mesa apply implicti conversions even on current
language versions. Applications appear to expect this behavior,
and there's really no downside to doing so.

Fixes shader compilation in Shadow of Mordor.

Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org

i965/fs: Always set channel 2 of texture headers in some stages

In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers.  However, in compute,
geometry, and tessellation stages, the hardware is not so nice.  In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15.  As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask.  This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled.  Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

i965/fs/generator: Take an actual shader stage rather than a string

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/vec4: Use UW type for multiply into accumulator on GEN8+

BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."

Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

llvmpipe: ditch additional ref counting for vertex/geometry sampler views

The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

llvmpipe: fix "leaking" textures

This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...

v2: rename var

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

glsl: restrict consumer stage condition to modify interpolation type

Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.

If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.

Fixes the following CTS test:
ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate

Fixes the following dEQP tests:

dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

i965: Apply add_const_offset_to_base for vec4 VS inputs too.

This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965: Make add_const_offset_to_base() work at the shader level.

This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965: Make an is_scalar boolean in brw_compile_vs().

Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir/builder: Add a nir_build_ivec4() convenience helper.

nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

glsl: mark explicit uniforms as explicit in other stages too

If shader declares uniform explicit location in one stage but
implicit in another, explicit location should be used. Patch marks
implicit uniforms as explicit if they were explicit in previous stage.
This makes sure that we don't treat them implicit later when assigning
locations.

Fixes following CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-implicit-in-some-stages3

v2: move check to cross_validate_globals (Timothy)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>

i965/gen7.5+: Disable resource streamer during GPGPU workloads.

The RS and hardware binding tables are only supported on the 3D
pipeline and can lead to corruption if left enabled during a GPGPU
workload. Disable it when switching to the GPGPU (or media) pipeline
and re-enable it when switching back to the 3D pipeline.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

i965/gen7: Emit stall and dummy primitive draw after switching to the 3D pipeline.

This hardware bug can supposedly lead to a hang on IVB and VLV.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.

AFAIK brw_emit_select_pipeline() is only called once during context
init on Gen4-5, at which point the pipeline is likely to be already
idle so it may just happen to work by luck regardless of the MI_FLUSH.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen6-7: Implement stall and flushes required prior to switching pipelines.

Switching the current pipeline while it's not completely idle or the
read and write caches aren't flushed can lead to corruption. Fixes
misrendering of at least the following Khronos CTS test:

ES31-CTS.shader_image_load_store.basic-allTargets-store-fs

The stall and flushes are no longer required on Gen8+.

v2: Emit PIPE_CONTROL with non-zero post-sync op before the write
cache flush on SNB due to hardware bug. (Ken)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.

This hardware bug can cause a hang on context restore while the
current pipeline is set to GPGPU (BDWGFX HSD 1909593). In addition to
clearing the valid bit, mark the CC state as dirty to make sure that
the CC indirect state pointer is re-emitted when we switch back to the
3D pipeline.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add state bit to trigger re-emission of color calculator state.

This will be used on Gen8+ to make sure that the color calculator
state pointers are re-emitted when switching back to the 3D pipeline
after some GPGPU workload due to a hardware workaround. There are
other state bits already defined that could be used to achieve the
same effect but they all cause a ton of unrelated state to be
re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new
one, state bits are cheap.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:

total local used in shared programs : 54116 -> 30372 (-43.88%)

Probably modest advantage to execution, but it's an imporant
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection

Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.

This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs : 944051 -> 945131 (0.11%)
total local used in shared programs : 54116 -> 54116 (0.00%)

A typical case is for register usage to double and for instructions to
halve. A future commit can also optimize local memory usage size to be
reduced with better packing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: be careful about propagating very large offsets into const load

Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.

This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.

Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: allow fragment shader inputs to use indirect indexing

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

st/mesa: use surface format to generate mipmaps when available

This fixes the recently posted mipmap + texture views piglit test.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

radeonsi: don't miss changes to SPI_TMPRING_SIZE

I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

svga: add DXGenMips command support

For those formats that support hw mipmap generation, use the
DXGenMips command. Otherwise fallback to the mipmap generation utility.

Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)

v2: make sure the texture surface was created with the render target bind flag
set relocation flag to SVGA_RELOC_WRITE for the texture surface

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

svga: add num-generate-mipmap HUD query

The actual increment of the num-generate-mipmap counter will be done
in a subsequent patch when hw generate mipmap is supported.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium/st: add pipe_context::generate_mipmap()

This patch adds a new interface to support hardware mipmap generation.
PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify
if this new interface is supported; if not supported, the state tracker will
fallback to mipmap generation by rendering/texturing.

v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
v3: add format to the generate_mipmap interface to allow mipmap generation
using a format other than the resource format
v4: fix return type of trace_context_generate_mipmap()

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

st/mesa: declare struct pipe_screen in st_cb_bufferobjects.h

To silence a compiler warning. Trivial.

nir: Lower bitfield_extract.

The OpenGL specifications for bitfieldExtract() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to to handle the trivial case of <bits> = 32 as

   bitfieldExtract:
      bits > 31 ? value : bfe(value, offset, bits)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>

nir: Handle <bits>=32 case in bitfield_insert lowering.

The OpenGL specifications for bitfieldInsert() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.

This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as

   bitfieldInsert:
      bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>

st/mesa: add check for color logicop in blit_copy_pixels()

We check that a bunch of raster operations are disabled in
blit_copy_pixels(). We also need to check that color logicop is
disabled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/radeon: do not reallocate user memory buffers

The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE

This accomodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: implement Driver.InvalidateBufferSubData

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/mesa: use pipe->invalidate_resource instead of buffer re-allocation

Drivers are expected to avoid unnecessary work when possible in this code
path.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: add PIPE_CAP_INVALIDATE_BUFFER

It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: add Driver.InvalidateBufferSubData

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>