platform/upstream/mesa.git
Erik Faye-Lund [Wed, 6 Jan 2021 05:59:27 +0000 (06:59 +0100)]
zink: remove support for fcsel

fcsel is only emitted by bool -> float lowering. We used to do that a
long time ago, but no longer. So we don't need to support this opcode
any longer.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8347>

Erik Faye-Lund [Wed, 6 Jan 2021 06:38:41 +0000 (07:38 +0100)]
zink: also lower scmp for soft-fp

We recently added two versions of these options, due to soft-fp support.
So let's also add the lowering to the soft-fp version.

Fixes: 43302ead383 ("zink: use lower_scmp instead of open-coding")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8347>

Boris Brezillon [Tue, 5 Jan 2021 13:02:07 +0000 (14:02 +0100)]
panfrost: Fix AFBC on Bifrost v6

The AFBC layout of RT/ZS-extension descriptors on Bifrost v6 matches the
v7 one except for the Block Format field. Update the set_buf() functions
accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8328>

Yogesh mohan marimuthu [Mon, 14 Dec 2020 15:56:27 +0000 (21:26 +0530)]
radeonsi: enable vrs2x2 coarse shading if flat shading (v9)

Enable vrs2x2 coarse shading when flat shading is in use, following the
idea and guidance given by Marek.

The is_flat_shading variable in struct si_shader_info is set
based on the data from the gather_intrinsic_info() function
and struct si_state_rasterizer. If is_flat_shading is set,
si_emit_db_render_state() enables vrs2x2 shading in hardware.

v2: Fix review comments from Pierre-Eric. Code optimizations.
v3: Fix indentation style issue.
v4: Fix review comments from Marek. Fixed logical issue pointed
    by Marek where info->is_flat_shading variable can be corrupted
    and other code cleanup.
v5: Make the code compact as suggested by Pierre-Eric.
v6: Fix new review comments from Marek.
v7: Use the info->uses_interp_color variable fix from Marek.
v8: Fix coding style comment from Marek.
v9: Add uses_fbfetch_output check as suggested by Marek.

Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8161>

Eric Anholt [Mon, 21 Dec 2020 23:29:19 +0000 (15:29 -0800)]
gallium/ntt: Add support for PIPE_CAP_LOAD_CONSTBUF.

We needed to do this anyway to finish enabling NTT in general, but more
importantly: when we enabled sending NIR to the draw module, that broke
PIPE_CAP_LOAD_CONSTBUF drivers in the select/feedback paths if LLVM was
disabled.

Fixes: 44b7e1497f91 ("st/mesa: don't generate TGSI for the draw VS because it now supports NIR too")
(along with the rest of this MR)

Closes: #3996
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Tue, 22 Dec 2020 00:22:03 +0000 (16:22 -0800)]
gallium/ntt: Fix load_ubo_vec4 buffer index setup.

I had a funny +1 in nir_to_tgsi's load_ubo lowering on the buffer index,
because I hadn't set lower_uniform_to_ubo for softpipe.  This removes that
weirdness in favor of just using lower_uniform_to_ubo, regardless of
driver preference (which matters if a NIR-native driver had it set, and
then the gallium draw module triggered the non-LLVM TGSI fallback path
that hit NTT).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Tue, 22 Dec 2020 01:12:02 +0000 (17:12 -0800)]
gallium/ntt: Fix dynamic indirect indexing of per_vertex_input.

It was off by one due to some copy and paste from UBO handling.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Fri, 11 Dec 2020 21:11:26 +0000 (13:11 -0800)]
gallium/ntt: Fix emitting UBO declarations.

Fixes: d70fff99c5bc ("nir: Use a single list for all shader variables")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Tue, 5 Jan 2021 19:15:06 +0000 (11:15 -0800)]
gallium/tgsi_exec: Add support for PIPE_CAP_LOAD_CONSTBUF.

Now that we can end up in nir-to-tgsi in the draw fallback paths of
drivers with that flag set, we need to support it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Tue, 5 Jan 2021 19:12:11 +0000 (11:12 -0800)]
gallium/tgsi_exec: Refactor to fix CS local memory overflow checks.

It was OK because right now we only execute in the first channel of the
CS, but if you wanted to extend that then you'd need to check each
channel.  We already had what we needed for SSBOs, so just reuse it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Eric Anholt [Thu, 10 Dec 2020 19:34:48 +0000 (11:34 -0800)]
gallium/tgsi_exec: Fix assertion failure about missing constbufs.

GL by default gives you UB when you access a missing constbuf, and we were
crashing on debug builds in that case.  More importantly, we were
assertion failing even under valid circumstances, when a !ExecMask channel
had a bad value for the indirect buffer index and we tried to load from it
anyway.

In removing the assertion, also sink the buf declaration to after we've
done the bounds check that determines that there's a constbuf actually
bound to this index.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8196>

Jesse Natalie [Mon, 14 Dec 2020 19:59:53 +0000 (11:59 -0800)]
d3d12: Don't allocate mappable textures

There's not really a reason to directly map textures. Doing so
requires the texture to be allocated in system RAM instead of
video RAM, which means all GPU access to it would be needlessly slow.

Notably, the one texture type that was allocated this way is the
display target texture for the software driver path. Instead, use
pipe_transfer_map to be able to copy the texture to system RAM.

Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>

Jesse Natalie [Mon, 14 Dec 2020 19:53:39 +0000 (11:53 -0800)]
d3d12: Use an appropriate pipe resource usage for map intermediates

Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>

Jesse Natalie [Mon, 14 Dec 2020 19:53:13 +0000 (11:53 -0800)]
d3d12: Use buffer pipe usage to inform allocation

For non-CPU-accessible pipe resource types (DEFAULT/IMMUTABLE),
allocate non-CPU-accessible buffers directly from the cache_bufmgr.
Update the d3d12_bo creation to handle nonmappable buffers.

For CPU-write-only (DYNAMIC/STREAM), use the upload slab_bufmgr.
Update this slab manager to use CPU_WRITE | GPU_READ PB usage.

For CPU-read-write (STAGING), use the readback_slab_bufmgr.

Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
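
The usage-to-allocator mapping described in this commit can be sketched as a simple switch. This is only an illustration: the enum and function names below are hypothetical stand-ins, not the identifiers used in the d3d12 driver.

```c
#include <assert.h>

/* Hypothetical stand-ins for the pipe resource usage values named above. */
enum demo_usage {
    DEMO_USAGE_DEFAULT,
    DEMO_USAGE_IMMUTABLE,
    DEMO_USAGE_DYNAMIC,
    DEMO_USAGE_STREAM,
    DEMO_USAGE_STAGING
};

/* Hypothetical names for the three buffer managers mentioned above. */
enum demo_pool {
    DEMO_POOL_CACHE,         /* non-CPU-accessible buffers */
    DEMO_POOL_UPLOAD_SLAB,   /* CPU write, GPU read */
    DEMO_POOL_READBACK_SLAB  /* CPU read-write staging */
};

/* Pick an allocator based on how the CPU is expected to touch the buffer. */
enum demo_pool demo_pool_for_usage(enum demo_usage usage)
{
    switch (usage) {
    case DEMO_USAGE_DEFAULT:
    case DEMO_USAGE_IMMUTABLE:
        return DEMO_POOL_CACHE;
    case DEMO_USAGE_DYNAMIC:
    case DEMO_USAGE_STREAM:
        return DEMO_POOL_UPLOAD_SLAB;
    case DEMO_USAGE_STAGING:
        return DEMO_POOL_READBACK_SLAB;
    }
    assert(!"unknown usage value");
    return DEMO_POOL_CACHE;
}
```

Grouping the cases by CPU access pattern keeps the decision in one place, so a new usage value only needs one extra case.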

Jesse Natalie [Mon, 14 Dec 2020 19:45:16 +0000 (11:45 -0800)]
d3d12: Add a slab bufmgr for readback buffers

Readback (GPU write, CPU read) should use different CPU page
properties compared to upload (write-back vs write-combined).

A future commit will start to respect these PB usage flags.

Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>

Jesse Natalie [Mon, 14 Dec 2020 19:42:49 +0000 (11:42 -0800)]
d3d12: Add a path for mapping of not-directly-mappable buffers

Currently all buffers are allocated as mappable, but a future
commit will change that so that some buffers can be allocated
directly in non-CPU-accessible memory for improved performance.

Note that the returned pointer must be appropriately offset from
a 64-byte-aligned base pointer, so if offsets are used, the data
will be read/written to an offset region in the staging buffer.

Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8095>
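
One possible reading of the alignment note in this commit, sketched as plain arithmetic. It assumes the staging buffer base is 64-byte aligned and that the mapped pointer should keep the same misalignment as the requested offset. The struct and function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAP_ALIGNMENT 64

/* Hypothetical description of where the data lives in a staging buffer. */
struct demo_staging_layout {
    size_t size;         /* bytes to allocate for the staging buffer */
    size_t data_offset;  /* where the mapped range starts inside it */
};

/* Keep the requested offset's position relative to a 64-byte boundary. */
struct demo_staging_layout demo_staging_layout_for_range(size_t offset, size_t size)
{
    struct demo_staging_layout layout;

    layout.data_offset = offset % DEMO_MAP_ALIGNMENT;
    layout.size = layout.data_offset + size;

    assert(layout.data_offset < DEMO_MAP_ALIGNMENT);
    return layout;
}
```

With this layout, a caller that maps offset 100 gets a pointer 36 bytes into the staging buffer, which is the same distance past a 64-byte boundary as byte 100 of the real buffer.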

Jesse Natalie [Mon, 28 Dec 2020 23:45:58 +0000 (15:45 -0800)]
nir: Update saturated float->int/uint conversion algorithm

The mantissa for a float doesn't contain enough data to accurately represent
the min/max values for some destination types. Instead of clamping before
converting, clamp after converting when coming from floats. This improves
conformance of CL conversions, specifically for float -> long/ulong with
int64 emulation enabled.

Refactors the limit determination from the clamp, so we can determine
limits for the dest type (int/uint) in both the source (float) and dest
type. The limit as a float is used for comparison, while the limit as a
dest type is used for bcsel.

Note that the comparison is inverted to fge instead of flt,
so the bcsel chooses the direct int/uint over the converted float in the
case where the comparison comes up equal, but the conversion can't produce
the exact min/max value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8256>
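
A minimal C sketch of the idea for float -> int64, not the NIR implementation. The function name is hypothetical. NIR converts first and then selects with bcsel; plain C has to branch before converting, because an out-of-range float-to-integer conversion is undefined behaviour there. The sketch assumes IEEE-754 floats with round-to-nearest, and it assumes NaN saturates to 0 as OpenCL's convert_*_sat functions do.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: saturating float -> int64 conversion. */
int64_t demo_f32_to_i64_sat(float f)
{
    /* Limits of the destination type expressed in the source type.
     * INT64_MAX is not representable as a float, so it rounds up to 2^63. */
    const float max_as_float = (float)INT64_MAX;
    const float min_as_float = (float)INT64_MIN; /* exactly -2^63 */

    /* Sanity check of the rounding assumption stated above. */
    assert(max_as_float == 9223372036854775808.0f);

    if (isnan(f))
        return 0; /* assumption: NaN maps to 0 */

    /* ">=" rather than "<": when f equals the float limit, use the exact
     * integer limit, since converting 2^63 cannot produce INT64_MAX. */
    if (f >= max_as_float)
        return INT64_MAX;
    if (f <= min_as_float)
        return INT64_MIN;

    return (int64_t)f; /* in range, so the conversion is well defined */
}
```

Clamping in the float domain first would leave 2^63 as the clamped value, which is still one past INT64_MAX. Selecting the integer limit directly avoids that.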

Eric Anholt [Mon, 4 Jan 2021 23:32:56 +0000 (15:32 -0800)]
freedreno/a5xx: Move link_stream_out after VPC_VAR_DISABLE like on a6xx.

Since we've got issues on a5xx xfb that we don't on a6xx, I've been
looking at making them line up a bit better.  No change on tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8336>

Eric Anholt [Mon, 4 Jan 2021 23:25:17 +0000 (15:25 -0800)]
freedreno/a5xx: Drop redundant stream output linking check.

The link function just loops over the num_outputs.  Brings us closer to
a6xx.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8336>

Eric Anholt [Mon, 4 Jan 2021 23:23:02 +0000 (15:23 -0800)]
freedreno/ir3: Deduplicate link_stream_out.

All 3 copies were the same other than style tweaks.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8336>

Erik Faye-Lund [Tue, 5 Jan 2021 17:36:09 +0000 (18:36 +0100)]
zink: use lower_scmp instead of open-coding

We already have the proper lowering in NIR for this, so there's no point
in doing our own implementations of these. The end result is the same
code anyway.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8335>
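
For readers unfamiliar with the "set on compare" opcodes, lowering them amounts to a boolean comparison followed by a bool-to-float conversion. The C functions below are an illustrative sketch of that result, using hypothetical names; they are not the NIR pass itself.

```c
#include <assert.h>

/* Bool-to-float: true becomes 1.0, false becomes 0.0. */
float demo_b2f(int cond) { return cond ? 1.0f : 0.0f; }

/* Each "set on compare" opcode is a comparison fed through b2f. */
float demo_slt(float a, float b) { return demo_b2f(a < b); }
float demo_sge(float a, float b) { return demo_b2f(a >= b); }
float demo_seq(float a, float b) { return demo_b2f(a == b); }
float demo_sne(float a, float b) { return demo_b2f(a != b); }

/* Usage: the lowered forms return exactly 1.0 or 0.0. */
void demo_scmp_example(void)
{
    assert(demo_slt(1.0f, 2.0f) == 1.0f);
    assert(demo_sge(1.0f, 2.0f) == 0.0f);
}
```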

Danylo Piliaiev [Thu, 31 Dec 2020 13:54:10 +0000 (15:54 +0200)]
freedreno/a5xx: implement transform feedback resuming

Each transform feedback target should have a separate buffer
for an offset from which to resume, instead of just having
one buffer per binding point. Otherwise, if transform feedback
is paused and another tf object is bound, the offset of the
previous tf object would be lost.

Fixes CTS tests:
 dEQP-GLES3.functional.transform_feedback.*triangles*

Fixes Piglit tests:
 gl-3.1-primitive-restart-xfb flush
 gles-3.0-transform-feedback-uniform-buffer-object
 arb_transform_feedback2-change-objects-while-paused
 arb_transform_feedback2-change-objects-while-paused_gles3
 ext_transform_feedback-intervening-read

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8281>

Danylo Piliaiev [Thu, 31 Dec 2020 12:49:12 +0000 (14:49 +0200)]
freedreno/a6xx: fix transform feedback resuming

Each transform feedback target should have a separate buffer
for an offset from which to resume, instead of just having
one buffer per binding point. Otherwise, if transform feedback
is paused and another tf object is bound, the offset of the
previous tf object would be lost.

Fixes Piglit tests:
 arb_transform_feedback2-change-objects-while-paused
 arb_transform_feedback2-change-objects-while-paused_gles3
 ext_transform_feedback-alignment 4
 ext_transform_feedback-alignment 8
 ext_transform_feedback-alignment 12
 ext_transform_feedback-change-size offset-grow
 ext_transform_feedback-change-size offset-shrink
 ext_transform_feedback-change-size range-grow
 ext_transform_feedback-change-size range-shrink
 ext_transform_feedback-immediate-reuse-uniform-buffer
 ext_transform_feedback-position *

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8281>
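
A CPU-side sketch of why the resume offset has to live with each target rather than with the binding point. All types and names here are hypothetical; the real driver keeps these offsets in GPU buffers.

```c
#include <assert.h>

#define DEMO_MAX_SO_BUFFERS 4

/* Each target carries its own resume offset. */
struct demo_so_target {
    unsigned resume_offset;
};

/* A transform feedback object owns one target per binding point. */
struct demo_tf_object {
    struct demo_so_target targets[DEMO_MAX_SO_BUFFERS];
};

/* Pausing records how far each target of this object has been written. */
void demo_tf_pause(struct demo_tf_object *tf,
                   const unsigned written[DEMO_MAX_SO_BUFFERS])
{
    for (int i = 0; i < DEMO_MAX_SO_BUFFERS; i++)
        tf->targets[i].resume_offset += written[i];
}

/* Resuming reads the offset back from the object's own target. */
unsigned demo_tf_resume_offset(const struct demo_tf_object *tf, int binding)
{
    assert(binding >= 0 && binding < DEMO_MAX_SO_BUFFERS);
    return tf->targets[binding].resume_offset;
}
```

Because the offset is stored in the target, pausing one object, binding and using another, and then returning to the first still resumes at the right place.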

Mike Blumenkrantz [Tue, 5 Jan 2021 14:08:33 +0000 (09:08 -0500)]
zink: handle non-const offsets for txf/tg4 ops

required for gl_spirv handling and tg4

Fixes: b77f43f2539 ("zink: use ConstOffset for nir_tex_src_offset")

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8333>

James Jones [Thu, 15 Oct 2020 21:17:16 +0000 (14:17 -0700)]
gallium/dri: Use per-screen DRI extension list

Some DRI extension features are enabled/disabled
based on capabilities of the gallium pipe_screen
associated with the DRI screen. Additionally, the
list of extensions enabled also varies based on
features requested by the screen creator. However,
prior to this change the extension list and
extension definition structures within it were
global variables, meaning the last screen
initialized ended up defining the DRI capabilities
of all screens.

This change instead stores a copy of the
extensions which vary per screen, as well as a
copy of the extension list itself in the gallium
DRI screen structure, allowing them to vary per
screen.

Closes: https://gitlab.freedesktop.org/drm/nouveau/issues/9

Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7175>
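
The general pattern behind this change can be sketched as follows: keep a constant template, copy it into each screen, and build the extension list from the per-screen copy. The struct layout, the extension name and the capability flag are all hypothetical and much simpler than the real DRI structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension record with one capability-dependent field. */
struct demo_extension {
    const char *name;
    bool supports_modifiers;
};

/* Shared, read-only template; never modified after startup. */
static const struct demo_extension demo_image_template = { "DEMO_image", false };

/* Each screen owns a copy of the extension and its own list. */
struct demo_screen {
    struct demo_extension image_ext;
    const struct demo_extension *extensions[2]; /* NULL-terminated */
};

void demo_screen_init(struct demo_screen *screen, bool has_modifiers)
{
    assert(screen != NULL);

    /* Copy the template, then adjust the copy for this screen only. */
    screen->image_ext = demo_image_template;
    screen->image_ext.supports_modifiers = has_modifiers;

    screen->extensions[0] = &screen->image_ext;
    screen->extensions[1] = NULL;
}
```

Initializing a second screen no longer changes what the first screen advertises, which is the failure mode the commit message describes for global variables.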

James Jones [Thu, 15 Oct 2020 21:30:32 +0000 (14:30 -0700)]
gallium/dri: Factor out DRI extension setup code

Share the DRI extension setup code between
dri2_init_screen and dri_kms_init_screen. There's
currently very little difference, and the sharing
will make a subsequent change to refactor this
code to use per-screen extension lists easier.

Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7175>

Erik Faye-Lund [Tue, 5 Jan 2021 10:58:28 +0000 (11:58 +0100)]
zink: use ConstOffset for nir_tex_src_offset

Quote from the OpenGL Shading Language spec, version 4.40, section 8.9.2
"Texel Lookup Functions":

> The offset value must be a constant expression.

So, until we start consuming SPIR-V shaders, it seems we don't need to
deal with non-constant offsets.

This means we can avoid lowering this away in some cases.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8327>

Erik Faye-Lund [Wed, 25 Nov 2020 11:43:46 +0000 (12:43 +0100)]
zink: do not reserve or pack fragment outputs

These are completely unrelated to other shader IO variables, so they
don't need this logic.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7986>

Erik Faye-Lund [Tue, 24 Nov 2020 18:08:47 +0000 (19:08 +0100)]
zink: do not use reservations for stream-out

Reservations are accumulated for all shader stages in a program without
being reset. But stream-out is completely orthogonal to all other
inputs and outputs, so they don't matter here at all.

So let's drop considering reservations here, and simply count how many
generic outputs we have here instead.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7986>

Erik Faye-Lund [Tue, 15 Dec 2020 09:42:11 +0000 (10:42 +0100)]
zink: destroy device and instance

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:41:57 +0000 (10:41 +0100)]
zink: destroy transfer-helper

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:41:03 +0000 (10:41 +0100)]
zink: free sets and hash-tables in context

Up until now, we've simply leaked all of these. Let's try to do better.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:39:46 +0000 (10:39 +0100)]
zink: do not leak dummy_buffer

Fixes: 8736ffae2ed ("zink: replace unset buffer with a dummy-buffer")

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:39:15 +0000 (10:39 +0100)]
zink: do not leak vertex element state

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:38:07 +0000 (10:38 +0100)]
zink: release batch memory

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:35:32 +0000 (10:35 +0100)]
zink: destroy blitter before destroying batches

Destroying the blitter frees samplers, which pushes the sampler-handles
onto the batches' zombie-sampler lists. So if we want to properly clean
these zombie-samplers up, we need to first get them onto the list so
we'll know about them in time.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:33:58 +0000 (10:33 +0100)]
zink: factor out zink_batch_release-helper

This will be useful for making sure everything has gotten cleaned up
properly.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Erik Faye-Lund [Tue, 15 Dec 2020 09:38:34 +0000 (10:38 +0100)]
zink: do not open-code CALLOC_STRUCT

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8099>

Mike Blumenkrantz [Tue, 5 Jan 2021 13:47:29 +0000 (08:47 -0500)]
features: mark off GL 4.1 for zink

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8329>

Mike Blumenkrantz [Mon, 3 Aug 2020 13:12:11 +0000 (09:12 -0400)]
zink: GLSL 410

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8329>

Mike Blumenkrantz [Tue, 5 Jan 2021 13:46:35 +0000 (08:46 -0500)]
features: mark off GL 4.0 for zink

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8329>

Mike Blumenkrantz [Thu, 30 Jul 2020 00:34:41 +0000 (20:34 -0400)]
zink: GLSL 4.00

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8329>

Mike Blumenkrantz [Thu, 30 Jul 2020 00:33:50 +0000 (20:33 -0400)]
zink: handle arrays of ubos

with the nir pass removing all dynamic indexing, all that's needed here
is generating extra binding points for each array member, as everything else
is already handled

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8314>

Mike Blumenkrantz [Wed, 29 Jul 2020 19:30:09 +0000 (15:30 -0400)]
zink: run nir_lower_dynamic_bo_access

this fixes up most cases of dynamic bo loading

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8314>

Mike Blumenkrantz [Thu, 30 Jul 2020 18:11:21 +0000 (14:11 -0400)]
zink: handle vertex streams

we already support all this, it's just a matter of slapping on some Stream
decoration flex tape

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8204>

Mike Blumenkrantz [Mon, 3 Aug 2020 19:37:35 +0000 (15:37 -0400)]
zink: enable PIPE_CAP_START_INSTANCE

and add feature

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8313>

Mike Blumenkrantz [Mon, 3 Aug 2020 19:34:38 +0000 (15:34 -0400)]
zink: always load (gl_InstanceID - gl_BaseInstance) when loading gl_InstanceID

GL's values here always begin at 0, while Vulkan's begin at the firstInstance
parameter used in the current draw command

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8313>
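
The arithmetic from this commit, written out as a tiny C function. The function and parameter names are hypothetical; the point is only that subtracting the base instance turns Vulkan's instance numbering into GL's zero-based numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive GL's zero-based gl_InstanceID from the
 * Vulkan instance index, which starts at the draw's firstInstance. */
int32_t demo_gl_instance_id(int32_t vk_instance_index, int32_t base_instance)
{
    assert(vk_instance_index >= base_instance);
    return vk_instance_index - base_instance;
}
```

For a draw with firstInstance = 5, the first instance has a Vulkan index of 5 and a GL instance ID of 0.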

Samuel Pitoiset [Thu, 10 Dec 2020 17:29:03 +0000 (18:29 +0100)]
radv: enable TC-compat HTILE in GENERAL on GFX10+

GFX10+ supports compressed writes to HTILE, so it should just work
to skip decompressions when transitioning from/to GENERAL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Thu, 10 Dec 2020 13:50:40 +0000 (14:50 +0100)]
radv: only load the DS fast clear values for compressed rendering

Otherwise it's useless because we are unlikely to perform a
fast depth stencil clear.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Thu, 10 Dec 2020 13:28:11 +0000 (14:28 +0100)]
radv: clean up radv_layout_is_htile_compressed()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Thu, 10 Dec 2020 13:06:58 +0000 (14:06 +0100)]
radv: fix TC-compat HTILE images with DST_OPTIMAL on the compute queue

This is probably rare but can happen if someone performs a depth-stencil
copy on the compute queue. This might work (untested by CTS) but it
looks more conservative to decompress before performing the operation.

Found by inspection.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Wed, 9 Dec 2020 16:51:10 +0000 (17:51 +0100)]
radv: add radv_htile_get_initial_value() and document the HTILE dword

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Wed, 9 Dec 2020 16:48:56 +0000 (17:48 +0100)]
radv: fix potential HTILE issues for TC-compat images on GFX8

We can only use the entire HTILE buffer if TILE_STENCIL_DISABLE is
TRUE. On GFX8+, this is only true if the depth image has no stencil
and if it's not TC-compatible because of the ZRANGE_PRECISION issue.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Samuel Pitoiset [Wed, 9 Dec 2020 16:28:40 +0000 (17:28 +0100)]
radv: always clear the SR0/SR1 bits of the HTILE buffer

To make sure the stencil compare state is properly initialized and
cleared when the driver performs a fast depth clear.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>

Pierre-Eric Pelloux-Prayer [Fri, 18 Dec 2020 14:48:51 +0000 (15:48 +0100)]
mesa/st: fix redundant initialization

https://gitlab.freedesktop.org/mesa/mesa/-/issues/3966

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Fri, 18 Dec 2020 14:48:05 +0000 (15:48 +0100)]
radeonsi: fix redundant initializations

See https://gitlab.freedesktop.org/mesa/mesa/-/issues/3966

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Tue, 1 Dec 2020 10:04:16 +0000 (11:04 +0100)]
gallium/vl: merge identical h264/h265 enums

Use h2645 notations for shared enums to reduce duplication and
fix a clang warning.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Wed, 25 Nov 2020 14:45:02 +0000 (15:45 +0100)]
tesselator: remove unused variable

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Wed, 25 Nov 2020 14:44:53 +0000 (15:44 +0100)]
amd/addrlib: use cpp.has_argument() to filter compiler arguments

Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Tue, 1 Dec 2020 17:09:44 +0000 (18:09 +0100)]
vdpau: fix invalid enum usage

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>

Pierre-Eric Pelloux-Prayer [Wed, 25 Nov 2020 14:32:36 +0000 (15:32 +0100)]
vdpau: fix -Wabsolute-value warning

vdpau specifies that top-left is x0/y0, bottom-right is x1/y1 and that x0/y0 are
inclusive while x1/y1 are exclusive.

This commit removes the abs() usage and instead verifies that the VdpRects passed
by the user match the documentation. When they don't, they're treated as empty
rectangles.

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7846>
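
A sketch of the validation described in this commit, assuming unsigned 32-bit coordinates as in VdpRect. The struct and function names are hypothetical and the code is not taken from the vdpau state tracker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a rectangle with inclusive x0/y0 and
 * exclusive x1/y1. */
struct demo_rect {
    uint32_t x0, y0, x1, y1;
};

struct demo_size {
    uint32_t width, height;
};

/* Return the rectangle's size, or an empty size if the corners are not
 * ordered as documented (top-left before bottom-right). */
struct demo_size demo_rect_size(const struct demo_rect *r)
{
    struct demo_size size = { 0, 0 };

    assert(r);
    if (r->x1 > r->x0 && r->y1 > r->y0) {
        size.width = r->x1 - r->x0;
        size.height = r->y1 - r->y0;
    }
    return size;
}
```

Checking the ordering before subtracting avoids unsigned wrap-around, which is the case abs() could never have handled on unsigned operands.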

Rhys Perry [Mon, 4 Jan 2021 13:06:15 +0000 (13:06 +0000)]
ac/nir: use llvm.readcyclecounter for LLVM9+

Unlike llvm.amdgcn.s.memtime, this works on GFX10.3

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4033
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8306>

Eric Anholt [Wed, 30 Dec 2020 23:33:45 +0000 (15:33 -0800)]
gallium/tgsi_exec: Remove unused MaxGeometryShaderOutputs.

Just an indirection from the value you should be grepping for (the one
that controls the allocation of the output buffer).

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>

Eric Anholt [Tue, 22 Dec 2020 21:45:33 +0000 (13:45 -0800)]
gallium/tgsi_exec: Clean up storage of the pixel kill mask.

We need one dword per exec, rather than one per channel, since it's the
bitmask of channels killed.  Removes the remainder of the
TGSI_EXEC_NUM_TEMP_EXTRAS!

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>
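
To illustrate why a single dword is enough, here is a sketch of a kill mask kept as one bit per channel. The function name, the four-channel width and the "kill if negative" condition are assumptions chosen for the example, not the tgsi_exec code.

```c
#include <assert.h>

#define DEMO_NUM_CHANNELS 4

/* Update a per-exec kill mask: bit N is set once channel N is killed.
 * Only channels enabled in exec_mask are considered. */
unsigned demo_kill_if_negative(unsigned kill_mask, unsigned exec_mask,
                               const float value[DEMO_NUM_CHANNELS])
{
    assert((exec_mask >> DEMO_NUM_CHANNELS) == 0);

    for (unsigned chan = 0; chan < DEMO_NUM_CHANNELS; chan++) {
        if ((exec_mask & (1u << chan)) && value[chan] < 0.0f)
            kill_mask |= 1u << chan;
    }
    return kill_mask;
}
```

The whole state fits in one unsigned value, so there is nothing to store per channel.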

Eric Anholt [Tue, 22 Dec 2020 21:40:53 +0000 (13:40 -0800)]
gallium/tgsi_exec: Drop the unused scratch temp regs.

I suspect this was used back in the SSE2 backend days.  Definitely dead
now.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>

Eric Anholt [Tue, 22 Dec 2020 21:37:54 +0000 (13:37 -0800)]
gallium/tgsi_exec: Stop doing the weird allocation of the Addrs array.

Saves an indirection on referencing the address regs, and also my sanity.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>

3 years agogallium/tgsi_exec: Simplify GS output vertex count tracking.
Eric Anholt [Tue, 22 Dec 2020 21:02:20 +0000 (13:02 -0800)]
gallium/tgsi_exec: Simplify GS output vertex count tracking.

We had this strange 5-dword-per-stream storage for the single dword
current vertex count, due to copy and paste.  We can make much cleaner
code by just having a 4-element array in the machine.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8283>

3 years agoradv: remove unused radv_image::aspects
Samuel Pitoiset [Tue, 5 Jan 2021 07:37:56 +0000 (08:37 +0100)]
radv: remove unused radv_image::aspects

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8324>

3 years agoradv: fix clearing images with vkCmdClear{Color,DepthStencil}Image()
Samuel Pitoiset [Tue, 5 Jan 2021 07:36:59 +0000 (08:36 +0100)]
radv: fix clearing images with vkCmdClear{Color,DepthStencil}Image()

The image aspects field is actually never set and we should use the
range aspect anyways.

Fixes: 1a7b7b17ad0 ("radv: avoid oob read during clear")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8324>

3 years agovbo/dlist: use a shared index buffer
Pierre-Eric Pelloux-Prayer [Mon, 7 Dec 2020 16:34:18 +0000 (17:34 +0100)]
vbo/dlist: use a shared index buffer

Draws can be merged by u_threaded if they share the same IB.

This improves performance in SPECviewperf13 snx-03: test fps improves
by a factor of 1.2x to 2.0x.

v2: reworked error handling

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com> (v2)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8111>

3 years agomesa: fix a second bug in merging light state parameters with unpacked uniforms
Marek Olšák [Fri, 1 Jan 2021 19:01:25 +0000 (14:01 -0500)]
mesa: fix a second bug in merging light state parameters with unpacked uniforms

The memcpy size should be packed even if the allocated parameter size
is padded to 4 components.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agomesa: fix a bug in merging light state parameters with unpacked uniforms
Marek Olšák [Fri, 1 Jan 2021 19:00:13 +0000 (14:00 -0500)]
mesa: fix a bug in merging light state parameters with unpacked uniforms

This code is not enabled yet.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agomesa: add STATIC_ASSERTs to the STATE_LIGHT_ATTRIBS case
Marek Olšák [Fri, 1 Jan 2021 17:13:28 +0000 (12:13 -0500)]
mesa: add STATIC_ASSERTs to the STATE_LIGHT_ATTRIBS case

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agost/mesa: fix a defect when st_validate_state was invoked for unused states
Marek Olšák [Sat, 28 Nov 2020 08:46:30 +0000 (03:46 -0500)]
st/mesa: fix a defect when st_validate_state was invoked for unused states

This fixes a small performance issue. Discovered with piglit/drawoverhead.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agost/mesa: simplify checking whether to pin threads to L3
Marek Olšák [Sun, 29 Nov 2020 08:03:50 +0000 (03:03 -0500)]
st/mesa: simplify checking whether to pin threads to L3

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agoutil: replace UTIL_MAX_CPUS by util_cpu_caps.num_cpu_mask_bits
Marek Olšák [Sat, 28 Nov 2020 09:18:32 +0000 (04:18 -0500)]
util: replace UTIL_MAX_CPUS by util_cpu_caps.num_cpu_mask_bits

to reduce overhead when setting thread affinity.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8017>

3 years agoglsl/builtin_functions: Rename int64 function to int64_avail
Alexander von Gluck IV [Wed, 30 Dec 2020 00:46:59 +0000 (18:46 -0600)]
glsl/builtin_functions: Rename int64 function to int64_avail

* int64 is a core type on Haiku (and potentially other platforms)
* rename to int64_avail matching other similar calls

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

3 years agomeson: Add _GNU_SOURCE for Haiku to activate non-posix functions
Alexander von Gluck IV [Wed, 30 Dec 2020 00:46:45 +0000 (18:46 -0600)]
meson: Add _GNU_SOURCE for Haiku to activate non-posix functions

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

3 years agoradeonsi: take color interpolation into account for shader variants
Marek Olšák [Thu, 24 Dec 2020 12:42:12 +0000 (07:42 -0500)]
radeonsi: take color interpolation into account for shader variants

Fixes:
- Sample shading now uses per-sample interpolation for colors if colors
  are the only inputs. (this is the only case that was broken)

Optimizations:
- BC_OPTIMIZE (barycentric optimization) is now enabled with MSAA if colors
  are qualified with both center and centroid. (BC_OPTIMIZE means that
  the hardware skips initializing centroid (i,j) if they are equal to
  center (i,j))
- If MSAA is disabled and at least 2 out of (center, centroid, sample) are
  used by all inputs now including colors, center is forced for all inputs.
- If INTERP_MODE_COLOR is not used and the legacy GL shade model is flat,
  the shader variant for flat shading is not generated.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>

3 years agoradeonsi: add driconf options to enable/disable Smart Access Memory
Marek Olšák [Thu, 24 Dec 2020 12:04:07 +0000 (07:04 -0500)]
radeonsi: add driconf options to enable/disable Smart Access Memory

so that anybody can test it if they have Above 4G Decoding and compare
performance.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>

3 years agoac,radeonsi: limit Smart Access Memory to Zen 3 and GFX10.3 due to perf issues
Marek Olšák [Thu, 24 Dec 2020 11:14:11 +0000 (06:14 -0500)]
ac,radeonsi: limit Smart Access Memory to Zen 3 and GFX10.3 due to perf issues

Many people experience performance degradation on some systems.
There will be a driconf option to enable SAM on other chips as well as
disable it on enabled systems.

Fixes: d3d6d381450 - ac: add radeon_info::all_vram_visible for Smart Access Memory
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3982

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>

3 years agoutil: add AMD CPU family enums and enable L3 cache pinning on Zen3
Marek Olšák [Thu, 24 Dec 2020 10:43:25 +0000 (05:43 -0500)]
util: add AMD CPU family enums and enable L3 cache pinning on Zen3

Based on: https://en.wikichip.org/wiki/amd/cpuid

The only reason it's nominated as a fix is because Zen3 might underperform
because the CPU detection ignored it.

Fixes: 15fa2c5e359 - gallium/u_cpu_detect: get the number of cores per L3 cache for AMD Zen

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>

3 years agoradeonsi: Fix typos.
Vinson Lee [Fri, 1 Jan 2021 02:01:10 +0000 (18:01 -0800)]
radeonsi: Fix typos.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8289>

3 years agonir/algebraic: Move the flrp -> bcsel rule earlier
Ian Romanick [Tue, 16 Jun 2020 21:29:58 +0000 (14:29 -0700)]
nir/algebraic: Move the flrp -> bcsel rule earlier

If multiple rules could match, the rule that appears first in the file
is used.
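
The first-match behavior can be sketched with a toy rewriter. This is not nir_algebraic itself: the tuple expression shape, the helper names, and the arithmetic lowering used as the "general" rule are assumptions made for illustration.

```python
# Toy first-match rewriter; a hypothetical stand-in for a rule list.
def is_flrp(e):
    return isinstance(e, tuple) and e[0] == "flrp"


def is_flrp_of_b2f(e):
    return is_flrp(e) and isinstance(e[3], tuple) and e[3][0] == "b2f"


def to_bcsel(e):
    # flrp(a, b, b2f(c)) selects b when c is true, otherwise a.
    _, a, b, (_, c) = e
    return ("bcsel", c, b, a)


def to_arith(e):
    # Generic lowering: a * (1 - t) + b * t.
    _, a, b, t = e
    return ("fadd", ("fmul", a, ("fsub", 1.0, t)), ("fmul", b, t))


def first_match(expr, rules):
    """Apply the first rule whose predicate matches; later rules are ignored."""
    for matches, replace in rules:
        if matches(expr):
            return replace(expr)
    return expr


expr = ("flrp", "a", "b", ("b2f", "c"))
specific_first = [(is_flrp_of_b2f, to_bcsel), (is_flrp, to_arith)]
general_first = [(is_flrp, to_arith), (is_flrp_of_b2f, to_bcsel)]
```

With specific_first the expression becomes a bcsel; with general_first the generic lowering wins and the bcsel rule never gets a chance to fire.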

Only Tiger Lake and Ice Lake are affected.  Other platforms either have
a LRP instruction or can't run any shaders from shader-db that would
benefit.

v2: Fix issues created when this commit was rebased on top of
3c8934a644b8 ("nir/algebraic: add flrp patterns for 16 and 64 bits").
Noticed by Caio.

Tiger Lake and Ice Lake had similar results.
total instructions in shared programs: 20908672 -> 20908661 (<.01%)
instructions in affected programs: 419 -> 408 (-2.63%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.20 x̃: 3
helped stats (rel) min: 1.85% max: 3.19% x̄: 2.49% x̃: 2.65%
95% mean confidence interval for instructions value: -3.56 -0.84
95% mean confidence interval for instructions %-change: -3.24% -1.73%
Instructions are helped.

total cycles in shared programs: 473513940 -> 473513793 (<.01%)
cycles in affected programs: 7176 -> 7029 (-2.05%)
helped: 12
HURT: 0
helped stats (abs) min: 5 max: 22 x̄: 12.25 x̃: 12
helped stats (rel) min: 0.84% max: 3.24% x̄: 2.09% x̃: 1.80%
95% mean confidence interval for cycles value: -15.43 -9.07
95% mean confidence interval for cycles %-change: -2.57% -1.61%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Mark comparisons generated from lowered fsign precise
Ian Romanick [Wed, 19 Feb 2020 20:47:21 +0000 (12:47 -0800)]
nir/algebraic: Mark comparisons generated from lowered fsign precise

This prevents other transformations from converting them to 'a != 0'.
For example, both of these transformations can do this:

   (('~flt', 0.0, ('fabs', a)), ('fne', a, 0.0)),
   (('~flt', ('fneg', ('fabs', a)), 0.0), ('fne', a, 0.0)),

Both fsign(fabs(NaN)) and fsign(fneg(fabs(NaN))) should produce zero,
but, since 'NaN != 0.0' is true, cascading these transformations could
cause them to generate 1.0 or -1.0 respectively.
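
The hazard can be reproduced with plain IEEE floats. The sketch below assumes fsign is lowered to b2f(0.0 < x) - b2f(x < 0.0); that lowering and the function names are illustrative assumptions, not a copy of the NIR rules.

```python
import math


def b2f(cond):
    """Boolean to float, as in the NIR opcode of the same name."""
    return 1.0 if cond else 0.0


def fsign_of_fabs_precise(a):
    # Assumed lowering of fsign(fabs(a)) with the comparisons kept as written.
    x = abs(a)
    return b2f(0.0 < x) - b2f(x < 0.0)


def fsign_of_fabs_rewritten(a):
    # Same lowering after ('~flt', 0.0, ('fabs', a)) -> ('fne', a, 0.0).
    x = abs(a)
    return b2f(a != 0.0) - b2f(x < 0.0)


nan = math.nan
precise = fsign_of_fabs_precise(nan)      # every comparison with NaN is false
rewritten = fsign_of_fabs_rewritten(nan)  # but NaN != 0.0 is true
```

For ordinary numbers the two functions agree; they differ only for NaN, which is the case that marking the comparisons precise protects.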

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Fix broken NaN and -0.0 behavior
Ian Romanick [Tue, 18 Feb 2020 20:52:42 +0000 (12:52 -0800)]
nir/algebraic: Fix broken NaN and -0.0 behavior

No shader-db or fossil-db changes on any Intel platform.

v2: Add a coding line to fix SCons build problems caused by the ±
character.

Fixes: 25bfba3335d ("nir/algebraic: Recognize open-coded copysign(1.0, a)")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agospir-v: Mark floating point comparisons exact
Ian Romanick [Wed, 5 Aug 2020 02:43:52 +0000 (19:43 -0700)]
spir-v: Mark floating point comparisons exact

OpenGL GLSL, OpenGL ARB assembly shaders, and DX9 are pretty loose about
the behavior in the presence of NaNs.  Many GPUs that implement these
specifications do not even have a representation of NaN.  However,
OpenCL and Vulkan SPIR-V are not so lax.  Both actually have some
required behavior in the presence of NaN, and, of the two, OpenCL is the
stricter.

For years we have implemented SPIR-V by using the same comparison
opcodes as we use for OpenGL GLSL and OpenGL assembly shaders.  This has
repeatedly caused problems where an optimization that is valid in the
NaN-relaxed world is not valid in Vulkan or OpenCL.  To fix this, set
the "exact" flag on comparison instructions generated from SPIR-V.
This will block optimizations that may have different NaN behavior.
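
A small illustration of an optimization that is fine in the NaN-relaxed world but wrong under strict NaN rules: folding !(a < b) into a >= b. The example is chosen for illustration and is not one the commit message names; the function names are hypothetical.

```python
import math


def original(a, b):
    # What the shader author wrote: logical NOT of an ordered less-than.
    return not (a < b)


def nan_relaxed(a, b):
    # Result after a rewrite that assumes neither operand is NaN.
    return a >= b


nan = math.nan
strict_result = original(nan, 1.0)      # NaN < 1.0 is false, so NOT gives True
relaxed_result = nan_relaxed(nan, 1.0)  # NaN >= 1.0 is false
```

The two forms agree for every non-NaN input, which is why the rewrite is only acceptable when the comparison is not marked exact.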

v2: Set the exact flag in the nir_builder, not in the vtn_builder.

v3: Add an assertion in vtn_handle_constant that the exact flag wasn't
set (because it's ignored).  Rebase on 80163bbec3a ("nir/vtn: Support
OpOrdered and OpUnordered opcodes").  Mark the NIR generated for those
opcodes as exact as well.

v4: s/unused_exact/exact/ in a couple places, and assert that exact has
the expected value (true in one place, false in the other).  Suggested
by Caio.

Closes: #3345
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Fixes: 8513b12590c ("nir/opt_if: split ALU from Phi more aggressively")

This commit doesn't really fix anything in 8513b12590c.  However,
without 8513b12590c, a regression is triggered in RADV on No Man's
Sky.  I want to ensure that this change is only applied on top of
8513b12590c, and Fixes: seems the safest way to do that.

No shader-db changes on any Intel platform.  This only affects SPIR-V,
and we have no OpenGL SPIR-V shaders in shader-db.

124 shaders in Shadow of the Tomb Raider (Steam "native") were hurt by 1
spill and 1 fill each.

All Intel platforms had similar results. (Tiger Lake shown)
Instructions in all programs: 155668276 -> 155685764 (+0.0%)

SENDs in all programs: 6474570 -> 6474570 (+0.0%)

Loops in all programs: 35271 -> 35271 (+0.0%)

Cycles in all programs: 3198055373 -> 3198628031 (+0.0%)

Spills in all programs: 231522 -> 231646 (+0.1%)

Fills in all programs: 347571 -> 347695 (+0.0%)

Vega
Totals:
SGPRs: 20955712 -> 20956756 (+0.00%); split: -0.02%, +0.03%
VGPRs: 13476920 -> 13473132 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613371940 -> 613339348 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 3111886 -> 3112481 (+0.02%); split: +0.02%, -0.00%
Instrs: 120723785 -> 120746991 (+0.02%); split: -0.04%, +0.06%
Cycles: 626658992 -> 626862708 (+0.03%); split: -0.05%, +0.08%
VMEM: 216330854 -> 216343196 (+0.01%); split: +0.04%, -0.04%
SMEM: 32079391 -> 32081972 (+0.01%); split: +0.05%, -0.04%
VClause: 2688784 -> 2688789 (+0.00%); split: -0.03%, +0.03%
SClause: 6554669 -> 6556251 (+0.02%); split: -0.01%, +0.03%
Copies: 5356667 -> 5353283 (-0.06%); split: -0.36%, +0.29%
Branches: 954466 -> 954716 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 9078300 -> 9081626 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 10972090 -> 10966576 (-0.05%); split: -0.06%, +0.01%

Totals from 48239 (12.08% of 399432) affected shaders:
SGPRs: 2713984 -> 2715028 (+0.04%); split: -0.16%, +0.19%
VGPRs: 1997804 -> 1994016 (-0.19%); split: -0.46%, +0.27%
CodeSize: 172094092 -> 172061500 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 337327 -> 337922 (+0.18%); split: +0.20%, -0.02%
Instrs: 33053657 -> 33076863 (+0.07%); split: -0.15%, +0.22%
Cycles: 254961228 -> 255164944 (+0.08%); split: -0.12%, +0.20%
VMEM: 15165226 -> 15177568 (+0.08%); split: +0.59%, -0.51%
SMEM: 3304938 -> 3307519 (+0.08%); split: +0.49%, -0.41%
VClause: 766225 -> 766230 (+0.00%); split: -0.12%, +0.12%
SClause: 1332645 -> 1334227 (+0.12%); split: -0.04%, +0.16%
Copies: 2040651 -> 2037267 (-0.17%); split: -0.94%, +0.77%
Branches: 743668 -> 743918 (+0.03%); split: -0.01%, +0.05%
PreSGPRs: 1697667 -> 1700993 (+0.20%); split: -0.07%, +0.27%
PreVGPRs: 1718424 -> 1712910 (-0.32%); split: -0.39%, +0.07%

Polaris
Totals:
SGPRs: 21349172 -> 21354376 (+0.02%); split: -0.02%, +0.04%
VGPRs: 13690680 -> 13686920 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613745824 -> 613704988 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 2775012 -> 2775189 (+0.01%); split: +0.01%, -0.00%
Instrs: 120735079 -> 120756209 (+0.02%); split: -0.04%, +0.06%
Cycles: 627906100 -> 628076156 (+0.03%); split: -0.05%, +0.08%
VMEM: 216623065 -> 216641838 (+0.01%); split: +0.04%, -0.04%
SMEM: 32295618 -> 32299338 (+0.01%); split: +0.05%, -0.04%
VClause: 2711025 -> 2711141 (+0.00%); split: -0.03%, +0.04%
SClause: 6545185 -> 6546769 (+0.02%); split: -0.01%, +0.03%
Copies: 5387723 -> 5383249 (-0.08%); split: -0.37%, +0.29%
Branches: 953775 -> 953954 (+0.02%); split: -0.01%, +0.03%
PreSGPRs: 9148814 -> 9153211 (+0.05%); split: -0.01%, +0.06%
PreVGPRs: 11029429 -> 11023915 (-0.05%); split: -0.06%, +0.01%

Totals from 48239 (12.00% of 402052) affected shaders:
SGPRs: 2682056 -> 2687260 (+0.19%); split: -0.16%, +0.35%
VGPRs: 1994436 -> 1990676 (-0.19%); split: -0.46%, +0.27%
CodeSize: 170857060 -> 170816224 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 295429 -> 295606 (+0.06%); split: +0.07%, -0.01%
Instrs: 32808802 -> 32829932 (+0.06%); split: -0.16%, +0.22%
Cycles: 254633252 -> 254803308 (+0.07%); split: -0.13%, +0.20%
VMEM: 14897934 -> 14916707 (+0.13%); split: +0.65%, -0.52%
SMEM: 3289726 -> 3293446 (+0.11%); split: +0.53%, -0.42%
VClause: 775318 -> 775434 (+0.01%); split: -0.11%, +0.13%
SClause: 1304867 -> 1306451 (+0.12%); split: -0.04%, +0.16%
Copies: 2026334 -> 2021860 (-0.22%); split: -0.99%, +0.77%
Branches: 742554 -> 742733 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1690887 -> 1695284 (+0.26%); split: -0.07%, +0.33%
PreVGPRs: 1717709 -> 1712195 (-0.32%); split: -0.40%, +0.07%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Add some compare-with-zero optimizations that are exact
Ian Romanick [Tue, 11 Aug 2020 01:34:37 +0000 (18:34 -0700)]
nir/algebraic: Add some compare-with-zero optimizations that are exact

This prevents some fossil-db regressions in "spir-v: Mark floating point
comparisons exact".

v2: Note that the patterns and replacements produce the same value when
isnan(b).  Suggested by Caio.

v3: Use C99 isfinite() instead of (obsolete) BSD finite().  Fixes
various Windows builds.

No fossil-db changes on any Intel platform, Vega, or Polaris10.

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908670 -> 20908672 (<.01%)
instructions in affected programs: 69 -> 71 (2.90%)
helped: 0
HURT: 1

total cycles in shared programs: 473515288 -> 473513940 (<.01%)
cycles in affected programs: 4942 -> 3594 (-27.28%)
helped: 2
HURT: 0

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Mark some logic-joined comparison reductions as exact
Ian Romanick [Wed, 8 Jul 2020 19:53:07 +0000 (12:53 -0700)]
nir/algebraic: Mark some logic-joined comparison reductions as exact

This also prevents some fossil-db regressions in "spir-v: Mark floating
point comparisons exact".

v2: Mark the fmin / fmax in the replacement exact to prevent other
optimizations from ruining the NaN-cleansing property of the fmin / fmax.
Suggested by Rhys.  Don't assume that constants are not NaN because some
components of a vector might be NaN while others are numbers.  Noticed
by Rhys.  This causes ~8 more shaders in Age of Wonders III (dxvk) to
regress on cycles (not instructions) by less than 1% when "spir-v: Mark
floating point comparisons exact" is applied.  This difference is too
small to care about.

All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908668 -> 20908670 (<.01%)
instructions in affected programs: 9196 -> 9198 (0.02%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 2 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.02% max: 5.41% x̄: 2.20% x̃: 2.16%
HURT stats (abs)   min: 2 max: 6 x̄: 3.20 x̃: 3
HURT stats (rel)   min: 2.44% max: 16.67% x̄: 9.39% x̃: 12.50%
95% mean confidence interval for instructions value: -1.22 1.49
95% mean confidence interval for instructions %-change: -2.08% 5.41%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 473515330 -> 473515288 (<.01%)
cycles in affected programs: 67146 -> 67104 (-0.06%)
helped: 10
HURT: 7
helped stats (abs) min: 1 max: 36 x̄: 15.90 x̃: 17
helped stats (rel) min: 0.01% max: 1.29% x̄: 0.66% x̃: 0.89%
HURT stats (abs)   min: 1 max: 48 x̄: 16.71 x̃: 4
HURT stats (rel)   min: 0.08% max: 1.94% x̄: 0.87% x̃: 0.19%
95% mean confidence interval for cycles value: -13.88 8.94
95% mean confidence interval for cycles %-change: -0.56% 0.49%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir: Correctly constant fold fsign(NaN) and fsign(-0)
Ian Romanick [Tue, 18 Feb 2020 20:34:15 +0000 (12:34 -0800)]
nir: Correctly constant fold fsign(NaN) and fsign(-0)

GLSL and SPIR-V GLSL.std.450 don't have any requirements for fsign(NaN),
and both only require that FSign(-0.0) == 0.0.  OpenCL, on the other
hand, requires sign(-0.0) be exactly -0.0.  It also requires that
sign(NaN) be exactly 0.0.

In practice, this change is difficult to test.  Our GLSL frontend
already constant folds sign(NaN) to 0.0 before even getting to NIR.  As
far as I can tell, glslang does the same.  I don't have a good way to
run an OpenCL SPIR-V test.  Maybe SPIR-V GLSL.std.450 assembly?

No shader-db or fossil-db changes on any Intel platform.

Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Don't add reordered version of patterns for commutative instructions
Ian Romanick [Sat, 25 Jan 2020 01:10:07 +0000 (17:10 -0800)]
nir/algebraic: Don't add reordered version of patterns for commutative instructions

The reordered versions are automatically considered by nir_algebraic
rules for commutative instructions.
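
A minimal sketch of how a matcher can try both operand orders for commutative opcodes, so that a rule file needs only one ordering. The COMMUTATIVE set, the tuple encoding, and the match() helper are hypothetical; the real nir_algebraic machinery is more involved.

```python
# Hypothetical toy matcher: strings in a pattern are variables, tuples are
# (opcode, operand, operand, ...), anything else is a constant.
COMMUTATIVE = {"fadd", "fmul", "iand"}


def match(pattern, expr, env=None):
    """Return a dict of variable bindings, or None if there is no match."""
    env = dict(env or {})
    if isinstance(pattern, str):
        if pattern in env and env[pattern] != expr:
            return None
        env[pattern] = expr
        return env
    if not isinstance(pattern, tuple):
        return env if pattern == expr else None
    if (not isinstance(expr, tuple) or expr[0] != pattern[0]
            or len(expr) != len(pattern)):
        return None
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append(expr[:0:-1])  # the two operands, swapped
    for operands in orders:
        bound = env
        for sub_pattern, sub_expr in zip(pattern[1:], operands):
            bound = match(sub_pattern, sub_expr, bound)
            if bound is None:
                break
        else:
            return bound
    return None


swapped = match(("fadd", "a", 0.0), ("fadd", 0.0, "x"))
not_commutative = match(("fsub", "a", 0.0), ("fsub", 0.0, "x"))
```

Because the swapped order is tried automatically, writing both ('fadd', a, 0.0) and ('fadd', 0.0, a) as separate rules would be redundant.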

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agoRevert "nir: Replace an odd comparison involving fmin of -b2f"
Ian Romanick [Fri, 12 Jun 2020 01:48:41 +0000 (18:48 -0700)]
Revert "nir: Replace an odd comparison involving fmin of -b2f"

I originally noticed that 3b308147916 ("nir/algebraic: Optimize 1-bit
Booleans") caused this pattern to no longer be matched by incorrectly
replacing b@32 with b@1.  Making that correct had no effect on
shader-db.  When this pattern originally was added, it only affected 4
shaders, so it's not worth the effort to debug further.

This reverts commit f50400cc8040cf2d07de97e76d9b1ed144c5c8b4.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir/algebraic: Make some notes about comparison rearrangements versus infinity
Ian Romanick [Wed, 8 Jul 2020 00:30:42 +0000 (17:30 -0700)]
nir/algebraic: Make some notes about comparison rearrangements versus infinity

The original comment was a little terse and a little incorrect.  The
rearrangements are fine w.r.t. NaN.  However, they produce incorrect
results if one operand is +Inf and the other is -Inf.
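
The infinity problem shows up with ordinary IEEE floats. The sketch assumes a rearrangement of the form "a + b >= 0" into "a >= -b"; that specific form is an assumption chosen for illustration, not a quotation of the rules being documented.

```python
import math


def before(a, b):
    # Original comparison: the sum is computed first.
    return a + b >= 0.0


def after(a, b):
    # Rearranged comparison: b is negated and moved to the other side.
    return a >= -b


inf = math.inf
nan = math.nan

# (+Inf) + (-Inf) is NaN, so the original comparison is false,
# while +Inf >= +Inf is true after the rearrangement.
inf_before = before(inf, -inf)
inf_after = after(inf, -inf)

# With a NaN operand both forms are false, so NaN alone is not the problem.
nan_before = before(nan, 1.0)
nan_after = after(nan, 1.0)
```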

A later commit, "nir/algebraic: Add some compare-with-zero optimizations
that are exact", will add some more patterns here.  It may be reasonable
to squash this commit (forward) into that commit.

v2: Fix some incorrect comparisons operators in the comment (<= vs >=).
Add commentary that subtraction works like addition w.r.t. NaN.  Both
noticed / suggested by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agonir: Make some notes about fsign versus NaN
Ian Romanick [Tue, 18 Feb 2020 18:18:57 +0000 (10:18 -0800)]
nir: Make some notes about fsign versus NaN

This commit only documents the current behavior, even if that behavior
is not the behavior preferred by the relevant specs.

In SPIR-V, there are two flavors of the sign instruction, and each lives
in an extended instruction set.  The GLSL.std.450 FSign instruction is
defined as:

    Result is 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.

This also matches the GLSL 4.60 definition.

However, the OpenCL.ExtendedInstructionSet.100 sign instruction is
defined as:

    Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if
    x < 0. Returns 0.0 if x is a NaN.

There are two differences.  Each treats -0.0 differently, and each also
treats NaN differently.  Specifically, GLSL.std.450 FSign does not
define any specific behavior for NaN.
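
The two quoted definitions can be written out side by side. In this sketch the GLSL.std.450 function returns +0.0 for NaN and for -0.0; that is one permitted choice made here for illustration, since the spec text quoted above does not pin either down. The function names are hypothetical.

```python
import math


def glsl_fsign(x):
    # GLSL.std.450 FSign: 1.0, 0.0, or -1.0. The NaN result is unspecified;
    # returning 0.0 here is an assumption of this sketch.
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return -1.0
    return 0.0


def opencl_sign(x):
    # OpenCL sign: NaN gives 0.0, and the sign of a zero input is preserved.
    if math.isnan(x):
        return 0.0
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return -1.0
    return x  # +0.0 stays +0.0, -0.0 stays -0.0


def is_negative_zero(x):
    return x == 0.0 and math.copysign(1.0, x) < 0.0
```

For example, opencl_sign(-0.0) keeps the negative zero, while glsl_fsign(-0.0) in this sketch returns +0.0; both satisfy a requirement that the result compare equal to 0.0.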

There has been some discussion in Khronos about the NaN behavior of
GLSL.std.450 FSign.  As part of that discussion, I did some research
into how we treat NaN for nir_op_fsign, and this commit just captures
some of those notes.

v2: Document the expected behavior of nir_op_fsign more thoroughly.
Suggested by Rhys.  Note that the current implementation of constant
folding does not produce the expected result for NaN.  Suggested by
Caio.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>

3 years agost/mesa: don't affect original st_CompressedTexSubImage parameters
Andrii Simiklit [Mon, 28 Dec 2020 13:58:24 +0000 (15:58 +0200)]
st/mesa: don't affect original st_CompressedTexSubImage parameters

The fallback path is still possible here so let's keep them as is.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3952
Fixes: 4b02f165 ("st/mesa: implement PBO upload for glCompressedTex(Sub)Image")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8258>

3 years agogallium: remove PIPE_CAP_INFO_START_WITH_USER_INDICES and fix all drivers
Marek Olšák [Sat, 28 Nov 2020 05:44:19 +0000 (00:44 -0500)]
gallium: remove PIPE_CAP_INFO_START_WITH_USER_INDICES and fix all drivers

Drivers aren't allowed to ignore start with user index buffers anymore.
This is required by the new fast path where mesa/main is using pipe_draw_info.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>

3 years agost/mesa: implement Driver.DrawGallium callbacks
Marek Olšák [Tue, 3 Nov 2020 18:04:03 +0000 (13:04 -0500)]
st/mesa: implement Driver.DrawGallium callbacks

This is the new fast path replacing the _mesa_prim path.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>

3 years agovirgl: fix handling draw info
Marek Olšák [Sun, 22 Nov 2020 03:08:50 +0000 (22:08 -0500)]
virgl: fix handling draw info

index_bias is undefined if index_size == 0.
index bounds are undefined if index_bounds_valid == false.
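
A sketch of the guards these two rules imply. DrawInfo is a hypothetical stand-in that borrows a few field names from the commit message; it is not the real pipe_draw_info layout, and the helper names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrawInfo:
    """Hypothetical subset of draw parameters, for illustration only."""
    index_size: int           # 0 means a non-indexed draw
    index_bias: int           # only meaningful when index_size != 0
    index_bounds_valid: bool  # whether min_index/max_index may be read
    min_index: int
    max_index: int


def effective_index_bias(info):
    # Never read index_bias for non-indexed draws.
    return info.index_bias if info.index_size != 0 else 0


def index_bounds(info, fallback):
    # Only trust min/max when the producer marked them valid.
    if info.index_bounds_valid:
        return (info.min_index, info.max_index)
    return fallback


# A non-indexed draw carrying garbage in the undefined fields.
draw = DrawInfo(index_size=0, index_bias=12345, index_bounds_valid=False,
                min_index=777, max_index=3)
```

Here effective_index_bias(draw) yields 0 and index_bounds(draw, (0, 0xFFFFFFFF)) yields the fallback, so the garbage values never reach the hardware setup.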

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>

3 years agov3d: don't use index_bias if not indexed
Marek Olšák [Sun, 22 Nov 2020 06:45:11 +0000 (01:45 -0500)]
v3d: don't use index_bias if not indexed

index_bias is undefined if index_size == 0.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>

3 years agovc4: don't use index_bias if not indexed
Marek Olšák [Sun, 22 Nov 2020 06:53:18 +0000 (01:53 -0500)]
vc4: don't use index_bias if not indexed

index_bias is undefined if index_size == 0.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7679>