review.tizen.org Git - platform/upstream/mesa.git/log

ci/bare-metal: Drop the BM_POE_USERNAME/PASSWORD env var checks.

They're unused since the transition to SNMP in the rpi test farm.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15201>

zink: lower dmod on AMD hardware

this hardware won't return the correct value from dmod instructions,
so lower it to ensure that cts passes

nobody else will ever hit this, so perf isn't an issue and regular fmod
can be left alone

fixes (amd):
KHR-GL46.gpu_shader_fp64.builtin.mod_d*

Fixes: 5fae35fb17d6d89c4fe1d9d5a19d827caf25b9fc ('zink: fix 64bit float shader ops ')

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15306>

zink: add another radv fail

it looks like this one was erroneously excluded

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15307>

zink: update radv fails

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15307>

venus: add VK_EXT_vertex_attribute_divisor

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: add VK_EXT_shader_stencil_export

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: add VK_EXT_robustness2

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: add VK_EXT_depth_clip_enable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: add VK_EXT_conservative_rasterization

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: add VK_EXT_shader_demote_to_helper_invocation

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

venus: update venus-protocol headers

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15265>

anv: include Primitive Header in mesh shader per-primitive output

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

anv: set number of viewports in clip state (mesh)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

intel/compiler: mark some variables as per-primitive in FS if they come from MS

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

intel/compiler: handle ViewportIndex, PrimitiveID and Layer in MUE setup

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

intel/compiler: inject MUE initialization

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

intel/compiler: shift mesh urb read/write window when offset is too large

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15303>

aco: always emit vk_cvt_pkrtz_f16_f32 for nir_op_pack_half_2x16_split

From the VK_KHR_shader_float_controls extension:

    "5) Do any of the “Pack” GLSL.std.450 instructions count as
     conversion instructions and have the rounding mode applied?"

    "RESOLVED: No, only instructions listed in “section 3.32.11.
     Conversion Instructions” of the SPIR-V specification count as
     conversion instructions."

This is also the same logic as the LLVM backend.

No fossils-db changes on Sienna Cichlid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15301>

docs: improve language in zink article

Turns out, this was not proper use of language!

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15300>

docs: fixup zink gl 4.3 requirements

The multiViewport feature isn't required for GL 4.3, it's required for
GL 4.1. Technically speaking, we could have just dropped it because we
already list the maxViewports requirement. But it seems better to be
very clear here to me.

Fixes: 29f8f21bff6 ("docs: document zink GL 4.3 requirements")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15300>

broadcom/compiler: don't always assign r5 if available

Instead, only favor assigning r5 if we have first decided to
assign an accumulator. This helps with assining r5 to short
lived uniforms, favoring accumulator rotation to facilitate
QPU merges.

total instructions in shared programs: 12656164 -> 12628339 (-0.22%)
instructions in affected programs: 5368373 -> 5340548 (-0.52%)
helped: 17420
HURT: 9996

total uniforms in shared programs: 3704776 -> 3704863 (<.01%)
uniforms in affected programs: 12247 -> 12334 (0.71%)
helped: 23
HURT: 78

total max-temps in shared programs: 2153505 -> 2152684 (-0.04%)
max-temps in affected programs: 26468 -> 25647 (-3.10%)
helped: 569
HURT: 328

total fills in shared programs: 4656 -> 4657 (0.02%)
fills in affected programs: 43 -> 44 (2.33%)
helped: 0
HURT: 1

total sfu-stalls in shared programs: 34728 -> 34403 (-0.94%)
sfu-stalls in affected programs: 3411 -> 3086 (-9.53%)
helped: 842
HURT: 534

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

broadcom/compiler: add comment on why we don't use r5 with ldunifa

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

broadcom/compiler: adjust register threshold for 2-thread compiles

We have twice the registers in this case so it makes sense to double
this as well. While this causes slight regressions in shader-db
stats (due to additional register pressure), it helps us hide latency
of memory reads better on 2-thread compiles, where the thread switch
mechanism will be less effective. This shows a ~3% performance
improvement on the UE4 SunTemple demo.

total instructions in shared programs: 12642413 -> 12656164 (0.11%)
instructions in affected programs: 2272652 -> 2286403 (0.61%)
helped: 2924
HURT: 3389

total uniforms in shared programs: 3703861 -> 3704776 (0.02%)
uniforms in affected programs: 213729 -> 214644 (0.43%)
helped: 823
HURT: 1272

total max-temps in shared programs: 2150686 -> 2153505 (0.13%)
max-temps in affected programs: 191332 -> 194151 (1.47%)
helped: 1900
HURT: 1891

total spills in shared programs: 3255 -> 3274 (0.58%)
spills in affected programs: 166 -> 185 (11.45%)
helped: 3
HURT: 6

total fills in shared programs: 4630 -> 4656 (0.56%)
fills in affected programs: 367 -> 393 (7.08%)
helped: 7
HURT: 15

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

broadcom/compiler: add a strategy to disable scheduling of general TMU reads

This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:

total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77

total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0

total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455

total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47

total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4

total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0

Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:

total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338

total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74

total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499

total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035

total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21

total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15

total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038

total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

broadcom/compiler: define v3d-specific delays for NIR instructions

We do a few changes over NIR's defaults:

1. Lower delay for texture reads. Empirically, we don't observe any
   benefits with delays over 50 and since this delay value is still
   used by the scheduler in the "favor register pressure" case it is
   benefitial to avoid overestimating it too much.

2. Adjust delay for non-filtered TMU reads to the delay selected for
   texture reads.

3. In our case, UBO reads from dynamically uniform addresses don't
   use the TMU and have a latency of 1 instruction in the best case
   scenario or 4 at worse, so we go with 1 so we don't try to move
   this early.

This helps us get back some of what we lost when updating the
default scheduler configuration to add a delay for non-filtered
memory reads:

total instructions in shared programs: 13126587 -> 12671765 (-3.46%)
instructions in affected programs: 3764097 -> 3309275 (-12.08%)
helped: 14664
HURT: 4244

total threads in shared programs: 407208 -> 415522 (2.04%)
threads in affected programs: 8716 -> 17030 (95.39%)
helped: 4224
HURT: 67

total uniforms in shared programs: 3812698 -> 3711224 (-2.66%)
uniforms in affected programs: 335170 -> 233696 (-30.28%)
helped: 2816
HURT: 3551

total max-temps in shared programs: 2318430 -> 2159345 (-6.86%)
max-temps in affected programs: 539991 -> 380906 (-29.46%)
helped: 13173
HURT: 1440

total spills in shared programs: 49086 -> 5966 (-87.85%)
spills in affected programs: 48306 -> 5186 (-89.26%)
helped: 1655
HURT: 28

total fills in shared programs: 55810 -> 9328 (-83.29%)
fills in affected programs: 54821 -> 8339 (-84.79%)
helped: 1659
HURT: 22

LOST:   0
GAINED: 3

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

nir/schedule: allow drivers to decide about instruction latency

On V3D reading UBOs from uniform addresses uses a more efficient
mechanism with lower latency. On other platforms there may be
simular scenarios.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

nir/schedule: use larger delay for non-filtered memory reads

This has been pending for a long time. It is not very consistent to
add a significant delay for textures and not do it for UBOs, etc
The reason we have not been doing this so far is the accumulated effect
on register pressure for V3D as shown by shader-db results below, but
from the point of view of a generic scheduler it makes sense to do this.

Later patches will address V3D specific issues with register pressure
derived from this by letting the driver control its instruction delay
settings.

total instructions in shared programs: 12662138 -> 13126587 (3.67%)
instructions in affected programs: 1813091 -> 2277540 (25.62%)
helped: 2410
HURT: 10499

total threads in shared programs: 415858 -> 407208 (-2.08%)
threads in affected programs: 17348 -> 8698 (-49.86%)
helped: 8
HURT: 4333

total uniforms in shared programs: 3711483 -> 3812698 (2.73%)
uniforms in affected programs: 128012 -> 229227 (79.07%)
helped: 3474
HURT: 2143

total max-temps in shared programs: 2138763 -> 2318430 (8.40%)
max-temps in affected programs: 318780 -> 498447 (56.36%)
helped: 588
HURT: 11997

total spills in shared programs: 3860 -> 49086 (1171.66%)
spills in affected programs: 709 -> 45935 (6378.84%)
helped: 23
HURT: 1595

total fills in shared programs: 5573 -> 55810 (901.44%)
fills in affected programs: 1067 -> 51304 (4708.25%)
helped: 23
HURT: 1595

LOST: 3
GAINED: 0

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

nir/schedule: handle nir_intrinsic_group_memory_barrier

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

nir/schedule: fix handling of generic memory barrier

We can get a generic nir_intrinsic_memory_barrier to represent a
barrier involving multiple semantics (instead of getting individual
specific barriers for each semantic). This means that we need to
consider these as potentially affecting shared memory access as well.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

broadcom/compiler: stop moving UBO loads before NIR scheduling

This doesn't have any significant impact shader-db stats and would
reduce our capacity to hide latency from the loads, so it is probably
undesirable:

total instructions in shared programs: 12663189 -> 12663186 (<.01%)
instructions in affected programs: 4222 -> 4219 (-0.07%)
helped: 9
HURT: 4

total uniforms in shared programs: 3711624 -> 3711629 (<.01%)
uniforms in affected programs: 186 -> 191 (2.69%)
helped: 0
HURT: 2

total max-temps in shared programs: 2138822 -> 2138857 (<.01%)
max-temps in affected programs: 569 -> 604 (6.15%)
helped: 1
HURT: 9

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>

lavapipe: set non-zero device/driver uuid

Closes #5875

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15230>

turnip: Make autotuner work with reusable command buffers

To achieve it each command buffer now has its own GPU memory.

However the BOs usage by autotuner is not optimal, the ideal
pattern would be to use some memory pool to suballocate small
GPU memory chunks, since most command buffers have only a few
renderpasses.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5990

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14996>

virgl: Add a few more formats to the format table

These formats are used by the piglit

arb_texture_buffer_object-formats fs arb

Adding them here keeps the piglit from crashing, but most of the related
tests don't pass.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13645>

intel: Use 3DSTATE_BINDING_TABLE_POOL_ALLOC exclusively on Gfx11+

On Icelake and later, we can use a new 3DSTATE_BINDING_TABLE_POOL_ALLOC
command to update the location of the binder (buffer containing binding
table entries), rather than having to move Surface State Base Address
via a STATE_BASE_ADDRESS command. This has less stalling and also means
our surface addresses can remain relative to a fixed 4GB address range,
meaning we don't have to re-stream them any time the binder changes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>

intel: Limit Wa_1607854226 to Gfx12.0 only

This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>

iris: Rename surface_base_address to binder_address in a few places

On Gfx11+, we're going to stop changing Surface State Base Address
and instead start changing the Binding Table Pool address instead.

So, rename a few things to track the last binder address, which is
what we're actually changing, regardless of how we program it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>

iris: Use more efficient binding table pointer formats on Icelake+.

Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size.  Each binding table within the
binder must be aligned to 32B.

XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment.  Larger
binders mean fewer stalls as we update the base address for the binder.

Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.

We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8).  Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512KB / 256B) tables into a
a buffer before needing to allocate a new one and stall.

Technically, the old format could also store 2048 binding tables per
buffer as well (64KB / 32B = 2048).  However, tables that needed more
than 8 entries would need multiple 32B chunks.  A single table would
take multiple aligned chunks, while with the larger 256B format, it
could fit in a single one.

This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>

blorp: Add a binding_table_offset_to_pointer helper

On Gen11+, we have a feature that requires us to shift binding table
offsets by 3. This adds a helper which gives the driver a hook to do
this if it so chooses.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>

gallium/tc: zero alloc transfers

Otherwise this causes trouble with unitialized memory, eg with:
   struct si_transfer {
      struct threaded_transfer b;
      struct si_resource *staging;
   };
'staging' will not be initialized and this causes #6109.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6109

Cc: mesa-stable
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15298>

util/slab: add slab_zalloc

A a variant that clears the allocated object to 0.

Cc: mesa-stable
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15298>

tu: Refactor VS DECODE/DEST to be emitted in two pkt4

Refactor to emit VFD_DECODE and VFD_DEST_CNTL in two packets
regardless of attribute count.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14584>

mesa/st: make export_point_size shader key clobber existing psiz

this is necessary to upload the API value using the uniform constant

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: check max output components for adding pointsize during precompile

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: count FF shaders as needing psiz export for precompile

this is consistent with logic for regular compiles

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: precompile with API pointsize only if the shader doesn't have pointsize

this is a more accurate hint, and maintains the existing behavior given
subsequent changes to this area

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: simplify pointsize precompile conditional

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: simplify pointsize shader update conditional

ES contexts have no API toggle for this, so it will never be flagged

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa: always set PointSizeEnabled for API_OPENGLES2

this is implicit, so make it explicit

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: only add pointsize output if it doesn't exceed max component limit

fixes (zink-nvidia):
dEQP-GLES31.functional.geometry_shading.basic*

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

nir/gather_info: check copy_deref instrs for writing outputs

this is a valid way to write an output even though it usually gets rewritten
to some other instruction later on

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: conditionally add pointsize outputs to ES tess/geom shaders

if the driver requires this value to be set, add it if the shader doesn't
use the ext to allow exporting it

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

mesa/st: add a gl_program struct flag to skip psiz exports for xfb

if this output did not exist in the original shader,
then it must not be exported in xfb

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

glsl: store OES/EXT point_size extension enablement to shader struct

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15228>

panvk: Implement vkCmdDispatch()

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Add support for storage image

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Move dummy attribute buffer emission out of emit_{attribute,varying}_bufs

So we can easily add entries after the standard varyings/attributes
(like image descriptors in the attribute array).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Add support for storage/uniform buffers with dynamic offsets

The idea of storing offsets in a separate UBO and lowering accesses to
UBOs/SSBOs with a dynamic offset was not great. Let's apply the offset
at UBO/SSBO emission time instead.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Support creation of compute pipelines

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Add support for storage buffers

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

panvk: Add support for push constants

Push constants are stored in a separate UBO.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15248>

zink: handle spirv xfb insanity

this comes in two flavors:
* streamout of array<struct/block>
* partial streamout of array/struct/block

for the former:
* arrays of structs can just be blasted out in the initial var declaration (easy)
* arrays of blocks must be output to separate xfb buffers for each array block,
  which requires skipping initial xfb blast-off and instead propagating the values
  using tmp variables at a later point

for the latter:
* the optimal way to do this is to unwrap the struct first to figure out what's being
  emitted, at which point the value can be extracted and exported

fixes the rest of spec@arb_gl_spirv@execution@xfb
...except spec@arb_gl_spirv@execution@xfb@vs_block_array, which I'm suspecting is broken
due to vtn bugs

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: store shader to ntv_context

it's insane the gymnastics that have to be done because this wasn't stored

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: handle remaining xfb corner cases during analysis

this now handles inlining of stupid types (dvec3, dmatX) and complex
types (goku) as seen in cts

fixes:
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
KHR-Single-GL46.enhanced_layouts.xfb_struct_explicit_location

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: fix xfb analysis variable finding for arrays

this fixes clipdistance exports

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: correctly set xfb packed output offsets

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: store the correct number of components for xfb packing outputs

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

zink: use 64bit mask for xfb analysis

I don't know how this worked before since all the values are oob?

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15224>

ci: more stoney flakes

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

ci: add another stoney flake

from #6109

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/cso: stop tracing during cso_unbind()

this unnecessarily bloats lavapipe traces

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/trace: dump more rasterizer state members

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/trace: dump clear_texture colors

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/trace: dump clear colors as uints

dumping as float is nice if the clear color is a float, but if it isn't
then the value is useless

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/trace: rzalloc the context struct

this has problems if pointers are garbage

cc: mesa-stable

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

aux/trace: more screen methods

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14854>

lavapipe: fix pipeline creation for blend and zs states

these values are read based on the specified subpass containing the
required attachments, not on the overall renderpass

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15282>

lavapipe: update multisample state after blend state

null blend pipeline state will zero the blend struct, which would cause
values set here to be overwritten

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15282>

turnip: Don't call getenv() directly

I noticed it was using getenv directly when I tried to use 'setprop
mesa.tu.debug ..' on android. Use os_get_option() instead so we get
sysprop fallback on android.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15289>

virgl: Fix texture transfers by using a staging resource

This commit fixes the following flaws in the implementation:

* when a resource was re-allocated, the guest side storage
  was also allocated
* when a source needs a readback before being written to, then
  the call would go through vws->transfer_get, thereby bypassing the
  staging resource, and this would fail on the host, because no
  the allocated IOV was too small (just one byte)
* if the texture write would need neither flush nor readback, the
  old code path would be used expecting that guest side backing stogage
  for the texture.

v2: - actually do a readback to the stageing resource when it is required
    - fix typo (Lepton)

v3: Don't use stageing transfers if the host can't read back the data
    by rendering to an FBO or calling getTexImage, because in this case
    we rely on the IOV to hold the date.

v4: Also don't use staging transfers if the format is no readback
    format. Otherwise we have to deal with the resolve blit, and
    this is currently not working correctly.

v5: add a new flag that indicates whether non-renderable textures can
    be read back (either via glGetTexImage or GBM)

v6: Restrict the use of staging texture transfers to textures that can
    be read back, and on GLES also if the they are bound to scanout and
    the host uses minigbm to allocate such textures.
    For that replace the flag indicating the capability to read back
    non-renderable textures with a cap that indicates whether scanout
    textures can be read back.

v7: update virglrenderer version in the CI

v8: update use of stageing (Chia-I)

v9: remove superflous check and assignment (Chia-I)

v10: disable stageing textures for arrays with stencil format. This is a
     workaround for failures of the CI.

Fixes: cdc480585c9be368ddfdc33e2eb73e3582f25fe7
    virgl/drm: New optimization for uploading textures

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14495>

llvmpipe: clamp surface clear geometry

avoid oob writes to avoid crashing

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14655>

lavapipe: clamp clear attachments rects

there is at least one unnamed game which has problems with this, so try
to avoid crashing

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14655>

llvmpipe: fix debug print iterating in set_framebuffer_state

this would potentially access garbage memory by checking the existing
state using the incoming state's iterator values

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14857>

zink: ci updates

cc: mesa-stable

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>

zink: fix 64bit float shader ops

this was being set from back before zink actually supported 64bit
natively and only 32bit was functional, but it breaks 64bit support

cc: mesa-stable

fixes (lavapipe):
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec2
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec3
KHR-GL46.gpu_shader_fp64.builtin.mod_dvec4

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>

zink: run nir_lower_phis_to_scalar in optimization loop

fixes (lavapipe):
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic*

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15274>

radv,aco,llvm: lower post shuffle vertex in NIR

fossils-db (Sienna Cichlid):
Totals from 774 (0.57% of 134913) affected shaders:
VGPRs: 26496 -> 26312 (-0.69%)
CodeSize: 1825936 -> 1828812 (+0.16%); split: -0.04%, +0.20%
MaxWaves: 22046 -> 22062 (+0.07%)
Instrs: 347634 -> 347975 (+0.10%); split: -0.05%, +0.15%
Latency: 1363949 -> 1356426 (-0.55%); split: -0.59%, +0.04%
InvThroughput: 221529 -> 221380 (-0.07%); split: -0.10%, +0.04%
VClause: 5682 -> 5676 (-0.11%); split: -1.46%, +1.36%
SClause: 7485 -> 7411 (-0.99%); split: -1.48%, +0.49%
Copies: 30481 -> 30420 (-0.20%); split: -0.51%, +0.31%
PreVGPRs: 19717 -> 19656 (-0.31%)

fossil-db (Polaris10):
Totals from 896 (0.66% of 135960) affected shaders:
SGPRs: 49824 -> 49648 (-0.35%); split: -0.39%, +0.03%
VGPRs: 31040 -> 29948 (-3.52%); split: -3.62%, +0.10%
CodeSize: 875960 -> 875920 (-0.00%); split: -0.06%, +0.05%
MaxWaves: 6380 -> 6429 (+0.77%)
Instrs: 171522 -> 171482 (-0.02%); split: -0.07%, +0.05%
Latency: 1356082 -> 1334386 (-1.60%); split: -1.61%, +0.01%
InvThroughput: 553389 -> 552957 (-0.08%); split: -0.08%, +0.00%
VClause: 4317 -> 4244 (-1.69%); split: -2.41%, +0.72%
SClause: 6157 -> 6139 (-0.29%); split: -0.45%, +0.16%
Copies: 9340 -> 9235 (-1.12%); split: -1.24%, +0.12%
PreVGPRs: 22366 -> 22116 (-1.12%)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15113>

nir: Introduce workgroup_index and ability to lower workgroup_id to it.

The workgroup_index is intended for situations when a 3 dimensional
workgroup_id is not available on the HW, but a 1 dimensional index is.
In this case, we can use lower the 3D ID to use this.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>

nir: Extract lower_id_to_index into a separate function.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>

nir: Fix lowering terminology of compute system values: "from"->"to".

This is to match other NIR terminology.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>

panvk: Non-destructively stub GetRenderAreaGranularity

Don't crash. Just print a warning and return 1x1.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15285>

panvk: Advertise zero sparse format properties

This is the correct implementation when you don't support sparse and
fixes piles of CTS crashes.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15285>

panvk: Advertise VK_KHR_get_physical_device_properties2

All the entrypooints are already implemented and a bunch of Vulkan CTS
tests assume this extension exists.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15285>

gallium/dri: Add missing in_fence_fd initialization

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6108
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15272>

vulkan/device_select: add has_vulkan11 flag with has_pci_bus flag

In EnumeratePhysicalDevices(), pci bus info is available only in
vulkan version >= 1.1. hence adding has_vulkan11 flag in places
where has_pci_bus is used in EnumeratePhysicalDevices() code flow.

Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14535>

vulkan/device_select: for vulkan 1.0 use vid/did for boot_vga

In device select layer EnumeratePhysicalDevices() function pci
bus information is available only in case of vulkan >= 1.1.
Hence use vid/did to match boot_vga device in case of vulkan 1.0.

Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14535>

nir: Fix handling of NV_mesh_shader PRIMITIVE_INDICES output.

PRIMITIVE_INDICES is a flat array in NV_mesh_shader,
not a proper arrayed output, as opposed to D3D-style
mesh shaders where it's addressed by the primitive index.

Prevent assigning several slots to primitive indices,
to avoid issues.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15160>

aco/insert_exec_mask: optimize top-level transition to exact before demote

fossil-db (Sienna Cichlid):
Totals from 5767 (3.55% of 162293) affected shaders:
Instrs: 3264949 -> 3257527 (-0.23%); split: -0.23%, +0.00%
CodeSize: 17835692 -> 17806004 (-0.17%); split: -0.17%, +0.00%
Latency: 45990060 -> 45987924 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 7643850 -> 7643835 (-0.00%); split: -0.00%, +0.00%
Copies: 193641 -> 186219 (-3.83%); split: -3.84%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>

aco/insert_exec_mask: use get_exec_op

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>

aco/insert_exec_mask: fix top-level to-exact with non-global exact mask

After transitioning to exact after a discard, the exec stack might be:
[exact|global, wqm, exact]

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>

radeonsi/ci: Mark a bunch of flaky tests on stoney

radeonsi-stoney-gl:amd64 job fails due to random crashes of some
'dEQP-GLES3.functional.buffer.map.write.explicit_flush.*' tests.

Fix the pipeline by adding them to 'radeonsi-stoney-flakes.txt'.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15238>