review.tizen.org Git - platform/upstream/mesa.git/log

anv: Use vma_heap for descriptor pool host allocation

Pre-patch, anv_descriptor_pool used a free list for host allocations
that never merged adjacent free blocks.  If the pool only allocated
fixed-sized blocks, then this would not be a problem. But the pool
allocations are variable-sized, and this caused over half of the pool's
memory to be consumed by unusable free blocks in some workloads, causing
unnecessary memory footprint.

Replacing the free list with util_vma_heap, which does merge adjacent
free blocks, fixes the memory explosion in the target workload.

Disdavantges of util_vma_heap compared to the free list:
  - The heap calls malloc() when a new hole is created.
  - The heap calls free() when a hole disappears or is merged with an
    adjacent hole.
  - The Vulkan spec expects descriptor set creation/destruction to be
    thread-local lockless in the common case. For workloads that
    create/destroy with high frequency, malloc/free may cause overhead.
    Profiling is needed.

Tested with a ChromeOS internal TensorFlow benchmark, provided by
package 'tensorflow', running with its OpenCL backend on clvk.

  cmdline: benchmark_model --graph=mn2.tflite --use_gpu=true --min_secs=60
  gpu: adl
  memory footprint from start of benchmark:
    before: init=132.691MB max=227.684MB
    after:  init=134.988MB max=134.988MB

Reported-by: Romaric Jodin <rjodin@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20289>

util/vma: Track size of free memory in heap

This allows users to detect fragmentation on allocation failure.
If heap allocation fails but the allocation size is not larger than the
total free size, then the allocation failed due to fragmentation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20289>

Revert "anv: Refactor anv_pipeline to use the anv_pipeline_type"

This reverts commit b1126abb38a5ff4a10c4c3240d01c22c1fb90c1b.

This breaks all hell at least on DG2, as there are several cases left
where current_pipeline gets checked against GPGPU to decide what to do,
and the value doesn't match that of ANV_HW_PIPELINE_STATE_COMPUTE.
On top of that, it also misses checking for
ANV_HW_PIPELINE_STATE_RAYTRACING.

Then there's the fact that in some cases, current_pipeline will be
UINT32_MAX, because it's the original undefined state and also used
after executing a secondary command buffer because we are not tracking
on which pipeline did the secondary left us.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7910
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20349>

iris: Don't reject CPU access for non-invalidating buffer write maps

Buffer maps that don't invalidate their destination range work better
as direct CPU maps than staging blits.  The application may write only
part of the range, effectively combining the new data with existing
data.  So even if the map would stall, the staging blit path won't help
us, as we have to read the existing data to populate the staging buffer
before returning it.  This incurs a stall anyway - plus a read and copy.

In contrast, a direct map doesn't need to read any data - it can just
write the destination and the existing data will still be there.

Fixes excessive blits for stalling buffer writes that don't invalidate
the buffer since my recent map heuristic rework.

Fixes: bec68a85a2d ("iris: Improve direct CPU map heuristics")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7895
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20330>

anv: remove some gen8 specifics handled now in hasvk

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20342>

ci: restore reliable Alpine 3.16

Alpine 3.17 suffered random freezes.

Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20294>

iris: Check for zero in clear color compatibility fn

Both formats may interpret the clear color as zero.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20320>

d3d12: Add ASSERTED to variables only used in debug builds to fix build MSVC with C4189 errors

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20340>

intel/isl: Disable CCS on MTL until B0 (Wa_14017353530)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>

intel/dev: Enable AUX map on MTL

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>

intel/aux_map: Ignore format bits when using tile-4

Based on Jianxun's ("iris: don't get format bits in AUX tables").

With gfx12.5+, the compression format is once again coming from the
surface state programming. MTL once again uses an aux-map, but it
ignores the format bits within the the aux-map metadata.

Ref: Bspec 44930: "Compression format from AUX page walk is ignored.
Instead compression format from Surface State is used."

gfx12.5+ also uses tile-4 rather than y-tiling, so if we don't see
y-tiling, we can return 0 from intel_aux_map_format_bits() for the
ignored format bits.

Rework:
* Just return 0 if not using y-tiling as suggested by Nanley.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>

iris/resource: Check devinfo::has_local_mem before using BO_ALLOC_LMEM

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>

iris: Nuke dead IRIS_CONTEXT* macros

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19650>

iris: Nuke flags from iris_bufmgr that can read from devinfo

Now that devinfo is stored in iris_bufmgr we can nuke this duplicated
flags.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19650>

iris: Only fetch intel_device_info once per bufmgr

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19650>

iris: Store intel_device_info in iris_bufmgr

We can have multiple pipe_screen but only one iris_bufmgr per device.
So better to store intel_device_info into the shared iris_bufmgr and
save some memory.
Also in future patches iris_bufmgr will make more use of
intel_device_info.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19650>

anv: fixup another dirty issue with gpu_memcpy

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20335>

panfrost: fix memory leak related to disk cache

Direct leak of 3912 byte(s) in 2 object(s) allocated from:
    #0 0x7fbd4641b0 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xa41b0)
    #1 0x7f74413518 in parse_and_validate_cache_item ../src/util/disk_cache_os.c:549
    #2 0x7f74414b84 in disk_cache_load_item ../src/util/disk_cache_os.c:599
    #3 0x7f74410364 in disk_cache_get ../src/util/disk_cache.c:551
    #4 0x7f775695ac in panfrost_disk_cache_retrieve ../src/gallium/drivers/panfrost/pan_disk_cache.c:125

Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20336>

anv: Refactor anv_pipeline to use the anv_pipeline_type

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20316>

radv/rra: Fix leaf node id order

Leaf nodes aren't stored in build order so we have to account for that
when dumping leaf node ids.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20184>

radv/rra: Validate geometry_id

The following patch will use geometry_id so make sure that it's in
bounds.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20184>

radv/rra: Refactor resource management during dumping

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20184>

radv/rra: Emit leaf node ids for leaf nodes instead of internal nodes

Fixes: e4283d8 ("radv/rra: Handle box16 nodes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20184>

ac/sqtt: bump the maximum number of traces to 6 for GFX11

GFX11 can have more than 4 SEs. I think it would be better to allocate
an array but that's for later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20337>

ac/rgp: add missing GFX11 bits for RGP

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20337>

ac/nir: remove num_es_threads_var

A bit count of es_accepted works for both when ngg is and isn't
dynamically enabled. Unlike the other sequence, this should only be a
single SALU instruction.

fossil-db (gfx1100, nggc):
Totals from 41388 (30.75% of 134574) affected shaders:
Instrs: 25783544 -> 25432959 (-1.36%); split: -1.36%, +0.00%
CodeSize: 127281160 -> 125878820 (-1.10%); split: -1.10%, +0.00%
Latency: 92849566 -> 92723047 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 9542194 -> 9485012 (-0.60%); split: -0.60%, +0.00%
Copies: 2031074 -> 1928796 (-5.04%); split: -5.04%, +0.00%
Branches: 642407 -> 642409 (+0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20321>

ac/nir: fix ngg culling on gfx11

This subtraction can underflow.

If subgroup_id*wave_size is larger than num_live_vertices_in_workgroup,
num_es_threads_var should be zero.

fossil-db (gfx1100, nggc):
Totals from 41388 (30.75% of 134574) affected shaders:
Instrs: 25700772 -> 25783544 (+0.32%)
CodeSize: 126950072 -> 127281160 (+0.26%)
Latency: 92809233 -> 92849566 (+0.04%); split: -0.00%, +0.04%
InvThroughput: 9526675 -> 9542194 (+0.16%)
Copies: 2031078 -> 2031074 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20321>

vc4: replace open-coded F_DUPFD_CLOEXEC with os_dupfd_cloexec()

Just like 12 lines above.

Split out of !20180

Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20313>

intel/common/intel_genX_state.h: Add intel_set_ps_dispatch_state()

This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c477 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.

So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>

intel/common: Add intel_genX_state.h

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>

radv/ci: add lists for GFX1100

0 failures, call it a win (the RT ones are CTS bugs).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20315>

st/mesa: Enable Alpha writes when writing RGB faked as RGBA

Some GPUs are able to render more efficiently when all channels of a
color attachment are written, since whole pixels are being overwritten,
rather than hitting a read-modify-write cycle where newly written data
has to be combined with existing unmodified image data.

When faking GL_RGB as RGBA (in case RGB/RGBX isn't color renderable),
we introduce an extra channel that doesn't exist from the application
point of view.  With such a format, a color mask of 0x7 (RGB) would mean
to write all channels.  But because we've added an alpha channel behind
their back, this becomes a partial write.  We are free to write whatever
garbage we want to the alpha channel, however.  So we can enable alpha
writes, making this a more efficient full pixel write again.

This is done unconditionally as it's expected to address a problem
common to many drivers and isn't expected to be harmful, even on GPUs
where it may not help much.

Improves WebGL Aquarium performance on Alderlake GT1 by around 2.4x, in
the Chromium, using Wayland (the --enable-features=UseOzonePlatform and
--ozone-platform=wayland flags).

v2: Don't require PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND (Marek)
v3: Fix independent blending enables (Emma) - now set when needed,
    skipped when not needed, and PIPE_CAP_INDEP_BLEND_ENABLE is no
    longer a requirement.  We just optimize where we can.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7864
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v2]
Reviewed-by: Emma Anholt <emma@anholt.net> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20290>

docs: update calendar and link releases notes for 22.3.1

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20328>

docs: add release notes for 22.3.1

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20328>

panfrost: Add tool to print supported texture formats

While all Panfrost-supported Mali GPUs support all the compressed texture
formats architecturally, the system integrator decides which formats will
actually be wired up in the production system-on-chip. In the past there may
have been legal considerations, I'm neither a lawyer nor a system integrator so
couldn't say.

It's useful for users to know which compressed texture formats are supported by
their hardware, to understand its performance characteristics (and perhaps to
buy systems that support their needs, especially if they need BCn formats which
are omitted in many Mali implementations).

To help with that, this commit adds a small standalone tool that prints which
formats are supported. It is tested so far on Mali-T860 and Mali-G57.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Chris Healy <healych@amazon.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20086>

ci/nouveau: Add a bunch of the top hits of gk20a flakes.

A bit of categorization in the process.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20326>

ci/nouveau: Sort some uncategorized gk20a flakes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20326>

nir: Allow more than just ALU instructions in 'weak' GVN

This removes the ALU-only restriction on the "weak" GVN introduced by
the previous commit. This makes it slightly more aggressive, allowing
it to coalesce things like UBO loads (still within sister then/else
blocks). This also can have surprisingly large cascading effects.

I was concerned that this might increase register pressure, but
shader-db and fossil-db show effectively no change in spills/fills,
so it seems to be fine.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>

nir: Perform 'weak' global value numbering in all GCM passes

Full global value numbering (GVN) can be pretty aggressive, moving
values far away from their original locations, even out of loops,
and can extend their live ranges a lot.  So we've left it disabled.

This patch introduces a weaker form of GVN: we only allow coalescing
identical values when they appear on either side of the same if/else
construct.  For now, we also only allow ALU instructions.

This allows nir_opt_gcm to clean up identical instructions appearing
on both sides of if/then/else control flow.  But it avoids aggressively
combining every other occurrence of a value in the program.

This can still have surprisingly large cascading effects, as simple
constructs are cleaned up, leading to more opportunities to do the
same clean up, up a chain of nested ifs.  It also enables greater use
of the select peephole as ifs are cleaned up.

shader-db and fossil-db results show a reduction in spills/fills on
Icelake, so it doesn't seem to be hurting register pressure.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19823>

anv: remove anv_reloc_list->array_length

This is another field that, after the recent commits, became unused.
It's either zero-initialized (by the memset) or copy-initialized
(which means it's also zero). And it never even gets used anywhere
anyway, so even if the value was non-zero it wouldn't matter.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20309>

anv: remove anv_reloc_list->reloc_bos

As a consequence of the last two commits, reloc_bos is always NULL and
never used anywhere, so remove it.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20309>

anv: remove anv_reloc_list_grow()

The last commit made it clear that anv_reloc_list_grow() only ever
gets called with zero as num_additional_relocs, which means it will
always immediately return VK_SUCCESS without doing anything. That
means we can remove it.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20309>

anv: remove anv_reloc_list->num_relocs

There are only a few places in the code where num_relocs gets set:

  - During anv_reloc_list_init() where it gets memset() to 0.
  - At anv_reloc_list_init_clone() where it gets set with the value of
    another anv_reloc_list->num_relocs.
  - During anv_reloc_list_clear(), where it gets set to 0.
  - During anv_reloc_list_append(), where it gets added with the value
    of another anv_reloc_list->num_relocs.

As you can see, either we explicitly set the value to 0 or we copy the
value that's present in another anv_reloc_list, which should be 0. The
one place where we used to increment num_relocs was in
anv_reloc_list_add(), but that was deleted by:

  7b7381e8d7a5 ("anv: Delete anv_reloc_list_add()")

So in this commit we delete the num_relocs field from struct
anv_reloc_list and we also delete some lines where, if the value is 0,
nothing will happen.

There's more we could be deleting here, but I wanted this commit to be
minimal so it's very clear that num_relocs can't be non-zero. We were
having some speculation that anv_reloc_list may still be important for
actually adding BOs to the batch and building the validation list, so
let's go slowly with the removal to make everything more easily
reviewable.

The one possibility I could be missing here is another situation like
the memset() we have at anv_reloc_list_init() or some other crazy
indirect overwrite, but as far as I have checked, that is not the
case.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20309>

anv: remove anv_execbuf->surface_states_relocs

Now that we removed relocations, this is not being used anywhere.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20309>

intel/common: clean up AUX macros

The hardcoded is either replaced with new interfaces or relocated
to C file if it is private.

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20259>

intel/vulkan: replace AUX macros with interfaces

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20259>

intel/isl: Support 1MB alignment for AUX mapping

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20259>

intel/common: Support 1MB granularity AUX mapping format (Bspec 44930)

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20259>

ci/bare-metal: Avoid a bug in armhf stripping causing tempfiles in artifacts.

We're failing to strip, so at least try not to leave a million tempfiles
around.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20302>

ci/baremetal: Clean the directory we unpack artifacts into.

gitlab-runner reuses containers, and since we don't pull git, the working
directory doesn't get cleaned automatically. You don't want to have stale
files from previous builds, particularly if someone's testing changes of
build options that might disable a driver.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20302>

tu: Use start offset for storage buffers

This lets us expose a minStorageBufferOffsetAlignment of 4 which is what
vkd3d-proton expects.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20105>

tu: Expose *TexelBufferOffsetSingleTexelAlignment

This exactly matches what the HW can do.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20105>

freedreno/fdl: Support texel-aligned iova for buffer views

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20105>

freedreno/a6xx: Document buffer-specific tex const fields

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20105>

freedreno: Document various preemption-related registers/packets

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20229>

wsi/x11: Rename the present progress objects.

The lock and condition variable isn't just for present_id anymore,
it's also for normal forward progress.

Adds more detailed comments what the variables are supposed to
accomplish.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19990>

wsi/x11: Fix possible deadlock with wait_ready.

With the introduction of locks around the XCB polling mechanism,
a possible deadlock was introduced.

If all 5 images were rapidly acquired and presented before the
FIFO thread had the chance to submit a present,
we would deadlock.

Before the lock however, it was still buggy since the two threads would
race to poll events and update internal state.

The fix is to just ensure that there are pending presentation requests
in flight, so that forward progress is guaranteed before we take the
poll lock.

Also, use a timedlock for acquire next image.

Similar as WaitForPresentKHR.
Also need to make the busy flag atomic to actually allow acquire thread
and present threads to access the busy flag.
Take advantage of busy flag being atomic so that we can gracefully handle
timeout == 0 scenarios where there actually are images available.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: 8fc7927787 ("wsi/x11: Implement VK_KHR_present_wait on X11.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19990>

radv: Don't lower subgroup shuffle on GFX11.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>

aco: Emulate Wave64 bpermute on GFX11.

Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>

aco: Stylistic changes to emit_gfx10_wave64_bpermute.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>

aco: Split opcodes for GFX6 and GFX10 emulated bpermute.

Different sequences are emitted for these, so it makes sense to
have different opcodes too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>

aco: Don't accept constants on p_bpermute.

The sequence emitted for this pseudo instruction is not ready
to handle constants or literals at all.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>

ci/venus: add a VKCTS mapping test to the flakes list

Seen on https://gitlab.freedesktop.org/mesa/mesa/-/jobs/33483156.

Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20312>

iris: Enable compression for image load/store in more cases

We were calling iris_resource_texture_aux_usage here, which disables
auxiliary support if color happens to already be resolved.  This makes
sense for read only images, where if we know ahead of time that aux
doesn't contain any useful information, we can just tell the hardware
to not bother looking at it.  However, it makes no sense for mutable
images, as even if the aux currently has no useful data, we want to
produce that data when doing our image writes.

Import the bits of logic we need from there and shed the rest.  We don't
need to consider HiZ, MCS, or MC, nor do we need to do format-based
CCS compatibility checks on Gfx12+, so it's actually very little code.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Allow fast clears on compressed image load/store access

While I haven't found documentation saying definitively that HDC
supports fast clear blocks, it seems to work just fine, even on
Tigerlake.  I have found several issues (atomics and HDC support
for linear compression) that both call out fast clears as an issue
in those corner cases, which suggests that fast clears do actually
work outside of those corners (which we already disable).

The previous commit implemented actual aux state updates for image
views.  With ISL_AUX_USAGE_GFX12_CCS_E, this means that we update
the aux state to COMPRESSED_CLEAR after writes.  But because we
weren't supporting fast clears, this meant that any such images
would need partial resolves to remove the clear color on next use.
Supporting fast clears allows us to drop all these resolves.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Update aux state tracking for image views after draws/dispatches

On Tigerlake and later, we enable compression for image views.  However,
we never actually added any code to update the aux state, which meant
that if it ever changed, things would break, badly.

We managed to avoid catastrophic effects in most cases because of
two other issues which papered over the problem: if compression wasn't
already enabled for an image, we'd leave it disabled.  And, we avoided
writing via the CPU to buffers with auxiliary.  So in most cases, CCS
remained disabled, or got enabled (say by glTexImage()) then stayed on
permanently.  There were still issues, but they managed to remain more
hidden than one would expect given the severity of the bug.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Drop disable_rb_aux_buffer handling for image views

The goal here is to support OpenGL 4.6 section 9.3, "Feedback Loops
Between Textures and the Framebuffer" (from GL_ARB_texture_barrier)
where you can bind an image as both a framebuffer attachment and a
texture, and simultaneously sample-from and render-to it.

I'm not aware of any spec language that requires us to handle
simultaneously accessing something as a framebuffer attachment and an
image load/store resource. GL_ARB_shader_image_load_store tends to
make flushing and synchronization something the app has to handle
explicitly rather than something the driver needs to do implicitly.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Drop 'isl_' prefix from 'formats_are_fast_clear_compatible'

Every time I see this function I think it's part of isl.  But it's not,
it's just a static function in an iris file.  The point of the name was
that the function checks two isl_format enums...but the prefix is just
confusing.  Just drop the prefix as it's a static function.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Pin the clear color BO in use_image()

Images with the RC_CCS modifier store the clear color in a separate BO,
which we also need to pin when using an image view.

Most images store the clear color in the same BO so it works anyway.

Thanks to Nanley Chery for catching this!

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

iris: Drop batch parameter from iris_update_postdraw_resolve_tracking

Eventually the resolve code started making everything take ice instead
of batch, and at some point this ceased to be used. It's always render.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19060>

zink: Fix reversed cap declarations for ImageBuffer

Fixes validation fails on
KHR-GLES31.core.texture_buffer.texture_buffer_texture_buffer_range.

Fixes: f55a4407ef97 ("zink: more accurately set {Sampled,Image}Buffer caps")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20308>

radv/ci: bump most jobs to the kernel to 6.1 + latest firmwares

Unfortunately, not all jobs can be using Linux 6.1 right now, as
NAVI10 hits __vm_enough_memory errors then hangs in VKCTS. So for
this job, we will keep Linux 5.17 until this gets fixed.

Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7888
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16835>

anv: assert when number of primitives is higher than max

Such cases can lead to memory corruptions.

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20279>

anv: handle mesh shaders with max primitives == 0

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20279>

radv: disable more NIR opts in radv_postprocess_nir() with DISABLE_OPTIMIZATIONS

To make fast-linking with GPL hopefully a bit faster.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20244>

radv: move a conditional check to radv_remove_color_exports()

Better to have all restrictions inside the function.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20244>

radv: advertise VK_AMD_shader_early_and_late_fragment_tests

Pass all dEQP-VK.*early_and_late* tests on GFX10.3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19738>

radv: implement AMD_shader_early_and_late_fragment_tests

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19738>

spirv: add support for AMD_shader_early_and_late_fragment_tests

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19738>

radeonsi/vcn: add support for 10bit input and enc 8bit output

This change is to support 10bit YUV input in addition to
original H264/HEVC 8bit output case. It adds
rvcn_enc_input_format_t and rvcn_enc_output_format_t for
picture input format and output format separately.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20284>

nir: Eliminate nir_op_i2b

There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. :shrug:

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/builder: Handle i2b conversions specially in nir_type_convert

The shaders affected here are ones that were previously affected when
i2b was unconditionally lowered in opt_algebraic. There are a few places
where some transformations happen in a different order, so some
algebraic patterns are missed.

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914369 -> 19914566 (<.01%)
instructions in affected programs: 92375 -> 92572 (0.21%)
helped: 0 / HURT: 90

total cycles in shared programs: 853851470 -> 853867215 (<.01%)
cycles in affected programs: 12400663 -> 12416408 (0.13%)
helped: 28 / HURT: 69

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16710721 -> 16710700 (<.01%)
instructions in affected programs: 108010 -> 107989 (-0.02%)
helped: 57 / HURT: 103

total cycles in shared programs: 884299412 -> 884306546 (<.01%)
cycles in affected programs: 12986423 -> 12993557 (0.05%)
helped: 87 / HURT: 102

total spills in shared programs: 14937 -> 14925 (-0.08%)
spills in affected programs: 12 -> 0
helped: 9 / HURT: 0

total fills in shared programs: 17569 -> 17557 (-0.07%)
fills in affected programs: 12 -> 0
helped: 9 / HURT: 0

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902347 (<.01%)
instructions in affected programs: 7311 -> 7317 (0.08%)
helped: 3 / HURT: 8

total cycles in shared programs: 741795500 -> 741792266 (<.01%)
cycles in affected programs: 273308 -> 270074 (-1.18%)
helped: 9 / HURT: 2

No shader-db changes on any other Intel platform.

No fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

intel/fs: Use nir_type_convert instead of nir_type_conversion_op

In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

microsoft/compiler: Use nir_type_convert instead of nir_type_conversion_op

In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

spirv: Use nir_type_convert instead of nir_type_conversion_op

In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

v2: Use the actual bit size of the source to determine the conversion
op. With mediump, the "planned" bit size and the actual bit size might
be different. Fixes many, many Vulkan CTS assertion failures on any
platform that sets mediump_16bit_alu (e.g., Freedreno).

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir: Use nir_type_convert instead of nir_type_conversion_op

In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

glsl: Use nir_type_convert instead of nir_type_conversion_op

In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

In gl_nir_lower_packed_varyings, all of the type conversions are between
int32 and uint32 types. In NIR, those are just moves, so elide them.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/builder: Add rounding mode parameter to nir_type_convert

Later changes will use this.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

glsl_to_nir: Fix NIR bit-size of ir_triop_bitfield_extract and ir_quadop_bitfield_insert

Previously these would return result->bit_size of 32 even though the
type might have been int16_t or uint16_t. This prevents many assertion
failures in "glsl: Use nir_type_convert instead of
nir_type_conversion_op" on zink.

Fixes: 5e922fbc160 ("glsl_to_nir: fix bitfield_extract with 16-bit operands")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

microsoft/compiler: Simplify nir_intrinsic_load_front_face handling

It is invalid to have Boolean variables as either shader inputs or
outputs, so there is no point to try to lower them in general.  The only
use for this was some two-phase lowering of
nir_intrinsic_load_front_face that could be done in a single phase.
Create the SYSTEM_VALUE_FRONT_FACE as a uint and compare it with zero at
the same time.

No shader-db or fossil-db changes on any Intel platform.

v2: Remove dxil_nir_lower_bool_input from dxil_nir.h and drop it from
the other caller in the spirv_to_dxil codepath.  Noticed by Jesse.  Fix
setting bit size when loading SYSTEM_VALUE_FRONT_FACE.  Caught by CI.

v3: Use nir_ine_imm.  Change type of gl_FrontFacing GS output in
d3d12_nir_passes from Boolean to integer.  Both suggested by Jesse.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/builder: Emit x != 0 for nir_i2b

There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variation that include both ine
and i2b), just emit a != 0 instead of i2b(a).

I think that the changes to the unit tests weaken them slightly, but
perhaps that's okay?

No shader-db changes on any Intel platform.  The GLSL paths use other
means to generate i2b operations, but the SPIR-V paths use nir_i2b.
Presumably since 4676b3d3dd9 (nir: Use nir_test_mask instead of
i2b(iand)), no fossil-db changes either.

v2: Use nir_ine_imm.  Suggested by Jesse.

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir: Use nir_i2b wrapper everywhere instead of using nir_i2b1 directly

No shader-db or fossil-db changes on any Intel platform.

v2: Add missed i2b1 in ir3_nir_opt_preamble.c.

v3: Add missed i2b1 in ac_nir_lower_ngg.c.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Optimize some b2i involved in masking operations

v2: Remove the ineg from the b2i in the ior pattern. Suggested by
Jason.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914441 -> 19914369 (<.01%)
instructions in affected programs: 63507 -> 63435 (-0.11%)
helped: 24 / HURT: 0

total cycles in shared programs: 853869766 -> 853851470 (<.01%)
cycles in affected programs: 10551542 -> 10533246 (-0.17%)
helped: 24 / HURT: 0

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141163061 -> 141092683 (-0.0%)
Instructions helped: 14103
Instructions hurt: 55

Cycles in all programs: 9132376195 -> 9133183045 (+0.0%)
Cycles helped: 13775
Cycles hurt: 380

Spills in all programs: 18286 -> 18284 (-0.0%)
Spills helped: 1

Fills in all programs: 30647 -> 30643 (-0.0%)
Fills helped: 1

Gained: 133
Lost: 130

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Eliminate unary op on src of integer comparison w/ zero

This helps because it enables cmod propagation to do more.

The removed patterns involving b2i will be handled by other existing
patterns after the unary operations are removed.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19914458 -> 19914441 (<.01%)
instructions in affected programs: 5456 -> 5439 (-0.31%)
helped: 17 / HURT: 0

total cycles in shared programs: 855302118 -> 853869766 (-0.17%)
cycles in affected programs: 327354347 -> 325921995 (-0.44%)
helped: 291 / HURT: 81

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141205979 -> 141205961 (-0.0%)
Instructions helped: 4
Instructions hurt: 3

SENDs in all programs: 7466919 -> 7466913 (-0.0%)
SENDs helped: 1

Cycles in all programs: 9133387327 -> 9133384475 (-0.0%)
Cycles helped: 3
Cycles hurt: 12

In the shader that was helped for sends, it appears that a NIR pass that
moves code out of loops was able to move 3 send operations outside a
loop after this change. I did not investigate further.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Simplify min and max of b2i

This prevents ~400 shader-db regresssions and a handful of fossil-db
regressions after i2b is always lowered.

All Ivy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total cycles in shared programs: 855301494 -> 855302118 (<.01%)
cycles in affected programs: 52787 -> 53411 (1.18%)
helped: 4 / HURT: 5

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141206055 -> 141205979 (-0.0%)
Instructions helped: 14

Cycles in all programs: 9133376616 -> 9133387327 (+0.0%)
Cycles helped: 13
Cycles hurt: 3

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Reassociate some iand to eliminate an operation

No shader-db changes on any Intel platform.

All of the helped shaders were presumably regressed by 4676b3d3dd9 (nir:
Use nir_test_mask instead of i2b(iand)).

v2: Add some comments explaining why specific replacements are used. In
the umin pattern, only markup the first usage of 'b' in the source
pattern.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141384970 -> 141200966 (-0.1%)
Instructions helped: 45842

Cycles in all programs: 9133648977 -> 9133282672 (-0.0%)
Cycles helped: 26812
Cycles hurt: 6025

Gained: 23
Lost: 135

Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Remove redundant i2b(b2i(x)) patterns

A loop below already adds all the permutations... including the 1-bit
version that isn't included in this group.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Remove redundant i2b(-x) pattern

The exact same pattern appears later (around line 1323).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

nir/algebraic: Catch some kinds of copy-and-paste bugs in algebraic patterns

A later commit adds a pattern

   (('umin', ('iand', a, '#b(is_pos_power_of_two)'),
             ('iand', c, '#b(is_pos_power_of_two)')),
    ('iand', ('iand', a, b), ('iand', c, b))),

When I originally made that pattern, I copied and pasted the search to
the replacement as

  (('umin', ('iand', a, '#b(is_pos_power_of_two)'),
            ('iand', c, '#b(is_pos_power_of_two)')),
   ('iand', ('iand', a, '#b(is_pos_power_of_two)'),
            ('iand', c, '#b(is_pos_power_of_two)'))),

The caused the variables in the replacement to be marked is_constant,
and that resulted in an assertion failure deep inside nir_search.

    src/compiler/nir/nir_search.c:530: construct_value: Assertion `!var->is_constant' failed.

These extra validation rules catch this kind of error at compile time
rather than at run time.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>

gallium/pp: typedef and use pp_st_invalidate_state_func to avoid cast

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20042>