review.tizen.org Git - platform/upstream/mesa.git/log

zink: optimize transfer_map for resources with pending reads/writes

we don't need to stall here if we know that we're not about to have any io
conflicts in the buffer

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6924>

zink: add a mechanism to track current resource usage in batches

this is really primitive, but it at least gives an idea of whether a
resource has been submitted for writing in a pending batch

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6924>

radv: fix ignoring the vertex attribute stride if set as dynamic

The vertex attribute stride should be ignored, so make sure it's
initialized to zero if dynamic to avoid computing a wrong offset.

The fact that each element of pStrides must be greater than or equal
to the maximum extent of all vertex input attributes fetched saves us
one user SGPR for the dynamic stride.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3627
Cc: 20.2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7101>

ac,amd/llvm,radv: Initialize structs with {0}

Necessary to compile with MSVC.
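
A minimal sketch of the pattern, assuming the change replaces empty-brace
initializers: `= {}` is a GNU extension in C before C23, while `= {0}` is
standard C and zero-initializes every member. The struct and function names
below are hypothetical and not taken from Mesa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, for illustration only. */
struct example_info {
   uint32_t flags;
   float scale;
   const void *next;
};

void
init_example_info(struct example_info *out)
{
   assert(out != NULL);

   /* Standard C: the first member is set to 0 explicitly and all
    * remaining members are zero-initialized implicitly. */
   struct example_info info = {0};
   *out = info;
}
```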

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7123>

radv/aco: disable NGG GS support because it randomly hangs the GPU

Disable ACO NGG GS until the random GPU hangs are fixed
(one CTS run == one GPU hang here). No hangs so far after
5 full CTS runs with this disabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7108>

nir/opt_uniform_atomics: remove useless returns

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7117>

radv: Only close local_fd when valid

Necessary when drm_device is bypassed.
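
A sketch of the guard on a POSIX system, assuming a negative value is used
to mark an fd that was never opened. The helper name close_if_valid is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Closes *fd only if it holds a valid descriptor and marks it as closed.
 * Returns true if close() was actually called. */
bool
close_if_valid(int *fd)
{
   assert(fd != NULL);

   if (*fd < 0)
      return false;   /* never opened, nothing to close */

   close(*fd);
   *fd = -1;
   return true;
}
```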

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119>

util: Hide timespec_passed on Windows

Windows doesn't have clockid_t.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119>

radv: Increased const usage

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119>

amd/addrlib: Fix warning list for msvc

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119>

intel/fs: Rework scratch handling on Gen9+

The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat like the MRF and we collect the data into that
MRF region before doing a scratch write.  We also use that region for
the header for scratch reads.

This commit changes things and gets rid of the MRF hack.  Instead, we
reserve a single register (which RA is free to pick) for the scratch
header and use split sends for scratch writes to avoid having to do
the copy.  This should provide RA with more freedom in the presence of
spilling as well as avoid some unnecessary data moves.  In future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0.  Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.

One other noticeable difference is the use of SHADER_OPCODE_SEND.  We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions.  This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND.  It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.

Shader-db results on Skylake:

    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2

    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1

Shader-db results on Ice Lake:

    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0

    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0

Shader-db results on Tigerlake:

    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0

    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0

Fossil-db results on Ice Lake:

    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)

More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs/ra: Use a set to track added spill/fill instructions

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs/ra: Sanity-check our IP counts

Starting with e99081e76d4a, we don't re-construct liveness information
every time we spill a register. Instead, we're very careful to track
which instructions are spill instructions and not count those toward
the IP count so that we can continue to use the old liveness information
even though instructions have been added. This commit adds an assert
that sanity-checks that we count the same number of instructions as our
liveness information is based on.
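
The idea can be sketched as follows. This is a simplification: a bool array
stands in for the spill-instruction set, and the function names are
hypothetical rather than the ones used in the register allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the instructions that existed when liveness was computed, i.e.
 * everything that is not marked as a spill/fill instruction. */
unsigned
count_non_spill_instructions(const bool *is_spill, size_t num_insts)
{
   assert(is_spill != NULL || num_insts == 0);

   unsigned ip = 0;
   for (size_t i = 0; i < num_insts; i++) {
      if (!is_spill[i])
         ip++;
   }
   return ip;
}

/* Sanity check: the IP count we walk must match the number of
 * instructions the (stale but still usable) liveness data is based on. */
void
check_ip_count(const bool *is_spill, size_t num_insts, unsigned live_insts)
{
   assert(count_non_spill_instructions(is_spill, num_insts) == live_insts);
}
```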

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs/ra: Store the last non-spill VGRF node

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs/ra: Refactor handling of Gen7 scratch reads

The attempt at de-duplication with the gen7_read Boolean wasn't actually
saving us anything.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs/ra: Increment spill_offset as part of the emit_spill loop

This makes it consistent with our handling of src.offset and with our
handling of spill_offset in emit_unspill.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs: Add a SCRATCH_HEADER opcode

This opcode is responsible for setting up the buffer base address and
per-thread scratch space fields of a scratch message header.  For the
most part, it's a copy of g0 but some messages need us to zero out g0.2
and the bottom bits of g0.5.

This may actually fix a bug when nir_load/store_scratch is used.  The
docs say that the DWORD scattered messages respect the per-thread
scratch size specified in gN.3[3:0] in the message header but we've been
leaving it zero.  This may mean that we've been ignoring any scratch
reads/writes from a load/store_scratch intrinsic above the 1KB mark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/fs: Copy the PTSS from g0 for scratch reads/writes

In theory, this fixes a bug where we were dropping the PTSS bound on the
floor.  The hardware docs claim that the A32 DWORD and BYTE scattered
read/write messages do a PTSS bounds check.   However, in practice, it
seems that the hardware ignores the bounds check so this doesn't
actually matter.  I verified this with the following couple of piglit
tests:

    https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399

In practice, this prevents the next commit from making a subtle
behavioral change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

intel/batch_decoder: Don't claim vec4 vs/gs/tcs shaders on Gen11+

Because we hard-coded the default to vec4, any platform that doesn't
have a "Dispatch Mode" field gets vec4 by default. This includes Gen11+
where vec4 is no longer a thing. Change the default so it works on
newer hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>

v3dv/device: Support loader interface version 3.

Port of 1e41d7f7b0855934744fe578ba4eae9209ee69f7:
"anv: Support loader interface version 3 (patch v2)"

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix buffer copies to compressed images on the blit path

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: drop a couple of obsolete comments

We only expose a coherent memory heap, so invalidation and flushing
are always no-ops for us.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: limit blit framebuffer dimensions to max coordinates

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: generate proper UUIDs for device and driver

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix blit path for copies from 3D compressed images

The aliasing we were using was not always correct. Particularly,
for 3D images, the simulator would complain about image strides
not being large enough in some cases.

This patch fixes this by aliasing both src and dst images and
carefully choosing the alias dimensions taking into account the
format chosen for the copy and the ratio of block sizes between
both images.

Playing a bit with the image dimensions used by the relevant CTS
tests we confirmed this works well for all tile layouts (lineartile,
ublinear1/2 and UIF).

This fixes all CTS tests involving 3D image copies from compressed
formats without needing to force UIF layout for all compressed
images (which would actually not work for all image sizes either).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fixes for barriers in secondary command buffers

This patch addresses various issues, mostly from secondary command buffers
that recorded pipeline barriers that are not consumed in the secondary itself,
so they need to be applied to jobs that come right after the execution of the
secondary in a primary command buffer.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement workaround for GFXH-1918

Loading depth with odd width/height might cause incorrect loading
of the early-Z buffer.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement workaround for GFXH-1461

If a subpass clears one aspect of Depth/Stencil but loads the other
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.

Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: flag tmu_dirty_rcl in primaries when linking secondaries that have it set

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: only advertise one memory type

Our current implementation is always coherent.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: always program a reasonable internal depth type for copies/clears

This doesn't seem to fix anything, but it is the right thing to do.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline_cache: extend pipeline cache envvar

So far, V3DV_ENABLE_DEFAULT_PIPELINE_CACHE only allowed configuring the
driver to avoid any caching through a pipeline cache.

With this change we can be more detailed. The envvar is no longer a
boolean. Allowed values:

  * "off": no pipeline cache at all. PipelineCache objects behave as
    no-op objects.

  * "no-default-cache": user PipelineCache objects cache nir/variants,
    but we don't provide a default cache when the user doesn't provide
    a PipelineCache object, nor for internal pipelines.

  * "full" (default): we provide a default PipelineCache, used when
    the user doesn't provide one when creating a Pipeline, and for
    internal Pipelines.
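
A sketch of how the three values could be mapped to an enum. The enum and
function names are hypothetical, and treating an unset or unrecognized value
as "full" is an assumption based on "full" being the documented default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum pipeline_cache_mode {
   CACHE_OFF,
   CACHE_NO_DEFAULT,
   CACHE_FULL,
};

/* value would typically come from getenv("V3DV_ENABLE_DEFAULT_PIPELINE_CACHE") */
enum pipeline_cache_mode
parse_pipeline_cache_mode(const char *value)
{
   if (value == NULL)
      return CACHE_FULL;
   if (strcmp(value, "off") == 0)
      return CACHE_OFF;
   if (strcmp(value, "no-default-cache") == 0)
      return CACHE_NO_DEFAULT;
   return CACHE_FULL;   /* "full" and, by assumption, anything else */
}

/* Whether the driver-provided default cache is used for pipelines created
 * without a user cache and for internal pipelines. */
bool
uses_default_cache(enum pipeline_cache_mode mode)
{
   switch (mode) {
   case CACHE_OFF:
   case CACHE_NO_DEFAULT:
      return false;
   case CACHE_FULL:
      return true;
   }
   assert(!"invalid pipeline cache mode");
   return false;
}
```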

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline_cache: set a max size for the pipeline cache

We don't want to let the default pipeline cache grow without limit. We
choose a maximum number of entries that should work for all real world
applications. CTS will exceed that limit, but that is okay, as it will
prevent us from running out of memory.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/compiler: allow to batch spills

Some shaders that need to spill hundreds of registers can take very long times
to compile as each allocation attempt spills a single register and restarts
the allocation process. We can significantly cut down these times if we allow
the compiler to spill in batches, which should be possible if we are spilling
uniforms, which is in fact the kind of spills that we do first because they
have lower cost than TMU spills.

Doing this could cause us to slightly over spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.

v2:
  - Keep it simple and just try to spill a fixed amount of registers in a
    batch instead of trying to compute this dynamically based on accumulated
    spills and current register pressure. (Eric).

v3:
  - Check if the node is valid before doing anything with it.
  - Drop the environment variable to select batch size and just fix it to 20.

With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5
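
To illustrate why batching cuts down the number of allocation attempts, here
is a rough model. The batch size of 20 comes from v3 above; the threshold
parameter and the function names are hypothetical, and the model ignores the
distinction between uniform and TMU spills.

```c
#include <assert.h>

#define SPILL_BATCH_SIZE 20   /* fixed batch size mentioned in v3 */

/* How many registers to try to spill in the next allocation attempt:
 * one at a time until we pass the threshold, then a whole batch. */
unsigned
spills_for_next_attempt(unsigned spills_so_far, unsigned batch_threshold,
                        unsigned candidates_left)
{
   unsigned want = spills_so_far > batch_threshold ? SPILL_BATCH_SIZE : 1;
   return want < candidates_left ? want : candidates_left;
}

/* Number of allocation attempts needed if regs_to_spill registers must be
 * spilled before allocation succeeds. */
unsigned
count_attempts(unsigned regs_to_spill, unsigned batch_threshold)
{
   unsigned spilled = 0, attempts = 0;

   while (spilled < regs_to_spill) {
      unsigned n = spills_for_next_attempt(spilled, batch_threshold,
                                           regs_to_spill - spilled);
      assert(n > 0);
      spilled += n;
      attempts++;
   }
   return attempts;
}
```

With a hypothetical threshold of 100, spilling 300 registers takes 111
attempts in this model instead of 300.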

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: free noop job if needed when finishing the queue

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: clean-up after obtaining an XCB connection

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak dumb BO handles allocated for swapchain images

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/meta_copy: fix TFU blitting when using 3D images

We had some code in blit_tfu to handle 3D images but it was wrong. For
example, it executed a copy on the 3D image no matter what depth
component copy was needed. This was not detected until vk-gl-cts 1.2.4
introduced more 1D and 3D blitting tests.

Also add checks to fall back to blit_shader when needed, such as when
mirroring on the depth component.

Fixes the following tests:
  dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
  dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.whole_3d.nearest
  dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.mirror_z_3d.nearest
  dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.whole_3d.nearest

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: honor VkPipelineDepthStencilStateCreateInfo::depthWriteEnable

Fixes:
dEQP-VK.renderpass.suballocation.subpass_dependencies.separate_channels.d24_unorm_s8_uint

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix sampling from stencil aspect of a combined depth/stencil image

When sampling the stencil aspect we want to reinterpret the D24S8 format
as RGBA8 and read stencil values from the R component.

Fixes:
dEQP-VK.renderpass.suballocation.formats.d24_unorm_s8_uint.input.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/formats: properly return unsupported for 1D compressed textures

Gets tests like the following one properly skipped:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.1d.etc2_r8g8b8a8_unorm_block.etc2_r8g8b8a8_unorm_block.optimal_general

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: signal semaphore/fence if needed after acquiring a swapchain image

Fixes:
dEQP-VK.wsi.*.swapchain.acquire.too_many
dEQP-VK.wsi.*.swapchain.acquire.too_many_timeout

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: do not expose VK_IMAGE_USAGE_SAMPLED_BIT for swapchains

The display pipeline on the Rpi4 requires that images are linear and the
3D pipeline cannot sample from linear images.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix size computed by vkGetImageSubresourceLayout for 3D images

Fixes:
dEQP-VK.image.subresource_layout.3d.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix offset computed by vkGetImageSubresourceLayout for array images

Fixes:
dEQP-VK.image.subresource_layout.2d_array.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: expose DRM modifiers based on supported features

So far we have only been exposing linear for WSI formats and UIF on
everything else, but we should instead expose linear or UIF based
on whether the underlying format supports any features for the given
layout.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_DRM_FORMAT_MODIFIER_INFO

When negotiating DRM modifiers, applications may use this to validate the
features that are supported with a particular modifier. The WSI code in
Mesa relies on this to validate its modifiers.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/meta_copy: handle mirroring z component blitting 3D images

We do this by basing the tex_coord on the max layer instead of the min
(similar to what we do for mirroring x/y).

This avoids all crashes and gets most of the following tests to pass:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.*

The only one failing is this one:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest

but it looks like the root cause is different, as there are other
3D nearest tests failing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix color clear pipeline destruction for 32-bit architectures

Command buffer object destruction callbacks take 64-bit object
handles, but we defined the color clear pipeline callback to take
a 32-bit argument.
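
A sketch of a callback whose parameter matches the 64-bit handle type. All
names here are hypothetical. The handle is converted to a pointer through
uintptr_t so the same code works on 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: objects are passed as 64-bit handles. */
typedef void (*destroy_cb)(uint64_t handle);

struct fake_pipeline {
   int destroyed;
};

/* The parameter type matches destroy_cb exactly. Declaring it with a
 * narrower type and calling it through destroy_cb would be undefined
 * behavior and can corrupt the argument on 32-bit targets. */
void
destroy_fake_pipeline(uint64_t handle)
{
   struct fake_pipeline *p = (struct fake_pipeline *)(uintptr_t)handle;
   assert(p != NULL);
   p->destroyed = 1;
}

/* Invokes the callback the way a command buffer might on destruction. */
void
run_destroy_cb(destroy_cb cb, void *obj)
{
   cb((uint64_t)(uintptr_t)obj);
}
```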

Should fix recent crash regressions with some CTS tests on Rpi4.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: hook up robust buffer access

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/compiler: add a lowering pass for robust buffer access

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

broadcom/compiler: rename QUNIFORM_GET_BUFFER_SIZE to QUNIFORM_GET_SSBO_SIZE

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle QUNIFORM_GET_UBO_SIZE

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/compiler: implement nir_intrinsic_get_ubo_size

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

nir: add a nir_get_ubo_size intrinsic

This is the same as nir_get_buffer_size but geared towards UBOs instead
of SSBOs. The new intrinsic is useful in Vulkan backends that need to
add bound checks on buffer accesses to honor the robust buffer access
feature.
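
The kind of bound check this enables can be sketched on the CPU side as
follows. The function name is hypothetical, and returning zero for
out-of-bounds loads is an assumption about the chosen behavior, not a
description of the generated shader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bounds-checked 32-bit load: reads that are not fully inside the buffer
 * return 0 instead of touching memory outside of it. buf_size plays the
 * role of the value returned by the buffer size query. */
uint32_t
robust_load_u32(const uint8_t *buf, uint32_t buf_size, uint32_t offset)
{
   assert(buf != NULL || buf_size == 0);

   if (buf_size < sizeof(uint32_t) || offset > buf_size - sizeof(uint32_t))
      return 0;

   uint32_t value;
   memcpy(&value, buf + offset, sizeof(value));
   return value;
}
```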

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: move meta init/finish to meta implementation files

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't cache subpass color clear pipelines

Subpass color clear pipelines are those used to emit partial attachment
clears as draw calls inside the render pass currently bound by the
application in the command buffer, leading to a huge performance improvement
compared to the case where we emit them in their own render pass.

Unfortunately, because the pipeline references the render pass
object in which it is used and the render pass object is owned by the
application (and can be destroyed at any point), we can't cache these
pipelines (unless we implement a refcounting mechanism or other
similar strategy).

Performance impact looks negligible based on experiments with vkQuake3,
probably because the underlying pipeline cache is preventing the
redundant shader recompiles.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix 3D image blits

Specifically, we should select the slice to blit from on the source
image to be in the middle of the depth step.
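
As a sketch of the arithmetic, assuming the depth step is the source depth
range divided by the number of destination slices (the function and
parameter names are hypothetical):

```c
#include <assert.h>

/* Source z coordinate (in slices) to sample for destination slice dst_z,
 * taken at the middle of the depth step rather than at its start. */
float
blit_src_slice_center(float src_z_start, float src_z_end,
                      unsigned dst_slices, unsigned dst_z)
{
   assert(dst_slices > 0 && dst_z < dst_slices);

   float step = (src_z_end - src_z_start) / (float)dst_slices;
   return src_z_start + ((float)dst_z + 0.5f) * step;
}
```

For a blit from 4 source slices to 2 destination slices the step is 2, so
the sampled source coordinates are 1.0 and 3.0 instead of 0.0 and 2.0.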

This issue was only raised recently after the CTS improved the 3D
blitting tests.

Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*.3d.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: only require texel-size alignment for linear images

Originally, copies between buffers and images required a buffer offset
that was a multiple of 4 bytes, however, the spec was later fixed to
relax this rule and only require offsets that had texel alignment.

Our implementation of image to buffer copies using the blit path needs
to bind the destination buffer as a linear image and be able to bind
the requested buffer memory at the required offset, so for that to work
we need to change the alignment requirements for linear images to match
the relaxed texel alignment requirement.
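
The difference between the two rules can be sketched like this. The function
names are hypothetical, and special cases such as depth/stencil formats are
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original rule: the buffer offset must be a multiple of 4 bytes. */
bool
offset_ok_legacy(uint64_t offset)
{
   return offset % 4 == 0;
}

/* Relaxed rule: the buffer offset only needs texel alignment. */
bool
offset_ok_relaxed(uint64_t offset, uint32_t texel_size)
{
   assert(texel_size > 0);
   return offset % texel_size == 0;
}
```

An offset of 6 bytes with a 2-byte texel is rejected by the original rule
but accepted by the relaxed one.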

Fixes new tests in Vulkan CTS 1.2.4:
dEQP-VK.api.copy_and_blit.core.image_to_buffer.buffer_offset_relaxed
dEQP-VK.api.copy_and_blit.dedicated_allocation.image_to_buffer.buffer_offset_relaxed

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: lower interpolateAt functions in NIR and enable sample rate shading

The lowering will get all the interpolateAt() functions from GLSL lowered to
the corresponding intrinsics we have just implemented in the compiler backend,
which was the last piece we needed to enable the feature.

This gets us to pass all the relevant tests in:
dEQP-VK.pipeline.multisample_interpolation.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

nir/lower_io: add an option to lower interpolateAt functions

The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: enable largePoints

As we have just set proper values for point granularity etc., we can
enable largePoints. With this change, tests like this:
dEQP-VK.rasterization.primitive_size.points.point_size_*

go from Skip to Pass.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: fix point-related VkPhysicalDeviceLimits

While we are here, we also tweak some line-related limits, as some use
the same values as the point limits, and to make use of the enum we
recently added to common/v3d_limits.h.

Fixes the following test:
dEQP-VK.glsl.builtin_var.simple.pointcoord

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/limits: add line width and point size limits

They will be the same for the OpenGL and Vulkan drivers, so let's put
them in the common limits header.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/cmd_buffer: set instance id to 0 at start of tile

The PTB assumes the instance id to be 0 at the start of a tile, but the
hw does not reset it, so we need to set it ourselves.

This fixes some Vulkan CTS tests that start to fail after some other
tests have used an instance id.

So for example, before this commit for the following tests, executed
in that order, we got the following behaviour:

dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Pass
dEQP-VK.draw.indexed_draw.draw_instanced_indexed_triangle_strip => Pass
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Fails

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline: set 16bit return_size for shadows always

So far we were pre-generating two variants, an all 16-bit return_size
and an all 32-bit return_size, as at pipeline creation time we don't
know the texture format that will finally be used.

But it is possible to override or at least refine the 32-bit case, as
we know in advance that all shadow textures can (and in fact should)
use a 16-bit return_size.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline: track if texture is shadow

To be used to decide the texture return size. We add it to the
descriptor map because that is the easiest place to do so: as we are
lowering the texture accesses, we can check instr->is_shadow at that
point. Admittedly this is somewhat odd, as so far the descriptor
map held general descriptor info, while is_shadow only applies to
textures. But it doesn't make sense to spend effort on this now, as we
may well end up with more descriptor-specific info in the map in
the future. We can revisit that later.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: Call nir_lower_io for push constants

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline: use derefs for ubo/ssbo

There are some potential advantages to that. Even if we are not
taking advantage of them yet, it is worth using this path now,
especially as the non-deref path could be removed at some point.

Note that instead of returning a vec2 for both resource_index and
vulkan_descriptor, we return a scalar for the first one, as that
is what the v3d backend expects (like for get_ssbo_size). For this to
work, we rebuild the vec2 at vulkan_descriptor using the index and
an unused 0 value.

As far as I can see, turnip avoids that by also lowering load_ssbo/ubo,
so it just gets the index lowered (which in their case is a vec3 with a
fixed 0 in the third component), but for now it is easier to do this.

v2: return a single-component for the index, to avoid the backend
needing to handle it (Eric, Jason).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: fix compute_heap_size for the simulator

Ask the simulator for the total memory it is using, instead of using
sysinfo (which returned the host system memory).

Fixes the following CTS tests when using the simulator:
dEQP-VK.memory.allocation.basic.percent_1.forward.count_12
dEQP-VK.memory.allocation.basic.percent_1.reverse.count_12

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/simulator: add v3d_simulator_get_mem_size

Reviewed-by: Iago Toral <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

broadcom/compiler: allow GLSL_SAMPLER_DIM_BUF on txs emission

Although we don't support texture buffers on the OpenGL driver, we are
already doing that for the Vulkan driver. This would be needed for the
OpenGL driver in any case.

Fixes following tests on v3dv:
dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.*
dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.*

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/meta: fix hash table insertion

So far we were directly using the local variable key for the
insertion, but the hash table expects a permanent address. We add a
key field to all the meta structures (which are already basically a
wrapper over v3dv_pipeline).
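
The lifetime issue can be shown with a toy table that, like the hash table
described here, stores the key pointer rather than a copy of the key. All
names below are hypothetical and the linear search is only a stand-in for
real hashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8

/* Toy table that keeps the key *pointer*, so the pointed-to key must
 * outlive the entry. */
struct toy_table {
   const uint32_t *keys[MAX_ENTRIES];
   void *values[MAX_ENTRIES];
   unsigned count;
};

/* Hypothetical meta object that owns its key. */
struct meta_pipeline {
   uint32_t key;
   int payload;
};

void
toy_table_insert(struct toy_table *t, const uint32_t *key, void *value)
{
   assert(t->count < MAX_ENTRIES);
   t->keys[t->count] = key;
   t->values[t->count] = value;
   t->count++;
}

void *
toy_table_search(const struct toy_table *t, const uint32_t *key)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (*t->keys[i] == *key)
         return t->values[i];
   }
   return NULL;
}

void
cache_pipeline(struct toy_table *t, struct meta_pipeline *p, uint32_t format)
{
   uint32_t local_key = format;   /* dies when this function returns */

   /* Copy the key into the long-lived object and insert that address.
    * Inserting &local_key would leave a dangling pointer in the table. */
   p->key = local_key;
   toy_table_insert(t, &p->key, p);
}
```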

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline: fix combined_index_map insertions

We were directly inserting the local key variable used to search for
entries as the key, but hash_table expects a pointer that stays valid.
Fixed by using the array of keys that we already had in v3dv_pipeline.

Fixes failures on the rpi4 like:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.a1r5g5b5_unorm_pack16.a1r5g5b5_unorm_pack16.general_general_linear

but fwiw, this test on the simulator, and several other tests on both
the simulator and rpi4, were working just by luck.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/debug: add v3dv_print_v3d_key

Useful to print which v3d keys were used for each variant.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: warn when the pipeline cache is disabled

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: add assert for texture-related limits

There are several limits that when added shouldn't be greater than
V3D_MAX_TEXTURE_SAMPLERS (defined at common/v3d_limits.h), so let's
assert it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisample rasterization with empty framebuffers

If the framebuffer has no attachments then multisample rasterization
is enabled based on the rasterizationSamples multisample state of
the pipelines. It should be noted that since we don't support
the variableMultisampleRate feature, all pipelines in the same
subpass must have matching number of samples.

V3D requires that we specifically setup our frames to enable
multisampling or not, and we do this when we create jobs inside
a subpass. Since we create the first job for a subpass as soon as
the subpass starts, this is problematic: if we don't have any
attachments, we won't enable MSAA at this point, but later
on we might bind an MSAA pipeline, since pipelines can be bound
at any point in the lifespan of a command buffer.

Here, we fix this by testing if the first draw call in a job uses
an MSAA pipeline but the job was set up to not use MSAA, and in
that case we re-start the job with MSAA enabled.

We also take care of a corner case that seems to be tested by CTS
where a framebuffer with no attachments doesn't bind any pipelines
with MSAA enabled (so according to the Vulkan spec, multisample
rasterization must be disabled) but the fragment shader in use
reads gl_SampleID (which enables per-sample shading). This would
lead to enabling per-sample shading with single-sample rasterization,
which doesn't make sense and makes the simulator complain, so we just
disable per-sample shading in that case.
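
The two decisions described above can be sketched roughly as follows.
The struct and function names are hypothetical and heavily simplified;
they are not the driver's actual job or pipeline types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified job state; not the driver's real struct. */
struct example_job {
   bool msaa;            /* frame was set up for multisampling */
   unsigned draw_count;  /* draw calls recorded in this job so far */
};

/* True if the job must be restarted with MSAA enabled: this is the first
 * draw in the job, the bound pipeline is multisampled, and the job was
 * set up without MSAA.
 */
static inline bool
example_job_needs_msaa_restart(const struct example_job *job,
                               bool pipeline_msaa)
{
   assert(job);
   return job->draw_count == 0 && pipeline_msaa && !job->msaa;
}

/* Per-sample shading is only kept when rasterization is multisampled,
 * even if the fragment shader asks for it (e.g. by reading gl_SampleID).
 */
static inline bool
example_use_sample_rate_shading(bool pipeline_msaa,
                                bool fs_wants_per_sample)
{
   return pipeline_msaa && fs_wants_per_sample;
}
```

Checking only the first draw call keeps the restart cheap: nothing has
been recorded in the job yet, so no work is thrown away.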

Fixes:
dEQP-VK.pipeline.multisample.mixed_count.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement nir_texop_texture_samples

Fixes:
dEQP-VK.glsl.texture_functions.query.texturesamples.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: enable sample rate shading if fragment shader reads gl_SampleID

According to the spec, if a fragment shader reads gl_SampleID then the
shader must be evaluated per-sample.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.write_sample_mask.4_samples

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

broadcom/compiler: track if the fragment shader forces per-sample MSAA

For example, regarding gl_SampleID, the GLSL spec states:

   "Any static use of this variable in a fragment shader causes the
    entire shader to be evaluated per-sample."

So we need to track if the fragment shader does anything that implicitly
enables per-sample shading in the compiler for the driver to
auto-enable sample rate shading if needed.

v2:
- Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID
   and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like
   other drivers are doing to activate this behavior (Eric).
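
A minimal sketch of the v2 check, assuming a simplified shader-info
struct. The EXAMPLE_* bit values and the struct layout are hypothetical
stand-ins, not the NIR definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the system-value bits named above. */
#define EXAMPLE_SYSTEM_BIT_SAMPLE_ID  (1ull << 0)
#define EXAMPLE_SYSTEM_BIT_SAMPLE_POS (1ull << 1)

/* Hypothetical, simplified view of what the compiler knows about a FS. */
struct example_fs_info {
   uint64_t system_values_read;      /* bitmask of system values read */
   bool has_sample_qualified_input;  /* any input uses the sample qualifier */
};

/* True if the shader implicitly requires per-sample evaluation. */
static inline bool
example_fs_forces_per_sample(const struct example_fs_info *info)
{
   const uint64_t per_sample_bits = EXAMPLE_SYSTEM_BIT_SAMPLE_ID |
                                    EXAMPLE_SYSTEM_BIT_SAMPLE_POS;
   assert(info);
   return (info->system_values_read & per_sample_bits) != 0 ||
          info->has_sample_qualified_input;
}
```

The driver can then consult this single flag instead of inspecting the
shader itself.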

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: remove v3dv_descriptor_map_get_image_view

Now that we have added support for texel buffers, in all the cases
where we were checking for an image_view we end up checking for an
image_view or a buffer_view, so we stopped using it. Remove it as it
has become superfluous.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/uniforms: handle texture size for texel buffers

This gets tests like the following one working:
dEQP-VK.image.image_size.buffer.readonly_writeonly_1

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

broadcom/compiler: implement nir_intrinsic_load_sample_pos

This is intended to return the sample location within the pixel.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_position.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/formats: fix exposing FEATURE_UNIFORM/STORAGE_TEXEL_BUFFER_BIT

If the formats are not suitable as texture type, then they can't be
used as texel buffers.

This gets tests like the following one properly skipped (instead of
crashing on the simulator):
dEQP-VK.image.load_store.without_format.buffer.r32g32b32_sfloat_minalign_uniform

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisample image clears

Fixes:
dEQP-VK.pipeline.framebuffer_attachment.*_ms

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisample resolves for formats that don't support TLB resolves

The TLB multisample resolve feature is limited to specific format types.
For everything else, including sfloat and integer formats, we need to
fall back to a blit resolve. This needs to be handled both for in-pass
resolves as well as for vkCmdResolveImage.

Because these blits would happen after the tile store operations, we need
to make sure we store the multisampled buffers so we can then read them for
the blit resolve.

Fixes the remaining test failures in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisample resolve of integer formats

The multisample resolve of an integer framebuffer should just take one
of the samples instead of averaging.
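
To illustrate the difference on the CPU side (the real resolve happens
in a blit shader), here is a small sketch. The function names are
hypothetical, and picking sample 0 for integers is an assumption made
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Integer resolve: take a single sample (sample 0 here, by assumption).
 * Averaging integers such as object IDs would produce meaningless values.
 */
static inline int32_t
example_resolve_int(const int32_t *samples, unsigned count)
{
   assert(samples && count > 0);
   return samples[0];
}

/* Non-integer resolve: average all samples of the texel. */
static inline float
example_resolve_float(const float *samples, unsigned count)
{
   float sum = 0.0f;
   assert(samples && count > 0);
   for (unsigned i = 0; i < count; i++)
      sum += samples[i];
   return sum / (float)count;
}
```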

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix blitting of signed integer formats

For these we want to select a signed integer output format
and a signed sampler type.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

nir/glsl: add a glsl_ivec4_type() helper

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: amend tile size tables with smallest tile sizes available

We'll need this for some cases involving maximum number of multisampled
color targets.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/device: fix minTexelBufferOffsetAlignment

As we understand it, texture accesses should be aligned to the UIF
block size.

Fixes several of the CTS tests under this pattern:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*.offset_nonzero
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_texel_buffer.*.offset_nonzero

Note: for those tests, using a lower value (64) was enough to get them
working, but again, we understand that the real alignment is the UIF
block size.
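
A small sketch of what the alignment requirement means for a texel
buffer offset. The block size of 256 bytes and all names below are
assumptions for illustration; the commit itself does not state the
value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed UIF block size in bytes, for illustration only. */
#define EXAMPLE_UIF_BLOCK_SIZE 256u

/* True if a texel buffer offset satisfies the assumed alignment. */
static inline bool
example_texel_buffer_offset_is_valid(uint64_t offset)
{
   return (offset % EXAMPLE_UIF_BLOCK_SIZE) == 0;
}

/* Rounds offset up to the next multiple of a power-of-two alignment. */
static inline uint64_t
example_align_up(uint64_t offset, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (offset + alignment - 1) & ~(alignment - 1);
}
```

Under this assumption an offset of 64 would satisfy the lower alignment
mentioned in the note above, but it would not be a valid offset once the
limit is the full block size.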

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add v3dv_limits file

There are several definitions for hw limits on v3dv_image that we want
to share, but v3dv_private was already growing bigger and messier.

So let's move them to a specific header. Note that there is already a
broadcom/common/v3d_limits.h. We are not putting them there because
right now they are only used by the Vulkan driver, but are candidates
to be moved.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: support for UNIFORM/STORAGE_TEXEL_BUFFER

This gets most uniform/storage texel buffer tests passing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

broadcom/compiler: handle gl_SampleMask writes in fragment shaders

We didn't need this until now, since this was included with GLES 3.2,
but we need it for Vulkan.

Eric had already done the plumbing for it, though; we just need to
actually emit the mask.

Fixes some tests in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisampled image copies with the blit path

This should be able to handle partial copies of multisampled images.

This change extends our blit shader interface to also handle multisampled
destinations so that if the blit destination is a multisampled image,
the blit will rely on sample rate shading to copy all samples from
the source image (which must have a matching number of samples).

I have not found any tests in CTS that do partial copies of
multisampled images, so I tested this with a full multisampled image
copy, using this test:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add a blit fallback path for vkCmdResolveImage

This fallback is required when we have to do partial resolves. It
works the same way as other blit fallbacks for copy operations: it
will bind the source image as a source texture and blit the selected
region to the destination image.

The difference in this case is that the source image is multisampled
and the blit shader needs to fetch and average individual samples for
each texel.

This gets us to pass all the remaining test cases in
dEQP-VK.api.copy_and_blit.core.resolve_image.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: setup texture shader state correctly for multisampled images

Fixes multisampled cases in:
dEQP-VK.pipeline.multisample.sampled_image.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle multisampled image copies in the TLB path

vkCmdCopyImage can be used to copy multisampled images. We can
easily support that on the TLB path, which copies full images.

For partial copies we will need to amend our blit shader path
to support multisampling resolve.

Fixes:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement vkCmdResolveImage for whole images

For partial resolves we will need a shader blit & resolve fallback.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>