review.tizen.org Git - platform/upstream/mesa.git/log

v3dv: always map full BOs

Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
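
A minimal sketch of the idea, not the driver's actual code: the
struct and function names are hypothetical, and calloc() stands in
for the real mmap of the BO. The point is that the single map always
covers the whole BO, so any later request for a different range can
reuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver BO: there is one map pointer per BO. */
struct sketch_bo {
   size_t size; /* total size of the BO in bytes */
   void *map;   /* NULL until mapped; when set it covers [0, size) */
};

/* Maps the whole BO on first use (calloc stands in for mmap here) and
 * returns a pointer to the requested offset inside that full mapping.
 */
static void *
sketch_bo_map_range(struct sketch_bo *bo, size_t offset, size_t range)
{
   assert(offset <= bo->size && range <= bo->size - offset);

   if (!bo->map) {
      bo->map = calloc(1, bo->size);
      if (!bo->map)
         return NULL;
   }

   /* The existing map is valid for any region, so just offset into it. */
   return (uint8_t *)bo->map + offset;
}
```

With this shape, a partial map requested by the API user and a later
map of another range requested by the driver both resolve to pointers
inside the same full mapping.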

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: try to use TFU path when creating tiled images from linear buffers

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add a CPU path for buffer to image copies

The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.

This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).

We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add a TFU path for buffer to image copies

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: try harder to skip emission of redundant state

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: actually enable early Z

We had done all the plumbing for this but EZ can be disabled in 3 places
and we were never setting the enable bit in the configuration bits packet.

Also, configuration bits must not enable EZ if this has been disabled in
the RCL for the whole frame, which we do if we don't have a depth
attachment at all.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix release build warnings

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix a few cases where we were ignoring suballocated buffers

This gets VkQuake2 to render correctly.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: regen BO lists for CLs inside cloned jobs

Otherwise cloned BO lists point to the original list objects and not
the cloned ones, and that will confuse anything that tries to iterate over
them, such as list_length(), leading to infinite loops.

Fixes (in debug mode):
dEQP-VK.api.command_buffers.render_pass_continue

In that test we clone a full CL job from a secondary, and without this,
the BO lists in its CL lists will point to the bo_list field in the
original job, leading to an infinite loop as we assert the expected size
of these lists at queue submit time in handle_cl_job.
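
The failure mode can be reproduced with a small circular list in the
style of Mesa's util/list.h. This is only a sketch: all names are
hypothetical, and the length helper takes a step limit so the stale
case can be shown without actually looping forever.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list. */
struct sketch_list {
   struct sketch_list *prev, *next;
};

/* A BO list entry; 'link' is first so a list pointer can be cast back. */
struct sketch_bo_entry {
   struct sketch_list link;
   int handle;
};

static void
sketch_list_init(struct sketch_list *head)
{
   head->prev = head->next = head;
}

static void
sketch_list_addtail(struct sketch_list *item, struct sketch_list *head)
{
   item->next = head;
   item->prev = head->prev;
   head->prev->next = item;
   head->prev = item;
}

/* Counts nodes until iteration returns to 'head'. Returns -1 if 'head' is
 * not reached within 'limit' steps, which is what a stale copy looks like.
 */
static int
sketch_list_length(const struct sketch_list *head, int limit)
{
   assert(limit > 0);
   int n = 0;
   for (const struct sketch_list *it = head->next; it != head; it = it->next) {
      if (++n > limit)
         return -1;
   }
   return n;
}

/* Regenerates 'dst' from 'src' using caller-provided entry storage, so the
 * new nodes link back to 'dst' rather than to the original head.
 */
static void
sketch_clone_bo_list(struct sketch_list *dst, const struct sketch_list *src,
                     struct sketch_bo_entry *storage)
{
   sketch_list_init(dst);
   for (const struct sketch_list *it = src->next; it != src; it = it->next) {
      const struct sketch_bo_entry *entry = (const struct sketch_bo_entry *)it;
      storage->handle = entry->handle;
      sketch_list_addtail(&storage->link, dst);
      storage++;
   }
}
```

A plain struct copy of the list head keeps pointing at nodes whose
chain ends at the original head, so iteration from the copy never
terminates. Rebuilding the list for the clone avoids that.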

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/meta-copy: add uintptr_t casting to avoid warning

Without it, on the rpi4 (and any 32-bit OS) we would get a warning
about wrong sizes.
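
As an illustration of the kind of cast involved (the helper names
here are hypothetical and not taken from the patch), converting
between a pointer and a 64-bit integer through uintptr_t avoids the
size-mismatch warning when pointers are 32 bits wide:

```c
#include <assert.h>
#include <stdint.h>

/* Packs a host pointer into a 64-bit field. Going through uintptr_t first
 * makes the pointer-to-integer conversion use the pointer's own width, and
 * the widening to 64 bits is then a plain integer conversion.
 */
static uint64_t
sketch_pointer_to_u64(const void *ptr)
{
   return (uint64_t)(uintptr_t)ptr;
}

/* The reverse conversion, narrowing to the pointer width explicitly. */
static void *
sketch_u64_to_pointer(uint64_t value)
{
   assert(value <= UINTPTR_MAX);
   return (void *)(uintptr_t)value;
}
```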

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix BCL start offset in presence of chained BOs

If a job's BCL spans multiple BOs we should take the start offset of the
BCL from the first BO in the list.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: warn users that this is not a conformant driver

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add stubs for missing API implementations

Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.

Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: use descriptor pool bo for image/samplers

This allows us to remove some individual bos for the image and
sampler, used to store the SAMPLER_STATE and TEXTURE_SHADER_STATE. Now
they are prepacked on static memory as part of the vulkan object
struct.

This commit introduces small descriptor structs, used to define what
the bo subregion would contain. It is used mostly to compute offsets
to that specific data, and define the size needed. Having said so, it
would be possible to replace them with some kind of flag (like anv) or
just compute the offset based on the context.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: add general bo on descriptor pool

So far we were saving all the descriptor info on the host memory. With
this commit we do the same as other mesa vulkan drivers (Anvil
and Turnip) and create a bo on the descriptor pool that would be
suballocated for each descriptor.

This would allow us to clean up individual bos from some vulkan objects,
reducing device memory fragmentation and avoiding the need to allocate
bos for that info. After all, pre-allocating needed memory is one of
the purposes of the descriptor pool.
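
To make the suballocation idea concrete, here is a sketch using a
simple bump allocator over a single pool BO. The allocation strategy
and all names are assumptions made for illustration. The commit
message does not say how the driver hands out subregions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor pool that owns one BO of 'bo_size' bytes. */
struct sketch_desc_pool {
   uint32_t bo_size;     /* size of the single pool BO */
   uint32_t next_offset; /* first free byte in the BO */
};

/* Rounds 'value' up to a power-of-two 'alignment'. */
static uint32_t
sketch_align(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Reserves 'size' bytes for one descriptor inside the pool BO and returns
 * the offset through 'offset_out'. Returns false if the pool is exhausted.
 */
static bool
sketch_desc_pool_suballoc(struct sketch_desc_pool *pool, uint32_t size,
                          uint32_t alignment, uint32_t *offset_out)
{
   uint32_t offset = sketch_align(pool->next_offset, alignment);
   if (offset > pool->bo_size || size > pool->bo_size - offset)
      return false;

   pool->next_offset = offset + size;
   *offset_out = offset;
   return true;
}
```

Packets that reference descriptor data by address would then use the
pool BO address plus the returned offset instead of a per-object BO.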

This commit introduces all the infrastructure, but doesn't use it for
any descriptor yet, as if no descriptor needed data uploaded to a bo.

The rule for deciding which info goes into the descriptor pool bo is:
info that we would need to upload to a bo in any case, because some
packet references it by address.

We could be more aggressive with that general rule, but that would be
enough for now. If in the future we support
VK_EXT_descriptor_indexing, we probably would need to store more info,
as under that extension, descriptors can be updated after being bound.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak attachment state

We were assuming that if the command buffer state doesn't have any
attachments (as per the attachment count) the attachment state array
should not be allocated, however, during meta operations it is
possible that the attachment state grows (since meta operations can
emit render passes of their own). In that case, we would grow the
state for the meta operation but then pop the previous attachment
count and we would leak the state.

An example of that is a secondary command buffer which has no
attachment state by default since it doesn't execute a render pass
begin, but that executes one in a meta operation (for
vkCmdClearAttachments for example).

Fix this by making the attachment count an allocation count instead
and not popping it once we finish a meta operation. Also, always free
the state so long as there is a valid pointer, and assert that the
allocated count is not zero in that case.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: support vkCmdClearAttachments in secondary command buffers

The main change we are introducing here is that now we allow secondary
command buffers that execute in a render pass to have a job list with
more than one job.

The main issue with vkCmdClearAttachments is that we currently need
this to spawn multiple jobs to clear multilayered framebuffers, as we
need to setup a different 2D framebuffer for each layer to clear and
therefore emit a different RCL for each. We could avoid this
completely by using layered rendering with the "clear rect" path to
redirect the clear rects to appropriate layers of the primary
framebuffer, however, our hardware only supports layered rendering
with geometry shaders, which we don't support at present.

Because vkCmdClearAttachments relies on having framebuffer state
available (something we would not need if we used the geometry shader
implementation), if this is not available in the secondary we need to
postpone emission of the command until the secondary is executed
inside a primary. We do this by using a new CPU job
V3DV_JOB_TYPE_CPU_CLEAR_ATTACHMENTS that is processed during
vkCmdExecuteCommands by calling vkCmdClearAttachments directly in the
primary.

As a consequence of these changes, it is now possible that a secondary
command buffer that runs inside a render pass has any kind of job in
its job list, including partial CLs that need to be branched to and
full CLs that need to be submitted to the GPU as is, so we introduced
a new GPU job type V3DV_JOB_TYPE_GPU_CL_SECONDARY to identify partial
CLs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement vkCmdWaitEvents for secondary command buffers

Event waits can be safely moved before a render pass start, since
event setting and resetting commands cannot happen inside one. We
don't need to go that far, but we can use this to record the wait
in its own separate job and then execute this job before the
binning commands recorded in the secondary command buffer when
we execute the secondary into a primary.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add basic support for secondary command buffers

There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.

For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.

For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.

This handles most scenarios except:
- vkCmdWaitEvents
- vkCmdClearAttachments

Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this in
follow-up patches.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix bogus command buffer allocation scopes

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle OOM properly during command buffer recording in more places

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: ensure BCL space is available before emitting packets

We should always do this. So far we have been getting away without it
because we overallocate at v3dv_job_start_frame, but that won't do
for secondary command buffers, for example. It is also unreliable
if CLs grow past that initial allocation.

In the future, we might want to fix our emit macros so they do the
allocation check implicitly, which would simplify the code and would
make this process a lot less error prone.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: check that GPU device matches requirements

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: assert command buffers are executable when submitting to a queue

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: remove some unnecessary / unused functions

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: drop the extra BO handling from the command buffer

Now that we have a framework to register objects allocated internally
by the driver we can just use that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: make TLB clearing paths return true/false

We are currently able to clear any supported format using the TLB, but
this is more consistent with other parts of the code and is what we want
should we add any formats in the future where we can't get away with
TLB clears.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix vkResetCommandPool

During a command buffer reset we call cmd_buffer_init(), which will
add the command buffer to the pool, so make sure we remove it first
and that we use a safe iterator when resetting a pool.

Fixes:
dEQP-VK.api.command_buffers.pool_reset_reuse

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak BOs from CLs when using BRANCH

Keep the list of BOs referenced in the CL and free all of them
when the CL is destroyed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/pipeline: support for specialization constants

That is, just convert VkSpecializationInfo to
nir_spirv_specialization and pass it to spirv_to_nir.

The code is also basically the same as that used by anv, tu, and radv.
Eventually it would make sense to move it to a common place.

Note that we are using calloc there to allocate the temporary
spec_entries. Trying to use vk_alloc2 causes some problems with the
nir_validate.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/cmd_buffer: move variant checking to CmdDraw

In order to properly check (and possibly compile) shader variants we
need a pipeline and a compatible descriptor set. So far we were trying
to do that check as early as possible, so we were trying to do it at
CmdBindPipeline or CmdBindDescriptorSets, and a combination of dirty
flags. This turned out not to cover all the corner cases, and made the
code complex, as it needed to handle cases where the descriptors were not
yet available, and return early. The latter also meant that we were
running several checks that failed in the middle.

This commit moves the variant check to CmdDraw, when we should have a
pipeline and compatible descriptor sets, and simplifies and makes more
strict the existing code.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement events

This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
and therefore we no longer need to use a BO for the event state.

Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.

If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: make the driver more robust against OOM

This is generally very difficult to handle properly everywhere, but
at least this is good enough to make the few CTS tests for this happy.

Fixes (on Rpi4):
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix depth/stencil clears on hardware

There is a hw bug by which the only way to clear the depth/stencil
tile buffers is to emit a clear of all tile buffers, so if we have
to do any such clears, we just emit a single clear of all tile
buffers and skip doing any per-buffer clears, even for color buffers,
since they would be redundant.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix the command buffer private object framework for 32-bit

We were declaring the destroy callback function as taking a pointer for the
vulkan object handle and relying on an implicit conversion to the Vulkan
handle type, however that would be incorrect on 32-bit platforms, where
non-dispatchable Vulkan objects (the kind that we may allocate privately during
command buffer recording), are defined as uint64_t, so the signature of the
destroy callback type doesn't match the signature of the actual Vulkan
function, leading to bogus results. Fix that by using uint64_t instead.

This fixes compilation warnings and also crashes in some tests when
compiling and executing natively in Rpi4.
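
A sketch of what such a registry could look like, with the handle
carried as uint64_t throughout so the callback signature is the same
on 32-bit and 64-bit builds. Every name below is hypothetical, and
the callback signature is simplified (real Vulkan destroy functions
also take an allocator argument).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical destroy callback: the handle is always a uint64_t, never a
 * pointer, so it matches non-dispatchable handles on 32-bit platforms.
 */
typedef void (*sketch_destroy_cb)(void *device, uint64_t handle);

struct sketch_private_obj {
   uint64_t handle;
   sketch_destroy_cb destroy;
};

#define SKETCH_MAX_PRIVATE_OBJS 8

struct sketch_cmd_buffer {
   void *device;
   struct sketch_private_obj objs[SKETCH_MAX_PRIVATE_OBJS];
   unsigned obj_count;
};

/* Registers an object created during recording so it can be freed later. */
static void
sketch_add_private_obj(struct sketch_cmd_buffer *cmd, uint64_t handle,
                       sketch_destroy_cb destroy)
{
   assert(cmd->obj_count < SKETCH_MAX_PRIVATE_OBJS);
   cmd->objs[cmd->obj_count].handle = handle;
   cmd->objs[cmd->obj_count].destroy = destroy;
   cmd->obj_count++;
}

/* Destroys every registered object together with the command buffer. */
static void
sketch_cmd_buffer_destroy(struct sketch_cmd_buffer *cmd)
{
   for (unsigned i = 0; i < cmd->obj_count; i++)
      cmd->objs[i].destroy(cmd->device, cmd->objs[i].handle);
   cmd->obj_count = 0;
}

/* Example callback for experimentation: 'device' points to a uint64_t that
 * accumulates the handles it was asked to destroy.
 */
static void
sketch_record_destroy(void *device, uint64_t handle)
{
   *(uint64_t *)device += handle;
}
```

Passing a handle wider than 32 bits through the callback is a quick
way to check that no truncation happens along the way.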

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix dynamic blend constants

We were pre-packing the constants from the pipeline state and then
always emitting that at draw time, ignoring dynamic state. This makes
it so we don't prepack at pipeline creation time and we always emit
the correct constants directly from the command buffer dynamic state.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement wide lines

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: ignore dynamic updates of depth bounds state

Depth bounds testing is not available in V3D 4.2 so we just ignore
this piece of state and assert if any pipeline attempts to enable
the feature.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement depth bias

This doesn't implement depth bias clamp, which requires supporting the
depth bias clamp feature, which we do not advertise as available at present.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: drop blit path for depth/stencil formats

We can now implement all depth/stencil blits as compatible color blits,
so let's just have the blit shader interface simply convert any blit with
a depth/stencil format to a compatible color blit (like we were already
doing for s8d24) and get rid of the depth blitting path. This also allows
us to ignore the blit aspect in the blit pipeline cache key.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: do not rewrite blit spec for combined depth/stencil in get_blit_pipeline

Now that our blit shader interface supports color writemasks and swizzles,
users can specify depth/stencil blits as color blits properly when needed,
so let's just do that instead, which is more straightforward and less error prone.

Because s8d24 to s8d24 blits always require the same conversion to color blit,
the blit shader will handle these automatically, including blits of just one
aspect, however, for scenarios where there are additional semantics to consider
(particularly, copies from/to buffer), it is still up to the blit shader caller
to specify a proper color blit matching the required semantics.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement partial image to buffer copies

We implement this by blitting the requested region to a linear image
setup to use the buffer memory store at the requested offset.

Because we can't store linear depth/stencil images, we implement
copies of depth/stencil aspects by using a compatible color blit.
To do this, we also need to account for the fact that when we are
copying depth from a d24 format we need to copy them from the MSB
24-bits of each word as provided by the hardware and store them
in the LSB 24-bit of the buffer (as per Vulkan requirements). This
is achieved by expanding our blit interface to also accept a swizzle
to apply to the source texture.
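
In CPU terms, the effect of that swizzle on one 32-bit word can be
sketched as below. This is only an illustration: the function names
are hypothetical, the byte view assumes the word is read as four
8-bit channels with R in the lowest byte, and the unused top byte is
simply zeroed here.

```c
#include <assert.h>
#include <stdint.h>

/* Moves a 24-bit depth value from the top 24 bits of a word (hardware
 * layout) to the bottom 24 bits (buffer layout).
 */
static uint32_t
sketch_d24_msb_to_lsb(uint32_t hw_word)
{
   return hw_word >> 8;
}

/* The same operation expressed as a channel swizzle on an 8-bit RGBA view
 * of the word: output (R, G, B, A) = input (G, B, A, 0).
 */
static uint32_t
sketch_d24_swizzle_channels(uint32_t hw_word)
{
   uint32_t g = (hw_word >> 8) & 0xff;
   uint32_t b = (hw_word >> 16) & 0xff;
   uint32_t a = (hw_word >> 24) & 0xff;
   uint32_t result = g | (b << 8) | (a << 16);

   /* Both formulations must agree. */
   assert(result == sketch_d24_msb_to_lsb(hw_word));
   return result;
}
```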

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: only require 4-byte alignment for linear images

The page alignment requirement is for UIF images only, and for linear
images it is actually useful to use a 4-byte alignment so we can
use them to write images to linear buffers at arbitrary positions, which
we will need when copying subrects of an image to a buffer.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix image addressing calculations to account for suballocation

An image can be suballocated from a larger memory allocation, in which
case we get a memory offset for the start of the bound region at
vkBindImageMemory. Take that offset into account when doing image
addressing calculations.
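
A sketch of the corrected calculation, under the assumption that an
image address is the sum of the BO address, the bind-time memory
offset, a per-miplevel offset and a per-layer stride. The struct
layout and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LEVELS 4

struct sketch_image {
   uint32_t bo_address;                      /* GPU address of the backing BO */
   uint32_t mem_offset;                      /* offset given at bind time */
   uint32_t level_offset[SKETCH_MAX_LEVELS]; /* per-miplevel offset */
   uint32_t layer_stride;                    /* bytes between array layers */
};

/* Computes the address of a given miplevel and layer. Leaving out
 * 'mem_offset' would only be correct for images bound at offset 0.
 */
static uint32_t
sketch_image_layer_address(const struct sketch_image *img,
                           uint32_t level, uint32_t layer)
{
   assert(level < SKETCH_MAX_LEVELS);
   return img->bo_address + img->mem_offset +
          img->level_offset[level] + layer * img->layer_stride;
}
```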

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/meta-copy: ensure valid height/width with compressed formats

With compressed formats we compute the final height/width dividing by
the block sizes. In some cases the block sizes are bigger than the
original values, or it is not an exact division, so we need to round up
the division.
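
For example, with 4x4 blocks a 2-texel-wide region must still cover
one block, while a truncating division would give zero. A minimal
sketch of the rounding (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Divides 'size' (in texels) by 'block_size', rounding up so that a
 * partially covered block still counts as one block.
 */
static uint32_t
sketch_size_in_blocks(uint32_t size, uint32_t block_size)
{
   assert(block_size != 0);
   return (size + block_size - 1) / block_size;
}
```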

Fixes tests like:
dEQP-VK.texture.compressed.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: always return true from a fallback path if it can handle the case

As opposed to returning true only if it can handle the case and it
also successfully processes it.

This is because we expect handled cases to be successfully processed
except in abnormal situations such as out-of-memory errors. If an OOM
is the reason a fallback path fails, we don't want to try another path
(which will likely hit an OOM too): we have already recorded the OOM
error in the command buffer and we just want to stop executing the
command, so just flag the case as handled and move on.

Also, if we don't do this, in an OOM scenario we'll likely end up running
out of fallback paths and end up asserting (on debug builds), which makes
some CTS tests unhappy because they expect OOM to be handled more
gracefully, so this allows us to make CTS happy also in debug builds,
which is convenient.
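
The convention can be sketched as follows. The path names, the
command struct and the way OOM is simulated are all hypothetical.
What matters is that a path returns false only when it cannot handle
the case, and returns true after recording an OOM so that no further
path is tried.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_cmd {
   bool oom;                    /* sticky OOM flag recorded in the command buffer */
   bool simulate_alloc_failure; /* test knob standing in for a failed allocation */
   int format;                  /* 0: the first path can handle it */
   int paths_tried;
};

/* First path: only handles format 0. */
static bool
sketch_path_first(struct sketch_cmd *cmd)
{
   cmd->paths_tried++;
   if (cmd->format != 0)
      return false; /* not handled, let the caller try another path */

   if (cmd->simulate_alloc_failure)
      cmd->oom = true; /* record the error, but the case is still "handled" */
   return true;
}

/* Last-resort path: handles everything. */
static bool
sketch_path_fallback(struct sketch_cmd *cmd)
{
   cmd->paths_tried++;
   if (cmd->simulate_alloc_failure)
      cmd->oom = true;
   return true;
}

static void
sketch_copy(struct sketch_cmd *cmd)
{
   if (sketch_path_first(cmd))
      return;
   if (sketch_path_fallback(cmd))
      return;
   assert(!"no path could handle the copy");
}
```

With this shape an allocation failure in the first path stops the
command after one attempt instead of cascading through the remaining
paths and hitting the final assert.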

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement partial buffer copies to depth/stencil images

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: support blitting both depth and stencil aspects at the same time

In "v3dv: implement stencil aspect blits for combined depth/stencil format"
we only implemented the support we needed to implement partial copies.
Copies only allow copying a single aspect, so we really only supported blitting
either the depth aspect or the stencil aspect, but not both. However,
actual blits (as per vkCmdBlitImage) allow blitting both stencil and depth
aspects at the same time, so this adds support for that.

Finally, this also fixes the fact that we were not really masking color writes
effectively for stencil-only blits, since create_blit_pipeline() would check
the requested aspect to see if it would need to mask writes, but by the time
we called this, we had already switched the aspect to color. The reason this
was not caught before is that for copies this would only mean that when we
copied stencil we would also copy depth, and the image copy CTS tests are
probably copying both aspects anyway.

Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.*d24_unorm_s8_uint*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement partial buffer copies to color images

The idea is that we also want to use the blit mechanism to implement
the copy, like we do for partial image copies. Unfortunately, we can't
sample from a linear image, so we first need to upload the buffer
contents to a tiled image, and then blit from that image to the
destination, which is not great for performance or memory usage.

In the future, we might be able to do better by using a specialized
shader for these copies that takes a UBO as input instead of a texture.
The shader would then be able to access the linear buffer through the
UBO directly without having to copy the buffer contents to a tiled
image first.

This only supports color images for now, we will add support for
depth/stencil images separately.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle copies from/to compressed formats

Since we don't support rendering to compressed formats, we implement this
by using a compatible format with the same bpp as the compressed format.
However, we need to take into account that the underlying compressed image
bpp is computed for a full block, so when we specify regions on the
compressed image, we need to divide offsets and dimensions by the block
size.

This works well for anything that copies to a compressed format using
the TLB, but it doesn't specifically address copies from compressed
formats to other compatible images. These go through the blit path
and require copying by blitting (texturing) from the compressed format
to another format. In this case, we choose a compatible format with
the per-texel bpp (not the block bpp) of the compressed format so it
matches the setup for the blit operation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: limit software integer RT clamp to rgb10a2

We can use the HW integer clamp feature, which clamps automatically
to the render target type. This works for all integer formats but
rgb10a2, which has a 16-bit type.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d: fix Tile Rendering Mode Cfg (Color) packet description

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement stencil aspect blits for combined depth/stencil format

To do this we just implement the stencil blit as a masked color blit
with uint8 format. This allows us to support blitting on combined
depth/stencil formats, and therefore, also partial image copies
for these formats.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement fallback for partial image copies

For this we use blits with nearest filtering and choose a compatible
format for the render target if the copy format is not renderable.

This works for all supported formats except combined depth/stencil
(for which we don't support blitting for now) and compressed formats.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: properly return OOM error during pipeline creation

So far we were just asserting or aborting if any of the internal
methods used during the pipeline creation failed.

We needed to change the return value of several methods, in order to
bubble up the proper memory allocation error.

Note that as the pipeline creation is complex and split into several
methods, if an error happens in the middle, it returns early and relies
on the higher level to call PipelineDestroy. This method needs to take
into account that some of the resources may not have been allocated
when freeing it.

Also note that v3dv_get_shader_variant is used during the pipeline
bind, as with the new resources bound, we need to check if we need to
recompile a new variant. At that moment we are not creating a new
vulkan object so we can't really return an OOM error. For now we just
assert on that case.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle texture/sampler shader state bo failure with OOM error

As we are doing this while we are creating the ImageView, we should
handle it with a real error, and not an abort.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: use the private object framework in the meta clear path

This was allocating image views on the stack, which was kind of
hackish, and of course was expecting that allocated Vulkan objects
could be immediately freed after being recorded in the command buffer
which is not always safe to do in the general case (even if it was
here). This makes things more consistent and reliable.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix leaks during recording of meta blits

This uses the framework to register private command buffer objects
that get freed automatically when the command buffer is destroyed by
the application.

This change also moves the descriptor set pool that the meta blit path
uses to allocate descriptors for the blit source textures, from the
device to the command buffer, so we can have a descriptor pool per
command buffer. This is necessary to ensure correct behavior when
doing multi-threaded command buffer recording (alternatively, we would
have to lock around the descriptor set allocation code, which would be
undesirable).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add framework for private driver objects

This allows the driver to register private Vulkan objects it creates as part
of command buffer recording (usually for meta operations) in the command
buffer, so they can be destroyed together with it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: support blits with 1D and 3D images

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: remove incorrect assert

There is no reason why we can't have a non-zero base offset in a push
constant load.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't support 1D depth/stencil for transfer sources or sampling

The hardware can't do sampling from raster depth/stencil textures and
1D images are always raster, even if the user requested optimal tiling.

Using an image as the source of a blit is a transfer source operation,
so we can't expose that either, as blitting involves sampling.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't support blitting of combined depth/stencil formats

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: support depth blits

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle miplevel correctly for blits

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/blit: fix integer blits from larger to lower bit size

In this case the hardware seems to copy the bits that actually fit
in the destination instead of clamping to the maximum value allowed
by the bit size of the destination components like Vulkan expects.

Fix this by adding code to clamp the color results to the bit size
of the destination components.
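
The difference between the two behaviors can be shown on the CPU.
This is a sketch with hypothetical helper names, not the code the
blit shader uses: writing 300 to an 8-bit unsigned component should
produce 255, whereas keeping only the bits that fit produces 44.

```c
#include <assert.h>
#include <stdint.h>

/* Clamps an unsigned value to the range of a 'bits'-wide component. */
static uint32_t
sketch_clamp_uint(uint32_t value, unsigned bits)
{
   assert(bits >= 1 && bits <= 32);
   uint32_t max = bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
   return value > max ? max : value;
}

/* Clamps a signed value to the range of a 'bits'-wide two's complement
 * component.
 */
static int32_t
sketch_clamp_sint(int32_t value, unsigned bits)
{
   assert(bits >= 1 && bits <= 32);
   int32_t max = (int32_t)((UINT32_C(1) << (bits - 1)) - 1);
   int32_t min = -max - 1;
   if (value > max)
      return max;
   if (value < min)
      return min;
   return value;
}
```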

It should be noted that this is a general issue with the hardware,
and while we can fix it here for blits done by the driver, user
shaders writing outside the range of the destination bit size will
have the same issue and we probably don't want to add code to clamp
every single render target write in every shader with integer format.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak state BO from samplers

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak the texture shader state BO from image views

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak the compiler from the physical device

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak prog_data from shader variants

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak default pipeline attributes BO

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak host memory allocated for shader variants

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: don't leak NIR code in pipelines

The pipeline stages have a reference to the NIR code produced from
the SPIR-V shader modules, but they never destroy it.

It should also be noted that our coordinate shader stage was sharing
the NIR with the vertex shader stage, which is kind of tricky to handle
and probably very error prone. Just make sure each pipeline stage
owns its NIR shader and that we always free it when the stage is
destroyed.

Also, for the case of NIR modules created by the driver internally,
we always need to clone them, since the driver will destroy the NIR
as soon as it is done creating pipelines with it. We could also not
clone it and let the pipeline stage take ownership of the NIR code for
NIR modules, but that would be inconsistent with how ownership works for
SPIR-V modules.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: move early-Z update to pre-draw

This needs to be updated every time we bind a new pipeline, but we can
bind a pipeline and not have an actual job yet, so we want to postpone
this until we actually need to emit CFG_BITS, during the pre-draw
setup.

Also, rename the update helper to be about the job rather than the command
buffer, since it is updating state that we track per job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: require optimal tiling for features that require sampling

The hardware can only do sampling with a raster format for 1D textures,
so let's just require optimal for everything.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement shader draw fallback for vkCmdBlitImage

For now this is limited to blits of 2D color images.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: save and restore push constant state during meta operations

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: save and restore descriptor state during meta operations if needed

For now we have only been using meta operations for clears which don't need
to bind descriptor sets, however meta blits will need to.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: meta operations can happen outside a render pass

We were asserting that we had a valid subpass index, but we can have
meta operations that run outside a render pass, such as for blitting.

If we allow this, then we also need to account for the fact that
pipelines can be bound outside a render pass too.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: reset subpass index at render pass end

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement TFU blits

While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: add a bunch of API stubs

This is helpful to identify samples that attempt to use unimplemented
features.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: simplify handling of no-op jobs

Avoid creating (and destroying) no-op jobs more than once. Instead,
cache the job and use it every time we need to submit one.
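
A minimal sketch of the caching idea, assuming simplified, hypothetical `struct job` and `struct queue` types (the real driver structures differ): the no-op job is created lazily on first use and the same object is returned on every later request.

```c
#include <stdlib.h>

/* Hypothetical, simplified types for illustration only. */
struct job {
   int is_noop;
};

struct queue {
   struct job *noop_job;   /* created lazily, reused afterwards */
   unsigned noop_created;  /* counts allocations, for illustration */
};

/* Returns the cached no-op job, creating it only the first time. */
static struct job *
queue_get_noop_job(struct queue *q)
{
   if (!q->noop_job) {
      q->noop_job = calloc(1, sizeof(*q->noop_job));
      if (!q->noop_job)
         return NULL;
      q->noop_job->is_noop = 1;
      q->noop_created++;
   }
   return q->noop_job;
}

/* The cached job is destroyed once, together with the queue. */
static void
queue_finish(struct queue *q)
{
   free(q->noop_job);
   q->noop_job = NULL;
}
```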

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: submit a no-op job if a command buffer doesn't have any jobs.

This is similar to the scenario where we have a submit without
any command buffers: even if we don't have any actual GPU work to do,
we still might need to signal fences/semaphores and possibly wait on
previous jobs to finish, so we need to submit something to the kernel
to get all that done right.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: implement occlusion queries

The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have support for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the command buffer from the point of view of the user, which
adds a certain amount of hassle.

The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For example, we need to wait before executing query resets
just in case the GPU is using them, and we have to do a CPU wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have a
submission thread instead so we don't block the main thread in these
scenarios.

Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types: there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
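
The job-type idea can be sketched as below. The enum values, structs and the `queue_process_jobs` function are hypothetical names chosen for illustration, and the sketch simplifies by always waiting for pending GPU work before a CPU job, whereas the text above only requires this in specific cases.

```c
#include <stddef.h>

/* Hypothetical job types: one for all GPU work, one per kind of CPU task. */
enum job_type {
   JOB_TYPE_GPU,
   JOB_TYPE_CPU_RESET_QUERIES,
   JOB_TYPE_CPU_COPY_QUERY_RESULTS,
};

struct job {
   enum job_type type;
};

struct stats {
   unsigned gpu_submits;
   unsigned cpu_tasks;
   unsigned cpu_waits;
};

/* Processes jobs strictly in order. CPU jobs first wait for any GPU work
 * submitted before them (counted here rather than actually waited on). */
static void
queue_process_jobs(const struct job *jobs, size_t count, struct stats *s)
{
   int gpu_pending = 0;

   for (size_t i = 0; i < count; i++) {
      switch (jobs[i].type) {
      case JOB_TYPE_GPU:
         s->gpu_submits++;
         gpu_pending = 1;
         break;
      case JOB_TYPE_CPU_RESET_QUERIES:
      case JOB_TYPE_CPU_COPY_QUERY_RESULTS:
         if (gpu_pending) {
            s->cpu_waits++;
            gpu_pending = 0;
         }
         s->cpu_tasks++;
         break;
      }
   }
}
```

Because the wait happens on the calling thread, a sequence such as GPU, reset, GPU, copy stalls twice, which is the blocking behavior a submission thread would avoid.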

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: reset all state to dirty when we start a new job for a command buffer

Most of our state doesn't carry over across jobs, so it needs to be re-emitted.
For example, if we have two render passes running back to back using the
same pipeline, the application could decide to only bind the vertex buffer
and/or the pipeline just once, but as soon as we record the second render
pass and create a new job for it we will need to re-emit the shader state
record for it just because it is a new job.

We could probably only do this for jobs inside a render pass, since those
are the only ones that actually draw something and need to care about
dirty state, however, there is no harm in doing this for all jobs, for the
same reason.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/format: expose correctly if a texture format is filterable

We were enabling VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT for
any format valid for texturing, but for example, right now we don't
support linear filtering on any depth format.

This is needed to get some hundreds of tests like this:
dEQP-VK.pipeline.sampler.view_type.1d.format.r32g32_sfloat.mag_filter.linear

properly skipped (those were all Crashes with the simulator, and
almost all Fails with the real device).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix subpass merge tests

When testing if we could merge the new subpass into the previous one,
we were taking the subpass index from the state (which isn't updated
to the new subpass until a bit later when the job for the new subpass
has been settled). This means that we were doing the merge checks with
the previous subpass, not the current one.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/uniforms: fill up texture size-related uniforms

Needed for textureQueryLevels and textureSize

Gets tests like the following working:
dEQP-VK.glsl.texture_functions.query.texturequerylevels.isampler2d_fragment

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: handle not having a sampler when combining texture and sampler id

There are some texture operations (like mipmap query levels) that
don't require a sampler. In fact, any sampler should be ignored for
them, so we need to take this into account when combining the indices.
nir_tex_instr_src_index returns a negative value to identify that
case, but as we are using a uint32_t to pack both values (for
convenience, as it is easy to pack/unpack the hash table key), we just
use a uint value big enough to never be a valid sampler id.
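
A minimal sketch of such a packed key, assuming a hypothetical 16/16 bit split and a hypothetical `NO_SAMPLER_IDX` sentinel (the actual layout and value used by the driver may differ):

```c
#include <stdint.h>

/* Hypothetical sentinel: larger than any sampler index we expect to see. */
#define NO_SAMPLER_IDX 0xffffu

/* Packs a texture index and a sampler index into one 32-bit key.
 * A negative sampler_idx means "this operation has no sampler". */
static uint32_t
pack_key(uint32_t texture_idx, int32_t sampler_idx)
{
   uint32_t s = sampler_idx < 0 ? NO_SAMPLER_IDX : (uint32_t)sampler_idx;
   return (texture_idx << 16) | (s & 0xffffu);
}

static void
unpack_key(uint32_t key, uint32_t *texture_idx, uint32_t *sampler_idx)
{
   *texture_idx = key >> 16;
   *sampler_idx = key & 0xffffu;
}
```

With this layout, a texture-only operation on texture 7 unpacks to texture index 7 and sampler index `NO_SAMPLER_IDX`, so it can never collide with a real texture and sampler pair.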

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: emit instanced draw calls when requested

This requires that we emit a specific draw command and that we emit
the base instance if not zero right before the instanced draw call.
Notice that we were already doing this for instanced indexed draw
calls, so here we are only adding this for non-indexed draw calls.

We also need to flag whether the vertex shader reads the base instance
in the shader record (which it will if it uses gl_InstanceIndex,
as that is lowered in Vulkan to base_instance + instance_id).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3d/compiler: implement nir_intrinsic_load_base_instance

Vulkan lowers gl_InstanceIndex to load_base_instance +
load_instance_id, so we need to implement loading the base instance in
the compiler.

The base instance is set by the BASE_VERTEX_BASE_INSTANCE command
right before the instanced draw call and it is included in the VPM
payload together with the InstanceID and VertexID if this is requested
by the shader record.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor_set: combine texture and sampler indices

OpenGL doesn't have the concept of separate textures and samplers, so
texture and sampler indices have the same value. The v3d compiler relies
on this assumption, so for example, the texture info in the v3d key
includes values that need both the texture format and the sampler
to fill in (like the return_size).

One option would be to adapt the v3d compiler to handle both, but then
we would also need to adapt the lowerings it uses, like nir_lower_tex,
that make the same assumption.

We deal with this in the Vulkan driver by reassigning the texture and
sampler index to a combined one. We add a hash table to map the
texture idx and sampler idx pair to this combined idx, and a
simple array for the reverse mapping. In the driver we work with the
separate indices to fill up the data, while the v3d compiler works
with the combined one.
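
The two-way mapping can be sketched as follows. This is not the driver's code: the names are hypothetical, the key is assumed to be an already packed texture/sampler pair, and a linear search over a small array stands in for the hash table to keep the example self-contained.

```c
#include <stdint.h>

#define MAX_COMBINED 16

/* Hypothetical map: the array index is the combined idx seen by the
 * compiler, the stored value is the packed (texture, sampler) key. */
struct combined_map {
   uint32_t keys[MAX_COMBINED]; /* reverse map: combined idx -> key */
   uint32_t count;
};

/* Returns the combined idx for 'key', assigning a new one if the pair
 * has not been seen before. Returns -1 if the map is full. */
static int
combined_map_get_or_add(struct combined_map *m, uint32_t key)
{
   for (uint32_t i = 0; i < m->count; i++) {
      if (m->keys[i] == key)
         return (int)i;
   }
   if (m->count == MAX_COMBINED)
      return -1;
   m->keys[m->count] = key;
   return (int)m->count++;
}
```

The compiler side only ever sees the small combined idx, while the driver can read `keys[combined_idx]` to recover which texture and sampler to use when filling in the data.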

As mentioned, this is needed to properly fill up the texture return
size, so as we are here, we fix that. This gets tests like the
following working:

dEQP-VK.glsl.texture_gather.basic.2d.depth32f.base_level.level_2

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv/descriptor: move descriptor_map_get_sampler, add and use get_image_view

First one as we plan to use get_sampler in more places, second one
just to get cleaner code.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: handle partial clears of just one aspect of combined DS targets

For these we can still use a compatible color format, but we need to mask
out the color components matching the aspect that is preserved.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: simplify partial clearing code

Always work with the render pass attachment index and avoid using
the subpass render target index completely. This makes things easier.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix incorrect attachment reference

We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead.

Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.{40,48}

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: fix incorrect attachment reference

We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead, which we were already computing
but that we were not actually using for indexing by mistake.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: compute tile granularity for each subpass

We must update our check for whether the render area is tile-aligned for
each subpass, since the hardware will update tile sizes for each RCL.
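
To illustrate why the check depends on the tile size, here is a sketch of a tile-alignment test. The function name, the `struct rect` type and the exact alignment rule (far edges may either sit on a tile boundary or reach the framebuffer edge) are assumptions made for this example, not a copy of the driver's logic.

```c
#include <stdbool.h>
#include <stdint.h>

struct rect {
   uint32_t x, y, width, height;
};

/* Assumed rule: the origin must sit on a tile boundary, and each far
 * edge must sit on a tile boundary or reach the framebuffer edge. */
static bool
render_area_is_tile_aligned(const struct rect *area,
                            uint32_t tile_w, uint32_t tile_h,
                            uint32_t fb_w, uint32_t fb_h)
{
   uint32_t x1 = area->x + area->width;
   uint32_t y1 = area->y + area->height;

   return area->x % tile_w == 0 &&
          area->y % tile_h == 0 &&
          (x1 % tile_w == 0 || x1 >= fb_w) &&
          (y1 % tile_h == 0 || y1 >= fb_h);
}
```

The same 64x64 render area in a 128x128 framebuffer is aligned for 64x64 tiles but not for 128x128 tiles, so a result computed for one subpass cannot be reused for another subpass with a different tile size.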

Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.8

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>

v3dv: set render area for partial clears to match clear rect

While this was already being achieved by the scissor rect set on the
pipeline, we still want to limit the render area so we reduce the tile
coverage of the pass as much as possible and avoid unnecessary
tile load and store operations.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>