broadcom/compiler: use VPM offsets in GS load_per_vertex input
authorJuan A. Suarez Romero <jasuarez@igalia.com>
Thu, 8 Apr 2021 16:20:08 +0000 (18:20 +0200)
committerMarge Bot <eric+marge@anholt.net>
Tue, 13 Apr 2021 16:08:00 +0000 (16:08 +0000)
Vertex Shader has a store_out lowering pass that converts gallium driver
locations in offsets inside the VPM.

One of the consequences is that these offsets are consecutives; that is,
if the VS stores VARYING_SLOT_VAR0.xyz and VARYING_SLOT_VAR1.xyzw, there
isn't a hole in the VPM offsets for the un-stored VARYING_SLOT_VAR0.w.

Thus we need to change how the VPM offset is computed in the Geometry
Shader when loading the inputs.

This bug is exposed by !9050.

v2 (Iago):
 - Include explanatory comment.
 - Use assert.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10129>

src/broadcom/compiler/nir_to_vir.c

index 89760ca..e313a82 100644 (file)
@@ -2991,10 +2991,30 @@ ntq_emit_intrinsic(struct v3d_compile *c, nir_intrinsic_instr *instr)
                 break;
 
         case nir_intrinsic_load_per_vertex_input: {
-                /* col: vertex index, row = varying index */
+                /* The vertex shader writes all its used outputs into
+                 * consecutive VPM offsets, so if any output component is
+                 * unused, its VPM offset is used by the next used
+                 * component. This means that we can't assume that each
+                 * location will use 4 consecutive scalar offsets in the VPM
+                 * and we need to compute the VPM offset for each input by
+                 * going through the inputs and finding the one that matches
+                 * our location and component.
+                 *
+                 * col: vertex index, row = varying index
+                 */
+                int32_t row_idx = -1;
+                for (int i = 0; i < c->num_inputs; i++) {
+                        struct v3d_varying_slot slot = c->input_slots[i];
+                        if (v3d_slot_get_slot(slot) == nir_intrinsic_io_semantics(instr).location &&
+                            v3d_slot_get_component(slot) == nir_intrinsic_component(instr)) {
+                                row_idx = i;
+                                break;
+                        }
+                }
+
+                assert(row_idx != -1);
+
                 struct qreg col = ntq_get_src(c, instr->src[0], 0);
-                uint32_t row_idx = nir_intrinsic_base(instr) * 4 +
-                                   nir_intrinsic_component(instr);
                 for (int i = 0; i < instr->num_components; i++) {
                         struct qreg row = vir_uniform_ui(c, row_idx++);
                         ntq_store_dest(c, &instr->dest, i,