The next shader part doesn't know whether the p_end_with_regs args
were loaded by memory ops or not, so the wait has to be done here.
Other memory loads need to be waited on too, for example:
  a = load_mem()
  b = ...
  if (...) {
     wait_mem(a)
     store_mem(a)
  }
  p_end_with_regs(b)
"a" still needs to be waited on, otherwise the next shader part's
regs may be overwritten by unfinished memory loads.
Memory stores are waited on too. On >=gfx10, when the last VGT stage
has no parameter export, all memory stores must be done before the
pos export (see ac_nir_export_position). So when a merged shader
(ES+GS or VS+GS) is built in parts, the first stage needs to wait
for all memory stores to finish, otherwise the second stage doesn't
know whether any memory stores are still pending.
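
The rule above can be sketched with a toy model. This is only an
illustration under stated assumptions, not ACO's implementation:
ToyBlock, kind_end_with_regs, wait_all_at_end and insert_waits are
hypothetical names, and memory ops are reduced to a simple counter.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not ACO's real types.
enum : uint32_t { kind_end_with_regs = 1u << 0 };

struct ToyBlock {
    uint32_t kind = 0;
    int mem_ops_issued = 0;       // loads/stores started in this block
    int mem_ops_waited = 0;       // loads/stores already waited on in this block
    bool wait_all_at_end = false; // set by insert_waits() below
};

// Walks the blocks in order and tracks how many memory ops are still
// outstanding. A block that hands registers to the next shader part gets
// a forced wait, so nothing is left pending across the part boundary.
// Returns the number of memory ops still outstanding after the last block.
int insert_waits(std::vector<ToyBlock>& blocks) {
    int pending = 0;
    for (ToyBlock& b : blocks) {
        pending += b.mem_ops_issued - b.mem_ops_waited;
        if (pending < 0)
            pending = 0;
        if (b.kind & kind_end_with_regs) {
            b.wait_all_at_end = true; // wait for everything before leaving
            pending = 0;
        }
    }
    return pending;
}
```

In this model the "a" load from the example stays pending when the
branch with wait_mem(a) is not taken, and only the forced wait in the
end-with-regs block clears it.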
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
}
}
+ /* For last block of a program which has succeed shader part, wait all memory ops done
+ * before go to next shader part.
+ */
+ if (block.kind & block_kind_end_with_regs)
+ force_waitcnt(ctx, queued_imm);
+
if (!queued_imm.empty())
emit_waitcnt(ctx, new_instructions, queued_imm);
if (!queued_delay.empty())
in_ctx[current.index] = ctx;
}
- if (current.instructions.empty()) {
- out_ctx[current.index] = std::move(ctx);
- continue;
- }
-
loop_progress = std::max<unsigned>(loop_progress, current.loop_nest_depth);
done[current.index] = true;
Builder bld(program, end_with_regs_block);
bld.sopp(aco_opcode::s_branch, exit_block->index);
+
+ /* For insert waitcnt pass to add waitcnt in exit block, otherwise waitcnt will be added
+ * after the s_branch which won't be executed.
+ */
+ end_with_regs_block->kind &= ~block_kind_end_with_regs;
+ exit_block->kind |= block_kind_end_with_regs;
}
}