broadcom/compiler: try harder to merge thread switch earlier
We have been stopping as soon as we find a conflict but that doesn't
mean we can't merge it in an earlier slot, so keep going. Going by
shader-db, this sometimes allows us to merge the final thrsw a bit
earlier and avoid emitting NOP instructions at the program end to
make up for its delay slots. I have not observed cases where this
helps with regular thrsw though, but it doesn't hurt to try with
those too.
total instructions in shared programs:
11526876 ->
11526354 (<.01%)
instructions in affected programs: 10760 -> 10238 (-4.85%)
helped: 236
HURT: 0
Instructions are helped.
total max-temps in shared programs: 2231705 -> 2231677 (<.01%)
max-temps in affected programs: 276 -> 248 (-10.14%)
helped: 27
HURT: 0
Max-temps are helped.
total inst-and-stalls in shared programs:
11545177 ->
11544655 (<.01%)
inst-and-stalls in affected programs: 10777 -> 10255 (-4.84%)
helped: 236
HURT: 0
Inst-and-stalls are helped.
total nops in shared programs: 321624 -> 321152 (-0.15%)
nops in affected programs: 751 -> 279 (-62.85%)
helped: 236
HURT: 0
Nops are helped.
Reviewed-by: Alejandro PiƱeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22679>