r300: fix register rewrite when converting rbg instructions to alpha
Example from dEQP-GLES2.functional.shaders.indexing.tmp_array.float_dynamic_write_dynamic_loop_read_fragment
Fragment Program: after 'pair translate'
0: src0.xyz = input[0], src1.xyz = const[5]
MAD temp[0].xyz, src0.xxx, src1.Hyz, src0.000
1: src0.xyz = const[1], src1.xyz = const[6]
MAD temp[1].xyz, src0.xxx, src0.111, -src1.x1z
2: src0.xyz = temp[1]
CMP temp[1].xyz, src0.000, src0.111, src0.xyz
3: src0.xyz = temp[0], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[2].x, src0.x__, src1.x__, -src2.y__
4: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[3].x, src0.x__, src1.x__, -src2.z__
5: src0.xyz = temp[1]
MAX temp[4].x, src0.x__, src0.z__
6: src0.xyz = temp[0], src1.xyz = input[0], src2.xyz = temp[4]
CMP temp[4].x, src0.x__, src1.x__, -src2.x__
7: src0.xyz = temp[3], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[3].x, src0.x__, src1.x__, -src2.x__
8: src0.xyz = input[0], src1.xyz = temp[2], src2.xyz = temp[1]
CMP temp[2].x, src0.x__, src1.x__, -src2.x__
9: src0.xyz = temp[1]
MAD temp[1].x, src0.x__, src0.y__, src0.000
10: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[1].x, src0.x__, src1.x__, -src2.x__
11: src0.xyz = const[2], src1.xyz = const[6]
MAD temp[5].xyz, src0.xxx, src0.111, -src1.x1z
12: src0.xyz = temp[5]
CMP temp[5].xyz, src0.000, src0.111, src0.xyz
13: src0.xyz = temp[0], src1.xyz = temp[2], src2.xyz = temp[5]
CMP temp[6].x, src0.y__, src1.x__, -src2.y__
14: src0.xyz = temp[3], src1.xyz = temp[0], src2.xyz = temp[5]
CMP temp[7].x, src0.x__, src1.y__, -src2.z__
15: src0.xyz = temp[5]
MAX temp[8].x, src0.x__, src0.z__
16: src0.xyz = temp[0], src1.xyz = temp[4], src2.xyz = temp[8]
CMP temp[4].x, src0.y__, src1.x__, -src2.x__
17: src0.xyz = temp[7], src1.xyz = temp[3], src2.xyz = temp[5]
CMP temp[3].x, src0.x__, src1.x__, -src2.x__
18: src0.xyz = temp[2], src1.xyz = temp[6], src2.xyz = temp[5]
CMP temp[2].x, src0.x__, src1.x__, -src2.x__
....
This will be pair scheduled to:
Fragment Program: after 'pair scheduling'
0: src0.xyz = input[0], src1.xyz = const[5] // original inst 0
MAD temp[0].xyz, src0.xxx, src1.Hyz, src0.000
1: src0.xyz = const[1], src1.xyz = const[6] // original inst 1
MAD temp[1].xyz, src0.xxx, src0.111, -src1.x1z
2: src0.xyz = const[2], src1.xyz = const[6] // original inst 11
MAD temp[5].xyz, src0.xxx, src0.111, -src1.x1
3: src0.xyz = temp[1] // original inst 2
CMP temp[1].xyz, src0.000, src0.111, src0.xyz
4: src0.xyz = temp[1], src1.xyz = temp[0], src2.xyz = input[0]
MAX temp[4].x, src0.x__, src0.z__ // original inst 5
CMP temp[2].w, src1.x, src2.x, -src0.y // original inst 3
5: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[3].w, src0.x, src1.x, -src2.z // original inst 4
6: src0.xyz = temp[5], src0.w = temp[2], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[5].xyz, src0.000, src0.111, src0.xyz // original inst 12
CMP temp[5].w, src1.x, src0.w, -src2.x // original inst 8
7: src0.xyz = temp[0], src0.w = temp[5], src1.xyz = temp[2], src2.xyz = temp[5]
CMP temp[6].x, src0.y__, src0.w__, -src2.y__ // original inst 13
8: src0.xyz = temp[5], src0.w = temp[3], src1.xyz = input[0], src2.xyz = temp[1]
MAX temp[8].x, src0.x__, src0.z__ // original inst 15
CMP temp[5].w, src0.w, src1.x, -src2.x // original inst 7
9: src0.xyz = temp[3], src0.w = temp[5], src1.xyz = temp[0], src2.xyz = temp[5]
CMP temp[7].x, src0.w__, src1.y__, -src2.z__ // original inst 14
10: src0.xyz = temp[2], src0.w = temp[5], src1.xyz = temp[6], src2.xyz = temp[5]
CMP temp[2].x, src0.w__, src1.x__, -src2.x__ // original inst 18
11: src0.xyz = temp[7], src0.w = temp[5], src1.xyz = temp[3], src2.xyz = temp[5]
CMP temp[3].x, src0.x__, src0.w__, -src2.x__ // original inst 17
....
The problem is that instruction 11 (which was instruction 17 before the scheduling) now reads
a wrong source for src0. It initially used the result of instruction 8 (now scheduled as 6),
but now it reads from instruction 8 (corresponding to instruction 7 before the scheduling).
The bug is quite subtle and needs few conditions to reproduce:
- there is a loop, therefore we skip the the register rename
pass and hence don't have the ssa-like form,
- there are at least two rgb instructions writing the same register
and both are convertible to alpha instruction,
- there is excess of rgb instructions, so that the conversion actually
happens.
So what happens, while scheduling instructions, the scheduler will
recognize there are no alpha instruction to pair the rgb ones with
and convert some to alpha. It primarily tries to use the same register,
just reuse the alpha channel.
Why it happens? We are tracking the usage of registers in the block
being scheduled and when we rewrite something we move the users tracked
by the reg_value structures to the new register. The problem is that when
we do this, the current code expects that the code is in the ssa-like
form. Here it is not (because of the loop) and when we convert the
original instruction 2, we move the dependency information about the
temp[2].x to temp[2].w. When we later convert instruction 8, which also
writes temp[2].x, the original dependency info is gone, and when we copy
that to the new reg (temp[5].w), we just set it to NULL and it means we
don't mark it as used effectively, and later wrongly use it again when
we look for a next empty register.
Fix this by not deleting the original dependency info. We can't reuse the
reg now, but it doesn't matter, because the regalloc later can sort it out.
There are no changes in the shader-db.
Fixes: dEQP-GLES2.functional.shaders.indexing.tmp_array.float_dynamic_write_dynamic_loop_read_fragment
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6508
Reviewed-by: Filip Gawin <filip@gawin.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18621>