aco: optimize load_local_invocation_index with single-wave workgroups
fossil-db (Sienna Cichlid):
Totals from 668 (0.52% of 128647) affected shaders:
CodeSize: 2201912 -> 2193336 (-0.39%)
Instrs: 403124 -> 402325 (-0.20%)
Latency: 4510940 -> 4510214 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 681057 -> 679453 (-0.24%); split: -0.24%, +0.00%
VClause: 6470 -> 6467 (-0.05%)
SClause: 12759 -> 12755 (-0.03%)
Copies: 26348 -> 26218 (-0.49%); split: -0.50%, +0.00%
PreSGPRs: 26140 -> 26101 (-0.15%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel-schuermann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13757>