From 3d80d6b69678fe0a76ac896311748769b23e8ced Mon Sep 17 00:00:00 2001
From: =?utf8?q?Marek=20Ol=C5=A1=C3=A1k?=
Date: Sat, 30 Oct 2021 07:09:22 -0400
Subject: [PATCH] radeonsi: enable nir_group_loads for better performance
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

The best case I have is one viewperf subtest getting +9% performance.

56979 shaders in 34726 tests
Totals:
SGPRS: 2667522 -> 2669178 (0.06 %)
VGPRS: 1543608 -> 1553472 (0.64 %)
Spilled SGPRs: 4090 -> 4100 (0.24 %)
Spilled VGPRs: 1600 -> 1791 (11.94 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 1872 -> 2076 (10.90 %) dwords per thread
Code Size: 59443980 -> 59479804 (0.06 %) bytes
Max Waves: 867280 -> 865634 (-0.19 %)

Acked-by: Pierre-Eric Pelloux-Prayer
Reviewed-by: Timur Kristóf

v2: No change in pixels but the hash changed.

Part-of:
---
 src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml | 2 +-
 src/gallium/drivers/radeonsi/si_shader.c            | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml b/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
index b2e46f3..bd80fe7 100644
--- a/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
+++ b/src/gallium/drivers/radeonsi/ci/traces-radeonsi.yml
@@ -37,7 +37,7 @@ traces:
   - path: gputest/pixmark-piano.trace
     expectations:
       - device: gl-radeonsi-stoney
-        checksum: 58a86d233d03e2a174cb79c16028f916
+        checksum: a7317d54d452d19ce630c7f554f2279b
   - path: gputest/triangle.trace
     expectations:
       - device: gl-radeonsi-stoney
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index c8892a3..c19790a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1426,6 +1426,11 @@ struct nir_shader *si_get_nir_shader(struct si_shader_selector *sel,
                  nir_var_shader_out);
    }
 
+   /* This helps LLVM form VMEM clauses and thus get more GPU cache hits.
+    * 200 is tuned for Viewperf. It should be done last.
+    */
+   NIR_PASS_V(nir, nir_group_loads, nir_group_same_resource_only, 200);
+
    return nir;
 }
 
-- 
2.7.4