From 3fedf22b6089f0251426cd4c431363bbd0e4413a Mon Sep 17 00:00:00 2001
From: Alyssa Rosenzweig
Date: Fri, 17 Jun 2022 12:58:55 -0400
Subject: [PATCH] pan/bi: Tune lower_vars_to_scratch
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Increase the threshold to lower indirect indexing of arrays to scratch memory
all the way up to 256 bytes, which was the lowest power-of-two threshold for
which enabling the pass on Mali-G57 was a win in shaderdb.

It's difficult to tell what threshold is optimal here. The shader-db stats are
based on a rough cycle model that assumes a 16:1 ratio between CVT and
load/store on Valhall, and a 24:1 ratio between arithmetic and load/store on
Bifrost. Those ratios are at most rules of thumb, as the number of cycles
required by a load/store instruction will vary tremendously based on caching
and the memory controller. However, they may well be lower bounds (if those are
the upper bounds on instruction issuing in the Mali shader cores). As such, a
large threshold seems well motivated.

shader-db results on Mali-G52 follow, results on Mali-G57 were similar. Note the
shader that's hurt for spills/fills is *helped* for load/store overall.

   cycles helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))
   ldst helped: 129 -> 98 (-24.03%) (spills: 17 -> 20 (17.65%); fills: 34 -> 40 (17.65%))

total instructions in shared programs: 2415410 -> 2415372 (<.01%)
instructions in affected programs: 1041 -> 1003 (-3.65%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 31.0 x̄: 12.67 x̃: 5
helped stats (rel) min: 2.08% max: 6.02% x̄: 3.90% x̃: 3.60%

total tuples in shared programs: 1928558 -> 1928527 (<.01%)
tuples in affected programs: 826 -> 795 (-3.75%)
helped: 2
HURT: 1
helped stats (abs) min: 6.0 max: 26.0 x̄: 16.00 x̃: 16
helped stats (rel) min: 3.72% max: 9.68% x̄: 6.70% x̃: 6.70%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.54% max: 1.54% x̄: 1.54% x̃: 1.54%

total clauses in shared programs: 355013 -> 354981 (<.01%)
clauses in affected programs: 220 -> 188 (-14.55%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 27.0 x̄: 10.67 x̃: 3
helped stats (rel) min: 13.99% max: 21.43% x̄: 16.93% x̃: 15.38%

total cycles in shared programs: 166610.27 -> 166574.90 (-0.02%)
cycles in affected programs: 138 -> 102.62 (-25.63%)
helped: 3
HURT: 0
helped stats (abs) min: 0.4583330000000001 max: 31.0 x̄: 11.79 x̃: 3
helped stats (rel) min: 15.28% max: 65.28% x̄: 34.86% x̃: 24.03%

total arith in shared programs: 73690.13 -> 73690.58 (<.01%)
arith in affected programs: 29.71 -> 30.17 (1.54%)
helped: 1
HURT: 2
helped stats (abs) min: 0.0833339999999998 max: 0.0833339999999998 x̄: 0.08 x̃: 0
helped stats (rel) min: 3.85% max: 3.85% x̄: 3.85% x̃: 3.85%
HURT stats (abs) min: 0.125 max: 0.4166659999999993 x̄: 0.27 x̃: 0
HURT stats (rel) min: 1.66% max: 5.17% x̄: 3.42% x̃: 3.42%

total ldst in shared programs: 135611 -> 135571 (-0.03%)
ldst in affected programs: 138 -> 98 (-28.99%)
helped: 3
HURT: 0
helped stats (abs) min: 3.0 max: 31.0 x̄: 13.33 x̃: 6
helped stats (rel) min: 24.03% max: 100.00% x̄: 74.68% x̃: 100.00%

total quadwords in shared programs: 1674599 -> 1674523 (<.01%)
quadwords in affected programs: 838 -> 762 (-9.07%)
helped: 3
HURT: 0
helped stats (abs) min: 2.0 max: 65.0 x̄: 25.33 x̃: 9
helped stats (rel) min: 3.39% max: 15.00% x̄: 9.14% x̃: 9.04%

total spills in shared programs: 37 -> 40 (8.11%)
spills in affected programs: 17 -> 20 (17.65%)
helped: 0
HURT: 1

total fills in shared programs: 190 -> 196 (3.16%)
fills in affected programs: 34 -> 40 (17.65%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig
Part-of:
---
 src/panfrost/bifrost/bifrost_compile.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/panfrost/bifrost/bifrost_compile.c b/src/panfrost/bifrost/bifrost_compile.c
index 6328c80..bacd9e1 100644
--- a/src/panfrost/bifrost/bifrost_compile.c
+++ b/src/panfrost/bifrost/bifrost_compile.c
@@ -4848,9 +4848,8 @@ bi_finalize_nir(nir_shader *nir, unsigned gpu_id, bool is_blend)
          */
         bool packed_tls = (gpu_id >= 0x9000);
 
-        /* Lower large arrays to scratch and small arrays to bcsel (TODO: tune
-         * threshold, but not until addresses / csel is optimized better) */
-        NIR_PASS_V(nir, nir_lower_vars_to_scratch, nir_var_function_temp, 16,
+        /* Lower large arrays to scratch and small arrays to bcsel */
+        NIR_PASS_V(nir, nir_lower_vars_to_scratch, nir_var_function_temp, 256,
                         packed_tls ?
                         glsl_get_vec4_size_align_bytes :
                         glsl_get_natural_size_align_bytes);
-- 
2.7.4
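
As an illustration of what moving the threshold from 16 to 256 bytes means for a given array, here is a minimal sketch of the size comparison. It is not Mesa code: `array_size_bytes`, `goes_to_scratch` and `enum layout` are hypothetical names. It assumes that the vec4 layout pads every array element to 16 bytes while the natural layout packs 32-bit components tightly, and that the pass only lowers indirectly indexed arrays strictly larger than the threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of this is Mesa API. */
enum layout { LAYOUT_NATURAL, LAYOUT_VEC4 };

/* Size in bytes of an array of 32-bit vectors under each layout.
 * Assumption: the vec4 layout pads each element to 16 bytes, while the
 * natural layout uses 4 bytes per component with no padding. */
static size_t
array_size_bytes(unsigned length, unsigned components, enum layout layout)
{
        size_t elem = (layout == LAYOUT_VEC4) ? 16 : 4 * (size_t)components;
        return elem * length;
}

/* Assumption: an indirectly indexed array is lowered to scratch only when
 * its size is strictly greater than the threshold; smaller arrays are left
 * for the bcsel lowering. */
static bool
goes_to_scratch(size_t size_bytes, size_t threshold_bytes)
{
        return size_bytes > threshold_bytes;
}
```

Under these assumptions, a `float[8]` array in the vec4 layout occupies 128 bytes, so `goes_to_scratch(128, 16)` is true with the old threshold and `goes_to_scratch(128, 256)` is false with the new one. The same array in the natural layout is 32 bytes, which was also above the old threshold but falls below the new one.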