From 08548650bd36f9202564f3266c3e2b4736e885a4 Mon Sep 17 00:00:00 2001
From: Emma Anholt
Date: Thu, 25 Aug 2022 14:34:20 -0700
Subject: [PATCH] turnip: Enable lowering of mediump temps/CS shared to 16-bit.

In Aztec Ruins, we end up storing some big shared-mem arrays as 16-bit,
cutting shared mem size in half across many shaders while also reducing
conversions.

gfxbench vk-5-normal perf +0.364983% +/- 0.189764% (n=4).

fossil-db:

Totals from 448 (2.99% of 14988) affected shaders:
MaxWaves: 6154 -> 6390 (+3.83%); split: +3.96%, -0.13%
Instrs: 174554 -> 165045 (-5.45%); split: -6.45%, +1.01%
CodeSize: 364224 -> 345558 (-5.12%); split: -6.03%, +0.90%
NOPs: 48224 -> 48024 (-0.41%); split: -3.33%, +2.91%
MOVs: 6985 -> 6104 (-12.61%); split: -19.11%, +6.50%
Full: 4577 -> 4101 (-10.40%); split: -11.08%, +0.68%
(ss): 3428 -> 3335 (-2.71%); split: -4.17%, +1.46%
(sy): 1250 -> 1205 (-3.60%); split: -4.72%, +1.12%
(ss)-stall: 14695 -> 14528 (-1.14%); split: -2.25%, +1.12%
(sy)-stall: 19565 -> 17998 (-8.01%); split: -9.55%, +1.54%
STPs: 1086 -> 870 (-19.89%)
LDPs: 162 -> 108 (-33.33%)
Cat0: 51400 -> 51120 (-0.54%); split: -3.31%, +2.76%
Cat1: 16861 -> 14688 (-12.89%); split: -18.18%, +5.30%
Cat2: 71161 -> 68454 (-3.80%); split: -4.52%, +0.72%
Cat3: 29572 -> 25306 (-14.43%); split: -14.49%, +0.06%
Cat4: 3128 -> 3131 (+0.10%)
Cat5: 1502 -> 1506 (+0.27%)
Cat6: 840 -> 750 (-10.71%)

aztec ruins is a big winner with the ldp/stp reductions. summoners_war
racks up an astounding 41% reduction in instructions and +15% max_waves.
Most affected apps show a minor win in instrs, with
fallout_shelter_online, and aztec ruins on ANGLE taking minor hits.

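
For anyone cross-checking the fossil-db totals: each percentage appears
to be the plain relative change, (after - before) / before. The C sketch
below recomputes a totals line under that assumption. pct_change() and
format_delta() are hypothetical helper names used only for illustration,
not part of fossil-db or Mesa, and rounding to two decimals is likewise
an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * Hypothetical helper, not taken from fossil-db. */
static double pct_change(long before, long after)
{
   return (double)(after - before) / (double)before * 100.0;
}

/* Format one totals entry in the "before -> after (+x.xx%)" shape
 * used in the report. Two-decimal rounding is an assumption. */
static int format_delta(char *buf, size_t len, long before, long after)
{
   return snprintf(buf, len, "%ld -> %ld (%+.2f%%)",
                   before, after, pct_change(before, after));
}
```

Calling format_delta(buf, sizeof(buf), 174554, 165045) fills buf with the
same text as the Instrs entry in the totals.
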
Part-of:
---
 src/freedreno/vulkan/tu_shader.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/freedreno/vulkan/tu_shader.c b/src/freedreno/vulkan/tu_shader.c
index 8c98452..c932cc7 100644
--- a/src/freedreno/vulkan/tu_shader.c
+++ b/src/freedreno/vulkan/tu_shader.c
@@ -104,9 +104,22 @@ tu_spirv_to_nir(struct tu_device *dev,
    NIR_PASS_V(nir, nir_lower_sysvals_to_varyings, &sysvals_to_varyings);
 
    NIR_PASS_V(nir, nir_lower_global_vars_to_local);
+
+   /* Older glslang missing bf6efd0316d8 ("SPV: Fix #2293: keep relaxed
+    * precision on arg passed to relaxed param") will pass function args through
+    * a highp temporary, so we need the nir_opt_find_array_copies() and a copy
+    * prop before we lower mediump vars, or you'll be unable to optimize out
+    * array copies after lowering. We do this before splitting copies, since
+    * that works against nir_opt_find_array_copies().
+    * */
+   NIR_PASS_V(nir, nir_opt_find_array_copies);
+   NIR_PASS_V(nir, nir_opt_copy_prop_vars);
+   NIR_PASS_V(nir, nir_opt_dce);
+
    NIR_PASS_V(nir, nir_split_var_copies);
    NIR_PASS_V(nir, nir_lower_var_copies);
 
+   NIR_PASS_V(nir, nir_lower_mediump_vars, nir_var_function_temp | nir_var_shader_temp | nir_var_mem_shared);
    NIR_PASS_V(nir, nir_opt_copy_prop_vars);
    NIR_PASS_V(nir, nir_opt_combine_stores, nir_var_all);
 
-- 
2.7.4