From 71e5530c07a1743c68d889b11c3fa2259a6376be Mon Sep 17 00:00:00 2001
From: Ian Romanick <ian.d.romanick@intel.com>
Date: Thu, 23 Mar 2023 16:01:13 -0700
Subject: [PATCH] nir/algebraic: Undistribute fsat from fmax

To be helpful, the thing inside the fsat has to be used with and without
the fsat. Otherwise it just moves a saturate destination modifier
around. To not be harmful, the fsat has to only be used by the bcsel.

All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20174475 -> 20174449 (<.01%)
instructions in affected programs: 3913 -> 3887 (-0.66%)
helped: 13 / HURT: 0

total cycles in shared programs: 866844832 -> 866844719 (<.01%)
cycles in affected programs: 46037 -> 45924 (-0.25%)
helped: 10 / HURT: 1

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 161491468 -> 161491372 (-0.0%)
helped: 31 / HURT: 8

Cycles in all programs: 10933090736 -> 10933024716 (-0.0%)
helped: 32 / HURT: 18

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 1e4ea52..1c553c4 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -833,6 +833,7 @@ optimizations.extend([
    (('umin', ('umax', ('umin', ('umax', a, b), c), b), c), ('umin', ('umax', a, b), c)),
    # Both the left and right patterns are "b" when isnan(a), so this is exact.
    (('fmax', ('fsat', a), '#b(is_zero_to_one)'), ('fsat', ('fmax', a, b))),
+   (('fmax', ('fsat(is_used_once)', a), ('fsat(is_used_once)', b)), ('fsat', ('fmax', a, b))),
    # The left pattern is 0.0 when isnan(a) (because fmin(fsat(NaN), b) ->
    # fmin(0.0, b)) while the right one is "b", so this optimization is inexact.
    (('~fmin', ('fsat', a), '#b(is_zero_to_one)'), ('fsat', ('fmin', a, b))),
-- 
2.7.4