aarch64: Add --params to control the number of recip steps [PR94154]

author Bu Le <bule1@huawei.com>

Thu, 12 Mar 2020 22:39:12 +0000 (22:39 +0000)

committer Richard Sandiford <richard.sandiford@arm.com>

Fri, 13 Mar 2020 09:18:40 +0000 (09:18 +0000)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6198289..ac8940a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2020-03-13  Bu Le  <bule1@huawei.com>
+
+       PR target/94154
+       * config/aarch64/aarch64.opt (-param=aarch64-float-recp-precision=)
+       (-param=aarch64-double-recp-precision=): New options.
+       * doc/invoke.texi: Document them.
+       * config/aarch64/aarch64.c (aarch64_emit_approx_div): Use them
+       instead of hard-coding the choice of 1 for float and 2 for double.
+
  2019-03-13  Eric Botcazou  <ebotcazou@adacore.com>
  
         PR rtl-optimization/94119
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c320d5b..2c81f86 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12911,10 +12911,12 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
    /* Iterate over the series twice for SF and thrice for DF.  */
    int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
  
-  /* Optionally iterate over the series once less for faster performance,
-     while sacrificing the accuracy.  */
+  /* Optionally iterate over the series less for faster performance,
+     while sacrificing the accuracy.  The default is 2 for DF and 1 for SF.  */
    if (flag_mlow_precision_div)
-    iterations--;
+    iterations = (GET_MODE_INNER (mode) == DFmode
+                 ? aarch64_double_recp_precision
+                 : aarch64_float_recp_precision);
  
    /* Iterate over the series to calculate the approximate reciprocal.  */
    rtx xtmp = gen_reg_rtx (mode);
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 77df0b7..37181b5 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -262,3 +262,12 @@ Generate local calls to out-of-line atomic operations.
  -param=aarch64-sve-compare-costs=
  Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
  When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
+
+-param=aarch64-float-recp-precision=
+Target Joined UInteger Var(aarch64_float_recp_precision) Init(1) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for float type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 1.
+
+-param=aarch64-double-recp-precision=
+Target Joined UInteger Var(aarch64_double_recp_precision) Init(2) IntegerRange(1, 5) Param
+The number of Newton iterations for calculating the reciprocal for double type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 2.
+
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af28015..96a9516 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13179,6 +13179,17 @@ Also use the cost model to choose between SVE and Advanced SIMD vectorization.
  Using unpacked vectors includes storing smaller elements in larger
  containers and accessing elements with extending loads and truncating
  stores.
+
+@item aarch64-float-recp-precision
+The number of Newton iterations for calculating the reciprocal for float type.
+The precision of division is proportional to this param when division
+approximation is enabled.  The default value is 1.
+
+@item aarch64-double-recp-precision
+The number of Newton iterations for calculating the reciprocal for double type.
+The precision of division is proportional to this param when division
+approximation is enabled.  The default value is 2.
+
  @end table
  
  @end table
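
For readers unfamiliar with the iteration that these params count, the following is a minimal C sketch of Newton-Raphson reciprocal refinement with a configurable number of steps. It is not the code GCC emits: aarch64_emit_approx_div builds RTL around a hardware reciprocal estimate, whereas `crude_recip_estimate` and `approx_div` below are hypothetical names, and the linear first guess is an assumption standing in for that estimate. The sketch assumes a positive, finite, normal denominator and mirrors the 1..5 range of the new params.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: a deliberately
   coarse first guess for 1/d.  Assumes d is positive, finite and normal.  */
static double crude_recip_estimate(double d)
{
    int exp;
    double m = frexp(d, &exp);   /* d = m * 2^exp, with m in [0.5, 1) */

    /* Linear fit of 1/m on [0.5, 1]; relative error is at most 1/17.  */
    double r = 48.0 / 17.0 - (32.0 / 17.0) * m;

    return ldexp(r, -exp);       /* undo the scaling: 1/d = (1/m) * 2^-exp */
}

/* Approximate num / den using `steps` Newton iterations on the reciprocal.
   `steps` plays the role of the recp-precision params (range 1 to 5).  */
static double approx_div(double num, double den, int steps)
{
    assert(steps >= 1 && steps <= 5);

    double x = crude_recip_estimate(den);

    /* Newton step for f(x) = 1/x - den:  x' = x * (2 - den * x).
       Each step squares the relative error of x.  */
    for (int i = 0; i < steps; i++)
        x = x * (2.0 - den * x);

    return num * x;
}
```

Calling `approx_div(1.0, 3.0, 1)` and `approx_div(1.0, 3.0, 2)` shows the trade-off the params expose: each extra step costs more arithmetic but roughly doubles the number of correct bits, so with this particular first guess the error drops from about 1e-3 to about 4e-6.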