-fftz-math: assume that denorms _must_ be flushed to zero optimizations

Submitted by Pekka Jääskeläinen on Aug. 10, 2017, 4:39 p.m.

Details

Message ID CAJk11WCfMLenz=hume0aTTpkpJ6GM0bB3xsFL+dSvnA9zVPs+Q@mail.gmail.com
State New
Headers show

Commit Message

Pekka Jääskeläinen Aug. 10, 2017, 4:39 p.m.
Hi,

The attached patch adds a new switch -fftz-math which makes certain
optimizations
assume that "flush to zero" behavior of denormal inputs and outputs is
not an optimization
hint, but required behavior for semantical correctness.

The need for this was initiated by HSAIL (BRIG). With HSAIL, flush to
zero handling is required,
(not only "allowed") in case an HSAIL instruction is marked with the
'ftz' modifier (all HSA Base
profile instructions are).

The patch is not complete and likely misses many optimizations.
However, it is a starting point
that fixes a few cases brought out by the HSAIL conformance suite. We
plan to extend this
as new cases come up.

OK for trunk?

BR,
Pekka

Comments

Richard Guenther Aug. 14, 2017, 8:43 a.m.
On Thu, Aug 10, 2017 at 6:39 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi,
>
> The attached patch adds a new switch -fftz-math which makes certain
> optimizations
> assume that "flush to zero" behavior of denormal inputs and outputs is
> not an optimization
> hint, but required behavior for semantical correctness.
>
> The need for this was initiated by HSAIL (BRIG). With HSAIL, flush to
> zero handling is required,
> (not only "allowed") in case an HSAIL instruction is marked with the
> 'ftz' modifier (all HSA Base
> profile instructions are).

This suggests only outputs are flushed to zero?  OTOH documentation
for X * 1 -> X suggests otherwise.  This simplification also suggests to
make FTZ operations explicit instead of adding a flag?  Thus the BRIG
FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
could eventually add a pass optimizing FTZ operations?

> The patch is not complete and likely misses many optimizations.
> However, it is a starting point
> that fixes a few cases brought out by the HSAIL conformance suite. We
> plan to extend this
> as new cases come up.
>
> OK for trunk?

Hmm.  I don't like testing flag_ftz_math too much here.

Are input denormals really required to be flushed to zero or is it enough
to flush outputs to zero?  If the latter then this is more like the modes
not having denormals (and much nicer to optimization, only constant
folding being affected)?

Richard.

> BR,
> Pekka
Pekka Jääskeläinen Aug. 14, 2017, 10:45 a.m.
Hi Richard,

The base idea of the patch is to optimize for the (common) situation
where FTZ/DAZ
is controlled by a CPU-wide flag and we then need to only avoid compile-time
optimizations that assume semantics where denorm handling is on to support
the ‘forced FTZ/DAZ semantics’.

> This suggests only outputs are flushed to zero?  OTOH documentation
> for X * 1 -> X suggests otherwise.  This simplification also suggests to
> make FTZ operations explicit instead of adding a flag?  Thus the BRIG
> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
> could eventually add a pass optimizing FTZ operations?

Both the inputs and outputs must be flushed to zero in the HSAIL’s
‘ftz’ semantics.
FTZ operations were previously always “explicit” in the BRIG FE output, like you
propose here; there were builtin calls injected for all inputs and the
output of ‘ftz’-marked
float HSAIL instructions. This is still provided as a fallback for
targets which do not
support a CPU mode flag.

The problem with a special FTZ ‘operation’ of some kind in the generic output is
that the basic optimizations get confused by a new operation and we’d need to
add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
code, which
seems unnecessary to support this case as the optimizations typically apply also
for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.

Thanks,
Pekka
Richard Guenther Aug. 14, 2017, 11:17 a.m.
On Mon, Aug 14, 2017 at 12:45 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi Richard,
>
> The base idea of the patch is to optimize for the (common) situation
> where FTZ/DAZ
> is controlled by a CPU-wide flag and we then need to only avoid compile-time
> optimizations that assume semantics where denorm handling is on to support
> the ‘forced FTZ/DAZ semantics’.
>
>> This suggests only outputs are flushed to zero?  OTOH documentation
>> for X * 1 -> X suggests otherwise.  This simplification also suggests to
>> make FTZ operations explicit instead of adding a flag?  Thus the BRIG
>> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
>> could eventually add a pass optimizing FTZ operations?
>
> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics.
> FTZ operations were previously always “explicit” in the BRIG FE output, like you
> propose here; there were builtin calls injected for all inputs and the
> output of ‘ftz’-marked
> float HSAIL instructions. This is still provided as a fallback for
> targets which do not
> support a CPU mode flag.

I see.  But how does making them implicit fix cases in the conformance
testsuite?  That is, isn't the error in the runtime implementation of
__hsail_ftz_*?  I'd have used a "simple"

  if (fpclassify (x) == FP_SUBNORMAL)
    return copysign (0, x);

> The problem with a special FTZ ‘operation’ of some kind in the generic output is
> that the basic optimizations get confused by a new operation and we’d need to
> add knowledge of the ‘FTZ’ operation to a bunch of existing optimizer
> code, which
> seems unnecessary to support this case as the optimizations typically apply also
> for the ‘FTZ semantics’ when the FTZ/DAZ flag is on.

Apart from the exceptions you needed to guard ... do you have an example of
a transform that is confused by explicit FTZ and that would be valid if that FTZ
were implicit?  An explicit FTZ should be much safer.  I think the builtins
should also be CONST and not only PURE.

Richard.

> Thanks,
> Pekka
Joseph S. Myers Aug. 14, 2017, 12:30 p.m.
On Mon, 14 Aug 2017, Pekka Jääskeläinen wrote:

> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics.

Presumably this means that constant folding needs to know about those 
semantics, both for operations with a subnormal floating-point argument 
(whether or not the output is floating point, or floating point in the 
same format), and those with such a result?

Can assignments copy subnormals without converting them to zero?  Should 
comparisons flush input subnormals to zero before comparing?  Should 
conversions e.g. from float to double convert a float subnormal input to 
zero?

Patch hide | download patch | download mbox

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 251026)
+++ gcc/common.opt	(working copy)
@@ -2281,6 +2281,11 @@ 
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations handle floating-point operations as they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 251026)
+++ gcc/doc/invoke.texi	(working copy)
@@ -9458,6 +9458,17 @@ 
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental. With this flag on GCC treats
+floating-point operations (except abs, class, copysign and neg) as
+they must flush subnormal input operands and results to zero
+(FTZ). The FTZ rules are derived from HSA Programmers Reference Manual
+for the base profile. This alters optimizations that would break the
+rules, for example X * 1 -> X simplification. The option assumes the
+target supports FTZ in hardware and has it enabled - either by default
+or set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	(revision 251026)
+++ gcc/fold-const-call.c	(working copy)
@@ -697,7 +697,7 @@ 
 	      && do_mpfr_arg1 (result, mpfr_y1, arg, format));
 
     CASE_CFN_FLOOR:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_floor (result, format, arg);
 	  return true;
@@ -705,7 +705,7 @@ 
       return false;
 
     CASE_CFN_CEIL:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
 	{
 	  real_ceil (result, format, arg);
 	  return true;
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 251026)
+++ gcc/match.pd	(working copy)
@@ -143,6 +143,7 @@ 
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -151,6 +152,7 @@ 
 (simplify
  (mult @0 real_minus_onep)
   (if (!HONOR_SNANS (type)
+       && !flag_ftz_math
        && (!HONOR_SIGNED_ZEROS (type)
            || !COMPLEX_FLOAT_TYPE_P (type)))
    (negate @0)))
@@ -332,13 +334,13 @@ 
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 251026)
+++ gcc/simplify-rtx.c	(working copy)
@@ -2565,8 +2565,10 @@ 
 	return op1;
 
       /* In IEEE floating point, x*1 is not equivalent to x for
-	 signalling NaNs.  */
+	 signalling NaNs.
+	 For -fftz-math, x*1 is not equivalent to x for subnormals. */
       if (!HONOR_SNANS (mode)
+	  && (FLOAT_MODE_P (mode) && !flag_ftz_math)
 	  && trueop1 == CONST1_RTX (mode))
 	return op0;