Message ID | CAJk11WCfMLenz=hume0aTTpkpJ6GM0bB3xsFL+dSvnA9zVPs+Q@mail.gmail.com |
---|---|
State | New |
On Thu, Aug 10, 2017 at 6:39 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi,
>
> The attached patch adds a new switch -fftz-math which makes certain
> optimizations assume that "flush to zero" behavior of denormal inputs
> and outputs is not an optimization hint, but required behavior for
> semantical correctness.
>
> The need for this was initiated by HSAIL (BRIG). With HSAIL, flush to
> zero handling is required, (not only "allowed") in case an HSAIL
> instruction is marked with the 'ftz' modifier (all HSA Base profile
> instructions are).

This suggests only outputs are flushed to zero? OTOH documentation
for X * 1 -> X suggests otherwise. This simplification also suggests to
make FTZ operations explicit instead of adding a flag? Thus the BRIG
FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
could eventually add a pass optimizing FTZ operations?

> The patch is not complete and likely misses many optimizations.
> However, it is a starting point that fixes a few cases brought out by
> the HSAIL conformance suite. We plan to extend this as new cases come up.
>
> OK for trunk?

Hmm. I don't like testing flag_ftz_math too much here. Are input
denormals really required to be flushed to zero or is it enough to
flush outputs to zero? If the latter then this is more like the modes
not having denormals (and much nicer to optimization, only constant
folding being affected)?

Richard.

> BR,
> Pekka
Hi Richard,

The base idea of the patch is to optimize for the (common) situation
where FTZ/DAZ is controlled by a CPU-wide flag and we then need to only
avoid compile-time optimizations that assume semantics where denorm
handling is on to support the ‘forced FTZ/DAZ semantics’.

> This suggests only outputs are flushed to zero? OTOH documentation
> for X * 1 -> X suggests otherwise. This simplification also suggests to
> make FTZ operations explicit instead of adding a flag? Thus the BRIG
> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
> could eventually add a pass optimizing FTZ operations?

Both the inputs and outputs must be flushed to zero in the HSAIL’s
‘ftz’ semantics. FTZ operations were previously always “explicit” in the
BRIG FE output, like you propose here; there were builtin calls injected
for all inputs and the output of ‘ftz’-marked float HSAIL instructions.
This is still provided as a fallback for targets which do not support a
CPU mode flag.

The problem with a special FTZ ‘operation’ of some kind in the generic
output is that the basic optimizations get confused by a new operation
and we’d need to add knowledge of the ‘FTZ’ operation to a bunch of
existing optimizer code, which seems unnecessary to support this case as
the optimizations typically apply also for the ‘FTZ semantics’ when the
FTZ/DAZ flag is on.

Thanks,
Pekka
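As a rough illustration of the "both inputs and outputs are flushed" semantics described above, the following C sketch models an 'ftz'-marked single-precision multiply in software. The names flush_f32 and mul_ftz_f32 are made up for this example and are not part of the patch or the BRIG runtime, and the sketch assumes the host itself is not running in FTZ/DAZ mode.

```c
#include <math.h>

/* Hypothetical helper: replace a subnormal by a zero of the same sign.
   Every other value (including NaN, infinity and zero) passes through.  */
static float
flush_f32 (float x)
{
  if (fpclassify (x) == FP_SUBNORMAL)
    return signbit (x) ? -0.0f : 0.0f;
  return x;
}

/* Hypothetical software model of an 'ftz'-marked multiply:
   flush both inputs, multiply, then flush the result.  */
static float
mul_ftz_f32 (float a, float b)
{
  return flush_f32 (flush_f32 (a) * flush_f32 (b));
}
```

Under this model, multiplying a subnormal such as 1e-40f by 1.0f yields zero rather than the original value, which is the reason the X * 1 -> X simplification cannot be applied when these semantics are required.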
On Mon, Aug 14, 2017 at 12:45 PM, Pekka Jääskeläinen <pekka@parmance.com> wrote:
> Hi Richard,
>
> The base idea of the patch is to optimize for the (common) situation
> where FTZ/DAZ is controlled by a CPU-wide flag and we then need to only
> avoid compile-time optimizations that assume semantics where denorm
> handling is on to support the ‘forced FTZ/DAZ semantics’.
>
>> This suggests only outputs are flushed to zero? OTOH documentation
>> for X * 1 -> X suggests otherwise. This simplification also suggests to
>> make FTZ operations explicit instead of adding a flag? Thus the BRIG
>> FE would emit FTZ (X) * 1 which we can optimize to FTZ (X), and we
>> could eventually add a pass optimizing FTZ operations?
>
> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics. FTZ operations were previously always “explicit” in the
> BRIG FE output, like you propose here; there were builtin calls injected
> for all inputs and the output of ‘ftz’-marked float HSAIL instructions.
> This is still provided as a fallback for targets which do not support a
> CPU mode flag.

I see. But how does making them implicit fix cases in the conformance
testsuite? That is, isn't the error in the runtime implementation of
__hsail_ftz_*? I'd have used a "simple"

  if (fpclassify (x) == FP_SUBNORMAL)
    return copysign (0, x);

> The problem with a special FTZ ‘operation’ of some kind in the generic
> output is that the basic optimizations get confused by a new operation
> and we’d need to add knowledge of the ‘FTZ’ operation to a bunch of
> existing optimizer code, which seems unnecessary to support this case as
> the optimizations typically apply also for the ‘FTZ semantics’ when the
> FTZ/DAZ flag is on.

Apart from the exceptions you needed to guard ... do you have an example
of a transform that is confused by explicit FTZ and that would be valid
if that FTZ were implicit? An explicit FTZ should be much safer.

I think the builtins should also be CONST and not only PURE.

Richard.

> Thanks,
> Pekka
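The explicit flush suggested above, together with the CONST remark, could look roughly like the following standalone C function. This is only a sketch: the name ftz_f64 is hypothetical (the thread refers to the runtime builtins only as __hsail_ftz_* and does not show their code), the const attribute is the GCC/Clang function attribute rather than the internal builtin flag, and signbit is used in place of copysign to choose the sign of the zero, which has the same effect.

```c
#include <math.h>

/* Hypothetical explicit flush-to-zero helper for double.  The result
   depends only on the argument value and no memory is read, so the
   function qualifies for the 'const' attribute (stricter than 'pure').  */
static double ftz_f64 (double x) __attribute__ ((const));

static double
ftz_f64 (double x)
{
  /* Subnormals become a zero carrying the sign of the input.  */
  if (fpclassify (x) == FP_SUBNORMAL)
    return signbit (x) ? -0.0 : 0.0;
  return x;
}
```

Marking such a helper const lets the compiler merge or remove repeated calls on the same value, which matters when a front end wraps every input and output of an instruction in a flush.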
On Mon, 14 Aug 2017, Pekka Jääskeläinen wrote:

> Both the inputs and outputs must be flushed to zero in the HSAIL’s
> ‘ftz’ semantics.

Presumably this means that constant folding needs to know about those
semantics, both for operations with a subnormal floating-point argument
(whether or not the output is floating point, or floating point in the
same format), and those with such a result?

Can assignments copy subnormals without converting them to zero? Should
comparisons flush input subnormals to zero before comparing? Should
conversions e.g. from float to double convert a float subnormal input to
zero?
Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 251026)
+++ gcc/common.opt (working copy)
@@ -2281,6 +2281,11 @@
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations handle floating-point operations as they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 251026)
+++ gcc/doc/invoke.texi (working copy)
@@ -9458,6 +9458,17 @@
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental. With this flag on GCC treats
+floating-point operations (except abs, class, copysign and neg) as
+they must flush subnormal input operands and results to zero
+(FTZ). The FTZ rules are derived from HSA Programmers Reference Manual
+for the base profile. This alters optimizations that would break the
+rules, for example X * 1 -> X simplification. The option assumes the
+target supports FTZ in hardware and has it enabled - either by default
+or set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c (revision 251026)
+++ gcc/fold-const-call.c (working copy)
@@ -697,7 +697,7 @@
               && do_mpfr_arg1 (result, mpfr_y1, arg, format));
 
     CASE_CFN_FLOOR:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
         {
           real_floor (result, format, arg);
           return true;
@@ -705,7 +705,7 @@
       return false;
 
     CASE_CFN_CEIL:
-      if (!REAL_VALUE_ISNAN (*arg) || !flag_errno_math)
+      if ((!REAL_VALUE_ISNAN (*arg) || !flag_errno_math) && !flag_ftz_math)
         {
           real_ceil (result, format, arg);
           return true;
Index: gcc/match.pd
===================================================================
--- gcc/match.pd (revision 251026)
+++ gcc/match.pd (working copy)
@@ -143,6 +143,7 @@
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -151,6 +152,7 @@
 (simplify
  (mult @0 real_minus_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
           || !COMPLEX_FLOAT_TYPE_P (type)))
   (negate @0)))
@@ -332,13 +334,13 @@
 /* In IEEE floating point, x/1 is not equivalent to x for snans. */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans. */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c (revision 251026)
+++ gcc/simplify-rtx.c (working copy)
@@ -2565,8 +2565,10 @@
         return op1;
 
       /* In IEEE floating point, x*1 is not equivalent to x for
-         signalling NaNs. */
+         signalling NaNs.
+         For -fftz-math, x*1 is not equivalent to x for subnormals. */
       if (!HONOR_SNANS (mode)
+          && (FLOAT_MODE_P (mode) && !flag_ftz_math)
           && trueop1 == CONST1_RTX (mode))
         return op0;