
[PATCHv2] Optimise the fpclassify builtin to perform integer operations when possible

Message ID VI1PR0801MB2031AD7545A8DAA5F6B9450DFFC10@VI1PR0801MB2031.eurprd08.prod.outlook.com
State New

Commit Message

Tamar Christina Sept. 30, 2016, 1:22 p.m. UTC
Hi All,

This is v2 of the patch, which adds an optimized route to the fpclassify builtin
for floating-point numbers whose format is similar to IEEE 754.

I have addressed most comments from everyone except for two things:

1) Providing a back-end hook to override the functionality. While certainly
   possible, the current fpclassify doesn't provide this either. So I'd like to
   treat it as an enhancement rather than an issue.

2) Doing it in a lowering phase. If the general consensus is that this is the
   path the patch must take then I'd be happy to reconsider. However, at this
   point the patch does not seem to produce worse code than what was there before.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 is
   about the same, +/- 1 instruction, but the code
   is much better.
2. Using integer operations in the optimized path.

At a high level, the optimized path uses integer operations
to perform the following:

  if (exponent bits aren't all set or unset)
     return Normal;
  else if (no bits are set on the number after masking out
	   the sign bit)
     return Zero;
  else if (exponent has no bits set)
     return Subnormal;
  else if (mantissa has no bits set)
     return Infinite;
  else
     return NaN;

In case the optimization can't be applied, the old
implementation is used as a fallback.
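
To make the fast path concrete, here is a minimal standalone C sketch
of the same logic for IEEE doubles (a hypothetical helper for
illustration only; the patch itself builds the equivalent trees in
fold_builtin_fpclassify):

#include <stdint.h>
#include <string.h>
#include <math.h>   /* FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO.  */

/* Classify an IEEE-754 double using integer operations only.
   Layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.  */
static int
fpclassify_double (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof (bits));        /* Reinterpret; no FP ops.  */

  uint64_t exp = (bits >> 52) & 0x7FF;      /* Exponent field.  */
  uint64_t mantissa = bits & ((1ULL << 52) - 1);

  /* (exp + 1) & 0x7FE is non-zero exactly when the exponent is
     neither all 0s nor all 1s, i.e. the common (Normal) case.  */
  if ((exp + 1) & 0x7FE)
    return FP_NORMAL;
  if ((bits & ~(1ULL << 63)) == 0)          /* Mask out the sign bit.  */
    return FP_ZERO;
  if (exp == 0)
    return FP_SUBNORMAL;
  return mantissa == 0 ? FP_INFINITE : FP_NAN;
}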

A limitation of this new approach is that the exponent
of the floating-point type has to fit in 31 bits, and the type
has to have an IEEE-like format and values for NaN and INF
(i.e. for NaN and INF all bits of the exponent must be set).

To determine this IEEE likeness, a new boolean was added to real_format.

As an example, AArch64 now generates the following for classification of doubles:

f:
	fmov	x1, d0
	mov	w0, 7
	sbfx	x2, x1, 52, 11
	add	w3, w2, 1
	tst	w3, 0x07FE
	bne	.L1
	mov	w0, 13
	tst	x1, 0x7fffffffffffffff
	beq	.L1
	mov	w0, 11
	tbz	x2, 0, .L1
	tst	x1, 0xfffffffffffff
	mov	w0, 3
	mov	w1, 5
	csel	w0, w0, w1, ne

.L1:
	ret

No new tests are added, as existing tests already cover the functionality.
glibc benchmarks were run against the builtin and show a 42.5%
performance gain on AArch64.

Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 also shows no regressions and modest gains (3%).

Ok for trunk?

Thanks,
Tamar

gcc/
2016-08-25  Tamar Christina  <tamar.christina@arm.com>
	    Wilco Dijkstra  <wilco.dijkstra@arm.com>

	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version. 
	* gcc/real.h (real_format): Added is_ieee_compatible field.
	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
	(mips_single_format): Likewise.
	(motorola_single_format): Likewise.
	(spu_single_format): Likewise.
	(ieee_double_format): Likewise.
	(mips_double_format): Likewise.
	(motorola_double_format): Likewise.
	(ieee_extended_motorola_format): Likewise.
	(ieee_extended_intel_96_format): Likewise.
	(ieee_extended_intel_128_format): Likewise.
	(ieee_extended_intel_96_round_53_format): Likewise.
	(ibm_extended_format): Likewise.
	(mips_extended_format): Likewise.
	(ieee_quad_format): Likewise.
	(mips_quad_format): Likewise.
	(vax_f_format): Likewise.
	(vax_d_format): Likewise.
	(vax_g_format): Likewise.
	(decimal_single_format): Likewise.
	(decimal_double_format): Likewise.
	(decimal_quad_format): Likewise.
	(ieee_half_format): Likewise.
	(arm_half_format): Likewise.
	(real_internal_format): Likewise.

gcc/testsuite/
2016-09-27  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.

Comments

Tamar Christina Oct. 17, 2016, 9:05 a.m. UTC | #1
Ping
Jeff Law Oct. 20, 2016, 10:59 p.m. UTC | #2
On 09/30/2016 07:22 AM, Tamar Christina wrote:
> Hi All,
>
> This is v2 of the patch which adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in format.
>
> I have addressed most comments from everyone except for two things:
>
> 1) Providing a back-end hook to override the functionality. While certainly
>    possible the current fpclassify doesn't provide this either. So I'd like to
>    treat it as an enhancement rather than an issue.
I think the concern here is PPC, particularly the newer ones which have 
significant hardware support for this kind of characterization.

Based on the discussions though, I suspect we're going to need something 
nontrivial due to the way the API for __builtin_fpclassify works.  In 
the end I can easily see some target way to override the default code 
synthesis.

I think these issues should be left for the PPC folks to propose a 
solution when they're ready to exploit their new hardware.  I don't 
think this should block the patch.



>
> 2) Doing it in a lowering phase. If the general consensus is that this is the
>    path the patch must take then I'd be happy to reconsider. However at this
>    this patch does not seem to produce worse code than what there was before.
I think that was a desire from Richi.   I'm a bit torn here.

The code looks more like lowering rather than folding.  But it's also 
generating non-gimple trees and relies on gimple_fold_builtin to 
re-gimplify the result AFAICT.

Richi -- thoughts?

--

I think it's nontrivial to judge worse vs better since it's really a 
function of the target's micro-architecture and the context in which 
fpclassify is called -- particularly where the input value lives and 
whether or not it's used in other ways nearby.

In the case where the input value is in memory or not used in floating 
point arithmetic nearby, your change should be a clear win (with the 
exception of the latest ppc hardware perhaps).

If the input value is not in memory and used nearby in FP ops, then it 
gets a lot trickier.  We run the risk of making the object addressable 
which means it won't be an SSA_NAME and thus not exposed to the high 
level optimizers.

Richi has indicated that in gimple an object need not be addressable 
just because we access random pieces of it, including the ability to 
avoid marking something as addressable even though we have MEM (&decl) 
style expressions.  I'm not sure how all that works, but trust Richi 
implicitly.  Additionally you're using VIEW_CONVERT_EXPR now rather than 
ADDR_EXPR, so that may mitigate things as well.

Finally there's the issue of having to transfer the object between the 
FP and GP register files which can be highly expensive on some 
architectures.  Short of looking at the defs & immediate uses of the 
input argument and trying to guess at the cost of moving the object 
between the register files I don't see a good way to tackle this issue.

I'm inclined to not object on the performance questions.  But I would 
like to hear from Richi WRT lowering vs folding and whether or not he 
believes this belongs elsewhere.  If he does, I'd be inclined to suggest 
earlier rather than later since we do want to expose the generated code 
to the gimple optimizers.

Additional implementation comments follow inline.


>
> gcc/
> 2016-08-25  Tamar Christina  <tamar.christina@arm.com>
> 	    Wilco Dijkstra  <wilco.dijkstra@arm.com>
>
> 	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
> 	* gcc/real.h (real_format): Added is_ieee_compatible field.
> 	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
> 	(mips_single_format): Likewise.
> 	(motorola_single_format): Likewise.
> 	(spu_single_format): Likewise.
> 	(ieee_double_format): Likewise.
> 	(mips_double_format): Likewise.
> 	(motorola_double_format): Likewise.
> 	(ieee_extended_motorola_format): Likewise.
> 	(ieee_extended_intel_128_format): Likewise.
> 	(ieee_extended_intel_96_round_53_format): Likewise.
> 	(ibm_extended_format): Likewise.
> 	(mips_extended_format): Likewise.
> 	(ieee_quad_format): Likewise.
> 	(mips_quad_format): Likewise.
> 	(vax_f_format): Likewise.
> 	(vax_d_format): Likewise.
> 	(vax_g_format): Likewise.
> 	(decimal_single_format): Likewise.
> 	(decimal_quad_format): Likewise.
> 	(iee_half_format): Likewise.
> 	(mips_single_format): Likewise.
> 	(arm_half_format): Likewise.
> 	(real_internal_format): Likewise.
>
>
> gcc/testsuite/
> 2016-09-27  Tamar Christina  <tamar.christina@arm.com>
>
> 	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
>
>
> gcc-v2-fpclassify.patch
>
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 9a19a75cc8ed6edb5f543cd7bd26bcc0693e6ebb..1b4878c5ba098dcc0a4a506dbc7959d150cc9028 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -7943,10 +7943,8 @@ static tree
>  fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
>  {
>    tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
> -    arg, type, res, tmp;
> +    arg, type, res;
>    machine_mode mode;
> -  REAL_VALUE_TYPE r;
> -  char buf[128];
>
>    /* Verify the required arguments in the original call.  */
>    if (nargs != 6
> @@ -7966,14 +7964,164 @@ fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
>    arg = args[5];
>    type = TREE_TYPE (arg);
>    mode = TYPE_MODE (type);
> -  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
> +  const real_format *format = REAL_MODE_FORMAT (mode);
> +  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
> +
> +  /*
> +  For IEEE 754 types:
> +
> +  fpclassify (x) ->
> +       !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
> +	 ? (x & sign_mask == 0 ? FP_ZERO :
> +	   (exp & exp_mask == exp_mask
> +	      ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
> +	      FP_SUBNORMAL)):
> +       FP_NORMAL.
> +
> +  Otherwise
> +
> +  fpclassify (x) ->
> +       isnan (x) ? FP_NAN :
> +	(fabs (x) == Inf ? FP_INFINITE :
> +	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
> +	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
> +  */
> +
> +  /* Check if the number that is being classified is close enough to IEEE 754
> +     format to be able to go in the early exit code.  */
> +  if (format->is_binary_ieee_compatible
> +      && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
> +      && !(UNITS_PER_WORD == 4 && type_width == 128)
Why the UNITS_PER_WORD && type width tests against constants?

> +
> +      /* Extract exp bits from the float, where we expect the exponent to be.
> +	 We create a new type because BIT_FIELD_REF does not allow you to
> +	 extract less bits than the precision of the storage variable.  */
> +      exp_bitfield
> +        = fold_build3_loc (loc, BIT_FIELD_REF,
> +			   build_nonstandard_integer_type (exp_bits, 0),
> +			   int_arg,
> +			   build_int_cst (int_type, exp_bits),
> +			   build_int_cst (int_type, format->p - 1));
> +
> +      /* Re-interpret the extracted exponent bits as a 32 bit int.
> +	 This allows us to continue doing operations as int_type.  */
> +      exp = fold_build1_loc (loc, NOP_EXPR, int_type, exp_bitfield);
> +
> +      /* Set up some often used constants.  */
> +      const_arg0 = build_int_cst (int_arg_type, 0);
> +      const_arg1 = build_int_cst (int_arg_type, 1);
> +      const_arg2 = build_int_cst (int_arg_type, 2);
> +      const0 = build_int_cst (int_type, 0);
> +      const1 = build_int_cst (int_type, 1);
Note we have integer_zero_node and integer_one_node.  You may be able to 
use them rather than generating new nodes.  If you need the node in a 
different type, you can use fold_convert_const to do that.



> +
> +      /* b^(p-1) - 1 or 1 << (p - 2)
> +	 This creates a mask to be used to check the mantissa value.  */
> +      significant_bit = build_int_cst (int_arg_type, format->p - 2);
> +      mantissa_mask
> +        = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
> +			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
> +					    const_arg2, significant_bit),
> +			   const_arg1);
Can you please double-check this?  In particular you seem to be 
generating 2 << (p - 2), which doesn't seem to match the comment.
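
(For reference, working the arithmetic through for IEEE double, where
format->p == 53:

  /* significant_bit = p - 2 = 51.
     2 << 51 == 1 << 52 == 2^(p-1), so
     mantissa_mask == (2 << 51) - 1 == 2^52 - 1,
     i.e. all 52 stored mantissa bits set -- the "b^(p-1) - 1" half
     of the comment.  It's the "1 << (p - 2)" half that doesn't match.  */

So the generated mask may well be intentional; either way the comment
needs straightening out.)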




>
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..84a73a6483780dac2347e72fa7d139545d2087eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> @@ -0,0 +1,22 @@
> +/* This file checks the code generation for the new __builtin_fpclassify.
> +   because checking the exact assembly isn't very useful, we'll just be checking
> +   for the presence of certain instructions and the omition of others. */
s/omition/omission/


ISTM you'd want to test this for float, double and long double.


jeff
Richard Biener Oct. 21, 2016, 8:05 a.m. UTC | #3
On Thu, 20 Oct 2016, Jeff Law wrote:

> On 09/30/2016 07:22 AM, Tamar Christina wrote:
> > Hi All,
> > 
> > This is v2 of the patch which adds an optimized route to the fpclassify
> > builtin
> > for floating point numbers which are similar to IEEE-754 in format.
> > 
> > I have addressed most comments from everyone except for two things:
> > 
> > 1) Providing a back-end hook to override the functionality. While certainly
> >    possible the current fpclassify doesn't provide this either. So I'd like
> > to
> >    treat it as an enhancement rather than an issue.
> I think the concern here is PPC, particularly the newer ones which have
> significant hardware support for these kind of characterizations.
> 
> Based on the discussions though, I suspect we're going to need something
> nontrivial due to the way the API for __builtin_fpclassify works.  In the end
> I can easily see some target way to override the default code synthesis.
> 
> I think these issues should be left for the PPC folks to propose a solution
> when they're ready to exploit their new hardware.  I don't think this should
> block the patch.
> 
> 
> 
> > 
> > 2) Doing it in a lowering phase. If the general consensus is that this is
> > the
> >    path the patch must take then I'd be happy to reconsider. However at this
> >    this patch does not seem to produce worse code than what there was
> > before.
> I think that was a desire from Richi.   I'm a bit torn here.
> 
> The code looks more like lowering rather than folding.  But it's also
> generating non-gimple trees and relies on gimple_fold_builtin to re-gimplify
> the result AFAICT.
>
> Richi -- thoughts?

I'm not entirely happy with the patch but also not with the current
state of handling of fpclassify.  I do see the need to lower(!)
fpclassify early because we want to optimize it both depending on
the return value usage and the input value.

The lowering we currently apply open-codes isnormal (we have a
builtin for this) and isfinite (likewise).  I'd prefer if we can
apply the lowering in the gimplifier and somehow avoid the
early decision on whether to use FP or integer code to perform
the operation.  Sth like

 fpclassify(x) -> isnan(x) ? FP_NAN : isnormal(x) ? FP_NORMAL
: !isfinite(x) ? FP_INFINITE : x == 0 ? FP_ZERO : FP_SUBNORMAL

(leaves the comparison against zero in explicit FP math).  We do
have later foldings that expand isnormal and isfinite and
isinf to use compares -- those are the ones that we might want to
change to integer reps.
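
A minimal C-level sketch of that shape (hypothetical, written in terms
of the existing builtins just to make the proposed lowering concrete):

#include <math.h>

/* fpclassify expressed via the existing classification builtins;
   only the zero test stays in explicit FP math.  */
static int
fpclassify_lowered (double x)
{
  return __builtin_isnan (x) ? FP_NAN
    : __builtin_isnormal (x) ? FP_NORMAL
    : !__builtin_isfinite (x) ? FP_INFINITE
    : x == 0 ? FP_ZERO
    : FP_SUBNORMAL;
}

Each sub-classification can then be folded to FP or integer code
independently, late, once the context is known.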

We are also missing optabs for most of the sub-classification
tasks which would make it possible to re-combine the whole
thing back to a single fpclassify asm op.

That said, the folding to integer ops obfuscates the real operation
and thus makes the job of a (not yet existing) pass optimizing
these kind of classifications via range analysis or the like hard.
Thus I'd rather apply those at or near to RTL expansion time.

Richard.
 
> --
> 
> I think its nontrivial to judge worse vs better since it's really a function
> of the target's micro-architecture and the context in which fpclassify is
> called -- particularly where the input value lives and whether or not its used
> in other ways nearby.
> 
> In the case where the input value is in memory or not used in floating point
> arithmetic nearby, your change should be a clear win (with the exception of
> the latest ppc hardware perhaps).
> 
> If the input value is not in memory and used nearby in FP ops, then it gets a
> lot trickier.  We run the risk of making the object addressable which means it
> won't be an SSA_NAME and thus not exposed to the high level optimizers.
> 
> Richi has indicated that in gimple an object need not be addressable just
> because we access random pieces of it, including the ability to avoid marking
> something as addressable even though we have MEM (&decl) style expressions.
> I'm not sure how all that works, but trust Richi implicitly.  Additionally
> you're using VIEW_CONVERT_EXPR now rather than ADDR_EXPR, so that may mitigate
> things as well.
> 
> Finally there's the issue of having to transfer the object between the FP and
> GP register files which can be highly expensive on some architectures.  Short
> of looking at the defs & immediate uses of the input argument and trying to
> guess at the cost of moving the object between the register files I don't see
> a good way to tackle this issue.
> 
> I'm inclined to not object on the performance questions.  But I would like to
> hear from Richi WRT lowering vs folding and whether or not he believes this
> belongs elsewhere.  If he does, I'd be inclined to suggest earlier rather than
> later since we do want to expose the generated code to the gimple optimizers.
> 
> Additional implementation comments follow inline.
> 
> 
> > 
> > gcc/
> > 2016-08-25  Tamar Christina  <tamar.christina@arm.com>
> > 	    Wilco Dijkstra  <wilco.dijkstra@arm.com>
> > 
> > 	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
> > 	* gcc/real.h (real_format): Added is_ieee_compatible field.
> > 	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
> > 	(mips_single_format): Likewise.
> > 	(motorola_single_format): Likewise.
> > 	(spu_single_format): Likewise.
> > 	(ieee_double_format): Likewise.
> > 	(mips_double_format): Likewise.
> > 	(motorola_double_format): Likewise.
> > 	(ieee_extended_motorola_format): Likewise.
> > 	(ieee_extended_intel_128_format): Likewise.
> > 	(ieee_extended_intel_96_round_53_format): Likewise.
> > 	(ibm_extended_format): Likewise.
> > 	(mips_extended_format): Likewise.
> > 	(ieee_quad_format): Likewise.
> > 	(mips_quad_format): Likewise.
> > 	(vax_f_format): Likewise.
> > 	(vax_d_format): Likewise.
> > 	(vax_g_format): Likewise.
> > 	(decimal_single_format): Likewise.
> > 	(decimal_quad_format): Likewise.
> > 	(iee_half_format): Likewise.
> > 	(mips_single_format): Likewise.
> > 	(arm_half_format): Likewise.
> > 	(real_internal_format): Likewise.
> > 
> > 
> > gcc/testsuite/
> > 2016-09-27  Tamar Christina  <tamar.christina@arm.com>
> > 
> > 	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
> > 
> > 
> > gcc-v2-fpclassify.patch
> > 
> > 
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index
> > 9a19a75cc8ed6edb5f543cd7bd26bcc0693e6ebb..1b4878c5ba098dcc0a4a506dbc7959d150cc9028
> > 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -7943,10 +7943,8 @@ static tree
> >  fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
> >  {
> >    tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
> > -    arg, type, res, tmp;
> > +    arg, type, res;
> >    machine_mode mode;
> > -  REAL_VALUE_TYPE r;
> > -  char buf[128];
> > 
> >    /* Verify the required arguments in the original call.  */
> >    if (nargs != 6
> > @@ -7966,14 +7964,164 @@ fold_builtin_fpclassify (location_t loc, tree
> > *args, int nargs)
> >    arg = args[5];
> >    type = TREE_TYPE (arg);
> >    mode = TYPE_MODE (type);
> > -  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
> > +  const real_format *format = REAL_MODE_FORMAT (mode);
> > +  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
> > +
> > +  /*
> > +  For IEEE 754 types:
> > +
> > +  fpclassify (x) ->
> > +       !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
> > +	 ? (x & sign_mask == 0 ? FP_ZERO :
> > +	   (exp & exp_mask == exp_mask
> > +	      ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
> > +	      FP_SUBNORMAL)):
> > +       FP_NORMAL.
> > +
> > +  Otherwise
> > +
> > +  fpclassify (x) ->
> > +       isnan (x) ? FP_NAN :
> > +	(fabs (x) == Inf ? FP_INFINITE :
> > +	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
> > +	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
> > +  */
> > +
> > +  /* Check if the number that is being classified is close enough to IEEE
> > 754
> > +     format to be able to go in the early exit code.  */
> > +  if (format->is_binary_ieee_compatible
> > +      && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
> > +      && !(UNITS_PER_WORD == 4 && type_width == 128)
> Why the UNITS_PER_WORD && type width tests against constants?
> 
> > +
> > +      /* Extract exp bits from the float, where we expect the exponent to
> > be.
> > +	 We create a new type because BIT_FIELD_REF does not allow you to
> > +	 extract less bits than the precision of the storage variable.  */
> > +      exp_bitfield
> > +        = fold_build3_loc (loc, BIT_FIELD_REF,
> > +			   build_nonstandard_integer_type (exp_bits, 0),
> > +			   int_arg,
> > +			   build_int_cst (int_type, exp_bits),
> > +			   build_int_cst (int_type, format->p - 1));
> > +
> > +      /* Re-interpret the extracted exponent bits as a 32 bit int.
> > +	 This allows us to continue doing operations as int_type.  */
> > +      exp = fold_build1_loc (loc, NOP_EXPR, int_type, exp_bitfield);
> > +
> > +      /* Set up some often used constants.  */
> > +      const_arg0 = build_int_cst (int_arg_type, 0);
> > +      const_arg1 = build_int_cst (int_arg_type, 1);
> > +      const_arg2 = build_int_cst (int_arg_type, 2);
> > +      const0 = build_int_cst (int_type, 0);
> > +      const1 = build_int_cst (int_type, 1);
> Note we have integer_zero_node and integer_one_node.  You may be able to use
> them rather than generating new nodes.  If you need the node in a different
> type, you can use fold_convert_const to do that.
> 
> 
> 
> > +
> > +      /* b^(p-1) - 1 or 1 << (p - 2)
> > +	 This creates a mask to be used to check the mantissa value.  */
> > +      significant_bit = build_int_cst (int_arg_type, format->p - 2);
> > +      mantissa_mask
> > +        = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
> > +			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
> > +					    const_arg2, significant_bit),
> > +			   const_arg1);
> Can you please double-check this?  In particular you seem to be generating 2
> << (p - 2), which doesn't seem to match the comment.
> 
> 
> 
> 
> > 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..84a73a6483780dac2347e72fa7d139545d2087eb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > @@ -0,0 +1,22 @@
> > +/* This file checks the code generation for the new __builtin_fpclassify.
> > +   because checking the exact assembly isn't very useful, we'll just be
> > checking
> > +   for the presence of certain instructions and the omition of others. */
> s/omition/omission/
> 
> 
> ISTM you'd want to test this for float, double and long double.
> 
> 
> jeff
> 
>
Tamar Christina Oct. 21, 2016, 4:38 p.m. UTC | #4
Hi Richard, Jeff,

Fair enough, I understand the reservations both of you have.

I'll spend some time experimenting with what kind of code I'd
get out of it from lowering early and come up with an updated
patch.

Thanks!
Tamar

> -----Original Message-----
> From: Richard Biener [mailto:rguenther@suse.de]
> Sent: 21 October 2016 09:05
> To: Jeff Law
> Cc: Tamar Christina; GCC Patches; nd; Richard Earnshaw; Wilco Dijkstra;
> jakub@redhat.com; Joseph Myers; Michael Meissner; Moritz Klammler;
> Andrew Pinski
> Subject: Re: [PATCHv2][GCC] Optimise the fpclassify builtin to perform
> integer operations when possible
> 
> On Thu, 20 Oct 2016, Jeff Law wrote:
> 
> > On 09/30/2016 07:22 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This is v2 of the patch which adds an optimized route to the
> > > fpclassify builtin for floating point numbers which are similar to
> > > IEEE-754 in format.
> > >
> > > I have addressed most comments from everyone except for two things:
> > >
> > > 1) Providing a back-end hook to override the functionality. While certainly
> > >    possible the current fpclassify doesn't provide this either. So
> > > I'd like to
> > >    treat it as an enhancement rather than an issue.
> > I think the concern here is PPC, particularly the newer ones which
> > have significant hardware support for these kind of characterizations.
> >
> > Based on the discussions though, I suspect we're going to need
> > something nontrivial due to the way the API for __builtin_fpclassify
> > works.  In the end I can easily see some target way to override the default
> code synthesis.
> >
> > I think these issues should be left for the PPC folks to propose a
> > solution when they're ready to exploit their new hardware.  I don't
> > think this should block the patch.
> >
> >
> >
> > >
> > > 2) Doing it in a lowering phase. If the general consensus is that
> > > this is the
> > >    path the patch must take then I'd be happy to reconsider. However at
> this
> > >    this patch does not seem to produce worse code than what there
> > > was before.
> > I think that was a desire from Richi.   I'm a bit torn here.
> >
> > The code looks more like lowering rather than folding.  But it's also
> > generating non-gimple trees and relies on gimple_fold_builtin to
> > re-gimplify the result AFAICT.
> >
> > Richi -- thoughts?
> 
> I'm not entirely happy with the patch but also not with the current state of
> handling of fpclassify.  I do see the need to lower(!) fpclassify early because
> we want to optimize it both depending on the return value usage and the
> input value.
> 
> The lowering we currently apply open-codes isnormal (we have a builtin for
> this) and isfinite (likewise).  I'd prefer if we can apply the lowering in the
> gimplifier and somehow avoid the early decision on whether to use FP or
> integer code to perform the operation.  Sth like
> 
>  fpclassify(x) -> isnan(x) ? FP_NAN : isnormal(x) ? FP_NORMAL
> : !isfinite(x) ? FP_INFINITE : x == 0 ? FP_ZERO : FP_SUBNORMAL
> 
> (leaves the comparison against zero in explicit FP math).  We do have later
> foldings that expand isnormal and isfinite and isinf to use compares -- those
> are the ones that we might want to change to integer reps.
> 
> We are also missing optabs for most of the sub-classification tasks which
> would make it possible to re-combine the whole thing back to a single
> fpclassify asm op.
> 
> That said, the folding to integer ops obfuscates the real operation and thus
> makes the job of a (not yet existing) pass optimizing these kind of
> classifications via range analysis or the like hard.
> Thus I'd rather apply those at or near to RTL expansion time.
> 
> Richard.
> 
> > --
> >
> > I think its nontrivial to judge worse vs better since it's really a
> > function of the target's micro-architecture and the context in which
> > fpclassify is called -- particularly where the input value lives and
> > whether or not its used in other ways nearby.
> >
> > In the case where the input value is in memory or not used in floating
> > point arithmetic nearby, your change should be a clear win (with the
> > exception of the latest ppc hardware perhaps).
> >
> > If the input value is not in memory and used nearby in FP ops, then it
> > gets a lot trickier.  We run the risk of making the object addressable
> > which means it won't be an SSA_NAME and thus not exposed to the high
> level optimizers.
> >
> > Richi has indicated that in gimple an object need not be addressable
> > just because we access random pieces of it, including the ability to
> > avoid marking something as addressable even though we have MEM
> (&decl) style expressions.
> > I'm not sure how all that works, but trust Richi implicitly.
> > Additionally you're using VIEW_CONVERT_EXPR now rather than
> ADDR_EXPR,
> > so that may mitigate things as well.
> >
> > Finally there's the issue of having to transfer the object between the
> > FP and GP register files which can be highly expensive on some
> > architectures.  Short of looking at the defs & immediate uses of the
> > input argument and trying to guess at the cost of moving the object
> > between the register files I don't see a good way to tackle this issue.
> >
> > I'm inclined to not object on the performance questions.  But I would
> > like to hear from Richi WRT lowering vs folding and whether or not he
> > believes this belongs elsewhere.  If he does, I'd be inclined to
> > suggest earlier rather than later since we do want to expose the generated
> code to the gimple optimizers.
> >
> > Additional implementation comments follow inline.
> >
> >
> > >
> > > gcc/
> > > 2016-08-25  Tamar Christina  <tamar.christina@arm.com>
> > > 	    Wilco Dijkstra  <wilco.dijkstra@arm.com>
> > >
> > > 	* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
> > > 	* gcc/real.h (real_format): Added is_ieee_compatible field.
> > > 	* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
> > > 	(mips_single_format): Likewise.
> > > 	(motorola_single_format): Likewise.
> > > 	(spu_single_format): Likewise.
> > > 	(ieee_double_format): Likewise.
> > > 	(mips_double_format): Likewise.
> > > 	(motorola_double_format): Likewise.
> > > 	(ieee_extended_motorola_format): Likewise.
> > > 	(ieee_extended_intel_128_format): Likewise.
> > > 	(ieee_extended_intel_96_round_53_format): Likewise.
> > > 	(ibm_extended_format): Likewise.
> > > 	(mips_extended_format): Likewise.
> > > 	(ieee_quad_format): Likewise.
> > > 	(mips_quad_format): Likewise.
> > > 	(vax_f_format): Likewise.
> > > 	(vax_d_format): Likewise.
> > > 	(vax_g_format): Likewise.
> > > 	(decimal_single_format): Likewise.
> > > 	(decimal_quad_format): Likewise.
> > > 	(iee_half_format): Likewise.
> > > 	(mips_single_format): Likewise.
> > > 	(arm_half_format): Likewise.
> > > 	(real_internal_format): Likewise.
> > >
> > >
> > > gcc/testsuite/
> > > 2016-09-27  Tamar Christina  <tamar.christina@arm.com>
> > >
> > > 	* gcc.target/aarch64/builtin-fpclassify.c: New codegen test.
> > >
> > >
> > > gcc-v2-fpclassify.patch
> > >
> > >
> > > diff --git a/gcc/builtins.c b/gcc/builtins.c index
> > >
> 9a19a75cc8ed6edb5f543cd7bd26bcc0693e6ebb..1b4878c5ba098dcc0a4a506d
> bc
> > > 7959d150cc9028
> > > 100644
> > > --- a/gcc/builtins.c
> > > +++ b/gcc/builtins.c
> > > @@ -7943,10 +7943,8 @@ static tree
> > >  fold_builtin_fpclassify (location_t loc, tree *args, int nargs)  {
> > >    tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
> > > -    arg, type, res, tmp;
> > > +    arg, type, res;
> > >    machine_mode mode;
> > > -  REAL_VALUE_TYPE r;
> > > -  char buf[128];
> > >
> > >    /* Verify the required arguments in the original call.  */
> > >    if (nargs != 6
> > > @@ -7966,14 +7964,164 @@ fold_builtin_fpclassify (location_t loc,
> > > tree *args, int nargs)
> > >    arg = args[5];
> > >    type = TREE_TYPE (arg);
> > >    mode = TYPE_MODE (type);
> > > -  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type,
> > > arg));
> > > +  const real_format *format = REAL_MODE_FORMAT (mode);  const
> > > + HOST_WIDE_INT type_width = TYPE_PRECISION (type);
> > > +
> > > +  /*
> > > +  For IEEE 754 types:
> > > +
> > > +  fpclassify (x) ->
> > > +       !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
> > > +	 ? (x & sign_mask == 0 ? FP_ZERO :
> > > +	   (exp & exp_mask == exp_mask
> > > +	      ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
> > > +	      FP_SUBNORMAL)):
> > > +       FP_NORMAL.
> > > +
> > > +  Otherwise
> > > +
> > > +  fpclassify (x) ->
> > > +       isnan (x) ? FP_NAN :
> > > +	(fabs (x) == Inf ? FP_INFINITE :
> > > +	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
> > > +	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
> > > +  */
> > > +
> > > +  /* Check if the number that is being classified is close enough
> > > + to IEEE
> > > 754
> > > +     format to be able to go in the early exit code.  */  if
> > > + (format->is_binary_ieee_compatible
> > > +      && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
> > > +      && !(UNITS_PER_WORD == 4 && type_width == 128)
> > Why the UNITS_PER_WORD && type width tests against constants?
> >
> > > +
> > > +      /* Extract exp bits from the float, where we expect the
> > > + exponent to
> > > be.
> > > +	 We create a new type because BIT_FIELD_REF does not allow you to
> > > +	 extract less bits than the precision of the storage variable.  */
> > > +      exp_bitfield
> > > +        = fold_build3_loc (loc, BIT_FIELD_REF,
> > > +			   build_nonstandard_integer_type (exp_bits, 0),
> > > +			   int_arg,
> > > +			   build_int_cst (int_type, exp_bits),
> > > +			   build_int_cst (int_type, format->p - 1));
> > > +
> > > +      /* Re-interpret the extracted exponent bits as a 32 bit int.
> > > +	 This allows us to continue doing operations as int_type.  */
> > > +      exp = fold_build1_loc (loc, NOP_EXPR, int_type,
> > > +exp_bitfield);
> > > +
> > > +      /* Set up some often used constants.  */
> > > +      const_arg0 = build_int_cst (int_arg_type, 0);
> > > +      const_arg1 = build_int_cst (int_arg_type, 1);
> > > +      const_arg2 = build_int_cst (int_arg_type, 2);
> > > +      const0 = build_int_cst (int_type, 0);
> > > +      const1 = build_int_cst (int_type, 1);
> > Note we have integer_zero_node and integer_one_node.  You may be
> able
> > to use them rather than generating new nodes.  If you need the node in
> > a different type, you can use fold_convert_const to do that.
> >
> >
> >
> > > +
> > > +      /* b^(p-1) - 1 or 1 << (p - 2)
> > > +	 This creates a mask to be used to check the mantissa value.  */
> > > +      significant_bit = build_int_cst (int_arg_type, format->p - 2);
> > > +      mantissa_mask
> > > +        = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
> > > +			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
> > > +					    const_arg2, significant_bit),
> > > +			   const_arg1);
> > Can you please double-check this?  In particular you seem to be
> > generating 2 << (p - 2), which doesn't seem to match the comment.
> >
> >
> >
> >
> > >
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > > b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > > new file mode 100644
> > > index
> > >
> 0000000000000000000000000000000000000000..84a73a6483780dac2347e72fa7
> > > d139545d2087eb
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
> > > @@ -0,0 +1,22 @@
> > > +/* This file checks the code generation for the new __builtin_fpclassify.
> > > +   because checking the exact assembly isn't very useful, we'll
> > > +just be
> > > checking
> > > +   for the presence of certain instructions and the omition of
> > > + others. */
> > s/omition/omission/
> >
> >
> > ISTM you'd want to test this for float, double and long double.
> >
> >
> > jeff
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)

Patch

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a19a75cc8ed6edb5f543cd7bd26bcc0693e6ebb..1b4878c5ba098dcc0a4a506dbc7959d150cc9028 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7943,10 +7943,8 @@  static tree
 fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
 {
   tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
-    arg, type, res, tmp;
+    arg, type, res;
   machine_mode mode;
-  REAL_VALUE_TYPE r;
-  char buf[128];
 
   /* Verify the required arguments in the original call.  */
   if (nargs != 6
@@ -7966,14 +7964,164 @@  fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
   arg = args[5];
   type = TREE_TYPE (arg);
   mode = TYPE_MODE (type);
-  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+  const real_format *format = REAL_MODE_FORMAT (mode);
+  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
+
+  /*
+  For IEEE 754 types:
+
+  fpclassify (x) ->
+       !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
+	 ? (x & sign_mask == 0 ? FP_ZERO :
+	   (exp & exp_mask == exp_mask
+	      ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
+	      FP_SUBNORMAL)):
+       FP_NORMAL.
+
+  Otherwise
+
+  fpclassify (x) ->
+       isnan (x) ? FP_NAN :
+	(fabs (x) == Inf ? FP_INFINITE :
+	   (fabs (x) >= DBL_MIN ? FP_NORMAL :
+	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+  */
+
+  /* Check if the number that is being classified is close enough to IEEE 754
+     format to be able to go in the early exit code.  */
+  if (format->is_binary_ieee_compatible
+      && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
+      && !(UNITS_PER_WORD == 4 && type_width == 128)
+      && targetm.scalar_mode_supported_p (mode))
+    {
+      gcc_assert (format->b == 2);
+
+      const tree int_type = integer_type_node;
+      const int exp_bits  = (GET_MODE_SIZE (mode) * BITS_PER_UNIT) - format->p;
+      const int exp_mask  = (1 << exp_bits) - 1;
+
+      tree exp, exp_check, specials, special_test;
+      tree exp_bitfield, sign_bit, significant_bit;
+      tree const_arg0, const_arg1, const_arg2, const0, const1;
+      tree not_sign_mask, zero_check;
+      tree mantissa_mask, mantissa_any_set;
+      tree exp_lsb_set, mask_check;
+      tree int_arg_type, int_arg, conv_arg;
+
+      /* Re-interpret the float as an unsigned integer type
+	 with equal precision.  */
+      int_arg_type = build_nonstandard_integer_type (type_width, 0);
+      conv_arg = fold_build1_loc (loc, VIEW_CONVERT_EXPR, int_arg_type, arg);
+      int_arg = builtin_save_expr (conv_arg);
+
+      /* Extract exp bits from the float, where we expect the exponent to be.
+	 We create a new type because BIT_FIELD_REF does not allow you to
+	 extract less bits than the precision of the storage variable.  */
+      exp_bitfield
+        = fold_build3_loc (loc, BIT_FIELD_REF,
+			   build_nonstandard_integer_type (exp_bits, 0),
+			   int_arg,
+			   build_int_cst (int_type, exp_bits),
+			   build_int_cst (int_type, format->p - 1));
+
+      /* Re-interpret the extracted exponent bits as a 32 bit int.
+	 This allows us to continue doing operations as int_type.  */
+      exp = fold_build1_loc (loc, NOP_EXPR, int_type, exp_bitfield);
+
+      /* Set up some often used constants.  */
+      const_arg0 = build_int_cst (int_arg_type, 0);
+      const_arg1 = build_int_cst (int_arg_type, 1);
+      const_arg2 = build_int_cst (int_arg_type, 2);
+      const0 = build_int_cst (int_type, 0);
+      const1 = build_int_cst (int_type, 1);
+
+      /* 1) First check for 0 by first masking out sign bit.
+	 2) Then check for NaNs using a bit mask by checking first if the
+	    exponent has all bits set, if it does it can be either NaN or INF.
+	 3) Anything else are subnormal numbers.  */
+
+      /* ~(1 << location_sign_bit).
+	 This creates a mask that can be used to mask out the sign bit.  */
+      sign_bit = build_int_cst (int_arg_type, format->signbit_rw);
+      not_sign_mask
+        = fold_build1_loc (loc, BIT_NOT_EXPR, int_arg_type,
+			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+					    const_arg1, sign_bit));
+
+      /* num & not_sign_mask == 0.
+	 This checks to see if the number is zero.  */
+      zero_check
+        = fold_build2_loc (loc, EQ_EXPR, int_type, const_arg0,
+			   fold_build2_loc (loc, BIT_AND_EXPR, int_arg_type,
+					    int_arg, not_sign_mask));
+
+      /* b^(p-1) - 1 or 1 << (p - 2)
+	 This creates a mask to be used to check the mantissa value.  */
+      significant_bit = build_int_cst (int_arg_type, format->p - 2);
+      mantissa_mask
+        = fold_build2_loc (loc, MINUS_EXPR, int_arg_type,
+			   fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+					    const_arg2, significant_bit),
+			   const_arg1);
+
+      /* num & mantissa_mask != 0.  */
+      mantissa_any_set
+        = fold_build2_loc (loc, NE_EXPR, int_type, const_arg0,
+			   fold_build2_loc (loc, BIT_AND_EXPR, int_arg_type,
+					    mantissa_mask, int_arg));
+
+      /* (exp & 1) != 0.
+	 This check can be used to check if the exp is all 0 or all 1.
+	 At the point it is used the exp is either all 1 or 0, so checking
+	 one bit is enough to disambiguate between the two.  */
+      exp_lsb_set
+        = fold_build2_loc (loc, NE_EXPR, int_type, const0,
+			   fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+					    exp, const1));
+
+      /* Combine the values together.  */
+      special_test
+        = fold_build3_loc (loc, COND_EXPR, int_type, exp_lsb_set,
+			   fold_build3_loc (loc, COND_EXPR, int_type,
+					    mantissa_any_set,
+					    HONOR_NANS (mode)
+					      ? fp_nan : fp_normal,
+					    HONOR_INFINITIES (mode)
+					      ? fp_infinite : fp_normal),
+			   fp_subnormal);
+      specials
+        = fold_build3_loc (loc, COND_EXPR, int_type, zero_check, fp_zero,
+			   special_test);
+
+      /* Top level compare of the most general case,
+	 try to see if it's a normal real.  */
+
+      /* exp_mask & ~1.  */
+      mask_check
+        = fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+			   build_int_cst (int_type, exp_mask),
+			   fold_build1_loc (loc, BIT_NOT_EXPR, int_type,
+					    const1));
+      /* (exp + 1) & mask_check.
+	 Check to see if exp is not all 0 or all 1.  */
+      exp_check
+        = fold_build2_loc (loc, BIT_AND_EXPR, int_type,
+	                   fold_build2_loc (loc, PLUS_EXPR, int_type,
+			                    exp, const1),
+		           mask_check);
+
+      res = fold_build3_loc (loc, COND_EXPR, int_type,
+	                     fold_build2_loc (loc, NE_EXPR, int_type, const0,
+					      exp_check),
+			     fp_normal, specials);
 
-  /* fpclassify(x) ->
-       isnan(x) ? FP_NAN :
-         (fabs(x) == Inf ? FP_INFINITE :
-	   (fabs(x) >= DBL_MIN ? FP_NORMAL :
-	     (x == 0 ? FP_ZERO : FP_SUBNORMAL))).  */
+      return res;
+    }
 
+  REAL_VALUE_TYPE r;
+  tree tmp;
+  char buf[128];
+  arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
   tmp = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg,
 		     build_real (type, dconst0));
   res = fold_build3_loc (loc, COND_EXPR, integer_type_node,
diff --git a/gcc/real.h b/gcc/real.h
index 59af580e78f2637be84f71b98b45ec6611053222..6aed787b2cacfe67a2fb8996f82325870b356223 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -161,6 +161,19 @@  struct real_format
   bool has_signed_zero;
   bool qnan_msb_set;
   bool canonical_nan_lsbs_set;
+
+  /* This flag indicates whether the format is suitable for the optimized
+     code paths for the __builtin_fpclassify function and friends.  For
+     this, the format must be a base 2 representation with the sign bit as
+     the most-significant bit followed by (exp <= 32) exponent bits
+     followed by the mantissa bits.  It must be possible to interpret the
+     bits of the floating-point representation as an integer.  NaNs and
+     INFs (if available) must be represented by the same schema used by
+     IEEE 754.  (NaNs must be represented by an exponent with all bits 1,
+     any mantissa except all bits 0 and any sign bit.  +INF and -INF must be
+     represented by an exponent with all bits 1, a mantissa with all bits 0 and
+     a sign bit of 0 and 1 respectively.)  */
+  bool is_binary_ieee_compatible;
   const char *name;
 };
 
diff --git a/gcc/real.c b/gcc/real.c
index 66e88e2ad366f7848609d157074c80420d778bcf..fa51e4a504bfa24b2492340e5c1dfd5366ce8800 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -3052,6 +3052,7 @@  const struct real_format ieee_single_format =
     true,
     true,
     false,
+    true,
     "ieee_single"
   };
 
@@ -3075,6 +3076,7 @@  const struct real_format mips_single_format =
     true,
     false,
     true,
+    true,
     "mips_single"
   };
 
@@ -3098,6 +3100,7 @@  const struct real_format motorola_single_format =
     true,
     true,
     true,
+    true,
     "motorola_single"
   };
 
@@ -3132,6 +3135,7 @@  const struct real_format spu_single_format =
     true,
     false,
     false,
+    false,
     "spu_single"
   };
 
@@ -3343,6 +3347,7 @@  const struct real_format ieee_double_format =
     true,
     true,
     false,
+    true,
     "ieee_double"
   };
 
@@ -3366,6 +3371,7 @@  const struct real_format mips_double_format =
     true,
     false,
     true,
+    true,
     "mips_double"
   };
 
@@ -3389,6 +3395,7 @@  const struct real_format motorola_double_format =
     true,
     true,
     true,
+    true,
     "motorola_double"
   };
 
@@ -3735,6 +3742,7 @@  const struct real_format ieee_extended_motorola_format =
     true,
     true,
     true,
+    false,
     "ieee_extended_motorola"
   };
 
@@ -3758,6 +3766,7 @@  const struct real_format ieee_extended_intel_96_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_96"
   };
 
@@ -3781,6 +3790,7 @@  const struct real_format ieee_extended_intel_128_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_128"
   };
 
@@ -3806,6 +3816,7 @@  const struct real_format ieee_extended_intel_96_round_53_format =
     true,
     true,
     false,
+    false,
     "ieee_extended_intel_96_round_53"
   };
 
@@ -3896,6 +3907,7 @@  const struct real_format ibm_extended_format =
     true,
     true,
     false,
+    false,
     "ibm_extended"
   };
 
@@ -3919,6 +3931,7 @@  const struct real_format mips_extended_format =
     true,
     false,
     true,
+    false,
     "mips_extended"
   };
 
@@ -4184,6 +4197,7 @@  const struct real_format ieee_quad_format =
     true,
     true,
     false,
+    true,
     "ieee_quad"
   };
 
@@ -4207,6 +4221,7 @@  const struct real_format mips_quad_format =
     true,
     false,
     true,
+    true,
     "mips_quad"
   };
 
@@ -4509,6 +4524,7 @@  const struct real_format vax_f_format =
     false,
     false,
     false,
+    false,
     "vax_f"
   };
 
@@ -4532,6 +4548,7 @@  const struct real_format vax_d_format =
     false,
     false,
     false,
+    false,
     "vax_d"
   };
 
@@ -4555,6 +4572,7 @@  const struct real_format vax_g_format =
     false,
     false,
     false,
+    false,
     "vax_g"
   };
 
@@ -4633,6 +4651,7 @@  const struct real_format decimal_single_format =
     true,
     true,
     false,
+    false,
     "decimal_single"
   };
 
@@ -4657,6 +4676,7 @@  const struct real_format decimal_double_format =
     true,
     true,
     false,
+    false,
     "decimal_double"
   };
 
@@ -4681,6 +4701,7 @@  const struct real_format decimal_quad_format =
     true,
     true,
     false,
+    false,
     "decimal_quad"
   };
 
@@ -4820,6 +4841,7 @@  const struct real_format ieee_half_format =
     true,
     true,
     false,
+    true,
     "ieee_half"
   };
 
@@ -4846,6 +4868,7 @@  const struct real_format arm_half_format =
     true,
     false,
     false,
+    false,
     "arm_half"
   };
 
@@ -4893,6 +4916,7 @@  const struct real_format real_internal_format =
     true,
     true,
     false,
+    false,
     "real_internal"
   };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
new file mode 100644
index 0000000000000000000000000000000000000000..84a73a6483780dac2347e72fa7d139545d2087eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin-fpclassify.c
@@ -0,0 +1,22 @@ 
+/* This file checks the code generation for the new __builtin_fpclassify.
+   because checking the exact assembly isn't very useful, we'll just be checking
+   for the presence of certain instructions and the omition of others. */
+/* { dg-options "-O2" } */
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not "\[ \t\]?fabs\[ \t\]?" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]?fcmp\[ \t\]?" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]?fcmpe\[ \t\]?" } } */
+/* { dg-final { scan-assembler "\[ \t\]?sbfx\[ \t\]?" } } */
+
+#include <stdio.h>
+#include <math.h>
+
+/*
+ fp_nan = args[0];
+ fp_infinite = args[1];
+ fp_normal = args[2];
+ fp_subnormal = args[3];
+ fp_zero = args[4];
+*/
+
+int f(double x) { return __builtin_fpclassify(0, 1, 4, 3, 2, x); }