diff mbox series

[v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

Message ID 20240522011725.424952-1-pan2.li@intel.com
State New
Headers show
Series [v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD | expand

Commit Message

Li, Pan2 May 22, 2024, 1:17 a.m. UTC
From: Pan Li <pan2.li@intel.com>

This patch would like to support the __builtin_add_overflow branch form for
unsigned SAT_ADD.  For example as below:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Different to the branchless version,  we leverage the simplify to
convert the branch version of SAT_ADD into branchless if and only
if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
the ability to choose branch or branchless implementation of .SAT_ADD.
For example,  some target can take care of branches code more optimally.

When the target implement the IFN_SAT_ADD for unsigned and before this
patch:

uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [35.00%]
  else
    goto <bb 3>; [65.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = REALPART_EXPR <_6>;
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;    succ:       EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;    succ:       EXIT
}

The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.

gcc/ChangeLog:

	* match.pd: Add new simplify to convert branch SAT_ADD into
	branchless,  if and only if backend implement the IFN.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/match.pd | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Richard Biener May 22, 2024, 1:14 p.m. UTC | #1
On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;    succ:       4
> ;;                3
>
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
>   _1 = REALPART_EXPR <_6>;
> ;;    succ:       4
>
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;    succ:       EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;    succ:       EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvent.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs then I would suggest to split out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
>         * match.pd: Add new simplify to convert branch SAT_ADD into
>         branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cff67c84498..2dc77a46e67 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>
Li, Pan2 May 23, 2024, 4:16 a.m. UTC | #2
Thanks Richard for reviewing.

> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.

I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form.
Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling
and convert the branch form to the branchless form in v2.

> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvent.

Sure thing.

> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult.

Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt?
The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the
other early pass.

Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD
In phiopt (mostly like what we do in widen-mult).
Not sure if my understanding is correct or not, thanks again for help.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint8_t);

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, May 22, 2024 9:14 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;    succ:       4
> ;;                3
>
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
>   _1 = REALPART_EXPR <_6>;
> ;;    succ:       4
>
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;    succ:       EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;    succ:       EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvent.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs then I would suggest to split out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
>         * match.pd: Add new simplify to convert branch SAT_ADD into
>         branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cff67c84498..2dc77a46e67 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>
Li, Pan2 May 23, 2024, 11:08 a.m. UTC | #3
I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
And then we can do the matching on COND_EXPR in the underlying widen-mul pass.

Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => 
sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86

will go on to see if this works or not.

Part-A:
uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
{
  unsigned char _1;
  uint8_t _2;

  <bb 2> :
  _1 = x_3(D) + y_4(D);
  if (_1 >= x_3(D))
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :

  <bb 4> :
  # _2 = PHI <255(2), _1(3)>
  return _2;

}

Part-B:
uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
{
  unsigned char _1;
  _Bool phi_cond_6;

  <bb 2> :
  _1 = x_3(D) + y_4(D);
  phi_cond_6 = _1 >= x_3(D);
  _2 = phi_cond_6 ? _1 : 255;
  return _2;

}

-----Original Message-----
From: Li, Pan2 
Sent: Thursday, May 23, 2024 12:17 PM
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

Thanks Richard for reviewing.

> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.

I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form.
Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling
and convert the branch form to the branchless form in v2.

> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvent.

Sure thing.

> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult.

Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt?
The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the
other early pass.

Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD
In phiopt (mostly like what we do in widen-mult).
Not sure if my understanding is correct or not, thanks again for help.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint8_t);

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, May 22, 2024 9:14 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;    succ:       4
> ;;                3
>
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
>   _1 = REALPART_EXPR <_6>;
> ;;    succ:       4
>
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;    succ:       EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;    succ:       EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

I'm not convinced we should match this during early if-conversion, should we?
The middle-end doesn't really know .SAT_ADD but some handling of
.ADD_OVERFLOW is present.

But please add a comment before the new pattern, esp. since it's
non-obvious that this is an improvent.

I suspect you rely on this form being recognized as .SAT_ADD later but
what prevents us from breaking this?  Why not convert it to .SAT_ADD
immediately?  If this is because the ISEL pass (or the widen-mult pass)
cannot handle PHIs then I would suggest to split out enough parts of
tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.

> gcc/ChangeLog:
>
>         * match.pd: Add new simplify to convert branch SAT_ADD into
>         branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index cff67c84498..2dc77a46e67 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>
Richard Biener May 23, 2024, 12:14 p.m. UTC | #4
On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
>
> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86

Likely you have released _2, more comments below on your previous mail.

> will go on to see if this works or not.
>
> Part-A:
> uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   uint8_t _2;
>
>   <bb 2> :
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>
>   <bb 4> :
>   # _2 = PHI <255(2), _1(3)>
>   return _2;
>
> }
>
> Part-B:
> uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   _Bool phi_cond_6;
>
>   <bb 2> :
>   _1 = x_3(D) + y_4(D);
>   phi_cond_6 = _1 >= x_3(D);
>   _2 = phi_cond_6 ? _1 : 255;
>   return _2;
>
> }
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Thursday, May 23, 2024 12:17 PM
> To: Richard Biener <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
> Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
>
> Thanks Richard for reviewing.
>
> > I'm not convinced we should match this during early if-conversion, should we?
> > The middle-end doesn't really know .SAT_ADD but some handling of
> > .ADD_OVERFLOW is present.
>
> I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form.
> Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling
> and convert the branch form to the branchless form in v2.
>
> > But please add a comment before the new pattern, esp. since it's
> > non-obvious that this is an improvent.
>
> Sure thing.
>
> > I suspect you rely on this form being recognized as .SAT_ADD later but
> > what prevents us from breaking this?  Why not convert it to .SAT_ADD
> > immediately?  If this is because the ISEL pass (or the widen-mult pass)
> > cannot handle PHIs then I would suggest to split out enough parts of
> > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.
>
> Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult.
>
> Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt?
> The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the
> other early pass.
>
> Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD
> In phiopt (mostly like what we do in widen-mult).
> Not sure if my understanding is correct or not, thanks again for help.

The trick for widen-mult (or ISEL) would be to try to match the PHI
nodes in a similar way as to
gimple_simplify_phiopt calls op.resimplify.  The difficulty resides in
that the (match ...) generated
code gets the entry to the stmt root.  Either we'd teach genmatch to
recognize a PHI def
as a COND or we make (match ..) (additionally?) generate entry points
taking a gimple_match_op
so the toplevel COND trick works.  Note it's already a bit awkward
because we build a GENERIC
form of the condition and that's now invalid in the IL for a GIMPLE
COND_EXPR but still present
because of that phiopt trick.  There isn't a SSA def for the condition
in the IL (it's only part
of a GIMPLE_COND and that one doesn't define "CC flags").

That means possibly special-casing (match (..) (cond (cmp ...) ..)) in
genmatch to handle
PHI defs might be the easiest "trick" here.

Not sure what you did for the IL you quoted above.

Richard.

> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
>
> SAT_ADD_U_1(uint8_t);
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, May 22, 2024 9:14 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
> Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
>
> On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote:
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to support the __builtin_add_overflow branch form for
> > unsigned SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   uint64_t ret;
> >   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> > }
> >
> > Different to the branchless version,  we leverage the simplify to
> > convert the branch version of SAT_ADD into branchless if and only
> > if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> > the ability to choose branch or branchless implementation of .SAT_ADD.
> > For example,  some target can take care of branches code more optimally.
> >
> > When the target implement the IFN_SAT_ADD for unsigned and before this
> > patch:
> >
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   long unsigned int _2;
> >   uint64_t _3;
> >   __complex__ long unsigned int _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
> >   _2 = IMAGPART_EXPR <_6>;
> >   if (_2 != 0)
> >     goto <bb 4>; [35.00%]
> >   else
> >     goto <bb 3>; [65.00%]
> > ;;    succ:       4
> > ;;                3
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> >   _1 = REALPART_EXPR <_6>;
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       3
> > ;;                2
> >   # _3 = PHI <_1(3), 18446744073709551615(2)>
> >   return _3;
> > ;;    succ:       EXIT
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _12;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
> >   return _12;
> > ;;    succ:       EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
>
> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.
>
> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvent.
>
> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.
>
> > gcc/ChangeLog:
> >
> >         * match.pd: Add new simplify to convert branch SAT_ADD into
> >         branchless,  if and only if backend implement the IFN.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/match.pd | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index cff67c84498..2dc77a46e67 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +(simplify
> > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> > +  integer_minus_onep (realpart @2))
> > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> > +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
> > +
> > +#endif
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
> >
Jeff Law May 23, 2024, 2:58 p.m. UTC | #5
On 5/23/24 6:14 AM, Richard Biener wrote:
> On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
>>
>> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
>> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
>>
>> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
>> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86
> 
> Likely you have released _2, more comments below on your previous mail.
You can be sure by calling debug_tree () on the SSA_NAME node in 
question.  If it reports "in-free-list", then that's definitive that the 
SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is 
still in the IL, then that's very bad.

jeff
Li, Pan2 May 24, 2024, 6:37 a.m. UTC | #6
Thanks Jeff and Richard for suggestion and reviewing.

Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
the widen-mul is able to do the recog to .SAT_ADD.

For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched,
as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition.

sat_add.c: In function ‘sat_add_u_3_uint8_t’:
sat_add.c:69:1: error: missing definition
   69 | SAT_ADD_U_3(uint8_t);
      | ^~~~~~~~~~~
for SSA_NAME: _6 in statement:
# VUSE <.MEM_14(D)>
return _6;
during GIMPLE pass: phiopt
dump file: sat_add.c.046t.phiopt1
sat_add.c:69:1: internal compiler error: verify_ssa failed
0x1db41ba verify_ssa(bool, bool
/home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
0x18e3075 execute_function_todo
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
0x18e1c52 do_per_function
        /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
0x18e3222 execute_todo

I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it.
Thus, there will be orphan nodes somewhere and we need something like rollback to recover the
gimple up to a point. I tried sorts of release_xx or likewise but seems not working.

So is there any suggest to take care of such gimple rollback or another solution for this? Below are
The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.

Pan

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 918cf50b589..7982b65bac4 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
     }
 }

+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/* Try to match the phi expr to the gimple cond. Return true if we can
+   perform the convert or return false.  There will be some restrictions
+   or such kind of conversion, aka:
+
+   1. Only selected pattern will try this convert.
+   2. The generated gassign matched the selected IFN pattern.
+   3. The backend has implement the standard name.
+
+   From:
+     <bb 2> :
+     _1 = x_3(D) + y_4(D);
+     if (_1 >= x_3(D))
+       goto <bb 3>; [INV]
+     else
+       goto <bb 4>; [INV]
+
+     <bb 3> :
+
+     <bb 4> :
+     # _2 = PHI <255(2), _1(3)>
+
+   To:
+     <bb 2> :
+     _1 = x_3(D) + y_4(D);
+     phi_cond_6 = _1 >= x_3(D);
+     _2 = phi_cond_6 ? _1 : 255; */
+
+static bool
+match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)
+{
+  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
+
+  if (!cond)
+    return false;
+
+  enum tree_code code = gimple_cond_code (cond);
+  tree phi_result = gimple_phi_result (phi);        
+  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
+  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
+                         gimple_cond_rhs (cond));
+  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);
+
+  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
+  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
+
+  tree ops[2];
+  tree lhs = gimple_assign_lhs (stmt_val);
+  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
+    && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
+                                      OPTIMIZE_FOR_BOTH));
+
+  if (matched_p)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+      gimple_stmt_iterator psi = gsi_for_stmt (phi);
+
+      gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
+      gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
+      remove_phi_node (&psi, false);
+    }
+  else
+    {
+      // Clean up the stmt created, but non of blow works well.
+      // gsi = gsi_for_stmt (stmt_val); 
+      // gsi_remove (&gsi, true);
+      // release_defs (stmt_val);
+      // ggc_free (stmt_val);
+
+      // gsi = gsi_for_stmt (stmt_cond);
+      // gsi_remove (&gsi, true);
+      // release_defs (stmt_cond);
+      // ggc_free (stmt_cond);
+
+      // release_defs (stmt_cond);
+      // release_defs (stmt_val);
+      release_ssa_name (cond_tree);
+    }
+
+  return matched_p;
+}
+
/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
    Return NULL if nothing can be simplified or the resulting simplified value
    with parts pushed if EARLY_P was true. Also rejects non allowed tree code
@@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
      So, given the condition COND, and the two PHI arguments, match and simplify
      can happen on (COND) ? arg0 : arg1. */

+  if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1))
+    return true;
+
   stmt = last_nondebug_stmt (cond_bb);

   /* We need to know which is the true edge and which is the false


-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Thursday, May 23, 2024 10:59 PM
To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD



On 5/23/24 6:14 AM, Richard Biener wrote:
> On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
>>
>> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
>> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
>>
>> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
>> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86
> 
> Likely you have released _2, more comments below on your previous mail.
You can be sure by calling debug_tree () on the SSA_NAME node in 
question.  If it reports "in-free-list", then that's definitive that the 
SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is 
still in the IL, then that's very bad.

jeff
Richard Biener May 24, 2024, 6:56 a.m. UTC | #7
On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Thanks Jeff and Richard for suggestion and reviewing.
>
> Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
> It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
> the widen-mul is able to do the recog to .SAT_ADD.
>
> For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched,
> as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
> is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition.
>
> sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> sat_add.c:69:1: error: missing definition
>    69 | SAT_ADD_U_3(uint8_t);
>       | ^~~~~~~~~~~
> for SSA_NAME: _6 in statement:
> # VUSE <.MEM_14(D)>
> return _6;
> during GIMPLE pass: phiopt
> dump file: sat_add.c.046t.phiopt1
> sat_add.c:69:1: internal compiler error: verify_ssa failed
> 0x1db41ba verify_ssa(bool, bool
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> 0x18e3075 execute_function_todo
>         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> 0x18e1c52 do_per_function
>         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> 0x18e3222 execute_todo
>
> I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it.
> Thus, there will be orphan nodes somewhere and we need something like rollback to recover the
> gimple up to a point. I tried sorts of release_xx or likewise but seems not working.
>
> So is there any suggest to take care of such gimple rollback or another solution for this? Below are
> The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.
>
> Pan
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 918cf50b589..7982b65bac4 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>      }
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/* Try to match the phi expr to the gimple cond. Return true if we can
> +   perform the convert or return false.  There will be some restrictions
> +   or such kind of conversion, aka:
> +
> +   1. Only selected pattern will try this convert.
> +   2. The generated gassign matched the selected IFN pattern.
> +   3. The backend has implement the standard name.
> +
> +   From:
> +     <bb 2> :
> +     _1 = x_3(D) + y_4(D);
> +     if (_1 >= x_3(D))
> +       goto <bb 3>; [INV]
> +     else
> +       goto <bb 4>; [INV]
> +
> +     <bb 3> :
> +
> +     <bb 4> :
> +     # _2 = PHI <255(2), _1(3)>
> +
> +   To:
> +     <bb 2> :
> +     _1 = x_3(D) + y_4(D);
> +     phi_cond_6 = _1 >= x_3(D);
> +     _2 = phi_cond_6 ? _1 : 255; */
> +
> +static bool
> +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)

You should do this in widen-mult and/or ISEL and if necessary for vectorization
in tree-if-conv.cc, though eventually what if-convert creates might be
good enough
to match during pattern recognition.

> +{
> +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> +
> +  if (!cond)
> +    return false;
> +
> +  enum tree_code code = gimple_cond_code (cond);
> +  tree phi_result = gimple_phi_result (phi);
> +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
> +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> +                         gimple_cond_rhs (cond));
> +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);

phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond.

> +
> +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> +  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt_val);
> +  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +    && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +                                      OPTIMIZE_FOR_BOTH));
> +
> +  if (matched_p)
> +    {
> +      gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +      gimple_stmt_iterator psi = gsi_for_stmt (phi);
> +
> +      gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
> +      gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
> +      remove_phi_node (&psi, false);

You only matched but you do not insert the actual .SAT_ADD here and that's
the definition that's missing.  You probably shouldn't need to add the
cond-stmt?

> +    }
> +  else
> +    {
> +      // Clean up the stmt created, but non of blow works well.
> +      // gsi = gsi_for_stmt (stmt_val);
> +      // gsi_remove (&gsi, true);
> +      // release_defs (stmt_val);
> +      // ggc_free (stmt_val);
> +
> +      // gsi = gsi_for_stmt (stmt_cond);
> +      // gsi_remove (&gsi, true);
> +      // release_defs (stmt_cond);
> +      // ggc_free (stmt_cond);
> +
> +      // release_defs (stmt_cond);
> +      // release_defs (stmt_val);
> +      release_ssa_name (cond_tree);

As you don't insert the stmts you should be able to simply
only release the SSA names and ggc_free the stmt.  You can also
look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc
for more "ugly" ways to do this.

Building a helper in one place to match a PHI def as COND_EXPR
might be nice.  As said, avoiding all the mess by providing native
support from genmatch would be even better, but I'm not asking you
to do that.

Richard.

> +    }
> +
> +  return matched_p;
> +}
> +
> /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
>     Return NULL if nothing can be simplified or the resulting simplified value
>     with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
>       So, given the condition COND, and the two PHI arguments, match and simplify
>       can happen on (COND) ? arg0 : arg1. */
>
> +  if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1))
> +    return true;
> +
>    stmt = last_nondebug_stmt (cond_bb);
>
>    /* We need to know which is the true edge and which is the false
>
>
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Thursday, May 23, 2024 10:59 PM
> To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
> Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
>
>
>
> On 5/23/24 6:14 AM, Richard Biener wrote:
> > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
> >>
> >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
> >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
> >>
> >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
> >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86
> >
> > Likely you have released _2, more comments below on your previous mail.
> You can be sure by calling debug_tree () on the SSA_NAME node in
> question.  If it reports "in-free-list", then that's definitive that the
> SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is
> still in the IL, then that's very bad.
>
> jeff
>
Richard Biener May 24, 2024, 7:20 a.m. UTC | #8
On Fri, May 24, 2024 at 8:56 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > Thanks Jeff and Richard for suggestion and reviewing.
> >
> > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
> > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
> > the widen-mul is able to do the recog to .SAT_ADD.
> >
> > For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched,
> > as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
> > is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition.
> >
> > sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> > sat_add.c:69:1: error: missing definition
> >    69 | SAT_ADD_U_3(uint8_t);
> >       | ^~~~~~~~~~~
> > for SSA_NAME: _6 in statement:
> > # VUSE <.MEM_14(D)>
> > return _6;
> > during GIMPLE pass: phiopt
> > dump file: sat_add.c.046t.phiopt1
> > sat_add.c:69:1: internal compiler error: verify_ssa failed
> > 0x1db41ba verify_ssa(bool, bool
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> > 0x18e3075 execute_function_todo
> >         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> > 0x18e1c52 do_per_function
> >         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> > 0x18e3222 execute_todo
> >
> > I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it.
> > Thus, there will be orphan nodes somewhere and we need something like rollback to recover the
> > gimple up to a point. I tried sorts of release_xx or likewise but seems not working.
> >
> > So is there any suggest to take care of such gimple rollback or another solution for this? Below are
> > The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.
> >
> > Pan
> >
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 918cf50b589..7982b65bac4 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
> >      }
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to match the phi expr to the gimple cond. Return true if we can
> > +   perform the convert or return false.  There will be some restrictions
> > +   or such kind of conversion, aka:
> > +
> > +   1. Only selected pattern will try this convert.
> > +   2. The generated gassign matched the selected IFN pattern.
> > +   3. The backend has implement the standard name.
> > +
> > +   From:
> > +     <bb 2> :
> > +     _1 = x_3(D) + y_4(D);
> > +     if (_1 >= x_3(D))
> > +       goto <bb 3>; [INV]
> > +     else
> > +       goto <bb 4>; [INV]
> > +
> > +     <bb 3> :
> > +
> > +     <bb 4> :
> > +     # _2 = PHI <255(2), _1(3)>
> > +
> > +   To:
> > +     <bb 2> :
> > +     _1 = x_3(D) + y_4(D);
> > +     phi_cond_6 = _1 >= x_3(D);
> > +     _2 = phi_cond_6 ? _1 : 255; */
> > +
> > +static bool
> > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)
>
> You should do this in widen-mult and/or ISEL and if necessary for vectorization
> in tree-if-conv.cc, though eventually what if-convert creates might be
> good enough
> to match during pattern recognition.
>
> > +{
> > +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> > +
> > +  if (!cond)
> > +    return false;
> > +
> > +  enum tree_code code = gimple_cond_code (cond);
> > +  tree phi_result = gimple_phi_result (phi);
> > +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
> > +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> > +                         gimple_cond_rhs (cond));
> > +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);
>
> phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond.
>
> > +
> > +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> > +  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt_val);
> > +  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +    && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +                                      OPTIMIZE_FOR_BOTH));
> > +
> > +  if (matched_p)
> > +    {
> > +      gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +      gimple_stmt_iterator psi = gsi_for_stmt (phi);
> > +
> > +      gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
> > +      gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
> > +      remove_phi_node (&psi, false);
>
> You only matched but you do not insert the actual .SAT_ADD here and that's
> the definition that's missing.  You probably shouldn't need to add the
> cond-stmt?
>
> > +    }
> > +  else
> > +    {
> > +      // Clean up the stmt created, but non of blow works well.
> > +      // gsi = gsi_for_stmt (stmt_val);
> > +      // gsi_remove (&gsi, true);
> > +      // release_defs (stmt_val);
> > +      // ggc_free (stmt_val);
> > +
> > +      // gsi = gsi_for_stmt (stmt_cond);
> > +      // gsi_remove (&gsi, true);
> > +      // release_defs (stmt_cond);
> > +      // ggc_free (stmt_cond);
> > +
> > +      // release_defs (stmt_cond);
> > +      // release_defs (stmt_val);
> > +      release_ssa_name (cond_tree);
>
> As you don't insert the stmts you should be able to simply
> only release the SSA names and ggc_free the stmt.  You can also
> look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc
> for more "ugly" ways to do this.
>
> Building a helper in one place to match a PHI def as COND_EXPR
> might be nice.  As said, avoiding all the mess by providing native
> support from genmatch would be even better, but I'm not asking you
> to do that.

For reference the attached outlines where/what to do if you are curious.

Richard.

>
> Richard.
>
> > +    }
> > +
> > +  return matched_p;
> > +}
> > +
> > /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
> >     Return NULL if nothing can be simplified or the resulting simplified value
> >     with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> > @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
> >       So, given the condition COND, and the two PHI arguments, match and simplify
> >       can happen on (COND) ? arg0 : arg1. */
> >
> > +  if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1))
> > +    return true;
> > +
> >    stmt = last_nondebug_stmt (cond_bb);
> >
> >    /* We need to know which is the true edge and which is the false
> >
> >
> > -----Original Message-----
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Thursday, May 23, 2024 10:59 PM
> > To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
> > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
> >
> >
> >
> > On 5/23/24 6:14 AM, Richard Biener wrote:
> > > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
> > >>
> > >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
> > >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
> > >>
> > >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
> > >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86
> > >
> > > Likely you have released _2, more comments below on your previous mail.
> > You can be sure by calling debug_tree () on the SSA_NAME node in
> > question.  If it reports "in-free-list", then that's definitive that the
> > SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is
> > still in the IL, then that's very bad.
> >
> > jeff
> >
Li, Pan2 May 24, 2024, 7:47 a.m. UTC | #9
Thanks Richard for help and comments.

If my understanding is correct, I should be able to follow below step(s) to support the branch form for unsigned SAT_ADD.

1. Building a helper in one place to match a PHI def as COND_EXPR (or even a better way to do it by providing native support from genmatch)
2. Leverage this helper in widen-mul and recog it as .SAT_ADD if matches.

Will have a try and keep you posted.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Friday, May 24, 2024 3:21 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

On Fri, May 24, 2024 at 8:56 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > Thanks Jeff and Richard for suggestion and reviewing.
> >
> > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
> > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
> > the widen-mul is able to do the recog to .SAT_ADD.
> >
> > For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched,
> > as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
> > is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition.
> >
> > sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> > sat_add.c:69:1: error: missing definition
> >    69 | SAT_ADD_U_3(uint8_t);
> >       | ^~~~~~~~~~~
> > for SSA_NAME: _6 in statement:
> > # VUSE <.MEM_14(D)>
> > return _6;
> > during GIMPLE pass: phiopt
> > dump file: sat_add.c.046t.phiopt1
> > sat_add.c:69:1: internal compiler error: verify_ssa failed
> > 0x1db41ba verify_ssa(bool, bool
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> > 0x18e3075 execute_function_todo
> >         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> > 0x18e1c52 do_per_function
> >         /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> > 0x18e3222 execute_todo
> >
> > I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it.
> > Thus, there will be orphan nodes somewhere and we need something like rollback to recover the
> > gimple up to a point. I tried sorts of release_xx or likewise but seems not working.
> >
> > So is there any suggest to take care of such gimple rollback or another solution for this? Below are
> > The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.
> >
> > Pan
> >
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 918cf50b589..7982b65bac4 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
> >      }
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to match the phi expr to the gimple cond. Return true if we can
> > +   perform the convert or return false.  There will be some restrictions
> > +   or such kind of conversion, aka:
> > +
> > +   1. Only selected pattern will try this convert.
> > +   2. The generated gassign matched the selected IFN pattern.
> > +   3. The backend has implement the standard name.
> > +
> > +   From:
> > +     <bb 2> :
> > +     _1 = x_3(D) + y_4(D);
> > +     if (_1 >= x_3(D))
> > +       goto <bb 3>; [INV]
> > +     else
> > +       goto <bb 4>; [INV]
> > +
> > +     <bb 3> :
> > +
> > +     <bb 4> :
> > +     # _2 = PHI <255(2), _1(3)>
> > +
> > +   To:
> > +     <bb 2> :
> > +     _1 = x_3(D) + y_4(D);
> > +     phi_cond_6 = _1 >= x_3(D);
> > +     _2 = phi_cond_6 ? _1 : 255; */
> > +
> > +static bool
> > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)
>
> You should do this in widen-mult and/or ISEL and if necessary for vectorization
> in tree-if-conv.cc, though eventually what if-convert creates might be
> good enough
> to match during pattern recognition.
>
> > +{
> > +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> > +
> > +  if (!cond)
> > +    return false;
> > +
> > +  enum tree_code code = gimple_cond_code (cond);
> > +  tree phi_result = gimple_phi_result (phi);
> > +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
> > +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> > +                         gimple_cond_rhs (cond));
> > +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);
>
> phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond.
>
> > +
> > +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> > +  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt_val);
> > +  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +    && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +                                      OPTIMIZE_FOR_BOTH));
> > +
> > +  if (matched_p)
> > +    {
> > +      gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +      gimple_stmt_iterator psi = gsi_for_stmt (phi);
> > +
> > +      gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
> > +      gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
> > +      remove_phi_node (&psi, false);
>
> You only matched but you do not insert the actual .SAT_ADD here and that's
> the definition that's missing.  You probably shouldn't need to add the
> cond-stmt?
>
> > +    }
> > +  else
> > +    {
> > +      // Clean up the stmt created, but non of blow works well.
> > +      // gsi = gsi_for_stmt (stmt_val);
> > +      // gsi_remove (&gsi, true);
> > +      // release_defs (stmt_val);
> > +      // ggc_free (stmt_val);
> > +
> > +      // gsi = gsi_for_stmt (stmt_cond);
> > +      // gsi_remove (&gsi, true);
> > +      // release_defs (stmt_cond);
> > +      // ggc_free (stmt_cond);
> > +
> > +      // release_defs (stmt_cond);
> > +      // release_defs (stmt_val);
> > +      release_ssa_name (cond_tree);
>
> As you don't insert the stmts you should be able to simply
> only release the SSA names and ggc_free the stmt.  You can also
> look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc
> for more "ugly" ways to do this.
>
> Building a helper in one place to match a PHI def as COND_EXPR
> might be nice.  As said, avoiding all the mess by providing native
> support from genmatch would be even better, but I'm not asking you
> to do that.

For reference the attached outlines where/what to do if you are curious.

Richard.

>
> Richard.
>
> > +    }
> > +
> > +  return matched_p;
> > +}
> > +
> > /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
> >     Return NULL if nothing can be simplified or the resulting simplified value
> >     with parts pushed if EARLY_P was true. Also rejects non allowed tree code
> > @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
> >       So, given the condition COND, and the two PHI arguments, match and simplify
> >       can happen on (COND) ? arg0 : arg1. */
> >
> > +  if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1))
> > +    return true;
> > +
> >    stmt = last_nondebug_stmt (cond_bb);
> >
> >    /* We need to know which is the true edge and which is the false
> >
> >
> > -----Original Message-----
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Thursday, May 23, 2024 10:59 PM
> > To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com
> > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
> >
> >
> >
> > On 5/23/24 6:14 AM, Richard Biener wrote:
> > > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote:
> > >>
> > >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255.
> > >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass.
> > >>
> > >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass =>
> > >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86
> > >
> > > Likely you have released _2, more comments below on your previous mail.
> > You can be sure by calling debug_tree () on the SSA_NAME node in
> > question.  If it reports "in-free-list", then that's definitive that the
> > SSA_NAME was released back to the SSA_NAME manager.  If that SSA_NAME is
> > still in the IL, then that's very bad.
> >
> > jeff
> >
diff mbox series

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index cff67c84498..2dc77a46e67 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3080,6 +3080,17 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+(simplify
+ (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
+  integer_minus_onep (realpart @2))
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
+      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
+
+#endif
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)