
[v1,1/2] Match: Support branch form for unsigned SAT_ADD

Message ID 20240520110048.2827900-1-pan2.li@intel.com
State New
Series [v1,1/2] Match: Support branch form for unsigned SAT_ADD

Commit Message

Li, Pan2 May 20, 2024, 11 a.m. UTC
From: Pan Li <pan2.li@intel.com>

This patch would like to support the branch form for unsigned
SAT_ADD.  For example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : -1;
}

Different from the branchless version, we leverage the simplify to
convert the branch form of SAT_ADD into the branchless form if and
only if the backend supports IFN_SAT_ADD.  Thus, the backend is able
to choose between the branch and the branchless implementation of
.SAT_ADD.  For example, some targets can handle the branch code more
optimally.
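For illustration, the branchless form the simplify produces is roughly
equivalent to the following plain C (the function name below is made up
for this sketch and is not part of the patch):

#include <stdint.h>

uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                   /* wraps around on overflow */
  uint64_t mask = -(uint64_t) (sum < x);  /* all-ones iff overflow */
  return sum | mask;                      /* saturates to UINT64_MAX */
}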

When the target implements IFN_SAT_ADD for unsigned types, before this
patch:
uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  uint64_t _2;
  __complex__ long unsigned int _6;
  long unsigned int _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
  _1 = REALPART_EXPR <_6>;
  _7 = IMAGPART_EXPR <_6>;
  if (_7 == 0)
    goto <bb 4>; [65.00%]
  else
    goto <bb 3>; [35.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       3
;;                2
  # _2 = PHI <18446744073709551615(3), _1(2)>
  return _2;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _9;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
  return _9;
;;    succ:       EXIT
}

The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

	* match.pd: Add new simplify to convert the branch form of
	SAT_ADD into the branchless form, if and only if the backend
	implements the IFN.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/match.pd | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Tamar Christina May 20, 2024, 11:50 a.m. UTC | #1
Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 20, 2024 12:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com; Pan Li
> <pan2.li@intel.com>
> Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to support the branch form for unsigned
> SAT_ADD.  For example as below:
> 
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> }
> 
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
> 
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
> uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   uint64_t _2;
>   __complex__ long unsigned int _6;
>   long unsigned int _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
>   _1 = REALPART_EXPR <_6>;
>   _7 = IMAGPART_EXPR <_6>;
>   if (_7 == 0)
>     goto <bb 4>; [65.00%]
>   else
>     goto <bb 3>; [35.00%]
> ;;    succ:       4
> ;;                3
> 
> ;;   basic block 3, loop depth 0
> ;;    pred:       2
> ;;    succ:       4
> 
> ;;   basic block 4, loop depth 0
> ;;    pred:       3
> ;;                2
>   # _2 = PHI <18446744073709551615(3), _1(2)>
>   return _2;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _9;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
>   return _9;
> ;;    succ:       EXIT
> }
> 
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add new simplify to convert branch SAT_ADD into
> 	branchless,  if and only if backend implement the IFN.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..0547b57b3a3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> 
> +#if GIMPLE
> +
> +/* Simplify the branch version of SAT_ADD into branchless if and only if
> +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> +   ability to choose branch or branchless implementation of .SAT_ADD.  */
> +
> +(simplify
> + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +(simplify
> + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +#endif

Thanks, this looks good to me!

I'll leave it up to Richard to approve,
Richard: The reason for the direct_internal_fn_supported_p check is that some
targets said they currently handle the branch version better due to the lack
of some types.  At the time I reasoned it's just a target expansion bug, but didn't hear anything back.

To be honest, it feels to me like we should do this unconditionally, and just have the targets
that get a faster branch version handle it during expand, since the patch series now provides
a canonicalized version.

This also means we can better support targets that have the vector optab but not the scalar one,
as the above check would fail for those targets.

What do you think?

Thanks,
Tamar

> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
Richard Biener May 22, 2024, 1:23 p.m. UTC | #2
On Mon, May 20, 2024 at 1:50 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Pan,
>
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Monday, May 20, 2024 12:01 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com; Pan Li
> > <pan2.li@intel.com>
> > Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to support the branch form for unsigned
> > SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> > }
> >
> > Different to the branchless version,  we leverage the simplify to
> > convert the branch version of SAT_ADD into branchless if and only
> > if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> > the ability to choose branch or branchless implementation of .SAT_ADD.
> > For example,  some target can take care of branches code more optimally.
> >
> > When the target implement the IFN_SAT_ADD for unsigned and before this
> > patch:
> > uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   uint64_t _2;
> >   __complex__ long unsigned int _6;
> >   long unsigned int _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
> >   _1 = REALPART_EXPR <_6>;
> >   _7 = IMAGPART_EXPR <_6>;
> >   if (_7 == 0)
> >     goto <bb 4>; [65.00%]
> >   else
> >     goto <bb 3>; [35.00%]
> > ;;    succ:       4
> > ;;                3
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       3
> > ;;                2
> >   # _2 = PHI <18446744073709551615(3), _1(2)>
> >   return _2;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _9;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
> >   return _9;
> > ;;    succ:       EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
> >
> > gcc/ChangeLog:
> >
> >       * match.pd: Add new simplify to convert branch SAT_ADD into
> >       branchless,  if and only if backend implement the IFN.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/match.pd | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0f9c34fa897..0547b57b3a3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment or part of the description above should say this simplifies

   (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))
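
As a stand-alone sanity check of that equivalence (illustrative only,
not part of the patch):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint64_t x = UINT64_MAX, y = 1;  /* overflowing case */
  assert (((uint64_t) (x + y) >= x ? (x + y) : (uint64_t) -1)
          == ((x + y) | (uint64_t) -(uint64_t) ((uint64_t) (x + y) < x)));
  x = 1; y = 2;                    /* non-overflowing case */
  assert (((uint64_t) (x + y) >= x ? (x + y) : (uint64_t) -1)
          == ((x + y) | (uint64_t) -(uint64_t) ((uint64_t) (x + y) < x)));
  return 0;
}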

> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))

and this should probably be (gt @2 @0)?

This misses INTEGER_TYPE_P constraints and it's supposed to be only
for TYPE_UNSIGNED?

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve,
> Richard: The reason for the direct_internal_fn_supported_p is because some
> targets said that they currently handle the branch version better due to the lack
> of some types.  At the time I reason it's just a target expansion bug but didn't hear anything.
>
> To be honest, it feels to me like we should do this unconditionally, and just have the targets
> that get faster branch version to handle it during expand? Since the patch series provides
> a canonicalized version now.

I'm not sure this is a good canonical form.

__imag .ADD_OVERFLOW (x, y) ? __real .ADD_OVERFLOW (x, y) : -1

would be better IMO.  It can be branch-less by using a COND_EXPR.

> This means we can also better support targets that have the vector optab but not the scalar one
> as the above check would fail for these targets.
>
> What do you think?
>
> Thanks,
> Tamar
>
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
>

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..0547b57b3a3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3094,6 +3094,24 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+/* Simplify the branch version of SAT_ADD into branchless if and only if
+   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
+   ability to choose branch or branchless implementation of .SAT_ADD.  */
+
+(simplify
+ (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+(simplify
+ (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+#endif
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
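
For reference, the C source forms the two new patterns are intended to
match (illustrative sketch only; the function names are made up and
this is not a test case from the patch):

#include <stdint.h>

/* Matches the first pattern: (x + y) >= x ? (x + y) : -1.  */
uint64_t
sat_add_ge_form (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : -1;
}

/* Matches the second pattern: x <= (x + y) ? (x + y) : -1.  */
uint64_t
sat_add_le_form (uint64_t x, uint64_t y)
{
  return x <= (uint64_t) (x + y) ? (x + y) : -1;
}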