diff mbox series

[2/3,vect] Add widening add, subtract vect patterns

Message ID VI1PR0802MB2205B412A197A26D635E6199F5E70@VI1PR0802MB2205.eurprd08.prod.outlook.com
State New
Headers show
Series [1/3,aarch64] Add aarch64 support for vec_widen_add, vec_widen_sub patterns | expand

Commit Message

Joel Hutton Nov. 12, 2020, 7:34 p.m. UTC
Hi all,

This patch adds widening add and widening subtract patterns to tree-vect-patterns.

All 3 patches together bootstrapped and regression tested on aarch64.

gcc/ChangeLog:

2020-11-12  Joel Hutton  <joel.hutton@arm.com>

        * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
        * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
        * optabs.def (OPTAB_D): define vectorized widen add, subtracts
        * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
        * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
        * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
        * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
        (vect_recog_widen_sub_pattern): New recog pattern
        (vect_recog_average_pattern): Update widened add code
        (vect_recog_average_pattern): Update widened add code
        * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
        (supportable_widening_operation): Add case for widened add, subtract
        * tree.def (WIDEN_ADD_EXPR): New tree code
        (WIDEN_SUB_EXPR): New tree code
        (VEC_WIDEN_ADD_HI_EXPR): New tree code
        (VEC_WIDEN_ADD_LO_EXPR): New tree code
        (VEC_WIDEN_SUB_HI_EXPR): New tree code
        (VEC_WIDEN_SUB_LO_EXPR): New tree code

gcc/testsuite/ChangeLog:

2020-11-12  Joel Hutton  <joel.hutton@arm.com>

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.


Ok for trunk?

Comments

Richard Biener Nov. 13, 2020, 7:58 a.m. UTC | #1
On Thu, 12 Nov 2020, Joel Hutton wrote:

> Hi all,
> 
> This patch adds widening add and widening subtract patterns to 
> tree-vect-patterns.

I am missing documentation in md.texi for the new patterns.  In
particular I wonder why you need singed and unsigned variants
for the add/subtract patterns.

We're walking away from adding tree codes for new vectorizer
pieces and instead want to use direct internal functions for them.
Can you rework the patch to use this approach?

Thanks,
Richard.

> All 3 patches together bootstrapped and regression tested on aarch64.
> 
> gcc/ChangeLog:
> 
> 2020-11-12 ?Joel Hutton ?<joel.hutton@arm.com>
> 
> ? ? ? ? * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> ? ? ? ? * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
> ? ? ? ? * optabs.def (OPTAB_D): define vectorized widen add, subtracts
> ? ? ? ? * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
> ? ? ? ? * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
> ? ? ? ? * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
> ? ? ? ? * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
> ? ? ? ? (vect_recog_widen_sub_pattern): New recog pattern
> ? ? ? ? (vect_recog_average_pattern): Update widened add code
> ? ? ? ? (vect_recog_average_pattern): Update widened add code
> ? ? ? ? * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
> ? ? ? ? (supportable_widening_operation): Add case for widened add, subtract
> ? ? ? ? * tree.def (WIDEN_ADD_EXPR): New tree code
> ? ? ? ? (WIDEN_SUB_EXPR): New tree code
> ? ? ? ? (VEC_WIDEN_ADD_HI_EXPR): New tree code
> ? ? ? ? (VEC_WIDEN_ADD_LO_EXPR): New tree code
> ? ? ? ? (VEC_WIDEN_SUB_HI_EXPR): New tree code
> ? ? ? ? (VEC_WIDEN_SUB_LO_EXPR): New tree code
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-12 ?Joel Hutton ?<joel.hutton@arm.com>
> 
> ? ? ? ? * gcc.target/aarch64/vect-widen-add.c: New test.
> ? ? ? ? * gcc.target/aarch64/vect-widen-sub.c: New test.
> 
> 
> Ok for trunk?
>
Richard Sandiford Nov. 13, 2020, 12:16 p.m. UTC | #2
[ There was a discussion on irc about how easy it would be to support
  internal functions and tree codes at the same time, so the agreement
  was to go for tree codes for now with a promise to convert the
  widening-related code to use internal functions for GCC 12. ]

Like Richard said, the new patterns need to be documented in md.texi
and the new tree codes need to be documented in generic.texi.

While we're using tree codes, I think we need to make the naming
consistent with other tree codes: WIDEN_PLUS_EXPR instead of
WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
Same idea for the VEC_* codes.

(In constrast, the internal functions do try to follow the optab names,
since there's usually a 1:1 correspondence.)

Joel Hutton via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> This patch adds widening add and widening subtract patterns to tree-vect-patterns.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>
>         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases

Not that I personally care about this stuff (would love to see changelogs
go away :-)) but some nits:

Each description is supposed to start with a capital letter and end with
a full stop (even if it's not a complete sentence).  Same for the rest
of the log.

>         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts

The line limit for changelogs is 80 characters.  The entry should say
what changed, so “Handle …” or “Add case for …” or something.

>         * optabs.def (OPTAB_D): define vectorized widen add, subtracts
>         * tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds, subtracts
>         * tree-inline.c (estimate_operator_cost): Add case for widening adds, subtracts
>         * tree-vect-generic.c (expand_vector_operations_1): Add case for widening adds, subtracts
>         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern

typo: pattern

>         (vect_recog_widen_sub_pattern): New recog pattern
>         (vect_recog_average_pattern): Update widened add code
>         (vect_recog_average_pattern): Update widened add code
>         * tree-vect-stmts.c (vectorizable_conversion): Add case for widened add, subtract
>         (supportable_widening_operation): Add case for widened add, subtract
>         * tree.def (WIDEN_ADD_EXPR): New tree code
>         (WIDEN_SUB_EXPR): New tree code
>         (VEC_WIDEN_ADD_HI_EXPR): New tree code
>         (VEC_WIDEN_ADD_LO_EXPR): New tree code
>         (VEC_WIDEN_SUB_HI_EXPR): New tree code
>         (VEC_WIDEN_SUB_LO_EXPR): New tree code
>
> gcc/testsuite/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
>
>
> Ok for trunk?
>
> From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
> From: Joel Hutton <joel.hutton@arm.com>
> Date: Mon, 9 Nov 2020 15:44:18 +0000
> Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns
>
> Add widening add, subtract patterns to tree-vect-patterns.
> Add aarch64 tests for patterns.
>
> fix sad

Would be good to expand on this for the final commit message.

> […]
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>        return (TYPE_UNSIGNED (type)
>  	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
>  
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
> +
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
> +
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
> +
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +	      ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> +
> +

Nits: excess blank line at the end and excess space before the “:”s.

>      case VEC_UNPACK_HI_EXPR:
>        return (TYPE_UNSIGNED (type)
>  	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
>  OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
>  OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
>  OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
> +OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
> +OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
> +OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
> +OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
> +OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
> +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")

Looks like the current code groups signed stuff together and
unsigned stuff together, so would be good to follow that.

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps" } */
> +#include <stdint.h>
> +#include <string.h>
> +
> +#define ARR_SIZE 1024
> +
> +/* Should produce an uaddl */
> +void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +/* Should produce an saddl */
> +void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +
> +void __attribute__((optimize (0)))
> +init(uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE;i++)
> +    {
> +      a[i] = i;
> +      b[i] = 2*i;
> +    }
> +}
> +
> +int __attribute__((optimize (0)))
> +main()
> +{
> +    uint32_t foo_arr[ARR_SIZE];
> +    uint32_t bar_arr[ARR_SIZE];
> +    uint16_t a[ARR_SIZE];
> +    uint16_t b[ARR_SIZE];
> +
> +    init(a, b);
> +    uadd_opt(foo_arr, a, b);
> +    uadd_nonopt(bar_arr, a, b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
> +    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "uaddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl2\t" 1} } */

Same comments as the previous patch about having a "+nosve" pragma
and about the scan-assembler-times lines.  Same for the sub test.

> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3864,6 +3864,12 @@ verify_gimple_assign_binary (gassign *stmt)
>          return false;
>        }
>  
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return false;
> +

I think these should get the same validity checking as
VEC_WIDEN_MULT_HI_EXPR etc.

>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>        {
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
>       of the above pattern.  */
>  
>    tree plus_oprnd0, plus_oprnd1;
> -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> -				       &plus_oprnd0, &plus_oprnd1))
> +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> +				       &plus_oprnd0, &plus_oprnd1)
> +	|| vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> +				       &plus_oprnd0, &plus_oprnd1)))
>      return NULL;
>  
>    tree sum_type = gimple_expr_type (last_stmt);

I think we should make:

  /* Any non-truncating sequence of conversions is OK here, since
     with a successful match, the result of the ABS(U) is known to fit
     within the nonnegative range of the result type.  (It cannot be the
     negative of the minimum signed value due to the range of the widening
     MINUS_EXPR.)  */
  vect_unpromoted_value unprom_abs;
  plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
						      &unprom_abs);

specific to the PLUS_EXPR case.  If we look through promotions on
the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
and one on its inputs.

E.g. if we had:

    uint32_t x;
    int32_t y = (int32_t) x;
    int64_t z = WIDEN_ADD_EXPR <y, …>;

I think the code above would look through the signedness change and
give the SAD_EXPR an unsigned extension instead of a signed one.

(Alternatively, we could try to handle WIDEN_ADD_EXPR with promoted
operands, but I think in practice it would be wasted work.)

> @@ -1148,7 +1150,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom[2];
> -  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
> +  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_SUB_EXPR,
>  			     false, 2, unprom, &half_type))
>      return NULL;
>  
> @@ -1262,6 +1264,24 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
>  				      "vect_recog_widen_mult_pattern");
>  }
>  
> +static gimple *
> +vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      PLUS_EXPR, WIDEN_ADD_EXPR, false,
> +				      "vect_recog_widen_add_pattern");
> +}
> +
> +static gimple *
> +vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      MINUS_EXPR, WIDEN_SUB_EXPR, false,
> +				      "vect_recog_widen_sub_pattern");
> +}
> +

The new functions should have comments before them.  Can probably
just use the vect_recog_widen_mult_pattern comment as a template.

Thanks,
Richard
Joel Hutton Nov. 13, 2020, 4:48 p.m. UTC | #3
Tests are still running, but I believe I've addressed all the comments.

> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.

Done.

> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.

Fixed.

> > gcc/ChangeLog:
> >
> > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> >
> >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> 
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
> 
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence).  Same for the rest

Fixed.

> >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
> 
> The line limit for changelogs is 80 characters.  The entry should say
> what changed, so “Handle …” or “Add case for …” or something.

Fixed.

> >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
> 
> typo: pattern

Fixed.

> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
> 
> Would be good to expand on this for the final commit message.

'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.

> > +
> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +      return (TYPE_UNSIGNED (type)
> > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > +
> > +
> 
> Nits: excess blank line at the end and excess space before the “:”s.

Fixed.

> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> 
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.

Fixed.

> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines.  Same for the sub test.

Fixed.

> I am missing documentation in md.texi for the new patterns.  In
> particular I wonder why you need singed and unsigned variants
> for the add/subtract patterns.

Fixed. Signed and unsigned variants because they correspond to signed and
unsigned instructions, (uaddl/uaddl2, saddl/saddl2).

> The new functions should have comments before them.  Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.

Fixed.

> > +    case VEC_WIDEN_SUB_HI_EXPR:
> > +    case VEC_WIDEN_SUB_LO_EXPR:
> > +    case VEC_WIDEN_ADD_HI_EXPR:
> > +    case VEC_WIDEN_ADD_LO_EXPR:
> > +      return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.

Fixed.

> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> >       of the above pattern.  */
> >
> >    tree plus_oprnd0, plus_oprnd1;
> > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > -                                    &plus_oprnd0, &plus_oprnd1))
> > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > +                                    &plus_oprnd0, &plus_oprnd1)
> > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > +                                    &plus_oprnd0, &plus_oprnd1)))
> >      return NULL;
> >
> >     tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
>   /* Any non-truncating sequence of conversions is OK here, since
>      with a successful match, the result of the ABS(U) is known to fit
>      within the nonnegative range of the result type.  (It cannot be the
>      negative of the minimum signed value due to the range of the widening
>      MINUS_EXPR.)  */
>   vect_unpromoted_value unprom_abs;
>   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>                                                       &unprom_abs);
>
> specific to the PLUS_EXPR case.  If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.

Fixed.


gcc/ChangeLog:

2020-11-13  Joel Hutton  <joel.hutton@arm.com>

	* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
	  adds, subtracts.
        * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
	  subtracts.
	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
	   subtracts.
	* tree-vect-generic.c (expand_vector_operations_1): Add case for
	  widening adds, subtracts tree-vect-patterns.c
	* (vect_recog_widen_add_pattern): New recog pattern.
	  (vect_recog_widen_sub_pattern): New recog pattern.
	  (vect_recog_average_pattern): Update widened add code.
	  (vect_recog_average_pattern): Update widened add code.
	* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
	  subtract.
	(supportable_widening_operation): Add case for widened add, subtract.
	* tree.def
	(WIDEN_PLUS_EXPR): New tree code.
	(WIDEN_MINUS_EXPR): New tree code.
	(VEC_WIDEN_ADD_HI_EXPR): New tree code.
	(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
	(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
	(VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

2020-11-13  Joel Hutton  <joel.hutton@arm.com>

        * gcc.target/aarch64/vect-widen-add.c: New test.
        * gcc.target/aarch64/vect-widen-sub.c: New test.
Richard Biener Nov. 16, 2020, 2:04 p.m. UTC | #4
On Fri, 13 Nov 2020, Joel Hutton wrote:

> Tests are still running, but I believe I've addressed all the comments.
> 
> > Like Richard said, the new patterns need to be documented in md.texi
> > and the new tree codes need to be documented in generic.texi.
> 
> Done.
> 
> > While we're using tree codes, I think we need to make the naming
> > consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> > WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> > Same idea for the VEC_* codes.
> 
> Fixed.
> 
> > > gcc/ChangeLog:
> > >
> > > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
> > >
> > >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
> > 
> > Not that I personally care about this stuff (would love to see changelogs
> > go away :-)) but some nits:
> > 
> > Each description is supposed to start with a capital letter and end with
> > a full stop (even if it's not a complete sentence).  Same for the rest
> 
> Fixed.
> 
> > >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
> > 
> > The line limit for changelogs is 80 characters.  The entry should say
> > what changed, so ?Handle ?? or ?Add case for ?? or something.
> 
> Fixed.
> 
> > >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
> > 
> > typo: pattern
> 
> Fixed.
> 
> > > Add widening add, subtract patterns to tree-vect-patterns.
> > > Add aarch64 tests for patterns.
> > >
> > > fix sad
> > 
> > Would be good to expand on this for the final commit message.
> 
> 'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.
> 
> > > +
> > > +    case VEC_WIDEN_SUB_HI_EXPR:
> > > +      return (TYPE_UNSIGNED (type)
> > > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> > > +
> > > +
> > 
> > Nits: excess blank line at the end and excess space before the ?:?s.
> 
> Fixed.
> 
> > > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> > >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> > >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> > >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
> > 
> > Looks like the current code groups signed stuff together and
> > unsigned stuff together, so would be good to follow that.
> 
> Fixed.
> 
> > Same comments as the previous patch about having a "+nosve" pragma
> > and about the scan-assembler-times lines.  Same for the sub test.
> 
> Fixed.
> 
> > I am missing documentation in md.texi for the new patterns.  In
> > particular I wonder why you need singed and unsigned variants
> > for the add/subtract patterns.
> 
> Fixed. Signed and unsigned variants because they correspond to signed and
> unsigned instructions, (uaddl/uaddl2, saddl/saddl2).
> 
> > The new functions should have comments before them.  Can probably
> > just use the vect_recog_widen_mult_pattern comment as a template.
> 
> Fixed.
> 
> > > +    case VEC_WIDEN_SUB_HI_EXPR:
> > > +    case VEC_WIDEN_SUB_LO_EXPR:
> > > +    case VEC_WIDEN_ADD_HI_EXPR:
> > > +    case VEC_WIDEN_ADD_LO_EXPR:
> > > +      return false;
> > > +
> >
> > I think these should get the same validity checking as
> > VEC_WIDEN_MULT_HI_EXPR etc.
> 
> Fixed.
> 
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > >       of the above pattern.  */
> > >
> > >    tree plus_oprnd0, plus_oprnd1;
> > > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > > -                                    &plus_oprnd0, &plus_oprnd1))
> > > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > > +                                    &plus_oprnd0, &plus_oprnd1)
> > > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > > +                                    &plus_oprnd0, &plus_oprnd1)))
> > >      return NULL;
> > >
> > >     tree sum_type = gimple_expr_type (last_stmt);
> >
> > I think we should make:
> >
> >   /* Any non-truncating sequence of conversions is OK here, since
> >      with a successful match, the result of the ABS(U) is known to fit
> >      within the nonnegative range of the result type.  (It cannot be the
> >      negative of the minimum signed value due to the range of the widening
> >      MINUS_EXPR.)  */
> >   vect_unpromoted_value unprom_abs;
> >   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
> >                                                       &unprom_abs);
> >
> > specific to the PLUS_EXPR case.  If we look through promotions on
> > the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> > of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> > and one on its inputs.
> 
> Fixed.

LGTM.

Thanks,
Richard.

> 
> gcc/ChangeLog:
> 
> 2020-11-13  Joel Hutton  <joel.hutton@arm.com>
> 
> 	* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
> 	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
> 	  adds, subtracts.
>         * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
> 	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
> 	  subtracts.
> 	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
> 	   subtracts.
> 	* tree-vect-generic.c (expand_vector_operations_1): Add case for
> 	  widening adds, subtracts tree-vect-patterns.c
> 	* (vect_recog_widen_add_pattern): New recog pattern.
> 	  (vect_recog_widen_sub_pattern): New recog pattern.
> 	  (vect_recog_average_pattern): Update widened add code.
> 	  (vect_recog_average_pattern): Update widened add code.
> 	* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
> 	  subtract.
> 	(supportable_widening_operation): Add case for widened add, subtract.
> 	* tree.def
> 	(WIDEN_PLUS_EXPR): New tree code.
> 	(WIDEN_MINUS_EXPR): New tree code.
> 	(VEC_WIDEN_ADD_HI_EXPR): New tree code.
> 	(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
> 	(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
> 	(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-11-13  Joel Hutton  <joel.hutton@arm.com>
> 
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
>
Richard Sandiford Nov. 17, 2020, 1:40 p.m. UTC | #5
Richard Biener <rguenther@suse.de> writes:
> On Fri, 13 Nov 2020, Joel Hutton wrote:
>
>> Tests are still running, but I believe I've addressed all the comments.
>> 
>> > Like Richard said, the new patterns need to be documented in md.texi
>> > and the new tree codes need to be documented in generic.texi.
>> 
>> Done.
>> 
>> > While we're using tree codes, I think we need to make the naming
>> > consistent with other tree codes: WIDEN_PLUS_EXPR instead of
>> > WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
>> > Same idea for the VEC_* codes.
>> 
>> Fixed.
>> 
>> > > gcc/ChangeLog:
>> > >
>> > > 2020-11-12  Joel Hutton  <joel.hutton@arm.com>
>> > >
>> > >         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
>> > 
>> > Not that I personally care about this stuff (would love to see changelogs
>> > go away :-)) but some nits:
>> > 
>> > Each description is supposed to start with a capital letter and end with
>> > a full stop (even if it's not a complete sentence).  Same for the rest
>> 
>> Fixed.
>> 
>> > >         * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
>> > 
>> > The line limit for changelogs is 80 characters.  The entry should say
>> > what changed, so ?Handle ?? or ?Add case for ?? or something.
>> 
>> Fixed.
>> 
>> > >         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
>> > 
>> > typo: pattern
>> 
>> Fixed.
>> 
>> > > Add widening add, subtract patterns to tree-vect-patterns.
>> > > Add aarch64 tests for patterns.
>> > >
>> > > fix sad
>> > 
>> > Would be good to expand on this for the final commit message.
>> 
>> 'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.
>> 
>> > > +
>> > > +    case VEC_WIDEN_SUB_HI_EXPR:
>> > > +      return (TYPE_UNSIGNED (type)
>> > > +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
>> > > +
>> > > +
>> > 
>> > Nits: excess blank line at the end and excess space before the ?:?s.
>> 
>> Fixed.
>> 
>> > > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
>> > > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
>> > > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>> > >  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>> > >  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>> > >  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
>> > 
>> > Looks like the current code groups signed stuff together and
>> > unsigned stuff together, so would be good to follow that.
>> 
>> Fixed.
>> 
>> > Same comments as the previous patch about having a "+nosve" pragma
>> > and about the scan-assembler-times lines.  Same for the sub test.
>> 
>> Fixed.
>> 
>> > I am missing documentation in md.texi for the new patterns.  In
>> > particular I wonder why you need singed and unsigned variants
>> > for the add/subtract patterns.
>> 
>> Fixed. Signed and unsigned variants because they correspond to signed and
>> unsigned instructions, (uaddl/uaddl2, saddl/saddl2).
>> 
>> > The new functions should have comments before them.  Can probably
>> > just use the vect_recog_widen_mult_pattern comment as a template.
>> 
>> Fixed.
>> 
>> > > +    case VEC_WIDEN_SUB_HI_EXPR:
>> > > +    case VEC_WIDEN_SUB_LO_EXPR:
>> > > +    case VEC_WIDEN_ADD_HI_EXPR:
>> > > +    case VEC_WIDEN_ADD_LO_EXPR:
>> > > +      return false;
>> > > +
>> >
>> > I think these should get the same validity checking as
>> > VEC_WIDEN_MULT_HI_EXPR etc.
>> 
>> Fixed.
>> 
>> > > --- a/gcc/tree-vect-patterns.c
>> > > +++ b/gcc/tree-vect-patterns.c
>> > > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
>> > >       of the above pattern.  */
>> > >
>> > >    tree plus_oprnd0, plus_oprnd1;
>> > > -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> > > -                                    &plus_oprnd0, &plus_oprnd1))
>> > > +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
>> > > +                                    &plus_oprnd0, &plus_oprnd1)
>> > > +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
>> > > +                                    &plus_oprnd0, &plus_oprnd1)))
>> > >      return NULL;
>> > >
>> > >     tree sum_type = gimple_expr_type (last_stmt);
>> >
>> > I think we should make:
>> >
>> >   /* Any non-truncating sequence of conversions is OK here, since
>> >      with a successful match, the result of the ABS(U) is known to fit
>> >      within the nonnegative range of the result type.  (It cannot be the
>> >      negative of the minimum signed value due to the range of the widening
>> >      MINUS_EXPR.)  */
>> >   vect_unpromoted_value unprom_abs;
>> >   plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
>> >                                                       &unprom_abs);
>> >
>> > specific to the PLUS_EXPR case.  If we look through promotions on
>> > the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
>> > of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
>> > and one on its inputs.
>> 
>> Fixed.
>
> LGTM.

Same here FWIW, although:

> +/* Try to detect addition on widened inputs, converting SUB_EXPR
> +   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
> +static gimple *
> +vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +			       tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
> +				      "vect_recog_widen_minus_pattern");
> +}

s/addition/subtraction/ and s/SUB_EXPR/MINUS_EXPR/ in the comment.

Just to be sure, on the changelog:

>> 	* expr.c (expand_expr_real_2): Add widen_add,widen_subtract cases.
>> 	* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
>> 	  adds, subtracts.

Continuation lines should be indented by a tab only, not a tab and two
spaces.  (Although I agree the above looks nicer than the “correct” way.)

>>         * optabs.def (OPTAB_D): Define vectorized widen add, subtracts.

Should be indented by a tab rather than 8 spaces.

>> 	* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
>> 	  subtracts.
>> 	* tree-inline.c (estimate_operator_cost): Add case for widening adds,
>> 	   subtracts.
>> 	* tree-vect-generic.c (expand_vector_operations_1): Add case for
>> 	  widening adds, subtracts tree-vect-patterns.c
>> 	* (vect_recog_widen_add_pattern): New recog pattern.

Mis-positioned tree-vect-patterns.c, should be

	* tree-vect-generic.c (expand_vector_operations_1): Add case for
	widening adds, subtracts.
	* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
	pattern.
	…

No need for another review around over that though, just go ahead and
apply the patch with those changes.

Thanks,
Richard
diff mbox series

Patch

From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
From: Joel Hutton <joel.hutton@arm.com>
Date: Mon, 9 Nov 2020 15:44:18 +0000
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns.
Add aarch64 tests for patterns.

fix sad
---
 gcc/expr.c                                    |  6 ++
 gcc/optabs-tree.c                             | 17 ++++
 gcc/optabs.def                                |  8 ++
 .../gcc.target/aarch64/vect-widen-add.c       | 90 +++++++++++++++++++
 .../gcc.target/aarch64/vect-widen-sub.c       | 90 +++++++++++++++++++
 gcc/tree-cfg.c                                |  8 ++
 gcc/tree-inline.c                             |  6 ++
 gcc/tree-vect-generic.c                       |  4 +
 gcc/tree-vect-patterns.c                      | 32 +++++--
 gcc/tree-vect-stmts.c                         | 15 +++-
 gcc/tree.def                                  |  6 ++
 11 files changed, 276 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c

diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f07775870792729e3805436d7f2debafb6ca..ffc8aed5296174066849d9e0d73b1c352c20fd9e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 					  target, unsignedp);
       return target;
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case WIDEN_MULT_EXPR:
       /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
@@ -9754,6 +9756,10 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,23 @@  optab_for_tree_code (enum tree_code code, const_tree type,
       return (TYPE_UNSIGNED (type)
 	      ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
 
+    case VEC_WIDEN_ADD_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
+
+    case VEC_WIDEN_ADD_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
+
+    case VEC_WIDEN_SUB_LO_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
+
+    case VEC_WIDEN_SUB_HI_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
+
+
     case VEC_UNPACK_HI_EXPR:
       return (TYPE_UNSIGNED (type)
 	      ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,14 @@  OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
 OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
 OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
 OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
+OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
+OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
+OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
+OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
+OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
+OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
+OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
new file mode 100644
index 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -0,0 +1,90 @@ 
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define ARR_SIZE 1024
+
+/* Should produce an uaddl */
+void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+/* Should produce an saddl */
+void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   + b[i];
+        foo[i+1] = a[i+1] + b[i+1];
+        foo[i+2] = a[i+2] + b[i+2];
+        foo[i+3] = a[i+3] + b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    uadd_opt(foo_arr, a, b);
+    uadd_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times "uaddl\t" 1} } */
+/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */
+/* { dg-final { scan-assembler-times "saddl\t" 1} } */
+/* { dg-final { scan-assembler-times "saddl2\t" 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
new file mode 100644
index 0000000000000000000000000000000000000000..eab252786cd3a24974011c0b4451029ac1194935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -0,0 +1,90 @@ 
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define ARR_SIZE 1024
+
+/* Should produce an usubl */
+void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+/* Should produce an ssubl */
+void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+__attribute__((optimize (0)))
+void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
+    {
+        foo[i]   = a[i]   - b[i];
+        foo[i+1] = a[i+1] - b[i+1];
+        foo[i+2] = a[i+2] - b[i+2];
+        foo[i+3] = a[i+3] - b[i+3];
+    }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+    for( int i = 0; i < ARR_SIZE;i++)
+    {
+      a[i] = i;
+      b[i] = 2*i;
+    }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+    uint32_t foo_arr[ARR_SIZE];
+    uint32_t bar_arr[ARR_SIZE];
+    uint16_t a[ARR_SIZE];
+    uint16_t b[ARR_SIZE];
+
+    init(a, b);
+    usub_opt(foo_arr, a, b);
+    usub_nonopt(bar_arr, a, b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+    ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
+      return 1;
+    return 0;
+}
+
+/* { dg-final { scan-assembler-times "usubl\t" 1} } */
+/* { dg-final { scan-assembler-times "usubl2\t" 1} } */
+/* { dg-final { scan-assembler-times "ssubl\t" 1} } */
+/* { dg-final { scan-assembler-times "ssubl2\t" 1} } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3864,6 +3864,12 @@  verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+      return false;
+
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
       {
@@ -3885,6 +3891,8 @@  verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
       {
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 32424b169c7310c03168baf43cc56a8b6ef2f15b..bff650be79e0c820f619ab333871770658ce93ee 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4224,6 +4224,8 @@  estimate_operator_cost (enum tree_code code, eni_weights *weights,
 
     case REALIGN_LOAD_EXPR:
 
+    case WIDEN_ADD_EXPR:
+    case WIDEN_SUB_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
@@ -4232,6 +4234,10 @@  estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case WIDEN_MULT_MINUS_EXPR:
     case WIDEN_LSHIFT_EXPR:
 
+    case VEC_WIDEN_ADD_HI_EXPR:
+    case VEC_WIDEN_ADD_LO_EXPR:
+    case VEC_WIDEN_SUB_HI_EXPR:
+    case VEC_WIDEN_SUB_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d7bafa77134079faf743c1b482251311abb681c5..940658c6be9c41cdc35b4b72d78fc6c2ed3f2072 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2118,6 +2118,10 @@  expand_vector_operations_1 (gimple_stmt_iterator *gsi,
      arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
      calculated in the same way above.  */
   if (code == WIDEN_SUM_EXPR
+      || code == VEC_WIDEN_ADD_HI_EXPR
+      || code == VEC_WIDEN_ADD_LO_EXPR
+      || code == VEC_WIDEN_SUB_HI_EXPR
+      || code == VEC_WIDEN_SUB_LO_EXPR
       || code == VEC_WIDEN_MULT_HI_EXPR
       || code == VEC_WIDEN_MULT_LO_EXPR
       || code == VEC_WIDEN_MULT_EVEN_EXPR
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1086,8 +1086,10 @@  vect_recog_sad_pattern (vec_info *vinfo,
      of the above pattern.  */
 
   tree plus_oprnd0, plus_oprnd1;
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-				       &plus_oprnd0, &plus_oprnd1))
+  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
+				       &plus_oprnd0, &plus_oprnd1)
+	|| vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
+				       &plus_oprnd0, &plus_oprnd1)))
     return NULL;
 
   tree sum_type = gimple_expr_type (last_stmt);
@@ -1148,7 +1150,7 @@  vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_SUB_EXPR,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1262,6 +1264,24 @@  vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 				      "vect_recog_widen_mult_pattern");
 }
 
+static gimple *
+vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      PLUS_EXPR, WIDEN_ADD_EXPR, false,
+				      "vect_recog_widen_add_pattern");
+}
+
+static gimple *
+vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			       tree *type_out)
+{
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      MINUS_EXPR, WIDEN_SUB_EXPR, false,
+				      "vect_recog_widen_sub_pattern");
+}
+
 /* Function vect_recog_pow_pattern
 
    Try to find the following pattern:
@@ -1978,7 +1998,7 @@  vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    PLUS_EXPR, false, 3,
+					    WIDEN_ADD_EXPR, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -5249,7 +5269,9 @@  static vect_recog_func vect_vect_recog_func_ptrs[] = {
      of mask conversion that are needed for gather and scatter
      internal functions.  */
   { vect_recog_gather_scatter_pattern, "gather_scatter" },
-  { vect_recog_mask_conversion_pattern, "mask_conversion" }
+  { vect_recog_mask_conversion_pattern, "mask_conversion" },
+  { vect_recog_widen_add_pattern, "widen_add" },
+  { vect_recog_widen_sub_pattern, "widen_sub" },
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2c7a8a70913bfc4b9903e9328f4489257ca59e02..f12fd158b13656ee24022ec7e445c53444be6554 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4570,6 +4570,8 @@  vectorizable_conversion (vec_info *vinfo,
   if (!CONVERT_EXPR_CODE_P (code)
       && code != FIX_TRUNC_EXPR
       && code != FLOAT_EXPR
+      && code != WIDEN_ADD_EXPR
+      && code != WIDEN_SUB_EXPR
       && code != WIDEN_MULT_EXPR
       && code != WIDEN_LSHIFT_EXPR)
     return false;
@@ -4615,7 +4617,8 @@  vectorizable_conversion (vec_info *vinfo,
 
   if (op_type == binary_op)
     {
-      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR);
+      gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR
+		  || code == WIDEN_ADD_EXPR || code == WIDEN_SUB_EXPR);
 
       op1 = gimple_assign_rhs2 (stmt);
       tree vectype1_in;
@@ -11534,6 +11537,16 @@  supportable_widening_operation (vec_info *vinfo,
       c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
       break;
 
+    case WIDEN_ADD_EXPR:
+      c1 = VEC_WIDEN_ADD_LO_EXPR;
+      c2 = VEC_WIDEN_ADD_HI_EXPR;
+      break;
+
+    case WIDEN_SUB_EXPR:
+      c1 = VEC_WIDEN_SUB_LO_EXPR;
+      c2 = VEC_WIDEN_SUB_HI_EXPR;
+      break;
+
     CASE_CONVERT:
       c1 = VEC_UNPACK_LO_EXPR;
       c2 = VEC_UNPACK_HI_EXPR;
diff --git a/gcc/tree.def b/gcc/tree.def
index 6c53fe1bf67cd8eee7084de0b20b8d217d70710a..daaa2f22b384739c2d9fddcb1fd9185099e11788 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1359,6 +1359,8 @@  DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
    the first argument from type t1 to type t2, and then shifting it
    by the second argument.  */
 DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_ADD_EXPR, "widen_add_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_SUB_EXPR, "widen_sub_expr", tcc_binary, 2)
 
 /* Widening vector multiplication.
    The two operands are vectors with N elements of size S. Multiplying the
@@ -1423,6 +1425,10 @@  DEFTREECODE (VEC_PACK_FLOAT_EXPR, "vec_pack_float_expr", tcc_binary, 2)
  */
 DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_ADD_HI_EXPR, "widen_add_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_ADD_LO_EXPR, "widen_add_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_SUB_HI_EXPR, "widen_add_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_SUB_LO_EXPR, "widen_add_lo_expr", tcc_binary, 2)
 
 /* PREDICT_EXPR.  Specify hint for branch prediction.  The
    PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
-- 
2.17.1