lra: Canonicalize mult to shift in address reloads

Message ID 20200825101855.clw3dgzfxsgudpzn@arm.com
State New
Series: lra: Canonicalize mult to shift in address reloads

Commit Message

Alex Coplan Aug. 25, 2020, 10:18 a.m. UTC
Hello,

Inside a (mem) RTX, it is canonical to write multiplications by powers
of two using a (mult) [0]. For example, given the following C function:

long f(long *p, long x)
{
    return p[x];
}

AArch64 GCC generates the following RTL insn (in final):

(set (reg/i:DI 0 x0)
     (mem:DI (plus:DI (mult:DI (reg:DI 1 x1 [101])
                 (const_int 8 [0x8]))
             (reg:DI 0 x0 [100])) [1 *_3+0 S8 A64]))

However, outside of a (mem), the canonical way to write multiplications
by powers of two is using (ashift). E.g. for the following C function:

long *g(long *p, long x)
{
    return p + x;
}

AArch64 GCC generates:

(set (reg/i:DI 0 x0)
     (plus:DI (ashift:DI (reg:DI 1 x1 [100])
             (const_int 3 [0x3]))
         (reg:DI 0 x0 [99])))
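
Schematically, then, the same multiplication by a power of two is
written two ways depending on context (X a register, M the mode):

  inside a MEM:   (mult:M X (const_int 2^N))
  outside a MEM:  (ashift:M X (const_int N))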

Now I observed that LRA does not quite respect this RTL canonicalization
rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c with -Os
-ftree-vectorize, the RTL in the dump "281r.ira" has the insn:

(set (reg:SI 111 [ MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B] ])
     (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
                 (const_int 4 [0x4]))
             (reg/v/f:DI 105 [ b ])) [2 MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B]+0 S4 A32]))

but LRA then proceeds to generate a reload, and we get the following
non-canonical insn in "282r.reload":

(set (reg:DI 7 x7 [121])
     (plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
             (const_int 4 [0x4]))
         (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
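
(The load itself is then rewritten to use the reloaded address, becoming
roughly (mem:SI (reg:DI 7 x7)), so the plus/mult expression now sits
outside any MEM, where (mult) is not the canonical form.)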

This patch fixes LRA to ensure that we generate canonical RTL in this
case. After the patch, we get the following insn in "282r.reload":

(set (reg:DI 7 x7 [121])
        (plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
                (const_int 2 [0x2]))
            (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
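
(That is, (mult ... (const_int 4)) has become (ashift ... (const_int 2)),
since 4 == 1 << 2.)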

The motivation here is to be able to remove several redundant patterns
in the AArch64 backend. See the previous thread [1] for context.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu,
   x86_64-pc-linux-gnu.
 * New unit test which checks that we're using the shift pattern (rather
   than the problematic mult pattern) on AArch64; see the illustrative
   -dp output below.
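
With -dp, the name of the matched insn pattern is emitted as a comment
on each line of assembly output, which is what the new test scans for.
An illustrative line (instruction number, registers, and costs made up):

  add	x6, x4, x1, lsl 2	// 38	[c=8 l=4]  *add_lsl_di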

OK for master?

Thanks,
Alex

[0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
[1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html

---

gcc/ChangeLog:

	* lra-constraints.c (canonicalize_reload_addr): New.
	(curr_insn_transform): Use canonicalize_reload_addr to ensure we
	generate canonical RTL for an address reload.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/mem-shift-canonical.c: New test.

Comments

Vladimir Makarov Aug. 25, 2020, 8:45 p.m. UTC | #1
On 2020-08-25 6:18 a.m., Alex Coplan wrote:
> The motivation here is to be able to remove several redundant patterns
> in the AArch64 backend. See the previous thread [1] for context.
>
> Testing:
>   * Bootstrapped and regtested on aarch64-none-linux-gnu,
>     x86_64-pc-linux-gnu.
>   * New unit test which checks that we're using the shift pattern (rather
>     than the problematic mult pattern) on AArch64.
>
> OK for master?
Yes. Thank you for working on this issue and the patch.
> Thanks,
> Alex
>
> [0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
> [1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html
>
> ---
>
> gcc/ChangeLog:
>
> 	* lra-constraints.c (canonicalize_reload_addr): New.
> 	(curr_insn_transform): Use canonicalize_reload_addr to ensure we
> 	generate canonical RTL for an address reload.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/mem-shift-canonical.c: New test.
>
Richard Sandiford Aug. 26, 2020, 9:06 a.m. UTC | #2
Alex Coplan <alex.coplan@arm.com> writes:
> Hello,
>
> Inside a (mem) RTX, it is canonical to write multiplications by powers
> of two using a (mult) [0]. For example, given the following C function:
>
> long f(long *p, long x)
> {
>     return p[x];
> }
>
> AArch64 GCC generates the following RTL insn (in final):
>
> (set (reg/i:DI 0 x0)
>      (mem:DI (plus:DI (mult:DI (reg:DI 1 x1 [101])
>                  (const_int 8 [0x8]))
>              (reg:DI 0 x0 [100])) [1 *_3+0 S8 A64]))
>
> However, outside of a (mem), the canonical way to write multiplications
> by powers of two is using (ashift). E.g. for the following C function:
>
> long *g(long *p, long x)
> {
>     return p + x;
> }
>
> AArch64 GCC generates:
>
> (set (reg/i:DI 0 x0)
>      (plus:DI (ashift:DI (reg:DI 1 x1 [100])
>              (const_int 3 [0x3]))
>          (reg:DI 0 x0 [99])))
>
> Now I observed that LRA does not quite respect this RTL canonicalization
> rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c with -Os
> -ftree-vectorize, the RTL in the dump "281r.ira" has the insn:
>
> (set (reg:SI 111 [ MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B] ])
>      (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
>                  (const_int 4 [0x4]))
>              (reg/v/f:DI 105 [ b ])) [2 MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B]+0 S4 A32]))
>
> but LRA then proceeds to generate a reload, and we get the following
> non-canonical insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
>      (plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
>              (const_int 4 [0x4]))
>          (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> This patch fixes LRA to ensure that we generate canonical RTL in this
> case. After the patch, we get the following insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
>         (plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
>                 (const_int 2 [0x2]))
>             (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> The motivation here is to be able to remove several redundant patterns
> in the AArch64 backend. See the previous thread [1] for context.
>
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu,
>    x86_64-pc-linux-gnu.
>  * New unit test which checks that we're using the shift pattern (rather
>    than the problematic mult pattern) on AArch64.
>
> OK for master?
>
> Thanks,
> Alex
>
> [0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
> [1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html

Thanks for doing this.

>
> ---
>
> gcc/ChangeLog:
>
> 	* lra-constraints.c (canonicalize_reload_addr): New.
> 	(curr_insn_transform): Use canonicalize_reload_addr to ensure we
> 	generate canonical RTL for an address reload.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/mem-shift-canonical.c: New test.
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 421c453997b..630a4f167cd 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -570,6 +570,34 @@ init_curr_insn_input_reloads (void)
>    curr_insn_input_reloads_num = 0;
>  }
>  
> +/* The canonical form of an rtx inside a MEM is not necessarily the same as the
> +   canonical form of the rtx outside the MEM.  Fix this up in the case that
> +   we're reloading an address (and therefore pulling it outside a MEM).  */
> +static rtx canonicalize_reload_addr (rtx addr)

Minor nit, should be formatted as:

static rtx
canonicalize_reload_addr (rtx addr)

> +{
> +  const enum rtx_code code = GET_CODE (addr);
> +
> +  if (code == PLUS)
> +    {
> +      rtx inner = XEXP (addr, 0);
> +      if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
> +	{
> +	  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +	  const int pwr2 = exact_log2 (ci);
> +	  if (pwr2 > 0)
> +	    {
> +	      /* Rewrite this to use a shift instead, which is canonical
> +		 when outside of a MEM.  */
> +	      XEXP (addr, 0) = gen_rtx_ASHIFT (GET_MODE (inner),
> +					       XEXP (inner, 0),
> +					       GEN_INT (pwr2));
> +	    }
> +	}
> +    }

I don't think we should restrict this to (plus (mult X Y) Z),
since addresses can be more complicated than that.  One way to
search for all MULTs is:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, x, NONCONST)
    {
      rtx x = *iter;
      if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
        ...
    }

(Needs rtl-iter.h)

Thanks,
Richard
Vladimir Makarov Aug. 26, 2020, 1:19 p.m. UTC | #3
On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> Alex Coplan <alex.coplan@arm.com> writes:
>
> Minor nit, should be formatted as:
>
> static rtx
> canonicalize_reload_addr (rtx addr)
Sorry for missing this.  Alex, it should be fixed anyway.
>
> I don't think we should restrict this to (plus (mult X Y) Z),
> since addresses can be more complicated than that.  One way to
> search for all MULTs is:
>
>    subrtx_iterator::array_type array;
>    FOR_EACH_SUBRTX (iter, array, x, NONCONST)
>      {
>        rtx x = *iter;
>        if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
>          ...
>      }
>
> (Needs rtl-iter.h)

I agree it would be nice to handle the general case.  Alex, you can do 
this as a separate patch if you want.

Richard, thank you for looking at this patch too.
Alex Coplan Aug. 26, 2020, 3:15 p.m. UTC | #4
Thanks for the review, both.

On 26/08/2020 09:19, Vladimir Makarov wrote:
> 
> On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > Alex Coplan <alex.coplan@arm.com> writes:
> > 
> > Minor nit, should be formatted as:
> > 
> > static rtx
> > canonicalize_reload_addr (rtx addr)
> Sorry for missing this.  Alex, it should be fixed anyway.
> > 
> > > I don't think we should restrict this to (plus (mult X Y) Z),
> > since addresses can be more complicated than that.  One way to
> > search for all MULTs is:
> > 
> >    subrtx_iterator::array_type array;
> >    FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> >      {
> >        rtx x = *iter;
> >        if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> >          ...
> >      }
> > 
> > (Needs rtl-iter.h)
> 
> I agree it would be nice to handle the general case.  Alex, you can do
> this as a separate patch if you want.
> 
> Richard, thank you for looking at this patch too.
> 
> 

Please find a reworked version of the patch attached incorporating
Richard's feedback. Note that, since the walk needs to modify the
matched MULTs in place, the reworked patch uses the writable
FOR_EACH_SUBRTX_VAR variant rather than FOR_EACH_SUBRTX.

Testing:
 * Bootstrap and regtest on aarch64-none-linux-gnu,
   arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.

Is the updated patch OK for master?

Thanks,
Alex
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 421c453997b..580da9c3ed6 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -131,6 +131,7 @@
 #include "lra-int.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "rtl-iter.h"
 
 /* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current
    insn.  Remember that LRA_CURR_RELOAD_NUM is the number of emitted
@@ -570,6 +571,33 @@ init_curr_insn_input_reloads (void)
   curr_insn_input_reloads_num = 0;
 }
 
+/* The canonical form of an rtx inside a MEM is not necessarily the same as the
+   canonical form of the rtx outside the MEM.  Fix this up in the case that
+   we're reloading an address (and therefore pulling it outside a MEM).  */
+static rtx
+canonicalize_reload_addr (rtx addr)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, addr, NONCONST)
+    {
+      rtx x = *iter;
+      if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
+	{
+	  const HOST_WIDE_INT ci = INTVAL (XEXP (x, 1));
+	  const int pwr2 = exact_log2 (ci);
+	  if (pwr2 > 0)
+	    {
+	      /* Rewrite this to use a shift instead, which is canonical when
+		 outside of a MEM.  */
+	      PUT_CODE (x, ASHIFT);
+	      XEXP (x, 1) = GEN_INT (pwr2);
+	    }
+	}
+    }
+
+  return addr;
+}
+
 /* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already
    created input reload pseudo (only if TYPE is not OP_OUT).  Don't
    reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
@@ -4362,12 +4390,19 @@ curr_insn_transform (bool check_only_p)
 	    {
 	      rtx addr = *loc;
 	      enum rtx_code code = GET_CODE (addr);
-	      
+	      bool align_p = false;
+
 	      if (code == AND && CONST_INT_P (XEXP (addr, 1)))
-		/* (and ... (const_int -X)) is used to align to X bytes.  */
-		addr = XEXP (*loc, 0);
+		{
+		  /* (and ... (const_int -X)) is used to align to X bytes.  */
+		  align_p = true;
+		  addr = XEXP (*loc, 0);
+		}
+	      else
+		addr = canonicalize_reload_addr (addr);
+
 	      lra_emit_move (new_reg, addr);
-	      if (addr != *loc)
+	      if (align_p)
 		emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), new_reg, XEXP (*loc, 1)));
 	    }
 	  before = get_insns ();
diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
new file mode 100644
index 00000000000..36beed497a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -0,0 +1,27 @@
+/* This test is a copy of gcc.dg/torture/pr34330.c: here we are looking for
+   specific patterns being matched in the AArch64 backend.  */
+
+/* { dg-do compile } */
+/* { dg-options "-Os -ftree-vectorize -dp" } */
+
+
+struct T
+{
+  int t;
+  struct { short s1, s2, s3, s4; } *s;
+};
+
+void
+foo (int *a, int *b, int *c, int *d, struct T *e)
+{
+  int i;
+  for (i = 0; i < e->t; i++)
+    {
+      e->s[i].s1 = a[i];
+      e->s[i].s2 = b[i];
+      e->s[i].s3 = c[i];
+      e->s[i].s4 = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times "add_lsl_di" 3 } } */
Vladimir Makarov Aug. 27, 2020, 2:26 a.m. UTC | #5
On 2020-08-26 11:15 a.m., Alex Coplan wrote:
> Thanks for the review, both.
>
>
> Please find a reworked version of the patch attached incorporating
> Richard's feedback.
>
> Testing:
>   * Bootstrap and regtest on aarch64-none-linux-gnu,
>     arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.
>
> Is the updated patch OK for master?
>
Yes.  Thank you, Alex.
Christophe Lyon Aug. 28, 2020, 8:16 a.m. UTC | #6
Hi Alex,


On Wed, 26 Aug 2020 at 17:15, Alex Coplan <alex.coplan@arm.com> wrote:
>
> Thanks for the review, both.
>
> On 26/08/2020 09:19, Vladimir Makarov wrote:
> >
> > On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > > Alex Coplan <alex.coplan@arm.com> writes:
> > >
> > > Minor nit, should be formatted as:
> > >
> > > static rtx
> > > canonicalize_reload_addr (rtx addr)
> > Sorry for missing this.  Alex, it should be fixed anyway.
> > >
> > > I don't think we should restrict this to (plus (mult X Y) Z),
> > > since addresses can be more complicated than that.  One way to
> > > search for all MULTs is:
> > >
> > >    subrtx_iterator::array_type array;
> > >    FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> > >      {
> > >        rtx x = *iter;
> > >        if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> > >          ...
> > >      }
> > >
> > > (Needs rtl-iter.h)
> >
> > I agree it would be nice to handle the general case.  Alex, you can do
> > this as a separate patch if you want.
> >
> > Richard, thank you for looking at this patch too.
> >
> >
>
> Please find a reworked version of the patch attached incorporating
> Richard's feedback.
>
> Testing:
>  * Bootstrap and regtest on aarch64-none-linux-gnu,
>    arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.
>

Unfortunately, the new test fails on aarch64 with -mabi=ilp32:
FAIL: gcc.target/aarch64/mem-shift-canonical.c scan-assembler-times add_lsl_di 3

Christophe

> Is the updated patch OK for master?
>
> Thanks,
> Alex
Alex Coplan Aug. 28, 2020, 9:25 a.m. UTC | #7
Re: [PATCH] lra: Canonicalize mult to shift in address reloads

Hi Christophe,

On 28/08/2020 10:16, Christophe Lyon wrote:
> Hi Alex,
> 
> 
> On Wed, 26 Aug 2020 at 17:15, Alex Coplan <alex.coplan@arm.com> wrote:
> >
> > Thanks for the review, both.
> >
> > On 26/08/2020 09:19, Vladimir Makarov wrote:
> > >
> > > On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > > > Alex Coplan <alex.coplan@arm.com> writes:
> > > >
> > > > Minor nit, should be formatted as:
> > > >
> > > > static rtx
> > > > canonicalize_reload_addr (rtx addr)
> > > Sorry for missing this.  Alex, it should be fixed anyway.
> > > >
> > > > I don't think we should restrict this to (plus (mult X Y) Z),
> > > > since addresses can be more complicated than that.  One way to
> > > > search for all MULTs is:
> > > >
> > > >    subrtx_iterator::array_type array;
> > > >    FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> > > >      {
> > > >        rtx x = *iter;
> > > >        if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> > > >          ...
> > > >      }
> > > >
> > > > (Needs rtl-iter.h)
> > >
> > > I agree it would be nice to handle the general case.  Alex, you can do
> > > this as a separate patch if you want.
> > >
> > > Richard, thank you for looking at this patch too.
> > >
> > >
> >
> > Please find a reworked version of the patch attached incorporating
> > Richard's feedback.
> >
> > Testing:
> >  * Bootstrap and regtest on aarch64-none-linux-gnu,
> >    arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.
> >
> 
> Unfortunately, the new test fails on aarch64 with -mabi=ilp32:
> FAIL: gcc.target/aarch64/mem-shift-canonical.c scan-assembler-times add_lsl_di 3

Thanks for catching this. The test fails because it looks for a pattern
that can only be matched on LP64. I'm disabling the test on ILP32, since
the problematic mult pattern was never hit there, so there's nothing to
test.

Committing as obvious.

Thanks,
Alex

> 
> Christophe
> 

---

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/mem-shift-canonical.c: Skip on ILP32.

---

diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
index 36beed497a0..47479ffff29 100644
--- a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -3,6 +3,7 @@
 
 /* { dg-do compile } */
 /* { dg-options "-Os -ftree-vectorize -dp" } */
+/* { dg-require-effective-target lp64 } */
 
 
 struct T

Patch

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 421c453997b..630a4f167cd 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -570,6 +570,34 @@  init_curr_insn_input_reloads (void)
   curr_insn_input_reloads_num = 0;
 }
 
+/* The canonical form of an rtx inside a MEM is not necessarily the same as the
+   canonical form of the rtx outside the MEM.  Fix this up in the case that
+   we're reloading an address (and therefore pulling it outside a MEM).  */
+static rtx canonicalize_reload_addr (rtx addr)
+{
+  const enum rtx_code code = GET_CODE (addr);
+
+  if (code == PLUS)
+    {
+      rtx inner = XEXP (addr, 0);
+      if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
+	{
+	  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+	  const int pwr2 = exact_log2 (ci);
+	  if (pwr2 > 0)
+	    {
+	      /* Rewrite this to use a shift instead, which is canonical
+		 when outside of a MEM.  */
+	      XEXP (addr, 0) = gen_rtx_ASHIFT (GET_MODE (inner),
+					       XEXP (inner, 0),
+					       GEN_INT (pwr2));
+	    }
+	}
+    }
+
+  return addr;
+}
+
 /* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already
    created input reload pseudo (only if TYPE is not OP_OUT).  Don't
    reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
@@ -4362,12 +4390,19 @@  curr_insn_transform (bool check_only_p)
 	    {
 	      rtx addr = *loc;
 	      enum rtx_code code = GET_CODE (addr);
-	      
+	      bool align_p = false;
+
 	      if (code == AND && CONST_INT_P (XEXP (addr, 1)))
-		/* (and ... (const_int -X)) is used to align to X bytes.  */
-		addr = XEXP (*loc, 0);
+		{
+		  /* (and ... (const_int -X)) is used to align to X bytes.  */
+		  align_p = true;
+		  addr = XEXP (*loc, 0);
+		}
+	      else
+		addr = canonicalize_reload_addr (addr);
+
 	      lra_emit_move (new_reg, addr);
-	      if (addr != *loc)
+	      if (align_p)
 		emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), new_reg, XEXP (*loc, 1)));
 	    }
 	  before = get_insns ();
diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
new file mode 100644
index 00000000000..36beed497a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -0,0 +1,27 @@ 
+/* This test is a copy of gcc.dg/torture/pr34330.c: here we are looking for
+   specific patterns being matched in the AArch64 backend.  */
+
+/* { dg-do compile } */
+/* { dg-options "-Os -ftree-vectorize -dp" } */
+
+
+struct T
+{
+  int t;
+  struct { short s1, s2, s3, s4; } *s;
+};
+
+void
+foo (int *a, int *b, int *c, int *d, struct T *e)
+{
+  int i;
+  for (i = 0; i < e->t; i++)
+    {
+      e->s[i].s1 = a[i];
+      e->s[i].s2 = b[i];
+      e->s[i].s3 = c[i];
+      e->s[i].s4 = d[i];
+    }
+}
+
+/* { dg-final { scan-assembler-times "add_lsl_di" 3 } } */