diff mbox series

[AArch64] Limit movmem copies to TImode copies.

Message ID 20180813093307.GA19530@arm.com
State New
Headers show
Series [AArch64] Limit movmem copies to TImode copies. | expand

Commit Message

Tamar Christina Aug. 13, 2018, 9:33 a.m. UTC
Hi All,

On AArch64 we have integer modes larger than TImode, and while we can generate
moves for these they're not as efficient.

So instead make sure we limit the maximum we can copy to TImode.  This means
copying a 16 byte struct will issue 1 TImode copy, which will be done using a
single STP as we expect but an CImode sized copy won't issue CImode operations.

Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
Crosstested aarch4_be-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-13  Tamar Christina  <tamar.christina@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-13  Tamar Christina  <tamar.christina@arm.com>

 	* gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.

--

Comments

Thomas Preudhomme Aug. 13, 2018, 1:37 p.m. UTC | #1
Hi Tamar,

Thanks for your patch.

Just one comment about your ChangeLog entry for the testsuiet change:
shouldn't it mention that it is a new testcase? The patch you attached
seems to create the file.

Best regards,

Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina <tamar.christina@arm.com>
wrote:

> Hi All,
>
> On AArch64 we have integer modes larger than TImode, and while we can
> generate
> moves for these they're not as efficient.
>
> So instead make sure we limit the maximum we can copy to TImode.  This
> means
> copying a 16 byte struct will issue 1 TImode copy, which will be done
> using a
> single STP as we expect but an CImode sized copy won't issue CImode
> operations.
>
> Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
> Crosstested aarch4_be-none-elf and no issues.
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/
> 2018-08-13  Tamar Christina  <tamar.christina@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.
>
> gcc/testsuite/
> 2018-08-13  Tamar Christina  <tamar.christina@arm.com>
>
>         * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.
>
> --
>
Tamar Christina Aug. 13, 2018, 4:27 p.m. UTC | #2
Hi Thomas,

Thanks for the review.

I’ll correct the typo before committing if I have no other changes required by a maintainer.

Regards,
Tamar.

From: Thomas Preudhomme <thomas.preudhomme@linaro.org>
Sent: Monday, August 13, 2018 14:37
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

Hi Tamar,

Thanks for your patch.

Just one comment about your ChangeLog entry for the testsuiet change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file.

Best regards,

Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina <tamar.christina@arm.com<mailto:tamar.christina@arm.com>> wrote:
Hi All,

On AArch64 we have integer modes larger than TImode, and while we can generate
moves for these they're not as efficient.

So instead make sure we limit the maximum we can copy to TImode.  This means
copying a 16 byte struct will issue 1 TImode copy, which will be done using a
single STP as we expect but an CImode sized copy won't issue CImode operations.

Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
Crosstested aarch4_be-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>

        * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>

        * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.

--
Sudakshina Das Aug. 14, 2018, 12:34 p.m. UTC | #3
Hi Tamar

On 13/08/18 17:27, Tamar Christina wrote:
> Hi Thomas,
> 
> Thanks for the review.
> 
> I’ll correct the typo before committing if I have no other changes required by a maintainer.
> 
> Regards,
> Tamar.
> 

I am not a maintainer but I would like to point out something in your
patch. I think you test case will fail with -mabi=ilp32

FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors)
Excess errors:
/work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: 
warning: overflow in conversion from 'long
long int' to 'long int' changes value from '4073709551611' to
'2080555003' [-Woverflow]

We have had more such recent failures and James gave a very neat
way to make sure the mode comes out what you intend it to here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html

I would just ask you to change the data types accordingly and test it
with -mabi=ilp32.

Thanks
Sudi

> From: Thomas Preudhomme <thomas.preudhomme@linaro.org>
> Sent: Monday, August 13, 2018 14:37
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>
> Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.
> 
> Hi Tamar,
> 
> Thanks for your patch.
> 
> Just one comment about your ChangeLog entry for the testsuiet change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file.
> 
> Best regards,
> 
> Thomas
> 
> On Mon, 13 Aug 2018 at 10:33, Tamar Christina <tamar.christina@arm.com<mailto:tamar.christina@arm.com>> wrote:
> Hi All,
> 
> On AArch64 we have integer modes larger than TImode, and while we can generate
> moves for these they're not as efficient.
> 
> So instead make sure we limit the maximum we can copy to TImode.  This means
> copying a 16 byte struct will issue 1 TImode copy, which will be done using a
> single STP as we expect but an CImode sized copy won't issue CImode operations.
> 
> Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
> Crosstested aarch4_be-none-elf and no issues.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>
> 
>          * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.
> 
> gcc/testsuite/
> 2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>
> 
>          * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.
> 
> --
>
Tamar Christina Aug. 15, 2018, 12:55 p.m. UTC | #4
Hi All,

I'm updating the patch with the suggested changes and also fixing a bug with a boundary condition.

On AArch64 we have integer modes larger than TImode, and while we can generate
moves for these they're not as efficient.

So instead make sure we limit the maximum we can copy to TImode.  This means
copying a 16 byte struct will issue 1 TImode copy, which will be done using a
single STP as we expect but an CImode sized copy won't issue CImode operations.

I am also moving the residual code inside the if since smallest_mode_for_int may
trap if the mode doesn't exist.  And the only time we know the mode to exist for
sure is when the condition of the if is true.  This also saves repeated calls to
the iterator.

Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
Crosstested aarch4_be-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-15  Tamar Christina  <tamar.christina@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-15  Tamar Christina  <tamar.christina@arm.com>

 	* gcc.target/aarch64/large_struct_copy_2.c: New.

The 08/14/2018 13:34, Sudakshina Das wrote:
> Hi Tamar
> 
> On 13/08/18 17:27, Tamar Christina wrote:
> > Hi Thomas,
> > 
> > Thanks for the review.
> > 
> > I’ll correct the typo before committing if I have no other changes required by a maintainer.
> > 
> > Regards,
> > Tamar.
> > 
> 
> I am not a maintainer but I would like to point out something in your
> patch. I think you test case will fail with -mabi=ilp32
> 
> FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors)
> Excess errors:
> /work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: 
> warning: overflow in conversion from 'long
> long int' to 'long int' changes value from '4073709551611' to
> '2080555003' [-Woverflow]
> 
> We have had more such recent failures and James gave a very neat
> way to make sure the mode comes out what you intend it to here:
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html
> 
> I would just ask you to change the data types accordingly and test it
> with -mabi=ilp32.
> 
> Thanks
> Sudi
> 
> > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>
> > Sent: Monday, August 13, 2018 14:37
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>
> > Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.
> > 
> > Hi Tamar,
> > 
> > Thanks for your patch.
> > 
> > Just one comment about your ChangeLog entry for the testsuiet change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file.
> > 
> > Best regards,
> > 
> > Thomas
> > 
> > On Mon, 13 Aug 2018 at 10:33, Tamar Christina <tamar.christina@arm.com<mailto:tamar.christina@arm.com>> wrote:
> > Hi All,
> > 
> > On AArch64 we have integer modes larger than TImode, and while we can generate
> > moves for these they're not as efficient.
> > 
> > So instead make sure we limit the maximum we can copy to TImode.  This means
> > copying a 16 byte struct will issue 1 TImode copy, which will be done using a
> > single STP as we expect but an CImode sized copy won't issue CImode operations.
> > 
> > Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
> > Crosstested aarch4_be-none-elf and no issues.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/
> > 2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>
> > 
> >          * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.
> > 
> > gcc/testsuite/
> > 2018-08-13  Tamar Christina  <tamar.christina@arm.com<mailto:tamar.christina@arm.com>>
> > 
> >          * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.
> > 
> > --
> > 
> 

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e8d8104c066a265120ab776f7ab5a959d3512b6..9d677c6eead382aa469ab04edd4e432d4c2e0279 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15927,13 +15927,17 @@ aarch64_expand_movmem (rtx *operands)
   /* Convert n to bits to make the rest of the code simpler.  */
   n = n * BITS_PER_UNIT;
 
+  /* Maximum amount to copy in one go.  The AArch64 back-end has integer modes
+     larger than TImode, but we should not use them for loads/stores here.  */
+  const int copy_limit = GET_MODE_BITSIZE (TImode);
+
   while (n > 0)
     {
       /* Find the largest mode in which to do the copy in without over reading
 	 or writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-	if (GET_MODE_BITSIZE (mode_iter.require ()) <= n)
+	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
 	  cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
@@ -15947,10 +15951,10 @@ aarch64_expand_movmem (rtx *operands)
 	 cheaper.  i.e. less instructions to do so.  For instance doing a 15
 	 byte copy it's more efficient to do two overlapping 8 byte copies than
 	 8 + 6 + 1.  */
-      next_mode = smallest_mode_for_size (n, MODE_INT);
-      int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-      if (n > 0 && n_bits > n && n_bits <= 8 * BITS_PER_UNIT)
+      if (n > 0 && n <= 8 * BITS_PER_UNIT)
 	{
+	  next_mode = smallest_mode_for_size (n, MODE_INT);
+	  int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
 	  src = aarch64_move_pointer (src, (n - n_bits) / BITS_PER_UNIT);
 	  dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
 	  n = n_bits;
diff --git a/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c b/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..565434244e8296749a99bebc4e095065945d825e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned __attribute__((mode(DI))) uint64_t;
+
+struct S0 {
+  uint64_t f1;
+  uint64_t f2;
+  uint64_t f3;
+  uint64_t f4;
+  uint64_t f5;
+} a;
+struct S2 {
+  uint64_t f0;
+  uint64_t f2;
+  struct S0 f3;
+};
+
+void fn1 () {
+  struct S2 b = {0, 1, 7, 4073709551611, 4, 8, 7};
+  a = b.f3;
+}
+
+/* { dg-final { scan-assembler-times {ldp\s+x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {stp\s+x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-not {ld[1-3]} } } */
James Greenhalgh Aug. 29, 2018, 4:49 p.m. UTC | #5
On Wed, Aug 15, 2018 at 07:55:18AM -0500, Tamar Christina wrote:
> Hi All,
> 
> I'm updating the patch with the suggested changes and also fixing a bug with a boundary condition.
> 
> On AArch64 we have integer modes larger than TImode, and while we can generate
> moves for these they're not as efficient.
> 
> So instead make sure we limit the maximum we can copy to TImode.  This means
> copying a 16 byte struct will issue 1 TImode copy, which will be done using a
> single STP as we expect but an CImode sized copy won't issue CImode operations.
> 
> I am also moving the residual code inside the if since smallest_mode_for_int may
> trap if the mode doesn't exist.  And the only time we know the mode to exist for
> sure is when the condition of the if is true.  This also saves repeated calls to
> the iterator.
> 
> Bootstrapped and regtested on aarch4-none-linux-gnu and no issues.
> Crosstested aarch4_be-none-elf and no issues.
> 
> Ok for trunk?

OK.

Thanks,
James

> gcc/
> 2018-08-15  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.
> 
> gcc/testsuite/
> 2018-08-15  Tamar Christina  <tamar.christina@arm.com>
> 
>  	* gcc.target/aarch64/large_struct_copy_2.c: New.
>
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e8d8104c066a265120ab776f7ab5a959d3512b6..cdd8bca98f8c50a804986510144db9ecf911bf1e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15927,13 +15927,17 @@  aarch64_expand_movmem (rtx *operands)
   /* Convert n to bits to make the rest of the code simpler.  */
   n = n * BITS_PER_UNIT;
 
+  /* Maximum amount to copy in one go.  The AArch64 back-end has integer modes
+     larger than TImode, but we should not use them for loads/stores here.  */
+  const int copy_limit = GET_MODE_BITSIZE (TImode);
+
   while (n > 0)
     {
       /* Find the largest mode in which to do the copy in without over reading
 	 or writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-	if (GET_MODE_BITSIZE (mode_iter.require ()) <= n)
+	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
 	  cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
diff --git a/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c b/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..8380ce008e7ffd30b6d21d89dc5ff3a9fd395e9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct S0 {
+  signed f1;
+  long f2;
+  unsigned f3;
+  unsigned f4;
+  unsigned f5;
+} a;
+struct S2 {
+  long f0;
+  int f2;
+  struct S0 f3;
+};
+
+void fn1 () {
+  struct S2 b = {0, 1, 7, 4073709551611, 4, 8, 7};
+  a = b.f3;
+}
+
+/* { dg-final { scan-assembler-times {ldp\s+x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {stp\s+x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-not {ld[1-3]} } } */