diff mbox series

[PR98791] : IRA: Make sure allocno copy mode's are ordered

Message ID 0637fdd9-f396-57e5-ed4f-4d54594020c9@arm.com
State New
Headers show
Series [PR98791] : IRA: Make sure allocno copy mode's are ordered | expand

Commit Message

Andre Vieira (lists) Feb. 19, 2021, 10:53 a.m. UTC
Hi,

This patch makes sure that allocno copies are not created for unordered 
modes. The testcases in the PR highlighted a case where an allocno copy 
was being created for:
(insn 121 120 123 11 (parallel [
             (set (reg:VNx2QI 217)
                 (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 
0)))
             (clobber (scratch:VNx16BI))
         ]) 4750 {*vec_duplicatevnx2qi_reg}
      (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
         (nil)))

As the compiler detected that the vec_duplicate<mode>_reg pattern 
allowed the input and output operand to be of the same register class, 
it tried to create an allocno copy for these two operands, stripping 
subregs in the process. However, this meant that the copy was between 
VNx2QI and SI, which have unordered mode precisions.

So at compile time we do not know which of the two modes is smaller 
which is a requirement when updating allocno copy costs.

Regression tested on aarch64-linux-gnu.

Is this OK for trunk (and after a week backport to gcc-10) ?

Regards,
Andre


gcc/ChangeLog:
2021-02-19  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         PR rtl-optimization/98791
         * ira-conflicts.c (process_regs_for_copy): Don't create allocno 
copies for unordered modes.

gcc/testsuite/ChangeLog:
2021-02-19  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         PR rtl-optimization/98791
         * gcc.target/aarch64/sve/pr98791.c: New test.

Comments

Vladimir Makarov Feb. 19, 2021, 3:05 p.m. UTC | #1
On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote:
> Hi,
>
> This patch makes sure that allocno copies are not created for 
> unordered modes. The testcases in the PR highlighted a case where an 
> allocno copy was being created for:
> (insn 121 120 123 11 (parallel [
>             (set (reg:VNx2QI 217)
>                 (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 
> ]) 0)))
>             (clobber (scratch:VNx16BI))
>         ]) 4750 {*vec_duplicatevnx2qi_reg}
>      (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
>         (nil)))
>
> As the compiler detected that the vec_duplicate<mode>_reg pattern 
> allowed the input and output operand to be of the same register class, 
> it tried to create an allocno copy for these two operands, stripping 
> subregs in the process. However, this meant that the copy was between 
> VNx2QI and SI, which have unordered mode precisions.
>
> So at compile time we do not know which of the two modes is smaller 
> which is a requirement when updating allocno copy costs.
>
> Regression tested on aarch64-linux-gnu.
>
> Is this OK for trunk (and after a week backport to gcc-10) ?
>
OK.  Yes, it is wise to wait a bit and see how the patch behaves on the 
trunk before submitting it to gcc-10 branch.  Sometimes such changes can 
have quite unexpected consequences.  But I guess not in this is case.

Thank you for working on the issue.

> gcc/ChangeLog:
> 2021-02-19  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>
>         PR rtl-optimization/98791
>         * ira-conflicts.c (process_regs_for_copy): Don't create 
> allocno copies for unordered modes.
>
> gcc/testsuite/ChangeLog:
> 2021-02-19  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>
>         PR rtl-optimization/98791
>         * gcc.target/aarch64/sve/pr98791.c: New test.
>
Alex Coplan Feb. 22, 2021, 10:20 a.m. UTC | #2
Hi Andre,

Thanks for fixing this.

On 19/02/2021 10:53, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
> 
> This patch makes sure that allocno copies are not created for unordered
> modes. The testcases in the PR highlighted a case where an allocno copy was
> being created for:
> (insn 121 120 123 11 (parallel [
>             (set (reg:VNx2QI 217)
>                 (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0)))
>             (clobber (scratch:VNx16BI))
>         ]) 4750 {*vec_duplicatevnx2qi_reg}
>      (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
>         (nil)))
> 
> As the compiler detected that the vec_duplicate<mode>_reg pattern allowed
> the input and output operand to be of the same register class, it tried to
> create an allocno copy for these two operands, stripping subregs in the
> process. However, this meant that the copy was between VNx2QI and SI, which
> have unordered mode precisions.
> 
> So at compile time we do not know which of the two modes is smaller which is
> a requirement when updating allocno copy costs.
> 
> Regression tested on aarch64-linux-gnu.
> 
> Is this OK for trunk (and after a week backport to gcc-10) ?
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
> @@ -0,0 +1,12 @@
> +/* PR rtl-optimization/98791  */
> +/* { dg-do compile } */
> +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */
> +extern char a[], b[];
> +short c, d;
> +long *e;
> +void f() {
> +  for (int g; g < c; g += 1) {
> +    a[g] = d;
> +    b[g] = e[g];
> +  }
> +}

For the testcase, you might want to use the one I posted most recently:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3
which avoids the dependency on the aarch64-autovec-preference param
(which is in GCC 11 only) as this will simplify backporting.

But if it's preferable to have a testcase without SVE intrinsics for GCC
11 then we should stick with that.

Cheers,
Alex
Andre Vieira (lists) Feb. 22, 2021, 1:47 p.m. UTC | #3
Hi Alex,

On 22/02/2021 10:20, Alex Coplan wrote:
> For the testcase, you might want to use the one I posted most recently:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3
> which avoids the dependency on the aarch64-autovec-preference param
> (which is in GCC 11 only) as this will simplify backporting.
>
> But if it's preferable to have a testcase without SVE intrinsics for GCC
> 11 then we should stick with that.
I don't see any problem with having SVE intrinsics in the testcase, 
committed with your other test as I agree it makes the backport easier 
eventually.

Thanks for pointing that out.
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p,
       ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
       ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
 
-      if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2)
+      if (!allocnos_conflict_for_copy_p (a1, a2)
+	  && offset1 == offset2
+	  && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
+			GET_MODE_PRECISION (ALLOCNO_MODE (a2))))
 	{
 	  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
 				     ira_curr_loop_tree_node);
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc1f1831afb68ba70016cbe26f8f9273cfceafa8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/98791  */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize" } */
+#include <arm_sve.h>
+extern char a[11];
+extern long b[];
+void f() {
+  for (int d; d < 10; d++) {
+    a[d] = svaddv(svptrue_b8(), svdup_u8(0));
+    b[d] = 0;
+  }
+}
Andre Vieira (lists) March 10, 2021, 10:25 a.m. UTC | #4
On 19/02/2021 15:05, Vladimir Makarov wrote:
>
> On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote:
>> Hi,
>>
>> This patch makes sure that allocno copies are not created for 
>> unordered modes. The testcases in the PR highlighted a case where an 
>> allocno copy was being created for:
>> (insn 121 120 123 11 (parallel [
>>             (set (reg:VNx2QI 217)
>>                 (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 
>> ]) 0)))
>>             (clobber (scratch:VNx16BI))
>>         ]) 4750 {*vec_duplicatevnx2qi_reg}
>>      (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
>>         (nil)))
>>
>> As the compiler detected that the vec_duplicate<mode>_reg pattern 
>> allowed the input and output operand to be of the same register 
>> class, it tried to create an allocno copy for these two operands, 
>> stripping subregs in the process. However, this meant that the copy 
>> was between VNx2QI and SI, which have unordered mode precisions.
>>
>> So at compile time we do not know which of the two modes is smaller 
>> which is a requirement when updating allocno copy costs.
>>
>> Regression tested on aarch64-linux-gnu.
>>
>> Is this OK for trunk (and after a week backport to gcc-10) ?
>>
> OK.  Yes, it is wise to wait a bit and see how the patch behaves on 
> the trunk before submitting it to gcc-10 branch.  Sometimes such 
> changes can have quite unexpected consequences.  But I guess not in 
> this is case.
>
>
Is it OK to backport now? The committed patch applies cleanly and I 
regression tested it on gcc-10 branch for aarch64-linux-gnu.

Kind regards,

Andre
Vladimir Makarov March 10, 2021, 2:18 p.m. UTC | #5
On 2021-03-10 5:25 a.m., Andre Vieira (lists) wrote:
>
> On 19/02/2021 15:05, Vladimir Makarov wrote:
>>
>> On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote:
>>> Hi,
>>>
>>> This patch makes sure that allocno copies are not created for 
>>> unordered modes. The testcases in the PR highlighted a case where an 
>>> allocno copy was being created for:
>>> (insn 121 120 123 11 (parallel [
>>>             (set (reg:VNx2QI 217)
>>>                 (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 
>>> ]) 0)))
>>>             (clobber (scratch:VNx16BI))
>>>         ]) 4750 {*vec_duplicatevnx2qi_reg}
>>>      (expr_list:REG_DEAD (reg:SI 93 [ _2 ])
>>>         (nil)))
>>>
>>> As the compiler detected that the vec_duplicate<mode>_reg pattern 
>>> allowed the input and output operand to be of the same register 
>>> class, it tried to create an allocno copy for these two operands, 
>>> stripping subregs in the process. However, this meant that the copy 
>>> was between VNx2QI and SI, which have unordered mode precisions.
>>>
>>> So at compile time we do not know which of the two modes is smaller 
>>> which is a requirement when updating allocno copy costs.
>>>
>>> Regression tested on aarch64-linux-gnu.
>>>
>>> Is this OK for trunk (and after a week backport to gcc-10) ?
>>>
>> OK.  Yes, it is wise to wait a bit and see how the patch behaves on 
>> the trunk before submitting it to gcc-10 branch.  Sometimes such 
>> changes can have quite unexpected consequences.  But I guess not in 
>> this is case.
>>
>>
> Is it OK to backport now? The committed patch applies cleanly and I 
> regression tested it on gcc-10 branch for aarch64-linux-gnu.
>
Yes.  I did not see any problems with the patch after its submitting.
diff mbox series

Patch

diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -275,7 +275,10 @@  process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p,
       ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)];
       ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)];
 
-      if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2)
+      if (!allocnos_conflict_for_copy_p (a1, a2)
+	  && offset1 == offset2
+	  && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)),
+			GET_MODE_PRECISION (ALLOCNO_MODE (a2))))
 	{
 	  cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn,
 				     ira_curr_loop_tree_node);
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
new file mode 100644
index 0000000000000000000000000000000000000000..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c
@@ -0,0 +1,12 @@ 
+/* PR rtl-optimization/98791  */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */
+extern char a[], b[];
+short c, d;
+long *e;
+void f() {
+  for (int g; g < c; g += 1) {
+    a[g] = d;
+    b[g] = e[g];
+  }
+}