diff mbox series

PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

Message ID 20200910034405.39647-1-kito.cheng@sifive.com
State New
Headers show
Series PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values. | expand

Commit Message

Kito Cheng Sept. 10, 2020, 3:44 a.m. UTC
In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, DECL/global variable has handled
misaligned stores, but it didn't handle PARALLEL values, and I refer the
other part of this function, I found the PARALLEL need handled by
emit_group_* functions, so I add a check, and using emit_group_store if
storing a PARALLEL value, also checked this change didn't break the
testcase(gcc.target/arm/unaligned-argument-3.c) added by the orginal changes.

For riscv64 target, struct S {int a; double b;} will pack into a parallel
value to return and it has TImode when misaligned access is supported,
however TImode required 16-byte align, but it only 8-byte align, so it go to
the misaligned stores handling, then it will try to generate move
instruction from a PARALLEL value.

Tested on following target without introduced new reguression:
  - riscv32/riscv64 elf
  - x86_64-linux
  - arm-eabi

gcc/ChangeLog:

	PR target/96759
	* expr.c (expand_assignment): Handle misaligned stores with PARALLEL
	value.

gcc/testsuite/ChangeLog:

	PR target/96759
	* g++.target/riscv/pr96759.C: New.
	* gcc.target/riscv/pr96759.c: New.
---
 gcc/expr.c                               |  9 ++++++++-
 gcc/testsuite/g++.target/riscv/pr96759.C |  8 ++++++++
 gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c

Comments

Richard Biener Sept. 10, 2020, 7:03 a.m. UTC | #1
On Thu, 10 Sep 2020, Kito Cheng wrote:

> In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, DECL/global variable has handled
> misaligned stores, but it didn't handle PARALLEL values, and I refer the
> other part of this function, I found the PARALLEL need handled by
> emit_group_* functions, so I add a check, and using emit_group_store if
> storing a PARALLEL value, also checked this change didn't break the
> testcase(gcc.target/arm/unaligned-argument-3.c) added by the orginal changes.
> 
> For riscv64 target, struct S {int a; double b;} will pack into a parallel
> value to return and it has TImode when misaligned access is supported,
> however TImode required 16-byte align, but it only 8-byte align, so it go to
> the misaligned stores handling, then it will try to generate move
> instruction from a PARALLEL value.
> 
> Tested on following target without introduced new reguression:
>   - riscv32/riscv64 elf
>   - x86_64-linux
>   - arm-eabi

riscv doesn't seem to have movmisalign and I don't see why movmisalign
should not support a TImode parallel RHS so at least you should
put this check after the icode != CODE_FOR_nothing check?

Also wouldn't it be better to support PARALLEL from within
store_bit_field?  After all this is a misaligned access and
since you didn't quote the RTL involved I'm guessing that
emit_group_store knows nothing about this fact.

Richard.

> gcc/ChangeLog:
> 
> 	PR target/96759
> 	* expr.c (expand_assignment): Handle misaligned stores with PARALLEL
> 	value.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR target/96759
> 	* g++.target/riscv/pr96759.C: New.
> 	* gcc.target/riscv/pr96759.c: New.
> ---
>  gcc/expr.c                               |  9 ++++++++-
>  gcc/testsuite/g++.target/riscv/pr96759.C |  8 ++++++++
>  gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +++++++++++++
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b3979..cc2bc52c457f 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -5173,7 +5173,14 @@ expand_assignment (tree to, tree from, bool nontemporal)
>        if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
>  	reg = flip_storage_order (mode, reg);
>  
> -      if (icode != CODE_FOR_nothing)
> +      if (GET_CODE (reg) == PARALLEL)
> +	{
> +	  push_temp_slots ();
> +	  emit_group_store (mem, reg, TREE_TYPE (from),
> +			    int_size_in_bytes (TREE_TYPE (from)));
> +	  pop_temp_slots ();
> +	}
> +      else if (icode != CODE_FOR_nothing)
>  	{
>  	  class expand_operand ops[2];
>  
> diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C b/gcc/testsuite/g++.target/riscv/pr96759.C
> new file mode 100644
> index 000000000000..673999a4baf7
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/pr96759.C
> @@ -0,0 +1,8 @@
> +/* { dg-options "-mno-strict-align -std=gnu++17" } */
> +/* { dg-do compile } */
> +struct S {
> +  int a;
> +  double b;
> +};
> +S GetNumbers();
> +auto [globalC, globalD] = GetNumbers();
> diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c b/gcc/testsuite/gcc.target/riscv/pr96759.c
> new file mode 100644
> index 000000000000..621c39196fca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr96759.c
> @@ -0,0 +1,13 @@
> +/* { dg-options "-mno-strict-align" } */
> +/* { dg-do compile } */
> +
> +struct S {
> +  int a;
> +  double b;
> +};
> +struct S GetNumbers();
> +struct S g;
> +
> +void foo(){
> +  g = GetNumbers();
> +}
>
Kito Cheng Sept. 10, 2020, 8:18 a.m. UTC | #2
Hi Richard:

Thanks for your review :)

> riscv doesn't seem to have movmisalign and I don't see why movmisalign
> should not support a TImode parallel RHS so at least you should
> put this check after the icode != CODE_FOR_nothing check?

RISC-V has an option `-mno-strict-align` to enable mis-aligned access,
but we didn't provide movmisalign pattern, I guess might be another issue,
I tried to add movmisalign pattern and then it will feed PARALLEL into that,
and got into a similar situation again, most targets are not except
got PARALLEL,
so that's the reason why I put before icode != CODE_FOR_nothing check.

> Also wouldn't it be better to support PARALLEL from within
> store_bit_field?
For the above reason, I think store_bit_field supports PARALLEL is  not enough.

> After all this is a misaligned access and
> since you didn't quote the RTL involved I'm guessing that
> emit_group_store knows nothing about this fact.

Oh, good point, how do you think about loading into temp and then
storing it in the DECL, so that we could use original logic to
handle misaligned store.

A PoC for this idea, didn't fully tested yet:

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b3979..6ef97740c764 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -5166,8 +5166,19 @@ expand_assignment (tree to, tree from, bool nontemporal)
         || targetm.slow_unaligned_access (mode, align)))
    {
      rtx reg, mem;
+      push_temp_slots ();

      reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+
+      if (GET_CODE (reg) == PARALLEL)
+       {
+         rtx temp = assign_stack_temp (mode,
+                                       GET_MODE_SIZE (mode));
+         emit_group_store (temp, reg, TREE_TYPE (from),
+                          int_size_in_bytes (TREE_TYPE (from)));
+         reg = temp;
+       }
+
      reg = force_not_mem (reg);
      mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
      if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
@@ -5186,6 +5197,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
      else
       store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
                        false);
+      pop_temp_slots ();
      return;
    }
Richard Biener Sept. 10, 2020, 9:09 a.m. UTC | #3
On Thu, 10 Sep 2020, Kito Cheng wrote:

> Hi Richard:
> 
> Thanks for your review :)
> 
> > riscv doesn't seem to have movmisalign and I don't see why movmisalign
> > should not support a TImode parallel RHS so at least you should
> > put this check after the icode != CODE_FOR_nothing check?
> 
> RISC-V has an option `-mno-strict-align` to enable mis-aligned access,
> but we didn't provide movmisalign pattern, I guess might be another issue,
> I tried to add movmisalign pattern and then it will feed PARALLEL into that,
> and got into a similar situation again, most targets are not except
> got PARALLEL,
> so that's the reason why I put before icode != CODE_FOR_nothing check.
> 
> > Also wouldn't it be better to support PARALLEL from within
> > store_bit_field?
> For the above reason, I think store_bit_field supports PARALLEL is  not enough.
> 
> > After all this is a misaligned access and
> > since you didn't quote the RTL involved I'm guessing that
> > emit_group_store knows nothing about this fact.
> 
> Oh, good point, how do you think about loading into temp and then
> storing it in the DECL, so that we could use original logic to
> handle misaligned store.
> 
> A PoC for this idea, didn't fully tested yet:
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b3979..6ef97740c764 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -5166,8 +5166,19 @@ expand_assignment (tree to, tree from, bool nontemporal)
>          || targetm.slow_unaligned_access (mode, align)))
>     {
>       rtx reg, mem;
> +      push_temp_slots ();
> 
>       reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);

I'd say there must be (wishful thinking) some expand_expr
modifier that guarantees this?  Alternatively use

     reg = force_reg (mode, reg);

instead of assign_stack_temp & emit_group_store?

That said, I don't know very much about the fancy ways to handle
stores from PARALLEL - but doesn't PARALLEL mean we can just take
any of its members as source?  How does 'reg' look like exactly
in your case?  What does 'to' expand to?

Richard.

> +
> +      if (GET_CODE (reg) == PARALLEL)
> +       {
> +         rtx temp = assign_stack_temp (mode,
> +                                       GET_MODE_SIZE (mode));
> +         emit_group_store (temp, reg, TREE_TYPE (from),
> +                          int_size_in_bytes (TREE_TYPE (from)));
> +         reg = temp;
> +       }
> +
>       reg = force_not_mem (reg);
>       mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
>       if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
> @@ -5186,6 +5197,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>       else
>        store_bit_field (mem, GET_MODE_BITSIZE (mode), 0, 0, 0, mode, reg,
>                         false);
> +      pop_temp_slots ();
>       return;
>     }
>
Eric Botcazou Sept. 10, 2020, 9:39 a.m. UTC | #4
> I'd say there must be (wishful thinking) some expand_expr
> modifier that guarantees this?  Alternatively use
> 
>      reg = force_reg (mode, reg);
> 
> instead of assign_stack_temp & emit_group_store?
> 
> That said, I don't know very much about the fancy ways to handle
> stores from PARALLEL - but doesn't PARALLEL mean we can just take
> any of its members as source?  How does 'reg' look like exactly
> in your case?  What does 'to' expand to?

No, PARALLEL describes a multi-register layout, typically a return value with 
aggregate type, and you need to use emit_group_store to decipher it properly.

emit_group_store should already be able to handle misaligned stores I think.
Kito Cheng Sept. 10, 2020, 10:04 a.m. UTC | #5
On Thu, Sep 10, 2020 at 5:41 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > I'd say there must be (wishful thinking) some expand_expr
> > modifier that guarantees this?  Alternatively use
> >
> >      reg = force_reg (mode, reg);

I did a try, seems like force_reg didn't handle PARALLE :(

> >
> > instead of assign_stack_temp & emit_group_store?
> >
> > That said, I don't know very much about the fancy ways to handle
> > stores from PARALLEL - but doesn't PARALLEL mean we can just take
> > any of its members as source?  How does 'reg' look like exactly
> > in your case?  What does 'to' expand to?

'reg' look like this;
(parallel:TI [
    (expr_list:REG_DEP_TRUE (reg:SI 72)
        (const_int 0 [0]))
    (expr_list:REG_DEP_TRUE (reg:DF 73)
        (const_int 8 [0x8]))])

'to' is global var decl as struct S {int a; double b;}, align to 4,
and TYPE_MODE is TImode,
and TImode requires 8 byte alignment.


>
> No, PARALLEL describes a multi-register layout, typically a return value with
> aggregate type, and you need to use emit_group_store to decipher it properly.
>
> emit_group_store should already be able to handle misaligned stores I think.
>
> --
> Eric Botcazou
>
>
Kito Cheng Sept. 10, 2020, 10:05 a.m. UTC | #6
Optimized version of the v2 patch, get rid of assign_stack_temp.

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b3979..5f744a6c1b8d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -5168,7 +5168,16 @@ expand_assignment (tree to, tree from, bool nontemporal)
      rtx reg, mem;

      reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
-      reg = force_not_mem (reg);
+
+      if (GET_CODE (reg) == PARALLEL)
+       {
+         rtx temp = gen_reg_rtx (mode);
+         emit_group_store (temp, reg, TREE_TYPE (from),
+                          int_size_in_bytes (TREE_TYPE (from)));
+         reg = temp;
+       }
+
+      reg = force_not_mem (mode, reg);
      mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
      if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
       reg = flip_storage_order (mode, reg);
diff mbox series

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b3979..cc2bc52c457f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -5173,7 +5173,14 @@  expand_assignment (tree to, tree from, bool nontemporal)
       if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
 	reg = flip_storage_order (mode, reg);
 
-      if (icode != CODE_FOR_nothing)
+      if (GET_CODE (reg) == PARALLEL)
+	{
+	  push_temp_slots ();
+	  emit_group_store (mem, reg, TREE_TYPE (from),
+			    int_size_in_bytes (TREE_TYPE (from)));
+	  pop_temp_slots ();
+	}
+      else if (icode != CODE_FOR_nothing)
 	{
 	  class expand_operand ops[2];
 
diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C b/gcc/testsuite/g++.target/riscv/pr96759.C
new file mode 100644
index 000000000000..673999a4baf7
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/pr96759.C
@@ -0,0 +1,8 @@ 
+/* { dg-options "-mno-strict-align -std=gnu++17" } */
+/* { dg-do compile } */
+struct S {
+  int a;
+  double b;
+};
+S GetNumbers();
+auto [globalC, globalD] = GetNumbers();
diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c b/gcc/testsuite/gcc.target/riscv/pr96759.c
new file mode 100644
index 000000000000..621c39196fca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr96759.c
@@ -0,0 +1,13 @@ 
+/* { dg-options "-mno-strict-align" } */
+/* { dg-do compile } */
+
+struct S {
+  int a;
+  double b;
+};
+struct S GetNumbers();
+struct S g;
+
+void foo(){
+  g = GetNumbers();
+}