[gcc-15,2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

Message ID 20240316173524.1147760-3-vineetg@rivosinc.com
State New
Series RISC-V improve stack/array access by constant mat tweak

Commit Message

Vineet Gupta March 16, 2024, 5:35 p.m. UTC
Noticed that the new sum-of-two-S12 splitter was generating the following:

| 059930 <tempnam>:
|   59930:	add	sp,sp,-16
|   59934:	lui	t0,0xfffff
|   59938:	sd	s0,0(sp)
|   5993c:	sd	ra,8(sp)
|   59940:	add	sp,sp,t0
|   59944:	add	s0,sp,2047  <----
|   59948:	mv	a2,a0
|   5994c:	mv	a3,a1
|   59950:	mv	a0,sp
|   59954:	li	a4,1
|   59958:	lui	a1,0x1
|   5995c:	add	s0,s0,1     <---
|   59960:	jal	59a3c

SP here becomes unaligned, even if only transitively, which is undesirable as
well as incorrect:
 - ABI requires stack to be 8 byte aligned
 - asm code looks weird and unexpected
 - to the user it might falsely seem like a compiler bug even when it is not,
   especially when staring at asm while debugging an unrelated issue.

Fix this by using the 2032+addend idiom when handling register operands
related to the stack. This only affects positive S12 values; negative values
are already -2048 based.

Unfortunately the implementation requires making a copy of the splitter, since
it needs different variants of the predicate and constraint, which can't be
selected conditionally in the same MD pattern (a constraint with a restricted
range prevents LRA from allowing such an insn despite the new predicate).

With the patch, we get the following correct code instead:

| ..
| 59944:	add	s0,sp,2032
| ..
| 5995c:	add	s0,s0,16

gcc/ChangeLog:
	* config/riscv/riscv.h: Add alignment arg to new macros.
	* config/riscv/constraints.md: Variant of new constraint.
	* config/riscv/predicates.md: Variant of new predicate.
	* config/riscv/riscv.md: Variant of new splitter which offsets
	off of 2032 (vs. 2047).
	Gate existing splitter behind riscv_reg_frame_related.
	* config/riscv/riscv.cc (riscv_reg_frame_related): New helper to
	conditionalize the existing and new splitters.
	* config/riscv/riscv-protos.h: Add new prototype.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/constraints.md |  6 ++++++
 gcc/config/riscv/predicates.md  |  8 ++++++-
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc       | 11 ++++++++++
 gcc/config/riscv/riscv.h        | 15 ++++++++-----
 gcc/config/riscv/riscv.md       | 37 ++++++++++++++++++++++++++++++---
 6 files changed, 69 insertions(+), 9 deletions(-)
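
The arithmetic behind the two splitter variants can be sketched in plain C.
This is only an illustration of the ranges described in the commit message,
not the GCC implementation: the names split_two_s12, s12_pair and fits_s12 are
made up for this sketch, and the range checks mirror the SUM_OF_TWO_S12_*
macros only as far as the commit message describes them.

```c
#include <stdbool.h>

/* Range of a RISC-V 12-bit signed immediate (S12).  */
#define S12_MIN (-2048)
#define S12_MAX 2047

/* Hypothetical type for this sketch: the two immediates of the split.  */
struct s12_pair {
  int first;   /* immediate of the first add */
  int second;  /* immediate of the second add */
};

static bool fits_s12(int v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* Split VAL into two S12 addends.  With ALIGNED set, the positive case
   peels off 2032 (a multiple of 16) instead of 2047.  Returns false when
   VAL is outside the range handled by the chosen variant.  */
static bool split_two_s12(int val, bool aligned, struct s12_pair *out)
{
  int base = aligned ? 2032 : 2047;

  if (val >= base + 1 && val <= base * 2) {
    out->first = base;
    out->second = val - base;
  } else if (val >= -4096 && val <= -2049) {
    /* Negative values are -2048 based in both variants.  */
    out->first = -2048;
    out->second = val + 2048;
  } else {
    return false;
  }
  return fits_s12(out->first) && fits_s12(out->second);
}
```

For the offset 2048 from the tempnam listing, the unaligned variant yields
2047 and 1, while the aligned variant yields 2032 and 16, matching the two
listings above.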

Comments

Jeff Law March 16, 2024, 8:21 p.m. UTC | #1
On 3/16/24 11:35 AM, Vineet Gupta wrote:
> Noticed that new sum of two s12 splitter was generating following:
> 
> | 059930 <tempnam>:
> |   59930:	add	sp,sp,-16
> |   59934:	lui	t0,0xfffff
> |   59938:	sd	s0,0(sp)
> |   5993c:	sd	ra,8(sp)
> |   59940:	add	sp,sp,t0
> |   59944:	add	s0,sp,2047  <----
> |   59948:	mv	a2,a0
> |   5994c:	mv	a3,a1
> |   59950:	mv	a0,sp
> |   59954:	li	a4,1
> |   59958:	lui	a1,0x1
> |   5995c:	add	s0,s0,1     <---
> |   59960:	jal	59a3c
> 
> SP here becomes unaligned, even if transitively which is undesirable as
> well as incorrect:
>   - ABI requires stack to be 8 byte aligned
>   - asm code looks weird and unexpected
>   - to the user it might falsely seem like a compiler bug even when not,
>     specially when staring at asm for debugging unrelated issue.
It's not ideal, but I think it's still ABI compliant as-is.  If it 
wasn't, then I suspect things like virtual origins in Ada couldn't be 
made ABI compliant.


> 
> Fix this by using 2032+addend idiom when handling register operands
> related to stack. This only affects positive S12 values, negative values
> are already -2048 based.
> 
> Unfortunately implementation requires making a copy of splitter, since
> it needs different varaints of predicate and constraint which cant be
> done conditionally in the same MD pattern (constraint with restricted
> range prevents LRA from allowing such insn despite new predicate)
> 
> With the patch, we get following correct code instead:
> 
> | ..
> | 59944:	add	s0,sp,2032
> | ..
> | 5995c:	add	s0,s0,16
Alternately you could tighten the positive side of the range of the 
splitter from patch 1/3 so that you could always use 2032 rather than 
2047 on the first addi.   ie instead of allowing 2048..4094, allow 
2048..4064.

I don't have a strong opinion on that vs the direction you've gone here.
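
A quick way to sanity-check the tightened range is a brute-force loop. The
sketch below is illustrative only; range_splits_with_2032 is a made-up helper
name, not something in the patch. It checks that every constant in a range can
be written as 2032 plus a second addend that is still a valid S12 immediate.

```c
#include <stdbool.h>

/* Returns true if every constant in [lo, hi] can be written as
   2032 + rest, with rest still a valid S12 immediate (-2048..2047).
   An empty range (lo > hi) trivially returns true.  */
static bool range_splits_with_2032(int lo, int hi)
{
  for (int val = lo; val <= hi; val++) {
    int rest = val - 2032;
    if (rest < -2048 || rest > 2047)
      return false;
  }
  return true;
}
```

Under these assumptions the range 2048..4064 passes, while 2048..4094 does not,
since 4094 - 2032 no longer fits in S12.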


Jeff
Vineet Gupta March 19, 2024, 12:27 a.m. UTC | #2
On 3/16/24 13:21, Jeff Law wrote:
> |   59944:	add	s0,sp,2047  <----
> |   59948:	mv	a2,a0
> |   5994c:	mv	a3,a1
> |   59950:	mv	a0,sp
> |   59954:	li	a4,1
> |   59958:	lui	a1,0x1
> |   5995c:	add	s0,s0,1     <---
> |   59960:	jal	59a3c
>
> SP here becomes unaligned, even if transitively which is undesirable as
> well as incorrect:
>   - ABI requires stack to be 8 byte aligned
>   - asm code looks weird and unexpected
>   - to the user it might falsely seem like a compiler bug even when not,
>     specially when staring at asm for debugging unrelated issue.
> It's not ideal, but I think it's still ABI compliant as-is.  If it 
> wasn't, then I suspect things like virtual origins in Ada couldn't be 
> made ABI compliant.

To be clear, are you suggesting ADD sp, sp, 2047 is ABI compliant?
I'd still like to avoid it as I'm sure someone will complain about it.

>> With the patch, we get following correct code instead:
>>
>> | ..
>> | 59944:	add	s0,sp,2032
>> | ..
>> | 5995c:	add	s0,s0,16
> Alternately you could tighten the positive side of the range of the 
> splitter from patch 1/3 so that you could always use 2032 rather than 
> 2047 on the first addi.   ie instead of allowing 2048..4094, allow 
> 2048..4064.

2033..4064 vs. 2048..4094

Yeah I was a bit split about this as well. Since you are OK with either,
I'll keep them as-is and perhaps add this observation to commitlog.

Thx,
-Vineet
Andrew Waterman March 19, 2024, 6:48 a.m. UTC | #3
On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 3/16/24 13:21, Jeff Law wrote:
> > |   59944:    add     s0,sp,2047  <----
> > |   59948:    mv      a2,a0
> > |   5994c:    mv      a3,a1
> > |   59950:    mv      a0,sp
> > |   59954:    li      a4,1
> > |   59958:    lui     a1,0x1
> > |   5995c:    add     s0,s0,1     <---
> > |   59960:    jal     59a3c
> >
> > SP here becomes unaligned, even if transitively which is undesirable as
> > well as incorrect:
> >   - ABI requires stack to be 8 byte aligned
> >   - asm code looks weird and unexpected
> >   - to the user it might falsely seem like a compiler bug even when not,
> >     specially when staring at asm for debugging unrelated issue.
> > It's not ideal, but I think it's still ABI compliant as-is.  If it
> > wasn't, then I suspect things like virtual origins in Ada couldn't be
> > made ABI compliant.
>
> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
> I'd still like to avoid it as I'm sure someone will complain about it.
>
> >> With the patch, we get following correct code instead:
> >>
> >> | ..
> >> | 59944:     add     s0,sp,2032
> >> | ..
> >> | 5995c:     add     s0,s0,16
> > Alternately you could tighten the positive side of the range of the
> > splitter from patch 1/3 so that you could always use 2032 rather than
> > 2047 on the first addi.   ie instead of allowing 2048..4094, allow
> > 2048..4064.
>
> 2033..4064 vs. 2048..4094
>
> Yeah I was a bit split about this as well. Since you are OK with either,
> I'll keep them as-is and perhaps add this observation to commitlog.

There's a subset of embedded use cases where an interrupt service
routine continues on the same stack as the interrupted thread,
requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
and with no important data on the stack at an address below sp).

Although not all use cases care about this property, it seems more
straightforward to maintain the invariant everywhere, rather than
selectively enforce it.

>
> Thx,
> -Vineet
Jeff Law March 19, 2024, 1:10 p.m. UTC | #4
On 3/19/24 12:48 AM, Andrew Waterman wrote:
> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>
>>
>>
>> On 3/16/24 13:21, Jeff Law wrote:
>>> |   59944:    add     s0,sp,2047  <----
>>> |   59948:    mv      a2,a0
>>> |   5994c:    mv      a3,a1
>>> |   59950:    mv      a0,sp
>>> |   59954:    li      a4,1
>>> |   59958:    lui     a1,0x1
>>> |   5995c:    add     s0,s0,1     <---
>>> |   59960:    jal     59a3c
>>>
>>> SP here becomes unaligned, even if transitively which is undesirable as
>>> well as incorrect:
>>>    - ABI requires stack to be 8 byte aligned
>>>    - asm code looks weird and unexpected
>>>    - to the user it might falsely seem like a compiler bug even when not,
>>>      specially when staring at asm for debugging unrelated issue.
>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
>>> made ABI compliant.
>>
>> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
>> I'd still like to avoid it as I'm sure someone will complain about it.
>>
>>>> With the patch, we get following correct code instead:
>>>>
>>>> | ..
>>>> | 59944:     add     s0,sp,2032
>>>> | ..
>>>> | 5995c:     add     s0,s0,16
>>> Alternately you could tighten the positive side of the range of the
>>> splitter from patch 1/3 so that you could always use 2032 rather than
>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
>>> 2048..4064.
>>
>> 2033..4064 vs. 2048..4094
>>
>> Yeah I was a bit split about this as well. Since you are OK with either,
>> I'll keep them as-is and perhaps add this observation to commitlog.
> 
> There's a subset of embedded use cases where an interrupt service
> routine continues on the same stack as the interrupted thread,
> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
> and with no important data on the stack at an address below sp).
> 
> Although not all use cases care about this property, it seems more
> straightforward to maintain the invariant everywhere, rather than
> selectively enforce it.
Just to be clear, the changes don't misalign the stack pointer at all. 
They merely have the potential to create *another* pointer into the 
stack which may or may not be aligned.  Which is totally normal, it's no 
different than taking the address of a char on the stack.
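
To make the distinction concrete, here is a small C model of the two adds from
the listing. It is a sketch with made-up names (split_add_trace,
trace_split_add, is_aligned), and the sp value used with it is arbitrary; it
only shows that sp itself is never written, while the derived pointer may pass
through an unaligned intermediate value.

```c
#include <stdbool.h>
#include <stdint.h>

/* ALIGN must be a power of two.  */
static bool is_aligned(uint64_t addr, uint64_t align)
{
  return (addr & (align - 1)) == 0;
}

/* Hypothetical record of the values seen while executing
     add s0,sp,FIRST
     add s0,s0,SECOND  */
struct split_add_trace {
  uint64_t sp;            /* stack pointer, never modified */
  uint64_t intermediate;  /* s0 after the first add */
  uint64_t final_ptr;     /* s0 after the second add */
};

static struct split_add_trace trace_split_add(uint64_t sp, int first, int second)
{
  struct split_add_trace t;

  t.sp = sp;
  t.intermediate = sp + (uint64_t)(int64_t)first;
  t.final_ptr = t.intermediate + (uint64_t)(int64_t)second;
  return t;
}
```

With a 16-byte aligned sp and the 2047/1 split, only the intermediate value is
unaligned; sp and the final pointer stay aligned. With the 2032/16 split all
three values are aligned.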

jeff
Vineet Gupta March 19, 2024, 8:05 p.m. UTC | #5
On 3/19/24 06:10, Jeff Law wrote:
> On 3/19/24 12:48 AM, Andrew Waterman wrote:
>> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>> On 3/16/24 13:21, Jeff Law wrote:
>>>> |   59944:    add     s0,sp,2047  <----
>>>> |   59948:    mv      a2,a0
>>>> |   5994c:    mv      a3,a1
>>>> |   59950:    mv      a0,sp
>>>> |   59954:    li      a4,1
>>>> |   59958:    lui     a1,0x1
>>>> |   5995c:    add     s0,s0,1     <---
>>>> |   59960:    jal     59a3c
>>>>
>>>> SP here becomes unaligned, even if transitively which is undesirable as
>>>> well as incorrect:
>>>>    - ABI requires stack to be 8 byte aligned
>>>>    - asm code looks weird and unexpected
>>>>    - to the user it might falsely seem like a compiler bug even when not,
>>>>      specially when staring at asm for debugging unrelated issue.
>>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
>>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
>>>> made ABI compliant.
>>> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
>>> I'd still like to avoid it as I'm sure someone will complain about it.
>>>
>>>>> With the patch, we get following correct code instead:
>>>>>
>>>>> | ..
>>>>> | 59944:     add     s0,sp,2032
>>>>> | ..
>>>>> | 5995c:     add     s0,s0,16
>>>> Alternately you could tighten the positive side of the range of the
>>>> splitter from patch 1/3 so that you could always use 2032 rather than
>>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
>>>> 2048..4064.
>>> 2033..4064 vs. 2048..4094
>>>
>>> Yeah I was a bit split about this as well. Since you are OK with either,
>>> I'll keep them as-is and perhaps add this observation to commitlog.
>> There's a subset of embedded use cases where an interrupt service
>> routine continues on the same stack as the interrupted thread,
>> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
>> and with no important data on the stack at an address below sp).
>>
>> Although not all use cases care about this property, it seems more
>> straightforward to maintain the invariant everywhere, rather than
>> selectively enforce it.
> Just to be clear, the changes don't misalign the stack pointer at all. 
> They merely have the potential to create *another* pointer into the 
> stack which may or may not be aligned.  Which is totally normal, it's no 
> different than taking the address of a char on the stack.

Right, I never saw any sp,sp,2047 getting generated, not even in the
first version of the patch, which lacked any filtering of stack regs via
riscv_reg_frame_related () and obviously didn't have the stack variant
of the splitter. I don't know if that is just luck and insufficient
testing exposure (I only spot-checked buildroot libc and vmlinux) or
whether something somewhere enforces that.

However, given that a misaligned pointer off of the stack is a non-issue, I
think we can do the following:

1. Keep just one splitter with 2047-based predicates and constraint (and
not 2032) for both stack-related and general regs.
2. Gate the splitter only on operands[0] not being stack related
(currently it checks for either [0] or [1]). This allows the prominent
case where SP is simply a src, and avoids any potential shenanigans
to SP itself.
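
The difference between the two gating conditions can be modelled roughly as
below. This is a simplified sketch, not the MD pattern condition itself: the
reg_kind enum and both function names are invented for illustration, and a
"stack related" register here stands for whatever riscv_reg_frame_related ()
would accept.

```c
#include <stdbool.h>

/* Hypothetical classification of an operand register.  */
enum reg_kind { REG_GENERAL, REG_STACK_RELATED };

/* Gate as posted in this patch: the 2047-based splitter is rejected if
   either the destination or the source is stack related.  */
static bool gate_as_posted(enum reg_kind dest, enum reg_kind src)
{
  return dest != REG_STACK_RELATED && src != REG_STACK_RELATED;
}

/* Gate as proposed above: only the destination matters, so a stack
   related source with a general destination is allowed.  */
static bool gate_proposed(enum reg_kind dest, enum reg_kind src)
{
  (void) src;  /* the source operand no longer affects the decision */
  return dest != REG_STACK_RELATED;
}
```

The only case where the two differ is a general destination with a stack
related source, which the proposed gate accepts and the posted one rejects.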

-Vineet
Andrew Waterman March 19, 2024, 8:58 p.m. UTC | #6
On Tue, Mar 19, 2024 at 1:05 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 3/19/24 06:10, Jeff Law wrote:
> > On 3/19/24 12:48 AM, Andrew Waterman wrote:
> >> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
> >>> On 3/16/24 13:21, Jeff Law wrote:
> >>>> |   59944:    add     s0,sp,2047  <----
> >>>> |   59948:    mv      a2,a0
> >>>> |   5994c:    mv      a3,a1
> >>>> |   59950:    mv      a0,sp
> >>>> |   59954:    li      a4,1
> >>>> |   59958:    lui     a1,0x1
> >>>> |   5995c:    add     s0,s0,1     <---
> >>>> |   59960:    jal     59a3c
> >>>>
> >>>> SP here becomes unaligned, even if transitively which is undesirable as
> >>>> well as incorrect:
> >>>>    - ABI requires stack to be 8 byte aligned
> >>>>    - asm code looks weird and unexpected
> >>>>    - to the user it might falsely seem like a compiler bug even when not,
> >>>>      specially when staring at asm for debugging unrelated issue.
> >>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
> >>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
> >>>> made ABI compliant.
> >>> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
> >>> I'd still like to avoid it as I'm sure someone will complain about it.
> >>>
> >>>>> With the patch, we get following correct code instead:
> >>>>>
> >>>>> | ..
> >>>>> | 59944:     add     s0,sp,2032
> >>>>> | ..
> >>>>> | 5995c:     add     s0,s0,16
> >>>> Alternately you could tighten the positive side of the range of the
> >>>> splitter from patch 1/3 so that you could always use 2032 rather than
> >>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
> >>>> 2048..4064.
> >>> 2033..4064 vs. 2048..4094
> >>>
> >>> Yeah I was a bit split about this as well. Since you are OK with either,
> >>> I'll keep them as-is and perhaps add this observation to commitlog.
> >> There's a subset of embedded use cases where an interrupt service
> >> routine continues on the same stack as the interrupted thread,
> >> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
> >> and with no important data on the stack at an address below sp).
> >>
> >> Although not all use cases care about this property, it seems more
> >> straightforward to maintain the invariant everywhere, rather than
> >> selectively enforce it.
> > Just to be clear, the changes don't misalign the stack pointer at all.
> > They merely have the potential to create *another* pointer into the
> > stack which may or may not be aligned.  Which is totally normal, it's no
> > different than taking the address of a char on the stack.
>
> Right I never saw any sp,sp,2047 getting generated - not even in the
> first version of patch which lacked any filtering of stack regs via
> riscv_reg_frame_related () and obviously didn't have the stack variant
> of splitter. I don't know if that is just being lucky and not enough
> testing exposure (I only spot checked buildroot libc, vmlinux) or
> something somewhere enforces that.
>
> However given that misaligned pointer off of stack is a non-issue, I
> think we can do the following:
>
> 1. keep just one splitter with 2047 based predicates and constraint (and
> not 2032) for both stack-related and general regs.
> 2. gate the splitter on only operands[0] being not stack related
> (currently it checks for either [0] or [1]) - this allows the prominent
> case where SP is simply a src, and avoids when any potential shenanigans
> to SP itself.

Agreed.  I misread the original code (add s0,sp,2047 looks a lot like
add sp,sp,2047 from a quick glance on a cell phone).

>
> -Vineet
Palmer Dabbelt March 19, 2024, 9:17 p.m. UTC | #7
On Tue, 19 Mar 2024 13:05:54 PDT (-0700), Vineet Gupta wrote:
>
>
> On 3/19/24 06:10, Jeff Law wrote:
>> On 3/19/24 12:48 AM, Andrew Waterman wrote:
>>> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>>> On 3/16/24 13:21, Jeff Law wrote:
>>>>> |   59944:    add     s0,sp,2047  <----
>>>>> |   59948:    mv      a2,a0
>>>>> |   5994c:    mv      a3,a1
>>>>> |   59950:    mv      a0,sp
>>>>> |   59954:    li      a4,1
>>>>> |   59958:    lui     a1,0x1
>>>>> |   5995c:    add     s0,s0,1     <---
>>>>> |   59960:    jal     59a3c
>>>>>
>>>>> SP here becomes unaligned, even if transitively which is undesirable as
>>>>> well as incorrect:
>>>>>    - ABI requires stack to be 8 byte aligned

It's 16-byte aligned in the default ABI, for Q, but that doesn't really 
matter.

>>>>>    - asm code looks weird and unexpected
>>>>>    - to the user it might falsely seem like a compiler bug even when not,
>>>>>      specially when staring at asm for debugging unrelated issue.
>>>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
>>>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
>>>>> made ABI compliant.
>>>> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
>>>> I'd still like to avoid it as I'm sure someone will complain about it.
>>>>
>>>>>> With the patch, we get following correct code instead:
>>>>>>
>>>>>> | ..
>>>>>> | 59944:     add     s0,sp,2032
>>>>>> | ..
>>>>>> | 5995c:     add     s0,s0,16
>>>>> Alternately you could tighten the positive side of the range of the
>>>>> splitter from patch 1/3 so that you could always use 2032 rather than
>>>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
>>>>> 2048..4064.
>>>> 2033..4064 vs. 2048..4094
>>>>
>>>> Yeah I was a bit split about this as well. Since you are OK with either,
>>>> I'll keep them as-is and perhaps add this observation to commitlog.
>>> There's a subset of embedded use cases where an interrupt service
>>> routine continues on the same stack as the interrupted thread,
>>> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
>>> and with no important data on the stack at an address below sp).
>>>
>>> Although not all use cases care about this property, it seems more
>>> straightforward to maintain the invariant everywhere, rather than
>>> selectively enforce it.
>> Just to be clear, the changes don't misalign the stack pointer at all.
>> They merely have the potential to create *another* pointer into the
>> stack which may or may not be aligned.  Which is totally normal, it's no
>> different than taking the address of a char on the stack.

IIRC the "always"-ness of the stack pointer alignment has come up 
before, and we decided to keep it aligned for these embedded 
interrupt-related reasons that Andrew points out.  That's a bit 
different than most other ABI requirements, where we're just enforcing 
them on function boundaries.

That said, this one sounds like just a terminology issue: it's not SP
that's misaligned, but intermediate SP-based addressing calculations.
I'm not sure if there's a word for these SP-based intermediate values;
during last week's team meeting we came up with "stack anchors".

> Right I never saw any sp,sp,2047 getting generated - not even in the
> first version of patch which lacked any filtering of stack regs via
> riscv_reg_frame_related () and obviously didn't have the stack variant
> of splitter. I don't know if that is just being lucky and not enough
> testing exposure (I only spot checked buildroot libc, vmlinux) or
> something somewhere enforces that.

I guess we could run the tests with something like

    diff --git a/target/riscv/translate.c b/target/riscv/translate.c
    index ab18899122..e87cc83067 100644
    --- a/target/riscv/translate.c
    +++ b/target/riscv/translate.c
    @@ -320,12 +320,18 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_long diff)
      */
     static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
     {
    -    TCGv t;
    +    TCGv t, gpr;
    
         if (reg_num == 0) {
             return ctx->zero;
         }
    
    +    if (reg_num == 2) {
    +        gpr = tcg_temp_new();
    +        tcg_gen_andi_tl(gpr, cpu_gpr[reg_num], 0xF);
    +    } else
    +        gpr = cpu_gpr[reg_num];
    +
         switch (get_ol(ctx)) {
         case MXL_RV32:
             switch (ext) {
    @@ -333,11 +339,11 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
                 break;
             case EXT_SIGN:
                 t = tcg_temp_new();
    -            tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
    +            tcg_gen_ext32s_tl(t, gpr);
                 return t;
             case EXT_ZERO:
                 t = tcg_temp_new();
    -            tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
    +            tcg_gen_ext32u_tl(t, gpr);
                 return t;
             default:
                 g_assert_not_reached();
    @@ -349,7 +355,7 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
         default:
             g_assert_not_reached();
         }
    -    return cpu_gpr[reg_num];
    +    return gpr;
     }
    
     static TCGv get_gprh(DisasContext *ctx, int reg_num)

and see if anything blows up?  I'm not sure it's worth opening that can 
of worms, though...

> However given that misaligned pointer off of stack is a non-issue, I
> think we can do the following:
>
> 1. keep just one splitter with 2047 based predicates and constraint (and
> not 2032) for both stack-related and general regs.
> 2. gate the splitter on only operands[0] being not stack related
> (currently it checks for either [0] or [1]) - this allows the prominent
> case where SP is simply a src, and avoids when any potential shenanigans
> to SP itself.

That seems reasonable to me.

>
> -Vineet
Jeff Law March 20, 2024, 6:57 p.m. UTC | #8
On 3/19/24 2:05 PM, Vineet Gupta wrote:

>> Just to be clear, the changes don't misalign the stack pointer at all.
>> They merely have the potential to create *another* pointer into the
>> stack which may or may not be aligned.  Which is totally normal, it's no
>> different than taking the address of a char on the stack.
> 
> Right I never saw any sp,sp,2047 getting generated - not even in the
> first version of patch which lacked any filtering of stack regs via
> riscv_reg_frame_related () and obviously didn't have the stack variant
> of splitter. I don't know if that is just being lucky and not enough
> testing exposure (I only spot checked buildroot libc, vmlinux) or
> something somewhere enforces that.
> 
> However given that misaligned pointer off of stack is a non-issue, I
> think we can do the following:
> 
> 1. keep just one splitter with 2047 based predicates and constraint (and
> not 2032) for both stack-related and general regs.
> 2. gate the splitter on only operands[0] being not stack related
> (currently it checks for either [0] or [1]) - this allows the prominent
> case where SP is simply a src, and avoids when any potential shenanigans
> to SP itself.
Works for me.

Jeff
Jeff Law March 23, 2024, 6:05 a.m. UTC | #9
On 3/19/24 2:05 PM, Vineet Gupta wrote:
> 
> 
> On 3/19/24 06:10, Jeff Law wrote:
>> On 3/19/24 12:48 AM, Andrew Waterman wrote:
>>> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>>> On 3/16/24 13:21, Jeff Law wrote:
>>>>> |   59944:    add     s0,sp,2047  <----
>>>>> |   59948:    mv      a2,a0
>>>>> |   5994c:    mv      a3,a1
>>>>> |   59950:    mv      a0,sp
>>>>> |   59954:    li      a4,1
>>>>> |   59958:    lui     a1,0x1
>>>>> |   5995c:    add     s0,s0,1     <---
>>>>> |   59960:    jal     59a3c
>>>>>
>>>>> SP here becomes unaligned, even if transitively which is undesirable as
>>>>> well as incorrect:
>>>>>     - ABI requires stack to be 8 byte aligned
>>>>>     - asm code looks weird and unexpected
>>>>>     - to the user it might falsely seem like a compiler bug even when not,
>>>>>       specially when staring at asm for debugging unrelated issue.
>>>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
>>>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
>>>>> made ABI compliant.
>>>> To be clear are u suggesting ADD sp, sp, 2047 is ABI compliant ?
>>>> I'd still like to avoid it as I'm sure someone will complain about it.
>>>>
>>>>>> With the patch, we get following correct code instead:
>>>>>>
>>>>>> | ..
>>>>>> | 59944:     add     s0,sp,2032
>>>>>> | ..
>>>>>> | 5995c:     add     s0,s0,16
>>>>> Alternately you could tighten the positive side of the range of the
>>>>> splitter from patch 1/3 so that you could always use 2032 rather than
>>>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
>>>>> 2048..4064.
>>>> 2033..4064 vs. 2048..4094
>>>>
>>>> Yeah I was a bit split about this as well. Since you are OK with either,
>>>> I'll keep them as-is and perhaps add this observation to commitlog.
>>> There's a subset of embedded use cases where an interrupt service
>>> routine continues on the same stack as the interrupted thread,
>>> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
>>> and with no important data on the stack at an address below sp).
>>>
>>> Although not all use cases care about this property, it seems more
>>> straightforward to maintain the invariant everywhere, rather than
>>> selectively enforce it.
>> Just to be clear, the changes don't misalign the stack pointer at all.
>> They merely have the potential to create *another* pointer into the
>> stack which may or may not be aligned.  Which is totally normal, it's no
>> different than taking the address of a char on the stack.
> 
> Right I never saw any sp,sp,2047 getting generated - not even in the
> first version of patch which lacked any filtering of stack regs via
> riscv_reg_frame_related () and obviously didn't have the stack variant
> of splitter. I don't know if that is just being lucky and not enough
> testing exposure (I only spot checked buildroot libc, vmlinux) or
> something somewhere enforces that.
> 
> However given that misaligned pointer off of stack is a non-issue, I
> think we can do the following:
> 
> 1. keep just one splitter with 2047 based predicates and constraint (and
> not 2032) for both stack-related and general regs.
> 2. gate the splitter on only operands[0] being not stack related
> (currently it checks for either [0] or [1]) - this allows the prominent
> case where SP is simply a src, and avoids when any potential shenanigans
> to SP itself.
Sounds reasonable to me.

Jeff

Patch

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 435461180c7e..a9446c54ee45 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -86,6 +86,12 @@ 
        (ior (match_test "IN_RANGE (ival,  2048,  4094)")
 	    (match_test "IN_RANGE (ival, -4096, -2049)"))))
 
+(define_constraint "MiA"
+  "const can be represented as sum of two S12 values with first aligned."
+  (and (match_code "const_int")
+       (ior (match_test "IN_RANGE (ival,  2033,  4064)")
+	    (match_test "IN_RANGE (ival, -4096, -2049)"))))
+
 (define_constraint "Ds3"
   "@internal
    1, 2 or 3 immediate"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 89490339c2da..56f9919daafa 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -423,7 +423,13 @@ 
 (define_predicate "const_two_s12"
   (match_code "const_int")
 {
-  return SUM_OF_TWO_S12 (INTVAL (op));
+  return SUM_OF_TWO_S12 (INTVAL (op), false);
+})
+
+(define_predicate "const_two_s12_algn"
+  (match_code "const_int")
+{
+  return SUM_OF_TWO_S12 (INTVAL (op), true);
 })
 
 ;; CORE-V Predicates:
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b87355938052..f9e407bf5768 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -164,6 +164,7 @@  extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
+extern bool riscv_reg_frame_related (rtx);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 680c4a728e92..38aebefa2590 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6667,6 +6667,17 @@  riscv_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
   return (to == HARD_FRAME_POINTER_REGNUM || to == STACK_POINTER_REGNUM);
 }
 
+/* Helper to determine if reg X pertains to stack.  */
+bool
+riscv_reg_frame_related (rtx x)
+{
+  return REG_P (x)
+	 && (REGNO (x) == FRAME_POINTER_REGNUM
+	     || REGNO (x) == HARD_FRAME_POINTER_REGNUM
+	     || REGNO (x) == ARG_POINTER_REGNUM
+	     || REGNO (x) == VIRTUAL_STACK_VARS_REGNUM);
+}
+
 /* Implement INITIAL_ELIMINATION_OFFSET.  FROM is either the frame pointer
    or argument pointer.  TO is either the stack pointer or hard frame
    pointer.  */
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 817661058896..00964ccd81db 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -633,11 +633,16 @@  enum reg_class
 #define SUM_OF_TWO_S12_N(VALUE)						\
   (((VALUE) >= (-2048 * 2)) && ((VALUE) <= (-2048 - 1)))
 
-#define SUM_OF_TWO_S12_P(VALUE)						\
-  (((VALUE) >= ( 2047 + 1)) && ((VALUE) <= ( 2047 * 2)))
-
-#define SUM_OF_TWO_S12(VALUE)						\
-  (SUM_OF_TWO_S12_N (VALUE) || SUM_OF_TWO_S12_P (VALUE))
+/* Variant with first value 8 byte aligned if involving stack regs.  */
+#define SUM_OF_TWO_S12_P(VALUE, WANT_ALIGN)				\
+  ((WANT_ALIGN)								\
+    ? (((VALUE) >= (2032 + 1)) && ((VALUE) <= (2032 * 2)))		\
+	: (((VALUE) >= (2047 + 1)) && ((VALUE) <= (2047 * 2))))
+
+#define SUM_OF_TWO_S12(VALUE, WANT_ALIGN)				\
+  (SUM_OF_TWO_S12_N (VALUE)						\
+   || ((SUM_OF_TWO_S12_P (VALUE, false) && !(WANT_ALIGN))		\
+	|| (SUM_OF_TWO_S12_P (VALUE, true) && (WANT_ALIGN))))
 
 /* If this is a single bit mask, then we can load it with bseti.  Special
    handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 79fe861ef91f..cc8c3c653f3e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -760,16 +760,17 @@ 
   [(set (match_operand:P	 0 "register_operand" "=r,r")
 	(plus:P (match_operand:P 1 "register_operand" " r,r")
 		(match_operand:P 2 "const_two_s12"    " MiG,r")))]
-  ""
+  "!riscv_reg_frame_related (operands[0])
+    && !riscv_reg_frame_related (operands[1])"
   "add %0,%1,%2"
-  ""
+  "&& 1"
   [(set (match_dup 0)
 	(plus:P (match_dup 1) (match_dup 3)))
    (set (match_dup 0)
 	(plus:P (match_dup 0) (match_dup 4)))]
 {
   int val = INTVAL (operands[2]);
-  if (SUM_OF_TWO_S12_P (val))
+  if (SUM_OF_TWO_S12_P (val, false))
     {
        operands[3] = GEN_INT (2047);
        operands[4] = GEN_INT (val - 2047);
@@ -785,6 +786,36 @@ 
   [(set_attr "type" "arith")
    (set_attr "mode" "<P:MODE>")])
 
+(define_insn_and_split "*add<mode>3_const_sum_of_two_s12_stack"
+  [(set (match_operand:P	 0 "register_operand"   "=r,r")
+	(plus:P (match_operand:P 1 "register_operand"   " r,r")
+		(match_operand:P 2 "const_two_s12_algn" " MiA,r")))]
+  "riscv_reg_frame_related (operands[0])
+   || riscv_reg_frame_related (operands[1])"
+  "add %0,%1,%2"
+  "&& 1"
+  [(set (match_dup 0)
+	(plus:P (match_dup 1) (match_dup 3)))
+   (set (match_dup 0)
+	(plus:P (match_dup 0) (match_dup 4)))]
+{
+  int val = INTVAL (operands[2]);
+  if (SUM_OF_TWO_S12_P (val, true))
+    {
+       operands[3] = GEN_INT (2032);
+       operands[4] = GEN_INT (val - 2032);
+    }
+  else if (SUM_OF_TWO_S12_N (val))
+    {
+       operands[3] = GEN_INT (-2048);
+       operands[4] = GEN_INT (val + 2048);
+    }
+  else
+      gcc_unreachable ();
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "<P:MODE>")])
+
 (define_expand "addv<mode>4"
   [(set (match_operand:GPR           0 "register_operand" "=r,r")
 	(plus:GPR (match_operand:GPR 1 "register_operand" " r,r")