diff mbox series

[committed] Fix RISC-V missing stack tie

Message ID df557184-0b6c-4523-b2b1-dc02ee9512e9@ventanamicro.com
State New
Headers show
Series [committed] Fix RISC-V missing stack tie | expand

Commit Message

Jeff Law March 22, 2024, 2:47 a.m. UTC
As some of you know, Raphael has been working on stack-clash support for 
the RISC-V port.  A little while ago Florian reached out to us with an 
issue where glibc was failing its smoke test due to referencing an 
unallocated stack slot.

Without diving into the code in detail I (incorrectly) concluded it was 
a problem with the fallback of using Ada's stack-check paths due to not 
having stack-clash support.

Once enough stack-clash bits were ready I had Raphael review the code 
generated for Florian's test and we concluded the the original case from 
Florian was just wrong irrespective of stack clash/stack check.  While 
Raphael's stack-clash work will indirectly fix Florian's case, it really 
should also work without stack-clash.

In particular this code was called out by valgrind:

> 000000000003cb5e <realpath@@GLIBC_2.27>:
> __GI___realpath():
>    3cb5e:       81010113                addi    sp,sp,-2032
>    3cb62:       7d313423                sd      s3,1992(sp)
>    3cb66:       79fd                    lui     s3,0xfffff
>    3cb68:       7e813023                sd      s0,2016(sp)
>    3cb6c:       7c913c23                sd      s1,2008(sp)
>    3cb70:       7f010413                addi    s0,sp,2032
>    3cb74:       35098793                addi    a5,s3,848 # fffffffffffff350 <__libc_initial+0xffffffffffe8946a>
>    3cb78:       74fd                    lui     s1,0xfffff
>    3cb7a:       008789b3                add     s3,a5,s0
>    3cb7e:       f9048793                addi    a5,s1,-112 # ffffffffffffef90 <__libc_initial+0xffffffffffe890aa>
>    3cb82:       008784b3                add     s1,a5,s0
>    3cb86:       77fd                    lui     a5,0xfffff
>    3cb88:       7d413023                sd      s4,1984(sp)
>    3cb8c:       7b513c23                sd      s5,1976(sp)
>    3cb90:       7e113423                sd      ra,2024(sp)
>    3cb94:       7d213823                sd      s2,2000(sp)
>    3cb98:       7b613823                sd      s6,1968(sp)
>    3cb9c:       7b713423                sd      s7,1960(sp)
>    3cba0:       7b813023                sd      s8,1952(sp)
>    3cba4:       79913c23                sd      s9,1944(sp)
>    3cba8:       79a13823                sd      s10,1936(sp)
>    3cbac:       79b13423                sd      s11,1928(sp)
>    3cbb0:       34878793                addi    a5,a5,840 # fffffffffffff348 <__libc_initial+0xffffffffffe89462>
>    3cbb4:       40000713                li      a4,1024
>    3cbb8:       00132a17                auipc   s4,0x132
>    3cbbc:       ae0a3a03                ld      s4,-1312(s4) # 16e698 <__stack_chk_guard>
>    3cbc0:       01098893                addi    a7,s3,16
>    3cbc4:       42098693                addi    a3,s3,1056
>    3cbc8:       b8040a93                addi    s5,s0,-1152
>    3cbcc:       97a2                    add     a5,a5,s0
>    3cbce:       000a3603                ld      a2,0(s4)
>    3cbd2:       f8c43423                sd      a2,-120(s0)
>    3cbd6:       4601                    li      a2,0
>    3cbd8:       3d14b023                sd      a7,960(s1)
>    3cbdc:       3ce4b423                sd      a4,968(s1)
>    3cbe0:       7cd4b823                sd      a3,2000(s1)
>    3cbe4:       7ce4bc23                sd      a4,2008(s1)
>    3cbe8:       b7543823                sd      s5,-1168(s0)
>    3cbec:       b6e43c23                sd      a4,-1160(s0)
>    3cbf0:       e38c                    sd      a1,0(a5)
>    3cbf2:       b0010113                addi    sp,sp,-1280

In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If 
you chase the values around, you'll find it's a bit more than 1k into 
unallocated stack space.  It's also worth noting the final stack 
adjustment at 0x3cbf2.

While I haven't reproduced Florian's code exactly, I was able to get 
reasonably close and verify my suspicion that everything was fine before 
sched2 and incorrect after sched2.  It was also obvious at that point 
what had gone wrong -- we were missing a stack tie after the final stack 
pointer adjustment.

This patch adds the missing stack tie.

While not technically a regression, I shudder at the thought of chasing 
one of these issues down again in the wild.  Been there, done that.

Regression tested on rv64gc.  Verified the scheduler no longer mucked up 
realpath by hand.  Pushing to the trunk.

Jeff
commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Thu Mar 21 20:41:59 2024 -0600

    [committed] Fix RISC-V missing stack tie
    
    As some of you know, Raphael has been working on stack-clash support for the
    RISC-V port.  A little while ago Florian reached out to us with an issue where
    glibc was failing its smoke test due to referencing an unallocated stack slot.
    
    Without diving into the code in detail I (incorrectly) concluded it was a
    problem with the fallback of using Ada's stack-check paths due to not having
    stack-clash support.
    
    Once enough stack-clash bits were ready I had Raphael review the code generated
    for Florian's test and we concluded the the original case from Florian was just
    wrong irrespective of stack clash/stack check.  While Raphael's stack-clash
    work will indirectly fix Florian's case, it really should also work without
    stack-clash.
    
    In particular this code was called out by valgrind:
    
    > 000000000003cb5e <realpath@@GLIBC_2.27>:
    > __GI___realpath():
    >    3cb5e:       81010113                addi    sp,sp,-2032
    >    3cb62:       7d313423                sd      s3,1992(sp)
    >    3cb66:       79fd                    lui     s3,0xfffff
    >    3cb68:       7e813023                sd      s0,2016(sp)
    >    3cb6c:       7c913c23                sd      s1,2008(sp)
    >    3cb70:       7f010413                addi    s0,sp,2032
    >    3cb74:       35098793                addi    a5,s3,848 # fffffffffffff350 <__libc_initial+0xffffffffffe8946a>
    >    3cb78:       74fd                    lui     s1,0xfffff
    >    3cb7a:       008789b3                add     s3,a5,s0
    >    3cb7e:       f9048793                addi    a5,s1,-112 # ffffffffffffef90 <__libc_initial+0xffffffffffe890aa>
    >    3cb82:       008784b3                add     s1,a5,s0
    >    3cb86:       77fd                    lui     a5,0xfffff
    >    3cb88:       7d413023                sd      s4,1984(sp)
    >    3cb8c:       7b513c23                sd      s5,1976(sp)
    >    3cb90:       7e113423                sd      ra,2024(sp)
    >    3cb94:       7d213823                sd      s2,2000(sp)
    >    3cb98:       7b613823                sd      s6,1968(sp)
    >    3cb9c:       7b713423                sd      s7,1960(sp)
    >    3cba0:       7b813023                sd      s8,1952(sp)
    >    3cba4:       79913c23                sd      s9,1944(sp)
    >    3cba8:       79a13823                sd      s10,1936(sp)
    >    3cbac:       79b13423                sd      s11,1928(sp)
    >    3cbb0:       34878793                addi    a5,a5,840 # fffffffffffff348 <__libc_initial+0xffffffffffe89462>
    >    3cbb4:       40000713                li      a4,1024
    >    3cbb8:       00132a17                auipc   s4,0x132
    >    3cbbc:       ae0a3a03                ld      s4,-1312(s4) # 16e698 <__stack_chk_guard>
    >    3cbc0:       01098893                addi    a7,s3,16
    >    3cbc4:       42098693                addi    a3,s3,1056
    >    3cbc8:       b8040a93                addi    s5,s0,-1152
    >    3cbcc:       97a2                    add     a5,a5,s0
    >    3cbce:       000a3603                ld      a2,0(s4)
    >    3cbd2:       f8c43423                sd      a2,-120(s0)
    >    3cbd6:       4601                    li      a2,0
    >    3cbd8:       3d14b023                sd      a7,960(s1)
    >    3cbdc:       3ce4b423                sd      a4,968(s1)
    >    3cbe0:       7cd4b823                sd      a3,2000(s1)
    >    3cbe4:       7ce4bc23                sd      a4,2008(s1)
    >    3cbe8:       b7543823                sd      s5,-1168(s0)
    >    3cbec:       b6e43c23                sd      a4,-1160(s0)
    >    3cbf0:       e38c                    sd      a1,0(a5)
    >    3cbf2:       b0010113                addi    sp,sp,-1280
    In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If you
    chase the values around, you'll find it's a bit more than 1k into unallocated
    stack space.  It's also worth noting the final stack adjustment at 0x3cbf2.
    
    While I haven't reproduced Florian's code exactly, I was able to get reasonably
    close and verify my suspicion that everything was fine before sched2 and
    incorrect after sched2.  It was also obvious at that point what had gone wrong
    -- we were missing a stack tie after the final stack pointer adjustment.
    
    This patch adds the missing stack tie.
    
    While not technically a regression, I shudder at the thought of chasing one of
    these issues down again in the wild.  Been there, done that.
    
    Regression tested on rv64gc.  Verified the scheduler no longer mucked up
    realpath by hand.  Pushing to the trunk.
    
    gcc/
            * config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
            tie for scalable and final stack adjustment if needed.
    
            Co-authored-by: Raphael Zinsly <rzinsly@ventanamicro.com>
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 97350b8305e..1358c243898 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7365,7 +7365,15 @@  riscv_expand_prologue (void)
       /* Second step for constant frame.  */
       HOST_WIDE_INT constant_frame = remaining_size.to_constant ();
       if (constant_frame == 0)
-	return;
+	{
+	  /* We must have allocated stack space for the scalable frame.
+	     Emit a stack tie if we have a frame pointer so that the
+	     allocation is ordered WRT fp setup and subsequent writes
+	     into the frame.  */
+	  if (frame_pointer_needed)
+	    riscv_emit_stack_tie ();
+	  return;
+	}
 
       if (SMALL_OPERAND (-constant_frame))
 	{
@@ -7385,6 +7393,13 @@  riscv_expand_prologue (void)
 	  insn = gen_rtx_SET (stack_pointer_rtx, insn);
 	  riscv_set_frame_expr (insn);
 	}
+
+      /* We must have allocated the remainder of the stack frame.
+	 Emit a stack tie if we have a frame pointer so that the
+	 allocation is ordered WRT fp setup and subsequent writes
+	 into the frame.  */
+      if (frame_pointer_needed)
+	riscv_emit_stack_tie ();
     }
 }