[U-Boot,5/7] riscv: add support for multi-hart systems

Message ID 20190211221345.31980-6-lukas.auer@aisec.fraunhofer.de
State Superseded
Delegated to: Andes
Headers show
Series
  • SMP support for RISC-V
Related show

Commit Message

Auer, Lukas Feb. 11, 2019, 10:13 p.m.
On RISC-V, all harts boot independently. To be able to run on a
multi-hart system, U-Boot must be extended with the functionality to
manage all harts in the system. A new config option, CONFIG_MAIN_HART,
is used to select the hart U-Boot runs on. All other harts are halted.
U-Boot can delegate functions to them using smp_call_function().

Every hart has a valid pointer to the global data structure and a 8KiB
stack by default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.

Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
---

 arch/riscv/Kconfig           |  12 +++++
 arch/riscv/cpu/start.S       | 102 ++++++++++++++++++++++++++++++++++-
 arch/riscv/include/asm/csr.h |   1 +
 3 files changed, 114 insertions(+), 1 deletion(-)

Comments

Anup Patel Feb. 12, 2019, 1:48 a.m. | #1
> -----Original Message-----
> From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> Sent: Tuesday, February 12, 2019 3:44 AM
> To: u-boot@lists.denx.de
> Cc: Atish Patra <Atish.Patra@wdc.com>; Anup Patel
> <Anup.Patel@wdc.com>; Bin Meng <bmeng.cn@gmail.com>; Andreas
> Schwab <schwab@suse.de>; Palmer Dabbelt <palmer@sifive.com>;
> Alexander Graf <agraf@suse.de>; Lukas Auer
> <lukas.auer@aisec.fraunhofer.de>; Anup Patel <anup@brainfault.org>; Rick
> Chen <rick@andestech.com>; Baruch Siach <baruch@tkos.co.il>; Stefan
> Roese <sr@denx.de>
> Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> 
> On RISC-V, all harts boot independently. To be able to run on a multi-hart
> system, U-Boot must be extended with the functionality to manage all harts
> in the system. A new config option, CONFIG_MAIN_HART, is used to select
> the hart U-Boot runs on. All other harts are halted.
> U-Boot can delegate functions to them using smp_call_function().
> 
> Every hart has a valid pointer to the global data structure and a 8KiB stack by
> default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> 
> Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> ---
> 
>  arch/riscv/Kconfig           |  12 +++++
>  arch/riscv/cpu/start.S       | 102 ++++++++++++++++++++++++++++++++++-
>  arch/riscv/include/asm/csr.h |   1 +
>  3 files changed, 114 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> 3a51339c4d..af8d0f8d67 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -140,4 +140,16 @@ config SBI_IPI
>  	default y if RISCV_SMODE
>  	depends on SMP
> 
> +config MAIN_HART
> +	int "Main hart in system"
> +	default 0
> +	help
> +	  Some SoCs include harts of various sizes, some of which might not
> +	  be suitable for running U-Boot. CONFIG_MAIN_HART is used to
> select
> +	  the hart U-Boot runs on.

This config option can be avoided altogether if we have
lottery based system to select "Main HART" in start.S.

With the MAIN_HART config option in-place, every system
will have to pick a "Main HART". What if the "Main HART"
itself does not come online due to HW failure.

Regards,
Anup
Rick Chen Feb. 15, 2019, 6:51 a.m. | #2
Hi Lukas

> > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > Sent: Tuesday, February 12, 2019 6:14 AM
> > To: u-boot@lists.denx.de
> > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer Dabbelt;
> > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi Chen(陳建志); Baruch Siach;
> > Stefan Roese
> > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> >
> > On RISC-V, all harts boot independently. To be able to run on a multi-hart system,
> > U-Boot must be extended with the functionality to manage all harts in the
> > system. A new config option, CONFIG_MAIN_HART, is used to select the hart
> > U-Boot runs on. All other harts are halted.
> > U-Boot can delegate functions to them using smp_call_function().
> >
> > Every hart has a valid pointer to the global data structure and a 8KiB stack by
> > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> >
> > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > ---
> >
> >  arch/riscv/Kconfig           |  12 +++++
> >  arch/riscv/cpu/start.S       | 102 ++++++++++++++++++++++++++++++++++-
> >  arch/riscv/include/asm/csr.h |   1 +
> >  3 files changed, 114 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 3a51339c4d..af8d0f8d67
> > 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -140,4 +140,16 @@ config SBI_IPI
> >       default y if RISCV_SMODE
> >       depends on SMP
> >
> > +config MAIN_HART
> > +     int "Main hart in system"
> > +     default 0
> > +     help
> > +       Some SoCs include harts of various sizes, some of which might not
> > +       be suitable for running U-Boot. CONFIG_MAIN_HART is used to select
> > +       the hart U-Boot runs on.
> > +
> > +config STACK_SIZE_SHIFT
> > +     int
> > +     default 13
> > +
> >  endmenu
> > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S index
> > a30f6f7194..ce7230df37 100644
> > --- a/arch/riscv/cpu/start.S
> > +++ b/arch/riscv/cpu/start.S
> > @@ -13,6 +13,7 @@
> >  #include <config.h>
> >  #include <common.h>
> >  #include <elf.h>
> > +#include <asm/csr.h>
> >  #include <asm/encoding.h>
> >  #include <generated/asm-offsets.h>
> >

If u-boot boot from flash by itself in M-mode without any FSBL or gdb,
in this case there will be no chance to assign a0.
Can we add some code as below :

 _start:
+#ifdef CONFIG_RISCV_MMODE
+       csrr    a0, mhartid
+#endif
        /* save hart id and dtb pointer */
        mv      s0, a0
        mv      s1, a1

How do you think about it ?

Thanks
Rick


> > @@ -45,6 +46,23 @@ _start:
> >       /* mask all interrupts */
> >       csrw    MODE_PREFIX(ie), zero
> >
> > +#ifdef CONFIG_SMP
> > +     /* check if hart is within range */
> > +     /* s0: hart id */
> > +     li      t0, CONFIG_NR_CPUS
> > +     bge     s0, t0, hart_out_of_bounds_loop
> > +#endif
Auer, Lukas Feb. 17, 2019, 10:02 p.m. | #3
On Tue, 2019-02-12 at 01:48 +0000, Anup Patel wrote:
> > -----Original Message-----
> > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > Sent: Tuesday, February 12, 2019 3:44 AM
> > To: u-boot@lists.denx.de
> > Cc: Atish Patra <Atish.Patra@wdc.com>; Anup Patel
> > <Anup.Patel@wdc.com>; Bin Meng <bmeng.cn@gmail.com>; Andreas
> > Schwab <schwab@suse.de>; Palmer Dabbelt <palmer@sifive.com>;
> > Alexander Graf <agraf@suse.de>; Lukas Auer
> > <lukas.auer@aisec.fraunhofer.de>; Anup Patel <anup@brainfault.org>;
> > Rick
> > Chen <rick@andestech.com>; Baruch Siach <baruch@tkos.co.il>; Stefan
> > Roese <sr@denx.de>
> > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> > 
> > On RISC-V, all harts boot independently. To be able to run on a
> > multi-hart
> > system, U-Boot must be extended with the functionality to manage
> > all harts
> > in the system. A new config option, CONFIG_MAIN_HART, is used to
> > select
> > the hart U-Boot runs on. All other harts are halted.
> > U-Boot can delegate functions to them using smp_call_function().
> > 
> > Every hart has a valid pointer to the global data structure and a
> > 8KiB stack by
> > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > 
> > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > ---
> > 
> >  arch/riscv/Kconfig           |  12 +++++
> >  arch/riscv/cpu/start.S       | 102
> > ++++++++++++++++++++++++++++++++++-
> >  arch/riscv/include/asm/csr.h |   1 +
> >  3 files changed, 114 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > 3a51339c4d..af8d0f8d67 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -140,4 +140,16 @@ config SBI_IPI
> >  	default y if RISCV_SMODE
> >  	depends on SMP
> > 
> > +config MAIN_HART
> > +	int "Main hart in system"
> > +	default 0
> > +	help
> > +	  Some SoCs include harts of various sizes, some of which might
> > not
> > +	  be suitable for running U-Boot. CONFIG_MAIN_HART is used to
> > select
> > +	  the hart U-Boot runs on.
> 
> This config option can be avoided altogether if we have
> lottery based system to select "Main HART" in start.S.
> 
> With the MAIN_HART config option in-place, every system
> will have to pick a "Main HART". What if the "Main HART"
> itself does not come online due to HW failure.
> 

Good point, I did not consider this. I will add a lottery-based system
in the next version.

Thanks,
Lukas
Auer, Lukas Feb. 17, 2019, 10:06 p.m. | #4
Hi Rick,

On Fri, 2019-02-15 at 14:51 +0800, Rick Chen wrote:
> Hi Lukas
> 
> > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > To: u-boot@lists.denx.de
> > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > Dabbelt;
> > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi Chen(陳建志);
> > > Baruch Siach;
> > > Stefan Roese
> > > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> > > 
> > > On RISC-V, all harts boot independently. To be able to run on a
> > > multi-hart system,
> > > U-Boot must be extended with the functionality to manage all
> > > harts in the
> > > system. A new config option, CONFIG_MAIN_HART, is used to select
> > > the hart
> > > U-Boot runs on. All other harts are halted.
> > > U-Boot can delegate functions to them using smp_call_function().
> > > 
> > > Every hart has a valid pointer to the global data structure and a
> > > 8KiB stack by
> > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > 
> > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > ---
> > > 
> > >  arch/riscv/Kconfig           |  12 +++++
> > >  arch/riscv/cpu/start.S       | 102
> > > ++++++++++++++++++++++++++++++++++-
> > >  arch/riscv/include/asm/csr.h |   1 +
> > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > 3a51339c4d..af8d0f8d67
> > > 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -140,4 +140,16 @@ config SBI_IPI
> > >       default y if RISCV_SMODE
> > >       depends on SMP
> > > 
> > > +config MAIN_HART
> > > +     int "Main hart in system"
> > > +     default 0
> > > +     help
> > > +       Some SoCs include harts of various sizes, some of which
> > > might not
> > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is used
> > > to select
> > > +       the hart U-Boot runs on.
> > > +
> > > +config STACK_SIZE_SHIFT
> > > +     int
> > > +     default 13
> > > +
> > >  endmenu
> > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > index
> > > a30f6f7194..ce7230df37 100644
> > > --- a/arch/riscv/cpu/start.S
> > > +++ b/arch/riscv/cpu/start.S
> > > @@ -13,6 +13,7 @@
> > >  #include <config.h>
> > >  #include <common.h>
> > >  #include <elf.h>
> > > +#include <asm/csr.h>
> > >  #include <asm/encoding.h>
> > >  #include <generated/asm-offsets.h>
> > > 
> 
> If u-boot boot from flash by itself in M-mode without any FSBL or
> gdb,
> in this case there will be no chance to assign a0.
> Can we add some code as below :
> 
>  _start:
> +#ifdef CONFIG_RISCV_MMODE
> +       csrr    a0, mhartid
> +#endif
>         /* save hart id and dtb pointer */
>         mv      s0, a0
>         mv      s1, a1
> 
> How do you think about it ?
> 

Good point, thanks! Even without SMP support this could cause issues if
we pass an invalid hart ID to Linux. I will add a patch for this.

Thanks,
Lukas

> Thanks
> Rick
> 
> 
> > > @@ -45,6 +46,23 @@ _start:
> > >       /* mask all interrupts */
> > >       csrw    MODE_PREFIX(ie), zero
> > > 
> > > +#ifdef CONFIG_SMP
> > > +     /* check if hart is within range */
> > > +     /* s0: hart id */
> > > +     li      t0, CONFIG_NR_CPUS
> > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > +#endif
Rick Chen March 7, 2019, 9:30 a.m. | #5
Hi Lukas

> > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > Sent: Tuesday, February 12, 2019 6:14 AM
> > To: u-boot@lists.denx.de
> > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer Dabbelt;
> > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi Chen(陳建志); Baruch Siach;
> > Stefan Roese
> > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> >
> > On RISC-V, all harts boot independently. To be able to run on a multi-hart system,
> > U-Boot must be extended with the functionality to manage all harts in the
> > system. A new config option, CONFIG_MAIN_HART, is used to select the hart
> > U-Boot runs on. All other harts are halted.
> > U-Boot can delegate functions to them using smp_call_function().
> >
> > Every hart has a valid pointer to the global data structure and a 8KiB stack by
> > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> >
> > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > ---
> >
> >  arch/riscv/Kconfig           |  12 +++++
> >  arch/riscv/cpu/start.S       | 102 ++++++++++++++++++++++++++++++++++-
> >  arch/riscv/include/asm/csr.h |   1 +
> >  3 files changed, 114 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 3a51339c4d..af8d0f8d67
> > 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -140,4 +140,16 @@ config SBI_IPI
> >       default y if RISCV_SMODE
> >       depends on SMP
> >
> > +config MAIN_HART
> > +     int "Main hart in system"
> > +     default 0
> > +     help
> > +       Some SoCs include harts of various sizes, some of which might not
> > +       be suitable for running U-Boot. CONFIG_MAIN_HART is used to select
> > +       the hart U-Boot runs on.
> > +
> > +config STACK_SIZE_SHIFT
> > +     int
> > +     default 13
> > +
> >  endmenu
> > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S index
> > a30f6f7194..ce7230df37 100644
> > --- a/arch/riscv/cpu/start.S
> > +++ b/arch/riscv/cpu/start.S
> > @@ -13,6 +13,7 @@
> >  #include <config.h>
> >  #include <common.h>
> >  #include <elf.h>
> > +#include <asm/csr.h>
> >  #include <asm/encoding.h>
> >  #include <generated/asm-offsets.h>
> >
> > @@ -45,6 +46,23 @@ _start:
> >       /* mask all interrupts */
> >       csrw    MODE_PREFIX(ie), zero
> >
> > +#ifdef CONFIG_SMP
> > +     /* check if hart is within range */
> > +     /* s0: hart id */
> > +     li      t0, CONFIG_NR_CPUS
> > +     bge     s0, t0, hart_out_of_bounds_loop
> > +#endif
> > +
> > +#ifdef CONFIG_SMP
> > +     /* set xSIE bit to receive IPIs */
> > +#ifdef CONFIG_RISCV_MMODE
> > +     li      t0, MIE_MSIE
> > +#else
> > +     li      t0, SIE_SSIE
> > +#endif
> > +     csrs    MODE_PREFIX(ie), t0
> > +#endif
> > +
> >  /*
> >   * Set stackpointer in internal/ex RAM to call board_init_f
> >   */
> > @@ -56,7 +74,25 @@ call_board_init_f:
> >  call_board_init_f_0:
> >       mv      a0, sp
> >       jal     board_init_f_alloc_reserve
> > +
> > +     /*
> > +      * Set global data pointer here for all harts, uninitialized at this
> > +      * point.
> > +      */
> > +     mv      gp, a0
> > +
> > +     /* setup stack */
> > +#ifdef CONFIG_SMP
> > +     /* s0: hart id */
> > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > +     sub     sp, a0, t0
> > +#else
> >       mv      sp, a0
> > +#endif
> > +
> > +     /* Continue on main hart, others branch to secondary_hart_loop */
> > +     li      t0, CONFIG_MAIN_HART
> > +     bne     s0, t0, secondary_hart_loop
> >
> >       la      t0, prior_stage_fdt_address
> >       SREG    s1, 0(t0)
> > @@ -95,7 +131,14 @@ relocate_code:
> >   *Set up the stack
> >   */
> >  stack_setup:
> > +#ifdef CONFIG_SMP
> > +     /* s0: hart id */
> > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > +     sub     sp, s2, t0
> > +#else
> >       mv      sp, s2
> > +#endif
> > +
> >       la      t0, _start
> >       sub     t6, s4, t0              /* t6 <- relocation offset */
> >       beq     t0, s4, clear_bss       /* skip relocation */
> > @@ -175,13 +218,30 @@ clear_bss:
> >       add     t0, t0, t6              /* t0 <- rel __bss_start in RAM */
> >       la      t1, __bss_end           /* t1 <- rel __bss_end in FLASH */
> >       add     t1, t1, t6              /* t1 <- rel __bss_end in RAM */
> > -     beq     t0, t1, call_board_init_r
> > +     beq     t0, t1, relocate_secondary_harts
> >
> >  clbss_l:
> >       SREG    zero, 0(t0)             /* clear loop... */
> >       addi    t0, t0, REGBYTES
> >       bne     t0, t1, clbss_l
> >
> > +relocate_secondary_harts:
> > +#ifdef CONFIG_SMP
> > +     /* send relocation IPI */
> > +     la      t0, secondary_hart_relocate
> > +     add     a0, t0, t6
> > +
> > +     /* store relocation offset */
> > +     mv      s5, t6
> > +
> > +     mv      a1, s2
> > +     mv      a2, s3
> > +     jal     smp_call_function
> > +
> > +     /* restore relocation offset */
> > +     mv      t6, s5
> > +#endif
> > +
> >  /*
> >   * We are done. Do not return, instead branch to second part of board
> >   * initialization, now running from RAM.
> > @@ -202,3 +262,43 @@ call_board_init_r:
> >   * jump to it ...
> >   */
> >       jr      t4                      /* jump to board_init_r() */
> > +
> > +#ifdef CONFIG_SMP
> > +hart_out_of_bounds_loop:
> > +     /* Harts in this loop are out of bounds, increase CONFIG_NR_CPUS. */
> > +     wfi
> > +     j       hart_out_of_bounds_loop
> > +#endif
> > +
> > +#ifdef CONFIG_SMP
> > +/* SMP relocation entry */
> > +secondary_hart_relocate:
> > +     /* a1: new sp */
> > +     /* a2: new gd */
> > +     /* s0: hart id */
> > +
> > +     /* setup stack */
> > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > +     sub     sp, a1, t0
> > +
> > +     /* update global data pointer */
> > +     mv      gp, a2
> > +#endif
> > +
> > +secondary_hart_loop:
> > +     wfi
> > +
> > +#ifdef CONFIG_SMP
> > +     csrr    t0, MODE_PREFIX(ip)
> > +#ifdef CONFIG_RISCV_MMODE
> > +     andi    t0, t0, MIE_MSIE
> > +#else
> > +     andi    t0, t0, SIE_SSIE
> > +#endif
> > +     beqz    t0, secondary_hart_loop
> > +
> > +     mv      a0, s0
> > +     jal     handle_ipi

I found that s0 maybe corrupted after execute handle_ipi.
Because smp_function will be treated as a return function by compiler,
so compiler will generate codes to execute restore after smp_function().

But actually it is a no-return function. So there maybe no chance to execute
restore. And s0 will be corrupted somehow.

The usage of s0 in v2 flow seem the same as v1.
So I reply mail in v1 patch.

Thanks
Rick


> > +#endif
> > +
> > +     j       secondary_hart_loop
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index
> > 86136f542c..644e6baa15 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -46,6 +46,7 @@
> >  #endif
> >
> >  /* Interrupt Enable and Interrupt Pending flags */
> > +#define MIE_MSIE     _AC(0x00000008, UL) /* Software Interrupt Enable */
> >  #define SIE_SSIE     _AC(0x00000002, UL) /* Software Interrupt Enable */
> >  #define SIE_STIE     _AC(0x00000020, UL) /* Timer Interrupt Enable */
> >
> > --
> > 2.20.1
Auer, Lukas March 10, 2019, 1:58 p.m. | #6
Hi Rick,

On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> Hi Lukas
> 
> > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > To: u-boot@lists.denx.de
> > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > Dabbelt;
> > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi Chen(陳建志);
> > > Baruch Siach;
> > > Stefan Roese
> > > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> > > 
> > > On RISC-V, all harts boot independently. To be able to run on a
> > > multi-hart system,
> > > U-Boot must be extended with the functionality to manage all
> > > harts in the
> > > system. A new config option, CONFIG_MAIN_HART, is used to select
> > > the hart
> > > U-Boot runs on. All other harts are halted.
> > > U-Boot can delegate functions to them using smp_call_function().
> > > 
> > > Every hart has a valid pointer to the global data structure and a
> > > 8KiB stack by
> > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > 
> > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > ---
> > > 
> > >  arch/riscv/Kconfig           |  12 +++++
> > >  arch/riscv/cpu/start.S       | 102
> > > ++++++++++++++++++++++++++++++++++-
> > >  arch/riscv/include/asm/csr.h |   1 +
> > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > 3a51339c4d..af8d0f8d67
> > > 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -140,4 +140,16 @@ config SBI_IPI
> > >       default y if RISCV_SMODE
> > >       depends on SMP
> > > 
> > > +config MAIN_HART
> > > +     int "Main hart in system"
> > > +     default 0
> > > +     help
> > > +       Some SoCs include harts of various sizes, some of which
> > > might not
> > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is used
> > > to select
> > > +       the hart U-Boot runs on.
> > > +
> > > +config STACK_SIZE_SHIFT
> > > +     int
> > > +     default 13
> > > +
> > >  endmenu
> > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > index
> > > a30f6f7194..ce7230df37 100644
> > > --- a/arch/riscv/cpu/start.S
> > > +++ b/arch/riscv/cpu/start.S
> > > @@ -13,6 +13,7 @@
> > >  #include <config.h>
> > >  #include <common.h>
> > >  #include <elf.h>
> > > +#include <asm/csr.h>
> > >  #include <asm/encoding.h>
> > >  #include <generated/asm-offsets.h>
> > > 
> > > @@ -45,6 +46,23 @@ _start:
> > >       /* mask all interrupts */
> > >       csrw    MODE_PREFIX(ie), zero
> > > 
> > > +#ifdef CONFIG_SMP
> > > +     /* check if hart is within range */
> > > +     /* s0: hart id */
> > > +     li      t0, CONFIG_NR_CPUS
> > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > +#endif
> > > +
> > > +#ifdef CONFIG_SMP
> > > +     /* set xSIE bit to receive IPIs */
> > > +#ifdef CONFIG_RISCV_MMODE
> > > +     li      t0, MIE_MSIE
> > > +#else
> > > +     li      t0, SIE_SSIE
> > > +#endif
> > > +     csrs    MODE_PREFIX(ie), t0
> > > +#endif
> > > +
> > >  /*
> > >   * Set stackpointer in internal/ex RAM to call board_init_f
> > >   */
> > > @@ -56,7 +74,25 @@ call_board_init_f:
> > >  call_board_init_f_0:
> > >       mv      a0, sp
> > >       jal     board_init_f_alloc_reserve
> > > +
> > > +     /*
> > > +      * Set global data pointer here for all harts,
> > > uninitialized at this
> > > +      * point.
> > > +      */
> > > +     mv      gp, a0
> > > +
> > > +     /* setup stack */
> > > +#ifdef CONFIG_SMP
> > > +     /* s0: hart id */
> > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > +     sub     sp, a0, t0
> > > +#else
> > >       mv      sp, a0
> > > +#endif
> > > +
> > > +     /* Continue on main hart, others branch to
> > > secondary_hart_loop */
> > > +     li      t0, CONFIG_MAIN_HART
> > > +     bne     s0, t0, secondary_hart_loop
> > > 
> > >       la      t0, prior_stage_fdt_address
> > >       SREG    s1, 0(t0)
> > > @@ -95,7 +131,14 @@ relocate_code:
> > >   *Set up the stack
> > >   */
> > >  stack_setup:
> > > +#ifdef CONFIG_SMP
> > > +     /* s0: hart id */
> > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > +     sub     sp, s2, t0
> > > +#else
> > >       mv      sp, s2
> > > +#endif
> > > +
> > >       la      t0, _start
> > >       sub     t6, s4, t0              /* t6 <- relocation offset
> > > */
> > >       beq     t0, s4, clear_bss       /* skip relocation */
> > > @@ -175,13 +218,30 @@ clear_bss:
> > >       add     t0, t0, t6              /* t0 <- rel __bss_start in
> > > RAM */
> > >       la      t1, __bss_end           /* t1 <- rel __bss_end in
> > > FLASH */
> > >       add     t1, t1, t6              /* t1 <- rel __bss_end in
> > > RAM */
> > > -     beq     t0, t1, call_board_init_r
> > > +     beq     t0, t1, relocate_secondary_harts
> > > 
> > >  clbss_l:
> > >       SREG    zero, 0(t0)             /* clear loop... */
> > >       addi    t0, t0, REGBYTES
> > >       bne     t0, t1, clbss_l
> > > 
> > > +relocate_secondary_harts:
> > > +#ifdef CONFIG_SMP
> > > +     /* send relocation IPI */
> > > +     la      t0, secondary_hart_relocate
> > > +     add     a0, t0, t6
> > > +
> > > +     /* store relocation offset */
> > > +     mv      s5, t6
> > > +
> > > +     mv      a1, s2
> > > +     mv      a2, s3
> > > +     jal     smp_call_function
> > > +
> > > +     /* restore relocation offset */
> > > +     mv      t6, s5
> > > +#endif
> > > +
> > >  /*
> > >   * We are done. Do not return, instead branch to second part of
> > > board
> > >   * initialization, now running from RAM.
> > > @@ -202,3 +262,43 @@ call_board_init_r:
> > >   * jump to it ...
> > >   */
> > >       jr      t4                      /* jump to board_init_r()
> > > */
> > > +
> > > +#ifdef CONFIG_SMP
> > > +hart_out_of_bounds_loop:
> > > +     /* Harts in this loop are out of bounds, increase
> > > CONFIG_NR_CPUS. */
> > > +     wfi
> > > +     j       hart_out_of_bounds_loop
> > > +#endif
> > > +
> > > +#ifdef CONFIG_SMP
> > > +/* SMP relocation entry */
> > > +secondary_hart_relocate:
> > > +     /* a1: new sp */
> > > +     /* a2: new gd */
> > > +     /* s0: hart id */
> > > +
> > > +     /* setup stack */
> > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > +     sub     sp, a1, t0
> > > +
> > > +     /* update global data pointer */
> > > +     mv      gp, a2
> > > +#endif
> > > +
> > > +secondary_hart_loop:
> > > +     wfi
> > > +
> > > +#ifdef CONFIG_SMP
> > > +     csrr    t0, MODE_PREFIX(ip)
> > > +#ifdef CONFIG_RISCV_MMODE
> > > +     andi    t0, t0, MIE_MSIE
> > > +#else
> > > +     andi    t0, t0, SIE_SSIE
> > > +#endif
> > > +     beqz    t0, secondary_hart_loop
> > > +
> > > +     mv      a0, s0
> > > +     jal     handle_ipi
> 
> I found that s0 maybe corrupted after execute handle_ipi.
> Because smp_function will be treated as a return function by
> compiler,
> so compiler will generate codes to execute restore after
> smp_function().
> 
> But actually it is a no-return function. So there maybe no chance to
> execute
> restore. And s0 will be corrupted somehow.
> 
> The usage of s0 in v2 flow seem the same as v1.
> So I reply mail in v1 patch.
> 

Thanks for reporting this issue!
What compiler version are you using? I never saw this issue with GCC
8.2.0, because optimization removes the ret following the call to
smp_function(), it is called using jr. The registers are therefore
restored before jumping to smp_function().

This system is probably too fragile. A simple solution would be to
store the hart ID somewhere else, for example register tp. What do you
think?

Thanks,
Lukas
Anup Patel March 10, 2019, 2:54 p.m. | #7
On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
<lukas.auer@aisec.fraunhofer.de> wrote:
>
> Hi Rick,
>
> On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > Hi Lukas
> >
> > > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > To: u-boot@lists.denx.de
> > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > > Dabbelt;
> > > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi Chen(陳建志);
> > > > Baruch Siach;
> > > > Stefan Roese
> > > > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> > > >
> > > > On RISC-V, all harts boot independently. To be able to run on a
> > > > multi-hart system,
> > > > U-Boot must be extended with the functionality to manage all
> > > > harts in the
> > > > system. A new config option, CONFIG_MAIN_HART, is used to select
> > > > the hart
> > > > U-Boot runs on. All other harts are halted.
> > > > U-Boot can delegate functions to them using smp_call_function().
> > > >
> > > > Every hart has a valid pointer to the global data structure and a
> > > > 8KiB stack by
> > > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > >
> > > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > > ---
> > > >
> > > >  arch/riscv/Kconfig           |  12 +++++
> > > >  arch/riscv/cpu/start.S       | 102
> > > > ++++++++++++++++++++++++++++++++++-
> > > >  arch/riscv/include/asm/csr.h |   1 +
> > > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > > 3a51339c4d..af8d0f8d67
> > > > 100644
> > > > --- a/arch/riscv/Kconfig
> > > > +++ b/arch/riscv/Kconfig
> > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > >       default y if RISCV_SMODE
> > > >       depends on SMP
> > > >
> > > > +config MAIN_HART
> > > > +     int "Main hart in system"
> > > > +     default 0
> > > > +     help
> > > > +       Some SoCs include harts of various sizes, some of which
> > > > might not
> > > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is used
> > > > to select
> > > > +       the hart U-Boot runs on.
> > > > +
> > > > +config STACK_SIZE_SHIFT
> > > > +     int
> > > > +     default 13
> > > > +
> > > >  endmenu
> > > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > > index
> > > > a30f6f7194..ce7230df37 100644
> > > > --- a/arch/riscv/cpu/start.S
> > > > +++ b/arch/riscv/cpu/start.S
> > > > @@ -13,6 +13,7 @@
> > > >  #include <config.h>
> > > >  #include <common.h>
> > > >  #include <elf.h>
> > > > +#include <asm/csr.h>
> > > >  #include <asm/encoding.h>
> > > >  #include <generated/asm-offsets.h>
> > > >
> > > > @@ -45,6 +46,23 @@ _start:
> > > >       /* mask all interrupts */
> > > >       csrw    MODE_PREFIX(ie), zero
> > > >
> > > > +#ifdef CONFIG_SMP
> > > > +     /* check if hart is within range */
> > > > +     /* s0: hart id */
> > > > +     li      t0, CONFIG_NR_CPUS
> > > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > +     /* set xSIE bit to receive IPIs */
> > > > +#ifdef CONFIG_RISCV_MMODE
> > > > +     li      t0, MIE_MSIE
> > > > +#else
> > > > +     li      t0, SIE_SSIE
> > > > +#endif
> > > > +     csrs    MODE_PREFIX(ie), t0
> > > > +#endif
> > > > +
> > > >  /*
> > > >   * Set stackpointer in internal/ex RAM to call board_init_f
> > > >   */
> > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > >  call_board_init_f_0:
> > > >       mv      a0, sp
> > > >       jal     board_init_f_alloc_reserve
> > > > +
> > > > +     /*
> > > > +      * Set global data pointer here for all harts,
> > > > uninitialized at this
> > > > +      * point.
> > > > +      */
> > > > +     mv      gp, a0
> > > > +
> > > > +     /* setup stack */
> > > > +#ifdef CONFIG_SMP
> > > > +     /* s0: hart id */
> > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > +     sub     sp, a0, t0
> > > > +#else
> > > >       mv      sp, a0
> > > > +#endif
> > > > +
> > > > +     /* Continue on main hart, others branch to
> > > > secondary_hart_loop */
> > > > +     li      t0, CONFIG_MAIN_HART
> > > > +     bne     s0, t0, secondary_hart_loop
> > > >
> > > >       la      t0, prior_stage_fdt_address
> > > >       SREG    s1, 0(t0)
> > > > @@ -95,7 +131,14 @@ relocate_code:
> > > >   *Set up the stack
> > > >   */
> > > >  stack_setup:
> > > > +#ifdef CONFIG_SMP
> > > > +     /* s0: hart id */
> > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > +     sub     sp, s2, t0
> > > > +#else
> > > >       mv      sp, s2
> > > > +#endif
> > > > +
> > > >       la      t0, _start
> > > >       sub     t6, s4, t0              /* t6 <- relocation offset
> > > > */
> > > >       beq     t0, s4, clear_bss       /* skip relocation */
> > > > @@ -175,13 +218,30 @@ clear_bss:
> > > >       add     t0, t0, t6              /* t0 <- rel __bss_start in
> > > > RAM */
> > > >       la      t1, __bss_end           /* t1 <- rel __bss_end in
> > > > FLASH */
> > > >       add     t1, t1, t6              /* t1 <- rel __bss_end in
> > > > RAM */
> > > > -     beq     t0, t1, call_board_init_r
> > > > +     beq     t0, t1, relocate_secondary_harts
> > > >
> > > >  clbss_l:
> > > >       SREG    zero, 0(t0)             /* clear loop... */
> > > >       addi    t0, t0, REGBYTES
> > > >       bne     t0, t1, clbss_l
> > > >
> > > > +relocate_secondary_harts:
> > > > +#ifdef CONFIG_SMP
> > > > +     /* send relocation IPI */
> > > > +     la      t0, secondary_hart_relocate
> > > > +     add     a0, t0, t6
> > > > +
> > > > +     /* store relocation offset */
> > > > +     mv      s5, t6
> > > > +
> > > > +     mv      a1, s2
> > > > +     mv      a2, s3
> > > > +     jal     smp_call_function
> > > > +
> > > > +     /* restore relocation offset */
> > > > +     mv      t6, s5
> > > > +#endif
> > > > +
> > > >  /*
> > > >   * We are done. Do not return, instead branch to second part of
> > > > board
> > > >   * initialization, now running from RAM.
> > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > >   * jump to it ...
> > > >   */
> > > >       jr      t4                      /* jump to board_init_r()
> > > > */
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > +hart_out_of_bounds_loop:
> > > > +     /* Harts in this loop are out of bounds, increase
> > > > CONFIG_NR_CPUS. */
> > > > +     wfi
> > > > +     j       hart_out_of_bounds_loop
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > +/* SMP relocation entry */
> > > > +secondary_hart_relocate:
> > > > +     /* a1: new sp */
> > > > +     /* a2: new gd */
> > > > +     /* s0: hart id */
> > > > +
> > > > +     /* setup stack */
> > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > +     sub     sp, a1, t0
> > > > +
> > > > +     /* update global data pointer */
> > > > +     mv      gp, a2
> > > > +#endif
> > > > +
> > > > +secondary_hart_loop:
> > > > +     wfi
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > +     csrr    t0, MODE_PREFIX(ip)
> > > > +#ifdef CONFIG_RISCV_MMODE
> > > > +     andi    t0, t0, MIE_MSIE
> > > > +#else
> > > > +     andi    t0, t0, SIE_SSIE
> > > > +#endif
> > > > +     beqz    t0, secondary_hart_loop
> > > > +
> > > > +     mv      a0, s0
> > > > +     jal     handle_ipi
> >
> > I found that s0 maybe corrupted after execute handle_ipi.
> > Because smp_function will be treated as a return function by
> > compiler,
> > so compiler will generate codes to execute restore after
> > smp_function().
> >
> > But actually it is a no-return function. So there maybe no chance to
> > execute
> > restore. And s0 will be corrupted somehow.
> >
> > The usage of s0 in v2 flow seem the same as v1.
> > So I reply mail in v1 patch.
> >
>
> Thanks for reporting this issue!
> What compiler version are you using? I never saw this issue with GCC
> 8.2.0, because optimization removes the ret following the call to
> smp_function(), it is called using jr. The registers are therefore
> restored before jumping to smp_function().
>
> This system is probably too fragile. A simple solution would be to
> store the hart ID somewhere else, for example register tp. What do you
> think?

I think you can also use scratch/mscratch (if it is not used anywhere
else).

Regards,
Anup
Auer, Lukas March 10, 2019, 6:12 p.m. | #8
On Sun, 2019-03-10 at 20:24 +0530, Anup Patel wrote:
> On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
> <lukas.auer@aisec.fraunhofer.de> wrote:
> > Hi Rick,
> > 
> > On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > > Hi Lukas
> > > 
> > > > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > > To: u-boot@lists.denx.de
> > > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > > > Dabbelt;
> > > > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi
> > > > > Chen(陳建志);
> > > > > Baruch Siach;
> > > > > Stefan Roese
> > > > > Subject: [PATCH 5/7] riscv: add support for multi-hart
> > > > > systems
> > > > > 
> > > > > On RISC-V, all harts boot independently. To be able to run on
> > > > > a
> > > > > multi-hart system,
> > > > > U-Boot must be extended with the functionality to manage all
> > > > > harts in the
> > > > > system. A new config option, CONFIG_MAIN_HART, is used to
> > > > > select
> > > > > the hart
> > > > > U-Boot runs on. All other harts are halted.
> > > > > U-Boot can delegate functions to them using
> > > > > smp_call_function().
> > > > > 
> > > > > Every hart has a valid pointer to the global data structure
> > > > > and a
> > > > > 8KiB stack by
> > > > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > > > 
> > > > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > > > ---
> > > > > 
> > > > >  arch/riscv/Kconfig           |  12 +++++
> > > > >  arch/riscv/cpu/start.S       | 102
> > > > > ++++++++++++++++++++++++++++++++++-
> > > > >  arch/riscv/include/asm/csr.h |   1 +
> > > > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > > > 3a51339c4d..af8d0f8d67
> > > > > 100644
> > > > > --- a/arch/riscv/Kconfig
> > > > > +++ b/arch/riscv/Kconfig
> > > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > > >       default y if RISCV_SMODE
> > > > >       depends on SMP
> > > > > 
> > > > > +config MAIN_HART
> > > > > +     int "Main hart in system"
> > > > > +     default 0
> > > > > +     help
> > > > > +       Some SoCs include harts of various sizes, some of
> > > > > which
> > > > > might not
> > > > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is
> > > > > used
> > > > > to select
> > > > > +       the hart U-Boot runs on.
> > > > > +
> > > > > +config STACK_SIZE_SHIFT
> > > > > +     int
> > > > > +     default 13
> > > > > +
> > > > >  endmenu
> > > > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > > > index
> > > > > a30f6f7194..ce7230df37 100644
> > > > > --- a/arch/riscv/cpu/start.S
> > > > > +++ b/arch/riscv/cpu/start.S
> > > > > @@ -13,6 +13,7 @@
> > > > >  #include <config.h>
> > > > >  #include <common.h>
> > > > >  #include <elf.h>
> > > > > +#include <asm/csr.h>
> > > > >  #include <asm/encoding.h>
> > > > >  #include <generated/asm-offsets.h>
> > > > > 
> > > > > @@ -45,6 +46,23 @@ _start:
> > > > >       /* mask all interrupts */
> > > > >       csrw    MODE_PREFIX(ie), zero
> > > > > 
> > > > > +#ifdef CONFIG_SMP
> > > > > +     /* check if hart is within range */
> > > > > +     /* s0: hart id */
> > > > > +     li      t0, CONFIG_NR_CPUS
> > > > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > > > +#endif
> > > > > +
> > > > > +#ifdef CONFIG_SMP
> > > > > +     /* set xSIE bit to receive IPIs */
> > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > +     li      t0, MIE_MSIE
> > > > > +#else
> > > > > +     li      t0, SIE_SSIE
> > > > > +#endif
> > > > > +     csrs    MODE_PREFIX(ie), t0
> > > > > +#endif
> > > > > +
> > > > >  /*
> > > > >   * Set stackpointer in internal/ex RAM to call board_init_f
> > > > >   */
> > > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > > >  call_board_init_f_0:
> > > > >       mv      a0, sp
> > > > >       jal     board_init_f_alloc_reserve
> > > > > +
> > > > > +     /*
> > > > > +      * Set global data pointer here for all harts,
> > > > > uninitialized at this
> > > > > +      * point.
> > > > > +      */
> > > > > +     mv      gp, a0
> > > > > +
> > > > > +     /* setup stack */
> > > > > +#ifdef CONFIG_SMP
> > > > > +     /* s0: hart id */
> > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > +     sub     sp, a0, t0
> > > > > +#else
> > > > >       mv      sp, a0
> > > > > +#endif
> > > > > +
> > > > > +     /* Continue on main hart, others branch to
> > > > > secondary_hart_loop */
> > > > > +     li      t0, CONFIG_MAIN_HART
> > > > > +     bne     s0, t0, secondary_hart_loop
> > > > > 
> > > > >       la      t0, prior_stage_fdt_address
> > > > >       SREG    s1, 0(t0)
> > > > > @@ -95,7 +131,14 @@ relocate_code:
> > > > >   *Set up the stack
> > > > >   */
> > > > >  stack_setup:
> > > > > +#ifdef CONFIG_SMP
> > > > > +     /* s0: hart id */
> > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > +     sub     sp, s2, t0
> > > > > +#else
> > > > >       mv      sp, s2
> > > > > +#endif
> > > > > +
> > > > >       la      t0, _start
> > > > >       sub     t6, s4, t0              /* t6 <- relocation
> > > > > offset
> > > > > */
> > > > >       beq     t0, s4, clear_bss       /* skip relocation */
> > > > > @@ -175,13 +218,30 @@ clear_bss:
> > > > >       add     t0, t0, t6              /* t0 <- rel
> > > > > __bss_start in
> > > > > RAM */
> > > > >       la      t1, __bss_end           /* t1 <- rel __bss_end
> > > > > in
> > > > > FLASH */
> > > > >       add     t1, t1, t6              /* t1 <- rel __bss_end
> > > > > in
> > > > > RAM */
> > > > > -     beq     t0, t1, call_board_init_r
> > > > > +     beq     t0, t1, relocate_secondary_harts
> > > > > 
> > > > >  clbss_l:
> > > > >       SREG    zero, 0(t0)             /* clear loop... */
> > > > >       addi    t0, t0, REGBYTES
> > > > >       bne     t0, t1, clbss_l
> > > > > 
> > > > > +relocate_secondary_harts:
> > > > > +#ifdef CONFIG_SMP
> > > > > +     /* send relocation IPI */
> > > > > +     la      t0, secondary_hart_relocate
> > > > > +     add     a0, t0, t6
> > > > > +
> > > > > +     /* store relocation offset */
> > > > > +     mv      s5, t6
> > > > > +
> > > > > +     mv      a1, s2
> > > > > +     mv      a2, s3
> > > > > +     jal     smp_call_function
> > > > > +
> > > > > +     /* restore relocation offset */
> > > > > +     mv      t6, s5
> > > > > +#endif
> > > > > +
> > > > >  /*
> > > > >   * We are done. Do not return, instead branch to second part
> > > > > of
> > > > > board
> > > > >   * initialization, now running from RAM.
> > > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > > >   * jump to it ...
> > > > >   */
> > > > >       jr      t4                      /* jump to
> > > > > board_init_r()
> > > > > */
> > > > > +
> > > > > +#ifdef CONFIG_SMP
> > > > > +hart_out_of_bounds_loop:
> > > > > +     /* Harts in this loop are out of bounds, increase
> > > > > CONFIG_NR_CPUS. */
> > > > > +     wfi
> > > > > +     j       hart_out_of_bounds_loop
> > > > > +#endif
> > > > > +
> > > > > +#ifdef CONFIG_SMP
> > > > > +/* SMP relocation entry */
> > > > > +secondary_hart_relocate:
> > > > > +     /* a1: new sp */
> > > > > +     /* a2: new gd */
> > > > > +     /* s0: hart id */
> > > > > +
> > > > > +     /* setup stack */
> > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > +     sub     sp, a1, t0
> > > > > +
> > > > > +     /* update global data pointer */
> > > > > +     mv      gp, a2
> > > > > +#endif
> > > > > +
> > > > > +secondary_hart_loop:
> > > > > +     wfi
> > > > > +
> > > > > +#ifdef CONFIG_SMP
> > > > > +     csrr    t0, MODE_PREFIX(ip)
> > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > +     andi    t0, t0, MIE_MSIE
> > > > > +#else
> > > > > +     andi    t0, t0, SIE_SSIE
> > > > > +#endif
> > > > > +     beqz    t0, secondary_hart_loop
> > > > > +
> > > > > +     mv      a0, s0
> > > > > +     jal     handle_ipi
> > > 
> > > I found that s0 maybe corrupted after execute handle_ipi.
> > > Because smp_function will be treated as a return function by
> > > compiler,
> > > so compiler will generate codes to execute restore after
> > > smp_function().
> > > 
> > > But actually it is a no-return function. So there maybe no chance
> > > to
> > > execute
> > > restore. And s0 will be corrupted somehow.
> > > 
> > > The usage of s0 in v2 flow seem the same as v1.
> > > So I reply mail in v1 patch.
> > > 
> > 
> > Thanks for reporting this issue!
> > What compiler version are you using? I never saw this issue with
> > GCC
> > 8.2.0, because optimization removes the ret following the call to
> > smp_function(), it is called using jr. The registers are therefore
> > restored before jumping to smp_function().
> > 
> > This system is probably too fragile. A simple solution would be to
> > store the hart ID somewhere else, for example register tp. What do
> > you
> > think?
> 
> I think you can also use scratch/mscratch (if it is not used anywhere
> else).
> 

You are right, sscratch and mscratch are also not currently used in  
U-Boot. Is there an advantage to use the scratch CSRs instead of tp?

Thanks,
Lukas
Anup Patel March 10, 2019, 8:56 p.m. | #9
> -----Original Message-----
> From: U-Boot <u-boot-bounces@lists.denx.de> On Behalf Of Auer, Lukas
> Sent: Sunday, March 10, 2019 11:42 PM
> To: anup@brainfault.org
> Cc: rickchen36@gmail.com; baruch@tkos.co.il; sr@denx.de;
> cmchen@andestech.com; greentime@andestech.com; schwab@suse.de;
> palmer@sifive.com; agraf@suse.de; u-boot@lists.denx.de;
> kito@andestech.com
> Subject: Re: [U-Boot] [PATCH 5/7] riscv: add support for multi-hart systems
> 
> On Sun, 2019-03-10 at 20:24 +0530, Anup Patel wrote:
> > On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
> > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > Hi Rick,
> > >
> > > On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > > > Hi Lukas
> > > >
> > > > > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > > > To: u-boot@lists.denx.de
> > > > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > > > > Dabbelt; Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi
> > > > > > Chen(陳建志);
> > > > > > Baruch Siach;
> > > > > > Stefan Roese
> > > > > > Subject: [PATCH 5/7] riscv: add support for multi-hart systems
> > > > > >
> > > > > > On RISC-V, all harts boot independently. To be able to run on
> > > > > > a multi-hart system, U-Boot must be extended with the
> > > > > > functionality to manage all harts in the system. A new config
> > > > > > option, CONFIG_MAIN_HART, is used to select the hart U-Boot
> > > > > > runs on. All other harts are halted.
> > > > > > U-Boot can delegate functions to them using
> > > > > > smp_call_function().
> > > > > >
> > > > > > Every hart has a valid pointer to the global data structure
> > > > > > and a 8KiB stack by default. The stack size is set with
> > > > > > CONFIG_STACK_SIZE_SHIFT.
> > > > > >
> > > > > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > > > > ---
> > > > > >
> > > > > >  arch/riscv/Kconfig           |  12 +++++
> > > > > >  arch/riscv/cpu/start.S       | 102
> > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > >  arch/riscv/include/asm/csr.h |   1 +
> > > > > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > > > > 3a51339c4d..af8d0f8d67
> > > > > > 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > > > >       default y if RISCV_SMODE
> > > > > >       depends on SMP
> > > > > >
> > > > > > +config MAIN_HART
> > > > > > +     int "Main hart in system"
> > > > > > +     default 0
> > > > > > +     help
> > > > > > +       Some SoCs include harts of various sizes, some of
> > > > > > which
> > > > > > might not
> > > > > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is
> > > > > > used
> > > > > > to select
> > > > > > +       the hart U-Boot runs on.
> > > > > > +
> > > > > > +config STACK_SIZE_SHIFT
> > > > > > +     int
> > > > > > +     default 13
> > > > > > +
> > > > > >  endmenu
> > > > > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > > > > index
> > > > > > a30f6f7194..ce7230df37 100644
> > > > > > --- a/arch/riscv/cpu/start.S
> > > > > > +++ b/arch/riscv/cpu/start.S
> > > > > > @@ -13,6 +13,7 @@
> > > > > >  #include <config.h>
> > > > > >  #include <common.h>
> > > > > >  #include <elf.h>
> > > > > > +#include <asm/csr.h>
> > > > > >  #include <asm/encoding.h>
> > > > > >  #include <generated/asm-offsets.h>
> > > > > >
> > > > > > @@ -45,6 +46,23 @@ _start:
> > > > > >       /* mask all interrupts */
> > > > > >       csrw    MODE_PREFIX(ie), zero
> > > > > >
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* check if hart is within range */
> > > > > > +     /* s0: hart id */
> > > > > > +     li      t0, CONFIG_NR_CPUS
> > > > > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* set xSIE bit to receive IPIs */ #ifdef
> > > > > > +CONFIG_RISCV_MMODE
> > > > > > +     li      t0, MIE_MSIE
> > > > > > +#else
> > > > > > +     li      t0, SIE_SSIE
> > > > > > +#endif
> > > > > > +     csrs    MODE_PREFIX(ie), t0
> > > > > > +#endif
> > > > > > +
> > > > > >  /*
> > > > > >   * Set stackpointer in internal/ex RAM to call board_init_f
> > > > > >   */
> > > > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > > > >  call_board_init_f_0:
> > > > > >       mv      a0, sp
> > > > > >       jal     board_init_f_alloc_reserve
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Set global data pointer here for all harts,
> > > > > > uninitialized at this
> > > > > > +      * point.
> > > > > > +      */
> > > > > > +     mv      gp, a0
> > > > > > +
> > > > > > +     /* setup stack */
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* s0: hart id */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, a0, t0
> > > > > > +#else
> > > > > >       mv      sp, a0
> > > > > > +#endif
> > > > > > +
> > > > > > +     /* Continue on main hart, others branch to
> > > > > > secondary_hart_loop */
> > > > > > +     li      t0, CONFIG_MAIN_HART
> > > > > > +     bne     s0, t0, secondary_hart_loop
> > > > > >
> > > > > >       la      t0, prior_stage_fdt_address
> > > > > >       SREG    s1, 0(t0)
> > > > > > @@ -95,7 +131,14 @@ relocate_code:
> > > > > >   *Set up the stack
> > > > > >   */
> > > > > >  stack_setup:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* s0: hart id */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, s2, t0
> > > > > > +#else
> > > > > >       mv      sp, s2
> > > > > > +#endif
> > > > > > +
> > > > > >       la      t0, _start
> > > > > >       sub     t6, s4, t0              /* t6 <- relocation
> > > > > > offset
> > > > > > */
> > > > > >       beq     t0, s4, clear_bss       /* skip relocation */
> > > > > > @@ -175,13 +218,30 @@ clear_bss:
> > > > > >       add     t0, t0, t6              /* t0 <- rel
> > > > > > __bss_start in
> > > > > > RAM */
> > > > > >       la      t1, __bss_end           /* t1 <- rel __bss_end
> > > > > > in
> > > > > > FLASH */
> > > > > >       add     t1, t1, t6              /* t1 <- rel __bss_end
> > > > > > in
> > > > > > RAM */
> > > > > > -     beq     t0, t1, call_board_init_r
> > > > > > +     beq     t0, t1, relocate_secondary_harts
> > > > > >
> > > > > >  clbss_l:
> > > > > >       SREG    zero, 0(t0)             /* clear loop... */
> > > > > >       addi    t0, t0, REGBYTES
> > > > > >       bne     t0, t1, clbss_l
> > > > > >
> > > > > > +relocate_secondary_harts:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* send relocation IPI */
> > > > > > +     la      t0, secondary_hart_relocate
> > > > > > +     add     a0, t0, t6
> > > > > > +
> > > > > > +     /* store relocation offset */
> > > > > > +     mv      s5, t6
> > > > > > +
> > > > > > +     mv      a1, s2
> > > > > > +     mv      a2, s3
> > > > > > +     jal     smp_call_function
> > > > > > +
> > > > > > +     /* restore relocation offset */
> > > > > > +     mv      t6, s5
> > > > > > +#endif
> > > > > > +
> > > > > >  /*
> > > > > >   * We are done. Do not return, instead branch to second part
> > > > > > of board
> > > > > >   * initialization, now running from RAM.
> > > > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > > > >   * jump to it ...
> > > > > >   */
> > > > > >       jr      t4                      /* jump to
> > > > > > board_init_r()
> > > > > > */
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +hart_out_of_bounds_loop:
> > > > > > +     /* Harts in this loop are out of bounds, increase
> > > > > > CONFIG_NR_CPUS. */
> > > > > > +     wfi
> > > > > > +     j       hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +/* SMP relocation entry */
> > > > > > +secondary_hart_relocate:
> > > > > > +     /* a1: new sp */
> > > > > > +     /* a2: new gd */
> > > > > > +     /* s0: hart id */
> > > > > > +
> > > > > > +     /* setup stack */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, a1, t0
> > > > > > +
> > > > > > +     /* update global data pointer */
> > > > > > +     mv      gp, a2
> > > > > > +#endif
> > > > > > +
> > > > > > +secondary_hart_loop:
> > > > > > +     wfi
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     csrr    t0, MODE_PREFIX(ip)
> > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > +     andi    t0, t0, MIE_MSIE
> > > > > > +#else
> > > > > > +     andi    t0, t0, SIE_SSIE
> > > > > > +#endif
> > > > > > +     beqz    t0, secondary_hart_loop
> > > > > > +
> > > > > > +     mv      a0, s0
> > > > > > +     jal     handle_ipi
> > > >
> > > > I found that s0 maybe corrupted after execute handle_ipi.
> > > > Because smp_function will be treated as a return function by
> > > > compiler, so compiler will generate codes to execute restore after
> > > > smp_function().
> > > >
> > > > But actually it is a no-return function. So there maybe no chance
> > > > to execute restore. And s0 will be corrupted somehow.
> > > >
> > > > The usage of s0 in v2 flow seem the same as v1.
> > > > So I reply mail in v1 patch.
> > > >
> > >
> > > Thanks for reporting this issue!
> > > What compiler version are you using? I never saw this issue with GCC
> > > 8.2.0, because optimization removes the ret following the call to
> > > smp_function(), it is called using jr. The registers are therefore
> > > restored before jumping to smp_function().
> > >
> > > This system is probably too fragile. A simple solution would be to
> > > store the hart ID somewhere else, for example register tp. What do
> > > you think?
> >
> > I think you can also use scratch/mscratch (if it is not used anywhere
> > else).
> >
> 
> You are right, sscratch and mscratch are also not currently used in U-Boot. Is
> there an advantage to use the scratch CSRs instead of tp?

No specific advantage as such. This was another possibility. 

I suggest you go ahead with TP because mscratch will be used by M-mode
U-Boot when linked with OpenSBI as library.

Regards,
Anup
Rick Chen March 12, 2019, 1:15 a.m. | #10
Hi Lukas

Auer, Lukas <lukas.auer@aisec.fraunhofer.de> 於 2019年3月11日 週一 上午2:12寫道:
>
> On Sun, 2019-03-10 at 20:24 +0530, Anup Patel wrote:
> > On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
> > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > Hi Rick,
> > >
> > > On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > > > Hi Lukas
> > > >
> > > > > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > > > To: u-boot@lists.denx.de
> > > > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab; Palmer
> > > > > > Dabbelt;
> > > > > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi
> > > > > > Chen(陳建志);
> > > > > > Baruch Siach;
> > > > > > Stefan Roese
> > > > > > Subject: [PATCH 5/7] riscv: add support for multi-hart
> > > > > > systems
> > > > > >
> > > > > > On RISC-V, all harts boot independently. To be able to run on
> > > > > > a
> > > > > > multi-hart system,
> > > > > > U-Boot must be extended with the functionality to manage all
> > > > > > harts in the
> > > > > > system. A new config option, CONFIG_MAIN_HART, is used to
> > > > > > select
> > > > > > the hart
> > > > > > U-Boot runs on. All other harts are halted.
> > > > > > U-Boot can delegate functions to them using
> > > > > > smp_call_function().
> > > > > >
> > > > > > Every hart has a valid pointer to the global data structure
> > > > > > and a
> > > > > > 8KiB stack by
> > > > > > default. The stack size is set with CONFIG_STACK_SIZE_SHIFT.
> > > > > >
> > > > > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de>
> > > > > > ---
> > > > > >
> > > > > >  arch/riscv/Kconfig           |  12 +++++
> > > > > >  arch/riscv/cpu/start.S       | 102
> > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > >  arch/riscv/include/asm/csr.h |   1 +
> > > > > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index
> > > > > > 3a51339c4d..af8d0f8d67
> > > > > > 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > > > >       default y if RISCV_SMODE
> > > > > >       depends on SMP
> > > > > >
> > > > > > +config MAIN_HART
> > > > > > +     int "Main hart in system"
> > > > > > +     default 0
> > > > > > +     help
> > > > > > +       Some SoCs include harts of various sizes, some of
> > > > > > which
> > > > > > might not
> > > > > > +       be suitable for running U-Boot. CONFIG_MAIN_HART is
> > > > > > used
> > > > > > to select
> > > > > > +       the hart U-Boot runs on.
> > > > > > +
> > > > > > +config STACK_SIZE_SHIFT
> > > > > > +     int
> > > > > > +     default 13
> > > > > > +
> > > > > >  endmenu
> > > > > > diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> > > > > > index
> > > > > > a30f6f7194..ce7230df37 100644
> > > > > > --- a/arch/riscv/cpu/start.S
> > > > > > +++ b/arch/riscv/cpu/start.S
> > > > > > @@ -13,6 +13,7 @@
> > > > > >  #include <config.h>
> > > > > >  #include <common.h>
> > > > > >  #include <elf.h>
> > > > > > +#include <asm/csr.h>
> > > > > >  #include <asm/encoding.h>
> > > > > >  #include <generated/asm-offsets.h>
> > > > > >
> > > > > > @@ -45,6 +46,23 @@ _start:
> > > > > >       /* mask all interrupts */
> > > > > >       csrw    MODE_PREFIX(ie), zero
> > > > > >
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* check if hart is within range */
> > > > > > +     /* s0: hart id */
> > > > > > +     li      t0, CONFIG_NR_CPUS
> > > > > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* set xSIE bit to receive IPIs */
> > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > +     li      t0, MIE_MSIE
> > > > > > +#else
> > > > > > +     li      t0, SIE_SSIE
> > > > > > +#endif
> > > > > > +     csrs    MODE_PREFIX(ie), t0
> > > > > > +#endif
> > > > > > +
> > > > > >  /*
> > > > > >   * Set stackpointer in internal/ex RAM to call board_init_f
> > > > > >   */
> > > > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > > > >  call_board_init_f_0:
> > > > > >       mv      a0, sp
> > > > > >       jal     board_init_f_alloc_reserve
> > > > > > +
> > > > > > +     /*
> > > > > > +      * Set global data pointer here for all harts,
> > > > > > uninitialized at this
> > > > > > +      * point.
> > > > > > +      */
> > > > > > +     mv      gp, a0
> > > > > > +
> > > > > > +     /* setup stack */
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* s0: hart id */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, a0, t0
> > > > > > +#else
> > > > > >       mv      sp, a0
> > > > > > +#endif
> > > > > > +
> > > > > > +     /* Continue on main hart, others branch to
> > > > > > secondary_hart_loop */
> > > > > > +     li      t0, CONFIG_MAIN_HART
> > > > > > +     bne     s0, t0, secondary_hart_loop
> > > > > >
> > > > > >       la      t0, prior_stage_fdt_address
> > > > > >       SREG    s1, 0(t0)
> > > > > > @@ -95,7 +131,14 @@ relocate_code:
> > > > > >   *Set up the stack
> > > > > >   */
> > > > > >  stack_setup:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* s0: hart id */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, s2, t0
> > > > > > +#else
> > > > > >       mv      sp, s2
> > > > > > +#endif
> > > > > > +
> > > > > >       la      t0, _start
> > > > > >       sub     t6, s4, t0              /* t6 <- relocation
> > > > > > offset
> > > > > > */
> > > > > >       beq     t0, s4, clear_bss       /* skip relocation */
> > > > > > @@ -175,13 +218,30 @@ clear_bss:
> > > > > >       add     t0, t0, t6              /* t0 <- rel
> > > > > > __bss_start in
> > > > > > RAM */
> > > > > >       la      t1, __bss_end           /* t1 <- rel __bss_end
> > > > > > in
> > > > > > FLASH */
> > > > > >       add     t1, t1, t6              /* t1 <- rel __bss_end
> > > > > > in
> > > > > > RAM */
> > > > > > -     beq     t0, t1, call_board_init_r
> > > > > > +     beq     t0, t1, relocate_secondary_harts
> > > > > >
> > > > > >  clbss_l:
> > > > > >       SREG    zero, 0(t0)             /* clear loop... */
> > > > > >       addi    t0, t0, REGBYTES
> > > > > >       bne     t0, t1, clbss_l
> > > > > >
> > > > > > +relocate_secondary_harts:
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     /* send relocation IPI */
> > > > > > +     la      t0, secondary_hart_relocate
> > > > > > +     add     a0, t0, t6
> > > > > > +
> > > > > > +     /* store relocation offset */
> > > > > > +     mv      s5, t6
> > > > > > +
> > > > > > +     mv      a1, s2
> > > > > > +     mv      a2, s3
> > > > > > +     jal     smp_call_function
> > > > > > +
> > > > > > +     /* restore relocation offset */
> > > > > > +     mv      t6, s5
> > > > > > +#endif
> > > > > > +
> > > > > >  /*
> > > > > >   * We are done. Do not return, instead branch to second part
> > > > > > of
> > > > > > board
> > > > > >   * initialization, now running from RAM.
> > > > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > > > >   * jump to it ...
> > > > > >   */
> > > > > >       jr      t4                      /* jump to
> > > > > > board_init_r()
> > > > > > */
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +hart_out_of_bounds_loop:
> > > > > > +     /* Harts in this loop are out of bounds, increase
> > > > > > CONFIG_NR_CPUS. */
> > > > > > +     wfi
> > > > > > +     j       hart_out_of_bounds_loop
> > > > > > +#endif
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +/* SMP relocation entry */
> > > > > > +secondary_hart_relocate:
> > > > > > +     /* a1: new sp */
> > > > > > +     /* a2: new gd */
> > > > > > +     /* s0: hart id */
> > > > > > +
> > > > > > +     /* setup stack */
> > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > +     sub     sp, a1, t0
> > > > > > +
> > > > > > +     /* update global data pointer */
> > > > > > +     mv      gp, a2
> > > > > > +#endif
> > > > > > +
> > > > > > +secondary_hart_loop:
> > > > > > +     wfi
> > > > > > +
> > > > > > +#ifdef CONFIG_SMP
> > > > > > +     csrr    t0, MODE_PREFIX(ip)
> > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > +     andi    t0, t0, MIE_MSIE
> > > > > > +#else
> > > > > > +     andi    t0, t0, SIE_SSIE
> > > > > > +#endif
> > > > > > +     beqz    t0, secondary_hart_loop
> > > > > > +
> > > > > > +     mv      a0, s0
> > > > > > +     jal     handle_ipi
> > > >
> > > > I found that s0 maybe corrupted after execute handle_ipi.
> > > > Because smp_function will be treated as a return function by
> > > > compiler,
> > > > so compiler will generate codes to execute restore after
> > > > smp_function().
> > > >
> > > > But actually it is a no-return function. So there maybe no chance
> > > > to
> > > > execute
> > > > restore. And s0 will be corrupted somehow.
> > > >
> > > > The usage of s0 in v2 flow seem the same as v1.
> > > > So I reply mail in v1 patch.
> > > >
> > >
> > > Thanks for reporting this issue!
> > > What compiler version are you using? I never saw this issue with
> > > GCC
> > > 8.2.0, because optimization removes the ret following the call to
> > > smp_function(), it is called using jr. The registers are therefore
> > > restored before jumping to smp_function().
> > >

I found this issue with Andestech's toolchain (GCC version is 7.3.0)
And it's code gen will be as below:

00000e94 <handle_ipi>:
     e94:       4785                    c.li    a5,1
     e96:       04a7e663                bltu    a5,a0,ee2 <handle_ipi+0x4e>
     e9a:       456362ef                jal     t0,372f0 <__riscv_save_0>
     e9e:       84aa                    c.mv    s1,a0
     ea0:       3539                    c.jal   cae <riscv_clear_ipi>
     ea2:       cd19                    c.beqz  a0,ec0 <handle_ipi+0x2c>
     ea4:       0003d517                auipc   a0,0x3d
     ea8:       7c450513                addi    a0,a0,1988 # 3e668
<freqs.8638+0x430>
     eac:       17c320ef                jal     ra,33028 <printf>
     eb0:       0003d517                auipc   a0,0x3d
     eb4:       7b850513                addi    a0,a0,1976 # 3e668
<freqs.8638+0x430>
     eb8:       170320ef                jal     ra,33028 <printf>
     ebc:       4583606f                j       37314 <__riscv_restore_0>
     ec0:       47b1                    c.li    a5,12
     ec2:       02f48433                mul     s0,s1,a5    s0 : 1 => 0xc
     ec6:       003407b3                add     a5,s0,gp
     eca:       0bc7a903                lw      s2,188(a5) # 800000bc
<__bss_end+0x7ff8f718>
     ece:       3b21                    c.jal   be6 <invalidate_icache_all>
     ed0:       008187b3                add     a5,gp,s0
     ed4:       0c47a603                lw      a2,196(a5)
     ed8:       0c07a583                lw      a1,192(a5)
     edc:       8526                    c.mv    a0,s1
     ede:       9902                    c.jalr  s2  s2=0x3ff8f194 , s0=0xcsou
     ee0:       bff1                    c.j     ebc <handle_ipi+0x28>
     ee2:       8082                    c.jr    ra

As I said last mail, compiler will treat smp_function() as a return
call, so restore() will be executed after it.
This behavior shall comply with the calling convention.

Thanks for your understanding and tolerant.

> > > This system is probably too fragile. A simple solution would be to
> > > store the hart ID somewhere else, for example register tp. What do
> > > you
> > > think?
> >
> > I think you can also use scratch/mscratch (if it is not used anywhere
> > else).
> >
>
> You are right, sscratch and mscratch are also not currently used in
> U-Boot. Is there an advantage to use the scratch CSRs instead of tp?
>

I will prefer to use tp.
xscratch shall consider mode distinguish additionally.

Rick

> Thanks,
> Lukas
Auer, Lukas March 14, 2019, 9:04 p.m. | #11
Hi Rick,

On Tue, 2019-03-12 at 09:15 +0800, Rick Chen wrote:
> Hi Lukas
> 
> Auer, Lukas <lukas.auer@aisec.fraunhofer.de> 於 2019年3月11日 週一
> 上午2:12寫道:
> > On Sun, 2019-03-10 at 20:24 +0530, Anup Patel wrote:
> > > On Sun, Mar 10, 2019 at 7:28 PM Auer, Lukas
> > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > Hi Rick,
> > > > 
> > > > On Thu, 2019-03-07 at 17:30 +0800, Rick Chen wrote:
> > > > > Hi Lukas
> > > > > 
> > > > > > > From: Lukas Auer [mailto:lukas.auer@aisec.fraunhofer.de]
> > > > > > > Sent: Tuesday, February 12, 2019 6:14 AM
> > > > > > > To: u-boot@lists.denx.de
> > > > > > > Cc: Atish Patra; Anup Patel; Bin Meng; Andreas Schwab;
> > > > > > > Palmer
> > > > > > > Dabbelt;
> > > > > > > Alexander Graf; Lukas Auer; Anup Patel; Rick Jian-Zhi
> > > > > > > Chen(陳建志);
> > > > > > > Baruch Siach;
> > > > > > > Stefan Roese
> > > > > > > Subject: [PATCH 5/7] riscv: add support for multi-hart
> > > > > > > systems
> > > > > > > 
> > > > > > > On RISC-V, all harts boot independently. To be able to
> > > > > > > run on
> > > > > > > a
> > > > > > > multi-hart system,
> > > > > > > U-Boot must be extended with the functionality to manage
> > > > > > > all
> > > > > > > harts in the
> > > > > > > system. A new config option, CONFIG_MAIN_HART, is used to
> > > > > > > select
> > > > > > > the hart
> > > > > > > U-Boot runs on. All other harts are halted.
> > > > > > > U-Boot can delegate functions to them using
> > > > > > > smp_call_function().
> > > > > > > 
> > > > > > > Every hart has a valid pointer to the global data
> > > > > > > structure
> > > > > > > and a
> > > > > > > 8KiB stack by
> > > > > > > default. The stack size is set with
> > > > > > > CONFIG_STACK_SIZE_SHIFT.
> > > > > > > 
> > > > > > > Signed-off-by: Lukas Auer <lukas.auer@aisec.fraunhofer.de
> > > > > > > >
> > > > > > > ---
> > > > > > > 
> > > > > > >  arch/riscv/Kconfig           |  12 +++++
> > > > > > >  arch/riscv/cpu/start.S       | 102
> > > > > > > ++++++++++++++++++++++++++++++++++-
> > > > > > >  arch/riscv/include/asm/csr.h |   1 +
> > > > > > >  3 files changed, 114 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index
> > > > > > > 3a51339c4d..af8d0f8d67
> > > > > > > 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -140,4 +140,16 @@ config SBI_IPI
> > > > > > >       default y if RISCV_SMODE
> > > > > > >       depends on SMP
> > > > > > > 
> > > > > > > +config MAIN_HART
> > > > > > > +     int "Main hart in system"
> > > > > > > +     default 0
> > > > > > > +     help
> > > > > > > +       Some SoCs include harts of various sizes, some of
> > > > > > > which
> > > > > > > might not
> > > > > > > +       be suitable for running U-Boot. CONFIG_MAIN_HART
> > > > > > > is
> > > > > > > used
> > > > > > > to select
> > > > > > > +       the hart U-Boot runs on.
> > > > > > > +
> > > > > > > +config STACK_SIZE_SHIFT
> > > > > > > +     int
> > > > > > > +     default 13
> > > > > > > +
> > > > > > >  endmenu
> > > > > > > diff --git a/arch/riscv/cpu/start.S
> > > > > > > b/arch/riscv/cpu/start.S
> > > > > > > index
> > > > > > > a30f6f7194..ce7230df37 100644
> > > > > > > --- a/arch/riscv/cpu/start.S
> > > > > > > +++ b/arch/riscv/cpu/start.S
> > > > > > > @@ -13,6 +13,7 @@
> > > > > > >  #include <config.h>
> > > > > > >  #include <common.h>
> > > > > > >  #include <elf.h>
> > > > > > > +#include <asm/csr.h>
> > > > > > >  #include <asm/encoding.h>
> > > > > > >  #include <generated/asm-offsets.h>
> > > > > > > 
> > > > > > > @@ -45,6 +46,23 @@ _start:
> > > > > > >       /* mask all interrupts */
> > > > > > >       csrw    MODE_PREFIX(ie), zero
> > > > > > > 
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     /* check if hart is within range */
> > > > > > > +     /* s0: hart id */
> > > > > > > +     li      t0, CONFIG_NR_CPUS
> > > > > > > +     bge     s0, t0, hart_out_of_bounds_loop
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     /* set xSIE bit to receive IPIs */
> > > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > > +     li      t0, MIE_MSIE
> > > > > > > +#else
> > > > > > > +     li      t0, SIE_SSIE
> > > > > > > +#endif
> > > > > > > +     csrs    MODE_PREFIX(ie), t0
> > > > > > > +#endif
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * Set stackpointer in internal/ex RAM to call
> > > > > > > board_init_f
> > > > > > >   */
> > > > > > > @@ -56,7 +74,25 @@ call_board_init_f:
> > > > > > >  call_board_init_f_0:
> > > > > > >       mv      a0, sp
> > > > > > >       jal     board_init_f_alloc_reserve
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * Set global data pointer here for all harts,
> > > > > > > uninitialized at this
> > > > > > > +      * point.
> > > > > > > +      */
> > > > > > > +     mv      gp, a0
> > > > > > > +
> > > > > > > +     /* setup stack */
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     /* s0: hart id */
> > > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > > +     sub     sp, a0, t0
> > > > > > > +#else
> > > > > > >       mv      sp, a0
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +     /* Continue on main hart, others branch to
> > > > > > > secondary_hart_loop */
> > > > > > > +     li      t0, CONFIG_MAIN_HART
> > > > > > > +     bne     s0, t0, secondary_hart_loop
> > > > > > > 
> > > > > > >       la      t0, prior_stage_fdt_address
> > > > > > >       SREG    s1, 0(t0)
> > > > > > > @@ -95,7 +131,14 @@ relocate_code:
> > > > > > >   *Set up the stack
> > > > > > >   */
> > > > > > >  stack_setup:
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     /* s0: hart id */
> > > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > > +     sub     sp, s2, t0
> > > > > > > +#else
> > > > > > >       mv      sp, s2
> > > > > > > +#endif
> > > > > > > +
> > > > > > >       la      t0, _start
> > > > > > >       sub     t6, s4, t0              /* t6 <- relocation
> > > > > > > offset
> > > > > > > */
> > > > > > >       beq     t0, s4, clear_bss       /* skip relocation
> > > > > > > */
> > > > > > > @@ -175,13 +218,30 @@ clear_bss:
> > > > > > >       add     t0, t0, t6              /* t0 <- rel
> > > > > > > __bss_start in
> > > > > > > RAM */
> > > > > > >       la      t1, __bss_end           /* t1 <- rel
> > > > > > > __bss_end
> > > > > > > in
> > > > > > > FLASH */
> > > > > > >       add     t1, t1, t6              /* t1 <- rel
> > > > > > > __bss_end
> > > > > > > in
> > > > > > > RAM */
> > > > > > > -     beq     t0, t1, call_board_init_r
> > > > > > > +     beq     t0, t1, relocate_secondary_harts
> > > > > > > 
> > > > > > >  clbss_l:
> > > > > > >       SREG    zero, 0(t0)             /* clear loop... */
> > > > > > >       addi    t0, t0, REGBYTES
> > > > > > >       bne     t0, t1, clbss_l
> > > > > > > 
> > > > > > > +relocate_secondary_harts:
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     /* send relocation IPI */
> > > > > > > +     la      t0, secondary_hart_relocate
> > > > > > > +     add     a0, t0, t6
> > > > > > > +
> > > > > > > +     /* store relocation offset */
> > > > > > > +     mv      s5, t6
> > > > > > > +
> > > > > > > +     mv      a1, s2
> > > > > > > +     mv      a2, s3
> > > > > > > +     jal     smp_call_function
> > > > > > > +
> > > > > > > +     /* restore relocation offset */
> > > > > > > +     mv      t6, s5
> > > > > > > +#endif
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * We are done. Do not return, instead branch to second
> > > > > > > part
> > > > > > > of
> > > > > > > board
> > > > > > >   * initialization, now running from RAM.
> > > > > > > @@ -202,3 +262,43 @@ call_board_init_r:
> > > > > > >   * jump to it ...
> > > > > > >   */
> > > > > > >       jr      t4                      /* jump to
> > > > > > > board_init_r()
> > > > > > > */
> > > > > > > +
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +hart_out_of_bounds_loop:
> > > > > > > +     /* Harts in this loop are out of bounds, increase
> > > > > > > CONFIG_NR_CPUS. */
> > > > > > > +     wfi
> > > > > > > +     j       hart_out_of_bounds_loop
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +/* SMP relocation entry */
> > > > > > > +secondary_hart_relocate:
> > > > > > > +     /* a1: new sp */
> > > > > > > +     /* a2: new gd */
> > > > > > > +     /* s0: hart id */
> > > > > > > +
> > > > > > > +     /* setup stack */
> > > > > > > +     slli    t0, s0, CONFIG_STACK_SIZE_SHIFT
> > > > > > > +     sub     sp, a1, t0
> > > > > > > +
> > > > > > > +     /* update global data pointer */
> > > > > > > +     mv      gp, a2
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +secondary_hart_loop:
> > > > > > > +     wfi
> > > > > > > +
> > > > > > > +#ifdef CONFIG_SMP
> > > > > > > +     csrr    t0, MODE_PREFIX(ip)
> > > > > > > +#ifdef CONFIG_RISCV_MMODE
> > > > > > > +     andi    t0, t0, MIE_MSIE
> > > > > > > +#else
> > > > > > > +     andi    t0, t0, SIE_SSIE
> > > > > > > +#endif
> > > > > > > +     beqz    t0, secondary_hart_loop
> > > > > > > +
> > > > > > > +     mv      a0, s0
> > > > > > > +     jal     handle_ipi
> > > > > 
> > > > > I found that s0 maybe corrupted after execute handle_ipi.
> > > > > Because smp_function will be treated as a return function by
> > > > > compiler,
> > > > > so compiler will generate codes to execute restore after
> > > > > smp_function().
> > > > > 
> > > > > But actually it is a no-return function. So there maybe no
> > > > > chance
> > > > > to
> > > > > execute
> > > > > restore. And s0 will be corrupted somehow.
> > > > > 
> > > > > The usage of s0 in v2 flow seem the same as v1.
> > > > > So I reply mail in v1 patch.
> > > > > 
> > > > 
> > > > Thanks for reporting this issue!
> > > > What compiler version are you using? I never saw this issue
> > > > with
> > > > GCC
> > > > 8.2.0, because optimization removes the ret following the call
> > > > to
> > > > smp_function(), it is called using jr. The registers are
> > > > therefore
> > > > restored before jumping to smp_function().
> > > > 
> 
> I found this issue with Andestech's toolchain (GCC version is 7.3.0)
> And it's code gen will be as below:
> 
> 00000e94 <handle_ipi>:
>      e94:       4785                    c.li    a5,1
>      e96:       04a7e663                bltu    a5,a0,ee2
> <handle_ipi+0x4e>
>      e9a:       456362ef                jal     t0,372f0
> <__riscv_save_0>
>      e9e:       84aa                    c.mv    s1,a0
>      ea0:       3539                    c.jal   cae <riscv_clear_ipi>
>      ea2:       cd19                    c.beqz  a0,ec0
> <handle_ipi+0x2c>
>      ea4:       0003d517                auipc   a0,0x3d
>      ea8:       7c450513                addi    a0,a0,1988 # 3e668
> <freqs.8638+0x430>
>      eac:       17c320ef                jal     ra,33028 <printf>
>      eb0:       0003d517                auipc   a0,0x3d
>      eb4:       7b850513                addi    a0,a0,1976 # 3e668
> <freqs.8638+0x430>
>      eb8:       170320ef                jal     ra,33028 <printf>
>      ebc:       4583606f                j       37314
> <__riscv_restore_0>
>      ec0:       47b1                    c.li    a5,12
>      ec2:       02f48433                mul     s0,s1,a5    s0 : 1 =>
> 0xc
>      ec6:       003407b3                add     a5,s0,gp
>      eca:       0bc7a903                lw      s2,188(a5) # 800000bc
> <__bss_end+0x7ff8f718>
>      ece:       3b21                    c.jal   be6
> <invalidate_icache_all>
>      ed0:       008187b3                add     a5,gp,s0
>      ed4:       0c47a603                lw      a2,196(a5)
>      ed8:       0c07a583                lw      a1,192(a5)
>      edc:       8526                    c.mv    a0,s1
>      ede:       9902                    c.jalr  s2  s2=0x3ff8f194 ,
> s0=0xcsou
>      ee0:       bff1                    c.j     ebc <handle_ipi+0x28>
>      ee2:       8082                    c.jr    ra
> 
> As I said last mail, compiler will treat smp_function() as a return
> call, so restore() will be executed after it.
> This behavior shall comply with the calling convention.
> 
> Thanks for your understanding and tolerant.

No problem, I was just asking out of interest. Thanks for providing the
toolchain output above!

> 
> > > > This system is probably too fragile. A simple solution would be
> > > > to
> > > > store the hart ID somewhere else, for example register tp. What
> > > > do
> > > > you
> > > > think?
> > > 
> > > I think you can also use scratch/mscratch (if it is not used
> > > anywhere
> > > else).
> > > 
> > 
> > You are right, sscratch and mscratch are also not currently used in
> > U-Boot. Is there an advantage to use the scratch CSRs instead of
> > tp?
> > 
> 
> I will prefer to use tp.
> xscratch shall consider mode distinguish additionally.
> 

Ok, I will update my series and use tp instead of s0.

Thanks,
Lukas

Patch

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3a51339c4d..af8d0f8d67 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -140,4 +140,16 @@  config SBI_IPI
 	default y if RISCV_SMODE
 	depends on SMP
 
+config MAIN_HART
+	int "Main hart in system"
+	default 0
+	help
+	  Some SoCs include harts of various sizes, some of which might not
+	  be suitable for running U-Boot. CONFIG_MAIN_HART is used to select
+	  the hart U-Boot runs on.
+
+config STACK_SIZE_SHIFT
+	int
+	default 13
+
 endmenu
diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
index a30f6f7194..ce7230df37 100644
--- a/arch/riscv/cpu/start.S
+++ b/arch/riscv/cpu/start.S
@@ -13,6 +13,7 @@ 
 #include <config.h>
 #include <common.h>
 #include <elf.h>
+#include <asm/csr.h>
 #include <asm/encoding.h>
 #include <generated/asm-offsets.h>
 
@@ -45,6 +46,23 @@  _start:
 	/* mask all interrupts */
 	csrw	MODE_PREFIX(ie), zero
 
+#ifdef CONFIG_SMP
+	/* check if hart is within range */
+	/* s0: hart id */
+	li	t0, CONFIG_NR_CPUS
+	bge	s0, t0, hart_out_of_bounds_loop
+#endif
+
+#ifdef CONFIG_SMP
+	/* set xSIE bit to receive IPIs */
+#ifdef CONFIG_RISCV_MMODE
+	li	t0, MIE_MSIE
+#else
+	li	t0, SIE_SSIE
+#endif
+	csrs	MODE_PREFIX(ie), t0
+#endif
+
 /*
  * Set stackpointer in internal/ex RAM to call board_init_f
  */
@@ -56,7 +74,25 @@  call_board_init_f:
 call_board_init_f_0:
 	mv	a0, sp
 	jal	board_init_f_alloc_reserve
+
+	/*
+	 * Set global data pointer here for all harts, uninitialized at this
+	 * point.
+	 */
+	mv	gp, a0
+
+	/* setup stack */
+#ifdef CONFIG_SMP
+	/* s0: hart id */
+	slli	t0, s0, CONFIG_STACK_SIZE_SHIFT
+	sub	sp, a0, t0
+#else
 	mv	sp, a0
+#endif
+
+	/* Continue on main hart, others branch to secondary_hart_loop */
+	li	t0, CONFIG_MAIN_HART
+	bne	s0, t0, secondary_hart_loop
 
 	la	t0, prior_stage_fdt_address
 	SREG	s1, 0(t0)
@@ -95,7 +131,14 @@  relocate_code:
  *Set up the stack
  */
 stack_setup:
+#ifdef CONFIG_SMP
+	/* s0: hart id */
+	slli	t0, s0, CONFIG_STACK_SIZE_SHIFT
+	sub	sp, s2, t0
+#else
 	mv	sp, s2
+#endif
+
 	la	t0, _start
 	sub	t6, s4, t0		/* t6 <- relocation offset */
 	beq	t0, s4, clear_bss	/* skip relocation */
@@ -175,13 +218,30 @@  clear_bss:
 	add	t0, t0, t6		/* t0 <- rel __bss_start in RAM */
 	la	t1, __bss_end		/* t1 <- rel __bss_end in FLASH */
 	add	t1, t1, t6		/* t1 <- rel __bss_end in RAM */
-	beq	t0, t1, call_board_init_r
+	beq	t0, t1, relocate_secondary_harts
 
 clbss_l:
 	SREG	zero, 0(t0)		/* clear loop... */
 	addi	t0, t0, REGBYTES
 	bne	t0, t1, clbss_l
 
+relocate_secondary_harts:
+#ifdef CONFIG_SMP
+	/* send relocation IPI */
+	la	t0, secondary_hart_relocate
+	add	a0, t0, t6
+
+	/* store relocation offset */
+	mv	s5, t6
+
+	mv	a1, s2
+	mv	a2, s3
+	jal	smp_call_function
+
+	/* restore relocation offset */
+	mv	t6, s5
+#endif
+
 /*
  * We are done. Do not return, instead branch to second part of board
  * initialization, now running from RAM.
@@ -202,3 +262,43 @@  call_board_init_r:
  * jump to it ...
  */
 	jr	t4			/* jump to board_init_r() */
+
+#ifdef CONFIG_SMP
+hart_out_of_bounds_loop:
+	/* Harts in this loop are out of bounds, increase CONFIG_NR_CPUS. */
+	wfi
+	j	hart_out_of_bounds_loop
+#endif
+
+#ifdef CONFIG_SMP
+/* SMP relocation entry */
+secondary_hart_relocate:
+	/* a1: new sp */
+	/* a2: new gd */
+	/* s0: hart id */
+
+	/* setup stack */
+	slli	t0, s0, CONFIG_STACK_SIZE_SHIFT
+	sub	sp, a1, t0
+
+	/* update global data pointer */
+	mv	gp, a2
+#endif
+
+secondary_hart_loop:
+	wfi
+
+#ifdef CONFIG_SMP
+	csrr	t0, MODE_PREFIX(ip)
+#ifdef CONFIG_RISCV_MMODE
+	andi	t0, t0, MIE_MSIE
+#else
+	andi	t0, t0, SIE_SSIE
+#endif
+	beqz	t0, secondary_hart_loop
+
+	mv	a0, s0
+	jal	handle_ipi
+#endif
+
+	j	secondary_hart_loop
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 86136f542c..644e6baa15 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -46,6 +46,7 @@ 
 #endif
 
 /* Interrupt Enable and Interrupt Pending flags */
+#define MIE_MSIE	_AC(0x00000008, UL) /* Software Interrupt Enable */
 #define SIE_SSIE	_AC(0x00000002, UL) /* Software Interrupt Enable */
 #define SIE_STIE	_AC(0x00000020, UL) /* Timer Interrupt Enable */