
[U-Boot,4/8] riscv: andes_plic: Fix some wrong configurations

Message ID 20191025061027.20962-5-uboot@andestech.com
State Superseded
Delegated to: Andes
Series RISC-V AX25-AE350 support SPL

Commit Message

Andes Oct. 25, 2019, 6:10 a.m. UTC
From: Rick Chen <rick@andestech.com>

The current code works only because hart 0 happens to always be
the main hart. While developing the SPL flow, I forced other harts
to be the main hart, and sending IPIs then went wrong. Fix it.

With this fix, any hart can in theory be the main hart in U-Boot
SPL, but booting still failed somewhere. After digging in, I found
an assumption in OpenSBI that hart 0 shall be the main hart.

After some workarounds, verification passes with any hart as the
main hart, booting U-Boot SPL -> OpenSBI -> U-Boot proper ->
Linux kernel successfully.

Signed-off-by: Rick Chen <rick@andestech.com>
Cc: KC Lin <kclin@andestech.com>
Cc: Alan Kao <alankao@andestech.com>
---
 arch/riscv/lib/andes_plic.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Bin Meng Oct. 29, 2019, 2:51 p.m. UTC | #1
Hi Rick,

On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
>
> From: Rick Chen <rick@andestech.com>
>
> It will work fine due to hart 0 always will be main
> hart coincidentally. When develop SPL flow, I try to
> force other harts to be main hart. And it will go
> wrong in sending IPI flow. So fix it.

Fix what? Does this commit contain 2 fixes, or just 1 fix?

>
> Having this fix, any hart can be main hart in U-Boot SPL
> theoretically, but it still fail somewhere. After dig in
> and found there is an assumption that hart 0 shall be
> main hart in OpenSbi.

So does this mean there is a bug in OpenSBI too?

>
> After some work-arounds, it can pass the verifications
> that any hart can be main hart and boots U-Boot SPL ->
> OpenSbi -> U-Boot proper -> Linux Kernel successfully.
>

It's a bit hard for me to understand what is described here in the
commit message. Maybe you need to rewrite it.

> Signed-off-by: Rick Chen <rick@andestech.com>
> Cc: KC Lin <kclin@andestech.com>
> Cc: Alan Kao <alankao@andestech.com>
> ---
>  arch/riscv/lib/andes_plic.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> index 28568e4..42394b9 100644
> --- a/arch/riscv/lib/andes_plic.c
> +++ b/arch/riscv/lib/andes_plic.c
> @@ -19,7 +19,7 @@
>  #include <cpu.h>
>
>  /* pending register */
> -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
>  /* enable register */
>  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
>  /* claim register */
> @@ -46,7 +46,7 @@ static int init_plic(void);
>
>  static int enable_ipi(int hart)
>  {
> -       int en;
> +       unsigned int en;

Is this a fix for some compiler warning?

>
>         en = ENABLE_HART_IPI >> hart;
>         writel(en, (void __iomem *)ENABLE_REG(gd->arch.plic, hart));
> @@ -94,10 +94,13 @@ static int init_plic(void)
>
>  int riscv_send_ipi(int hart)
>  {
> +       unsigned int ipi;
> +
>         PLIC_BASE_GET();
>
> -       writel(SEND_IPI_TO_HART(hart),
> -              (void __iomem *)PENDING_REG(gd->arch.plic, gd->arch.boot_hart));
> +       ipi = (SEND_IPI_TO_HART(hart) << (8 * gd->arch.boot_hart));
> +       writel(ipi, (void __iomem *)PENDING_REG(gd->arch.plic,
> +                               gd->arch.boot_hart));
>
>         return 0;
>  }
> --

Regards,
Bin
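
For readers following the diff, here is a minimal sketch of what the two
changed expressions compute. The offsets come from the quoted PENDING_REG
macro; the SEND_IPI_TO_HART definition is an assumption (it is not shown in
this thread), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the pending registers from the PLIC base, as in the patch. */
#define PENDING_BASE_OFF 0x1000u

/* ASSUMPTION: not shown in this thread; one bit per target hart. */
#define SEND_IPI_TO_HART(hart) (0x80u >> (hart))

/* Old macro body: advances 8 bytes per hart. */
uint32_t pending_off_old(unsigned int hart)
{
	return PENDING_BASE_OFF + hart * 8u;
}

/* New macro body: harts 0..3 share one 32-bit word, harts 4..7 the next. */
uint32_t pending_off_new(unsigned int hart)
{
	return PENDING_BASE_OFF + (hart / 4u) * 4u;
}

/*
 * Value written by the patched riscv_send_ipi(): the target bit is moved
 * into the byte that belongs to the sending (boot) hart. This sketch only
 * covers boot_hart < 4, where the shift stays inside 32 bits.
 */
uint32_t ipi_word(unsigned int target, unsigned int boot_hart)
{
	assert(target < 8 && boot_hart < 4);
	return (uint32_t)SEND_IPI_TO_HART(target) << (8u * boot_hart);
}
```

Under these assumptions, the old expression places hart 1 at offset 0x1008,
the new one keeps harts 0 to 3 in the single word at 0x1000, and
ipi_word(0, 1) yields 0x8000.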
Rick Chen Oct. 30, 2019, 2:50 a.m. UTC | #2
Hi Bin

>
> Hi Rick,
>
> On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> >
> > From: Rick Chen <rick@andestech.com>
> >
> > It will work fine due to hart 0 always will be main
> > hart coincidentally. When develop SPL flow, I try to
> > force other harts to be main hart. And it will go
> > wrong in sending IPI flow. So fix it.
>
> Fix what? Does this commit contain 2 fixes, or just 1 fix?

Yes, it includes two fixes. But both problems lead to the same negative
result: only hart 0 can send IPIs to other harts.

>
> >
> > Having this fix, any hart can be main hart in U-Boot SPL
> > theoretically, but it still fail somewhere. After dig in
> > and found there is an assumption that hart 0 shall be
> > main hart in OpenSbi.
>
> So does this mean there is a bug in OpenSBI too?

I am not sure if it is a bug. Maybe it is a compatibility issue.
There is a limitation that only hart 0 can be the main hart in OpenSBI.
But any hart can be the main hart in U-Boot.

In the general case, hart 0 is the main hart, and it is fine when U-Boot jumps to OpenSBI.
But if we force hart 1 to be the main hart, when hart 0 jumps to OpenSBI from U-Boot,
it will do the relocation flow in OpenSBI, which will corrupt U-Boot SPL,
but hart 0 is still in U-Boot SPL.

>
> >
> > After some work-arounds, it can pass the verifications
> > that any hart can be main hart and boots U-Boot SPL ->
> > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> >
>
> It's a bit hard for me to understand what was described here in the
> commit message. Maybe you need rewrite something.

OK. I will rewrite this commit message.

>
> > Signed-off-by: Rick Chen <rick@andestech.com>
> > Cc: KC Lin <kclin@andestech.com>
> > Cc: Alan Kao <alankao@andestech.com>
> > ---
> >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > index 28568e4..42394b9 100644
> > --- a/arch/riscv/lib/andes_plic.c
> > +++ b/arch/riscv/lib/andes_plic.c
> > @@ -19,7 +19,7 @@
> >  #include <cpu.h>
> >
> >  /* pending register */
> > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> >  /* enable register */
> >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> >  /* claim register */
> > @@ -46,7 +46,7 @@ static int init_plic(void);
> >
> >  static int enable_ipi(int hart)
> >  {
> > -       int en;
> > +       unsigned int en;
>
> Is this for some compiler warning fix?

No, it is not a warning fix. It is a bug fix.
I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
or it will cause IPI sending errors.
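
The two values above differ only in how a 32-bit pattern with bit 31 set is
widened to 64 bits. A minimal sketch, assuming a two's complement target and
a compiler such as GCC or Clang that wraps the out-of-range conversion to
int32_t; the function names are illustrative only and are not part of the
patch:

```c
#include <assert.h>
#include <stdint.h>

/* Widen through a signed type: bit 31 is copied into bits 32..63. */
uint64_t widen_signed(uint32_t bits)
{
	/* Implementation-defined when bits > INT32_MAX; wraps on GCC/Clang. */
	int32_t s = (int32_t)bits;

	return (uint64_t)(int64_t)s;
}

/* Widen through an unsigned type: bits 32..63 are zero. */
uint64_t widen_unsigned(uint32_t bits)
{
	return (uint64_t)bits;
}

/* Self-check against the two values quoted in the mail. */
void check_widening(void)
{
	assert(widen_signed(0x80808080u) == 0xffffffff80808080ull);
	assert(widen_unsigned(0x80808080u) == 0x0000000080808080ull);
}
```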

Thanks
Rick

>
> >
> >         en = ENABLE_HART_IPI >> hart;
> >         writel(en, (void __iomem *)ENABLE_REG(gd->arch.plic, hart));
> > @@ -94,10 +94,13 @@ static int init_plic(void)
> >
> >  int riscv_send_ipi(int hart)
> >  {
> > +       unsigned int ipi;
> > +
> >         PLIC_BASE_GET();
> >
> > -       writel(SEND_IPI_TO_HART(hart),
> > -              (void __iomem *)PENDING_REG(gd->arch.plic, gd->arch.boot_hart));
> > +       ipi = (SEND_IPI_TO_HART(hart) << (8 * gd->arch.boot_hart));
> > +       writel(ipi, (void __iomem *)PENDING_REG(gd->arch.plic,
> > +                               gd->arch.boot_hart));
> >
> >         return 0;
> >  }
> > --
>
> Regards,
> Bin
Bin Meng Oct. 30, 2019, 10:38 a.m. UTC | #3
Hi Rick,

On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Bin
>
> >
> > Hi Rick,
> >
> > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > >
> > > From: Rick Chen <rick@andestech.com>
> > >
> > > It will work fine due to hart 0 always will be main
> > > hart coincidentally. When develop SPL flow, I try to
> > > force other harts to be main hart. And it will go
> > > wrong in sending IPI flow. So fix it.
> >
> > Fix what? Does this commit contain 2 fixes, or just 1 fix?
>
> Yes, it include two fixs. But they will cause one negative result
> that only hart 0 can send ipi to other harts.
>
> >
> > >
> > > Having this fix, any hart can be main hart in U-Boot SPL
> > > theoretically, but it still fail somewhere. After dig in
> > > and found there is an assumption that hart 0 shall be
> > > main hart in OpenSbi.
> >
> > So does this mean there is a bug in OpenSBI too?
>
> I am not sure if it is a bug. Maybe it is a compatible issue.
> There is a limitation that only hart 0 can be main hart in OpenSBI.

I don't think OpenSBI has such a limitation.

> But any hart can be main hart in U-Boot.
>
> In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> but hart 0 still in U-Boot SPL.
>
> >
> > >
> > > After some work-arounds, it can pass the verifications
> > > that any hart can be main hart and boots U-Boot SPL ->
> > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > >
> >
> > It's a bit hard for me to understand what was described here in the
> > commit message. Maybe you need rewrite something.
>
> OK. I will rewrite this commit message.
>
> >
> > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > Cc: KC Lin <kclin@andestech.com>
> > > Cc: Alan Kao <alankao@andestech.com>
> > > ---
> > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > index 28568e4..42394b9 100644
> > > --- a/arch/riscv/lib/andes_plic.c
> > > +++ b/arch/riscv/lib/andes_plic.c
> > > @@ -19,7 +19,7 @@
> > >  #include <cpu.h>
> > >
> > >  /* pending register */
> > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > >  /* enable register */
> > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > >  /* claim register */
> > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > >
> > >  static int enable_ipi(int hart)
> > >  {
> > > -       int en;
> > > +       unsigned int en;
> >
> > Is this for some compiler warning fix?
>
> No, it is not a warning fix. It is a bug fix.
> I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,

But it is int, which is only 32-bit. The example you gave was a 64-bit number.

> or it will cause IPI sending errors.
>

Regards,
Bin
Alan Kao Oct. 31, 2019, 1 a.m. UTC | #4
Hi Bin,

Thanks for the critique.  Comments below.
On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> Hi Rick,
> 
> On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Bin
> >
> > >
> > > Hi Rick,
> > >
> > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > >
> > > > From: Rick Chen <rick@andestech.com>
> > > >
> > > > It will work fine due to hart 0 always will be main
> > > > hart coincidentally. When develop SPL flow, I try to
> > > > force other harts to be main hart. And it will go
> > > > wrong in sending IPI flow. So fix it.
> > >
> > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> >
> > Yes, it include two fixs. But they will cause one negative result
> > that only hart 0 can send ipi to other harts.
> >
> > >
> > > >
> > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > theoretically, but it still fail somewhere. After dig in
> > > > and found there is an assumption that hart 0 shall be
> > > > main hart in OpenSbi.
> > >
> > > So does this mean there is a bug in OpenSBI too?
> >
> > I am not sure if it is a bug. Maybe it is a compatible issue.
> > There is a limitation that only hart 0 can be main hart in OpenSBI.
> 
> I don't think OpenSBI has such limitation.
> 

Please check the source.
https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54

Apparently, the FIRST TWO LINES of the initialization:
1. get the hart ID.
2. determine which route to take based on that ID.

So, I do think OpenSBI has this signature, if you are not willing to call it
a limitation.

> > But any hart can be main hart in U-Boot.
> >
> > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > but hart 0 still in U-Boot SPL.
> >
> > >
> > > >
> > > > After some work-arounds, it can pass the verifications
> > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > >
> > >
> > > It's a bit hard for me to understand what was described here in the
> > > commit message. Maybe you need rewrite something.
> >
> > OK. I will rewrite this commit message.
> >
> > >
> > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > Cc: KC Lin <kclin@andestech.com>
> > > > Cc: Alan Kao <alankao@andestech.com>
> > > > ---
> > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > index 28568e4..42394b9 100644
> > > > --- a/arch/riscv/lib/andes_plic.c
> > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > @@ -19,7 +19,7 @@
> > > >  #include <cpu.h>
> > > >
> > > >  /* pending register */
> > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > >  /* enable register */
> > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > >  /* claim register */
> > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > >
> > > >  static int enable_ipi(int hart)
> > > >  {
> > > > -       int en;
> > > > +       unsigned int en;
> > >
> > > Is this for some compiler warning fix?
> >
> > No, it is not a warning fix. It is a bug fix.
> > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> 
> But it is int, which is only 32-bit. The example you gave was a 64-bit number.
>

Please consider the following simple program:

> #include <stdio.h>
> #define MASK 0x80808080
> int main(){
>        int en;
>        en = MASK;
>        printf("%x, shifted %x\n", en, en >> 1);
>        return 0;
> }

Would you mind sharing what you get after running this on your x86_64
(if you have one) computer?  I would really appreciate that.

An almost identical case is in the patch, specifically:
> en = ENABLE_HART_IPI >> hart
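
The difference between a signed and an unsigned en in such a shift can be
sketched as follows. This assumes the compiler implements right shift of a
negative signed value as an arithmetic shift (GCC and Clang do; the C
standard leaves it implementation-defined), and the helper names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define MASK 0x80808080u

/* Shift with a signed 32-bit variable: the sign bit is replicated. */
uint32_t shift_as_signed(unsigned int n)
{
	int32_t en;

	assert(n < 32);
	en = (int32_t)MASK;		/* negative on two's complement targets */
	return (uint32_t)(en >> n);	/* arithmetic shift on GCC/Clang */
}

/* Shift with an unsigned 32-bit variable: zeros are shifted in. */
uint32_t shift_as_unsigned(unsigned int n)
{
	uint32_t en = MASK;

	assert(n < 32);
	return en >> n;
}
```

With n = 1 the signed variant gives 0xc0404040 and the unsigned variant
gives 0x40404040.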

> > or it will cause IPI sending errors.
> >
> 
> Regards,
> Bin

Best,
Alan
Rick Chen Oct. 31, 2019, 2:23 a.m. UTC | #5
Hi Bin

>
> Hi Rick,
>
> On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Bin
> >
> > >
> > > Hi Rick,
> > >
> > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > >
> > > > From: Rick Chen <rick@andestech.com>
> > > >
> > > > It will work fine due to hart 0 always will be main
> > > > hart coincidentally. When develop SPL flow, I try to
> > > > force other harts to be main hart. And it will go
> > > > wrong in sending IPI flow. So fix it.
> > >
> > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> >
> > Yes, it include two fixs. But they will cause one negative result
> > that only hart 0 can send ipi to other harts.
> >
> > >
> > > >
> > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > theoretically, but it still fail somewhere. After dig in
> > > > and found there is an assumption that hart 0 shall be
> > > > main hart in OpenSbi.
> > >
> > > So does this mean there is a bug in OpenSBI too?
> >
> > I am not sure if it is a bug. Maybe it is a compatible issue.
> > There is a limitation that only hart 0 can be main hart in OpenSBI.
>
> I don't think OpenSBI has such limitation.

OK.
But there is indeed a hint in OpenSBI.
Maybe we just interpret it differently.

/*
* Jump to warm-boot if this is not the first core booting,
* that is, for mhartid != 0
*/
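
Read literally, that comment describes a selection like the one sketched
below. This is only an illustration of the quoted comment in C, not
OpenSBI's actual code (which is assembly in fw_base.S); the enum and
function names are invented for the example:

```c
#include <assert.h>

enum boot_path { COLD_BOOT, WARM_BOOT };

/* Hypothetical helper: only mhartid == 0 takes the cold-boot path. */
enum boot_path pick_boot_path(unsigned long mhartid)
{
	return mhartid == 0 ? COLD_BOOT : WARM_BOOT;
}

/* Does the hart that U-Boot SPL chose as main hart get the cold-boot path? */
int main_hart_gets_cold_boot(unsigned long main_hart)
{
	return pick_boot_path(main_hart) == COLD_BOOT;
}

/* Self-check of the two cases discussed in the thread. */
void check_paths(void)
{
	assert(main_hart_gets_cold_boot(0));
	assert(!main_hart_gets_cold_boot(1));
}
```

Under this reading, a main hart other than 0 chosen by U-Boot SPL would take
the warm-boot path on entry to OpenSBI.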

>
> > But any hart can be main hart in U-Boot.
> >
> > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > but hart 0 still in U-Boot SPL.
> >
> > >
> > > >
> > > > After some work-arounds, it can pass the verifications
> > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > >
> > >
> > > It's a bit hard for me to understand what was described here in the
> > > commit message. Maybe you need rewrite something.
> >
> > OK. I will rewrite this commit message.
> >
> > >
> > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > Cc: KC Lin <kclin@andestech.com>
> > > > Cc: Alan Kao <alankao@andestech.com>
> > > > ---
> > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > index 28568e4..42394b9 100644
> > > > --- a/arch/riscv/lib/andes_plic.c
> > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > @@ -19,7 +19,7 @@
> > > >  #include <cpu.h>
> > > >
> > > >  /* pending register */
> > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > >  /* enable register */
> > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > >  /* claim register */
> > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > >
> > > >  static int enable_ipi(int hart)
> > > >  {
> > > > -       int en;
> > > > +       unsigned int en;
> > >
> > > Is this for some compiler warning fix?
> >
> > No, it is not a warning fix. It is a bug fix.
> > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
>
> But it is int, which is only 32-bit. The example you gave was a 64-bit number.

The verification runs on RV64, so the example is a 64-bit number.

Thanks
Rick

>
> > or it will cause IPI sending errors.
> >
>
> Regards,
> Bin
Bin Meng Oct. 31, 2019, 3:36 a.m. UTC | #6
Hi Alan,

On Thu, Oct 31, 2019 at 9:00 AM Alan Kao <alankao@andestech.com> wrote:
>
> Hi Bin,
>
> Thanks for the critics.  Comments below.
> On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > Hi Rick,
> >
> > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Bin
> > >
> > > >
> > > > Hi Rick,
> > > >
> > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > >
> > > > > From: Rick Chen <rick@andestech.com>
> > > > >
> > > > > It will work fine due to hart 0 always will be main
> > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > force other harts to be main hart. And it will go
> > > > > wrong in sending IPI flow. So fix it.
> > > >
> > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > >
> > > Yes, it include two fixs. But they will cause one negative result
> > > that only hart 0 can send ipi to other harts.
> > >
> > > >
> > > > >
> > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > theoretically, but it still fail somewhere. After dig in
> > > > > and found there is an assumption that hart 0 shall be
> > > > > main hart in OpenSbi.
> > > >
> > > > So does this mean there is a bug in OpenSBI too?
> > >
> > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> >
> > I don't think OpenSBI has such limitation.
> >
>
> Please check the source.
> https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
>
> Apparently, the FIRST TWO LINEs of the initialization are the
> 1. get hart ID.
> 2. determine which route to take based on their ID respectively.
>

This is true only for the very first few instructions when OpenSBI
boots. The later OpenSBI main initialization does not require the hart
to be zero.



> So, I do think OpenSBI has this signature, if you are not willing to call it
> a limitation.
>
> > > But any hart can be main hart in U-Boot.
> > >
> > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > but hart 0 still in U-Boot SPL.
> > >
> > > >
> > > > >
> > > > > After some work-arounds, it can pass the verifications
> > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > >
> > > >
> > > > It's a bit hard for me to understand what was described here in the
> > > > commit message. Maybe you need rewrite something.
> > >
> > > OK. I will rewrite this commit message.
> > >
> > > >
> > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > ---
> > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > index 28568e4..42394b9 100644
> > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > @@ -19,7 +19,7 @@
> > > > >  #include <cpu.h>
> > > > >
> > > > >  /* pending register */
> > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > >  /* enable register */
> > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > >  /* claim register */
> > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > >
> > > > >  static int enable_ipi(int hart)
> > > > >  {
> > > > > -       int en;
> > > > > +       unsigned int en;
> > > >
> > > > Is this for some compiler warning fix?
> > >
> > > No, it is not a warning fix. It is a bug fix.
> > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> >
> > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> >
>
> Please consider the following simple program:
>
> > #define MASK 0x80808080
> >int main(){
> >        int en;
> >        en = MASK;
> >        printf("%x, shifted %x\n", en, en >> 1);
> >        return 0;
> >}
>
> Would you mind sharing what you get after running this on your x86_64
> (if you have one) computer?  Really appreiciate that.
>
> The almost identical episode is in the patch, specifically,
> > en = ENABLE_HART_IPI >> hart

Yes, this is a bug. But I was confused by Rick's comments, as he was
using a 64-bit number, while int is never 64-bit on either 32-bit
or 64-bit.

Regards,
Bin
Alan Kao Oct. 31, 2019, 7:48 a.m. UTC | #7
On Thu, Oct 31, 2019 at 11:36:48AM +0800, Bin Meng wrote:
> Hi Alan,
> 
> On Thu, Oct 31, 2019 at 9:00 AM Alan Kao <alankao@andestech.com> wrote:
> >
> > Hi Bin,
> >
> > Thanks for the critics.  Comments below.
> > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > Hi Rick,
> > >
> > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Bin
> > > >
> > > > >
> > > > > Hi Rick,
> > > > >
> > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > >
> > > > > > From: Rick Chen <rick@andestech.com>
> > > > > >
> > > > > > It will work fine due to hart 0 always will be main
> > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > force other harts to be main hart. And it will go
> > > > > > wrong in sending IPI flow. So fix it.
> > > > >
> > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > >
> > > > Yes, it include two fixs. But they will cause one negative result
> > > > that only hart 0 can send ipi to other harts.
> > > >
> > > > >
> > > > > >
> > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > and found there is an assumption that hart 0 shall be
> > > > > > main hart in OpenSbi.
> > > > >
> > > > > So does this mean there is a bug in OpenSBI too?
> > > >
> > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > >
> > > I don't think OpenSBI has such limitation.
> > >
> >
> > Please check the source.
> > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> >
> > Apparently, the FIRST TWO LINEs of the initialization are the
> > 1. get hart ID.
> > 2. determine which route to take based on their ID respectively.
> >
> 
> This is true only for the very first a few instructions when OpenSBI
> boots. Later OpenSBI main initialization does not require hart to be
> zero.
> 
> > So, I do think OpenSBI has this signature, if you are not willing to call it
> > a limitation.
> >
> > > > But any hart can be main hart in U-Boot.
> > > >
> > > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > > but hart 0 still in U-Boot SPL.
> > > >
> > > > >
> > > > > >
> > > > > > After some work-arounds, it can pass the verifications
> > > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > > >
> > > > >
> > > > > It's a bit hard for me to understand what was described here in the
> > > > > commit message. Maybe you need rewrite something.
> > > >
> > > > OK. I will rewrite this commit message.
> > > >
> > > > >
> > > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > > ---
> > > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > > index 28568e4..42394b9 100644
> > > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > > @@ -19,7 +19,7 @@
> > > > > >  #include <cpu.h>
> > > > > >
> > > > > >  /* pending register */
> > > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > > >  /* enable register */
> > > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > > >  /* claim register */
> > > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > > >
> > > > > >  static int enable_ipi(int hart)
> > > > > >  {
> > > > > > -       int en;
> > > > > > +       unsigned int en;
> > > > >
> > > > > Is this for some compiler warning fix?
> > > >
> > > > No, it is not a warning fix. It is a bug fix.
> > > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> > >
> > > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> > >
> >
> > Please consider the following simple program:
> >
> > > #define MASK 0x80808080
> > >int main(){
> > >        int en;
> > >        en = MASK;
> > >        printf("%x, shifted %x\n", en, en >> 1);
> > >        return 0;
> > >}
> >
> > Would you mind sharing what you get after running this on your x86_64
> > (if you have one) computer?  Really appreiciate that.
> >
> > The almost identical episode is in the patch, specifically,
> > > en = ENABLE_HART_IPI >> hart
> 
> Yes, this is a bug. ...

Wait, what do you mean by "this"?  What is a bug here?
If you want to be helpful, please also be specific or anyone else reviewing
this patch will be confused.

> ... But I was confused by Rick's comments as he was
> using a 64-bit number as int is never to be a 64-bit for both 32-bit
> and 64-bit.

It was just an example.  Nothing to do with bit width, but just a sign-
extension issue.

> 
> Regards,
> Bin

Thanks,
Alan
Bin Meng Oct. 31, 2019, 8:10 a.m. UTC | #8
Hi Alan,

On Thu, Oct 31, 2019 at 3:49 PM Alan Kao <alankao@andestech.com> wrote:
>
>
> On Thu, Oct 31, 2019 at 11:36:48AM +0800, Bin Meng wrote:
> > Hi Alan,
> >
> > On Thu, Oct 31, 2019 at 9:00 AM Alan Kao <alankao@andestech.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > Thanks for the critics.  Comments below.
> > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > Hi Rick,
> > > >
> > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > >
> > > > > Hi Bin
> > > > >
> > > > > >
> > > > > > Hi Rick,
> > > > > >
> > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > >
> > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > >
> > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > force other harts to be main hart. And it will go
> > > > > > > wrong in sending IPI flow. So fix it.
> > > > > >
> > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > >
> > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > that only hart 0 can send ipi to other harts.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > main hart in OpenSbi.
> > > > > >
> > > > > > So does this mean there is a bug in OpenSBI too?
> > > > >
> > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > >
> > > > I don't think OpenSBI has such limitation.
> > > >
> > >
> > > Please check the source.
> > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > >
> > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > 1. get hart ID.
> > > 2. determine which route to take based on their ID respectively.
> > >
> >
> > This is true only for the very first a few instructions when OpenSBI
> > boots. Later OpenSBI main initialization does not require hart to be
> > zero.
> >
> > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > a limitation.
> > >
> > > > > But any hart can be main hart in U-Boot.
> > > > >
> > > > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > > > but hart 0 still in U-Boot SPL.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > After some work-arounds, it can pass the verifications
> > > > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > > > >
> > > > > >
> > > > > > It's a bit hard for me to understand what was described here in the
> > > > > > commit message. Maybe you need rewrite something.
> > > > >
> > > > > OK. I will rewrite this commit message.
> > > > >
> > > > > >
> > > > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > > > ---
> > > > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > > > index 28568e4..42394b9 100644
> > > > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > > > @@ -19,7 +19,7 @@
> > > > > > >  #include <cpu.h>
> > > > > > >
> > > > > > >  /* pending register */
> > > > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > > > >  /* enable register */
> > > > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > > > >  /* claim register */
> > > > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > > > >
> > > > > > >  static int enable_ipi(int hart)
> > > > > > >  {
> > > > > > > -       int en;
> > > > > > > +       unsigned int en;
> > > > > >
> > > > > > Is this for some compiler warning fix?
> > > > >
> > > > > No, it is not a warning fix. It is a bug fix.
> > > > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> > > >
> > > > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> > > >
> > >
> > > Please consider the following simple program:
> > >
> > > > #define MASK 0x80808080
> > > >int main(){
> > > >        int en;
> > > >        en = MASK;
> > > >        printf("%x, shifted %x\n", en, en >> 1);
> > > >        return 0;
> > > >}
> > >
> > > Would you mind sharing what you get after running this on your x86_64
> > > (if you have one) computer?  Really appreiciate that.
> > >
> > > The almost identical episode is in the patch, specifically,
> > > > en = ENABLE_HART_IPI >> hart
> >
> > Yes, this is a bug. ...
>
> Wait, what do you mean but "this"?  What is a bug here?

The bug you mentioned.

> If you want to be helpful, please also be specific or anyone else reviewing
> this patch will be confused.

It's just that the explanation Rick gave confuses people like me.

>
> > ... But I was confused by Rick's comments as he was
> > using a 64-bit number as int is never to be a 64-bit for both 32-bit
> > and 64-bit.
>
> It was just an example.  Nothing to do with bit width, but just a sign-
> extension issue.
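
For anyone else following along, the sign-extension point can be sketched
as below. This is only an illustration, not code from the patch: the
helper names are made up, and the signed results rely on behaviour that
the C standard leaves implementation-defined (GCC and Clang wrap the
conversion and use an arithmetic right shift).

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Widen a 32-bit pattern through a signed type: the sign bit is copied up. */
uint64_t widen_signed(uint32_t bits)
{
	/* Implementation-defined when bits >= 0x80000000; wraps on GCC/Clang. */
	int32_t en = (int32_t)bits;

	return (uint64_t)en;
}

/* Widen the same pattern through an unsigned type: upper bits stay zero. */
uint64_t widen_unsigned(uint32_t bits)
{
	uint32_t en = bits;

	return (uint64_t)en;
}

/* Right shift of a negative signed value: arithmetic shift on GCC/Clang. */
uint32_t shift_signed(uint32_t bits, unsigned int n)
{
	int32_t en = (int32_t)bits;

	return (uint32_t)(en >> n);
}

/* Right shift of an unsigned value: always a logical shift. */
uint32_t shift_unsigned(uint32_t bits, unsigned int n)
{
	return bits >> n;
}
```

With the 0x80808080 mask from the test program, the signed variants keep
the top bits set after widening or shifting, while the unsigned variants
do not. That is the difference between "int en" and "unsigned int en".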

Regards,
Bin
Anup Patel Oct. 31, 2019, 8:12 a.m. UTC | #9
On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
>
> Hi Bin,
>
> Thanks for the critics.  Comments below.
> On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > Hi Rick,
> >
> > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Bin
> > >
> > > >
> > > > Hi Rick,
> > > >
> > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > >
> > > > > From: Rick Chen <rick@andestech.com>
> > > > >
> > > > > It will work fine due to hart 0 always will be main
> > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > force other harts to be main hart. And it will go
> > > > > wrong in sending IPI flow. So fix it.
> > > >
> > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > >
> > > Yes, it include two fixs. But they will cause one negative result
> > > that only hart 0 can send ipi to other harts.
> > >
> > > >
> > > > >
> > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > theoretically, but it still fail somewhere. After dig in
> > > > > and found there is an assumption that hart 0 shall be
> > > > > main hart in OpenSbi.
> > > >
> > > > So does this mean there is a bug in OpenSBI too?
> > >
> > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> >
> > I don't think OpenSBI has such limitation.
> >
>
> Please check the source.
> https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
>
> Apparently, the FIRST TWO LINEs of the initialization are the
> 1. get hart ID.
> 2. determine which route to take based on their ID respectively.
>
> So, I do think OpenSBI has this signature, if you are not willing to call it
> a limitation.

This dependency on hart id #0 was not there until we added self-relocation
in OpenSBI for FW_DYNAMIC.

I will try to fix this in OpenSBI, but we might end up having a boot_lottery.
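
As a rough idea of what such a lottery looks like, here is a minimal C
sketch. It is not the OpenSBI implementation (that code lives in
assembly in fw_base.S); the type and function names are hypothetical and
only show the general technique: every hart draws a ticket with an
atomic add, and only the hart that draws ticket 0 does the one-time work.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lottery state shared by all harts. */
typedef struct {
	atomic_uint tickets;
} reloc_lottery;

/* Must run once before any hart draws a ticket. */
void lottery_init(reloc_lottery *l)
{
	atomic_init(&l->tickets, 0u);
}

/*
 * Returns true for exactly one caller: the first hart to draw.
 * The winner would do the relocation; the losers would wait for it.
 */
bool lottery_draw(reloc_lottery *l)
{
	return atomic_fetch_add(&l->tickets, 1u) == 0u;
}
```

The point of the design is that the winner is whichever hart arrives
first, so nothing depends on a fixed hart id.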

>
> > > But any hart can be main hart in U-Boot.
> > >
> > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > but hart 0 still in U-Boot SPL.
> > >
> > > >
> > > > >
> > > > > After some work-arounds, it can pass the verifications
> > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > >
> > > >
> > > > It's a bit hard for me to understand what was described here in the
> > > > commit message. Maybe you need rewrite something.
> > >
> > > OK. I will rewrite this commit message.
> > >
> > > >
> > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > ---
> > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > index 28568e4..42394b9 100644
> > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > @@ -19,7 +19,7 @@
> > > > >  #include <cpu.h>
> > > > >
> > > > >  /* pending register */
> > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > >  /* enable register */
> > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > >  /* claim register */
> > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > >
> > > > >  static int enable_ipi(int hart)
> > > > >  {
> > > > > -       int en;
> > > > > +       unsigned int en;
> > > >
> > > > Is this for some compiler warning fix?
> > >
> > > No, it is not a warning fix. It is a bug fix.
> > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> >
> > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> >
>
> Please consider the following simple program:
>
> > #define MASK 0x80808080
> >int main(){
> >        int en;
> >        en = MASK;
> >        printf("%x, shifted %x\n", en, en >> 1);
> >        return 0;
> >}
>
> Would you mind sharing what you get after running this on your x86_64
> (if you have one) computer?  Really appreiciate that.
>
> The almost identical episode is in the patch, specifically,
> > en = ENABLE_HART_IPI >> hart
>
> > > or it will cause IPI sending errors.
> > >
> >
> > Regards,
> > Bin
>
> Best,
> Alan

Regards,
Anup
Anup Patel Oct. 31, 2019, 10:43 a.m. UTC | #10
On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> >
> > Hi Bin,
> >
> > Thanks for the critics.  Comments below.
> > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > Hi Rick,
> > >
> > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Bin
> > > >
> > > > >
> > > > > Hi Rick,
> > > > >
> > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > >
> > > > > > From: Rick Chen <rick@andestech.com>
> > > > > >
> > > > > > It will work fine due to hart 0 always will be main
> > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > force other harts to be main hart. And it will go
> > > > > > wrong in sending IPI flow. So fix it.
> > > > >
> > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > >
> > > > Yes, it include two fixs. But they will cause one negative result
> > > > that only hart 0 can send ipi to other harts.
> > > >
> > > > >
> > > > > >
> > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > and found there is an assumption that hart 0 shall be
> > > > > > main hart in OpenSbi.
> > > > >
> > > > > So does this mean there is a bug in OpenSBI too?
> > > >
> > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > >
> > > I don't think OpenSBI has such limitation.
> > >
> >
> > Please check the source.
> > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> >
> > Apparently, the FIRST TWO LINEs of the initialization are the
> > 1. get hart ID.
> > 2. determine which route to take based on their ID respectively.
> >
> > So, I do think OpenSBI has this signature, if you are not willing to call it
> > a limitation.
>
> This dependency on hart id #0 was not there until we added self-relocation
> in OpenSBI for FW_DYNAMIC.
>
> I will try to fix this in OpenSBI but we might end-up having boot_lottery.

I have sent a patch to fix this in OpenSBI:
"[PATCH] firmware: Introduce relocation lottery"

Can you try the above patch and see if that helps?

It would be great if you could provide a Tested-by for my patch as well.

Regards,
Anup

>
> >
> > > > But any hart can be main hart in U-Boot.
> > > >
> > > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > > but hart 0 still in U-Boot SPL.
> > > >
> > > > >
> > > > > >
> > > > > > After some work-arounds, it can pass the verifications
> > > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > > >
> > > > >
> > > > > It's a bit hard for me to understand what was described here in the
> > > > > commit message. Maybe you need rewrite something.
> > > >
> > > > OK. I will rewrite this commit message.
> > > >
> > > > >
> > > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > > ---
> > > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > > index 28568e4..42394b9 100644
> > > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > > @@ -19,7 +19,7 @@
> > > > > >  #include <cpu.h>
> > > > > >
> > > > > >  /* pending register */
> > > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > > >  /* enable register */
> > > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > > >  /* claim register */
> > > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > > >
> > > > > >  static int enable_ipi(int hart)
> > > > > >  {
> > > > > > -       int en;
> > > > > > +       unsigned int en;
> > > > >
> > > > > Is this for some compiler warning fix?
> > > >
> > > > No, it is not a warning fix. It is a bug fix.
> > > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> > >
> > > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> > >
> >
> > Please consider the following simple program:
> >
> > > #define MASK 0x80808080
> > >int main(){
> > >        int en;
> > >        en = MASK;
> > >        printf("%x, shifted %x\n", en, en >> 1);
> > >        return 0;
> > >}
> >
> > Would you mind sharing what you get after running this on your x86_64
> > (if you have one) computer?  Really appreiciate that.
> >
> > The almost identical episode is in the patch, specifically,
> > > en = ENABLE_HART_IPI >> hart
> >
> > > > or it will cause IPI sending errors.
> > > >
> > >
> > > Regards,
> > > Bin
> >
> > Best,
> > Alan
>
> Regards,
> Anup
Rick Chen Nov. 1, 2019, 5:25 a.m. UTC | #11
Hi Anup

>
> On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > Thanks for the critics.  Comments below.
> > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > Hi Rick,
> > > >
> > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > >
> > > > > Hi Bin
> > > > >
> > > > > >
> > > > > > Hi Rick,
> > > > > >
> > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > >
> > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > >
> > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > force other harts to be main hart. And it will go
> > > > > > > wrong in sending IPI flow. So fix it.
> > > > > >
> > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > >
> > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > that only hart 0 can send ipi to other harts.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > main hart in OpenSbi.
> > > > > >
> > > > > > So does this mean there is a bug in OpenSBI too?
> > > > >
> > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > >
> > > > I don't think OpenSBI has such limitation.
> > > >
> > >
> > > Please check the source.
> > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > >
> > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > 1. get hart ID.
> > > 2. determine which route to take based on their ID respectively.
> > >
> > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > a limitation.
> >
> > This dependency on hart id #0 was not there until we added self-relocation
> > in OpenSBI for FW_DYNAMIC.
> >
> > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
>
> I have send a patch to fix this OpenSBI:
> "[PATCH] firmware: Introduce relocation lottery"
>
> Can you try above patch and see if that helps ?
>
> It will be great if you can provide Tested-by to my patch as well.
>

OK

Thanks
Rick

> Regards,
> Anup
>
> >
> > >
> > > > > But any hart can be main hart in U-Boot.
> > > > >
> > > > > In general case, hart 0 will be main and it is fine when U-Boot jump ot OpenSBI.
> > > > > But if we force hart 1 to be main hart, when hart 0 jump to OPenSBI from U-Boot,
> > > > > It will do relocation flow in OpenSBI which willcorrupt U-Boot SPL,
> > > > > but hart 0 still in U-Boot SPL.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > After some work-arounds, it can pass the verifications
> > > > > > > that any hart can be main hart and boots U-Boot SPL ->
> > > > > > > OpenSbi -> U-Boot proper -> Linux Kernel successfully.
> > > > > > >
> > > > > >
> > > > > > It's a bit hard for me to understand what was described here in the
> > > > > > commit message. Maybe you need rewrite something.
> > > > >
> > > > > OK. I will rewrite this commit message.
> > > > >
> > > > > >
> > > > > > > Signed-off-by: Rick Chen <rick@andestech.com>
> > > > > > > Cc: KC Lin <kclin@andestech.com>
> > > > > > > Cc: Alan Kao <alankao@andestech.com>
> > > > > > > ---
> > > > > > >  arch/riscv/lib/andes_plic.c | 11 +++++++----
> > > > > > >  1 file changed, 7 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
> > > > > > > index 28568e4..42394b9 100644
> > > > > > > --- a/arch/riscv/lib/andes_plic.c
> > > > > > > +++ b/arch/riscv/lib/andes_plic.c
> > > > > > > @@ -19,7 +19,7 @@
> > > > > > >  #include <cpu.h>
> > > > > > >
> > > > > > >  /* pending register */
> > > > > > > -#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + (hart) * 8)
> > > > > > > +#define PENDING_REG(base, hart)        ((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
> > > > > > >  /* enable register */
> > > > > > >  #define ENABLE_REG(base, hart) ((ulong)(base) + 0x2000 + (hart) * 0x80)
> > > > > > >  /* claim register */
> > > > > > > @@ -46,7 +46,7 @@ static int init_plic(void);
> > > > > > >
> > > > > > >  static int enable_ipi(int hart)
> > > > > > >  {
> > > > > > > -       int en;
> > > > > > > +       unsigned int en;
> > > > > >
> > > > > > Is this for some compiler warning fix?
> > > > >
> > > > > No, it is not a warning fix. It is a bug fix.
> > > > > I hope en can be 0x0000000080808080 instead of 0xffffffff80808080,
> > > >
> > > > But it is int, which is only 32-bit. The example you gave was a 64-bit number.
> > > >
> > >
> > > Please consider the following simple program:
> > >
> > > > #define MASK 0x80808080
> > > >int main(){
> > > >        int en;
> > > >        en = MASK;
> > > >        printf("%x, shifted %x\n", en, en >> 1);
> > > >        return 0;
> > > >}
> > >
> > > Would you mind sharing what you get after running this on your x86_64
> > > (if you have one) computer?  Really appreiciate that.
> > >
> > > The almost identical episode is in the patch, specifically,
> > > > en = ENABLE_HART_IPI >> hart
> > >
> > > > > or it will cause IPI sending errors.
> > > > >
> > > >
> > > > Regards,
> > > > Bin
> > >
> > > Best,
> > > Alan
> >
> > Regards,
> > Anup
Rick Chen Nov. 5, 2019, 1:50 a.m. UTC | #12
Hi Anup

> > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > >
> > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > Thanks for the critics.  Comments below.
> > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > Hi Rick,
> > > > >
> > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin
> > > > > >
> > > > > > >
> > > > > > > Hi Rick,
> > > > > > >
> > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > >
> > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > >
> > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > >
> > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > >
> > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > that only hart 0 can send ipi to other harts.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > main hart in OpenSbi.
> > > > > > >
> > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > >
> > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > >
> > > > > I don't think OpenSBI has such limitation.
> > > > >
> > > >
> > > > Please check the source.
> > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > >
> > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > 1. get hart ID.
> > > > 2. determine which route to take based on their ID respectively.
> > > >
> > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > a limitation.
> > >
> > > This dependency on hart id #0 was not there until we added self-relocation
> > > in OpenSBI for FW_DYNAMIC.
> > >
> > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> >
> > I have send a patch to fix this OpenSBI:
> > "[PATCH] firmware: Introduce relocation lottery"
> >
> > Can you try above patch and see if that helps ?
> >
> > It will be great if you can provide Tested-by to my patch as well.
> >
>

I cannot find this patch on the mailing list.
Can you provide a link?

Thanks
Rick
Anup Patel Nov. 5, 2019, 6:34 a.m. UTC | #13
On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > >
> > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > Thanks for the critics.  Comments below.
> > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > Hi Rick,
> > > > > >
> > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Rick,
> > > > > > > >
> > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > >
> > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > >
> > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > >
> > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > main hart in OpenSbi.
> > > > > > > >
> > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > >
> > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > >
> > > > > > I don't think OpenSBI has such limitation.
> > > > > >
> > > > >
> > > > > Please check the source.
> > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > >
> > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > 1. get hart ID.
> > > > > 2. determine which route to take based on their ID respectively.
> > > > >
> > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > a limitation.
> > > >
> > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > in OpenSBI for FW_DYNAMIC.
> > > >
> > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > >
> > > I have send a patch to fix this OpenSBI:
> > > "[PATCH] firmware: Introduce relocation lottery"
> > >
> > > Can you try above patch and see if that helps ?
> > >
> > > It will be great if you can provide Tested-by to my patch as well.
> > >
> >
>
> I can not find this patch in mailing list.
> Can you provide a hyperlink ?

You can try the latest riscv/opensbi master.

I have tested the patch on SiFive Unleashed multiple times.

Regards,
Anup

>
> Thanks
> Rick
Rick Chen Nov. 6, 2019, 6:44 a.m. UTC | #14
Hi Anup

>
> On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Anup
> >
> > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > >
> > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > Thanks for the critics.  Comments below.
> > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > Hi Rick,
> > > > > > >
> > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Rick,
> > > > > > > > >
> > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > >
> > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > >
> > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > >
> > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > >
> > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > main hart in OpenSbi.
> > > > > > > > >
> > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > >
> > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > >
> > > > > > > I don't think OpenSBI has such limitation.
> > > > > > >
> > > > > >
> > > > > > Please check the source.
> > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > >
> > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > 1. get hart ID.
> > > > > > 2. determine which route to take based on their ID respectively.
> > > > > >
> > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > a limitation.
> > > > >
> > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > in OpenSBI for FW_DYNAMIC.
> > > > >
> > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > >
> > > > I have send a patch to fix this OpenSBI:
> > > > "[PATCH] firmware: Introduce relocation lottery"
> > > >
> > > > Can you try above patch and see if that helps ?
> > > >
> > > > It will be great if you can provide Tested-by to my patch as well.
> > > >
> > >
> >
> > I can not find this patch in mailing list.
> > Can you provide a hyperlink ?
>
> You can try latest riscv/opensbi master.
>
> I have tested the patch on SiFive Unleashed multiple times.

I have tried this patch, but it fails:
firmware: Introduce relocation lottery
(98f4a208995b027662a7b04a25e4fa5df5f3eefe)

The scenario is as follows:
There are 4 harts running in U-Boot SPL, with hart 0 acting as the main hart.
Hart 1 receives the IPI and enters OpenSBI (0x1000000) from
U-Boot SPL (0x0), while harts 0, 2 and 3 are still running in U-Boot SPL.
Hart 1 then executes _relocate_copy_to_lower, which copies data from
0x1000000 to 0x0.
This corrupts U-Boot SPL.
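
The conflict in this scenario is an overlap between the relocation
destination and the memory that U-Boot SPL is still executing from. A
small sketch of that check follows. The helper name and the region sizes
used with it are hypothetical; only the addresses 0x0 and 0x1000000 come
from the scenario above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the half-open ranges
 * [a_start, a_start + a_size) and [b_start, b_start + b_size) overlap.
 * Assumes neither range wraps around the end of the address space.
 */
bool regions_overlap(uint64_t a_start, uint64_t a_size,
		     uint64_t b_start, uint64_t b_size)
{
	return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

For example, with an assumed SPL size of 0x10000 and an assumed OpenSBI
size of 0x40000, regions_overlap(0x0, 0x10000, 0x0, 0x40000) is true
(copying to 0x0 lands on the running SPL), while
regions_overlap(0x0, 0x10000, 0x1000000, 0x40000) is false (the load
address itself does not touch the SPL).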

Thanks
Rick

>
> Regards,
> Anup
>
> >
> > Thanks
> > Rick
Anup Patel Nov. 6, 2019, 8:48 a.m. UTC | #15
On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> >
> > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Anup
> > >
> > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > Thanks for the critics.  Comments below.
> > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > Hi Rick,
> > > > > > > >
> > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Rick,
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > >
> > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > >
> > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > >
> > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > >
> > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > >
> > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > >
> > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > >
> > > > > > >
> > > > > > > Please check the source.
> > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > >
> > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > 1. get hart ID.
> > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > >
> > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > a limitation.
> > > > > >
> > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > >
> > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > >
> > > > > I have send a patch to fix this OpenSBI:
> > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > >
> > > > > Can you try above patch and see if that helps ?
> > > > >
> > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > >
> > > >
> > >
> > > I can not find this patch in mailing list.
> > > Can you provide a hyperlink ?
> >
> > You can try latest riscv/opensbi master.
> >
> > I have tested the patch on SiFive Unleashed multiple times.
>
> I have tried this patch, but it fail
> firmware: Introduce relocation lottery(
> 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
>
> The scenario was as below:
> There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> Then hart 1 will do _relocate_copy_to_lower which will copy data from
> 0x1000000 to 0x0.
> And it will corrupt U-Boot SPL.

The self-relocation in OpenSBI firmwares ensures that the OpenSBI firmware
is moved to FW_TEXT_START before entering C code. This lets
us load OpenSBI firmwares anywhere in RAM.

However, OpenSBI firmwares don't know where U-Boot SPL is running.

In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
the same address, 0x0. This means secondary HARTs cannot safely
wait while the primary HART enters OpenSBI. You should hold secondary HARTs
in U-Boot SPL only until OpenSBI FW_DYNAMIC and U-Boot proper have been
loaded into RAM by the primary HART. All your HARTs should jump to OpenSBI
at the same time, after everything is loaded in RAM.
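
A minimal sketch of that hand-off, assuming a shared flag in memory that
all harts can see. The names below are hypothetical and this is not
existing U-Boot code. The primary hart opens the gate once both images
are in RAM; the secondary harts spin on the gate and only then jump to
OpenSBI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate shared by all harts while they are still in SPL. */
typedef struct {
	atomic_bool images_loaded;
} boot_gate;

/* Must run once before any hart waits on the gate. */
void gate_init(boot_gate *g)
{
	atomic_init(&g->images_loaded, false);
}

/* Primary hart: call after OpenSBI and U-Boot proper are loaded in RAM. */
void gate_open(boot_gate *g)
{
	atomic_store_explicit(&g->images_loaded, true, memory_order_release);
}

bool gate_is_open(boot_gate *g)
{
	return atomic_load_explicit(&g->images_loaded, memory_order_acquire);
}

/* Secondary harts: spin here, then jump to OpenSBI with everyone else. */
void gate_wait(boot_gate *g)
{
	while (!gate_is_open(g))
		;
}
```

The release/acquire pair is there so that a secondary hart which sees the
gate open also sees the loaded images.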

Regards,
Anup
Anup Patel Nov. 6, 2019, 8:58 a.m. UTC | #16
On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Anup
> >
> > >
> > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Anup
> > > >
> > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > Hi Rick,
> > > > > > > > >
> > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Rick,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > >
> > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > >
> > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > >
> > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > >
> > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > >
> > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > >
> > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Please check the source.
> > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > >
> > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > 1. get hart ID.
> > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > >
> > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > a limitation.
> > > > > > >
> > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > >
> > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > >
> > > > > > I have send a patch to fix this OpenSBI:
> > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > >
> > > > > > Can you try above patch and see if that helps ?
> > > > > >
> > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > >
> > > > >
> > > >
> > > > I can not find this patch in mailing list.
> > > > Can you provide a hyperlink ?
> > >
> > > You can try latest riscv/opensbi master.
> > >
> > > I have tested the patch on SiFive Unleashed multiple times.
> >
> > I have tried this patch, but it fail
> > firmware: Introduce relocation lottery(
> > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> >
> > The scenario was as below:
> > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > 0x1000000 to 0x0.
> > And it will corrupt U-Boot SPL.
>
> The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> are moved to the FW_TEXT_START before entering C code. This helps
> us load OpenSBI firmwares anywhere in RAM.
>
> However, OpenSBI firmwares don't know where the U-Boot SPL is running.
>
> In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> address same address 0x0. This means secondary HARTs cannot safely
> wait while primary HART enters OpenSBI. You should hold secondary HARTs
> in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> at the same time after everything is loaded in RAM.

I see the issue now.

U-Boot SPL lets the secondary HARTs jump to OpenSBI first, and the
primary HART jumps to OpenSBI last
(see jump_to_image_no_args() in arch/riscv/lib/spl.c).

The real issue is that FW_TEXT_START is the same as the U-Boot SPL TEXT_START.

If possible, please change the TEXT base of U-Boot SPL or OpenSBI. I think
changing the U-Boot SPL TEXT_START would be more convenient since this
series is still under review. Thoughts?
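
The overlap can be expressed as a simple range check. The sketch below
is illustrative only: the 0x0 bases come from this thread, while the
image sizes and the alternative SPL base used in the usage note are
made-up values.

```c
#include <stdbool.h>
#include <stdint.h>

/* A contiguous memory range [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Two ranges overlap when each one starts before the other one ends.
 * Assumes base + size does not wrap around for either region.
 */
static bool regions_overlap(struct mem_region a, struct mem_region b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

For example, an SPL region at 0x0 and a relocation target at 0x0
overlap for any non-zero sizes, whereas an SPL region moved to a
hypothetical 0x800000 would not overlap a 0x20000-byte firmware at 0x0.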

Regards,
Anup

>
> Regards,
> Anup
Rick Chen Nov. 6, 2019, 9:21 a.m. UTC | #17
Hi Anup

>
> On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Anup
> > >
> > > >
> > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > >
> > > > > Hi Anup
> > > > >
> > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > Hi Rick,
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > >
> > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > >
> > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > >
> > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > >
> > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > >
> > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please check the source.
> > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > >
> > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > 1. get hart ID.
> > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > >
> > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > a limitation.
> > > > > > > >
> > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > >
> > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > >
> > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > >
> > > > > > > Can you try above patch and see if that helps ?
> > > > > > >
> > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > >
> > > > > >
> > > > >
> > > > > I can not find this patch in mailing list.
> > > > > Can you provide a hyperlink ?
> > > >
> > > > You can try latest riscv/opensbi master.
> > > >
> > > > I have tested the patch on SiFive Unleashed multiple times.
> > >
> > > I have tried this patch, but it fail
> > > firmware: Introduce relocation lottery(
> > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > >
> > > The scenario was as below:
> > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > 0x1000000 to 0x0.
> > > And it will corrupt U-Boot SPL.
> >
> > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > are moved to the FW_TEXT_START before entering C code. This helps
> > us load OpenSBI firmwares anywhere in RAM.
> >
> > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> >
> > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > address same address 0x0. This means secondary HARTs cannot safely
> > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > at the same time after everything is loaded in RAM.
>
> I see the issue now.
>
> The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> HART jumps to OpenSBI at the end.
> (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
>
> The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
>
> If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> changing U-Boot SPL TEXT_START would be convenient since this series
> is under review. Thoughts ?

Yes.
I know the corruption can be avoided by making the U-Boot SPL
TEXT_START different from the OpenSBI TEXT base.

With the following changes, the U-Boot SPL text base could stay equal to
the OpenSBI text base:
1. U-Boot passes the main hart information (a2) when jumping to OpenSBI.
2. OpenSBI picks up $a2 so that this hart keeps playing the main hart; the
   other harts go to _wait_relocate_copy_done.

For now, changing the U-Boot SPL text base is the convenient option.

Thanks
Rick

>
> Regards,
> Anup
>
> >
> > Regards,
> > Anup
Anup Patel Nov. 6, 2019, 11:11 a.m. UTC | #18
On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> >
> > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Anup
> > > >
> > > > >
> > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > >
> > > > > > Hi Anup
> > > > > >
> > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > Hi Rick,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > >
> > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > >
> > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please check the source.
> > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > >
> > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > 1. get hart ID.
> > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > >
> > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > a limitation.
> > > > > > > > >
> > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > >
> > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > >
> > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > >
> > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > >
> > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > I can not find this patch in mailing list.
> > > > > > Can you provide a hyperlink ?
> > > > >
> > > > > You can try latest riscv/opensbi master.
> > > > >
> > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > >
> > > > I have tried this patch, but it fail
> > > > firmware: Introduce relocation lottery(
> > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > >
> > > > The scenario was as below:
> > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > 0x1000000 to 0x0.
> > > > And it will corrupt U-Boot SPL.
> > >
> > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > are moved to the FW_TEXT_START before entering C code. This helps
> > > us load OpenSBI firmwares anywhere in RAM.
> > >
> > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > >
> > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > address same address 0x0. This means secondary HARTs cannot safely
> > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > at the same time after everything is loaded in RAM.
> >
> > I see the issue now.
> >
> > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > HART jumps to OpenSBI at the end.
> > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> >
> > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> >
> > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > changing U-Boot SPL TEXT_START would be convenient since this series
> > is under review. Thoughts ?
>
> Yes.
> I know it can avoid corrupting issue with changing  U-Boot SPL
> TEXT_START not equal to OpenSBI TEXT base.

I think this issue will be seen on U-Boot SPL running on QEMU as well.

>
> With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> _wait_relocate_copy_done

Overall it's a good suggestion, but we cannot use the a2 register because
this will break FW_JUMP and FW_PAYLOAD. Instead, we should pass the
preferred boot HART id in struct fw_dynamic_info.

I have a patch for this in the preferred_boot_hart_v1 branch of
https://github.com/avpatel/opensbi.git

Can you try OpenSBI from the above branch?

You will have to update the "struct fw_dynamic_info" passed to
OpenSBI by U-Boot SPL.

Meanwhile, I will try the above patch on QEMU and SiFive Unleashed.
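
As a rough sketch of the idea, the info structure could carry one extra
field naming the preferred boot HART. The layout, the field names and
the "no preference" sentinel below are assumptions made for
illustration; check the branch above for the actual definition.

```c
#include <stdbool.h>

/* Hypothetical layout; only the boot_hart idea comes from this thread. */
struct fw_dynamic_info_sketch {
	unsigned long magic;
	unsigned long version;
	unsigned long next_addr;
	unsigned long next_mode;
	unsigned long options;
	unsigned long boot_hart;	/* preferred boot HART id */
};

/* Assumed sentinel meaning "no preference, let the HARTs race". */
#define NO_PREFERRED_BOOT_HART	(~0UL)

/* Decide whether this HART should take the boot (relocating) role. */
static bool hart_takes_boot_role(const struct fw_dynamic_info_sketch *info,
				 unsigned long hartid, bool won_lottery)
{
	if (info->boot_hart != NO_PREFERRED_BOOT_HART)
		return hartid == info->boot_hart;
	return won_lottery;
}
```

Carrying the preference inside the structure keeps the register usage
of the other firmware types untouched, which is the concern raised
about a2 above.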

>
> It will be convenient to change U-Boot SPL text base currently.

Let's fix this properly right now.

Regards,
Anup
Rick Chen Nov. 7, 2019, 1:34 a.m. UTC | #19
Hi Anup

> On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Anup
> >
> > >
> > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > >
> > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > >
> > > > > Hi Anup
> > > > >
> > > > > >
> > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Anup
> > > > > > >
> > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Please check the source.
> > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > >
> > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > >
> > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > a limitation.
> > > > > > > > > >
> > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > >
> > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > >
> > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > >
> > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > >
> > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > I can not find this patch in mailing list.
> > > > > > > Can you provide a hyperlink ?
> > > > > >
> > > > > > You can try latest riscv/opensbi master.
> > > > > >
> > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > >
> > > > > I have tried this patch, but it fail
> > > > > firmware: Introduce relocation lottery(
> > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > >
> > > > > The scenario was as below:
> > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > 0x1000000 to 0x0.
> > > > > And it will corrupt U-Boot SPL.
> > > >
> > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > us load OpenSBI firmwares anywhere in RAM.
> > > >
> > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > >
> > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > at the same time after everything is loaded in RAM.
> > >
> > > I see the issue now.
> > >
> > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > HART jumps to OpenSBI at the end.
> > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > >
> > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > >
> > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > is under review. Thoughts ?
> >
> > Yes.
> > I know it can avoid corrupting issue with changing  U-Boot SPL
> > TEXT_START not equal to OpenSBI TEXT base.
>
> I think this issue will be seen on U-Boot SPL running on QEMU as well.
>
> >
> > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > _wait_relocate_copy_done
>
> Overall it's a good suggestion but we cannot use a2 register because this
> will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> boot HART id in struct fw_dynamic_info.

Sorry, I meant to say a3.

>
> I have a patch for this in preferred_boot_hart_v1 branch of
> https://github.com/avpatel/opensbi.git
>
> Can you try OpenSBI from above branch ?
>
> You will have to update the "struct fw_dynamic_info" passed to
> OpenSBI by U-Boot SPL.

In U-Boot SPL, the main hart passes struct "fw_dynamic_info" to OpenSBI,
but the other harts do NOT pass it.

So if U-Boot SPL passes the main hart information via a3, OpenSBI as it
was before commit 98f4a208995b027662a7b04a25e4fa5df5f3eefe
("firmware: Introduce relocation lottery") needs only the following change:

    blt zero, a6, _wait_relocate_copy_done

becomes

    bne a3, a6, _wait_relocate_copy_done
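
The two branch conditions can be modelled in C as below. This is only a
model of the comparison each instruction performs, assuming a6 holds the
hart id and a3 the main hart id as described in this mail; the function
names are made up.

```c
#include <stdbool.h>

/*
 * blt zero, a6, _wait_relocate_copy_done
 * Branch taken when 0 < a6 (signed), so every hart except hart 0 waits.
 */
static bool waits_with_fixed_hart0(long hartid)
{
	return 0 < hartid;
}

/*
 * bne a3, a6, _wait_relocate_copy_done
 * Branch taken when a3 != a6, so every hart except the main hart waits.
 */
static bool waits_with_main_hart_in_a3(long main_hart, long hartid)
{
	return main_hart != hartid;
}
```

With main hart 1, hart 0 would wait and hart 1 would do the copy, which
is the behaviour wanted for the scenario reported earlier.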

But after commit 98f4a, the main hart is chosen by the lottery mechanism.
So I would prefer to change the U-Boot SPL text base so that it does not
overlap the OpenSBI text start. :)
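
For reference, a lottery of this kind is commonly built on an atomic
fetch-and-add: the first hart to bump a shared counter wins. The sketch
below shows that general technique in C; it is an assumption about how
such a lottery can work, not a transcription of the commit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Each hart calls this once on entry. atomic_fetch_add returns the old
 * counter value, so exactly one caller sees 0 and becomes the winner.
 */
static bool wins_relocation_lottery(atomic_ulong *ticket)
{
	return atomic_fetch_add(ticket, 1) == 0;
}
```

Because the winner is simply whichever hart arrives first, a hart
released early from SPL wins while the others are still running SPL
code, which matches the corruption scenario reported earlier.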

Thanks
Rick

>
> Meanwhile, I will try above patch on QEMU and SiFive Unleashed.
>
> >
> > It will be convenient to change U-Boot SPL text base currently.
>
> Let's fix this correctly now itself.
>
> Regards,
> Anup
Anup Patel Nov. 7, 2019, 5:15 a.m. UTC | #20
On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Anup
> > >
> > > >
> > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > >
> > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > >
> > > > > > Hi Anup
> > > > > >
> > > > > > >
> > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Anup
> > > > > > > >
> > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > >
> > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > >
> > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > a limitation.
> > > > > > > > > > >
> > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > >
> > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > >
> > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > >
> > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > >
> > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > I can not find this patch in mailing list.
> > > > > > > > Can you provide a hyperlink ?
> > > > > > >
> > > > > > > You can try latest riscv/opensbi master.
> > > > > > >
> > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > >
> > > > > > I have tried this patch, but it fail
> > > > > > firmware: Introduce relocation lottery(
> > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > >
> > > > > > The scenario was as below:
> > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > 0x1000000 to 0x0.
> > > > > > And it will corrupt U-Boot SPL.
> > > > >
> > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > >
> > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > >
> > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > at the same time after everything is loaded in RAM.
> > > >
> > > > I see the issue now.
> > > >
> > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > HART jumps to OpenSBI at the end.
> > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > >
> > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > >
> > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > is under review. Thoughts ?
> > >
> > > Yes.
> > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > TEXT_START not equal to OpenSBI TEXT base.
> >
> > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> >
> > >
> > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > _wait_relocate_copy_done
> >
> > Overall it's a good suggestion but we cannot use a2 register because this
> > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > boot HART id in struct fw_dynamic_info.
>
> Sorry, what I want to say shall be a3.
>
> >
> > I have a patch for this in preferred_boot_hart_v1 branch of
> > https://github.com/avpatel/opensbi.git
> >
> > Can you try OpenSBI from above branch ?
> >
> > You will have to update the "struct fw_dynamic_info" passed to
> > OpenSBI by U-Boot SPL.
>
> Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.

That's wrong in U-Boot SPL.

All HARTs have to follow the FW_DYNAMIC protocol and pass the
"struct fw_dynamic_info" pointer in the 'a2' register.

>
> So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> have the following change
> blt zero, a6, _wait_relocate_copy_done
> change to
> bne a3, a6, _wait_relocate_copy_done
> before this commit
> 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> firmware: Introduce relocation lottery

What about FW_JUMP and FW_PAYLOAD? We have no way of passing a
value in a3 to these firmwares because they are not booted by U-Boot
SPL.

Also, U-Boot 2019.10 already ships U-Boot SPL support, which does not
pass anything in the 'a3' register.

We should definitely use "struct fw_dynamic_info" for this so that we can
maintain backward compatibility as well.

Please make sure that U-Boot SPL passes the "struct fw_dynamic_info"
pointer in the 'a2' register for all HARTs.
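
For illustration, here is a minimal sketch of how a preferred boot HART
could ride along in "struct fw_dynamic_info". The boot_hart field name,
the "no preference" marker, the magic value and the helper
hart_should_relocate() are assumptions made for this example; they are
not copied from the preferred_boot_hart_v1 branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed magic value ("OSBI"); check the OpenSBI headers before relying on it. */
#define FW_DYNAMIC_INFO_MAGIC		0x4942534fUL
/* Hypothetical marker meaning "the previous stage has no preference". */
#define FW_DYNAMIC_NO_PREFERENCE	(~0UL)

/* Illustrative layout only; the field names are assumptions. */
struct fw_dynamic_info {
	unsigned long magic;
	unsigned long version;
	unsigned long next_addr;
	unsigned long next_mode;
	unsigned long options;
	unsigned long boot_hart;	/* hypothetical: preferred boot HART id */
};

/*
 * Return true when the calling HART should perform the relocation copy.
 * won_lottery is the outcome of whatever fallback selection is used when
 * the previous stage gives no usable preference.
 */
static bool hart_should_relocate(const struct fw_dynamic_info *info,
				 unsigned long hartid, bool won_lottery)
{
	/* No valid info (e.g. FW_JUMP or FW_PAYLOAD): keep the fallback. */
	if (!info || info->magic != FW_DYNAMIC_INFO_MAGIC)
		return won_lottery;
	if (info->boot_hart == FW_DYNAMIC_NO_PREFERENCE)
		return won_lottery;
	/* A preference was given: only that HART relocates. */
	return info->boot_hart == hartid;
}
```

Because the preference lives inside the structure, a previous stage that
does not know about it can keep working through the fallback path, which
is the backward-compatibility point made above.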

>
> But after this commit 98f4a, main hart become chosen from lottery mechanism.
> Maybe I will prefer to change U-Boot SPL text base not overlap with
> OpenSBI text start. :)

As I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
just that most of us did not notice it there.
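
To make the overlap concrete, here is a small sketch of the check that
decides whether the OpenSBI relocation copy would land on a U-Boot SPL
that is still running. The helper names and the image sizes used in any
call are assumptions for illustration; only the 0x0 text bases come from
this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [a, a + a_size) and [b, b + b_size) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
			   uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/*
 * True when relocating a firmware image of fw_size bytes to fw_text_start
 * would overwrite an SPL image of spl_size bytes at spl_text_base.
 */
static bool relocation_clobbers_spl(uint64_t fw_text_start, uint64_t fw_size,
				    uint64_t spl_text_base, uint64_t spl_size)
{
	return ranges_overlap(fw_text_start, fw_size, spl_text_base, spl_size);
}
```

With FW_TEXT_START and the SPL text base both at 0x0 the check is true
for any non-zero sizes, so a HART that is still executing SPL code is at
risk as soon as another HART starts the copy.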

Let's fix this the right way from the start.

Regards,
Anup

>
> Thanks
> Rick
>
> >
> > Meanwhile, I will try above patch on QEMU and SiFive Unleashed.
> >
> > >
> > > It will be convenient to change U-Boot SPL text base currently.
> >
> > Let's fix this correctly now itself.
> >
> > Regards,
> > Anup
Anup Patel Nov. 7, 2019, 5:45 a.m. UTC | #21
On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
>
> On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Anup
> >
> > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Anup
> > > >
> > > > >
> > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > >
> > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Anup
> > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > >
> > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > a limitation.
> > > > > > > > > > > >
> > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > >
> > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > >
> > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > >
> > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > >
> > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > Can you provide a hyperlink ?
> > > > > > > >
> > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > >
> > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > >
> > > > > > > I have tried this patch, but it fail
> > > > > > > firmware: Introduce relocation lottery(
> > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > >
> > > > > > > The scenario was as below:
> > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > 0x1000000 to 0x0.
> > > > > > > And it will corrupt U-Boot SPL.
> > > > > >
> > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > >
> > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > >
> > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > at the same time after everything is loaded in RAM.
> > > > >
> > > > > I see the issue now.
> > > > >
> > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > HART jumps to OpenSBI at the end.
> > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > >
> > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > >
> > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > is under review. Thoughts ?
> > > >
> > > > Yes.
> > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > TEXT_START not equal to OpenSBI TEXT base.
> > >
> > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > >
> > > >
> > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > _wait_relocate_copy_done
> > >
> > > Overall it's a good suggestion but we cannot use a2 register because this
> > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > boot HART id in struct fw_dynamic_info.
> >
> > Sorry, what I want to say shall be a3.
> >
> > >
> > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > https://github.com/avpatel/opensbi.git
> > >
> > > Can you try OpenSBI from above branch ?
> > >
> > > You will have to update the "struct fw_dynamic_info" passed to
> > > OpenSBI by U-Boot SPL.
> >
> > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
>
> That's wrong in U-Boot SPL.
>
> All HARTs have to follow FW_DYNAMIC protocol and pass
> "struct fw_dynamic_info" pointer in 'a2' register.
>
> >
> > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > have the following change
> > blt zero, a6, _wait_relocate_copy_done
> > change to
> > bne a3, a6, _wait_relocate_copy_done
> > before this commit
> > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > firmware: Introduce relocation lottery
>
> What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> value in a3 for these firmwares because these are not booted by U-Boot
> SPL.
>
> Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> pass anything in 'a3' register.
>
> We should definitely use "struct fw_dynamic_info" for this so that we can
> maintain backward compatibility as well.
>
> Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> pointer in 'a2' register for all HARTs.
>
> >
> > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > OpenSBI text start. :)
>
> Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> just that most of us did not notice it for U-Boot SPL on QEMU.
>
> Let's fix this in the right way from start itself.

I double-checked spl_invoke_opensbi() and it is doing the right thing
by passing the "struct fw_dynamic_info" pointer in the 'a2' register.
(See common/spl/spl_opensbi.c.)

I am not sure why the 'a2' register is not being passed correctly for you.
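
If it helps with debugging, here is a rough sketch of a check you could
model on the entry arguments of each HART. The hart_entry_args record,
the first_bad_hart() helper and the magic value are assumptions for this
example; the register roles (a0 = HART id, a1 = device tree address,
a2 = info pointer) are my reading of the FW_DYNAMIC convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed magic value ("OSBI"); check the OpenSBI headers before relying on it. */
#define FW_DYNAMIC_INFO_MAGIC	0x4942534fUL

/* Only the leading field matters for this check; the rest is omitted. */
struct fw_dynamic_info {
	unsigned long magic;
};

/* Hypothetical record of the argument registers one HART had on entry. */
struct hart_entry_args {
	uintptr_t a0;	/* HART id */
	uintptr_t a1;	/* device tree address */
	uintptr_t a2;	/* address of struct fw_dynamic_info */
};

/*
 * Return the index of the first HART whose a2 is zero, differs from the
 * first HART's a2, or does not point at a valid magic. Return -1 when
 * every HART passes the same valid pointer.
 */
static int first_bad_hart(const struct hart_entry_args *args, size_t nharts)
{
	for (size_t i = 0; i < nharts; i++) {
		const struct fw_dynamic_info *info =
			(const struct fw_dynamic_info *)args[i].a2;

		if (!info || args[i].a2 != args[0].a2 ||
		    info->magic != FW_DYNAMIC_INFO_MAGIC)
			return (int)i;
	}
	return -1;
}
```

A result of -1 would mean all HARTs agree on a valid pointer; any other
value names the first HART worth inspecting.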

Regards,
Anup

>
> Regards,
> Anup
>
> >
> > Thanks
> > Rick
> >
> > >
> > > Meanwhile, I will try above patch on QEMU and SiFive Unleashed.
> > >
> > > >
> > > > It will be convenient to change U-Boot SPL text base currently.
> > >
> > > Let's fix this correctly now itself.
> > >
> > > Regards,
> > > Anup
Rick Chen Nov. 7, 2019, 6:10 a.m. UTC | #22
Hi Anup

>
> On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> >
> > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Anup
> > >
> > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > >
> > > > > Hi Anup
> > > > >
> > > > > >
> > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > >
> > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Anup
> > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Anup
> > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > >
> > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > >
> > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > >
> > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > >
> > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > >
> > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > >
> > > > > > > > I have tried this patch, but it fail
> > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > >
> > > > > > > > The scenario was as below:
> > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > 0x1000000 to 0x0.
> > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > >
> > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > >
> > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > >
> > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > at the same time after everything is loaded in RAM.
> > > > > >
> > > > > > I see the issue now.
> > > > > >
> > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > HART jumps to OpenSBI at the end.
> > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > >
> > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > >
> > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > is under review. Thoughts ?
> > > > >
> > > > > Yes.
> > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > >
> > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > >
> > > > >
> > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > _wait_relocate_copy_done
> > > >
> > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > boot HART id in struct fw_dynamic_info.
> > >
> > > Sorry, what I want to say shall be a3.
> > >
> > > >
> > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > https://github.com/avpatel/opensbi.git
> > > >
> > > > Can you try OpenSBI from above branch ?
> > > >
> > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > OpenSBI by U-Boot SPL.
> > >
> > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> >
> > That's wrong in U-Boot SPL.
> >
> > All HARTs have to follow FW_DYNAMIC protocol and pass
> > "struct fw_dynamic_info" pointer in 'a2' register.
> >
> > >
> > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > have the following change
> > > blt zero, a6, _wait_relocate_copy_done
> > > change to
> > > bne a3, a6, _wait_relocate_copy_done
> > > before this commit
> > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > firmware: Introduce relocation lottery
> >
> > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > value in a3 for these firmwares because these are not booted by U-Boot
> > SPL.
> >
> > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > pass anything in 'a3' register.
> >
> > We should definitely use "struct fw_dynamic_info" for this so that we can
> > maintain backward compatibility as well.
> >
> > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > pointer in 'a2' register for all HARTs.
> >
> > >
> > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > OpenSBI text start. :)
> >
> > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > just that most of us did not notice it for U-Boot SPL on QEMU.
> >
> > Let's fix this in the right way from start itself.
>
> I double checked spl_invoke_opensbi() and it is doing the right thing
> by passing "struct fw_dynamic_info" pointer in 'a2' register.
> (Refer, common/spl/spl_opensbi.c)
>
> Not sure, why it is not passing 'a2' register correctly for you ??
>

Yes, you are right. I replied too quickly.
The other harts do pass struct fw_dynamic_info in a2 to OpenSBI.

Thanks for the correction.

Regards,
Rick
Anup Patel Nov. 7, 2019, 6:18 a.m. UTC | #23
On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> >
> > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > >
> > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > >
> > > > Hi Anup
> > > >
> > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > >
> > > > > > Hi Anup
> > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Anup
> > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > >
> > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > >
> > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > >
> > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > >
> > > > > > > > > The scenario was as below:
> > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > >
> > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > >
> > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > >
> > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > >
> > > > > > > I see the issue now.
> > > > > > >
> > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > >
> > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > >
> > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > is under review. Thoughts ?
> > > > > >
> > > > > > Yes.
> > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > >
> > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > >
> > > > > >
> > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > _wait_relocate_copy_done
> > > > >
> > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > boot HART id in struct fw_dynamic_info.
> > > >
> > > > Sorry, what I want to say shall be a3.
> > > >
> > > > >
> > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > https://github.com/avpatel/opensbi.git
> > > > >
> > > > > Can you try OpenSBI from above branch ?
> > > > >
> > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > OpenSBI by U-Boot SPL.
> > > >
> > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > >
> > > That's wrong in U-Boot SPL.
> > >
> > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > "struct fw_dynamic_info" pointer in 'a2' register.
> > >
> > > >
> > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > have the following change
> > > > blt zero, a6, _wait_relocate_copy_done
> > > > change to
> > > > bne a3, a6, _wait_relocate_copy_done
> > > > before this commit
> > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > firmware: Introduce relocation lottery
> > >
> > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > value in a3 for these firmwares because these are not booted by U-Boot
> > > SPL.
> > >
> > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > pass anything in 'a3' register.
> > >
> > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > maintain backward compatibility as well.
> > >
> > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > pointer in 'a2' register for all HARTs.
> > >
> > > >
> > > > But after commit 98f4a, the main hart is chosen by the lottery mechanism.
> > > > I would prefer to change the U-Boot SPL text base so that it does not
> > > > overlap with the OpenSBI text start. :)
> > >
> > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > >
> > > Let's fix this in the right way from start itself.
> >
> > I double-checked spl_invoke_opensbi() and it is doing the right thing
> > by passing the "struct fw_dynamic_info" pointer in the 'a2' register.
> > (Refer to common/spl/spl_opensbi.c)
> >
> > I am not sure why it is not passing the 'a2' register correctly for you.
> >
>
> Yes, you are right. I replied too quickly.
> Other harts will pass struct fw_dynamic_info in a2 to OpenSBI.
>
> Thanks for your corrections.

No problem, I am happy to help.

BTW, I tried to play around with U-Boot SPL on QEMU.

Maybe below changes can help you...

diff --git a/common/spl/spl_opensbi.c b/common/spl/spl_opensbi.c
index a6b4480ed2..79ee7edcf9 100644
--- a/common/spl/spl_opensbi.c
+++ b/common/spl/spl_opensbi.c
@@ -69,6 +69,7 @@ void spl_invoke_opensbi(struct spl_image_info *spl_image)
     opensbi_info.next_addr = uboot_entry;
     opensbi_info.next_mode = FW_DYNAMIC_INFO_NEXT_MODE_S;
     opensbi_info.options = SBI_SCRATCH_NO_BOOT_PRINTS;
+    opensbi_info.boot_hart = gd->arch.boot_hart;

     opensbi_entry = (void (*)(ulong, ulong, ulong))spl_image->entry_point;
     invalidate_icache_all();
diff --git a/include/opensbi.h b/include/opensbi.h
index 9f1d62e7dd..29a95fdfb6 100644
--- a/include/opensbi.h
+++ b/include/opensbi.h
@@ -11,7 +11,7 @@
 #define FW_DYNAMIC_INFO_MAGIC_VALUE        0x4942534f

 /** Maximum supported info version */
-#define FW_DYNAMIC_INFO_VERSION            0x1
+#define FW_DYNAMIC_INFO_VERSION            0x2

 /** Possible next mode values */
 #define FW_DYNAMIC_INFO_NEXT_MODE_U        0x0
@@ -35,6 +35,8 @@ struct fw_dynamic_info {
     unsigned long next_mode;
     /** Options for OpenSBI library */
     unsigned long options;
+    /** Preferred boot HART id */
+    unsigned long boot_hart;
 } __packed;

 #endif
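
For illustration, here is a minimal sketch of how the receiving side might read the new field while staying compatible with callers that still pass a version 1 structure. It assumes the layout implied by the patch above, with `magic`, `version`, and `next_addr` preceding the fields shown in the hunk. `preferred_boot_hart()`, `FW_DYNAMIC_INFO_VERSION_2`, and `NO_PREFERRED_HART` are hypothetical names, and how OpenSBI actually consumes `boot_hart` is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define FW_DYNAMIC_INFO_MAGIC_VALUE	0x4942534f
#define FW_DYNAMIC_INFO_VERSION_2	0x2	/* hypothetical name */
#define NO_PREFERRED_HART		(~0UL)	/* hypothetical sentinel */

/* Layout assumed from the patch above; not copied from a real header. */
struct fw_dynamic_info {
	unsigned long magic;
	unsigned long version;
	unsigned long next_addr;
	unsigned long next_mode;
	unsigned long options;
	unsigned long boot_hart;	/* only meaningful when version >= 2 */
};

/*
 * Return the preferred boot HART id, or NO_PREFERRED_HART when the
 * caller passed an older (version 1) structure or a bad magic value.
 * In that case the firmware could fall back to its own selection.
 */
unsigned long preferred_boot_hart(const struct fw_dynamic_info *info)
{
	assert(info != NULL);

	if (info->magic != FW_DYNAMIC_INFO_MAGIC_VALUE)
		return NO_PREFERRED_HART;
	if (info->version < FW_DYNAMIC_INFO_VERSION_2)
		return NO_PREFERRED_HART;

	return info->boot_hart;
}
```

Gating on the version field is what would let a U-Boot-2019.10 SPL, which knows nothing about `boot_hart`, keep booting unchanged.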


Regards,
Anup
Lukas Auer Nov. 7, 2019, 9:41 a.m. UTC | #24
On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> No problem, I am happy to help.
> 
> BTW, I tried to play around with U-Boot SPL on QEMU.
> 
> Maybe below changes can help you...

Thanks for looking into this issue! I successfully tested it on QEMU; I
had to add a short delay between sending the IPIs to trigger the
problem.

We might still run into problems, however. Right now, we are assuming
that the main hart is the last one to enter OpenSBI. If this is not the
case (for example, because of some delay when handling the IPI), we will
have the same problem again. To fix this, we could pass the hart mask,
containing all harts that have entered U-Boot, to OpenSBI and wait for
all harts to be running in OpenSBI. I am not sure how realistic this
scenario is, so this might not be needed.
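
The hart-mask idea could be sketched roughly as follows, using C11 atomics. This is only an illustration of the bookkeeping, not code from U-Boot or OpenSBI: `entered_mask`, `hart_mark_entered()`, and `all_harts_entered()` are hypothetical names, and the sketch assumes every hart id is smaller than the width of an `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared word: bit N is set once hart N has entered. */
static atomic_ulong entered_mask;

/* Called by each hart as soon as it starts running in the firmware. */
void hart_mark_entered(unsigned long hartid)
{
	assert(hartid < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or(&entered_mask, 1UL << hartid);
}

/*
 * True once every hart named in expected_mask (the mask handed over
 * by the previous boot stage) has marked itself as entered.
 */
bool all_harts_entered(unsigned long expected_mask)
{
	return (atomic_load(&entered_mask) & expected_mask) == expected_mask;
}
```

The hart that performs the relocation would spin on `all_harts_entered(mask)` before copying anything, so no hart could still be executing from the region that is about to be overwritten.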

Regards,
Lukas
Anup Patel Nov. 7, 2019, 10:44 a.m. UTC | #25
On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
<lukas.auer@aisec.fraunhofer.de> wrote:
>
> Thanks for looking into this issue! I successfully tested it on QEMU; I
> had to add a short delay between sending the IPIs to trigger the
> problem.
>
> We might still run into problems, however. Right now, we are assuming
> that the main hart is the last one to enter OpenSBI. If this is not the
> case (for example, because of some delay when handling the IPI), we will
> have the same problem again. To fix this, we could pass the hart mask,
> containing all harts that have entered U-Boot, to OpenSBI and wait for
> all harts to be running in OpenSBI. I am not sure how realistic this
> scenario is, so this might not be needed.

I agree that we might still run into this issue if the primary HART
enters OpenSBI before the secondary HARTs. I think this situation can
only happen on QEMU, where each CPU is a thread running on the host; it
is very unlikely, if not impossible, on real HW.

Maybe add a delay on the primary HART in U-Boot SPL, after the SMP calls
to the secondary HARTs and before jumping to OpenSBI?

Regarding hart_mask in fw_dynamic_info, I think the issue will be the
size of the hart_mask. In future, SoC vendors may come up with SoCs
that have a huge number of HARTs, or SoCs with discontinuous HART IDs,
either of which can make a 64-bit hart_mask insufficient to cover all
HARTs.
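
To make the size concern concrete, here is a small sketch of a multi-word bitmap that is not limited to the width of one machine word. All names (`struct hart_bitmap`, `MAX_HARTS`, the helper functions) are hypothetical and chosen only for this example. A fixed structure like this still needs an upper bound, which is exactly the difficulty with sparse or very large HART id spaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MAX_HARTS	256	/* hypothetical upper bound on HART ids */
#define HART_WORDS	((MAX_HARTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct hart_bitmap {
	unsigned long bits[HART_WORDS];
};

/* Set the bit for hartid; returns false if the id does not fit. */
bool hart_bitmap_set(struct hart_bitmap *bm, unsigned long hartid)
{
	assert(bm != NULL);
	if (hartid >= MAX_HARTS)
		return false;
	bm->bits[hartid / BITS_PER_WORD] |= 1UL << (hartid % BITS_PER_WORD);
	return true;
}

/* Report whether the bit for hartid is set; out-of-range ids are unset. */
bool hart_bitmap_test(const struct hart_bitmap *bm, unsigned long hartid)
{
	assert(bm != NULL);
	if (hartid >= MAX_HARTS)
		return false;
	return (bm->bits[hartid / BITS_PER_WORD] >> (hartid % BITS_PER_WORD)) & 1UL;
}
```

A HART id of 200 fits here but would not fit in a single 64-bit mask, while an id such as 4096 is rejected even by this larger structure.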

Also, waiting for all HARTs to enter OpenSBI would require one more wait
loop in fw_base.S, which would add to the boot time as well.

I still think the root cause of the issue is that the TEXT_START of
U-Boot SPL and OpenSBI FW_DYNAMIC is the same. Maybe we can
insist that SoC vendors not use the same TEXT_START?
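
The overlap condition itself is easy to state in code. The following is a sketch of a check that a build script or an early sanity test might perform. `regions_overlap()` is a hypothetical helper, and the sizes used with it below are made up for illustration; only the base addresses 0x0 and 0x1000000 come from the scenario described earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when the half-open ranges [a_start, a_start + a_size)
 * and [b_start, b_start + b_size) share at least one byte.
 * Assumes neither range wraps around the end of the address space.
 */
bool regions_overlap(unsigned long a_start, unsigned long a_size,
		     unsigned long b_start, unsigned long b_size)
{
	assert(a_start + a_size >= a_start);
	assert(b_start + b_size >= b_start);

	if (a_size == 0 || b_size == 0)
		return false;

	return a_start < b_start + b_size && b_start < a_start + a_size;
}
```

With SPL linked at 0x0 and the OpenSBI relocation target also at 0x0, the check reports an overlap for any non-zero sizes. The load address 0x1000000 does not overlap a 64 KiB SPL at 0x0, which is why the problem only appears once the relocation copy starts.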

Regards,
Anup
Rick Chen Nov. 7, 2019, 11:41 a.m. UTC | #26
Hi Anup & Lukas

On Thu, Nov 7, 2019 at 6:44 PM Anup Patel <anup@brainfault.org> wrote:
>
> On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> <lukas.auer@aisec.fraunhofer.de> wrote:
> >
> > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > Hi Anup
> > > >
> > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > Hi Anup
> > > > > > >
> > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > >
> > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > >
> > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > >
> > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > >
> > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > >
> > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > >
> > > > > > > > > > I see the issue now.
> > > > > > > > > >
> > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > >
> > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > >
> > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > is under review. Thoughts ?
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > >
> > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > >
> > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > _wait_relocate_copy_done
> > > > > > > >
> > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > >
> > > > > > > Sorry, what I want to say shall be a3.
> > > > > > >
> > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > >
> > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > >
> > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > >
> > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > >
> > > > > > That's wrong in U-Boot SPL.
> > > > > >
> > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > >
> > > > > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > > > > have the following change
> > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > change to
> > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > before this commit
> > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > firmware: Introduce relocation lottery
> > > > > >
> > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > SPL.
> > > > > >
> > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > pass anything in 'a3' register.
> > > > > >
> > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > maintain backward compatibility as well.
> > > > > >
> > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > pointer in 'a2' register for all HARTs.
> > > > > >
> > > > > > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > > > > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > > > > > OpenSBI text start. :)
> > > > > >
> > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > >
> > > > > > Let's fix this in the right way from start itself.
> > > > >
> > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > by passing "struct fw_dyanmic_info" pointer in 'a2' register.
> > > > > (Refer, common/spl/spl_opensbi.c)
> > > > >
> > > > > Not sure, why it is not passing 'a2' register correctly for you ??
> > > > >
> > > >
> > > > Yes, you are right. I reply too quickly.
> > > > Other harts will pass struct fw_dyanmic_info in a2 to OpenSBI.
> > > >
> > > > Thanks for your corrections
> > >
> > > No problem, I am happy to help.
> > >
> > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > >
> > > Maybe below changes can help you...
> >
> > Thanks for looking into this issue! I successfully tested it on QEMU, I
> > had to add a short delay between sending the IPIs to trigger the
> > problem.
> >
> > We might still run into problems however. Right now, we are assuming
> > that the main hart is the last one to enter OpenSBI. If this is not the
> > case (some delay when handling the IPI), we will have the same problem
> > again. To fix this we could pass the hart mask, containing all harts
> > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > running in OpenSBI. I am not sure how realistic this scenario is, so
> > this might not be needed.
>
> I agree that we might still run into this issue if primary HART enters
> OpenSBI before secondary HARTs. I think this situation can only
> happen on QEMU where each CPU is a thread running on host but
> very unlikely/impossible on real HW.
>
> Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> secondary HARTs and before jumping to OpenSBI ?
>
> Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> size of the hart_mask. It is possible in-future SOC vendors come-up
> with SOC having huge number of HARTs OR SOC with discontinuous
> HART IDs which can cause a 64bit hart_mask to be not sufficient for
> all HARTs.
>
> Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> in fw_base.S which will add to the boot-time as well.
>
> I still think the root cause of the issue is that TEXT_START of
> U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> insist SOC vendors to not use same TEXT_START ?

I have tried your boot_hart changes for U-Boot SPL together with the
preferred_boot_hart_v2 branch of OpenSBI.
It still runs into some boot problems. I tried to find the root
cause, but in vain.

I fully agree with Lukas.
After changing the U-Boot SPL text base to a non-zero address, the boot
process passes.
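
For illustration only, the condition discussed in this thread (the U-Boot
SPL text region must not overlap the region OpenSBI relocates itself to)
can be sketched as a simple range check. The struct, the function name
and all region sizes used with it are hypothetical and do not come from
U-Boot or OpenSBI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A half-open memory region [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if the two regions share at least one byte.
 * Assumes base + size does not wrap around for either region.
 */
static bool regions_overlap(struct mem_region a, struct mem_region b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

With an SPL region starting at 0x0 and an OpenSBI relocation target that
also starts at 0x0, the check reports an overlap. Moving the SPL region
to any base beyond the end of the OpenSBI region makes it return false.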

Thanks
Rick

>
> Regards,
> Anup
Lukas Auer Nov. 7, 2019, 11:44 a.m. UTC | #27
On Thu, 2019-11-07 at 16:14 +0530, Anup Patel wrote:
> On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> <lukas.auer@aisec.fraunhofer.de> wrote:
> > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > Hi Anup
> > > > 
> > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > Hi Anup
> > > > > > > 
> > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > Hi Anup
> > > > > > > > > 
> > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > 
> > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > > 
> > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > 
> > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > 
> > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > 
> > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > > 
> > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > > 
> > > > > > > > > > I see the issue now.
> > > > > > > > > > 
> > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > > 
> > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > > 
> > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > 
> > > > > > > > > Yes.
> > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > 
> > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > > 
> > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > _wait_relocate_copy_done
> > > > > > > > 
> > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > 
> > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > 
> > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > 
> > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > 
> > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > 
> > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > 
> > > > > > That's wrong in U-Boot SPL.
> > > > > > 
> > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > 
> > > > > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > > > > have the following change
> > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > change to
> > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > before this commit
> > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > firmware: Introduce relocation lottery
> > > > > > 
> > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > SPL.
> > > > > > 
> > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > pass anything in 'a3' register.
> > > > > > 
> > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > maintain backward compatibility as well.
> > > > > > 
> > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > pointer in 'a2' register for all HARTs.
> > > > > > 
> > > > > > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > > > > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > > > > > OpenSBI text start. :)
> > > > > > 
> > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > > 
> > > > > > Let's fix this in the right way from start itself.
> > > > > 
> > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > by passing "struct fw_dyanmic_info" pointer in 'a2' register.
> > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > 
> > > > > Not sure, why it is not passing 'a2' register correctly for you ??
> > > > > 
> > > > 
> > > > Yes, you are right. I reply too quickly.
> > > > Other harts will pass struct fw_dyanmic_info in a2 to OpenSBI.
> > > > 
> > > > Thanks for your corrections
> > > 
> > > No problem, I am happy to help.
> > > 
> > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > 
> > > Maybe below changes can help you...
> > 
> > Thanks for looking into this issue! I successfully tested it on QEMU, I
> > had to add a short delay between sending the IPIs to trigger the
> > problem.
> > 
> > We might still run into problems however. Right now, we are assuming
> > that the main hart is the last one to enter OpenSBI. If this is not the
> > case (some delay when handling the IPI), we will have the same problem
> > again. To fix this we could pass the hart mask, containing all harts
> > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > running in OpenSBI. I am not sure how realistic this scenario is, so
> > this might not be needed.
> 
> I agree that we might still run into this issue if primary HART enters
> OpenSBI before secondary HARTs. I think this situation can only
> happen on QEMU where each CPU is a thread running on host but
> very unlikely/impossible on real HW.
> 
> Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> secondary HARTs and before jumping to OpenSBI ?
> 

You are right, I don't think we will ever see this on real hardware. I
can add a check to ensure all harts have processed the IPI before
jumping to OpenSBI on the main hart. A simple delay is probably already
sufficient, however.
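
A minimal sketch of what such a check could look like, assuming a shared
counter that each secondary hart increments once it has handled the IPI.
The names ipi_ack, ipi_ack_signal and ipi_ack_wait are hypothetical and
are not part of the existing U-Boot SMP code. The bounded spin count is
also an assumption, added so that a lost IPI cannot hang the boot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared counter of secondary harts that have handled the IPI. */
struct ipi_ack {
	atomic_uint acked;
};

/* Called by a secondary hart from its IPI handler (hypothetical hook). */
static void ipi_ack_signal(struct ipi_ack *s)
{
	atomic_fetch_add_explicit(&s->acked, 1, memory_order_release);
}

/*
 * Called by the main hart before it jumps to OpenSBI. Polls until
 * `expected` harts have acknowledged, or gives up after `max_spins`
 * polls and returns false.
 */
static bool ipi_ack_wait(struct ipi_ack *s, unsigned int expected,
			 unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(&s->acked,
					 memory_order_acquire) >= expected)
			return true;
	}
	return false;
}
```

The main hart would call ipi_ack_wait() with the number of secondary
harts it sent the IPI to, and only continue to OpenSBI when it returns
true.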

> Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> size of the hart_mask. It is possible in-future SOC vendors come-up
> with SOC having huge number of HARTs OR SOC with discontinuous
> HART IDs which can cause a 64bit hart_mask to be not sufficient for
> all HARTs.
> 
> Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> in fw_base.S which will add to the boot-time as well.
> 

I agree, it's best to keep everything simple here.
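
To make the quoted size concern concrete, here is a small sketch of a
single 64-bit hart mask. It is an illustration under assumptions, not
the fw_dynamic_info layout: hart_mask_set and HART_MASK_BITS are
hypothetical names. Any hart ID of 64 or above, for example on an SoC
with discontinuous hart IDs, cannot be recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HART_MASK_BITS 64u

/*
 * Record `hartid` in a single 64-bit mask. Returns false and leaves
 * the mask unchanged when the ID does not fit into the mask.
 */
static bool hart_mask_set(uint64_t *mask, unsigned int hartid)
{
	if (hartid >= HART_MASK_BITS)
		return false;
	*mask |= UINT64_C(1) << hartid;
	return true;
}
```

Supporting larger or sparse hart IDs would need an array of masks or a
list of IDs, which is the extra complexity argued against above.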

> I still think the root cause of the issue is that TEXT_START of
> U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> insist SOC vendors to not use same TEXT_START ?
> 

That is definitely the best solution. I am not familiar with the
details of the Andes platform, so I am unsure whether that would work
there. On the SiFive Unleashed board this will be possible as soon as we
have a working DRAM driver. U-Boot SPL could then run from L2 scratchpad
memory instead of DRAM.

Regards,
Lukas
Anup Patel Nov. 7, 2019, 12:22 p.m. UTC | #28
On Thu, Nov 7, 2019 at 5:11 PM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup & Lukas
>
> Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> >
> > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > <lukas.auer@aisec.fraunhofer.de> wrote:
> > >
> > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > Hi Anup
> > > > >
> > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > Hi Anup
> > > > > > > >
> > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > Hi Anup
> > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > >
> > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > >
> > > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > >
> > > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > > >
> > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > > >
> > > > > > > > > > > I see the issue now.
> > > > > > > > > > >
> > > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > > >
> > > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > > >
> > > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > >
> > > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > > >
> > > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > >
> > > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > >
> > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > >
> > > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > >
> > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > >
> > > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > >
> > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > >
> > > > > > > That's wrong in U-Boot SPL.
> > > > > > >
> > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > >
> > > > > > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > > > > > have the following change
> > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > change to
> > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > before this commit
> > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > firmware: Introduce relocation lottery
> > > > > > >
> > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > > SPL.
> > > > > > >
> > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > > pass anything in 'a3' register.
> > > > > > >
> > > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > > maintain backward compatibility as well.
> > > > > > >
> > > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > >
> > > > > > > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > > > > > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > > > > > > OpenSBI text start. :)
> > > > > > >
> > > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > > >
> > > > > > > Let's fix this in the right way from start itself.
> > > > > >
> > > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > > by passing "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > >
> > > > > > Not sure, why it is not passing 'a2' register correctly for you ??
> > > > > >
> > > > >
> > > > > Yes, you are right. I reply too quickly.
> > > > > Other harts will pass struct fw_dynamic_info in a2 to OpenSBI.
> > > > >
> > > > > Thanks for your corrections
> > > >
> > > > No problem, I am happy to help.
> > > >
> > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > >
> > > > Maybe below changes can help you...
> > >
> > > Thanks for looking into this issue! I successfully tested it on QEMU, I
> > > had to add a short delay between sending the IPIs to trigger the
> > > problem.
> > >
> > > We might still run into problems however. Right now, we are assuming
> > > that the main hart is the last one to enter OpenSBI. If this is not the
> > > case (some delay when handling the IPI), we will have the same problem
> > > again. To fix this we could pass the hart mask, containing all harts
> > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > running in OpenSBI. I am not sure how realistic this scenario is, so
> > > this might not be needed.
> >
> > I agree that we might still run into this issue if primary HART enters
> > OpenSBI before secondary HARTs. I think this situation can only
> > happen on QEMU where each CPU is a thread running on host but
> > very unlikely/impossible on real HW.
> >
> > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > secondary HARTs and before jumping to OpenSBI ?
> >
> > Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> > size of the hart_mask. It is possible in-future SOC vendors come-up
> > with SOC having huge number of HARTs OR SOC with discontinuous
> > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > all HARTs.
> >
> > Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> > in fw_base.S which will add to the boot-time as well.
> >
> > I still think the root cause of the issue is that TEXT_START of
> > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > insist SOC vendors to not use same TEXT_START ?
>
> I have tried your boot_hart changes for U-Boot SPL with the OpenSBI
> preferred_boot_hart_v2 branch.
> It still encounters some booting problems. I tried to find the root
> cause, but in vain.
>
> I strongly agree with Lukas's opinion.
> After changing the U-Boot SPL text base to a non-zero address, the boot
> process passes.

No problem, it is your decision to go with a different TEXT_BASE for the
AndesTech platform.
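
As a rough illustration of the condition to avoid, the sketch below checks
whether the region U-Boot SPL runs from overlaps the region OpenSBI relocates
itself to. The struct, the helper name regions_overlap() and any sizes passed
to it are assumptions made for illustration; only the addresses 0x0 and
0x1000000 come from this thread, and nothing here is existing U-Boot or
OpenSBI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A half-open address range [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true when the two ranges share at least one byte.
 * Hypothetical helper, not part of U-Boot or OpenSBI.
 */
static bool regions_overlap(struct region a, struct region b)
{
	/* Precondition (assumed): neither range wraps around the address space. */
	assert(a.base + a.size >= a.base);
	assert(b.base + b.size >= b.base);

	if (a.size == 0 || b.size == 0)
		return false;
	return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

With U-Boot SPL text and FW_TEXT_START both at 0x0, this reports an overlap
for any non-zero sizes. Moving the SPL text base beyond the end of the
relocated firmware makes the check pass.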

We will keep the "boot_hart" field in OpenSBI for U-Boot SPL on QEMU,
and I will wait for Lukas to add more checks and a small delay in U-Boot SPL
(like he mentioned).

Thanks,
Anup
Anup Patel Nov. 7, 2019, 12:27 p.m. UTC | #29
On Thu, Nov 7, 2019 at 5:14 PM Auer, Lukas
<lukas.auer@aisec.fraunhofer.de> wrote:
>
> On Thu, 2019-11-07 at 16:14 +0530, Anup Patel wrote:
> > I agree that we might still run into this issue if primary HART enters
> > OpenSBI before secondary HARTs. I think this situation can only
> > happen on QEMU where each CPU is a thread running on host but
> > very unlikely/impossible on real HW.
> >
> > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > secondary HARTs and before jumping to OpenSBI ?
> >
>
> You are right, I don't think we will ever see this on real hardware. I
> can add a check to ensure all harts have processed the IPI before
> jumping to OpenSBI on the main hart. A simple delay is probably already
> sufficient however.

Yes, please change spl_opensbi like you described.

Maybe also add some comments about the issues we discussed here.
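
For reference, such a check could look roughly like the sketch below. It is
only an illustration: the names ipi_begin(), ipi_ack(), ipi_all_acked() and
ipi_wait_all() are hypothetical and do not exist in U-Boot, and the sketch
assumes C11 atomics are available. Each secondary hart acknowledges its IPI,
and the main hart polls, with an upper bound, until no acknowledgement is
outstanding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of secondary harts that have not yet handled their IPI. */
static atomic_uint pending_harts;

/* Main hart: record how many secondary harts are about to receive an IPI. */
static void ipi_begin(unsigned int secondary_count)
{
	atomic_store(&pending_harts, secondary_count);
}

/* Secondary hart: call once from the IPI handler, just before leaving SPL. */
static void ipi_ack(void)
{
	unsigned int prev = atomic_fetch_sub(&pending_harts, 1);

	/* More acknowledgements than IPIs sent would be a bug. */
	assert(prev > 0);
	(void)prev;
}

/* Main hart: true once every secondary hart has acknowledged. */
static bool ipi_all_acked(void)
{
	return atomic_load(&pending_harts) == 0;
}

/* Main hart: poll up to max_polls times; returns false on timeout. */
static bool ipi_wait_all(unsigned long max_polls)
{
	while (max_polls--) {
		if (ipi_all_acked())
			return true;
	}
	return ipi_all_acked();
}
```

On the main hart, ipi_begin() would run before the SMP calls and
ipi_wait_all() just before the jump to OpenSBI. A hart acknowledges slightly
before it actually leaves U-Boot SPL, so a short delay may still be needed on
top of this.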

>
> > Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> > size of the hart_mask. It is possible in-future SOC vendors come-up
> > with SOC having huge number of HARTs OR SOC with discontinuous
> > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > all HARTs.
> >
> > Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> > in fw_base.S which will add to the boot-time as well.
> >
>
> I agree, it's best to keep everything simple here.

Cool, I guess we are in sync.
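
To make the hart_mask size concern quoted above concrete, here is a minimal
sketch of a 64-bit mask helper. HART_MASK_BITS and hart_mask_set() are
hypothetical names used only for illustration; they are not part of
fw_dynamic_info or any existing OpenSBI or U-Boot API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HART_MASK_BITS 64u

/* Set the bit for hartid; returns false when the ID does not fit in 64 bits. */
static bool hart_mask_set(uint64_t *mask, unsigned long hartid)
{
	assert(mask != NULL);

	if (hartid >= HART_MASK_BITS)
		return false;
	*mask |= UINT64_C(1) << hartid;
	return true;
}
```

Any hart ID of 64 or above is rejected here, which is exactly the case of a
SoC with many harts or with discontinuous hart IDs.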

>
> > I still think the root cause of the issue is that TEXT_START of
> > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > insist SOC vendors to not use same TEXT_START ?
> >
>
> That is definitely the best solution. I am not familiar with the
> details of the Andes platform, so unsure if that would work on it. On
> the SiFive Unleashed board this will be possible as soon as we have a
> working DRAM driver. U-Boot SPL could then run from L2 scratchpad
> memory instead of DRAM.

Yes, this won't be a problem on SiFive Unleashed.

Regards,
Anup
Lukas Auer Nov. 7, 2019, 1:37 p.m. UTC | #30
On Thu, 2019-11-07 at 17:57 +0530, Anup Patel wrote:
> On Thu, Nov 7, 2019 at 5:14 PM Auer, Lukas
> <lukas.auer@aisec.fraunhofer.de> wrote:
> > On Thu, 2019-11-07 at 16:14 +0530, Anup Patel wrote:
> > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > Hi Anup
> > > > > > 
> > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > Hi Anup
> > > > > > > > > 
> > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > Hi Anup
> > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > > > > 
> > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > 
> > > > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > > > > 
> > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > > > > 
> > > > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > 
> > > > > > > > > > > Yes.
> > > > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > 
> > > > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > > > > 
> > > > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > 
> > > > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > 
> > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > 
> > > > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > 
> > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > 
> > > > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > 
> > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > 
> > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > 
> > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > 
> > > > > > > > > So if U-Boot SPL can pass the main hart information via a3, OpenSBI only
> > > > > > > > > needs the following change:
> > > > > > > > >   blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > becomes
> > > > > > > > >   bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > This applies to the code before this commit:
> > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > 
> > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > > > SPL.
> > > > > > > > 
> > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > > > pass anything in 'a3' register.
> > > > > > > > 
> > > > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > > > maintain backward compatibility as well.
> > > > > > > > 
> > > > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > 
> > > > > > > > > But after commit 98f4a, the main hart is chosen by a lottery mechanism.
> > > > > > > > > Maybe I would prefer to change the U-Boot SPL text base so that it does
> > > > > > > > > not overlap with the OpenSBI text start. :)
> > > > > > > > 
> > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > > > > 
> > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > 
> > > > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > 
> > > > > > > Not sure why it is not passing the 'a2' register correctly for you?
> > > > > > > 
> > > > > > 
> > > > > > Yes, you are right. I replied too quickly.
> > > > > > The other harts do pass struct fw_dynamic_info in a2 to OpenSBI.
> > > > > > 
> > > > > > Thanks for your corrections
> > > > > 
> > > > > No problem, I am happy to help.
> > > > > 
> > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > 
> > > > > Maybe below changes can help you...
> > > > 
> > > > Thanks for looking into this issue! I successfully tested it on QEMU; I
> > > > had to add a short delay between sending the IPIs to trigger the
> > > > problem.
> > > > 
> > > > We might still run into problems however. Right now, we are assuming
> > > > that the main hart is the last one to enter OpenSBI. If this is not the
> > > > case (some delay when handling the IPI), we will have the same problem
> > > > again. To fix this we could pass the hart mask, containing all harts
> > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > running in OpenSBI. I am not sure how realistic this scenario is, so
> > > > this might not be needed.
> > > 
> > > I agree that we might still run into this issue if primary HART enters
> > > OpenSBI before secondary HARTs. I think this situation can only
> > > happen on QEMU where each CPU is a thread running on host but
> > > very unlikely/impossible on real HW.
> > > 
> > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > secondary HARTs and before jumping to OpenSBI ?
> > > 
> > 
> > You are right, I don't think we will ever see this on real hardware. I
> > can add a check to ensure all harts have processed the IPI before
> > jumping to OpenSBI on the main hart. A simple delay is probably already
> > sufficient however.
> 
> Yes, please change spl_opensbi like you described.
> 
> Maybe also add some comments about the issues we discussed here.

Good idea, I will add the changes and describe the issue in a comment.
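
As a rough sketch of such a check (not the actual spl_opensbi change): all names below are hypothetical, and the shared variable is assumed to be visible to every hart. Each secondary hart sets its bit once it has handled the IPI, and the main hart polls until all expected bits are set before it jumps to OpenSBI. Like the hart mask discussed in this thread, the sketch is limited to 64 harts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping (not a U-Boot symbol): one bit per hart. */
static atomic_uint_fast64_t ipi_acked;

/* Called by a secondary hart once it has handled the IPI. */
void hart_ack_ipi(unsigned int hartid)
{
	assert(hartid < 64);	/* inherits the 64-hart mask limitation */
	atomic_fetch_or(&ipi_acked, UINT64_C(1) << hartid);
}

/* True once every hart in 'expected' has acknowledged the IPI. */
bool all_harts_acked(uint64_t expected)
{
	return (atomic_load(&ipi_acked) & expected) == expected;
}

/*
 * Main hart: poll a bounded number of times before jumping to OpenSBI.
 * Returns false if some hart never acknowledged, so the caller can
 * decide how to handle the timeout.
 */
bool wait_for_harts(uint64_t expected, unsigned long max_polls)
{
	while (max_polls--) {
		if (all_harts_acked(expected))
			return true;
	}
	return all_harts_acked(expected);
}
```

With four harts and hart 0 as main hart, the expected mask would be 0xe (harts 1 to 3). A plain delay, as mentioned above, would be simpler but gives no such guarantee.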

Regards,
Lukas

> 
> > > Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> > > size of the hart_mask. In the future, SoC vendors may come up with
> > > SoCs that have a huge number of HARTs or discontinuous HART IDs, in
> > > which case a 64-bit hart_mask would not be sufficient for all HARTs.
> > > 
> > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> > > in fw_base.S which will add to the boot-time as well.
> > > 
> > 
> > I agree, it's best to keep everything simple here.
> 
> Cool, I guess we are in-sync.
> 
> > > I still think the root cause of the issue is that the TEXT_START of
> > > U-Boot SPL and OpenSBI FW_DYNAMIC is the same. Maybe we can
> > > insist that SoC vendors not use the same TEXT_START?
> > > 
> > 
> > That is definitely the best solution. I am not familiar with the
> > details of the Andes platform, so unsure if that would work on it. On
> > the SiFive Unleashed board this will be possible as soon as we have a
> > working DRAM driver. U-Boot SPL could then run from L2 scratchpad
> > memory instead of DRAM.
> 
> Yes, this won't be a problem on SiFive Unleashed.
> 
> Regards,
> Anup
Atish Patra Nov. 7, 2019, 6:44 p.m. UTC | #31
On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> Hi Anup & Lukas
> 
> Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > wrote:
> > > > > Hi Anup
> > > > > 
> > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > anup@brainfault.org> wrote:
> > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > Hi Anup
> > > > > > > > 
> > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > Hi Anup
> > > > > > > > > > 
> > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > 
> > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > 
> > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > 
> > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > U-Boot SPL are linked to
> > > > > > > > > > > > the same address 0x0. This means secondary
> > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > RAM.
> > > > > > > > > > > 
> > > > > > > > > > > I see the issue now.
> > > > > > > > > > > 
> > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > 
> > > > > > > > > > > The real issue is FW_TEXT_START being the same as
> > > > > > > > > > > U-Boot SPL TEXT_START.
> > > > > > > > > > > 
> > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > convenient since this series
> > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > 
> > > > > > > > > > Yes.
> > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > 
> > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > on QEMU as well.
> > > > > > > > > 
> > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > other harts go to
> > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > 
> > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > register because this
> > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > pass preferred
> > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > 
> > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > 
> > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > branch of
> > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > 
> > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > 
> > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > passed to
> > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > 
> > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > by U-Boot SPL.
> > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > 
> > > > > > > That's wrong in U-Boot SPL.
> > > > > > > 
> > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > 
> > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > OpenSBI just
> > > > > > > > have the following change
> > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > change to
> > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > before this commit
> > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > firmware: Introduce relocation lottery
> > > > > > > 
> > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > passing
> > > > > > > value in a3 for these firmwares because these are not
> > > > > > > booted by U-Boot
> > > > > > > SPL.
> > > > > > > 
> > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > which does not
> > > > > > > pass anything in 'a3' register.
> > > > > > > 
> > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > this so that we can
> > > > > > > maintain backward compatibility as well.
> > > > > > > 
> > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > fw_dynamic_info"
> > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > 
> > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > from lottery mechanism.
> > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > overlap with
> > > > > > > > OpenSBI text start. :)
> > > > > > > 
> > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > QEMU as well. It's
> > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > QEMU.
> > > > > > > 
> > > > > > > Let's fix this in the right way from start itself.
> > > > > > 
> > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > right thing
> > > > > > by passing "struct fw_dynamic_info" pointer in 'a2'
> > > > > > register.
> > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > 
> > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > you ??
> > > > > > 
> > > > > 
> > > > > Yes, you are right. I replied too quickly.
> > > > > The other harts do pass struct fw_dynamic_info in a2 to
> > > > > OpenSBI.
> > > > > 
> > > > > Thanks for your corrections
> > > > 
> > > > No problem, I am happy to help.
> > > > 
> > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > 
> > > > Maybe below changes can help you...
> > > 
> > > Thanks for looking into this issue! I successfully tested it on
> > > QEMU, I
> > > had to add a short delay between sending the IPIs to trigger the
> > > problem.
> > > 
> > > We might still run into problems however. Right now, we are
> > > assuming
> > > that the main hart is the last one to enter OpenSBI. If this is
> > > not the
> > > case (some delay when handling the IPI), we will have the same
> > > problem
> > > again. To fix this we could pass the hart mask, containing all
> > > harts
> > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > so
> > > this might not be needed.
> > 
> > I agree that we might still run into this issue if primary HART
> > enters
> > OpenSBI before secondary HARTs. I think this situation can only
> > happen on QEMU where each CPU is a thread running on host but
> > very unlikely/impossible on real HW.
> > 
> > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > secondary HARTs and before jumping to OpenSBI ?
> > 
> > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > the
> > size of the hart_mask. It is possible in-future SOC vendors come-up
> > with SOC having huge number of HARTs OR SOC with discontinuous
> > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > all HARTs.
> > 
> > Also, waiting for all HARTs to enter OpenSBI will be one more
> > wait-loop in fw_base.S, which will add to the boot time as well.
> > 
> > I still think the root cause of the issue is that TEXT_START of
> > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > insist SOC vendors to not use same TEXT_START ?
> 
> I have tried your boot_hart changes for U-Boot SPL with the OpenSBI
> preferred_boot_hart_v2 branch.
> It still encounters some booting problems. I tried to find the root
> cause, but in vain.
> 

Just wanted to make sure that you have tried this patch.

http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html

If this patch did not work for you, we should investigate why.

> I very much agree with Lukas's opinion.
> After changing the U-Boot SPL text base to a non-zero address, the
> boot process passes.
> 
> Thanks
> Rick
> 
> > Regards,
> > Anup
Rick Chen Nov. 8, 2019, 1:13 a.m. UTC | #32
Hi Atish

>
> On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > Hi Anup & Lukas
> >
> > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > wrote:
> > > > > > Hi Anup
> > > > > >
> > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > anup@brainfault.org> wrote:
> > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > Hi Anup
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > U-Boot SPL are linked to
> > > > > > > > > > > > > the same address 0x0. This means secondary
> > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > RAM.
> > > > > > > > > > > >
> > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > >
> > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > >
> > > > > > > > > > > > The real issue is FW_TEXT_START being same as
> > > > > > > > > > > > U-Boot SPL TEXT_START.
> > > > > > > > > > > >
> > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > >
> > > > > > > > > > > Yes.
> > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > >
> > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > on QEMU as well.
> > > > > > > > > >
> > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > other harts go to
> > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > >
> > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > register because this
> > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > pass preferred
> > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > >
> > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > >
> > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > branch of
> > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > >
> > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > >
> > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > passed to
> > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > >
> > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > by U-Boot SPL.
> > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > >
> > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > >
> > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > >
> > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > OpenSBI just
> > > > > > > > > have the following change
> > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > change to
> > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > before this commit
> > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > >
> > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > passing
> > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > booted by U-Boot
> > > > > > > > SPL.
> > > > > > > >
> > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > which does not
> > > > > > > > pass anything in 'a3' register.
> > > > > > > >
> > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > this so that we can
> > > > > > > > maintain backward compatibility as well.
> > > > > > > >
> > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > fw_dynamic_info"
> > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > >
> > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > from lottery mechanism.
> > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > overlap with
> > > > > > > > > OpenSBI text start. :)
> > > > > > > >
> > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > QEMU as well. It's
> > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > QEMU.
> > > > > > > >
> > > > > > > > Let's fix this in the right way from start itself.
> > > > > > >
> > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > right thing
> > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2'
> > > > > > > register.
> > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > >
> > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > you ??
> > > > > > >
> > > > > >
> > > > > > Yes, you are right. I reply too quickly.
> > > > > > Other harts will pass struct fw_dynamic_info in a2 to
> > > > > > OpenSBI.
> > > > > >
> > > > > > Thanks for your corrections
> > > > >
> > > > > No problem, I am happy to help.
> > > > >
> > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > >
> > > > > Maybe below changes can help you...
> > > >
> > > > Thanks for looking into this issue! I successfully tested it on
> > > > QEMU, I
> > > > had to add a short delay between sending the IPIs to trigger the
> > > > problem.
> > > >
> > > > We might still run into problems however. Right now, we are
> > > > assuming
> > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > not the
> > > > case (some delay when handling the IPI), we will have the same
> > > > problem
> > > > again. To fix this we could pass the hart mask, containing all
> > > > harts
> > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > so
> > > > this might not be needed.
> > >
> > > I agree that we might still run into this issue if primary HART
> > > enters
> > > OpenSBI before secondary HARTs. I think this situation can only
> > > happen on QEMU where each CPU is a thread running on host but
> > > very unlikely/impossible on real HW.
> > >
> > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > secondary HARTs and before jumping to OpenSBI ?
> > >
> > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > the
> > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > all HARTs.
> > >
> > > Also, waiting for all HARTs to enter OpenSBI will be one more
> > > wait-loop
> > > in fw_base.S which will add to the boot-time as well.
> > >
> > > I still think the root cause of the issue is that TEXT_START of
> > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > insist SOC vendors to not use same TEXT_START ?
> >
> > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > preferred_boot_hart_v2 branch
> > It still encounter some booting problems. I try to find out the root
> > cause but in vain.
> >
>
> Just wanted to make sure that you have tried this patch.
>
> http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
>
> We should investigate the issue why it did not work for you if this
> patch did not work for you.

Yes, I tried with
commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
firmware: Add preferred boot HART field in struct fw_dynamic_info

It failed randomly yesterday, but this morning I tried several times and it passed.
I will keep testing.

Thanks
Rick


>
> > I am very agree with options of Lukas.
> > After modifying U-Boot SPL text base not equal to zero and the
> > booting
> > progress will be pass.
> >
> > Thanks
> > Rick
> >
> > > Regards,
> > > Anup
>
> --
> Regards,
> Atish
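
The selection logic discussed in this message (a preferred boot HART passed in "struct fw_dynamic_info", with the relocation lottery as the fallback) can be modeled roughly as below. This is only a minimal C sketch under stated assumptions, not the OpenSBI code: the real decision is made in assembly in fw_base.S, and the names PREFERRED_HART_NONE, relocate_lottery and wins_relocation(), the sentinel value, and the use of C11 atomics are all hypothetical and chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "no preferred boot HART was supplied". */
#define PREFERRED_HART_NONE (~0UL)

/* Shared lottery word: the first HART to bump it from 0 wins. */
static atomic_ulong relocate_lottery;

/*
 * Returns true if the calling HART should perform the relocation copy,
 * false if it should wait until the copy is done.
 */
static bool wins_relocation(unsigned long hartid, unsigned long preferred)
{
	/* A preferred HART, when given, overrides the lottery entirely. */
	if (preferred != PREFERRED_HART_NONE)
		return hartid == preferred;

	/* Lottery fallback: exactly one caller observes the old value 0. */
	return atomic_fetch_add(&relocate_lottery, 1UL) == 0;
}
```

With a preferred HART of 2, only hart 2 gets true regardless of arrival order. Without one, whichever HART reaches the atomic increment first wins, which matches the random behaviour described for the lottery commit.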
Rick Chen Nov. 8, 2019, 1:23 a.m. UTC | #33
Hi Anup

>
> On Thu, Nov 7, 2019 at 5:11 PM Rick Chen <rickchen36@gmail.com> wrote:
> >
> > Hi Anup & Lukas
> >
> > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > >
> > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > >
> > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > Hi Anup
> > > > > >
> > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > Hi Anup
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > > > the same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > > > >
> > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > >
> > > > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > > > >
> > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > > > >
> > > > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > >
> > > > > > > > > > > Yes.
> > > > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > >
> > > > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > > > >
> > > > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > >
> > > > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > >
> > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > >
> > > > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > >
> > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > >
> > > > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > >
> > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > >
> > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > >
> > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > >
> > > > > > > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > > > > > > have the following change
> > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > change to
> > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > before this commit
> > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > >
> > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > > > SPL.
> > > > > > > >
> > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > > > pass anything in 'a3' register.
> > > > > > > >
> > > > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > > > maintain backward compatibility as well.
> > > > > > > >
> > > > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > >
> > > > > > > > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > > > > > > > OpenSBI text start. :)
> > > > > > > >
> > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > > > >
> > > > > > > > Let's fix this in the right way from start itself.
> > > > > > >
> > > > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > >
> > > > > > > Not sure, why it is not passing 'a2' register correctly for you ??
> > > > > > >
> > > > > >
> > > > > > Yes, you are right. I reply too quickly.
> > > > > > Other harts will pass struct fw_dynamic_info in a2 to OpenSBI.
> > > > > >
> > > > > > Thanks for your corrections
> > > > >
> > > > > No problem, I am happy to help.
> > > > >
> > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > >
> > > > > Maybe below changes can help you...
> > > >
> > > > Thanks for looking into this issue! I successfully tested it on QEMU, I
> > > > had to add a short delay between sending the IPIs to trigger the
> > > > problem.
> > > >
> > > > We might still run into problems however. Right now, we are assuming
> > > > that the main hart is the last one to enter OpenSBI. If this is not the
> > > > case (some delay when handling the IPI), we will have the same problem
> > > > again. To fix this we could pass the hart mask, containing all harts
> > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > running in OpenSBI. I am not sure how realistic this scenario is, so
> > > > this might not be needed.
> > >
> > > I agree that we might still run into this issue if primary HART enters
> > > OpenSBI before secondary HARTs. I think this situation can only
> > > happen on QEMU where each CPU is a thread running on host but
> > > very unlikely/impossible on real HW.
> > >
> > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > secondary HARTs and before jumping to OpenSBI ?
> > >
> > > Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > all HARTs.
> > >
> > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> > > in fw_base.S which will add to the boot-time as well.
> > >
> > > I still think the root cause of the issue is that TEXT_START of
> > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > insist SOC vendors to not use same TEXT_START ?
> >
> > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > preferred_boot_hart_v2 branch
> > It still encounter some booting problems. I try to find out the root
> > cause but in vain.
> >
> > I am very agree with options of Lukas.
> > After modifying U-Boot SPL text base not equal to zero and the booting
> > progress will be pass.
>
> No problem, it will be your decision to go with different TEXT_BASE for
> AndesTech platform.

You misunderstood my intention.
Changing the text base is just a temporary workaround for debugging.
In the end, I would prefer the U-Boot SPL text base to stay in sync with yours.
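
The corruption discussed in this thread happens when the OpenSBI relocation target overlaps the region U-Boot SPL is still executing from. A sanity check for that condition could be sketched as follows. This is an illustrative C fragment only: neither U-Boot nor OpenSBI is claimed to contain such a check, and struct mem_region and regions_overlap() are hypothetical names. The base addresses 0x0 and 0x1000000 are the ones mentioned in the thread, while any region sizes a caller supplies are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* A half-open memory range [start, start + size). */
struct mem_region {
	uint64_t start;
	uint64_t size;
};

/*
 * Returns true if the two ranges share at least one byte.
 * Assumes start + size does not wrap around for either range.
 */
static bool regions_overlap(struct mem_region a, struct mem_region b)
{
	return a.size != 0 && b.size != 0 &&
	       a.start < b.start + b.size &&
	       b.start < a.start + a.size;
}
```

For example, an SPL region based at 0x0 overlaps a firmware region that is also based at 0x0, but not one based at 0x1000000, provided the SPL image is smaller than 16 MiB.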

>
> We will keep the "boot_hart" field in OpenSBI for U-Boot SPL on QEMU
> and I will wait for Lukas to add more checks and small delay in U-Boot SPL
> (like he mentioned).

I am happy to hear that. :)

Thanks
Rick

>
> Thanks,
> Anup
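
The hart mask idea and its size limit, both raised in the quoted discussion above, can be illustrated with a small C sketch. It is a model under assumptions, not code from U-Boot or OpenSBI: HART_MASK_BITS, hart_mask_set() and all_harts_arrived() are hypothetical names, and the 64-bit width is taken from the objection quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Width of the hypothetical hart mask field. */
#define HART_MASK_BITS 64u

/*
 * Marks hartid in the mask. Returns false, leaving the mask unchanged,
 * if the ID does not fit. This is the case raised for large or
 * discontinuous HART IDs.
 */
static bool hart_mask_set(uint64_t *mask, unsigned long hartid)
{
	if (hartid >= HART_MASK_BITS)
		return false;
	*mask |= UINT64_C(1) << hartid;
	return true;
}

/* True once every HART in 'expected' has also been recorded in 'arrived'. */
static bool all_harts_arrived(uint64_t expected, uint64_t arrived)
{
	return (arrived & expected) == expected;
}
```

A boot HART would poll all_harts_arrived() before relocating. That polling is the extra wait loop, and hence the extra boot time, mentioned as a drawback in the quoted text.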
Rick Chen Nov. 8, 2019, 7:27 a.m. UTC | #34
Hi Atish

>
> Hi Atish
>
> >
> > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > Hi Anup & Lukas
> > >
> > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > wrote:
> > > > > > > Hi Anup
> > > > > > >
> > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > Hi Anup
> > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > >
> > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes.
> > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > >
> > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > on QEMU as well.
> > > > > > > > > > >
> > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > other harts go to
> > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > >
> > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > register because this
> > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > pass preferred
> > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > >
> > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > >
> > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > branch of
> > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > >
> > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > >
> > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > passed to
> > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > >
> > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > >
> > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > >
> > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > >
> > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > OpenSBI just
> > > > > > > > > > have the following change
> > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > change to
> > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > before this commit
> > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > >
> > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > passing
> > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > booted by U-Boot
> > > > > > > > > SPL.
> > > > > > > > >
> > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > which does not
> > > > > > > > > pass anything in 'a3' register.
> > > > > > > > >
> > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > this so that we can
> > > > > > > > > maintain backward compatibility as well.
> > > > > > > > >
> > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > fw_dynamic_info"
> > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > >
> > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > from lottery mechanism.
> > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > overlap with
> > > > > > > > > > OpenSBI text start. :)
> > > > > > > > >
> > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > QEMU as well. It's
> > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > QEMU.
> > > > > > > > >
> > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > >
> > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > right thing
> > > > > > > > by passing "struct fw_dyanmic_info" pointer in 'a2'
> > > > > > > > register.
> > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > >
> > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > you ??
> > > > > > > >
> > > > > > >
> > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > Other harts will pass struct fw_dyanmic_info in a2 to
> > > > > > > OpenSBI.
> > > > > > >
> > > > > > > Thanks for your corrections
> > > > > >
> > > > > > No problem, I am happy to help.
> > > > > >
> > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > >
> > > > > > Maybe below changes can help you...
> > > > >
> > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > QEMU, I
> > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > problem.
> > > > >
> > > > > We might still run into problems however. Right now, we are
> > > > > assuming
> > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > not the
> > > > > case (some delay when handling the IPI), we will have the same
> > > > > problem
> > > > > again. To fix this we could pass the hart mask, containing all
> > > > > harts
> > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > so
> > > > > this might not be needed.
> > > >
> > > > I agree that we might still run into this issue if primary HART
> > > > enters
> > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > happen on QEMU where each CPU is a thread running on host but
> > > > very unlikely/impossible on real HW.
> > > >
> > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > secondary HARTs and before jumping to OpenSBI ?
> > > >
> > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > the
> > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > all HARTs.
> > > >
> > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > loop
> > > > in fw_base.S which will add to the boot-time as well.
> > > >
> > > > I still think the root cause of the issue is that TEXT_START of
> > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > insist SOC vendors to not use same TEXT_START ?
> > >
> > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > preferred_boot_hart_v2 branch
> > > It still encounter some booting problems. I try to find out the root
> > > cause but in vain.
> > >
> >
> > Just wanted to make sure that you have tried this patch.
> >
> > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> >
> > We should investigate the issue why it did not work for you if this
> > patch did not work for you.
>
> Yes, I try with this
> commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> firmware: Add preferred boot HART field in struct fw_dynamic_info
>
> It fail randomly yesterday, but this morning I try several times it will pass.
> I will keep trying.
>

I have figured out one failing case: the main hart of U-Boot SPL is
not the last hart to enter OpenSBI.

case 1 (fail)

hb *0x1000000

Thread 2 hit Breakpoint 3, 0x0000000001000000 in ?? ()
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 1 (hart 1) 0x0000000001000000 in ?? ()
* 2    Thread 2 (hart 2) 0x0000000001000000 in ?? ()
  3    Thread 3 (hart 3) secondary_hart_loop () at arch/riscv/cpu/start.S:416
  4    Thread 4 (hart 4) secondary_hart_loop () at arch/riscv/cpu/start.S:416

(gdb) p/x gd->arch.boot_hart
$3 = 0x1

The main hart is hart 1 in U-Boot SPL. Hart 1 has already arrived in
OpenSBI, but harts 3 and 4 are still running in U-Boot SPL.
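
The condition behind this failure can be sketched as a small C model.
This is only an illustration: the struct and function names are
hypothetical, not U-Boot or OpenSBI code, and it assumes (as described
earlier in the thread) that U-Boot SPL and OpenSBI FW_DYNAMIC share the
same text base, so overlap is simplified to an equality check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the boot state at the moment OpenSBI relocates. */
struct boot_state {
	uint64_t spl_text_base;  /* where U-Boot SPL is executing from */
	uint64_t fw_text_start;  /* where OpenSBI copies itself to */
	uint32_t harts_in_spl;   /* bit n set: hart n still runs SPL code */
};

/*
 * The relocation copy overwrites U-Boot SPL when both images use the
 * same text base. That is only harmful while at least one hart is
 * still executing SPL code (e.g. spinning in secondary_hart_loop).
 */
static bool relocation_corrupts_spl(const struct boot_state *s)
{
	assert(s);
	return s->spl_text_base == s->fw_text_start && s->harts_in_spl != 0;
}
```

With the state from the gdb session above (harts 3 and 4 still in
secondary_hart_loop, both text bases at 0x0), the check returns true.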

c

U-Boot console output:

U-Boot SPL 2019.10-00603-g850848e-dirty (Nov 08 2019 - 14:37:54 +0800)
Trying to boot from RAM

U-Boot 2019.10-00603-g850848e-dirty (Nov 08 2019 - 14:37:54 +0800)

DRAM:  1 GiB
Flash:

<== hangs here

Thanks
Rick
Lukas Auer Nov. 8, 2019, 8:59 a.m. UTC | #35
Hi Rick,

On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> Hi Atish
> 
> > Hi Atish
> > 
> > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > Hi Anup & Lukas
> > > > 
> > > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > wrote:
> > > > > > > > Hi Anup
> > > > > > > > 
> > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > Hi Anup
> > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > 
> > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > 
> > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > 
> > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > register because this
> > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > pass preferred
> > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > 
> > > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > > > 
> > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > branch of
> > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > 
> > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > 
> > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > passed to
> > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > 
> > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > 
> > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > 
> > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > 
> > > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > > OpenSBI just
> > > > > > > > > > > have the following change
> > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > change to
> > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > before this commit
> > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > 
> > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > passing
> > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > booted by U-Boot
> > > > > > > > > > SPL.
> > > > > > > > > > 
> > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > which does not
> > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > 
> > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > this so that we can
> > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > 
> > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > 
> > > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > > from lottery mechanism.
> > > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > > overlap with
> > > > > > > > > > > OpenSBI text start. :)
> > > > > > > > > > 
> > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > QEMU as well. It's
> > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > QEMU.
> > > > > > > > > > 
> > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > 
> > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > right thing
> > > > > > > > > by passing "struct fw_dyanmic_info" pointer in 'a2'
> > > > > > > > > register.
> > > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > > > 
> > > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > > you ??
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > > Other harts will pass struct fw_dyanmic_info in a2 to
> > > > > > > > OpenSBI.
> > > > > > > > 
> > > > > > > > Thanks for your corrections
> > > > > > > 
> > > > > > > No problem, I am happy to help.
> > > > > > > 
> > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > 
> > > > > > > Maybe below changes can help you...
> > > > > > 
> > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > QEMU, I
> > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > problem.
> > > > > > 
> > > > > > We might still run into problems however. Right now, we are
> > > > > > assuming
> > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > not the
> > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > problem
> > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > harts
> > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > so
> > > > > > this might not be needed.
> > > > > 
> > > > > I agree that we might still run into this issue if primary HART
> > > > > enters
> > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > very unlikely/impossible on real HW.
> > > > > 
> > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > 
> > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > the
> > > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > > all HARTs.
> > > > > 
> > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > loop
> > > > > in fw_base.S which will add to the boot-time as well.
> > > > > 
> > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > > insist SOC vendors to not use same TEXT_START ?
> > > > 
> > > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > > preferred_boot_hart_v2 branch
> > > > It still encounter some booting problems. I try to find out the root
> > > > cause but in vain.
> > > > 
> > > 
> > > Just wanted to make sure that you have tried this patch.
> > > 
> > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > 
> > > We should investigate the issue why it did not work for you if this
> > > patch did not work for you.
> > 
> > Yes, I try with this
> > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > 
> > It fail randomly yesterday, but this morning I try several times it will pass.
> > I will keep trying.
> > 
> 
> I have figured out one failure case: the main hart of U-Boot SPL is
> not the last hart to enter OpenSBI.
> 

Can you try this branch [1]? It includes a quick implementation of the
changes I mentioned yesterday, where the main hart waits until all
harts have received the IPI.

[1]: https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart
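
As a rough illustration of the idea (not the code in the branch above), the
wait could be sketched in C as follows. The names pending_harts,
ipi_mark_sent(), ipi_ack() and ipi_all_acked() are hypothetical and exist
only for this sketch. It assumes hart IDs are small enough to fit into one
unsigned long bitmask.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical bookkeeping: bit N set means hart N has been sent an IPI
 * and has not acknowledged it yet.
 */
static atomic_ulong pending_harts;

/* Main hart: record which harts are about to be signalled. */
static void ipi_mark_sent(unsigned long hart_mask)
{
	atomic_store(&pending_harts, hart_mask);
}

/* Secondary hart: called as the first step of its IPI handler. */
static void ipi_ack(unsigned int hartid)
{
	/* Guard against shifting past the width of the mask. */
	assert(hartid < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_and(&pending_harts, ~(1UL << hartid));
}

/* Main hart: true once every signalled hart has acknowledged. */
static bool ipi_all_acked(void)
{
	return atomic_load(&pending_harts) == 0;
}
```

In this sketch the main hart would spin on ipi_all_acked() after sending
the IPIs and only then jump to OpenSBI itself, so it is always the last
hart to leave U-Boot SPL.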

Thanks,
Lukas
Anup Patel Nov. 8, 2019, 12:14 p.m. UTC | #36
On Fri, Nov 8, 2019 at 6:53 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Anup
>
> >
> > On Thu, Nov 7, 2019 at 5:11 PM Rick Chen <rickchen36@gmail.com> wrote:
> > >
> > > Hi Anup & Lukas
> > >
> > > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > >
> > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > >
> > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > Hi Anup
> > > > > > >
> > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > Hi Anup
> > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM Alan Kao <alankao@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments below.
> > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50 AM Rick Chen <rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at 2:18 PM Andes <uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to hart 0 always will be main
> > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When develop SPL flow, I try to
> > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit contain 2 fixes, or just 1 fix?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But they will cause one negative result
> > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi to other harts.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart can be main hart in U-Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it still fail somewhere. After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > and found there is an assumption that hart 0 shall be
> > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug. Maybe it is a compatible issue.
> > > > > > > > > > > > > > > > > > > > > > > There is a limitation that only hart 0 can be main hart in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such limitation.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs of the initialization are the
> > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > 2. determine which route to take based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this signature, if you are not willing to call it
> > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was not there until we added self-relocation
> > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI but we might end-up having boot_lottery.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have send a patch to fix this OpenSBI:
> > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce relocation lottery"
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you try above patch and see if that helps ?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > It will be great if you can provide Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can not find this patch in mailing list.
> > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0 play as main hart.
> > > > > > > > > > > > > > > The hart 1 will receive ipi and come into OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still run in U-Boot SPL.
> > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower which will copy data from
> > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares ensures that OpenSBI firmware
> > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering C code. This helps
> > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the U-Boot SPL is running.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-Boot SPL are linked to
> > > > > > > > > > > > > > address same address 0x0. This means secondary HARTs cannot safely
> > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You should hold secondary HARTs
> > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and U-Boot proper are
> > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs should jump to OpenSBI
> > > > > > > > > > > > > > at the same time after everything is loaded in RAM.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART jump to OpenSBI and primary
> > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > (Refer, jump_to_image_no_args() in arch/riscv/lib/spl.c)
> > > > > > > > > > > > >
> > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-Boot SPL TEXT_START.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If possible please change TEXT base for U-Boot SPL or OpenSBI. I think
> > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be convenient since this series
> > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes.
> > > > > > > > > > > > I know it can avoid corrupting issue with changing  U-Boot SPL
> > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > >
> > > > > > > > > > > I think this issue will be seen on U-Boot SPL running on QEMU as well.
> > > > > > > > > > >
> > > > > > > > > > > > With the following changes, U-Boot SPL text base can equal to OpenSBI text base
> > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when jumping to OpenSBI
> > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart, other harts go to
> > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > >
> > > > > > > > > > > Overall it's a good suggestion but we cannot use a2 register because this
> > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should pass preferred
> > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > >
> > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > >
> > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1 branch of
> > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > >
> > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > >
> > > > > > > > > > > You will have to update the "struct fw_dynamic_info" passed to
> > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > >
> > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info" to OpenSBI by U-Boot SPL.
> > > > > > > > >
> > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > >
> > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > >
> > > > > > > > > > So if U-Boot SPL can pass main hart information via a3, OpenSBI just
> > > > > > > > > > have the following change
> > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > change to
> > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > before this commit
> > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > >
> > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of passing
> > > > > > > > > value in a3 for these firmwares because these are not booted by U-Boot
> > > > > > > > > SPL.
> > > > > > > > >
> > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support which does not
> > > > > > > > > pass anything in 'a3' register.
> > > > > > > > >
> > > > > > > > > We should definitely use "struct fw_dynamic_info" for this so that we can
> > > > > > > > > maintain backward compatibility as well.
> > > > > > > > >
> > > > > > > > > Please make sure that U-Boot SPL passes "struct fw_dynamic_info"
> > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > >
> > > > > > > > > > But after this commit 98f4a, main hart become chosen from lottery mechanism.
> > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not overlap with
> > > > > > > > > > OpenSBI text start. :)
> > > > > > > > >
> > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on QEMU as well. It's
> > > > > > > > > just that most of us did not notice it for U-Boot SPL on QEMU.
> > > > > > > > >
> > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > >
> > > > > > > > I double checked spl_invoke_opensbi() and it is doing the right thing
> > > > > > > > by passing "struct fw_dyanmic_info" pointer in 'a2' register.
> > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > >
> > > > > > > > Not sure, why it is not passing 'a2' register correctly for you ??
> > > > > > > >
> > > > > > >
> > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > Other harts will pass struct fw_dyanmic_info in a2 to OpenSBI.
> > > > > > >
> > > > > > > Thanks for your corrections
> > > > > >
> > > > > > No problem, I am happy to help.
> > > > > >
> > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > >
> > > > > > Maybe below changes can help you...
> > > > >
> > > > > Thanks for looking into this issue! I successfully tested it on QEMU, I
> > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > problem.
> > > > >
> > > > > We might still run into problems however. Right now, we are assuming
> > > > > that the main hart is the last one to enter OpenSBI. If this is not the
> > > > > case (some delay when handling the IPI), we will have the same problem
> > > > > again. To fix this we could pass the hart mask, containing all harts
> > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > running in OpenSBI. I am not sure how realistic this scenario is, so
> > > > > this might not be needed.
> > > >
> > > > I agree that we might still run into this issue if primary HART enters
> > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > happen on QEMU where each CPU is a thread running on host but
> > > > very unlikely/impossible on real HW.
> > > >
> > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > secondary HARTs and before jumping to OpenSBI ?
> > > >
> > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be the
> > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > all HARTs.
> > > >
> > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-loop
> > > > in fw_base.S which will add to the boot-time as well.
> > > >
> > > > I still think the root cause of the issue is that TEXT_START of
> > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > insist SOC vendors to not use same TEXT_START ?
> > >
> > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > preferred_boot_hart_v2 branch
> > > It still encounter some booting problems. I try to find out the root
> > > cause but in vain.
> > >
> > > I am very agree with options of Lukas.
> > > After modifying U-Boot SPL text base not equal to zero and the booting
> > > progress will be pass.
> >
> > No problem, it will be your decision to go with different TEXT_BASE for
> > AndesTech platform.
>
> You misunderstood my intention.
> It is just a temporary workaround for debugging.
> In the end, I would prefer the U-Boot SPL text base to be in sync with yours.

No worries, we are on the same page here. If we get the U-Boot SPL -> OpenSBI
boot-flow stable on the AndesTech platform, it will make things easier for
SiFive Unleashed as well.

If possible, please try your patches on top of Lukas's changes. I am sure
this will help you achieve 100% reliability.

>
> >
> > We will keep the "boot_hart" field in OpenSBI for U-Boot SPL on QEMU
> > and I will wait for Lukas to add more checks and small delay in U-Boot SPL
> > (like he mentioned).
>
> I am happy to hear that. :)

Cool.
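
For anyone following the "boot_hart" discussion, here is a minimal C sketch
of how a preferred boot hart could be used to decide which hart performs
the relocation. The struct layout is written from memory and should be
checked against include/sbi/fw_dynamic.h in OpenSBI. The names
NO_PREFERRED_BOOT_HART and hart_may_relocate() are hypothetical, and the
real decision is made in assembly in fw_base.S, not in C.

```c
#include <stdbool.h>

/* Assumed layout; see OpenSBI's fw_dynamic.h for the authoritative one. */
struct fw_dynamic_info {
	unsigned long magic;
	unsigned long version;
	unsigned long next_addr;
	unsigned long next_mode;
	unsigned long options;
	unsigned long boot_hart;	/* assumed valid from version 2 on */
};

/* Hypothetical name: all bits set means "no preference, use the lottery". */
#define NO_PREFERRED_BOOT_HART	(~0UL)

/*
 * Returns true if this hart should do the relocation copy.
 * won_lottery stands in for the result of the relocation lottery.
 */
static bool hart_may_relocate(const struct fw_dynamic_info *info,
			      unsigned long hartid, bool won_lottery)
{
	if (info->version >= 2 && info->boot_hart != NO_PREFERRED_BOOT_HART)
		return hartid == info->boot_hart;

	return won_lottery;
}
```

With a preferred boot hart set by U-Boot SPL, only that hart copies the
firmware and all others wait, which is why the SPL main hart needs to be
the preferred one.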

Best Regards,
Anup
Rick Chen Nov. 11, 2019, 7:19 a.m. UTC | #37
Hi Lukas

>
> Hi Rick,
>
> On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> > Hi Atish
> >
> > > Hi Atish
> > >
> > > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > > Hi Anup & Lukas
> > > > >
> > > > > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > > wrote:
> > > > > > > > > Hi Anup
> > > > > > > > >
> > > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > >
> > > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > > register because this
> > > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > > pass preferred
> > > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > > > >
> > > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > > branch of
> > > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > > passed to
> > > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > >
> > > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > >
> > > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > >
> > > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > >
> > > > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > > > OpenSBI just
> > > > > > > > > > > > have the following change
> > > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > > change to
> > > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > > before this commit
> > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > >
> > > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > > passing
> > > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > > booted by U-Boot
> > > > > > > > > > > SPL.
> > > > > > > > > > >
> > > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > > which does not
> > > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > >
> > > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > > this so that we can
> > > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > >
> > > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > >
> > > > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > > > from lottery mechanism.
> > > > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > > > overlap with
> > > > > > > > > > > > OpenSBI text start. :)
> > > > > > > > > > >
> > > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > > QEMU as well. It's
> > > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > > QEMU.
> > > > > > > > > > >
> > > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > >
> > > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > > right thing
> > > > > > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2'
> > > > > > > > > > register.
> > > > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > > > >
> > > > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > > > you ??
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > > > > Other harts will pass struct fw_dynamic_info in a2 to
> > > > > > > > > OpenSBI.
> > > > > > > > >
> > > > > > > > > Thanks for your corrections
> > > > > > > >
> > > > > > > > No problem, I am happy to help.
> > > > > > > >
> > > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > >
> > > > > > > > Maybe below changes can help you...
> > > > > > >
> > > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > > QEMU, I
> > > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > > problem.
> > > > > > >
> > > > > > > We might still run into problems however. Right now, we are
> > > > > > > assuming
> > > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > > not the
> > > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > > problem
> > > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > > harts
> > > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > > so
> > > > > > > this might not be needed.
> > > > > >
> > > > > > I agree that we might still run into this issue if primary HART
> > > > > > enters
> > > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > > very unlikely/impossible on real HW.
> > > > > >
> > > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > >
> > > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > > the
> > > > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > > > all HARTs.
> > > > > >
> > > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > > loop
> > > > > > in fw_base.S which will add to the boot-time as well.
> > > > > >
> > > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > > > insist SOC vendors to not use same TEXT_START ?
> > > > >
> > > > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > > > preferred_boot_hart_v2 branch
> > > > > It still encounter some booting problems. I try to find out the root
> > > > > cause but in vain.
> > > > >
> > > >
> > > > Just wanted to make sure that you have tried this patch.
> > > >
> > > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > >
> > > > We should investigate the issue why it did not work for you if this
> > > > patch did not work for you.
> > >
> > > Yes, I try with this
> > > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > >
> > > It fail randomly yesterday, but this morning I try several times it will pass.
> > > I will keep trying.
> > >
> >
> > I have figured out one failing case: the main hart of U-Boot SPL is
> > not the last hart to enter OpenSBI.
> >
>
> Can you try this branch [1]? It includes a quick implementation of the
> changes I mentioned yesterday, where the main hart waits until all
> harts have received the IPI.
>
> [1]:
> https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart

I have tried this patch, but it does not seem to guarantee that the main
hart is the last hart to leave U-Boot SPL.
Even when the main hart has checked that all harts have received the IPI,
it can still arrive in OpenSBI before the other harts.
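
To illustrate the gap, here is a minimal sketch in C. It is a hypothetical
model, not code from the branch: the names hart_sync, hart_ack_ipi() and
hart_enter_opensbi() are made up for illustration. It tracks "IPI
acknowledged" and "entered OpenSBI" as two separate counters, which shows
that the main hart's check can pass while secondary harts are still in
U-Boot SPL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: two separate milestones for the secondary harts. */
struct hart_sync {
	atomic_uint acked;   /* harts that have acknowledged the IPI */
	atomic_uint entered; /* harts that have actually reached OpenSBI */
};

void hart_sync_init(struct hart_sync *s)
{
	assert(s);
	atomic_init(&s->acked, 0);
	atomic_init(&s->entered, 0);
}

/* A secondary hart has seen the IPI but is still running SPL code. */
void hart_ack_ipi(struct hart_sync *s)
{
	atomic_fetch_add(&s->acked, 1);
}

/* A secondary hart has executed its first instruction in OpenSBI. */
void hart_enter_opensbi(struct hart_sync *s)
{
	atomic_fetch_add(&s->entered, 1);
}

/* This is all the main hart can observe from inside U-Boot SPL. */
bool all_acked(struct hart_sync *s, unsigned int secondaries)
{
	return atomic_load(&s->acked) == secondaries;
}

/* This is what the relocation in OpenSBI would actually need. */
bool all_entered(struct hart_sync *s, unsigned int secondaries)
{
	return atomic_load(&s->entered) == secondaries;
}
```

Between hart_ack_ipi() and hart_enter_opensbi() a secondary hart is still
executing from the SPL text region, so all_acked() can be true while
all_entered() is false.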

Thanks
Rick


>
> Thanks,
> Lukas
Lukas Auer Nov. 12, 2019, 9:47 a.m. UTC | #38
Hi Rick,

On Mon, 2019-11-11 at 15:19 +0800, Rick Chen wrote:
> Hi Lukas
> 
> > Hi Rick,
> > 
> > On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> > > Hi Atish
> > > 
> > > > Hi Atish
> > > > 
> > > > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > > > Hi Anup & Lukas
> > > > > > 
> > > > > > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > > > wrote:
> > > > > > > > > > Hi Anup
> > > > > > > > > > 
> > > > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > > > register because this
> > > > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > > > pass preferred
> > > > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > > > branch of
> > > > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > > > passed to
> > > > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > > > 
> > > > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > > > 
> > > > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > > > 
> > > > > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > > > > OpenSBI just
> > > > > > > > > > > > > have the following change
> > > > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > change to
> > > > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > before this commit
> > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > > > 
> > > > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > > > passing
> > > > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > > > booted by U-Boot
> > > > > > > > > > > > SPL.
> > > > > > > > > > > > 
> > > > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > > > which does not
> > > > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > > > 
> > > > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > > > this so that we can
> > > > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > > > 
> > > > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > > > 
> > > > > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > > > > from lottery mechanism.
> > > > > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > > > > overlap with
> > > > > > > > > > > > > OpenSBI text start. :)
> > > > > > > > > > > > 
> > > > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > > > QEMU as well. It's
> > > > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > > > QEMU.
> > > > > > > > > > > > 
> > > > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > > > 
> > > > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > > > right thing
> > > > > > > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2'
> > > > > > > > > > > register.
> > > > > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > > > > > 
> > > > > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > > > > you ??
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > > > > > Other harts will pass struct fw_dynamic_info in a2 to
> > > > > > > > > > OpenSBI.
> > > > > > > > > > 
> > > > > > > > > > Thanks for your corrections
> > > > > > > > > 
> > > > > > > > > No problem, I am happy to help.
> > > > > > > > > 
> > > > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > > > 
> > > > > > > > > Maybe below changes can help you...
> > > > > > > > 
> > > > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > > > QEMU, I
> > > > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > > > problem.
> > > > > > > > 
> > > > > > > > We might still run into problems however. Right now, we are
> > > > > > > > assuming
> > > > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > > > not the
> > > > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > > > problem
> > > > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > > > harts
> > > > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > > > so
> > > > > > > > this might not be needed.
> > > > > > > 
> > > > > > > I agree that we might still run into this issue if primary HART
> > > > > > > enters
> > > > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > > > very unlikely/impossible on real HW.
> > > > > > > 
> > > > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > > > 
> > > > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > > > the
> > > > > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > > > > all HARTs.
> > > > > > > 
> > > > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > > > loop
> > > > > > > in fw_base.S which will add to the boot-time as well.
> > > > > > > 
> > > > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > > > > insist SOC vendors to not use same TEXT_START ?
> > > > > > 
> > > > > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > > > > preferred_boot_hart_v2 branch
> > > > > > It still encounter some booting problems. I try to find out the root
> > > > > > cause but in vain.
> > > > > > 
> > > > > 
> > > > > Just wanted to make sure that you have tried this patch.
> > > > > 
> > > > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > > > 
> > > > > We should investigate the issue why it did not work for you if this
> > > > > patch did not work for you.
> > > > 
> > > > Yes, I try with this
> > > > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > > > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > > > 
> > > > It fail randomly yesterday, but this morning I try several times it will pass.
> > > > I will keep trying.
> > > > 
> > > 
> > > I have figured out one failing case: the main hart of U-Boot SPL is
> > > not the last hart to enter OpenSBI.
> > > 
> > 
> > Can you try this branch [1]? It includes a quick implementation of the
> > changes I mentioned yesterday, where the main hart waits until all
> > harts have received the IPI.
> > 
> > [1]:
> > https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart
> 
> I have tried this patch, but it does not seem to guarantee that the main
> hart is the last hart to leave U-Boot SPL.
> Even when the main hart has checked that all harts have received the IPI,
> it can still arrive in OpenSBI before the other harts.
> 

Thanks for testing! Can you try again with the same branch? I added
another patch so that clearing the IPI is the very last thing before
jumping to the SMP function.
If that does not help, we'll have to add a delay in
spl_invoke_opensbi() to delay the main hart.
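
As a rough sketch of that fallback (an assumption about how it could look,
not the code in the branch): the main hart first polls, a bounded number of
times, until every secondary hart has acknowledged the IPI, and then spins
for a fixed delay before jumping. The names wait_for_acks(),
delay_before_jump() and main_hart_ready() and all loop counts are
hypothetical; real code would use U-Boot's own timer helpers instead of a
busy loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Poll until `expected` secondary harts have acknowledged the IPI.
 * Gives up after `max_polls` checks so a stuck hart cannot hang the boot.
 */
bool wait_for_acks(atomic_uint *acked, unsigned int expected,
		   unsigned long max_polls)
{
	assert(acked);
	for (unsigned long i = 0; i < max_polls; i++) {
		if (atomic_load(acked) >= expected)
			return true;
	}
	return false;
}

/* Crude fixed delay; returns the number of iterations actually spun. */
unsigned long delay_before_jump(unsigned long loops)
{
	volatile unsigned long spin = 0;

	while (spin < loops)
		spin++;
	return spin;
}

/* Main hart: report "ready to jump" only after the acks and the delay. */
bool main_hart_ready(atomic_uint *acked, unsigned int expected,
		     unsigned long max_polls, unsigned long delay_loops)
{
	if (!wait_for_acks(acked, expected, max_polls))
		return false;
	delay_before_jump(delay_loops);
	return true;
}
```

Note that a delay only makes the race unlikely; it does not strictly
guarantee that the main hart enters OpenSBI last.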

Thanks,
Lukas
Rick Chen Nov. 13, 2019, 3:42 a.m. UTC | #39
Hi Lukas

>
> Hi Rick,
>
> On Mon, 2019-11-11 at 15:19 +0800, Rick Chen wrote:
> > Hi Lukas
> >
> > > Hi Rick,
> > >
> > > On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> > > > Hi Atish
> > > >
> > > > > Hi Atish
> > > > >
> > > > > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > > > > Hi Anup & Lukas
> > > > > > >
> > > > > > > > Anup Patel <anup@brainfault.org> wrote on Thu, Nov 7, 2019 at 6:44 PM:
> > > > > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > > > > wrote:
> > > > > > > > > > > Hi Anup
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > > > > register because this
> > > > > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > > > > pass preferred
> > > > > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > > > > branch of
> > > > > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > > > > passed to
> > > > > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > > > > > OpenSBI just
> > > > > > > > > > > > > > have the following change
> > > > > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > change to
> > > > > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > before this commit
> > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > > > >
> > > > > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > > > > passing
> > > > > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > > > > booted by U-Boot
> > > > > > > > > > > > > SPL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > > > > which does not
> > > > > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > > > > this so that we can
> > > > > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > > > > > from lottery mechanism.
> > > > > > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > > > > > overlap with
> > > > > > > > > > > > > > OpenSBI text start. :)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > > > > QEMU as well. It's
> > > > > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > > > > QEMU.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > > > >
> > > > > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > > > > right thing
> > > > > > > > > > > > by passing "struct fw_dyanmic_info" pointer in 'a2'
> > > > > > > > > > > > register.
> > > > > > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > > > > > >
> > > > > > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > > > > > you ??
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, you are right. I reply too quickly.
> > > > > > > > > > > Other harts will pass struct fw_dyanmic_info in a2 to
> > > > > > > > > > > OpenSBI.
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your corrections
> > > > > > > > > >
> > > > > > > > > > No problem, I am happy to help.
> > > > > > > > > >
> > > > > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > > > >
> > > > > > > > > > Maybe below changes can help you...
> > > > > > > > >
> > > > > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > > > > QEMU, I
> > > > > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > > > > problem.
> > > > > > > > >
> > > > > > > > > We might still run into problems however. Right now, we are
> > > > > > > > > assuming
> > > > > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > > > > not the
> > > > > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > > > > problem
> > > > > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > > > > harts
> > > > > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > > > > so
> > > > > > > > > this might not be needed.
> > > > > > > >
> > > > > > > > I agree that we might still run into this issue if primary HART
> > > > > > > > enters
> > > > > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > > > > very unlikely/impossible on real HW.
> > > > > > > >
> > > > > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > > > >
> > > > > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > > > > the
> > > > > > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > > > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > > > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > > > > > all HARTs.
> > > > > > > >
> > > > > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > > > > loop
> > > > > > > > in fw_base.S which will add to the boot-time as well.
> > > > > > > >
> > > > > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > > > > > insist SOC vendors to not use same TEXT_START ?
> > > > > > >
> > > > > > > I have try your changes about boot_hart for U-Boot SPL with OpenSBI,
> > > > > > > preferred_boot_hart_v2 branch
> > > > > > > It still encounter some booting problems. I try to find out the root
> > > > > > > cause but in vain.
> > > > > > >
> > > > > >
> > > > > > Just wanted to make sure that you have tried this patch.
> > > > > >
> > > > > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > > > >
> > > > > > We should investigate the issue why it did not work for you if this
> > > > > > patch did not work for you.
> > > > >
> > > > > Yes, I try with this
> > > > > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > > > > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > > > >
> > > > > It fail randomly yesterday, but this morning I try several times it will pass.
> > > > > I will keep trying.
> > > > >
> > > >
> > > > I have figure out one fail case which is belong to main hart of U-Boot
> > > > SPL is not the last hart while entering OpenSBI
> > > >
> > >
> > > Can you try this branch [1]? It includes a quick implementation of the
> > > changes a mentioned yesterday, where the main hart waits until all
> > > harts have received the IPI.
> > >
> > > [1]:
> > > https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart
> >
> > I have try this patch, but it seems can not guarantee main hart to be
> > the last hart while leaving U-Boot SPL.
> > Even the main hart have checked all harts have received the IPI, but
> > it still have opportunities to arrive OpenSBI  before other harts.
> >
>
> Thanks for testing! Can you try again with the same branch? I added
> another patch so that clearing the IPI is the very last thing before
> jumping to the SMP function.
> If that does not help, we'll have to add a delay in
> spl_invoke_opensbi() to delay the main hart.


Thanks for your patch, it indeed almost solves the problem.
It always passes the boot verification under free running; I tried many
times and could not hit the failure case.

But if I interfere with GDB, e.g. set a breakpoint at 0x6c2 (c.jalr  s2)
in handle_ipi(), then delete the breakpoint and free run after it is hit,
the failure case occurs as below:


-------------------------
U-Boot SPL failure output
-------------------------

U-Boot SPL 2020.01-rc1-08573-gc42669d-dirty (Nov 13 2019 - 10:31:12 +0800)
Trying to boot from RAM
boot_hart 0


U-Boot 2020.01-rc1-08573-gc42669d-dirty (Nov 13 2019 - 10:31:12 +0800)

DRAM:  1 GiB
Flash:

---------------
gdb information
---------------
Thread 4 hit Breakpoint 3, 0x00000000000006c2 in handle_ipi (hart=3)
    at arch/riscv/lib/smp.c:110
110             smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
(gdb) inf othreads
Undefined info command: "othreads".  Try "help info".
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 1 (hart 1) 0x000000000122a8f8 in ?? ()
  2    Thread 2 (hart 2) 0x00000000000006c2 in handle_ipi (hart=1)
    at arch/riscv/lib/smp.c:110
  3    Thread 3 (hart 3) 0x00000000000006c2 in handle_ipi (hart=2)
    at arch/riscv/lib/smp.c:110
* 4    Thread 4 (hart 4) 0x00000000000006c2 in handle_ipi (hart=3)
    at arch/riscv/lib/smp.c:110
(gdb) d
Delete all breakpoints? (y or n) y
(gdb) c
Continuing.


----------
handle_ipi
----------

0000000000000678 <handle_ipi>:
     678:       479d                    c.li    a5,7
     67a:       04a7e663                bltu    a5,a0,6c6 <handle_ipi+0x4e>
     67e:       6a1042ef                jal     t0,551e <__riscv_save_2>
     682:       842a                    c.mv    s0,a0
     684:       0330000f                fence   rw,rw
     688:       44e1                    c.li    s1,24
     68a:       029504b3                mul     s1,a0,s1
     68e:       003487b3                add     a5,s1,gp
     692:       1507b903                ld      s2,336(a5)
     696:       d57ff0ef                jal     ra,3ec <invalidate_icache_all>
     69a:       0004051b                sext.w  a0,s0
     69e:       e4bff0ef                jal     ra,4e8 <riscv_clear_ipi>
     6a2:       c911                    c.beqz  a0,6b6 <handle_ipi+0x3e>
     6a4:       85a2                    c.mv    a1,s0
     6a6:       00005517                auipc   a0,0x5
     6aa:       36250513                addi    a0,a0,866 # 5a08 <_ctype+0x440>
     6ae:       3f8030ef                jal     ra,3aa6 <printf>
     6b2:       6a50406f                j       5556 <__riscv_restore_2>
     6b6:       948e                    c.add   s1,gp
     6b8:       1604b603                ld      a2,352(s1)
     6bc:       1584b583                ld      a1,344(s1)
     6c0:       8522                    c.mv    a0,s0
     6c2:       9902                    c.jalr  s2
     6c4:       b7fd                    c.j     6b2 <handle_ipi+0x3a>
     6c6:       8082                    c.jr    ra
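
For readers following the disassembly above, the ordering it shows can be
sketched in C. This is a simplified model for illustration, not the U-Boot
source: the names ipi_slot, ipi_pending, clear_ipi_stub, record_call and
handle_ipi_model are hypothetical, and the bound of 8 harts is inferred from
the "c.li a5,7 / bltu" check at 0x678. The point is that the IPI is cleared
(0x69e) before the jump at 0x6c2, so a hart stopped at 0x6c2 already looks
"done" to the sender while it is still executing U-Boot SPL.

```c
#include <stdatomic.h>
#include <stdio.h>

#define NR_HARTS 8 /* assumption: inferred from the "hart > 7" check at 0x67a */

typedef void (*smp_fn)(unsigned long hart, unsigned long arg0,
		       unsigned long arg1);

/* Hypothetical per-hart IPI slot: function to call plus two arguments. */
struct ipi_slot {
	smp_fn fn;
	unsigned long arg0;
	unsigned long arg1;
};

struct ipi_slot ipi[NR_HARTS];
int ipi_pending[NR_HARTS];

/* Values recorded by the demo callback so a caller can inspect them. */
unsigned long last_hart, last_arg0, last_arg1;

void record_call(unsigned long hart, unsigned long arg0, unsigned long arg1)
{
	last_hart = hart;
	last_arg0 = arg0;
	last_arg1 = arg1;
}

/* Stand-in for riscv_clear_ipi(): returns 0 on success. */
int clear_ipi_stub(unsigned long hart)
{
	ipi_pending[hart] = 0;
	return 0;
}

int handle_ipi_model(unsigned long hart)
{
	if (hart >= NR_HARTS)			/* 0x678-0x67a */
		return -1;

	atomic_thread_fence(memory_order_seq_cst); /* 0x684: fence rw,rw */

	smp_fn fn = ipi[hart].fn;		/* 0x692: load target address */

	/* 0x696: invalidate_icache_all() would run here (omitted in model) */

	if (clear_ipi_stub(hart)) {		/* 0x69e-0x6a2 */
		printf("cannot clear IPI on hart %lu\n", hart); /* placeholder text */
		return -1;
	}

	/*
	 * 0x6c2: the breakpoint from the GDB session sits here. The IPI is
	 * already cleared, but this hart has not yet left U-Boot SPL.
	 */
	fn(hart, ipi[hart].arg0, ipi[hart].arg1);
	return 0;
}
```

In this model, anything that only observes ipi_pending[] cannot tell whether
a hart has actually taken the jump, which matches the failure seen when the
secondary harts are held at the breakpoint.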


Maybe this is a little picky, but what I am trying to say is that someone
reviewing this part may not understand why the main hart must be the last
hart to leave U-Boot SPL.
I think the better way is to make sure all harts have arrived before
relocation in OpenSBI (you mentioned it before).
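
The idea of waiting for all harts before relocating could look roughly like
the arrival counter below. This is only a sketch under stated assumptions:
the real check would live in OpenSBI's early assembly (fw_base.S), the
expected hart count would have to be passed in somehow (for example via the
hart mask discussed earlier in this thread), and the reloc_barrier_* names
are hypothetical, not OpenSBI API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical barrier: counts harts that have entered the firmware. */
struct reloc_barrier {
	atomic_uint arrived;	/* harts that have checked in so far */
	unsigned int expected;	/* harts that must check in (assumed known) */
};

void reloc_barrier_init(struct reloc_barrier *b, unsigned int expected)
{
	atomic_init(&b->arrived, 0);
	b->expected = expected;
}

/* Called once by every hart as soon as it enters the firmware. */
void reloc_barrier_arrive(struct reloc_barrier *b)
{
	atomic_fetch_add_explicit(&b->arrived, 1, memory_order_release);
}

bool reloc_barrier_all_arrived(struct reloc_barrier *b)
{
	return atomic_load_explicit(&b->arrived, memory_order_acquire) ==
	       b->expected;
}

/*
 * Called by the relocating hart only: spin until every hart has left the
 * previous stage, so the copy cannot overwrite code still being executed.
 */
void reloc_barrier_wait(struct reloc_barrier *b)
{
	while (!reloc_barrier_all_arrived(b))
		;
}
```

With such a barrier the order in which harts leave U-Boot SPL would no
longer matter, at the cost of one more wait loop during boot and the need to
place the counter somewhere the relocation copy does not overwrite.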

Thanks again for sharing this way to avoid the problem.
It really gives me another workaround for the current state.
I will keep testing its robustness and reliability under free running.

Thanks
Rick

>
> Thanks,
> Lukas
Anup Patel Nov. 14, 2019, 7:27 a.m. UTC | #40
On Wed, Nov 13, 2019 at 9:13 AM Rick Chen <rickchen36@gmail.com> wrote:
>
> Hi Lukas
>
> >
> > Hi Rick,
> >
> > On Mon, 2019-11-11 at 15:19 +0800, Rick Chen wrote:
> > > Hi Lukas
> > >
> > > > Hi Rick,
> > > >
> > > > On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> > > > > Hi Atish
> > > > >
> > > > > > Hi Atish
> > > > > >
> > > > > > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > > > > > Hi Anup & Lukas
> > > > > > > >
> > > > > > > > Anup Patel <anup@brainfault.org> 於 2019年11月7日 週四 下午6:44寫道:
> > > > > > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > Hi Anup
> > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > > > > > register because this
> > > > > > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > > > > > pass preferred
> > > > > > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Sorry, what I want to say shall be a3.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > > > > > branch of
> > > > > > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > > > > > passed to
> > > > > > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So if U-Boot SPL can pass main hart information via a3,
> > > > > > > > > > > > > > > OpenSBI just
> > > > > > > > > > > > > > > have the following change
> > > > > > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > > change to
> > > > > > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > > before this commit
> > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > > > > > passing
> > > > > > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > > > > > booted by U-Boot
> > > > > > > > > > > > > > SPL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > > > > > which does not
> > > > > > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > > > > > this so that we can
> > > > > > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > But after this commit 98f4a, main hart become chosen
> > > > > > > > > > > > > > > from lottery mechanism.
> > > > > > > > > > > > > > > Maybe I will prefer to change U-Boot SPL text base not
> > > > > > > > > > > > > > > overlap with
> > > > > > > > > > > > > > > OpenSBI text start. :)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > > > > > QEMU as well. It's
> > > > > > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > > > > > QEMU.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > > > > > right thing
> > > > > > > > > > > > > by passing "struct fw_dynamic_info" pointer in 'a2'
> > > > > > > > > > > > > register.
> > > > > > > > > > > > > (Refer, common/spl/spl_opensbi.c)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not sure, why it is not passing 'a2' register correctly for
> > > > > > > > > > > > > you ??
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, you are right. I replied too quickly.
> > > > > > > > > > > > Other harts do pass struct fw_dynamic_info in a2 to
> > > > > > > > > > > > OpenSBI.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your corrections
> > > > > > > > > > >
> > > > > > > > > > > No problem, I am happy to help.
> > > > > > > > > > >
> > > > > > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > > > > >
> > > > > > > > > > > Maybe below changes can help you...
> > > > > > > > > >
> > > > > > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > > > > > QEMU, I
> > > > > > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > > > > > problem.
> > > > > > > > > >
> > > > > > > > > > We might still run into problems however. Right now, we are
> > > > > > > > > > assuming
> > > > > > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > > > > > not the
> > > > > > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > > > > > problem
> > > > > > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > > > > > harts
> > > > > > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > > > > > so
> > > > > > > > > > this might not be needed.
> > > > > > > > >
> > > > > > > > > I agree that we might still run into this issue if primary HART
> > > > > > > > > enters
> > > > > > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > > > > > very unlikely/impossible on real HW.
> > > > > > > > >
> > > > > > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > > > > >
> > > > > > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > > > > > the
> > > > > > > > > size of the hart_mask. It is possible in-future SOC vendors come-up
> > > > > > > > > with SOC having huge number of HARTs OR SOC with discontinuous
> > > > > > > > > HART IDs which can cause a 64bit hart_mask to be not sufficient for
> > > > > > > > > all HARTs.
> > > > > > > > >
> > > > > > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > > > > > loop
> > > > > > > > > in fw_base.S which will add to the boot-time as well.
> > > > > > > > >
> > > > > > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is same. Maybe we can
> > > > > > > > > insist SOC vendors to not use same TEXT_START ?
> > > > > > > >
> > > > > > > > I have tried your boot_hart changes for U-Boot SPL with the OpenSBI
> > > > > > > > preferred_boot_hart_v2 branch.
> > > > > > > > It still encounters some booting problems. I tried to find the root
> > > > > > > > cause, but in vain.
> > > > > > > >
> > > > > > >
> > > > > > > Just wanted to make sure that you have tried this patch.
> > > > > > >
> > > > > > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > > > > >
> > > > > > > If this patch did not work for you, we should investigate why
> > > > > > > it did not.
> > > > > >
> > > > > > Yes, I tried with this
> > > > > > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > > > > > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > > > > >
> > > > > > It failed randomly yesterday, but when I tried several times this
> > > > > > morning it passed. I will keep trying.
> > > > > >
> > > > >
> > > > > I have figured out one failure case: the main hart of U-Boot SPL
> > > > > is not the last hart to enter OpenSBI.
> > > > >
> > > >
> > > > Can you try this branch [1]? It includes a quick implementation of the
> > > > changes I mentioned yesterday, where the main hart waits until all
> > > > harts have received the IPI.
> > > >
> > > > [1]:
> > > > https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart
> > >
> > > I have tried this patch, but it seems it cannot guarantee that the main
> > > hart is the last hart to leave U-Boot SPL.
> > > Even though the main hart has checked that all harts have received the
> > > IPI, it can still arrive in OpenSBI before the other harts.
> > >
> >
> > Thanks for testing! Can you try again with the same branch? I added
> > another patch so that clearing the IPI is the very last thing before
> > jumping to the SMP function.
> > If that does not help, we'll have to add a delay in
> > spl_invoke_opensbi() to delay the main hart.
>
>
> Thanks for your patch; it almost solves the problem.
> It always passes the boot verification when free-running. I tried many
> times and could not hit the failure case.
>
> But if I intervene with GDB, e.g. set a breakpoint at 0x6c2 (c.jalr s2)
> in handle_ipi(), then delete the breakpoint and let it run freely after
> it is hit, the failure case is triggered as below:

The current booting mechanism, where all HARTs jump to each
booting stage (including Linux) at the same time, is very fragile.

Attaching a debugger at boot-time will certainly break the current way
of booting all HARTs because we are holding a particular HART in
debug state. I am not sure if we can handle the debugger case in
software, but in real-world deployments a debugger won't be present,
so maybe we can ignore this case.
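
As a rough illustration of the failure mode discussed in this thread, the
sketch below is a toy, single-threaded model in C. It is not U-Boot or
OpenSBI code, and every name in it is hypothetical. It assumes the setup
described earlier, where U-Boot SPL and OpenSBI share the same TEXT_START,
and shows that "all harts have acknowledged the IPI" is a weaker condition
than "all harts have actually left U-Boot SPL". A HART held in a debugger
between those two points makes the first check pass while the second fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-hart boot states for this toy model. */
enum hart_state {
    HART_IN_SPL,     /* still waiting in U-Boot SPL for the IPI */
    HART_IPI_ACKED,  /* IPI received and cleared, jump not yet taken */
    HART_IN_OPENSBI, /* has already jumped to the next stage */
};

/* The main hart's check: has every secondary hart acknowledged the IPI? */
bool all_secondaries_acked(const enum hart_state *harts, size_t n,
                           size_t main_hart)
{
    assert(main_hart < n);
    for (size_t i = 0; i < n; i++) {
        if (i != main_hart && harts[i] == HART_IN_SPL)
            return false;
    }
    return true;
}

/*
 * What the relocation really needs: no secondary hart may still be
 * executing SPL code when the SPL region is overwritten.
 */
bool safe_to_relocate(const enum hart_state *harts, size_t n,
                      size_t main_hart)
{
    assert(main_hart < n);
    for (size_t i = 0; i < n; i++) {
        if (i != main_hart && harts[i] != HART_IN_OPENSBI)
            return false;
    }
    return true;
}
```

With hart 2 stopped in the HART_IPI_ACKED state, all_secondaries_acked()
returns true while safe_to_relocate() returns false, which corresponds to
the window a breakpoint in handle_ipi() opens up.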

The SBI v0.2 HSM extension will make things better. It would be
even better if HW designers provided an explicit HW mechanism to
power-on/off HARTs at run-time.
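
To make the HSM idea a bit more concrete, here is a minimal toy model in C
of a "start a stopped hart on request" call. It is only a sketch: the
state and error values follow the SBI specification as I understand it,
but the struct, the function name toy_hart_start() and NUM_HARTS are
hypothetical and not part of OpenSBI or U-Boot. The point is that
secondary HARTs stay stopped until the boot HART explicitly starts them
at a given address, so nobody has to jump "at the same time".

```c
#include <assert.h>
#include <stddef.h>

/* Hart states (subset) with the values used by the SBI HSM extension. */
enum hsm_state { HSM_STARTED = 0, HSM_STOPPED = 1 };

/* SBI error codes used by this sketch. */
#define SBI_SUCCESS               0
#define SBI_ERR_INVALID_PARAM     (-3)
#define SBI_ERR_ALREADY_AVAILABLE (-6)

#define NUM_HARTS 4 /* hypothetical hart count for the toy model */

struct toy_hart {
    enum hsm_state state;
    unsigned long entry;  /* address the hart begins executing at */
    unsigned long opaque; /* value handed over to the started hart */
};

/* Toy equivalent of an HSM "hart start" request issued by the boot hart. */
int toy_hart_start(struct toy_hart *harts, unsigned long hartid,
                   unsigned long start_addr, unsigned long opaque)
{
    assert(harts != NULL);
    if (hartid >= NUM_HARTS)
        return SBI_ERR_INVALID_PARAM;
    if (harts[hartid].state == HSM_STARTED)
        return SBI_ERR_ALREADY_AVAILABLE;

    /* The hart leaves the stopped state only on this explicit request. */
    harts[hartid].entry = start_addr;
    harts[hartid].opaque = opaque;
    harts[hartid].state = HSM_STARTED;
    return SBI_SUCCESS;
}
```

In this model a debugger holding the boot HART merely delays the start
requests; no secondary HART is left running code that is about to be
overwritten.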

Regards,
Anup
Lukas Auer Nov. 14, 2019, 5:10 p.m. UTC | #41
On Thu, 2019-11-14 at 12:57 +0530, Anup Patel wrote:
> On Wed, Nov 13, 2019 at 9:13 AM Rick Chen <rickchen36@gmail.com> wrote:
> > Hi Lukas
> > 
> > > Hi Rick,
> > > 
> > > On Mon, 2019-11-11 at 15:19 +0800, Rick Chen wrote:
> > > > Hi Lukas
> > > > 
> > > > > Hi Rick,
> > > > > 
> > > > > On Fri, 2019-11-08 at 15:27 +0800, Rick Chen wrote:
> > > > > > Hi Atish
> > > > > > 
> > > > > > > Hi Atish
> > > > > > > 
> > > > > > > > On Thu, 2019-11-07 at 19:41 +0800, Rick Chen wrote:
> > > > > > > > > Hi Anup & Lukas
> > > > > > > > > 
> > > > > > > > > On Thu, Nov 7, 2019 at 6:44 PM Anup Patel <anup@brainfault.org> wrote:
> > > > > > > > > > On Thu, Nov 7, 2019 at 3:11 PM Auer, Lukas
> > > > > > > > > > <lukas.auer@aisec.fraunhofer.de> wrote:
> > > > > > > > > > > On Thu, 2019-11-07 at 11:48 +0530, Anup Patel wrote:
> > > > > > > > > > > > On Thu, Nov 7, 2019 at 11:40 AM Rick Chen <rickchen36@gmail.com
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Thu, Nov 7, 2019 at 10:45 AM Anup Patel <
> > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > On Thu, Nov 7, 2019 at 7:04 AM Rick Chen <
> > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:51 PM Rick Chen <
> > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 2:18 PM Anup Patel <
> > > > > > > > > > > > > > > > > > > anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > On Wed, Nov 6, 2019 at 12:14 PM Rick Chen <
> > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > On Tue, Nov 5, 2019 at 7:19 AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > Hi Anup
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 1:42 PM Anup
> > > > > > > > > > > > > > > > > > > > > > > > > Patel <anup@brainfault.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:30 AM
> > > > > > > > > > > > > > > > > > > > > > > > > > Alan Kao <alankao@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the critics.  Comments
> > > > > > > > > > > > > > > > > > > > > > > > > > > below.
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > > 06:38:00PM +0800, Bin Meng wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2019 at 10:50
> > > > > > > > > > > > > > > > > > > > > > > > > > > > AM Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > rickchen36@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Bin
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Rick,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 25, 2019 at
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2:18 PM Andes <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > uboot@andestech.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Rick Chen <
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > rick@andestech.com>
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It will work fine due to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart 0 always will be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hart coincidentally. When
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > develop SPL flow, I try
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > force other harts to be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart. And it will go
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrong in sending IPI
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > flow. So fix it.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix what? Does this commit
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > contain 2 fixes, or just 1
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > fix?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, it include two fixs. But
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > they will cause one negative
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > result
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > that only hart 0 can send ipi
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > to other harts.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Having this fix, any hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > can be main hart in U-
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Boot SPL
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > theoretically, but it
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > still fail somewhere.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > After dig in
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and found there is an
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > assumption that hart 0
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > shall be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > main hart in OpenSbi.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So does this mean there is
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > a bug in OpenSBI too?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure if it is a bug.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe it is a compatible
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > issue.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > There is a limitation that
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > only hart 0 can be main hart
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think OpenSBI has such
> > > > > > > > > > > > > > > > > > > > > > > > > > > > limitation.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > Please check the source.
> > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/riscv/opensbi/blob/master/firmware/fw_base.S#L54
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > Apparently, the FIRST TWO LINEs
> > > > > > > > > > > > > > > > > > > > > > > > > > > of the initialization are the
> > > > > > > > > > > > > > > > > > > > > > > > > > > 1. get hart ID.
> > > > > > > > > > > > > > > > > > > > > > > > > > > 2. determine which route to take
> > > > > > > > > > > > > > > > > > > > > > > > > > > based on their ID respectively.
> > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > So, I do think OpenSBI has this
> > > > > > > > > > > > > > > > > > > > > > > > > > > signature, if you are not willing
> > > > > > > > > > > > > > > > > > > > > > > > > > > to call it
> > > > > > > > > > > > > > > > > > > > > > > > > > > a limitation.
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > This dependency on hart id #0 was
> > > > > > > > > > > > > > > > > > > > > > > > > > not there until we added self-
> > > > > > > > > > > > > > > > > > > > > > > > > > relocation
> > > > > > > > > > > > > > > > > > > > > > > > > > in OpenSBI for FW_DYNAMIC.
> > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > I will try to fix this in OpenSBI
> > > > > > > > > > > > > > > > > > > > > > > > > > but we might end-up having
> > > > > > > > > > > > > > > > > > > > > > > > > > boot_lottery.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > I have send a patch to fix this
> > > > > > > > > > > > > > > > > > > > > > > > > OpenSBI:
> > > > > > > > > > > > > > > > > > > > > > > > > "[PATCH] firmware: Introduce
> > > > > > > > > > > > > > > > > > > > > > > > > relocation lottery"
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > Can you try above patch and see if
> > > > > > > > > > > > > > > > > > > > > > > > > that helps ?
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > It will be great if you can provide
> > > > > > > > > > > > > > > > > > > > > > > > > Tested-by to my patch as well.
> > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > I can not find this patch in mailing
> > > > > > > > > > > > > > > > > > > > > > > list.
> > > > > > > > > > > > > > > > > > > > > > > Can you provide a hyperlink ?
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > You can try latest riscv/opensbi master.
> > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > I have tested the patch on SiFive Unleashed
> > > > > > > > > > > > > > > > > > > > > > multiple times.
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > I have tried this patch, but it fail
> > > > > > > > > > > > > > > > > > > > > firmware: Introduce relocation lottery(
> > > > > > > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe)
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > The scenario was as below:
> > > > > > > > > > > > > > > > > > > > > There are 4 harts run in U-Boot SPL, hart 0
> > > > > > > > > > > > > > > > > > > > > play as main hart.
> > > > > > > > > > > > > > > > > > > > > The hart 1 will receive ipi and come into
> > > > > > > > > > > > > > > > > > > > > OpenSBI(0x1000000) from
> > > > > > > > > > > > > > > > > > > > > U-Boot SPL(0x0), meanwhile hart 0,2,3 still
> > > > > > > > > > > > > > > > > > > > > run in U-Boot SPL.
> > > > > > > > > > > > > > > > > > > > > Then hart 1 will do _relocate_copy_to_lower
> > > > > > > > > > > > > > > > > > > > > which will copy data from
> > > > > > > > > > > > > > > > > > > > > 0x1000000 to 0x0.
> > > > > > > > > > > > > > > > > > > > > And it will corrupt U-Boot SPL.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > The self-relocation in OpenSBI firmwares
> > > > > > > > > > > > > > > > > > > > ensures that OpenSBI firmware
> > > > > > > > > > > > > > > > > > > > are moved to the FW_TEXT_START before entering
> > > > > > > > > > > > > > > > > > > > C code. This helps
> > > > > > > > > > > > > > > > > > > > us load OpenSBI firmwares anywhere in RAM.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > However, OpenSBI firmwares don't know where the
> > > > > > > > > > > > > > > > > > > > U-Boot SPL is running.
> > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > In your case, both OpenSBI FW_DYNAMIC and U-
> > > > > > > > > > > > > > > > > > > > Boot SPL are linked to
> > > > > > > > > > > > > > > > > > > > address same address 0x0. This means secondary
> > > > > > > > > > > > > > > > > > > > HARTs cannot safely
> > > > > > > > > > > > > > > > > > > > wait while primary HART enters OpenSBI. You
> > > > > > > > > > > > > > > > > > > > should hold secondary HARTs
> > > > > > > > > > > > > > > > > > > > in U-Boot SPL only till OpenSBI FW_DYNAMIC and
> > > > > > > > > > > > > > > > > > > > U-Boot proper are
> > > > > > > > > > > > > > > > > > > > loaded in RAM by primary HART. All your HARTs
> > > > > > > > > > > > > > > > > > > > should jump to OpenSBI
> > > > > > > > > > > > > > > > > > > > at the same time after everything is loaded in
> > > > > > > > > > > > > > > > > > > > RAM.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > I see the issue now.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > The U-Boot SPL is first letting secondary HART
> > > > > > > > > > > > > > > > > > > jump to OpenSBI and primary
> > > > > > > > > > > > > > > > > > > HART jumps to OpenSBI at the end.
> > > > > > > > > > > > > > > > > > > (Refer, jump_to_image_no_args() in
> > > > > > > > > > > > > > > > > > > arch/riscv/lib/spl.c)
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > The real issue is FW_TEXT_START being same as U-
> > > > > > > > > > > > > > > > > > > Boot SPL TEXT_START.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > If possible please change TEXT base for U-Boot
> > > > > > > > > > > > > > > > > > > SPL or OpenSBI. I think
> > > > > > > > > > > > > > > > > > > changing U-Boot SPL TEXT_START would be
> > > > > > > > > > > > > > > > > > > convenient since this series
> > > > > > > > > > > > > > > > > > > is under review. Thoughts ?
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > > > > > > I know it can avoid corrupting issue with
> > > > > > > > > > > > > > > > > > changing  U-Boot SPL
> > > > > > > > > > > > > > > > > > TEXT_START not equal to OpenSBI TEXT base.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I think this issue will be seen on U-Boot SPL running
> > > > > > > > > > > > > > > > > on QEMU as well.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > With the following changes, U-Boot SPL text base
> > > > > > > > > > > > > > > > > > can equal to OpenSBI text base
> > > > > > > > > > > > > > > > > > 1 U-Boot pass main hart information (a2) when
> > > > > > > > > > > > > > > > > > jumping to OpenSBI
> > > > > > > > > > > > > > > > > > 2 OpenSBI pick up $a2 to keep playing as main hart,
> > > > > > > > > > > > > > > > > > other harts go to
> > > > > > > > > > > > > > > > > > _wait_relocate_copy_done
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Overall it's a good suggestion but we cannot use a2
> > > > > > > > > > > > > > > > > register because this
> > > > > > > > > > > > > > > > > will break FW_JUMP and FW_PAYLOAD. Instead, we should
> > > > > > > > > > > > > > > > > pass preferred
> > > > > > > > > > > > > > > > > boot HART id in struct fw_dynamic_info.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Sorry, I meant to say a3.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > I have a patch for this in preferred_boot_hart_v1
> > > > > > > > > > > > > > > > > branch of
> > > > > > > > > > > > > > > > > https://github.com/avpatel/opensbi.git
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Can you try OpenSBI from above branch ?
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > You will have to update the "struct fw_dynamic_info"
> > > > > > > > > > > > > > > > > passed to
> > > > > > > > > > > > > > > > > OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Main hart will pass struct "fw_dynamic_info" to OpenSBI
> > > > > > > > > > > > > > > > by U-Boot SPL.
> > > > > > > > > > > > > > > > But other harts will NOT pass struct "fw_dynamic_info"
> > > > > > > > > > > > > > > > to OpenSBI by U-Boot SPL.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That's wrong in U-Boot SPL.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > All HARTs have to follow FW_DYNAMIC protocol and pass
> > > > > > > > > > > > > > > "struct fw_dynamic_info" pointer in 'a2' register.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > So if U-Boot SPL can pass the main hart information via
> > > > > > > > > > > > > > > > a3, OpenSBI only needs the following change:
> > > > > > > > > > > > > > > > blt zero, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > > > change to
> > > > > > > > > > > > > > > > bne a3, a6, _wait_relocate_copy_done
> > > > > > > > > > > > > > > > before this commit
> > > > > > > > > > > > > > > > 98f4a208995b027662a7b04a25e4fa5df5f3eefe
> > > > > > > > > > > > > > > > firmware: Introduce relocation lottery
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > What about FW_JUMP and FW_PAYLOAD? We have no way of
> > > > > > > > > > > > > > > passing
> > > > > > > > > > > > > > > value in a3 for these firmwares because these are not
> > > > > > > > > > > > > > > booted by U-Boot
> > > > > > > > > > > > > > > SPL.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Also, U-Boot-2019.10 already uses U-Boot SPL support
> > > > > > > > > > > > > > > which does not
> > > > > > > > > > > > > > > pass anything in 'a3' register.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We should definitely use "struct fw_dynamic_info" for
> > > > > > > > > > > > > > > this so that we can
> > > > > > > > > > > > > > > maintain backward compatibility as well.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Please make sure that U-Boot SPL passes "struct
> > > > > > > > > > > > > > > fw_dynamic_info"
> > > > > > > > > > > > > > > pointer in 'a2' register for all HARTs.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > But after commit 98f4a, the main hart is chosen by the
> > > > > > > > > > > > > > > > lottery mechanism.
> > > > > > > > > > > > > > > > Maybe I would prefer to change the U-Boot SPL text base
> > > > > > > > > > > > > > > > so that it does not overlap with the OpenSBI text
> > > > > > > > > > > > > > > > start. :)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Like I mentioned, we have this issue for U-Boot SPL on
> > > > > > > > > > > > > > > QEMU as well. It's
> > > > > > > > > > > > > > > just that most of us did not notice it for U-Boot SPL on
> > > > > > > > > > > > > > > QEMU.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Let's fix this in the right way from start itself.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I double checked spl_invoke_opensbi() and it is doing the
> > > > > > > > > > > > > > right thing by passing the "struct fw_dynamic_info"
> > > > > > > > > > > > > > pointer in the 'a2' register.
> > > > > > > > > > > > > > (See common/spl/spl_opensbi.c)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Not sure why it is not passing the 'a2' register correctly
> > > > > > > > > > > > > > for you?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Yes, you are right. I replied too quickly.
> > > > > > > > > > > > > Other harts do pass struct fw_dynamic_info in a2 to
> > > > > > > > > > > > > OpenSBI.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks for your corrections
> > > > > > > > > > > > 
> > > > > > > > > > > > No problem, I am happy to help.
> > > > > > > > > > > > 
> > > > > > > > > > > > BTW, I tried to play around with U-Boot SPL on QEMU.
> > > > > > > > > > > > 
> > > > > > > > > > > > Maybe below changes can help you...
> > > > > > > > > > > 
> > > > > > > > > > > Thanks for looking into this issue! I successfully tested it on
> > > > > > > > > > > QEMU, I
> > > > > > > > > > > had to add a short delay between sending the IPIs to trigger the
> > > > > > > > > > > problem.
> > > > > > > > > > > 
> > > > > > > > > > > We might still run into problems however. Right now, we are
> > > > > > > > > > > assuming
> > > > > > > > > > > that the main hart is the last one to enter OpenSBI. If this is
> > > > > > > > > > > not the
> > > > > > > > > > > case (some delay when handling the IPI), we will have the same
> > > > > > > > > > > problem
> > > > > > > > > > > again. To fix this we could pass the hart mask, containing all
> > > > > > > > > > > harts
> > > > > > > > > > > that have entered U-Boot, to OpenSBI and wait for all harts to be
> > > > > > > > > > > running in OpenSBI. I am not sure how realistic this scenario is,
> > > > > > > > > > > so
> > > > > > > > > > > this might not be needed.
> > > > > > > > > > 
> > > > > > > > > > I agree that we might still run into this issue if primary HART
> > > > > > > > > > enters
> > > > > > > > > > OpenSBI before secondary HARTs. I think this situation can only
> > > > > > > > > > happen on QEMU where each CPU is a thread running on host but
> > > > > > > > > > very unlikely/impossible on real HW.
> > > > > > > > > > 
> > > > > > > > > > Maybe a delay on primary HART in U-Boot SPL after SMP calls to
> > > > > > > > > > secondary HARTs and before jumping to OpenSBI ?
> > > > > > > > > > 
> > > > > > > > > > Regarding hart_mask in fw_dynamic_info, I think the issue will be
> > > > > > > > > > the size of the hart_mask. It is possible that in future SOC
> > > > > > > > > > vendors come up with SOCs having a huge number of HARTs, or SOCs
> > > > > > > > > > with discontinuous HART IDs, for which a 64-bit hart_mask would
> > > > > > > > > > not be sufficient to cover all HARTs.
> > > > > > > > > > 
> > > > > > > > > > Also, waiting for all HARTs to enter OpenSBI will be one more wait-
> > > > > > > > > > loop
> > > > > > > > > > in fw_base.S which will add to the boot-time as well.
> > > > > > > > > > 
> > > > > > > > > > I still think the root cause of the issue is that TEXT_START of
> > > > > > > > > > U-Boot SPL and OpenSBI FW_DYNAMIC is the same. Maybe we can
> > > > > > > > > > insist that SOC vendors do not use the same TEXT_START?
> > > > > > > > > 
> > > > > > > > > I have tried your boot_hart changes for U-Boot SPL with OpenSBI
> > > > > > > > > (preferred_boot_hart_v2 branch).
> > > > > > > > > It still runs into some boot problems. I tried to find the root
> > > > > > > > > cause, but in vain.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Just wanted to make sure that you have tried this patch.
> > > > > > > > 
> > > > > > > > http://lists.infradead.org/pipermail/opensbi/2019-November/000672.html
> > > > > > > > 
> > > > > > > > If this patch did not work for you, we should investigate why.
> > > > > > > 
> > > > > > > Yes, I tried with this
> > > > > > > commit 831aa3c1ad2546a2b35ddf5b1aa0ce91cdc7fe89
> > > > > > > firmware: Add preferred boot HART field in struct fw_dynamic_info
> > > > > > > 
> > > > > > > It failed randomly yesterday, but this morning I tried several times
> > > > > > > and it passed.
> > > > > > > I will keep trying.
> > > > > > > 
> > > > > > 
> > > > > > I have figured out one failure case: it occurs when the main hart of
> > > > > > U-Boot SPL is not the last hart to enter OpenSBI.
> > > > > > 
> > > > > 
> > > > > Can you try this branch [1]? It includes a quick implementation of the
> > > > > changes I mentioned yesterday, where the main hart waits until all
> > > > > harts have received the IPI.
> > > > > 
> > > > > [1]:
> > > > > https://github.com/lukasauer/u-boot/commits/riscv-opensbi-boot-hart
> > > > 
> > > > I have tried this patch, but it seems it cannot guarantee that the
> > > > main hart is the last hart to leave U-Boot SPL.
> > > > Even after the main hart has checked that all harts have received the
> > > > IPI, it can still arrive in OpenSBI before the other harts.
> > > > 
> > > 
> > > Thanks for testing! Can you try again with the same branch? I added
> > > another patch so that clearing the IPI is the very last thing before
> > > jumping to the SMP function.
> > > If that does not help, we'll have to add a delay in
> > > spl_invoke_opensbi() to delay the main hart.
> > 
> > Thanks for your patch, it indeed almost solves the problem.
> > Booting always passes verification under free running; I tried many
> > times and could not hit the failure case.
> > 

Thanks for testing the changes again, Rick. I will try to clean them up
and submit them soon.

> > But if I do something with GDB, e.g. set a breakpoint at 0x6c2
> > (c.jalr s2) in handle_ipi(), then delete the breakpoint and free run
> > after it is hit, it hits the failure case as below:
> 
> The current booting mechanism where all HARTs jump to each
> booting stage (including Linux) at the same time is very fragile.
> 
> Attaching a debugger at boot-time will certainly break current way
> of booting all HARTs because we are holding a particular HART in
> debug state. I am not sure if we can handle the debugger case in
> software but in real-world deployments debugger won't be present
> so maybe we can ignore this case.
> 
> The SBI v0.2 HSM extension will make things better. It would be
> even better if HW designers provide explicit HW mechanism to
> power-on/off HARTs at run-time.
> 

I agree with you, Anup. The debugger should not be attached in
real-world scenarios. Even if it is, it is most likely used to debug
something in U-Boot proper or SPL and not the part where we are jumping
to OpenSBI, so we should not typically run into any issues here. In
addition, the recommended way of using SPL should be with separate text
bases for U-Boot SPL and OpenSBI. In this setup we don't run into this
problem. All in all, I think the current solution is robust enough.
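
To make the separate-text-base recommendation concrete, here is a
minimal sketch of an overlap check between two load regions. The struct
and function names are hypothetical and exist only for illustration;
nothing here is taken from U-Boot or OpenSBI.

```c
#include <stdbool.h>
#include <stdint.h>

/* A half-open memory region [base, base + size). Hypothetical helper type. */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Return true if the two regions share at least one byte. */
static bool regions_overlap(struct region a, struct region b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}
```

If the SPL region and the OpenSBI region do not overlap, a hart that is
still executing SPL code cannot have that code overwritten by another
hart relocating OpenSBI.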

I will also make sure to add a comment on why it is important that the
main hart enters OpenSBI last. This way the problem is documented and
people looking at the code will be aware of it.
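
As a rough illustration of the "main hart waits until all harts have
received the IPI" idea, the sketch below tracks acknowledgements in a
64-bit mask. It is not the code from my branch: the names, the mask
width and the use of C11 atomics are assumptions made for the example.
As discussed earlier in the thread, a 64-bit mask may not cover every
future SoC, and this check only narrows the window; it does not by
itself guarantee that the main hart enters OpenSBI last.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HARTS 64	/* hypothetical limit: one bit per hart */

/* Bit n is set once hart n has acknowledged the IPI. */
static _Atomic uint64_t acked_harts;

/* Called by a secondary hart right before it jumps to the SMP function. */
static void hart_ack_ipi(unsigned int hart)
{
	assert(hart < MAX_HARTS);
	atomic_fetch_or(&acked_harts, UINT64_C(1) << hart);
}

/*
 * Called by the main hart: it should only jump to OpenSBI once every
 * secondary hart in 'secondary_mask' has acknowledged the IPI.
 */
static bool main_hart_may_jump(uint64_t secondary_mask)
{
	return (atomic_load(&acked_harts) & secondary_mask) == secondary_mask;
}
```

On real hardware the main hart would poll main_hart_may_jump() in a
loop, possibly followed by a short delay, before calling into OpenSBI.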

Thanks,
Lukas

Patch

diff --git a/arch/riscv/lib/andes_plic.c b/arch/riscv/lib/andes_plic.c
index 28568e4..42394b9 100644
--- a/arch/riscv/lib/andes_plic.c
+++ b/arch/riscv/lib/andes_plic.c
@@ -19,7 +19,7 @@ 
 #include <cpu.h>
 
 /* pending register */
-#define PENDING_REG(base, hart)	((ulong)(base) + 0x1000 + (hart) * 8)
+#define PENDING_REG(base, hart)	((ulong)(base) + 0x1000 + ((hart) / 4) * 4)
 /* enable register */
 #define ENABLE_REG(base, hart)	((ulong)(base) + 0x2000 + (hart) * 0x80)
 /* claim register */
@@ -46,7 +46,7 @@  static int init_plic(void);
 
 static int enable_ipi(int hart)
 {
-	int en;
+	unsigned int en;
 
 	en = ENABLE_HART_IPI >> hart;
 	writel(en, (void __iomem *)ENABLE_REG(gd->arch.plic, hart));
@@ -94,10 +94,13 @@  static int init_plic(void)
 
 int riscv_send_ipi(int hart)
 {
+	unsigned int ipi;
+
 	PLIC_BASE_GET();
 
-	writel(SEND_IPI_TO_HART(hart),
-	       (void __iomem *)PENDING_REG(gd->arch.plic, gd->arch.boot_hart));
+	ipi = (SEND_IPI_TO_HART(hart) << (8 * gd->arch.boot_hart));
+	writel(ipi, (void __iomem *)PENDING_REG(gd->arch.plic,
+				gd->arch.boot_hart));
 
 	return 0;
 }
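
For readers who want to check the register arithmetic in this patch, the
following standalone sketch reproduces the old and new PENDING_REG
offsets and the value written by riscv_send_ipi(). The body of
SEND_IPI_TO_HART is not shown in the hunks above, so the definition used
here is an assumption, as are the helper function names. The shift is
only meaningful for boot hart IDs 0 to 3, since one 32-bit word holds
four 8-bit fields.

```c
#include <stdint.h>

/* Assumed definition; the macro body is not shown in the hunks above. */
#define SEND_IPI_TO_HART(hart)	(0x80 >> (hart))

/* Offset of the pending register from the PLIC base, before the patch. */
static uint32_t pending_offset_old(unsigned int hart)
{
	return 0x1000 + hart * 8;
}

/* After the patch: one 32-bit word per group of four harts. */
static uint32_t pending_offset_new(unsigned int hart)
{
	return 0x1000 + (hart / 4) * 4;
}

/* Value written after the patch; valid for boot_hart in 0..3 only. */
static uint32_t ipi_value_new(unsigned int target, unsigned int boot_hart)
{
	return (uint32_t)SEND_IPI_TO_HART(target) << (8 * boot_hart);
}
```

With boot hart 0 the old and new code write the same value to the same
offset, which matches the commit message's remark that the old code only
worked because hart 0 happened to be the main hart.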