diff mbox series

[1/1] riscv: don't jump to 0x0 in handle_ipi()

Message ID 20200811035648.3284-1-xypron.glpk@gmx.de
State Rejected
Delegated to: Andes
Headers show
Series [1/1] riscv: don't jump to 0x0 in handle_ipi() | expand

Commit Message

Heinrich Schuchardt Aug. 11, 2020, 3:56 a.m. UTC
At least on the Kendryte K210:

gd->arch.available_harts= 0x0000000000000003
gd->arch.ipi[0].addr= 0x0000000000000000
gd->arch.ipi[0].arg0= 0x0000000000000000
gd->arch.ipi[0].arg1= 0x0000000000000000
gd->arch.ipi[1].addr= 0x0000000000000000
gd->arch.ipi[1].arg0= 0x0000000000000000
gd->arch.ipi[1].arg1= 0x0000000000000000

We should not jump to 0x0 to handle an interrupt.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
---
 arch/riscv/lib/smp.c | 2 ++
 1 file changed, 2 insertions(+)

--
2.28.0

Comments

Rick Chen Aug. 11, 2020, 6:20 a.m. UTC | #1
Hi Heinrich

> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
> Sent: Tuesday, August 11, 2020 11:57 AM
> To: Rick Jian-Zhi Chen(陳建志)
> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>
> At least on the Kendryte K210:
>
> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
> gd->arch.ipi[1].arg1= 0x0000000000000000
>
> We should not jump to 0x0 to handle an interrupt.

Can you explain why K210 be affected by it ?

>
> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> ---
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>                 return;
>         }
>
> +       if (!smp_function)
> +               return;
>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }

I remember Sean add this check in
[v10,14/21] riscv: Clean up IPI initialization code
. And I ask him to remove.
https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/

Thanks,
Rick

>
> --
> 2.28.0
Heinrich Schuchardt Aug. 11, 2020, 7:50 a.m. UTC | #2
On 11.08.20 08:20, Rick Chen wrote:
> Hi Heinrich
>
>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>> Sent: Tuesday, August 11, 2020 11:57 AM
>> To: Rick Jian-Zhi Chen(陳建志)
>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>
>> At least on the Kendryte K210:
>>
>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>
>> We should not jump to 0x0 to handle an interrupt.
>
> Can you explain why K210 be affected by it ?

I have been running U-Boot on the MaixDuino.

Without this patch secondary_hart_loop() is reached only once. With the
patch it is reached several thousand times.

I would not expect NULL to contain any code that should be executed by
the secondary hart. See doc/board/sipeed/maix.rst:

Address    Size      Description
========== ========= ===========
0x00000000 0x1000    debug
0x00001000 0x1000    rom
0x02000000 0xC000    clint

>
>>
>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> ---
>>  arch/riscv/lib/smp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>> --- a/arch/riscv/lib/smp.c
>> +++ b/arch/riscv/lib/smp.c
>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>                 return;
>>         }
>>
>> +       if (!smp_function)
>> +               return;
>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>
> I remember Sean add this check in
> [v10,14/21] riscv: Clean up IPI initialization code
> . And I ask him to remove.
> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/

Your comment was: "Please remove the sanity check. If zero address is
the intended jump point, then system will work abnormal."

The only place where gd->arch.ipi[hart].addr is set is:

arch/riscv/lib/smp.c:53 send_ipi_many():
gd->arch.ipi[reg].addr = ipi->addr;

send_ipi_many() is only called in smp_call_function().

So the line

smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);

can only work if smp_function() has been called before this point at any
time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

Why do we call handle_ipi() on the secondary hart at all?

Where do you expect it to jump if U-Boot is the first binary loaded?

Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
address to secondary_hart_relocate(), do we want the secondary hart to
execute it again and again?

Best regards

Heinrich
Sean Anderson Aug. 11, 2020, 10:32 a.m. UTC | #3
On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
> On 11.08.20 08:20, Rick Chen wrote:
>> Hi Heinrich
>>
>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>> To: Rick Jian-Zhi Chen(陳建志)
>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>
>>> At least on the Kendryte K210:
>>>
>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>
>>> We should not jump to 0x0 to handle an interrupt.
>>
>> Can you explain why K210 be affected by it ?
> 
> I have been running U-Boot on the MaixDuino.
> 
> Without this patch secondary_hart_loop() is reached only once. With the
> patch it is reached several thousand times.

Hm, interesting. To me, this is a symptom of something else going
terribly wrong. I originally had this check in place so that it would
be easier to detect these sorts of errors. I don't think this is the
correct fix, however. We should really try and find the root cause of
the bug.

> I would not expect NULL to contain any code that should be executed by
> the secondary hart. See doc/board/sipeed/maix.rst:
> 
> Address    Size      Description
> ========== ========= ===========
> 0x00000000 0x1000    debug
> 0x00001000 0x1000    rom
> 0x02000000 0xC000    clint
> 
>>
>>>
>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>> ---
>>>  arch/riscv/lib/smp.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>                 return;
>>>         }
>>>
>>> +       if (!smp_function)
>>> +               return;
>>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>
>> I remember Sean add this check in
>> [v10,14/21] riscv: Clean up IPI initialization code
>> . And I ask him to remove.
>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
> 
> Your comment was: "Please remove the sanity check. If zero address is
> the intended jump point, then system will work abnormal."
> 
> The only place where gd->arch.ipi[hart].addr is set is:
> 
> arch/riscv/lib/smp.c:53 send_ipi_many():
> gd->arch.ipi[reg].addr = ipi->addr;
> 
> send_ipi_many() is only called in smp_call_function().
> 
> So the line
> 
> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
> 
> can only work if smp_function() has been called before this point at any
> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().

Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
in [2] as part of a larger revert. The actual revert-of-revert is in
[3], though it really should be split out into its own patch. The
original commit making these changes is [4].

Note that the situation before [4] was that the IPI got initialized by
the first hart to call any IPI function. If that hart was not the boot
hart, Bad Things started to happen (e.g. race conditions, memory
corruption, etc). In that patch, I moved the initialization to its own
function so we would not have any race conditions and instead have
(easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
course, when everything is working properly, we should get neither of
these.

[1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
[2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
[3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
[4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/

> Why do we call handle_ipi() on the secondary hart at all?

Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
you're getting at.

> Where do you expect it to jump if U-Boot is the first binary loaded?
> 
> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
> address to secondary_hart_relocate(), do we want the secondary hart to
> execute it again and again?

--Sean
Heinrich Schuchardt Aug. 18, 2020, 9 a.m. UTC | #4
On 11.08.20 12:32, Sean Anderson wrote:
> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 08:20, Rick Chen wrote:
>>> Hi Heinrich
>>>
>>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>
>>>> At least on the Kendryte K210:
>>>>
>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>
>>>> We should not jump to 0x0 to handle an interrupt.
>>>
>>> Can you explain why K210 be affected by it ?
>>
>> I have been running U-Boot on the MaixDuino.
>>
>> Without this patch secondary_hart_loop() is reached only once. With the
>> patch it is reached several thousand times.
>
> Hm, interesting. To me, this is a symptom of something else going
> terribly wrong. I originally had this check in place so that it would
> be easier to detect these sorts of errors. I don't think this is the
> correct fix, however. We should really try and find the root cause of
> the bug.
>
>> I would not expect NULL to contain any code that should be executed by
>> the secondary hart. See doc/board/sipeed/maix.rst:
>>
>> Address    Size      Description
>> ========== ========= ===========
>> 0x00000000 0x1000    debug
>> 0x00001000 0x1000    rom
>> 0x02000000 0xC000    clint
>>
>>>
>>>>
>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>> ---
>>>>  arch/riscv/lib/smp.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>> --- a/arch/riscv/lib/smp.c
>>>> +++ b/arch/riscv/lib/smp.c
>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>                 return;
>>>>         }
>>>>
>>>> +       if (!smp_function)
>>>> +               return;
>>>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>
>>> I remember Sean add this check in
>>> [v10,14/21] riscv: Clean up IPI initialization code
>>> . And I ask him to remove.
>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>
>> Your comment was: "Please remove the sanity check. If zero address is
>> the intended jump point, then system will work abnormal."
>>
>> The only place where gd->arch.ipi[hart].addr is set is:
>>
>> arch/riscv/lib/smp.c:53 send_ipi_many():
>> gd->arch.ipi[reg].addr = ipi->addr;
>>
>> send_ipi_many() is only called in smp_call_function().
>>
>> So the line
>>
>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>
>> can only work if smp_function() has been called before this point at any
>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>
> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
> in [2] as part of a larger revert. The actual revert-of-revert is in
> [3], though it really should be split out into its own patch. The
> original commit making these changes is [4].

Patch series [1] makes not difference. You still have

gd->arch.ipi[0].addr= 0x0000000000000000
gd->arch.ipi[1].addr= 0x0000000000000000

and the secondary hart jumps to NULL and never returns.

>
> Note that the situation before [4] was that the IPI got initialized by
> the first hart to call any IPI function. If that hart was not the boot
> hart, Bad Things started to happen (e.g. race conditions, memory
> corruption, etc). In that patch, I moved the initialization to its own
> function so we would not have any race conditions and instead have
> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
> course, when everything is working properly, we should get neither of
> these.
>
> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>
>> Why do we call handle_ipi() on the secondary hart at all?
>
> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
> you're getting at.
>

I cannot see anything to be done by a secondary hart in case of a
software interrupt.

Best regards

Heinrich

>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>
>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>> address to secondary_hart_relocate(), do we want the secondary hart to
>> execute it again and again?
>
> --Sean
>
Bin Meng Aug. 18, 2020, 9:14 a.m. UTC | #5
Hi Heinrich,

On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>
> At least on the Kendryte K210:
>
> gd->arch.available_harts= 0x0000000000000003
> gd->arch.ipi[0].addr= 0x0000000000000000
> gd->arch.ipi[0].arg0= 0x0000000000000000
> gd->arch.ipi[0].arg1= 0x0000000000000000
> gd->arch.ipi[1].addr= 0x0000000000000000
> gd->arch.ipi[1].arg0= 0x0000000000000000
> gd->arch.ipi[1].arg1= 0x0000000000000000
>
> We should not jump to 0x0 to handle an interrupt.
>
> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> ---
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> index ac22136314..d725fa32e8 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>                 return;
>         }
>
> +       if (!smp_function)
> +               return;

Can you trace down to why gd->arch.ipi[0].addr is set to zero? I
looked at all places that call smp_call_function() and
gd->arch.ipi[0].addr should not be zero, unless something is seriously
wrong on your board.

Regards,
Bin
Sean Anderson Aug. 18, 2020, 10:26 a.m. UTC | #6
On 8/18/20 5:00 AM, Heinrich Schuchardt wrote:
> On 11.08.20 12:32, Sean Anderson wrote:
>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 08:20, Rick Chen wrote:
>>>> Hi Heinrich
>>>>
>>>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>
>>>>> At least on the Kendryte K210:
>>>>>
>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>
>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>
>>>> Can you explain why K210 be affected by it ?
>>>
>>> I have been running U-Boot on the MaixDuino.
>>>
>>> Without this patch secondary_hart_loop() is reached only once. With the
>>> patch it is reached several thousand times.
>>
>> Hm, interesting. To me, this is a symptom of something else going
>> terribly wrong. I originally had this check in place so that it would
>> be easier to detect these sorts of errors. I don't think this is the
>> correct fix, however. We should really try and find the root cause of
>> the bug.
>>
>>> I would not expect NULL to contain any code that should be executed by
>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>
>>> Address    Size      Description
>>> ========== ========= ===========
>>> 0x00000000 0x1000    debug
>>> 0x00001000 0x1000    rom
>>> 0x02000000 0xC000    clint
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>> ---
>>>>>  arch/riscv/lib/smp.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>>> --- a/arch/riscv/lib/smp.c
>>>>> +++ b/arch/riscv/lib/smp.c
>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>                 return;
>>>>>         }
>>>>>
>>>>> +       if (!smp_function)
>>>>> +               return;
>>>>>         smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>>
>>>> I remember Sean add this check in
>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>> . And I ask him to remove.
>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>>
>>> Your comment was: "Please remove the sanity check. If zero address is
>>> the intended jump point, then system will work abnormal."
>>>
>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>
>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>
>>> send_ipi_many() is only called in smp_call_function().
>>>
>>> So the line
>>>
>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>
>>> can only work if smp_function() has been called before this point at any
>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>
>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>> in [2] as part of a larger revert. The actual revert-of-revert is in
>> [3], though it really should be split out into its own patch. The
>> original commit making these changes is [4].
> 
> Patch series [1] makes not difference. You still have
> 
> gd->arch.ipi[0].addr= 0x0000000000000000
> gd->arch.ipi[1].addr= 0x0000000000000000
> 
> and the secondary hart jumps to NULL and never returns.

Hm, then there is a code path where an IPI gets triggered by something
other than opensbi.

> 
>>
>> Note that the situation before [4] was that the IPI got initialized by
>> the first hart to call any IPI function. If that hart was not the boot
>> hart, Bad Things started to happen (e.g. race conditions, memory
>> corruption, etc). In that patch, I moved the initialization to its own
>> function so we would not have any race conditions and instead have
>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>> course, when everything is working properly, we should get neither of
>> these.
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
>> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
>> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>>
>>> Why do we call handle_ipi() on the secondary hart at all?
>>
>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>> you're getting at.
>>
> 
> I cannot see anything to be done by a secondary hart in case of a
> software interrupt.

Isn't it supposed to run the function which the boot hart sent to it?

--Sean
Heinrich Schuchardt Aug. 18, 2020, 11:23 a.m. UTC | #7
On 18.08.20 11:14, Bin Meng wrote:
> Hi Heinrich,
>
> On Tue, Aug 11, 2020 at 11:57 AM Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>>
>> At least on the Kendryte K210:
>>
>> gd->arch.available_harts= 0x0000000000000003
>> gd->arch.ipi[0].addr= 0x0000000000000000
>> gd->arch.ipi[0].arg0= 0x0000000000000000
>> gd->arch.ipi[0].arg1= 0x0000000000000000
>> gd->arch.ipi[1].addr= 0x0000000000000000
>> gd->arch.ipi[1].arg0= 0x0000000000000000
>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>
>> We should not jump to 0x0 to handle an interrupt.
>>
>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> ---
>>  arch/riscv/lib/smp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>> index ac22136314..d725fa32e8 100644
>> --- a/arch/riscv/lib/smp.c
>> +++ b/arch/riscv/lib/smp.c
>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>                 return;
>>         }
>>
>> +       if (!smp_function)
>> +               return;
>
> Can you trace down to why gd->arch.ipi[0].addr is set to zero? I
> looked at all places that call smp_call_function() and
> gd->arch.ipi[0].addr should not be zero, unless something is seriously
> wrong on your board.

arch/riscv/cpu/start.S:122:     jal     board_init_f_init_reserve

board_init_f_init_reserve():
memset(gd_ptr, '\0', sizeof(*gd));

Which code did you expect to set another value on the Kendryte K210?

Best regards

Heinrich

>
> Regards,
> Bin
>
Heinrich Schuchardt Sept. 1, 2020, 10:51 a.m. UTC | #8
On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
> On 11.08.20 12:32, Sean Anderson wrote:
>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 08:20, Rick Chen wrote:
>>>> Hi Heinrich
>>>>
>>>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>
>>>>> At least on the Kendryte K210:
>>>>>
>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>
>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>
>>>> Can you explain why K210 be affected by it ?
>>>
>>> I have been running U-Boot on the MaixDuino.
>>>
>>> Without this patch secondary_hart_loop() is reached only once. With the
>>> patch it is reached several thousand times.
>>
>> Hm, interesting. To me, this is a symptom of something else going
>> terribly wrong. I originally had this check in place so that it would
>> be easier to detect these sorts of errors. I don't think this is the
>> correct fix, however. We should really try and find the root cause of
>> the bug.
>>
>>> I would not expect NULL to contain any code that should be executed by
>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>
>>> Address    Size      Description
>>> ========== ========= ===========
>>> 0x00000000 0x1000    debug
>>> 0x00001000 0x1000    rom
>>> 0x02000000 0xC000    clint
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>> ---
>>>>>   arch/riscv/lib/smp.c | 2 ++
>>>>>   1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>>> --- a/arch/riscv/lib/smp.c
>>>>> +++ b/arch/riscv/lib/smp.c
>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>                  return;
>>>>>          }
>>>>>
>>>>> +       if (!smp_function)
>>>>> +               return;
>>>>>          smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>>
>>>> I remember Sean add this check in
>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>> . And I ask him to remove.
>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>>
>>> Your comment was: "Please remove the sanity check. If zero address is
>>> the intended jump point, then system will work abnormal."
>>>
>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>
>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>
>>> send_ipi_many() is only called in smp_call_function().
>>>
>>> So the line
>>>
>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>
>>> can only work if smp_function() has been called before this point at any
>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>
>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>> in [2] as part of a larger revert. The actual revert-of-revert is in
>> [3], though it really should be split out into its own patch. The
>> original commit making these changes is [4].
>
> Patch series [1] makes not difference. You still have
>
> gd->arch.ipi[0].addr= 0x0000000000000000
> gd->arch.ipi[1].addr= 0x0000000000000000
>
> and the secondary hart jumps to NULL and never returns.
>
>>
>> Note that the situation before [4] was that the IPI got initialized by
>> the first hart to call any IPI function. If that hart was not the boot
>> hart, Bad Things started to happen (e.g. race conditions, memory
>> corruption, etc). In that patch, I moved the initialization to its own
>> function so we would not have any race conditions and instead have
>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>> course, when everything is working properly, we should get neither of
>> these.
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
>> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
>> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>>
>>> Why do we call handle_ipi() on the secondary hart at all?
>>
>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>> you're getting at.
>>
>
> I cannot see anything to be done by a secondary hart in case of a
> software interrupt.
>
> Best regards
>
> Heinrich

When resetting the sipeed_maix_bitm_defconfig without the patch I see a
crash:

=> reset
resetting ...
Unhandled exception: Illegal instruction
EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
### ERROR ### Please RESET the board ###

Objdump shows that it is related to the secondary_hart_loop:

         j       secondary_hart_loop
     8000021a:   b7ed                    j       80000204
<secondary_hart_loop>

This crash is avoided with this patch.

How do we get forward?

Best regards

Heinrich

>
>>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>>
>>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>>> address to secondary_hart_relocate(), do we want the secondary hart to
>>> execute it again and again?
>>
>> --Sean
>>
>
Sean Anderson Sept. 1, 2020, 11:14 a.m. UTC | #9
On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
>> On 11.08.20 12:32, Sean Anderson wrote:
>>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>>> On 11.08.20 08:20, Rick Chen wrote:
>>>>> Hi Heinrich
>>>>>
>>>>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>>
>>>>>> At least on the Kendryte K210:
>>>>>>
>>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>>
>>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>>
>>>>> Can you explain why K210 be affected by it ?
>>>>
>>>> I have been running U-Boot on the MaixDuino.
>>>>
>>>> Without this patch secondary_hart_loop() is reached only once. With the
>>>> patch it is reached several thousand times.
>>>
>>> Hm, interesting. To me, this is a symptom of something else going
>>> terribly wrong. I originally had this check in place so that it would
>>> be easier to detect these sorts of errors. I don't think this is the
>>> correct fix, however. We should really try and find the root cause of
>>> the bug.
>>>
>>>> I would not expect NULL to contain any code that should be executed by
>>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>>
>>>> Address    Size      Description
>>>> ========== ========= ===========
>>>> 0x00000000 0x1000    debug
>>>> 0x00001000 0x1000    rom
>>>> 0x02000000 0xC000    clint
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>> ---
>>>>>>   arch/riscv/lib/smp.c | 2 ++
>>>>>>   1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>>>> --- a/arch/riscv/lib/smp.c
>>>>>> +++ b/arch/riscv/lib/smp.c
>>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>>                  return;
>>>>>>          }
>>>>>>
>>>>>> +       if (!smp_function)
>>>>>> +               return;
>>>>>>          smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>>>
>>>>> I remember Sean add this check in
>>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>>> . And I ask him to remove.
>>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>>>
>>>> Your comment was: "Please remove the sanity check. If zero address is
>>>> the intended jump point, then system will work abnormal."
>>>>
>>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>>
>>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>>
>>>> send_ipi_many() is only called in smp_call_function().
>>>>
>>>> So the line
>>>>
>>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>>
>>>> can only work if smp_function() has been called before this point at any
>>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>>
>>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>>> in [2] as part of a larger revert. The actual revert-of-revert is in
>>> [3], though it really should be split out into its own patch. The
>>> original commit making these changes is [4].
>>
>> Patch series [1] makes not difference. You still have
>>
>> gd->arch.ipi[0].addr= 0x0000000000000000
>> gd->arch.ipi[1].addr= 0x0000000000000000
>>
>> and the secondary hart jumps to NULL and never returns.
>>
>>>
>>> Note that the situation before [4] was that the IPI got initialized by
>>> the first hart to call any IPI function. If that hart was not the boot
>>> hart, Bad Things started to happen (e.g. race conditions, memory
>>> corruption, etc). In that patch, I moved the initialization to its own
>>> function so we would not have any race conditions and instead have
>>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>>> course, when everything is working properly, we should get neither of
>>> these.
>>>
>>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>>> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
>>> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
>>> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>>>
>>>> Why do we call handle_ipi() on the secondary hart at all?
>>>
>>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>>> you're getting at.
>>>
>>
>> I cannot see anything to be done by a secondary hart in case of a
>> software interrupt.
>>
>> Best regards
>>
>> Heinrich
> 
> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
> crash:
> 
> => reset
> resetting ...
> Unhandled exception: Illegal instruction
> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
> ### ERROR ### Please RESET the board ###

Hm, interesting. I don't see that error. What commit are you testing with?

> 
> Objdump shows that it is related to the secondary_hart_loop:
> 
>         j       secondary_hart_loop
>     8000021a:   b7ed                    j       80000204
> <secondary_hart_loop>
> 
> This crash is avoided with this patch.
> 
> How do we get forward?
> 
> Best regards
> 
> Heinrich
> 
>>
>>>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>>>
>>>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>>>> address to secondary_hart_relocate(), do we want the secondary hart to
>>>> execute it again and again?
>>>
>>> --Sean
>>>
>>
>
Heinrich Schuchardt Sept. 1, 2020, 11:23 a.m. UTC | #10
On 01.09.20 13:14, Sean Anderson wrote:
> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
>> On 8/18/20 11:00 AM, Heinrich Schuchardt wrote:
>>> On 11.08.20 12:32, Sean Anderson wrote:
>>>> On 8/11/20 3:50 AM, Heinrich Schuchardt wrote:
>>>>> On 11.08.20 08:20, Rick Chen wrote:
>>>>>> Hi Heinrich
>>>>>>
>>>>>>> From: Heinrich Schuchardt [mailto:xypron.glpk@gmx.de]
>>>>>>> Sent: Tuesday, August 11, 2020 11:57 AM
>>>>>>> To: Rick Jian-Zhi Chen(陳建志)
>>>>>>> Cc: Sean Anderson; Lukas Auer; Simon Glass; Anup Patel; Daniel Schwierzeck; u-boot@lists.denx.de; Heinrich Schuchardt
>>>>>>> Subject: [PATCH 1/1] riscv: don't jump to 0x0 in handle_ipi()
>>>>>>>
>>>>>>> At least on the Kendryte K210:
>>>>>>>
>>>>>>> gd->arch.available_harts= 0x0000000000000003 arch.ipi[0].addr=
>>>>>>> gd->0x0000000000000000 arch.ipi[0].arg0= 0x0000000000000000
>>>>>>> gd->arch.ipi[0].arg1= 0x0000000000000000 arch.ipi[1].addr=
>>>>>>> gd->0x0000000000000000 arch.ipi[1].arg0= 0x0000000000000000
>>>>>>> gd->arch.ipi[1].arg1= 0x0000000000000000
>>>>>>>
>>>>>>> We should not jump to 0x0 to handle an interrupt.
>>>>>>
>>>>>> Can you explain why K210 be affected by it ?
>>>>>
>>>>> I have been running U-Boot on the MaixDuino.
>>>>>
>>>>> Without this patch secondary_hart_loop() is reached only once. With the
>>>>> patch it is reached several thousand times.
>>>>
>>>> Hm, interesting. To me, this is a symptom of something else going
>>>> terribly wrong. I originally had this check in place so that it would
>>>> be easier to detect these sorts of errors. I don't think this is the
>>>> correct fix, however. We should really try and find the root cause of
>>>> the bug.
>>>>
>>>>> I would not expect NULL to contain any code that should be executed by
>>>>> the secondary hart. See doc/board/sipeed/maix.rst:
>>>>>
>>>>> Address    Size      Description
>>>>> ========== ========= ===========
>>>>> 0x00000000 0x1000    debug
>>>>> 0x00001000 0x1000    rom
>>>>> 0x02000000 0xC000    clint
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>> ---
>>>>>>>   arch/riscv/lib/smp.c | 2 ++
>>>>>>>   1 file changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c index ac22136314..d725fa32e8 100644
>>>>>>> --- a/arch/riscv/lib/smp.c
>>>>>>> +++ b/arch/riscv/lib/smp.c
>>>>>>> @@ -96,6 +96,8 @@ void handle_ipi(ulong hart)
>>>>>>>                  return;
>>>>>>>          }
>>>>>>>
>>>>>>> +       if (!smp_function)
>>>>>>> +               return;
>>>>>>>          smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);  }
>>>>>>
>>>>>> I remember Sean add this check in
>>>>>> [v10,14/21] riscv: Clean up IPI initialization code
>>>>>> . And I ask him to remove.
>>>>>> https://patchwork.ozlabs.org/project/uboot/patch/20200503024637.327733-15-seanga2@gmail.com/
>>>>>
>>>>> Your comment was: "Please remove the sanity check. If zero address is
>>>>> the intended jump point, then system will work abnormal."
>>>>>
>>>>> The only place where gd->arch.ipi[hart].addr is set is:
>>>>>
>>>>> arch/riscv/lib/smp.c:53 send_ipi_many():
>>>>> gd->arch.ipi[reg].addr = ipi->addr;
>>>>>
>>>>> send_ipi_many() is only called in smp_call_function().
>>>>>
>>>>> So the line
>>>>>
>>>>> smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
>>>>>
>>>>> can only work if smp_function() has been called before this point at any
>>>>> time. The start code only calls it in spl_secondary_hart_stack_gd_setup().
>>>>
>>>> Can you retry with [1]? I think Bin Meng removed a call to sbi_init_ipi
>>>> in [2] as part of a larger revert. The actual revert-of-revert is in
>>>> [3], though it really should be split out into its own patch. The
>>>> original commit making these changes is [4].
>>>
>>> Patch series [1] makes not difference. You still have
>>>
>>> gd->arch.ipi[0].addr= 0x0000000000000000
>>> gd->arch.ipi[1].addr= 0x0000000000000000
>>>
>>> and the secondary hart jumps to NULL and never returns.
>>>
>>>>
>>>> Note that the situation before [4] was that the IPI got initialized by
>>>> the first hart to call any IPI function. If that hart was not the boot
>>>> hart, Bad Things started to happen (e.g. race conditions, memory
>>>> corruption, etc). In that patch, I moved the initialization to its own
>>>> function so we would not have any race conditions and instead have
>>>> (easier-to-debug imo) calls to handle_ipi with bogus arguments. Of
>>>> course, when everything is working properly, we should get neither of
>>>> these.
>>>>
>>>> [1] https://patchwork.ozlabs.org/project/uboot/list/?series=193047
>>>> [2] https://patchwork.ozlabs.org/project/uboot/patch/1595225827-23674-1-git-send-email-bmeng.cn@gmail.com/
>>>> [3] https://patchwork.ozlabs.org/project/uboot/patch/20200729095636.1077054-5-seanga2@gmail.com/
>>>> [4] https://patchwork.ozlabs.org/project/uboot/patch/20200521161503.384823-15-seanga2@gmail.com/
>>>>
>>>>> Why do we call handle_ipi() on the secondary hart at all?
>>>>
>>>> Presumably to handle the IPI it got sent? Sorry, I'm confused as to what
>>>> you're getting at.
>>>>
>>>
>>> I cannot see anything to be done by a secondary hart in case of a
>>> software interrupt.
>>>
>>> Best regards
>>>
>>> Heinrich
>>
>> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
>> crash:
>>
>> => reset
>> resetting ...
>> Unhandled exception: Illegal instruction
>> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
>> ### ERROR ### Please RESET the board ###
>
> Hm, interesting. I don't see that error. What commit are you testing with?

origin/master 23e333a5c083a000d0cabc

sipeed_maix_bitm_defconfig plus

    CONFIG_DEBUG_UART=y
    CONFIG_DEBUG_UART_SIFIVE=y
    CONFIG_DEBUG_UART_BASE=0x38000000
    CONFIG_DEBUG_UART_CLOCK=390000000

on a Maixduino.

Best regards

Heinrich
>
>>
>> Objdump shows that it is related to the secondary_hart_loop:
>>
>>         j       secondary_hart_loop
>>     8000021a:   b7ed                    j       80000204
>> <secondary_hart_loop>
>>
>> This crash is avoided with this patch.
>>
>> How do we get forward?
>>
>> Best regards
>>
>> Heinrich
>>
>>>
>>>>> Where do you expect it to jump if U-Boot is the first binary loaded?
>>>>>
>>>>> Even if spl_secondary_hart_stack_gd_setup() has initialized the jump
>>>>> address to secondary_hart_relocate(), do we want the secondary hart to
>>>>> execute it again and again?
>>>>
>>>> --Sean
>>>>
>>>
>>
>
>
Sean Anderson Sept. 1, 2020, 2:38 p.m. UTC | #11
On 9/1/20 7:23 AM, Heinrich Schuchardt wrote:
> On 01.09.20 13:14, Sean Anderson wrote:
>> On 9/1/20 6:51 AM, Heinrich Schuchardt wrote:
>>> When resetting the sipeed_maix_bitm_defconfig without the patch I see a
>>> crash:
>>>
>>> => reset
>>> resetting ...
>>> Unhandled exception: Illegal instruction
>>> EPC: 0000000000000000 RA: 000000008000021a TVAL: 00000002d947c509
>>> ### ERROR ### Please RESET the board ###
>>
>> Hm, interesting. I don't see that error. What commit are you testing with?
> 
> origin/master 23e333a5c083a000d0cabc
> 
> sipeed_maix_bitm_defconfig plus
> 
>     CONFIG_DEBUG_UART=y
>     CONFIG_DEBUG_UART_SIFIVE=y
>     CONFIG_DEBUG_UART_BASE=0x38000000
>     CONFIG_DEBUG_UART_CLOCK=390000000
> 
> on a Maixduino.

Ok, I can reproduce that crash, but only with the debug uart enabled. I
suspect it happens every time, but we only get an error with the debug
uart. I enabled CONFIG_SHOW_REGS, but there was some interference
between the output and the (next) instance of U-Boot. 

> 
> Best regards
> 
> Heinrich
>>
>>>
>>> Objdump shows that it is related to the secondary_hart_loop:
>>>
>>>         j       secondary_hart_loop
>>>     8000021a:   b7ed                    j       80000204
>>> <secondary_hart_loop>

It is actually related to the previous instruction

80000216:   56e000ef                jal     ra,80000784 <handle_ipi>

secondary_hart_loop is never reached because handle_ipi jumps to 0x0.

I decided to do a bit more experimentation with the following patch

diff --git i/arch/riscv/lib/smp.c w/arch/riscv/lib/smp.c
index ac22136314..8be320f6ae 100644
--- i/arch/riscv/lib/smp.c
+++ w/arch/riscv/lib/smp.c
@@ -54,6 +54,10 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
                gd->arch.ipi[reg].arg0 = ipi->arg0;
                gd->arch.ipi[reg].arg1 = ipi->arg1;
 
+               printf("sending %lx(%lx, %lx) to hart %u\n", ipi->addr,
+                      ipi->arg0, ipi->arg1, reg);
+
+               __smp_mb();
                ret = riscv_send_ipi(reg);
                if (ret) {
                        pr_err("Cannot send IPI to hart %d\n", reg);
@@ -77,15 +81,23 @@ void handle_ipi(ulong hart)
 {
        int ret;
        void (*smp_function)(ulong hart, ulong arg0, ulong arg1);
+       int ipi;
 
        if (hart >= CONFIG_NR_CPUS)
                return;
+       riscv_get_ipi(hart, &ipi);
 
        __smp_mb();
 
        smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
        invalidate_icache_all();
 
+       printf("calling %p(%lx, %lx) on hart %lu, ipi = %d\n", smp_function,
+              gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1, hart, ipi);
+
+       if (!ipi)
+               return;
+
        /*
         * Clear the IPI to acknowledge the request before jumping to the
         * requested function.

With this patch I get the following output

=> reset
resetting ...
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 0
calling 0000000000000000(0, 0) on hart 1, ipi = 1


U-Boot 2020.10-rc3-00033-g23e333a5c0-dirty (Sep 01 2020 - 09:55:43 -0400)

DRAM:  8 MiB
sending 805cc1fa(80589040, 8058cdd0) to hart 1
In:    serial@38000000
Out:   serial@38000000
Err:   serial@38000000
=>

There is no "calling" line after the "sending" line because hart 1 is
hung from jumping to 0x0.

From this log, it is clear that something other than smp_call_function
is sending an ipi, and, further, that there is a delay between when the
IRQ_M_SOFT bit is asserted and when the clint has a pending interrupt.

To further investigate this, I enabled debug output. This causes some
garbling in the console, as both harts try and write at the same time.
To alleviate this, I applied the following patch which adds locking to
puts and putc (NB this only works on K210)


diff --git i/common/console.c w/common/console.c
index 8e50af1f9d..31622df9b3 100644
--- i/common/console.c
+++ w/common/console.c
@@ -515,7 +515,35 @@ static inline void pre_console_puts(const char *s) {}
 static inline void print_pre_console_buffer(int flushpoint) {}
 #endif
 
-void putc(const char c)
+/* Static location which will survive across resets */
+u32 *console_spinlock = (void *)0x80400000;
+
+void console_lock(void)
+{
+       u32 tmp = 1;
+
+       /*
+        * We only ever write 1 to the lock, so if anything else is there then
+        * this is the first access to the spinlock, and we have thus acquired
+        * it. This will not work if the spinlock is ever (un)initialized to
+        * exactly 1, but I'll take my chances :)
+        */
+       while (tmp == 1)
+               asm volatile ("amoswap.w.aq %0, %0, %1"
+                             : "+r" (tmp), "+A" (console_spinlock)
+                             :
+                             : "memory");
+}
+
+void console_unlock(void)
+{
+       asm volatile ("amoswap.w.rl x0, x0, %0"
+                     : "+A" (console_spinlock)
+                     :
+                     : "memory");
+}
+
+static void _putc(const char c)
 {
 #ifdef CONFIG_SANDBOX
        /* sandbox can send characters to stdout before it has a console */
@@ -563,7 +591,14 @@ void putc(const char c)
        }
 }
 
-void puts(const char *s)
+void putc(const char c)
+{
+       console_lock();
+       _putc(c);
+       console_unlock();
+}
+
+static void _puts(const char *s)
 {
 #ifdef CONFIG_SANDBOX
        /* sandbox can send characters to stdout before it has a console */
@@ -614,6 +649,13 @@ void puts(const char *s)
        }
 }
 
+void puts(const char *s)
+{
+       console_lock();
+       _puts(s);
+       console_unlock();
+}
+
 #ifdef CONFIG_CONSOLE_RECORD
 int console_record_init(void)
 {

With this patch applied the output is [1] (and also far too long for
this email). Note that hart 1 first recieves a software interrupt on
line 46, which is also the first output after reset. The next line
(initcall: 0000000080005cc0) is from initcall_run_list calling
initf_bootstage, the first initcall after log_init. This indicates to me
that the pending interrupt eithe was triggered by start.S, or was never
cleared. Perhaps line 69 (csrc    MODE_PREFIX(ip), t0) is a no-op.

The clint only indicates an interrupt is present on line 723. Line 722
(initcall: 000000008001ce3a) indicates a call to timer_init. This is the
initcall immediately following arch_cpu_init_dm. Although this function
initializes the IPI, afact it does not make any writes to it. This does
suggest that some kind of effect does happen.

--Sean

[1] https://gist.github.com/Forty-Bot/e010480c34f31f0ecc150bbacd552ede
diff mbox series

Patch

diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
index ac22136314..d725fa32e8 100644
--- a/arch/riscv/lib/smp.c
+++ b/arch/riscv/lib/smp.c
@@ -96,6 +96,8 @@  void handle_ipi(ulong hart)
 		return;
 	}

+	if (!smp_function)
+		return;
 	smp_function(hart, gd->arch.ipi[hart].arg0, gd->arch.ipi[hart].arg1);
 }