mbox series

[0/7] riscv: Correctly handle IPIs already pending upon boot

Message ID 20200907181659.92449-1-seanga2@gmail.com
Headers show
Series riscv: Correctly handle IPIs already pending upon boot | expand

Message

Sean Anderson Sept. 7, 2020, 6:16 p.m. UTC
On the K210, the prior stage bootloader does not clear IPIs. This presents
a problem, because U-Boot up until this point assumes (with one exception)
that IPIs are cleared when it starts. This series attempts to fix this in a
robust manner, and fix several concurrency bugs I noticed while fixing
these other areas. Heinrich previously submitted a patch addressing part of
this problem in [1].

[1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/


Sean Anderson (7):
  Revert "riscv: Clear pending interrupts before enabling IPIs"
  riscv: Match memory barriers between send_ipi_many and handle_ipi
  riscv: Use NULL as a sentinel value for smp_call_function
  riscv: Clear pending IPIs on initialization
  riscv: Add fence to available_harts_lock
  riscv: Ensure gp is NULL or points to valid data
  riscv: Add some comments to start.S

 arch/riscv/cpu/cpu.c        | 18 ++++++++++++++
 arch/riscv/cpu/start.S      | 47 +++++++++++++++++++++++++++++--------
 arch/riscv/lib/interrupts.c |  3 ++-
 arch/riscv/lib/smp.c        | 26 +++++++++++++++++---
 4 files changed, 80 insertions(+), 14 deletions(-)

Comments

Rick Chen Sept. 9, 2020, 2:02 a.m. UTC | #1
Hi Sean

> On the K210, the prior stage bootloader does not clear IPIs. This presents
> a problem, because U-Boot up until this point assumes (with one exception)
> that IPIs are cleared when it starts. This series attempts to fix this in a
> robust manner, and fix several concurrency bugs I noticed while fixing
> these other areas. Heinrich previously submitted a patch addressing part of
> this problem in [1].
>
> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/
>

It sounds like that the bootloader does not deal with SMP flow well
and jump to u-boot-spl, right ?

I have a question that why not try to fix the prior stage bootloader
to clear IPIs correctly?

Actually I have encounter a similar SMP issue like you.
Our prior stage bootloader will jump to u-boot-spl with the incorrect
mstatus and result in the SMP working abnormal in u-boot-spl.

I mean this is an individual case, not a general case.
If we try to cover any errors which come from any prior stage bootloaders,
the SMP flow will become more and more complicated and hard to debug.

That is why it does not need implement SMP flow in U-Boot proper with
SBI v0.2 HSM extension.

Thanks,
Rick

>
> Sean Anderson (7):
>   Revert "riscv: Clear pending interrupts before enabling IPIs"
>   riscv: Match memory barriers between send_ipi_many and handle_ipi
>   riscv: Use NULL as a sentinel value for smp_call_function
>   riscv: Clear pending IPIs on initialization
>   riscv: Add fence to available_harts_lock
>   riscv: Ensure gp is NULL or points to valid data
>   riscv: Add some comments to start.S
>
>  arch/riscv/cpu/cpu.c        | 18 ++++++++++++++
>  arch/riscv/cpu/start.S      | 47 +++++++++++++++++++++++++++++--------
>  arch/riscv/lib/interrupts.c |  3 ++-
>  arch/riscv/lib/smp.c        | 26 +++++++++++++++++---
>  4 files changed, 80 insertions(+), 14 deletions(-)
>
> --
> 2.28.0
>
Sean Anderson Sept. 9, 2020, 2:38 a.m. UTC | #2
On 9/8/20 10:02 PM, Rick Chen wrote:
> Hi Sean
> 
>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>> a problem, because U-Boot up until this point assumes (with one exception)
>> that IPIs are cleared when it starts. This series attempts to fix this in a
>> robust manner, and fix several concurrency bugs I noticed while fixing
>> these other areas. Heinrich previously submitted a patch addressing part of
>> this problem in [1].
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/
>>
> 
> It sounds like that the bootloader does not deal with SMP flow well
> and jump to u-boot-spl, right ?
> 
> I have a question that why not try to fix the prior stage bootloader
> to clear IPIs correctly?

Because it is a ROM :)

> 
> Actually I have encounter a similar SMP issue like you.
> Our prior stage bootloader will jump to u-boot-spl with the incorrect
> mstatus and result in the SMP working abnormal in u-boot-spl.

Perhaps we should just clear MIE then? I originally had a patch in this
series which moved the handle_ipi code into handle_trap, and got rid of
the manual checks on the interrupt. Something like

secondary_hart_loop:
	wfi
	j	secondary_hart_loop

Of course as part of that we would need to explicitly enable and disable
interrupts. Perhaps not the worst idea, but I didn't include it here
because I figure the current system work OK, even if it is not what one
might expect.

> I mean this is an individual case, not a general case.
> If we try to cover any errors which come from any prior stage bootloaders,
> the SMP flow will become more and more complicated and hard to debug.

Of course, if a prior bootloader doesn't hold up its end of the
contract, we can be left with some awful bugs to fix. U-Boot is
generally not too bad to debug, but I've had an awful time whenever some
concurrency sneaks into the mix. I think it's much better to confine the
(necessary) complexity to as few files as possible, so that the rest of
the code can be ignorant. I think part of that is verifying that we have
everything in a known state, so that when we see something unexpected,
we can handle it/panic/whatever instead of silently getting a bug.

--Sean
Sean Anderson Sept. 9, 2020, 2:44 a.m. UTC | #3
On 9/8/20 10:38 PM, Sean Anderson wrote:
> On 9/8/20 10:02 PM, Rick Chen wrote:
>> Hi Sean
>>
>>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>>> a problem, because U-Boot up until this point assumes (with one exception)
>>> that IPIs are cleared when it starts. This series attempts to fix this in a
>>> robust manner, and fix several concurrency bugs I noticed while fixing
>>> these other areas. Heinrich previously submitted a patch addressing part of
>>> this problem in [1].
>>>
>>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/
>>>
>>
>> It sounds like that the bootloader does not deal with SMP flow well
>> and jump to u-boot-spl, right ?
>>
>> I have a question that why not try to fix the prior stage bootloader
>> to clear IPIs correctly?
> 
> Because it is a ROM :)

Err, perhaps I should clarify. When I say "prior stage bootloader," I
mean that in the general sense of "anything which comes before U-Boot,"
and not to refer specifically to SPL or TPL. For the k210, this is
something akin to the ZSBL on an fu540.

--Sean
Rick Chen Sept. 10, 2020, 7:08 a.m. UTC | #4
Hi Sean

> On 9/8/20 10:02 PM, Rick Chen wrote:
> > Hi Sean
> >
> >> On the K210, the prior stage bootloader does not clear IPIs. This presents
> >> a problem, because U-Boot up until this point assumes (with one exception)
> >> that IPIs are cleared when it starts. This series attempts to fix this in a
> >> robust manner, and fix several concurrency bugs I noticed while fixing
> >> these other areas. Heinrich previously submitted a patch addressing part of
> >> this problem in [1].
> >>
> >> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/
> >>
> >
> > It sounds like that the bootloader does not deal with SMP flow well
> > and jump to u-boot-spl, right ?
> >
> > I have a question that why not try to fix the prior stage bootloader
> > to clear IPIs correctly?
>
> Because it is a ROM :)

Is it a mask ROM or flash ROM ?

>
> >
> > Actually I have encounter a similar SMP issue like you.
> > Our prior stage bootloader will jump to u-boot-spl with the incorrect
> > mstatus and result in the SMP working abnormal in u-boot-spl.
>
> Perhaps we should just clear MIE then? I originally had a patch in this
> series which moved the handle_ipi code into handle_trap, and got rid of
> the manual checks on the interrupt. Something like
>
> secondary_hart_loop:
>         wfi
>         j       secondary_hart_loop
>
> Of course as part of that we would need to explicitly enable and disable
> interrupts. Perhaps not the worst idea, but I didn't include it here
> because I figure the current system work OK, even if it is not what one
> might expect.
>
> > I mean this is an individual case, not a general case.
> > If we try to cover any errors which come from any prior stage bootloaders,
> > the SMP flow will become more and more complicated and hard to debug.
>
> Of course, if a prior bootloader doesn't hold up its end of the
> contract, we can be left with some awful bugs to fix. U-Boot is
> generally not too bad to debug, but I've had an awful time whenever some
> concurrency sneaks into the mix. I think it's much better to confine the
> (necessary) complexity to as few files as possible, so that the rest of
> the code can be ignorant. I think part of that is verifying that we have
> everything in a known state, so that when we see something unexpected,
> we can handle it/panic/whatever instead of silently getting a bug.

It sounds like an error handling and the errors come from the prior
stage bootloader.
Without U-Boot, does Kernel handle this kind of IPIs not clean
unexpected errors ?

Thanks,
Rick

>
> --Sean
Sean Anderson Sept. 10, 2020, 10:49 a.m. UTC | #5
On 9/10/20 3:08 AM, Rick Chen wrote:
> Hi Sean
> 
>> On 9/8/20 10:02 PM, Rick Chen wrote:
>>> Hi Sean
>>>
>>>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>>>> a problem, because U-Boot up until this point assumes (with one exception)
>>>> that IPIs are cleared when it starts. This series attempts to fix this in a
>>>> robust manner, and fix several concurrency bugs I noticed while fixing
>>>> these other areas. Heinrich previously submitted a patch addressing part of
>>>> this problem in [1].
>>>>
>>>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk@gmx.de/
>>>>
>>>
>>> It sounds like that the bootloader does not deal with SMP flow well
>>> and jump to u-boot-spl, right ?
>>>
>>> I have a question that why not try to fix the prior stage bootloader
>>> to clear IPIs correctly?
>>
>> Because it is a ROM :)
> 
> Is it a mask ROM or flash ROM ?

I don't know, it's not documented. From what I can tell, it's controlled
by the OTP device. However, I haven't thoroughly investigated it.

> 
>>
>>>
>>> Actually I have encounter a similar SMP issue like you.
>>> Our prior stage bootloader will jump to u-boot-spl with the incorrect
>>> mstatus and result in the SMP working abnormal in u-boot-spl.
>>
>> Perhaps we should just clear MIE then? I originally had a patch in this
>> series which moved the handle_ipi code into handle_trap, and got rid of
>> the manual checks on the interrupt. Something like
>>
>> secondary_hart_loop:
>>         wfi
>>         j       secondary_hart_loop
>>
>> Of course as part of that we would need to explicitly enable and disable
>> interrupts. Perhaps not the worst idea, but I didn't include it here
>> because I figure the current system work OK, even if it is not what one
>> might expect.
>>
>>> I mean this is an individual case, not a general case.
>>> If we try to cover any errors which come from any prior stage bootloaders,
>>> the SMP flow will become more and more complicated and hard to debug.
>>
>> Of course, if a prior bootloader doesn't hold up its end of the
>> contract, we can be left with some awful bugs to fix. U-Boot is
>> generally not too bad to debug, but I've had an awful time whenever some
>> concurrency sneaks into the mix. I think it's much better to confine the
>> (necessary) complexity to as few files as possible, so that the rest of
>> the code can be ignorant. I think part of that is verifying that we have
>> everything in a known state, so that when we see something unexpected,
>> we can handle it/panic/whatever instead of silently getting a bug.
> 
> It sounds like an error handling and the errors come from the prior
> stage bootloader.
> Without U-Boot, does Kernel handle this kind of IPIs not clean
> unexpected errors ?

Well, software interrupts are disabled on each hart until
riscv_intc_init is called (and enables soft irqs). This prevents the
handler from being called before ipi_data is initialized. After that,
handle_IPI can be called, but if ops == 0 (e.g. the IPI wasn't signaled
by Linux), then it just goes to done.

--Sean