diff mbox series

fdt: Avoid recursion when traversing tree

Message ID 20200716045542.33554-1-aik@ozlabs.ru
State New
Headers show
Series fdt: Avoid recursion when traversing tree | expand

Commit Message

Alexey Kardashevskiy July 16, 2020, 4:55 a.m. UTC
A loop over peers does not need recursion which becomes a problem with
hundreds devices.

Cc: Greg Kurz <groug@kaod.org>
Fixes: efa56b851fab ("fdt: Delete nodes of devices removed between boot and CAS")
Suggested-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 board-qemu/slof/fdt.fs | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

Comments

Greg Kurz July 16, 2020, 3:54 p.m. UTC | #1
Hi Alexey,

It seems it did again ! I haven't received this in my inbox, despite you've
put me on Cc... :-\

On Thu, 16 Jul 2020 14:55:42 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> A loop over peers does not need recursion which becomes a problem with
> hundreds devices.
> 

You're likely right but, per curiosity, do you have some numbers to
share ?

> Cc: Greg Kurz <groug@kaod.org>
> Fixes: efa56b851fab ("fdt: Delete nodes of devices removed between boot and CAS")
> Suggested-by: Jordan Niethe <jniethe5@gmail.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  board-qemu/slof/fdt.fs | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> index 7da4ff066702..03b1d4034d0a 100644
> --- a/board-qemu/slof/fdt.fs
> +++ b/board-qemu/slof/fdt.fs
> @@ -636,20 +636,23 @@ r> drop
>      THEN
>  ;
>  
> -: (fdt-cas-search-obsolete-nodes) ( start node -- )
> -    dup IF
> -	dup child 2 pick swap recurse
> -	dup peer 2 pick swap recurse
> -
> -	dup fdt-cas-node-obsolete? IF
> -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> -	    dup delete-node
> -	THEN
> +: (fdt-cas-search-obsolete-nodes) ( node -- )
> +    dup child
> +    BEGIN
> +	dup
> +    WHILE
> +	dup recurse
> +	peer
> +    REPEAT
> +    drop
> +    dup fdt-cas-node-obsolete? IF
> +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> +        dup delete-node
>      THEN
> -    2drop
> +    drop
>  ;
>  
> -: fdt-cas-delete-obsolete-nodes ( start -- )
> +: fdt-cas-delete-obsolete-nodes ( -- )
>      s" /" find-device get-node (fdt-cas-search-obsolete-nodes)
>      fdt-cas-delete-obsolete-aliases
>  ;
> @@ -657,7 +660,7 @@ r> drop
>  : fdt-fix-cas-node ( start -- )
>      fdt-generation# 1+ to fdt-generation#
>      0 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Add phandles
> -    dup fdt-cas-delete-obsolete-nodes             \ Delete removed devices
> +    fdt-cas-delete-obsolete-nodes             \ Delete removed devices

Maybe keep the comment aligned with the other ones ?

Reviewed-by: Greg Kurz <groug@kaod.org>

>      1 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Patch+add other properties
>      2 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Delete phandles from pass 0
>      drop
Segher Boessenkool July 16, 2020, 9:21 p.m. UTC | #2
Hi!

On Thu, Jul 16, 2020 at 05:54:30PM +0200, Greg Kurz wrote:
> On Thu, 16 Jul 2020 14:55:42 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> > A loop over peers does not need recursion which becomes a problem with
> > hundreds devices.
> 
> You're likely right but, per curiosity, do you have some numbers to
> share ?

It's two items on the data stack per recursion (and one on the return
stack).  This would probably not take too much stack (just a lot, not
*too* much though), except that to do "peer" we also recurse, so if
there are a lot of entries in any of your parent nodes, you lose.  And
most trees have some nodes with many children.

You can code any of the tree walks without any recursion, using just
O(1) extra memory (just keep track of the previous node visited, for
example), but in this case, it is probably fine to just do a loop to
handle the nodes on one level.

> > -: (fdt-cas-search-obsolete-nodes) ( start node -- )
> > -    dup IF
> > -	dup child 2 pick swap recurse
> > -	dup peer 2 pick swap recurse
> > -
> > -	dup fdt-cas-node-obsolete? IF
> > -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> > -	    dup delete-node
> > -	THEN
> > +: (fdt-cas-search-obsolete-nodes) ( node -- )
> > +    dup child
> > +    BEGIN
> > +	dup
> > +    WHILE
> > +	dup recurse
> > +	peer
> > +    REPEAT
> > +    drop
> > +    dup fdt-cas-node-obsolete? IF
> > +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> > +        dup delete-node
> >      THEN
> > -    2drop
> > +    drop
> >  ;

... which is what this does :-)

The patch looks fine to me.


Segher
Alexey Kardashevskiy July 17, 2020, 12:34 a.m. UTC | #3
On 17/07/2020 01:54, Greg Kurz wrote:
> Hi Alexey,
> 
> It seems it did again ! I haven't received this in my inbox, despite you've
> put me on Cc... :-\
> 
> On Thu, 16 Jul 2020 14:55:42 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> A loop over peers does not need recursion which becomes a problem with
>> hundreds devices.
>>
> 
> You're likely right but, per curiosity, do you have some numbers to
> share ?


This was discovered with just "-smp 2048,cores=512,threads=4" and the
results were either "cas not implemented" or some other weird XIVE
messages (I did not see any but Anton did).



> 
>> Cc: Greg Kurz <groug@kaod.org>
>> Fixes: efa56b851fab ("fdt: Delete nodes of devices removed between boot and CAS")
>> Suggested-by: Jordan Niethe <jniethe5@gmail.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  board-qemu/slof/fdt.fs | 27 +++++++++++++++------------
>>  1 file changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
>> index 7da4ff066702..03b1d4034d0a 100644
>> --- a/board-qemu/slof/fdt.fs
>> +++ b/board-qemu/slof/fdt.fs
>> @@ -636,20 +636,23 @@ r> drop
>>      THEN
>>  ;
>>  
>> -: (fdt-cas-search-obsolete-nodes) ( start node -- )
>> -    dup IF
>> -	dup child 2 pick swap recurse
>> -	dup peer 2 pick swap recurse
>> -
>> -	dup fdt-cas-node-obsolete? IF
>> -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>> -	    dup delete-node
>> -	THEN
>> +: (fdt-cas-search-obsolete-nodes) ( node -- )
>> +    dup child
>> +    BEGIN
>> +	dup
>> +    WHILE
>> +	dup recurse
>> +	peer
>> +    REPEAT
>> +    drop
>> +    dup fdt-cas-node-obsolete? IF
>> +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>> +        dup delete-node
>>      THEN
>> -    2drop
>> +    drop
>>  ;
>>  
>> -: fdt-cas-delete-obsolete-nodes ( start -- )
>> +: fdt-cas-delete-obsolete-nodes ( -- )
>>      s" /" find-device get-node (fdt-cas-search-obsolete-nodes)
>>      fdt-cas-delete-obsolete-aliases
>>  ;
>> @@ -657,7 +660,7 @@ r> drop
>>  : fdt-fix-cas-node ( start -- )
>>      fdt-generation# 1+ to fdt-generation#
>>      0 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Add phandles
>> -    dup fdt-cas-delete-obsolete-nodes             \ Delete removed devices
>> +    fdt-cas-delete-obsolete-nodes             \ Delete removed devices
> 
> Maybe keep the comment aligned with the other ones ?

May be :)

> 
> Reviewed-by: Greg Kurz <groug@kaod.org>


Thanks!

> 
>>      1 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Patch+add other properties
>>      2 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Delete phandles from pass 0
>>      drop
>
Alexey Kardashevskiy July 17, 2020, 12:45 a.m. UTC | #4
On 17/07/2020 07:21, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 16, 2020 at 05:54:30PM +0200, Greg Kurz wrote:
>> On Thu, 16 Jul 2020 14:55:42 +1000
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> A loop over peers does not need recursion which becomes a problem with
>>> hundreds devices.
>>
>> You're likely right but, per curiosity, do you have some numbers to
>> share ?
> 
> It's two items on the data stack per recursion (and one on the return
> stack).  This would probably not take too much stack (just a lot, not
> *too* much though), except that to do "peer" we also recurse, so if
> there are a lot of entries in any of your parent nodes, you lose.  And
> most trees have some nodes with many children.
> 
> You can code any of the tree walks without any recursion, using just
> O(1) extra memory (just keep track of the previous node visited, for
> example)


for O(1) we need to keep a parent in the current node (which we do). I
wonder now if SLOF does O(1) walks anywhere.


>, but in this case, it is probably fine to just do a loop to
> handle the nodes on one level.
> 
>>> -: (fdt-cas-search-obsolete-nodes) ( start node -- )
>>> -    dup IF
>>> -	dup child 2 pick swap recurse
>>> -	dup peer 2 pick swap recurse
>>> -
>>> -	dup fdt-cas-node-obsolete? IF
>>> -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>>> -	    dup delete-node
>>> -	THEN
>>> +: (fdt-cas-search-obsolete-nodes) ( node -- )
>>> +    dup child
>>> +    BEGIN
>>> +	dup
>>> +    WHILE
>>> +	dup recurse
>>> +	peer
>>> +    REPEAT
>>> +    drop
>>> +    dup fdt-cas-node-obsolete? IF
>>> +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>>> +        dup delete-node
>>>      THEN
>>> -    2drop
>>> +    drop
>>>  ;
> 
> ... which is what this does :-)
Segher Boessenkool July 17, 2020, 12:54 a.m. UTC | #5
On Fri, Jul 17, 2020 at 10:45:58AM +1000, Alexey Kardashevskiy wrote:
> > You can code any of the tree walks without any recursion, using just
> > O(1) extra memory (just keep track of the previous node visited, for
> > example)
> 
> for O(1) we need to keep a parent in the current node (which we do). I
> wonder now if SLOF does O(1) walks anywhere.

Before the FDT stuff the only (full) tree walks were done via the client
interface, and then all the traversion logic is done there.  In OF you
typically open a node by (partial) path name, unit addresses, etc.; you
are never interested in a whole (sub-)tree, as far as I can see.  Maybe
there is some case, but it then escapes me.


Segher
Greg Kurz July 17, 2020, 7:42 a.m. UTC | #6
On Fri, 17 Jul 2020 10:34:40 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> 
> 
> On 17/07/2020 01:54, Greg Kurz wrote:
> > Hi Alexey,
> > 
> > It seems it did again ! I haven't received this in my inbox, despite you've
> > put me on Cc... :-\
> > 
> > On Thu, 16 Jul 2020 14:55:42 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> > 
> >> A loop over peers does not need recursion which becomes a problem with
> >> hundreds devices.
> >>
> > 
> > You're likely right but, per curiosity, do you have some numbers to
> > share ?
> 
> 
> This was discovered with just "-smp 2048,cores=512,threads=4" and the
> results were either "cas not implemented" or some other weird XIVE
> messages (I did not see any but Anton did).
> 

Oh, I was thinking to something like "it took 30 minutes to boot", but
you seem to be talking about dysfunction of SLOF, which would be caused
by stack exhaustion if I understand Segher's anwser correctly, right ?

> 
> 
> > 
> >> Cc: Greg Kurz <groug@kaod.org>
> >> Fixes: efa56b851fab ("fdt: Delete nodes of devices removed between boot and CAS")
> >> Suggested-by: Jordan Niethe <jniethe5@gmail.com>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  board-qemu/slof/fdt.fs | 27 +++++++++++++++------------
> >>  1 file changed, 15 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
> >> index 7da4ff066702..03b1d4034d0a 100644
> >> --- a/board-qemu/slof/fdt.fs
> >> +++ b/board-qemu/slof/fdt.fs
> >> @@ -636,20 +636,23 @@ r> drop
> >>      THEN
> >>  ;
> >>  
> >> -: (fdt-cas-search-obsolete-nodes) ( start node -- )
> >> -    dup IF
> >> -	dup child 2 pick swap recurse
> >> -	dup peer 2 pick swap recurse
> >> -
> >> -	dup fdt-cas-node-obsolete? IF
> >> -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> >> -	    dup delete-node
> >> -	THEN
> >> +: (fdt-cas-search-obsolete-nodes) ( node -- )
> >> +    dup child
> >> +    BEGIN
> >> +	dup
> >> +    WHILE
> >> +	dup recurse
> >> +	peer
> >> +    REPEAT
> >> +    drop
> >> +    dup fdt-cas-node-obsolete? IF
> >> +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
> >> +        dup delete-node
> >>      THEN
> >> -    2drop
> >> +    drop
> >>  ;
> >>  
> >> -: fdt-cas-delete-obsolete-nodes ( start -- )
> >> +: fdt-cas-delete-obsolete-nodes ( -- )
> >>      s" /" find-device get-node (fdt-cas-search-obsolete-nodes)
> >>      fdt-cas-delete-obsolete-aliases
> >>  ;
> >> @@ -657,7 +660,7 @@ r> drop
> >>  : fdt-fix-cas-node ( start -- )
> >>      fdt-generation# 1+ to fdt-generation#
> >>      0 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Add phandles
> >> -    dup fdt-cas-delete-obsolete-nodes             \ Delete removed devices
> >> +    fdt-cas-delete-obsolete-nodes             \ Delete removed devices
> > 
> > Maybe keep the comment aligned with the other ones ?
> 
> May be :)
> 
> > 
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> 
> Thanks!
> 
> > 
> >>      1 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Patch+add other properties
> >>      2 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Delete phandles from pass 0
> >>      drop
> > 
>
Alexey Kardashevskiy July 17, 2020, 9:39 a.m. UTC | #7
On 17/07/2020 17:42, Greg Kurz wrote:
> On Fri, 17 Jul 2020 10:34:40 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>>
>>
>> On 17/07/2020 01:54, Greg Kurz wrote:
>>> Hi Alexey,
>>>
>>> It seems it did again ! I haven't received this in my inbox, despite you've
>>> put me on Cc... :-\
>>>
>>> On Thu, 16 Jul 2020 14:55:42 +1000
>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> A loop over peers does not need recursion which becomes a problem with
>>>> hundreds devices.
>>>>
>>>
>>> You're likely right but, per curiosity, do you have some numbers to
>>> share ?
>>
>>
>> This was discovered with just "-smp 2048,cores=512,threads=4" and the
>> results were either "cas not implemented" or some other weird XIVE
>> messages (I did not see any but Anton did).
>>
> 
> Oh, I was thinking to something like "it took 30 minutes to boot", but

Well, this too but it is just some waiting :)

> you seem to be talking about dysfunction of SLOF, which would be caused
> by stack exhaustion if I understand Segher's anwser correctly, right ?


Right. I am a bit surprised SLOF did not barf with "stack overflow" though.


> 
>>
>>
>>>
>>>> Cc: Greg Kurz <groug@kaod.org>
>>>> Fixes: efa56b851fab ("fdt: Delete nodes of devices removed between boot and CAS")
>>>> Suggested-by: Jordan Niethe <jniethe5@gmail.com>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>  board-qemu/slof/fdt.fs | 27 +++++++++++++++------------
>>>>  1 file changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
>>>> index 7da4ff066702..03b1d4034d0a 100644
>>>> --- a/board-qemu/slof/fdt.fs
>>>> +++ b/board-qemu/slof/fdt.fs
>>>> @@ -636,20 +636,23 @@ r> drop
>>>>      THEN
>>>>  ;
>>>>  
>>>> -: (fdt-cas-search-obsolete-nodes) ( start node -- )
>>>> -    dup IF
>>>> -	dup child 2 pick swap recurse
>>>> -	dup peer 2 pick swap recurse
>>>> -
>>>> -	dup fdt-cas-node-obsolete? IF
>>>> -	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>>>> -	    dup delete-node
>>>> -	THEN
>>>> +: (fdt-cas-search-obsolete-nodes) ( node -- )
>>>> +    dup child
>>>> +    BEGIN
>>>> +	dup
>>>> +    WHILE
>>>> +	dup recurse
>>>> +	peer
>>>> +    REPEAT
>>>> +    drop
>>>> +    dup fdt-cas-node-obsolete? IF
>>>> +        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
>>>> +        dup delete-node
>>>>      THEN
>>>> -    2drop
>>>> +    drop
>>>>  ;
>>>>  
>>>> -: fdt-cas-delete-obsolete-nodes ( start -- )
>>>> +: fdt-cas-delete-obsolete-nodes ( -- )
>>>>      s" /" find-device get-node (fdt-cas-search-obsolete-nodes)
>>>>      fdt-cas-delete-obsolete-aliases
>>>>  ;
>>>> @@ -657,7 +660,7 @@ r> drop
>>>>  : fdt-fix-cas-node ( start -- )
>>>>      fdt-generation# 1+ to fdt-generation#
>>>>      0 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Add phandles
>>>> -    dup fdt-cas-delete-obsolete-nodes             \ Delete removed devices
>>>> +    fdt-cas-delete-obsolete-nodes             \ Delete removed devices
>>>
>>> Maybe keep the comment aligned with the other ones ?
>>
>> May be :)
>>
>>>
>>> Reviewed-by: Greg Kurz <groug@kaod.org>
>>
>>
>> Thanks!
>>
>>>
>>>>      1 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Patch+add other properties
>>>>      2 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Delete phandles from pass 0
>>>>      drop
>>>
>>
>
Segher Boessenkool July 17, 2020, 10:28 a.m. UTC | #8
On Fri, Jul 17, 2020 at 07:39:53PM +1000, Alexey Kardashevskiy wrote:
> Well, this too but it is just some waiting :)
> 
> > you seem to be talking about dysfunction of SLOF, which would be caused
> > by stack exhaustion if I understand Segher's anwser correctly, right ?
> 
> 
> Right. I am a bit surprised SLOF did not barf with "stack overflow" though.

I do not know the current state of this well, so apply salt as needed...

It is much too much overhead to check the stack pointer always
(executing a single virtual machine insn is just a handful of real
machine instructions!)  A traditional approach is to only check the
stack in the outer interpreter loop (where you have to find every name
that is used in the dictionary anyway, or parse numbers, or anything
else that costs thousands of cycles *anyway*!)

Many paflof configurations unmap the pages right above and below the
stacks, to trap such things.  That does let slight underflows through,
because the top element is not usually actually stored on the stack.

SLOF runs in real mode, so there are no guard pages, you can write to
all RAM always.

By default the interactive interpreter prints the stack depth at every
prompt.  This is very helpful while writing and testing code.  And with
the "test everything you write as soon as you write it" approach, most
problems are avoided.  Exceptions are capacity problems (like here), or
anything else that cannot be tested for easily (hardware problems for
example).

So, perhaps this tree traversal code should call some "test the stack
pointers and scream" word at strategic places?  At the start of handling
any subtree for example.

(If some heavier routine is called for every node, you can put the test
*there* instead, without making everything slow).


Segher
Alexey Kardashevskiy July 17, 2020, 11:02 a.m. UTC | #9
On 17/07/2020 20:28, Segher Boessenkool wrote:
> On Fri, Jul 17, 2020 at 07:39:53PM +1000, Alexey Kardashevskiy wrote:
>> Well, this too but it is just some waiting :)
>>
>>> you seem to be talking about dysfunction of SLOF, which would be caused
>>> by stack exhaustion if I understand Segher's anwser correctly, right ?
>>
>>
>> Right. I am a bit surprised SLOF did not barf with "stack overflow" though.
> 
> I do not know the current state of this well, so apply salt as needed...
> 
> It is much too much overhead to check the stack pointer always
> (executing a single virtual machine insn is just a handful of real
> machine instructions!) 


This all true but yet there is "ERROR: stack overflow in engine()" and I
saw these in the past, and there others. But I never got to understand
it completely their limits, Thomas would probably know.



> A traditional approach is to only check the
> stack in the outer interpreter loop (where you have to find every name
> that is used in the dictionary anyway, or parse numbers, or anything
> else that costs thousands of cycles *anyway*!)
> 
> Many paflof configurations unmap the pages right above and below the
> stacks, to trap such things.  That does let slight underflows through,
> because the top element is not usually actually stored on the stack.
> 
> SLOF runs in real mode, so there are no guard pages, you can write to
> all RAM always.

Yes :(


> By default the interactive interpreter prints the stack depth at every
> prompt.  This is very helpful while writing and testing code.  And with
> the "test everything you write as soon as you write it" approach, most
> problems are avoided.  Exceptions are capacity problems (like here), or
> anything else that cannot be tested for easily (hardware problems for
> example).
> 
> So, perhaps this tree traversal code should call some "test the stack
> pointers and scream" word at strategic places?  At the start of handling
> any subtree for example.
> 
> (If some heavier routine is called for every node, you can put the test
> *there* instead, without making everything slow).

Making everything slow is perfectly fine for debugging SLOF. Thankfully
SLOF is not that complicated (real mode, 1 CPU, 1 device open at the
time) and normally a VM config is enough to reproduce the problem.
Thomas Huth July 17, 2020, 11:45 a.m. UTC | #10
On 17/07/2020 13.02, Alexey Kardashevskiy wrote:
> 
> 
> On 17/07/2020 20:28, Segher Boessenkool wrote:
>> On Fri, Jul 17, 2020 at 07:39:53PM +1000, Alexey Kardashevskiy wrote:
>>> Well, this too but it is just some waiting :)
>>>
>>>> you seem to be talking about dysfunction of SLOF, which would be caused
>>>> by stack exhaustion if I understand Segher's anwser correctly, right ?
>>>
>>>
>>> Right. I am a bit surprised SLOF did not barf with "stack overflow" though.
>>
>> I do not know the current state of this well, so apply salt as needed...
>>
>> It is much too much overhead to check the stack pointer always
>> (executing a single virtual machine insn is just a handful of real
>> machine instructions!) 
> 
> 
> This all true but yet there is "ERROR: stack overflow in engine()" and I
> saw these in the past, and there others. But I never got to understand
> it completely their limits, Thomas would probably know.

IIRC that is issued by a check at the beginning of engine(), i.e. you'll
only see that if you e.g. call from Forth into C code and that calls
back into Forth via engine().

 Thomas
Segher Boessenkool July 17, 2020, 5:49 p.m. UTC | #11
On Fri, Jul 17, 2020 at 09:02:34PM +1000, Alexey Kardashevskiy wrote:
> > (If some heavier routine is called for every node, you can put the test
> > *there* instead, without making everything slow).
> 
> Making everything slow is perfectly fine for debugging SLOF.

Not if you have to debug on real hardware, or debug the hardware driver
itself, or even the actual hardware itself.

> Thankfully
> SLOF is not that complicated (real mode, 1 CPU, 1 device open at the
> time) and normally a VM config is enough to reproduce the problem.

SLOF always has many devices open at the same time.  Only one ihandle
is in my-self at any point of time, of course.  This very nicely
simplifies how you have to deal with things.


Segher
Alexey Kardashevskiy July 20, 2020, 12:50 a.m. UTC | #12
On 18/07/2020 03:49, Segher Boessenkool wrote:
> On Fri, Jul 17, 2020 at 09:02:34PM +1000, Alexey Kardashevskiy wrote:
>>> (If some heavier routine is called for every node, you can put the test
>>> *there* instead, without making everything slow).
>>
>> Making everything slow is perfectly fine for debugging SLOF.
> 
> Not if you have to debug on real hardware, or debug the hardware driver
> itself, or even the actual hardware itself.

SLOF on real hardware in 2020? :)


>> Thankfully
>> SLOF is not that complicated (real mode, 1 CPU, 1 device open at the
>> time) and normally a VM config is enough to reproduce the problem.
> 
> SLOF always has many devices open at the same time.  Only one ihandle
> is in my-self at any point of time, of course.  This very nicely
> simplifies how you have to deal with things.

I meant it does not open more than a single boot device (one or more
instances, with packages), and yeah, it keeps serial/vga and TPM open
but 1) it is not really many 2) checking stack boundary at every step
would make a useful debug tool.

Well, I got my answer anyway, thanks for sharing :)
Segher Boessenkool July 20, 2020, 9:23 p.m. UTC | #13
Hi!

On Mon, Jul 20, 2020 at 10:50:35AM +1000, Alexey Kardashevskiy wrote:
> On 18/07/2020 03:49, Segher Boessenkool wrote:
> > On Fri, Jul 17, 2020 at 09:02:34PM +1000, Alexey Kardashevskiy wrote:
> >>> (If some heavier routine is called for every node, you can put the test
> >>> *there* instead, without making everything slow).
> >>
> >> Making everything slow is perfectly fine for debugging SLOF.
> > 
> > Not if you have to debug on real hardware, or debug the hardware driver
> > itself, or even the actual hardware itself.
> 
> SLOF on real hardware in 2020? :)

*Shrug*.  It should still work!

> >> Thankfully
> >> SLOF is not that complicated (real mode, 1 CPU, 1 device open at the
> >> time) and normally a VM config is enough to reproduce the problem.
> > 
> > SLOF always has many devices open at the same time.  Only one ihandle
> > is in my-self at any point of time, of course.  This very nicely
> > simplifies how you have to deal with things.
> 
> I meant it does not open more than a single boot device (one or more
> instances, with packages), and yeah, it keeps serial/vga and TPM open
> but 1) it is not really many

In the normal case, whenever a device is opened first all its bus parent
devices are opened, recursively.  You also can have multiple files (on
block devices) open at once, etc.  This is the same on all Open Firmware
implementations.

> 2) checking stack boundary at every step
> would make a useful debug tool.

Yes, and very expensive.  I don't dispute that it can be useful in
certain cases :-)

(It is easy to add in such cases, as well.)


Segher
diff mbox series

Patch

diff --git a/board-qemu/slof/fdt.fs b/board-qemu/slof/fdt.fs
index 7da4ff066702..03b1d4034d0a 100644
--- a/board-qemu/slof/fdt.fs
+++ b/board-qemu/slof/fdt.fs
@@ -636,20 +636,23 @@  r> drop
     THEN
 ;
 
-: (fdt-cas-search-obsolete-nodes) ( start node -- )
-    dup IF
-	dup child 2 pick swap recurse
-	dup peer 2 pick swap recurse
-
-	dup fdt-cas-node-obsolete? IF
-	    fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
-	    dup delete-node
-	THEN
+: (fdt-cas-search-obsolete-nodes) ( node -- )
+    dup child
+    BEGIN
+	dup
+    WHILE
+	dup recurse
+	peer
+    REPEAT
+    drop
+    dup fdt-cas-node-obsolete? IF
+        fdt-debug IF dup ." Deleting obsolete node: " dup .node ." = " . cr THEN
+        dup delete-node
     THEN
-    2drop
+    drop
 ;
 
-: fdt-cas-delete-obsolete-nodes ( start -- )
+: fdt-cas-delete-obsolete-nodes ( -- )
     s" /" find-device get-node (fdt-cas-search-obsolete-nodes)
     fdt-cas-delete-obsolete-aliases
 ;
@@ -657,7 +660,7 @@  r> drop
 : fdt-fix-cas-node ( start -- )
     fdt-generation# 1+ to fdt-generation#
     0 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Add phandles
-    dup fdt-cas-delete-obsolete-nodes             \ Delete removed devices
+    fdt-cas-delete-obsolete-nodes             \ Delete removed devices
     1 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Patch+add other properties
     2 to fdt-cas-pass dup (fdt-fix-cas-node) drop \ Delete phandles from pass 0
     drop