diff mbox series

Revert "spl: Drop bd_info in the data section"

Message ID 20210408165611.1195887-1-mr.nuke.me@gmail.com
State Accepted
Delegated to: Tom Rini
Headers show
Series Revert "spl: Drop bd_info in the data section" | expand

Commit Message

Alexandru Gagniuc April 8, 2021, 4:56 p.m. UTC
This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.

struct global_data contains a pointer to the bd_info structure. This
pointer was populated spl_set_bd() to a pre-allocated bd_info in the
".data" section. The referenced commit replaced this mechanism to one
that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
which very few boards do.

The result is that (struct global_data)->bd is NULL in SPL on most
platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
access (struct global_data)->bd and set the "/memory" node in the
devicetree. The result is that the "/memory" node contains garbage
values, causing linux to panic() as it sets up the page table.

Instead of trying to fix the mess, potentially causing other issues,
revert to the code that worked, while this change is reworked.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 arch/arm/cpu/armv8/fsl-layerscape/spl.c |  5 +----
 common/spl/Kconfig                      |  9 ---------
 common/spl/spl.c                        | 20 ++++++++------------
 include/spl.h                           | 10 +---------
 4 files changed, 10 insertions(+), 34 deletions(-)

Comments

Simon Glass April 8, 2021, 11:55 p.m. UTC | #1
Hi Alexandru,

On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>
> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
>
> struct global_data contains a pointer to the bd_info structure. This
> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> ".data" section. The referenced commit replaced this mechanism to one
> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> which very few boards do.
>
> The result is that (struct global_data)->bd is NULL in SPL on most
> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> access (struct global_data)->bd and set the "/memory" node in the
> devicetree. The result is that the "/memory" node contains garbage
> values, causing linux to panic() as it sets up the page table.
>
> Instead of trying to fix the mess, potentially causing other issues,
> revert to the code that worked, while this change is reworked.

The goal here is to drop a feature that few boards use and reduce
memory usage in SPL. It has been in place for two releases now.

If Falcon mode needs it, perhaps you could add an 'imply' in the
Kconfig for that feature? Is there one? Or perhaps
CONFIG_ARCH_FIXUP_FDT_MEMORY ?

One option would be to return an error in arch_fixup_fdt(). In
general, fixups make things tricky because there is no way to
determine when they are used but at least this one has a CONFIG.

Regards,
Simon


>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  arch/arm/cpu/armv8/fsl-layerscape/spl.c |  5 +----
>  common/spl/Kconfig                      |  9 ---------
>  common/spl/spl.c                        | 20 ++++++++------------
>  include/spl.h                           | 10 +---------
>  4 files changed, 10 insertions(+), 34 deletions(-)
>
Alexandru Gagniuc April 9, 2021, 7:20 p.m. UTC | #2
Hi Simon

On 4/8/21 6:55 PM, Simon Glass wrote:
> Hi Alexandru,
> 
> On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>>
>> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
>>
>> struct global_data contains a pointer to the bd_info structure. This
>> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
>> ".data" section. The referenced commit replaced this mechanism to one
>> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
>> which very few boards do.
>>
>> The result is that (struct global_data)->bd is NULL in SPL on most
>> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
>> access (struct global_data)->bd and set the "/memory" node in the
>> devicetree. The result is that the "/memory" node contains garbage
>> values, causing linux to panic() as it sets up the page table.
>>
>> Instead of trying to fix the mess, potentially causing other issues,
>> revert to the code that worked, while this change is reworked.
> 
> The goal here is to drop a feature that few boards use and reduce
> memory usage in SPL. It has been in place for two releases now.
> 
> If Falcon mode needs it, perhaps you could add an 'imply' in the
> Kconfig for that feature? Is there one? Or perhaps
> CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> 
> One option would be to return an error in arch_fixup_fdt(). In
> general, fixups make things tricky because there is no way to
> determine when they are used but at least this one has a CONFIG.
> 

The argument that this has been in place for two releases is incorrect. 
Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with 
the v2021.04 release. It definitely was not in 2021.01. It's only in the 
last release, which is four days old t the time of this writing.

Although I was able to find one example, the reality is that we don't 
know the full extent of the breakage. The prudent thing at this point is 
to revert.

The knowledge of how to init the platform is in the devicetree and code. 
Why should kconfig also be involved in storing this knowledge? By this 
model, as the number of boards increases without bounds, every "if" 
predicate tends to be Kconfig driven. That is not maintainable, and why 
I think the original change --and the proposed fixes-- are broken by design.

Furthermore, I'm happy to discuss what to do about Falcon mode, and if 
we should kill it entirely (I have an alternative proposal).  But we 
shouldn't have that discussion in a manner holding my platform hostage. 
To be fair to you, I don't think reverting a 64-byte savings in .data 
will hold your platform hostage either.

Alex
Adam Ford April 9, 2021, 8:24 p.m. UTC | #3
On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
>
> Hi Simon
>
> On 4/8/21 6:55 PM, Simon Glass wrote:
> > Hi Alexandru,
> >
> > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> >>
> >> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> >>
> >> struct global_data contains a pointer to the bd_info structure. This
> >> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> >> ".data" section. The referenced commit replaced this mechanism to one
> >> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> >> which very few boards do.
> >>
> >> The result is that (struct global_data)->bd is NULL in SPL on most
> >> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> >> access (struct global_data)->bd and set the "/memory" node in the
> >> devicetree. The result is that the "/memory" node contains garbage
> >> values, causing linux to panic() as it sets up the page table.
> >>
> >> Instead of trying to fix the mess, potentially causing other issues,
> >> revert to the code that worked, while this change is reworked.
> >
> > The goal here is to drop a feature that few boards use and reduce
> > memory usage in SPL. It has been in place for two releases now.
> >
> > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > Kconfig for that feature? Is there one? Or perhaps
> > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> >
> > One option would be to return an error in arch_fixup_fdt(). In
> > general, fixups make things tricky because there is no way to
> > determine when they are used but at least this one has a CONFIG.
> >
>
> The argument that this has been in place for two releases is incorrect.
> Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> the v2021.04 release. It definitely was not in 2021.01. It's only in the
> last release, which is four days old t the time of this writing.
>
> Although I was able to find one example, the reality is that we don't
> know the full extent of the breakage. The prudent thing at this point is
> to revert.
>
> The knowledge of how to init the platform is in the devicetree and code.
> Why should kconfig also be involved in storing this knowledge? By this
> model, as the number of boards increases without bounds, every "if"
> predicate tends to be Kconfig driven. That is not maintainable, and why
> I think the original change --and the proposed fixes-- are broken by design.
>
> Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> we should kill it entirely (I have an alternative proposal).  But we
> shouldn't have that discussion in a manner holding my platform hostage.
> To be fair to you, I don't think reverting a 64-byte savings in .data
> will hold your platform hostage either.

That original patch broke a bunch of OMAP boards, but enabling
SPL_ALLOC_BD was the solution to fix the issue.
Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
can make falcon mode imply it.

adam
>
> Alex
>
Tom Rini April 9, 2021, 8:53 p.m. UTC | #4
On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> >
> > Hi Simon
> >
> > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > Hi Alexandru,
> > >
> > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > >>
> > >> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > >>
> > >> struct global_data contains a pointer to the bd_info structure. This
> > >> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > >> ".data" section. The referenced commit replaced this mechanism to one
> > >> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > >> which very few boards do.
> > >>
> > >> The result is that (struct global_data)->bd is NULL in SPL on most
> > >> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > >> access (struct global_data)->bd and set the "/memory" node in the
> > >> devicetree. The result is that the "/memory" node contains garbage
> > >> values, causing linux to panic() as it sets up the page table.
> > >>
> > >> Instead of trying to fix the mess, potentially causing other issues,
> > >> revert to the code that worked, while this change is reworked.
> > >
> > > The goal here is to drop a feature that few boards use and reduce
> > > memory usage in SPL. It has been in place for two releases now.
> > >
> > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > Kconfig for that feature? Is there one? Or perhaps
> > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > >
> > > One option would be to return an error in arch_fixup_fdt(). In
> > > general, fixups make things tricky because there is no way to
> > > determine when they are used but at least this one has a CONFIG.
> > >
> >
> > The argument that this has been in place for two releases is incorrect.
> > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > last release, which is four days old t the time of this writing.

Yes.  But another way of saying that is that we're 4 days in to the
merge window.  That's a bit early to say we must revert the change.  If
this was just before the release, yes, revert would be the right answer.
It's also the case the original commit fixes some cases while also
saving size, if I read it right.  So a strict revert isn't right, we'd
need to also probably also default y SPL_ALLOC_BD in some cases.

> > Although I was able to find one example, the reality is that we don't
> > know the full extent of the breakage. The prudent thing at this point is
> > to revert.
> >
> > The knowledge of how to init the platform is in the devicetree and code.
> > Why should kconfig also be involved in storing this knowledge? By this
> > model, as the number of boards increases without bounds, every "if"
> > predicate tends to be Kconfig driven. That is not maintainable, and why
> > I think the original change --and the proposed fixes-- are broken by design.
> >
> > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > we should kill it entirely (I have an alternative proposal).  But we
> > shouldn't have that discussion in a manner holding my platform hostage.
> > To be fair to you, I don't think reverting a 64-byte savings in .data
> > will hold your platform hostage either.
> 
> That original patch broke a bunch of OMAP boards, but enabling
> SPL_ALLOC_BD was the solution to fix the issue.
> Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> can make falcon mode imply it.

It would be "select" since it needs it rather than imply.
Tim Harvey April 10, 2021, 12:29 a.m. UTC | #5
On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
>
> On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > >
> > > Hi Simon
> > >
> > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > Hi Alexandru,
> > > >
> > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > >>
> > > >> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > >>
> > > >> struct global_data contains a pointer to the bd_info structure. This
> > > >> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > >> ".data" section. The referenced commit replaced this mechanism to one
> > > >> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > >> which very few boards do.
> > > >>
> > > >> The result is that (struct global_data)->bd is NULL in SPL on most
> > > >> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > >> access (struct global_data)->bd and set the "/memory" node in the
> > > >> devicetree. The result is that the "/memory" node contains garbage
> > > >> values, causing linux to panic() as it sets up the page table.
> > > >>
> > > >> Instead of trying to fix the mess, potentially causing other issues,
> > > >> revert to the code that worked, while this change is reworked.
> > > >
> > > > The goal here is to drop a feature that few boards use and reduce
> > > > memory usage in SPL. It has been in place for two releases now.
> > > >
> > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > Kconfig for that feature? Is there one? Or perhaps
> > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > >
> > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > general, fixups make things tricky because there is no way to
> > > > determine when they are used but at least this one has a CONFIG.
> > > >
> > >
> > > The argument that this has been in place for two releases is incorrect.
> > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > last release, which is four days old t the time of this writing.
>
> Yes.  But another way of saying that is that we're 4 days in to the
> merge window.  That's a bit early to say we must revert the change.  If
> this was just before the release, yes, revert would be the right answer.
> It's also the case the original commit fixes some cases while also
> saving size, if I read it right.  So a strict revert isn't right, we'd
> need to also probably also default y SPL_ALLOC_BD in some cases.
>
> > > Although I was able to find one example, the reality is that we don't
> > > know the full extent of the breakage. The prudent thing at this point is
> > > to revert.
> > >
> > > The knowledge of how to init the platform is in the devicetree and code.
> > > Why should kconfig also be involved in storing this knowledge? By this
> > > model, as the number of boards increases without bounds, every "if"
> > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > I think the original change --and the proposed fixes-- are broken by design.
> > >
> > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > we should kill it entirely (I have an alternative proposal).  But we
> > > shouldn't have that discussion in a manner holding my platform hostage.
> > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > will hold your platform hostage either.
> >
> > That original patch broke a bunch of OMAP boards, but enabling
> > SPL_ALLOC_BD was the solution to fix the issue.
> > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > can make falcon mode imply it.
>
> It would be "select" since it needs it rather than imply.
>

I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
Drop bd_info in the data section") breaks Gateworks Ventana and
defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
not being used and the breakage is because arch/arm/mach-imx/spl.c
dram_init_banksize() accesses gd->bd which is NULL.

So I would guess that this probably broke a whole lot of IMX based
boards that use SPL.

I don't quite know what the best solution is... we now have a v2021.04
that is broken for several or many boards and I dont' know if its
clear what cases break.

Best regards,

Tim
Tom Rini April 12, 2021, 1:25 p.m. UTC | #6
On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> >
> > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > >
> > > > Hi Simon
> > > >
> > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > Hi Alexandru,
> > > > >
> > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > >>
> > > > >> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > >>
> > > > >> struct global_data contains a pointer to the bd_info structure. This
> > > > >> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > >> ".data" section. The referenced commit replaced this mechanism to one
> > > > >> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > >> which very few boards do.
> > > > >>
> > > > >> The result is that (struct global_data)->bd is NULL in SPL on most
> > > > >> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > >> access (struct global_data)->bd and set the "/memory" node in the
> > > > >> devicetree. The result is that the "/memory" node contains garbage
> > > > >> values, causing linux to panic() as it sets up the page table.
> > > > >>
> > > > >> Instead of trying to fix the mess, potentially causing other issues,
> > > > >> revert to the code that worked, while this change is reworked.
> > > > >
> > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > memory usage in SPL. It has been in place for two releases now.
> > > > >
> > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > >
> > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > general, fixups make things tricky because there is no way to
> > > > > determine when they are used but at least this one has a CONFIG.
> > > > >
> > > >
> > > > The argument that this has been in place for two releases is incorrect.
> > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > last release, which is four days old t the time of this writing.
> >
> > Yes.  But another way of saying that is that we're 4 days in to the
> > merge window.  That's a bit early to say we must revert the change.  If
> > this was just before the release, yes, revert would be the right answer.
> > It's also the case the original commit fixes some cases while also
> > saving size, if I read it right.  So a strict revert isn't right, we'd
> > need to also probably also default y SPL_ALLOC_BD in some cases.
> >
> > > > Although I was able to find one example, the reality is that we don't
> > > > know the full extent of the breakage. The prudent thing at this point is
> > > > to revert.
> > > >
> > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > model, as the number of boards increases without bounds, every "if"
> > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > I think the original change --and the proposed fixes-- are broken by design.
> > > >
> > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > will hold your platform hostage either.
> > >
> > > That original patch broke a bunch of OMAP boards, but enabling
> > > SPL_ALLOC_BD was the solution to fix the issue.
> > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > can make falcon mode imply it.
> >
> > It would be "select" since it needs it rather than imply.
> >
> 
> I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> Drop bd_info in the data section") breaks Gateworks Ventana and
> defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> not being used and the breakage is because arch/arm/mach-imx/spl.c
> dram_init_banksize() accesses gd->bd which is NULL.
> 
> So I would guess that this probably broke a whole lot of IMX based
> boards that use SPL.
> 
> I don't quite know what the best solution is... we now have a v2021.04
> that is broken for several or many boards and I dont' know if its
> clear what cases break.

Looking forward, I think we need to rework this.  Simon, I gather you
have some platforms where we need to set gd->bd to something that we
malloc() ?  So perhaps spl_set_bd() should have been __weak so that
architectures / platforms could override as needed, but also move it
just past mem_malloc_init().
Alexandru Gagniuc April 12, 2021, 1:51 p.m. UTC | #7
On 4/12/21 8:25 AM, Tom Rini wrote:
> On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
>> On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
>>>
>>> On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
>>>> On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
>>>>>
>>>>> Hi Simon
>>>>>
>>>>> On 4/8/21 6:55 PM, Simon Glass wrote:
>>>>>> Hi Alexandru,
>>>>>>
>>>>>> On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>>>>>>>
>>>>>>> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
>>>>>>>
>>>>>>> struct global_data contains a pointer to the bd_info structure. This
>>>>>>> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
>>>>>>> ".data" section. The referenced commit replaced this mechanism to one
>>>>>>> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
>>>>>>> which very few boards do.
>>>>>>>
>>>>>>> The result is that (struct global_data)->bd is NULL in SPL on most
>>>>>>> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
>>>>>>> access (struct global_data)->bd and set the "/memory" node in the
>>>>>>> devicetree. The result is that the "/memory" node contains garbage
>>>>>>> values, causing linux to panic() as it sets up the page table.
>>>>>>>
>>>>>>> Instead of trying to fix the mess, potentially causing other issues,
>>>>>>> revert to the code that worked, while this change is reworked.
>>>>>>
>>>>>> The goal here is to drop a feature that few boards use and reduce
>>>>>> memory usage in SPL. It has been in place for two releases now.
>>>>>>
>>>>>> If Falcon mode needs it, perhaps you could add an 'imply' in the
>>>>>> Kconfig for that feature? Is there one? Or perhaps
>>>>>> CONFIG_ARCH_FIXUP_FDT_MEMORY ?
>>>>>>
>>>>>> One option would be to return an error in arch_fixup_fdt(). In
>>>>>> general, fixups make things tricky because there is no way to
>>>>>> determine when they are used but at least this one has a CONFIG.
>>>>>>
>>>>>
>>>>> The argument that this has been in place for two releases is incorrect.
>>>>> Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
>>>>> the v2021.04 release. It definitely was not in 2021.01. It's only in the
>>>>> last release, which is four days old t the time of this writing.
>>>
>>> Yes.  But another way of saying that is that we're 4 days in to the
>>> merge window.  That's a bit early to say we must revert the change.  If
>>> this was just before the release, yes, revert would be the right answer.
>>> It's also the case the original commit fixes some cases while also
>>> saving size, if I read it right.  So a strict revert isn't right, we'd
>>> need to also probably also default y SPL_ALLOC_BD in some cases.
>>>
>>>>> Although I was able to find one example, the reality is that we don't
>>>>> know the full extent of the breakage. The prudent thing at this point is
>>>>> to revert.
>>>>>
>>>>> The knowledge of how to init the platform is in the devicetree and code.
>>>>> Why should kconfig also be involved in storing this knowledge? By this
>>>>> model, as the number of boards increases without bounds, every "if"
>>>>> predicate tends to be Kconfig driven. That is not maintainable, and why
>>>>> I think the original change --and the proposed fixes-- are broken by design.
>>>>>
>>>>> Furthermore, I'm happy to discuss what to do about Falcon mode, and if
>>>>> we should kill it entirely (I have an alternative proposal).  But we
>>>>> shouldn't have that discussion in a manner holding my platform hostage.
>>>>> To be fair to you, I don't think reverting a 64-byte savings in .data
>>>>> will hold your platform hostage either.
>>>>
>>>> That original patch broke a bunch of OMAP boards, but enabling
>>>> SPL_ALLOC_BD was the solution to fix the issue.
>>>> Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
>>>> can make falcon mode imply it.
>>>
>>> It would be "select" since it needs it rather than imply.
>>>
>>
>> I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
>> Drop bd_info in the data section") breaks Gateworks Ventana and
>> defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
>> not being used and the breakage is because arch/arm/mach-imx/spl.c
>> dram_init_banksize() accesses gd->bd which is NULL.
>>
>> So I would guess that this probably broke a whole lot of IMX based
>> boards that use SPL.
>>
>> I don't quite know what the best solution is... we now have a v2021.04
>> that is broken for several or many boards and I dont' know if its
>> clear what cases break.
> 
> Looking forward, I think we need to rework this.  Simon, I gather you
> have some platforms where we need to set gd->bd to something that we
> malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> architectures / platforms could override as needed, but also move it
> just past mem_malloc_init().

Let's try and avoid making new weak functions. Why not introduce a 
spl_platform_alloc_bd(), that each plat- *must* implement? No diversion 
to Kconfig and no __weak__ required.

Alex
Tom Rini April 12, 2021, 2:40 p.m. UTC | #8
On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> 
> 
> On 4/12/21 8:25 AM, Tom Rini wrote:
> > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > 
> > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > 
> > > > > > Hi Simon
> > > > > > 
> > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > Hi Alexandru,
> > > > > > > 
> > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > 
> > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > 
> > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > which very few boards do.
> > > > > > > > 
> > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > 
> > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > 
> > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > 
> > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > 
> > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > 
> > > > > > 
> > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > last release, which is four days old t the time of this writing.
> > > > 
> > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > this was just before the release, yes, revert would be the right answer.
> > > > It's also the case the original commit fixes some cases while also
> > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > 
> > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > to revert.
> > > > > > 
> > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > 
> > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > will hold your platform hostage either.
> > > > > 
> > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > can make falcon mode imply it.
> > > > 
> > > > It would be "select" since it needs it rather than imply.
> > > > 
> > > 
> > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > dram_init_banksize() accesses gd->bd which is NULL.
> > > 
> > > So I would guess that this probably broke a whole lot of IMX based
> > > boards that use SPL.
> > > 
> > > I don't quite know what the best solution is... we now have a v2021.04
> > > that is broken for several or many boards and I dont' know if its
> > > clear what cases break.
> > 
> > Looking forward, I think we need to rework this.  Simon, I gather you
> > have some platforms where we need to set gd->bd to something that we
> > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > architectures / platforms could override as needed, but also move it
> > just past mem_malloc_init().
> 
> Let's try and avoid making new weak functions. Why not introduce a
> spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> Kconfig and no __weak__ required.

Well, who needs something different than what we had before exactly?
This shouldn't be kicked up to "every SoC must define...", but maybe
"every architecture must define..." is OK.  Or maybe it's really just "a
few unusual platforms need" and so a weak function makes sense.
Alexandru Gagniuc April 12, 2021, 3:21 p.m. UTC | #9
On 4/12/21 9:40 AM, Tom Rini wrote:
> On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
>>
>>
>> On 4/12/21 8:25 AM, Tom Rini wrote:
>>> On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
>>>> On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
>>>>>
>>>>> On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
>>>>>> On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Simon
>>>>>>>
>>>>>>> On 4/8/21 6:55 PM, Simon Glass wrote:
>>>>>>>> Hi Alexandru,
>>>>>>>>
>>>>>>>> On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
>>>>>>>>>
>>>>>>>>> struct global_data contains a pointer to the bd_info structure. This
>>>>>>>>> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
>>>>>>>>> ".data" section. The referenced commit replaced this mechanism to one
>>>>>>>>> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
>>>>>>>>> which very few boards do.
>>>>>>>>>
>>>>>>>>> The result is that (struct global_data)->bd is NULL in SPL on most
>>>>>>>>> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
>>>>>>>>> access (struct global_data)->bd and set the "/memory" node in the
>>>>>>>>> devicetree. The result is that the "/memory" node contains garbage
>>>>>>>>> values, causing linux to panic() as it sets up the page table.
>>>>>>>>>
>>>>>>>>> Instead of trying to fix the mess, potentially causing other issues,
>>>>>>>>> revert to the code that worked, while this change is reworked.
>>>>>>>>
>>>>>>>> The goal here is to drop a feature that few boards use and reduce
>>>>>>>> memory usage in SPL. It has been in place for two releases now.
>>>>>>>>
>>>>>>>> If Falcon mode needs it, perhaps you could add an 'imply' in the
>>>>>>>> Kconfig for that feature? Is there one? Or perhaps
>>>>>>>> CONFIG_ARCH_FIXUP_FDT_MEMORY ?
>>>>>>>>
>>>>>>>> One option would be to return an error in arch_fixup_fdt(). In
>>>>>>>> general, fixups make things tricky because there is no way to
>>>>>>>> determine when they are used but at least this one has a CONFIG.
>>>>>>>>
>>>>>>>
>>>>>>> The argument that this has been in place for two releases is incorrect.
>>>>>>> Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
>>>>>>> the v2021.04 release. It definitely was not in 2021.01. It's only in the
>>>>>>> last release, which is four days old t the time of this writing.
>>>>>
>>>>> Yes.  But another way of saying that is that we're 4 days in to the
>>>>> merge window.  That's a bit early to say we must revert the change.  If
>>>>> this was just before the release, yes, revert would be the right answer.
>>>>> It's also the case the original commit fixes some cases while also
>>>>> saving size, if I read it right.  So a strict revert isn't right, we'd
>>>>> need to also probably also default y SPL_ALLOC_BD in some cases.
>>>>>
>>>>>>> Although I was able to find one example, the reality is that we don't
>>>>>>> know the full extent of the breakage. The prudent thing at this point is
>>>>>>> to revert.
>>>>>>>
>>>>>>> The knowledge of how to init the platform is in the devicetree and code.
>>>>>>> Why should kconfig also be involved in storing this knowledge? By this
>>>>>>> model, as the number of boards increases without bounds, every "if"
>>>>>>> predicate tends to be Kconfig driven. That is not maintainable, and why
>>>>>>> I think the original change --and the proposed fixes-- are broken by design.
>>>>>>>
>>>>>>> Furthermore, I'm happy to discuss what to do about Falcon mode, and if
>>>>>>> we should kill it entirely (I have an alternative proposal).  But we
>>>>>>> shouldn't have that discussion in a manner holding my platform hostage.
>>>>>>> To be fair to you, I don't think reverting a 64-byte savings in .data
>>>>>>> will hold your platform hostage either.
>>>>>>
>>>>>> That original patch broke a bunch of OMAP boards, but enabling
>>>>>> SPL_ALLOC_BD was the solution to fix the issue.
>>>>>> Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
>>>>>> can make falcon mode imply it.
>>>>>
>>>>> It would be "select" since it needs it rather than imply.
>>>>>
>>>>
>>>> I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
>>>> Drop bd_info in the data section") breaks Gateworks Ventana and
>>>> defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
>>>> not being used and the breakage is because arch/arm/mach-imx/spl.c
>>>> dram_init_banksize() accesses gd->bd which is NULL.
>>>>
>>>> So I would guess that this probably broke a whole lot of IMX based
>>>> boards that use SPL.
>>>>
>>>> I don't quite know what the best solution is... we now have a v2021.04
>>>> that is broken for several or many boards and I dont' know if its
>>>> clear what cases break.
>>>
>>> Looking forward, I think we need to rework this.  Simon, I gather you
>>> have some platforms where we need to set gd->bd to something that we
>>> malloc() ?  So perhaps spl_set_bd() should have been __weak so that
>>> architectures / platforms could override as needed, but also move it
>>> just past mem_malloc_init().
>>
>> Let's try and avoid making new weak functions. Why not introduce a
>> spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
>> Kconfig and no __weak__ required.
> 
> Well, who needs something different than what we had before exactly?

I'm not sure. From reading the commit message of the broken change, it 
seems the main driver was to save 64 bytes on coral. I'm assuming coral 
is the codename for a platform.

> This shouldn't be kicked up to "every SoC must define...", but maybe
> "every architecture must define..." is OK.  Or maybe it's really just "a
> few unusual platforms need" and so a weak function makes sense.

We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe 
in weak functins, as I see them as the problems that "made sense at the 
time".

Alex
Tom Rini April 12, 2021, 4:26 p.m. UTC | #10
On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> 
> 
> On 4/12/21 9:40 AM, Tom Rini wrote:
> > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > 
> > > 
> > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > > > 
> > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > > > 
> > > > > > > > Hi Simon
> > > > > > > > 
> > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > Hi Alexandru,
> > > > > > > > > 
> > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > 
> > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > which very few boards do.
> > > > > > > > > > 
> > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > 
> > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > 
> > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > 
> > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > 
> > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > 
> > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > It's also the case the original commit fixes some cases while also
> > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > 
> > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > to revert.
> > > > > > > > 
> > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > 
> > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > will hold your platform hostage either.
> > > > > > > 
> > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > can make falcon mode imply it.
> > > > > > 
> > > > > > It would be "select" since it needs it rather than imply.
> > > > > > 
> > > > > 
> > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > 
> > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > boards that use SPL.
> > > > > 
> > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > that is broken for several or many boards and I dont' know if its
> > > > > clear what cases break.
> > > > 
> > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > have some platforms where we need to set gd->bd to something that we
> > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > architectures / platforms could override as needed, but also move it
> > > > just past mem_malloc_init().
> > > 
> > > Let's try and avoid making new weak functions. Why not introduce a
> > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > Kconfig and no __weak__ required.
> > 
> > Well, who needs something different than what we had before exactly?
> 
> I'm not sure. From reading the commit message of the broken change, it seems
> the main driver was to save 64 bytes on coral. I'm assuming coral is the
> codename for a platform.

Yes, coral is an x86 chromebook.  It also says:
    This uses up space in the SPL binary but it always starts as zero.
    Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).

and defaults the new feature to enabled on all x86.  And there's
a comment about x86 in the function (that with the changes made isn't so
helpful/correct?) now too.  So yes, I want to see what Simon 

> > This shouldn't be kicked up to "every SoC must define...", but maybe
> > "every architecture must define..." is OK.  Or maybe it's really just "a
> > few unusual platforms need" and so a weak function makes sense.
> 
> We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> weak functins, as I see them as the problems that "made sense at the time".

I see weak functions as a good solution when the problem is something
where the majority of the (cross-platform) cases require the same
solution, but there's one or two oddities out there.  But regardless, we
need to have the problem better defined before we can propose a
solution.
Simon Glass April 12, 2021, 5:49 p.m. UTC | #11
Hi Tom, Alex,

On Tue, 13 Apr 2021 at 04:26, Tom Rini <trini@konsulko.com> wrote:
>
> On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> >
> >
> > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > >
> > > >
> > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > > > >
> > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Simon
> > > > > > > > >
> > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > Hi Alexandru,
> > > > > > > > > >
> > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > >
> > > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > > which very few boards do.
> > > > > > > > > > >
> > > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > >
> > > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > >
> > > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > >
> > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > >
> > > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > >
> > > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > > It's also the case the original commit fixes some cases while also
> > > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > >
> > > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > > to revert.
> > > > > > > > >
> > > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > >
> > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > > will hold your platform hostage either.
> > > > > > > >
> > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > > can make falcon mode imply it.
> > > > > > >
> > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > >
> > > > > >
> > > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > >
> > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > boards that use SPL.
> > > > > >
> > > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > clear what cases break.
> > > > >
> > > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > > have some platforms where we need to set gd->bd to something that we
> > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > architectures / platforms could override as needed, but also move it
> > > > > just past mem_malloc_init().
> > > >
> > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > > Kconfig and no __weak__ required.
> > >
> > > Well, who needs something different than what we had before exactly?
> >
> > I'm not sure. From reading the commit message of the broken change, it seems
> > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > codename for a platform.
>
> Yes, coral is an x86 chromebook.  It also says:
>     This uses up space in the SPL binary but it always starts as zero.
>     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
>
> and defaults the new feature to enabled on all x86.  And there's
> a comment about x86 in the function (that with the changes made isn't so
> helpful/correct?) now too.  So yes, I want to see what Simon
>
> > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > few unusual platforms need" and so a weak function makes sense.
> >
> > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > weak functins, as I see them as the problems that "made sense at the time".
>
> I see weak functions as a good solution when the problem is something
> where the majority of the (cross-platform) cases require the same
> solution, but there's one or two oddities out there.  But regardless, we
> need to have the problem better defined before we can propose a
> solution.

So how about enabling the Kconfig for iMX? I am not sure how many
platforms do actually need this feature?

Another option might be to (just as a test) take the 'bd' out of
global_data when the Kconfig is not defined, to find out which
platforms use it?

As to a weak function, what would the default behaviour be? If we can
define that, then it would be better IMO.

When we try to refactor things, weak functions and undefined (or
arch-specific behaviour) can really make things tough.

Regards,
Simon
Tom Rini April 12, 2021, 6:18 p.m. UTC | #12
On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> Hi Tom, Alex,
> 
> On Tue, 13 Apr 2021 at 04:26, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > >
> > >
> > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > >
> > > > >
> > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Simon
> > > > > > > > > >
> > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > >
> > > > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > >
> > > > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > > >
> > > > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > > >
> > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > >
> > > > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > > >
> > > > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > > > It's also the case the original commit fixes some cases while also
> > > > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > > >
> > > > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > > > to revert.
> > > > > > > > > >
> > > > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > > >
> > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > > > will hold your platform hostage either.
> > > > > > > > >
> > > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > > > can make falcon mode imply it.
> > > > > > > >
> > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > >
> > > > > > >
> > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > >
> > > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > > boards that use SPL.
> > > > > > >
> > > > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > > clear what cases break.
> > > > > >
> > > > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > > > have some platforms where we need to set gd->bd to something that we
> > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > > architectures / platforms could override as needed, but also move it
> > > > > > just past mem_malloc_init().
> > > > >
> > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > > > Kconfig and no __weak__ required.
> > > >
> > > > Well, who needs something different than what we had before exactly?
> > >
> > > I'm not sure. From reading the commit message of the broken change, it seems
> > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > codename for a platform.
> >
> > Yes, coral is an x86 chromebook.  It also says:
> >     This uses up space in the SPL binary but it always starts as zero.
> >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> >
> > and defaults the new feature to enabled on all x86.  And there's
> > a comment about x86 in the function (that with the changes made isn't so
> > helpful/correct?) now too.  So yes, I want to see what Simon
> >
> > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > > few unusual platforms need" and so a weak function makes sense.
> > >
> > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > > weak functins, as I see them as the problems that "made sense at the time".
> >
> > I see weak functions as a good solution when the problem is something
> > where the majority of the (cross-platform) cases require the same
> > solution, but there's one or two oddities out there.  But regardless, we
> > need to have the problem better defined before we can propose a
> > solution.
> 
> So how about enabling the Kconfig for iMX? I am not sure how many
> platforms do actually need this feature?

That was an OK answer for the first group we hit this on.

> Another option might be to (just as a test) take the 'bd' out of
> global_data when the Kconfig is not defined, to find out which
> platforms use it?

And that might help us figure out the second grouping of platforms, yes.

> As to a weak function, what would the default behaviour be? If we can
> define that, then it would be better IMO.
> 
> When we try to refactor things, weak functions and undefined (or
> arch-specific behaviour) can really make things tough.

Well, what was the problem you were trying to solve here?  I assumed
there was a case where things actively broke, with the previous design,
and it's not just the 64byte savings in some cases.  But is it?
Simon Glass April 12, 2021, 6:26 p.m. UTC | #13
Hi Tom,

On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
>
> On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> > Hi Tom, Alex,
> >
> > On Tue, 13 Apr 2021 at 04:26, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > > >
> > > >
> > > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > > >
> > > > > >
> > > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Simon
> > > > > > > > > > >
> > > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > > >
> > > > > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > > > >
> > > > > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > > > >
> > > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > > >
> > > > > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > > > >
> > > > > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > > > > It's also the case the original commit fixes some cases while also
> > > > > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > > > >
> > > > > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > > > > to revert.
> > > > > > > > > > >
> > > > > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > > > >
> > > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > > > > will hold your platform hostage either.
> > > > > > > > > >
> > > > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > > > > can make falcon mode imply it.
> > > > > > > > >
> > > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > > >
> > > > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > > > boards that use SPL.
> > > > > > > >
> > > > > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > > > clear what cases break.
> > > > > > >
> > > > > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > > > > have some platforms where we need to set gd->bd to something that we
> > > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > > > architectures / platforms could override as needed, but also move it
> > > > > > > just past mem_malloc_init().
> > > > > >
> > > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > > > > Kconfig and no __weak__ required.
> > > > >
> > > > > Well, who needs something different than what we had before exactly?
> > > >
> > > > I'm not sure. From reading the commit message of the broken change, it seems
> > > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > > codename for a platform.
> > >
> > > Yes, coral is an x86 chromebook.  It also says:
> > >     This uses up space in the SPL binary but it always starts as zero.
> > >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> > >
> > > and defaults the new feature to enabled on all x86.  And there's
> > > a comment about x86 in the function (that with the changes made isn't so
> > > helpful/correct?) now too.  So yes, I want to see what Simon
> > >
> > > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > > > few unusual platforms need" and so a weak function makes sense.
> > > >
> > > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > > > weak functins, as I see them as the problems that "made sense at the time".
> > >
> > > I see weak functions as a good solution when the problem is something
> > > where the majority of the (cross-platform) cases require the same
> > > solution, but there's one or two oddities out there.  But regardless, we
> > > need to have the problem better defined before we can propose a
> > > solution.
> >
> > So how about enabling the Kconfig for iMX? I am not sure how many
> > platforms do actually need this feature?
>
> That was an OK answer for the first group we hit this on.
>
> > Another option might be to (just as a test) take the 'bd' out of
> > global_data when the Kconfig is not defined, to find out which
> > platforms use it?
>
> And that might help us figure out the second grouping of platforms, yes.

OK I will take a look at that.

>
> > As to a weak function, what would the default behaviour be? If we can
> > define that, then it would be better IMO.
> >
> > When we try to refactor things, weak functions and undefined (or
> > arch-specific behaviour) can really make things tough.
>
> Well, what was the problem you were trying to solve here?  I assumed
> there was a case where things actively broke, with the previous design,
> and it's not just the 64byte savings in some cases.  But is it?

Yes the region of memory is not writable, so must be allocated at runtime.

Regards,
SImon
Tom Rini April 12, 2021, 6:38 p.m. UTC | #14
On Tue, Apr 13, 2021 at 06:26:08AM +1200, Simon Glass wrote:
> Hi Tom,
> 
> On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
[snip]
> > > As to a weak function, what would the default behaviour be? If we can
> > > define that, then it would be better IMO.
> > >
> > > When we try to refactor things, weak functions and undefined (or
> > > arch-specific behaviour) can really make things tough.
> >
> > Well, what was the problem you were trying to solve here?  I assumed
> > there was a case where things actively broke, with the previous design,
> > and it's not just the 64byte savings in some cases.  But is it?
> 
> Yes the region of memory is not writable, so must be allocated at runtime.

Where, on x86?  Some ARM cases?  That's one of the other things to sort
out here.
Simon Glass April 12, 2021, 6:44 p.m. UTC | #15
Hi Tom,

On Tue, 13 Apr 2021 at 06:38, Tom Rini <trini@konsulko.com> wrote:
>
> On Tue, Apr 13, 2021 at 06:26:08AM +1200, Simon Glass wrote:
> > Hi Tom,
> >
> > On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> [snip]
> > > > As to a weak function, what would the default behaviour be? If we can
> > > > define that, then it would be better IMO.
> > > >
> > > > When we try to refactor things, weak functions and undefined (or
> > > > arch-specific behaviour) can really make things tough.
> > >
> > > Well, what was the problem you were trying to solve here?  I assumed
> > > there was a case where things actively broke, with the previous design,
> > > and it's not just the 64byte savings in some cases.  But is it?
> >
> > Yes the region of memory is not writable, so must be allocated at runtime.
>
> Where, on x86?  Some ARM cases?  That's one of the other things to sort
> out here.

Yes, on coral and likely newer things to come...For ARM I don't know
of any such problems.

Regards,
Simon
Tim Harvey April 16, 2021, 8:40 p.m. UTC | #16
On Mon, Apr 12, 2021 at 11:44 AM Simon Glass <sjg@chromium.org> wrote:
>
> Hi Tom,
>
> On Tue, 13 Apr 2021 at 06:38, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 06:26:08AM +1200, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
> > > >
> > > > On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> > [snip]
> > > > > As to a weak function, what would the default behaviour be? If we can
> > > > > define that, then it would be better IMO.
> > > > >
> > > > > When we try to refactor things, weak functions and undefined (or
> > > > > arch-specific behaviour) can really make things tough.
> > > >
> > > > Well, what was the problem you were trying to solve here?  I assumed
> > > > there was a case where things actively broke, with the previous design,
> > > > and it's not just the 64byte savings in some cases.  But is it?
> > >
> > > Yes the region of memory is not writable, so must be allocated at runtime.
> >
> > Where, on x86?  Some ARM cases?  That's one of the other things to sort
> > out here.
>
> Yes, on coral and likely newer things to come...For ARM I don't know
> of any such problems.
>

I'm not sure I understand if there is agreement on a solution to this
patch breaking several or many boards? I count 58 IMX6 boards using
SPL and none of them currently define CONFIG_SPL_ALLOC_BD=y. It sounds
like Adam said OMAP boards were broken as well and I'm not clear if
those boards are fixed yet either.

Best regards,

Tim
Adam Ford April 16, 2021, 11:16 p.m. UTC | #17
On Fri, Apr 16, 2021 at 3:41 PM Tim Harvey <tharvey@gateworks.com> wrote:
>
> On Mon, Apr 12, 2021 at 11:44 AM Simon Glass <sjg@chromium.org> wrote:
> >
> > Hi Tom,
> >
> > On Tue, 13 Apr 2021 at 06:38, Tom Rini <trini@konsulko.com> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 06:26:08AM +1200, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
> > > > >
> > > > > On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> > > [snip]
> > > > > > As to a weak function, what would the default behaviour be? If we can
> > > > > > define that, then it would be better IMO.
> > > > > >
> > > > > > When we try to refactor things, weak functions and undefined (or
> > > > > > arch-specific behaviour) can really make things tough.
> > > > >
> > > > > Well, what was the problem you were trying to solve here?  I assumed
> > > > > there was a case where things actively broke, with the previous design,
> > > > > and it's not just the 64byte savings in some cases.  But is it?
> > > >
> > > > Yes the region of memory is not writable, so must be allocated at runtime.
> > >
> > > Where, on x86?  Some ARM cases?  That's one of the other things to sort
> > > out here.
> >
> > Yes, on coral and likely newer things to come...For ARM I don't know
> > of any such problems.
> >
>
> I'm not sure I understand if there is agreement on a solution to this
> patch breaking several or many boards? I count 58 IMX6 boards using
> SPL and none of them currently define CONFIG_SPL_ALLOC_BD=y. It sounds
> like Adam said OMAP boards were broken as well and I'm not clear if
> those boards are fixed yet either.

I have already submitted a patch for the OMAP boards that I maintain
to address the issue.  I wonder if it would make sense to make these
architectures select CONFIG_SPL_ALLOC_BD automatically instead of
having a bunch of individual boards enable it.  I have not checked any
of the IMX8M or IMX6 boards that I maintain, but I am watching this
thread.  I can test the CONFIG_SPL_ALLOC_BD on my boards if people
think there is value.

adam

>
> Best regards,
>
> Tim
Alexandru Gagniuc April 19, 2021, 2:26 a.m. UTC | #18
On 4/16/21 6:16 PM, Adam Ford wrote:
> On Fri, Apr 16, 2021 at 3:41 PM Tim Harvey <tharvey@gateworks.com> wrote:
>>
>> On Mon, Apr 12, 2021 at 11:44 AM Simon Glass <sjg@chromium.org> wrote:
>>>
>>> Hi Tom,
>>>
>>> On Tue, 13 Apr 2021 at 06:38, Tom Rini <trini@konsulko.com> wrote:
>>>>
>>>> On Tue, Apr 13, 2021 at 06:26:08AM +1200, Simon Glass wrote:
>>>>> Hi Tom,
>>>>>
>>>>> On Tue, 13 Apr 2021 at 06:18, Tom Rini <trini@konsulko.com> wrote:
>>>>>>
>>>>>> On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
>>>> [snip]
>>>>>>> As to a weak function, what would the default behaviour be? If we can
>>>>>>> define that, then it would be better IMO.
>>>>>>>
>>>>>>> When we try to refactor things, weak functions and undefined (or
>>>>>>> arch-specific behaviour) can really make things tough.
>>>>>>
>>>>>> Well, what was the problem you were trying to solve here?  I assumed
>>>>>> there was a case where things actively broke, with the previous design,
>>>>>> and it's not just the 64byte savings in some cases.  But is it?
>>>>>
>>>>> Yes the region of memory is not writable, so must be allocated at runtime.
>>>>
>>>> Where, on x86?  Some ARM cases?  That's one of the other things to sort
>>>> out here.
>>>
>>> Yes, on coral and likely newer things to come...For ARM I don't know
>>> of any such problems.
>>>
>>
>> I'm not sure I understand if there is agreement on a solution to this
>> patch breaking several or many boards? I count 58 IMX6 boards using
>> SPL and none of them currently define CONFIG_SPL_ALLOC_BD=y. It sounds
>> like Adam said OMAP boards were broken as well and I'm not clear if
>> those boards are fixed yet either.
> 
> I have already submitted a patch for the OMAP boards that I maintain
> to address the issue.  I wonder if it would make sense to make these
> architectures select CONFIG_SPL_ALLOC_BD automatically instead of
> having a bunch of individual boards enable it.  I have not checked any
> of the IMX8M or IMX6 boards that I maintain, but I am watching this
> thread.  I can test the CONFIG_SPL_ALLOC_BD on my boards if people
> think there is value.
> 

I've been thinking about what Simon said -- that the memory region is 
not writable on x86. But that memory is in .data, which should be 
writable. It doesn't make sense.

Alex
Tom Rini April 19, 2021, 3:32 p.m. UTC | #19
On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> Hi Tom, Alex,
> 
> On Tue, 13 Apr 2021 at 04:26, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > >
> > >
> > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > >
> > > > >
> > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <trini@konsulko.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Simon
> > > > > > > > > >
> > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > >
> > > > > > > > > > > > struct global_data contains a pointer to the bd_info structure. This
> > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> > > > > > > > > > > > ".data" section. The referenced commit replaced this mechanism to one
> > > > > > > > > > > > that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > >
> > > > > > > > > > > > The result is that (struct global_data)->bd is NULL in SPL on most
> > > > > > > > > > > > platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" node in the
> > > > > > > > > > > > devicetree. The result is that the "/memory" node contains garbage
> > > > > > > > > > > > values, causing linux to panic() as it sets up the page table.
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of trying to fix the mess, potentially causing other issues,
> > > > > > > > > > > > revert to the code that worked, while this change is reworked.
> > > > > > > > > > >
> > > > > > > > > > > The goal here is to drop a feature that few boards use and reduce
> > > > > > > > > > > memory usage in SPL. It has been in place for two releases now.
> > > > > > > > > > >
> > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' in the
> > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > >
> > > > > > > > > > > One option would be to return an error in arch_fixup_fdt(). In
> > > > > > > > > > > general, fixups make things tricky because there is no way to
> > > > > > > > > > > determine when they are used but at least this one has a CONFIG.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The argument that this has been in place for two releases is incorrect.
> > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
> > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. It's only in the
> > > > > > > > > > last release, which is four days old t the time of this writing.
> > > > > > > >
> > > > > > > > Yes.  But another way of saying that is that we're 4 days in to the
> > > > > > > > merge window.  That's a bit early to say we must revert the change.  If
> > > > > > > > this was just before the release, yes, revert would be the right answer.
> > > > > > > > It's also the case the original commit fixes some cases while also
> > > > > > > > saving size, if I read it right.  So a strict revert isn't right, we'd
> > > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > > >
> > > > > > > > > > Although I was able to find one example, the reality is that we don't
> > > > > > > > > > know the full extent of the breakage. The prudent thing at this point is
> > > > > > > > > > to revert.
> > > > > > > > > >
> > > > > > > > > > The knowledge of how to init the platform is in the devicetree and code.
> > > > > > > > > > Why should kconfig also be involved in storing this knowledge? By this
> > > > > > > > > > model, as the number of boards increases without bounds, every "if"
> > > > > > > > > > predicate tends to be Kconfig driven. That is not maintainable, and why
> > > > > > > > > > I think the original change --and the proposed fixes-- are broken by design.
> > > > > > > > > >
> > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon mode, and if
> > > > > > > > > > we should kill it entirely (I have an alternative proposal).  But we
> > > > > > > > > > shouldn't have that discussion in a manner holding my platform hostage.
> > > > > > > > > > To be fair to you, I don't think reverting a 64-byte savings in .data
> > > > > > > > > > will hold your platform hostage either.
> > > > > > > > >
> > > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
> > > > > > > > > can make falcon mode imply it.
> > > > > > > >
> > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > >
> > > > > > >
> > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
> > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
> > > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > >
> > > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > > boards that use SPL.
> > > > > > >
> > > > > > > I don't quite know what the best solution is... we now have a v2021.04
> > > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > > clear what cases break.
> > > > > >
> > > > > > Looking forward, I think we need to rework this.  Simon, I gather you
> > > > > > have some platforms where we need to set gd->bd to something that we
> > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > > architectures / platforms could override as needed, but also move it
> > > > > > just past mem_malloc_init().
> > > > >
> > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to
> > > > > Kconfig and no __weak__ required.
> > > >
> > > > Well, who needs something different than what we had before exactly?
> > >
> > > I'm not sure. From reading the commit message of the broken change, it seems
> > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > codename for a platform.
> >
> > Yes, coral is an x86 chromebook.  It also says:
> >     This uses up space in the SPL binary but it always starts as zero.
> >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> >
> > and defaults the new feature to enabled on all x86.  And there's
> > a comment about x86 in the function (that with the changes made isn't so
> > helpful/correct?) now too.  So yes, I want to see what Simon
> >
> > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > > few unusual platforms need" and so a weak function makes sense.
> > >
> > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > > weak functins, as I see them as the problems that "made sense at the time".
> >
> > I see weak functions as a good solution when the problem is something
> > where the majority of the (cross-platform) cases require the same
> > solution, but there's one or two oddities out there.  But regardless, we
> > need to have the problem better defined before we can propose a
> > solution.
> 
> So how about enabling the Kconfig for iMX? I am not sure how many
> platforms do actually need this feature?
> 
> Another option might be to (just as a test) take the 'bd' out of
> global_data when the Kconfig is not defined, to find out which
> platforms use it?

I started down the path of seeing what fails to build if in SPL we don't
have gd->bd, but well, then we get back to the OP, where Alex notes it's
all cases of falcon mode.  And Tim noted that i.MX is widely broken due
to setting up memory info as well more generally.  It looks like
Rockchip hits that and Marvell and probably a few others.

So I think the right answer is the revert, followed by figuring out how
to let x86 instead have a malloc'd one.  Perhaps spl_board_init() is
OK?
Tom Rini April 19, 2021, 5:33 p.m. UTC | #20
On Thu, Apr 08, 2021 at 11:56:11AM -0500, Alexandru Gagniuc wrote:

> This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> 
> struct global_data contains a pointer to the bd_info structure. This
> pointer was populated spl_set_bd() to a pre-allocated bd_info in the
> ".data" section. The referenced commit replaced this mechanism to one
> that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
> which very few boards do.
> 
> The result is that (struct global_data)->bd is NULL in SPL on most
> platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
> access (struct global_data)->bd and set the "/memory" node in the
> devicetree. The result is that the "/memory" node contains garbage
> values, causing linux to panic() as it sets up the page table.
> 
> Instead of trying to fix the mess, potentially causing other issues,
> revert to the code that worked, while this change is reworked.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>

Applied to u-boot/master, thanks!
diff mbox series

Patch

diff --git a/arch/arm/cpu/armv8/fsl-layerscape/spl.c b/arch/arm/cpu/armv8/fsl-layerscape/spl.c
index d5131bcf4b..7d594a9f74 100644
--- a/arch/arm/cpu/armv8/fsl-layerscape/spl.c
+++ b/arch/arm/cpu/armv8/fsl-layerscape/spl.c
@@ -39,9 +39,6 @@  u32 spl_boot_device(void)
 
 #ifdef CONFIG_SPL_BUILD
 
-/* Define board data structure */
-static struct bd_info bdata __attribute__ ((section(".data")));
-
 void spl_board_init(void)
 {
 #if defined(CONFIG_NXP_ESBC) && defined(CONFIG_FSL_LSCH2)
@@ -78,7 +75,7 @@  void board_init_f(ulong dummy)
 	get_clocks();
 
 	preloader_console_init();
-	gd->bd = &bdata;
+	spl_set_bd();
 
 #ifdef CONFIG_SYS_I2C
 #ifdef CONFIG_SPL_I2C_SUPPORT
diff --git a/common/spl/Kconfig b/common/spl/Kconfig
index 0711cbf951..75b4f45c01 100644
--- a/common/spl/Kconfig
+++ b/common/spl/Kconfig
@@ -113,15 +113,6 @@  config SPL_FSL_PBL
 	  Create boot binary having SPL binary in PBI format concatenated with
 	  u-boot binary.
 
-config SPL_ALLOC_BD
-	bool "Allocate memory for bd_info"
-	default y if X86 || SANDBOX
-	help
-	  Some boards don't allocate space for this in their board_init_f()
-	  code. In this case U-Boot can allocate space for gd->bd in the
-	  standard SPL flow (board_init_r()). Enable this option to support
-	  this feature.
-
 endmenu
 
 config HANDOFF
diff --git a/common/spl/spl.c b/common/spl/spl.c
index 556a91ab53..fb37d71959 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -55,6 +55,9 @@  binman_sym_declare(ulong, spl, image_pos);
 binman_sym_declare(ulong, spl, size);
 #endif
 
+/* Define board data structure */
+static struct bd_info bdata __attribute__ ((section(".data")));
+
 /*
  * Board-specific Platform code can reimplement show_boot_progress () if needed
  */
@@ -464,19 +467,14 @@  static int spl_common_init(bool setup_malloc)
 	return 0;
 }
 
-int spl_alloc_bd(void)
+void spl_set_bd(void)
 {
 	/*
 	 * NOTE: On some platforms (e.g. x86) bdata may be in flash and not
 	 * writeable.
 	 */
-	if (!gd->bd) {
-		gd->bd = malloc(sizeof(*gd->bd));
-		if (!gd->bd)
-			return -ENOMEM;
-	}
-
-	return 0;
+	if (!gd->bd)
+		gd->bd = &bdata;
 }
 
 int spl_early_init(void)
@@ -626,6 +624,8 @@  void board_init_r(gd_t *dummy1, ulong dummy2)
 
 	debug(">>" SPL_TPL_PROMPT "board_init_r()\n");
 
+	spl_set_bd();
+
 #if defined(CONFIG_SYS_SPL_MALLOC_START)
 	mem_malloc_init(CONFIG_SYS_SPL_MALLOC_START,
 			CONFIG_SYS_SPL_MALLOC_SIZE);
@@ -635,10 +635,6 @@  void board_init_r(gd_t *dummy1, ulong dummy2)
 		if (spl_init())
 			hang();
 	}
-	if (IS_ENABLED(CONFIG_SPL_ALLOC_BD) && spl_alloc_bd()) {
-		puts("Cannot alloc bd\n");
-		hang();
-	}
 #if !defined(CONFIG_PPC) && !defined(CONFIG_ARCH_MX6)
 	/*
 	 * timer_init() does not exist on PPC systems. The timer is initialized
diff --git a/include/spl.h b/include/spl.h
index 4f6e0e53f5..cee9a42ddb 100644
--- a/include/spl.h
+++ b/include/spl.h
@@ -357,15 +357,7 @@  u32 spl_mmc_boot_mode(const u32 boot_device);
  * If not overridden, it is weakly defined in common/spl/spl_mmc.c.
  */
 int spl_mmc_boot_partition(const u32 boot_device);
-
-/**
- * spl_alloc_bd() - Allocate space for bd_info
- *
- * This sets up the gd->bd pointer by allocating memory for it
- *
- * @return 0 if OK, -ENOMEM if out of memory
- */
-int spl_alloc_bd(void);
+void spl_set_bd(void);
 
 /**
  * spl_set_header_raw_uboot() - Set up a standard SPL image structure