diff mbox

[U-Boot,1/2] armv8: add hooks for all cache-wide operations

Message ID 20161017213540.5984-1-swarren@wwwdotorg.org
State Superseded
Delegated to: Tom Warren
Headers show

Commit Message

Stephen Warren Oct. 17, 2016, 9:35 p.m. UTC
From: Stephen Warren <swarren@nvidia.com>

SoC-specific logic may be required for all forms of cache-wide
operations; invalidate and flush of both dcache and icache (note that
only 3 of the 4 possible combinations make sense, since the icache never
contains dirty lines). This patch adds an optional hook for all
implemented cache-wide operations, and renames the one existing hook to
better represent exactly which operation it is implementing. A dummy
no-op implementation of each hook is provided. These dummy
implementations are moved into C code, since there's no need to
implement them in assembly.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
 arch/arm/cpu/armv8/cache.S                   |  6 ------
 arch/arm/cpu/armv8/cache_v8.c                | 23 ++++++++++++++++++++---
 arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
 arch/arm/include/asm/system.h                |  5 ++++-
 arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
 5 files changed, 27 insertions(+), 13 deletions(-)

Comments

York Sun Oct. 18, 2016, 3:28 p.m. UTC | #1
On 10/17/2016 04:35 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> SoC-specific logic may be required for all forms of cache-wide
> operations; invalidate and flush of both dcache and icache (note that
> only 3 of the 4 possible combinations make sense, since the icache never
> contains dirty lines). This patch adds an optional hook for all
> implemented cache-wide operations, and renames the one existing hook to
> better represent exactly which operation it is implementing. A dummy
> no-op implementation of each hook is provided. These dummy
> implementations are moved into C code, since there's no need to
> implement them in assembly.
>
Stephen,

Moving this function to C may pose an issue. I had a debug a couple of 
years ago that calling a C function put the stack into cache after 
flushing L1/L2. That's why I used asm function to flush L3.

York
Stephen Warren Oct. 18, 2016, 4:14 p.m. UTC | #2
On 10/18/2016 09:28 AM, york sun wrote:
> On 10/17/2016 04:35 PM, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> SoC-specific logic may be required for all forms of cache-wide
>> operations; invalidate and flush of both dcache and icache (note that
>> only 3 of the 4 possible combinations make sense, since the icache never
>> contains dirty lines). This patch adds an optional hook for all
>> implemented cache-wide operations, and renames the one existing hook to
>> better represent exactly which operation it is implementing. A dummy
>> no-op implementation of each hook is provided. These dummy
>> implementations are moved into C code, since there's no need to
>> implement them in assembly.
>>
> Stephen,
>
> Moving this function to C may pose an issue. I had a debug a couple of
> years ago that calling a C function put the stack into cache after
> flushing L1/L2. That's why I used asm function to flush L3.

Assuming the stack is located in cachable memory, the CPU is free (per 
the definition of the ARM architecture) to pull it into the cache at any 
time the cache is enabled (and perhaps even when it isn't enabled, at 
the very least for the icache on ARMv8 if not other cases too). 
Implementation in C vs. assembly has absolutely no effect here. I guess 
your statement assumes that C functions will write data to the stack and 
assembly functions never will. There's no strict 1:1 correlation between 
those two things; assembly code can touch the stack just like C code. If 
there's an assumption it won't, it needs to be documented in the header 
defining these hook functions.

I assume you're specifically talking about dirtying the dcache between 
the point when dcache flushing starts and the point when the dcache is 
disabled? If so, flush_dcache_all() itself would have to be manually 
coded in assembly to avoid using the stack, as would dcache_disable() 
and set_sctlr(). I think this is why dcache_disable() currently disables 
the dcache first (thus preventing it acquiring new dirty data) and then 
flushes the dcache afterwards (thus guaranteeing that all dirty data is 
flushed with no race condition). This implies that your change to swap 
the order of those two functions isn't correct. I'm pretty sure I'm 
correct in saying that the dcache can hit even if it's disabled, hence 
disabling the dcache while it contains dirty data won't lead to issues?
Simon Glass Oct. 18, 2016, 4:23 p.m. UTC | #3
Hi Stephen,

On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org> wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> SoC-specific logic may be required for all forms of cache-wide
> operations; invalidate and flush of both dcache and icache (note that
> only 3 of the 4 possible combinations make sense, since the icache never
> contains dirty lines). This patch adds an optional hook for all
> implemented cache-wide operations, and renames the one existing hook to
> better represent exactly which operation it is implementing. A dummy
> no-op implementation of each hook is provided. These dummy
> implementations are moved into C code, since there's no need to
> implement them in assembly.
>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---
>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>  arch/arm/cpu/armv8/cache_v8.c                | 23 ++++++++++++++++++++---
>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>  arch/arm/include/asm/system.h                |  5 ++++-
>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>  5 files changed, 27 insertions(+), 13 deletions(-)
>

I think we should have a proper interface for this stuff rather than
weak functions. Maybe we need a linker-list approach, or a cache
uclass?

Regards,
Simon
York Sun Oct. 18, 2016, 6:40 p.m. UTC | #4
On 10/18/2016 11:14 AM, Stephen Warren wrote:
> On 10/18/2016 09:28 AM, york sun wrote:
>> On 10/17/2016 04:35 PM, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> SoC-specific logic may be required for all forms of cache-wide
>>> operations; invalidate and flush of both dcache and icache (note that
>>> only 3 of the 4 possible combinations make sense, since the icache never
>>> contains dirty lines). This patch adds an optional hook for all
>>> implemented cache-wide operations, and renames the one existing hook to
>>> better represent exactly which operation it is implementing. A dummy
>>> no-op implementation of each hook is provided. These dummy
>>> implementations are moved into C code, since there's no need to
>>> implement them in assembly.
>>>
>> Stephen,
>>
>> Moving this function to C may pose an issue. I had a debug a couple of
>> years ago that calling a C function put the stack into cache after
>> flushing L1/L2. That's why I used asm function to flush L3.
>
> Assuming the stack is located in cachable memory, the CPU is free (per
> the definition of the ARM architecture) to pull it into the cache at any
> time the cache is enabled (and perhaps even when it isn't enabled, at
> the very least for the icache on ARMv8 if not other cases too).
> Implementation in C vs. assembly has absolutely no effect here. I guess
> your statement assumes that C functions will write data to the stack and
> assembly functions never will. There's no strict 1:1 correlation between
> those two things; assembly code can touch the stack just like C code. If
> there's an assumption it won't, it needs to be documented in the header
> defining these hook functions.
>
> I assume you're specifically talking about dirtying the dcache between
> the point when dcache flushing starts and the point when the dcache is
> disabled? If so, flush_dcache_all() itself would have to be manually
> coded in assembly to avoid using the stack, as would dcache_disable()
> and set_sctlr(). I think this is why dcache_disable() currently disables
> the dcache first (thus preventing it acquiring new dirty data) and then
> flushes the dcache afterwards (thus guaranteeing that all dirty data is
> flushed with no race condition). This implies that your change to swap
> the order of those two functions isn't correct. I'm pretty sure I'm

I wonder if David can shed some light on the original order of calls to 
disable dcache.

> correct in saying that the dcache can hit even if it's disabled, hence
> disabling the dcache while it contains dirty data won't lead to issues?
>

My earlier debug was based on the original order of calls. I found I had 
to avoid using the stack before flushing L3. Now with the changed order, 
I haven't tested. But I can image the stack will be dirty and flushing 
L3 may or may not push the data into main memory (depending on the L3 
implementation whether inclusive or not).

You said you are sure dcache can hit even if it is disabled. Can you 
explain more? My test shows as soon as the d-cache is disabled, the core 
cannot get the data in dirty cache.

York
Stephen Warren Oct. 18, 2016, 6:58 p.m. UTC | #5
On 10/18/2016 10:23 AM, Simon Glass wrote:
> Hi Stephen,
>
> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> SoC-specific logic may be required for all forms of cache-wide
>> operations; invalidate and flush of both dcache and icache (note that
>> only 3 of the 4 possible combinations make sense, since the icache never
>> contains dirty lines). This patch adds an optional hook for all
>> implemented cache-wide operations, and renames the one existing hook to
>> better represent exactly which operation it is implementing. A dummy
>> no-op implementation of each hook is provided. These dummy
>> implementations are moved into C code, since there's no need to
>> implement them in assembly.
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>> ---
>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>  arch/arm/cpu/armv8/cache_v8.c                | 23 ++++++++++++++++++++---
>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>  arch/arm/include/asm/system.h                |  5 ++++-
>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>
>
> I think we should have a proper interface for this stuff rather than
> weak functions. Maybe we need a linker-list approach, or a cache
> uclass?

What's improper about this interface? Presumably we could argue that no 
function in the entire of U-Boot be called by symbol name, which doesn't 
seem useful.

I'm not sure exactly what you envisage by a linker-list approach. Can 
you provide some background? I understand how the linker can construct 
list of objects/implementations/..., but that doesn't seem useful here 
since there's a known-ahead-of-time single implementation of these 
functions in a single build of U-Boot.

A cache uclass seems like /massive/ overkill, especially since I'd 
expect these very low-level functions to be required well before any 
higher level code like DM/classes/... to be available.
Stephen Warren Oct. 18, 2016, 7:01 p.m. UTC | #6
On 10/18/2016 12:40 PM, york sun wrote:
> On 10/18/2016 11:14 AM, Stephen Warren wrote:
>> On 10/18/2016 09:28 AM, york sun wrote:
>>> On 10/17/2016 04:35 PM, Stephen Warren wrote:
>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>
>>>> SoC-specific logic may be required for all forms of cache-wide
>>>> operations; invalidate and flush of both dcache and icache (note that
>>>> only 3 of the 4 possible combinations make sense, since the icache never
>>>> contains dirty lines). This patch adds an optional hook for all
>>>> implemented cache-wide operations, and renames the one existing hook to
>>>> better represent exactly which operation it is implementing. A dummy
>>>> no-op implementation of each hook is provided. These dummy
>>>> implementations are moved into C code, since there's no need to
>>>> implement them in assembly.
>>>>
>>> Stephen,
>>>
>>> Moving this function to C may pose an issue. I had a debug a couple of
>>> years ago that calling a C function put the stack into cache after
>>> flushing L1/L2. That's why I used asm function to flush L3.
>>
>> Assuming the stack is located in cachable memory, the CPU is free (per
>> the definition of the ARM architecture) to pull it into the cache at any
>> time the cache is enabled (and perhaps even when it isn't enabled, at
>> the very least for the icache on ARMv8 if not other cases too).
>> Implementation in C vs. assembly has absolutely no effect here. I guess
>> your statement assumes that C functions will write data to the stack and
>> assembly functions never will. There's no strict 1:1 correlation between
>> those two things; assembly code can touch the stack just like C code. If
>> there's an assumption it won't, it needs to be documented in the header
>> defining these hook functions.
>>
>> I assume you're specifically talking about dirtying the dcache between
>> the point when dcache flushing starts and the point when the dcache is
>> disabled? If so, flush_dcache_all() itself would have to be manually
>> coded in assembly to avoid using the stack, as would dcache_disable()
>> and set_sctlr(). I think this is why dcache_disable() currently disables
>> the dcache first (thus preventing it acquiring new dirty data) and then
>> flushes the dcache afterwards (thus guaranteeing that all dirty data is
>> flushed with no race condition). This implies that your change to swap
>> the order of those two functions isn't correct. I'm pretty sure I'm
>
> I wonder if David can shed some light on the original order of calls to
> disable dcache.
>
>> correct in saying that the dcache can hit even if it's disabled, hence
>> disabling the dcache while it contains dirty data won't lead to issues?
>>
>
> My earlier debug was based on the original order of calls. I found I had
> to avoid using the stack before flushing L3. Now with the changed order,
> I haven't tested. But I can image the stack will be dirty and flushing
> L3 may or may not push the data into main memory (depending on the L3
> implementation whether inclusive or not).
>
> You said you are sure dcache can hit even if it is disabled. Can you
> explain more? My test shows as soon as the d-cache is disabled, the core
> cannot get the data in dirty cache.

By "hit" here, I mean that even with the dcache disabled, when the CPU 
performs a read access, if the dcache contains a copy of that data, it 
can return it rather than requiring it to be fetched from DRAM.

Yes, with the dcache disabled, I would not expect any writes to allocate 
new lines in the cache (although presumably writes would update any 
lines already there, in a write-though sense).

At least, I'm pretty sure this is all true. It seems the only way to 
allow switching from cache-on to cache-off state without losing dirty data.
Simon Glass Oct. 18, 2016, 7:03 p.m. UTC | #7
Hi Stephen,

On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>
>> Hi Stephen,
>>
>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> SoC-specific logic may be required for all forms of cache-wide
>>> operations; invalidate and flush of both dcache and icache (note that
>>> only 3 of the 4 possible combinations make sense, since the icache never
>>> contains dirty lines). This patch adds an optional hook for all
>>> implemented cache-wide operations, and renames the one existing hook to
>>> better represent exactly which operation it is implementing. A dummy
>>> no-op implementation of each hook is provided. These dummy
>>> implementations are moved into C code, since there's no need to
>>> implement them in assembly.
>>>
>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>> ---
>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>> ++++++++++++++++++++---
>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>
>>
>> I think we should have a proper interface for this stuff rather than
>> weak functions. Maybe we need a linker-list approach, or a cache
>> uclass?
>
>
> What's improper about this interface? Presumably we could argue that no
> function in the entire of U-Boot be called by symbol name, which doesn't
> seem useful.
>
> I'm not sure exactly what you envisage by a linker-list approach. Can you
> provide some background? I understand how the linker can construct list of
> objects/implementations/..., but that doesn't seem useful here since there's
> a known-ahead-of-time single implementation of these functions in a single
> build of U-Boot.

Your own commit messages says that this is SoC-specific. I'm
suggesting that we define an interface (which I think you have already
done with your header file additions), and allow SoCs to implement it
via a linker list.

IMO the cache code in U-Boot is starting to get a bit creaky.

>
> A cache uclass seems like /massive/ overkill, especially since I'd expect
> these very low-level functions to be required well before any higher level
> code like DM/classes/... to be available.

DM is available very early. But it's not clear from your patch when
this code is actually called.

Regards,
Simon
Stephen Warren Oct. 18, 2016, 7:10 p.m. UTC | #8
On 10/18/2016 01:03 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>
>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>
>>>> SoC-specific logic may be required for all forms of cache-wide
>>>> operations; invalidate and flush of both dcache and icache (note that
>>>> only 3 of the 4 possible combinations make sense, since the icache never
>>>> contains dirty lines). This patch adds an optional hook for all
>>>> implemented cache-wide operations, and renames the one existing hook to
>>>> better represent exactly which operation it is implementing. A dummy
>>>> no-op implementation of each hook is provided. These dummy
>>>> implementations are moved into C code, since there's no need to
>>>> implement them in assembly.
>>>>
>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>> ---
>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>> ++++++++++++++++++++---
>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>
>>>
>>> I think we should have a proper interface for this stuff rather than
>>> weak functions. Maybe we need a linker-list approach, or a cache
>>> uclass?
>>
>>
>> What's improper about this interface? Presumably we could argue that no
>> function in the entire of U-Boot be called by symbol name, which doesn't
>> seem useful.
>>
>> I'm not sure exactly what you envisage by a linker-list approach. Can you
>> provide some background? I understand how the linker can construct list of
>> objects/implementations/..., but that doesn't seem useful here since there's
>> a known-ahead-of-time single implementation of these functions in a single
>> build of U-Boot.
>
> Your own commit messages says that this is SoC-specific. I'm
> suggesting that we define an interface (which I think you have already
> done with your header file additions), and allow SoCs to implement it
> via a linker list.
>
> IMO the cache code in U-Boot is starting to get a bit creaky.
>
>> A cache uclass seems like /massive/ overkill, especially since I'd expect
>> these very low-level functions to be required well before any higher level
>> code like DM/classes/... to be available.
>
> DM is available very early. But it's not clear from your patch when
> this code is actually called.

I believe that weak functions are a perfectly acceptable approach here.

Yes, the implementation of these functions is SoC-specific. The 
Makefiles will pull in the appropriate implementation for that SoC 
whenever U-Boot is built, just like every other board- or SoC-specific 
function in the entire of U-Boot.

There's no need for linker lists since there is only ever one 
implementation.
Simon Glass Oct. 18, 2016, 7:56 p.m. UTC | #9
Hi Stephen,

On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>
>> Hi Stephen,
>>
>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>
>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>
>>>>
>>>> Hi Stephen,
>>>>
>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>
>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>> never
>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>> implemented cache-wide operations, and renames the one existing hook to
>>>>> better represent exactly which operation it is implementing. A dummy
>>>>> no-op implementation of each hook is provided. These dummy
>>>>> implementations are moved into C code, since there's no need to
>>>>> implement them in assembly.
>>>>>
>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>> ---
>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>> ++++++++++++++++++++---
>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>
>>>>
>>>> I think we should have a proper interface for this stuff rather than
>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>> uclass?
>>>
>>>
>>>
>>> What's improper about this interface? Presumably we could argue that no
>>> function in the entire of U-Boot be called by symbol name, which doesn't
>>> seem useful.
>>>
>>> I'm not sure exactly what you envisage by a linker-list approach. Can you
>>> provide some background? I understand how the linker can construct list
>>> of
>>> objects/implementations/..., but that doesn't seem useful here since
>>> there's
>>> a known-ahead-of-time single implementation of these functions in a
>>> single
>>> build of U-Boot.
>>
>>
>> Your own commit messages says that this is SoC-specific. I'm
>> suggesting that we define an interface (which I think you have already
>> done with your header file additions), and allow SoCs to implement it
>> via a linker list.
>>
>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>
>>> A cache uclass seems like /massive/ overkill, especially since I'd expect
>>> these very low-level functions to be required well before any higher
>>> level
>>> code like DM/classes/... to be available.
>>
>>
>> DM is available very early. But it's not clear from your patch when
>> this code is actually called.
>
>
> I believe that weak functions are a perfectly acceptable approach here.
>
> Yes, the implementation of these functions is SoC-specific. The Makefiles
> will pull in the appropriate implementation for that SoC whenever U-Boot is
> built, just like every other board- or SoC-specific function in the entire
> of U-Boot.
>
> There's no need for linker lists since there is only ever one
> implementation.

If there is only ever one implementation, why do you need weak
functions? Just call them directly. I think in fact you mean that
there can be no implementation (but perhaps an empty one), or one
implementation. You are effectively using multiple weak functions to
provide default code. I think it would be better if this were
explicit.

I still think that the cache functions could do with a rethink.

Regards,
Simon
York Sun Oct. 18, 2016, 9:28 p.m. UTC | #10
On 10/18/2016 02:01 PM, Stephen Warren wrote:
> On 10/18/2016 12:40 PM, york sun wrote:
>> On 10/18/2016 11:14 AM, Stephen Warren wrote:
>>> On 10/18/2016 09:28 AM, york sun wrote:
>>>> On 10/17/2016 04:35 PM, Stephen Warren wrote:
>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>
>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>> only 3 of the 4 possible combinations make sense, since the icache never
>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>> implemented cache-wide operations, and renames the one existing hook to
>>>>> better represent exactly which operation it is implementing. A dummy
>>>>> no-op implementation of each hook is provided. These dummy
>>>>> implementations are moved into C code, since there's no need to
>>>>> implement them in assembly.
>>>>>
>>>> Stephen,
>>>>
>>>> Moving this function to C may pose an issue. I had a debug a couple of
>>>> years ago that calling a C function put the stack into cache after
>>>> flushing L1/L2. That's why I used asm function to flush L3.
>>>
>>> Assuming the stack is located in cachable memory, the CPU is free (per
>>> the definition of the ARM architecture) to pull it into the cache at any
>>> time the cache is enabled (and perhaps even when it isn't enabled, at
>>> the very least for the icache on ARMv8 if not other cases too).
>>> Implementation in C vs. assembly has absolutely no effect here. I guess
>>> your statement assumes that C functions will write data to the stack and
>>> assembly functions never will. There's no strict 1:1 correlation between
>>> those two things; assembly code can touch the stack just like C code. If
>>> there's an assumption it won't, it needs to be documented in the header
>>> defining these hook functions.
>>>
>>> I assume you're specifically talking about dirtying the dcache between
>>> the point when dcache flushing starts and the point when the dcache is
>>> disabled? If so, flush_dcache_all() itself would have to be manually
>>> coded in assembly to avoid using the stack, as would dcache_disable()
>>> and set_sctlr(). I think this is why dcache_disable() currently disables
>>> the dcache first (thus preventing it acquiring new dirty data) and then
>>> flushes the dcache afterwards (thus guaranteeing that all dirty data is
>>> flushed with no race condition). This implies that your change to swap
>>> the order of those two functions isn't correct. I'm pretty sure I'm
>>
>> I wonder if David can shed some light on the original order of calls to
>> disable dcache.
>>
>>> correct in saying that the dcache can hit even if it's disabled, hence
>>> disabling the dcache while it contains dirty data won't lead to issues?
>>>
>>
>> My earlier debug was based on the original order of calls. I found I had
>> to avoid using the stack before flushing L3. Now with the changed order,
>> I haven't tested. But I can image the stack will be dirty and flushing
>> L3 may or may not push the data into main memory (depending on the L3
>> implementation whether inclusive or not).
>>
>> You said you are sure dcache can hit even if it is disabled. Can you
>> explain more? My test shows as soon as the d-cache is disabled, the core
>> cannot get the data in dirty cache.
>
> By "hit" here, I mean that even with the dcache disabled, when the CPU
> performs a read access, if the dcache contains a copy of that data, it
> can return it rather than requiring it to be fetched from DRAM.
>
> Yes, with the dcache disabled, I would not expect any writes to allocate
> new lines in the cache (although presumably writes would update any
> lines already there, in a write-though sense).
>
> At least, I'm pretty sure this is all true. It seems the only way to
> allow switching from cache-on to cache-off state without losing dirty data.
>

I believe my test showed otherwise. As soon as the dcache is disabled, 
the core cannot get the dirty cached data if the dcache was flushed by 
way/set for L1 and L2 (L3 wasn't flushed). That's why I proposed to 
change the order to flush first.

York
Stephen Warren Oct. 18, 2016, 10:47 p.m. UTC | #11
On 10/18/2016 03:28 PM, york sun wrote:
> On 10/18/2016 02:01 PM, Stephen Warren wrote:
>> On 10/18/2016 12:40 PM, york sun wrote:
>>> On 10/18/2016 11:14 AM, Stephen Warren wrote:
>>>> On 10/18/2016 09:28 AM, york sun wrote:
>>>>> On 10/17/2016 04:35 PM, Stephen Warren wrote:
>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>
>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>>> only 3 of the 4 possible combinations make sense, since the icache never
>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>> implemented cache-wide operations, and renames the one existing hook to
>>>>>> better represent exactly which operation it is implementing. A dummy
>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>> implementations are moved into C code, since there's no need to
>>>>>> implement them in assembly.
>>>>>>
>>>>> Stephen,
>>>>>
>>>>> Moving this function to C may pose an issue. I had a debug a couple of
>>>>> years ago that calling a C function put the stack into cache after
>>>>> flushing L1/L2. That's why I used asm function to flush L3.
>>>>
>>>> Assuming the stack is located in cachable memory, the CPU is free (per
>>>> the definition of the ARM architecture) to pull it into the cache at any
>>>> time the cache is enabled (and perhaps even when it isn't enabled, at
>>>> the very least for the icache on ARMv8 if not other cases too).
>>>> Implementation in C vs. assembly has absolutely no effect here. I guess
>>>> your statement assumes that C functions will write data to the stack and
>>>> assembly functions never will. There's no strict 1:1 correlation between
>>>> those two things; assembly code can touch the stack just like C code. If
>>>> there's an assumption it won't, it needs to be documented in the header
>>>> defining these hook functions.
>>>>
>>>> I assume you're specifically talking about dirtying the dcache between
>>>> the point when dcache flushing starts and the point when the dcache is
>>>> disabled? If so, flush_dcache_all() itself would have to be manually
>>>> coded in assembly to avoid using the stack, as would dcache_disable()
>>>> and set_sctlr(). I think this is why dcache_disable() currently disables
>>>> the dcache first (thus preventing it acquiring new dirty data) and then
>>>> flushes the dcache afterwards (thus guaranteeing that all dirty data is
>>>> flushed with no race condition). This implies that your change to swap
>>>> the order of those two functions isn't correct. I'm pretty sure I'm
>>>
>>> I wonder if David can shed some light on the original order of calls to
>>> disable dcache.
>>>
>>>> correct in saying that the dcache can hit even if it's disabled, hence
>>>> disabling the dcache while it contains dirty data won't lead to issues?
>>>>
>>>
>>> My earlier debug was based on the original order of calls. I found I had
>>> to avoid using the stack before flushing L3. Now with the changed order,
>>> I haven't tested. But I can image the stack will be dirty and flushing
>>> L3 may or may not push the data into main memory (depending on the L3
>>> implementation whether inclusive or not).
>>>
>>> You said you are sure dcache can hit even if it is disabled. Can you
>>> explain more? My test shows as soon as the d-cache is disabled, the core
>>> cannot get the data in dirty cache.
>>
>> By "hit" here, I mean that even with the dcache disabled, when the CPU
>> performs a read access, if the dcache contains a copy of that data, it
>> can return it rather than requiring it to be fetched from DRAM.
>>
>> Yes, with the dcache disabled, I would not expect any writes to allocate
>> new lines in the cache (although presumably writes would update any
>> lines already there, in a write-though sense).
>>
>> At least, I'm pretty sure this is all true. It seems the only way to
>> allow switching from cache-on to cache-off state without losing dirty data.
>
> I believe my test showed otherwise. As soon as the dcache is disabled,
> the core cannot get the dirty cached data if the dcache was flushed by
> way/set for L1 and L2 (L3 wasn't flushed). That's why I proposed to
> change the order to flush first.

It looks like what I was saying is true for ARMv7 but not for ARMv8 (or 
perhaps it's not architectural but implementation-defined). For example, 
the Cortex A15 TRM says:

> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/BABHEJFF.html

When you disable the cache, all Write-Back Cacheable requests still look 
up the L1 cache. If there is a cache hit, the cache is read or updated 
in the same way as if the cache is enabled. This enables Cacheable 
memory to remain fully coherent while the cache is disabled.

However, the Cortex A72 TRM says:

> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488h/way1382448697827.html

When you disable the cache, all Write-Back Cacheable requests do not 
look up the L1 data cache. L1 cache still services the snoops from the 
L2 cache.

So yes, we should flush the entire cache first, then disable it.

This does indeed imply that we shouldn't write data between starting to 
flush the dcache and disabling the cache. However, that seems too 
restrictive, especially in the face of caches beyond L1/L2 which might 
need quite some code to flush out. I'll try to investigate that aspect 
some more.
Stephen Warren Oct. 18, 2016, 10:54 p.m. UTC | #12
On 10/18/2016 01:56 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>
>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>
>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>> never
>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>> implemented cache-wide operations, and renames the one existing hook to
>>>>>> better represent exactly which operation it is implementing. A dummy
>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>> implementations are moved into C code, since there's no need to
>>>>>> implement them in assembly.
>>>>>>
>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>> ---
>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>> ++++++++++++++++++++---
>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>
>>>>>
>>>>> I think we should have a proper interface for this stuff rather than
>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>> uclass?
>>>>
>>>>
>>>>
>>>> What's improper about this interface? Presumably we could argue that no
>>>> function in the entire of U-Boot be called by symbol name, which doesn't
>>>> seem useful.
>>>>
>>>> I'm not sure exactly what you envisage by a linker-list approach. Can you
>>>> provide some background? I understand how the linker can construct list
>>>> of
>>>> objects/implementations/..., but that doesn't seem useful here since
>>>> there's
>>>> a known-ahead-of-time single implementation of these functions in a
>>>> single
>>>> build of U-Boot.
>>>
>>>
>>> Your own commit messages says that this is SoC-specific. I'm
>>> suggesting that we define an interface (which I think you have already
>>> done with your header file additions), and allow SoCs to implement it
>>> via a linker list.
>>>
>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>
>>>> A cache uclass seems like /massive/ overkill, especially since I'd expect
>>>> these very low-level functions to be required well before any higher
>>>> level
>>>> code like DM/classes/... to be available.
>>>
>>>
>>> DM is available very early. But it's not clear from your patch when
>>> this code is actually called.
>>
>>
>> I believe that weak functions are a perfectly acceptable approach here.
>>
>> Yes, the implementation of these functions is SoC-specific. The Makefiles
>> will pull in the appropriate implementation for that SoC whenever U-Boot is
>> built, just like every other board- or SoC-specific function in the entire
>> of U-Boot.
>>
>> There's no need for linker lists since there is only ever one
>> implementation.
>
> If there is only ever one implementation, why do you need weak
> functions?

As I explicitly stated above, each SoC can have a different 
implementation, yet only a single implementation is ever needed for a 
particular U-Boot build.

> Just call them directly.

The code is doing that, both before and after my patch.

> I think in fact you mean that
> there can be no implementation (but perhaps an empty one), or one
> implementation. You are effectively using multiple weak functions to
> provide default code. I think it would be better if this were
> explicit.
>
> I still think that the cache functions could do with a rethink.

In my opinion, this patch doesn't change the code structure at all. 
There is already an interface between the core (L1/L2) cache management 
code and the SoC-specific cache management code. That interface already 
uses a weak function to provide the default no-op implementation. This 
patch doesn't change any of that. All this patch does is fix that 
existing interface to cover all 3 combinations of dcache_flush, 
dcache_invalidate, and icache_invalidate, rather than just one of those 
combinations. It's more of a bug-fix than anything else.

If you want to rework this interface sometime, be my guest. However, I 
don't think it's fair to require that someone who simply wants to fix 
the existing code (in a way that is orthogonal to your desired interface 
refactoring) do that refactoring first, rather than doing it yourself.
Simon Glass Oct. 18, 2016, 11:08 p.m. UTC | #13
Hi Stephen,

On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>
>> Hi Stephen,
>>
>> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>
>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>
>>>>
>>>> Hi Stephen,
>>>>
>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Stephen,
>>>>>>
>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>>
>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>> never
>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>> implemented cache-wide operations, and renames the one existing hook
>>>>>>> to
>>>>>>> better represent exactly which operation it is implementing. A dummy
>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>> implement them in assembly.
>>>>>>>
>>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>>> ---
>>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>> ++++++++++++++++++++---
>>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>
>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>> uclass?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> What's improper about this interface? Presumably we could argue that no
>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>> doesn't
>>>>> seem useful.
>>>>>
>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>> you
>>>>> provide some background? I understand how the linker can construct list
>>>>> of
>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>> there's
>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>> single
>>>>> build of U-Boot.
>>>>
>>>>
>>>>
>>>> Your own commit messages says that this is SoC-specific. I'm
>>>> suggesting that we define an interface (which I think you have already
>>>> done with your header file additions), and allow SoCs to implement it
>>>> via a linker list.
>>>>
>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>
>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>> expect
>>>>> these very low-level functions to be required well before any higher
>>>>> level
>>>>> code like DM/classes/... to be available.
>>>>
>>>>
>>>>
>>>> DM is available very early. But it's not clear from your patch when
>>>> this code is actually called.
>>>
>>>
>>>
>>> I believe that weak functions are a perfectly acceptable approach here.
>>>
>>> Yes, the implementation of these functions is SoC-specific. The Makefiles
>>> will pull in the appropriate implementation for that SoC whenever U-Boot
>>> is
>>> built, just like every other board- or SoC-specific function in the
>>> entire
>>> of U-Boot.
>>>
>>> There's no need for linker lists since there is only ever one
>>> implementation.
>>
>>
>> If there is only ever one implementation, why do you need weak
>> functions?
>
>
> As I explicitly stated above, each SoC can have a different implementation,
> yet only a single implementation is ever needed for a particular U-Boot
> build.
>
>> Just call them directly.
>
>
> The code is doing that, both before and after my patch.

I mean call them without needing weak functions.

>
>> I think in fact you mean that
>> there can be no implementation (but perhaps an empty one), or one
>> implementation. You are effectively using multiple weak functions to
>> provide default code. I think it would be better if this were
>> explicit.
>>
>> I still think that the cache functions could do with a rethink.
>
>
> In my opinion, this patch doesn't change the code structure at all. There is
> already an interface between the core (L1/L2) cache management code and the
> SoC-specific cache management code. That interface already uses a weak
> function to provide the default no-op implementation. This patch doesn't
> change any of that. All this patch does is fix that existing interface to
> cover all 3 combinations of dcache_flush, dcache_invalidate, and
> icache_invalidate, rather than just one of those combinations. It's more of
> a bug-fix than anything else.

Yes I see that.

>
> If you want to rework this interface sometime, be my guest. However, I don't
> think it's fair to require that someone who simply wants to fix the existing
> code (in a way that is orthogonal to your desired interface refactoring) do
> that refactoring first, rather than doing it yourself.

I understand what you are saying, but isn't that how open source
software works? Believe me, I have done my fair share of refactoring
:-)

At least can you look at not making it any harder to fix up later? The
more we pile onto this interface, the harder it will be later. We
should aim to make ARMv8 really nice as it is the new thing.

Regards,
Simon
Stephen Warren Oct. 18, 2016, 11:33 p.m. UTC | #14
On 10/18/2016 05:08 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>
>>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>>
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>>>
>>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>>> operations; invalidate and flush of both dcache and icache (note that
>>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>>> never
>>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>>> implemented cache-wide operations, and renames the one existing hook
>>>>>>>> to
>>>>>>>> better represent exactly which operation it is implementing. A dummy
>>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>>> implement them in assembly.
>>>>>>>>
>>>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>>>> ---
>>>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>>> ++++++++++++++++++++---
>>>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>>
>>>>>>>
>>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>>> uclass?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> What's improper about this interface? Presumably we could argue that no
>>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>>> doesn't
>>>>>> seem useful.
>>>>>>
>>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>>> you
>>>>>> provide some background? I understand how the linker can construct list
>>>>>> of
>>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>>> there's
>>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>>> single
>>>>>> build of U-Boot.
>>>>>
>>>>>
>>>>>
>>>>> Your own commit messages says that this is SoC-specific. I'm
>>>>> suggesting that we define an interface (which I think you have already
>>>>> done with your header file additions), and allow SoCs to implement it
>>>>> via a linker list.
>>>>>
>>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>>
>>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>>> expect
>>>>>> these very low-level functions to be required well before any higher
>>>>>> level
>>>>>> code like DM/classes/... to be available.
>>>>>
>>>>>
>>>>>
>>>>> DM is available very early. But it's not clear from your patch when
>>>>> this code is actually called.
>>>>
>>>>
>>>>
>>>> I believe that weak functions are a perfectly acceptable approach here.
>>>>
>>>> Yes, the implementation of these functions is SoC-specific. The Makefiles
>>>> will pull in the appropriate implementation for that SoC whenever U-Boot
>>>> is
>>>> built, just like every other board- or SoC-specific function in the
>>>> entire
>>>> of U-Boot.
>>>>
>>>> There's no need for linker lists since there is only ever one
>>>> implementation.
>>>
>>>
>>> If there is only ever one implementation, why do you need weak
>>> functions?
>>
>>
>> As I explicitly stated above, each SoC can have a different implementation,
>> yet only a single implementation is ever needed for a particular U-Boot
>> build.
>>
>>> Just call them directly.
>>
>>
>> The code is doing that, both before and after my patch.
>
> I mean call them without needing weak functions.
>
>>
>>> I think in fact you mean that
>>> there can be no implementation (but perhaps an empty one), or one
>>> implementation. You are effectively using multiple weak functions to
>>> provide default code. I think it would be better if this were
>>> explicit.
>>>
>>> I still think that the cache functions could do with a rethink.
>>
>>
>> In my opinion, this patch doesn't change the code structure at all. There is
>> already an interface between the core (L1/L2) cache management code and the
>> SoC-specific cache management code. That interface already uses a weak
>> function to provide the default no-op implementation. This patch doesn't
>> change any of that. All this patch does is fix that existing interface to
>> cover all 3 combinations of dcache_flush, dcache_invalidate, and
>> icache_invalidate, rather than just one of those combinations. It's more of
>> a bug-fix than anything else.
>
> Yes I see that.
>
>>
>> If you want to rework this interface sometime, be my guest. However, I don't
>> think it's fair to require that someone who simply wants to fix the existing
>> code (in a way that is orthogonal to your desired interface refactoring) do
>> that refactoring first, rather than doing it yourself.
>
> I understand what you are saying, but isn't that how open source
> software works? Believe me, I have done my fair share of refactoring
> :-)
>
> At least can you look at not making it any harder to fix up later? The
> more we pile onto this interface, the harder it will be later. We
> should aim to make ARMv8 really nice as it is the new thing.

I believe moving one or three functions into any replacement scheme will 
have an identical level of difficulty. So, I believe the patch already 
satisfies the "no harder" requirement.
Simon Glass Oct. 19, 2016, 2:41 a.m. UTC | #15
Hi Stephen,

On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 10/18/2016 05:08 PM, Simon Glass wrote:
>>
>> Hi Stephen,
>>
>> On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>
>>> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>>>
>>>>
>>>> Hi Stephen,
>>>>
>>>> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Stephen,
>>>>>>
>>>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Stephen,
>>>>>>>>
>>>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>>>>
>>>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>>>> operations; invalidate and flush of both dcache and icache (note
>>>>>>>>> that
>>>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>>>> never
>>>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>>>> implemented cache-wide operations, and renames the one existing
>>>>>>>>> hook
>>>>>>>>> to
>>>>>>>>> better represent exactly which operation it is implementing. A
>>>>>>>>> dummy
>>>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>>>> implement them in assembly.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>>>>> ---
>>>>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>>>> ++++++++++++++++++++---
>>>>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>>>> uclass?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What's improper about this interface? Presumably we could argue that
>>>>>>> no
>>>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>>>> doesn't
>>>>>>> seem useful.
>>>>>>>
>>>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>>>> you
>>>>>>> provide some background? I understand how the linker can construct
>>>>>>> list
>>>>>>> of
>>>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>>>> there's
>>>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>>>> single
>>>>>>> build of U-Boot.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Your own commit messages says that this is SoC-specific. I'm
>>>>>> suggesting that we define an interface (which I think you have already
>>>>>> done with your header file additions), and allow SoCs to implement it
>>>>>> via a linker list.
>>>>>>
>>>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>>>
>>>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>>>> expect
>>>>>>> these very low-level functions to be required well before any higher
>>>>>>> level
>>>>>>> code like DM/classes/... to be available.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> DM is available very early. But it's not clear from your patch when
>>>>>> this code is actually called.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I believe that weak functions are a perfectly acceptable approach here.
>>>>>
>>>>> Yes, the implementation of these functions is SoC-specific. The
>>>>> Makefiles
>>>>> will pull in the appropriate implementation for that SoC whenever
>>>>> U-Boot
>>>>> is
>>>>> built, just like every other board- or SoC-specific function in the
>>>>> entire
>>>>> of U-Boot.
>>>>>
>>>>> There's no need for linker lists since there is only ever one
>>>>> implementation.
>>>>
>>>>
>>>>
>>>> If there is only ever one implementation, why do you need weak
>>>> functions?
>>>
>>>
>>>
>>> As I explicitly stated above, each SoC can have a different
>>> implementation,
>>> yet only a single implementation is ever needed for a particular U-Boot
>>> build.
>>>
>>>> Just call them directly.
>>>
>>>
>>>
>>> The code is doing that, both before and after my patch.
>>
>>
>> I mean call them without needing weak functions.
>>
>>>
>>>> I think in fact you mean that
>>>> there can be no implementation (but perhaps an empty one), or one
>>>> implementation. You are effectively using multiple weak functions to
>>>> provide default code. I think it would be better if this were
>>>> explicit.
>>>>
>>>> I still think that the cache functions could do with a rethink.
>>>
>>>
>>>
>>> In my opinion, this patch doesn't change the code structure at all. There
>>> is
>>> already an interface between the core (L1/L2) cache management code and
>>> the
>>> SoC-specific cache management code. That interface already uses a weak
>>> function to provide the default no-op implementation. This patch doesn't
>>> change any of that. All this patch does is fix that existing interface to
>>> cover all 3 combinations of dcache_flush, dcache_invalidate, and
>>> icache_invalidate, rather than just one of those combinations. It's more
>>> of
>>> a bug-fix than anything else.
>>
>>
>> Yes I see that.
>>
>>>
>>> If you want to rework this interface sometime, be my guest. However, I
>>> don't
>>> think it's fair to require that someone who simply wants to fix the
>>> existing
>>> code (in a way that is orthogonal to your desired interface refactoring)
>>> do
>>> that refactoring first, rather than doing it yourself.
>>
>>
>> I understand what you are saying, but isn't that how open source
>> software works? Believe me, I have done my fair share of refactoring
>> :-)
>>
>> At least can you look at not making it any harder to fix up later? The
>> more we pile onto this interface, the harder it will be later. We
>> should aim to make ARMv8 really nice as it is the new thing.
>
>
> I believe moving one or three functions into any replacement scheme will
> have an identical level of difficulty. So, I believe the patch already
> satisfies the "no harder" requirement.

Well it seems your mind is already made up!

Regards,
Simon
Stephen Warren Oct. 19, 2016, 3:36 p.m. UTC | #16
On 10/18/2016 08:41 PM, Simon Glass wrote:
> Hi Stephen,
>
> On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> On 10/18/2016 05:08 PM, Simon Glass wrote:
>>>
>>> Hi Stephen,
>>>
>>> On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>
>>>> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>>>>
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>>>>>
>>>>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>>>>> operations; invalidate and flush of both dcache and icache (note
>>>>>>>>>> that
>>>>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>>>>> never
>>>>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>>>>> implemented cache-wide operations, and renames the one existing
>>>>>>>>>> hook
>>>>>>>>>> to
>>>>>>>>>> better represent exactly which operation it is implementing. A
>>>>>>>>>> dummy
>>>>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>>>>> implement them in assembly.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>  arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>>>>  arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>>>>> ++++++++++++++++++++---
>>>>>>>>>>  arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>>>>  arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>>>>  arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>>>>  5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>>>>> uclass?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What's improper about this interface? Presumably we could argue that
>>>>>>>> no
>>>>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>>>>> doesn't
>>>>>>>> seem useful.
>>>>>>>>
>>>>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>>>>> you
>>>>>>>> provide some background? I understand how the linker can construct
>>>>>>>> list
>>>>>>>> of
>>>>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>>>>> there's
>>>>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>>>>> single
>>>>>>>> build of U-Boot.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Your own commit messages says that this is SoC-specific. I'm
>>>>>>> suggesting that we define an interface (which I think you have already
>>>>>>> done with your header file additions), and allow SoCs to implement it
>>>>>>> via a linker list.
>>>>>>>
>>>>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>>>>
>>>>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>>>>> expect
>>>>>>>> these very low-level functions to be required well before any higher
>>>>>>>> level
>>>>>>>> code like DM/classes/... to be available.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> DM is available very early. But it's not clear from your patch when
>>>>>>> this code is actually called.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I believe that weak functions are a perfectly acceptable approach here.
>>>>>>
>>>>>> Yes, the implementation of these functions is SoC-specific. The
>>>>>> Makefiles
>>>>>> will pull in the appropriate implementation for that SoC whenever
>>>>>> U-Boot
>>>>>> is
>>>>>> built, just like every other board- or SoC-specific function in the
>>>>>> entire
>>>>>> of U-Boot.
>>>>>>
>>>>>> There's no need for linker lists since there is only ever one
>>>>>> implementation.
>>>>>
>>>>>
>>>>>
>>>>> If there is only ever one implementation, why do you need weak
>>>>> functions?
>>>>
>>>>
>>>>
>>>> As I explicitly stated above, each SoC can have a different
>>>> implementation,
>>>> yet only a single implementation is ever needed for a particular U-Boot
>>>> build.
>>>>
>>>>> Just call them directly.
>>>>
>>>>
>>>>
>>>> The code is doing that, both before and after my patch.
>>>
>>>
>>> I mean call them without needing weak functions.
>>>
>>>>
>>>>> I think in fact you mean that
>>>>> there can be no implementation (but perhaps an empty one), or one
>>>>> implementation. You are effectively using multiple weak functions to
>>>>> provide default code. I think it would be better if this were
>>>>> explicit.
>>>>>
>>>>> I still think that the cache functions could do with a rethink.
>>>>
>>>>
>>>>
>>>> In my opinion, this patch doesn't change the code structure at all. There
>>>> is
>>>> already an interface between the core (L1/L2) cache management code and
>>>> the
>>>> SoC-specific cache management code. That interface already uses a weak
>>>> function to provide the default no-op implementation. This patch doesn't
>>>> change any of that. All this patch does is fix that existing interface to
>>>> cover all 3 combinations of dcache_flush, dcache_invalidate, and
>>>> icache_invalidate, rather than just one of those combinations. It's more
>>>> of
>>>> a bug-fix than anything else.
>>>
>>>
>>> Yes I see that.
>>>
>>>>
>>>> If you want to rework this interface sometime, be my guest. However, I
>>>> don't
>>>> think it's fair to require that someone who simply wants to fix the
>>>> existing
>>>> code (in a way that is orthogonal to your desired interface refactoring)
>>>> do
>>>> that refactoring first, rather than doing it yourself.
>>>
>>>
>>> I understand what you are saying, but isn't that how open source
>>> software works? Believe me, I have done my fair share of refactoring
>>> :-)
>>>
>>> At least can you look at not making it any harder to fix up later? The
>>> more we pile onto this interface, the harder it will be later. We
>>> should aim to make ARMv8 really nice as it is the new thing.
>>
>>
>> I believe moving one or three functions into any replacement scheme will
>> have an identical level of difficulty. So, I believe the patch already
>> satisfies the "no harder" requirement.
>
> Well it seems your mind is already made up!

Well, I don't believe there are any viable or reasonable alternatives. 
I'm not prolonging this thread because I find it enjoyable, but because 
of the lack of alternatives to what this patch does.

Doing this via DM doesn't simplify anything or make it more 
maintainable; it's just using DM for the sake of it. DM is great when it 
makes life simpler and code more maintainable, but we use it because of 
those benefits, not just for the sake of it. Using DM adds complexity, 
and so there has to be a benefit of doing so. I don't believe there is here.

Doing this by linker scripts simply doesn't make sense:

Any given U-Boot binary will only contain a single implementation of 
these functions, so there's no need for any kind of runtime lookup, and 
if there was a runtime lookup, what would be the key and where would it 
come from? I believe we'd still have some compile-time-seledted 
function/data to drive the runtime lookup process, and so we'd be in 
exactly the same situation as we already are, just with more complexity 
added on top.

Using linker scripts to switch between implementations at compile time 
is much more complex than just letting the linker link stuff together 
directly. That's what it's good at. Using linker scripts would just add 
extra complexity without any benefit. We'd still end up with a single 
implementation in the binary, yet to call it code would have to indirect 
through the linker-defined table, rather than simply calling the same 
code by symbol name. Uggh!

Note that per the discussions in other forks of this thread, it's likely 
we'll need to code all these cache operations in assembly to avoid 
accessing DRAM while turning the cache on/off. This implies to me that 
we'll need to keep all the cache code extremely simple and direct, 
without any form of runtime or compile time indirection.
Tom Rini Oct. 19, 2016, 3:39 p.m. UTC | #17
On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
> On 10/18/2016 08:41 PM, Simon Glass wrote:
> >Hi Stephen,
> >
> >On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> >>On 10/18/2016 05:08 PM, Simon Glass wrote:
> >>>
> >>>Hi Stephen,
> >>>
> >>>On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
> >>>>
> >>>>On 10/18/2016 01:56 PM, Simon Glass wrote:
> >>>>>
> >>>>>
> >>>>>Hi Stephen,
> >>>>>
> >>>>>On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
> >>>>>wrote:
> >>>>>>
> >>>>>>
> >>>>>>On 10/18/2016 01:03 PM, Simon Glass wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>Hi Stephen,
> >>>>>>>
> >>>>>>>On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
> >>>>>>>wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>On 10/18/2016 10:23 AM, Simon Glass wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>Hi Stephen,
> >>>>>>>>>
> >>>>>>>>>On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
> >>>>>>>>>wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>From: Stephen Warren <swarren@nvidia.com>
> >>>>>>>>>>
> >>>>>>>>>>SoC-specific logic may be required for all forms of cache-wide
> >>>>>>>>>>operations; invalidate and flush of both dcache and icache (note
> >>>>>>>>>>that
> >>>>>>>>>>only 3 of the 4 possible combinations make sense, since the icache
> >>>>>>>>>>never
> >>>>>>>>>>contains dirty lines). This patch adds an optional hook for all
> >>>>>>>>>>implemented cache-wide operations, and renames the one existing
> >>>>>>>>>>hook
> >>>>>>>>>>to
> >>>>>>>>>>better represent exactly which operation it is implementing. A
> >>>>>>>>>>dummy
> >>>>>>>>>>no-op implementation of each hook is provided. These dummy
> >>>>>>>>>>implementations are moved into C code, since there's no need to
> >>>>>>>>>>implement them in assembly.
> >>>>>>>>>>
> >>>>>>>>>>Signed-off-by: Stephen Warren <swarren@nvidia.com>
> >>>>>>>>>>---
> >>>>>>>>>> arch/arm/cpu/armv8/cache.S                   |  6 ------
> >>>>>>>>>> arch/arm/cpu/armv8/cache_v8.c                | 23
> >>>>>>>>>>++++++++++++++++++++---
> >>>>>>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
> >>>>>>>>>> arch/arm/include/asm/system.h                |  5 ++++-
> >>>>>>>>>> arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
> >>>>>>>>>> 5 files changed, 27 insertions(+), 13 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>I think we should have a proper interface for this stuff rather than
> >>>>>>>>>weak functions. Maybe we need a linker-list approach, or a cache
> >>>>>>>>>uclass?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>What's improper about this interface? Presumably we could argue that
> >>>>>>>>no
> >>>>>>>>function in the entire of U-Boot be called by symbol name, which
> >>>>>>>>doesn't
> >>>>>>>>seem useful.
> >>>>>>>>
> >>>>>>>>I'm not sure exactly what you envisage by a linker-list approach. Can
> >>>>>>>>you
> >>>>>>>>provide some background? I understand how the linker can construct
> >>>>>>>>list
> >>>>>>>>of
> >>>>>>>>objects/implementations/..., but that doesn't seem useful here since
> >>>>>>>>there's
> >>>>>>>>a known-ahead-of-time single implementation of these functions in a
> >>>>>>>>single
> >>>>>>>>build of U-Boot.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>Your own commit messages says that this is SoC-specific. I'm
> >>>>>>>suggesting that we define an interface (which I think you have already
> >>>>>>>done with your header file additions), and allow SoCs to implement it
> >>>>>>>via a linker list.
> >>>>>>>
> >>>>>>>IMO the cache code in U-Boot is starting to get a bit creaky.
> >>>>>>>
> >>>>>>>>A cache uclass seems like /massive/ overkill, especially since I'd
> >>>>>>>>expect
> >>>>>>>>these very low-level functions to be required well before any higher
> >>>>>>>>level
> >>>>>>>>code like DM/classes/... to be available.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>DM is available very early. But it's not clear from your patch when
> >>>>>>>this code is actually called.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>I believe that weak functions are a perfectly acceptable approach here.
> >>>>>>
> >>>>>>Yes, the implementation of these functions is SoC-specific. The
> >>>>>>Makefiles
> >>>>>>will pull in the appropriate implementation for that SoC whenever
> >>>>>>U-Boot
> >>>>>>is
> >>>>>>built, just like every other board- or SoC-specific function in the
> >>>>>>entire
> >>>>>>of U-Boot.
> >>>>>>
> >>>>>>There's no need for linker lists since there is only ever one
> >>>>>>implementation.
> >>>>>
> >>>>>
> >>>>>
> >>>>>If there is only ever one implementation, why do you need weak
> >>>>>functions?
> >>>>
> >>>>
> >>>>
> >>>>As I explicitly stated above, each SoC can have a different
> >>>>implementation,
> >>>>yet only a single implementation is ever needed for a particular U-Boot
> >>>>build.
> >>>>
> >>>>>Just call them directly.
> >>>>
> >>>>
> >>>>
> >>>>The code is doing that, both before and after my patch.
> >>>
> >>>
> >>>I mean call them without needing weak functions.
> >>>
> >>>>
> >>>>>I think in fact you mean that
> >>>>>there can be no implementation (but perhaps an empty one), or one
> >>>>>implementation. You are effectively using multiple weak functions to
> >>>>>provide default code. I think it would be better if this were
> >>>>>explicit.
> >>>>>
> >>>>>I still think that the cache functions could do with a rethink.
> >>>>
> >>>>
> >>>>
> >>>>In my opinion, this patch doesn't change the code structure at all. There
> >>>>is
> >>>>already an interface between the core (L1/L2) cache management code and
> >>>>the
> >>>>SoC-specific cache management code. That interface already uses a weak
> >>>>function to provide the default no-op implementation. This patch doesn't
> >>>>change any of that. All this patch does is fix that existing interface to
> >>>>cover all 3 combinations of dcache_flush, dcache_invalidate, and
> >>>>icache_invalidate, rather than just one of those combinations. It's more
> >>>>of
> >>>>a bug-fix than anything else.
> >>>
> >>>
> >>>Yes I see that.
> >>>
> >>>>
> >>>>If you want to rework this interface sometime, be my guest. However, I
> >>>>don't
> >>>>think it's fair to require that someone who simply wants to fix the
> >>>>existing
> >>>>code (in a way that is orthogonal to your desired interface refactoring)
> >>>>do
> >>>>that refactoring first, rather than doing it yourself.
> >>>
> >>>
> >>>I understand what you are saying, but isn't that how open source
> >>>software works? Believe me, I have done my fair share of refactoring
> >>>:-)
> >>>
> >>>At least can you look at not making it any harder to fix up later? The
> >>>more we pile onto this interface, the harder it will be later. We
> >>>should aim to make ARMv8 really nice as it is the new thing.
> >>
> >>
> >>I believe moving one or three functions into any replacement scheme will
> >>have an identical level of difficulty. So, I believe the patch already
> >>satisfies the "no harder" requirement.
> >
> >Well it seems your mind is already made up!
> 
> Well, I don't believe there are any viable or reasonable
> alternatives. I'm not prolonging this thread because I find it
> enjoyable, but because of the lack of alternatives to what this
> patch does.
> 
> Doing this via DM doesn't simplify anything or make it more
> maintainable; it's just using DM for the sake of it. DM is great
> when it makes life simpler and code more maintainable, but we use it
> because of those benefits, not just for the sake of it. Using DM
> adds complexity, and so there has to be a benefit of doing so. I
> don't believe there is here.
> 
> Doing this by linker scripts simply doesn't make sense:
> 
> Any given U-Boot binary will only contain a single implementation of
> these functions, so there's no need for any kind of runtime lookup,
> and if there was a runtime lookup, what would be the key and where
> would it come from? I believe we'd still have some
> compile-time-seledted function/data to drive the runtime lookup
> process, and so we'd be in exactly the same situation as we already
> are, just with more complexity added on top.
> 
> Using linker scripts to switch between implementations at compile
> time is much more complex than just letting the linker link stuff
> together directly. That's what it's good at. Using linker scripts
> would just add extra complexity without any benefit. We'd still end
> up with a single implementation in the binary, yet to call it code
> would have to indirect through the linker-defined table, rather than
> simply calling the same code by symbol name. Uggh!
> 
> Note that per the discussions in other forks of this thread, it's
> likely we'll need to code all these cache operations in assembly to
> avoid accessing DRAM while turning the cache on/off. This implies to
> me that we'll need to keep all the cache code extremely simple and
> direct, without any form of runtime or compile time indirection.

For the record, until / unless we move to the point where we can run
different SoCs within a single binary, doing this via weak functions
seems to me to be the correct abstraction.  If we know that some SoCs
will be able to nop these, that is.  If all SoCs will need something, we
should simply omit the default functions.  Thanks!
Simon Glass Oct. 19, 2016, 3:59 p.m. UTC | #18
Hi Tom,

On 19 October 2016 at 09:39, Tom Rini <trini@konsulko.com> wrote:
>
> On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
> > On 10/18/2016 08:41 PM, Simon Glass wrote:
> > >Hi Stephen,
> > >
> > >On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > >>On 10/18/2016 05:08 PM, Simon Glass wrote:
> > >>>
> > >>>Hi Stephen,
> > >>>
> > >>>On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > >>>>
> > >>>>On 10/18/2016 01:56 PM, Simon Glass wrote:
> > >>>>>
> > >>>>>
> > >>>>>Hi Stephen,
> > >>>>>
> > >>>>>On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
> > >>>>>wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>On 10/18/2016 01:03 PM, Simon Glass wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Hi Stephen,
> > >>>>>>>
> > >>>>>>>On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
> > >>>>>>>wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>On 10/18/2016 10:23 AM, Simon Glass wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>Hi Stephen,
> > >>>>>>>>>
> > >>>>>>>>>On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
> > >>>>>>>>>wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>From: Stephen Warren <swarren@nvidia.com>
> > >>>>>>>>>>
> > >>>>>>>>>>SoC-specific logic may be required for all forms of cache-wide
> > >>>>>>>>>>operations; invalidate and flush of both dcache and icache (note
> > >>>>>>>>>>that
> > >>>>>>>>>>only 3 of the 4 possible combinations make sense, since the icache
> > >>>>>>>>>>never
> > >>>>>>>>>>contains dirty lines). This patch adds an optional hook for all
> > >>>>>>>>>>implemented cache-wide operations, and renames the one existing
> > >>>>>>>>>>hook
> > >>>>>>>>>>to
> > >>>>>>>>>>better represent exactly which operation it is implementing. A
> > >>>>>>>>>>dummy
> > >>>>>>>>>>no-op implementation of each hook is provided. These dummy
> > >>>>>>>>>>implementations are moved into C code, since there's no need to
> > >>>>>>>>>>implement them in assembly.
> > >>>>>>>>>>
> > >>>>>>>>>>Signed-off-by: Stephen Warren <swarren@nvidia.com>
> > >>>>>>>>>>---
> > >>>>>>>>>> arch/arm/cpu/armv8/cache.S                   |  6 ------
> > >>>>>>>>>> arch/arm/cpu/armv8/cache_v8.c                | 23
> > >>>>>>>>>>++++++++++++++++++++---
> > >>>>>>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
> > >>>>>>>>>> arch/arm/include/asm/system.h                |  5 ++++-
> > >>>>>>>>>> arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
> > >>>>>>>>>> 5 files changed, 27 insertions(+), 13 deletions(-)
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>I think we should have a proper interface for this stuff rather than
> > >>>>>>>>>weak functions. Maybe we need a linker-list approach, or a cache
> > >>>>>>>>>uclass?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>What's improper about this interface? Presumably we could argue that
> > >>>>>>>>no
> > >>>>>>>>function in the entire of U-Boot be called by symbol name, which
> > >>>>>>>>doesn't
> > >>>>>>>>seem useful.
> > >>>>>>>>
> > >>>>>>>>I'm not sure exactly what you envisage by a linker-list approach. Can
> > >>>>>>>>you
> > >>>>>>>>provide some background? I understand how the linker can construct
> > >>>>>>>>list
> > >>>>>>>>of
> > >>>>>>>>objects/implementations/..., but that doesn't seem useful here since
> > >>>>>>>>there's
> > >>>>>>>>a known-ahead-of-time single implementation of these functions in a
> > >>>>>>>>single
> > >>>>>>>>build of U-Boot.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Your own commit messages says that this is SoC-specific. I'm
> > >>>>>>>suggesting that we define an interface (which I think you have already
> > >>>>>>>done with your header file additions), and allow SoCs to implement it
> > >>>>>>>via a linker list.
> > >>>>>>>
> > >>>>>>>IMO the cache code in U-Boot is starting to get a bit creaky.
> > >>>>>>>
> > >>>>>>>>A cache uclass seems like /massive/ overkill, especially since I'd
> > >>>>>>>>expect
> > >>>>>>>>these very low-level functions to be required well before any higher
> > >>>>>>>>level
> > >>>>>>>>code like DM/classes/... to be available.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>DM is available very early. But it's not clear from your patch when
> > >>>>>>>this code is actually called.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>I believe that weak functions are a perfectly acceptable approach here.
> > >>>>>>
> > >>>>>>Yes, the implementation of these functions is SoC-specific. The
> > >>>>>>Makefiles
> > >>>>>>will pull in the appropriate implementation for that SoC whenever
> > >>>>>>U-Boot
> > >>>>>>is
> > >>>>>>built, just like every other board- or SoC-specific function in the
> > >>>>>>entire
> > >>>>>>of U-Boot.
> > >>>>>>
> > >>>>>>There's no need for linker lists since there is only ever one
> > >>>>>>implementation.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>If there is only ever one implementation, why do you need weak
> > >>>>>functions?
> > >>>>
> > >>>>
> > >>>>
> > >>>>As I explicitly stated above, each SoC can have a different
> > >>>>implementation,
> > >>>>yet only a single implementation is ever needed for a particular U-Boot
> > >>>>build.
> > >>>>
> > >>>>>Just call them directly.
> > >>>>
> > >>>>
> > >>>>
> > >>>>The code is doing that, both before and after my patch.
> > >>>
> > >>>
> > >>>I mean call them without needing weak functions.
> > >>>
> > >>>>
> > >>>>>I think in fact you mean that
> > >>>>>there can be no implementation (but perhaps an empty one), or one
> > >>>>>implementation. You are effectively using multiple weak functions to
> > >>>>>provide default code. I think it would be better if this were
> > >>>>>explicit.
> > >>>>>
> > >>>>>I still think that the cache functions could do with a rethink.
> > >>>>
> > >>>>
> > >>>>
> > >>>>In my opinion, this patch doesn't change the code structure at all. There
> > >>>>is
> > >>>>already an interface between the core (L1/L2) cache management code and
> > >>>>the
> > >>>>SoC-specific cache management code. That interface already uses a weak
> > >>>>function to provide the default no-op implementation. This patch doesn't
> > >>>>change any of that. All this patch does is fix that existing interface to
> > >>>>cover all 3 combinations of dcache_flush, dcache_invalidate, and
> > >>>>icache_invalidate, rather than just one of those combinations. It's more
> > >>>>of
> > >>>>a bug-fix than anything else.
> > >>>
> > >>>
> > >>>Yes I see that.
> > >>>
> > >>>>
> > >>>>If you want to rework this interface sometime, be my guest. However, I
> > >>>>don't
> > >>>>think it's fair to require that someone who simply wants to fix the
> > >>>>existing
> > >>>>code (in a way that is orthogonal to your desired interface refactoring)
> > >>>>do
> > >>>>that refactoring first, rather than doing it yourself.
> > >>>
> > >>>
> > >>>I understand what you are saying, but isn't that how open source
> > >>>software works? Believe me, I have done my fair share of refactoring
> > >>>:-)
> > >>>
> > >>>At least can you look at not making it any harder to fix up later? The
> > >>>more we pile onto this interface, the harder it will be later. We
> > >>>should aim to make ARMv8 really nice as it is the new thing.
> > >>
> > >>
> > >>I believe moving one or three functions into any replacement scheme will
> > >>have an identical level of difficulty. So, I believe the patch already
> > >>satisfies the "no harder" requirement.
> > >
> > >Well it seems your mind is already made up!
> >
> > Well, I don't believe there are any viable or reasonable
> > alternatives. I'm not prolonging this thread because I find it
> > enjoyable, but because of the lack of alternatives to what this
> > patch does.
> >
> > Doing this via DM doesn't simplify anything or make it more
> > maintainable; it's just using DM for the sake of it. DM is great
> > when it makes life simpler and code more maintainable, but we use it
> > because of those benefits, not just for the sake of it. Using DM
> > adds complexity, and so there has to be a benefit of doing so. I
> > don't believe there is here.
> >
> > Doing this by linker scripts simply doesn't make sense:
> >
> > Any given U-Boot binary will only contain a single implementation of
> > these functions, so there's no need for any kind of runtime lookup,
> > and if there was a runtime lookup, what would be the key and where
> > would it come from? I believe we'd still have some
> > compile-time-seledted function/data to drive the runtime lookup
> > process, and so we'd be in exactly the same situation as we already
> > are, just with more complexity added on top.
> >
> > Using linker scripts to switch between implementations at compile
> > time is much more complex than just letting the linker link stuff
> > together directly. That's what it's good at. Using linker scripts
> > would just add extra complexity without any benefit. We'd still end
> > up with a single implementation in the binary, yet to call it code
> > would have to indirect through the linker-defined table, rather than
> > simply calling the same code by symbol name. Uggh!
> >
> > Note that per the discussions in other forks of this thread, it's
> > likely we'll need to code all these cache operations in assembly to
> > avoid accessing DRAM while turning the cache on/off. This implies to
> > me that we'll need to keep all the cache code extremely simple and
> > direct, without any form of runtime or compile time indirection.
>
> For the record, until / unless we move to the point where we can run
> different SoCs within a single binary, doing this via weak functions
> seems to me to be the correct abstraction.  If we know that some SoCs
> will be able to nop these, that is.  If all SoCs will need something, we
> should simply omit the default functions.  Thanks!

Is that a goal? I can see it would be useful to be able to build for
multiple SoCs (i.e. avoid having to worry about what board you have)
but we are a way off from that :-)

Regards,
Simon
Stephen Warren Oct. 19, 2016, 5:11 p.m. UTC | #19
On 10/19/2016 09:39 AM, Tom Rini wrote:
> On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
>> On 10/18/2016 08:41 PM, Simon Glass wrote:
>>> Hi Stephen,
>>>
>>> On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> On 10/18/2016 05:08 PM, Simon Glass wrote:
>>>>>
>>>>> Hi Stephen,
>>>>>
>>>>> On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>>>>
>>>>>> On 10/18/2016 01:56 PM, Simon Glass wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/18/2016 01:03 PM, Simon Glass wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>> On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/18/2016 10:23 AM, Simon Glass wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Stephen,
>>>>>>>>>>>
>>>>>>>>>>> On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>>>>>>>>>
>>>>>>>>>>>> SoC-specific logic may be required for all forms of cache-wide
>>>>>>>>>>>> operations; invalidate and flush of both dcache and icache (note
>>>>>>>>>>>> that
>>>>>>>>>>>> only 3 of the 4 possible combinations make sense, since the icache
>>>>>>>>>>>> never
>>>>>>>>>>>> contains dirty lines). This patch adds an optional hook for all
>>>>>>>>>>>> implemented cache-wide operations, and renames the one existing
>>>>>>>>>>>> hook
>>>>>>>>>>>> to
>>>>>>>>>>>> better represent exactly which operation it is implementing. A
>>>>>>>>>>>> dummy
>>>>>>>>>>>> no-op implementation of each hook is provided. These dummy
>>>>>>>>>>>> implementations are moved into C code, since there's no need to
>>>>>>>>>>>> implement them in assembly.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/arm/cpu/armv8/cache.S                   |  6 ------
>>>>>>>>>>>> arch/arm/cpu/armv8/cache_v8.c                | 23
>>>>>>>>>>>> ++++++++++++++++++++---
>>>>>>>>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
>>>>>>>>>>>> arch/arm/include/asm/system.h                |  5 ++++-
>>>>>>>>>>>> arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
>>>>>>>>>>>> 5 files changed, 27 insertions(+), 13 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we should have a proper interface for this stuff rather than
>>>>>>>>>>> weak functions. Maybe we need a linker-list approach, or a cache
>>>>>>>>>>> uclass?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What's improper about this interface? Presumably we could argue that
>>>>>>>>>> no
>>>>>>>>>> function in the entire of U-Boot be called by symbol name, which
>>>>>>>>>> doesn't
>>>>>>>>>> seem useful.
>>>>>>>>>>
>>>>>>>>>> I'm not sure exactly what you envisage by a linker-list approach. Can
>>>>>>>>>> you
>>>>>>>>>> provide some background? I understand how the linker can construct
>>>>>>>>>> list
>>>>>>>>>> of
>>>>>>>>>> objects/implementations/..., but that doesn't seem useful here since
>>>>>>>>>> there's
>>>>>>>>>> a known-ahead-of-time single implementation of these functions in a
>>>>>>>>>> single
>>>>>>>>>> build of U-Boot.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Your own commit messages says that this is SoC-specific. I'm
>>>>>>>>> suggesting that we define an interface (which I think you have already
>>>>>>>>> done with your header file additions), and allow SoCs to implement it
>>>>>>>>> via a linker list.
>>>>>>>>>
>>>>>>>>> IMO the cache code in U-Boot is starting to get a bit creaky.
>>>>>>>>>
>>>>>>>>>> A cache uclass seems like /massive/ overkill, especially since I'd
>>>>>>>>>> expect
>>>>>>>>>> these very low-level functions to be required well before any higher
>>>>>>>>>> level
>>>>>>>>>> code like DM/classes/... to be available.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DM is available very early. But it's not clear from your patch when
>>>>>>>>> this code is actually called.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I believe that weak functions are a perfectly acceptable approach here.
>>>>>>>>
>>>>>>>> Yes, the implementation of these functions is SoC-specific. The
>>>>>>>> Makefiles
>>>>>>>> will pull in the appropriate implementation for that SoC whenever
>>>>>>>> U-Boot
>>>>>>>> is
>>>>>>>> built, just like every other board- or SoC-specific function in the
>>>>>>>> entire
>>>>>>>> of U-Boot.
>>>>>>>>
>>>>>>>> There's no need for linker lists since there is only ever one
>>>>>>>> implementation.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If there is only ever one implementation, why do you need weak
>>>>>>> functions?
>>>>>>
>>>>>>
>>>>>>
>>>>>> As I explicitly stated above, each SoC can have a different
>>>>>> implementation,
>>>>>> yet only a single implementation is ever needed for a particular U-Boot
>>>>>> build.
>>>>>>
>>>>>>> Just call them directly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The code is doing that, both before and after my patch.
>>>>>
>>>>>
>>>>> I mean call them without needing weak functions.
>>>>>
>>>>>>
>>>>>>> I think in fact you mean that
>>>>>>> there can be no implementation (but perhaps an empty one), or one
>>>>>>> implementation. You are effectively using multiple weak functions to
>>>>>>> provide default code. I think it would be better if this were
>>>>>>> explicit.
>>>>>>>
>>>>>>> I still think that the cache functions could do with a rethink.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In my opinion, this patch doesn't change the code structure at all. There
>>>>>> is
>>>>>> already an interface between the core (L1/L2) cache management code and
>>>>>> the
>>>>>> SoC-specific cache management code. That interface already uses a weak
>>>>>> function to provide the default no-op implementation. This patch doesn't
>>>>>> change any of that. All this patch does is fix that existing interface to
>>>>>> cover all 3 combinations of dcache_flush, dcache_invalidate, and
>>>>>> icache_invalidate, rather than just one of those combinations. It's more
>>>>>> of
>>>>>> a bug-fix than anything else.
>>>>>
>>>>>
>>>>> Yes I see that.
>>>>>
>>>>>>
>>>>>> If you want to rework this interface sometime, be my guest. However, I
>>>>>> don't
>>>>>> think it's fair to require that someone who simply wants to fix the
>>>>>> existing
>>>>>> code (in a way that is orthogonal to your desired interface refactoring)
>>>>>> do
>>>>>> that refactoring first, rather than doing it yourself.
>>>>>
>>>>>
>>>>> I understand what you are saying, but isn't that how open source
>>>>> software works? Believe me, I have done my fair share of refactoring
>>>>> :-)
>>>>>
>>>>> At least can you look at not making it any harder to fix up later? The
>>>>> more we pile onto this interface, the harder it will be later. We
>>>>> should aim to make ARMv8 really nice as it is the new thing.
>>>>
>>>>
>>>> I believe moving one or three functions into any replacement scheme will
>>>> have an identical level of difficulty. So, I believe the patch already
>>>> satisfies the "no harder" requirement.
>>>
>>> Well it seems your mind is already made up!
>>
>> Well, I don't believe there are any viable or reasonable
>> alternatives. I'm not prolonging this thread because I find it
>> enjoyable, but because of the lack of alternatives to what this
>> patch does.
>>
>> Doing this via DM doesn't simplify anything or make it more
>> maintainable; it's just using DM for the sake of it. DM is great
>> when it makes life simpler and code more maintainable, but we use it
>> because of those benefits, not just for the sake of it. Using DM
>> adds complexity, and so there has to be a benefit of doing so. I
>> don't believe there is here.
>>
>> Doing this by linker scripts simply doesn't make sense:
>>
>> Any given U-Boot binary will only contain a single implementation of
>> these functions, so there's no need for any kind of runtime lookup,
>> and if there was a runtime lookup, what would be the key and where
>> would it come from? I believe we'd still have some
>> compile-time-seledted function/data to drive the runtime lookup
>> process, and so we'd be in exactly the same situation as we already
>> are, just with more complexity added on top.
>>
>> Using linker scripts to switch between implementations at compile
>> time is much more complex than just letting the linker link stuff
>> together directly. That's what it's good at. Using linker scripts
>> would just add extra complexity without any benefit. We'd still end
>> up with a single implementation in the binary, yet to call it code
>> would have to indirect through the linker-defined table, rather than
>> simply calling the same code by symbol name. Uggh!
>>
>> Note that per the discussions in other forks of this thread, it's
>> likely we'll need to code all these cache operations in assembly to
>> avoid accessing DRAM while turning the cache on/off. This implies to
>> me that we'll need to keep all the cache code extremely simple and
>> direct, without any form of runtime or compile time indirection.
>
> For the record, until / unless we move to the point where we can run
> different SoCs within a single binary, doing this via weak functions
> seems to me to be the correct abstraction.  If we know that some SoCs
> will be able to nop these, that is.  If all SoCs will need something, we
> should simply omit the default functions.  Thanks!

I believe there can be[1] SoCs where the architectural by-set/way 
instructions are entirely sufficient to flush the dcache. In those 
cases, the default no-op implementation of the hook is appropriate.

[1] and likely are; not every existing ARMv8 U-Boot port implements the 
current __asm_flush_l3_cache hook, and hopefully they're all correct and 
working.
Tom Rini Oct. 19, 2016, 5:43 p.m. UTC | #20
On Wed, Oct 19, 2016 at 09:59:02AM -0600, Simon Glass wrote:
> Hi Tom,
> 
> On 19 October 2016 at 09:39, Tom Rini <trini@konsulko.com> wrote:
> >
> > On Wed, Oct 19, 2016 at 09:36:52AM -0600, Stephen Warren wrote:
> > > On 10/18/2016 08:41 PM, Simon Glass wrote:
> > > >Hi Stephen,
> > > >
> > > >On 18 October 2016 at 17:33, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > > >>On 10/18/2016 05:08 PM, Simon Glass wrote:
> > > >>>
> > > >>>Hi Stephen,
> > > >>>
> > > >>>On 18 October 2016 at 16:54, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > > >>>>
> > > >>>>On 10/18/2016 01:56 PM, Simon Glass wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>Hi Stephen,
> > > >>>>>
> > > >>>>>On 18 October 2016 at 13:10, Stephen Warren <swarren@wwwdotorg.org>
> > > >>>>>wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>On 10/18/2016 01:03 PM, Simon Glass wrote:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>Hi Stephen,
> > > >>>>>>>
> > > >>>>>>>On 18 October 2016 at 12:58, Stephen Warren <swarren@wwwdotorg.org>
> > > >>>>>>>wrote:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>On 10/18/2016 10:23 AM, Simon Glass wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>Hi Stephen,
> > > >>>>>>>>>
> > > >>>>>>>>>On 17 October 2016 at 15:35, Stephen Warren <swarren@wwwdotorg.org>
> > > >>>>>>>>>wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>From: Stephen Warren <swarren@nvidia.com>
> > > >>>>>>>>>>
> > > >>>>>>>>>>SoC-specific logic may be required for all forms of cache-wide
> > > >>>>>>>>>>operations; invalidate and flush of both dcache and icache (note
> > > >>>>>>>>>>that
> > > >>>>>>>>>>only 3 of the 4 possible combinations make sense, since the icache
> > > >>>>>>>>>>never
> > > >>>>>>>>>>contains dirty lines). This patch adds an optional hook for all
> > > >>>>>>>>>>implemented cache-wide operations, and renames the one existing
> > > >>>>>>>>>>hook
> > > >>>>>>>>>>to
> > > >>>>>>>>>>better represent exactly which operation it is implementing. A
> > > >>>>>>>>>>dummy
> > > >>>>>>>>>>no-op implementation of each hook is provided. These dummy
> > > >>>>>>>>>>implementations are moved into C code, since there's no need to
> > > >>>>>>>>>>implement them in assembly.
> > > >>>>>>>>>>
> > > >>>>>>>>>>Signed-off-by: Stephen Warren <swarren@nvidia.com>
> > > >>>>>>>>>>---
> > > >>>>>>>>>> arch/arm/cpu/armv8/cache.S                   |  6 ------
> > > >>>>>>>>>> arch/arm/cpu/armv8/cache_v8.c                | 23
> > > >>>>>>>>>>++++++++++++++++++++---
> > > >>>>>>>>>> arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S |  4 ++--
> > > >>>>>>>>>> arch/arm/include/asm/system.h                |  5 ++++-
> > > >>>>>>>>>> arch/arm/mach-tegra/tegra186/cache.c         |  2 +-
> > > >>>>>>>>>> 5 files changed, 27 insertions(+), 13 deletions(-)
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>I think we should have a proper interface for this stuff rather than
> > > >>>>>>>>>weak functions. Maybe we need a linker-list approach, or a cache
> > > >>>>>>>>>uclass?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>What's improper about this interface? Presumably we could argue that
> > > >>>>>>>>no
> > > >>>>>>>>function in the entire of U-Boot be called by symbol name, which
> > > >>>>>>>>doesn't
> > > >>>>>>>>seem useful.
> > > >>>>>>>>
> > > >>>>>>>>I'm not sure exactly what you envisage by a linker-list approach. Can
> > > >>>>>>>>you
> > > >>>>>>>>provide some background? I understand how the linker can construct
> > > >>>>>>>>list
> > > >>>>>>>>of
> > > >>>>>>>>objects/implementations/..., but that doesn't seem useful here since
> > > >>>>>>>>there's
> > > >>>>>>>>a known-ahead-of-time single implementation of these functions in a
> > > >>>>>>>>single
> > > >>>>>>>>build of U-Boot.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>Your own commit messages says that this is SoC-specific. I'm
> > > >>>>>>>suggesting that we define an interface (which I think you have already
> > > >>>>>>>done with your header file additions), and allow SoCs to implement it
> > > >>>>>>>via a linker list.
> > > >>>>>>>
> > > >>>>>>>IMO the cache code in U-Boot is starting to get a bit creaky.
> > > >>>>>>>
> > > >>>>>>>>A cache uclass seems like /massive/ overkill, especially since I'd
> > > >>>>>>>>expect
> > > >>>>>>>>these very low-level functions to be required well before any higher
> > > >>>>>>>>level
> > > >>>>>>>>code like DM/classes/... to be available.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>DM is available very early. But it's not clear from your patch when
> > > >>>>>>>this code is actually called.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>I believe that weak functions are a perfectly acceptable approach here.
> > > >>>>>>
> > > >>>>>>Yes, the implementation of these functions is SoC-specific. The
> > > >>>>>>Makefiles
> > > >>>>>>will pull in the appropriate implementation for that SoC whenever
> > > >>>>>>U-Boot
> > > >>>>>>is
> > > >>>>>>built, just like every other board- or SoC-specific function in the
> > > >>>>>>entire
> > > >>>>>>of U-Boot.
> > > >>>>>>
> > > >>>>>>There's no need for linker lists since there is only ever one
> > > >>>>>>implementation.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>If there is only ever one implementation, why do you need weak
> > > >>>>>functions?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>As I explicitly stated above, each SoC can have a different
> > > >>>>implementation,
> > > >>>>yet only a single implementation is ever needed for a particular U-Boot
> > > >>>>build.
> > > >>>>
> > > >>>>>Just call them directly.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>The code is doing that, both before and after my patch.
> > > >>>
> > > >>>
> > > >>>I mean call them without needing weak functions.
> > > >>>
> > > >>>>
> > > >>>>>I think in fact you mean that
> > > >>>>>there can be no implementation (but perhaps an empty one), or one
> > > >>>>>implementation. You are effectively using multiple weak functions to
> > > >>>>>provide default code. I think it would be better if this were
> > > >>>>>explicit.
> > > >>>>>
> > > >>>>>I still think that the cache functions could do with a rethink.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>In my opinion, this patch doesn't change the code structure at all. There
> > > >>>>is
> > > >>>>already an interface between the core (L1/L2) cache management code and
> > > >>>>the
> > > >>>>SoC-specific cache management code. That interface already uses a weak
> > > >>>>function to provide the default no-op implementation. This patch doesn't
> > > >>>>change any of that. All this patch does is fix that existing interface to
> > > >>>>cover all 3 combinations of dcache_flush, dcache_invalidate, and
> > > >>>>icache_invalidate, rather than just one of those combinations. It's more
> > > >>>>of
> > > >>>>a bug-fix than anything else.
> > > >>>
> > > >>>
> > > >>>Yes I see that.
> > > >>>
> > > >>>>
> > > >>>>If you want to rework this interface sometime, be my guest. However, I
> > > >>>>don't
> > > >>>>think it's fair to require that someone who simply wants to fix the
> > > >>>>existing
> > > >>>>code (in a way that is orthogonal to your desired interface refactoring)
> > > >>>>do
> > > >>>>that refactoring first, rather than doing it yourself.
> > > >>>
> > > >>>
> > > >>>I understand what you are saying, but isn't that how open source
> > > >>>software works? Believe me, I have done my fair share of refactoring
> > > >>>:-)
> > > >>>
> > > >>>At least can you look at not making it any harder to fix up later? The
> > > >>>more we pile onto this interface, the harder it will be later. We
> > > >>>should aim to make ARMv8 really nice as it is the new thing.
> > > >>
> > > >>
> > > >>I believe moving one or three functions into any replacement scheme will
> > > >>have an identical level of difficulty. So, I believe the patch already
> > > >>satisfies the "no harder" requirement.
> > > >
> > > >Well it seems your mind is already made up!
> > >
> > > Well, I don't believe there are any viable or reasonable
> > > alternatives. I'm not prolonging this thread because I find it
> > > enjoyable, but because of the lack of alternatives to what this
> > > patch does.
> > >
> > > Doing this via DM doesn't simplify anything or make it more
> > > maintainable; it's just using DM for the sake of it. DM is great
> > > when it makes life simpler and code more maintainable, but we use it
> > > because of those benefits, not just for the sake of it. Using DM
> > > adds complexity, and so there has to be a benefit of doing so. I
> > > don't believe there is here.
> > >
> > > Doing this by linker scripts simply doesn't make sense:
> > >
> > > Any given U-Boot binary will only contain a single implementation of
> > > these functions, so there's no need for any kind of runtime lookup,
> > > and if there was a runtime lookup, what would be the key and where
> > > would it come from? I believe we'd still have some
> > > compile-time-seledted function/data to drive the runtime lookup
> > > process, and so we'd be in exactly the same situation as we already
> > > are, just with more complexity added on top.
> > >
> > > Using linker scripts to switch between implementations at compile
> > > time is much more complex than just letting the linker link stuff
> > > together directly. That's what it's good at. Using linker scripts
> > > would just add extra complexity without any benefit. We'd still end
> > > up with a single implementation in the binary, yet to call it code
> > > would have to indirect through the linker-defined table, rather than
> > > simply calling the same code by symbol name. Uggh!
> > >
> > > Note that per the discussions in other forks of this thread, it's
> > > likely we'll need to code all these cache operations in assembly to
> > > avoid accessing DRAM while turning the cache on/off. This implies to
> > > me that we'll need to keep all the cache code extremely simple and
> > > direct, without any form of runtime or compile time indirection.
> >
> > For the record, until / unless we move to the point where we can run
> > different SoCs within a single binary, doing this via weak functions
> > seems to me to be the correct abstraction.  If we know that some SoCs
> > will be able to nop these, that is.  If all SoCs will need something, we
> > should simply omit the default functions.  Thanks!
> 
> Is that a goal? I can see it would be useful to be able to build for
> multiple SoCs (i.e. avoid having to worry about what board you have)
> but we are a way off from that :-)

We're well off from seeing about the reality of that, yes :)
diff mbox

Patch

diff --git a/arch/arm/cpu/armv8/cache.S b/arch/arm/cpu/armv8/cache.S
index 46f25e63f01d..23fa914dc556 100644
--- a/arch/arm/cpu/armv8/cache.S
+++ b/arch/arm/cpu/armv8/cache.S
@@ -150,12 +150,6 @@  ENTRY(__asm_invalidate_icache_all)
 	ret
 ENDPROC(__asm_invalidate_icache_all)
 
-ENTRY(__asm_flush_l3_cache)
-	mov	x0, #0			/* return status as success */
-	ret
-ENDPROC(__asm_flush_l3_cache)
-	.weak	__asm_flush_l3_cache
-
 /*
  * void __asm_switch_ttbr(ulong new_ttbr)
  *
diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index cd3f6c10ae12..d2965e9878a0 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -415,25 +415,36 @@  __weak void mmu_setup(void)
 	set_sctlr(get_sctlr() | CR_M);
 }
 
+__weak int invalidate_dcache_all_l3(void)
+{
+	return 0;
+}
+
 /*
  * Performs a invalidation of the entire data cache at all levels
  */
 void invalidate_dcache_all(void)
 {
 	__asm_invalidate_dcache_all();
+	invalidate_dcache_all_l3();
+}
+
+__weak int flush_dcache_all_l3(void)
+{
+	return 0;
 }
 
 /*
  * Performs a clean & invalidation of the entire data cache at all levels.
  * This function needs to be inline to avoid using stack.
- * __asm_flush_l3_cache return status of timeout
+ * flush_dcache_all_l3 return status of timeout
  */
 inline void flush_dcache_all(void)
 {
 	int ret;
 
 	__asm_flush_dcache_all();
-	ret = __asm_flush_l3_cache();
+	ret = flush_dcache_all_l3();
 	if (ret)
 		debug("flushing dcache returns 0x%x\n", ret);
 	else
@@ -623,7 +634,7 @@  void mmu_set_region_dcache_behaviour(phys_addr_t start, size_t size,
 
 void icache_enable(void)
 {
-	__asm_invalidate_icache_all();
+	invalidate_icache_all();
 	set_sctlr(get_sctlr() | CR_I);
 }
 
@@ -637,9 +648,15 @@  int icache_status(void)
 	return (get_sctlr() & CR_I) != 0;
 }
 
+__weak int invalidate_icache_all_l3(void)
+{
+	return 0;
+}
+
 void invalidate_icache_all(void)
 {
 	__asm_invalidate_icache_all();
+	invalidate_icache_all_l3();
 }
 
 #else	/* CONFIG_SYS_ICACHE_OFF */
diff --git a/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S b/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S
index 5d0b7a45c354..4e4ef8b7a6df 100644
--- a/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S
+++ b/arch/arm/cpu/armv8/fsl-layerscape/lowlevel.S
@@ -245,7 +245,7 @@  hnf_set_pstate:
 
 	ret
 
-ENTRY(__asm_flush_l3_cache)
+ENTRY(flush_dcache_all_l3)
 	/*
 	 * Return status in x0
 	 *    success 0
@@ -275,7 +275,7 @@  ENTRY(__asm_flush_l3_cache)
 	mov	x0, x8
 	mov	lr, x29
 	ret
-ENDPROC(__asm_flush_l3_cache)
+ENDPROC(flush_dcache_all_l3)
 #endif
 
 #ifdef CONFIG_MP
diff --git a/arch/arm/include/asm/system.h b/arch/arm/include/asm/system.h
index c18e1e3a10ee..095f3742ce60 100644
--- a/arch/arm/include/asm/system.h
+++ b/arch/arm/include/asm/system.h
@@ -93,9 +93,12 @@  void __asm_invalidate_dcache_all(void);
 void __asm_flush_dcache_range(u64 start, u64 end);
 void __asm_invalidate_tlb_all(void);
 void __asm_invalidate_icache_all(void);
-int __asm_flush_l3_cache(void);
 void __asm_switch_ttbr(u64 new_ttbr);
 
+int invalidate_dcache_all_l3(void);
+int flush_dcache_all_l3(void);
+int invalidate_icache_all_l3(void);
+
 void armv8_switch_to_el2(void);
 void armv8_switch_to_el1(void);
 void gic_init(void);
diff --git a/arch/arm/mach-tegra/tegra186/cache.c b/arch/arm/mach-tegra/tegra186/cache.c
index adaed8968eb9..fb0b1142e49b 100644
--- a/arch/arm/mach-tegra/tegra186/cache.c
+++ b/arch/arm/mach-tegra/tegra186/cache.c
@@ -10,7 +10,7 @@ 
 #define SMC_SIP_INVOKE_MCE	0x82FFFF00
 #define MCE_SMC_ROC_FLUSH_CACHE	11
 
-int __asm_flush_l3_cache(void)
+int flush_dcache_all_l3(void)
 {
 	struct pt_regs regs = {0};