diff mbox series

[RFC] Revert "arm: Show cache warnings in U-Boot proper only"

Message ID 20191219005211.25327-1-andre.przywara@arm.com
State RFC
Delegated to: Tom Rini
Headers show
Series [RFC] Revert "arm: Show cache warnings in U-Boot proper only" | expand

Commit Message

Andre Przywara Dec. 19, 2019, 12:52 a.m. UTC
According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
check for v7"), which introduced check_cache_range(), this was meant
as a pure debugging feature, only to be compiled in when a developer
explicitly #defined DEBUG in cache.c. Presumably the intention was to
help with finding *certain* alignment issues with DMA buffers.

Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
compiled this in *unconditionally* into U-Boot proper.

This has the annoying side effect of producing tons of somewhat
pointless warnings about non-aligned clean&invalidate operations, which
tend to be appeased by even more pointless rounding operations in many
drivers (mostly those used on ARM boards).

Bring back the old behaviour, of only compiling this in for DEBUG
situations, but staying silent otherwise.

This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Hi,

if the intention was indeed to always force cache maintenance range
alignments, I would like to open a discussion on this, because I believe
it is not useful, especially in the clean&invalidate case.

Cheers,
Andre.

 arch/arm/lib/cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Marek Vasut Dec. 19, 2019, 12:55 a.m. UTC | #1
On 12/19/19 1:52 AM, Andre Przywara wrote:
> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> check for v7"), which introduced check_cache_range(), this was meant
> as a pure debugging feature, only to be compiled in when a developer
> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> help with finding *certain* alignment issues with DMA buffers.
> 
> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> compiled this in *unconditionally* into U-Boot proper.
> 
> This has the annoying side effect of producing tons of somewhat
> pointless warnings about non-aligned clean&invalidate operations, which
> tend to be appeased by even more pointless rounding operations in many
> drivers (mostly those used on ARM boards).
> 
> Bring back the old behaviour, of only compiling this in for DEBUG
> situations, but staying silent otherwise.
> 
> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Hi,
> 
> if the intention was indeed to always force cache maintenance range
> alignments, I would like to open a discussion on this, because I believe
> it is not useful, especially in the clean&invalidate case.

Why don't you rather fix the cache op alignment bugs ?
Andre Przywara Dec. 19, 2019, 1:23 a.m. UTC | #2
On 19/12/2019 00:55, Marek Vasut wrote:

Hi Marek,

> On 12/19/19 1:52 AM, Andre Przywara wrote:
>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
>> check for v7"), which introduced check_cache_range(), this was meant
>> as a pure debugging feature, only to be compiled in when a developer
>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
>> help with finding *certain* alignment issues with DMA buffers.
>>
>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
>> compiled this in *unconditionally* into U-Boot proper.
>>
>> This has the annoying side effect of producing tons of somewhat
>> pointless warnings about non-aligned clean&invalidate operations, which
>> tend to be appeased by even more pointless rounding operations in many
>> drivers (mostly those used on ARM boards).
>>
>> Bring back the old behaviour, of only compiling this in for DEBUG
>> situations, but staying silent otherwise.
>>
>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>> Hi,
>>
>> if the intention was indeed to always force cache maintenance range
>> alignments, I would like to open a discussion on this, because I believe
>> it is not useful, especially in the clean&invalidate case.
> 
> Why don't you rather fix the cache op alignment bugs ?

Which bugs do you mean?

Those that would currently trigger those warnings?
I don't think they are actual bugs, besides I don't think they are any
cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
driver in the rpi-4-32b_defconfig).

Or those that are currently hidden because we *force* an alignment on
the *arguments* passed to invalidate_dcache_range, for instance?
These are quite numerous, so I would rather get some input first before
spending a lot of time on this.

Starting a discussion on this topic and getting some feedback was the
actual reason for this patch - even though it is still valid, IMHO.

Cheers,
Andre
Marek Vasut Dec. 19, 2019, 8:04 a.m. UTC | #3
On 12/19/19 2:23 AM, André Przywara wrote:
> On 19/12/2019 00:55, Marek Vasut wrote:
> 
> Hi Marek,

Hi,

>> On 12/19/19 1:52 AM, Andre Przywara wrote:
>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
>>> check for v7"), which introduced check_cache_range(), this was meant
>>> as a pure debugging feature, only to be compiled in when a developer
>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
>>> help with finding *certain* alignment issues with DMA buffers.
>>>
>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
>>> compiled this in *unconditionally* into U-Boot proper.
>>>
>>> This has the annoying side effect of producing tons of somewhat
>>> pointless warnings about non-aligned clean&invalidate operations, which
>>> tend to be appeased by even more pointless rounding operations in many
>>> drivers (mostly those used on ARM boards).
>>>
>>> Bring back the old behaviour, of only compiling this in for DEBUG
>>> situations, but staying silent otherwise.
>>>
>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>> Hi,
>>>
>>> if the intention was indeed to always force cache maintenance range
>>> alignments, I would like to open a discussion on this, because I believe
>>> it is not useful, especially in the clean&invalidate case.
>>
>> Why don't you rather fix the cache op alignment bugs ?
> 
> Which bugs do you mean?

Well, the ones reported by this warning.

> Those that would currently trigger those warnings?
> I don't think they are actual bugs, besides I don't think they are any
> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> driver in the rpi-4-32b_defconfig).

If your cache isn't flushed or invalidated properly, it's a bug.
On ARM32, this is (was, before this warning) a real problem.

> Or those that are currently hidden because we *force* an alignment on
> the *arguments* passed to invalidate_dcache_range, for instance?
> These are quite numerous, so I would rather get some input first before
> spending a lot of time on this.

Make sure your buffers are properly aligned and you have no cache
issues. It's really that easy. If you see numerous, then your platform
has a real problem which must be fixed ; I didn't see any for quite a
while on platforms I have access to.

> Starting a discussion on this topic and getting some feedback was the
> actual reason for this patch - even though it is still valid, IMHO.

[...]
Andre Przywara Dec. 19, 2019, 11:36 a.m. UTC | #4
On Thu, 19 Dec 2019 09:04:32 +0100
Marek Vasut <marex@denx.de> wrote:

Hi Marek,

> On 12/19/19 2:23 AM, André Przywara wrote:
> > On 19/12/2019 00:55, Marek Vasut wrote:
> > 
> > Hi Marek,  
> 
> Hi,
> 
> >> On 12/19/19 1:52 AM, Andre Przywara wrote:  
> >>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> >>> check for v7"), which introduced check_cache_range(), this was meant
> >>> as a pure debugging feature, only to be compiled in when a developer
> >>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> >>> help with finding *certain* alignment issues with DMA buffers.
> >>>
> >>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> >>> compiled this in *unconditionally* into U-Boot proper.
> >>>
> >>> This has the annoying side effect of producing tons of somewhat
> >>> pointless warnings about non-aligned clean&invalidate operations, which
> >>> tend to be appeased by even more pointless rounding operations in many
> >>> drivers (mostly those used on ARM boards).
> >>>
> >>> Bring back the old behaviour, of only compiling this in for DEBUG
> >>> situations, but staying silent otherwise.
> >>>
> >>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> >>>
> >>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >>> ---
> >>> Hi,
> >>>
> >>> if the intention was indeed to always force cache maintenance range
> >>> alignments, I would like to open a discussion on this, because I believe
> >>> it is not useful, especially in the clean&invalidate case.  
> >>
> >> Why don't you rather fix the cache op alignment bugs ?  
> > 
> > Which bugs do you mean?  
> 
> Well, the ones reported by this warning.

As mentioned, most of them are just silenced in drivers, though not actually fixed.
This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
I will try to send out this series, which might make it more clear what I am after.

> > Those that would currently trigger those warnings?
> > I don't think they are actual bugs, besides I don't think they are any
> > cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> > driver in the rpi-4-32b_defconfig).  
> 
> If your cache isn't flushed or invalidated properly, it's a bug.
> On ARM32, this is (was, before this warning) a real problem.

The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:

Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.

Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().

Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation. This
forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.

So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.

> > Or those that are currently hidden because we *force* an alignment on
> > the *arguments* passed to invalidate_dcache_range, for instance?
> > These are quite numerous, so I would rather get some input first before
> > spending a lot of time on this.  
> 
> Make sure your buffers are properly aligned and you have no cache
> issues. It's really that easy. If you see numerous, then your platform
> has a real problem which must be fixed ; I didn't see any for quite a
> while on platforms I have access to.

Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.
And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).

Cheers,
Andre.


> > Starting a discussion on this topic and getting some feedback was the
> > actual reason for this patch - even though it is still valid, IMHO.  
> 
> [...]
Marek Vasut Dec. 19, 2019, 11:43 a.m. UTC | #5
On 12/19/19 12:36 PM, Andre Przywara wrote:
> On Thu, 19 Dec 2019 09:04:32 +0100
> Marek Vasut <marex@denx.de> wrote:
> 
> Hi Marek,

Hi,

>> On 12/19/19 2:23 AM, André Przywara wrote:
>>> On 19/12/2019 00:55, Marek Vasut wrote:
>>>
>>> Hi Marek,  
>>
>> Hi,
>>
>>>> On 12/19/19 1:52 AM, Andre Przywara wrote:  
>>>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
>>>>> check for v7"), which introduced check_cache_range(), this was meant
>>>>> as a pure debugging feature, only to be compiled in when a developer
>>>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
>>>>> help with finding *certain* alignment issues with DMA buffers.
>>>>>
>>>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
>>>>> compiled this in *unconditionally* into U-Boot proper.
>>>>>
>>>>> This has the annoying side effect of producing tons of somewhat
>>>>> pointless warnings about non-aligned clean&invalidate operations, which
>>>>> tend to be appeased by even more pointless rounding operations in many
>>>>> drivers (mostly those used on ARM boards).
>>>>>
>>>>> Bring back the old behaviour, of only compiling this in for DEBUG
>>>>> situations, but staying silent otherwise.
>>>>>
>>>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
>>>>>
>>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>>> ---
>>>>> Hi,
>>>>>
>>>>> if the intention was indeed to always force cache maintenance range
>>>>> alignments, I would like to open a discussion on this, because I believe
>>>>> it is not useful, especially in the clean&invalidate case.  
>>>>
>>>> Why don't you rather fix the cache op alignment bugs ?  
>>>
>>> Which bugs do you mean?  
>>
>> Well, the ones reported by this warning.
> 
> As mentioned, most of them are just silenced in drivers, though not actually fixed.

Do you have specific details here ? They should be fixed.

> This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
> I will try to send out this series, which might make it more clear what I am after.
> 
>>> Those that would currently trigger those warnings?
>>> I don't think they are actual bugs, besides I don't think they are any
>>> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
>>> driver in the rpi-4-32b_defconfig).  
>>
>> If your cache isn't flushed or invalidated properly, it's a bug.
>> On ARM32, this is (was, before this warning) a real problem.
> 
> The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:
> 
> Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.

U-Boot is not only V7 and V8, but also all the earlier ones (back to V4).

> Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().

Sure, but that's a driver bug. That's a separate topic from this patch.

> Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation.

Wrong, you may interfere with some DMA engine which might be accessing
the area which you just overshot.

> This
> forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.

Again, driver bug, that's unrelated and must be fixed.

> So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.

See above.

>>> Or those that are currently hidden because we *force* an alignment on
>>> the *arguments* passed to invalidate_dcache_range, for instance?
>>> These are quite numerous, so I would rather get some input first before
>>> spending a lot of time on this.  
>>
>> Make sure your buffers are properly aligned and you have no cache
>> issues. It's really that easy. If you see numerous, then your platform
>> has a real problem which must be fixed ; I didn't see any for quite a
>> while on platforms I have access to.
> 
> Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.

Of course the address passed to cache ops should not be rounded up.
That's a bug (unless the buffer actually covers that rounded address,
which sometimes is the case, and might or might not thus be harmless.
That's up to the driver author to determine)

> And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).

This implication makes no sense. How do you infer from "drivers have
bugs" that a warning about such bugs should be turned into a DEBUG()?
Andre Przywara Dec. 19, 2019, 12:32 p.m. UTC | #6
On Thu, 19 Dec 2019 12:43:35 +0100
Marek Vasut <marex@denx.de> wrote:

Hi Marek,

> On 12/19/19 12:36 PM, Andre Przywara wrote:
> > On Thu, 19 Dec 2019 09:04:32 +0100
> > Marek Vasut <marex@denx.de> wrote:
> >> On 12/19/19 2:23 AM, André Przywara wrote:  
> >>> On 19/12/2019 00:55, Marek Vasut wrote:
> >>>> On 12/19/19 1:52 AM, Andre Przywara wrote:    
> >>>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> >>>>> check for v7"), which introduced check_cache_range(), this was meant
> >>>>> as a pure debugging feature, only to be compiled in when a developer
> >>>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> >>>>> help with finding *certain* alignment issues with DMA buffers.
> >>>>>
> >>>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> >>>>> compiled this in *unconditionally* into U-Boot proper.
> >>>>>
> >>>>> This has the annoying side effect of producing tons of somewhat
> >>>>> pointless warnings about non-aligned clean&invalidate operations, which
> >>>>> tend to be appeased by even more pointless rounding operations in many
> >>>>> drivers (mostly those used on ARM boards).
> >>>>>
> >>>>> Bring back the old behaviour, of only compiling this in for DEBUG
> >>>>> situations, but staying silent otherwise.
> >>>>>
> >>>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> >>>>>
> >>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >>>>> ---
> >>>>> Hi,
> >>>>>
> >>>>> if the intention was indeed to always force cache maintenance range
> >>>>> alignments, I would like to open a discussion on this, because I believe
> >>>>> it is not useful, especially in the clean&invalidate case.    
> >>>>
> >>>> Why don't you rather fix the cache op alignment bugs ?    
> >>>
> >>> Which bugs do you mean?    
> >>
> >> Well, the ones reported by this warning.  
> > 
> > As mentioned, most of them are just silenced in drivers, though not actually fixed.  
> 
> Do you have specific details here ? They should be fixed.
> 
> > This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
> > I will try to send out this series, which might make it more clear what I am after.
> >   
> >>> Those that would currently trigger those warnings?
> >>> I don't think they are actual bugs, besides I don't think they are any
> >>> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> >>> driver in the rpi-4-32b_defconfig).    
> >>
> >> If your cache isn't flushed or invalidated properly, it's a bug.
> >> On ARM32, this is (was, before this warning) a real problem.  
> > 
> > The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:
> > 
> > Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.  
> 
> U-Boot is not only V7 and V8, but also all the earlier ones (back to V4).

I checked v6 as well (same), and couldn't find any evidence that this would be different for v5 or v4, although CP15 does not seem to be architecturally defined there. Do you have an example of an older core failing that?
But anyway this particular check and patch is affecting v7 only.

> > Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().  
> 
> Sure, but that's a driver bug. That's a separate topic from this patch.
> 
> > Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation.  
> 
> Wrong, you may interfere with some DMA engine which might be accessing
> the area which you just overshot.

Can you give an example on that? Are you thinking about cleaning dirty cache lines to a DRAM region which DMA is just writing data into?
After all a clean could happen at any time, when a line gets evicted from the cache, without an explicit cache maintenance instruction. So if you have such a case, that's a separate bug.

> > This
> > forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.
> 
> Again, driver bug, that's unrelated and must be fixed.

If you fix them, they will trigger tons of warnings, mostly bogus, about end addresses not being aligned (see below). Been there, done that. That's why this patch as a first step to allow those fixes without spamming the user.

> > So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.  
> 
> See above.
> 
> >>> Or those that are currently hidden because we *force* an alignment on
> >>> the *arguments* passed to invalidate_dcache_range, for instance?
> >>> These are quite numerous, so I would rather get some input first before
> >>> spending a lot of time on this.    
> >>
> >> Make sure your buffers are properly aligned and you have no cache
> >> issues. It's really that easy. If you see numerous, then your platform
> >> has a real problem which must be fixed ; I didn't see any for quite a
> >> while on platforms I have access to.  
> > 
> > Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.  
> 
> Of course the address passed to cache ops should not be rounded up.
> That's a bug (unless the buffer actually covers that rounded address,
> which sometimes is the case, and might or might not thus be harmless.
> That's up to the driver author to determine)

Yes, and that's why the strict unconditional check we have right now is not helping. We should have them in, and developers could check the warnings for being legit or not.
 
> > And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).  
> 
> This implication makes no sense. How do you infer from "drivers have
> bugs" that a warning about such bugs should be turned into a DEBUG()?

Most (if not all) of those warnings come from the end address not being aligned. And this is something not easily being checked outside of the driver. What we want to check for invalidate is that we only invalidate a well known region, so:
1) The start address of the buffer is aligned.
2) The end address, rounded up to the next cache line, is still within our buffer (so nothing else is using the same cacheline).

We can check 1) easily, but not 2), that would require driver specific code, which knows about the size of the buffer. We could provide something like:
  bool check_cache_buffer(void *start_address, size_t length, size_t buffer_size);
which drivers would call explicitly. Or we could include the buffer size into the invalidate call, but that would change the interface.

That would at least cover the buffers that the driver provides itself (RX buffers for network drivers). Network TX buffers are handed in from the generic network code, and the driver does not know what is there after "length". So the best thing we can do is to call "flush_dcache_range(packet, packet + length);", and then should not receive warnings if the end address is unaligned. That's why I think checks for flush are not useful.

Cheers,
Andre
Tom Rini Jan. 6, 2020, 4:01 p.m. UTC | #7
On Thu, Dec 19, 2019 at 12:32:09PM +0000, Andre Przywara wrote:
> On Thu, 19 Dec 2019 12:43:35 +0100
> Marek Vasut <marex@denx.de> wrote:
> 
> Hi Marek,
> 
> > On 12/19/19 12:36 PM, Andre Przywara wrote:
> > > On Thu, 19 Dec 2019 09:04:32 +0100
> > > Marek Vasut <marex@denx.de> wrote:
> > >> On 12/19/19 2:23 AM, André Przywara wrote:  
> > >>> On 19/12/2019 00:55, Marek Vasut wrote:
> > >>>> On 12/19/19 1:52 AM, Andre Przywara wrote:    
> > >>>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> > >>>>> check for v7"), which introduced check_cache_range(), this was meant
> > >>>>> as a pure debugging feature, only to be compiled in when a developer
> > >>>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> > >>>>> help with finding *certain* alignment issues with DMA buffers.
> > >>>>>
> > >>>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> > >>>>> compiled this in *unconditionally* into U-Boot proper.
> > >>>>>
> > >>>>> This has the annoying side effect of producing tons of somewhat
> > >>>>> pointless warnings about non-aligned clean&invalidate operations, which
> > >>>>> tend to be appeased by even more pointless rounding operations in many
> > >>>>> drivers (mostly those used on ARM boards).
> > >>>>>
> > >>>>> Bring back the old behaviour, of only compiling this in for DEBUG
> > >>>>> situations, but staying silent otherwise.
> > >>>>>
> > >>>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> > >>>>>
> > >>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > >>>>> ---
> > >>>>> Hi,
> > >>>>>
> > >>>>> if the intention was indeed to always force cache maintenance range
> > >>>>> alignments, I would like to open a discussion on this, because I believe
> > >>>>> it is not useful, especially in the clean&invalidate case.    
> > >>>>
> > >>>> Why don't you rather fix the cache op alignment bugs ?    
> > >>>
> > >>> Which bugs do you mean?    
> > >>
> > >> Well, the ones reported by this warning.  
> > > 
> > > As mentioned, most of them are just silenced in drivers, though not actually fixed.  
> > 
> > Do you have specific details here ? They should be fixed.
> > 
> > > This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
> > > I will try to send out this series, which might make it more clear what I am after.
> > >   
> > >>> Those that would currently trigger those warnings?
> > >>> I don't think they are actual bugs, besides I don't think they are any
> > >>> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> > >>> driver in the rpi-4-32b_defconfig).    
> > >>
> > >> If your cache isn't flushed or invalidated properly, it's a bug.
> > >> On ARM32, this is (was, before this warning) a real problem.  
> > > 
> > > The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:
> > > 
> > > Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.  
> > 
> > U-Boot is not only V7 and V8, but also all the earlier ones (back to V4).
> 
> I checked v6 as well (same), and couldn't find any evidence that this would be different for v5 or v4, although CP15 does not seem to be architecturally defined there. Do you have an example of an older core failing that?
> But anyway this particular check and patch is affecting v7 only.
> 
> > > Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().  
> > 
> > Sure, but that's a driver bug. That's a separate topic from this patch.
> > 
> > > Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation.  
> > 
> > Wrong, you may interfere with some DMA engine which might be accessing
> > the area which you just overshot.
> 
> Can you give an example on that? Are you thinking about cleaning dirty cache lines to a DRAM region which DMA is just writing data into?
> After all a clean could happen at any time, when a line gets evicted from the cache, without an explicit cache maintenance instruction. So if you have such a case, that's a separate bug.
> 
> > > This
> > > forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.
> > 
> > Again, driver bug, that's unrelated and must be fixed.
> 
> If you fix them, they will trigger tons of warnings, mostly bogus, about end addresses not being aligned (see below). Been there, done that. That's why this patch as a first step to allow those fixes without spamming the user.
> 
> > > So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.  
> > 
> > See above.
> > 
> > >>> Or those that are currently hidden because we *force* an alignment on
> > >>> the *arguments* passed to invalidate_dcache_range, for instance?
> > >>> These are quite numerous, so I would rather get some input first before
> > >>> spending a lot of time on this.    
> > >>
> > >> Make sure your buffers are properly aligned and you have no cache
> > >> issues. It's really that easy. If you see numerous, then your platform
> > >> has a real problem which must be fixed ; I didn't see any for quite a
> > >> while on platforms I have access to.  
> > > 
> > > Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.  
> > 
> > Of course the address passed to cache ops should not be rounded up.
> > That's a bug (unless the buffer actually covers that rounded address,
> > which sometimes is the case, and might or might not thus be harmless.
> > That's up to the driver author to determine)
> 
> Yes, and that's why the strict unconditional check we have right now is not helping. We should have them in, and developers could check the warnings for being legit or not.
>  
> > > And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).  
> > 
> > This implication makes no sense. How do you infer from "drivers have
> > bugs" that a warning about such bugs should be turned into a DEBUG()?
> 
> Most (if not all) of those warnings come from the end address not being aligned. And this is something not easily being checked outside of the driver. What we want to check for invalidate is that we only invalidate a well known region, so:
> 1) The start address of the buffer is aligned.
> 2) The end address, rounded up to the next cache line, is still within our buffer (so nothing else is using the same cacheline).
> 
> We can check 1) easily, but not 2), that would require driver specific code, which knows about the size of the buffer. We could provide something like:
>   bool check_cache_buffer(void *start_address, size_t length, size_t buffer_size);
> which drivers would call explicitly. Or we could include the buffer size into the invalidate call, but that would change the interface.
> 
> That would at least cover the buffers that the driver provides itself (RX buffers for network drivers). Network TX buffers are handed in from the generic network code, and the driver does not know what is there after "length". So the best thing we can do is to call "flush_dcache_range(packet, packet + length);", and then should not receive warnings if the end address is unaligned. That's why I think checks for flush are not useful.

Thanks for having this discussion.  I think it is reasonable at this
point to investigate how we can evolve what we about cache flushing to
be better than what we have now.  Looking at the cases above, it seems
like it might be reasonable to have always enabled checks #1 (that can
easily be disabled) and if we can't easily migrate our callers to handle
#2 (and maybe not have that check be default enabled?) at least provide
the helper for it, for debugging issues.  Thanks all!
Andre Przywara Jan. 7, 2020, 6:14 p.m. UTC | #8
On Mon, 6 Jan 2020 11:01:21 -0500
Tom Rini <trini@konsulko.com> wrote:

Hi Tom,

thanks for chiming in on this!

> On Thu, Dec 19, 2019 at 12:32:09PM +0000, Andre Przywara wrote:
> > On Thu, 19 Dec 2019 12:43:35 +0100
> > Marek Vasut <marex@denx.de> wrote:
> > 
> > Hi Marek,
> >   
> > > On 12/19/19 12:36 PM, Andre Przywara wrote:  
> > > > On Thu, 19 Dec 2019 09:04:32 +0100
> > > > Marek Vasut <marex@denx.de> wrote:  
> > > >> On 12/19/19 2:23 AM, André Przywara wrote:    
> > > >>> On 19/12/2019 00:55, Marek Vasut wrote:  
> > > >>>> On 12/19/19 1:52 AM, Andre Przywara wrote:      
> > > >>>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> > > >>>>> check for v7"), which introduced check_cache_range(), this was meant
> > > >>>>> as a pure debugging feature, only to be compiled in when a developer
> > > >>>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> > > >>>>> help with finding *certain* alignment issues with DMA buffers.
> > > >>>>>
> > > >>>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> > > >>>>> compiled this in *unconditionally* into U-Boot proper.
> > > >>>>>
> > > >>>>> This has the annoying side effect of producing tons of somewhat
> > > >>>>> pointless warnings about non-aligned clean&invalidate operations, which
> > > >>>>> tend to be appeased by even more pointless rounding operations in many
> > > >>>>> drivers (mostly those used on ARM boards).
> > > >>>>>
> > > >>>>> Bring back the old behaviour, of only compiling this in for DEBUG
> > > >>>>> situations, but staying silent otherwise.
> > > >>>>>
> > > >>>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> > > >>>>>
> > > >>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > >>>>> ---
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> if the intention was indeed to always force cache maintenance range
> > > >>>>> alignments, I would like to open a discussion on this, because I believe
> > > >>>>> it is not useful, especially in the clean&invalidate case.      
> > > >>>>
> > > >>>> Why don't you rather fix the cache op alignment bugs ?      
> > > >>>
> > > >>> Which bugs do you mean?      
> > > >>
> > > >> Well, the ones reported by this warning.    
> > > > 
> > > > As mentioned, most of them are just silenced in drivers, though not actually fixed.    
> > > 
> > > Do you have specific details here ? They should be fixed.
> > >   
> > > > This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
> > > > I will try to send out this series, which might make it more clear what I am after.
> > > >     
> > > >>> Those that would currently trigger those warnings?
> > > >>> I don't think they are actual bugs, besides I don't think they are any
> > > >>> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> > > >>> driver in the rpi-4-32b_defconfig).      
> > > >>
> > > >> If your cache isn't flushed or invalidated properly, it's a bug.
> > > >> On ARM32, this is (was, before this warning) a real problem.    
> > > > 
> > > > The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:
> > > > 
> > > > Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.    
> > > 
> > > U-Boot is not only V7 and V8, but also all the earlier ones (back to V4).  
> > 
> > I checked v6 as well (same), and couldn't find any evidence that this would be different for v5 or v4, although CP15 does not seem to be architecturally defined there. Do you have an example of an older core failing that?
> > But anyway this particular check and patch is affecting v7 only.
> >   
> > > > Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().    
> > > 
> > > Sure, but that's a driver bug. That's a separate topic from this patch.
> > >   
> > > > Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation.    
> > > 
> > > Wrong, you may interfere with some DMA engine which might be accessing
> > > the area which you just overshot.  
> > 
> > Can you give an example on that? Are you thinking about cleaning dirty cache lines to a DRAM region which DMA is just writing data into?
> > After all a clean could happen at any time, when a line gets evicted from the cache, without an explicit cache maintenance instruction. So if you have such a case, that's a separate bug.
> >   
> > > > This
> > > > forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.  
> > > 
> > > Again, driver bug, that's unrelated and must be fixed.  
> > 
> > If you fix them, they will trigger tons of warnings, mostly bogus, about end addresses not being aligned (see below). Been there, done that. That's why this patch as a first step to allow those fixes without spamming the user.
> >   
> > > > So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.    
> > > 
> > > See above.
> > >   
> > > >>> Or those that are currently hidden because we *force* an alignment on
> > > >>> the *arguments* passed to invalidate_dcache_range, for instance?
> > > >>> These are quite numerous, so I would rather get some input first before
> > > >>> spending a lot of time on this.      
> > > >>
> > > >> Make sure your buffers are properly aligned and you have no cache
> > > >> issues. It's really that easy. If you see numerous, then your platform
> > > >> has a real problem which must be fixed ; I didn't see any for quite a
> > > >> while on platforms I have access to.    
> > > > 
> > > > Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.    
> > > 
> > > Of course the address passed to cache ops should not be rounded up.
> > > That's a bug (unless the buffer actually covers that rounded address,
> > > which sometimes is the case, and might or might not thus be harmless.
> > > That's up to the driver author to determine)  
> > 
> > Yes, and that's why the strict unconditional check we have right now is not helping. We should have them in, and developers could check the warnings for being legit or not.
> >    
> > > > And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).    
> > > 
> > > This implication makes no sense. How do you infer from "drivers have
> > > bugs" that a warning about such bugs should be turned into a DEBUG()?  
> > 
> > Most (if not all) of those warnings come from the end address not being aligned. And this is something not easily being checked outside of the driver. What we want to check for invalidate is that we only invalidate a well known region, so:
> > 1) The start address of the buffer is aligned.
> > 2) The end address, rounded up to the next cache line, is still within our buffer (so nothing else is using the same cacheline).
> > 
> > We can check 1) easily, but not 2), that would require driver specific code, which knows about the size of the buffer. We could provide something like:
> >   bool check_cache_buffer(void *start_address, size_t length, size_t buffer_size);
> > which drivers would call explicitly. Or we could include the buffer size into the invalidate call, but that would change the interface.
> > 
> > That would at least cover the buffers that the driver provides itself (RX buffers for network drivers). Network TX buffers are handed in from the generic network code, and the driver does not know what is there after "length". So the best thing we can do is to call "flush_dcache_range(packet, packet + length);", and then should not receive warnings if the end address is unaligned. That's why I think checks for flush are not useful.  
> 
> Thanks for having this discussion.  I think it is reasonable at this
> point to investigate how we can evolve what we about cache flushing to
> be better than what we have now.  Looking at the cases above, it seems
> like it might be reasonable to have always enabled checks #1 (that can
> easily be disabled) and if we can't easily migrate our callers to handle
> #2 (and maybe not have that check be default enabled?) at least provide
> the helper for it, for debugging issues.  Thanks all!

OK, I prototyped #2 in my sun8i-emac fixes series already. Will try to send this out later as a discussion base.

But still I get (false positive) warnings about unaligned flush() calls when fixing the driver. So shall we apply this patch to get this out of the way first? People can still bring back the checks by #define DEBUG.

For see the impact of this problem I attach my analysis of the network drivers. This is best effort and probably not 100% correct in every aspect, so bear with me if I misjudged your driver.
But I think the general picture is evident:
- Most (if not all) drivers used on ARM(32) boards do some kind of (misleading) rounding. We don't see this in MIPS drivers, for instance. This tells me that this is probably just to appease these unconditional checks.
- There are far more length roundings than base roundings.
- The ARMv7 implementation of flush/invalidate_dcache_range() is the only one doing these checks (unconditionally). Another reason having this outside of debug is a mistake and it should be reverted?

Fixing those drivers is possible, but tedious. And would rely on those checks being fixed first.

Also it sounds hard to get this all properly tested. So shall we just fix those drivers that people actually care about?

Cheers,
Andre.
key:
archs: architectures this driver is used with (based on _defconfig files)
type: F(lush) and/or I(nvalidate) used in the driver
results: both= start address *and* length
correct: actual addresses passed
rounded: addresses rounded in arguments to cache maint functions

driver name    | archs         | type  | results / comments
---------------+---------------+-------+-------------------------------------
ag7xxx.c	MIPS		FI	both correct (flush == clean only?)
altera_tse.c	NIOS2		FI	both correct
bcm-sf2-eth-gmac.c	ARM	F	both correct
designware.c	ARM, ARC	FI	start OK, length rounded
dwc_eth_qos.c	ARM		FI	both rounded
e1000.c		X86, PPC, ARM	FI	some both rounded
ethoc.c		XTENSA		F	some length using PKTSIZE_ALIGN
fec_mxc.c	ARM		FI	size rounded, some base rounded
fsl-mc/mc.c	ARM64		F	both correct
ftgmac100.c	NDS32?		no cache maint? trivial DMA API wrapper?
ftmac100.c	NDS32, RISCV	FI	both correct
ldpaa_eth/ldpaa_eth.c	ARM64	F	both correct
macb.c		ARM		FI	length rounded
mt7628-eth.c	MIPS		FI	both correct
mtk_eth.c	ARM		FI	length rounded
mvneta.c	ARM		F	length rounded in send()
mvpp2.c		ARM		F	length rounded in send()
pch_gbe.c	MIPS		FI	both correct
pcnet.c		MIPS		FI	both correct
pic32_eth.c	MIPS		FI	both correct
ravb.c		ARM		FI	flush correct, inv both rounded
rtl8169.c	ARM, x86	FI	inv both rounded, flush correct
sh_eth.c	ARM, SH		FI	inv both rounded, flush length rounded
sni_ave.c	ARM		FI	both rounded in wrappers
sun8i_emac.c	ARM		FI	some length, some base rounded
cpsw.c		ARM		FI	flush length rounded, inv length rounded
ti/davinci_emac.c	ARM	FI	(length rounded: PKTSIZE_ALIGN)
zynq_gem.c	ARM		FI	both rounded
Tom Rini Jan. 16, 2020, 6:23 p.m. UTC | #9
On Tue, Jan 07, 2020 at 06:14:22PM +0000, Andre Przywara wrote:
> On Mon, 6 Jan 2020 11:01:21 -0500
> Tom Rini <trini@konsulko.com> wrote:
> 
> Hi Tom,
> 
> thanks for chiming in on this!
> 
> > On Thu, Dec 19, 2019 at 12:32:09PM +0000, Andre Przywara wrote:
> > > On Thu, 19 Dec 2019 12:43:35 +0100
> > > Marek Vasut <marex@denx.de> wrote:
> > > 
> > > Hi Marek,
> > >   
> > > > On 12/19/19 12:36 PM, Andre Przywara wrote:  
> > > > > On Thu, 19 Dec 2019 09:04:32 +0100
> > > > > Marek Vasut <marex@denx.de> wrote:  
> > > > >> On 12/19/19 2:23 AM, André Przywara wrote:    
> > > > >>> On 19/12/2019 00:55, Marek Vasut wrote:  
> > > > >>>> On 12/19/19 1:52 AM, Andre Przywara wrote:      
> > > > >>>>> According to commit 11aa6a32eb5f ("arm: cache: Implement cache range
> > > > >>>>> check for v7"), which introduced check_cache_range(), this was meant
> > > > >>>>> as a pure debugging feature, only to be compiled in when a developer
> > > > >>>>> explicitly #defined DEBUG in cache.c. Presumably the intention was to
> > > > >>>>> help with finding *certain* alignment issues with DMA buffers.
> > > > >>>>>
> > > > >>>>> Commit bcc53bf09589 ("arm: Show cache warnings in U-Boot proper only")
> > > > >>>>> compiled this in *unconditionally* into U-Boot proper.
> > > > >>>>>
> > > > >>>>> This has the annoying side effect of producing tons of somewhat
> > > > >>>>> pointless warnings about non-aligned clean&invalidate operations, which
> > > > >>>>> tend to be appeased by even more pointless rounding operations in many
> > > > >>>>> drivers (mostly those used on ARM boards).
> > > > >>>>>
> > > > >>>>> Bring back the old behaviour, of only compiling this in for DEBUG
> > > > >>>>> situations, but staying silent otherwise.
> > > > >>>>>
> > > > >>>>> This reverts commit bcc53bf095893fbdae531a9a7b5d4ef4a125a7fc.
> > > > >>>>>
> > > > >>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > > >>>>> ---
> > > > >>>>> Hi,
> > > > >>>>>
> > > > >>>>> if the intention was indeed to always force cache maintenance range
> > > > >>>>> alignments, I would like to open a discussion on this, because I believe
> > > > >>>>> it is not useful, especially in the clean&invalidate case.      
> > > > >>>>
> > > > >>>> Why don't you rather fix the cache op alignment bugs ?      
> > > > >>>
> > > > >>> Which bugs do you mean?      
> > > > >>
> > > > >> Well, the ones reported by this warning.    
> > > > > 
> > > > > As mentioned, most of them are just silenced in drivers, though not actually fixed.    
> > > > 
> > > > Do you have specific details here ? They should be fixed.
> > > >   
> > > > > This patch here was actually triggered by me cleaning up sun8i-emac.c, which then made those warnings appear again.
> > > > > I will try to send out this series, which might make it more clear what I am after.
> > > > >     
> > > > >>> Those that would currently trigger those warnings?
> > > > >>> I don't think they are actual bugs, besides I don't think they are any
> > > > >>> cases left for 32-bit ARM boards (leave alone the new RPi4 Ethernet
> > > > >>> driver in the rpi-4-32b_defconfig).      
> > > > >>
> > > > >> If your cache isn't flushed or invalidated properly, it's a bug.
> > > > >> On ARM32, this is (was, before this warning) a real problem.    
> > > > > 
> > > > > The actual cache maintenance is fine in the drivers, it's the over-strict alignment requirements that bothers me and makes things actually worse:
> > > > > 
> > > > > Architecturally (ARM ARM) there is no alignment requirement for VA operations. Walking over a range of addresses does get easier if you start on a cache line boundary, so this is what we do in the implementations for v7 and v8.    
> > > > 
> > > > U-Boot is not only V7 and V8, but also all the earlier ones (back to V4).  
> > > 
> > > I checked v6 as well (same), and couldn't find any evidence that this would be different for v5 or v4, although CP15 does not seem to be architecturally defined there. Do you have an example of an older core failing that?
> > > But anyway this particular check and patch is affecting v7 only.
> > >   
> > > > > Alignment gets critical when it comes to a pure invalidate call. If you have other dirty data in this same cache line, you might lose the latest stores, that's why it is important to align those *buffers*. So the strict check we have in v7_dcache_inval_range() is the right thing (at least for the base address). What is not right is to deliberately align just the *parameters* of the invalidate call, as seens in net/sun8i_emac.c or net/dwc_eth_qos.c, for instance. This could actually hide bugs. Drivers should always pass on the actual buffer address to invalidate().    
> > > > 
> > > > Sure, but that's a driver bug. That's a separate topic from this patch.
> > > >   
> > > > > Then there is no real need to align the range for flush (clean&invalidate) calls. Overshooting here is not critical, and again we do align them anyway in the implementation.    
> > > > 
> > > > Wrong, you may interfere with some DMA engine which might be accessing
> > > > the area which you just overshot.  
> > > 
> > > Can you give an example on that? Are you thinking about cleaning dirty cache lines to a DRAM region which DMA is just writing data into?
> > > After all a clean could happen at any time, when a line gets evicted from the cache, without an explicit cache maintenance instruction. So if you have such a case, that's a separate bug.
> > >   
> > > > > This
> > > > > forced check only leads to drivers aligning the *parameters* of these calls, to no actual effect, probably given a wrong impression to driver authors and readers.  
> > > > 
> > > > Again, driver bug, that's unrelated and must be fixed.  
> > > 
> > > If you fix them, they will trigger tons of warnings, mostly bogus, about end addresses not being aligned (see below). Been there, done that. That's why this patch as a first step to allow those fixes without spamming the user.
> > >   
> > > > > So in preparation for fixing the drivers, I ask for those checks to become pure DEBUG warnings. Another alternative would be to relax those checks, to just require the start address of invalidate to be aligned - but this is what we check separately already.    
> > > > 
> > > > See above.
> > > >   
> > > > >>> Or those that are currently hidden because we *force* an alignment on
> > > > >>> the *arguments* passed to invalidate_dcache_range, for instance?
> > > > >>> These are quite numerous, so I would rather get some input first before
> > > > >>> spending a lot of time on this.      
> > > > >>
> > > > >> Make sure your buffers are properly aligned and you have no cache
> > > > >> issues. It's really that easy. If you see numerous, then your platform
> > > > >> has a real problem which must be fixed ; I didn't see any for quite a
> > > > >> while on platforms I have access to.    
> > > > > 
> > > > > Yes, I think the buffers are correctly aligned in the drivers, as least the ones I checked. But then we should always pass this address to invalidate, and not round it just before calling invalidate. Same goes to some degree for the end address, although is a somewhat separate topic.    
> > > > 
> > > > Of course the address passed to cache ops should not be rounded up.
> > > > That's a bug (unless the buffer actually covers that rounded address,
> > > > which sometimes is the case, and might or might not thus be harmless.
> > > > That's up to the driver author to determine)  
> > > 
> > > Yes, and that's why the strict unconditional check we have right now is not helping. We should have them in, and developers could check the warnings for being legit or not.
> > >    
> > > > > And that's why I think having check_cache_range() is useful for development and debugging, but only then, because not all warnings are legit (end addresses).    
> > > > 
> > > > This implication makes no sense. How do you infer from "drivers have
> > > > bugs" that a warning about such bugs should be turned into a DEBUG()?  
> > > 
> > > Most (if not all) of those warnings come from the end address not being aligned. And this is something not easily being checked outside of the driver. What we want to check for invalidate is that we only invalidate a well known region, so:
> > > 1) The start address of the buffer is aligned.
> > > 2) The end address, rounded up to the next cache line, is still within our buffer (so nothing else is using the same cacheline).
> > > 
> > > We can check 1) easily, but not 2), that would require driver specific code, which knows about the size of the buffer. We could provide something like:
> > >   bool check_cache_buffer(void *start_address, size_t length, size_t buffer_size);
> > > which drivers would call explicitly. Or we could include the buffer size into the invalidate call, but that would change the interface.
> > > 
> > > That would at least cover the buffers that the driver provides itself (RX buffers for network drivers). Network TX buffers are handed in from the generic network code, and the driver does not know what is there after "length". So the best thing we can do is to call "flush_dcache_range(packet, packet + length);", and then should not receive warnings if the end address is unaligned. That's why I think checks for flush are not useful.  
> > 
> > Thanks for having this discussion.  I think it is reasonable at this
> > point to investigate how we can evolve what we about cache flushing to
> > be better than what we have now.  Looking at the cases above, it seems
> > like it might be reasonable to have always enabled checks #1 (that can
> > easily be disabled) and if we can't easily migrate our callers to handle
> > #2 (and maybe not have that check be default enabled?) at least provide
> > the helper for it, for debugging issues.  Thanks all!
> 
> OK, I prototyped #2 in my sun8i-emac fixes series already. Will try to send this out later as a discussion base.
> 
> But still I get (false positive) warnings about unaligned flush() calls when fixing the driver. So shall we apply this patch to get this out of the way first? People can still bring back the checks by #define DEBUG.
> 
> For see the impact of this problem I attach my analysis of the network drivers. This is best effort and probably not 100% correct in every aspect, so bear with me if I misjudged your driver.
> But I think the general picture is evident:
> - Most (if not all) drivers used on ARM(32) boards do some kind of (misleading) rounding. We don't see this in MIPS drivers, for instance. This tells me that this is probably just to appease these unconditional checks.
> - There are far more length roundings than base roundings.
> - The ARMv7 implementation of flush/invalidate_dcache_range() is the only one doing these checks (unconditionally). Another reason having this outside of debug is a mistake and it should be reverted?
> 
> Fixing those drivers is possible, but tedious. And would rely on those checks being fixed first.
> 
> Also it sounds hard to get this all properly tested. So shall we just fix those drivers that people actually care about?

Yes, we should rework the checks to be debug or LOGLEVEL dependent or
similar.  Then we can fix the drivers as we see problems with them.
diff mbox series

Patch

diff --git a/arch/arm/lib/cache.c b/arch/arm/lib/cache.c
index 007d4ebc49..f6c5c774cd 100644
--- a/arch/arm/lib/cache.c
+++ b/arch/arm/lib/cache.c
@@ -57,8 +57,8 @@  int check_cache_range(unsigned long start, unsigned long stop)
 		ok = 0;
 
 	if (!ok) {
-		warn_non_spl("CACHE: Misaligned operation at range [%08lx, %08lx]\n",
-			     start, stop);
+		debug("CACHE: Misaligned operation at range [%08lx, %08lx]\n",
+		      start, stop);
 	}
 
 	return ok;