diff mbox

[U-Boot] mmc:dcache: Cache line size aligned internal MMC buffers

Message ID 1313745913-28672-1-git-send-email-l.majewski@samsung.com
State Superseded, archived
Delegated to: Andy Fleming
Headers show

Commit Message

Łukasz Majewski Aug. 19, 2011, 9:25 a.m. UTC
MMC operations are performed on cache line size aligned buffers.
In the current MMC implementation it is allowed to pass buffer 
with arbitrary alignment.
In this patch assumption has been made, that it is better to align the buffer
on the MMC framework boundary, than in a number of u-boot subsystems, which are
using MMC.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Andy Fleming <afleming@gmail.com>
CC: Albert ARIBAUD <albert.u.boot@aribaud.net>
---
 drivers/mmc/mmc.c |   30 +++++++++++++++++++++++++++---
 1 files changed, 27 insertions(+), 3 deletions(-)

Comments

Mike Frysinger Aug. 19, 2011, 1:57 p.m. UTC | #1
On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> +	cache_align_buf = memalign(get_dcache_line_size(),

nowhere do i see get_dcache_line_size() defined

also, what is the code size increase with your patch ?
-mike
Łukasz Majewski Aug. 19, 2011, 3:28 p.m. UTC | #2
Hi Mike,

On Fri, 19 Aug 2011 09:57:10 -0400
Mike Frysinger <vapier@gentoo.org> wrote:

> On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> > +	cache_align_buf = memalign(get_dcache_line_size(),
> 
> nowhere do i see get_dcache_line_size() defined
> 
> also, what is the code size increase with your patch ?
> -mike

Please look to the following post:
http://patchwork.ozlabs.org/patch/110501/

and another related with this issue:
http://patchwork.ozlabs.org/patch/110300/


Code size overhead (s5p_goni target):
Without proposed changes: 167928 B (u-boot.bin)
With changes: 168208 B (u-boot.bin)

Delta: 280 B
Mike Frysinger Aug. 19, 2011, 3:35 p.m. UTC | #3
On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
> On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
> > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> > > +	cache_align_buf = memalign(get_dcache_line_size(),
> > 
> > nowhere do i see get_dcache_line_size() defined
> 
> Please look to the following post:
> http://patchwork.ozlabs.org/patch/110501/
> 
> and another related with this issue:
> http://patchwork.ozlabs.org/patch/110300/

if you're posting patches with dependencies, you need to mention them 
explicitly (below the "---" area), or send proper patch series ([PATCH N/M]).

ignoring that, this patch will break all arches except arm.  that's bad 
mmmkay.  you probably need to move that weak def out of arm's cache.c and into 
like lib/cache.c.

> > also, what is the code size increase with your patch ?
> 
> Code size overhead (s5p_goni target):
> Without proposed changes: 167928 B (u-boot.bin)
> With changes: 168208 B (u-boot.bin)
> 
> Delta: 280 B

np if it gives significant (more than system noise) speedups.  any details on 
that ?
-mike
Łukasz Majewski Aug. 22, 2011, 7:29 a.m. UTC | #4
Hi Mike,

On Fri, 19 Aug 2011 11:35:50 -0400
Mike Frysinger <vapier@gentoo.org> wrote:

> On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
> > On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
> > > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> > > > +	cache_align_buf = memalign(get_dcache_line_size(),
> > > 
> > > nowhere do i see get_dcache_line_size() defined
> > 
> > Please look to the following post:
> > http://patchwork.ozlabs.org/patch/110501/
> > 
> > and another related with this issue:
> > http://patchwork.ozlabs.org/patch/110300/
> 
> if you're posting patches with dependencies, you need to mention them 
> explicitly (below the "---" area), or send proper patch series
> ([PATCH N/M]).
> 
> ignoring that, this patch will break all arches except arm.  that's
> bad mmmkay.  you probably need to move that weak def out of arm's
> cache.c and into like lib/cache.c.

Yes, I will prepare two patch series:
One addressing the get_dcache_line_size function for all u-boot
architectures. 

Another patch series will address changes to the drivers/mmc.c file. 

> > > also, what is the code size increase with your patch ?
> > 
> > Code size overhead (s5p_goni target):
> > Without proposed changes: 167928 B (u-boot.bin)
> > With changes: 168208 B (u-boot.bin)
> > 
> > Delta: 280 B
> 
> np if it gives significant (more than system noise) speedups.  any
> details on that ?
> -mike

No tests performed yet. The goal of those patches is to preserve the
MMC subsystem functionality when dcache is enabled (the ext_csd[512]
corruption is observed with d-cache enabled).


I'm particularly interested if the approach with memalign and
get_dcache_line_size is the preferred way to go.

I will think about some reliable ways to measure the MMC performance
with enabled and disabled MMC subsystem.
Mike Frysinger Aug. 22, 2011, 4:08 p.m. UTC | #5
On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
> On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
> > On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
> > > On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
> > > > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> > > > also, what is the code size increase with your patch ?
> > > 
> > > Code size overhead (s5p_goni target):
> > > Without proposed changes: 167928 B (u-boot.bin)
> > > With changes: 168208 B (u-boot.bin)
> > > 
> > > Delta: 280 B
> > 
> > np if it gives significant (more than system noise) speedups.  any
> > details on that ?
> 
> No tests performed yet. The goal of those patches is to preserve the
> MMC subsystem functionality when dcache is enabled (the ext_csd[512]
> corruption is observed with d-cache enabled).

so you're papering over a bug in some controller's cache handling ?  shouldnt 
you fix the controller in question by having it flush its caches ?  aligning 
random buffers to make cache issues "go away" isnt the right way for anything.
-mike
Anton Staaf Aug. 22, 2011, 4:42 p.m. UTC | #6
On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
>> On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
>> > On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
>> > > On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
>> > > > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
>> > > > also, what is the code size increase with your patch ?
>> > >
>> > > Code size overhead (s5p_goni target):
>> > > Without proposed changes: 167928 B (u-boot.bin)
>> > > With changes: 168208 B (u-boot.bin)
>> > >
>> > > Delta: 280 B
>> >
>> > np if it gives significant (more than system noise) speedups.  any
>> > details on that ?
>>
>> No tests performed yet. The goal of those patches is to preserve the
>> MMC subsystem functionality when dcache is enabled (the ext_csd[512]
>> corruption is observed with d-cache enabled).
>
> so you're papering over a bug in some controller's cache handling ?  shouldnt
> you fix the controller in question by having it flush its caches ?  aligning
> random buffers to make cache issues "go away" isnt the right way for anything.
> -mike

No, this isn't something that can be fixed in the controller driver
code.  This is a fundamental problem with buffers in U-Boot that needs
to be resolved by aligning all buffers used for DMA.  The main problem
is that invalidating a non-cache line aligned buffer is not a safe
operation.  There have been a number of threads discussing this.  The
general consensus was to make attempting to invalidate an unaligned
buffer an error and then to clean up the unaligned buffers as we find
them.

Lukasz, I also have been using memalign to clean up accesses in local
patches, so you've got my vote there. I am curious as to whether we
should provide a single block allocation API or if each subsection
should add lazy memalign allocations to create aligned buffers when
they are needed...

Thanks,
    Anton

> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
>
>
Marek Vasut Aug. 22, 2011, 4:52 p.m. UTC | #7
On Monday, August 22, 2011 06:42:06 PM Anton Staaf wrote:
> On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
> >> On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
> >> > On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
> >> > > On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
> >> > > > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> >> > > > also, what is the code size increase with your patch ?
> >> > > 
> >> > > Code size overhead (s5p_goni target):
> >> > > Without proposed changes: 167928 B (u-boot.bin)
> >> > > With changes: 168208 B (u-boot.bin)
> >> > > 
> >> > > Delta: 280 B
> >> > 
> >> > np if it gives significant (more than system noise) speedups.  any
> >> > details on that ?
> >> 
> >> No tests performed yet. The goal of those patches is to preserve the
> >> MMC subsystem functionality when dcache is enabled (the ext_csd[512]
> >> corruption is observed with d-cache enabled).
> > 
> > so you're papering over a bug in some controller's cache handling ?
> >  shouldnt you fix the controller in question by having it flush its
> > caches ?  aligning random buffers to make cache issues "go away" isnt
> > the right way for anything. -mike
> 
> No, this isn't something that can be fixed in the controller driver
> code.  This is a fundamental problem with buffers in U-Boot that needs
> to be resolved by aligning all buffers used for DMA.  The main problem
> is that invalidating a non-cache line aligned buffer is not a safe
> operation.  There have been a number of threads discussing this.  The
> general consensus was to make attempting to invalidate an unaligned
> buffer an error and then to clean up the unaligned buffers as we find
> them.
> 
> Lukasz, I also have been using memalign to clean up accesses in local
> patches, so you've got my vote there. I am curious as to whether we
> should provide a single block allocation API or if each subsection
> should add lazy memalign allocations to create aligned buffers when
> they are needed...

Maybe some dma_allocate_aligned() would be cool. And probably control the 
alignment with some #define CONFIG PLATFORM_ALIGNMENT_SIZE ?

Cheers
> 
> Thanks,
>     Anton
> 
> > _______________________________________________
> > U-Boot mailing list
> > U-Boot@lists.denx.de
> > http://lists.denx.de/mailman/listinfo/u-boot
> 
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
Mike Frysinger Aug. 22, 2011, 5:17 p.m. UTC | #8
On Monday, August 22, 2011 12:42:06 Anton Staaf wrote:
> On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
> >> No tests performed yet. The goal of those patches is to preserve the
> >> MMC subsystem functionality when dcache is enabled (the ext_csd[512]
> >> corruption is observed with d-cache enabled).
> > 
> > so you're papering over a bug in some controller's cache handling ?
> >  shouldnt you fix the controller in question by having it flush its
> > caches ?  aligning random buffers to make cache issues "go away" isnt
> > the right way for anything.
> 
> No, this isn't something that can be fixed in the controller driver
> code.  This is a fundamental problem with buffers in U-Boot that needs
> to be resolved by aligning all buffers used for DMA.  The main problem
> is that invalidating a non-cache line aligned buffer is not a safe
> operation.  There have been a number of threads discussing this.  The
> general consensus was to make attempting to invalidate an unaligned
> buffer an error and then to clean up the unaligned buffers as we find
> them.

Linux has taken the approach of the callers providing dma safe buffers instead 
of having lower layers trying to fix things up all the time.  and for obvious 
reasons: you avoid the double allocation, the memcpy, and you avoid unclear 
documentation between layers where multiple layers attempt to fixup the 
callers thus futher multiplying the duplicated allocations/copies/etc...

so in this case, why not fix the caller that is passing a dma unaligned buffer 
down to the mmc code ?
-mike
Anton Staaf Aug. 22, 2011, 6:15 p.m. UTC | #9
Yes, this would be a much preferable approach.  The only problem that
I encountered when attempting to do that in the Chromium U-Boot was
that there are exposed stand alone U-Boot API's that can pass in
pointers that make their way all the way down to the device driver.
Two solutions occurred to me.  One was to put an Error or assert as
high up as possible and expect violators to fix their code, the other
was to patch things up in lower layers (right now we have a bounce
buffer in the tegra MMC driver to deal with unaligned buffers).  My
intent was that this would give us a functional U-Boot while we took
the time to clean up as many of the unaligned buffers that U-Boot
uses, and then tackle the question of stand alone applications.

-Anton

On Mon, Aug 22, 2011 at 10:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 22, 2011 12:42:06 Anton Staaf wrote:
>> On Mon, Aug 22, 2011 at 9:08 AM, Mike Frysinger <vapier@gentoo.org> wrote:
>> > On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
>> >> No tests performed yet. The goal of those patches is to preserve the
>> >> MMC subsystem functionality when dcache is enabled (the ext_csd[512]
>> >> corruption is observed with d-cache enabled).
>> >
>> > so you're papering over a bug in some controller's cache handling ?
>> >  shouldnt you fix the controller in question by having it flush its
>> > caches ?  aligning random buffers to make cache issues "go away" isnt
>> > the right way for anything.
>>
>> No, this isn't something that can be fixed in the controller driver
>> code.  This is a fundamental problem with buffers in U-Boot that needs
>> to be resolved by aligning all buffers used for DMA.  The main problem
>> is that invalidating a non-cache line aligned buffer is not a safe
>> operation.  There have been a number of threads discussing this.  The
>> general consensus was to make attempting to invalidate an unaligned
>> buffer an error and then to clean up the unaligned buffers as we find
>> them.
>
> Linux has taken the approach of the callers providing dma safe buffers instead
> of having lower layers trying to fix things up all the time.  and for obvious
> reasons: you avoid the double allocation, the memcpy, and you avoid unclear
> documentation between layers where multiple layers attempt to fixup the
> callers thus futher multiplying the duplicated allocations/copies/etc...
>
> so in this case, why not fix the caller that is passing a dma unaligned buffer
> down to the mmc code ?
> -mike
>
Mike Frysinger Aug. 22, 2011, 6:31 p.m. UTC | #10
On Monday, August 22, 2011 14:15:22 Anton Staaf wrote:

please dont top post

> Yes, this would be a much preferable approach.  The only problem that
> I encountered when attempting to do that in the Chromium U-Boot was
> that there are exposed stand alone U-Boot API's that can pass in
> pointers that make their way all the way down to the device driver.
> Two solutions occurred to me.  One was to put an Error or assert as
> high up as possible and expect violators to fix their code, the other
> was to patch things up in lower layers (right now we have a bounce
> buffer in the tegra MMC driver to deal with unaligned buffers).  My
> intent was that this would give us a functional U-Boot while we took
> the time to clean up as many of the unaligned buffers that U-Boot
> uses, and then tackle the question of stand alone applications.

could you elaborate on the code paths in question ?
-mike
Anton Staaf Aug. 22, 2011, 6:57 p.m. UTC | #11
On Mon, Aug 22, 2011 at 11:31 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, August 22, 2011 14:15:22 Anton Staaf wrote:
>
> please dont top post

Sorry about that.

>> Yes, this would be a much preferable approach.  The only problem that
>> I encountered when attempting to do that in the Chromium U-Boot was
>> that there are exposed stand alone U-Boot API's that can pass in
>> pointers that make their way all the way down to the device driver.
>> Two solutions occurred to me.  One was to put an Error or assert as
>> high up as possible and expect violators to fix their code, the other
>> was to patch things up in lower layers (right now we have a bounce
>> buffer in the tegra MMC driver to deal with unaligned buffers).  My
>> intent was that this would give us a functional U-Boot while we took
>> the time to clean up as many of the unaligned buffers that U-Boot
>> uses, and then tackle the question of stand alone applications.
>
> could you elaborate on the code paths in question ?

Heh, now I've got to go fine them again.  The ub_dev_read function in
the example/api/glue.c is an example of a use of the public syscall
API that as far as I can tell passes down the buffer pointer all the
way to the driver layer.

As for the unaligned buffers I've run into:

fs/ext2/dev.c: sec_buf is allocated on the stack and thus is not
guaranteed to be aligned.
drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stack.
drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
allocated on the stack.
drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
disk/part_efi.c: gpt_header in print_part_efi is allocated on the stack.
disk/part_efi.c: gpt_header in get_partition_info_efi is allocated on the stack.
disk/part_efi.c: legacy_mbr in test_part_efi is allocated on the stack.
disk/part_efi.c: pte in alloc_read_gpt_entries is allocated using
malloc (not memalign).

I have working solutions to all of these but they generally use the
lazy allocation of an aligned buffer as needed pattern.  I wanted to
see the results of Lukasz current thread before sending them upstream.
 But that's probably not right.  I should probably send them now so we
can see a number of examples of the problem and get a better idea of
how to fix it.

Thanks,
    Anton

> -mike
>
Łukasz Majewski Aug. 23, 2011, 8:42 a.m. UTC | #12
Hi Mike,

On Mon, 22 Aug 2011 12:08:42 -0400
Mike Frysinger <vapier@gentoo.org> wrote:

> On Monday, August 22, 2011 03:29:52 Lukasz Majewski wrote:
> > On Fri, 19 Aug 2011 11:35:50 -0400 Mike Frysinger wrote:
> > > On Friday, August 19, 2011 11:28:18 Lukasz Majewski wrote:
> > > > On Fri, 19 Aug 2011 09:57:10 -0400 Mike Frysinger wrote:
> > > > > On Friday, August 19, 2011 05:25:13 Lukasz Majewski wrote:
> > > > > also, what is the code size increase with your patch ?
> > > > 
> > > > Code size overhead (s5p_goni target):
> > > > Without proposed changes: 167928 B (u-boot.bin)
> > > > With changes: 168208 B (u-boot.bin)
> > > > 
> > > > Delta: 280 B
> > > 
> > > np if it gives significant (more than system noise) speedups.  any
> > > details on that ?
> > 
> > No tests performed yet. The goal of those patches is to preserve the
> > MMC subsystem functionality when dcache is enabled (the ext_csd[512]
> > corruption is observed with d-cache enabled).
> 
> so you're papering over a bug in some controller's cache handling ?
> shouldnt you fix the controller in question by having it flush its
> caches ?  aligning random buffers to make cache issues "go away" isnt
> the right way for anything. -mike

Please look into the following patch:
http://patchwork.ozlabs.org/patch/110576/

It seems that only flushing/invalidating buffers is not enough.
Łukasz Majewski Aug. 23, 2011, 9:19 a.m. UTC | #13
Hi Anton,

On Mon, 22 Aug 2011 11:57:57 -0700
Anton Staaf <robotboy@google.com> wrote:

> drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac
> drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
> allocated on the stack.
> drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.

This allocations are already fixed:

http://patchwork.ozlabs.org/patch/110300/
http://patchwork.ozlabs.org/patch/109790/

If any doubts/comments/ideas, please let me know :-)

> But that's probably not right.  I should probably send them now so we
> can see a number of examples of the problem and get a better idea of
> how to fix it.

Discussion is always welcome.
Anton Staaf Aug. 23, 2011, 5 p.m. UTC | #14
On Tue, Aug 23, 2011 at 2:19 AM, Lukasz Majewski <l.majewski@samsung.com> wrote:
> Hi Anton,
>
> On Mon, 22 Aug 2011 11:57:57 -0700
> Anton Staaf <robotboy@google.com> wrote:
>
>> drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac
>> drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
>> allocated on the stack.
>> drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
>
> This allocations are already fixed:
>
> http://patchwork.ozlabs.org/patch/110300/
> http://patchwork.ozlabs.org/patch/109790/

Yup, should have said that you'd already taken care of some of these.
I'll send a patch now for another unaligned buffer in the mmc code
that isn't in your patch set.

Thanks,
    Anton

> If any doubts/comments/ideas, please let me know :-)
>
>> But that's probably not right.  I should probably send them now so we
>> can see a number of examples of the problem and get a better idea of
>> how to fix it.
>
> Discussion is always welcome.
>
> --
> Best regards,
>
> Lukasz Majewski
>
> Samsung Poland R&D Center
> Platform Group
>
Mike Frysinger Aug. 23, 2011, 5:30 p.m. UTC | #15
On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
> On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
> > drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac
> > drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
> > allocated on the stack.
> > drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
> 
> This allocations are already fixed:
> 
> http://patchwork.ozlabs.org/patch/110300/
> http://patchwork.ozlabs.org/patch/109790/
> 
> If any doubts/comments/ideas, please let me know :-)

hmm, i wish we had a memalign_alloca().  and all this copy & pasting of 
get_dcache_line_size() makes me unhappy as we're encoding too-low-of-a-level 
logic into funcs.

what about adding a new func like:
#define dma_buffer_alloca(size)

and it would take care of allocating a big enough aligned buffer on the stack.
-mike
Anton Staaf Aug. 23, 2011, 6:12 p.m. UTC | #16
On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
>> On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
>> > drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac
>> > drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
>> > allocated on the stack.
>> > drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
>>
>> This allocations are already fixed:
>>
>> http://patchwork.ozlabs.org/patch/110300/
>> http://patchwork.ozlabs.org/patch/109790/
>>
>> If any doubts/comments/ideas, please let me know :-)
>
> hmm, i wish we had a memalign_alloca().  and all this copy & pasting of
> get_dcache_line_size() makes me unhappy as we're encoding too-low-of-a-level
> logic into funcs.
>
> what about adding a new func like:
> #define dma_buffer_alloca(size)

I generally avoid large allocations on the stack, they can confuse
virtual stack management and blow out small embedded stacks.  But
neither of these are really a problem for most U-Boot targets.  Are
you thinking something like:

#define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
1) & ~(get_dcache_line_size() - 1);

Subtracting one from the total allocated size is technically correct,
but could fail poorly if the get_dcache_line_size function returned 0
for some reason.  Perhaps because it's called on a target with no
cache so the implementer figured 0 was as good as any value to return.

I have a nagging suspicion that I'm forgetting something though.  I
know I looked at using alloca first when I was starting to deal with
cache and DMA issues on the Tegra.  And I seem to recall a reason not
to use it.  But it's not coming back to me now.  Perhaps someone else
will think of it.  :)

Thanks,
    Anton

> and it would take care of allocating a big enough aligned buffer on the stack.
> -mike
>
Mike Frysinger Aug. 23, 2011, 6:35 p.m. UTC | #17
On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
> On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
> > On Tuesday, August 23, 2011 05:19:39 Lukasz Majewski wrote:
> >> On Mon, 22 Aug 2011 11:57:57 -0700 Anton Staaf wrote:
> >> > drivers/mmc/mmc.c: ext_csd in mmc_change_freq is allocated on the stac
> >> > drivers/mmc/mmc.c: scr and switch_status in sd_change_freq are
> >> > allocated on the stack.
> >> > drivers/mmc/mmc.c: ext_csd in mmc_startup is allocated on the stack.
> >> 
> >> This allocations are already fixed:
> >> 
> >> http://patchwork.ozlabs.org/patch/110300/
> >> http://patchwork.ozlabs.org/patch/109790/
> >> 
> >> If any doubts/comments/ideas, please let me know :-)
> > 
> > hmm, i wish we had a memalign_alloca().  and all this copy & pasting of
> > get_dcache_line_size() makes me unhappy as we're encoding
> > too-low-of-a-level logic into funcs.
> > 
> > what about adding a new func like:
> > #define dma_buffer_alloca(size)
> 
> I generally avoid large allocations on the stack, they can confuse
> virtual stack management and blow out small embedded stacks.  But
> neither of these are really a problem for most U-Boot targets.

and more importantly, we're already doing these things on the stack ;)

> Are you thinking something like:
> 
> #define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
> 1) & ~(get_dcache_line_size() - 1);

yes, exactly

> Subtracting one from the total allocated size is technically correct,
> but could fail poorly if the get_dcache_line_size function returned 0
> for some reason.  Perhaps because it's called on a target with no
> cache so the implementer figured 0 was as good as any value to return.

i'm not sure we should cater to buggy get_dcache_line_size implementations

> I have a nagging suspicion that I'm forgetting something though.  I
> know I looked at using alloca first when I was starting to deal with
> cache and DMA issues on the Tegra.  And I seem to recall a reason not
> to use it.  But it's not coming back to me now.  Perhaps someone else
> will think of it.  :)

if the stack lives in a place that dma can't access, but that'd already be a 
problem for these funcs
-mike
Mike Frysinger Aug. 23, 2011, 6:36 p.m. UTC | #18
On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
> On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
> > what about adding a new func like:
> > #define dma_buffer_alloca(size)
> 
> I generally avoid large allocations on the stack, they can confuse
> virtual stack management and blow out small embedded stacks.

oh, and that doesnt preclude also adding a dma_buffer_malloc(size).  it's just 
that all the cases here are already on the stack, so i was focusing on that.
-mike
Anton Staaf Aug. 23, 2011, 6:46 p.m. UTC | #19
On Tue, Aug 23, 2011 at 11:36 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday, August 23, 2011 14:12:09 Anton Staaf wrote:
>> On Tue, Aug 23, 2011 at 10:30 AM, Mike Frysinger wrote:
>> > what about adding a new func like:
>> > #define dma_buffer_alloca(size)
>>
>> I generally avoid large allocations on the stack, they can confuse
>> virtual stack management and blow out small embedded stacks.
>
> oh, and that doesnt preclude also adding a dma_buffer_malloc(size).  it's just
> that all the cases here are already on the stack, so i was focusing on that.
> -mike
>

Yup, totally reasonable.  Lukasz, would you like to add the above
macro to your dcache cache line size patch?  Then we can change our
other patches to use it instead of memalign.

Thanks,
    Anton
Wolfgang Denk Aug. 23, 2011, 8:12 p.m. UTC | #20
Dear Anton Staaf,

In message <CAF6FioXeFQP2a0VaZOVOTmxPvJxF4=kamS18=2=p6TFNefwUuA@mail.gmail.com> you wrote:
>
> > what about adding a new func like:
> > #define dma_buffer_alloca(size)
> 
> I generally avoid large allocations on the stack, they can confuse
> virtual stack management and blow out small embedded stacks.  But
> neither of these are really a problem for most U-Boot targets.  Are
> you thinking something like:
> 
> #define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
> 1) & ~(get_dcache_line_size() - 1);

I don't think I will accept any alloca() based code into mainline.

Best regards,

Wolfgang Denk
Anton Staaf Aug. 23, 2011, 8:27 p.m. UTC | #21
On Tue, Aug 23, 2011 at 1:12 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Anton Staaf,
>
> In message <CAF6FioXeFQP2a0VaZOVOTmxPvJxF4=kamS18=2=p6TFNefwUuA@mail.gmail.com> you wrote:
>>
>> > what about adding a new func like:
>> > #define dma_buffer_alloca(size)
>>
>> I generally avoid large allocations on the stack, they can confuse
>> virtual stack management and blow out small embedded stacks.  But
>> neither of these are really a problem for most U-Boot targets.  Are
>> you thinking something like:
>>
>> #define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
>> 1) & ~(get_dcache_line_size() - 1);
>
> I don't think I will accept any alloca() based code into mainline.

Ahh, that must have been my reason for not using it in the first
place, a premonition.  :)

So then, to guide our efforts, what is a more suitable solution?
Would you prefer we stick with the existing path of calling memalign
and passing it the cache size by directly calling
get_dcache_line_size?  Or would you prefer something more like a
dma_buffer_malloc function that allocates on the heap a cache line
size aligned buffer and returns it?

Hmm, that will require a bit of thought because we need to recover the
original pointer returned by malloc to cleanly free the buffer later.

Thanks,
    Anton

> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> "There are three principal ways to lose money: wine, women,  and  en-
> gineers.  While  the first two are more pleasant, the third is by far
> the more certain."                      -- Baron Rothschild, ca. 1800
>
Mike Frysinger Aug. 23, 2011, 8:35 p.m. UTC | #22
On Tuesday, August 23, 2011 16:12:03 Wolfgang Denk wrote:
> Anton Staaf wrote:
> > > what about adding a new func like:
> > > #define dma_buffer_alloca(size)
> > 
> > I generally avoid large allocations on the stack, they can confuse
> > virtual stack management and blow out small embedded stacks.  But
> > neither of these are really a problem for most U-Boot targets.  Are
> > you thinking something like:
> > 
> > #define dma_buffer_alloca(size) alloca(size + get_dcache_line_size() -
> > 1) & ~(get_dcache_line_size() - 1);
> 
> I don't think I will accept any alloca() based code into mainline.

hmm, for some reason i thought we were using alloca.  but i dont see any 
consumers in u-boot itself.
-mike
Mike Frysinger Aug. 23, 2011, 8:37 p.m. UTC | #23
On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
> So then, to guide our efforts, what is a more suitable solution?
> Would you prefer we stick with the existing path of calling memalign
> and passing it the cache size by directly calling
> get_dcache_line_size?  Or would you prefer something more like a
> dma_buffer_malloc function that allocates on the heap a cache line
> size aligned buffer and returns it?

memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is 
the way to go imo.  anything that involves end code having to figure out how 
to align things itself is asking for pain.
-mike
Anton Staaf Aug. 23, 2011, 9:06 p.m. UTC | #24
On Tue, Aug 23, 2011 at 1:37 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
>> So then, to guide our efforts, what is a more suitable solution?
>> Would you prefer we stick with the existing path of calling memalign
>> and passing it the cache size by directly calling
>> get_dcache_line_size?  Or would you prefer something more like a
>> dma_buffer_malloc function that allocates on the heap a cache line
>> size aligned buffer and returns it?
>
> memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is
> the way to go imo.  anything that involves end code having to figure out how
> to align things itself is asking for pain.

Indeed, I had temporarily forgotten about memalign it seems. :/

Does it make more sense to put such a function into Lukasz's patch, or
a separate patch?

Thanks,
    Anton

> -mike
>
Wolfgang Denk Aug. 23, 2011, 9:09 p.m. UTC | #25
Dear Mike Frysinger,

In message <201108231637.05845.vapier@gentoo.org> you wrote:
>
> On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
> > So then, to guide our efforts, what is a more suitable solution?
> > Would you prefer we stick with the existing path of calling memalign
> > and passing it the cache size by directly calling
> > get_dcache_line_size?  Or would you prefer something more like a
> > dma_buffer_malloc function that allocates on the heap a cache line
> > size aligned buffer and returns it?
>
> memalign() is simply a malloc() with offset fudging, so dma_buffer_malloc() is 
> the way to go imo.  anything that involves end code having to figure out how 
> to align things itself is asking for pain.

I would like to avoid using any malloc code here.  We have to keep in
mind that such code changes will spread, and will be copied into
driver code, file systems, etc. which might be used (and even
required, for example for NAND or SDCard booting systems) before
relocation - but malloc becomes available only after relocation.

Why cannot we define a macro that declares a (sufficiently sized)
buffer on the stack and provides and a pointer to a (correctly
aligned) address in this buffer?

Best regards,

Wolfgang Denk
Mike Frysinger Aug. 23, 2011, 9:32 p.m. UTC | #26
On Tuesday, August 23, 2011 17:06:41 Anton Staaf wrote:
> On Tue, Aug 23, 2011 at 1:37 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
> >> So then, to guide our efforts, what is a more suitable solution?
> >> Would you prefer we stick with the existing path of calling memalign
> >> and passing it the cache size by directly calling
> >> get_dcache_line_size?  Or would you prefer something more like a
> >> dma_buffer_malloc function that allocates on the heap a cache line
> >> size aligned buffer and returns it?
> > 
> > memalign() is simply a malloc() with offset fudging, so
> > dma_buffer_malloc() is the way to go imo.  anything that involves end
> > code having to figure out how to align things itself is asking for pain.
> 
> Indeed, I had temporarily forgotten about memalign it seems. :/
> 
> Does it make more sense to put such a function into Lukasz's patch, or
> a separate patch?

Lukasz' havent been merged yet, so imo it makes more sense to put the sane 
framework in place and then fix the relevant code on top of that
-mike
Mike Frysinger Aug. 23, 2011, 9:32 p.m. UTC | #27
On Tuesday, August 23, 2011 17:09:37 Wolfgang Denk wrote:
> Mike Frysinger wrote:
> > On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
> > > So then, to guide our efforts, what is a more suitable solution?
> > > Would you prefer we stick with the existing path of calling memalign
> > > and passing it the cache size by directly calling
> > > get_dcache_line_size?  Or would you prefer something more like a
> > > dma_buffer_malloc function that allocates on the heap a cache line
> > > size aligned buffer and returns it?
> > 
> > memalign() is simply a malloc() with offset fudging, so
> > dma_buffer_malloc() is the way to go imo.  anything that involves end
> > code having to figure out how to align things itself is asking for pain.
> 
> I would like to avoid using any malloc code here.  We have to keep in
> mind that such code changes will spread, and will be copied into
> driver code, file systems, etc. which might be used (and even
> required, for example for NAND or SDCard booting systems) before
> relocation - but malloc becomes available only after relocation.
> 
> Why cannot we define a macro that declares a (sufficiently sized)
> buffer on the stack and provides and a pointer to a (correctly
> aligned) address in this buffer?

isnt that what i already posted and you NAK-ed ? :)

DMA_DECLARE_BUFFER(...)
-mike
Anton Staaf Aug. 23, 2011, 9:48 p.m. UTC | #28
On Tue, Aug 23, 2011 at 2:32 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday, August 23, 2011 17:09:37 Wolfgang Denk wrote:
>> Mike Frysinger wrote:
>> > On Tuesday, August 23, 2011 16:27:26 Anton Staaf wrote:
>> > > So then, to guide our efforts, what is a more suitable solution?
>> > > Would you prefer we stick with the existing path of calling memalign
>> > > and passing it the cache size by directly calling
>> > > get_dcache_line_size?  Or would you prefer something more like a
>> > > dma_buffer_malloc function that allocates on the heap a cache line
>> > > size aligned buffer and returns it?
>> >
>> > memalign() is simply a malloc() with offset fudging, so
>> > dma_buffer_malloc() is the way to go imo.  anything that involves end
>> > code having to figure out how to align things itself is asking for pain.
>>
>> I would like to avoid using any malloc code here.  We have to keep in
>> mind that such code changes will spread, and will be copied into
>> driver code, file systems, etc. which might be used (and even
>> required, for example for NAND or SDCard booting systems) before
>> relocation - but malloc becomes available only after relocation.
>>
>> Why cannot we define a macro that declares a (sufficiently sized)
>> buffer on the stack and provides and a pointer to a (correctly
>> aligned) address in this buffer?
>
> isnt that what i already posted and you NAK-ed ? :)
>
> DMA_DECLARE_BUFFER(...)

I wasn't going to say it.  :)  How about something like this, which is
very similar to what you had Mike, but doesn't define the array in the
macro.  It's a bit clearer what is going on, but also requires a bit
more work at each use.

#define DCACHE_RESIZE(size) ((size) + get_dcache_line_size() - 1)
#define DCACHE_ALIGN(buffer) ((buffer) + get_dcache_line_size() - 1) &
~(get_dcache_line_size() - 1)

char buffer[DCACHE_RESIZE(100)];
char * aligned = DCACHE_ALIGN(buffer);

It would be awesome if the idea below worked, but it can't because the
array is popped when the ({...}) scope is exited I believe.

#define allocate_dma_buffer_on_stack(size) \
({ \
    char _dma_buffer[(size) + get_dcache_line_size() - 1]; \
 \
    _dma_buffer & ~(get_dcache_line_size() - 1); \
})

And you could use it like:

char * buffer = allocate_dma_buffer_on_stack(100);

If anyone can think of a way to make this work that would be excellent...

Thanks,
    Anton

> -mike
>
Wolfgang Denk Aug. 23, 2011, 10:42 p.m. UTC | #29
Dear Mike Frysinger,

In message <201108231732.39791.vapier@gentoo.org> you wrote:
>
> > Why cannot we define a macro that declares a (sufficiently sized)
> > buffer on the stack and provides and a pointer to a (correctly
> > aligned) address in this buffer?
> 
> isnt that what i already posted and you NAK-ed ? :)
> 
> DMA_DECLARE_BUFFER(...)

I just NAKed this specific implementation.

Best regards,

Wolfgang Denk
Mike Frysinger Aug. 24, 2011, 3 a.m. UTC | #30
On Tuesday, August 23, 2011 18:42:46 Wolfgang Denk wrote:
> Mike Frysinger wrote:
> > > Why cannot we define a macro that declares a (sufficiently sized)
> > > buffer on the stack and provides and a pointer to a (correctly
> > > aligned) address in this buffer?
> > 
> > isnt that what i already posted and you NAK-ed ? :)
> > 
> > DMA_DECLARE_BUFFER(...)
> 
> I just NAKed this specific implementation.

well, it's hard to come up with a "good" one without knowing the parameters to 
work within ;).  i'm not stuck on any particular implementation, i just want 
the helper to be simple to use and hard to screw up.

the trouble with doing something like:
	char foo[func_to_do_round(size)];
is that "size" is not in # of bytes but is number of elements.  so the point 
of my (i dont deny) complicated cpp logic was so that the internal 
implementation took on more of the work leaving the user (which we all know 
can easily mess things up) with a very simple macro:
	DMA_DECLARE_BUFFER(<buffer type>, <variable name>, <num elements>);
-mike
Łukasz Majewski Aug. 24, 2011, 10:07 a.m. UTC | #31
Hi,

On Tue, 23 Aug 2011 23:00:59 -0400
Mike Frysinger <vapier@gentoo.org> wrote:

> On Tuesday, August 23, 2011 18:42:46 Wolfgang Denk wrote:
> > Mike Frysinger wrote:
> > > > Why cannot we define a macro that declares a (sufficiently
> > > > sized) buffer on the stack and provides and a pointer to a
> > > > (correctly aligned) address in this buffer?
> > > 
> > > isnt that what i already posted and you NAK-ed ? :)
> > > 
> > > DMA_DECLARE_BUFFER(...)
> > 
> > I just NAKed this specific implementation.
> 
> well, it's hard to come up with a "good" one without knowing the
> parameters to work within ;).  i'm not stuck on any particular
> implementation, i just want the helper to be simple to use and hard
> to screw up.
> 
> the trouble with doing something like:
> 	char foo[func_to_do_round(size)];
> is that "size" is not in # of bytes but is number of elements.  so
> the point of my (i dont deny) complicated cpp logic was so that the
> internal implementation took on more of the work leaving the user
> (which we all know can easily mess things up) with a very simple
> macro: DMA_DECLARE_BUFFER(<buffer type>, <variable name>, <num
> elements>); -mike

After reading dcache related threads I'd like to sum them up:

1. alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree
that alloca is NOT the way to go.

2. malloc/memalign -> avoidable to use

3. Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
<size in bytes>) 
solution looks OK for me, at least for the stack allocated data (e.g.
ext_csd).
However this proposed implementation has been NAK'ed by Wolfgang. 

I'm curious how this macro should look like? Is it only matter of code
reordering or other approach shall be found?

4. get_dcache_line_size() can be simply defined as 
#define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
_really_ want to save a few bytes. 
However, I think, that proposed get_dcache_line_size() implementation
( http://patchwork.ozlabs.org/patch/111048/ ) is more programmer
friendly (one don't need to exactly know the dcache line size on a
particular SoC).
Wolfgang Denk Aug. 24, 2011, 1:25 p.m. UTC | #32
Dear Lukasz Majewski,

In message <20110824120744.097ba2c5@lmajewski.digital.local> you wrote:
> 
> After reading dcache related threads I'd like to sum them up:
> 
> 1. alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree
> that alloca is NOT the way to go.
> 
> 2. malloc/memalign -> avoidable to use
> 
> 3. Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
> <size in bytes>) 
> solution looks OK for me, at least for the stack allocated data (e.g.
> ext_csd).
> However this proposed implementation has been NAK'ed by Wolfgang. 
> 
> I'm curious how this macro should look like? Is it only matter of code
> reordering or other approach shall be found?

I think I'd like to see a macro that can be used like this:

	void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);


> 4. get_dcache_line_size() can be simply defined as 
> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
> _really_ want to save a few bytes. 

Actually I fail to understand why we would ever need
get_dcache_line_size() in a boot loader.

Best regards,

Wolfgang Denk
Łukasz Majewski Aug. 24, 2011, 2:31 p.m. UTC | #33
Hi Wolfgang,


> I think I'd like to see a macro that can be used like this:
> 
> 	void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
> 

Frankly speaking I don't know any easy way to define buffer this way
in a macro (at least without digging deep enough to C preprocessor
programming tricks). Maybe Mike or Anton knows. 

The void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes); approach 
needs to do following things in macro:

#define DMA_BUFFER(100) \
	char buf[100 + get_dcache_line_size]; \
	unsigned long tmp = (unsigned long) buf; \
	void* buf_out = (void*) ((tmp + (get_dcache_line_size() - 1)) &
	~(get_dcache_line_size() - 1))

The problem here is to assign the buf_out pointer to the void
*dma_buffer_pointer. How can we "return" the void *buf_out?





For comparison please look to this solution:

#define ALIGN_ADDR(addr) ((void *)(((unsigned long) addr +
get_dcache_line_size() - 1)\ & ~(get_dcache_line_size() - 1)))

#define DMA_DECLARE_BUFFER(type, name, size) \
	char *__##name[size + get_dcache_line_size()]; \
	type *name = ALIGN_ADDR(__##name);

int mmc_startup(struct mmc *mmc)
{
	/* Some declarations */
	/* char ext_csd[512]; */
	DMA_DECLARE_BUFFER(char, ext_csd, 512);

	printf("%s: ptr:%p\n", __func__, ext_csd);

	/* rest of the code */
}

Here the DMA_DECLARE_BUFFER really defines the buffer as an automatic 
variable with this function scope. This is tested and works :-)

> 
> > 4. get_dcache_line_size() can be simply defined as 
> > #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
> > _really_ want to save a few bytes. 
> 
> Actually I fail to understand why we would ever need
> get_dcache_line_size() in a boot loader.

If I can ask for clarification here. 

Shall not u-boot read on fly the cache line size (as it is now done) or
you don't like the get_cache_line_size defined as function? 

Going further shall the get_cache_line_size be removed completely and
replaced with CONFIG_SYS_CACHE_LINE_SIZE?


> 
> Best regards,
> 
> Wolfgang Denk
>
Mike Frysinger Aug. 24, 2011, 4:16 p.m. UTC | #34
On Tuesday, August 23, 2011 17:48:50 Anton Staaf wrote:
> I wasn't going to say it.  :)  How about something like this, which is
> very similar to what you had Mike, but doesn't define the array in the
> macro.  It's a bit clearer what is going on, but also requires a bit
> more work at each use.
> 
> #define DCACHE_RESIZE(size) ((size) + get_dcache_line_size() - 1)
> #define DCACHE_ALIGN(buffer) ((buffer) + get_dcache_line_size() - 1) &
> ~(get_dcache_line_size() - 1)
> 
> char buffer[DCACHE_RESIZE(100)];

as long as people always use a byte-sized type (i.e. char), this should work.  
obviously using "u32 buffer[...]" will be bad.

> It would be awesome if the idea below worked, but it can't because the
> array is popped when the ({...}) scope is exited I believe.

yes, i believe you are correct.
-mike
Mike Frysinger Aug. 24, 2011, 4:20 p.m. UTC | #35
On Wednesday, August 24, 2011 09:25:53 Wolfgang Denk wrote:
> Lukasz Majewski wrote:
> > 3. Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
> > <size in bytes>)
> > solution looks OK for me, at least for the stack allocated data (e.g.
> > ext_csd).
> > However this proposed implementation has been NAK'ed by Wolfgang.
> > 
> > I'm curious how this macro should look like? Is it only matter of code
> > reordering or other approach shall be found?
> 
> I think I'd like to see a macro that can be used like this:
> 
> 	void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);

as Anton highlighted, i'm not sure this is possible.  DMA_BUFFER() would have 
to be defined with ({...}), and any storage declared in there is in new scope, 
so as soon as you exit that scope, any storage declared is invalid for use 
outside of it.  the only thing that is good for is producing a single value 
(i.e. an "int" or "char" or a pointer etc..., but not storage).

if a person who knows more about gcc internals in this regard could comment, 
that'd be great :).
-mike
Anton Staaf Aug. 24, 2011, 5:27 p.m. UTC | #36
On Wed, Aug 24, 2011 at 6:25 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Lukasz Majewski,
>
> In message <20110824120744.097ba2c5@lmajewski.digital.local> you wrote:
>>
>> After reading dcache related threads I'd like to sum them up:
>>
>> 1. alloca() -> not acceptable to u-boot mainline by Wolfgang. I agree
>> that alloca is NOT the way to go.
>>
>> 2. malloc/memalign -> avoidable to use
>>
>> 3. Mike's DMA_DECLARE_BUFFER(<buffer type>, <variable name>,
>> <size in bytes>)
>> solution looks OK for me, at least for the stack allocated data (e.g.
>> ext_csd).
>> However this proposed implementation has been NAK'ed by Wolfgang.
>>
>> I'm curious how this macro should look like? Is it only matter of code
>> reordering or other approach shall be found?
>
> I think I'd like to see a macro that can be used like this:
>
>        void *dma_buffer_pointer = DMA_BUFFER(size_in_bytes);
>
>
>> 4. get_dcache_line_size() can be simply defined as
>> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
>> _really_ want to save a few bytes.
>
> Actually I fail to understand why we would ever need
> get_dcache_line_size() in a boot loader.

It is required so that we can correctly allocate buffers that will be
used by DMA engines to read or write data.  The reason that these
buffers need to be cache line size aligned is because unaligned cache
invalidates are not possible to do in a safe way.  The problem is that
invalidating a partial cache line requires invalidating the entire
line.  And the other part of the line can contain nearby variables
(especially if the buffer is stack allocated), so we have to first
flush the cache line to be safe.  However, that flush will clobber the
values that the DMA engine wrote to main memory that are not reflected
in the cache.

Thanks,
    Anton

> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> "You know, after a woman's raised a family and so on,  she  wants  to
> start living her own life."   "Whose life she's _been_ living, then?"
>                                  - Terry Pratchett, _Witches Abroad_
>
Mike Frysinger Aug. 24, 2011, 6:06 p.m. UTC | #37
On Wednesday, August 24, 2011 13:27:05 Anton Staaf wrote:
> On Wed, Aug 24, 2011 at 6:25 AM, Wolfgang Denk wrote:
> > Lukasz Majewski wrote:
> >> 4. get_dcache_line_size() can be simply defined as
> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
> >> _really_ want to save a few bytes.
> > 
> > Actually I fail to understand why we would ever need
> > get_dcache_line_size() in a boot loader.
> 
> It is required so that we can correctly allocate buffers that will be
> used by DMA engines to read or write data.  The reason that these
> buffers need to be cache line size aligned is because unaligned cache
> invalidates are not possible to do in a safe way.  The problem is that
> invalidating a partial cache line requires invalidating the entire
> line.  And the other part of the line can contain nearby variables
> (especially if the buffer is stack allocated), so we have to first
> flush the cache line to be safe.  However, that flush will clobber the
> values that the DMA engine wrote to main memory that are not reflected
> in the cache.

right, and that's why i want to "hide" it be hind a "dma buffer allocate" API 
so that people can just say "i want a dma safe buffer" rather than having to 
delve into these details.  they'll inevitable get confused and screw it up. ;)
-mike
Wolfgang Denk Aug. 24, 2011, 6:12 p.m. UTC | #38
Dear Anton Staaf,

In message <CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com> you wrote:
>
> >> 4. get_dcache_line_size() can be simply defined as
> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
> >> _really_ want to save a few bytes.
> >
> > Actually I fail to understand why we would ever need
> > get_dcache_line_size() in a boot loader.
> 
> It is required so that we can correctly allocate buffers that will be
> used by DMA engines to read or write data.  The reason that these

No, it is definitely NOT needed for this purpose - we have been using
CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so
I don't really understand why we now would need a new function for
this?


Best regards,

Wolfgang Denk
Anton Staaf Aug. 24, 2011, 6:25 p.m. UTC | #39
On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Anton Staaf,
>
> In message <CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com> you wrote:
>>
>> >> 4. get_dcache_line_size() can be simply defined as
>> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
>> >> _really_ want to save a few bytes.
>> >
>> > Actually I fail to understand why we would ever need
>> > get_dcache_line_size() in a boot loader.
>>
>> It is required so that we can correctly allocate buffers that will be
>> used by DMA engines to read or write data.  The reason that these
>
> No, it is definitely NOT needed for this purpose - we have been using
> CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so
> I don't really understand why we now would need a new function for
> this?

Ahh, I misunderstood your question.  I thought you were asking about
the need to know the cache line size.  Not it's specific
implementation as a function call.  In which case, I agree, and am
happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
What have I missed?

Thanks,
    Anton

>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Brain: an apparatus with which we think we think.    - Ambrose Bierce
>
Wolfgang Denk Aug. 24, 2011, 7:04 p.m. UTC | #40
Dear Anton Staaf,

In message <CAF6FioWM6MezMMDr6+i8tGOLG-En4TihtOCauPbH0o7vqHaf_A@mail.gmail.com> you wrote:
>
> > No, it is definitely NOT needed for this purpose - we have been using
> > CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so
> > I don't really understand why we now would need a new function for
> > this?
> 
> Ahh, I misunderstood your question.  I thought you were asking about
> the need to know the cache line size.  Not it's specific
> implementation as a function call.  In which case, I agree, and am
> happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
> definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
> What have I missed?

I copy & pasted the name, incorrectly.  It's CONFIG_SYS_CACHELINE_SIZE

You will find this being used:

-> grep -R CONFIG_SYS_CACHELINE_SIZE * | wc -l
276


Best regards,

Wolfgang Denk
Lukasz Majewski Aug. 24, 2011, 7:18 p.m. UTC | #41
On Wed, 24 Aug 2011 11:25:42 -0700
Anton Staaf <robotboy@google.com> wrote:

> On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk <wd@denx.de> wrote:
> > Dear Anton Staaf,
> >
> > In message
> > <CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com>
> > you wrote:
> >>
> >> >> 4. get_dcache_line_size() can be simply defined as
> >> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
> >> >> _really_ want to save a few bytes.
> >> >
> >> > Actually I fail to understand why we would ever need
> >> > get_dcache_line_size() in a boot loader.
> >>
> >> It is required so that we can correctly allocate buffers that will
> >> be used by DMA engines to read or write data.  The reason that
> >> these
> >
> > No, it is definitely NOT needed for this purpose - we have been
> > using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without
> > problems, so I don't really understand why we now would need a new
> > function for this?
> 

Ok, so one problem solved :-).

> Ahh, I misunderstood your question.  I thought you were asking about
> the need to know the cache line size.  Not it's specific
> implementation as a function call.  In which case, I agree, and am
> happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
> definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
> What have I missed?
> 

There are some similar definitions:
./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32
./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32
./include/configs/favr-32-ezkit.h:#define
CONFIG_SYS_DCACHE_LINESZ 32


> Thanks,
>     Anton
> 
> >
> > Best regards,
> >
> > Wolfgang Denk
> >
> > --
> > DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev
> > Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194
> > Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax:
> > (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with
> > which we think we think.    - Ambrose Bierce
> >
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

Regards,
Lukasz Majewski
Anton Staaf Aug. 24, 2011, 8:12 p.m. UTC | #42
On Wed, Aug 24, 2011 at 12:04 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Anton Staaf,
>
> In message <CAF6FioWM6MezMMDr6+i8tGOLG-En4TihtOCauPbH0o7vqHaf_A@mail.gmail.com> you wrote:
>>
>> > No, it is definitely NOT needed for this purpose - we have been using
>> > CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without problems, so
>> > I don't really understand why we now would need a new function for
>> > this?
>>
>> Ahh, I misunderstood your question.  I thought you were asking about
>> the need to know the cache line size.  Not it's specific
>> implementation as a function call.  In which case, I agree, and am
>> happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
>> definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
>> What have I missed?
>
> I copy & pasted the name, incorrectly.  It's CONFIG_SYS_CACHELINE_SIZE
>
> You will find this being used:
>
> -> grep -R CONFIG_SYS_CACHELINE_SIZE * | wc -l
> 276

:) Hah, thanks.  I tried a number of permutations, but never the right one.

-Anton

>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> There are certain things men must do to remain men.
>        -- Kirk, "The Ultimate Computer", stardate 4929.4
>
Anton Staaf Aug. 24, 2011, 8:13 p.m. UTC | #43
On Wed, Aug 24, 2011 at 12:18 PM, Lukasz Majewski <majess1982@gmail.com> wrote:
> On Wed, 24 Aug 2011 11:25:42 -0700
> Anton Staaf <robotboy@google.com> wrote:
>
>> On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk <wd@denx.de> wrote:
>> > Dear Anton Staaf,
>> >
>> > In message
>> > <CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com>
>> > you wrote:
>> >>
>> >> >> 4. get_dcache_line_size() can be simply defined as
>> >> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
>> >> >> _really_ want to save a few bytes.
>> >> >
>> >> > Actually I fail to understand why we would ever need
>> >> > get_dcache_line_size() in a boot loader.
>> >>
>> >> It is required so that we can correctly allocate buffers that will
>> >> be used by DMA engines to read or write data.  The reason that
>> >> these
>> >
>> > No, it is definitely NOT needed for this purpose - we have been
>> > using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without
>> > problems, so I don't really understand why we now would need a new
>> > function for this?
>>
>
> Ok, so one problem solved :-).
>
>> Ahh, I misunderstood your question.  I thought you were asking about
>> the need to know the cache line size.  Not it's specific
>> implementation as a function call.  In which case, I agree, and am
>> happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
>> definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
>> What have I missed?
>>
>
> There are some similar definitions:
> ./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32
> ./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32
> ./include/configs/favr-32-ezkit.h:#define
> CONFIG_SYS_DCACHE_LINESZ 32

I would assume that these should be changed to the one mentioned by
Wolfgang.  But this still leaves us with a question of how to make a
simple, easy to use macro for allocating cache line size aligned stack
buffers.  I'll work on that some more and see if I can come up with
something.

Thanks,
    Anton

>
>> Thanks,
>>     Anton
>>
>> >
>> > Best regards,
>> >
>> > Wolfgang Denk
>> >
>> > --
>> > DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev
>> > Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194
>> > Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax:
>> > (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with
>> > which we think we think.    - Ambrose Bierce
>> >
>> _______________________________________________
>> U-Boot mailing list
>> U-Boot@lists.denx.de
>> http://lists.denx.de/mailman/listinfo/u-boot
>
> Regards,
> Lukasz Majewski
>
Anton Staaf Aug. 29, 2011, 8:12 p.m. UTC | #44
On Wed, Aug 24, 2011 at 1:13 PM, Anton Staaf <robotboy@google.com> wrote:
> On Wed, Aug 24, 2011 at 12:18 PM, Lukasz Majewski <majess1982@gmail.com> wrote:
>> On Wed, 24 Aug 2011 11:25:42 -0700
>> Anton Staaf <robotboy@google.com> wrote:
>>
>>> On Wed, Aug 24, 2011 at 11:12 AM, Wolfgang Denk <wd@denx.de> wrote:
>>> > Dear Anton Staaf,
>>> >
>>> > In message
>>> > <CAF6FioWBsjFa+QdFNHEQGO50eLoqZMc--Yt3-iYX6DWr=hcOrg@mail.gmail.com>
>>> > you wrote:
>>> >>
>>> >> >> 4. get_dcache_line_size() can be simply defined as
>>> >> >> #define get_dcache_line_size() CONFIG_SYS_CACHE_LINE_SIZE if we
>>> >> >> _really_ want to save a few bytes.
>>> >> >
>>> >> > Actually I fail to understand why we would ever need
>>> >> > get_dcache_line_size() in a boot loader.
>>> >>
>>> >> It is required so that we can correctly allocate buffers that will
>>> >> be used by DMA engines to read or write data.  The reason that
>>> >> these
>>> >
>>> > No, it is definitely NOT needed for this purpose - we have been
>>> > using CONFIG_SYS_CACHE_LINE_SIZE for more than a decade without
>>> > problems, so I don't really understand why we now would need a new
>>> > function for this?
>>>
>>
>> Ok, so one problem solved :-).
>>
>>> Ahh, I misunderstood your question.  I thought you were asking about
>>> the need to know the cache line size.  Not it's specific
>>> implementation as a function call.  In which case, I agree, and am
>>> happy to use CONFIG_SYS_CACHE_LINE_SIZE, though I don't see any
>>> definitions of CONFIG_SYS_CACHE_LINE_SIZE in the entire U-Boot tree.
>>> What have I missed?
>>>
>>
>> There are some similar definitions:
>> ./include/configs/atstk1006.h:#define CONFIG_SYS_DCACHE_LINESZ 32
>> ./include/configs/atngw100.h:#define CONFIG_SYS_DCACHE_LINESZ 32
>> ./include/configs/favr-32-ezkit.h:#define
>> CONFIG_SYS_DCACHE_LINESZ 32
>
> I would assume that these should be changed to the one mentioned by
> Wolfgang.  But this still leaves us with a question of how to make a
> simple, easy to use macro for allocating cache line size aligned stack
> buffers.  I'll work on that some more and see if I can come up with
> something.

OK, better late than never, I've run down all of the solutions I can
think of.  Below are three solutions and one non-solution, how they
would be used, and what I see as pro's and con's for each.


1) Mikes's macro

#define DMA_ALIGN_SIZE(size) \
       (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)

#define DMA_DECLARE_BUFFER(type, name, size) \
       void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \
       type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));

DMA_DECLARE_BUFFER(int, buffer, 100);

Pros: Doesn't use alloca, doesn't use malloc and doesn't use new GCC
alignment attribute abilities.  This makes it the most portable, and
the most supported solution.

Cons: It's a pretty ugly macro that obfuscates the creation of
multiple variables.  Thus I would say it falls into the macro magic
category.


It looks like this is already working it's way into a patch.  So the
rest of my comments below may be moot.


2) Use alloca wrapped in a macro:

#define alloca_cacheline_alligned(size) alloca(size +
CONFIG_SYS_CACHELINE_SIZE - 1) & ~(CONFIG_SYS_CACHELINE_SIZE - 1)

int * buffer = alloca_cacheline_alligned(100 * sizeof(int));

Pros: It is fairly obvious what the above code is intended to do, even
to someone that doesn't know the underlying implementation of the
macro.

Cons: Wolfgang does not want alloca in the U-Boot codebase.


I would like to know, mainly for my education, why you do not want
alloca, but are OK with dynamic array sizing (as in the first solution
above).  My understanding is that they pretty much equate to the same
thing.  What is it about alloca that is objectionable?


3) Use GCC specific alignment attribute:

#define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))

int buffer[100] CACHELINE_ALIGNED;

Pros: The declaration of the buffer is even simpler and more obvious,
no use of alloca at all.

Cons: This doesn't work in any version of GCC before October 2010.
Meaning that it probably doesn't work in whatever compiler you're
using.


It's really too bad that this isn't a usable solution.  I suppose that
we could switch to it at some point when we expect U-Boot to only be
compiled by versions of GCC that support this.  By the way, the
failure mode here is pretty bad.  If you compile the above code with
an older GCC it will silently fail to align the variable.  :(


4) Use memalign and free:

int * buffer = memalign(CONFIG_SYS_CACHELINE_SIZE, 100 * sizeof(int));

free(buffer);

Pros: portable.

Cons: Heap instead of Stack allocation.  The result is that it's
slower, and requires more manual management of buffers.  Also,
Wolfgang is opposed to this solution.


I think I agree with this not being a great solution.  I do wonder if
it would be useful to consider a dedicated buffer management API that
could be used to allocate and free cache line aligned power of two
buffers.  Perhaps something like a buddy heap for simplicity.


Thanks,
    Anton

> Thanks,
>    Anton
>
>>
>>> Thanks,
>>>     Anton
>>>
>>> >
>>> > Best regards,
>>> >
>>> > Wolfgang Denk
>>> >
>>> > --
>>> > DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev
>>> > Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194
>>> > Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax:
>>> > (+49)-8142-66989-80 Email: wd@denx.de Brain: an apparatus with
>>> > which we think we think.    - Ambrose Bierce
>>> >
>>> _______________________________________________
>>> U-Boot mailing list
>>> U-Boot@lists.denx.de
>>> http://lists.denx.de/mailman/listinfo/u-boot
>>
>> Regards,
>> Lukasz Majewski
>>
>
Wolfgang Denk Aug. 29, 2011, 8:35 p.m. UTC | #45
Dear Anton Staaf,

In message <CAF6FioWhhqbVQWWPrusb69S2mpfBtoiazxc8x56bkogJmzXZ3g@mail.gmail.com> you wrote:
>
> I would like to know, mainly for my education, why you do not want
> alloca, but are OK with dynamic array sizing (as in the first solution
> above).  My understanding is that they pretty much equate to the same
> thing.  What is it about alloca that is objectionable?

First, I've got bitten by alloca() before, when GCC misbehaved.
Admitted, that was in some _very_ old version, but such experience
sticks.

Second, the behaviour of alloca() depends on a number of non-obvious
settings and compiler flags, and results are not always obvious.  rom
the man page:

   Notes on the GNU Version
       Normally, gcc(1) translates calls to alloca() with inlined
       code. This is not done when either the -ansi, -std=c89,
       -std=c99, or the -fno-builtin option is given (and the header
       <alloca.h> is not included). But beware! By default the glibc
       version of <stdlib.h> includes <alloca.h> and that contains
       the line:

           #define alloca(size)   __builtin_alloca (size)

       with messy consequences if one has a private version of this function.

This is nothing I want to have in any production software. OK, you
may argument that U-Boot has a strictly controlled environment, but
anyway.

Third, the man page says:

NOTES

       The alloca() function is machine- and compiler-dependent. For
       certain applications, its use can improve efficiency compared
       to the use of malloc(3) plus free(3). In certain cases, it can
       also simplify memory deallocation in applications that use
       longjmp(3) or siglongjmp(3).
       Otherwise, its use is discouraged.

In my opinion, U-Boot falls into the "otherwise" category...


> I think I agree with this not being a great solution.  I do wonder if
> it would be useful to consider a dedicated buffer management API that
> could be used to allocate and free cache line aligned power of two
> buffers.  Perhaps something like a buddy heap for simplicity.

The longer I read this thread, the less frightening Mikes's macro
becomes (compared tho the alternatives).

Best regards,

Wolfgang Denk
Scott Wood Aug. 29, 2011, 8:47 p.m. UTC | #46
On 08/29/2011 03:12 PM, Anton Staaf wrote:
> 1) Mikes's macro
> 
> #define DMA_ALIGN_SIZE(size) \
>        (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
> 
> #define DMA_DECLARE_BUFFER(type, name, size) \
>        void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \
>        type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
> 
> DMA_DECLARE_BUFFER(int, buffer, 100);

This doesn't compile, and it tries to round the buffer down below its
starting point.

After fixing the more obvious issues, I get "error: initializer element
is not constant".

There might be no way to express and-by-constant as a relocation.

You could set the pointer at runtime, though, and remove some of the
macrification:

#define DMA_ALIGN_SIZE(size) \
	((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
#define DMA_ALIGN_ADDR(addr) \
	(DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))

int buffer_unaligned[DMA_ALIGN_SIZE(100)];
int *buffer;

some_init_func()
{
	buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned));
}

> 3) Use GCC specific alignment attribute:
> 
> #define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))
> 
> int buffer[100] CACHELINE_ALIGNED;
> 
> Pros: The declaration of the buffer is even simpler and more obvious,
> no use of alloca at all.
> 
> Cons: This doesn't work in any version of GCC before October 2010.
> Meaning that it probably doesn't work in whatever compiler you're
> using.
> 
> 
> It's really too bad that this isn't a usable solution.  I suppose that
> we could switch to it at some point when we expect U-Boot to only be
> compiled by versions of GCC that support this.  By the way, the
> failure mode here is pretty bad.  If you compile the above code with
> an older GCC it will silently fail to align the variable.  :(

If the decision is made to depend on newer compilers, U-Boot could check
for it and #error out if the compiler is too old (possibly just in the
files that depend on this feature).

-Scott
Anton Staaf Aug. 29, 2011, 8:58 p.m. UTC | #47
On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 08/29/2011 03:12 PM, Anton Staaf wrote:
>> 1) Mikes's macro
>>
>> #define DMA_ALIGN_SIZE(size) \
>>        (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
>>
>> #define DMA_DECLARE_BUFFER(type, name, size) \
>>        void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \
>>        type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
>>
>> DMA_DECLARE_BUFFER(int, buffer, 100);
>
> This doesn't compile, and it tries to round the buffer down below its
> starting point.

You are correct.  I wrote that one as a modification of mikes initial
proposal.  I should have caught the incorrect rounding when I did.
The patch that Lukasz sent titled "dcache: Dcache line size aligned
stack buffer allocation" has a correct implementation.

> After fixing the more obvious issues, I get "error: initializer element
> is not constant".

I think this requires the use of -std=c99 or GCC extensions.  More
specifics can be found here:
http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html

> There might be no way to express and-by-constant as a relocation.
>
> You could set the pointer at runtime, though, and remove some of the
> macrification:
>
> #define DMA_ALIGN_SIZE(size) \
>        ((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
> #define DMA_ALIGN_ADDR(addr) \
>        (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
>
> int buffer_unaligned[DMA_ALIGN_SIZE(100)];
> int *buffer;
>
> some_init_func()
> {
>        buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned));
> }

:) This was one of my suggestions earlier on a different thread.  It
was rejected there, I believe because it makes things less clear.

>> 3) Use GCC specific alignment attribute:
>>
>> #define CACHLINE_ALIGNED __attribute__ ((aligned (CONFIG_SYS_CACHELINE_SIZE)))
>>
>> int buffer[100] CACHELINE_ALIGNED;
>>
>> Pros: The declaration of the buffer is even simpler and more obvious,
>> no use of alloca at all.
>>
>> Cons: This doesn't work in any version of GCC before October 2010.
>> Meaning that it probably doesn't work in whatever compiler you're
>> using.
>>
>>
>> It's really too bad that this isn't a usable solution.  I suppose that
>> we could switch to it at some point when we expect U-Boot to only be
>> compiled by versions of GCC that support this.  By the way, the
>> failure mode here is pretty bad.  If you compile the above code with
>> an older GCC it will silently fail to align the variable.  :(
>
> If the decision is made to depend on newer compilers, U-Boot could check
> for it and #error out if the compiler is too old (possibly just in the
> files that depend on this feature).

Yup, I think whatever macro we come up with we can simplify it
eventually using the GCC attribute solution and we can make the macro
conditional on compiler version.

Thanks,
    Anton

> -Scott
>
>
Anton Staaf Aug. 29, 2011, 9:08 p.m. UTC | #48
On Mon, Aug 29, 2011 at 1:35 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Anton Staaf,
>
> In message <CAF6FioWhhqbVQWWPrusb69S2mpfBtoiazxc8x56bkogJmzXZ3g@mail.gmail.com> you wrote:
>>
>> I would like to know, mainly for my education, why you do not want
>> alloca, but are OK with dynamic array sizing (as in the first solution
>> above).  My understanding is that they pretty much equate to the same
>> thing.  What is it about alloca that is objectionable?
>
> First, I've got bitten by alloca() before, when GCC misbehaved.
> Admitted, that was in some _very_ old version, but such experience
> sticks.

Very fair, I have programming constructs I avoid for the same reason.
*cough* exceptions *cough*

> Second, the behaviour of alloca() depends on a number of non-obvious
> settings and compiler flags, and results are not always obvious.  rom
> the man page:
>
>   Notes on the GNU Version
>       Normally, gcc(1) translates calls to alloca() with inlined
>       code. This is not done when either the -ansi, -std=c89,
>       -std=c99, or the -fno-builtin option is given (and the header
>       <alloca.h> is not included). But beware! By default the glibc
>       version of <stdlib.h> includes <alloca.h> and that contains
>       the line:
>
>           #define alloca(size)   __builtin_alloca (size)
>
>       with messy consequences if one has a private version of this function.
>
> This is nothing I want to have in any production software. OK, you
> may argument that U-Boot has a strictly controlled environment, but
> anyway.

Yes, I don't expect that we will allow a custom implementation of
alloca into U-Boot, but it's a good point that the compiler treats it
in strange ways.  One alternative here would be to always call
__builtin_alloca...  This would probably have other issues though.

> Third, the man page says:
>
> NOTES
>
>       The alloca() function is machine- and compiler-dependent. For
>       certain applications, its use can improve efficiency compared
>       to the use of malloc(3) plus free(3). In certain cases, it can
>       also simplify memory deallocation in applications that use
>       longjmp(3) or siglongjmp(3).
>       Otherwise, its use is discouraged.
>
> In my opinion, U-Boot falls into the "otherwise" category...

Yup, and another man page in the "Bugs" section says:

The alloca() function is machine and compiler dependent. On many
systems its implementation is buggy. Its use is discouraged.

Which seems even worse.  So, OK, I think we've run this to the ground.
 And we can always find and replace all of the uses of Mikes macro
easily if we think of something better later.

Thank you,
    Anton

>
>> I think I agree with this not being a great solution.  I do wonder if
>> it would be useful to consider a dedicated buffer management API that
>> could be used to allocate and free cache line aligned power of two
>> buffers.  Perhaps something like a buddy heap for simplicity.
>
> The longer I read this thread, the less frightening Mikes's macro
> becomes (compared tho the alternatives).
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> "Though a program be but three lines long,
> someday it will have to be maintained."
> - The Tao of Programming
>
Scott Wood Aug. 29, 2011, 9:23 p.m. UTC | #49
On 08/29/2011 03:58 PM, Anton Staaf wrote:
> On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>> On 08/29/2011 03:12 PM, Anton Staaf wrote:
>>> 1) Mikes's macro
>>>
>>> #define DMA_ALIGN_SIZE(size) \
>>>        (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
>>>
>>> #define DMA_DECLARE_BUFFER(type, name, size) \
>>>        void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \
>>>        type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
>>>
>>> DMA_DECLARE_BUFFER(int, buffer, 100);
>>
>> This doesn't compile, and it tries to round the buffer down below its
>> starting point.
> 
> You are correct.  I wrote that one as a modification of mikes initial
> proposal.  I should have caught the incorrect rounding when I did.
> The patch that Lukasz sent titled "dcache: Dcache line size aligned
> stack buffer allocation" has a correct implementation.

With the version in that patch I get the slightly different "error:
initializer element is not computable at load time".  Seems like whether
you cast the address to (type *) or (void *) determines which error you
get.  This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86).  Maybe it's
arch-dependent, based on available relocation types.

Also, shouldn't the array be of type "char" rather than "char *"?

How do you make the declaration static?

>> After fixing the more obvious issues, I get "error: initializer element
>> is not constant".
> 
> I think this requires the use of -std=c99 or GCC extensions.  More
> specifics can be found here:
> http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html

-std=c99 doesn't help.

The problem isn't the array itself, it's the pointer initializer.

>> You could set the pointer at runtime, though, and remove some of the
>> macrification:
>>
>> #define DMA_ALIGN_SIZE(size) \
>>        ((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
>> #define DMA_ALIGN_ADDR(addr) \
>>        (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
>>
>> int buffer_unaligned[DMA_ALIGN_SIZE(100)];
>> int *buffer;
>>
>> some_init_func()
>> {
>>        buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned));
>> }
> 
> :) This was one of my suggestions earlier on a different thread.  It
> was rejected there, I believe because it makes things less clear.

So, the complex macro is bad because it obscures things, and this
version is bad because it doesn't? :-)

-Scott
Anton Staaf Aug. 29, 2011, 9:54 p.m. UTC | #50
On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 08/29/2011 03:58 PM, Anton Staaf wrote:
>> On Mon, Aug 29, 2011 at 1:47 PM, Scott Wood <scottwood@freescale.com> wrote:
>>> On 08/29/2011 03:12 PM, Anton Staaf wrote:
>>>> 1) Mikes's macro
>>>>
>>>> #define DMA_ALIGN_SIZE(size) \
>>>>        (((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
>>>>
>>>> #define DMA_DECLARE_BUFFER(type, name, size) \
>>>>        void __##name[DMA_ALIGN_SIZE(size * sizeof(type))]; \
>>>>        type * name = __##name & ~(CONFIG_SYS_CACHELINE_SIZE - 1));
>>>>
>>>> DMA_DECLARE_BUFFER(int, buffer, 100);
>>>
>>> This doesn't compile, and it tries to round the buffer down below its
>>> starting point.
>>
>> You are correct.  I wrote that one as a modification of mikes initial
>> proposal.  I should have caught the incorrect rounding when I did.
>> The patch that Lukasz sent titled "dcache: Dcache line size aligned
>> stack buffer allocation" has a correct implementation.
>
> With the version in that patch I get the slightly different "error:
> initializer element is not computable at load time".  Seems like whether
> you cast the address to (type *) or (void *) determines which error you
> get.  This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86).  Maybe it's
> arch-dependent, based on available relocation types.
>
> Also, shouldn't the array be of type "char" rather than "char *"?

Yes, you are correct, it should be a char.  That may be the problem.

> How do you make the declaration static?

you can't with this version of the macro.  Are there cases where you
need the buffer to be static?

Thanks,
    Anton

>>> After fixing the more obvious issues, I get "error: initializer element
>>> is not constant".
>>
>> I think this requires the use of -std=c99 or GCC extensions.  More
>> specifics can be found here:
>> http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
>
> -std=c99 doesn't help.
>
> The problem isn't the array itself, it's the pointer initializer.
>
>>> You could set the pointer at runtime, though, and remove some of the
>>> macrification:
>>>
>>> #define DMA_ALIGN_SIZE(size) \
>>>        ((size) + CONFIG_SYS_CACHELINE_SIZE - 1)
>>> #define DMA_ALIGN_ADDR(addr) \
>>>        (DMA_ALIGN_SIZE(addr) & (CONFIG_SYS_CACHELINE_SIZE - 1))
>>>
>>> int buffer_unaligned[DMA_ALIGN_SIZE(100)];
>>> int *buffer;
>>>
>>> some_init_func()
>>> {
>>>        buffer = (int *)(DMA_ALIGN_ADDR((uintptr_t)buffer_unaligned));
>>> }
>>
>> :) This was one of my suggestions earlier on a different thread.  It
>> was rejected there, I believe because it makes things less clear.
>
> So, the complex macro is bad because it obscures things, and this
> version is bad because it doesn't? :-)
>
> -Scott
>
>
Scott Wood Aug. 29, 2011, 10:03 p.m. UTC | #51
On 08/29/2011 04:54 PM, Anton Staaf wrote:
> On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood <scottwood@freescale.com> wrote:
>> With the version in that patch I get the slightly different "error:
>> initializer element is not computable at load time".  Seems like whether
>> you cast the address to (type *) or (void *) determines which error you
>> get.  This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86).  Maybe it's
>> arch-dependent, based on available relocation types.
>>
>> Also, shouldn't the array be of type "char" rather than "char *"?
> 
> Yes, you are correct, it should be a char.  That may be the problem.

It didn't make a difference.

>> How do you make the declaration static?
> 
> you can't with this version of the macro.  Are there cases where you
> need the buffer to be static?

I think you'd want it to be static more often than not.

-Scott
Anton Staaf Aug. 29, 2011, 10:49 p.m. UTC | #52
On Mon, Aug 29, 2011 at 3:03 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 08/29/2011 04:54 PM, Anton Staaf wrote:
>> On Mon, Aug 29, 2011 at 2:23 PM, Scott Wood <scottwood@freescale.com> wrote:
>>> With the version in that patch I get the slightly different "error:
>>> initializer element is not computable at load time".  Seems like whether
>>> you cast the address to (type *) or (void *) determines which error you
>>> get.  This is with GCC 4.5.1 (powerpc) and 4.6.0 (x86).  Maybe it's
>>> arch-dependent, based on available relocation types.
>>>
>>> Also, shouldn't the array be of type "char" rather than "char *"?
>>
>> Yes, you are correct, it should be a char.  That may be the problem.
>
> It didn't make a difference.

Hmm, I haven't played with this code yet, so I can't say for sure.
But it would certainly throw a wrench in the works if Mike's solution
was also unavailable on some platforms.  By chance, you aren't also
adding static here are you?  I could certainly see that that wouldn't
work.

>>> How do you make the declaration static?
>>
>> you can't with this version of the macro.  Are there cases where you
>> need the buffer to be static?
>
> I think you'd want it to be static more often than not.

If the buffer is allocated at file scope, then yes, we would want it
to be static.  But not at function scope.  It would no longer be
allocated on the stack in that case, and that was frowned upon
earlier.

Thanks,
    Anton

> -Scott
>
>
Scott Wood Aug. 29, 2011, 11:01 p.m. UTC | #53
On 08/29/2011 05:49 PM, Anton Staaf wrote:
>>>> How do you make the declaration static?
>>>
>>> you can't with this version of the macro.  Are there cases where you
>>> need the buffer to be static?
>>
>> I think you'd want it to be static more often than not.
> 
> If the buffer is allocated at file scope, then yes, we would want it
> to be static.  But not at function scope.  It would no longer be
> allocated on the stack in that case, and that was frowned upon
> earlier.

Ah, that's the issue.  I was trying to declare it at file scope. :-P

-Scott
Anton Staaf Aug. 29, 2011, 11:05 p.m. UTC | #54
On Mon, Aug 29, 2011 at 4:01 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 08/29/2011 05:49 PM, Anton Staaf wrote:
>>>>> How do you make the declaration static?
>>>>
>>>> you can't with this version of the macro.  Are there cases where you
>>>> need the buffer to be static?
>>>
>>> I think you'd want it to be static more often than not.
>>
>> If the buffer is allocated at file scope, then yes, we would want it
>> to be static.  But not at function scope.  It would no longer be
>> allocated on the stack in that case, and that was frowned upon
>> earlier.
>
> Ah, that's the issue.  I was trying to declare it at file scope. :-P

Thank goodness, I was trying to figure out how that couldn't be a
valid GCC complaint.  :)

Thanks,
     Anton

> -Scott
>
>
diff mbox

Patch

diff --git a/drivers/mmc/mmc.c b/drivers/mmc/mmc.c
index 5f79a17..47e94c8 100644
--- a/drivers/mmc/mmc.c
+++ b/drivers/mmc/mmc.c
@@ -263,6 +263,7 @@  mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src)
 	struct mmc_cmd cmd;
 	struct mmc_data data;
 	int timeout = 1000;
+	void *cache_align_buf;
 
 	if ((start + blkcnt) > mmc->block_dev.lba) {
 		printf("MMC: block number 0x%lx exceeds max(0x%lx)\n",
@@ -283,13 +284,22 @@  mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src)
 	cmd.resp_type = MMC_RSP_R1;
 	cmd.flags = 0;
 
-	data.src = src;
+	cache_align_buf = memalign(get_dcache_line_size(),
+				   mmc->write_bl_len * blkcnt);
+
+	if (!cache_align_buf)
+		return -ENOMEM;
+
+	memcpy(cache_align_buf, src, mmc->write_bl_len * blkcnt);
+
+	data.src = cache_align_buf;
 	data.blocks = blkcnt;
 	data.blocksize = mmc->write_bl_len;
 	data.flags = MMC_DATA_WRITE;
 
 	if (mmc_send_cmd(mmc, &cmd, &data)) {
 		printf("mmc write failed\n");
+		free(cache_align_buf);
 		return 0;
 	}
 
@@ -303,6 +313,7 @@  mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src)
 		cmd.flags = 0;
 		if (mmc_send_cmd(mmc, &cmd, NULL)) {
 			printf("mmc fail to send stop cmd\n");
+			free(cache_align_buf);
 			return 0;
 		}
 
@@ -310,6 +321,7 @@  mmc_write_blocks(struct mmc *mmc, ulong start, lbaint_t blkcnt, const void*src)
 		mmc_send_status(mmc, timeout);
 	}
 
+	free(cache_align_buf);
 	return blkcnt;
 }
 
@@ -342,6 +354,7 @@  int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt)
 	struct mmc_cmd cmd;
 	struct mmc_data data;
 	int timeout = 1000;
+	void *cache_align_buf;
 
 	if (blkcnt > 1)
 		cmd.cmdidx = MMC_CMD_READ_MULTIPLE_BLOCK;
@@ -356,13 +369,21 @@  int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt)
 	cmd.resp_type = MMC_RSP_R1;
 	cmd.flags = 0;
 
-	data.dest = dst;
+	cache_align_buf = memalign(get_dcache_line_size(),
+				   mmc->read_bl_len * blkcnt);
+
+	if (!cache_align_buf)
+		return -ENOMEM;
+
+	data.dest = cache_align_buf;
 	data.blocks = blkcnt;
 	data.blocksize = mmc->read_bl_len;
 	data.flags = MMC_DATA_READ;
 
-	if (mmc_send_cmd(mmc, &cmd, &data))
+	if (mmc_send_cmd(mmc, &cmd, &data)) {
+		free(cache_align_buf);
 		return 0;
+	}
 
 	if (blkcnt > 1) {
 		cmd.cmdidx = MMC_CMD_STOP_TRANSMISSION;
@@ -371,6 +392,7 @@  int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt)
 		cmd.flags = 0;
 		if (mmc_send_cmd(mmc, &cmd, NULL)) {
 			printf("mmc fail to send stop cmd\n");
+			free(cache_align_buf);
 			return 0;
 		}
 
@@ -378,6 +400,8 @@  int mmc_read_blocks(struct mmc *mmc, void *dst, ulong start, lbaint_t blkcnt)
 		mmc_send_status(mmc, timeout);
 	}
 
+	memcpy(dst, cache_align_buf, mmc->read_bl_len * blkcnt);
+	free(cache_align_buf);
 	return blkcnt;
 }