diff mbox

[U-Boot] net: fec: Avoid MX28 bus sync issue

Message ID 523195CA.3010305@digi.com
State Changes Requested
Headers show

Commit Message

Hector Palacios Sept. 12, 2013, 10:22 a.m. UTC
Hello,

Going back to this old thread I have some news regarding the problem with TFTP 
transmissions blocking (timed out) after 10 seconds on the FEC of the MX28.
See below:

On 07/17/2013 05:55 PM, Hector Palacios wrote:
> Dear Marek,
>
> On 07/16/2013 06:44 AM, Marek Vasut wrote:
>> Dear Fabio Estevam,
>>
>>> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> wrote:
>>>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
>>>>
>>>> <hector.palacios@digi.com> wrote:
>>>>> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'
>>>>> in your EVK?
>>>>
>>>> Yes, this is what I have been running since the beginning.
>>>>
>>>>> If it doesn't fail, could you try running it again after playing with
>>>>> the environment (setting/printing some variables).
>>>>
>>>> I can't reproduce the problem here.
>>>>
>>>>> As I said, this issue appeared with different TFTP servers and is
>>>>> independent of whether the dcache is or not enabled.
>>>>
>>>> Can you transfer from a PC to another PC via TFTP?
>
> Yes I can.
>
>> Another thing of interest would be a 'tcpdump' pcap capture of that connection.
>
> I was initially filtering out only TFTP packets of my wireshark trace and all looked
> correct. After taking a second look to the full trace I see now a hint.
> Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the
> target asking for the owner of the target's IP. The target is receiving this ARP and
> apparently responding (at least this is what my debug code shows as it gets into
> arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from
> the target is not reaching the network. My sniffer does not capture this reply.
>
> The server resends the ARP request twice more (seconds 8 and 9) to the target and
> since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has
> that IP. Since nobody responds it stops sending data.
>
> The times that it works (and I don't know the magic behind using a numeric address
> versus using ${loadaddr} when they have the same value), the ARP replies do reach the
> network and the server continues the transmission normally.
>
> Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies
> always reach the network, and the transfers always succeed.
>
> Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

We tracked down the issue to an ARP request from the server that was never answered by 
the target. We later noticed that the problem did not happen anymore when building 
U-Boot with a different toolchain and that the issue seemed to be in the alignment of 
the RX buffer in the stack, which old GCC compilers seem to do wrong.

Here is a patch:

From: Robert Hodaszi <robert.hodaszi@digi.com>
Date: Fri, 6 Sep 2013 09:50:52 +0200
Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because
  of GCC bug

Older GCC versions don't handle well alignment on stack variables.
The temporary RX buffer is a local variable, so it is on the stack.
Because the FEC driver is using DMA for transmission, receive and
transmit buffers should be aligned on 64 byte. The transmit buffers are
not allocated in the driver internally, it sends the packets directly
as it gets them. So these packets should be aligned.
When the ARP layer wants to reply to an ARP request, it uses the FEC
driver's temporary RX buffer (used to pass data to the ARP layer) to
store the new packet, and pass it back to the FEC driver's send function.
Because of a GCC bug, this buffer is not aligned well, and when the
driver tries to send it, it first rounds the address down to the
alignment boundary. That causes invalid data.

To fix it, don't put the temporary onto the stack.

Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Signed-off-by: Hector Palacios <hector.palacios@digi.com>
---
  drivers/net/fec_mxc.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Marek Vasut Sept. 12, 2013, 10:50 a.m. UTC | #1
Dear Hector Palacios,

> Hello,
> 
> Going back to this old thread I have some news regarding the problem with
> TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the
> MX28. See below:
> 
> On 07/17/2013 05:55 PM, Hector Palacios wrote:
> > Dear Marek,
> > 
> > On 07/16/2013 06:44 AM, Marek Vasut wrote:
> >> Dear Fabio Estevam,
> >> 
> >>> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> 
wrote:
> >>>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
> >>>> 
> >>>> <hector.palacios@digi.com> wrote:
> >>>>> @Fabio: could you manually run the command 'tftp ${loadaddr}
> >>>>> file100M' in your EVK?
> >>>> 
> >>>> Yes, this is what I have been running since the beginning.
> >>>> 
> >>>>> If it doesn't fail, could you try running it again after playing with
> >>>>> the environment (setting/printing some variables).
> >>>> 
> >>>> I can't reproduce the problem here.
> >>>> 
> >>>>> As I said, this issue appeared with different TFTP servers and is
> >>>>> independent of whether the dcache is or not enabled.
> >>>> 
> >>>> Can you transfer from a PC to another PC via TFTP?
> > 
> > Yes I can.
> > 
> >> Another thing of interest would be a 'tcpdump' pcap capture of that
> >> connection.
> > 
> > I was initially filtering out only TFTP packets of my wireshark trace and
> > all looked correct. After taking a second look to the full trace I see
> > now a hint. Around 7 seconds after starting the TFTP transfer the server
> > is sending an ARP to the target asking for the owner of the target's IP.
> > The target is receiving this ARP and apparently responding (at least
> > this is what my debug code shows as it gets into arp.c:ArpReceive(),
> > case ARPOP_REQUEST and sending a packet), but this ARP reply from the
> > target is not reaching the network. My sniffer does not capture this
> > reply.
> > 
> > The server resends the ARP request twice more (seconds 8 and 9) to the
> > target and since it doesn't get a reply then sends a broadcast ARP
> > (seconds 10) asking who has that IP. Since nobody responds it stops
> > sending data.
> > 
> > The times that it works (and I don't know the magic behind using a
> > numeric address versus using ${loadaddr} when they have the same value),
> > the ARP replies do reach the network and the server continues the
> > transmission normally.
> > 
> > Using a v2009 U-Boot, the behaviour is exactly the same, but the target's
> > ARP replies always reach the network, and the transfers always succeed.
> > 
> > Since Fabio cannot reproduce it I guess it must be a local ghost. :o(
> 
> We tracked down the issue to an ARP request from the server that was never
> answered by the target. We later noticed that the problem did not happen
> anymore when building U-Boot with a different toolchain and that the issue
> seemed to be in the alignment of the RX buffer in the stack, which old GCC
> compilers seem to do wrong.
> 
> Here is a patch:
> 
> From: Robert Hodaszi <robert.hodaszi@digi.com>
> Date: Fri, 6 Sep 2013 09:50:52 +0200
> Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment
> because of GCC bug
> 
> Older GCC versions don't handle well alignment on stack variables.
> The temporary RX buffer is a local variable, so it is on the stack.
> Because the FEC driver is using DMA for transmission, receive and
> transmit buffers should be aligned on 64 byte. The transmit buffers are
> not allocated in the driver internally, it sends the packets directly
> as it gets them. So these packets should be aligned.
> When the ARP layer wants to reply to an ARP request, it uses the FEC
> driver's temporary RX buffer (used to pass data to the ARP layer) to
> store the new packet, and pass it back to the FEC driver's send function.
> Because of a GCC bug

Can you point to this GCC bug in the GCC bugzilla or something?

> this buffer is not aligned well, and when the
> driver tries to send it, it first rounds the address down to the
> alignment boundary. That causes invalid data.
> 
> To fix it, don't put the temporary onto the stack.
> 
> Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
> Signed-off-by: Hector Palacios <hector.palacios@digi.com>
> ---
>   drivers/net/fec_mxc.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> index f4f72b7..315017e 100644
> --- a/drivers/net/fec_mxc.c
> +++ b/drivers/net/fec_mxc.c
> @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev)
>          uint16_t bd_status;
>          uint32_t addr, size, end;
>          int i;
> -       uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
> +       /* Don't place this variable on the stack, because older GCC
> version +        * doesn't handle alignement on stack well.
> +        */
> +       static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from 
include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you 
have a testcase maybe ?

btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .

Best regards,
Marek Vasut
Marek Vasut Sept. 12, 2013, 11 a.m. UTC | #2
Dear Robert Hodaszi,

> <html>
>   <head>
>     <meta content="text/html; charset=ISO-8859-1"
>       http-equiv="Content-Type">
>   </head>
>   <body bgcolor="#FFFFFF" text="#000000">
>     <div class="moz-cite-prefix">Hi,<br>
>       <br>
>       There are a lot of bug announcement, just make a search:<br>
>       <meta http-equiv="content-type" content="text/html;
>         charset=ISO-8859-1">
>       <a
> href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721">http://gcc.gnu.or
> g/bugzilla/show_bug.cgi?id=33721</a><br> <meta http-equiv="content-type"
> content="text/html;
>         charset=ISO-8859-1">
>       <a
> href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660">http://gcc.gnu.or
> g/bugzilla/show_bug.cgi?id=16660</a><br> <br>
>       Also, I printed out the buffer addresses, and that temporary RX
>       buffer was not aligned. So the transmit function rounded it down
>       to the alignment boundary, and so caused invalid data
>       transmission. (By the way. Shouldn't the transmit function check
>       whether the alignment is proper, and throw an error message,
>       instead of round it down? That would make more sense.)<br>
>       <div class="moz-signature"><br>
>         Best regards,<br>
>         Robert Hodaszi
>       </div>
>       <br>
>       On 2013-09-12 12:50, Marek Vasut wrote:<br>
>     </div>
>     <blockquote cite="mid:201309121250.36579.marex@denx.de" type="cite">
>       <pre wrap="">Dear Hector Palacios,
[...]

Sorry, I cannot decode this. Can you please send plain-text emails?

Thank you

Best regards,
Marek Vasut
Robert Hodaszi Sept. 12, 2013, 11:02 a.m. UTC | #3
Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Also, I printed out the buffer addresses, and that temporary RX buffer 
was not aligned. So the transmit function rounded it down to the 
alignment boundary, and so caused invalid data transmission. (By the 
way. Shouldn't the transmit function check whether the alignment is 
proper, and throw an error message, instead of round it down? That would 
make more sense.)

Best regards,
Robert Hodaszi

On 2013-09-12 12:50, Marek Vasut wrote:
> Dear Hector Palacios,
>
>> Hello,
>>
>> Going back to this old thread I have some news regarding the problem with
>> TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the
>> MX28. See below:
>>
>> On 07/17/2013 05:55 PM, Hector Palacios wrote:
>>> Dear Marek,
>>>
>>> On 07/16/2013 06:44 AM, Marek Vasut wrote:
>>>> Dear Fabio Estevam,
>>>>
>>>>> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam<festevam@gmail.com>  
> wrote:
>>>>>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
>>>>>>
>>>>>> <hector.palacios@digi.com>  wrote:
>>>>>>> @Fabio: could you manually run the command 'tftp ${loadaddr}
>>>>>>> file100M' in your EVK?
>>>>>> Yes, this is what I have been running since the beginning.
>>>>>>
>>>>>>> If it doesn't fail, could you try running it again after playing with
>>>>>>> the environment (setting/printing some variables).
>>>>>> I can't reproduce the problem here.
>>>>>>
>>>>>>> As I said, this issue appeared with different TFTP servers and is
>>>>>>> independent of whether the dcache is or not enabled.
>>>>>> Can you transfer from a PC to another PC via TFTP?
>>> Yes I can.
>>>
>>>> Another thing of interest would be a 'tcpdump' pcap capture of that
>>>> connection.
>>> I was initially filtering out only TFTP packets of my wireshark trace and
>>> all looked correct. After taking a second look to the full trace I see
>>> now a hint. Around 7 seconds after starting the TFTP transfer the server
>>> is sending an ARP to the target asking for the owner of the target's IP.
>>> The target is receiving this ARP and apparently responding (at least
>>> this is what my debug code shows as it gets into arp.c:ArpReceive(),
>>> case ARPOP_REQUEST and sending a packet), but this ARP reply from the
>>> target is not reaching the network. My sniffer does not capture this
>>> reply.
>>>
>>> The server resends the ARP request twice more (seconds 8 and 9) to the
>>> target and since it doesn't get a reply then sends a broadcast ARP
>>> (seconds 10) asking who has that IP. Since nobody responds it stops
>>> sending data.
>>>
>>> The times that it works (and I don't know the magic behind using a
>>> numeric address versus using ${loadaddr} when they have the same value),
>>> the ARP replies do reach the network and the server continues the
>>> transmission normally.
>>>
>>> Using a v2009 U-Boot, the behaviour is exactly the same, but the target's
>>> ARP replies always reach the network, and the transfers always succeed.
>>>
>>> Since Fabio cannot reproduce it I guess it must be a local ghost. :o(
>> We tracked down the issue to an ARP request from the server that was never
>> answered by the target. We later noticed that the problem did not happen
>> anymore when building U-Boot with a different toolchain and that the issue
>> seemed to be in the alignment of the RX buffer in the stack, which old GCC
>> compilers seem to do wrong.
>>
>> Here is a patch:
>>
>> From: Robert Hodaszi<robert.hodaszi@digi.com>
>> Date: Fri, 6 Sep 2013 09:50:52 +0200
>> Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment
>> because of GCC bug
>>
>> Older GCC versions don't handle well alignment on stack variables.
>> The temporary RX buffer is a local variable, so it is on the stack.
>> Because the FEC driver is using DMA for transmission, receive and
>> transmit buffers should be aligned on 64 byte. The transmit buffers are
>> not allocated in the driver internally, it sends the packets directly
>> as it gets them. So these packets should be aligned.
>> When the ARP layer wants to reply to an ARP request, it uses the FEC
>> driver's temporary RX buffer (used to pass data to the ARP layer) to
>> store the new packet, and pass it back to the FEC driver's send function.
>> Because of a GCC bug
> Can you point to this GCC bug in the GCC bugzilla or something?
>
>> this buffer is not aligned well, and when the
>> driver tries to send it, it first rounds the address down to the
>> alignment boundary. That causes invalid data.
>>
>> To fix it, don't put the temporary onto the stack.
>>
>> Signed-off-by: Robert Hodaszi<robert.hodaszi@digi.com>
>> Signed-off-by: Hector Palacios<hector.palacios@digi.com>
>> ---
>>    drivers/net/fec_mxc.c | 5 ++++-
>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
>> index f4f72b7..315017e 100644
>> --- a/drivers/net/fec_mxc.c
>> +++ b/drivers/net/fec_mxc.c
>> @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev)
>>           uint16_t bd_status;
>>           uint32_t addr, size, end;
>>           int i;
>> -       uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
>> +       /* Don't place this variable on the stack, because older GCC
>> version +        * doesn't handle alignement on stack well.
>> +        */
>> +       static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
> The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from
> include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you
> have a testcase maybe ?
>
> btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .
>
> Best regards,
> Marek Vasut
Robert Hodaszi Sept. 12, 2013, 11:08 a.m. UTC | #4
Hi,

Sorry, hopefully that will be a plain-text.

There are a lot of bug announcement, just make a search:
gcc.gnu.org/bugzilla/show_bug.cgi?id=33721
gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Also, I printed out the buffer addresses, and that temporary RX buffer 
was not aligned. So the transmit function rounded it down to the 
alignment boundary, and so caused invalid data transmission. (By the 
way. Shouldn't the transmit function check whether the alignment is 
proper, and throw an error message, instead of round it down? That would 
make more sense.)

Best regards,
Robert Hodaszi

On 2013-09-12 12:50, Marek Vasut wrote:
> Dear Hector Palacios,
>
>> Hello,
>>
>> Going back to this old thread I have some news regarding the problem with
>> TFTP transmissions blocking (timed out) after 10 seconds on the FEC of the
>> MX28. See below:
>>
>> On 07/17/2013 05:55 PM, Hector Palacios wrote:
>>> Dear Marek,
>>>
>>> On 07/16/2013 06:44 AM, Marek Vasut wrote:
>>>> Dear Fabio Estevam,
>>>>
>>>>> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam<festevam@gmail.com>
> wrote:
>>>>>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
>>>>>>
>>>>>> <hector.palacios@digi.com>  wrote:
>>>>>>> @Fabio: could you manually run the command 'tftp ${loadaddr}
>>>>>>> file100M' in your EVK?
>>>>>> Yes, this is what I have been running since the beginning.
>>>>>>
>>>>>>> If it doesn't fail, could you try running it again after playing with
>>>>>>> the environment (setting/printing some variables).
>>>>>> I can't reproduce the problem here.
>>>>>>
>>>>>>> As I said, this issue appeared with different TFTP servers and is
>>>>>>> independent of whether the dcache is or not enabled.
>>>>>> Can you transfer from a PC to another PC via TFTP?
>>> Yes I can.
>>>
>>>> Another thing of interest would be a 'tcpdump' pcap capture of that
>>>> connection.
>>> I was initially filtering out only TFTP packets of my wireshark trace and
>>> all looked correct. After taking a second look to the full trace I see
>>> now a hint. Around 7 seconds after starting the TFTP transfer the server
>>> is sending an ARP to the target asking for the owner of the target's IP.
>>> The target is receiving this ARP and apparently responding (at least
>>> this is what my debug code shows as it gets into arp.c:ArpReceive(),
>>> case ARPOP_REQUEST and sending a packet), but this ARP reply from the
>>> target is not reaching the network. My sniffer does not capture this
>>> reply.
>>>
>>> The server resends the ARP request twice more (seconds 8 and 9) to the
>>> target and since it doesn't get a reply then sends a broadcast ARP
>>> (seconds 10) asking who has that IP. Since nobody responds it stops
>>> sending data.
>>>
>>> The times that it works (and I don't know the magic behind using a
>>> numeric address versus using ${loadaddr} when they have the same value),
>>> the ARP replies do reach the network and the server continues the
>>> transmission normally.
>>>
>>> Using a v2009 U-Boot, the behaviour is exactly the same, but the target's
>>> ARP replies always reach the network, and the transfers always succeed.
>>>
>>> Since Fabio cannot reproduce it I guess it must be a local ghost. :o(
>> We tracked down the issue to an ARP request from the server that was never
>> answered by the target. We later noticed that the problem did not happen
>> anymore when building U-Boot with a different toolchain and that the issue
>> seemed to be in the alignment of the RX buffer in the stack, which old GCC
>> compilers seem to do wrong.
>>
>> Here is a patch:
>>
>> From: Robert Hodaszi<robert.hodaszi@digi.com>
>> Date: Fri, 6 Sep 2013 09:50:52 +0200
>> Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment
>> because of GCC bug
>>
>> Older GCC versions don't handle well alignment on stack variables.
>> The temporary RX buffer is a local variable, so it is on the stack.
>> Because the FEC driver is using DMA for transmission, receive and
>> transmit buffers should be aligned on 64 byte. The transmit buffers are
>> not allocated in the driver internally, it sends the packets directly
>> as it gets them. So these packets should be aligned.
>> When the ARP layer wants to reply to an ARP request, it uses the FEC
>> driver's temporary RX buffer (used to pass data to the ARP layer) to
>> store the new packet, and pass it back to the FEC driver's send function.
>> Because of a GCC bug
> Can you point to this GCC bug in the GCC bugzilla or something?
>
>> this buffer is not aligned well, and when the
>> driver tries to send it, it first rounds the address down to the
>> alignment boundary. That causes invalid data.
>>
>> To fix it, don't put the temporary onto the stack.
>>
>> Signed-off-by: Robert Hodaszi<robert.hodaszi@digi.com>
>> Signed-off-by: Hector Palacios<hector.palacios@digi.com>
>> ---
>>    drivers/net/fec_mxc.c | 5 ++++-
>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
>> index f4f72b7..315017e 100644
>> --- a/drivers/net/fec_mxc.c
>> +++ b/drivers/net/fec_mxc.c
>> @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev)
>>           uint16_t bd_status;
>>           uint32_t addr, size, end;
>>           int i;
>> -       uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
>> +       /* Don't place this variable on the stack, because older GCC
>> version +        * doesn't handle alignement on stack well.
>> +        */
>> +       static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
> The buffer might as well be allocated using ALLOC_CACHE_ALIGN_BUFFER() from
> include/common.h . Still, are you _really_ sure the buffer is unaligned ? Do you
> have a testcase maybe ?
>
> btw. I am able to replicate this issue sometimes even using GCC 4.8.0 .
>
> Best regards,
> Marek Vasut
Marek Vasut Sept. 12, 2013, 2:05 p.m. UTC | #5
Dear Robert Hodaszi,

> Hi,
> 
> Sorry, hopefully that will be a plain-text.
> 
> There are a lot of bug announcement, just make a search:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721

This was apparently fixed three years ago.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

And this one six years ago ...

> Also, I printed out the buffer addresses, and that temporary RX buffer
> was not aligned. So the transmit function rounded it down to the
> alignment boundary, and so caused invalid data transmission. (By the
> way. Shouldn't the transmit function check whether the alignment is
> proper, and throw an error message, instead of round it down? That would
> make more sense.)

Looking at the code one more time, it'd make most sense to simply allocate the 
buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of 
stack. You see, the max packet size is around 2k, which is quite a lot. How does 
this proposal sound to you ?

Best regards,
Marek Vasut
Robert Hodaszi Sept. 12, 2013, 2:15 p.m. UTC | #6
I just brought up two samples, that it was a long-term problem. Now it 
is fixed. But if somebody is trying to compile the U-Boot with an older 
toolchain, it could cause problems. I had problems with 4.4.6.

Yes, 2k is a lot, and I would not put it on the stack. That was just a 
quick fix. It would be nicer to allocate the memory with malloc, and 
should do that at initialization time.

Best regards,
Robert Hodaszi

On 2013-09-12 16:05, Marek Vasut wrote:
> Dear Robert Hodaszi,
>
>> Hi,
>>
>> Sorry, hopefully that will be a plain-text.
>>
>> There are a lot of bug announcement, just make a search:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33721
> This was apparently fixed three years ago.
>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
> And this one six years ago ...
>
>> Also, I printed out the buffer addresses, and that temporary RX buffer
>> was not aligned. So the transmit function rounded it down to the
>> alignment boundary, and so caused invalid data transmission. (By the
>> way. Shouldn't the transmit function check whether the alignment is
>> proper, and throw an error message, instead of round it down? That would
>> make more sense.)
> Looking at the code one more time, it'd make most sense to simply allocate the
> buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of
> stack. You see, the max packet size is around 2k, which is quite a lot. How does
> this proposal sound to you ?
>
> Best regards,
> Marek Vasut
Marek Vasut Sept. 12, 2013, 2:31 p.m. UTC | #7
Dear Robert Hodaszi,

> I just brought up two samples, that it was a long-term problem. Now it
> is fixed. But if somebody is trying to compile the U-Boot with an older
> toolchain, it could cause problems. I had problems with 4.4.6.
> 
> Yes, 2k is a lot, and I would not put it on the stack. That was just a
> quick fix. It would be nicer to allocate the memory with malloc, and
> should do that at initialization time.

Please do not top-post.

Memalign should do here with proper rounding. Can you prepare another patch 
please?

Best regards,
Marek Vasut
Robert Hodaszi Sept. 12, 2013, 2:32 p.m. UTC | #8
On 2013-09-12 16:31, Marek Vasut wrote:
> Dear Robert Hodaszi,
>
> Please do not top-post.
>
> Memalign should do here with proper rounding. Can you prepare another patch
> please?
>
> Best regards,
> Marek Vasut

Ok. I will try to do that tomorrow.

Best regards,
Robert Hodaszi
Marek Vasut Sept. 12, 2013, 3:06 p.m. UTC | #9
Dear Robert Hodaszi,

> On 2013-09-12 16:31, Marek Vasut wrote:
> > Dear Robert Hodaszi,
> > 
> > Please do not top-post.
> > 
> > Memalign should do here with proper rounding. Can you prepare another
> > patch please?
> > 
> > Best regards,
> > Marek Vasut
> 
> Ok. I will try to do that tomorrow.

Cool, thanks ! Nice find btw ;-)

Best regards,
Marek Vasut
Wolfgang Denk Sept. 12, 2013, 5:50 p.m. UTC | #10
Dear Hector Palacios,

In message <523195CA.3010305@digi.com> you wrote:
> 
> Here is a patch:
> 
> From: Robert Hodaszi <robert.hodaszi@digi.com>
> Date: Fri, 6 Sep 2013 09:50:52 +0200
> Subject: [PATCH] net: fec: fix invalid temporary RX buffer alignment because
>   of GCC bug
> 
> Older GCC versions don't handle well alignment on stack variables.

Can you please be specific - which exact versions of GCC are supposed
to misbehave here?

> To fix it, don't put the temporary onto the stack.

This is not a good idea, as it wastes a memory for no good reason.

> Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
> Signed-off-by: Hector Palacios <hector.palacios@digi.com>
> ---
>   drivers/net/fec_mxc.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> index f4f72b7..315017e 100644
> --- a/drivers/net/fec_mxc.c
> +++ b/drivers/net/fec_mxc.c
> @@ -828,7 +828,10 @@ static int fec_recv(struct eth_device *dev)
>          uint16_t bd_status;
>          uint32_t addr, size, end;
>          int i;
> -       uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
> +       /* Don't place this variable on the stack, because older GCC version
> +        * doesn't handle alignement on stack well.
> +        */
> +       static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

I have to admit that I doubt the explanation - somthing else is
probaly wrong instead.  I would really like to know which compiler
version misbehaves, and what the generated code looks like, both in
the working and in the broken case.

Thanks.

Best regards,

Wolfgang Denk
Wolfgang Denk Sept. 12, 2013, 6:12 p.m. UTC | #11
Dear Robert Hodaszi,

In message <5231A0C0.8070308@digi.com> you wrote:
> 
> Sorry, hopefully that will be a plain-text.
> 
> There are a lot of bug announcement, just make a search:
> gcc.gnu.org/bugzilla/show_bug.cgi?id=33721
> gcc.gnu.org/bugzilla/show_bug.cgi?id=16660

Hm... what exactly are the values of STACK_BOUNDARY and
PREFERRED_STACK_BOUNDARY for the failing version of GCC, then?


Best regards,

Wolfgang Denk
Wolfgang Denk Sept. 12, 2013, 6:17 p.m. UTC | #12
Dear Marek Vasut,

In message <201309121605.04824.marex@denx.de> you wrote:
> 
> Looking at the code one more time, it'd make most sense to simply allocate the 
> buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of 
> stack. You see, the max packet size is around 2k, which is quite a lot. How does 
> this proposal sound to you ?

It makes perfect sense to allocate variable with function scope only
on the stack.  That's what the stack has been invented for.

If there is a bug with that, this mug must be isolated and fixed.  It
makes zero sense to just paper over it.

Best regards,

Wolfgang Denk
Fabio Estevam Sept. 12, 2013, 6:39 p.m. UTC | #13
On Thu, Sep 12, 2013 at 3:17 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Marek Vasut,
>
> In message <201309121605.04824.marex@denx.de> you wrote:
>>
>> Looking at the code one more time, it'd make most sense to simply allocate the
>> buffer NOT on stack, but with some memalign-kind-of call to avoid this abuse of
>> stack. You see, the max packet size is around 2k, which is quite a lot. How does
>> this proposal sound to you ?
>
> It makes perfect sense to allocate variable with function scope only
> on the stack.  That's what the stack has been invented for.

This buffer in the fec driver will be used in DMA transfer, so maybe
that's the reason we should use malloc instead of using the stack?

At least in the kernel, we don't use stack for DMA buffers.
Wolfgang Denk Sept. 12, 2013, 6:53 p.m. UTC | #14
Dear Fabio Estevam,

In message <CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com> you wrote:
>
> > It makes perfect sense to allocate variable with function scope only
> > on the stack.  That's what the stack has been invented for.
> 
> This buffer in the fec driver will be used in DMA transfer, so maybe
> that's the reason we should use malloc instead of using the stack?

What has DMA to do with that?  We're talking about alignment only.

> At least in the kernel, we don't use stack for DMA buffers.

Yes, but them, the kernel uses a much more complicated memory setup,
with somewhat different mappings for different regions.  We don't do
that - or do we?  IMO we use a single mapping for the whole RAM ?

Best regards,

Wolfgang Denk
Fabio Estevam Sept. 12, 2013, 7:37 p.m. UTC | #15
On Thu, Sep 12, 2013 at 3:53 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Fabio Estevam,
>
> In message <CAOMZO5BY6kDOCoWn_SRwhPE7ssMjAReZ2bcFXGgFF-_Wrdgo0g@mail.gmail.com> you wrote:
>>
>> > It makes perfect sense to allocate variable with function scope only
>> > on the stack.  That's what the stack has been invented for.
>>
>> This buffer in the fec driver will be used in DMA transfer, so maybe
>> that's the reason we should use malloc instead of using the stack?
>
> What has DMA to do with that?  We're talking about alignment only.

I mentioned DMA because we align the buffer with __aligned(ARCH_DMA_MINALIGN).

Will try to see if I can reproduce the problem here, but the last time
I tried I was not able to.

Maybe the gcc version that Robert and Hector pointed out may explain
the different behaviour.

Regards,

Fabio Estevam
diff mbox

Patch

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
index f4f72b7..315017e 100644
--- a/drivers/net/fec_mxc.c
+++ b/drivers/net/fec_mxc.c
@@ -828,7 +828,10 @@  static int fec_recv(struct eth_device *dev)
         uint16_t bd_status;
         uint32_t addr, size, end;
         int i;
-       uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);
+       /* Don't place this variable on the stack, because older GCC version
+        * doesn't handle alignement on stack well.
+        */
+       static uchar buff[FEC_MAX_PKT_SIZE] __aligned(ARCH_DMA_MINALIGN);

         /*
          * Check if any critical events have happened