diff mbox

[U-Boot] net: fec: Avoid MX28 bus sync issue

Message ID 1373583784-7129-1-git-send-email-marex@denx.de
State Superseded
Delegated to: Stefano Babic
Headers show

Commit Message

Marek Vasut July 11, 2013, 11:03 p.m. UTC
The MX28 multi-layer AHB bus can be too slow and trigger the
FEC DMA too early, before all the data hit the DRAM. This patch
ensures the data are written in the RAM before the DMA starts.
Please see the comment in the patch for full details.

This patch was produced with an amazing help from Albert Aribaud,
who pointed out it can possibly be such a bus synchronisation
issue.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Stefano Babic <sbabic@denx.de>
---
 drivers/net/fec_mxc.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Fabio Estevam July 11, 2013, 11:18 p.m. UTC | #1
On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> The MX28 multi-layer AHB bus can be too slow and trigger the
> FEC DMA too early, before all the data hit the DRAM. This patch
> ensures the data are written in the RAM before the DMA starts.
> Please see the comment in the patch for full details.
>
> This patch was produced with an amazing help from Albert Aribaud,
> who pointed out it can possibly be such a bus synchronisation
> issue.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> Cc: Stefano Babic <sbabic@denx.de>

Excellent, managed to transfer 90MB via TFTP on mx28evk without a
single timeout.

Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Alexandre Pereira da Silva July 12, 2013, 3:41 a.m. UTC | #2
On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>> The MX28 multi-layer AHB bus can be too slow and trigger the
>> FEC DMA too early, before all the data hit the DRAM. This patch
>> ensures the data are written in the RAM before the DMA starts.
>> Please see the comment in the patch for full details.
>>
>> This patch was produced with an amazing help from Albert Aribaud,
>> who pointed out it can possibly be such a bus synchronisation
>> issue.
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>> Cc: Stefano Babic <sbabic@denx.de>
>
> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> single timeout.
>
> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>

It's working here too.

Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
Marek Vasut July 12, 2013, 3:51 a.m. UTC | #3
Hi,

> On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
> > On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> >> The MX28 multi-layer AHB bus can be too slow and trigger the
> >> FEC DMA too early, before all the data hit the DRAM. This patch
> >> ensures the data are written in the RAM before the DMA starts.
> >> Please see the comment in the patch for full details.
> >> 
> >> This patch was produced with an amazing help from Albert Aribaud,
> >> who pointed out it can possibly be such a bus synchronisation
> >> issue.
> >> 
> >> Signed-off-by: Marek Vasut <marex@denx.de>
> >> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> >> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> >> Cc: Stefano Babic <sbabic@denx.de>
> > 
> > Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> > single timeout.
> > 
> > Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> 
> It's working here too.
> 
> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>

Nice to hear, thank Albert for finding this.

Best regards,
Marek Vasut
Albert ARIBAUD July 12, 2013, 5:57 a.m. UTC | #4
Hi Marek,

On Fri, 12 Jul 2013 01:03:04 +0200, Marek Vasut <marex@denx.de> wrote:

> The MX28 multi-layer AHB bus can be too slow and trigger the
> FEC DMA too early, before all the data hit the DRAM. This patch
> ensures the data are written in the RAM before the DMA starts.
> Please see the comment in the patch for full details.
> 
> This patch was produced with an amazing help from Albert Aribaud,
> who pointed out it can possibly be such a bus synchronisation
> issue.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> Cc: Stefano Babic <sbabic@denx.de>
> ---
>  drivers/net/fec_mxc.c |   22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> index 97bf8fe..ec5b9db 100644
> --- a/drivers/net/fec_mxc.c
> +++ b/drivers/net/fec_mxc.c
> @@ -737,6 +737,28 @@ static int fec_send(struct eth_device *dev, void *packet, int length)
>  	flush_dcache_range(addr, addr + size);
>  
>  	/*
> +	 * Below we read the DMA descriptor's last four bytes back from the
> +	 * DRAM. This is important in order to make sure that all WRITE
> +	 * operations on the bus that were triggered by previous cache FLUSH
> +	 * have completed.
> +	 *
> +	 * Otherwise, on MX28, it is possible to observe a corruption of the
> +	 * DMA descriptors. Please refer to schematic "Figure 1-2" in MX28RM
> +	 * for the bus structure of MX28. The scenario is as follows:
> +	 *
> +	 * 1) ARM core triggers a series of WRITEs on the AHB_ARB2 bus going
> +	 *    to DRAM due to flush_dcache_range()
> +	 * 2) ARM core writes the FEC registers via AHB_ARB2
> +	 * 3) FEC DMA starts reading/writing from/to DRAM via AHB_ARB3
> +	 *
> +	 * Note that 2) does sometimes finish before 1) due to reordering of
> +	 * WRITE accesses on the AHB bus, therefore triggering 3) before the
> +	 * DMA descriptor is fully written into DRAM. This results in occasional
> +	 * corruption of the DMA descriptor.
> +	 */
> +	readl(addr + size - 4);
> +
> +	/*
>  	 * Enable SmartDMA transmit task
>  	 */
>  	fec_tx_task_enable(fec);

This being a bugfix patch, and having been tested twice, I suggest that
it go in 2013.07, maybe with the commit message reduced to its first
paragraph above -- although of course I do appreciate the second one,
except it tends to minimize Marek's own contribution to the fix, which
is by far the most important.

Amicalement,
Albert ARIBAUD July 12, 2013, 6:39 a.m. UTC | #5
Afterthought:

On Fri, 12 Jul 2013 07:57:18 +0200, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:

> except it tends to minimize Marek's own contribution to the fix, which
> is by far the most important.

'The most important' 'by far' being of course Marek's contribution, not
minimizing it. :)

Amicalement,
Stefano Babic July 12, 2013, 6:56 a.m. UTC | #6
Hi Marek, hi Albert,

On 12/07/2013 07:57, Albert ARIBAUD wrote:

> 
> This being a bugfix patch, and having been tested twice, I suggest that
> it go in 2013.07, maybe with the commit message reduced to its first
> paragraph above -- although of course I do appreciate the second one,
> except it tends to minimize Marek's own contribution to the fix, which
> is by far the most important.

Well, a big thank to both of you !

I merge this into u-boot-imx and I will send soon my PR.

Regards,
Stefano
Stefano Babic July 12, 2013, 7:30 a.m. UTC | #7
On 12/07/2013 01:03, Marek Vasut wrote:
> The MX28 multi-layer AHB bus can be too slow and trigger the
> FEC DMA too early, before all the data hit the DRAM. This patch
> ensures the data are written in the RAM before the DMA starts.
> Please see the comment in the patch for full details.
> 
> This patch was produced with an amazing help from Albert Aribaud,
> who pointed out it can possibly be such a bus synchronisation
> issue.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> Cc: Stefano Babic <sbabic@denx.de>
> ---

Applied to u-boot-imx (fix), thanks !

Best regards,
Stefano Babic
Hector Palacios July 12, 2013, 11:37 a.m. UTC | #8
Dear Marek,

On 07/12/2013 05:51 AM, Marek Vasut wrote:
> Hi,
>
>> On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
>>>> FEC DMA too early, before all the data hit the DRAM. This patch
>>>> ensures the data are written in the RAM before the DMA starts.
>>>> Please see the comment in the patch for full details.
>>>>
>>>> This patch was produced with an amazing help from Albert Aribaud,
>>>> who pointed out it can possibly be such a bus synchronisation
>>>> issue.
>>>>
>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>>>> Cc: Stefano Babic <sbabic@denx.de>
>>>
>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
>>> single timeout.
>>>
>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
>>
>> It's working here too.
>>
>> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
>
> Nice to hear, thank Albert for finding this.

Thanks for sharing.

Unfortunately I'm still seeing non-recoverable timeouts when doing tftp transfers.
Nevertheless, with this patch sometimes I'm able to transfer big files (100MiB) 
without problems (I was never able before). So this is a big improvement.
I applied this patch over a v2013.01, does it need any additional patch? I think I saw 
one email about flush dcache...

Considering the other guys seem to work without problems I guess this scenario is 
specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz. I always suspect from 
the PHY.
I'm disconcerted because usually the timeouts occur after having successfully 
downloaded 22 or 24 MiB. Other times it just downloads completely.

In any case, good job!

Best regards,
--
Hector Palacios
Fabio Estevam July 12, 2013, 11:39 a.m. UTC | #9
Hi Hector,

On Fri, Jul 12, 2013 at 8:37 AM, Hector Palacios
<hector.palacios@digi.com> wrote:

> Thanks for sharing.
>
> Unfortunately I'm still seeing non-recoverable timeouts when doing tftp
> transfers.
> Nevertheless, with this patch sometimes I'm able to transfer big files
> (100MiB) without problems (I was never able before). So this is a big
> improvement.
> I applied this patch over a v2013.01, does it need any additional patch? I
> think I saw one email about flush dcache...

Please try Stefano's tree instead:
http://git.denx.de/?p=u-boot/u-boot-imx.git;a=shortlog;h=refs/heads/master

Regards,

Fabio Estevam
Marek Vasut July 12, 2013, 11:51 a.m. UTC | #10
Hi Albert,

> Afterthought:
> 
> On Fri, 12 Jul 2013 07:57:18 +0200, Albert ARIBAUD
> 
> <albert.u.boot@aribaud.net> wrote:
> > except it tends to minimize Marek's own contribution to the fix, which
> > is by far the most important.
> 
> 'The most important' 'by far' being of course Marek's contribution, not
> minimizing it. :)

I'm convinced I owe you a truckload of tartelette ;-)

Best regards,
Marek Vasut
Marek Vasut July 12, 2013, 12:01 p.m. UTC | #11
Hi Hector,

> Dear Marek,
> 
> On 07/12/2013 05:51 AM, Marek Vasut wrote:
> > Hi,
> > 
> >> On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
> >>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> >>>> The MX28 multi-layer AHB bus can be too slow and trigger the
> >>>> FEC DMA too early, before all the data hit the DRAM. This patch
> >>>> ensures the data are written in the RAM before the DMA starts.
> >>>> Please see the comment in the patch for full details.
> >>>> 
> >>>> This patch was produced with an amazing help from Albert Aribaud,
> >>>> who pointed out it can possibly be such a bus synchronisation
> >>>> issue.
> >>>> 
> >>>> Signed-off-by: Marek Vasut <marex@denx.de>
> >>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> >>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> >>>> Cc: Stefano Babic <sbabic@denx.de>
> >>> 
> >>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> >>> single timeout.
> >>> 
> >>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> >> 
> >> It's working here too.
> >> 
> >> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
> > 
> > Nice to hear, thank Albert for finding this.
> 
> Thanks for sharing.
> 
> Unfortunately I'm still seeing non-recoverable timeouts when doing tftp
> transfers. Nevertheless, with this patch sometimes I'm able to transfer
> big files (100MiB) without problems (I was never able before). So this is
> a big improvement. I applied this patch over a v2013.01, does it need any
> additional patch? I think I saw one email about flush dcache...

Try Stefano's tree as Fabio suggested. I think it's already pushed and includes 
the fixes.

> Considering the other guys seem to work without problems I guess this
> scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz.
> I always suspect from the PHY.

You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h 
) . Also, can you check which of the two "ret = -EINVAL" is triggered in 
fec_send() ? You can add simple printf() alongside both of them.

> I'm disconcerted because usually the timeouts occur after having
> successfully downloaded 22 or 24 MiB. Other times it just downloads
> completely.
> 
> In any case, good job!
> 
> Best regards,
> --
> Hector Palacios

Best regards,
Marek Vasut
Hector Palacios July 12, 2013, 3:08 p.m. UTC | #12
Hi Marek,

On 07/12/2013 02:01 PM, Marek Vasut wrote:
> Hi Hector,
>
>> Dear Marek,
>>
>> On 07/12/2013 05:51 AM, Marek Vasut wrote:
>>> Hi,
>>>
>>>> On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
>>>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>>>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
>>>>>> FEC DMA too early, before all the data hit the DRAM. This patch
>>>>>> ensures the data are written in the RAM before the DMA starts.
>>>>>> Please see the comment in the patch for full details.
>>>>>>
>>>>>> This patch was produced with an amazing help from Albert Aribaud,
>>>>>> who pointed out it can possibly be such a bus synchronisation
>>>>>> issue.
>>>>>>
>>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>>>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>>>>>> Cc: Stefano Babic <sbabic@denx.de>
>>>>>
>>>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
>>>>> single timeout.
>>>>>
>>>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
>>>>
>>>> It's working here too.
>>>>
>>>> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
>>>
>>> Nice to hear, thank Albert for finding this.
>>
>> Thanks for sharing.
>>
>> Unfortunately I'm still seeing non-recoverable timeouts when doing tftp
>> transfers. Nevertheless, with this patch sometimes I'm able to transfer
>> big files (100MiB) without problems (I was never able before). So this is
>> a big improvement. I applied this patch over a v2013.01, does it need any
>> additional patch? I think I saw one email about flush dcache...
>
> Try Stefano's tree as Fabio suggested. I think it's already pushed and includes
> the fixes.

I just tried, but it didn't help.

>> Considering the other guys seem to work without problems I guess this
>> scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz.
>> I always suspect from the PHY.
>
> You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h
> ) . Also, can you check which of the two "ret = -EINVAL" is triggered in
> fec_send() ? You can add simple printf() alongside both of them.

fec_send() *does not* ever fail, but I found something:
It is very strange that the timeouts appear always after transferring between 20 and 
24 MiB. So I thought maybe it was not an issue with the size of the file or the number 
of packets received, but instead a timed issue (an issue that happens after some 
period of time). I checked, and in fact the timeouts occur exactly 10 seconds after 
running the tftp command.
I verified that this is what is happening by adding a udelay(100000) at fec_send(). In 
this case, the timeout also occurs after 10 seconds, but due to the delay, I have 
transferred only a few Kbytes.
I tried to change different timeout related constants at tftp.c but still the issue 
happens after 10s.
It's like if, after these 10 seconds, the PHY lost the link or something. Really odd.
Does it tell you anything?

Best regards,
--
Hector Palacios
Albert ARIBAUD July 12, 2013, 3:50 p.m. UTC | #13
Hi Hector,

On Fri, 12 Jul 2013 17:08:59 +0200, Hector Palacios
<hector.palacios@digi.com> wrote:

> Hi Marek,
> 
> On 07/12/2013 02:01 PM, Marek Vasut wrote:
> > Hi Hector,
> >
> >> Dear Marek,
> >>
> >> On 07/12/2013 05:51 AM, Marek Vasut wrote:
> >>> Hi,
> >>>
> >>>> On Thu, Jul 11, 2013 at 8:18 PM, Fabio Estevam <festevam@gmail.com> wrote:
> >>>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> >>>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
> >>>>>> FEC DMA too early, before all the data hit the DRAM. This patch
> >>>>>> ensures the data are written in the RAM before the DMA starts.
> >>>>>> Please see the comment in the patch for full details.
> >>>>>>
> >>>>>> This patch was produced with an amazing help from Albert Aribaud,
> >>>>>> who pointed out it can possibly be such a bus synchronisation
> >>>>>> issue.
> >>>>>>
> >>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
> >>>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> >>>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> >>>>>> Cc: Stefano Babic <sbabic@denx.de>
> >>>>>
> >>>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> >>>>> single timeout.
> >>>>>
> >>>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> >>>>
> >>>> It's working here too.
> >>>>
> >>>> Tested-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
> >>>
> >>> Nice to hear, thank Albert for finding this.
> >>
> >> Thanks for sharing.
> >>
> >> Unfortunately I'm still seeing non-recoverable timeouts when doing tftp
> >> transfers. Nevertheless, with this patch sometimes I'm able to transfer
> >> big files (100MiB) without problems (I was never able before). So this is
> >> a big improvement. I applied this patch over a v2013.01, does it need any
> >> additional patch? I think I saw one email about flush dcache...
> >
> > Try Stefano's tree as Fabio suggested. I think it's already pushed and includes
> > the fixes.
> 
> I just tried, but it didn't help.
> 
> >> Considering the other guys seem to work without problems I guess this
> >> scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at 50MHz.
> >> I always suspect from the PHY.
> >
> > You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in sc_sps_1.h
> > ) . Also, can you check which of the two "ret = -EINVAL" is triggered in
> > fec_send() ? You can add simple printf() alongside both of them.
> 
> fec_send() *does not* ever fail, but I found something:
> It is very strange that the timeouts appear always after transferring between 20 and 
> 24 MiB. So I thought maybe it was not an issue with the size of the file or the number 
> of packets received, but instead a timed issue (an issue that happens after some 
> period of time). I checked, and in fact the timeouts occur exactly 10 seconds after 
> running the tftp command.
> I verified that this is what is happening by adding a udelay(100000) at fec_send(). In 
> this case, the timeout also occurs after 10 seconds, but due to the delay, I have 
> transferred only a few Kbytes.
> I tried to change different timeout related constants at tftp.c but still the issue 
> happens after 10s.
> It's like if, after these 10 seconds, the PHY lost the link or something. Really odd.
> Does it tell you anything?

Well, such a round number makes me think of an application-layer
time-out. Do you have control over how your TFTP server is
configured?

> Best regards,
> --
> Hector Palacios

Amicalement,
Marek Vasut July 12, 2013, 4:48 p.m. UTC | #14
Hi Hector,

[...]

> > Try Stefano's tree as Fabio suggested. I think it's already pushed and
> > includes the fixes.
> 
> I just tried, but it didn't help.

OK, then it's something else.

> >> Considering the other guys seem to work without problems I guess this
> >> scenario is specific to my board. I'm using a Micrel KSZ8031RNLI at
> >> 50MHz. I always suspect from the PHY.
> > 
> > You can try using the PHYLIB (CONFIG_PHYLIB and CONFIG_PHY_SMSC as in
> > sc_sps_1.h ) . Also, can you check which of the two "ret = -EINVAL" is
> > triggered in fec_send() ? You can add simple printf() alongside both of
> > them.
> 
> fec_send() *does not* ever fail

OK, it might be something else entirely. Let's take a look ...

> but I found something:
> It is very strange that the timeouts appear always after transferring
> between 20 and 24 MiB. So I thought maybe it was not an issue with the
> size of the file or the number of packets received, but instead a timed
> issue (an issue that happens after some period of time). I checked, and in
> fact the timeouts occur exactly 10 seconds after running the tftp command.
> I verified that this is what is happening by adding a udelay(100000) at
> fec_send(). In this case, the timeout also occurs after 10 seconds, but
> due to the delay, I have transferred only a few Kbytes.

Holy moly!

> I tried to change different timeout related constants at tftp.c but still
> the issue happens after 10s.
> It's like if, after these 10 seconds, the PHY lost the link or something.
> Really odd. Does it tell you anything?

LAN8720 phy, right? Try implementing something like [1], by clearing the 
EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY 
register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.

[1] 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+/b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

Best regards,
Marek Vasut
Troy Kisky July 13, 2013, 2:43 a.m. UTC | #15
On 7/11/2013 4:18 PM, Fabio Estevam wrote:
> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>> The MX28 multi-layer AHB bus can be too slow and trigger the
>> FEC DMA too early, before all the data hit the DRAM. This patch
>> ensures the data are written in the RAM before the DMA starts.
>> Please see the comment in the patch for full details.
>>
>> This patch was produced with an amazing help from Albert Aribaud,
>> who pointed out it can possibly be such a bus synchronisation
>> issue.
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>> Cc: Stefano Babic <sbabic@denx.de>
> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> single timeout.
>
> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
>
Perhaps this because our memory barriers are lacking.

Linux has this code
asm/io.h:#define writel(v,c)            ({ __iowmb(); 
writel_relaxed(v,c); })

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
asm/io.h-#include <asm/barrier.h>
asm/io.h-#define __iormb()              rmb()
asm/io.h:#define __iowmb()              wmb()
asm/io.h-#else
asm/io.h-#define __iormb()              do { } while (0)
asm/io.h:#define __iowmb()              do { } while (0)
asm/io.h-#endif

asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
asm/io.h-#include <asm/barrier.h>
asm/io.h-#define __iormb()              rmb()
asm/io.h:#define __iowmb()              wmb()
asm/io.h-#else
asm/io.h-#define __iormb()              do { } while (0)
asm/io.h:#define __iowmb()              do { } while (0)
asm/io.h-#endif

asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || 
defined(CONFIG_SMP)
asm/barrier.h-#define mb()              do { dsb(); outer_sync(); } 
while (0)
asm/barrier.h-#define rmb()             dsb()
asm/barrier.h:#define wmb()             mb()
asm/barrier.h-#else
asm/barrier.h-#define mb()              barrier()
asm/barrier.h-#define rmb()             barrier()
asm/barrier.h:#define wmb()             barrier()
asm/barrier.h-#endif

asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7
asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
c5, 4" \
asm/barrier.h-                              : : "r" (0) : "memory")
asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
c10, 4" \
asm/barrier.h-                              : : "r" (0) : "memory")
asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
c10, 5" \
Hector Palacios July 15, 2013, 8:58 a.m. UTC | #16
Hi Marek,

On 07/12/2013 06:48 PM, Marek Vasut wrote:
>>  [...]
 >>
>> but I found something:
>> It is very strange that the timeouts appear always after transferring
>> between 20 and 24 MiB. So I thought maybe it was not an issue with the
>> size of the file or the number of packets received, but instead a timed
>> issue (an issue that happens after some period of time). I checked, and in
>> fact the timeouts occur exactly 10 seconds after running the tftp command.
>> I verified that this is what is happening by adding a udelay(100000) at
>> fec_send(). In this case, the timeout also occurs after 10 seconds, but
>> due to the delay, I have transferred only a few Kbytes.
>
> Holy moly!
>
>> I tried to change different timeout related constants at tftp.c but still
>> the issue happens after 10s.
>> It's like if, after these 10 seconds, the PHY lost the link or something.
>> Really odd. Does it tell you anything?
>
> LAN8720 phy, right? Try implementing something like [1], by clearing the
> EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a simple PHY
> register RMW which you can stick somewhere into the PHY net/phy/smsc.c code.
>
> [1]
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+/b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0

No, my PHY is a Micrel KSZ8031RNLI.

The hint about the PHY possibly going to power down mode is interesting but I checked 
the PHY registers and EDPD mode (Energy Detect Power Down) is off, at least before 
running the tftp command. Power Down mode is off too, so unless these are somehow 
enabled during the TFTP process, this is not what's happening.

The sniffer shows that the TFTP server simply stops sending data packets. I can see 
however the target sending several times the ACK packet to the last received data 
packet. This would point to the TFTP server (as Albert suggested), but the fact is the 
problem occurs with different TFTP servers (I tried three different servers) and it 
does not happen with an old v2009 U-Boot using the same target.

Many times, though not always, after the timeout occurs, I cancel with CTRL+C and run 
the tftp command again, and then the file downloads completely.
The PHY is my usual suspect...

Best regards,
--
Hector Palacios
Marek Vasut July 15, 2013, 12:30 p.m. UTC | #17
Dear Hector Palacios,

> Hi Marek,
> 
> On 07/12/2013 06:48 PM, Marek Vasut wrote:
> >>  [...]
> >> 
> >> but I found something:
> >> It is very strange that the timeouts appear always after transferring
> >> between 20 and 24 MiB. So I thought maybe it was not an issue with the
> >> size of the file or the number of packets received, but instead a timed
> >> issue (an issue that happens after some period of time). I checked, and
> >> in fact the timeouts occur exactly 10 seconds after running the tftp
> >> command. I verified that this is what is happening by adding a
> >> udelay(100000) at fec_send(). In this case, the timeout also occurs
> >> after 10 seconds, but due to the delay, I have transferred only a few
> >> Kbytes.
> > 
> > Holy moly!
> > 
> >> I tried to change different timeout related constants at tftp.c but
> >> still the issue happens after 10s.
> >> It's like if, after these 10 seconds, the PHY lost the link or
> >> something. Really odd. Does it tell you anything?
> > 
> > LAN8720 phy, right? Try implementing something like [1], by clearing the
> > EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a
> > simple PHY register RMW which you can stick somewhere into the PHY
> > net/phy/smsc.c code.
> > 
> > [1]
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+
> > /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0
> 
> No, my PHY is a Micrel KSZ8031RNLI.
> 
> The hint about the PHY possibly going to power down mode is interesting but
> I checked the PHY registers and EDPD mode (Energy Detect Power Down) is
> off, at least before running the tftp command. Power Down mode is off too,
> so unless these are somehow enabled during the TFTP process, this is not
> what's happening.

OK, makes sense.

> The sniffer shows that the TFTP server simply stops sending data packets. I
> can see however the target sending several times the ACK packet to the
> last received data packet. This would point to the TFTP server (as Albert
> suggested), but the fact is the problem occurs with different TFTP servers
> (I tried three different servers) and it does not happen with an old v2009
> U-Boot using the same target.

Can you try running "dcache off" command before running the TFTP transfer? Does 
it still behave like this?

You might need to define #define CONFIG_CMD_CACHE for this to work.

> Many times, though not always, after the timeout occurs, I cancel with
> CTRL+C and run the tftp command again, and then the file downloads
> completely.
> The PHY is my usual suspect...

Dang :-(

Best regards,
Marek Vasut
Albert ARIBAUD July 15, 2013, 1:41 p.m. UTC | #18
Hi Troy,

On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky
<troy.kisky@boundarydevices.com> wrote:

> On 7/11/2013 4:18 PM, Fabio Estevam wrote:
> > On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> >> The MX28 multi-layer AHB bus can be too slow and trigger the
> >> FEC DMA too early, before all the data hit the DRAM. This patch
> >> ensures the data are written in the RAM before the DMA starts.
> >> Please see the comment in the patch for full details.
> >>
> >> This patch was produced with an amazing help from Albert Aribaud,
> >> who pointed out it can possibly be such a bus synchronisation
> >> issue.
> >>
> >> Signed-off-by: Marek Vasut <marex@denx.de>
> >> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> >> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> >> Cc: Stefano Babic <sbabic@denx.de>
> > Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> > single timeout.
> >
> > Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> > _______________________________________________
> > U-Boot mailing list
> > U-Boot@lists.denx.de
> > http://lists.denx.de/mailman/listinfo/u-boot
> >
> Perhaps this because our memory barriers are lacking.
> 
> Linux has this code
> asm/io.h:#define writel(v,c)            ({ __iowmb(); 
> writel_relaxed(v,c); })
> 
> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> asm/io.h-#include <asm/barrier.h>
> asm/io.h-#define __iormb()              rmb()
> asm/io.h:#define __iowmb()              wmb()
> asm/io.h-#else
> asm/io.h-#define __iormb()              do { } while (0)
> asm/io.h:#define __iowmb()              do { } while (0)
> asm/io.h-#endif
> 
> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> asm/io.h-#include <asm/barrier.h>
> asm/io.h-#define __iormb()              rmb()
> asm/io.h:#define __iowmb()              wmb()
> asm/io.h-#else
> asm/io.h-#define __iormb()              do { } while (0)
> asm/io.h:#define __iowmb()              do { } while (0)
> asm/io.h-#endif
> 
> asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || 
> defined(CONFIG_SMP)
> asm/barrier.h-#define mb()              do { dsb(); outer_sync(); } 
> while (0)
> asm/barrier.h-#define rmb()             dsb()
> asm/barrier.h:#define wmb()             mb()
> asm/barrier.h-#else
> asm/barrier.h-#define mb()              barrier()
> asm/barrier.h-#define rmb()             barrier()
> asm/barrier.h:#define wmb()             barrier()
> asm/barrier.h-#endif
> 
> asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7
> asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
> asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
> asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
> asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
> asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
> c5, 4" \
> asm/barrier.h-                              : : "r" (0) : "memory")
> asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
> c10, 4" \
> asm/barrier.h-                              : : "r" (0) : "memory")
> asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
> c10, 5" \
> 
> _____________________________________
> Can you try just adding a dsb() instead of the dummy read?
> 
> If this also fixes your problem, it seems better to fix our writel macro

Not sure I understand who you are answering to exactly, as Fabio
indicates his problem is solved.

Besides, Marek and I had in fact investigated barriers, adding some as
I did in times past in mvgbe.c, and fiddling with the one already in
dcache_flush_range(). None of this had any effect.

However, if our barriers are lacking, then they may fail us somehow on
other occasions, so best is if we analyze the failings. Can you clarify
in what respect exactly they are lacking? For instance, are all our
barriers lacking, or only some, and which ones?

Amicalement,
Hector Palacios July 15, 2013, 3:09 p.m. UTC | #19
Hi Marek,

On 07/15/2013 02:30 PM, Marek Vasut wrote:
> Dear Hector Palacios,
>
>> Hi Marek,
>>
>> On 07/12/2013 06:48 PM, Marek Vasut wrote:
>>>>   [...]
>>>>
>>>> but I found something:
>>>> It is very strange that the timeouts appear always after transferring
>>>> between 20 and 24 MiB. So I thought maybe it was not an issue with the
>>>> size of the file or the number of packets received, but instead a timed
>>>> issue (an issue that happens after some period of time). I checked, and
>>>> in fact the timeouts occur exactly 10 seconds after running the tftp
>>>> command. I verified that this is what is happening by adding a
>>>> udelay(100000) at fec_send(). In this case, the timeout also occurs
>>>> after 10 seconds, but due to the delay, I have transferred only a few
>>>> Kbytes.
>>>
>>> Holy moly!
>>>
>>>> I tried to change different timeout related constants at tftp.c but
>>>> still the issue happens after 10s.
>>>> It's like if, after these 10 seconds, the PHY lost the link or
>>>> something. Really odd. Does it tell you anything?
>>>
>>> LAN8720 phy, right? Try implementing something like [1], by clearing the
>>> EDPWRDOWN bit , the PHY will never enter low-power mode. It's just a
>>> simple PHY register RMW which you can stick somewhere into the PHY
>>> net/phy/smsc.c code.
>>>
>>> [1]
>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine/+
>>> /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0
>>
>> No, my PHY is a Micrel KSZ8031RNLI.
>>
>> The hint about the PHY possibly going to power down mode is interesting but
>> I checked the PHY registers and EDPD mode (Energy Detect Power Down) is
>> off, at least before running the tftp command. Power Down mode is off too,
>> so unless these are somehow enabled during the TFTP process, this is not
>> what's happening.
>
> OK, makes sense.
>
>> The sniffer shows that the TFTP server simply stops sending data packets. I
>> can see however the target sending several times the ACK packet to the
>> last received data packet. This would point to the TFTP server (as Albert
>> suggested), but the fact is the problem occurs with different TFTP servers
>> (I tried three different servers) and it does not happen with an old v2009
>> U-Boot using the same target.
>
> Can you try running "dcache off" command before running the TFTP transfer? Does
> it still behave like this?
>
> You might need to define #define CONFIG_CMD_CACHE for this to work.

Sourcery!
It's not that it works with dcache off, I found something even more strange:
The way I reproduce this issue is by setting the 'bootcmd' to 'dboot ${loadaddr} 
file100M'. When you set the 'bootcmd' like this:

	setenv bootcmd tftp ${loadaddr} file100M

this eventually expands to

	bootcmd=tftp 0x42000000 file100M

So this is the command that runs automatically after the bootdelay.
I just discovered that if instead of letting the auto boot run, I press a key to stop 
the auto boot and run the command by hand (either running 'boot' or typing the command 
'tftp 0x42000000 file100M'), the tftp transfer works perfectly.
On the other hand, if I do the same but use the environment variable ${loadaddr}, i.e. 
'tftp ${loadaddr} file100M'. It will stop after 10 seconds.

And to make things funnier I just reproduced this issue on the MX28EVK using the imx 
U-Boot custodian tree at commit a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That 
discards being a platform issue.
@Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in your EVK?
If it doesn't fail, could you try running it again after playing with the environment 
(setting/printing some variables).
As I said, this issue appeared with different TFTP servers and is independent of 
whether the dcache is or not enabled.

Best regards,
--
Hector Palacios
Marek Vasut July 15, 2013, 3:12 p.m. UTC | #20
Hi Hector,

> Hi Marek,
> 
> On 07/15/2013 02:30 PM, Marek Vasut wrote:
> > Dear Hector Palacios,
> > 
> >> Hi Marek,
> >> 
> >> On 07/12/2013 06:48 PM, Marek Vasut wrote:
> >>>>   [...]
> >>>> 
> >>>> but I found something:
> >>>> It is very strange that the timeouts appear always after transferring
> >>>> between 20 and 24 MiB. So I thought maybe it was not an issue with the
> >>>> size of the file or the number of packets received, but instead a
> >>>> timed issue (an issue that happens after some period of time). I
> >>>> checked, and in fact the timeouts occur exactly 10 seconds after
> >>>> running the tftp command. I verified that this is what is happening
> >>>> by adding a udelay(100000) at fec_send(). In this case, the timeout
> >>>> also occurs after 10 seconds, but due to the delay, I have
> >>>> transferred only a few Kbytes.
> >>> 
> >>> Holy moly!
> >>> 
> >>>> I tried to change different timeout related constants at tftp.c but
> >>>> still the issue happens after 10s.
> >>>> It's like if, after these 10 seconds, the PHY lost the link or
> >>>> something. Really odd. Does it tell you anything?
> >>> 
> >>> LAN8720 phy, right? Try implementing something like [1], by clearing
> >>> the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just
> >>> a simple PHY register RMW which you can stick somewhere into the PHY
> >>> net/phy/smsc.c code.
> >>> 
> >>> [1]
> >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine
> >>> /+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0
> >> 
> >> No, my PHY is a Micrel KSZ8031RNLI.
> >> 
> >> The hint about the PHY possibly going to power down mode is interesting
> >> but I checked the PHY registers and EDPD mode (Energy Detect Power
> >> Down) is off, at least before running the tftp command. Power Down mode
> >> is off too, so unless these are somehow enabled during the TFTP
> >> process, this is not what's happening.
> > 
> > OK, makes sense.
> > 
> >> The sniffer shows that the TFTP server simply stops sending data
> >> packets. I can see however the target sending several times the ACK
> >> packet to the last received data packet. This would point to the TFTP
> >> server (as Albert suggested), but the fact is the problem occurs with
> >> different TFTP servers (I tried three different servers) and it does
> >> not happen with an old v2009 U-Boot using the same target.
> > 
> > Can you try running "dcache off" command before running the TFTP
> > transfer? Does it still behave like this?
> > 
> > You might need to define #define CONFIG_CMD_CACHE for this to work.
> 
> Sourcery!
> It's not that it works with dcache off, I found something even more
> strange: The way I reproduce this issue is by setting the 'bootcmd' to
> 'dboot ${loadaddr} file100M'. When you set the 'bootcmd' like this:
> 
> 	setenv bootcmd tftp ${loadaddr} file100M
> 
> this eventually expands to
> 
> 	bootcmd=tftp 0x42000000 file100M
> 
> So this is the command that runs automatically after the bootdelay.
> I just discovered that if instead of letting the auto boot run, I press a
> key to stop the auto boot and run the command by hand (either running
> 'boot' or typing the command 'tftp 0x42000000 file100M'), the tftp
> transfer works perfectly.
> On the other hand, if I do the same but use the environment variable
> ${loadaddr}, i.e. 'tftp ${loadaddr} file100M'. It will stop after 10
> seconds.

Count it be your hardware needs some more delay to stabilize?

> And to make things funnier I just reproduced this issue on the MX28EVK
> using the imx U-Boot custodian tree at commit
> a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That discards being a platform
> issue.

It still might be a PHY issue, no?

> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'

I guess that'd be a 100MB file?

> in
> your EVK? If it doesn't fail, could you try running it again after playing
> with the environment (setting/printing some variables).
> As I said, this issue appeared with different TFTP servers and is
> independent of whether the dcache is or not enabled.
> 
> Best regards,
> --
> Hector Palacios

Best regards,
Marek Vasut
Hector Palacios July 15, 2013, 3:24 p.m. UTC | #21
On 07/15/2013 05:12 PM, Marek Vasut wrote:
> Hi Hector,
>
>> Hi Marek,
>>
>> On 07/15/2013 02:30 PM, Marek Vasut wrote:
>>> Dear Hector Palacios,
>>>
>>>> Hi Marek,
>>>>
>>>> On 07/12/2013 06:48 PM, Marek Vasut wrote:
>>>>>>    [...]
>>>>>>
>>>>>> but I found something:
>>>>>> It is very strange that the timeouts appear always after transferring
>>>>>> between 20 and 24 MiB. So I thought maybe it was not an issue with the
>>>>>> size of the file or the number of packets received, but instead a
>>>>>> timed issue (an issue that happens after some period of time). I
>>>>>> checked, and in fact the timeouts occur exactly 10 seconds after
>>>>>> running the tftp command. I verified that this is what is happening
>>>>>> by adding a udelay(100000) at fec_send(). In this case, the timeout
>>>>>> also occurs after 10 seconds, but due to the delay, I have
>>>>>> transferred only a few Kbytes.
>>>>>
>>>>> Holy moly!
>>>>>
>>>>>> I tried to change different timeout related constants at tftp.c but
>>>>>> still the issue happens after 10s.
>>>>>> It's like if, after these 10 seconds, the PHY lost the link or
>>>>>> something. Really odd. Does it tell you anything?
>>>>>
>>>>> LAN8720 phy, right? Try implementing something like [1], by clearing
>>>>> the EDPWRDOWN bit , the PHY will never enter low-power mode. It's just
>>>>> a simple PHY register RMW which you can stick somewhere into the PHY
>>>>> net/phy/smsc.c code.
>>>>>
>>>>> [1]
>>>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/djbw/dmaengine
>>>>> /+ /b629820d18fa65cc598390e4b9712fd5f83ee693%5E!/#F0
>>>>
>>>> No, my PHY is a Micrel KSZ8031RNLI.
>>>>
>>>> The hint about the PHY possibly going to power down mode is interesting
>>>> but I checked the PHY registers and EDPD mode (Energy Detect Power
>>>> Down) is off, at least before running the tftp command. Power Down mode
>>>> is off too, so unless these are somehow enabled during the TFTP
>>>> process, this is not what's happening.
>>>
>>> OK, makes sense.
>>>
>>>> The sniffer shows that the TFTP server simply stops sending data
>>>> packets. I can see however the target sending several times the ACK
>>>> packet to the last received data packet. This would point to the TFTP
>>>> server (as Albert suggested), but the fact is the problem occurs with
>>>> different TFTP servers (I tried three different servers) and it does
>>>> not happen with an old v2009 U-Boot using the same target.
>>>
>>> Can you try running "dcache off" command before running the TFTP
>>> transfer? Does it still behave like this?
>>>
>>> You might need to define #define CONFIG_CMD_CACHE for this to work.
>>
>> Sourcery!
>> It's not that it works with dcache off, I found something even more
>> strange: The way I reproduce this issue is by setting the 'bootcmd' to
>> 'dboot ${loadaddr} file100M'. When you set the 'bootcmd' like this:
>>
>> 	setenv bootcmd tftp ${loadaddr} file100M
>>
>> this eventually expands to
>>
>> 	bootcmd=tftp 0x42000000 file100M
>>
>> So this is the command that runs automatically after the bootdelay.
>> I just discovered that if instead of letting the auto boot run, I press a
>> key to stop the auto boot and run the command by hand (either running
>> 'boot' or typing the command 'tftp 0x42000000 file100M'), the tftp
>> transfer works perfectly.
>> On the other hand, if I do the same but use the environment variable
>> ${loadaddr}, i.e. 'tftp ${loadaddr} file100M'. It will stop after 10
>> seconds.
>
> Count it be your hardware needs some more delay to stabilize?

It's not a problem of running the command later. If I run the command later, manually, 
but use ${loadaddr} instead of the hardcoded value 0x42000000, then it will fail. It's 
like if accessing the environment for grabbing the value of ${loadaddr} or for 
grabbing the value of ${bootcmd}, made the problem appear, while if I run a manual 
command that does not require accesing the environment, it works.

>> And to make things funnier I just reproduced this issue on the MX28EVK
>> using the imx U-Boot custodian tree at commit
>> a3f170cdbf7ae0bd24c94c2f46725699bbd69f05. That discards being a platform
>> issue.
>
> It still might be a PHY issue, no?

I guess it might, but the MX28EVK uses a different PHY. An SMSC, I believe.

>> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'
>
> I guess that'd be a 100MB file?

Yes. The size is not that important as far as it takes more than 10 seconds to 
download. The keypoint here is using the environment variable, rather than an hex value.

Best regards,
--
Hector Palacios
Troy Kisky July 15, 2013, 5:39 p.m. UTC | #22
On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:
> Hi Troy,
>
> On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky
> <troy.kisky@boundarydevices.com> wrote:
>
>> On 7/11/2013 4:18 PM, Fabio Estevam wrote:
>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
>>>> FEC DMA too early, before all the data hit the DRAM. This patch
>>>> ensures the data are written in the RAM before the DMA starts.
>>>> Please see the comment in the patch for full details.
>>>>
>>>> This patch was produced with an amazing help from Albert Aribaud,
>>>> who pointed out it can possibly be such a bus synchronisation
>>>> issue.
>>>>
>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>>>> Cc: Stefano Babic <sbabic@denx.de>
>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
>>> single timeout.
>>>
>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
>>> _______________________________________________
>>> U-Boot mailing list
>>> U-Boot@lists.denx.de
>>> http://lists.denx.de/mailman/listinfo/u-boot
>>>
>> Perhaps this because our memory barriers are lacking.
>>
>> Linux has this code
>> asm/io.h:#define writel(v,c)            ({ __iowmb();
>> writel_relaxed(v,c); })
>>
>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
>> asm/io.h-#include <asm/barrier.h>
>> asm/io.h-#define __iormb()              rmb()
>> asm/io.h:#define __iowmb()              wmb()
>> asm/io.h-#else
>> asm/io.h-#define __iormb()              do { } while (0)
>> asm/io.h:#define __iowmb()              do { } while (0)
>> asm/io.h-#endif
>>
>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
>> asm/io.h-#include <asm/barrier.h>
>> asm/io.h-#define __iormb()              rmb()
>> asm/io.h:#define __iowmb()              wmb()
>> asm/io.h-#else
>> asm/io.h-#define __iormb()              do { } while (0)
>> asm/io.h:#define __iowmb()              do { } while (0)
>> asm/io.h-#endif
>>
>> asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) ||
>> defined(CONFIG_SMP)
>> asm/barrier.h-#define mb()              do { dsb(); outer_sync(); }
>> while (0)
>> asm/barrier.h-#define rmb()             dsb()
>> asm/barrier.h:#define wmb()             mb()
>> asm/barrier.h-#else
>> asm/barrier.h-#define mb()              barrier()
>> asm/barrier.h-#define rmb()             barrier()
>> asm/barrier.h:#define wmb()             barrier()
>> asm/barrier.h-#endif
>>
>> asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7
>> asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
>> asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
>> asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>> c5, 4" \
>> asm/barrier.h-                              : : "r" (0) : "memory")
>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>> c10, 4" \
>> asm/barrier.h-                              : : "r" (0) : "memory")
>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>> c10, 5" \
>>
>> _____________________________________
>> Can you try just adding a dsb() instead of the dummy read?
>>
>> If this also fixes your problem, it seems better to fix our writel macro
> Not sure I understand who you are answering to exactly, as Fabio
> indicates his problem is solved.
>
> Besides, Marek and I had in fact investigated barriers, adding some as
> I did in times past in mvgbe.c, and fiddling with the one already in
> dcache_flush_range(). None of this had any effect.

You tried adding a  dsb()  to dcache_flush_range()?
That should have fixed the problem as well.

>
> However, if our barriers are lacking, then they may fail us somehow on
> other occasions, so best is if we analyze the failings. Can you clarify
> in what respect exactly they are lacking? For instance, are all our
> barriers lacking, or only some, and which ones?
>
> Amicalement,

Linux has

Kconfig:config ARM_DMA_MEM_BUFFERABLE
Kconfig-        bool "Use non-cacheable memory for DMA" if (CPU_V6 || 
CPU_V6K) && !CPU_V7
Kconfig-        depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP 
|| \
Kconfig-                     MACH_REALVIEW_PB11MP)
Kconfig-        default y if CPU_V6 || CPU_V6K || CPU_V7
Kconfig-        help


So, if this symbol is y then all writel/readl will be preceded by a 
dsb() as shown above.

However, u-boot probably uses cacheable memory for dma, so maybe u-boot 
is also correct with

asm/io.h-#define dmb()          __asm__ __volatile__ ("" : : : "memory")
asm/io.h-#define __iormb()    dmb()
asm/io.h:#define __iowmb()    dmb()


and no dsb(), but perhaps flush_dcache still needs one at the end.


Troy
Troy Kisky July 15, 2013, 7:59 p.m. UTC | #23
On 7/15/2013 10:39 AM, Troy Kisky wrote:
> On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:
>> Hi Troy,
>>
>> On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky
>> <troy.kisky@boundarydevices.com> wrote:
>>
>>> On 7/11/2013 4:18 PM, Fabio Estevam wrote:
>>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
>>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
>>>>> FEC DMA too early, before all the data hit the DRAM. This patch
>>>>> ensures the data are written in the RAM before the DMA starts.
>>>>> Please see the comment in the patch for full details.
>>>>>
>>>>> This patch was produced with an amazing help from Albert Aribaud,
>>>>> who pointed out it can possibly be such a bus synchronisation
>>>>> issue.
>>>>>
>>>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
>>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
>>>>> Cc: Stefano Babic <sbabic@denx.de>
>>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
>>>> single timeout.
>>>>
>>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
>>>> _______________________________________________
>>>> U-Boot mailing list
>>>> U-Boot@lists.denx.de
>>>> http://lists.denx.de/mailman/listinfo/u-boot
>>>>
>>> Perhaps this because our memory barriers are lacking.
>>>
>>> Linux has this code
>>> asm/io.h:#define writel(v,c)            ({ __iowmb();
>>> writel_relaxed(v,c); })
>>>
>>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
>>> asm/io.h-#include <asm/barrier.h>
>>> asm/io.h-#define __iormb()              rmb()
>>> asm/io.h:#define __iowmb()              wmb()
>>> asm/io.h-#else
>>> asm/io.h-#define __iormb()              do { } while (0)
>>> asm/io.h:#define __iowmb()              do { } while (0)
>>> asm/io.h-#endif
>>>
>>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
>>> asm/io.h-#include <asm/barrier.h>
>>> asm/io.h-#define __iormb()              rmb()
>>> asm/io.h:#define __iowmb()              wmb()
>>> asm/io.h-#else
>>> asm/io.h-#define __iormb()              do { } while (0)
>>> asm/io.h:#define __iowmb()              do { } while (0)
>>> asm/io.h-#endif
>>>
>>> asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) ||
>>> defined(CONFIG_SMP)
>>> asm/barrier.h-#define mb()              do { dsb(); outer_sync(); }
>>> while (0)
>>> asm/barrier.h-#define rmb()             dsb()
>>> asm/barrier.h:#define wmb()             mb()
>>> asm/barrier.h-#else
>>> asm/barrier.h-#define mb()              barrier()
>>> asm/barrier.h-#define rmb()             barrier()
>>> asm/barrier.h:#define wmb()             barrier()
>>> asm/barrier.h-#endif
>>>
>>> asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7
>>> asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
>>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
>>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
>>> asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
>>> asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>>> c5, 4" \
>>> asm/barrier.h-                              : : "r" (0) : "memory")
>>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>>> c10, 4" \
>>> asm/barrier.h-                              : : "r" (0) : "memory")
>>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
>>> c10, 5" \
>>>
>>> _____________________________________
>>> Can you try just adding a dsb() instead of the dummy read?
>>>
>>> If this also fixes your problem, it seems better to fix our writel 
>>> macro
>> Not sure I understand who you are answering to exactly, as Fabio
>> indicates his problem is solved.
>>
>> Besides, Marek and I had in fact investigated barriers, adding some as
>> I did in times past in mvgbe.c, and fiddling with the one already in
>> dcache_flush_range(). None of this had any effect.
>
> You tried adding a  dsb()  to dcache_flush_range()?
> That should have fixed the problem as well.
>
>>
>> However, if our barriers are lacking, then they may fail us somehow on
>> other occasions, so best is if we analyze the failings. Can you clarify
>> in what respect exactly they are lacking? For instance, are all our
>> barriers lacking, or only some, and which ones?
>>
>> Amicalement,
>
> Linux has
>
> Kconfig:config ARM_DMA_MEM_BUFFERABLE
> Kconfig-        bool "Use non-cacheable memory for DMA" if (CPU_V6 || 
> CPU_V6K) && !CPU_V7
> Kconfig-        depends on !(MACH_REALVIEW_PB1176 || 
> REALVIEW_EB_ARM11MP || \
> Kconfig-                     MACH_REALVIEW_PB11MP)
> Kconfig-        default y if CPU_V6 || CPU_V6K || CPU_V7
> Kconfig-        help
>
>
> So, if this symbol is y then all writel/readl will be preceded by a 
> dsb() as shown above.
>
> However, u-boot probably uses cacheable memory for dma, so maybe 
> u-boot is also correct with
>
> asm/io.h-#define dmb()          __asm__ __volatile__ ("" : : : "memory")
> asm/io.h-#define __iormb()    dmb()
> asm/io.h:#define __iowmb()    dmb()
>
>
> and no dsb(), but perhaps flush_dcache still needs one at the end.
>
>
> Troy
>

for armv7, flush dcache does have a dsb.

//u-boot
#define CP15DSB asm volatile ("mcr     p15, 0, %0, c7, c10, 4" : : "r" (0))

//linux
asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
c10, 4"  : : "r" (0) : "memory")



Don't know why I didn't see that before.
So, I don't know why that wasn't good enough.

Maybe  CONFIG_SYS_DCACHE_OFF was set and

void flush_dcache_range(unsigned long start, unsigned long stop)
{
}

Needs a dsb too???


Troy
Albert ARIBAUD July 15, 2013, 8:20 p.m. UTC | #24
Hi Troy,

On Mon, 15 Jul 2013 12:59:56 -0700, Troy Kisky
<troy.kisky@boundarydevices.com> wrote:

> On 7/15/2013 10:39 AM, Troy Kisky wrote:
> > On 7/15/2013 6:41 AM, Albert ARIBAUD wrote:
> >> Hi Troy,
> >>
> >> On Fri, 12 Jul 2013 19:43:07 -0700, Troy Kisky
> >> <troy.kisky@boundarydevices.com> wrote:
> >>
> >>> On 7/11/2013 4:18 PM, Fabio Estevam wrote:
> >>>> On Thu, Jul 11, 2013 at 8:03 PM, Marek Vasut <marex@denx.de> wrote:
> >>>>> The MX28 multi-layer AHB bus can be too slow and trigger the
> >>>>> FEC DMA too early, before all the data hit the DRAM. This patch
> >>>>> ensures the data are written in the RAM before the DMA starts.
> >>>>> Please see the comment in the patch for full details.
> >>>>>
> >>>>> This patch was produced with an amazing help from Albert Aribaud,
> >>>>> who pointed out it can possibly be such a bus synchronisation
> >>>>> issue.
> >>>>>
> >>>>> Signed-off-by: Marek Vasut <marex@denx.de>
> >>>>> Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
> >>>>> Cc: Fabio Estevam <fabio.estevam@freescale.com>
> >>>>> Cc: Stefano Babic <sbabic@denx.de>
> >>>> Excellent, managed to transfer 90MB via TFTP on mx28evk without a
> >>>> single timeout.
> >>>>
> >>>> Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
> >>>> _______________________________________________
> >>>> U-Boot mailing list
> >>>> U-Boot@lists.denx.de
> >>>> http://lists.denx.de/mailman/listinfo/u-boot
> >>>>
> >>> Perhaps this because our memory barriers are lacking.
> >>>
> >>> Linux has this code
> >>> asm/io.h:#define writel(v,c)            ({ __iowmb();
> >>> writel_relaxed(v,c); })
> >>>
> >>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> >>> asm/io.h-#include <asm/barrier.h>
> >>> asm/io.h-#define __iormb()              rmb()
> >>> asm/io.h:#define __iowmb()              wmb()
> >>> asm/io.h-#else
> >>> asm/io.h-#define __iormb()              do { } while (0)
> >>> asm/io.h:#define __iowmb()              do { } while (0)
> >>> asm/io.h-#endif
> >>>
> >>> asm/io.h-#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> >>> asm/io.h-#include <asm/barrier.h>
> >>> asm/io.h-#define __iormb()              rmb()
> >>> asm/io.h:#define __iowmb()              wmb()
> >>> asm/io.h-#else
> >>> asm/io.h-#define __iormb()              do { } while (0)
> >>> asm/io.h:#define __iowmb()              do { } while (0)
> >>> asm/io.h-#endif
> >>>
> >>> asm/barrier.h-#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) ||
> >>> defined(CONFIG_SMP)
> >>> asm/barrier.h-#define mb()              do { dsb(); outer_sync(); }
> >>> while (0)
> >>> asm/barrier.h-#define rmb()             dsb()
> >>> asm/barrier.h:#define wmb()             mb()
> >>> asm/barrier.h-#else
> >>> asm/barrier.h-#define mb()              barrier()
> >>> asm/barrier.h-#define rmb()             barrier()
> >>> asm/barrier.h:#define wmb()             barrier()
> >>> asm/barrier.h-#endif
> >>>
> >>> asm/barrier.h-#if __LINUX_ARM_ARCH__ >= 7
> >>> asm/barrier.h-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
> >>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
> >>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
> >>> asm/barrier.h-#elif defined(CONFIG_CPU_XSC3) || __LINUX_ARM_ARCH__ == 6
> >>> asm/barrier.h-#define isb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
> >>> c5, 4" \
> >>> asm/barrier.h-                              : : "r" (0) : "memory")
> >>> asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
> >>> c10, 4" \
> >>> asm/barrier.h-                              : : "r" (0) : "memory")
> >>> asm/barrier.h-#define dmb() __asm__ __volatile__ ("mcr p15, 0, %0, c7,
> >>> c10, 5" \
> >>>
> >>> _____________________________________
> >>> Can you try just adding a dsb() instead of the dummy read?
> >>>
> >>> If this also fixes your problem, it seems better to fix our writel 
> >>> macro
> >> Not sure I understand who you are answering to exactly, as Fabio
> >> indicates his problem is solved.
> >>
> >> Besides, Marek and I had in fact investigated barriers, adding some as
> >> I did in times past in mvgbe.c, and fiddling with the one already in
> >> dcache_flush_range(). None of this had any effect.
> >
> > You tried adding a  dsb()  to dcache_flush_range()?
> > That should have fixed the problem as well.
> >
> >>
> >> However, if our barriers are lacking, then they may fail us somehow on
> >> other occasions, so best is if we analyze the failings. Can you clarify
> >> in what respect exactly they are lacking? For instance, are all our
> >> barriers lacking, or only some, and which ones?
> >>
> >> Amicalement,
> >
> > Linux has
> >
> > Kconfig:config ARM_DMA_MEM_BUFFERABLE
> > Kconfig-        bool "Use non-cacheable memory for DMA" if (CPU_V6 || 
> > CPU_V6K) && !CPU_V7
> > Kconfig-        depends on !(MACH_REALVIEW_PB1176 || 
> > REALVIEW_EB_ARM11MP || \
> > Kconfig-                     MACH_REALVIEW_PB11MP)
> > Kconfig-        default y if CPU_V6 || CPU_V6K || CPU_V7
> > Kconfig-        help
> >
> >
> > So, if this symbol is y then all writel/readl will be preceded by a 
> > dsb() as shown above.
> >
> > However, u-boot probably uses cacheable memory for dma, so maybe 
> > u-boot is also correct with
> >
> > asm/io.h-#define dmb()          __asm__ __volatile__ ("" : : : "memory")
> > asm/io.h-#define __iormb()    dmb()
> > asm/io.h:#define __iowmb()    dmb()
> >
> >
> > and no dsb(), but perhaps flush_dcache still needs one at the end.
> >
> >
> > Troy
> >
> 
> for armv7, flush dcache does have a dsb.
> 
> //u-boot
> #define CP15DSB asm volatile ("mcr     p15, 0, %0, c7, c10, 4" : : "r" (0))
> 
> //linux
> asm/barrier.h:#define dsb() __asm__ __volatile__ ("mcr p15, 0, %0, c7, 
> c10, 4"  : : "r" (0) : "memory")
> 
> 
> 
> Don't know why I didn't see that before.
> So, I don't know why that wasn't good enough.
> 
> Maybe  CONFIG_SYS_DCACHE_OFF was set and
> 
> void flush_dcache_range(unsigned long start, unsigned long stop)
> {
> }
> 
> Needs a dsb too???

I think everything is explained in Marek's commit message.

> Troy

Amicalement,
Albert ARIBAUD July 15, 2013, 8:20 p.m. UTC | #25
Hi Troy,

On Mon, 15 Jul 2013 10:39:54 -0700, Troy Kisky
<troy.kisky@boundarydevices.com> wrote:

> > Besides, Marek and I had in fact investigated barriers, adding some as
> > I did in times past in mvgbe.c, and fiddling with the one already in
> > dcache_flush_range(). None of this had any effect.
> 
> You tried adding a  dsb()  to dcache_flush_range()?
> That should have fixed the problem as well.

There already was a memory barrier -- the only one kind known bu
ARM926J-S, which is a write buffer(s) drain -- and no, it should not
have fixed the problem (and did not), because the issue is not about
pushing the transactions out of the CPU soon enough; it is about having
the EMI perform them before it passes our 'go' command to the ENET DMA.

> > However, if our barriers are lacking, then they may fail us somehow on
> > other occasions, so best is if we analyze the failings. Can you clarify
> > in what respect exactly they are lacking? For instance, are all our
> > barriers lacking, or only some, and which ones?
> >
> > Amicalement,
> 
> Linux has
> 
> Kconfig:config ARM_DMA_MEM_BUFFERABLE
> Kconfig-        bool "Use non-cacheable memory for DMA" if (CPU_V6 || 
> CPU_V6K) && !CPU_V7
> Kconfig-        depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP 
> || \
> Kconfig-                     MACH_REALVIEW_PB11MP)
> Kconfig-        default y if CPU_V6 || CPU_V6K || CPU_V7
> Kconfig-        help
> 
> 
> So, if this symbol is y then all writel/readl will be preceded by a 
> dsb() as shown above.

While I do understand what is done there, I do not see the interest
since 1) in our issue, the problem was not in any writel() or readl(),
and 2) U-Boot uses readl() and writel() for device memory ranges, which
are not cached at all and thus do not need any flush, invalidate or
barrier.

> However, u-boot probably uses cacheable memory for dma, so maybe u-boot 
> is also correct with

Actually, in the driver where the issue occurred, U-boot does use
cacheable memory for DMA; that's why the driver contains calls to
flush_dcache_range() and invalidate_dcache_range() on the descriptors...

> asm/io.h-#define dmb()          __asm__ __volatile__ ("" : : : "memory")
> asm/io.h-#define __iormb()    dmb()
> asm/io.h:#define __iowmb()    dmb()
> 
> and no dsb(), but perhaps flush_dcache still needs one at the end.

... but that's for descriptors, which are not accessed uwing writel()
or readl(); for device registers, such as ENET, we use writel() and
readl() but no cache.

So far we never had to use any data memory barrier on peripheral
accesses. The worst that happened AFAIK is when we needed instruction
barriers to prevent the compiler from mis-ordering device writes wrt
their source code order; and at that time, our readl() and writel()
implementations were not volatile. Today, I am pretty sure that these
instruction barriers are uneeded.

> Troy

Amicalement,
Troy Kisky July 15, 2013, 9:18 p.m. UTC | #26
On 7/15/2013 1:20 PM, Albert ARIBAUD wrote:
> Hi Troy,
>
> On Mon, 15 Jul 2013 10:39:54 -0700, Troy Kisky
> <troy.kisky@boundarydevices.com> wrote:
>
>>> Besides, Marek and I had in fact investigated barriers, adding some as
>>> I did in times past in mvgbe.c, and fiddling with the one already in
>>> dcache_flush_range(). None of this had any effect.
>> You tried adding a  dsb()  to dcache_flush_range()?
>> That should have fixed the problem as well.
> There already was a memory barrier -- the only one kind known bu
> ARM926J-S, which is a write buffer(s) drain -- and no, it should not
> have fixed the problem (and did not), because the issue is not about
> pushing the transactions out of the CPU soon enough; it is about having
> the EMI perform them before it passes our 'go' command to the ENET DMA.
>
Thanks for straightening me out. My back just popped a couple of times.

Troy
Fabio Estevam July 16, 2013, 3:51 a.m. UTC | #27
On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
<hector.palacios@digi.com> wrote:

> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in
> your EVK?

Yes, this is what I have been running since the beginning.

> If it doesn't fail, could you try running it again after playing with the
> environment (setting/printing some variables).

I can't reproduce the problem here.

> As I said, this issue appeared with different TFTP servers and is
> independent of whether the dcache is or not enabled.

Can you transfer from a PC to another PC via TFTP?
Fabio Estevam July 16, 2013, 4:18 a.m. UTC | #28
On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> wrote:
> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
> <hector.palacios@digi.com> wrote:
>
>> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M' in
>> your EVK?
>
> Yes, this is what I have been running since the beginning.
>
>> If it doesn't fail, could you try running it again after playing with the
>> environment (setting/printing some variables).
>
> I can't reproduce the problem here.
>
>> As I said, this issue appeared with different TFTP servers and is
>> independent of whether the dcache is or not enabled.
>
> Can you transfer from a PC to another PC via TFTP?

This is my tftp configuration for your reference:

$ cat /etc/xinetd.d/tftp
        service tftp
        {
        protocol        = udp
        port            = 69
        socket_type     = dgram
        wait            = yes
        user            = nobody
        server          = /usr/sbin/in.tftpd
        server_args     = /tftpboot
        disable         = no
        }
Marek Vasut July 16, 2013, 4:44 a.m. UTC | #29
Dear Fabio Estevam,

> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> wrote:
> > On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
> > 
> > <hector.palacios@digi.com> wrote:
> >> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'
> >> in your EVK?
> > 
> > Yes, this is what I have been running since the beginning.
> > 
> >> If it doesn't fail, could you try running it again after playing with
> >> the environment (setting/printing some variables).
> > 
> > I can't reproduce the problem here.
> > 
> >> As I said, this issue appeared with different TFTP servers and is
> >> independent of whether the dcache is or not enabled.
> > 
> > Can you transfer from a PC to another PC via TFTP?
> 
> This is my tftp configuration for your reference:
> 
> $ cat /etc/xinetd.d/tftp
>         service tftp
>         {
>         protocol        = udp
>         port            = 69
>         socket_type     = dgram
>         wait            = yes
>         user            = nobody
>         server          = /usr/sbin/in.tftpd
>         server_args     = /tftpboot
>         disable         = no
>         }

Another thing of interest would be a 'tcpdump' pcap capture of that connection.

Best regards,
Marek Vasut
Hector Palacios July 17, 2013, 3:55 p.m. UTC | #30
Dear Marek,

On 07/16/2013 06:44 AM, Marek Vasut wrote:
> Dear Fabio Estevam,
>
>> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> wrote:
>>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
>>>
>>> <hector.palacios@digi.com> wrote:
>>>> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'
>>>> in your EVK?
>>>
>>> Yes, this is what I have been running since the beginning.
>>>
>>>> If it doesn't fail, could you try running it again after playing with
>>>> the environment (setting/printing some variables).
>>>
>>> I can't reproduce the problem here.
>>>
>>>> As I said, this issue appeared with different TFTP servers and is
>>>> independent of whether the dcache is or not enabled.
>>>
>>> Can you transfer from a PC to another PC via TFTP?

Yes I can.

> Another thing of interest would be a 'tcpdump' pcap capture of that connection.

I was initially filtering out only TFTP packets of my wireshark trace and all looked 
correct. After taking a second look to the full trace I see now a hint.
Around 7 seconds after starting the TFTP transfer the server is sending an ARP to the 
target asking for the owner of the target's IP. The target is receiving this ARP and 
apparently responding (at least this is what my debug code shows as it gets into 
arp.c:ArpReceive(), case ARPOP_REQUEST and sending a packet), but this ARP reply from 
the target is not reaching the network. My sniffer does not capture this reply.

The server resends the ARP request twice more (seconds 8 and 9) to the target and 
since it doesn't get a reply then sends a broadcast ARP (seconds 10) asking who has 
that IP. Since nobody responds it stops sending data.

The times that it works (and I don't know the magic behind using a numeric address 
versus using ${loadaddr} when they have the same value), the ARP replies do reach the 
network and the server continues the transmission normally.

Using a v2009 U-Boot, the behaviour is exactly the same, but the target's ARP replies 
always reach the network, and the transfers always succeed.

Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

Best regards,
--
Hector Palacios
Marek Vasut July 18, 2013, 4:12 a.m. UTC | #31
Dear Hector Palacios,

> Dear Marek,
> 
> On 07/16/2013 06:44 AM, Marek Vasut wrote:
> > Dear Fabio Estevam,
> > 
> >> On Tue, Jul 16, 2013 at 12:51 AM, Fabio Estevam <festevam@gmail.com> wrote:
> >>> On Mon, Jul 15, 2013 at 12:09 PM, Hector Palacios
> >>> 
> >>> <hector.palacios@digi.com> wrote:
> >>>> @Fabio: could you manually run the command 'tftp ${loadaddr} file100M'
> >>>> in your EVK?
> >>> 
> >>> Yes, this is what I have been running since the beginning.
> >>> 
> >>>> If it doesn't fail, could you try running it again after playing with
> >>>> the environment (setting/printing some variables).
> >>> 
> >>> I can't reproduce the problem here.
> >>> 
> >>>> As I said, this issue appeared with different TFTP servers and is
> >>>> independent of whether the dcache is or not enabled.
> >>> 
> >>> Can you transfer from a PC to another PC via TFTP?
> 
> Yes I can.
> 
> > Another thing of interest would be a 'tcpdump' pcap capture of that
> > connection.
> 
> I was initially filtering out only TFTP packets of my wireshark trace and
> all looked correct. After taking a second look to the full trace I see now
> a hint. Around 7 seconds after starting the TFTP transfer the server is
> sending an ARP to the target asking for the owner of the target's IP. The
> target is receiving this ARP and apparently responding (at least this is
> what my debug code shows as it gets into arp.c:ArpReceive(), case
> ARPOP_REQUEST and sending a packet), but this ARP reply from the target is
> not reaching the network. My sniffer does not capture this reply.
> 
> The server resends the ARP request twice more (seconds 8 and 9) to the
> target and since it doesn't get a reply then sends a broadcast ARP
> (seconds 10) asking who has that IP. Since nobody responds it stops
> sending data.
> 
> The times that it works (and I don't know the magic behind using a numeric
> address versus using ${loadaddr} when they have the same value), the ARP
> replies do reach the network and the server continues the transmission
> normally.
> 
> Using a v2009 U-Boot, the behaviour is exactly the same, but the target's
> ARP replies always reach the network, and the transfers always succeed.
> 
> Since Fabio cannot reproduce it I guess it must be a local ghost. :o(

Try a 1:1 connection with a direct cable. Also, try multiple cables ;-)

Best regards,
Marek Vasut
diff mbox

Patch

diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
index 97bf8fe..ec5b9db 100644
--- a/drivers/net/fec_mxc.c
+++ b/drivers/net/fec_mxc.c
@@ -737,6 +737,28 @@  static int fec_send(struct eth_device *dev, void *packet, int length)
 	flush_dcache_range(addr, addr + size);
 
 	/*
+	 * Below we read the DMA descriptor's last four bytes back from the
+	 * DRAM. This is important in order to make sure that all WRITE
+	 * operations on the bus that were triggered by previous cache FLUSH
+	 * have completed.
+	 *
+	 * Otherwise, on MX28, it is possible to observe a corruption of the
+	 * DMA descriptors. Please refer to schematic "Figure 1-2" in MX28RM
+	 * for the bus structure of MX28. The scenario is as follows:
+	 *
+	 * 1) ARM core triggers a series of WRITEs on the AHB_ARB2 bus going
+	 *    to DRAM due to flush_dcache_range()
+	 * 2) ARM core writes the FEC registers via AHB_ARB2
+	 * 3) FEC DMA starts reading/writing from/to DRAM via AHB_ARB3
+	 *
+	 * Note that 2) does sometimes finish before 1) due to reordering of
+	 * WRITE accesses on the AHB bus, therefore triggering 3) before the
+	 * DMA descriptor is fully written into DRAM. This results in occasional
+	 * corruption of the DMA descriptor.
+	 */
+	readl(addr + size - 4);
+
+	/*
 	 * Enable SmartDMA transmit task
 	 */
 	fec_tx_task_enable(fec);