diff mbox

[1/7] net: ethernet: ti: cpdma: am437x: allow descs to be plased in ddr

Message ID 20161201233432.6182-2-grygorii.strashko@ti.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Grygorii Strashko Dec. 1, 2016, 11:34 p.m. UTC
It's observed that cpsw/cpdma is not working properly when CPPI
descriptors are placed in DDR instead of internal CPPI RAM on am437x
SoC:
- rx/tx silently stops processing packets;
- or - after boot it's working for sometime, but stuck once Network
load is increased (ping is working, but iperf is not).
(The same issue has not been reproduced on am335x and am57xx).

It seems that write to HDP register processed faster by interconnect
than writing of descriptor memory buffer in DDR, which is probably
caused by store buffer / write buffer differences as these function
are implemented differently across devices. So, to fix this i come up
with two changes:

1) all accesses to the channel register HDP/CP/RXFREE registers should
be done using sync IO accessors readl()/writel(), because all previous
memory writes writes have to be completed before starting channel
(write to HDP) or completing desc processing.

2) the change 1 only doesn't work on am437x and additional reading of
desc's field is required right after the new descriptor was filled
with data and before pointer on it will be stored in
prev_desc->hw_next field or HDP register.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

David Miller Dec. 3, 2016, 8:34 p.m. UTC | #1
From: Grygorii Strashko <grygorii.strashko@ti.com>
Date: Thu, 1 Dec 2016 17:34:26 -0600

> @@ -167,10 +167,10 @@ static struct cpdma_control_info controls[] = {
>  
>  /* various accessors */
>  #define dma_reg_read(ctlr, ofs)		__raw_readl((ctlr)->dmaregs + (ofs))
> -#define chan_read(chan, fld)		__raw_readl((chan)->fld)
> +#define chan_read(chan, fld)		readl((chan)->fld)
>  #define desc_read(desc, fld)		__raw_readl(&(desc)->fld)
>  #define dma_reg_write(ctlr, ofs, v)	__raw_writel(v, (ctlr)->dmaregs + (ofs))
> -#define chan_write(chan, fld, v)	__raw_writel(v, (chan)->fld)
> +#define chan_write(chan, fld, v)	writel(v, (chan)->fld)
>  #define desc_write(desc, fld, v)	__raw_writel((u32)(v), &(desc)->fld)

Unless you want to keep running into subtle errors all over the
place wrt. register vs. memory write ordering, I strong suggest
you use strongly ordered readl/writel for all register accesses.

I see no tangible, worthwhile, advantage to using these relaxed
ordering primitives.  The only result is potential bugs.

People who use the relaxed ordering primitives properly are only
doing so in extremely carefully coded sequences where a series
of writes has no dependency on main memory operations and is
explicitly completed with a barrier operation such as a read
back of a register in the same device.

That's not at all what is going on here, instead the driver is wildly
using relaxed ordered register accesses for basically everything.
This is extremely unwise and it's why you ran into this bug in the
first place.

Therefore, I absolutely require that you fix this by eliminating
any and all usese of relaxed ordering I/O accessors in this driver.

Thank you.
Grygorii Strashko Dec. 5, 2016, 5:57 p.m. UTC | #2
On 12/03/2016 02:34 PM, David Miller wrote:
> From: Grygorii Strashko <grygorii.strashko@ti.com>
> Date: Thu, 1 Dec 2016 17:34:26 -0600
> 
>> @@ -167,10 +167,10 @@ static struct cpdma_control_info controls[] = {
>>  
>>  /* various accessors */
>>  #define dma_reg_read(ctlr, ofs)		__raw_readl((ctlr)->dmaregs + (ofs))
>> -#define chan_read(chan, fld)		__raw_readl((chan)->fld)
>> +#define chan_read(chan, fld)		readl((chan)->fld)
>>  #define desc_read(desc, fld)		__raw_readl(&(desc)->fld)
>>  #define dma_reg_write(ctlr, ofs, v)	__raw_writel(v, (ctlr)->dmaregs + (ofs))
>> -#define chan_write(chan, fld, v)	__raw_writel(v, (chan)->fld)
>> +#define chan_write(chan, fld, v)	writel(v, (chan)->fld)
>>  #define desc_write(desc, fld, v)	__raw_writel((u32)(v), &(desc)->fld)
> 
> Unless you want to keep running into subtle errors all over the
> place wrt. register vs. memory write ordering, I strong suggest
> you use strongly ordered readl/writel for all register accesses.
> 
> I see no tangible, worthwhile, advantage to using these relaxed
> ordering primitives.  The only result is potential bugs.
> 
> People who use the relaxed ordering primitives properly are only
> doing so in extremely carefully coded sequences where a series
> of writes has no dependency on main memory operations and is
> explicitly completed with a barrier operation such as a read
> back of a register in the same device.
> 
> That's not at all what is going on here, instead the driver is wildly
> using relaxed ordered register accesses for basically everything.
> This is extremely unwise and it's why you ran into this bug in the
> first place.
> 
> Therefore, I absolutely require that you fix this by eliminating
> any and all usese of relaxed ordering I/O accessors in this driver.

Thanks for your comments - that's what I've tried first, but that decided
to find out minimal change which still works.

I'll update it.
diff mbox

Patch

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index c776e45..0924014 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -167,10 +167,10 @@  static struct cpdma_control_info controls[] = {
 
 /* various accessors */
 #define dma_reg_read(ctlr, ofs)		__raw_readl((ctlr)->dmaregs + (ofs))
-#define chan_read(chan, fld)		__raw_readl((chan)->fld)
+#define chan_read(chan, fld)		readl((chan)->fld)
 #define desc_read(desc, fld)		__raw_readl(&(desc)->fld)
 #define dma_reg_write(ctlr, ofs, v)	__raw_writel(v, (ctlr)->dmaregs + (ofs))
-#define chan_write(chan, fld, v)	__raw_writel(v, (chan)->fld)
+#define chan_write(chan, fld, v)	writel(v, (chan)->fld)
 #define desc_write(desc, fld, v)	__raw_writel((u32)(v), &(desc)->fld)
 
 #define cpdma_desc_to_port(chan, mode, directed)			\
@@ -1064,6 +1064,7 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	desc_write(desc, sw_token,  token);
 	desc_write(desc, sw_buffer, buffer);
 	desc_write(desc, sw_len,    len);
+	desc_read(desc, sw_len);
 
 	__cpdma_chan_submit(chan, desc);