diff mbox series

[U-Boot] dwc: ep0: Allocate and flush dwc->ep0_trb in a cache aligned manner

Message ID 1505819741-29843-1-git-send-email-faiz_abbas@ti.com
State Superseded
Delegated to: Marek Vasut
Headers show
Series [U-Boot] dwc: ep0: Allocate and flush dwc->ep0_trb in a cache aligned manner | expand

Commit Message

Faiz Abbas Sept. 19, 2017, 11:15 a.m. UTC
A flush of the cache is required before any DMA access can take place.
The minimum size that can be flushed from the cache is one cache line
size. Therefore, any buffer allocated for DMA should be in multiples
of cache line size.

Thus, allocate memory for ep0_trb in multiples of cache line size.

Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
to flush cache, it leads to cache misaligned messages as only the base
address dwc->ep0_trb is cache aligned.

Therefore, flush cache using ep0_trb_addr which is always cache aligned.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 drivers/usb/dwc3/ep0.c    | 7 ++++---
 drivers/usb/dwc3/gadget.c | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

Comments

Faiz Abbas Oct. 3, 2017, 7:35 a.m. UTC | #1
Hi,

On Tuesday 19 September 2017 04:45 PM, Faiz Abbas wrote:
> A flush of the cache is required before any DMA access can take place.
> The minimum size that can be flushed from the cache is one cache line
> size. Therefore, any buffer allocated for DMA should be in multiples
> of cache line size.
> 
> Thus, allocate memory for ep0_trb in multiples of cache line size.
> 
> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
> to flush cache, it leads to cache misaligned messages as only the base
> address dwc->ep0_trb is cache aligned.
> 
> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
> 
> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>

Gentle ping.

Thanks,
Faiz
Andy Shevchenko Oct. 3, 2017, 9:08 a.m. UTC | #2
On Tue, 2017-10-03 at 13:05 +0530, Faiz Abbas wrote:
> Hi,
> 
> On Tuesday 19 September 2017 04:45 PM, Faiz Abbas wrote:
> > A flush of the cache is required before any DMA access can take
> > place.
> > The minimum size that can be flushed from the cache is one cache
> > line
> > size. Therefore, any buffer allocated for DMA should be in multiples
> > of cache line size.
> > 
> > Thus, allocate memory for ep0_trb in multiples of cache line size.
> > 
> > Also, when local variable trb is assigned to dwc->ep0_trb[1] and
> > used
> > to flush cache, it leads to cache misaligned messages as only the
> > base
> > address dwc->ep0_trb is cache aligned.
> > 
> > Therefore, flush cache using ep0_trb_addr which is always cache
> > aligned.
> > 
> > Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
> 
> Gentle ping.

Can you resend with Felipe Balbi included?

And I'm not sure Vincent is anyhow related to this anymore (or even
works with us).
Marek Vasut Oct. 3, 2017, 9:43 a.m. UTC | #3
On 10/03/2017 11:08 AM, Andy Shevchenko wrote:
> On Tue, 2017-10-03 at 13:05 +0530, Faiz Abbas wrote:
>> Hi,
>>
>> On Tuesday 19 September 2017 04:45 PM, Faiz Abbas wrote:
>>> A flush of the cache is required before any DMA access can take
>>> place.
>>> The minimum size that can be flushed from the cache is one cache
>>> line
>>> size. Therefore, any buffer allocated for DMA should be in multiples
>>> of cache line size.
>>>
>>> Thus, allocate memory for ep0_trb in multiples of cache line size.
>>>
>>> Also, when local variable trb is assigned to dwc->ep0_trb[1] and
>>> used
>>> to flush cache, it leads to cache misaligned messages as only the
>>> base
>>> address dwc->ep0_trb is cache aligned.
>>>
>>> Therefore, flush cache using ep0_trb_addr which is always cache
>>> aligned.
>>>
>>> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
>>
>> Gentle ping.
> 
> Can you resend with Felipe Balbi included?

You can just reply to the original email and add people to CC of that
reply instead of resending ?

> And I'm not sure Vincent is anyhow related to this anymore (or even
> works with us).
>
Faiz Abbas Oct. 3, 2017, 9:58 a.m. UTC | #4
Hi,

+Felipe Balbi

On Tuesday 19 September 2017 04:45 PM, Faiz Abbas wrote:
> A flush of the cache is required before any DMA access can take place.
> The minimum size that can be flushed from the cache is one cache line
> size. Therefore, any buffer allocated for DMA should be in multiples
> of cache line size.
> 
> Thus, allocate memory for ep0_trb in multiples of cache line size.
> 
> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
> to flush cache, it leads to cache misaligned messages as only the base
> address dwc->ep0_trb is cache aligned.
> 
> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
> 
> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
> ---
>  drivers/usb/dwc3/ep0.c    | 7 ++++---
>  drivers/usb/dwc3/gadget.c | 3 ++-
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index e61d980..f3a17a1 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -82,7 +82,7 @@ static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
>  				| DWC3_TRB_CTRL_LST);
>  
>  	dwc3_flush_cache((uintptr_t)buf_dma, len);
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>  
>  	if (chain)
>  		return 0;
> @@ -790,7 +790,7 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  	if (!r)
>  		return;
>  
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>  
>  	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
>  	if (status == DWC3_TRBSTS_SETUP_PENDING) {
> @@ -821,7 +821,8 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  			ur->actual += transferred;
>  
>  			trb++;
> -			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
> +					 sizeof(*trb) * 2);
>  			length = trb->size & DWC3_TRB_SIZE_MASK;
>  
>  			ep0->free_slot = 0;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index e065c5a..895a5bc 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2567,7 +2567,8 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>  		goto err0;
>  	}
>  
> -	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
> +	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
> +						CACHELINE_SIZE),
>  					  (unsigned long *)&dwc->ep0_trb_addr);
>  	if (!dwc->ep0_trb) {
>  		dev_err(dwc->dev, "failed to allocate ep0 trb\n");
> 

Thanks,
Faiz
Marek Vasut Oct. 3, 2017, 12:04 p.m. UTC | #5
On 09/19/2017 01:15 PM, Faiz Abbas wrote:
> A flush of the cache is required before any DMA access can take place.

You mean invalidation for inbound DMA, flush for outbound DMA, right ?

> The minimum size that can be flushed from the cache is one cache line
> size. Therefore, any buffer allocated for DMA should be in multiples
> of cache line size.
> 
> Thus, allocate memory for ep0_trb in multiples of cache line size.
> 
> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
> to flush cache, it leads to cache misaligned messages as only the base
> address dwc->ep0_trb is cache aligned.
> 
> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
> 
> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
> ---
>  drivers/usb/dwc3/ep0.c    | 7 ++++---
>  drivers/usb/dwc3/gadget.c | 3 ++-
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index e61d980..f3a17a1 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -82,7 +82,7 @@ static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
>  				| DWC3_TRB_CTRL_LST);
>  
>  	dwc3_flush_cache((uintptr_t)buf_dma, len);
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>  
>  	if (chain)
>  		return 0;
> @@ -790,7 +790,7 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  	if (!r)
>  		return;
>  
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);

Why *2 ?

>  	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
>  	if (status == DWC3_TRBSTS_SETUP_PENDING) {
> @@ -821,7 +821,8 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  			ur->actual += transferred;
>  
>  			trb++;
> -			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
> +					 sizeof(*trb) * 2);
>  			length = trb->size & DWC3_TRB_SIZE_MASK;
>  
>  			ep0->free_slot = 0;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index e065c5a..895a5bc 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2567,7 +2567,8 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>  		goto err0;
>  	}
>  
> -	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
> +	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
> +						CACHELINE_SIZE),

AFAIU dma_alloc_coherent() should mark the memory area uncachable .

>  					  (unsigned long *)&dwc->ep0_trb_addr);
>  	if (!dwc->ep0_trb) {
>  		dev_err(dwc->dev, "failed to allocate ep0 trb\n");
>
Philipp Tomsich Oct. 3, 2017, 12:18 p.m. UTC | #6
Marek,

> On 3 Oct 2017, at 14:04, Marek Vasut <marex@denx.de> wrote:
> 
> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>> A flush of the cache is required before any DMA access can take place.
> 
> You mean invalidation for inbound DMA, flush for outbound DMA, right ?
> 
>> The minimum size that can be flushed from the cache is one cache line
>> size. Therefore, any buffer allocated for DMA should be in multiples
>> of cache line size.
>> 
>> Thus, allocate memory for ep0_trb in multiples of cache line size.
>> 
>> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
>> to flush cache, it leads to cache misaligned messages as only the base
>> address dwc->ep0_trb is cache aligned.
>> 
>> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
>> 
>> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
>> ---
>> drivers/usb/dwc3/ep0.c    | 7 ++++---
>> drivers/usb/dwc3/gadget.c | 3 ++-
>> 2 files changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index e61d980..f3a17a1 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -82,7 +82,7 @@ static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
>> 				| DWC3_TRB_CTRL_LST);
>> 
>> 	dwc3_flush_cache((uintptr_t)buf_dma, len);
>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>> 
>> 	if (chain)
>> 		return 0;
>> @@ -790,7 +790,7 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>> 	if (!r)
>> 		return;
>> 
>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
> 
> Why *2 ?
> 
>> 	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
>> 	if (status == DWC3_TRBSTS_SETUP_PENDING) {
>> @@ -821,7 +821,8 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>> 			ur->actual += transferred;
>> 
>> 			trb++;
>> -			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>> +			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
>> +					 sizeof(*trb) * 2);
>> 			length = trb->size & DWC3_TRB_SIZE_MASK;
>> 
>> 			ep0->free_slot = 0;
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index e065c5a..895a5bc 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2567,7 +2567,8 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>> 		goto err0;
>> 	}
>> 
>> -	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
>> +	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
>> +						CACHELINE_SIZE),
> 
> AFAIU dma_alloc_coherent() should mark the memory area uncachable .

We had this discussion a while back, when I submitted the fixes to make
this driver work on the RK3399: dma_alloc_coherent only performs a
memalign on ARM and ARM64:

See the following snippet in arch/arm/include/asm/dma-mapping.h:

static inline void *dma_alloc_coherent(size_t len, unsigned long *handle)
{
        *handle = (unsigned long)memalign(ARCH_DMA_MINALIGN, len);
        return (void *)*handle;
}


> 
>> 					  (unsigned long *)&dwc->ep0_trb_addr);
>> 	if (!dwc->ep0_trb) {
>> 		dev_err(dwc->dev, "failed to allocate ep0 trb\n");
>> 
> 
> 
> -- 
> Best regards,
> Marek Vasut
Marek Vasut Oct. 3, 2017, 12:52 p.m. UTC | #7
On 10/03/2017 02:18 PM, Dr. Philipp Tomsich wrote:
> Marek,
> 
>> On 3 Oct 2017, at 14:04, Marek Vasut <marex@denx.de> wrote:
>>
>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>> A flush of the cache is required before any DMA access can take place.
>>
>> You mean invalidation for inbound DMA, flush for outbound DMA, right ?
>>
>>> The minimum size that can be flushed from the cache is one cache line
>>> size. Therefore, any buffer allocated for DMA should be in multiples
>>> of cache line size.
>>>
>>> Thus, allocate memory for ep0_trb in multiples of cache line size.
>>>
>>> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
>>> to flush cache, it leads to cache misaligned messages as only the base
>>> address dwc->ep0_trb is cache aligned.
>>>
>>> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
>>>
>>> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
>>> ---
>>> drivers/usb/dwc3/ep0.c    | 7 ++++---
>>> drivers/usb/dwc3/gadget.c | 3 ++-
>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>> index e61d980..f3a17a1 100644
>>> --- a/drivers/usb/dwc3/ep0.c
>>> +++ b/drivers/usb/dwc3/ep0.c
>>> @@ -82,7 +82,7 @@ static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
>>> 				| DWC3_TRB_CTRL_LST);
>>>
>>> 	dwc3_flush_cache((uintptr_t)buf_dma, len);
>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>
>>> 	if (chain)
>>> 		return 0;
>>> @@ -790,7 +790,7 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>>> 	if (!r)
>>> 		return;
>>>
>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>
>> Why *2 ?
>>
>>> 	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
>>> 	if (status == DWC3_TRBSTS_SETUP_PENDING) {
>>> @@ -821,7 +821,8 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>>> 			ur->actual += transferred;
>>>
>>> 			trb++;
>>> -			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>> +			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
>>> +					 sizeof(*trb) * 2);
>>> 			length = trb->size & DWC3_TRB_SIZE_MASK;
>>>
>>> 			ep0->free_slot = 0;
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index e065c5a..895a5bc 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2567,7 +2567,8 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>>> 		goto err0;
>>> 	}
>>>
>>> -	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
>>> +	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
>>> +						CACHELINE_SIZE),
>>
>> AFAIU dma_alloc_coherent() should mark the memory area uncachable .
> 
> We had this discussion a while back, when I submitted the fixes to make
> this driver work on the RK3399: dma_alloc_coherent only performs a
> memalign on ARM and ARM64:
> 
> See the following snippet in arch/arm/include/asm/dma-mapping.h:

Does that mean the code is wrong / the function name is misleading ?

> static inline void *dma_alloc_coherent(size_t len, unsigned long *handle)
> {
>         *handle = (unsigned long)memalign(ARCH_DMA_MINALIGN, len);
>         return (void *)*handle;
> }
[...]
Marek Vasut Oct. 3, 2017, 1:01 p.m. UTC | #8
On 09/19/2017 01:15 PM, Faiz Abbas wrote:
> A flush of the cache is required before any DMA access can take place.
> The minimum size that can be flushed from the cache is one cache line
> size. Therefore, any buffer allocated for DMA should be in multiples
> of cache line size.
> 
> Thus, allocate memory for ep0_trb in multiples of cache line size.
> 
> Also, when local variable trb is assigned to dwc->ep0_trb[1] and used
> to flush cache, it leads to cache misaligned messages as only the base
> address dwc->ep0_trb is cache aligned.
> 
> Therefore, flush cache using ep0_trb_addr which is always cache aligned.
> 
> Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>

btw the tags should be usb: dwc3:

> ---
>  drivers/usb/dwc3/ep0.c    | 7 ++++---
>  drivers/usb/dwc3/gadget.c | 3 ++-
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index e61d980..f3a17a1 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -82,7 +82,7 @@ static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
>  				| DWC3_TRB_CTRL_LST);
>  
>  	dwc3_flush_cache((uintptr_t)buf_dma, len);
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>  
>  	if (chain)
>  		return 0;
> @@ -790,7 +790,7 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  	if (!r)
>  		return;
>  
> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>  
>  	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
>  	if (status == DWC3_TRBSTS_SETUP_PENDING) {
> @@ -821,7 +821,8 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
>  			ur->actual += transferred;
>  
>  			trb++;
> -			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
> +			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
> +					 sizeof(*trb) * 2);
>  			length = trb->size & DWC3_TRB_SIZE_MASK;
>  
>  			ep0->free_slot = 0;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index e065c5a..895a5bc 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2567,7 +2567,8 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>  		goto err0;
>  	}
>  
> -	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
> +	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
> +						CACHELINE_SIZE),
>  					  (unsigned long *)&dwc->ep0_trb_addr);
>  	if (!dwc->ep0_trb) {
>  		dev_err(dwc->dev, "failed to allocate ep0 trb\n");
>
Faiz Abbas Oct. 3, 2017, 1:17 p.m. UTC | #9
Hi,

On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>> A flush of the cache is required before any DMA access can take place.
> 
> You mean invalidation for inbound DMA, flush for outbound DMA, right ?

yes thats what i meant.


>>  
>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
> 
> Why *2 ?

Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
strictly required as dwc3_flush_cache() rounds up the size to
CACHELINE_SIZE but from a caller POV, flush everything we allocated.

Thanks,
Faiz
Marek Vasut Oct. 3, 2017, 1:18 p.m. UTC | #10
On 10/03/2017 03:17 PM, Faiz Abbas wrote:
> Hi,
> 
> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>> A flush of the cache is required before any DMA access can take place.
>>
>> You mean invalidation for inbound DMA, flush for outbound DMA, right ?
> 
> yes thats what i meant.
> 
> 
>>>  
>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>
>> Why *2 ?
> 
> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
> strictly required as dwc3_flush_cache() rounds up the size to
> CACHELINE_SIZE but from a caller POV, flush everything we allocated.

Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
would be better ?
Faiz Abbas Oct. 4, 2017, 10:51 a.m. UTC | #11
Hi,

On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>> Hi,
>>
>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>> A flush of the cache is required before any DMA access can take place.
>>>
>>> You mean invalidation for inbound DMA, flush for outbound DMA, right ?
>>
>> yes thats what i meant.
>>
>>
>>>>  
>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>
>>> Why *2 ?
>>
>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>> strictly required as dwc3_flush_cache() rounds up the size to
>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
> 
> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
> would be better ?
> 
A single trb is 16 bytes in size and two of them are allocated
contiguously. Originally, a flush on the first trb was flushing both of
them anyway as the minimum flush is CACHELINE_SIZE (64 bytes). This is
not changing any functionality as far as I have tested. Just making sure
cache misaligned warnings don't show up.

Thanks,
Faiz
Marek Vasut Oct. 4, 2017, 12:31 p.m. UTC | #12
On 10/04/2017 12:51 PM, Faiz Abbas wrote:
> Hi,
> 
> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>> Hi,
>>>
>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>> A flush of the cache is required before any DMA access can take place.
>>>>
>>>> You mean invalidation for inbound DMA, flush for outbound DMA, right ?
>>>
>>> yes thats what i meant.
>>>
>>>
>>>>>  
>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>
>>>> Why *2 ?
>>>
>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>> strictly required as dwc3_flush_cache() rounds up the size to
>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>
>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>> would be better ?
>>
> A single trb is 16 bytes in size and two of them are allocated
> contiguously.

Why are two allocated continuously ? (I am not dwc3 expert)

> Originally, a flush on the first trb was flushing both of
> them anyway as the minimum flush is CACHELINE_SIZE (64 bytes). This is
> not changing any functionality as far as I have tested. Just making sure
> cache misaligned warnings don't show up.

If you flush 64bytes, you flush more than 2 TRBs, you flush something
around those TRBs too.

> Thanks,
> Faiz
>
Faiz Abbas Oct. 4, 2017, 1:11 p.m. UTC | #13
Hi,

On Wednesday 04 October 2017 06:01 PM, Marek Vasut wrote:
> On 10/04/2017 12:51 PM, Faiz Abbas wrote:
>> Hi,
>> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>>> Hi,
>>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>>>  
>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>
>>>>> Why *2 ?
>>>>
>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>
>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>> would be better ?
>>>
>> A single trb is 16 bytes in size and two of them are allocated
>> contiguously.
> 
> Why are two allocated continuously ? (I am not dwc3 expert)

Neither am I. I did try to pad to the dwc_trb structure such that each
trb is 64 bytes in size but this leads to failures when testing. I
didn't get a chance to debug this though. I suspect its because the code
expects the trbs to be contiguous and/or 16 bytes in size.

It'll be great if someone can shed light on this.

> 
>> Originally, a flush on the first trb was flushing both of
>> them anyway as the minimum flush is CACHELINE_SIZE (64 bytes). This is
>> not changing any functionality as far as I have tested. Just making sure
>> cache misaligned warnings don't show up.
> 
> If you flush 64bytes, you flush more than 2 TRBs, you flush something
> around those TRBs too.

Yes and that is why I changed the allocation step to
ROUND(sizeof(trb) * 2, 64). We use only 32 bytes for two trbs but make
sure to allocate 64 bytes so that we can safely flush it.

Thanks
Faiz
Marek Vasut Oct. 5, 2017, 11:27 a.m. UTC | #14
On 10/04/2017 03:11 PM, Faiz Abbas wrote:
> Hi,
> 
> On Wednesday 04 October 2017 06:01 PM, Marek Vasut wrote:
>> On 10/04/2017 12:51 PM, Faiz Abbas wrote:
>>> Hi,
>>> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>>>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>>>> Hi,
>>>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>>>>  
>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>
>>>>>> Why *2 ?
>>>>>
>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>
>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>> would be better ?
>>>>
>>> A single trb is 16 bytes in size and two of them are allocated
>>> contiguously.
>>
>> Why are two allocated continuously ? (I am not dwc3 expert)
> 
> Neither am I. I did try to pad to the dwc_trb structure such that each
> trb is 64 bytes in size but this leads to failures when testing. I
> didn't get a chance to debug this though. I suspect its because the code
> expects the trbs to be contiguous and/or 16 bytes in size.

Maybe that's something you need to check -- why it fails if aligned . Do
the TRBs need to be stored back-to-back ?

> It'll be great if someone can shed light on this.
> 
>>
>>> Originally, a flush on the first trb was flushing both of
>>> them anyway as the minimum flush is CACHELINE_SIZE (64 bytes). This is
>>> not changing any functionality as far as I have tested. Just making sure
>>> cache misaligned warnings don't show up.
>>
>> If you flush 64bytes, you flush more than 2 TRBs, you flush something
>> around those TRBs too.
> 
> Yes and that is why I changed the allocation step to
> ROUND(sizeof(trb) * 2, 64). We use only 32 bytes for two trbs but make
> sure to allocate 64 bytes so that we can safely flush it.
> 
> Thanks
> Faiz
>
Faiz Abbas Oct. 6, 2017, 11:33 a.m. UTC | #15
Hi,

On Thursday 05 October 2017 04:57 PM, Marek Vasut wrote:
> On 10/04/2017 03:11 PM, Faiz Abbas wrote:
>> Hi,
>>
>> On Wednesday 04 October 2017 06:01 PM, Marek Vasut wrote:
>>> On 10/04/2017 12:51 PM, Faiz Abbas wrote:
>>>> Hi,
>>>> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>>>>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>>>>> Hi,
>>>>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>>>>>  
>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>
>>>>>>> Why *2 ?
>>>>>>
>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>
>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>> would be better ?
>>>>>
>>>> A single trb is 16 bytes in size and two of them are allocated
>>>> contiguously.
>>>
>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>
>> Neither am I. I did try to pad to the dwc_trb structure such that each
>> trb is 64 bytes in size but this leads to failures when testing. I
>> didn't get a chance to debug this though. I suspect its because the code
>> expects the trbs to be contiguous and/or 16 bytes in size.
> 
> Maybe that's something you need to check -- why it fails if aligned . Do
> the TRBs need to be stored back-to-back ?

Sure. Will check that and submit another version with fixes.

Thanks,
Faiz
Faiz Abbas Oct. 10, 2017, 5:37 a.m. UTC | #16
+Kishon

On Friday 06 October 2017 05:03 PM, Faiz Abbas wrote:
> Hi,
> 
> On Thursday 05 October 2017 04:57 PM, Marek Vasut wrote:
>> On 10/04/2017 03:11 PM, Faiz Abbas wrote:
>>> Hi,
>>>
>>> On Wednesday 04 October 2017 06:01 PM, Marek Vasut wrote:
>>>> On 10/04/2017 12:51 PM, Faiz Abbas wrote:
>>>>> Hi,
>>>>> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>>>>>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>>>>>> Hi,
>>>>>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>>>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>>>>>>  
>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>
>>>>>>>> Why *2 ?
>>>>>>>
>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>
>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>> would be better ?
>>>>>>
>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>> contiguously.
>>>>
>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>>
>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>> trb is 64 bytes in size but this leads to failures when testing. I
>>> didn't get a chance to debug this though. I suspect its because the code
>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>
>> Maybe that's something you need to check -- why it fails if aligned . Do
>> the TRBs need to be stored back-to-back ?
> 
> Sure. Will check that and submit another version with fixes.
> 
> Thanks,
> Faiz
>
Kishon Vijay Abraham I Oct. 10, 2017, 5:48 a.m. UTC | #17
Hi,

On Tuesday 10 October 2017 11:07 AM, Faiz Abbas wrote:
> +Kishon
> 
> On Friday 06 October 2017 05:03 PM, Faiz Abbas wrote:
>> Hi,
>>
>> On Thursday 05 October 2017 04:57 PM, Marek Vasut wrote:
>>> On 10/04/2017 03:11 PM, Faiz Abbas wrote:
>>>> Hi,
>>>>
>>>> On Wednesday 04 October 2017 06:01 PM, Marek Vasut wrote:
>>>>> On 10/04/2017 12:51 PM, Faiz Abbas wrote:
>>>>>> Hi,
>>>>>> On Tuesday 03 October 2017 06:48 PM, Marek Vasut wrote:
>>>>>>> On 10/03/2017 03:17 PM, Faiz Abbas wrote:
>>>>>>>> Hi,
>>>>>>>> On Tuesday 03 October 2017 05:34 PM, Marek Vasut wrote:
>>>>>>>>> On 09/19/2017 01:15 PM, Faiz Abbas wrote:
>>>>>>>>>>  
>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>
>>>>>>>>> Why *2 ?
>>>>>>>>
>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>
>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>> would be better ?
>>>>>>>
>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>> contiguously.
>>>>>
>>>>> Why are two allocated continuously ? (I am not dwc3 expert)

The TRB's should be allocated contiguously for dwc3 and only the base of the
entire TRB table is programmed in the HW.
 ________________ <------------------ TRB table base address
|     TRB0       |
|________________|
|     TRB1       |
|________________|
|     TRB2       |
|________________|
|     TRBn       |
|________________|


>>>>
>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>> didn't get a chance to debug this though. I suspect its because the code
>>>> expects the trbs to be contiguous and/or 16 bytes in size.

It's not the code but it's the HW.

Thanks
Kishon
Marek Vasut Oct. 10, 2017, 8 a.m. UTC | #18
On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
> Hi,

Hi,

[...]

>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>
>>>>>>>>>> Why *2 ?
>>>>>>>>>
>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>
>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>> would be better ?
>>>>>>>>
>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>> contiguously.
>>>>>>
>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
> 
> The TRB's should be allocated contiguously for dwc3 and only the base of the
> entire TRB table is programmed in the HW.
>  ________________ <------------------ TRB table base address
> |     TRB0       |
> |________________|
> |     TRB1       |
> |________________|
> |     TRB2       |
> |________________|
> |     TRBn       |
> |________________|
> 
> 
>>>>>
>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
> 
> It's not the code but it's the HW.

That'd imply we need either some sort of flushing scheme or non-cachable
memory allocation. What does Linux do ?
Faiz Abbas Oct. 10, 2017, 10:45 a.m. UTC | #19
Hi Marek,

On Tuesday 10 October 2017 01:30 PM, Marek Vasut wrote:
> On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
>> Hi,
> 
> Hi,
> 
> [...]
> 
>>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>>
>>>>>>>>>>> Why *2 ?
>>>>>>>>>>
>>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>>
>>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>>> would be better ?
>>>>>>>>>
>>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>>> contiguously.
>>>>>>>
>>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>
>> The TRB's should be allocated contiguously for dwc3 and only the base of the
>> entire TRB table is programmed in the HW.
>>  ________________ <------------------ TRB table base address
>> |     TRB0       |
>> |________________|
>> |     TRB1       |
>> |________________|
>> |     TRB2       |
>> |________________|
>> |     TRBn       |
>> |________________|
>>
>>
>>>>>>
>>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>
>> It's not the code but it's the HW.
> 
> That'd imply we need either some sort of flushing scheme or non-cachable
> memory allocation. What does Linux do ?
> The dma_alloc_coherent in linux kernel allocates non-cachable memory.

Currently, the code is using local variable trb to flush the cache. When
the first trb is used, dwc3_flush_cache flushes the complete
CACHELINE_SIZE (including the 2nd trb).
When the 2nd trb is used to flush cache, since it is an unaligned flush,
it will issue a warning and will align it to the lower cache line
boundary (flushing the 1st trb in the process).

So with or without this patch, both trbs are getting flushed with every
call. With the patch, we can just avoid misaligned messages by always
flushing using an aligned address.

Thanks,
Faiz
Marek Vasut Oct. 10, 2017, 1:49 p.m. UTC | #20
On 10/10/2017 12:45 PM, Faiz Abbas wrote:
> Hi Marek,
> 
> On Tuesday 10 October 2017 01:30 PM, Marek Vasut wrote:
>> On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
>>> Hi,
>>
>> Hi,
>>
>> [...]
>>
>>>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>>>
>>>>>>>>>>>> Why *2 ?
>>>>>>>>>>>
>>>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>>>
>>>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>>>> would be better ?
>>>>>>>>>>
>>>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>>>> contiguously.
>>>>>>>>
>>>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>>
>>> The TRB's should be allocated contiguously for dwc3 and only the base of the
>>> entire TRB table is programmed in the HW.
>>>  ________________ <------------------ TRB table base address
>>> |     TRB0       |
>>> |________________|
>>> |     TRB1       |
>>> |________________|
>>> |     TRB2       |
>>> |________________|
>>> |     TRBn       |
>>> |________________|
>>>
>>>
>>>>>>>
>>>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>>
>>> It's not the code but it's the HW.
>>
>> That'd imply we need either some sort of flushing scheme or non-cachable
>> memory allocation. What does Linux do ?
>> The dma_alloc_coherent in linux kernel allocates non-cachable memory.
> 
> Currently, the code is using local variable trb to flush the cache. When
> the first trb is used, dwc3_flush_cache flushes the complete
> CACHELINE_SIZE (including the 2nd trb).
> When the 2nd trb is used to flush cache, since it is an unaligned flush,
> it will issue a warning and will align it to the lower cache line
> boundary (flushing the 1st trb in the process).
> 
> So with or without this patch, both trbs are getting flushed with every
> call. With the patch, we can just avoid misaligned messages by always
> flushing using an aligned address.

What worries me is that you can flush something into the memory while
the controller is writing into it as well. Is that possible ?
Faiz Abbas Oct. 11, 2017, 8:23 a.m. UTC | #21
Hi,

On Tuesday 10 October 2017 07:19 PM, Marek Vasut wrote:
> On 10/10/2017 12:45 PM, Faiz Abbas wrote:
>> Hi Marek,
>>
>> On Tuesday 10 October 2017 01:30 PM, Marek Vasut wrote:
>>> On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
>>>> Hi,
>>>
>>> Hi,
>>>
>>> [...]
>>>
>>>>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why *2 ?
>>>>>>>>>>>>
>>>>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>>>>
>>>>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>>>>> would be better ?
>>>>>>>>>>>
>>>>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>>>>> contiguously.
>>>>>>>>>
>>>>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>>>
>>>> The TRB's should be allocated contiguously for dwc3 and only the base of the
>>>> entire TRB table is programmed in the HW.
>>>>  ________________ <------------------ TRB table base address
>>>> |     TRB0       |
>>>> |________________|
>>>> |     TRB1       |
>>>> |________________|
>>>> |     TRB2       |
>>>> |________________|
>>>> |     TRBn       |
>>>> |________________|
>>>>
>>>>
>>>>>>>>
>>>>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>>>
>>>> It's not the code but it's the HW.
>>>
>>> That'd imply we need either some sort of flushing scheme or non-cachable
>>> memory allocation. What does Linux do ?
>>> The dma_alloc_coherent in linux kernel allocates non-cachable memory.
>>
>> Currently, the code is using local variable trb to flush the cache. When
>> the first trb is used, dwc3_flush_cache flushes the complete
>> CACHELINE_SIZE (including the 2nd trb).
>> When the 2nd trb is used to flush cache, since it is an unaligned flush,
>> it will issue a warning and will align it to the lower cache line
>> boundary (flushing the 1st trb in the process).
>>
>> So with or without this patch, both trbs are getting flushed with every
>> call. With the patch, we can just avoid misaligned messages by always
>> flushing using an aligned address.
> 
> What worries me is that you can flush something into the memory while
> the controller is writing into it as well. Is that possible ?
> 
No, control to the hardware is only given after all the trbs have been
configured and flushed to memory. This is done by using the chain
variable in the code.

        dwc3_flush_cache((uintptr_t)buf_dma, len);
        dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);

        if (chain)
                return 0;

        memset(&params, 0, sizeof(params));
        params.param0 = upper_32_bits(dwc->ep0_trb_addr);
        params.param1 = lower_32_bits(dwc->ep0_trb_addr);

        ret = dwc3_send_gadget_ep_cmd(dwc, dep->number,
                        DWC3_DEPCMD_STARTTRANSFER, &params);



Thanks,
Faiz
Kishon Vijay Abraham I Oct. 11, 2017, 8:58 a.m. UTC | #22
Hi,

On Wednesday 11 October 2017 01:53 PM, Faiz Abbas wrote:
> Hi,
> 
> On Tuesday 10 October 2017 07:19 PM, Marek Vasut wrote:
>> On 10/10/2017 12:45 PM, Faiz Abbas wrote:
>>> Hi Marek,
>>>
>>> On Tuesday 10 October 2017 01:30 PM, Marek Vasut wrote:
>>>> On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
>>>>> Hi,
>>>>
>>>> Hi,
>>>>
>>>> [...]
>>>>
>>>>>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why *2 ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>>>>>
>>>>>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>>>>>> would be better ?
>>>>>>>>>>>>
>>>>>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>>>>>> contiguously.
>>>>>>>>>>
>>>>>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>>>>
>>>>> The TRB's should be allocated contiguously for dwc3 and only the base of the
>>>>> entire TRB table is programmed in the HW.
>>>>>  ________________ <------------------ TRB table base address
>>>>> |     TRB0       |
>>>>> |________________|
>>>>> |     TRB1       |
>>>>> |________________|
>>>>> |     TRB2       |
>>>>> |________________|
>>>>> |     TRBn       |
>>>>> |________________|
>>>>>
>>>>>
>>>>>>>>>
>>>>>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>>>>
>>>>> It's not the code but it's the HW.
>>>>
>>>> That'd imply we need either some sort of flushing scheme or non-cachable
>>>> memory allocation. What does Linux do ?
>>>> The dma_alloc_coherent in linux kernel allocates non-cachable memory.
>>>
>>> Currently, the code is using local variable trb to flush the cache. When
>>> the first trb is used, dwc3_flush_cache flushes the complete
>>> CACHELINE_SIZE (including the 2nd trb).
>>> When the 2nd trb is used to flush cache, since it is an unaligned flush,
>>> it will issue a warning and will align it to the lower cache line
>>> boundary (flushing the 1st trb in the process).
>>>
>>> So with or without this patch, both trbs are getting flushed with every
>>> call. With the patch, we can just avoid misaligned messages by always
>>> flushing using an aligned address.
>>
>> What worries me is that you can flush something into the memory while
>> the controller is writing into it as well. Is that possible ?
>>
> No, control to the hardware is only given after all the trbs have been
> configured and flushed to memory. This is done by using the chain
> variable in the code.
> 
>         dwc3_flush_cache((uintptr_t)buf_dma, len);
>         dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);

this flush_cache can be moved after the chain so that flush is only invoked
after all the TRB's has been configured.

Thanks
Kishon
Faiz Abbas Oct. 11, 2017, 1:23 p.m. UTC | #23
On Wednesday 11 October 2017 02:28 PM, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On Wednesday 11 October 2017 01:53 PM, Faiz Abbas wrote:
>> Hi,
>>
>> On Tuesday 10 October 2017 07:19 PM, Marek Vasut wrote:
>>> On 10/10/2017 12:45 PM, Faiz Abbas wrote:
>>>> Hi Marek,
>>>>
>>>> On Tuesday 10 October 2017 01:30 PM, Marek Vasut wrote:
>>>>> On 10/10/2017 07:48 AM, Kishon Vijay Abraham I wrote:
>>>>>> Hi,
>>>>>
>>>>> Hi,
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>>>>>>>>> -	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
>>>>>>>>>>>>>>>> +	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why *2 ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Because its allocated as sizeof(*dwc->ep0_trb) * 2 below. This is not
>>>>>>>>>>>>>> strictly required as dwc3_flush_cache() rounds up the size to
>>>>>>>>>>>>>> CACHELINE_SIZE but from a caller POV, flush everything we allocated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can the other TRB be in use ? Maybe aligning the TRBs to cacheline size
>>>>>>>>>>>>> would be better ?
>>>>>>>>>>>>>
>>>>>>>>>>>> A single trb is 16 bytes in size and two of them are allocated
>>>>>>>>>>>> contiguously.
>>>>>>>>>>>
>>>>>>>>>>> Why are two allocated continuously ? (I am not dwc3 expert)
>>>>>>
>>>>>> The TRB's should be allocated contiguously for dwc3 and only the base of the
>>>>>> entire TRB table is programmed in the HW.
>>>>>>  ________________ <------------------ TRB table base address
>>>>>> |     TRB0       |
>>>>>> |________________|
>>>>>> |     TRB1       |
>>>>>> |________________|
>>>>>> |     TRB2       |
>>>>>> |________________|
>>>>>> |     TRBn       |
>>>>>> |________________|
>>>>>>
>>>>>>
>>>>>>>>>>
>>>>>>>>>> Neither am I. I did try to pad to the dwc_trb structure such that each
>>>>>>>>>> trb is 64 bytes in size but this leads to failures when testing. I
>>>>>>>>>> didn't get a chance to debug this though. I suspect its because the code
>>>>>>>>>> expects the trbs to be contiguous and/or 16 bytes in size.
>>>>>>
>>>>>> It's not the code but it's the HW.
>>>>>
>>>>> That'd imply we need either some sort of flushing scheme or non-cachable
>>>>> memory allocation. What does Linux do ?
>>>>> The dma_alloc_coherent in linux kernel allocates non-cachable memory.
>>>>
>>>> Currently, the code is using local variable trb to flush the cache. When
>>>> the first trb is used, dwc3_flush_cache flushes the complete
>>>> CACHELINE_SIZE (including the 2nd trb).
>>>> When the 2nd trb is used to flush cache, since it is an unaligned flush,
>>>> it will issue a warning and will align it to the lower cache line
>>>> boundary (flushing the 1st trb in the process).
>>>>
>>>> So with or without this patch, both trbs are getting flushed with every
>>>> call. With the patch, we can just avoid misaligned messages by always
>>>> flushing using an aligned address.
>>>
>>> What worries me is that you can flush something into the memory while
>>> the controller is writing into it as well. Is that possible ?
>>>
>> No, control to the hardware is only given after all the trbs have been
>> configured and flushed to memory. This is done by using the chain
>> variable in the code.
>>
>>         dwc3_flush_cache((uintptr_t)buf_dma, len);
>>         dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
> 
> this flush_cache can be moved after the chain so that flush is only invoked
> after all the TRB's has been configured.

Sure, that can be done.

Thanks,
Faiz
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index e61d980..f3a17a1 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -82,7 +82,7 @@  static int dwc3_ep0_start_trans(struct dwc3 *dwc, u8 epnum, dma_addr_t buf_dma,
 				| DWC3_TRB_CTRL_LST);
 
 	dwc3_flush_cache((uintptr_t)buf_dma, len);
-	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
+	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
 
 	if (chain)
 		return 0;
@@ -790,7 +790,7 @@  static void dwc3_ep0_complete_data(struct dwc3 *dwc,
 	if (!r)
 		return;
 
-	dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
+	dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr, sizeof(*trb) * 2);
 
 	status = DWC3_TRB_SIZE_TRBSTS(trb->size);
 	if (status == DWC3_TRBSTS_SETUP_PENDING) {
@@ -821,7 +821,8 @@  static void dwc3_ep0_complete_data(struct dwc3 *dwc,
 			ur->actual += transferred;
 
 			trb++;
-			dwc3_flush_cache((uintptr_t)trb, sizeof(*trb));
+			dwc3_flush_cache((uintptr_t)dwc->ep0_trb_addr,
+					 sizeof(*trb) * 2);
 			length = trb->size & DWC3_TRB_SIZE_MASK;
 
 			ep0->free_slot = 0;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index e065c5a..895a5bc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2567,7 +2567,8 @@  int dwc3_gadget_init(struct dwc3 *dwc)
 		goto err0;
 	}
 
-	dwc->ep0_trb = dma_alloc_coherent(sizeof(*dwc->ep0_trb) * 2,
+	dwc->ep0_trb = dma_alloc_coherent(ROUND(sizeof(*dwc->ep0_trb) * 2,
+						CACHELINE_SIZE),
 					  (unsigned long *)&dwc->ep0_trb_addr);
 	if (!dwc->ep0_trb) {
 		dev_err(dwc->dev, "failed to allocate ep0 trb\n");