
[2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit

Message ID 1469534545-14478-3-git-send-email-u.kleine-koenig@pengutronix.de
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Uwe Kleine-König July 26, 2016, 12:02 p.m. UTC
Allocating and preparing a DMA descriptor doesn't need to happen under
the channel's lock, so do this before taking the lock. The only
downside is that the descriptor might be allocated even though the
channel is about to be stopped. This is unlikely though.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 38 +++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

Comments

Grygorii Strashko July 26, 2016, 2:25 p.m. UTC | #1
On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
> Allocating and preparing a DMA descriptor doesn't need to happen under
> the channel's lock, so do this before taking the lock. The only
> downside is that the descriptor might be allocated even though the
> channel is about to be stopped. This is unlikely though.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  drivers/net/ethernet/ti/davinci_cpdma.c | 38 +++++++++++++++++----------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index 5ffa04a306c6..ba3462707ae3 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
>  	u32				mode;
>  	int				ret = 0;
>  
> -	spin_lock_irqsave(&chan->lock, flags);
> -
> -	if (chan->state == CPDMA_STATE_TEARDOWN) {
> -		ret = -EINVAL;
> -		goto unlock_ret;
> -	}
> -
> -	if (chan->count >= chan->desc_num)	{
> -		chan->stats.desc_alloc_fail++;
> -		ret = -ENOMEM;
> -		goto unlock_ret;
> -	}

I'm not sure this is the right thing to do. This check is expected to be strict:
it means "the channel has exhausted its available descriptors, so no further descriptor allocation is allowed".


This might also affect Ivan's work [1], "[PATCH 0/4] net: ethernet: ti: cpsw: add multi-queue support".



[1] https://lkml.org/lkml/2016/6/30/603
Uwe Kleine-König July 27, 2016, 7:12 a.m. UTC | #2
Hello,

On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:
> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
> > Allocating and preparing a DMA descriptor doesn't need to happen under
> > the channel's lock, so do this before taking the lock. The only
> > downside is that the descriptor might be allocated even though the
> > channel is about to be stopped. This is unlikely though.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> >  drivers/net/ethernet/ti/davinci_cpdma.c | 38 +++++++++++++++++----------------
> >  1 file changed, 20 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> > index 5ffa04a306c6..ba3462707ae3 100644
> > --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> > +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> > @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
> >  	u32				mode;
> >  	int				ret = 0;
> >  
> > -	spin_lock_irqsave(&chan->lock, flags);
> > -
> > -	if (chan->state == CPDMA_STATE_TEARDOWN) {
> > -		ret = -EINVAL;
> > -		goto unlock_ret;
> > -	}
> > -
> > -	if (chan->count >= chan->desc_num)	{
> > -		chan->stats.desc_alloc_fail++;
> > -		ret = -ENOMEM;
> > -		goto unlock_ret;
> > -	}
> 
> I'm not sure this is the right thing to do. This check is expected to be strict:
> it means "the channel has exhausted its available descriptors, so no further descriptor allocation is allowed".

I developed this patch based on a 4.4 kernel, which doesn't have
742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
my patch is more obviously correct. As chan->count is currently
protected by chan->lock, we must hold the lock for this check. If a
failing check means we must not call cpdma_desc_alloc in the first
place, that's bad.

But I'm not sure this is the case here. After all cpdma_desc_alloc
doesn't do anything relevant for the hardware, right?

Best regards
Uwe
Grygorii Strashko July 27, 2016, 2:08 p.m. UTC | #3
On 07/27/2016 10:12 AM, Uwe Kleine-König wrote:
> Hello,
> 
> On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:
>> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
>>> Allocating and preparing a DMA descriptor doesn't need to happen under
>>> the channel's lock, so do this before taking the lock. The only
>>> downside is that the descriptor might be allocated even though the
>>> channel is about to be stopped. This is unlikely though.
>>>
>>> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>>> ---
>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 38 +++++++++++++++++----------------
>>>  1 file changed, 20 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> index 5ffa04a306c6..ba3462707ae3 100644
>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
>>>  	u32				mode;
>>>  	int				ret = 0;
>>>  
>>> -	spin_lock_irqsave(&chan->lock, flags);
>>> -
>>> -	if (chan->state == CPDMA_STATE_TEARDOWN) {
>>> -		ret = -EINVAL;
>>> -		goto unlock_ret;
>>> -	}
>>> -
>>> -	if (chan->count >= chan->desc_num)	{
>>> -		chan->stats.desc_alloc_fail++;
>>> -		ret = -ENOMEM;
>>> -		goto unlock_ret;
>>> -	}
>>
>> I'm not sure this is the right thing to do. This check is expected to be strict:
>> it means "the channel has exhausted its available descriptors, so no further descriptor allocation is allowed".
> 
> I developed this patch based on a 4.4 kernel, which doesn't have
> 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
> my patch is more obviously correct. As chan->count is currently
> protected by chan->lock, we must hold the lock for this check. If a
> failing check means we must not call cpdma_desc_alloc in the first
> place, that's bad.

Yes, that's the intention of this check :(
Currently it works as follows for two (RX/TX) channels, for example:
RX desc_num = 16 (max allowed number of descriptors)
TX desc_num = 16 (max allowed number of descriptors)
and with the current code the number of allocated descriptors will never exceed 16.

With your change, in the corner case where the TX channel has already utilized 16
descriptors, the following will happen:
cpdma_chan_submit()
 - cpdma_desc_alloc() - will allocate a 17th desc
 - lock
 - check for chan->count - fail
 - unlock
 - cpdma_desc_free()

So your patch adds an extra desc_alloc/desc_free in the above corner case, and
that's what I'm worried about (the TEARDOWN part seems OK), especially taking
into account further multi-queue feature development.

The above corner case should happen very rarely because of the guard check in
cpsw_ndo_start_xmit(), but it could.

> 
> But I'm not sure this is the case here. After all cpdma_desc_alloc
> doesn't do anything relevant for the hardware, right?

Right.

Thanks. I'll try to do some measurements as well.
Ivan Khoronzhuk July 27, 2016, 6:11 p.m. UTC | #4
On 27.07.16 17:08, Grygorii Strashko wrote:
> On 07/27/2016 10:12 AM, Uwe Kleine-König wrote:
>> Hello,
>>
>> On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:
>>> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
>>>> Allocating and preparing a DMA descriptor doesn't need to happen under
>>>> the channel's lock, so do this before taking the lock. The only
>>>> downside is that the descriptor might be allocated even though the
>>>> channel is about to be stopped. This is unlikely though.
>>>>
>>>> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>>>> ---
>>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 38 +++++++++++++++++----------------
>>>>  1 file changed, 20 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> index 5ffa04a306c6..ba3462707ae3 100644
>>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
>>>>  	u32				mode;
>>>>  	int				ret = 0;
>>>>
>>>> -	spin_lock_irqsave(&chan->lock, flags);
>>>> -
>>>> -	if (chan->state == CPDMA_STATE_TEARDOWN) {
>>>> -		ret = -EINVAL;
>>>> -		goto unlock_ret;
>>>> -	}
>>>> -
>>>> -	if (chan->count >= chan->desc_num)	{
>>>> -		chan->stats.desc_alloc_fail++;
>>>> -		ret = -ENOMEM;
>>>> -		goto unlock_ret;
>>>> -	}
>>>
>>> I'm not sure this is the right thing to do. This check is expected to be strict:
>>> it means "the channel has exhausted its available descriptors, so no further descriptor allocation is allowed".
>>
>> I developed this patch based on a 4.4 kernel, which doesn't have
>> 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
>> my patch is more obviously correct. As chan->count is currently
>> protected by chan->lock, we must hold the lock for this check. If a
>> failing check means we must not call cpdma_desc_alloc in the first
>> place, that's bad.
Unfortunately, chan->count is not the only case where this lock is needed.
I like the idea of removing a bunch of locks from here (I was wondering why it
needs so many locks when using h/w queues, but that is the style the driver is
written in). This lock is also needed to cover the stats counters, at least.
In the case of the cpsw driver, which uses cpdma_chan, the same channel can be
shared between two EMACs (in dual-EMAC mode), so the lock is needed for every
chan variable. So that's not a rare case. In general, the optimization of cpdma
is a good idea, but it seems to require many more changes.

>
> Yes, that's the intention of this check :(
> Currently it works as follows for two (RX/TX) channels, for example:
> RX desc_num = 16 (max allowed number of descriptors)
> TX desc_num = 16 (max allowed number of descriptors)
> and with the current code the number of allocated descriptors will never exceed 16.
>
> With your change, in the corner case where the TX channel has already utilized 16
> descriptors, the following will happen:
> cpdma_chan_submit()
>  - cpdma_desc_alloc() - will allocate a 17th desc
>  - lock
>  - check for chan->count - fail
>  - unlock
>  - cpdma_desc_free()
>
> So your patch adds an extra desc_alloc/desc_free in the above corner case, and
> that's what I'm worried about (the TEARDOWN part seems OK), especially taking
> into account further multi-queue feature development.
>
> The above corner case should happen very rarely because of the guard check in
> cpsw_ndo_start_xmit(), but it could.
>
>>
>> But I'm not sure this is the case here. After all cpdma_desc_alloc
>> doesn't do anything relevant for the hardware, right?
>
> Right.
>
> Thanks. I'll try to do some measurements as well.
>

Patch

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 5ffa04a306c6..ba3462707ae3 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -542,24 +542,10 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	u32				mode;
 	int				ret = 0;
 
-	spin_lock_irqsave(&chan->lock, flags);
-
-	if (chan->state == CPDMA_STATE_TEARDOWN) {
-		ret = -EINVAL;
-		goto unlock_ret;
-	}
-
-	if (chan->count >= chan->desc_num)	{
-		chan->stats.desc_alloc_fail++;
-		ret = -ENOMEM;
-		goto unlock_ret;
-	}
-
 	desc = cpdma_desc_alloc(ctlr->pool);
 	if (!desc) {
 		chan->stats.desc_alloc_fail++;
-		ret = -ENOMEM;
-		goto unlock_ret;
+		return -ENOMEM;
 	}
 
 	if (len < ctlr->params.min_packet_size) {
@@ -571,8 +557,7 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	ret = dma_mapping_error(ctlr->dev, buffer);
 	if (ret) {
 		cpdma_desc_free(ctlr->pool, desc, 1);
-		ret = -EINVAL;
-		goto unlock_ret;
+		return -EINVAL;
 	}
 
 	mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP;
@@ -586,6 +571,19 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	desc_write(desc, sw_buffer, buffer);
 	desc_write(desc, sw_len,    len);
 
+	spin_lock_irqsave(&chan->lock, flags);
+
+	if (chan->state == CPDMA_STATE_TEARDOWN) {
+		ret = -EINVAL;
+		goto unlock_free;
+	}
+
+	if (chan->count >= chan->desc_num)	{
+		chan->stats.desc_alloc_fail++;
+		ret = -ENOMEM;
+		goto unlock_free;
+	}
+
 	__cpdma_chan_submit(chan, desc);
 
 	if (chan->state == CPDMA_STATE_ACTIVE && chan->rxfree)
@@ -593,8 +591,12 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 
 	chan->count++;
 
-unlock_ret:
 	spin_unlock_irqrestore(&chan->lock, flags);
+	return 0;
+
+unlock_free:
+	spin_unlock_irqrestore(&chan->lock, flags);
+	cpdma_desc_free(ctlr->pool, desc, 1);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_chan_submit);