
[RFC,net-next,4/4] net_sched: make ingress qdisc lockless

Message ID 1389291593-2494-5-git-send-email-xiyou.wangcong@gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Cong Wang Jan. 9, 2014, 6:19 p.m. UTC
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/core/dev.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Eric Dumazet Jan. 10, 2014, 12:21 a.m. UTC | #1
On Thu, 2014-01-09 at 10:19 -0800, Cong Wang wrote:
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/core/dev.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..e357d05 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3376,10 +3376,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
>  
>  	q = rxq->qdisc;
>  	if (q != &noop_qdisc) {
> -		spin_lock(qdisc_lock(q));
>  		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
>  			result = qdisc_enqueue_root(skb, q);
> -		spin_unlock(qdisc_lock(q));
>  	}
>  
>  	return result;


Really, you'll have to explain in the changelog why you think this is
safe, because I really do not see how this can be valid.

I think I already said it was not safe at all...

You could try a multiqueue NIC for some interesting effects.



Cong Wang Jan. 10, 2014, 12:30 a.m. UTC | #2
On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> Really, you'll have to explain in the changelog why you think this is
> safe, because I really do not see how this can be valid.
>
> I think I already said it was not safe at all...
>
> You could try a multiqueue NIC for some interesting effects.
>

There is only one ingress queue, which is dev->ingress_queue, right?

And since the only possible qdisc on ingress is sch_ingress, looking
at ingress_enqueue(), I don't see anything that dangerous.

As I said in the cover letter, I may still be missing something in the
qdisc layer, but it doesn't look related to multiqueue. Mind being more
specific?
Stephen Hemminger Jan. 10, 2014, 12:49 a.m. UTC | #3
On Thu, 9 Jan 2014 16:30:12 -0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> > Really, you'll have to explain in the changelog why you think this is
> > safe, because I really do not see how this can be valid.
> >
> > I think I already said it was not safe at all...
> >
> > You could try a multiqueue NIC for some interesting effects.
> >
> 
> There is only one ingress queue, which is dev->ingress_queue, right?
> 
> And since the only possible qdisc on ingress is sch_ingress, looking
> at ingress_enqueue(), I don't see anything that dangerous.
> 
> As I said in the cover letter, I may still be missing something in the
> qdisc layer, but it doesn't look related to multiqueue. Mind being more
> specific?

I think what Eric is saying is that on a multi-queue NIC, multiple
queues can be receiving packets and then sending them on to the ingress
queue discipline. Up until your patch that code itself was protected
by qdisc_lock and did not have to worry about any SMP issues. Now, any
qdisc attached on ingress could run in parallel. This would break all
the code in those queue disciplines. Think of the simplest case
of policing.
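
Concretely, a policer is a read-modify-write on shared token-bucket
state. Here is a deliberately simplified sketch in the spirit of
act_police; the field and function names are illustrative, not the
kernel's:

#include <stdint.h>

/* Illustrative state only -- not the actual act_police structure. */
struct police_state {
	int64_t toks;	/* tokens currently available */
	int64_t t_c;	/* timestamp of the last refill */
};

/* Returns 1 if the packet conforms to the configured rate. */
static int police_one_packet(struct police_state *p, int64_t now,
			     unsigned int len)
{
	/* Without qdisc_lock (or another lock) two CPUs can interleave
	 * this read-modify-write, both see enough tokens, and both let
	 * their packet through -- the policer over-admits. */
	p->toks += now - p->t_c;	/* refill; rate factored out */
	p->t_c = now;
	if (p->toks >= (int64_t)len) {
		p->toks -= len;
		return 1;		/* conform */
	}
	return 0;			/* exceed: drop or reclassify */
}
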
John Fastabend Jan. 10, 2014, 1:06 a.m. UTC | #4
On 1/9/2014 4:49 PM, Stephen Hemminger wrote:
> On Thu, 9 Jan 2014 16:30:12 -0800
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>> Really, you'll have to explain in the changelog why you think this is
>>> safe, because I really do not see how this can be valid.
>>>
>>> I think I already said it was not safe at all...
>>>
>>> You could try a multiqueue NIC for some interesting effects.
>>>
>>
>> There is only one ingress queue, which is dev->ingress_queue, right?
>>
>> And since the only possible qdisc on ingress is sch_ingress, looking
>> at ingress_enqueue(), I don't see anything that dangerous.
>>
>> As I said in the cover letter, I may still be missing something in the
>> qdisc layer, but it doesn't look related to multiqueue. Mind being more
>> specific?
>
> I think what Eric is saying is that on a multi-queue NIC, multiple
> queues can be receiving packets and then sending them on to the ingress
> queue discipline. Up until your patch that code itself was protected
> by qdisc_lock and did not have to worry about any SMP issues. Now, any
> qdisc attached on ingress could run in parallel. This would break all
> the code in those queue disciplines. Think of the simplest case
> of policing.

Just to reiterate, you need to go through each and every qdisc,
classifier, and action and verify it is safe to run in parallel. Take
a look at how the skb lists are managed in the qdiscs. If we want to
do this we need to make these changes in some coherent way, because
it touches lots of pieces.
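
For instance, a fifo-style enqueue of this era boils down to the
following (paraphrased from sch_fifo.c and sch_generic.h, not an
exact copy):

static inline int fifo_style_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
	/* __skb_queue_tail() is the *unlocked* list helper: it rewrites
	 * prev/next pointers and bumps qlen non-atomically, assuming
	 * the caller holds qdisc_lock(sch). Two CPUs running this at
	 * once can corrupt the list. */
	__skb_queue_tail(&sch->q, skb);
	sch->qstats.backlog += qdisc_pkt_len(skb);
	return NET_XMIT_SUCCESS;
}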

Also, your stats are going to get hosed; none of the bstats or qstats
code supports this. I'll send out the classifier set later tonight
if you want. I got stalled going through the actions.
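
The stats problem is visible right in the helpers: bstats_update() in
include/net/sch_generic.h is a pair of plain, non-atomic increments on
counters shared by every CPU, roughly:

/* Paraphrase of the era's bstats_update(); concurrent updaters can
 * lose increments, so byte/packet counts silently drift. */
static inline void bstats_update_sketch(struct gnet_stats_basic_packed *b,
					const struct sk_buff *skb)
{
	b->bytes += qdisc_pkt_len(skb);
	b->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
}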

Finally, any global state in those qdiscs is going to drive performance
down, so many of them would likely need to be redesigned.

.John



Cong Wang Jan. 10, 2014, 1:09 a.m. UTC | #5
On Thu, Jan 9, 2014 at 4:49 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> I think what Eric is saying is that on a multi-queue NIC, multiple
> queues can be receiving packets and then sending them on to the ingress
> queue discipline. Up until your patch that code itself was protected
> by qdisc_lock and did not have to worry about any SMP issues. Now, any
> qdisc attached on ingress could run in parallel. This would break all
> the code in those queue disciplines. Think of the simplest case
> of policing.

I noticed that we have multiqueue for rx, which is dev->_rx[], but they
still share the same dev->ingress_queue. Packets on different CPUs from
the same device still need to run through the same ingress qdisc.

Except for ifb, an ingress qdisc can only do filtering (with some
actions), therefore I still don't see the problem.

The ifb device switches to its "egress" path to do policing, which is
safe since it does not use ->ingress_queue any more.
Eric Dumazet Jan. 10, 2014, 1:11 a.m. UTC | #6
On Thu, 2014-01-09 at 16:30 -0800, Cong Wang wrote:
> On Thu, Jan 9, 2014 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> > Really, you'll have to explain in the changelog why you think this is
> > safe, because I really do not see how this can be valid.
> >
> > I think I already said it was not safe at all...
> >
> > You could try a multiqueue NIC for some interesting effects.
> >
> 
> There is only one ingress queue, which is dev->ingress_queue, right?

Yes. And you can have multiple cpus trying to use it at the same time.

> And since the only possible qdisc on ingress is sch_ingress, looking
> at ingress_enqueue(), I don't see anything that dangerous.
> 
> As I said in the cover letter, I may still be missing something in the
> qdisc layer, but it doesn't look related to multiqueue. Mind being more
> specific?

Well, there is one qdisc, and if your NIC is multiqueue, with for
example 32 queues, you can have 32 CPUs happily using this qdisc at once.

That's why you need the spinlock.

I am very afraid of seeing you change this path without understanding
how it is used.


Cong Wang Jan. 10, 2014, 1:21 a.m. UTC | #7
On Thu, Jan 9, 2014 at 5:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Well, there is one qdisc, and if your NIC is multiqueue, with for
> example 32 queues, you can have 32 CPUs happily using this qdisc at once.

I did see this, but still don't see the problem.

>
> That's why you need the spinlock.
>

It looks like you are saying we queue the packets somewhere in
qdisc_enqueue_root() and therefore need a spinlock, but looking at
the code:

static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
        qdisc_calculate_pkt_len(skb, sch);
        return sch->enqueue(skb, sch);
}

static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
{
        qdisc_skb_cb(skb)->pkt_len = skb->len;
        return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
}

so it is almost equivalent to calling ->enqueue directly, which for
ingress is ingress_enqueue(). Apart from updating some stats, the only
thing it does is call tc_classify().

So where is the problem at the qdisc layer? I must be missing something
obvious...
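
(For reference, ingress_enqueue() in net/sched/sch_ingress.c at this
point looks roughly like this -- paraphrased, not an exact copy. Note
that even this path touches shared state, namely the bstats and qstats
counters:)

static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
{
	struct ingress_qdisc_data *p = qdisc_priv(sch);
	struct tcf_result res;
	int result;

	/* Run the attached filters (and their actions). */
	result = tc_classify(skb, p->filter_list, &res);

	/* Shared, non-atomic counters: racy without qdisc_lock. */
	qdisc_bstats_update(sch, skb);
	switch (result) {
	case TC_ACT_SHOT:
		result = TC_ACT_SHOT;
		sch->qstats.drops++;
		break;
	case TC_ACT_STOLEN:
	case TC_ACT_QUEUED:
		result = TC_ACT_STOLEN;
		break;
	case TC_ACT_RECLASSIFY:
	case TC_ACT_OK:
		skb->tc_index = TC_H_MIN(res.classid);
	default:
		result = TC_ACT_OK;
		break;
	}

	return result;
}
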
Jamal Hadi Salim Jan. 12, 2014, 12:30 p.m. UTC | #8
On 01/09/14 20:06, John Fastabend wrote:

> Just to reiterate, you need to go through each and every qdisc,
> classifier, and action and verify it is safe to run in parallel. Take
> a look at how the skb lists are managed in the qdiscs. If we want to
> do this we need to make these changes in some coherent way, because
> it touches lots of pieces.
>

Indeed. Everything assumes the global qdisc lock is protecting it.
Actually, actions are probably in the best shape at the moment, because
their lock is fine-grained to just the action instance and protects
both the control and data paths. But filters have state littered
everywhere. Egress qdiscs, as you mention, have queues at multiple
levels, etc.
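
(The shape described here -- each action instance serializing on its
own lock in the packet hook -- looks roughly like this, paraphrased
from act_police of this era; the function name is illustrative:)

static int act_hook_sketch(struct sk_buff *skb, const struct tc_action *a,
			   struct tcf_result *res)
{
	struct tcf_police *police = a->priv;

	/* Fine-grained: only this action instance is serialized. The
	 * same lock is taken on the (rtnl-protected) control path, so
	 * data and control paths cannot interleave. */
	spin_lock(&police->tcf_lock);
	/* ... update instance state, pick a verdict ... */
	spin_unlock(&police->tcf_lock);
	return TC_ACT_OK;
}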

> Also, your stats are going to get hosed; none of the bstats or qstats
> code supports this.

Stats are probably the easiest to "fix".
Didn't Eric (or somebody else) convert netdev-level stats to use
seqcounts? Would that idea not be applicable here?
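
(A sketch of that idea applied here: per-cpu counters written
locklessly on the datapath and folded together on the rtnl-protected
dump path. The structure and helper names are illustrative, not
existing kernel API, though u64_stats_sync itself is real:)

#include <linux/u64_stats_sync.h>

struct pcpu_qdisc_bstats {
	u64			bytes;
	u64			packets;
	struct u64_stats_sync	syncp;	/* 32-bit readers retry on torn reads */
};

static inline void pcpu_bstats_update(struct pcpu_qdisc_bstats __percpu *stats,
				      const struct sk_buff *skb)
{
	struct pcpu_qdisc_bstats *s = this_cpu_ptr(stats);

	/* Each CPU owns its own counters, so the datapath needs no
	 * lock; the dump path sums all CPUs' copies under the
	 * seqcount. */
	u64_stats_update_begin(&s->syncp);
	s->bytes += qdisc_pkt_len(skb);
	s->packets++;
	u64_stats_update_end(&s->syncp);
}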

> I'll send out the classifier set later tonight
> if you want. I got stalled going through the actions.
>

The thing to note is: actions can be shared across filters, netdevices,
and CPUs. By default they are not shared across filters and netdevices;
that is a config option. You still have to worry about sharing across
CPUs, which will happen because a flow can be spread across CPUs. You
could probably get rid of the lock if you can show that you can make
the data and control paths mutually exclusive (rtnl will protect the
control path).

> Finally, any global state in those qdiscs is going to drive performance
> down, so many of them would likely need to be redesigned.
>

I feel like per-cpu qdiscs are the best surgery at the moment.

cheers,
jamal


Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index ce01847..e357d05 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3376,10 +3376,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 
 	q = rxq->qdisc;
 	if (q != &noop_qdisc) {
-		spin_lock(qdisc_lock(q));
 		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
 			result = qdisc_enqueue_root(skb, q);
-		spin_unlock(qdisc_lock(q));
 	}
 
 	return result;