diff mbox

pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.

Message ID 20081017201210.GA2527@ami.dom.local
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jarek Poplawski Oct. 17, 2008, 8:12 p.m. UTC
On Fri, Oct 17, 2008 at 04:12:03PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Fri, Oct 17, 2008 at 02:33:23PM +0200, Patrick McHardy wrote:
>>>> @@ -233,7 +233,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>>>  		 */
>>>>  		cb->time_to_send = psched_get_time();
>>>>  		q->counter = 0;
>>>> -		ret = q->qdisc->ops->requeue(skb, q->qdisc);
>>>> +		q->qdisc->flags |= TCQ_F_REQUEUE;
>>>> +		ret = qdisc_equeue(skb, q->qdisc);
>>>> +		q->qdisc->flags &= ~TCQ_F_REQUEUE;
>>> Well, the inner qdisc would still need to logic to order packets
>>> apprioriately.
>>
>> I'm not sure I was understood: the idea is to do something like
>> in this example in tfifo_enqueue() in all leaf qdiscs like fifo
>> etc. too, so to redirect their ->enqueue() to their ->requeue()
>> which usually is qdisc_requeue() (or to it directly if needed).
>
> Yes, I misunderstood this, I though the intention was to get
> rid of requeue entirely.

I was less ambitious and thought about simplifying this at least, but
if you think we can go further, it's OK with me. Then we can do it
only in tfifo. If qdisc_requeue() does the proper logic for it now,
I guess it should be enough to open code this into tfifo_enqueue()
(so we could kill qdisc_requeue() later). Using this TCQ_F_REQUEUE
flag only for this looks a bit wasteful, but I can't see anything
smarter at the moment. 

>>> Its probably not that hard, but as I said, I don't
>>> think its necessary at all. It only makes a difference with a
>>> non-work-conserving inner qdisc, but a lot of the functionality of
>>> netem requires the inner tfifo anyways and rate-limiting is usually
>>> done on top of netem. So I would suggest so either hard-wire the
>>> tfifo qdisc or at least make the assumption that inner qdiscs are
>>> work-conserving.

I think Stephen could be interested with this change so I added him to Cc.

Thanks,
Jarek P.
------------------->

pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.

After introducing qdisc->ops->peek() method the only remaining user of
qdisc->ops->requeue() is netem_enqueue() using this for packet
re-ordering. According to Patrick McHardy: "a lot of the functionality
of netem requires the inner tfifo anyways and rate-limiting is usually
done on top of netem. So I would suggest so either hard-wire the tfifo
qdisc or at least make the assumption that inner qdiscs are work-
conserving." This patch tries the former.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 include/net/sch_generic.h |    1 +
 net/sched/sch_netem.c     |   18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Oct. 21, 2008, 11:36 p.m. UTC | #1
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Fri, 17 Oct 2008 22:12:10 +0200

> pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.
> 
> After introducing qdisc->ops->peek() method the only remaining user of
> qdisc->ops->requeue() is netem_enqueue() using this for packet
> re-ordering. According to Patrick McHardy: "a lot of the functionality
> of netem requires the inner tfifo anyways and rate-limiting is usually
> done on top of netem. So I would suggest so either hard-wire the tfifo
> qdisc or at least make the assumption that inner qdiscs are work-
> conserving." This patch tries the former.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

This is an interesting patch.

But the thing that strikes me is this: Why don't we just let sch_netem do
the reordering inside of itself entirely and just get rid of all of this
->requeue() business?

sch_netem is just a black box, like any other packet scheduler node in
the tree, and so it can internally do the reordering with a self managed
packet list or similar.  All of this can be hidden inside of it's ->dequeue()
with some pkt_sch watchdog timer that fires to prevent stale packets sitting
in the reorder queue forever.

Anyways, just and idea and RFC, just like this patch ;-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Oct. 21, 2008, 11:51 p.m. UTC | #2
On Tue, 21 Oct 2008 16:36:05 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Fri, 17 Oct 2008 22:12:10 +0200
> 
> > pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.
> > 
> > After introducing qdisc->ops->peek() method the only remaining user of
> > qdisc->ops->requeue() is netem_enqueue() using this for packet
> > re-ordering. According to Patrick McHardy: "a lot of the functionality
> > of netem requires the inner tfifo anyways and rate-limiting is usually
> > done on top of netem. So I would suggest so either hard-wire the tfifo
> > qdisc or at least make the assumption that inner qdiscs are work-
> > conserving." This patch tries the former.
> > 
> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> 
> This is an interesting patch.
> 
> But the thing that strikes me is this: Why don't we just let sch_netem do
> the reordering inside of itself entirely and just get rid of all of this
> ->requeue() business?
> 
> sch_netem is just a black box, like any other packet scheduler node in
> the tree, and so it can internally do the reordering with a self managed
> packet list or similar.  All of this can be hidden inside of it's ->dequeue()
> with some pkt_sch watchdog timer that fires to prevent stale packets sitting
> in the reorder queue forever.
> 
> Anyways, just and idea and RFC, just like this patch ;-)

The problem is that jamal talked me into having netem as a classful qdisc,
instead of doing its own rate control.  People like to do use TBF as inner qdisc,
and do reordering.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Oct. 22, 2008, 5:37 a.m. UTC | #3
On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote:
> On Tue, 21 Oct 2008 16:36:05 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Jarek Poplawski <jarkao2@gmail.com>
> > Date: Fri, 17 Oct 2008 22:12:10 +0200
> > 
> > > pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.
> > > 
> > > After introducing qdisc->ops->peek() method the only remaining user of
> > > qdisc->ops->requeue() is netem_enqueue() using this for packet
> > > re-ordering. According to Patrick McHardy: "a lot of the functionality
> > > of netem requires the inner tfifo anyways and rate-limiting is usually
> > > done on top of netem. So I would suggest so either hard-wire the tfifo
> > > qdisc or at least make the assumption that inner qdiscs are work-
> > > conserving." This patch tries the former.
> > > 
> > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> > 
> > This is an interesting patch.
> > 
> > But the thing that strikes me is this: Why don't we just let sch_netem do
> > the reordering inside of itself entirely and just get rid of all of this
> > ->requeue() business?

We just let for this here with sch_netem's default tfifo qdisc.

> > 
> > sch_netem is just a black box, like any other packet scheduler node in
> > the tree, and so it can internally do the reordering with a self managed
> > packet list or similar.  All of this can be hidden inside of it's ->dequeue()
> > with some pkt_sch watchdog timer that fires to prevent stale packets sitting
> > in the reorder queue forever.
> > 
> > Anyways, just and idea and RFC, just like this patch ;-)
> 
> The problem is that jamal talked me into having netem as a classful qdisc,
> instead of doing its own rate control.  People like to do use TBF as inner qdisc,
> and do reordering.

If it's only this kind of usage we could export tfifo and let use this
as a TBF's (etc.) leaf. Of course, this would require changes in those
people scripts.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 3:57 p.m. UTC | #4
David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Fri, 17 Oct 2008 22:12:10 +0200
> 
>> pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.
>>
>> After introducing qdisc->ops->peek() method the only remaining user of
>> qdisc->ops->requeue() is netem_enqueue() using this for packet
>> re-ordering. According to Patrick McHardy: "a lot of the functionality
>> of netem requires the inner tfifo anyways and rate-limiting is usually
>> done on top of netem. So I would suggest so either hard-wire the tfifo
>> qdisc or at least make the assumption that inner qdiscs are work-
>> conserving." This patch tries the former.
>>
>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> 
> This is an interesting patch.
> 
> But the thing that strikes me is this: Why don't we just let sch_netem do
> the reordering inside of itself entirely and just get rid of all of this
> ->requeue() business?

I fully agree, keeping all the ->requeue crap around just for this
cornercase doesn't seem like a good decision. Most of the ->requeu

> sch_netem is just a black box, like any other packet scheduler node in
> the tree, and so it can internally do the reordering with a self managed
> packet list or similar.  All of this can be hidden inside of it's ->dequeue()
> with some pkt_sch watchdog timer that fires to prevent stale packets sitting
> in the reorder queue forever.
> 
> Anyways, just and idea and RFC, just like this patch ;-)
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 4 p.m. UTC | #5
[Damn hotkeys for sending incomplete mails :)]

David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Fri, 17 Oct 2008 22:12:10 +0200
> 
>> pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc.
>>
>> After introducing qdisc->ops->peek() method the only remaining user of
>> qdisc->ops->requeue() is netem_enqueue() using this for packet
>> re-ordering. According to Patrick McHardy: "a lot of the functionality
>> of netem requires the inner tfifo anyways and rate-limiting is usually
>> done on top of netem. So I would suggest so either hard-wire the tfifo
>> qdisc or at least make the assumption that inner qdiscs are work-
>> conserving." This patch tries the former.
>>
>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> 
> This is an interesting patch.
> 
> But the thing that strikes me is this: Why don't we just let sch_netem do
> the reordering inside of itself entirely and just get rid of all of this
> ->requeue() business?

I fully agree, keeping all the ->requeue crap around just for this
cornercase doesn't seem like a good decision. Most of the ->requeue
functions are completely inconsistent in their behaviour anyways:
some undo ->enqueue state changes (which is obviously broken for
netem), some don't, some redo checks from ->enqueue (necessary for
netem), some don't (TBF), ...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 4 p.m. UTC | #6
Jarek Poplawski wrote:
> On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote:
>> On Tue, 21 Oct 2008 16:36:05 -0700 (PDT)
>> David Miller <davem@davemloft.net> wrote:
>>
>>> sch_netem is just a black box, like any other packet scheduler node in
>>> the tree, and so it can internally do the reordering with a self managed
>>> packet list or similar.  All of this can be hidden inside of it's ->dequeue()
>>> with some pkt_sch watchdog timer that fires to prevent stale packets sitting
>>> in the reorder queue forever.
>>>
>>> Anyways, just and idea and RFC, just like this patch ;-)
>> The problem is that jamal talked me into having netem as a classful qdisc,
>> instead of doing its own rate control.  People like to do use TBF as inner qdisc,
>> and do reordering.
> 
> If it's only this kind of usage we could export tfifo and let use this
> as a TBF's (etc.) leaf. Of course, this would require changes in those
> people scripts.

In that case we might as well teach them to use TBF as *parent*
of netem (and I'd vote to do that and kill requeue).

But we can argue about this forever without any progress. The
question is simple - should we enforce a reasonable qdisc structure
and kill ->requeue or keep it around forever. Keep in mind that there
is no loss of functionality by using TBF as parent and that we
can do this gradually so users have a chance to fix their scripts,
should anyone really use TBF as inner qdisc.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Oct. 22, 2008, 4:49 p.m. UTC | #7
On Wed, Oct 22, 2008 at 06:00:56PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote:
>>> On Tue, 21 Oct 2008 16:36:05 -0700 (PDT)
>>> David Miller <davem@davemloft.net> wrote:
>>>
>>>> sch_netem is just a black box, like any other packet scheduler node in
>>>> the tree, and so it can internally do the reordering with a self managed
>>>> packet list or similar.  All of this can be hidden inside of it's ->dequeue()
>>>> with some pkt_sch watchdog timer that fires to prevent stale packets sitting
>>>> in the reorder queue forever.
>>>>
>>>> Anyways, just and idea and RFC, just like this patch ;-)
>>> The problem is that jamal talked me into having netem as a classful qdisc,
>>> instead of doing its own rate control.  People like to do use TBF as inner qdisc,
>>> and do reordering.
>>
>> If it's only this kind of usage we could export tfifo and let use this
>> as a TBF's (etc.) leaf. Of course, this would require changes in those
>> people scripts.
>
> In that case we might as well teach them to use TBF as *parent*
> of netem (and I'd vote to do that and kill requeue).
>
> But we can argue about this forever without any progress. The
> question is simple - should we enforce a reasonable qdisc structure
> and kill ->requeue or keep it around forever. Keep in mind that there
> is no loss of functionality by using TBF as parent and that we
> can do this gradually so users have a chance to fix their scripts,
> should anyone really use TBF as inner qdisc.
>

I'm not sure we think about the same: this tfifo idea doesn't need
->requeue() at all. This would go through TBF's or prio's (etc.)
->enqueue(), and only tfifo's ->enqueue(), if it's used as a leaf,
checks the qdisc flag and can reorder.

But if it's useless, no problem. I can redo this patch without this
qdisc flag.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Oct. 22, 2008, 5:32 p.m. UTC | #8
Jarek Poplawski wrote:
> On Wed, Oct 22, 2008 at 06:00:56PM +0200, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
>>> If it's only this kind of usage we could export tfifo and let use this
>>> as a TBF's (etc.) leaf. Of course, this would require changes in those
>>> people scripts.
>> In that case we might as well teach them to use TBF as *parent*
>> of netem (and I'd vote to do that and kill requeue).
>>
>> But we can argue about this forever without any progress. The
>> question is simple - should we enforce a reasonable qdisc structure
>> and kill ->requeue or keep it around forever. Keep in mind that there
>> is no loss of functionality by using TBF as parent and that we
>> can do this gradually so users have a chance to fix their scripts,
>> should anyone really use TBF as inner qdisc.
>>
> 
> I'm not sure we think about the same: this tfifo idea doesn't need
> ->requeue() at all. This would go through TBF's or prio's (etc.)
> ->enqueue(), and only tfifo's ->enqueue(), if it's used as a leaf,
> checks the qdisc flag and can reorder.

I see. So both ways would work fine to get rid of requeue. The
flag doesn't seem to be necessary though since tfifo already
does reordering based on time_to_send.

> But if it's useless, no problem. I can redo this patch without this
> qdisc flag.

Well, you're doing the work, so you decide. I'm undecided myself,
the main issues I see are:

- we might have to reeducate users twice if we decide to enforce
   more structure later on

- a lot of other qdiscs still don't work as inner qdiscs of netem:

   - any reordering qdisc can cause stalls since netem_dequeue
     expects to get packets ordered by time_to_send. This means we
     can't use cbq, hfsc, htb, prio, sfq, leaving atm, dsmark, netem,
     red, gred, tbf and the fifos. I guess we can strike atm as
     "makes no sense" and dsmark as "obsolete since 10(?) years".

   - netem can't be used as inner qdisc since it would corrupt the
     skb's cb

So the usable inner qdiscs are: tbf, red, gred, *fifo. The fact
that over 50% of the qdiscs you could use can cause misbehaviour
and TBF, red and gred can be used as upper qdiscs without any loss
of functionality makes me think netem simply shouldn't be classful.

So actually I am decided :) I think netem shouldn't be classful.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Oct. 22, 2008, 5:53 p.m. UTC | #9
On Wed, Oct 22, 2008 at 07:32:02PM +0200, Patrick McHardy wrote:
...
> Well, you're doing the work, so you decide. I'm undecided myself,
> the main issues I see are:

Well, I've only tried to finish David's "Killing qdisc->ops->requeue()",
which I had broken before. Then somebody framed me into this ->peek()
/netem thing...

...
> So actually I am decided :) I think netem shouldn't be classful.

OK, I added RFC to the subject.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9dcb5bf..9157766 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -45,6 +45,7 @@  struct Qdisc
 #define TCQ_F_BUILTIN	1
 #define TCQ_F_THROTTLED	2
 #define TCQ_F_INGRESS	4
+#define TCQ_F_REQUEUE	8
 	int			padded;
 	struct Qdisc_ops	*ops;
 	struct qdisc_size_table	*stab;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 3080bd6..a30f5b6 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -233,7 +233,14 @@  static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		 */
 		cb->time_to_send = psched_get_time();
 		q->counter = 0;
-		ret = q->qdisc->ops->requeue(skb, q->qdisc);
+		q->qdisc->flags |= TCQ_F_REQUEUE;
+		ret = qdisc_enqueue(skb, q->qdisc);
+		if (unlikely(q->qdisc->flags & TCQ_F_REQUEUE)) {
+			q->qdisc->flags &= ~TCQ_F_REQUEUE;
+			if (net_ratelimit())
+				printk(KERN_WARNING "netem_enqueue: re-ordering"
+				    " unsupported; use default (tfifo) qdisc.");
+		}
 	}
 
 	if (likely(ret == NET_XMIT_SUCCESS)) {
@@ -478,6 +485,15 @@  static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	psched_time_t tnext = netem_skb_cb(nskb)->time_to_send;
 	struct sk_buff *skb;
 
+	if (sch->flags & TCQ_F_REQUEUE) {
+		sch->flags &= ~TCQ_F_REQUEUE;
+		__skb_queue_head(list, nskb);
+		sch->qstats.backlog += qdisc_pkt_len(nskb);
+		sch->qstats.requeues++;
+
+		return NET_XMIT_SUCCESS;
+	}
+
 	if (likely(skb_queue_len(list) < q->limit)) {
 		/* Optimize for add at tail */
 		if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {