Message ID | 20081017201210.GA2527@ami.dom.local |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
From: Jarek Poplawski <jarkao2@gmail.com> Date: Fri, 17 Oct 2008 22:12:10 +0200 > pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc. > > After introducing qdisc->ops->peek() method the only remaining user of > qdisc->ops->requeue() is netem_enqueue() using this for packet > re-ordering. According to Patrick McHardy: "a lot of the functionality > of netem requires the inner tfifo anyways and rate-limiting is usually > done on top of netem. So I would suggest so either hard-wire the tfifo > qdisc or at least make the assumption that inner qdiscs are work- > conserving." This patch tries the former. > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> This is an interesting patch. But the thing that strikes me is this: Why don't we just let sch_netem do the reordering inside of itself entirely and just get rid of all of this ->requeue() business? sch_netem is just a black box, like any other packet scheduler node in the tree, and so it can internally do the reordering with a self managed packet list or similar. All of this can be hidden inside of it's ->dequeue() with some pkt_sch watchdog timer that fires to prevent stale packets sitting in the reorder queue forever. Anyways, just and idea and RFC, just like this patch ;-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 21 Oct 2008 16:36:05 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Fri, 17 Oct 2008 22:12:10 +0200 > > > pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc. > > > > After introducing qdisc->ops->peek() method the only remaining user of > > qdisc->ops->requeue() is netem_enqueue() using this for packet > > re-ordering. According to Patrick McHardy: "a lot of the functionality > > of netem requires the inner tfifo anyways and rate-limiting is usually > > done on top of netem. So I would suggest so either hard-wire the tfifo > > qdisc or at least make the assumption that inner qdiscs are work- > > conserving." This patch tries the former. > > > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > This is an interesting patch. > > But the thing that strikes me is this: Why don't we just let sch_netem do > the reordering inside of itself entirely and just get rid of all of this > ->requeue() business? > > sch_netem is just a black box, like any other packet scheduler node in > the tree, and so it can internally do the reordering with a self managed > packet list or similar. All of this can be hidden inside of it's ->dequeue() > with some pkt_sch watchdog timer that fires to prevent stale packets sitting > in the reorder queue forever. > > Anyways, just and idea and RFC, just like this patch ;-) The problem is that jamal talked me into having netem as a classful qdisc, instead of doing its own rate control. People like to do use TBF as inner qdisc, and do reordering. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote: > On Tue, 21 Oct 2008 16:36:05 -0700 (PDT) > David Miller <davem@davemloft.net> wrote: > > > From: Jarek Poplawski <jarkao2@gmail.com> > > Date: Fri, 17 Oct 2008 22:12:10 +0200 > > > > > pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc. > > > > > > After introducing qdisc->ops->peek() method the only remaining user of > > > qdisc->ops->requeue() is netem_enqueue() using this for packet > > > re-ordering. According to Patrick McHardy: "a lot of the functionality > > > of netem requires the inner tfifo anyways and rate-limiting is usually > > > done on top of netem. So I would suggest so either hard-wire the tfifo > > > qdisc or at least make the assumption that inner qdiscs are work- > > > conserving." This patch tries the former. > > > > > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > > > This is an interesting patch. > > > > But the thing that strikes me is this: Why don't we just let sch_netem do > > the reordering inside of itself entirely and just get rid of all of this > > ->requeue() business? We just let for this here with sch_netem's default tfifo qdisc. > > > > sch_netem is just a black box, like any other packet scheduler node in > > the tree, and so it can internally do the reordering with a self managed > > packet list or similar. All of this can be hidden inside of it's ->dequeue() > > with some pkt_sch watchdog timer that fires to prevent stale packets sitting > > in the reorder queue forever. > > > > Anyways, just and idea and RFC, just like this patch ;-) > > The problem is that jamal talked me into having netem as a classful qdisc, > instead of doing its own rate control. People like to do use TBF as inner qdisc, > and do reordering. If it's only this kind of usage we could export tfifo and let use this as a TBF's (etc.) leaf. Of course, this would require changes in those people scripts. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Fri, 17 Oct 2008 22:12:10 +0200 > >> pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc. >> >> After introducing qdisc->ops->peek() method the only remaining user of >> qdisc->ops->requeue() is netem_enqueue() using this for packet >> re-ordering. According to Patrick McHardy: "a lot of the functionality >> of netem requires the inner tfifo anyways and rate-limiting is usually >> done on top of netem. So I would suggest so either hard-wire the tfifo >> qdisc or at least make the assumption that inner qdiscs are work- >> conserving." This patch tries the former. >> >> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > This is an interesting patch. > > But the thing that strikes me is this: Why don't we just let sch_netem do > the reordering inside of itself entirely and just get rid of all of this > ->requeue() business? I fully agree, keeping all the ->requeue crap around just for this cornercase doesn't seem like a good decision. Most of the ->requeu > sch_netem is just a black box, like any other packet scheduler node in > the tree, and so it can internally do the reordering with a self managed > packet list or similar. All of this can be hidden inside of it's ->dequeue() > with some pkt_sch watchdog timer that fires to prevent stale packets sitting > in the reorder queue forever. > > Anyways, just and idea and RFC, just like this patch ;-) > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Damn hotkeys for sending incomplete mails :)] David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Fri, 17 Oct 2008 22:12:10 +0200 > >> pkt_sched: sch_netem: Limit packet re-ordering functionality to tfifo qdisc. >> >> After introducing qdisc->ops->peek() method the only remaining user of >> qdisc->ops->requeue() is netem_enqueue() using this for packet >> re-ordering. According to Patrick McHardy: "a lot of the functionality >> of netem requires the inner tfifo anyways and rate-limiting is usually >> done on top of netem. So I would suggest so either hard-wire the tfifo >> qdisc or at least make the assumption that inner qdiscs are work- >> conserving." This patch tries the former. >> >> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > This is an interesting patch. > > But the thing that strikes me is this: Why don't we just let sch_netem do > the reordering inside of itself entirely and just get rid of all of this > ->requeue() business? I fully agree, keeping all the ->requeue crap around just for this cornercase doesn't seem like a good decision. Most of the ->requeue functions are completely inconsistent in their behaviour anyways: some undo ->enqueue state changes (which is obviously broken for netem), some don't, some redo checks from ->enqueue (necessary for netem), some don't (TBF), ... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jarek Poplawski wrote: > On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote: >> On Tue, 21 Oct 2008 16:36:05 -0700 (PDT) >> David Miller <davem@davemloft.net> wrote: >> >>> sch_netem is just a black box, like any other packet scheduler node in >>> the tree, and so it can internally do the reordering with a self managed >>> packet list or similar. All of this can be hidden inside of it's ->dequeue() >>> with some pkt_sch watchdog timer that fires to prevent stale packets sitting >>> in the reorder queue forever. >>> >>> Anyways, just and idea and RFC, just like this patch ;-) >> The problem is that jamal talked me into having netem as a classful qdisc, >> instead of doing its own rate control. People like to do use TBF as inner qdisc, >> and do reordering. > > If it's only this kind of usage we could export tfifo and let use this > as a TBF's (etc.) leaf. Of course, this would require changes in those > people scripts. In that case we might as well teach them to use TBF as *parent* of netem (and I'd vote to do that and kill requeue). But we can argue about this forever without any progress. The question is simple - should we enforce a reasonable qdisc structure and kill ->requeue or keep it around forever. Keep in mind that there is no loss of functionality by using TBF as parent and that we can do this gradually so users have a chance to fix their scripts, should anyone really use TBF as inner qdisc. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 22, 2008 at 06:00:56PM +0200, Patrick McHardy wrote: > Jarek Poplawski wrote: >> On Tue, Oct 21, 2008 at 04:51:29PM -0700, Stephen Hemminger wrote: >>> On Tue, 21 Oct 2008 16:36:05 -0700 (PDT) >>> David Miller <davem@davemloft.net> wrote: >>> >>>> sch_netem is just a black box, like any other packet scheduler node in >>>> the tree, and so it can internally do the reordering with a self managed >>>> packet list or similar. All of this can be hidden inside of it's ->dequeue() >>>> with some pkt_sch watchdog timer that fires to prevent stale packets sitting >>>> in the reorder queue forever. >>>> >>>> Anyways, just and idea and RFC, just like this patch ;-) >>> The problem is that jamal talked me into having netem as a classful qdisc, >>> instead of doing its own rate control. People like to do use TBF as inner qdisc, >>> and do reordering. >> >> If it's only this kind of usage we could export tfifo and let use this >> as a TBF's (etc.) leaf. Of course, this would require changes in those >> people scripts. > > In that case we might as well teach them to use TBF as *parent* > of netem (and I'd vote to do that and kill requeue). > > But we can argue about this forever without any progress. The > question is simple - should we enforce a reasonable qdisc structure > and kill ->requeue or keep it around forever. Keep in mind that there > is no loss of functionality by using TBF as parent and that we > can do this gradually so users have a chance to fix their scripts, > should anyone really use TBF as inner qdisc. > I'm not sure we think about the same: this tfifo idea doesn't need ->requeue() at all. This would go through TBF's or prio's (etc.) ->enqueue(), and only tfifo's ->enqueue(), if it's used as a leaf, checks the qdisc flag and can reorder. But if it's useless, no problem. I can redo this patch without this qdisc flag. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jarek Poplawski wrote: > On Wed, Oct 22, 2008 at 06:00:56PM +0200, Patrick McHardy wrote: >> Jarek Poplawski wrote: >>> If it's only this kind of usage we could export tfifo and let use this >>> as a TBF's (etc.) leaf. Of course, this would require changes in those >>> people scripts. >> In that case we might as well teach them to use TBF as *parent* >> of netem (and I'd vote to do that and kill requeue). >> >> But we can argue about this forever without any progress. The >> question is simple - should we enforce a reasonable qdisc structure >> and kill ->requeue or keep it around forever. Keep in mind that there >> is no loss of functionality by using TBF as parent and that we >> can do this gradually so users have a chance to fix their scripts, >> should anyone really use TBF as inner qdisc. >> > > I'm not sure we think about the same: this tfifo idea doesn't need > ->requeue() at all. This would go through TBF's or prio's (etc.) > ->enqueue(), and only tfifo's ->enqueue(), if it's used as a leaf, > checks the qdisc flag and can reorder. I see. So both ways would work fine to get rid of requeue. The flag doesn't seem to be necessary though since tfifo already does reordering based on time_to_send. > But if it's useless, no problem. I can redo this patch without this > qdisc flag. Well, you're doing the work, so you decide. I'm undecided myself, the main issues I see are: - we might have to reeducate users twice if we decide to enforce more structure later on - a lot of other qdiscs still don't work as inner qdiscs of netem: - any reordering qdisc can cause stalls since netem_dequeue expects to get packets ordered by time_to_send. This means we can't use cbq, hfsc, htb, prio, sfq, leaving atm, dsmark, netem, red, gred, tbf and the fifos. I guess we can strike atm as "makes no sense" and dsmark as "obsolete since 10(?) years". - netem can't be used as inner qdisc since it would corrupt the skb's cb So the usable inner qdiscs are: tbf, red, gred, *fifo. The fact that over 50% of the qdiscs you could use can cause misbehaviour and TBF, red and gred can be used as upper qdiscs without any loss of functionality makes me think netem simply shouldn't be classful. So actually I am decided :) I think netem shouldn't be classful. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 22, 2008 at 07:32:02PM +0200, Patrick McHardy wrote: ... > Well, you're doing the work, so you decide. I'm undecided myself, > the main issues I see are: Well, I've only tried to finish David's "Killing qdisc->ops->requeue()", which I had broken before. Then somebody framed me into this ->peek() /netem thing... ... > So actually I am decided :) I think netem shouldn't be classful. OK, I added RFC to the subject. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 9dcb5bf..9157766 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -45,6 +45,7 @@ struct Qdisc #define TCQ_F_BUILTIN 1 #define TCQ_F_THROTTLED 2 #define TCQ_F_INGRESS 4 +#define TCQ_F_REQUEUE 8 int padded; struct Qdisc_ops *ops; struct qdisc_size_table *stab; diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 3080bd6..a30f5b6 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -233,7 +233,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) */ cb->time_to_send = psched_get_time(); q->counter = 0; - ret = q->qdisc->ops->requeue(skb, q->qdisc); + q->qdisc->flags |= TCQ_F_REQUEUE; + ret = qdisc_enqueue(skb, q->qdisc); + if (unlikely(q->qdisc->flags & TCQ_F_REQUEUE)) { + q->qdisc->flags &= ~TCQ_F_REQUEUE; + if (net_ratelimit()) + printk(KERN_WARNING "netem_enqueue: re-ordering" + " unsupported; use default (tfifo) qdisc."); + } } if (likely(ret == NET_XMIT_SUCCESS)) { @@ -478,6 +485,15 @@ static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch) psched_time_t tnext = netem_skb_cb(nskb)->time_to_send; struct sk_buff *skb; + if (sch->flags & TCQ_F_REQUEUE) { + sch->flags &= ~TCQ_F_REQUEUE; + __skb_queue_head(list, nskb); + sch->qstats.backlog += qdisc_pkt_len(nskb); + sch->qstats.requeues++; + + return NET_XMIT_SUCCESS; + } + if (likely(skb_queue_len(list) < q->limit)) { /* Optimize for add at tail */ if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {