Message ID | 1342271787.3265.9632.camel@edumazet-glaptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, 14 Jul 2012 15:16:27 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > netem does an early orphaning of skbs. Doing so breaks TCP Small Queue > or any mechanism relying on socket sk_wmem_alloc feedback. > > Ideally, we should perform this orphaning after the rate module and > before the delay module, to mimic what happens on a real link : > > skb orphaning is indeed normally done at TX completion, before the > transit on the link. > > +-------+ +--------+ +---------------+ +-----------------+ > + Qdisc +---> Device +--> TX completion +--> links / hops +-> > + + + xmit + + skb orphaning + + propagation + > +-------+ +--------+ +---------------+ +-----------------+ > < rate limiting > < delay, drops, reorders > > > If netem is used without delay feature (drops, reorders, rate > limiting), then we should avoid early skb orphaning, to keep pressure > on sockets as long as packets are still in qdisc queue. > > Ideally, netem should be refactored to implement delay module > as the last stage. Current algorithm merges the two phases > (rate limiting + delay) so its not correct. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Hagen Paul Pfeifer <hagen@jauu.net> > Cc: Mark Gordon <msg@google.com> > Cc: Andreas Terzis <aterzis@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > --- > net/sched/sch_netem.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c > index c412ad0..298c0dd 100644 > --- a/net/sched/sch_netem.c > +++ b/net/sched/sch_netem.c > @@ -380,7 +380,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) > return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; > } > > - skb_orphan(skb); > + /* If a delay is expected, orphan the skb. (orphaning usually takes > + * place at TX completion time, so _before_ the link transit delay) > + * Ideally, this orphaning should be done after the rate limiting > + * module, because this breaks TCP Small Queue, and other mechanisms > + * based on socket sk_wmem_alloc. > + */ > + if (q->latency || q->jitter) > + skb_orphan(skb); > > /* > * If we need to duplicate packet, then re-insert at top of the > Acked-by: Stephen Hemminger <shemminger@vyatta.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Stephen Hemminger <shemminger@vyatta.com> Date: Sat, 14 Jul 2012 14:53:33 -0700 > On Sat, 14 Jul 2012 15:16:27 +0200 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> From: Eric Dumazet <edumazet@google.com> >> >> netem does an early orphaning of skbs. Doing so breaks TCP Small Queue >> or any mechanism relying on socket sk_wmem_alloc feedback. >> >> Ideally, we should perform this orphaning after the rate module and >> before the delay module, to mimic what happens on a real link : >> >> skb orphaning is indeed normally done at TX completion, before the >> transit on the link. >> >> +-------+ +--------+ +---------------+ +-----------------+ >> + Qdisc +---> Device +--> TX completion +--> links / hops +-> >> + + + xmit + + skb orphaning + + propagation + >> +-------+ +--------+ +---------------+ +-----------------+ >> < rate limiting > < delay, drops, reorders > >> >> If netem is used without delay feature (drops, reorders, rate >> limiting), then we should avoid early skb orphaning, to keep pressure >> on sockets as long as packets are still in qdisc queue. >> >> Ideally, netem should be refactored to implement delay module >> as the last stage. Current algorithm merges the two phases >> (rate limiting + delay) so its not correct. >> >> Signed-off-by: Eric Dumazet <edumazet@google.com> ... > Acked-by: Stephen Hemminger <shemminger@vyatta.com> Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index c412ad0..298c0dd 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -380,7 +380,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; } - skb_orphan(skb); + /* If a delay is expected, orphan the skb. (orphaning usually takes + * place at TX completion time, so _before_ the link transit delay) + * Ideally, this orphaning should be done after the rate limiting + * module, because this breaks TCP Small Queue, and other mechanisms + * based on socket sk_wmem_alloc. + */ + if (q->latency || q->jitter) + skb_orphan(skb); /* * If we need to duplicate packet, then re-insert at top of the