| Message ID | 1340903237.13187.151.camel@edumazet-glaptop |
|---|---|
| State | Superseded, archived |
| Delegated to | David Miller |
On Thu, Jun 28, 2012 at 10:07 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> At enqueue time, check sojourn time of packet at head of the queue,
> and return NET_XMIT_CN instead of NET_XMIT_SUCCESS if this sojourn
> time is above codel @target.
>
> This permits local TCP stack to call tcp_enter_cwr() and reduce its cwnd
> without drops (for example if ECN is not enabled for the flow)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Dave Taht <dave.taht@bufferbloat.net>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Matt Mathis <mattmathis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---
>  include/linux/pkt_sched.h |    1 +
>  include/net/codel.h       |    2 +-
>  net/sched/sch_fq_codel.c  |   19 +++++++++++++++----
>  3 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
> index 32aef0a..4d409a5 100644
> --- a/include/linux/pkt_sched.h
> +++ b/include/linux/pkt_sched.h
> @@ -714,6 +714,7 @@ struct tc_fq_codel_qd_stats {
>  	 */
>  	__u32	new_flows_len;	/* count of flows in new list */
>  	__u32	old_flows_len;	/* count of flows in old list */
> +	__u32	congestion_count;
>  };
>
>  struct tc_fq_codel_cl_stats {
> diff --git a/include/net/codel.h b/include/net/codel.h
> index 550debf..8c7d6a7 100644
> --- a/include/net/codel.h
> +++ b/include/net/codel.h
> @@ -148,7 +148,7 @@ struct codel_vars {
>   * struct codel_stats - contains codel shared variables and stats
>   * @maxpacket:	largest packet we've seen so far
>   * @drop_count:	temp count of dropped packets in dequeue()
> - * ecn_mark:	number of packets we ECN marked instead of dropping
> + * @ecn_mark:	number of packets we ECN marked instead of dropping
>   */
>  struct codel_stats {
>  	u32	maxpacket;
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 9fc1c62..c0485a0 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -62,6 +62,7 @@ struct fq_codel_sched_data {
>  	struct codel_stats cstats;
>  	u32	drop_overlimit;
>  	u32	new_flow_count;
> +	u32	congestion_count;
>
>  	struct list_head new_flows;	/* list of new flows */
>  	struct list_head old_flows;	/* list of old flows */
> @@ -196,16 +197,25 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		flow->deficit = q->quantum;
>  		flow->dropped = 0;
>  	}
> -	if (++sch->q.qlen < sch->limit)
> +	if (++sch->q.qlen < sch->limit) {
> +		codel_time_t hdelay = codel_get_enqueue_time(skb) -
> +				      codel_get_enqueue_time(flow->head);
> +
> +		/* If this flow is congested, tell the caller ! */
> +		if (codel_time_after(hdelay, q->cparams.target)) {
> +			q->congestion_count++;
> +			return NET_XMIT_CN;
> +		}
>  		return NET_XMIT_SUCCESS;
> -
> +	}
>  	q->drop_overlimit++;
>  	/* Return Congestion Notification only if we dropped a packet
>  	 * from this flow.
>  	 */
> -	if (fq_codel_drop(sch) == idx)
> +	if (fq_codel_drop(sch) == idx) {
> +		q->congestion_count++;
>  		return NET_XMIT_CN;
> -
> +	}
>  	/* As we dropped a packet, better let upper stack know this */
>  	qdisc_tree_decrease_qlen(sch, 1);
>  	return NET_XMIT_SUCCESS;
> @@ -467,6 +477,7 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
>
>  	st.qdisc_stats.maxpacket = q->cstats.maxpacket;
>  	st.qdisc_stats.drop_overlimit = q->drop_overlimit;
> +	st.qdisc_stats.congestion_count = q->congestion_count;
>  	st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
>  	st.qdisc_stats.new_flow_count = q->new_flow_count;
>

Clever idea. A problem is that there are other forms of network traffic on a link, and this is punishing a single tcp stream that may not be the source of the problem in the first place, basically putting it into double jeopardy.

I am curious as to how often an enqueue is actually dropping in the codel/fq_codel case; the hope was that there would be plenty of headroom under far more circumstances on this qdisc.
I note that on the dequeue side of codel (and in the network stack generally), supplying a netlink-level message on a packet drop or congestion indication, one that userspace could register for and observe, would be very useful: particularly for a routing daemon, but also for statistics collection, and perhaps for other levels of overall network control (DCTCP-like). The existing NET_DROP functionality is hard to use. Your idea is "in-band"; the more general netlink message idea would be "out-of-band" and more broadly applicable.
On Thu, 2012-06-28 at 10:51 -0700, Dave Taht wrote:
> clever idea. A problem is there are other forms of network traffic on
> a link, and this is punishing a single tcp stream that may not be the
> source of the problem in the first place, and basically putting it
> into double jeopardy.

Why? In fact this patch helps the tcp session being signaled (as it will be anyway) at enqueue time, instead of having to react to packet loss indications given (after an RTT) by the receiver.

Avoiding losses helps the receiver consume data without having to buffer it into the Out Of Order queue.

So it's not jeopardy, but early congestion notification without RTT delay.

NET_XMIT_CN is a soft signal, far more disruptive than a DROP.

> I am curious as to how often an enqueue is actually dropping in the
> codel/fq_codel case, the hope was that there would be plenty of
> headroom under far more circumstances on this qdisc.

"tc -s qdisc show dev eth0" can show you all the counts.

We never drop a packet at enqueue time, unless you hit the emergency limit (10240 packets for fq_codel). When you reach this limit, you are in trouble.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 28, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-06-28 at 10:51 -0700, Dave Taht wrote:
>
>> clever idea. A problem is there are other forms of network traffic on
>> a link, and this is punishing a single tcp

Dave: it won't just punish a single TCP; all protocols that react to XMIT_CN will share a similar fate.

>> stream that may not be the source of the problem in the first place,
>> and basically putting it into double jeopardy.
>
> Why? In fact this patch helps the tcp session being signaled (as it
> will be anyway) at enqueue time, instead of having to react to packet
> loss indications given (after an RTT) by the receiver.
>
> Avoiding losses helps the receiver consume data without having to
> buffer it into the Out Of Order queue.
>
> So it's not jeopardy, but early congestion notification without RTT
> delay.
>
> NET_XMIT_CN is a soft signal, far more disruptive than a DROP.

I don't follow here: do you mean far "less" disruptive in terms of performance?

>> I am curious as to how often an enqueue is actually dropping in the
>> codel/fq_codel case, the hope was that there would be plenty of
>> headroom under far more circumstances on this qdisc.
>
> "tc -s qdisc show dev eth0" can show you all the counts.
>
> We never drop a packet at enqueue time, unless you hit the emergency
> limit (10240 packets for fq_codel). When you reach this limit, you are
> in trouble.

Another nice feature would be to resume TCP xmit operation when the skb hits the wire. My hunch is that Eric is working on this too.
On Thu, Jun 28, 2012 at 6:56 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Thu, Jun 28, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2012-06-28 at 10:51 -0700, Dave Taht wrote:
>>
>>> clever idea. A problem is there are other forms of network traffic on
>>> a link, and this is punishing a single tcp
>
> Dave: it won't just punish a single TCP; all protocols that react to
> XMIT_CN will share a similar fate.

Which protocols in the kernel do and don't? That was the crux of my question.

I'm not objecting to the idea; it's clever, as I said. I'm thinking I'll apply it to cerowrt's next build and see what happens, if it will apply against 3.3. Or maybe the ns3 model. Or both.

As a segue, I am still fond of gaining the ability to also throw congestion notifications (who, where, and why) up to userspace via netlink. Having a viable metric for things like mesh routing and multipath has been an age-old quest of mine, and getting that data out could be a start towards it. Another example is something that lives in userspace, like uTP.

>>> stream that may not be the source of the problem in the first place,
>>> and basically putting it into double jeopardy.
>>
>> Why? In fact this patch helps the tcp session being signaled (as it
>> will be anyway) at enqueue time, instead of having to react to packet
>> loss indications given (after an RTT) by the receiver.

I tend to think more in terms of routing packets rather than originating them.

>> Avoiding losses helps the receiver consume data without having to
>> buffer it into the Out Of Order queue.
>>
>> So it's not jeopardy, but early congestion notification without RTT
>> delay.

Well, there is the birthday problem and hashing to the same queues. The sims we have do some interesting things on new streams in slow start sometimes, but I don't have enough of a grip on it to talk about it yet...

>> NET_XMIT_CN is a soft signal, far more disruptive than a DROP.
>
> I don't follow here: do you mean far "less" disruptive in terms of performance?

I figured Eric meant less.

>>> I am curious as to how often an enqueue is actually dropping in the
>>> codel/fq_codel case, the hope was that there would be plenty of
>>> headroom under far more circumstances on this qdisc.
>>
>> "tc -s qdisc show dev eth0" can show you all the counts.
>>
>> We never drop a packet at enqueue time, unless you hit the emergency
>> limit (10240 packets for fq_codel). When you reach this limit, you are
>> in trouble.

In my own tests with artificial streams that set but don't respect ecn, I hit the limit easily. But that's the subject of another thread on the codel list, and a different problem entirely. I just am not testing at > 1GigE speeds, and I know you guys are. I worry about behaviors above 10GigE, and here too the NET_XMIT_CN idea seems like a good one.

So, applause. A new idea on top of fair queueing + codel. Cool. So many hard problems seem to be getting tractable!
As you know, I really like this idea. My main concern is that the same packet could cause TCP to reduce cwnd twice within an RTT: first on enqueue, and then again if this packet is ECN marked on dequeue. I don't think this is the desired behavior. Can we avoid it?

On Thu, Jun 28, 2012 at 10:07 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> At enqueue time, check sojourn time of packet at head of the queue,
> and return NET_XMIT_CN instead of NET_XMIT_SUCCESS if this sojourn
> time is above codel @target.
>
> This permits local TCP stack to call tcp_enter_cwr() and reduce its cwnd
> without drops (for example if ECN is not enabled for the flow)
On Thu, 2012-06-28 at 16:52 -0700, Nandita Dukkipati wrote:
> As you know I really like this idea. My main concern is that the same
> packet could cause TCP to reduce cwnd twice within an RTT - first on
> enqueue and then if this packet is ECN marked on dequeue. I don't
> think this is the desired behavior. Can we avoid it?

I'll work on this.

In my experience, I found that no drops (or ECN marks) were done at dequeue time once one NET_XMIT_CN was returned, but it's certainly possible if other flows compete with this one.

Strangely, SFQ has the same behavior and nobody has complained yet ;)

Thanks
On Thu, 2012-06-28 at 19:47 -0400, Dave Taht wrote:
> On Thu, Jun 28, 2012 at 6:56 PM, Yuchung Cheng <ycheng@google.com> wrote:
> > Dave: it won't just punish a single TCP; all protocols that react to
> > XMIT_CN will share a similar fate.
>
> Which protocols in the kernel do and don't? That was the crux of my question.

AFAIK only tcp cares a bit, or seems to. But not that much, since it continues to send packets. That's because tcp_transmit_skb() changes the NET_XMIT_CN status to plain NET_XMIT_SUCCESS.

My long term plan is to reduce the number of skbs queued in the Qdisc for the TCP stack, to reduce RTT (removing the artificial RTT bias caused by local queues).

> I'm not objecting to the idea, it's clever, as I said. I'm thinking I'll
> apply it to cerowrt's next build and see what happens, if it
> will apply against 3.3. Or maybe the ns3 model. Or both.

A router will have no use for this feature, so I am not sure you need to spend time trying this ;)
On Thu, 2012-06-28 at 19:07 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> At enqueue time, check sojourn time of packet at head of the queue,
> and return NET_XMIT_CN instead of NET_XMIT_SUCCESS if this sojourn
> time is above codel @target.
>
> This permits local TCP stack to call tcp_enter_cwr() and reduce its cwnd
> without drops (for example if ECN is not enabled for the flow)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Dave Taht <dave.taht@bufferbloat.net>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Matt Mathis <mattmathis@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Nandita Dukkipati <nanditad@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---

Please don't apply this patch; I'll submit an updated version later.

Thanks
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jun 2012 06:53:12 +0200

> Please don't apply this patch; I'll submit an updated version later.

Ok.
On Fri, Jun 29, 2012 at 12:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> A router will have no use for this feature, so I am not sure you need
> to spend time trying this ;)

It's not yer ordinary router... A cerowrt router has iperf, netperf/netserver from svn (with congestion control switching and classification setting), rsync (with the same), samba, transmission, a polipo proxy, scamper, and a legion of other network analysis tools on board and available as optional packages.

And it's used in the bufferbloat project as a thoroughly understood platform for originating, receiving, AND routing packets on a real embedded home gateway platform that end users actually use, through a decent set of drivers, on ethernet and wifi.

I am always concerned when changes to the stack like GSO/GRO/BQL/fq_codel go into linux, or when things like the infinite-window ECN bug from a few months back happen, as they hold promise to mutate (or explain) the statistics and analysis we've accumulated over the last year and a half.

And as I'm hoping to do a major test run shortly to get some fresh statistics for fq_codel vs the old sfqred tests (I'm looking forward to redoing this one in particular: http://www.teklibre.com/~d/bloat/hoqvssfqred.ps), and you are about to change what those stats will look like under load with this change, I kind of need to understand it, track it, parse it, and capture it. I've got sufficient hardware now to easily A/B things.

(sorry for the noise on the lists)
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 32aef0a..4d409a5 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -714,6 +714,7 @@ struct tc_fq_codel_qd_stats {
 	 */
 	__u32	new_flows_len;	/* count of flows in new list */
 	__u32	old_flows_len;	/* count of flows in old list */
+	__u32	congestion_count;
 };

 struct tc_fq_codel_cl_stats {
diff --git a/include/net/codel.h b/include/net/codel.h
index 550debf..8c7d6a7 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -148,7 +148,7 @@ struct codel_vars {
  * struct codel_stats - contains codel shared variables and stats
  * @maxpacket:	largest packet we've seen so far
  * @drop_count:	temp count of dropped packets in dequeue()
- * ecn_mark:	number of packets we ECN marked instead of dropping
+ * @ecn_mark:	number of packets we ECN marked instead of dropping
  */
 struct codel_stats {
 	u32	maxpacket;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..c0485a0 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -62,6 +62,7 @@ struct fq_codel_sched_data {
 	struct codel_stats cstats;
 	u32	drop_overlimit;
 	u32	new_flow_count;
+	u32	congestion_count;

 	struct list_head new_flows;	/* list of new flows */
 	struct list_head old_flows;	/* list of old flows */
@@ -196,16 +197,25 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		flow->deficit = q->quantum;
 		flow->dropped = 0;
 	}
-	if (++sch->q.qlen < sch->limit)
+	if (++sch->q.qlen < sch->limit) {
+		codel_time_t hdelay = codel_get_enqueue_time(skb) -
+				      codel_get_enqueue_time(flow->head);
+
+		/* If this flow is congested, tell the caller ! */
+		if (codel_time_after(hdelay, q->cparams.target)) {
+			q->congestion_count++;
+			return NET_XMIT_CN;
+		}
 		return NET_XMIT_SUCCESS;
-
+	}
 	q->drop_overlimit++;
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
-	if (fq_codel_drop(sch) == idx)
+	if (fq_codel_drop(sch) == idx) {
+		q->congestion_count++;
 		return NET_XMIT_CN;
-
+	}
 	/* As we dropped a packet, better let upper stack know this */
 	qdisc_tree_decrease_qlen(sch, 1);
 	return NET_XMIT_SUCCESS;
@@ -467,6 +477,7 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)

 	st.qdisc_stats.maxpacket = q->cstats.maxpacket;
 	st.qdisc_stats.drop_overlimit = q->drop_overlimit;
+	st.qdisc_stats.congestion_count = q->congestion_count;
 	st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
 	st.qdisc_stats.new_flow_count = q->new_flow_count;