Message ID: 1322156378-23257-1-git-send-email-hagen@jauu.net
State: Superseded, archived
Delegated to: David Miller
On Thursday, 24 November 2011 at 18:39 +0100, Hagen Paul Pfeifer wrote:

> Currently netem is not able to emulate channel bandwidth. Only static
> delay (and optional random jitter) can be configured.
>
> To emulate the channel rate the token bucket filter (sch_tbf) can be
> used. But TBF has some major emulation flaws. The buffer (token bucket
> depth/rate) cannot be 0. Also, the idea behind TBF is that the credit
> (tokens in buckets) fills if no packet is transmitted, so that there is
> always a "positive" credit for new packets. In real life this behavior
> contradicts the laws of nature, where nothing can travel faster than the
> speed of light. E.g.: on an emulated 1000 byte/s link a small IPv4/TCP
> SYN packet of ~50 bytes requires ~0.05 seconds - not 0 seconds.
>
> Netem is an excellent place to implement a rate limiting feature: static
> delay is already implemented, tfifo already has time information and the
> user can skip TBF configuration completely.
>
> This patch implements a rate latency feature which can be configured via
> tc, e.g.:
>
> tc qdisc add dev eth0 root netem ratelatency 10kbit
>
> To emulate a link of 5000 byte/s and add an additional static delay of
> 10ms:
>
> tc qdisc add dev eth0 root netem delay 10ms ratelatency 5KBps
>
> Note: similar to TBF, the rate-latency extension is bound to the kernel
> timing system. Depending on the architecture's timer granularity, higher
> rates (e.g. 10mbit/s and higher) tend toward transmission bursts. Also
> note: further queues live in network adapters; see ethtool(8).
>
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> ---
>  include/linux/pkt_sched.h |    5 +++++
>  net/sched/sch_netem.c     |   40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 0 deletions(-)

I like this patch, this is a useful extension.

Only point is why you chose ratelatency instead of rate?
We want to emulate a real link, and yes, a 1000-byte packet must be
delayed _before_ we deliver it to the device, but that's a detail of how
netem works.

The usual term we use to describe a 1Mbps link is "1Mbps rate" ;)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Eric Dumazet | 2011-11-24 23:14:58 [+0100]:
>Only point is why you chose ratelatency instead of rate ?
Not sure why; it was called rate in v1, then somebody suggested
ratelatency and I found it more fitting. So in v2 it became ratelatency.
I have no strong opinion here - should I generate a v3?
Hagen
On Thu, 24 Nov 2011, Hagen Paul Pfeifer wrote:

> * Eric Dumazet | 2011-11-24 23:14:58 [+0100]:
>
> >Only point is why you chose ratelatency instead of rate ?
>
> Not sure why; it was called rate in v1, then somebody suggested
> ratelatency and I found it more fitting. So in v2 it became ratelatency.
> I have no strong opinion here - should I generate a v3?

From the user perspective, I also find rate much more natural. No need to
add further to tc obscurity.

I would ask for an update to the netem man page, but I guess there isn't
a netem man page. :-(

-Bill
* Bill Fink | 2011-11-24 20:06:50 [-0500]:

>From the user perspective, I also find rate much more natural.
>No need to add further to tc obscurity.

OK, then I will respin the patch.

>I would ask for an update to the netem man page, but I guess
>there isn't a netem man page. :-(

Someone wrote a man page, but it was never committed to iproute2. I will
have a look.

Hagen
On Thu, 24 Nov 2011 23:14:58 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> I like this patch, this is a useful extension.
>
> Only point is why you chose ratelatency instead of rate ?
>
> We want to emulate a real link, and yes, a 1000-byte packet must be
> delayed _before_ we deliver it to the device, but that's a detail of
> how netem works.
>
> The usual term we use to describe a 1Mbps link is "1Mbps rate" ;)

I would rather have a new qdisc than add more features to the already
complex netem. Initially there was a rate control built into netem, but
the consensus was to use stacking to do it.
On Thursday, 24 November 2011 at 21:09 -0800, Stephen Hemminger wrote:

> I would rather have a new qdisc than add more features to the already
> complex netem. Initially there was a rate control built into netem,
> but the consensus was to use stacking to do it.

Yes, but Hagen's change adds only a few lines to netem, and netem already
handles throttling. This is why I believe it's a nice enhancement.

Being able to simulate a rate limit (in bits per second, by the way, the
usual bandwidth unit, not bytes per second...) in a very easy way seems a
good thing, even if it handles only the egress side.

As Hagen mentioned, a standard qdisc is able to rate limit, but the first
packet sent has a null delay, even if it's a 64Kbyte packet. That doesn't
mimic a true link.
* Eric Dumazet | 2011-11-25 07:13:20 [+0100]:

>Yes, but Hagen's change adds only a few lines to netem, and netem
>already handles throttling. This is why I believe it's a nice
>enhancement.

We first modified TBF, but TBF addresses a slightly different task, so
the patch was a bit awkward and complex (more awkward than the two
additional netem enqueue() lines). So in the end: yes, netem is the right
place for this; only a few lines in netem are required.

Additionally, setting up a qdisc chain with TBF, netem, ... is also error
prone. Students of mine repeatedly make mistakes here. This change makes
a complete emulation setup even easier. But this is only a side note.

Hagen
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index c533670..cf826d3 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -465,6 +465,7 @@ enum {
 	TCA_NETEM_REORDER,
 	TCA_NETEM_CORRUPT,
 	TCA_NETEM_LOSS,
+	TCA_NETEM_RATELATENCY,
 	__TCA_NETEM_MAX,
 };
 
@@ -495,6 +496,10 @@ struct tc_netem_corrupt {
 	__u32	correlation;
 };
 
+struct tc_netem_ratelatency {
+	__u32	ratelatency;	/* byte/s */
+};
+
 enum {
 	NETEM_LOSS_UNSPEC,
 	NETEM_LOSS_GI,	/* General Intuitive - 4 state model */
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index eb3b9a8..3ae1cdd 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -79,6 +79,7 @@ struct netem_sched_data {
 	u32 duplicate;
 	u32 reorder;
 	u32 corrupt;
+	u32 ratelatency;
 
 	struct crndstate {
 		u32 last;
@@ -298,6 +299,11 @@ static psched_tdiff_t tabledist(psched_tdiff_t mu, psched_tdiff_t sigma,
 	return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
 }
 
+static psched_time_t packet_len_2_sched_time(unsigned int len, u32 rate)
+{
+	return PSCHED_NS2TICKS((u64)len * NSEC_PER_SEC / rate);
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -371,6 +377,24 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 					  &q->delay_cor, q->delay_dist);
 
 		now = psched_get_time();
+
+		if (q->ratelatency) {
+			struct sk_buff_head *list = &q->qdisc->q;
+
+			delay += packet_len_2_sched_time(skb->len, q->ratelatency);
+
+			if (!skb_queue_empty(list)) {
+				/*
+				 * Last packet in queue is reference point (now).
+				 * First packet in queue is already in flight,
+				 * calculate this time bonus and substract
+				 * from delay.
+				 */
+				delay -= now - netem_skb_cb(skb_peek(list))->time_to_send;
+				now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
+			}
+		}
+
 		cb->time_to_send = now + delay;
 		++q->counter;
 		ret = qdisc_enqueue(skb, q->qdisc);
@@ -535,6 +559,14 @@ static void get_corrupt(struct Qdisc *sch, const struct nlattr *attr)
 	init_crandom(&q->corrupt_cor, r->correlation);
 }
 
+static void get_ratelatency(struct Qdisc *sch, const struct nlattr *attr)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	const struct tc_netem_ratelatency *r = nla_data(attr);
+
+	q->ratelatency = r->ratelatency;
+}
+
 static int get_loss_clg(struct Qdisc *sch, const struct nlattr *attr)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
@@ -594,6 +626,7 @@ static const struct nla_policy netem_policy[TCA_NETEM_MAX + 1] = {
 	[TCA_NETEM_CORR]	= { .len = sizeof(struct tc_netem_corr) },
 	[TCA_NETEM_REORDER]	= { .len = sizeof(struct tc_netem_reorder) },
 	[TCA_NETEM_CORRUPT]	= { .len = sizeof(struct tc_netem_corrupt) },
+	[TCA_NETEM_RATELATENCY]	= { .len = sizeof(struct tc_netem_ratelatency) },
 	[TCA_NETEM_LOSS]	= { .type = NLA_NESTED },
 };
 
@@ -666,6 +699,9 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_NETEM_CORRUPT])
 		get_corrupt(sch, tb[TCA_NETEM_CORRUPT]);
 
+	if (tb[TCA_NETEM_RATELATENCY])
+		get_ratelatency(sch, tb[TCA_NETEM_RATELATENCY]);
+
 	q->loss_model = CLG_RANDOM;
 	if (tb[TCA_NETEM_LOSS])
 		ret = get_loss_clg(sch, tb[TCA_NETEM_LOSS]);
@@ -846,6 +882,7 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct tc_netem_corr cor;
 	struct tc_netem_reorder reorder;
 	struct tc_netem_corrupt corrupt;
+	struct tc_netem_ratelatency ratelatency;
 
 	qopt.latency = q->latency;
 	qopt.jitter = q->jitter;
@@ -868,6 +905,9 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
 	corrupt.correlation = q->corrupt_cor.rho;
 	NLA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt);
 
+	ratelatency.ratelatency = q->ratelatency;
+	NLA_PUT(skb, TCA_NETEM_RATELATENCY, sizeof(ratelatency),
+		&ratelatency);
+
 	if (dump_loss_model(q, skb) != 0)
 		goto nla_put_failure;