From patchwork Tue Feb 12 10:12:05 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiri Pirko
X-Patchwork-Id: 219765
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 694DF2C0326
	for ; Tue, 12 Feb 2013 21:12:43 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932888Ab3BLKMk (ORCPT );
	Tue, 12 Feb 2013 05:12:40 -0500
Received: from mail-ea0-f169.google.com ([209.85.215.169]:58129 "EHLO
	mail-ea0-f169.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S932870Ab3BLKMh (ORCPT );
	Tue, 12 Feb 2013 05:12:37 -0500
Received: by mail-ea0-f169.google.com with SMTP id d13so76543eaa.0
	for ; Tue, 12 Feb 2013 02:12:35 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=x-received:from:to:cc:subject:date:message-id:x-mailer:in-reply-to
	:references:x-gm-message-state;
	bh=Cx7fjNh+I73nKSncYjPSIsBvrWXjpgdgM1q5SXPwSb8=;
	b=AQ9ZT9YSHNlSatdesa486y5jXnhJMGiiUr/UIxmiFPxYco6npD/tCWcjFQ16SBivPh
	kRQxXpzbjn8HxS8P+xCVv4QDwFj6ZK++CpLAhFT8BxsKWsrTIA+KzfQjKbDUg/J7yG9h
	sG2OSn3lL/s7wKDCBijYCYnAen1pbbOKpgkfloZPUj3g9EGST9sxYvzWRxhfzD/Iozq+
	FfT6WlvLcoXNFvTOOoc1w19yFdza40gNAY/Dg1F0pOowFSfkq0EogcxbmQbz0k7GbnNt
	Ete05rx75OR/oUPOGhbTYzgbXDRlBjUzQGqTjCx6wyI79ykH/oTFGguNJyQmyKevwNIA
	zDhw==
X-Received: by 10.14.213.131 with SMTP id a3mr61348146eep.24.1360663955555;
	Tue, 12 Feb 2013 02:12:35 -0800 (PST)
Received: from localhost (ip-94-112-98-141.net.upcbroadband.cz.
	[94.112.98.141])
	by mx.google.com with ESMTPS id h5sm66981055eem.1.2013.02.12.02.12.33
	(version=TLSv1.2 cipher=RC4-SHA bits=128/128);
	Tue, 12 Feb 2013 02:12:34 -0800 (PST)
From: Jiri Pirko
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, jhs@mojatatu.com,
	kuznet@ms2.inr.ac.ru, j.vimal@gmail.com
Subject: [patch net-next v5 07/11] tbf: improved accuracy at high rates
Date: Tue, 12 Feb 2013 11:12:05 +0100
Message-Id: <1360663929-1023-8-git-send-email-jiri@resnulli.us>
X-Mailer: git-send-email 1.8.1.2
In-Reply-To: <1360663929-1023-1-git-send-email-jiri@resnulli.us>
References: <1360663929-1023-1-git-send-email-jiri@resnulli.us>
X-Gm-Message-State: ALoCoQmb51mEH3hxl7LwmWf2tgK2tEOvvqXi1M5RqxIkx/fQv6XEqfeeRXx28d6DGDq825D5iHoG
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

TBF currently uses a rate table computed by the "tc" userspace program,
which has the following issue:

The rate table has 256 entries to map packet lengths to
tokens (time units). With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.

Thus, instead of relying on the rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
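
To illustrate the granularity problem described above, here is a minimal
user-space sketch. It is not the tc or kernel code: the helper names
exact_l2t_ns() and table_l2t_ns() and the cell_log parameter are
hypothetical, and the table model is deliberately simplified to "round the
length up to the next cell of (1 << cell_log) bytes before converting".

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Exact time, in ns, to transmit len bytes at the given rate (bytes/s). */
static uint64_t exact_l2t_ns(uint64_t rate_bytes_per_sec, uint32_t len)
{
	assert(rate_bytes_per_sec > 0);
	return (uint64_t)len * NSEC_PER_SEC / rate_bytes_per_sec;
}

/*
 * Simplified model of a 256-slot table (an assumption of this sketch):
 * every length that falls into slot i is charged as if it were
 * (i + 1) << cell_log bytes long.
 */
static uint64_t table_l2t_ns(uint64_t rate_bytes_per_sec, uint32_t len,
			     unsigned int cell_log)
{
	uint32_t slot = len >> cell_log;

	assert(slot < 256);	/* the table has only 256 entries */
	return exact_l2t_ns(rate_bytes_per_sec, (slot + 1) << cell_log);
}
```

With a rate of 1250000000 bytes/s (10 Gbit/s) and cell_log = 8, a 1500 byte
packet costs 1200 ns when computed exactly, but 1228 ns in the quantized
model. The per-packet difference is the kind of drift the patch removes.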

This is a followup to 56b765b79e9a78dc7d3f8850ba5e5567205a3ecd

Signed-off-by: Jiri Pirko
Acked-by: Eric Dumazet
---
 net/sched/sch_tbf.c | 76 ++++++++++++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 4b056c15..c8388f3 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 
 
@@ -100,23 +101,21 @@
 struct tbf_sched_data {
 /* Parameters */
 	u32		limit;		/* Maximal length of backlog: bytes */
-	u32		buffer;		/* Token bucket depth/rate: MUST BE >= MTU/B */
-	u32		mtu;
+	s64		buffer;		/* Token bucket depth/rate: MUST BE >= MTU/B */
+	s64		mtu;
 	u32		max_size;
-	struct qdisc_rate_table	*R_tab;
-	struct qdisc_rate_table	*P_tab;
+	struct psched_ratecfg rate;
+	struct psched_ratecfg peak;
+	bool peak_present;
 
 /* Variables */
-	long	tokens;			/* Current number of B tokens */
-	long	ptokens;		/* Current number of P tokens */
-	psched_time_t	t_c;		/* Time check-point */
+	s64	tokens;			/* Current number of B tokens */
+	s64	ptokens;		/* Current number of P tokens */
+	s64	t_c;			/* Time check-point */
 	struct Qdisc	*qdisc;		/* Inner qdisc, default - bfifo queue */
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };
 
-#define L2T(q, L)   qdisc_l2t((q)->R_tab, L)
-#define L2T_P(q, L) qdisc_l2t((q)->P_tab, L)
-
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);
@@ -156,24 +155,24 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 	skb = q->qdisc->ops->peek(q->qdisc);
 
 	if (skb) {
-		psched_time_t now;
-		long toks;
-		long ptoks = 0;
+		s64 now;
+		s64 toks;
+		s64 ptoks = 0;
 		unsigned int len = qdisc_pkt_len(skb);
 
-		now = psched_get_time();
-		toks = psched_tdiff_bounded(now, q->t_c, q->buffer);
+		now = ktime_to_ns(ktime_get());
+		toks = min_t(s64, now - q->t_c, q->buffer);
 
-		if (q->P_tab) {
+		if (q->peak_present) {
 			ptoks = toks + q->ptokens;
-			if (ptoks > (long)q->mtu)
+			if (ptoks > q->mtu)
 				ptoks = q->mtu;
-			ptoks -= L2T_P(q, len);
+			ptoks -= (s64) psched_l2t_ns(&q->peak, len);
 		}
 		toks += q->tokens;
-		if (toks > (long)q->buffer)
+		if (toks > q->buffer)
 			toks = q->buffer;
-		toks -= L2T(q, len);
+		toks -= (s64) psched_l2t_ns(&q->rate, len);
 
 		if ((toks|ptoks) >= 0) {
 			skb = qdisc_dequeue_peeked(q->qdisc);
@@ -189,8 +188,8 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 
-		qdisc_watchdog_schedule(&q->watchdog,
-					now + max_t(long, -toks, -ptoks));
+		qdisc_watchdog_schedule_ns(&q->watchdog,
+					   now + max_t(long, -toks, -ptoks));
 
 		/* Maybe we have a shorter packet in the queue,
 		   which can be sent now. It sounds cool,
@@ -214,7 +213,7 @@ static void tbf_reset(struct Qdisc *sch)
 
 	qdisc_reset(q->qdisc);
 	sch->q.qlen = 0;
-	q->t_c = psched_get_time();
+	q->t_c = ktime_to_ns(ktime_get());
 	q->tokens = q->buffer;
 	q->ptokens = q->mtu;
 	qdisc_watchdog_cancel(&q->watchdog);
@@ -293,14 +292,19 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		q->qdisc = child;
 	}
 	q->limit = qopt->limit;
-	q->mtu = qopt->mtu;
+	q->mtu = PSCHED_TICKS2NS(qopt->mtu);
 	q->max_size = max_size;
-	q->buffer = qopt->buffer;
+	q->buffer = PSCHED_TICKS2NS(qopt->buffer);
 	q->tokens = q->buffer;
 	q->ptokens = q->mtu;
 
-	swap(q->R_tab, rtab);
-	swap(q->P_tab, ptab);
+	psched_ratecfg_precompute(&q->rate, rtab->rate.rate);
+	if (ptab) {
+		psched_ratecfg_precompute(&q->peak, ptab->rate.rate);
+		q->peak_present = true;
+	} else {
+		q->peak_present = false;
+	}
 
 	sch_tree_unlock(sch);
 	err = 0;
@@ -319,7 +323,7 @@ static int tbf_init(struct Qdisc *sch, struct nlattr *opt)
 	if (opt == NULL)
 		return -EINVAL;
 
-	q->t_c = psched_get_time();
+	q->t_c = ktime_to_ns(ktime_get());
 	qdisc_watchdog_init(&q->watchdog, sch);
 	q->qdisc = &noop_qdisc;
 
@@ -331,12 +335,6 @@ static void tbf_destroy(struct Qdisc *sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
-
-	if (q->P_tab)
-		qdisc_put_rtab(q->P_tab);
-	if (q->R_tab)
-		qdisc_put_rtab(q->R_tab);
-
 	qdisc_destroy(q->qdisc);
 }
 
@@ -352,13 +350,13 @@ static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
 		goto nla_put_failure;
 
 	opt.limit = q->limit;
-	opt.rate = q->R_tab->rate;
-	if (q->P_tab)
-		opt.peakrate = q->P_tab->rate;
+	opt.rate.rate = psched_ratecfg_getrate(&q->rate);
+	if (q->peak_present)
+		opt.peakrate.rate = psched_ratecfg_getrate(&q->peak);
 	else
 		memset(&opt.peakrate, 0, sizeof(opt.peakrate));
-	opt.mtu = q->mtu;
-	opt.buffer = q->buffer;
+	opt.mtu = PSCHED_NS2TICKS(q->mtu);
+	opt.buffer = PSCHED_NS2TICKS(q->buffer);
 	if (nla_put(skb, TCA_TBF_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
 
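
For readers who want to follow the nanosecond accounting in the tbf_dequeue
hunk outside the kernel, the following user-space sketch may help. It is an
approximation under stated assumptions, not the patch itself: struct tb_state
and tb_try_send() are hypothetical names, the peak-rate bucket is left out,
the packet cost is computed with a plain division rather than
psched_l2t_ns(), and the state update on a successful send (which lies
outside the hunks shown) is assumed to store the new token count and
checkpoint time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the rate-bucket fields. */
struct tb_state {
	int64_t buffer_ns;		/* bucket depth, expressed as time */
	int64_t tokens_ns;		/* tokens currently available */
	int64_t t_c_ns;			/* time checkpoint */
	uint64_t rate_bytes_per_sec;	/* configured rate */
};

/*
 * Decide whether a packet of len bytes may be sent at time now_ns.
 * On success the state is updated and *wait_ns is set to 0; otherwise
 * the state is left alone and *wait_ns holds the remaining deficit.
 */
static bool tb_try_send(struct tb_state *s, int64_t now_ns, uint32_t len,
			int64_t *wait_ns)
{
	int64_t toks;

	assert(s->rate_bytes_per_sec > 0);

	/* Tokens earned since the checkpoint, capped at the bucket depth. */
	toks = now_ns - s->t_c_ns;
	if (toks > s->buffer_ns)
		toks = s->buffer_ns;

	/* Add the saved tokens and cap again. */
	toks += s->tokens_ns;
	if (toks > s->buffer_ns)
		toks = s->buffer_ns;

	/* Charge the transmission time of this packet in ns. */
	toks -= (int64_t)((uint64_t)len * 1000000000ULL /
			  s->rate_bytes_per_sec);

	if (toks >= 0) {
		s->t_c_ns = now_ns;
		s->tokens_ns = toks;
		*wait_ns = 0;
		return true;
	}

	*wait_ns = -toks;
	return false;
}
```

In the patch the same deficit, taken as the maximum over the rate and peak
buckets, is what gets added to now and passed to
qdisc_watchdog_schedule_ns().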