[{"id":1765375,"web_url":"http://patchwork.ozlabs.org/comment/1765375/","msgid":"<20170908134332.GD9064@sisyphus.home.austad.us>","list_archive_url":null,"date":"2017-09-08T13:43:32","subject":"Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper\n\t(CBS) qdisc","submitter":{"id":67212,"url":"http://patchwork.ozlabs.org/api/people/67212/","name":"Henrik Austad","email":"henrik@austad.us"},"content":"On Thu, Aug 31, 2017 at 06:26:22PM -0700, Vinicius Costa Gomes wrote:\n> This queueing discipline implements the shaper algorithm defined by\n> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.\n> \n> It's primary usage is to apply some bandwidth reservation to user\n> defined traffic classes, which are mapped to different queues via the\n> mqprio qdisc.\n> \n> Initially, it only supports offloading the traffic shaping work to\n> supporting controllers.\n> \n> Later, when a software implementation is added, the current dependency\n> on being installed \"under\" mqprio can be lifted.\n> \n> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>\n> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>\n\nSo, I started testing this in my VM to make sure I didn't do anything silly \nand it exploded spectacularly as it used the underlying e1000 driver which \ndoes not have a ndo_setup_tc present.\n\nI commented inline below, but as a tl;dr\n\nThe cbs_init() would call into cbs_change() that would correctly detect \nthat ndo_setup_tc is missing and abort. However, at this stage it would try \nto desroy the qdisc and in cbs_destroy() there's no such guard leading to a\n\nBUG: unable to handle kernel NULL pointer dereference at 0000000000000010\n\nFixing that, another NULL would be found when the code walks into \nqdisc_destroy(q->qdisc).\n\nApart from this, it loaded fine after some wrestling with tc :)\n\nAwesome! :D\n\n> ---\n>  include/linux/netdevice.h |   1 +\n>  net/sched/Kconfig         |  11 ++\n>  net/sched/Makefile        |   1 +\n>  net/sched/sch_cbs.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++++\n>  4 files changed, 299 insertions(+)\n>  create mode 100644 net/sched/sch_cbs.c\n> \n> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h\n> index 35de8312e0b5..dd9a2ecd0c03 100644\n> --- a/include/linux/netdevice.h\n> +++ b/include/linux/netdevice.h\n> @@ -775,6 +775,7 @@ enum tc_setup_type {\n>  \tTC_SETUP_CLSFLOWER,\n>  \tTC_SETUP_CLSMATCHALL,\n>  \tTC_SETUP_CLSBPF,\n> +\tTC_SETUP_CBS,\n>  };\n>  \n>  /* These structures hold the attributes of xdp state that are being passed\n> diff --git a/net/sched/Kconfig b/net/sched/Kconfig\n> index e70ed26485a2..c03d86a7775e 100644\n> --- a/net/sched/Kconfig\n> +++ b/net/sched/Kconfig\n> @@ -172,6 +172,17 @@ config NET_SCH_TBF\n>  \t  To compile this code as a module, choose M here: the\n>  \t  module will be called sch_tbf.\n>  \n> +config NET_SCH_CBS\n\nShouldn't this depend on NET_SCH_MQPRIO as it is supposed to hook into \nthat?\n\n@@ -173,6 +173,7 @@ config NET_SCH_TBF\n          module will be called sch_tbf.\n \n config NET_SCH_CBS\n+       depends on NET_SCH_MQPRIO\n        tristate \"Credit Based Shaper (CBS)\"\n        ---help---\n          Say Y here if you want to use the Credit Based Shaper (CBS) packet\n\n> +\ttristate \"Credit Based Shaper (CBS)\"\n> +\t---help---\n> +\t  Say Y here if you want to use the Credit Based Shaper (CBS) packet\n> +\t  scheduling algorithm.\n> +\n> +\t  See the top of <file:net/sched/sch_cbs.c> for more details.\n> +\n> +\t  To compile this code as a module, choose M here: the\n> +\t  module will be called sch_cbs.\n> +\n>  config NET_SCH_GRED\n>  \ttristate \"Generic Random Early Detection (GRED)\"\n>  \t---help---\n> diff --git a/net/sched/Makefile b/net/sched/Makefile\n> index 7b915d226de7..80c8f92d162d 100644\n> --- a/net/sched/Makefile\n> +++ b/net/sched/Makefile\n> @@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)\t+= sch_fq_codel.o\n>  obj-$(CONFIG_NET_SCH_FQ)\t+= sch_fq.o\n>  obj-$(CONFIG_NET_SCH_HHF)\t+= sch_hhf.o\n>  obj-$(CONFIG_NET_SCH_PIE)\t+= sch_pie.o\n> +obj-$(CONFIG_NET_SCH_CBS)\t+= sch_cbs.o\n>  \n>  obj-$(CONFIG_NET_CLS_U32)\t+= cls_u32.o\n>  obj-$(CONFIG_NET_CLS_ROUTE4)\t+= cls_route.o\n> diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c\n> new file mode 100644\n> index 000000000000..1c86a9e14150\n> --- /dev/null\n> +++ b/net/sched/sch_cbs.c\n> @@ -0,0 +1,286 @@\n> +/*\n> + * net/sched/sch_cbs.c\tCredit Based Shaper\n> + *\n> + *\t\tThis program is free software; you can redistribute it and/or\n> + *\t\tmodify it under the terms of the GNU General Public License\n> + *\t\tas published by the Free Software Foundation; either version\n> + *\t\t2 of the License, or (at your option) any later version.\n> + *\n> + * Authors:\tVininicius Costa Gomes <vinicius.gomes@intel.com>\n> + *\n> + */\n> +\n> +#include <linux/module.h>\n> +#include <linux/types.h>\n> +#include <linux/kernel.h>\n> +#include <linux/string.h>\n> +#include <linux/errno.h>\n> +#include <linux/skbuff.h>\n> +#include <net/netlink.h>\n> +#include <net/sch_generic.h>\n> +#include <net/pkt_sched.h>\n> +\n> +struct cbs_sched_data {\n> +\tstruct Qdisc *qdisc; /* Inner qdisc, default - pfifo queue */\n> +\ts32 queue;\n> +\ts32 locredit;\n> +\ts32 hicredit;\n> +\ts32 sendslope;\n> +\ts32 idleslope;\n> +};\n> +\n> +static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,\n> +\t\t       struct sk_buff **to_free)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\tint ret;\n> +\n> +\tret = qdisc_enqueue(skb, q->qdisc, to_free);\n> +\tif (ret != NET_XMIT_SUCCESS) {\n> +\t\tif (net_xmit_drop_count(ret))\n> +\t\t\tqdisc_qstats_drop(sch);\n> +\t\treturn ret;\n> +\t}\n> +\n> +\tqdisc_qstats_backlog_inc(sch, skb);\n> +\tsch->q.qlen++;\n> +\treturn NET_XMIT_SUCCESS;\n> +}\n> +\n> +static struct sk_buff *cbs_dequeue(struct Qdisc *sch)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\tstruct sk_buff *skb;\n> +\n> +\tskb = q->qdisc->ops->peek(q->qdisc);\n> +\tif (skb) {\n> +\t\tskb = qdisc_dequeue_peeked(q->qdisc);\n> +\t\tif (unlikely(!skb))\n> +\t\t\treturn NULL;\n> +\n> +\t\tqdisc_qstats_backlog_dec(sch, skb);\n> +\t\tsch->q.qlen--;\n> +\t\tqdisc_bstats_update(sch, skb);\n> +\n> +\t\treturn skb;\n> +\t}\n> +\treturn NULL;\n> +}\n> +\n> +static void cbs_reset(struct Qdisc *sch)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\n> +\tqdisc_reset(q->qdisc);\n> +}\n> +\n> +static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {\n> +\t[TCA_CBS_PARMS]\t= { .len = sizeof(struct tc_cbs_qopt) },\n> +};\n> +\n> +static int cbs_change(struct Qdisc *sch, struct nlattr *opt)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\tstruct tc_cbs_qopt_offload cbs = { };\n> +\tstruct nlattr *tb[TCA_CBS_MAX + 1];\n> +\tconst struct net_device_ops *ops;\n> +\tstruct tc_cbs_qopt *qopt;\n> +\tstruct net_device *dev;\n> +\tint err;\n> +\n> +\terr = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);\n> +\tif (err < 0)\n> +\t\treturn err;\n> +\n> +\terr = -EINVAL;\n> +\tif (!tb[TCA_CBS_PARMS])\n> +\t\tgoto done;\n> +\n> +\tqopt = nla_data(tb[TCA_CBS_PARMS]);\n> +\n> +\tdev = qdisc_dev(sch);\n> +\tops = dev->netdev_ops;\n> +\n> +\t/* FIXME: this means that we can only install this qdisc\n> +\t * \"under\" mqprio. Do we need a more generic way to retrieve\n> +\t * the queue, or do we pass the netdev_queue to the driver?\n> +\t */\n> +\tcbs.queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);\n> +\n> +\tcbs.enable = 1;\n> +\tcbs.hicredit = qopt->hicredit;\n> +\tcbs.locredit = qopt->locredit;\n> +\tcbs.idleslope = qopt->idleslope;\n> +\tcbs.sendslope = qopt->sendslope;\n> +\n> +\terr = -ENOTSUPP;\n> +\tif (!ops->ndo_setup_tc)\n> +\t\tgoto done;\n\nThis confuses tc a bit, and looking at other qdisc schedulers, it seems \nlike EOPNOTSUPP is an alternative, this changes the output from\n\nRTNETLINK answers: Unknown error 524\n - to\nRTNETLINK answers: Operation not supported\n\nwhich perhaps is more informative.\n\n> +\n> +\terr = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);\n> +\tif (err < 0)\n> +\t\tgoto done;\n> +\n> +\tq->queue = cbs.queue;\n> +\tq->hicredit = cbs.hicredit;\n> +\tq->locredit = cbs.locredit;\n> +\tq->idleslope = cbs.idleslope;\n> +\tq->sendslope = cbs.sendslope;\n> +\n> +done:\n> +\treturn err;\n> +}\n> +\n> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\n> +\tif (!opt)\n> +\t\treturn -EINVAL;\n> +\n> +\tq->qdisc = fifo_create_dflt(sch, &pfifo_qdisc_ops, 1024);\n> +\tqdisc_hash_add(q->qdisc, true);\n> +\n> +\treturn cbs_change(sch, opt);\n> +}\n> +\n> +static void cbs_destroy(struct Qdisc *sch)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\tstruct tc_cbs_qopt_offload cbs = { };\n> +\tstruct net_device *dev;\n> +\tint err;\n> +\n> +\tq->hicredit = 0;\n> +\tq->locredit = 0;\n> +\tq->idleslope = 0;\n> +\tq->sendslope = 0;\n> +\n> +\tdev = qdisc_dev(sch);\n> +\n> +\tcbs.queue = q->queue;\n> +\tcbs.enable = 0;\n> +\n> +\terr = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);\n\nThis can lead to NULL pointer deref if it is not set (tested for in \ncbs_change and error there leads us here, which then explodes).\n\n> +\tif (err < 0)\n> +\t\tpr_warn(\"Couldn't reset queue %d to default values\\n\",\n> +\t\t\tcbs.queue);\n> +\n> +\tqdisc_destroy(q->qdisc);\n\nSame, q->qdisc was NULL when cbs_init() failed.\n\n> +}\n> +\n> +static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\tstruct nlattr *nest;\n> +\tstruct tc_cbs_qopt opt;\n> +\n> +\tsch->qstats.backlog = q->qdisc->qstats.backlog;\n> +\tnest = nla_nest_start(skb, TCA_OPTIONS);\n> +\tif (!nest)\n> +\t\tgoto nla_put_failure;\n> +\n> +\topt.hicredit = q->hicredit;\n> +\topt.locredit = q->locredit;\n> +\topt.sendslope = q->sendslope;\n> +\topt.idleslope = q->idleslope;\n> +\n> +\tif (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))\n> +\t\tgoto nla_put_failure;\n> +\n> +\treturn nla_nest_end(skb, nest);\n> +\n> +nla_put_failure:\n> +\tnla_nest_cancel(skb, nest);\n> +\treturn -1;\n> +}\n> +\n> +static int cbs_dump_class(struct Qdisc *sch, unsigned long cl,\n> +\t\t\t  struct sk_buff *skb, struct tcmsg *tcm)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\n> +\ttcm->tcm_handle |= TC_H_MIN(1);\n> +\ttcm->tcm_info = q->qdisc->handle;\n> +\n> +\treturn 0;\n> +}\n> +\n> +static int cbs_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,\n> +\t\t     struct Qdisc **old)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\n> +\tif (!new)\n> +\t\tnew = &noop_qdisc;\n> +\n> +\t*old = qdisc_replace(sch, new, &q->qdisc);\n> +\treturn 0;\n> +}\n> +\n> +static struct Qdisc *cbs_leaf(struct Qdisc *sch, unsigned long arg)\n> +{\n> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n> +\n> +\treturn q->qdisc;\n> +}\n> +\n> +static unsigned long cbs_find(struct Qdisc *sch, u32 classid)\n> +{\n> +\treturn 1;\n> +}\n> +\n> +static int cbs_delete(struct Qdisc *sch, unsigned long arg)\n> +{\n> +\treturn 0;\n> +}\n> +\n> +static void cbs_walk(struct Qdisc *sch, struct qdisc_walker *walker)\n> +{\n> +\tif (!walker->stop) {\n> +\t\tif (walker->count >= walker->skip)\n> +\t\t\tif (walker->fn(sch, 1, walker) < 0) {\n> +\t\t\t\twalker->stop = 1;\n> +\t\t\t\treturn;\n> +\t\t\t}\n> +\t\twalker->count++;\n> +\t}\n> +}\n> +\n> +static const struct Qdisc_class_ops cbs_class_ops = {\n> +\t.graft\t\t=\tcbs_graft,\n> +\t.leaf\t\t=\tcbs_leaf,\n> +\t.find\t\t=\tcbs_find,\n> +\t.delete\t\t=\tcbs_delete,\n> +\t.walk\t\t=\tcbs_walk,\n> +\t.dump\t\t=\tcbs_dump_class,\n> +};\n> +\n> +static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {\n> +\t.next\t\t=\tNULL,\n> +\t.cl_ops\t\t=\t&cbs_class_ops,\n> +\t.id\t\t=\t\"cbs\",\n> +\t.priv_size\t=\tsizeof(struct cbs_sched_data),\n> +\t.enqueue\t=\tcbs_enqueue,\n> +\t.dequeue\t=\tcbs_dequeue,\n> +\t.peek\t\t=\tqdisc_peek_dequeued,\n> +\t.init\t\t=\tcbs_init,\n> +\t.reset\t\t=\tcbs_reset,\n> +\t.destroy\t=\tcbs_destroy,\n> +\t.change\t\t=\tcbs_change,\n> +\t.dump\t\t=\tcbs_dump,\n> +\t.owner\t\t=\tTHIS_MODULE,\n> +};\n> +\n> +static int __init cbs_module_init(void)\n> +{\n> +\treturn register_qdisc(&cbs_qdisc_ops);\n> +}\n> +\n> +static void __exit cbs_module_exit(void)\n> +{\n> +\tunregister_qdisc(&cbs_qdisc_ops);\n> +}\n> +module_init(cbs_module_init)\n> +module_exit(cbs_module_exit)\n> +MODULE_LICENSE(\"GPL\");\n> -- \n> 2.14.1\n>","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=austad-us.20150623.gappssmtp.com\n\theader.i=@austad-us.20150623.gappssmtp.com\n\theader.b=\"F4tEr/T1\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpdpS0fn7z9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 23:44:16 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S933131AbdIHNoE (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 8 Sep 2017 09:44:04 -0400","from mail-lf0-f67.google.com ([209.85.215.67]:34897 \"EHLO\n\tmail-lf0-f67.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S933042AbdIHNn5 (ORCPT\n\t<rfc822;netdev@vger.kernel.org>); Fri, 8 Sep 2017 09:43:57 -0400","by mail-lf0-f67.google.com with SMTP id c8so1131017lfe.2\n\tfor <netdev@vger.kernel.org>; Fri, 08 Sep 2017 06:43:57 -0700 (PDT)","from sisyphus.home.austad.us (162.51-175-50.customer.lyse.net.\n\t[51.175.50.162])\n\tby smtp.gmail.com with ESMTPSA id 1sm333194lje.6.2017.09.08.06.43.53\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tFri, 08 Sep 2017 06:43:54 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=austad-us.20150623.gappssmtp.com; s=20150623;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=Zm0T+KQqg38V4wAk6FsIIhj6e/rVwfgqALrW99neQJw=;\n\tb=F4tEr/T1ctOZ0VkPa44/nFwoKLDaZ0UmmZiYcEIHd9QiIgLRdMlC6MR35h2MP4DHWU\n\tVidtCkoIwx3D8v0SBd5F+3bcO2joO/soIjH23ZnBWGQJaw0xWYKRauTtodSjeYUx21aE\n\t7RhhKZYVVoJszTZQu0SVKFem3tOOJghzw31RTxrjqcbCq9QCGNngDMyJL15xi7H6bW6L\n\t49Mj4RNYAjfaC/gD3x7/gDhQcUXoAHnhNLWhVfbW+FOcB8wzKs3d4uiKyGs18fkcMZxt\n\tveGmLRedX2tzntLyIgzFWnmF4Y8a+Hn3jx2c3H21MUx88OO9OfPN4CnHfnq0Irl2237b\n\tXBJA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=Zm0T+KQqg38V4wAk6FsIIhj6e/rVwfgqALrW99neQJw=;\n\tb=qUJRTD71wkCiyxupGBgMU8fCCE3pUUQuN+LCwb/peLtBvmHr4DtgBYkgm5bfgl9LPv\n\t6htww4BQplMHdzoHC/ne24XmeFx6YS2+zhajuyS0QEt7NW8milDtooF1T4wuY2OKG1ey\n\tilHAgtZPGaN5sf6pQni1+7pMFXTD4uxPut4nEoZwP2V9ojcl0uNo0zT4Y0jRiejdqTTz\n\tGvNX6kaJHlYEQ+bjyjs0jcYvglRu3Ej0e4zqr+nPCfsetJEzVnbj7T0xGx0lER+CW5hC\n\tdXnYo1jpP0PZYuWezU+1U9LTo0nA+b10dmN4UHboDVJ2c9FxYWNWJQbJ78rCD8SnP32j\n\tYAOQ==","X-Gm-Message-State":"AHPjjUjitD2UxyDInmh4ySS76nAdp82GqvnzI2hU4slyIWaY9oWAufl7\n\ta/+rCFFLvraSOb3g","X-Google-Smtp-Source":"AOwi7QAoIDM634n4Q4azET/TjV8DdkEaokLgYiPlQCjuOVFhg6xG/uOxe4E4YUs2wC454kqoU4ii6g==","X-Received":"by 10.46.0.203 with SMTP id e72mr1099140lji.115.1504878235778;\n\tFri, 08 Sep 2017 06:43:55 -0700 (PDT)","Date":"Fri, 8 Sep 2017 15:43:32 +0200","From":"Henrik Austad <henrik@austad.us>","To":"Vinicius Costa Gomes <vinicius.gomes@intel.com>","Cc":"netdev@vger.kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com,\n\tjiri@resnulli.us, intel-wired-lan@lists.osuosl.org,\n\tandre.guedes@intel.com, ivan.briano@intel.com,\n\tjesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,\n\trichardcochran@gmail.com","Subject":"Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper\n\t(CBS) qdisc","Message-ID":"<20170908134332.GD9064@sisyphus.home.austad.us>","References":"<20170901012625.14838-1-vinicius.gomes@intel.com>\n\t<20170901012625.14838-3-vinicius.gomes@intel.com>","MIME-Version":"1.0","Content-Type":"multipart/signed; micalg=pgp-sha1;\n\tprotocol=\"application/pgp-signature\"; boundary=\"FFoLq8A0u+X9iRU8\"","Content-Disposition":"inline","In-Reply-To":"<20170901012625.14838-3-vinicius.gomes@intel.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}},{"id":1768259,"web_url":"http://patchwork.ozlabs.org/comment/1768259/","msgid":"<871snajepw.fsf@intel.com>","list_archive_url":null,"date":"2017-09-14T00:39:23","subject":"Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper\n\t(CBS) qdisc","submitter":{"id":72272,"url":"http://patchwork.ozlabs.org/api/people/72272/","name":"Vinicius Costa Gomes","email":"vinicius.gomes@intel.com"},"content":"Hi Henrik,\n\nHenrik Austad <henrik@austad.us> writes:\n\n> On Thu, Aug 31, 2017 at 06:26:22PM -0700, Vinicius Costa Gomes wrote:\n>> This queueing discipline implements the shaper algorithm defined by\n>> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.\n>>\n>> It's primary usage is to apply some bandwidth reservation to user\n>> defined traffic classes, which are mapped to different queues via the\n>> mqprio qdisc.\n>>\n>> Initially, it only supports offloading the traffic shaping work to\n>> supporting controllers.\n>>\n>> Later, when a software implementation is added, the current dependency\n>> on being installed \"under\" mqprio can be lifted.\n>>\n>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>\n>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>\n>\n> So, I started testing this in my VM to make sure I didn't do anything silly\n> and it exploded spectacularly as it used the underlying e1000 driver which\n> does not have a ndo_setup_tc present.\n>\n> I commented inline below, but as a tl;dr\n>\n> The cbs_init() would call into cbs_change() that would correctly detect\n> that ndo_setup_tc is missing and abort. However, at this stage it would try\n> to desroy the qdisc and in cbs_destroy() there's no such guard leading to a\n>\n> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010\n>\n> Fixing that, another NULL would be found when the code walks into\n> qdisc_destroy(q->qdisc).\n>\n> Apart from this, it loaded fine after some wrestling with tc :)\n>\n> Awesome! :D\n>\n>> ---\n>>  include/linux/netdevice.h |   1 +\n>>  net/sched/Kconfig         |  11 ++\n>>  net/sched/Makefile        |   1 +\n>>  net/sched/sch_cbs.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++++\n>>  4 files changed, 299 insertions(+)\n>>  create mode 100644 net/sched/sch_cbs.c\n>>\n>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h\n>> index 35de8312e0b5..dd9a2ecd0c03 100644\n>> --- a/include/linux/netdevice.h\n>> +++ b/include/linux/netdevice.h\n>> @@ -775,6 +775,7 @@ enum tc_setup_type {\n>>  \tTC_SETUP_CLSFLOWER,\n>>  \tTC_SETUP_CLSMATCHALL,\n>>  \tTC_SETUP_CLSBPF,\n>> +\tTC_SETUP_CBS,\n>>  };\n>>\n>>  /* These structures hold the attributes of xdp state that are being passed\n>> diff --git a/net/sched/Kconfig b/net/sched/Kconfig\n>> index e70ed26485a2..c03d86a7775e 100644\n>> --- a/net/sched/Kconfig\n>> +++ b/net/sched/Kconfig\n>> @@ -172,6 +172,17 @@ config NET_SCH_TBF\n>>  \t  To compile this code as a module, choose M here: the\n>>  \t  module will be called sch_tbf.\n>>\n>> +config NET_SCH_CBS\n>\n> Shouldn't this depend on NET_SCH_MQPRIO as it is supposed to hook into\n> that?\n\nRight now, the CBS shaper only makes sense with mqprio, later it may use\nsome other qdisc (like \"taprio\" mentioned in the cover letter) to hook\ninto, so, yes, you are right, there's a dependency here, better make it\nclear. Will fix.\n\n>\n> @@ -173,6 +173,7 @@ config NET_SCH_TBF\n>           module will be called sch_tbf.\n>\n>  config NET_SCH_CBS\n> +       depends on NET_SCH_MQPRIO\n>         tristate \"Credit Based Shaper (CBS)\"\n>         ---help---\n>           Say Y here if you want to use the Credit Based Shaper (CBS) packet\n>\n>> +\ttristate \"Credit Based Shaper (CBS)\"\n>> +\t---help---\n>> +\t  Say Y here if you want to use the Credit Based Shaper (CBS) packet\n>> +\t  scheduling algorithm.\n>> +\n>> +\t  See the top of <file:net/sched/sch_cbs.c> for more details.\n>> +\n>> +\t  To compile this code as a module, choose M here: the\n>> +\t  module will be called sch_cbs.\n>> +\n>>  config NET_SCH_GRED\n>>  \ttristate \"Generic Random Early Detection (GRED)\"\n>>  \t---help---\n>> diff --git a/net/sched/Makefile b/net/sched/Makefile\n>> index 7b915d226de7..80c8f92d162d 100644\n>> --- a/net/sched/Makefile\n>> +++ b/net/sched/Makefile\n>> @@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)\t+= sch_fq_codel.o\n>>  obj-$(CONFIG_NET_SCH_FQ)\t+= sch_fq.o\n>>  obj-$(CONFIG_NET_SCH_HHF)\t+= sch_hhf.o\n>>  obj-$(CONFIG_NET_SCH_PIE)\t+= sch_pie.o\n>> +obj-$(CONFIG_NET_SCH_CBS)\t+= sch_cbs.o\n>>\n>>  obj-$(CONFIG_NET_CLS_U32)\t+= cls_u32.o\n>>  obj-$(CONFIG_NET_CLS_ROUTE4)\t+= cls_route.o\n>> diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c\n>> new file mode 100644\n>> index 000000000000..1c86a9e14150\n>> --- /dev/null\n>> +++ b/net/sched/sch_cbs.c\n>> @@ -0,0 +1,286 @@\n>> +/*\n>> + * net/sched/sch_cbs.c\tCredit Based Shaper\n>> + *\n>> + *\t\tThis program is free software; you can redistribute it and/or\n>> + *\t\tmodify it under the terms of the GNU General Public License\n>> + *\t\tas published by the Free Software Foundation; either version\n>> + *\t\t2 of the License, or (at your option) any later version.\n>> + *\n>> + * Authors:\tVininicius Costa Gomes <vinicius.gomes@intel.com>\n>> + *\n>> + */\n>> +\n>> +#include <linux/module.h>\n>> +#include <linux/types.h>\n>> +#include <linux/kernel.h>\n>> +#include <linux/string.h>\n>> +#include <linux/errno.h>\n>> +#include <linux/skbuff.h>\n>> +#include <net/netlink.h>\n>> +#include <net/sch_generic.h>\n>> +#include <net/pkt_sched.h>\n>> +\n>> +struct cbs_sched_data {\n>> +\tstruct Qdisc *qdisc; /* Inner qdisc, default - pfifo queue */\n>> +\ts32 queue;\n>> +\ts32 locredit;\n>> +\ts32 hicredit;\n>> +\ts32 sendslope;\n>> +\ts32 idleslope;\n>> +};\n>> +\n>> +static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,\n>> +\t\t       struct sk_buff **to_free)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\tint ret;\n>> +\n>> +\tret = qdisc_enqueue(skb, q->qdisc, to_free);\n>> +\tif (ret != NET_XMIT_SUCCESS) {\n>> +\t\tif (net_xmit_drop_count(ret))\n>> +\t\t\tqdisc_qstats_drop(sch);\n>> +\t\treturn ret;\n>> +\t}\n>> +\n>> +\tqdisc_qstats_backlog_inc(sch, skb);\n>> +\tsch->q.qlen++;\n>> +\treturn NET_XMIT_SUCCESS;\n>> +}\n>> +\n>> +static struct sk_buff *cbs_dequeue(struct Qdisc *sch)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\tstruct sk_buff *skb;\n>> +\n>> +\tskb = q->qdisc->ops->peek(q->qdisc);\n>> +\tif (skb) {\n>> +\t\tskb = qdisc_dequeue_peeked(q->qdisc);\n>> +\t\tif (unlikely(!skb))\n>> +\t\t\treturn NULL;\n>> +\n>> +\t\tqdisc_qstats_backlog_dec(sch, skb);\n>> +\t\tsch->q.qlen--;\n>> +\t\tqdisc_bstats_update(sch, skb);\n>> +\n>> +\t\treturn skb;\n>> +\t}\n>> +\treturn NULL;\n>> +}\n>> +\n>> +static void cbs_reset(struct Qdisc *sch)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\n>> +\tqdisc_reset(q->qdisc);\n>> +}\n>> +\n>> +static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {\n>> +\t[TCA_CBS_PARMS]\t= { .len = sizeof(struct tc_cbs_qopt) },\n>> +};\n>> +\n>> +static int cbs_change(struct Qdisc *sch, struct nlattr *opt)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\tstruct tc_cbs_qopt_offload cbs = { };\n>> +\tstruct nlattr *tb[TCA_CBS_MAX + 1];\n>> +\tconst struct net_device_ops *ops;\n>> +\tstruct tc_cbs_qopt *qopt;\n>> +\tstruct net_device *dev;\n>> +\tint err;\n>> +\n>> +\terr = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);\n>> +\tif (err < 0)\n>> +\t\treturn err;\n>> +\n>> +\terr = -EINVAL;\n>> +\tif (!tb[TCA_CBS_PARMS])\n>> +\t\tgoto done;\n>> +\n>> +\tqopt = nla_data(tb[TCA_CBS_PARMS]);\n>> +\n>> +\tdev = qdisc_dev(sch);\n>> +\tops = dev->netdev_ops;\n>> +\n>> +\t/* FIXME: this means that we can only install this qdisc\n>> +\t * \"under\" mqprio. Do we need a more generic way to retrieve\n>> +\t * the queue, or do we pass the netdev_queue to the driver?\n>> +\t */\n>> +\tcbs.queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);\n>> +\n>> +\tcbs.enable = 1;\n>> +\tcbs.hicredit = qopt->hicredit;\n>> +\tcbs.locredit = qopt->locredit;\n>> +\tcbs.idleslope = qopt->idleslope;\n>> +\tcbs.sendslope = qopt->sendslope;\n>> +\n>> +\terr = -ENOTSUPP;\n>> +\tif (!ops->ndo_setup_tc)\n>> +\t\tgoto done;\n>\n> This confuses tc a bit, and looking at other qdisc schedulers, it seems\n> like EOPNOTSUPP is an alternative, this changes the output from\n>\n> RTNETLINK answers: Unknown error 524\n>  - to\n> RTNETLINK answers: Operation not supported\n>\n\nYeah, I missed this. Thanks for catching this.\n\n> which perhaps is more informative.\n>\n>> +\n>> +\terr = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);\n>> +\tif (err < 0)\n>> +\t\tgoto done;\n>> +\n>> +\tq->queue = cbs.queue;\n>> +\tq->hicredit = cbs.hicredit;\n>> +\tq->locredit = cbs.locredit;\n>> +\tq->idleslope = cbs.idleslope;\n>> +\tq->sendslope = cbs.sendslope;\n>> +\n>> +done:\n>> +\treturn err;\n>> +}\n>> +\n>> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\n>> +\tif (!opt)\n>> +\t\treturn -EINVAL;\n>> +\n>> +\tq->qdisc = fifo_create_dflt(sch, &pfifo_qdisc_ops, 1024);\n>> +\tqdisc_hash_add(q->qdisc, true);\n>> +\n>> +\treturn cbs_change(sch, opt);\n>> +}\n>> +\n>> +static void cbs_destroy(struct Qdisc *sch)\n>> +{\n>> +\tstruct cbs_sched_data *q = qdisc_priv(sch);\n>> +\tstruct tc_cbs_qopt_offload cbs = { };\n>> +\tstruct net_device *dev;\n>> +\tint err;\n>> +\n>> +\tq->hicredit = 0;\n>> +\tq->locredit = 0;\n>> +\tq->idleslope = 0;\n>> +\tq->sendslope = 0;\n>> +\n>> +\tdev = qdisc_dev(sch);\n>> +\n>> +\tcbs.queue = q->queue;\n>> +\tcbs.enable = 0;\n>> +\n>> +\terr = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);\n>\n> This can lead to NULL pointer deref if it is not set (tested for in\n> cbs_change and error there leads us here, which then explodes).\n\nFixed.\n\n>\n>> +\tif (err < 0)\n>> +\t\tpr_warn(\"Couldn't reset queue %d to default values\\n\",\n>> +\t\t\tcbs.queue);\n>> +\n>> +\tqdisc_destroy(q->qdisc);\n>\n> Same, q->qdisc was NULL when cbs_init() failed.\n>\n\nFixed the error path, thanks.\n\n\nCheers,\n--\nVinicius","headers":{"Return-Path":"<netdev-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netdev-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xt0686xqBz9s83\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 14 Sep 2017 10:39:28 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751434AbdINAj0 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 13 Sep 2017 20:39:26 -0400","from mga03.intel.com ([134.134.136.65]:55420 \"EHLO mga03.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751173AbdINAjY (ORCPT <rfc822;netdev@vger.kernel.org>);\n\tWed, 13 Sep 2017 20:39:24 -0400","from fmsmga006.fm.intel.com ([10.253.24.20])\n\tby orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t13 Sep 2017 17:39:24 -0700","from ellie.jf.intel.com (HELO ellie) ([10.24.9.23])\n\tby fmsmga006.fm.intel.com with ESMTP; 13 Sep 2017 17:39:23 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,390,1500966000\"; d=\"scan'208\";a=\"151598416\"","From":"Vinicius Costa Gomes <vinicius.gomes@intel.com>","To":"Henrik Austad <henrik@austad.us>","Cc":"netdev@vger.kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com,\n\tjiri@resnulli.us, intel-wired-lan@lists.osuosl.org,\n\tandre.guedes@intel.com, ivan.briano@intel.com,\n\tjesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,\n\trichardcochran@gmail.com","Subject":"Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper\n\t(CBS) qdisc","In-Reply-To":"<20170908134332.GD9064@sisyphus.home.austad.us>","References":"<20170901012625.14838-1-vinicius.gomes@intel.com>\n\t<20170901012625.14838-3-vinicius.gomes@intel.com>\n\t<20170908134332.GD9064@sisyphus.home.austad.us>","User-Agent":"Notmuch/0.21 (http://notmuchmail.org) Emacs/25.1.1\n\t(x86_64-pc-linux-gnu)","Date":"Wed, 13 Sep 2017 17:39:23 -0700","Message-ID":"<871snajepw.fsf@intel.com>","MIME-Version":"1.0","Content-Type":"text/plain","Sender":"netdev-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netdev.vger.kernel.org>","X-Mailing-List":"netdev@vger.kernel.org"}}]