From patchwork Tue Jan 14 16:12:40 2020
From: Eelco Chaudron
To: dev@openvswitch.org
Date: Tue, 14 Jan 2020 11:12:40 -0500
Message-Id: <20200114161240.19950.9459.stgit@netdev64>
In-Reply-To: <20200114161226.19950.35918.stgit@netdev64>
References: <20200114161226.19950.35918.stgit@netdev64>
Subject: [ovs-dev] [PATCH v5 1/2] netdev-dpdk: Add support for multi-queue QoS to the DPDK datapath

This patch adds support for multi-queue QoS to the DPDK datapath.

Most of the code is based on an earlier patch from a patchset sent out
by zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c:
Support the multi-queue QoS configuration for dpdk datapath".

Co-authored-by: zhaozhanxu
Signed-off-by: Eelco Chaudron
---
 lib/netdev-dpdk.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 213 insertions(+), 6 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8198a0b..128963f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -219,6 +219,13 @@ struct qos_conf {
     rte_spinlock_t lock;
 };
 
+/* QoS queue information used by the netdev queue dump functions. */
+struct netdev_dpdk_queue_state {
+    uint32_t *queues;
+    size_t cur_queue;
+    size_t n_queues;
+};
+
 /* A particular implementation of dpdk QoS operations.
  *
  * The functions below return 0 if successful or a positive errno value on
@@ -285,6 +292,41 @@ struct dpdk_qos_ops {
      */
     int (*qos_run)(struct qos_conf *qos_conf, struct rte_mbuf **pkts,
                    int pkt_cnt, bool should_steal);
+
+    /* Called to construct a QoS Queue. The implementation should make
+     * the appropriate calls to configure QoS Queue according to 'details'.
+     *
+     * The contents of 'details' should be documented as valid for 'ovs_name'
+     * in the "other_config" column in the "QoS" table in vswitchd/vswitch.xml
+     * (which is built as ovs-vswitchd.conf.db(8)).
+     *
+     * This function must return 0 if and only if it constructs
+     * QoS queue successfully.
+     */
+    int (*qos_queue_construct)(const struct smap *details,
+                               uint32_t queue_id, struct qos_conf *conf);
+
+    /* Destroys the QoS Queue. */
+    void (*qos_queue_destruct)(struct qos_conf *conf, uint32_t queue_id);
+
+    /* Retrieves details of QoS Queue configuration into 'details'.
+     *
+     * The contents of 'details' should be documented as valid for 'ovs_name'
+     * in the "other_config" column in the "QoS" table in vswitchd/vswitch.xml
+     * (which is built as ovs-vswitchd.conf.db(8)).
+     */
+    int (*qos_queue_get)(struct smap *details, uint32_t queue_id,
+                         const struct qos_conf *conf);
+
+    /* Retrieves statistics of QoS Queue configuration into 'stats'.
+     */
+    int (*qos_queue_get_stats)(const struct qos_conf *conf, uint32_t queue_id,
+                               struct netdev_queue_stats *stats);
+
+    /* Setup the 'netdev_dpdk_queue_state' structure used by the dpdk queue
+     * dump functions.
+     */
+    int (*qos_queue_dump_state_init)(const struct qos_conf *conf,
+                                     struct netdev_dpdk_queue_state *state);
 };
 
 /* dpdk_qos_ops for each type of user space QoS implementation */
@@ -4191,6 +4233,164 @@ netdev_dpdk_set_qos(struct netdev *netdev, const char *type,
     return error;
 }
 
+static int
+netdev_dpdk_get_queue(const struct netdev *netdev, uint32_t queue_id,
+                      struct smap *details)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct qos_conf *qos_conf;
+    int error = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+    if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_get) {
+        error = EOPNOTSUPP;
+    } else {
+        error = qos_conf->ops->qos_queue_get(details, queue_id, qos_conf);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_set_queue(struct netdev *netdev, uint32_t queue_id,
+                      const struct smap *details)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct qos_conf *qos_conf;
+    int error = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+    if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_construct) {
+        error = EOPNOTSUPP;
+    } else {
+        error = qos_conf->ops->qos_queue_construct(details, queue_id,
+                                                   qos_conf);
+    }
+
+    if (error && error != EOPNOTSUPP) {
+        VLOG_ERR("Failed to set QoS queue %d on port %s: %s",
+                 queue_id, netdev_get_name(netdev), rte_strerror(error));
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_delete_queue(struct netdev *netdev, uint32_t queue_id)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct qos_conf *qos_conf;
+    int error = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+    if (qos_conf && qos_conf->ops && qos_conf->ops->qos_queue_destruct) {
+        qos_conf->ops->qos_queue_destruct(qos_conf, queue_id);
+    } else {
+        error = EOPNOTSUPP;
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_get_queue_stats(const struct netdev *netdev, uint32_t queue_id,
+                            struct netdev_queue_stats *stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct qos_conf *qos_conf;
+    int error = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+    if (qos_conf && qos_conf->ops && qos_conf->ops->qos_queue_get_stats) {
+        qos_conf->ops->qos_queue_get_stats(qos_conf, queue_id, stats);
+    } else {
+        error = EOPNOTSUPP;
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_queue_dump_start(const struct netdev *netdev, void **statep)
+{
+    int error = 0;
+    struct qos_conf *qos_conf;
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dev->mutex);
+
+    qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+    if (qos_conf && qos_conf->ops
+        && qos_conf->ops->qos_queue_dump_state_init) {
+        struct netdev_dpdk_queue_state *state;
+
+        *statep = state = xmalloc(sizeof *state);
+        error = qos_conf->ops->qos_queue_dump_state_init(qos_conf, state);
+    } else {
+        error = EOPNOTSUPP;
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_queue_dump_next(const struct netdev *netdev, void *state_,
+                            uint32_t *queue_idp, struct smap *details)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct netdev_dpdk_queue_state *state = state_;
+    struct qos_conf *qos_conf;
+    int error = EOF;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    while (state->cur_queue < state->n_queues) {
+        uint32_t queue_id = state->queues[state->cur_queue++];
+
+        qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf);
+        if (qos_conf && qos_conf->ops && qos_conf->ops->qos_queue_get) {
+            *queue_idp = queue_id;
+            error = qos_conf->ops->qos_queue_get(details, queue_id, qos_conf);
+            break;
+        }
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return error;
+}
+
+static int
+netdev_dpdk_queue_dump_done(const struct netdev *netdev OVS_UNUSED,
+                            void *state_)
+{
+    struct netdev_dpdk_queue_state *state = state_;
+
+    free(state->queues);
+    free(state);
+    return 0;
+}
+
+
 
 /* egress-policer details */
 
 struct egress_policer {
@@ -4288,12 +4488,12 @@ egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
 }
 
 static const struct dpdk_qos_ops egress_policer_ops = {
-    "egress-policer",    /* qos_name */
-    egress_policer_qos_construct,
-    egress_policer_qos_destruct,
-    egress_policer_qos_get,
-    egress_policer_qos_is_equal,
-    egress_policer_run
+    .qos_name = "egress-policer",    /* qos_name */
+    .qos_construct = egress_policer_qos_construct,
+    .qos_destruct = egress_policer_qos_destruct,
+    .qos_get = egress_policer_qos_get,
+    .qos_is_equal = egress_policer_qos_is_equal,
+    .qos_run = egress_policer_run
 };
 
 static int
@@ -4558,6 +4758,13 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev,
     .get_qos_types = netdev_dpdk_get_qos_types,         \
     .get_qos = netdev_dpdk_get_qos,                     \
     .set_qos = netdev_dpdk_set_qos,                     \
+    .get_queue = netdev_dpdk_get_queue,                 \
+    .set_queue = netdev_dpdk_set_queue,                 \
+    .delete_queue = netdev_dpdk_delete_queue,           \
+    .get_queue_stats = netdev_dpdk_get_queue_stats,     \
+    .queue_dump_start = netdev_dpdk_queue_dump_start,   \
+    .queue_dump_next = netdev_dpdk_queue_dump_next,     \
+    .queue_dump_done = netdev_dpdk_queue_dump_done,     \
     .update_flags = netdev_dpdk_update_flags,           \
     .rxq_alloc = netdev_dpdk_rxq_alloc,                 \
     .rxq_construct = netdev_dpdk_rxq_construct,         \
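
For illustration only (not part of the patch): the sketch below shows how a
hypothetical QoS implementation, here called "my-policer", would wire up the
per-queue hooks this patch adds to struct dpdk_qos_ops. The my_* functions are
placeholders using the callback signatures defined above; the real user of
these hooks is the trtcm-policer added in patch 2/2.

    /* Hypothetical skeleton: per-queue callbacks plugged into the extended
     * struct dpdk_qos_ops.  Prototypes mirror the new members above. */
    static int my_qos_queue_construct(const struct smap *details,
                                      uint32_t queue_id, struct qos_conf *conf);
    static void my_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id);
    static int my_qos_queue_get(struct smap *details, uint32_t queue_id,
                                const struct qos_conf *conf);
    static int my_qos_queue_get_stats(const struct qos_conf *conf,
                                      uint32_t queue_id,
                                      struct netdev_queue_stats *stats);
    static int my_qos_queue_dump_state_init(const struct qos_conf *conf,
                                            struct netdev_dpdk_queue_state *state);

    static const struct dpdk_qos_ops my_policer_ops = {
        .qos_name = "my-policer",
        /* ... per-port callbacks (qos_construct, qos_run, ...) as before ... */
        .qos_queue_construct = my_qos_queue_construct,
        .qos_queue_destruct = my_qos_queue_destruct,
        .qos_queue_get = my_qos_queue_get,
        .qos_queue_get_stats = my_qos_queue_get_stats,
        .qos_queue_dump_state_init = my_qos_queue_dump_state_init,
    };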
From patchwork Tue Jan 14 16:12:47 2020
From: Eelco Chaudron
To: dev@openvswitch.org
Date: Tue, 14 Jan 2020 11:12:47 -0500
Message-Id: <20200114161247.19950.22828.stgit@netdev64>
In-Reply-To: <20200114161226.19950.35918.stgit@netdev64>
References: <20200114161226.19950.35918.stgit@netdev64>
Subject: [ovs-dev] [PATCH v5 2/2] netdev-dpdk: Add new DPDK RFC 4115 egress policer

This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker.
It's a two-level hierarchical policer which first does a color-blind
marking of the traffic at the queue level, followed by a color-aware
marking at the port level. At the end, traffic marked as Green or Yellow
is forwarded and Red is dropped. For details on how traffic is marked,
see RFC 4115.

This egress policer can be used to limit traffic at different rates
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over other traffic at the port level.

For example, the following configuration will limit the traffic rate at
a port level to a maximum of 2000 packets per second (64 byte IPv4
packets): 1000pps as CIR (Committed Information Rate) and 1000pps as EIR
(Excess Information Rate). High priority traffic is routed to queue 10,
which marks all traffic as CIR, i.e. Green. All low priority traffic,
queue 20, is marked as EIR, i.e. Yellow.

  ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
    --id=@myqos create qos type=trtcm-policer \
      other-config:cir=52000 other-config:cbs=2048 \
      other-config:eir=52000 other-config:ebs=2048 \
      queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
    --id=@dpdk1Q10 create queue \
      other-config:cir=41600000 other-config:cbs=2048 \
      other-config:eir=0 other-config:ebs=0 -- \
    --id=@dpdk1Q20 create queue \
      other-config:cir=0 other-config:cbs=0 \
      other-config:eir=41600000 other-config:ebs=2048

With this configuration the high priority traffic has a guaranteed
bandwidth egressing the port at CIR (1000pps), but it can also use the
EIR, so a total of 2000pps at most. This additional 1000pps is shared
with the low priority traffic. The low priority traffic can use at most
1000pps.

Signed-off-by: Eelco Chaudron
---
 Documentation/topics/dpdk/qos.rst |   43 ++++
 lib/netdev-dpdk.c                 |  359 +++++++++++++++++++++++++++++++++++--
 vswitchd/vswitch.xml              |   34 ++++
 3 files changed, 421 insertions(+), 15 deletions(-)

diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst
index c0aec5d..1034954 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -33,6 +33,9 @@ datapath. These are referred to as *QoS* and *Rate Limiting*, respectively.
 QoS (Egress Policing)
 ---------------------
 
+Single Queue Policer
+~~~~~~~~~~~~~~~~~~~~
+
 Assuming you have a :doc:`vhost-user port ` transmitting traffic consisting
 of packets of size 64 bytes, the following command would limit the egress
 transmission rate of the port to ~1,000,000 packets per second::
@@ -49,6 +52,46 @@ To clear the QoS configuration from the port and ovsdb, run::
 
     $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
 
+
+Multi Queue Policer
+~~~~~~~~~~~~~~~~~~~
+
+In addition to the egress-policer, OVS-DPDK also supports an RFC 4115
+Two-Rate, Three-Color marker meter. It's a two-level hierarchical
+policer which first does a color-blind marking of the traffic at the
+queue level, followed by a color-aware marking at the port level. At
+the end, traffic marked as Green or Yellow is forwarded and Red is
+dropped. For details on how traffic is marked, see RFC 4115.
+
+This egress policer can be used to limit traffic at different rates
+based on the queues the traffic is in. In addition, it can also be used
+to prioritize certain traffic over other traffic at the port level.
+
+For example, the following configuration will limit the traffic rate at
+a port level to a maximum of 2000 packets per second (64 byte IPv4
+packets): 1000pps as CIR (Committed Information Rate) and 1000pps as
+EIR (Excess Information Rate).
+High priority traffic is routed to queue 10, which marks all traffic as
+CIR, i.e. Green. All low priority traffic, queue 20, is marked as EIR,
+i.e. Yellow::
+
+    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
+        --id=@myqos create qos type=trtcm-policer \
+          other-config:cir=52000 other-config:cbs=2048 \
+          other-config:eir=52000 other-config:ebs=2048 \
+          queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
+        --id=@dpdk1Q10 create queue \
+          other-config:cir=41600000 other-config:cbs=2048 \
+          other-config:eir=0 other-config:ebs=0 -- \
+        --id=@dpdk1Q20 create queue \
+          other-config:cir=0 other-config:cbs=0 \
+          other-config:eir=41600000 other-config:ebs=2048
+
+With this configuration the high priority traffic has a guaranteed
+bandwidth egressing the port at CIR (1000pps), but it can also use the
+EIR, so a total of 2000pps at most. This additional 1000pps is shared
+with the low priority traffic. The low priority traffic can use at most
+1000pps.
+
 Refer to ``vswitch.xml`` for more details on egress policer.
 
 Rate Limiting (Ingress Policing)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 128963f..1ed4a47 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -26,6 +26,12 @@
 #include
 #include
+/* Include rte_compat.h first to allow experimental API's needed for the
+ * rte_meter.h rfc4115 functions. Once they are no longer marked as
+ * experimental the #define and rte_compat.h include can be removed.
+ */
+#define ALLOW_EXPERIMENTAL_API
+#include <rte_compat.h>
 #include
 #include
 #include
 #include
@@ -329,8 +335,9 @@ struct dpdk_qos_ops {
                                      struct netdev_dpdk_queue_state *state);
 };
 
-/* dpdk_qos_ops for each type of user space QoS implementation */
+/* dpdk_qos_ops for each type of user space QoS implementation. */
 static const struct dpdk_qos_ops egress_policer_ops;
+static const struct dpdk_qos_ops trtcm_policer_ops;
 
 /*
  * Array of dpdk_qos_ops, contains pointer to all supported QoS
@@ -338,6 +345,7 @@ static const struct dpdk_qos_ops egress_policer_ops;
  */
 static const struct dpdk_qos_ops *const qos_confs[] = {
     &egress_policer_ops,
+    &trtcm_policer_ops,
     NULL
 };
 
@@ -2162,9 +2170,9 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
 }
 
 static inline bool
-netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
-                               struct rte_meter_srtcm_profile *profile,
-                               struct rte_mbuf *pkt, uint64_t time)
+netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter,
+                                     struct rte_meter_srtcm_profile *profile,
+                                     struct rte_mbuf *pkt, uint64_t time)
 {
     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
 
@@ -2173,10 +2181,10 @@ netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
 }
 
 static int
-netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
-                        struct rte_meter_srtcm_profile *profile,
-                        struct rte_mbuf **pkts, int pkt_cnt,
-                        bool should_steal)
+srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
+                                struct rte_meter_srtcm_profile *profile,
+                                struct rte_mbuf **pkts, int pkt_cnt,
+                                bool should_steal)
 {
     int i = 0;
     int cnt = 0;
@@ -2186,8 +2194,8 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
         /* Handle current packet */
-        if (netdev_dpdk_policer_pkt_handle(meter, profile,
-                                           pkt, current_time)) {
+        if (netdev_dpdk_srtcm_policer_pkt_handle(meter, profile,
+                                                 pkt, current_time)) {
             if (cnt != i) {
                 pkts[cnt] = pkt;
             }
@@ -2209,8 +2217,9 @@ ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
     int cnt = 0;
 
     rte_spinlock_lock(&policer->policer_lock);
-    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer->in_prof,
-                                  pkts, pkt_cnt, should_steal);
+    cnt = srtcm_policer_run_single_packet(&policer->in_policer,
+                                          &policer->in_prof,
+                                          pkts, pkt_cnt, should_steal);
     rte_spinlock_unlock(&policer->policer_lock);
 
     return cnt;
@@ -4480,9 +4489,9 @@ egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
     struct egress_policer *policer =
         CONTAINER_OF(conf, struct egress_policer, qos_conf);
 
-    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
-                                  &policer->egress_prof, pkts,
-                                  pkt_cnt, should_steal);
+    cnt = srtcm_policer_run_single_packet(&policer->egress_meter,
+                                          &policer->egress_prof, pkts,
+                                          pkt_cnt, should_steal);
 
     return cnt;
 }
@@ -4496,6 +4505,326 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     .qos_run = egress_policer_run
 };
 
+/* trtcm-policer details */
+
+struct trtcm_policer {
+    struct qos_conf qos_conf;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+    struct hmap queues;
+};
+
+struct trtcm_policer_queue {
+    struct hmap_node hmap_node;
+    uint32_t queue_id;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+};
+
+static void
+trtcm_policer_details_to_param(const struct smap *details,
+                               struct rte_meter_trtcm_rfc4115_params *params)
+{
+    memset(params, 0, sizeof *params);
+    params->cir = smap_get_ullong(details, "cir", 0);
+    params->eir = smap_get_ullong(details, "eir", 0);
+    params->cbs = smap_get_ullong(details, "cbs", 0);
+    params->ebs = smap_get_ullong(details, "ebs", 0);
+}
+
+static void
+trtcm_policer_param_to_detail(
+    const struct rte_meter_trtcm_rfc4115_params *params,
+    struct smap *details)
+{
+    smap_add_format(details, "cir", "%"PRIu64, params->cir);
+    smap_add_format(details, "eir", "%"PRIu64, params->eir);
+    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
+    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
+}
+
+
+static int
+trtcm_policer_qos_construct(const struct smap *details,
+                            struct qos_conf **conf)
+{
+    struct trtcm_policer *policer;
+    int err = 0;
+
+    policer = xmalloc(sizeof *policer);
+    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
+    trtcm_policer_details_to_param(details, &policer->meter_params);
+    err = rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
+                                                 &policer->meter_params);
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
+                                             &policer->meter_profile);
+    }
+
+    if (!err) {
+        *conf = &policer->qos_conf;
+        memset(&policer->stats, 0, sizeof policer->stats);
+        hmap_init(&policer->queues);
+    } else {
+        free(policer);
+        *conf = NULL;
+        err = -err;
+    }
+
+    return err;
+}
+
+static void
+trtcm_policer_qos_destruct(struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue, *next_queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, &policer->queues) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+    hmap_destroy(&policer->queues);
+    free(policer);
+}
+
+static int
+trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    trtcm_policer_param_to_detail(&policer->meter_params, details);
+    return 0;
+}
+
+static bool
+trtcm_policer_qos_is_equal(const struct qos_conf *conf,
+                           const struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+    struct rte_meter_trtcm_rfc4115_params params;
+
+    trtcm_policer_details_to_param(details, &params);
+
+    return !memcmp(&params, &policer->meter_params, sizeof params);
+}
+
+static struct trtcm_policer_queue *
+trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
+                             &policer->queues) {
+        if (queue->queue_id == queue_id) {
+            return queue;
+        }
+    }
+    return NULL;
+}
+
+static inline bool
+trtcm_policer_run_single_packet(struct trtcm_policer *policer,
+                                struct rte_mbuf *pkt, uint64_t time)
+{
+    enum rte_color pkt_color;
+    struct trtcm_policer_queue *queue;
+    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
+    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
+
+    queue = trtcm_policer_qos_find_queue(policer, dpkt->md.skb_priority);
+    if (!queue) {
+        /* If no queue is found, use the default queue, which MUST exist. */
+        queue = trtcm_policer_qos_find_queue(policer, 0);
+        if (!queue) {
+            return false;
+        }
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
+                                                          &queue->meter_profile,
+                                                          time,
+                                                          pkt_len);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        queue->stats.tx_errors++;
+    } else {
+        queue->stats.tx_bytes += pkt_len;
+        queue->stats.tx_packets++;
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
+                                                          &policer->meter_profile,
+                                                          time, pkt_len,
+                                                          pkt_color);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        policer->stats.tx_errors++;
+        return false;
+    }
+
+    policer->stats.tx_bytes += pkt_len;
+    policer->stats.tx_packets++;
+    return true;
+}
+
+static int
+trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
+                  bool should_steal)
+{
+    int i = 0;
+    int cnt = 0;
+    struct rte_mbuf *pkt = NULL;
+    uint64_t current_time = rte_rdtsc();
+
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    for (i = 0; i < pkt_cnt; i++) {
+        pkt = pkts[i];
+
+        if (trtcm_policer_run_single_packet(policer, pkt, current_time)) {
+            if (cnt != i) {
+                pkts[cnt] = pkt;
+            }
+            cnt++;
+        } else {
+            if (should_steal) {
+                rte_pktmbuf_free(pkt);
+            }
+        }
+    }
+    return cnt;
+}
+
+static int
+trtcm_policer_qos_queue_construct(const struct smap *details,
+                                  uint32_t queue_id, struct qos_conf *conf)
+{
+    int err = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        queue = xmalloc(sizeof *queue);
+        queue->queue_id = queue_id;
+        memset(&queue->stats, 0, sizeof queue->stats);
+        queue->stats.created = time_msec();
+        hmap_insert(&policer->queues, &queue->hmap_node,
+                    hash_2words(queue_id, 0));
+    }
+    if (queue_id == 0 && smap_is_empty(details)) {
+        /* No default queue configured, use port values */
+        memcpy(&queue->meter_params, &policer->meter_params,
+               sizeof queue->meter_params);
+    } else {
+        trtcm_policer_details_to_param(details, &queue->meter_params);
+    }
+
+    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
+                                                 &queue->meter_params);
+
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
+                                             &queue->meter_profile);
+    }
+    if (err) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+        err = -err;
+    }
+
+    return err;
+}
+
+static void
+trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (queue) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+}
+
+static int
+trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
+                            const struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+
+    trtcm_policer_param_to_detail(&queue->meter_params, details);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
+                                  uint32_t queue_id,
+                                  struct netdev_queue_stats *stats)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+    memcpy(stats, &queue->stats, sizeof *stats);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
+                                        struct netdev_dpdk_queue_state *state)
+{
+    uint32_t i = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    state->n_queues = hmap_count(&policer->queues);
+    state->cur_queue = 0;
+    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
+
+    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
+        state->queues[i++] = queue->queue_id;
+    }
+    return 0;
+}
+
+static const struct dpdk_qos_ops trtcm_policer_ops = {
+    .qos_name = "trtcm-policer",
+    .qos_construct = trtcm_policer_qos_construct,
+    .qos_destruct = trtcm_policer_qos_destruct,
+    .qos_get = trtcm_policer_qos_get,
+    .qos_is_equal = trtcm_policer_qos_is_equal,
+    .qos_run = trtcm_policer_run,
+    .qos_queue_construct = trtcm_policer_qos_queue_construct,
+    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
+    .qos_queue_get = trtcm_policer_qos_queue_get,
+    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
+    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
+};
+
 static int
 netdev_dpdk_reconfigure(struct netdev *netdev)
 {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0ec726c..c43cb1a 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4441,6 +4441,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           in performance will be noticed in terms of overall aggregate
           traffic throughput.
+        <dt><code>trtcm-policer</code></dt>
+        <dd>
+          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
+          Three-Color marker. It's a two-level hierarchical policer
+          which first does a color-blind marking of the traffic at the
+          queue level, followed by a color-aware marking at the port
+          level. At the end, traffic marked as Green or Yellow is
+          forwarded and Red is dropped. For details on how traffic is
+          marked, see RFC 4115.
+
+          If the ``default queue'', 0, is not configured it's
+          automatically created with the same other_config values as
+          the physical port.
+        </dd>
@@ -4508,6 +4521,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           bytes/tokens of the packet. If there are not enough tokens in the
           cbs bucket the packet will be dropped.
         </p>
+        <p>
+          The Excess Information Rate (EIR) is measured in bytes of IP
+          packets per second, i.e. it includes the IP header, but not link
+          specific (e.g. Ethernet) headers. This represents the bytes per
+          second rate at which the token bucket will be updated. The eir
+          value is calculated by (pps x packet data size). For example,
+          assuming a user wishes to limit a stream consisting of 64 byte
+          packets to 1 million packets per second, the EIR would be set to
+          46000000. This value can be broken into '1,000,000 x 46', where
+          1,000,000 is the policing rate for the number of packets per
+          second and 46 represents the size of the packet data for a
+          64 byte IP packet.
+        </p>
+        <p>
+          The Excess Burst Size (EBS) is measured in bytes and represents
+          a token bucket. At a minimum this value should be set to the
+          expected largest size packet in the traffic stream. In practice
+          larger values may be used to increase the size of the token
+          bucket. If a packet can be transmitted then the ebs will be
+          decremented by the number of bytes/tokens of the packet. If
+          there are not enough tokens in the ebs bucket the packet might
+          be dropped.
+        </p>
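
The cir/eir arithmetic described above is easy to get wrong when converting a
desired packet rate into byte values. The small standalone program below is
not part of the patch; it is just an illustration of the "bytes of IP data per
second" formula documented above (value = packets per second x packet data
size), using the 46-byte and 52-byte data sizes that appear in the examples:

    /* Illustration of the policer rate formula: value = pps * ip_data_bytes. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t
    policer_rate_bytes_per_sec(uint64_t packets_per_sec, uint64_t ip_data_bytes)
    {
        return packets_per_sec * ip_data_bytes;
    }

    int
    main(void)
    {
        /* 1,000,000 pps of 64 byte packets (46 bytes of IP data) -> 46000000,
         * matching the eir example in the documentation above. */
        printf("eir = %llu\n",
               (unsigned long long) policer_rate_bytes_per_sec(1000000, 46));

        /* 1000 pps with 52 bytes of data per packet -> 52000, matching the
         * cir/eir values used in the trtcm-policer example configuration. */
        printf("cir = %llu\n",
               (unsigned long long) policer_rate_bytes_per_sec(1000, 52));
        return 0;
    }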