From patchwork Mon Jan 29 06:59:43 2018
X-Patchwork-Submitter: Yuanhan Liu
X-Patchwork-Id: 867275
X-Patchwork-Delegate: ian.stokes@intel.com
From: Yuanhan Liu
To: dev@openvswitch.org
Date: Mon, 29 Jan 2018 14:59:43 +0800
Message-Id: <1517209188-16608-2-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Cc: Simon Horman
Subject: [ovs-dev] [PATCH v7 1/6] dpif-netdev: associate flow with a mark id

Most modern NICs are able to bind a flow to a mark, so that every packet
matching that flow carries the mark in its packet descriptor. The basic idea
is that, when we receive packets later, we can get the flow directly from the
mark. That avoids some very costly CPU operations, including (but not limited
to) miniflow_extract, EMC lookup, dpcls lookup, etc. Performance can thus be
greatly improved.

The major work of this patch is therefore to associate a flow with a mark id
(a uint32_t number). The association in the netdev datapath is done with a
CMAP, while in hardware it is done by the rte_flow MARK action.

One tricky thing in OVS-DPDK is that the flow tables are per-PMD. If there is
only one phys port but it has 2 queues, there can be 2 PMDs. In other words,
even a single mega flow (e.g. udp,tp_src=1000) can map to 2 different
dp_netdev flows, one per PMD. That could result in the same mega flow being
offloaded twice to the hardware; worse, we might get 2 different marks and
only the last one would work. To avoid that, a megaflow_to_mark CMAP is
created. An entry is added for the first PMD that wants to offload a flow;
later PMDs see that the mega flow is already offloaded and do not offload it
to the HW a second time.

Meanwhile, the mark to flow mapping becomes a 1:N mapping. That is what the
mark_to_flow CMAP is for. When the first PMD wants to offload a flow, it
allocates a new mark and performs the flow offload by reusing the ->flow_put
method. When that succeeds, a "mark to flow" entry is added. Later PMDs get
the corresponding mark from the megaflow_to_mark CMAP above and then add
another "mark to flow" entry.

Co-authored-by: Finn Christensen
Signed-off-by: Yuanhan Liu
Signed-off-by: Finn Christensen
---
v6: - fixed typos in commit log
    - fixed a sparse warning
    - used hash_int to compute the hash for mark_to_flow CMAP
    - added more comments
    - lock before port lookup
v5: - fixed check of flow_mark_has_no_ref (renamed from
      is_last_flow_mark_reference). This fixed an issue that it took too
      long to finish flow add/removal if we do that repeatedly.
- do mark_to_flow disassociation if flow modification failed --- lib/dpif-netdev.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ lib/netdev.h | 6 ++ 2 files changed, 289 insertions(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index ba62128..a514de8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -75,6 +75,7 @@ #include "tnl-ports.h" #include "unixctl.h" #include "util.h" +#include "uuid.h" VLOG_DEFINE_THIS_MODULE(dpif_netdev); @@ -430,7 +431,9 @@ struct dp_netdev_flow { /* Hash table index by unmasked flow. */ const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */ /* 'flow_table'. */ + const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */ const ovs_u128 ufid; /* Unique flow identifier. */ + const ovs_u128 mega_ufid; /* Unique mega flow identifier. */ const unsigned pmd_id; /* The 'core_id' of pmd thread owning this */ /* flow. */ @@ -441,6 +444,7 @@ struct dp_netdev_flow { struct ovs_refcount ref_cnt; bool dead; + uint32_t mark; /* Unique flow mark assigned to a flow */ /* Statistics. */ struct dp_netdev_flow_stats stats; @@ -1837,6 +1841,178 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd, return cls; } +#define MAX_FLOW_MARK (UINT32_MAX - 1) +#define INVALID_FLOW_MARK (UINT32_MAX) + +struct megaflow_to_mark_data { + const struct cmap_node node; + ovs_u128 mega_ufid; + uint32_t mark; +}; + +struct flow_mark { + struct cmap megaflow_to_mark; + struct cmap mark_to_flow; + struct id_pool *pool; + struct ovs_mutex mutex; +}; + +static struct flow_mark flow_mark = { + .megaflow_to_mark = CMAP_INITIALIZER, + .mark_to_flow = CMAP_INITIALIZER, + .mutex = OVS_MUTEX_INITIALIZER, +}; + +static uint32_t +flow_mark_alloc(void) +{ + uint32_t mark; + + if (!flow_mark.pool) { + /* Haven't initiated yet, do it here */ + flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK); + } + + if (id_pool_alloc_id(flow_mark.pool, &mark)) { + return mark; + } + + return INVALID_FLOW_MARK; +} + +static void +flow_mark_free(uint32_t mark) +{ + id_pool_free_id(flow_mark.pool, mark); +} + +/* associate megaflow with a mark, which is a 1:1 mapping */ +static void +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark) +{ + size_t hash = dp_netdev_flow_hash(mega_ufid); + struct megaflow_to_mark_data *data = xzalloc(sizeof(*data)); + + data->mega_ufid = *mega_ufid; + data->mark = mark; + + cmap_insert(&flow_mark.megaflow_to_mark, + CONST_CAST(struct cmap_node *, &data->node), hash); +} + +/* disassociate meagaflow with a mark */ +static void +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) +{ + size_t hash = dp_netdev_flow_hash(mega_ufid); + struct megaflow_to_mark_data *data; + + CMAP_FOR_EACH_WITH_HASH (data, node, hash, &flow_mark.megaflow_to_mark) { + if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) { + cmap_remove(&flow_mark.megaflow_to_mark, + CONST_CAST(struct cmap_node *, &data->node), hash); + free(data); + return; + } + } + + VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n", + UUID_ARGS((struct uuid *)mega_ufid)); +} + +static inline uint32_t +megaflow_to_mark_find(const ovs_u128 *mega_ufid) +{ + size_t hash = dp_netdev_flow_hash(mega_ufid); + struct megaflow_to_mark_data *data; + + CMAP_FOR_EACH_WITH_HASH (data, node, hash, &flow_mark.megaflow_to_mark) { + if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) { + return data->mark; + } + } + + return INVALID_FLOW_MARK; +} + +/* associate mark with a flow, which is 1:N mapping */ +static void +mark_to_flow_associate(const uint32_t mark, struct 
dp_netdev_flow *flow) +{ + dp_netdev_flow_ref(flow); + + cmap_insert(&flow_mark.mark_to_flow, + CONST_CAST(struct cmap_node *, &flow->mark_node), + hash_int(mark, 0)); + flow->mark = mark; + + VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow, mark); +} + +static bool +flow_mark_has_no_ref(uint32_t mark) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), + &flow_mark.mark_to_flow) { + if (flow->mark == mark) { + return false; + } + } + + return true; +} + +static int +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd, + struct dp_netdev_flow *flow) +{ + int ret = 0; + uint32_t mark = flow->mark; + struct cmap_node *mark_node = CONST_CAST(struct cmap_node *, + &flow->mark_node); + + cmap_remove(&flow_mark.mark_to_flow, mark_node, hash_int(mark, 0)); + flow->mark = INVALID_FLOW_MARK; + + /* + * no flow is referencing the mark any more? If so, let's + * remove the flow from hardare and free the mark. + */ + if (flow_mark_has_no_ref(mark)) { + struct dp_netdev_port *port; + odp_port_t in_port = flow->flow.in_port.odp_port; + + ovs_mutex_lock(&pmd->dp->port_mutex); + port = dp_netdev_lookup_port(pmd->dp, in_port); + if (port) { + ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL); + } + ovs_mutex_unlock(&pmd->dp->port_mutex); + + flow_mark_free(mark); + VLOG_DBG("freed flow mark %u\n", mark); + + megaflow_to_mark_disassociate(&flow->mega_ufid); + } + dp_netdev_flow_unref(flow); + + return ret; +} + +static void +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) { + if (flow->pmd_id == pmd->core_id) { + mark_to_flow_disassociate(pmd, flow); + } + } +} + static void dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow) @@ -1850,6 +2026,9 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, ovs_assert(cls != NULL); dpcls_remove(cls, &flow->cr); cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); + if (flow->mark != INVALID_FLOW_MARK) { + mark_to_flow_disassociate(pmd, flow); + } flow->dead = true; dp_netdev_flow_unref(flow); @@ -2429,6 +2608,101 @@ out: return error; } +/* + * There are two flow offload operations here: addition and modification. + * + * For flow addition, this function does: + * - allocate a new flow mark id + * - perform hardware flow offload + * - associate the flow mark with flow and mega flow + * + * For flow modification, both flow mark and the associations are still + * valid, thus only item 2 needed. + */ +static void +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port, + struct dp_netdev_flow *flow, struct match *match, + const struct nlattr *actions, size_t actions_len) +{ + struct offload_info info; + struct dp_netdev_port *port; + bool modification = flow->mark != INVALID_FLOW_MARK; + const char *op = modification ? "modify" : "add"; + uint32_t mark; + int ret; + + ovs_mutex_lock(&flow_mark.mutex); + + if (modification) { + mark = flow->mark; + } else { + if (!netdev_is_flow_api_enabled()) { + goto out; + } + + /* + * If a mega flow has already been offloaded (from other PMD + * instances), do not offload it again. 
+ */ + mark = megaflow_to_mark_find(&flow->mega_ufid); + if (mark != INVALID_FLOW_MARK) { + VLOG_DBG("flow has already been offloaded with mark %u\n", mark); + mark_to_flow_associate(mark, flow); + goto out; + } + + mark = flow_mark_alloc(); + if (mark == INVALID_FLOW_MARK) { + VLOG_ERR("failed to allocate flow mark!\n"); + goto out; + } + } + info.flow_mark = mark; + + ovs_mutex_lock(&pmd->dp->port_mutex); + port = dp_netdev_lookup_port(pmd->dp, in_port); + if (!port) { + ovs_mutex_unlock(&pmd->dp->port_mutex); + goto out; + } + ret = netdev_flow_put(port->netdev, match, + CONST_CAST(struct nlattr *, actions), + actions_len, &flow->mega_ufid, &info, NULL); + ovs_mutex_unlock(&pmd->dp->port_mutex); + + if (ret) { + VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark); + if (!modification) { + flow_mark_free(mark); + } else { + mark_to_flow_disassociate(pmd, flow); + } + goto out; + } + + if (!modification) { + megaflow_to_mark_associate(&flow->mega_ufid, mark); + mark_to_flow_associate(mark, flow); + } + VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark); + +out: + ovs_mutex_unlock(&flow_mark.mutex); +} + +static void +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid) +{ + struct flow masked_flow; + size_t i; + + for (i = 0; i < sizeof(struct flow); i++) { + ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] & + ((uint8_t *)&match->wc)[i]; + } + dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid); +} + static struct dp_netdev_flow * dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, struct match *match, const ovs_u128 *ufid, @@ -2464,12 +2738,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, memset(&flow->stats, 0, sizeof flow->stats); flow->dead = false; flow->batch = NULL; + flow->mark = INVALID_FLOW_MARK; *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id; *CONST_CAST(struct flow *, &flow->flow) = match->flow; *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid; ovs_refcount_init(&flow->ref_cnt); ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions, actions_len)); + dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *, &flow->mega_ufid)); netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask); /* Select dpcls for in_port. Relies on in_port to be exact match. 
*/ @@ -2479,6 +2755,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node), dp_netdev_flow_hash(&flow->ufid)); + try_netdev_flow_put(pmd, in_port, flow, match, actions, actions_len); + if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) { struct ds ds = DS_EMPTY_INITIALIZER; struct ofpbuf key_buf, mask_buf; @@ -2559,6 +2837,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, if (put->flags & DPIF_FP_MODIFY) { struct dp_netdev_actions *new_actions; struct dp_netdev_actions *old_actions; + odp_port_t in_port = netdev_flow->flow.in_port.odp_port; new_actions = dp_netdev_actions_create(put->actions, put->actions_len); @@ -2566,6 +2845,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd, old_actions = dp_netdev_flow_get_actions(netdev_flow); ovsrcu_set(&netdev_flow->actions, new_actions); + try_netdev_flow_put(pmd, in_port, netdev_flow, match, + put->actions, put->actions_len); + if (stats) { get_dpif_flow_stats(netdev_flow, stats); } @@ -3635,6 +3917,7 @@ reload_affected_pmds(struct dp_netdev *dp) CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { if (pmd->need_reload) { + flow_mark_flush(pmd); dp_netdev_reload_pmd__(pmd); pmd->need_reload = false; } diff --git a/lib/netdev.h b/lib/netdev.h index ff1b604..9ee3092 100644 --- a/lib/netdev.h +++ b/lib/netdev.h @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int qid); struct offload_info { const struct dpif_class *dpif_class; ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */ + + /* + * The flow mark id assigened to the flow. If any pkts hit the flow, + * it will be in the pkt meta data. + */ + uint32_t flow_mark; }; struct dpif_class; struct netdev_flow_dump; From patchwork Mon Jan 29 06:59:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanhan Liu X-Patchwork-Id: 867285 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=fridaylinux.org header.i=@fridaylinux.org header.b="UxnKcDQg"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.b="Tkk3aptT"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zVhC41qmMz9ryT for ; Tue, 30 Jan 2018 07:37:20 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 51F1C17CF; Mon, 29 Jan 2018 20:11:14 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E6F2B1A2D for ; Mon, 29 Jan 2018 07:00:00 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.linuxfoundation.org (Postfix) with 
ESMTPS id 0C3CD149; Mon, 29 Jan 2018 06:59:59 +0000 (UTC)
From: Yuanhan Liu
To: dev@openvswitch.org
Date: Mon, 29 Jan 2018 14:59:44 +0800
Message-Id: <1517209188-16608-3-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Cc: Simon Horman
Subject: [ovs-dev] [PATCH v7 2/6] dpif-netdev: retrieve flow directly from the flow mark

Retrieving the flow directly from the flow mark lets us skip some very costly
CPU operations, including but not limited to miniflow_extract, EMC lookup,
dpcls lookup, etc. Performance can thus be greatly improved.

PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and 1 million
streams (tp_src=1000-1999, tp_dst=2000-2999) shows more than a 260%
performance boost.

Note that although the heavy miniflow_extract is skipped, we still have to do
some per-packet processing, because we still need the tcp_flags.
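In essence, the per-packet fast path added here boils down to the following
check at the top of the packet loop in emc_processing() (a simplified sketch;
prefetching and the EMC/dpcls fallback path are omitted):

    /* Inside the per-packet loop of emc_processing(). */
    uint32_t flow_mark;

    if (dp_packet_has_flow_mark(packet, &flow_mark)) {
        flow = mark_to_flow_find(pmd, flow_mark);
        if (flow) {
            /* The NIC already classified this packet: skip
             * miniflow_extract, EMC and dpcls.  Only the TCP flags
             * still need to be parsed from the packet. */
            tcp_flags = parse_tcp_flags(packet);
            dp_netdev_queue_batches(packet, flow, tcp_flags,
                                    batches, n_batches);
            continue;
        }
    }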
Co-authored-by: Finn Christensen Signed-off-by: Yuanhan Liu Signed-off-by: Finn Christensen --- v7: - fixed wrong hash for mark_to_flow that has been refactored in v6 v5: - do not fetch the flow if the flow has been dead --- lib/dp-packet.h | 13 +++++ lib/dpif-netdev.c | 44 +++++++++++++--- lib/flow.c | 155 +++++++++++++++++++++++++++++++++++++++++++----------- lib/flow.h | 1 + 4 files changed, 175 insertions(+), 38 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..ea1194c 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -691,6 +691,19 @@ reset_dp_packet_checksum_ol_flags(struct dp_packet *p) #define reset_dp_packet_checksum_ol_flags(arg) #endif +static inline bool +dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED, + uint32_t *mark OVS_UNUSED) +{ +#ifdef DPDK_NETDEV + if (p->mbuf.ol_flags & PKT_RX_FDIR_ID) { + *mark = p->mbuf.hash.fdir.hi; + return true; + } +#endif + return false; +} + enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */ struct dp_packet_batch { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a514de8..cb16e2e 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2013,6 +2013,23 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd) } } +static struct dp_netdev_flow * +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, + const uint32_t mark) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), + &flow_mark.mark_to_flow) { + if (flow->mark == mark && flow->pmd_id == pmd->core_id && + flow->dead == false) { + return flow; + } + } + + return NULL; +} + static void dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow) @@ -5203,10 +5220,10 @@ struct packet_batch_per_flow { static inline void packet_batch_per_flow_update(struct packet_batch_per_flow *batch, struct dp_packet *packet, - const struct miniflow *mf) + uint16_t tcp_flags) { batch->byte_count += dp_packet_size(packet); - batch->tcp_flags |= miniflow_get_tcp_flags(mf); + batch->tcp_flags |= tcp_flags; batch->array.packets[batch->array.count++] = packet; } @@ -5240,7 +5257,7 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch, static inline void dp_netdev_queue_batches(struct dp_packet *pkt, - struct dp_netdev_flow *flow, const struct miniflow *mf, + struct dp_netdev_flow *flow, uint16_t tcp_flags, struct packet_batch_per_flow *batches, size_t *n_batches) { @@ -5251,7 +5268,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt, packet_batch_per_flow_init(batch, flow); } - packet_batch_per_flow_update(batch, pkt, mf); + packet_batch_per_flow_update(batch, pkt, tcp_flags); } /* Try to process all ('cnt') the 'packets' using only the exact match cache @@ -5282,6 +5299,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, const size_t cnt = dp_packet_batch_size(packets_); uint32_t cur_min; int i; + uint16_t tcp_flags; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); pmd_perf_update_counter(&pmd->perf_stats, @@ -5290,6 +5308,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; + uint32_t flow_mark; if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) { dp_packet_delete(packet); @@ -5297,6 +5316,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, continue; } + if (dp_packet_has_flow_mark(packet, &flow_mark)) { + flow = mark_to_flow_find(pmd, flow_mark); + if (flow) { + tcp_flags = parse_tcp_flags(packet); + dp_netdev_queue_batches(packet, flow, 
tcp_flags, batches, + n_batches); + continue; + } + } + if (i != cnt - 1) { struct dp_packet **packets = packets_->packets; /* Prefetch next packet data and metadata. */ @@ -5322,7 +5351,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, flow = NULL; } if (OVS_LIKELY(flow)) { - dp_netdev_queue_batches(packet, flow, &key->mf, batches, + tcp_flags = miniflow_get_tcp_flags(&key->mf); + dp_netdev_queue_batches(packet, flow, tcp_flags, batches, n_batches); } else { /* Exact match cache missed. Group missed packets together at @@ -5501,7 +5531,9 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, flow = dp_netdev_flow_cast(rules[i]); emc_probabilistic_insert(pmd, &keys[i], flow); - dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); + dp_netdev_queue_batches(packet, flow, + miniflow_get_tcp_flags(&keys[i].mf), + batches, n_batches); } pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, diff --git a/lib/flow.c b/lib/flow.c index f9d7c2a..80718d4 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -624,6 +624,70 @@ flow_extract(struct dp_packet *packet, struct flow *flow) miniflow_expand(&m.mf, flow); } +static inline bool +ipv4_sanity_check(const struct ip_header *nh, size_t size, + int *ip_lenp, uint16_t *tot_lenp) +{ + int ip_len; + uint16_t tot_len; + + if (OVS_UNLIKELY(size < IP_HEADER_LEN)) { + return false; + } + ip_len = IP_IHL(nh->ip_ihl_ver) * 4; + + if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN || size < ip_len)) { + return false; + } + + tot_len = ntohs(nh->ip_tot_len); + if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len || + size - tot_len > UINT8_MAX)) { + return false; + } + + *ip_lenp = ip_len; + *tot_lenp = tot_len; + + return true; +} + +static inline uint8_t +ipv4_get_nw_frag(const struct ip_header *nh) +{ + uint8_t nw_frag = 0; + + if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) { + nw_frag = FLOW_NW_FRAG_ANY; + if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) { + nw_frag |= FLOW_NW_FRAG_LATER; + } + } + + return nw_frag; +} + +static inline bool +ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size) +{ + uint16_t plen; + + if (OVS_UNLIKELY(size < sizeof *nh)) { + return false; + } + + plen = ntohs(nh->ip6_plen); + if (OVS_UNLIKELY(plen > size)) { + return false; + } + /* Jumbo Payload option not supported yet. */ + if (OVS_UNLIKELY(size - plen > UINT8_MAX)) { + return false; + } + + return true; +} + /* Caller is responsible for initializing 'dst' with enough storage for * FLOW_U64S * 8 bytes. 
*/ void @@ -748,22 +812,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) int ip_len; uint16_t tot_len; - if (OVS_UNLIKELY(size < IP_HEADER_LEN)) { - goto out; - } - ip_len = IP_IHL(nh->ip_ihl_ver) * 4; - - if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN)) { - goto out; - } - if (OVS_UNLIKELY(size < ip_len)) { - goto out; - } - tot_len = ntohs(nh->ip_tot_len); - if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len)) { - goto out; - } - if (OVS_UNLIKELY(size - tot_len > UINT8_MAX)) { + if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) { goto out; } dp_packet_set_l2_pad_size(packet, size - tot_len); @@ -786,31 +835,19 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) nw_tos = nh->ip_tos; nw_ttl = nh->ip_ttl; nw_proto = nh->ip_proto; - if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) { - nw_frag = FLOW_NW_FRAG_ANY; - if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) { - nw_frag |= FLOW_NW_FRAG_LATER; - } - } + nw_frag = ipv4_get_nw_frag(nh); data_pull(&data, &size, ip_len); } else if (dl_type == htons(ETH_TYPE_IPV6)) { - const struct ovs_16aligned_ip6_hdr *nh; + const struct ovs_16aligned_ip6_hdr *nh = data; ovs_be32 tc_flow; uint16_t plen; - if (OVS_UNLIKELY(size < sizeof *nh)) { + if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) { goto out; } - nh = data_pull(&data, &size, sizeof *nh); + data_pull(&data, &size, sizeof *nh); plen = ntohs(nh->ip6_plen); - if (OVS_UNLIKELY(plen > size)) { - goto out; - } - /* Jumbo Payload option not supported yet. */ - if (OVS_UNLIKELY(size - plen > UINT8_MAX)) { - goto out; - } dp_packet_set_l2_pad_size(packet, size - plen); size = plen; /* Never pull padding. */ @@ -982,6 +1019,60 @@ parse_dl_type(const struct eth_header *data_, size_t size) return parse_ethertype(&data, &size); } +uint16_t +parse_tcp_flags(struct dp_packet *packet) +{ + const void *data = dp_packet_data(packet); + size_t size = dp_packet_size(packet); + ovs_be16 dl_type; + uint8_t nw_frag = 0, nw_proto = 0; + + if (packet->packet_type != htonl(PT_ETH)) { + return 0; + } + + data_pull(&data, &size, ETH_ADDR_LEN * 2); + dl_type = parse_ethertype(&data, &size); + if (OVS_LIKELY(dl_type == htons(ETH_TYPE_IP))) { + const struct ip_header *nh = data; + int ip_len; + uint16_t tot_len; + + if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) { + return 0; + } + nw_proto = nh->ip_proto; + nw_frag = ipv4_get_nw_frag(nh); + + size = tot_len; /* Never pull padding. */ + data_pull(&data, &size, ip_len); + } else if (dl_type == htons(ETH_TYPE_IPV6)) { + const struct ovs_16aligned_ip6_hdr *nh = data; + + if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) { + return 0; + } + data_pull(&data, &size, sizeof *nh); + + size = ntohs(nh->ip6_plen); /* Never pull padding. */ + if (!parse_ipv6_ext_hdrs__(&data, &size, &nw_proto, &nw_frag)) { + return 0; + } + nw_proto = nh->ip6_nxt; + } else { + return 0; + } + + if (!(nw_frag & FLOW_NW_FRAG_LATER) && nw_proto == IPPROTO_TCP && + size >= TCP_HEADER_LEN) { + const struct tcp_header *tcp = data; + + return TCP_FLAGS_BE32(tcp->tcp_ctl); + } + + return 0; +} + /* For every bit of a field that is wildcarded in 'wildcards', sets the * corresponding bit in 'flow' to zero. 
*/ void diff --git a/lib/flow.h b/lib/flow.h index eb1e2bf..6be4199 100644 --- a/lib/flow.h +++ b/lib/flow.h @@ -130,6 +130,7 @@ bool parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto, uint8_t *nw_frag); ovs_be16 parse_dl_type(const struct eth_header *data_, size_t size); bool parse_nsh(const void **datap, size_t *sizep, struct ovs_key_nsh *key); +uint16_t parse_tcp_flags(struct dp_packet *packet); static inline uint64_t flow_get_xreg(const struct flow *flow, int idx) From patchwork Mon Jan 29 06:59:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanhan Liu X-Patchwork-Id: 867291 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=fridaylinux.org header.i=@fridaylinux.org header.b="omtYvVNP"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.b="JnJ6TO/B"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zVhRd1cNbz9s75 for ; Tue, 30 Jan 2018 07:48:13 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 594FDDD3; Mon, 29 Jan 2018 20:11:53 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A3B421A2E for ; Mon, 29 Jan 2018 07:00:04 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id D9D24149 for ; Mon, 29 Jan 2018 07:00:02 +0000 (UTC) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 2A77920AB3; Mon, 29 Jan 2018 02:00:02 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Mon, 29 Jan 2018 02:00:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fridaylinux.org; h=cc:date:from:in-reply-to:message-id:references:subject:to :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=cv4IIbHMTX5NEnMHy lKXLp59B3CXse0KV8ASwo1VYJ8=; b=omtYvVNPNOk3aunCBNuMqUu8ti8MQyavr sm5bJnATx/P3l3YnOBp8zOO9+yMwnnSCHkaX+bVEEwHvD8U+kJxZksn8rO8pXW7Q nrdK/my/+V+FTGwJzDtawgvVaNsKUV+2enryzLrWCcr96u//pQfghRXkzZxE06TJ TggNp8q9K7u8PofvNLaeN+3sVOM+3LfrPIUdtX/EVkxy+FM45WiJhRx1Sx3RNwCb fJzAA+pdl+u9ntxmq2EJd4eP7NXHTuSgB4EgzK7C0gpLlR9AYF0l4vcxdjV3W06c lgq5K1wZF/AwbIyikCv9q15gG0q++CS3JHtr8IFK6RMF5e+I5iBkA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; bh=cv4IIbHMTX5NEnMHylKXLp59B3CXse0KV8ASwo1VYJ8=; b=JnJ6TO/B cxTcWcWs7X1VG+jVJdyLwYgh+n8kiqPLeIS6PNRy0MB07UIUjyhwK7DxHul2UrzI 
From: Yuanhan Liu
To: dev@openvswitch.org
Date: Mon, 29 Jan 2018 14:59:45 +0800
Message-Id: <1517209188-16608-4-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Cc: Simon Horman
Subject: [ovs-dev] [PATCH v7 3/6] netdev-dpdk: implement flow offload with rte flow

From: Finn Christensen

The basic yet major part of this patch is translating the "match" into rte
flow patterns. We then create an rte flow with MARK + RSS actions. Afterwards,
every packet that matches the flow will have the mark id in its mbuf.

The reason RSS is needed is that, for most NICs, a MARK-only action is not
allowed; it has to be used together with some other action, such as QUEUE or
RSS. However, a QUEUE action can specify only one queue, which would break
RSS distribution. The RSS action is therefore the best option currently
available, and that is what is chosen.

For any unsupported flow, such as MPLS, -1 is returned, meaning the flow
offload failed and is skipped.

Co-authored-by: Yuanhan Liu
Signed-off-by: Finn Christensen
Signed-off-by: Yuanhan Liu
---
v7: - set the rss_conf for the rss action to NULL, to work around an mlx5
      change in DPDK v17.11. Note that the driver will still obey the RSS
      settings OVS-DPDK has set in the beginning, so nothing should be
      affected.
---
 lib/netdev-dpdk.c | 559 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 558 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ac2e38e..4bd0503 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -38,7 +38,9 @@ #include #include #include +#include +#include "cmap.h" #include "dirs.h" #include "dp-packet.h" #include "dpdk.h" @@ -60,6 +62,7 @@ #include "sset.h" #include "unaligned.h" #include "timeval.h" +#include "uuid.h" #include "unixctl.h" enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; @@ -3633,6 +3636,560 @@ unlock: return err; } +/* + * A mapping from ufid to dpdk rte_flow.
+ */ +static struct cmap ufid_to_rte_flow = CMAP_INITIALIZER; + +struct ufid_to_rte_flow_data { + struct cmap_node node; + ovs_u128 ufid; + struct rte_flow *rte_flow; +}; + +/* Find rte_flow with @ufid */ +static struct rte_flow * +ufid_to_rte_flow_find(const ovs_u128 *ufid) +{ + size_t hash = hash_bytes(ufid, sizeof(*ufid), 0); + struct ufid_to_rte_flow_data *data; + + CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_rte_flow) { + if (ovs_u128_equals(*ufid, data->ufid)) { + return data->rte_flow; + } + } + + return NULL; +} + +static inline void +ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct rte_flow *rte_flow) +{ + size_t hash = hash_bytes(ufid, sizeof(*ufid), 0); + struct ufid_to_rte_flow_data *data = xzalloc(sizeof(*data)); + + /* + * We should not simply overwrite an existing rte flow. + * We should have deleted it first before re-adding it. + * Thus, if following assert triggers, something is wrong: + * the rte_flow is not destroyed. + */ + ovs_assert(ufid_to_rte_flow_find(ufid) == NULL); + + data->ufid = *ufid; + data->rte_flow = rte_flow; + + cmap_insert(&ufid_to_rte_flow, + CONST_CAST(struct cmap_node *, &data->node), hash); +} + +static inline void +ufid_to_rte_flow_disassociate(const ovs_u128 *ufid) +{ + size_t hash = hash_bytes(ufid, sizeof(*ufid), 0); + struct ufid_to_rte_flow_data *data; + + CMAP_FOR_EACH_WITH_HASH (data, node, hash, &ufid_to_rte_flow) { + if (ovs_u128_equals(*ufid, data->ufid)) { + cmap_remove(&ufid_to_rte_flow, + CONST_CAST(struct cmap_node *, &data->node), hash); + free(data); + return; + } + } + + VLOG_WARN("ufid "UUID_FMT" is not associated with an rte flow\n", + UUID_ARGS((struct uuid *)ufid)); +} + +struct flow_patterns { + struct rte_flow_item *items; + int cnt; + int max; +}; + +struct flow_actions { + struct rte_flow_action *actions; + int cnt; + int max; +}; + +static void +add_flow_pattern(struct flow_patterns *patterns, enum rte_flow_item_type type, + const void *spec, const void *mask) +{ + int cnt = patterns->cnt; + + if (cnt == 0) { + patterns->max = 8; + patterns->items = xcalloc(patterns->max, sizeof(struct rte_flow_item)); + } else if (cnt == patterns->max) { + patterns->max *= 2; + patterns->items = xrealloc(patterns->items, patterns->max * + sizeof(struct rte_flow_item)); + } + + patterns->items[cnt].type = type; + patterns->items[cnt].spec = spec; + patterns->items[cnt].mask = mask; + patterns->items[cnt].last = NULL; + patterns->cnt++; +} + +static void +add_flow_action(struct flow_actions *actions, enum rte_flow_action_type type, + const void *conf) +{ + int cnt = actions->cnt; + + if (cnt == 0) { + actions->max = 8; + actions->actions = xcalloc(actions->max, + sizeof(struct rte_flow_action)); + } else if (cnt == actions->max) { + actions->max *= 2; + actions->actions = xrealloc(actions->actions, actions->max * + sizeof(struct rte_flow_action)); + } + + actions->actions[cnt].type = type; + actions->actions[cnt].conf = conf; + actions->cnt++; +} + +static struct rte_flow_action_rss * +add_flow_rss_action(struct flow_actions *actions, struct netdev *netdev) +{ + int i; + struct rte_flow_action_rss *rss; + + rss = xmalloc(sizeof(*rss) + sizeof(uint16_t) * netdev->n_rxq); + /* + * Setting it to NULL will let the driver use the default RSS + * configuration we have set: &port_conf.rx_adv_conf.rss_conf. 
+ */ + rss->rss_conf = NULL; + rss->num = netdev->n_rxq; + + for (i = 0; i < rss->num; i++) { + rss->queue[i] = i; + } + + add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, rss); + + return rss; +} + +static int +netdev_dpdk_add_rte_flow_offload(struct netdev *netdev, + const struct match *match, + struct nlattr *nl_actions OVS_UNUSED, + size_t actions_len OVS_UNUSED, + const ovs_u128 *ufid, + struct offload_info *info) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + const struct rte_flow_attr flow_attr = { + .group = 0, + .priority = 0, + .ingress = 1, + .egress = 0 + }; + struct flow_patterns patterns = { .items = NULL, .cnt = 0 }; + struct flow_actions actions = { .actions = NULL, .cnt = 0 }; + struct rte_flow *flow; + struct rte_flow_error error; + uint8_t *ipv4_next_proto_mask = NULL; + int ret = 0; + + /* Eth */ + struct rte_flow_item_eth eth_spec; + struct rte_flow_item_eth eth_mask; + memset(ð_spec, 0, sizeof(eth_spec)); + memset(ð_mask, 0, sizeof(eth_mask)); + if (!eth_addr_is_zero(match->wc.masks.dl_src) || + !eth_addr_is_zero(match->wc.masks.dl_dst)) { + rte_memcpy(ð_spec.dst, &match->flow.dl_dst, sizeof(eth_spec.dst)); + rte_memcpy(ð_spec.src, &match->flow.dl_src, sizeof(eth_spec.src)); + eth_spec.type = match->flow.dl_type; + + rte_memcpy(ð_mask.dst, &match->wc.masks.dl_dst, + sizeof(eth_mask.dst)); + rte_memcpy(ð_mask.src, &match->wc.masks.dl_src, + sizeof(eth_mask.src)); + eth_mask.type = match->wc.masks.dl_type; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH, + ð_spec, ð_mask); + } else { + /* + * If user specifies a flow (like UDP flow) without L2 patterns, + * OVS will at least set the dl_type. Normally, it's enough to + * create an eth pattern just with it. Unluckily, some Intel's + * NIC (such as XL710) doesn't support that. Below is a workaround, + * which simply matches any L2 pkts. 
+ */ + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ETH, NULL, NULL); + } + + /* VLAN */ + struct rte_flow_item_vlan vlan_spec; + struct rte_flow_item_vlan vlan_mask; + memset(&vlan_spec, 0, sizeof(vlan_spec)); + memset(&vlan_mask, 0, sizeof(vlan_mask)); + if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) { + vlan_spec.tci = match->flow.vlans[0].tci; + vlan_mask.tci = match->wc.masks.vlans[0].tci; + + /* match any protocols */ + vlan_mask.tpid = 0; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN, + &vlan_spec, &vlan_mask); + } + + /* IP v4 */ + uint8_t proto = 0; + struct rte_flow_item_ipv4 ipv4_spec; + struct rte_flow_item_ipv4 ipv4_mask; + memset(&ipv4_spec, 0, sizeof(ipv4_spec)); + memset(&ipv4_mask, 0, sizeof(ipv4_mask)); + if (match->flow.dl_type == ntohs(ETH_TYPE_IP) && + (match->wc.masks.nw_src || match->wc.masks.nw_dst || + match->wc.masks.nw_tos || match->wc.masks.nw_ttl || + match->wc.masks.nw_proto)) { + ipv4_spec.hdr.type_of_service = match->flow.nw_tos; + ipv4_spec.hdr.time_to_live = match->flow.nw_ttl; + ipv4_spec.hdr.next_proto_id = match->flow.nw_proto; + ipv4_spec.hdr.src_addr = match->flow.nw_src; + ipv4_spec.hdr.dst_addr = match->flow.nw_dst; + + ipv4_mask.hdr.type_of_service = match->wc.masks.nw_tos; + ipv4_mask.hdr.time_to_live = match->wc.masks.nw_ttl; + ipv4_mask.hdr.next_proto_id = match->wc.masks.nw_proto; + ipv4_mask.hdr.src_addr = match->wc.masks.nw_src; + ipv4_mask.hdr.dst_addr = match->wc.masks.nw_dst; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_IPV4, + &ipv4_spec, &ipv4_mask); + + /* Save proto for L4 protocol setup */ + proto = ipv4_spec.hdr.next_proto_id & ipv4_mask.hdr.next_proto_id; + + /* Remember proto mask address for later modification */ + ipv4_next_proto_mask = &ipv4_mask.hdr.next_proto_id; + } + + if (proto != IPPROTO_ICMP && proto != IPPROTO_UDP && + proto != IPPROTO_SCTP && proto != IPPROTO_TCP && + (match->wc.masks.tp_src || + match->wc.masks.tp_dst || + match->wc.masks.tcp_flags)) { + VLOG_DBG("L4 Protocol (%u) not supported", proto); + ret = -1; + goto out; + } + + if ((match->wc.masks.tp_src && match->wc.masks.tp_src != 0xffff) || + (match->wc.masks.tp_dst && match->wc.masks.tp_dst != 0xffff)) { + ret = -1; + goto out; + } + + struct rte_flow_item_udp udp_spec; + struct rte_flow_item_udp udp_mask; + memset(&udp_spec, 0, sizeof(udp_spec)); + memset(&udp_mask, 0, sizeof(udp_mask)); + if (proto == IPPROTO_UDP && + (match->wc.masks.tp_src || match->wc.masks.tp_dst)) { + udp_spec.hdr.src_port = match->flow.tp_src; + udp_spec.hdr.dst_port = match->flow.tp_dst; + + udp_mask.hdr.src_port = match->wc.masks.tp_src; + udp_mask.hdr.dst_port = match->wc.masks.tp_dst; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_UDP, + &udp_spec, &udp_mask); + + /* proto == UDP and ITEM_TYPE_UDP, thus no need for proto match */ + if (ipv4_next_proto_mask) { + *ipv4_next_proto_mask = 0; + } + } + + struct rte_flow_item_sctp sctp_spec; + struct rte_flow_item_sctp sctp_mask; + memset(&sctp_spec, 0, sizeof(sctp_spec)); + memset(&sctp_mask, 0, sizeof(sctp_mask)); + if (proto == IPPROTO_SCTP && + (match->wc.masks.tp_src || match->wc.masks.tp_dst)) { + sctp_spec.hdr.src_port = match->flow.tp_src; + sctp_spec.hdr.dst_port = match->flow.tp_dst; + + sctp_mask.hdr.src_port = match->wc.masks.tp_src; + sctp_mask.hdr.dst_port = match->wc.masks.tp_dst; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_SCTP, + &sctp_spec, &sctp_mask); + + /* proto == SCTP and ITEM_TYPE_SCTP, thus no need for proto match */ + if (ipv4_next_proto_mask) { + 
*ipv4_next_proto_mask = 0; + } + } + + struct rte_flow_item_icmp icmp_spec; + struct rte_flow_item_icmp icmp_mask; + memset(&icmp_spec, 0, sizeof(icmp_spec)); + memset(&icmp_mask, 0, sizeof(icmp_mask)); + if (proto == IPPROTO_ICMP && + (match->wc.masks.tp_src || match->wc.masks.tp_dst)) { + icmp_spec.hdr.icmp_type = (uint8_t)ntohs(match->flow.tp_src); + icmp_spec.hdr.icmp_code = (uint8_t)ntohs(match->flow.tp_dst); + + icmp_mask.hdr.icmp_type = (uint8_t)ntohs(match->wc.masks.tp_src); + icmp_mask.hdr.icmp_code = (uint8_t)ntohs(match->wc.masks.tp_dst); + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_ICMP, + &icmp_spec, &icmp_mask); + + /* proto == ICMP and ITEM_TYPE_ICMP, thus no need for proto match */ + if (ipv4_next_proto_mask) { + *ipv4_next_proto_mask = 0; + } + } + + struct rte_flow_item_tcp tcp_spec; + struct rte_flow_item_tcp tcp_mask; + memset(&tcp_spec, 0, sizeof(tcp_spec)); + memset(&tcp_mask, 0, sizeof(tcp_mask)); + if (proto == IPPROTO_TCP && + (match->wc.masks.tp_src || + match->wc.masks.tp_dst || + match->wc.masks.tcp_flags)) { + tcp_spec.hdr.src_port = match->flow.tp_src; + tcp_spec.hdr.dst_port = match->flow.tp_dst; + tcp_spec.hdr.data_off = ntohs(match->flow.tcp_flags) >> 8; + tcp_spec.hdr.tcp_flags = ntohs(match->flow.tcp_flags) & 0xff; + + tcp_mask.hdr.src_port = match->wc.masks.tp_src; + tcp_mask.hdr.dst_port = match->wc.masks.tp_dst; + tcp_mask.hdr.data_off = ntohs(match->wc.masks.tcp_flags) >> 8; + tcp_mask.hdr.tcp_flags = ntohs(match->wc.masks.tcp_flags) & 0xff; + + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_TCP, + &tcp_spec, &tcp_mask); + + /* proto == TCP and ITEM_TYPE_TCP, thus no need for proto match */ + if (ipv4_next_proto_mask) { + *ipv4_next_proto_mask = 0; + } + } + add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_END, NULL, NULL); + + struct rte_flow_action_mark mark; + mark.id = info->flow_mark; + add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_MARK, &mark); + + struct rte_flow_action_rss *rss; + rss = add_flow_rss_action(&actions, netdev); + add_flow_action(&actions, RTE_FLOW_ACTION_TYPE_END, NULL); + + flow = rte_flow_create(dev->port_id, &flow_attr, patterns.items, + actions.actions, &error); + free(rss); + if (!flow) { + VLOG_ERR("rte flow creat error: %u : message : %s\n", + error.type, error.message); + ret = -1; + goto out; + } + ufid_to_rte_flow_associate(ufid, flow); + VLOG_DBG("installed flow %p by ufid "UUID_FMT"\n", + flow, UUID_ARGS((struct uuid *)ufid)); + +out: + free(patterns.items); + free(actions.actions); + return ret; +} + +static bool +is_all_zero(const void *addr, size_t n) +{ + size_t i = 0; + const uint8_t *p = (uint8_t *)addr; + + for (i = 0; i < n; i++) { + if (p[i] != 0) { + return false; + } + } + + return true; +} + +/* + * Check if any unsupported flow patterns are specified. 
+ */ +static int +netdev_dpdk_validate_flow(const struct match *match) +{ + struct match match_zero_wc; + + /* Create a wc-zeroed version of flow */ + match_init(&match_zero_wc, &match->flow, &match->wc); + + if (!is_all_zero(&match_zero_wc.flow.tunnel, + sizeof(match_zero_wc.flow.tunnel))) { + goto err; + } + + if (match->wc.masks.metadata || + match->wc.masks.skb_priority || + match->wc.masks.pkt_mark || + match->wc.masks.dp_hash) { + goto err; + } + + /* recirc id must be zero */ + if (match_zero_wc.flow.recirc_id) { + goto err; + } + + if (match->wc.masks.ct_state || + match->wc.masks.ct_nw_proto || + match->wc.masks.ct_zone || + match->wc.masks.ct_mark || + match->wc.masks.ct_label.u64.hi || + match->wc.masks.ct_label.u64.lo) { + goto err; + } + + if (match->wc.masks.conj_id || + match->wc.masks.actset_output) { + goto err; + } + + /* unsupported L2 */ + if (!is_all_zero(&match->wc.masks.mpls_lse, + sizeof(match_zero_wc.flow.mpls_lse))) { + goto err; + } + + /* unsupported L3 */ + if (match->wc.masks.ipv6_label || + match->wc.masks.ct_nw_src || + match->wc.masks.ct_nw_dst || + !is_all_zero(&match->wc.masks.ipv6_src, sizeof(struct in6_addr)) || + !is_all_zero(&match->wc.masks.ipv6_dst, sizeof(struct in6_addr)) || + !is_all_zero(&match->wc.masks.ct_ipv6_src, sizeof(struct in6_addr)) || + !is_all_zero(&match->wc.masks.ct_ipv6_dst, sizeof(struct in6_addr)) || + !is_all_zero(&match->wc.masks.nd_target, sizeof(struct in6_addr)) || + !is_all_zero(&match->wc.masks.nsh, sizeof(struct ovs_key_nsh)) || + !is_all_zero(&match->wc.masks.arp_sha, sizeof(struct eth_addr)) || + !is_all_zero(&match->wc.masks.arp_tha, sizeof(struct eth_addr))) { + goto err; + } + + /* If fragmented, then don't HW accelerate - for now */ + if (match_zero_wc.flow.nw_frag) { + goto err; + } + + /* unsupported L4 */ + if (match->wc.masks.igmp_group_ip4 || + match->wc.masks.ct_tp_src || + match->wc.masks.ct_tp_dst) { + goto err; + } + + return 0; + +err: + VLOG_ERR("cannot HW accelerate this flow due to unsupported protocols"); + return -1; +} + +static int +netdev_dpdk_destroy_rte_flow(struct netdev_dpdk *dev, + const ovs_u128 *ufid, + struct rte_flow *rte_flow) +{ + struct rte_flow_error error; + int ret; + + ret = rte_flow_destroy(dev->port_id, rte_flow, &error); + if (ret == 0) { + ufid_to_rte_flow_disassociate(ufid); + VLOG_DBG("removed rte flow %p associated with ufid " UUID_FMT "\n", + rte_flow, UUID_ARGS((struct uuid *)ufid)); + } else { + VLOG_ERR("rte flow destroy error: %u : message : %s\n", + error.type, error.message); + } + + return ret; +} + +static int +netdev_dpdk_flow_put(struct netdev *netdev, struct match *match, + struct nlattr *actions, size_t actions_len, + const ovs_u128 *ufid, struct offload_info *info, + struct dpif_flow_stats *stats OVS_UNUSED) +{ + struct rte_flow *rte_flow; + int ret; + + /* + * If an old rte_flow exists, it means it's a flow modification. + * Here destroy the old rte flow first before adding a new one. 
+ */ + rte_flow = ufid_to_rte_flow_find(ufid); + if (rte_flow) { + ret = netdev_dpdk_destroy_rte_flow(netdev_dpdk_cast(netdev), + ufid, rte_flow); + if (ret < 0) { + return ret; + } + } + + ret = netdev_dpdk_validate_flow(match); + if (ret < 0) { + return ret; + } + + return netdev_dpdk_add_rte_flow_offload(netdev, match, actions, + actions_len, ufid, info); +} + +static int +netdev_dpdk_flow_del(struct netdev *netdev, const ovs_u128 *ufid, + struct dpif_flow_stats *stats OVS_UNUSED) +{ + + struct rte_flow *rte_flow = ufid_to_rte_flow_find(ufid); + + if (!rte_flow) { + return -1; + } + + return netdev_dpdk_destroy_rte_flow(netdev_dpdk_cast(netdev), + ufid, rte_flow); +} + +#define DPDK_FLOW_OFFLOAD_API \ + NULL, /* flow_flush */ \ + NULL, /* flow_dump_create */ \ + NULL, /* flow_dump_destroy */ \ + NULL, /* flow_dump_next */ \ + netdev_dpdk_flow_put, \ + NULL, /* flow_get */ \ + netdev_dpdk_flow_del, \ + NULL /* init_flow_api */ + + #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ GET_CARRIER, GET_STATS, \ @@ -3707,7 +4264,7 @@ unlock: RXQ_RECV, \ NULL, /* rx_wait */ \ NULL, /* rxq_drain */ \ - NO_OFFLOAD_API \ + DPDK_FLOW_OFFLOAD_API \ } static const struct netdev_class dpdk_class = From patchwork Mon Jan 29 06:59:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanhan Liu X-Patchwork-Id: 867298 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=fridaylinux.org header.i=@fridaylinux.org header.b="nZ2+nXrK"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.b="cb87HM2m"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zVhdT1kqKz9s7s for ; Tue, 30 Jan 2018 07:56:45 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 8BB01D62; Mon, 29 Jan 2018 20:12:20 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id DA3DF1A31 for ; Mon, 29 Jan 2018 07:00:06 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 197C4149 for ; Mon, 29 Jan 2018 07:00:06 +0000 (UTC) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 6429E20B9D; Mon, 29 Jan 2018 02:00:05 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Mon, 29 Jan 2018 02:00:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fridaylinux.org; h=cc:date:from:in-reply-to:message-id:references:subject:to 
From: Yuanhan Liu
To: dev@openvswitch.org
Date: Mon, 29 Jan 2018 14:59:46 +0800
Message-Id: <1517209188-16608-5-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Cc: Simon Horman
Subject: [ovs-dev] [PATCH v7 4/6] netdev-dpdk: add debug for rte flow patterns

For debugging purposes.
Co-authored-by: Finn Christensen Signed-off-by: Yuanhan Liu Signed-off-by: Finn Christensen --- v5: - turned log to DBG level --- lib/netdev-dpdk.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 177 insertions(+) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 4bd0503..a18a8b9 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -3716,6 +3716,182 @@ struct flow_actions { }; static void +dump_flow_pattern(struct rte_flow_item *item) +{ + if (item->type == RTE_FLOW_ITEM_TYPE_ETH) { + const struct rte_flow_item_eth *eth_spec = item->spec; + const struct rte_flow_item_eth *eth_mask = item->mask; + + VLOG_DBG("rte flow eth pattern:\n"); + if (eth_spec) { + VLOG_DBG(" spec: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", " + "type=0x%04" PRIx16"\n", + eth_spec->src.addr_bytes[0], eth_spec->src.addr_bytes[1], + eth_spec->src.addr_bytes[2], eth_spec->src.addr_bytes[3], + eth_spec->src.addr_bytes[4], eth_spec->src.addr_bytes[5], + eth_spec->dst.addr_bytes[0], eth_spec->dst.addr_bytes[1], + eth_spec->dst.addr_bytes[2], eth_spec->dst.addr_bytes[3], + eth_spec->dst.addr_bytes[4], eth_spec->dst.addr_bytes[5], + ntohs(eth_spec->type)); + } else { + VLOG_DBG(" spec = null\n"); + } + if (eth_mask) { + VLOG_DBG(" mask: src="ETH_ADDR_FMT", dst="ETH_ADDR_FMT", " + "type=0x%04"PRIx16"\n", + eth_mask->src.addr_bytes[0], eth_mask->src.addr_bytes[1], + eth_mask->src.addr_bytes[2], eth_mask->src.addr_bytes[3], + eth_mask->src.addr_bytes[4], eth_mask->src.addr_bytes[5], + eth_mask->dst.addr_bytes[0], eth_mask->dst.addr_bytes[1], + eth_mask->dst.addr_bytes[2], eth_mask->dst.addr_bytes[3], + eth_mask->dst.addr_bytes[4], eth_mask->dst.addr_bytes[5], + eth_mask->type); + } else { + VLOG_DBG(" mask = null\n"); + } + } + + if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) { + const struct rte_flow_item_vlan *vlan_spec = item->spec; + const struct rte_flow_item_vlan *vlan_mask = item->mask; + + VLOG_DBG("rte flow vlan pattern:\n"); + if (vlan_spec) { + VLOG_DBG(" spec: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n", + ntohs(vlan_spec->tpid), ntohs(vlan_spec->tci)); + } else { + VLOG_DBG(" spec = null\n"); + } + + if (vlan_mask) { + VLOG_DBG(" mask: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n", + vlan_mask->tpid, vlan_mask->tci); + } else { + VLOG_DBG(" mask = null\n"); + } + } + + if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) { + const struct rte_flow_item_ipv4 *ipv4_spec = item->spec; + const struct rte_flow_item_ipv4 *ipv4_mask = item->mask; + + VLOG_DBG("rte flow ipv4 pattern:\n"); + if (ipv4_spec) { + VLOG_DBG(" spec: tos=0x%"PRIx8", ttl=%"PRIx8", proto=0x%"PRIx8 + ", src="IP_FMT", dst="IP_FMT"\n", + ipv4_spec->hdr.type_of_service, + ipv4_spec->hdr.time_to_live, + ipv4_spec->hdr.next_proto_id, + IP_ARGS(ipv4_spec->hdr.src_addr), + IP_ARGS(ipv4_spec->hdr.dst_addr)); + } else { + VLOG_DBG(" spec = null\n"); + } + if (ipv4_mask) { + VLOG_DBG(" mask: tos=0x%"PRIx8", ttl=%"PRIx8", proto=0x%"PRIx8 + ", src="IP_FMT", dst="IP_FMT"\n", + ipv4_mask->hdr.type_of_service, + ipv4_mask->hdr.time_to_live, + ipv4_mask->hdr.next_proto_id, + IP_ARGS(ipv4_mask->hdr.src_addr), + IP_ARGS(ipv4_mask->hdr.dst_addr)); + } else { + VLOG_DBG(" mask = null\n"); + } + } + + if (item->type == RTE_FLOW_ITEM_TYPE_UDP) { + const struct rte_flow_item_udp *udp_spec = item->spec; + const struct rte_flow_item_udp *udp_mask = item->mask; + + VLOG_DBG("rte flow udp pattern:\n"); + if (udp_spec) { + VLOG_DBG(" spec: src_port=%"PRIu16", dst_port=%"PRIu16"\n", + ntohs(udp_spec->hdr.src_port), + ntohs(udp_spec->hdr.dst_port)); + } else { + VLOG_DBG(" 
spec = null\n");
+        }
+        if (udp_mask) {
+            VLOG_DBG("  mask: src_port=0x%"PRIx16", dst_port=0x%"PRIx16"\n",
+                     udp_mask->hdr.src_port,
+                     udp_mask->hdr.dst_port);
+        } else {
+            VLOG_DBG("  mask = null\n");
+        }
+    }
+
+    if (item->type == RTE_FLOW_ITEM_TYPE_SCTP) {
+        const struct rte_flow_item_sctp *sctp_spec = item->spec;
+        const struct rte_flow_item_sctp *sctp_mask = item->mask;
+
+        VLOG_DBG("rte flow sctp pattern:\n");
+        if (sctp_spec) {
+            VLOG_DBG("  spec: src_port=%"PRIu16", dst_port=%"PRIu16"\n",
+                     ntohs(sctp_spec->hdr.src_port),
+                     ntohs(sctp_spec->hdr.dst_port));
+        } else {
+            VLOG_DBG("  spec = null\n");
+        }
+        if (sctp_mask) {
+            VLOG_DBG("  mask: src_port=0x%"PRIx16", dst_port=0x%"PRIx16"\n",
+                     sctp_mask->hdr.src_port,
+                     sctp_mask->hdr.dst_port);
+        } else {
+            VLOG_DBG("  mask = null\n");
+        }
+    }
+
+    if (item->type == RTE_FLOW_ITEM_TYPE_ICMP) {
+        const struct rte_flow_item_icmp *icmp_spec = item->spec;
+        const struct rte_flow_item_icmp *icmp_mask = item->mask;
+
+        VLOG_DBG("rte flow icmp pattern:\n");
+        if (icmp_spec) {
+            VLOG_DBG("  spec: icmp_type=%"PRIu8", icmp_code=%"PRIu8"\n",
+                     icmp_spec->hdr.icmp_type,
+                     icmp_spec->hdr.icmp_code);
+        } else {
+            VLOG_DBG("  spec = null\n");
+        }
+        if (icmp_mask) {
+            VLOG_DBG("  mask: icmp_type=0x%"PRIx8", icmp_code=0x%"PRIx8"\n",
+                     icmp_mask->hdr.icmp_type,
+                     icmp_mask->hdr.icmp_code);
+        } else {
+            VLOG_DBG("  mask = null\n");
+        }
+    }
+
+    if (item->type == RTE_FLOW_ITEM_TYPE_TCP) {
+        const struct rte_flow_item_tcp *tcp_spec = item->spec;
+        const struct rte_flow_item_tcp *tcp_mask = item->mask;
+
+        VLOG_DBG("rte flow tcp pattern:\n");
+        if (tcp_spec) {
+            VLOG_DBG("  spec: src_port=%"PRIu16", dst_port=%"PRIu16
+                     ", data_off=0x%"PRIx8", tcp_flags=0x%"PRIx8"\n",
+                     ntohs(tcp_spec->hdr.src_port),
+                     ntohs(tcp_spec->hdr.dst_port),
+                     tcp_spec->hdr.data_off,
+                     tcp_spec->hdr.tcp_flags);
+        } else {
+            VLOG_DBG("  spec = null\n");
+        }
+        if (tcp_mask) {
+            VLOG_DBG("  mask: src_port=%"PRIx16", dst_port=%"PRIx16
+                     ", data_off=0x%"PRIx8", tcp_flags=0x%"PRIx8"\n",
+                     tcp_mask->hdr.src_port,
+                     tcp_mask->hdr.dst_port,
+                     tcp_mask->hdr.data_off,
+                     tcp_mask->hdr.tcp_flags);
+        } else {
+            VLOG_DBG("  mask = null\n");
+        }
+    }
+}
+
+static void
 add_flow_pattern(struct flow_patterns *patterns, enum rte_flow_item_type type,
                  const void *spec, const void *mask)
 {
@@ -3734,6 +3910,7 @@ add_flow_pattern(struct flow_patterns *patterns, enum rte_flow_item_type type,
     patterns->items[cnt].spec = spec;
     patterns->items[cnt].mask = mask;
     patterns->items[cnt].last = NULL;
+    dump_flow_pattern(&patterns->items[cnt]);
     patterns->cnt++;
 }

From patchwork Mon Jan 29 06:59:47 2018
X-Patchwork-Submitter: Yuanhan Liu
X-Patchwork-Id: 867293
X-Patchwork-Delegate: ian.stokes@intel.com
From: Yuanhan Liu
To: dev@openvswitch.org
Cc: Simon Horman
Date: Mon, 29 Jan 2018 14:59:47 +0800
Message-Id: <1517209188-16608-6-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Subject: [ovs-dev] [PATCH v7 5/6] dpif-netdev: do hw flow offload in a thread

Currently, the major trigger for hw flow offload is at upcall handling,
which is actually in the datapath.
Moreover, hardware offload installation and modification are not exactly
lightweight operations: if many flows are added or modified frequently, they
could stall the datapath, which could result in packet loss.

To mitigate that, every flow offload operation is now recorded and appended
to a list.  A dedicated thread is then introduced to process this list (to do
the real flow offload put/del operations).  This keeps the datapath itself as
lightweight as possible.

Signed-off-by: Yuanhan Liu
---
v5: - removed an unused mutex lock
---
 lib/dpif-netdev.c | 348 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 258 insertions(+), 90 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index cb16e2e..b1f6973 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1854,13 +1854,11 @@ struct flow_mark {
     struct cmap megaflow_to_mark;
     struct cmap mark_to_flow;
     struct id_pool *pool;
-    struct ovs_mutex mutex;
 };
 
 static struct flow_mark flow_mark = {
     .megaflow_to_mark = CMAP_INITIALIZER,
     .mark_to_flow = CMAP_INITIALIZER,
-    .mutex = OVS_MUTEX_INITIALIZER,
 };
 
 static uint32_t
@@ -2001,6 +1999,8 @@ mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
     return ret;
 }
 
+static void queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd,
+                                  struct dp_netdev_flow *flow);
 static void
 flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
 {
@@ -2008,7 +2008,7 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
     CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
         if (flow->pmd_id == pmd->core_id) {
-            mark_to_flow_disassociate(pmd, flow);
+            queue_netdev_flow_del(pmd, flow);
         }
     }
 }
@@ -2030,6 +2030,257 @@ mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
     return NULL;
 }
 
+enum {
+    DP_NETDEV_FLOW_OFFLOAD_OP_ADD,
+    DP_NETDEV_FLOW_OFFLOAD_OP_MOD,
+    DP_NETDEV_FLOW_OFFLOAD_OP_DEL,
+};
+
+struct dp_flow_offload_item {
+    struct dp_netdev_pmd_thread *pmd;
+    struct dp_netdev_flow *flow;
+    int op;
+    struct match match;
+    struct nlattr *actions;
+    size_t actions_len;
+
+    struct ovs_list node;
+};
+
+struct dp_flow_offload {
+    struct ovs_mutex mutex;
+    struct ovs_list list;
+    pthread_cond_t cond;
+};
+
+static struct dp_flow_offload dp_flow_offload = {
+    .mutex = OVS_MUTEX_INITIALIZER,
+    .list = OVS_LIST_INITIALIZER(&dp_flow_offload.list),
+};
+
+static struct ovsthread_once offload_thread_once
+    = OVSTHREAD_ONCE_INITIALIZER;
+
+static struct dp_flow_offload_item *
+dp_netdev_alloc_flow_offload(struct dp_netdev_pmd_thread *pmd,
+                             struct dp_netdev_flow *flow,
+                             int op)
+{
+    struct dp_flow_offload_item *offload;
+
+    offload = xzalloc(sizeof(*offload));
+    offload->pmd = pmd;
+    offload->flow = flow;
+    offload->op = op;
+
+    dp_netdev_flow_ref(flow);
+    dp_netdev_pmd_try_ref(pmd);
+
+    return offload;
+}
+
+static void
+dp_netdev_free_flow_offload(struct dp_flow_offload_item *offload)
+{
+    dp_netdev_pmd_unref(offload->pmd);
+    dp_netdev_flow_unref(offload->flow);
+
+    free(offload->actions);
+    free(offload);
+}
+
+static void
+dp_netdev_append_flow_offload(struct dp_flow_offload_item *offload)
+{
+    ovs_mutex_lock(&dp_flow_offload.mutex);
+    ovs_list_push_back(&dp_flow_offload.list, &offload->node);
+    xpthread_cond_signal(&dp_flow_offload.cond);
+    ovs_mutex_unlock(&dp_flow_offload.mutex);
+}
+
+static int
+dp_netdev_flow_offload_del(struct dp_flow_offload_item *offload)
+{
+    return mark_to_flow_disassociate(offload->pmd, offload->flow);
+}
+
+/*
+ * There are two flow offload operations here: addition and modification.
+ * + * For flow addition, this function does: + * - allocate a new flow mark id + * - perform hardware flow offload + * - associate the flow mark with flow and mega flow + * + * For flow modification, both flow mark and the associations are still + * valid, thus only item 2 needed. + */ +static int +dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload) +{ + struct dp_netdev_port *port; + struct dp_netdev_pmd_thread *pmd = offload->pmd; + struct dp_netdev_flow *flow = offload->flow; + odp_port_t in_port = flow->flow.in_port.odp_port; + bool modification = offload->op == DP_NETDEV_FLOW_OFFLOAD_OP_MOD; + struct offload_info info; + uint32_t mark; + int ret; + + if (flow->dead) { + return -1; + } + + if (modification) { + mark = flow->mark; + ovs_assert(mark != INVALID_FLOW_MARK); + } else { + /* + * If a mega flow has already been offloaded (from other PMD + * instances), do not offload it again. + */ + mark = megaflow_to_mark_find(&flow->mega_ufid); + if (mark != INVALID_FLOW_MARK) { + VLOG_DBG("flow has already been offloaded with mark %u\n", mark); + if (flow->mark != INVALID_FLOW_MARK) { + ovs_assert(flow->mark == mark); + } else { + mark_to_flow_associate(mark, flow); + } + return 0; + } + + mark = flow_mark_alloc(); + if (mark == INVALID_FLOW_MARK) { + VLOG_ERR("failed to allocate flow mark!\n"); + } + } + info.flow_mark = mark; + + ovs_mutex_lock(&pmd->dp->port_mutex); + port = dp_netdev_lookup_port(pmd->dp, in_port); + if (!port) { + ovs_mutex_unlock(&pmd->dp->port_mutex); + return -1; + } + ret = netdev_flow_put(port->netdev, &offload->match, + CONST_CAST(struct nlattr *, offload->actions), + offload->actions_len, &flow->mega_ufid, &info, + NULL); + ovs_mutex_unlock(&pmd->dp->port_mutex); + + if (ret) { + if (!modification) { + flow_mark_free(mark); + } else { + mark_to_flow_disassociate(pmd, flow); + } + return -1; + } + + if (!modification) { + megaflow_to_mark_associate(&flow->mega_ufid, mark); + mark_to_flow_associate(mark, flow); + } + + return 0; +} + +static void * +dp_netdev_flow_offload_main(void *data OVS_UNUSED) +{ + struct dp_flow_offload_item *offload; + struct ovs_list *list; + const char *op; + int ret; + + for (;;) { + ovs_mutex_lock(&dp_flow_offload.mutex); + if (ovs_list_is_empty(&dp_flow_offload.list)) { + ovsrcu_quiesce_start(); + ovs_mutex_cond_wait(&dp_flow_offload.cond, + &dp_flow_offload.mutex); + } + list = ovs_list_pop_front(&dp_flow_offload.list); + offload = CONTAINER_OF(list, struct dp_flow_offload_item, node); + ovs_mutex_unlock(&dp_flow_offload.mutex); + + switch (offload->op) { + case DP_NETDEV_FLOW_OFFLOAD_OP_ADD: + op = "add"; + ret = dp_netdev_flow_offload_put(offload); + break; + case DP_NETDEV_FLOW_OFFLOAD_OP_MOD: + op = "modify"; + ret = dp_netdev_flow_offload_put(offload); + break; + case DP_NETDEV_FLOW_OFFLOAD_OP_DEL: + op = "delete"; + ret = dp_netdev_flow_offload_del(offload); + break; + default: + OVS_NOT_REACHED(); + } + + VLOG_DBG("%s to %s netdev flow\n", + ret == 0 ? 
"succeed" : "failed", op); + dp_netdev_free_flow_offload(offload); + } + + return NULL; +} + +static void +queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd, + struct dp_netdev_flow *flow) +{ + struct dp_flow_offload_item *offload; + + if (ovsthread_once_start(&offload_thread_once)) { + xpthread_cond_init(&dp_flow_offload.cond, NULL); + ovs_thread_create("dp_netdev_flow_offload", + dp_netdev_flow_offload_main, NULL); + ovsthread_once_done(&offload_thread_once); + } + + offload = dp_netdev_alloc_flow_offload(pmd, flow, + DP_NETDEV_FLOW_OFFLOAD_OP_DEL); + dp_netdev_append_flow_offload(offload); +} + +static void +queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, + struct dp_netdev_flow *flow, struct match *match, + const struct nlattr *actions, size_t actions_len) +{ + struct dp_flow_offload_item *offload; + int op; + + if (!netdev_is_flow_api_enabled()) { + return; + } + + if (ovsthread_once_start(&offload_thread_once)) { + xpthread_cond_init(&dp_flow_offload.cond, NULL); + ovs_thread_create("dp_netdev_flow_offload", + dp_netdev_flow_offload_main, NULL); + ovsthread_once_done(&offload_thread_once); + } + + if (flow->mark != INVALID_FLOW_MARK) { + op = DP_NETDEV_FLOW_OFFLOAD_OP_MOD; + } else { + op = DP_NETDEV_FLOW_OFFLOAD_OP_ADD; + } + offload = dp_netdev_alloc_flow_offload(pmd, flow, op); + offload->match = *match; + offload->actions = xmalloc(actions_len); + memcpy(offload->actions, actions, actions_len); + offload->actions_len = actions_len; + + dp_netdev_append_flow_offload(offload); +} + static void dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow) @@ -2044,7 +2295,7 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, dpcls_remove(cls, &flow->cr); cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid)); if (flow->mark != INVALID_FLOW_MARK) { - mark_to_flow_disassociate(pmd, flow); + queue_netdev_flow_del(pmd, flow); } flow->dead = true; @@ -2625,88 +2876,6 @@ out: return error; } -/* - * There are two flow offload operations here: addition and modification. - * - * For flow addition, this function does: - * - allocate a new flow mark id - * - perform hardware flow offload - * - associate the flow mark with flow and mega flow - * - * For flow modification, both flow mark and the associations are still - * valid, thus only item 2 needed. - */ -static void -try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port, - struct dp_netdev_flow *flow, struct match *match, - const struct nlattr *actions, size_t actions_len) -{ - struct offload_info info; - struct dp_netdev_port *port; - bool modification = flow->mark != INVALID_FLOW_MARK; - const char *op = modification ? "modify" : "add"; - uint32_t mark; - int ret; - - ovs_mutex_lock(&flow_mark.mutex); - - if (modification) { - mark = flow->mark; - } else { - if (!netdev_is_flow_api_enabled()) { - goto out; - } - - /* - * If a mega flow has already been offloaded (from other PMD - * instances), do not offload it again. 
- */
-        mark = megaflow_to_mark_find(&flow->mega_ufid);
-        if (mark != INVALID_FLOW_MARK) {
-            VLOG_DBG("flow has already been offloaded with mark %u\n", mark);
-            mark_to_flow_associate(mark, flow);
-            goto out;
-        }
-
-        mark = flow_mark_alloc();
-        if (mark == INVALID_FLOW_MARK) {
-            VLOG_ERR("failed to allocate flow mark!\n");
-            goto out;
-        }
-    }
-    info.flow_mark = mark;
-
-    ovs_mutex_lock(&pmd->dp->port_mutex);
-    port = dp_netdev_lookup_port(pmd->dp, in_port);
-    if (!port) {
-        ovs_mutex_unlock(&pmd->dp->port_mutex);
-        goto out;
-    }
-    ret = netdev_flow_put(port->netdev, match,
-                          CONST_CAST(struct nlattr *, actions),
-                          actions_len, &flow->mega_ufid, &info, NULL);
-    ovs_mutex_unlock(&pmd->dp->port_mutex);
-
-    if (ret) {
-        VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark);
-        if (!modification) {
-            flow_mark_free(mark);
-        } else {
-            mark_to_flow_disassociate(pmd, flow);
-        }
-        goto out;
-    }
-
-    if (!modification) {
-        megaflow_to_mark_associate(&flow->mega_ufid, mark);
-        mark_to_flow_associate(mark, flow);
-    }
-    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);
-
-out:
-    ovs_mutex_unlock(&flow_mark.mutex);
-}
-
 static void
 dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
 {
@@ -2772,7 +2941,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
     cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
                 dp_netdev_flow_hash(&flow->ufid));
 
-    try_netdev_flow_put(pmd, in_port, flow, match, actions, actions_len);
+    queue_netdev_flow_put(pmd, flow, match, actions, actions_len);
 
     if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
         struct ds ds = DS_EMPTY_INITIALIZER;
@@ -2854,7 +3023,6 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
         if (put->flags & DPIF_FP_MODIFY) {
             struct dp_netdev_actions *new_actions;
             struct dp_netdev_actions *old_actions;
-            odp_port_t in_port = netdev_flow->flow.in_port.odp_port;
 
             new_actions = dp_netdev_actions_create(put->actions,
                                                    put->actions_len);
@@ -2862,8 +3030,8 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
             old_actions = dp_netdev_flow_get_actions(netdev_flow);
             ovsrcu_set(&netdev_flow->actions, new_actions);
 
-            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
-                                put->actions, put->actions_len);
+            queue_netdev_flow_put(pmd, netdev_flow, match,
+                                  put->actions, put->actions_len);
 
             if (stats) {
                 get_dpif_flow_stats(netdev_flow, stats);

From patchwork Mon Jan 29 06:59:48 2018
X-Patchwork-Submitter: Yuanhan Liu
X-Patchwork-Id: 867277
X-Patchwork-Delegate: ian.stokes@intel.com
From: Yuanhan Liu
To: dev@openvswitch.org
Cc: Simon Horman
Date: Mon, 29 Jan 2018 14:59:48 +0800
Message-Id: <1517209188-16608-7-git-send-email-yliu@fridaylinux.org>
In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org>
Subject: [ovs-dev] [PATCH v7 6/6] Documentation: document ovs-dpdk flow offload

And mark it as experimental.
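Since hw-offload is a plain other_config key in the Open_vSwitch table, its
current value can be read back with standard ovs-vsctl usage (generic OVSDB
tooling, not something added by this patch):

    $ ovs-vsctl get Open_vSwitch . other_config:hw-offload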
Signed-off-by: Yuanhan Liu
---
 Documentation/howto/dpdk.rst | 17 +++++++++++++++++
 NEWS                         |  1 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 40f9d96..047525c 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -727,3 +727,20 @@ devices to bridge ``br0``. Once complete, follow the below steps:
 Check traffic on multiple queues::
 
     $ cat /proc/interrupts | grep virtio
+
+.. _dpdk-flow-hardware-offload:
+
+Flow Hardware Offload (Experimental)
+------------------------------------
+
+The flow hardware offload is disabled by default and can be enabled by::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
+
+So far only partial flow offload is implemented.  Moreover, it only works
+with PMD drivers that have support for the rte_flow "MARK + RSS" actions.
+
+The validated NICs are:
+
+- Mellanox (ConnectX-4, ConnectX-4 Lx, ConnectX-5)
+- Napatech (NT200B01)
diff --git a/NEWS b/NEWS
index d7d585b..d0c9f44 100644
--- a/NEWS
+++ b/NEWS
@@ -54,6 +54,7 @@ v2.9.0 - xx xxx xxxx
      * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to
        pmd assignments.
      * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
+     * Add experimental flow hardware offload support.
 - Userspace datapath:
      * Output packet batching support.
 - vswitchd: