From patchwork Mon Jan 29 06:59:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuanhan Liu X-Patchwork-Id: 867285 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=fridaylinux.org header.i=@fridaylinux.org header.b="UxnKcDQg"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.b="Tkk3aptT"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zVhC41qmMz9ryT for ; Tue, 30 Jan 2018 07:37:20 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 51F1C17CF; Mon, 29 Jan 2018 20:11:14 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E6F2B1A2D for ; Mon, 29 Jan 2018 07:00:00 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 0C3CD149 for ; Mon, 29 Jan 2018 06:59:59 +0000 (UTC) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 54EB320B4B; Mon, 29 Jan 2018 01:59:59 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Mon, 29 Jan 2018 01:59:59 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fridaylinux.org; h=cc:date:from:in-reply-to:message-id:references:subject:to :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=3TZ+BAh+88g/Hqf/C qAOEUAhkWHC1g/YEOReldY4s78=; b=UxnKcDQgjuOo6zPFl3PJBQ7L9uvMYBmpN M3J4YezRr8z5cOPQYkzYde+9UZvXbjto7sWaC+MI0QICXztFA06/mTjyh5/I7qQo q+V2URxFEgJ1xBXIhYDNx/8N/9kieETLHaqJvugIQpVpaCnz5qpCS/FphjKEJ5Gz x01f0O39x0JOPcKNdEdOTIGOPOWGT3OyWtTsMtFYbM4IuJnZVW1mqASTEeV1ZmN9 /BPAvkTg2YKEeQiwoSDtAhvXWeN1AS8XHPRh+R11PLsOtF/JpXHvrP3ly/mosDqm evGEYC5W86+CYHWJ9UrrcfbMeMlEZ8E5I7A1TUAHqWvq4jfd0DJrw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; bh=3TZ+BAh+88g/Hqf/CqAOEUAhkWHC1g/YEOReldY4s78=; b=Tkk3aptT ETdUkBGBVRfXPENOsm8y2zvw4AkDF2uor1fTfZC4Oje8D684Cuwu6oMLC2Gk9su1 m5o0WrVK3W/qHAMpdYcF/qUjjWSoFYnUnv68AQe5YyzNo/6LWWwdy7XFhfFMRYFb elHaFK+riWNAI4yx3zeIM0VeYPCtH4UfvQlnLnUjTtsmgvIYKjnNPMIwb4E58bXw jNZXhH7S8H7+cxYsHF8uByylALRm+zlnUtjhNqsPy8sF54uCam4xzHgWMFZd/38y DVVH5a8oEb2NhMZiVOzubERZ4vUfm4bPqFik9cle/6jm7BWgrZRIxNdjisdxkHlS dlcXYWZJpiLm8A== X-ME-Sender: Received: from yliu-dev.mtl.com (unknown [220.177.86.229]) by mail.messagingengine.com (Postfix) with ESMTPA id C51007E34D; Mon, 29 Jan 2018 01:59:56 -0500 (EST) From: Yuanhan Liu To: dev@openvswitch.org Date: Mon, 29 Jan 2018 14:59:44 +0800 Message-Id: <1517209188-16608-3-git-send-email-yliu@fridaylinux.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org> References: <1517209188-16608-1-git-send-email-yliu@fridaylinux.org> X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Simon Horman Subject: [ovs-dev] [PATCH v7 2/6] dpif-netdev: retrieve flow directly from the flow mark X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org So that we could skip some very costly CPU operations, including but not limiting to miniflow_extract, emc lookup, dpcls lookup, etc. Thus, performance could be greatly improved. A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) show more that 260% performance boost. Note that though the heavy miniflow_extract is skipped, we still have to do per packet checking, due to we have to check the tcp_flags. Co-authored-by: Finn Christensen Signed-off-by: Yuanhan Liu Signed-off-by: Finn Christensen --- v7: - fixed wrong hash for mark_to_flow that has been refactored in v6 v5: - do not fetch the flow if the flow has been dead --- lib/dp-packet.h | 13 +++++ lib/dpif-netdev.c | 44 +++++++++++++--- lib/flow.c | 155 +++++++++++++++++++++++++++++++++++++++++++----------- lib/flow.h | 1 + 4 files changed, 175 insertions(+), 38 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..ea1194c 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -691,6 +691,19 @@ reset_dp_packet_checksum_ol_flags(struct dp_packet *p) #define reset_dp_packet_checksum_ol_flags(arg) #endif +static inline bool +dp_packet_has_flow_mark(struct dp_packet *p OVS_UNUSED, + uint32_t *mark OVS_UNUSED) +{ +#ifdef DPDK_NETDEV + if (p->mbuf.ol_flags & PKT_RX_FDIR_ID) { + *mark = p->mbuf.hash.fdir.hi; + return true; + } +#endif + return false; +} + enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */ struct dp_packet_batch { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a514de8..cb16e2e 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2013,6 +2013,23 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd) } } +static struct dp_netdev_flow * +mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd, + const uint32_t mark) +{ + struct dp_netdev_flow *flow; + + CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0), + &flow_mark.mark_to_flow) { + if (flow->mark == mark && flow->pmd_id == pmd->core_id && + flow->dead == false) { + return flow; + } + } + + return NULL; +} + static void dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_flow *flow) @@ -5203,10 +5220,10 @@ struct packet_batch_per_flow { static inline void packet_batch_per_flow_update(struct packet_batch_per_flow *batch, struct dp_packet *packet, - const struct miniflow *mf) + uint16_t tcp_flags) { batch->byte_count += dp_packet_size(packet); - batch->tcp_flags |= miniflow_get_tcp_flags(mf); + batch->tcp_flags |= tcp_flags; batch->array.packets[batch->array.count++] = packet; } @@ -5240,7 +5257,7 @@ packet_batch_per_flow_execute(struct packet_batch_per_flow *batch, static inline void dp_netdev_queue_batches(struct dp_packet *pkt, - struct dp_netdev_flow *flow, const struct miniflow *mf, + struct dp_netdev_flow *flow, uint16_t tcp_flags, struct packet_batch_per_flow *batches, size_t *n_batches) { @@ -5251,7 +5268,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt, packet_batch_per_flow_init(batch, flow); } - packet_batch_per_flow_update(batch, pkt, mf); + packet_batch_per_flow_update(batch, pkt, tcp_flags); } /* Try to process all ('cnt') the 'packets' using only the exact match cache @@ -5282,6 +5299,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, const size_t cnt = dp_packet_batch_size(packets_); uint32_t cur_min; int i; + uint16_t tcp_flags; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); pmd_perf_update_counter(&pmd->perf_stats, @@ -5290,6 +5308,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; + uint32_t flow_mark; if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) { dp_packet_delete(packet); @@ -5297,6 +5316,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, continue; } + if (dp_packet_has_flow_mark(packet, &flow_mark)) { + flow = mark_to_flow_find(pmd, flow_mark); + if (flow) { + tcp_flags = parse_tcp_flags(packet); + dp_netdev_queue_batches(packet, flow, tcp_flags, batches, + n_batches); + continue; + } + } + if (i != cnt - 1) { struct dp_packet **packets = packets_->packets; /* Prefetch next packet data and metadata. */ @@ -5322,7 +5351,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, flow = NULL; } if (OVS_LIKELY(flow)) { - dp_netdev_queue_batches(packet, flow, &key->mf, batches, + tcp_flags = miniflow_get_tcp_flags(&key->mf); + dp_netdev_queue_batches(packet, flow, tcp_flags, batches, n_batches); } else { /* Exact match cache missed. Group missed packets together at @@ -5501,7 +5531,9 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, flow = dp_netdev_flow_cast(rules[i]); emc_probabilistic_insert(pmd, &keys[i], flow); - dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); + dp_netdev_queue_batches(packet, flow, + miniflow_get_tcp_flags(&keys[i].mf), + batches, n_batches); } pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, diff --git a/lib/flow.c b/lib/flow.c index f9d7c2a..80718d4 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -624,6 +624,70 @@ flow_extract(struct dp_packet *packet, struct flow *flow) miniflow_expand(&m.mf, flow); } +static inline bool +ipv4_sanity_check(const struct ip_header *nh, size_t size, + int *ip_lenp, uint16_t *tot_lenp) +{ + int ip_len; + uint16_t tot_len; + + if (OVS_UNLIKELY(size < IP_HEADER_LEN)) { + return false; + } + ip_len = IP_IHL(nh->ip_ihl_ver) * 4; + + if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN || size < ip_len)) { + return false; + } + + tot_len = ntohs(nh->ip_tot_len); + if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len || + size - tot_len > UINT8_MAX)) { + return false; + } + + *ip_lenp = ip_len; + *tot_lenp = tot_len; + + return true; +} + +static inline uint8_t +ipv4_get_nw_frag(const struct ip_header *nh) +{ + uint8_t nw_frag = 0; + + if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) { + nw_frag = FLOW_NW_FRAG_ANY; + if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) { + nw_frag |= FLOW_NW_FRAG_LATER; + } + } + + return nw_frag; +} + +static inline bool +ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, size_t size) +{ + uint16_t plen; + + if (OVS_UNLIKELY(size < sizeof *nh)) { + return false; + } + + plen = ntohs(nh->ip6_plen); + if (OVS_UNLIKELY(plen > size)) { + return false; + } + /* Jumbo Payload option not supported yet. */ + if (OVS_UNLIKELY(size - plen > UINT8_MAX)) { + return false; + } + + return true; +} + /* Caller is responsible for initializing 'dst' with enough storage for * FLOW_U64S * 8 bytes. */ void @@ -748,22 +812,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) int ip_len; uint16_t tot_len; - if (OVS_UNLIKELY(size < IP_HEADER_LEN)) { - goto out; - } - ip_len = IP_IHL(nh->ip_ihl_ver) * 4; - - if (OVS_UNLIKELY(ip_len < IP_HEADER_LEN)) { - goto out; - } - if (OVS_UNLIKELY(size < ip_len)) { - goto out; - } - tot_len = ntohs(nh->ip_tot_len); - if (OVS_UNLIKELY(tot_len > size || ip_len > tot_len)) { - goto out; - } - if (OVS_UNLIKELY(size - tot_len > UINT8_MAX)) { + if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) { goto out; } dp_packet_set_l2_pad_size(packet, size - tot_len); @@ -786,31 +835,19 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) nw_tos = nh->ip_tos; nw_ttl = nh->ip_ttl; nw_proto = nh->ip_proto; - if (OVS_UNLIKELY(IP_IS_FRAGMENT(nh->ip_frag_off))) { - nw_frag = FLOW_NW_FRAG_ANY; - if (nh->ip_frag_off & htons(IP_FRAG_OFF_MASK)) { - nw_frag |= FLOW_NW_FRAG_LATER; - } - } + nw_frag = ipv4_get_nw_frag(nh); data_pull(&data, &size, ip_len); } else if (dl_type == htons(ETH_TYPE_IPV6)) { - const struct ovs_16aligned_ip6_hdr *nh; + const struct ovs_16aligned_ip6_hdr *nh = data; ovs_be32 tc_flow; uint16_t plen; - if (OVS_UNLIKELY(size < sizeof *nh)) { + if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) { goto out; } - nh = data_pull(&data, &size, sizeof *nh); + data_pull(&data, &size, sizeof *nh); plen = ntohs(nh->ip6_plen); - if (OVS_UNLIKELY(plen > size)) { - goto out; - } - /* Jumbo Payload option not supported yet. */ - if (OVS_UNLIKELY(size - plen > UINT8_MAX)) { - goto out; - } dp_packet_set_l2_pad_size(packet, size - plen); size = plen; /* Never pull padding. */ @@ -982,6 +1019,60 @@ parse_dl_type(const struct eth_header *data_, size_t size) return parse_ethertype(&data, &size); } +uint16_t +parse_tcp_flags(struct dp_packet *packet) +{ + const void *data = dp_packet_data(packet); + size_t size = dp_packet_size(packet); + ovs_be16 dl_type; + uint8_t nw_frag = 0, nw_proto = 0; + + if (packet->packet_type != htonl(PT_ETH)) { + return 0; + } + + data_pull(&data, &size, ETH_ADDR_LEN * 2); + dl_type = parse_ethertype(&data, &size); + if (OVS_LIKELY(dl_type == htons(ETH_TYPE_IP))) { + const struct ip_header *nh = data; + int ip_len; + uint16_t tot_len; + + if (OVS_UNLIKELY(!ipv4_sanity_check(nh, size, &ip_len, &tot_len))) { + return 0; + } + nw_proto = nh->ip_proto; + nw_frag = ipv4_get_nw_frag(nh); + + size = tot_len; /* Never pull padding. */ + data_pull(&data, &size, ip_len); + } else if (dl_type == htons(ETH_TYPE_IPV6)) { + const struct ovs_16aligned_ip6_hdr *nh = data; + + if (OVS_UNLIKELY(!ipv6_sanity_check(nh, size))) { + return 0; + } + data_pull(&data, &size, sizeof *nh); + + size = ntohs(nh->ip6_plen); /* Never pull padding. */ + if (!parse_ipv6_ext_hdrs__(&data, &size, &nw_proto, &nw_frag)) { + return 0; + } + nw_proto = nh->ip6_nxt; + } else { + return 0; + } + + if (!(nw_frag & FLOW_NW_FRAG_LATER) && nw_proto == IPPROTO_TCP && + size >= TCP_HEADER_LEN) { + const struct tcp_header *tcp = data; + + return TCP_FLAGS_BE32(tcp->tcp_ctl); + } + + return 0; +} + /* For every bit of a field that is wildcarded in 'wildcards', sets the * corresponding bit in 'flow' to zero. */ void diff --git a/lib/flow.h b/lib/flow.h index eb1e2bf..6be4199 100644 --- a/lib/flow.h +++ b/lib/flow.h @@ -130,6 +130,7 @@ bool parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto, uint8_t *nw_frag); ovs_be16 parse_dl_type(const struct eth_header *data_, size_t size); bool parse_nsh(const void **datap, size_t *sizep, struct ovs_key_nsh *key); +uint16_t parse_tcp_flags(struct dp_packet *packet); static inline uint64_t flow_get_xreg(const struct flow *flow, int idx)