From patchwork Mon Aug 26 16:05:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Darrell Ball X-Patchwork-Id: 1153282 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FgrYLlA3"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46HH3Z2pqcz9sDB for ; Tue, 27 Aug 2019 02:08:09 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 5B99718FD; Mon, 26 Aug 2019 16:07:58 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id CAF6518A3 for ; Mon, 26 Aug 2019 16:05:59 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id EA74A710 for ; Mon, 26 Aug 2019 16:05:58 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id w2so12088379pfi.3 for ; Mon, 26 Aug 2019 09:05:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id; bh=GDuCx030z0uyslhGpKpURzNlOuYwT+XqhfnAH3EVgs0=; b=FgrYLlA3OL3NgYDkegT31c1Hq7IRPU3H1MiVH4Vgm4nCKWApTL4FHI7aI/j87RkM26 fR98FbMooAQhQPx1ac6SgSPiwcEERTxXesob7/Fg46S4EQH8t55J/SC0swZvE/Hg2mPe fuIxgwy9G2e+ZpayPRrlRwpl0YfXnAhqUzJm6kD0xFViFNJdzktDYzER6bt1qhanmvt2 YEXyFYMwBMoHxvi05fH8SySdCPB7OjjgbKJGggvxCWUQcud25PRWHQeWneFqN8JOKQuF h4t1cXxwYd7h8PgheCbYskrsxD+WDHF8ZhkWji7wDwXNGoBckZsRROlGCjkmYYv1Ch3S fgYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id; bh=GDuCx030z0uyslhGpKpURzNlOuYwT+XqhfnAH3EVgs0=; b=bRuAz0MldbJvZYlH1ADR98Bo5Cb0+uVdudg6dKYaXFQ0lkxRvIjesmmOJpWFzy3FpW gVHnvzlsq6WIXmFWZPDkcCFf4gwzrZjS++t2YSJAAvvGsfhOKBB+tYJYjnkGUVtliqAi D+qkC2ADz+CCDqZaqRIgA1WAIrOI0GaYyHv+E7Fjje1rc6F/LVTD/ptyQ+3N7WypS7SE qxssQTo2p8N4zNh46y9TY+F2Nr1j4ZZ4QgF3zXJXdSPGkFrJlYe64zJcyNUmoTLeDuWI 7NP/KZ0nUUNXFShOkZWmPSEoZLxYRJ0PNPUsGYRiiwduu2jGPA7DprKYjJimzH4DasXA QWxw== X-Gm-Message-State: APjAAAX90YL5i6fhbH7vhGVSImFWidA0GnZODF1Iue1F7b2yvyrZT4M/ T/UtLXPISvySfGNAI4FRyP0= X-Google-Smtp-Source: APXvYqwU3GmitGtDGFU2peuRsWDbEMJODCo/Dse2+R8uk99jdwV+WfhqRMyQfv64B9gQBThlzq3onw== X-Received: by 2002:a65:4507:: with SMTP id n7mr16458044pgq.86.1566835558230; Mon, 26 Aug 2019 09:05:58 -0700 (PDT) Received: from ubuntu.localdomain (c-76-102-76-212.hsd1.ca.comcast.net. [76.102.76.212]) by smtp.gmail.com with ESMTPSA id h17sm14068758pfo.24.2019.08.26.09.05.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 26 Aug 2019 09:05:57 -0700 (PDT) From: Darrell Ball To: dlu998@gmail.com, dev@openvswitch.org Date: Mon, 26 Aug 2019 09:05:44 -0700 Message-Id: <1566835544-72391-1-git-send-email-dlu998@gmail.com> X-Mailer: git-send-email 1.9.1 X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [patch v6] conntrack: Optimize recirculations. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Cache the 'conn' context and use it when it is valid. The cached 'conn' context will get reset if it is not expected to be valid; the cost to do this is negligible. Besides being most optimal, this also handles corner cases, such as decapsulation leading to the same tuple, as in tunnel VPN cases. A negative test is added to check the resetting of the cached 'conn'. Signed-off-by: Darrell Ball --- v6: a/ Added 'conn' reset for mpls push case to force 'invalid', for consistency reasons. b/ Add missed lock around 'conn->mark' and 'conn->label' access in 'process_one_fast()'. c/ Use opaque vs void 'conn' ptr (Ben). d/ Remove inline specifier on 2 static functions in source file. (Ben). e/ Remove 3 'continue' instructions in lieu of 'else' clause. (Ben). v5: Check for alg ctl on recirculation to handle potential corner case. Remove unnecessary 'commit' filter. v4: Reset 'conn' cache context automatically when tuple changes, rather than needing an explicit 'ct_clear' action. Need to check if all cases are handled. v3: Remove unneeded 'NAT' field added to 'struct conn_lookup_ctx'. lib/conntrack.c | 60 ++++++++++++++++++++++++++++++++++++++++++---- lib/netdev.c | 1 + lib/packets.c | 9 +++++++ lib/packets.h | 11 +++++++++ tests/system-traffic.at | 63 +++++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 139 insertions(+), 5 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 0125bb4..ecf3bcc 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1086,6 +1086,46 @@ conn_update_state_alg(struct conntrack *ct, struct dp_packet *pkt, } static void +set_cached_conn(const struct nat_action_info_t *nat_action_info, + const struct conn_lookup_ctx *ctx, struct conn *conn, + struct dp_packet *pkt) +{ + if (OVS_LIKELY(!nat_action_info)) { + pkt->md.conn = conn; + pkt->md.reply = ctx->reply; + pkt->md.icmp_related = ctx->icmp_related; + } else { + pkt->md.conn = NULL; + } +} + +static void +process_one_fast(uint16_t zone, const uint32_t *setmark, + const struct ovs_key_ct_labels *setlabel, + const struct nat_action_info_t *nat_action_info, + struct conn *conn, struct dp_packet *pkt) +{ + if (nat_action_info) { + handle_nat(pkt, conn, zone, pkt->md.reply, pkt->md.icmp_related); + pkt->md.conn = NULL; + } + + pkt->md.ct_zone = zone; + ovs_mutex_lock(&conn->lock); + pkt->md.ct_mark = conn->mark; + pkt->md.ct_label = conn->label; + ovs_mutex_unlock(&conn->lock); + + if (setmark) { + set_mark(pkt, conn, setmark[0], setmark[1]); + } + + if (setlabel) { + set_label(pkt, conn, &setlabel[0], &setlabel[1]); + } +} + +static void process_one(struct conntrack *ct, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, uint16_t zone, bool force, bool commit, long long now, const uint32_t *setmark, @@ -1185,6 +1225,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, } handle_alg_ctl(ct, ctx, pkt, ct_alg_ctl, conn, now, !!nat_action_info); + + set_cached_conn(nat_action_info, ctx, conn, pkt); } /* Sends the packets in '*pkt_batch' through the connection tracker 'ct'. All @@ -1212,14 +1254,21 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, struct conn_lookup_ctx ctx; DP_PACKET_BATCH_FOR_EACH (i, packet, pkt_batch) { - if (packet->md.ct_state == CS_INVALID - || !conn_key_extract(ct, packet, dl_type, &ctx, zone)) { + struct conn *conn = packet->md.conn; + if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) { + write_ct_md(packet, zone, NULL, NULL, NULL); + } else if (conn && conn->key.zone == zone && !force + && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) { + process_one_fast(zone, setmark, setlabel, nat_action_info, + conn, packet); + } else if (OVS_UNLIKELY(!conn_key_extract(ct, packet, dl_type, &ctx, + zone))) { packet->md.ct_state = CS_INVALID; write_ct_md(packet, zone, NULL, NULL, NULL); - continue; + } else { + process_one(ct, packet, &ctx, zone, force, commit, now, setmark, + setlabel, nat_action_info, tp_src, tp_dst, helper); } - process_one(ct, packet, &ctx, zone, force, commit, now, setmark, - setlabel, nat_action_info, tp_src, tp_dst, helper); } ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type); @@ -1233,6 +1282,7 @@ conntrack_clear(struct dp_packet *packet) /* According to pkt_metadata_init(), ct_state == 0 is enough to make all of * the conntrack fields invalid. */ packet->md.ct_state = 0; + pkt_metadata_init_conn(&packet->md); } static void diff --git a/lib/netdev.c b/lib/netdev.c index b1976d3..af8f856 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -837,6 +837,7 @@ netdev_pop_header(struct netdev *netdev, struct dp_packet_batch *batch) * interpretation in the further packet processing when * recirculated.*/ dp_packet_reset_offload(packet); + pkt_metadata_init_conn(&packet->md); dp_packet_batch_refill(batch, packet, i); } } diff --git a/lib/packets.c b/lib/packets.c index ab0b1a3..f59ce39 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -391,6 +391,8 @@ push_mpls(struct dp_packet *packet, ovs_be16 ethtype, ovs_be32 lse) header = dp_packet_resize_l2_5(packet, MPLS_HLEN); memmove(header, header + MPLS_HLEN, len); memcpy(header + len, &lse, sizeof lse); + + pkt_metadata_init_conn(&packet->md); } /* If 'packet' is an MPLS packet, removes its outermost MPLS label stack entry. @@ -979,6 +981,8 @@ packet_set_ipv4_addr(struct dp_packet *packet, ovs_be32 old_addr = get_16aligned_be32(addr); size_t l4_size = dp_packet_l4_size(packet); + pkt_metadata_init_conn(&packet->md); + if (nh->ip_proto == IPPROTO_TCP && l4_size >= TCP_HEADER_LEN) { struct tcp_header *th = dp_packet_l4(packet); @@ -1118,6 +1122,7 @@ packet_set_ipv6_addr(struct dp_packet *packet, uint8_t proto, packet_update_csum128(packet, proto, addr, new_addr); } memcpy(addr, new_addr, sizeof(ovs_be32[4])); + pkt_metadata_init_conn(&packet->md); } static void @@ -1219,6 +1224,7 @@ packet_set_tcp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) packet_set_port(&th->tcp_src, src, &th->tcp_csum); packet_set_port(&th->tcp_dst, dst, &th->tcp_csum); + pkt_metadata_init_conn(&packet->md); } /* Sets the UDP source and destination port ('src' and 'dst' respectively) of @@ -1240,6 +1246,7 @@ packet_set_udp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) uh->udp_src = src; uh->udp_dst = dst; } + pkt_metadata_init_conn(&packet->md); } /* Sets the SCTP source and destination port ('src' and 'dst' respectively) of @@ -1261,6 +1268,7 @@ packet_set_sctp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) new_csum = crc32c((void *)sh, tp_len); put_16aligned_be32(&sh->sctp_csum, old_csum ^ old_correct_csum ^ new_csum); + pkt_metadata_init_conn(&packet->md); } /* Sets the ICMP type and code of the ICMP header contained in 'packet'. @@ -1279,6 +1287,7 @@ packet_set_icmp(struct dp_packet *packet, uint8_t type, uint8_t code) ih->icmp_csum = recalc_csum16(ih->icmp_csum, orig_tc, new_tc); } + pkt_metadata_init_conn(&packet->md); } /* Sets the IGMP type to IGMP_HOST_MEMBERSHIP_QUERY and populates the diff --git a/lib/packets.h b/lib/packets.h index a4bee38..0b3e3a2 100644 --- a/lib/packets.h +++ b/lib/packets.h @@ -35,6 +35,7 @@ #include "timeval.h" struct dp_packet; +struct conn; struct ds; /* Purely internal to OVS userspace. These flags should never be exposed to @@ -108,6 +109,9 @@ PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, uint32_t ct_mark; /* Connection mark. */ ovs_u128 ct_label; /* Connection label. */ union flow_in_port in_port; /* Input port. */ + struct conn *conn; /* Cached conntrack connection. */ + bool reply; /* True if reply direction. */ + bool icmp_related; /* True if ICMP related. */ ); PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, @@ -140,6 +144,12 @@ pkt_metadata_init_tnl(struct pkt_metadata *md) } static inline void +pkt_metadata_init_conn(struct pkt_metadata *md) +{ + md->conn = NULL; +} + +static inline void pkt_metadata_init(struct pkt_metadata *md, odp_port_t port) { /* This is called for every packet in userspace datapath and affects @@ -157,6 +167,7 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t port) md->tunnel.ip_dst = 0; md->tunnel.ipv6_dst = in6addr_any; md->in_port.odp_port = port; + md->conn = NULL; } /* This function prefetches the cachelines touched by pkt_metadata_init() diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 1a04199..c2276f7 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -5618,6 +5618,69 @@ grep "tcp,orig=(src=10.254.254.2,dst=10.1.1.1,sport=,dport=),r OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([conntrack - negative test for recirculation optimization]) +dnl This test will fail if 'conn' caching is being used, because the tuple +dnl has been changed outside of conntrack. +AT_SKIP_IF([test $HAVE_NC = no]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() +OVS_CHECK_CT_CLEAR() + +ADD_NAMESPACES(at_ns0, at_ns1) +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01") dnl FIP 10.254.254.1 +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02") dnl FIP 10.254.254.2 + +dnl Static ARPs +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.2 lladdr f0:00:00:01:01:02 dev p0]) +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev p1]) + +dnl Static ARP and route entries for the FIP "gateway" +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.254 lladdr f0:00:00:01:01:FE dev p0]) +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.254 lladdr f0:00:00:01:01:FE dev p1]) +NS_CHECK_EXEC([at_ns0], [ip route add default nexthop via 10.1.1.254]) +NS_CHECK_EXEC([at_ns1], [ip route add default nexthop via 10.1.1.254]) + +NETNS_DAEMONIZE([at_ns0], [nc -l -k 1234 > /dev/null], [nc0.pid]) + +AT_DATA([flows.txt], [dnl +table=0,priority=10 ip action=ct(table=1) +dnl dst FIP +table=1,priority=20 ip,ct_state=+trk+est,nw_dst=10.254.254.0/24 action=goto_table:2 +table=1,priority=20 ip,ct_state=+trk+new,nw_dst=10.254.254.0/24 action=ct(commit,exec(set_field:1->ct_mark),table=2) +dnl +dnl FIP translation (dst FIP, src local) --> (dst local, src FIP) +table=2 ip,nw_dst=10.254.254.1 action=set_field:10.1.1.1->nw_dst,goto_table:3 +table=2 ip,nw_dst=10.254.254.2 action=set_field:10.1.1.2->nw_dst,goto_table:3 +table=3 ip,nw_src=10.1.1.1 action=set_field:10.254.254.1->nw_src,goto_table:4 +table=3 ip,nw_src=10.1.1.2 action=set_field:10.254.254.2->nw_src,goto_table:4 +table=4 ip,nw_dst=10.1.1.1 action=set_field:f0:00:00:01:01:01->eth_dst,goto_table:5 +table=4 ip,nw_dst=10.1.1.2 action=set_field:f0:00:00:01:01:02->eth_dst,goto_table:5 +table=5 ip,nw_src=10.254.254.0/24 action=set_field:f0:00:00:01:01:FE->eth_src,goto_table:6 +dnl +dnl Tuple has been changed outside of conntrack +table=6,priority=10 ip action=ct(table=7) +dnl +table=7 ip,ct_state=+trk+est action=goto_table:8 +table=7 ip,ct_mark=0x0,ct_state=+trk+new action=ct(commit,exec(set_field:2->ct_mark),table=8) +dnl +table=8 ip,nw_dst=10.1.1.1 action=output:ovs-p0 +table=8 ip,nw_dst=10.1.1.2 action=output:ovs-p1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.254.254.1 1234]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.254.254)], [0], [dnl +tcp,orig=(src=10.1.1.2,dst=10.254.254.1,sport=,dport=),reply=(src=10.254.254.1,dst=10.1.1.2,sport=,dport=),mark=1,protoinfo=(state=) +tcp,orig=(src=10.254.254.2,dst=10.1.1.1,sport=,dport=),reply=(src=10.1.1.1,dst=10.254.254.2,sport=,dport=),mark=2,protoinfo=(state=) +]) + +ovs-appctl dpif/dump-flows br0 + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + AT_BANNER([802.1ad]) AT_SETUP([802.1ad - vlan_limit])