From patchwork Sat Oct 30 06:12:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peng He X-Patchwork-Id: 1548479 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20210112 header.b=p2TxA67y; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Hh8974Rk2z9sSf for ; Sat, 30 Oct 2021 17:12:43 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 437BD80E14; Sat, 30 Oct 2021 06:12:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uk-bdPKZRV5C; Sat, 30 Oct 2021 06:12:39 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id 1689E80BBB; Sat, 30 Oct 2021 06:12:38 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id E6538C0012; Sat, 30 Oct 2021 06:12:37 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id E69B6C000E for ; Sat, 30 Oct 2021 06:12:36 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id D4F8C605CE for ; Sat, 30 Oct 2021 06:12:36 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp3.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Hba9qe0FSFLb for ; Sat, 30 Oct 2021 06:12:35 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.8.0 Received: from mail-pg1-x52e.google.com (mail-pg1-x52e.google.com [IPv6:2607:f8b0:4864:20::52e]) by smtp3.osuosl.org (Postfix) with ESMTPS id D2280605CB for ; Sat, 30 Oct 2021 06:12:35 +0000 (UTC) Received: by mail-pg1-x52e.google.com with SMTP id n23so1308497pgh.8 for ; Fri, 29 Oct 2021 23:12:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=/zpmAAHA/ZFAKCsuK+NGVnFVc7IjC/W7vO7Yck09zZM=; b=p2TxA67ygK0NJnob4XFyl5UAPPuvJKdu13ZRABM9iwtuYQdzopGXVk12cN1NnK3462 NrJrtlMyI7LGVj5gcXwcxWdXuhGGHRCTNaVdSQjxF8ZbwxnP6h7WWv9zpWuLN/hw7Jjl 5fv5aeZz4BjQK0oNq5vhj7a+oUExsoaHVdIx2f/GfpQsZOK5toln+waevbpDr6UKb5ed H+1aLqg28f3nz3K5NKeGxXsrrlLvVSErdHu9Va3uMcBVthZbtZY5OzMLe46Ja1Av83X0 bvtblgCGdO7uwyBqfehWBQyM0NY54c7FtA6coe4X2WPDMHD2j+ylmZTKONzD3chgTvsa xIsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=/zpmAAHA/ZFAKCsuK+NGVnFVc7IjC/W7vO7Yck09zZM=; b=6V1mq8j/pwKmDRZYDDIY6T9V7GyS61zNcKksN0notgkS6q7t6zMBrfe9GlLe3kQjyE PTW9He4D3HqhvOYXCTAgn4BGyGTbw0zifbrSz5KXKyF5yqnmAuA5jz9lQ3KB2ks6olOp lzJZL1CmysIadwRKdIzRhCnfqrwzaOkjil7ZHw/CEle+Mr5FVh2E+B9ToP8lqg5sF7+/ lXjdMKmwAmZq3DQFkKmwehO8OGVkMZl4CPoznhJf794+6eFEVsr2mf0GRDYXVIH0cMIL rRwB1gV6mD4xotUHDRPDiHR1/Xiu4uJucNe3UTmmcPVY37TCAVtbyWvHfYOET1qfEWan /amA== X-Gm-Message-State: AOAM5301J3nU/jU7Eql4XHwxkTzMHDmBIRE7DF8LCpu84j8VxrSUYfFw AQl4GtAv+Nj2B3UJrE/S6jcCRSDYnstyKA== X-Google-Smtp-Source: ABdhPJwyJIWlukmbf8ajrRGyEjT7UgBVYtwItiwkL8Uyi+On2tD2dWDK/M/9+xfO9X6HwIobBnYejQ== X-Received: by 2002:a05:6a00:a8b:b0:44d:ef7c:94b9 with SMTP id b11-20020a056a000a8b00b0044def7c94b9mr15624371pfl.36.1635574354817; Fri, 29 Oct 2021 23:12:34 -0700 (PDT) Received: from localhost ([139.177.225.252]) by smtp.gmail.com with ESMTPSA id r14sm6769437pgn.91.2021.10.29.23.12.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Oct 2021 23:12:34 -0700 (PDT) From: Peng He X-Google-Original-From: Peng He To: dev@openvswitch.org, i.maximets@ovn.org Date: Sat, 30 Oct 2021 14:12:31 +0800 Message-Id: <20211030061231.55815-1-hepeng.0320@bytedance.com> X-Mailer: git-send-email 2.33.1 MIME-Version: 1.0 Subject: [ovs-dev] [PATCH] ipf: add ipf context X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" ipf_postprocess will emit packets into the datapath pipeline ignoring the conntrack context, this might casuse weird issues when a packet batch has less space to contain all the fragments belonging to single packet. Given the below ruleest and consider sending a 64K ICMP packet which is splitted into 64 fragments. priority=1,action=drop priority=10,arp,action=normal priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0) priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2 priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2 priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9) priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1 Batch 1: the first 32 packets will be buffered in the ipf preprocessing, nothing more proceeds. Batch 2: the second 32 packets succeed the fragment reassembly and goes to ct and ipf_post will emits the first 32 packets due to the limit of batch size. the first 32 packets goes to the datapath again due to the recirculation, and again buffered at ipf preprocessing before ct, then the ovs tries to call ct commit and ipf_postprocessing which emits the last 32 packets, in this case the last 32 packets will follow the current action list which will be sent to port 2 directly without recirculation and going to ipf preprocssing again. This will cause the first 32 packets never get the chance to reassemble and evevntually this large ICMP packets fail to transmit. this patch fixes this issue by adding firstly ipf context to avoid ipf_postprocessing emits packets in the wrong context. Then by re-executing the action list again to emit the last 32 packets in the right context to correctly transmitting multiple fragments. --- lib/conntrack.c | 10 ++++---- lib/conntrack.h | 14 +++++------ lib/dpif-netdev.c | 41 +++++++++++++++++++++++++++---- lib/ipf.c | 53 +++++++++++++++++++++++++++++++++++------ lib/ipf.h | 12 +++++++--- tests/system-traffic.at | 33 +++++++++++++++++++++++++ 6 files changed, 136 insertions(+), 27 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 33a1a9295..a37993d9d 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1434,7 +1434,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, * connection tables. 'setmark', if not NULL, should point to a two * elements array containing a value and a mask to set the connection mark. * 'setlabel' behaves similarly for the connection label.*/ -int +bool conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, ovs_be16 dl_type, bool force, bool commit, uint16_t zone, const uint32_t *setmark, @@ -1443,7 +1443,9 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, const struct nat_action_info_t *nat_action_info, long long now, uint32_t tp_id) { - ipf_preprocess_conntrack(ct->ipf, pkt_batch, now, dl_type, zone, + struct ipf_ctx _ipf_ctx; + struct ipf_ctx *ipf_ctx = dp_packet_batch_is_empty(pkt_batch) ? NULL : &_ipf_ctx; + ipf_preprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, now, dl_type, zone, ct->hash_basis); struct dp_packet *packet; @@ -1468,9 +1470,7 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, } } - ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type); - - return 0; + return ipf_postprocess_conntrack(ct->ipf, pkt_batch, ipf_ctx, now, dl_type); } void diff --git a/lib/conntrack.h b/lib/conntrack.h index 9553b188a..7772444fd 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -88,13 +88,13 @@ struct nat_action_info_t { struct conntrack *conntrack_init(void); void conntrack_destroy(struct conntrack *); -int conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, - ovs_be16 dl_type, bool force, bool commit, uint16_t zone, - const uint32_t *setmark, - const struct ovs_key_ct_labels *setlabel, - ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, - const struct nat_action_info_t *nat_action_info, - long long now, uint32_t tp_id); +bool conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, + ovs_be16 dl_type, bool force, bool commit, uint16_t zone, + const uint32_t *setmark, + const struct ovs_key_ct_labels *setlabel, + ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, + const struct nat_action_info_t *nat_action_info, + long long now, uint32_t tp_id); void conntrack_clear(struct dp_packet *packet); struct conntrack_dump { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index b078c2da5..127a22d04 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7755,6 +7755,7 @@ dp_netdev_recirculate(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_execute_aux { struct dp_netdev_pmd_thread *pmd; const struct flow *flow; + const struct nlattr *redo_actions; }; static void @@ -8284,10 +8285,14 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_, VLOG_WARN_RL(&rl, "NAT specified without commit."); } - conntrack_execute(dp->conntrack, packets_, aux->flow->dl_type, force, - commit, zone, setmark, setlabel, aux->flow->tp_src, - aux->flow->tp_dst, helper, nat_action_info_ref, - pmd->ctx.now / 1000, tp_id); + bool more_pkts; + more_pkts = conntrack_execute(dp->conntrack, packets_, aux->flow->dl_type, force, + commit, zone, setmark, setlabel, aux->flow->tp_src, + aux->flow->tp_dst, helper, nat_action_info_ref, + pmd->ctx.now / 1000, tp_id); + if (more_pkts) { + aux->redo_actions = a; + } break; } @@ -8321,16 +8326,42 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_, dp_packet_delete_batch(packets_, should_steal); } +static size_t +dp_netdev_find_actions_left(const struct nlattr *actions, + size_t actions_len, + const struct nlattr *target) +{ + const struct nlattr *a; + size_t left; + NL_ATTR_FOR_EACH_UNSAFE(a, left, actions, actions_len) { + if (a == target) { + break; + } + } + return left; +} + static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets, bool should_steal, const struct flow *flow, const struct nlattr *actions, size_t actions_len) { - struct dp_netdev_execute_aux aux = { pmd, flow }; + struct dp_netdev_execute_aux aux = { pmd, flow, NULL }; odp_execute_actions(&aux, packets, should_steal, actions, actions_len, dp_execute_cb); + + if (OVS_UNLIKELY(aux.redo_actions)) { + + struct dp_packet_batch batch; + dp_packet_batch_init(&batch); + size_t redo_actions_len = dp_netdev_find_actions_left(actions, \ + actions_len, \ + aux.redo_actions); + dp_netdev_execute_actions(pmd, &batch, should_steal, flow, + aux.redo_actions, redo_actions_len); + } } struct dp_netdev_ct_dump { diff --git a/lib/ipf.c b/lib/ipf.c index 507db2aea..eb8fe87bd 100644 --- a/lib/ipf.c +++ b/lib/ipf.c @@ -1046,30 +1046,52 @@ ipf_send_frags_in_list(struct ipf *ipf, struct ipf_list *ipf_list, OVS_NOT_REACHED(); } +static bool +ipf_ctx_eq(struct ipf_list *ipf_list, struct ipf_ctx *ctx) +{ + struct dp_packet *pkt = + ipf_list->frag_list[ipf_list->last_sent_idx + 1].pkt; + + if (pkt->md.recirc_id != ctx->recirc_id || + pkt->md.in_port.odp_port != ctx->in_port.odp_port || + pkt->md.ct_zone != ctx->zone) { + return false; + } + return true; +} + /* Adds fragments associated with a completed fragment list to a packet batch * to be processed by the calling application, typically conntrack. Also * cleans up the list context when it is empty.*/ -static void +static bool ipf_send_completed_frags(struct ipf *ipf, struct dp_packet_batch *pb, - long long now, bool v6) + struct ipf_ctx *ctx, long long now, bool v6) { if (ovs_list_is_empty(&ipf->frag_complete_list)) { - return; + return false; } + bool more_pkts = false; ovs_mutex_lock(&ipf->ipf_lock); struct ipf_list *ipf_list, *next; LIST_FOR_EACH_SAFE (ipf_list, next, list_node, &ipf->frag_complete_list) { + + if (ctx && !ipf_ctx_eq(ipf_list, ctx)) { + continue; + } + if (ipf_send_frags_in_list(ipf, ipf_list, pb, IPF_FRAG_COMPLETED_LIST, v6, now)) { ipf_completed_list_clean(&ipf->frag_lists, ipf_list); } else { + more_pkts = true; break; } } ovs_mutex_unlock(&ipf->ipf_lock); + return more_pkts; } /* Conservatively adds fragments associated with a expired fragment list to @@ -1220,14 +1242,29 @@ ipf_post_execute_reass_pkts(struct ipf *ipf, ovs_mutex_unlock(&ipf->ipf_lock); } +static void +ipf_ctx_save(struct ipf_ctx *ctx, struct dp_packet_batch *pb, uint16_t zone) +{ + if (ctx) { + ctx->recirc_id = pb->packets[0]->md.recirc_id; + ctx->in_port.odp_port = pb->packets[0]->md.in_port.odp_port; + ctx->zone = zone; + } +} + /* Extracts any fragments from the batch and reassembles them when a * complete packet is received. Completed packets are attempted to * be added to the batch to be sent through conntrack. */ void ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, - long long now, ovs_be16 dl_type, uint16_t zone, + struct ipf_ctx *ctx, long long now, ovs_be16 dl_type, uint16_t zone, uint32_t hash_basis) { + /* No matter if ipf is enabled or not, we should save the ctx, + * because ipf_postprocess will emit packets not matter ipf is + * enabled or not + */ + ipf_ctx_save(ctx, pb, zone); if (ipf_get_enabled(ipf)) { ipf_extract_frags_from_batch(ipf, pb, dl_type, zone, now, hash_basis); } @@ -1241,16 +1278,18 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, * through conntrack and adds these fragments to any batches seen. Expired * fragments are marked as invalid and also added to the batches seen * with low priority. Reassembled packets are freed. */ -void +bool ipf_postprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, - long long now, ovs_be16 dl_type) + struct ipf_ctx *ctx, long long now, ovs_be16 dl_type) { + bool more_pkts = false; if (ipf_get_enabled(ipf) || atomic_count_get(&ipf->nfrag)) { bool v6 = dl_type == htons(ETH_TYPE_IPV6); ipf_post_execute_reass_pkts(ipf, pb, v6); - ipf_send_completed_frags(ipf, pb, now, v6); + more_pkts = ipf_send_completed_frags(ipf, pb, ctx, now, v6); ipf_send_expired_frags(ipf, pb, now, v6); } + return more_pkts; } static void * diff --git a/lib/ipf.h b/lib/ipf.h index 6ac91b270..1fa040cd1 100644 --- a/lib/ipf.h +++ b/lib/ipf.h @@ -40,14 +40,20 @@ struct ipf_status { unsigned int nfrag_max; }; +struct ipf_ctx { + uint32_t recirc_id; + union flow_in_port in_port; /* Input port. */ + uint16_t zone; +}; + struct ipf *ipf_init(void); void ipf_destroy(struct ipf *ipf); void ipf_preprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, - long long now, ovs_be16 dl_type, uint16_t zone, + struct ipf_ctx *ctx, long long now, ovs_be16 dl_type, uint16_t zone, uint32_t hash_basis); -void ipf_postprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, - long long now, ovs_be16 dl_type); +bool ipf_postprocess_conntrack(struct ipf *ipf, struct dp_packet_batch *pb, + struct ipf_ctx *ctx, long long now, ovs_be16 dl_type); int ipf_set_enabled(struct ipf *ipf, bool v6, bool enable); int ipf_set_min_frag(struct ipf *ipf, bool v6, uint32_t value); diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 092de308b..ca9c4d7ea 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -2705,6 +2705,39 @@ DPCTL_CHECK_FRAGMENTATION_PASS() OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([conntrack - IPv4 large fragmentation]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Sending ping through conntrack +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,ct_state=-trk,icmp,action=ct(zone=9,table=0) +priority=100,in_port=1,ct_state=+new+trk,icmp,action=ct(zone=9,commit),2 +priority=100,in_port=1,ct_state=-new+est+trk,icmp,action=2 +priority=100,in_port=2,ct_state=-trk,icmp,action=ct(table=0,zone=9) +priority=100,in_port=2,ct_state=+trk+est-new,icmp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Modify userspace conntrack fragmentation handling. +DPCTL_MODIFY_FRAGMENTATION() + +dnl Ipv4 larger fragmentation connectivity check. +NS_CHECK_EXEC([at_ns0], [ping -s 65000 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + AT_SETUP([conntrack - IPv4 fragmentation expiry]) CHECK_CONNTRACK() OVS_TRAFFIC_VSWITCHD_START()