From patchwork Fri Sep 11 09:10:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Numan Siddique X-Patchwork-Id: 1362319 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ovn.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BnqjK4slCz9sSs for ; Fri, 11 Sep 2020 19:10:29 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id C0BC687232; Fri, 11 Sep 2020 09:10:27 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KtVk7sN0JHuF; Fri, 11 Sep 2020 09:10:26 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id CFF49871BF; Fri, 11 Sep 2020 09:10:26 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id AB2FFC0859; Fri, 11 Sep 2020 09:10:26 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7ED20C0051 for ; Fri, 11 Sep 2020 09:10:24 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 733648716E for ; Fri, 11 Sep 2020 09:10:24 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YaBjrDb_K6mr for ; Fri, 11 Sep 2020 09:10:23 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from relay5-d.mail.gandi.net (relay5-d.mail.gandi.net [217.70.183.197]) by fraxinus.osuosl.org (Postfix) with ESMTPS id A0B3186F7E for ; Fri, 11 Sep 2020 09:10:22 +0000 (UTC) X-Originating-IP: 115.99.136.65 Received: from nusiddiq.home.org.com (unknown [115.99.136.65]) (Authenticated sender: numans@ovn.org) by relay5-d.mail.gandi.net (Postfix) with ESMTPSA id 538101C0011; Fri, 11 Sep 2020 09:10:20 +0000 (UTC) From: numans@ovn.org To: dev@openvswitch.org Date: Fri, 11 Sep 2020 14:40:13 +0530 Message-Id: <20200911091013.694297-1-numans@ovn.org> X-Mailer: git-send-email 2.26.2 MIME-Version: 1.0 Subject: [ovs-dev] [PATCH ovn v2] northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Numan Siddique Prior to this patch, if a load balancer is configured on a logical switch but with no ACLs with allow-related configured, then in the ingress pipeline only the packets with ip.dst = VIP will be sent to conntrack using the zone id of the source logical port. If the backend of the load balancer, sends an invalid packet (for example invalid tcp sequence number), then such packets will be delivered to the source logical port VIF without unDNATting. This causes the source to reset the connection. This patch fixes this issue by sending all the packets to conntrack if a load balancer is configured on the logical switch. Because of this, any invalid (ct.inv) packets will be dropped in the ingress pipeline itself. Unfortunately this impacts the performance as now there will be extra recirculations because of ct and ct_commit (for new connections) actions. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359 Reported-by: Tim Rozet (trozet@redhat.com) Signed-off-by: Numan Siddique Acked-by: Dumitru Ceara Acked-by: Mark Michelson --- v1 -> v2 ---- * Addressed the review comments from Dumitru and Mark. northd/ovn-northd.8.xml | 41 +++++++++++++++++----- northd/ovn-northd.c | 77 +++++++++++++++++++++++++---------------- 2 files changed, 80 insertions(+), 38 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index a275442a82..a9826e8e9e 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -340,15 +340,12 @@ it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic to the next table. If load balancing rules with virtual IP addresses (and ports) are configured in OVN_Northbound database for a - logical switch datapath, a priority-100 flow is added for each configured - virtual IP address VIP. For IPv4 VIPs, the match is - ip && ip4.dst == VIP. For IPv6 - VIPs, the match is ip && - ip6.dst == VIP. The flow sets an action + logical switch datapath, a priority-100 flow is added with the match + ip to match on IP packets and sets the action reg0[0] = 1; next; to act as a hint for table Pre-stateful to send IP packets to the connection tracker - for packet de-fragmentation before eventually advancing to ingress table - LB. + for packet de-fragmentation before eventually advancing to ingress + table LB. If controller_event has been enabled and load balancing rules with empty backends have been added in OVN_Northbound, a 130 flow is added to trigger ovn-controller events whenever the chassis receives a @@ -356,6 +353,32 @@ previously created, it will be associated to the empty_lb logical flow

+

+ Prior to OVN 20.09 we were setting the + reg0[0] = 1 only if the IP destination matches the load + balancer VIP. However this had few issues cases where a logical switch + doesn't have any ACLs with allow-related action. + To understand the issue lets a take a TCP load balancer - + 10.0.0.10:80=10.0.0.3:80. If a logical port - p1 with + IP - 10.0.0.5 opens a TCP connection with the VIP - 10.0.0.10, then the + packet in the ingress pipeline of 'p1' is sent to the p1's conntrack zone + id and the packet is load balanced to the backend - 10.0.0.3. For the + reply packet from the backend lport, it is not sent to the conntrack of + backend lport's zone id. This is fine as long as the packet is valid. + Suppose the backend lport sends an invalid TCP packet (like incorrect + sequence number), the packet gets delivered to the lport 'p1' without + unDNATing the packet to the VIP - 10.0.0.10. And this causes the + connection to be reset by the lport p1's VIF. +

+ +

+ We can't fix this issue by adding a logical flow to drop ct.inv packets + in the egress pipeline since it will drop all other connections not + destined to the load balancers. To fix this issue, we send all the + packets to the conntrack in the ingress pipeline if a load balancer is + configured. We can now add a lflow to drop ct.inv packets. +

+

This table also has a priority-110 flow with the match eth.dst == E for all logical switch @@ -430,8 +453,8 @@

This table also contains a priority 0 flow with action next;, so that ACLs allow packets by default. If the - logical datapath has a statetful ACL, the following flows will - also be added: + logical datapath has a statetful ACL or a load balancer with VIP + configured, the following flows will also be added:

    diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index b95d6cd8a1..3967bae569 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -5034,6 +5034,19 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows, free(action); } +static bool +has_lb_vip(struct ovn_datapath *od) +{ + for (int i = 0; i < od->nbs->n_load_balancer; i++) { + struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i]; + if (!smap_is_empty(&nb_lb->vips)) { + return true; + } + } + + return false; +} + static void build_pre_lb(struct ovn_datapath *od, struct hmap *lflows, struct shash *meter_groups, struct hmap *lbs) @@ -5072,8 +5085,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows, 110, lflows); } - struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4); - struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6); bool vip_configured = false; for (int i = 0; i < od->nbs->n_load_balancer; i++) { struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i]; @@ -5083,12 +5094,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows, for (size_t j = 0; j < lb->n_vips; j++) { struct lb_vip *lb_vip = &lb->vips[j]; - if (lb_vip->addr_family == AF_INET) { - sset_add(&all_ips_v4, lb_vip->vip); - } else { - sset_add(&all_ips_v6, lb_vip->vip); - } - build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb, S_SWITCH_IN_PRE_LB, meter_groups); @@ -5102,26 +5107,37 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows, } /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send - * packet to conntrack for defragmentation. */ - const char *ip_address; - SSET_FOR_EACH (ip_address, &all_ips_v4) { - char *match = xasprintf("ip && ip4.dst == %s", ip_address); - ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, - 100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;"); - free(match); - } - - SSET_FOR_EACH (ip_address, &all_ips_v6) { - char *match = xasprintf("ip && ip6.dst == %s", ip_address); - ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, - 100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;"); - free(match); - } - - sset_destroy(&all_ips_v4); - sset_destroy(&all_ips_v6); - + * packet to conntrack for defragmentation. + * + * Send all the packets to conntrack in the ingress pipeline if the + * logical switch has a load balancer with VIP configured. Earlier + * we used to set the REGBIT_CONNTRACK_DEFRAG flag in the ingress pipeline + * if the IP destination matches the VIP. But this causes few issues when + * a logical switch has no ACLs configured with allow-related. + * To understand the issue, lets a take a TCP load balancer - + * 10.0.0.10:80=10.0.0.3:80. + * If a logical port - p1 with IP - 10.0.0.5 opens a TCP connection with + * the VIP - 10.0.0.10, then the packet in the ingress pipeline of 'p1' + * is sent to the p1's conntrack zone id and the packet is load balanced + * to the backend - 10.0.0.3. For the reply packet from the backend lport, + * it is not sent to the conntrack of backend lport's zone id. This is fine + * as long as the packet is valid. Suppose the backend lport sends an + * invalid TCP packet (like incorrect sequence number), the packet gets + * delivered to the lport 'p1' without unDNATing the packet to the + * VIP - 10.0.0.10. And this causes the connection to be reset by the + * lport p1's VIF. + * + * We can't fix this issue by adding a logical flow to drop ct.inv packets + * in the egress pipeline since it will drop all other connections not + * destined to the load balancers. + * + * To fix this issue, we send all the packets to the conntrack in the + * ingress pipeline if a load balancer is configured. We can now + * add a lflow to drop ct.inv packets. + */ if (vip_configured) { + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, + 100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;"); ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;"); } @@ -5482,7 +5498,7 @@ static void build_acls(struct ovn_datapath *od, struct hmap *lflows, struct hmap *port_groups) { - bool has_stateful = has_stateful_acl(od); + bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od)); /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by * default. A related rule at priority 1 is added below if there @@ -5746,12 +5762,15 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;"); ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;"); - if (od->nbs->load_balancer) { + if (od->nbs->n_load_balancer) { for (size_t i = 0; i < od->n_router_ports; i++) { skip_port_from_conntrack(od, od->router_ports[i], S_SWITCH_IN_LB, S_SWITCH_OUT_LB, UINT16_MAX, lflows); } + } + + if (has_lb_vip(od)) { /* Ingress and Egress LB Table (Priority 65534). * * Send established traffic through conntrack for just NAT. */