From patchwork Wed Jun 24 15:51:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dumitru Ceara X-Patchwork-Id: 1316329 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=fraxinus.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OpQfkFib; dkim-atps=neutral Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49sSMJ1jnFz9sSS for ; Thu, 25 Jun 2020 01:52:12 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 9A9FB860EA; Wed, 24 Jun 2020 15:52:10 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9PogBNSgH3uN; Wed, 24 Jun 2020 15:52:08 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by fraxinus.osuosl.org (Postfix) with ESMTP id 0C27485FC7; Wed, 24 Jun 2020 15:52:08 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id E39F2C0891; Wed, 24 Jun 2020 15:52:07 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 51043C016F for ; Wed, 24 Jun 2020 15:52:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 491DE877BA for ; Wed, 24 Jun 2020 15:52:06 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BDCihgARrQ2a for ; Wed, 24 Jun 2020 15:51:58 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by whitealder.osuosl.org (Postfix) with ESMTPS id 0503887938 for ; Wed, 24 Jun 2020 15:51:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1593013912; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/QmoSvHC3ZCF4kpUXaNSeIRCMC6JXYdO7WIfCsPKyAc=; b=OpQfkFibtmM2Qn+bu/gOJAT+Tri4gnZD2Qlj9WzRHz7OudjA5OjmbLjwGXXv14gDct8YdR bZqEbD4jkdhnOs3x9Sqdkyygk7pF+YKHJDm50idxJNMMIRbiz7IIOXmogS9dJTMZliUi5f qm000mSonB0peW9RJduN47yFNHcudX0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-444-xHvBMRgTPK2Dn8B0x_QdPw-1; Wed, 24 Jun 2020 11:51:49 -0400 X-MC-Unique: xHvBMRgTPK2Dn8B0x_QdPw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1E6B6107ACCA; Wed, 24 Jun 2020 15:51:48 +0000 (UTC) Received: from dceara.remote.csb (ovpn-114-124.ams2.redhat.com [10.36.114.124]) by smtp.corp.redhat.com (Postfix) with ESMTP id D5252512FE; Wed, 24 Jun 2020 15:51:46 +0000 (UTC) From: Dumitru Ceara To: dev@openvswitch.org Date: Wed, 24 Jun 2020 17:51:44 +0200 Message-Id: <20200624155141.11798.64431.stgit@dceara.remote.csb> In-Reply-To: <20200624155053.11798.12143.stgit@dceara.remote.csb> References: <20200624155053.11798.12143.stgit@dceara.remote.csb> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: hzhou@ovn.org Subject: [ovs-dev] [PATCH ovn 4/4] ovn-northd: Minimize number of ARP/NS responder flows for DNAT. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Most ARP/NS responder flows can be configured per datapath instead of per router port. The only exception is with distributed gateway router ports which need special treatment. This patch changes the ARP/NS responder behavior and adds: - Priority 92 flows to reply to ARP requests on distributed gateway router ports, on the chassis where the DNAT entry is bound. - Priority 91 flows to drop ARP requests on distributed gateway router ports, on chassis where the DNAT entry is not bound. - Priority 90 flows to reply to ARP requests on all other router ports. This last type of flows is programmed exactly once per logical router limiting the total number of required logical flows. Suggested-by: Han Zhou Reported-by: Girish Moodalbail Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050186.html Signed-off-by: Dumitru Ceara --- northd/ovn-northd.8.xml | 16 +++- northd/ovn-northd.c | 203 ++++++++++++++++++++++++++++++++--------------- tests/ovn.at | 8 +- 3 files changed, 153 insertions(+), 74 deletions(-) diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index ec89a63..df66caa 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -1857,9 +1857,8 @@ nd_na_router { IPv4: For a configured DNAT IP address or a load balancer IPv4 VIP A, for each router port P with Ethernet address E, a priority-90 flow matches - inport == P && arp.op == 1 && - arp.tpa == A (ARP request) - with the following actions: + arp.op == 1 && arp.tpa == A + (ARP request) with the following actions:

@@ -1876,6 +1875,11 @@ output;
         

+ IPv4: For a configured load balancer IPv4 VIP, a similar flow is + added with the additional match inport == P. +

+ +

If the router port P is a distributed gateway router port, then the is_chassis_resident(P) is also added in the match condition for the load balancer IPv4 @@ -1922,9 +1926,11 @@ nd_na {

  • If the corresponding NAT rule cannot be handled in a - distributed manner, then this flow is only programmed on + distributed manner, then a priority-92 flow is programmed on the gateway port instance on the - redirect-chassis. This behavior avoids + redirect-chassis. A priority-91 drop flow is + programmed on the other chassis when ARP requests/NS packets + are received on the gateway port. This behavior avoids generation of multiple ARP responses from different chassis, and allows upstream MAC learning to point to the redirect-chassis. diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 6b22316..82783f4 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -7961,7 +7961,7 @@ lrouter_nat_is_stateless(const struct nbrec_nat *nat) static void build_lrouter_arp_flow(struct ovn_datapath *od, struct ovn_port *op, const char *ip_address, const char *eth_addr, - struct ds *extra_match, uint16_t priority, + struct ds *extra_match, bool drop, uint16_t priority, struct hmap *lflows, const struct ovsdb_idl_row *hint) { struct ds match = DS_EMPTY_INITIALIZER; @@ -7976,20 +7976,24 @@ build_lrouter_arp_flow(struct ovn_datapath *od, struct ovn_port *op, if (extra_match && ds_last(extra_match) != EOF) { ds_put_format(&match, " && %s", ds_cstr(extra_match)); } - ds_put_format(&actions, - "eth.dst = eth.src; " - "eth.src = %s; " - "arp.op = 2; /* ARP reply */ " - "arp.tha = arp.sha; " - "arp.sha = %s; " - "arp.tpa = arp.spa; " - "arp.spa = %s; " - "outport = inport; " - "flags.loopback = 1; " - "output;", - eth_addr, - eth_addr, - ip_address); + if (drop) { + ds_put_format(&actions, "drop;"); + } else { + ds_put_format(&actions, + "eth.dst = eth.src; " + "eth.src = %s; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = %s; " + "arp.tpa = arp.spa; " + "arp.spa = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output;", + eth_addr, + eth_addr, + ip_address); + } ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_INPUT, priority, ds_cstr(&match), ds_cstr(&actions), hint); @@ -8008,7 +8012,7 @@ static void build_lrouter_nd_flow(struct ovn_datapath *od, struct ovn_port *op, const char *action, const char *ip_address, const char *sn_ip_address, const char *eth_addr, - struct ds *extra_match, uint16_t priority, + struct ds *extra_match, bool drop, uint16_t priority, struct hmap *lflows, const struct ovsdb_idl_row *hint) { @@ -8030,21 +8034,25 @@ build_lrouter_nd_flow(struct ovn_datapath *od, struct ovn_port *op, ds_put_format(&match, " && %s", ds_cstr(extra_match)); } - ds_put_format(&actions, - "%s { " - "eth.src = %s; " - "ip6.src = %s; " - "nd.target = %s; " - "nd.tll = %s; " - "outport = inport; " - "flags.loopback = 1; " - "output; " - "};", - action, - eth_addr, - ip_address, - ip_address, - eth_addr); + if (drop) { + ds_put_format(&actions, "drop;"); + } else { + ds_put_format(&actions, + "%s { " + "eth.src = %s; " + "ip6.src = %s; " + "nd.target = %s; " + "nd.tll = %s; " + "outport = inport; " + "flags.loopback = 1; " + "output; " + "};", + action, + eth_addr, + ip_address, + ip_address, + eth_addr); + } ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_INPUT, priority, ds_cstr(&match), ds_cstr(&actions), hint); @@ -8234,7 +8242,41 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, "ip4.dst == 0.0.0.0/8", "drop;"); - /* Priority-90 flows reply to ARP requests and ND packets. */ + /* Priority-90-92 flows handle ARP requests and ND packets. Most are + * per logical port but DNAT addresses can be handled per datapath + * for non gateway router ports. + */ + for (int i = 0; i < od->nbr->n_nat; i++) { + struct ovn_nat *nat_entry = &od->nat_entries[i]; + const struct nbrec_nat *nat = nat_entry->nb; + + /* Skip entries we failed to parse. */ + if (!nat_entry_is_valid(nat_entry)) { + continue; + } + + if (!strcmp(nat->type, "snat")) { + continue; + } + + /* Priority 91 and 92 flows are added for each gateway router + * port to handle the special cases. In case we get the packet + * on a regular port, just reply with the port's ETH address. + */ + struct lport_addresses *ext_addrs = &nat_entry->ext_addrs; + if (nat_entry_is_v6(nat_entry)) { + build_lrouter_nd_flow(od, NULL, "nd_na", + ext_addrs->ipv6_addrs[0].addr_s, + ext_addrs->ipv6_addrs[0].sn_addr_s, + REG_INPORT_ETH_ADDR, NULL, false, 90, + lflows, &nat->header_); + } else { + build_lrouter_arp_flow(od, NULL, + ext_addrs->ipv4_addrs[0].addr_s, + REG_INPORT_ETH_ADDR, NULL, false, 90, + lflows, &nat->header_); + } + } /* Drop ARP packets (priority 85). ARP request packets for router's own * IPs are handled with priority-90 flows. @@ -8384,8 +8426,8 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lrouter_arp_flow(op->od, op, op->lrp_networks.ipv4_addrs[i].addr_s, - REG_INPORT_ETH_ADDR, &match, 90, lflows, - &op->nbrp->header_); + REG_INPORT_ETH_ADDR, &match, false, 90, + lflows, &op->nbrp->header_); } /* A set to hold all load-balancer vips that need ARP responses. */ @@ -8403,7 +8445,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lrouter_arp_flow(op->od, op, ip_address, REG_INPORT_ETH_ADDR, - &match, 90, lflows, NULL); + &match, false, 90, lflows, NULL); } SSET_FOR_EACH (ip_address, &all_ips_v6) { @@ -8415,7 +8457,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lrouter_nd_flow(op->od, op, "nd_na", ip_address, NULL, REG_INPORT_ETH_ADDR, - &match, 90, lflows, NULL); + &match, false, 90, lflows, NULL); } sset_destroy(&all_ips_v4); @@ -8468,53 +8510,84 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, /* Mac address to use when replying to ARP/NS. */ const char *mac_s = REG_INPORT_ETH_ADDR; + /* ARP/NS packets are taken care of per router. The only exception + * is on the l3dgw_port where we might need to use a different + * ETH address. + */ + if (op != op->od->l3dgw_port) { + continue; + } + /* ARP / ND handling for external IP addresses. * * DNAT IP addresses are external IP addresses that need ARP * handling. */ ds_clear(&match); - if (op->od->l3dgw_port && op == op->od->l3dgw_port) { - struct eth_addr mac; - if (nat->external_mac && - eth_addr_from_string(nat->external_mac, &mac) - && nat->logical_port) { - /* distributed NAT case, use nat->external_mac */ - mac_s = nat->external_mac; - /* Traffic with eth.src = nat->external_mac should only be - * sent from the chassis where nat->logical_port is - * resident, so that upstream MAC learning points to the - * correct chassis. Also need to avoid generation of - * multiple ARP responses from different chassis. */ - ds_put_format(&match, "is_chassis_resident(\"%s\")", - nat->logical_port); - } else { - mac_s = REG_INPORT_ETH_ADDR; - /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s - * should only be sent from the "redirect-chassis", so that - * upstream MAC learning points to the "redirect-chassis". - * Also need to avoid generation of multiple ARP responses - * from different chassis. */ - if (op->od->l3redirect_port) { - ds_put_format(&match, "is_chassis_resident(%s)", - op->od->l3redirect_port->json_key); - } + struct ds match_cr_port = DS_EMPTY_INITIALIZER; + struct ds match_non_cr_port = DS_EMPTY_INITIALIZER; + + struct eth_addr mac; + if (nat->external_mac && + eth_addr_from_string(nat->external_mac, &mac) + && nat->logical_port) { + /* distributed NAT case, use nat->external_mac */ + mac_s = nat->external_mac; + /* Traffic with eth.src = nat->external_mac should only be + * sent from the chassis where nat->logical_port is + * resident, so that upstream MAC learning points to the + * correct chassis. Also need to avoid generation of + * multiple ARP responses from different chassis. */ + ds_put_format(&match_cr_port, + "is_chassis_resident(\"%s\")", + nat->logical_port); + ds_put_format(&match_non_cr_port, + "!is_chassis_resident(\"%s\")", + nat->logical_port); + } else { + mac_s = REG_INPORT_ETH_ADDR; + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s + * should only be sent from the "redirect-chassis", so that + * upstream MAC learning points to the "redirect-chassis". + * Also need to avoid generation of multiple ARP responses + * from different chassis. */ + if (op->od->l3redirect_port) { + ds_put_format(&match_cr_port, + "is_chassis_resident(\"%s\")", + op->od->l3redirect_port->json_key); + ds_put_format(&match_non_cr_port, + "!is_chassis_resident(\"%s\")", + op->od->l3redirect_port->json_key); } } + /* Respond to ARP/NS requests on the chassis that binds the gw + * port. Drop the ARP/NS requests on other chassis. + */ struct lport_addresses *ext_addrs = &nat_entry->ext_addrs; if (nat_entry_is_v6(nat_entry)) { build_lrouter_nd_flow(op->od, op, "nd_na", ext_addrs->ipv6_addrs[0].addr_s, ext_addrs->ipv6_addrs[0].sn_addr_s, - mac_s, &match, 90, + mac_s, &match_cr_port, false, 92, + lflows, &nat->header_); + build_lrouter_nd_flow(op->od, op, "nd_na", + ext_addrs->ipv6_addrs[0].addr_s, + ext_addrs->ipv6_addrs[0].sn_addr_s, + mac_s, &match_non_cr_port, true, 91, lflows, &nat->header_); } else { build_lrouter_arp_flow(op->od, op, ext_addrs->ipv4_addrs[0].addr_s, - mac_s, &match, 90, + mac_s, &match_cr_port, false, 92, + lflows, &nat->header_); + build_lrouter_arp_flow(op->od, op, + ext_addrs->ipv4_addrs[0].addr_s, + mac_s, &match_non_cr_port, true, 91, lflows, &nat->header_); } + ds_destroy(&match_cr_port); + ds_destroy(&match_non_cr_port); } if (!smap_get(&op->od->nbr->options, "chassis") @@ -8700,8 +8773,8 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lrouter_nd_flow(op->od, op, "nd_na_router", op->lrp_networks.ipv6_addrs[i].addr_s, op->lrp_networks.ipv6_addrs[i].sn_addr_s, - REG_INPORT_ETH_ADDR, &match, 90, lflows, - &op->nbrp->header_); + REG_INPORT_ETH_ADDR, &match, false, 90, + lflows, &op->nbrp->header_); } /* UDP/TCP port unreachable */ diff --git a/tests/ovn.at b/tests/ovn.at index 1ff7952..15139a1 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -19169,7 +19169,7 @@ OVS_WAIT_UNTIL([ send_arp_request 1 0 ${src_mac} $(ip_to_hex 10 0 0 254) $(ip_to_hex 10 0 0 121) # Verify that the ARP request is replied to from hv1 and not hv2. -match_arp_req="priority=90.*${match_r1_metadata}.*arp_tpa=10.0.0.121,arp_op=1" +match_arp_req="priority=92.*${match_r1_metadata}.*arp_tpa=10.0.0.121,arp_op=1" as hv1 OVS_WAIT_UNTIL([ @@ -19189,7 +19189,7 @@ OVS_WAIT_UNTIL([ send_arp_request 1 0 ${src_mac} $(ip_to_hex 10 0 0 254) $(ip_to_hex 10 0 0 122) # Verify that the ARP request is replied to from hv2 and not hv1. -match_arp_req="priority=90.*${match_r1_metadata}.*arp_tpa=10.0.0.122,arp_op=1" +match_arp_req="priority=92.*${match_r1_metadata}.*arp_tpa=10.0.0.122,arp_op=1" as hv2 OVS_WAIT_UNTIL([ @@ -19233,7 +19233,7 @@ dst_ipv6=00100000000000000000000000000121 send_nd_ns 1 0 ${src_mac} ${src_ipv6} ${dst_ipv6} 72dd # Verify that the ND_NS is replied to from hv1 and not hv2. -match_nd_ns="priority=90.*${match_r1_metadata}.*icmp_type=135.*nd_target=10::121" +match_nd_ns="priority=92.*${match_r1_metadata}.*icmp_type=135.*nd_target=10::121" as hv1 OVS_WAIT_UNTIL([ @@ -19255,7 +19255,7 @@ dst_ipv6=00100000000000000000000000000122 send_nd_ns 1 0 ${src_mac} ${src_ipv6} ${dst_ipv6} 72db # Verify that the ND_NS is replied to from hv2 and not hv1. -match_nd_ns="priority=90.*${match_r1_metadata}.*icmp_type=135.*nd_target=10::122" +match_nd_ns="priority=92.*${match_r1_metadata}.*icmp_type=135.*nd_target=10::122" as hv2 OVS_WAIT_UNTIL([