From patchwork Thu May 30 00:20:23 2019
X-Patchwork-Submitter: Ankur Sharma
X-Patchwork-Id: 1107465
From: Ankur Sharma
To: "ovs-dev@openvswitch.org"
Date: Thu, 30 May 2019 00:20:23 +0000
Message-ID: <1559175728-127062-2-git-send-email-ankur.sharma@nutanix.com>
In-Reply-To: <1559175728-127062-1-git-send-email-ankur.sharma@nutanix.com>
Subject: [ovs-dev] [PATCH v9 1/2] OVN: Enable E-W Traffic, Vlan backed DVR

Background:
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353066.html
[2] https://docs.google.com/document/d/1uoQH478wM1OZ16HrxzbOUvk5LvFnfNEWbkPT6Zmm9OU/edit?usp=sharing

This Series:
Layer 3 E-W and Layer 3 N-S (NO NAT) changes for vlan
backed distributed logical router.

This patch:
A.
Key difference between an overlay logical switch and a vlan backed
logical switch is that for vlan logical switches packets are not
encapsulated. Hence, if a distributed router port is connected to a
vlan type logical switch, then the router port mac could be seen as
source mac from multiple hypervisors.

From a top of the rack switch (TOR) perspective, the same (mac, vlan)
pairs coming from multiple ports could be seen as a security threat,
and the TOR could raise alarms, drop the packets or block the ports.

This patch addresses the same by introducing the concept of chassis mac.
A chassis mac is a CMS provisioned unique mac per chassis. For any
routed packet (i.e. source mac is router port mac) going on the wire on
a vlan type logical switch, we will replace its source mac with the
chassis mac.

This replacing of source mac with chassis mac will happen in table=65
of the logical switch datapath. A flow is added at priority 150, which
matches the source mac and replaces it with the chassis mac if the
value is a router port mac.

Example flow:
cookie=0x0, duration=67765.830s, table=65, n_packets=0, n_bytes=0,
idle_age=65534, hard_age=65534, priority=150,reg15=0x1,metadata=0x4,
dl_src=00:00:01:01:02:03 actions=mod_dl_src:aa:bb:cc:dd:ee:ff,
mod_vlan_vid:1000,output:16

Here, 00:00:01:01:02:03 is the router port mac and aa:bb:cc:dd:ee:ff
is the chassis mac.

B.
This patch adds one more change of associating "types" with logical
switches, i.e. a logical switch could be of type "overlay" or "bridged".
This is done to explicitly call out that on a bridged logical switch
there will be no encapsulation. Just a localnet port's presence is not
sufficient, as we do encap while redirecting the packet to the gateway
chassis. By marking the logical switch as bridged, we can either avoid
redirection totally (if there is no NAT) or do redirection based on
router port mac, rather than encap over a tunnel.

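To illustrate part A, the effect of the priority 150 flow can be
sketched in plain C. This is only an illustration under stated
assumptions: struct frame_hdr and rewrite_routed_src_mac() are
hypothetical names that do not exist in OVS/OVN, and in the patch the
rewrite is done by OpenFlow actions in table 65, not by C code that
touches frames.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a frame header.  Not an OVS type. */
struct frame_hdr {
    uint8_t dl_src[6];      /* Ethernet source address. */
    uint16_t vlan_vid;      /* 0 means "no tag set" in this sketch. */
};

/* Mimics the match/action pair of the priority 150 flow: if the source
 * mac is one of the router port macs, replace it with the chassis mac
 * and apply the localnet port's vlan tag (if any).  Returns true when
 * the frame was rewritten.  Frames that do not match are left untouched
 * here, which stands in for them hitting a lower priority flow. */
static bool
rewrite_routed_src_mac(struct frame_hdr *hdr,
                       const uint8_t router_macs[][6], size_t n_router_macs,
                       const uint8_t chassis_mac[6], uint16_t localnet_tag)
{
    for (size_t i = 0; i < n_router_macs; i++) {
        if (!memcmp(hdr->dl_src, router_macs[i], 6)) {
            memcpy(hdr->dl_src, chassis_mac, 6);
            if (localnet_tag) {
                hdr->vlan_vid = localnet_tag;
            }
            return true;
        }
    }
    return false;
}
```

With the values from the example flow, a frame whose source mac is
00:00:01:01:02:03 would leave with source mac aa:bb:cc:dd:ee:ff and
vlan 1000, while a frame sent directly by a VM keeps its own source mac.
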
Signed-off-by: Ankur Sharma
---
 ovn/controller/binding.c            |  12 +--
 ovn/controller/chassis.c            |  64 +++++++++++-
 ovn/controller/chassis.h            |   4 +
 ovn/controller/ovn-controller.8.xml |  10 ++
 ovn/controller/ovn-controller.c     |   4 +-
 ovn/controller/ovn-controller.h     |   5 +-
 ovn/controller/physical.c           |  95 +++++++++++++++++
 ovn/northd/ovn-northd.c             |  38 +++++++
 ovn/ovn-architecture.7.xml          |  24 +++++
 ovn/ovn-nb.ovsschema                |  10 +-
 ovn/ovn-nb.xml                      |  19 ++++
 ovn/ovn-sb.xml                      |  15 +++
 ovn/utilities/ovn-nbctl.c           |  49 +++++++--
 tests/ovn-nbctl.at                  |  48 ++++++---
 tests/ovn-northd.at                 |  22 ++++
 tests/ovn.at                        | 197 ++++++++++++++++++++++++++++++++++++
 16 files changed, 583 insertions(+), 33 deletions(-)

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index b62b3da..c73d1aa 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -159,13 +159,11 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
                                          sbrec_port_binding_by_name,
                                          peer->datapath, false,
                                          depth + 1, local_datapaths);
-                    ld->n_peer_dps++;
-                    ld->peer_dps = xrealloc(
-                            ld->peer_dps,
-                            ld->n_peer_dps * sizeof *ld->peer_dps);
-                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
-                        sbrec_datapath_binding_by_key,
-                        peer->datapath->tunnel_key);
+                    ld->n_peer_ports++;
+                    ld->peer_ports = xrealloc(ld->peer_ports,
+                                              ld->n_peer_ports *
+                                              sizeof *ld->peer_ports);
+                    ld->peer_ports[ld->n_peer_ports - 1] = peer;
                 }
             }
         }
diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 0f537f1..8403212 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -23,6 +23,7 @@
 #include "lib/vswitch-idl.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/ofp-parse.h"
 #include "ovn/lib/chassis-index.h"
 #include "ovn/lib/ovn-sb-idl.h"
 #include "ovn-controller.h"
@@ -69,6 +70,12 @@ get_bridge_mappings(const struct smap *ext_ids)
 }
 
 static const char *
+get_chassis_mac_mappings(const struct smap *ext_ids)
+{
+    return smap_get_def(ext_ids, "ovn-chassis-mac-mappings", "");
+}
+
+static const char *
 get_cms_options(const struct smap *ext_ids)
 {
     return smap_get_def(ext_ids, "ovn-cms-options", "");
@@ -162,6 +169,7 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
     const char *datapath_type =
         br_int && br_int->datapath_type ? br_int->datapath_type : "";
     const char *cms_options = get_cms_options(&cfg->external_ids);
+    const char *chassis_macs = get_chassis_mac_mappings(&cfg->external_ids);
 
     struct ds iface_types = DS_EMPTY_INITIALIZER;
     ds_put_cstr(&iface_types, "");
@@ -190,18 +198,22 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
             = smap_get_def(&chassis_rec->external_ids, "iface-types", "");
         const char *chassis_cms_options
             = get_cms_options(&chassis_rec->external_ids);
+        const char *chassis_mac_mappings
+            = get_chassis_mac_mappings(&chassis_rec->external_ids);
 
         /* If any of the external-ids should change, update them. */
         if (strcmp(bridge_mappings, chassis_bridge_mappings) ||
             strcmp(datapath_type, chassis_datapath_type) ||
             strcmp(iface_types_str, chassis_iface_types) ||
-            strcmp(cms_options, chassis_cms_options)) {
+            strcmp(cms_options, chassis_cms_options) ||
+            strcmp(chassis_macs, chassis_mac_mappings)) {
             struct smap new_ids;
             smap_clone(&new_ids, &chassis_rec->external_ids);
             smap_replace(&new_ids, "ovn-bridge-mappings", bridge_mappings);
             smap_replace(&new_ids, "datapath-type", datapath_type);
             smap_replace(&new_ids, "iface-types", iface_types_str);
             smap_replace(&new_ids, "ovn-cms-options", cms_options);
+            smap_replace(&new_ids, "ovn-chassis-mac-mappings", chassis_macs);
             sbrec_chassis_verify_external_ids(chassis_rec);
             sbrec_chassis_set_external_ids(chassis_rec, &new_ids);
             smap_destroy(&new_ids);
@@ -319,6 +331,56 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
     return chassis_rec;
 }
 
+bool
+chassis_get_mac(const struct sbrec_chassis *chassis_rec,
+                const char *bridge_mapping,
+                struct eth_addr *chassis_mac)
+{
+    const char *tokens
+        = get_chassis_mac_mappings(&chassis_rec->external_ids);
+
+    if (!strlen(tokens)) {
+       return false;
+    }
+
+    char *save_ptr = NULL;
+    char *token;
+    bool ret = false;
+    char *tokstr = xstrdup(tokens);
+
+    /* Format for a chassis mac configuration is:
+     * ovn-chassis-mac-mappings="bridge-name1:MAC1,bridge-name2:MAC2"
+     */
+    for (token = strtok_r(tokstr, ",", &save_ptr);
+         token != NULL;
+         token = strtok_r(NULL, ",", &save_ptr)) {
+        char *save_ptr2 = NULL;
+        char *chassis_mac_bridge = strtok_r(token, ":", &save_ptr2);
+        char *chassis_mac_str = strtok_r(NULL, "", &save_ptr2);
+
+        if (!strcmp(chassis_mac_bridge, bridge_mapping)) {
+            struct eth_addr temp_mac;
+            char *err_str = NULL;
+
+            ret = true;
+
+            /* Return the first chassis mac. */
+            if ((err_str = str_to_mac(chassis_mac_str, &temp_mac))) {
+                free(err_str);
+                ret = false;
+                continue;
+            }
+
+            *chassis_mac = temp_mac;
+            break;
+        }
+    }
+
+    free(tokstr);
+
+    return ret;
+}
+
 /* Returns true if the database is all cleaned up, false if more work is
  * required. */
 bool
diff --git a/ovn/controller/chassis.h b/ovn/controller/chassis.h
index 9847e19..e3fbc31 100644
--- a/ovn/controller/chassis.h
+++ b/ovn/controller/chassis.h
@@ -26,6 +26,7 @@ struct ovsrec_open_vswitch_table;
 struct sbrec_chassis;
 struct sbrec_chassis_table;
 struct sset;
+struct eth_addr;
 
 void chassis_register_ovs_idl(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_run(
@@ -36,5 +37,8 @@ const struct sbrec_chassis *chassis_run(
     const struct sset *transport_zones);
 bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
                      const struct sbrec_chassis *);
+bool chassis_get_mac(const struct sbrec_chassis *chassis,
+                     const char *bridge_mapping,
+                     struct eth_addr *chassis_mac);
 
 #endif /* ovn/chassis.h */
diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml
index 9721d9a..18f66fe 100644
--- a/ovn/controller/ovn-controller.8.xml
+++ b/ovn/controller/ovn-controller.8.xml
@@ -182,6 +182,16 @@
         transport zone.

+      external_ids:ovn-chassis-mac-mappings
+
+        A list of key-value pairs that map a chassis specific mac to
+        a physical network name. An example
+        value mapping two chassis macs to two physical network names would be:
+        physnet1:aa:bb:cc:dd:ee:ff,physnet2:a1:b2:c3:d4:e5:f6.
+        These are the macs that ovn-controller will replace a router port
+        mac with, if the packet is going from a distributed router port on
+        a vlan type logical switch.
+
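
The mapping format documented above can be illustrated with a small
standalone parser. This is a sketch only: lookup_chassis_mac() is a
hypothetical helper that uses plain libc, whereas the patch itself
parses the value in chassis_get_mac() with strtok_r() and str_to_mac().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OVS/OVN.  Looks up 'network' in a
 * string of the form "net1:MAC1,net2:MAC2" and, on success, stores the
 * six mac bytes in 'mac'.  Malformed entries are skipped. */
static bool
lookup_chassis_mac(const char *mappings, const char *network, uint8_t mac[6])
{
    size_t net_len = strlen(network);
    const char *p = mappings;

    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t) (comma - p) : strlen(p);

        /* The entry must be exactly "<network>:<mac>". */
        if (len > net_len + 1 && !strncmp(p, network, net_len)
            && p[net_len] == ':') {
            unsigned int b[6];
            int used = 0;

            /* %n records how many characters the six fields consumed, so
             * trailing garbage inside the entry can be rejected. */
            if (sscanf(p + net_len + 1, "%2x:%2x:%2x:%2x:%2x:%2x%n",
                       &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &used) == 6
                && (size_t) used == len - net_len - 1) {
                for (int i = 0; i < 6; i++) {
                    mac[i] = (uint8_t) b[i];
                }
                return true;
            }
        }

        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}
```

For the example value above, looking up "physnet2" would fill in
a1:b2:c3:d4:e5:f6, and looking up a network that is not listed would
return false, which corresponds to the case where no replacement flow
is installed.
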

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 6019016..315a88b 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -899,7 +899,7 @@ en_runtime_data_cleanup(struct engine_node *node)
     struct local_datapath *cur_node, *next_node;
     HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
                         &data->local_datapaths) {
-        free(cur_node->peer_dps);
+        free(cur_node->peer_ports);
         hmap_remove(&data->local_datapaths, &cur_node->hmap_node);
         free(cur_node);
     }
@@ -929,7 +929,7 @@ en_runtime_data_run(struct engine_node *node)
     } else {
         struct local_datapath *cur_node, *next_node;
         HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, local_datapaths) {
-            free(cur_node->peer_dps);
+            free(cur_node->peer_ports);
             hmap_remove(local_datapaths, &cur_node->hmap_node);
             free(cur_node);
         }
diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
index 6afd727..a4c1309 100644
--- a/ovn/controller/ovn-controller.h
+++ b/ovn/controller/ovn-controller.h
@@ -59,8 +59,9 @@ struct local_datapath {
     /* True if this datapath contains an l3gateway port located on this
      * hypervisor. */
     bool has_local_l3gateway;
-    const struct sbrec_datapath_binding **peer_dps;
-    size_t n_peer_dps;
+
+    const struct sbrec_port_binding **peer_ports;
+    size_t n_peer_ports;
 };
 
 struct local_datapath *get_local_datapath(const struct hmap *,
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index c8dc282..af587a5 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -20,6 +20,7 @@
 #include "ha-chassis.h"
 #include "lflow.h"
 #include "lport.h"
+#include "chassis.h"
 #include "lib/bundle.h"
 #include "openvswitch/poll-loop.h"
 #include "lib/uuid.h"
@@ -30,6 +31,7 @@
 #include "openvswitch/ofp-actions.h"
 #include "openvswitch/ofpbuf.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/ofp-parse.h"
 #include "ovn-controller.h"
 #include "ovn/lib/chassis-index.h"
 #include "ovn/lib/ovn-sb-idl.h"
@@ -236,6 +238,92 @@ get_zone_ids(const struct sbrec_port_binding *binding,
 }
 
 static void
+put_replace_router_port_mac_flows(const struct
+                                  sbrec_port_binding *localnet_port,
+                                  const struct sbrec_chassis *chassis,
+                                  const struct hmap *local_datapaths,
+                                  struct ofpbuf *ofpacts_p,
+                                  ofp_port_t ofport,
+                                  struct ovn_desired_flow_table *flow_table)
+{
+    struct local_datapath *ld = get_local_datapath(local_datapaths,
+                                                   localnet_port->datapath->
+                                                   tunnel_key);
+    ovs_assert(ld);
+
+    uint32_t dp_key = localnet_port->datapath->tunnel_key;
+    uint32_t port_key = localnet_port->tunnel_key;
+    int tag = localnet_port->tag ? *localnet_port->tag : 0;
+    const char *network = smap_get(&localnet_port->options, "network_name");
+    struct eth_addr chassis_mac;
+
+    if (!network) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        VLOG_WARN_RL(&rl, "Physical network not configured for datapath: %ld "
+                     "with localnet port",
+                     localnet_port->datapath->tunnel_key);
+        return;
+    }
+
+    /* Get chassis mac */
+    if (!chassis_get_mac(chassis, network, &chassis_mac)) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        /* Keeping the log level low for backward compatibility.
+         * Chassis mac is a new configuration.
+         */
+        VLOG_DBG_RL(&rl, "Could not get chassis mac for network: %s", network);
+        return;
+    }
+
+    for (int i = 0; i < ld->n_peer_ports; i++) {
+        const struct sbrec_port_binding *rport_binding = ld->peer_ports[i];
+        struct eth_addr router_port_mac;
+        char *err_str = NULL;
+        struct match match;
+        struct ofpact_mac *replace_mac;
+
+        /* Table 65, priority 150.
+         * =======================
+         *
+         * Implements output to localnet port.
+         * a. Flow replaces ingress router port mac with a chassis mac.
+         * b. Flow appends the vlan id localnet port is configured with.
+         */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+
+        ovs_assert(rport_binding->n_mac == 1);
+        if ((err_str = str_to_mac(rport_binding->mac[0], &router_port_mac))) {
+            /* Parsing of mac failed. */
+            VLOG_WARN("Parsing of router port mac failed for router port: %s, "
+                    "with error: %s", rport_binding->logical_port, err_str);
+            free(err_str);
+            return;
+        }
+
+        /* Replace Router mac flow */
+        match_set_metadata(&match, htonll(dp_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+        match_set_dl_src(&match, router_port_mac);
+
+        replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
+        replace_mac->mac = chassis_mac;
+
+        if (tag) {
+            struct ofpact_vlan_vid *vlan_vid;
+            vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
+            vlan_vid->vlan_vid = tag;
+            vlan_vid->push_vlan_if_needed = true;
+        }
+
+        ofpact_put_OUTPUT(ofpacts_p)->port = ofport;
+
+        ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 150, 0,
+                        &match, ofpacts_p, &localnet_port->header_.uuid);
+    }
+}
+
+static void
 put_local_common_flows(uint32_t dp_key, uint32_t port_key,
                        uint32_t parent_port_key,
                        const struct zone_ids *zone_ids,
@@ -707,6 +795,13 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name,
         }
         ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 100, 0,
                         &match, ofpacts_p, &binding->header_.uuid);
+
+        if (!strcmp(binding->type, "localnet")) {
+            put_replace_router_port_mac_flows(binding, chassis,
+                                              local_datapaths, ofpacts_p,
+                                              ofport, flow_table);
+        }
+
     } else if (!tun && !is_ha_remote) {
         /* Remote port connected by localnet port */
         /* Table 33, priority 100.
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index de0c06d..74d3692 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -86,6 +86,12 @@ enum ovn_datapath_type {
     DP_ROUTER                   /* OVN logical router. */
 };
 
+/* Network type of a datapath */
+enum ovn_datapath_nw_type {
+    DP_NETWORK_OVERLAY,
+    DP_NETWORK_BRIDGED
+};
+
 /* Returns an "enum ovn_stage" built from the arguments.
  *
  * (It's better to use ovn_stage_build() for type-safety reasons, but inline
@@ -445,6 +451,8 @@ struct ovn_datapath {
 
     bool has_unknown;
 
+    enum ovn_datapath_nw_type network_type;
+
     /* IPAM data. */
     struct ipam_info ipam_info;
 
@@ -491,6 +499,27 @@ cleanup_macam(struct hmap *macam_)
     }
 }
 
+static void
+ovn_datapath_update_nw_type(struct ovn_datapath *od)
+{
+    if (!od->nbs) {
+        return;
+    }
+
+    if (!od->nbs->network_type ||
+        !strlen(od->nbs->network_type) ||
+        !strcmp(od->nbs->network_type, "overlay")) {
+        /* No value in network_type is taken as OVERLAY. */
+        od->network_type = DP_NETWORK_OVERLAY;
+    } else if (!strcmp(od->nbs->network_type, "bridged")) {
+        od->network_type = DP_NETWORK_BRIDGED;
+    } else {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+        VLOG_WARN_RL(&rl, "bad network type %s, for %s",
+                     od->nbs->network_type, od->nbs->name);
+    }
+}
+
 static struct ovn_datapath *
 ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
                     const struct nbrec_logical_switch *nbs,
@@ -682,6 +711,13 @@ ovn_datapath_update_external_ids(struct ovn_datapath *od)
     if (name2 && name2[0]) {
         smap_add(&ids, "name2", name2);
     }
+
+    if (od->nbs) {
+        smap_add(&ids, "network-type",
+                 (od->nbs->network_type && strlen(od->nbs->network_type)) ?
+                 od->nbs->network_type : "overlay");
+    }
+
     sbrec_datapath_binding_set_external_ids(od->sb, &ids);
     smap_destroy(&ids);
 }
@@ -734,9 +770,11 @@ join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
             ovs_list_remove(&od->list);
             ovs_list_push_back(both, &od->list);
             ovn_datapath_update_external_ids(od);
+            ovn_datapath_update_nw_type(od);
         } else {
             od = ovn_datapath_create(datapaths, &nbs->header_.uuid,
                                      nbs, NULL, NULL);
+            ovn_datapath_update_nw_type(od);
             ovs_list_push_back(nb_only, &od->list);
         }
 
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 8c9e106..6275db1 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1407,6 +1407,30 @@
       egress pipeline of the destination localnet logical switch datapath
       and goes out of the integration bridge to the provider bridge (
       belonging to the destination logical switch) via the localnet port.
+      While sending the packet to the provider bridge, we also replace the
+      router port mac as source mac with a chassis unique mac.
+
+      This chassis unique mac is configured as global ovs config on each
+      chassis (eg. via "ovs-vsctl set open . external-ids:
+      ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i""). More
+      details on this config are present in ovn-controller(8).
+
+      If the above is not configured, then source mac would be the router
+      port mac. This could create a problem if we have more than one chassis.
+      Since the router port is distributed, the same mac,vlan tuple
+      will be seen by the physical network from other chassis
+      as well. This could cause some/all of these issues:
+

    +
  • + Continuous mac moves in top of the rack switch (TOR).
  • +
  • + TOR dropping the traffic, which is causing continuous mac moves.
  • +
  • + TOR blocking the ports from which mac moves are happening. +
  • +
  •
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index 2c87cbb..09df854 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@
 {
     "name": "OVN_Northbound",
-    "version": "5.16.0",
-    "cksum": "923459061 23095",
+    "version": "5.17.0",
+    "cksum": "1251608309 23487",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -29,6 +29,12 @@
         "Logical_Switch": {
             "columns": {
                 "name": {"type": "string"},
+                "network_type": {"type": {"key": {"type": "string",
+                                                  "enum": ["set", ["overlay",
+                                                                   "bridged"]
+                                                          ]},
+                                          "min": 0,
+                                          "max": 1}},
                 "ports": {"type": {"key": {"type": "uuid",
                                            "refTable": "Logical_Switch_Port",
                                            "refType": "strong"},
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index cbaa949..0fa406a 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -158,6 +158,25 @@

+
+
+        Whether the logical switch will fully virtualize the network (i.e.
+        overlay) or it simply connects to the physical network (i.e. bridged).
+        This field will take either of the following values: "overlay" or
+        "bridged".
+
+
+
+        An "overlay" type logical switch means that a 24 bit virtual network
+        identifier defines its broadcast domain and hence packets leaving
+        the chassis will be encapsulated. A "bridged" logical switch means that
+        it uses a 12 bit vlan id as broadcast domain and packets leaving the
+        chassis would not be encapsulated, but would have a vlan header
+        instead (logical switches with vlan zero are also to be assigned the
+        type "bridged"). For a "bridged" type logical switch, a localnet
+        port MUST be created.
+
+
       Load balance a virtual ip address to a set of logical port endpoint ip addresses.
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 1a2bc1d..74498c6 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -301,6 +301,14 @@
         See ovn-controller(8) for more information.
+
+        ovn-controller populates this key with the set of options
+        configured in the column of the
+        Open_vSwitch database's
+        table. See ovn-controller(8) for more information.
+
+
         The overall purpose of these columns is described under Common Columns at the beginning of this document.
@@ -2162,6 +2170,13 @@ tcp.flags = RST;
         the database.
+
+        For a logical datapath that represents a logical switch,
+        ovn-northd stores in this key the network type from
+        corresponding row in
+        the database.
+
+

    ovn-northd copies these from the name fields in the header_.uuid), ls->name); + ds_put_format(s, "switch "UUID_FMT" (%s) (type: %s)", + UUID_ARGS(&ls->header_.uuid), ls->name, + ls->network_type && strlen(ls->network_type) ? + ls->network_type : "overlay"); print_alias(&ls->external_ids, "neutron:network_name", s); ds_put_char(s, '\n'); @@ -1116,7 +1119,8 @@ nbctl_show(struct ctl_context *ctx) static void nbctl_ls_add(struct ctl_context *ctx) { - const char *ls_name = ctx->argc == 2 ? ctx->argv[1] : NULL; + const char *ls_name = ctx->argc >= 2 ? ctx->argv[1] : NULL; + const char *nw_type = ctx->argc == 3 ? ctx->argv[2] : NULL; bool may_exist = shash_find(&ctx->options, "--may-exist") != NULL; bool add_duplicate = shash_find(&ctx->options, "--add-duplicate") != NULL; @@ -1153,6 +1157,33 @@ nbctl_ls_add(struct ctl_context *ctx) if (ls_name) { nbrec_logical_switch_set_name(ls, ls_name); } + + if (nw_type) { + nbrec_logical_switch_set_network_type(ls, nw_type); + } +} + +static void +nbctl_ls_set_network_type(struct ctl_context *ctx) +{ + const char *ls_name = ctx->argv[1]; + const char *ls_type = ctx->argv[2]; + const struct nbrec_logical_switch *ls = NULL; + + char *error = ls_by_name_or_uuid(ctx, ls_name, true, &ls); + + if (!ls || error) { + ctx->error = error; + return; + } + + if (strcmp(ls_type, "bridged") && strcmp(ls_type, "overlay")) { + ctl_error(ctx, "Invalid type: \"%s\", supported types are \"bridged\" " + "and \"overlay\"", ls_type); + return; + } + + nbrec_logical_switch_set_network_type(ls, ls_type); } static void @@ -1182,8 +1213,10 @@ nbctl_ls_list(struct ctl_context *ctx) smap_init(&switches); NBREC_LOGICAL_SWITCH_FOR_EACH(ls, ctx->idl) { - smap_add_format(&switches, ls->name, UUID_FMT " (%s)", - UUID_ARGS(&ls->header_.uuid), ls->name); + smap_add_format(&switches, ls->name, UUID_FMT " (%s) (type: %s)", + UUID_ARGS(&ls->header_.uuid), ls->name, + ls->network_type && strlen(ls->network_type) ? 
+ ls->network_type : "overlay"); } const struct smap_node **nodes = smap_sort(&switches); for (size_t i = 0; i < smap_count(&switches); i++) { @@ -5504,10 +5537,12 @@ static const struct ctl_command_syntax nbctl_commands[] = { { "show", 0, 1, "[SWITCH]", NULL, nbctl_show, NULL, "", RO }, /* logical switch commands. */ - { "ls-add", 0, 1, "[SWITCH]", NULL, nbctl_ls_add, NULL, + { "ls-add", 0, 2, "[SWITCH] [TYPE]", NULL, nbctl_ls_add, NULL, "--may-exist,--add-duplicate", RW }, { "ls-del", 1, 1, "SWITCH", NULL, nbctl_ls_del, NULL, "--if-exists", RW }, { "ls-list", 0, 0, "", NULL, nbctl_ls_list, NULL, "", RO }, + { "ls-set-network-type", 2, 2, "SWITCH TYPE", NULL, + nbctl_ls_set_network_type, NULL, "", RW }, /* acl commands. */ { "acl-add", 5, 6, "{SWITCH | PORTGROUP} DIRECTION PRIORITY MATCH ACTION", diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at index 18c5c1d..457d6fd 100644 --- a/tests/ovn-nbctl.at +++ b/tests/ovn-nbctl.at @@ -56,31 +56,39 @@ m4_define([OVN_NBCTL_TEST], OVN_NBCTL_TEST([ovn_nbctl_basic_switch], [basic switch commands], [ AT_CHECK([ovn-nbctl ls-add ls0]) AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl -<0> (ls0) +<0> (ls0) (type: overlay) ]) AT_CHECK([ovn-nbctl ls-add ls1]) AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl -<0> (ls0) -<1> (ls1) +<0> (ls0) (type: overlay) +<1> (ls1) (type: overlay) +]) + +AT_CHECK([ovn-nbctl ls-add ls2 bridged]) +AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl +<0> (ls0) (type: overlay) +<1> (ls1) (type: overlay) +<2> (ls2) (type: bridged) ]) +AT_CHECK([ovn-nbctl ls-del ls2]) AT_CHECK([ovn-nbctl ls-del ls0]) AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl -<0> (ls1) +<0> (ls1) (type: overlay) ]) AT_CHECK([ovn-nbctl show ls0]) AT_CHECK([ovn-nbctl ls-add ls0]) AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0], - [switch <0> (ls0) + [switch <0> (ls0) (type: overlay) ]) AT_CHECK([ovn-nbctl ls-add ls0], [1], [], [ovn-nbctl: ls0: a switch with this name already exists ]) AT_CHECK([ovn-nbctl --may-exist ls-add 
ls0]) AT_CHECK([ovn-nbctl show ls0 | uuidfilt], [0], - [switch <0> (ls0) + [switch <0> (ls0) (type: overlay) ]) AT_CHECK([ovn-nbctl --add-duplicate ls-add ls0]) AT_CHECK([ovn-nbctl --may-exist --add-duplicate ls-add ls0], [1], [], @@ -102,7 +110,23 @@ AT_CHECK([ovn-nbctl --add-duplicate ls-add], [1], [], ]) AT_CHECK([ovn-nbctl --may-exist ls-add], [1], [], [ovn-nbctl: --may-exist requires specifying a name -])]) +]) + +AT_CHECK([ovn-nbctl ls-set-network-type ls1 bridged]) +AT_CHECK([ovn-nbctl show ls1 | uuidfilt], [0], + [switch <0> (ls1) (type: bridged) +]) + +AT_CHECK([ovn-nbctl ls-set-network-type ls1 overlay]) +AT_CHECK([ovn-nbctl show ls1 | uuidfilt], [0], + [switch <0> (ls1) (type: overlay) +]) + +AT_CHECK([ovn-nbctl ls-set-network-type ls1 temp], [1], [], + [ovn-nbctl: Invalid type: "temp", supported types are "bridged" and "overlay" +]) + +]) dnl --------------------------------------------------------------------- @@ -1490,7 +1514,7 @@ dnl --------------------------------------------------------------------- OVN_NBCTL_TEST([ovn_nbctl_dry_run_mode], [dry run mode], [ dnl Check that dry run has no permanent effect. AT_CHECK([ovn-nbctl --dry-run ls-add ls0 -- ls-list | uuidfilt], [0], [dnl -<0> (ls0) +<0> (ls0) (type: overlay) ]) AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl ]) @@ -1498,7 +1522,7 @@ AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl dnl Check that dry-run mode is not sticky. AT_CHECK([ovn-nbctl ls-add ls0]) AT_CHECK([ovn-nbctl ls-list | uuidfilt], [0], [dnl -<0> (ls0) +<0> (ls0) (type: overlay) ])]) dnl --------------------------------------------------------------------- @@ -1508,13 +1532,13 @@ AT_CHECK([ovn-nbctl ls-add ls0 -- ls-add ls1]) dnl Expect one line for one command. AT_CHECK([ovn-nbctl --oneline ls-list | uuidfilt], [0], [dnl -<0> (ls0)\n<1> (ls1) +<0> (ls0) (type: overlay)\n<1> (ls1) (type: overlay) ]) dnl Expect lines for two commands. 
 AT_CHECK([ovn-nbctl --oneline ls-list -- ls-list | uuidfilt], [0], [dnl
-<0> (ls0)\n<1> (ls1)
-<0> (ls0)\n<1> (ls1)
+<0> (ls0) (type: overlay)\n<1> (ls1) (type: overlay)
+<0> (ls0) (type: overlay)\n<1> (ls1) (type: overlay)
 ])])
 dnl ---------------------------------------------------------------------
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 62e58fd..5564936 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -898,3 +898,25 @@ as northd
 OVS_APP_EXIT_AND_WAIT([ovn-northd])
 AT_CLEANUP
+
+AT_SETUP([ovn -- check logical switch type propagation from NBDB to SBDB])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+ovn-nbctl ls-add ls0
+
+uuid=`ovn-sbctl --bare --columns=_uuid list Datapath`
+echo "LS UUID is: " $uuid
+
+type=`ovn-sbctl get Datapath_Binding ${uuid} external_ids:network-type`
+echo "LS TYPE is: " $type
+AT_CHECK([ovn-sbctl get Datapath_Binding ${uuid} external_ids:network-type], [0], [overlay
+])
+
+ovn-nbctl ls-set-network-type ls0 bridged
+type=`ovn-sbctl get Datapath_Binding ${uuid} external_ids:network-type`
+echo "LS TYPE is: " $type
+AT_CHECK([ovn-sbctl get Datapath_Binding ${uuid} external_ids:network-type], [0], [bridged
+])
+
+AT_CLEANUP
diff --git a/tests/ovn.at b/tests/ovn.at
index 1231f41..e5108a7 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -14017,3 +14017,200 @@ ovn-hv4-0
 OVN_CLEANUP([hv1], [hv2], [hv3])
 AT_CLEANUP
+
+AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR chassis mac])
+ovn_start
+
+
+# In this test case we create 2 switches, both connected to the same
+# physical network (through br-phys on each HV). Each switch has
+# 1 VIF. Each HV has 1 VIF port. The first digit
+# of VIF port name indicates the hypervisor it is bound to, e.g.
+# lp23 means VIF 3 on hv2.
+#
+# Each switch's VLAN tag and their logical switch ports are:
+#   - ls1:
+#       - tagged with VLAN 101
+#       - ports: lp11
+#   - ls2:
+#       - tagged with VLAN 201
+#       - ports: lp22
+#
+# Note: a localnet port is created for each switch to connect to
+# physical network.
+
+for i in 1 2; do
+    ls_name=ls$i
+    ovn-nbctl ls-add $ls_name bridged
+    ln_port_name=ln$i
+    if test $i -eq 1; then
+        ovn-nbctl lsp-add $ls_name $ln_port_name "" 101
+    elif test $i -eq 2; then
+        ovn-nbctl lsp-add $ls_name $ln_port_name "" 201
+    fi
+    ovn-nbctl lsp-set-addresses $ln_port_name unknown
+    ovn-nbctl lsp-set-type $ln_port_name localnet
+    ovn-nbctl lsp-set-options $ln_port_name network_name=phys
+done
+
+# lsp_to_ls LSP
+#
+# Prints the name of the logical switch that contains LSP.
+lsp_to_ls () {
+    case $1 in dnl (
+        lp?[[11]]) echo ls1 ;; dnl (
+        lp?[[12]]) echo ls2 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+vif_to_ls () {
+    case $1 in dnl (
+        vif?[[11]]) echo ls1 ;; dnl (
+        vif?[[12]]) echo ls2 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+hv_to_num () {
+    case $1 in dnl (
+        hv1) echo 1 ;; dnl (
+        hv2) echo 2 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+vif_to_num () {
+    case $1 in dnl (
+        vif22) echo 22 ;; dnl (
+        vif21) echo 21 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+vif_to_hv () {
+    case $1 in dnl (
+        vif[[1]]?) echo hv1 ;; dnl (
+        vif[[2]]?) echo hv2 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+vif_to_lrp () {
+    echo router-to-`vif_to_ls $1`
+}
+
+hv_to_chassis_mac () {
+    case $1 in dnl (
+        hv[[1]]) echo aa:bb:cc:dd:ee:11 ;; dnl (
+        hv[[2]]) echo aa:bb:cc:dd:ee:22 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+net_add n1
+for i in 1 2; do
+    sim_add hv$i
+    as hv$i
+    ovs-vsctl add-br br-phys
+    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+    ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i"
+    ovn_attach n1 br-phys 192.168.0.$i
+
+    ovs-vsctl add-port br-int vif$i$i -- \
+        set Interface vif$i$i external-ids:iface-id=lp$i$i \
+                              options:tx_pcap=hv$i/vif$i$i-tx.pcap \
+                              options:rxq_pcap=hv$i/vif$i$i-rx.pcap \
+                              ofport-request=$i$i
+
+    lsp_name=lp$i$i
+    ls_name=$(lsp_to_ls $lsp_name)
+
+    ovn-nbctl lsp-add $ls_name $lsp_name
+    ovn-nbctl lsp-set-addresses $lsp_name "f0:00:00:00:00:$i$i 192.168.$i.$i"
+    ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$i
+
+    OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
+
+done
+
+ovn-nbctl lr-add router
+ovn-nbctl lrp-add router router-to-ls1 00:00:01:01:02:03 192.168.1.3/24
+ovn-nbctl lrp-add router router-to-ls2 00:00:01:01:02:05 192.168.2.3/24
+
+ovn-nbctl lsp-add ls1 ls1-to-router -- set Logical_Switch_Port ls1-to-router type=router options:router-port=router-to-ls1 -- lsp-set-addresses ls1-to-router router
+ovn-nbctl lsp-add ls2 ls2-to-router -- set Logical_Switch_Port ls2-to-router type=router options:router-port=router-to-ls2 -- lsp-set-addresses ls2-to-router router
+
+ovn-nbctl --wait=sb sync
+#ovn-sbctl dump-flows
+
+ovn-nbctl show
+ovn-sbctl show
+
+OVN_POPULATE_ARP
+
+test_ip() {
+    # This packet has bad checksums but logical L3 routing doesn't check.
+    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
+    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+    shift; shift; shift; shift; shift
+    hv=`vif_to_hv $inport`
+    hv_num=`hv_to_num $hv`
+    chassis_mac=`hv_to_chassis_mac $hv`
+    as $hv ovs-appctl netdev-dummy/receive $inport $packet
+    #as $hv ovs-appctl ofproto/trace br-int in_port=$inport $packet
+    in_ls=`vif_to_ls $inport`
+    in_lrp=`vif_to_lrp $inport`
+    for outport; do
+        out_ls=`vif_to_ls $outport`
+        if test $in_ls = $out_ls; then
+            # Ports on the same logical switch receive exactly the same packet.
+            echo $packet
+        else
+            # Routing decrements TTL and updates source and dest MAC
+            # (and checksum).
+            outport_num=`vif_to_num $outport`
+            out_lrp=`vif_to_lrp $outport`
+            echo f000000000${outport_num}aabbccddee${hv_num}${hv_num}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
+        fi >> $outport.expected
+    done
+}
+
+# Dump a bunch of info helpful for debugging if there's a failure.
+
+echo "------ OVN dump ------"
+ovn-nbctl show
+ovn-sbctl show
+
+echo "------ hv1 dump ------"
+as hv1 ovs-vsctl show
+as hv1 ovs-vsctl list Open_Vswitch
+
+echo "------ hv2 dump ------"
+as hv2 ovs-vsctl show
+as hv2 ovs-vsctl list Open_Vswitch
+
+echo "Send traffic"
+sip=`ip_to_hex 192 168 1 1`
+dip=`ip_to_hex 192 168 2 2`
+test_ip vif11 f00000000011 000001010203 $sip $dip vif22
+
+sleep 1
+
+echo "----------- Post Traffic hv1 dump -----------"
+as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
+as hv1 ovs-appctl fdb/show br-phys
+
+echo "----------- Post Traffic hv2 dump -----------"
+as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
+as hv2 ovs-appctl fdb/show br-phys
+
+OVN_CHECK_PACKETS([hv2/vif22-tx.pcap], [vif22.expected])
+
+OVN_CLEANUP([hv1],[hv2])
+
+AT_CLEANUP

From patchwork Thu May 30 00:20:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ankur Sharma X-Patchwork-Id: 1107466 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=nutanix.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nutanix.com header.i=@nutanix.com header.b="U5BZ9cpM"; dkim-atps=neutral Received: from
mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 45DpM61669z9s3l for ; Thu, 30 May 2019 10:27:46 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 31A0A2D29; Thu, 30 May 2019 00:26:38 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 34FA32D03 for ; Thu, 30 May 2019 00:20:31 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mx0a-002c1b01.pphosted.com (mx0a-002c1b01.pphosted.com [148.163.151.68]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1F5F5826 for ; Thu, 30 May 2019 00:20:28 +0000 (UTC) Received: from pps.filterd (m0127838.ppops.net [127.0.0.1]) by mx0a-002c1b01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x4U0F7jn022933 for ; Wed, 29 May 2019 17:20:27 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nutanix.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-transfer-encoding : mime-version; s=proofpoint20171006; bh=P0gByHV4PK1HKJX3oss4kbeK1fttKnqCdJJhi4NjluM=; b=U5BZ9cpMcAGX18YpuJvU/ZvPw3smzZdVIOc1SCQglgGCyr/4Lel7DSA2WCgBlVB49xJo 0dFZtN6QKekY2Mx72niv1Suv0zqRMQVgmkC/4xNty1+8PX3uai6b5InOXsEiOwWQGBAT PpcrKdZhRG0FnQE0Ysh5P4kdsQJlsWyNJuQaHnOURIA2Fl5klps5R+C/3NCa3VPzXedN 1UGjpvizh34Lof0NB9mWQBEiSDIeM7eDuwxt5v2f89dBfGnDT4yEYKvDacFmYhnIcWwx or3kRuhFK5dfR2Asxl8RjRLlkp3P2RWmyswOb4y2u8Zp68IS3A4d3R2ftgWGpHDQXAAm gQ== Received: from nam05-co1-obe.outbound.protection.outlook.com (mail-co1nam05lp2059.outbound.protection.outlook.com [104.47.48.59]) by mx0a-002c1b01.pphosted.com with ESMTP id 2ssd3tjftf-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Wed, 29 May 2019 17:20:27 -0700 Received: from MW2PR02MB3899.namprd02.prod.outlook.com (52.132.178.28) by MW2PR02MB3850.namprd02.prod.outlook.com (52.132.178.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1922.15; Thu, 30 May 2019 00:20:25 +0000 Received: from MW2PR02MB3899.namprd02.prod.outlook.com ([fe80::95c1:331:992:9d85]) by MW2PR02MB3899.namprd02.prod.outlook.com ([fe80::95c1:331:992:9d85%7]) with mapi id 15.20.1922.021; Thu, 30 May 2019 00:20:25 +0000 From: Ankur Sharma To: "ovs-dev@openvswitch.org" Thread-Topic: [PATCH v9 2/2] OVN: Enable N-S Traffic, Vlan backed DVR Thread-Index: AQHVFn16Khzi2npTe061HkUKIG++wg== Date: Thu, 30 May 2019 00:20:25 +0000 Message-ID: <1559175728-127062-3-git-send-email-ankur.sharma@nutanix.com> References: <1559175728-127062-1-git-send-email-ankur.sharma@nutanix.com> In-Reply-To: <1559175728-127062-1-git-send-email-ankur.sharma@nutanix.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: BYAPR08CA0042.namprd08.prod.outlook.com (2603:10b6:a03:117::19) To MW2PR02MB3899.namprd02.prod.outlook.com (2603:10b6:907:4::28) x-ms-exchange-messagesentrepresentingtype: 1 x-mailer: git-send-email 1.8.3.1 x-originating-ip: [192.146.154.1] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 5e60ce0f-d259-4743-1ffc-08d6e4949cc8 x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600148)(711020)(4605104)(1401327)(2017052603328)(7193020); SRVR:MW2PR02MB3850; x-ms-traffictypediagnostic: MW2PR02MB3850: x-ms-exchange-purlcount: 2 x-proofpoint-crosstenant: true x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:8882; x-forefront-prvs: 00531FAC2C x-forefront-antispam-report: SFV:NSPM; 
SFS:(10019020)(396003)(376002)(39860400002)(366004)(346002)(136003)(189003)(199004)(14454004)(14444005)(26005)(386003)(25786009)(2616005)(446003)(71200400001)(66066001)(53936002)(11346002)(7736002)(410100003)(6512007)(6916009)(81166006)(102836004)(6306002)(36756003)(186003)(53946003)(8936002)(86362001)(305945005)(8676002)(486006)(44832011)(478600001)(6506007)(5024004)(476003)(71190400001)(5660300002)(2906002)(30864003)(2351001)(316002)(81156014)(966005)(6436002)(66946007)(50226002)(6486002)(66574012)(5640700003)(4326008)(107886003)(66556008)(64756008)(76176011)(66446008)(4720700003)(99286004)(68736007)(2501003)(52116002)(6116002)(256004)(3846002)(66476007)(73956011)(64030200001)(569006); DIR:OUT; SFP:1102; SCL:1; SRVR:MW2PR02MB3850; H:MW2PR02MB3899.namprd02.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; received-spf: None (protection.outlook.com: nutanix.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: 3UwgKctC+GYA5ocSgweefLUaUmVHfBlYMRUv67ywx2wore81+YJ7k+ZFxLi+550JuMP5XWBiX3lB7lupPWTeXLJJCiCrBywl6WdZIbr1jbdq2WGUTs3SRJKOhsgjbLnRGlI9ZVPntjyTydUKt6fCqpL+0ETSiGZVFdwtV0M99f5JRX+Lmi974gHBL6E17sJcDnG8Be+sHYaPCG5mYs45kNf+SCM8NS1vJirL+qsoeiC2urDkgMW+FVYC2HzRaeoRwzfz4ek/43FxiC2NJty+WDhWDj6cQIOMmIehMBoqP8YIpWx4IX1BoemRm7InaiPOrDKwyyj2SJ6a15Rbcklb2aK3zaxjs0OYCec99JMyzAGJvNRx81+0GJrps0WyDL3OhrDgCbC9Tswz4bz1l1hFdsC5hwRYNSaeHasoBxhQ8dA= MIME-Version: 1.0 X-OriginatorOrg: nutanix.com X-MS-Exchange-CrossTenant-Network-Message-Id: 5e60ce0f-d259-4743-1ffc-08d6e4949cc8 X-MS-Exchange-CrossTenant-originalarrivaltime: 30 May 2019 00:20:25.8926 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: bb047546-786f-4de1-bd75-24e5b6f79043 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: ankur.sharma@nutanix.com X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW2PR02MB3850 X-Proofpoint-Virus-Version: vendor=fsecure 
engine=2.50.10434:, , definitions=2019-05-30_01:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCH v9 2/2] OVN: Enable N-S Traffic, Vlan backed DVR X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org

Background:
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353066.html
[2] https://docs.google.com/document/d/1uoQH478wM1OZ16HrxzbOUvk5LvFnfNEWbkPT6Zmm9OU/edit?usp=sharing

This Series:
Layer 2, Layer 3 E-W and Layer 3 N-S (NO NAT) changes for vlan
backed distributed logical router.

This patch:
For North-South traffic, we need a chassis that will respond to ARP
requests for the router port coming from outside. For this purpose, we
rely on the gateway-chassis construct in OVN: on a logical router
port, we associate one or more chassis as gateway chassis. One of these
chassis is active at any given time and becomes the entry point for
traffic that comes from the outside network and is bound for endpoints
behind the logical router (North to South).

This patch makes some enhancements to the gateway chassis implementation
to handle the above use case.

A. Do not replace router port mac with chassis mac on gateway
   chassis. This is done because:
   i.   The chassisredirect port is NOT a distributed port, hence
        we need not replace its mac address
        (which is the same as the router port mac).
   ii.  The ARP cache will be consistent everywhere, i.e. just as
        endpoints on OVN chassis see the configured router port
        mac as the resolved mac for the router port ip, outside
        endpoints will see it as well.
   iii. For implementing Network Address Translation.
        It is not part of this series, but a follow-up series
        will add this feature, and the approach relies on sending
        packets to the redirect chassis using the chassisredirect
        router port mac as the destination mac.

B. Advertise router port GARP on gateway chassis.
   This is needed especially if a failover happens and the
   chassisredirect port moves to a new gateway chassis.
   Otherwise, packets would be dropped until the outside
   router ARPs for the router port ip again.

   The intention of this GARP is to update the top of rack (TOR)
   switch so that it directs the router port mac to the new
   hypervisor. We could have done the same using RARP, but since
   ovn-controller already implements GARP, it did not seem
   worthwhile to add a RARP implementation just for this.

C. South to North traffic need not pass through the gateway
   chassis if no address translation is needed.
   For overlay networks, NATing is a must to talk to outside
   networks. However, for vlan backed networks, NATing is not a
   must, and hence in the absence of NATing configuration we need
   not redirect the packet to the gateway chassis.
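
As a rough illustration of point B, the sketch below models how a GARP entry for a chassisredirect port could be rescheduled once a burst of announcements ends, while entries for regular VIFs stop after one burst. The struct is a hypothetical, trimmed-down stand-in for the scheduling fields of struct garp_data, the function name is made up for this example, and the 16-second burst limit is an assumption (the patch context does not show the limit used by send_garp()).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the scheduling fields of the
 * patch's struct garp_data; not the real ovn-controller definition. */
struct garp_sched {
    long long int announce_time;   /* Next announcement, in ms. */
    int backoff;                   /* Delay before the next GARP, in seconds. */
    bool is_repeat;                /* Start a new burst after the last GARP? */
    long long int repeat_interval; /* Gap between bursts, in ms. */
};

/* Assumption: a burst ends once the backoff has reached 16 seconds. */
enum { GARP_BURST_MAX_BACKOFF_S = 16 };

/* Computes when the next GARP should go out, given that one was just sent
 * at 'now_ms'.  Updates and returns g->announce_time. */
static long long int
garp_next_announce(struct garp_sched *g, long long int now_ms)
{
    assert(g->backoff >= 1);
    if (g->backoff < GARP_BURST_MAX_BACKOFF_S) {
        /* Inside a burst: exponential backoff. */
        g->backoff *= 2;
        g->announce_time = now_ms + g->backoff * 1000LL;
    } else if (g->is_repeat) {
        /* Burst finished on a repeating entry: rearm after the interval. */
        g->backoff = 1;
        g->announce_time = now_ms + g->repeat_interval;
    } else {
        /* One-shot entry: stop announcing. */
        g->announce_time = LLONG_MAX;
    }
    return g->announce_time;
}
```

With is_repeat set and repeat_interval equal to GARP_DEF_REPEAT_INTERVAL_MS, an entry would keep refreshing the TOR's mac table after a failover; with is_repeat clear, the entry goes quiet after its first burst.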
Signed-off-by: Ankur Sharma
---
 ovn/controller/physical.c  |  24 +-
 ovn/controller/pinctrl.c   | 205 +++++++++++--
 ovn/controller/pinctrl.h   |   6 +
 ovn/lib/ovn-util.c         |  31 ++
 ovn/lib/ovn-util.h         |   6 +
 ovn/northd/ovn-northd.c    |  43 ++-
 ovn/ovn-architecture.7.xml |  87 +++++-
 tests/ovn.at               | 732 ++++++++++++++++++++++++++++++++++++++++++++-
 8 files changed, 1090 insertions(+), 44 deletions(-)

diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index af587a5..1ab5968 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -21,6 +21,7 @@
 #include "lflow.h"
 #include "lport.h"
 #include "chassis.h"
+#include "pinctrl.h"
 #include "lib/bundle.h"
 #include "openvswitch/poll-loop.h"
 #include "lib/uuid.h"
@@ -238,9 +239,12 @@ get_zone_ids(const struct sbrec_port_binding *binding,
 }
 static void
-put_replace_router_port_mac_flows(const struct
+put_replace_router_port_mac_flows(struct ovsdb_idl_index
+                                  *sbrec_port_binding_by_name,
+                                  const struct
                                   sbrec_port_binding *localnet_port,
                                   const struct sbrec_chassis *chassis,
+                                  const struct sset *active_tunnels,
                                   const struct hmap *local_datapaths,
                                   struct ofpbuf *ofpacts_p,
                                   ofp_port_t ofport,
@@ -281,8 +285,21 @@ put_replace_router_port_mac_flows(const struct
         char *err_str = NULL;
         struct match match;
         struct ofpact_mac *replace_mac;
+        char *cr_peer_name = xasprintf("cr-%s", rport_binding->logical_port);
-        /* Table 65, priority 150.
+
+        if (pinctrl_is_chassis_resident(sbrec_port_binding_by_name,
+                                        chassis, active_tunnels,
+                                        cr_peer_name)) {
+            /* If a router port's chassisredirect port is
+             * resident on this chassis, then we need not do mac replace. */
+            free(cr_peer_name);
+            continue;
+        }
+
+        free(cr_peer_name);
+
+        /* Table 65, priority 150.
          * =======================
          *
          * Implements output to localnet port.
@@ -797,7 +814,8 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name,
                         &match, ofpacts_p, &binding->header_.uuid);
         if (!strcmp(binding->type, "localnet")) {
-            put_replace_router_port_mac_flows(binding, chassis,
+            put_replace_router_port_mac_flows(sbrec_port_binding_by_name,
+                                              binding, chassis, active_tunnels,
                                               local_datapaths, ofpacts_p,
                                               ofport, flow_table);
         }
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index b7bb4c9..a145867 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -226,6 +226,8 @@ static bool may_inject_pkts(void);
 COVERAGE_DEFINE(pinctrl_drop_put_mac_binding);
 COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map);
+#define GARP_DEF_REPEAT_INTERVAL_MS (3 * 60 * 1000) /* 3 minutes */
+
 void
 pinctrl_init(void)
 {
@@ -242,6 +244,25 @@ pinctrl_init(void)
                                                 &pinctrl);
 }
+bool
+pinctrl_is_chassis_resident(struct ovsdb_idl_index *sbrec_port_binding_by_name,
+                            const struct sbrec_chassis *chassis,
+                            const struct sset *active_tunnels,
+                            const char *port_name)
+{
+    const struct sbrec_port_binding *pb
+        = lport_lookup_by_name(sbrec_port_binding_by_name, port_name);
+    if (!pb || !pb->chassis) {
+        return false;
+    }
+    if (strcmp(pb->type, "chassisredirect")) {
+        return pb->chassis == chassis;
+    } else {
+        return ha_chassis_group_is_active(pb->ha_chassis_group,
+                                          active_tunnels, chassis);
+    }
+}
+
 static ovs_be32
 queue_msg(struct rconn *swconn, struct ofpbuf *msg)
 {
@@ -2548,6 +2569,8 @@ struct garp_data {
     int backoff;                 /* Backoff for the next announcement. */
     uint32_t dp_key;             /* Datapath used to output this GARP. */
     uint32_t port_key;           /* Port to inject the GARP into. */
+    bool is_repeat;              /* Send GARPs continuously */
+    long long int repeat_interval; /* Interval between GARP bursts in ms */
 };
 /* Contains GARPs to be sent. Protected by pinctrl_mutex*/
@@ -2568,7 +2591,8 @@ destroy_send_garps(void)
 /* Runs with in the main ovn-controller thread context. */
 static void
 add_garp(const char *name, const struct eth_addr ea, ovs_be32 ip,
-         uint32_t dp_key, uint32_t port_key)
+         uint32_t dp_key, uint32_t port_key, bool is_repeat,
+         long long int repeat_interval)
 {
     struct garp_data *garp = xmalloc(sizeof *garp);
     garp->ea = ea;
@@ -2577,6 +2601,8 @@ add_garp(const char *name, const struct eth_addr ea, ovs_be32 ip,
     garp->backoff = 1;
     garp->dp_key = dp_key;
     garp->port_key = port_key;
+    garp->is_repeat = is_repeat;
+    garp->repeat_interval = repeat_interval;
     shash_add(&send_garp_data, name, garp);
     /* Notify pinctrl_handler so that it can wakeup and process
@@ -2586,7 +2612,8 @@ add_garp(const char *name, const struct eth_addr ea, ovs_be32 ip,
 /* Add or update a vif for which GARPs need to be announced. */
 static void
-send_garp_update(const struct sbrec_port_binding *binding_rec,
+send_garp_update(struct ovsdb_idl_index *sbrec_port_binding_by_name,
+                 const struct sbrec_port_binding *binding_rec,
                  struct shash *nat_addresses)
 {
     volatile struct garp_data *garp = NULL;
@@ -2611,7 +2638,7 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
                     add_garp(name, laddrs->ea,
                              laddrs->ipv4_addrs[i].addr,
                              binding_rec->datapath->tunnel_key,
-                             binding_rec->tunnel_key);
+                             binding_rec->tunnel_key, false, 0);
                 }
                 free(name);
             }
@@ -2621,6 +2648,64 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
         return;
     }
+    /* Update GARPs for local chassisredirect port, if the peer
+     * layer 2 switch is of type vlan.
+     */
+    if (!strcmp(binding_rec->type, "chassisredirect")) {
+        struct eth_addr mac;
+        ovs_be32 ip, mask;
+        uint32_t dp_key = 0;
+        uint32_t port_key = 0;
+        const struct sbrec_port_binding *peer_port = NULL;
+        const struct sbrec_port_binding *distributed_port = NULL;
+
+        if (!ovn_sbrec_get_port_binding_ip_mac(binding_rec, &mac,
+                                               &ip, &mask)) {
+            /* Router Port binding without ip and mac configured. */
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl, "cannot send garp, router port binding: %s, "
+                         "does not have proper ip,mac values: %s",
+                         binding_rec->logical_port, *binding_rec->mac);
+            return;
+        }
+
+        const char *lrp_name = smap_get(&binding_rec->options,
+                                        "distributed-port");
+        ovs_assert(lrp_name);
+
+        distributed_port = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                                lrp_name);
+        ovs_assert(distributed_port);
+
+        const char *peer_name = smap_get(&distributed_port->options, "peer");
+        ovs_assert(peer_name);
+
+        peer_port = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                         peer_name);
+        ovs_assert(peer_port);
+
+        const char *network_type = smap_get(&peer_port->datapath->external_ids,
+                                            "network-type");
+
+        /* Advertise GARP only if logical switch is of type bridged. */
+        if (!network_type || strcmp(network_type, "bridged")) {
+            return;
+        }
+
+        dp_key = peer_port->datapath->tunnel_key;
+        port_key = peer_port->tunnel_key;
+
+        garp = shash_find_data(&send_garp_data, binding_rec->logical_port);
+        if (garp) {
+            garp->dp_key = dp_key;
+            garp->port_key = port_key;
+        } else {
+            add_garp(binding_rec->logical_port, mac, ip,
+                     dp_key, port_key, true, GARP_DEF_REPEAT_INTERVAL_MS);
+        }
+        return;
+    }
+
     /* Update GARP for vif if it exists. */
     garp = shash_find_data(&send_garp_data, binding_rec->logical_port);
     if (garp) {
@@ -2640,7 +2725,8 @@ send_garp_update(const struct sbrec_port_binding *binding_rec,
             add_garp(binding_rec->logical_port,
                      laddrs.ea, laddrs.ipv4_addrs[0].addr,
-                     binding_rec->datapath->tunnel_key, binding_rec->tunnel_key);
+                     binding_rec->datapath->tunnel_key, binding_rec->tunnel_key,
+                     false, 0);
             destroy_lport_addresses(&laddrs);
             break;
@@ -2702,7 +2788,12 @@ send_garp(struct rconn *swconn, struct garp_data *garp,
         garp->backoff *= 2;
         garp->announce_time = current_time + garp->backoff * 1000;
     } else {
-        garp->announce_time = LLONG_MAX;
+        if (garp->is_repeat) {
+            garp->backoff = 1;
+            garp->announce_time = current_time + garp->repeat_interval;
+        } else {
+            garp->announce_time = LLONG_MAX;
+        }
     }
     return garp->announce_time;
 }
@@ -2786,25 +2877,6 @@ get_localnet_vifs_l3gwports(
     sbrec_port_binding_index_destroy_row(target);
 }
-static bool
-pinctrl_is_chassis_resident(struct ovsdb_idl_index *sbrec_port_binding_by_name,
-                            const struct sbrec_chassis *chassis,
-                            const struct sset *active_tunnels,
-                            const char *port_name)
-{
-    const struct sbrec_port_binding *pb
-        = lport_lookup_by_name(sbrec_port_binding_by_name, port_name);
-    if (!pb || !pb->chassis) {
-        return false;
-    }
-    if (strcmp(pb->type, "chassisredirect")) {
-        return pb->chassis == chassis;
-    } else {
-        return ha_chassis_group_is_active(pb->ha_chassis_group,
-                                          active_tunnels, chassis);
-    }
-}
-
 /* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
  * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
  * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid IPv4
@@ -2946,6 +3018,67 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index *sbrec_port_binding_by_name,
 }
 static void
+get_local_cr_ports(struct ovsdb_idl_index *sbrec_port_binding_by_name,
+                   struct sset *local_cr_ports,
+                   struct sset *local_l3gw_ports,
+                   const struct sbrec_chassis *chassis,
+                   const struct sset *active_tunnels)
+{
+    const char *gw_port;
+    SSET_FOR_EACH (gw_port, local_l3gw_ports) {
+        const struct sbrec_port_binding *binding_rec;
+
+        binding_rec = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                           gw_port);
+        if (!binding_rec) {
+            continue;
+        }
+
+        /* For a patch port, send GARPs for the peer's ip and mac. */
+        if (!strcmp(binding_rec->type, "patch")) {
+            const struct sbrec_port_binding *cr_port = NULL;
+
+            bool is_cr_resident;
+            struct eth_addr mac;
+            ovs_be32 ip, mask;
+
+            const char *peer_name = smap_get(&binding_rec->options, "peer");
+            ovs_assert(peer_name);
+
+            char *cr_peer_name = xasprintf("cr-%s", peer_name);
+            cr_port = lport_lookup_by_name(sbrec_port_binding_by_name,
+                                           cr_peer_name);
+            free(cr_peer_name);
+
+            if (!cr_port) {
+                continue;
+            }
+
+            is_cr_resident = pinctrl_is_chassis_resident
+                                 (sbrec_port_binding_by_name,
+                                  chassis,
+                                  active_tunnels,
+                                  cr_port->logical_port);
+            if (!is_cr_resident) {
+                continue;
+            }
+
+            if (!ovn_sbrec_get_port_binding_ip_mac(cr_port, &mac, &ip,
+                                                   &mask)) {
+                /* Router Port binding without ip and mac configured. */
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+                VLOG_WARN_RL(&rl, "cannot send garp, router port binding: %s, "
+                             "does not have proper ip,mac values: %s",
+                             cr_port->logical_port, *cr_port->mac);
+                return;
+            }
+
+            sset_add(local_cr_ports, cr_port->logical_port);
+        }
+    }
+}
+
+static void
 send_garp_wait(long long int send_garp_time)
 {
     /* Set the poll timer for next garp only if there is garp data to
@@ -2990,6 +3123,8 @@ send_garp_prepare(struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
 {
     struct sset localnet_vifs = SSET_INITIALIZER(&localnet_vifs);
     struct sset local_l3gw_ports = SSET_INITIALIZER(&local_l3gw_ports);
+    struct sset local_cr_ports = SSET_INITIALIZER(&local_cr_ports);
+
     struct sset nat_ip_keys = SSET_INITIALIZER(&nat_ip_keys);
     struct shash nat_addresses;
@@ -3004,11 +3139,17 @@ send_garp_prepare(struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
                                &nat_ip_keys, &local_l3gw_ports,
                                chassis, active_tunnels,
                                &nat_addresses);
+
+    get_local_cr_ports(sbrec_port_binding_by_name,
+                       &local_cr_ports, &local_l3gw_ports,
+                       chassis, active_tunnels);
+
     /* For deleted ports and deleted nat ips, remove from send_garp_data. */
     struct shash_node *iter, *next;
     SHASH_FOR_EACH_SAFE (iter, next, &send_garp_data) {
         if (!sset_contains(&localnet_vifs, iter->name) &&
-            !sset_contains(&nat_ip_keys, iter->name)) {
+            !sset_contains(&nat_ip_keys, iter->name) &&
+            !sset_contains(&local_cr_ports, iter->name)) {
             send_garp_delete(iter->name);
         }
     }
@@ -3019,7 +3160,7 @@ send_garp_prepare(struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
         const struct sbrec_port_binding *pb = lport_lookup_by_name(
             sbrec_port_binding_by_name, iface_id);
         if (pb) {
-            send_garp_update(pb, &nat_addresses);
+            send_garp_update(sbrec_port_binding_by_name, pb, &nat_addresses);
         }
     }
@@ -3029,7 +3170,17 @@ send_garp_prepare(struct ovsdb_idl_index *sbrec_port_binding_by_datapath,
         const struct sbrec_port_binding *pb
             = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port);
         if (pb) {
-            send_garp_update(pb, &nat_addresses);
+            send_garp_update(sbrec_port_binding_by_name, pb, &nat_addresses);
+        }
+    }
+
+    /* Update send_garp_data for chassisredirect router ports. */
+    const char *cr_port;
+    SSET_FOR_EACH (cr_port, &local_cr_ports) {
+        const struct sbrec_port_binding *pb
+            = lport_lookup_by_name(sbrec_port_binding_by_name, cr_port);
+        if (pb) {
+            send_garp_update(sbrec_port_binding_by_name, pb, &nat_addresses);
         }
     }
diff --git a/ovn/controller/pinctrl.h b/ovn/controller/pinctrl.h
index f61d705..92f704e 100644
--- a/ovn/controller/pinctrl.h
+++ b/ovn/controller/pinctrl.h
@@ -44,4 +44,10 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 void pinctrl_wait(struct ovsdb_idl_txn *ovnsb_idl_txn);
 void pinctrl_destroy(void);
+bool
+pinctrl_is_chassis_resident(struct ovsdb_idl_index *sbrec_port_binding_by_name,
+                            const struct sbrec_chassis *chassis,
+                            const struct sset *active_tunnels,
+                            const char *port_name);
+
 #endif /* ovn/pinctrl.h */
diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
index 0f07d80..3d0ad8e 100644
--- a/ovn/lib/ovn-util.c
+++ b/ovn/lib/ovn-util.c
@@ -16,6 +16,7 @@
 #include "ovn-util.h"
 #include "dirs.h"
 #include "openvswitch/vlog.h"
+#include "openvswitch/ofp-parse.h"
 #include "ovn/lib/ovn-nb-idl.h"
 #include "ovn/lib/ovn-sb-idl.h"
@@ -371,3 +372,33 @@ ovn_logical_flow_hash(const struct uuid *logical_datapath,
     hash = hash_string(match, hash);
     return hash_string(actions, hash);
 }
+
+/* Extracts the mac, ip and mask for a sbrec_port_binding.
+ *
+ * Expects following format:
+ * "MAC_ADDRESS IP/MASK"
+ *
+ * Return true if MAC, IP and MASK are found, false otherwise.
+ */
+bool
+ovn_sbrec_get_port_binding_ip_mac(const struct sbrec_port_binding *binding,
+                                  struct eth_addr *mac,
+                                  ovs_be32 *ip, ovs_be32 *mask)
+{
+    char *err_str = NULL;
+
+    err_str = str_to_mac(binding->mac[0], mac);
+    if (err_str) {
+        free(err_str);
+        return false;
+    }
+
+    err_str = ip_parse_masked(binding->mac[0] + ETH_ADDR_STRLEN + 1,
+                              ip, mask);
+    if (err_str) {
+        free(err_str);
+        return false;
+    }
+
+    return true;
+}
diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h
index 6d5e1df..c01595a 100644
--- a/ovn/lib/ovn-util.h
+++ b/ovn/lib/ovn-util.h
@@ -19,6 +19,7 @@
 #include "lib/packets.h"
 struct nbrec_logical_router_port;
+struct sbrec_port_binding;
 struct sbrec_logical_flow;
 struct uuid;
@@ -81,4 +82,9 @@ uint32_t ovn_logical_flow_hash(const struct uuid *logical_datapath,
                                uint16_t priority,
                                const char *match, const char *actions);
+bool
+ovn_sbrec_get_port_binding_ip_mac(const struct sbrec_port_binding *binding,
+                                  struct eth_addr *mac, ovs_be32 *ip,
+                                  ovs_be32 *mask);
+
 #endif
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 74d3692..6835910 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -5914,6 +5914,20 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_put_format(&match, " && is_chassis_resident(%s)",
                               op->od->l3redirect_port->json_key);
             }
+        } else if (op->peer &&
+                   op->peer->od->network_type == DP_NETWORK_BRIDGED) {
+            /* For a router port connected to a bridged logical switch,
+             * we will always have the is_chassis_resident check.
+             * This is because there could be a vm/server on the vlan network
+             * that is not on an OVN chassis and could end up arping for the
+             * router port ip.
+             *
+             * This check works on the assumption that for OVN chassis,
+             * the VM's logical switch ARP responder will respond to ARP
+             * requests for router port IP.
+             */
+            ds_put_format(&match, " && is_chassis_resident(\"cr-%s\")",
+                          op->key);
         }
         ds_clear(&actions);
@@ -7365,18 +7379,23 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 300,
                           REGBIT_DISTRIBUTED_NAT" == 1", "next;");
-            /* For traffic with outport == l3dgw_port, if the
-             * packet did not match any higher priority redirect
-             * rule, then the traffic is redirected to the central
-             * instance of the l3dgw_port. */
-            ds_clear(&match);
-            ds_put_format(&match, "outport == %s",
-                          od->l3dgw_port->json_key);
-            ds_clear(&actions);
-            ds_put_format(&actions, "outport = %s; next;",
-                          od->l3redirect_port->json_key);
-            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
-                          ds_cstr(&match), ds_cstr(&actions));
+            /* For VLAN backed networks, default match will not redirect to
+             * chassis redirect port. */
+            if (od->l3dgw_port->peer &&
+                od->l3dgw_port->peer->od->network_type == DP_NETWORK_OVERLAY) {
+                /* For traffic with outport == l3dgw_port, if the
+                 * packet did not match any higher priority redirect
+                 * rule, then the traffic is redirected to the central
+                 * instance of the l3dgw_port. */
+                ds_clear(&match);
+                ds_put_format(&match, "outport == %s",
+                              od->l3dgw_port->json_key);
+                ds_clear(&actions);
+                ds_put_format(&actions, "outport = %s; next;",
+                              od->l3redirect_port->json_key);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
+                              ds_cstr(&match), ds_cstr(&actions));
+            }
             /* If the Ethernet destination has not been resolved,
              * redirect to the central instance of the l3dgw_port.
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 6275db1..6df711e 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1441,7 +1441,7 @@

  • -

    External traffic

    +

    External traffic (NAT)

    The following happens when a VM sends an external traffic (which requires @@ -1607,6 +1607,91 @@ +

    External traffic (NO NAT)

    +

    + The following happens when a VM sends external traffic (i.e., to a network + that is not connected to the logical router) and there is no need for NAT. +

    + +

    + Since no NAT is required, the packet need not be redirected + to a gateway chassis. As a result, this packet flow is the same as East-West. + To ensure that OVN does not redirect the packet over a tunnel + to the gateway chassis, the "network_type" of the destination localnet logical switch + should be set to "bridged". A "bridged" logical switch ensures that + no tunnel encapsulation is done while forwarding the packet on it. + Please refer to ovn-nb(5) for more details. +
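
For reference, the tests added later in this patch create such a switch as shown below. This is a configuration sketch copied from those tests, not a verified recipe: the positional `bridged` argument to `ls-add` is assumed to be the syntax introduced by this patch series and may not exist in other ovn-nbctl versions; the names `ls-underlay`, `ln3` and `phys` are simply the ones the tests use.

```shell
# Create a logical switch whose network_type is "bridged" (syntax assumed
# from the tests in this patch), then attach a VLAN 1000 localnet port.
ovn-nbctl ls-add ls-underlay bridged
ovn-nbctl lsp-add ls-underlay ln3 "" 1000
ovn-nbctl lsp-set-addresses ln3 unknown
ovn-nbctl lsp-set-type ln3 localnet
ovn-nbctl lsp-set-options ln3 network_name=phys
```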

    + +
      +
    1. + It first enters the ingress pipeline, and then the egress pipeline, of the + source localnet logical switch datapath. It then enters the ingress + pipeline of the logical router datapath via the logical router port in + the source chassis. +
    2. + A routing decision is taken. Since the destination network is NOT directly + connected to the logical router, a static route is expected, which + provides the next-hop IP. +
    3. + From the router datapath, the packet enters the ingress pipeline and then the + egress pipeline of the destination localnet logical switch datapath + (it is of type "bridged" and this is where the next hop is present) + and goes out of the integration bridge to the provider bridge ( + belonging to the destination logical switch) via the localnet port. + As with East-West traffic, the source MAC is replaced with the chassis MAC. +
    + +

    + The following happens for the reverse external traffic. +

    + +
      +
    1. + The gateway chassis receives the packet from the localnet port of + the logical switch (bridged type) which provides external connectivity. + The packet then enters the ingress pipeline and then the egress pipeline of + the localnet logical switch (which provides external connectivity). + The packet then enters the ingress pipeline of the logical router + datapath. +
    2. + A routing decision is taken and the logical switch of the destination VM is + identified. +
    3. + The packet then enters the ingress pipeline and then the egress + pipeline of the VM's localnet logical switch. Since the VM + does not reside on the gateway chassis, the packet is sent out via the + localnet port of the VM's logical switch. The source MAC of this packet + is replaced with the chassis-unique MAC. +
    4. + The VM's chassis receives the packet via the localnet port and + sends it to the integration bridge. The packet enters the + ingress pipeline and then the egress pipeline of the localnet + logical switch and finally gets delivered to the VM port. +
    + +
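
The MAC rewrite described in the steps above is what the "DVR N-S Ping" test in this patch expects on the wire. The sketch below rebuilds that expected frame for the North-to-South case, where the gateway chassis is hv3 with chassis MAC aa:bb:cc:dd:ee:33. `ip_to_hex` mirrors the helper in tests/ovn.at; `expected_routed` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Same helper as in tests/ovn.at: four decimal octets to 8 hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# expected_routed DST_MAC CHASSIS_MAC SRC_IP_HEX DST_IP_HEX
# (hypothetical helper) Prints the frame the test expects after routing:
# destination MAC is the VM's MAC, source MAC is the chassis MAC, and the
# IPv4 TTL byte is 3f instead of the 40 that was injected.
expected_routed() {
    printf "%s\n" "${1}${2}08004500001c000000003f110100${3}${4}0035111100080000"
}

sip=$(ip_to_hex 172 31 0 10)   # lp-north
dip=$(ip_to_hex 192 168 1 1)   # lp11
expected_routed f00000000011 aabbccddee33 "$sip" "$dip"
```

The payload bytes are taken verbatim from `test_ip` in this patch; only the two MAC fields and the two IP fields are parameters.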

    + Note that while VM-to-external traffic does not + require redirection to the gateway chassis, the reverse traffic goes through + the gateway chassis only. This is because, for the external router, the OVN logical + router port IP is the next hop to reach the endpoints behind it. + As a result, a centralized chassis is needed to respond to ARP + requests coming from the external network. This centralized chassis is the + gateway chassis attached to the corresponding router port. +
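
To make the ARP behavior concrete, the "DVR N-S GARP" test in this patch checks that the gateway chassis emits a VLAN-tagged gratuitous ARP for the router port. The sketch below decomposes that expected frame field by field, using the router port MAC 00:00:01:01:02:07, VLAN 1000 and IP 172.31.0.1 from the test. `build_garp` is a hypothetical helper, not part of the test suite; `ip_to_hex` mirrors the helper in tests/ovn.at.

```shell
#!/bin/sh
# Same helper as in tests/ovn.at: four decimal octets to 8 hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# build_garp MAC_HEX VLAN_ID IP_HEX
# (hypothetical helper) Prints a broadcast, 802.1Q-tagged ARP request in
# which sender and target protocol addresses are both IP_HEX.
build_garp() {
    mac=$1 vlan=$2 ip=$3
    eth_dst=ffffffffffff                     # Ethernet broadcast
    vlan_tag=8100$(printf "%04x" "$vlan")    # TPID 0x8100 + VLAN ID (priority 0)
    arp_hdr=08060001080006040001             # ethertype ARP, htype 1, ptype IPv4,
                                             # hlen 6, plen 4, opcode 1 (request)
    tha=000000000000                         # target hardware address left empty
    printf "%s\n" "${eth_dst}${mac}${vlan_tag}${arp_hdr}${mac}${ip}${tha}${ip}"
}

build_garp 000001010207 1000 "$(ip_to_hex 172 31 0 1)"
```

The printed string should match the line the test writes to `expected` before calling OVN_CHECK_PACKETS on hv2/snoopvif-tx.pcap.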

    +

    Life Cycle of a VTEP gateway

    diff --git a/tests/ovn.at b/tests/ovn.at index e5108a7..8a03393 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -29,6 +29,12 @@ m4_define([OVN_CHECK_PACKETS], [ovn_check_packets__ "$1" "$2" AT_CHECK([sort $rcv_text], [0], [expout])]) +m4_define([OVN_CHECK_PACKETS_REMOVE_BROADCAST], + [ovn_check_packets__ "$1" "$2" + echo "received_text=$rcv_text" + sed -i '/ffffffffffff/d' $rcv_text + AT_CHECK([sort $rcv_text], [0], [expout])]) + AT_BANNER([OVN components]) AT_SETUP([ovn -- lexer]) @@ -14018,7 +14024,7 @@ ovn-hv4-0 OVN_CLEANUP([hv1], [hv2], [hv3]) AT_CLEANUP -AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR chassis mac]) +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR E-W chassis mac]) ovn_start @@ -14028,6 +14034,8 @@ ovn_start # of VIF port name indicates the hypervisor it is bound to, e.g. # lp23 means VIF 3 on hv2. # +# Both the switches are connected to a logical router "router". +# # Each switch's VLAN tag and their logical switch ports are: # - ls1: # - tagged with VLAN 101 @@ -14185,6 +14193,7 @@ test_ip() { echo "------ OVN dump ------" ovn-nbctl show ovn-sbctl show +ovn-sbctl list port_binding echo "------ hv1 dump ------" as hv1 ovs-vsctl show @@ -14211,6 +14220,727 @@ as hv2 ovs-appctl fdb/show br-phys OVN_CHECK_PACKETS([hv2/vif22-tx.pcap], [vif22.expected]) + +# Associate a chassis as gateway chassis and validate garp. + +OVN_CLEANUP([hv1],[hv2]) + +AT_CLEANUP + + +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S GARP]) +ovn_start + + +# In this test cases we create 2 switches, all connected to same +# physical network (through br-phys on each HV). Each switch has +# 1 VIF. Each HV has 1 VIF port. The first digit +# of VIF port name indicates the hypervisor it is bound to, e.g. +# lp23 means VIF 3 on hv2. +# +# Both the switches are connected to a logical router "router". +# +# Additionally, we create a logical switch (ls-underlay) for N-S traffic. 
+# +# Each switch's VLAN tag and their logical switch ports are: +# - ls1: +# - tagged with VLAN 101 +# - ports: lp11 +# - ls2: +# - tagged with VLAN 201 +# - ports: lp22 +# - ls-underlay: +# - tagged with VLAN 1000 +# +# Note: a localnet port is created for each switch to connect to +# physical network. +# lsp_to_ls LSP +# +# Prints the name of the logical switch that contains LSP. + +net_add n1 +for i in 1 2; do + sim_add hv$i + as hv$i + ovs-vsctl add-br br-phys + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys + ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i" + ovs-vsctl set open . external-ids:system-id="HV$i" + ovn_attach n1 br-phys 192.168.0.$i + ovs-vsctl set-controller br-int ptcp: + AT_CHECK([ovs-vsctl add-port br-phys snoopvif -- set Interface snoopvif options:tx_pcap=hv$i/snoopvif-tx.pcap options:rxq_pcap=hv$i/snoopvif-rx.pcap]) +done + +ovn-nbctl ls-add ls-underlay bridged +ovn-nbctl lsp-add ls-underlay ln3 "" 1000 +ovn-nbctl lsp-set-addresses ln3 unknown +ovn-nbctl lsp-set-type ln3 localnet +ovn-nbctl lsp-set-options ln3 network_name=phys + +ovn-nbctl lr-add router +ovn-nbctl lrp-add router router-to-underlay 00:00:01:01:02:07 172.31.0.1/24 + +ovn-nbctl lsp-add ls-underlay underlay-to-router -- set Logical_Switch_Port \ + underlay-to-router type=router \ + options:router-port=router-to-underlay \ + -- lsp-set-addresses underlay-to-router router + +ovn-nbctl --wait=sb sync + +# Associate hv2 as gateway chassis +ovn-nbctl lrp-set-gateway-chassis router-to-underlay hv2 + +ovn-nbctl show +ovn-sbctl show + +# Dump a bunch of info helpful for debugging if there's a failure. 
+ +echo "------ OVN dump ------" +ovn-nbctl show +ovn-sbctl show + +echo "------ hv1 dump ------" +as hv1 ovs-vsctl show +as hv1 ovs-vsctl list Open_Vswitch + +echo "------ hv2 dump ------" +as hv2 ovs-vsctl show +as hv2 ovs-vsctl list Open_Vswitch + +sleep 1 + +echo "----------- Post Traffic hv1 dump -----------" +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv1 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv2 dump -----------" +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv2 ovs-appctl fdb/show br-phys + +AT_CHECK([as hv2 ovs-appctl fdb/show br-phys | grep 00:00:01:01:02:07 | grep 1000 | wc -l], [0], [[1 +]]) + +echo "ffffffffffff000001010207810003e808060001080006040001000001010207ac1f0001000000000000ac1f0001" > expected +OVN_CHECK_PACKETS([hv2/snoopvif-tx.pcap], [expected]) + OVN_CLEANUP([hv1],[hv2]) AT_CLEANUP + + +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S Ping]) +ovn_start + +# In this test cases we create 3 switches, all connected to same +# physical network (through br-phys on each HV). LS1 and LS2 have +# 1 VIF each. Each HV has 1 VIF port. The first digit +# of VIF port name indicates the hypervisor it is bound to, e.g. +# lp23 means VIF 3 on hv2. +# +# All the switches are connected to a logical router "router". +# +# Each switch's VLAN tag and their logical switch ports are: +# - ls1: +# - tagged with VLAN 101 +# - ports: lp11 +# - ls2: +# - tagged with VLAN 201 +# - ports: lp22 +# - ls-underlay: +# - tagged with VLAN 1000 +# Note: a localnet port is created for each switch to connect to +# physical network. 
+ +for i in 1 2; do + ls_name=ls$i + ovn-nbctl ls-add $ls_name bridged + ln_port_name=ln$i + if test $i -eq 1; then + ovn-nbctl lsp-add $ls_name $ln_port_name "" 101 + elif test $i -eq 2; then + ovn-nbctl lsp-add $ls_name $ln_port_name "" 201 + fi + ovn-nbctl lsp-set-addresses $ln_port_name unknown + ovn-nbctl lsp-set-type $ln_port_name localnet + ovn-nbctl lsp-set-options $ln_port_name network_name=phys +done + +# lsp_to_ls LSP +# +# Prints the name of the logical switch that contains LSP. +lsp_to_ls () { + case $1 in dnl ( + lp?[[11]]) echo ls1 ;; dnl ( + lp?[[12]]) echo ls2 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_hv () { + case $1 in dnl ( + vif[[1]]?) echo hv1 ;; dnl ( + vif[[2]]?) echo hv2 ;; dnl ( + vif?[[north]]?) echo hv4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +net_add n1 +for i in 1 2; do + sim_add hv$i + as hv$i + ovs-vsctl add-br br-phys + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys + ovs-vsctl set open . 
external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i" + ovn_attach n1 br-phys 192.168.0.$i + + ovs-vsctl add-port br-int vif$i$i -- \ + set Interface vif$i$i external-ids:iface-id=lp$i$i \ + options:tx_pcap=hv$i/vif$i$i-tx.pcap \ + options:rxq_pcap=hv$i/vif$i$i-rx.pcap \ + ofport-request=$i$i + + lsp_name=lp$i$i + ls_name=$(lsp_to_ls $lsp_name) + + ovn-nbctl lsp-add $ls_name $lsp_name + ovn-nbctl lsp-set-addresses $lsp_name "f0:00:00:00:00:$i$i 192.168.$i.$i" + ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$i + + OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup]) + +done + +ovn-nbctl ls-add ls-underlay bridged +ovn-nbctl lsp-add ls-underlay ln3 "" 1000 +ovn-nbctl lsp-set-addresses ln3 unknown +ovn-nbctl lsp-set-type ln3 localnet +ovn-nbctl lsp-set-options ln3 network_name=phys + +ovn-nbctl ls-add ls-north bridged +ovn-nbctl lsp-add ls-north ln4 "" 1000 +ovn-nbctl lsp-set-addresses ln4 unknown +ovn-nbctl lsp-set-type ln4 localnet +ovn-nbctl lsp-set-options ln4 network_name=phys + +# Add a VM on ls-north +ovn-nbctl lsp-add ls-north lp-north +ovn-nbctl lsp-set-addresses lp-north "f0:f0:00:00:00:11 172.31.0.10" +ovn-nbctl lsp-set-port-security lp-north f0:f0:00:00:00:11 + +# Add 3rd hypervisor +sim_add hv3 +as hv3 ovs-vsctl add-br br-phys +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +as hv3 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:33" +as hv3 ovn_attach n1 br-phys 192.168.0.3 + +# Add 4th hypervisor +sim_add hv4 +as hv4 ovs-vsctl add-br br-phys +as hv4 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +as hv4 ovs-vsctl set open . 
external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:44" +as hv4 ovn_attach n1 br-phys 192.168.0.4 + +as hv4 ovs-vsctl add-port br-int vif-north -- \ + set Interface vif-north external-ids:iface-id=lp-north \ + options:tx_pcap=hv4/vif-north-tx.pcap \ + options:rxq_pcap=hv4/vif-north-rx.pcap \ + ofport-request=44 + +ovn-nbctl lr-add router +ovn-nbctl lrp-add router router-to-ls1 00:00:01:01:02:03 192.168.1.3/24 +ovn-nbctl lrp-add router router-to-ls2 00:00:01:01:02:05 192.168.2.3/24 +ovn-nbctl lrp-add router router-to-underlay 00:00:01:01:02:07 172.31.0.1/24 + +ovn-nbctl lsp-add ls1 ls1-to-router -- set Logical_Switch_Port ls1-to-router type=router \ + options:router-port=router-to-ls1 -- lsp-set-addresses ls1-to-router router +ovn-nbctl lsp-add ls2 ls2-to-router -- set Logical_Switch_Port ls2-to-router type=router \ + options:router-port=router-to-ls2 -- lsp-set-addresses ls2-to-router router +ovn-nbctl lsp-add ls-underlay underlay-to-router -- set Logical_Switch_Port \ + underlay-to-router type=router \ + options:router-port=router-to-underlay \ + -- lsp-set-addresses underlay-to-router router + +ovn-nbctl lrp-set-gateway-chassis router-to-underlay hv3 + +ovn-nbctl --wait=sb sync + +sleep 2 + +OVN_POPULATE_ARP + ++# lsp_to_ls LSP ++# ++# Prints the name of the logical switch that contains LSP. +lsp_to_ls () { + case $1 in dnl ( + lp?[[11]]) echo ls1 ;; dnl ( + lp?[[12]]) echo ls2 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_ls () { + case $1 in dnl ( + vif?[[11]]) echo ls1 ;; dnl ( + vif?[[12]]) echo ls2 ;; dnl ( + vif-north) echo ls-north ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +hv_to_num () { + case $1 in dnl ( + hv1) echo 1 ;; dnl ( + hv2) echo 2 ;; dnl ( + hv3) echo 3 ;; dnl ( + hv4) echo 4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_num () { + case $1 in dnl ( + vif22) echo 22 ;; dnl ( + vif21) echo 21 ;; dnl ( + vif11) echo 11 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_hv () { + case $1 in dnl ( + vif[[1]]?) 
echo hv1 ;; dnl ( + vif[[2]]?) echo hv2 ;; dnl ( + vif-north) echo hv4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_lrp () { + echo router-to-`vif_to_ls $1` +} + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + + +test_ip() { + # This packet has bad checksums but logical L3 routing doesn't check. + local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5 outport=$6 + local packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 + shift; shift; shift; shift; shift + hv=`vif_to_hv $inport` + as $hv ovs-appctl netdev-dummy/receive $inport $packet + in_ls=`vif_to_ls $inport` + for outport; do + out_ls=`vif_to_ls $outport` + if test $in_ls = $out_ls; then + # Ports on the same logical switch receive exactly the same packet. + echo $packet + else + # Routing decrements TTL and updates source and dest MAC + # (and checksum). + out_lrp=`vif_to_lrp $outport` + # For North-South, packet will come via gateway chassis, i.e hv3 + if test $inport = vif-north; then + echo f00000000011aabbccddee3308004500001c000000003f110100${src_ip}${dst_ip}0035111100080000 >> $outport.expected + fi + if test $outport = vif-north; then + echo f0f000000011aabbccddee1108004500001c000000003f110100${src_ip}${dst_ip}0035111100080000 >> $outport.expected + fi + fi >> $outport.expected + done +} + +# Dump a bunch of info helpful for debugging if there's a failure. 
+ +echo "------ OVN dump ------" +ovn-nbctl show +ovn-sbctl show +ovn-sbctl list port_binding +ovn-sbctl list mac_binding + +echo "------ hv1 dump ------" +as hv1 ovs-vsctl show +as hv1 ovs-vsctl list Open_Vswitch + +echo "------ hv2 dump ------" +as hv2 ovs-vsctl show +as hv2 ovs-vsctl list Open_Vswitch + +echo "------ hv3 dump ------" +as hv3 ovs-vsctl show +as hv3 ovs-vsctl list Open_Vswitch + +echo "------ hv4 dump ------" +as hv4 ovs-vsctl show +as hv4 ovs-vsctl list Open_Vswitch + +echo "Send traffic North to South" + +sip=`ip_to_hex 172 31 0 10` +dip=`ip_to_hex 192 168 1 1` +test_ip vif-north f0f000000011 000001010207 $sip $dip vif11 + +sleep 1 + +# Confirm that North to south traffic works fine and went through gateway chassis, i.e HV3 +OVN_CHECK_PACKETS([hv1/vif11-tx.pcap], [vif11.expected]) + +echo "Send traffic South to Nouth" +sip=`ip_to_hex 192 168 1 1` +dip=`ip_to_hex 172 31 0 10` +test_ip vif11 f00000000011 000001010203 $sip $dip vif-north + +sleep 1 + +# Confirm that South to North traffic works fine. +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv4/vif-north-tx.pcap], [vif-north.expected]) + +# Confirm that packets did not go out via tunnel port. 
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=32 | grep NXM_NX_TUN_METADATA0 | grep n_packets=0 | wc -l], [0], [[1 +]]) + +# Confirm that HV1 chassis mac is never seen on Gateway chassis, i.e HV3 +AT_CHECK([as hv3 ovs-appctl fdb/show br-phys | grep aa:bb:cc:dd:ee:11 | wc -l], [0], [[0 +]]) + +echo "----------- Post Traffic hv1 dump -----------" +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv1 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv2 dump -----------" +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv2 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv3 dump -----------" +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv3 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv4 dump -----------" +as hv4 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv4 ovs-appctl fdb/show br-phys + +OVN_CLEANUP([hv1],[hv2],[hv3],[hv4]) + +AT_CLEANUP + + +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S ARP handling]) +ovn_start + +# In this test cases we create 3 switches, all connected to same +# physical network (through br-phys on each HV). LS1 and LS2 have +# 1 VIF each. Each HV has 1 VIF port. The first digit +# of VIF port name indicates the hypervisor it is bound to, e.g. +# lp23 means VIF 3 on hv2. +# +# All the switches are connected to a logical router "router". +# +# Each switch's VLAN tag and their logical switch ports are: +# - ls1: +# - tagged with VLAN 101 +# - ports: lp11 +# - ls2: +# - tagged with VLAN 201 +# - ports: lp22 +# - ls-underlay: +# - tagged with VLAN 1000 +# Note: a localnet port is created for each switch to connect to +# physical network. 
+ +for i in 1 2; do + ls_name=ls$i + ovn-nbctl ls-add $ls_name bridged + ln_port_name=ln$i + if test $i -eq 1; then + ovn-nbctl lsp-add $ls_name $ln_port_name "" 101 + elif test $i -eq 2; then + ovn-nbctl lsp-add $ls_name $ln_port_name "" 201 + fi + ovn-nbctl lsp-set-addresses $ln_port_name unknown + ovn-nbctl lsp-set-type $ln_port_name localnet + ovn-nbctl lsp-set-options $ln_port_name network_name=phys +done + +# lsp_to_ls LSP +# +# Prints the name of the logical switch that contains LSP. +lsp_to_ls () { + case $1 in dnl ( + lp?[[11]]) echo ls1 ;; dnl ( + lp?[[12]]) echo ls2 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_hv () { + case $1 in dnl ( + vif[[1]]?) echo hv1 ;; dnl ( + vif[[2]]?) echo hv2 ;; dnl ( + vif?[[north]]?) echo hv4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +net_add n1 +for i in 1 2; do + sim_add hv$i + as hv$i + ovs-vsctl add-br br-phys + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys + ovs-vsctl set open . 
external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i" + ovn_attach n1 br-phys 192.168.0.$i + + ovs-vsctl add-port br-int vif$i$i -- \ + set Interface vif$i$i external-ids:iface-id=lp$i$i \ + options:tx_pcap=hv$i/vif$i$i-tx.pcap \ + options:rxq_pcap=hv$i/vif$i$i-rx.pcap \ + ofport-request=$i$i + + lsp_name=lp$i$i + ls_name=$(lsp_to_ls $lsp_name) + + ovn-nbctl lsp-add $ls_name $lsp_name + ovn-nbctl lsp-set-addresses $lsp_name "f0:00:00:00:00:$i$i 192.168.$i.$i" + ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$i + + OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup]) + +done + +ovn-nbctl ls-add ls-underlay bridged +ovn-nbctl lsp-add ls-underlay ln3 "" 1000 +ovn-nbctl lsp-set-addresses ln3 unknown +ovn-nbctl lsp-set-type ln3 localnet +ovn-nbctl lsp-set-options ln3 network_name=phys + +ovn-nbctl ls-add ls-north bridged +ovn-nbctl lsp-add ls-north ln4 "" 1000 +ovn-nbctl lsp-set-addresses ln4 unknown +ovn-nbctl lsp-set-type ln4 localnet +ovn-nbctl lsp-set-options ln4 network_name=phys + +# Add a VM on ls-north +ovn-nbctl lsp-add ls-north lp-north +ovn-nbctl lsp-set-addresses lp-north "f0:f0:00:00:00:11 172.31.0.10" +ovn-nbctl lsp-set-port-security lp-north f0:f0:00:00:00:11 + +# Add 3rd hypervisor +sim_add hv3 +as hv3 ovs-vsctl add-br br-phys +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +as hv3 ovs-vsctl set open . external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:33" +as hv3 ovn_attach n1 br-phys 192.168.0.3 + +# Add 4th hypervisor +sim_add hv4 +as hv4 ovs-vsctl add-br br-phys +as hv4 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys +as hv4 ovs-vsctl set open . 
external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:44" +as hv4 ovn_attach n1 br-phys 192.168.0.4 + +as hv4 ovs-vsctl add-port br-int vif-north -- \ + set Interface vif-north external-ids:iface-id=lp-north \ + options:tx_pcap=hv4/vif-north-tx.pcap \ + options:rxq_pcap=hv4/vif-north-rx.pcap \ + ofport-request=44 + +ovn-nbctl lr-add router +ovn-nbctl lrp-add router router-to-ls1 00:00:01:01:02:03 192.168.1.3/24 +ovn-nbctl lrp-add router router-to-ls2 00:00:01:01:02:05 192.168.2.3/24 +ovn-nbctl lrp-add router router-to-underlay 00:00:01:01:02:07 172.31.0.1/24 + +ovn-nbctl lsp-add ls1 ls1-to-router -- set Logical_Switch_Port ls1-to-router type=router \ + options:router-port=router-to-ls1 -- lsp-set-addresses ls1-to-router router +ovn-nbctl lsp-add ls2 ls2-to-router -- set Logical_Switch_Port ls2-to-router type=router \ + options:router-port=router-to-ls2 -- lsp-set-addresses ls2-to-router router +ovn-nbctl lsp-add ls-underlay underlay-to-router -- set Logical_Switch_Port \ + underlay-to-router type=router \ + options:router-port=router-to-underlay \ + -- lsp-set-addresses underlay-to-router router + + +OVN_POPULATE_ARP + ++# lsp_to_ls LSP ++# ++# Prints the name of the logical switch that contains LSP. +lsp_to_ls () { + case $1 in dnl ( + lp?[[11]]) echo ls1 ;; dnl ( + lp?[[12]]) echo ls2 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_ls () { + case $1 in dnl ( + vif?[[11]]) echo ls1 ;; dnl ( + vif?[[12]]) echo ls2 ;; dnl ( + vif-north) echo ls-north ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +hv_to_num () { + case $1 in dnl ( + hv1) echo 1 ;; dnl ( + hv2) echo 2 ;; dnl ( + hv3) echo 3 ;; dnl ( + hv4) echo 4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_num () { + case $1 in dnl ( + vif22) echo 22 ;; dnl ( + vif21) echo 21 ;; dnl ( + vif11) echo 11 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_hv () { + case $1 in dnl ( + vif[[1]]?) echo hv1 ;; dnl ( + vif[[2]]?) 
echo hv2 ;; dnl ( + vif-north) echo hv4 ;; dnl ( + *) AT_FAIL_IF([:]) ;; + esac +} + +vif_to_lrp () { + echo router-to-`vif_to_ls $1` +} + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +# Dump a bunch of info helpful for debugging if there's a failure. + +echo "------ OVN dump ------" +ovn-nbctl show +ovn-sbctl show +ovn-sbctl list port_binding +ovn-sbctl list mac_binding + +echo "------ hv1 dump ------" +as hv1 ovs-vsctl show +as hv1 ovs-vsctl list Open_Vswitch + +echo "------ hv2 dump ------" +as hv2 ovs-vsctl show +as hv2 ovs-vsctl list Open_Vswitch + +echo "------ hv3 dump ------" +as hv3 ovs-vsctl show +as hv3 ovs-vsctl list Open_Vswitch + +echo "------ hv4 dump ------" +as hv4 ovs-vsctl show +as hv4 ovs-vsctl list Open_Vswitch + +# test_arp INPORT SHA SPA TPA [REPLY_HA] +# +# Causes a packet to be received on INPORT. The packet is an ARP +# request with SHA, SPA, and TPA as specified. If REPLY_HA is provided, then +# it should be the hardware address of the target to expect to receive in an +# ARP reply; otherwise no reply is expected. +# +# INPORT is an logical switch port number, e.g. 11 for vif11. +# SHA and REPLY_HA are each 12 hex digits. +# SPA and TPA are each 8 hex digits. +test_arp() { + local inport=$1 sha=$2 spa=$3 tpa=$4 reply_ha=$5 + local request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa} + hv=`vif_to_hv $inport` + as $hv ovs-appctl netdev-dummy/receive $inport $request + + if test X$reply_ha = X; then + # Expect to receive the broadcast ARP on the other logical switch ports + # if no reply is expected. + local i j + for i in 1 2 3; do + for j in 1 2 3; do + if test $i$j != $inport; then + echo $request >> $i$j.expected + fi + done + done + else + # Expect to receive the reply, if any. 
+ local reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa} + local reply_vid=${sha}${reply_ha}810003e808060001080006040002${reply_ha}${tpa}${sha}${spa} + echo $reply_vid >> ${inport}_vid.expected + echo $reply >> $inport.expected + fi +} + +sip=`ip_to_hex 172 31 0 10` +tip=`ip_to_hex 172 31 0 1` + +test_arp vif-north f0f000000011 $sip $tip +# Confirm that vif-north does not get ARP reply +AT_CHECK([wc -l hv4/vif-north-tx.pcap | awk '{print $1}'], [0], [[0 +]]) + +# Set a hypervisor as gateway chassis, for router port 172.31.0.1 +ovn-nbctl lrp-set-gateway-chassis router-to-underlay hv3 +ovn-nbctl --wait=sb sync +sleep 2 + +test_arp vif-north f0f000000011 $sip $tip 000001010207 + +sleep 1 + +# Confirm that vif-north gets a single ARP reply this time +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv4/vif-north-tx.pcap], [vif-north.expected]) + +# Confirm that only redirect chassis allowed arp resolution. +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv3/br-phys_n1-tx.pcap], [vif-north_vid.expected]) +sed -i '/ffffffffffff/d' hv3/br-phys_n1-tx.packets +AT_CHECK([grep 000001010207 hv3/br-phys_n1-tx.packets | wc -l], [0], [[1 +]]) + +# Confirm that other OVN chassis did not generate ARP reply. 
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-phys_n1-tx.pcap > hv1/br-phys_n1-tx.packets +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/br-phys_n1-tx.pcap > hv2/br-phys_n1-tx.packets + +AT_CHECK([grep 000001010207 hv1/br-phys_n1-tx.packets | wc -l], [0], [[0 +]]) +AT_CHECK([grep 000001010207 hv2/br-phys_n1-tx.packets | wc -l], [0], [[0 +]]) + +echo "----------- Post Traffic hv1 dump -----------" +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv1 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv2 dump -----------" +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv2 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv3 dump -----------" +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv3 ovs-appctl fdb/show br-phys + +echo "----------- Post Traffic hv4 dump -----------" +as hv4 ovs-ofctl -O OpenFlow13 dump-flows br-int +as hv4 ovs-appctl fdb/show br-phys + +OVN_CLEANUP([hv1],[hv2],[hv3],[hv4]) + +AT_CLEANUP