[ovs-dev] ofproto: Preserve ofport number for failed ofport across restarts
diff mbox series

Message ID 1559677811-15512-1-git-send-email-vishal.deep.ajmera@ericsson.com
State New
Headers show
Series
  • [ovs-dev] ofproto: Preserve ofport number for failed ofport across restarts
Related show

Commit Message

Vishal Deep Ajmera June 4, 2019, 11:47 a.m. UTC
Current OVS implementation frees the ofport number allocated for
a given ofport if the underlying interface could not get initialized
due to any error. However it does not delete the entry of ofport from
ovsdb. This means any new ofport can get this ofport number anytime
in future. When the underlying interface issue is resolved for the
ofport it gets a new ofport number from the pool. This has an
undesired effect where any SDN controller will have to modify all the
flows pertaining this ofport due to change in ofport number and
reprogram the flows in the switch.

This patch reserves the ofport port number for the duration the ofport
entry persists in the ovsdb. Once the entry is deleted from ovsdb the
ofport number is recirculated on LRU basis.

Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
---
 ofproto/ofproto.c |  17 +++++--
 ofproto/ofproto.h |   2 +
 vswitchd/bridge.c | 145 +++++++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 132 insertions(+), 32 deletions(-)

Comments

Ben Pfaff June 4, 2019, 5:30 p.m. UTC | #1
On Tue, Jun 04, 2019 at 11:47:57AM +0000, Vishal Deep Ajmera wrote:
> Current OVS implementation frees the ofport number allocated for
> a given ofport if the underlying interface could not get initialized
> due to any error. However it does not delete the entry of ofport from
> ovsdb. This means any new ofport can get this ofport number anytime
> in future. When the underlying interface issue is resolved for the
> ofport it gets a new ofport number from the pool. This has an
> undesired effect where any SDN controller will have to modify all the
> flows pertaining this ofport due to change in ofport number and
> reprogram the flows in the switch.
> 
> This patch reserves the ofport port number for the duration the ofport
> entry persists in the ovsdb. Once the entry is deleted from ovsdb the
> ofport number is recirculated on LRU basis.
> 
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>

This is an interesting patch and I like the idea.  The implementation
does add some conceptual complexity.

Can you say some more about the scenario that this covers?  It isn't one
that I've heard come up very much in the past.  Does it arise in real
situations that you've encountered?
Vishal Deep Ajmera June 6, 2019, 9:46 a.m. UTC | #2
> >
> > This patch reserves the ofport port number for the duration the ofport
> > entry persists in the ovsdb. Once the entry is deleted from ovsdb the
> > ofport number is recirculated on LRU basis.
> >
> > Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
> 
> This is an interesting patch and I like the idea.  The implementation does add
> some conceptual complexity.
> 
> Can you say some more about the scenario that this covers?  It isn't one that I've
> heard come up very much in the past.  Does it arise in real situations that you've
> encountered?

Hi Ben,

Thanks for looking into this patch. We have 2 scenarios in our deployment where 
we encountered issue.

Scenario 1: (SDN Controller - ODL, OpenStack - Pike)

Assume we have 4 ports in ovs on "netdev" bridge br-int: 
tap0 (1)
tap1 (2)
vhu1 (3)
vhu2 (4)
() denotes the ofport number allocated to the port by OVS.

If the host compute is rebooted, the tap devices tap0 and tap1 are not recreated 
(or atleast not immediately). When OVS restarts the corresponding tap interfaces 
are in error state: "could not open network device tap0 (No such device)".  
Unfortunately due to this error, OVS sets the ofport field to -1 in OVSDB which 
means these ofport numbers (1) and (2) are now available to be allocated to any 
new port. At this time, SDN controller will need to remove all the flows 
pertaining to tap ports as ofport numbers are no longer same. 

A peculiar behavior we observed with OpenStack Pike was that when compute was 
restarted, it also triggered deletes and adds of all VHU ports (and not as a 
single transaction). This also resulted in change of existing vhu's ofport 
number.

After compute reboot:

tap0 (-1) -> "could not open network device tap0 (No such device)"
tap1 (-1) -> "could not open network device tap1 (No such device)"
vhu1 (3)
vhu2 (4)

Then a delete/add of vhu port changes the ofport allocation as follows:
tap0 (-1)
tap1 (-1)
vhu1 (1)
vhu2 (2)

Things got messed up here as SDN controller on one hand trying to remove 
flows for tap0 /tap1 and on the other also modifying flows for vhu1 and vhu2. 
Unfortunately both tap0 and vhu1 has ofport (1) and tap1 and vhu2 has ofport (2). 
Until the time flows are properly cleaned by SDN controller we end up in stale 
flows of tap ports being used for vhu ports. If OVS avoids giving same ofport 
number (1) and (2) to existing VHU ports or any new port, this situation could 
be avoided.

Scenario 2: (SDN Controller - ODL, OpenStack - Pike)
We have another scenario in our deployment, where in we always start OVS with 
DPDK enabled. But if DPDK initialization fails due to any error, we fall back to
non-dpdk mode and start OVS. This is to make sure we still manage to have 
connectivity with the computes. But this causes all VHU's and DPDK ports to fail and
frees up their ofport numbers. Once we correct the underlying DPDK issue and
start OVS again with dpdk mode, SDN controller is now required to reconstruct (and 
not just replay) all the flows which it had earlier as the ofport numbers are 
not guaranteed to be same again.

This patch fixes both the above issue by retaining the ofport number till ofport
entry exists in the OVSDB.

Warm Regards,
Vishal Ajmera
Vishal Deep Ajmera June 13, 2019, 8:28 a.m. UTC | #3
> 
> This patch fixes both the above issue by retaining the ofport number till ofport
> entry exists in the OVSDB.
> 
> Warm Regards,
> Vishal Ajmera

Hi Ben,

Did you get a chance to review this patch ?

Warm Regards,
Vishal Ajmera

Patch
diff mbox series

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 1d6fc00..df77859 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -190,8 +190,6 @@  static void reinit_ports(struct ofproto *);
 
 static long long int ofport_get_usage(const struct ofproto *,
                                       ofp_port_t ofp_port);
-static void ofport_set_usage(struct ofproto *, ofp_port_t ofp_port,
-                             long long int last_used);
 static void ofport_remove_usage(struct ofproto *, ofp_port_t ofp_port);
 
 /* Ofport usage.
@@ -2264,10 +2262,23 @@  static ofp_port_t
 alloc_ofp_port(struct ofproto *ofproto, const char *netdev_name)
 {
     uint16_t port_idx;
+    struct ofport *port;
 
     port_idx = simap_get(&ofproto->ofp_requests, netdev_name);
     port_idx = port_idx ? port_idx : UINT16_MAX;
 
+    /* Check if port_idx is not allocated to any other interface.
+     * If available, then reserve for given netdev and skip
+     * allocation algorithm. */
+    if (port_idx != UINT16_MAX) {
+        port = ofproto_get_port(ofproto, u16_to_ofp(port_idx));
+        if ((!port) || (port->netdev != NULL &&
+                        !strcmp(netdev_name, netdev_get_name(port->netdev)))) {
+            ofport_set_usage(ofproto, u16_to_ofp(port_idx), LLONG_MAX);
+            return u16_to_ofp(port_idx);
+        }
+    }
+
     if (port_idx >= ofproto->max_ports
         || ofport_get_usage(ofproto, u16_to_ofp(port_idx)) == LLONG_MAX) {
         uint16_t lru_ofport = 0, end_port_no = ofproto->alloc_port_no;
@@ -2577,7 +2588,7 @@  ofport_get_usage(const struct ofproto *ofproto, ofp_port_t ofp_port)
     return 0;
 }
 
-static void
+void
 ofport_set_usage(struct ofproto *ofproto, ofp_port_t ofp_port,
                  long long int last_used)
 {
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index 6e4afff..c236e0b 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -313,6 +313,8 @@  const char *ofproto_port_open_type(const struct ofproto *,
                                    const char *port_type);
 int ofproto_port_add(struct ofproto *, struct netdev *, ofp_port_t *ofp_portp);
 int ofproto_port_del(struct ofproto *, ofp_port_t ofp_port);
+void ofport_set_usage(struct ofproto *, ofp_port_t ofp_port,
+                      long long int last_used);
 void ofproto_port_set_config(struct ofproto *, ofp_port_t ofp_port,
                              const struct smap *cfg);
 int ofproto_port_get_stats(const struct ofport *, struct netdev_stats *stats);
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 0702cc6..4852ef0 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -137,6 +137,8 @@  struct bridge {
 
     /* Used during reconfiguration. */
     struct shash wanted_ports;
+    struct hmap failed_ports;   /* List of ports for which interface
+                                 * add failed. */
 
     /* Synthetic local port if necessary. */
     struct ovsrec_port synth_local_port;
@@ -273,10 +275,17 @@  static bool port_is_bond_fake_iface(const struct port *);
 static unixctl_cb_func qos_unixctl_show_types;
 static unixctl_cb_func qos_unixctl_show;
 
-static struct port *port_create(struct bridge *, const struct ovsrec_port *);
-static void port_del_ifaces(struct port *);
+static struct port *port_create(struct bridge *, const struct ovsrec_port *,
+                                bool stale);
+static void port_del_ifaces(struct port *, bool stale);
 static void port_destroy(struct port *);
-static struct port *port_lookup(const struct bridge *, const char *name);
+static struct port *port_lookup(const struct bridge *, const char *name,
+                                bool stale);
+
+static void failed_port_destroy(struct port *);
+static void failed_iface_insert(struct port *port,
+                                const struct ovsrec_interface *iface_cfg);
+
 static void port_configure(struct port *);
 static struct lacp_settings *port_configure_lacp(struct port *,
                                                  struct lacp_settings *);
@@ -300,7 +309,7 @@  static bool iface_is_internal(const struct ovsrec_interface *iface,
                               const struct ovsrec_bridge *br);
 static const char *iface_get_type(const struct ovsrec_interface *,
                                   const struct ovsrec_bridge *);
-static void iface_destroy(struct iface *);
+static void iface_destroy(struct iface *, bool stale);
 static void iface_destroy__(struct iface *);
 static struct iface *iface_lookup(const struct bridge *, const char *name);
 static struct iface *iface_find(const char *name);
@@ -869,7 +878,7 @@  bridge_delete_or_reconfigure_ports(struct bridge *br)
             victim_request = iface_get_requested_ofp_port(victim->cfg);
             if (victim_request != requested_ofp_port) {
                 del = add_ofp_port(victim->ofp_port, del, &n, &allocated);
-                iface_destroy(victim);
+                iface_destroy(victim, false);
                 goto delete;
             }
         }
@@ -878,7 +887,7 @@  bridge_delete_or_reconfigure_ports(struct bridge *br)
         continue;
 
     delete:
-        iface_destroy(iface);
+        iface_destroy(iface, false);
         del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated);
     }
     for (i = 0; i < n; i++) {
@@ -1701,7 +1710,7 @@  bridge_configure_spanning_tree(struct bridge *br)
 static bool
 bridge_has_bond_fake_iface(const struct bridge *br, const char *name)
 {
-    const struct port *port = port_lookup(br, name);
+    const struct port *port = port_lookup(br, name, false);
     return port && port_is_bond_fake_iface(port);
 }
 
@@ -1774,7 +1783,8 @@  iface_set_netdev_config(const struct ovsrec_interface *iface_cfg,
  * If successful, returns 0 and stores the network device in '*netdevp'.  On
  * failure, returns a positive errno value and stores NULL in '*netdevp'. */
 static int
-iface_do_create(const struct bridge *br,
+iface_do_create(struct bridge *br,
+                const struct ovsrec_port *port_cfg,
                 const struct ovsrec_interface *iface_cfg,
                 ofp_port_t *ofp_portp, struct netdev **netdevp,
                 char **errp)
@@ -1782,6 +1792,7 @@  iface_do_create(const struct bridge *br,
     struct netdev *netdev = NULL;
     int error;
     const char *type;
+    struct port *port;
 
     if (netdev_is_reserved_name(iface_cfg->name)) {
         VLOG_WARN("could not create interface %s, name is reserved",
@@ -1823,6 +1834,18 @@  iface_do_create(const struct bridge *br,
 error:
     *netdevp = NULL;
     netdev_close(netdev);
+    /* Reserve ofport number for failed interface.
+     * This will enable us to use same ofport number
+     * for interface when added successfully later. */
+    if (iface_cfg->n_ofport) {
+        ofport_set_usage(br->ofproto, u16_to_ofp(*(iface_cfg->ofport)),
+                         LLONG_MAX);
+        port = port_lookup(br, port_cfg->name, true);
+        if (!port) {
+            port = port_create(br, port_cfg, true);
+        }
+        failed_iface_insert(port, iface_cfg);
+    }
     return error;
 }
 
@@ -1845,7 +1868,8 @@  iface_create(struct bridge *br, const struct ovsrec_interface *iface_cfg,
 
     /* Do the bits that can fail up front. */
     ovs_assert(!iface_lookup(br, iface_cfg->name));
-    error = iface_do_create(br, iface_cfg, &ofp_port, &netdev, &errp);
+    error = iface_do_create(br, port_cfg, iface_cfg, &ofp_port, &netdev,
+                            &errp);
     if (error) {
         iface_clear_db_record(iface_cfg, errp);
         free(errp);
@@ -1853,9 +1877,9 @@  iface_create(struct bridge *br, const struct ovsrec_interface *iface_cfg,
     }
 
     /* Get or create the port structure. */
-    port = port_lookup(br, port_cfg->name);
+    port = port_lookup(br, port_cfg->name, false);
     if (!port) {
-        port = port_create(br, port_cfg);
+        port = port_create(br, port_cfg, false);
     }
 
     /* Create the iface structure. */
@@ -3292,6 +3316,7 @@  bridge_create(const struct ovsrec_bridge *br_cfg)
     eth_addr_mark_random(&br->default_ea);
 
     hmap_init(&br->ports);
+    hmap_init(&br->failed_ports);
     hmap_init(&br->ifaces);
     hmap_init(&br->iface_by_name);
     hmap_init(&br->mirrors);
@@ -3313,11 +3338,15 @@  bridge_destroy(struct bridge *br, bool del)
         HMAP_FOR_EACH_SAFE (mirror, next_mirror, hmap_node, &br->mirrors) {
             mirror_destroy(mirror);
         }
+        HMAP_FOR_EACH_SAFE (port, next_port, hmap_node, &br->failed_ports) {
+            failed_port_destroy(port);
+        }
 
         hmap_remove(&all_bridges, &br->node);
         ofproto_destroy(br->ofproto, del);
         hmap_destroy(&br->ifaces);
         hmap_destroy(&br->ports);
+        hmap_destroy(&br->failed_ports);
         hmap_destroy(&br->iface_by_name);
         hmap_destroy(&br->mirrors);
         hmap_destroy(&br->mappings);
@@ -3458,7 +3487,15 @@  bridge_del_ports(struct bridge *br, const struct shash *wanted_ports)
         if (!port->cfg) {
             port_destroy(port);
         } else {
-            port_del_ifaces(port);
+            port_del_ifaces(port, false);
+        }
+    }
+    HMAP_FOR_EACH_SAFE (port, next, hmap_node, &br->failed_ports) {
+        port->cfg = shash_find_data(wanted_ports, port->name);
+        if (!port->cfg) {
+            failed_port_destroy(port);
+        } else {
+            port_del_ifaces(port, true);
         }
     }
 
@@ -4045,7 +4082,7 @@  bridge_aa_refresh_queued(struct bridge *br)
         VLOG_INFO("ifname=%s, vlan=%u, oper=%u", node->port_name, node->vlan,
                   node->oper);
 
-        port = port_lookup(br, node->port_name);
+        port = port_lookup(br, node->port_name, false);
         if (port) {
             bridge_aa_update_trunks(port, node);
         }
@@ -4062,9 +4099,10 @@  bridge_aa_refresh_queued(struct bridge *br)
 /* Port functions. */
 
 static struct port *
-port_create(struct bridge *br, const struct ovsrec_port *cfg)
+port_create(struct bridge *br, const struct ovsrec_port *cfg, bool stale)
 {
     struct port *port;
+    struct hmap *port_list = (stale ? &br->failed_ports : &br->ports);
 
     port = xzalloc(sizeof *port);
     port->bridge = br;
@@ -4072,13 +4110,50 @@  port_create(struct bridge *br, const struct ovsrec_port *cfg)
     port->cfg = cfg;
     ovs_list_init(&port->ifaces);
 
-    hmap_insert(&br->ports, &port->hmap_node, hash_string(port->name, 0));
+    hmap_insert(port_list, &port->hmap_node, hash_string(port->name, 0));
     return port;
 }
 
+static void
+failed_iface_insert(struct port *port,
+                    const struct ovsrec_interface *iface_cfg)
+{
+    struct iface *iface, *next;
+
+    LIST_FOR_EACH_SAFE (iface, next, port_elem, &port->ifaces) {
+        if (!strcmp(iface->name, iface_cfg->name)) {
+            /* already present in the list */
+            return;
+        }
+    }
+    iface = xzalloc(sizeof *iface);
+    ovs_list_push_back(&port->ifaces, &iface->port_elem);
+    iface->port = port;
+    iface->name = xstrdup(iface_cfg->name);
+    iface->ofp_port = u16_to_ofp(*(iface_cfg->ofport));
+    iface->cfg = iface_cfg;
+}
+
+
+static void
+failed_port_destroy(struct port *port)
+{
+    if (port) {
+        struct bridge *br = port->bridge;
+        struct iface *iface, *next;
+
+        LIST_FOR_EACH_SAFE (iface, next, port_elem, &port->ifaces) {
+            iface_destroy(iface, true);
+        }
+        hmap_remove(&br->failed_ports, &port->hmap_node);
+        free(port->name);
+        free(port);
+    }
+}
+
 /* Deletes interfaces from 'port' that are no longer configured for it. */
 static void
-port_del_ifaces(struct port *port)
+port_del_ifaces(struct port *port, bool stale)
 {
     struct iface *iface, *next;
     struct sset new_ifaces;
@@ -4093,7 +4168,7 @@  port_del_ifaces(struct port *port)
     /* Get rid of deleted interfaces. */
     LIST_FOR_EACH_SAFE (iface, next, port_elem, &port->ifaces) {
         if (!sset_contains(&new_ifaces, iface->name)) {
-            iface_destroy(iface);
+            iface_destroy(iface, stale);
         }
     }
 
@@ -4122,12 +4197,13 @@  port_destroy(struct port *port)
 }
 
 static struct port *
-port_lookup(const struct bridge *br, const char *name)
+port_lookup(const struct bridge *br, const char *name, bool stale)
 {
     struct port *port;
+    const struct hmap *port_map = (stale ? &br->failed_ports: &br->ports);
 
     HMAP_FOR_EACH_WITH_HASH (port, hmap_node, hash_string(name, 0),
-                             &br->ports) {
+                             port_map) {
         if (!strcmp(port->name, name)) {
             return port;
         }
@@ -4372,14 +4448,23 @@  iface_destroy__(struct iface *iface)
 }
 
 static void
-iface_destroy(struct iface *iface)
+iface_destroy(struct iface *iface, bool stale)
 {
     if (iface) {
-        struct port *port = iface->port;
+        if (!stale) {
+            struct port *port = iface->port;
 
-        iface_destroy__(iface);
-        if (ovs_list_is_empty(&port->ifaces)) {
-            port_destroy(port);
+            iface_destroy__(iface);
+            if (ovs_list_is_empty(&port->ifaces)) {
+                port_destroy(port);
+            }
+        } else {
+            /* release iface ofp_port number */
+            ofport_set_usage(iface->port->bridge->ofproto, iface->ofp_port,
+                             time_msec());
+            ovs_list_remove(&iface->port_elem);
+            free(iface->name);
+            free(iface);
         }
     }
 }
@@ -4479,8 +4564,11 @@  iface_set_ofport(const struct ovsrec_interface *if_cfg, ofp_port_t ofport)
     }
 }
 
-/* Clears all of the fields in 'if_cfg' that indicate interface status, and
- * sets the "ofport" field to -1.
+/* Clears all of the fields in 'if_cfg' that indicate interface status.
+ * We leave ofport number in the DB so that if this interface comes up
+ * properly again, the openflow rules need not change.
+ *
+ * Once port is deleted from DB the ofport number will be recirculated.
  *
  * This is appropriate when 'if_cfg''s interface cannot be created or is
  * otherwise invalid. */
@@ -4488,7 +4576,6 @@  static void
 iface_clear_db_record(const struct ovsrec_interface *if_cfg, char *errp)
 {
     if (!ovsdb_idl_row_is_synthetic(&if_cfg->header_)) {
-        iface_set_ofport(if_cfg, OFPP_NONE);
         ovsrec_interface_set_error(if_cfg, errp);
         ovsrec_interface_set_status(if_cfg, NULL);
         ovsrec_interface_set_admin_state(if_cfg, NULL);
@@ -4770,7 +4857,7 @@  mirror_collect_ports(struct mirror *m,
 
     for (i = 0; i < n_in_ports; i++) {
         const char *name = in_ports[i]->name;
-        struct port *port = port_lookup(m->bridge, name);
+        struct port *port = port_lookup(m->bridge, name, false);
         if (port) {
             out_ports[n_out_ports++] = port;
         } else {
@@ -4797,7 +4884,7 @@  mirror_configure(struct mirror *m)
 
     /* Get output port or VLAN. */
     if (cfg->output_port) {
-        s.out_bundle = port_lookup(m->bridge, cfg->output_port->name);
+        s.out_bundle = port_lookup(m->bridge, cfg->output_port->name, false);
         if (!s.out_bundle) {
             VLOG_ERR("bridge %s: mirror %s outputs to port not on bridge",
                      m->bridge->name, m->name);