diff mbox series

[ovs-dev,v3] route-table: Filter route changes by interface.

Message ID 20240324121622.773630-1-lic121@chinatelecom.cn
State Changes Requested
Delegated to: Simon Horman
Headers show
Series [ovs-dev,v3] route-table: Filter route changes by interface. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Cheng Li March 24, 2024, 12:16 p.m. UTC
When ovs host is also a kubernets node, pod creation/deletion may
trigger route changes. As a result, ovs run route_table_reset().
As ovs do not care the kubernetes pod routes, route_table_reset()
is not neccessary.

Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
---

Notes:
    v2: Add function definition for bsd and stub.
    v3: Fix unused-parameter error.

 lib/route-table-bsd.c  |  5 +++++
 lib/route-table-stub.c |  5 +++++
 lib/route-table.c      | 41 ++++++++++++++++++++++++++++++---
 lib/route-table.h      |  1 +
 tests/system-route.at  | 51 ++++++++++++++++++++++++++++++++++++++++++
 vswitchd/bridge.c      |  3 +++
 vswitchd/vswitch.xml   | 10 +++++++++
 7 files changed, 113 insertions(+), 3 deletions(-)

Comments

Ilya Maximets March 26, 2024, 5:42 p.m. UTC | #1
On 3/24/24 13:16, Cheng Li wrote:
> When ovs host is also a kubernets node, pod creation/deletion may
> trigger route changes. As a result, ovs run route_table_reset().
> As ovs do not care the kubernetes pod routes, route_table_reset()
> is not neccessary.
> 
> Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
> ---

Hi, Cheng Li.  Thanks for the patch!

I'm a little confused by the use case though.  Could you explain
a bit more why this is a problem (route dump is relatively fast,
unless there are millions of routes) and how this change helps?

AFAIU, routes will not change much in kubernetes environment once
the pod is initially created and configured and the port creation
will trigger route cache reset anyway.

And we will need to reset the cache when new interfaces are
added/removed from the filter, because otherwise we'll have
stale routes in there and the cache may become inconsistent.

Best regards, Ilya Maximets.
Cheng Li March 27, 2024, 2:22 a.m. UTC | #2
On Tue, Mar 26, 2024 at 06:42:02PM +0100, 【外部账号】Ilya Maximets wrote:
> On 3/24/24 13:16, Cheng Li wrote:
> > When ovs host is also a kubernets node, pod creation/deletion may
> > trigger route changes. As a result, ovs run route_table_reset().
> > As ovs do not care the kubernetes pod routes, route_table_reset()
> > is not neccessary.
> > 
> > Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
> > ---
> 
> Hi, Cheng Li.  Thanks for the patch!
> 
> I'm a little confused by the use case though.  Could you explain
> a bit more why this is a problem (route dump is relatively fast,
> unless there are millions of routes) and how this change helps?
> 
> AFAIU, routes will not change much in kubernetes environment once
> the pod is initially created and configured and the port creation
> will trigger route cache reset anyway.
> 

Hi Ilya, thanks for reviewing this patch.

When calico is used as kubernets cni and IPinIP overlay mode[1] is
enabled, each time a pod created/deleted a route(dev tunl0) is
avertised across all cluster nodes.
```
# ip monitor route
10.233.75.61 via 11.46.8.90 dev tunl0 proto bird onlink
```

If we have a large cluster, route update may happen in high
frequency, which triggers lots of route_table_reset().

In route_table_reset(), all ovs route items are first deleted
then add latest kernel route items. There is a time gap between old
route items all deleted and new route items not ready. During this
gap, upcall/revalidation generate bad datapath flows.  As ovs does not
care kuberntes route changes, seems adding a filter is the simple way
to resolve this issue.

[1] https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip

> And we will need to reset the cache when new interfaces are
> added/removed from the filter, because otherwise we'll have
> stale routes in there and the cache may become inconsistent.
> 

Make sense, will fix in next version.

> Best regards, Ilya Maximets
diff mbox series

Patch

diff --git a/lib/route-table-bsd.c b/lib/route-table-bsd.c
index 34d42cfab..4762e2194 100644
--- a/lib/route-table-bsd.c
+++ b/lib/route-table-bsd.c
@@ -205,3 +205,8 @@  void
 route_table_wait(void)
 {
 }
+
+void
+disable_notify_on_interfaces(const char *ifaces OVS_UNUSED)
+{
+}
diff --git a/lib/route-table-stub.c b/lib/route-table-stub.c
index dd0b096d4..cc13c5191 100644
--- a/lib/route-table-stub.c
+++ b/lib/route-table-stub.c
@@ -48,3 +48,8 @@  void
 route_table_wait(void)
 {
 }
+
+void
+disable_notify_on_interfaces(const char *ifaces OVS_UNUSED)
+{
+}
diff --git a/lib/route-table.c b/lib/route-table.c
index f1fe32714..eeff509f0 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -33,6 +33,7 @@ 
 #include "netlink-notifier.h"
 #include "netlink-socket.h"
 #include "openvswitch/ofpbuf.h"
+#include "lib/sset.h"
 #include "ovs-router.h"
 #include "packets.h"
 #include "rtnetlink.h"
@@ -82,6 +83,7 @@  static struct nln_notifier *route6_notifier = NULL;
 static struct nln_notifier *name_notifier = NULL;
 
 static bool route_table_valid = false;
+static struct sset disabled_ifaces = SSET_INITIALIZER(&disabled_ifaces);
 
 static void route_table_reset(void);
 static void route_table_handle_msg(const struct route_table_msg *);
@@ -92,6 +94,7 @@  static void route_map_clear(void);
 static void name_table_init(void);
 static void name_table_change(const struct rtnetlink_change *, void *);
 
+
 uint64_t
 route_table_get_change_seq(void)
 {
@@ -354,13 +357,45 @@  route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
     return ipv4 ? RTNLGRP_IPV4_ROUTE : RTNLGRP_IPV6_ROUTE;
 }
 
+void
+disable_notify_on_interfaces(const char *ifaces)
+{
+    struct sset tmp_ifaces;
+
+    if (ifaces) {
+        sset_from_delimited_string(&tmp_ifaces, ifaces, ", ");
+    } else {
+        sset_init(&tmp_ifaces);
+    }
+    if (! sset_equals(&disabled_ifaces, &tmp_ifaces)) {
+        const char *iface;
+        struct ds ds = DS_EMPTY_INITIALIZER;
+
+        sset_swap(&disabled_ifaces, &tmp_ifaces);
+        SSET_FOR_EACH (iface, &disabled_ifaces) {
+            ds_put_format(&ds, " %s", iface);
+        }
+        VLOG_DBG_RL(&rl, "route notify disabled interfaces: [%s]",
+                    ds_cstr(&ds));
+        ds_destroy(&ds);
+    }
+    sset_destroy(&tmp_ifaces);
+}
+
 static void
-route_table_change(const struct route_table_msg *change OVS_UNUSED,
+route_table_change(const struct route_table_msg *change,
                    void *aux OVS_UNUSED)
 {
-    if (!change || change->relevant) {
-        route_table_valid = false;
+    if (change) {
+        if (!change->relevant) {
+            return;
+        }
+        if (change->rd.ifname[0] != '\0' &&
+            sset_contains(&disabled_ifaces, change->rd.ifname)) {
+            return;
+        }
     }
+    route_table_valid = false;
 }
 
 static void
diff --git a/lib/route-table.h b/lib/route-table.h
index 3a02d737a..716e5bae0 100644
--- a/lib/route-table.h
+++ b/lib/route-table.h
@@ -33,4 +33,5 @@  void route_table_wait(void);
 bool route_table_fallback_lookup(const struct in6_addr *ip6_dst,
                                  char name[],
                                  struct in6_addr *gw6);
+void disable_notify_on_interfaces(const char *ifaces);
 #endif /* route-table.h */
diff --git a/tests/system-route.at b/tests/system-route.at
index c0ecad6cf..039255df7 100644
--- a/tests/system-route.at
+++ b/tests/system-route.at
@@ -128,3 +128,54 @@  OVS_WAIT_UNTIL([test $(ovs-appctl ovs/route/show | grep -c 'p1-route') -eq 0 ])
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+
+dnl Checks that disabled interface doesn't trigger route table refresh.
+AT_SETUP([ovs-route - filter by interface])
+AT_KEYWORDS([route])
+OVS_TRAFFIC_VSWITCHD_START()
+
+dnl Create tap port.
+on_exit 'ip link del p1-route; ip link del p2-route'
+AT_CHECK([ip tuntap add name p1-route mode tap])
+AT_CHECK([ip tuntap add name p2-route mode tap])
+AT_CHECK([ip link set p1-route up])
+AT_CHECK([ip link set p2-route up])
+
+dnl Add ip address.
+AT_CHECK([ip addr add 10.0.0.17/24 dev p1-route], [0], [stdout])
+AT_CHECK([ip addr add 10.0.1.17/24 dev p2-route], [0], [stdout])
+
+dnl Check that OVS catches route updates.
+OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep -P 'p(1|2)-route' | sort], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
+Cached: 10.0.1.0/24 dev p2-route SRC 10.0.1.17
+Cached: 10.0.1.17/32 dev p2-route SRC 10.0.1.17 local])
+
+dnl Set disabled interface
+AT_CHECK([ovs-appctl vlog/set 'route_table,dbg'])
+get_log_next_line_num
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:route-notify-disabled-interfaces="p2-route"])
+dnl expected log line: "route_table|DBG|route notify disabled interfaces: [ p2-route]"
+OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep -P "notify disabled interfaces: . p2-route"])
+
+dnl Add a route with interface p1-route.
+AT_CHECK([ip route add 10.0.0.18/32 dev p1-route])
+OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
+Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17])
+
+dnl Add a route with disabled interface p2-route.
+AT_CHECK([ip route add 10.0.1.18/32 dev p2-route])
+dnl Give the main thread a chance to act.
+AT_CHECK([ovs-appctl revalidator/wait])
+dnl Check that OVS didn't refresh route table.
+AT_CHECK([ovs-appctl ovs/route/show | grep 'p2-route' | sort], [0], [dnl
+Cached: 10.0.1.0/24 dev p2-route SRC 10.0.1.17
+Cached: 10.0.1.17/32 dev p2-route SRC 10.0.1.17 local
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 95a65fcdc..217f5b3cf 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -71,6 +71,7 @@ 
 #include "unixctl.h"
 #include "lib/vswitch-idl.h"
 #include "vlan-bitmap.h"
+#include "route-table.h"
 
 VLOG_DEFINE_THIS_MODULE(bridge);
 
@@ -888,6 +889,8 @@  bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
     ofproto_set_threads(
         smap_get_int(&ovs_cfg->other_config, "n-handler-threads", 0),
         smap_get_int(&ovs_cfg->other_config, "n-revalidator-threads", 0));
+    disable_notify_on_interfaces(smap_get(&ovs_cfg->other_config,
+                                 "route-notify-disabled-interfaces"));
 
     /* Destroy "struct bridge"s, "struct port"s, and "struct iface"s according
      * to 'ovs_cfg', with only very minimal configuration otherwise.
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 8a1b607d7..ee2ca05fd 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -869,6 +869,16 @@ 
           The feature is considered experimental.
         </p>
       </column>
+      <column name="other_config" key="route-notify-disabled-interfaces">
+        <p>
+          Ignore route changes of specified interfaces. One of the usage
+          scenarios is to prevent kubernets from triggering ovs route table
+          refresh on pod create/destroy.
+        </p>
+        <p>
+          The format is interface names joined by ','. i.e. "eth1,eth2"
+        </p>
+      </column>
     </group>
     <group title="Status">
       <column name="next_cfg">