diff mbox series

[ovs-dev] route-table: Avoid routes from non-standard routing tables.

Message ID 20240315190532.233907-1-i.maximets@ovn.org
State Changes Requested
Headers show
Series [ovs-dev] route-table: Avoid routes from non-standard routing tables. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets March 15, 2024, 7:05 p.m. UTC
Currently, ovs-vswitchd is subscribed to all the routing changes in the
kernel.  On each change, it marks the internal routing table cache as
invalid, then resets it and dumps all the routes from the kernel from
scratch.  The reason for that is kernel routing updates not being
reliable in a sense that it's hard to tell which route is getting
removed or modified.  Userspace application has to track the order in
which route entries are dumped from the kernel.  Updates can get lost
or even duplicated and the kernel doesn't provide a good mechanism to
distinguish one route from another.  To my knowledge, dumping all the
routes from a kernel after each change is the only way to keep the
cache consistent.  Some more info can be found in the following never
addressed issues:
  https://bugzilla.redhat.com/1337860
  https://bugzilla.redhat.com/1337855

It seems to be believed that NetworkManager "mostly" does incremental
updates right.  But it is still not completely correct, will re-dump
the whole table in certain cases, and it takes a huge amount of very
complicated code to do the accounting and route comparisons.

Going back to ovs-vswitchd, it currently dumps routes from all the
routing tables.  If it will get conflicting routes from multiple
tables, the cache will not be useful.  The routing cache in userspace
is primarily used for checking the egress port for tunneled traffic
and this way also detecting link state changes for a tunnel port.
For userspace datapath it is used for actual routing of the packet
after sending to a native tunnel.
With kernel datapath we don't really have a mechanism to know which
routing table will actually be used by the kernel after encapsulation,
so our lookups on a cache may be incorrect because of this as well.

So, unless all the relevant routes are in the standard tables, the
lookup in userspace route cache is unreliable.

Luckily, most setups are not using any complicated routing in
non-standard tables that OVS has to be aware of.

It is possible, but unlikely, that standard routing tables are
completely empty while some other custom table is not, and all the OVS
tunnel traffic is directed to that table.  That would be the only
scenario where dumping non-standard tables would make sense.  But it
seems like this kind of setup will likely need a way to tell OVS from
which table the routes should be taken, or we'll need to dump routing
rules and keep a separate cache for each table, so we can first match
on rules and then lookup correct routes in a specific table.  I'm not
sure if trying to implement all that is justified.

For now, stop considering routes from non-standard tables to avoid
mixing different tables together and also wasting CPU resources.

This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
running on a same host and in a same network namespace with OVS using
its own custom routing table.

Unfortunately, there seems to be no way to tell the kernel to send
updates only for particular tables.  So, we'll still receive and parse
all of them.  But they will not result in a full cache invalidation in
most cases.

Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
So, we can make use of it and dump only standard tables when we get a
relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
the socket for filtering to work.  There is no reason to not enable it
by default, if supported.  It is not used outside of NETLINK_ROUTE.

Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 lib/netlink-protocol.h | 10 ++++++
 lib/netlink-socket.c   |  8 +++++
 lib/route-table.c      | 80 +++++++++++++++++++++++++++++++++---------
 tests/system-route.at  | 64 +++++++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+), 16 deletions(-)

Comments

Aaron Conole March 19, 2024, 7:56 p.m. UTC | #1
Ilya Maximets <i.maximets@ovn.org> writes:

> Currently, ovs-vswitchd is subscribed to all the routing changes in the
> kernel.  On each change, it marks the internal routing table cache as
> invalid, then resets it and dumps all the routes from the kernel from
> scratch.  The reason for that is kernel routing updates not being
> reliable in a sense that it's hard to tell which route is getting
> removed or modified.  Userspace application has to track the order in
> which route entries are dumped from the kernel.  Updates can get lost
> or even duplicated and the kernel doesn't provide a good mechanism to
> distinguish one route from another.  To my knowledge, dumping all the
> routes from a kernel after each change is the only way to keep the
> cache consistent.  Some more info can be found in the following never
> addressed issues:
>   https://bugzilla.redhat.com/1337860
>   https://bugzilla.redhat.com/1337855
>
> It seems to be believed that NetworkManager "mostly" does incremental
> updates right.  But it is still not completely correct, will re-dump
> the whole table in certain cases, and it takes a huge amount of very
> complicated code to do the accounting and route comparisons.
>
> Going back to ovs-vswitchd, it currently dumps routes from all the
> routing tables.  If it will get conflicting routes from multiple
> tables, the cache will not be useful.  The routing cache in userspace
> is primarily used for checking the egress port for tunneled traffic
> and this way also detecting link state changes for a tunnel port.
> For userspace datapath it is used for actual routing of the packet
> after sending to a native tunnel.
> With kernel datapath we don't really have a mechanism to know which
> routing table will actually be used by the kernel after encapsulation,
> so our lookups on a cache may be incorrect because of this as well.
>
> So, unless all the relevant routes are in the standard tables, the
> lookup in userspace route cache is unreliable.
>
> Luckily, most setups are not using any complicated routing in
> non-standard tables that OVS has to be aware of.
>
> It is possible, but unlikely, that standard routing tables are
> completely empty while some other custom table is not, and all the OVS
> tunnel traffic is directed to that table.  That would be the only
> scenario where dumping non-standard tables would make sense.  But it
> seems like this kind of setup will likely need a way to tell OVS from
> which table the routes should be taken, or we'll need to dump routing
> rules and keep a separate cache for each table, so we can first match
> on rules and then lookup correct routes in a specific table.  I'm not
> sure if trying to implement all that is justified.
>
> For now, stop considering routes from non-standard tables to avoid
> mixing different tables together and also wasting CPU resources.
>
> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
> running on a same host and in a same network namespace with OVS using
> its own custom routing table.
>
> Unfortunately, there seems to be no way to tell the kernel to send
> updates only for particular tables.  So, we'll still receive and parse
> all of them.  But they will not result in a full cache invalidation in
> most cases.
>
> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
> So, we can make use of it and dump only standard tables when we get a
> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
> the socket for filtering to work.  There is no reason to not enable it
> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>
> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  lib/netlink-protocol.h | 10 ++++++
>  lib/netlink-socket.c   |  8 +++++
>  lib/route-table.c      | 80 +++++++++++++++++++++++++++++++++---------
>  tests/system-route.at  | 64 +++++++++++++++++++++++++++++++++
>  4 files changed, 146 insertions(+), 16 deletions(-)
>
> diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
> index 6eaa7035a..e4bb28ac9 100644
> --- a/lib/netlink-protocol.h
> +++ b/lib/netlink-protocol.h
> @@ -155,6 +155,11 @@ enum {
>  #define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
>  #endif
>  
> +/* Introduced in v4.4. */
> +#ifndef NLM_F_DUMP_FILTERED
> +#define NLM_F_DUMP_FILTERED 0x20
> +#endif
> +
>  /* These were introduced all together in 2.6.14.  (We want our programs to
>   * support the newer kernel features even if compiled with older headers.) */
>  #ifndef NETLINK_ADD_MEMBERSHIP
> @@ -168,6 +173,11 @@ enum {
>  #define NETLINK_LISTEN_ALL_NSID 8
>  #endif
>  
> +/* Strict checking of netlink arguments introduced in Linux kernel v4.20. */
> +#ifndef NETLINK_GET_STRICT_CHK
> +#define NETLINK_GET_STRICT_CHK 12
> +#endif
> +
>  /* These were introduced all together in 2.6.23.  (We want our programs to
>   * support the newer kernel features even if compiled with older headers.) */
>  #ifndef CTRL_ATTR_MCAST_GRP_MAX
> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
> index 80da20d9f..eea880b2c 100644
> --- a/lib/netlink-socket.c
> +++ b/lib/netlink-socket.c
> @@ -205,6 +205,14 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
>          }
>      }
>  
> +    /* Strict checking only supported for NETLINK_ROUTE. */
> +    if (protocol == NETLINK_ROUTE
> +        && setsockopt(sock->fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
> +                      &one, sizeof one) < 0) {
> +        VLOG_INFO("netlink: could not enable strict checking (%s)",
> +                  ovs_strerror(errno));

Maybe this message is a bit scary for users - but I'm not sure what
message might be less alarming.  Something like:

netlink: strict checking off (%s)

Not a big deal, just a kindof nit.  I hope it doesn't become VLOG_WARN
later or something, and want to avoid someone reading it and thinking
that it is a misclassified warning.

I don't think it could happen often, but if we have opened a large
number of nl sockets (maybe some large nl_pool_alloc) then we end up
with quite a few of these messages as well.  Maybe we can just vlog this
once?

> +    }
> +
>      retval = get_socket_rcvbuf(sock->fd);
>      if (retval < 0) {
>          retval = -retval;
> diff --git a/lib/route-table.c b/lib/route-table.c
> index 9927dcc18..f1fe32714 100644
> --- a/lib/route-table.c
> +++ b/lib/route-table.c
> @@ -26,6 +26,7 @@
>  #include <linux/rtnetlink.h>
>  #include <net/if.h>
>  
> +#include "coverage.h"
>  #include "hash.h"
>  #include "netdev.h"
>  #include "netlink.h"
> @@ -44,6 +45,8 @@
>  
>  VLOG_DEFINE_THIS_MODULE(route_table);
>  
> +COVERAGE_DEFINE(route_table_dump);
> +
>  struct route_data {
>      /* Copied from struct rtmsg. */
>      unsigned char rtm_dst_len;
> @@ -80,7 +83,7 @@ static struct nln_notifier *name_notifier = NULL;
>  
>  static bool route_table_valid = false;
>  
> -static int route_table_reset(void);
> +static void route_table_reset(void);
>  static void route_table_handle_msg(const struct route_table_msg *);
>  static int route_table_parse(struct ofpbuf *, struct route_table_msg *);
>  static void route_table_change(const struct route_table_msg *, void *);
> @@ -153,26 +156,22 @@ route_table_wait(void)
>      ovs_mutex_unlock(&route_table_mutex);
>  }
>  
> -static int
> -route_table_reset(void)
> +static bool
> +route_table_dump_one_table(unsigned char id)
>  {
> -    struct nl_dump dump;
> -    struct rtgenmsg *rtgenmsg;
>      uint64_t reply_stub[NL_DUMP_BUFSIZE / 8];
>      struct ofpbuf request, reply, buf;
> -
> -    route_map_clear();
> -    netdev_get_addrs_list_flush();
> -    route_table_valid = true;
> -    rt_change_seq++;
> +    struct rtmsg *rq_msg;
> +    bool filtered = true;
> +    struct nl_dump dump;
>  
>      ofpbuf_init(&request, 0);
>  
> -    nl_msg_put_nlmsghdr(&request, sizeof *rtgenmsg, RTM_GETROUTE,
> -                        NLM_F_REQUEST);
> +    nl_msg_put_nlmsghdr(&request, sizeof *rq_msg, RTM_GETROUTE, NLM_F_REQUEST);
>  
> -    rtgenmsg = ofpbuf_put_zeros(&request, sizeof *rtgenmsg);
> -    rtgenmsg->rtgen_family = AF_UNSPEC;
> +    rq_msg = ofpbuf_put_zeros(&request, sizeof *rq_msg);
> +    rq_msg->rtm_family = AF_UNSPEC;
> +    rq_msg->rtm_table = id;

Any reason to not consider setting RTA_TABLE attribute rather than use
rtm_table.  RTA_TABLE supports the extended range, rather than just the
8bits range that rtm_table can fit.

>  
>      nl_dump_start(&dump, NETLINK_ROUTE, &request);
>      ofpbuf_uninit(&request);
> @@ -182,12 +181,43 @@ route_table_reset(void)
>          struct route_table_msg msg;
>  
>          if (route_table_parse(&reply, &msg)) {
> +            struct nlmsghdr *nlmsghdr = nl_msg_nlmsghdr(&reply);
> +
> +            /* Older kernels do not support filtering. */
> +            if (!(nlmsghdr->nlmsg_flags & NLM_F_DUMP_FILTERED)) {
> +                filtered = false;
> +            }
>              route_table_handle_msg(&msg);
>          }
>      }
>      ofpbuf_uninit(&buf);
> +    nl_dump_done(&dump);
> +
> +    return filtered;
> +}
> +
> +static void
> +route_table_reset(void)
> +{
> +    unsigned char tables[] = {
> +        RT_TABLE_DEFAULT,
> +        RT_TABLE_MAIN,
> +        RT_TABLE_LOCAL,
> +    };
>  
> -    return nl_dump_done(&dump);
> +    route_map_clear();
> +    netdev_get_addrs_list_flush();
> +    route_table_valid = true;
> +    rt_change_seq++;
> +
> +    COVERAGE_INC(route_table_dump);
> +
> +    for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
> +        if (!route_table_dump_one_table(tables[i])) {
> +            /* Got unfiltered reply, no need to dump further. */
> +            break;
> +        }
> +    }
>  }
>  
>  /* Return RTNLGRP_IPV4_ROUTE or RTNLGRP_IPV6_ROUTE on success, 0 on parse
> @@ -203,6 +233,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>          [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>          [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>      };
>  
>      static const struct nl_policy policy6[] = {
> @@ -211,6 +242,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>          [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
>          [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>      };
>  
>      struct nlattr *attrs[ARRAY_SIZE(policy)];
> @@ -232,6 +264,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>  
>      if (parsed) {
>          const struct nlmsghdr *nlmsg;
> +        uint32_t table_id;
>          int rta_oif;      /* Output interface index. */
>  
>          nlmsg = buf->data;
> @@ -247,6 +280,19 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>              rtm->rtm_type != RTN_LOCAL) {
>              change->relevant = false;
>          }
> +
> +        table_id = rtm->rtm_table;
> +        if (attrs[RTA_TABLE]) {
> +            table_id = nl_attr_get_u32(attrs[RTA_TABLE]);
> +        }
> +        /* Do not consider changes in non-standard routing tables. */
> +        if (table_id
> +            && table_id != RT_TABLE_DEFAULT
> +            && table_id != RT_TABLE_MAIN
> +            && table_id != RT_TABLE_LOCAL) {
> +            change->relevant = false;
> +        }
> +
>          change->nlmsg_type     = nlmsg->nlmsg_type;
>          change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? 96 : 0);
>          change->rd.local = rtm->rtm_type == RTN_LOCAL;
> @@ -312,7 +358,9 @@ static void
>  route_table_change(const struct route_table_msg *change OVS_UNUSED,
>                     void *aux OVS_UNUSED)
>  {
> -    route_table_valid = false;
> +    if (!change || change->relevant) {
> +        route_table_valid = false;
> +    }
>  }
>  
>  static void
> diff --git a/tests/system-route.at b/tests/system-route.at
> index 114aaebc7..c0ecad6cf 100644
> --- a/tests/system-route.at
> +++ b/tests/system-route.at
> @@ -64,3 +64,67 @@ Cached: fc00:db8:beef::13/128 dev br0 GW fc00:db8:cafe::1 SRC fc00:db8:cafe::2])
>  
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +dnl Checks that OVS doesn't use routes from non-standard tables.
> +AT_SETUP([ovs-route - route tables])
> +AT_KEYWORDS([route])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +dnl Create tap port.
> +on_exit 'ip link del p1-route'
> +AT_CHECK([ip tuntap add name p1-route mode tap])
> +AT_CHECK([ip link set p1-route up])
> +
> +dnl Add ip address.
> +AT_CHECK([ip addr add 10.0.0.17/24 dev p1-route], [0], [stdout])
> +
> +dnl Check that OVS catches route updates.
> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
> +
> +dnl Add a route to the main routing table and check that OVS caches
> +dnl this new route.
> +AT_CHECK([ip route add 10.0.0.18/32 dev p1-route])
> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17])
> +
> +dnl Add a route to a custom routing table and check that OVS doesn't cache it.
> +AT_CHECK([ip route add 10.0.0.19/32 dev p1-route table 42])
> +AT_CHECK([ip route show table 42 | grep 'p1-route' | grep -q '10.0.0.19'])
> +dnl Give the main thread a chance to act.
> +AT_CHECK([ovs-appctl revalidator/wait])
> +dnl Check that OVS didn't learn this route.
> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17
> +])
> +
> +dnl Delete a route from the main table and check that OVS removes the route
> +dnl from the cache.
> +AT_CHECK([ip route del 10.0.0.18/32 dev p1-route])
> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
> +
> +dnl Delete a route from a custom routing table and check that the cache
> +dnl dosn't change.
> +AT_CHECK([ip route del 10.0.0.19/32 dev p1-route table 42])
> +dnl Give the main thread a chance to act.
> +AT_CHECK([ovs-appctl revalidator/wait])
> +dnl Check that the cache is still the same.
> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
> +])
> +
> +dnl Delete ip address.
> +AT_CHECK([ip addr del 10.0.0.17/24 dev p1-route], [0], [stdout])
> +dnl Check that routes were removed from OVS.
> +OVS_WAIT_UNTIL([test $(ovs-appctl ovs/route/show | grep -c 'p1-route') -eq 0 ])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
Ilya Maximets March 20, 2024, 12:52 p.m. UTC | #2
On 3/19/24 20:56, Aaron Conole wrote:
> Ilya Maximets <i.maximets@ovn.org> writes:
> 
>> Currently, ovs-vswitchd is subscribed to all the routing changes in the
>> kernel.  On each change, it marks the internal routing table cache as
>> invalid, then resets it and dumps all the routes from the kernel from
>> scratch.  The reason for that is kernel routing updates not being
>> reliable in a sense that it's hard to tell which route is getting
>> removed or modified.  Userspace application has to track the order in
>> which route entries are dumped from the kernel.  Updates can get lost
>> or even duplicated and the kernel doesn't provide a good mechanism to
>> distinguish one route from another.  To my knowledge, dumping all the
>> routes from a kernel after each change is the only way to keep the
>> cache consistent.  Some more info can be found in the following never
>> addressed issues:
>>   https://bugzilla.redhat.com/1337860
>>   https://bugzilla.redhat.com/1337855
>>
>> It seems to be believed that NetworkManager "mostly" does incremental
>> updates right.  But it is still not completely correct, will re-dump
>> the whole table in certain cases, and it takes a huge amount of very
>> complicated code to do the accounting and route comparisons.
>>
>> Going back to ovs-vswitchd, it currently dumps routes from all the
>> routing tables.  If it will get conflicting routes from multiple
>> tables, the cache will not be useful.  The routing cache in userspace
>> is primarily used for checking the egress port for tunneled traffic
>> and this way also detecting link state changes for a tunnel port.
>> For userspace datapath it is used for actual routing of the packet
>> after sending to a native tunnel.
>> With kernel datapath we don't really have a mechanism to know which
>> routing table will actually be used by the kernel after encapsulation,
>> so our lookups on a cache may be incorrect because of this as well.
>>
>> So, unless all the relevant routes are in the standard tables, the
>> lookup in userspace route cache is unreliable.
>>
>> Luckily, most setups are not using any complicated routing in
>> non-standard tables that OVS has to be aware of.
>>
>> It is possible, but unlikely, that standard routing tables are
>> completely empty while some other custom table is not, and all the OVS
>> tunnel traffic is directed to that table.  That would be the only
>> scenario where dumping non-standard tables would make sense.  But it
>> seems like this kind of setup will likely need a way to tell OVS from
>> which table the routes should be taken, or we'll need to dump routing
>> rules and keep a separate cache for each table, so we can first match
>> on rules and then lookup correct routes in a specific table.  I'm not
>> sure if trying to implement all that is justified.
>>
>> For now, stop considering routes from non-standard tables to avoid
>> mixing different tables together and also wasting CPU resources.
>>
>> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
>> running on a same host and in a same network namespace with OVS using
>> its own custom routing table.
>>
>> Unfortunately, there seems to be no way to tell the kernel to send
>> updates only for particular tables.  So, we'll still receive and parse
>> all of them.  But they will not result in a full cache invalidation in
>> most cases.
>>
>> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
>> So, we can make use of it and dump only standard tables when we get a
>> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
>> the socket for filtering to work.  There is no reason to not enable it
>> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>>
>> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
>> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>  lib/netlink-protocol.h | 10 ++++++
>>  lib/netlink-socket.c   |  8 +++++
>>  lib/route-table.c      | 80 +++++++++++++++++++++++++++++++++---------
>>  tests/system-route.at  | 64 +++++++++++++++++++++++++++++++++
>>  4 files changed, 146 insertions(+), 16 deletions(-)
>>
>> diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
>> index 6eaa7035a..e4bb28ac9 100644
>> --- a/lib/netlink-protocol.h
>> +++ b/lib/netlink-protocol.h
>> @@ -155,6 +155,11 @@ enum {
>>  #define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
>>  #endif
>>  
>> +/* Introduced in v4.4. */
>> +#ifndef NLM_F_DUMP_FILTERED
>> +#define NLM_F_DUMP_FILTERED 0x20
>> +#endif
>> +
>>  /* These were introduced all together in 2.6.14.  (We want our programs to
>>   * support the newer kernel features even if compiled with older headers.) */
>>  #ifndef NETLINK_ADD_MEMBERSHIP
>> @@ -168,6 +173,11 @@ enum {
>>  #define NETLINK_LISTEN_ALL_NSID 8
>>  #endif
>>  
>> +/* Strict checking of netlink arguments introduced in Linux kernel v4.20. */
>> +#ifndef NETLINK_GET_STRICT_CHK
>> +#define NETLINK_GET_STRICT_CHK 12
>> +#endif
>> +
>>  /* These were introduced all together in 2.6.23.  (We want our programs to
>>   * support the newer kernel features even if compiled with older headers.) */
>>  #ifndef CTRL_ATTR_MCAST_GRP_MAX
>> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
>> index 80da20d9f..eea880b2c 100644
>> --- a/lib/netlink-socket.c
>> +++ b/lib/netlink-socket.c
>> @@ -205,6 +205,14 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
>>          }
>>      }
>>  
>> +    /* Strict checking only supported for NETLINK_ROUTE. */
>> +    if (protocol == NETLINK_ROUTE
>> +        && setsockopt(sock->fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
>> +                      &one, sizeof one) < 0) {
>> +        VLOG_INFO("netlink: could not enable strict checking (%s)",
>> +                  ovs_strerror(errno));
> 
> Maybe this message is a bit scary for users - but I'm not sure what
> message might be less alarming.  Something like:
> 
> netlink: strict checking off (%s)
> 
> Not a big deal, just a kindof nit.  I hope it doesn't become VLOG_WARN
> later or something, and want to avoid someone reading it and thinking
> that it is a misclassified warning.

How about we do:

  VLOG(errno == ENOPROTOOPT ? VLL_DBG : VLL_WARN, ... );

i.e. keep it as a debug message for kernels that do not support it
and WARN if something actually went wrong.  What do you think?

> 
> I don't think it could happen often, but if we have opened a large
> number of nl sockets (maybe some large nl_pool_alloc) then we end up
> with quite a few of these messages as well.  Maybe we can just vlog this
> once?

We currently have a single place that opens NETLINK_ROUTE sockets, and
pools are per-protocol, so we should not have more than one of these.
Also should not be a problem with DBG.

> 
>> +    }
>> +
>>      retval = get_socket_rcvbuf(sock->fd);
>>      if (retval < 0) {
>>          retval = -retval;
>> diff --git a/lib/route-table.c b/lib/route-table.c
>> index 9927dcc18..f1fe32714 100644
>> --- a/lib/route-table.c
>> +++ b/lib/route-table.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/rtnetlink.h>
>>  #include <net/if.h>
>>  
>> +#include "coverage.h"
>>  #include "hash.h"
>>  #include "netdev.h"
>>  #include "netlink.h"
>> @@ -44,6 +45,8 @@
>>  
>>  VLOG_DEFINE_THIS_MODULE(route_table);
>>  
>> +COVERAGE_DEFINE(route_table_dump);
>> +
>>  struct route_data {
>>      /* Copied from struct rtmsg. */
>>      unsigned char rtm_dst_len;
>> @@ -80,7 +83,7 @@ static struct nln_notifier *name_notifier = NULL;
>>  
>>  static bool route_table_valid = false;
>>  
>> -static int route_table_reset(void);
>> +static void route_table_reset(void);
>>  static void route_table_handle_msg(const struct route_table_msg *);
>>  static int route_table_parse(struct ofpbuf *, struct route_table_msg *);
>>  static void route_table_change(const struct route_table_msg *, void *);
>> @@ -153,26 +156,22 @@ route_table_wait(void)
>>      ovs_mutex_unlock(&route_table_mutex);
>>  }
>>  
>> -static int
>> -route_table_reset(void)
>> +static bool
>> +route_table_dump_one_table(unsigned char id)
>>  {
>> -    struct nl_dump dump;
>> -    struct rtgenmsg *rtgenmsg;
>>      uint64_t reply_stub[NL_DUMP_BUFSIZE / 8];
>>      struct ofpbuf request, reply, buf;
>> -
>> -    route_map_clear();
>> -    netdev_get_addrs_list_flush();
>> -    route_table_valid = true;
>> -    rt_change_seq++;
>> +    struct rtmsg *rq_msg;
>> +    bool filtered = true;
>> +    struct nl_dump dump;
>>  
>>      ofpbuf_init(&request, 0);
>>  
>> -    nl_msg_put_nlmsghdr(&request, sizeof *rtgenmsg, RTM_GETROUTE,
>> -                        NLM_F_REQUEST);
>> +    nl_msg_put_nlmsghdr(&request, sizeof *rq_msg, RTM_GETROUTE, NLM_F_REQUEST);
>>  
>> -    rtgenmsg = ofpbuf_put_zeros(&request, sizeof *rtgenmsg);
>> -    rtgenmsg->rtgen_family = AF_UNSPEC;
>> +    rq_msg = ofpbuf_put_zeros(&request, sizeof *rq_msg);
>> +    rq_msg->rtm_family = AF_UNSPEC;
>> +    rq_msg->rtm_table = id;
> 
> Any reason to not consider setting RTA_TABLE attribute rather than use
> rtm_table.  RTA_TABLE supports the extended range, rather than just the
> 8bits range that rtm_table can fit.

I just wanted to avoid unnecessary memory allocation for an attribute.
All the tables we care about are in 8 bit range.  And compiler protects
us from an integer overflow, since all types are set to unsigned char.

We can add RTA_TABLE attribute, but we'll basically add the same field
twice as we're still putting the rtm_table in the message.

We could put RTA_TABLE just to be extra safe, but I'm not sure.

Does that make sense?

> 
>>  
>>      nl_dump_start(&dump, NETLINK_ROUTE, &request);
>>      ofpbuf_uninit(&request);
>> @@ -182,12 +181,43 @@ route_table_reset(void)
>>          struct route_table_msg msg;
>>  
>>          if (route_table_parse(&reply, &msg)) {
>> +            struct nlmsghdr *nlmsghdr = nl_msg_nlmsghdr(&reply);
>> +
>> +            /* Older kernels do not support filtering. */
>> +            if (!(nlmsghdr->nlmsg_flags & NLM_F_DUMP_FILTERED)) {
>> +                filtered = false;
>> +            }
>>              route_table_handle_msg(&msg);
>>          }
>>      }
>>      ofpbuf_uninit(&buf);
>> +    nl_dump_done(&dump);
>> +
>> +    return filtered;
>> +}
>> +
>> +static void
>> +route_table_reset(void)
>> +{
>> +    unsigned char tables[] = {
>> +        RT_TABLE_DEFAULT,
>> +        RT_TABLE_MAIN,
>> +        RT_TABLE_LOCAL,
>> +    };
>>  
>> -    return nl_dump_done(&dump);
>> +    route_map_clear();
>> +    netdev_get_addrs_list_flush();
>> +    route_table_valid = true;
>> +    rt_change_seq++;
>> +
>> +    COVERAGE_INC(route_table_dump);
>> +
>> +    for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
>> +        if (!route_table_dump_one_table(tables[i])) {
>> +            /* Got unfiltered reply, no need to dump further. */
>> +            break;
>> +        }
>> +    }
>>  }
>>  
>>  /* Return RTNLGRP_IPV4_ROUTE or RTNLGRP_IPV6_ROUTE on success, 0 on parse
>> @@ -203,6 +233,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>          [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
>>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>>          [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
>> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>>      };
>>  
>>      static const struct nl_policy policy6[] = {
>> @@ -211,6 +242,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>>          [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
>>          [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
>> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>>      };
>>  
>>      struct nlattr *attrs[ARRAY_SIZE(policy)];
>> @@ -232,6 +264,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>  
>>      if (parsed) {
>>          const struct nlmsghdr *nlmsg;
>> +        uint32_t table_id;
>>          int rta_oif;      /* Output interface index. */
>>  
>>          nlmsg = buf->data;
>> @@ -247,6 +280,19 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>              rtm->rtm_type != RTN_LOCAL) {
>>              change->relevant = false;
>>          }
>> +
>> +        table_id = rtm->rtm_table;
>> +        if (attrs[RTA_TABLE]) {
>> +            table_id = nl_attr_get_u32(attrs[RTA_TABLE]);
>> +        }
>> +        /* Do not consider changes in non-standard routing tables. */
>> +        if (table_id
>> +            && table_id != RT_TABLE_DEFAULT
>> +            && table_id != RT_TABLE_MAIN
>> +            && table_id != RT_TABLE_LOCAL) {
>> +            change->relevant = false;
>> +        }
>> +
>>          change->nlmsg_type     = nlmsg->nlmsg_type;
>>          change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? 96 : 0);
>>          change->rd.local = rtm->rtm_type == RTN_LOCAL;
>> @@ -312,7 +358,9 @@ static void
>>  route_table_change(const struct route_table_msg *change OVS_UNUSED,
>>                     void *aux OVS_UNUSED)
>>  {
>> -    route_table_valid = false;
>> +    if (!change || change->relevant) {
>> +        route_table_valid = false;
>> +    }
>>  }
>>  
>>  static void
>> diff --git a/tests/system-route.at b/tests/system-route.at
>> index 114aaebc7..c0ecad6cf 100644
>> --- a/tests/system-route.at
>> +++ b/tests/system-route.at
>> @@ -64,3 +64,67 @@ Cached: fc00:db8:beef::13/128 dev br0 GW fc00:db8:cafe::1 SRC fc00:db8:cafe::2])
>>  
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +dnl Checks that OVS doesn't use routes from non-standard tables.
>> +AT_SETUP([ovs-route - route tables])
>> +AT_KEYWORDS([route])
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +dnl Create tap port.
>> +on_exit 'ip link del p1-route'
>> +AT_CHECK([ip tuntap add name p1-route mode tap])
>> +AT_CHECK([ip link set p1-route up])
>> +
>> +dnl Add ip address.
>> +AT_CHECK([ip addr add 10.0.0.17/24 dev p1-route], [0], [stdout])
>> +
>> +dnl Check that OVS catches route updates.
>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
>> +
>> +dnl Add a route to the main routing table and check that OVS caches
>> +dnl this new route.
>> +AT_CHECK([ip route add 10.0.0.18/32 dev p1-route])
>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17])
>> +
>> +dnl Add a route to a custom routing table and check that OVS doesn't cache it.
>> +AT_CHECK([ip route add 10.0.0.19/32 dev p1-route table 42])
>> +AT_CHECK([ip route show table 42 | grep 'p1-route' | grep -q '10.0.0.19'])
>> +dnl Give the main thread a chance to act.
>> +AT_CHECK([ovs-appctl revalidator/wait])
>> +dnl Check that OVS didn't learn this route.
>> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17
>> +])
>> +
>> +dnl Delete a route from the main table and check that OVS removes the route
>> +dnl from the cache.
>> +AT_CHECK([ip route del 10.0.0.18/32 dev p1-route])
>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
>> +
>> +dnl Delete a route from a custom routing table and check that the cache
>> +dnl dosn't change.
>> +AT_CHECK([ip route del 10.0.0.19/32 dev p1-route table 42])
>> +dnl Give the main thread a chance to act.
>> +AT_CHECK([ovs-appctl revalidator/wait])
>> +dnl Check that the cache is still the same.
>> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>> +])
>> +
>> +dnl Delete ip address.
>> +AT_CHECK([ip addr del 10.0.0.17/24 dev p1-route], [0], [stdout])
>> +dnl Check that routes were removed from OVS.
>> +OVS_WAIT_UNTIL([test $(ovs-appctl ovs/route/show | grep -c 'p1-route') -eq 0 ])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>
Aaron Conole March 20, 2024, 2:56 p.m. UTC | #3
Ilya Maximets <i.maximets@ovn.org> writes:

> On 3/19/24 20:56, Aaron Conole wrote:
>> Ilya Maximets <i.maximets@ovn.org> writes:
>> 
>>> Currently, ovs-vswitchd is subscribed to all the routing changes in the
>>> kernel.  On each change, it marks the internal routing table cache as
>>> invalid, then resets it and dumps all the routes from the kernel from
>>> scratch.  The reason for that is kernel routing updates not being
>>> reliable in a sense that it's hard to tell which route is getting
>>> removed or modified.  Userspace application has to track the order in
>>> which route entries are dumped from the kernel.  Updates can get lost
>>> or even duplicated and the kernel doesn't provide a good mechanism to
>>> distinguish one route from another.  To my knowledge, dumping all the
>>> routes from a kernel after each change is the only way to keep the
>>> cache consistent.  Some more info can be found in the following never
>>> addressed issues:
>>>   https://bugzilla.redhat.com/1337860
>>>   https://bugzilla.redhat.com/1337855
>>>
>>> It seems to be believed that NetworkManager "mostly" does incremental
>>> updates right.  But it is still not completely correct, will re-dump
>>> the whole table in certain cases, and it takes a huge amount of very
>>> complicated code to do the accounting and route comparisons.
>>>
>>> Going back to ovs-vswitchd, it currently dumps routes from all the
>>> routing tables.  If it will get conflicting routes from multiple
>>> tables, the cache will not be useful.  The routing cache in userspace
>>> is primarily used for checking the egress port for tunneled traffic
>>> and this way also detecting link state changes for a tunnel port.
>>> For userspace datapath it is used for actual routing of the packet
>>> after sending to a native tunnel.
>>> With kernel datapath we don't really have a mechanism to know which
>>> routing table will actually be used by the kernel after encapsulation,
>>> so our lookups on a cache may be incorrect because of this as well.
>>>
>>> So, unless all the relevant routes are in the standard tables, the
>>> lookup in userspace route cache is unreliable.
>>>
>>> Luckily, most setups are not using any complicated routing in
>>> non-standard tables that OVS has to be aware of.
>>>
>>> It is possible, but unlikely, that standard routing tables are
>>> completely empty while some other custom table is not, and all the OVS
>>> tunnel traffic is directed to that table.  That would be the only
>>> scenario where dumping non-standard tables would make sense.  But it
>>> seems like this kind of setup will likely need a way to tell OVS from
>>> which table the routes should be taken, or we'll need to dump routing
>>> rules and keep a separate cache for each table, so we can first match
>>> on rules and then lookup correct routes in a specific table.  I'm not
>>> sure if trying to implement all that is justified.
>>>
>>> For now, stop considering routes from non-standard tables to avoid
>>> mixing different tables together and also wasting CPU resources.
>>>
>>> This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is
>>> running on a same host and in a same network namespace with OVS using
>>> its own custom routing table.
>>>
>>> Unfortunately, there seems to be no way to tell the kernel to send
>>> updates only for particular tables.  So, we'll still receive and parse
>>> all of them.  But they will not result in a full cache invalidation in
>>> most cases.
>>>
>>> Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps.
>>> So, we can make use of it and dump only standard tables when we get a
>>> relevant route update.  NETLINK_GET_STRICT_CHK has to be enabled on
>>> the socket for filtering to work.  There is no reason to not enable it
>>> by default, if supported.  It is not used outside of NETLINK_ROUTE.
>>>
>>> Fixes: f0e167f0dbad ("route-table: Handle route updates more robustly.")
>>> Fixes: ea83a2fcd0d3 ("lib: Show tunnel egress interface in ovsdb")
>>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/185
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>>  lib/netlink-protocol.h | 10 ++++++
>>>  lib/netlink-socket.c   |  8 +++++
>>>  lib/route-table.c      | 80 +++++++++++++++++++++++++++++++++---------
>>>  tests/system-route.at  | 64 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 146 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
>>> index 6eaa7035a..e4bb28ac9 100644
>>> --- a/lib/netlink-protocol.h
>>> +++ b/lib/netlink-protocol.h
>>> @@ -155,6 +155,11 @@ enum {
>>>  #define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
>>>  #endif
>>>  
>>> +/* Introduced in v4.4. */
>>> +#ifndef NLM_F_DUMP_FILTERED
>>> +#define NLM_F_DUMP_FILTERED 0x20
>>> +#endif
>>> +
>>>  /* These were introduced all together in 2.6.14.  (We want our programs to
>>>   * support the newer kernel features even if compiled with older headers.) */
>>>  #ifndef NETLINK_ADD_MEMBERSHIP
>>> @@ -168,6 +173,11 @@ enum {
>>>  #define NETLINK_LISTEN_ALL_NSID 8
>>>  #endif
>>>  
>>> +/* Strict checking of netlink arguments introduced in Linux kernel v4.20. */
>>> +#ifndef NETLINK_GET_STRICT_CHK
>>> +#define NETLINK_GET_STRICT_CHK 12
>>> +#endif
>>> +
>>>  /* These were introduced all together in 2.6.23.  (We want our programs to
>>>   * support the newer kernel features even if compiled with older headers.) */
>>>  #ifndef CTRL_ATTR_MCAST_GRP_MAX
>>> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
>>> index 80da20d9f..eea880b2c 100644
>>> --- a/lib/netlink-socket.c
>>> +++ b/lib/netlink-socket.c
>>> @@ -205,6 +205,14 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
>>>          }
>>>      }
>>>  
>>> +    /* Strict checking only supported for NETLINK_ROUTE. */
>>> +    if (protocol == NETLINK_ROUTE
>>> +        && setsockopt(sock->fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
>>> +                      &one, sizeof one) < 0) {
>>> +        VLOG_INFO("netlink: could not enable strict checking (%s)",
>>> +                  ovs_strerror(errno));
>> 
>> Maybe this message is a bit scary for users - but I'm not sure what
>> message might be less alarming.  Something like:
>> 
>> netlink: strict checking off (%s)
>> 
>> Not a big deal, just a kindof nit.  I hope it doesn't become VLOG_WARN
>> later or something, and want to avoid someone reading it and thinking
>> that it is a misclassified warning.
>
> How about we do:
>
>   VLOG(errno == ENOPROTOOPT ? VLL_DBG : VLL_WARN, ... );
>
> i.e. keep it as a debug message for kernels that do not support it
> and WARN if something actually went wrong.  What do you think?

Seems reasonable.

>> 
>> I don't think it could happen often, but if we have opened a large
>> number of nl sockets (maybe some large nl_pool_alloc) then we end up
>> with quite a few of these messages as well.  Maybe we can just vlog this
>> once?
>
> We currently have a single place that opens NETLINK_ROUTE sockets, and
> pools are per-protocol, so we should not have more than one of these.
> Also should not be a problem with DBG.

Ack.

>> 
>>> +    }
>>> +
>>>      retval = get_socket_rcvbuf(sock->fd);
>>>      if (retval < 0) {
>>>          retval = -retval;
>>> diff --git a/lib/route-table.c b/lib/route-table.c
>>> index 9927dcc18..f1fe32714 100644
>>> --- a/lib/route-table.c
>>> +++ b/lib/route-table.c
>>> @@ -26,6 +26,7 @@
>>>  #include <linux/rtnetlink.h>
>>>  #include <net/if.h>
>>>  
>>> +#include "coverage.h"
>>>  #include "hash.h"
>>>  #include "netdev.h"
>>>  #include "netlink.h"
>>> @@ -44,6 +45,8 @@
>>>  
>>>  VLOG_DEFINE_THIS_MODULE(route_table);
>>>  
>>> +COVERAGE_DEFINE(route_table_dump);
>>> +
>>>  struct route_data {
>>>      /* Copied from struct rtmsg. */
>>>      unsigned char rtm_dst_len;
>>> @@ -80,7 +83,7 @@ static struct nln_notifier *name_notifier = NULL;
>>>  
>>>  static bool route_table_valid = false;
>>>  
>>> -static int route_table_reset(void);
>>> +static void route_table_reset(void);
>>>  static void route_table_handle_msg(const struct route_table_msg *);
>>>  static int route_table_parse(struct ofpbuf *, struct route_table_msg *);
>>>  static void route_table_change(const struct route_table_msg *, void *);
>>> @@ -153,26 +156,22 @@ route_table_wait(void)
>>>      ovs_mutex_unlock(&route_table_mutex);
>>>  }
>>>  
>>> -static int
>>> -route_table_reset(void)
>>> +static bool
>>> +route_table_dump_one_table(unsigned char id)
>>>  {
>>> -    struct nl_dump dump;
>>> -    struct rtgenmsg *rtgenmsg;
>>>      uint64_t reply_stub[NL_DUMP_BUFSIZE / 8];
>>>      struct ofpbuf request, reply, buf;
>>> -
>>> -    route_map_clear();
>>> -    netdev_get_addrs_list_flush();
>>> -    route_table_valid = true;
>>> -    rt_change_seq++;
>>> +    struct rtmsg *rq_msg;
>>> +    bool filtered = true;
>>> +    struct nl_dump dump;
>>>  
>>>      ofpbuf_init(&request, 0);
>>>  
>>> -    nl_msg_put_nlmsghdr(&request, sizeof *rtgenmsg, RTM_GETROUTE,
>>> -                        NLM_F_REQUEST);
>>> +    nl_msg_put_nlmsghdr(&request, sizeof *rq_msg, RTM_GETROUTE, NLM_F_REQUEST);
>>>  
>>> -    rtgenmsg = ofpbuf_put_zeros(&request, sizeof *rtgenmsg);
>>> -    rtgenmsg->rtgen_family = AF_UNSPEC;
>>> +    rq_msg = ofpbuf_put_zeros(&request, sizeof *rq_msg);
>>> +    rq_msg->rtm_family = AF_UNSPEC;
>>> +    rq_msg->rtm_table = id;
>> 
>> Any reason to not consider setting RTA_TABLE attribute rather than use
>> rtm_table.  RTA_TABLE supports the extended range, rather than just the
>> 8bits range that rtm_table can fit.
>
> I just wanted to avoid unnecessary memory allocation for an attribute.
> All the tables we care about are in 8 bit range.  And compiler protects
> us from an integer overflow, since all types are set to unsigned char.
>
> We can add RTA_TABLE attribute, but we'll basically add the same field
> twice as we're still putting the rtm_table in the message.
>
> We could put RTA_TABLE just to be extra safe, but I'm not sure.
>
> Does that make sense?

I guess it makes sense.  I think RTA_TABLE has been considered best
practice for quite some time, but since we are only interested in main
routing table, it doesn't matter.

>> 
>>>  
>>>      nl_dump_start(&dump, NETLINK_ROUTE, &request);
>>>      ofpbuf_uninit(&request);
>>> @@ -182,12 +181,43 @@ route_table_reset(void)
>>>          struct route_table_msg msg;
>>>  
>>>          if (route_table_parse(&reply, &msg)) {
>>> +            struct nlmsghdr *nlmsghdr = nl_msg_nlmsghdr(&reply);
>>> +
>>> +            /* Older kernels do not support filtering. */
>>> +            if (!(nlmsghdr->nlmsg_flags & NLM_F_DUMP_FILTERED)) {
>>> +                filtered = false;
>>> +            }
>>>              route_table_handle_msg(&msg);
>>>          }
>>>      }
>>>      ofpbuf_uninit(&buf);
>>> +    nl_dump_done(&dump);
>>> +
>>> +    return filtered;
>>> +}
>>> +
>>> +static void
>>> +route_table_reset(void)
>>> +{
>>> +    unsigned char tables[] = {
>>> +        RT_TABLE_DEFAULT,
>>> +        RT_TABLE_MAIN,
>>> +        RT_TABLE_LOCAL,
>>> +    };
>>>  
>>> -    return nl_dump_done(&dump);
>>> +    route_map_clear();
>>> +    netdev_get_addrs_list_flush();
>>> +    route_table_valid = true;
>>> +    rt_change_seq++;
>>> +
>>> +    COVERAGE_INC(route_table_dump);
>>> +
>>> +    for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
>>> +        if (!route_table_dump_one_table(tables[i])) {
>>> +            /* Got unfiltered reply, no need to dump further. */
>>> +            break;
>>> +        }
>>> +    }
>>>  }
>>>  
>>>  /* Return RTNLGRP_IPV4_ROUTE or RTNLGRP_IPV6_ROUTE on success, 0 on parse
>>> @@ -203,6 +233,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>>          [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
>>>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>>>          [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
>>> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>>>      };
>>>  
>>>      static const struct nl_policy policy6[] = {
>>> @@ -211,6 +242,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>>          [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>>>          [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
>>>          [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
>>> +        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>>>      };
>>>  
>>>      struct nlattr *attrs[ARRAY_SIZE(policy)];
>>> @@ -232,6 +264,7 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>>  
>>>      if (parsed) {
>>>          const struct nlmsghdr *nlmsg;
>>> +        uint32_t table_id;
>>>          int rta_oif;      /* Output interface index. */
>>>  
>>>          nlmsg = buf->data;
>>> @@ -247,6 +280,19 @@ route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
>>>              rtm->rtm_type != RTN_LOCAL) {
>>>              change->relevant = false;
>>>          }
>>> +
>>> +        table_id = rtm->rtm_table;
>>> +        if (attrs[RTA_TABLE]) {
>>> +            table_id = nl_attr_get_u32(attrs[RTA_TABLE]);
>>> +        }
>>> +        /* Do not consider changes in non-standard routing tables. */
>>> +        if (table_id
>>> +            && table_id != RT_TABLE_DEFAULT
>>> +            && table_id != RT_TABLE_MAIN
>>> +            && table_id != RT_TABLE_LOCAL) {
>>> +            change->relevant = false;
>>> +        }
>>> +
>>>          change->nlmsg_type     = nlmsg->nlmsg_type;
>>>          change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? 96 : 0);
>>>          change->rd.local = rtm->rtm_type == RTN_LOCAL;
>>> @@ -312,7 +358,9 @@ static void
>>>  route_table_change(const struct route_table_msg *change OVS_UNUSED,
>>>                     void *aux OVS_UNUSED)
>>>  {
>>> -    route_table_valid = false;
>>> +    if (!change || change->relevant) {
>>> +        route_table_valid = false;
>>> +    }
>>>  }
>>>  
>>>  static void
>>> diff --git a/tests/system-route.at b/tests/system-route.at
>>> index 114aaebc7..c0ecad6cf 100644
>>> --- a/tests/system-route.at
>>> +++ b/tests/system-route.at
>>> @@ -64,3 +64,67 @@ Cached: fc00:db8:beef::13/128 dev br0 GW
> fc00:db8:cafe::1 SRC fc00:db8:cafe::2])
>>>  
>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>  AT_CLEANUP
>>> +
>>> +dnl Checks that OVS doesn't use routes from non-standard tables.
>>> +AT_SETUP([ovs-route - route tables])
>>> +AT_KEYWORDS([route])
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +
>>> +dnl Create tap port.
>>> +on_exit 'ip link del p1-route'
>>> +AT_CHECK([ip tuntap add name p1-route mode tap])
>>> +AT_CHECK([ip link set p1-route up])
>>> +
>>> +dnl Add ip address.
>>> +AT_CHECK([ip addr add 10.0.0.17/24 dev p1-route], [0], [stdout])
>>> +
>>> +dnl Check that OVS catches route updates.
>>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
>>> +
>>> +dnl Add a route to the main routing table and check that OVS caches
>>> +dnl this new route.
>>> +AT_CHECK([ip route add 10.0.0.18/32 dev p1-route])
>>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>>> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17])
>>> +
>>> +dnl Add a route to a custom routing table and check that OVS doesn't cache it.
>>> +AT_CHECK([ip route add 10.0.0.19/32 dev p1-route table 42])
>>> +AT_CHECK([ip route show table 42 | grep 'p1-route' | grep -q '10.0.0.19'])
>>> +dnl Give the main thread a chance to act.
>>> +AT_CHECK([ovs-appctl revalidator/wait])
>>> +dnl Check that OVS didn't learn this route.
>>> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
>>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>>> +Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17
>>> +])
>>> +
>>> +dnl Delete a route from the main table and check that OVS removes the route
>>> +dnl from the cache.
>>> +AT_CHECK([ip route del 10.0.0.18/32 dev p1-route])
>>> +OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
>>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
>>> +
>>> +dnl Delete a route from a custom routing table and check that the cache
>>> +dnl dosn't change.
>>> +AT_CHECK([ip route del 10.0.0.19/32 dev p1-route table 42])
>>> +dnl Give the main thread a chance to act.
>>> +AT_CHECK([ovs-appctl revalidator/wait])
>>> +dnl Check that the cache is still the same.
>>> +AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
>>> +Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
>>> +Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
>>> +])
>>> +
>>> +dnl Delete ip address.
>>> +AT_CHECK([ip addr del 10.0.0.17/24 dev p1-route], [0], [stdout])
>>> +dnl Check that routes were removed from OVS.
>>> +OVS_WAIT_UNTIL([test $(ovs-appctl ovs/route/show | grep -c 'p1-route') -eq 0 ])
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>
diff mbox series

Patch

diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
index 6eaa7035a..e4bb28ac9 100644
--- a/lib/netlink-protocol.h
+++ b/lib/netlink-protocol.h
@@ -155,6 +155,11 @@  enum {
 #define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
 #endif
 
+/* Introduced in v4.4. */
+#ifndef NLM_F_DUMP_FILTERED
+#define NLM_F_DUMP_FILTERED 0x20
+#endif
+
 /* These were introduced all together in 2.6.14.  (We want our programs to
  * support the newer kernel features even if compiled with older headers.) */
 #ifndef NETLINK_ADD_MEMBERSHIP
@@ -168,6 +173,11 @@  enum {
 #define NETLINK_LISTEN_ALL_NSID 8
 #endif
 
+/* Strict checking of netlink arguments introduced in Linux kernel v4.20. */
+#ifndef NETLINK_GET_STRICT_CHK
+#define NETLINK_GET_STRICT_CHK 12
+#endif
+
 /* These were introduced all together in 2.6.23.  (We want our programs to
  * support the newer kernel features even if compiled with older headers.) */
 #ifndef CTRL_ATTR_MCAST_GRP_MAX
diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 80da20d9f..eea880b2c 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -205,6 +205,14 @@  nl_sock_create(int protocol, struct nl_sock **sockp)
         }
     }
 
+    /* Strict checking only supported for NETLINK_ROUTE. */
+    if (protocol == NETLINK_ROUTE
+        && setsockopt(sock->fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
+                      &one, sizeof one) < 0) {
+        VLOG_INFO("netlink: could not enable strict checking (%s)",
+                  ovs_strerror(errno));
+    }
+
     retval = get_socket_rcvbuf(sock->fd);
     if (retval < 0) {
         retval = -retval;
diff --git a/lib/route-table.c b/lib/route-table.c
index 9927dcc18..f1fe32714 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -26,6 +26,7 @@ 
 #include <linux/rtnetlink.h>
 #include <net/if.h>
 
+#include "coverage.h"
 #include "hash.h"
 #include "netdev.h"
 #include "netlink.h"
@@ -44,6 +45,8 @@ 
 
 VLOG_DEFINE_THIS_MODULE(route_table);
 
+COVERAGE_DEFINE(route_table_dump);
+
 struct route_data {
     /* Copied from struct rtmsg. */
     unsigned char rtm_dst_len;
@@ -80,7 +83,7 @@  static struct nln_notifier *name_notifier = NULL;
 
 static bool route_table_valid = false;
 
-static int route_table_reset(void);
+static void route_table_reset(void);
 static void route_table_handle_msg(const struct route_table_msg *);
 static int route_table_parse(struct ofpbuf *, struct route_table_msg *);
 static void route_table_change(const struct route_table_msg *, void *);
@@ -153,26 +156,22 @@  route_table_wait(void)
     ovs_mutex_unlock(&route_table_mutex);
 }
 
-static int
-route_table_reset(void)
+static bool
+route_table_dump_one_table(unsigned char id)
 {
-    struct nl_dump dump;
-    struct rtgenmsg *rtgenmsg;
     uint64_t reply_stub[NL_DUMP_BUFSIZE / 8];
     struct ofpbuf request, reply, buf;
-
-    route_map_clear();
-    netdev_get_addrs_list_flush();
-    route_table_valid = true;
-    rt_change_seq++;
+    struct rtmsg *rq_msg;
+    bool filtered = true;
+    struct nl_dump dump;
 
     ofpbuf_init(&request, 0);
 
-    nl_msg_put_nlmsghdr(&request, sizeof *rtgenmsg, RTM_GETROUTE,
-                        NLM_F_REQUEST);
+    nl_msg_put_nlmsghdr(&request, sizeof *rq_msg, RTM_GETROUTE, NLM_F_REQUEST);
 
-    rtgenmsg = ofpbuf_put_zeros(&request, sizeof *rtgenmsg);
-    rtgenmsg->rtgen_family = AF_UNSPEC;
+    rq_msg = ofpbuf_put_zeros(&request, sizeof *rq_msg);
+    rq_msg->rtm_family = AF_UNSPEC;
+    rq_msg->rtm_table = id;
 
     nl_dump_start(&dump, NETLINK_ROUTE, &request);
     ofpbuf_uninit(&request);
@@ -182,12 +181,43 @@  route_table_reset(void)
         struct route_table_msg msg;
 
         if (route_table_parse(&reply, &msg)) {
+            struct nlmsghdr *nlmsghdr = nl_msg_nlmsghdr(&reply);
+
+            /* Older kernels do not support filtering. */
+            if (!(nlmsghdr->nlmsg_flags & NLM_F_DUMP_FILTERED)) {
+                filtered = false;
+            }
             route_table_handle_msg(&msg);
         }
     }
     ofpbuf_uninit(&buf);
+    nl_dump_done(&dump);
+
+    return filtered;
+}
+
+static void
+route_table_reset(void)
+{
+    unsigned char tables[] = {
+        RT_TABLE_DEFAULT,
+        RT_TABLE_MAIN,
+        RT_TABLE_LOCAL,
+    };
 
-    return nl_dump_done(&dump);
+    route_map_clear();
+    netdev_get_addrs_list_flush();
+    route_table_valid = true;
+    rt_change_seq++;
+
+    COVERAGE_INC(route_table_dump);
+
+    for (size_t i = 0; i < ARRAY_SIZE(tables); i++) {
+        if (!route_table_dump_one_table(tables[i])) {
+            /* Got unfiltered reply, no need to dump further. */
+            break;
+        }
+    }
 }
 
 /* Return RTNLGRP_IPV4_ROUTE or RTNLGRP_IPV6_ROUTE on success, 0 on parse
@@ -203,6 +233,7 @@  route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
         [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
         [RTA_MARK] = { .type = NL_A_U32, .optional = true },
         [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
+        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
     };
 
     static const struct nl_policy policy6[] = {
@@ -211,6 +242,7 @@  route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
         [RTA_MARK] = { .type = NL_A_U32, .optional = true },
         [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
         [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
+        [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
     };
 
     struct nlattr *attrs[ARRAY_SIZE(policy)];
@@ -232,6 +264,7 @@  route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
 
     if (parsed) {
         const struct nlmsghdr *nlmsg;
+        uint32_t table_id;
         int rta_oif;      /* Output interface index. */
 
         nlmsg = buf->data;
@@ -247,6 +280,19 @@  route_table_parse(struct ofpbuf *buf, struct route_table_msg *change)
             rtm->rtm_type != RTN_LOCAL) {
             change->relevant = false;
         }
+
+        table_id = rtm->rtm_table;
+        if (attrs[RTA_TABLE]) {
+            table_id = nl_attr_get_u32(attrs[RTA_TABLE]);
+        }
+        /* Do not consider changes in non-standard routing tables. */
+        if (table_id
+            && table_id != RT_TABLE_DEFAULT
+            && table_id != RT_TABLE_MAIN
+            && table_id != RT_TABLE_LOCAL) {
+            change->relevant = false;
+        }
+
         change->nlmsg_type     = nlmsg->nlmsg_type;
         change->rd.rtm_dst_len = rtm->rtm_dst_len + (ipv4 ? 96 : 0);
         change->rd.local = rtm->rtm_type == RTN_LOCAL;
@@ -312,7 +358,9 @@  static void
 route_table_change(const struct route_table_msg *change OVS_UNUSED,
                    void *aux OVS_UNUSED)
 {
-    route_table_valid = false;
+    if (!change || change->relevant) {
+        route_table_valid = false;
+    }
 }
 
 static void
diff --git a/tests/system-route.at b/tests/system-route.at
index 114aaebc7..c0ecad6cf 100644
--- a/tests/system-route.at
+++ b/tests/system-route.at
@@ -64,3 +64,67 @@  Cached: fc00:db8:beef::13/128 dev br0 GW fc00:db8:cafe::1 SRC fc00:db8:cafe::2])
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+dnl Checks that OVS doesn't use routes from non-standard tables.
+AT_SETUP([ovs-route - route tables])
+AT_KEYWORDS([route])
+OVS_TRAFFIC_VSWITCHD_START()
+
+dnl Create tap port.
+on_exit 'ip link del p1-route'
+AT_CHECK([ip tuntap add name p1-route mode tap])
+AT_CHECK([ip link set p1-route up])
+
+dnl Add ip address.
+AT_CHECK([ip addr add 10.0.0.17/24 dev p1-route], [0], [stdout])
+
+dnl Check that OVS catches route updates.
+OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
+
+dnl Add a route to the main routing table and check that OVS caches
+dnl this new route.
+AT_CHECK([ip route add 10.0.0.18/32 dev p1-route])
+OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
+Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17])
+
+dnl Add a route to a custom routing table and check that OVS doesn't cache it.
+AT_CHECK([ip route add 10.0.0.19/32 dev p1-route table 42])
+AT_CHECK([ip route show table 42 | grep 'p1-route' | grep -q '10.0.0.19'])
+dnl Give the main thread a chance to act.
+AT_CHECK([ovs-appctl revalidator/wait])
+dnl Check that OVS didn't learn this route.
+AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
+Cached: 10.0.0.18/32 dev p1-route SRC 10.0.0.17
+])
+
+dnl Delete a route from the main table and check that OVS removes the route
+dnl from the cache.
+AT_CHECK([ip route del 10.0.0.18/32 dev p1-route])
+OVS_WAIT_UNTIL_EQUAL([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local])
+
+dnl Delete a route from a custom routing table and check that the cache
+dnl dosn't change.
+AT_CHECK([ip route del 10.0.0.19/32 dev p1-route table 42])
+dnl Give the main thread a chance to act.
+AT_CHECK([ovs-appctl revalidator/wait])
+dnl Check that the cache is still the same.
+AT_CHECK([ovs-appctl ovs/route/show | grep 'p1-route' | sort], [0], [dnl
+Cached: 10.0.0.0/24 dev p1-route SRC 10.0.0.17
+Cached: 10.0.0.17/32 dev p1-route SRC 10.0.0.17 local
+])
+
+dnl Delete ip address.
+AT_CHECK([ip addr del 10.0.0.17/24 dev p1-route], [0], [stdout])
+dnl Check that routes were removed from OVS.
+OVS_WAIT_UNTIL([test $(ovs-appctl ovs/route/show | grep -c 'p1-route') -eq 0 ])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP