Message ID | 1500435856-4120-2-git-send-email-xiangxia.m.yue@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | Ben Pfaff |
Headers | show |
On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote: > ovs-router module uses the netdev_open() to get routes. > But this module always calls the netdev_open() with type > which is NULL. This module may open the eth0, vethx, > vxlan_sys_4789, br0 if these network devices exist. And > these device will be opened as 'system' type. > > When debugging, somewhere netdev_ref it. After reverting > "netdev: Fix netdev_open() to adhere to class type if given", > and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show', > the info is shown as below. the vxlan_sys_4789 is up > (eg. ifconfig vxlan_sys_4789 up). > > $ ovs-dpctl show > system@ovs-system: > lookups: hit:4053 missed:118 lost:3 > flows: 0 > masks: hit:4154 total:1 hit/pkt:1.00 > port 0: ovs-system (internal) > port 1: user-ovs-vm (internal) > port 2: vxlan_sys_4789 (vxlan) > > But the info should be as below. > $ ovs-dpctl show > system@ovs-system: > lookups: hit:4053 missed:118 lost:3 > flows: 0 > masks: hit:4154 total:1 hit/pkt:1.00 > port 0: ovs-system (internal) > port 1: user-ovs-vm (internal) > port 2: vxlan_sys_4789 (vxlan: packet_type=ptap) > > Because the netdev-class of 'system' type does not have the > 'get_config', and tunnel vports have 'get_config', then we can > get the config info(eg. packet_type=ptap) of tunnel vports. > > If we only revert the patch, there is a bug all the time. The > patch which Eelco support is fine to me. That patch avoid issue. > URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html > But without it, the patch I support also avoid the problem. > > However we should check the type in the ovs-router module, this > patch works well with the patch Eelco support. > > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Thanks for the bug fix. This patch seems fine but can you help me understand the change to route_table_init()? It isn't obviously connected to the rest of the patch. Thanks, Ben. > diff --git a/lib/route-table.c b/lib/route-table.c > index 67fda31..fc6845f 100644 > --- a/lib/route-table.c > +++ b/lib/route-table.c > @@ -112,7 +112,6 @@ route_table_init(void) > nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE, > (nln_notify_func *) route_table_change, NULL); > > - route_table_reset(); > name_table_init(); > > ovs_mutex_unlock(&route_table_mutex);
We can avoid the deadlock via removing the route_table_reset() from the route_table_init() the call trace is below dp_initialize (ovsthread_once) route_table_init route_table_reset route_table_handle_msg ovs_router_insert__ ovs_router_insert__ get_src_addr get_netdev_type dp_enumerate_types dp_initialize (ovsthread_once) After removing the route_table_reset() from the route_table_init(), the ovs-router works well. Because we run the route_table_run() periodically, and route_table_valid is inited as false (we will call the route_table_reset in this case). So it’s also unnecessary to immediately reset route table in the route_table_init(). On Fri, Aug 4, 2017 at 2:00 AM, Ben Pfaff <blp@ovn.org> wrote: > On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote: >> ovs-router module uses the netdev_open() to get routes. >> But this module always calls the netdev_open() with type >> which is NULL. This module may open the eth0, vethx, >> vxlan_sys_4789, br0 if these network devices exist. And >> these device will be opened as 'system' type. >> >> When debugging, somewhere netdev_ref it. After reverting >> "netdev: Fix netdev_open() to adhere to class type if given", >> and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show', >> the info is shown as below. the vxlan_sys_4789 is up >> (eg. ifconfig vxlan_sys_4789 up). >> >> $ ovs-dpctl show >> system@ovs-system: >> lookups: hit:4053 missed:118 lost:3 >> flows: 0 >> masks: hit:4154 total:1 hit/pkt:1.00 >> port 0: ovs-system (internal) >> port 1: user-ovs-vm (internal) >> port 2: vxlan_sys_4789 (vxlan) >> >> But the info should be as below. >> $ ovs-dpctl show >> system@ovs-system: >> lookups: hit:4053 missed:118 lost:3 >> flows: 0 >> masks: hit:4154 total:1 hit/pkt:1.00 >> port 0: ovs-system (internal) >> port 1: user-ovs-vm (internal) >> port 2: vxlan_sys_4789 (vxlan: packet_type=ptap) >> >> Because the netdev-class of 'system' type does not have the >> 'get_config', and tunnel vports have 'get_config', then we can >> get the config info(eg. packet_type=ptap) of tunnel vports. >> >> If we only revert the patch, there is a bug all the time. The >> patch which Eelco support is fine to me. That patch avoid issue. >> URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html >> But without it, the patch I support also avoid the problem. >> >> However we should check the type in the ovs-router module, this >> patch works well with the patch Eelco support. >> >> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> > > Thanks for the bug fix. > > This patch seems fine but can you help me understand the change to > route_table_init()? It isn't obviously connected to the rest of the > patch. > > Thanks, > > Ben. > >> diff --git a/lib/route-table.c b/lib/route-table.c >> index 67fda31..fc6845f 100644 >> --- a/lib/route-table.c >> +++ b/lib/route-table.c >> @@ -112,7 +112,6 @@ route_table_init(void) >> nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE, >> (nln_notify_func *) route_table_change, NULL); >> >> - route_table_reset(); >> name_table_init(); >> >> ovs_mutex_unlock(&route_table_mutex);
Hi Ben, this patch is ok ? On Fri, Aug 4, 2017 at 10:45 AM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote: > We can avoid the deadlock via removing the route_table_reset() from > the route_table_init() > > the call trace is below > dp_initialize (ovsthread_once) > route_table_init > route_table_reset > route_table_handle_msg > ovs_router_insert__ > > ovs_router_insert__ > get_src_addr > get_netdev_type > dp_enumerate_types > dp_initialize (ovsthread_once) > > > After removing the route_table_reset() from the route_table_init(), > the ovs-router works well. Because we run the route_table_run() > periodically, and route_table_valid is inited as false (we will call > the route_table_reset in this case). So it’s also unnecessary to > immediately reset route table in the route_table_init(). > > On Fri, Aug 4, 2017 at 2:00 AM, Ben Pfaff <blp@ovn.org> wrote: >> On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote: >>> ovs-router module uses the netdev_open() to get routes. >>> But this module always calls the netdev_open() with type >>> which is NULL. This module may open the eth0, vethx, >>> vxlan_sys_4789, br0 if these network devices exist. And >>> these device will be opened as 'system' type. >>> >>> When debugging, somewhere netdev_ref it. After reverting >>> "netdev: Fix netdev_open() to adhere to class type if given", >>> and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show', >>> the info is shown as below. the vxlan_sys_4789 is up >>> (eg. ifconfig vxlan_sys_4789 up). >>> >>> $ ovs-dpctl show >>> system@ovs-system: >>> lookups: hit:4053 missed:118 lost:3 >>> flows: 0 >>> masks: hit:4154 total:1 hit/pkt:1.00 >>> port 0: ovs-system (internal) >>> port 1: user-ovs-vm (internal) >>> port 2: vxlan_sys_4789 (vxlan) >>> >>> But the info should be as below. >>> $ ovs-dpctl show >>> system@ovs-system: >>> lookups: hit:4053 missed:118 lost:3 >>> flows: 0 >>> masks: hit:4154 total:1 hit/pkt:1.00 >>> port 0: ovs-system (internal) >>> port 1: user-ovs-vm (internal) >>> port 2: vxlan_sys_4789 (vxlan: packet_type=ptap) >>> >>> Because the netdev-class of 'system' type does not have the >>> 'get_config', and tunnel vports have 'get_config', then we can >>> get the config info(eg. packet_type=ptap) of tunnel vports. >>> >>> If we only revert the patch, there is a bug all the time. The >>> patch which Eelco support is fine to me. That patch avoid issue. >>> URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html >>> But without it, the patch I support also avoid the problem. >>> >>> However we should check the type in the ovs-router module, this >>> patch works well with the patch Eelco support. >>> >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> >> >> Thanks for the bug fix. >> >> This patch seems fine but can you help me understand the change to >> route_table_init()? It isn't obviously connected to the rest of the >> patch. >> >> Thanks, >> >> Ben. >> >>> diff --git a/lib/route-table.c b/lib/route-table.c >>> index 67fda31..fc6845f 100644 >>> --- a/lib/route-table.c >>> +++ b/lib/route-table.c >>> @@ -112,7 +112,6 @@ route_table_init(void) >>> nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE, >>> (nln_notify_func *) route_table_change, NULL); >>> >>> - route_table_reset(); >>> name_table_init(); >>> >>> ovs_mutex_unlock(&route_table_mutex);
diff --git a/lib/ovs-router.c b/lib/ovs-router.c index ce2f80b..7db03f2 100644 --- a/lib/ovs-router.c +++ b/lib/ovs-router.c @@ -45,6 +45,7 @@ #include "util.h" #include "unaligned.h" #include "openvswitch/vlog.h" +#include "sset.h" VLOG_DEFINE_THIS_MODULE(ovs_router); @@ -138,6 +139,47 @@ static void rt_init_match(struct match *match, uint32_t mark, match->flow.pkt_mark = mark; } +/* Return the network device type of the specified + * 'netdev_name' if successful, otherwise null. + * + * The caller should free it. + * */ +static char * +get_netdev_type(const char *netdev_name) +{ + struct sset dpif_names = SSET_INITIALIZER(&dpif_names), + dpif_types = SSET_INITIALIZER(&dpif_types); + struct dpif_port dpif_port; + const char *dpif_type, *dpif_name; + char *netdev_type = NULL; + + dp_enumerate_types(&dpif_types); + + SSET_FOR_EACH (dpif_type, &dpif_types) { + if (dp_enumerate_names(dpif_type, &dpif_names)) { + continue; + } + + SSET_FOR_EACH (dpif_name, &dpif_names) { + struct dpif *dpif; + if (!dpif_open(dpif_name, dpif_type, &dpif)) { + if (!dpif_port_query_by_name(dpif, netdev_name, &dpif_port)) { + netdev_type = xstrdup(dpif_port.type); + dpif_close(dpif); + goto out; + } + dpif_close(dpif); + } + } + } + +out: + sset_destroy(&dpif_names); + sset_destroy(&dpif_types); + + return netdev_type; +} + static int get_src_addr(const struct in6_addr *ip6_dst, const char output_bridge[], struct in6_addr *psrc) @@ -146,9 +188,11 @@ get_src_addr(const struct in6_addr *ip6_dst, int err, n_in6, i, max_plen = -1; struct netdev *dev; bool is_ipv4; + char *netdev_type = get_netdev_type(output_bridge); - err = netdev_open(output_bridge, NULL, &dev); + err = netdev_open(output_bridge, netdev_type, &dev); if (err) { + free(netdev_type); return err; } @@ -182,6 +226,7 @@ get_src_addr(const struct in6_addr *ip6_dst, out: free(addr6); free(mask); + free(netdev_type); netdev_close(dev); return err; } diff --git a/lib/route-table.c b/lib/route-table.c index 67fda31..fc6845f 100644 --- a/lib/route-table.c +++ b/lib/route-table.c @@ -112,7 +112,6 @@ route_table_init(void) nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE, (nln_notify_func *) route_table_change, NULL); - route_table_reset(); name_table_init(); ovs_mutex_unlock(&route_table_mutex);
ovs-router module uses the netdev_open() to get routes. But this module always calls the netdev_open() with type which is NULL. This module may open the eth0, vethx, vxlan_sys_4789, br0 if these network devices exist. And these device will be opened as 'system' type. When debugging, somewhere netdev_ref it. After reverting "netdev: Fix netdev_open() to adhere to class type if given", and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show', the info is shown as below. the vxlan_sys_4789 is up (eg. ifconfig vxlan_sys_4789 up). $ ovs-dpctl show system@ovs-system: lookups: hit:4053 missed:118 lost:3 flows: 0 masks: hit:4154 total:1 hit/pkt:1.00 port 0: ovs-system (internal) port 1: user-ovs-vm (internal) port 2: vxlan_sys_4789 (vxlan) But the info should be as below. $ ovs-dpctl show system@ovs-system: lookups: hit:4053 missed:118 lost:3 flows: 0 masks: hit:4154 total:1 hit/pkt:1.00 port 0: ovs-system (internal) port 1: user-ovs-vm (internal) port 2: vxlan_sys_4789 (vxlan: packet_type=ptap) Because the netdev-class of 'system' type does not have the 'get_config', and tunnel vports have 'get_config', then we can get the config info(eg. packet_type=ptap) of tunnel vports. If we only revert the patch, there is a bug all the time. The patch which Eelco support is fine to me. That patch avoid issue. URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html But without it, the patch I support also avoid the problem. However we should check the type in the ovs-router module, this patch works well with the patch Eelco support. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> --- v2: fix memory leak. --- lib/ovs-router.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++- lib/route-table.c | 1 - 2 files changed, 46 insertions(+), 2 deletions(-)