[ovs-dev,v2,2/3] ovs-router: Set suitable type to netdev_open().
diff mbox

Message ID 1500435856-4120-2-git-send-email-xiangxia.m.yue@gmail.com
State Changes Requested
Delegated to: Ben Pfaff
Headers show

Commit Message

Tonghao Zhang July 19, 2017, 3:44 a.m. UTC
ovs-router module uses the netdev_open() to get routes.
But this module always calls the netdev_open() with type
which is NULL. This module may open the eth0, vethx,
vxlan_sys_4789, br0 if these network devices exist. And
these device will be opened as 'system' type.

When debugging, somewhere netdev_ref it.  After reverting
"netdev: Fix netdev_open() to adhere to class type if given",
and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show',
the info is shown as below. the vxlan_sys_4789 is up
(eg. ifconfig vxlan_sys_4789 up).

$ ovs-dpctl show
system@ovs-system:
	lookups: hit:4053 missed:118 lost:3
	flows: 0
	masks: hit:4154 total:1 hit/pkt:1.00
	port 0: ovs-system (internal)
	port 1: user-ovs-vm (internal)
	port 2: vxlan_sys_4789 (vxlan)

But the info should be as below.
$ ovs-dpctl show
system@ovs-system:
	lookups: hit:4053 missed:118 lost:3
	flows: 0
	masks: hit:4154 total:1 hit/pkt:1.00
	port 0: ovs-system (internal)
	port 1: user-ovs-vm (internal)
	port 2: vxlan_sys_4789 (vxlan: packet_type=ptap)

Because the netdev-class of 'system' type does not have the
'get_config', and tunnel vports have 'get_config', then we can
get the config info(eg. packet_type=ptap) of tunnel vports.

If we only revert the patch, there is a bug all the time. The
patch which Eelco support is fine to me. That patch avoid issue.
URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html
But without it, the patch I support also avoid the problem.

However we should check the type in the ovs-router module, this
patch works well with the patch Eelco support.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
	v2: fix memory leak.
---
 lib/ovs-router.c  | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
 lib/route-table.c |  1 -
 2 files changed, 46 insertions(+), 2 deletions(-)

Comments

Ben Pfaff Aug. 3, 2017, 6 p.m. UTC | #1
On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote:
> ovs-router module uses the netdev_open() to get routes.
> But this module always calls the netdev_open() with type
> which is NULL. This module may open the eth0, vethx,
> vxlan_sys_4789, br0 if these network devices exist. And
> these device will be opened as 'system' type.
> 
> When debugging, somewhere netdev_ref it.  After reverting
> "netdev: Fix netdev_open() to adhere to class type if given",
> and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show',
> the info is shown as below. the vxlan_sys_4789 is up
> (eg. ifconfig vxlan_sys_4789 up).
> 
> $ ovs-dpctl show
> system@ovs-system:
> 	lookups: hit:4053 missed:118 lost:3
> 	flows: 0
> 	masks: hit:4154 total:1 hit/pkt:1.00
> 	port 0: ovs-system (internal)
> 	port 1: user-ovs-vm (internal)
> 	port 2: vxlan_sys_4789 (vxlan)
> 
> But the info should be as below.
> $ ovs-dpctl show
> system@ovs-system:
> 	lookups: hit:4053 missed:118 lost:3
> 	flows: 0
> 	masks: hit:4154 total:1 hit/pkt:1.00
> 	port 0: ovs-system (internal)
> 	port 1: user-ovs-vm (internal)
> 	port 2: vxlan_sys_4789 (vxlan: packet_type=ptap)
> 
> Because the netdev-class of 'system' type does not have the
> 'get_config', and tunnel vports have 'get_config', then we can
> get the config info(eg. packet_type=ptap) of tunnel vports.
> 
> If we only revert the patch, there is a bug all the time. The
> patch which Eelco support is fine to me. That patch avoid issue.
> URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html
> But without it, the patch I support also avoid the problem.
> 
> However we should check the type in the ovs-router module, this
> patch works well with the patch Eelco support.
> 
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Thanks for the bug fix.

This patch seems fine but can you help me understand the change to
route_table_init()?  It isn't obviously connected to the rest of the
patch.

Thanks,

Ben.

> diff --git a/lib/route-table.c b/lib/route-table.c
> index 67fda31..fc6845f 100644
> --- a/lib/route-table.c
> +++ b/lib/route-table.c
> @@ -112,7 +112,6 @@ route_table_init(void)
>          nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE,
>                              (nln_notify_func *) route_table_change, NULL);
>  
> -    route_table_reset();
>      name_table_init();
>  
>      ovs_mutex_unlock(&route_table_mutex);
Tonghao Zhang Aug. 4, 2017, 2:45 a.m. UTC | #2
We can avoid the deadlock via  removing the route_table_reset() from
the route_table_init()

the call trace is below
dp_initialize (ovsthread_once)
    route_table_init
        route_table_reset
             route_table_handle_msg
                 ovs_router_insert__

ovs_router_insert__
       get_src_addr
            get_netdev_type
                   dp_enumerate_types
                         dp_initialize (ovsthread_once)


After removing the route_table_reset() from the route_table_init(),
the ovs-router works well. Because we run the route_table_run()
periodically, and route_table_valid is inited as false (we will call
the route_table_reset in this case). So it’s also unnecessary to
immediately  reset route table in the route_table_init().

On Fri, Aug 4, 2017 at 2:00 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote:
>> ovs-router module uses the netdev_open() to get routes.
>> But this module always calls the netdev_open() with type
>> which is NULL. This module may open the eth0, vethx,
>> vxlan_sys_4789, br0 if these network devices exist. And
>> these device will be opened as 'system' type.
>>
>> When debugging, somewhere netdev_ref it.  After reverting
>> "netdev: Fix netdev_open() to adhere to class type if given",
>> and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show',
>> the info is shown as below. the vxlan_sys_4789 is up
>> (eg. ifconfig vxlan_sys_4789 up).
>>
>> $ ovs-dpctl show
>> system@ovs-system:
>>       lookups: hit:4053 missed:118 lost:3
>>       flows: 0
>>       masks: hit:4154 total:1 hit/pkt:1.00
>>       port 0: ovs-system (internal)
>>       port 1: user-ovs-vm (internal)
>>       port 2: vxlan_sys_4789 (vxlan)
>>
>> But the info should be as below.
>> $ ovs-dpctl show
>> system@ovs-system:
>>       lookups: hit:4053 missed:118 lost:3
>>       flows: 0
>>       masks: hit:4154 total:1 hit/pkt:1.00
>>       port 0: ovs-system (internal)
>>       port 1: user-ovs-vm (internal)
>>       port 2: vxlan_sys_4789 (vxlan: packet_type=ptap)
>>
>> Because the netdev-class of 'system' type does not have the
>> 'get_config', and tunnel vports have 'get_config', then we can
>> get the config info(eg. packet_type=ptap) of tunnel vports.
>>
>> If we only revert the patch, there is a bug all the time. The
>> patch which Eelco support is fine to me. That patch avoid issue.
>> URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html
>> But without it, the patch I support also avoid the problem.
>>
>> However we should check the type in the ovs-router module, this
>> patch works well with the patch Eelco support.
>>
>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> Thanks for the bug fix.
>
> This patch seems fine but can you help me understand the change to
> route_table_init()?  It isn't obviously connected to the rest of the
> patch.
>
> Thanks,
>
> Ben.
>
>> diff --git a/lib/route-table.c b/lib/route-table.c
>> index 67fda31..fc6845f 100644
>> --- a/lib/route-table.c
>> +++ b/lib/route-table.c
>> @@ -112,7 +112,6 @@ route_table_init(void)
>>          nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE,
>>                              (nln_notify_func *) route_table_change, NULL);
>>
>> -    route_table_reset();
>>      name_table_init();
>>
>>      ovs_mutex_unlock(&route_table_mutex);
Tonghao Zhang Aug. 11, 2017, 10:02 a.m. UTC | #3
Hi Ben, this patch is ok ?

On Fri, Aug 4, 2017 at 10:45 AM, Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> We can avoid the deadlock via  removing the route_table_reset() from
> the route_table_init()
>
> the call trace is below
> dp_initialize (ovsthread_once)
>     route_table_init
>         route_table_reset
>              route_table_handle_msg
>                  ovs_router_insert__
>
> ovs_router_insert__
>        get_src_addr
>             get_netdev_type
>                    dp_enumerate_types
>                          dp_initialize (ovsthread_once)
>
>
> After removing the route_table_reset() from the route_table_init(),
> the ovs-router works well. Because we run the route_table_run()
> periodically, and route_table_valid is inited as false (we will call
> the route_table_reset in this case). So it’s also unnecessary to
> immediately  reset route table in the route_table_init().
>
> On Fri, Aug 4, 2017 at 2:00 AM, Ben Pfaff <blp@ovn.org> wrote:
>> On Tue, Jul 18, 2017 at 08:44:15PM -0700, Tonghao Zhang wrote:
>>> ovs-router module uses the netdev_open() to get routes.
>>> But this module always calls the netdev_open() with type
>>> which is NULL. This module may open the eth0, vethx,
>>> vxlan_sys_4789, br0 if these network devices exist. And
>>> these device will be opened as 'system' type.
>>>
>>> When debugging, somewhere netdev_ref it.  After reverting
>>> "netdev: Fix netdev_open() to adhere to class type if given",
>>> and when we call the 'ovs-appctl dpctl/show' or 'ovs-dpctl show',
>>> the info is shown as below. the vxlan_sys_4789 is up
>>> (eg. ifconfig vxlan_sys_4789 up).
>>>
>>> $ ovs-dpctl show
>>> system@ovs-system:
>>>       lookups: hit:4053 missed:118 lost:3
>>>       flows: 0
>>>       masks: hit:4154 total:1 hit/pkt:1.00
>>>       port 0: ovs-system (internal)
>>>       port 1: user-ovs-vm (internal)
>>>       port 2: vxlan_sys_4789 (vxlan)
>>>
>>> But the info should be as below.
>>> $ ovs-dpctl show
>>> system@ovs-system:
>>>       lookups: hit:4053 missed:118 lost:3
>>>       flows: 0
>>>       masks: hit:4154 total:1 hit/pkt:1.00
>>>       port 0: ovs-system (internal)
>>>       port 1: user-ovs-vm (internal)
>>>       port 2: vxlan_sys_4789 (vxlan: packet_type=ptap)
>>>
>>> Because the netdev-class of 'system' type does not have the
>>> 'get_config', and tunnel vports have 'get_config', then we can
>>> get the config info(eg. packet_type=ptap) of tunnel vports.
>>>
>>> If we only revert the patch, there is a bug all the time. The
>>> patch which Eelco support is fine to me. That patch avoid issue.
>>> URL: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335560.html
>>> But without it, the patch I support also avoid the problem.
>>>
>>> However we should check the type in the ovs-router module, this
>>> patch works well with the patch Eelco support.
>>>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>
>> Thanks for the bug fix.
>>
>> This patch seems fine but can you help me understand the change to
>> route_table_init()?  It isn't obviously connected to the rest of the
>> patch.
>>
>> Thanks,
>>
>> Ben.
>>
>>> diff --git a/lib/route-table.c b/lib/route-table.c
>>> index 67fda31..fc6845f 100644
>>> --- a/lib/route-table.c
>>> +++ b/lib/route-table.c
>>> @@ -112,7 +112,6 @@ route_table_init(void)
>>>          nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE,
>>>                              (nln_notify_func *) route_table_change, NULL);
>>>
>>> -    route_table_reset();
>>>      name_table_init();
>>>
>>>      ovs_mutex_unlock(&route_table_mutex);

Patch
diff mbox

diff --git a/lib/ovs-router.c b/lib/ovs-router.c
index ce2f80b..7db03f2 100644
--- a/lib/ovs-router.c
+++ b/lib/ovs-router.c
@@ -45,6 +45,7 @@ 
 #include "util.h"
 #include "unaligned.h"
 #include "openvswitch/vlog.h"
+#include "sset.h"
 
 VLOG_DEFINE_THIS_MODULE(ovs_router);
 
@@ -138,6 +139,47 @@  static void rt_init_match(struct match *match, uint32_t mark,
     match->flow.pkt_mark = mark;
 }
 
+/* Return the network device type of the specified
+ * 'netdev_name' if successful, otherwise null.
+ *
+ * The caller should free it.
+ * */
+static char *
+get_netdev_type(const char *netdev_name)
+{
+    struct sset dpif_names = SSET_INITIALIZER(&dpif_names),
+                dpif_types = SSET_INITIALIZER(&dpif_types);
+    struct dpif_port dpif_port;
+    const char *dpif_type, *dpif_name;
+    char *netdev_type = NULL;
+
+    dp_enumerate_types(&dpif_types);
+
+    SSET_FOR_EACH (dpif_type, &dpif_types) {
+        if (dp_enumerate_names(dpif_type, &dpif_names)) {
+            continue;
+        }
+
+        SSET_FOR_EACH (dpif_name, &dpif_names) {
+            struct dpif *dpif;
+            if (!dpif_open(dpif_name, dpif_type, &dpif)) {
+                if (!dpif_port_query_by_name(dpif, netdev_name, &dpif_port)) {
+                    netdev_type = xstrdup(dpif_port.type);
+                    dpif_close(dpif);
+                    goto out;
+                }
+                dpif_close(dpif);
+            }
+        }
+    }
+
+out:
+    sset_destroy(&dpif_names);
+    sset_destroy(&dpif_types);
+
+    return netdev_type;
+}
+
 static int
 get_src_addr(const struct in6_addr *ip6_dst,
              const char output_bridge[], struct in6_addr *psrc)
@@ -146,9 +188,11 @@  get_src_addr(const struct in6_addr *ip6_dst,
     int err, n_in6, i, max_plen = -1;
     struct netdev *dev;
     bool is_ipv4;
+    char *netdev_type = get_netdev_type(output_bridge);
 
-    err = netdev_open(output_bridge, NULL, &dev);
+    err = netdev_open(output_bridge, netdev_type, &dev);
     if (err) {
+        free(netdev_type);
         return err;
     }
 
@@ -182,6 +226,7 @@  get_src_addr(const struct in6_addr *ip6_dst,
 out:
     free(addr6);
     free(mask);
+    free(netdev_type);
     netdev_close(dev);
     return err;
 }
diff --git a/lib/route-table.c b/lib/route-table.c
index 67fda31..fc6845f 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -112,7 +112,6 @@  route_table_init(void)
         nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE,
                             (nln_notify_func *) route_table_change, NULL);
 
-    route_table_reset();
     name_table_init();
 
     ovs_mutex_unlock(&route_table_mutex);