diff mbox series

[ovs-dev,v4] netdev-vport : Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.

Message ID MEYP282MB330286C88D76C634B380C4E4CD6F9@MEYP282MB3302.AUSP282.PROD.OUTLOOK.COM
State Accepted
Headers show
Series [ovs-dev,v4] netdev-vport : Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs. | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

miter Dec. 8, 2021, 2:57 a.m. UTC
From: linhuang <linhuang@ruijie.com.cn>

Userspace tunnel doesn't have a valid device in the kernel. So
get_ifindex() function (ioctl) always get error during
adding a port, deleting a port or updating a port status.

The info log is
"2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
on vxlan_sys_4789 device failed: No such device"

If there are a lot of userspace tunnel ports on a bridge, the
iface_refresh_netdev_status() function will spend a lot of time.

So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
return -ENODEV.

Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Test-by: Mike Pattrick <mkp@redhat.com>
---
 lib/netdev-vport.c | 6 ++++++
 vswitchd/bridge.c  | 2 ++
 2 files changed, 8 insertions(+)

--
2.27.0

Comments

Ilya Maximets Dec. 9, 2021, 1:16 p.m. UTC | #1
On 12/8/21 03:57, lin huang wrote:
> From: linhuang <linhuang@ruijie.com.cn>
> 
> Userspace tunnel doesn't have a valid device in the kernel. So
> get_ifindex() function (ioctl) always get error during
> adding a port, deleting a port or updating a port status.
> 
> The info log is
> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
> on vxlan_sys_4789 device failed: No such device"
> 
> If there are a lot of userspace tunnel ports on a bridge, the
> iface_refresh_netdev_status() function will spend a lot of time.
> 
> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
> return -ENODEV.
> 
> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
> Test-by: Mike Pattrick <mkp@redhat.com>
> ---
>  lib/netdev-vport.c | 6 ++++++
>  vswitchd/bridge.c  | 2 ++
>  2 files changed, 8 insertions(+)
> 

Applied.  Thanks!

Best regards, Ilya Maximets.
Roi Dayan Dec. 14, 2021, 11:59 a.m. UTC | #2
On 2021-12-09 3:16 PM, Ilya Maximets wrote:
> On 12/8/21 03:57, lin huang wrote:
>> From: linhuang <linhuang@ruijie.com.cn>
>>
>> Userspace tunnel doesn't have a valid device in the kernel. So
>> get_ifindex() function (ioctl) always get error during
>> adding a port, deleting a port or updating a port status.
>>
>> The info log is
>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>> on vxlan_sys_4789 device failed: No such device"
>>
>> If there are a lot of userspace tunnel ports on a bridge, the
>> iface_refresh_netdev_status() function will spend a lot of time.
>>
>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>> return -ENODEV.
>>
>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>> Test-by: Mike Pattrick <mkp@redhat.com>
>> ---
>>   lib/netdev-vport.c | 6 ++++++
>>   vswitchd/bridge.c  | 2 ++
>>   2 files changed, 8 insertions(+)
>>
> 
> Applied.  Thanks!
> 
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C8865bff084424563a30308d9bb16357f%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637746526329615023%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=53XCrfcyaY1orDpnmjdf7FV5espcWKPElCMDJKVU5Jw%3D&amp;reserved=0


Hi,

We encounter an offload issue with this commit that OVS doesn't add
rules to TC and keeps on with OVS dp.

To reproduce the issue, configure ovs with bridge and 2 ports,
restart openvswitch service and then do ping between the ports.
Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
issue and ovs calls TC again.

 From first look it's because br->ofproto->type is not the same pointer
usually set in netdev_ports_insert() which can be dpi_class->type or
static "system" and in the netdev_ports_* functions the code uses
pointer equal instead of strcmp() to check the type.
So netdev_ports_lookup() returns error now.

2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX 
parse_flow_put 2243 get port for type system ret (nil)

Even after changing all netdev_ports_* to use strcmp() and
netdev_ports_lookup() is ok and we do get to tc flow put.
Looks like we get an error from tc to offload. so checking this part
now.

2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to 
offload flow: No such device: enp8s0f0_0

So still looking on this but wanted to share about the behavior.

Thanks,
Roi
Roi Dayan Dec. 14, 2021, 12:07 p.m. UTC | #3
On 2021-12-14 1:59 PM, Roi Dayan wrote:
> 
> 
> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>> On 12/8/21 03:57, lin huang wrote:
>>> From: linhuang <linhuang@ruijie.com.cn>
>>>
>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>> get_ifindex() function (ioctl) always get error during
>>> adding a port, deleting a port or updating a port status.
>>>
>>> The info log is
>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>> on vxlan_sys_4789 device failed: No such device"
>>>
>>> If there are a lot of userspace tunnel ports on a bridge, the
>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>
>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>> return -ENODEV.
>>>
>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>> ---
>>>   lib/netdev-vport.c | 6 ++++++
>>>   vswitchd/bridge.c  | 2 ++
>>>   2 files changed, 8 insertions(+)
>>>
>>
>> Applied.  Thanks!
>>
>> Best regards, Ilya Maximets.
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C8865bff084424563a30308d9bb16357f%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637746526329615023%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=53XCrfcyaY1orDpnmjdf7FV5espcWKPElCMDJKVU5Jw%3D&amp;reserved=0
> 
> 
> Hi,
> 
> We encounter an offload issue with this commit that OVS doesn't add
> rules to TC and keeps on with OVS dp.
> 
> To reproduce the issue, configure ovs with bridge and 2 ports,
> restart openvswitch service and then do ping between the ports.
> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
> issue and ovs calls TC again.
> 
>  From first look it's because br->ofproto->type is not the same pointer
> usually set in netdev_ports_insert() which can be dpi_class->type or
> static "system" and in the netdev_ports_* functions the code uses
> pointer equal instead of strcmp() to check the type.
> So netdev_ports_lookup() returns error now.
> 
> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX 
> parse_flow_put 2243 get port for type system ret (nil)
> 
> Even after changing all netdev_ports_* to use strcmp() and
> netdev_ports_lookup() is ok and we do get to tc flow put.
> Looks like we get an error from tc to offload. so checking this part
> now.
> 
> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to 
> offload flow: No such device: enp8s0f0_0
> 
> So still looking on this but wanted to share about the behavior.
> 
> Thanks,
> Roi

I forgot to mention but just to make it clear,
need to set hw-offload=true.
Ilya Maximets Dec. 14, 2021, 1:14 p.m. UTC | #4
On 12/14/21 13:07, Roi Dayan wrote:
> 
> 
> On 2021-12-14 1:59 PM, Roi Dayan wrote:
>>
>>
>> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>>> On 12/8/21 03:57, lin huang wrote:
>>>> From: linhuang <linhuang@ruijie.com.cn>
>>>>
>>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>>> get_ifindex() function (ioctl) always get error during
>>>> adding a port, deleting a port or updating a port status.
>>>>
>>>> The info log is
>>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>>> on vxlan_sys_4789 device failed: No such device"
>>>>
>>>> If there are a lot of userspace tunnel ports on a bridge, the
>>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>>
>>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>>> return -ENODEV.
>>>>
>>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>>> ---
>>>>   lib/netdev-vport.c | 6 ++++++
>>>>   vswitchd/bridge.c  | 2 ++
>>>>   2 files changed, 8 insertions(+)
>>>>
>>>
>>> Applied.  Thanks!
>>>
>>> Best regards, Ilya Maximets.
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C8865bff084424563a30308d9bb16357f%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637746526329615023%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=53XCrfcyaY1orDpnmjdf7FV5espcWKPElCMDJKVU5Jw%3D&amp;reserved=0
>>
>>
>> Hi,
>>
>> We encounter an offload issue with this commit that OVS doesn't add
>> rules to TC and keeps on with OVS dp.
>>
>> To reproduce the issue, configure ovs with bridge and 2 ports,
>> restart openvswitch service and then do ping between the ports.
>> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
>> issue and ovs calls TC again.
>>
>>  From first look it's because br->ofproto->type is not the same pointer
>> usually set in netdev_ports_insert() which can be dpi_class->type or
>> static "system" and in the netdev_ports_* functions the code uses
>> pointer equal instead of strcmp() to check the type.
>> So netdev_ports_lookup() returns error now.
>>
>> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX parse_flow_put 2243 get port for type system ret (nil)
>>
>> Even after changing all netdev_ports_* to use strcmp() and
>> netdev_ports_lookup() is ok and we do get to tc flow put.
>> Looks like we get an error from tc to offload. so checking this part
>> now.
>>
>> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to offload flow: No such device: enp8s0f0_0
>>
>> So still looking on this but wanted to share about the behavior.
>>
>> Thanks,
>> Roi
> 
> I forgot to mention but just to make it clear,
> need to set hw-offload=true.
> 

Ugh.  Sorry, my fault.  Looking at the patch again now and this version
seems weird.  I don't know WTF I was thinking.  The main thing is that
we're calling netdev_set_dpif_type() twice with different arguments.
First time with br->ofproto->type as a type and second time with
dpif_normalize_type(dpif_type(dpif)).  And these are not the same.
And as you correctly noticed br->ofproto->type is not a constant string
unlike the normalized one, hence cannot be checked by a pointer comparison.


Could you try this:

diff --git a/lib/dpif.c b/lib/dpif.c
index 38bcb47cb..cff0bc2db 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -364,7 +364,8 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
             err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
 
             if (!err) {
-                netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
+                netdev_set_dpif_type(netdev, dpif_type_str);
+                netdev_ports_insert(netdev, &dpif_port);
                 netdev_close(netdev);
             } else {
                 VLOG_WARN("could not open netdev %s type %s: %s",
@@ -602,10 +603,12 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
             const char *dpif_type_str = dpif_normalize_type(dpif_type(dpif));
             struct dpif_port dpif_port;
 
+            netdev_set_dpif_type(netdev, dpif_type_str);
+
             dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
             dpif_port.name = CONST_CAST(char *, netdev_name);
             dpif_port.port_no = port_no;
-            netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
+            netdev_ports_insert(netdev, &dpif_port);
         }
     } else {
         VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 8075cfbd8..b8dc108f3 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -573,12 +573,14 @@ netdev_ports_lookup(odp_port_t port_no, const char *dpif_type)
 }
 
 int
-netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
-                    struct dpif_port *dpif_port)
+netdev_ports_insert(struct netdev *netdev, struct dpif_port *dpif_port)
 {
+    const char *dpif_type = netdev_get_dpif_type(netdev);
     struct port_to_netdev_data *data;
     int ifindex = netdev_get_ifindex(netdev);
 
+    ovs_assert(dpif_type);
+
     ovs_rwlock_wrlock(&netdev_hmap_rwlock);
     if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) {
         ovs_rwlock_unlock(&netdev_hmap_rwlock);
@@ -596,8 +598,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
         data->ifindex = -1;
     }
 
-    netdev_set_dpif_type(netdev, dpif_type);
-
     hmap_insert(&port_to_netdev, &data->portno_node,
                 netdev_ports_hash(dpif_port->port_no, dpif_type));
     ovs_rwlock_unlock(&netdev_hmap_rwlock);
diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
index e7fcedae9..43a98d499 100644
--- a/lib/netdev-offload.h
+++ b/lib/netdev-offload.h
@@ -108,8 +108,7 @@ bool netdev_is_offload_rebalance_policy_enabled(void);
 int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
 
 struct dpif_port;
-int netdev_ports_insert(struct netdev *, const char *dpif_type,
-                        struct dpif_port *);
+int netdev_ports_insert(struct netdev *, struct dpif_port *);
 struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type);
 int netdev_ports_remove(odp_port_t port, const char *dpif_type);
 odp_port_t netdev_ifindex_to_odp_port(int ifindex);
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 513ef7ea9..5223aa897 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2052,8 +2052,6 @@ iface_do_create(const struct bridge *br,
         goto error;
     }
 
-    netdev_set_dpif_type(netdev, br->ofproto->type);
-
     error = iface_set_netdev_config(iface_cfg, netdev, errp);
     if (error) {
         goto error;
---

?

Or else, we can just revert the patch.

Best regards, Ilya Maximets.
Roi Dayan Dec. 15, 2021, 9:50 a.m. UTC | #5
On 2021-12-14 3:14 PM, Ilya Maximets wrote:
> On 12/14/21 13:07, Roi Dayan wrote:
>>
>>
>> On 2021-12-14 1:59 PM, Roi Dayan wrote:
>>>
>>>
>>> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>>>> On 12/8/21 03:57, lin huang wrote:
>>>>> From: linhuang <linhuang@ruijie.com.cn>
>>>>>
>>>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>>>> get_ifindex() function (ioctl) always get error during
>>>>> adding a port, deleting a port or updating a port status.
>>>>>
>>>>> The info log is
>>>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>>>> on vxlan_sys_4789 device failed: No such device"
>>>>>
>>>>> If there are a lot of userspace tunnel ports on a bridge, the
>>>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>>>
>>>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>>>> return -ENODEV.
>>>>>
>>>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>>>> ---
>>>>>    lib/netdev-vport.c | 6 ++++++
>>>>>    vswitchd/bridge.c  | 2 ++
>>>>>    2 files changed, 8 insertions(+)
>>>>>
>>>>
>>>> Applied.  Thanks!
>>>>
>>>> Best regards, Ilya Maximets.
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C6700a97a81bd41ff4a2208d9bf03a65f%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637750844672247450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=jdOCdoJ9gWoj4kbl6dRTRAg0vIXdgoHYKAfVRjg1z44%3D&amp;reserved=0
>>>
>>>
>>> Hi,
>>>
>>> We encounter an offload issue with this commit that OVS doesn't add
>>> rules to TC and keeps on with OVS dp.
>>>
>>> To reproduce the issue, configure ovs with bridge and 2 ports,
>>> restart openvswitch service and then do ping between the ports.
>>> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
>>> issue and ovs calls TC again.
>>>
>>>   From first look it's because br->ofproto->type is not the same pointer
>>> usually set in netdev_ports_insert() which can be dpi_class->type or
>>> static "system" and in the netdev_ports_* functions the code uses
>>> pointer equal instead of strcmp() to check the type.
>>> So netdev_ports_lookup() returns error now.
>>>
>>> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX parse_flow_put 2243 get port for type system ret (nil)
>>>
>>> Even after changing all netdev_ports_* to use strcmp() and
>>> netdev_ports_lookup() is ok and we do get to tc flow put.
>>> Looks like we get an error from tc to offload. so checking this part
>>> now.
>>>
>>> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to offload flow: No such device: enp8s0f0_0
>>>
>>> So still looking on this but wanted to share about the behavior.
>>>
>>> Thanks,
>>> Roi
>>
>> I forgot to mention but just to make it clear,
>> need to set hw-offload=true.
>>
> 
> Ugh.  Sorry, my fault.  Looking at the patch again now and this version
> seems weird.  I don't know WTF I was thinking.  The main thing is that
> we're calling netdev_set_dpif_type() twice with different arguments.
> First time with br->ofproto->type as a type and second time with
> dpif_normalize_type(dpif_type(dpif)).  And these are not the same.
> And as you correctly noticed br->ofproto->type is not a constant string
> unlike the normalized one, hence cannot be checked by a pointer comparison.
> 
> 
> Could you try this:
> 
> diff --git a/lib/dpif.c b/lib/dpif.c
> index 38bcb47cb..cff0bc2db 100644
> --- a/lib/dpif.c
> +++ b/lib/dpif.c
> @@ -364,7 +364,8 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
>               err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
>   
>               if (!err) {
> -                netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
> +                netdev_set_dpif_type(netdev, dpif_type_str);
> +                netdev_ports_insert(netdev, &dpif_port);
>                   netdev_close(netdev);
>               } else {
>                   VLOG_WARN("could not open netdev %s type %s: %s",
> @@ -602,10 +603,12 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
>               const char *dpif_type_str = dpif_normalize_type(dpif_type(dpif));
>               struct dpif_port dpif_port;
>   
> +            netdev_set_dpif_type(netdev, dpif_type_str);
> +
>               dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
>               dpif_port.name = CONST_CAST(char *, netdev_name);
>               dpif_port.port_no = port_no;
> -            netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
> +            netdev_ports_insert(netdev, &dpif_port);
>           }
>       } else {
>           VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index 8075cfbd8..b8dc108f3 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -573,12 +573,14 @@ netdev_ports_lookup(odp_port_t port_no, const char *dpif_type)
>   }
>   
>   int
> -netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
> -                    struct dpif_port *dpif_port)
> +netdev_ports_insert(struct netdev *netdev, struct dpif_port *dpif_port)
>   {
> +    const char *dpif_type = netdev_get_dpif_type(netdev);
>       struct port_to_netdev_data *data;
>       int ifindex = netdev_get_ifindex(netdev);
>   
> +    ovs_assert(dpif_type);
> +
>       ovs_rwlock_wrlock(&netdev_hmap_rwlock);
>       if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) {
>           ovs_rwlock_unlock(&netdev_hmap_rwlock);
> @@ -596,8 +598,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>           data->ifindex = -1;
>       }
>   
> -    netdev_set_dpif_type(netdev, dpif_type);
> -
>       hmap_insert(&port_to_netdev, &data->portno_node,
>                   netdev_ports_hash(dpif_port->port_no, dpif_type));
>       ovs_rwlock_unlock(&netdev_hmap_rwlock);
> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
> index e7fcedae9..43a98d499 100644
> --- a/lib/netdev-offload.h
> +++ b/lib/netdev-offload.h
> @@ -108,8 +108,7 @@ bool netdev_is_offload_rebalance_policy_enabled(void);
>   int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
>   
>   struct dpif_port;
> -int netdev_ports_insert(struct netdev *, const char *dpif_type,
> -                        struct dpif_port *);
> +int netdev_ports_insert(struct netdev *, struct dpif_port *);
>   struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type);
>   int netdev_ports_remove(odp_port_t port, const char *dpif_type);
>   odp_port_t netdev_ifindex_to_odp_port(int ifindex);
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 513ef7ea9..5223aa897 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -2052,8 +2052,6 @@ iface_do_create(const struct bridge *br,
>           goto error;
>       }
>   
> -    netdev_set_dpif_type(netdev, br->ofproto->type);
> -
>       error = iface_set_netdev_config(iface_cfg, netdev, errp);
>       if (error) {
>           goto error;
> ---
> 
> ?
> 
> Or else, we can just revert the patch.
> 
> Best regards, Ilya Maximets.


Hi Ilya,

Thanks for the patch. The patch is fine but I think it doesn't do much.

You remove the set dpif type of br->ofproto->type which is eliminating
the issue of not offloading to tc.
Leaving only the strcmp when trying to get ifindex.

You refactored netdev_ports_insert() not to accept dpif_type to set
to the netdev and instead you set it just before calling.
So it doesn't change any logic or am I missing something?
it's ok by me if you think its more readable and tc issue is resolved.

I also tried to reproduce the original log msg that fails to get ifindex
on vxlan_sys_4789 and I didn't reproduce so I can't verify about it.

The origin patch assumption was that in iface_do_create() we were
missing netdev dpif type and thus got the error trying to config some 
devices. so maybe we were not?

Thanks,
Roi
Ilya Maximets Dec. 15, 2021, 12:44 p.m. UTC | #6
On 12/15/21 10:50, Roi Dayan wrote:
> 
> 
> On 2021-12-14 3:14 PM, Ilya Maximets wrote:
>> On 12/14/21 13:07, Roi Dayan wrote:
>>>
>>>
>>> On 2021-12-14 1:59 PM, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>>>>> On 12/8/21 03:57, lin huang wrote:
>>>>>> From: linhuang <linhuang@ruijie.com.cn>
>>>>>>
>>>>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>>>>> get_ifindex() function (ioctl) always get error during
>>>>>> adding a port, deleting a port or updating a port status.
>>>>>>
>>>>>> The info log is
>>>>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>>>>> on vxlan_sys_4789 device failed: No such device"
>>>>>>
>>>>>> If there are a lot of userspace tunnel ports on a bridge, the
>>>>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>>>>
>>>>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>>>>> return -ENODEV.
>>>>>>
>>>>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>>>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>>>>> ---
>>>>>>    lib/netdev-vport.c | 6 ++++++
>>>>>>    vswitchd/bridge.c  | 2 ++
>>>>>>    2 files changed, 8 insertions(+)
>>>>>>
>>>>>
>>>>> Applied.  Thanks!
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev@openvswitch.org
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C6700a97a81bd41ff4a2208d9bf03a65f%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637750844672247450%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=jdOCdoJ9gWoj4kbl6dRTRAg0vIXdgoHYKAfVRjg1z44%3D&amp;reserved=0
>>>>
>>>>
>>>> Hi,
>>>>
>>>> We encounter an offload issue with this commit that OVS doesn't add
>>>> rules to TC and keeps on with OVS dp.
>>>>
>>>> To reproduce the issue, configure ovs with bridge and 2 ports,
>>>> restart openvswitch service and then do ping between the ports.
>>>> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
>>>> issue and ovs calls TC again.
>>>>
>>>>   From first look it's because br->ofproto->type is not the same pointer
>>>> usually set in netdev_ports_insert() which can be dpi_class->type or
>>>> static "system" and in the netdev_ports_* functions the code uses
>>>> pointer equal instead of strcmp() to check the type.
>>>> So netdev_ports_lookup() returns error now.
>>>>
>>>> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX parse_flow_put 2243 get port for type system ret (nil)
>>>>
>>>> Even after changing all netdev_ports_* to use strcmp() and
>>>> netdev_ports_lookup() is ok and we do get to tc flow put.
>>>> Looks like we get an error from tc to offload. so checking this part
>>>> now.
>>>>
>>>> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to offload flow: No such device: enp8s0f0_0
>>>>
>>>> So still looking on this but wanted to share about the behavior.
>>>>
>>>> Thanks,
>>>> Roi
>>>
>>> I forgot to mention but just to make it clear,
>>> need to set hw-offload=true.
>>>
>>
>> Ugh.  Sorry, my fault.  Looking at the patch again now and this version
>> seems weird.  I don't know WTF I was thinking.  The main thing is that
>> we're calling netdev_set_dpif_type() twice with different arguments.
>> First time with br->ofproto->type as a type and second time with
>> dpif_normalize_type(dpif_type(dpif)).  And these are not the same.
>> And as you correctly noticed br->ofproto->type is not a constant string
>> unlike the normalized one, hence cannot be checked by a pointer comparison.
>>
>>
>> Could you try this:
>>
>> diff --git a/lib/dpif.c b/lib/dpif.c
>> index 38bcb47cb..cff0bc2db 100644
>> --- a/lib/dpif.c
>> +++ b/lib/dpif.c
>> @@ -364,7 +364,8 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
>>               err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
>>                 if (!err) {
>> -                netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>> +                netdev_set_dpif_type(netdev, dpif_type_str);
>> +                netdev_ports_insert(netdev, &dpif_port);
>>                   netdev_close(netdev);
>>               } else {
>>                   VLOG_WARN("could not open netdev %s type %s: %s",
>> @@ -602,10 +603,12 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
>>               const char *dpif_type_str = dpif_normalize_type(dpif_type(dpif));
>>               struct dpif_port dpif_port;
>>   +            netdev_set_dpif_type(netdev, dpif_type_str);
>> +
>>               dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
>>               dpif_port.name = CONST_CAST(char *, netdev_name);
>>               dpif_port.port_no = port_no;
>> -            netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>> +            netdev_ports_insert(netdev, &dpif_port);
>>           }
>>       } else {
>>           VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>> index 8075cfbd8..b8dc108f3 100644
>> --- a/lib/netdev-offload.c
>> +++ b/lib/netdev-offload.c
>> @@ -573,12 +573,14 @@ netdev_ports_lookup(odp_port_t port_no, const char *dpif_type)
>>   }
>>     int
>> -netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>> -                    struct dpif_port *dpif_port)
>> +netdev_ports_insert(struct netdev *netdev, struct dpif_port *dpif_port)
>>   {
>> +    const char *dpif_type = netdev_get_dpif_type(netdev);
>>       struct port_to_netdev_data *data;
>>       int ifindex = netdev_get_ifindex(netdev);
>>   +    ovs_assert(dpif_type);
>> +
>>       ovs_rwlock_wrlock(&netdev_hmap_rwlock);
>>       if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) {
>>           ovs_rwlock_unlock(&netdev_hmap_rwlock);
>> @@ -596,8 +598,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>>           data->ifindex = -1;
>>       }
>>   -    netdev_set_dpif_type(netdev, dpif_type);
>> -
>>       hmap_insert(&port_to_netdev, &data->portno_node,
>>                   netdev_ports_hash(dpif_port->port_no, dpif_type));
>>       ovs_rwlock_unlock(&netdev_hmap_rwlock);
>> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
>> index e7fcedae9..43a98d499 100644
>> --- a/lib/netdev-offload.h
>> +++ b/lib/netdev-offload.h
>> @@ -108,8 +108,7 @@ bool netdev_is_offload_rebalance_policy_enabled(void);
>>   int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
>>     struct dpif_port;
>> -int netdev_ports_insert(struct netdev *, const char *dpif_type,
>> -                        struct dpif_port *);
>> +int netdev_ports_insert(struct netdev *, struct dpif_port *);
>>   struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type);
>>   int netdev_ports_remove(odp_port_t port, const char *dpif_type);
>>   odp_port_t netdev_ifindex_to_odp_port(int ifindex);
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index 513ef7ea9..5223aa897 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -2052,8 +2052,6 @@ iface_do_create(const struct bridge *br,
>>           goto error;
>>       }
>>   -    netdev_set_dpif_type(netdev, br->ofproto->type);
>> -
>>       error = iface_set_netdev_config(iface_cfg, netdev, errp);
>>       if (error) {
>>           goto error;
>> ---
>>
>> ?
>>
>> Or else, we can just revert the patch.
>>
>> Best regards, Ilya Maximets.
> 
> 
> Hi Ilya,
> 
> Thanks for the patch. The patch is fine but I think it doesn't do much.
> 
> You remove the set dpif type of br->ofproto->type which is eliminating
> the issue of not offloading to tc.
> Leaving only the strcmp when trying to get ifindex.
> 
> You refactored netdev_ports_insert() not to accept dpif_type to set
> to the netdev and instead you set it just before calling.
> So it doesn't change any logic or am I missing something?

The idea is to set the dpif_type before the first call to netdev_get_ifindex.
Previously we had the netdev_get_ifindex() right at the start of
netdev_ports_insert() and that triggered the INFO log.  The patch added
another netdev_set_dpif_type() to the bridge.c which used an incorrect
argument, and didn't remove the "extra" call inside the netdev_ports_insert()
for some reason.

What I proposed above is to keep calling  netdev_set_dpif_type() before the
netdev_ports_insert(), so the first call to the netdev_get_ifindex() happens
after the type is already set, so we don't have the INFO log.  Also removing
the old "extra" call inside the netdev_ports_insert(), since it's not needed.

The call moved from bridge.c to dpif.c, because we need to have access to the
dpif class, but bridge.c should not.

Not trying to set the dpif_type inside the netdev_ports_insert(), because it's
used now outside the offloading context.  So, I think, it's cleaner to move the
netdev_set_dpif_type() call outside of the netdev-offload module.

So, yeah, the logic did not change much.  Using the correct argument for
the netdev_set_dpif_type() and refactoring along a way to make the code
a bit cleaner.

Does that make sense?

> it's ok by me if you think its more readable and tc issue is resolved.
> 
> I also tried to reproduce the original log msg that fails to get ifindex
> on vxlan_sys_4789 and I didn't reproduce so I can't verify about it.
> 
> The origin patch assumption was that in iface_do_create() we were
> missing netdev dpif type and thus got the error trying to config some
> devices. so maybe we were not?

The issue was that netdev_ports_insert() requests ifindex before
setting the dpif type, so we wasn't able to check it.  And since
the userspace tunnel devices doesn't exist in the kernel, attempt
to get the ifindex fails and prints the INFO log.

I can confirm that netdev_ports_insert() is a first place where
ifindex is requested, so setting dpif type before calling this
function should fix the original logging issue.


The offloading works fine with the above change applied, right?
If so, I'll send an official patch.

Best regards, Ilya Maximets.
Roi Dayan Dec. 16, 2021, 1:25 p.m. UTC | #7
On 2021-12-15 2:44 PM, Ilya Maximets wrote:
> On 12/15/21 10:50, Roi Dayan wrote:
>>
>>
>> On 2021-12-14 3:14 PM, Ilya Maximets wrote:
>>> On 12/14/21 13:07, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2021-12-14 1:59 PM, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>>>>>> On 12/8/21 03:57, lin huang wrote:
>>>>>>> From: linhuang <linhuang@ruijie.com.cn>
>>>>>>>
>>>>>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>>>>>> get_ifindex() function (ioctl) always get error during
>>>>>>> adding a port, deleting a port or updating a port status.
>>>>>>>
>>>>>>> The info log is
>>>>>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>>>>>> on vxlan_sys_4789 device failed: No such device"
>>>>>>>
>>>>>>> If there are a lot of userspace tunnel ports on a bridge, the
>>>>>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>>>>>
>>>>>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>>>>>> return -ENODEV.
>>>>>>>
>>>>>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>>>>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>>>>>> ---
>>>>>>>     lib/netdev-vport.c | 6 ++++++
>>>>>>>     vswitchd/bridge.c  | 2 ++
>>>>>>>     2 files changed, 8 insertions(+)
>>>>>>>
>>>>>>
>>>>>> Applied.  Thanks!
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev@openvswitch.org
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C68774137f0cd4286502e08d9bfc8b9a8%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637751691102561850%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=OxGVmhBKUZpiBAkH9e3OLWPV%2F4O7baVYmosmBsXwQb8%3D&amp;reserved=0
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> We encounter an offload issue with this commit that OVS doesn't add
>>>>> rules to TC and keeps on with OVS dp.
>>>>>
>>>>> To reproduce the issue, configure ovs with bridge and 2 ports,
>>>>> restart openvswitch service and then do ping between the ports.
>>>>> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
>>>>> issue and ovs calls TC again.
>>>>>
>>>>>    From first look it's because br->ofproto->type is not the same pointer
>>>>> usually set in netdev_ports_insert() which can be dpi_class->type or
>>>>> static "system" and in the netdev_ports_* functions the code uses
>>>>> pointer equal instead of strcmp() to check the type.
>>>>> So netdev_ports_lookup() returns error now.
>>>>>
>>>>> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX parse_flow_put 2243 get port for type system ret (nil)
>>>>>
>>>>> Even after changing all netdev_ports_* to use strcmp() and
>>>>> netdev_ports_lookup() is ok and we do get to tc flow put.
>>>>> Looks like we get an error from tc to offload. so checking this part
>>>>> now.
>>>>>
>>>>> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to offload flow: No such device: enp8s0f0_0
>>>>>
>>>>> So still looking on this but wanted to share about the behavior.
>>>>>
>>>>> Thanks,
>>>>> Roi
>>>>
>>>> I forgot to mention but just to make it clear,
>>>> need to set hw-offload=true.
>>>>
>>>
>>> Ugh.  Sorry, my fault.  Looking at the patch again now and this version
>>> seems weird.  I don't know WTF I was thinking.  The main thing is that
>>> we're calling netdev_set_dpif_type() twice with different arguments.
>>> First time with br->ofproto->type as a type and second time with
>>> dpif_normalize_type(dpif_type(dpif)).  And these are not the same.
>>> And as you correctly noticed br->ofproto->type is not a constant string
>>> unlike the normalized one, hence cannot be checked by a pointer comparison.
>>>
>>>
>>> Could you try this:
>>>
>>> diff --git a/lib/dpif.c b/lib/dpif.c
>>> index 38bcb47cb..cff0bc2db 100644
>>> --- a/lib/dpif.c
>>> +++ b/lib/dpif.c
>>> @@ -364,7 +364,8 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
>>>                err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
>>>                  if (!err) {
>>> -                netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>>> +                netdev_set_dpif_type(netdev, dpif_type_str);
>>> +                netdev_ports_insert(netdev, &dpif_port);
>>>                    netdev_close(netdev);
>>>                } else {
>>>                    VLOG_WARN("could not open netdev %s type %s: %s",
>>> @@ -602,10 +603,12 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
>>>                const char *dpif_type_str = dpif_normalize_type(dpif_type(dpif));
>>>                struct dpif_port dpif_port;
>>>    +            netdev_set_dpif_type(netdev, dpif_type_str);
>>> +
>>>                dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
>>>                dpif_port.name = CONST_CAST(char *, netdev_name);
>>>                dpif_port.port_no = port_no;
>>> -            netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>>> +            netdev_ports_insert(netdev, &dpif_port);
>>>            }
>>>        } else {
>>>            VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
>>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>> index 8075cfbd8..b8dc108f3 100644
>>> --- a/lib/netdev-offload.c
>>> +++ b/lib/netdev-offload.c
>>> @@ -573,12 +573,14 @@ netdev_ports_lookup(odp_port_t port_no, const char *dpif_type)
>>>    }
>>>      int
>>> -netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>>> -                    struct dpif_port *dpif_port)
>>> +netdev_ports_insert(struct netdev *netdev, struct dpif_port *dpif_port)
>>>    {
>>> +    const char *dpif_type = netdev_get_dpif_type(netdev);
>>>        struct port_to_netdev_data *data;
>>>        int ifindex = netdev_get_ifindex(netdev);
>>>    +    ovs_assert(dpif_type);
>>> +
>>>        ovs_rwlock_wrlock(&netdev_hmap_rwlock);
>>>        if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) {
>>>            ovs_rwlock_unlock(&netdev_hmap_rwlock);
>>> @@ -596,8 +598,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>>>            data->ifindex = -1;
>>>        }
>>>    -    netdev_set_dpif_type(netdev, dpif_type);
>>> -
>>>        hmap_insert(&port_to_netdev, &data->portno_node,
>>>                    netdev_ports_hash(dpif_port->port_no, dpif_type));
>>>        ovs_rwlock_unlock(&netdev_hmap_rwlock);
>>> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
>>> index e7fcedae9..43a98d499 100644
>>> --- a/lib/netdev-offload.h
>>> +++ b/lib/netdev-offload.h
>>> @@ -108,8 +108,7 @@ bool netdev_is_offload_rebalance_policy_enabled(void);
>>>    int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
>>>      struct dpif_port;
>>> -int netdev_ports_insert(struct netdev *, const char *dpif_type,
>>> -                        struct dpif_port *);
>>> +int netdev_ports_insert(struct netdev *, struct dpif_port *);
>>>    struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type);
>>>    int netdev_ports_remove(odp_port_t port, const char *dpif_type);
>>>    odp_port_t netdev_ifindex_to_odp_port(int ifindex);
>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>> index 513ef7ea9..5223aa897 100644
>>> --- a/vswitchd/bridge.c
>>> +++ b/vswitchd/bridge.c
>>> @@ -2052,8 +2052,6 @@ iface_do_create(const struct bridge *br,
>>>            goto error;
>>>        }
>>>    -    netdev_set_dpif_type(netdev, br->ofproto->type);
>>> -
>>>        error = iface_set_netdev_config(iface_cfg, netdev, errp);
>>>        if (error) {
>>>            goto error;
>>> ---
>>>
>>> ?
>>>
>>> Or else, we can just revert the patch.
>>>
>>> Best regards, Ilya Maximets.
>>
>>
>> Hi Ilya,
>>
>> Thanks for the patch. The patch is fine but I think it doesn't do much.
>>
>> You remove the set dpif type of br->ofproto->type which is eliminating
>> the issue of not offloading to tc.
>> Leaving only the strcmp when trying to get ifindex.
>>
>> You refactored netdev_ports_insert() not to accept dpif_type to set
>> to the netdev and instead you set it just before calling.
>> So it doesn't change any logic or am I missing something?
> 
> The idea is to set the dpif_type before the first call to netdev_get_ifindex.
> Previously we had the netdev_get_ifindex() right at the start of
> netdev_ports_insert() and that triggered the INFO log.  The patch added
> another netdev_set_dpif_type() to the bridge.c which used an incorrect
> argument, and didn't remove the "extra" call inside the netdev_ports_insert()
> for some reason.
> 
> What I proposed above is to keep calling  netdev_set_dpif_type() before the
> netdev_ports_insert(), so the first call to the netdev_get_ifindex() happens
> after the type is already set, so we don't have the INFO log.  Also removing
> the old "extra" call inside the netdev_ports_insert(), since it's not needed.
> 
> The call moved from bridge.c to dpif.c, because we need to have access to the
> dpif class, but bridge.c should not.
> 
> Not trying to set the dpif_type inside the netdev_ports_insert(), because it's
> used now outside the offloading context.  So, I think, it's cleaner to move the
> netdev_set_dpif_type() call outside of the netdev-offload module.
> 
> So, yeah, the logic did not change much.  Using the correct argument for
> the netdev_set_dpif_type() and refactoring along a way to make the code
> a bit cleaner.
> 
> Does that make sense?
> 
>> it's ok by me if you think its more readable and tc issue is resolved.
>>
>> I also tried to reproduce the original log msg that fails to get ifindex
>> on vxlan_sys_4789 and I didn't reproduce so I can't verify about it.
>>
>> The origin patch assumption was that in iface_do_create() we were
>> missing netdev dpif type and thus got the error trying to config some
>> devices. so maybe we were not?
> 
> The issue was that netdev_ports_insert() requests ifindex before
> setting the dpif type, so we wasn't able to check it.  And since
> the userspace tunnel devices doesn't exist in the kernel, attempt
> to get the ifindex fails and prints the INFO log.
> 
> I can confirm that netdev_ports_insert() is a first place where
> ifindex is requested, so setting dpif type before calling this
> function should fix the original logging issue.
> 
> 
> The offloading works fine with the above change applied, right?
> If so, I'll send an official patch.
> 
> Best regards, Ilya Maximets.

right. I missed the get_ifindex call at the top of
netdev_ports_insert() when reading.

so everything looks good. thanks.

if important, you can also move get_ifindex call inside
netdev_ports_insert() to right before we use it so if port
exist there won't be a call to get_ifindex at all.
Ilya Maximets Dec. 16, 2021, 11:03 p.m. UTC | #8
On 12/16/21 14:25, Roi Dayan wrote:
> 
> 
> On 2021-12-15 2:44 PM, Ilya Maximets wrote:
>> On 12/15/21 10:50, Roi Dayan wrote:
>>>
>>>
>>> On 2021-12-14 3:14 PM, Ilya Maximets wrote:
>>>> On 12/14/21 13:07, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2021-12-14 1:59 PM, Roi Dayan wrote:
>>>>>>
>>>>>>
>>>>>> On 2021-12-09 3:16 PM, Ilya Maximets wrote:
>>>>>>> On 12/8/21 03:57, lin huang wrote:
>>>>>>>> From: linhuang <linhuang@ruijie.com.cn>
>>>>>>>>
>>>>>>>> Userspace tunnel doesn't have a valid device in the kernel. So
>>>>>>>> get_ifindex() function (ioctl) always get error during
>>>>>>>> adding a port, deleting a port or updating a port status.
>>>>>>>>
>>>>>>>> The info log is
>>>>>>>> "2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX)
>>>>>>>> on vxlan_sys_4789 device failed: No such device"
>>>>>>>>
>>>>>>>> If there are a lot of userspace tunnel ports on a bridge, the
>>>>>>>> iface_refresh_netdev_status() function will spend a lot of time.
>>>>>>>>
>>>>>>>> So ignore userspace tunnel port ioctl(SIOCGIFINDEX) operation, just
>>>>>>>> return -ENODEV.
>>>>>>>>
>>>>>>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>>>>>>> Test-by: Mike Pattrick <mkp@redhat.com>
>>>>>>>> ---
>>>>>>>>     lib/netdev-vport.c | 6 ++++++
>>>>>>>>     vswitchd/bridge.c  | 2 ++
>>>>>>>>     2 files changed, 8 insertions(+)
>>>>>>>>
>>>>>>>
>>>>>>> Applied.  Thanks!
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> dev@openvswitch.org
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Croid%40nvidia.com%7C68774137f0cd4286502e08d9bfc8b9a8%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637751691102561850%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=OxGVmhBKUZpiBAkH9e3OLWPV%2F4O7baVYmosmBsXwQb8%3D&amp;reserved=0
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We encounter an offload issue with this commit that OVS doesn't add
>>>>>> rules to TC and keeps on with OVS dp.
>>>>>>
>>>>>> To reproduce the issue, configure ovs with bridge and 2 ports,
>>>>>> restart openvswitch service and then do ping between the ports.
>>>>>> Removing/readding port to bridge or calling dpctl/dump-flows "fix" the
>>>>>> issue and ovs calls TC again.
>>>>>>
>>>>>>    From first look it's because br->ofproto->type is not the same pointer
>>>>>> usually set in netdev_ports_insert() which can be dpi_class->type or
>>>>>> static "system" and in the netdev_ports_* functions the code uses
>>>>>> pointer equal instead of strcmp() to check the type.
>>>>>> So netdev_ports_lookup() returns error now.
>>>>>>
>>>>>> 2021-12-14T10:56:18.240Z|00010|dpif_netlink(handler1)|ERR|XXX parse_flow_put 2243 get port for type system ret (nil)
>>>>>>
>>>>>> Even after changing all netdev_ports_* to use strcmp() and
>>>>>> netdev_ports_lookup() is ok and we do get to tc flow put.
>>>>>> Looks like we get an error from tc to offload. so checking this part
>>>>>> now.
>>>>>>
>>>>>> 2021-12-14T11:23:09.230Z|00035|dpif_netlink(handler1)|ERR|failed to offload flow: No such device: enp8s0f0_0
>>>>>>
>>>>>> So still looking on this but wanted to share about the behavior.
>>>>>>
>>>>>> Thanks,
>>>>>> Roi
>>>>>
>>>>> I forgot to mention but just to make it clear,
>>>>> need to set hw-offload=true.
>>>>>
>>>>
>>>> Ugh.  Sorry, my fault.  Looking at the patch again now and this version
>>>> seems weird.  I don't know WTF I was thinking.  The main thing is that
>>>> we're calling netdev_set_dpif_type() twice with different arguments.
>>>> First time with br->ofproto->type as a type and second time with
>>>> dpif_normalize_type(dpif_type(dpif)).  And these are not the same.
>>>> And as you correctly noticed br->ofproto->type is not a constant string
>>>> unlike the normalized one, hence cannot be checked by a pointer comparison.
>>>>
>>>>
>>>> Could you try this:
>>>>
>>>> diff --git a/lib/dpif.c b/lib/dpif.c
>>>> index 38bcb47cb..cff0bc2db 100644
>>>> --- a/lib/dpif.c
>>>> +++ b/lib/dpif.c
>>>> @@ -364,7 +364,8 @@ do_open(const char *name, const char *type, bool create, struct dpif **dpifp)
>>>>                err = netdev_open(dpif_port.name, dpif_port.type, &netdev);
>>>>                  if (!err) {
>>>> -                netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>>>> +                netdev_set_dpif_type(netdev, dpif_type_str);
>>>> +                netdev_ports_insert(netdev, &dpif_port);
>>>>                    netdev_close(netdev);
>>>>                } else {
>>>>                    VLOG_WARN("could not open netdev %s type %s: %s",
>>>> @@ -602,10 +603,12 @@ dpif_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t *port_nop)
>>>>                const char *dpif_type_str = dpif_normalize_type(dpif_type(dpif));
>>>>                struct dpif_port dpif_port;
>>>>    +            netdev_set_dpif_type(netdev, dpif_type_str);
>>>> +
>>>>                dpif_port.type = CONST_CAST(char *, netdev_get_type(netdev));
>>>>                dpif_port.name = CONST_CAST(char *, netdev_name);
>>>>                dpif_port.port_no = port_no;
>>>> -            netdev_ports_insert(netdev, dpif_type_str, &dpif_port);
>>>> +            netdev_ports_insert(netdev, &dpif_port);
>>>>            }
>>>>        } else {
>>>>            VLOG_WARN_RL(&error_rl, "%s: failed to add %s as port: %s",
>>>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>>> index 8075cfbd8..b8dc108f3 100644
>>>> --- a/lib/netdev-offload.c
>>>> +++ b/lib/netdev-offload.c
>>>> @@ -573,12 +573,14 @@ netdev_ports_lookup(odp_port_t port_no, const char *dpif_type)
>>>>    }
>>>>      int
>>>> -netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>>>> -                    struct dpif_port *dpif_port)
>>>> +netdev_ports_insert(struct netdev *netdev, struct dpif_port *dpif_port)
>>>>    {
>>>> +    const char *dpif_type = netdev_get_dpif_type(netdev);
>>>>        struct port_to_netdev_data *data;
>>>>        int ifindex = netdev_get_ifindex(netdev);
>>>>    +    ovs_assert(dpif_type);
>>>> +
>>>>        ovs_rwlock_wrlock(&netdev_hmap_rwlock);
>>>>        if (netdev_ports_lookup(dpif_port->port_no, dpif_type)) {
>>>>            ovs_rwlock_unlock(&netdev_hmap_rwlock);
>>>> @@ -596,8 +598,6 @@ netdev_ports_insert(struct netdev *netdev, const char *dpif_type,
>>>>            data->ifindex = -1;
>>>>        }
>>>>    -    netdev_set_dpif_type(netdev, dpif_type);
>>>> -
>>>>        hmap_insert(&port_to_netdev, &data->portno_node,
>>>>                    netdev_ports_hash(dpif_port->port_no, dpif_type));
>>>>        ovs_rwlock_unlock(&netdev_hmap_rwlock);
>>>> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
>>>> index e7fcedae9..43a98d499 100644
>>>> --- a/lib/netdev-offload.h
>>>> +++ b/lib/netdev-offload.h
>>>> @@ -108,8 +108,7 @@ bool netdev_is_offload_rebalance_policy_enabled(void);
>>>>    int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
>>>>      struct dpif_port;
>>>> -int netdev_ports_insert(struct netdev *, const char *dpif_type,
>>>> -                        struct dpif_port *);
>>>> +int netdev_ports_insert(struct netdev *, struct dpif_port *);
>>>>    struct netdev *netdev_ports_get(odp_port_t port, const char *dpif_type);
>>>>    int netdev_ports_remove(odp_port_t port, const char *dpif_type);
>>>>    odp_port_t netdev_ifindex_to_odp_port(int ifindex);
>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>>> index 513ef7ea9..5223aa897 100644
>>>> --- a/vswitchd/bridge.c
>>>> +++ b/vswitchd/bridge.c
>>>> @@ -2052,8 +2052,6 @@ iface_do_create(const struct bridge *br,
>>>>            goto error;
>>>>        }
>>>>    -    netdev_set_dpif_type(netdev, br->ofproto->type);
>>>> -
>>>>        error = iface_set_netdev_config(iface_cfg, netdev, errp);
>>>>        if (error) {
>>>>            goto error;
>>>> ---
>>>>
>>>> ?
>>>>
>>>> Or else, we can just revert the patch.
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>>
>>> Hi Ilya,
>>>
>>> Thanks for the patch. The patch is fine but I think it doesn't do much.
>>>
>>> You remove the set dpif type of br->ofproto->type which is eliminating
>>> the issue of not offloading to tc.
>>> Leaving only the strcmp when trying to get ifindex.
>>>
>>> You refactored netdev_ports_insert() not to accept dpif_type to set
>>> to the netdev and instead you set it just before calling.
>>> So it doesn't change any logic or am I missing something?
>>
>> The idea is to set the dpif_type before the first call to netdev_get_ifindex.
>> Previously we had the netdev_get_ifindex() right at the start of
>> netdev_ports_insert() and that triggered the INFO log.  The patch added
>> another netdev_set_dpif_type() to the bridge.c which used an incorrect
>> argument, and didn't remove the "extra" call inside the netdev_ports_insert()
>> for some reason.
>>
>> What I proposed above is to keep calling  netdev_set_dpif_type() before the
>> netdev_ports_insert(), so the first call to the netdev_get_ifindex() happens
>> after the type is already set, so we don't have the INFO log.  Also removing
>> the old "extra" call inside the netdev_ports_insert(), since it's not needed.
>>
>> The call moved from bridge.c to dpif.c, because we need to have access to the
>> dpif class, but bridge.c should not.
>>
>> Not trying to set the dpif_type inside the netdev_ports_insert(), because it's
>> used now outside the offloading context.  So, I think, it's cleaner to move the
>> netdev_set_dpif_type() call outside of the netdev-offload module.
>>
>> So, yeah, the logic did not change much.  Using the correct argument for
>> the netdev_set_dpif_type() and refactoring along a way to make the code
>> a bit cleaner.
>>
>> Does that make sense?
>>
>>> it's ok by me if you think its more readable and tc issue is resolved.
>>>
>>> I also tried to reproduce the original log msg that fails to get ifindex
>>> on vxlan_sys_4789 and I didn't reproduce so I can't verify about it.
>>>
>>> The origin patch assumption was that in iface_do_create() we were
>>> missing netdev dpif type and thus got the error trying to config some
>>> devices. so maybe we were not?
>>
>> The issue was that netdev_ports_insert() requests ifindex before
>> setting the dpif type, so we wasn't able to check it.  And since
>> the userspace tunnel devices doesn't exist in the kernel, attempt
>> to get the ifindex fails and prints the INFO log.
>>
>> I can confirm that netdev_ports_insert() is a first place where
>> ifindex is requested, so setting dpif type before calling this
>> function should fix the original logging issue.
>>
>>
>> The offloading works fine with the above change applied, right?
>> If so, I'll send an official patch.
>>
>> Best regards, Ilya Maximets.
> 
> right. I missed the get_ifindex call at the top of
> netdev_ports_insert() when reading.
> 
> so everything looks good. thanks.

OK.  Thanks!  I'll prepare and send a proper patch.

> 
> if important, you can also move get_ifindex call inside
> netdev_ports_insert() to right before we use it so if port
> exist there won't be a call to get_ifindex at all.

That's an interesting point.  But it might be better to not move
a syscall under the write lock here.  I'll keep as-is for now.
We can change that later, if that will be an issue.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 499c0291c9..64331f74bf 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -1151,6 +1151,12 @@  netdev_vport_get_ifindex(const struct netdev *netdev_)
 {
     char buf[NETDEV_VPORT_NAME_BUFSIZE];
     const char *name = netdev_vport_get_dpif_port(netdev_, buf, sizeof(buf));
+    const char *dpif_type = netdev_get_dpif_type(netdev_);
+
+    if (dpif_type && strcmp(dpif_type, "system")) {
+        /* Not a system device. */
+        return -ENODEV;
+    }

     return linux_get_ifindex(name);
 }
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 5223aa8970..513ef7ea9c 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2052,6 +2052,8 @@  iface_do_create(const struct bridge *br,
         goto error;
     }

+    netdev_set_dpif_type(netdev, br->ofproto->type);
+
     error = iface_set_netdev_config(iface_cfg, netdev, errp);
     if (error) {
         goto error;