diff mbox series

[ovs-dev,ovs-dev,v1] netdev-offload: Flush offloaded rules when ports removed

Message ID 20200611103635.53367-1-xiangxia.m.yue@gmail.com
State New
Headers show
Series [ovs-dev,ovs-dev,v1] netdev-offload: Flush offloaded rules when ports removed | expand

Commit Message

Tonghao Zhang June 11, 2020, 10:36 a.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

When ports were removed from bridge, we should flush the
offload rules on the ports. The main reason is two factors:

* The ports removed from bridge, will be managed by OvS, so
  flush the rules which installed by OvS.
* If using the TC flower offload, for example, tc rules still
  are on the ports. And the information still are maintained by
  OvS, such as the mapping for tc and ufid.
  Then if adding the port to bridge and installing the rules
  to it again, *del_filter_and_ufid_mapping will be invoked,
  and delete the tc rule using tc handle which may not exist
  (offload init api flushed them.) on kernel or is used by other
  previous rules (if so, that rules will be deleted that is not
  we expected.).

Cc: Simon Horman <simon.horman@netronome.com>
Cc: Paul Blakey <paulb@mellanox.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Ben Pfaff <blp@ovn.org>
Cc: William Tu <u9012063@gmail.com>
Cc: Ilya Maximets <i.maximets@ovn.org>
Tested-at: https://travis-ci.com/github/ovn-open-virtual-networks/ovs/builds/170832624
Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 lib/netdev-offload.c | 1 +
 1 file changed, 1 insertion(+)

Comments

0-day Robot June 11, 2020, 11:33 a.m. UTC | #1
Bleep bloop.  Greetings Tonghao Zhang, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Co-author Wengang Hou <houwengang@didiglobal.com> needs to sign off.
Lines checked: 48, Warnings: 0, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Roi Dayan June 14, 2020, 5:30 a.m. UTC | #2
On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> When ports were removed from bridge, we should flush the
> offload rules on the ports. The main reason is two factors:
> 
> * The ports removed from bridge, will be managed by OvS, so
>   flush the rules which installed by OvS.
> * If using the TC flower offload, for example, tc rules still
>   are on the ports. And the information still are maintained by
>   OvS, such as the mapping for tc and ufid.
>   Then if adding the port to bridge and installing the rules
>   to it again, *del_filter_and_ufid_mapping will be invoked,
>   and delete the tc rule using tc handle which may not exist
>   (offload init api flushed them.) on kernel or is used by other
>   previous rules (if so, that rules will be deleted that is not
>   we expected.).
> 
> Cc: Simon Horman <simon.horman@netronome.com>
> Cc: Paul Blakey <paulb@mellanox.com>
> Cc: Roi Dayan <roid@mellanox.com>
> Cc: Ben Pfaff <blp@ovn.org>
> Cc: William Tu <u9012063@gmail.com>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  lib/netdev-offload.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index ab97a292ebac..964566caab1e 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
>      data = netdev_ports_lookup(port_no, dpif_class);
>      if (data) {
>          dpif_port_destroy(&data->dpif_port);
> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
>          netdev_close(data->netdev); /* unref and possibly close */
>          hmap_remove(&port_to_netdev, &data->portno_node);
>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> 


it looks ok but I wonder if it's redundant?
I tested and when I removed a port with tc rules on it,
all tc rules got deleted using the flow api flow del,
we get into netdev_tc_flow_del() for each flow.
So you shouldn't have a tc rule left or ufid mapping.
How did you test that you saw tc rules were still on the port?
Tonghao Zhang June 14, 2020, 11:40 a.m. UTC | #3
On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
>
>
>
> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > When ports were removed from bridge, we should flush the
> > offload rules on the ports. The main reason is two factors:
> >
> > * The ports removed from bridge, will be managed by OvS, so
> >   flush the rules which installed by OvS.
> > * If using the TC flower offload, for example, tc rules still
> >   are on the ports. And the information still are maintained by
> >   OvS, such as the mapping for tc and ufid.
> >   Then if adding the port to bridge and installing the rules
> >   to it again, *del_filter_and_ufid_mapping will be invoked,
> >   and delete the tc rule using tc handle which may not exist
> >   (offload init api flushed them.) on kernel or is used by other
> >   previous rules (if so, that rules will be deleted that is not
> >   we expected.).
> >
> > Cc: Simon Horman <simon.horman@netronome.com>
> > Cc: Paul Blakey <paulb@mellanox.com>
> > Cc: Roi Dayan <roid@mellanox.com>
> > Cc: Ben Pfaff <blp@ovn.org>
> > Cc: William Tu <u9012063@gmail.com>
> > Cc: Ilya Maximets <i.maximets@ovn.org>
> > Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> > Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> >  lib/netdev-offload.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> > index ab97a292ebac..964566caab1e 100644
> > --- a/lib/netdev-offload.c
> > +++ b/lib/netdev-offload.c
> > @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> >      data = netdev_ports_lookup(port_no, dpif_class);
> >      if (data) {
> >          dpif_port_destroy(&data->dpif_port);
> > +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> >          netdev_close(data->netdev); /* unref and possibly close */
> >          hmap_remove(&port_to_netdev, &data->portno_node);
> >          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> >
>
Hi Roi
Thanks for your review. The case is reproduced as below:
# ovs-appctl dpctl/show
system@ovs-system:
  lookups: hit:0 missed:0 lost:0
  flows: 0
  masks: hit:0 total:0 hit/pkt:0.00
  port 0: ovs-system (internal)
  port 1: br-int (internal)
  port 2: enp130s0f0_0
  port 3: enp130s0f0_1
  port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)

# cat /tmp/tmp-ovs-debug.sh
set -x
ovs-appctl dpctl/add-flow
'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
3
ovs-appctl dpctl/dump-flows
ovs-vsctl del-port br-int enp130s0f0_0
tc filter show dev enp130s0f0_0 ingress
ovs-appctl dpctl/dump-flows

# sh /tmp/tmp-ovs-debug.sh
+ ovs-appctl dpctl/add-flow
'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
3
+ ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
packets:0, bytes:0, used:0.020s, actions:3
+ ovs-vsctl del-port br-int enp130s0f0_0
+ tc filter show dev enp130s0f0_0 ingress
filter protocol ip pref 2 flower chain 0
filter protocol ip pref 2 flower chain 0 handle 0x1
  dst_mac 00:11:22:33:44:66
  eth_type ipv4
  dst_ip 1.1.1.200
  in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
index 1 ref 1 bind 1
cookie 40ca20750c4add5a9cc1c880ccb3d0e8
no_percpu
used_hw_stats delayed

+ ovs-appctl dpctl/dump-flows

The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
command to show them.

> it looks ok but I wonder if it's redundant?
> I tested and when I removed a port with tc rules on it,
> all tc rules got deleted using the flow api flow del,
> we get into netdev_tc_flow_del() for each flow.
I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
The debug step is:
1. run the /tmp/tmp-ovs-debug.sh
2. and while gdb the ovs-vswitchd:
gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]

the breakpoint was not triggered.
> So you shouldn't have a tc rule left or ufid mapping.
As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
flow still are in the mapping.
When we add the port back and the install the rules again, the new
rules(e.g. tc handle 10) may be deleted by the later new rules(which
ufid mapping not
deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> How did you test that you saw tc rules were still on the port?
Yes, I test it using the tc command.



--
Best regards, Tonghao
Ilya Maximets June 15, 2020, 11:13 a.m. UTC | #4
On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
>>
>>
>>
>> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>
>>> When ports were removed from bridge, we should flush the
>>> offload rules on the ports. The main reason is two factors:
>>>
>>> * The ports removed from bridge, will be managed by OvS, so
>>>   flush the rules which installed by OvS.
>>> * If using the TC flower offload, for example, tc rules still
>>>   are on the ports. And the information still are maintained by
>>>   OvS, such as the mapping for tc and ufid.
>>>   Then if adding the port to bridge and installing the rules
>>>   to it again, *del_filter_and_ufid_mapping will be invoked,
>>>   and delete the tc rule using tc handle which may not exist
>>>   (offload init api flushed them.) on kernel or is used by other
>>>   previous rules (if so, that rules will be deleted that is not
>>>   we expected.).
>>>
>>> Cc: Simon Horman <simon.horman@netronome.com>
>>> Cc: Paul Blakey <paulb@mellanox.com>
>>> Cc: Roi Dayan <roid@mellanox.com>
>>> Cc: Ben Pfaff <blp@ovn.org>
>>> Cc: William Tu <u9012063@gmail.com>
>>> Cc: Ilya Maximets <i.maximets@ovn.org>
>>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
>>> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>> ---
>>>  lib/netdev-offload.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>> index ab97a292ebac..964566caab1e 100644
>>> --- a/lib/netdev-offload.c
>>> +++ b/lib/netdev-offload.c
>>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
>>>      data = netdev_ports_lookup(port_no, dpif_class);
>>>      if (data) {
>>>          dpif_port_destroy(&data->dpif_port);
>>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
>>>          netdev_close(data->netdev); /* unref and possibly close */
>>>          hmap_remove(&port_to_netdev, &data->portno_node);
>>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
>>>
>>
> Hi Roi
> Thanks for your review. The case is reproduced as below:
> # ovs-appctl dpctl/show
> system@ovs-system:
>   lookups: hit:0 missed:0 lost:0
>   flows: 0
>   masks: hit:0 total:0 hit/pkt:0.00
>   port 0: ovs-system (internal)
>   port 1: br-int (internal)
>   port 2: enp130s0f0_0
>   port 3: enp130s0f0_1
>   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> 
> # cat /tmp/tmp-ovs-debug.sh
> set -x
> ovs-appctl dpctl/add-flow
> 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> 3
> ovs-appctl dpctl/dump-flows
> ovs-vsctl del-port br-int enp130s0f0_0
> tc filter show dev enp130s0f0_0 ingress
> ovs-appctl dpctl/dump-flows
> 
> # sh /tmp/tmp-ovs-debug.sh
> + ovs-appctl dpctl/add-flow
> 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> 3
> + ovs-appctl dpctl/dump-flows
> recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> packets:0, bytes:0, used:0.020s, actions:3
> + ovs-vsctl del-port br-int enp130s0f0_0
> + tc filter show dev enp130s0f0_0 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   dst_mac 00:11:22:33:44:66
>   eth_type ipv4
>   dst_ip 1.1.1.200
>   in_hw in_hw_count 1
> action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> index 1 ref 1 bind 1
> cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> no_percpu
> used_hw_stats delayed
> 
> + ovs-appctl dpctl/dump-flows
> 
> The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> command to show them.
> 
>> it looks ok but I wonder if it's redundant?
>> I tested and when I removed a port with tc rules on it,
>> all tc rules got deleted using the flow api flow del,
>> we get into netdev_tc_flow_del() for each flow.
> I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> The debug step is:
> 1. run the /tmp/tmp-ovs-debug.sh
> 2. and while gdb the ovs-vswitchd:
> gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> 
> the breakpoint was not triggered.
>> So you shouldn't have a tc rule left or ufid mapping.
> As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> flow still are in the mapping.
> When we add the port back and the install the rules again, the new
> rules(e.g. tc handle 10) may be deleted by the later new rules(which
> ufid mapping not
> deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
>> How did you test that you saw tc rules were still on the port?
> Yes, I test it using the tc command.

Hi.

{add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
is clearly stated in documentation:

"DATAPATH FLOW TABLE DEBUGGING COMMANDS
 ...
 Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
 is running because it interferes with ovs-vswitchd's own datapath flow
 management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."

As you're injecting flows directly to datapath you can't expect higher layers
to handle them correctly.  The issue here is that you're using OVS in a way
it's not intended to be used.  Please, configure OVS rules via OpenFlow or
be sure that you're removing flows by hands from the datapath as you're adding
them.

In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
reverting of the previous patch that allowed offloading via add-flow since this
is effectively a memory leak and even debugging commands should not produce
a memory leak.

---

BTW, IMHO, it's a very confusing design that we're installing rules directly to
tc.  The issue here is that we have to maintain list of ports we offloaded to
with ufids completely in userspace and we can not re-create these mappings after
OVS restart like we do in case of kernel datapath.  The main issue is that we
do not know which ports are ours.  Datapath is actually split between kernel
and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
This leads to big number of issues.  If we could install tc rules from the
inside of the usual kernel datapath that might solve a lot of configuration issues.
Might create some new, but anyway, I think, common architecture would be much
more clear.
For example, in userspace datapath we're installing flows to usual userspace
flow table AND to offload provider. This allows us to manage flows in more
natural way.  And even if we will lost some flows inside offload provider
implementation we will still clean up all the hanging data while removing flows
from the usual userspace datapath flow tables.  So, offloading is actually
joined part of userspace datapath and not a side thing like tc for kernel datapath.
So, what is this part about: kernel datapath has persistent ports and flows
(persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
flows and non-persistent ports (i.e. list of ports we're affloading to) and
non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
a lot of complications.


Best regards, Ilya Maximets.
Tonghao Zhang June 16, 2020, 8:25 a.m. UTC | #5
On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
> >>
> >>
> >>
> >> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>
> >>> When ports were removed from bridge, we should flush the
> >>> offload rules on the ports. The main reason is two factors:
> >>>
> >>> * The ports removed from bridge, will be managed by OvS, so
> >>>   flush the rules which installed by OvS.
> >>> * If using the TC flower offload, for example, tc rules still
> >>>   are on the ports. And the information still are maintained by
> >>>   OvS, such as the mapping for tc and ufid.
> >>>   Then if adding the port to bridge and installing the rules
> >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> >>>   and delete the tc rule using tc handle which may not exist
> >>>   (offload init api flushed them.) on kernel or is used by other
> >>>   previous rules (if so, that rules will be deleted that is not
> >>>   we expected.).
> >>>
> >>> Cc: Simon Horman <simon.horman@netronome.com>
> >>> Cc: Paul Blakey <paulb@mellanox.com>
> >>> Cc: Roi Dayan <roid@mellanox.com>
> >>> Cc: Ben Pfaff <blp@ovn.org>
> >>> Cc: William Tu <u9012063@gmail.com>
> >>> Cc: Ilya Maximets <i.maximets@ovn.org>
> >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> >>> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>> ---
> >>>  lib/netdev-offload.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> >>> index ab97a292ebac..964566caab1e 100644
> >>> --- a/lib/netdev-offload.c
> >>> +++ b/lib/netdev-offload.c
> >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> >>>      data = netdev_ports_lookup(port_no, dpif_class);
> >>>      if (data) {
> >>>          dpif_port_destroy(&data->dpif_port);
> >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> >>>          netdev_close(data->netdev); /* unref and possibly close */
> >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> >>>
> >>
> > Hi Roi
> > Thanks for your review. The case is reproduced as below:
> > # ovs-appctl dpctl/show
> > system@ovs-system:
> >   lookups: hit:0 missed:0 lost:0
> >   flows: 0
> >   masks: hit:0 total:0 hit/pkt:0.00
> >   port 0: ovs-system (internal)
> >   port 1: br-int (internal)
> >   port 2: enp130s0f0_0
> >   port 3: enp130s0f0_1
> >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> >
> > # cat /tmp/tmp-ovs-debug.sh
> > set -x
> > ovs-appctl dpctl/add-flow
> > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > 3
> > ovs-appctl dpctl/dump-flows
> > ovs-vsctl del-port br-int enp130s0f0_0
> > tc filter show dev enp130s0f0_0 ingress
> > ovs-appctl dpctl/dump-flows
> >
> > # sh /tmp/tmp-ovs-debug.sh
> > + ovs-appctl dpctl/add-flow
> > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > 3
> > + ovs-appctl dpctl/dump-flows
> > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > packets:0, bytes:0, used:0.020s, actions:3
> > + ovs-vsctl del-port br-int enp130s0f0_0
> > + tc filter show dev enp130s0f0_0 ingress
> > filter protocol ip pref 2 flower chain 0
> > filter protocol ip pref 2 flower chain 0 handle 0x1
> >   dst_mac 00:11:22:33:44:66
> >   eth_type ipv4
> >   dst_ip 1.1.1.200
> >   in_hw in_hw_count 1
> > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > index 1 ref 1 bind 1
> > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > no_percpu
> > used_hw_stats delayed
> >
> > + ovs-appctl dpctl/dump-flows
> >
> > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > command to show them.
> >
> >> it looks ok but I wonder if it's redundant?
> >> I tested and when I removed a port with tc rules on it,
> >> all tc rules got deleted using the flow api flow del,
> >> we get into netdev_tc_flow_del() for each flow.
> > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > The debug step is:
> > 1. run the /tmp/tmp-ovs-debug.sh
> > 2. and while gdb the ovs-vswitchd:
> > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> >
> > the breakpoint was not triggered.
> >> So you shouldn't have a tc rule left or ufid mapping.
> > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > flow still are in the mapping.
> > When we add the port back and the install the rules again, the new
> > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > ufid mapping not
> > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> >> How did you test that you saw tc rules were still on the port?
> > Yes, I test it using the tc command.
>
> Hi.
>
> {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> is clearly stated in documentation:
>
> "DATAPATH FLOW TABLE DEBUGGING COMMANDS
>  ...
>  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
>  is running because it interferes with ovs-vswitchd's own datapath flow
>  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
>
> As you're injecting flows directly to datapath you can't expect higher layers
> to handle them correctly.  The issue here is that you're using OVS in a way
> it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> be sure that you're removing flows by hands from the datapath as you're adding
> them.
Hi
Thanks for your review. I test it with openflow, It works fine because
the *revalidator
will clean up the flows(which may be offloaded).
> In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> reverting of the previous patch that allowed offloading via add-flow since this
> is effectively a memory leak and even debugging commands should not produce
> a memory leak.
yes, we can revert the previous patch. But I think the command is
useful for us to
debug offload(rte flow offload and tc offload), and test the offload function.
For this issue, can we use this little patch to fix it ? or other better way ?
> ---
>
> BTW, IMHO, it's a very confusing design that we're installing rules directly to
> tc.  The issue here is that we have to maintain list of ports we offloaded to
> with ufids completely in userspace and we can not re-create these mappings after
> OVS restart like we do in case of kernel datapath.  The main issue is that we
> do not know which ports are ours.  Datapath is actually split between kernel
> and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> This leads to big number of issues.  If we could install tc rules from the
> inside of the usual kernel datapath that might solve a lot of configuration issues.
> Might create some new, but anyway, I think, common architecture would be much
> more clear.
> For example, in userspace datapath we're installing flows to usual userspace
> flow table AND to offload provider. This allows us to manage flows in more
> natural way.  And even if we will lost some flows inside offload provider
> implementation we will still clean up all the hanging data while removing flows
> from the usual userspace datapath flow tables.  So, offloading is actually
> joined part of userspace datapath and not a side thing like tc for kernel datapath.
> So, what is this part about: kernel datapath has persistent ports and flows
> (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> flows and non-persistent ports (i.e. list of ports we're affloading to) and
> non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> a lot of complications.
>
>
> Best regards, Ilya Maximets.
Simon Horman June 18, 2020, 11:18 a.m. UTC | #6
On Tue, Jun 16, 2020 at 04:25:21PM +0800, Tonghao Zhang wrote:
> On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> >
> > On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>>
> > >>> When ports were removed from bridge, we should flush the
> > >>> offload rules on the ports. The main reason is two factors:
> > >>>
> > >>> * The ports removed from bridge, will be managed by OvS, so
> > >>>   flush the rules which installed by OvS.
> > >>> * If using the TC flower offload, for example, tc rules still
> > >>>   are on the ports. And the information still are maintained by
> > >>>   OvS, such as the mapping for tc and ufid.
> > >>>   Then if adding the port to bridge and installing the rules
> > >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> > >>>   and delete the tc rule using tc handle which may not exist
> > >>>   (offload init api flushed them.) on kernel or is used by other
> > >>>   previous rules (if so, that rules will be deleted that is not
> > >>>   we expected.).
> > >>>
> > >>> Cc: Simon Horman <simon.horman@netronome.com>
> > >>> Cc: Paul Blakey <paulb@mellanox.com>
> > >>> Cc: Roi Dayan <roid@mellanox.com>
> > >>> Cc: Ben Pfaff <blp@ovn.org>
> > >>> Cc: William Tu <u9012063@gmail.com>
> > >>> Cc: Ilya Maximets <i.maximets@ovn.org>
> > >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> > >>> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>> ---
> > >>>  lib/netdev-offload.c | 1 +
> > >>>  1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> > >>> index ab97a292ebac..964566caab1e 100644
> > >>> --- a/lib/netdev-offload.c
> > >>> +++ b/lib/netdev-offload.c
> > >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> > >>>      data = netdev_ports_lookup(port_no, dpif_class);
> > >>>      if (data) {
> > >>>          dpif_port_destroy(&data->dpif_port);
> > >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> > >>>          netdev_close(data->netdev); /* unref and possibly close */
> > >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> > >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> > >>>
> > >>
> > > Hi Roi
> > > Thanks for your review. The case is reproduced as below:
> > > # ovs-appctl dpctl/show
> > > system@ovs-system:
> > >   lookups: hit:0 missed:0 lost:0
> > >   flows: 0
> > >   masks: hit:0 total:0 hit/pkt:0.00
> > >   port 0: ovs-system (internal)
> > >   port 1: br-int (internal)
> > >   port 2: enp130s0f0_0
> > >   port 3: enp130s0f0_1
> > >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> > >
> > > # cat /tmp/tmp-ovs-debug.sh
> > > set -x
> > > ovs-appctl dpctl/add-flow
> > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > 3
> > > ovs-appctl dpctl/dump-flows
> > > ovs-vsctl del-port br-int enp130s0f0_0
> > > tc filter show dev enp130s0f0_0 ingress
> > > ovs-appctl dpctl/dump-flows
> > >
> > > # sh /tmp/tmp-ovs-debug.sh
> > > + ovs-appctl dpctl/add-flow
> > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > 3
> > > + ovs-appctl dpctl/dump-flows
> > > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > > packets:0, bytes:0, used:0.020s, actions:3
> > > + ovs-vsctl del-port br-int enp130s0f0_0
> > > + tc filter show dev enp130s0f0_0 ingress
> > > filter protocol ip pref 2 flower chain 0
> > > filter protocol ip pref 2 flower chain 0 handle 0x1
> > >   dst_mac 00:11:22:33:44:66
> > >   eth_type ipv4
> > >   dst_ip 1.1.1.200
> > >   in_hw in_hw_count 1
> > > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > > index 1 ref 1 bind 1
> > > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > > no_percpu
> > > used_hw_stats delayed
> > >
> > > + ovs-appctl dpctl/dump-flows
> > >
> > > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > > command to show them.
> > >
> > >> it looks ok but I wonder if it's redundant?
> > >> I tested and when I removed a port with tc rules on it,
> > >> all tc rules got deleted using the flow api flow del,
> > >> we get into netdev_tc_flow_del() for each flow.
> > > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > > The debug step is:
> > > 1. run the /tmp/tmp-ovs-debug.sh
> > > 2. and while gdb the ovs-vswitchd:
> > > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> > >
> > > the breakpoint was not triggered.
> > >> So you shouldn't have a tc rule left or ufid mapping.
> > > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > > flow still are in the mapping.
> > > When we add the port back and the install the rules again, the new
> > > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > > ufid mapping not
> > > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> > >> How did you test that you saw tc rules were still on the port?
> > > Yes, I test it using the tc command.
> >
> > Hi.
> >
> > {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> > is clearly stated in documentation:
> >
> > "DATAPATH FLOW TABLE DEBUGGING COMMANDS
> >  ...
> >  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
> >  is running because it interferes with ovs-vswitchd's own datapath flow
> >  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
> >
> > As you're injecting flows directly to datapath you can't expect higher layers
> > to handle them correctly.  The issue here is that you're using OVS in a way
> > it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> > be sure that you're removing flows by hands from the datapath as you're adding
> > them.
> Hi
> Thanks for your review. I test it with openflow, It works fine because
> the *revalidator
> will clean up the flows(which may be offloaded).
> > In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> > reverting of the previous patch that allowed offloading via add-flow since this
> > is effectively a memory leak and even debugging commands should not produce
> > a memory leak.
> yes, we can revert the previous patch. But I think the command is
> useful for us to
> debug offload(rte flow offload and tc offload), and test the offload function.
> For this issue, can we use this little patch to fix it ? or other better way ?

I am confused. Which little patch?

> > ---
> >
> > BTW, IMHO, it's a very confusing design that we're installing rules directly to
> > tc.  The issue here is that we have to maintain list of ports we offloaded to
> > with ufids completely in userspace and we can not re-create these mappings after
> > OVS restart like we do in case of kernel datapath.  The main issue is that we
> > do not know which ports are ours.  Datapath is actually split between kernel
> > and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> > This leads to big number of issues.  If we could install tc rules from the
> > inside of the usual kernel datapath that might solve a lot of configuration issues.
> > Might create some new, but anyway, I think, common architecture would be much
> > more clear.
> > For example, in userspace datapath we're installing flows to usual userspace
> > flow table AND to offload provider. This allows us to manage flows in more
> > natural way.  And even if we will lost some flows inside offload provider
> > implementation we will still clean up all the hanging data while removing flows
> > from the usual userspace datapath flow tables.  So, offloading is actually
> > joined part of userspace datapath and not a side thing like tc for kernel datapath.
> > So, what is this part about: kernel datapath has persistent ports and flows
> > (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> > flows and non-persistent ports (i.e. list of ports we're affloading to) and
> > non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> > a lot of complications.
> >
> >
> > Best regards, Ilya Maximets.
> 
> 
> 
> -- 
> Best regards, Tonghao
Tonghao Zhang June 18, 2020, 11:41 a.m. UTC | #7
On Thu, Jun 18, 2020 at 7:18 PM Simon Horman <simon.horman@netronome.com> wrote:
>
> On Tue, Jun 16, 2020 at 04:25:21PM +0800, Tonghao Zhang wrote:
> > On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > >
> > > On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > > > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> > > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >>>
> > > >>> When ports were removed from bridge, we should flush the
> > > >>> offload rules on the ports. The main reason is two factors:
> > > >>>
> > > >>> * The ports removed from bridge, will be managed by OvS, so
> > > >>>   flush the rules which installed by OvS.
> > > >>> * If using the TC flower offload, for example, tc rules still
> > > >>>   are on the ports. And the information still are maintained by
> > > >>>   OvS, such as the mapping for tc and ufid.
> > > >>>   Then if adding the port to bridge and installing the rules
> > > >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> > > >>>   and delete the tc rule using tc handle which may not exist
> > > >>>   (offload init api flushed them.) on kernel or is used by other
> > > >>>   previous rules (if so, that rules will be deleted that is not
> > > >>>   we expected.).
> > > >>>
> > > >>> Cc: Simon Horman <simon.horman@netronome.com>
> > > >>> Cc: Paul Blakey <paulb@mellanox.com>
> > > >>> Cc: Roi Dayan <roid@mellanox.com>
> > > >>> Cc: Ben Pfaff <blp@ovn.org>
> > > >>> Cc: William Tu <u9012063@gmail.com>
> > > >>> Cc: Ilya Maximets <i.maximets@ovn.org>
> > > >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> > > >>> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> > > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >>> ---
> > > >>>  lib/netdev-offload.c | 1 +
> > > >>>  1 file changed, 1 insertion(+)
> > > >>>
> > > >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> > > >>> index ab97a292ebac..964566caab1e 100644
> > > >>> --- a/lib/netdev-offload.c
> > > >>> +++ b/lib/netdev-offload.c
> > > >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> > > >>>      data = netdev_ports_lookup(port_no, dpif_class);
> > > >>>      if (data) {
> > > >>>          dpif_port_destroy(&data->dpif_port);
> > > >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> > > >>>          netdev_close(data->netdev); /* unref and possibly close */
> > > >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> > > >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> > > >>>
> > > >>
> > > > Hi Roi
> > > > Thanks for your review. The case is reproduced as below:
> > > > # ovs-appctl dpctl/show
> > > > system@ovs-system:
> > > >   lookups: hit:0 missed:0 lost:0
> > > >   flows: 0
> > > >   masks: hit:0 total:0 hit/pkt:0.00
> > > >   port 0: ovs-system (internal)
> > > >   port 1: br-int (internal)
> > > >   port 2: enp130s0f0_0
> > > >   port 3: enp130s0f0_1
> > > >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> > > >
> > > > # cat /tmp/tmp-ovs-debug.sh
> > > > set -x
> > > > ovs-appctl dpctl/add-flow
> > > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > > 3
> > > > ovs-appctl dpctl/dump-flows
> > > > ovs-vsctl del-port br-int enp130s0f0_0
> > > > tc filter show dev enp130s0f0_0 ingress
> > > > ovs-appctl dpctl/dump-flows
> > > >
> > > > # sh /tmp/tmp-ovs-debug.sh
> > > > + ovs-appctl dpctl/add-flow
> > > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > > 3
> > > > + ovs-appctl dpctl/dump-flows
> > > > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > > > packets:0, bytes:0, used:0.020s, actions:3
> > > > + ovs-vsctl del-port br-int enp130s0f0_0
> > > > + tc filter show dev enp130s0f0_0 ingress
> > > > filter protocol ip pref 2 flower chain 0
> > > > filter protocol ip pref 2 flower chain 0 handle 0x1
> > > >   dst_mac 00:11:22:33:44:66
> > > >   eth_type ipv4
> > > >   dst_ip 1.1.1.200
> > > >   in_hw in_hw_count 1
> > > > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > > > index 1 ref 1 bind 1
> > > > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > > > no_percpu
> > > > used_hw_stats delayed
> > > >
> > > > + ovs-appctl dpctl/dump-flows
> > > >
> > > > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > > > command to show them.
> > > >
> > > >> it looks ok but I wonder if it's redundant?
> > > >> I tested and when I removed a port with tc rules on it,
> > > >> all tc rules got deleted using the flow api flow del,
> > > >> we get into netdev_tc_flow_del() for each flow.
> > > > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > > > The debug step is:
> > > > 1. run the /tmp/tmp-ovs-debug.sh
> > > > 2. and while gdb the ovs-vswitchd:
> > > > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > > > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> > > >
> > > > the breakpoint was not triggered.
> > > >> So you shouldn't have a tc rule left or ufid mapping.
> > > > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > > > flow still are in the mapping.
> > > > When we add the port back and the install the rules again, the new
> > > > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > > > ufid mapping not
> > > > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> > > >> How did you test that you saw tc rules were still on the port?
> > > > Yes, I test it using the tc command.
> > >
> > > Hi.
> > >
> > > {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> > > is clearly stated in documentation:
> > >
> > > "DATAPATH FLOW TABLE DEBUGGING COMMANDS
> > >  ...
> > >  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
> > >  is running because it interferes with ovs-vswitchd's own datapath flow
> > >  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
> > >
> > > As you're injecting flows directly to datapath you can't expect higher layers
> > > to handle them correctly.  The issue here is that you're using OVS in a way
> > > it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> > > be sure that you're removing flows by hands from the datapath as you're adding
> > > them.
> > Hi
> > Thanks for your review. I test it with openflow, It works fine because
> > the *revalidator
> > will clean up the flows(which may be offloaded).
> > > In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> > > reverting of the previous patch that allowed offloading via add-flow since this
> > > is effectively a memory leak and even debugging commands should not produce
> > > a memory leak.
> > yes, we can revert the previous patch. But I think the command is
> > useful for us to
> > debug offload(rte flow offload and tc offload), and test the offload function.
> > For this issue, can we use this little patch to fix it ? or other better way ?
>
> I am confused. Which little patch?
Hi Simon
This patch [1]:
[1] - http://patchwork.ozlabs.org/project/openvswitch/patch/20200611103635.53367-1-xiangxia.m.yue@gmail.com/

If we revert the patch which commit id is
e61984e781e6c7d621568428788cb87c11be8f1f
and we can't debug or install the flows with "ovs-app dpctl/add-flow",
for tc offload.
I think that tools are useful for us.
> > > ---
> > >
> > > BTW, IMHO, it's a very confusing design that we're installing rules directly to
> > > tc.  The issue here is that we have to maintain list of ports we offloaded to
> > > with ufids completely in userspace and we can not re-create these mappings after
> > > OVS restart like we do in case of kernel datapath.  The main issue is that we
> > > do not know which ports are ours.  Datapath is actually split between kernel
> > > and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> > > This leads to big number of issues.  If we could install tc rules from the
> > > inside of the usual kernel datapath that might solve a lot of configuration issues.
> > > Might create some new, but anyway, I think, common architecture would be much
> > > more clear.
> > > For example, in userspace datapath we're installing flows to usual userspace
> > > flow table AND to offload provider. This allows us to manage flows in more
> > > natural way.  And even if we will lost some flows inside offload provider
> > > implementation we will still clean up all the hanging data while removing flows
> > > from the usual userspace datapath flow tables.  So, offloading is actually
> > > joined part of userspace datapath and not a side thing like tc for kernel datapath.
> > > So, what is this part about: kernel datapath has persistent ports and flows
> > > (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> > > flows and non-persistent ports (i.e. list of ports we're affloading to) and
> > > non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> > > a lot of complications.
> > >
> > >
> > > Best regards, Ilya Maximets.
> >
> >
> >
> > --
> > Best regards, Tonghao
Tonghao Zhang July 16, 2020, 2:20 a.m. UTC | #8
On Thu, Jun 18, 2020 at 7:41 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Thu, Jun 18, 2020 at 7:18 PM Simon Horman <simon.horman@netronome.com> wrote:
> >
> > On Tue, Jun 16, 2020 at 04:25:21PM +0800, Tonghao Zhang wrote:
> > > On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > > >
> > > > On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > > > > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid@mellanox.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 2020-06-11 1:36 PM, xiangxia.m.yue@gmail.com wrote:
> > > > >>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > > >>>
> > > > >>> When ports were removed from bridge, we should flush the
> > > > >>> offload rules on the ports. The main reason is two factors:
> > > > >>>
> > > > >>> * The ports removed from bridge, will be managed by OvS, so
> > > > >>>   flush the rules which installed by OvS.
> > > > >>> * If using the TC flower offload, for example, tc rules still
> > > > >>>   are on the ports. And the information still are maintained by
> > > > >>>   OvS, such as the mapping for tc and ufid.
> > > > >>>   Then if adding the port to bridge and installing the rules
> > > > >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> > > > >>>   and delete the tc rule using tc handle which may not exist
> > > > >>>   (offload init api flushed them.) on kernel or is used by other
> > > > >>>   previous rules (if so, that rules will be deleted that is not
> > > > >>>   we expected.).
> > > > >>>
> > > > >>> Cc: Simon Horman <simon.horman@netronome.com>
> > > > >>> Cc: Paul Blakey <paulb@mellanox.com>
> > > > >>> Cc: Roi Dayan <roid@mellanox.com>
> > > > >>> Cc: Ben Pfaff <blp@ovn.org>
> > > > >>> Cc: William Tu <u9012063@gmail.com>
> > > > >>> Cc: Ilya Maximets <i.maximets@ovn.org>
> > > > >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&amp;data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&amp;sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&amp;reserved=0
> > > > >>> Co-authored-by: Wengang Hou <houwengang@didiglobal.com>
> > > > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > > >>> ---
> > > > >>>  lib/netdev-offload.c | 1 +
> > > > >>>  1 file changed, 1 insertion(+)
> > > > >>>
> > > > >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> > > > >>> index ab97a292ebac..964566caab1e 100644
> > > > >>> --- a/lib/netdev-offload.c
> > > > >>> +++ b/lib/netdev-offload.c
> > > > >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> > > > >>>      data = netdev_ports_lookup(port_no, dpif_class);
> > > > >>>      if (data) {
> > > > >>>          dpif_port_destroy(&data->dpif_port);
> > > > >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> > > > >>>          netdev_close(data->netdev); /* unref and possibly close */
> > > > >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> > > > >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> > > > >>>
> > > > >>
> > > > > Hi Roi
> > > > > Thanks for your review. The case is reproduced as below:
> > > > > # ovs-appctl dpctl/show
> > > > > system@ovs-system:
> > > > >   lookups: hit:0 missed:0 lost:0
> > > > >   flows: 0
> > > > >   masks: hit:0 total:0 hit/pkt:0.00
> > > > >   port 0: ovs-system (internal)
> > > > >   port 1: br-int (internal)
> > > > >   port 2: enp130s0f0_0
> > > > >   port 3: enp130s0f0_1
> > > > >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> > > > >
> > > > > # cat /tmp/tmp-ovs-debug.sh
> > > > > set -x
> > > > > ovs-appctl dpctl/add-flow
> > > > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > > > 3
> > > > > ovs-appctl dpctl/dump-flows
> > > > > ovs-vsctl del-port br-int enp130s0f0_0
> > > > > tc filter show dev enp130s0f0_0 ingress
> > > > > ovs-appctl dpctl/dump-flows
> > > > >
> > > > > # sh /tmp/tmp-ovs-debug.sh
> > > > > + ovs-appctl dpctl/add-flow
> > > > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > > > 3
> > > > > + ovs-appctl dpctl/dump-flows
> > > > > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > > > > packets:0, bytes:0, used:0.020s, actions:3
> > > > > + ovs-vsctl del-port br-int enp130s0f0_0
> > > > > + tc filter show dev enp130s0f0_0 ingress
> > > > > filter protocol ip pref 2 flower chain 0
> > > > > filter protocol ip pref 2 flower chain 0 handle 0x1
> > > > >   dst_mac 00:11:22:33:44:66
> > > > >   eth_type ipv4
> > > > >   dst_ip 1.1.1.200
> > > > >   in_hw in_hw_count 1
> > > > > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > > > > index 1 ref 1 bind 1
> > > > > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > > > > no_percpu
> > > > > used_hw_stats delayed
> > > > >
> > > > > + ovs-appctl dpctl/dump-flows
> > > > >
> > > > > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > > > > command to show them.
> > > > >
> > > > >> it looks ok but I wonder if it's redundant?
> > > > >> I tested and when I removed a port with tc rules on it,
> > > > >> all tc rules got deleted using the flow api flow del,
> > > > >> we get into netdev_tc_flow_del() for each flow.
> > > > > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > > > > The debug step is:
> > > > > 1. run the /tmp/tmp-ovs-debug.sh
> > > > > 2. and while gdb the ovs-vswitchd:
> > > > > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > > > > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> > > > >
> > > > > the breakpoint was not triggered.
> > > > >> So you shouldn't have a tc rule left or ufid mapping.
> > > > > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > > > > flow still are in the mapping.
> > > > > When we add the port back and the install the rules again, the new
> > > > > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > > > > ufid mapping not
> > > > > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> > > > >> How did you test that you saw tc rules were still on the port?
> > > > > Yes, I test it using the tc command.
> > > >
> > > > Hi.
> > > >
> > > > {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> > > > is clearly stated in documentation:
> > > >
> > > > "DATAPATH FLOW TABLE DEBUGGING COMMANDS
> > > >  ...
> > > >  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
> > > >  is running because it interferes with ovs-vswitchd's own datapath flow
> > > >  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
> > > >
> > > > As you're injecting flows directly to datapath you can't expect higher layers
> > > > to handle them correctly.  The issue here is that you're using OVS in a way
> > > > it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> > > > be sure that you're removing flows by hands from the datapath as you're adding
> > > > them.
> > > Hi
> > > Thanks for your review. I test it with openflow, It works fine because
> > > the *revalidator
> > > will clean up the flows(which may be offloaded).
> > > > In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> > > > reverting of the previous patch that allowed offloading via add-flow since this
> > > > is effectively a memory leak and even debugging commands should not produce
> > > > a memory leak.
> > > yes, we can revert the previous patch. But I think the command is
> > > useful for us to
> > > debug offload(rte flow offload and tc offload), and test the offload function.
> > > For this issue, can we use this little patch to fix it ? or other better way ?
> >
> > I am confused. Which little patch?
> Hi Simon
> This patch [1]:
> [1] - http://patchwork.ozlabs.org/project/openvswitch/patch/20200611103635.53367-1-xiangxia.m.yue@gmail.com/
>
> If we revert the patch which commit id is
> e61984e781e6c7d621568428788cb87c11be8f1f
> and we can't debug or install the flows with "ovs-app dpctl/add-flow",
> for tc offload.
> I think that tools are useful for us.
Hi Simon, Ilya
Do we have plan to apply this patch, or drop this patch ?
> > > > ---
> > > >
> > > > BTW, IMHO, it's a very confusing design that we're installing rules directly to
> > > > tc.  The issue here is that we have to maintain list of ports we offloaded to
> > > > with ufids completely in userspace and we can not re-create these mappings after
> > > > OVS restart like we do in case of kernel datapath.  The main issue is that we
> > > > do not know which ports are ours.  Datapath is actually split between kernel
> > > > and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> > > > This leads to big number of issues.  If we could install tc rules from the
> > > > inside of the usual kernel datapath that might solve a lot of configuration issues.
> > > > Might create some new, but anyway, I think, common architecture would be much
> > > > more clear.
> > > > For example, in userspace datapath we're installing flows to usual userspace
> > > > flow table AND to offload provider. This allows us to manage flows in more
> > > > natural way.  And even if we will lost some flows inside offload provider
> > > > implementation we will still clean up all the hanging data while removing flows
> > > > from the usual userspace datapath flow tables.  So, offloading is actually
> > > > joined part of userspace datapath and not a side thing like tc for kernel datapath.
> > > > So, what is this part about: kernel datapath has persistent ports and flows
> > > > (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> > > > flows and non-persistent ports (i.e. list of ports we're affloading to) and
> > > > non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> > > > a lot of complications.
> > > >
> > > >
> > > > Best regards, Ilya Maximets.
> > >
> > >
> > >
> > > --
> > > Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao
William Tu July 16, 2020, 2:23 p.m. UTC | #9
> > > > Hi
> > > > Thanks for your review. I test it with openflow, It works fine because
> > > > the *revalidator
> > > > will clean up the flows(which may be offloaded).
> > > > > In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> > > > > reverting of the previous patch that allowed offloading via add-flow since this
> > > > > is effectively a memory leak and even debugging commands should not produce
> > > > > a memory leak.
> > > > yes, we can revert the previous patch. But I think the command is
> > > > useful for us to
> > > > debug offload(rte flow offload and tc offload), and test the offload function.
> > > > For this issue, can we use this little patch to fix it ? or other better way ?
> > >
> > > I am confused. Which little patch?
> > Hi Simon
> > This patch [1]:
> > [1] - http://patchwork.ozlabs.org/project/openvswitch/patch/20200611103635.53367-1-xiangxia.m.yue@gmail.com/
> >
> > If we revert the patch which commit id is
> > e61984e781e6c7d621568428788cb87c11be8f1f
> > and we can't debug or install the flows with "ovs-app dpctl/add-flow",
> > for tc offload.
> > I think that tools are useful for us.
> Hi Simon, Ilya
> Do we have plan to apply this patch, or drop this patch ?

My understanding from reading the discussion is that
using ovs-appctl dpclt/add-flow is just for debugging purposes.
When using the right way, the ovs-ofctl add-flow, there is no such problem.

I don't have any strong opinion on drop or apply.
William
diff mbox series

Patch

diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index ab97a292ebac..964566caab1e 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -593,6 +593,7 @@  netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
     data = netdev_ports_lookup(port_no, dpif_class);
     if (data) {
         dpif_port_destroy(&data->dpif_port);
+        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
         netdev_close(data->netdev); /* unref and possibly close */
         hmap_remove(&port_to_netdev, &data->portno_node);
         hmap_remove(&ifindex_to_port, &data->ifindex_node);