diff mbox series

[ovs-dev,v2] netdev-linux: do not touch LAG slaves if master is not attached to ovs.

Message ID 20220621061039.22524-1-thomas.liu@ucloud.cn
State Superseded
Headers show
Series [ovs-dev,v2] netdev-linux: do not touch LAG slaves if master is not attached to ovs. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Tao Liu June 21, 2022, 6:10 a.m. UTC
Bond master netdev may be created without a classification type, due
to routing or tunneling code.

If bond master is not attached to ovs, the ingress block on slaves shoud
not be updated.

Simple reproducer:
  ip a add 10.1.1.1/30 dev bond0
  ip l set net3 master bond0
  tc q ls dev net3

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
---
 lib/netdev-linux.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Marcelo Leitner June 21, 2022, 1:49 p.m. UTC | #1
Hi,

On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
> Bond master netdev may be created without a classification type, due
> to routing or tunneling code.

Can you please elaborate on why is this an issue?

>
> If bond master is not attached to ovs, the ingress block on slaves shoud
> not be updated.

Why? (in short, but more below)

>
> Simple reproducer:
>   ip a add 10.1.1.1/30 dev bond0
>   ip l set net3 master bond0
>   tc q ls dev net3
>
> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> ---
>  lib/netdev-linux.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 9d12502..b9e0c99 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
>                  return;
>              }
>
> -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
> +            /* If bond master is not attached to ovs, ingress block on slaves
> +             * shoud not be updated. */

I think this will break a core use case. As in your reproducer, that's
pretty much how it is expected to work today with tunnels, for
example:

  ip a add 10.1.1.1/30 dev bond0
  ip l set net3 master bond0
  ip l s bond0 up
  ovs-vsctl add-port ovsbr0 vxlan0 -- \
	set interface vxlan0
	type=vxlan \
	options:local_ip=10.1.1.1
	options:remote_ip=10.1.1.2
	options:key=0

If you patch like this, then who would be adding the ingress qdiscs on
the slaves?

Forcing the bond to be added is probably not optimal, because it
doesn't really need to be. Unless your considering that as some sort
of authorization for ovs to mangle with it?

  Marcelo

> +            if (!master_netdev->auto_classified &&
> +                is_netdev_linux_class(master_netdev->netdev_class)) {
>                  block_id = netdev_get_block_id(master_netdev);
>                  if (!block_id) {
>                      netdev_close(master_netdev);
> --
> 1.8.3.1
>
Tao Liu June 22, 2022, 3:10 a.m. UTC | #2
On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
> Hi,
> 
> On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
> > Bond master netdev may be created without a classification type, due
> > to routing or tunneling code.
> 
> Can you please elaborate on why is this an issue?
> 
Hi, thanks for your reply.

We are using BlueField2 in Bare Metal Server.
p0 and p1 enslave to master bond0. A SF is created for RDMA.
ovs manages pf0hpf and gre.

Traffics between bond0 and SF are controlled by tc rules.
```
master=bond0
slave1=p0
slave2=p1
sf=enp3s0f0s0
sf_rep=en3f0pf0sf0
$tc qdisc add dev $master ingress_block 22 ingress
$tc qdisc add dev $slave1 ingress_block 22 ingress
$tc qdisc add dev $slave2 ingress_block 22 ingress
$tc qdisc add dev $sf_rep ingress

# some customized tc rules
$tc filter add dev $sf_rep pref 1 ingress ... action mirred egress redirect dev $master

$tc filter add block 22 pref 1 ... action mirred egress redirect dev $sf_rep
```

Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
ovs when they enslaves to bond0.

Also we have a similar architectur on cx5/cx6.

> >
> > If bond master is not attached to ovs, the ingress block on slaves shoud
> > not be updated.
> 
> Why? (in short, but more below)
> 
If we do not use bond master and slaves in ovs, they shoud not be interfered.

> >
> > Simple reproducer:
> >   ip a add 10.1.1.1/30 dev bond0
> >   ip l set net3 master bond0
> >   tc q ls dev net3
> >
> > Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
> > Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> > ---
> >  lib/netdev-linux.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index 9d12502..b9e0c99 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
> >                  return;
> >              }
> >
> > -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
> > +            /* If bond master is not attached to ovs, ingress block on slaves
> > +             * shoud not be updated. */
> 
> I think this will break a core use case. As in your reproducer, that's
> pretty much how it is expected to work today with tunnels, for
> example:
> 
>   ip a add 10.1.1.1/30 dev bond0
>   ip l set net3 master bond0
>   ip l s bond0 up
>   ovs-vsctl add-port ovsbr0 vxlan0 -- \
> 	set interface vxlan0
> 	type=vxlan \
> 	options:local_ip=10.1.1.1
> 	options:remote_ip=10.1.1.2
> 	options:key=0
> 
> If you patch like this, then who would be adding the ingress qdiscs on
> the slaves?
> 
> Forcing the bond to be added is probably not optimal, because it
> doesn't really need to be. Unless your considering that as some sort
> of authorization for ovs to mangle with it?
> 
>   Marcelo
> 
This patch does not break the tunnel use case. The ingress attaches on
vxlan_sys, but not need to attach on bond master or slaves.

> > +            if (!master_netdev->auto_classified &&
> > +                is_netdev_linux_class(master_netdev->netdev_class)) {
> >                  block_id = netdev_get_block_id(master_netdev);
> >                  if (!block_id) {
> >                      netdev_close(master_netdev);
> > --
> > 1.8.3.1
> >
> 
>
Marcelo Leitner June 22, 2022, 8:56 p.m. UTC | #3
On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
> On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
> > Hi,
> >
> > On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
> > > Bond master netdev may be created without a classification type, due
> > > to routing or tunneling code.
> >
> > Can you please elaborate on why is this an issue?
> >
> Hi, thanks for your reply.
>
> We are using BlueField2 in Bare Metal Server.
> p0 and p1 enslave to master bond0. A SF is created for RDMA.
> ovs manages pf0hpf and gre.
>
> Traffics between bond0 and SF are controlled by tc rules.
> ```
> master=bond0
> slave1=p0
> slave2=p1
> sf=enp3s0f0s0
> sf_rep=en3f0pf0sf0
> $tc qdisc add dev $master ingress_block 22 ingress
> $tc qdisc add dev $slave1 ingress_block 22 ingress
> $tc qdisc add dev $slave2 ingress_block 22 ingress
> $tc qdisc add dev $sf_rep ingress
>
> # some customized tc rules
> $tc filter add dev $sf_rep pref 1 ingress ... action mirred egress redirect dev $master
>
> $tc filter add block 22 pref 1 ... action mirred egress redirect dev $sf_rep
> ```
>
> Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
> ovs when they enslaves to bond0.
>
> Also we have a similar architectur on cx5/cx6.
>
> > >
> > > If bond master is not attached to ovs, the ingress block on slaves shoud
> > > not be updated.
> >
> > Why? (in short, but more below)
> >
> If we do not use bond master and slaves in ovs, they shoud not be interfered.

Makes sense, ...

>
> > >
> > > Simple reproducer:
> > >   ip a add 10.1.1.1/30 dev bond0
> > >   ip l set net3 master bond0
> > >   tc q ls dev net3
> > >
> > > Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
> > > Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> > > ---
> > >  lib/netdev-linux.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > > index 9d12502..b9e0c99 100644
> > > --- a/lib/netdev-linux.c
> > > +++ b/lib/netdev-linux.c
> > > @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
> > >                  return;
> > >              }
> > >
> > > -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
> > > +            /* If bond master is not attached to ovs, ingress block on slaves
> > > +             * shoud not be updated. */
> >
> > I think this will break a core use case. As in your reproducer, that's
> > pretty much how it is expected to work today with tunnels, for
> > example:
> >
> >   ip a add 10.1.1.1/30 dev bond0
> >   ip l set net3 master bond0
> >   ip l s bond0 up
> >   ovs-vsctl add-port ovsbr0 vxlan0 -- \
> > 	set interface vxlan0
> > 	type=vxlan \
> > 	options:local_ip=10.1.1.1
> > 	options:remote_ip=10.1.1.2
> > 	options:key=0
> >
> > If you patch like this, then who would be adding the ingress qdiscs on
> > the slaves?
> >
> > Forcing the bond to be added is probably not optimal, because it
> > doesn't really need to be. Unless your considering that as some sort
> > of authorization for ovs to mangle with it?
> >
> >   Marcelo
> >
> This patch does not break the tunnel use case. The ingress attaches on
> vxlan_sys, but not need to attach on bond master or slaves.

... I was under the impression that the driver needed it somehow,
despite that. But apparently that's not right, as the driver will have
one indr callback for each uplink, and should be able to have its way
from there.

Thanks for the explanations.

Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>

>
> > > +            if (!master_netdev->auto_classified &&
> > > +                is_netdev_linux_class(master_netdev->netdev_class)) {
> > >                  block_id = netdev_get_block_id(master_netdev);
> > >                  if (!block_id) {
> > >                      netdev_close(master_netdev);
> > > --
> > > 1.8.3.1
> > >
> >
> >
>
Roi Dayan June 23, 2022, 9:42 a.m. UTC | #4
On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
>> On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
>>> Hi,
>>>
>>> On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
>>>> Bond master netdev may be created without a classification type, due
>>>> to routing or tunneling code.
>>>
>>> Can you please elaborate on why is this an issue?
>>>
>> Hi, thanks for your reply.
>>
>> We are using BlueField2 in Bare Metal Server.
>> p0 and p1 enslave to master bond0. A SF is created for RDMA.
>> ovs manages pf0hpf and gre.
>>
>> Traffics between bond0 and SF are controlled by tc rules.
>> ```
>> master=bond0
>> slave1=p0
>> slave2=p1
>> sf=enp3s0f0s0
>> sf_rep=en3f0pf0sf0
>> $tc qdisc add dev $master ingress_block 22 ingress
>> $tc qdisc add dev $slave1 ingress_block 22 ingress
>> $tc qdisc add dev $slave2 ingress_block 22 ingress
>> $tc qdisc add dev $sf_rep ingress
>>
>> # some customized tc rules
>> $tc filter add dev $sf_rep pref 1 ingress ... action mirred egress redirect dev $master
>>
>> $tc filter add block 22 pref 1 ... action mirred egress redirect dev $sf_rep
>> ```
>>
>> Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
>> ovs when they enslaves to bond0.
>>
>> Also we have a similar architectur on cx5/cx6.
>>
>>>>
>>>> If bond master is not attached to ovs, the ingress block on slaves shoud
>>>> not be updated.
>>>
>>> Why? (in short, but more below)
>>>
>> If we do not use bond master and slaves in ovs, they shoud not be interfered.
> 
> Makes sense, ...
> 
>>
>>>>
>>>> Simple reproducer:
>>>>    ip a add 10.1.1.1/30 dev bond0
>>>>    ip l set net3 master bond0
>>>>    tc q ls dev net3
>>>>
>>>> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
>>>> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
>>>> ---
>>>>   lib/netdev-linux.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>>> index 9d12502..b9e0c99 100644
>>>> --- a/lib/netdev-linux.c
>>>> +++ b/lib/netdev-linux.c
>>>> @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
>>>>                   return;
>>>>               }
>>>>
>>>> -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
>>>> +            /* If bond master is not attached to ovs, ingress block on slaves
>>>> +             * shoud not be updated. */
>>>
>>> I think this will break a core use case. As in your reproducer, that's
>>> pretty much how it is expected to work today with tunnels, for
>>> example:
>>>
>>>    ip a add 10.1.1.1/30 dev bond0
>>>    ip l set net3 master bond0
>>>    ip l s bond0 up
>>>    ovs-vsctl add-port ovsbr0 vxlan0 -- \
>>> 	set interface vxlan0
>>> 	type=vxlan \
>>> 	options:local_ip=10.1.1.1
>>> 	options:remote_ip=10.1.1.2
>>> 	options:key=0
>>>
>>> If you patch like this, then who would be adding the ingress qdiscs on
>>> the slaves?
>>>
>>> Forcing the bond to be added is probably not optimal, because it
>>> doesn't really need to be. Unless your considering that as some sort
>>> of authorization for ovs to mangle with it?
>>>
>>>    Marcelo
>>>
>> This patch does not break the tunnel use case. The ingress attaches on
>> vxlan_sys, but not need to attach on bond master or slaves.
> 
> ... I was under the impression that the driver needed it somehow,
> despite that. But apparently that's not right, as the driver will have
> one indr callback for each uplink, and should be able to have its way
> from there.
> 
> Thanks for the explanations.
> 
> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 

there is an example without bond about ingress. We can
use vxlan port in ovs with tunnel ip assigned on the pf.
like Tao mentioned, the ingress and tc rules are created on
the vxlan netdev created by ovs but ovs doesnt recreate ingress
on the pf and the rules are offloaded by the driver.

also looks good to me. thanks.

Reviewed-by: Roi Dayan <roid@nvidia.com>


>>
>>>> +            if (!master_netdev->auto_classified &&
>>>> +                is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>                   block_id = netdev_get_block_id(master_netdev);
>>>>                   if (!block_id) {
>>>>                       netdev_close(master_netdev);
>>>> --
>>>> 1.8.3.1
>>>>
>>>
>>>
>>
>
Roi Dayan June 27, 2022, 6:40 a.m. UTC | #5
On 2022-06-23 12:42 PM, Roi Dayan wrote:
> 
> 
> On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
>> On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
>>> On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
>>>> Hi,
>>>>
>>>> On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
>>>>> Bond master netdev may be created without a classification type, due
>>>>> to routing or tunneling code.
>>>>
>>>> Can you please elaborate on why is this an issue?
>>>>
>>> Hi, thanks for your reply.
>>>
>>> We are using BlueField2 in Bare Metal Server.
>>> p0 and p1 enslave to master bond0. A SF is created for RDMA.
>>> ovs manages pf0hpf and gre.
>>>
>>> Traffics between bond0 and SF are controlled by tc rules.
>>> ```
>>> master=bond0
>>> slave1=p0
>>> slave2=p1
>>> sf=enp3s0f0s0
>>> sf_rep=en3f0pf0sf0
>>> $tc qdisc add dev $master ingress_block 22 ingress
>>> $tc qdisc add dev $slave1 ingress_block 22 ingress
>>> $tc qdisc add dev $slave2 ingress_block 22 ingress
>>> $tc qdisc add dev $sf_rep ingress
>>>
>>> # some customized tc rules
>>> $tc filter add dev $sf_rep pref 1 ingress ... action mirred egress 
>>> redirect dev $master
>>>
>>> $tc filter add block 22 pref 1 ... action mirred egress redirect dev 
>>> $sf_rep
>>> ```
>>>
>>> Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
>>> ovs when they enslaves to bond0.
>>>
>>> Also we have a similar architectur on cx5/cx6.
>>>
>>>>>
>>>>> If bond master is not attached to ovs, the ingress block on slaves 
>>>>> shoud
>>>>> not be updated.
>>>>
>>>> Why? (in short, but more below)
>>>>
>>> If we do not use bond master and slaves in ovs, they shoud not be 
>>> interfered.
>>
>> Makes sense, ...
>>
>>>
>>>>>
>>>>> Simple reproducer:
>>>>>    ip a add 10.1.1.1/30 dev bond0
>>>>>    ip l set net3 master bond0
>>>>>    tc q ls dev net3
>>>>>
>>>>> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves 
>>>>> to TC")
>>>>> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
>>>>> ---
>>>>>   lib/netdev-linux.c | 5 ++++-
>>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>>>> index 9d12502..b9e0c99 100644
>>>>> --- a/lib/netdev-linux.c
>>>>> +++ b/lib/netdev-linux.c
>>>>> @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct 
>>>>> rtnetlink_change *change)
>>>>>                   return;
>>>>>               }
>>>>>
>>>>> -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>> +            /* If bond master is not attached to ovs, ingress 
>>>>> block on slaves
>>>>> +             * shoud not be updated. */
>>>>
>>>> I think this will break a core use case. As in your reproducer, that's
>>>> pretty much how it is expected to work today with tunnels, for
>>>> example:
>>>>
>>>>    ip a add 10.1.1.1/30 dev bond0
>>>>    ip l set net3 master bond0
>>>>    ip l s bond0 up
>>>>    ovs-vsctl add-port ovsbr0 vxlan0 -- \
>>>>     set interface vxlan0
>>>>     type=vxlan \
>>>>     options:local_ip=10.1.1.1
>>>>     options:remote_ip=10.1.1.2
>>>>     options:key=0
>>>>
>>>> If you patch like this, then who would be adding the ingress qdiscs on
>>>> the slaves?
>>>>
>>>> Forcing the bond to be added is probably not optimal, because it
>>>> doesn't really need to be. Unless your considering that as some sort
>>>> of authorization for ovs to mangle with it?
>>>>
>>>>    Marcelo
>>>>
>>> This patch does not break the tunnel use case. The ingress attaches on
>>> vxlan_sys, but not need to attach on bond master or slaves.
>>
>> ... I was under the impression that the driver needed it somehow,
>> despite that. But apparently that's not right, as the driver will have
>> one indr callback for each uplink, and should be able to have its way
>> from there.
>>
>> Thanks for the explanations.
>>
>> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>
> 
> there is an example without bond about ingress. We can
> use vxlan port in ovs with tunnel ip assigned on the pf.
> like Tao mentioned, the ingress and tc rules are created on
> the vxlan netdev created by ovs but ovs doesnt recreate ingress
> on the pf and the rules are offloaded by the driver.
> 
> also looks good to me. thanks.
> 
> Reviewed-by: Roi Dayan <roid@nvidia.com>
> 

Hmm, sorry but looks like I did review too quick.
I did some manual testing now and ovs doesn't add
ingress to the slave devices for me.
master_netdev->auto_classified==true in my case.
can you describe the steps you did to test? or you just tested
the flow where bond is not attached to ovs?


> 
>>>
>>>>> +            if (!master_netdev->auto_classified &&
>>>>> +                is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>>                   block_id = netdev_get_block_id(master_netdev);
>>>>>                   if (!block_id) {
>>>>>                       netdev_close(master_netdev);
>>>>> -- 
>>>>> 1.8.3.1
>>>>>
>>>>
>>>>
>>>
>>
Tao Liu June 27, 2022, 7:49 a.m. UTC | #6
On Mon, Jun 27, 2022 at 09:40:38AM +0300, Roi Dayan wrote:
> 
> 
> On 2022-06-23 12:42 PM, Roi Dayan wrote:
> > 
> > 
> > On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
> > > On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
> > > > On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
> > > > > Hi,
> > > > > 
> > > > > On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
> > > > > > Bond master netdev may be created without a classification type, due
> > > > > > to routing or tunneling code.
> > > > > 
> > > > > Can you please elaborate on why is this an issue?
> > > > > 
> > > > Hi, thanks for your reply.
> > > > 
> > > > We are using BlueField2 in Bare Metal Server.
> > > > p0 and p1 enslave to master bond0. A SF is created for RDMA.
> > > > ovs manages pf0hpf and gre.
> > > > 
> > > > Traffics between bond0 and SF are controlled by tc rules.
> > > > ```
> > > > master=bond0
> > > > slave1=p0
> > > > slave2=p1
> > > > sf=enp3s0f0s0
> > > > sf_rep=en3f0pf0sf0
> > > > $tc qdisc add dev $master ingress_block 22 ingress
> > > > $tc qdisc add dev $slave1 ingress_block 22 ingress
> > > > $tc qdisc add dev $slave2 ingress_block 22 ingress
> > > > $tc qdisc add dev $sf_rep ingress
> > > > 
> > > > # some customized tc rules
> > > > $tc filter add dev $sf_rep pref 1 ingress ... action mirred
> > > > egress redirect dev $master
> > > > 
> > > > $tc filter add block 22 pref 1 ... action mirred egress redirect
> > > > dev $sf_rep
> > > > ```
> > > > 
> > > > Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
> > > > ovs when they enslaves to bond0.
> > > > 
> > > > Also we have a similar architectur on cx5/cx6.
> > > > 
> > > > > > 
> > > > > > If bond master is not attached to ovs, the ingress block
> > > > > > on slaves shoud
> > > > > > not be updated.
> > > > > 
> > > > > Why? (in short, but more below)
> > > > > 
> > > > If we do not use bond master and slaves in ovs, they shoud not
> > > > be interfered.
> > > 
> > > Makes sense, ...
> > > 
> > > > 
> > > > > > 
> > > > > > Simple reproducer:
> > > > > >    ip a add 10.1.1.1/30 dev bond0
> > > > > >    ip l set net3 master bond0
> > > > > >    tc q ls dev net3
> > > > > > 
> > > > > > Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload
> > > > > > LAG slaves to TC")
> > > > > > Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> > > > > > ---
> > > > > >   lib/netdev-linux.c | 5 ++++-
> > > > > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > > > > > index 9d12502..b9e0c99 100644
> > > > > > --- a/lib/netdev-linux.c
> > > > > > +++ b/lib/netdev-linux.c
> > > > > > @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct
> > > > > > rtnetlink_change *change)
> > > > > >                   return;
> > > > > >               }
> > > > > > 
> > > > > > -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
> > > > > > +            /* If bond master is not attached to ovs,
> > > > > > ingress block on slaves
> > > > > > +             * shoud not be updated. */
> > > > > 
> > > > > I think this will break a core use case. As in your reproducer, that's
> > > > > pretty much how it is expected to work today with tunnels, for
> > > > > example:
> > > > > 
> > > > >    ip a add 10.1.1.1/30 dev bond0
> > > > >    ip l set net3 master bond0
> > > > >    ip l s bond0 up
> > > > >    ovs-vsctl add-port ovsbr0 vxlan0 -- \
> > > > >     set interface vxlan0
> > > > >     type=vxlan \
> > > > >     options:local_ip=10.1.1.1
> > > > >     options:remote_ip=10.1.1.2
> > > > >     options:key=0
> > > > > 
> > > > > If you patch like this, then who would be adding the ingress qdiscs on
> > > > > the slaves?
> > > > > 
> > > > > Forcing the bond to be added is probably not optimal, because it
> > > > > doesn't really need to be. Unless your considering that as some sort
> > > > > of authorization for ovs to mangle with it?
> > > > > 
> > > > >    Marcelo
> > > > > 
> > > > This patch does not break the tunnel use case. The ingress attaches on
> > > > vxlan_sys, but not need to attach on bond master or slaves.
> > > 
> > > ... I was under the impression that the driver needed it somehow,
> > > despite that. But apparently that's not right, as the driver will have
> > > one indr callback for each uplink, and should be able to have its way
> > > from there.
> > > 
> > > Thanks for the explanations.
> > > 
> > > Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > 
> > 
> > there is an example without bond about ingress. We can
> > use vxlan port in ovs with tunnel ip assigned on the pf.
> > like Tao mentioned, the ingress and tc rules are created on
> > the vxlan netdev created by ovs but ovs doesnt recreate ingress
> > on the pf and the rules are offloaded by the driver.
> > 
> > also looks good to me. thanks.
> > 
> > Reviewed-by: Roi Dayan <roid@nvidia.com>
> > 
> 
> Hmm, sorry but looks like I did review too quick.
> I did some manual testing now and ovs doesn't add
> ingress to the slave devices for me.
> master_netdev->auto_classified==true in my case.
> can you describe the steps you did to test? or you just tested
> the flow where bond is not attached to ovs?
> 
# ovs-vsctl show
866b5d8e-bf4d-4600-9dd3-b724036c7e99
    ovs_version: "2.17.90"

# echo "+bond0" > /sys/class/net/bonding_masters

# ip l show dev bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 94:29:2f:70:22:9c brd ff:ff:ff:ff:ff:ff
# ip a add 10.1.1.1/30 dev bond0

# ip l set net3 down
# tc q add dev net3 ingress_block 111 ingress
# tc q ls dev net3 ingress
    qdisc ingress ffff: parent ffff:fff1 ingress_block 111 ----------------
# ip l set net3 master bond0
# tc q ls dev net3 ingress
    qdisc ingress ffff: parent ffff:fff1 ingress_block 22 ----------------

Also add some print in netdev_linux_update_lag():
2022-06-27T07:43:45.324Z|00058|netdev_linux|INFO|LAG: master_name=bond0, master_ifindex=22, auto_classified=1
> 
> > 
> > > > 
> > > > > > +            if (!master_netdev->auto_classified &&
> > > > > > +                is_netdev_linux_class(master_netdev->netdev_class)) {
> > > > > >                   block_id = netdev_get_block_id(master_netdev);
> > > > > >                   if (!block_id) {
> > > > > >                       netdev_close(master_netdev);
> > > > > > -- 
> > > > > > 1.8.3.1
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
>
Roi Dayan June 27, 2022, 8:51 a.m. UTC | #7
On 2022-06-27 10:49 AM, Tao Liu wrote:
> On Mon, Jun 27, 2022 at 09:40:38AM +0300, Roi Dayan wrote:
>>
>>
>> On 2022-06-23 12:42 PM, Roi Dayan wrote:
>>>
>>>
>>> On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
>>>> On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
>>>>> On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
>>>>>>> Bond master netdev may be created without a classification type, due
>>>>>>> to routing or tunneling code.
>>>>>>
>>>>>> Can you please elaborate on why is this an issue?
>>>>>>
>>>>> Hi, thanks for your reply.
>>>>>
>>>>> We are using BlueField2 in Bare Metal Server.
>>>>> p0 and p1 enslave to master bond0. A SF is created for RDMA.
>>>>> ovs manages pf0hpf and gre.
>>>>>
>>>>> Traffics between bond0 and SF are controlled by tc rules.
>>>>> ```
>>>>> master=bond0
>>>>> slave1=p0
>>>>> slave2=p1
>>>>> sf=enp3s0f0s0
>>>>> sf_rep=en3f0pf0sf0
>>>>> $tc qdisc add dev $master ingress_block 22 ingress
>>>>> $tc qdisc add dev $slave1 ingress_block 22 ingress
>>>>> $tc qdisc add dev $slave2 ingress_block 22 ingress
>>>>> $tc qdisc add dev $sf_rep ingress
>>>>>
>>>>> # some customized tc rules
>>>>> $tc filter add dev $sf_rep pref 1 ingress ... action mirred
>>>>> egress redirect dev $master
>>>>>
>>>>> $tc filter add block 22 pref 1 ... action mirred egress redirect
>>>>> dev $sf_rep
>>>>> ```
>>>>>
>>>>> Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
>>>>> ovs when they enslaves to bond0.
>>>>>
>>>>> Also we have a similar architectur on cx5/cx6.
>>>>>
>>>>>>>
>>>>>>> If bond master is not attached to ovs, the ingress block
>>>>>>> on slaves shoud
>>>>>>> not be updated.
>>>>>>
>>>>>> Why? (in short, but more below)
>>>>>>
>>>>> If we do not use bond master and slaves in ovs, they shoud not
>>>>> be interfered.
>>>>
>>>> Makes sense, ...
>>>>
>>>>>
>>>>>>>
>>>>>>> Simple reproducer:
>>>>>>>     ip a add 10.1.1.1/30 dev bond0
>>>>>>>     ip l set net3 master bond0
>>>>>>>     tc q ls dev net3
>>>>>>>
>>>>>>> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload
>>>>>>> LAG slaves to TC")
>>>>>>> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
>>>>>>> ---
>>>>>>>    lib/netdev-linux.c | 5 ++++-
>>>>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>>>>>> index 9d12502..b9e0c99 100644
>>>>>>> --- a/lib/netdev-linux.c
>>>>>>> +++ b/lib/netdev-linux.c
>>>>>>> @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct
>>>>>>> rtnetlink_change *change)
>>>>>>>                    return;
>>>>>>>                }
>>>>>>>
>>>>>>> -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>>>> +            /* If bond master is not attached to ovs,
>>>>>>> ingress block on slaves
>>>>>>> +             * shoud not be updated. */
>>>>>>
>>>>>> I think this will break a core use case. As in your reproducer, that's
>>>>>> pretty much how it is expected to work today with tunnels, for
>>>>>> example:
>>>>>>
>>>>>>     ip a add 10.1.1.1/30 dev bond0
>>>>>>     ip l set net3 master bond0
>>>>>>     ip l s bond0 up
>>>>>>     ovs-vsctl add-port ovsbr0 vxlan0 -- \
>>>>>>      set interface vxlan0
>>>>>>      type=vxlan \
>>>>>>      options:local_ip=10.1.1.1
>>>>>>      options:remote_ip=10.1.1.2
>>>>>>      options:key=0
>>>>>>
>>>>>> If you patch like this, then who would be adding the ingress qdiscs on
>>>>>> the slaves?
>>>>>>
>>>>>> Forcing the bond to be added is probably not optimal, because it
>>>>>> doesn't really need to be. Unless your considering that as some sort
>>>>>> of authorization for ovs to mangle with it?
>>>>>>
>>>>>>     Marcelo
>>>>>>
>>>>> This patch does not break the tunnel use case. The ingress attaches on
>>>>> vxlan_sys, but not need to attach on bond master or slaves.
>>>>
>>>> ... I was under the impression that the driver needed it somehow,
>>>> despite that. But apparently that's not right, as the driver will have
>>>> one indr callback for each uplink, and should be able to have its way
>>>> from there.
>>>>
>>>> Thanks for the explanations.
>>>>
>>>> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>
>>>
>>> there is an example without bond about ingress. We can
>>> use vxlan port in ovs with tunnel ip assigned on the pf.
>>> like Tao mentioned, the ingress and tc rules are created on
>>> the vxlan netdev created by ovs but ovs doesnt recreate ingress
>>> on the pf and the rules are offloaded by the driver.
>>>
>>> also looks good to me. thanks.
>>>
>>> Reviewed-by: Roi Dayan <roid@nvidia.com>
>>>
>>
>> Hmm, sorry but looks like I did review too quick.
>> I did some manual testing now and ovs doesn't add
>> ingress to the slave devices for me.
>> master_netdev->auto_classified==true in my case.
>> can you describe the steps you did to test? or you just tested
>> the flow where bond is not attached to ovs?
>>
> # ovs-vsctl show
> 866b5d8e-bf4d-4600-9dd3-b724036c7e99
>      ovs_version: "2.17.90"
> 
> # echo "+bond0" > /sys/class/net/bonding_masters
> 
> # ip l show dev bond0
> 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>      link/ether 94:29:2f:70:22:9c brd ff:ff:ff:ff:ff:ff
> # ip a add 10.1.1.1/30 dev bond0
> 
> # ip l set net3 down
> # tc q add dev net3 ingress_block 111 ingress
> # tc q ls dev net3 ingress
>      qdisc ingress ffff: parent ffff:fff1 ingress_block 111 ----------------
> # ip l set net3 master bond0
> # tc q ls dev net3 ingress
>      qdisc ingress ffff: parent ffff:fff1 ingress_block 22 ----------------
> 
> Also add some print in netdev_linux_update_lag():
> 2022-06-27T07:43:45.324Z|00058|netdev_linux|INFO|LAG: master_name=bond0, master_ifindex=22, auto_classified=1

you show a test for ovs adding ingress to slaves when bond0
is not part of ovs.
now with your patch also adding bond0 to ovs bridge doesn't add the
ingress to the bond slaves. only to bond0 itself.

>>
>>>
>>>>>
>>>>>>> +            if (!master_netdev->auto_classified &&
>>>>>>> +                is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>>>>                    block_id = netdev_get_block_id(master_netdev);
>>>>>>>                    if (!block_id) {
>>>>>>>                        netdev_close(master_netdev);
>>>>>>> -- 
>>>>>>> 1.8.3.1
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
Tao Liu June 27, 2022, 10:13 a.m. UTC | #8
On Mon, Jun 27, 2022 at 11:51:48AM +0300, Roi Dayan wrote:
> 
> 
> On 2022-06-27 10:49 AM, Tao Liu wrote:
> > On Mon, Jun 27, 2022 at 09:40:38AM +0300, Roi Dayan wrote:
> > > 
> > > 
> > > On 2022-06-23 12:42 PM, Roi Dayan wrote:
> > > > 
> > > > 
> > > > On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
> > > > > On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
> > > > > > On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
> > > > > > > > Bond master netdev may be created without a classification type, due
> > > > > > > > to routing or tunneling code.
> > > > > > > 
> > > > > > > Can you please elaborate on why is this an issue?
> > > > > > > 
> > > > > > Hi, thanks for your reply.
> > > > > > 
> > > > > > We are using BlueField2 in Bare Metal Server.
> > > > > > p0 and p1 enslave to master bond0. A SF is created for RDMA.
> > > > > > ovs manages pf0hpf and gre.
> > > > > > 
> > > > > > Traffics between bond0 and SF are controlled by tc rules.
> > > > > > ```
> > > > > > master=bond0
> > > > > > slave1=p0
> > > > > > slave2=p1
> > > > > > sf=enp3s0f0s0
> > > > > > sf_rep=en3f0pf0sf0
> > > > > > $tc qdisc add dev $master ingress_block 22 ingress
> > > > > > $tc qdisc add dev $slave1 ingress_block 22 ingress
> > > > > > $tc qdisc add dev $slave2 ingress_block 22 ingress
> > > > > > $tc qdisc add dev $sf_rep ingress
> > > > > > 
> > > > > > # some customized tc rules
> > > > > > $tc filter add dev $sf_rep pref 1 ingress ... action mirred
> > > > > > egress redirect dev $master
> > > > > > 
> > > > > > $tc filter add block 22 pref 1 ... action mirred egress redirect
> > > > > > dev $sf_rep
> > > > > > ```
> > > > > > 
> > > > > > Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
> > > > > > ovs when they enslaves to bond0.
> > > > > > 
> > > > > > Also we have a similar architectur on cx5/cx6.
> > > > > > 
> > > > > > > > 
> > > > > > > > If bond master is not attached to ovs, the ingress block
> > > > > > > > on slaves shoud
> > > > > > > > not be updated.
> > > > > > > 
> > > > > > > Why? (in short, but more below)
> > > > > > > 
> > > > > > If we do not use bond master and slaves in ovs, they shoud not
> > > > > > be interfered.
> > > > > 
> > > > > Makes sense, ...
> > > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > Simple reproducer:
> > > > > > > >     ip a add 10.1.1.1/30 dev bond0
> > > > > > > >     ip l set net3 master bond0
> > > > > > > >     tc q ls dev net3
> > > > > > > > 
> > > > > > > > Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload
> > > > > > > > LAG slaves to TC")
> > > > > > > > Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> > > > > > > > ---
> > > > > > > >    lib/netdev-linux.c | 5 ++++-
> > > > > > > >    1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > > > > > > > index 9d12502..b9e0c99 100644
> > > > > > > > --- a/lib/netdev-linux.c
> > > > > > > > +++ b/lib/netdev-linux.c
> > > > > > > > @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct
> > > > > > > > rtnetlink_change *change)
> > > > > > > >                    return;
> > > > > > > >                }
> > > > > > > > 
> > > > > > > > -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
> > > > > > > > +            /* If bond master is not attached to ovs,
> > > > > > > > ingress block on slaves
> > > > > > > > +             * shoud not be updated. */
> > > > > > > 
> > > > > > > I think this will break a core use case. As in your reproducer, that's
> > > > > > > pretty much how it is expected to work today with tunnels, for
> > > > > > > example:
> > > > > > > 
> > > > > > >     ip a add 10.1.1.1/30 dev bond0
> > > > > > >     ip l set net3 master bond0
> > > > > > >     ip l s bond0 up
> > > > > > >     ovs-vsctl add-port ovsbr0 vxlan0 -- \
> > > > > > >      set interface vxlan0
> > > > > > >      type=vxlan \
> > > > > > >      options:local_ip=10.1.1.1
> > > > > > >      options:remote_ip=10.1.1.2
> > > > > > >      options:key=0
> > > > > > > 
> > > > > > > If you patch like this, then who would be adding the ingress qdiscs on
> > > > > > > the slaves?
> > > > > > > 
> > > > > > > Forcing the bond to be added is probably not optimal, because it
> > > > > > > doesn't really need to be. Unless your considering that as some sort
> > > > > > > of authorization for ovs to mangle with it?
> > > > > > > 
> > > > > > >     Marcelo
> > > > > > > 
> > > > > > This patch does not break the tunnel use case. The ingress attaches on
> > > > > > vxlan_sys, but not need to attach on bond master or slaves.
> > > > > 
> > > > > ... I was under the impression that the driver needed it somehow,
> > > > > despite that. But apparently that's not right, as the driver will have
> > > > > one indr callback for each uplink, and should be able to have its way
> > > > > from there.
> > > > > 
> > > > > Thanks for the explanations.
> > > > > 
> > > > > Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > > > 
> > > > 
> > > > there is an example without bond about ingress. We can
> > > > use vxlan port in ovs with tunnel ip assigned on the pf.
> > > > like Tao mentioned, the ingress and tc rules are created on
> > > > the vxlan netdev created by ovs but ovs doesnt recreate ingress
> > > > on the pf and the rules are offloaded by the driver.
> > > > 
> > > > also looks good to me. thanks.
> > > > 
> > > > Reviewed-by: Roi Dayan <roid@nvidia.com>
> > > > 
> > > 
> > > Hmm, sorry but looks like I did review too quick.
> > > I did some manual testing now and ovs doesn't add
> > > ingress to the slave devices for me.
> > > master_netdev->auto_classified==true in my case.
> > > can you describe the steps you did to test? or you just tested
> > > the flow where bond is not attached to ovs?
> > > 
> > # ovs-vsctl show
> > 866b5d8e-bf4d-4600-9dd3-b724036c7e99
> >      ovs_version: "2.17.90"
> > 
> > # echo "+bond0" > /sys/class/net/bonding_masters
> > 
> > # ip l show dev bond0
> > 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> >      link/ether 94:29:2f:70:22:9c brd ff:ff:ff:ff:ff:ff
> > # ip a add 10.1.1.1/30 dev bond0
> > 
> > # ip l set net3 down
> > # tc q add dev net3 ingress_block 111 ingress
> > # tc q ls dev net3 ingress
> >      qdisc ingress ffff: parent ffff:fff1 ingress_block 111 ----------------
> > # ip l set net3 master bond0
> > # tc q ls dev net3 ingress
> >      qdisc ingress ffff: parent ffff:fff1 ingress_block 22 ----------------
> > 
> > Also add some print in netdev_linux_update_lag():
> > 2022-06-27T07:43:45.324Z|00058|netdev_linux|INFO|LAG: master_name=bond0, master_ifindex=22, auto_classified=1
> 
> you show a test for ovs adding ingress to slaves when bond0
> is not part of ovs.
> now with your patch also adding bond0 to ovs bridge doesn't add the
> ingress to the bond slaves. only to bond0 itself.
> 
Strange, it works fine in my env.

# echo "+bond0" > /sys/class/net/bonding_masters
# ip l | grep bond
27: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

# ip l set net3 master bond0
# tc q ls dev net3 ingress
# tc q ls dev bond0

# ovs-vsctl add-port br0 bond0

# tc q ls dev net3 ingress
qdisc ingress ffff: parent ffff:fff1 ingress_block 27 ----------------
# tc q ls dev bond0
qdisc ingress ffff: parent ffff:fff1 ingress_block 27 ----------------

> > > 
> > > > 
> > > > > > 
> > > > > > > > +            if (!master_netdev->auto_classified &&
> > > > > > > > +                is_netdev_linux_class(master_netdev->netdev_class)) {
> > > > > > > >                    block_id = netdev_get_block_id(master_netdev);
> > > > > > > >                    if (!block_id) {
> > > > > > > >                        netdev_close(master_netdev);
> > > > > > > > -- 
> > > > > > > > 1.8.3.1
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > 
>
Roi Dayan June 27, 2022, 12:47 p.m. UTC | #9
On 2022-06-27 1:13 PM, Tao Liu wrote:
> On Mon, Jun 27, 2022 at 11:51:48AM +0300, Roi Dayan wrote:
>>
>>
>> On 2022-06-27 10:49 AM, Tao Liu wrote:
>>> On Mon, Jun 27, 2022 at 09:40:38AM +0300, Roi Dayan wrote:
>>>>
>>>>
>>>> On 2022-06-23 12:42 PM, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 2022-06-22 11:56 PM, Marcelo Ricardo Leitner wrote:
>>>>>> On Wed, Jun 22, 2022 at 11:10:17AM +0800, Tao Liu wrote:
>>>>>>> On Tue, Jun 21, 2022 at 06:49:42AM -0700, Marcelo Ricardo Leitner wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Tue, Jun 21, 2022 at 02:10:39PM +0800, Tao Liu wrote:
>>>>>>>>> Bond master netdev may be created without a classification type, due
>>>>>>>>> to routing or tunneling code.
>>>>>>>>
>>>>>>>> Can you please elaborate on why is this an issue?
>>>>>>>>
>>>>>>> Hi, thanks for your reply.
>>>>>>>
>>>>>>> We are using BlueField2 in Bare Metal Server.
>>>>>>> p0 and p1 enslave to master bond0. A SF is created for RDMA.
>>>>>>> ovs manages pf0hpf and gre.
>>>>>>>
>>>>>>> Traffics between bond0 and SF are controlled by tc rules.
>>>>>>> ```
>>>>>>> master=bond0
>>>>>>> slave1=p0
>>>>>>> slave2=p1
>>>>>>> sf=enp3s0f0s0
>>>>>>> sf_rep=en3f0pf0sf0
>>>>>>> $tc qdisc add dev $master ingress_block 22 ingress
>>>>>>> $tc qdisc add dev $slave1 ingress_block 22 ingress
>>>>>>> $tc qdisc add dev $slave2 ingress_block 22 ingress
>>>>>>> $tc qdisc add dev $sf_rep ingress
>>>>>>>
>>>>>>> # some customized tc rules
>>>>>>> $tc filter add dev $sf_rep pref 1 ingress ... action mirred
>>>>>>> egress redirect dev $master
>>>>>>>
>>>>>>> $tc filter add block 22 pref 1 ... action mirred egress redirect
>>>>>>> dev $sf_rep
>>>>>>> ```
>>>>>>>
>>>>>>> Unfortunately ingress_block on p0 or p1 may be deleted and recreated by
>>>>>>> ovs when they enslaves to bond0.
>>>>>>>
>>>>>>> Also we have a similar architectur on cx5/cx6.
>>>>>>>
>>>>>>>>>
>>>>>>>>> If bond master is not attached to ovs, the ingress block
>>>>>>>>> on slaves shoud
>>>>>>>>> not be updated.
>>>>>>>>
>>>>>>>> Why? (in short, but more below)
>>>>>>>>
>>>>>>> If we do not use bond master and slaves in ovs, they shoud not
>>>>>>> be interfered.
>>>>>>
>>>>>> Makes sense, ...
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Simple reproducer:
>>>>>>>>>      ip a add 10.1.1.1/30 dev bond0
>>>>>>>>>      ip l set net3 master bond0
>>>>>>>>>      tc q ls dev net3
>>>>>>>>>
>>>>>>>>> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload
>>>>>>>>> LAG slaves to TC")
>>>>>>>>> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
>>>>>>>>> ---
>>>>>>>>>     lib/netdev-linux.c | 5 ++++-
>>>>>>>>>     1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>>>>>>>> index 9d12502..b9e0c99 100644
>>>>>>>>> --- a/lib/netdev-linux.c
>>>>>>>>> +++ b/lib/netdev-linux.c
>>>>>>>>> @@ -682,7 +682,10 @@ netdev_linux_update_lag(struct
>>>>>>>>> rtnetlink_change *change)
>>>>>>>>>                     return;
>>>>>>>>>                 }
>>>>>>>>>
>>>>>>>>> -            if (is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>>>>>> +            /* If bond master is not attached to ovs,
>>>>>>>>> ingress block on slaves
>>>>>>>>> +             * shoud not be updated. */
>>>>>>>>
>>>>>>>> I think this will break a core use case. As in your reproducer, that's
>>>>>>>> pretty much how it is expected to work today with tunnels, for
>>>>>>>> example:
>>>>>>>>
>>>>>>>>      ip a add 10.1.1.1/30 dev bond0
>>>>>>>>      ip l set net3 master bond0
>>>>>>>>      ip l s bond0 up
>>>>>>>>      ovs-vsctl add-port ovsbr0 vxlan0 -- \
>>>>>>>>       set interface vxlan0
>>>>>>>>       type=vxlan \
>>>>>>>>       options:local_ip=10.1.1.1
>>>>>>>>       options:remote_ip=10.1.1.2
>>>>>>>>       options:key=0
>>>>>>>>
>>>>>>>> If you patch like this, then who would be adding the ingress qdiscs on
>>>>>>>> the slaves?
>>>>>>>>
>>>>>>>> Forcing the bond to be added is probably not optimal, because it
>>>>>>>> doesn't really need to be. Unless your considering that as some sort
>>>>>>>> of authorization for ovs to mangle with it?
>>>>>>>>
>>>>>>>>      Marcelo
>>>>>>>>
>>>>>>> This patch does not break the tunnel use case. The ingress attaches on
>>>>>>> vxlan_sys, but not need to attach on bond master or slaves.
>>>>>>
>>>>>> ... I was under the impression that the driver needed it somehow,
>>>>>> despite that. But apparently that's not right, as the driver will have
>>>>>> one indr callback for each uplink, and should be able to have its way
>>>>>> from there.
>>>>>>
>>>>>> Thanks for the explanations.
>>>>>>
>>>>>> Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
>>>>>>
>>>>>
>>>>> there is an example without bond about ingress. We can
>>>>> use vxlan port in ovs with tunnel ip assigned on the pf.
>>>>> like Tao mentioned, the ingress and tc rules are created on
>>>>> the vxlan netdev created by ovs but ovs doesnt recreate ingress
>>>>> on the pf and the rules are offloaded by the driver.
>>>>>
>>>>> also looks good to me. thanks.
>>>>>
>>>>> Reviewed-by: Roi Dayan <roid@nvidia.com>
>>>>>
>>>>
>>>> Hmm, sorry but looks like I did review too quick.
>>>> I did some manual testing now and ovs doesn't add
>>>> ingress to the slave devices for me.
>>>> master_netdev->auto_classified==true in my case.
>>>> can you describe the steps you did to test? or you just tested
>>>> the flow where bond is not attached to ovs?
>>>>
>>> # ovs-vsctl show
>>> 866b5d8e-bf4d-4600-9dd3-b724036c7e99
>>>       ovs_version: "2.17.90"
>>>
>>> # echo "+bond0" > /sys/class/net/bonding_masters
>>>
>>> # ip l show dev bond0
>>> 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>>       link/ether 94:29:2f:70:22:9c brd ff:ff:ff:ff:ff:ff
>>> # ip a add 10.1.1.1/30 dev bond0
>>>
>>> # ip l set net3 down
>>> # tc q add dev net3 ingress_block 111 ingress
>>> # tc q ls dev net3 ingress
>>>       qdisc ingress ffff: parent ffff:fff1 ingress_block 111 ----------------
>>> # ip l set net3 master bond0
>>> # tc q ls dev net3 ingress
>>>       qdisc ingress ffff: parent ffff:fff1 ingress_block 22 ----------------
>>>
>>> Also add some print in netdev_linux_update_lag():
>>> 2022-06-27T07:43:45.324Z|00058|netdev_linux|INFO|LAG: master_name=bond0, master_ifindex=22, auto_classified=1
>>
>> you show a test for ovs adding ingress to slaves when bond0
>> is not part of ovs.
>> now with your patch also adding bond0 to ovs bridge doesn't add the
>> ingress to the bond slaves. only to bond0 itself.
>>
> Strange, it works fine in my env.
> 
> # echo "+bond0" > /sys/class/net/bonding_masters
> # ip l | grep bond
> 27: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> 
> # ip l set net3 master bond0
> # tc q ls dev net3 ingress
> # tc q ls dev bond0
> 
> # ovs-vsctl add-port br0 bond0
> 
> # tc q ls dev net3 ingress
> qdisc ingress ffff: parent ffff:fff1 ingress_block 27 ----------------
> # tc q ls dev bond0
> qdisc ingress ffff: parent ffff:fff1 ingress_block 27 ----------------
> 

could it be your bond0 configuration?
beside doing +bond0 into /sys/class/net/bonding_masters how do you
config bond0 and its slave?

I don't have any static config file and use ip commands
as follows:

ip link add name bond0 type bond mode $mode miimon 100
ip link set dev $nic1 down
ip link set dev $nic2 down
ip link set dev $nic1 master bond0
ip link set dev $nic2 master bond0
ip link set dev bond0 up
ip link set dev $nic1 up
ip link set dev $nic2 up


then adding bond0 to ovs bridge, I see bond0 got ingress block 310
but the slaves didn't get ingress.

#  tc qdisc show dev bond0 ingress
#  ovs-vsctl  show
eb7bb34b-bdf6-4b9b-88d8-22ffeaf47630
     ovs_version: "2.17.90"
#  ovs-vsctl  add-br ov1
#  ovs-vsctl  add-port ov1 bond0
#  tc qdisc show dev bond0 ingress
qdisc ingress ffff: parent ffff:fff1 ingress_block 310 ----------------
#  tc qdisc show dev enp8s0f0 ingress
#  tc qdisc show dev enp8s0f1 ingress
#  ovs-vsctl  show
eb7bb34b-bdf6-4b9b-88d8-22ffeaf47630
     Bridge ov1
         Port ov1
             Interface ov1
                 type: internal
         Port bond0
             Interface bond0
     ovs_version: "2.17.90"
#  tc qdisc show dev bond0 ingress
qdisc ingress ffff: parent ffff:fff1 ingress_block 310 ----------------
#  tc qdisc show dev enp8s0f0 ingress
#  tc qdisc show dev enp8s0f1 ingress
#


printing in the log showing master_netdev->auto_classified is 1.
you showed in your prev test its 1 as well . so how it got to be
0 now for you to pass the if-statement?

>>>>
>>>>>
>>>>>>>
>>>>>>>>> +            if (!master_netdev->auto_classified &&
>>>>>>>>> +                is_netdev_linux_class(master_netdev->netdev_class)) {
>>>>>>>>>                     block_id = netdev_get_block_id(master_netdev);
>>>>>>>>>                     if (!block_id) {
>>>>>>>>>                         netdev_close(master_netdev);
>>>>>>>>> -- 
>>>>>>>>> 1.8.3.1
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
Tao Liu June 29, 2022, 3 p.m. UTC | #10
On Mon, Jun 27, 2022 at 03:47:34PM +0300, Roi Dayan wrote:
> 
> could it be your bond0 configuration?
> beside doing +bond0 into /sys/class/net/bonding_masters how do you
> config bond0 and its slave?
> 
> I don't have any static config file and use ip commands
> as follows:
> 
> ip link add name bond0 type bond mode $mode miimon 100
> ip link set dev $nic1 down
> ip link set dev $nic2 down
> ip link set dev $nic1 master bond0
> ip link set dev $nic2 master bond0
> ip link set dev bond0 up
> ip link set dev $nic1 up
> ip link set dev $nic2 up
> 
> 
> then adding bond0 to ovs bridge, I see bond0 got ingress block 310
> but the slaves didn't get ingress.
> 
> #  tc qdisc show dev bond0 ingress
> #  ovs-vsctl  show
> eb7bb34b-bdf6-4b9b-88d8-22ffeaf47630
>     ovs_version: "2.17.90"
> #  ovs-vsctl  add-br ov1
> #  ovs-vsctl  add-port ov1 bond0
> #  tc qdisc show dev bond0 ingress
> qdisc ingress ffff: parent ffff:fff1 ingress_block 310 ----------------
> #  tc qdisc show dev enp8s0f0 ingress
> #  tc qdisc show dev enp8s0f1 ingress
> #  ovs-vsctl  show
> eb7bb34b-bdf6-4b9b-88d8-22ffeaf47630
>     Bridge ov1
>         Port ov1
>             Interface ov1
>                 type: internal
>         Port bond0
>             Interface bond0
>     ovs_version: "2.17.90"
> #  tc qdisc show dev bond0 ingress
> qdisc ingress ffff: parent ffff:fff1 ingress_block 310 ----------------
> #  tc qdisc show dev enp8s0f0 ingress
> #  tc qdisc show dev enp8s0f1 ingress
> #
> 
Hi, Roi. Thanks for having tested this patch.

I have reproduced your test case. It is because bond0 first opened with
type == NULL, then auto_classified == true. When bond0 adds to ovs, bond0
opens with type == "system", and auto_classified will not be cleared.

Will send v3.
> 
> printing in the log showing master_netdev->auto_classified is 1.
> you showed in your prev test its 1 as well . so how it got to be
> 0 now for you to pass the if-statement?
> 
Sorry, I copied the wrong log line.
>
diff mbox series

Patch

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 9d12502..b9e0c99 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -682,7 +682,10 @@  netdev_linux_update_lag(struct rtnetlink_change *change)
                 return;
             }
 
-            if (is_netdev_linux_class(master_netdev->netdev_class)) {
+            /* If bond master is not attached to ovs, ingress block on slaves
+             * shoud not be updated. */
+            if (!master_netdev->auto_classified &&
+                is_netdev_linux_class(master_netdev->netdev_class)) {
                 block_id = netdev_get_block_id(master_netdev);
                 if (!block_id) {
                     netdev_close(master_netdev);