diff mbox series

[ovs-dev,v4] bond: Always revalidate unbalanced bonds when active member changes.

Message ID 20231019023701.1424733-1-mkp@redhat.com
State Superseded
Delegated to: Eelco Chaudron
Headers show
Series [ovs-dev,v4] bond: Always revalidate unbalanced bonds when active member changes. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Mike Pattrick Oct. 19, 2023, 2:37 a.m. UTC
Currently a bond will not always revalidate when an active member
changes. This can result in counter-intuitive behaviors like the fact
that using ovs-appctl bond/set-active-member will cause the bond to
revalidate but changing other_config:bond-primary will not trigger a
revalidate in the bond.

When revalidation is not set but the active member changes in an
unbalanced bond, OVS may send traffic out of previously active member
instead of the new active member.

This change will always mark unbalanced bonds for revalidation if the
active member changes.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
Signed-off-by: Mike Pattrick <mkp@redhat.com>
---
v2: Added a test
v3: Made the test more reliable
v4: Made test much more reliable
---
 ofproto/bond.c          |  8 +++++--
 tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 2 deletions(-)

Comments

Eelco Chaudron Oct. 19, 2023, 1 p.m. UTC | #1
On 19 Oct 2023, at 4:37, Mike Pattrick wrote:

> Currently a bond will not always revalidate when an active member
> changes. This can result in counter-intuitive behaviors like the fact
> that using ovs-appctl bond/set-active-member will cause the bond to
> revalidate but changing other_config:bond-primary will not trigger a
> revalidate in the bond.
>
> When revalidation is not set but the active member changes in an
> unbalanced bond, OVS may send traffic out of previously active member
> instead of the new active member.
>
> This change will always mark unbalanced bonds for revalidation if the
> active member changes.

Thanks for fixing my comments on V3, some more comments on the tests, and the removed annotation.

//Eelco

> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> Signed-off-by: Mike Pattrick <mkp@redhat.com>
> ---
> v2: Added a test
> v3: Made the test more reliable
> v4: Made test much more reliable
> ---
>  ofproto/bond.c          |  8 +++++--
>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index cfdf44f85..fb108d30a 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>  static bool bond_is_falling_back_to_ab(const struct bond *);
>  static void bond_add_lb_output_buckets(const struct bond *);
>  static void bond_del_lb_output_buckets(const struct bond *);
> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>
>
>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>
>  static void
>  bond_active_member_changed(struct bond *bond)
> +    OVS_REQ_WRLOCK(rwlock)
>  {
>      if (bond->active_member) {
>          struct eth_addr mac;
>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>          bond->active_member_mac = mac;
> +        if (!bond_is_balanced(bond)) {
> +            bond->bond_revalidate = true;
> +        }
>      } else {
>          bond->active_member_mac = eth_addr_zero;
>      }
> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>  /* Rebalancing. */
>
>  static bool
> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
> +bond_is_balanced(const struct bond *bond)

See the other email, but I think we should re-add the annotation as there might be other (new) callers of this function that need protection from calling this without the readlock.

>  {
>      return bond->rebalance_interval
>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>      }
>
>      if (bond->active_member != member) {
> -        bond->bond_revalidate = true;
>          bond->active_member = member;
>          VLOG_INFO("bond %s: active member is now %s",
>                    bond->name, member->name);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 945037ec0..52c233be9 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([datapath - bond active-backup failover])
> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> +on_exit 'ip link del link0a'
> +on_exit 'ip link del link0b'
> +AT_CHECK([ip link add link0a type veth peer name link1a])
> +AT_CHECK([ip link add link0b type veth peer name link1b])
> +
> +AT_CHECK([ip link set dev link0a up])
> +AT_CHECK([ip link set dev link1a up])
> +AT_CHECK([ip link set dev link0b up])
> +AT_CHECK([ip link set dev link1b up])
> +
> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
> +
> +for i in `seq 1 3`; do

Guess this is a leftover of your testing?

> +dnl Set primary bond member.
> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> +                    set port bond1 other_config:bond-primary=link1a])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])

Here you are fine with some packets being replied to, and below you want all 12. Is this intended, and if so why?

Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time from 16 seconds to 6 seconds.

> +dnl Check correct port is used.
> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
> +
> +dnl Change primary bond member.
> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> +                    set port bond1 other_config:bond-primary=link1b])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
> +
> +dnl Check correct port is used.
> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
> +done
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP

FYI, the modified test with 4 pings, and removal of the for loop passed 50 runs without any failure on my system.

>  AT_SETUP([datapath - ping over vxlan tunnel])
>  OVS_CHECK_TUNNEL_TSO()
>  OVS_CHECK_VXLAN()
> -- 
> 2.39.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Oct. 19, 2023, 1:57 p.m. UTC | #2
On 19 Oct 2023, at 15:00, Eelco Chaudron wrote:

> On 19 Oct 2023, at 4:37, Mike Pattrick wrote:
>
>> Currently a bond will not always revalidate when an active member
>> changes. This can result in counter-intuitive behaviors like the fact
>> that using ovs-appctl bond/set-active-member will cause the bond to
>> revalidate but changing other_config:bond-primary will not trigger a
>> revalidate in the bond.
>>
>> When revalidation is not set but the active member changes in an
>> unbalanced bond, OVS may send traffic out of previously active member
>> instead of the new active member.
>>
>> This change will always mark unbalanced bonds for revalidation if the
>> active member changes.
>
> Thanks for fixing my comments on V3, some more comments on the tests, and the removed annotation.
>
> //Eelco
>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
>> Signed-off-by: Mike Pattrick <mkp@redhat.com>
>> ---
>> v2: Added a test
>> v3: Made the test more reliable
>> v4: Made test much more reliable
>> ---
>>  ofproto/bond.c          |  8 +++++--
>>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 56 insertions(+), 2 deletions(-)
>>
>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>> index cfdf44f85..fb108d30a 100644
>> --- a/ofproto/bond.c
>> +++ b/ofproto/bond.c
>> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>>  static bool bond_is_falling_back_to_ab(const struct bond *);
>>  static void bond_add_lb_output_buckets(const struct bond *);
>>  static void bond_del_lb_output_buckets(const struct bond *);
>> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>>
>>
>>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
>> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>>
>>  static void
>>  bond_active_member_changed(struct bond *bond)
>> +    OVS_REQ_WRLOCK(rwlock)
>>  {
>>      if (bond->active_member) {
>>          struct eth_addr mac;
>>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>>          bond->active_member_mac = mac;
>> +        if (!bond_is_balanced(bond)) {
>> +            bond->bond_revalidate = true;
>> +        }
>>      } else {
>>          bond->active_member_mac = eth_addr_zero;
>>      }
>> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>>  /* Rebalancing. */
>>
>>  static bool
>> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>> +bond_is_balanced(const struct bond *bond)
>
> See the other email, but I think we should re-add the annotation as there might be other (new) callers of this function that need protection from calling this without the readlock.

Ignore this comment, you are right anotating it at declaration is enough. I thought it failed for me in the past :(

//Eelco


>>  {
>>      return bond->rebalance_interval
>>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
>> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>>      }
>>
>>      if (bond->active_member != member) {
>> -        bond->bond_revalidate = true;
>>          bond->active_member = member;
>>          VLOG_INFO("bond %s: active member is now %s",
>>                    bond->name, member->name);
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 945037ec0..52c233be9 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([datapath - bond active-backup failover])
>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
>> +on_exit 'ip link del link0a'
>> +on_exit 'ip link del link0b'
>> +AT_CHECK([ip link add link0a type veth peer name link1a])
>> +AT_CHECK([ip link add link0b type veth peer name link1b])
>> +
>> +AT_CHECK([ip link set dev link0a up])
>> +AT_CHECK([ip link set dev link1a up])
>> +AT_CHECK([ip link set dev link0b up])
>> +AT_CHECK([ip link set dev link1b up])
>> +
>> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
>> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
>> +
>> +for i in `seq 1 3`; do
>
> Guess this is a leftover of your testing?
>
>> +dnl Set primary bond member.
>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
>> +                    set port bond1 other_config:bond-primary=link1a])
>> +
>> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
>> +
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])
>
> Here you are fine with some packets being replied to, and below you want all 12. Is this intended, and if so why?
>
> Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time from 16 seconds to 6 seconds.
>
>> +dnl Check correct port is used.
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
>> +
>> +dnl Change primary bond member.
>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
>> +                    set port bond1 other_config:bond-primary=link1b])
>> +
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
>> +
>> +dnl Check correct port is used.
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
>> +done
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>
> FYI, the modified test with 4 pings, and removal of the for loop passed 50 runs without any failure on my system.
>
>>  AT_SETUP([datapath - ping over vxlan tunnel])
>>  OVS_CHECK_TUNNEL_TSO()
>>  OVS_CHECK_VXLAN()
>> -- 
>> 2.39.3
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Mike Pattrick Oct. 19, 2023, 3:29 p.m. UTC | #3
On Thu, Oct 19, 2023 at 9:00 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>
>
>
> On 19 Oct 2023, at 4:37, Mike Pattrick wrote:
>
> > Currently a bond will not always revalidate when an active member
> > changes. This can result in counter-intuitive behaviors like the fact
> > that using ovs-appctl bond/set-active-member will cause the bond to
> > revalidate but changing other_config:bond-primary will not trigger a
> > revalidate in the bond.
> >
> > When revalidation is not set but the active member changes in an
> > unbalanced bond, OVS may send traffic out of previously active member
> > instead of the new active member.
> >
> > This change will always mark unbalanced bonds for revalidation if the
> > active member changes.
>
> Thanks for fixing my comments on V3, some more comments on the tests, and the removed annotation.
>
> //Eelco
>
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
> > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > ---
> > v2: Added a test
> > v3: Made the test more reliable
> > v4: Made test much more reliable
> > ---
> >  ofproto/bond.c          |  8 +++++--
> >  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/ofproto/bond.c b/ofproto/bond.c
> > index cfdf44f85..fb108d30a 100644
> > --- a/ofproto/bond.c
> > +++ b/ofproto/bond.c
> > @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
> >  static bool bond_is_falling_back_to_ab(const struct bond *);
> >  static void bond_add_lb_output_buckets(const struct bond *);
> >  static void bond_del_lb_output_buckets(const struct bond *);
> > +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
> >
> >
> >  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
> > @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
> >
> >  static void
> >  bond_active_member_changed(struct bond *bond)
> > +    OVS_REQ_WRLOCK(rwlock)
> >  {
> >      if (bond->active_member) {
> >          struct eth_addr mac;
> >          netdev_get_etheraddr(bond->active_member->netdev, &mac);
> >          bond->active_member_mac = mac;
> > +        if (!bond_is_balanced(bond)) {
> > +            bond->bond_revalidate = true;
> > +        }
> >      } else {
> >          bond->active_member_mac = eth_addr_zero;
> >      }
> > @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
> >  /* Rebalancing. */
> >
> >  static bool
> > -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
> > +bond_is_balanced(const struct bond *bond)
>
> See the other email, but I think we should re-add the annotation as there might be other (new) callers of this function that need protection from calling this without the readlock.
>
> >  {
> >      return bond->rebalance_interval
> >          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
> > @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
> >      }
> >
> >      if (bond->active_member != member) {
> > -        bond->bond_revalidate = true;
> >          bond->active_member = member;
> >          VLOG_INFO("bond %s: active member is now %s",
> >                    bond->name, member->name);
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index 945037ec0..52c233be9 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > +AT_SETUP([datapath - bond active-backup failover])
> > +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
> > +on_exit 'ip link del link0a'
> > +on_exit 'ip link del link0b'
> > +AT_CHECK([ip link add link0a type veth peer name link1a])
> > +AT_CHECK([ip link add link0b type veth peer name link1b])
> > +
> > +AT_CHECK([ip link set dev link0a up])
> > +AT_CHECK([ip link set dev link1a up])
> > +AT_CHECK([ip link set dev link0b up])
> > +AT_CHECK([ip link set dev link1b up])
> > +
> > +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
> > +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
> > +
> > +for i in `seq 1 3`; do
>
> Guess this is a leftover of your testing?
>
> > +dnl Set primary bond member.
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
> > +                    set port bond1 other_config:bond-primary=link1a])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])
>
> Here you are fine with some packets being replied to, and below you want all 12. Is this intended, and if so why?
>
> Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time from 16 seconds to 6 seconds.


The last few revisions have been due to the test not being reliable
enough, so I wanted something that would be very reliable. But I
probably went a bit too far with this change.

The first ping isn't impacted by the failover so I didn't think it was
important to measure packet loss there.


-M

>
> > +dnl Check correct port is used.
> > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
> > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
> > +
> > +dnl Change primary bond member.
> > +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
> > +                    set port bond1 other_config:bond-primary=link1b])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
> > +
> > +dnl Check correct port is used.
> > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
> > +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
> > +done
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
>
> FYI, the modified test with 4 pings, and removal of the for loop passed 50 runs without any failure on my system.
>
> >  AT_SETUP([datapath - ping over vxlan tunnel])
> >  OVS_CHECK_TUNNEL_TSO()
> >  OVS_CHECK_VXLAN()
> > --
> > 2.39.3
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Eelco Chaudron Oct. 20, 2023, 9:56 a.m. UTC | #4
On 19 Oct 2023, at 17:29, Mike Pattrick wrote:

> On Thu, Oct 19, 2023 at 9:00 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>>
>>
>>
>> On 19 Oct 2023, at 4:37, Mike Pattrick wrote:
>>
>>> Currently a bond will not always revalidate when an active member
>>> changes. This can result in counter-intuitive behaviors like the fact
>>> that using ovs-appctl bond/set-active-member will cause the bond to
>>> revalidate but changing other_config:bond-primary will not trigger a
>>> revalidate in the bond.
>>>
>>> When revalidation is not set but the active member changes in an
>>> unbalanced bond, OVS may send traffic out of previously active member
>>> instead of the new active member.
>>>
>>> This change will always mark unbalanced bonds for revalidation if the
>>> active member changes.
>>
>> Thanks for fixing my comments on V3, some more comments on the tests, and the removed annotation.
>>
>> //Eelco
>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
>>> Signed-off-by: Mike Pattrick <mkp@redhat.com>
>>> ---
>>> v2: Added a test
>>> v3: Made the test more reliable
>>> v4: Made test much more reliable
>>> ---
>>>  ofproto/bond.c          |  8 +++++--
>>>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 56 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index cfdf44f85..fb108d30a 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>>>  static bool bond_is_falling_back_to_ab(const struct bond *);
>>>  static void bond_add_lb_output_buckets(const struct bond *);
>>>  static void bond_del_lb_output_buckets(const struct bond *);
>>> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>>>
>>>
>>>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
>>> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>>>
>>>  static void
>>>  bond_active_member_changed(struct bond *bond)
>>> +    OVS_REQ_WRLOCK(rwlock)
>>>  {
>>>      if (bond->active_member) {
>>>          struct eth_addr mac;
>>>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>>>          bond->active_member_mac = mac;
>>> +        if (!bond_is_balanced(bond)) {
>>> +            bond->bond_revalidate = true;
>>> +        }
>>>      } else {
>>>          bond->active_member_mac = eth_addr_zero;
>>>      }
>>> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>>>  /* Rebalancing. */
>>>
>>>  static bool
>>> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>>> +bond_is_balanced(const struct bond *bond)
>>
>> See the other email, but I think we should re-add the annotation as there might be other (new) callers of this function that need protection from calling this without the readlock.
>>
>>>  {
>>>      return bond->rebalance_interval
>>>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
>>> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>>>      }
>>>
>>>      if (bond->active_member != member) {
>>> -        bond->bond_revalidate = true;
>>>          bond->active_member = member;
>>>          VLOG_INFO("bond %s: active member is now %s",
>>>                    bond->name, member->name);
>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>> index 945037ec0..52c233be9 100644
>>> --- a/tests/system-traffic.at
>>> +++ b/tests/system-traffic.at
>>> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>  AT_CLEANUP
>>>
>>> +AT_SETUP([datapath - bond active-backup failover])
>>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>>> +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>>> +
>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>> +
>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
>>> +on_exit 'ip link del link0a'
>>> +on_exit 'ip link del link0b'
>>> +AT_CHECK([ip link add link0a type veth peer name link1a])
>>> +AT_CHECK([ip link add link0b type veth peer name link1b])
>>> +
>>> +AT_CHECK([ip link set dev link0a up])
>>> +AT_CHECK([ip link set dev link1a up])
>>> +AT_CHECK([ip link set dev link0b up])
>>> +AT_CHECK([ip link set dev link1b up])
>>> +
>>> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
>>> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
>>> +
>>> +for i in `seq 1 3`; do
>>
>> Guess this is a leftover of your testing?
>>
>>> +dnl Set primary bond member.
>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
>>> +                    set port bond1 other_config:bond-primary=link1a])
>>> +
>>> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])
>>
>> Here you are fine with some packets being replied to, and below you want all 12. Is this intended, and if so why?
>>
>> Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time from 16 seconds to 6 seconds.
>
>
> The last few revisions have been due to the test not being reliable
> enough, so I wanted something that would be very reliable. But I
> probably went a bit too far with this change.
>
> The first ping isn't impacted by the failover so I didn't think it was
> important to measure packet loss there.

I agree, but it would be nice to be consistent, as people seem to cut/paste other tests without too much thought.
So I would prefer to have both test grep for  “, 4 received”, as we should not drop pings in a simple test like this.

>>
>>> +dnl Check correct port is used.
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
>>> +
>>> +dnl Change primary bond member.
>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
>>> +                    set port bond1 other_config:bond-primary=link1b])
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
>>> +
>>> +dnl Check correct port is used.
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
>>> +done
>>> +
>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>> +AT_CLEANUP
>>
>> FYI, the modified test with 4 pings, and removal of the for loop passed 50 runs without any failure on my system.
>>
>>>  AT_SETUP([datapath - ping over vxlan tunnel])
>>>  OVS_CHECK_TUNNEL_TSO()
>>>  OVS_CHECK_VXLAN()
>>> --
>>> 2.39.3
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
Ilya Maximets Oct. 20, 2023, 10:08 a.m. UTC | #5
On 10/20/23 11:56, Eelco Chaudron wrote:
> 
> 
> On 19 Oct 2023, at 17:29, Mike Pattrick wrote:
> 
>> On Thu, Oct 19, 2023 at 9:00 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>>>
>>>
>>>
>>> On 19 Oct 2023, at 4:37, Mike Pattrick wrote:
>>>
>>>> Currently a bond will not always revalidate when an active member
>>>> changes. This can result in counter-intuitive behaviors like the fact
>>>> that using ovs-appctl bond/set-active-member will cause the bond to
>>>> revalidate but changing other_config:bond-primary will not trigger a
>>>> revalidate in the bond.
>>>>
>>>> When revalidation is not set but the active member changes in an
>>>> unbalanced bond, OVS may send traffic out of previously active member
>>>> instead of the new active member.
>>>>
>>>> This change will always mark unbalanced bonds for revalidation if the
>>>> active member changes.
>>>
>>> Thanks for fixing my comments on V3, some more comments on the tests, and the removed annotation.
>>>
>>> //Eelco
>>>
>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2214979
>>>> Signed-off-by: Mike Pattrick <mkp@redhat.com>
>>>> ---
>>>> v2: Added a test
>>>> v3: Made the test more reliable
>>>> v4: Made test much more reliable
>>>> ---
>>>>  ofproto/bond.c          |  8 +++++--
>>>>  tests/system-traffic.at | 50 +++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 56 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>> index cfdf44f85..fb108d30a 100644
>>>> --- a/ofproto/bond.c
>>>> +++ b/ofproto/bond.c
>>>> @@ -193,6 +193,7 @@ static void bond_update_post_recirc_rules__(struct bond *, bool force)
>>>>  static bool bond_is_falling_back_to_ab(const struct bond *);
>>>>  static void bond_add_lb_output_buckets(const struct bond *);
>>>>  static void bond_del_lb_output_buckets(const struct bond *);
>>>> +static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
>>>>
>>>>
>>>>  /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
>>>> @@ -552,11 +553,15 @@ bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
>>>>
>>>>  static void
>>>>  bond_active_member_changed(struct bond *bond)
>>>> +    OVS_REQ_WRLOCK(rwlock)
>>>>  {
>>>>      if (bond->active_member) {
>>>>          struct eth_addr mac;
>>>>          netdev_get_etheraddr(bond->active_member->netdev, &mac);
>>>>          bond->active_member_mac = mac;
>>>> +        if (!bond_is_balanced(bond)) {
>>>> +            bond->bond_revalidate = true;
>>>> +        }
>>>>      } else {
>>>>          bond->active_member_mac = eth_addr_zero;
>>>>      }
>>>> @@ -1124,7 +1129,7 @@ bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
>>>>  /* Rebalancing. */
>>>>
>>>>  static bool
>>>> -bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
>>>> +bond_is_balanced(const struct bond *bond)
>>>
>>> See the other email, but I think we should re-add the annotation as there might be other (new) callers of this function that need protection from calling this without the readlock.
>>>
>>>>  {
>>>>      return bond->rebalance_interval
>>>>          && (bond->balance == BM_SLB || bond->balance == BM_TCP)
>>>> @@ -1728,7 +1733,6 @@ bond_unixctl_set_active_member(struct unixctl_conn *conn,
>>>>      }
>>>>
>>>>      if (bond->active_member != member) {
>>>> -        bond->bond_revalidate = true;
>>>>          bond->active_member = member;
>>>>          VLOG_INFO("bond %s: active member is now %s",
>>>>                    bond->name, member->name);
>>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>>> index 945037ec0..52c233be9 100644
>>>> --- a/tests/system-traffic.at
>>>> +++ b/tests/system-traffic.at
>>>> @@ -291,6 +291,56 @@ NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
>>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>>  AT_CLEANUP
>>>>
>>>> +AT_SETUP([datapath - bond active-backup failover])
>>>> +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
>>>> +
>>>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>>>> +AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
>>>> +
>>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>>> +
>>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>>>> +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
>>>> +on_exit 'ip link del link0a'
>>>> +on_exit 'ip link del link0b'
>>>> +AT_CHECK([ip link add link0a type veth peer name link1a])
>>>> +AT_CHECK([ip link add link0b type veth peer name link1b])
>>>> +
>>>> +AT_CHECK([ip link set dev link0a up])
>>>> +AT_CHECK([ip link set dev link1a up])
>>>> +AT_CHECK([ip link set dev link0b up])
>>>> +AT_CHECK([ip link set dev link1b up])
>>>> +
>>>> +AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
>>>> +AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
>>>> +
>>>> +for i in `seq 1 3`; do
>>>
>>> Guess this is a leftover of your testing?
>>>
>>>> +dnl Set primary bond member.
>>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
>>>> +                    set port bond1 other_config:bond-primary=link1a])
>>>> +
>>>> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
>>>> +
>>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])
>>>
>>> Here you are fine with some packets being replied to, and below you want all 12. Is this intended, and if so why?
>>>
>>> Also, 12 pings is quite some time, would 4 pings be enough? It cuts test time from 16 seconds to 6 seconds.
>>
>>
>> The last few revisions have been due to the test not being reliable
>> enough, so I wanted something that would be very reliable. But I
>> probably went a bit too far with this change.
>>
>> The first ping isn't impacted by the failover so I didn't think it was
>> important to measure packet loss there.
> 
> I agree, but it would be nice to be consistent, as people seem to cut/paste other tests without too much thought.
> So I would prefer to have both test grep for  “, 4 received”, as we should not drop pings in a simple test like this.

Slightly orthogonal question:  why this needs to be a system test
with pings and other stuff?  Simple 'unit' tests with dummy datapath
are more reliable, much faster and running more frequently in CI.

> 
>>>
>>>> +dnl Check correct port is used.
>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
>>>> +
>>>> +dnl Change primary bond member.
>>>> +AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
>>>> +                    set port bond1 other_config:bond-primary=link1b])
>>>> +
>>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
>>>> +
>>>> +dnl Check correct port is used.
>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
>>>> +AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
>>>> +done
>>>> +
>>>> +OVS_TRAFFIC_VSWITCHD_STOP
>>>> +AT_CLEANUP
>>>
>>> FYI, the modified test with 4 pings, and removal of the for loop passed 50 runs without any failure on my system.
>>>
>>>>  AT_SETUP([datapath - ping over vxlan tunnel])
>>>>  OVS_CHECK_TUNNEL_TSO()
>>>>  OVS_CHECK_VXLAN()
>>>> --
>>>> 2.39.3
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/ofproto/bond.c b/ofproto/bond.c
index cfdf44f85..fb108d30a 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -193,6 +193,7 @@  static void bond_update_post_recirc_rules__(struct bond *, bool force)
 static bool bond_is_falling_back_to_ab(const struct bond *);
 static void bond_add_lb_output_buckets(const struct bond *);
 static void bond_del_lb_output_buckets(const struct bond *);
+static bool bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock);
 
 
 /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
@@ -552,11 +553,15 @@  bond_find_member_by_mac(const struct bond *bond, const struct eth_addr mac)
 
 static void
 bond_active_member_changed(struct bond *bond)
+    OVS_REQ_WRLOCK(rwlock)
 {
     if (bond->active_member) {
         struct eth_addr mac;
         netdev_get_etheraddr(bond->active_member->netdev, &mac);
         bond->active_member_mac = mac;
+        if (!bond_is_balanced(bond)) {
+            bond->bond_revalidate = true;
+        }
     } else {
         bond->active_member_mac = eth_addr_zero;
     }
@@ -1124,7 +1129,7 @@  bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
 /* Rebalancing. */
 
 static bool
-bond_is_balanced(const struct bond *bond) OVS_REQ_RDLOCK(rwlock)
+bond_is_balanced(const struct bond *bond)
 {
     return bond->rebalance_interval
         && (bond->balance == BM_SLB || bond->balance == BM_TCP)
@@ -1728,7 +1733,6 @@  bond_unixctl_set_active_member(struct unixctl_conn *conn,
     }
 
     if (bond->active_member != member) {
-        bond->bond_revalidate = true;
         bond->active_member = member;
         VLOG_INFO("bond %s: active member is now %s",
                   bond->name, member->name);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 945037ec0..52c233be9 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -291,6 +291,56 @@  NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - bond active-backup failover])
+OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br1 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24")
+on_exit 'ip link del link0a'
+on_exit 'ip link del link0b'
+AT_CHECK([ip link add link0a type veth peer name link1a])
+AT_CHECK([ip link add link0b type veth peer name link1b])
+
+AT_CHECK([ip link set dev link0a up])
+AT_CHECK([ip link set dev link1a up])
+AT_CHECK([ip link set dev link0b up])
+AT_CHECK([ip link set dev link1b up])
+
+AT_CHECK([ovs-vsctl add-bond br0 bond0 link0a link0b bond_mode=active-backup])
+AT_CHECK([ovs-vsctl add-bond br1 bond1 link1a link1b bond_mode=active-backup])
+
+for i in `seq 1 3`; do
+dnl Set primary bond member.
+AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0a -- \
+                    set port bond1 other_config:bond-primary=link1a])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -qv "100% packet loss"], [0])
+
+dnl Check correct port is used.
+AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]a"], [0])
+AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]b"], [0])
+
+dnl Change primary bond member.
+AT_CHECK([ovs-vsctl set port bond0 other_config:bond-primary=link0b -- \
+                    set port bond1 other_config:bond-primary=link1b])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 12 -i 0.6 -w 12 10.1.1.2 | grep -q "12 received"], [0])
+
+dnl Check correct port is used.
+AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eqv "actions:link[[01]]a"], [0])
+AT_CHECK([ovs-appctl dpctl/dump-flows -m | grep -Eq "actions:link[[01]]b"], [0])
+done
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over vxlan tunnel])
 OVS_CHECK_TUNNEL_TSO()
 OVS_CHECK_VXLAN()