diff mbox series

[ovs-dev,v5] controller: Add delay after multicast ARP packet

Message ID 20220714103700.1204573-1-amusil@redhat.com
State Accepted
Headers show
Series [ovs-dev,v5] controller: Add delay after multicast ARP packet | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Ales Musil July 14, 2022, 10:37 a.m. UTC
The ovn-controller had a race condition over MAC
binding table with other controllers. When multiple
controllers received GARP from single source usually
the one who was able to win the race put it into SB.
The others got transaction error which triggered
full recompute even if it's not needed.

In order to reduce the chance of multiple controllers
trying to insert same row at the same time add slight
delay to the MAC binding processing. This delay
is random interval between 1-50 in ms. This greatly
reduces the chance that multiple controllers will try
to add MAC binding at exactly the same time. This applies
only to multicast ARP request which applies only to GARP
that is sent to broadcast address.

Local testing with this delay vs without show significantly
reduced chance of hitting the transaction error.

During the testing 10 GARPs was sent to two controllers
at the same time. Without proposed fix at least one controller
had multiple transaction errors present every test iteration.

With the proposed fix the transaction error was reduced to a
single one when it happened which was usually in less than
10% of the iterations.

As was mentioned before the race condition can still happen,
but the chance is greatly reduced.

Suggested-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
---
v3: Rebase on top of current main.
    Change the delay to be per ARP request instead of
    single delay for everything. This allows the possibility
    to delay only multicast/broadcast AP as suggested by Han.
v4,v5: Address comments from Dumitru.
---
 controller/mac-learn.c | 41 +++++++++++++++++++++++++++++++++--------
 controller/mac-learn.h |  9 +++++++--
 controller/pinctrl.c   | 27 ++++++++++++++++++---------
 tests/ovn.at           |  6 +++---
 4 files changed, 61 insertions(+), 22 deletions(-)

Comments

Dumitru Ceara July 14, 2022, 11:42 a.m. UTC | #1
On 7/14/22 12:37, Ales Musil wrote:
> The ovn-controller had a race condition over MAC
> binding table with other controllers. When multiple
> controllers received GARP from single source usually
> the one who was able to win the race put it into SB.
> The others got transaction error which triggered
> full recompute even if it's not needed.
> 
> In order to reduce the chance of multiple controllers
> trying to insert same row at the same time add slight
> delay to the MAC binding processing. This delay
> is random interval between 1-50 in ms. This greatly
> reduces the chance that multiple controllers will try
> to add MAC binding at exactly the same time. This applies
> only to multicast ARP request which applies only to GARP
> that is sent to broadcast address.
> 
> Local testing with this delay vs without show significantly
> reduced chance of hitting the transaction error.
> 
> During the testing 10 GARPs was sent to two controllers
> at the same time. Without proposed fix at least one controller
> had multiple transaction errors present every test iteration.
> 
> With the proposed fix the transaction error was reduced to a
> single one when it happened which was usually in less than
> 10% of the iterations.
> 
> As was mentioned before the race condition can still happen,
> but the chance is greatly reduced.
> 
> Suggested-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
> Signed-off-by: Ales Musil <amusil@redhat.com>
> ---
> v3: Rebase on top of current main.
>     Change the delay to be per ARP request instead of
>     single delay for everything. This allows the possibility
>     to delay only multicast/broadcast AP as suggested by Han.
> v4,v5: Address comments from Dumitru.
> ---

Thanks for this new revision!

I have two very small comments but I don't think you need to send a v6
for this, let's see if the maintainers require that or if the
suggestions can be incorporated when the patch is accepted.

Acked-by: Dumitru Ceara <dceara@redhat.com>

>  controller/mac-learn.c | 41 +++++++++++++++++++++++++++++++++--------
>  controller/mac-learn.h |  9 +++++++--
>  controller/pinctrl.c   | 27 ++++++++++++++++++---------
>  tests/ovn.at           |  6 +++---
>  4 files changed, 61 insertions(+), 22 deletions(-)
> 
> diff --git a/controller/mac-learn.c b/controller/mac-learn.c
> index 27634dca8..a27607016 100644
> --- a/controller/mac-learn.c
> +++ b/controller/mac-learn.c
> @@ -18,14 +18,18 @@
>  #include "mac-learn.h"
>  
>  /* OpenvSwitch lib includes. */
> +#include "openvswitch/poll-loop.h"
>  #include "openvswitch/vlog.h"
>  #include "lib/packets.h"
> +#include "lib/random.h"
>  #include "lib/smap.h"
> +#include "lib/timeval.h"
>  
>  VLOG_DEFINE_THIS_MODULE(mac_learn);
>  
>  #define MAX_MAC_BINDINGS 1000
>  #define MAX_FDB_ENTRIES  1000
> +#define MAX_MAC_BINDING_DELAY_MSEC 50
>  
>  static size_t mac_binding_hash(uint32_t dp_key, uint32_t port_key,
>                                 struct in6_addr *);
> @@ -46,25 +50,19 @@ ovn_mac_bindings_init(struct hmap *mac_bindings)
>  }
>  
>  void
> -ovn_mac_bindings_flush(struct hmap *mac_bindings)
> +ovn_mac_bindings_destroy(struct hmap *mac_bindings)
>  {
>      struct mac_binding *mb;
>      HMAP_FOR_EACH_POP (mb, hmap_node, mac_bindings) {
>          free(mb);
>      }
> -}
> -
> -void
> -ovn_mac_bindings_destroy(struct hmap *mac_bindings)
> -{
> -    ovn_mac_bindings_flush(mac_bindings);
>      hmap_destroy(mac_bindings);
>  }
>  
>  struct mac_binding *
>  ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
>                      uint32_t port_key, struct in6_addr *ip,
> -                    struct eth_addr mac)
> +                    struct eth_addr mac, bool is_unicast)
>  {
>      uint32_t hash = mac_binding_hash(dp_key, port_key, ip);
>  
> @@ -75,10 +73,13 @@ ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
>              return NULL;
>          }
>  
> +        uint32_t delay = is_unicast
> +            ? 0 : random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1;

Nit: the coding-style doesn't seem to be extremely specific but the way
I read it this should be indented a bit differently:

uint32_t delay = is_unicast
                 ? 0 : random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1;

Or maybe cleaner:

uint32_t delay = !is_unicast
                 ? random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1
                 : 0;

>          mb = xmalloc(sizeof *mb);
>          mb->dp_key = dp_key;
>          mb->port_key = port_key;
>          mb->ip = *ip;
> +        mb->commit_at_ms = time_msec() + delay;
>          hmap_insert(mac_bindings, &mb->hmap_node, hash);
>      }
>      mb->mac = mac;
> @@ -86,6 +87,30 @@ ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
>      return mb;
>  }
>  
> +/* This is called from ovn-controller main context */
> +void
> +ovn_mac_binding_wait(struct hmap *mac_bindings)
> +{
> +    struct mac_binding *mb;
> +
> +    HMAP_FOR_EACH (mb, hmap_node, mac_bindings) {
> +        poll_timer_wait_until(mb->commit_at_ms);
> +    }
> +}
> +
> +void
> +ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings)
> +{
> +    hmap_remove(mac_bindings, &mb->hmap_node);
> +    free(mb);
> +}
> +
> +bool
> +ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now)
> +{
> +    return now >= mb->commit_at_ms;
> +}
> +
>  /* fdb functions. */
>  void
>  ovn_fdb_init(struct hmap *fdbs)
> diff --git a/controller/mac-learn.h b/controller/mac-learn.h
> index e7e8ba2d3..57c50c58b 100644
> --- a/controller/mac-learn.h
> +++ b/controller/mac-learn.h
> @@ -31,16 +31,21 @@ struct mac_binding {
>  
>      /* Value. */
>      struct eth_addr mac;
> +
> +    /* Timestamp when to commit to SB. */
> +    long long commit_at_ms;
>  };
>  
>  void ovn_mac_bindings_init(struct hmap *mac_bindings);
> -void ovn_mac_bindings_flush(struct hmap *mac_bindings);
>  void ovn_mac_bindings_destroy(struct hmap *mac_bindings);
> +void ovn_mac_binding_wait(struct hmap *mac_bindings);
> +void ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings);
> +bool ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now);
>  
>  struct mac_binding *ovn_mac_binding_add(struct hmap *mac_bindings,
>                                          uint32_t dp_key, uint32_t port_key,
>                                          struct in6_addr *ip,
> -                                        struct eth_addr mac);
> +                                        struct eth_addr mac, bool is_unicast);
>  
>  
>  
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index f954362b7..beb3d3344 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -4134,9 +4134,13 @@ pinctrl_handle_put_mac_binding(const struct flow *md,
>          memcpy(&ip_key, &ip6, sizeof ip_key);
>      }
>  
> +    /* If the ARP reply was unicast we should not delay it,
> +     * there won't be any race. */

Nit: Now that you encapsulated the delay logic in mac-learn.c, I think
this comment can go inside ovn_mac_binding_add() before we compute 'delay'.

> +    bool is_unicast = !eth_addr_is_multicast(headers->dl_dst);
>      struct mac_binding *mb = ovn_mac_binding_add(&put_mac_bindings, dp_key,
>                                                   port_key, &ip_key,
> -                                                 headers->dl_src);
> +                                                 headers->dl_src,
> +                                                 is_unicast);
>      if (!mb) {
>          COVERAGE_INC(pinctrl_drop_put_mac_binding);
>          return;
> @@ -4296,14 +4300,18 @@ run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn,
>          return;
>      }
>  
> -    const struct mac_binding *mb;
> -    HMAP_FOR_EACH (mb, hmap_node, &put_mac_bindings) {
> -        run_put_mac_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
> -                            sbrec_port_binding_by_key,
> -                            sbrec_mac_binding_by_lport_ip,
> -                            mb);
> +    long long now = time_msec();
> +
> +    struct mac_binding *mb;
> +    HMAP_FOR_EACH_SAFE (mb, hmap_node, &put_mac_bindings) {
> +        if (ovn_mac_binding_can_commit(mb, now)) {
> +            run_put_mac_binding(ovnsb_idl_txn,
> +                                sbrec_datapath_binding_by_key,
> +                                sbrec_port_binding_by_key,
> +                                sbrec_mac_binding_by_lport_ip, mb);
> +            ovn_mac_binding_remove(mb, &put_mac_bindings);
> +        }
>      }
> -    ovn_mac_bindings_flush(&put_mac_bindings);
>  }
>  
>  static void
> @@ -4352,9 +4360,10 @@ run_buffered_binding(struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
>  
>  static void
>  wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn)
> +    OVS_REQUIRES(pinctrl_mutex)
>  {
>      if (ovnsb_idl_txn && !hmap_is_empty(&put_mac_bindings)) {
> -        poll_immediate_wake();
> +        ovn_mac_binding_wait(&put_mac_bindings);
>      }
>  }
>  
> diff --git a/tests/ovn.at b/tests/ovn.at
> index c346975e6..eff18fb84 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -23002,7 +23002,7 @@ send_garp 1 1 $eth_src $eth_dst $spa $tpa
>  
>  wait_row_count MAC_Binding 1
>  
> -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
>  list mac_binding], [0], [lr0-sw0
>  10.0.0.30
>  50:54:00:00:00:03
> @@ -23049,7 +23049,7 @@ grep table_id=10 | wc -l`])
>  
>  check_row_count MAC_Binding 1
>  
> -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
>  list mac_binding], [0], [lr0-sw0
>  10.0.0.30
>  50:54:00:00:00:13
> @@ -23078,7 +23078,7 @@ OVS_WAIT_UNTIL(
>  | wc -l`]
>  )
>  
> -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
>  find mac_binding ip=10.0.0.50], [0], [lr0-sw0
>  10.0.0.50
>  50:54:00:00:00:33
Numan Siddique July 19, 2022, 7:18 p.m. UTC | #2
On Thu, Jul 14, 2022 at 6:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/14/22 12:37, Ales Musil wrote:
> > The ovn-controller had a race condition over MAC
> > binding table with other controllers. When multiple
> > controllers received GARP from single source usually
> > the one who was able to win the race put it into SB.
> > The others got transaction error which triggered
> > full recompute even if it's not needed.
> >
> > In order to reduce the chance of multiple controllers
> > trying to insert same row at the same time add slight
> > delay to the MAC binding processing. This delay
> > is random interval between 1-50 in ms. This greatly
> > reduces the chance that multiple controllers will try
> > to add MAC binding at exactly the same time. This applies
> > only to multicast ARP request which applies only to GARP
> > that is sent to broadcast address.
> >
> > Local testing with this delay vs without show significantly
> > reduced chance of hitting the transaction error.
> >
> > During the testing 10 GARPs was sent to two controllers
> > at the same time. Without proposed fix at least one controller
> > had multiple transaction errors present every test iteration.
> >
> > With the proposed fix the transaction error was reduced to a
> > single one when it happened which was usually in less than
> > 10% of the iterations.
> >
> > As was mentioned before the race condition can still happen,
> > but the chance is greatly reduced.
> >
> > Suggested-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
> > Signed-off-by: Ales Musil <amusil@redhat.com>
> > ---
> > v3: Rebase on top of current main.
> >     Change the delay to be per ARP request instead of
> >     single delay for everything. This allows the possibility
> >     to delay only multicast/broadcast AP as suggested by Han.
> > v4,v5: Address comments from Dumitru.
> > ---
>
> Thanks for this new revision!
>
> I have two very small comments but I don't think you need to send a v6
> for this, let's see if the maintainers require that or if the
> suggestions can be incorporated when the patch is accepted.
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks.  I applied this patch to the main branch.  Unfortunately I
missed addressing
Dumitru's comments before applying.  If you can send a follow up patch
that would be great.

Numan

>
> >  controller/mac-learn.c | 41 +++++++++++++++++++++++++++++++++--------
> >  controller/mac-learn.h |  9 +++++++--
> >  controller/pinctrl.c   | 27 ++++++++++++++++++---------
> >  tests/ovn.at           |  6 +++---
> >  4 files changed, 61 insertions(+), 22 deletions(-)
> >
> > diff --git a/controller/mac-learn.c b/controller/mac-learn.c
> > index 27634dca8..a27607016 100644
> > --- a/controller/mac-learn.c
> > +++ b/controller/mac-learn.c
> > @@ -18,14 +18,18 @@
> >  #include "mac-learn.h"
> >
> >  /* OpenvSwitch lib includes. */
> > +#include "openvswitch/poll-loop.h"
> >  #include "openvswitch/vlog.h"
> >  #include "lib/packets.h"
> > +#include "lib/random.h"
> >  #include "lib/smap.h"
> > +#include "lib/timeval.h"
> >
> >  VLOG_DEFINE_THIS_MODULE(mac_learn);
> >
> >  #define MAX_MAC_BINDINGS 1000
> >  #define MAX_FDB_ENTRIES  1000
> > +#define MAX_MAC_BINDING_DELAY_MSEC 50
> >
> >  static size_t mac_binding_hash(uint32_t dp_key, uint32_t port_key,
> >                                 struct in6_addr *);
> > @@ -46,25 +50,19 @@ ovn_mac_bindings_init(struct hmap *mac_bindings)
> >  }
> >
> >  void
> > -ovn_mac_bindings_flush(struct hmap *mac_bindings)
> > +ovn_mac_bindings_destroy(struct hmap *mac_bindings)
> >  {
> >      struct mac_binding *mb;
> >      HMAP_FOR_EACH_POP (mb, hmap_node, mac_bindings) {
> >          free(mb);
> >      }
> > -}
> > -
> > -void
> > -ovn_mac_bindings_destroy(struct hmap *mac_bindings)
> > -{
> > -    ovn_mac_bindings_flush(mac_bindings);
> >      hmap_destroy(mac_bindings);
> >  }
> >
> >  struct mac_binding *
> >  ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
> >                      uint32_t port_key, struct in6_addr *ip,
> > -                    struct eth_addr mac)
> > +                    struct eth_addr mac, bool is_unicast)
> >  {
> >      uint32_t hash = mac_binding_hash(dp_key, port_key, ip);
> >
> > @@ -75,10 +73,13 @@ ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
> >              return NULL;
> >          }
> >
> > +        uint32_t delay = is_unicast
> > +            ? 0 : random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1;
>
> Nit: the coding-style doesn't seem to be extremely specific but the way
> I read it this should be indented a bit differently:
>
> uint32_t delay = is_unicast
>                  ? 0 : random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1;
>
> Or maybe cleaner:
>
> uint32_t delay = !is_unicast
>                  ? random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1
>                  : 0;
>
> >          mb = xmalloc(sizeof *mb);
> >          mb->dp_key = dp_key;
> >          mb->port_key = port_key;
> >          mb->ip = *ip;
> > +        mb->commit_at_ms = time_msec() + delay;
> >          hmap_insert(mac_bindings, &mb->hmap_node, hash);
> >      }
> >      mb->mac = mac;
> > @@ -86,6 +87,30 @@ ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
> >      return mb;
> >  }
> >
> > +/* This is called from ovn-controller main context */
> > +void
> > +ovn_mac_binding_wait(struct hmap *mac_bindings)
> > +{
> > +    struct mac_binding *mb;
> > +
> > +    HMAP_FOR_EACH (mb, hmap_node, mac_bindings) {
> > +        poll_timer_wait_until(mb->commit_at_ms);
> > +    }
> > +}
> > +
> > +void
> > +ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings)
> > +{
> > +    hmap_remove(mac_bindings, &mb->hmap_node);
> > +    free(mb);
> > +}
> > +
> > +bool
> > +ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now)
> > +{
> > +    return now >= mb->commit_at_ms;
> > +}
> > +
> >  /* fdb functions. */
> >  void
> >  ovn_fdb_init(struct hmap *fdbs)
> > diff --git a/controller/mac-learn.h b/controller/mac-learn.h
> > index e7e8ba2d3..57c50c58b 100644
> > --- a/controller/mac-learn.h
> > +++ b/controller/mac-learn.h
> > @@ -31,16 +31,21 @@ struct mac_binding {
> >
> >      /* Value. */
> >      struct eth_addr mac;
> > +
> > +    /* Timestamp when to commit to SB. */
> > +    long long commit_at_ms;
> >  };
> >
> >  void ovn_mac_bindings_init(struct hmap *mac_bindings);
> > -void ovn_mac_bindings_flush(struct hmap *mac_bindings);
> >  void ovn_mac_bindings_destroy(struct hmap *mac_bindings);
> > +void ovn_mac_binding_wait(struct hmap *mac_bindings);
> > +void ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings);
> > +bool ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now);
> >
> >  struct mac_binding *ovn_mac_binding_add(struct hmap *mac_bindings,
> >                                          uint32_t dp_key, uint32_t port_key,
> >                                          struct in6_addr *ip,
> > -                                        struct eth_addr mac);
> > +                                        struct eth_addr mac, bool is_unicast);
> >
> >
> >
> > diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > index f954362b7..beb3d3344 100644
> > --- a/controller/pinctrl.c
> > +++ b/controller/pinctrl.c
> > @@ -4134,9 +4134,13 @@ pinctrl_handle_put_mac_binding(const struct flow *md,
> >          memcpy(&ip_key, &ip6, sizeof ip_key);
> >      }
> >
> > +    /* If the ARP reply was unicast we should not delay it,
> > +     * there won't be any race. */
>
> Nit: Now that you encapsulated the delay logic in mac-learn.c, I think
> this comment can go inside ovn_mac_binding_add() before we compute 'delay'.
>
> > +    bool is_unicast = !eth_addr_is_multicast(headers->dl_dst);
> >      struct mac_binding *mb = ovn_mac_binding_add(&put_mac_bindings, dp_key,
> >                                                   port_key, &ip_key,
> > -                                                 headers->dl_src);
> > +                                                 headers->dl_src,
> > +                                                 is_unicast);
> >      if (!mb) {
> >          COVERAGE_INC(pinctrl_drop_put_mac_binding);
> >          return;
> > @@ -4296,14 +4300,18 @@ run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >          return;
> >      }
> >
> > -    const struct mac_binding *mb;
> > -    HMAP_FOR_EACH (mb, hmap_node, &put_mac_bindings) {
> > -        run_put_mac_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
> > -                            sbrec_port_binding_by_key,
> > -                            sbrec_mac_binding_by_lport_ip,
> > -                            mb);
> > +    long long now = time_msec();
> > +
> > +    struct mac_binding *mb;
> > +    HMAP_FOR_EACH_SAFE (mb, hmap_node, &put_mac_bindings) {
> > +        if (ovn_mac_binding_can_commit(mb, now)) {
> > +            run_put_mac_binding(ovnsb_idl_txn,
> > +                                sbrec_datapath_binding_by_key,
> > +                                sbrec_port_binding_by_key,
> > +                                sbrec_mac_binding_by_lport_ip, mb);
> > +            ovn_mac_binding_remove(mb, &put_mac_bindings);
> > +        }
> >      }
> > -    ovn_mac_bindings_flush(&put_mac_bindings);
> >  }
> >
> >  static void
> > @@ -4352,9 +4360,10 @@ run_buffered_binding(struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
> >
> >  static void
> >  wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn)
> > +    OVS_REQUIRES(pinctrl_mutex)
> >  {
> >      if (ovnsb_idl_txn && !hmap_is_empty(&put_mac_bindings)) {
> > -        poll_immediate_wake();
> > +        ovn_mac_binding_wait(&put_mac_bindings);
> >      }
> >  }
> >
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index c346975e6..eff18fb84 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -23002,7 +23002,7 @@ send_garp 1 1 $eth_src $eth_dst $spa $tpa
> >
> >  wait_row_count MAC_Binding 1
> >
> > -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> > +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> >  list mac_binding], [0], [lr0-sw0
> >  10.0.0.30
> >  50:54:00:00:00:03
> > @@ -23049,7 +23049,7 @@ grep table_id=10 | wc -l`])
> >
> >  check_row_count MAC_Binding 1
> >
> > -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> > +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> >  list mac_binding], [0], [lr0-sw0
> >  10.0.0.30
> >  50:54:00:00:00:13
> > @@ -23078,7 +23078,7 @@ OVS_WAIT_UNTIL(
> >  | wc -l`]
> >  )
> >
> > -AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> > +OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
> >  find mac_binding ip=10.0.0.50], [0], [lr0-sw0
> >  10.0.0.50
> >  50:54:00:00:00:33
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/controller/mac-learn.c b/controller/mac-learn.c
index 27634dca8..a27607016 100644
--- a/controller/mac-learn.c
+++ b/controller/mac-learn.c
@@ -18,14 +18,18 @@ 
 #include "mac-learn.h"
 
 /* OpenvSwitch lib includes. */
+#include "openvswitch/poll-loop.h"
 #include "openvswitch/vlog.h"
 #include "lib/packets.h"
+#include "lib/random.h"
 #include "lib/smap.h"
+#include "lib/timeval.h"
 
 VLOG_DEFINE_THIS_MODULE(mac_learn);
 
 #define MAX_MAC_BINDINGS 1000
 #define MAX_FDB_ENTRIES  1000
+#define MAX_MAC_BINDING_DELAY_MSEC 50
 
 static size_t mac_binding_hash(uint32_t dp_key, uint32_t port_key,
                                struct in6_addr *);
@@ -46,25 +50,19 @@  ovn_mac_bindings_init(struct hmap *mac_bindings)
 }
 
 void
-ovn_mac_bindings_flush(struct hmap *mac_bindings)
+ovn_mac_bindings_destroy(struct hmap *mac_bindings)
 {
     struct mac_binding *mb;
     HMAP_FOR_EACH_POP (mb, hmap_node, mac_bindings) {
         free(mb);
     }
-}
-
-void
-ovn_mac_bindings_destroy(struct hmap *mac_bindings)
-{
-    ovn_mac_bindings_flush(mac_bindings);
     hmap_destroy(mac_bindings);
 }
 
 struct mac_binding *
 ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
                     uint32_t port_key, struct in6_addr *ip,
-                    struct eth_addr mac)
+                    struct eth_addr mac, bool is_unicast)
 {
     uint32_t hash = mac_binding_hash(dp_key, port_key, ip);
 
@@ -75,10 +73,13 @@  ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
             return NULL;
         }
 
+        uint32_t delay = is_unicast
+            ? 0 : random_range(MAX_MAC_BINDING_DELAY_MSEC) + 1;
         mb = xmalloc(sizeof *mb);
         mb->dp_key = dp_key;
         mb->port_key = port_key;
         mb->ip = *ip;
+        mb->commit_at_ms = time_msec() + delay;
         hmap_insert(mac_bindings, &mb->hmap_node, hash);
     }
     mb->mac = mac;
@@ -86,6 +87,30 @@  ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
     return mb;
 }
 
+/* This is called from ovn-controller main context */
+void
+ovn_mac_binding_wait(struct hmap *mac_bindings)
+{
+    struct mac_binding *mb;
+
+    HMAP_FOR_EACH (mb, hmap_node, mac_bindings) {
+        poll_timer_wait_until(mb->commit_at_ms);
+    }
+}
+
+void
+ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings)
+{
+    hmap_remove(mac_bindings, &mb->hmap_node);
+    free(mb);
+}
+
+bool
+ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now)
+{
+    return now >= mb->commit_at_ms;
+}
+
 /* fdb functions. */
 void
 ovn_fdb_init(struct hmap *fdbs)
diff --git a/controller/mac-learn.h b/controller/mac-learn.h
index e7e8ba2d3..57c50c58b 100644
--- a/controller/mac-learn.h
+++ b/controller/mac-learn.h
@@ -31,16 +31,21 @@  struct mac_binding {
 
     /* Value. */
     struct eth_addr mac;
+
+    /* Timestamp when to commit to SB. */
+    long long commit_at_ms;
 };
 
 void ovn_mac_bindings_init(struct hmap *mac_bindings);
-void ovn_mac_bindings_flush(struct hmap *mac_bindings);
 void ovn_mac_bindings_destroy(struct hmap *mac_bindings);
+void ovn_mac_binding_wait(struct hmap *mac_bindings);
+void ovn_mac_binding_remove(struct mac_binding *mb, struct hmap *mac_bindings);
+bool ovn_mac_binding_can_commit(const struct mac_binding *mb, long long now);
 
 struct mac_binding *ovn_mac_binding_add(struct hmap *mac_bindings,
                                         uint32_t dp_key, uint32_t port_key,
                                         struct in6_addr *ip,
-                                        struct eth_addr mac);
+                                        struct eth_addr mac, bool is_unicast);
 
 
 
diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f954362b7..beb3d3344 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -4134,9 +4134,13 @@  pinctrl_handle_put_mac_binding(const struct flow *md,
         memcpy(&ip_key, &ip6, sizeof ip_key);
     }
 
+    /* If the ARP reply was unicast we should not delay it,
+     * there won't be any race. */
+    bool is_unicast = !eth_addr_is_multicast(headers->dl_dst);
     struct mac_binding *mb = ovn_mac_binding_add(&put_mac_bindings, dp_key,
                                                  port_key, &ip_key,
-                                                 headers->dl_src);
+                                                 headers->dl_src,
+                                                 is_unicast);
     if (!mb) {
         COVERAGE_INC(pinctrl_drop_put_mac_binding);
         return;
@@ -4296,14 +4300,18 @@  run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn,
         return;
     }
 
-    const struct mac_binding *mb;
-    HMAP_FOR_EACH (mb, hmap_node, &put_mac_bindings) {
-        run_put_mac_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key,
-                            sbrec_port_binding_by_key,
-                            sbrec_mac_binding_by_lport_ip,
-                            mb);
+    long long now = time_msec();
+
+    struct mac_binding *mb;
+    HMAP_FOR_EACH_SAFE (mb, hmap_node, &put_mac_bindings) {
+        if (ovn_mac_binding_can_commit(mb, now)) {
+            run_put_mac_binding(ovnsb_idl_txn,
+                                sbrec_datapath_binding_by_key,
+                                sbrec_port_binding_by_key,
+                                sbrec_mac_binding_by_lport_ip, mb);
+            ovn_mac_binding_remove(mb, &put_mac_bindings);
+        }
     }
-    ovn_mac_bindings_flush(&put_mac_bindings);
 }
 
 static void
@@ -4352,9 +4360,10 @@  run_buffered_binding(struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip,
 
 static void
 wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn)
+    OVS_REQUIRES(pinctrl_mutex)
 {
     if (ovnsb_idl_txn && !hmap_is_empty(&put_mac_bindings)) {
-        poll_immediate_wake();
+        ovn_mac_binding_wait(&put_mac_bindings);
     }
 }
 
diff --git a/tests/ovn.at b/tests/ovn.at
index c346975e6..eff18fb84 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -23002,7 +23002,7 @@  send_garp 1 1 $eth_src $eth_dst $spa $tpa
 
 wait_row_count MAC_Binding 1
 
-AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
+OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
 list mac_binding], [0], [lr0-sw0
 10.0.0.30
 50:54:00:00:00:03
@@ -23049,7 +23049,7 @@  grep table_id=10 | wc -l`])
 
 check_row_count MAC_Binding 1
 
-AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
+OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
 list mac_binding], [0], [lr0-sw0
 10.0.0.30
 50:54:00:00:00:13
@@ -23078,7 +23078,7 @@  OVS_WAIT_UNTIL(
 | wc -l`]
 )
 
-AT_CHECK([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
+OVS_WAIT_UNTIL([ovn-sbctl --format=csv --bare --columns logical_port,ip,mac \
 find mac_binding ip=10.0.0.50], [0], [lr0-sw0
 10.0.0.50
 50:54:00:00:00:33