diff mbox

[ovs-dev] ovn-controller: process lport bindings only when transaction is possible

Message ID 1467482181-27691-1-git-send-email-lrichard@redhat.com
State Accepted
Headers show

Commit Message

Lance Richardson July 2, 2016, 5:56 p.m. UTC
As currently implemented, binding_run() normally updates the set of
locally owned logical ports on each call.  When changes to the
membership of this set are detected (i.e. when locally bound
logical ports are added or deleted), additional processing to
update the sb database with lport binding is performed.

However, the sb database can only be updated when a transaction to
the sb database is possible (that is, when ctx->ovnsb_idl_txn is
non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
happens to be NULL, its binding information will not be updated in
the the sb database until another change to the set of locally-owned
logical ports changes. If no such change ever occurs, the sb database
is never updated with the appropriate binding information.

Eliminate this issue by only updating the set of locally owned logical
ports when an sb database transaction is possible. This addresses
a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
test case.

Signed-off-by: Lance Richardson <lrichard@redhat.com>
---
 ovn/controller/binding.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ben Pfaff July 2, 2016, 7:23 p.m. UTC | #1
On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> As currently implemented, binding_run() normally updates the set of
> locally owned logical ports on each call.  When changes to the
> membership of this set are detected (i.e. when locally bound
> logical ports are added or deleted), additional processing to
> update the sb database with lport binding is performed.
> 
> However, the sb database can only be updated when a transaction to
> the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> happens to be NULL, its binding information will not be updated in
> the the sb database until another change to the set of locally-owned
> logical ports changes. If no such change ever occurs, the sb database
> is never updated with the appropriate binding information.
> 
> Eliminate this issue by only updating the set of locally owned logical
> ports when an sb database transaction is possible. This addresses
> a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> test case.
> 
> Signed-off-by: Lance Richardson <lrichard@redhat.com>
> ---
>  ovn/controller/binding.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index 8b439a6..369f8f2 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
>      }
>  
>      if (br_int) {
> -        if (get_local_iface_ids(br_int, &lports)) {
> +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports)) {
>              process_full_binding = true;
>          }
>      } else {

At a glance, get_local_iface_ids() doesn't seem to directly modify the
southbound database, but it does modify data structures that later code
depends on.  Based on that, it's not obvious for me to see that this is
a correct change.  Can you point out anything relevant?

Thanks,

Ben.
Lance Richardson July 2, 2016, 8:11 p.m. UTC | #2
----- Original Message -----
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org
> Sent: Saturday, July 2, 2016 3:23:35 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> 
> On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > As currently implemented, binding_run() normally updates the set of
> > locally owned logical ports on each call.  When changes to the
> > membership of this set are detected (i.e. when locally bound
> > logical ports are added or deleted), additional processing to
> > update the sb database with lport binding is performed.
> > 
> > However, the sb database can only be updated when a transaction to
> > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > happens to be NULL, its binding information will not be updated in
> > the the sb database until another change to the set of locally-owned
> > logical ports changes. If no such change ever occurs, the sb database
> > is never updated with the appropriate binding information.
> > 
> > Eliminate this issue by only updating the set of locally owned logical
> > ports when an sb database transaction is possible. This addresses
> > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > test case.
> > 
> > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > ---
> >  ovn/controller/binding.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > index 8b439a6..369f8f2 100644
> > --- a/ovn/controller/binding.c
> > +++ b/ovn/controller/binding.c
> > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct
> > ovsrec_bridge *br_int,
> >      }
> >  
> >      if (br_int) {
> > -        if (get_local_iface_ids(br_int, &lports)) {
> > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports)) {
> >              process_full_binding = true;
> >          }
> >      } else {
> 
> At a glance, get_local_iface_ids() doesn't seem to directly modify the
> southbound database, but it does modify data structures that later code
> depends on.  Based on that, it's not obvious for me to see that this is
> a correct change.  Can you point out anything relevant?
> 
> Thanks,
> 
> Ben.
> 

Hi Ben,

The failing scenario goes like this:
   1) Test case logical network setup is complete.
   2) The last physical network port is added via
      as hv3 ovs-vsctl --add-port ... --set Interface vif333 external-ids:iface-id=lp333
   3) hv3 ovn-controller receives update from hv3 ovsdb-server with above mapping,
      binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
   4) binding_run() calls get_local_iface_ids(), which recognizes the new
      local port as matching a logical port, so the lp333 is added to the
      global ssets "lports" and "all_lports".  This means lp333 will not be treated
      as a new logical port on subsequent calls. Because getLocal_iface_ids()
      has discovered a new lport, it returns changed = true.
   5) Because get_local_iface_ids() returned true, binding_run() sets process_full_binding
      to true.
   6) Because process_full_binding is true, binding_run() calls consider_local_datapath()
      for each logical port in shash_lports (which now includes lp333).
   7) consider_local_datapath() processing returns without calling
      sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
   8) There are subsequent calls to binding_run() with non-NULL ctx->ovnsb_idl,
      but because lp333 is already in the "lports" sset, get_local_iface_ids()
      returns changed=false, so process_full_binding is false, which means
      consider_local_datapath() is not called for lp333.
   9) Because consider_local_datapath() is not called for lp333, the sb database
      is not updated with the lport/chassis binding.

Hopefully the above is intelligible. Another way of looking at it would be
to say the condition for calling consider_local_datapath() is an "edge trigger",
this change suppresses the trigger until the necessary actions can be performed.

One thing I have a doubt about is whether there's any guarantee that, after a call
with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL ctx->ovnsb_idl_txn,
but I suspect that other things would not work without this being the case.

   Lance
Lance Richardson July 2, 2016, 8:27 p.m. UTC | #3
----- Original Message -----
> From: "Lance Richardson" <lrichard@redhat.com>
> To: "Ben Pfaff" <blp@ovn.org>
> Cc: dev@openvswitch.org
> Sent: Saturday, July 2, 2016 4:11:14 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> 
> 
> 
> ----- Original Message -----
> > From: "Ben Pfaff" <blp@ovn.org>
> > To: "Lance Richardson" <lrichard@redhat.com>
> > Cc: dev@openvswitch.org
> > Sent: Saturday, July 2, 2016 3:23:35 PM
> > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only
> > when transaction is possible
> > 
> > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > As currently implemented, binding_run() normally updates the set of
> > > locally owned logical ports on each call.  When changes to the
> > > membership of this set are detected (i.e. when locally bound
> > > logical ports are added or deleted), additional processing to
> > > update the sb database with lport binding is performed.
> > > 
> > > However, the sb database can only be updated when a transaction to
> > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > > happens to be NULL, its binding information will not be updated in
> > > the the sb database until another change to the set of locally-owned
> > > logical ports changes. If no such change ever occurs, the sb database
> > > is never updated with the appropriate binding information.
> > > 
> > > Eliminate this issue by only updating the set of locally owned logical
> > > ports when an sb database transaction is possible. This addresses
> > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > > test case.
> > > 
> > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > ---
> > >  ovn/controller/binding.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > index 8b439a6..369f8f2 100644
> > > --- a/ovn/controller/binding.c
> > > +++ b/ovn/controller/binding.c
> > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct
> > > ovsrec_bridge *br_int,
> > >      }
> > >  
> > >      if (br_int) {
> > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports))
> > > {
> > >              process_full_binding = true;
> > >          }
> > >      } else {
> > 
> > At a glance, get_local_iface_ids() doesn't seem to directly modify the
> > southbound database, but it does modify data structures that later code
> > depends on.  Based on that, it's not obvious for me to see that this is
> > a correct change.  Can you point out anything relevant?
> > 
> > Thanks,
> > 
> > Ben.
> > 
> 
> Hi Ben,
> 
> The failing scenario goes like this:
>    1) Test case logical network setup is complete.
>    2) The last physical network port is added via
>       as hv3 ovs-vsctl --add-port ... --set Interface vif333
>       external-ids:iface-id=lp333
>    3) hv3 ovn-controller receives update from hv3 ovsdb-server with above
>    mapping,
>       binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
>    4) binding_run() calls get_local_iface_ids(), which recognizes the new
>       local port as matching a logical port, so the lp333 is added to the
>       global ssets "lports" and "all_lports".  This means lp333 will not be
>       treated
>       as a new logical port on subsequent calls. Because getLocal_iface_ids()
>       has discovered a new lport, it returns changed = true.
>    5) Because get_local_iface_ids() returned true, binding_run() sets
>    process_full_binding
>       to true.
>    6) Because process_full_binding is true, binding_run() calls
>    consider_local_datapath()
>       for each logical port in shash_lports (which now includes lp333).
>    7) consider_local_datapath() processing returns without calling
>       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
>    8) There are subsequent calls to binding_run() with non-NULL
>    ctx->ovnsb_idl,
>       but because lp333 is already in the "lports" sset,
>       get_local_iface_ids()
>       returns changed=false, so process_full_binding is false, which means
>       consider_local_datapath() is not called for lp333.
>    9) Because consider_local_datapath() is not called for lp333, the sb
>    database
>       is not updated with the lport/chassis binding.
> 
> Hopefully the above is intelligible. Another way of looking at it would be
> to say the condition for calling consider_local_datapath() is an "edge
> trigger",
> this change suppresses the trigger until the necessary actions can be
> performed.
> 
> One thing I have a doubt about is whether there's any guarantee that, after a
> call
> with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL
> ctx->ovnsb_idl_txn,
> but I suspect that other things would not work without this being the case.
> 
>    Lance

BTW, I've noticed in the change history that binding_run() used to return
immediately when called with ctx->ovnsb_idl_txn==NULL, but this was later
changed.  I don't have a good understanding of the issues related to that
change, but I could definitely be missing something. Also, before incremental
update handling was added this failure scenario could not have occurred,
so it would be good if the folks working on that notice this thread and weigh in.

   Lance
Ben Pfaff July 3, 2016, 7:18 p.m. UTC | #4
On Sat, Jul 02, 2016 at 04:11:14PM -0400, Lance Richardson wrote:
> 
> 
> ----- Original Message -----
> > From: "Ben Pfaff" <blp@ovn.org>
> > To: "Lance Richardson" <lrichard@redhat.com>
> > Cc: dev@openvswitch.org
> > Sent: Saturday, July 2, 2016 3:23:35 PM
> > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> > 
> > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > As currently implemented, binding_run() normally updates the set of
> > > locally owned logical ports on each call.  When changes to the
> > > membership of this set are detected (i.e. when locally bound
> > > logical ports are added or deleted), additional processing to
> > > update the sb database with lport binding is performed.
> > > 
> > > However, the sb database can only be updated when a transaction to
> > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > > happens to be NULL, its binding information will not be updated in
> > > the the sb database until another change to the set of locally-owned
> > > logical ports changes. If no such change ever occurs, the sb database
> > > is never updated with the appropriate binding information.
> > > 
> > > Eliminate this issue by only updating the set of locally owned logical
> > > ports when an sb database transaction is possible. This addresses
> > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > > test case.
> > > 
> > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > ---
> > >  ovn/controller/binding.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > index 8b439a6..369f8f2 100644
> > > --- a/ovn/controller/binding.c
> > > +++ b/ovn/controller/binding.c
> > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct
> > > ovsrec_bridge *br_int,
> > >      }
> > >  
> > >      if (br_int) {
> > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports)) {
> > >              process_full_binding = true;
> > >          }
> > >      } else {
> > 
> > At a glance, get_local_iface_ids() doesn't seem to directly modify the
> > southbound database, but it does modify data structures that later code
> > depends on.  Based on that, it's not obvious for me to see that this is
> > a correct change.  Can you point out anything relevant?
> > 
> > Thanks,
> > 
> > Ben.
> > 
> 
> Hi Ben,
> 
> The failing scenario goes like this:
>    1) Test case logical network setup is complete.
>    2) The last physical network port is added via
>       as hv3 ovs-vsctl --add-port ... --set Interface vif333 external-ids:iface-id=lp333
>    3) hv3 ovn-controller receives update from hv3 ovsdb-server with above mapping,
>       binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
>    4) binding_run() calls get_local_iface_ids(), which recognizes the new
>       local port as matching a logical port, so the lp333 is added to the
>       global ssets "lports" and "all_lports".  This means lp333 will not be treated
>       as a new logical port on subsequent calls. Because getLocal_iface_ids()
>       has discovered a new lport, it returns changed = true.
>    5) Because get_local_iface_ids() returned true, binding_run() sets process_full_binding
>       to true.
>    6) Because process_full_binding is true, binding_run() calls consider_local_datapath()
>       for each logical port in shash_lports (which now includes lp333).
>    7) consider_local_datapath() processing returns without calling
>       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
>    8) There are subsequent calls to binding_run() with non-NULL ctx->ovnsb_idl,
>       but because lp333 is already in the "lports" sset, get_local_iface_ids()
>       returns changed=false, so process_full_binding is false, which means
>       consider_local_datapath() is not called for lp333.
>    9) Because consider_local_datapath() is not called for lp333, the sb database
>       is not updated with the lport/chassis binding.
> 
> Hopefully the above is intelligible. Another way of looking at it would be
> to say the condition for calling consider_local_datapath() is an "edge trigger",
> this change suppresses the trigger until the necessary actions can be performed.

That is an amazingly detailed rationale.  I appended all of it to the
commit message and applied this to master.

> One thing I have a doubt about is whether there's any guarantee that, after a call
> with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL ctx->ovnsb_idl_txn,
> but I suspect that other things would not work without this being the case.

This should be guaranteed.
Ryan Moats July 3, 2016, 7:24 p.m. UTC | #5
"dev" <dev-bounces@openvswitch.org> wrote on 07/03/2016 02:18:51 PM:

> From: Ben Pfaff <blp@ovn.org>
> To: Lance Richardson <lrichard@redhat.com>
> Cc: dev@openvswitch.org
> Date: 07/03/2016 02:19 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport
> bindings only when transaction is possible
> Sent by: "dev" <dev-bounces@openvswitch.org>
>
> On Sat, Jul 02, 2016 at 04:11:14PM -0400, Lance Richardson wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Ben Pfaff" <blp@ovn.org>
> > > To: "Lance Richardson" <lrichard@redhat.com>
> > > Cc: dev@openvswitch.org
> > > Sent: Saturday, July 2, 2016 3:23:35 PM
> > > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport
> bindings only when transaction is possible
> > >
> > > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > > As currently implemented, binding_run() normally updates the set of
> > > > locally owned logical ports on each call.  When changes to the
> > > > membership of this set are detected (i.e. when locally bound
> > > > logical ports are added or deleted), additional processing to
> > > > update the sb database with lport binding is performed.
> > > >
> > > > However, the sb database can only be updated when a transaction to
> > > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > > non-NULL).  If a new logical port is detected  while ctx->
ovnsb_idl_txn
> > > > happens to be NULL, its binding information will not be updated in
> > > > the the sb database until another change to the set of
locally-owned
> > > > logical ports changes. If no such change ever occurs, the sb
database
> > > > is never updated with the appropriate binding information.
> > > >
> > > > Eliminate this issue by only updating the set of locally owned
logical
> > > > ports when an sb database transaction is possible. This addresses
> > > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1
LR"
> > > > test case.
> > > >
> > > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > > ---
> > > >  ovn/controller/binding.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > > index 8b439a6..369f8f2 100644
> > > > --- a/ovn/controller/binding.c
> > > > +++ b/ovn/controller/binding.c
> > > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const
struct
> > > > ovsrec_bridge *br_int,
> > > >      }
> > > >
> > > >      if (br_int) {
> > > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int,
> &lports)) {
> > > >              process_full_binding = true;
> > > >          }
> > > >      } else {
> > >
> > > At a glance, get_local_iface_ids() doesn't seem to directly modify
the
> > > southbound database, but it does modify data structures that later
code
> > > depends on.  Based on that, it's not obvious for me to see that this
is
> > > a correct change.  Can you point out anything relevant?
> > >
> > > Thanks,
> > >
> > > Ben.
> > >
> >
> > Hi Ben,
> >
> > The failing scenario goes like this:
> >    1) Test case logical network setup is complete.
> >    2) The last physical network port is added via
> >       as hv3 ovs-vsctl --add-port ... --set Interface vif333
> external-ids:iface-id=lp333
> >    3) hv3 ovn-controller receives update from hv3 ovsdb-server
> with above mapping,
> >       binding_run() is called, and ctx->ovnsb_idl_txn happens to be
NULL.
> >    4) binding_run() calls get_local_iface_ids(), which recognizes the
new
> >       local port as matching a logical port, so the lp333 is added to
the
> >       global ssets "lports" and "all_lports".  This means lp333
> will not be treated
> >       as a new logical port on subsequent calls. Because
> getLocal_iface_ids()
> >       has discovered a new lport, it returns changed = true.
> >    5) Because get_local_iface_ids() returned true, binding_run()
> sets process_full_binding
> >       to true.
> >    6) Because process_full_binding is true, binding_run() calls
> consider_local_datapath()
> >       for each logical port in shash_lports (which now includes lp333).
> >    7) consider_local_datapath() processing returns without calling
> >       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is
NULL.
> >    8) There are subsequent calls to binding_run() with non-NULL
> ctx->ovnsb_idl,
> >       but because lp333 is already in the "lports" sset,
> get_local_iface_ids()
> >       returns changed=false, so process_full_binding is false, which
means
> >       consider_local_datapath() is not called for lp333.
> >    9) Because consider_local_datapath() is not called for lp333,
> the sb database
> >       is not updated with the lport/chassis binding.
> >
> > Hopefully the above is intelligible. Another way of looking at it would
be
> > to say the condition for calling consider_local_datapath() is an
> "edge trigger",
> > this change suppresses the trigger until the necessary actions can
> be performed.
>
> That is an amazingly detailed rationale.  I appended all of it to the
> commit message and applied this to master.
>
> > One thing I have a doubt about is whether there's any guarantee
> that, after a call
> > with ctx->ovnsb_idl_txn==NULL, there will be another call with
> non-NULL ctx->ovnsb_idl_txn,
> > but I suspect that other things would not work without this being the
case.
>
> This should be guaranteed.

I admit to being late to the party on this one, but I would have acked it
as being a definite error case had I gotten there in time.

Ryan
Lance Richardson July 3, 2016, 8:46 p.m. UTC | #6
----- Original Message -----
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org
> Sent: Sunday, July 3, 2016 3:18:51 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> 
> On Sat, Jul 02, 2016 at 04:11:14PM -0400, Lance Richardson wrote:
> > 
> > 
> > ----- Original Message -----
> > > From: "Ben Pfaff" <blp@ovn.org>
> > > To: "Lance Richardson" <lrichard@redhat.com>
> > > Cc: dev@openvswitch.org
> > > Sent: Saturday, July 2, 2016 3:23:35 PM
> > > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings
> > > only when transaction is possible
> > > 
> > > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > > As currently implemented, binding_run() normally updates the set of
> > > > locally owned logical ports on each call.  When changes to the
> > > > membership of this set are detected (i.e. when locally bound
> > > > logical ports are added or deleted), additional processing to
> > > > update the sb database with lport binding is performed.
> > > > 
> > > > However, the sb database can only be updated when a transaction to
> > > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > > > happens to be NULL, its binding information will not be updated in
> > > > the the sb database until another change to the set of locally-owned
> > > > logical ports changes. If no such change ever occurs, the sb database
> > > > is never updated with the appropriate binding information.
> > > > 
> > > > Eliminate this issue by only updating the set of locally owned logical
> > > > ports when an sb database transaction is possible. This addresses
> > > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > > > test case.
> > > > 
> > > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > > ---
> > > >  ovn/controller/binding.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > > index 8b439a6..369f8f2 100644
> > > > --- a/ovn/controller/binding.c
> > > > +++ b/ovn/controller/binding.c
> > > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const
> > > > struct
> > > > ovsrec_bridge *br_int,
> > > >      }
> > > >  
> > > >      if (br_int) {
> > > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int,
> > > > &lports)) {
> > > >              process_full_binding = true;
> > > >          }
> > > >      } else {
> > > 
> > > At a glance, get_local_iface_ids() doesn't seem to directly modify the
> > > southbound database, but it does modify data structures that later code
> > > depends on.  Based on that, it's not obvious for me to see that this is
> > > a correct change.  Can you point out anything relevant?
> > > 
> > > Thanks,
> > > 
> > > Ben.
> > > 
> > 
> > Hi Ben,
> > 
> > The failing scenario goes like this:
> >    1) Test case logical network setup is complete.
> >    2) The last physical network port is added via
> >       as hv3 ovs-vsctl --add-port ... --set Interface vif333
> >       external-ids:iface-id=lp333
> >    3) hv3 ovn-controller receives update from hv3 ovsdb-server with above
> >    mapping,
> >       binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
> >    4) binding_run() calls get_local_iface_ids(), which recognizes the new
> >       local port as matching a logical port, so the lp333 is added to the
> >       global ssets "lports" and "all_lports".  This means lp333 will not be
> >       treated
> >       as a new logical port on subsequent calls. Because
> >       getLocal_iface_ids()
> >       has discovered a new lport, it returns changed = true.
> >    5) Because get_local_iface_ids() returned true, binding_run() sets
> >    process_full_binding
> >       to true.
> >    6) Because process_full_binding is true, binding_run() calls
> >    consider_local_datapath()
> >       for each logical port in shash_lports (which now includes lp333).
> >    7) consider_local_datapath() processing returns without calling
> >       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
> >    8) There are subsequent calls to binding_run() with non-NULL
> >    ctx->ovnsb_idl,
> >       but because lp333 is already in the "lports" sset,
> >       get_local_iface_ids()
> >       returns changed=false, so process_full_binding is false, which means
> >       consider_local_datapath() is not called for lp333.
> >    9) Because consider_local_datapath() is not called for lp333, the sb
> >    database
> >       is not updated with the lport/chassis binding.
> > 
> > Hopefully the above is intelligible. Another way of looking at it would be
> > to say the condition for calling consider_local_datapath() is an "edge
> > trigger",
> > this change suppresses the trigger until the necessary actions can be
> > performed.
> 
> That is an amazingly detailed rationale.  I appended all of it to the
> commit message and applied this to master.
> 
Thanks! Now I wish I had proof-read it before sending :)

> > One thing I have a doubt about is whether there's any guarantee that, after
> > a call
> > with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL
> > ctx->ovnsb_idl_txn,
> > but I suspect that other things would not work without this being the case.
> 
> This should be guaranteed.
>
diff mbox

Patch

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 8b439a6..369f8f2 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -266,7 +266,7 @@  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     }
 
     if (br_int) {
-        if (get_local_iface_ids(br_int, &lports)) {
+        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports)) {
             process_full_binding = true;
         }
     } else {