diff mbox series

[ovs-dev] ovn-controller: Relax get_nb_cfg() condition seqno check.

Message ID 1607006942-32452-1-git-send-email-dceara@redhat.com
State Rejected
Headers show
Series [ovs-dev] ovn-controller: Relax get_nb_cfg() condition seqno check. | expand

Commit Message

Dumitru Ceara Dec. 3, 2020, 2:49 p.m. UTC
The IDL might decide to resend pending monitor condition changes
implicitly (one such case is at reconnect) in which case the client
(ovn-controller) might end waiting for a condition seqno that has
already been satisfied.

In order to avoid this case, we now relax the seqno check in
get_nb_cfg() and if the IDL condition seqno is greater or equal than
what ovn-controller expected after changing monitor conditions it means
that there are no in-flight pending changes.

Fixes: 58e5d32b011b ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
CI run with tests passing from the first run (no recheck):
https://github.com/dceara/ovn/actions/runs/398575371
---
 controller/ovn-controller.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Han Zhou Dec. 4, 2020, 8:13 a.m. UTC | #1
On Thu, Dec 3, 2020 at 6:49 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> The IDL might decide to resend pending monitor condition changes
> implicitly (one such case is at reconnect) in which case the client
> (ovn-controller) might end waiting for a condition seqno that has
> already been satisfied.
>
> In order to avoid this case, we now relax the seqno check in
> get_nb_cfg() and if the IDL condition seqno is greater or equal than
> what ovn-controller expected after changing monitor conditions it means
> that there are no in-flight pending changes.
>
> Fixes: 58e5d32b011b ("ovn-controller: Fix nb_cfg update with
monitor_cond_change in flight.")
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> CI run with tests passing from the first run (no recheck):
> https://github.com/dceara/ovn/actions/runs/398575371
> ---
>  controller/ovn-controller.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 46589e4..a9415b2 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -824,8 +824,15 @@ get_nb_cfg(const struct sbrec_sb_global_table
*sb_global_table,
>      /* Delay getting nb_cfg if there are monitor condition changes
>       * in flight.  It might be that those changes would instruct the
>       * server to send updates that happened before SB_Global.nb_cfg.
> +     *
> +     * The IDL can decide to resend pending conditions upon reconnect in
> +     * which case the expected_cond_seqno is not updated because the
client
> +     * (ovn-controller) did not explicitly request it.  That means that
we
> +     * cannot just check for cond_seqno != expected_cond_seqno and we
also
> +     * have to take into account potential unsigned int overflows.
>       */
> -    if (cond_seqno != expected_cond_seqno) {
> +    if (cond_seqno < expected_cond_seqno &&

If we consider overflows, should the above condition use "!=" ?

> +            (cond_seqno != 0 || expected_cond_seqno != UINT_MAX)) {
>          return nb_cfg;
>      }
>
> --
> 1.8.3.1
>
Dumitru Ceara Dec. 4, 2020, 2:58 p.m. UTC | #2
On 12/4/20 9:13 AM, Han Zhou wrote:
> 
> 
> On Thu, Dec 3, 2020 at 6:49 AM Dumitru Ceara <dceara@redhat.com
> <mailto:dceara@redhat.com>> wrote:
>>
>> The IDL might decide to resend pending monitor condition changes
>> implicitly (one such case is at reconnect) in which case the client
>> (ovn-controller) might end waiting for a condition seqno that has
>> already been satisfied.
>>
>> In order to avoid this case, we now relax the seqno check in
>> get_nb_cfg() and if the IDL condition seqno is greater or equal than
>> what ovn-controller expected after changing monitor conditions it means
>> that there are no in-flight pending changes.
>>
>> Fixes: 58e5d32b011b ("ovn-controller: Fix nb_cfg update with
> monitor_cond_change in flight.")
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com
> <mailto:dceara@redhat.com>>
>> ---
>> CI run with tests passing from the first run (no recheck):
>> https://github.com/dceara/ovn/actions/runs/398575371
>> ---
>>  controller/ovn-controller.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 46589e4..a9415b2 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -824,8 +824,15 @@ get_nb_cfg(const struct sbrec_sb_global_table
> *sb_global_table,
>>      /* Delay getting nb_cfg if there are monitor condition changes
>>       * in flight.  It might be that those changes would instruct the
>>       * server to send updates that happened before SB_Global.nb_cfg.
>> +     *
>> +     * The IDL can decide to resend pending conditions upon reconnect in
>> +     * which case the expected_cond_seqno is not updated because the
> client
>> +     * (ovn-controller) did not explicitly request it.  That means
> that we
>> +     * cannot just check for cond_seqno != expected_cond_seqno and we
> also
>> +     * have to take into account potential unsigned int overflows.
>>       */
>> -    if (cond_seqno != expected_cond_seqno) {
>> +    if (cond_seqno < expected_cond_seqno &&
> 
> If we consider overflows, should the above condition use "!=" ?
> 

Hi Han,

Thanks for the review.  Actually, after some more debugging, I think the
real issue was due to an IDL bug for which I've just sent a patch:

https://patchwork.ozlabs.org/project/openvswitch/patch/1607093681-17955-1-git-send-email-dceara@redhat.com/

I think we can discard the OVN patch (at least for now).

Thanks,
Dumitru

>> +            (cond_seqno != 0 || expected_cond_seqno != UINT_MAX)) {
>>          return nb_cfg;
>>      }
>>
>> --
>> 1.8.3.1
>>
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 46589e4..a9415b2 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -824,8 +824,15 @@  get_nb_cfg(const struct sbrec_sb_global_table *sb_global_table,
     /* Delay getting nb_cfg if there are monitor condition changes
      * in flight.  It might be that those changes would instruct the
      * server to send updates that happened before SB_Global.nb_cfg.
+     *
+     * The IDL can decide to resend pending conditions upon reconnect in
+     * which case the expected_cond_seqno is not updated because the client
+     * (ovn-controller) did not explicitly request it.  That means that we
+     * cannot just check for cond_seqno != expected_cond_seqno and we also
+     * have to take into account potential unsigned int overflows.
      */
-    if (cond_seqno != expected_cond_seqno) {
+    if (cond_seqno < expected_cond_seqno &&
+            (cond_seqno != 0 || expected_cond_seqno != UINT_MAX)) {
         return nb_cfg;
     }