diff mbox series

[ovs-dev] ovsdb-cs: Consider all tables when computing expected cond seqno.

Message ID 20210507164706.10013-1-dceara@redhat.com
State Superseded
Headers show
Series [ovs-dev] ovsdb-cs: Consider all tables when computing expected cond seqno. | expand

Commit Message

Dumitru Ceara May 7, 2021, 4:47 p.m. UTC
In ovsdb_cs_db_set_condition(), take into account all pending condition
changes for all tables when computing the db->cond_seqno at which the
monitor is expected to be updated.

In the following scenario, with two tables, A and B, the old code
performed the following steps:
1. Initial db->cond_seqno = X.
2. Client changes condition for table A:
   - A->new_cond gets set
   - expected cond seqno returned to the client: X + 1
3. ovsdb-cs sends the monitor_cond_change for table A
   - A->req_cond <- A->new_cond
4. Client changes condition for table B:
   - B->new_cond gets set
   - expected cond seqno returned to the client: X + 1
   - however, because the conditon change at step 3 is still not replied
     to, table B's monitor_cond_change request is not sent yet.
5. ovsdb-cs receives the reply for the condition change at step 3:
   - db->cond_seqno <- X + 1
6. ovsdb-cs sends the monitor_cond_change for table B
7. ovsdb-cs receives the reply for the condition change at step 6:
  - db->cond_seqno <- X + 2

The client was incorrectly informed that it will have all relevant
updates for table B at seqno X + 1 while actually that happens later, at
seqno X + 2.

Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 lib/ovsdb-cs.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

Comments

Ben Pfaff May 7, 2021, 5:55 p.m. UTC | #1
On Fri, May 07, 2021 at 06:47:06PM +0200, Dumitru Ceara wrote:
> In ovsdb_cs_db_set_condition(), take into account all pending condition
> changes for all tables when computing the db->cond_seqno at which the
> monitor is expected to be updated.
> 
> In the following scenario, with two tables, A and B, the old code
> performed the following steps:
> 1. Initial db->cond_seqno = X.
> 2. Client changes condition for table A:
>    - A->new_cond gets set
>    - expected cond seqno returned to the client: X + 1
> 3. ovsdb-cs sends the monitor_cond_change for table A
>    - A->req_cond <- A->new_cond
> 4. Client changes condition for table B:
>    - B->new_cond gets set
>    - expected cond seqno returned to the client: X + 1
>    - however, because the conditon change at step 3 is still not replied
>      to, table B's monitor_cond_change request is not sent yet.
> 5. ovsdb-cs receives the reply for the condition change at step 3:
>    - db->cond_seqno <- X + 1
> 6. ovsdb-cs sends the monitor_cond_change for table B
> 7. ovsdb-cs receives the reply for the condition change at step 6:
>   - db->cond_seqno <- X + 2
> 
> The client was incorrectly informed that it will have all relevant
> updates for table B at seqno X + 1 while actually that happens later, at
> seqno X + 2.
> 
> Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>

It's not very often that we get a bug fix for a 4 1/2 year old patch.  I
hope you're willing to backport.

I thought about asking for a test but this seems really challenging to
test.  Distributed systems are hard.

I've stared at this and thought about it and it seems correct.  Here's
my ack:
Acked-by: Ben Pfaff <blp@ovn.org>

I have tiny comments on the code itself.  One is that any_req_cond is
only used in one branch of the 'if', so it could be calculated just on
that branch.  The other is that any_req_cond is a bool so that
the !! in !!any_req_cond can be removed.
Dumitru Ceara May 7, 2021, 7:01 p.m. UTC | #2
On 5/7/21 7:55 PM, Ben Pfaff wrote:
> On Fri, May 07, 2021 at 06:47:06PM +0200, Dumitru Ceara wrote:
>> In ovsdb_cs_db_set_condition(), take into account all pending condition
>> changes for all tables when computing the db->cond_seqno at which the
>> monitor is expected to be updated.
>>
>> In the following scenario, with two tables, A and B, the old code
>> performed the following steps:
>> 1. Initial db->cond_seqno = X.
>> 2. Client changes condition for table A:
>>    - A->new_cond gets set
>>    - expected cond seqno returned to the client: X + 1
>> 3. ovsdb-cs sends the monitor_cond_change for table A
>>    - A->req_cond <- A->new_cond
>> 4. Client changes condition for table B:
>>    - B->new_cond gets set
>>    - expected cond seqno returned to the client: X + 1
>>    - however, because the conditon change at step 3 is still not replied
>>      to, table B's monitor_cond_change request is not sent yet.
>> 5. ovsdb-cs receives the reply for the condition change at step 3:
>>    - db->cond_seqno <- X + 1
>> 6. ovsdb-cs sends the monitor_cond_change for table B
>> 7. ovsdb-cs receives the reply for the condition change at step 6:
>>   - db->cond_seqno <- X + 2
>>
>> The client was incorrectly informed that it will have all relevant
>> updates for table B at seqno X + 1 while actually that happens later, at
>> seqno X + 2.
>>
>> Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> 
> It's not very often that we get a bug fix for a 4 1/2 year old patch.  I
> hope you're willing to backport.

Sure, no problem.  I can send backports for all relevant branches once
this is accepted.

I think it wasn't really spotted before because there aren't so many
users of the IDL/CS modules that use the return value of
ovsdb_idl_set_condition()/ovsdb_cs_set_condition().

OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected
sequence number while setting conditions.") and 58e5d32b011b
("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight.").

> 
> I thought about asking for a test but this seems really challenging to
> test.  Distributed systems are hard.
> 
> I've stared at this and thought about it and it seems correct.  Here's
> my ack:
> Acked-by: Ben Pfaff <blp@ovn.org>
> 
> I have tiny comments on the code itself.  One is that any_req_cond is
> only used in one branch of the 'if', so it could be calculated just on
> that branch.  The other is that any_req_cond is a bool so that
> the !! in !!any_req_cond can be removed.

Ah, I was under the (wrong) impression that there's no guarantee in the
standard about bool having only 0 and 1 as possible values.  I'll send a
v2 taking care of your comments.

Thanks for the review!

Regards,
Dumitru
Ben Pfaff May 7, 2021, 7:09 p.m. UTC | #3
On Fri, May 07, 2021 at 09:01:01PM +0200, Dumitru Ceara wrote:
> On 5/7/21 7:55 PM, Ben Pfaff wrote:
> > It's not very often that we get a bug fix for a 4 1/2 year old patch.  I
> > hope you're willing to backport.
> 
> Sure, no problem.  I can send backports for all relevant branches once
> this is accepted.
> 
> I think it wasn't really spotted before because there aren't so many
> users of the IDL/CS modules that use the return value of
> ovsdb_idl_set_condition()/ovsdb_cs_set_condition().
> 
> OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected
> sequence number while setting conditions.") and 58e5d32b011b
> ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight.").

Oh, I see.

If this is a bug that doesn't actually affect older versions, because
the return value wasn't used for anything that mattered, then there's no
need to backport it.

> > I thought about asking for a test but this seems really challenging to
> > test.  Distributed systems are hard.
> > 
> > I've stared at this and thought about it and it seems correct.  Here's
> > my ack:
> > Acked-by: Ben Pfaff <blp@ovn.org>
> > 
> > I have tiny comments on the code itself.  One is that any_req_cond is
> > only used in one branch of the 'if', so it could be calculated just on
> > that branch.  The other is that any_req_cond is a bool so that
> > the !! in !!any_req_cond can be removed.
> 
> Ah, I was under the (wrong) impression that there's no guarantee in the
> standard about bool having only 0 and 1 as possible values.  I'll send a
> v2 taking care of your comments.

C99 introduced bool as a 1-bit type.  Compilers only slowly updated to
add support for that, so for a long time C programmers use conditional
compilation with a fallback to a typedef to a character type, which
obviously does have more than 0 and 1 as possible values.  All the
compilers that OVS/OVN care about do support bool as a 1-bit type now,
so it's no longer anything to worry about.  We should update
coding-style.rst to reflect this; it still has the following:

    - ``bool`` and ``<stdbool.h>``, but don't assume that ``bool`` or
      ``_Bool`` can only take on the values ``0`` or ``1``, because this
      behavior can't be simulated on C89 compilers.

      Also, don't assume that a conversion to ``bool`` or ``_Bool``
      follows C99 semantics, i.e. use ``(bool) (some_value != 0)``
      rather than ``(bool) some_value``. The latter might produce
      unexpected results on non-C99 environments. For example, if
      ``bool`` is implemented as a typedef of char and ``some_value =
      0x10000000``.

However, in this case, the variable in question is only assigned true or
false, which are always defined as 1 or 0, so it works correctly even if
it has more than 1 bit.

Boy, that's a lot of fuss for being able to remove !!, maybe I should
have stayed quiet :-)
Dumitru Ceara May 7, 2021, 7:13 p.m. UTC | #4
On 5/7/21 9:09 PM, Ben Pfaff wrote:
> On Fri, May 07, 2021 at 09:01:01PM +0200, Dumitru Ceara wrote:
>> On 5/7/21 7:55 PM, Ben Pfaff wrote:
>>> It's not very often that we get a bug fix for a 4 1/2 year old patch.  I
>>> hope you're willing to backport.
>>
>> Sure, no problem.  I can send backports for all relevant branches once
>> this is accepted.
>>
>> I think it wasn't really spotted before because there aren't so many
>> users of the IDL/CS modules that use the return value of
>> ovsdb_idl_set_condition()/ovsdb_cs_set_condition().
>>
>> OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected
>> sequence number while setting conditions.") and 58e5d32b011b
>> ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight.").
> 
> Oh, I see.
> 
> If this is a bug that doesn't actually affect older versions, because
> the return value wasn't used for anything that mattered, then there's no
> need to backport it.

We're discussing whether OVN stable branches should be using OVS
submodules pointing to OVS stable branches so I think it still makes
sense to backport to the LTS and supported stable branches.

I see some OVN tests failing every now and then because of the bug this
patch fixes so it would help reducing the false positives especially on
stable branches.

> 
>>> I thought about asking for a test but this seems really challenging to
>>> test.  Distributed systems are hard.
>>>
>>> I've stared at this and thought about it and it seems correct.  Here's
>>> my ack:
>>> Acked-by: Ben Pfaff <blp@ovn.org>
>>>
>>> I have tiny comments on the code itself.  One is that any_req_cond is
>>> only used in one branch of the 'if', so it could be calculated just on
>>> that branch.  The other is that any_req_cond is a bool so that
>>> the !! in !!any_req_cond can be removed.
>>
>> Ah, I was under the (wrong) impression that there's no guarantee in the
>> standard about bool having only 0 and 1 as possible values.  I'll send a
>> v2 taking care of your comments.
> 
> C99 introduced bool as a 1-bit type.  Compilers only slowly updated to
> add support for that, so for a long time C programmers use conditional
> compilation with a fallback to a typedef to a character type, which
> obviously does have more than 0 and 1 as possible values.  All the
> compilers that OVS/OVN care about do support bool as a 1-bit type now,
> so it's no longer anything to worry about.  We should update
> coding-style.rst to reflect this; it still has the following:
> 
>     - ``bool`` and ``<stdbool.h>``, but don't assume that ``bool`` or
>       ``_Bool`` can only take on the values ``0`` or ``1``, because this
>       behavior can't be simulated on C89 compilers.
> 
>       Also, don't assume that a conversion to ``bool`` or ``_Bool``
>       follows C99 semantics, i.e. use ``(bool) (some_value != 0)``
>       rather than ``(bool) some_value``. The latter might produce
>       unexpected results on non-C99 environments. For example, if
>       ``bool`` is implemented as a typedef of char and ``some_value =
>       0x10000000``.
> 
> However, in this case, the variable in question is only assigned true or
> false, which are always defined as 1 or 0, so it works correctly even if
> it has more than 1 bit.
> 
> Boy, that's a lot of fuss for being able to remove !!, maybe I should
> have stayed quiet :-)
> 

:)
diff mbox series

Patch

diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index 6f9f912ac4..8b9b586cdd 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -903,8 +903,29 @@  ovsdb_cs_db_set_condition(struct ovsdb_cs_db *db, const char *table,
     }
 
     /* Conditions will be up to date when we receive replies for already
-     * requested and new conditions, if any. */
-    return db->cond_seqno + (t->new_cond ? 1 : 0) + (t->req_cond ? 1 : 0);
+     * requested and new conditions, if any.  This includes condition change
+     * requests for other tables too.
+     */
+    bool any_req_cond = false;
+    struct ovsdb_cs_db_table *t2;
+    HMAP_FOR_EACH (t2, hmap_node, &db->tables) {
+        if (t2->req_cond) {
+            any_req_cond = true;
+            break;
+        }
+    }
+
+    if (t->new_cond) {
+        /* New condition will be sent out after all already requested ones
+         * are acked.
+         */
+        return db->cond_seqno + !!any_req_cond + 1;
+    } else {
+        /* Already requested conditions should be up to date at
+         * db->cond_seqno + 1 while acked conditions are already up to date.
+         */
+        return db->cond_seqno + !!t->req_cond;
+    }
 }
 
 /* Sets the replication condition for 'tc' in 'cs' to 'condition' and arranges