Message ID | 20210507164706.10013-1-dceara@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] ovsdb-cs: Consider all tables when computing expected cond seqno. | expand |
On Fri, May 07, 2021 at 06:47:06PM +0200, Dumitru Ceara wrote: > In ovsdb_cs_db_set_condition(), take into account all pending condition > changes for all tables when computing the db->cond_seqno at which the > monitor is expected to be updated. > > In the following scenario, with two tables, A and B, the old code > performed the following steps: > 1. Initial db->cond_seqno = X. > 2. Client changes condition for table A: > - A->new_cond gets set > - expected cond seqno returned to the client: X + 1 > 3. ovsdb-cs sends the monitor_cond_change for table A > - A->req_cond <- A->new_cond > 4. Client changes condition for table B: > - B->new_cond gets set > - expected cond seqno returned to the client: X + 1 > - however, because the conditon change at step 3 is still not replied > to, table B's monitor_cond_change request is not sent yet. > 5. ovsdb-cs receives the reply for the condition change at step 3: > - db->cond_seqno <- X + 1 > 6. ovsdb-cs sends the monitor_cond_change for table B > 7. ovsdb-cs receives the reply for the condition change at step 6: > - db->cond_seqno <- X + 2 > > The client was incorrectly informed that it will have all relevant > updates for table B at seqno X + 1 while actually that happens later, at > seqno X + 2. > > Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API") > Signed-off-by: Dumitru Ceara <dceara@redhat.com> It's not very often that we get a bug fix for a 4 1/2 year old patch. I hope you're willing to backport. I thought about asking for a test but this seems really challenging to test. Distributed systems are hard. I've stared at this and thought about it and it seems correct. Here's my ack: Acked-by: Ben Pfaff <blp@ovn.org> I have tiny comments on the code itself. One is that any_req_cond is only used in one branch of the 'if', so it could be calculated just on that branch. The other is that any_req_cond is a bool so that the !! in !!any_req_cond can be removed.
On 5/7/21 7:55 PM, Ben Pfaff wrote: > On Fri, May 07, 2021 at 06:47:06PM +0200, Dumitru Ceara wrote: >> In ovsdb_cs_db_set_condition(), take into account all pending condition >> changes for all tables when computing the db->cond_seqno at which the >> monitor is expected to be updated. >> >> In the following scenario, with two tables, A and B, the old code >> performed the following steps: >> 1. Initial db->cond_seqno = X. >> 2. Client changes condition for table A: >> - A->new_cond gets set >> - expected cond seqno returned to the client: X + 1 >> 3. ovsdb-cs sends the monitor_cond_change for table A >> - A->req_cond <- A->new_cond >> 4. Client changes condition for table B: >> - B->new_cond gets set >> - expected cond seqno returned to the client: X + 1 >> - however, because the conditon change at step 3 is still not replied >> to, table B's monitor_cond_change request is not sent yet. >> 5. ovsdb-cs receives the reply for the condition change at step 3: >> - db->cond_seqno <- X + 1 >> 6. ovsdb-cs sends the monitor_cond_change for table B >> 7. ovsdb-cs receives the reply for the condition change at step 6: >> - db->cond_seqno <- X + 2 >> >> The client was incorrectly informed that it will have all relevant >> updates for table B at seqno X + 1 while actually that happens later, at >> seqno X + 2. >> >> Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API") >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > It's not very often that we get a bug fix for a 4 1/2 year old patch. I > hope you're willing to backport. Sure, no problem. I can send backports for all relevant branches once this is accepted. I think it wasn't really spotted before because there aren't so many users of the IDL/CS modules that use the return value of ovsdb_idl_set_condition()/ovsdb_cs_set_condition(). OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected sequence number while setting conditions.") and 58e5d32b011b ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight."). > > I thought about asking for a test but this seems really challenging to > test. Distributed systems are hard. > > I've stared at this and thought about it and it seems correct. Here's > my ack: > Acked-by: Ben Pfaff <blp@ovn.org> > > I have tiny comments on the code itself. One is that any_req_cond is > only used in one branch of the 'if', so it could be calculated just on > that branch. The other is that any_req_cond is a bool so that > the !! in !!any_req_cond can be removed. Ah, I was under the (wrong) impression that there's no guarantee in the standard about bool having only 0 and 1 as possible values. I'll send a v2 taking care of your comments. Thanks for the review! Regards, Dumitru
On Fri, May 07, 2021 at 09:01:01PM +0200, Dumitru Ceara wrote: > On 5/7/21 7:55 PM, Ben Pfaff wrote: > > It's not very often that we get a bug fix for a 4 1/2 year old patch. I > > hope you're willing to backport. > > Sure, no problem. I can send backports for all relevant branches once > this is accepted. > > I think it wasn't really spotted before because there aren't so many > users of the IDL/CS modules that use the return value of > ovsdb_idl_set_condition()/ovsdb_cs_set_condition(). > > OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected > sequence number while setting conditions.") and 58e5d32b011b > ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight."). Oh, I see. If this is a bug that doesn't actually affect older versions, because the return value wasn't used for anything that mattered, then there's no need to backport it. > > I thought about asking for a test but this seems really challenging to > > test. Distributed systems are hard. > > > > I've stared at this and thought about it and it seems correct. Here's > > my ack: > > Acked-by: Ben Pfaff <blp@ovn.org> > > > > I have tiny comments on the code itself. One is that any_req_cond is > > only used in one branch of the 'if', so it could be calculated just on > > that branch. The other is that any_req_cond is a bool so that > > the !! in !!any_req_cond can be removed. > > Ah, I was under the (wrong) impression that there's no guarantee in the > standard about bool having only 0 and 1 as possible values. I'll send a > v2 taking care of your comments. C99 introduced bool as a 1-bit type. Compilers only slowly updated to add support for that, so for a long time C programmers use conditional compilation with a fallback to a typedef to a character type, which obviously does have more than 0 and 1 as possible values. All the compilers that OVS/OVN care about do support bool as a 1-bit type now, so it's no longer anything to worry about. We should update coding-style.rst to reflect this; it still has the following: - ``bool`` and ``<stdbool.h>``, but don't assume that ``bool`` or ``_Bool`` can only take on the values ``0`` or ``1``, because this behavior can't be simulated on C89 compilers. Also, don't assume that a conversion to ``bool`` or ``_Bool`` follows C99 semantics, i.e. use ``(bool) (some_value != 0)`` rather than ``(bool) some_value``. The latter might produce unexpected results on non-C99 environments. For example, if ``bool`` is implemented as a typedef of char and ``some_value = 0x10000000``. However, in this case, the variable in question is only assigned true or false, which are always defined as 1 or 0, so it works correctly even if it has more than 1 bit. Boy, that's a lot of fuss for being able to remove !!, maybe I should have stayed quiet :-)
On 5/7/21 9:09 PM, Ben Pfaff wrote: > On Fri, May 07, 2021 at 09:01:01PM +0200, Dumitru Ceara wrote: >> On 5/7/21 7:55 PM, Ben Pfaff wrote: >>> It's not very often that we get a bug fix for a 4 1/2 year old patch. I >>> hope you're willing to backport. >> >> Sure, no problem. I can send backports for all relevant branches once >> this is accepted. >> >> I think it wasn't really spotted before because there aren't so many >> users of the IDL/CS modules that use the return value of >> ovsdb_idl_set_condition()/ovsdb_cs_set_condition(). >> >> OVN started doing that since be4b7719dc0d ("ovsdb-idlc: Return expected >> sequence number while setting conditions.") and 58e5d32b011b >> ("ovn-controller: Fix nb_cfg update with monitor_cond_change in flight."). > > Oh, I see. > > If this is a bug that doesn't actually affect older versions, because > the return value wasn't used for anything that mattered, then there's no > need to backport it. We're discussing whether OVN stable branches should be using OVS submodules pointing to OVS stable branches so I think it still makes sense to backport to the LTS and supported stable branches. I see some OVN tests failing every now and then because of the bug this patch fixes so it would help reducing the false positives especially on stable branches. > >>> I thought about asking for a test but this seems really challenging to >>> test. Distributed systems are hard. >>> >>> I've stared at this and thought about it and it seems correct. Here's >>> my ack: >>> Acked-by: Ben Pfaff <blp@ovn.org> >>> >>> I have tiny comments on the code itself. One is that any_req_cond is >>> only used in one branch of the 'if', so it could be calculated just on >>> that branch. The other is that any_req_cond is a bool so that >>> the !! in !!any_req_cond can be removed. >> >> Ah, I was under the (wrong) impression that there's no guarantee in the >> standard about bool having only 0 and 1 as possible values. I'll send a >> v2 taking care of your comments. > > C99 introduced bool as a 1-bit type. Compilers only slowly updated to > add support for that, so for a long time C programmers use conditional > compilation with a fallback to a typedef to a character type, which > obviously does have more than 0 and 1 as possible values. All the > compilers that OVS/OVN care about do support bool as a 1-bit type now, > so it's no longer anything to worry about. We should update > coding-style.rst to reflect this; it still has the following: > > - ``bool`` and ``<stdbool.h>``, but don't assume that ``bool`` or > ``_Bool`` can only take on the values ``0`` or ``1``, because this > behavior can't be simulated on C89 compilers. > > Also, don't assume that a conversion to ``bool`` or ``_Bool`` > follows C99 semantics, i.e. use ``(bool) (some_value != 0)`` > rather than ``(bool) some_value``. The latter might produce > unexpected results on non-C99 environments. For example, if > ``bool`` is implemented as a typedef of char and ``some_value = > 0x10000000``. > > However, in this case, the variable in question is only assigned true or > false, which are always defined as 1 or 0, so it works correctly even if > it has more than 1 bit. > > Boy, that's a lot of fuss for being able to remove !!, maybe I should > have stayed quiet :-) > :)
diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c index 6f9f912ac4..8b9b586cdd 100644 --- a/lib/ovsdb-cs.c +++ b/lib/ovsdb-cs.c @@ -903,8 +903,29 @@ ovsdb_cs_db_set_condition(struct ovsdb_cs_db *db, const char *table, } /* Conditions will be up to date when we receive replies for already - * requested and new conditions, if any. */ - return db->cond_seqno + (t->new_cond ? 1 : 0) + (t->req_cond ? 1 : 0); + * requested and new conditions, if any. This includes condition change + * requests for other tables too. + */ + bool any_req_cond = false; + struct ovsdb_cs_db_table *t2; + HMAP_FOR_EACH (t2, hmap_node, &db->tables) { + if (t2->req_cond) { + any_req_cond = true; + break; + } + } + + if (t->new_cond) { + /* New condition will be sent out after all already requested ones + * are acked. + */ + return db->cond_seqno + !!any_req_cond + 1; + } else { + /* Already requested conditions should be up to date at + * db->cond_seqno + 1 while acked conditions are already up to date. + */ + return db->cond_seqno + !!t->req_cond; + } } /* Sets the replication condition for 'tc' in 'cs' to 'condition' and arranges
In ovsdb_cs_db_set_condition(), take into account all pending condition changes for all tables when computing the db->cond_seqno at which the monitor is expected to be updated. In the following scenario, with two tables, A and B, the old code performed the following steps: 1. Initial db->cond_seqno = X. 2. Client changes condition for table A: - A->new_cond gets set - expected cond seqno returned to the client: X + 1 3. ovsdb-cs sends the monitor_cond_change for table A - A->req_cond <- A->new_cond 4. Client changes condition for table B: - B->new_cond gets set - expected cond seqno returned to the client: X + 1 - however, because the conditon change at step 3 is still not replied to, table B's monitor_cond_change request is not sent yet. 5. ovsdb-cs receives the reply for the condition change at step 3: - db->cond_seqno <- X + 1 6. ovsdb-cs sends the monitor_cond_change for table B 7. ovsdb-cs receives the reply for the condition change at step 6: - db->cond_seqno <- X + 2 The client was incorrectly informed that it will have all relevant updates for table B at seqno X + 1 while actually that happens later, at seqno X + 2. Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API") Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- lib/ovsdb-cs.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)