Message ID | 1582930845-15475-1-git-send-email-dceara@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] ovsdb-idl.c: Clear conditions when clearing IDL. | expand |
On 2/29/20 12:00 AM, Dumitru Ceara wrote: > If the ovsdb-server reply to "monitor_cond_since" requests has > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls > ovsdb_idl_db_clear() which iterates through all tables and > unconditionally sets table->cond_changed to false. > > However, if the client had already set a new condition for some of the > tables, this new condition request will never be sent to ovsdb-server > until the condition is reset to a different value. This is due to the > check in ovsdb_idl_db_set_condition(). > > In order to fix this we now also call ovsdb_idl_condition_clear() for > each table condition in ovsdb_idl_db_clear(). This ensures that > resetting the condition to the same value as before the > "monitor_cond_since" reply will trigger a new request. > > One way to replicate the issue is described in the bugzilla reporting > the bug, when ovn-controller is configured to use "ovn-monitor-all": > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 > > Reported-by: Dan Williams <dcbw@redhat.com> > Reported-at: https://bugzilla.redhat.com/1808125 > CC: Han Zhou <hzhou@ovn.org> > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.") > Signed-off-by: Dumitru Ceara <dceara@redhat.com> Hi Han, I was wondering if you could review this change because ovn-kubernetes is trying to enable ovn-monitor-all for scalability reasons but their CI fails due to the bug fixed by the patch below. Thanks, Dumitru > --- > lib/ovsdb-idl.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > index 190143f..64721ca 100644 > --- a/lib/ovsdb-idl.c > +++ b/lib/ovsdb-idl.c > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) > struct ovsdb_idl_table *table = &db->tables[i]; > struct ovsdb_idl_row *row, *next_row; > > - table->cond_changed = false; > + if (table->cond_changed) { > + table->cond_changed = false; > + ovsdb_idl_condition_clear(&table->condition); > + } > + > if (hmap_is_empty(&table->rows)) { > continue; > } >
On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 2/29/20 12:00 AM, Dumitru Ceara wrote: > > If the ovsdb-server reply to "monitor_cond_since" requests has > > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls > > ovsdb_idl_db_clear() which iterates through all tables and > > unconditionally sets table->cond_changed to false. > > > > However, if the client had already set a new condition for some of the > > tables, this new condition request will never be sent to ovsdb-server > > until the condition is reset to a different value. This is due to the > > check in ovsdb_idl_db_set_condition(). > > > > In order to fix this we now also call ovsdb_idl_condition_clear() for > > each table condition in ovsdb_idl_db_clear(). This ensures that > > resetting the condition to the same value as before the > > "monitor_cond_since" reply will trigger a new request. > > > > One way to replicate the issue is described in the bugzilla reporting > > the bug, when ovn-controller is configured to use "ovn-monitor-all": > > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 > > > > Reported-by: Dan Williams <dcbw@redhat.com> > > Reported-at: https://bugzilla.redhat.com/1808125 > > CC: Han Zhou <hzhou@ovn.org> > > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.") > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > Hi Han, > > I was wondering if you could review this change because ovn-kubernetes > is trying to enable ovn-monitor-all for scalability reasons but their CI > fails due to the bug fixed by the patch below. > > Thanks, > Dumitru > > > --- > > lib/ovsdb-idl.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > > index 190143f..64721ca 100644 > > --- a/lib/ovsdb-idl.c > > +++ b/lib/ovsdb-idl.c > > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) > > struct ovsdb_idl_table *table = &db->tables[i]; > > struct ovsdb_idl_row *row, *next_row; > > > > - table->cond_changed = false; > > + if (table->cond_changed) { > > + table->cond_changed = false; > > + ovsdb_idl_condition_clear(&table->condition); > > + } > > + > > if (hmap_is_empty(&table->rows)) { > > continue; > > } > > > Hi Dumitru, If I understand the problem, it is when the client updated monitor condition and at the same time it received monitor_cond_since reply from server for it's previous conditional monitor request, and the handling of the reply overwrite the flag table->cond_changed from true to false, and because of this, the new condition is not sent to server. However, I don't understand the fix. It seems that the fix should be "don't overwrite the flag", instead of clearing the conditions as well. Also, by simply looking at this code I don't understand why the flag needs to be set to false when clearing data in IDL. What would be the problem if we keep it as what it is? If you understand the details, could you explain more about the fix? In addition, I didn't change this logic when implementing monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset."). In that commit it only reduced the chance of calling ovsdb_idl_db_clear() as an optimization. Before that commit, ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and it would result in same problem. Thanks, Han
CC-ing Andy and Ben. On 3/3/20 7:43 AM, Han Zhou wrote: > > > On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com > <mailto:dceara@redhat.com>> wrote: >> >> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >> > If the ovsdb-server reply to "monitor_cond_since" requests has >> > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >> > ovsdb_idl_db_clear() which iterates through all tables and >> > unconditionally sets table->cond_changed to false. >> > >> > However, if the client had already set a new condition for some of the >> > tables, this new condition request will never be sent to ovsdb-server >> > until the condition is reset to a different value. This is due to the >> > check in ovsdb_idl_db_set_condition(). >> > >> > In order to fix this we now also call ovsdb_idl_condition_clear() for >> > each table condition in ovsdb_idl_db_clear(). This ensures that >> > resetting the condition to the same value as before the >> > "monitor_cond_since" reply will trigger a new request. >> > >> > One way to replicate the issue is described in the bugzilla reporting >> > the bug, when ovn-controller is configured to use "ovn-monitor-all": >> > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >> > >> > Reported-by: Dan Williams <dcbw@redhat.com <mailto:dcbw@redhat.com>> >> > Reported-at: https://bugzilla.redhat.com/1808125 >> > CC: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> >> > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when > connection reset.") >> > Signed-off-by: Dumitru Ceara <dceara@redhat.com > <mailto:dceara@redhat.com>> >> >> Hi Han, >> >> I was wondering if you could review this change because ovn-kubernetes >> is trying to enable ovn-monitor-all for scalability reasons but their CI >> fails due to the bug fixed by the patch below. >> >> Thanks, >> Dumitru >> >> > --- >> > lib/ovsdb-idl.c | 6 +++++- >> > 1 file changed, 5 insertions(+), 1 deletion(-) >> > >> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >> > index 190143f..64721ca 100644 >> > --- a/lib/ovsdb-idl.c >> > +++ b/lib/ovsdb-idl.c >> > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >> > struct ovsdb_idl_table *table = &db->tables[i]; >> > struct ovsdb_idl_row *row, *next_row; >> > >> > - table->cond_changed = false; >> > + if (table->cond_changed) { >> > + table->cond_changed = false; >> > + ovsdb_idl_condition_clear(&table->condition); >> > + } >> > + >> > if (hmap_is_empty(&table->rows)) { >> > continue; >> > } >> > >> > > Hi Dumitru> > If I understand the problem, it is when the client updated monitor > condition and at the same time it received monitor_cond_since reply from > server for it's previous conditional monitor request, and the handling > of the reply overwrite the flag table->cond_changed from true to false, > and because of this, the new condition is not sent to server. > Hi Han, Yes, that's what's happening. > However, I don't understand the fix. It seems that the fix should be > "don't overwrite the flag", instead of clearing the conditions as well. > Also, by simply looking at this code I don't understand why the flag > needs to be set to false when clearing data in IDL. What would be the > problem if we keep it as what it is? If you understand the details, > could you explain more about the fix? > > In addition, I didn't change this logic when implementing > monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server > when connection reset."). In that commit it only reduced the chance of > calling ovsdb_idl_db_clear() as an optimization. Before that commit, > ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and > it would result in same problem. You're right, my bad, I messed up the "git blame", sorry about that. It was actually introduced by: https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a I'll fix up the commit message in V2 once we decide on the best approach. Hi Andy, Ben, As far as I understand, whenever ovsdb_idl_db_clear() is called during reconnect, any buffered but unsent conditions should be cleared because they will be anyway resent with the following monitoring condition update message after reconnect has completed. However, clearing the "table->cond_changed" flag is not enough because when the IDL client (ovn-controller in my case) will try to set the same condition as before (e.g., "true" if ovn-monitor-all=true) ovsdb_idl_db_set_condition() will return early: https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 Therefore in the cases when ovsdb_idl_db_clear() is called, we should also clear the existing conditions on the DB tables. As Han suggested, we could also revert the changes from 5351980b047f ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") and we wouldn't hit the issue anymore. I'm not sure however if this has any other major implications. What do you think? Thanks, Dumitru > > Thanks, > Han
On Tue, Mar 03, 2020 at 01:47:48PM +0100, Dumitru Ceara wrote: > CC-ing Andy and Ben. > > On 3/3/20 7:43 AM, Han Zhou wrote: > > > > > > On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com > > <mailto:dceara@redhat.com>> wrote: > >> > >> On 2/29/20 12:00 AM, Dumitru Ceara wrote: > >> > If the ovsdb-server reply to "monitor_cond_since" requests has > >> > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls > >> > ovsdb_idl_db_clear() which iterates through all tables and > >> > unconditionally sets table->cond_changed to false. > >> > > >> > However, if the client had already set a new condition for some of the > >> > tables, this new condition request will never be sent to ovsdb-server > >> > until the condition is reset to a different value. This is due to the > >> > check in ovsdb_idl_db_set_condition(). > >> > > >> > In order to fix this we now also call ovsdb_idl_condition_clear() for > >> > each table condition in ovsdb_idl_db_clear(). This ensures that > >> > resetting the condition to the same value as before the > >> > "monitor_cond_since" reply will trigger a new request. > >> > > >> > One way to replicate the issue is described in the bugzilla reporting > >> > the bug, when ovn-controller is configured to use "ovn-monitor-all": > >> > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 > >> > > >> > Reported-by: Dan Williams <dcbw@redhat.com <mailto:dcbw@redhat.com>> > >> > Reported-at: https://bugzilla.redhat.com/1808125 > >> > CC: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> > >> > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when > > connection reset.") > >> > Signed-off-by: Dumitru Ceara <dceara@redhat.com > > <mailto:dceara@redhat.com>> > >> > >> Hi Han, > >> > >> I was wondering if you could review this change because ovn-kubernetes > >> is trying to enable ovn-monitor-all for scalability reasons but their CI > >> fails due to the bug fixed by the patch below. > >> > >> Thanks, > >> Dumitru > >> > >> > --- > >> > lib/ovsdb-idl.c | 6 +++++- > >> > 1 file changed, 5 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > >> > index 190143f..64721ca 100644 > >> > --- a/lib/ovsdb-idl.c > >> > +++ b/lib/ovsdb-idl.c > >> > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) > >> > struct ovsdb_idl_table *table = &db->tables[i]; > >> > struct ovsdb_idl_row *row, *next_row; > >> > > >> > - table->cond_changed = false; > >> > + if (table->cond_changed) { > >> > + table->cond_changed = false; > >> > + ovsdb_idl_condition_clear(&table->condition); > >> > + } > >> > + > >> > if (hmap_is_empty(&table->rows)) { > >> > continue; > >> > } > >> > > >> > > > > Hi Dumitru> > > If I understand the problem, it is when the client updated monitor > > condition and at the same time it received monitor_cond_since reply from > > server for it's previous conditional monitor request, and the handling > > of the reply overwrite the flag table->cond_changed from true to false, > > and because of this, the new condition is not sent to server. > > > Hi Han, > > Yes, that's what's happening. > > > However, I don't understand the fix. It seems that the fix should be > > "don't overwrite the flag", instead of clearing the conditions as well. > > Also, by simply looking at this code I don't understand why the flag > > needs to be set to false when clearing data in IDL. What would be the > > problem if we keep it as what it is? If you understand the details, > > could you explain more about the fix? > > > > In addition, I didn't change this logic when implementing > > monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server > > when connection reset."). In that commit it only reduced the chance of > > calling ovsdb_idl_db_clear() as an optimization. Before that commit, > > ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and > > it would result in same problem. > You're right, my bad, I messed up the "git blame", sorry about that. It > was actually introduced by: > > https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a > > I'll fix up the commit message in V2 once we decide on the best approach. > > Hi Andy, Ben, > > As far as I understand, whenever ovsdb_idl_db_clear() is called during > reconnect, any buffered but unsent conditions should be cleared because > they will be anyway resent with the following monitoring condition > update message after reconnect has completed. > > However, clearing the "table->cond_changed" flag is not enough because > when the IDL client (ovn-controller in my case) will try to set the same > condition as before (e.g., "true" if ovn-monitor-all=true) > ovsdb_idl_db_set_condition() will return early: > > https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 > > Therefore in the cases when ovsdb_idl_db_clear() is called, we should > also clear the existing conditions on the DB tables. > > As Han suggested, we could also revert the changes from 5351980b047f > ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") > and we wouldn't hit the issue anymore. I'm not sure however if this has > any other major implications. > > What do you think? I honestly don't understand what's going on here well enough at the moment. There's a lot of context. If necessary, I can try to untangle it, but I'd rather leave it to the two of you if you can figure it out.
On 3/5/20 9:50 PM, Ben Pfaff wrote: > On Tue, Mar 03, 2020 at 01:47:48PM +0100, Dumitru Ceara wrote: >> CC-ing Andy and Ben. >> >> On 3/3/20 7:43 AM, Han Zhou wrote: >>> >>> >>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com >>> <mailto:dceara@redhat.com>> wrote: >>>> >>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >>>>> If the ovsdb-server reply to "monitor_cond_since" requests has >>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >>>>> ovsdb_idl_db_clear() which iterates through all tables and >>>>> unconditionally sets table->cond_changed to false. >>>>> >>>>> However, if the client had already set a new condition for some of the >>>>> tables, this new condition request will never be sent to ovsdb-server >>>>> until the condition is reset to a different value. This is due to the >>>>> check in ovsdb_idl_db_set_condition(). >>>>> >>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for >>>>> each table condition in ovsdb_idl_db_clear(). This ensures that >>>>> resetting the condition to the same value as before the >>>>> "monitor_cond_since" reply will trigger a new request. >>>>> >>>>> One way to replicate the issue is described in the bugzilla reporting >>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all": >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >>>>> >>>>> Reported-by: Dan Williams <dcbw@redhat.com <mailto:dcbw@redhat.com>> >>>>> Reported-at: https://bugzilla.redhat.com/1808125 >>>>> CC: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> >>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when >>> connection reset.") >>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com >>> <mailto:dceara@redhat.com>> >>>> >>>> Hi Han, >>>> >>>> I was wondering if you could review this change because ovn-kubernetes >>>> is trying to enable ovn-monitor-all for scalability reasons but their CI >>>> fails due to the bug fixed by the patch below. >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>> --- >>>>> lib/ovsdb-idl.c | 6 +++++- >>>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >>>>> index 190143f..64721ca 100644 >>>>> --- a/lib/ovsdb-idl.c >>>>> +++ b/lib/ovsdb-idl.c >>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >>>>> struct ovsdb_idl_table *table = &db->tables[i]; >>>>> struct ovsdb_idl_row *row, *next_row; >>>>> >>>>> - table->cond_changed = false; >>>>> + if (table->cond_changed) { >>>>> + table->cond_changed = false; >>>>> + ovsdb_idl_condition_clear(&table->condition); >>>>> + } >>>>> + >>>>> if (hmap_is_empty(&table->rows)) { >>>>> continue; >>>>> } >>>>> >>>> >>> >>> Hi Dumitru> >>> If I understand the problem, it is when the client updated monitor >>> condition and at the same time it received monitor_cond_since reply from >>> server for it's previous conditional monitor request, and the handling >>> of the reply overwrite the flag table->cond_changed from true to false, >>> and because of this, the new condition is not sent to server. >>> >> Hi Han, >> >> Yes, that's what's happening. >> >>> However, I don't understand the fix. It seems that the fix should be >>> "don't overwrite the flag", instead of clearing the conditions as well. >>> Also, by simply looking at this code I don't understand why the flag >>> needs to be set to false when clearing data in IDL. What would be the >>> problem if we keep it as what it is? If you understand the details, >>> could you explain more about the fix? >>> >>> In addition, I didn't change this logic when implementing >>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server >>> when connection reset."). In that commit it only reduced the chance of >>> calling ovsdb_idl_db_clear() as an optimization. Before that commit, >>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and >>> it would result in same problem. >> You're right, my bad, I messed up the "git blame", sorry about that. It >> was actually introduced by: >> >> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a >> >> I'll fix up the commit message in V2 once we decide on the best approach. >> >> Hi Andy, Ben, >> >> As far as I understand, whenever ovsdb_idl_db_clear() is called during >> reconnect, any buffered but unsent conditions should be cleared because >> they will be anyway resent with the following monitoring condition >> update message after reconnect has completed. >> >> However, clearing the "table->cond_changed" flag is not enough because >> when the IDL client (ovn-controller in my case) will try to set the same >> condition as before (e.g., "true" if ovn-monitor-all=true) >> ovsdb_idl_db_set_condition() will return early: >> >> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 >> >> Therefore in the cases when ovsdb_idl_db_clear() is called, we should >> also clear the existing conditions on the DB tables. >> >> As Han suggested, we could also revert the changes from 5351980b047f >> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") >> and we wouldn't hit the issue anymore. I'm not sure however if this has >> any other major implications. >> >> What do you think? > > I honestly don't understand what's going on here well enough at the > moment. There's a lot of context. If necessary, I can try to untangle > it, but I'd rather leave it to the two of you if you can figure it out. > Hi Han, I just sent a v2 of the patch. The code changes are the same as in v1, just that I updated the "Fixes" tag to reflect the correct commit that introduced the issue: https://patchwork.ozlabs.org/patch/1252278/ It would be great if you could help review this new version of the patch. Thanks, Dumitru
On 3/3/20 1:47 PM, Dumitru Ceara wrote: > CC-ing Andy and Ben. > > On 3/3/20 7:43 AM, Han Zhou wrote: >> >> >> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com >> <mailto:dceara@redhat.com>> wrote: >>> >>> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >>>> If the ovsdb-server reply to "monitor_cond_since" requests has >>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >>>> ovsdb_idl_db_clear() which iterates through all tables and >>>> unconditionally sets table->cond_changed to false. >>>> >>>> However, if the client had already set a new condition for some of the >>>> tables, this new condition request will never be sent to ovsdb-server >>>> until the condition is reset to a different value. This is due to the >>>> check in ovsdb_idl_db_set_condition(). >>>> >>>> In order to fix this we now also call ovsdb_idl_condition_clear() for >>>> each table condition in ovsdb_idl_db_clear(). This ensures that >>>> resetting the condition to the same value as before the >>>> "monitor_cond_since" reply will trigger a new request. >>>> >>>> One way to replicate the issue is described in the bugzilla reporting >>>> the bug, when ovn-controller is configured to use "ovn-monitor-all": >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >>>> >>>> Reported-by: Dan Williams <dcbw@redhat.com <mailto:dcbw@redhat.com>> >>>> Reported-at: https://bugzilla.redhat.com/1808125 >>>> CC: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> >>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when >> connection reset.") >>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com >> <mailto:dceara@redhat.com>> >>> >>> Hi Han, >>> >>> I was wondering if you could review this change because ovn-kubernetes >>> is trying to enable ovn-monitor-all for scalability reasons but their CI >>> fails due to the bug fixed by the patch below. >>> >>> Thanks, >>> Dumitru >>> >>>> --- >>>> lib/ovsdb-idl.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >>>> index 190143f..64721ca 100644 >>>> --- a/lib/ovsdb-idl.c >>>> +++ b/lib/ovsdb-idl.c >>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >>>> struct ovsdb_idl_table *table = &db->tables[i]; >>>> struct ovsdb_idl_row *row, *next_row; >>>> >>>> - table->cond_changed = false; >>>> + if (table->cond_changed) { >>>> + table->cond_changed = false; >>>> + ovsdb_idl_condition_clear(&table->condition); >>>> + } >>>> + >>>> if (hmap_is_empty(&table->rows)) { >>>> continue; >>>> } >>>> >>> >> >> Hi Dumitru> >> If I understand the problem, it is when the client updated monitor >> condition and at the same time it received monitor_cond_since reply from >> server for it's previous conditional monitor request, and the handling >> of the reply overwrite the flag table->cond_changed from true to false, >> and because of this, the new condition is not sent to server. >> > Hi Han, > > Yes, that's what's happening. > >> However, I don't understand the fix. It seems that the fix should be >> "don't overwrite the flag", instead of clearing the conditions as well. >> Also, by simply looking at this code I don't understand why the flag >> needs to be set to false when clearing data in IDL. What would be the >> problem if we keep it as what it is? If you understand the details, >> could you explain more about the fix? >> >> In addition, I didn't change this logic when implementing >> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server >> when connection reset."). In that commit it only reduced the chance of >> calling ovsdb_idl_db_clear() as an optimization. Before that commit, >> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and >> it would result in same problem. > You're right, my bad, I messed up the "git blame", sorry about that. It > was actually introduced by: > > https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a > > I'll fix up the commit message in V2 once we decide on the best approach. > > Hi Andy, Ben, > > As far as I understand, whenever ovsdb_idl_db_clear() is called during > reconnect, any buffered but unsent conditions should be cleared because > they will be anyway resent with the following monitoring condition > update message after reconnect has completed. > > However, clearing the "table->cond_changed" flag is not enough because > when the IDL client (ovn-controller in my case) will try to set the same > condition as before (e.g., "true" if ovn-monitor-all=true) > ovsdb_idl_db_set_condition() will return early: > > https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 > > Therefore in the cases when ovsdb_idl_db_clear() is called, we should > also clear the existing conditions on the DB tables. > > As Han suggested, we could also revert the changes from 5351980b047f > ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") > and we wouldn't hit the issue anymore. I'm not sure however if this has > any other major implications. If we're clearing existing conditions, we will send empty conditions in initial monitor_cond after reconnect, AFAIU, i.e. application will stop receiving any updates. I think that breaks the logic that was behind commit 5351980b047f. The application that uses idl library will have to set up conditions from scratch in order to avoid logic changes. I think that reconnection process should not affect application anyhow and we can't force application to track all the reconnections and reconfigure conditions. So, I think that it's better to revert commit 5351980b047f instead, because otherwise we will break its logic anyway. And, in the end, this was just a performance optimization. Also, after commit 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset."), ovsdb_idl_db_parse_monitor_reply() doesn't call ovsdb_idl_db_clear() in most cases, i.e. the performance optimization of sending same conditions twice might be less important. Best regards, Ilya Maximets.
On 3/25/20 7:21 PM, Ilya Maximets wrote: > On 3/3/20 1:47 PM, Dumitru Ceara wrote: >> CC-ing Andy and Ben. >> >> On 3/3/20 7:43 AM, Han Zhou wrote: >>> >>> >>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara@redhat.com >>> <mailto:dceara@redhat.com>> wrote: >>>> >>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote: >>>>> If the ovsdb-server reply to "monitor_cond_since" requests has >>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls >>>>> ovsdb_idl_db_clear() which iterates through all tables and >>>>> unconditionally sets table->cond_changed to false. >>>>> >>>>> However, if the client had already set a new condition for some of the >>>>> tables, this new condition request will never be sent to ovsdb-server >>>>> until the condition is reset to a different value. This is due to the >>>>> check in ovsdb_idl_db_set_condition(). >>>>> >>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for >>>>> each table condition in ovsdb_idl_db_clear(). This ensures that >>>>> resetting the condition to the same value as before the >>>>> "monitor_cond_since" reply will trigger a new request. >>>>> >>>>> One way to replicate the issue is described in the bugzilla reporting >>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all": >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 >>>>> >>>>> Reported-by: Dan Williams <dcbw@redhat.com <mailto:dcbw@redhat.com>> >>>>> Reported-at: https://bugzilla.redhat.com/1808125 >>>>> CC: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> >>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when >>> connection reset.") >>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com >>> <mailto:dceara@redhat.com>> >>>> >>>> Hi Han, >>>> >>>> I was wondering if you could review this change because ovn-kubernetes >>>> is trying to enable ovn-monitor-all for scalability reasons but their CI >>>> fails due to the bug fixed by the patch below. >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>> --- >>>>> lib/ovsdb-idl.c | 6 +++++- >>>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >>>>> index 190143f..64721ca 100644 >>>>> --- a/lib/ovsdb-idl.c >>>>> +++ b/lib/ovsdb-idl.c >>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) >>>>> struct ovsdb_idl_table *table = &db->tables[i]; >>>>> struct ovsdb_idl_row *row, *next_row; >>>>> >>>>> - table->cond_changed = false; >>>>> + if (table->cond_changed) { >>>>> + table->cond_changed = false; >>>>> + ovsdb_idl_condition_clear(&table->condition); >>>>> + } >>>>> + >>>>> if (hmap_is_empty(&table->rows)) { >>>>> continue; >>>>> } >>>>> >>>> >>> >>> Hi Dumitru> >>> If I understand the problem, it is when the client updated monitor >>> condition and at the same time it received monitor_cond_since reply from >>> server for it's previous conditional monitor request, and the handling >>> of the reply overwrite the flag table->cond_changed from true to false, >>> and because of this, the new condition is not sent to server. >>> >> Hi Han, >> >> Yes, that's what's happening. >> >>> However, I don't understand the fix. It seems that the fix should be >>> "don't overwrite the flag", instead of clearing the conditions as well. >>> Also, by simply looking at this code I don't understand why the flag >>> needs to be set to false when clearing data in IDL. What would be the >>> problem if we keep it as what it is? If you understand the details, >>> could you explain more about the fix? >>> >>> In addition, I didn't change this logic when implementing >>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server >>> when connection reset."). In that commit it only reduced the chance of >>> calling ovsdb_idl_db_clear() as an optimization. Before that commit, >>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and >>> it would result in same problem. >> You're right, my bad, I messed up the "git blame", sorry about that. It >> was actually introduced by: >> >> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a >> >> I'll fix up the commit message in V2 once we decide on the best approach. >> >> Hi Andy, Ben, >> >> As far as I understand, whenever ovsdb_idl_db_clear() is called during >> reconnect, any buffered but unsent conditions should be cleared because >> they will be anyway resent with the following monitoring condition >> update message after reconnect has completed. >> >> However, clearing the "table->cond_changed" flag is not enough because >> when the IDL client (ovn-controller in my case) will try to set the same >> condition as before (e.g., "true" if ovn-monitor-all=true) >> ovsdb_idl_db_set_condition() will return early: >> >> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518 >> >> Therefore in the cases when ovsdb_idl_db_clear() is called, we should >> also clear the existing conditions on the DB tables. >> >> As Han suggested, we could also revert the changes from 5351980b047f >> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates") >> and we wouldn't hit the issue anymore. I'm not sure however if this has >> any other major implications. > Hi Ilya, Thanks for the review. > If we're clearing existing conditions, we will send empty conditions in > initial monitor_cond after reconnect, AFAIU, i.e. application will stop > receiving any updates. I think that breaks the logic that was behind commit > 5351980b047f. The application that uses idl library will have to set up > conditions from scratch in order to avoid logic changes. I think that > reconnection process should not affect application anyhow and we can't force > application to track all the reconnections and reconfigure conditions. > Ok, makes sense. I was looking at ovn-controller as the standard application that uses the idl library and ovn-controller tracks reconnects and also reconfigures conditions at every iteration (also when reconnects happen). But you're right, this change would force all idl user apps to do the same thing. > So, I think that it's better to revert commit 5351980b047f instead, because > otherwise we will break its logic anyway. And, in the end, this was just a > performance optimization. > Also, after commit 403a6a0cb003 ("ovsdb-idl: Fast resync from server when > connection reset."), ovsdb_idl_db_parse_monitor_reply() doesn't call > ovsdb_idl_db_clear() in most cases, i.e. the performance optimization of > sending same conditions twice might be less important. > Ack, I'll send a v3 that reverts 5351980b047f. Thanks, Dumitru > Best regards, Ilya Maximets. >
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 190143f..64721ca 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db) struct ovsdb_idl_table *table = &db->tables[i]; struct ovsdb_idl_row *row, *next_row; - table->cond_changed = false; + if (table->cond_changed) { + table->cond_changed = false; + ovsdb_idl_condition_clear(&table->condition); + } + if (hmap_is_empty(&table->rows)) { continue; }
If the ovsdb-server reply to "monitor_cond_since" requests has "found" == false then ovsdb_idl_db_parse_monitor_reply() calls ovsdb_idl_db_clear() which iterates through all tables and unconditionally sets table->cond_changed to false. However, if the client had already set a new condition for some of the tables, this new condition request will never be sent to ovsdb-server until the condition is reset to a different value. This is due to the check in ovsdb_idl_db_set_condition(). In order to fix this we now also call ovsdb_idl_condition_clear() for each table condition in ovsdb_idl_db_clear(). This ensures that resetting the condition to the same value as before the "monitor_cond_since" reply will trigger a new request. One way to replicate the issue is described in the bugzilla reporting the bug, when ovn-controller is configured to use "ovn-monitor-all": https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6 Reported-by: Dan Williams <dcbw@redhat.com> Reported-at: https://bugzilla.redhat.com/1808125 CC: Han Zhou <hzhou@ovn.org> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- lib/ovsdb-idl.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)