Message ID | 20220415142146.13169-1-dceara@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,v2] ovsdb-idl: Support change tracking of write-only columns. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> wrote: > > At a first glance, change tracking should never be allowed for > write-only columns. However, some clients (e.g., ovn-northd) that are > mostly exclusive writers of a database, use change tracking to avoid > duplicating the IDL row records into a local cache when implementing > incremental processing. Hi Dumitru, It is still not clear to me why ovn-northd would need change-tracking for the write-only columns. It didn't require change tracking before using the incremental processing engine. In theory, to my understanding, change-tracking is required only if the column is an input to some of the engine nodes, so that it needs to be alerted when there is a change in the database. If the column is write-only to ovn-northd, it means it doesn't care about any change of the column from the DB, but blindly writes to the column regardless of the current value. I checked the patch [0], which mentioned that Such nodes are primary inputs to the northd incremental processing engine and without proper update processing for Southbound tables, northd might fail to timely reconcile the contents of the Southbound database. I am confused that, if the columns were write-only, how come they become inputs to northd and requires reconcile upon them? Does it mean that they were misused before I-P, that they were in fact read/write columns to ovn-northd? Similarly for the patch [1], I didn't figure out what's exactly broken without change-tracking to the options column from the commit message and the patch itself. I think I need to understand the context before commenting on the current patch. Could you help explain a little more? [0] https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 [1] https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 Thanks, Han > > The default behavior of the IDL is to automatically turn a write-only > column into a read-write column whenever the client enables change > tracking for that column. > > For the afore mentioned clients, this becomes a performance issue. > Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't > change a column's value.") explains why writes that don't change a > column's value cannot be optimized out early if the column is > read/write. > > Furthermore, if there is at least one record in any table that > changed during a transaction, then *all* records that have been > written are added to the transaction, even if their values didn't > change. If there are many such rows (e.g., like in ovn-northd's > case) this incurs a significant overhead because: > a. the client has to build this large transaction > b. the transaction has to be sent over the network > c. the server needs to parse this (mostly) no-op update > > We now introduce new IDL APIs allowing users to explicitly request > monitoring mode combinations for different columns in the IDL. > > These accept the combination 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' > as a valid input allowing users to change track write-only > tables. > > We benchmarked ovn-northd performance when using this new mode > against NB and SB databases taken from ovn-kubernetes scale tests. > We noticed that when a minor change is performed to the Northbound > database (e.g., NB_Global.nb_cfg is incremented) the time it takes to > build the Southbound transaction becomes negligible (vs ~1.5 seconds > before this change). > > End-to-end ovn-kubernetes scale tests on 120-node clusters also show > significant reduction of latency to bring up pods; both average and P99 > latency decreased by ~30%. > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > V2: > - Addressed Numan's comments: > - Added APIs to allow per column configuration of modes. > - Fixed unit tests to actually enable change tracking for write-only > columns. > - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change > seqnos if change tracking is enabled (even for write-only rows). > > Note: The OVN counter part change is: > https://github.com/dceara/ovn/commit/0d0c689e6469ae2f8e195c423e9a91fbc514bd60 > --- > NEWS | 4 +++ > lib/ovsdb-idl.c | 85 +++++++++++++++++++++++++++++++++------------- > lib/ovsdb-idl.h | 8 ++++- > tests/ovsdb-idl.at | 20 +++++++++-- > tests/test-ovsdb.c | 25 ++++++++++---- > 5 files changed, 108 insertions(+), 34 deletions(-) > > diff --git a/NEWS b/NEWS > index 5074b97aa51c..c1c41da94e9b 100644 > --- a/NEWS > +++ b/NEWS > @@ -14,6 +14,10 @@ Post-v2.17.0 > - OVSDB: > * 'relay' service model now supports transaction history, i.e. honors the > 'last-txn-id' field in 'monitor_cond_since' requests from clients. > + - OVSDB-IDL: > + * New interfaces for explcitly setting column modes. The combination > + 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' is now considered valid and > + enables change tracking of write-only columns. > > > v2.17.0 - 17 Feb 2022 > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > index 882ede75598d..5f62c267d496 100644 > --- a/lib/ovsdb-idl.c > +++ b/lib/ovsdb-idl.c > @@ -859,20 +859,6 @@ ovsdb_idl_get_mode(struct ovsdb_idl *idl, > return &table->modes[column - table->class_->columns]; > } > > -static void > -ovsdb_idl_set_mode(struct ovsdb_idl *idl, > - const struct ovsdb_idl_column *column, > - unsigned char mode) > -{ > - const struct ovsdb_idl_table *table = ovsdb_idl_table_from_column(idl, > - column); > - size_t column_idx = column - table->class_->columns; > - > - if (table->modes[column_idx] != mode) { > - *ovsdb_idl_get_mode(idl, column) = mode; > - } > -} > - > static void > add_ref_table(struct ovsdb_idl *idl, const struct ovsdb_base_type *base) > { > @@ -902,7 +888,8 @@ void > ovsdb_idl_add_column(struct ovsdb_idl *idl, > const struct ovsdb_idl_column *column) > { > - ovsdb_idl_set_mode(idl, column, OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); > + ovsdb_idl_set_column_mode(idl, column, > + OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); > add_ref_table(idl, &column->type.key); > add_ref_table(idl, &column->type.value); > } > @@ -1186,6 +1173,51 @@ ovsdb_idl_omit_alert(struct ovsdb_idl *idl, > *ovsdb_idl_get_mode(idl, column) &= ~(OVSDB_IDL_ALERT | OVSDB_IDL_TRACK); > } > > +/* Sets the 'column' mode to 'mode' which must be a valid combination of > + * OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT and OVSDB_IDL_TRACK. > + */ > +void > +ovsdb_idl_set_column_mode(struct ovsdb_idl *idl, > + const struct ovsdb_idl_column *column, > + unsigned char mode) > +{ > + if (mode & ~(OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { > + VLOG_ABORT("%s(): Column '%s' unknown mode %02x.", > + __func__, column->name, mode); > + } > + > + if ((mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) > + && !(mode & OVSDB_IDL_MONITOR)) { > + VLOG_ABORT("%s(): Column '%s' mode must include OVSDB_IDL_MONITOR.", > + __func__, column->name); > + } > + > + const struct ovsdb_idl_table *table = ovsdb_idl_table_from_column(idl, > + column); > + size_t column_idx = column - table->class_->columns; > + > + if (table->modes[column_idx] != mode) { > + *ovsdb_idl_get_mode(idl, column) = mode; > + } > +} > + > +/* Walks all columns of tables in 'idl' and sets their mode to 'mode' > + * which must be a valid combination of OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT > + * and OVSDB_IDL_TRACK. > + */ > +void > +ovsdb_idl_set_column_mode_all(struct ovsdb_idl *idl, unsigned char mode) > +{ > + for (size_t i = 0; i < idl->class_->n_tables; i++) { > + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; > + > + for (size_t j = 0; j < tc->n_columns; j++) { > + const struct ovsdb_idl_column *column = &tc->columns[j]; > + ovsdb_idl_set_column_mode(idl, column, mode); > + } > + } > +} > + > /* Sets the mode for 'column' in 'idl' to 0. See the big comment above > * OVSDB_IDL_MONITOR for details. > * > @@ -1236,8 +1268,8 @@ ovsdb_idl_row_get_seqno(const struct ovsdb_idl_row *row, > * functions. > * > * This function should be called between ovsdb_idl_create() and > - * the first call to ovsdb_idl_run(). The column to be tracked > - * should have OVSDB_IDL_ALERT turned on. > + * the first call to ovsdb_idl_run(). The function will also ensure > + * that the column to be tracked has OVSDB_IDL_ALERT turned on. > */ > void > ovsdb_idl_track_add_column(struct ovsdb_idl *idl, > @@ -1246,7 +1278,9 @@ ovsdb_idl_track_add_column(struct ovsdb_idl *idl, > if (!(*ovsdb_idl_get_mode(idl, column) & OVSDB_IDL_ALERT)) { > ovsdb_idl_add_column(idl, column); > } > - *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_TRACK; > + > + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK; > + ovsdb_idl_set_column_mode(idl, column, mode); > } > > void > @@ -1694,11 +1728,14 @@ ovsdb_idl_row_change(struct ovsdb_idl_row *row, const struct shash *values, > continue; > } > > - if (datum_changed && table->modes[column_idx] & OVSDB_IDL_ALERT) { > - changed = true; > - row->change_seqno[change] > - = row->table->change_seqno[change] > - = row->table->idl->change_seqno + 1; > + if (datum_changed) { > + unsigned char column_mode = table->modes[column_idx]; > + if (column_mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { > + changed = true; > + row->change_seqno[change] > + = row->table->change_seqno[change] > + = row->table->idl->change_seqno + 1; > + } > > if (table->modes[column_idx] & OVSDB_IDL_TRACK) { > if (ovs_list_is_empty(&row->track_node) && > @@ -3501,7 +3538,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > > class = row->table->class_; > column_idx = column - class->columns; > - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; > + write_only = !(row->table->modes[column_idx] & OVSDB_IDL_ALERT); > > ovs_assert(row->new_datum != NULL); > ovs_assert(column_idx < class->n_columns); > diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h > index d00599616ef9..2d74ed0727a7 100644 > --- a/lib/ovsdb-idl.h > +++ b/lib/ovsdb-idl.h > @@ -180,7 +180,8 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, > * > * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column > * that a client wants to track using the change tracking > - * ovsdb_idl_track_get_*() functions. > + * ovsdb_idl_track_get_*() functions. OVSDB_IDL_ALERT can be omitted for > + * write-only columns. > */ > #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ > #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ > @@ -193,6 +194,11 @@ void ovsdb_idl_add_table(struct ovsdb_idl *, > void ovsdb_idl_omit(struct ovsdb_idl *, const struct ovsdb_idl_column *); > void ovsdb_idl_omit_alert(struct ovsdb_idl *, const struct ovsdb_idl_column *); > > +void ovsdb_idl_set_column_mode(struct ovsdb_idl *, > + const struct ovsdb_idl_column *, > + unsigned char mode); > +void ovsdb_idl_set_column_mode_all(struct ovsdb_idl *, unsigned char mode); > + > /* Change tracking. > * > * In OVSDB, change tracking is applied at each client in the IDL layer. This > diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > index 62e2b638320c..0436e0f77b0f 100644 > --- a/tests/ovsdb-idl.at > +++ b/tests/ovsdb-idl.at > @@ -1154,12 +1154,25 @@ OVSDB_CHECK_IDL_WO_MONITOR_COND([simple idl disable monitor-cond], > ]]) > > m4_define([OVSDB_CHECK_IDL_TRACK_C], > - [AT_SETUP([$1 - C]) > + [AT_SETUP([$1 - alert - C]) > + AT_KEYWORDS([ovsdb server idl tracking positive $5]) > + AT_CHECK([ovsdb_start_idltest]) > + m4_if([$2], [], [], > + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 --change-track=alert idl unix:socket $3], > + [0], [stdout], [ignore]) > + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > + [0], [$4]) > + OVSDB_SERVER_SHUTDOWN > + AT_CLEANUP]) > + > +m4_define([OVSDB_CHECK_IDL_TRACK_NOALERT_C], > + [AT_SETUP([$1 - noalert - C]) > AT_KEYWORDS([ovsdb server idl tracking positive $5]) > AT_CHECK([ovsdb_start_idltest]) > m4_if([$2], [], [], > [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > - AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c idl unix:socket $3], > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 --change-track=noalert idl unix:socket $3], > [0], [stdout], [ignore]) > AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > [0], [$4]) > @@ -1167,7 +1180,8 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], > AT_CLEANUP]) > > m4_define([OVSDB_CHECK_IDL_TRACK], > - [OVSDB_CHECK_IDL_TRACK_C($@)]) > + [OVSDB_CHECK_IDL_TRACK_C($@) > + OVSDB_CHECK_IDL_TRACK_NOALERT_C($@)]) > > OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], > [['["idltest", > diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c > index ca4e87b8115c..abd30cf27d18 100644 > --- a/tests/test-ovsdb.c > +++ b/tests/test-ovsdb.c > @@ -56,6 +56,7 @@ > VLOG_DEFINE_THIS_MODULE(test_ovsdb); > > struct test_ovsdb_pvt_context { > + bool track_alert; > bool track; > }; > > @@ -122,6 +123,15 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > break; > > case 'c': > + if (optarg) { > + if (!strcmp(optarg, "noalert")) { > + pvt->track_alert = false; > + } else if (!strcmp(optarg, "alert")) { > + pvt->track_alert = true; > + } else { > + ovs_fatal(0, "value %s is invalid", optarg); > + } > + } > pvt->track = true; > break; > > @@ -2610,6 +2620,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) > static void > do_idl(struct ovs_cmdl_context *ctx) > { > + struct test_ovsdb_pvt_context *pvt = ctx->pvt; > struct jsonrpc *rpc; > struct ovsdb_idl *idl; > unsigned int seqno = 0; > @@ -2618,9 +2629,6 @@ do_idl(struct ovs_cmdl_context *ctx) > int step = 0; > int error; > int i; > - bool track; > - > - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; > > idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); > ovsdb_idl_set_leader_only(idl, false); > @@ -2637,8 +2645,13 @@ do_idl(struct ovs_cmdl_context *ctx) > rpc = NULL; > } > > - if (track) { > - ovsdb_idl_track_add_all(idl); > + if (pvt->track) { > + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK; > + > + if (pvt->track_alert) { > + mode |= OVSDB_IDL_ALERT; > + } > + ovsdb_idl_set_column_mode_all(idl, mode); > } > > setvbuf(stdout, NULL, _IONBF, 0); > @@ -2683,7 +2696,7 @@ do_idl(struct ovs_cmdl_context *ctx) > } > > /* Print update. */ > - if (track) { > + if (pvt->track) { > print_idl_track(idl, step++, terse); > ovsdb_idl_track_clear(idl); > } else { > -- > 2.27.0 >
On 4/15/22 20:25, Han Zhou wrote: > On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> At a first glance, change tracking should never be allowed for >> write-only columns. However, some clients (e.g., ovn-northd) that are >> mostly exclusive writers of a database, use change tracking to avoid >> duplicating the IDL row records into a local cache when implementing >> incremental processing. > > Hi Dumitru, > Hi Han, > It is still not clear to me why ovn-northd would need change-tracking for > the write-only columns. It didn't require change tracking before using the > incremental processing engine. In theory, to my understanding, > change-tracking is required only if the column is an input to some of the > engine nodes, so that it needs to be alerted when there is a change in the > database. If the column is write-only to ovn-northd, it means it doesn't > care about any change of the column from the DB, but blindly writes to the > column regardless of the current value. I checked the patch [0], which > mentioned that > > Such nodes are primary inputs to the northd incremental processing > engine and without proper update processing for Southbound tables, > northd might fail to timely reconcile the contents of the Southbound > database. > > I am confused that, if the columns were write-only, how come they become > inputs to northd and requires reconcile upon them? Does it mean that they > were misused before I-P, that they were in fact read/write columns to > ovn-northd? I think it's actually a bit more complicated than this. Even for pure write-only tables/columns we have a problem due to the fact that clustered ovsdb offers at-least-once consistency [2]. We actually hit the case in which a single northd transaction to insert a SB.Load_Balancer record resulted in two identical rows being added to the database, both of them pointing to the same logical switch datapath binding [3]. When that logical switch was deleted northd would fail to remove the duplicate from the SB causing a referential integrity violation due to the dangling reference. Arguably the schema for load balancers is not ideal because it doesn't defines an index on "name". Nevertheless, that's not something we can change easily because it will (quite likely with raft) break existing deployments. And it applies to other tables in the NB/SB schema. The fix was relatively straighforward though [4], and it meant fixing the way northd reconciles the SB.Load_balancers. This is were I-P became an issue. Consider the current main branch code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for all SB tables.") reverted. In a sandbox we do: $ ovn-nbctl ls-add ls $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 # At this point the SB.LB table looks ok: $ ovn-sbctl list load_balancer _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} name : lb1 options : {hairpin_orig_tuple="true"} protocol : tcp vips : {"1.1.1.1:1000"="2.2.2.2:2000"} # Simulate the bug in [3] $ ovn-sbctl create load_balancer name=lb1 external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 # Even with the northd fix [4], due to the fact that the LB table is # write-only, northd doesn't wake up so we end up with the duplicate # staying in the SB for an indeterminate time. $ ovn-sbctl list load_balancer _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} name : lb1 options : {hairpin_orig_tuple="true"} protocol : tcp vips : {"1.1.1.1:1000"="2.2.2.2:2000"} _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f datapaths : [] external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} name : lb1 options : {} protocol : [] vips : {} # Only when we generate an input for a "non-write-only" column the SB.LB # is reconciled, because the "northd" I-P node runs. $ ovn-nbctl --wait=sb sync $ ovn-sbctl list load_balancer _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} name : lb1 options : {hairpin_orig_tuple="true"} protocol : tcp vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > Similarly for the patch [1], I didn't figure out what's exactly broken > without change-tracking to the options column from the commit message and > the patch itself. I think the reason behind [1] is that ovn-controller (in the pinctrl module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, which were then propagated by ovn-northd to Logical_Router_Port.ipv6_prefix. This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the "northd" I-P node. But, if the only change that woke up northd was the SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is not tracked, the en_sb_port_binding I-P node (input of northd) doesn't run. > I think I need to understand the context before commenting on the current > patch. Could you help explain a little more? > I hope this makes it more clear. I'm sure there are more aspects I missed here but I think we need to find a way to make northd work correctly and without delays in reconciling state while at the same time avoiding to affect latency (like in the performance regression I introduced with [0]). > [0] > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 > [1] > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 > [2] https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 [4] https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 > Thanks, > Han > Thanks, Dumitru >> >> The default behavior of the IDL is to automatically turn a write-only >> column into a read-write column whenever the client enables change >> tracking for that column. >> >> For the afore mentioned clients, this becomes a performance issue. >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't >> change a column's value.") explains why writes that don't change a >> column's value cannot be optimized out early if the column is >> read/write. >> >> Furthermore, if there is at least one record in any table that >> changed during a transaction, then *all* records that have been >> written are added to the transaction, even if their values didn't >> change. If there are many such rows (e.g., like in ovn-northd's >> case) this incurs a significant overhead because: >> a. the client has to build this large transaction >> b. the transaction has to be sent over the network >> c. the server needs to parse this (mostly) no-op update >> >> We now introduce new IDL APIs allowing users to explicitly request >> monitoring mode combinations for different columns in the IDL. >> >> These accept the combination 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' >> as a valid input allowing users to change track write-only >> tables. >> >> We benchmarked ovn-northd performance when using this new mode >> against NB and SB databases taken from ovn-kubernetes scale tests. >> We noticed that when a minor change is performed to the Northbound >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to >> build the Southbound transaction becomes negligible (vs ~1.5 seconds >> before this change). >> >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show >> significant reduction of latency to bring up pods; both average and P99 >> latency decreased by ~30%. >> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >> --- >> V2: >> - Addressed Numan's comments: >> - Added APIs to allow per column configuration of modes. >> - Fixed unit tests to actually enable change tracking for write-only >> columns. >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change >> seqnos if change tracking is enabled (even for write-only rows). >> >> Note: The OVN counter part change is: >> > https://github.com/dceara/ovn/commit/0d0c689e6469ae2f8e195c423e9a91fbc514bd60 >> --- >> NEWS | 4 +++ >> lib/ovsdb-idl.c | 85 +++++++++++++++++++++++++++++++++------------- >> lib/ovsdb-idl.h | 8 ++++- >> tests/ovsdb-idl.at | 20 +++++++++-- >> tests/test-ovsdb.c | 25 ++++++++++---- >> 5 files changed, 108 insertions(+), 34 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 5074b97aa51c..c1c41da94e9b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -14,6 +14,10 @@ Post-v2.17.0 >> - OVSDB: >> * 'relay' service model now supports transaction history, i.e. > honors the >> 'last-txn-id' field in 'monitor_cond_since' requests from clients. >> + - OVSDB-IDL: >> + * New interfaces for explcitly setting column modes. The > combination >> + 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' is now considered valid and >> + enables change tracking of write-only columns. >> >> >> v2.17.0 - 17 Feb 2022 >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >> index 882ede75598d..5f62c267d496 100644 >> --- a/lib/ovsdb-idl.c >> +++ b/lib/ovsdb-idl.c >> @@ -859,20 +859,6 @@ ovsdb_idl_get_mode(struct ovsdb_idl *idl, >> return &table->modes[column - table->class_->columns]; >> } >> >> -static void >> -ovsdb_idl_set_mode(struct ovsdb_idl *idl, >> - const struct ovsdb_idl_column *column, >> - unsigned char mode) >> -{ >> - const struct ovsdb_idl_table *table = > ovsdb_idl_table_from_column(idl, >> - > column); >> - size_t column_idx = column - table->class_->columns; >> - >> - if (table->modes[column_idx] != mode) { >> - *ovsdb_idl_get_mode(idl, column) = mode; >> - } >> -} >> - >> static void >> add_ref_table(struct ovsdb_idl *idl, const struct ovsdb_base_type *base) >> { >> @@ -902,7 +888,8 @@ void >> ovsdb_idl_add_column(struct ovsdb_idl *idl, >> const struct ovsdb_idl_column *column) >> { >> - ovsdb_idl_set_mode(idl, column, OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); >> + ovsdb_idl_set_column_mode(idl, column, >> + OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); >> add_ref_table(idl, &column->type.key); >> add_ref_table(idl, &column->type.value); >> } >> @@ -1186,6 +1173,51 @@ ovsdb_idl_omit_alert(struct ovsdb_idl *idl, >> *ovsdb_idl_get_mode(idl, column) &= ~(OVSDB_IDL_ALERT | > OVSDB_IDL_TRACK); >> } >> >> +/* Sets the 'column' mode to 'mode' which must be a valid combination of >> + * OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT and OVSDB_IDL_TRACK. >> + */ >> +void >> +ovsdb_idl_set_column_mode(struct ovsdb_idl *idl, >> + const struct ovsdb_idl_column *column, >> + unsigned char mode) >> +{ >> + if (mode & ~(OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) > { >> + VLOG_ABORT("%s(): Column '%s' unknown mode %02x.", >> + __func__, column->name, mode); >> + } >> + >> + if ((mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) >> + && !(mode & OVSDB_IDL_MONITOR)) { >> + VLOG_ABORT("%s(): Column '%s' mode must include > OVSDB_IDL_MONITOR.", >> + __func__, column->name); >> + } >> + >> + const struct ovsdb_idl_table *table = > ovsdb_idl_table_from_column(idl, >> + > column); >> + size_t column_idx = column - table->class_->columns; >> + >> + if (table->modes[column_idx] != mode) { >> + *ovsdb_idl_get_mode(idl, column) = mode; >> + } >> +} >> + >> +/* Walks all columns of tables in 'idl' and sets their mode to 'mode' >> + * which must be a valid combination of OVSDB_IDL_MONITOR, > OVSDB_IDL_ALERT >> + * and OVSDB_IDL_TRACK. >> + */ >> +void >> +ovsdb_idl_set_column_mode_all(struct ovsdb_idl *idl, unsigned char mode) >> +{ >> + for (size_t i = 0; i < idl->class_->n_tables; i++) { >> + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; >> + >> + for (size_t j = 0; j < tc->n_columns; j++) { >> + const struct ovsdb_idl_column *column = &tc->columns[j]; >> + ovsdb_idl_set_column_mode(idl, column, mode); >> + } >> + } >> +} >> + >> /* Sets the mode for 'column' in 'idl' to 0. See the big comment above >> * OVSDB_IDL_MONITOR for details. >> * >> @@ -1236,8 +1268,8 @@ ovsdb_idl_row_get_seqno(const struct ovsdb_idl_row > *row, >> * functions. >> * >> * This function should be called between ovsdb_idl_create() and >> - * the first call to ovsdb_idl_run(). The column to be tracked >> - * should have OVSDB_IDL_ALERT turned on. >> + * the first call to ovsdb_idl_run(). The function will also ensure >> + * that the column to be tracked has OVSDB_IDL_ALERT turned on. >> */ >> void >> ovsdb_idl_track_add_column(struct ovsdb_idl *idl, >> @@ -1246,7 +1278,9 @@ ovsdb_idl_track_add_column(struct ovsdb_idl *idl, >> if (!(*ovsdb_idl_get_mode(idl, column) & OVSDB_IDL_ALERT)) { >> ovsdb_idl_add_column(idl, column); >> } >> - *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_TRACK; >> + >> + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | > OVSDB_IDL_TRACK; >> + ovsdb_idl_set_column_mode(idl, column, mode); >> } >> >> void >> @@ -1694,11 +1728,14 @@ ovsdb_idl_row_change(struct ovsdb_idl_row *row, > const struct shash *values, >> continue; >> } >> >> - if (datum_changed && table->modes[column_idx] & OVSDB_IDL_ALERT) > { >> - changed = true; >> - row->change_seqno[change] >> - = row->table->change_seqno[change] >> - = row->table->idl->change_seqno + 1; >> + if (datum_changed) { >> + unsigned char column_mode = table->modes[column_idx]; >> + if (column_mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { >> + changed = true; >> + row->change_seqno[change] >> + = row->table->change_seqno[change] >> + = row->table->idl->change_seqno + 1; >> + } >> >> if (table->modes[column_idx] & OVSDB_IDL_TRACK) { >> if (ovs_list_is_empty(&row->track_node) && >> @@ -3501,7 +3538,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row > *row_, >> >> class = row->table->class_; >> column_idx = column - class->columns; >> - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; >> + write_only = !(row->table->modes[column_idx] & OVSDB_IDL_ALERT); >> >> ovs_assert(row->new_datum != NULL); >> ovs_assert(column_idx < class->n_columns); >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h >> index d00599616ef9..2d74ed0727a7 100644 >> --- a/lib/ovsdb-idl.h >> +++ b/lib/ovsdb-idl.h >> @@ -180,7 +180,8 @@ bool ovsdb_idl_server_has_column(const struct > ovsdb_idl *, >> * >> * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a > column >> * that a client wants to track using the change tracking >> - * ovsdb_idl_track_get_*() functions. >> + * ovsdb_idl_track_get_*() functions. OVSDB_IDL_ALERT can be > omitted for >> + * write-only columns. >> */ >> #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ >> #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? > */ >> @@ -193,6 +194,11 @@ void ovsdb_idl_add_table(struct ovsdb_idl *, >> void ovsdb_idl_omit(struct ovsdb_idl *, const struct ovsdb_idl_column *); >> void ovsdb_idl_omit_alert(struct ovsdb_idl *, const struct > ovsdb_idl_column *); >> >> +void ovsdb_idl_set_column_mode(struct ovsdb_idl *, >> + const struct ovsdb_idl_column *, >> + unsigned char mode); >> +void ovsdb_idl_set_column_mode_all(struct ovsdb_idl *, unsigned char > mode); >> + >> /* Change tracking. >> * >> * In OVSDB, change tracking is applied at each client in the IDL > layer. This >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at >> index 62e2b638320c..0436e0f77b0f 100644 >> --- a/tests/ovsdb-idl.at >> +++ b/tests/ovsdb-idl.at >> @@ -1154,12 +1154,25 @@ OVSDB_CHECK_IDL_WO_MONITOR_COND([simple idl > disable monitor-cond], >> ]]) >> >> m4_define([OVSDB_CHECK_IDL_TRACK_C], >> - [AT_SETUP([$1 - C]) >> + [AT_SETUP([$1 - alert - C]) >> + AT_KEYWORDS([ovsdb server idl tracking positive $5]) >> + AT_CHECK([ovsdb_start_idltest]) >> + m4_if([$2], [], [], >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > [ignore])]) >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > -t10 --change-track=alert idl unix:socket $3], >> + [0], [stdout], [ignore]) >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> + [0], [$4]) >> + OVSDB_SERVER_SHUTDOWN >> + AT_CLEANUP]) >> + >> +m4_define([OVSDB_CHECK_IDL_TRACK_NOALERT_C], >> + [AT_SETUP([$1 - noalert - C]) >> AT_KEYWORDS([ovsdb server idl tracking positive $5]) >> AT_CHECK([ovsdb_start_idltest]) >> m4_if([$2], [], [], >> [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > [ignore])]) >> - AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > -t10 -c idl unix:socket $3], >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > -t10 --change-track=noalert idl unix:socket $3], >> [0], [stdout], [ignore]) >> AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> [0], [$4]) >> @@ -1167,7 +1180,8 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], >> AT_CLEANUP]) >> >> m4_define([OVSDB_CHECK_IDL_TRACK], >> - [OVSDB_CHECK_IDL_TRACK_C($@)]) >> + [OVSDB_CHECK_IDL_TRACK_C($@) >> + OVSDB_CHECK_IDL_TRACK_NOALERT_C($@)]) >> >> OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], >> [['["idltest", >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c >> index ca4e87b8115c..abd30cf27d18 100644 >> --- a/tests/test-ovsdb.c >> +++ b/tests/test-ovsdb.c >> @@ -56,6 +56,7 @@ >> VLOG_DEFINE_THIS_MODULE(test_ovsdb); >> >> struct test_ovsdb_pvt_context { >> + bool track_alert; >> bool track; >> }; >> >> @@ -122,6 +123,15 @@ parse_options(int argc, char *argv[], struct > test_ovsdb_pvt_context *pvt) >> break; >> >> case 'c': >> + if (optarg) { >> + if (!strcmp(optarg, "noalert")) { >> + pvt->track_alert = false; >> + } else if (!strcmp(optarg, "alert")) { >> + pvt->track_alert = true; >> + } else { >> + ovs_fatal(0, "value %s is invalid", optarg); >> + } >> + } >> pvt->track = true; >> break; >> >> @@ -2610,6 +2620,7 @@ update_conditions(struct ovsdb_idl *idl, char > *commands) >> static void >> do_idl(struct ovs_cmdl_context *ctx) >> { >> + struct test_ovsdb_pvt_context *pvt = ctx->pvt; >> struct jsonrpc *rpc; >> struct ovsdb_idl *idl; >> unsigned int seqno = 0; >> @@ -2618,9 +2629,6 @@ do_idl(struct ovs_cmdl_context *ctx) >> int step = 0; >> int error; >> int i; >> - bool track; >> - >> - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; >> >> idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); >> ovsdb_idl_set_leader_only(idl, false); >> @@ -2637,8 +2645,13 @@ do_idl(struct ovs_cmdl_context *ctx) >> rpc = NULL; >> } >> >> - if (track) { >> - ovsdb_idl_track_add_all(idl); >> + if (pvt->track) { >> + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK; >> + >> + if (pvt->track_alert) { >> + mode |= OVSDB_IDL_ALERT; >> + } >> + ovsdb_idl_set_column_mode_all(idl, mode); >> } >> >> setvbuf(stdout, NULL, _IONBF, 0); >> @@ -2683,7 +2696,7 @@ do_idl(struct ovs_cmdl_context *ctx) >> } >> >> /* Print update. */ >> - if (track) { >> + if (pvt->track) { >> print_idl_track(idl, step++, terse); >> ovsdb_idl_track_clear(idl); >> } else { >> -- >> 2.27.0 >> >
On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/15/22 20:25, Han Zhou wrote: > > On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> wrote: > >> > >> At a first glance, change tracking should never be allowed for > >> write-only columns. However, some clients (e.g., ovn-northd) that are > >> mostly exclusive writers of a database, use change tracking to avoid > >> duplicating the IDL row records into a local cache when implementing > >> incremental processing. > > > > Hi Dumitru, > > > > Hi Han, > > > It is still not clear to me why ovn-northd would need change-tracking for > > the write-only columns. It didn't require change tracking before using the > > incremental processing engine. In theory, to my understanding, > > change-tracking is required only if the column is an input to some of the > > engine nodes, so that it needs to be alerted when there is a change in the > > database. If the column is write-only to ovn-northd, it means it doesn't > > care about any change of the column from the DB, but blindly writes to the > > column regardless of the current value. I checked the patch [0], which > > mentioned that > > > > Such nodes are primary inputs to the northd incremental processing > > engine and without proper update processing for Southbound tables, > > northd might fail to timely reconcile the contents of the Southbound > > database. > > > > I am confused that, if the columns were write-only, how come they become > > inputs to northd and requires reconcile upon them? Does it mean that they > > were misused before I-P, that they were in fact read/write columns to > > ovn-northd? > > I think it's actually a bit more complicated than this. > > Even for pure write-only tables/columns we have a problem due to the > fact that clustered ovsdb offers at-least-once consistency [2]. > > We actually hit the case in which a single northd transaction to insert > a SB.Load_Balancer record resulted in two identical rows being added to > the database, both of them pointing to the same logical switch datapath > binding [3]. When that logical switch was deleted northd would fail to > remove the duplicate from the SB causing a referential integrity > violation due to the dangling reference. > > Arguably the schema for load balancers is not ideal because it doesn't > defines an index on "name". Nevertheless, that's not something we can > change easily because it will (quite likely with raft) break existing > deployments. And it applies to other tables in the NB/SB schema. > > The fix was relatively straighforward though [4], and it meant fixing > the way northd reconciles the SB.Load_balancers. > > This is were I-P became an issue. Consider the current main branch > code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for > all SB tables.") reverted. In a sandbox we do: > > $ ovn-nbctl ls-add ls > $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 > > # At this point the SB.LB table looks ok: > $ ovn-sbctl list load_balancer > _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > name : lb1 > options : {hairpin_orig_tuple="true"} > protocol : tcp > vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > > # Simulate the bug in [3] > $ ovn-sbctl create load_balancer name=lb1 external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 > > # Even with the northd fix [4], due to the fact that the LB table is > # write-only, northd doesn't wake up so we end up with the duplicate > # staying in the SB for an indeterminate time. > $ ovn-sbctl list load_balancer > _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > name : lb1 > options : {hairpin_orig_tuple="true"} > protocol : tcp > vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > > _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f > datapaths : [] > external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > name : lb1 > options : {} > protocol : [] > vips : {} > > # Only when we generate an input for a "non-write-only" column the SB.LB > # is reconciled, because the "northd" I-P node runs. > $ ovn-nbctl --wait=sb sync > $ ovn-sbctl list load_balancer > _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > name : lb1 > options : {hairpin_orig_tuple="true"} > protocol : tcp > vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > > > Similarly for the patch [1], I didn't figure out what's exactly broken > > without change-tracking to the options column from the commit message and > > the patch itself. > > I think the reason behind [1] is that ovn-controller (in the pinctrl > module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, > which were then propagated by ovn-northd to > Logical_Router_Port.ipv6_prefix. > > This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the > "northd" I-P node. But, if the only change that woke up northd was the > SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is > not tracked, the en_sb_port_binding I-P node (input of northd) doesn't > run. > > > I think I need to understand the context before commenting on the current > > patch. Could you help explain a little more? > > > > I hope this makes it more clear. I'm sure there are more aspects I > missed here but I think we need to find a way to make northd work > correctly and without delays in reconciling state while at the same > time avoiding to affect latency (like in the performance regression > I introduced with [0]). > Thanks Dumitru for the detailed explanation. I think I understand the problem now. In both of the cases above, we were in fact monitoring the columns as "write-only" while they were in fact "read-write". 1) The first case (the load balancer one) looks more complex, but essentially it is an input to ovn-northd. If we consider "removing duplicates" as a feature of northd, although the duplicates would in fact only be triggered by northd itself (but not actually generated by itself, it was RAFT's problem that actually generated the duplicated row), northd is reading the records and then remove the duplicates and write it back. That's why the northd needs to track the load-balancer columns and react to the changes. (We may still call it "write-only" in this case, which doesn't mean northd doesn't read the value, but just that northd is the only one that decides what should be there, not worrying about overwriting others' changes. I'd rather call it "read-write" to be more precise.) 2) The second case (the IPv6 one) is more straightforward. If ovn-controller can modify the column, the column should never be regarded as write-only to northd, which is a bug even existed before northd I-P. There may be other such bugs that we haven't noticed before, but can be exposed by the I-P change. These bugs were not noticed before because anything that wakes up the main loop would trigger a recompute before I-P implementation, so we were actually handling any changes in time. And the problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't change a column's value.") doesn't apply to this use case (correct me if I am wrong). Now to the solution: a) To fix the bugs, we shouldn't blindly change-tracking for all tables and columns, but instead, track changes for the columns that are read-write but treated as write-only by mistake, while keep the real write-only columns write-only (e.g. the nb_cfg related columns in SB_Global). I think we should fix this regardless. b) However, this is not enough, because we are facing the performance problem introduced by such fixes, even if we only add change-tracking to very few new columns. E.g. the one added by [1] already impacts performance significantly even adding tracking for just one column. We need further improvement to solve the performance problem, which is primarily caused by the behavior introduced by [5]. However, in most use cases we don't have the problem mentioned by [5]. I think this is the assumption that leads to this patch as the solution for the performance problem and I think it is the right approach. It allows northd to get alerted by changes, but only write back columns that were changed. I just want to emphasize that we need this approach on top of a), and only apply for the columns that are read-write but don't worry about atomicity of transactions if only the changed values are written back. With the above understanding, I have some comments for the patch: i) Functionally I believe it serves the purpose of solving the performance problem, but it seems confusing what the modes really mean now. There were definitions to each of the flags: - OVSDB_IDL_MONITOR /* Replicate this column? */ - OVSDB_IDL_ALERT /* Alert client when column changes? */ - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name suggests. Each of the meanings was clear. Now with the patch it seems to change the meanings of: - OVSDB_IDL_ALERT to "alert client when column changes, and the column is NOT write-only" - OVSDB_IDL_TRACK to "alert client when column changes, and track the changes" This is really confusing. I think the purpose here is just to provide a capability of "writing back changed columns only" while still "alert client when column changes". Would it be better just to add a new flag for this, like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed APIs to keep the existed behavior, and provide an API for clients to set it explicitly for columns that are read-write but doesn't care about the atomicity. ii) The comments in ovsdb-idl.h under "The possible mode combinations are:" needs updating. Thanks, Han > > [0] > > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 > > [1] > > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 > > > > [2] https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency > [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 > [4] https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 > [5] https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605 > > Thanks, > > Han > > > > Thanks, > Dumitru > > >> > >> The default behavior of the IDL is to automatically turn a write-only > >> column into a read-write column whenever the client enables change > >> tracking for that column. > >> > >> For the afore mentioned clients, this becomes a performance issue. > >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't > >> change a column's value.") explains why writes that don't change a > >> column's value cannot be optimized out early if the column is > >> read/write. > >> > >> Furthermore, if there is at least one record in any table that > >> changed during a transaction, then *all* records that have been > >> written are added to the transaction, even if their values didn't > >> change. If there are many such rows (e.g., like in ovn-northd's > >> case) this incurs a significant overhead because: > >> a. the client has to build this large transaction > >> b. the transaction has to be sent over the network > >> c. the server needs to parse this (mostly) no-op update > >> > >> We now introduce new IDL APIs allowing users to explicitly request > >> monitoring mode combinations for different columns in the IDL. > >> > >> These accept the combination 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' > >> as a valid input allowing users to change track write-only > >> tables. > >> > >> We benchmarked ovn-northd performance when using this new mode > >> against NB and SB databases taken from ovn-kubernetes scale tests. > >> We noticed that when a minor change is performed to the Northbound > >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to > >> build the Southbound transaction becomes negligible (vs ~1.5 seconds > >> before this change). > >> > >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show > >> significant reduction of latency to bring up pods; both average and P99 > >> latency decreased by ~30%. > >> > >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > >> --- > >> V2: > >> - Addressed Numan's comments: > >> - Added APIs to allow per column configuration of modes. > >> - Fixed unit tests to actually enable change tracking for write-only > >> columns. > >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change > >> seqnos if change tracking is enabled (even for write-only rows). > >> > >> Note: The OVN counter part change is: > >> > > https://github.com/dceara/ovn/commit/0d0c689e6469ae2f8e195c423e9a91fbc514bd60 > >> --- > >> NEWS | 4 +++ > >> lib/ovsdb-idl.c | 85 +++++++++++++++++++++++++++++++++------------- > >> lib/ovsdb-idl.h | 8 ++++- > >> tests/ovsdb-idl.at | 20 +++++++++-- > >> tests/test-ovsdb.c | 25 ++++++++++---- > >> 5 files changed, 108 insertions(+), 34 deletions(-) > >> > >> diff --git a/NEWS b/NEWS > >> index 5074b97aa51c..c1c41da94e9b 100644 > >> --- a/NEWS > >> +++ b/NEWS > >> @@ -14,6 +14,10 @@ Post-v2.17.0 > >> - OVSDB: > >> * 'relay' service model now supports transaction history, i.e. > > honors the > >> 'last-txn-id' field in 'monitor_cond_since' requests from clients. > >> + - OVSDB-IDL: > >> + * New interfaces for explcitly setting column modes. The > > combination > >> + 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' is now considered valid and > >> + enables change tracking of write-only columns. > >> > >> > >> v2.17.0 - 17 Feb 2022 > >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > >> index 882ede75598d..5f62c267d496 100644 > >> --- a/lib/ovsdb-idl.c > >> +++ b/lib/ovsdb-idl.c > >> @@ -859,20 +859,6 @@ ovsdb_idl_get_mode(struct ovsdb_idl *idl, > >> return &table->modes[column - table->class_->columns]; > >> } > >> > >> -static void > >> -ovsdb_idl_set_mode(struct ovsdb_idl *idl, > >> - const struct ovsdb_idl_column *column, > >> - unsigned char mode) > >> -{ > >> - const struct ovsdb_idl_table *table = > > ovsdb_idl_table_from_column(idl, > >> - > > column); > >> - size_t column_idx = column - table->class_->columns; > >> - > >> - if (table->modes[column_idx] != mode) { > >> - *ovsdb_idl_get_mode(idl, column) = mode; > >> - } > >> -} > >> - > >> static void > >> add_ref_table(struct ovsdb_idl *idl, const struct ovsdb_base_type *base) > >> { > >> @@ -902,7 +888,8 @@ void > >> ovsdb_idl_add_column(struct ovsdb_idl *idl, > >> const struct ovsdb_idl_column *column) > >> { > >> - ovsdb_idl_set_mode(idl, column, OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); > >> + ovsdb_idl_set_column_mode(idl, column, > >> + OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); > >> add_ref_table(idl, &column->type.key); > >> add_ref_table(idl, &column->type.value); > >> } > >> @@ -1186,6 +1173,51 @@ ovsdb_idl_omit_alert(struct ovsdb_idl *idl, > >> *ovsdb_idl_get_mode(idl, column) &= ~(OVSDB_IDL_ALERT | > > OVSDB_IDL_TRACK); > >> } > >> > >> +/* Sets the 'column' mode to 'mode' which must be a valid combination of > >> + * OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT and OVSDB_IDL_TRACK. > >> + */ > >> +void > >> +ovsdb_idl_set_column_mode(struct ovsdb_idl *idl, > >> + const struct ovsdb_idl_column *column, > >> + unsigned char mode) > >> +{ > >> + if (mode & ~(OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) > > { > >> + VLOG_ABORT("%s(): Column '%s' unknown mode %02x.", > >> + __func__, column->name, mode); > >> + } > >> + > >> + if ((mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) > >> + && !(mode & OVSDB_IDL_MONITOR)) { > >> + VLOG_ABORT("%s(): Column '%s' mode must include > > OVSDB_IDL_MONITOR.", > >> + __func__, column->name); > >> + } > >> + > >> + const struct ovsdb_idl_table *table = > > ovsdb_idl_table_from_column(idl, > >> + > > column); > >> + size_t column_idx = column - table->class_->columns; > >> + > >> + if (table->modes[column_idx] != mode) { > >> + *ovsdb_idl_get_mode(idl, column) = mode; > >> + } > >> +} > >> + > >> +/* Walks all columns of tables in 'idl' and sets their mode to 'mode' > >> + * which must be a valid combination of OVSDB_IDL_MONITOR, > > OVSDB_IDL_ALERT > >> + * and OVSDB_IDL_TRACK. > >> + */ > >> +void > >> +ovsdb_idl_set_column_mode_all(struct ovsdb_idl *idl, unsigned char mode) > >> +{ > >> + for (size_t i = 0; i < idl->class_->n_tables; i++) { > >> + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; > >> + > >> + for (size_t j = 0; j < tc->n_columns; j++) { > >> + const struct ovsdb_idl_column *column = &tc->columns[j]; > >> + ovsdb_idl_set_column_mode(idl, column, mode); > >> + } > >> + } > >> +} > >> + > >> /* Sets the mode for 'column' in 'idl' to 0. See the big comment above > >> * OVSDB_IDL_MONITOR for details. > >> * > >> @@ -1236,8 +1268,8 @@ ovsdb_idl_row_get_seqno(const struct ovsdb_idl_row > > *row, > >> * functions. > >> * > >> * This function should be called between ovsdb_idl_create() and > >> - * the first call to ovsdb_idl_run(). The column to be tracked > >> - * should have OVSDB_IDL_ALERT turned on. > >> + * the first call to ovsdb_idl_run(). The function will also ensure > >> + * that the column to be tracked has OVSDB_IDL_ALERT turned on. > >> */ > >> void > >> ovsdb_idl_track_add_column(struct ovsdb_idl *idl, > >> @@ -1246,7 +1278,9 @@ ovsdb_idl_track_add_column(struct ovsdb_idl *idl, > >> if (!(*ovsdb_idl_get_mode(idl, column) & OVSDB_IDL_ALERT)) { > >> ovsdb_idl_add_column(idl, column); > >> } > >> - *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_TRACK; > >> + > >> + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | > > OVSDB_IDL_TRACK; > >> + ovsdb_idl_set_column_mode(idl, column, mode); > >> } > >> > >> void > >> @@ -1694,11 +1728,14 @@ ovsdb_idl_row_change(struct ovsdb_idl_row *row, > > const struct shash *values, > >> continue; > >> } > >> > >> - if (datum_changed && table->modes[column_idx] & OVSDB_IDL_ALERT) > > { > >> - changed = true; > >> - row->change_seqno[change] > >> - = row->table->change_seqno[change] > >> - = row->table->idl->change_seqno + 1; > >> + if (datum_changed) { > >> + unsigned char column_mode = table->modes[column_idx]; > >> + if (column_mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { > >> + changed = true; > >> + row->change_seqno[change] > >> + = row->table->change_seqno[change] > >> + = row->table->idl->change_seqno + 1; > >> + } > >> > >> if (table->modes[column_idx] & OVSDB_IDL_TRACK) { > >> if (ovs_list_is_empty(&row->track_node) && > >> @@ -3501,7 +3538,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row > > *row_, > >> > >> class = row->table->class_; > >> column_idx = column - class->columns; > >> - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; > >> + write_only = !(row->table->modes[column_idx] & OVSDB_IDL_ALERT); > >> > >> ovs_assert(row->new_datum != NULL); > >> ovs_assert(column_idx < class->n_columns); > >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h > >> index d00599616ef9..2d74ed0727a7 100644 > >> --- a/lib/ovsdb-idl.h > >> +++ b/lib/ovsdb-idl.h > >> @@ -180,7 +180,8 @@ bool ovsdb_idl_server_has_column(const struct > > ovsdb_idl *, > >> * > >> * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a > > column > >> * that a client wants to track using the change tracking > >> - * ovsdb_idl_track_get_*() functions. > >> + * ovsdb_idl_track_get_*() functions. OVSDB_IDL_ALERT can be > > omitted for > >> + * write-only columns. > >> */ > >> #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ > >> #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? > > */ > >> @@ -193,6 +194,11 @@ void ovsdb_idl_add_table(struct ovsdb_idl *, > >> void ovsdb_idl_omit(struct ovsdb_idl *, const struct ovsdb_idl_column *); > >> void ovsdb_idl_omit_alert(struct ovsdb_idl *, const struct > > ovsdb_idl_column *); > >> > >> +void ovsdb_idl_set_column_mode(struct ovsdb_idl *, > >> + const struct ovsdb_idl_column *, > >> + unsigned char mode); > >> +void ovsdb_idl_set_column_mode_all(struct ovsdb_idl *, unsigned char > > mode); > >> + > >> /* Change tracking. > >> * > >> * In OVSDB, change tracking is applied at each client in the IDL > > layer. This > >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > >> index 62e2b638320c..0436e0f77b0f 100644 > >> --- a/tests/ovsdb-idl.at > >> +++ b/tests/ovsdb-idl.at > >> @@ -1154,12 +1154,25 @@ OVSDB_CHECK_IDL_WO_MONITOR_COND([simple idl > > disable monitor-cond], > >> ]]) > >> > >> m4_define([OVSDB_CHECK_IDL_TRACK_C], > >> - [AT_SETUP([$1 - C]) > >> + [AT_SETUP([$1 - alert - C]) > >> + AT_KEYWORDS([ovsdb server idl tracking positive $5]) > >> + AT_CHECK([ovsdb_start_idltest]) > >> + m4_if([$2], [], [], > >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > > [ignore])]) > >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > > -t10 --change-track=alert idl unix:socket $3], > >> + [0], [stdout], [ignore]) > >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > >> + [0], [$4]) > >> + OVSDB_SERVER_SHUTDOWN > >> + AT_CLEANUP]) > >> + > >> +m4_define([OVSDB_CHECK_IDL_TRACK_NOALERT_C], > >> + [AT_SETUP([$1 - noalert - C]) > >> AT_KEYWORDS([ovsdb server idl tracking positive $5]) > >> AT_CHECK([ovsdb_start_idltest]) > >> m4_if([$2], [], [], > >> [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > > [ignore])]) > >> - AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > > -t10 -c idl unix:socket $3], > >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > > -t10 --change-track=noalert idl unix:socket $3], > >> [0], [stdout], [ignore]) > >> AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > >> [0], [$4]) > >> @@ -1167,7 +1180,8 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], > >> AT_CLEANUP]) > >> > >> m4_define([OVSDB_CHECK_IDL_TRACK], > >> - [OVSDB_CHECK_IDL_TRACK_C($@)]) > >> + [OVSDB_CHECK_IDL_TRACK_C($@) > >> + OVSDB_CHECK_IDL_TRACK_NOALERT_C($@)]) > >> > >> OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], > >> [['["idltest", > >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c > >> index ca4e87b8115c..abd30cf27d18 100644 > >> --- a/tests/test-ovsdb.c > >> +++ b/tests/test-ovsdb.c > >> @@ -56,6 +56,7 @@ > >> VLOG_DEFINE_THIS_MODULE(test_ovsdb); > >> > >> struct test_ovsdb_pvt_context { > >> + bool track_alert; > >> bool track; > >> }; > >> > >> @@ -122,6 +123,15 @@ parse_options(int argc, char *argv[], struct > > test_ovsdb_pvt_context *pvt) > >> break; > >> > >> case 'c': > >> + if (optarg) { > >> + if (!strcmp(optarg, "noalert")) { > >> + pvt->track_alert = false; > >> + } else if (!strcmp(optarg, "alert")) { > >> + pvt->track_alert = true; > >> + } else { > >> + ovs_fatal(0, "value %s is invalid", optarg); > >> + } > >> + } > >> pvt->track = true; > >> break; > >> > >> @@ -2610,6 +2620,7 @@ update_conditions(struct ovsdb_idl *idl, char > > *commands) > >> static void > >> do_idl(struct ovs_cmdl_context *ctx) > >> { > >> + struct test_ovsdb_pvt_context *pvt = ctx->pvt; > >> struct jsonrpc *rpc; > >> struct ovsdb_idl *idl; > >> unsigned int seqno = 0; > >> @@ -2618,9 +2629,6 @@ do_idl(struct ovs_cmdl_context *ctx) > >> int step = 0; > >> int error; > >> int i; > >> - bool track; > >> - > >> - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; > >> > >> idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); > >> ovsdb_idl_set_leader_only(idl, false); > >> @@ -2637,8 +2645,13 @@ do_idl(struct ovs_cmdl_context *ctx) > >> rpc = NULL; > >> } > >> > >> - if (track) { > >> - ovsdb_idl_track_add_all(idl); > >> + if (pvt->track) { > >> + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK; > >> + > >> + if (pvt->track_alert) { > >> + mode |= OVSDB_IDL_ALERT; > >> + } > >> + ovsdb_idl_set_column_mode_all(idl, mode); > >> } > >> > >> setvbuf(stdout, NULL, _IONBF, 0); > >> @@ -2683,7 +2696,7 @@ do_idl(struct ovs_cmdl_context *ctx) > >> } > >> > >> /* Print update. */ > >> - if (track) { > >> + if (pvt->track) { > >> print_idl_track(idl, step++, terse); > >> ovsdb_idl_track_clear(idl); > >> } else { > >> -- > >> 2.27.0 > >> > > >
On 4/16/22 00:41, Han Zhou wrote: > On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 4/15/22 20:25, Han Zhou wrote: >>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> At a first glance, change tracking should never be allowed for >>>> write-only columns. However, some clients (e.g., ovn-northd) that are >>>> mostly exclusive writers of a database, use change tracking to avoid >>>> duplicating the IDL row records into a local cache when implementing >>>> incremental processing. >>> >>> Hi Dumitru, >>> >> >> Hi Han, >> >>> It is still not clear to me why ovn-northd would need change-tracking > for >>> the write-only columns. It didn't require change tracking before using > the >>> incremental processing engine. In theory, to my understanding, >>> change-tracking is required only if the column is an input to some of > the >>> engine nodes, so that it needs to be alerted when there is a change in > the >>> database. If the column is write-only to ovn-northd, it means it doesn't >>> care about any change of the column from the DB, but blindly writes to > the >>> column regardless of the current value. I checked the patch [0], which >>> mentioned that >>> >>> Such nodes are primary inputs to the northd incremental processing >>> engine and without proper update processing for Southbound tables, >>> northd might fail to timely reconcile the contents of the Southbound >>> database. >>> >>> I am confused that, if the columns were write-only, how come they become >>> inputs to northd and requires reconcile upon them? Does it mean that > they >>> were misused before I-P, that they were in fact read/write columns to >>> ovn-northd? >> >> I think it's actually a bit more complicated than this. >> >> Even for pure write-only tables/columns we have a problem due to the >> fact that clustered ovsdb offers at-least-once consistency [2]. >> >> We actually hit the case in which a single northd transaction to insert >> a SB.Load_Balancer record resulted in two identical rows being added to >> the database, both of them pointing to the same logical switch datapath >> binding [3]. When that logical switch was deleted northd would fail to >> remove the duplicate from the SB causing a referential integrity >> violation due to the dangling reference. >> >> Arguably the schema for load balancers is not ideal because it doesn't >> defines an index on "name". Nevertheless, that's not something we can >> change easily because it will (quite likely with raft) break existing >> deployments. And it applies to other tables in the NB/SB schema. >> >> The fix was relatively straighforward though [4], and it meant fixing >> the way northd reconciles the SB.Load_balancers. >> >> This is were I-P became an issue. Consider the current main branch >> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for >> all SB tables.") reverted. In a sandbox we do: >> >> $ ovn-nbctl ls-add ls >> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 >> >> # At this point the SB.LB table looks ok: >> $ ovn-sbctl list load_balancer > > >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >> name : lb1 >> options : {hairpin_orig_tuple="true"} >> protocol : tcp >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >> >> # Simulate the bug in [3] >> $ ovn-sbctl create load_balancer name=lb1 > external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 >> >> # Even with the northd fix [4], due to the fact that the LB table is >> # write-only, northd doesn't wake up so we end up with the duplicate >> # staying in the SB for an indeterminate time. >> $ ovn-sbctl list load_balancer >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >> name : lb1 >> options : {hairpin_orig_tuple="true"} >> protocol : tcp >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >> >> _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f >> datapaths : [] >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >> name : lb1 >> options : {} >> protocol : [] >> vips : {} >> >> # Only when we generate an input for a "non-write-only" column the SB.LB >> # is reconciled, because the "northd" I-P node runs. >> $ ovn-nbctl --wait=sb sync >> $ ovn-sbctl list load_balancer >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >> name : lb1 >> options : {hairpin_orig_tuple="true"} >> protocol : tcp >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >> >>> Similarly for the patch [1], I didn't figure out what's exactly broken >>> without change-tracking to the options column from the commit message > and >>> the patch itself. >> >> I think the reason behind [1] is that ovn-controller (in the pinctrl >> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, >> which were then propagated by ovn-northd to >> Logical_Router_Port.ipv6_prefix. >> >> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the >> "northd" I-P node. But, if the only change that woke up northd was the >> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is >> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't >> run. >> >>> I think I need to understand the context before commenting on the > current >>> patch. Could you help explain a little more? >>> >> >> I hope this makes it more clear. I'm sure there are more aspects I >> missed here but I think we need to find a way to make northd work >> correctly and without delays in reconciling state while at the same >> time avoiding to affect latency (like in the performance regression >> I introduced with [0]). >> > Thanks Dumitru for the detailed explanation. I think I understand the > problem now. In both of the cases above, we were in fact monitoring the > columns as "write-only" while they were in fact "read-write". > > 1) The first case (the load balancer one) looks more complex, but > essentially it is an input to ovn-northd. If we consider "removing > duplicates" as a feature of northd, although the duplicates would in fact > only be triggered by northd itself (but not actually generated by itself, > it was RAFT's problem that actually generated the duplicated row), northd > is reading the records and then remove the duplicates and write it back. > That's why the northd needs to track the load-balancer columns and react to > the changes. (We may still call it "write-only" in this case, which doesn't > mean northd doesn't read the value, but just that northd is the only one > that decides what should be there, not worrying about overwriting others' > changes. I'd rather call it "read-write" to be more precise.) > Sure, but just for completeness, this applies to the following tables too (they don't have an index defined in the SB schema): DHCP_Options, DHCPv6_Options, DNS, HA_Chassis. Most of these didn't have change tracking enabled before [0] either as far as I can tell; or at least not directly. We'd have to see if there are more corner cases to cover. > 2) The second case (the IPv6 one) is more straightforward. If > ovn-controller can modify the column, the column should never be regarded > as write-only to northd, which is a bug even existed before northd I-P. > There may be other such bugs that we haven't noticed before, but can be > exposed by the I-P change. These bugs were not noticed before because > anything that wakes up the main loop would trigger a recompute before I-P > implementation, so we were actually handling any changes in time. And the > problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of > writes that don't change a column's value.") doesn't apply to this use case > (correct me if I am wrong). > Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <-> ovn-controller context because both ovn-northd and ovn-controller don't really care about transactions and (should) ensure eventual consistency of the SB data. > Now to the solution: > > a) To fix the bugs, we shouldn't blindly change-tracking for all tables and > columns, but instead, track changes for the columns that are read-write but > treated as write-only by mistake, while keep the real write-only columns > write-only (e.g. the nb_cfg related columns in SB_Global). I think we > should fix this regardless. > In theory, yes, but please see "c)" below. > b) However, this is not enough, because we are facing the performance > problem introduced by such fixes, even if we only add change-tracking to > very few new columns. E.g. the one added by [1] already impacts performance > significantly even adding tracking for just one column. We need further > improvement to solve the performance problem, which is primarily caused by > the behavior introduced by [5]. However, in most use cases we don't have > the problem mentioned by [5]. I think this is the assumption that leads to > this patch as the solution for the performance problem and I think it is > the right approach. It allows northd to get alerted by changes, but only > write back columns that were changed. I just want to emphasize that we need > this approach on top of a), and only apply for the columns that are > read-write but don't worry about atomicity of transactions if only the > changed values are written back. I would add: c) However, before [6], the non-incremental implementation of ovn-northd had the side effect that *any* external change to a SB record/column, regardless of how the column was monitored, would trigger northd to reconcile the SB contents. That won't be the case anymore. I'm not sure if this is a big problem but it might uncover issues that used to be handled indirectly by the northd full reprocessing. E.g., in a sandbox with [0] reverted: $ ovn-nbctl ls-add ls $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0 , match=(1), action=(next;) # Simulate a lflow "disappearing" from the SB. $ ovn-sbctl destroy logical_flow f771dbcb # Northd doesn't recreate it until a read/write column changes value: $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb $ ovn-nbctl --wait=sb sync $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0 , match=(1), action=(next;) So, with all this in mind, if we find a good way of defining this new monitoring mode ("writing back changed columns only") I don't really see a problem with blindly enabling change tracking and this new mode for the Southbound. All in all it just ensures the same behavior as before [6] without any real drawbacks. > > With the above understanding, I have some comments for the patch: > > i) Functionally I believe it serves the purpose of solving the performance > problem, but it seems confusing what the modes really mean now. There were > definitions to each of the flags: > - OVSDB_IDL_MONITOR /* Replicate this column? */ > - OVSDB_IDL_ALERT /* Alert client when column changes? */ > - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name > suggests. > > Each of the meanings was clear. Now with the patch it seems to change the > meanings of: > - OVSDB_IDL_ALERT to "alert client when column changes, and the column is > NOT write-only" > - OVSDB_IDL_TRACK to "alert client when column changes, and track the > changes" > > This is really confusing. I think the purpose here is just to provide a > capability of "writing back changed columns only" while still "alert client > when column changes". Would it be better just to add a new flag for this, > like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed APIs > to keep the existed behavior, and provide an API for clients to set it > explicitly for columns that are read-write but doesn't care about the > atomicity. Sure, sounds better to make it explicit indeed. > > ii) The comments in ovsdb-idl.h under "The possible mode combinations are:" > needs updating. Ack. I'll prepare a v3 for the idl change. In the meantime it would be great if we could already agree on the way we want to enable this in ovn-northd (i.e., blindly for all SB vs. selectively for some tables). > > Thanks, > Han > >>> [0] >>> > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 >>> [1] >>> > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 >>> >> >> [2] > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency >> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 >> [4] > https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 >> > [5] > https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605 > [6] https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e Thanks, Dumitru
On Tue, Apr 19, 2022 at 12:49 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/16/22 00:41, Han Zhou wrote: > > On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <dceara@redhat.com> wrote: > >> > >> On 4/15/22 20:25, Han Zhou wrote: > >>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> wrote: > >>>> > >>>> At a first glance, change tracking should never be allowed for > >>>> write-only columns. However, some clients (e.g., ovn-northd) that are > >>>> mostly exclusive writers of a database, use change tracking to avoid > >>>> duplicating the IDL row records into a local cache when implementing > >>>> incremental processing. > >>> > >>> Hi Dumitru, > >>> > >> > >> Hi Han, > >> > >>> It is still not clear to me why ovn-northd would need change-tracking > > for > >>> the write-only columns. It didn't require change tracking before using > > the > >>> incremental processing engine. In theory, to my understanding, > >>> change-tracking is required only if the column is an input to some of > > the > >>> engine nodes, so that it needs to be alerted when there is a change in > > the > >>> database. If the column is write-only to ovn-northd, it means it doesn't > >>> care about any change of the column from the DB, but blindly writes to > > the > >>> column regardless of the current value. I checked the patch [0], which > >>> mentioned that > >>> > >>> Such nodes are primary inputs to the northd incremental processing > >>> engine and without proper update processing for Southbound tables, > >>> northd might fail to timely reconcile the contents of the Southbound > >>> database. > >>> > >>> I am confused that, if the columns were write-only, how come they become > >>> inputs to northd and requires reconcile upon them? Does it mean that > > they > >>> were misused before I-P, that they were in fact read/write columns to > >>> ovn-northd? > >> > >> I think it's actually a bit more complicated than this. > >> > >> Even for pure write-only tables/columns we have a problem due to the > >> fact that clustered ovsdb offers at-least-once consistency [2]. > >> > >> We actually hit the case in which a single northd transaction to insert > >> a SB.Load_Balancer record resulted in two identical rows being added to > >> the database, both of them pointing to the same logical switch datapath > >> binding [3]. When that logical switch was deleted northd would fail to > >> remove the duplicate from the SB causing a referential integrity > >> violation due to the dangling reference. > >> > >> Arguably the schema for load balancers is not ideal because it doesn't > >> defines an index on "name". Nevertheless, that's not something we can > >> change easily because it will (quite likely with raft) break existing > >> deployments. And it applies to other tables in the NB/SB schema. > >> > >> The fix was relatively straighforward though [4], and it meant fixing > >> the way northd reconciles the SB.Load_balancers. > >> > >> This is were I-P became an issue. Consider the current main branch > >> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for > >> all SB tables.") reverted. In a sandbox we do: > >> > >> $ ovn-nbctl ls-add ls > >> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 > >> > >> # At this point the SB.LB table looks ok: > >> $ ovn-sbctl list load_balancer > > > > > >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > >> name : lb1 > >> options : {hairpin_orig_tuple="true"} > >> protocol : tcp > >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > >> > >> # Simulate the bug in [3] > >> $ ovn-sbctl create load_balancer name=lb1 > > external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 > >> > >> # Even with the northd fix [4], due to the fact that the LB table is > >> # write-only, northd doesn't wake up so we end up with the duplicate > >> # staying in the SB for an indeterminate time. > >> $ ovn-sbctl list load_balancer > >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > >> name : lb1 > >> options : {hairpin_orig_tuple="true"} > >> protocol : tcp > >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > >> > >> _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f > >> datapaths : [] > >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > >> name : lb1 > >> options : {} > >> protocol : [] > >> vips : {} > >> > >> # Only when we generate an input for a "non-write-only" column the SB.LB > >> # is reconciled, because the "northd" I-P node runs. > >> $ ovn-nbctl --wait=sb sync > >> $ ovn-sbctl list load_balancer > >> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 > >> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] > >> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} > >> name : lb1 > >> options : {hairpin_orig_tuple="true"} > >> protocol : tcp > >> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} > >> > >>> Similarly for the patch [1], I didn't figure out what's exactly broken > >>> without change-tracking to the options column from the commit message > > and > >>> the patch itself. > >> > >> I think the reason behind [1] is that ovn-controller (in the pinctrl > >> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, > >> which were then propagated by ovn-northd to > >> Logical_Router_Port.ipv6_prefix. > >> > >> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the > >> "northd" I-P node. But, if the only change that woke up northd was the > >> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is > >> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't > >> run. > >> > >>> I think I need to understand the context before commenting on the > > current > >>> patch. Could you help explain a little more? > >>> > >> > >> I hope this makes it more clear. I'm sure there are more aspects I > >> missed here but I think we need to find a way to make northd work > >> correctly and without delays in reconciling state while at the same > >> time avoiding to affect latency (like in the performance regression > >> I introduced with [0]). > >> > > Thanks Dumitru for the detailed explanation. I think I understand the > > problem now. In both of the cases above, we were in fact monitoring the > > columns as "write-only" while they were in fact "read-write". > > > > 1) The first case (the load balancer one) looks more complex, but > > essentially it is an input to ovn-northd. If we consider "removing > > duplicates" as a feature of northd, although the duplicates would in fact > > only be triggered by northd itself (but not actually generated by itself, > > it was RAFT's problem that actually generated the duplicated row), northd > > is reading the records and then remove the duplicates and write it back. > > That's why the northd needs to track the load-balancer columns and react to > > the changes. (We may still call it "write-only" in this case, which doesn't > > mean northd doesn't read the value, but just that northd is the only one > > that decides what should be there, not worrying about overwriting others' > > changes. I'd rather call it "read-write" to be more precise.) > > > > Sure, but just for completeness, this applies to the following tables > too (they don't have an index defined in the SB schema): DHCP_Options, > DHCPv6_Options, DNS, HA_Chassis. > > Most of these didn't have change tracking enabled before [0] either as > far as I can tell; or at least not directly. We'd have to see if there > are more corner cases to cover. > > > 2) The second case (the IPv6 one) is more straightforward. If > > ovn-controller can modify the column, the column should never be regarded > > as write-only to northd, which is a bug even existed before northd I-P. > > There may be other such bugs that we haven't noticed before, but can be > > exposed by the I-P change. These bugs were not noticed before because > > anything that wakes up the main loop would trigger a recompute before I-P > > implementation, so we were actually handling any changes in time. And the > > problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of > > writes that don't change a column's value.") doesn't apply to this use case > > (correct me if I am wrong). > > > > Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <-> > ovn-controller context because both ovn-northd and ovn-controller don't > really care about transactions and (should) ensure eventual consistency > of the SB data. I'd be careful to claim "don't really care about transactions" for all use cases, not that I know any use cases we do care about, but I just haven't examined every use case of ovn-northd and ovn-controller. Anyway, as long as we have the capability in the IDL to handle each column the way we want (with or without a flag: OVSDB_IDL_WRITE_CHANGED_ONLY), we should be able to handle both cases. Maybe we can assume it is ok to not care about atomicity of writes in northd, and write the changed columns only for now, but later if we figure out there is any problem for some columns, we can always remove the flag for those columns. > > > Now to the solution: > > > > a) To fix the bugs, we shouldn't blindly change-tracking for all tables and > > columns, but instead, track changes for the columns that are read-write but > > treated as write-only by mistake, while keep the real write-only columns > > write-only (e.g. the nb_cfg related columns in SB_Global). I think we > > should fix this regardless. > > > > In theory, yes, but please see "c)" below. > > > b) However, this is not enough, because we are facing the performance > > problem introduced by such fixes, even if we only add change-tracking to > > very few new columns. E.g. the one added by [1] already impacts performance > > significantly even adding tracking for just one column. We need further > > improvement to solve the performance problem, which is primarily caused by > > the behavior introduced by [5]. However, in most use cases we don't have > > the problem mentioned by [5]. I think this is the assumption that leads to > > this patch as the solution for the performance problem and I think it is > > the right approach. It allows northd to get alerted by changes, but only > > write back columns that were changed. I just want to emphasize that we need > > this approach on top of a), and only apply for the columns that are > > read-write but don't worry about atomicity of transactions if only the > > changed values are written back. > > I would add: > > c) However, before [6], the non-incremental implementation of ovn-northd > had the side effect that *any* external change to a SB record/column, > regardless of how the column was monitored, would trigger northd to > reconcile the SB contents. That won't be the case anymore. I'm not > sure if this is a big problem but it might uncover issues that used to > be handled indirectly by the northd full reprocessing. > > E.g., in a sandbox with [0] reverted: > > $ ovn-nbctl ls-add ls > $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb > uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0 , > match=(1), action=(next;) > > # Simulate a lflow "disappearing" from the SB. > $ ovn-sbctl destroy logical_flow f771dbcb > > # Northd doesn't recreate it until a read/write column changes value: > $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb > $ ovn-nbctl --wait=sb sync > $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb > uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0 , > match=(1), action=(next;) > > So, with all this in mind, if we find a good way of defining this new > monitoring mode ("writing back changed columns only") I don't really see > a problem with blindly enabling change tracking and this new mode for > the Southbound. All in all it just ensures the same behavior as before > [6] without any real drawbacks. > I understand c), but I still think a) is important. For the example above, since we expect northd to detect changes in SB logical_flow table and fix it, it should be regarded as read-write. Probably most of the tables and columns fall into this category, but there are exceptions. In some cases we don't even want/need to fix changes. One example is nb_cfg, which is only for sync or performance test purposes. It is purely write-only, and no need to detect any changes from other components. Currently with alert enabled, after every nb_cfg update to the SB by northd, the northd would receive notification for its own change to this column which triggers a recompute unnecessarily. I am not sure if there are other examples. But I think overall we are on the same page. We can detect such cases one-by-one and treat them as write-only. Thanks, Han > > > > With the above understanding, I have some comments for the patch: > > > > i) Functionally I believe it serves the purpose of solving the performance > > problem, but it seems confusing what the modes really mean now. There were > > definitions to each of the flags: > > - OVSDB_IDL_MONITOR /* Replicate this column? */ > > - OVSDB_IDL_ALERT /* Alert client when column changes? */ > > - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name > > suggests. > > > > Each of the meanings was clear. Now with the patch it seems to change the > > meanings of: > > - OVSDB_IDL_ALERT to "alert client when column changes, and the column is > > NOT write-only" > > - OVSDB_IDL_TRACK to "alert client when column changes, and track the > > changes" > > > > This is really confusing. I think the purpose here is just to provide a > > capability of "writing back changed columns only" while still "alert client > > when column changes". Would it be better just to add a new flag for this, > > like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed APIs > > to keep the existed behavior, and provide an API for clients to set it > > explicitly for columns that are read-write but doesn't care about the > > atomicity. > > Sure, sounds better to make it explicit indeed. > > > > > ii) The comments in ovsdb-idl.h under "The possible mode combinations are:" > > needs updating. > > Ack. > > I'll prepare a v3 for the idl change. In the meantime it would be great > if we could already agree on the way we want to enable this in > ovn-northd (i.e., blindly for all SB vs. selectively for some tables). > > > > > Thanks, > > Han > > > >>> [0] > >>> > > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 > >>> [1] > >>> > > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 > >>> > >> > >> [2] > > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency > >> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 > >> [4] > > https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 > >> > > [5] > > https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605 > > > > [6] > https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e > > Thanks, > Dumitru >
On 4/19/22 19:47, Han Zhou wrote: > On Tue, Apr 19, 2022 at 12:49 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 4/16/22 00:41, Han Zhou wrote: >>> On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 4/15/22 20:25, Han Zhou wrote: >>>>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <dceara@redhat.com> > wrote: >>>>>> >>>>>> At a first glance, change tracking should never be allowed for >>>>>> write-only columns. However, some clients (e.g., ovn-northd) that > are >>>>>> mostly exclusive writers of a database, use change tracking to avoid >>>>>> duplicating the IDL row records into a local cache when implementing >>>>>> incremental processing. >>>>> >>>>> Hi Dumitru, >>>>> >>>> >>>> Hi Han, >>>> >>>>> It is still not clear to me why ovn-northd would need change-tracking >>> for >>>>> the write-only columns. It didn't require change tracking before using >>> the >>>>> incremental processing engine. In theory, to my understanding, >>>>> change-tracking is required only if the column is an input to some of >>> the >>>>> engine nodes, so that it needs to be alerted when there is a change in >>> the >>>>> database. If the column is write-only to ovn-northd, it means it > doesn't >>>>> care about any change of the column from the DB, but blindly writes to >>> the >>>>> column regardless of the current value. I checked the patch [0], which >>>>> mentioned that >>>>> >>>>> Such nodes are primary inputs to the northd incremental processing >>>>> engine and without proper update processing for Southbound tables, >>>>> northd might fail to timely reconcile the contents of the > Southbound >>>>> database. >>>>> >>>>> I am confused that, if the columns were write-only, how come they > become >>>>> inputs to northd and requires reconcile upon them? Does it mean that >>> they >>>>> were misused before I-P, that they were in fact read/write columns to >>>>> ovn-northd? >>>> >>>> I think it's actually a bit more complicated than this. >>>> >>>> Even for pure write-only tables/columns we have a problem due to the >>>> fact that clustered ovsdb offers at-least-once consistency [2]. >>>> >>>> We actually hit the case in which a single northd transaction to insert >>>> a SB.Load_Balancer record resulted in two identical rows being added to >>>> the database, both of them pointing to the same logical switch datapath >>>> binding [3]. When that logical switch was deleted northd would fail to >>>> remove the duplicate from the SB causing a referential integrity >>>> violation due to the dangling reference. >>>> >>>> Arguably the schema for load balancers is not ideal because it doesn't >>>> defines an index on "name". Nevertheless, that's not something we can >>>> change easily because it will (quite likely with raft) break existing >>>> deployments. And it applies to other tables in the NB/SB schema. >>>> >>>> The fix was relatively straighforward though [4], and it meant fixing >>>> the way northd reconciles the SB.Load_balancers. >>>> >>>> This is were I-P became an issue. Consider the current main branch >>>> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for >>>> all SB tables.") reverted. In a sandbox we do: >>>> >>>> $ ovn-nbctl ls-add ls >>>> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 >>>> >>>> # At this point the SB.LB table looks ok: >>>> $ ovn-sbctl list load_balancer >>> >>> >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>> # Simulate the bug in [3] >>>> $ ovn-sbctl create load_balancer name=lb1 >>> external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 >>>> >>>> # Even with the northd fix [4], due to the fact that the LB table is >>>> # write-only, northd doesn't wake up so we end up with the duplicate >>>> # staying in the SB for an indeterminate time. >>>> $ ovn-sbctl list load_balancer >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>> _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f >>>> datapaths : [] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {} >>>> protocol : [] >>>> vips : {} >>>> >>>> # Only when we generate an input for a "non-write-only" column the > SB.LB >>>> # is reconciled, because the "northd" I-P node runs. >>>> $ ovn-nbctl --wait=sb sync >>>> $ ovn-sbctl list load_balancer >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>>> Similarly for the patch [1], I didn't figure out what's exactly broken >>>>> without change-tracking to the options column from the commit message >>> and >>>>> the patch itself. >>>> >>>> I think the reason behind [1] is that ovn-controller (in the pinctrl >>>> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, >>>> which were then propagated by ovn-northd to >>>> Logical_Router_Port.ipv6_prefix. >>>> >>>> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the >>>> "northd" I-P node. But, if the only change that woke up northd was the >>>> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is >>>> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't >>>> run. >>>> >>>>> I think I need to understand the context before commenting on the >>> current >>>>> patch. Could you help explain a little more? >>>>> >>>> >>>> I hope this makes it more clear. I'm sure there are more aspects I >>>> missed here but I think we need to find a way to make northd work >>>> correctly and without delays in reconciling state while at the same >>>> time avoiding to affect latency (like in the performance regression >>>> I introduced with [0]). >>>> >>> Thanks Dumitru for the detailed explanation. I think I understand the >>> problem now. In both of the cases above, we were in fact monitoring the >>> columns as "write-only" while they were in fact "read-write". >>> >>> 1) The first case (the load balancer one) looks more complex, but >>> essentially it is an input to ovn-northd. If we consider "removing >>> duplicates" as a feature of northd, although the duplicates would in > fact >>> only be triggered by northd itself (but not actually generated by > itself, >>> it was RAFT's problem that actually generated the duplicated row), > northd >>> is reading the records and then remove the duplicates and write it back. >>> That's why the northd needs to track the load-balancer columns and > react to >>> the changes. (We may still call it "write-only" in this case, which > doesn't >>> mean northd doesn't read the value, but just that northd is the only one >>> that decides what should be there, not worrying about overwriting > others' >>> changes. I'd rather call it "read-write" to be more precise.) >>> >> >> Sure, but just for completeness, this applies to the following tables >> too (they don't have an index defined in the SB schema): DHCP_Options, >> DHCPv6_Options, DNS, HA_Chassis. >> >> Most of these didn't have change tracking enabled before [0] either as >> far as I can tell; or at least not directly. We'd have to see if there >> are more corner cases to cover. >> >>> 2) The second case (the IPv6 one) is more straightforward. If >>> ovn-controller can modify the column, the column should never be > regarded >>> as write-only to northd, which is a bug even existed before northd I-P. >>> There may be other such bugs that we haven't noticed before, but can be >>> exposed by the I-P change. These bugs were not noticed before because >>> anything that wakes up the main loop would trigger a recompute before > I-P >>> implementation, so we were actually handling any changes in time. And > the >>> problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity > of >>> writes that don't change a column's value.") doesn't apply to this use > case >>> (correct me if I am wrong). >>> >> >> Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <-> >> ovn-controller context because both ovn-northd and ovn-controller don't >> really care about transactions and (should) ensure eventual consistency >> of the SB data. > > I'd be careful to claim "don't really care about transactions" for all use > cases, not that I know any use cases we do care about, but I just haven't > examined every use case of ovn-northd and ovn-controller. Anyway, as long > as we have the capability in the IDL to handle each column the way we want > (with or without a flag: OVSDB_IDL_WRITE_CHANGED_ONLY), we should be able > to handle both cases. Maybe we can assume it is ok to not care about > atomicity of writes in northd, and write the changed columns only for now, > but later if we figure out there is any problem for some columns, we can > always remove the flag for those columns. > >> >>> Now to the solution: >>> >>> a) To fix the bugs, we shouldn't blindly change-tracking for all tables > and >>> columns, but instead, track changes for the columns that are read-write > but >>> treated as write-only by mistake, while keep the real write-only columns >>> write-only (e.g. the nb_cfg related columns in SB_Global). I think we >>> should fix this regardless. >>> >> >> In theory, yes, but please see "c)" below. >> >>> b) However, this is not enough, because we are facing the performance >>> problem introduced by such fixes, even if we only add change-tracking to >>> very few new columns. E.g. the one added by [1] already impacts > performance >>> significantly even adding tracking for just one column. We need further >>> improvement to solve the performance problem, which is primarily caused > by >>> the behavior introduced by [5]. However, in most use cases we don't have >>> the problem mentioned by [5]. I think this is the assumption that leads > to >>> this patch as the solution for the performance problem and I think it is >>> the right approach. It allows northd to get alerted by changes, but only >>> write back columns that were changed. I just want to emphasize that we > need >>> this approach on top of a), and only apply for the columns that are >>> read-write but don't worry about atomicity of transactions if only the >>> changed values are written back. >> >> I would add: >> >> c) However, before [6], the non-incremental implementation of ovn-northd >> had the side effect that *any* external change to a SB record/column, >> regardless of how the column was monitored, would trigger northd to >> reconcile the SB contents. That won't be the case anymore. I'm not >> sure if this is a big problem but it might uncover issues that used to >> be handled indirectly by the northd full reprocessing. >> >> E.g., in a sandbox with [0] reverted: >> >> $ ovn-nbctl ls-add ls >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0 , >> match=(1), action=(next;) >> >> # Simulate a lflow "disappearing" from the SB. >> $ ovn-sbctl destroy logical_flow f771dbcb >> >> # Northd doesn't recreate it until a read/write column changes value: >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> $ ovn-nbctl --wait=sb sync >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0 , >> match=(1), action=(next;) >> >> So, with all this in mind, if we find a good way of defining this new >> monitoring mode ("writing back changed columns only") I don't really see >> a problem with blindly enabling change tracking and this new mode for >> the Southbound. All in all it just ensures the same behavior as before >> [6] without any real drawbacks. >> > > I understand c), but I still think a) is important. For the example above, > since we expect northd to detect changes in SB logical_flow table and fix > it, it should be regarded as read-write. Probably most of the tables and > columns fall into this category, but there are exceptions. In some cases we > don't even want/need to fix changes. One example is nb_cfg, which is only > for sync or performance test purposes. It is purely write-only, and no need > to detect any changes from other components. Currently with alert enabled, > after every nb_cfg update to the SB by northd, the northd would receive > notification for its own change to this column which triggers a recompute > unnecessarily. I am not sure if there are other examples. But I think > overall we are on the same page. We can detect such cases one-by-one and > treat them as write-only. > Thanks, Han, for the review! I just posted v3: https://patchwork.ozlabs.org/project/openvswitch/patch/20220420193122.22651-1-dceara@redhat.com/ Regards, Dumitru > Thanks, > Han > >>> >>> With the above understanding, I have some comments for the patch: >>> >>> i) Functionally I believe it serves the purpose of solving the > performance >>> problem, but it seems confusing what the modes really mean now. There > were >>> definitions to each of the flags: >>> - OVSDB_IDL_MONITOR /* Replicate this column? */ >>> - OVSDB_IDL_ALERT /* Alert client when column changes? */ >>> - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name >>> suggests. >>> >>> Each of the meanings was clear. Now with the patch it seems to change > the >>> meanings of: >>> - OVSDB_IDL_ALERT to "alert client when column changes, and the column > is >>> NOT write-only" >>> - OVSDB_IDL_TRACK to "alert client when column changes, and track the >>> changes" >>> >>> This is really confusing. I think the purpose here is just to provide a >>> capability of "writing back changed columns only" while still "alert > client >>> when column changes". Would it be better just to add a new flag for > this, >>> like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed > APIs >>> to keep the existed behavior, and provide an API for clients to set it >>> explicitly for columns that are read-write but doesn't care about the >>> atomicity. >> >> Sure, sounds better to make it explicit indeed. >> >>> >>> ii) The comments in ovsdb-idl.h under "The possible mode combinations > are:" >>> needs updating. >> >> Ack. >> >> I'll prepare a v3 for the idl change. In the meantime it would be great >> if we could already agree on the way we want to enable this in >> ovn-northd (i.e., blindly for all SB vs. selectively for some tables). >> >>> >>> Thanks, >>> Han >>> >>>>> [0] >>>>> >>> > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 >>>>> [1] >>>>> >>> > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 >>>>> >>>> >>>> [2] >>> > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency >>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 >>>> [4] >>> > https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 >>>> >>> [5] >>> > https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605 >>> >> >> [6] >> > https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e >> >> Thanks, >> Dumitru >> >
diff --git a/NEWS b/NEWS index 5074b97aa51c..c1c41da94e9b 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,10 @@ Post-v2.17.0 - OVSDB: * 'relay' service model now supports transaction history, i.e. honors the 'last-txn-id' field in 'monitor_cond_since' requests from clients. + - OVSDB-IDL: + * New interfaces for explcitly setting column modes. The combination + 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' is now considered valid and + enables change tracking of write-only columns. v2.17.0 - 17 Feb 2022 diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 882ede75598d..5f62c267d496 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -859,20 +859,6 @@ ovsdb_idl_get_mode(struct ovsdb_idl *idl, return &table->modes[column - table->class_->columns]; } -static void -ovsdb_idl_set_mode(struct ovsdb_idl *idl, - const struct ovsdb_idl_column *column, - unsigned char mode) -{ - const struct ovsdb_idl_table *table = ovsdb_idl_table_from_column(idl, - column); - size_t column_idx = column - table->class_->columns; - - if (table->modes[column_idx] != mode) { - *ovsdb_idl_get_mode(idl, column) = mode; - } -} - static void add_ref_table(struct ovsdb_idl *idl, const struct ovsdb_base_type *base) { @@ -902,7 +888,8 @@ void ovsdb_idl_add_column(struct ovsdb_idl *idl, const struct ovsdb_idl_column *column) { - ovsdb_idl_set_mode(idl, column, OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); + ovsdb_idl_set_column_mode(idl, column, + OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT); add_ref_table(idl, &column->type.key); add_ref_table(idl, &column->type.value); } @@ -1186,6 +1173,51 @@ ovsdb_idl_omit_alert(struct ovsdb_idl *idl, *ovsdb_idl_get_mode(idl, column) &= ~(OVSDB_IDL_ALERT | OVSDB_IDL_TRACK); } +/* Sets the 'column' mode to 'mode' which must be a valid combination of + * OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT and OVSDB_IDL_TRACK. + */ +void +ovsdb_idl_set_column_mode(struct ovsdb_idl *idl, + const struct ovsdb_idl_column *column, + unsigned char mode) +{ + if (mode & ~(OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { + VLOG_ABORT("%s(): Column '%s' unknown mode %02x.", + __func__, column->name, mode); + } + + if ((mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) + && !(mode & OVSDB_IDL_MONITOR)) { + VLOG_ABORT("%s(): Column '%s' mode must include OVSDB_IDL_MONITOR.", + __func__, column->name); + } + + const struct ovsdb_idl_table *table = ovsdb_idl_table_from_column(idl, + column); + size_t column_idx = column - table->class_->columns; + + if (table->modes[column_idx] != mode) { + *ovsdb_idl_get_mode(idl, column) = mode; + } +} + +/* Walks all columns of tables in 'idl' and sets their mode to 'mode' + * which must be a valid combination of OVSDB_IDL_MONITOR, OVSDB_IDL_ALERT + * and OVSDB_IDL_TRACK. + */ +void +ovsdb_idl_set_column_mode_all(struct ovsdb_idl *idl, unsigned char mode) +{ + for (size_t i = 0; i < idl->class_->n_tables; i++) { + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; + + for (size_t j = 0; j < tc->n_columns; j++) { + const struct ovsdb_idl_column *column = &tc->columns[j]; + ovsdb_idl_set_column_mode(idl, column, mode); + } + } +} + /* Sets the mode for 'column' in 'idl' to 0. See the big comment above * OVSDB_IDL_MONITOR for details. * @@ -1236,8 +1268,8 @@ ovsdb_idl_row_get_seqno(const struct ovsdb_idl_row *row, * functions. * * This function should be called between ovsdb_idl_create() and - * the first call to ovsdb_idl_run(). The column to be tracked - * should have OVSDB_IDL_ALERT turned on. + * the first call to ovsdb_idl_run(). The function will also ensure + * that the column to be tracked has OVSDB_IDL_ALERT turned on. */ void ovsdb_idl_track_add_column(struct ovsdb_idl *idl, @@ -1246,7 +1278,9 @@ ovsdb_idl_track_add_column(struct ovsdb_idl *idl, if (!(*ovsdb_idl_get_mode(idl, column) & OVSDB_IDL_ALERT)) { ovsdb_idl_add_column(idl, column); } - *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_TRACK; + + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK; + ovsdb_idl_set_column_mode(idl, column, mode); } void @@ -1694,11 +1728,14 @@ ovsdb_idl_row_change(struct ovsdb_idl_row *row, const struct shash *values, continue; } - if (datum_changed && table->modes[column_idx] & OVSDB_IDL_ALERT) { - changed = true; - row->change_seqno[change] - = row->table->change_seqno[change] - = row->table->idl->change_seqno + 1; + if (datum_changed) { + unsigned char column_mode = table->modes[column_idx]; + if (column_mode & (OVSDB_IDL_ALERT | OVSDB_IDL_TRACK)) { + changed = true; + row->change_seqno[change] + = row->table->change_seqno[change] + = row->table->idl->change_seqno + 1; + } if (table->modes[column_idx] & OVSDB_IDL_TRACK) { if (ovs_list_is_empty(&row->track_node) && @@ -3501,7 +3538,7 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, class = row->table->class_; column_idx = column - class->columns; - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; + write_only = !(row->table->modes[column_idx] & OVSDB_IDL_ALERT); ovs_assert(row->new_datum != NULL); ovs_assert(column_idx < class->n_columns); diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h index d00599616ef9..2d74ed0727a7 100644 --- a/lib/ovsdb-idl.h +++ b/lib/ovsdb-idl.h @@ -180,7 +180,8 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, * * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column * that a client wants to track using the change tracking - * ovsdb_idl_track_get_*() functions. + * ovsdb_idl_track_get_*() functions. OVSDB_IDL_ALERT can be omitted for + * write-only columns. */ #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ @@ -193,6 +194,11 @@ void ovsdb_idl_add_table(struct ovsdb_idl *, void ovsdb_idl_omit(struct ovsdb_idl *, const struct ovsdb_idl_column *); void ovsdb_idl_omit_alert(struct ovsdb_idl *, const struct ovsdb_idl_column *); +void ovsdb_idl_set_column_mode(struct ovsdb_idl *, + const struct ovsdb_idl_column *, + unsigned char mode); +void ovsdb_idl_set_column_mode_all(struct ovsdb_idl *, unsigned char mode); + /* Change tracking. * * In OVSDB, change tracking is applied at each client in the IDL layer. This diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at index 62e2b638320c..0436e0f77b0f 100644 --- a/tests/ovsdb-idl.at +++ b/tests/ovsdb-idl.at @@ -1154,12 +1154,25 @@ OVSDB_CHECK_IDL_WO_MONITOR_COND([simple idl disable monitor-cond], ]]) m4_define([OVSDB_CHECK_IDL_TRACK_C], - [AT_SETUP([$1 - C]) + [AT_SETUP([$1 - alert - C]) + AT_KEYWORDS([ovsdb server idl tracking positive $5]) + AT_CHECK([ovsdb_start_idltest]) + m4_if([$2], [], [], + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 --change-track=alert idl unix:socket $3], + [0], [stdout], [ignore]) + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), + [0], [$4]) + OVSDB_SERVER_SHUTDOWN + AT_CLEANUP]) + +m4_define([OVSDB_CHECK_IDL_TRACK_NOALERT_C], + [AT_SETUP([$1 - noalert - C]) AT_KEYWORDS([ovsdb server idl tracking positive $5]) AT_CHECK([ovsdb_start_idltest]) m4_if([$2], [], [], [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) - AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c idl unix:socket $3], + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 --change-track=noalert idl unix:socket $3], [0], [stdout], [ignore]) AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), [0], [$4]) @@ -1167,7 +1180,8 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], AT_CLEANUP]) m4_define([OVSDB_CHECK_IDL_TRACK], - [OVSDB_CHECK_IDL_TRACK_C($@)]) + [OVSDB_CHECK_IDL_TRACK_C($@) + OVSDB_CHECK_IDL_TRACK_NOALERT_C($@)]) OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], [['["idltest", diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c index ca4e87b8115c..abd30cf27d18 100644 --- a/tests/test-ovsdb.c +++ b/tests/test-ovsdb.c @@ -56,6 +56,7 @@ VLOG_DEFINE_THIS_MODULE(test_ovsdb); struct test_ovsdb_pvt_context { + bool track_alert; bool track; }; @@ -122,6 +123,15 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) break; case 'c': + if (optarg) { + if (!strcmp(optarg, "noalert")) { + pvt->track_alert = false; + } else if (!strcmp(optarg, "alert")) { + pvt->track_alert = true; + } else { + ovs_fatal(0, "value %s is invalid", optarg); + } + } pvt->track = true; break; @@ -2610,6 +2620,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) static void do_idl(struct ovs_cmdl_context *ctx) { + struct test_ovsdb_pvt_context *pvt = ctx->pvt; struct jsonrpc *rpc; struct ovsdb_idl *idl; unsigned int seqno = 0; @@ -2618,9 +2629,6 @@ do_idl(struct ovs_cmdl_context *ctx) int step = 0; int error; int i; - bool track; - - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); ovsdb_idl_set_leader_only(idl, false); @@ -2637,8 +2645,13 @@ do_idl(struct ovs_cmdl_context *ctx) rpc = NULL; } - if (track) { - ovsdb_idl_track_add_all(idl); + if (pvt->track) { + unsigned char mode = OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK; + + if (pvt->track_alert) { + mode |= OVSDB_IDL_ALERT; + } + ovsdb_idl_set_column_mode_all(idl, mode); } setvbuf(stdout, NULL, _IONBF, 0); @@ -2683,7 +2696,7 @@ do_idl(struct ovs_cmdl_context *ctx) } /* Print update. */ - if (track) { + if (pvt->track) { print_idl_track(idl, step++, terse); ovsdb_idl_track_clear(idl); } else {
At a first glance, change tracking should never be allowed for write-only columns. However, some clients (e.g., ovn-northd) that are mostly exclusive writers of a database, use change tracking to avoid duplicating the IDL row records into a local cache when implementing incremental processing. The default behavior of the IDL is to automatically turn a write-only column into a read-write column whenever the client enables change tracking for that column. For the afore mentioned clients, this becomes a performance issue. Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't change a column's value.") explains why writes that don't change a column's value cannot be optimized out early if the column is read/write. Furthermore, if there is at least one record in any table that changed during a transaction, then *all* records that have been written are added to the transaction, even if their values didn't change. If there are many such rows (e.g., like in ovn-northd's case) this incurs a significant overhead because: a. the client has to build this large transaction b. the transaction has to be sent over the network c. the server needs to parse this (mostly) no-op update We now introduce new IDL APIs allowing users to explicitly request monitoring mode combinations for different columns in the IDL. These accept the combination 'OVSDB_IDL_MONITOR | OVSDB_IDL_TRACK' as a valid input allowing users to change track write-only tables. We benchmarked ovn-northd performance when using this new mode against NB and SB databases taken from ovn-kubernetes scale tests. We noticed that when a minor change is performed to the Northbound database (e.g., NB_Global.nb_cfg is incremented) the time it takes to build the Southbound transaction becomes negligible (vs ~1.5 seconds before this change). End-to-end ovn-kubernetes scale tests on 120-node clusters also show significant reduction of latency to bring up pods; both average and P99 latency decreased by ~30%. Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- V2: - Addressed Numan's comments: - Added APIs to allow per column configuration of modes. - Fixed unit tests to actually enable change tracking for write-only columns. - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change seqnos if change tracking is enabled (even for write-only rows). Note: The OVN counter part change is: https://github.com/dceara/ovn/commit/0d0c689e6469ae2f8e195c423e9a91fbc514bd60 --- NEWS | 4 +++ lib/ovsdb-idl.c | 85 +++++++++++++++++++++++++++++++++------------- lib/ovsdb-idl.h | 8 ++++- tests/ovsdb-idl.at | 20 +++++++++-- tests/test-ovsdb.c | 25 ++++++++++---- 5 files changed, 108 insertions(+), 34 deletions(-)