Message ID | 20220420193122.22651-1-dceara@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,v3] ovsdb-idl: Support write-only-changed IDL monitor mode. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote: > > At a first glance, change tracking should never be allowed for > write-only columns. However, some clients (e.g., ovn-northd) that are > mostly exclusive writers of a database, use change tracking to avoid > duplicating the IDL row records into a local cache when implementing > incremental processing. > > The default behavior of the IDL is to automatically turn a write-only > column into a read-write column whenever the client enables change > tracking for that column. > > For the afore mentioned clients, this becomes a performance issue. > Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't > change a column's value.") explains why writes that don't change a > column's value cannot be optimized out early if the column is > read/write. > > Furthermore, if there is at least one record in any table that > changed during a transaction, then *all* records that have been > written are added to the transaction, even if their values didn't > change. If there are many such rows (e.g., like in ovn-northd's > case) this incurs a significant overhead because: > a. the client has to build this large transaction > b. the transaction has to be sent over the network > c. the server needs to parse this (mostly) no-op update > > We now introduce new IDL APIs allowing users to set a new monitoring > mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the > atomicity constraints may be relaxed and written columns that don't > change value can be skipped from the current transaction. Is this something that we should consider adding at some point to the Python IDL code as well? I know in the past feature support has diverged. Maybe at some point we should create a python/TODO file that lists features in the C IDL that still need to be implemented if they aren't immediately added with C IDL changes? > We benchmarked ovn-northd performance when using this new mode > against NB and SB databases taken from ovn-kubernetes scale tests. > We noticed that when a minor change is performed to the Northbound > database (e.g., NB_Global.nb_cfg is incremented) the time it takes to > build the Southbound transaction becomes negligible (vs ~1.5 seconds > before this change). > > End-to-end ovn-kubernetes scale tests on 120-node clusters also show > significant reduction of latency to bring up pods; both average and P99 > latency decreased by ~30%. > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > V3: > - Addressed Han's comments: > - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. > - Rephrased commit log. > - Changed commit title to reflect the new approach. > - Old patch (v2) was: > https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ > V2: > - Addressed Numan's comments: > - Added APIs to allow per column configuration of modes. > - Fixed unit tests to actually enable change tracking for write-only > columns. > - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change > seqnos if change tracking is enabled (even for write-only rows). > > Note: The OVN counter part change is: > https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 > --- > NEWS | 4 ++++ > lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- > lib/ovsdb-idl.h | 14 +++++++++++- > tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- > tests/test-ovsdb.c | 18 +++++++++++---- > 5 files changed, 111 insertions(+), 13 deletions(-) > > diff --git a/NEWS b/NEWS > index 5074b97aa51c..60932791bfcf 100644 > --- a/NEWS > +++ b/NEWS > @@ -14,6 +14,10 @@ Post-v2.17.0 > - OVSDB: > * 'relay' service model now supports transaction history, i.e. honors the > 'last-txn-id' field in 'monitor_cond_since' requests from clients. > + - OVSDB-IDL: > + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing > + applications to relax atomicity requirements when dealing with > + columns whose value has been rewritten (but not changed). > > > v2.17.0 - 17 Feb 2022 > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > index 882ede75598d..e5bb6cca5274 100644 > --- a/lib/ovsdb-idl.c > +++ b/lib/ovsdb-idl.c > @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) > { > ovsdb_idl_track_clear__(idl, false); > } > + > +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY > + * for 'column' in 'idl', ensuring that the column will be included in a > + * transaction only if its value has actually changed locally. Normally > + * read/write columns that are written to are always included in the > + * transaction but, in specific cases, when the application doesn't > + * require atomicity of writes across different columns the ones that > + * don't chave value may be skipped. > + * > + * This function should be called between ovsdb_idl_create() and > + * the first call to ovsdb_idl_run(). > + */ > +void > +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > + const struct ovsdb_idl_column *column, > + bool enable) > +{ > + if (enable) { > + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; > + } else { > + *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY; > + } > +} > + > +void > +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) > +{ > + for (size_t i = 0; i < idl->class_->n_tables; i++) { > + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; > + > + for (size_t j = 0; j < tc->n_columns; j++) { > + const struct ovsdb_idl_column *column = &tc->columns[j]; > + ovsdb_idl_set_write_change_only(idl, column, enable); > + } > + } > +} > > static void > log_parse_update_error(struct ovsdb_error *error) > @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > { > struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); > const struct ovsdb_idl_table_class *class; > + unsigned char column_mode; > + bool optimize_rewritten; > size_t column_idx; > bool write_only; > > @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > > class = row->table->class_; > column_idx = column - class->columns; > - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; > + column_mode = row->table->modes[column_idx]; > + write_only = column_mode == OVSDB_IDL_MONITOR; > + optimize_rewritten = > + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); > + > > ovs_assert(row->new_datum != NULL); > ovs_assert(column_idx < class->n_columns); > - ovs_assert(row->old_datum == NULL || > - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); > + ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR); > > if (row->table->idl->verify_write_only && !write_only) { > VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when" > @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > * different value in that column since we read it. (But if a whole > * transaction only does writes of existing values, without making any real > * changes, we will drop the whole transaction later in > - * ovsdb_idl_txn_commit().) */ > - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > - datum, &column->type)) { > + * ovsdb_idl_txn_commit().) > + * > + * The application may choose to bypass this restriction and always > + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. > + */ > + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column), > + datum, &column->type)) { > goto discard_datum; > } > > diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h > index d00599616ef9..1716955ab61c 100644 > --- a/lib/ovsdb-idl.h > +++ b/lib/ovsdb-idl.h > @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, > * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column > * that a client wants to track using the change tracking > * ovsdb_idl_track_get_*() functions. > + * > + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY) > + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it > + * only adds a written column to a transaction if the column's value > + * has actually changed. > */ > #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ > #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ > -#define OVSDB_IDL_TRACK (1 << 2) > +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ > +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ > + (1 << 3) /* Write back only changed columns? */ > > void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *); > void ovsdb_idl_add_table(struct ovsdb_idl *, > @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row, > const struct ovsdb_idl_column *column); > void ovsdb_idl_track_clear(struct ovsdb_idl *); > > +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > + const struct ovsdb_idl_column *column, > + bool enable); > +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable); > + > > /* Reading the database replica. */ > > diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > index 62e2b638320c..d1cfa59c5758 100644 > --- a/tests/ovsdb-idl.at > +++ b/tests/ovsdb-idl.at > @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], > OVSDB_SERVER_SHUTDOWN > AT_CLEANUP]) > > +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. > +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], > + [AT_SETUP([$1 - write-changed-only - C]) > + AT_KEYWORDS([ovsdb server idl positive $5]) > + AT_CHECK([ovsdb_start_idltest]) > + m4_if([$2], [], [], > + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3], > + [0], [stdout], [ignore]) > + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > + [0], [$4]) > + OVSDB_SERVER_SHUTDOWN > + AT_CLEANUP]) > + > # same as OVSDB_CHECK_IDL but uses tcp. > m4_define([OVSDB_CHECK_IDL_TCP_C], > [AT_SETUP([$1 - C - tcp]) > @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], > > m4_define([OVSDB_CHECK_IDL], > [OVSDB_CHECK_IDL_C($@) > + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) > OVSDB_CHECK_IDL_TCP_C($@) > OVSDB_CHECK_IDL_TCP6_C($@) > OVSDB_CHECK_IDL_PY($@) > @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], > OVSDB_SERVER_SHUTDOWN > AT_CLEANUP]) > > +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], > + [AT_SETUP([$1 - write-changed-only - C]) > + AT_KEYWORDS([ovsdb server idl tracking positive $5]) > + AT_CHECK([ovsdb_start_idltest]) > + m4_if([$2], [], [], > + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3], > + [0], [stdout], [ignore]) > + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > + [0], [$4]) > + OVSDB_SERVER_SHUTDOWN > + AT_CLEANUP]) > + > m4_define([OVSDB_CHECK_IDL_TRACK], > - [OVSDB_CHECK_IDL_TRACK_C($@)]) > + [OVSDB_CHECK_IDL_TRACK_C($@) > + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) > > OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], > [['["idltest", > diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c > index ca4e87b8115c..808b15355743 100644 > --- a/tests/test-ovsdb.c > +++ b/tests/test-ovsdb.c > @@ -56,6 +56,7 @@ > VLOG_DEFINE_THIS_MODULE(test_ovsdb); > > struct test_ovsdb_pvt_context { > + bool write_changed_only; > bool track; > }; > > @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > {"timeout", required_argument, NULL, 't'}, > {"verbose", optional_argument, NULL, 'v'}, > {"change-track", optional_argument, NULL, 'c'}, > + {"write-changed-only", optional_argument, NULL, 'w'}, > {"magic", required_argument, NULL, OPT_MAGIC}, > {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES}, > {"help", no_argument, NULL, 'h'}, > @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > pvt->track = true; > break; > > + case 'w': > + pvt->write_changed_only = true; > + break; > + > case OPT_MAGIC: > magic = optarg; > break; > @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) > static void > do_idl(struct ovs_cmdl_context *ctx) > { > + struct test_ovsdb_pvt_context *pvt = ctx->pvt; > struct jsonrpc *rpc; > struct ovsdb_idl *idl; > unsigned int seqno = 0; > @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) > int step = 0; > int error; > int i; > - bool track; > - > - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; > > idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); > ovsdb_idl_set_leader_only(idl, false); > @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) > rpc = NULL; > } > > - if (track) { > + if (pvt->track) { > ovsdb_idl_track_add_all(idl); > } > > + if (pvt->write_changed_only) { > + ovsdb_idl_set_write_change_only_all(idl, true); > + } > + > setvbuf(stdout, NULL, _IONBF, 0); > > symtab = ovsdb_symbol_table_create(); > @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) > } > > /* Print update. */ > - if (track) { > + if (pvt->track) { > print_idl_track(idl, step++, terse); > ovsdb_idl_track_clear(idl); > } else { > -- > 2.27.0 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 4/22/22 20:38, Terry Wilson wrote: > On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote: >> >> At a first glance, change tracking should never be allowed for >> write-only columns. However, some clients (e.g., ovn-northd) that are >> mostly exclusive writers of a database, use change tracking to avoid >> duplicating the IDL row records into a local cache when implementing >> incremental processing. >> >> The default behavior of the IDL is to automatically turn a write-only >> column into a read-write column whenever the client enables change >> tracking for that column. >> >> For the afore mentioned clients, this becomes a performance issue. >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't >> change a column's value.") explains why writes that don't change a >> column's value cannot be optimized out early if the column is >> read/write. >> >> Furthermore, if there is at least one record in any table that >> changed during a transaction, then *all* records that have been >> written are added to the transaction, even if their values didn't >> change. If there are many such rows (e.g., like in ovn-northd's >> case) this incurs a significant overhead because: >> a. the client has to build this large transaction >> b. the transaction has to be sent over the network >> c. the server needs to parse this (mostly) no-op update >> >> We now introduce new IDL APIs allowing users to set a new monitoring >> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the >> atomicity constraints may be relaxed and written columns that don't >> change value can be skipped from the current transaction. > > Is this something that we should consider adding at some point to the > Python IDL code as well? I know in the past feature support has > diverged. Maybe at some point we should create a python/TODO file that > lists features in the C IDL that still need to be implemented if they > aren't immediately added with C IDL changes? You're right, we should [robably consider adding this as a new mode to the Python IDL too. I can do that as a follow up if needed. However, until now, the only user of this new mode is ovn-northd and it will be using it in conjunction with change tracking (which is another missing Python IDL feature). Han, Ilya, what do you think? Nevertheless, a TODO list would probably be good to have indeed. > >> We benchmarked ovn-northd performance when using this new mode >> against NB and SB databases taken from ovn-kubernetes scale tests. >> We noticed that when a minor change is performed to the Northbound >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to >> build the Southbound transaction becomes negligible (vs ~1.5 seconds >> before this change). >> >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show >> significant reduction of latency to bring up pods; both average and P99 >> latency decreased by ~30%. >> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >> --- >> V3: >> - Addressed Han's comments: >> - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. >> - Rephrased commit log. >> - Changed commit title to reflect the new approach. >> - Old patch (v2) was: >> https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ >> V2: >> - Addressed Numan's comments: >> - Added APIs to allow per column configuration of modes. >> - Fixed unit tests to actually enable change tracking for write-only >> columns. >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change >> seqnos if change tracking is enabled (even for write-only rows). >> >> Note: The OVN counter part change is: >> https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 >> --- >> NEWS | 4 ++++ >> lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- >> lib/ovsdb-idl.h | 14 +++++++++++- >> tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- >> tests/test-ovsdb.c | 18 +++++++++++---- >> 5 files changed, 111 insertions(+), 13 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 5074b97aa51c..60932791bfcf 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -14,6 +14,10 @@ Post-v2.17.0 >> - OVSDB: >> * 'relay' service model now supports transaction history, i.e. honors the >> 'last-txn-id' field in 'monitor_cond_since' requests from clients. >> + - OVSDB-IDL: >> + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing >> + applications to relax atomicity requirements when dealing with >> + columns whose value has been rewritten (but not changed). >> >> >> v2.17.0 - 17 Feb 2022 >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >> index 882ede75598d..e5bb6cca5274 100644 >> --- a/lib/ovsdb-idl.c >> +++ b/lib/ovsdb-idl.c >> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) >> { >> ovsdb_idl_track_clear__(idl, false); >> } >> + >> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY >> + * for 'column' in 'idl', ensuring that the column will be included in a >> + * transaction only if its value has actually changed locally. Normally >> + * read/write columns that are written to are always included in the >> + * transaction but, in specific cases, when the application doesn't >> + * require atomicity of writes across different columns the ones that >> + * don't chave value may be skipped. >> + * >> + * This function should be called between ovsdb_idl_create() and >> + * the first call to ovsdb_idl_run(). >> + */ >> +void >> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, >> + const struct ovsdb_idl_column *column, >> + bool enable) >> +{ >> + if (enable) { >> + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; >> + } else { >> + *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY; >> + } >> +} >> + >> +void >> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) >> +{ >> + for (size_t i = 0; i < idl->class_->n_tables; i++) { >> + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; >> + >> + for (size_t j = 0; j < tc->n_columns; j++) { >> + const struct ovsdb_idl_column *column = &tc->columns[j]; >> + ovsdb_idl_set_write_change_only(idl, column, enable); >> + } >> + } >> +} >> >> static void >> log_parse_update_error(struct ovsdb_error *error) >> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, >> { >> struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); >> const struct ovsdb_idl_table_class *class; >> + unsigned char column_mode; >> + bool optimize_rewritten; >> size_t column_idx; >> bool write_only; >> >> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, >> >> class = row->table->class_; >> column_idx = column - class->columns; >> - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; >> + column_mode = row->table->modes[column_idx]; >> + write_only = column_mode == OVSDB_IDL_MONITOR; >> + optimize_rewritten = >> + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); >> + >> >> ovs_assert(row->new_datum != NULL); >> ovs_assert(column_idx < class->n_columns); >> - ovs_assert(row->old_datum == NULL || >> - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); >> + ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR); >> >> if (row->table->idl->verify_write_only && !write_only) { >> VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when" >> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, >> * different value in that column since we read it. (But if a whole >> * transaction only does writes of existing values, without making any real >> * changes, we will drop the whole transaction later in >> - * ovsdb_idl_txn_commit().) */ >> - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), >> - datum, &column->type)) { >> + * ovsdb_idl_txn_commit().) >> + * >> + * The application may choose to bypass this restriction and always >> + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. >> + */ >> + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column), >> + datum, &column->type)) { >> goto discard_datum; >> } >> >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h >> index d00599616ef9..1716955ab61c 100644 >> --- a/lib/ovsdb-idl.h >> +++ b/lib/ovsdb-idl.h >> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, >> * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column >> * that a client wants to track using the change tracking >> * ovsdb_idl_track_get_*() functions. >> + * >> + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY) >> + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it >> + * only adds a written column to a transaction if the column's value >> + * has actually changed. >> */ >> #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ >> #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ >> -#define OVSDB_IDL_TRACK (1 << 2) >> +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ >> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ >> + (1 << 3) /* Write back only changed columns? */ >> >> void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *); >> void ovsdb_idl_add_table(struct ovsdb_idl *, >> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row, >> const struct ovsdb_idl_column *column); >> void ovsdb_idl_track_clear(struct ovsdb_idl *); >> >> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, >> + const struct ovsdb_idl_column *column, >> + bool enable); >> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable); >> + >> >> /* Reading the database replica. */ >> >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at >> index 62e2b638320c..d1cfa59c5758 100644 >> --- a/tests/ovsdb-idl.at >> +++ b/tests/ovsdb-idl.at >> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], >> OVSDB_SERVER_SHUTDOWN >> AT_CLEANUP]) >> >> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. >> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], >> + [AT_SETUP([$1 - write-changed-only - C]) >> + AT_KEYWORDS([ovsdb server idl positive $5]) >> + AT_CHECK([ovsdb_start_idltest]) >> + m4_if([$2], [], [], >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3], >> + [0], [stdout], [ignore]) >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> + [0], [$4]) >> + OVSDB_SERVER_SHUTDOWN >> + AT_CLEANUP]) >> + >> # same as OVSDB_CHECK_IDL but uses tcp. >> m4_define([OVSDB_CHECK_IDL_TCP_C], >> [AT_SETUP([$1 - C - tcp]) >> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], >> >> m4_define([OVSDB_CHECK_IDL], >> [OVSDB_CHECK_IDL_C($@) >> + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) >> OVSDB_CHECK_IDL_TCP_C($@) >> OVSDB_CHECK_IDL_TCP6_C($@) >> OVSDB_CHECK_IDL_PY($@) >> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], >> OVSDB_SERVER_SHUTDOWN >> AT_CLEANUP]) >> >> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], >> + [AT_SETUP([$1 - write-changed-only - C]) >> + AT_KEYWORDS([ovsdb server idl tracking positive $5]) >> + AT_CHECK([ovsdb_start_idltest]) >> + m4_if([$2], [], [], >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3], >> + [0], [stdout], [ignore]) >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> + [0], [$4]) >> + OVSDB_SERVER_SHUTDOWN >> + AT_CLEANUP]) >> + >> m4_define([OVSDB_CHECK_IDL_TRACK], >> - [OVSDB_CHECK_IDL_TRACK_C($@)]) >> + [OVSDB_CHECK_IDL_TRACK_C($@) >> + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) >> >> OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], >> [['["idltest", >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c >> index ca4e87b8115c..808b15355743 100644 >> --- a/tests/test-ovsdb.c >> +++ b/tests/test-ovsdb.c >> @@ -56,6 +56,7 @@ >> VLOG_DEFINE_THIS_MODULE(test_ovsdb); >> >> struct test_ovsdb_pvt_context { >> + bool write_changed_only; >> bool track; >> }; >> >> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) >> {"timeout", required_argument, NULL, 't'}, >> {"verbose", optional_argument, NULL, 'v'}, >> {"change-track", optional_argument, NULL, 'c'}, >> + {"write-changed-only", optional_argument, NULL, 'w'}, >> {"magic", required_argument, NULL, OPT_MAGIC}, >> {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES}, >> {"help", no_argument, NULL, 'h'}, >> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) >> pvt->track = true; >> break; >> >> + case 'w': >> + pvt->write_changed_only = true; >> + break; >> + >> case OPT_MAGIC: >> magic = optarg; >> break; >> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) >> static void >> do_idl(struct ovs_cmdl_context *ctx) >> { >> + struct test_ovsdb_pvt_context *pvt = ctx->pvt; >> struct jsonrpc *rpc; >> struct ovsdb_idl *idl; >> unsigned int seqno = 0; >> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) >> int step = 0; >> int error; >> int i; >> - bool track; >> - >> - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; >> >> idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); >> ovsdb_idl_set_leader_only(idl, false); >> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) >> rpc = NULL; >> } >> >> - if (track) { >> + if (pvt->track) { >> ovsdb_idl_track_add_all(idl); >> } >> >> + if (pvt->write_changed_only) { >> + ovsdb_idl_set_write_change_only_all(idl, true); >> + } >> + >> setvbuf(stdout, NULL, _IONBF, 0); >> >> symtab = ovsdb_symbol_table_create(); >> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) >> } >> >> /* Print update. */ >> - if (track) { >> + if (pvt->track) { >> print_idl_track(idl, step++, terse); >> ovsdb_idl_track_clear(idl); >> } else { >> -- >> 2.27.0 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >
On Wed, Apr 20, 2022 at 12:31 PM Dumitru Ceara <dceara@redhat.com> wrote: > > At a first glance, change tracking should never be allowed for > write-only columns. However, some clients (e.g., ovn-northd) that are > mostly exclusive writers of a database, use change tracking to avoid > duplicating the IDL row records into a local cache when implementing > incremental processing. > > The default behavior of the IDL is to automatically turn a write-only > column into a read-write column whenever the client enables change > tracking for that column. > > For the afore mentioned clients, this becomes a performance issue. > Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't > change a column's value.") explains why writes that don't change a > column's value cannot be optimized out early if the column is > read/write. > > Furthermore, if there is at least one record in any table that > changed during a transaction, then *all* records that have been > written are added to the transaction, even if their values didn't > change. If there are many such rows (e.g., like in ovn-northd's > case) this incurs a significant overhead because: > a. the client has to build this large transaction > b. the transaction has to be sent over the network > c. the server needs to parse this (mostly) no-op update > > We now introduce new IDL APIs allowing users to set a new monitoring > mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the > atomicity constraints may be relaxed and written columns that don't > change value can be skipped from the current transaction. > > We benchmarked ovn-northd performance when using this new mode > against NB and SB databases taken from ovn-kubernetes scale tests. > We noticed that when a minor change is performed to the Northbound > database (e.g., NB_Global.nb_cfg is incremented) the time it takes to > build the Southbound transaction becomes negligible (vs ~1.5 seconds > before this change). > > End-to-end ovn-kubernetes scale tests on 120-node clusters also show > significant reduction of latency to bring up pods; both average and P99 > latency decreased by ~30%. > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> Thanks Dumitru for the revision. I only have two minor comments below. Acked-by: Han Zhou <hzhou@ovn.org> > --- > V3: > - Addressed Han's comments: > - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. > - Rephrased commit log. > - Changed commit title to reflect the new approach. > - Old patch (v2) was: > https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ > V2: > - Addressed Numan's comments: > - Added APIs to allow per column configuration of modes. > - Fixed unit tests to actually enable change tracking for write-only > columns. > - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change > seqnos if change tracking is enabled (even for write-only rows). > > Note: The OVN counter part change is: > https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 > --- > NEWS | 4 ++++ > lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- > lib/ovsdb-idl.h | 14 +++++++++++- > tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- > tests/test-ovsdb.c | 18 +++++++++++---- > 5 files changed, 111 insertions(+), 13 deletions(-) > > diff --git a/NEWS b/NEWS > index 5074b97aa51c..60932791bfcf 100644 > --- a/NEWS > +++ b/NEWS > @@ -14,6 +14,10 @@ Post-v2.17.0 > - OVSDB: > * 'relay' service model now supports transaction history, i.e. honors the > 'last-txn-id' field in 'monitor_cond_since' requests from clients. > + - OVSDB-IDL: > + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing > + applications to relax atomicity requirements when dealing with > + columns whose value has been rewritten (but not changed). > > > v2.17.0 - 17 Feb 2022 > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > index 882ede75598d..e5bb6cca5274 100644 > --- a/lib/ovsdb-idl.c > +++ b/lib/ovsdb-idl.c > @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) > { > ovsdb_idl_track_clear__(idl, false); > } > + > +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY > + * for 'column' in 'idl', ensuring that the column will be included in a > + * transaction only if its value has actually changed locally. Normally > + * read/write columns that are written to are always included in the > + * transaction but, in specific cases, when the application doesn't > + * require atomicity of writes across different columns the ones that > + * don't chave value may be skipped. s/chave/change > + * > + * This function should be called between ovsdb_idl_create() and > + * the first call to ovsdb_idl_run(). > + */ > +void > +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > + const struct ovsdb_idl_column *column, > + bool enable) > +{ > + if (enable) { > + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; > + } else { > + *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY; > + } > +} > + > +void > +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) > +{ > + for (size_t i = 0; i < idl->class_->n_tables; i++) { > + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; > + > + for (size_t j = 0; j < tc->n_columns; j++) { > + const struct ovsdb_idl_column *column = &tc->columns[j]; > + ovsdb_idl_set_write_change_only(idl, column, enable); > + } > + } > +} > > static void > log_parse_update_error(struct ovsdb_error *error) > @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > { > struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); > const struct ovsdb_idl_table_class *class; > + unsigned char column_mode; > + bool optimize_rewritten; > size_t column_idx; > bool write_only; > > @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > > class = row->table->class_; > column_idx = column - class->columns; > - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; > + column_mode = row->table->modes[column_idx]; > + write_only = column_mode == OVSDB_IDL_MONITOR; > + optimize_rewritten = > + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); > + > > ovs_assert(row->new_datum != NULL); > ovs_assert(column_idx < class->n_columns); > - ovs_assert(row->old_datum == NULL || > - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); > + ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR); > > if (row->table->idl->verify_write_only && !write_only) { > VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when" > @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > * different value in that column since we read it. (But if a whole > * transaction only does writes of existing values, without making any real > * changes, we will drop the whole transaction later in > - * ovsdb_idl_txn_commit().) */ > - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > - datum, &column->type)) { > + * ovsdb_idl_txn_commit().) > + * > + * The application may choose to bypass this restriction and always > + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. > + */ > + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column), > + datum, &column->type)) { > goto discard_datum; > } > > diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h > index d00599616ef9..1716955ab61c 100644 > --- a/lib/ovsdb-idl.h > +++ b/lib/ovsdb-idl.h > @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, > * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column > * that a client wants to track using the change tracking > * ovsdb_idl_track_get_*() functions. > + * > + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY) > + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it > + * only adds a written column to a transaction if the column's value > + * has actually changed. In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as well, but I think it is too tedious. Maybe we can change the section from "possible mode combinations" to "typical mode combinations". > */ > #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ > #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ > -#define OVSDB_IDL_TRACK (1 << 2) > +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ > +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ > + (1 << 3) /* Write back only changed columns? */ > > void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *); > void ovsdb_idl_add_table(struct ovsdb_idl *, > @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row, > const struct ovsdb_idl_column *column); > void ovsdb_idl_track_clear(struct ovsdb_idl *); > > +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > + const struct ovsdb_idl_column *column, > + bool enable); > +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable); > + > > /* Reading the database replica. */ > > diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > index 62e2b638320c..d1cfa59c5758 100644 > --- a/tests/ovsdb-idl.at > +++ b/tests/ovsdb-idl.at > @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], > OVSDB_SERVER_SHUTDOWN > AT_CLEANUP]) > > +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. > +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], > + [AT_SETUP([$1 - write-changed-only - C]) > + AT_KEYWORDS([ovsdb server idl positive $5]) > + AT_CHECK([ovsdb_start_idltest]) > + m4_if([$2], [], [], > + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3], > + [0], [stdout], [ignore]) > + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > + [0], [$4]) > + OVSDB_SERVER_SHUTDOWN > + AT_CLEANUP]) > + > # same as OVSDB_CHECK_IDL but uses tcp. > m4_define([OVSDB_CHECK_IDL_TCP_C], > [AT_SETUP([$1 - C - tcp]) > @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], > > m4_define([OVSDB_CHECK_IDL], > [OVSDB_CHECK_IDL_C($@) > + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) > OVSDB_CHECK_IDL_TCP_C($@) > OVSDB_CHECK_IDL_TCP6_C($@) > OVSDB_CHECK_IDL_PY($@) > @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], > OVSDB_SERVER_SHUTDOWN > AT_CLEANUP]) > > +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], > + [AT_SETUP([$1 - write-changed-only - C]) > + AT_KEYWORDS([ovsdb server idl tracking positive $5]) > + AT_CHECK([ovsdb_start_idltest]) > + m4_if([$2], [], [], > + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3], > + [0], [stdout], [ignore]) > + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > + [0], [$4]) > + OVSDB_SERVER_SHUTDOWN > + AT_CLEANUP]) > + > m4_define([OVSDB_CHECK_IDL_TRACK], > - [OVSDB_CHECK_IDL_TRACK_C($@)]) > + [OVSDB_CHECK_IDL_TRACK_C($@) > + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) > > OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], > [['["idltest", > diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c > index ca4e87b8115c..808b15355743 100644 > --- a/tests/test-ovsdb.c > +++ b/tests/test-ovsdb.c > @@ -56,6 +56,7 @@ > VLOG_DEFINE_THIS_MODULE(test_ovsdb); > > struct test_ovsdb_pvt_context { > + bool write_changed_only; > bool track; > }; > > @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > {"timeout", required_argument, NULL, 't'}, > {"verbose", optional_argument, NULL, 'v'}, > {"change-track", optional_argument, NULL, 'c'}, > + {"write-changed-only", optional_argument, NULL, 'w'}, > {"magic", required_argument, NULL, OPT_MAGIC}, > {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES}, > {"help", no_argument, NULL, 'h'}, > @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > pvt->track = true; > break; > > + case 'w': > + pvt->write_changed_only = true; > + break; > + > case OPT_MAGIC: > magic = optarg; > break; > @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) > static void > do_idl(struct ovs_cmdl_context *ctx) > { > + struct test_ovsdb_pvt_context *pvt = ctx->pvt; > struct jsonrpc *rpc; > struct ovsdb_idl *idl; > unsigned int seqno = 0; > @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) > int step = 0; > int error; > int i; > - bool track; > - > - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; > > idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); > ovsdb_idl_set_leader_only(idl, false); > @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) > rpc = NULL; > } > > - if (track) { > + if (pvt->track) { > ovsdb_idl_track_add_all(idl); > } > > + if (pvt->write_changed_only) { > + ovsdb_idl_set_write_change_only_all(idl, true); > + } > + > setvbuf(stdout, NULL, _IONBF, 0); > > symtab = ovsdb_symbol_table_create(); > @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) > } > > /* Print update. */ > - if (track) { > + if (pvt->track) { > print_idl_track(idl, step++, terse); > ovsdb_idl_track_clear(idl); > } else { > -- > 2.27.0 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Mon, Apr 25, 2022 at 2:28 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/22/22 20:38, Terry Wilson wrote: > > On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote: > >> > >> At a first glance, change tracking should never be allowed for > >> write-only columns. However, some clients (e.g., ovn-northd) that are > >> mostly exclusive writers of a database, use change tracking to avoid > >> duplicating the IDL row records into a local cache when implementing > >> incremental processing. > >> > >> The default behavior of the IDL is to automatically turn a write-only > >> column into a read-write column whenever the client enables change > >> tracking for that column. > >> > >> For the afore mentioned clients, this becomes a performance issue. > >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't > >> change a column's value.") explains why writes that don't change a > >> column's value cannot be optimized out early if the column is > >> read/write. > >> > >> Furthermore, if there is at least one record in any table that > >> changed during a transaction, then *all* records that have been > >> written are added to the transaction, even if their values didn't > >> change. If there are many such rows (e.g., like in ovn-northd's > >> case) this incurs a significant overhead because: > >> a. the client has to build this large transaction > >> b. the transaction has to be sent over the network > >> c. the server needs to parse this (mostly) no-op update > >> > >> We now introduce new IDL APIs allowing users to set a new monitoring > >> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the > >> atomicity constraints may be relaxed and written columns that don't > >> change value can be skipped from the current transaction. > > > > Is this something that we should consider adding at some point to the > > Python IDL code as well? I know in the past feature support has > > diverged. Maybe at some point we should create a python/TODO file that > > lists features in the C IDL that still need to be implemented if they > > aren't immediately added with C IDL changes? > > You're right, we should [robably consider adding this as a new mode to > the Python IDL too. I can do that as a follow up if needed. However, > until now, the only user of this new mode is ovn-northd and it will be > using it in conjunction with change tracking (which is another missing > Python IDL feature). > > Han, Ilya, what do you think? > > Nevertheless, a TODO list would probably be good to have indeed. > I agree that a TODO would be good for the python IDL. Since there are already feature gaps, I think it may be a separate patch to add TODOs for all the gaps. Thanks, Han > > > >> We benchmarked ovn-northd performance when using this new mode > >> against NB and SB databases taken from ovn-kubernetes scale tests. > >> We noticed that when a minor change is performed to the Northbound > >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to > >> build the Southbound transaction becomes negligible (vs ~1.5 seconds > >> before this change). > >> > >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show > >> significant reduction of latency to bring up pods; both average and P99 > >> latency decreased by ~30%. > >> > >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > >> --- > >> V3: > >> - Addressed Han's comments: > >> - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. > >> - Rephrased commit log. > >> - Changed commit title to reflect the new approach. > >> - Old patch (v2) was: > >> https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ > >> V2: > >> - Addressed Numan's comments: > >> - Added APIs to allow per column configuration of modes. > >> - Fixed unit tests to actually enable change tracking for write-only > >> columns. > >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change > >> seqnos if change tracking is enabled (even for write-only rows). > >> > >> Note: The OVN counter part change is: > >> https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 > >> --- > >> NEWS | 4 ++++ > >> lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- > >> lib/ovsdb-idl.h | 14 +++++++++++- > >> tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- > >> tests/test-ovsdb.c | 18 +++++++++++---- > >> 5 files changed, 111 insertions(+), 13 deletions(-) > >> > >> diff --git a/NEWS b/NEWS > >> index 5074b97aa51c..60932791bfcf 100644 > >> --- a/NEWS > >> +++ b/NEWS > >> @@ -14,6 +14,10 @@ Post-v2.17.0 > >> - OVSDB: > >> * 'relay' service model now supports transaction history, i.e. honors the > >> 'last-txn-id' field in 'monitor_cond_since' requests from clients. > >> + - OVSDB-IDL: > >> + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing > >> + applications to relax atomicity requirements when dealing with > >> + columns whose value has been rewritten (but not changed). > >> > >> > >> v2.17.0 - 17 Feb 2022 > >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c > >> index 882ede75598d..e5bb6cca5274 100644 > >> --- a/lib/ovsdb-idl.c > >> +++ b/lib/ovsdb-idl.c > >> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) > >> { > >> ovsdb_idl_track_clear__(idl, false); > >> } > >> + > >> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY > >> + * for 'column' in 'idl', ensuring that the column will be included in a > >> + * transaction only if its value has actually changed locally. Normally > >> + * read/write columns that are written to are always included in the > >> + * transaction but, in specific cases, when the application doesn't > >> + * require atomicity of writes across different columns the ones that > >> + * don't chave value may be skipped. > >> + * > >> + * This function should be called between ovsdb_idl_create() and > >> + * the first call to ovsdb_idl_run(). > >> + */ > >> +void > >> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > >> + const struct ovsdb_idl_column *column, > >> + bool enable) > >> +{ > >> + if (enable) { > >> + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; > >> + } else { > >> + *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY; > >> + } > >> +} > >> + > >> +void > >> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) > >> +{ > >> + for (size_t i = 0; i < idl->class_->n_tables; i++) { > >> + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; > >> + > >> + for (size_t j = 0; j < tc->n_columns; j++) { > >> + const struct ovsdb_idl_column *column = &tc->columns[j]; > >> + ovsdb_idl_set_write_change_only(idl, column, enable); > >> + } > >> + } > >> +} > >> > >> static void > >> log_parse_update_error(struct ovsdb_error *error) > >> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > >> { > >> struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); > >> const struct ovsdb_idl_table_class *class; > >> + unsigned char column_mode; > >> + bool optimize_rewritten; > >> size_t column_idx; > >> bool write_only; > >> > >> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > >> > >> class = row->table->class_; > >> column_idx = column - class->columns; > >> - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; > >> + column_mode = row->table->modes[column_idx]; > >> + write_only = column_mode == OVSDB_IDL_MONITOR; > >> + optimize_rewritten = > >> + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); > >> + > >> > >> ovs_assert(row->new_datum != NULL); > >> ovs_assert(column_idx < class->n_columns); > >> - ovs_assert(row->old_datum == NULL || > >> - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); > >> + ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR); > >> > >> if (row->table->idl->verify_write_only && !write_only) { > >> VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when" > >> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, > >> * different value in that column since we read it. (But if a whole > >> * transaction only does writes of existing values, without making any real > >> * changes, we will drop the whole transaction later in > >> - * ovsdb_idl_txn_commit().) */ > >> - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), > >> - datum, &column->type)) { > >> + * ovsdb_idl_txn_commit().) > >> + * > >> + * The application may choose to bypass this restriction and always > >> + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. > >> + */ > >> + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column), > >> + datum, &column->type)) { > >> goto discard_datum; > >> } > >> > >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h > >> index d00599616ef9..1716955ab61c 100644 > >> --- a/lib/ovsdb-idl.h > >> +++ b/lib/ovsdb-idl.h > >> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, > >> * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column > >> * that a client wants to track using the change tracking > >> * ovsdb_idl_track_get_*() functions. > >> + * > >> + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY) > >> + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it > >> + * only adds a written column to a transaction if the column's value > >> + * has actually changed. > >> */ > >> #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ > >> #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ > >> -#define OVSDB_IDL_TRACK (1 << 2) > >> +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ > >> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ > >> + (1 << 3) /* Write back only changed columns? */ > >> > >> void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *); > >> void ovsdb_idl_add_table(struct ovsdb_idl *, > >> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row, > >> const struct ovsdb_idl_column *column); > >> void ovsdb_idl_track_clear(struct ovsdb_idl *); > >> > >> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, > >> + const struct ovsdb_idl_column *column, > >> + bool enable); > >> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable); > >> + > >> > >> /* Reading the database replica. */ > >> > >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > >> index 62e2b638320c..d1cfa59c5758 100644 > >> --- a/tests/ovsdb-idl.at > >> +++ b/tests/ovsdb-idl.at > >> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], > >> OVSDB_SERVER_SHUTDOWN > >> AT_CLEANUP]) > >> > >> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. > >> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], > >> + [AT_SETUP([$1 - write-changed-only - C]) > >> + AT_KEYWORDS([ovsdb server idl positive $5]) > >> + AT_CHECK([ovsdb_start_idltest]) > >> + m4_if([$2], [], [], > >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3], > >> + [0], [stdout], [ignore]) > >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > >> + [0], [$4]) > >> + OVSDB_SERVER_SHUTDOWN > >> + AT_CLEANUP]) > >> + > >> # same as OVSDB_CHECK_IDL but uses tcp. > >> m4_define([OVSDB_CHECK_IDL_TCP_C], > >> [AT_SETUP([$1 - C - tcp]) > >> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], > >> > >> m4_define([OVSDB_CHECK_IDL], > >> [OVSDB_CHECK_IDL_C($@) > >> + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) > >> OVSDB_CHECK_IDL_TCP_C($@) > >> OVSDB_CHECK_IDL_TCP6_C($@) > >> OVSDB_CHECK_IDL_PY($@) > >> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], > >> OVSDB_SERVER_SHUTDOWN > >> AT_CLEANUP]) > >> > >> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], > >> + [AT_SETUP([$1 - write-changed-only - C]) > >> + AT_KEYWORDS([ovsdb server idl tracking positive $5]) > >> + AT_CHECK([ovsdb_start_idltest]) > >> + m4_if([$2], [], [], > >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) > >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3], > >> + [0], [stdout], [ignore]) > >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), > >> + [0], [$4]) > >> + OVSDB_SERVER_SHUTDOWN > >> + AT_CLEANUP]) > >> + > >> m4_define([OVSDB_CHECK_IDL_TRACK], > >> - [OVSDB_CHECK_IDL_TRACK_C($@)]) > >> + [OVSDB_CHECK_IDL_TRACK_C($@) > >> + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) > >> > >> OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], > >> [['["idltest", > >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c > >> index ca4e87b8115c..808b15355743 100644 > >> --- a/tests/test-ovsdb.c > >> +++ b/tests/test-ovsdb.c > >> @@ -56,6 +56,7 @@ > >> VLOG_DEFINE_THIS_MODULE(test_ovsdb); > >> > >> struct test_ovsdb_pvt_context { > >> + bool write_changed_only; > >> bool track; > >> }; > >> > >> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > >> {"timeout", required_argument, NULL, 't'}, > >> {"verbose", optional_argument, NULL, 'v'}, > >> {"change-track", optional_argument, NULL, 'c'}, > >> + {"write-changed-only", optional_argument, NULL, 'w'}, > >> {"magic", required_argument, NULL, OPT_MAGIC}, > >> {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES}, > >> {"help", no_argument, NULL, 'h'}, > >> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) > >> pvt->track = true; > >> break; > >> > >> + case 'w': > >> + pvt->write_changed_only = true; > >> + break; > >> + > >> case OPT_MAGIC: > >> magic = optarg; > >> break; > >> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) > >> static void > >> do_idl(struct ovs_cmdl_context *ctx) > >> { > >> + struct test_ovsdb_pvt_context *pvt = ctx->pvt; > >> struct jsonrpc *rpc; > >> struct ovsdb_idl *idl; > >> unsigned int seqno = 0; > >> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) > >> int step = 0; > >> int error; > >> int i; > >> - bool track; > >> - > >> - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; > >> > >> idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); > >> ovsdb_idl_set_leader_only(idl, false); > >> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) > >> rpc = NULL; > >> } > >> > >> - if (track) { > >> + if (pvt->track) { > >> ovsdb_idl_track_add_all(idl); > >> } > >> > >> + if (pvt->write_changed_only) { > >> + ovsdb_idl_set_write_change_only_all(idl, true); > >> + } > >> + > >> setvbuf(stdout, NULL, _IONBF, 0); > >> > >> symtab = ovsdb_symbol_table_create(); > >> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) > >> } > >> > >> /* Print update. */ > >> - if (track) { > >> + if (pvt->track) { > >> print_idl_track(idl, step++, terse); > >> ovsdb_idl_track_clear(idl); > >> } else { > >> -- > >> 2.27.0 > >> > >> _______________________________________________ > >> dev mailing list > >> dev@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > >
On 4/26/22 03:08, Han Zhou wrote: > On Wed, Apr 20, 2022 at 12:31 PM Dumitru Ceara <dceara@redhat.com> wrote: >> >> At a first glance, change tracking should never be allowed for >> write-only columns. However, some clients (e.g., ovn-northd) that are >> mostly exclusive writers of a database, use change tracking to avoid >> duplicating the IDL row records into a local cache when implementing >> incremental processing. >> >> The default behavior of the IDL is to automatically turn a write-only >> column into a read-write column whenever the client enables change >> tracking for that column. >> >> For the afore mentioned clients, this becomes a performance issue. >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't >> change a column's value.") explains why writes that don't change a >> column's value cannot be optimized out early if the column is >> read/write. >> >> Furthermore, if there is at least one record in any table that >> changed during a transaction, then *all* records that have been >> written are added to the transaction, even if their values didn't >> change. If there are many such rows (e.g., like in ovn-northd's >> case) this incurs a significant overhead because: >> a. the client has to build this large transaction >> b. the transaction has to be sent over the network >> c. the server needs to parse this (mostly) no-op update >> >> We now introduce new IDL APIs allowing users to set a new monitoring >> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the >> atomicity constraints may be relaxed and written columns that don't >> change value can be skipped from the current transaction. >> >> We benchmarked ovn-northd performance when using this new mode >> against NB and SB databases taken from ovn-kubernetes scale tests. >> We noticed that when a minor change is performed to the Northbound >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to >> build the Southbound transaction becomes negligible (vs ~1.5 seconds >> before this change). >> >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show >> significant reduction of latency to bring up pods; both average and P99 >> latency decreased by ~30%. >> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > Thanks Dumitru for the revision. I only have two minor comments below. > > Acked-by: Han Zhou <hzhou@ovn.org> > >> --- >> V3: >> - Addressed Han's comments: >> - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. >> - Rephrased commit log. >> - Changed commit title to reflect the new approach. >> - Old patch (v2) was: >> > https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ >> V2: >> - Addressed Numan's comments: >> - Added APIs to allow per column configuration of modes. >> - Fixed unit tests to actually enable change tracking for write-only >> columns. >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change >> seqnos if change tracking is enabled (even for write-only rows). >> >> Note: The OVN counter part change is: >> > https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 >> --- >> NEWS | 4 ++++ >> lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- >> lib/ovsdb-idl.h | 14 +++++++++++- >> tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- >> tests/test-ovsdb.c | 18 +++++++++++---- >> 5 files changed, 111 insertions(+), 13 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 5074b97aa51c..60932791bfcf 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -14,6 +14,10 @@ Post-v2.17.0 >> - OVSDB: >> * 'relay' service model now supports transaction history, i.e. > honors the >> 'last-txn-id' field in 'monitor_cond_since' requests from clients. >> + - OVSDB-IDL: >> + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing >> + applications to relax atomicity requirements when dealing with >> + columns whose value has been rewritten (but not changed). >> >> >> v2.17.0 - 17 Feb 2022 >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c >> index 882ede75598d..e5bb6cca5274 100644 >> --- a/lib/ovsdb-idl.c >> +++ b/lib/ovsdb-idl.c >> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) >> { >> ovsdb_idl_track_clear__(idl, false); >> } >> + >> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY >> + * for 'column' in 'idl', ensuring that the column will be included in a >> + * transaction only if its value has actually changed locally. Normally >> + * read/write columns that are written to are always included in the >> + * transaction but, in specific cases, when the application doesn't >> + * require atomicity of writes across different columns the ones that >> + * don't chave value may be skipped. > > s/chave/change > Ack. >> + * >> + * This function should be called between ovsdb_idl_create() and >> + * the first call to ovsdb_idl_run(). >> + */ >> +void >> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, >> + const struct ovsdb_idl_column *column, >> + bool enable) >> +{ >> + if (enable) { >> + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; >> + } else { >> + *ovsdb_idl_get_mode(idl, column) &= > ~OVSDB_IDL_WRITE_CHANGED_ONLY; >> + } >> +} >> + >> +void >> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) >> +{ >> + for (size_t i = 0; i < idl->class_->n_tables; i++) { >> + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; >> + >> + for (size_t j = 0; j < tc->n_columns; j++) { >> + const struct ovsdb_idl_column *column = &tc->columns[j]; >> + ovsdb_idl_set_write_change_only(idl, column, enable); >> + } >> + } >> +} >> >> static void >> log_parse_update_error(struct ovsdb_error *error) >> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row > *row_, >> { >> struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); >> const struct ovsdb_idl_table_class *class; >> + unsigned char column_mode; >> + bool optimize_rewritten; >> size_t column_idx; >> bool write_only; >> >> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row > *row_, >> >> class = row->table->class_; >> column_idx = column - class->columns; >> - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; >> + column_mode = row->table->modes[column_idx]; >> + write_only = column_mode == OVSDB_IDL_MONITOR; >> + optimize_rewritten = >> + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); >> + >> >> ovs_assert(row->new_datum != NULL); >> ovs_assert(column_idx < class->n_columns); >> - ovs_assert(row->old_datum == NULL || >> - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); >> + ovs_assert(row->old_datum == NULL || column_mode & > OVSDB_IDL_MONITOR); >> >> if (row->table->idl->verify_write_only && !write_only) { >> VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) > when" >> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row > *row_, >> * different value in that column since we read it. (But if a whole >> * transaction only does writes of existing values, without making > any real >> * changes, we will drop the whole transaction later in >> - * ovsdb_idl_txn_commit().) */ >> - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), >> - datum, &column->type)) { >> + * ovsdb_idl_txn_commit().) >> + * >> + * The application may choose to bypass this restriction and always >> + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. >> + */ >> + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, > column), >> + datum, &column->type)) { >> goto discard_datum; >> } >> >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h >> index d00599616ef9..1716955ab61c 100644 >> --- a/lib/ovsdb-idl.h >> +++ b/lib/ovsdb-idl.h >> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct > ovsdb_idl *, >> * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a > column >> * that a client wants to track using the change tracking >> * ovsdb_idl_track_get_*() functions. >> + * >> + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | > OVSDB_IDL_WRITE_CHANGED_ONLY) >> + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it >> + * only adds a written column to a transaction if the column's value >> + * has actually changed. > > In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR > | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as > well, but I think it is too tedious. Maybe we can change the section from > "possible mode combinations" to "typical mode combinations". > > I'll change this in v4. Thanks! Dumitru > >> */ >> #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ >> #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? > */ >> -#define OVSDB_IDL_TRACK (1 << 2) >> +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ >> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ >> + (1 << 3) /* Write back only changed columns? */ >> >> void ovsdb_idl_add_column(struct ovsdb_idl *, const struct > ovsdb_idl_column *); >> void ovsdb_idl_add_table(struct ovsdb_idl *, >> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct > ovsdb_idl_row *row, >> const struct ovsdb_idl_column *column); >> void ovsdb_idl_track_clear(struct ovsdb_idl *); >> >> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, >> + const struct ovsdb_idl_column > *column, >> + bool enable); >> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool > enable); >> + >> >> /* Reading the database replica. */ >> >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at >> index 62e2b638320c..d1cfa59c5758 100644 >> --- a/tests/ovsdb-idl.at >> +++ b/tests/ovsdb-idl.at >> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], >> OVSDB_SERVER_SHUTDOWN >> AT_CLEANUP]) >> >> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. >> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], >> + [AT_SETUP([$1 - write-changed-only - C]) >> + AT_KEYWORDS([ovsdb server idl positive $5]) >> + AT_CHECK([ovsdb_start_idltest]) >> + m4_if([$2], [], [], >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > [ignore])]) >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > -t10 idl unix:socket $3], >> + [0], [stdout], [ignore]) >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> + [0], [$4]) >> + OVSDB_SERVER_SHUTDOWN >> + AT_CLEANUP]) >> + >> # same as OVSDB_CHECK_IDL but uses tcp. >> m4_define([OVSDB_CHECK_IDL_TCP_C], >> [AT_SETUP([$1 - C - tcp]) >> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], >> >> m4_define([OVSDB_CHECK_IDL], >> [OVSDB_CHECK_IDL_C($@) >> + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) >> OVSDB_CHECK_IDL_TCP_C($@) >> OVSDB_CHECK_IDL_TCP6_C($@) >> OVSDB_CHECK_IDL_PY($@) >> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], >> OVSDB_SERVER_SHUTDOWN >> AT_CLEANUP]) >> >> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], >> + [AT_SETUP([$1 - write-changed-only - C]) >> + AT_KEYWORDS([ovsdb server idl tracking positive $5]) >> + AT_CHECK([ovsdb_start_idltest]) >> + m4_if([$2], [], [], >> + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], > [ignore])]) >> + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc > -t10 -c -w idl unix:socket $3], >> + [0], [stdout], [ignore]) >> + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), >> + [0], [$4]) >> + OVSDB_SERVER_SHUTDOWN >> + AT_CLEANUP]) >> + >> m4_define([OVSDB_CHECK_IDL_TRACK], >> - [OVSDB_CHECK_IDL_TRACK_C($@)]) >> + [OVSDB_CHECK_IDL_TRACK_C($@) >> + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) >> >> OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], >> [['["idltest", >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c >> index ca4e87b8115c..808b15355743 100644 >> --- a/tests/test-ovsdb.c >> +++ b/tests/test-ovsdb.c >> @@ -56,6 +56,7 @@ >> VLOG_DEFINE_THIS_MODULE(test_ovsdb); >> >> struct test_ovsdb_pvt_context { >> + bool write_changed_only; >> bool track; >> }; >> >> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct > test_ovsdb_pvt_context *pvt) >> {"timeout", required_argument, NULL, 't'}, >> {"verbose", optional_argument, NULL, 'v'}, >> {"change-track", optional_argument, NULL, 'c'}, >> + {"write-changed-only", optional_argument, NULL, 'w'}, >> {"magic", required_argument, NULL, OPT_MAGIC}, >> {"no-rename-open-files", no_argument, NULL, > OPT_NO_RENAME_OPEN_FILES}, >> {"help", no_argument, NULL, 'h'}, >> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct > test_ovsdb_pvt_context *pvt) >> pvt->track = true; >> break; >> >> + case 'w': >> + pvt->write_changed_only = true; >> + break; >> + >> case OPT_MAGIC: >> magic = optarg; >> break; >> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char > *commands) >> static void >> do_idl(struct ovs_cmdl_context *ctx) >> { >> + struct test_ovsdb_pvt_context *pvt = ctx->pvt; >> struct jsonrpc *rpc; >> struct ovsdb_idl *idl; >> unsigned int seqno = 0; >> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) >> int step = 0; >> int error; >> int i; >> - bool track; >> - >> - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; >> >> idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); >> ovsdb_idl_set_leader_only(idl, false); >> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) >> rpc = NULL; >> } >> >> - if (track) { >> + if (pvt->track) { >> ovsdb_idl_track_add_all(idl); >> } >> >> + if (pvt->write_changed_only) { >> + ovsdb_idl_set_write_change_only_all(idl, true); >> + } >> + >> setvbuf(stdout, NULL, _IONBF, 0); >> >> symtab = ovsdb_symbol_table_create(); >> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) >> } >> >> /* Print update. */ >> - if (track) { >> + if (pvt->track) { >> print_idl_track(idl, step++, terse); >> ovsdb_idl_track_clear(idl); >> } else { >> -- >> 2.27.0 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 4/26/22 03:10, Han Zhou wrote: >>> Is this something that we should consider adding at some point to the >>> Python IDL code as well? I know in the past feature support has >>> diverged. Maybe at some point we should create a python/TODO file that >>> lists features in the C IDL that still need to be implemented if they >>> aren't immediately added with C IDL changes? >> You're right, we should [robably consider adding this as a new mode to >> the Python IDL too. I can do that as a follow up if needed. However, >> until now, the only user of this new mode is ovn-northd and it will be >> using it in conjunction with change tracking (which is another missing >> Python IDL feature). >> >> Han, Ilya, what do you think? >> >> Nevertheless, a TODO list would probably be good to have indeed. >> > I agree that a TODO would be good for the python IDL. Since there are > already feature gaps, I think it may be a separate patch to add TODOs for > all the gaps. > > Thanks, > Han > OK, thanks for the feedback! I'll work on a patch to add a python/TODO.rst file and send it out after the C IDL change is accepted. Regards, Dumitru
On 4/26/22 10:05, Dumitru Ceara wrote: >> In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR >> | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as >> well, but I think it is too tedious. Maybe we can change the section from >> "possible mode combinations" to "typical mode combinations". >> >> > I'll change this in v4. > v4 available at: https://patchwork.ozlabs.org/project/openvswitch/patch/20220426103708.17266-1-dceara@redhat.com/
diff --git a/NEWS b/NEWS index 5074b97aa51c..60932791bfcf 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,10 @@ Post-v2.17.0 - OVSDB: * 'relay' service model now supports transaction history, i.e. honors the 'last-txn-id' field in 'monitor_cond_since' requests from clients. + - OVSDB-IDL: + * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing + applications to relax atomicity requirements when dealing with + columns whose value has been rewritten (but not changed). v2.17.0 - 17 Feb 2022 diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 882ede75598d..e5bb6cca5274 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl) { ovsdb_idl_track_clear__(idl, false); } + +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY + * for 'column' in 'idl', ensuring that the column will be included in a + * transaction only if its value has actually changed locally. Normally + * read/write columns that are written to are always included in the + * transaction but, in specific cases, when the application doesn't + * require atomicity of writes across different columns the ones that + * don't chave value may be skipped. + * + * This function should be called between ovsdb_idl_create() and + * the first call to ovsdb_idl_run(). + */ +void +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, + const struct ovsdb_idl_column *column, + bool enable) +{ + if (enable) { + *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY; + } else { + *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY; + } +} + +void +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable) +{ + for (size_t i = 0; i < idl->class_->n_tables; i++) { + const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i]; + + for (size_t j = 0; j < tc->n_columns; j++) { + const struct ovsdb_idl_column *column = &tc->columns[j]; + ovsdb_idl_set_write_change_only(idl, column, enable); + } + } +} static void log_parse_update_error(struct ovsdb_error *error) @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, { struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_); const struct ovsdb_idl_table_class *class; + unsigned char column_mode; + bool optimize_rewritten; size_t column_idx; bool write_only; @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, class = row->table->class_; column_idx = column - class->columns; - write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR; + column_mode = row->table->modes[column_idx]; + write_only = column_mode == OVSDB_IDL_MONITOR; + optimize_rewritten = + write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY); + ovs_assert(row->new_datum != NULL); ovs_assert(column_idx < class->n_columns); - ovs_assert(row->old_datum == NULL || - row->table->modes[column_idx] & OVSDB_IDL_MONITOR); + ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR); if (row->table->idl->verify_write_only && !write_only) { VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when" @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_, * different value in that column since we read it. (But if a whole * transaction only does writes of existing values, without making any real * changes, we will drop the whole transaction later in - * ovsdb_idl_txn_commit().) */ - if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column), - datum, &column->type)) { + * ovsdb_idl_txn_commit().) + * + * The application may choose to bypass this restriction and always + * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY. + */ + if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column), + datum, &column->type)) { goto discard_datum; } diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h index d00599616ef9..1716955ab61c 100644 --- a/lib/ovsdb-idl.h +++ b/lib/ovsdb-idl.h @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *, * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column * that a client wants to track using the change tracking * ovsdb_idl_track_get_*() functions. + * + * - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY) + * is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it + * only adds a written column to a transaction if the column's value + * has actually changed. */ #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */ #define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */ -#define OVSDB_IDL_TRACK (1 << 2) +#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */ +#define OVSDB_IDL_WRITE_CHANGED_ONLY \ + (1 << 3) /* Write back only changed columns? */ void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *); void ovsdb_idl_add_table(struct ovsdb_idl *, @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row, const struct ovsdb_idl_column *column); void ovsdb_idl_track_clear(struct ovsdb_idl *); +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl, + const struct ovsdb_idl_column *column, + bool enable); +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable); + /* Reading the database replica. */ diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at index 62e2b638320c..d1cfa59c5758 100644 --- a/tests/ovsdb-idl.at +++ b/tests/ovsdb-idl.at @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C], OVSDB_SERVER_SHUTDOWN AT_CLEANUP]) +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY. +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C], + [AT_SETUP([$1 - write-changed-only - C]) + AT_KEYWORDS([ovsdb server idl positive $5]) + AT_CHECK([ovsdb_start_idltest]) + m4_if([$2], [], [], + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3], + [0], [stdout], [ignore]) + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), + [0], [$4]) + OVSDB_SERVER_SHUTDOWN + AT_CLEANUP]) + # same as OVSDB_CHECK_IDL but uses tcp. m4_define([OVSDB_CHECK_IDL_TCP_C], [AT_SETUP([$1 - C - tcp]) @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY], m4_define([OVSDB_CHECK_IDL], [OVSDB_CHECK_IDL_C($@) + OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@) OVSDB_CHECK_IDL_TCP_C($@) OVSDB_CHECK_IDL_TCP6_C($@) OVSDB_CHECK_IDL_PY($@) @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C], OVSDB_SERVER_SHUTDOWN AT_CLEANUP]) +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C], + [AT_SETUP([$1 - write-changed-only - C]) + AT_KEYWORDS([ovsdb server idl tracking positive $5]) + AT_CHECK([ovsdb_start_idltest]) + m4_if([$2], [], [], + [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])]) + AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3], + [0], [stdout], [ignore]) + AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]), + [0], [$4]) + OVSDB_SERVER_SHUTDOWN + AT_CLEANUP]) + m4_define([OVSDB_CHECK_IDL_TRACK], - [OVSDB_CHECK_IDL_TRACK_C($@)]) + [OVSDB_CHECK_IDL_TRACK_C($@) + OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)]) OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated], [['["idltest", diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c index ca4e87b8115c..808b15355743 100644 --- a/tests/test-ovsdb.c +++ b/tests/test-ovsdb.c @@ -56,6 +56,7 @@ VLOG_DEFINE_THIS_MODULE(test_ovsdb); struct test_ovsdb_pvt_context { + bool write_changed_only; bool track; }; @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) {"timeout", required_argument, NULL, 't'}, {"verbose", optional_argument, NULL, 'v'}, {"change-track", optional_argument, NULL, 'c'}, + {"write-changed-only", optional_argument, NULL, 'w'}, {"magic", required_argument, NULL, OPT_MAGIC}, {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES}, {"help", no_argument, NULL, 'h'}, @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt) pvt->track = true; break; + case 'w': + pvt->write_changed_only = true; + break; + case OPT_MAGIC: magic = optarg; break; @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands) static void do_idl(struct ovs_cmdl_context *ctx) { + struct test_ovsdb_pvt_context *pvt = ctx->pvt; struct jsonrpc *rpc; struct ovsdb_idl *idl; unsigned int seqno = 0; @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx) int step = 0; int error; int i; - bool track; - - track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track; idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true); ovsdb_idl_set_leader_only(idl, false); @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx) rpc = NULL; } - if (track) { + if (pvt->track) { ovsdb_idl_track_add_all(idl); } + if (pvt->write_changed_only) { + ovsdb_idl_set_write_change_only_all(idl, true); + } + setvbuf(stdout, NULL, _IONBF, 0); symtab = ovsdb_symbol_table_create(); @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx) } /* Print update. */ - if (track) { + if (pvt->track) { print_idl_track(idl, step++, terse); ovsdb_idl_track_clear(idl); } else {
At a first glance, change tracking should never be allowed for write-only columns. However, some clients (e.g., ovn-northd) that are mostly exclusive writers of a database, use change tracking to avoid duplicating the IDL row records into a local cache when implementing incremental processing. The default behavior of the IDL is to automatically turn a write-only column into a read-write column whenever the client enables change tracking for that column. For the afore mentioned clients, this becomes a performance issue. Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't change a column's value.") explains why writes that don't change a column's value cannot be optimized out early if the column is read/write. Furthermore, if there is at least one record in any table that changed during a transaction, then *all* records that have been written are added to the transaction, even if their values didn't change. If there are many such rows (e.g., like in ovn-northd's case) this incurs a significant overhead because: a. the client has to build this large transaction b. the transaction has to be sent over the network c. the server needs to parse this (mostly) no-op update We now introduce new IDL APIs allowing users to set a new monitoring mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the atomicity constraints may be relaxed and written columns that don't change value can be skipped from the current transaction. We benchmarked ovn-northd performance when using this new mode against NB and SB databases taken from ovn-kubernetes scale tests. We noticed that when a minor change is performed to the Northbound database (e.g., NB_Global.nb_cfg is incremented) the time it takes to build the Southbound transaction becomes negligible (vs ~1.5 seconds before this change). End-to-end ovn-kubernetes scale tests on 120-node clusters also show significant reduction of latency to bring up pods; both average and P99 latency decreased by ~30%. Signed-off-by: Dumitru Ceara <dceara@redhat.com> --- V3: - Addressed Han's comments: - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY. - Rephrased commit log. - Changed commit title to reflect the new approach. - Old patch (v2) was: https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/ V2: - Addressed Numan's comments: - Added APIs to allow per column configuration of modes. - Fixed unit tests to actually enable change tracking for write-only columns. - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change seqnos if change tracking is enabled (even for write-only rows). Note: The OVN counter part change is: https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2 --- NEWS | 4 ++++ lib/ovsdb-idl.c | 57 +++++++++++++++++++++++++++++++++++++++++----- lib/ovsdb-idl.h | 14 +++++++++++- tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++- tests/test-ovsdb.c | 18 +++++++++++---- 5 files changed, 111 insertions(+), 13 deletions(-)