diff mbox series

[ovs-dev,v3] ovsdb-idl: Support write-only-changed IDL monitor mode.

Message ID 20220420193122.22651-1-dceara@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,v3] ovsdb-idl: Support write-only-changed IDL monitor mode. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Dumitru Ceara April 20, 2022, 7:31 p.m. UTC
At a first glance, change tracking should never be allowed for
write-only columns.  However, some clients (e.g., ovn-northd) that are
mostly exclusive writers of a database, use change tracking to avoid
duplicating the IDL row records into a local cache when implementing
incremental processing.

The default behavior of the IDL is to automatically turn a write-only
column into a read-write column whenever the client enables change
tracking for that column.

For the afore mentioned clients, this becomes a performance issue.
Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
change a column's value.") explains why writes that don't change a
column's value cannot be optimized out early if the column is
read/write.

Furthermore, if there is at least one record in any table that
changed during a transaction, then *all* records that have been
written are added to the transaction, even if their values didn't
change.  If there are many such rows (e.g., like in ovn-northd's
case) this incurs a significant overhead because:
a. the client has to build this large transaction
b. the transaction has to be sent over the network
c. the server needs to parse this (mostly) no-op update

We now introduce new IDL APIs allowing users to set a new monitoring
mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
atomicity constraints may be relaxed and written columns that don't
change value can be skipped from the current transaction.

We benchmarked ovn-northd performance when using this new mode
against NB and SB databases taken from ovn-kubernetes scale tests.
We noticed that when a minor change is performed to the Northbound
database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
build the Southbound transaction becomes negligible (vs ~1.5 seconds
before this change).

End-to-end ovn-kubernetes scale tests on 120-node clusters also show
significant reduction of latency to bring up pods; both average and P99
latency decreased by ~30%.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
V3:
- Addressed Han's comments:
  - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
- Rephrased commit log.
- Changed commit title to reflect the new approach.
- Old patch (v2) was:
  https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
V2:
- Addressed Numan's comments:
  - Added APIs to allow per column configuration of modes.
- Fixed unit tests to actually enable change tracking for write-only
  columns.
- Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
  seqnos if change tracking is enabled (even for write-only rows).

Note: The OVN counter part change is:
https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
---
 NEWS               |  4 ++++
 lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
 lib/ovsdb-idl.h    | 14 +++++++++++-
 tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
 tests/test-ovsdb.c | 18 +++++++++++----
 5 files changed, 111 insertions(+), 13 deletions(-)

Comments

Terry Wilson April 22, 2022, 6:38 p.m. UTC | #1
On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> At a first glance, change tracking should never be allowed for
> write-only columns.  However, some clients (e.g., ovn-northd) that are
> mostly exclusive writers of a database, use change tracking to avoid
> duplicating the IDL row records into a local cache when implementing
> incremental processing.
>
> The default behavior of the IDL is to automatically turn a write-only
> column into a read-write column whenever the client enables change
> tracking for that column.
>
> For the afore mentioned clients, this becomes a performance issue.
> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
> change a column's value.") explains why writes that don't change a
> column's value cannot be optimized out early if the column is
> read/write.
>
> Furthermore, if there is at least one record in any table that
> changed during a transaction, then *all* records that have been
> written are added to the transaction, even if their values didn't
> change.  If there are many such rows (e.g., like in ovn-northd's
> case) this incurs a significant overhead because:
> a. the client has to build this large transaction
> b. the transaction has to be sent over the network
> c. the server needs to parse this (mostly) no-op update
>
> We now introduce new IDL APIs allowing users to set a new monitoring
> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
> atomicity constraints may be relaxed and written columns that don't
> change value can be skipped from the current transaction.

Is this something that we should consider adding at some point to the
Python IDL code as well? I know in the past feature support has
diverged. Maybe at some point we should create a python/TODO file that
lists features in the C IDL that still need to be implemented if they
aren't immediately added with C IDL changes?

> We benchmarked ovn-northd performance when using this new mode
> against NB and SB databases taken from ovn-kubernetes scale tests.
> We noticed that when a minor change is performed to the Northbound
> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
> build the Southbound transaction becomes negligible (vs ~1.5 seconds
> before this change).
>
> End-to-end ovn-kubernetes scale tests on 120-node clusters also show
> significant reduction of latency to bring up pods; both average and P99
> latency decreased by ~30%.
>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> V3:
> - Addressed Han's comments:
>   - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
> - Rephrased commit log.
> - Changed commit title to reflect the new approach.
> - Old patch (v2) was:
>   https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
> V2:
> - Addressed Numan's comments:
>   - Added APIs to allow per column configuration of modes.
> - Fixed unit tests to actually enable change tracking for write-only
>   columns.
> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
>   seqnos if change tracking is enabled (even for write-only rows).
>
> Note: The OVN counter part change is:
> https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
> ---
>  NEWS               |  4 ++++
>  lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
>  lib/ovsdb-idl.h    | 14 +++++++++++-
>  tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
>  tests/test-ovsdb.c | 18 +++++++++++----
>  5 files changed, 111 insertions(+), 13 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5074b97aa51c..60932791bfcf 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,10 @@ Post-v2.17.0
>     - OVSDB:
>       * 'relay' service model now supports transaction history, i.e. honors the
>         'last-txn-id' field in 'monitor_cond_since' requests from clients.
> +   - OVSDB-IDL:
> +     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
> +       applications to relax atomicity requirements when dealing with
> +       columns whose value has been rewritten (but not changed).
>
>
>  v2.17.0 - 17 Feb 2022
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> index 882ede75598d..e5bb6cca5274 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
>  {
>      ovsdb_idl_track_clear__(idl, false);
>  }
> +
> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
> + * for 'column' in 'idl', ensuring that the column will be included in a
> + * transaction only if its value has actually changed locally.  Normally
> + * read/write columns that are written to are always included in the
> + * transaction but, in specific cases, when the application doesn't
> + * require atomicity of writes across different columns the ones that
> + * don't chave value may be skipped.
> + *
> + * This function should be called between ovsdb_idl_create() and
> + * the first call to ovsdb_idl_run().
> + */
> +void
> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> +                                const struct ovsdb_idl_column *column,
> +                                bool enable)
> +{
> +    if (enable) {
> +        *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
> +    } else {
> +        *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY;
> +    }
> +}
> +
> +void
> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable)
> +{
> +    for (size_t i = 0; i < idl->class_->n_tables; i++) {
> +        const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];
> +
> +        for (size_t j = 0; j < tc->n_columns; j++) {
> +            const struct ovsdb_idl_column *column = &tc->columns[j];
> +            ovsdb_idl_set_write_change_only(idl, column, enable);
> +        }
> +    }
> +}
>
>  static void
>  log_parse_update_error(struct ovsdb_error *error)
> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>  {
>      struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
>      const struct ovsdb_idl_table_class *class;
> +    unsigned char column_mode;
> +    bool optimize_rewritten;
>      size_t column_idx;
>      bool write_only;
>
> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>
>      class = row->table->class_;
>      column_idx = column - class->columns;
> -    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
> +    column_mode = row->table->modes[column_idx];
> +    write_only = column_mode == OVSDB_IDL_MONITOR;
> +    optimize_rewritten =
> +        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
> +
>
>      ovs_assert(row->new_datum != NULL);
>      ovs_assert(column_idx < class->n_columns);
> -    ovs_assert(row->old_datum == NULL ||
> -               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
> +    ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR);
>
>      if (row->table->idl->verify_write_only && !write_only) {
>          VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when"
> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>       * different value in that column since we read it.  (But if a whole
>       * transaction only does writes of existing values, without making any real
>       * changes, we will drop the whole transaction later in
> -     * ovsdb_idl_txn_commit().) */
> -    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> -                                         datum, &column->type)) {
> +     * ovsdb_idl_txn_commit().)
> +     *
> +     * The application may choose to bypass this restriction and always
> +     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
> +     */
> +    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> +                                                 datum, &column->type)) {
>          goto discard_datum;
>      }
>
> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
> index d00599616ef9..1716955ab61c 100644
> --- a/lib/ovsdb-idl.h
> +++ b/lib/ovsdb-idl.h
> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *,
>   *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column
>   *     that a client wants to track using the change tracking
>   *     ovsdb_idl_track_get_*() functions.
> + *
> + *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY)
> + *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
> + *     only adds a written column to a transaction if the column's value
> + *     has actually changed.
>   */
>  #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
>  #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column changes? */
> -#define OVSDB_IDL_TRACK   (1 << 2)
> +#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \
> +                          (1 << 3) /* Write back only changed columns? */
>
>  void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *);
>  void ovsdb_idl_add_table(struct ovsdb_idl *,
> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row,
>                                  const struct ovsdb_idl_column *column);
>  void ovsdb_idl_track_clear(struct ovsdb_idl *);
>
> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> +                                     const struct ovsdb_idl_column *column,
> +                                     bool enable);
> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable);
> +
>
>  /* Reading the database replica. */
>
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 62e2b638320c..d1cfa59c5758 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
>     OVSDB_SERVER_SHUTDOWN
>     AT_CLEANUP])
>
> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
> +  [AT_SETUP([$1 - write-changed-only - C])
> +   AT_KEYWORDS([ovsdb server idl positive $5])
> +   AT_CHECK([ovsdb_start_idltest])
> +   m4_if([$2], [], [],
> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3],
> +            [0], [stdout], [ignore])
> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> +            [0], [$4])
> +   OVSDB_SERVER_SHUTDOWN
> +   AT_CLEANUP])
> +
>  # same as OVSDB_CHECK_IDL but uses tcp.
>  m4_define([OVSDB_CHECK_IDL_TCP_C],
>    [AT_SETUP([$1 - C - tcp])
> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],
>
>  m4_define([OVSDB_CHECK_IDL],
>    [OVSDB_CHECK_IDL_C($@)
> +  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
>     OVSDB_CHECK_IDL_TCP_C($@)
>     OVSDB_CHECK_IDL_TCP6_C($@)
>     OVSDB_CHECK_IDL_PY($@)
> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
>     OVSDB_SERVER_SHUTDOWN
>     AT_CLEANUP])
>
> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
> +  [AT_SETUP([$1 - write-changed-only - C])
> +   AT_KEYWORDS([ovsdb server idl tracking positive $5])
> +   AT_CHECK([ovsdb_start_idltest])
> +   m4_if([$2], [], [],
> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3],
> +            [0], [stdout], [ignore])
> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> +            [0], [$4])
> +   OVSDB_SERVER_SHUTDOWN
> +   AT_CLEANUP])
> +
>  m4_define([OVSDB_CHECK_IDL_TRACK],
> -  [OVSDB_CHECK_IDL_TRACK_C($@)])
> +  [OVSDB_CHECK_IDL_TRACK_C($@)
> +   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
>
>  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
>    [['["idltest",
> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
> index ca4e87b8115c..808b15355743 100644
> --- a/tests/test-ovsdb.c
> +++ b/tests/test-ovsdb.c
> @@ -56,6 +56,7 @@
>  VLOG_DEFINE_THIS_MODULE(test_ovsdb);
>
>  struct test_ovsdb_pvt_context {
> +    bool write_changed_only;
>      bool track;
>  };
>
> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
>          {"timeout", required_argument, NULL, 't'},
>          {"verbose", optional_argument, NULL, 'v'},
>          {"change-track", optional_argument, NULL, 'c'},
> +        {"write-changed-only", optional_argument, NULL, 'w'},
>          {"magic", required_argument, NULL, OPT_MAGIC},
>          {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES},
>          {"help", no_argument, NULL, 'h'},
> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
>              pvt->track = true;
>              break;
>
> +        case 'w':
> +            pvt->write_changed_only = true;
> +            break;
> +
>          case OPT_MAGIC:
>              magic = optarg;
>              break;
> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands)
>  static void
>  do_idl(struct ovs_cmdl_context *ctx)
>  {
> +    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
>      struct jsonrpc *rpc;
>      struct ovsdb_idl *idl;
>      unsigned int seqno = 0;
> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
>      int step = 0;
>      int error;
>      int i;
> -    bool track;
> -
> -    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
>
>      idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
>      ovsdb_idl_set_leader_only(idl, false);
> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
>          rpc = NULL;
>      }
>
> -    if (track) {
> +    if (pvt->track) {
>          ovsdb_idl_track_add_all(idl);
>      }
>
> +    if (pvt->write_changed_only) {
> +        ovsdb_idl_set_write_change_only_all(idl, true);
> +    }
> +
>      setvbuf(stdout, NULL, _IONBF, 0);
>
>      symtab = ovsdb_symbol_table_create();
> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>              }
>
>              /* Print update. */
> -            if (track) {
> +            if (pvt->track) {
>                  print_idl_track(idl, step++, terse);
>                  ovsdb_idl_track_clear(idl);
>              } else {
> --
> 2.27.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara April 25, 2022, 9:28 a.m. UTC | #2
On 4/22/22 20:38, Terry Wilson wrote:
> On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> At a first glance, change tracking should never be allowed for
>> write-only columns.  However, some clients (e.g., ovn-northd) that are
>> mostly exclusive writers of a database, use change tracking to avoid
>> duplicating the IDL row records into a local cache when implementing
>> incremental processing.
>>
>> The default behavior of the IDL is to automatically turn a write-only
>> column into a read-write column whenever the client enables change
>> tracking for that column.
>>
>> For the afore mentioned clients, this becomes a performance issue.
>> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
>> change a column's value.") explains why writes that don't change a
>> column's value cannot be optimized out early if the column is
>> read/write.
>>
>> Furthermore, if there is at least one record in any table that
>> changed during a transaction, then *all* records that have been
>> written are added to the transaction, even if their values didn't
>> change.  If there are many such rows (e.g., like in ovn-northd's
>> case) this incurs a significant overhead because:
>> a. the client has to build this large transaction
>> b. the transaction has to be sent over the network
>> c. the server needs to parse this (mostly) no-op update
>>
>> We now introduce new IDL APIs allowing users to set a new monitoring
>> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
>> atomicity constraints may be relaxed and written columns that don't
>> change value can be skipped from the current transaction.
> 
> Is this something that we should consider adding at some point to the
> Python IDL code as well? I know in the past feature support has
> diverged. Maybe at some point we should create a python/TODO file that
> lists features in the C IDL that still need to be implemented if they
> aren't immediately added with C IDL changes?

You're right, we should [robably consider adding this as a new mode to
the Python IDL too.  I can do that as a follow up if needed.  However,
until now, the only user of this new mode is ovn-northd and it will be
using it in conjunction with change tracking (which is another missing
Python IDL feature).

Han, Ilya, what do you think?

Nevertheless, a TODO list would probably be good to have indeed.

> 
>> We benchmarked ovn-northd performance when using this new mode
>> against NB and SB databases taken from ovn-kubernetes scale tests.
>> We noticed that when a minor change is performed to the Northbound
>> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
>> build the Southbound transaction becomes negligible (vs ~1.5 seconds
>> before this change).
>>
>> End-to-end ovn-kubernetes scale tests on 120-node clusters also show
>> significant reduction of latency to bring up pods; both average and P99
>> latency decreased by ~30%.
>>
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>> ---
>> V3:
>> - Addressed Han's comments:
>>   - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
>> - Rephrased commit log.
>> - Changed commit title to reflect the new approach.
>> - Old patch (v2) was:
>>   https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
>> V2:
>> - Addressed Numan's comments:
>>   - Added APIs to allow per column configuration of modes.
>> - Fixed unit tests to actually enable change tracking for write-only
>>   columns.
>> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
>>   seqnos if change tracking is enabled (even for write-only rows).
>>
>> Note: The OVN counter part change is:
>> https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
>> ---
>>  NEWS               |  4 ++++
>>  lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
>>  lib/ovsdb-idl.h    | 14 +++++++++++-
>>  tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
>>  tests/test-ovsdb.c | 18 +++++++++++----
>>  5 files changed, 111 insertions(+), 13 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 5074b97aa51c..60932791bfcf 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -14,6 +14,10 @@ Post-v2.17.0
>>     - OVSDB:
>>       * 'relay' service model now supports transaction history, i.e. honors the
>>         'last-txn-id' field in 'monitor_cond_since' requests from clients.
>> +   - OVSDB-IDL:
>> +     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
>> +       applications to relax atomicity requirements when dealing with
>> +       columns whose value has been rewritten (but not changed).
>>
>>
>>  v2.17.0 - 17 Feb 2022
>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> index 882ede75598d..e5bb6cca5274 100644
>> --- a/lib/ovsdb-idl.c
>> +++ b/lib/ovsdb-idl.c
>> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
>>  {
>>      ovsdb_idl_track_clear__(idl, false);
>>  }
>> +
>> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
>> + * for 'column' in 'idl', ensuring that the column will be included in a
>> + * transaction only if its value has actually changed locally.  Normally
>> + * read/write columns that are written to are always included in the
>> + * transaction but, in specific cases, when the application doesn't
>> + * require atomicity of writes across different columns the ones that
>> + * don't chave value may be skipped.
>> + *
>> + * This function should be called between ovsdb_idl_create() and
>> + * the first call to ovsdb_idl_run().
>> + */
>> +void
>> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
>> +                                const struct ovsdb_idl_column *column,
>> +                                bool enable)
>> +{
>> +    if (enable) {
>> +        *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
>> +    } else {
>> +        *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY;
>> +    }
>> +}
>> +
>> +void
>> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable)
>> +{
>> +    for (size_t i = 0; i < idl->class_->n_tables; i++) {
>> +        const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];
>> +
>> +        for (size_t j = 0; j < tc->n_columns; j++) {
>> +            const struct ovsdb_idl_column *column = &tc->columns[j];
>> +            ovsdb_idl_set_write_change_only(idl, column, enable);
>> +        }
>> +    }
>> +}
>>
>>  static void
>>  log_parse_update_error(struct ovsdb_error *error)
>> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>>  {
>>      struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
>>      const struct ovsdb_idl_table_class *class;
>> +    unsigned char column_mode;
>> +    bool optimize_rewritten;
>>      size_t column_idx;
>>      bool write_only;
>>
>> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>>
>>      class = row->table->class_;
>>      column_idx = column - class->columns;
>> -    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
>> +    column_mode = row->table->modes[column_idx];
>> +    write_only = column_mode == OVSDB_IDL_MONITOR;
>> +    optimize_rewritten =
>> +        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
>> +
>>
>>      ovs_assert(row->new_datum != NULL);
>>      ovs_assert(column_idx < class->n_columns);
>> -    ovs_assert(row->old_datum == NULL ||
>> -               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
>> +    ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR);
>>
>>      if (row->table->idl->verify_write_only && !write_only) {
>>          VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when"
>> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
>>       * different value in that column since we read it.  (But if a whole
>>       * transaction only does writes of existing values, without making any real
>>       * changes, we will drop the whole transaction later in
>> -     * ovsdb_idl_txn_commit().) */
>> -    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
>> -                                         datum, &column->type)) {
>> +     * ovsdb_idl_txn_commit().)
>> +     *
>> +     * The application may choose to bypass this restriction and always
>> +     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
>> +     */
>> +    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column),
>> +                                                 datum, &column->type)) {
>>          goto discard_datum;
>>      }
>>
>> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
>> index d00599616ef9..1716955ab61c 100644
>> --- a/lib/ovsdb-idl.h
>> +++ b/lib/ovsdb-idl.h
>> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *,
>>   *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column
>>   *     that a client wants to track using the change tracking
>>   *     ovsdb_idl_track_get_*() functions.
>> + *
>> + *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY)
>> + *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
>> + *     only adds a written column to a transaction if the column's value
>> + *     has actually changed.
>>   */
>>  #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
>>  #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column changes? */
>> -#define OVSDB_IDL_TRACK   (1 << 2)
>> +#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
>> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \
>> +                          (1 << 3) /* Write back only changed columns? */
>>
>>  void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *);
>>  void ovsdb_idl_add_table(struct ovsdb_idl *,
>> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row,
>>                                  const struct ovsdb_idl_column *column);
>>  void ovsdb_idl_track_clear(struct ovsdb_idl *);
>>
>> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
>> +                                     const struct ovsdb_idl_column *column,
>> +                                     bool enable);
>> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable);
>> +
>>
>>  /* Reading the database replica. */
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 62e2b638320c..d1cfa59c5758 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
>>     OVSDB_SERVER_SHUTDOWN
>>     AT_CLEANUP])
>>
>> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
>> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
>> +  [AT_SETUP([$1 - write-changed-only - C])
>> +   AT_KEYWORDS([ovsdb server idl positive $5])
>> +   AT_CHECK([ovsdb_start_idltest])
>> +   m4_if([$2], [], [],
>> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
>> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3],
>> +            [0], [stdout], [ignore])
>> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
>> +            [0], [$4])
>> +   OVSDB_SERVER_SHUTDOWN
>> +   AT_CLEANUP])
>> +
>>  # same as OVSDB_CHECK_IDL but uses tcp.
>>  m4_define([OVSDB_CHECK_IDL_TCP_C],
>>    [AT_SETUP([$1 - C - tcp])
>> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],
>>
>>  m4_define([OVSDB_CHECK_IDL],
>>    [OVSDB_CHECK_IDL_C($@)
>> +  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
>>     OVSDB_CHECK_IDL_TCP_C($@)
>>     OVSDB_CHECK_IDL_TCP6_C($@)
>>     OVSDB_CHECK_IDL_PY($@)
>> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
>>     OVSDB_SERVER_SHUTDOWN
>>     AT_CLEANUP])
>>
>> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
>> +  [AT_SETUP([$1 - write-changed-only - C])
>> +   AT_KEYWORDS([ovsdb server idl tracking positive $5])
>> +   AT_CHECK([ovsdb_start_idltest])
>> +   m4_if([$2], [], [],
>> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
>> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3],
>> +            [0], [stdout], [ignore])
>> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
>> +            [0], [$4])
>> +   OVSDB_SERVER_SHUTDOWN
>> +   AT_CLEANUP])
>> +
>>  m4_define([OVSDB_CHECK_IDL_TRACK],
>> -  [OVSDB_CHECK_IDL_TRACK_C($@)])
>> +  [OVSDB_CHECK_IDL_TRACK_C($@)
>> +   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
>>
>>  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
>>    [['["idltest",
>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>> index ca4e87b8115c..808b15355743 100644
>> --- a/tests/test-ovsdb.c
>> +++ b/tests/test-ovsdb.c
>> @@ -56,6 +56,7 @@
>>  VLOG_DEFINE_THIS_MODULE(test_ovsdb);
>>
>>  struct test_ovsdb_pvt_context {
>> +    bool write_changed_only;
>>      bool track;
>>  };
>>
>> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
>>          {"timeout", required_argument, NULL, 't'},
>>          {"verbose", optional_argument, NULL, 'v'},
>>          {"change-track", optional_argument, NULL, 'c'},
>> +        {"write-changed-only", optional_argument, NULL, 'w'},
>>          {"magic", required_argument, NULL, OPT_MAGIC},
>>          {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES},
>>          {"help", no_argument, NULL, 'h'},
>> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
>>              pvt->track = true;
>>              break;
>>
>> +        case 'w':
>> +            pvt->write_changed_only = true;
>> +            break;
>> +
>>          case OPT_MAGIC:
>>              magic = optarg;
>>              break;
>> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands)
>>  static void
>>  do_idl(struct ovs_cmdl_context *ctx)
>>  {
>> +    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
>>      struct jsonrpc *rpc;
>>      struct ovsdb_idl *idl;
>>      unsigned int seqno = 0;
>> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
>>      int step = 0;
>>      int error;
>>      int i;
>> -    bool track;
>> -
>> -    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
>>
>>      idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
>>      ovsdb_idl_set_leader_only(idl, false);
>> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
>>          rpc = NULL;
>>      }
>>
>> -    if (track) {
>> +    if (pvt->track) {
>>          ovsdb_idl_track_add_all(idl);
>>      }
>>
>> +    if (pvt->write_changed_only) {
>> +        ovsdb_idl_set_write_change_only_all(idl, true);
>> +    }
>> +
>>      setvbuf(stdout, NULL, _IONBF, 0);
>>
>>      symtab = ovsdb_symbol_table_create();
>> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>              }
>>
>>              /* Print update. */
>> -            if (track) {
>> +            if (pvt->track) {
>>                  print_idl_track(idl, step++, terse);
>>                  ovsdb_idl_track_clear(idl);
>>              } else {
>> --
>> 2.27.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Han Zhou April 26, 2022, 1:08 a.m. UTC | #3
On Wed, Apr 20, 2022 at 12:31 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> At a first glance, change tracking should never be allowed for
> write-only columns.  However, some clients (e.g., ovn-northd) that are
> mostly exclusive writers of a database, use change tracking to avoid
> duplicating the IDL row records into a local cache when implementing
> incremental processing.
>
> The default behavior of the IDL is to automatically turn a write-only
> column into a read-write column whenever the client enables change
> tracking for that column.
>
> For the afore mentioned clients, this becomes a performance issue.
> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
> change a column's value.") explains why writes that don't change a
> column's value cannot be optimized out early if the column is
> read/write.
>
> Furthermore, if there is at least one record in any table that
> changed during a transaction, then *all* records that have been
> written are added to the transaction, even if their values didn't
> change.  If there are many such rows (e.g., like in ovn-northd's
> case) this incurs a significant overhead because:
> a. the client has to build this large transaction
> b. the transaction has to be sent over the network
> c. the server needs to parse this (mostly) no-op update
>
> We now introduce new IDL APIs allowing users to set a new monitoring
> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
> atomicity constraints may be relaxed and written columns that don't
> change value can be skipped from the current transaction.
>
> We benchmarked ovn-northd performance when using this new mode
> against NB and SB databases taken from ovn-kubernetes scale tests.
> We noticed that when a minor change is performed to the Northbound
> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
> build the Southbound transaction becomes negligible (vs ~1.5 seconds
> before this change).
>
> End-to-end ovn-kubernetes scale tests on 120-node clusters also show
> significant reduction of latency to bring up pods; both average and P99
> latency decreased by ~30%.
>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Thanks Dumitru for the revision. I only have two minor comments below.

Acked-by: Han Zhou <hzhou@ovn.org>

> ---
> V3:
> - Addressed Han's comments:
>   - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
> - Rephrased commit log.
> - Changed commit title to reflect the new approach.
> - Old patch (v2) was:
>
https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
> V2:
> - Addressed Numan's comments:
>   - Added APIs to allow per column configuration of modes.
> - Fixed unit tests to actually enable change tracking for write-only
>   columns.
> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
>   seqnos if change tracking is enabled (even for write-only rows).
>
> Note: The OVN counter part change is:
>
https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
> ---
>  NEWS               |  4 ++++
>  lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
>  lib/ovsdb-idl.h    | 14 +++++++++++-
>  tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
>  tests/test-ovsdb.c | 18 +++++++++++----
>  5 files changed, 111 insertions(+), 13 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5074b97aa51c..60932791bfcf 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,10 @@ Post-v2.17.0
>     - OVSDB:
>       * 'relay' service model now supports transaction history, i.e.
honors the
>         'last-txn-id' field in 'monitor_cond_since' requests from clients.
> +   - OVSDB-IDL:
> +     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
> +       applications to relax atomicity requirements when dealing with
> +       columns whose value has been rewritten (but not changed).
>
>
>  v2.17.0 - 17 Feb 2022
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> index 882ede75598d..e5bb6cca5274 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
>  {
>      ovsdb_idl_track_clear__(idl, false);
>  }
> +
> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
> + * for 'column' in 'idl', ensuring that the column will be included in a
> + * transaction only if its value has actually changed locally.  Normally
> + * read/write columns that are written to are always included in the
> + * transaction but, in specific cases, when the application doesn't
> + * require atomicity of writes across different columns the ones that
> + * don't chave value may be skipped.

s/chave/change

> + *
> + * This function should be called between ovsdb_idl_create() and
> + * the first call to ovsdb_idl_run().
> + */
> +void
> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> +                                const struct ovsdb_idl_column *column,
> +                                bool enable)
> +{
> +    if (enable) {
> +        *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
> +    } else {
> +        *ovsdb_idl_get_mode(idl, column) &=
~OVSDB_IDL_WRITE_CHANGED_ONLY;
> +    }
> +}
> +
> +void
> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable)
> +{
> +    for (size_t i = 0; i < idl->class_->n_tables; i++) {
> +        const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];
> +
> +        for (size_t j = 0; j < tc->n_columns; j++) {
> +            const struct ovsdb_idl_column *column = &tc->columns[j];
> +            ovsdb_idl_set_write_change_only(idl, column, enable);
> +        }
> +    }
> +}
>
>  static void
>  log_parse_update_error(struct ovsdb_error *error)
> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
*row_,
>  {
>      struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
>      const struct ovsdb_idl_table_class *class;
> +    unsigned char column_mode;
> +    bool optimize_rewritten;
>      size_t column_idx;
>      bool write_only;
>
> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
*row_,
>
>      class = row->table->class_;
>      column_idx = column - class->columns;
> -    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
> +    column_mode = row->table->modes[column_idx];
> +    write_only = column_mode == OVSDB_IDL_MONITOR;
> +    optimize_rewritten =
> +        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
> +
>
>      ovs_assert(row->new_datum != NULL);
>      ovs_assert(column_idx < class->n_columns);
> -    ovs_assert(row->old_datum == NULL ||
> -               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
> +    ovs_assert(row->old_datum == NULL || column_mode &
OVSDB_IDL_MONITOR);
>
>      if (row->table->idl->verify_write_only && !write_only) {
>          VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s)
when"
> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
*row_,
>       * different value in that column since we read it.  (But if a whole
>       * transaction only does writes of existing values, without making
any real
>       * changes, we will drop the whole transaction later in
> -     * ovsdb_idl_txn_commit().) */
> -    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> -                                         datum, &column->type)) {
> +     * ovsdb_idl_txn_commit().)
> +     *
> +     * The application may choose to bypass this restriction and always
> +     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
> +     */
> +    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row,
column),
> +                                                 datum, &column->type)) {
>          goto discard_datum;
>      }
>
> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
> index d00599616ef9..1716955ab61c 100644
> --- a/lib/ovsdb-idl.h
> +++ b/lib/ovsdb-idl.h
> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct
ovsdb_idl *,
>   *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a
column
>   *     that a client wants to track using the change tracking
>   *     ovsdb_idl_track_get_*() functions.
> + *
> + *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT |
OVSDB_IDL_WRITE_CHANGED_ONLY)
> + *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
> + *     only adds a written column to a transaction if the column's value
> + *     has actually changed.

In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR
| OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as
well, but I think it is too tedious. Maybe we can change the section from
"possible mode combinations" to "typical mode combinations".



>   */
>  #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
>  #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column changes?
*/
> -#define OVSDB_IDL_TRACK   (1 << 2)
> +#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \
> +                          (1 << 3) /* Write back only changed columns? */
>
>  void ovsdb_idl_add_column(struct ovsdb_idl *, const struct
ovsdb_idl_column *);
>  void ovsdb_idl_add_table(struct ovsdb_idl *,
> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct
ovsdb_idl_row *row,
>                                  const struct ovsdb_idl_column *column);
>  void ovsdb_idl_track_clear(struct ovsdb_idl *);
>
> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> +                                     const struct ovsdb_idl_column
*column,
> +                                     bool enable);
> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool
enable);
> +
>
>  /* Reading the database replica. */
>
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 62e2b638320c..d1cfa59c5758 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
>     OVSDB_SERVER_SHUTDOWN
>     AT_CLEANUP])
>
> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
> +  [AT_SETUP([$1 - write-changed-only - C])
> +   AT_KEYWORDS([ovsdb server idl positive $5])
> +   AT_CHECK([ovsdb_start_idltest])
> +   m4_if([$2], [], [],
> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
[ignore])])
> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc
-t10 idl unix:socket $3],
> +            [0], [stdout], [ignore])
> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> +            [0], [$4])
> +   OVSDB_SERVER_SHUTDOWN
> +   AT_CLEANUP])
> +
>  # same as OVSDB_CHECK_IDL but uses tcp.
>  m4_define([OVSDB_CHECK_IDL_TCP_C],
>    [AT_SETUP([$1 - C - tcp])
> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],
>
>  m4_define([OVSDB_CHECK_IDL],
>    [OVSDB_CHECK_IDL_C($@)
> +  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
>     OVSDB_CHECK_IDL_TCP_C($@)
>     OVSDB_CHECK_IDL_TCP6_C($@)
>     OVSDB_CHECK_IDL_PY($@)
> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
>     OVSDB_SERVER_SHUTDOWN
>     AT_CLEANUP])
>
> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
> +  [AT_SETUP([$1 - write-changed-only - C])
> +   AT_KEYWORDS([ovsdb server idl tracking positive $5])
> +   AT_CHECK([ovsdb_start_idltest])
> +   m4_if([$2], [], [],
> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
[ignore])])
> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc
-t10 -c -w idl unix:socket $3],
> +            [0], [stdout], [ignore])
> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> +            [0], [$4])
> +   OVSDB_SERVER_SHUTDOWN
> +   AT_CLEANUP])
> +
>  m4_define([OVSDB_CHECK_IDL_TRACK],
> -  [OVSDB_CHECK_IDL_TRACK_C($@)])
> +  [OVSDB_CHECK_IDL_TRACK_C($@)
> +   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
>
>  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
>    [['["idltest",
> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
> index ca4e87b8115c..808b15355743 100644
> --- a/tests/test-ovsdb.c
> +++ b/tests/test-ovsdb.c
> @@ -56,6 +56,7 @@
>  VLOG_DEFINE_THIS_MODULE(test_ovsdb);
>
>  struct test_ovsdb_pvt_context {
> +    bool write_changed_only;
>      bool track;
>  };
>
> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct
test_ovsdb_pvt_context *pvt)
>          {"timeout", required_argument, NULL, 't'},
>          {"verbose", optional_argument, NULL, 'v'},
>          {"change-track", optional_argument, NULL, 'c'},
> +        {"write-changed-only", optional_argument, NULL, 'w'},
>          {"magic", required_argument, NULL, OPT_MAGIC},
>          {"no-rename-open-files", no_argument, NULL,
OPT_NO_RENAME_OPEN_FILES},
>          {"help", no_argument, NULL, 'h'},
> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct
test_ovsdb_pvt_context *pvt)
>              pvt->track = true;
>              break;
>
> +        case 'w':
> +            pvt->write_changed_only = true;
> +            break;
> +
>          case OPT_MAGIC:
>              magic = optarg;
>              break;
> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char
*commands)
>  static void
>  do_idl(struct ovs_cmdl_context *ctx)
>  {
> +    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
>      struct jsonrpc *rpc;
>      struct ovsdb_idl *idl;
>      unsigned int seqno = 0;
> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
>      int step = 0;
>      int error;
>      int i;
> -    bool track;
> -
> -    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
>
>      idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
>      ovsdb_idl_set_leader_only(idl, false);
> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
>          rpc = NULL;
>      }
>
> -    if (track) {
> +    if (pvt->track) {
>          ovsdb_idl_track_add_all(idl);
>      }
>
> +    if (pvt->write_changed_only) {
> +        ovsdb_idl_set_write_change_only_all(idl, true);
> +    }
> +
>      setvbuf(stdout, NULL, _IONBF, 0);
>
>      symtab = ovsdb_symbol_table_create();
> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>              }
>
>              /* Print update. */
> -            if (track) {
> +            if (pvt->track) {
>                  print_idl_track(idl, step++, terse);
>                  ovsdb_idl_track_clear(idl);
>              } else {
> --
> 2.27.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou April 26, 2022, 1:10 a.m. UTC | #4
On Mon, Apr 25, 2022 at 2:28 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 4/22/22 20:38, Terry Wilson wrote:
> > On Wed, Apr 20, 2022 at 2:31 PM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> At a first glance, change tracking should never be allowed for
> >> write-only columns.  However, some clients (e.g., ovn-northd) that are
> >> mostly exclusive writers of a database, use change tracking to avoid
> >> duplicating the IDL row records into a local cache when implementing
> >> incremental processing.
> >>
> >> The default behavior of the IDL is to automatically turn a write-only
> >> column into a read-write column whenever the client enables change
> >> tracking for that column.
> >>
> >> For the afore mentioned clients, this becomes a performance issue.
> >> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
> >> change a column's value.") explains why writes that don't change a
> >> column's value cannot be optimized out early if the column is
> >> read/write.
> >>
> >> Furthermore, if there is at least one record in any table that
> >> changed during a transaction, then *all* records that have been
> >> written are added to the transaction, even if their values didn't
> >> change.  If there are many such rows (e.g., like in ovn-northd's
> >> case) this incurs a significant overhead because:
> >> a. the client has to build this large transaction
> >> b. the transaction has to be sent over the network
> >> c. the server needs to parse this (mostly) no-op update
> >>
> >> We now introduce new IDL APIs allowing users to set a new monitoring
> >> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that
the
> >> atomicity constraints may be relaxed and written columns that don't
> >> change value can be skipped from the current transaction.
> >
> > Is this something that we should consider adding at some point to the
> > Python IDL code as well? I know in the past feature support has
> > diverged. Maybe at some point we should create a python/TODO file that
> > lists features in the C IDL that still need to be implemented if they
> > aren't immediately added with C IDL changes?
>
> You're right, we should [robably consider adding this as a new mode to
> the Python IDL too.  I can do that as a follow up if needed.  However,
> until now, the only user of this new mode is ovn-northd and it will be
> using it in conjunction with change tracking (which is another missing
> Python IDL feature).
>
> Han, Ilya, what do you think?
>
> Nevertheless, a TODO list would probably be good to have indeed.
>
I agree that a TODO would be good for the python IDL. Since there are
already feature gaps, I think it may be a separate patch to add TODOs for
all the gaps.

Thanks,
Han

> >
> >> We benchmarked ovn-northd performance when using this new mode
> >> against NB and SB databases taken from ovn-kubernetes scale tests.
> >> We noticed that when a minor change is performed to the Northbound
> >> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
> >> build the Southbound transaction becomes negligible (vs ~1.5 seconds
> >> before this change).
> >>
> >> End-to-end ovn-kubernetes scale tests on 120-node clusters also show
> >> significant reduction of latency to bring up pods; both average and P99
> >> latency decreased by ~30%.
> >>
> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >> ---
> >> V3:
> >> - Addressed Han's comments:
> >>   - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
> >> - Rephrased commit log.
> >> - Changed commit title to reflect the new approach.
> >> - Old patch (v2) was:
> >>
https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
> >> V2:
> >> - Addressed Numan's comments:
> >>   - Added APIs to allow per column configuration of modes.
> >> - Fixed unit tests to actually enable change tracking for write-only
> >>   columns.
> >> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
> >>   seqnos if change tracking is enabled (even for write-only rows).
> >>
> >> Note: The OVN counter part change is:
> >>
https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
> >> ---
> >>  NEWS               |  4 ++++
> >>  lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
> >>  lib/ovsdb-idl.h    | 14 +++++++++++-
> >>  tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
> >>  tests/test-ovsdb.c | 18 +++++++++++----
> >>  5 files changed, 111 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 5074b97aa51c..60932791bfcf 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -14,6 +14,10 @@ Post-v2.17.0
> >>     - OVSDB:
> >>       * 'relay' service model now supports transaction history, i.e.
honors the
> >>         'last-txn-id' field in 'monitor_cond_since' requests from
clients.
> >> +   - OVSDB-IDL:
> >> +     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
> >> +       applications to relax atomicity requirements when dealing with
> >> +       columns whose value has been rewritten (but not changed).
> >>
> >>
> >>  v2.17.0 - 17 Feb 2022
> >> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >> index 882ede75598d..e5bb6cca5274 100644
> >> --- a/lib/ovsdb-idl.c
> >> +++ b/lib/ovsdb-idl.c
> >> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
> >>  {
> >>      ovsdb_idl_track_clear__(idl, false);
> >>  }
> >> +
> >> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
> >> + * for 'column' in 'idl', ensuring that the column will be included
in a
> >> + * transaction only if its value has actually changed locally.
Normally
> >> + * read/write columns that are written to are always included in the
> >> + * transaction but, in specific cases, when the application doesn't
> >> + * require atomicity of writes across different columns the ones that
> >> + * don't chave value may be skipped.
> >> + *
> >> + * This function should be called between ovsdb_idl_create() and
> >> + * the first call to ovsdb_idl_run().
> >> + */
> >> +void
> >> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> >> +                                const struct ovsdb_idl_column *column,
> >> +                                bool enable)
> >> +{
> >> +    if (enable) {
> >> +        *ovsdb_idl_get_mode(idl, column) |=
OVSDB_IDL_WRITE_CHANGED_ONLY;
> >> +    } else {
> >> +        *ovsdb_idl_get_mode(idl, column) &=
~OVSDB_IDL_WRITE_CHANGED_ONLY;
> >> +    }
> >> +}
> >> +
> >> +void
> >> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool
enable)
> >> +{
> >> +    for (size_t i = 0; i < idl->class_->n_tables; i++) {
> >> +        const struct ovsdb_idl_table_class *tc =
&idl->class_->tables[i];
> >> +
> >> +        for (size_t j = 0; j < tc->n_columns; j++) {
> >> +            const struct ovsdb_idl_column *column = &tc->columns[j];
> >> +            ovsdb_idl_set_write_change_only(idl, column, enable);
> >> +        }
> >> +    }
> >> +}
> >>
> >>  static void
> >>  log_parse_update_error(struct ovsdb_error *error)
> >> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
*row_,
> >>  {
> >>      struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *,
row_);
> >>      const struct ovsdb_idl_table_class *class;
> >> +    unsigned char column_mode;
> >> +    bool optimize_rewritten;
> >>      size_t column_idx;
> >>      bool write_only;
> >>
> >> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct
ovsdb_idl_row *row_,
> >>
> >>      class = row->table->class_;
> >>      column_idx = column - class->columns;
> >> -    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
> >> +    column_mode = row->table->modes[column_idx];
> >> +    write_only = column_mode == OVSDB_IDL_MONITOR;
> >> +    optimize_rewritten =
> >> +        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
> >> +
> >>
> >>      ovs_assert(row->new_datum != NULL);
> >>      ovs_assert(column_idx < class->n_columns);
> >> -    ovs_assert(row->old_datum == NULL ||
> >> -               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
> >> +    ovs_assert(row->old_datum == NULL || column_mode &
OVSDB_IDL_MONITOR);
> >>
> >>      if (row->table->idl->verify_write_only && !write_only) {
> >>          VLOG_ERR("Bug: Attempt to write to a read/write column
(%s:%s) when"
> >> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct
ovsdb_idl_row *row_,
> >>       * different value in that column since we read it.  (But if a
whole
> >>       * transaction only does writes of existing values, without
making any real
> >>       * changes, we will drop the whole transaction later in
> >> -     * ovsdb_idl_txn_commit().) */
> >> -    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
> >> -                                         datum, &column->type)) {
> >> +     * ovsdb_idl_txn_commit().)
> >> +     *
> >> +     * The application may choose to bypass this restriction and
always
> >> +     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
> >> +     */
> >> +    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row,
column),
> >> +                                                 datum,
&column->type)) {
> >>          goto discard_datum;
> >>      }
> >>
> >> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
> >> index d00599616ef9..1716955ab61c 100644
> >> --- a/lib/ovsdb-idl.h
> >> +++ b/lib/ovsdb-idl.h
> >> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct
ovsdb_idl *,
> >>   *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a
column
> >>   *     that a client wants to track using the change tracking
> >>   *     ovsdb_idl_track_get_*() functions.
> >> + *
> >> + *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT |
OVSDB_IDL_WRITE_CHANGED_ONLY)
> >> + *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except
that it
> >> + *     only adds a written column to a transaction if the column's
value
> >> + *     has actually changed.
> >>   */
> >>  #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
> >>  #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column
changes? */
> >> -#define OVSDB_IDL_TRACK   (1 << 2)
> >> +#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
> >> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \
> >> +                          (1 << 3) /* Write back only changed
columns? */
> >>
> >>  void ovsdb_idl_add_column(struct ovsdb_idl *, const struct
ovsdb_idl_column *);
> >>  void ovsdb_idl_add_table(struct ovsdb_idl *,
> >> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct
ovsdb_idl_row *row,
> >>                                  const struct ovsdb_idl_column
*column);
> >>  void ovsdb_idl_track_clear(struct ovsdb_idl *);
> >>
> >> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
> >> +                                     const struct ovsdb_idl_column
*column,
> >> +                                     bool enable);
> >> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool
enable);
> >> +
> >>
> >>  /* Reading the database replica. */
> >>
> >> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> >> index 62e2b638320c..d1cfa59c5758 100644
> >> --- a/tests/ovsdb-idl.at
> >> +++ b/tests/ovsdb-idl.at
> >> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
> >>     OVSDB_SERVER_SHUTDOWN
> >>     AT_CLEANUP])
> >>
> >> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
> >> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
> >> +  [AT_SETUP([$1 - write-changed-only - C])
> >> +   AT_KEYWORDS([ovsdb server idl positive $5])
> >> +   AT_CHECK([ovsdb_start_idltest])
> >> +   m4_if([$2], [], [],
> >> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
[ignore])])
> >> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m'
-vjsonrpc -t10 idl unix:socket $3],
> >> +            [0], [stdout], [ignore])
> >> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> >> +            [0], [$4])
> >> +   OVSDB_SERVER_SHUTDOWN
> >> +   AT_CLEANUP])
> >> +
> >>  # same as OVSDB_CHECK_IDL but uses tcp.
> >>  m4_define([OVSDB_CHECK_IDL_TCP_C],
> >>    [AT_SETUP([$1 - C - tcp])
> >> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],
> >>
> >>  m4_define([OVSDB_CHECK_IDL],
> >>    [OVSDB_CHECK_IDL_C($@)
> >> +  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
> >>     OVSDB_CHECK_IDL_TCP_C($@)
> >>     OVSDB_CHECK_IDL_TCP6_C($@)
> >>     OVSDB_CHECK_IDL_PY($@)
> >> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
> >>     OVSDB_SERVER_SHUTDOWN
> >>     AT_CLEANUP])
> >>
> >> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
> >> +  [AT_SETUP([$1 - write-changed-only - C])
> >> +   AT_KEYWORDS([ovsdb server idl tracking positive $5])
> >> +   AT_CHECK([ovsdb_start_idltest])
> >> +   m4_if([$2], [], [],
> >> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
[ignore])])
> >> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m'
-vjsonrpc -t10 -c -w idl unix:socket $3],
> >> +            [0], [stdout], [ignore])
> >> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
> >> +            [0], [$4])
> >> +   OVSDB_SERVER_SHUTDOWN
> >> +   AT_CLEANUP])
> >> +
> >>  m4_define([OVSDB_CHECK_IDL_TRACK],
> >> -  [OVSDB_CHECK_IDL_TRACK_C($@)])
> >> +  [OVSDB_CHECK_IDL_TRACK_C($@)
> >> +   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
> >>
> >>  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
> >>    [['["idltest",
> >> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
> >> index ca4e87b8115c..808b15355743 100644
> >> --- a/tests/test-ovsdb.c
> >> +++ b/tests/test-ovsdb.c
> >> @@ -56,6 +56,7 @@
> >>  VLOG_DEFINE_THIS_MODULE(test_ovsdb);
> >>
> >>  struct test_ovsdb_pvt_context {
> >> +    bool write_changed_only;
> >>      bool track;
> >>  };
> >>
> >> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct
test_ovsdb_pvt_context *pvt)
> >>          {"timeout", required_argument, NULL, 't'},
> >>          {"verbose", optional_argument, NULL, 'v'},
> >>          {"change-track", optional_argument, NULL, 'c'},
> >> +        {"write-changed-only", optional_argument, NULL, 'w'},
> >>          {"magic", required_argument, NULL, OPT_MAGIC},
> >>          {"no-rename-open-files", no_argument, NULL,
OPT_NO_RENAME_OPEN_FILES},
> >>          {"help", no_argument, NULL, 'h'},
> >> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct
test_ovsdb_pvt_context *pvt)
> >>              pvt->track = true;
> >>              break;
> >>
> >> +        case 'w':
> >> +            pvt->write_changed_only = true;
> >> +            break;
> >> +
> >>          case OPT_MAGIC:
> >>              magic = optarg;
> >>              break;
> >> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char
*commands)
> >>  static void
> >>  do_idl(struct ovs_cmdl_context *ctx)
> >>  {
> >> +    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
> >>      struct jsonrpc *rpc;
> >>      struct ovsdb_idl *idl;
> >>      unsigned int seqno = 0;
> >> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
> >>      int step = 0;
> >>      int error;
> >>      int i;
> >> -    bool track;
> >> -
> >> -    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
> >>
> >>      idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true,
true);
> >>      ovsdb_idl_set_leader_only(idl, false);
> >> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
> >>          rpc = NULL;
> >>      }
> >>
> >> -    if (track) {
> >> +    if (pvt->track) {
> >>          ovsdb_idl_track_add_all(idl);
> >>      }
> >>
> >> +    if (pvt->write_changed_only) {
> >> +        ovsdb_idl_set_write_change_only_all(idl, true);
> >> +    }
> >> +
> >>      setvbuf(stdout, NULL, _IONBF, 0);
> >>
> >>      symtab = ovsdb_symbol_table_create();
> >> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
> >>              }
> >>
> >>              /* Print update. */
> >> -            if (track) {
> >> +            if (pvt->track) {
> >>                  print_idl_track(idl, step++, terse);
> >>                  ovsdb_idl_track_clear(idl);
> >>              } else {
> >> --
> >> 2.27.0
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
>
Dumitru Ceara April 26, 2022, 8:05 a.m. UTC | #5
On 4/26/22 03:08, Han Zhou wrote:
> On Wed, Apr 20, 2022 at 12:31 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> At a first glance, change tracking should never be allowed for
>> write-only columns.  However, some clients (e.g., ovn-northd) that are
>> mostly exclusive writers of a database, use change tracking to avoid
>> duplicating the IDL row records into a local cache when implementing
>> incremental processing.
>>
>> The default behavior of the IDL is to automatically turn a write-only
>> column into a read-write column whenever the client enables change
>> tracking for that column.
>>
>> For the afore mentioned clients, this becomes a performance issue.
>> Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of writes that don't
>> change a column's value.") explains why writes that don't change a
>> column's value cannot be optimized out early if the column is
>> read/write.
>>
>> Furthermore, if there is at least one record in any table that
>> changed during a transaction, then *all* records that have been
>> written are added to the transaction, even if their values didn't
>> change.  If there are many such rows (e.g., like in ovn-northd's
>> case) this incurs a significant overhead because:
>> a. the client has to build this large transaction
>> b. the transaction has to be sent over the network
>> c. the server needs to parse this (mostly) no-op update
>>
>> We now introduce new IDL APIs allowing users to set a new monitoring
>> mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
>> atomicity constraints may be relaxed and written columns that don't
>> change value can be skipped from the current transaction.
>>
>> We benchmarked ovn-northd performance when using this new mode
>> against NB and SB databases taken from ovn-kubernetes scale tests.
>> We noticed that when a minor change is performed to the Northbound
>> database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
>> build the Southbound transaction becomes negligible (vs ~1.5 seconds
>> before this change).
>>
>> End-to-end ovn-kubernetes scale tests on 120-node clusters also show
>> significant reduction of latency to bring up pods; both average and P99
>> latency decreased by ~30%.
>>
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> 
> Thanks Dumitru for the revision. I only have two minor comments below.
> 
> Acked-by: Han Zhou <hzhou@ovn.org>
> 
>> ---
>> V3:
>> - Addressed Han's comments:
>>   - Add a new IDL monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY.
>> - Rephrased commit log.
>> - Changed commit title to reflect the new approach.
>> - Old patch (v2) was:
>>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220415142146.13169-1-dceara@redhat.com/
>> V2:
>> - Addressed Numan's comments:
>>   - Added APIs to allow per column configuration of modes.
>> - Fixed unit tests to actually enable change tracking for write-only
>>   columns.
>> - Fixed ovsdb_idl_row_change() to correctly update row/table/idl change
>>   seqnos if change tracking is enabled (even for write-only rows).
>>
>> Note: The OVN counter part change is:
>>
> https://github.com/dceara/ovn/commit/5afba6f3aa51f508df41d8f0b7b9d739b00b1ee2
>> ---
>>  NEWS               |  4 ++++
>>  lib/ovsdb-idl.c    | 57 +++++++++++++++++++++++++++++++++++++++++-----
>>  lib/ovsdb-idl.h    | 14 +++++++++++-
>>  tests/ovsdb-idl.at | 31 ++++++++++++++++++++++++-
>>  tests/test-ovsdb.c | 18 +++++++++++----
>>  5 files changed, 111 insertions(+), 13 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 5074b97aa51c..60932791bfcf 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -14,6 +14,10 @@ Post-v2.17.0
>>     - OVSDB:
>>       * 'relay' service model now supports transaction history, i.e.
> honors the
>>         'last-txn-id' field in 'monitor_cond_since' requests from clients.
>> +   - OVSDB-IDL:
>> +     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
>> +       applications to relax atomicity requirements when dealing with
>> +       columns whose value has been rewritten (but not changed).
>>
>>
>>  v2.17.0 - 17 Feb 2022
>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> index 882ede75598d..e5bb6cca5274 100644
>> --- a/lib/ovsdb-idl.c
>> +++ b/lib/ovsdb-idl.c
>> @@ -1396,6 +1396,42 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
>>  {
>>      ovsdb_idl_track_clear__(idl, false);
>>  }
>> +
>> +/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
>> + * for 'column' in 'idl', ensuring that the column will be included in a
>> + * transaction only if its value has actually changed locally.  Normally
>> + * read/write columns that are written to are always included in the
>> + * transaction but, in specific cases, when the application doesn't
>> + * require atomicity of writes across different columns the ones that
>> + * don't chave value may be skipped.
> 
> s/chave/change
> 

Ack.

>> + *
>> + * This function should be called between ovsdb_idl_create() and
>> + * the first call to ovsdb_idl_run().
>> + */
>> +void
>> +ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
>> +                                const struct ovsdb_idl_column *column,
>> +                                bool enable)
>> +{
>> +    if (enable) {
>> +        *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
>> +    } else {
>> +        *ovsdb_idl_get_mode(idl, column) &=
> ~OVSDB_IDL_WRITE_CHANGED_ONLY;
>> +    }
>> +}
>> +
>> +void
>> +ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable)
>> +{
>> +    for (size_t i = 0; i < idl->class_->n_tables; i++) {
>> +        const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];
>> +
>> +        for (size_t j = 0; j < tc->n_columns; j++) {
>> +            const struct ovsdb_idl_column *column = &tc->columns[j];
>> +            ovsdb_idl_set_write_change_only(idl, column, enable);
>> +        }
>> +    }
>> +}
>>
>>  static void
>>  log_parse_update_error(struct ovsdb_error *error)
>> @@ -3491,6 +3527,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
> *row_,
>>  {
>>      struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
>>      const struct ovsdb_idl_table_class *class;
>> +    unsigned char column_mode;
>> +    bool optimize_rewritten;
>>      size_t column_idx;
>>      bool write_only;
>>
>> @@ -3501,12 +3539,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
> *row_,
>>
>>      class = row->table->class_;
>>      column_idx = column - class->columns;
>> -    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
>> +    column_mode = row->table->modes[column_idx];
>> +    write_only = column_mode == OVSDB_IDL_MONITOR;
>> +    optimize_rewritten =
>> +        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
>> +
>>
>>      ovs_assert(row->new_datum != NULL);
>>      ovs_assert(column_idx < class->n_columns);
>> -    ovs_assert(row->old_datum == NULL ||
>> -               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
>> +    ovs_assert(row->old_datum == NULL || column_mode &
> OVSDB_IDL_MONITOR);
>>
>>      if (row->table->idl->verify_write_only && !write_only) {
>>          VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s)
> when"
>> @@ -3524,9 +3565,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row
> *row_,
>>       * different value in that column since we read it.  (But if a whole
>>       * transaction only does writes of existing values, without making
> any real
>>       * changes, we will drop the whole transaction later in
>> -     * ovsdb_idl_txn_commit().) */
>> -    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
>> -                                         datum, &column->type)) {
>> +     * ovsdb_idl_txn_commit().)
>> +     *
>> +     * The application may choose to bypass this restriction and always
>> +     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
>> +     */
>> +    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row,
> column),
>> +                                                 datum, &column->type)) {
>>          goto discard_datum;
>>      }
>>
>> diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
>> index d00599616ef9..1716955ab61c 100644
>> --- a/lib/ovsdb-idl.h
>> +++ b/lib/ovsdb-idl.h
>> @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct
> ovsdb_idl *,
>>   *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a
> column
>>   *     that a client wants to track using the change tracking
>>   *     ovsdb_idl_track_get_*() functions.
>> + *
>> + *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT |
> OVSDB_IDL_WRITE_CHANGED_ONLY)
>> + *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
>> + *     only adds a written column to a transaction if the column's value
>> + *     has actually changed.
> 
> In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR
> | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as
> well, but I think it is too tedious. Maybe we can change the section from
> "possible mode combinations" to "typical mode combinations".
> 
> 

I'll change this in v4.

Thanks!

Dumitru

> 
>>   */
>>  #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
>>  #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column changes?
> */
>> -#define OVSDB_IDL_TRACK   (1 << 2)
>> +#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
>> +#define OVSDB_IDL_WRITE_CHANGED_ONLY \
>> +                          (1 << 3) /* Write back only changed columns? */
>>
>>  void ovsdb_idl_add_column(struct ovsdb_idl *, const struct
> ovsdb_idl_column *);
>>  void ovsdb_idl_add_table(struct ovsdb_idl *,
>> @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct
> ovsdb_idl_row *row,
>>                                  const struct ovsdb_idl_column *column);
>>  void ovsdb_idl_track_clear(struct ovsdb_idl *);
>>
>> +void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
>> +                                     const struct ovsdb_idl_column
> *column,
>> +                                     bool enable);
>> +void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool
> enable);
>> +
>>
>>  /* Reading the database replica. */
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 62e2b638320c..d1cfa59c5758 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
>>     OVSDB_SERVER_SHUTDOWN
>>     AT_CLEANUP])
>>
>> +# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
>> +m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
>> +  [AT_SETUP([$1 - write-changed-only - C])
>> +   AT_KEYWORDS([ovsdb server idl positive $5])
>> +   AT_CHECK([ovsdb_start_idltest])
>> +   m4_if([$2], [], [],
>> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
> [ignore])])
>> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc
> -t10 idl unix:socket $3],
>> +            [0], [stdout], [ignore])
>> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
>> +            [0], [$4])
>> +   OVSDB_SERVER_SHUTDOWN
>> +   AT_CLEANUP])
>> +
>>  # same as OVSDB_CHECK_IDL but uses tcp.
>>  m4_define([OVSDB_CHECK_IDL_TCP_C],
>>    [AT_SETUP([$1 - C - tcp])
>> @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],
>>
>>  m4_define([OVSDB_CHECK_IDL],
>>    [OVSDB_CHECK_IDL_C($@)
>> +  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
>>     OVSDB_CHECK_IDL_TCP_C($@)
>>     OVSDB_CHECK_IDL_TCP6_C($@)
>>     OVSDB_CHECK_IDL_PY($@)
>> @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
>>     OVSDB_SERVER_SHUTDOWN
>>     AT_CLEANUP])
>>
>> +m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
>> +  [AT_SETUP([$1 - write-changed-only - C])
>> +   AT_KEYWORDS([ovsdb server idl tracking positive $5])
>> +   AT_CHECK([ovsdb_start_idltest])
>> +   m4_if([$2], [], [],
>> +     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore],
> [ignore])])
>> +   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc
> -t10 -c -w idl unix:socket $3],
>> +            [0], [stdout], [ignore])
>> +   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
>> +            [0], [$4])
>> +   OVSDB_SERVER_SHUTDOWN
>> +   AT_CLEANUP])
>> +
>>  m4_define([OVSDB_CHECK_IDL_TRACK],
>> -  [OVSDB_CHECK_IDL_TRACK_C($@)])
>> +  [OVSDB_CHECK_IDL_TRACK_C($@)
>> +   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
>>
>>  OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
>>    [['["idltest",
>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>> index ca4e87b8115c..808b15355743 100644
>> --- a/tests/test-ovsdb.c
>> +++ b/tests/test-ovsdb.c
>> @@ -56,6 +56,7 @@
>>  VLOG_DEFINE_THIS_MODULE(test_ovsdb);
>>
>>  struct test_ovsdb_pvt_context {
>> +    bool write_changed_only;
>>      bool track;
>>  };
>>
>> @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct
> test_ovsdb_pvt_context *pvt)
>>          {"timeout", required_argument, NULL, 't'},
>>          {"verbose", optional_argument, NULL, 'v'},
>>          {"change-track", optional_argument, NULL, 'c'},
>> +        {"write-changed-only", optional_argument, NULL, 'w'},
>>          {"magic", required_argument, NULL, OPT_MAGIC},
>>          {"no-rename-open-files", no_argument, NULL,
> OPT_NO_RENAME_OPEN_FILES},
>>          {"help", no_argument, NULL, 'h'},
>> @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct
> test_ovsdb_pvt_context *pvt)
>>              pvt->track = true;
>>              break;
>>
>> +        case 'w':
>> +            pvt->write_changed_only = true;
>> +            break;
>> +
>>          case OPT_MAGIC:
>>              magic = optarg;
>>              break;
>> @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char
> *commands)
>>  static void
>>  do_idl(struct ovs_cmdl_context *ctx)
>>  {
>> +    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
>>      struct jsonrpc *rpc;
>>      struct ovsdb_idl *idl;
>>      unsigned int seqno = 0;
>> @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
>>      int step = 0;
>>      int error;
>>      int i;
>> -    bool track;
>> -
>> -    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
>>
>>      idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
>>      ovsdb_idl_set_leader_only(idl, false);
>> @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
>>          rpc = NULL;
>>      }
>>
>> -    if (track) {
>> +    if (pvt->track) {
>>          ovsdb_idl_track_add_all(idl);
>>      }
>>
>> +    if (pvt->write_changed_only) {
>> +        ovsdb_idl_set_write_change_only_all(idl, true);
>> +    }
>> +
>>      setvbuf(stdout, NULL, _IONBF, 0);
>>
>>      symtab = ovsdb_symbol_table_create();
>> @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>              }
>>
>>              /* Print update. */
>> -            if (track) {
>> +            if (pvt->track) {
>>                  print_idl_track(idl, step++, terse);
>>                  ovsdb_idl_track_clear(idl);
>>              } else {
>> --
>> 2.27.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara April 26, 2022, 8:06 a.m. UTC | #6
On 4/26/22 03:10, Han Zhou wrote:
>>> Is this something that we should consider adding at some point to the
>>> Python IDL code as well? I know in the past feature support has
>>> diverged. Maybe at some point we should create a python/TODO file that
>>> lists features in the C IDL that still need to be implemented if they
>>> aren't immediately added with C IDL changes?
>> You're right, we should [robably consider adding this as a new mode to
>> the Python IDL too.  I can do that as a follow up if needed.  However,
>> until now, the only user of this new mode is ovn-northd and it will be
>> using it in conjunction with change tracking (which is another missing
>> Python IDL feature).
>>
>> Han, Ilya, what do you think?
>>
>> Nevertheless, a TODO list would probably be good to have indeed.
>>
> I agree that a TODO would be good for the python IDL. Since there are
> already feature gaps, I think it may be a separate patch to add TODOs for
> all the gaps.
> 
> Thanks,
> Han
> 

OK, thanks for the feedback!

I'll work on a patch to add a python/TODO.rst file and send it out after
the C IDL change is accepted.

Regards,
Dumitru
Dumitru Ceara April 26, 2022, 10:41 a.m. UTC | #7
On 4/26/22 10:05, Dumitru Ceara wrote:
>> In the original form of this comment, we need to mention "OVSDB_IDL_MONITOR
>> | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK | OVSDB_IDL_WRITE_CHANGED_ONLY" as
>> well, but I think it is too tedious. Maybe we can change the section from
>> "possible mode combinations" to "typical mode combinations".
>>
>>
> I'll change this in v4.
> 

v4 available at:
https://patchwork.ozlabs.org/project/openvswitch/patch/20220426103708.17266-1-dceara@redhat.com/
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 5074b97aa51c..60932791bfcf 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,10 @@  Post-v2.17.0
    - OVSDB:
      * 'relay' service model now supports transaction history, i.e. honors the
        'last-txn-id' field in 'monitor_cond_since' requests from clients.
+   - OVSDB-IDL:
+     * New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
+       applications to relax atomicity requirements when dealing with
+       columns whose value has been rewritten (but not changed).
 
 
 v2.17.0 - 17 Feb 2022
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 882ede75598d..e5bb6cca5274 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -1396,6 +1396,42 @@  ovsdb_idl_track_clear(struct ovsdb_idl *idl)
 {
     ovsdb_idl_track_clear__(idl, false);
 }
+
+/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
+ * for 'column' in 'idl', ensuring that the column will be included in a
+ * transaction only if its value has actually changed locally.  Normally
+ * read/write columns that are written to are always included in the
+ * transaction but, in specific cases, when the application doesn't
+ * require atomicity of writes across different columns the ones that
+ * don't chave value may be skipped.
+ *
+ * This function should be called between ovsdb_idl_create() and
+ * the first call to ovsdb_idl_run().
+ */
+void
+ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
+                                const struct ovsdb_idl_column *column,
+                                bool enable)
+{
+    if (enable) {
+        *ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
+    } else {
+        *ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY;
+    }
+}
+
+void
+ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable)
+{
+    for (size_t i = 0; i < idl->class_->n_tables; i++) {
+        const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];
+
+        for (size_t j = 0; j < tc->n_columns; j++) {
+            const struct ovsdb_idl_column *column = &tc->columns[j];
+            ovsdb_idl_set_write_change_only(idl, column, enable);
+        }
+    }
+}
 
 static void
 log_parse_update_error(struct ovsdb_error *error)
@@ -3491,6 +3527,8 @@  ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
 {
     struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
     const struct ovsdb_idl_table_class *class;
+    unsigned char column_mode;
+    bool optimize_rewritten;
     size_t column_idx;
     bool write_only;
 
@@ -3501,12 +3539,15 @@  ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
 
     class = row->table->class_;
     column_idx = column - class->columns;
-    write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
+    column_mode = row->table->modes[column_idx];
+    write_only = column_mode == OVSDB_IDL_MONITOR;
+    optimize_rewritten =
+        write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);
+
 
     ovs_assert(row->new_datum != NULL);
     ovs_assert(column_idx < class->n_columns);
-    ovs_assert(row->old_datum == NULL ||
-               row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
+    ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR);
 
     if (row->table->idl->verify_write_only && !write_only) {
         VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when"
@@ -3524,9 +3565,13 @@  ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
      * different value in that column since we read it.  (But if a whole
      * transaction only does writes of existing values, without making any real
      * changes, we will drop the whole transaction later in
-     * ovsdb_idl_txn_commit().) */
-    if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
-                                         datum, &column->type)) {
+     * ovsdb_idl_txn_commit().)
+     *
+     * The application may choose to bypass this restriction and always
+     * optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
+     */
+    if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column),
+                                                 datum, &column->type)) {
         goto discard_datum;
     }
 
diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
index d00599616ef9..1716955ab61c 100644
--- a/lib/ovsdb-idl.h
+++ b/lib/ovsdb-idl.h
@@ -181,10 +181,17 @@  bool ovsdb_idl_server_has_column(const struct ovsdb_idl *,
  *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column
  *     that a client wants to track using the change tracking
  *     ovsdb_idl_track_get_*() functions.
+ *
+ *   - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY)
+ *     is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
+ *     only adds a written column to a transaction if the column's value
+ *     has actually changed.
  */
 #define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
 #define OVSDB_IDL_ALERT   (1 << 1) /* Alert client when column changes? */
-#define OVSDB_IDL_TRACK   (1 << 2)
+#define OVSDB_IDL_TRACK   (1 << 2) /* Track column changes? */
+#define OVSDB_IDL_WRITE_CHANGED_ONLY \
+                          (1 << 3) /* Write back only changed columns? */
 
 void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *);
 void ovsdb_idl_add_table(struct ovsdb_idl *,
@@ -233,6 +240,11 @@  bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row,
                                 const struct ovsdb_idl_column *column);
 void ovsdb_idl_track_clear(struct ovsdb_idl *);
 
+void ovsdb_idl_set_write_change_only(struct ovsdb_idl *idl,
+                                     const struct ovsdb_idl_column *column,
+                                     bool enable);
+void ovsdb_idl_set_write_change_only_all(struct ovsdb_idl *idl, bool enable);
+
 
 /* Reading the database replica. */
 
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 62e2b638320c..d1cfa59c5758 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -87,6 +87,20 @@  m4_define([OVSDB_CHECK_IDL_C],
    OVSDB_SERVER_SHUTDOWN
    AT_CLEANUP])
 
+# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
+m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
+  [AT_SETUP([$1 - write-changed-only - C])
+   AT_KEYWORDS([ovsdb server idl positive $5])
+   AT_CHECK([ovsdb_start_idltest])
+   m4_if([$2], [], [],
+     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
+   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3],
+            [0], [stdout], [ignore])
+   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
+            [0], [$4])
+   OVSDB_SERVER_SHUTDOWN
+   AT_CLEANUP])
+
 # same as OVSDB_CHECK_IDL but uses tcp.
 m4_define([OVSDB_CHECK_IDL_TCP_C],
   [AT_SETUP([$1 - C - tcp])
@@ -257,6 +271,7 @@  m4_define([OVSDB_CHECK_IDL_SSL_PY],
 
 m4_define([OVSDB_CHECK_IDL],
   [OVSDB_CHECK_IDL_C($@)
+  OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
    OVSDB_CHECK_IDL_TCP_C($@)
    OVSDB_CHECK_IDL_TCP6_C($@)
    OVSDB_CHECK_IDL_PY($@)
@@ -1166,8 +1181,22 @@  m4_define([OVSDB_CHECK_IDL_TRACK_C],
    OVSDB_SERVER_SHUTDOWN
    AT_CLEANUP])
 
+m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
+  [AT_SETUP([$1 - write-changed-only - C])
+   AT_KEYWORDS([ovsdb server idl tracking positive $5])
+   AT_CHECK([ovsdb_start_idltest])
+   m4_if([$2], [], [],
+     [AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
+   AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3],
+            [0], [stdout], [ignore])
+   AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
+            [0], [$4])
+   OVSDB_SERVER_SHUTDOWN
+   AT_CLEANUP])
+
 m4_define([OVSDB_CHECK_IDL_TRACK],
-  [OVSDB_CHECK_IDL_TRACK_C($@)])
+  [OVSDB_CHECK_IDL_TRACK_C($@)
+   OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])
 
 OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
   [['["idltest",
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index ca4e87b8115c..808b15355743 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -56,6 +56,7 @@ 
 VLOG_DEFINE_THIS_MODULE(test_ovsdb);
 
 struct test_ovsdb_pvt_context {
+    bool write_changed_only;
     bool track;
 };
 
@@ -91,6 +92,7 @@  parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
         {"timeout", required_argument, NULL, 't'},
         {"verbose", optional_argument, NULL, 'v'},
         {"change-track", optional_argument, NULL, 'c'},
+        {"write-changed-only", optional_argument, NULL, 'w'},
         {"magic", required_argument, NULL, OPT_MAGIC},
         {"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES},
         {"help", no_argument, NULL, 'h'},
@@ -125,6 +127,10 @@  parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
             pvt->track = true;
             break;
 
+        case 'w':
+            pvt->write_changed_only = true;
+            break;
+
         case OPT_MAGIC:
             magic = optarg;
             break;
@@ -2610,6 +2616,7 @@  update_conditions(struct ovsdb_idl *idl, char *commands)
 static void
 do_idl(struct ovs_cmdl_context *ctx)
 {
+    struct test_ovsdb_pvt_context *pvt = ctx->pvt;
     struct jsonrpc *rpc;
     struct ovsdb_idl *idl;
     unsigned int seqno = 0;
@@ -2618,9 +2625,6 @@  do_idl(struct ovs_cmdl_context *ctx)
     int step = 0;
     int error;
     int i;
-    bool track;
-
-    track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;
 
     idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
     ovsdb_idl_set_leader_only(idl, false);
@@ -2637,10 +2641,14 @@  do_idl(struct ovs_cmdl_context *ctx)
         rpc = NULL;
     }
 
-    if (track) {
+    if (pvt->track) {
         ovsdb_idl_track_add_all(idl);
     }
 
+    if (pvt->write_changed_only) {
+        ovsdb_idl_set_write_change_only_all(idl, true);
+    }
+
     setvbuf(stdout, NULL, _IONBF, 0);
 
     symtab = ovsdb_symbol_table_create();
@@ -2683,7 +2691,7 @@  do_idl(struct ovs_cmdl_context *ctx)
             }
 
             /* Print update. */
-            if (track) {
+            if (pvt->track) {
                 print_idl_track(idl, step++, terse);
                 ovsdb_idl_track_clear(idl);
             } else {