diff mbox series

[ovs-dev,v2] Provide the option to pin ovn-controller and ovn-northd to a specific version.

Message ID 20201117115709.3527697-1-numans@ovn.org
State Changes Requested
Headers show
Series [ovs-dev,v2] Provide the option to pin ovn-controller and ovn-northd to a specific version. | expand

Commit Message

Numan Siddique Nov. 17, 2020, 11:57 a.m. UTC
From: Numan Siddique <numans@ovn.org>

OVN recommends updating/upgrading ovn-controllers first and then
ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
new functionality specified by the database or logical flows created
by ovn-northd is understood by ovn-controller.

However certain deployments may upgrade ovn-northd services first and
then ovn-controllers.  In a large scal deployment, this can result in
downtime during upgrades as old ovn-controllers may not understand
new logical flows or new actions added by ovn-northd.

Even with upgrading ovn-controllers first, can result in ovn-controllers
rejecting some of the logical flows if an existing OVN action is
changed.  One such example is ct_commit action which recently was updated
to take new arguments.

To avoid such downtimes during upgrades, this patch adds the
functionality of pinning ovn-controller and ovn-northd to a specific
version. An internal OVN version is generated and this version is stored
by ovn-northd in the Southbound SB_Global table's
options:northd_internal_version.

When ovn-controller notices that the internal version has changed, it
stops handling the database changes - both Southbound and OVS. All the
existing OF flows are preserved.  When ovn-controller is upgraded to the
same version as ovn-northd services, it will process the database
changes.

This feature is made optional and disabled by default. Any CMS can
enable it by configuring the OVS local database with the option -
ovn-pin-to-northd=false.

Signed-off-by: Numan Siddique <numans@ovn.org>
---

v1 -> v2
----
  * Added test cases and missing documentation.
  * Renamed the ovsdb option from ignore_northd_verison
    to ovn-pin-to-northd.
  * Added NEWS item.

 NEWS                            |   3 +
 controller/ovn-controller.8.xml |  11 +++
 controller/ovn-controller.c     |  58 ++++++++++++-
 lib/ovn-util.c                  |  17 ++++
 lib/ovn-util.h                  |   4 +
 northd/ovn-northd.c             |  18 +++-
 tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
 7 files changed, 251 insertions(+), 7 deletions(-)

Comments

Flaviof Nov. 17, 2020, 5:21 p.m. UTC | #1
Hi Numan!

A few nits I thought of mentioning [inline]

> On Nov 17, 2020, at 6:57 AM, numans@ovn.org wrote:
> 
> From: Numan Siddique <numans@ovn.org>
> 
> OVN recommends updating/upgrading ovn-controllers first and then
> ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
> new functionality specified by the database or logical flows created
> by ovn-northd is understood by ovn-controller.
> 
> However certain deployments may upgrade ovn-northd services first and
> then ovn-controllers.  In a large scal deployment, this can result in
> downtime during upgrades as old ovn-controllers may not understand
> new logical flows or new actions added by ovn-northd.
> 
> Even with upgrading ovn-controllers first, can result in ovn-controllers
> rejecting some of the logical flows if an existing OVN action is
> changed.  One such example is ct_commit action which recently was updated
> to take new arguments.
> 
> To avoid such downtimes during upgrades, this patch adds the
> functionality of pinning ovn-controller and ovn-northd to a specific
> version. An internal OVN version is generated and this version is stored
> by ovn-northd in the Southbound SB_Global table's
> options:northd_internal_version.
> 
> When ovn-controller notices that the internal version has changed, it
> stops handling the database changes - both Southbound and OVS. All the
> existing OF flows are preserved.  When ovn-controller is upgraded to the
> same version as ovn-northd services, it will process the database
> changes.
> 
> This feature is made optional and disabled by default. Any CMS can
> enable it by configuring the OVS local database with the option -
> ovn-pin-to-northd=false.

^^=false or =true^^ ?

> 
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
> 
> v1 -> v2
> ----
>  * Added test cases and missing documentation.
>  * Renamed the ovsdb option from ignore_northd_verison
>    to ovn-pin-to-northd.
>  * Added NEWS item.
> 
> NEWS                            |   3 +
> controller/ovn-controller.8.xml |  11 +++
> controller/ovn-controller.c     |  58 ++++++++++++-
> lib/ovn-util.c                  |  17 ++++
> lib/ovn-util.h                  |   4 +
> northd/ovn-northd.c             |  18 +++-
> tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
> 7 files changed, 251 insertions(+), 7 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 6010230679..bb95724b36 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -8,6 +8,9 @@ Post-v20.09.0
>      server.
>    - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
>      traffic.
> +   - Support version pinning between ovn-northd and ovn-controller as an
> +     option. If the option is enabled and the versions don't match,
> +     ovn-controller will not process any DB changes.
> 
> OVN v20.09.0 - 28 Sep 2020
> --------------------------
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index 16bc47b205..357edd1f5c 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -233,6 +233,17 @@
>         The boolean flag indicates if the chassis is used as an
>         interconnection gateway.
>       </dd>
> +
> +      <dt><code>external_ids:ovn-pin-to-northd</code></dt>
> +      <dd>
> +        The boolean flag indicates if <code>ovn-controller</code> needs to
> +        be pinned to the exact <code>ovn-northd</code> version. If this
> +        flag is set to true and the <code>ovn-northd's</code> version (reported
> +        in the Southbound database) doesn't match with the
> +        <code>ovn-controller's</code> internal version, then it will stop
> +        processing the Southbound and local Open vSwitch database changes.
> +        The default value is considered false if this option is not defined.
> +      </dd>
>     </dl>
> 
>     <p>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index a06cae3ccb..a7344ea9c4 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -2136,6 +2136,37 @@ struct ovn_controller_exit_args {
>     bool *restart;
> };
> 
> +static bool
> +pin_to_northd(struct ovsdb_idl *ovs_idl)
> +{
> +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
> +    return !cfg ? false : smap_get_bool(&cfg->external_ids,
> +                                        "ovn-pin-to-northd", false);
> +}
> +
> +/* Returns false if the northd internal version and ovn-controller internal
> + * version doesn't match.
> + */
> +static bool
> +check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
> +{
> +    if (!sb) {
> +        return false;
> +    }
> +
> +    const char *northd_version =
> +        smap_get_def(&sb->options, "northd_internal_version", "");
> +
> +    bool mismatch = strcmp(northd_version, my_version);
> +    if (mismatch) {
> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> +        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
> +                     "version - %s", my_version, northd_version);
> +    }
> +
> +    return mismatch;
> +}
> +
> int
> main(int argc, char *argv[])
> {
> @@ -2428,10 +2459,14 @@ main(int argc, char *argv[])
>         .enable_lflow_cache = true
>     };
> 
> +    char *ovn_internal_version = ovn_get_internal_version();
> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> +
>     /* Main loop. */
>     exiting = false;
>     restart = false;
>     bool sb_monitor_all = false;
> +    bool version_mismatch_with_northd = false;
>     while (!exiting) {
>         /* If we're paused just run the unixctl server and skip most of the
>          * processing loop.
> @@ -2442,8 +2477,6 @@ main(int argc, char *argv[])
>             goto loop_done;
>         }
> 
> -        engine_init_run();
> -
>         struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
>         unsigned int new_ovs_cond_seqno
>             = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
> @@ -2473,6 +2506,22 @@ main(int argc, char *argv[])
>             ovnsb_cond_seqno = new_ovnsb_cond_seqno;
>         }
> 
> +        bool version_mismatched = false;
> +        if (pin_to_northd(ovs_idl_loop.idl)) {
> +            version_mismatched = check_northd_version(
> +                sbrec_sb_global_first(ovnsb_idl_loop.idl),
> +                ovn_internal_version);
> +        } else {
> +            version_mismatched = false;
> +        }

NIT: I think the else block is redundant, since that is the value that version_mismatched is initialized with.

> +
> +        if (version_mismatch_with_northd != version_mismatched) {
> +            version_mismatch_with_northd = version_mismatched;
> +            engine_set_force_recompute(true);
> +        }
> +
> +        engine_init_run();
> +
>         struct engine_context eng_ctx = {
>             .ovs_idl_txn = ovs_idl_txn,
>             .ovnsb_idl_txn = ovnsb_idl_txn,
> @@ -2481,7 +2530,8 @@ main(int argc, char *argv[])
> 
>         engine_set_context(&eng_ctx);
> 
> -        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
> +        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
> +            !version_mismatch_with_northd) {
>             /* Contains the transport zones that this Chassis belongs to */
>             struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
>             sset_from_delimited_string(&transport_zones,
> @@ -2770,6 +2820,7 @@ loop_done:
>         }
>     }
> 
> +    free(ovn_internal_version);
>     unixctl_server_destroy(unixctl);
>     lflow_destroy();
>     ofctrl_destroy();
> @@ -2822,6 +2873,7 @@ parse_options(int argc, char *argv[])
> 
>         case 'V':
>             ovs_print_version(OFP15_VERSION, OFP15_VERSION);
> +            printf("SB DB Schema %s\n", sbrec_get_db_version());
>             exit(EXIT_SUCCESS);
> 
>         VLOG_OPTION_HANDLERS
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 18aac8da34..9dc9ade3b4 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -20,6 +20,7 @@
> #include <unistd.h>
> 
> #include "daemon.h"
> +#include "include/ovn/actions.h"
> #include "openvswitch/ofp-parse.h"
> #include "openvswitch/vlog.h"
> #include "ovn-dirs.h"
> @@ -713,3 +714,19 @@ ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>     *addr_family = ss.ss_family;
>     return true;
> }
> +
> +#define N_OVN_ACTIONS N_OVNACTS
> +BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
> +
> +/* Increment this for any logical flow changes or if existing OVN action is
> + * modified. */
> +#define OVN_INTERNAL_MINOR_VER 0
> +
> +/* Returns the OVN version. The caller must free the returned value. */
> +char *
> +ovn_get_internal_version(void)
> +{
> +    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
> +                     sbrec_get_db_version(),
> +                     N_OVN_ACTIONS, OVN_INTERNAL_MINOR_VER);
> +}
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index 3496673b26..7edd301802 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -221,4 +221,8 @@ char *str_tolower(const char *orig);
> bool ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>                                      uint16_t *port, int *addr_family);
> 
> +/* Returns the internal OVN version. The caller must free the returned
> + * value. */
> +char *ovn_get_internal_version(void);
> +
> #endif
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 4d4190cb9b..2475605cc0 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -12062,7 +12062,8 @@ ovnnb_db_run(struct northd_context *ctx,
>              struct ovsdb_idl_loop *sb_loop,
>              struct hmap *datapaths, struct hmap *ports,
>              struct ovs_list *lr_list,
> -             int64_t loop_start_time)
> +             int64_t loop_start_time,
> +             const char *ovn_internal_version)
> {
>     if (!ctx->ovnsb_txn || !ctx->ovnnb_txn) {
>         return;
> @@ -12139,6 +12140,8 @@ ovnnb_db_run(struct northd_context *ctx,
>     smap_replace(&options, "max_tunid", max_tunid);
>     free(max_tunid);
> 
> +    smap_replace(&options, "northd_internal_version", ovn_internal_version);
> +
>     nbrec_nb_global_verify_options(nb);
>     nbrec_nb_global_set_options(nb, &options);
> 
> @@ -12746,7 +12749,8 @@ ovnsb_db_run(struct northd_context *ctx,
> static void
> ovn_db_run(struct northd_context *ctx,
>            struct ovsdb_idl_index *sbrec_chassis_by_name,
> -           struct ovsdb_idl_loop *ovnsb_idl_loop)
> +           struct ovsdb_idl_loop *ovnsb_idl_loop,
> +           const char *ovn_internal_version)
> {
>     struct hmap datapaths, ports;
>     struct ovs_list lr_list;
> @@ -12756,7 +12760,8 @@ ovn_db_run(struct northd_context *ctx,
> 
>     int64_t start_time = time_wall_msec();
>     ovnnb_db_run(ctx, sbrec_chassis_by_name, ovnsb_idl_loop,
> -                 &datapaths, &ports, &lr_list, start_time);
> +                 &datapaths, &ports, &lr_list, start_time,
> +                 ovn_internal_version);
>     ovnsb_db_run(ctx, ovnsb_idl_loop, &ports, start_time);
>     destroy_datapaths_and_ports(&datapaths, &ports, &lr_list);
> }
> @@ -13113,6 +13118,9 @@ main(int argc, char *argv[])
>     unixctl_command_register("sb-connection-status", "", 0, 0,
>                              ovn_conn_show, ovnsb_idl_loop.idl);
> 
> +    char *ovn_internal_version = ovn_get_internal_version();
> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> +
>     /* Main loop. */
>     exiting = false;
>     state.had_lock = false;
> @@ -13154,7 +13162,8 @@ main(int argc, char *argv[])
>             }
> 
>             if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> -                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
> +                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop,
> +                           ovn_internal_version);
>                 if (ctx.ovnsb_txn) {
>                     check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>                     check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> @@ -13216,6 +13225,7 @@ main(int argc, char *argv[])
>         }
>     }
> 
> +    free(ovn_internal_version);
>     unixctl_server_destroy(unixctl);
>     ovsdb_idl_loop_destroy(&ovnnb_idl_loop);
>     ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index c0219bbd47..42a0f9a00f 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -22687,3 +22687,150 @@ AT_CHECK([test "$encap_rec_mvtep" = "$encap_rec_mvtep1"], [0], [])
> 
> OVN_CLEANUP([hv1])
> AT_CLEANUP
> +
> +AT_SETUP([ovn -- check ovn-northd and ovn-controller version pinning])
> +ovn_start
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.10
> +
> +check ovn-nbctl ls-add sw0
> +check ovn-nbctl lsp-add sw0 sw0-p1
> +check ovn-nbctl lsp-add sw0 sw0-p2
> +
> +as hv1
> +ovs-vsctl \
> +    -- add-port br-int vif1 \
> +    -- set Interface vif1 external_ids:iface-id=sw0-p1 \
> +    ofport-request=1
> +ovs-vsctl \
> +    -- add-port br-int vif2 \
> +    -- set Interface vif2 external_ids:iface-id=sw0-p2 \
> +    ofport-request=2
> +
> +# Wait for port to be bound.
> +wait_row_count Chassis 1 name=hv1
> +ch=$(fetch_column Chassis _uuid name=hv1)
> +wait_row_count Port_Binding 1 logical_port=sw0-p1 chassis=$ch
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +northd_version=$(ovn-sbctl get SB_Global . options:northd_internal_version | sed s/\"//g)
> +echo "northd version = $northd_version"
> +AT_CHECK([grep -c $northd_version hv1/ovn-controller.log], [0], [1
> +])
> +
> +# Stop ovn-northd so that we can modify the northd_version.
> +as northd
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +as northd-backup
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=foo
> +
> +as hv1
> +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> +
> +# ovn-controller should release the lport sw0-p2 since ovn-pin-to-northd
> +# is not true.
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +echo
> +echo "__file__:__line__: Pin ovn-controller to ovn-northd version."
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-pin-to-northd=true
> +
> +OVS_WAIT_UNTIL(
> +    [test 1 = $(grep -c "controller version - $northd_version mismatch with northd version - foo" hv1/ovn-controller.log)
> +])
> +
> +as hv1
> +check ovs-vsctl set interface vif2 external_ids:iface-id=sw0-p2
> +
> +# ovn-controller should not claim sw0-p2 since there is version mismatch
> +as hv1 ovn-appctl -t ovn-controller recompute
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> +
> +# It should claim sw0-p2
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +as hv1
> +ovn_remote=$(ovs-vsctl get open . external_ids:ovn-remote | sed s/\"//g)
> +ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> +
> +# ovn-controller is not connected to the SB DB. Even though it
> +# releases sw0-p2, it will not delete the OF flows.
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +# Change the version to incorrect one and reconnect to the SB DB.
> +check ovn-sbctl set SB_Global . options:northd_internal_version=bar
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> +
> +sleep 1
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +# Change the ovn-remote to incorrect and set the correct northd version
> +# and then change back to the correct ovn-remote
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> +
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +as hv1
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +AT_CLEANUP
> --
> 2.28.0
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara Nov. 18, 2020, 11:53 a.m. UTC | #2
On 11/17/20 12:57 PM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> OVN recommends updating/upgrading ovn-controllers first and then
> ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
> new functionality specified by the database or logical flows created
> by ovn-northd is understood by ovn-controller.
> 
> However certain deployments may upgrade ovn-northd services first and
> then ovn-controllers.  In a large scal deployment, this can result in
> downtime during upgrades as old ovn-controllers may not understand
> new logical flows or new actions added by ovn-northd.
> 
> Even with upgrading ovn-controllers first, can result in ovn-controllers
> rejecting some of the logical flows if an existing OVN action is
> changed.  One such example is ct_commit action which recently was updated
> to take new arguments.
> 
> To avoid such downtimes during upgrades, this patch adds the
> functionality of pinning ovn-controller and ovn-northd to a specific
> version. An internal OVN version is generated and this version is stored
> by ovn-northd in the Southbound SB_Global table's
> options:northd_internal_version.
> 
> When ovn-controller notices that the internal version has changed, it
> stops handling the database changes - both Southbound and OVS. All the
> existing OF flows are preserved.  When ovn-controller is upgraded to the
> same version as ovn-northd services, it will process the database
> changes.
> 
> This feature is made optional and disabled by default. Any CMS can
> enable it by configuring the OVS local database with the option -
> ovn-pin-to-northd=false.
> 
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---

Hi Numan,

Next to Flavio's comments I have some too.

> 
> v1 -> v2
> ----
>   * Added test cases and missing documentation.
>   * Renamed the ovsdb option from ignore_northd_verison
>     to ovn-pin-to-northd.
>   * Added NEWS item.
> 
>  NEWS                            |   3 +
>  controller/ovn-controller.8.xml |  11 +++
>  controller/ovn-controller.c     |  58 ++++++++++++-
>  lib/ovn-util.c                  |  17 ++++
>  lib/ovn-util.h                  |   4 +
>  northd/ovn-northd.c             |  18 +++-
>  tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
>  7 files changed, 251 insertions(+), 7 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 6010230679..bb95724b36 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -8,6 +8,9 @@ Post-v20.09.0
>       server.
>     - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
>       traffic.
> +   - Support version pinning between ovn-northd and ovn-controller as an
> +     option. If the option is enabled and the versions don't match,
> +     ovn-controller will not process any DB changes.
>  
>  OVN v20.09.0 - 28 Sep 2020
>  --------------------------
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index 16bc47b205..357edd1f5c 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -233,6 +233,17 @@
>          The boolean flag indicates if the chassis is used as an
>          interconnection gateway.
>        </dd>
> +
> +      <dt><code>external_ids:ovn-pin-to-northd</code></dt>

This still seems misleading to me.  We don't actually pin to a northd
version, we pin to an OVN internal version.  What if we call it
"ovn-pin-to-version"?

> +      <dd>
> +        The boolean flag indicates if <code>ovn-controller</code> needs to
> +        be pinned to the exact <code>ovn-northd</code> version. If this
> +        flag is set to true and the <code>ovn-northd's</code> version (reported
> +        in the Southbound database) doesn't match with the
> +        <code>ovn-controller's</code> internal version, then it will stop
> +        processing the Southbound and local Open vSwitch database changes.
> +        The default value is considered false if this option is not defined.
> +      </dd>
>      </dl>
>  
>      <p>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index a06cae3ccb..a7344ea9c4 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -2136,6 +2136,37 @@ struct ovn_controller_exit_args {
>      bool *restart;
>  };
>  
> +static bool
> +pin_to_northd(struct ovsdb_idl *ovs_idl)
> +{
> +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
> +    return !cfg ? false : smap_get_bool(&cfg->external_ids,
> +                                        "ovn-pin-to-northd", false);
> +}
> +
> +/* Returns false if the northd internal version and ovn-controller internal
> + * version doesn't match.
> + */
> +static bool
> +check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
> +{
> +    if (!sb) {
> +        return false;
> +    }
> +
> +    const char *northd_version =
> +        smap_get_def(&sb->options, "northd_internal_version", "");
> +
> +    bool mismatch = strcmp(northd_version, my_version);
> +    if (mismatch) {
> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> +        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
> +                     "version - %s", my_version, northd_version);
> +    }
> +
> +    return mismatch;
> +}
> +
>  int
>  main(int argc, char *argv[])
>  {
> @@ -2428,10 +2459,14 @@ main(int argc, char *argv[])
>          .enable_lflow_cache = true
>      };
>  
> +    char *ovn_internal_version = ovn_get_internal_version();
> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> +
>      /* Main loop. */
>      exiting = false;
>      restart = false;
>      bool sb_monitor_all = false;
> +    bool version_mismatch_with_northd = false;
>      while (!exiting) {
>          /* If we're paused just run the unixctl server and skip most of the
>           * processing loop.
> @@ -2442,8 +2477,6 @@ main(int argc, char *argv[])
>              goto loop_done;
>          }
>  
> -        engine_init_run();
> -

I'm not sure why it's better to move engine_init_run() lower.  I think
it can stay here.

>          struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
>          unsigned int new_ovs_cond_seqno
>              = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
> @@ -2473,6 +2506,22 @@ main(int argc, char *argv[])
>              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
>          }
>  
> +        bool version_mismatched = false;
> +        if (pin_to_northd(ovs_idl_loop.idl)) {
> +            version_mismatched = check_northd_version(
> +                sbrec_sb_global_first(ovnsb_idl_loop.idl),
> +                ovn_internal_version);
> +        } else {
> +            version_mismatched = false;
> +        }
> +
> +        if (version_mismatch_with_northd != version_mismatched) {
> +            version_mismatch_with_northd = version_mismatched;
> +            engine_set_force_recompute(true);
> +        }
> +
> +        engine_init_run();
> +

I think we can make the version checking more contained, what do you
think about the following incremental?

https://github.com/dceara/ovn/commit/29c382137eb974aff3ea080b8098b2919189e6e3

>          struct engine_context eng_ctx = {
>              .ovs_idl_txn = ovs_idl_txn,
>              .ovnsb_idl_txn = ovnsb_idl_txn,
> @@ -2481,7 +2530,8 @@ main(int argc, char *argv[])
>  
>          engine_set_context(&eng_ctx);
>  
> -        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
> +        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
> +            !version_mismatch_with_northd) {
>              /* Contains the transport zones that this Chassis belongs to */
>              struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
>              sset_from_delimited_string(&transport_zones,
> @@ -2770,6 +2820,7 @@ loop_done:
>          }
>      }
>  
> +    free(ovn_internal_version);
>      unixctl_server_destroy(unixctl);
>      lflow_destroy();
>      ofctrl_destroy();
> @@ -2822,6 +2873,7 @@ parse_options(int argc, char *argv[])
>  
>          case 'V':
>              ovs_print_version(OFP15_VERSION, OFP15_VERSION);
> +            printf("SB DB Schema %s\n", sbrec_get_db_version());
>              exit(EXIT_SUCCESS);
>  
>          VLOG_OPTION_HANDLERS
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 18aac8da34..9dc9ade3b4 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -20,6 +20,7 @@
>  #include <unistd.h>
>  
>  #include "daemon.h"
> +#include "include/ovn/actions.h"
>  #include "openvswitch/ofp-parse.h"
>  #include "openvswitch/vlog.h"
>  #include "ovn-dirs.h"
> @@ -713,3 +714,19 @@ ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>      *addr_family = ss.ss_family;
>      return true;
>  }
> +
> +#define N_OVN_ACTIONS N_OVNACTS
> +BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
> +
> +/* Increment this for any logical flow changes or if existing OVN action is
> + * modified. */

Except for this comment here (that can be easily overlooked), we don't
mention anywhere else that the minor version should be incremented when
changing existing OVN actions.

I'm not sure if we could automatically generate the minor version but in
any case we should make this more visible (both for contributors and
reviewers).

I'd suggest at least updating the comment in actions.h where we define
OVNACTS.

Maybe we should also mention this in one of the internals/*.rst docs.
Is Documentation/internals/submitting-patches.rst the best place to
mention this?  I see we have a section for "Feature Deprecation
Guidelines", maybe we can add one for "OVN compatibility" or "OVN Upgrade"?

Thanks,
Dumitru

> +#define OVN_INTERNAL_MINOR_VER 0
> +
> +/* Returns the OVN version. The caller must free the returned value. */
> +char *
> +ovn_get_internal_version(void)
> +{
> +    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
> +                     sbrec_get_db_version(),
> +                     N_OVN_ACTIONS, OVN_INTERNAL_MINOR_VER);
> +}
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index 3496673b26..7edd301802 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -221,4 +221,8 @@ char *str_tolower(const char *orig);
>  bool ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>                                       uint16_t *port, int *addr_family);
>  
> +/* Returns the internal OVN version. The caller must free the returned
> + * value. */
> +char *ovn_get_internal_version(void);
> +
>  #endif
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 4d4190cb9b..2475605cc0 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -12062,7 +12062,8 @@ ovnnb_db_run(struct northd_context *ctx,
>               struct ovsdb_idl_loop *sb_loop,
>               struct hmap *datapaths, struct hmap *ports,
>               struct ovs_list *lr_list,
> -             int64_t loop_start_time)
> +             int64_t loop_start_time,
> +             const char *ovn_internal_version)
>  {
>      if (!ctx->ovnsb_txn || !ctx->ovnnb_txn) {
>          return;
> @@ -12139,6 +12140,8 @@ ovnnb_db_run(struct northd_context *ctx,
>      smap_replace(&options, "max_tunid", max_tunid);
>      free(max_tunid);
>  
> +    smap_replace(&options, "northd_internal_version", ovn_internal_version);
> +
>      nbrec_nb_global_verify_options(nb);
>      nbrec_nb_global_set_options(nb, &options);
>  
> @@ -12746,7 +12749,8 @@ ovnsb_db_run(struct northd_context *ctx,
>  static void
>  ovn_db_run(struct northd_context *ctx,
>             struct ovsdb_idl_index *sbrec_chassis_by_name,
> -           struct ovsdb_idl_loop *ovnsb_idl_loop)
> +           struct ovsdb_idl_loop *ovnsb_idl_loop,
> +           const char *ovn_internal_version)
>  {
>      struct hmap datapaths, ports;
>      struct ovs_list lr_list;
> @@ -12756,7 +12760,8 @@ ovn_db_run(struct northd_context *ctx,
>  
>      int64_t start_time = time_wall_msec();
>      ovnnb_db_run(ctx, sbrec_chassis_by_name, ovnsb_idl_loop,
> -                 &datapaths, &ports, &lr_list, start_time);
> +                 &datapaths, &ports, &lr_list, start_time,
> +                 ovn_internal_version);
>      ovnsb_db_run(ctx, ovnsb_idl_loop, &ports, start_time);
>      destroy_datapaths_and_ports(&datapaths, &ports, &lr_list);
>  }
> @@ -13113,6 +13118,9 @@ main(int argc, char *argv[])
>      unixctl_command_register("sb-connection-status", "", 0, 0,
>                               ovn_conn_show, ovnsb_idl_loop.idl);
>  
> +    char *ovn_internal_version = ovn_get_internal_version();
> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> +
>      /* Main loop. */
>      exiting = false;
>      state.had_lock = false;
> @@ -13154,7 +13162,8 @@ main(int argc, char *argv[])
>              }
>  
>              if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> -                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
> +                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop,
> +                           ovn_internal_version);
>                  if (ctx.ovnsb_txn) {
>                      check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
>                      check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> @@ -13216,6 +13225,7 @@ main(int argc, char *argv[])
>          }
>      }
>  
> +    free(ovn_internal_version);
>      unixctl_server_destroy(unixctl);
>      ovsdb_idl_loop_destroy(&ovnnb_idl_loop);
>      ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index c0219bbd47..42a0f9a00f 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -22687,3 +22687,150 @@ AT_CHECK([test "$encap_rec_mvtep" = "$encap_rec_mvtep1"], [0], [])
>  
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
> +
> +AT_SETUP([ovn -- check ovn-northd and ovn-controller version pinning])
> +ovn_start
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.10
> +
> +check ovn-nbctl ls-add sw0
> +check ovn-nbctl lsp-add sw0 sw0-p1
> +check ovn-nbctl lsp-add sw0 sw0-p2
> +
> +as hv1
> +ovs-vsctl \
> +    -- add-port br-int vif1 \
> +    -- set Interface vif1 external_ids:iface-id=sw0-p1 \
> +    ofport-request=1
> +ovs-vsctl \
> +    -- add-port br-int vif2 \
> +    -- set Interface vif2 external_ids:iface-id=sw0-p2 \
> +    ofport-request=2
> +
> +# Wait for port to be bound.
> +wait_row_count Chassis 1 name=hv1
> +ch=$(fetch_column Chassis _uuid name=hv1)
> +wait_row_count Port_Binding 1 logical_port=sw0-p1 chassis=$ch
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +northd_version=$(ovn-sbctl get SB_Global . options:northd_internal_version | sed s/\"//g)
> +echo "northd version = $northd_version"
> +AT_CHECK([grep -c $northd_version hv1/ovn-controller.log], [0], [1
> +])
> +
> +# Stop ovn-northd so that we can modify the northd_version.
> +as northd
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +as northd-backup
> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=foo
> +
> +as hv1
> +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> +
> +# ovn-controller should release the lport sw0-p2 since ovn-pin-to-northd
> +# is not true.
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +echo
> +echo "__file__:__line__: Pin ovn-controller to ovn-northd version."
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-pin-to-northd=true
> +
> +OVS_WAIT_UNTIL(
> +    [test 1 = $(grep -c "controller version - $northd_version mismatch with northd version - foo" hv1/ovn-controller.log)
> +])
> +
> +as hv1
> +check ovs-vsctl set interface vif2 external_ids:iface-id=sw0-p2
> +
> +# ovn-controller should not claim sw0-p2 since there is version mismatch
> +as hv1 ovn-appctl -t ovn-controller recompute
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> +
> +# It should claim sw0-p2
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +as hv1
> +ovn_remote=$(ovs-vsctl get open . external_ids:ovn-remote | sed s/\"//g)
> +ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> +
> +# ovn-controller is not connected to the SB DB. Even though it
> +# releases sw0-p2, it will not delete the OF flows.
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +# Change the version to incorrect one and reconnect to the SB DB.
> +check ovn-sbctl set SB_Global . options:northd_internal_version=bar
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> +
> +sleep 1
> +
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> +1
> +])
> +
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> +
> +# Change the ovn-remote to incorrect and set the correct northd version
> +# and then change back to the correct ovn-remote
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> +
> +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> +
> +as hv1
> +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> +
> +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> +AT_CAPTURE_FILE([offlows_table0.txt])
> +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> +0
> +])
> +
> +as hv1
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +AT_CLEANUP
>
Numan Siddique Nov. 18, 2020, 1:08 p.m. UTC | #3
On Wed, Nov 18, 2020 at 5:23 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 11/17/20 12:57 PM, numans@ovn.org wrote:
> > From: Numan Siddique <numans@ovn.org>
> >
> > OVN recommends updating/upgrading ovn-controllers first and then
> > ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
> > new functionality specified by the database or logical flows created
> > by ovn-northd is understood by ovn-controller.
> >
> > However certain deployments may upgrade ovn-northd services first and
> > then ovn-controllers.  In a large scal deployment, this can result in
> > downtime during upgrades as old ovn-controllers may not understand
> > new logical flows or new actions added by ovn-northd.
> >
> > Even with upgrading ovn-controllers first, can result in ovn-controllers
> > rejecting some of the logical flows if an existing OVN action is
> > changed.  One such example is ct_commit action which recently was updated
> > to take new arguments.
> >
> > To avoid such downtimes during upgrades, this patch adds the
> > functionality of pinning ovn-controller and ovn-northd to a specific
> > version. An internal OVN version is generated and this version is stored
> > by ovn-northd in the Southbound SB_Global table's
> > options:northd_internal_version.
> >
> > When ovn-controller notices that the internal version has changed, it
> > stops handling the database changes - both Southbound and OVS. All the
> > existing OF flows are preserved.  When ovn-controller is upgraded to the
> > same version as ovn-northd services, it will process the database
> > changes.
> >
> > This feature is made optional and disabled by default. Any CMS can
> > enable it by configuring the OVS local database with the option -
> > ovn-pin-to-northd=false.
> >
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > ---
>
> Hi Numan,
>
> Next to Flavio's comments I have some too.
>

Thanks Flavio and Dumitru for the reviews.


> >
> > v1 -> v2
> > ----
> >   * Added test cases and missing documentation.
> >   * Renamed the ovsdb option from ignore_northd_verison
> >     to ovn-pin-to-northd.
> >   * Added NEWS item.
> >
> >  NEWS                            |   3 +
> >  controller/ovn-controller.8.xml |  11 +++
> >  controller/ovn-controller.c     |  58 ++++++++++++-
> >  lib/ovn-util.c                  |  17 ++++
> >  lib/ovn-util.h                  |   4 +
> >  northd/ovn-northd.c             |  18 +++-
> >  tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
> >  7 files changed, 251 insertions(+), 7 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 6010230679..bb95724b36 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -8,6 +8,9 @@ Post-v20.09.0
> >       server.
> >     - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
> >       traffic.
> > +   - Support version pinning between ovn-northd and ovn-controller as an
> > +     option. If the option is enabled and the versions don't match,
> > +     ovn-controller will not process any DB changes.
> >
> >  OVN v20.09.0 - 28 Sep 2020
> >  --------------------------
> > diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> > index 16bc47b205..357edd1f5c 100644
> > --- a/controller/ovn-controller.8.xml
> > +++ b/controller/ovn-controller.8.xml
> > @@ -233,6 +233,17 @@
> >          The boolean flag indicates if the chassis is used as an
> >          interconnection gateway.
> >        </dd>
> > +
> > +      <dt><code>external_ids:ovn-pin-to-northd</code></dt>
>
> This still seems misleading to me.  We don't actually pin to a northd
> version, we pin to an OVN internal version.  What if we call it
> "ovn-pin-to-version"?

It's true that we pin to the OVN internal version, but my idea is that
ovn-northd writes
its internal version to SB_Global. And ovn-controller reads the northd
version and compares
with its own version.

Few other points
  - I think ovn-pin-to-northd (or ovn-pin-to-northd-version) makes
sense from a CMS point of view.
  - CMS doesn't need to know about the internal version mechanism used.
  - In production deployments when northd services are updated first,
ovn-northd will update its new
    internal version to SB_Global.
  - And the idea of this patch is to pin the controller to the same
northd version.

So I'd prefer to call it ovn-pin-to-northd.


>
> > +      <dd>
> > +        The boolean flag indicates if <code>ovn-controller</code> needs to
> > +        be pinned to the exact <code>ovn-northd</code> version. If this
> > +        flag is set to true and the <code>ovn-northd's</code> version (reported
> > +        in the Southbound database) doesn't match with the
> > +        <code>ovn-controller's</code> internal version, then it will stop
> > +        processing the Southbound and local Open vSwitch database changes.
> > +        The default value is considered false if this option is not defined.
> > +      </dd>
> >      </dl>
> >
> >      <p>
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index a06cae3ccb..a7344ea9c4 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -2136,6 +2136,37 @@ struct ovn_controller_exit_args {
> >      bool *restart;
> >  };
> >
> > +static bool
> > +pin_to_northd(struct ovsdb_idl *ovs_idl)
> > +{
> > +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
> > +    return !cfg ? false : smap_get_bool(&cfg->external_ids,
> > +                                        "ovn-pin-to-northd", false);
> > +}
> > +
> > +/* Returns false if the northd internal version and ovn-controller internal
> > + * version doesn't match.
> > + */
> > +static bool
> > +check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
> > +{
> > +    if (!sb) {
> > +        return false;
> > +    }
> > +
> > +    const char *northd_version =
> > +        smap_get_def(&sb->options, "northd_internal_version", "");
> > +
> > +    bool mismatch = strcmp(northd_version, my_version);
> > +    if (mismatch) {
> > +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> > +        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
> > +                     "version - %s", my_version, northd_version);
> > +    }
> > +
> > +    return mismatch;
> > +}
> > +
> >  int
> >  main(int argc, char *argv[])
> >  {
> > @@ -2428,10 +2459,14 @@ main(int argc, char *argv[])
> >          .enable_lflow_cache = true
> >      };
> >
> > +    char *ovn_internal_version = ovn_get_internal_version();
> > +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> > +
> >      /* Main loop. */
> >      exiting = false;
> >      restart = false;
> >      bool sb_monitor_all = false;
> > +    bool version_mismatch_with_northd = false;
> >      while (!exiting) {
> >          /* If we're paused just run the unixctl server and skip most of the
> >           * processing loop.
> > @@ -2442,8 +2477,6 @@ main(int argc, char *argv[])
> >              goto loop_done;
> >          }
> >
> > -        engine_init_run();
> > -
>
> I'm not sure why it's better to move engine_init_run() lower.  I think
> it can stay here.

Ack. I'm fine either way.

>
> >          struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
> >          unsigned int new_ovs_cond_seqno
> >              = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
> > @@ -2473,6 +2506,22 @@ main(int argc, char *argv[])
> >              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
> >          }
> >
> > +        bool version_mismatched = false;
> > +        if (pin_to_northd(ovs_idl_loop.idl)) {
> > +            version_mismatched = check_northd_version(
> > +                sbrec_sb_global_first(ovnsb_idl_loop.idl),
> > +                ovn_internal_version);
> > +        } else {
> > +            version_mismatched = false;
> > +        }
> > +
> > +        if (version_mismatch_with_northd != version_mismatched) {
> > +            version_mismatch_with_northd = version_mismatched;
> > +            engine_set_force_recompute(true);
> > +        }
> > +
> > +        engine_init_run();
> > +
>
> I think we can make the version checking more contained, what do you
> think about the following incremental?
>
> https://github.com/dceara/ovn/commit/29c382137eb974aff3ea080b8098b2919189e6e3

It looks fine to me. I'd replace check_version with
check_northd_version based on the reasons I gave above.


>
> >          struct engine_context eng_ctx = {
> >              .ovs_idl_txn = ovs_idl_txn,
> >              .ovnsb_idl_txn = ovnsb_idl_txn,
> > @@ -2481,7 +2530,8 @@ main(int argc, char *argv[])
> >
> >          engine_set_context(&eng_ctx);
> >
> > -        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
> > +        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
> > +            !version_mismatch_with_northd) {
> >              /* Contains the transport zones that this Chassis belongs to */
> >              struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
> >              sset_from_delimited_string(&transport_zones,
> > @@ -2770,6 +2820,7 @@ loop_done:
> >          }
> >      }
> >
> > +    free(ovn_internal_version);
> >      unixctl_server_destroy(unixctl);
> >      lflow_destroy();
> >      ofctrl_destroy();
> > @@ -2822,6 +2873,7 @@ parse_options(int argc, char *argv[])
> >
> >          case 'V':
> >              ovs_print_version(OFP15_VERSION, OFP15_VERSION);
> > +            printf("SB DB Schema %s\n", sbrec_get_db_version());
> >              exit(EXIT_SUCCESS);
> >
> >          VLOG_OPTION_HANDLERS
> > diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> > index 18aac8da34..9dc9ade3b4 100644
> > --- a/lib/ovn-util.c
> > +++ b/lib/ovn-util.c
> > @@ -20,6 +20,7 @@
> >  #include <unistd.h>
> >
> >  #include "daemon.h"
> > +#include "include/ovn/actions.h"
> >  #include "openvswitch/ofp-parse.h"
> >  #include "openvswitch/vlog.h"
> >  #include "ovn-dirs.h"
> > @@ -713,3 +714,19 @@ ip_address_and_port_from_lb_key(const char *key, char **ip_address,
> >      *addr_family = ss.ss_family;
> >      return true;
> >  }
> > +
> > +#define N_OVN_ACTIONS N_OVNACTS
> > +BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
> > +
> > +/* Increment this for any logical flow changes or if existing OVN action is
> > + * modified. */
>
> Except for this comment here (that can be easily overlooked), we don't
> mention anywhere else that the minor version should be incremented when
> changing existing OVN actions.
>
> I'm not sure if we could automatically generate the minor version but in
> any case we should make this more visible (both for contributors and
> reviewers).

Maybe we should never update an existing action (from now on ) ? If
there is a need we can
add a new action  ?

Or can we come up with a way to version the ovnacts ? Right now we
have a 'pad' field in the struct ovnact.
I think we can rename it to version. Whenever an action is updated we
need to increment this field.
And the OVN_INTERNAL_MINOR_VER will be the sum of all the versions of ovnacts ?

If this sounds reasonable I can look into this as a follow up patch.

>
> I'd suggest at least updating the comment in actions.h where we define
> OVNACTS.

Ack.

>
> Maybe we should also mention this in one of the internals/*.rst docs.
> Is Documentation/internals/submitting-patches.rst the best place to
> mention this?  I see we have a section for "Feature Deprecation
> Guidelines", maybe we can add one for "OVN compatibility" or "OVN Upgrade"?

I'll see if we can add more documentation about it here.

Thanks
Numan

>
> Thanks,
> Dumitru
>
> > +#define OVN_INTERNAL_MINOR_VER 0
> > +
> > +/* Returns the OVN version. The caller must free the returned value. */
> > +char *
> > +ovn_get_internal_version(void)
> > +{
> > +    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
> > +                     sbrec_get_db_version(),
> > +                     N_OVN_ACTIONS, OVN_INTERNAL_MINOR_VER);
> > +}
> > diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> > index 3496673b26..7edd301802 100644
> > --- a/lib/ovn-util.h
> > +++ b/lib/ovn-util.h
> > @@ -221,4 +221,8 @@ char *str_tolower(const char *orig);
> >  bool ip_address_and_port_from_lb_key(const char *key, char **ip_address,
> >                                       uint16_t *port, int *addr_family);
> >
> > +/* Returns the internal OVN version. The caller must free the returned
> > + * value. */
> > +char *ovn_get_internal_version(void);
> > +
> >  #endif
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 4d4190cb9b..2475605cc0 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -12062,7 +12062,8 @@ ovnnb_db_run(struct northd_context *ctx,
> >               struct ovsdb_idl_loop *sb_loop,
> >               struct hmap *datapaths, struct hmap *ports,
> >               struct ovs_list *lr_list,
> > -             int64_t loop_start_time)
> > +             int64_t loop_start_time,
> > +             const char *ovn_internal_version)
> >  {
> >      if (!ctx->ovnsb_txn || !ctx->ovnnb_txn) {
> >          return;
> > @@ -12139,6 +12140,8 @@ ovnnb_db_run(struct northd_context *ctx,
> >      smap_replace(&options, "max_tunid", max_tunid);
> >      free(max_tunid);
> >
> > +    smap_replace(&options, "northd_internal_version", ovn_internal_version);
> > +
> >      nbrec_nb_global_verify_options(nb);
> >      nbrec_nb_global_set_options(nb, &options);
> >
> > @@ -12746,7 +12749,8 @@ ovnsb_db_run(struct northd_context *ctx,
> >  static void
> >  ovn_db_run(struct northd_context *ctx,
> >             struct ovsdb_idl_index *sbrec_chassis_by_name,
> > -           struct ovsdb_idl_loop *ovnsb_idl_loop)
> > +           struct ovsdb_idl_loop *ovnsb_idl_loop,
> > +           const char *ovn_internal_version)
> >  {
> >      struct hmap datapaths, ports;
> >      struct ovs_list lr_list;
> > @@ -12756,7 +12760,8 @@ ovn_db_run(struct northd_context *ctx,
> >
> >      int64_t start_time = time_wall_msec();
> >      ovnnb_db_run(ctx, sbrec_chassis_by_name, ovnsb_idl_loop,
> > -                 &datapaths, &ports, &lr_list, start_time);
> > +                 &datapaths, &ports, &lr_list, start_time,
> > +                 ovn_internal_version);
> >      ovnsb_db_run(ctx, ovnsb_idl_loop, &ports, start_time);
> >      destroy_datapaths_and_ports(&datapaths, &ports, &lr_list);
> >  }
> > @@ -13113,6 +13118,9 @@ main(int argc, char *argv[])
> >      unixctl_command_register("sb-connection-status", "", 0, 0,
> >                               ovn_conn_show, ovnsb_idl_loop.idl);
> >
> > +    char *ovn_internal_version = ovn_get_internal_version();
> > +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> > +
> >      /* Main loop. */
> >      exiting = false;
> >      state.had_lock = false;
> > @@ -13154,7 +13162,8 @@ main(int argc, char *argv[])
> >              }
> >
> >              if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > -                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
> > +                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop,
> > +                           ovn_internal_version);
> >                  if (ctx.ovnsb_txn) {
> >                      check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> >                      check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> > @@ -13216,6 +13225,7 @@ main(int argc, char *argv[])
> >          }
> >      }
> >
> > +    free(ovn_internal_version);
> >      unixctl_server_destroy(unixctl);
> >      ovsdb_idl_loop_destroy(&ovnnb_idl_loop);
> >      ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index c0219bbd47..42a0f9a00f 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -22687,3 +22687,150 @@ AT_CHECK([test "$encap_rec_mvtep" = "$encap_rec_mvtep1"], [0], [])
> >
> >  OVN_CLEANUP([hv1])
> >  AT_CLEANUP
> > +
> > +AT_SETUP([ovn -- check ovn-northd and ovn-controller version pinning])
> > +ovn_start
> > +
> > +net_add n1
> > +sim_add hv1
> > +as hv1
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.10
> > +
> > +check ovn-nbctl ls-add sw0
> > +check ovn-nbctl lsp-add sw0 sw0-p1
> > +check ovn-nbctl lsp-add sw0 sw0-p2
> > +
> > +as hv1
> > +ovs-vsctl \
> > +    -- add-port br-int vif1 \
> > +    -- set Interface vif1 external_ids:iface-id=sw0-p1 \
> > +    ofport-request=1
> > +ovs-vsctl \
> > +    -- add-port br-int vif2 \
> > +    -- set Interface vif2 external_ids:iface-id=sw0-p2 \
> > +    ofport-request=2
> > +
> > +# Wait for port to be bound.
> > +wait_row_count Chassis 1 name=hv1
> > +ch=$(fetch_column Chassis _uuid name=hv1)
> > +wait_row_count Port_Binding 1 logical_port=sw0-p1 chassis=$ch
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> > +
> > +northd_version=$(ovn-sbctl get SB_Global . options:northd_internal_version | sed s/\"//g)
> > +echo "northd version = $northd_version"
> > +AT_CHECK([grep -c $northd_version hv1/ovn-controller.log], [0], [1
> > +])
> > +
> > +# Stop ovn-northd so that we can modify the northd_version.
> > +as northd
> > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > +
> > +as northd-backup
> > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
> > +
> > +check ovn-sbctl set SB_Global . options:northd_internal_version=foo
> > +
> > +as hv1
> > +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> > +
> > +# ovn-controller should release the lport sw0-p2 since ovn-pin-to-northd
> > +# is not true.
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> > +
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> > +0
> > +])
> > +
> > +echo
> > +echo "__file__:__line__: Pin ovn-controller to ovn-northd version."
> > +
> > +as hv1
> > +check ovs-vsctl set open . external_ids:ovn-pin-to-northd=true
> > +
> > +OVS_WAIT_UNTIL(
> > +    [test 1 = $(grep -c "controller version - $northd_version mismatch with northd version - foo" hv1/ovn-controller.log)
> > +])
> > +
> > +as hv1
> > +check ovs-vsctl set interface vif2 external_ids:iface-id=sw0-p2
> > +
> > +# ovn-controller should not claim sw0-p2 since there is version mismatch
> > +as hv1 ovn-appctl -t ovn-controller recompute
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> > +
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> > +0
> > +])
> > +
> > +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> > +
> > +# It should claim sw0-p2
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> > +
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> > +1
> > +])
> > +
> > +as hv1
> > +ovn_remote=$(ovs-vsctl get open . external_ids:ovn-remote | sed s/\"//g)
> > +ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> > +check ovs-vsctl set interface vif2 external_ids:iface-id=foo
> > +
> > +# ovn-controller is not connected to the SB DB. Even though it
> > +# releases sw0-p2, it will not delete the OF flows.
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> > +1
> > +])
> > +
> > +# Change the version to incorrect one and reconnect to the SB DB.
> > +check ovn-sbctl set SB_Global . options:northd_internal_version=bar
> > +
> > +as hv1
> > +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> > +
> > +sleep 1
> > +
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
> > +1
> > +])
> > +
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
> > +
> > +# Change the ovn-remote to incorrect and set the correct northd version
> > +# and then change back to the correct ovn-remote
> > +as hv1
> > +check ovs-vsctl set open . external_ids:ovn-remote=unix:foo
> > +
> > +check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
> > +
> > +as hv1
> > +check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
> > +
> > +wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
> > +as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
> > +AT_CAPTURE_FILE([offlows_table0.txt])
> > +AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
> > +0
> > +])
> > +
> > +as hv1
> > +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> > +
> > +as ovn-sb
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +as ovn-nb
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +AT_CLEANUP
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara Nov. 18, 2020, 1:21 p.m. UTC | #4
On 11/18/20 2:08 PM, Numan Siddique wrote:
> On Wed, Nov 18, 2020 at 5:23 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 11/17/20 12:57 PM, numans@ovn.org wrote:
>>> From: Numan Siddique <numans@ovn.org>
>>>
>>> OVN recommends updating/upgrading ovn-controllers first and then
>>> ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
>>> new functionality specified by the database or logical flows created
>>> by ovn-northd is understood by ovn-controller.
>>>
>>> However certain deployments may upgrade ovn-northd services first and
>>> then ovn-controllers.  In a large scal deployment, this can result in
>>> downtime during upgrades as old ovn-controllers may not understand
>>> new logical flows or new actions added by ovn-northd.
>>>
>>> Even with upgrading ovn-controllers first, can result in ovn-controllers
>>> rejecting some of the logical flows if an existing OVN action is
>>> changed.  One such example is ct_commit action which recently was updated
>>> to take new arguments.
>>>
>>> To avoid such downtimes during upgrades, this patch adds the
>>> functionality of pinning ovn-controller and ovn-northd to a specific
>>> version. An internal OVN version is generated and this version is stored
>>> by ovn-northd in the Southbound SB_Global table's
>>> options:northd_internal_version.
>>>
>>> When ovn-controller notices that the internal version has changed, it
>>> stops handling the database changes - both Southbound and OVS. All the
>>> existing OF flows are preserved.  When ovn-controller is upgraded to the
>>> same version as ovn-northd services, it will process the database
>>> changes.
>>>
>>> This feature is made optional and disabled by default. Any CMS can
>>> enable it by configuring the OVS local database with the option -
>>> ovn-pin-to-northd=false.
>>>
>>> Signed-off-by: Numan Siddique <numans@ovn.org>
>>> ---
>>
>> Hi Numan,
>>
>> Next to Flavio's comments I have some too.
>>
> 
> Thanks Flavio and Dumitru for the reviews.
> 
> 
>>>
>>> v1 -> v2
>>> ----
>>>   * Added test cases and missing documentation.
>>>   * Renamed the ovsdb option from ignore_northd_verison
>>>     to ovn-pin-to-northd.
>>>   * Added NEWS item.
>>>
>>>  NEWS                            |   3 +
>>>  controller/ovn-controller.8.xml |  11 +++
>>>  controller/ovn-controller.c     |  58 ++++++++++++-
>>>  lib/ovn-util.c                  |  17 ++++
>>>  lib/ovn-util.h                  |   4 +
>>>  northd/ovn-northd.c             |  18 +++-
>>>  tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
>>>  7 files changed, 251 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 6010230679..bb95724b36 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -8,6 +8,9 @@ Post-v20.09.0
>>>       server.
>>>     - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
>>>       traffic.
>>> +   - Support version pinning between ovn-northd and ovn-controller as an
>>> +     option. If the option is enabled and the versions don't match,
>>> +     ovn-controller will not process any DB changes.
>>>
>>>  OVN v20.09.0 - 28 Sep 2020
>>>  --------------------------
>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
>>> index 16bc47b205..357edd1f5c 100644
>>> --- a/controller/ovn-controller.8.xml
>>> +++ b/controller/ovn-controller.8.xml
>>> @@ -233,6 +233,17 @@
>>>          The boolean flag indicates if the chassis is used as an
>>>          interconnection gateway.
>>>        </dd>
>>> +
>>> +      <dt><code>external_ids:ovn-pin-to-northd</code></dt>
>>
>> This still seems misleading to me.  We don't actually pin to a northd
>> version, we pin to an OVN internal version.  What if we call it
>> "ovn-pin-to-version"?
> 
> It's true that we pin to the OVN internal version, but my idea is that
> ovn-northd writes
> its internal version to SB_Global. And ovn-controller reads the northd
> version and compares
> with its own version.
> 
> Few other points
>   - I think ovn-pin-to-northd (or ovn-pin-to-northd-version) makes
> sense from a CMS point of view.
>   - CMS doesn't need to know about the internal version mechanism used.
>   - In production deployments when northd services are updated first,
> ovn-northd will update its new
>     internal version to SB_Global.
>   - And the idea of this patch is to pin the controller to the same
> northd version.

So I guess this translates to "make ovn-controller reject SB contents if
they were created by the wrong northd version".

> 
> So I'd prefer to call it ovn-pin-to-northd.
> 

With that in mind, and considering your explanation, what about
"ovn-match-northd-version"?

> 
>>
>>> +      <dd>
>>> +        The boolean flag indicates if <code>ovn-controller</code> needs to
>>> +        be pinned to the exact <code>ovn-northd</code> version. If this
>>> +        flag is set to true and the <code>ovn-northd's</code> version (reported
>>> +        in the Southbound database) doesn't match with the
>>> +        <code>ovn-controller's</code> internal version, then it will stop
>>> +        processing the Southbound and local Open vSwitch database changes.
>>> +        The default value is considered false if this option is not defined.
>>> +      </dd>
>>>      </dl>
>>>
>>>      <p>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index a06cae3ccb..a7344ea9c4 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -2136,6 +2136,37 @@ struct ovn_controller_exit_args {
>>>      bool *restart;
>>>  };
>>>
>>> +static bool
>>> +pin_to_northd(struct ovsdb_idl *ovs_idl)
>>> +{
>>> +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
>>> +    return !cfg ? false : smap_get_bool(&cfg->external_ids,
>>> +                                        "ovn-pin-to-northd", false);
>>> +}
>>> +
>>> +/* Returns false if the northd internal version and ovn-controller internal
>>> + * version doesn't match.
>>> + */
>>> +static bool
>>> +check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
>>> +{
>>> +    if (!sb) {
>>> +        return false;
>>> +    }
>>> +
>>> +    const char *northd_version =
>>> +        smap_get_def(&sb->options, "northd_internal_version", "");
>>> +
>>> +    bool mismatch = strcmp(northd_version, my_version);
>>> +    if (mismatch) {
>>> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>> +        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
>>> +                     "version - %s", my_version, northd_version);
>>> +    }
>>> +
>>> +    return mismatch;
>>> +}
>>> +
>>>  int
>>>  main(int argc, char *argv[])
>>>  {
>>> @@ -2428,10 +2459,14 @@ main(int argc, char *argv[])
>>>          .enable_lflow_cache = true
>>>      };
>>>
>>> +    char *ovn_internal_version = ovn_get_internal_version();
>>> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
>>> +
>>>      /* Main loop. */
>>>      exiting = false;
>>>      restart = false;
>>>      bool sb_monitor_all = false;
>>> +    bool version_mismatch_with_northd = false;
>>>      while (!exiting) {
>>>          /* If we're paused just run the unixctl server and skip most of the
>>>           * processing loop.
>>> @@ -2442,8 +2477,6 @@ main(int argc, char *argv[])
>>>              goto loop_done;
>>>          }
>>>
>>> -        engine_init_run();
>>> -
>>
>> I'm not sure why it's better to move engine_init_run() lower.  I think
>> it can stay here.
> 
> Ack. I'm fine either way.
> 
>>
>>>          struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
>>>          unsigned int new_ovs_cond_seqno
>>>              = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
>>> @@ -2473,6 +2506,22 @@ main(int argc, char *argv[])
>>>              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
>>>          }
>>>
>>> +        bool version_mismatched = false;
>>> +        if (pin_to_northd(ovs_idl_loop.idl)) {
>>> +            version_mismatched = check_northd_version(
>>> +                sbrec_sb_global_first(ovnsb_idl_loop.idl),
>>> +                ovn_internal_version);
>>> +        } else {
>>> +            version_mismatched = false;
>>> +        }
>>> +
>>> +        if (version_mismatch_with_northd != version_mismatched) {
>>> +            version_mismatch_with_northd = version_mismatched;
>>> +            engine_set_force_recompute(true);
>>> +        }
>>> +
>>> +        engine_init_run();
>>> +
>>
>> I think we can make the version checking more contained, what do you
>> think about the following incremental?
>>
>> https://github.com/dceara/ovn/commit/29c382137eb974aff3ea080b8098b2919189e6e3
> 
> It looks fine to me. I'd replace check_version with
> check_northd_version based on the reasons I gave above.
> 

Ok.

> 
>>
>>>          struct engine_context eng_ctx = {
>>>              .ovs_idl_txn = ovs_idl_txn,
>>>              .ovnsb_idl_txn = ovnsb_idl_txn,
>>> @@ -2481,7 +2530,8 @@ main(int argc, char *argv[])
>>>
>>>          engine_set_context(&eng_ctx);
>>>
>>> -        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
>>> +        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
>>> +            !version_mismatch_with_northd) {
>>>              /* Contains the transport zones that this Chassis belongs to */
>>>              struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
>>>              sset_from_delimited_string(&transport_zones,
>>> @@ -2770,6 +2820,7 @@ loop_done:
>>>          }
>>>      }
>>>
>>> +    free(ovn_internal_version);
>>>      unixctl_server_destroy(unixctl);
>>>      lflow_destroy();
>>>      ofctrl_destroy();
>>> @@ -2822,6 +2873,7 @@ parse_options(int argc, char *argv[])
>>>
>>>          case 'V':
>>>              ovs_print_version(OFP15_VERSION, OFP15_VERSION);
>>> +            printf("SB DB Schema %s\n", sbrec_get_db_version());
>>>              exit(EXIT_SUCCESS);
>>>
>>>          VLOG_OPTION_HANDLERS
>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>> index 18aac8da34..9dc9ade3b4 100644
>>> --- a/lib/ovn-util.c
>>> +++ b/lib/ovn-util.c
>>> @@ -20,6 +20,7 @@
>>>  #include <unistd.h>
>>>
>>>  #include "daemon.h"
>>> +#include "include/ovn/actions.h"
>>>  #include "openvswitch/ofp-parse.h"
>>>  #include "openvswitch/vlog.h"
>>>  #include "ovn-dirs.h"
>>> @@ -713,3 +714,19 @@ ip_address_and_port_from_lb_key(const char *key, char **ip_address,
>>>      *addr_family = ss.ss_family;
>>>      return true;
>>>  }
>>> +
>>> +#define N_OVN_ACTIONS N_OVNACTS
>>> +BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
>>> +
>>> +/* Increment this for any logical flow changes or if existing OVN action is
>>> + * modified. */
>>
>> Except for this comment here (that can be easily overlooked), we don't
>> mention anywhere else that the minor version should be incremented when
>> changing existing OVN actions.
>>
>> I'm not sure if we could automatically generate the minor version but in
>> any case we should make this more visible (both for contributors and
>> reviewers).
> 
> Maybe we should never update an existing action (from now on ) ? If
> there is a need we can
> add a new action  ?
> 
> Or can we come up with a way to version the ovnacts ? Right now we
> have a 'pad' field in the struct ovnact.
> I think we can rename it to version. Whenever an action is updated we
> need to increment this field.
> And the OVN_INTERNAL_MINOR_VER will be the sum of all the versions of ovnacts ?

There would still be no easy way to enforce this, except at review time
but it's probably better than with a hardcoded OVN_INTERNAL_MINOR_VER value.

> 
> If this sounds reasonable I can look into this as a follow up patch.
> 

I think this would be nice to have.

>>
>> I'd suggest at least updating the comment in actions.h where we define
>> OVNACTS.
> 
> Ack.
> 
>>
>> Maybe we should also mention this in one of the internals/*.rst docs.
>> Is Documentation/internals/submitting-patches.rst the best place to
>> mention this?  I see we have a section for "Feature Deprecation
>> Guidelines", maybe we can add one for "OVN compatibility" or "OVN Upgrade"?
> 
> I'll see if we can add more documentation about it here.
> 

Thanks!
Numan Siddique Nov. 18, 2020, 1:25 p.m. UTC | #5
On Wed, Nov 18, 2020 at 6:52 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 11/18/20 2:08 PM, Numan Siddique wrote:
> > On Wed, Nov 18, 2020 at 5:23 PM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 11/17/20 12:57 PM, numans@ovn.org wrote:
> >>> From: Numan Siddique <numans@ovn.org>
> >>>
> >>> OVN recommends updating/upgrading ovn-controllers first and then
> >>> ovn-northd and OVN DB ovsdb-servers.  This is to ensure that any
> >>> new functionality specified by the database or logical flows created
> >>> by ovn-northd is understood by ovn-controller.
> >>>
> >>> However certain deployments may upgrade ovn-northd services first and
> >>> then ovn-controllers.  In a large scal deployment, this can result in
> >>> downtime during upgrades as old ovn-controllers may not understand
> >>> new logical flows or new actions added by ovn-northd.
> >>>
> >>> Even with upgrading ovn-controllers first, can result in ovn-controllers
> >>> rejecting some of the logical flows if an existing OVN action is
> >>> changed.  One such example is ct_commit action which recently was updated
> >>> to take new arguments.
> >>>
> >>> To avoid such downtimes during upgrades, this patch adds the
> >>> functionality of pinning ovn-controller and ovn-northd to a specific
> >>> version. An internal OVN version is generated and this version is stored
> >>> by ovn-northd in the Southbound SB_Global table's
> >>> options:northd_internal_version.
> >>>
> >>> When ovn-controller notices that the internal version has changed, it
> >>> stops handling the database changes - both Southbound and OVS. All the
> >>> existing OF flows are preserved.  When ovn-controller is upgraded to the
> >>> same version as ovn-northd services, it will process the database
> >>> changes.
> >>>
> >>> This feature is made optional and disabled by default. Any CMS can
> >>> enable it by configuring the OVS local database with the option -
> >>> ovn-pin-to-northd=false.
> >>>
> >>> Signed-off-by: Numan Siddique <numans@ovn.org>
> >>> ---
> >>
> >> Hi Numan,
> >>
> >> Next to Flavio's comments I have some too.
> >>
> >
> > Thanks Flavio and Dumitru for the reviews.
> >
> >
> >>>
> >>> v1 -> v2
> >>> ----
> >>>   * Added test cases and missing documentation.
> >>>   * Renamed the ovsdb option from ignore_northd_verison
> >>>     to ovn-pin-to-northd.
> >>>   * Added NEWS item.
> >>>
> >>>  NEWS                            |   3 +
> >>>  controller/ovn-controller.8.xml |  11 +++
> >>>  controller/ovn-controller.c     |  58 ++++++++++++-
> >>>  lib/ovn-util.c                  |  17 ++++
> >>>  lib/ovn-util.h                  |   4 +
> >>>  northd/ovn-northd.c             |  18 +++-
> >>>  tests/ovn.at                    | 147 ++++++++++++++++++++++++++++++++
> >>>  7 files changed, 251 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/NEWS b/NEWS
> >>> index 6010230679..bb95724b36 100644
> >>> --- a/NEWS
> >>> +++ b/NEWS
> >>> @@ -8,6 +8,9 @@ Post-v20.09.0
> >>>       server.
> >>>     - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
> >>>       traffic.
> >>> +   - Support version pinning between ovn-northd and ovn-controller as an
> >>> +     option. If the option is enabled and the versions don't match,
> >>> +     ovn-controller will not process any DB changes.
> >>>
> >>>  OVN v20.09.0 - 28 Sep 2020
> >>>  --------------------------
> >>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> >>> index 16bc47b205..357edd1f5c 100644
> >>> --- a/controller/ovn-controller.8.xml
> >>> +++ b/controller/ovn-controller.8.xml
> >>> @@ -233,6 +233,17 @@
> >>>          The boolean flag indicates if the chassis is used as an
> >>>          interconnection gateway.
> >>>        </dd>
> >>> +
> >>> +      <dt><code>external_ids:ovn-pin-to-northd</code></dt>
> >>
> >> This still seems misleading to me.  We don't actually pin to a northd
> >> version, we pin to an OVN internal version.  What if we call it
> >> "ovn-pin-to-version"?
> >
> > It's true that we pin to the OVN internal version, but my idea is that
> > ovn-northd writes
> > its internal version to SB_Global. And ovn-controller reads the northd
> > version and compares
> > with its own version.
> >
> > Few other points
> >   - I think ovn-pin-to-northd (or ovn-pin-to-northd-version) makes
> > sense from a CMS point of view.
> >   - CMS doesn't need to know about the internal version mechanism used.
> >   - In production deployments when northd services are updated first,
> > ovn-northd will update its new
> >     internal version to SB_Global.
> >   - And the idea of this patch is to pin the controller to the same
> > northd version.
>
> So I guess this translates to "make ovn-controller reject SB contents if
> they were created by the wrong northd version".
>
> >
> > So I'd prefer to call it ovn-pin-to-northd.
> >
>
> With that in mind, and considering your explanation, what about
> "ovn-match-northd-version"?

Sounds better. Thanks.

>
> >
> >>
> >>> +      <dd>
> >>> +        The boolean flag indicates if <code>ovn-controller</code> needs to
> >>> +        be pinned to the exact <code>ovn-northd</code> version. If this
> >>> +        flag is set to true and the <code>ovn-northd's</code> version (reported
> >>> +        in the Southbound database) doesn't match with the
> >>> +        <code>ovn-controller's</code> internal version, then it will stop
> >>> +        processing the Southbound and local Open vSwitch database changes.
> >>> +        The default value is considered false if this option is not defined.
> >>> +      </dd>
> >>>      </dl>
> >>>
> >>>      <p>
> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>> index a06cae3ccb..a7344ea9c4 100644
> >>> --- a/controller/ovn-controller.c
> >>> +++ b/controller/ovn-controller.c
> >>> @@ -2136,6 +2136,37 @@ struct ovn_controller_exit_args {
> >>>      bool *restart;
> >>>  };
> >>>
> >>> +static bool
> >>> +pin_to_northd(struct ovsdb_idl *ovs_idl)
> >>> +{
> >>> +    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
> >>> +    return !cfg ? false : smap_get_bool(&cfg->external_ids,
> >>> +                                        "ovn-pin-to-northd", false);
> >>> +}
> >>> +
> >>> +/* Returns false if the northd internal version and ovn-controller internal
> >>> + * version doesn't match.
> >>> + */
> >>> +static bool
> >>> +check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
> >>> +{
> >>> +    if (!sb) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    const char *northd_version =
> >>> +        smap_get_def(&sb->options, "northd_internal_version", "");
> >>> +
> >>> +    bool mismatch = strcmp(northd_version, my_version);
> >>> +    if (mismatch) {
> >>> +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> >>> +        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
> >>> +                     "version - %s", my_version, northd_version);
> >>> +    }
> >>> +
> >>> +    return mismatch;
> >>> +}
> >>> +
> >>>  int
> >>>  main(int argc, char *argv[])
> >>>  {
> >>> @@ -2428,10 +2459,14 @@ main(int argc, char *argv[])
> >>>          .enable_lflow_cache = true
> >>>      };
> >>>
> >>> +    char *ovn_internal_version = ovn_get_internal_version();
> >>> +    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
> >>> +
> >>>      /* Main loop. */
> >>>      exiting = false;
> >>>      restart = false;
> >>>      bool sb_monitor_all = false;
> >>> +    bool version_mismatch_with_northd = false;
> >>>      while (!exiting) {
> >>>          /* If we're paused just run the unixctl server and skip most of the
> >>>           * processing loop.
> >>> @@ -2442,8 +2477,6 @@ main(int argc, char *argv[])
> >>>              goto loop_done;
> >>>          }
> >>>
> >>> -        engine_init_run();
> >>> -
> >>
> >> I'm not sure why it's better to move engine_init_run() lower.  I think
> >> it can stay here.
> >
> > Ack. I'm fine either way.
> >
> >>
> >>>          struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
> >>>          unsigned int new_ovs_cond_seqno
> >>>              = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
> >>> @@ -2473,6 +2506,22 @@ main(int argc, char *argv[])
> >>>              ovnsb_cond_seqno = new_ovnsb_cond_seqno;
> >>>          }
> >>>
> >>> +        bool version_mismatched = false;
> >>> +        if (pin_to_northd(ovs_idl_loop.idl)) {
> >>> +            version_mismatched = check_northd_version(
> >>> +                sbrec_sb_global_first(ovnsb_idl_loop.idl),
> >>> +                ovn_internal_version);
> >>> +        } else {
> >>> +            version_mismatched = false;
> >>> +        }
> >>> +
> >>> +        if (version_mismatch_with_northd != version_mismatched) {
> >>> +            version_mismatch_with_northd = version_mismatched;
> >>> +            engine_set_force_recompute(true);
> >>> +        }
> >>> +
> >>> +        engine_init_run();
> >>> +
> >>
> >> I think we can make the version checking more contained, what do you
> >> think about the following incremental?
> >>
> >> https://github.com/dceara/ovn/commit/29c382137eb974aff3ea080b8098b2919189e6e3
> >
> > It looks fine to me. I'd replace check_version with
> > check_northd_version based on the reasons I gave above.
> >
>
> Ok.
>
> >
> >>
> >>>          struct engine_context eng_ctx = {
> >>>              .ovs_idl_txn = ovs_idl_txn,
> >>>              .ovnsb_idl_txn = ovnsb_idl_txn,
> >>> @@ -2481,7 +2530,8 @@ main(int argc, char *argv[])
> >>>
> >>>          engine_set_context(&eng_ctx);
> >>>
> >>> -        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
> >>> +        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
> >>> +            !version_mismatch_with_northd) {
> >>>              /* Contains the transport zones that this Chassis belongs to */
> >>>              struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
> >>>              sset_from_delimited_string(&transport_zones,
> >>> @@ -2770,6 +2820,7 @@ loop_done:
> >>>          }
> >>>      }
> >>>
> >>> +    free(ovn_internal_version);
> >>>      unixctl_server_destroy(unixctl);
> >>>      lflow_destroy();
> >>>      ofctrl_destroy();
> >>> @@ -2822,6 +2873,7 @@ parse_options(int argc, char *argv[])
> >>>
> >>>          case 'V':
> >>>              ovs_print_version(OFP15_VERSION, OFP15_VERSION);
> >>> +            printf("SB DB Schema %s\n", sbrec_get_db_version());
> >>>              exit(EXIT_SUCCESS);
> >>>
> >>>          VLOG_OPTION_HANDLERS
> >>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> >>> index 18aac8da34..9dc9ade3b4 100644
> >>> --- a/lib/ovn-util.c
> >>> +++ b/lib/ovn-util.c
> >>> @@ -20,6 +20,7 @@
> >>>  #include <unistd.h>
> >>>
> >>>  #include "daemon.h"
> >>> +#include "include/ovn/actions.h"
> >>>  #include "openvswitch/ofp-parse.h"
> >>>  #include "openvswitch/vlog.h"
> >>>  #include "ovn-dirs.h"
> >>> @@ -713,3 +714,19 @@ ip_address_and_port_from_lb_key(const char *key, char **ip_address,
> >>>      *addr_family = ss.ss_family;
> >>>      return true;
> >>>  }
> >>> +
> >>> +#define N_OVN_ACTIONS N_OVNACTS
> >>> +BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
> >>> +
> >>> +/* Increment this for any logical flow changes or if existing OVN action is
> >>> + * modified. */
> >>
> >> Except for this comment here (that can be easily overlooked), we don't
> >> mention anywhere else that the minor version should be incremented when
> >> changing existing OVN actions.
> >>
> >> I'm not sure if we could automatically generate the minor version but in
> >> any case we should make this more visible (both for contributors and
> >> reviewers).
> >
> > Maybe we should never update an existing action (from now on ) ? If
> > there is a need we can
> > add a new action  ?
> >
> > Or can we come up with a way to version the ovnacts ? Right now we
> > have a 'pad' field in the struct ovnact.
> > I think we can rename it to version. Whenever an action is updated we
> > need to increment this field.
> > And the OVN_INTERNAL_MINOR_VER will be the sum of all the versions of ovnacts ?
>
> There would still be no easy way to enforce this, except at review time
> but it's probably better than with a hardcoded OVN_INTERNAL_MINOR_VER value.

That's true.


>
> >
> > If this sounds reasonable I can look into this as a follow up patch.
> >
>
> I think this would be nice to have.

I will explore if it is straightforward enough to be included in v3.

Thanks for the reviews.

Numan

>
> >>
> >> I'd suggest at least updating the comment in actions.h where we define
> >> OVNACTS.
> >
> > Ack.
> >
> >>
> >> Maybe we should also mention this in one of the internals/*.rst docs.
> >> Is Documentation/internals/submitting-patches.rst the best place to
> >> mention this?  I see we have a section for "Feature Deprecation
> >> Guidelines", maybe we can add one for "OVN compatibility" or "OVN Upgrade"?
> >
> > I'll see if we can add more documentation about it here.
> >
>
> Thanks!
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 6010230679..bb95724b36 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,9 @@  Post-v20.09.0
      server.
    - Support other_config:vlan-passthru=true to allow VLAN tagged incoming
      traffic.
+   - Support version pinning between ovn-northd and ovn-controller as an
+     option. If the option is enabled and the versions don't match,
+     ovn-controller will not process any DB changes.
 
 OVN v20.09.0 - 28 Sep 2020
 --------------------------
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 16bc47b205..357edd1f5c 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -233,6 +233,17 @@ 
         The boolean flag indicates if the chassis is used as an
         interconnection gateway.
       </dd>
+
+      <dt><code>external_ids:ovn-pin-to-northd</code></dt>
+      <dd>
+        The boolean flag indicates if <code>ovn-controller</code> needs to
+        be pinned to the exact <code>ovn-northd</code> version. If this
+        flag is set to true and the <code>ovn-northd's</code> version (reported
+        in the Southbound database) doesn't match with the
+        <code>ovn-controller's</code> internal version, then it will stop
+        processing the Southbound and local Open vSwitch database changes.
+        The default value is considered false if this option is not defined.
+      </dd>
     </dl>
 
     <p>
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index a06cae3ccb..a7344ea9c4 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2136,6 +2136,37 @@  struct ovn_controller_exit_args {
     bool *restart;
 };
 
+static bool
+pin_to_northd(struct ovsdb_idl *ovs_idl)
+{
+    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
+    return !cfg ? false : smap_get_bool(&cfg->external_ids,
+                                        "ovn-pin-to-northd", false);
+}
+
+/* Returns false if the northd internal version and ovn-controller internal
+ * version doesn't match.
+ */
+static bool
+check_northd_version(const struct sbrec_sb_global *sb, const char *my_version)
+{
+    if (!sb) {
+        return false;
+    }
+
+    const char *northd_version =
+        smap_get_def(&sb->options, "northd_internal_version", "");
+
+    bool mismatch = strcmp(northd_version, my_version);
+    if (mismatch) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        VLOG_WARN_RL(&rl, "controller version - %s mismatch with northd "
+                     "version - %s", my_version, northd_version);
+    }
+
+    return mismatch;
+}
+
 int
 main(int argc, char *argv[])
 {
@@ -2428,10 +2459,14 @@  main(int argc, char *argv[])
         .enable_lflow_cache = true
     };
 
+    char *ovn_internal_version = ovn_get_internal_version();
+    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
+
     /* Main loop. */
     exiting = false;
     restart = false;
     bool sb_monitor_all = false;
+    bool version_mismatch_with_northd = false;
     while (!exiting) {
         /* If we're paused just run the unixctl server and skip most of the
          * processing loop.
@@ -2442,8 +2477,6 @@  main(int argc, char *argv[])
             goto loop_done;
         }
 
-        engine_init_run();
-
         struct ovsdb_idl_txn *ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
         unsigned int new_ovs_cond_seqno
             = ovsdb_idl_get_condition_seqno(ovs_idl_loop.idl);
@@ -2473,6 +2506,22 @@  main(int argc, char *argv[])
             ovnsb_cond_seqno = new_ovnsb_cond_seqno;
         }
 
+        bool version_mismatched = false;
+        if (pin_to_northd(ovs_idl_loop.idl)) {
+            version_mismatched = check_northd_version(
+                sbrec_sb_global_first(ovnsb_idl_loop.idl),
+                ovn_internal_version);
+        } else {
+            version_mismatched = false;
+        }
+
+        if (version_mismatch_with_northd != version_mismatched) {
+            version_mismatch_with_northd = version_mismatched;
+            engine_set_force_recompute(true);
+        }
+
+        engine_init_run();
+
         struct engine_context eng_ctx = {
             .ovs_idl_txn = ovs_idl_txn,
             .ovnsb_idl_txn = ovnsb_idl_txn,
@@ -2481,7 +2530,8 @@  main(int argc, char *argv[])
 
         engine_set_context(&eng_ctx);
 
-        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl)) {
+        if (ovsdb_idl_has_ever_connected(ovnsb_idl_loop.idl) &&
+            !version_mismatch_with_northd) {
             /* Contains the transport zones that this Chassis belongs to */
             struct sset transport_zones = SSET_INITIALIZER(&transport_zones);
             sset_from_delimited_string(&transport_zones,
@@ -2770,6 +2820,7 @@  loop_done:
         }
     }
 
+    free(ovn_internal_version);
     unixctl_server_destroy(unixctl);
     lflow_destroy();
     ofctrl_destroy();
@@ -2822,6 +2873,7 @@  parse_options(int argc, char *argv[])
 
         case 'V':
             ovs_print_version(OFP15_VERSION, OFP15_VERSION);
+            printf("SB DB Schema %s\n", sbrec_get_db_version());
             exit(EXIT_SUCCESS);
 
         VLOG_OPTION_HANDLERS
diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 18aac8da34..9dc9ade3b4 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -20,6 +20,7 @@ 
 #include <unistd.h>
 
 #include "daemon.h"
+#include "include/ovn/actions.h"
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/vlog.h"
 #include "ovn-dirs.h"
@@ -713,3 +714,19 @@  ip_address_and_port_from_lb_key(const char *key, char **ip_address,
     *addr_family = ss.ss_family;
     return true;
 }
+
+#define N_OVN_ACTIONS N_OVNACTS
+BUILD_ASSERT_DECL(N_OVN_ACTIONS == 47);
+
+/* Increment this for any logical flow changes or if existing OVN action is
+ * modified. */
+#define OVN_INTERNAL_MINOR_VER 0
+
+/* Returns the OVN version. The caller must free the returned value. */
+char *
+ovn_get_internal_version(void)
+{
+    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
+                     sbrec_get_db_version(),
+                     N_OVN_ACTIONS, OVN_INTERNAL_MINOR_VER);
+}
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index 3496673b26..7edd301802 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -221,4 +221,8 @@  char *str_tolower(const char *orig);
 bool ip_address_and_port_from_lb_key(const char *key, char **ip_address,
                                      uint16_t *port, int *addr_family);
 
+/* Returns the internal OVN version. The caller must free the returned
+ * value. */
+char *ovn_get_internal_version(void);
+
 #endif
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 4d4190cb9b..2475605cc0 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -12062,7 +12062,8 @@  ovnnb_db_run(struct northd_context *ctx,
              struct ovsdb_idl_loop *sb_loop,
              struct hmap *datapaths, struct hmap *ports,
              struct ovs_list *lr_list,
-             int64_t loop_start_time)
+             int64_t loop_start_time,
+             const char *ovn_internal_version)
 {
     if (!ctx->ovnsb_txn || !ctx->ovnnb_txn) {
         return;
@@ -12139,6 +12140,8 @@  ovnnb_db_run(struct northd_context *ctx,
     smap_replace(&options, "max_tunid", max_tunid);
     free(max_tunid);
 
+    smap_replace(&options, "northd_internal_version", ovn_internal_version);
+
     nbrec_nb_global_verify_options(nb);
     nbrec_nb_global_set_options(nb, &options);
 
@@ -12746,7 +12749,8 @@  ovnsb_db_run(struct northd_context *ctx,
 static void
 ovn_db_run(struct northd_context *ctx,
            struct ovsdb_idl_index *sbrec_chassis_by_name,
-           struct ovsdb_idl_loop *ovnsb_idl_loop)
+           struct ovsdb_idl_loop *ovnsb_idl_loop,
+           const char *ovn_internal_version)
 {
     struct hmap datapaths, ports;
     struct ovs_list lr_list;
@@ -12756,7 +12760,8 @@  ovn_db_run(struct northd_context *ctx,
 
     int64_t start_time = time_wall_msec();
     ovnnb_db_run(ctx, sbrec_chassis_by_name, ovnsb_idl_loop,
-                 &datapaths, &ports, &lr_list, start_time);
+                 &datapaths, &ports, &lr_list, start_time,
+                 ovn_internal_version);
     ovnsb_db_run(ctx, ovnsb_idl_loop, &ports, start_time);
     destroy_datapaths_and_ports(&datapaths, &ports, &lr_list);
 }
@@ -13113,6 +13118,9 @@  main(int argc, char *argv[])
     unixctl_command_register("sb-connection-status", "", 0, 0,
                              ovn_conn_show, ovnsb_idl_loop.idl);
 
+    char *ovn_internal_version = ovn_get_internal_version();
+    VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
+
     /* Main loop. */
     exiting = false;
     state.had_lock = false;
@@ -13154,7 +13162,8 @@  main(int argc, char *argv[])
             }
 
             if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
+                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop,
+                           ovn_internal_version);
                 if (ctx.ovnsb_txn) {
                     check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
                     check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
@@ -13216,6 +13225,7 @@  main(int argc, char *argv[])
         }
     }
 
+    free(ovn_internal_version);
     unixctl_server_destroy(unixctl);
     ovsdb_idl_loop_destroy(&ovnnb_idl_loop);
     ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
diff --git a/tests/ovn.at b/tests/ovn.at
index c0219bbd47..42a0f9a00f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22687,3 +22687,150 @@  AT_CHECK([test "$encap_rec_mvtep" = "$encap_rec_mvtep1"], [0], [])
 
 OVN_CLEANUP([hv1])
 AT_CLEANUP
+
+AT_SETUP([ovn -- check ovn-northd and ovn-controller version pinning])
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.10
+
+check ovn-nbctl ls-add sw0
+check ovn-nbctl lsp-add sw0 sw0-p1
+check ovn-nbctl lsp-add sw0 sw0-p2
+
+as hv1
+ovs-vsctl \
+    -- add-port br-int vif1 \
+    -- set Interface vif1 external_ids:iface-id=sw0-p1 \
+    ofport-request=1
+ovs-vsctl \
+    -- add-port br-int vif2 \
+    -- set Interface vif2 external_ids:iface-id=sw0-p2 \
+    ofport-request=2
+
+# Wait for port to be bound.
+wait_row_count Chassis 1 name=hv1
+ch=$(fetch_column Chassis _uuid name=hv1)
+wait_row_count Port_Binding 1 logical_port=sw0-p1 chassis=$ch
+wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
+
+northd_version=$(ovn-sbctl get SB_Global . options:northd_internal_version | sed s/\"//g)
+echo "northd version = $northd_version"
+AT_CHECK([grep -c $northd_version hv1/ovn-controller.log], [0], [1
+])
+
+# Stop ovn-northd so that we can modify the northd_version.
+as northd
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+as northd-backup
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+check ovn-sbctl set SB_Global . options:northd_internal_version=foo
+
+as hv1
+check ovs-vsctl set interface vif2 external_ids:iface-id=foo
+
+# ovn-controller should release the lport sw0-p2 since ovn-pin-to-northd
+# is not true.
+wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
+
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
+0
+])
+
+echo
+echo "__file__:__line__: Pin ovn-controller to ovn-northd version."
+
+as hv1
+check ovs-vsctl set open . external_ids:ovn-pin-to-northd=true
+
+OVS_WAIT_UNTIL(
+    [test 1 = $(grep -c "controller version - $northd_version mismatch with northd version - foo" hv1/ovn-controller.log)
+])
+
+as hv1
+check ovs-vsctl set interface vif2 external_ids:iface-id=sw0-p2
+
+# ovn-controller should not claim sw0-p2 since there is version mismatch
+as hv1 ovn-appctl -t ovn-controller recompute
+wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
+
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
+0
+])
+
+check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
+
+# It should claim sw0-p2
+wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
+
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
+1
+])
+
+as hv1
+ovn_remote=$(ovs-vsctl get open . external_ids:ovn-remote | sed s/\"//g)
+ovs-vsctl set open . external_ids:ovn-remote=unix:foo
+check ovs-vsctl set interface vif2 external_ids:iface-id=foo
+
+# ovn-controller is not connected to the SB DB. Even though it
+# releases sw0-p2, it will not delete the OF flows.
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
+1
+])
+
+# Change the version to incorrect one and reconnect to the SB DB.
+check ovn-sbctl set SB_Global . options:northd_internal_version=bar
+
+as hv1
+check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
+
+sleep 1
+
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [0], [dnl
+1
+])
+
+wait_row_count Port_Binding 1 logical_port=sw0-p2 chassis=$ch
+
+# Change the ovn-remote to incorrect and set the correct northd version
+# and then change back to the correct ovn-remote
+as hv1
+check ovs-vsctl set open . external_ids:ovn-remote=unix:foo
+
+check ovn-sbctl set SB_Global . options:northd_internal_version=$northd_version
+
+as hv1
+check ovs-vsctl set open . external_ids:ovn-remote=$ovn_remote
+
+wait_row_count Port_Binding 1 logical_port=sw0-p2 'chassis=[[]]'
+as hv1 ovs-ofctl dump-flows br-int table=0 > offlows_table0.txt
+AT_CAPTURE_FILE([offlows_table0.txt])
+AT_CHECK_UNQUOTED([grep -c "in_port=2" offlows_table0.txt], [1], [dnl
+0
+])
+
+as hv1
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+AT_CLEANUP