diff mbox series

[ovs-dev,v2] ovn-controller: Avoid recompute triggered by external_ids:ovn-installed update.

Message ID 20220503193722.1112901-1-hzhou@ovn.org
State Accepted
Headers show
Series [ovs-dev,v2] ovn-controller: Avoid recompute triggered by external_ids:ovn-installed update. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Han Zhou May 3, 2022, 7:37 p.m. UTC
During the process of claiming a VIF port on a chassis, there are
multiple SB and local OVSDB updates by ovn-controller, which may trigger
ovn-controller recompute, depending on the timing between the completion
of SB and local OVSDB updates:

After port claiming, ovn-controller sends an update to the local OVSDB
to clean the Interface:external_ids:ovn-installed, and also sends an
update to the SB DB to update the Port_Binding:chassis.  If the local
OVSDB update notification comes back before the SB update completes,
ovn-controller's SB IDL is still "in transaction" and read-only, but the
local Interface update triggers the I-P engine runtime_data node to
handle the OVS_interface change, which considers the VIF to be claimed
(again) potentially, which requires SB updates while it is not possible
at the moment, and so the I-P handler returns failure and the engine
aborts, which triggers recompute at the next I-P engine iteration when
SB DB is writable.

If we are lucky at the above step that the SB update completes before
the local OVSDB update is received, ovn-controller would handle the
change incrementally (and figures out that the SB Port_Binding chassis
is already the current one thus wouldn't reclaim the VIF). But the next
step after it completes flow computing for the newly bound VIF it sends
another update to the local OVSDB to set the
Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
same time it also sends an update to the SB DB to set the
Port_Binding:up. These two parallel updates would lead to the same
situation as the above step and potentially cause ovn-controll
recompute.

The recompute is very likely to be triggered in production because the
SB DB is remote (on central nodes) and has heavier load, which requires
longer time to complete the updates than the local OVSDB server.

The problem is that the runtime_data's OVS_interface change handler
couldn't handle the update of the Interface:external_ids:ovn-installed
incrementally, because ovn-controller assumes it could require an SB
update that is not possible at the moment. However, the assumption is
not true. In fact it shouldn't even care about the change because this
external_id:ovn-installed is written by ovn-controller itself and it is
write-only (no need to read it as an input for any further processing).

So, the idea to fix the problem is to ignore such changes that happen to
the "write-only" fields only. OVSDB change tracking provides APIs to
check if a specific column is updated, but the difficulty here is that
the column external_ids is a k-v map which contains not only the data
that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
but also the data ovn-controller cares about, such as iface-id,
iface-id-ver and encap-ip. To solve this problem, we need to know what's
changed exactly inside the external_ids map.

This patch solves it by introducing a new I-P engine node
ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
copy of the older version of the column external_ids, and replace the
input OVS_interface of the runtime_data node. In the change handler of the
runtime_data node we evaluates if the change needs to be handled before
calling consider_iface_claim(), so in the above scenarios the changes
don't need to be handled and won't trigger recompute any more. A test
case is also updated to capture this (which is very likely, if not 100%,
to fail without the fix).

Signed-off-by: Han Zhou <hzhou@ovn.org>
---
v2: Add comments that was missing while committing v1.

 controller/binding.c        |  70 ++++++++++++++++++++++++
 controller/binding.h        |   1 +
 controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
 tests/ovn-performance.at    |  17 ++++++
 4 files changed, 186 insertions(+), 7 deletions(-)

Comments

Mark Michelson May 9, 2022, 7:32 p.m. UTC | #1
Based on the explanation and the code, this seems to solve the problem 
of the spurious recomputes. I'd like confirmation from Xavier that this 
is fixing the issue he was addressing in 
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/ 
, and I'd also like confirmation that this approach doesn't incur the 
delays that Dumitru mentioned on Xavier's patch.

With those confirmations, I'll add my ack to this patch.

On 5/3/22 15:37, Han Zhou wrote:
> During the process of claiming a VIF port on a chassis, there are
> multiple SB and local OVSDB updates by ovn-controller, which may trigger
> ovn-controller recompute, depending on the timing between the completion
> of SB and local OVSDB updates:
> 
> After port claiming, ovn-controller sends an update to the local OVSDB
> to clean the Interface:external_ids:ovn-installed, and also sends an
> update to the SB DB to update the Port_Binding:chassis.  If the local
> OVSDB update notification comes back before the SB update completes,
> ovn-controller's SB IDL is still "in transaction" and read-only, but the
> local Interface update triggers the I-P engine runtime_data node to
> handle the OVS_interface change, which considers the VIF to be claimed
> (again) potentially, which requires SB updates while it is not possible
> at the moment, and so the I-P handler returns failure and the engine
> aborts, which triggers recompute at the next I-P engine iteration when
> SB DB is writable.
> 
> If we are lucky at the above step that the SB update completes before
> the local OVSDB update is received, ovn-controller would handle the
> change incrementally (and figures out that the SB Port_Binding chassis
> is already the current one thus wouldn't reclaim the VIF). But the next
> step after it completes flow computing for the newly bound VIF it sends
> another update to the local OVSDB to set the
> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
> same time it also sends an update to the SB DB to set the
> Port_Binding:up. These two parallel updates would lead to the same
> situation as the above step and potentially cause ovn-controll
> recompute.
> 
> The recompute is very likely to be triggered in production because the
> SB DB is remote (on central nodes) and has heavier load, which requires
> longer time to complete the updates than the local OVSDB server.
> 
> The problem is that the runtime_data's OVS_interface change handler
> couldn't handle the update of the Interface:external_ids:ovn-installed
> incrementally, because ovn-controller assumes it could require an SB
> update that is not possible at the moment. However, the assumption is
> not true. In fact it shouldn't even care about the change because this
> external_id:ovn-installed is written by ovn-controller itself and it is
> write-only (no need to read it as an input for any further processing).
> 
> So, the idea to fix the problem is to ignore such changes that happen to
> the "write-only" fields only. OVSDB change tracking provides APIs to
> check if a specific column is updated, but the difficulty here is that
> the column external_ids is a k-v map which contains not only the data
> that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
> but also the data ovn-controller cares about, such as iface-id,
> iface-id-ver and encap-ip. To solve this problem, we need to know what's
> changed exactly inside the external_ids map.
> 
> This patch solves it by introducing a new I-P engine node
> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
> copy of the older version of the column external_ids, and replace the
> input OVS_interface of the runtime_data node. In the change handler of the
> runtime_data node we evaluates if the change needs to be handled before
> calling consider_iface_claim(), so in the above scenarios the changes
> don't need to be handled and won't trigger recompute any more. A test
> case is also updated to capture this (which is very likely, if not 100%,
> to fail without the fix).
> 
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
> v2: Add comments that was missing while committing v1.
> 
>   controller/binding.c        |  70 ++++++++++++++++++++++++
>   controller/binding.h        |   1 +
>   controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
>   tests/ovn-performance.at    |  17 ++++++
>   4 files changed, 186 insertions(+), 7 deletions(-)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 7281b0485..0563d9d1e 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct ovsrec_interface *iface,
>       return false;
>   }
>   
> +static bool
> +is_ext_id_changed(const struct smap *a, const struct smap *b, const char *key)
> +{
> +    const char *value_a = smap_get(a, key);
> +    const char *value_b = smap_get(b, key);
> +    if ((value_a && !value_b)
> +        || (!value_a && value_b)
> +        || (value_a && value_b && strcmp(value_a, value_b))) {
> +        return true;
> +    }
> +    return false;
> +}
> +
> +/* Check if the change in 'iface_rec' is something we are interested in from
> + * port binding perspective.  Return true if the change needs to be handled,
> + * otherwise return false.
> + *
> + * The 'iface_rec' must be change tracked, i.e. iterator from
> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
> +static bool
> +ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec,
> +                                 struct shash *iface_table_external_ids_old)
> +{
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_NAME)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_TYPE)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
> +        /* Compare the external_ids that we are interested in with the old
> +         * values:
> +         * - iface-id
> +         * - iface-id-ver
> +         * - encap-ip
> +         * For any other changes, such as ovn-installed, ovn-installed-ts, etc,
> +         * we don't need to handle. */
> +        struct smap *external_ids_old =
> +            shash_find_data(iface_table_external_ids_old, iface_rec->name);
> +        if (!external_ids_old) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "iface-id")) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "iface-id-ver")) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "encap-ip")) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
>   /* Returns true if the ovs interface changes were handled successfully,
>    * false otherwise.
>    */
> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct binding_ctx_in *b_ctx_in,
>               continue;
>           }
>   
> +        if (!ovs_interface_change_need_handle(
> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
> +            continue;
> +        }
> +
>           const char *iface_id = smap_get(&iface_rec->external_ids, "iface-id");
>           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
>           if (iface_id && ofport > 0 &&
> diff --git a/controller/binding.h b/controller/binding.h
> index 430a8d9b1..e49e1ebb6 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>       const struct ovsrec_bridge_table *bridge_table;
>       const struct ovsrec_open_vswitch_table *ovs_table;
>       const struct ovsrec_interface_table *iface_table;
> +    struct shash *iface_table_external_ids_old;
>   };
>   
>   /* Locally relevant port bindings, e.g., VIFs that might be bound locally,
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 22b7fa935..649353f7d 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
>       engine_set_node_state(node, EN_UNCHANGED);
>   }
>   
> +/* This engine node is to wrap the OVS_interface input and maintain a copy of
> + * the old version of data for the column external_ids.
> + *
> + * There are some special considerations of this engine node:
> + * 1. It has a single input OVS_interface, and it transparently passes the
> + *    input changes as its own output data to its dependants. So there is no
> + *    processing to OVS_interface changes but simply mark the node status as
> + *    UPDATED (and so the run() and the change handler is the same).
> + * 2. The iface_table_external_ids_old is computed/updated in the member
> + *    clear_tracked_data(), because that is when the last round of processing
> + *    has completed but the new IDL data is yet to refresh, so we replace the
> + *    old data with the current data. */
> +struct ed_type_ovs_interface_shadow {
> +    struct ovsrec_interface_table *iface_table;
> +    struct shash iface_table_external_ids_old;
> +};
> +
> +static void *
> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
> +                             struct engine_arg *arg OVS_UNUSED)
> +{
> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
> +    data->iface_table = NULL;
> +    shash_init(&data->iface_table_external_ids_old);
> +
> +    return data;
> +}
> +
> +static void
> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
> +{
> +    struct shash_node *node;
> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
> +        struct smap *ext_ids = node->data;
> +        smap_destroy(ext_ids);
> +    }
> +    shash_destroy_free_data(table_ext_ids);
> +}
> +
> +static void
> +en_ovs_interface_shadow_cleanup(void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> +}
> +
> +static void
> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> +    shash_init(&data->iface_table_external_ids_old);
> +
> +    if (!data->iface_table) {
> +        return;
> +    }
> +
> +    const struct ovsrec_interface *iface_rec;
> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
> +        smap_clone(external_ids, &iface_rec->external_ids);
> +        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
> +                  external_ids);
> +    }
> +}
> +
> +static void
> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    struct ovsrec_interface_table *iface_table =
> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> +            engine_get_input("OVS_interface", node));
> +    data->iface_table = iface_table;
> +    engine_set_node_state(node, EN_UPDATED);
> +}
> +
> +static bool
> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
> +                                           void *data_)
> +{
> +    en_ovs_interface_shadow_run(node, data_);
> +    return true;
> +}
> +
>   struct ed_type_runtime_data {
>       /* Contains "struct local_datapath" nodes. */
>       struct hmap local_datapaths;
> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>           (struct ovsrec_port_table *)EN_OVSDB_GET(
>               engine_get_input("OVS_port", node));
>   
> -    struct ovsrec_interface_table *iface_table =
> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> -            engine_get_input("OVS_interface", node));
> +    struct ed_type_ovs_interface_shadow *iface_shadow =
> +        engine_get_input_data("ovs_interface_shadow", node);
>   
>       struct ovsrec_qos_table *qos_table =
>           (struct ovsrec_qos_table *)EN_OVSDB_GET(
> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>       b_ctx_in->sbrec_port_binding_by_datapath = sbrec_port_binding_by_datapath;
>       b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
>       b_ctx_in->port_table = port_table;
> -    b_ctx_in->iface_table = iface_table;
> +    b_ctx_in->iface_table = iface_shadow->iface_table;
> +    b_ctx_in->iface_table_external_ids_old =
> +        &iface_shadow->iface_table_external_ids_old;
>       b_ctx_in->qos_table = qos_table;
>       b_ctx_in->port_binding_table = pb_table;
>       b_ctx_in->br_int = br_int;
> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
>   }
>   
>   static bool
> -runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
>   {
>       struct ed_type_runtime_data *rt_data = data;
>       struct binding_ctx_in b_ctx_in;
> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>   
>       /* Define inc-proc-engine nodes. */
>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> +                                      "ovs_interface_shadow");
>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>       ENGINE_NODE(non_vif_data, "non_vif_data");
>       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>       engine_add_input(&en_ct_zones, &en_runtime_data,
>                        ct_zones_runtime_data_handler);
>   
> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> +                     ovs_interface_shadow_ovs_interface_handler);
> +
>       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>   
>       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>        */
>       engine_add_input(&en_runtime_data, &en_ovs_port,
>                        engine_noop_handler);
> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
> -                     runtime_data_ovs_interface_handler);
> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
> +                     runtime_data_ovs_interface_shadow_handler);
>   
>       engine_add_input(&en_flow_output, &en_lflow_output,
>                        flow_output_lflow_output_handler);
> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
> index bc133f93b..9affca498 100644
> --- a/tests/ovn-performance.at
> +++ b/tests/ovn-performance.at
> @@ -366,6 +366,23 @@ for i in 1 2; do
>            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>            ovn-nbctl --wait=hv sync]
>       )
> +
> +    # Delete and recreate $lp to make it unbind and rebind multiple times, and
> +    # ensure no recompute is triggered.
> +    for k in $(seq 10); do
> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> +            [hv$i hv$j], [lflow_run],
> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=xxxx &&
> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
> +             ovn-nbctl --wait=hv sync]
> +        )
> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> +            [hv$i hv$j], [lflow_run],
> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp &&
> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> +             ovn-nbctl --wait=hv sync]
> +        )
> +    done
>   done
>   
>   for i in 1 2; do
>
Xavier Simonart May 10, 2022, 12:36 a.m. UTC | #2
Hi Mark, Han

Thanks for looking into this.
Han's patch is addressing the issues I initially tried fixing with
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
is definitely not a good solution (incurs some delay and is more a
workaround than a fix).

I have been trying to follow another approach, which would fix a slightly
broader issue.
The goal is to try to avoid recomputes when updating Port_Binding in SBDB.
In addition to the case Han (greatly) described, we also hit some
recomputes when the same controller binds multiple ports quite quickly
consecutively.

So I have been trying to extend the approach used in if-status (i.e. not
write to SBDB if SBDB is read-only, and have the if-status state machine
running through the different interface states, update SBDB when possible,
and change the states).
I have a patch (almost) ready but had some issues with some test cases, and
was still investigating whether those issues were due to my patch....
I hope to post it tomorrow.
We can then discuss whether we need both approaches.

Thanks
Xavier


On Mon, May 9, 2022 at 9:32 PM Mark Michelson <mmichels@redhat.com> wrote:

> Based on the explanation and the code, this seems to solve the problem
> of the spurious recomputes. I'd like confirmation from Xavier that this
> is fixing the issue he was addressing in
>
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
> , and I'd also like confirmation that this approach doesn't incur the
> delays that Dumitru mentioned on Xavier's patch.
>
> With those confirmations, I'll add my ack to this patch.
>
> On 5/3/22 15:37, Han Zhou wrote:
> > During the process of claiming a VIF port on a chassis, there are
> > multiple SB and local OVSDB updates by ovn-controller, which may trigger
> > ovn-controller recompute, depending on the timing between the completion
> > of SB and local OVSDB updates:
> >
> > After port claiming, ovn-controller sends an update to the local OVSDB
> > to clean the Interface:external_ids:ovn-installed, and also sends an
> > update to the SB DB to update the Port_Binding:chassis.  If the local
> > OVSDB update notification comes back before the SB update completes,
> > ovn-controller's SB IDL is still "in transaction" and read-only, but the
> > local Interface update triggers the I-P engine runtime_data node to
> > handle the OVS_interface change, which considers the VIF to be claimed
> > (again) potentially, which requires SB updates while it is not possible
> > at the moment, and so the I-P handler returns failure and the engine
> > aborts, which triggers recompute at the next I-P engine iteration when
> > SB DB is writable.
> >
> > If we are lucky at the above step that the SB update completes before
> > the local OVSDB update is received, ovn-controller would handle the
> > change incrementally (and figures out that the SB Port_Binding chassis
> > is already the current one thus wouldn't reclaim the VIF). But the next
> > step after it completes flow computing for the newly bound VIF it sends
> > another update to the local OVSDB to set the
> > Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
> > same time it also sends an update to the SB DB to set the
> > Port_Binding:up. These two parallel updates would lead to the same
> > situation as the above step and potentially cause ovn-controll
> > recompute.
> >
> > The recompute is very likely to be triggered in production because the
> > SB DB is remote (on central nodes) and has heavier load, which requires
> > longer time to complete the updates than the local OVSDB server.
> >
> > The problem is that the runtime_data's OVS_interface change handler
> > couldn't handle the update of the Interface:external_ids:ovn-installed
> > incrementally, because ovn-controller assumes it could require an SB
> > update that is not possible at the moment. However, the assumption is
> > not true. In fact it shouldn't even care about the change because this
> > external_id:ovn-installed is written by ovn-controller itself and it is
> > write-only (no need to read it as an input for any further processing).
> >
> > So, the idea to fix the problem is to ignore such changes that happen to
> > the "write-only" fields only. OVSDB change tracking provides APIs to
> > check if a specific column is updated, but the difficulty here is that
> > the column external_ids is a k-v map which contains not only the data
> > that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
> > but also the data ovn-controller cares about, such as iface-id,
> > iface-id-ver and encap-ip. To solve this problem, we need to know what's
> > changed exactly inside the external_ids map.
> >
> > This patch solves it by introducing a new I-P engine node
> > ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
> > copy of the older version of the column external_ids, and replace the
> > input OVS_interface of the runtime_data node. In the change handler of
> the
> > runtime_data node we evaluates if the change needs to be handled before
> > calling consider_iface_claim(), so in the above scenarios the changes
> > don't need to be handled and won't trigger recompute any more. A test
> > case is also updated to capture this (which is very likely, if not 100%,
> > to fail without the fix).
> >
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> > v2: Add comments that was missing while committing v1.
> >
> >   controller/binding.c        |  70 ++++++++++++++++++++++++
> >   controller/binding.h        |   1 +
> >   controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
> >   tests/ovn-performance.at    |  17 ++++++
> >   4 files changed, 186 insertions(+), 7 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 7281b0485..0563d9d1e 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
> ovsrec_interface *iface,
> >       return false;
> >   }
> >
> > +static bool
> > +is_ext_id_changed(const struct smap *a, const struct smap *b, const
> char *key)
> > +{
> > +    const char *value_a = smap_get(a, key);
> > +    const char *value_b = smap_get(b, key);
> > +    if ((value_a && !value_b)
> > +        || (!value_a && value_b)
> > +        || (value_a && value_b && strcmp(value_a, value_b))) {
> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> > +/* Check if the change in 'iface_rec' is something we are interested in
> from
> > + * port binding perspective.  Return true if the change needs to be
> handled,
> > + * otherwise return false.
> > + *
> > + * The 'iface_rec' must be change tracked, i.e. iterator from
> > + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
> > +static bool
> > +ovs_interface_change_need_handle(const struct ovsrec_interface
> *iface_rec,
> > +                                 struct shash
> *iface_table_external_ids_old)
> > +{
> > +    if (ovsrec_interface_is_updated(iface_rec,
> > +                                    OVSREC_INTERFACE_COL_NAME)) {
> > +        return true;
> > +    }
> > +    if (ovsrec_interface_is_updated(iface_rec,
> > +                                    OVSREC_INTERFACE_COL_OFPORT)) {
> > +        return true;
> > +    }
> > +    if (ovsrec_interface_is_updated(iface_rec,
> > +                                    OVSREC_INTERFACE_COL_TYPE)) {
> > +        return true;
> > +    }
> > +    if (ovsrec_interface_is_updated(iface_rec,
> > +                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS))
> {
> > +        /* Compare the external_ids that we are interested in with the
> old
> > +         * values:
> > +         * - iface-id
> > +         * - iface-id-ver
> > +         * - encap-ip
> > +         * For any other changes, such as ovn-installed,
> ovn-installed-ts, etc,
> > +         * we don't need to handle. */
> > +        struct smap *external_ids_old =
> > +            shash_find_data(iface_table_external_ids_old,
> iface_rec->name);
> > +        if (!external_ids_old) {
> > +            return true;
> > +        }
> > +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
> > +                              "iface-id")) {
> > +            return true;
> > +        }
> > +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
> > +                              "iface-id-ver")) {
> > +            return true;
> > +        }
> > +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
> > +                              "encap-ip")) {
> > +            return true;
> > +        }
> > +    }
> > +    return false;
> > +}
> > +
> >   /* Returns true if the ovs interface changes were handled successfully,
> >    * false otherwise.
> >    */
> > @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
> binding_ctx_in *b_ctx_in,
> >               continue;
> >           }
> >
> > +        if (!ovs_interface_change_need_handle(
> > +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
> > +            continue;
> > +        }
> > +
> >           const char *iface_id = smap_get(&iface_rec->external_ids,
> "iface-id");
> >           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
> >           if (iface_id && ofport > 0 &&
> > diff --git a/controller/binding.h b/controller/binding.h
> > index 430a8d9b1..e49e1ebb6 100644
> > --- a/controller/binding.h
> > +++ b/controller/binding.h
> > @@ -55,6 +55,7 @@ struct binding_ctx_in {
> >       const struct ovsrec_bridge_table *bridge_table;
> >       const struct ovsrec_open_vswitch_table *ovs_table;
> >       const struct ovsrec_interface_table *iface_table;
> > +    struct shash *iface_table_external_ids_old;
> >   };
> >
> >   /* Locally relevant port bindings, e.g., VIFs that might be bound
> locally,
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 22b7fa935..649353f7d 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
> *node, void *data)
> >       engine_set_node_state(node, EN_UNCHANGED);
> >   }
> >
> > +/* This engine node is to wrap the OVS_interface input and maintain a
> copy of
> > + * the old version of data for the column external_ids.
> > + *
> > + * There are some special considerations of this engine node:
> > + * 1. It has a single input OVS_interface, and it transparently passes
> the
> > + *    input changes as its own output data to its dependants. So there
> is no
> > + *    processing to OVS_interface changes but simply mark the node
> status as
> > + *    UPDATED (and so the run() and the change handler is the same).
> > + * 2. The iface_table_external_ids_old is computed/updated in the member
> > + *    clear_tracked_data(), because that is when the last round of
> processing
> > + *    has completed but the new IDL data is yet to refresh, so we
> replace the
> > + *    old data with the current data. */
> > +struct ed_type_ovs_interface_shadow {
> > +    struct ovsrec_interface_table *iface_table;
> > +    struct shash iface_table_external_ids_old;
> > +};
> > +
> > +static void *
> > +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
> > +                             struct engine_arg *arg OVS_UNUSED)
> > +{
> > +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
> > +    data->iface_table = NULL;
> > +    shash_init(&data->iface_table_external_ids_old);
> > +
> > +    return data;
> > +}
> > +
> > +static void
> > +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
> > +{
> > +    struct shash_node *node;
> > +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
> > +        struct smap *ext_ids = node->data;
> > +        smap_destroy(ext_ids);
> > +    }
> > +    shash_destroy_free_data(table_ext_ids);
> > +}
> > +
> > +static void
> > +en_ovs_interface_shadow_cleanup(void *data_)
> > +{
> > +    struct ed_type_ovs_interface_shadow *data = data_;
> > +
> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> > +}
> > +
> > +static void
> > +en_ovs_interface_shadow_clear_tracked_data(void *data_)
> > +{
> > +    struct ed_type_ovs_interface_shadow *data = data_;
> > +
> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> > +    shash_init(&data->iface_table_external_ids_old);
> > +
> > +    if (!data->iface_table) {
> > +        return;
> > +    }
> > +
> > +    const struct ovsrec_interface *iface_rec;
> > +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
> > +        struct smap *external_ids = xmalloc(sizeof *external_ids);
> > +        smap_clone(external_ids, &iface_rec->external_ids);
> > +        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
> > +                  external_ids);
> > +    }
> > +}
> > +
> > +static void
> > +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
> > +{
> > +    struct ed_type_ovs_interface_shadow *data = data_;
> > +    struct ovsrec_interface_table *iface_table =
> > +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> > +            engine_get_input("OVS_interface", node));
> > +    data->iface_table = iface_table;
> > +    engine_set_node_state(node, EN_UPDATED);
> > +}
> > +
> > +static bool
> > +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
> > +                                           void *data_)
> > +{
> > +    en_ovs_interface_shadow_run(node, data_);
> > +    return true;
> > +}
> > +
> >   struct ed_type_runtime_data {
> >       /* Contains "struct local_datapath" nodes. */
> >       struct hmap local_datapaths;
> > @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
> >           (struct ovsrec_port_table *)EN_OVSDB_GET(
> >               engine_get_input("OVS_port", node));
> >
> > -    struct ovsrec_interface_table *iface_table =
> > -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> > -            engine_get_input("OVS_interface", node));
> > +    struct ed_type_ovs_interface_shadow *iface_shadow =
> > +        engine_get_input_data("ovs_interface_shadow", node);
> >
> >       struct ovsrec_qos_table *qos_table =
> >           (struct ovsrec_qos_table *)EN_OVSDB_GET(
> > @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
> >       b_ctx_in->sbrec_port_binding_by_datapath =
> sbrec_port_binding_by_datapath;
> >       b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
> >       b_ctx_in->port_table = port_table;
> > -    b_ctx_in->iface_table = iface_table;
> > +    b_ctx_in->iface_table = iface_shadow->iface_table;
> > +    b_ctx_in->iface_table_external_ids_old =
> > +        &iface_shadow->iface_table_external_ids_old;
> >       b_ctx_in->qos_table = qos_table;
> >       b_ctx_in->port_binding_table = pb_table;
> >       b_ctx_in->br_int = br_int;
> > @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void
> *data)
> >   }
> >
> >   static bool
> > -runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
> > +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> void *data)
> >   {
> >       struct ed_type_runtime_data *rt_data = data;
> >       struct binding_ctx_in b_ctx_in;
> > @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
> >
> >       /* Define inc-proc-engine nodes. */
> >       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> > +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> > +                                      "ovs_interface_shadow");
> >       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
> >       ENGINE_NODE(non_vif_data, "non_vif_data");
> >       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> > @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
> >       engine_add_input(&en_ct_zones, &en_runtime_data,
> >                        ct_zones_runtime_data_handler);
> >
> > +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> > +                     ovs_interface_shadow_ovs_interface_handler);
> > +
> >       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >
> >       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> > @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
> >        */
> >       engine_add_input(&en_runtime_data, &en_ovs_port,
> >                        engine_noop_handler);
> > -    engine_add_input(&en_runtime_data, &en_ovs_interface,
> > -                     runtime_data_ovs_interface_handler);
> > +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
> > +                     runtime_data_ovs_interface_shadow_handler);
> >
> >       engine_add_input(&en_flow_output, &en_lflow_output,
> >                        flow_output_lflow_output_handler);
> > diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
> > index bc133f93b..9affca498 100644
> > --- a/tests/ovn-performance.at
> > +++ b/tests/ovn-performance.at
> > @@ -366,6 +366,23 @@ for i in 1 2; do
> >            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> >            ovn-nbctl --wait=hv sync]
> >       )
> > +
> > +    # Delete and recreate $lp to make it unbind and rebind multiple
> times, and
> > +    # ensure no recompute is triggered.
> > +    for k in $(seq 10); do
> > +        OVN_CONTROLLER_EXPECT_NO_HIT(
> > +            [hv$i hv$j], [lflow_run],
> > +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=xxxx &&
> > +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
> > +             ovn-nbctl --wait=hv sync]
> > +        )
> > +        OVN_CONTROLLER_EXPECT_NO_HIT(
> > +            [hv$i hv$j], [lflow_run],
> > +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=$lp &&
> > +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> > +             ovn-nbctl --wait=hv sync]
> > +        )
> > +    done
> >   done
> >
> >   for i in 1 2; do
> >
>
>
Han Zhou May 10, 2022, 1:52 a.m. UTC | #3
On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <xsimonar@redhat.com> wrote:
>
> Hi Mark, Han
>
> Thanks for looking into this.
> Han's patch is addressing the issues I initially tried fixing with
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
>
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
is definitely not a good solution (incurs some delay and is more a
workaround than a fix).
>
> I have been trying to follow another approach, which would fix a slightly
broader issue.
> The goal is to try to avoid recomputes when updating Port_Binding in SBDB.
> In addition to the case Han (greatly) described, we also hit some
recomputes when the same controller binds multiple ports quite quickly
consecutively.
>
> So I have been trying to extend the approach used in if-status (i.e. not
write to SBDB if SBDB is read-only, and have the if-status state machine
running through the different interface states, update SBDB when possible,
and change the states).
> I have a patch (almost) ready but had some issues with some test cases,
and was still investigating whether those issues were due to my patch....
> I hope to post it tomorrow.
> We can then discuss whether we need both approaches.

Thanks Xavier. I am interested in the solution you are working on.
I also want to add that my patch here is to provide a generic solution that
filters out the changes we don't care about and quits the I-P engine
processing early. It can be extended easily in the future if we have new
changes to ignore. In many cases this has been achieved by omitting columns
in OVSDB IDL monitoring, but the case here is that we care about part of a
column that is with k-v pairs and OVSDB IDL can monitor/omit the whole
column. So I feel we may need this regardless of the new solution that
addresses the consecutively multiple ports binding problem.
Anyway, we can wait and evaluate when your new patch is posted.

Thanks,
Han

>
> Thanks
> Xavier
>
>
> On Mon, May 9, 2022 at 9:32 PM Mark Michelson <mmichels@redhat.com> wrote:
>>
>> Based on the explanation and the code, this seems to solve the problem
>> of the spurious recomputes. I'd like confirmation from Xavier that this
>> is fixing the issue he was addressing in
>>
https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
>> , and I'd also like confirmation that this approach doesn't incur the
>> delays that Dumitru mentioned on Xavier's patch.
>>
>> With those confirmations, I'll add my ack to this patch.
>>
>> On 5/3/22 15:37, Han Zhou wrote:
>> > During the process of claiming a VIF port on a chassis, there are
>> > multiple SB and local OVSDB updates by ovn-controller, which may
trigger
>> > ovn-controller recompute, depending on the timing between the
completion
>> > of SB and local OVSDB updates:
>> >
>> > After port claiming, ovn-controller sends an update to the local OVSDB
>> > to clean the Interface:external_ids:ovn-installed, and also sends an
>> > update to the SB DB to update the Port_Binding:chassis.  If the local
>> > OVSDB update notification comes back before the SB update completes,
>> > ovn-controller's SB IDL is still "in transaction" and read-only, but
the
>> > local Interface update triggers the I-P engine runtime_data node to
>> > handle the OVS_interface change, which considers the VIF to be claimed
>> > (again) potentially, which requires SB updates while it is not possible
>> > at the moment, and so the I-P handler returns failure and the engine
>> > aborts, which triggers recompute at the next I-P engine iteration when
>> > SB DB is writable.
>> >
>> > If we are lucky at the above step that the SB update completes before
>> > the local OVSDB update is received, ovn-controller would handle the
>> > change incrementally (and figures out that the SB Port_Binding chassis
>> > is already the current one thus wouldn't reclaim the VIF). But the next
>> > step after it completes flow computing for the newly bound VIF it sends
>> > another update to the local OVSDB to set the
>> > Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
>> > same time it also sends an update to the SB DB to set the
>> > Port_Binding:up. These two parallel updates would lead to the same
>> > situation as the above step and potentially cause ovn-controll
>> > recompute.
>> >
>> > The recompute is very likely to be triggered in production because the
>> > SB DB is remote (on central nodes) and has heavier load, which requires
>> > longer time to complete the updates than the local OVSDB server.
>> >
>> > The problem is that the runtime_data's OVS_interface change handler
>> > couldn't handle the update of the Interface:external_ids:ovn-installed
>> > incrementally, because ovn-controller assumes it could require an SB
>> > update that is not possible at the moment. However, the assumption is
>> > not true. In fact it shouldn't even care about the change because this
>> > external_id:ovn-installed is written by ovn-controller itself and it is
>> > write-only (no need to read it as an input for any further processing).
>> >
>> > So, the idea to fix the problem is to ignore such changes that happen
to
>> > the "write-only" fields only. OVSDB change tracking provides APIs to
>> > check if a specific column is updated, but the difficulty here is that
>> > the column external_ids is a k-v map which contains not only the data
>> > that ovn-controller can ignore, such as ovn-installed,
ovn-installed-ts,
>> > but also the data ovn-controller cares about, such as iface-id,
>> > iface-id-ver and encap-ip. To solve this problem, we need to know
what's
>> > changed exactly inside the external_ids map.
>> >
>> > This patch solves it by introducing a new I-P engine node
>> > ovs_interface_shadow to wrap the OVSDB input ovs_interface with an
extra
>> > copy of the older version of the column external_ids, and replace the
>> > input OVS_interface of the runtime_data node. In the change handler of
the
>> > runtime_data node we evaluates if the change needs to be handled before
>> > calling consider_iface_claim(), so in the above scenarios the changes
>> > don't need to be handled and won't trigger recompute any more. A test
>> > case is also updated to capture this (which is very likely, if not
100%,
>> > to fail without the fix).
>> >
>> > Signed-off-by: Han Zhou <hzhou@ovn.org>
>> > ---
>> > v2: Add comments that was missing while committing v1.
>> >
>> >   controller/binding.c        |  70 ++++++++++++++++++++++++
>> >   controller/binding.h        |   1 +
>> >   controller/ovn-controller.c | 105
+++++++++++++++++++++++++++++++++---
>> >   tests/ovn-performance.at    |  17 ++++++
>> >   4 files changed, 186 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/controller/binding.c b/controller/binding.c
>> > index 7281b0485..0563d9d1e 100644
>> > --- a/controller/binding.c
>> > +++ b/controller/binding.c
>> > @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
ovsrec_interface *iface,
>> >       return false;
>> >   }
>> >
>> > +static bool
>> > +is_ext_id_changed(const struct smap *a, const struct smap *b, const
char *key)
>> > +{
>> > +    const char *value_a = smap_get(a, key);
>> > +    const char *value_b = smap_get(b, key);
>> > +    if ((value_a && !value_b)
>> > +        || (!value_a && value_b)
>> > +        || (value_a && value_b && strcmp(value_a, value_b))) {
>> > +        return true;
>> > +    }
>> > +    return false;
>> > +}
>> > +
>> > +/* Check if the change in 'iface_rec' is something we are interested
in from
>> > + * port binding perspective.  Return true if the change needs to be
handled,
>> > + * otherwise return false.
>> > + *
>> > + * The 'iface_rec' must be change tracked, i.e. iterator from
>> > + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
>> > +static bool
>> > +ovs_interface_change_need_handle(const struct ovsrec_interface
*iface_rec,
>> > +                                 struct shash
*iface_table_external_ids_old)
>> > +{
>> > +    if (ovsrec_interface_is_updated(iface_rec,
>> > +                                    OVSREC_INTERFACE_COL_NAME)) {
>> > +        return true;
>> > +    }
>> > +    if (ovsrec_interface_is_updated(iface_rec,
>> > +                                    OVSREC_INTERFACE_COL_OFPORT)) {
>> > +        return true;
>> > +    }
>> > +    if (ovsrec_interface_is_updated(iface_rec,
>> > +                                    OVSREC_INTERFACE_COL_TYPE)) {
>> > +        return true;
>> > +    }
>> > +    if (ovsrec_interface_is_updated(iface_rec,
>> > +
 OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
>> > +        /* Compare the external_ids that we are interested in with
the old
>> > +         * values:
>> > +         * - iface-id
>> > +         * - iface-id-ver
>> > +         * - encap-ip
>> > +         * For any other changes, such as ovn-installed,
ovn-installed-ts, etc,
>> > +         * we don't need to handle. */
>> > +        struct smap *external_ids_old =
>> > +            shash_find_data(iface_table_external_ids_old,
iface_rec->name);
>> > +        if (!external_ids_old) {
>> > +            return true;
>> > +        }
>> > +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
>> > +                              "iface-id")) {
>> > +            return true;
>> > +        }
>> > +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
>> > +                              "iface-id-ver")) {
>> > +            return true;
>> > +        }
>> > +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
>> > +                              "encap-ip")) {
>> > +            return true;
>> > +        }
>> > +    }
>> > +    return false;
>> > +}
>> > +
>> >   /* Returns true if the ovs interface changes were handled
successfully,
>> >    * false otherwise.
>> >    */
>> > @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
binding_ctx_in *b_ctx_in,
>> >               continue;
>> >           }
>> >
>> > +        if (!ovs_interface_change_need_handle(
>> > +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
>> > +            continue;
>> > +        }
>> > +
>> >           const char *iface_id = smap_get(&iface_rec->external_ids,
"iface-id");
>> >           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport :
0;
>> >           if (iface_id && ofport > 0 &&
>> > diff --git a/controller/binding.h b/controller/binding.h
>> > index 430a8d9b1..e49e1ebb6 100644
>> > --- a/controller/binding.h
>> > +++ b/controller/binding.h
>> > @@ -55,6 +55,7 @@ struct binding_ctx_in {
>> >       const struct ovsrec_bridge_table *bridge_table;
>> >       const struct ovsrec_open_vswitch_table *ovs_table;
>> >       const struct ovsrec_interface_table *iface_table;
>> > +    struct shash *iface_table_external_ids_old;
>> >   };
>> >
>> >   /* Locally relevant port bindings, e.g., VIFs that might be bound
locally,
>> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> > index 22b7fa935..649353f7d 100644
>> > --- a/controller/ovn-controller.c
>> > +++ b/controller/ovn-controller.c
>> > @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
*node, void *data)
>> >       engine_set_node_state(node, EN_UNCHANGED);
>> >   }
>> >
>> > +/* This engine node is to wrap the OVS_interface input and maintain a
copy of
>> > + * the old version of data for the column external_ids.
>> > + *
>> > + * There are some special considerations of this engine node:
>> > + * 1. It has a single input OVS_interface, and it transparently
passes the
>> > + *    input changes as its own output data to its dependants. So
there is no
>> > + *    processing to OVS_interface changes but simply mark the node
status as
>> > + *    UPDATED (and so the run() and the change handler is the same).
>> > + * 2. The iface_table_external_ids_old is computed/updated in the
member
>> > + *    clear_tracked_data(), because that is when the last round of
processing
>> > + *    has completed but the new IDL data is yet to refresh, so we
replace the
>> > + *    old data with the current data. */
>> > +struct ed_type_ovs_interface_shadow {
>> > +    struct ovsrec_interface_table *iface_table;
>> > +    struct shash iface_table_external_ids_old;
>> > +};
>> > +
>> > +static void *
>> > +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
>> > +                             struct engine_arg *arg OVS_UNUSED)
>> > +{
>> > +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
>> > +    data->iface_table = NULL;
>> > +    shash_init(&data->iface_table_external_ids_old);
>> > +
>> > +    return data;
>> > +}
>> > +
>> > +static void
>> > +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
>> > +{
>> > +    struct shash_node *node;
>> > +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
>> > +        struct smap *ext_ids = node->data;
>> > +        smap_destroy(ext_ids);
>> > +    }
>> > +    shash_destroy_free_data(table_ext_ids);
>> > +}
>> > +
>> > +static void
>> > +en_ovs_interface_shadow_cleanup(void *data_)
>> > +{
>> > +    struct ed_type_ovs_interface_shadow *data = data_;
>> > +
 iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> > +}
>> > +
>> > +static void
>> > +en_ovs_interface_shadow_clear_tracked_data(void *data_)
>> > +{
>> > +    struct ed_type_ovs_interface_shadow *data = data_;
>> > +
 iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> > +    shash_init(&data->iface_table_external_ids_old);
>> > +
>> > +    if (!data->iface_table) {
>> > +        return;
>> > +    }
>> > +
>> > +    const struct ovsrec_interface *iface_rec;
>> > +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
>> > +        struct smap *external_ids = xmalloc(sizeof *external_ids);
>> > +        smap_clone(external_ids, &iface_rec->external_ids);
>> > +        shash_add(&data->iface_table_external_ids_old,
iface_rec->name,
>> > +                  external_ids);
>> > +    }
>> > +}
>> > +
>> > +static void
>> > +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
>> > +{
>> > +    struct ed_type_ovs_interface_shadow *data = data_;
>> > +    struct ovsrec_interface_table *iface_table =
>> > +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> > +            engine_get_input("OVS_interface", node));
>> > +    data->iface_table = iface_table;
>> > +    engine_set_node_state(node, EN_UPDATED);
>> > +}
>> > +
>> > +static bool
>> > +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
>> > +                                           void *data_)
>> > +{
>> > +    en_ovs_interface_shadow_run(node, data_);
>> > +    return true;
>> > +}
>> > +
>> >   struct ed_type_runtime_data {
>> >       /* Contains "struct local_datapath" nodes. */
>> >       struct hmap local_datapaths;
>> > @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>> >           (struct ovsrec_port_table *)EN_OVSDB_GET(
>> >               engine_get_input("OVS_port", node));
>> >
>> > -    struct ovsrec_interface_table *iface_table =
>> > -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> > -            engine_get_input("OVS_interface", node));
>> > +    struct ed_type_ovs_interface_shadow *iface_shadow =
>> > +        engine_get_input_data("ovs_interface_shadow", node);
>> >
>> >       struct ovsrec_qos_table *qos_table =
>> >           (struct ovsrec_qos_table *)EN_OVSDB_GET(
>> > @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>> >       b_ctx_in->sbrec_port_binding_by_datapath =
sbrec_port_binding_by_datapath;
>> >       b_ctx_in->sbrec_port_binding_by_name =
sbrec_port_binding_by_name;
>> >       b_ctx_in->port_table = port_table;
>> > -    b_ctx_in->iface_table = iface_table;
>> > +    b_ctx_in->iface_table = iface_shadow->iface_table;
>> > +    b_ctx_in->iface_table_external_ids_old =
>> > +        &iface_shadow->iface_table_external_ids_old;
>> >       b_ctx_in->qos_table = qos_table;
>> >       b_ctx_in->port_binding_table = pb_table;
>> >       b_ctx_in->br_int = br_int;
>> > @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node,
void *data)
>> >   }
>> >
>> >   static bool
>> > -runtime_data_ovs_interface_handler(struct engine_node *node, void
*data)
>> > +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
void *data)
>> >   {
>> >       struct ed_type_runtime_data *rt_data = data;
>> >       struct binding_ctx_in b_ctx_in;
>> > @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>> >
>> >       /* Define inc-proc-engine nodes. */
>> >       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>> > +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>> > +                                      "ovs_interface_shadow");
>> >       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>> >       ENGINE_NODE(non_vif_data, "non_vif_data");
>> >       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
>> > @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>> >       engine_add_input(&en_ct_zones, &en_runtime_data,
>> >                        ct_zones_runtime_data_handler);
>> >
>> > +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>> > +                     ovs_interface_shadow_ovs_interface_handler);
>> > +
>> >       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
NULL);
>> >
>> >       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>> > @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>> >        */
>> >       engine_add_input(&en_runtime_data, &en_ovs_port,
>> >                        engine_noop_handler);
>> > -    engine_add_input(&en_runtime_data, &en_ovs_interface,
>> > -                     runtime_data_ovs_interface_handler);
>> > +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
>> > +                     runtime_data_ovs_interface_shadow_handler);
>> >
>> >       engine_add_input(&en_flow_output, &en_lflow_output,
>> >                        flow_output_lflow_output_handler);
>> > diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>> > index bc133f93b..9affca498 100644
>> > --- a/tests/ovn-performance.at
>> > +++ b/tests/ovn-performance.at
>> > @@ -366,6 +366,23 @@ for i in 1 2; do
>> >            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>> >            ovn-nbctl --wait=hv sync]
>> >       )
>> > +
>> > +    # Delete and recreate $lp to make it unbind and rebind multiple
times, and
>> > +    # ensure no recompute is triggered.
>> > +    for k in $(seq 10); do
>> > +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> > +            [hv$i hv$j], [lflow_run],
>> > +            [as hv$i ovs-vsctl set Interface $vif
external-ids:iface-id=xxxx &&
>> > +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false'
&&
>> > +             ovn-nbctl --wait=hv sync]
>> > +        )
>> > +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> > +            [hv$i hv$j], [lflow_run],
>> > +            [as hv$i ovs-vsctl set Interface $vif
external-ids:iface-id=$lp &&
>> > +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>> > +             ovn-nbctl --wait=hv sync]
>> > +        )
>> > +    done
>> >   done
>> >
>> >   for i in 1 2; do
>> >
>>
Dumitru Ceara May 12, 2022, 8:58 a.m. UTC | #4
On 5/3/22 21:37, Han Zhou wrote:
> During the process of claiming a VIF port on a chassis, there are
> multiple SB and local OVSDB updates by ovn-controller, which may trigger
> ovn-controller recompute, depending on the timing between the completion
> of SB and local OVSDB updates:
> 
> After port claiming, ovn-controller sends an update to the local OVSDB
> to clean the Interface:external_ids:ovn-installed, and also sends an
> update to the SB DB to update the Port_Binding:chassis.  If the local
> OVSDB update notification comes back before the SB update completes,
> ovn-controller's SB IDL is still "in transaction" and read-only, but the
> local Interface update triggers the I-P engine runtime_data node to
> handle the OVS_interface change, which considers the VIF to be claimed
> (again) potentially, which requires SB updates while it is not possible
> at the moment, and so the I-P handler returns failure and the engine
> aborts, which triggers recompute at the next I-P engine iteration when
> SB DB is writable.
> 
> If we are lucky at the above step that the SB update completes before
> the local OVSDB update is received, ovn-controller would handle the
> change incrementally (and figures out that the SB Port_Binding chassis
> is already the current one thus wouldn't reclaim the VIF). But the next
> step after it completes flow computing for the newly bound VIF it sends
> another update to the local OVSDB to set the
> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
> same time it also sends an update to the SB DB to set the
> Port_Binding:up. These two parallel updates would lead to the same
> situation as the above step and potentially cause ovn-controll
> recompute.
> 
> The recompute is very likely to be triggered in production because the
> SB DB is remote (on central nodes) and has heavier load, which requires
> longer time to complete the updates than the local OVSDB server.
> 
> The problem is that the runtime_data's OVS_interface change handler
> couldn't handle the update of the Interface:external_ids:ovn-installed
> incrementally, because ovn-controller assumes it could require an SB
> update that is not possible at the moment. However, the assumption is
> not true. In fact it shouldn't even care about the change because this
> external_id:ovn-installed is written by ovn-controller itself and it is
> write-only (no need to read it as an input for any further processing).
> 
> So, the idea to fix the problem is to ignore such changes that happen to
> the "write-only" fields only. OVSDB change tracking provides APIs to
> check if a specific column is updated, but the difficulty here is that
> the column external_ids is a k-v map which contains not only the data
> that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
> but also the data ovn-controller cares about, such as iface-id,
> iface-id-ver and encap-ip. To solve this problem, we need to know what's
> changed exactly inside the external_ids map.
> 
> This patch solves it by introducing a new I-P engine node
> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
> copy of the older version of the column external_ids, and replace the
> input OVS_interface of the runtime_data node. In the change handler of the
> runtime_data node we evaluates if the change needs to be handled before
> calling consider_iface_claim(), so in the above scenarios the changes
> don't need to be handled and won't trigger recompute any more. A test
> case is also updated to capture this (which is very likely, if not 100%,
> to fail without the fix).
> 
> Signed-off-by: Han Zhou <hzhou@ovn.org>

Hi Han,

Thanks for the fix!

I would add:
Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.")

I have another tiny comment below.  I'm currently rerunning the
ovn-kuberenetes tests with your patch applied [0] because the run done
by ovsrobot failed to bring up a cluster [1] and I want to make sure
we're not breaking anything.

Thanks,
Dumitru

[0] https://github.com/dceara/ovn/actions/runs/2312210800
[1] https://github.com/ovsrobot/ovn/runs/6280999584?check_suite_focus=true#step:9:725

> ---
> v2: Add comments that was missing while committing v1.
> 
>  controller/binding.c        |  70 ++++++++++++++++++++++++
>  controller/binding.h        |   1 +
>  controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
>  tests/ovn-performance.at    |  17 ++++++
>  4 files changed, 186 insertions(+), 7 deletions(-)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 7281b0485..0563d9d1e 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct ovsrec_interface *iface,
>      return false;
>  }
>  
> +static bool
> +is_ext_id_changed(const struct smap *a, const struct smap *b, const char *key)
> +{
> +    const char *value_a = smap_get(a, key);
> +    const char *value_b = smap_get(b, key);
> +    if ((value_a && !value_b)
> +        || (!value_a && value_b)
> +        || (value_a && value_b && strcmp(value_a, value_b))) {
> +        return true;
> +    }
> +    return false;
> +}
> +
> +/* Check if the change in 'iface_rec' is something we are interested in from
> + * port binding perspective.  Return true if the change needs to be handled,
> + * otherwise return false.
> + *
> + * The 'iface_rec' must be change tracked, i.e. iterator from
> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
> +static bool
> +ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec,
> +                                 struct shash *iface_table_external_ids_old)
> +{
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_NAME)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_TYPE)) {
> +        return true;
> +    }
> +    if (ovsrec_interface_is_updated(iface_rec,
> +                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
> +        /* Compare the external_ids that we are interested in with the old
> +         * values:
> +         * - iface-id
> +         * - iface-id-ver
> +         * - encap-ip
> +         * For any other changes, such as ovn-installed, ovn-installed-ts, etc,
> +         * we don't need to handle. */
> +        struct smap *external_ids_old =
> +            shash_find_data(iface_table_external_ids_old, iface_rec->name);
> +        if (!external_ids_old) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "iface-id")) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "iface-id-ver")) {
> +            return true;
> +        }
> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
> +                              "encap-ip")) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
>  /* Returns true if the ovs interface changes were handled successfully,
>   * false otherwise.
>   */
> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct binding_ctx_in *b_ctx_in,
>              continue;
>          }
>  
> +        if (!ovs_interface_change_need_handle(
> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
> +            continue;
> +        }
> +
>          const char *iface_id = smap_get(&iface_rec->external_ids, "iface-id");
>          int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
>          if (iface_id && ofport > 0 &&
> diff --git a/controller/binding.h b/controller/binding.h
> index 430a8d9b1..e49e1ebb6 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>      const struct ovsrec_bridge_table *bridge_table;
>      const struct ovsrec_open_vswitch_table *ovs_table;
>      const struct ovsrec_interface_table *iface_table;
> +    struct shash *iface_table_external_ids_old;
>  };
>  
>  /* Locally relevant port bindings, e.g., VIFs that might be bound locally,
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 22b7fa935..649353f7d 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
>      engine_set_node_state(node, EN_UNCHANGED);
>  }
>  
> +/* This engine node is to wrap the OVS_interface input and maintain a copy of
> + * the old version of data for the column external_ids.
> + *
> + * There are some special considerations of this engine node:
> + * 1. It has a single input OVS_interface, and it transparently passes the
> + *    input changes as its own output data to its dependants. So there is no
> + *    processing to OVS_interface changes but simply mark the node status as
> + *    UPDATED (and so the run() and the change handler is the same).
> + * 2. The iface_table_external_ids_old is computed/updated in the member
> + *    clear_tracked_data(), because that is when the last round of processing
> + *    has completed but the new IDL data is yet to refresh, so we replace the
> + *    old data with the current data. */
> +struct ed_type_ovs_interface_shadow {
> +    struct ovsrec_interface_table *iface_table;
> +    struct shash iface_table_external_ids_old;
> +};
> +
> +static void *
> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
> +                             struct engine_arg *arg OVS_UNUSED)
> +{
> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
> +    data->iface_table = NULL;
> +    shash_init(&data->iface_table_external_ids_old);
> +
> +    return data;
> +}
> +
> +static void
> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
> +{
> +    struct shash_node *node;
> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
> +        struct smap *ext_ids = node->data;
> +        smap_destroy(ext_ids);

Nit: no need for SHASH_FOR_EACH_SAFE as far as I can tell.


> +    }
> +    shash_destroy_free_data(table_ext_ids);
> +}
> +
> +static void
> +en_ovs_interface_shadow_cleanup(void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> +}
> +
> +static void
> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> +    shash_init(&data->iface_table_external_ids_old);
> +
> +    if (!data->iface_table) {
> +        return;
> +    }
> +
> +    const struct ovsrec_interface *iface_rec;
> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
> +        smap_clone(external_ids, &iface_rec->external_ids);
> +        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
> +                  external_ids);
> +    }
> +}
> +
> +static void
> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
> +{
> +    struct ed_type_ovs_interface_shadow *data = data_;
> +    struct ovsrec_interface_table *iface_table =
> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> +            engine_get_input("OVS_interface", node));
> +    data->iface_table = iface_table;
> +    engine_set_node_state(node, EN_UPDATED);
> +}
> +
> +static bool
> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
> +                                           void *data_)
> +{
> +    en_ovs_interface_shadow_run(node, data_);
> +    return true;
> +}
> +
>  struct ed_type_runtime_data {
>      /* Contains "struct local_datapath" nodes. */
>      struct hmap local_datapaths;
> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>          (struct ovsrec_port_table *)EN_OVSDB_GET(
>              engine_get_input("OVS_port", node));
>  
> -    struct ovsrec_interface_table *iface_table =
> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> -            engine_get_input("OVS_interface", node));
> +    struct ed_type_ovs_interface_shadow *iface_shadow =
> +        engine_get_input_data("ovs_interface_shadow", node);
>  
>      struct ovsrec_qos_table *qos_table =
>          (struct ovsrec_qos_table *)EN_OVSDB_GET(
> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>      b_ctx_in->sbrec_port_binding_by_datapath = sbrec_port_binding_by_datapath;
>      b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
>      b_ctx_in->port_table = port_table;
> -    b_ctx_in->iface_table = iface_table;
> +    b_ctx_in->iface_table = iface_shadow->iface_table;
> +    b_ctx_in->iface_table_external_ids_old =
> +        &iface_shadow->iface_table_external_ids_old;
>      b_ctx_in->qos_table = qos_table;
>      b_ctx_in->port_binding_table = pb_table;
>      b_ctx_in->br_int = br_int;
> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
>  }
>  
>  static bool
> -runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
>  {
>      struct ed_type_runtime_data *rt_data = data;
>      struct binding_ctx_in b_ctx_in;
> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>  
>      /* Define inc-proc-engine nodes. */
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> +                                      "ovs_interface_shadow");
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>      ENGINE_NODE(non_vif_data, "non_vif_data");
>      ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>      engine_add_input(&en_ct_zones, &en_runtime_data,
>                       ct_zones_runtime_data_handler);
>  
> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> +                     ovs_interface_shadow_ovs_interface_handler);
> +
>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>  
>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>       */
>      engine_add_input(&en_runtime_data, &en_ovs_port,
>                       engine_noop_handler);
> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
> -                     runtime_data_ovs_interface_handler);
> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
> +                     runtime_data_ovs_interface_shadow_handler);
>  
>      engine_add_input(&en_flow_output, &en_lflow_output,
>                       flow_output_lflow_output_handler);
> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
> index bc133f93b..9affca498 100644
> --- a/tests/ovn-performance.at
> +++ b/tests/ovn-performance.at
> @@ -366,6 +366,23 @@ for i in 1 2; do
>           ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>           ovn-nbctl --wait=hv sync]
>      )
> +
> +    # Delete and recreate $lp to make it unbind and rebind multiple times, and
> +    # ensure no recompute is triggered.
> +    for k in $(seq 10); do
> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> +            [hv$i hv$j], [lflow_run],
> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=xxxx &&
> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
> +             ovn-nbctl --wait=hv sync]
> +        )
> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> +            [hv$i hv$j], [lflow_run],
> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp &&
> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> +             ovn-nbctl --wait=hv sync]
> +        )
> +    done
>  done
>  
>  for i in 1 2; do
Dumitru Ceara May 12, 2022, 9 a.m. UTC | #5
Hi Han, Mark, Xavier,

On 5/10/22 03:52, Han Zhou wrote:
> On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <xsimonar@redhat.com> wrote:
>>
>> Hi Mark, Han
>>
>> Thanks for looking into this.
>> Han's patch is addressing the issues I initially tried fixing with
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
>>
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
> is definitely not a good solution (incurs some delay and is more a
> workaround than a fix).
>>
>> I have been trying to follow another approach, which would fix a slightly
> broader issue.
>> The goal is to try to avoid recomputes when updating Port_Binding in SBDB.
>> In addition to the case Han (greatly) described, we also hit some
> recomputes when the same controller binds multiple ports quite quickly
> consecutively.
>>
>> So I have been trying to extend the approach used in if-status (i.e. not
> write to SBDB if SBDB is read-only, and have the if-status state machine
> running through the different interface states, update SBDB when possible,
> and change the states).
>> I have a patch (almost) ready but had some issues with some test cases,
> and was still investigating whether those issues were due to my patch....
>> I hope to post it tomorrow.
>> We can then discuss whether we need both approaches.
> 
> Thanks Xavier. I am interested in the solution you are working on.
> I also want to add that my patch here is to provide a generic solution that
> filters out the changes we don't care about and quits the I-P engine
> processing early. It can be extended easily in the future if we have new
> changes to ignore. In many cases this has been achieved by omitting columns
> in OVSDB IDL monitoring, but the case here is that we care about part of a
> column that is with k-v pairs and OVSDB IDL can monitor/omit the whole
> column. So I feel we may need this regardless of the new solution that
> addresses the consecutively multiple ports binding problem.

I think so too.  If I understand correctly we need both changes.

> Anyway, we can wait and evaluate when your new patch is posted.
> 

+1

Thanks,
Dumitru

> Thanks,
> Han
> 
>>
>> Thanks
>> Xavier
>>
>>
>> On Mon, May 9, 2022 at 9:32 PM Mark Michelson <mmichels@redhat.com> wrote:
>>>
>>> Based on the explanation and the code, this seems to solve the problem
>>> of the spurious recomputes. I'd like confirmation from Xavier that this
>>> is fixing the issue he was addressing in
>>>
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
>>> , and I'd also like confirmation that this approach doesn't incur the
>>> delays that Dumitru mentioned on Xavier's patch.
>>>
>>> With those confirmations, I'll add my ack to this patch.
>>>
>>> On 5/3/22 15:37, Han Zhou wrote:
>>>> During the process of claiming a VIF port on a chassis, there are
>>>> multiple SB and local OVSDB updates by ovn-controller, which may
> trigger
>>>> ovn-controller recompute, depending on the timing between the
> completion
>>>> of SB and local OVSDB updates:
>>>>
>>>> After port claiming, ovn-controller sends an update to the local OVSDB
>>>> to clean the Interface:external_ids:ovn-installed, and also sends an
>>>> update to the SB DB to update the Port_Binding:chassis.  If the local
>>>> OVSDB update notification comes back before the SB update completes,
>>>> ovn-controller's SB IDL is still "in transaction" and read-only, but
> the
>>>> local Interface update triggers the I-P engine runtime_data node to
>>>> handle the OVS_interface change, which considers the VIF to be claimed
>>>> (again) potentially, which requires SB updates while it is not possible
>>>> at the moment, and so the I-P handler returns failure and the engine
>>>> aborts, which triggers recompute at the next I-P engine iteration when
>>>> SB DB is writable.
>>>>
>>>> If we are lucky at the above step that the SB update completes before
>>>> the local OVSDB update is received, ovn-controller would handle the
>>>> change incrementally (and figures out that the SB Port_Binding chassis
>>>> is already the current one thus wouldn't reclaim the VIF). But the next
>>>> step after it completes flow computing for the newly bound VIF it sends
>>>> another update to the local OVSDB to set the
>>>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
>>>> same time it also sends an update to the SB DB to set the
>>>> Port_Binding:up. These two parallel updates would lead to the same
>>>> situation as the above step and potentially cause ovn-controll
>>>> recompute.
>>>>
>>>> The recompute is very likely to be triggered in production because the
>>>> SB DB is remote (on central nodes) and has heavier load, which requires
>>>> longer time to complete the updates than the local OVSDB server.
>>>>
>>>> The problem is that the runtime_data's OVS_interface change handler
>>>> couldn't handle the update of the Interface:external_ids:ovn-installed
>>>> incrementally, because ovn-controller assumes it could require an SB
>>>> update that is not possible at the moment. However, the assumption is
>>>> not true. In fact it shouldn't even care about the change because this
>>>> external_id:ovn-installed is written by ovn-controller itself and it is
>>>> write-only (no need to read it as an input for any further processing).
>>>>
>>>> So, the idea to fix the problem is to ignore such changes that happen
> to
>>>> the "write-only" fields only. OVSDB change tracking provides APIs to
>>>> check if a specific column is updated, but the difficulty here is that
>>>> the column external_ids is a k-v map which contains not only the data
>>>> that ovn-controller can ignore, such as ovn-installed,
> ovn-installed-ts,
>>>> but also the data ovn-controller cares about, such as iface-id,
>>>> iface-id-ver and encap-ip. To solve this problem, we need to know
> what's
>>>> changed exactly inside the external_ids map.
>>>>
>>>> This patch solves it by introducing a new I-P engine node
>>>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an
> extra
>>>> copy of the older version of the column external_ids, and replace the
>>>> input OVS_interface of the runtime_data node. In the change handler of
> the
>>>> runtime_data node we evaluates if the change needs to be handled before
>>>> calling consider_iface_claim(), so in the above scenarios the changes
>>>> don't need to be handled and won't trigger recompute any more. A test
>>>> case is also updated to capture this (which is very likely, if not
> 100%,
>>>> to fail without the fix).
>>>>
>>>> Signed-off-by: Han Zhou <hzhou@ovn.org>
>>>> ---
>>>> v2: Add comments that was missing while committing v1.
>>>>
>>>>   controller/binding.c        |  70 ++++++++++++++++++++++++
>>>>   controller/binding.h        |   1 +
>>>>   controller/ovn-controller.c | 105
> +++++++++++++++++++++++++++++++++---
>>>>   tests/ovn-performance.at    |  17 ++++++
>>>>   4 files changed, 186 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>> index 7281b0485..0563d9d1e 100644
>>>> --- a/controller/binding.c
>>>> +++ b/controller/binding.c
>>>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
> ovsrec_interface *iface,
>>>>       return false;
>>>>   }
>>>>
>>>> +static bool
>>>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const
> char *key)
>>>> +{
>>>> +    const char *value_a = smap_get(a, key);
>>>> +    const char *value_b = smap_get(b, key);
>>>> +    if ((value_a && !value_b)
>>>> +        || (!value_a && value_b)
>>>> +        || (value_a && value_b && strcmp(value_a, value_b))) {
>>>> +        return true;
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>> +/* Check if the change in 'iface_rec' is something we are interested
> in from
>>>> + * port binding perspective.  Return true if the change needs to be
> handled,
>>>> + * otherwise return false.
>>>> + *
>>>> + * The 'iface_rec' must be change tracked, i.e. iterator from
>>>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
>>>> +static bool
>>>> +ovs_interface_change_need_handle(const struct ovsrec_interface
> *iface_rec,
>>>> +                                 struct shash
> *iface_table_external_ids_old)
>>>> +{
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_NAME)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +                                    OVSREC_INTERFACE_COL_TYPE)) {
>>>> +        return true;
>>>> +    }
>>>> +    if (ovsrec_interface_is_updated(iface_rec,
>>>> +
>  OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
>>>> +        /* Compare the external_ids that we are interested in with
> the old
>>>> +         * values:
>>>> +         * - iface-id
>>>> +         * - iface-id-ver
>>>> +         * - encap-ip
>>>> +         * For any other changes, such as ovn-installed,
> ovn-installed-ts, etc,
>>>> +         * we don't need to handle. */
>>>> +        struct smap *external_ids_old =
>>>> +            shash_find_data(iface_table_external_ids_old,
> iface_rec->name);
>>>> +        if (!external_ids_old) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "iface-id")) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "iface-id-ver")) {
>>>> +            return true;
>>>> +        }
>>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> external_ids_old,
>>>> +                              "encap-ip")) {
>>>> +            return true;
>>>> +        }
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>>   /* Returns true if the ovs interface changes were handled
> successfully,
>>>>    * false otherwise.
>>>>    */
>>>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
> binding_ctx_in *b_ctx_in,
>>>>               continue;
>>>>           }
>>>>
>>>> +        if (!ovs_interface_change_need_handle(
>>>> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>           const char *iface_id = smap_get(&iface_rec->external_ids,
> "iface-id");
>>>>           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport :
> 0;
>>>>           if (iface_id && ofport > 0 &&
>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>> index 430a8d9b1..e49e1ebb6 100644
>>>> --- a/controller/binding.h
>>>> +++ b/controller/binding.h
>>>> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>>>>       const struct ovsrec_bridge_table *bridge_table;
>>>>       const struct ovsrec_open_vswitch_table *ovs_table;
>>>>       const struct ovsrec_interface_table *iface_table;
>>>> +    struct shash *iface_table_external_ids_old;
>>>>   };
>>>>
>>>>   /* Locally relevant port bindings, e.g., VIFs that might be bound
> locally,
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index 22b7fa935..649353f7d 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
> *node, void *data)
>>>>       engine_set_node_state(node, EN_UNCHANGED);
>>>>   }
>>>>
>>>> +/* This engine node is to wrap the OVS_interface input and maintain a
> copy of
>>>> + * the old version of data for the column external_ids.
>>>> + *
>>>> + * There are some special considerations of this engine node:
>>>> + * 1. It has a single input OVS_interface, and it transparently
> passes the
>>>> + *    input changes as its own output data to its dependants. So
> there is no
>>>> + *    processing to OVS_interface changes but simply mark the node
> status as
>>>> + *    UPDATED (and so the run() and the change handler is the same).
>>>> + * 2. The iface_table_external_ids_old is computed/updated in the
> member
>>>> + *    clear_tracked_data(), because that is when the last round of
> processing
>>>> + *    has completed but the new IDL data is yet to refresh, so we
> replace the
>>>> + *    old data with the current data. */
>>>> +struct ed_type_ovs_interface_shadow {
>>>> +    struct ovsrec_interface_table *iface_table;
>>>> +    struct shash iface_table_external_ids_old;
>>>> +};
>>>> +
>>>> +static void *
>>>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
>>>> +                             struct engine_arg *arg OVS_UNUSED)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
>>>> +    data->iface_table = NULL;
>>>> +    shash_init(&data->iface_table_external_ids_old);
>>>> +
>>>> +    return data;
>>>> +}
>>>> +
>>>> +static void
>>>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
>>>> +{
>>>> +    struct shash_node *node;
>>>> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
>>>> +        struct smap *ext_ids = node->data;
>>>> +        smap_destroy(ext_ids);
>>>> +    }
>>>> +    shash_destroy_free_data(table_ext_ids);
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_cleanup(void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +
>  iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +
>  iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>>>> +    shash_init(&data->iface_table_external_ids_old);
>>>> +
>>>> +    if (!data->iface_table) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    const struct ovsrec_interface *iface_rec;
>>>> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
>>>> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
>>>> +        smap_clone(external_ids, &iface_rec->external_ids);
>>>> +        shash_add(&data->iface_table_external_ids_old,
> iface_rec->name,
>>>> +                  external_ids);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void
>>>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
>>>> +{
>>>> +    struct ed_type_ovs_interface_shadow *data = data_;
>>>> +    struct ovsrec_interface_table *iface_table =
>>>> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>>>> +            engine_get_input("OVS_interface", node));
>>>> +    data->iface_table = iface_table;
>>>> +    engine_set_node_state(node, EN_UPDATED);
>>>> +}
>>>> +
>>>> +static bool
>>>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
>>>> +                                           void *data_)
>>>> +{
>>>> +    en_ovs_interface_shadow_run(node, data_);
>>>> +    return true;
>>>> +}
>>>> +
>>>>   struct ed_type_runtime_data {
>>>>       /* Contains "struct local_datapath" nodes. */
>>>>       struct hmap local_datapaths;
>>>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>>>>           (struct ovsrec_port_table *)EN_OVSDB_GET(
>>>>               engine_get_input("OVS_port", node));
>>>>
>>>> -    struct ovsrec_interface_table *iface_table =
>>>> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>>>> -            engine_get_input("OVS_interface", node));
>>>> +    struct ed_type_ovs_interface_shadow *iface_shadow =
>>>> +        engine_get_input_data("ovs_interface_shadow", node);
>>>>
>>>>       struct ovsrec_qos_table *qos_table =
>>>>           (struct ovsrec_qos_table *)EN_OVSDB_GET(
>>>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>>>>       b_ctx_in->sbrec_port_binding_by_datapath =
> sbrec_port_binding_by_datapath;
>>>>       b_ctx_in->sbrec_port_binding_by_name =
> sbrec_port_binding_by_name;
>>>>       b_ctx_in->port_table = port_table;
>>>> -    b_ctx_in->iface_table = iface_table;
>>>> +    b_ctx_in->iface_table = iface_shadow->iface_table;
>>>> +    b_ctx_in->iface_table_external_ids_old =
>>>> +        &iface_shadow->iface_table_external_ids_old;
>>>>       b_ctx_in->qos_table = qos_table;
>>>>       b_ctx_in->port_binding_table = pb_table;
>>>>       b_ctx_in->br_int = br_int;
>>>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node,
> void *data)
>>>>   }
>>>>
>>>>   static bool
>>>> -runtime_data_ovs_interface_handler(struct engine_node *node, void
> *data)
>>>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> void *data)
>>>>   {
>>>>       struct ed_type_runtime_data *rt_data = data;
>>>>       struct binding_ctx_in b_ctx_in;
>>>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>>>>
>>>>       /* Define inc-proc-engine nodes. */
>>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>>>> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>> +                                      "ovs_interface_shadow");
>>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>>>>       ENGINE_NODE(non_vif_data, "non_vif_data");
>>>>       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
>>>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>>>>       engine_add_input(&en_ct_zones, &en_runtime_data,
>>>>                        ct_zones_runtime_data_handler);
>>>>
>>>> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>> +                     ovs_interface_shadow_ovs_interface_handler);
>>>> +
>>>>       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> NULL);
>>>>
>>>>       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>>>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>>>>        */
>>>>       engine_add_input(&en_runtime_data, &en_ovs_port,
>>>>                        engine_noop_handler);
>>>> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
>>>> -                     runtime_data_ovs_interface_handler);
>>>> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
>>>> +                     runtime_data_ovs_interface_shadow_handler);
>>>>
>>>>       engine_add_input(&en_flow_output, &en_lflow_output,
>>>>                        flow_output_lflow_output_handler);
>>>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>>>> index bc133f93b..9affca498 100644
>>>> --- a/tests/ovn-performance.at
>>>> +++ b/tests/ovn-performance.at
>>>> @@ -366,6 +366,23 @@ for i in 1 2; do
>>>>            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>>>            ovn-nbctl --wait=hv sync]
>>>>       )
>>>> +
>>>> +    # Delete and recreate $lp to make it unbind and rebind multiple
> times, and
>>>> +    # ensure no recompute is triggered.
>>>> +    for k in $(seq 10); do
>>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>>>> +            [hv$i hv$j], [lflow_run],
>>>> +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=xxxx &&
>>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false'
> &&
>>>> +             ovn-nbctl --wait=hv sync]
>>>> +        )
>>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>>>> +            [hv$i hv$j], [lflow_run],
>>>> +            [as hv$i ovs-vsctl set Interface $vif
> external-ids:iface-id=$lp &&
>>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>>> +             ovn-nbctl --wait=hv sync]
>>>> +        )
>>>> +    done
>>>>   done
>>>>
>>>>   for i in 1 2; do
>>>>
>>>
>
Xavier Simonart May 12, 2022, 9:04 a.m. UTC | #6
Hi Han

I think that I agree: both patches are somehow orthogonal, even though they
got initiated by the same issue.
I just submitted the patch preventing recomputes due to Port_Binding.

Thanks
Xavier

On Thu, May 12, 2022 at 11:00 AM Dumitru Ceara <dceara@redhat.com> wrote:

> Hi Han, Mark, Xavier,
>
> On 5/10/22 03:52, Han Zhou wrote:
> > On Mon, May 9, 2022 at 5:36 PM Xavier Simonart <xsimonar@redhat.com>
> wrote:
> >>
> >> Hi Mark, Han
> >>
> >> Thanks for looking into this.
> >> Han's patch is addressing the issues I initially tried fixing with
> >
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
> >>
> >
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
> > is definitely not a good solution (incurs some delay and is more a
> > workaround than a fix).
> >>
> >> I have been trying to follow another approach, which would fix a
> slightly
> > broader issue.
> >> The goal is to try to avoid recomputes when updating Port_Binding in
> SBDB.
> >> In addition to the case Han (greatly) described, we also hit some
> > recomputes when the same controller binds multiple ports quite quickly
> > consecutively.
> >>
> >> So I have been trying to extend the approach used in if-status (i.e. not
> > write to SBDB if SBDB is read-only, and have the if-status state machine
> > running through the different interface states, update SBDB when
> possible,
> > and change the states).
> >> I have a patch (almost) ready but had some issues with some test cases,
> > and was still investigating whether those issues were due to my patch....
> >> I hope to post it tomorrow.
> >> We can then discuss whether we need both approaches.
> >
> > Thanks Xavier. I am interested in the solution you are working on.
> > I also want to add that my patch here is to provide a generic solution
> that
> > filters out the changes we don't care about and quits the I-P engine
> > processing early. It can be extended easily in the future if we have new
> > changes to ignore. In many cases this has been achieved by omitting
> columns
> > in OVSDB IDL monitoring, but the case here is that we care about part of
> a
> > column that is with k-v pairs and OVSDB IDL can monitor/omit the whole
> > column. So I feel we may need this regardless of the new solution that
> > addresses the consecutively multiple ports binding problem.
>
> I think so too.  If I understand correctly we need both changes.
>
> > Anyway, we can wait and evaluate when your new patch is posted.
> >
>
> +1
>
> Thanks,
> Dumitru
>
> > Thanks,
> > Han
> >
> >>
> >> Thanks
> >> Xavier
> >>
> >>
> >> On Mon, May 9, 2022 at 9:32 PM Mark Michelson <mmichels@redhat.com>
> wrote:
> >>>
> >>> Based on the explanation and the code, this seems to solve the problem
> >>> of the spurious recomputes. I'd like confirmation from Xavier that this
> >>> is fixing the issue he was addressing in
> >>>
> >
> https://patchwork.ozlabs.org/project/ovn/patch/20220401101640.1139426-1-xsimonar@redhat.com/
> >>> , and I'd also like confirmation that this approach doesn't incur the
> >>> delays that Dumitru mentioned on Xavier's patch.
> >>>
> >>> With those confirmations, I'll add my ack to this patch.
> >>>
> >>> On 5/3/22 15:37, Han Zhou wrote:
> >>>> During the process of claiming a VIF port on a chassis, there are
> >>>> multiple SB and local OVSDB updates by ovn-controller, which may
> > trigger
> >>>> ovn-controller recompute, depending on the timing between the
> > completion
> >>>> of SB and local OVSDB updates:
> >>>>
> >>>> After port claiming, ovn-controller sends an update to the local OVSDB
> >>>> to clean the Interface:external_ids:ovn-installed, and also sends an
> >>>> update to the SB DB to update the Port_Binding:chassis.  If the local
> >>>> OVSDB update notification comes back before the SB update completes,
> >>>> ovn-controller's SB IDL is still "in transaction" and read-only, but
> > the
> >>>> local Interface update triggers the I-P engine runtime_data node to
> >>>> handle the OVS_interface change, which considers the VIF to be claimed
> >>>> (again) potentially, which requires SB updates while it is not
> possible
> >>>> at the moment, and so the I-P handler returns failure and the engine
> >>>> aborts, which triggers recompute at the next I-P engine iteration when
> >>>> SB DB is writable.
> >>>>
> >>>> If we are lucky at the above step that the SB update completes before
> >>>> the local OVSDB update is received, ovn-controller would handle the
> >>>> change incrementally (and figures out that the SB Port_Binding chassis
> >>>> is already the current one thus wouldn't reclaim the VIF). But the
> next
> >>>> step after it completes flow computing for the newly bound VIF it
> sends
> >>>> another update to the local OVSDB to set the
> >>>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at
> the
> >>>> same time it also sends an update to the SB DB to set the
> >>>> Port_Binding:up. These two parallel updates would lead to the same
> >>>> situation as the above step and potentially cause ovn-controll
> >>>> recompute.
> >>>>
> >>>> The recompute is very likely to be triggered in production because the
> >>>> SB DB is remote (on central nodes) and has heavier load, which
> requires
> >>>> longer time to complete the updates than the local OVSDB server.
> >>>>
> >>>> The problem is that the runtime_data's OVS_interface change handler
> >>>> couldn't handle the update of the Interface:external_ids:ovn-installed
> >>>> incrementally, because ovn-controller assumes it could require an SB
> >>>> update that is not possible at the moment. However, the assumption is
> >>>> not true. In fact it shouldn't even care about the change because this
> >>>> external_id:ovn-installed is written by ovn-controller itself and it
> is
> >>>> write-only (no need to read it as an input for any further
> processing).
> >>>>
> >>>> So, the idea to fix the problem is to ignore such changes that happen
> > to
> >>>> the "write-only" fields only. OVSDB change tracking provides APIs to
> >>>> check if a specific column is updated, but the difficulty here is that
> >>>> the column external_ids is a k-v map which contains not only the data
> >>>> that ovn-controller can ignore, such as ovn-installed,
> > ovn-installed-ts,
> >>>> but also the data ovn-controller cares about, such as iface-id,
> >>>> iface-id-ver and encap-ip. To solve this problem, we need to know
> > what's
> >>>> changed exactly inside the external_ids map.
> >>>>
> >>>> This patch solves it by introducing a new I-P engine node
> >>>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an
> > extra
> >>>> copy of the older version of the column external_ids, and replace the
> >>>> input OVS_interface of the runtime_data node. In the change handler of
> > the
> >>>> runtime_data node we evaluates if the change needs to be handled
> before
> >>>> calling consider_iface_claim(), so in the above scenarios the changes
> >>>> don't need to be handled and won't trigger recompute any more. A test
> >>>> case is also updated to capture this (which is very likely, if not
> > 100%,
> >>>> to fail without the fix).
> >>>>
> >>>> Signed-off-by: Han Zhou <hzhou@ovn.org>
> >>>> ---
> >>>> v2: Add comments that was missing while committing v1.
> >>>>
> >>>>   controller/binding.c        |  70 ++++++++++++++++++++++++
> >>>>   controller/binding.h        |   1 +
> >>>>   controller/ovn-controller.c | 105
> > +++++++++++++++++++++++++++++++++---
> >>>>   tests/ovn-performance.at    |  17 ++++++
> >>>>   4 files changed, 186 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/controller/binding.c b/controller/binding.c
> >>>> index 7281b0485..0563d9d1e 100644
> >>>> --- a/controller/binding.c
> >>>> +++ b/controller/binding.c
> >>>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
> > ovsrec_interface *iface,
> >>>>       return false;
> >>>>   }
> >>>>
> >>>> +static bool
> >>>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const
> > char *key)
> >>>> +{
> >>>> +    const char *value_a = smap_get(a, key);
> >>>> +    const char *value_b = smap_get(b, key);
> >>>> +    if ((value_a && !value_b)
> >>>> +        || (!value_a && value_b)
> >>>> +        || (value_a && value_b && strcmp(value_a, value_b))) {
> >>>> +        return true;
> >>>> +    }
> >>>> +    return false;
> >>>> +}
> >>>> +
> >>>> +/* Check if the change in 'iface_rec' is something we are interested
> > in from
> >>>> + * port binding perspective.  Return true if the change needs to be
> > handled,
> >>>> + * otherwise return false.
> >>>> + *
> >>>> + * The 'iface_rec' must be change tracked, i.e. iterator from
> >>>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
> >>>> +static bool
> >>>> +ovs_interface_change_need_handle(const struct ovsrec_interface
> > *iface_rec,
> >>>> +                                 struct shash
> > *iface_table_external_ids_old)
> >>>> +{
> >>>> +    if (ovsrec_interface_is_updated(iface_rec,
> >>>> +                                    OVSREC_INTERFACE_COL_NAME)) {
> >>>> +        return true;
> >>>> +    }
> >>>> +    if (ovsrec_interface_is_updated(iface_rec,
> >>>> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
> >>>> +        return true;
> >>>> +    }
> >>>> +    if (ovsrec_interface_is_updated(iface_rec,
> >>>> +                                    OVSREC_INTERFACE_COL_TYPE)) {
> >>>> +        return true;
> >>>> +    }
> >>>> +    if (ovsrec_interface_is_updated(iface_rec,
> >>>> +
> >  OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
> >>>> +        /* Compare the external_ids that we are interested in with
> > the old
> >>>> +         * values:
> >>>> +         * - iface-id
> >>>> +         * - iface-id-ver
> >>>> +         * - encap-ip
> >>>> +         * For any other changes, such as ovn-installed,
> > ovn-installed-ts, etc,
> >>>> +         * we don't need to handle. */
> >>>> +        struct smap *external_ids_old =
> >>>> +            shash_find_data(iface_table_external_ids_old,
> > iface_rec->name);
> >>>> +        if (!external_ids_old) {
> >>>> +            return true;
> >>>> +        }
> >>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> > external_ids_old,
> >>>> +                              "iface-id")) {
> >>>> +            return true;
> >>>> +        }
> >>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> > external_ids_old,
> >>>> +                              "iface-id-ver")) {
> >>>> +            return true;
> >>>> +        }
> >>>> +        if (is_ext_id_changed(&iface_rec->external_ids,
> > external_ids_old,
> >>>> +                              "encap-ip")) {
> >>>> +            return true;
> >>>> +        }
> >>>> +    }
> >>>> +    return false;
> >>>> +}
> >>>> +
> >>>>   /* Returns true if the ovs interface changes were handled
> > successfully,
> >>>>    * false otherwise.
> >>>>    */
> >>>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
> > binding_ctx_in *b_ctx_in,
> >>>>               continue;
> >>>>           }
> >>>>
> >>>> +        if (!ovs_interface_change_need_handle(
> >>>> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
> >>>> +            continue;
> >>>> +        }
> >>>> +
> >>>>           const char *iface_id = smap_get(&iface_rec->external_ids,
> > "iface-id");
> >>>>           int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport :
> > 0;
> >>>>           if (iface_id && ofport > 0 &&
> >>>> diff --git a/controller/binding.h b/controller/binding.h
> >>>> index 430a8d9b1..e49e1ebb6 100644
> >>>> --- a/controller/binding.h
> >>>> +++ b/controller/binding.h
> >>>> @@ -55,6 +55,7 @@ struct binding_ctx_in {
> >>>>       const struct ovsrec_bridge_table *bridge_table;
> >>>>       const struct ovsrec_open_vswitch_table *ovs_table;
> >>>>       const struct ovsrec_interface_table *iface_table;
> >>>> +    struct shash *iface_table_external_ids_old;
> >>>>   };
> >>>>
> >>>>   /* Locally relevant port bindings, e.g., VIFs that might be bound
> > locally,
> >>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>>> index 22b7fa935..649353f7d 100644
> >>>> --- a/controller/ovn-controller.c
> >>>> +++ b/controller/ovn-controller.c
> >>>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
> > *node, void *data)
> >>>>       engine_set_node_state(node, EN_UNCHANGED);
> >>>>   }
> >>>>
> >>>> +/* This engine node is to wrap the OVS_interface input and maintain a
> > copy of
> >>>> + * the old version of data for the column external_ids.
> >>>> + *
> >>>> + * There are some special considerations of this engine node:
> >>>> + * 1. It has a single input OVS_interface, and it transparently
> > passes the
> >>>> + *    input changes as its own output data to its dependants. So
> > there is no
> >>>> + *    processing to OVS_interface changes but simply mark the node
> > status as
> >>>> + *    UPDATED (and so the run() and the change handler is the same).
> >>>> + * 2. The iface_table_external_ids_old is computed/updated in the
> > member
> >>>> + *    clear_tracked_data(), because that is when the last round of
> > processing
> >>>> + *    has completed but the new IDL data is yet to refresh, so we
> > replace the
> >>>> + *    old data with the current data. */
> >>>> +struct ed_type_ovs_interface_shadow {
> >>>> +    struct ovsrec_interface_table *iface_table;
> >>>> +    struct shash iface_table_external_ids_old;
> >>>> +};
> >>>> +
> >>>> +static void *
> >>>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
> >>>> +                             struct engine_arg *arg OVS_UNUSED)
> >>>> +{
> >>>> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof
> *data);
> >>>> +    data->iface_table = NULL;
> >>>> +    shash_init(&data->iface_table_external_ids_old);
> >>>> +
> >>>> +    return data;
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
> >>>> +{
> >>>> +    struct shash_node *node;
> >>>> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
> >>>> +        struct smap *ext_ids = node->data;
> >>>> +        smap_destroy(ext_ids);
> >>>> +    }
> >>>> +    shash_destroy_free_data(table_ext_ids);
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +en_ovs_interface_shadow_cleanup(void *data_)
> >>>> +{
> >>>> +    struct ed_type_ovs_interface_shadow *data = data_;
> >>>> +
> >
> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
> >>>> +{
> >>>> +    struct ed_type_ovs_interface_shadow *data = data_;
> >>>> +
> >
> iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> >>>> +    shash_init(&data->iface_table_external_ids_old);
> >>>> +
> >>>> +    if (!data->iface_table) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    const struct ovsrec_interface *iface_rec;
> >>>> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
> >>>> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
> >>>> +        smap_clone(external_ids, &iface_rec->external_ids);
> >>>> +        shash_add(&data->iface_table_external_ids_old,
> > iface_rec->name,
> >>>> +                  external_ids);
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
> >>>> +{
> >>>> +    struct ed_type_ovs_interface_shadow *data = data_;
> >>>> +    struct ovsrec_interface_table *iface_table =
> >>>> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> >>>> +            engine_get_input("OVS_interface", node));
> >>>> +    data->iface_table = iface_table;
> >>>> +    engine_set_node_state(node, EN_UPDATED);
> >>>> +}
> >>>> +
> >>>> +static bool
> >>>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
> >>>> +                                           void *data_)
> >>>> +{
> >>>> +    en_ovs_interface_shadow_run(node, data_);
> >>>> +    return true;
> >>>> +}
> >>>> +
> >>>>   struct ed_type_runtime_data {
> >>>>       /* Contains "struct local_datapath" nodes. */
> >>>>       struct hmap local_datapaths;
> >>>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
> >>>>           (struct ovsrec_port_table *)EN_OVSDB_GET(
> >>>>               engine_get_input("OVS_port", node));
> >>>>
> >>>> -    struct ovsrec_interface_table *iface_table =
> >>>> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> >>>> -            engine_get_input("OVS_interface", node));
> >>>> +    struct ed_type_ovs_interface_shadow *iface_shadow =
> >>>> +        engine_get_input_data("ovs_interface_shadow", node);
> >>>>
> >>>>       struct ovsrec_qos_table *qos_table =
> >>>>           (struct ovsrec_qos_table *)EN_OVSDB_GET(
> >>>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
> >>>>       b_ctx_in->sbrec_port_binding_by_datapath =
> > sbrec_port_binding_by_datapath;
> >>>>       b_ctx_in->sbrec_port_binding_by_name =
> > sbrec_port_binding_by_name;
> >>>>       b_ctx_in->port_table = port_table;
> >>>> -    b_ctx_in->iface_table = iface_table;
> >>>> +    b_ctx_in->iface_table = iface_shadow->iface_table;
> >>>> +    b_ctx_in->iface_table_external_ids_old =
> >>>> +        &iface_shadow->iface_table_external_ids_old;
> >>>>       b_ctx_in->qos_table = qos_table;
> >>>>       b_ctx_in->port_binding_table = pb_table;
> >>>>       b_ctx_in->br_int = br_int;
> >>>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node,
> > void *data)
> >>>>   }
> >>>>
> >>>>   static bool
> >>>> -runtime_data_ovs_interface_handler(struct engine_node *node, void
> > *data)
> >>>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> > void *data)
> >>>>   {
> >>>>       struct ed_type_runtime_data *rt_data = data;
> >>>>       struct binding_ctx_in b_ctx_in;
> >>>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
> >>>>
> >>>>       /* Define inc-proc-engine nodes. */
> >>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
> "ct_zones");
> >>>> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >>>> +                                      "ovs_interface_shadow");
> >>>>       ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
> >>>>       ENGINE_NODE(non_vif_data, "non_vif_data");
> >>>>       ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> >>>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
> >>>>       engine_add_input(&en_ct_zones, &en_runtime_data,
> >>>>                        ct_zones_runtime_data_handler);
> >>>>
> >>>> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >>>> +                     ovs_interface_shadow_ovs_interface_handler);
> >>>> +
> >>>>       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> > NULL);
> >>>>
> >>>>       engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> >>>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
> >>>>        */
> >>>>       engine_add_input(&en_runtime_data, &en_ovs_port,
> >>>>                        engine_noop_handler);
> >>>> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
> >>>> -                     runtime_data_ovs_interface_handler);
> >>>> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
> >>>> +                     runtime_data_ovs_interface_shadow_handler);
> >>>>
> >>>>       engine_add_input(&en_flow_output, &en_lflow_output,
> >>>>                        flow_output_lflow_output_handler);
> >>>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
> >>>> index bc133f93b..9affca498 100644
> >>>> --- a/tests/ovn-performance.at
> >>>> +++ b/tests/ovn-performance.at
> >>>> @@ -366,6 +366,23 @@ for i in 1 2; do
> >>>>            ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> >>>>            ovn-nbctl --wait=hv sync]
> >>>>       )
> >>>> +
> >>>> +    # Delete and recreate $lp to make it unbind and rebind multiple
> > times, and
> >>>> +    # ensure no recompute is triggered.
> >>>> +    for k in $(seq 10); do
> >>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> >>>> +            [hv$i hv$j], [lflow_run],
> >>>> +            [as hv$i ovs-vsctl set Interface $vif
> > external-ids:iface-id=xxxx &&
> >>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false'
> > &&
> >>>> +             ovn-nbctl --wait=hv sync]
> >>>> +        )
> >>>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> >>>> +            [hv$i hv$j], [lflow_run],
> >>>> +            [as hv$i ovs-vsctl set Interface $vif
> > external-ids:iface-id=$lp &&
> >>>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true'
> &&
> >>>> +             ovn-nbctl --wait=hv sync]
> >>>> +        )
> >>>> +    done
> >>>>   done
> >>>>
> >>>>   for i in 1 2; do
> >>>>
> >>>
> >
>
>
Dumitru Ceara May 12, 2022, 4:32 p.m. UTC | #7
On 5/12/22 10:58, Dumitru Ceara wrote:
> On 5/3/22 21:37, Han Zhou wrote:
>> During the process of claiming a VIF port on a chassis, there are
>> multiple SB and local OVSDB updates by ovn-controller, which may trigger
>> ovn-controller recompute, depending on the timing between the completion
>> of SB and local OVSDB updates:
>>
>> After port claiming, ovn-controller sends an update to the local OVSDB
>> to clean the Interface:external_ids:ovn-installed, and also sends an
>> update to the SB DB to update the Port_Binding:chassis.  If the local
>> OVSDB update notification comes back before the SB update completes,
>> ovn-controller's SB IDL is still "in transaction" and read-only, but the
>> local Interface update triggers the I-P engine runtime_data node to
>> handle the OVS_interface change, which considers the VIF to be claimed
>> (again) potentially, which requires SB updates while it is not possible
>> at the moment, and so the I-P handler returns failure and the engine
>> aborts, which triggers recompute at the next I-P engine iteration when
>> SB DB is writable.
>>
>> If we are lucky at the above step that the SB update completes before
>> the local OVSDB update is received, ovn-controller would handle the
>> change incrementally (and figures out that the SB Port_Binding chassis
>> is already the current one thus wouldn't reclaim the VIF). But the next
>> step after it completes flow computing for the newly bound VIF it sends
>> another update to the local OVSDB to set the
>> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
>> same time it also sends an update to the SB DB to set the
>> Port_Binding:up. These two parallel updates would lead to the same
>> situation as the above step and potentially cause ovn-controll
>> recompute.
>>
>> The recompute is very likely to be triggered in production because the
>> SB DB is remote (on central nodes) and has heavier load, which requires
>> longer time to complete the updates than the local OVSDB server.
>>
>> The problem is that the runtime_data's OVS_interface change handler
>> couldn't handle the update of the Interface:external_ids:ovn-installed
>> incrementally, because ovn-controller assumes it could require an SB
>> update that is not possible at the moment. However, the assumption is
>> not true. In fact it shouldn't even care about the change because this
>> external_id:ovn-installed is written by ovn-controller itself and it is
>> write-only (no need to read it as an input for any further processing).
>>
>> So, the idea to fix the problem is to ignore such changes that happen to
>> the "write-only" fields only. OVSDB change tracking provides APIs to
>> check if a specific column is updated, but the difficulty here is that
>> the column external_ids is a k-v map which contains not only the data
>> that ovn-controller can ignore, such as ovn-installed, ovn-installed-ts,
>> but also the data ovn-controller cares about, such as iface-id,
>> iface-id-ver and encap-ip. To solve this problem, we need to know what's
>> changed exactly inside the external_ids map.
>>
>> This patch solves it by introducing a new I-P engine node
>> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an extra
>> copy of the older version of the column external_ids, and replace the
>> input OVS_interface of the runtime_data node. In the change handler of the
>> runtime_data node we evaluates if the change needs to be handled before
>> calling consider_iface_claim(), so in the above scenarios the changes
>> don't need to be handled and won't trigger recompute any more. A test
>> case is also updated to capture this (which is very likely, if not 100%,
>> to fail without the fix).
>>
>> Signed-off-by: Han Zhou <hzhou@ovn.org>
> 
> Hi Han,
> 
> Thanks for the fix!
> 
> I would add:
> Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.")
> 
> I have another tiny comment below.  I'm currently rerunning the
> ovn-kuberenetes tests with your patch applied [0] because the run done
> by ovsrobot failed to bring up a cluster [1] and I want to make sure
> we're not breaking anything.
> 

The CI job passed so, with the minor comments addressed:

Acked-by: Dumitru Ceara <dceara@redhat.com>

> Thanks,
> Dumitru
> 
> [0] https://github.com/dceara/ovn/actions/runs/2312210800
> [1] https://github.com/ovsrobot/ovn/runs/6280999584?check_suite_focus=true#step:9:725
> 
>> ---
>> v2: Add comments that was missing while committing v1.
>>
>>  controller/binding.c        |  70 ++++++++++++++++++++++++
>>  controller/binding.h        |   1 +
>>  controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
>>  tests/ovn-performance.at    |  17 ++++++
>>  4 files changed, 186 insertions(+), 7 deletions(-)
>>
>> diff --git a/controller/binding.c b/controller/binding.c
>> index 7281b0485..0563d9d1e 100644
>> --- a/controller/binding.c
>> +++ b/controller/binding.c
>> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct ovsrec_interface *iface,
>>      return false;
>>  }
>>  
>> +static bool
>> +is_ext_id_changed(const struct smap *a, const struct smap *b, const char *key)
>> +{
>> +    const char *value_a = smap_get(a, key);
>> +    const char *value_b = smap_get(b, key);
>> +    if ((value_a && !value_b)
>> +        || (!value_a && value_b)
>> +        || (value_a && value_b && strcmp(value_a, value_b))) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +/* Check if the change in 'iface_rec' is something we are interested in from
>> + * port binding perspective.  Return true if the change needs to be handled,
>> + * otherwise return false.
>> + *
>> + * The 'iface_rec' must be change tracked, i.e. iterator from
>> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
>> +static bool
>> +ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec,
>> +                                 struct shash *iface_table_external_ids_old)
>> +{
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_NAME)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_TYPE)) {
>> +        return true;
>> +    }
>> +    if (ovsrec_interface_is_updated(iface_rec,
>> +                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
>> +        /* Compare the external_ids that we are interested in with the old
>> +         * values:
>> +         * - iface-id
>> +         * - iface-id-ver
>> +         * - encap-ip
>> +         * For any other changes, such as ovn-installed, ovn-installed-ts, etc,
>> +         * we don't need to handle. */
>> +        struct smap *external_ids_old =
>> +            shash_find_data(iface_table_external_ids_old, iface_rec->name);
>> +        if (!external_ids_old) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "iface-id")) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "iface-id-ver")) {
>> +            return true;
>> +        }
>> +        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
>> +                              "encap-ip")) {
>> +            return true;
>> +        }
>> +    }
>> +    return false;
>> +}
>> +
>>  /* Returns true if the ovs interface changes were handled successfully,
>>   * false otherwise.
>>   */
>> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct binding_ctx_in *b_ctx_in,
>>              continue;
>>          }
>>  
>> +        if (!ovs_interface_change_need_handle(
>> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
>> +            continue;
>> +        }
>> +
>>          const char *iface_id = smap_get(&iface_rec->external_ids, "iface-id");
>>          int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
>>          if (iface_id && ofport > 0 &&
>> diff --git a/controller/binding.h b/controller/binding.h
>> index 430a8d9b1..e49e1ebb6 100644
>> --- a/controller/binding.h
>> +++ b/controller/binding.h
>> @@ -55,6 +55,7 @@ struct binding_ctx_in {
>>      const struct ovsrec_bridge_table *bridge_table;
>>      const struct ovsrec_open_vswitch_table *ovs_table;
>>      const struct ovsrec_interface_table *iface_table;
>> +    struct shash *iface_table_external_ids_old;
>>  };
>>  
>>  /* Locally relevant port bindings, e.g., VIFs that might be bound locally,
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 22b7fa935..649353f7d 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node *node, void *data)
>>      engine_set_node_state(node, EN_UNCHANGED);
>>  }
>>  
>> +/* This engine node is to wrap the OVS_interface input and maintain a copy of
>> + * the old version of data for the column external_ids.
>> + *
>> + * There are some special considerations of this engine node:
>> + * 1. It has a single input OVS_interface, and it transparently passes the
>> + *    input changes as its own output data to its dependants. So there is no
>> + *    processing to OVS_interface changes but simply mark the node status as
>> + *    UPDATED (and so the run() and the change handler is the same).
>> + * 2. The iface_table_external_ids_old is computed/updated in the member
>> + *    clear_tracked_data(), because that is when the last round of processing
>> + *    has completed but the new IDL data is yet to refresh, so we replace the
>> + *    old data with the current data. */
>> +struct ed_type_ovs_interface_shadow {
>> +    struct ovsrec_interface_table *iface_table;
>> +    struct shash iface_table_external_ids_old;
>> +};
>> +
>> +static void *
>> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
>> +                             struct engine_arg *arg OVS_UNUSED)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
>> +    data->iface_table = NULL;
>> +    shash_init(&data->iface_table_external_ids_old);
>> +
>> +    return data;
>> +}
>> +
>> +static void
>> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
>> +{
>> +    struct shash_node *node;
>> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
>> +        struct smap *ext_ids = node->data;
>> +        smap_destroy(ext_ids);
> 
> Nit: no need for SHASH_FOR_EACH_SAFE as far as I can tell.
> 
> 
>> +    }
>> +    shash_destroy_free_data(table_ext_ids);
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_cleanup(void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
>> +    shash_init(&data->iface_table_external_ids_old);
>> +
>> +    if (!data->iface_table) {
>> +        return;
>> +    }
>> +
>> +    const struct ovsrec_interface *iface_rec;
>> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
>> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
>> +        smap_clone(external_ids, &iface_rec->external_ids);
>> +        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
>> +                  external_ids);
>> +    }
>> +}
>> +
>> +static void
>> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
>> +{
>> +    struct ed_type_ovs_interface_shadow *data = data_;
>> +    struct ovsrec_interface_table *iface_table =
>> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> +            engine_get_input("OVS_interface", node));
>> +    data->iface_table = iface_table;
>> +    engine_set_node_state(node, EN_UPDATED);
>> +}
>> +
>> +static bool
>> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
>> +                                           void *data_)
>> +{
>> +    en_ovs_interface_shadow_run(node, data_);
>> +    return true;
>> +}
>> +
>>  struct ed_type_runtime_data {
>>      /* Contains "struct local_datapath" nodes. */
>>      struct hmap local_datapaths;
>> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
>>          (struct ovsrec_port_table *)EN_OVSDB_GET(
>>              engine_get_input("OVS_port", node));
>>  
>> -    struct ovsrec_interface_table *iface_table =
>> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
>> -            engine_get_input("OVS_interface", node));
>> +    struct ed_type_ovs_interface_shadow *iface_shadow =
>> +        engine_get_input_data("ovs_interface_shadow", node);
>>  
>>      struct ovsrec_qos_table *qos_table =
>>          (struct ovsrec_qos_table *)EN_OVSDB_GET(
>> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
>>      b_ctx_in->sbrec_port_binding_by_datapath = sbrec_port_binding_by_datapath;
>>      b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
>>      b_ctx_in->port_table = port_table;
>> -    b_ctx_in->iface_table = iface_table;
>> +    b_ctx_in->iface_table = iface_shadow->iface_table;
>> +    b_ctx_in->iface_table_external_ids_old =
>> +        &iface_shadow->iface_table_external_ids_old;
>>      b_ctx_in->qos_table = qos_table;
>>      b_ctx_in->port_binding_table = pb_table;
>>      b_ctx_in->br_int = br_int;
>> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
>>  }
>>  
>>  static bool
>> -runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
>> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
>>  {
>>      struct ed_type_runtime_data *rt_data = data;
>>      struct binding_ctx_in b_ctx_in;
>> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
>>  
>>      /* Define inc-proc-engine nodes. */
>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>> +                                      "ovs_interface_shadow");
>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
>>      ENGINE_NODE(non_vif_data, "non_vif_data");
>>      ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
>> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
>>      engine_add_input(&en_ct_zones, &en_runtime_data,
>>                       ct_zones_runtime_data_handler);
>>  
>> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>> +                     ovs_interface_shadow_ovs_interface_handler);
>> +
>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>>  
>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
>>       */
>>      engine_add_input(&en_runtime_data, &en_ovs_port,
>>                       engine_noop_handler);
>> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
>> -                     runtime_data_ovs_interface_handler);
>> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
>> +                     runtime_data_ovs_interface_shadow_handler);
>>  
>>      engine_add_input(&en_flow_output, &en_lflow_output,
>>                       flow_output_lflow_output_handler);
>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>> index bc133f93b..9affca498 100644
>> --- a/tests/ovn-performance.at
>> +++ b/tests/ovn-performance.at
>> @@ -366,6 +366,23 @@ for i in 1 2; do
>>           ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>>           ovn-nbctl --wait=hv sync]
>>      )
>> +
>> +    # Delete and recreate $lp to make it unbind and rebind multiple times, and
>> +    # ensure no recompute is triggered.
>> +    for k in $(seq 10); do
>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> +            [hv$i hv$j], [lflow_run],
>> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=xxxx &&
>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
>> +             ovn-nbctl --wait=hv sync]
>> +        )
>> +        OVN_CONTROLLER_EXPECT_NO_HIT(
>> +            [hv$i hv$j], [lflow_run],
>> +            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp &&
>> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
>> +             ovn-nbctl --wait=hv sync]
>> +        )
>> +    done
>>  done
>>  
>>  for i in 1 2; do
>
Han Zhou May 12, 2022, 7:33 p.m. UTC | #8
On Thu, May 12, 2022 at 9:32 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 5/12/22 10:58, Dumitru Ceara wrote:
> > On 5/3/22 21:37, Han Zhou wrote:
> >> During the process of claiming a VIF port on a chassis, there are
> >> multiple SB and local OVSDB updates by ovn-controller, which may
trigger
> >> ovn-controller recompute, depending on the timing between the
completion
> >> of SB and local OVSDB updates:
> >>
> >> After port claiming, ovn-controller sends an update to the local OVSDB
> >> to clean the Interface:external_ids:ovn-installed, and also sends an
> >> update to the SB DB to update the Port_Binding:chassis.  If the local
> >> OVSDB update notification comes back before the SB update completes,
> >> ovn-controller's SB IDL is still "in transaction" and read-only, but
the
> >> local Interface update triggers the I-P engine runtime_data node to
> >> handle the OVS_interface change, which considers the VIF to be claimed
> >> (again) potentially, which requires SB updates while it is not possible
> >> at the moment, and so the I-P handler returns failure and the engine
> >> aborts, which triggers recompute at the next I-P engine iteration when
> >> SB DB is writable.
> >>
> >> If we are lucky at the above step that the SB update completes before
> >> the local OVSDB update is received, ovn-controller would handle the
> >> change incrementally (and figures out that the SB Port_Binding chassis
> >> is already the current one thus wouldn't reclaim the VIF). But the next
> >> step after it completes flow computing for the newly bound VIF it sends
> >> another update to the local OVSDB to set the
> >> Interface:external_ids:ovn-installed and ovn-installed-ts, while at the
> >> same time it also sends an update to the SB DB to set the
> >> Port_Binding:up. These two parallel updates would lead to the same
> >> situation as the above step and potentially cause ovn-controll
> >> recompute.
> >>
> >> The recompute is very likely to be triggered in production because the
> >> SB DB is remote (on central nodes) and has heavier load, which requires
> >> longer time to complete the updates than the local OVSDB server.
> >>
> >> The problem is that the runtime_data's OVS_interface change handler
> >> couldn't handle the update of the Interface:external_ids:ovn-installed
> >> incrementally, because ovn-controller assumes it could require an SB
> >> update that is not possible at the moment. However, the assumption is
> >> not true. In fact it shouldn't even care about the change because this
> >> external_id:ovn-installed is written by ovn-controller itself and it is
> >> write-only (no need to read it as an input for any further processing).
> >>
> >> So, the idea to fix the problem is to ignore such changes that happen
to
> >> the "write-only" fields only. OVSDB change tracking provides APIs to
> >> check if a specific column is updated, but the difficulty here is that
> >> the column external_ids is a k-v map which contains not only the data
> >> that ovn-controller can ignore, such as ovn-installed,
ovn-installed-ts,
> >> but also the data ovn-controller cares about, such as iface-id,
> >> iface-id-ver and encap-ip. To solve this problem, we need to know
what's
> >> changed exactly inside the external_ids map.
> >>
> >> This patch solves it by introducing a new I-P engine node
> >> ovs_interface_shadow to wrap the OVSDB input ovs_interface with an
extra
> >> copy of the older version of the column external_ids, and replace the
> >> input OVS_interface of the runtime_data node. In the change handler of
the
> >> runtime_data node we evaluates if the change needs to be handled before
> >> calling consider_iface_claim(), so in the above scenarios the changes
> >> don't need to be handled and won't trigger recompute any more. A test
> >> case is also updated to capture this (which is very likely, if not
100%,
> >> to fail without the fix).
> >>
> >> Signed-off-by: Han Zhou <hzhou@ovn.org>
> >
> > Hi Han,
> >
> > Thanks for the fix!
> >
> > I would add:
> > Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS
flows are installed.")
> >
> > I have another tiny comment below.  I'm currently rerunning the
> > ovn-kuberenetes tests with your patch applied [0] because the run done
> > by ovsrobot failed to bring up a cluster [1] and I want to make sure
> > we're not breaking anything.
> >
>
> The CI job passed so, with the minor comments addressed:
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>
>

Thanks Dumitru. I addressed the minor comments and applied to main and
backported to branch-22.03.
@Mark Michelson <mmichels@redhat.com> I added your ack as well based on
your earlier statement :)

Thanks,
Han

> > Thanks,
> > Dumitru
> >
> > [0] https://github.com/dceara/ovn/actions/runs/2312210800
> > [1]
https://github.com/ovsrobot/ovn/runs/6280999584?check_suite_focus=true#step:9:725
> >
> >> ---
> >> v2: Add comments that was missing while committing v1.
> >>
> >>  controller/binding.c        |  70 ++++++++++++++++++++++++
> >>  controller/binding.h        |   1 +
> >>  controller/ovn-controller.c | 105 +++++++++++++++++++++++++++++++++---
> >>  tests/ovn-performance.at    |  17 ++++++
> >>  4 files changed, 186 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/controller/binding.c b/controller/binding.c
> >> index 7281b0485..0563d9d1e 100644
> >> --- a/controller/binding.c
> >> +++ b/controller/binding.c
> >> @@ -1907,6 +1907,71 @@ is_iface_in_int_bridge(const struct
ovsrec_interface *iface,
> >>      return false;
> >>  }
> >>
> >> +static bool
> >> +is_ext_id_changed(const struct smap *a, const struct smap *b, const
char *key)
> >> +{
> >> +    const char *value_a = smap_get(a, key);
> >> +    const char *value_b = smap_get(b, key);
> >> +    if ((value_a && !value_b)
> >> +        || (!value_a && value_b)
> >> +        || (value_a && value_b && strcmp(value_a, value_b))) {
> >> +        return true;
> >> +    }
> >> +    return false;
> >> +}
> >> +
> >> +/* Check if the change in 'iface_rec' is something we are interested
in from
> >> + * port binding perspective.  Return true if the change needs to be
handled,
> >> + * otherwise return false.
> >> + *
> >> + * The 'iface_rec' must be change tracked, i.e. iterator from
> >> + * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
> >> +static bool
> >> +ovs_interface_change_need_handle(const struct ovsrec_interface
*iface_rec,
> >> +                                 struct shash
*iface_table_external_ids_old)
> >> +{
> >> +    if (ovsrec_interface_is_updated(iface_rec,
> >> +                                    OVSREC_INTERFACE_COL_NAME)) {
> >> +        return true;
> >> +    }
> >> +    if (ovsrec_interface_is_updated(iface_rec,
> >> +                                    OVSREC_INTERFACE_COL_OFPORT)) {
> >> +        return true;
> >> +    }
> >> +    if (ovsrec_interface_is_updated(iface_rec,
> >> +                                    OVSREC_INTERFACE_COL_TYPE)) {
> >> +        return true;
> >> +    }
> >> +    if (ovsrec_interface_is_updated(iface_rec,
> >> +
 OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
> >> +        /* Compare the external_ids that we are interested in with
the old
> >> +         * values:
> >> +         * - iface-id
> >> +         * - iface-id-ver
> >> +         * - encap-ip
> >> +         * For any other changes, such as ovn-installed,
ovn-installed-ts, etc,
> >> +         * we don't need to handle. */
> >> +        struct smap *external_ids_old =
> >> +            shash_find_data(iface_table_external_ids_old,
iface_rec->name);
> >> +        if (!external_ids_old) {
> >> +            return true;
> >> +        }
> >> +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
> >> +                              "iface-id")) {
> >> +            return true;
> >> +        }
> >> +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
> >> +                              "iface-id-ver")) {
> >> +            return true;
> >> +        }
> >> +        if (is_ext_id_changed(&iface_rec->external_ids,
external_ids_old,
> >> +                              "encap-ip")) {
> >> +            return true;
> >> +        }
> >> +    }
> >> +    return false;
> >> +}
> >> +
> >>  /* Returns true if the ovs interface changes were handled
successfully,
> >>   * false otherwise.
> >>   */
> >> @@ -2018,6 +2083,11 @@ binding_handle_ovs_interface_changes(struct
binding_ctx_in *b_ctx_in,
> >>              continue;
> >>          }
> >>
> >> +        if (!ovs_interface_change_need_handle(
> >> +            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
> >> +            continue;
> >> +        }
> >> +
> >>          const char *iface_id = smap_get(&iface_rec->external_ids,
"iface-id");
> >>          int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
> >>          if (iface_id && ofport > 0 &&
> >> diff --git a/controller/binding.h b/controller/binding.h
> >> index 430a8d9b1..e49e1ebb6 100644
> >> --- a/controller/binding.h
> >> +++ b/controller/binding.h
> >> @@ -55,6 +55,7 @@ struct binding_ctx_in {
> >>      const struct ovsrec_bridge_table *bridge_table;
> >>      const struct ovsrec_open_vswitch_table *ovs_table;
> >>      const struct ovsrec_interface_table *iface_table;
> >> +    struct shash *iface_table_external_ids_old;
> >>  };
> >>
> >>  /* Locally relevant port bindings, e.g., VIFs that might be bound
locally,
> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> index 22b7fa935..649353f7d 100644
> >> --- a/controller/ovn-controller.c
> >> +++ b/controller/ovn-controller.c
> >> @@ -1042,6 +1042,91 @@ en_ofctrl_is_connected_run(struct engine_node
*node, void *data)
> >>      engine_set_node_state(node, EN_UNCHANGED);
> >>  }
> >>
> >> +/* This engine node is to wrap the OVS_interface input and maintain a
copy of
> >> + * the old version of data for the column external_ids.
> >> + *
> >> + * There are some special considerations of this engine node:
> >> + * 1. It has a single input OVS_interface, and it transparently
passes the
> >> + *    input changes as its own output data to its dependants. So
there is no
> >> + *    processing to OVS_interface changes but simply mark the node
status as
> >> + *    UPDATED (and so the run() and the change handler is the same).
> >> + * 2. The iface_table_external_ids_old is computed/updated in the
member
> >> + *    clear_tracked_data(), because that is when the last round of
processing
> >> + *    has completed but the new IDL data is yet to refresh, so we
replace the
> >> + *    old data with the current data. */
> >> +struct ed_type_ovs_interface_shadow {
> >> +    struct ovsrec_interface_table *iface_table;
> >> +    struct shash iface_table_external_ids_old;
> >> +};
> >> +
> >> +static void *
> >> +en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
> >> +                             struct engine_arg *arg OVS_UNUSED)
> >> +{
> >> +    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
> >> +    data->iface_table = NULL;
> >> +    shash_init(&data->iface_table_external_ids_old);
> >> +
> >> +    return data;
> >> +}
> >> +
> >> +static void
> >> +iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
> >> +{
> >> +    struct shash_node *node;
> >> +    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
> >> +        struct smap *ext_ids = node->data;
> >> +        smap_destroy(ext_ids);
> >
> > Nit: no need for SHASH_FOR_EACH_SAFE as far as I can tell.
> >
> >
> >> +    }
> >> +    shash_destroy_free_data(table_ext_ids);
> >> +}
> >> +
> >> +static void
> >> +en_ovs_interface_shadow_cleanup(void *data_)
> >> +{
> >> +    struct ed_type_ovs_interface_shadow *data = data_;
> >> +
 iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> >> +}
> >> +
> >> +static void
> >> +en_ovs_interface_shadow_clear_tracked_data(void *data_)
> >> +{
> >> +    struct ed_type_ovs_interface_shadow *data = data_;
> >> +
 iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
> >> +    shash_init(&data->iface_table_external_ids_old);
> >> +
> >> +    if (!data->iface_table) {
> >> +        return;
> >> +    }
> >> +
> >> +    const struct ovsrec_interface *iface_rec;
> >> +    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
> >> +        struct smap *external_ids = xmalloc(sizeof *external_ids);
> >> +        smap_clone(external_ids, &iface_rec->external_ids);
> >> +        shash_add(&data->iface_table_external_ids_old,
iface_rec->name,
> >> +                  external_ids);
> >> +    }
> >> +}
> >> +
> >> +static void
> >> +en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
> >> +{
> >> +    struct ed_type_ovs_interface_shadow *data = data_;
> >> +    struct ovsrec_interface_table *iface_table =
> >> +        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> >> +            engine_get_input("OVS_interface", node));
> >> +    data->iface_table = iface_table;
> >> +    engine_set_node_state(node, EN_UPDATED);
> >> +}
> >> +
> >> +static bool
> >> +ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
> >> +                                           void *data_)
> >> +{
> >> +    en_ovs_interface_shadow_run(node, data_);
> >> +    return true;
> >> +}
> >> +
> >>  struct ed_type_runtime_data {
> >>      /* Contains "struct local_datapath" nodes. */
> >>      struct hmap local_datapaths;
> >> @@ -1214,9 +1299,8 @@ init_binding_ctx(struct engine_node *node,
> >>          (struct ovsrec_port_table *)EN_OVSDB_GET(
> >>              engine_get_input("OVS_port", node));
> >>
> >> -    struct ovsrec_interface_table *iface_table =
> >> -        (struct ovsrec_interface_table *)EN_OVSDB_GET(
> >> -            engine_get_input("OVS_interface", node));
> >> +    struct ed_type_ovs_interface_shadow *iface_shadow =
> >> +        engine_get_input_data("ovs_interface_shadow", node);
> >>
> >>      struct ovsrec_qos_table *qos_table =
> >>          (struct ovsrec_qos_table *)EN_OVSDB_GET(
> >> @@ -1249,7 +1333,9 @@ init_binding_ctx(struct engine_node *node,
> >>      b_ctx_in->sbrec_port_binding_by_datapath =
sbrec_port_binding_by_datapath;
> >>      b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
> >>      b_ctx_in->port_table = port_table;
> >> -    b_ctx_in->iface_table = iface_table;
> >> +    b_ctx_in->iface_table = iface_shadow->iface_table;
> >> +    b_ctx_in->iface_table_external_ids_old =
> >> +        &iface_shadow->iface_table_external_ids_old;
> >>      b_ctx_in->qos_table = qos_table;
> >>      b_ctx_in->port_binding_table = pb_table;
> >>      b_ctx_in->br_int = br_int;
> >> @@ -1331,7 +1417,7 @@ en_runtime_data_run(struct engine_node *node,
void *data)
> >>  }
> >>
> >>  static bool
> >> -runtime_data_ovs_interface_handler(struct engine_node *node, void
*data)
> >> +runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
void *data)
> >>  {
> >>      struct ed_type_runtime_data *rt_data = data;
> >>      struct binding_ctx_in b_ctx_in;
> >> @@ -3338,6 +3424,8 @@ main(int argc, char *argv[])
> >>
> >>      /* Define inc-proc-engine nodes. */
> >>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> >> +    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >> +                                      "ovs_interface_shadow");
> >>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
> >>      ENGINE_NODE(non_vif_data, "non_vif_data");
> >>      ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
> >> @@ -3459,6 +3547,9 @@ main(int argc, char *argv[])
> >>      engine_add_input(&en_ct_zones, &en_runtime_data,
> >>                       ct_zones_runtime_data_handler);
> >>
> >> +    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >> +                     ovs_interface_shadow_ovs_interface_handler);
> >> +
> >>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >>
> >>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> >> @@ -3480,8 +3571,8 @@ main(int argc, char *argv[])
> >>       */
> >>      engine_add_input(&en_runtime_data, &en_ovs_port,
> >>                       engine_noop_handler);
> >> -    engine_add_input(&en_runtime_data, &en_ovs_interface,
> >> -                     runtime_data_ovs_interface_handler);
> >> +    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
> >> +                     runtime_data_ovs_interface_shadow_handler);
> >>
> >>      engine_add_input(&en_flow_output, &en_lflow_output,
> >>                       flow_output_lflow_output_handler);
> >> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
> >> index bc133f93b..9affca498 100644
> >> --- a/tests/ovn-performance.at
> >> +++ b/tests/ovn-performance.at
> >> @@ -366,6 +366,23 @@ for i in 1 2; do
> >>           ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> >>           ovn-nbctl --wait=hv sync]
> >>      )
> >> +
> >> +    # Delete and recreate $lp to make it unbind and rebind multiple
times, and
> >> +    # ensure no recompute is triggered.
> >> +    for k in $(seq 10); do
> >> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> >> +            [hv$i hv$j], [lflow_run],
> >> +            [as hv$i ovs-vsctl set Interface $vif
external-ids:iface-id=xxxx &&
> >> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false'
&&
> >> +             ovn-nbctl --wait=hv sync]
> >> +        )
> >> +        OVN_CONTROLLER_EXPECT_NO_HIT(
> >> +            [hv$i hv$j], [lflow_run],
> >> +            [as hv$i ovs-vsctl set Interface $vif
external-ids:iface-id=$lp &&
> >> +             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
> >> +             ovn-nbctl --wait=hv sync]
> >> +        )
> >> +    done
> >>  done
> >>
> >>  for i in 1 2; do
> >
>
diff mbox series

Patch

diff --git a/controller/binding.c b/controller/binding.c
index 7281b0485..0563d9d1e 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1907,6 +1907,71 @@  is_iface_in_int_bridge(const struct ovsrec_interface *iface,
     return false;
 }
 
+static bool
+is_ext_id_changed(const struct smap *a, const struct smap *b, const char *key)
+{
+    const char *value_a = smap_get(a, key);
+    const char *value_b = smap_get(b, key);
+    if ((value_a && !value_b)
+        || (!value_a && value_b)
+        || (value_a && value_b && strcmp(value_a, value_b))) {
+        return true;
+    }
+    return false;
+}
+
+/* Check if the change in 'iface_rec' is something we are interested in from
+ * port binding perspective.  Return true if the change needs to be handled,
+ * otherwise return false.
+ *
+ * The 'iface_rec' must be change tracked, i.e. iterator from
+ * OVSREC_INTERFACE_TABLE_FOR_EACH_TRACKED. */
+static bool
+ovs_interface_change_need_handle(const struct ovsrec_interface *iface_rec,
+                                 struct shash *iface_table_external_ids_old)
+{
+    if (ovsrec_interface_is_updated(iface_rec,
+                                    OVSREC_INTERFACE_COL_NAME)) {
+        return true;
+    }
+    if (ovsrec_interface_is_updated(iface_rec,
+                                    OVSREC_INTERFACE_COL_OFPORT)) {
+        return true;
+    }
+    if (ovsrec_interface_is_updated(iface_rec,
+                                    OVSREC_INTERFACE_COL_TYPE)) {
+        return true;
+    }
+    if (ovsrec_interface_is_updated(iface_rec,
+                                    OVSREC_INTERFACE_COL_EXTERNAL_IDS)) {
+        /* Compare the external_ids that we are interested in with the old
+         * values:
+         * - iface-id
+         * - iface-id-ver
+         * - encap-ip
+         * For any other changes, such as ovn-installed, ovn-installed-ts, etc,
+         * we don't need to handle. */
+        struct smap *external_ids_old =
+            shash_find_data(iface_table_external_ids_old, iface_rec->name);
+        if (!external_ids_old) {
+            return true;
+        }
+        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
+                              "iface-id")) {
+            return true;
+        }
+        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
+                              "iface-id-ver")) {
+            return true;
+        }
+        if (is_ext_id_changed(&iface_rec->external_ids, external_ids_old,
+                              "encap-ip")) {
+            return true;
+        }
+    }
+    return false;
+}
+
 /* Returns true if the ovs interface changes were handled successfully,
  * false otherwise.
  */
@@ -2018,6 +2083,11 @@  binding_handle_ovs_interface_changes(struct binding_ctx_in *b_ctx_in,
             continue;
         }
 
+        if (!ovs_interface_change_need_handle(
+            iface_rec, b_ctx_in->iface_table_external_ids_old)) {
+            continue;
+        }
+
         const char *iface_id = smap_get(&iface_rec->external_ids, "iface-id");
         int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
         if (iface_id && ofport > 0 &&
diff --git a/controller/binding.h b/controller/binding.h
index 430a8d9b1..e49e1ebb6 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -55,6 +55,7 @@  struct binding_ctx_in {
     const struct ovsrec_bridge_table *bridge_table;
     const struct ovsrec_open_vswitch_table *ovs_table;
     const struct ovsrec_interface_table *iface_table;
+    struct shash *iface_table_external_ids_old;
 };
 
 /* Locally relevant port bindings, e.g., VIFs that might be bound locally,
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 22b7fa935..649353f7d 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1042,6 +1042,91 @@  en_ofctrl_is_connected_run(struct engine_node *node, void *data)
     engine_set_node_state(node, EN_UNCHANGED);
 }
 
+/* This engine node is to wrap the OVS_interface input and maintain a copy of
+ * the old version of data for the column external_ids.
+ *
+ * There are some special considerations of this engine node:
+ * 1. It has a single input OVS_interface, and it transparently passes the
+ *    input changes as its own output data to its dependants. So there is no
+ *    processing to OVS_interface changes but simply mark the node status as
+ *    UPDATED (and so the run() and the change handler is the same).
+ * 2. The iface_table_external_ids_old is computed/updated in the member
+ *    clear_tracked_data(), because that is when the last round of processing
+ *    has completed but the new IDL data is yet to refresh, so we replace the
+ *    old data with the current data. */
+struct ed_type_ovs_interface_shadow {
+    struct ovsrec_interface_table *iface_table;
+    struct shash iface_table_external_ids_old;
+};
+
+static void *
+en_ovs_interface_shadow_init(struct engine_node *node OVS_UNUSED,
+                             struct engine_arg *arg OVS_UNUSED)
+{
+    struct ed_type_ovs_interface_shadow *data = xzalloc(sizeof *data);
+    data->iface_table = NULL;
+    shash_init(&data->iface_table_external_ids_old);
+
+    return data;
+}
+
+static void
+iface_table_external_ids_old_destroy(struct shash *table_ext_ids)
+{
+    struct shash_node *node;
+    SHASH_FOR_EACH_SAFE (node, table_ext_ids) {
+        struct smap *ext_ids = node->data;
+        smap_destroy(ext_ids);
+    }
+    shash_destroy_free_data(table_ext_ids);
+}
+
+static void
+en_ovs_interface_shadow_cleanup(void *data_)
+{
+    struct ed_type_ovs_interface_shadow *data = data_;
+    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
+}
+
+static void
+en_ovs_interface_shadow_clear_tracked_data(void *data_)
+{
+    struct ed_type_ovs_interface_shadow *data = data_;
+    iface_table_external_ids_old_destroy(&data->iface_table_external_ids_old);
+    shash_init(&data->iface_table_external_ids_old);
+
+    if (!data->iface_table) {
+        return;
+    }
+
+    const struct ovsrec_interface *iface_rec;
+    OVSREC_INTERFACE_TABLE_FOR_EACH (iface_rec, data->iface_table) {
+        struct smap *external_ids = xmalloc(sizeof *external_ids);
+        smap_clone(external_ids, &iface_rec->external_ids);
+        shash_add(&data->iface_table_external_ids_old, iface_rec->name,
+                  external_ids);
+    }
+}
+
+static void
+en_ovs_interface_shadow_run(struct engine_node *node, void *data_)
+{
+    struct ed_type_ovs_interface_shadow *data = data_;
+    struct ovsrec_interface_table *iface_table =
+        (struct ovsrec_interface_table *)EN_OVSDB_GET(
+            engine_get_input("OVS_interface", node));
+    data->iface_table = iface_table;
+    engine_set_node_state(node, EN_UPDATED);
+}
+
+static bool
+ovs_interface_shadow_ovs_interface_handler(struct engine_node *node,
+                                           void *data_)
+{
+    en_ovs_interface_shadow_run(node, data_);
+    return true;
+}
+
 struct ed_type_runtime_data {
     /* Contains "struct local_datapath" nodes. */
     struct hmap local_datapaths;
@@ -1214,9 +1299,8 @@  init_binding_ctx(struct engine_node *node,
         (struct ovsrec_port_table *)EN_OVSDB_GET(
             engine_get_input("OVS_port", node));
 
-    struct ovsrec_interface_table *iface_table =
-        (struct ovsrec_interface_table *)EN_OVSDB_GET(
-            engine_get_input("OVS_interface", node));
+    struct ed_type_ovs_interface_shadow *iface_shadow =
+        engine_get_input_data("ovs_interface_shadow", node);
 
     struct ovsrec_qos_table *qos_table =
         (struct ovsrec_qos_table *)EN_OVSDB_GET(
@@ -1249,7 +1333,9 @@  init_binding_ctx(struct engine_node *node,
     b_ctx_in->sbrec_port_binding_by_datapath = sbrec_port_binding_by_datapath;
     b_ctx_in->sbrec_port_binding_by_name = sbrec_port_binding_by_name;
     b_ctx_in->port_table = port_table;
-    b_ctx_in->iface_table = iface_table;
+    b_ctx_in->iface_table = iface_shadow->iface_table;
+    b_ctx_in->iface_table_external_ids_old =
+        &iface_shadow->iface_table_external_ids_old;
     b_ctx_in->qos_table = qos_table;
     b_ctx_in->port_binding_table = pb_table;
     b_ctx_in->br_int = br_int;
@@ -1331,7 +1417,7 @@  en_runtime_data_run(struct engine_node *node, void *data)
 }
 
 static bool
-runtime_data_ovs_interface_handler(struct engine_node *node, void *data)
+runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
 {
     struct ed_type_runtime_data *rt_data = data;
     struct binding_ctx_in b_ctx_in;
@@ -3338,6 +3424,8 @@  main(int argc, char *argv[])
 
     /* Define inc-proc-engine nodes. */
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
+    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
+                                      "ovs_interface_shadow");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
     ENGINE_NODE(non_vif_data, "non_vif_data");
     ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
@@ -3459,6 +3547,9 @@  main(int argc, char *argv[])
     engine_add_input(&en_ct_zones, &en_runtime_data,
                      ct_zones_runtime_data_handler);
 
+    engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
+                     ovs_interface_shadow_ovs_interface_handler);
+
     engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
 
     engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
@@ -3480,8 +3571,8 @@  main(int argc, char *argv[])
      */
     engine_add_input(&en_runtime_data, &en_ovs_port,
                      engine_noop_handler);
-    engine_add_input(&en_runtime_data, &en_ovs_interface,
-                     runtime_data_ovs_interface_handler);
+    engine_add_input(&en_runtime_data, &en_ovs_interface_shadow,
+                     runtime_data_ovs_interface_shadow_handler);
 
     engine_add_input(&en_flow_output, &en_lflow_output,
                      flow_output_lflow_output_handler);
diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
index bc133f93b..9affca498 100644
--- a/tests/ovn-performance.at
+++ b/tests/ovn-performance.at
@@ -366,6 +366,23 @@  for i in 1 2; do
          ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
          ovn-nbctl --wait=hv sync]
     )
+
+    # Delete and recreate $lp to make it unbind and rebind multiple times, and
+    # ensure no recompute is triggered.
+    for k in $(seq 10); do
+        OVN_CONTROLLER_EXPECT_NO_HIT(
+            [hv$i hv$j], [lflow_run],
+            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=xxxx &&
+             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=false' &&
+             ovn-nbctl --wait=hv sync]
+        )
+        OVN_CONTROLLER_EXPECT_NO_HIT(
+            [hv$i hv$j], [lflow_run],
+            [as hv$i ovs-vsctl set Interface $vif external-ids:iface-id=$lp &&
+             ovn-nbctl wait-until Logical_Switch_Port $lp 'up=true' &&
+             ovn-nbctl --wait=hv sync]
+        )
+    done
 done
 
 for i in 1 2; do