diff mbox series

[ovs-dev,ovn,2/2] ovn-controller: Fix flow installation latency caused by recompute.

Message ID 1564450434-104928-2-git-send-email-hzhou8@ebay.com
State Accepted
Headers show
Series [ovs-dev,ovn,1/2] ovn-controller: Fix path for lib/inc-proc-eng.h after OVN split. | expand

Commit Message

Han Zhou July 30, 2019, 1:33 a.m. UTC
From: Han Zhou <hzhou8@ebay.com>

When there are in-flight flow-installations pending to ovs-vswitchd,
current incremental processing logic prioritizes new change handling.
However, in scenarios where frequent recomputes are triggered, the
later recompute would block the flow-installation for previously
computed flows because recompute usually takes long time, especially
when there are large number of flows. This results in worse latency
than the version without incremental processing in specific scale
test scenarios.

While we can simply fix the problem by prioritizing flow installation
rather than new change handling, it can cause the incremental
processing to degrade to always recompute in certain scenarios when
there are some changes triggering recomputes, followed by a lot of
continously coming changes that can be handled incrementally. Because
OVSDB change tracker cannot preserve changes across iterations, once
the recompute is triggered and resulted in a lot of pending messages
to ovs-vswitchd, and if we choose to skip the engine_run()
in the next iteration when a incrementally processible change comes,
we miss the opportunity to handle that tracked change and will have
to trigger recompute again in the next next iteration, and so on, if
such changes come continously.

This patch solves the problem by introducing engine_set_abort_recompute(),
so that we can prioritize new change handling if the change can be
incrementally processed, but if the change triggers recompute, we
abort there without spending CPU on the recompute to avoid blocking
the previous computed flow installation.

Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
Tested-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
---
 controller/ofctrl.c         |  2 +-
 controller/ofctrl.h         |  1 +
 controller/ovn-controller.c | 30 +++++++++++++++++++++++++++---
 lib/inc-proc-eng.c          | 26 ++++++++++++++++++++++----
 lib/inc-proc-eng.h          |  9 +++++++--
 5 files changed, 58 insertions(+), 10 deletions(-)

Comments

Numan Siddique July 30, 2019, 4:09 a.m. UTC | #1
On Tue, Jul 30, 2019 at 7:10 AM Han Zhou <zhouhan@gmail.com> wrote:

> From: Han Zhou <hzhou8@ebay.com>
>
> When there are in-flight flow-installations pending to ovs-vswitchd,
> current incremental processing logic prioritizes new change handling.
> However, in scenarios where frequent recomputes are triggered, the
> later recompute would block the flow-installation for previously
> computed flows because recompute usually takes long time, especially
> when there are large number of flows. This results in worse latency
> than the version without incremental processing in specific scale
> test scenarios.
>
> While we can simply fix the problem by prioritizing flow installation
> rather than new change handling, it can cause the incremental
> processing to degrade to always recompute in certain scenarios when
> there are some changes triggering recomputes, followed by a lot of
> continously coming changes that can be handled incrementally. Because
> OVSDB change tracker cannot preserve changes across iterations, once
> the recompute is triggered and resulted in a lot of pending messages
> to ovs-vswitchd, and if we choose to skip the engine_run()
> in the next iteration when a incrementally processible change comes,
> we miss the opportunity to handle that tracked change and will have
> to trigger recompute again in the next next iteration, and so on, if
> such changes come continously.
>
> This patch solves the problem by introducing engine_set_abort_recompute(),
> so that we can prioritize new change handling if the change can be
> incrementally processed, but if the change triggers recompute, we
> abort there without spending CPU on the recompute to avoid blocking
> the previous computed flow installation.
>
> Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
> Reported-by: Numan Siddique <nusiddiq@redhat.com>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
> Tested-by: Numan Siddique <nusiddiq@redhat.com>
> Acked-by: Numan Siddique <nusiddiq@redhat.com>
> Acked-by: Mark Michelson <mmichels@redhat.com>
> Signed-off-by: Han Zhou <hzhou8@ebay.com>
> ---
>

Thanks. I applied this series to master.

Numan


>  controller/ofctrl.c         |  2 +-
>  controller/ofctrl.h         |  1 +
>  controller/ovn-controller.c | 30 +++++++++++++++++++++++++++---
>  lib/inc-proc-eng.c          | 26 ++++++++++++++++++++++----
>  lib/inc-proc-eng.h          |  9 +++++++--
>  5 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 4d24a8b..8928205 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
>   * in the correct state and not backlogged with existing flow_mods.  (Our
>   * criteria for being backlogged appear very conservative, but the socket
>   * between ovn-controller and OVS provides some buffering.) */
> -static bool
> +bool
>  ofctrl_can_put(void)
>  {
>      if (state != S_UPDATE_FLOWS
> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> index 114c9ef..1e9ac16 100644
> --- a/controller/ofctrl.h
> +++ b/controller/ofctrl.h
> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
>                  const struct sbrec_meter_table *,
>                  int64_t nb_cfg,
>                  bool flow_changed);
> +bool ofctrl_can_put(void);
>  void ofctrl_wait(void);
>  void ofctrl_destroy(void);
>  int64_t ofctrl_get_cur_cfg(void);
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index d3b28b9..ad33d17 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1896,6 +1896,7 @@ main(int argc, char *argv[])
>
>      uint64_t engine_run_id = 0;
>      uint64_t old_engine_run_id = 0;
> +    bool engine_aborted = false;
>
>      unsigned int ovs_cond_seqno = UINT_MAX;
>      unsigned int ovnsb_cond_seqno = UINT_MAX;
> @@ -1982,7 +1983,30 @@ main(int argc, char *argv[])
>                      stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>                                      time_msec());
>                      if (ovnsb_idl_txn) {
> -                        engine_run(&en_flow_output, ++engine_run_id);
> +                        if (!ofctrl_can_put()) {
> +                            /* When there are in-flight messages pending
> to
> +                             * ovs-vswitchd, we should hold on
> recomputing so
> +                             * that the previous flow installations won't
> be
> +                             * delayed.  However, we still want to try if
> +                             * recompute is not needed and we can quickly
> +                             * incrementally process the new changes, to
> avoid
> +                             * unnecessarily forced recomputes later on.
> This
> +                             * is because the OVSDB change tracker cannot
> +                             * preserve tracked changes across
> iterations.  If
> +                             * change tracking is improved, we can simply
> skip
> +                             * this round of engine_run and continue
> processing
> +                             * acculated changes incrementally later when
> +                             * ofctrl_can_put() returns true. */
> +                            if (!engine_aborted) {
> +                                engine_set_abort_recompute(true);
> +                                engine_aborted =
> engine_run(&en_flow_output,
> +
> ++engine_run_id);
> +                            }
> +                        } else {
> +                            engine_set_abort_recompute(false);
> +                            engine_aborted = false;
> +                            engine_run(&en_flow_output, ++engine_run_id);
> +                        }
>                      }
>                      stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>                                     time_msec());
> @@ -2024,8 +2048,8 @@ main(int argc, char *argv[])
>                  }
>
>              }
> -            if (old_engine_run_id == engine_run_id) {
> -                if (engine_need_run(&en_flow_output)) {
> +            if (old_engine_run_id == engine_run_id || engine_aborted) {
> +                if (engine_aborted || engine_need_run(&en_flow_output)) {
>                      VLOG_DBG("engine did not run, force recompute next
> time: "
>                               "br_int %p, chassis %p", br_int, chassis);
>                      engine_set_force_recompute(true);
> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
> index 1ddea1a..1064a08 100644
> --- a/lib/inc-proc-eng.c
> +++ b/lib/inc-proc-eng.c
> @@ -31,6 +31,7 @@
>  VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
>
>  static bool engine_force_recompute = false;
> +static bool engine_abort_recompute = false;
>  static const struct engine_context *engine_context;
>
>  void
> @@ -39,6 +40,12 @@ engine_set_force_recompute(bool val)
>      engine_force_recompute = val;
>  }
>
> +void
> +engine_set_abort_recompute(bool val)
> +{
> +    engine_abort_recompute = val;
> +}
> +
>  const struct engine_context *
>  engine_get_context(void)
>  {
> @@ -121,11 +128,11 @@ engine_ovsdb_node_add_index(struct engine_node
> *node, const char *name,
>      ed->n_indexes ++;
>  }
>
> -void
> +bool
>  engine_run(struct engine_node *node, uint64_t run_id)
>  {
>      if (node->run_id == run_id) {
> -        return;
> +        return true;
>      }
>      node->run_id = run_id;
>
> @@ -133,11 +140,13 @@ engine_run(struct engine_node *node, uint64_t run_id)
>      if (!node->n_inputs) {
>          node->run(node);
>          VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
> -        return;
> +        return true;
>      }
>
>      for (size_t i = 0; i < node->n_inputs; i++) {
> -        engine_run(node->inputs[i].node, run_id);
> +        if (!engine_run(node->inputs[i].node, run_id)) {
> +            return false;
> +        }
>      }
>
>      bool need_compute = false;
> @@ -160,6 +169,10 @@ engine_run(struct engine_node *node, uint64_t run_id)
>      if (need_recompute) {
>          VLOG_DBG("node: %s, recompute (%s)", node->name,
>                   engine_force_recompute ? "forced" : "triggered");
> +        if (engine_abort_recompute) {
> +            VLOG_DBG("node: %s, recompute aborted", node->name);
> +            return false;
> +        }
>          node->run(node);
>      } else if (need_compute) {
>          for (size_t i = 0; i < node->n_inputs; i++) {
> @@ -170,6 +183,10 @@ engine_run(struct engine_node *node, uint64_t run_id)
>                      VLOG_DBG("node: %s, can't handle change for input %s,
> "
>                               "fall back to recompute",
>                               node->name, node->inputs[i].node->name);
> +                    if (engine_abort_recompute) {
> +                        VLOG_DBG("node: %s, recompute aborted",
> node->name);
> +                        return false;
> +                    }
>                      node->run(node);
>                      break;
>                  }
> @@ -178,6 +195,7 @@ engine_run(struct engine_node *node, uint64_t run_id)
>      }
>
>      VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
> +    return true;
>  }
>
>  bool
> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> index 6f0d08d..3a69dc2 100644
> --- a/lib/inc-proc-eng.h
> +++ b/lib/inc-proc-eng.h
> @@ -121,8 +121,9 @@ struct engine_node {
>  void engine_init(struct engine_node *);
>
>  /* Execute the processing recursively, which should be called in the main
> - * loop. */
> -void engine_run(struct engine_node *, uint64_t run_id);
> + * loop. Returns true if the execution is compelte, false if it is
> aborted,
> + * which could happen when engine_abort_recompute is set. */
> +bool engine_run(struct engine_node *, uint64_t run_id);
>
>  /* Clean up the data for the engine nodes recursively. It calls each
> node's
>   * cleanup() method if not NULL. It should be called before the program
> @@ -150,6 +151,10 @@ void engine_add_input(struct engine_node *node,
> struct engine_node *input,
>   * iteration, and the change can't be tracked across iterations */
>  void engine_set_force_recompute(bool val);
>
> +/* Set the flag to cause engine execution to be aborted when there
> + * is any recompute to be triggered in any node. */
> +void engine_set_abort_recompute(bool val);
> +
>  const struct engine_context * engine_get_context(void);
>
>  void engine_set_context(const struct engine_context *);
> --
> 2.1.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Numan Siddique July 30, 2019, 4:11 a.m. UTC | #2
On Tue, Jul 30, 2019 at 9:39 AM Numan Siddique <nusiddiq@redhat.com> wrote:

>
>
> On Tue, Jul 30, 2019 at 7:10 AM Han Zhou <zhouhan@gmail.com> wrote:
>
>> From: Han Zhou <hzhou8@ebay.com>
>>
>> When there are in-flight flow-installations pending to ovs-vswitchd,
>> current incremental processing logic prioritizes new change handling.
>> However, in scenarios where frequent recomputes are triggered, the
>> later recompute would block the flow-installation for previously
>> computed flows because recompute usually takes long time, especially
>> when there are large number of flows. This results in worse latency
>> than the version without incremental processing in specific scale
>> test scenarios.
>>
>> While we can simply fix the problem by prioritizing flow installation
>> rather than new change handling, it can cause the incremental
>> processing to degrade to always recompute in certain scenarios when
>> there are some changes triggering recomputes, followed by a lot of
>> continously coming changes that can be handled incrementally. Because
>> OVSDB change tracker cannot preserve changes across iterations, once
>> the recompute is triggered and resulted in a lot of pending messages
>> to ovs-vswitchd, and if we choose to skip the engine_run()
>> in the next iteration when a incrementally processible change comes,
>> we miss the opportunity to handle that tracked change and will have
>> to trigger recompute again in the next next iteration, and so on, if
>> such changes come continously.
>>
>> This patch solves the problem by introducing engine_set_abort_recompute(),
>> so that we can prioritize new change handling if the change can be
>> incrementally processed, but if the change triggers recompute, we
>> abort there without spending CPU on the recompute to avoid blocking
>> the previous computed flow installation.
>>
>> Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
>> Reported-by: Numan Siddique <nusiddiq@redhat.com>
>> Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
>> Tested-by: Numan Siddique <nusiddiq@redhat.com>
>> Acked-by: Numan Siddique <nusiddiq@redhat.com>
>> Acked-by: Mark Michelson <mmichels@redhat.com>
>> Signed-off-by: Han Zhou <hzhou8@ebay.com>
>> ---
>>
>
> Thanks. I applied this series to master.
>
> Numan
>

I think we need this fix for branch 2.12, can you please post the back port
patch.

Thanks
Numan


>
>
>>  controller/ofctrl.c         |  2 +-
>>  controller/ofctrl.h         |  1 +
>>  controller/ovn-controller.c | 30 +++++++++++++++++++++++++++---
>>  lib/inc-proc-eng.c          | 26 ++++++++++++++++++++++----
>>  lib/inc-proc-eng.h          |  9 +++++++--
>>  5 files changed, 58 insertions(+), 10 deletions(-)
>>
>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>> index 4d24a8b..8928205 100644
>> --- a/controller/ofctrl.c
>> +++ b/controller/ofctrl.c
>> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
>>   * in the correct state and not backlogged with existing flow_mods.  (Our
>>   * criteria for being backlogged appear very conservative, but the socket
>>   * between ovn-controller and OVS provides some buffering.) */
>> -static bool
>> +bool
>>  ofctrl_can_put(void)
>>  {
>>      if (state != S_UPDATE_FLOWS
>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>> index 114c9ef..1e9ac16 100644
>> --- a/controller/ofctrl.h
>> +++ b/controller/ofctrl.h
>> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
>>                  const struct sbrec_meter_table *,
>>                  int64_t nb_cfg,
>>                  bool flow_changed);
>> +bool ofctrl_can_put(void);
>>  void ofctrl_wait(void);
>>  void ofctrl_destroy(void);
>>  int64_t ofctrl_get_cur_cfg(void);
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index d3b28b9..ad33d17 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -1896,6 +1896,7 @@ main(int argc, char *argv[])
>>
>>      uint64_t engine_run_id = 0;
>>      uint64_t old_engine_run_id = 0;
>> +    bool engine_aborted = false;
>>
>>      unsigned int ovs_cond_seqno = UINT_MAX;
>>      unsigned int ovnsb_cond_seqno = UINT_MAX;
>> @@ -1982,7 +1983,30 @@ main(int argc, char *argv[])
>>                      stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>                                      time_msec());
>>                      if (ovnsb_idl_txn) {
>> -                        engine_run(&en_flow_output, ++engine_run_id);
>> +                        if (!ofctrl_can_put()) {
>> +                            /* When there are in-flight messages pending
>> to
>> +                             * ovs-vswitchd, we should hold on
>> recomputing so
>> +                             * that the previous flow installations
>> won't be
>> +                             * delayed.  However, we still want to try if
>> +                             * recompute is not needed and we can quickly
>> +                             * incrementally process the new changes, to
>> avoid
>> +                             * unnecessarily forced recomputes later
>> on.  This
>> +                             * is because the OVSDB change tracker cannot
>> +                             * preserve tracked changes across
>> iterations.  If
>> +                             * change tracking is improved, we can
>> simply skip
>> +                             * this round of engine_run and continue
>> processing
>> +                             * acculated changes incrementally later when
>> +                             * ofctrl_can_put() returns true. */
>> +                            if (!engine_aborted) {
>> +                                engine_set_abort_recompute(true);
>> +                                engine_aborted =
>> engine_run(&en_flow_output,
>> +
>> ++engine_run_id);
>> +                            }
>> +                        } else {
>> +                            engine_set_abort_recompute(false);
>> +                            engine_aborted = false;
>> +                            engine_run(&en_flow_output, ++engine_run_id);
>> +                        }
>>                      }
>>                      stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>                                     time_msec());
>> @@ -2024,8 +2048,8 @@ main(int argc, char *argv[])
>>                  }
>>
>>              }
>> -            if (old_engine_run_id == engine_run_id) {
>> -                if (engine_need_run(&en_flow_output)) {
>> +            if (old_engine_run_id == engine_run_id || engine_aborted) {
>> +                if (engine_aborted || engine_need_run(&en_flow_output)) {
>>                      VLOG_DBG("engine did not run, force recompute next
>> time: "
>>                               "br_int %p, chassis %p", br_int, chassis);
>>                      engine_set_force_recompute(true);
>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
>> index 1ddea1a..1064a08 100644
>> --- a/lib/inc-proc-eng.c
>> +++ b/lib/inc-proc-eng.c
>> @@ -31,6 +31,7 @@
>>  VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
>>
>>  static bool engine_force_recompute = false;
>> +static bool engine_abort_recompute = false;
>>  static const struct engine_context *engine_context;
>>
>>  void
>> @@ -39,6 +40,12 @@ engine_set_force_recompute(bool val)
>>      engine_force_recompute = val;
>>  }
>>
>> +void
>> +engine_set_abort_recompute(bool val)
>> +{
>> +    engine_abort_recompute = val;
>> +}
>> +
>>  const struct engine_context *
>>  engine_get_context(void)
>>  {
>> @@ -121,11 +128,11 @@ engine_ovsdb_node_add_index(struct engine_node
>> *node, const char *name,
>>      ed->n_indexes ++;
>>  }
>>
>> -void
>> +bool
>>  engine_run(struct engine_node *node, uint64_t run_id)
>>  {
>>      if (node->run_id == run_id) {
>> -        return;
>> +        return true;
>>      }
>>      node->run_id = run_id;
>>
>> @@ -133,11 +140,13 @@ engine_run(struct engine_node *node, uint64_t
>> run_id)
>>      if (!node->n_inputs) {
>>          node->run(node);
>>          VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
>> -        return;
>> +        return true;
>>      }
>>
>>      for (size_t i = 0; i < node->n_inputs; i++) {
>> -        engine_run(node->inputs[i].node, run_id);
>> +        if (!engine_run(node->inputs[i].node, run_id)) {
>> +            return false;
>> +        }
>>      }
>>
>>      bool need_compute = false;
>> @@ -160,6 +169,10 @@ engine_run(struct engine_node *node, uint64_t run_id)
>>      if (need_recompute) {
>>          VLOG_DBG("node: %s, recompute (%s)", node->name,
>>                   engine_force_recompute ? "forced" : "triggered");
>> +        if (engine_abort_recompute) {
>> +            VLOG_DBG("node: %s, recompute aborted", node->name);
>> +            return false;
>> +        }
>>          node->run(node);
>>      } else if (need_compute) {
>>          for (size_t i = 0; i < node->n_inputs; i++) {
>> @@ -170,6 +183,10 @@ engine_run(struct engine_node *node, uint64_t run_id)
>>                      VLOG_DBG("node: %s, can't handle change for input
>> %s, "
>>                               "fall back to recompute",
>>                               node->name, node->inputs[i].node->name);
>> +                    if (engine_abort_recompute) {
>> +                        VLOG_DBG("node: %s, recompute aborted",
>> node->name);
>> +                        return false;
>> +                    }
>>                      node->run(node);
>>                      break;
>>                  }
>> @@ -178,6 +195,7 @@ engine_run(struct engine_node *node, uint64_t run_id)
>>      }
>>
>>      VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
>> +    return true;
>>  }
>>
>>  bool
>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
>> index 6f0d08d..3a69dc2 100644
>> --- a/lib/inc-proc-eng.h
>> +++ b/lib/inc-proc-eng.h
>> @@ -121,8 +121,9 @@ struct engine_node {
>>  void engine_init(struct engine_node *);
>>
>>  /* Execute the processing recursively, which should be called in the main
>> - * loop. */
>> -void engine_run(struct engine_node *, uint64_t run_id);
>> + * loop. Returns true if the execution is compelte, false if it is
>> aborted,
>> + * which could happen when engine_abort_recompute is set. */
>> +bool engine_run(struct engine_node *, uint64_t run_id);
>>
>>  /* Clean up the data for the engine nodes recursively. It calls each
>> node's
>>   * cleanup() method if not NULL. It should be called before the program
>> @@ -150,6 +151,10 @@ void engine_add_input(struct engine_node *node,
>> struct engine_node *input,
>>   * iteration, and the change can't be tracked across iterations */
>>  void engine_set_force_recompute(bool val);
>>
>> +/* Set the flag to cause engine execution to be aborted when there
>> + * is any recompute to be triggered in any node. */
>> +void engine_set_abort_recompute(bool val);
>> +
>>  const struct engine_context * engine_get_context(void);
>>
>>  void engine_set_context(const struct engine_context *);
>> --
>> 2.1.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Han Zhou July 30, 2019, 7:14 a.m. UTC | #3
On Mon, Jul 29, 2019 at 9:11 PM Numan Siddique <nusiddiq@redhat.com> wrote:
>
>
>
> On Tue, Jul 30, 2019 at 9:39 AM Numan Siddique <nusiddiq@redhat.com>
wrote:
>>
>>
>>
>> On Tue, Jul 30, 2019 at 7:10 AM Han Zhou <zhouhan@gmail.com> wrote:
>>>
>>> From: Han Zhou <hzhou8@ebay.com>
>>>
>>> When there are in-flight flow-installations pending to ovs-vswitchd,
>>> current incremental processing logic prioritizes new change handling.
>>> However, in scenarios where frequent recomputes are triggered, the
>>> later recompute would block the flow-installation for previously
>>> computed flows because recompute usually takes long time, especially
>>> when there are large number of flows. This results in worse latency
>>> than the version without incremental processing in specific scale
>>> test scenarios.
>>>
>>> While we can simply fix the problem by prioritizing flow installation
>>> rather than new change handling, it can cause the incremental
>>> processing to degrade to always recompute in certain scenarios when
>>> there are some changes triggering recomputes, followed by a lot of
>>> continously coming changes that can be handled incrementally. Because
>>> OVSDB change tracker cannot preserve changes across iterations, once
>>> the recompute is triggered and resulted in a lot of pending messages
>>> to ovs-vswitchd, and if we choose to skip the engine_run()
>>> in the next iteration when a incrementally processible change comes,
>>> we miss the opportunity to handle that tracked change and will have
>>> to trigger recompute again in the next next iteration, and so on, if
>>> such changes come continously.
>>>
>>> This patch solves the problem by introducing
engine_set_abort_recompute(),
>>> so that we can prioritize new change handling if the change can be
>>> incrementally processed, but if the change triggers recompute, we
>>> abort there without spending CPU on the recompute to avoid blocking
>>> the previous computed flow installation.
>>>
>>> Reported-by: Daniel Alvarez Sanchez <dalvarez@redhat.com>
>>> Reported-by: Numan Siddique <nusiddiq@redhat.com>
>>> Reported-at:
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
>>> Tested-by: Numan Siddique <nusiddiq@redhat.com>
>>> Acked-by: Numan Siddique <nusiddiq@redhat.com>
>>> Acked-by: Mark Michelson <mmichels@redhat.com>
>>> Signed-off-by: Han Zhou <hzhou8@ebay.com>
>>> ---
>>
>>
>> Thanks. I applied this series to master.
>>
>> Numan
>
>
> I think we need this fix for branch 2.12, can you please post the back
port patch.
>
> Thanks
> Numan
>
Thank Numan. For 2.12, I think the original patch posted should work:
https://patchwork.ozlabs.org/patch/1135991/
diff mbox series

Patch

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 4d24a8b..8928205 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -985,7 +985,7 @@  add_meter(struct ovn_extend_table_info *m_desired,
  * in the correct state and not backlogged with existing flow_mods.  (Our
  * criteria for being backlogged appear very conservative, but the socket
  * between ovn-controller and OVS provides some buffering.) */
-static bool
+bool
 ofctrl_can_put(void)
 {
     if (state != S_UPDATE_FLOWS
diff --git a/controller/ofctrl.h b/controller/ofctrl.h
index 114c9ef..1e9ac16 100644
--- a/controller/ofctrl.h
+++ b/controller/ofctrl.h
@@ -51,6 +51,7 @@  void ofctrl_put(struct ovn_desired_flow_table *,
                 const struct sbrec_meter_table *,
                 int64_t nb_cfg,
                 bool flow_changed);
+bool ofctrl_can_put(void);
 void ofctrl_wait(void);
 void ofctrl_destroy(void);
 int64_t ofctrl_get_cur_cfg(void);
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index d3b28b9..ad33d17 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1896,6 +1896,7 @@  main(int argc, char *argv[])
 
     uint64_t engine_run_id = 0;
     uint64_t old_engine_run_id = 0;
+    bool engine_aborted = false;
 
     unsigned int ovs_cond_seqno = UINT_MAX;
     unsigned int ovnsb_cond_seqno = UINT_MAX;
@@ -1982,7 +1983,30 @@  main(int argc, char *argv[])
                     stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
                                     time_msec());
                     if (ovnsb_idl_txn) {
-                        engine_run(&en_flow_output, ++engine_run_id);
+                        if (!ofctrl_can_put()) {
+                            /* When there are in-flight messages pending to
+                             * ovs-vswitchd, we should hold on recomputing so
+                             * that the previous flow installations won't be
+                             * delayed.  However, we still want to try if
+                             * recompute is not needed and we can quickly
+                             * incrementally process the new changes, to avoid
+                             * unnecessarily forced recomputes later on.  This
+                             * is because the OVSDB change tracker cannot
+                             * preserve tracked changes across iterations.  If
+                             * change tracking is improved, we can simply skip
+                             * this round of engine_run and continue processing
+                             * acculated changes incrementally later when
+                             * ofctrl_can_put() returns true. */
+                            if (!engine_aborted) {
+                                engine_set_abort_recompute(true);
+                                engine_aborted = engine_run(&en_flow_output,
+                                                            ++engine_run_id);
+                            }
+                        } else {
+                            engine_set_abort_recompute(false);
+                            engine_aborted = false;
+                            engine_run(&en_flow_output, ++engine_run_id);
+                        }
                     }
                     stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
                                    time_msec());
@@ -2024,8 +2048,8 @@  main(int argc, char *argv[])
                 }
 
             }
-            if (old_engine_run_id == engine_run_id) {
-                if (engine_need_run(&en_flow_output)) {
+            if (old_engine_run_id == engine_run_id || engine_aborted) {
+                if (engine_aborted || engine_need_run(&en_flow_output)) {
                     VLOG_DBG("engine did not run, force recompute next time: "
                              "br_int %p, chassis %p", br_int, chassis);
                     engine_set_force_recompute(true);
diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
index 1ddea1a..1064a08 100644
--- a/lib/inc-proc-eng.c
+++ b/lib/inc-proc-eng.c
@@ -31,6 +31,7 @@ 
 VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
 
 static bool engine_force_recompute = false;
+static bool engine_abort_recompute = false;
 static const struct engine_context *engine_context;
 
 void
@@ -39,6 +40,12 @@  engine_set_force_recompute(bool val)
     engine_force_recompute = val;
 }
 
+void
+engine_set_abort_recompute(bool val)
+{
+    engine_abort_recompute = val;
+}
+
 const struct engine_context *
 engine_get_context(void)
 {
@@ -121,11 +128,11 @@  engine_ovsdb_node_add_index(struct engine_node *node, const char *name,
     ed->n_indexes ++;
 }
 
-void
+bool
 engine_run(struct engine_node *node, uint64_t run_id)
 {
     if (node->run_id == run_id) {
-        return;
+        return true;
     }
     node->run_id = run_id;
 
@@ -133,11 +140,13 @@  engine_run(struct engine_node *node, uint64_t run_id)
     if (!node->n_inputs) {
         node->run(node);
         VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
-        return;
+        return true;
     }
 
     for (size_t i = 0; i < node->n_inputs; i++) {
-        engine_run(node->inputs[i].node, run_id);
+        if (!engine_run(node->inputs[i].node, run_id)) {
+            return false;
+        }
     }
 
     bool need_compute = false;
@@ -160,6 +169,10 @@  engine_run(struct engine_node *node, uint64_t run_id)
     if (need_recompute) {
         VLOG_DBG("node: %s, recompute (%s)", node->name,
                  engine_force_recompute ? "forced" : "triggered");
+        if (engine_abort_recompute) {
+            VLOG_DBG("node: %s, recompute aborted", node->name);
+            return false;
+        }
         node->run(node);
     } else if (need_compute) {
         for (size_t i = 0; i < node->n_inputs; i++) {
@@ -170,6 +183,10 @@  engine_run(struct engine_node *node, uint64_t run_id)
                     VLOG_DBG("node: %s, can't handle change for input %s, "
                              "fall back to recompute",
                              node->name, node->inputs[i].node->name);
+                    if (engine_abort_recompute) {
+                        VLOG_DBG("node: %s, recompute aborted", node->name);
+                        return false;
+                    }
                     node->run(node);
                     break;
                 }
@@ -178,6 +195,7 @@  engine_run(struct engine_node *node, uint64_t run_id)
     }
 
     VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
+    return true;
 }
 
 bool
diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
index 6f0d08d..3a69dc2 100644
--- a/lib/inc-proc-eng.h
+++ b/lib/inc-proc-eng.h
@@ -121,8 +121,9 @@  struct engine_node {
 void engine_init(struct engine_node *);
 
 /* Execute the processing recursively, which should be called in the main
- * loop. */
-void engine_run(struct engine_node *, uint64_t run_id);
+ * loop. Returns true if the execution is compelte, false if it is aborted,
+ * which could happen when engine_abort_recompute is set. */
+bool engine_run(struct engine_node *, uint64_t run_id);
 
 /* Clean up the data for the engine nodes recursively. It calls each node's
  * cleanup() method if not NULL. It should be called before the program
@@ -150,6 +151,10 @@  void engine_add_input(struct engine_node *node, struct engine_node *input,
  * iteration, and the change can't be tracked across iterations */
 void engine_set_force_recompute(bool val);
 
+/* Set the flag to cause engine execution to be aborted when there
+ * is any recompute to be triggered in any node. */
+void engine_set_abort_recompute(bool val);
+
 const struct engine_context * engine_get_context(void);
 
 void engine_set_context(const struct engine_context *);