[ovs-dev,v6,2/5] Measure timing of ovn-controller loops.

Message ID 20180307170115.29137-3-mmichels@redhat.com
State Superseded
Headers show
Series
  • Stopwatch Library
Related show

Commit Message

Mark Michelson March 7, 2018, 5:01 p.m.
This modifies ovn-controller to measure the amount of time it takes to
detect a change in the southbound database and generate the resulting
flow table. This may require multiple iterations of the ovn-controller
loop.

The statistics can be queried using:

ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop

The statistics can be reset using:

ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop

Signed-off-by: Mark Michelson <mmichels@redhat.com>
---
 ovn/controller/ovn-controller.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Han Zhou March 15, 2018, 5:04 p.m. | #1
On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmichels@redhat.com> wrote:
>
> This modifies ovn-controller to measure the amount of time it takes to
> detect a change in the southbound database and generate the resulting
> flow table. This may require multiple iterations of the ovn-controller
> loop.
>
> The statistics can be queried using:
>
> ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>
> The statistics can be reset using:
>
> ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>
If the purpose of this patch is for measuring *SB changes* handling time
only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since the
controller loop handles many other things, such as local interface changes,
Packet-ins, OF messages, etc.

> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> ---
>  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
> index 7592bda25..abd253fca 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -57,6 +57,9 @@
>  #include "stream.h"
>  #include "unixctl.h"
>  #include "util.h"
> +#include "timeval.h"
> +#include "timer.h"
> +#include "stopwatch.h"
>
>  VLOG_DEFINE_THIS_MODULE(main);
>
> @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>  #define DEFAULT_BRIDGE_NAME "br-int"
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>
> +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
> +
>  static void update_probe_interval(struct controller_ctx *,
>                                    const char *ovnsb_remote);
>  static void parse_options(int argc, char *argv[]);
> @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1, inject_pkt,
>                               &pending_pkt);
>
> +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>      /* Main loop. */
>      exiting = false;
> +    unsigned int our_seqno = 0;
>      while (!exiting) {
>          /* Check OVN SB database. */
>          char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
> @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>          };
>
> +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
> +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
> +                            time_msec());
When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
> +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
> +        }
> +
>          update_probe_interval(&ctx, ovnsb_remote);
>
>          update_ssl_config(ctx.ovs_idl);
> @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>                      ofctrl_put(&flow_table, &pending_ct_zones,
>                                 get_nb_cfg(ctx.ovnsb_idl));
>
> +                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
> +                                   time_msec());
> +

Two issues here:

Firstly, we start the stopwatch only when seqno changes, but we will always
call the stop, which may result in many unnecessary messages to the
stopwatch thread. Maybe it is not a big issue, but it would be better if we
have a variable to control it to stop only when necessary.

Secondly, when multiple iterations are needed for one SB change handling,
the iterations will be like:
- iter1:
    - stopwatch_start
    - ofctrl_put
    - stopwatch stop
- iter2:
    - ofctrl_run
    - (ofctrl_put skipped)
...
- iterN:
    - ofctrl_run
    - ofctrl_put (will not send to OVS, since desired and installed flows
should be same now)
    - stopwatch stop

So putting stopwatch_stop here still measures only the first iteration. I
had a suggestion earlier about checking ofctrl.cur_cfg, but I was wrong
since I realized that nb_cfg changes only upon user requesting waiting for
HV. To really measure the multi-iteration processing, we need a mechanism
that ofctrl takes current SB seqno as input and maintains the cur_seq_no
just like how cur_cfg is maintained.
>                      hmap_destroy(&flow_table);
>                  }
>                  if (ctx.ovnsb_idl_txn) {
> @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>              ofctrl_wait();
>              pinctrl_wait(&ctx);
>          }
> +
>          ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>
>          if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
> --
> 2.14.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Acked-by: Han Zhou <hzhou8@ebay.com>
Han Zhou March 21, 2018, 9:33 p.m. | #2
(Add the dev list back)

On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <mmichels@redhat.com> wrote:
>
> On 03/15/2018 12:04 PM, Han Zhou wrote:
>>
>>
>>
>> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmichels@redhat.com
<mailto:mmichels@redhat.com>> wrote:
>>  >
>>  > This modifies ovn-controller to measure the amount of time it takes to
>>  > detect a change in the southbound database and generate the resulting
>>  > flow table. This may require multiple iterations of the ovn-controller
>>  > loop.
>>  >
>>  > The statistics can be queried using:
>>  >
>>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>>  >
>>  > The statistics can be reset using:
>>  >
>>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>>  >
>> If the purpose of this patch is for measuring *SB changes* handling time
only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since the
controller loop handles many other things, such as local interface changes,
Packet-ins, OF messages, etc.
>
>
> Sure, this sounds reasonable. I'll talk about this a bit more down below,
though.
>
>>
>>  > Signed-off-by: Mark Michelson <mmichels@redhat.com <mailto:
mmichels@redhat.com>>
>>
>>  > ---
>>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>>  >  1 file changed, 17 insertions(+)
>>  >
>>  > diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
>>  > index 7592bda25..abd253fca 100644
>>  > --- a/ovn/controller/ovn-controller.c
>>  > +++ b/ovn/controller/ovn-controller.c
>>  > @@ -57,6 +57,9 @@
>>  >  #include "stream.h"
>>  >  #include "unixctl.h"
>>  >  #include "util.h"
>>  > +#include "timeval.h"
>>  > +#include "timer.h"
>>  > +#include "stopwatch.h"
>>  >
>>  >  VLOG_DEFINE_THIS_MODULE(main);
>>  >
>>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>>  >
>>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>>  > +
>>  >  static void update_probe_interval(struct controller_ctx *,
>>  >                                    const char *ovnsb_remote);
>>  >  static void parse_options(int argc, char *argv[]);
>>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1,
inject_pkt,
>>  >                               &pending_pkt);
>>  >
>>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>  >      /* Main loop. */
>>  >      exiting = false;
>>  > +    unsigned int our_seqno = 0;
>>  >      while (!exiting) {
>>  >          /* Check OVN SB database. */
>>  >          char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
>>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>  >          };
>>  >
>>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  > +                            time_msec());
>> When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
>
>
> That actually cannot happen. Check out stopwatch_start_sample_protected()
in lib/stopwatch.c. If the stopwatch currently has a sample in progress,
then the start operation immediately returns and will not overwrite the
previously recorded start time. The only way that the start time can be
recorded is if there is no sample currently in progress.

Agree. So some of the SB change processing time are silently ignored, but
it is totally fine since it is just for statistics, and it is ok to ignore
some samples in rare situation :)
>
>>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>>  > +        }
>>  > +
>>  >          update_probe_interval(&ctx, ovnsb_remote);
>>  >
>>  >          update_ssl_config(ctx.ovs_idl);
>>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>>  >
>>  > +                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  > +                                   time_msec());
>>  > +
>>
>> Two issues here:
>>
>> Firstly, we start the stopwatch only when seqno changes, but we will
always call the stop, which may result in many unnecessary messages to the
stopwatch thread. Maybe it is not a big issue, but it would be better if we
have a variable to control it to stop only when necessary.
>
>
> This is a good point. I had been viewing the extra stopwatch_stop() calls
as being inconsequential since they would be eventual no-ops. However,
you're definitely right that it leads to extra processing that we can
easily avoid by keeping track of whether we have a sample in progress.
>
>>
>> Secondly, when multiple iterations are needed for one SB change
handling, the iterations will be like:
>> - iter1:
>>      - stopwatch_start
>>      - ofctrl_put
>>      - stopwatch stop
>> - iter2:
>>      - ofctrl_run
>>      - (ofctrl_put skipped)
>> ...
>> - iterN:
>>      - ofctrl_run
>>      - ofctrl_put (will not send to OVS, since desired and installed
flows should be same now)
>>      - stopwatch stop
>>
>> So putting stopwatch_stop here still measures only the first iteration.
I had a suggestion earlier about checking ofctrl.cur_cfg, but I was wrong
since I realized that nb_cfg changes only upon user requesting waiting for
HV. To really measure the multi-iteration processing, we need a mechanism
that ofctrl takes current SB seqno as input and maintains the cur_seq_no
just like how cur_cfg is maintained.
>
>
> Thanks for the detailed explanation. I think I understand better the
position you are coming from. Let me first explain what I had been thinking.
>
> - iter1:
>     - SB seqno has changed
>     - stopwatch_start()
>     - ofctrl_can_put() returns false, so no stopwatch_stop call
ofctrl_can_put() return false means the SB change detected in this
iteration can't even be started to get added to ovs.
> ...
> - iterN:
>     - ofctrl_can_put() returns true
ofctrl_can_put() returns true means the last SB change now can get added to
OVS, but it may just get started, and it is not completed yet when multiple
iterations are needed.
>     - stopwatch_stop()
>
> On iterations 2 through N, it does not matter if the SB seqno has changed
or not. The stopwatch_start() will be a no-op since there is already a
sample being measured. The change to the southbound database that we
observed in iteration 1 gets timed until iteration N.
>
> The question here is whether my poor naming of the measurement has
created confusion. In other words, am I expecting a measurement of one
thing while you expect the measurement of something else? Or is the logic
being used here completely incorrect?
>
> What I was intending to measure was the time it takes to see a change in
the southbound database, create the resultant flows, and send those to
vswitchd. I believe what you're thinking about is just the latter two steps.

To measure the time it takes to "see a change in the southbound database,
create the resultant flows, and send those to vswitchd", I think we need to
consider the iterations in my previous description. ofctrl_can_put()
returns true means the flow sending to vswitchd starts in this iteration
for sure, but it may or may not finish in this same iteration. If
oftrl_put() finishes sending the flow in this same iteration, your current
logic works perfectly. Otherwise, if multiple iterations are needed, it is
ofctrl_run() in the next iterations that completes the flow sendings to
vswitchd. The iterations you described is not a start-to-end cycle of
SB change handling (please see my comment inlined).

As you see, to measure a real complete cycle of "SB change - flow compute -
adding to ovs" is a little bit tricky, and ofctrl_can_put() itself doesn't
provide all the information, so that's why I proposed to change the
scope/rename the statistics of this patch just to measure the controller
loop time which is straighforward, and then have the *current seq_no*
mechanism implemented in ofctrl later so that we can measure the "SB change
- flow compute - adding to ovs" cycle. Hope I didn't make it more confusing
:)

Thanks,
Han

>
>>  >                      hmap_destroy(&flow_table);
>>  >                  }
>>  >                  if (ctx.ovnsb_idl_txn) {
>>  > @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>>  >              ofctrl_wait();
>>  >              pinctrl_wait(&ctx);
>>  >          }
>>  > +
>>  >          ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>>  >
>>  >          if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
>>  > --
>>  > 2.14.3
>>  >
>>  > _______________________________________________
>>  > dev mailing list
>>  > dev@openvswitch.org <mailto:dev@openvswitch.org>
>>  > https://mail.openvswitch.org/mailman/listinfo/ovs-dev <
https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>>
>> Acked-by: Han Zhou <hzhou8@ebay.com <mailto:hzhou8@ebay.com>>
>
>
Mark Michelson March 23, 2018, 10:22 p.m. | #3
Sorry for the delayed response on this, but things are a bit busy/weird 
for me right now.

On 03/21/2018 04:33 PM, Han Zhou wrote:
> (Add the dev list back)
> 
> On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com>> wrote:
>  >
>  > On 03/15/2018 12:04 PM, Han Zhou wrote:
>  >>
>  >>
>  >>
>  >> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com> <mailto:mmichels@redhat.com 
> <mailto:mmichels@redhat.com>>> wrote:
>  >>  >
>  >>  > This modifies ovn-controller to measure the amount of time it 
> takes to
>  >>  > detect a change in the southbound database and generate the resulting
>  >>  > flow table. This may require multiple iterations of the 
> ovn-controller
>  >>  > loop.
>  >>  >
>  >>  > The statistics can be queried using:
>  >>  >
>  >>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>  >>  >
>  >>  > The statistics can be reset using:
>  >>  >
>  >>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>  >>  >
>  >> If the purpose of this patch is for measuring *SB changes* handling 
> time only, the statistics may be renamed to something like 
> ovn-controller-loop-sb. Otherwise it is a little bit misleading since 
> the controller loop handles many other things, such as local interface 
> changes, Packet-ins, OF messages, etc.
>  >
>  >
>  > Sure, this sounds reasonable. I'll talk about this a bit more down 
> below, though.
>  >
>  >>
>  >>  > Signed-off-by: Mark Michelson <mmichels@redhat.com 
> <mailto:mmichels@redhat.com> <mailto:mmichels@redhat.com 
> <mailto:mmichels@redhat.com>>>
>  >>
>  >>  > ---
>  >>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>  >>  >  1 file changed, 17 insertions(+)
>  >>  >
>  >>  > diff --git a/ovn/controller/ovn-controller.c 
> b/ovn/controller/ovn-controller.c
>  >>  > index 7592bda25..abd253fca 100644
>  >>  > --- a/ovn/controller/ovn-controller.c
>  >>  > +++ b/ovn/controller/ovn-controller.c
>  >>  > @@ -57,6 +57,9 @@
>  >>  >  #include "stream.h"
>  >>  >  #include "unixctl.h"
>  >>  >  #include "util.h"
>  >>  > +#include "timeval.h"
>  >>  > +#include "timer.h"
>  >>  > +#include "stopwatch.h"
>  >>  >
>  >>  >  VLOG_DEFINE_THIS_MODULE(main);
>  >>  >
>  >>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>  >>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>  >>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>  >>  >
>  >>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>  >>  > +
>  >>  >  static void update_probe_interval(struct controller_ctx *,
>  >>  >                                    const char *ovnsb_remote);
>  >>  >  static void parse_options(int argc, char *argv[]);
>  >>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>  >>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1, 
> inject_pkt,
>  >>  >                               &pending_pkt);
>  >>  >
>  >>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>  >>  >      /* Main loop. */
>  >>  >      exiting = false;
>  >>  > +    unsigned int our_seqno = 0;
>  >>  >      while (!exiting) {
>  >>  >          /* Check OVN SB database. */
>  >>  >          char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
>  >>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>  >>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>  >>  >          };
>  >>  >
>  >>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>  >>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>  >>  > +                            time_msec());
>  >> When multiple iterations are needed for one SB change handling, the 
> stopwatch can be started before the previous one is stopped, if seqno 
> changes in between. Could you explain the consideration here?
>  >
>  >
>  > That actually cannot happen. Check out 
> stopwatch_start_sample_protected() in lib/stopwatch.c. If the stopwatch 
> currently has a sample in progress, then the start operation immediately 
> returns and will not overwrite the previously recorded start time. The 
> only way that the start time can be recorded is if there is no sample 
> currently in progress.
> 
> Agree. So some of the SB change processing time are silently ignored, 
> but it is totally fine since it is just for statistics, and it is ok to 
> ignore some samples in rare situation :)
>  >
>  >>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>  >>  > +        }
>  >>  > +
>  >>  >          update_probe_interval(&ctx, ovnsb_remote);
>  >>  >
>  >>  >          update_ssl_config(ctx.ovs_idl);
>  >>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>  >>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>  >>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>  >>  >
>  >>  > +                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>  >>  > +                                   time_msec());
>  >>  > +
>  >>
>  >> Two issues here:
>  >>
>  >> Firstly, we start the stopwatch only when seqno changes, but we will 
> always call the stop, which may result in many unnecessary messages to 
> the stopwatch thread. Maybe it is not a big issue, but it would be 
> better if we have a variable to control it to stop only when necessary.
>  >
>  >
>  > This is a good point. I had been viewing the extra stopwatch_stop() 
> calls as being inconsequential since they would be eventual no-ops. 
> However, you're definitely right that it leads to extra processing that 
> we can easily avoid by keeping track of whether we have a sample in 
> progress.
>  >
>  >>
>  >> Secondly, when multiple iterations are needed for one SB change 
> handling, the iterations will be like:
>  >> - iter1:
>  >>      - stopwatch_start
>  >>      - ofctrl_put
>  >>      - stopwatch stop
>  >> - iter2:
>  >>      - ofctrl_run
>  >>      - (ofctrl_put skipped)
>  >> ...
>  >> - iterN:
>  >>      - ofctrl_run
>  >>      - ofctrl_put (will not send to OVS, since desired and installed 
> flows should be same now)
>  >>      - stopwatch stop
>  >>
>  >> So putting stopwatch_stop here still measures only the first 
> iteration. I had a suggestion earlier about checking ofctrl.cur_cfg, but 
> I was wrong since I realized that nb_cfg changes only upon user 
> requesting waiting for HV. To really measure the multi-iteration 
> processing, we need a mechanism that ofctrl takes current SB seqno as 
> input and maintains the cur_seq_no just like how cur_cfg is maintained.
>  >
>  >
>  > Thanks for the detailed explanation. I think I understand better the 
> position you are coming from. Let me first explain what I had been thinking.
>  >
>  > - iter1:
>  >     - SB seqno has changed
>  >     - stopwatch_start()
>  >     - ofctrl_can_put() returns false, so no stopwatch_stop call
> ofctrl_can_put() return false means the SB change detected in this 
> iteration can't even be started to get added to ovs.
>  > ...
>  > - iterN:
>  >     - ofctrl_can_put() returns true
> ofctrl_can_put() returns true means the last SB change now can get added 
> to OVS, but it may just get started, and it is not completed yet when 
> multiple iterations are needed.
>  >     - stopwatch_stop()
>  >
>  > On iterations 2 through N, it does not matter if the SB seqno has 
> changed or not. The stopwatch_start() will be a no-op since there is 
> already a sample being measured. The change to the southbound database 
> that we observed in iteration 1 gets timed until iteration N.
>  >
>  > The question here is whether my poor naming of the measurement has 
> created confusion. In other words, am I expecting a measurement of one 
> thing while you expect the measurement of something else? Or is the 
> logic being used here completely incorrect?
>  >
>  > What I was intending to measure was the time it takes to see a change 
> in the southbound database, create the resultant flows, and send those 
> to vswitchd. I believe what you're thinking about is just the latter two 
> steps.
> 
> To measure the time it takes to "see a change in the southbound 
> database, create the resultant flows, and send those to vswitchd", I 
> think we need to consider the iterations in my previous description. 
> ofctrl_can_put() returns true means the flow sending to vswitchd starts 
> in this iteration for sure, but it may or may not finish in this same 
> iteration. If oftrl_put() finishes sending the flow in this same 
> iteration, your current logic works perfectly. Otherwise, if multiple 
> iterations are needed, it is ofctrl_run() in the next iterations that 
> completes the flow sendings to vswitchd. The iterations you described is 
> not a start-to-end cycle of
> SB change handling (please see my comment inlined).
> 
> As you see, to measure a real complete cycle of "SB change - flow 
> compute - adding to ovs" is a little bit tricky, and ofctrl_can_put() 
> itself doesn't provide all the information, so that's why I proposed to 
> change the scope/rename the statistics of this patch just to measure the 
> controller loop time which is straighforward, and then have the *current 
> seq_no* mechanism implemented in ofctrl later so that we can measure the 
> "SB change - flow compute - adding to ovs" cycle. Hope I didn't make it 
> more confusing :)

This makes a great amount of sense and is not confusing. Thank you.

I think the main flaw between what I actually coded and what I said is 
the part about adding the flows to OVS. In reality, I'm way more 
concerned about the time it takes to generate flows than anything else, 
since that happens 100% in ovn-controller and is something that is good 
to have measurements for. I can think of two things I can do here that 
might be good:

1) Change the scope of the stopwatch (as well as the name) so that it 
only operates within the ofctrl_can_put() conditional. This way, all it 
is measuring is flow creation. I could call the stopwatch something like 
"ovn-controller-flow-process" or something. This way, I'm zeroing in on 
just the operation I think is a good one to be measuring. The southbound 
database doesn't play into it, nor does passing the flows down to OVS.

2) Keep the stopwatch measurements basically where they are and change 
the description of what I'm measuring. What it currently measures is the 
flow creation for a specific change to the southbound database, plus 
potentially passing it down to OVS. If I move the stopwatch_stop() call 
to before ofctrl_put(), then I think I'll be measuring the time it takes 
to process a specific change to the southbound database, but not taking 
into account sending the flows to OVS.

Another option is to implement both (1) and (2). (1) can give a general 
idea of how long it takes to generate flows, no matter the circumstance. 
(2) can give a general idea of how long it takes to create flows based 
on changes seen in the southbound database.

In either case, I think you're right that putting off the ofctrl seqno 
changes until later.

What do you think?

> 
> Thanks,
> Han
> 
>  >
>  >>  >                      hmap_destroy(&flow_table);
>  >>  >                  }
>  >>  >                  if (ctx.ovnsb_idl_txn) {
>  >>  > @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>  >>  >              ofctrl_wait();
>  >>  >              pinctrl_wait(&ctx);
>  >>  >          }
>  >>  > +
>  >>  >          ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>  >>  >
>  >>  >          if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
>  >>  > --
>  >>  > 2.14.3
>  >>  >
>  >>  > _______________________________________________
>  >>  > dev mailing list
>  >>  > dev@openvswitch.org <mailto:dev@openvswitch.org> 
> <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>
>  >>  > https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>  >>
>  >> Acked-by: Han Zhou <hzhou8@ebay.com <mailto:hzhou8@ebay.com> 
> <mailto:hzhou8@ebay.com <mailto:hzhou8@ebay.com>>>
>  >
>  >
Han Zhou March 24, 2018, 1:22 a.m. | #4
On Fri, Mar 23, 2018 at 3:22 PM, Mark Michelson <mmichels@redhat.com> wrote:
>
> Sorry for the delayed response on this, but things are a bit busy/weird
for me right now.
>
> On 03/21/2018 04:33 PM, Han Zhou wrote:
>>
>> (Add the dev list back)
>>
>> On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <mmichels@redhat.com
<mailto:mmichels@redhat.com>> wrote:
>>  >
>>  > On 03/15/2018 12:04 PM, Han Zhou wrote:
>>  >>
>>  >>
>>  >>
>>  >> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmichels@redhat.com
<mailto:mmichels@redhat.com> <mailto:mmichels@redhat.com <mailto:
mmichels@redhat.com>>> wrote:
>>  >>  >
>>  >>  > This modifies ovn-controller to measure the amount of time it
takes to
>>  >>  > detect a change in the southbound database and generate the
resulting
>>  >>  > flow table. This may require multiple iterations of the
ovn-controller
>>  >>  > loop.
>>  >>  >
>>  >>  > The statistics can be queried using:
>>  >>  >
>>  >>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>>  >>  >
>>  >>  > The statistics can be reset using:
>>  >>  >
>>  >>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>>  >>  >
>>  >> If the purpose of this patch is for measuring *SB changes* handling
time only, the statistics may be renamed to something like
ovn-controller-loop-sb. Otherwise it is a little bit misleading since the
controller loop handles many other things, such as local interface changes,
Packet-ins, OF messages, etc.
>>  >
>>  >
>>  > Sure, this sounds reasonable. I'll talk about this a bit more down
below, though.
>>  >
>>  >>
>>  >>  > Signed-off-by: Mark Michelson <mmichels@redhat.com <mailto:
mmichels@redhat.com> <mailto:mmichels@redhat.com <mailto:mmichels@redhat.com
>>>
>>
>>  >>
>>  >>  > ---
>>  >>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>>  >>  >  1 file changed, 17 insertions(+)
>>  >>  >
>>  >>  > diff --git a/ovn/controller/ovn-controller.c
b/ovn/controller/ovn-controller.c
>>  >>  > index 7592bda25..abd253fca 100644
>>  >>  > --- a/ovn/controller/ovn-controller.c
>>  >>  > +++ b/ovn/controller/ovn-controller.c
>>  >>  > @@ -57,6 +57,9 @@
>>  >>  >  #include "stream.h"
>>  >>  >  #include "unixctl.h"
>>  >>  >  #include "util.h"
>>  >>  > +#include "timeval.h"
>>  >>  > +#include "timer.h"
>>  >>  > +#include "stopwatch.h"
>>  >>  >
>>  >>  >  VLOG_DEFINE_THIS_MODULE(main);
>>  >>  >
>>  >>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>>  >>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>>  >>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>>  >>  >
>>  >>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>>  >>  > +
>>  >>  >  static void update_probe_interval(struct controller_ctx *,
>>  >>  >                                    const char *ovnsb_remote);
>>  >>  >  static void parse_options(int argc, char *argv[]);
>>  >>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>>  >>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1,
inject_pkt,
>>  >>  >                               &pending_pkt);
>>  >>  >
>>  >>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>  >>  >      /* Main loop. */
>>  >>  >      exiting = false;
>>  >>  > +    unsigned int our_seqno = 0;
>>  >>  >      while (!exiting) {
>>  >>  >          /* Check OVN SB database. */
>>  >>  >          char *new_ovnsb_remote =
get_ovnsb_remote(ovs_idl_loop.idl);
>>  >>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>>  >>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>>  >>  >          };
>>  >>  >
>>  >>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>>  >>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  >>  > +                            time_msec());
>>  >> When multiple iterations are needed for one SB change handling, the
stopwatch can be started before the previous one is stopped, if seqno
changes in between. Could you explain the consideration here?
>>  >
>>  >
>>  > That actually cannot happen. Check out
stopwatch_start_sample_protected() in lib/stopwatch.c. If the stopwatch
currently has a sample in progress, then the start operation immediately
returns and will not overwrite the previously recorded start time. The only
way that the start time can be recorded is if there is no sample currently
in progress.
>>
>> Agree. So some of the SB change processing time are silently ignored,
but it is totally fine since it is just for statistics, and it is ok to
ignore some samples in rare situation :)
>>  >
>>  >>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>>  >>  > +        }
>>  >>  > +
>>  >>  >          update_probe_interval(&ctx, ovnsb_remote);
>>  >>  >
>>  >>  >          update_ssl_config(ctx.ovs_idl);
>>  >>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>>  >>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>>  >>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>>  >>  >
>>  >>  > +
 stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>>  >>  > +                                   time_msec());
>>  >>  > +
>>  >>
>>  >> Two issues here:
>>  >>
>>  >> Firstly, we start the stopwatch only when seqno changes, but we will
always call the stop, which may result in many unnecessary messages to the
stopwatch thread. Maybe it is not a big issue, but it would be better if we
have a variable to control it to stop only when necessary.
>>  >
>>  >
>>  > This is a good point. I had been viewing the extra stopwatch_stop()
calls as being inconsequential since they would be eventual no-ops.
However, you're definitely right that it leads to extra processing that we
can easily avoid by keeping track of whether we have a sample in progress.
>>  >
>>  >>
>>  >> Secondly, when multiple iterations are needed for one SB change
handling, the iterations will be like:
>>  >> - iter1:
>>  >>      - stopwatch_start
>>  >>      - ofctrl_put
>>  >>      - stopwatch stop
>>  >> - iter2:
>>  >>      - ofctrl_run
>>  >>      - (ofctrl_put skipped)
>>  >> ...
>>  >> - iterN:
>>  >>      - ofctrl_run
>>  >>      - ofctrl_put (will not send to OVS, since desired and installed
flows should be same now)
>>  >>      - stopwatch stop
>>  >>
>>  >> So putting stopwatch_stop here still measures only the first
iteration. I had a suggestion earlier about checking ofctrl.cur_cfg, but I
was wrong since I realized that nb_cfg changes only upon user requesting
waiting for HV. To really measure the multi-iteration processing, we need a
mechanism that ofctrl takes current SB seqno as input and maintains the
cur_seq_no just like how cur_cfg is maintained.
>>  >
>>  >
>>  > Thanks for the detailed explanation. I think I understand better the
position you are coming from. Let me first explain what I had been thinking.
>>  >
>>  > - iter1:
>>  >     - SB seqno has changed
>>  >     - stopwatch_start()
>>  >     - ofctrl_can_put() returns false, so no stopwatch_stop call
>> ofctrl_can_put() return false means the SB change detected in this
iteration can't even be started to get added to ovs.
>>  > ...
>>  > - iterN:
>>  >     - ofctrl_can_put() returns true
>> ofctrl_can_put() returns true means the last SB change now can get added
to OVS, but it may just get started, and it is not completed yet when
multiple iterations are needed.
>>  >     - stopwatch_stop()
>>  >
>>  > On iterations 2 through N, it does not matter if the SB seqno has
changed or not. The stopwatch_start() will be a no-op since there is
already a sample being measured. The change to the southbound database that
we observed in iteration 1 gets timed until iteration N.
>>  >
>>  > The question here is whether my poor naming of the measurement has
created confusion. In other words, am I expecting a measurement of one
thing while you expect the measurement of something else? Or is the logic
being used here completely incorrect?
>>  >
>>  > What I was intending to measure was the time it takes to see a change
in the southbound database, create the resultant flows, and send those to
vswitchd. I believe what you're thinking about is just the latter two steps.
>>
>> To measure the time it takes to "see a change in the southbound
database, create the resultant flows, and send those to vswitchd", I think
we need to consider the iterations in my previous description.
ofctrl_can_put() returns true means the flow sending to vswitchd starts in
this iteration for sure, but it may or may not finish in this same
iteration. If oftrl_put() finishes sending the flow in this same iteration,
your current logic works perfectly. Otherwise, if multiple iterations are
needed, it is ofctrl_run() in the next iterations that completes the flow
sendings to vswitchd. The iterations you described is not a start-to-end
cycle of
>> SB change handling (please see my comment inlined).
>>
>> As you see, to measure a real complete cycle of "SB change - flow
compute - adding to ovs" is a little bit tricky, and ofctrl_can_put()
itself doesn't provide all the information, so that's why I proposed to
change the scope/rename the statistics of this patch just to measure the
controller loop time which is straighforward, and then have the *current
seq_no* mechanism implemented in ofctrl later so that we can measure the
"SB change - flow compute - adding to ovs" cycle. Hope I didn't make it
more confusing :)
>
>
> This makes a great amount of sense and is not confusing. Thank you.
>
> I think the main flaw between what I actually coded and what I said is
the part about adding the flows to OVS. In reality, I'm way more concerned
about the time it takes to generate flows than anything else, since that
happens 100% in ovn-controller and is something that is good to have
measurements for. I can think of two things I can do here that might be
good:
>
> 1) Change the scope of the stopwatch (as well as the name) so that it
only operates within the ofctrl_can_put() conditional. This way, all it is
measuring is flow creation. I could call the stopwatch something like
"ovn-controller-flow-process" or something. This way, I'm zeroing in on
just the operation I think is a good one to be measuring. The southbound
database doesn't play into it, nor does passing the flows down to OVS.
>
> 2) Keep the stopwatch measurements basically where they are and change
the description of what I'm measuring. What it currently measures is the
flow creation for a specific change to the southbound database, plus
potentially passing it down to OVS. If I move the stopwatch_stop() call to
before ofctrl_put(), then I think I'll be measuring the time it takes to
process a specific change to the southbound database, but not taking into
account sending the flows to OVS.
>
> Another option is to implement both (1) and (2). (1) can give a general
idea of how long it takes to generate flows, no matter the circumstance.
(2) can give a general idea of how long it takes to create flows based on
changes seen in the southbound database.
>
> In either case, I think you're right that putting off the ofctrl seqno
changes until later.
>
> What do you think?
>
Thanks Mark. I think 1) sounds better because it will be more clear for
what we are measuring. We may add more measures later on for other
scenarios.

Han

Patch

diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 7592bda25..abd253fca 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -57,6 +57,9 @@ 
 #include "stream.h"
 #include "unixctl.h"
 #include "util.h"
+#include "timeval.h"
+#include "timer.h"
+#include "stopwatch.h"
 
 VLOG_DEFINE_THIS_MODULE(main);
 
@@ -67,6 +70,8 @@  static unixctl_cb_func inject_pkt;
 #define DEFAULT_BRIDGE_NAME "br-int"
 #define DEFAULT_PROBE_INTERVAL_MSEC 5000
 
+#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
+
 static void update_probe_interval(struct controller_ctx *,
                                   const char *ovnsb_remote);
 static void parse_options(int argc, char *argv[]);
@@ -639,8 +644,10 @@  main(int argc, char *argv[])
     unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1, inject_pkt,
                              &pending_pkt);
 
+    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
     /* Main loop. */
     exiting = false;
+    unsigned int our_seqno = 0;
     while (!exiting) {
         /* Check OVN SB database. */
         char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
@@ -659,6 +666,12 @@  main(int argc, char *argv[])
             .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
         };
 
+        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
+            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
+                            time_msec());
+            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
+        }
+
         update_probe_interval(&ctx, ovnsb_remote);
 
         update_ssl_config(ctx.ovs_idl);
@@ -728,6 +741,9 @@  main(int argc, char *argv[])
                     ofctrl_put(&flow_table, &pending_ct_zones,
                                get_nb_cfg(ctx.ovnsb_idl));
 
+                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
+                                   time_msec());
+
                     hmap_destroy(&flow_table);
                 }
                 if (ctx.ovnsb_idl_txn) {
@@ -792,6 +808,7 @@  main(int argc, char *argv[])
             ofctrl_wait();
             pinctrl_wait(&ctx);
         }
+
         ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
 
         if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {