
[ovs-dev,v7,3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

Message ID 1516067473-29286-4-git-send-email-jan.scheurich@ericsson.com
State Superseded
Delegated to: Ian Stokes
Series dpif-netdev: Detailed PMD performance metrics and supervision

Commit Message

Jan Scheurich Jan. 16, 2018, 1:51 a.m. UTC
This patch enhances dpif-netdev-perf to detect iterations with
suspicious statistics according to the following criteria:

- iteration lasts longer than US_THR microseconds (default 250).
  This can be used to capture events where a PMD is blocked or
  interrupted for such a period of time that there is a risk for
  dropped packets on any of its Rx queues.

- max vhost qlen exceeds a threshold Q_THR (default 128). This can
  be used to infer virtio queue overruns and dropped packets inside
  a VM, which are not visible in OVS otherwise.

Such suspicious iterations can be logged together with their iteration
statistics to be able to correlate them to packet drop or other events
outside OVS.

A new command is introduced to enable/disable logging at run-time and
to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set on | off
    [-b before] [-a after] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).

-b before:  The number of iterations before the suspicious iteration to
            be logged (default 5).
-a after:   The number of iterations after the suspicious iteration to
            be logged (default 5).
-q qlen:    Suspicious vhost queue fill level threshold. Increase this
            to 512 if QEMU supports a virtio queue length of 1024
            (default 128).
-us usec:   Change the duration threshold for a suspicious iteration
            (default 250 us).

If more than 100 iterations before or after a suspicious iteration have
been logged once, OVS falls back to the safe default values (5/5) to
prevent the logging itself from triggering continuous further logging.
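The detection described above amounts to one integer comparison per iteration once the microsecond threshold has been converted to TSC cycles. A minimal standalone sketch (the function names, thresholds, and the 2 GHz TSC in the test are illustrative stand-ins, not the patch's actual internals):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert the -us threshold into TSC cycles once, so the per-iteration
 * check is a single comparison (no division on the fast path). */
static uint64_t
us_to_cycles(uint64_t us_thr, uint64_t tsc_hz)
{
    return (us_thr * tsc_hz) / 1000000;
}

/* Return a reason string if the iteration looks suspicious, else NULL.
 * Mirrors the two criteria from the commit message. */
static const char *
check_iteration(uint64_t cycles, uint32_t max_vhost_qfill,
                uint64_t cycle_thr, uint32_t q_thr)
{
    if (cycles > cycle_thr) {
        return "Excessive total cycles";
    } else if (max_vhost_qfill >= q_thr) {
        return "Vhost RX queue full";
    }
    return NULL;
}
```

For example, on a 2 GHz TSC the default 250 us threshold corresponds to 500,000 cycles.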

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
---
 NEWS                        |   2 +
 lib/dpif-netdev-perf.c      | 142 ++++++++++++++++++++++++++++++++++++++++++++
 lib/dpif-netdev-perf.h      |  40 ++++++++++++-
 lib/dpif-netdev.c           |   7 ++-
 lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
 5 files changed, 233 insertions(+), 5 deletions(-)

Comments

Billy O'Mahony Jan. 19, 2018, 11:07 a.m. UTC | #1
Hi All,

I'm going to actually try out the code next. But for now a few comments on the code. 

> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheurich@ericsson.com]
> Sent: Tuesday, January 16, 2018 1:51 AM
> To: dev@openvswitch.org
> Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>;
> i.maximets@samsung.com; O Mahony, Billy <billy.o.mahony@intel.com>;
> Jan Scheurich <jan.scheurich@ericsson.com>
> Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> PMD iterations
> 
> This patch enhances dpif-netdev-perf to detect iterations with suspicious
> statistics according to the following criteria:
> 
> - iteration lasts longer than US_THR microseconds (default 250).
>   This can be used to capture events where a PMD is blocked or
>   interrupted for such a period of time that there is a risk for
>   dropped packets on any of its Rx queues.
> 
> - max vhost qlen exceeds a threshold Q_THR (default 128). This can
>   be used to infer virtio queue overruns and dropped packets inside
>   a VM, which are not visible in OVS otherwise.
> 
> Such suspicious iterations can be logged together with their iteration
> statistics to be able to correlate them to packet drop or other events outside
> OVS.
> 
> A new command is introduced to enable/disable logging at run-time and to
> adjust the above thresholds for suspicious iterations:
> 
> ovs-appctl dpif-netdev/pmd-perf-log-set on | off
>     [-b before] [-a after] [-us usec] [-q qlen]
> 
> Turn logging on or off at run-time (on|off).
> 
> -b before:  The number of iterations before the suspicious iteration to
>             be logged (default 5).
> -a after:   The number of iterations after the suspicious iteration to
>             be logged (default 5).
> -q qlen:    Suspicious vhost queue fill level threshold. Increase this
>             to 512 if the Qemu supports 1024 virtio queue length.
>             (default 128).
> -us usec:   change the duration threshold for a suspicious iteration
>             (default 250 us).
> 
> If more than 100 iterations before or after a suspicious iteration have been
> looged once, OVS falls back to the safe default values (5/5) to avoid that
> logging itself causes continuos further logging.
> 
> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> ---
>  NEWS                        |   2 +
>  lib/dpif-netdev-perf.c      | 142
> ++++++++++++++++++++++++++++++++++++++++++++
>  lib/dpif-netdev-perf.h      |  40 ++++++++++++-
>  lib/dpif-netdev.c           |   7 ++-
>  lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
>  5 files changed, 233 insertions(+), 5 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 743528e..7d40374 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -47,6 +47,8 @@ Post-v2.8.0
>       * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> single PMD
>       * Detailed PMD performance metrics available with new command
>           ovs-appctl dpif-netdev/pmd-perf-show
> +     * Supervision of PMD performance metrics and logging of suspicious
> +       iterations
>     - vswitchd:
>       * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
>       * Configuring a controller, or unconfiguring all controllers, now deletes diff
> --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> e0ef15d..259a6c8 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -24,6 +24,23 @@
> 
>  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> 
> +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration
> duration
> +                                   in microseconds. */
> +#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
> +#define LOG_IT_BEFORE 5         /* Number of iteration to log before
> +                                   suspicious iteration. */
> +#define LOG_IT_AFTER 5          /* Number of iteration to log after
> +                                   suspicious iteration. */
[[BO'M]] typo 'Number of iterations...'
> +
> +bool log_on = false;
> +static uint32_t log_it_before = LOG_IT_BEFORE; static uint32_t
> +log_it_after = LOG_IT_AFTER; static uint32_t log_us_thr =
> +ITER_US_THRESHOLD; uint32_t log_q_thr = VHOST_QUEUE_FULL; uint64_t
> +iter_cycle_threshold;
> +
> +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600,
> +600);
> +
>  #ifdef DPDK_NETDEV
>  static uint64_t
>  get_tsc_hz(void)
> @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
>      histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
>      histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
>      s->start_ms = time_msec();
> +    s->log_begin_it = UINT64_MAX;
> +    s->log_end_it = UINT64_MAX;
>  }
> 
>  void
> @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
>      history_init(&s->milliseconds);
>      s->start_ms = time_msec();
>      s->milliseconds.sample[0].timestamp = s->start_ms;
> +    s->log_begin_it = UINT64_MAX;
> +    s->log_end_it = UINT64_MAX;
>      /* Clearing finished. */
>      s->clear = false;
>  }
> @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats
> *s)  {
>      pmd_perf_stats_clear__(s);
>  }
> +
> +void
> +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> +                                     uint64_t cycles,
> +                                     char *reason) {
> +    VLOG_WARN_RL(&latency_rl,
> +                 "Suspicious iteration (%s): tsc=%"PRIu64
> +                 " duration=%"PRIu64" us\n",
> +                 reason, s->current.timestamp,
> +                 (1000000L * cycles) / get_tsc_hz());
> +    if (log_it_before + log_it_after > 0) {
> +        if (s->log_begin_it == UINT64_MAX) {
> +            s->log_begin_it =
> +                    (s->iterations.idx + HISTORY_LEN - log_it_before)
> +                        % HISTORY_LEN;
> +        }
> +        s->log_end_it =
> +                (s->iterations.idx + log_it_after) % HISTORY_LEN;
> +    }
> +}
> +
> +void
> +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) {
> +    struct ds log = DS_EMPTY_INITIALIZER;
> +    pmd_perf_format_iteration_history(&log, s,
> +            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> +    VLOG_WARN_RL(&latency_rl,
> +                 "Neighborhood of suspicious iteration:\n"
> +                 "%s", ds_cstr(&log));
> +    ds_destroy(&log);
> +    s->log_end_it = s->log_begin_it = UINT64_MAX;
> +    if (log_it_before > 100 || log_it_after > 100) {
> +        /* Reset to safe default values to avoid disturbance. */
> +        log_it_before = LOG_IT_BEFORE;
> +        log_it_after = LOG_IT_AFTER;
> +    }
> +}
> +
> +void
> +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> +                 int argc, const char *argv[],
> +                 void *aux OVS_UNUSED)
> +{
> +    int it_before, it_after, us_thr, q_thr;
> +    bool on;
> +    bool usage = false;
> +
> +    on = log_on;
> +    it_before = log_it_before;
> +    it_after = log_it_after;
> +    q_thr = log_q_thr;
> +    us_thr = log_us_thr;
> +
> +    while (argc > 1) {
> +        if (!strcmp(argv[1], "on")) {
> +            on = true;
> +            argc--;
> +            argv++;
> +        } else if (!strcmp(argv[1], "off")) {
> +            on = false;
> +            argc--;
> +            argv++;
> +        } else if (!strcmp(argv[1], "-a")) {
> +            it_after = strtol(argv[2], NULL, 10);
> +            if (it_after < 0) {
> +                it_after = 0;
> +            } else if (it_before + it_after > HISTORY_LEN-1) {
> +                it_after = HISTORY_LEN-1 - it_before;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-b")) {
> +            it_before = strtol(argv[2], NULL, 10);
> +            if (it_before < 0) {
> +                it_before = 0;
> +            } else if (it_before > HISTORY_LEN-10) {
> +                it_before = HISTORY_LEN-10;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-q")) {
> +            q_thr = strtol(argv[2], NULL, 10);
> +            if (q_thr < 0) {
> +                q_thr = 0;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-us")) {
> +            us_thr = strtol(argv[2], NULL, 10);
> +            if (us_thr < 0) {
> +                us_thr = 0;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else {
> +            usage = true;
> +            break;
> +        }
> +    }
> +
> +    if (usage) {
> +        unixctl_command_reply_error(conn,
> +                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> +                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> +        return;
> +    }
> +
> +    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> +              " q_thr=%d\n",
> +              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> +    log_on = on;
> +    log_it_before = it_before;
> +    log_it_after = it_after;
> +    log_q_thr = q_thr;
> +    log_us_thr = us_thr;
> +    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> +
> +    unixctl_command_reply(conn, "");
> +}
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index
> 7a89c40..aa55c68 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -160,8 +160,12 @@ struct pmd_perf_stats {
>      struct histogram max_vhost_qfill;
>      /* Iteration history buffer. */
>      struct history iterations;
> -    /* Millisecond hitory buffer. */
> +    /* Millisecond history buffer. */
>      struct history milliseconds;
> +    /* Start of iteration range to log. */
> +    uint64_t log_begin_it;
> +    /* End of iteration range to log. */
> +    uint64_t log_end_it;
>  };
> 
>  /* Support for accurate timing of PMD execution on TSC clock cycle level.
> @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
>      return history_next(h);
>  }
> 
> +/* Data and function related to logging of suspicious iterations. */
> +
> +extern bool log_on;
> +extern uint32_t log_q_thr;
> +extern uint64_t iter_cycle_threshold;
> +
> +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> +                                       uint64_t cycles,
> +                                       char *reason); void
> +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> +
>  /* Functions recording PMD metrics per iteration. */
> 
>  static inline void
> @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s,
> int rx_packets,
>      uint64_t now_tsc = cycles_counter_update(s);
>      struct iter_stats *cum_ms;
>      uint64_t cycles, cycles_per_pkt = 0;
> +    char *reason = NULL;
> 
>      if (OVS_UNLIKELY(s->current.timestamp == 0)) {
>          /* Stats were cleared during the ongoing iteration. */ @@ -404,6 +420,23
> @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>      cum_ms->batches += s->current.batches;
>      cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> 
> +    if (log_on) {
> +        /* Log suspicious iterations. */
[[BO'M]] It took me a while to realise that of course we need to wait for some iterations after the suspicious iter in order to log details of the requested 'after' iterations too.
Suggest a comment something like "Here we decide if this iteration is 'suspicious' and needs to be logged, and if so what range of before and after iterations should be logged along with it."
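The deferred-window bookkeeping being discussed can be sketched like this (a hypothetical reduction of the patch's log_begin_it/log_end_it handling; the struct and HISTORY_LEN value here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define HISTORY_LEN 1000        /* assumed ring-buffer size */

struct log_window {
    uint64_t begin_it;          /* UINT64_MAX means no window pending. */
    uint64_t end_it;
};

/* On a suspicious iteration at ring index 'idx': fix the start of the
 * window only once (for the first suspicious iteration of a burst) and
 * push the end marker 'after' slots ahead.  Logging is then deferred
 * until the ring index reaches end_it, so the 'after' iterations exist
 * in the history by the time the neighborhood is printed. */
static void
mark_suspicious(struct log_window *w, uint64_t idx,
                uint32_t before, uint32_t after)
{
    if (w->begin_it == UINT64_MAX) {
        w->begin_it = (idx + HISTORY_LEN - before) % HISTORY_LEN;
    }
    w->end_it = (idx + after) % HISTORY_LEN;
}
```

A second suspicious iteration inside a pending window only moves end_it, so overlapping neighborhoods are merged into a single log entry.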
> +        if (cycles > iter_cycle_threshold) {
> +            reason = "Excessive total cycles";
> +        } else if (s->current.max_vhost_qfill >= log_q_thr) {
> +            reason = "Vhost RX queue full";
> +        }
> +        if (OVS_UNLIKELY(reason)) {
> +            pmd_perf_log_suspicious_iteration(s, cycles, reason);
> +        }
> +
> +        /* Log iteration stats interval when required. */
[[BO'M]] Suggest something like "Suspicious iterations are not logged immediately - we wait until the required number of 'after' iterations are available and log everything together."
> +        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> +            pmd_perf_log_iteration_neighborhood(s);
> +        }
> +    }
> +
>      /* Store in iteration history. */
>      history_store(&s->iterations, &s->current);
>      if (now_tsc > s->next_check_tsc) {
> @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s,
> int rx_packets,
>      }
>  }
> 
> -/* Functions for formatting the output of commands. */
> +/* Formatting the output of commands. */
> 
>  struct pmd_perf_params {
>      int command_type;
> @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds
> *str,
>                                         int n_iter);  void pmd_perf_format_ms_history(struct ds
> *str, struct pmd_perf_stats *s,
>                                  int n_ms);
> +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> +                          int argc, const char *argv[],
> +                          void *aux OVS_UNUSED);
> 
>  #ifdef  __cplusplus
>  }
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index f5931bf..b493216
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn,
> int argc, const char *argv[],
>      ovs_mutex_lock(&dp_netdev_mutex);
> 
>      while (argc > 1) {
> -        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> +        if (!strcmp(argv[1], "-pmd") && argc > 2) {
>              if (str_to_uint(argv[2], 10, &core_id)) {
>                  filter_on_pmd = true;
>              }
> @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>                               0, 1, dpif_netdev_pmd_rebalance,
>                               NULL);
> +    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> +                             "on|off [-b before] [-a after] "
> +                             "[-us usec] [-q qlen]",
> +                             0, 10, pmd_perf_log_set_cmd,
> +                             NULL);
>      return 0;
>  }
> 
> diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man index
> 53f4c51..82e372f 100644
> --- a/lib/netdev-dpif-unixctl.man
> +++ b/lib/netdev-dpif-unixctl.man
> @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
> 
>  The command options are
> 
> -    \fB-nh\fR:                  Suppress the histograms
> +    \fB-nh\fR:            Suppress the histograms
>      \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
>      \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
> 
> @@ -101,10 +101,53 @@ To reset the counters and start a new
> measurement use  .
>  .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
>  Resets to zero the per pmd thread performance numbers shown by the -
> \fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR
> commands.
> +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR
> commands.
>  It will NOT reset datapath or bridge statistics, only the values shown by  the
> above commands.
>  .
> +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \  [\fB-b\fR
> +\fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> +[\fB-q\fR \fIqlen\fR]"
> +.
> +The userspace "netdev" datapath is able to supervise the PMD
> +performance metrics and detect iterations with suspicious statistics
> +according to the following criteria:
> +
> +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> +  This can be used to capture events where a PMD is blocked or
> +  interrupted for such a period of time that there is a risk for
> +  dropped packets on any of its Rx queues.
> +
> +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This
> +can
> +  be used to infer virtio queue overruns and dropped packets inside a
> +VM,
> +  which are not visible in OVS otherwise.
> +
> +Such suspicious iterations can be logged together with their iteration
> +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> +to packet drop or other events outside OVS.
> +
> +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> +supervision and logging at run-time and can be used to adjust the above
> +thresholds for detecting suspicious iterations. By default supervision
> +and logging is disabled.
> +
> +The command options are:
> +
> +    \fB-b\fR \fIbefore\fR:
> +        The number of iterations before the suspicious iteration to be logged
> (default 5).
> +    \fB-a\fR \fIafter\fR:
> +        The number of iterations after the suspicious iteration to be logged
> (default 5).
> +    \fB-q\fR \fIqlen\fR:
> +        Suspicious vhost queue fill level threshold. Increase this to 512 if
> +        the Qemu supports 1024 virtio queue length (default 128).
> +    \fB-us\fR \fIusec\fR:
> +        change the duration threshold for a suspicious iteration (default 250 us).
> +
> +If more than 100 iterations before or after a suspicious iteration have
> +been looged once, OVS falls back to the safe default values (5/5) to
[[BO'M]] typo
> +avoid that logging itself causes continuos further suspicious
[[BO'M]] typo
> +iterations and continuous logging.
> +.
>  .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
>  For one or all pmd threads of the datapath \fIdp\fR show the list of queue-
> ids  with port names, which this thread polls.
> --
> 1.9.1
Billy O'Mahony Jan. 19, 2018, 5:10 p.m. UTC | #2
> -----Original Message-----
> From: Jan Scheurich [mailto:jan.scheurich@ericsson.com]
> Sent: Tuesday, January 16, 2018 1:51 AM
> To: dev@openvswitch.org
> Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>;
> i.maximets@samsung.com; O Mahony, Billy <billy.o.mahony@intel.com>;
> Jan Scheurich <jan.scheurich@ericsson.com>
> Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> PMD iterations
> 
> This patch enhances dpif-netdev-perf to detect iterations with suspicious
> statistics according to the following criteria:
> 
> - iteration lasts longer than US_THR microseconds (default 250).
>   This can be used to capture events where a PMD is blocked or
>   interrupted for such a period of time that there is a risk for
>   dropped packets on any of its Rx queues.
> 
> - max vhost qlen exceeds a threshold Q_THR (default 128). This can
>   be used to infer virtio queue overruns and dropped packets inside
>   a VM, which are not visible in OVS otherwise.
> 
> Such suspicious iterations can be logged together with their iteration
> statistics to be able to correlate them to packet drop or other events outside
> OVS.
> 
> A new command is introduced to enable/disable logging at run-time and to
> adjust the above thresholds for suspicious iterations:
> 
> ovs-appctl dpif-netdev/pmd-perf-log-set on | off
>     [-b before] [-a after] [-us usec] [-q qlen]
> 
> Turn logging on or off at run-time (on|off).
> 
> -b before:  The number of iterations before the suspicious iteration to
>             be logged (default 5).

 [[BO'M]] I've found the before value includes the suspicious iteration itself. I think it's fine to just mention that here rather than change the code.

> -a after:   The number of iterations after the suspicious iteration to
>             be logged (default 5).
> -q qlen:    Suspicious vhost queue fill level threshold. Increase this
>             to 512 if the Qemu supports 1024 virtio queue length.
>             (default 128).
> -us usec:   change the duration threshold for a suspicious iteration
>             (default 250 us).
> 
> If more than 100 iterations before or after a suspicious iteration have been
> looged once, OVS falls back to the safe default values (5/5) to avoid that
> logging itself causes continuos further logging.
> 
> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> ---
>  NEWS                        |   2 +
>  lib/dpif-netdev-perf.c      | 142
> ++++++++++++++++++++++++++++++++++++++++++++
>  lib/dpif-netdev-perf.h      |  40 ++++++++++++-
>  lib/dpif-netdev.c           |   7 ++-
>  lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
>  5 files changed, 233 insertions(+), 5 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 743528e..7d40374 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -47,6 +47,8 @@ Post-v2.8.0
>       * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> single PMD
>       * Detailed PMD performance metrics available with new command
>           ovs-appctl dpif-netdev/pmd-perf-show
> +     * Supervision of PMD performance metrics and logging of suspicious
> +       iterations
>     - vswitchd:
>       * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
>       * Configuring a controller, or unconfiguring all controllers, now deletes diff
> --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> e0ef15d..259a6c8 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -24,6 +24,23 @@
> 
>  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> 
> +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration
> duration
> +                                   in microseconds. */
> +#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
> +#define LOG_IT_BEFORE 5         /* Number of iteration to log before
> +                                   suspicious iteration. */
> +#define LOG_IT_AFTER 5          /* Number of iteration to log after
> +                                   suspicious iteration. */
> +
> +bool log_on = false;
> +static uint32_t log_it_before = LOG_IT_BEFORE; static uint32_t
> +log_it_after = LOG_IT_AFTER; static uint32_t log_us_thr =
> +ITER_US_THRESHOLD; uint32_t log_q_thr = VHOST_QUEUE_FULL; uint64_t
> +iter_cycle_threshold;
> +
> +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600,
> +600);
> +
>  #ifdef DPDK_NETDEV
>  static uint64_t
>  get_tsc_hz(void)
> @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
>      histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
>      histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
>      s->start_ms = time_msec();
> +    s->log_begin_it = UINT64_MAX;
> +    s->log_end_it = UINT64_MAX;
>  }
> 
>  void
> @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
>      history_init(&s->milliseconds);
>      s->start_ms = time_msec();
>      s->milliseconds.sample[0].timestamp = s->start_ms;
> +    s->log_begin_it = UINT64_MAX;
> +    s->log_end_it = UINT64_MAX;
>      /* Clearing finished. */
>      s->clear = false;
>  }
> @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats
> *s)  {
>      pmd_perf_stats_clear__(s);
>  }
> +
> +void
> +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> +                                     uint64_t cycles,
> +                                     char *reason) {
> +    VLOG_WARN_RL(&latency_rl,
> +                 "Suspicious iteration (%s): tsc=%"PRIu64
> +                 " duration=%"PRIu64" us\n",
> +                 reason, s->current.timestamp,
> +                 (1000000L * cycles) / get_tsc_hz());
> +    if (log_it_before + log_it_after > 0) {
> +        if (s->log_begin_it == UINT64_MAX) {
> +            s->log_begin_it =
> +                    (s->iterations.idx + HISTORY_LEN - log_it_before)
> +                        % HISTORY_LEN;
> +        }
> +        s->log_end_it =
> +                (s->iterations.idx + log_it_after) % HISTORY_LEN;
> +    }
> +}
> +
> +void
> +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) {
> +    struct ds log = DS_EMPTY_INITIALIZER;
> +    pmd_perf_format_iteration_history(&log, s,
> +            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> +    VLOG_WARN_RL(&latency_rl,
> +                 "Neighborhood of suspicious iteration:\n"
> +                 "%s", ds_cstr(&log));
> +    ds_destroy(&log);
> +    s->log_end_it = s->log_begin_it = UINT64_MAX;
> +    if (log_it_before > 100 || log_it_after > 100) {
> +        /* Reset to safe default values to avoid disturbance. */
> +        log_it_before = LOG_IT_BEFORE;
> +        log_it_after = LOG_IT_AFTER;
> +    }
> +}
> +
> +void
> +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> +                 int argc, const char *argv[],
> +                 void *aux OVS_UNUSED)
> +{
> +    int it_before, it_after, us_thr, q_thr;
> +    bool on;
> +    bool usage = false;
> +
> +    on = log_on;
> +    it_before = log_it_before;
> +    it_after = log_it_after;
> +    q_thr = log_q_thr;
> +    us_thr = log_us_thr;
> +
> +    while (argc > 1) {
> +        if (!strcmp(argv[1], "on")) {
> +            on = true;
> +            argc--;
> +            argv++;
> +        } else if (!strcmp(argv[1], "off")) {
> +            on = false;
> +            argc--;
> +            argv++;
> +        } else if (!strcmp(argv[1], "-a")) {
> +            it_after = strtol(argv[2], NULL, 10);
> +            if (it_after < 0) {
> +                it_after = 0;
> +            } else if (it_before + it_after > HISTORY_LEN-1) {
> +                it_after = HISTORY_LEN-1 - it_before;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-b")) {
> +            it_before = strtol(argv[2], NULL, 10);
> +            if (it_before < 0) {
> +                it_before = 0;
> +            } else if (it_before > HISTORY_LEN-10) {
> +                it_before = HISTORY_LEN-10;
> +            }
[[BO'M]] The logic here assumes -b is given before -a. If you say
	ovs-appctl dpif-netdev/pmd-perf-log-set on -a 1500 -b 1500
Then you get a before+after > history_len
	2018-01-19T13:57:48.953Z|00141|pmd_perf|INFO|pmd-perf-log-set: on, before=990, after=994, us_thr=250, q_thr=128
If the CLI values were parsed first and then fixed up after the argv loop, it would avoid the inter-dependencies causing this.
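The suggested parse-then-fix-up approach could look like this (an illustrative sketch, not the patch's code; the clamp limits are taken from the -a/-b handlers quoted above):

```c
#include <assert.h>

#define HISTORY_LEN 1000        /* as in the patch */

/* Clamp -b/-a once, after the whole argv loop, so the result is the
 * same regardless of the order the options were given in. */
static void
clamp_before_after(int *before, int *after)
{
    if (*before < 0) {
        *before = 0;
    }
    if (*after < 0) {
        *after = 0;
    }
    if (*before > HISTORY_LEN - 10) {
        *before = HISTORY_LEN - 10;
    }
    if (*before + *after > HISTORY_LEN - 1) {
        *after = HISTORY_LEN - 1 - *before;
    }
}
```

With this, "-a 1500 -b 1500" and "-b 1500 -a 1500" both end up as before=990, after=9, instead of the order-dependent 990/994 seen in the log above.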
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-q")) {
> +            q_thr = strtol(argv[2], NULL, 10);
> +            if (q_thr < 0) {
> +                q_thr = 0;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else if (!strcmp(argv[1], "-us")) {
> +            us_thr = strtol(argv[2], NULL, 10);
> +            if (us_thr < 0) {
> +                us_thr = 0;
> +            }
> +            argc -= 2;
> +            argv += 2;
> +        } else {
> +            usage = true;
> +            break;
> +        }
> +    }
> +
> +    if (usage) {
> +        unixctl_command_reply_error(conn,
> +                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> +                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> +        return;
> +    }
> +
> +    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> +              " q_thr=%d\n",
> +              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> +    log_on = on;
> +    log_it_before = it_before;
> +    log_it_after = it_after;
> +    log_q_thr = q_thr;
> +    log_us_thr = us_thr;
> +    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> +
> +    unixctl_command_reply(conn, "");
> +}
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index
> 7a89c40..aa55c68 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -160,8 +160,12 @@ struct pmd_perf_stats {
>      struct histogram max_vhost_qfill;
>      /* Iteration history buffer. */
>      struct history iterations;
> -    /* Millisecond hitory buffer. */
> +    /* Millisecond history buffer. */
>      struct history milliseconds;
> +    /* Start of iteration range to log. */
> +    uint64_t log_begin_it;
> +    /* End of iteration range to log. */
> +    uint64_t log_end_it;
>  };
> 
>  /* Support for accurate timing of PMD execution on TSC clock cycle level.
> @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
>      return history_next(h);
>  }
> 
> +/* Data and function related to logging of suspicious iterations. */
> +
> +extern bool log_on;
> +extern uint32_t log_q_thr;
> +extern uint64_t iter_cycle_threshold;
> +
> +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> +                                       uint64_t cycles,
> +                                       char *reason); void
> +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> +
>  /* Functions recording PMD metrics per iteration. */
> 
>  static inline void
> @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s,
> int rx_packets,
>      uint64_t now_tsc = cycles_counter_update(s);
>      struct iter_stats *cum_ms;
>      uint64_t cycles, cycles_per_pkt = 0;
> +    char *reason = NULL;
> 
>      if (OVS_UNLIKELY(s->current.timestamp == 0)) {
>          /* Stats were cleared during the ongoing iteration. */ @@ -404,6 +420,23
> @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>      cum_ms->batches += s->current.batches;
>      cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> 
> +    if (log_on) {
> +        /* Log suspicious iterations. */
> +        if (cycles > iter_cycle_threshold) {
> +            reason = "Excessive total cycles";
> +        } else if (s->current.max_vhost_qfill >= log_q_thr) {
> +            reason = "Vhost RX queue full";
> +        }
> +        if (OVS_UNLIKELY(reason)) {
> +            pmd_perf_log_suspicious_iteration(s, cycles, reason);
> +        }
> +
> +        /* Log iteration stats interval when required. */
> +        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> +            pmd_perf_log_iteration_neighborhood(s);
> +        }
> +    }
> +
>      /* Store in iteration history. */
>      history_store(&s->iterations, &s->current);
>      if (now_tsc > s->next_check_tsc) {
> @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>      }
>  }
[[BO'M]] 
I tested the logging by randomly deciding (about every 30s or so) that an iteration was suspicious (I used "if (cycles > iter_cycle_threshold || random_uint32() < 200) { ... ")

In that case it seems that nearly always the iteration *after* the suspicious iteration is also quite a long one. I believe this is because the cost of the vlog in pmd_perf_log_suspicious_iteration gets assigned to the iteration after the suspicious one (the cycles for the current iteration are finalized earlier in the function by "s->current.cycles = cycles;").
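The accounting order described above can be sketched with a toy model (all names here are mock stand-ins for illustration, not the actual OVS code): the iteration's cycle count is read from the TSC before the vlog call runs, so the logging cost lands in the next iteration's budget.

```c
#include <stdint.h>

static uint64_t mock_tsc;    /* stand-in for the TSC register */
static uint64_t iter_start;  /* TSC value at the start of the iteration */

/* Close the current iteration, then spend 'log_cost' cycles logging.
 * The charge is frozen *before* the logging cost is incurred. */
static uint64_t
end_iteration(uint64_t log_cost)
{
    uint64_t now = mock_tsc;
    uint64_t cycles = now - iter_start;  /* frozen before logging */
    mock_tsc += log_cost;                /* logging happens afterwards */
    iter_start = now;                    /* next iteration inherits it */
    return cycles;
}

/* Charge attributed to the iteration *after* a suspicious one that
 * spent 'log_cost' cycles on logging; 'work' is normal per-iteration
 * work. */
static uint64_t
next_iter_charge(uint64_t work, uint64_t log_cost)
{
    mock_tsc = 0;
    iter_start = 0;
    mock_tsc += work;
    (void) end_iteration(log_cost);  /* suspicious iteration logs here */
    mock_tsc += work;
    return end_iteration(0);         /* reported as work + log_cost */
}
```

With work of 100 cycles and a 500-cycle vlog, the follow-up iteration reports 600 cycles even though it only did 100 cycles of real work, which matches the long iteration right after the flagged one in the dump below.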

2018-01-19T14:46:13.280Z|00497|pmd_perf(pmd60)|WARN|Suspicious iteration (Excessive total cycles): tsc=8512846813946109 duration=269 us
2018-01-19T14:46:13.280Z|00498|pmd_perf(pmd60)|WARN|Neighborhood of suspicious iteration:
   tsc                 cycles       packets      cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   8512846814787757    30208        128          236          32           0            0            0
   8512846814757449    30308        128          236          32           0            0            0
   8512846814727185    30264        128          236          32           0            0            0
   8512846814564817    162368       128          1268         32           0            0            0
   8512846813946109    618708       128          4833         32           0            0            0   <<<  
   8512846813916321    29788        128          232          32           0            0            0
   8512846813886577    29744        128          232          32           0            0            0
   8512846813856937    29640        128          231          32           0            0            0
   8512846813827497    29440        128          230          32           0            0            0
   8512846813798061    29436        128          229          32           0            0            0


It would be good to acknowledge this in the docs or, ideally, to move the "WARN Suspicious ..." vlog so it is emitted at the same time as the iteration dump vlog.

However, I can't figure out why, when I just use random_uint32() to trigger a suspicious iteration, the 'suspicious' iteration itself shows an increased cycle count.
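For reference, the begin/end markers bounding the logged neighborhood are plain modulo arithmetic over the iteration history ring. A minimal sketch of that arithmetic (HISTORY_LEN is assumed here; the real value lives in dpif-netdev-perf.h):

```c
#include <stdint.h>

#define HISTORY_LEN 1000  /* assumed ring size for illustration */

/* Ring index of the first iteration to dump, 'it_before' slots behind
 * the suspicious iteration at index 'idx'. */
static uint64_t
window_begin(uint64_t idx, uint32_t it_before)
{
    return (idx + HISTORY_LEN - it_before) % HISTORY_LEN;
}

/* Ring index at which the dump is triggered, 'it_after' slots ahead. */
static uint64_t
window_end(uint64_t idx, uint32_t it_after)
{
    return (idx + it_after) % HISTORY_LEN;
}

/* Number of history entries the neighborhood dump covers. */
static uint64_t
window_len(uint64_t begin, uint64_t end)
{
    return (end + HISTORY_LEN - begin) % HISTORY_LEN;
}
```

The wrap-around terms (`+ HISTORY_LEN` before the modulo) keep the indices valid when the window straddles the end of the ring buffer.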

> 
> -/* Functions for formatting the output of commands. */
> +/* Formatting the output of commands. */
> 
>  struct pmd_perf_params {
>      int command_type;
> @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds *str,
>                                         int n_iter);
>  void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
>                                  int n_ms);
> +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> +                          int argc, const char *argv[],
> +                          void *aux OVS_UNUSED);
> 
>  #ifdef  __cplusplus
>  }
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f5931bf..b493216 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>      ovs_mutex_lock(&dp_netdev_mutex);
> 
>      while (argc > 1) {
> -        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> +        if (!strcmp(argv[1], "-pmd") && argc > 2) {
>              if (str_to_uint(argv[2], 10, &core_id)) {
>                  filter_on_pmd = true;
>              }
> @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>                               0, 1, dpif_netdev_pmd_rebalance,
>                               NULL);
> +    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> +                             "on|off [-b before] [-a after] "
> +                             "[-us usec] [-q qlen]",
> +                             0, 10, pmd_perf_log_set_cmd,
> +                             NULL);
>      return 0;
>  }
> 
> diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
> index 53f4c51..82e372f 100644
> --- a/lib/netdev-dpif-unixctl.man
> +++ b/lib/netdev-dpif-unixctl.man
> @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
> 
>  The command options are
> 
> -    \fB-nh\fR:                  Suppress the histograms
> +    \fB-nh\fR:            Suppress the histograms
>      \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
>      \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
> 
> @@ -101,10 +101,53 @@ To reset the counters and start a new measurement use
>  .
>  .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
>  Resets to zero the per pmd thread performance numbers shown by the
> -\fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR commands.
> +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands.
>  It will NOT reset datapath or bridge statistics, only the values shown by
>  the above commands.
>  .
> +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \
> +[\fB-b\fR \fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> +[\fB-q\fR \fIqlen\fR]"
> +.
> +The userspace "netdev" datapath is able to supervise the PMD
> +performance metrics and detect iterations with suspicious statistics
> +according to the following criteria:
> +
> +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> +  This can be used to capture events where a PMD is blocked or
> +  interrupted for such a period of time that there is a risk for
> +  dropped packets on any of its Rx queues.
> +
> +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can
> +  be used to infer virtio queue overruns and dropped packets inside a VM,
> +  which are not visible in OVS otherwise.
> +
> +Such suspicious iterations can be logged together with their iteration
> +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> +to packet drop or other events outside OVS.
> +
> +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> +supervision and logging at run-time and can be used to adjust the above
> +thresholds for detecting suspicious iterations. By default supervision
> +and logging is disabled.
> +
> +The command options are:
> +
> +    \fB-b\fR \fIbefore\fR:
> +        The number of iterations before the suspicious iteration to be
> +        logged (default 5).
> +    \fB-a\fR \fIafter\fR:
> +        The number of iterations after the suspicious iteration to be
> +        logged (default 5).
> +    \fB-q\fR \fIqlen\fR:
> +        Suspicious vhost queue fill level threshold. Increase this to 512 if
> +        the Qemu supports 1024 virtio queue length (default 128).
> +    \fB-us\fR \fIusec\fR:
> +        change the duration threshold for a suspicious iteration (default 250 us).
> +
> +If more than 100 iterations before or after a suspicious iteration have
> +been logged once, OVS falls back to the safe default values (5/5) to
> +avoid that logging itself causes continuous further suspicious
> +iterations and continuous logging.
> +.
>  .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
>  For one or all pmd threads of the datapath \fIdp\fR show the list of
>  queue-ids with port names, which this thread polls.
> --
> 1.9.1
Jan Scheurich Jan. 26, 2018, 11:46 a.m. UTC | #3
Thanks for the review. Issues addressed in v8.
/Jan

> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mahony@intel.com]
> Sent: Friday, 19 January, 2018 12:07
> To: Jan Scheurich <jan.scheurich@ericsson.com>; dev@openvswitch.org
> Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>; i.maximets@samsung.com
> Subject: RE: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations
> 
> Hi All,
> 
> I'm going to actually try out the code next. But for now a few comments on the code.
> 
> > -----Original Message-----
> > From: Jan Scheurich [mailto:jan.scheurich@ericsson.com]
> > Sent: Tuesday, January 16, 2018 1:51 AM
> > To: dev@openvswitch.org
> > Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>;
> > i.maximets@samsung.com; O Mahony, Billy <billy.o.mahony@intel.com>;
> > Jan Scheurich <jan.scheurich@ericsson.com>
> > Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> > PMD iterations
> >
> > This patch enhances dpif-netdev-perf to detect iterations with suspicious
> > statistics according to the following criteria:
> >
> > - iteration lasts longer than US_THR microseconds (default 250).
> >   This can be used to capture events where a PMD is blocked or
> >   interrupted for such a period of time that there is a risk for
> >   dropped packets on any of its Rx queues.
> >
> > - max vhost qlen exceeds a threshold Q_THR (default 128). This can
> >   be used to infer virtio queue overruns and dropped packets inside
> >   a VM, which are not visible in OVS otherwise.
> >
> > Such suspicious iterations can be logged together with their iteration
> > statistics to be able to correlate them to packet drop or other events outside
> > OVS.
> >
> > A new command is introduced to enable/disable logging at run-time and to
> > adjust the above thresholds for suspicious iterations:
> >
> > ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> >     [-b before] [-a after] [-us usec] [-q qlen]
> >
> > Turn logging on or off at run-time (on|off).
> >
> > -b before:  The number of iterations before the suspicious iteration to
> >             be logged (default 5).
> > -a after:   The number of iterations after the suspicious iteration to
> >             be logged (default 5).
> > -q qlen:    Suspicious vhost queue fill level threshold. Increase this
> >             to 512 if the Qemu supports 1024 virtio queue length.
> >             (default 128).
> > -us usec:   change the duration threshold for a suspicious iteration
> >             (default 250 us).
> >
> > If more than 100 iterations before or after a suspicious iteration have been
> > looged once, OVS falls back to the safe default values (5/5) to avoid that
> > logging itself causes continuos further logging.
> >
> > Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> > ---
> >  NEWS                        |   2 +
> >  lib/dpif-netdev-perf.c      | 142
> > ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/dpif-netdev-perf.h      |  40 ++++++++++++-
> >  lib/dpif-netdev.c           |   7 ++-
> >  lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
> >  5 files changed, 233 insertions(+), 5 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 743528e..7d40374 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -47,6 +47,8 @@ Post-v2.8.0
> >       * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> > single PMD
> >       * Detailed PMD performance metrics available with new command
> >           ovs-appctl dpif-netdev/pmd-perf-show
> > +     * Supervision of PMD performance metrics and logging of suspicious
> > +       iterations
> >     - vswitchd:
> >       * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
> >       * Configuring a controller, or unconfiguring all controllers, now deletes
> > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> > index e0ef15d..259a6c8 100644
> > --- a/lib/dpif-netdev-perf.c
> > +++ b/lib/dpif-netdev-perf.c
> > @@ -24,6 +24,23 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> >
> > +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
> > +                                   in microseconds. */
> > +#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
> > +#define LOG_IT_BEFORE 5         /* Number of iteration to log before
> > +                                   suspicious iteration. */
> > +#define LOG_IT_AFTER 5          /* Number of iteration to log after
> > +                                   suspicious iteration. */
> [[BO'M]] typo 'Number of iterations...'

Fixed

> > +
> > +bool log_on = false;
> > +static uint32_t log_it_before = LOG_IT_BEFORE;
> > +static uint32_t log_it_after = LOG_IT_AFTER;
> > +static uint32_t log_us_thr = ITER_US_THRESHOLD;
> > +uint32_t log_q_thr = VHOST_QUEUE_FULL;
> > +uint64_t iter_cycle_threshold;
> > +
> > +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> > +
> >  #ifdef DPDK_NETDEV
> >  static uint64_t
> >  get_tsc_hz(void)
> > @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
> >      histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
> >      histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
> >      s->start_ms = time_msec();
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >  }
> >
> >  void
> > @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
> >      history_init(&s->milliseconds);
> >      s->start_ms = time_msec();
> >      s->milliseconds.sample[0].timestamp = s->start_ms;
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >      /* Clearing finished. */
> >      s->clear = false;
> >  }
> > @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats *s)
> >  {
> >      pmd_perf_stats_clear__(s);
> >  }
> > +
> > +void
> > +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                     uint64_t cycles,
> > +                                     char *reason)
> > +{
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Suspicious iteration (%s): tsc=%"PRIu64
> > +                 " duration=%"PRIu64" us\n",
> > +                 reason, s->current.timestamp,
> > +                 (1000000L * cycles) / get_tsc_hz());
> > +    if (log_it_before + log_it_after > 0) {
> > +        if (s->log_begin_it == UINT64_MAX) {
> > +            s->log_begin_it =
> > +                    (s->iterations.idx + HISTORY_LEN - log_it_before)
> > +                        % HISTORY_LEN;
> > +        }
> > +        s->log_end_it =
> > +                (s->iterations.idx + log_it_after) % HISTORY_LEN;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s)
> > +{
> > +    struct ds log = DS_EMPTY_INITIALIZER;
> > +    pmd_perf_format_iteration_history(&log, s,
> > +            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Neighborhood of suspicious iteration:\n"
> > +                 "%s", ds_cstr(&log));
> > +    ds_destroy(&log);
> > +    s->log_end_it = s->log_begin_it = UINT64_MAX;
> > +    if (log_it_before > 100 || log_it_after > 100) {
> > +        /* Reset to safe default values to avoid disturbance. */
> > +        log_it_before = LOG_IT_BEFORE;
> > +        log_it_after = LOG_IT_AFTER;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                 int argc, const char *argv[],
> > +                 void *aux OVS_UNUSED)
> > +{
> > +    int it_before, it_after, us_thr, q_thr;
> > +    bool on;
> > +    bool usage = false;
> > +
> > +    on = log_on;
> > +    it_before = log_it_before;
> > +    it_after = log_it_after;
> > +    q_thr = log_q_thr;
> > +    us_thr = log_us_thr;
> > +
> > +    while (argc > 1) {
> > +        if (!strcmp(argv[1], "on")) {
> > +            on = true;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "off")) {
> > +            on = false;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "-a")) {
> > +            it_after = strtol(argv[2], NULL, 10);
> > +            if (it_after < 0) {
> > +                it_after = 0;
> > +            } else if (it_before + it_after > HISTORY_LEN-1) {
> > +                it_after = HISTORY_LEN-1 - it_before;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-b")) {
> > +            it_before = strtol(argv[2], NULL, 10);
> > +            if (it_before < 0) {
> > +                it_before = 0;
> > +            } else if (it_before > HISTORY_LEN-10) {
> > +                it_before = HISTORY_LEN-10;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-q")) {
> > +            q_thr = strtol(argv[2], NULL, 10);
> > +            if (q_thr < 0) {
> > +                q_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-us")) {
> > +            us_thr = strtol(argv[2], NULL, 10);
> > +            if (us_thr < 0) {
> > +                us_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else {
> > +            usage = true;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (usage) {
> > +        unixctl_command_reply_error(conn,
> > +                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> > +                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> > +        return;
> > +    }
> > +
> > +    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> > +              " q_thr=%d\n",
> > +              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> > +    log_on = on;
> > +    log_it_before = it_before;
> > +    log_it_after = it_after;
> > +    log_q_thr = q_thr;
> > +    log_us_thr = us_thr;
> > +    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> > +
> > +    unixctl_command_reply(conn, "");
> > +}
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > index 7a89c40..aa55c68 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -160,8 +160,12 @@ struct pmd_perf_stats {
> >      struct histogram max_vhost_qfill;
> >      /* Iteration history buffer. */
> >      struct history iterations;
> > -    /* Millisecond hitory buffer. */
> > +    /* Millisecond history buffer. */
> >      struct history milliseconds;
> > +    /* Start of iteration range to log. */
> > +    uint64_t log_begin_it;
> > +    /* End of iteration range to log. */
> > +    uint64_t log_end_it;
> >  };
> >
> >  /* Support for accurate timing of PMD execution on TSC clock cycle level.
> > @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
> >      return history_next(h);
> >  }
> >
> > +/* Data and function related to logging of suspicious iterations. */
> > +
> > +extern bool log_on;
> > +extern uint32_t log_q_thr;
> > +extern uint64_t iter_cycle_threshold;
> > +
> > +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                       uint64_t cycles,
> > +                                       char *reason);
> > +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> > +
> >  /* Functions recording PMD metrics per iteration. */
> >
> >  static inline void
> > @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      uint64_t now_tsc = cycles_counter_update(s);
> >      struct iter_stats *cum_ms;
> >      uint64_t cycles, cycles_per_pkt = 0;
> > +    char *reason = NULL;
> >
> >      if (OVS_UNLIKELY(s->current.timestamp == 0)) {
> >          /* Stats were cleared during the ongoing iteration. */
> > @@ -404,6 +420,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      cum_ms->batches += s->current.batches;
> >      cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> >
> > +    if (log_on) {
> > +        /* Log suspicious iterations. */
> [[BO'M]] It took me a while to realise that of course we need to wait for some iterations after the suspicious iter in order to log details of
> the requested after iterations too.
> Suggest comment something like "Here we decide if this iteration is 'suspicious' and needs to be logged and if so what the range of before
> and after iterations that should be logged along with it."

Have improved the comments.

> > +        if (cycles > iter_cycle_threshold) {
> > +            reason = "Excessive total cycles";
> > +        } else if (s->current.max_vhost_qfill >= log_q_thr) {
> > +            reason = "Vhost RX queue full";
> > +        }
> > +        if (OVS_UNLIKELY(reason)) {
> > +            pmd_perf_log_suspicious_iteration(s, cycles, reason);
> > +        }
> > +
> > +        /* Log iteration stats interval when required. */
> [[BO'M]] Suggest something like "Suspicious iterations are not logged immediately - we wait until the required number of
> 'after' iterations are available and log everything together".

See above.

> > +        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> > +            pmd_perf_log_iteration_neighborhood(s);
> > +        }
> > +    }
> > +
> >      /* Store in iteration history. */
> >      history_store(&s->iterations, &s->current);
> >      if (now_tsc > s->next_check_tsc) {
> > @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      }
> >  }
> >
> > -/* Functions for formatting the output of commands. */
> > +/* Formatting the output of commands. */
> >
> >  struct pmd_perf_params {
> >      int command_type;
> > @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds *str,
> >                                         int n_iter);
> >  void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
> >                                  int n_ms);
> > +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                          int argc, const char *argv[],
> > +                          void *aux OVS_UNUSED);
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index f5931bf..b493216 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> >      ovs_mutex_lock(&dp_netdev_mutex);
> >
> >      while (argc > 1) {
> > -        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> > +        if (!strcmp(argv[1], "-pmd") && argc > 2) {
> >              if (str_to_uint(argv[2], 10, &core_id)) {
> >                  filter_on_pmd = true;
> >              }
> > @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> >                               0, 1, dpif_netdev_pmd_rebalance,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> > +                             "on|off [-b before] [-a after] "
> > +                             "[-us usec] [-q qlen]",
> > +                             0, 10, pmd_perf_log_set_cmd,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
> > index 53f4c51..82e372f 100644
> > --- a/lib/netdev-dpif-unixctl.man
> > +++ b/lib/netdev-dpif-unixctl.man
> > @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
> >
> >  The command options are
> >
> > -    \fB-nh\fR:                  Suppress the histograms
> > +    \fB-nh\fR:            Suppress the histograms
> >      \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
> >      \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
> >
> > @@ -101,10 +101,53 @@ To reset the counters and start a new measurement use
> >  .
> >  .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> >  Resets to zero the per pmd thread performance numbers shown by the
> > -\fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR commands.
> > +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands.
> >  It will NOT reset datapath or bridge statistics, only the values shown by
> >  the above commands.
> >  .
> > +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \
> > +[\fB-b\fR \fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> > +[\fB-q\fR \fIqlen\fR]"
> > +.
> > +The userspace "netdev" datapath is able to supervise the PMD
> > +performance metrics and detect iterations with suspicious statistics
> > +according to the following criteria:
> > +
> > +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> > +  This can be used to capture events where a PMD is blocked or
> > +  interrupted for such a period of time that there is a risk for
> > +  dropped packets on any of its Rx queues.
> > +
> > +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can
> > +  be used to infer virtio queue overruns and dropped packets inside a VM,
> > +  which are not visible in OVS otherwise.
> > +
> > +Such suspicious iterations can be logged together with their iteration
> > +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> > +to packet drop or other events outside OVS.
> > +
> > +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> > +supervision and logging at run-time and can be used to adjust the above
> > +thresholds for detecting suspicious iterations. By default supervision
> > +and logging is disabled.
> > +
> > +The command options are:
> > +
> > +    \fB-b\fR \fIbefore\fR:
> > +        The number of iterations before the suspicious iteration to be
> > +        logged (default 5).
> > +    \fB-a\fR \fIafter\fR:
> > +        The number of iterations after the suspicious iteration to be
> > +        logged (default 5).
> > +    \fB-q\fR \fIqlen\fR:
> > +        Suspicious vhost queue fill level threshold. Increase this to 512 if
> > +        the Qemu supports 1024 virtio queue length (default 128).
> > +    \fB-us\fR \fIusec\fR:
> > +        change the duration threshold for a suspicious iteration (default 250 us).
> > +
> > +If more than 100 iterations before or after a suspicious iteration have
> > +been looged once, OVS falls back to the safe default values (5/5) to
> [[BO'M]] typo
> > +avoid that logging itself causes continuos further suspicious
> [[BO'M]] typo

Fixed

> > +iterations and continuous logging.
> > +.
> >  .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> >  For one or all pmd threads of the datapath \fIdp\fR show the list of
> >  queue-ids with port names, which this thread polls.
> > --
> > 1.9.1
Jan Scheurich Jan. 26, 2018, 11:47 a.m. UTC | #4
Thanks for the review. Issues will be addressed in v8.
Please find answers below.
/Jan

> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mahony@intel.com]
> Sent: Friday, 19 January, 2018 18:11
> To: Jan Scheurich <jan.scheurich@ericsson.com>; dev@openvswitch.org
> Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>; i.maximets@samsung.com
> Subject: RE: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations
> 
> 
> 
> 
> > -----Original Message-----
> > From: Jan Scheurich [mailto:jan.scheurich@ericsson.com]
> > Sent: Tuesday, January 16, 2018 1:51 AM
> > To: dev@openvswitch.org
> > Cc: ktraynor@redhat.com; Stokes, Ian <ian.stokes@intel.com>;
> > i.maximets@samsung.com; O Mahony, Billy <billy.o.mahony@intel.com>;
> > Jan Scheurich <jan.scheurich@ericsson.com>
> > Subject: [PATCH v7 3/3] dpif-netdev: Detection and logging of suspicious
> > PMD iterations
> >
> > This patch enhances dpif-netdev-perf to detect iterations with suspicious
> > statistics according to the following criteria:
> >
> > - iteration lasts longer than US_THR microseconds (default 250).
> >   This can be used to capture events where a PMD is blocked or
> >   interrupted for such a period of time that there is a risk for
> >   dropped packets on any of its Rx queues.
> >
> > - max vhost qlen exceeds a threshold Q_THR (default 128). This can
> >   be used to infer virtio queue overruns and dropped packets inside
> >   a VM, which are not visible in OVS otherwise.
> >
> > Such suspicious iterations can be logged together with their iteration
> > statistics to be able to correlate them to packet drop or other events outside
> > OVS.
> >
> > A new command is introduced to enable/disable logging at run-time and to
> > adjust the above thresholds for suspicious iterations:
> >
> > ovs-appctl dpif-netdev/pmd-perf-log-set on | off
> >     [-b before] [-a after] [-us usec] [-q qlen]
> >
> > Turn logging on or off at run-time (on|off).
> >
> > -b before:  The number of iterations before the suspicious iteration to
> >             be logged (default 5).
> 
>  [[BO'M]] I've found the before value includes the suspicious iteration itself. I think it's fine to just mention that here rather than

I have fixed the logic so that I now log "before" + suspicious + "after" iterations.
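A sketch of what that fixed window amounts to, together with the clamping the option parser applies so the window always fits in the history ring (HISTORY_LEN is assumed here, and the helper names are hypothetical, not the patch's own):

```c
#include <stdint.h>

#define HISTORY_LEN 1000  /* assumed ring size for illustration */

/* Clamp the -a value: never negative, and the combined window must
 * leave room in the ring (mirrors the bounds check in the parser). */
static int
clamp_after(int before, int after)
{
    if (after < 0) {
        after = 0;
    } else if (before + after > HISTORY_LEN - 1) {
        after = HISTORY_LEN - 1 - before;
    }
    return after;
}

/* Total iterations dumped after the v8 fix: 'before' iterations, the
 * suspicious iteration itself, and 'after' iterations. */
static int
dump_size(int before, int after)
{
    return before + 1 + after;
}
```

With the defaults (-b 5 -a 5) the neighborhood dump therefore covers 11 iterations, with the suspicious one in the middle.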

> 
> > -a after:   The number of iterations after the suspicious iteration to
> >             be logged (default 5).
> > -q qlen:    Suspicious vhost queue fill level threshold. Increase this
> >             to 512 if the Qemu supports 1024 virtio queue length.
> >             (default 128).
> > -us usec:   change the duration threshold for a suspicious iteration
> >             (default 250 us).
> >
> > If more than 100 iterations before or after a suspicious iteration have been
> > looged once, OVS falls back to the safe default values (5/5) to avoid that
> > logging itself causes continuos further logging.
> >
> > Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> > ---
> >  NEWS                        |   2 +
> >  lib/dpif-netdev-perf.c      | 142
> > ++++++++++++++++++++++++++++++++++++++++++++
> >  lib/dpif-netdev-perf.h      |  40 ++++++++++++-
> >  lib/dpif-netdev.c           |   7 ++-
> >  lib/netdev-dpif-unixctl.man |  47 ++++++++++++++-
> >  5 files changed, 233 insertions(+), 5 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 743528e..7d40374 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -47,6 +47,8 @@ Post-v2.8.0
> >       * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a
> > single PMD
> >       * Detailed PMD performance metrics available with new command
> >           ovs-appctl dpif-netdev/pmd-perf-show
> > +     * Supervision of PMD performance metrics and logging of suspicious
> > +       iterations
> >     - vswitchd:
> >       * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
> >       * Configuring a controller, or unconfiguring all controllers, now deletes
> > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> > index e0ef15d..259a6c8 100644
> > --- a/lib/dpif-netdev-perf.c
> > +++ b/lib/dpif-netdev-perf.c
> > @@ -24,6 +24,23 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(pmd_perf);
> >
> > +#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
> > +                                   in microseconds. */
> > +#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
> > +#define LOG_IT_BEFORE 5         /* Number of iteration to log before
> > +                                   suspicious iteration. */
> > +#define LOG_IT_AFTER 5          /* Number of iteration to log after
> > +                                   suspicious iteration. */
> > +
> > +bool log_on = false;
> > +static uint32_t log_it_before = LOG_IT_BEFORE;
> > +static uint32_t log_it_after = LOG_IT_AFTER;
> > +static uint32_t log_us_thr = ITER_US_THRESHOLD;
> > +uint32_t log_q_thr = VHOST_QUEUE_FULL;
> > +uint64_t iter_cycle_threshold;
> > +
> > +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> > +
> >  #ifdef DPDK_NETDEV
> >  static uint64_t
> >  get_tsc_hz(void)
> > @@ -124,6 +141,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s)
> >      histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
> >      histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
> >      s->start_ms = time_msec();
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >  }
> >
> >  void
> > @@ -362,6 +381,8 @@ pmd_perf_stats_clear__(struct pmd_perf_stats *s)
> >      history_init(&s->milliseconds);
> >      s->start_ms = time_msec();
> >      s->milliseconds.sample[0].timestamp = s->start_ms;
> > +    s->log_begin_it = UINT64_MAX;
> > +    s->log_end_it = UINT64_MAX;
> >      /* Clearing finished. */
> >      s->clear = false;
> >  }
> > @@ -387,3 +408,124 @@ non_pmd_perf_stats_clear(struct pmd_perf_stats *s)
> >  {
> >      pmd_perf_stats_clear__(s);
> >  }
> > +
> > +void
> > +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                     uint64_t cycles,
> > +                                     char *reason)
> > +{
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Suspicious iteration (%s): tsc=%"PRIu64
> > +                 " duration=%"PRIu64" us\n",
> > +                 reason, s->current.timestamp,
> > +                 (1000000L * cycles) / get_tsc_hz());
> > +    if (log_it_before + log_it_after > 0) {
> > +        if (s->log_begin_it == UINT64_MAX) {
> > +            s->log_begin_it =
> > +                    (s->iterations.idx + HISTORY_LEN - log_it_before)
> > +                        % HISTORY_LEN;
> > +        }
> > +        s->log_end_it =
> > +                (s->iterations.idx + log_it_after) % HISTORY_LEN;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s)
> > +{
> > +    struct ds log = DS_EMPTY_INITIALIZER;
> > +    pmd_perf_format_iteration_history(&log, s,
> > +            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
> > +    VLOG_WARN_RL(&latency_rl,
> > +                 "Neighborhood of suspicious iteration:\n"
> > +                 "%s", ds_cstr(&log));
> > +    ds_destroy(&log);
> > +    s->log_end_it = s->log_begin_it = UINT64_MAX;
> > +    if (log_it_before > 100 || log_it_after > 100) {
> > +        /* Reset to safe default values to avoid disturbance. */
> > +        log_it_before = LOG_IT_BEFORE;
> > +        log_it_after = LOG_IT_AFTER;
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                 int argc, const char *argv[],
> > +                 void *aux OVS_UNUSED)
> > +{
> > +    int it_before, it_after, us_thr, q_thr;
> > +    bool on;
> > +    bool usage = false;
> > +
> > +    on = log_on;
> > +    it_before = log_it_before;
> > +    it_after = log_it_after;
> > +    q_thr = log_q_thr;
> > +    us_thr = log_us_thr;
> > +
> > +    while (argc > 1) {
> > +        if (!strcmp(argv[1], "on")) {
> > +            on = true;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "off")) {
> > +            on = false;
> > +            argc--;
> > +            argv++;
> > +        } else if (!strcmp(argv[1], "-a")) {
> > +            it_after = strtol(argv[2], NULL, 10);
> > +            if (it_after < 0) {
> > +                it_after = 0;
> > +            } else if (it_before + it_after > HISTORY_LEN-1) {
> > +                it_after = HISTORY_LEN-1 - it_before;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-b")) {
> > +            it_before = strtol(argv[2], NULL, 10);
> > +            if (it_before < 0) {
> > +                it_before = 0;
> > +            } else if (it_before > HISTORY_LEN-10) {
> > +                it_before = HISTORY_LEN-10;
> > +            }
> [[BO'M]] The logic here assumes -b is given before -a. If you say
> 	ovs-appctl dpif-netdev/pmd-perf-log-set on -a 1500 -b 1500
> then you get before+after > HISTORY_LEN:
> 	2018-01-19T13:57:48.953Z|00141|pmd_perf|INFO|pmd-perf-log-set: on, before=990, after=994, us_thr=250, q_thr=128
> If the CLI values were parsed first and then fixed up after the argv loop, that would avoid the inter-dependencies causing this.

Good spot. Have fixed this in v8.
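A sketch of the parse-then-fix-up approach suggested above (illustrative only, not the actual v8 code): the option values are collected first and the combined window is clamped once after the loop, so the result no longer depends on the order of -b and -a. HISTORY_LEN and the clamping limits mirror the values used in this series; the helper name is made up here.

```c
#include <assert.h>

#define HISTORY_LEN 1000    /* Iteration history ring size, as in this series. */

/* Clamp the -b/-a window once after all arguments have been parsed, so
 * the fix-up is independent of the option order on the command line. */
static void
clamp_log_window(int *it_before, int *it_after)
{
    if (*it_before < 0) {
        *it_before = 0;
    }
    if (*it_after < 0) {
        *it_after = 0;
    }
    /* Leave some head room in the history for the suspicious iteration. */
    if (*it_before > HISTORY_LEN - 10) {
        *it_before = HISTORY_LEN - 10;
    }
    if (*it_before + *it_after > HISTORY_LEN - 1) {
        *it_after = HISTORY_LEN - 1 - *it_before;
    }
}
```

With `-a 1500 -b 1500` in either order this yields before=990 and after=9, which always fits into the history buffer.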

> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-q")) {
> > +            q_thr = strtol(argv[2], NULL, 10);
> > +            if (q_thr < 0) {
> > +                q_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else if (!strcmp(argv[1], "-us")) {
> > +            us_thr = strtol(argv[2], NULL, 10);
> > +            if (us_thr < 0) {
> > +                us_thr = 0;
> > +            }
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else {
> > +            usage = true;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (usage) {
> > +        unixctl_command_reply_error(conn,
> > +                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
> > +                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
> > +        return;
> > +    }
> > +
> > +    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
> > +              " q_thr=%d\n",
> > +              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
> > +    log_on = on;
> > +    log_it_before = it_before;
> > +    log_it_after = it_after;
> > +    log_q_thr = q_thr;
> > +    log_us_thr = us_thr;
> > +    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
> > +
> > +    unixctl_command_reply(conn, "");
> > +}
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > index 7a89c40..aa55c68 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -160,8 +160,12 @@ struct pmd_perf_stats {
> >      struct histogram max_vhost_qfill;
> >      /* Iteration history buffer. */
> >      struct history iterations;
> > -    /* Millisecond hitory buffer. */
> > +    /* Millisecond history buffer. */
> >      struct history milliseconds;
> > +    /* Start of iteration range to log. */
> > +    uint64_t log_begin_it;
> > +    /* End of iteration range to log. */
> > +    uint64_t log_end_it;
> >  };
> >
> >  /* Support for accurate timing of PMD execution on TSC clock cycle level.
> > @@ -322,6 +326,17 @@ history_store(struct history *h, struct iter_stats *is)
> >      return history_next(h);
> >  }
> >
> > +/* Data and function related to logging of suspicious iterations. */
> > +
> > +extern bool log_on;
> > +extern uint32_t log_q_thr;
> > +extern uint64_t iter_cycle_threshold;
> > +
> > +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
> > +                                       uint64_t cycles,
> > +                                       char *reason);
> > +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
> > +
> >  /* Functions recording PMD metrics per iteration. */
> >
> >  static inline void
> > @@ -351,6 +366,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      uint64_t now_tsc = cycles_counter_update(s);
> >      struct iter_stats *cum_ms;
> >      uint64_t cycles, cycles_per_pkt = 0;
> > +    char *reason = NULL;
> >
> >      if (OVS_UNLIKELY(s->current.timestamp == 0)) {
> >          /* Stats were cleared during the ongoing iteration. */
> > @@ -404,6 +420,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      cum_ms->batches += s->current.batches;
> >      cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
> >
> > +    if (log_on) {
> > +        /* Log suspicious iterations. */
> > +        if (cycles > iter_cycle_threshold) {
> > +            reason = "Excessive total cycles";
> > +        } else if (s->current.max_vhost_qfill >= log_q_thr) {
> > +            reason = "Vhost RX queue full";
> > +        }
> > +        if (OVS_UNLIKELY(reason)) {
> > +            pmd_perf_log_suspicious_iteration(s, cycles, reason);
> > +        }
> > +
> > +        /* Log iteration stats interval when required. */
> > +        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
> > +            pmd_perf_log_iteration_neighborhood(s);
> > +        }
> > +    }
> > +
> >      /* Store in iteration history. */
> >      history_store(&s->iterations, &s->current);
> >      if (now_tsc > s->next_check_tsc) {
> > @@ -426,7 +459,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
> >      }
> >  }
> [[BO'M]]
> I tested the logging by randomly deciding (about every 30s or so) that an iter was suspicious (I did "if (cycles > iter_cycle_threshold ||
> random_uint32() < 200 ) { ... ")

Clever idea! Used that myself now in testing this more easily.
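The random test hook described above can be sketched as follows; random_uint32() is OVS's helper from lib/random.h, and the rand()-based stand-in below only keeps the sketch self-contained (its distribution differs from the real helper).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for OVS random_uint32() from lib/random.h. */
static uint32_t
random_uint32(void)
{
    return (uint32_t) rand();
}

/* Treat an iteration as suspicious either because it genuinely exceeded
 * the cycle threshold, or because the random hook fired, so that the
 * logging path can be exercised without waiting for real latency spikes. */
static bool
iteration_suspicious(uint64_t cycles, uint64_t iter_cycle_threshold)
{
    return cycles > iter_cycle_threshold || random_uint32() < 200;
}
```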

> 
> In that case it seems that nearly always the iter *after* the suspicious iter is also quite a long iter. I believe this is because the cost of
> the vlog in pmd_perf_log_suspicious_iteration gets assigned to the iter after the suspicious iter (the cycles for the current iteration
> being finalized earlier in the fn by "s->current.cycles = cycles;").
> 
> 2018-01-19T14:46:13.280Z|00497|pmd_perf(pmd60)|WARN|Suspicious iteration (Excessive total cycles): tsc=8512846813946109
> duration=269 us
> 2018-01-19T14:46:13.280Z|00498|pmd_perf(pmd60)|WARN|Neighborhood of suspicious iteration:
>    tsc                 cycles       packets      cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
>    8512846814787757    30208        128          236          32           0            0            0
>    8512846814757449    30308        128          236          32           0            0            0
>    8512846814727185    30264        128          236          32           0            0            0
>    8512846814564817    162368       128          1268         32           0            0            0
>    8512846813946109    618708       128          4833         32           0            0            0   <<<
>    8512846813916321    29788        128          232          32           0            0            0
>    8512846813886577    29744        128          232          32           0            0            0
>    8512846813856937    29640        128          231          32           0            0            0
>    8512846813827497    29440        128          230          32           0            0            0
>    8512846813798061    29436        128          229          32           0            0            0
> 
> 
> It would be good to acknowledge this in the docs or ideally to move the "WARN suspicious ... " vlog to be given at the same time as the iter
> dump vlog.
> 
> However, I can't figure out why, when I just added a random_uint32 check to trigger a suspicious iter, the 'suspicious' iteration still shows an
> increased cycle count.

Thanks for the proposal. I have delayed the WARN vlog to the iteration where I also log the iteration neighborhood, except when there are multiple suspicious iterations in the logged range, in which case I need to log the previous one immediately before storing the next.

I have also added a config knob "[-e|-ne]" to the perf-log-set command to enable/disable the automatic extension of the logged interval in the event of subsequent suspicious iterations before logging is triggered. By default extension is disabled so that we don't get uncontrolled long intervals logged.

The fallback to safe settings "-b 5 -a 5 -ne" now happens whenever more than 100 iterations have been logged.

I also found the ghost peak of cycles in and around the suspicious iteration confusing. Previously I thought it was caused by the WARN vlog itself, but the ghost peak remained even after postponing the actual vlog. Perhaps some memory access latencies?

I have now decided to hide the cycles used by actions related to supervision and logging and report only the actual packet-handling cycles in the history. One can still get the full iteration wall-clock cycles by looking at the tsc time stamps.
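One way to carve the supervision/logging cost out of the recorded cycles is to restart the iteration clock after the logging work completes. A minimal sketch with a fake TSC (the real code would use cycles_counter_update(); finish_iteration and the 500000-cycle vlog cost are made up here):

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t fake_tsc;   /* Stand-in for the TSC register. */

static uint64_t
cycles_counter(void)
{
    return fake_tsc;
}

static void
expensive_vlog(void)
{
    fake_tsc += 500000;     /* Simulate the cost of a vlog call. */
}

/* Record only the packet-handling cycles for this iteration; time spent
 * on logging is skipped over by re-reading the counter afterwards, so it
 * never shows up in this or the next iteration's cycle count. The tsc
 * stamps still expose the full wall-clock gap between iterations. */
static uint64_t
finish_iteration(uint64_t *iter_start, bool suspicious)
{
    uint64_t cycles = cycles_counter() - *iter_start;
    if (suspicious) {
        expensive_vlog();
    }
    *iter_start = cycles_counter();  /* Next iteration starts after logging. */
    return cycles;
}
```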

> 
> >
> > -/* Functions for formatting the output of commands. */
> > +/* Formatting the output of commands. */
> >
> >  struct pmd_perf_params {
> >      int command_type;
> > @@ -443,6 +476,9 @@ void pmd_perf_format_iteration_history(struct ds *str,
> >                                         int n_iter);
> >  void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
> >                                  int n_ms);
> > +void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
> > +                          int argc, const char *argv[],
> > +                          void *aux OVS_UNUSED);
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index f5931bf..b493216 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -1095,7 +1095,7 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> >      ovs_mutex_lock(&dp_netdev_mutex);
> >
> >      while (argc > 1) {
> > -        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
> > +        if (!strcmp(argv[1], "-pmd") && argc > 2) {
> >              if (str_to_uint(argv[2], 10, &core_id)) {
> >                  filter_on_pmd = true;
> >              }
> > @@ -1218,6 +1218,11 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> >                               0, 1, dpif_netdev_pmd_rebalance,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
> > +                             "on|off [-b before] [-a after] "
> > +                             "[-us usec] [-q qlen]",
> > +                             0, 10, pmd_perf_log_set_cmd,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
> > index 53f4c51..82e372f 100644
> > --- a/lib/netdev-dpif-unixctl.man
> > +++ b/lib/netdev-dpif-unixctl.man
> > @@ -70,7 +70,7 @@ This raw recorded data is used threefold:
> >
> >  The command options are
> >
> > -    \fB-nh\fR:                  Suppress the histograms
> > +    \fB-nh\fR:            Suppress the histograms
> >      \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
> >      \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
> >
> > @@ -101,10 +101,53 @@ To reset the counters and start a new measurement use
> >  .
> >  .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
> >  Resets to zero the per pmd thread performance numbers shown by the
> > -\fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR commands.
> > +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands.
> >  It will NOT reset datapath or bridge statistics, only the values shown by
> >  the above commands.
> >  .
> > +.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \  [\fB-b\fR
> > +\fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
> > +[\fB-q\fR \fIqlen\fR]"
> > +.
> > +The userspace "netdev" datapath is able to supervise the PMD
> > +performance metrics and detect iterations with suspicious statistics
> > +according to the following criteria:
> > +
> > +- The iteration lasts longer than \fIusec\fR microseconds (default 250).
> > +  This can be used to capture events where a PMD is blocked or
> > +  interrupted for such a period of time that there is a risk for
> > +  dropped packets on any of its Rx queues.
> > +
> > +- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can
> > +  be used to infer virtio queue overruns and dropped packets inside a VM,
> > +  which are not visible in OVS otherwise.
> > +
> > +Such suspicious iterations can be logged together with their iteration
> > +statistics in the \fBovs-vswitchd.log\fR to be able to correlate them
> > +to packet drop or other events outside OVS.
> > +
> > +The above command enables (\fBon\fR) or disables (\fBoff\fR)
> > +supervision and logging at run-time and can be used to adjust the above
> > +thresholds for detecting suspicious iterations. By default supervision
> > +and logging is disabled.
> > +
> > +The command options are:
> > +
> > +    \fB-b\fR \fIbefore\fR:
> > +        The number of iterations before the suspicious iteration to be logged (default 5).
> > +    \fB-a\fR \fIafter\fR:
> > +        The number of iterations after the suspicious iteration to be logged (default 5).
> > +    \fB-q\fR \fIqlen\fR:
> > +        Suspicious vhost queue fill level threshold. Increase this to 512 if
> > +        the Qemu supports 1024 virtio queue length (default 128).
> > +    \fB-us\fR \fIusec\fR:
> > +        Change the duration threshold for a suspicious iteration (default 250 us).
> > +
> > +If more than 100 iterations before or after a suspicious iteration have
> > +been logged once, OVS falls back to the safe default values (5/5) to
> > +prevent the logging itself from causing continuous further suspicious
> > +iterations and continuous logging.
> > +.
> >  .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
> >  For one or all pmd threads of the datapath \fIdp\fR show the list of queue-ids
> >  with port names, which this thread polls.
> > --
> > 1.9.1

Patch

diff --git a/NEWS b/NEWS
index 743528e..7d40374 100644
--- a/NEWS
+++ b/NEWS
@@ -47,6 +47,8 @@  Post-v2.8.0
      * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
      * Detailed PMD performance metrics available with new command
          ovs-appctl dpif-netdev/pmd-perf-show
+     * Supervision of PMD performance metrics and logging of suspicious
+       iterations
    - vswitchd:
      * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 digits.
      * Configuring a controller, or unconfiguring all controllers, now deletes
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index e0ef15d..259a6c8 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -24,6 +24,23 @@ 
 
 VLOG_DEFINE_THIS_MODULE(pmd_perf);
 
+#define ITER_US_THRESHOLD 250   /* Warning threshold for iteration duration
+                                   in microseconds. */
+#define VHOST_QUEUE_FULL 128    /* Size of the virtio TX queue. */
+#define LOG_IT_BEFORE 5         /* Number of iterations to log before
+                                   suspicious iteration. */
+#define LOG_IT_AFTER 5          /* Number of iterations to log after
+                                   suspicious iteration. */
+
+bool log_on = false;
+static uint32_t log_it_before = LOG_IT_BEFORE;
+static uint32_t log_it_after = LOG_IT_AFTER;
+static uint32_t log_us_thr = ITER_US_THRESHOLD;
+uint32_t log_q_thr = VHOST_QUEUE_FULL;
+uint64_t iter_cycle_threshold;
+
+static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600);
+
 #ifdef DPDK_NETDEV
 static uint64_t
 get_tsc_hz(void)
@@ -124,6 +141,8 @@  pmd_perf_stats_init(struct pmd_perf_stats *s)
     histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000);
     histogram_walls_set_log(&s->max_vhost_qfill, 0, 512);
     s->start_ms = time_msec();
+    s->log_begin_it = UINT64_MAX;
+    s->log_end_it = UINT64_MAX;
 }
 
 void
@@ -362,6 +381,8 @@  pmd_perf_stats_clear__(struct pmd_perf_stats *s)
     history_init(&s->milliseconds);
     s->start_ms = time_msec();
     s->milliseconds.sample[0].timestamp = s->start_ms;
+    s->log_begin_it = UINT64_MAX;
+    s->log_end_it = UINT64_MAX;
     /* Clearing finished. */
     s->clear = false;
 }
@@ -387,3 +408,124 @@  non_pmd_perf_stats_clear(struct pmd_perf_stats *s)
 {
     pmd_perf_stats_clear__(s);
 }
+
+void
+pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
+                                     uint64_t cycles,
+                                     char *reason)
+{
+    VLOG_WARN_RL(&latency_rl,
+                 "Suspicious iteration (%s): tsc=%"PRIu64
+                 " duration=%"PRIu64" us\n",
+                 reason, s->current.timestamp,
+                 (1000000L * cycles) / get_tsc_hz());
+    if (log_it_before + log_it_after > 0) {
+        if (s->log_begin_it == UINT64_MAX) {
+            s->log_begin_it =
+                    (s->iterations.idx + HISTORY_LEN - log_it_before)
+                        % HISTORY_LEN;
+        }
+        s->log_end_it =
+                (s->iterations.idx + log_it_after) % HISTORY_LEN;
+    }
+}
+
+void
+pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s)
+{
+    struct ds log = DS_EMPTY_INITIALIZER;
+    pmd_perf_format_iteration_history(&log, s,
+            (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN);
+    VLOG_WARN_RL(&latency_rl,
+                 "Neighborhood of suspicious iteration:\n"
+                 "%s", ds_cstr(&log));
+    ds_destroy(&log);
+    s->log_end_it = s->log_begin_it = UINT64_MAX;
+    if (log_it_before > 100 || log_it_after > 100) {
+        /* Reset to safe default values to avoid disturbance. */
+        log_it_before = LOG_IT_BEFORE;
+        log_it_after = LOG_IT_AFTER;
+    }
+}
+
+void
+pmd_perf_log_set_cmd(struct unixctl_conn *conn,
+                 int argc, const char *argv[],
+                 void *aux OVS_UNUSED)
+{
+    int it_before, it_after, us_thr, q_thr;
+    bool on;
+    bool usage = false;
+
+    on = log_on;
+    it_before = log_it_before;
+    it_after = log_it_after;
+    q_thr = log_q_thr;
+    us_thr = log_us_thr;
+
+    while (argc > 1) {
+        if (!strcmp(argv[1], "on")) {
+            on = true;
+            argc--;
+            argv++;
+        } else if (!strcmp(argv[1], "off")) {
+            on = false;
+            argc--;
+            argv++;
+        } else if (!strcmp(argv[1], "-a")) {
+            it_after = strtol(argv[2], NULL, 10);
+            if (it_after < 0) {
+                it_after = 0;
+            } else if (it_before + it_after > HISTORY_LEN-1) {
+                it_after = HISTORY_LEN-1 - it_before;
+            }
+            argc -= 2;
+            argv += 2;
+        } else if (!strcmp(argv[1], "-b")) {
+            it_before = strtol(argv[2], NULL, 10);
+            if (it_before < 0) {
+                it_before = 0;
+            } else if (it_before > HISTORY_LEN-10) {
+                it_before = HISTORY_LEN-10;
+            }
+            argc -= 2;
+            argv += 2;
+        } else if (!strcmp(argv[1], "-q")) {
+            q_thr = strtol(argv[2], NULL, 10);
+            if (q_thr < 0) {
+                q_thr = 0;
+            }
+            argc -= 2;
+            argv += 2;
+        } else if (!strcmp(argv[1], "-us")) {
+            us_thr = strtol(argv[2], NULL, 10);
+            if (us_thr < 0) {
+                us_thr = 0;
+            }
+            argc -= 2;
+            argv += 2;
+        } else {
+            usage = true;
+            break;
+        }
+    }
+
+    if (usage) {
+        unixctl_command_reply_error(conn,
+                "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set "
+                "[on|off] [-b before] [-a after] [-us usec] [-q qlen]");
+        return;
+    }
+
+    VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d,"
+              " q_thr=%d\n",
+              on ? "on" : "off", it_before, it_after, us_thr, q_thr);
+    log_on = on;
+    log_it_before = it_before;
+    log_it_after = it_after;
+    log_q_thr = q_thr;
+    log_us_thr = us_thr;
+    iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L;
+
+    unixctl_command_reply(conn, "");
+}
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 7a89c40..aa55c68 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -160,8 +160,12 @@  struct pmd_perf_stats {
     struct histogram max_vhost_qfill;
     /* Iteration history buffer. */
     struct history iterations;
-    /* Millisecond hitory buffer. */
+    /* Millisecond history buffer. */
     struct history milliseconds;
+    /* Start of iteration range to log. */
+    uint64_t log_begin_it;
+    /* End of iteration range to log. */
+    uint64_t log_end_it;
 };
 
 /* Support for accurate timing of PMD execution on TSC clock cycle level.
@@ -322,6 +326,17 @@  history_store(struct history *h, struct iter_stats *is)
     return history_next(h);
 }
 
+/* Data and function related to logging of suspicious iterations. */
+
+extern bool log_on;
+extern uint32_t log_q_thr;
+extern uint64_t iter_cycle_threshold;
+
+void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s,
+                                       uint64_t cycles,
+                                       char *reason);
+void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s);
+
 /* Functions recording PMD metrics per iteration. */
 
 static inline void
@@ -351,6 +366,7 @@  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
     uint64_t now_tsc = cycles_counter_update(s);
     struct iter_stats *cum_ms;
     uint64_t cycles, cycles_per_pkt = 0;
+    char *reason = NULL;
 
     if (OVS_UNLIKELY(s->current.timestamp == 0)) {
         /* Stats were cleared during the ongoing iteration. */
@@ -404,6 +420,23 @@  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
     cum_ms->batches += s->current.batches;
     cum_ms->max_vhost_qfill += s->current.max_vhost_qfill;
 
+    if (log_on) {
+        /* Log suspicious iterations. */
+        if (cycles > iter_cycle_threshold) {
+            reason = "Excessive total cycles";
+        } else if (s->current.max_vhost_qfill >= log_q_thr) {
+            reason = "Vhost RX queue full";
+        }
+        if (OVS_UNLIKELY(reason)) {
+            pmd_perf_log_suspicious_iteration(s, cycles, reason);
+        }
+
+        /* Log iteration stats interval when required. */
+        if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) {
+            pmd_perf_log_iteration_neighborhood(s);
+        }
+    }
+
     /* Store in iteration history. */
     history_store(&s->iterations, &s->current);
     if (now_tsc > s->next_check_tsc) {
@@ -426,7 +459,7 @@  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
     }
 }
 
-/* Functions for formatting the output of commands. */
+/* Formatting the output of commands. */
 
 struct pmd_perf_params {
     int command_type;
@@ -443,6 +476,9 @@  void pmd_perf_format_iteration_history(struct ds *str,
                                        int n_iter);
 void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s,
                                 int n_ms);
+void pmd_perf_log_set_cmd(struct unixctl_conn *conn,
+                          int argc, const char *argv[],
+                          void *aux OVS_UNUSED);
 
 #ifdef  __cplusplus
 }
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f5931bf..b493216 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1095,7 +1095,7 @@  dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
     ovs_mutex_lock(&dp_netdev_mutex);
 
     while (argc > 1) {
-        if (!strcmp(argv[1], "-pmd") && argc >= 3) {
+        if (!strcmp(argv[1], "-pmd") && argc > 2) {
             if (str_to_uint(argv[2], 10, &core_id)) {
                 filter_on_pmd = true;
             }
@@ -1218,6 +1218,11 @@  dpif_netdev_init(void)
     unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
                              0, 1, dpif_netdev_pmd_rebalance,
                              NULL);
+    unixctl_command_register("dpif-netdev/pmd-perf-log-set",
+                             "on|off [-b before] [-a after] "
+                             "[-us usec] [-q qlen]",
+                             0, 10, pmd_perf_log_set_cmd,
+                             NULL);
     return 0;
 }
 
diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man
index 53f4c51..82e372f 100644
--- a/lib/netdev-dpif-unixctl.man
+++ b/lib/netdev-dpif-unixctl.man
@@ -70,7 +70,7 @@  This raw recorded data is used threefold:
 
 The command options are
 
-    \fB-nh\fR:                  Suppress the histograms
+    \fB-nh\fR:            Suppress the histograms
     \fB-it\fR \fIiter_len\fR:   Display the last iter_len iteration stats
     \fB-ms\fR \fIms_len\fR:     Display the last ms_len millisecond stats
 
@@ -101,10 +101,53 @@  To reset the counters and start a new measurement use
 .
 .IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]"
 Resets to zero the per pmd thread performance numbers shown by the
-\fBdpif-netdev/pmd-stats-show\fR and \Bdpif-netdev/pmd-perf-show \fR commands.
+\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR commands.
 It will NOT reset datapath or bridge statistics, only the values shown by
 the above commands.
 .
+.IP "\fBdpif-netdev/pmd-perf-log-set\fR \fBon\fR|\fBoff\fR \
+ [\fB-b\fR \fIbefore\fR] [\fB-a\fR \fIafter\fR] [\fB-us\fR \fIusec\fR] \
+ [\fB-q\fR \fIqlen\fR]"
+.
+The userspace "netdev" datapath is able to supervise the PMD performance
+metrics and detect iterations with suspicious statistics according to the
+following criteria:
+
+- The iteration lasts longer than \fIusec\fR microseconds (default 250).
+  This can be used to capture events where a PMD is blocked or
+  interrupted for such a period of time that there is a risk for
+  dropped packets on any of its Rx queues.
+
+- The max vhost qlen exceeds a threshold \fIqlen\fR (default 128). This can
+  be used to infer virtio queue overruns and dropped packets inside a VM,
+  which are not visible in OVS otherwise.
+
+Such suspicious iterations can be logged together with their iteration
+statistics in the \fBovs-vswitchd.log\fR to be able to correlate them to
+packet drop or other events outside OVS.
+
+The above command enables (\fBon\fR) or disables (\fBoff\fR) supervision and
+logging at run-time and can be used to adjust the above thresholds for
+detecting suspicious iterations. By default supervision and logging is
+disabled.
+
+The command options are:
+
+    \fB-b\fR \fIbefore\fR:
+        The number of iterations before the suspicious iteration to be logged (default 5).
+    \fB-a\fR \fIafter\fR:
+        The number of iterations after the suspicious iteration to be logged (default 5).
+    \fB-q\fR \fIqlen\fR:
+        Suspicious vhost queue fill level threshold. Increase this to 512 if
+        the Qemu supports 1024 virtio queue length (default 128).
+    \fB-us\fR \fIusec\fR:
+        Change the duration threshold for a suspicious iteration (default 250 us).
+
+If more than 100 iterations before or after a suspicious iteration have
+been logged once, OVS falls back to the safe default values (5/5) to
+prevent the logging itself from causing continuous further suspicious
+iterations and continuous logging.
+.
 .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]"
 For one or all pmd threads of the datapath \fIdp\fR show the list of queue-ids
 with port names, which this thread polls.