diff mbox series

[ovs-dev,v2,1/2] dpif-offload: Add infrastructure for offload provider PMD helpers.

Message ID da7bd21d7cde2bb8f29f7acb7b3e9b3e9e2fed04.1775044381.git.echaudro@redhat.com
State New
Headers show
Series dpif-offload: Add PMD thread helpers and hardware offload simulation | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/cirrus-robot success cirrus build: passed
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Eelco Chaudron April 1, 2026, 11:57 a.m. UTC
This patch adds support for provider-specific PMD thread
initialization and deinitialization, and for a callback that is
executed to perform work as part of the PMD thread loop. This
allows hardware offload providers to handle any provider-specific
asynchronous or batching work.

This patch also adds cycle statistics for the provider-specific
callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dpif-netdev-perf.c        |  19 ++++-
 lib/dpif-netdev-perf.h        |   3 +-
 lib/dpif-netdev.c             |  42 ++++++++++-
 lib/dpif-offload-dummy.c      |  38 ++++++++++
 lib/dpif-offload-provider.h   |  26 +++++++
 lib/dpif-offload.c            | 133 ++++++++++++++++++++++++++++++++++
 lib/dpif-offload.h            |  11 +++
 tests/pmd.at                  |  32 ++++++++
 utilities/checkpatch_dict.txt |   2 +
 9 files changed, 298 insertions(+), 8 deletions(-)
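
The lifecycle contract this patch introduces can be sketched with a small
self-contained model. This is illustrative only, not code from the patch:
the provider body, the context size, and the simulated loop below are all
invented, with names loosely mirroring the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors dpif_offload_pmd_thread_work_cb from the patch. */
typedef void work_cb(unsigned core_id, int numa_id, void *ctx);

static int work_calls;

static void
provider_work(unsigned core_id, int numa_id, void *ctx)
{
    (void) core_id;
    (void) numa_id;
    (void) ctx;
    work_calls++;    /* E.g., poll completions, do batched maintenance. */
}

/* The hook is called on every PMD (re)load with the previously stored
 * callback/context passed back in, and once more with is_exit == true
 * on thread teardown. */
static void
provider_lifecycle(bool is_exit, unsigned core_id, int numa_id,
                   work_cb **cb, void **ctx)
{
    (void) core_id;
    (void) numa_id;
    if (is_exit) {
        free(*ctx);              /* Teardown: release per-thread state. */
        *cb = NULL;
        *ctx = NULL;
    } else if (!*ctx) {
        *ctx = calloc(1, 64);    /* First load: allocate per-thread state. */
        *cb = provider_work;
    }
    /* Reload with an existing ctx: simply keep reusing it. */
}

/* What the PMD thread loop effectively does with the hook. */
static int
simulate_pmd_thread(int reloads, int iterations_per_load)
{
    work_cb *cb = NULL;
    void *ctx = NULL;

    work_calls = 0;
    for (int r = 0; r < reloads; r++) {
        provider_lifecycle(false, 0, 0, &cb, &ctx);
        for (int i = 0; i < iterations_per_load; i++) {
            if (cb) {
                cb(0, 0, ctx);   /* dpif_offload_pmd_thread_do_work(). */
            }
        }
    }
    provider_lifecycle(true, 0, 0, &cb, &ctx);
    return work_calls;
}
```

In the real patch the dispatch is done by dpif_offload_pmd_thread_reload(),
_do_work(), and _exit() in lib/dpif-offload.c, which additionally iterate
over all registered providers and time the callbacks.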

Comments

Eelco Chaudron April 1, 2026, 12:03 p.m. UTC | #1
On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:

> This patch adds support for provider-specific PMD thread
> initialization and deinitialization, and for a callback that is
> executed to perform work as part of the PMD thread loop. This
> allows hardware offload providers to handle any provider-specific
> asynchronous or batching work.
>
> This patch also adds cycle statistics for the provider-specific
> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.

Bringing back the discussion on the earlier patch between Ilya and Gaetan to this revision :)

Ilya:
  Hi, Eelco.  As we talked before, this infrastructure resembles the async
  work infra that was proposed in the past for the use case of async vhost
  processing.  And I don't see any real use case proposed for it here nor
  in the RFC, where the question was asked but never answered.

Gaetan:

  Hi Ilya, Eelco,

  Thanks for the patch and for the review.

  The use case on our side is distributed data structures in DOCA that
  require each participating thread to do maintenance work periodically.

  Specifically, offload threads will insert offload objects.
  Those will reserve entries in a map that can be resized. The DOCA
  implementation requires any thread that owns an entry to perform the
  work of moving it to the new bucket / space after resize is initiated.

  This is a pervasive design choice in DOCA; most of its APIs are
  written assuming participating threads periodically call into these
  maintenance functions.

  Some of this work is also time-sensitive; for example, the current
  implementation requires a CT offload thread to receive completions after
  some hardware initialization.  Until this completion arrives, the CT
  offload entry is not fully usable (it cannot be queried for activity /
  counters).  We cannot leave batches of CT offload entries waiting for
  completion on the assumption that, at some later point, we will
  eventually re-execute something in our offload provider: that leaves a
  few stranded connection objects incomplete.

  The result is hardware execution of a flow with CT actions but no
  activity counters: the software datapath then deletes the connection
  and/or flow due to inactivity.
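
To make the constraint concrete, here is a toy model (not DOCA code; the
structure, names, and sizes are all invented) of a resizable map in which
each thread must migrate the entries it owns during its periodic
maintenance call, because no other thread may touch them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define OLD_CAP 4
#define NEW_CAP 8

/* Toy map mid-resize: entries live in old_slots until their owning
 * thread migrates them into new_slots. */
struct toy_map {
    int old_slots[OLD_CAP];   /* Entry id, or -1 if empty. */
    int old_owner[OLD_CAP];   /* Owning thread id. */
    int new_slots[NEW_CAP];
    bool resizing;
};

static void
toy_map_init(struct toy_map *m)
{
    memset(m, 0, sizeof *m);
    for (int i = 0; i < OLD_CAP; i++) {
        m->old_slots[i] = -1;
    }
    for (int i = 0; i < NEW_CAP; i++) {
        m->new_slots[i] = -1;
    }
}

static void
toy_map_insert(struct toy_map *m, int thread_id, int entry)
{
    for (int i = 0; i < OLD_CAP; i++) {
        if (m->old_slots[i] < 0) {
            m->old_slots[i] = entry;
            m->old_owner[i] = thread_id;
            return;
        }
    }
}

/* Periodic maintenance: a thread moves only the entries it owns.
 * Returns the number of entries migrated. */
static int
toy_map_maintain(struct toy_map *m, int thread_id)
{
    int moved = 0;

    if (!m->resizing) {
        return 0;
    }
    for (int i = 0; i < OLD_CAP; i++) {
        if (m->old_slots[i] >= 0 && m->old_owner[i] == thread_id) {
            m->new_slots[m->old_slots[i] % NEW_CAP] = m->old_slots[i];
            m->old_slots[i] = -1;
            moved++;
        }
    }
    return moved;
}

/* Two threads insert, a resize starts, each migrates its own entries. */
static int
toy_demo(void)
{
    struct toy_map m;
    int a, b;

    toy_map_init(&m);
    toy_map_insert(&m, 0, 10);
    toy_map_insert(&m, 1, 11);
    toy_map_insert(&m, 0, 12);
    m.resizing = true;
    a = toy_map_maintain(&m, 0);   /* Thread 0 migrates its 2 entries. */
    b = toy_map_maintain(&m, 1);   /* Thread 1 migrates its 1 entry.   */
    return a * 10 + b;
}
```

If a thread never gets a chance to run its maintenance call, its entries
stay stranded in the old space, which is the situation the per-PMD work
callback in this patch is meant to avoid.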

Thanks,

Eelco

> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>  lib/dpif-netdev-perf.c        |  19 ++++-
>  lib/dpif-netdev-perf.h        |   3 +-
>  lib/dpif-netdev.c             |  42 ++++++++++-
>  lib/dpif-offload-dummy.c      |  38 ++++++++++
>  lib/dpif-offload-provider.h   |  26 +++++++
>  lib/dpif-offload.c            | 133 ++++++++++++++++++++++++++++++++++
>  lib/dpif-offload.h            |  11 +++
>  tests/pmd.at                  |  32 ++++++++
>  utilities/checkpatch_dict.txt |   2 +
>  9 files changed, 298 insertions(+), 8 deletions(-)
>
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> index 1cd4ee0842..39465ba819 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -232,6 +232,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>      uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
>      uint64_t sleep_iter = stats[PMD_SLEEP_ITER];
>      uint64_t tot_sleep_cycles = stats[PMD_CYCLES_SLEEP];
> +    uint64_t offload_cycles = stats[PMD_CYCLES_OFFLOAD];
>
>      ds_put_format(str,
>              "  Iterations:         %12"PRIu64"  (%.2f us/it)\n"
> @@ -242,7 +243,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>              "  Sleep time (us):    %12.0f  (%3.0f us/iteration avg.)\n",
>              tot_iter,
>              tot_iter
> -                ? (tot_cycles + tot_sleep_cycles) * us_per_cycle / tot_iter
> +                ? (tot_cycles + tot_sleep_cycles + offload_cycles)
> +                  * us_per_cycle / tot_iter
>                  : 0,
>              tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
>              idle_iter,
> @@ -252,6 +254,13 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>              sleep_iter, tot_iter ? 100.0 * sleep_iter / tot_iter : 0,
>              tot_sleep_cycles * us_per_cycle,
>              sleep_iter ? (tot_sleep_cycles * us_per_cycle) / sleep_iter : 0);
> +    if (offload_cycles > 0) {
> +        ds_put_format(str,
> +            "  Offload cycles:     %12" PRIu64 "  (%5.1f %% of used cycles)\n",
> +            offload_cycles,
> +            100.0 * offload_cycles / (tot_cycles + tot_sleep_cycles
> +                                      + offload_cycles));
> +    }
>      if (rx_packets > 0) {
>          ds_put_format(str,
>              "  Rx packets:         %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
> @@ -532,14 +541,14 @@ OVS_REQUIRES(s->stats_mutex)
>  void
>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>                         int tx_packets, uint64_t sleep_cycles,
> -                       bool full_metrics)
> +                       uint64_t offload_cycles, bool full_metrics)
>  {
>      uint64_t now_tsc = cycles_counter_update(s);
>      struct iter_stats *cum_ms;
>      uint64_t cycles, cycles_per_pkt = 0;
>      char *reason = NULL;
>
> -    cycles = now_tsc - s->start_tsc - sleep_cycles;
> +    cycles = now_tsc - s->start_tsc - sleep_cycles - offload_cycles;
>      s->current.timestamp = s->iteration_cnt;
>      s->current.cycles = cycles;
>      s->current.pkts = rx_packets;
> @@ -558,6 +567,10 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>          pmd_perf_update_counter(s, PMD_CYCLES_SLEEP, sleep_cycles);
>      }
>
> +    if (offload_cycles) {
> +        pmd_perf_update_counter(s, PMD_CYCLES_OFFLOAD, offload_cycles);
> +    }
> +
>      if (!full_metrics) {
>          return;
>      }
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 84beced151..2a055dacdd 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -82,6 +82,7 @@ enum pmd_stat_type {
>      PMD_CYCLES_UPCALL,      /* Cycles spent processing upcalls. */
>      PMD_SLEEP_ITER,         /* Iterations where a sleep has taken place. */
>      PMD_CYCLES_SLEEP,       /* Total cycles slept to save power. */
> +    PMD_CYCLES_OFFLOAD,     /* Total cycles spent handling offload. */
>      PMD_N_STATS
>  };
>
> @@ -411,7 +412,7 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s);
>  void
>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>                         int tx_packets, uint64_t sleep_cycles,
> -                       bool full_metrics);
> +                       uint64_t offload_cycles, bool full_metrics);
>
>  /* Formatting the output of commands. */
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9df05c4c28..64531a02c0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -329,6 +329,9 @@ struct dp_netdev {
>      uint64_t last_reconfigure_seq;
>      struct ovsthread_once once_set_config;
>
> +    /* When a reconfigure is requested, forcefully reload all PMDs. */
> +    bool force_pmd_reload;
> +
>      /* Cpu mask for pin of pmd threads. */
>      char *pmd_cmask;
>
> @@ -339,6 +342,7 @@ struct dp_netdev {
>
>      struct conntrack *conntrack;
>      struct pmd_auto_lb pmd_alb;
> +    bool offload_enabled;
>
>      /* Bonds. */
>      struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
> @@ -4556,6 +4560,14 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          log_all_pmd_sleeps(dp);
>      }
>
> +    if (!dp->offload_enabled) {
> +        dp->offload_enabled = dpif_offload_enabled();
> +        if (dp->offload_enabled) {
> +            dp->force_pmd_reload = true;
> +            dp_netdev_request_reconfigure(dp);
> +        }
> +    }
> +
>      return 0;
>  }
>
> @@ -6216,6 +6228,14 @@ reconfigure_datapath(struct dp_netdev *dp)
>          ovs_mutex_unlock(&pmd->port_mutex);
>      }
>
> +    /* Do we need to forcefully reload all threads? */
> +    if (dp->force_pmd_reload) {
> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> +            pmd->need_reload = true;
> +        }
> +        dp->force_pmd_reload = false;
> +    }
> +
>      /* Reload affected pmd threads. */
>      reload_affected_pmds(dp);
>
> @@ -6516,6 +6536,7 @@ pmd_thread_main(void *f_)
>  {
>      struct dp_netdev_pmd_thread *pmd = f_;
>      struct pmd_perf_stats *s = &pmd->perf_stats;
> +    struct dpif_offload_pmd_ctx *offload_ctx = NULL;
>      unsigned int lc = 0;
>      struct polled_queue *poll_list;
>      bool wait_for_reload = false;
> @@ -6549,6 +6570,9 @@ reload:
>          dpdk_attached = dpdk_attach_thread(pmd->core_id);
>      }
>
> +    dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id,
> +                                   pmd->numa_id, &offload_ctx);
> +
>      /* List port/core affinity */
>      for (i = 0; i < poll_cnt; i++) {
>         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
> @@ -6588,7 +6612,7 @@ reload:
>      ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
>      for (;;) {
>          uint64_t rx_packets = 0, tx_packets = 0;
> -        uint64_t time_slept = 0;
> +        uint64_t time_slept = 0, offload_cycles = 0;
>          uint64_t max_sleep;
>
>          pmd_perf_start_iteration(s);
> @@ -6628,6 +6652,10 @@ reload:
>                                                     ? true : false);
>          }
>
> +        /* Do work required by any of the hardware offload providers. */
> +        offload_cycles = dpif_offload_pmd_thread_do_work(offload_ctx,
> +                                                         &pmd->perf_stats);
> +
>          if (max_sleep) {
>              /* Check if a sleep should happen on this iteration. */
>              if (sleep_time) {
> @@ -6687,7 +6715,7 @@ reload:
>          }
>
>          pmd_perf_end_iteration(s, rx_packets, tx_packets, time_slept,
> -                               pmd_perf_metrics_enabled(pmd));
> +                               offload_cycles, pmd_perf_metrics_enabled(pmd));
>      }
>      ovs_mutex_unlock(&pmd->perf_stats.stats_mutex);
>
> @@ -6708,6 +6736,7 @@ reload:
>          goto reload;
>      }
>
> +    dpif_offload_pmd_thread_exit(offload_ctx);
>      pmd_free_static_tx_qid(pmd);
>      dfc_cache_uninit(&pmd->flow_cache);
>      free(poll_list);
> @@ -9623,7 +9652,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>                             struct polled_queue *poll_list, int poll_cnt)
>  {
>      struct dpcls *cls;
> -    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0;
> +    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0, tot_offload = 0;
>      unsigned int pmd_load = 0;
>
>      if (pmd->ctx.now > pmd->next_cycle_store) {
> @@ -9642,11 +9671,14 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>                         pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
>              tot_sleep = pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP] -
>                          pmd->prev_stats[PMD_CYCLES_SLEEP];
> +            tot_offload = pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD] -
> +                          pmd->prev_stats[PMD_CYCLES_OFFLOAD];
>
>              if (pmd_alb->is_enabled && !pmd->isolated) {
>                  if (tot_proc) {
>                      pmd_load = ((tot_proc * 100) /
> -                                    (tot_idle + tot_proc + tot_sleep));
> +                                    (tot_idle + tot_proc + tot_sleep
> +                                     + tot_offload));
>                  }
>
>                  atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> @@ -9665,6 +9697,8 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>                          pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
>          pmd->prev_stats[PMD_CYCLES_SLEEP] =
>                          pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP];
> +        pmd->prev_stats[PMD_CYCLES_OFFLOAD] =
> +                        pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD];
>
>          /* Get the cycles that were used to process each queue and store. */
>          for (unsigned i = 0; i < poll_cnt; i++) {
> diff --git a/lib/dpif-offload-dummy.c b/lib/dpif-offload-dummy.c
> index 28f1f40013..80d6ce67c3 100644
> --- a/lib/dpif-offload-dummy.c
> +++ b/lib/dpif-offload-dummy.c
> @@ -17,6 +17,7 @@
>  #include <config.h>
>  #include <errno.h>
>
> +#include "coverage.h"
>  #include "dpif.h"
>  #include "dpif-offload.h"
>  #include "dpif-offload-provider.h"
> @@ -33,6 +34,8 @@
>
>  VLOG_DEFINE_THIS_MODULE(dpif_offload_dummy);
>
> +COVERAGE_DEFINE(dummy_offload_do_work);
> +
>  struct pmd_id_data {
>      struct hmap_node node;
>      void *flow_reference;
> @@ -1020,6 +1023,40 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
>      }
>  }
>
> +static void
> +dummy_pmd_thread_work_cb(unsigned core_id OVS_UNUSED, int numa_id OVS_UNUSED,
> +                         void *ctx OVS_UNUSED)
> +{
> +    COVERAGE_INC(dummy_offload_do_work);
> +}
> +
> +static void
> +dummy_pmd_thread_lifecycle(const struct dpif_offload *dpif_offload,
> +                           bool exit, unsigned core_id, int numa_id,
> +                           dpif_offload_pmd_thread_work_cb **callback,
> +                           void **ctx)
> +{
> +    /* Only do this for the 'dummy' class, not for 'dummy_x'. */
> +    if (strcmp(dpif_offload_type(dpif_offload), "dummy")) {
> +        *callback = NULL;
> +        *ctx = NULL;
> +        return;
> +    }
> +
> +    VLOG_DBG(
> +        "pmd_thread_lifecycle; exit=%s, core=%u, numa=%d, cb=%p, ctx=%p",
> +        exit ? "true" : "false", core_id, numa_id, *callback, *ctx);
> +
> +    ovs_assert(!*callback || *callback == dummy_pmd_thread_work_cb);
> +
> +    if (exit) {
> +        free(*ctx);
> +    } else {
> +        *ctx = *ctx ? *ctx : xstrdup("DUMMY_OFFLOAD_WORK");
> +        *callback = dummy_pmd_thread_work_cb;
> +    }
> +}
> +
>  #define DEFINE_DPIF_DUMMY_CLASS(NAME, TYPE_STR)                             \
>      struct dpif_offload_class NAME = {                                      \
>          .type = TYPE_STR,                                                   \
> @@ -1039,6 +1076,7 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
>          .netdev_flow_del = dummy_flow_del,                                  \
>          .netdev_flow_stats = dummy_flow_stats,                              \
>          .register_flow_unreference_cb = dummy_register_flow_unreference_cb, \
> +        .pmd_thread_lifecycle = dummy_pmd_thread_lifecycle                  \
>  }
>
>  DEFINE_DPIF_DUMMY_CLASS(dpif_offload_dummy_class, "dummy");
> diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
> index 02ef46cb08..259de2c299 100644
> --- a/lib/dpif-offload-provider.h
> +++ b/lib/dpif-offload-provider.h
> @@ -87,6 +87,10 @@ dpif_offload_flow_dump_thread_init(
>  }
>
>
> +/* Offload Provider specific PMD thread work callback definition. */
> +typedef void dpif_offload_pmd_thread_work_cb(unsigned core_id, int numa_id,
> +                                             void *ctx);
> +
>  struct dpif_offload_class {
>      /* Type of DPIF offload provider in this class, e.g., "tc", "dpdk",
>       * "dummy", etc. */
> @@ -305,6 +309,28 @@ struct dpif_offload_class {
>       * to netdev_flow_put() is no longer held by the offload provider. */
>      void (*register_flow_unreference_cb)(const struct dpif_offload *,
>                                           dpif_offload_flow_unreference_cb *);
> +
> +
> +    /* The API below is specific to PMD (userspace) thread lifecycle handling.
> +     *
> +     * This API allows a provider to supply a callback function
> +     * (via `*callback`) and an optional context pointer (via `*ctx`) for a
> +     * PMD thread.
> +     *
> +     * The lifecycle hook may be invoked multiple times for the same PMD
> +     * thread.  For example, when the thread is reinitialized, this function
> +     * will be called again and the previous `callback` and `ctx` values will
> +     * be passed back in.  It is the provider's responsibility to decide
> +     * whether those should be reused, replaced, or cleaned up before storing
> +     * new values.
> +     *
> +     * When the PMD thread is terminating, this API is called with
> +     * `exit == true`.  At that point, the provider must release any resources
> +     * associated with the previously returned `callback` and `ctx`. */
> +    void (*pmd_thread_lifecycle)(const struct dpif_offload *, bool exit,
> +                                 unsigned core_id, int numa_id,
> +                                 dpif_offload_pmd_thread_work_cb **callback,
> +                                 void **ctx);
>  };
>
>  extern struct dpif_offload_class dpif_offload_dummy_class;
> diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
> index bb2feced9e..cbf1f6c704 100644
> --- a/lib/dpif-offload.c
> +++ b/lib/dpif-offload.c
> @@ -17,6 +17,7 @@
>  #include <config.h>
>  #include <errno.h>
>
> +#include "dpif-netdev-perf.h"
>  #include "dpif-offload.h"
>  #include "dpif-offload-provider.h"
>  #include "dpif-provider.h"
> @@ -54,6 +55,7 @@ static const struct dpif_offload_class *base_dpif_offload_classes[] = {
>      &dpif_offload_dummy_x_class,
>  };
>
> +#define TOTAL_PROVIDERS ARRAY_SIZE(base_dpif_offload_classes)
>  #define DEFAULT_PROVIDER_PRIORITY_LIST "tc,dpdk,dummy,dummy_x"
>
>  static char *dpif_offload_provider_priority_list = NULL;
> @@ -1665,3 +1667,134 @@ dpif_offload_port_mgr_port_count(const struct dpif_offload *offload)
>
>      return cmap_count(&offload->ports->odp_port_to_port);
>  }
> +
> +struct dpif_offload_pmd_ctx_node {
> +    const struct dpif_offload *offload;
> +    dpif_offload_pmd_thread_work_cb *callback;
> +    void *provider_ctx;
> +};
> +
> +struct dpif_offload_pmd_ctx {
> +    unsigned core_id;
> +    int numa_id;
> +    size_t n_nodes;
> +    struct dpif_offload_pmd_ctx_node nodes[TOTAL_PROVIDERS];
> +};
> +
> +void
> +dpif_offload_pmd_thread_reload(const char *dpif_name, unsigned core_id,
> +                               int numa_id, struct dpif_offload_pmd_ctx **ctx_)
> +{
> +    struct dpif_offload_pmd_ctx_node old_nodes[TOTAL_PROVIDERS];
> +    struct dpif_offload_provider_collection *collection;
> +    struct dpif_offload_pmd_ctx *ctx;
> +    struct dpif_offload *offload;
> +    size_t old_n_nodes = 0;
> +
> +    if (!dpif_offload_enabled()) {
> +        ovs_assert(!*ctx_);
> +        return;
> +    }
> +
> +    ovs_mutex_lock(&dpif_offload_mutex);
> +    collection = shash_find_data(&dpif_offload_providers, dpif_name);
> +    ovs_mutex_unlock(&dpif_offload_mutex);
> +
> +    if (OVS_UNLIKELY(!collection)) {
> +        ovs_assert(!*ctx_);
> +        return;
> +    }
> +
> +    if (!*ctx_) {
> +        /* It would be nice if we had a NUMA-specific xzalloc(). */
> +        ctx = xzalloc(sizeof *ctx);
> +        ctx->core_id = core_id;
> +        ctx->numa_id = numa_id;
> +        *ctx_ = ctx;
> +    } else {
> +        ctx = *ctx_;
> +        old_n_nodes = ctx->n_nodes;
> +
> +        if (old_n_nodes) {
> +            memcpy(old_nodes, ctx->nodes, old_n_nodes * sizeof old_nodes[0]);
> +        }
> +
> +        /* Reset active nodes array. */
> +        memset(ctx->nodes, 0, sizeof ctx->nodes);
> +        ctx->n_nodes = 0;
> +    }
> +
> +    LIST_FOR_EACH (offload, dpif_list_node, &collection->list) {
> +
> +        ovs_assert(ctx->n_nodes < TOTAL_PROVIDERS);
> +
> +        if (!offload->class->pmd_thread_lifecycle) {
> +            continue;
> +        }
> +
> +        if (old_n_nodes) {
> +            /* If this is a reload, try to find previous callback and ctx. */
> +            for (size_t i = 0; i < old_n_nodes; i++) {
> +                struct dpif_offload_pmd_ctx_node *node = &old_nodes[i];
> +
> +                if (offload == node->offload) {
> +                    ctx->nodes[ctx->n_nodes].callback = node->callback;
> +                    ctx->nodes[ctx->n_nodes].provider_ctx = node->provider_ctx;
> +                    break;
> +                }
> +            }
> +        }
> +
> +        offload->class->pmd_thread_lifecycle(
> +            offload, false, core_id, numa_id,
> +            &ctx->nodes[ctx->n_nodes].callback,
> +            &ctx->nodes[ctx->n_nodes].provider_ctx);
> +
> +        if (ctx->nodes[ctx->n_nodes].callback) {
> +            ctx->nodes[ctx->n_nodes].offload = offload;
> +            ctx->n_nodes++;
> +        } else {
> +            memset(&ctx->nodes[ctx->n_nodes], 0,
> +                   sizeof ctx->nodes[ctx->n_nodes]);
> +        }
> +    }
> +}
> +
> +uint64_t
> +dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *ctx,
> +                                struct pmd_perf_stats *stats)
> +{
> +    struct cycle_timer offload_work_timer;
> +
> +    if (!ctx || !ctx->n_nodes) {
> +        return 0;
> +    }
> +
> +    cycle_timer_start(stats, &offload_work_timer);
> +
> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
> +        ctx->nodes[i].callback(ctx->core_id, ctx->numa_id,
> +                               ctx->nodes[i].provider_ctx);
> +    }
> +
> +    return cycle_timer_stop(stats, &offload_work_timer);
> +}
> +
> +void
> +dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *ctx)
> +{
> +    if (!ctx) {
> +        return;
> +    }
> +
> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
> +        struct dpif_offload_pmd_ctx_node *node = &ctx->nodes[i];
> +
> +        node->offload->class->pmd_thread_lifecycle(node->offload, true,
> +                                                   ctx->core_id, ctx->numa_id,
> +                                                   &node->callback,
> +                                                   &node->provider_ctx);
> +    }
> +
> +    free(ctx);
> +}
> diff --git a/lib/dpif-offload.h b/lib/dpif-offload.h
> index 7fad3ebee3..0f66d8cd8e 100644
> --- a/lib/dpif-offload.h
> +++ b/lib/dpif-offload.h
> @@ -22,6 +22,7 @@
>  /* Forward declarations of private structures. */
>  struct dpif_offload_class;
>  struct dpif_offload;
> +struct pmd_perf_stats;
>
>  /* Definition of the DPIF offload implementation type.
>   *
> @@ -186,4 +187,14 @@ dpif_offload_datapath_flow_op_continue(struct dpif_offload_flow_cb_data *cb,
>      }
>  }
>
> +/* PMD Thread helper functions. */
> +struct dpif_offload_pmd_ctx;
> +
> +void dpif_offload_pmd_thread_reload(const char *dpif_name,
> +                                    unsigned core_id, int numa_id,
> +                                    struct dpif_offload_pmd_ctx **);
> +uint64_t dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *,
> +                                         struct pmd_perf_stats *);
> +void dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *);
> +
>  #endif /* DPIF_OFFLOAD_H */
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 8254ac3b0f..54184d8c92 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -1689,3 +1689,35 @@ recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.
>
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - offload work])
> +OVS_VSWITCHD_START([], [], [], [DUMMY_NUMA],
> +                   [-- set Open_vSwitch . other_config:hw-offload=true])
> +
> +AT_CHECK([ovs-appctl vlog/set dpif_offload_dummy:dbg])
> +AT_CHECK([ovs-vsctl add-port br0 p0 -- set Interface p0 type=dummy-pmd])
> +
> +CHECK_CPU_DISCOVERED()
> +CHECK_PMD_THREADS_CREATED()
> +
> +OVS_WAIT_UNTIL(
> +  [test $(ovs-appctl coverage/read-counter dummy_offload_do_work) -gt 0])
> +
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-perf-show \
> +  | grep -Eq 'Offload cycles: +[[0-9]]+  \( *[[0-9.]]+ % of used cycles\)'])
> +
> +OVS_VSWITCHD_STOP
> +
> +LOG="$(sed -n 's/.*\(pmd_thread_lifecycle.*\)/\1/p' ovs-vswitchd.log)"
> +CB=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*cb=\([[^,]]*\).*/\1/p')
> +CTX=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*ctx=\(.*\)$/\1/p')
> +
> +AT_CHECK([echo "$LOG" | sed -n '1p' | sed 's/(nil)/0x0/g'], [0], [dnl
> +pmd_thread_lifecycle; exit=false, core=0, numa=0, cb=0x0, ctx=0x0
> +])
> +AT_CHECK([echo "$LOG" | sed -n '2p' \
> +          | grep -q "exit=false, core=0, numa=0, cb=$CB, ctx=$CTX"])
> +AT_CHECK([echo "$LOG" | sed -n '$p' \
> +          | grep -q "exit=true, core=0, numa=0, cb=$CB, ctx=$CTX"])
> +
> +AT_CLEANUP
> diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
> index c1f43e5afa..c9b758d63c 100644
> --- a/utilities/checkpatch_dict.txt
> +++ b/utilities/checkpatch_dict.txt
> @@ -36,6 +36,7 @@ cpu
>  cpus
>  cstime
>  csum
> +ctx
>  cutime
>  cvlan
>  datapath
> @@ -46,6 +47,7 @@ decap
>  decapsulation
>  defrag
>  defragment
> +deinitialization
>  deref
>  dereference
>  dest
> -- 
> 2.52.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
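
A side note on the statistics change quoted above: in
dp_netdev_pmd_try_optimize(), offload cycles now join idle, busy, and
sleep cycles in the load denominator, so provider callback time is not
counted as packet-processing load. A minimal model of that arithmetic
(input values invented):

```c
#include <assert.h>
#include <stdint.h>

/* Integer load percentage as computed in dp_netdev_pmd_try_optimize()
 * after this patch: busy (processing) cycles over all accounted cycles,
 * including cycles spent in offload provider callbacks. */
static unsigned
pmd_load_pct(uint64_t tot_idle, uint64_t tot_proc,
             uint64_t tot_sleep, uint64_t tot_offload)
{
    uint64_t total = tot_idle + tot_proc + tot_sleep + tot_offload;

    return total ? (unsigned) ((tot_proc * 100) / total) : 0;
}
```

With 20/60/10 idle/busy/sleep cycles, adding 10 offload cycles lowers the
reported load from 66% to 60%, keeping the auto-load-balancer from
mistaking offload maintenance for packet-processing pressure.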
Kevin Traynor April 2, 2026, 10:41 a.m. UTC | #2
On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
> 
> 
> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
> 
>> This patch adds support for provider-specific PMD thread
>> initialization and deinitialization, and for a callback that is
>> executed to perform work as part of the PMD thread loop. This
>> allows hardware offload providers to handle any provider-specific
>> asynchronous or batching work.
>>
>> This patch also adds cycle statistics for the provider-specific
>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
> 
> Bringing back the discussion on the earlier patch between Ilya and Gaetan to this revision :)
> 
> Ilya:
>   Hi, Eelco.  As we talked before, this infrastructure resembles the async
>   work infra that was proposed in the past for the use case of async vhost
>   processing.  And I don't see any real use case proposed for it here nor
>   in the RFC, where the question was asked but never answered.
> 
> Gaetan:
> 

Hi Gaetan,

A few questions below. I'm not so clear on the DOCA threading
requirements, so questions may be broad.

>   Hi Ilya, Eelco,
> 
>   Thanks for the patch and for the review.
> 
>   The use case on our side is distributed data structures in DOCA that
>   require each participating thread to do maintenance work periodically.
> 
>   Specifically, offload threads will insert offload objects.
>   Those will reserve entries in a map that can be resized. The DOCA
>   implementation requires any thread that owns an entry to perform the
>   work of moving it to the new bucket / space after resize is initiated.
> 
>   This is a pervasive design choice in DOCA; most of its APIs are
>   written assuming participating threads periodically call into these
>   maintenance functions.
> 

What is a "participating thread"?  IIUC, the pmd thread passes down the
flow pattern/action and the offload thread inserts the offload into the NIC.

In that case, is it the offload thread that owns the entry?

>   Some of this work is also time-sensitive; for example, the current
>   implementation requires a CT offload thread to receive completions after
>   some hardware initialization.  Until this completion arrives, the CT
>   offload entry is not fully usable (it cannot be queried for activity /
>   counters).  We cannot leave batches of CT offload entries waiting for
>   completion on the assumption that, at some later point, we will
>   eventually re-execute something in our offload provider: that leaves a
>   few stranded connection objects incomplete.
> 
>   The result is hardware execution of a flow with CT actions but no
>   activity counters: the software datapath then deletes the connection
>   and/or flow due to inactivity.
> 

Can this periodic work be done by the offload thread?  If it is fast
enough for inserting the offload, then maybe it is fast enough for this.

Some DPDK PMDs use alarms for periodic maintenance work; could they be
used inside DOCA for this?

If it needs to be on the PMD thread, is the work significant (i.e. more
than a few % CPU) and how variable is it?  Could it be done as part of
the rte_eth_rx_burst() polling call?

thanks,
Kevin.

> Thanks,
> 
> Eelco
> 
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>>  lib/dpif-netdev-perf.c        |  19 ++++-
>>  lib/dpif-netdev-perf.h        |   3 +-
>>  lib/dpif-netdev.c             |  42 ++++++++++-
>>  lib/dpif-offload-dummy.c      |  38 ++++++++++
>>  lib/dpif-offload-provider.h   |  26 +++++++
>>  lib/dpif-offload.c            | 133 ++++++++++++++++++++++++++++++++++
>>  lib/dpif-offload.h            |  11 +++
>>  tests/pmd.at                  |  32 ++++++++
>>  utilities/checkpatch_dict.txt |   2 +
>>  9 files changed, 298 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
>> index 1cd4ee0842..39465ba819 100644
>> --- a/lib/dpif-netdev-perf.c
>> +++ b/lib/dpif-netdev-perf.c
>> @@ -232,6 +232,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>>      uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
>>      uint64_t sleep_iter = stats[PMD_SLEEP_ITER];
>>      uint64_t tot_sleep_cycles = stats[PMD_CYCLES_SLEEP];
>> +    uint64_t offload_cycles = stats[PMD_CYCLES_OFFLOAD];
>>
>>      ds_put_format(str,
>>              "  Iterations:         %12"PRIu64"  (%.2f us/it)\n"
>> @@ -242,7 +243,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>>              "  Sleep time (us):    %12.0f  (%3.0f us/iteration avg.)\n",
>>              tot_iter,
>>              tot_iter
>> -                ? (tot_cycles + tot_sleep_cycles) * us_per_cycle / tot_iter
>> +                ? (tot_cycles + tot_sleep_cycles + offload_cycles)
>> +                  * us_per_cycle / tot_iter
>>                  : 0,
>>              tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
>>              idle_iter,
>> @@ -252,6 +254,13 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
>>              sleep_iter, tot_iter ? 100.0 * sleep_iter / tot_iter : 0,
>>              tot_sleep_cycles * us_per_cycle,
>>              sleep_iter ? (tot_sleep_cycles * us_per_cycle) / sleep_iter : 0);
>> +    if (offload_cycles > 0) {
>> +        ds_put_format(str,
>> +            "  Offload cycles:     %12" PRIu64 "  (%5.1f %% of used cycles)\n",
>> +            offload_cycles,
>> +            100.0 * offload_cycles / (tot_cycles + tot_sleep_cycles
>> +                                      + offload_cycles));
>> +    }
>>      if (rx_packets > 0) {
>>          ds_put_format(str,
>>              "  Rx packets:         %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
>> @@ -532,14 +541,14 @@ OVS_REQUIRES(s->stats_mutex)
>>  void
>>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>>                         int tx_packets, uint64_t sleep_cycles,
>> -                       bool full_metrics)
>> +                       uint64_t offload_cycles, bool full_metrics)
>>  {
>>      uint64_t now_tsc = cycles_counter_update(s);
>>      struct iter_stats *cum_ms;
>>      uint64_t cycles, cycles_per_pkt = 0;
>>      char *reason = NULL;
>>
>> -    cycles = now_tsc - s->start_tsc - sleep_cycles;
>> +    cycles = now_tsc - s->start_tsc - sleep_cycles - offload_cycles;
>>      s->current.timestamp = s->iteration_cnt;
>>      s->current.cycles = cycles;
>>      s->current.pkts = rx_packets;
>> @@ -558,6 +567,10 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>>          pmd_perf_update_counter(s, PMD_CYCLES_SLEEP, sleep_cycles);
>>      }
>>
>> +    if (offload_cycles) {
>> +        pmd_perf_update_counter(s, PMD_CYCLES_OFFLOAD, offload_cycles);
>> +    }
>> +
>>      if (!full_metrics) {
>>          return;
>>      }
>> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
>> index 84beced151..2a055dacdd 100644
>> --- a/lib/dpif-netdev-perf.h
>> +++ b/lib/dpif-netdev-perf.h
>> @@ -82,6 +82,7 @@ enum pmd_stat_type {
>>      PMD_CYCLES_UPCALL,      /* Cycles spent processing upcalls. */
>>      PMD_SLEEP_ITER,         /* Iterations where a sleep has taken place. */
>>      PMD_CYCLES_SLEEP,       /* Total cycles slept to save power. */
>> +    PMD_CYCLES_OFFLOAD,     /* Total cycles spent handling offload. */
>>      PMD_N_STATS
>>  };
>>
>> @@ -411,7 +412,7 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s);
>>  void
>>  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
>>                         int tx_packets, uint64_t sleep_cycles,
>> -                       bool full_metrics);
>> +                       uint64_t offload_cycles, bool full_metrics);
>>
>>  /* Formatting the output of commands. */
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 9df05c4c28..64531a02c0 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -329,6 +329,9 @@ struct dp_netdev {
>>      uint64_t last_reconfigure_seq;
>>      struct ovsthread_once once_set_config;
>>
>> +    /* When a reconfigure is requested, forcefully reload all PMDs. */
>> +    bool force_pmd_reload;
>> +
>>      /* Cpu mask for pin of pmd threads. */
>>      char *pmd_cmask;
>>
>> @@ -339,6 +342,7 @@ struct dp_netdev {
>>
>>      struct conntrack *conntrack;
>>      struct pmd_auto_lb pmd_alb;
>> +    bool offload_enabled;
>>
>>      /* Bonds. */
>>      struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
>> @@ -4556,6 +4560,14 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>          log_all_pmd_sleeps(dp);
>>      }
>>
>> +    if (!dp->offload_enabled) {
>> +        dp->offload_enabled = dpif_offload_enabled();
>> +        if (dp->offload_enabled) {
>> +            dp->force_pmd_reload = true;
>> +            dp_netdev_request_reconfigure(dp);
>> +        }
>> +    }
>> +
>>      return 0;
>>  }
>>
>> @@ -6216,6 +6228,14 @@ reconfigure_datapath(struct dp_netdev *dp)
>>          ovs_mutex_unlock(&pmd->port_mutex);
>>      }
>>
>> +    /* Do we need to forcefully reload all threads? */
>> +    if (dp->force_pmd_reload) {
>> +        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> +            pmd->need_reload = true;
>> +        }
>> +        dp->force_pmd_reload = false;
>> +    }
>> +
>>      /* Reload affected pmd threads. */
>>      reload_affected_pmds(dp);
>>
>> @@ -6516,6 +6536,7 @@ pmd_thread_main(void *f_)
>>  {
>>      struct dp_netdev_pmd_thread *pmd = f_;
>>      struct pmd_perf_stats *s = &pmd->perf_stats;
>> +    struct dpif_offload_pmd_ctx *offload_ctx = NULL;
>>      unsigned int lc = 0;
>>      struct polled_queue *poll_list;
>>      bool wait_for_reload = false;
>> @@ -6549,6 +6570,9 @@ reload:
>>          dpdk_attached = dpdk_attach_thread(pmd->core_id);
>>      }
>>
>> +    dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id,
>> +                                   pmd->numa_id, &offload_ctx);
>> +
>>      /* List port/core affinity */
>>      for (i = 0; i < poll_cnt; i++) {
>>         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> @@ -6588,7 +6612,7 @@ reload:
>>      ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
>>      for (;;) {
>>          uint64_t rx_packets = 0, tx_packets = 0;
>> -        uint64_t time_slept = 0;
>> +        uint64_t time_slept = 0, offload_cycles = 0;
>>          uint64_t max_sleep;
>>
>>          pmd_perf_start_iteration(s);
>> @@ -6628,6 +6652,10 @@ reload:
>>                                                     ? true : false);
>>          }
>>
>> +        /* Do work required by any of the hardware offload providers. */
>> +        offload_cycles = dpif_offload_pmd_thread_do_work(offload_ctx,
>> +                                                         &pmd->perf_stats);
>> +
>>          if (max_sleep) {
>>              /* Check if a sleep should happen on this iteration. */
>>              if (sleep_time) {
>> @@ -6687,7 +6715,7 @@ reload:
>>          }
>>
>>          pmd_perf_end_iteration(s, rx_packets, tx_packets, time_slept,
>> -                               pmd_perf_metrics_enabled(pmd));
>> +                               offload_cycles, pmd_perf_metrics_enabled(pmd));
>>      }
>>      ovs_mutex_unlock(&pmd->perf_stats.stats_mutex);
>>
>> @@ -6708,6 +6736,7 @@ reload:
>>          goto reload;
>>      }
>>
>> +    dpif_offload_pmd_thread_exit(offload_ctx);
>>      pmd_free_static_tx_qid(pmd);
>>      dfc_cache_uninit(&pmd->flow_cache);
>>      free(poll_list);
>> @@ -9623,7 +9652,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>                             struct polled_queue *poll_list, int poll_cnt)
>>  {
>>      struct dpcls *cls;
>> -    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0;
>> +    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0, tot_offload = 0;
>>      unsigned int pmd_load = 0;
>>
>>      if (pmd->ctx.now > pmd->next_cycle_store) {
>> @@ -9642,11 +9671,14 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>                         pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
>>              tot_sleep = pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP] -
>>                          pmd->prev_stats[PMD_CYCLES_SLEEP];
>> +            tot_offload = pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD] -
>> +                          pmd->prev_stats[PMD_CYCLES_OFFLOAD];
>>
>>              if (pmd_alb->is_enabled && !pmd->isolated) {
>>                  if (tot_proc) {
>>                      pmd_load = ((tot_proc * 100) /
>> -                                    (tot_idle + tot_proc + tot_sleep));
>> +                                    (tot_idle + tot_proc + tot_sleep
>> +                                     + tot_offload));
>>                  }
>>
>>                  atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> @@ -9665,6 +9697,8 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>                          pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
>>          pmd->prev_stats[PMD_CYCLES_SLEEP] =
>>                          pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP];
>> +        pmd->prev_stats[PMD_CYCLES_OFFLOAD] =
>> +                        pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD];
>>
>>          /* Get the cycles that were used to process each queue and store. */
>>          for (unsigned i = 0; i < poll_cnt; i++) {
>> diff --git a/lib/dpif-offload-dummy.c b/lib/dpif-offload-dummy.c
>> index 28f1f40013..80d6ce67c3 100644
>> --- a/lib/dpif-offload-dummy.c
>> +++ b/lib/dpif-offload-dummy.c
>> @@ -17,6 +17,7 @@
>>  #include <config.h>
>>  #include <errno.h>
>>
>> +#include "coverage.h"
>>  #include "dpif.h"
>>  #include "dpif-offload.h"
>>  #include "dpif-offload-provider.h"
>> @@ -33,6 +34,8 @@
>>
>>  VLOG_DEFINE_THIS_MODULE(dpif_offload_dummy);
>>
>> +COVERAGE_DEFINE(dummy_offload_do_work);
>> +
>>  struct pmd_id_data {
>>      struct hmap_node node;
>>      void *flow_reference;
>> @@ -1020,6 +1023,40 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
>>      }
>>  }
>>
>> +static void
>> +dummy_pmd_thread_work_cb(unsigned core_id OVS_UNUSED, int numa_id OVS_UNUSED,
>> +                         void *ctx OVS_UNUSED)
>> +{
>> +    COVERAGE_INC(dummy_offload_do_work);
>> +}
>> +
>> +static void
>> +dummy_pmd_thread_lifecycle(const struct dpif_offload *dpif_offload,
>> +                           bool exit, unsigned core_id, int numa_id,
>> +                           dpif_offload_pmd_thread_work_cb **callback,
>> +                           void **ctx)
>> +{
>> +    /* Only do this for the 'dummy' class, not for 'dummy_x'. */
>> +    if (strcmp(dpif_offload_type(dpif_offload), "dummy")) {
>> +        *callback = NULL;
>> +        *ctx = NULL;
>> +        return;
>> +    }
>> +
>> +    VLOG_DBG(
>> +        "pmd_thread_lifecycle; exit=%s, core=%u, numa=%d, cb=%p, ctx=%p",
>> +        exit ? "true" : "false", core_id, numa_id, *callback, *ctx);
>> +
>> +    ovs_assert(!*callback || *callback == dummy_pmd_thread_work_cb);
>> +
>> +    if (exit) {
>> +        free(*ctx);
>> +    } else {
>> +        *ctx = *ctx ? *ctx : xstrdup("DUMMY_OFFLOAD_WORK");
>> +        *callback = dummy_pmd_thread_work_cb;
>> +    }
>> +}
>> +
>>  #define DEFINE_DPIF_DUMMY_CLASS(NAME, TYPE_STR)                             \
>>      struct dpif_offload_class NAME = {                                      \
>>          .type = TYPE_STR,                                                   \
>> @@ -1039,6 +1076,7 @@ dummy_netdev_hw_offload_run(struct netdev *netdev)
>>          .netdev_flow_del = dummy_flow_del,                                  \
>>          .netdev_flow_stats = dummy_flow_stats,                              \
>>          .register_flow_unreference_cb = dummy_register_flow_unreference_cb, \
>> +        .pmd_thread_lifecycle = dummy_pmd_thread_lifecycle                  \
>>  }
>>
>>  DEFINE_DPIF_DUMMY_CLASS(dpif_offload_dummy_class, "dummy");
>> diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
>> index 02ef46cb08..259de2c299 100644
>> --- a/lib/dpif-offload-provider.h
>> +++ b/lib/dpif-offload-provider.h
>> @@ -87,6 +87,10 @@ dpif_offload_flow_dump_thread_init(
>>  }
>>
>>
>> +/* Offload Provider specific PMD thread work callback definition. */
>> +typedef void dpif_offload_pmd_thread_work_cb(unsigned core_id, int numa_id,
>> +                                             void *ctx);
>> +
>>  struct dpif_offload_class {
>>      /* Type of DPIF offload provider in this class, e.g., "tc", "dpdk",
>>       * "dummy", etc. */
>> @@ -305,6 +309,28 @@ struct dpif_offload_class {
>>       * to netdev_flow_put() is no longer held by the offload provider. */
>>      void (*register_flow_unreference_cb)(const struct dpif_offload *,
>>                                           dpif_offload_flow_unreference_cb *);
>> +
>> +
>> +    /* The API below is specific to PMD (userspace) thread lifecycle handling.
>> +     *
>> +     * This API allows a provider to supply a callback function
>> +     * (via `*callback`) and an optional context pointer (via `*ctx`) for a
>> +     * PMD thread.
>> +     *
>> +     * The lifecycle hook may be invoked multiple times for the same PMD
>> +     * thread.  For example, when the thread is reinitialized, this function
>> +     * will be called again and the previous `callback` and `ctx` values will
>> +     * be passed back in.  It is the provider's responsibility to decide
>> +     * whether those should be reused, replaced, or cleaned up before storing
>> +     * new values.
>> +     *
>> +     * When the PMD thread is terminating, this API is called with
>> +     * `exit == true`.  At that point, the provider must release any resources
>> +     * associated with the previously returned `callback` and `ctx`. */
>> +    void (*pmd_thread_lifecycle)(const struct dpif_offload *, bool exit,
>> +                                 unsigned core_id, int numa_id,
>> +                                 dpif_offload_pmd_thread_work_cb **callback,
>> +                                 void **ctx);
>>  };
>>
>>  extern struct dpif_offload_class dpif_offload_dummy_class;
>> diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
>> index bb2feced9e..cbf1f6c704 100644
>> --- a/lib/dpif-offload.c
>> +++ b/lib/dpif-offload.c
>> @@ -17,6 +17,7 @@
>>  #include <config.h>
>>  #include <errno.h>
>>
>> +#include "dpif-netdev-perf.h"
>>  #include "dpif-offload.h"
>>  #include "dpif-offload-provider.h"
>>  #include "dpif-provider.h"
>> @@ -54,6 +55,7 @@ static const struct dpif_offload_class *base_dpif_offload_classes[] = {
>>      &dpif_offload_dummy_x_class,
>>  };
>>
>> +#define TOTAL_PROVIDERS ARRAY_SIZE(base_dpif_offload_classes)
>>  #define DEFAULT_PROVIDER_PRIORITY_LIST "tc,dpdk,dummy,dummy_x"
>>
>>  static char *dpif_offload_provider_priority_list = NULL;
>> @@ -1665,3 +1667,134 @@ dpif_offload_port_mgr_port_count(const struct dpif_offload *offload)
>>
>>      return cmap_count(&offload->ports->odp_port_to_port);
>>  }
>> +
>> +struct dpif_offload_pmd_ctx_node {
>> +    const struct dpif_offload *offload;
>> +    dpif_offload_pmd_thread_work_cb *callback;
>> +    void *provider_ctx;
>> +};
>> +
>> +struct dpif_offload_pmd_ctx {
>> +    unsigned core_id;
>> +    int numa_id;
>> +    size_t n_nodes;
>> +    struct dpif_offload_pmd_ctx_node nodes[TOTAL_PROVIDERS];
>> +};
>> +
>> +void
>> +dpif_offload_pmd_thread_reload(const char *dpif_name, unsigned core_id,
>> +                               int numa_id, struct dpif_offload_pmd_ctx **ctx_)
>> +{
>> +    struct dpif_offload_pmd_ctx_node old_nodes[TOTAL_PROVIDERS];
>> +    struct dpif_offload_provider_collection *collection;
>> +    struct dpif_offload_pmd_ctx *ctx;
>> +    struct dpif_offload *offload;
>> +    size_t old_n_nodes = 0;
>> +
>> +    if (!dpif_offload_enabled()) {
>> +        ovs_assert(!*ctx_);
>> +        return;
>> +    }
>> +
>> +    ovs_mutex_lock(&dpif_offload_mutex);
>> +    collection = shash_find_data(&dpif_offload_providers, dpif_name);
>> +    ovs_mutex_unlock(&dpif_offload_mutex);
>> +
>> +    if (OVS_UNLIKELY(!collection)) {
>> +        ovs_assert(!*ctx_);
>> +        return;
>> +    }
>> +
>> +    if (!*ctx_) {
>> +        /* Would be nice if we had a NUMA-specific xzalloc(). */
>> +        ctx = xzalloc(sizeof *ctx);
>> +        ctx->core_id = core_id;
>> +        ctx->numa_id = numa_id;
>> +        *ctx_ = ctx;
>> +    } else {
>> +        ctx = *ctx_;
>> +        old_n_nodes = ctx->n_nodes;
>> +
>> +        if (old_n_nodes) {
>> +            memcpy(old_nodes, ctx->nodes, old_n_nodes * sizeof old_nodes[0]);
>> +        }
>> +
>> +        /* Reset active nodes array. */
>> +        memset(ctx->nodes, 0, sizeof ctx->nodes);
>> +        ctx->n_nodes = 0;
>> +    }
>> +
>> +    LIST_FOR_EACH (offload, dpif_list_node, &collection->list) {
>> +
>> +        ovs_assert(ctx->n_nodes < TOTAL_PROVIDERS);
>> +
>> +        if (!offload->class->pmd_thread_lifecycle) {
>> +            continue;
>> +        }
>> +
>> +        if (old_n_nodes) {
>> +            /* If this is a reload, try to find previous callback and ctx. */
>> +            for (size_t i = 0; i < old_n_nodes; i++) {
>> +                struct dpif_offload_pmd_ctx_node *node = &old_nodes[i];
>> +
>> +                if (offload == node->offload) {
>> +                    ctx->nodes[ctx->n_nodes].callback = node->callback;
>> +                    ctx->nodes[ctx->n_nodes].provider_ctx = node->provider_ctx;
>> +                    break;
>> +                }
>> +            }
>> +        }
>> +
>> +        offload->class->pmd_thread_lifecycle(
>> +            offload, false, core_id, numa_id,
>> +            &ctx->nodes[ctx->n_nodes].callback,
>> +            &ctx->nodes[ctx->n_nodes].provider_ctx);
>> +
>> +        if (ctx->nodes[ctx->n_nodes].callback) {
>> +            ctx->nodes[ctx->n_nodes].offload = offload;
>> +            ctx->n_nodes++;
>> +        } else {
>> +            memset(&ctx->nodes[ctx->n_nodes], 0,
>> +                   sizeof ctx->nodes[ctx->n_nodes]);
>> +        }
>> +    }
>> +}
>> +
>> +uint64_t
>> +dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *ctx,
>> +                                struct pmd_perf_stats *stats)
>> +{
>> +    struct cycle_timer offload_work_timer;
>> +
>> +    if (!ctx || !ctx->n_nodes) {
>> +        return 0;
>> +    }
>> +
>> +    cycle_timer_start(stats, &offload_work_timer);
>> +
>> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
>> +        ctx->nodes[i].callback(ctx->core_id, ctx->numa_id,
>> +                               ctx->nodes[i].provider_ctx);
>> +    }
>> +
>> +    return cycle_timer_stop(stats, &offload_work_timer);
>> +}
>> +
>> +void
>> +dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *ctx)
>> +{
>> +    if (!ctx) {
>> +        return;
>> +    }
>> +
>> +    for (size_t i = 0; i < ctx->n_nodes; i++) {
>> +        struct dpif_offload_pmd_ctx_node *node = &ctx->nodes[i];
>> +
>> +        node->offload->class->pmd_thread_lifecycle(node->offload, true,
>> +                                                   ctx->core_id, ctx->numa_id,
>> +                                                   &node->callback,
>> +                                                   &node->provider_ctx);
>> +    }
>> +
>> +    free(ctx);
>> +}
>> diff --git a/lib/dpif-offload.h b/lib/dpif-offload.h
>> index 7fad3ebee3..0f66d8cd8e 100644
>> --- a/lib/dpif-offload.h
>> +++ b/lib/dpif-offload.h
>> @@ -22,6 +22,7 @@
>>  /* Forward declarations of private structures. */
>>  struct dpif_offload_class;
>>  struct dpif_offload;
>> +struct pmd_perf_stats;
>>
>>  /* Definition of the DPIF offload implementation type.
>>   *
>> @@ -186,4 +187,14 @@ dpif_offload_datapath_flow_op_continue(struct dpif_offload_flow_cb_data *cb,
>>      }
>>  }
>>
>> +/* PMD Thread helper functions. */
>> +struct dpif_offload_pmd_ctx;
>> +
>> +void dpif_offload_pmd_thread_reload(const char *dpif_name,
>> +                                    unsigned core_id, int numa_id,
>> +                                    struct dpif_offload_pmd_ctx **);
>> +uint64_t dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *,
>> +                                         struct pmd_perf_stats *);
>> +void dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *);
>> +
>>  #endif /* DPIF_OFFLOAD_H */
>> diff --git a/tests/pmd.at b/tests/pmd.at
>> index 8254ac3b0f..54184d8c92 100644
>> --- a/tests/pmd.at
>> +++ b/tests/pmd.at
>> @@ -1689,3 +1689,35 @@ recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.
>>
>>  OVS_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +AT_SETUP([PMD - offload work])
>> +OVS_VSWITCHD_START([], [], [], [DUMMY_NUMA],
>> +                   [-- set Open_vSwitch . other_config:hw-offload=true])
>> +
>> +AT_CHECK([ovs-appctl vlog/set dpif_offload_dummy:dbg])
>> +AT_CHECK([ovs-vsctl add-port br0 p0 -- set Interface p0 type=dummy-pmd])
>> +
>> +CHECK_CPU_DISCOVERED()
>> +CHECK_PMD_THREADS_CREATED()
>> +
>> +OVS_WAIT_UNTIL(
>> +  [test $(ovs-appctl coverage/read-counter dummy_offload_do_work) -gt 0])
>> +
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-perf-show \
>> +  | grep -Eq 'Offload cycles: +[[0-9]]+  \( *[[0-9.]]+ % of used cycles\)'])
>> +
>> +OVS_VSWITCHD_STOP
>> +
>> +LOG="$(sed -n 's/.*\(pmd_thread_lifecycle.*\)/\1/p' ovs-vswitchd.log)"
>> +CB=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*cb=\([[^,]]*\).*/\1/p')
>> +CTX=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*ctx=\(.*\)$/\1/p')
>> +
>> +AT_CHECK([echo "$LOG" | sed -n '1p' | sed 's/(nil)/0x0/g'], [0], [dnl
>> +pmd_thread_lifecycle; exit=false, core=0, numa=0, cb=0x0, ctx=0x0
>> +])
>> +AT_CHECK([echo "$LOG" | sed -n '2p' \
>> +          | grep -q "exit=false, core=0, numa=0, cb=$CB, ctx=$CTX"])
>> +AT_CHECK([echo "$LOG" | sed -n '$p' \
>> +          | grep -q "exit=true, core=0, numa=0, cb=$CB, ctx=$CTX"])
>> +
>> +AT_CLEANUP
>> diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
>> index c1f43e5afa..c9b758d63c 100644
>> --- a/utilities/checkpatch_dict.txt
>> +++ b/utilities/checkpatch_dict.txt
>> @@ -36,6 +36,7 @@ cpu
>>  cpus
>>  cstime
>>  csum
>> +ctx
>>  cutime
>>  cvlan
>>  datapath
>> @@ -46,6 +47,7 @@ decap
>>  decapsulation
>>  defrag
>>  defragment
>> +deinitialization
>>  deref
>>  dereference
>>  dest
>> -- 
>> 2.52.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 1cd4ee0842..39465ba819 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -232,6 +232,7 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
     uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
     uint64_t sleep_iter = stats[PMD_SLEEP_ITER];
     uint64_t tot_sleep_cycles = stats[PMD_CYCLES_SLEEP];
+    uint64_t offload_cycles = stats[PMD_CYCLES_OFFLOAD];
 
     ds_put_format(str,
             "  Iterations:         %12"PRIu64"  (%.2f us/it)\n"
@@ -242,7 +243,8 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
             "  Sleep time (us):    %12.0f  (%3.0f us/iteration avg.)\n",
             tot_iter,
             tot_iter
-                ? (tot_cycles + tot_sleep_cycles) * us_per_cycle / tot_iter
+                ? (tot_cycles + tot_sleep_cycles + offload_cycles)
+                  * us_per_cycle / tot_iter
                 : 0,
             tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
             idle_iter,
@@ -252,6 +254,13 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
             sleep_iter, tot_iter ? 100.0 * sleep_iter / tot_iter : 0,
             tot_sleep_cycles * us_per_cycle,
             sleep_iter ? (tot_sleep_cycles * us_per_cycle) / sleep_iter : 0);
+    if (offload_cycles > 0) {
+        ds_put_format(str,
+            "  Offload cycles:     %12" PRIu64 "  (%5.1f %% of used cycles)\n",
+            offload_cycles,
+            100.0 * offload_cycles / (tot_cycles + tot_sleep_cycles
+                                      + offload_cycles));
+    }
     if (rx_packets > 0) {
         ds_put_format(str,
             "  Rx packets:         %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
@@ -532,14 +541,14 @@  OVS_REQUIRES(s->stats_mutex)
 void
 pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
                        int tx_packets, uint64_t sleep_cycles,
-                       bool full_metrics)
+                       uint64_t offload_cycles, bool full_metrics)
 {
     uint64_t now_tsc = cycles_counter_update(s);
     struct iter_stats *cum_ms;
     uint64_t cycles, cycles_per_pkt = 0;
     char *reason = NULL;
 
-    cycles = now_tsc - s->start_tsc - sleep_cycles;
+    cycles = now_tsc - s->start_tsc - sleep_cycles - offload_cycles;
     s->current.timestamp = s->iteration_cnt;
     s->current.cycles = cycles;
     s->current.pkts = rx_packets;
@@ -558,6 +567,10 @@  pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
         pmd_perf_update_counter(s, PMD_CYCLES_SLEEP, sleep_cycles);
     }
 
+    if (offload_cycles) {
+        pmd_perf_update_counter(s, PMD_CYCLES_OFFLOAD, offload_cycles);
+    }
+
     if (!full_metrics) {
         return;
     }
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 84beced151..2a055dacdd 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -82,6 +82,7 @@  enum pmd_stat_type {
     PMD_CYCLES_UPCALL,      /* Cycles spent processing upcalls. */
     PMD_SLEEP_ITER,         /* Iterations where a sleep has taken place. */
     PMD_CYCLES_SLEEP,       /* Total cycles slept to save power. */
+    PMD_CYCLES_OFFLOAD,     /* Total cycles spend handling offload. */
     PMD_N_STATS
 };
 
@@ -411,7 +412,7 @@  pmd_perf_start_iteration(struct pmd_perf_stats *s);
 void
 pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets,
                        int tx_packets, uint64_t sleep_cycles,
-                       bool full_metrics);
+                       uint64_t offload_cycles, bool full_metrics);
 
 /* Formatting the output of commands. */
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9df05c4c28..64531a02c0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -329,6 +329,9 @@  struct dp_netdev {
     uint64_t last_reconfigure_seq;
     struct ovsthread_once once_set_config;
 
+    /* When a reconfigure is requested, forcefully reload all PMDs. */
+    bool force_pmd_reload;
+
     /* Cpu mask for pin of pmd threads. */
     char *pmd_cmask;
 
@@ -339,6 +342,7 @@  struct dp_netdev {
 
     struct conntrack *conntrack;
     struct pmd_auto_lb pmd_alb;
+    bool offload_enabled;
 
     /* Bonds. */
     struct ovs_mutex bond_mutex; /* Protects updates of 'tx_bonds'. */
@@ -4556,6 +4560,14 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         log_all_pmd_sleeps(dp);
     }
 
+    if (!dp->offload_enabled) {
+        dp->offload_enabled = dpif_offload_enabled();
+        if (dp->offload_enabled) {
+            dp->force_pmd_reload = true;
+            dp_netdev_request_reconfigure(dp);
+        }
+    }
+
     return 0;
 }
 
@@ -6216,6 +6228,14 @@  reconfigure_datapath(struct dp_netdev *dp)
         ovs_mutex_unlock(&pmd->port_mutex);
     }
 
+    /* Do we need to forcefully reload all threads? */
+    if (dp->force_pmd_reload) {
+        CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
+            pmd->need_reload = true;
+        }
+        dp->force_pmd_reload = false;
+    }
+
     /* Reload affected pmd threads. */
     reload_affected_pmds(dp);
 
@@ -6516,6 +6536,7 @@  pmd_thread_main(void *f_)
 {
     struct dp_netdev_pmd_thread *pmd = f_;
     struct pmd_perf_stats *s = &pmd->perf_stats;
+    struct dpif_offload_pmd_ctx *offload_ctx = NULL;
     unsigned int lc = 0;
     struct polled_queue *poll_list;
     bool wait_for_reload = false;
@@ -6549,6 +6570,9 @@  reload:
         dpdk_attached = dpdk_attach_thread(pmd->core_id);
     }
 
+    dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id,
+                                   pmd->numa_id, &offload_ctx);
+
     /* List port/core affinity */
     for (i = 0; i < poll_cnt; i++) {
        VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
@@ -6588,7 +6612,7 @@  reload:
     ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
     for (;;) {
         uint64_t rx_packets = 0, tx_packets = 0;
-        uint64_t time_slept = 0;
+        uint64_t time_slept = 0, offload_cycles = 0;
         uint64_t max_sleep;
 
         pmd_perf_start_iteration(s);
@@ -6628,6 +6652,10 @@  reload:
                                                    ? true : false);
         }
 
+        /* Do work required by any of the hardware offload providers. */
+        offload_cycles = dpif_offload_pmd_thread_do_work(offload_ctx,
+                                                         &pmd->perf_stats);
+
         if (max_sleep) {
             /* Check if a sleep should happen on this iteration. */
             if (sleep_time) {
@@ -6687,7 +6715,7 @@  reload:
         }
 
         pmd_perf_end_iteration(s, rx_packets, tx_packets, time_slept,
-                               pmd_perf_metrics_enabled(pmd));
+                               offload_cycles, pmd_perf_metrics_enabled(pmd));
     }
     ovs_mutex_unlock(&pmd->perf_stats.stats_mutex);
 
@@ -6708,6 +6736,7 @@  reload:
         goto reload;
     }
 
+    dpif_offload_pmd_thread_exit(offload_ctx);
     pmd_free_static_tx_qid(pmd);
     dfc_cache_uninit(&pmd->flow_cache);
     free(poll_list);
@@ -9623,7 +9652,7 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
                            struct polled_queue *poll_list, int poll_cnt)
 {
     struct dpcls *cls;
-    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0;
+    uint64_t tot_idle = 0, tot_proc = 0, tot_sleep = 0, tot_offload = 0;
     unsigned int pmd_load = 0;
 
     if (pmd->ctx.now > pmd->next_cycle_store) {
@@ -9642,11 +9671,14 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
                        pmd->prev_stats[PMD_CYCLES_ITER_BUSY];
             tot_sleep = pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP] -
                         pmd->prev_stats[PMD_CYCLES_SLEEP];
+            tot_offload = pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD] -
+                          pmd->prev_stats[PMD_CYCLES_OFFLOAD];
 
             if (pmd_alb->is_enabled && !pmd->isolated) {
                 if (tot_proc) {
                     pmd_load = ((tot_proc * 100) /
-                                    (tot_idle + tot_proc + tot_sleep));
+                                    (tot_idle + tot_proc + tot_sleep
+                                     + tot_offload));
                 }
 
                 atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
@@ -9665,6 +9697,8 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
                         pmd->perf_stats.counters.n[PMD_CYCLES_ITER_BUSY];
         pmd->prev_stats[PMD_CYCLES_SLEEP] =
                         pmd->perf_stats.counters.n[PMD_CYCLES_SLEEP];
+        pmd->prev_stats[PMD_CYCLES_OFFLOAD] =
+                        pmd->perf_stats.counters.n[PMD_CYCLES_OFFLOAD];
 
         /* Get the cycles that were used to process each queue and store. */
         for (unsigned i = 0; i < poll_cnt; i++) {
diff --git a/lib/dpif-offload-dummy.c b/lib/dpif-offload-dummy.c
index 28f1f40013..80d6ce67c3 100644
--- a/lib/dpif-offload-dummy.c
+++ b/lib/dpif-offload-dummy.c
@@ -17,6 +17,7 @@ 
 #include <config.h>
 #include <errno.h>
 
+#include "coverage.h"
 #include "dpif.h"
 #include "dpif-offload.h"
 #include "dpif-offload-provider.h"
@@ -33,6 +34,8 @@ 
 
 VLOG_DEFINE_THIS_MODULE(dpif_offload_dummy);
 
+COVERAGE_DEFINE(dummy_offload_do_work);
+
 struct pmd_id_data {
     struct hmap_node node;
     void *flow_reference;
@@ -1020,6 +1023,40 @@  dummy_netdev_hw_offload_run(struct netdev *netdev)
     }
 }
 
+static void
+dummy_pmd_thread_work_cb(unsigned core_id OVS_UNUSED, int numa_id OVS_UNUSED,
+                         void *ctx OVS_UNUSED)
+{
+    COVERAGE_INC(dummy_offload_do_work);
+}
+
+static void
+dummy_pmd_thread_lifecycle(const struct dpif_offload *dpif_offload,
+                           bool exit, unsigned core_id, int numa_id,
+                           dpif_offload_pmd_thread_work_cb **callback,
+                           void **ctx)
+{
+    /* Only do this for the 'dummy' class, not for 'dummy_x'. */
+    if (strcmp(dpif_offload_type(dpif_offload), "dummy")) {
+        *callback = NULL;
+        *ctx = NULL;
+        return;
+    }
+
+    VLOG_DBG(
+        "pmd_thread_lifecycle; exit=%s, core=%u, numa=%d, cb=%p, ctx=%p",
+        exit ? "true" : "false", core_id, numa_id, *callback, *ctx);
+
+    ovs_assert(!*callback || *callback == dummy_pmd_thread_work_cb);
+
+    if (exit) {
+        free(*ctx);
+    } else {
+        *ctx = *ctx ? *ctx : xstrdup("DUMMY_OFFLOAD_WORK");
+        *callback = dummy_pmd_thread_work_cb;
+    }
+}
+
 #define DEFINE_DPIF_DUMMY_CLASS(NAME, TYPE_STR)                             \
     struct dpif_offload_class NAME = {                                      \
         .type = TYPE_STR,                                                   \
@@ -1039,6 +1076,7 @@  dummy_netdev_hw_offload_run(struct netdev *netdev)
         .netdev_flow_del = dummy_flow_del,                                  \
         .netdev_flow_stats = dummy_flow_stats,                              \
         .register_flow_unreference_cb = dummy_register_flow_unreference_cb, \
+        .pmd_thread_lifecycle = dummy_pmd_thread_lifecycle,                 \
 }
 
 DEFINE_DPIF_DUMMY_CLASS(dpif_offload_dummy_class, "dummy");
diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
index 02ef46cb08..259de2c299 100644
--- a/lib/dpif-offload-provider.h
+++ b/lib/dpif-offload-provider.h
@@ -87,6 +87,10 @@  dpif_offload_flow_dump_thread_init(
 }
 
 
+/* Offload Provider specific PMD thread work callback definition. */
+typedef void dpif_offload_pmd_thread_work_cb(unsigned core_id, int numa_id,
+                                             void *ctx);
+
 struct dpif_offload_class {
     /* Type of DPIF offload provider in this class, e.g., "tc", "dpdk",
      * "dummy", etc. */
@@ -305,6 +309,28 @@  struct dpif_offload_class {
      * to netdev_flow_put() is no longer held by the offload provider. */
     void (*register_flow_unreference_cb)(const struct dpif_offload *,
                                          dpif_offload_flow_unreference_cb *);
+
+
+    /* The API below is specific to PMD (userspace) thread lifecycle handling.
+     *
+     * This API allows a provider to supply a callback function
+     * (via `*callback`) and an optional context pointer (via `*ctx`) for a
+     * PMD thread.
+     *
+     * The lifecycle hook may be invoked multiple times for the same PMD
+     * thread.  For example, when the thread is reinitialized, this function
+     * will be called again and the previous `callback` and `ctx` values will
+     * be passed back in.  It is the provider's responsibility to decide
+     * whether those should be reused, replaced, or cleaned up before storing
+     * new values.
+     *
+     * When the PMD thread is terminating, this API is called with
+     * `exit == true`.  At that point, the provider must release any resources
+     * associated with the previously returned `callback` and `ctx`. */
+    void (*pmd_thread_lifecycle)(const struct dpif_offload *, bool exit,
+                                 unsigned core_id, int numa_id,
+                                 dpif_offload_pmd_thread_work_cb **callback,
+                                 void **ctx);
 };
 
 extern struct dpif_offload_class dpif_offload_dummy_class;
diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
index bb2feced9e..cbf1f6c704 100644
--- a/lib/dpif-offload.c
+++ b/lib/dpif-offload.c
@@ -17,6 +17,7 @@ 
 #include <config.h>
 #include <errno.h>
 
+#include "dpif-netdev-perf.h"
 #include "dpif-offload.h"
 #include "dpif-offload-provider.h"
 #include "dpif-provider.h"
@@ -54,6 +55,7 @@  static const struct dpif_offload_class *base_dpif_offload_classes[] = {
     &dpif_offload_dummy_x_class,
 };
 
+#define TOTAL_PROVIDERS ARRAY_SIZE(base_dpif_offload_classes)
 #define DEFAULT_PROVIDER_PRIORITY_LIST "tc,dpdk,dummy,dummy_x"
 
 static char *dpif_offload_provider_priority_list = NULL;
@@ -1665,3 +1667,134 @@  dpif_offload_port_mgr_port_count(const struct dpif_offload *offload)
 
     return cmap_count(&offload->ports->odp_port_to_port);
 }
+
+struct dpif_offload_pmd_ctx_node {
+    const struct dpif_offload *offload;
+    dpif_offload_pmd_thread_work_cb *callback;
+    void *provider_ctx;
+};
+
+struct dpif_offload_pmd_ctx {
+    unsigned core_id;
+    int numa_id;
+    size_t n_nodes;
+    struct dpif_offload_pmd_ctx_node nodes[TOTAL_PROVIDERS];
+};
+
+void
+dpif_offload_pmd_thread_reload(const char *dpif_name, unsigned core_id,
+                               int numa_id, struct dpif_offload_pmd_ctx **ctx_)
+{
+    struct dpif_offload_pmd_ctx_node old_nodes[TOTAL_PROVIDERS];
+    struct dpif_offload_provider_collection *collection;
+    struct dpif_offload_pmd_ctx *ctx;
+    struct dpif_offload *offload;
+    size_t old_n_nodes = 0;
+
+    if (!dpif_offload_enabled()) {
+        ovs_assert(!*ctx_);
+        return;
+    }
+
+    ovs_mutex_lock(&dpif_offload_mutex);
+    collection = shash_find_data(&dpif_offload_providers, dpif_name);
+    ovs_mutex_unlock(&dpif_offload_mutex);
+
+    if (OVS_UNLIKELY(!collection)) {
+        ovs_assert(!*ctx_);
+        return;
+    }
+
+    if (!*ctx_) {
+        /* Would be nice if we had a NUMA-specific xzalloc(). */
+        ctx = xzalloc(sizeof *ctx);
+        ctx->core_id = core_id;
+        ctx->numa_id = numa_id;
+        *ctx_ = ctx;
+    } else {
+        ctx = *ctx_;
+        old_n_nodes = ctx->n_nodes;
+
+        if (old_n_nodes) {
+            memcpy(old_nodes, ctx->nodes, old_n_nodes * sizeof old_nodes[0]);
+        }
+
+        /* Reset active nodes array. */
+        memset(ctx->nodes, 0, sizeof ctx->nodes);
+        ctx->n_nodes = 0;
+    }
+
+    LIST_FOR_EACH (offload, dpif_list_node, &collection->list) {
+
+        ovs_assert(ctx->n_nodes < TOTAL_PROVIDERS);
+
+        if (!offload->class->pmd_thread_lifecycle) {
+            continue;
+        }
+
+        if (old_n_nodes) {
+            /* If this is a reload, try to find the previous callback and
+             * ctx. */
+            for (size_t i = 0; i < old_n_nodes; i++) {
+                struct dpif_offload_pmd_ctx_node *node = &old_nodes[i];
+
+                if (offload == node->offload) {
+                    ctx->nodes[ctx->n_nodes].callback = node->callback;
+                    ctx->nodes[ctx->n_nodes].provider_ctx = node->provider_ctx;
+                    break;
+                }
+            }
+        }
+
+        offload->class->pmd_thread_lifecycle(
+            offload, false, core_id, numa_id,
+            &ctx->nodes[ctx->n_nodes].callback,
+            &ctx->nodes[ctx->n_nodes].provider_ctx);
+
+        if (ctx->nodes[ctx->n_nodes].callback) {
+            ctx->nodes[ctx->n_nodes].offload = offload;
+            ctx->n_nodes++;
+        } else {
+            memset(&ctx->nodes[ctx->n_nodes], 0,
+                   sizeof ctx->nodes[ctx->n_nodes]);
+        }
+    }
+}
+
+uint64_t
+dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *ctx,
+                                struct pmd_perf_stats *stats)
+{
+    struct cycle_timer offload_work_timer;
+
+    if (!ctx || !ctx->n_nodes) {
+        return 0;
+    }
+
+    cycle_timer_start(stats, &offload_work_timer);
+
+    for (size_t i = 0; i < ctx->n_nodes; i++) {
+        ctx->nodes[i].callback(ctx->core_id, ctx->numa_id,
+                               ctx->nodes[i].provider_ctx);
+    }
+
+    return cycle_timer_stop(stats, &offload_work_timer);
+}
+
+void
+dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *ctx)
+{
+    if (!ctx) {
+        return;
+    }
+
+    for (size_t i = 0; i < ctx->n_nodes; i++) {
+        struct dpif_offload_pmd_ctx_node *node = &ctx->nodes[i];
+
+        node->offload->class->pmd_thread_lifecycle(node->offload, true,
+                                                   ctx->core_id, ctx->numa_id,
+                                                   &node->callback,
+                                                   &node->provider_ctx);
+    }
+
+    free(ctx);
+}
diff --git a/lib/dpif-offload.h b/lib/dpif-offload.h
index 7fad3ebee3..0f66d8cd8e 100644
--- a/lib/dpif-offload.h
+++ b/lib/dpif-offload.h
@@ -22,6 +22,7 @@ 
 /* Forward declarations of private structures. */
 struct dpif_offload_class;
 struct dpif_offload;
+struct pmd_perf_stats;
 
 /* Definition of the DPIF offload implementation type.
  *
@@ -186,4 +187,14 @@  dpif_offload_datapath_flow_op_continue(struct dpif_offload_flow_cb_data *cb,
     }
 }
 
+/* PMD Thread helper functions. */
+struct dpif_offload_pmd_ctx;
+
+void dpif_offload_pmd_thread_reload(const char *dpif_name,
+                                    unsigned core_id, int numa_id,
+                                    struct dpif_offload_pmd_ctx **);
+uint64_t dpif_offload_pmd_thread_do_work(struct dpif_offload_pmd_ctx *,
+                                         struct pmd_perf_stats *);
+void dpif_offload_pmd_thread_exit(struct dpif_offload_pmd_ctx *);
+
 #endif /* DPIF_OFFLOAD_H */
diff --git a/tests/pmd.at b/tests/pmd.at
index 8254ac3b0f..54184d8c92 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -1689,3 +1689,35 @@  recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - offload work])
+OVS_VSWITCHD_START([], [], [], [DUMMY_NUMA],
+                   [-- set Open_vSwitch . other_config:hw-offload=true])
+
+AT_CHECK([ovs-appctl vlog/set dpif_offload_dummy:dbg])
+AT_CHECK([ovs-vsctl add-port br0 p0 -- set Interface p0 type=dummy-pmd])
+
+CHECK_CPU_DISCOVERED()
+CHECK_PMD_THREADS_CREATED()
+
+OVS_WAIT_UNTIL(
+  [test $(ovs-appctl coverage/read-counter dummy_offload_do_work) -gt 0])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-perf-show \
+  | grep -Eq 'Offload cycles: +[[0-9]]+  \( *[[0-9.]]+ % of used cycles\)'])
+
+OVS_VSWITCHD_STOP
+
+LOG="$(sed -n 's/.*\(pmd_thread_lifecycle.*\)/\1/p' ovs-vswitchd.log)"
+CB=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*cb=\([[^,]]*\).*/\1/p')
+CTX=$(echo "$LOG" | sed -n '2p' | sed -n 's/.*ctx=\(.*\)$/\1/p')
+
+AT_CHECK([echo "$LOG" | sed -n '1p' | sed 's/(nil)/0x0/g'], [0], [dnl
+pmd_thread_lifecycle; exit=false, core=0, numa=0, cb=0x0, ctx=0x0
+])
+AT_CHECK([echo "$LOG" | sed -n '2p' \
+          | grep -q "exit=false, core=0, numa=0, cb=$CB, ctx=$CTX"])
+AT_CHECK([echo "$LOG" | sed -n '$p' \
+          | grep -q "exit=true, core=0, numa=0, cb=$CB, ctx=$CTX"])
+
+AT_CLEANUP
diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
index c1f43e5afa..c9b758d63c 100644
--- a/utilities/checkpatch_dict.txt
+++ b/utilities/checkpatch_dict.txt
@@ -36,6 +36,7 @@  cpu
 cpus
 cstime
 csum
+ctx
 cutime
 cvlan
 datapath
@@ -46,6 +47,7 @@  decap
 decapsulation
 defrag
 defragment
+deinitialization
 deref
 dereference
 dest