diff mbox series

[ovs-dev,v13,05/12] dpif-netdev: Add command to switch dpif implementation.

Message ID 20210617161825.94741-6-cian.ferriter@intel.com
State Changes Requested
Headers show
Series DPIF Framework + Optimizations | expand

Commit Message

Ferriter, Cian June 17, 2021, 4:18 p.m. UTC
From: Harry van Haaren <harry.van.haaren@intel.com>

This commit adds a new command to allow the user to switch
the active DPIF implementation at runtime. A probe function
is executed before switching the DPIF implementation, to ensure
the CPU is capable of running the ISA required. For example, the
below code will switch to the AVX512 enabled DPIF assuming
that the runtime CPU is capable of running AVX512 instructions:

 $ ovs-appctl dpif-netdev/dpif-set dpif_avx512

A new configuration flag is added to allow selection of the
default DPIF. This is useful for running the unit-tests against
the available DPIF implementations, without modifying each unit test.

The design of the testing & validation for ISA optimized DPIF
implementations is based around the work already upstream for DPCLS.
Note however that a DPCLS lookup has no state or side-effects, allowing
the auto-validator implementation to perform multiple lookups and
provide consistent statistic counters.

The DPIF component does have state, so running two implementations in
parallel and comparing output is not a valid testing method, as there
are changes in DPIF statistic counters (side effects). As a result, the
DPIF is tested directly against the unit-tests.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>

---

v13:
- Add Docs items about the switch DPIF command here rather than in
  later commit.
- Document operation in manpages as well as rST.
- Minor code refactoring to address review comments.
---
 Documentation/topics/dpdk/bridge.rst |  34 +++++++++
 acinclude.m4                         |  15 ++++
 configure.ac                         |   1 +
 lib/automake.mk                      |   1 +
 lib/dpif-netdev-avx512.c             |  14 ++++
 lib/dpif-netdev-private-dpif.c       | 103 +++++++++++++++++++++++++++
 lib/dpif-netdev-private-dpif.h       |  49 ++++++++++++-
 lib/dpif-netdev-private-thread.h     |  11 +--
 lib/dpif-netdev-unixctl.man          |   3 +
 lib/dpif-netdev.c                    |  89 +++++++++++++++++++++--
 10 files changed, 304 insertions(+), 16 deletions(-)
 create mode 100644 lib/dpif-netdev-private-dpif.c

Comments

Flavio Leitner June 23, 2021, 6:18 p.m. UTC | #1
On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren <harry.van.haaren@intel.com>
> 
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
> 
>  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> 
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
> 
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
> 
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
> 
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
> 
> ---
> 
> v13:
> - Add Docs items about the switch DPIF command here rather than in
>   later commit.
> - Document operation in manpages as well as rST.
> - Minor code refactoring to address review comments.
> ---
>  Documentation/topics/dpdk/bridge.rst |  34 +++++++++
>  acinclude.m4                         |  15 ++++
>  configure.ac                         |   1 +
>  lib/automake.mk                      |   1 +
>  lib/dpif-netdev-avx512.c             |  14 ++++
>  lib/dpif-netdev-private-dpif.c       | 103 +++++++++++++++++++++++++++
>  lib/dpif-netdev-private-dpif.h       |  49 ++++++++++++-
>  lib/dpif-netdev-private-thread.h     |  11 +--
>  lib/dpif-netdev-unixctl.man          |   3 +
>  lib/dpif-netdev.c                    |  89 +++++++++++++++++++++--
>  10 files changed, 304 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dpif.c
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 526d5c959..fafa8c821 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -214,3 +214,37 @@ implementation ::
>  
>  Compile OVS in debug mode to have `ovs_assert` statements error out if
>  there is a mis-match in the DPCLS lookup implementation.
> +
> +Datapath Interface Performance
> +------------------------------
> +
> +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> +packets through the major components of the userspace datapath; such as
> +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> +stats associated with the datapath.
> +
> +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> +improve performance.
> +
> +By default, dpif_scalar is used. The DPIF implementation can be selected by
> +name ::
> +
> +    $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> +    DPIF implementation set to dpif_avx512.
> +
> +    $ ovs-appctl dpif-netdev/dpif-set dpif_scalar
> +    DPIF implementation set to dpif_scalar.
> +
> +Running Unit Tests with AVX512 DPIF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Since the AVX512 DPIF is disabled by default, a compile time option is
> +available in order to test it with the OVS unit test suite. When building with
> +a CPU that supports AVX512, use the following configure option ::
> +
> +    $ ./configure --enable-dpif-default-avx512
> +
> +The following line should be seen in the configure output when the above option
> +is used ::
> +
> +    checking whether DPIF AVX512 is default implementation... yes
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
>    fi
>  ])
>  
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl
> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> +  AC_ARG_ENABLE([dpif-default-avx512],
> +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
> +                [dpifavx512=yes],[dpifavx512=no])
> +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> +  if test "$dpifavx512" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> +    AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl OVS_ENABLE_WERROR
>  AC_DEFUN([OVS_ENABLE_WERROR],
>    [AC_ARG_ENABLE(
> diff --git a/configure.ac b/configure.ac
> index c077034d4..e45685a6c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
>  OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
> +OVS_CHECK_DPIF_AVX512_DEFAULT
>  OVS_CHECK_BINUTILS_AVX512
>  
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 660cd07f0..49f42c2a3 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -116,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/dpif-netdev-private-dfc.c \
>  	lib/dpif-netdev-private-dfc.h \
>  	lib/dpif-netdev-private-dpcls.h \
> +	lib/dpif-netdev-private-dpif.c \
>  	lib/dpif-netdev-private-dpif.h \
>  	lib/dpif-netdev-private-flow.h \
>  	lib/dpif-netdev-private-hwol.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 0e55b0be2..f3f66fc60 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -19,6 +19,7 @@
>  #if !defined(__CHECKER__)
>  
>  #include <config.h>
> +#include <errno.h>
>  
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-perf.h"
> @@ -61,6 +62,19 @@ struct dpif_userdata {
>          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
>  };
>  
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void)
> +{
> +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");

Those should be of type "bool"

> +
> +    if (!avx512f_available || !bmi2_available) {
> +        return -ENOTSUP;

Also the callers of the probe function only cares about true or
false, so there is no need to return errno here, then the new
include can also be removed.

> +    }
> +
> +    return 0;
> +}
> +
>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> new file mode 100644
> index 000000000..d829a7ee5
> --- /dev/null
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -0,0 +1,103 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include "dpif-netdev-private-dpif.h"
> +#include "util.h"
> +#include "openvswitch/vlog.h"

Please sort headers alphabetically, see:
https://github.com/openvswitch/ovs/blob/master/Documentation/internals/contributing/coding-style.rst#source-files

> +
> +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> +
> +/* Actual list of implementations goes here. */
> +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> +    /* The default scalar C code implementation. */
> +    { .func = dp_netdev_input,

The '.func' is too generic. Can we rename this to something that
relates to 'input'?

> +      .probe = NULL,
> +      .name = "dpif_scalar", },
> +
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> +    { .func = dp_netdev_input_outer_avx512,
> +      .probe = dp_netdev_input_outer_avx512_probe,
> +      .name = "dpif_avx512", },
> +#endif
> +};
> +
> +static dp_netdev_input_func default_dpif_func;
> +
> +dp_netdev_input_func
> +dp_netdev_impl_get_default(void)
> +{
> +    /* For the first call, this will be NULL. Compute the compile time default.
> +     */
> +    if (!default_dpif_func) {
> +        int dpif_idx = 0;
> +
> +/* Configure-time overriding to run test suite on all implementations. */
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> +#ifdef DPIF_AVX512_DEFAULT
> +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);

If we change the code inserting a new input implementation
this code will crash in runtime.  Alternatively we can declare
an enum and use the known idx to initialize dpif_idx properly.
[not tested]

enum dpif_netdev_impl_info_idx {
    DPIF_NETDEV_IMPL_SCALAR,
    DPIF_NETDEV_IMPL_AVX512
};

static struct dpif_netdev_impl_info_t dpif_impls[] = {
    [DPIF_NETDEV_IMPL_SCALAR] = { .func = dp_netdev_input,
      .probe = NULL,
      .name = "dpif_scalar", },

#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
    [DPIF_NETDEV_IMPL_AVX512] = { .func = dp_netdev_input_outer_avx512,
      .probe = dp_netdev_input_outer_avx512_probe,
      .name = "dpif_avx512", },
#endif


};

> +        if (!dp_netdev_input_outer_avx512_probe()) {
> +            dpif_idx = 1;
> +        };
> +#endif
> +#endif
> +
> +        VLOG_INFO("Default DPIF implementation is %s.\n",
> +                  dpif_impls[dpif_idx].name);
> +        default_dpif_func = dpif_impls[dpif_idx].func;
> +    }
> +
> +    return default_dpif_func;
> +}
> +
> +void
> +dp_netdev_impl_set_default(dp_netdev_input_func func)
> +{

This is rather fragile to export as an interface. See my
comment on dpif_netdev_impl_set().

> +    default_dpif_func = func;
> +}
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
> +{
> +    ovs_assert(name);
> +    ovs_assert(out_func);
> +
> +    uint32_t i;
> +
> +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> +        if (strcmp(dpif_impls[i].name, name) == 0) {
> +            /* Probe function is optional - so check it is set before exec. */
> +            if (dpif_impls[i].probe) {
> +                int probe_err = dpif_impls[i].probe();
> +                if (probe_err) {
> +                  *out_func = NULL;
> +                   return probe_err;

nit: indentation issues above.

> +                }
> +            }
> +            *out_func = dpif_impls[i].func;
> +            return 0;
> +        }
> +    }
> +
> +    return -EINVAL;
> +}
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index 2fd7cc400..a6db3c7f2 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -23,7 +23,54 @@
>  struct dp_netdev_pmd_thread;
>  struct dp_packet_batch;
>  
> -/* Available implementations for dpif work. */
> +/* Typedef for DPIF functions.
> + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> + */

The above doesn't seem right.

> +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> +                                        struct dp_packet_batch *packets,
> +                                        odp_port_t port_no);
> +
> +/* Probe a DPIF implementation. This allows the implementation to validate CPU
> + * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
> + * use.
> + */
> +typedef int32_t (*dp_netdev_input_func_probe)(void);
> +
> +/* Structure describing each available DPIF implementation. */
> +struct dpif_netdev_impl_info_t {
> +    /* Function pointer to execute to have this DPIF implementation run. */
> +    dp_netdev_input_func func;
> +    /* Function pointer to execute to check the CPU ISA is available to run.
> +     * May be NULL, which implies that it is always valid to use.
> +     */
> +    dp_netdev_input_func_probe probe;
> +    /* Name used to select this DPIF implementation. */
> +    const char *name;
> +};
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
> +
> +/* Returns the default DPIF which is first ./configure selected, but can be
> + * overridden at runtime. */
> +dp_netdev_input_func dp_netdev_impl_get_default(void);
> +
> +/* Overrides the default DPIF with the user set DPIF. */
> +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> +
> +/* Available implementations of DPIF below. */
> +int32_t
> +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> +                struct dp_packet_batch *packets,
> +                odp_port_t in_port);
> +
> +/* AVX512 enabled DPIF implementation and probe functions. */
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void);
> +
>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> index 17356d5e2..f89b1ddaa 100644
> --- a/lib/dpif-netdev-private-thread.h
> +++ b/lib/dpif-netdev-private-thread.h
> @@ -25,6 +25,7 @@
>  #include "cmap.h"
>  
>  #include "dpif-netdev-private-dfc.h"
> +#include "dpif-netdev-private-dpif.h"
>  #include "dpif-netdev-perf.h"
>  #include "openvswitch/thread.h"
>  
> @@ -49,16 +50,6 @@ struct dp_netdev_pmd_thread_ctx {
>      bool smc_enable_db;
>  };
>  
> -/* Forward declaration for typedef. */
> -struct dp_netdev_pmd_thread;
> -
> -/* Typedef for DPIF functions.
> - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> - */
> -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> -                                        struct dp_packet_batch *packets,
> -                                        odp_port_t port_no);
> -
>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>   * the performance overhead of interrupt processing.  Therefore netdev can
>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index 858d491df..b348940b0 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -226,3 +226,6 @@ recirculation (only in balance-tcp mode).
>  When this is the case, the above command prints the load-balancing information
>  of the bonds configured in datapath \fIdp\fR showing the interface associated
>  with each bucket (hash).
> +.
> +.IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
> +Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1f15af882..9c234ef3d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
>                                        const struct flow *flow,
>                                        const struct nlattr *actions,
>                                        size_t actions_len);
> -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> -                            struct dp_packet_batch *, odp_port_t port_no);
> +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> +                        struct dp_packet_batch *, odp_port_t port_no);


All other functions around are static and this one is now part of
dpif-netdev-private-dpif.h which is included by
dpif-netdev-private-thread.h as part of dpif-netdev-private.h.

Perhaps fixing that header issue I reported in the first patch
helps to solve this too.

>  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
>                                    struct dp_packet_batch *);
>  
> @@ -991,6 +991,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
>      ds_destroy(&reply);
>  }
>  
> +static void
> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> +                     const char *argv[], void *aux OVS_UNUSED)
> +{
> +    /* This function requires just one parameter, the DPIF name.
> +     * A second optional parameter can identify the datapath instance.
> +     */
> +    const char *dpif_name = argv[1];
> +
> +    static const char *error_description[2] = {
> +        "Unknown DPIF implementation",
> +        "CPU doesn't support the required instruction for",
> +    };
> +
> +    dp_netdev_input_func new_func;
> +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> +    if (err) {
> +        struct ds reply = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> +        const char *reply_str = ds_cstr(&reply);
> +        unixctl_command_reply(conn, reply_str);
> +        VLOG_INFO("%s", reply_str);
> +        ds_destroy(&reply);
> +        return;
> +    }
> +
> +    /* argv[2] is optional datapath instance. If no datapath name is provided
> +     * and only one datapath exists, the one existing datapath is reprobed.
> +     */
> +    ovs_mutex_lock(&dp_netdev_mutex);
> +    struct dp_netdev *dp = NULL;
> +
> +    if (argc == 3) {
> +        dp = shash_find_data(&dp_netdevs, argv[2]);
> +    } else if (shash_count(&dp_netdevs) == 1) {
> +        dp = shash_first(&dp_netdevs)->data;
> +    }
> +
> +    if (!dp) {
> +        ovs_mutex_unlock(&dp_netdev_mutex);
> +        unixctl_command_reply_error(conn,
> +                                    "please specify an existing datapath");
> +        return;
> +    }
> +
> +    /* Get PMD threads list. */
> +    size_t n;
> +    struct dp_netdev_pmd_thread **pmd_list;
> +    sorted_poll_thread_list(dp, &pmd_list, &n);
> +
> +    for (size_t i = 0; i < n; i++) {
> +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> +        if (pmd->core_id == NON_PMD_CORE_ID) {
> +            continue;
> +        }
> +
> +        /* Set PMD threads DPIF implementation to requested one. */
> +        pmd->netdev_input_func = *new_func;
> +    };
> +
> +    ovs_mutex_unlock(&dp_netdev_mutex);
> +
> +    /* Set the default implementation for PMD threads created in the future. */
> +    dp_netdev_impl_set_default(*new_func);

I checked other patches and it seems this interface could be simplified
and would fix the set_default() above to be more robust.
What do you think of:

   1) lock dp_netdev_mutex
   2) Check if the DP argument is valid.
   3) Use a new dp_netdev_impl_set_default_by_name()
      That fails if the name is not available or set the default input
      hiding the function pointer from outside.
   4) Loop on each pmd doing the same as in dp_netdev_configure_pmd():
      pmd->netdev_input_func = dp_netdev_impl_get_default();
   5) unlock dp_netdev_mutex
   
It will hold the lock a bit more time but we don't expect to have
several inputs and no frequent input changes, so we should be fine.

Thanks,
fbl

> +
> +    /* Reply with success to command. */
> +    struct ds reply = DS_EMPTY_INITIALIZER;
> +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> +    const char *reply_str = ds_cstr(&reply);
> +    unixctl_command_reply(conn, reply_str);
> +    VLOG_INFO("%s", reply_str);
> +    ds_destroy(&reply);
> +}
> +
>  static void
>  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
>                            const char *argv[], void *aux OVS_UNUSED)
> @@ -1213,6 +1288,10 @@ dpif_netdev_init(void)
>      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
>                               0, 0, dpif_netdev_subtable_lookup_get,
>                               NULL);
> +    unixctl_command_register("dpif-netdev/dpif-set",
> +                             "dpif_implementation_name [dp]",
> +                             1, 2, dpif_netdev_impl_set,
> +                             NULL);
>      return 0;
>  }
>  
> @@ -6067,8 +6146,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>      hmap_init(&pmd->send_port_cache);
>      cmap_init(&pmd->tx_bonds);
>  
> -    /* Initialize the DPIF function pointer to the default scalar version. */
> -    pmd->netdev_input_func = dp_netdev_input;
> +    /* Initialize DPIF function pointer to the default configured version. */
> +    pmd->netdev_input_func = dp_netdev_impl_get_default();
>  
>      /* init the 'flow_cache' since there is no
>       * actual thread created for NON_PMD_CORE_ID. */
> @@ -7002,7 +7081,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>      }
>  }
>  
> -static int32_t
> +int32_t
>  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
>                  struct dp_packet_batch *packets,
>                  odp_port_t port_no)
> -- 
> 2.32.0
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian June 24, 2021, 11:53 a.m. UTC | #2
Hi Flavio,

Thanks for the review. My responses are inline.

Cian

> -----Original Message-----
> From: Flavio Leitner <fbl@sysclose.org>
> Sent: Wednesday 23 June 2021 19:18
> To: Ferriter, Cian <cian.ferriter@intel.com>
> Cc: ovs-dev@openvswitch.org; i.maximets@ovn.org
> Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> 
> On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > From: Harry van Haaren <harry.van.haaren@intel.com>
> >
> > This commit adds a new command to allow the user to switch
> > the active DPIF implementation at runtime. A probe function
> > is executed before switching the DPIF implementation, to ensure
> > the CPU is capable of running the ISA required. For example, the
> > below code will switch to the AVX512 enabled DPIF assuming
> > that the runtime CPU is capable of running AVX512 instructions:
> >
> >  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> >
> > A new configuration flag is added to allow selection of the
> > default DPIF. This is useful for running the unit-tests against
> > the available DPIF implementations, without modifying each unit test.
> >
> > The design of the testing & validation for ISA optimized DPIF
> > implementations is based around the work already upstream for DPCLS.
> > Note however that a DPCLS lookup has no state or side-effects, allowing
> > the auto-validator implementation to perform multiple lookups and
> > provide consistent statistic counters.
> >
> > The DPIF component does have state, so running two implementations in
> > parallel and comparing output is not a valid testing method, as there
> > are changes in DPIF statistic counters (side effects). As a result, the
> > DPIF is tested directly against the unit-tests.
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
> >
> > ---
> >
> > v13:
> > - Add Docs items about the switch DPIF command here rather than in
> >   later commit.
> > - Document operation in manpages as well as rST.
> > - Minor code refactoring to address review comments.
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  34 +++++++++
> >  acinclude.m4                         |  15 ++++
> >  configure.ac                         |   1 +
> >  lib/automake.mk                      |   1 +
> >  lib/dpif-netdev-avx512.c             |  14 ++++
> >  lib/dpif-netdev-private-dpif.c       | 103 +++++++++++++++++++++++++++
> >  lib/dpif-netdev-private-dpif.h       |  49 ++++++++++++-
> >  lib/dpif-netdev-private-thread.h     |  11 +--
> >  lib/dpif-netdev-unixctl.man          |   3 +
> >  lib/dpif-netdev.c                    |  89 +++++++++++++++++++++--
> >  10 files changed, 304 insertions(+), 16 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-dpif.c
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> > index 526d5c959..fafa8c821 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -214,3 +214,37 @@ implementation ::
> >
> >  Compile OVS in debug mode to have `ovs_assert` statements error out if
> >  there is a mis-match in the DPCLS lookup implementation.
> > +
> > +Datapath Interface Performance
> > +------------------------------
> > +
> > +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> > +packets through the major components of the userspace datapath; such as
> > +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> > +stats associated with the datapath.
> > +
> > +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> > +improve performance.
> > +
> > +By default, dpif_scalar is used. The DPIF implementation can be selected by
> > +name ::
> > +
> > +    $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> > +    DPIF implementation set to dpif_avx512.
> > +
> > +    $ ovs-appctl dpif-netdev/dpif-set dpif_scalar
> > +    DPIF implementation set to dpif_scalar.
> > +
> > +Running Unit Tests with AVX512 DPIF
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Since the AVX512 DPIF is disabled by default, a compile time option is
> > +available in order to test it with the OVS unit test suite. When building with
> > +a CPU that supports AVX512, use the following configure option ::
> > +
> > +    $ ./configure --enable-dpif-default-avx512
> > +
> > +The following line should be seen in the configure output when the above option
> > +is used ::
> > +
> > +    checking whether DPIF AVX512 is default implementation... yes
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 15a54d636..5fbcd9872 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
> >    fi
> >  ])
> >
> > +dnl Set OVS DPIF default implementation at configure time for running the unit
> > +dnl tests on the whole codebase without modifying tests per DPIF impl
> > +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> > +  AC_ARG_ENABLE([dpif-default-avx512],
> > +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation
> as default.])],
> > +                [dpifavx512=yes],[dpifavx512=no])
> > +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> > +  if test "$dpifavx512" != yes; then
> > +    AC_MSG_RESULT([no])
> > +  else
> > +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> > +    AC_MSG_RESULT([yes])
> > +  fi
> > +])
> > +
> >  dnl OVS_ENABLE_WERROR
> >  AC_DEFUN([OVS_ENABLE_WERROR],
> >    [AC_ARG_ENABLE(
> > diff --git a/configure.ac b/configure.ac
> > index c077034d4..e45685a6c 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
> >  OVS_ENABLE_SPARSE
> >  OVS_CTAGS_IDENTIFIERS
> >  OVS_CHECK_DPCLS_AUTOVALIDATOR
> > +OVS_CHECK_DPIF_AVX512_DEFAULT
> >  OVS_CHECK_BINUTILS_AVX512
> >
> >  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 660cd07f0..49f42c2a3 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -116,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> >  	lib/dpif-netdev-private-dfc.c \
> >  	lib/dpif-netdev-private-dfc.h \
> >  	lib/dpif-netdev-private-dpcls.h \
> > +	lib/dpif-netdev-private-dpif.c \
> >  	lib/dpif-netdev-private-dpif.h \
> >  	lib/dpif-netdev-private-flow.h \
> >  	lib/dpif-netdev-private-hwol.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > index 0e55b0be2..f3f66fc60 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -19,6 +19,7 @@
> >  #if !defined(__CHECKER__)
> >
> >  #include <config.h>
> > +#include <errno.h>
> >
> >  #include "dpif-netdev.h"
> >  #include "dpif-netdev-perf.h"
> > @@ -61,6 +62,19 @@ struct dpif_userdata {
> >          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
> >  };
> >
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void)
> > +{
> > +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> > +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> 
> Those should be of type "bool"
> 

I'll change to bool in the next version.

> > +
> > +    if (!avx512f_available || !bmi2_available) {
> > +        return -ENOTSUP;
> 
> Also the callers of the probe function only cares about true or
> false, so there is no need to return errno here, then the new
> include can also be removed.
> 

I'll simplify the return and remove the errno include.

> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> > new file mode 100644
> > index 000000000..d829a7ee5
> > --- /dev/null
> > +++ b/lib/dpif-netdev-private-dpif.c
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Copyright (c) 2021 Intel Corporation.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include "dpif-netdev-private-dpif.h"
> > +#include "util.h"
> > +#include "openvswitch/vlog.h"
> 
> Please sort headers alphabetically, see:
> https://github.com/openvswitch/ovs/blob/master/Documentation/internals/contributing/coding-
> style.rst#source-files
> 

I'll fix this for the next version.

> > +
> > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > +
> > +/* Actual list of implementations goes here. */
> > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > +    /* The default scalar C code implementation. */
> > +    { .func = dp_netdev_input,
> 
> The '.func' is too generic. Can we rename this to something that
> relates to 'input'?
> 

I'll rename to 'input_func'. Does that sound good to you?

> > +      .probe = NULL,
> > +      .name = "dpif_scalar", },
> > +
> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> > +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> > +    { .func = dp_netdev_input_outer_avx512,
> > +      .probe = dp_netdev_input_outer_avx512_probe,
> > +      .name = "dpif_avx512", },
> > +#endif
> > +};
> > +
> > +static dp_netdev_input_func default_dpif_func;
> > +
> > +dp_netdev_input_func
> > +dp_netdev_impl_get_default(void)
> > +{
> > +    /* For the first call, this will be NULL. Compute the compile time default.
> > +     */
> > +    if (!default_dpif_func) {
> > +        int dpif_idx = 0;
> > +
> > +/* Configure-time overriding to run test suite on all implementations. */
> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> > +#ifdef DPIF_AVX512_DEFAULT
> > +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> 
> If we change the code inserting a new input implementation
> this code will crash in runtime.  Alternatively we can declare
> an enum and use the known idx to initialize dpif_idx properly.
> [not tested]
> 
> enum dpif_netdev_impl_info_idx {
>     DPIF_NETDEV_IMPL_SCALAR,
>     DPIF_NETDEV_IMPL_AVX512
> };
> 
> static struct dpif_netdev_impl_info_t dpif_impls[] = {
>     [DPIF_NETDEV_IMPL_SCALAR] = { .func = dp_netdev_input,
>       .probe = NULL,
>       .name = "dpif_scalar", },
> 
> #if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
>     /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
>     [DPIF_NETDEV_IMPL_AVX512] = { .func = dp_netdev_input_outer_avx512,
>       .probe = dp_netdev_input_outer_avx512_probe,
>       .name = "dpif_avx512", },
> #endif
> 
> 
> };
> 

Great suggestion, I'll use this.

> > +        if (!dp_netdev_input_outer_avx512_probe()) {
> > +            dpif_idx = 1;
> > +        };
> > +#endif
> > +#endif
> > +
> > +        VLOG_INFO("Default DPIF implementation is %s.\n",
> > +                  dpif_impls[dpif_idx].name);
> > +        default_dpif_func = dpif_impls[dpif_idx].func;
> > +    }
> > +
> > +    return default_dpif_func;
> > +}
> > +
> > +void
> > +dp_netdev_impl_set_default(dp_netdev_input_func func)
> > +{
> 
> This is rather fragile to export as an interface. See my
> comment on dpif_netdev_impl_set().
> 

I'll rework dpif_netdev_impl_set() according to you comments, thanks for the suggestions.

> > +    default_dpif_func = func;
> > +}
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
> > +{
> > +    ovs_assert(name);
> > +    ovs_assert(out_func);
> > +
> > +    uint32_t i;
> > +
> > +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> > +        if (strcmp(dpif_impls[i].name, name) == 0) {
> > +            /* Probe function is optional - so check it is set before exec. */
> > +            if (dpif_impls[i].probe) {
> > +                int probe_err = dpif_impls[i].probe();
> > +                if (probe_err) {
> > +                  *out_func = NULL;
> > +                   return probe_err;
> 
> nit: indentation issues above.
> 

I'll fix this in the next version.

> > +                }
> > +            }
> > +            *out_func = dpif_impls[i].func;
> > +            return 0;
> > +        }
> > +    }
> > +
> > +    return -EINVAL;
> > +}
> > diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> > index 2fd7cc400..a6db3c7f2 100644
> > --- a/lib/dpif-netdev-private-dpif.h
> > +++ b/lib/dpif-netdev-private-dpif.h
> > @@ -23,7 +23,54 @@
> >  struct dp_netdev_pmd_thread;
> >  struct dp_packet_batch;
> >
> > -/* Available implementations for dpif work. */
> > +/* Typedef for DPIF functions.
> > + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > + */
> 
> The above doesn't seem right.
> 

I'll fix to "Returns whether all packets were processed successfully.".

> > +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > +                                        struct dp_packet_batch *packets,
> > +                                        odp_port_t port_no);
> > +
> > +/* Probe a DPIF implementation. This allows the implementation to validate CPU
> > + * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
> > + * use.
> > + */
> > +typedef int32_t (*dp_netdev_input_func_probe)(void);
> > +
> > +/* Structure describing each available DPIF implementation. */
> > +struct dpif_netdev_impl_info_t {
> > +    /* Function pointer to execute to have this DPIF implementation run. */
> > +    dp_netdev_input_func func;
> > +    /* Function pointer to execute to check the CPU ISA is available to run.
> > +     * May be NULL, which implies that it is always valid to use.
> > +     */
> > +    dp_netdev_input_func_probe probe;
> > +    /* Name used to select this DPIF implementation. */
> > +    const char *name;
> > +};
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
> > +
> > +/* Returns the default DPIF which is first ./configure selected, but can be
> > + * overridden at runtime. */
> > +dp_netdev_input_func dp_netdev_impl_get_default(void);
> > +
> > +/* Overrides the default DPIF with the user set DPIF. */
> > +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> > +
> > +/* Available implementations of DPIF below. */
> > +int32_t
> > +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> > +                struct dp_packet_batch *packets,
> > +                odp_port_t in_port);
> > +
> > +/* AVX512 enabled DPIF implementation and probe functions. */
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void);
> > +
> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> > index 17356d5e2..f89b1ddaa 100644
> > --- a/lib/dpif-netdev-private-thread.h
> > +++ b/lib/dpif-netdev-private-thread.h
> > @@ -25,6 +25,7 @@
> >  #include "cmap.h"
> >
> >  #include "dpif-netdev-private-dfc.h"
> > +#include "dpif-netdev-private-dpif.h"
> >  #include "dpif-netdev-perf.h"
> >  #include "openvswitch/thread.h"
> >
> > @@ -49,16 +50,6 @@ struct dp_netdev_pmd_thread_ctx {
> >      bool smc_enable_db;
> >  };
> >
> > -/* Forward declaration for typedef. */
> > -struct dp_netdev_pmd_thread;
> > -
> > -/* Typedef for DPIF functions.
> > - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > - */
> > -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > -                                        struct dp_packet_batch *packets,
> > -                                        odp_port_t port_no);
> > -
> >  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
> >   * the performance overhead of interrupt processing.  Therefore netdev can
> >   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> > diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> > index 858d491df..b348940b0 100644
> > --- a/lib/dpif-netdev-unixctl.man
> > +++ b/lib/dpif-netdev-unixctl.man
> > @@ -226,3 +226,6 @@ recirculation (only in balance-tcp mode).
> >  When this is the case, the above command prints the load-balancing information
> >  of the bonds configured in datapath \fIdp\fR showing the interface associated
> >  with each bucket (hash).
> > +.
> > +.IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
> > +Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 1f15af882..9c234ef3d 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> >                                        const struct flow *flow,
> >                                        const struct nlattr *actions,
> >                                        size_t actions_len);
> > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > -                            struct dp_packet_batch *, odp_port_t port_no);
> > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > +                        struct dp_packet_batch *, odp_port_t port_no);
> 
> 
> All other functions around are static and this one is now part of
> dpif-netdev-private-dpif.h which is included by
> dpif-netdev-private-thread.h as part of dpif-netdev-private.h.
> 
> Perhaps fixing that header issue I reported in the first patch
> helps to solve this too.
> 

This function can't be static. We need it to be available in lib/dpif-netdev-private-dpif.c to register it as the DPIF function for the dpif_scalar dpif_impl. I hope that makes sense.

> >  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> >                                    struct dp_packet_batch *);
> >
> > @@ -991,6 +991,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
> >      ds_destroy(&reply);
> >  }
> >
> > +static void
> > +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> > +                     const char *argv[], void *aux OVS_UNUSED)
> > +{
> > +    /* This function requires just one parameter, the DPIF name.
> > +     * A second optional parameter can identify the datapath instance.
> > +     */
> > +    const char *dpif_name = argv[1];
> > +
> > +    static const char *error_description[2] = {
> > +        "Unknown DPIF implementation",
> > +        "CPU doesn't support the required instruction for",
> > +    };
> > +
> > +    dp_netdev_input_func new_func;
> > +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> > +    if (err) {
> > +        struct ds reply = DS_EMPTY_INITIALIZER;
> > +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> > +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> > +        const char *reply_str = ds_cstr(&reply);
> > +        unixctl_command_reply(conn, reply_str);
> > +        VLOG_INFO("%s", reply_str);
> > +        ds_destroy(&reply);
> > +        return;
> > +    }
> > +
> > +    /* argv[2] is optional datapath instance. If no datapath name is provided
> > +     * and only one datapath exists, the one existing datapath is reprobed.
> > +     */
> > +    ovs_mutex_lock(&dp_netdev_mutex);
> > +    struct dp_netdev *dp = NULL;
> > +
> > +    if (argc == 3) {
> > +        dp = shash_find_data(&dp_netdevs, argv[2]);
> > +    } else if (shash_count(&dp_netdevs) == 1) {
> > +        dp = shash_first(&dp_netdevs)->data;
> > +    }
> > +
> > +    if (!dp) {
> > +        ovs_mutex_unlock(&dp_netdev_mutex);
> > +        unixctl_command_reply_error(conn,
> > +                                    "please specify an existing datapath");
> > +        return;
> > +    }
> > +
> > +    /* Get PMD threads list. */
> > +    size_t n;
> > +    struct dp_netdev_pmd_thread **pmd_list;
> > +    sorted_poll_thread_list(dp, &pmd_list, &n);
> > +
> > +    for (size_t i = 0; i < n; i++) {
> > +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> > +        if (pmd->core_id == NON_PMD_CORE_ID) {
> > +            continue;
> > +        }
> > +
> > +        /* Set PMD threads DPIF implementation to requested one. */
> > +        pmd->netdev_input_func = *new_func;
> > +    };
> > +
> > +    ovs_mutex_unlock(&dp_netdev_mutex);
> > +
> > +    /* Set the default implementation for PMD threads created in the future. */
> > +    dp_netdev_impl_set_default(*new_func);
> 
> I checked other patches and it seems this interface could be simplified
> and would fix the set_default() above to be more robust.
> What do you think of:
> 
>    1) lock dp_netdev_mutex
>    2) Check if the DP argument is valid.
>    3) Use a new dp_netdev_impl_set_default_by_name()
>       That fails if the name is not available or set the default input
>       hiding the function pointer from outside.
>    4) Loop on each pmd doing the same as in dp_netdev_configure_pmd():
>       pmd->netdev_input_func = dp_netdev_impl_get_default();
>    5) unlock dp_netdev_mutex
> 
> It will hold the lock a bit more time but we don't expect to have
> several inputs and no frequent input changes, so we should be fine.
> 

Good idea. Hiding the function pointer from here is a nice improvement. I'll rework it to do this.

> Thanks,
> fbl
> 
> > +
> > +    /* Reply with success to command. */
> > +    struct ds reply = DS_EMPTY_INITIALIZER;
> > +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> > +    const char *reply_str = ds_cstr(&reply);
> > +    unixctl_command_reply(conn, reply_str);
> > +    VLOG_INFO("%s", reply_str);
> > +    ds_destroy(&reply);
> > +}
> > +
> >  static void
> >  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
> >                            const char *argv[], void *aux OVS_UNUSED)
> > @@ -1213,6 +1288,10 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
> >                               0, 0, dpif_netdev_subtable_lookup_get,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/dpif-set",
> > +                             "dpif_implementation_name [dp]",
> > +                             1, 2, dpif_netdev_impl_set,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > @@ -6067,8 +6146,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev
> *dp,
> >      hmap_init(&pmd->send_port_cache);
> >      cmap_init(&pmd->tx_bonds);
> >
> > -    /* Initialize the DPIF function pointer to the default scalar version. */
> > -    pmd->netdev_input_func = dp_netdev_input;
> > +    /* Initialize DPIF function pointer to the default configured version. */
> > +    pmd->netdev_input_func = dp_netdev_impl_get_default();
> >
> >      /* init the 'flow_cache' since there is no
> >       * actual thread created for NON_PMD_CORE_ID. */
> > @@ -7002,7 +7081,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> >      }
> >  }
> >
> > -static int32_t
> > +int32_t
> >  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> >                  struct dp_packet_batch *packets,
> >                  odp_port_t port_no)
> > --
> > 2.32.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> --
> fbl
Flavio Leitner June 24, 2021, 12:26 p.m. UTC | #3
On Thu, Jun 24, 2021 at 11:53:34AM +0000, Ferriter, Cian wrote:
[...]
> > On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > > +
> > > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > > +
> > > +/* Actual list of implementations goes here. */
> > > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > > +    /* The default scalar C code implementation. */
> > > +    { .func = dp_netdev_input,
> > 
> > The '.func' is too generic. Can we rename this to something that
> > relates to 'input'?
> > 
> 
> I'll rename to 'input_func'. Does that sound good to you?

Sounds better, indeed.

[..]
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 1f15af882..9c234ef3d 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> > >                                        const struct flow *flow,
> > >                                        const struct nlattr *actions,
> > >                                        size_t actions_len);
> > > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > -                            struct dp_packet_batch *, odp_port_t port_no);
> > > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > +                        struct dp_packet_batch *, odp_port_t port_no);
> > 
> > 
> > All other functions around are static and this one is now part of
> > dpif-netdev-private-dpif.h which is included by
> > dpif-netdev-private-thread.h as part of dpif-netdev-private.h.
> > 
> > Perhaps fixing that header issue I reported in the first patch
> > helps to solve this too.
> > 
> 
> This function can't be static. We need it to be available in
> lib/dpif-netdev-private-dpif.c to register it as the DPIF function
> for the dpif_scalar dpif_impl. I hope that makes sense.

Sure, it can't be. What I am saying is that it is declared in two
different places. One in dpif-netdev.c and another in the header
dpif-netdev-private-dpif.h which is included as well.

Thanks,
Ferriter, Cian June 24, 2021, 12:57 p.m. UTC | #4
Hi Flavio,

Thanks for clarifying. My response is inline.

Cian

> -----Original Message-----
> From: Flavio Leitner <fbl@sysclose.org>
> Sent: Thursday 24 June 2021 13:26
> To: Ferriter, Cian <cian.ferriter@intel.com>
> Cc: ovs-dev@openvswitch.org; i.maximets@ovn.org
> Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> 
> On Thu, Jun 24, 2021 at 11:53:34AM +0000, Ferriter, Cian wrote:
> [...]
> > > On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > > > +
> > > > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > > > +
> > > > +/* Actual list of implementations goes here. */
> > > > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > > > +    /* The default scalar C code implementation. */
> > > > +    { .func = dp_netdev_input,
> > >
> > > The '.func' is too generic. Can we rename this to something that
> > > relates to 'input'?
> > >
> >
> > I'll rename to 'input_func'. Does that sound good to you?
> 
> Sounds better, indeed.
> 
> [..]
> > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > index 1f15af882..9c234ef3d 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> > > >                                        const struct flow *flow,
> > > >                                        const struct nlattr *actions,
> > > >                                        size_t actions_len);
> > > > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > > -                            struct dp_packet_batch *, odp_port_t port_no);
> > > > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > > +                        struct dp_packet_batch *, odp_port_t port_no);
> > >
> > >
> > > All other functions around are static and this one is now part of
> > > dpif-netdev-private-dpif.h which is included by
> > > dpif-netdev-private-thread.h as part of dpif-netdev-private.h.
> > >
> > > Perhaps fixing that header issue I reported in the first patch
> > > helps to solve this too.
> > >
> >
> > This function can't be static. We need it to be available in
> > lib/dpif-netdev-private-dpif.c to register it as the DPIF function
> > for the dpif_scalar dpif_impl. I hope that makes sense.
> 
> Sure, it can't be. What I am saying is that it is declared in two
> different places. One in dpif-netdev.c and another in the header
> dpif-netdev-private-dpif.h which is included as well.
> 

My apologies, I understand now. I'll fix this. I'm going to remove the lib/dpif-netdev.c declaration of dp_netdev_input().

> Thanks,
> --
> fbl
Ferriter, Cian June 25, 2021, 8:55 a.m. UTC | #5
Hi Flavio,

Just one correction I noticed while implementing your feedback. I've replied inline.

Thanks,
Cian

> -----Original Message-----
> From: Ferriter, Cian
> Sent: Thursday 24 June 2021 12:54
> To: Flavio Leitner <fbl@sysclose.org>
> Cc: ovs-dev@openvswitch.org; i.maximets@ovn.org
> Subject: RE: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> 
> Hi Flavio,
> 
> Thanks for the review. My responses are inline.
> 
> Cian
> 
> > -----Original Message-----
> > From: Flavio Leitner <fbl@sysclose.org>
> > Sent: Wednesday 23 June 2021 19:18
> > To: Ferriter, Cian <cian.ferriter@intel.com>
> > Cc: ovs-dev@openvswitch.org; i.maximets@ovn.org
> > Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> >
> > On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > > From: Harry van Haaren <harry.van.haaren@intel.com>
> > >

<snip most of the diff>

> > > +
> > > +    if (!avx512f_available || !bmi2_available) {
> > > +        return -ENOTSUP;
> >
> > Also the callers of the probe function only cares about true or
> > false, so there is no need to return errno here, then the new
> > include can also be removed.
> >
> 
> I'll simplify the return and remove the errno include.
> 

This return -ENOTSUP is used in dpif_netdev_impl_set() to select the correct error message to show the user on error.

    static const char *error_description[2] = {
        "Unknown DPIF implementation",
        "CPU doesn't support the required instruction for",
    };

I'm going to leave in. I hope that makes sense.

> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >  int32_t
> > >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> > >                               struct dp_packet_batch *packets,

<snip the rest of the diff>
Eelco Chaudron June 25, 2021, 3:14 p.m. UTC | #6
On 24 Jun 2021, at 13:53, Ferriter, Cian wrote:

> Hi Flavio,
>
> Thanks for the review. My responses are inline.
>
> Cian
>

<SNIP>

>>>
>>> +static void
>>> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
>>> +                     const char *argv[], void *aux OVS_UNUSED)
>>> +{
>>> +    /* This function requires just one parameter, the DPIF name.
>>> +     * A second optional parameter can identify the datapath instance.
>>> +     */
>>> +    const char *dpif_name = argv[1];
>>> +
>>> +    static const char *error_description[2] = {
>>> +        "Unknown DPIF implementation",
>>> +        "CPU doesn't support the required instruction for",
>>> +    };
>>> +
>>> +    dp_netdev_input_func new_func;
>>> +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
>>> +    if (err) {
>>> +        struct ds reply = DS_EMPTY_INITIALIZER;
>>> +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
>>> +                      error_description[ (err == -ENOTSUP) ], dpif_name);
>>> +        const char *reply_str = ds_cstr(&reply);
>>> +        unixctl_command_reply(conn, reply_str);
>>> +        VLOG_INFO("%s", reply_str);
>>> +        ds_destroy(&reply);
>>> +        return;
>>> +    }
>>> +
>>> +    /* argv[2] is optional datapath instance. If no datapath name is provided
>>> +     * and only one datapath exists, the one existing datapath is reprobed.
>>> +     */
>>> +    ovs_mutex_lock(&dp_netdev_mutex);
>>> +    struct dp_netdev *dp = NULL;
>>> +
>>> +    if (argc == 3) {
>>> +        dp = shash_find_data(&dp_netdevs, argv[2]);
>>> +    } else if (shash_count(&dp_netdevs) == 1) {
>>> +        dp = shash_first(&dp_netdevs)->data;
>>> +    }
>>> +
>>> +    if (!dp) {
>>> +        ovs_mutex_unlock(&dp_netdev_mutex);
>>> +        unixctl_command_reply_error(conn,
>>> +                                    "please specify an existing datapath");
>>> +        return;
>>> +    }
>>> +
>>> +    /* Get PMD threads list. */
>>> +    size_t n;
>>> +    struct dp_netdev_pmd_thread **pmd_list;
>>> +    sorted_poll_thread_list(dp, &pmd_list, &n);
>>> +
>>> +    for (size_t i = 0; i < n; i++) {
>>> +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
>>> +        if (pmd->core_id == NON_PMD_CORE_ID) {
>>> +            continue;
>>> +        }
>>> +
>>> +        /* Set PMD threads DPIF implementation to requested one. */
>>> +        pmd->netdev_input_func = *new_func;
>>> +    };
>>> +
>>> +    ovs_mutex_unlock(&dp_netdev_mutex);
>>> +
>>> +    /* Set the default implementation for PMD threads created in the future. */
>>> +    dp_netdev_impl_set_default(*new_func);
>>
>> I checked other patches and it seems this interface could be simplified
>> and would fix the set_default() above to be more robust.
>> What do you think of:
>>
>>    1) lock dp_netdev_mutex
>>    2) Check if the DP argument is valid.
>>    3) Use a new dp_netdev_impl_set_default_by_name()
>>       That fails if the name is not available or set the default input
>>       hiding the function pointer from outside.
>>    4) Loop on each pmd doing the same as in dp_netdev_configure_pmd():
>>       pmd->netdev_input_func = dp_netdev_impl_get_default();
>>    5) unlock dp_netdev_mutex
>>
>> It will hold the lock a bit more time but we don't expect to have
>> several inputs and no frequent input changes, so we should be fine.
>>
>
> Good idea. Hiding the function pointer from here is a nice improvement. I'll rework it to do this.


Do we also assume that setting the function pointer happens atomically on all supported architectures? I would assume this requires an atomic set?

//Eelco
Eelco Chaudron June 28, 2021, 2:35 p.m. UTC | #7
On 17 Jun 2021, at 18:18, Cian Ferriter wrote:

> From: Harry van Haaren <harry.van.haaren@intel.com>
>
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
>
>  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
>
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
>
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
>
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
>
> ---
>
> v13:
> - Add Docs items about the switch DPIF command here rather than in
>   later commit.
> - Document operation in manpages as well as rST.
> - Minor code refactoring to address review comments.
> ---
>  Documentation/topics/dpdk/bridge.rst |  34 +++++++++
>  acinclude.m4                         |  15 ++++
>  configure.ac                         |   1 +
>  lib/automake.mk                      |   1 +
>  lib/dpif-netdev-avx512.c             |  14 ++++
>  lib/dpif-netdev-private-dpif.c       | 103 +++++++++++++++++++++++++++
>  lib/dpif-netdev-private-dpif.h       |  49 ++++++++++++-
>  lib/dpif-netdev-private-thread.h     |  11 +--
>  lib/dpif-netdev-unixctl.man          |   3 +
>  lib/dpif-netdev.c                    |  89 +++++++++++++++++++++--
>  10 files changed, 304 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dpif.c
>
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 526d5c959..fafa8c821 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -214,3 +214,37 @@ implementation ::
>
>  Compile OVS in debug mode to have `ovs_assert` statements error out if
>  there is a mis-match in the DPCLS lookup implementation.
> +
> +Datapath Interface Performance
> +------------------------------
> +
> +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> +packets through the major components of the userspace datapath; such as
> +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> +stats associated with the datapath.
> +
> +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> +improve performance.
> +
> +By default, dpif_scalar is used. The DPIF implementation can be selected by
> +name ::
> +
> +    $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> +    DPIF implementation set to dpif_avx512.
> +
> +    $ ovs-appctl dpif-netdev/dpif-set dpif_scalar
> +    DPIF implementation set to dpif_scalar.
> +
> +Running Unit Tests with AVX512 DPIF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Since the AVX512 DPIF is disabled by default, a compile time option is
> +available in order to test it with the OVS unit test suite. When building with
> +a CPU that supports AVX512, use the following configure option ::
> +
> +    $ ./configure --enable-dpif-default-avx512
> +
> +The following line should be seen in the configure output when the above option
> +is used ::
> +
> +    checking whether DPIF AVX512 is default implementation... yes
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
>    fi
>  ])
>
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl
> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> +  AC_ARG_ENABLE([dpif-default-avx512],
> +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
> +                [dpifavx512=yes],[dpifavx512=no])
> +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> +  if test "$dpifavx512" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> +    AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl OVS_ENABLE_WERROR
>  AC_DEFUN([OVS_ENABLE_WERROR],
>    [AC_ARG_ENABLE(
> diff --git a/configure.ac b/configure.ac
> index c077034d4..e45685a6c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
>  OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
> +OVS_CHECK_DPIF_AVX512_DEFAULT
>  OVS_CHECK_BINUTILS_AVX512
>
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 660cd07f0..49f42c2a3 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -116,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/dpif-netdev-private-dfc.c \
>  	lib/dpif-netdev-private-dfc.h \
>  	lib/dpif-netdev-private-dpcls.h \
> +	lib/dpif-netdev-private-dpif.c \
>  	lib/dpif-netdev-private-dpif.h \
>  	lib/dpif-netdev-private-flow.h \
>  	lib/dpif-netdev-private-hwol.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 0e55b0be2..f3f66fc60 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -19,6 +19,7 @@
>  #if !defined(__CHECKER__)
>
>  #include <config.h>
> +#include <errno.h>
>
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-perf.h"
> @@ -61,6 +62,19 @@ struct dpif_userdata {
>          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
>  };
>
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void)
> +{
> +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> +
> +    if (!avx512f_available || !bmi2_available) {
> +        return -ENOTSUP;
> +    }
> +
> +    return 0;
> +}
> +
>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> new file mode 100644
> index 000000000..d829a7ee5
> --- /dev/null
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -0,0 +1,103 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include "dpif-netdev-private-dpif.h"
> +#include "util.h"
> +#include "openvswitch/vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> +
> +/* Actual list of implementations goes here. */
> +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> +    /* The default scalar C code implementation. */
> +    { .func = dp_netdev_input,
> +      .probe = NULL,
> +      .name = "dpif_scalar", },
> +
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> +    { .func = dp_netdev_input_outer_avx512,
> +      .probe = dp_netdev_input_outer_avx512_probe,
> +      .name = "dpif_avx512", },
> +#endif
> +};
> +
> +static dp_netdev_input_func default_dpif_func;
> +
> +dp_netdev_input_func
> +dp_netdev_impl_get_default(void)
> +{
> +    /* For the first call, this will be NULL. Compute the compile time default.
> +     */
> +    if (!default_dpif_func) {
> +        int dpif_idx = 0;
> +
> +/* Configure-time overriding to run test suite on all implementations. */
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> +#ifdef DPIF_AVX512_DEFAULT
> +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> +        if (!dp_netdev_input_outer_avx512_probe()) {
> +            dpif_idx = 1;
> +        };
> +#endif
> +#endif
> +
> +        VLOG_INFO("Default DPIF implementation is %s.\n",
> +                  dpif_impls[dpif_idx].name);
> +        default_dpif_func = dpif_impls[dpif_idx].func;
> +    }
> +
> +    return default_dpif_func;
> +}
> +
> +void
> +dp_netdev_impl_set_default(dp_netdev_input_func func)
> +{
> +    default_dpif_func = func;
> +}
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
> +{
> +    ovs_assert(name);
> +    ovs_assert(out_func);
> +
> +    uint32_t i;
> +
> +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> +        if (strcmp(dpif_impls[i].name, name) == 0) {
> +            /* Probe function is optional - so check it is set before exec. */
> +            if (dpif_impls[i].probe) {
> +                int probe_err = dpif_impls[i].probe();
> +                if (probe_err) {
> +                  *out_func = NULL;
> +                   return probe_err;
> +                }
> +            }
> +            *out_func = dpif_impls[i].func;
> +            return 0;
> +        }
> +    }
> +
> +    return -EINVAL;
> +}
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index 2fd7cc400..a6db3c7f2 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -23,7 +23,54 @@
>  struct dp_netdev_pmd_thread;
>  struct dp_packet_batch;
>
> -/* Available implementations for dpif work. */
> +/* Typedef for DPIF functions.
> + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> + */

I'm not really doing a review of this patchset, but looking at it to review the mfex patch series.

I was wondering about the return value? The avx implementation returns -1 on error, so not really a bitmask. Also, the batch could be 32, which will not fit in the int32, unless we use the sign mask.

Also, what happens if you increase the batch size? This will stop working? Maybe we need a compile time assert?

Currently, the bitmask is not used in the code at all, only the return != 0. So maybe update the &batch passed, so you can check the size, and you do not need a return value after all? Or only when you return false?


> +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> +                                        struct dp_packet_batch *packets,
> +                                        odp_port_t port_no);
> +
> +/* Probe a DPIF implementation. This allows the implementation to validate CPU
> + * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
> + * use.
> + */
> +typedef int32_t (*dp_netdev_input_func_probe)(void);
> +
> +/* Structure describing each available DPIF implementation. */
> +struct dpif_netdev_impl_info_t {
> +    /* Function pointer to execute to have this DPIF implementation run. */
> +    dp_netdev_input_func func;
> +    /* Function pointer to execute to check the CPU ISA is available to run.
> +     * May be NULL, which implies that it is always valid to use.
> +     */
> +    dp_netdev_input_func_probe probe;
> +    /* Name used to select this DPIF implementation. */
> +    const char *name;
> +};
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
> +
> +/* Returns the default DPIF which is first ./configure selected, but can be
> + * overridden at runtime. */
> +dp_netdev_input_func dp_netdev_impl_get_default(void);
> +
> +/* Overrides the default DPIF with the user set DPIF. */
> +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> +
> +/* Available implementations of DPIF below. */
> +int32_t
> +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> +                struct dp_packet_batch *packets,
> +                odp_port_t in_port);
> +
> +/* AVX512 enabled DPIF implementation and probe functions. */
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void);
> +
>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> index 17356d5e2..f89b1ddaa 100644
> --- a/lib/dpif-netdev-private-thread.h
> +++ b/lib/dpif-netdev-private-thread.h
> @@ -25,6 +25,7 @@
>  #include "cmap.h"
>
>  #include "dpif-netdev-private-dfc.h"
> +#include "dpif-netdev-private-dpif.h"
>  #include "dpif-netdev-perf.h"
>  #include "openvswitch/thread.h"
>
> @@ -49,16 +50,6 @@ struct dp_netdev_pmd_thread_ctx {
>      bool smc_enable_db;
>  };
>
> -/* Forward declaration for typedef. */
> -struct dp_netdev_pmd_thread;
> -
> -/* Typedef for DPIF functions.
> - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> - */
> -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> -                                        struct dp_packet_batch *packets,
> -                                        odp_port_t port_no);
> -
>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>   * the performance overhead of interrupt processing.  Therefore netdev can
>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index 858d491df..b348940b0 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -226,3 +226,6 @@ recirculation (only in balance-tcp mode).
>  When this is the case, the above command prints the load-balancing information
>  of the bonds configured in datapath \fIdp\fR showing the interface associated
>  with each bucket (hash).
> +.
> +.IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
> +Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1f15af882..9c234ef3d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
>                                        const struct flow *flow,
>                                        const struct nlattr *actions,
>                                        size_t actions_len);
> -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> -                            struct dp_packet_batch *, odp_port_t port_no);
> +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> +                        struct dp_packet_batch *, odp_port_t port_no);
>  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
>                                    struct dp_packet_batch *);
>
> @@ -991,6 +991,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
>      ds_destroy(&reply);
>  }
>
> +static void
> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> +                     const char *argv[], void *aux OVS_UNUSED)
> +{
> +    /* This function requires just one parameter, the DPIF name.
> +     * A second optional parameter can identify the datapath instance.
> +     */
> +    const char *dpif_name = argv[1];
> +
> +    static const char *error_description[2] = {
> +        "Unknown DPIF implementation",
> +        "CPU doesn't support the required instruction for",
> +    };
> +
> +    dp_netdev_input_func new_func;
> +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> +    if (err) {
> +        struct ds reply = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> +        const char *reply_str = ds_cstr(&reply);
> +        unixctl_command_reply(conn, reply_str);
> +        VLOG_INFO("%s", reply_str);
> +        ds_destroy(&reply);
> +        return;
> +    }
> +
> +    /* argv[2] is optional datapath instance. If no datapath name is provided
> +     * and only one datapath exists, the one existing datapath is reprobed.
> +     */
> +    ovs_mutex_lock(&dp_netdev_mutex);
> +    struct dp_netdev *dp = NULL;
> +
> +    if (argc == 3) {
> +        dp = shash_find_data(&dp_netdevs, argv[2]);
> +    } else if (shash_count(&dp_netdevs) == 1) {
> +        dp = shash_first(&dp_netdevs)->data;
> +    }
> +
> +    if (!dp) {
> +        ovs_mutex_unlock(&dp_netdev_mutex);
> +        unixctl_command_reply_error(conn,
> +                                    "please specify an existing datapath");
> +        return;
> +    }
> +
> +    /* Get PMD threads list. */
> +    size_t n;
> +    struct dp_netdev_pmd_thread **pmd_list;
> +    sorted_poll_thread_list(dp, &pmd_list, &n);
> +
> +    for (size_t i = 0; i < n; i++) {
> +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> +        if (pmd->core_id == NON_PMD_CORE_ID) {
> +            continue;
> +        }
> +
> +        /* Set PMD threads DPIF implementation to requested one. */
> +        pmd->netdev_input_func = *new_func;
> +    };
> +
> +    ovs_mutex_unlock(&dp_netdev_mutex);
> +
> +    /* Set the default implementation for PMD threads created in the future. */
> +    dp_netdev_impl_set_default(*new_func);
> +
> +    /* Reply with success to command. */
> +    struct ds reply = DS_EMPTY_INITIALIZER;
> +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> +    const char *reply_str = ds_cstr(&reply);
> +    unixctl_command_reply(conn, reply_str);
> +    VLOG_INFO("%s", reply_str);
> +    ds_destroy(&reply);
> +}
> +
>  static void
>  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
>                            const char *argv[], void *aux OVS_UNUSED)
> @@ -1213,6 +1288,10 @@ dpif_netdev_init(void)
>      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
>                               0, 0, dpif_netdev_subtable_lookup_get,
>                               NULL);
> +    unixctl_command_register("dpif-netdev/dpif-set",
> +                             "dpif_implementation_name [dp]",
> +                             1, 2, dpif_netdev_impl_set,
> +                             NULL);
>      return 0;
>  }
>
> @@ -6067,8 +6146,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>      hmap_init(&pmd->send_port_cache);
>      cmap_init(&pmd->tx_bonds);
>
> -    /* Initialize the DPIF function pointer to the default scalar version. */
> -    pmd->netdev_input_func = dp_netdev_input;
> +    /* Initialize DPIF function pointer to the default configured version. */
> +    pmd->netdev_input_func = dp_netdev_impl_get_default();
>
>      /* init the 'flow_cache' since there is no
>       * actual thread created for NON_PMD_CORE_ID. */
> @@ -7002,7 +7081,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>      }
>  }
>
> -static int32_t
> +int32_t
>  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
>                  struct dp_packet_batch *packets,
>                  odp_port_t port_no)
> -- 
> 2.32.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian June 28, 2021, 3:49 p.m. UTC | #8
Hi Eelco,

Thanks for the comments. My responses are inline.

Cian

> -----Original Message-----
> From: Eelco Chaudron <echaudro@redhat.com>
> Sent: Monday 28 June 2021 15:35
> To: Ferriter, Cian <cian.ferriter@intel.com>
> Cc: ovs-dev@openvswitch.org; i.maximets@ovn.org; Flavio Leitner <fbl@sysclose.org>; Van Haaren, Harry
> <harry.van.haaren@intel.com>
> Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> 
> 
> 
> On 17 Jun 2021, at 18:18, Cian Ferriter wrote:
> 
> > From: Harry van Haaren <harry.van.haaren@intel.com>
> >
> > This commit adds a new command to allow the user to switch
> > the active DPIF implementation at runtime. A probe function
> > is executed before switching the DPIF implementation, to ensure
> > the CPU is capable of running the ISA required. For example, the
> > below code will switch to the AVX512 enabled DPIF assuming
> > that the runtime CPU is capable of running AVX512 instructions:
> >
> >  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> >
> > A new configuration flag is added to allow selection of the
> > default DPIF. This is useful for running the unit-tests against
> > the available DPIF implementations, without modifying each unit test.
> >
> > The design of the testing & validation for ISA optimized DPIF
> > implementations is based around the work already upstream for DPCLS.
> > Note however that a DPCLS lookup has no state or side-effects, allowing
> > the auto-validator implementation to perform multiple lookups and
> > provide consistent statistic counters.
> >
> > The DPIF component does have state, so running two implementations in
> > parallel and comparing output is not a valid testing method, as there
> > are changes in DPIF statistic counters (side effects). As a result, the
> > DPIF is tested directly against the unit-tests.
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
> >
> > ---
> >
> > v13:
> > - Add Docs items about the switch DPIF command here rather than in
> >   later commit.
> > - Document operation in manpages as well as rST.
> > - Minor code refactoring to address review comments.
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  34 +++++++++
> >  acinclude.m4                         |  15 ++++
> >  configure.ac                         |   1 +
> >  lib/automake.mk                      |   1 +
> >  lib/dpif-netdev-avx512.c             |  14 ++++
> >  lib/dpif-netdev-private-dpif.c       | 103 +++++++++++++++++++++++++++
> >  lib/dpif-netdev-private-dpif.h       |  49 ++++++++++++-
> >  lib/dpif-netdev-private-thread.h     |  11 +--
> >  lib/dpif-netdev-unixctl.man          |   3 +
> >  lib/dpif-netdev.c                    |  89 +++++++++++++++++++++--
> >  10 files changed, 304 insertions(+), 16 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-dpif.c
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> > index 526d5c959..fafa8c821 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -214,3 +214,37 @@ implementation ::
> >
> >  Compile OVS in debug mode to have `ovs_assert` statements error out if
> >  there is a mis-match in the DPCLS lookup implementation.
> > +
> > +Datapath Interface Performance
> > +------------------------------
> > +
> > +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> > +packets through the major components of the userspace datapath; such as
> > +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> > +stats associated with the datapath.
> > +
> > +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> > +improve performance.
> > +
> > +By default, dpif_scalar is used. The DPIF implementation can be selected by
> > +name ::
> > +
> > +    $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> > +    DPIF implementation set to dpif_avx512.
> > +
> > +    $ ovs-appctl dpif-netdev/dpif-set dpif_scalar
> > +    DPIF implementation set to dpif_scalar.
> > +
> > +Running Unit Tests with AVX512 DPIF
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Since the AVX512 DPIF is disabled by default, a compile time option is
> > +available in order to test it with the OVS unit test suite. When building with
> > +a CPU that supports AVX512, use the following configure option ::
> > +
> > +    $ ./configure --enable-dpif-default-avx512
> > +
> > +The following line should be seen in the configure output when the above option
> > +is used ::
> > +
> > +    checking whether DPIF AVX512 is default implementation... yes
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 15a54d636..5fbcd9872 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
> >    fi
> >  ])
> >
> > +dnl Set OVS DPIF default implementation at configure time for running the unit
> > +dnl tests on the whole codebase without modifying tests per DPIF impl
> > +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> > +  AC_ARG_ENABLE([dpif-default-avx512],
> > +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation
> as default.])],
> > +                [dpifavx512=yes],[dpifavx512=no])
> > +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> > +  if test "$dpifavx512" != yes; then
> > +    AC_MSG_RESULT([no])
> > +  else
> > +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> > +    AC_MSG_RESULT([yes])
> > +  fi
> > +])
> > +
> >  dnl OVS_ENABLE_WERROR
> >  AC_DEFUN([OVS_ENABLE_WERROR],
> >    [AC_ARG_ENABLE(
> > diff --git a/configure.ac b/configure.ac
> > index c077034d4..e45685a6c 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
> >  OVS_ENABLE_SPARSE
> >  OVS_CTAGS_IDENTIFIERS
> >  OVS_CHECK_DPCLS_AUTOVALIDATOR
> > +OVS_CHECK_DPIF_AVX512_DEFAULT
> >  OVS_CHECK_BINUTILS_AVX512
> >
> >  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 660cd07f0..49f42c2a3 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -116,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> >  	lib/dpif-netdev-private-dfc.c \
> >  	lib/dpif-netdev-private-dfc.h \
> >  	lib/dpif-netdev-private-dpcls.h \
> > +	lib/dpif-netdev-private-dpif.c \
> >  	lib/dpif-netdev-private-dpif.h \
> >  	lib/dpif-netdev-private-flow.h \
> >  	lib/dpif-netdev-private-hwol.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > index 0e55b0be2..f3f66fc60 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -19,6 +19,7 @@
> >  #if !defined(__CHECKER__)
> >
> >  #include <config.h>
> > +#include <errno.h>
> >
> >  #include "dpif-netdev.h"
> >  #include "dpif-netdev-perf.h"
> > @@ -61,6 +62,19 @@ struct dpif_userdata {
> >          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
> >  };
> >
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void)
> > +{
> > +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> > +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> > +
> > +    if (!avx512f_available || !bmi2_available) {
> > +        return -ENOTSUP;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> > new file mode 100644
> > index 000000000..d829a7ee5
> > --- /dev/null
> > +++ b/lib/dpif-netdev-private-dpif.c
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Copyright (c) 2021 Intel Corporation.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include "dpif-netdev-private-dpif.h"
> > +#include "util.h"
> > +#include "openvswitch/vlog.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > +
> > +/* Actual list of implementations goes here. */
> > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > +    /* The default scalar C code implementation. */
> > +    { .func = dp_netdev_input,
> > +      .probe = NULL,
> > +      .name = "dpif_scalar", },
> > +
> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> > +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> > +    { .func = dp_netdev_input_outer_avx512,
> > +      .probe = dp_netdev_input_outer_avx512_probe,
> > +      .name = "dpif_avx512", },
> > +#endif
> > +};
> > +
> > +static dp_netdev_input_func default_dpif_func;
> > +
> > +dp_netdev_input_func
> > +dp_netdev_impl_get_default(void)
> > +{
> > +    /* For the first call, this will be NULL. Compute the compile time default.
> > +     */
> > +    if (!default_dpif_func) {
> > +        int dpif_idx = 0;
> > +
> > +/* Configure-time overriding to run test suite on all implementations. */
> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> > +#ifdef DPIF_AVX512_DEFAULT
> > +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> > +        if (!dp_netdev_input_outer_avx512_probe()) {
> > +            dpif_idx = 1;
> > +        };
> > +#endif
> > +#endif
> > +
> > +        VLOG_INFO("Default DPIF implementation is %s.\n",
> > +                  dpif_impls[dpif_idx].name);
> > +        default_dpif_func = dpif_impls[dpif_idx].func;
> > +    }
> > +
> > +    return default_dpif_func;
> > +}
> > +
> > +void
> > +dp_netdev_impl_set_default(dp_netdev_input_func func)
> > +{
> > +    default_dpif_func = func;
> > +}
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
> > +{
> > +    ovs_assert(name);
> > +    ovs_assert(out_func);
> > +
> > +    uint32_t i;
> > +
> > +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> > +        if (strcmp(dpif_impls[i].name, name) == 0) {
> > +            /* Probe function is optional - so check it is set before exec. */
> > +            if (dpif_impls[i].probe) {
> > +                int probe_err = dpif_impls[i].probe();
> > +                if (probe_err) {
> > +                  *out_func = NULL;
> > +                   return probe_err;
> > +                }
> > +            }
> > +            *out_func = dpif_impls[i].func;
> > +            return 0;
> > +        }
> > +    }
> > +
> > +    return -EINVAL;
> > +}
> > diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> > index 2fd7cc400..a6db3c7f2 100644
> > --- a/lib/dpif-netdev-private-dpif.h
> > +++ b/lib/dpif-netdev-private-dpif.h
> > @@ -23,7 +23,54 @@
> >  struct dp_netdev_pmd_thread;
> >  struct dp_packet_batch;
> >
> > -/* Available implementations for dpif work. */
> > +/* Typedef for DPIF functions.
> > + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > + */
> 
> I'm not really doing a review of this patchset, but looking at it to review the mfex patch series.
> 

Makes sense to get the context by looking at this patchset first, agreed. Thanks for the comments anyways, any feedback you have is valuable.

> I was wondering about the return value? The avx implementation returns -1 on error, so not really a
> bitmask. Also, the batch could be 32, which will not fit in the int32, unless we use the sign mask.
> 

Agreed, the return value indicates whether the DPIF was successful or there was an error, which is the desired functionality. It is not being used as a bitmask. If it was a bitmask, we would want to use a uint32_t. The comment for this function pointer typedef will be fixed in the next version to:
/* Typedef for DPIF functions.
 * Returns whether all packets were processed successfully.
 */

> Also, what happens if you increase the batch size? This will stop working? Maybe we need a compile
> time assert?
> 

If we were using it as a bitmask, a compile time assert would be a great idea to make sure max_batch_size >= bits_in_bitmask, nice suggestion.

> Currently, the bitmask is not used in the code at all, only the return != 0. So maybe update the
> &batch passed, so you can check the size, and you do not need a return value after all? Or only when
> you return false?
> 

Hopefully, by fixing the comment, this clarifies all your concerns. Please let me know if it does not. 

> 
> > +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > +                                        struct dp_packet_batch *packets,
> > +                                        odp_port_t port_no);
> > +
> > +/* Probe a DPIF implementation. This allows the implementation to validate CPU
> > + * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
> > + * use.
> > + */
> > +typedef int32_t (*dp_netdev_input_func_probe)(void);
> > +
> > +/* Structure describing each available DPIF implementation. */
> > +struct dpif_netdev_impl_info_t {
> > +    /* Function pointer to execute to have this DPIF implementation run. */
> > +    dp_netdev_input_func func;
> > +    /* Function pointer to execute to check the CPU ISA is available to run.
> > +     * May be NULL, which implies that it is always valid to use.
> > +     */
> > +    dp_netdev_input_func_probe probe;
> > +    /* Name used to select this DPIF implementation. */
> > +    const char *name;
> > +};
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
> > +
> > +/* Returns the default DPIF which is first ./configure selected, but can be
> > + * overridden at runtime. */
> > +dp_netdev_input_func dp_netdev_impl_get_default(void);
> > +
> > +/* Overrides the default DPIF with the user set DPIF. */
> > +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> > +
> > +/* Available implementations of DPIF below. */
> > +int32_t
> > +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> > +                struct dp_packet_batch *packets,
> > +                odp_port_t in_port);
> > +
> > +/* AVX512 enabled DPIF implementation and probe functions. */
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void);
> > +
> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> > index 17356d5e2..f89b1ddaa 100644
> > --- a/lib/dpif-netdev-private-thread.h
> > +++ b/lib/dpif-netdev-private-thread.h
> > @@ -25,6 +25,7 @@
> >  #include "cmap.h"
> >
> >  #include "dpif-netdev-private-dfc.h"
> > +#include "dpif-netdev-private-dpif.h"
> >  #include "dpif-netdev-perf.h"
> >  #include "openvswitch/thread.h"
> >
> > @@ -49,16 +50,6 @@ struct dp_netdev_pmd_thread_ctx {
> >      bool smc_enable_db;
> >  };
> >
> > -/* Forward declaration for typedef. */
> > -struct dp_netdev_pmd_thread;
> > -
> > -/* Typedef for DPIF functions.
> > - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > - */
> > -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > -                                        struct dp_packet_batch *packets,
> > -                                        odp_port_t port_no);
> > -
> >  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
> >   * the performance overhead of interrupt processing.  Therefore netdev can
> >   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> > diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> > index 858d491df..b348940b0 100644
> > --- a/lib/dpif-netdev-unixctl.man
> > +++ b/lib/dpif-netdev-unixctl.man
> > @@ -226,3 +226,6 @@ recirculation (only in balance-tcp mode).
> >  When this is the case, the above command prints the load-balancing information
> >  of the bonds configured in datapath \fIdp\fR showing the interface associated
> >  with each bucket (hash).
> > +.
> > +.IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
> > +Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 1f15af882..9c234ef3d 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -479,8 +479,8 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> >                                        const struct flow *flow,
> >                                        const struct nlattr *actions,
> >                                        size_t actions_len);
> > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > -                            struct dp_packet_batch *, odp_port_t port_no);
> > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > +                        struct dp_packet_batch *, odp_port_t port_no);
> >  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> >                                    struct dp_packet_batch *);
> >
> > @@ -991,6 +991,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
> >      ds_destroy(&reply);
> >  }
> >
> > +static void
> > +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> > +                     const char *argv[], void *aux OVS_UNUSED)
> > +{
> > +    /* This function requires just one parameter, the DPIF name.
> > +     * A second optional parameter can identify the datapath instance.
> > +     */
> > +    const char *dpif_name = argv[1];
> > +
> > +    static const char *error_description[2] = {
> > +        "Unknown DPIF implementation",
> > +        "CPU doesn't support the required instruction for",
> > +    };
> > +
> > +    dp_netdev_input_func new_func;
> > +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> > +    if (err) {
> > +        struct ds reply = DS_EMPTY_INITIALIZER;
> > +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> > +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> > +        const char *reply_str = ds_cstr(&reply);
> > +        unixctl_command_reply(conn, reply_str);
> > +        VLOG_INFO("%s", reply_str);
> > +        ds_destroy(&reply);
> > +        return;
> > +    }
> > +
> > +    /* argv[2] is optional datapath instance. If no datapath name is provided
> > +     * and only one datapath exists, the one existing datapath is reprobed.
> > +     */
> > +    ovs_mutex_lock(&dp_netdev_mutex);
> > +    struct dp_netdev *dp = NULL;
> > +
> > +    if (argc == 3) {
> > +        dp = shash_find_data(&dp_netdevs, argv[2]);
> > +    } else if (shash_count(&dp_netdevs) == 1) {
> > +        dp = shash_first(&dp_netdevs)->data;
> > +    }
> > +
> > +    if (!dp) {
> > +        ovs_mutex_unlock(&dp_netdev_mutex);
> > +        unixctl_command_reply_error(conn,
> > +                                    "please specify an existing datapath");
> > +        return;
> > +    }
> > +
> > +    /* Get PMD threads list. */
> > +    size_t n;
> > +    struct dp_netdev_pmd_thread **pmd_list;
> > +    sorted_poll_thread_list(dp, &pmd_list, &n);
> > +
> > +    for (size_t i = 0; i < n; i++) {
> > +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> > +        if (pmd->core_id == NON_PMD_CORE_ID) {
> > +            continue;
> > +        }
> > +
> > +        /* Set PMD threads DPIF implementation to requested one. */
> > +        pmd->netdev_input_func = *new_func;
> > +    };
> > +
> > +    ovs_mutex_unlock(&dp_netdev_mutex);
> > +
> > +    /* Set the default implementation for PMD threads created in the future. */
> > +    dp_netdev_impl_set_default(*new_func);
> > +
> > +    /* Reply with success to command. */
> > +    struct ds reply = DS_EMPTY_INITIALIZER;
> > +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> > +    const char *reply_str = ds_cstr(&reply);
> > +    unixctl_command_reply(conn, reply_str);
> > +    VLOG_INFO("%s", reply_str);
> > +    ds_destroy(&reply);
> > +}
> > +
> >  static void
> >  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
> >                            const char *argv[], void *aux OVS_UNUSED)
> > @@ -1213,6 +1288,10 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
> >                               0, 0, dpif_netdev_subtable_lookup_get,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/dpif-set",
> > +                             "dpif_implementation_name [dp]",
> > +                             1, 2, dpif_netdev_impl_set,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > @@ -6067,8 +6146,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev
> *dp,
> >      hmap_init(&pmd->send_port_cache);
> >      cmap_init(&pmd->tx_bonds);
> >
> > -    /* Initialize the DPIF function pointer to the default scalar version. */
> > -    pmd->netdev_input_func = dp_netdev_input;
> > +    /* Initialize DPIF function pointer to the default configured version. */
> > +    pmd->netdev_input_func = dp_netdev_impl_get_default();
> >
> >      /* init the 'flow_cache' since there is no
> >       * actual thread created for NON_PMD_CORE_ID. */
> > @@ -7002,7 +7081,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> >      }
> >  }
> >
> > -static int32_t
> > +int32_t
> >  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> >                  struct dp_packet_batch *packets,
> >                  odp_port_t port_no)
> > --
> > 2.32.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian June 29, 2021, 8:17 a.m. UTC | #9
Hi Eelco,

Thanks for your comments. My response is inline.

Cian

> -----Original Message-----
> From: Eelco Chaudron <echaudro@redhat.com>
> Sent: Friday 25 June 2021 16:14
> To: Ferriter, Cian <cian.ferriter@intel.com>
> Cc: Flavio Leitner <fbl@sysclose.org>; ovs-dev@openvswitch.org; i.maximets@ovn.org
> Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.
> 
> 
> 
> On 24 Jun 2021, at 13:53, Ferriter, Cian wrote:
> 
> > Hi Flavio,
> >
> > Thanks for the review. My responses are inline.
> >
> > Cian
> >
> 
> <SNIP>
> 
> >>>
> >>> +static void
> >>> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> >>> +                     const char *argv[], void *aux OVS_UNUSED)
> >>> +{
> >>> +    /* This function requires just one parameter, the DPIF name.
> >>> +     * A second optional parameter can identify the datapath instance.
> >>> +     */
> >>> +    const char *dpif_name = argv[1];
> >>> +
> >>> +    static const char *error_description[2] = {
> >>> +        "Unknown DPIF implementation",
> >>> +        "CPU doesn't support the required instruction for",
> >>> +    };
> >>> +
> >>> +    dp_netdev_input_func new_func;
> >>> +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> >>> +    if (err) {
> >>> +        struct ds reply = DS_EMPTY_INITIALIZER;
> >>> +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> >>> +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> >>> +        const char *reply_str = ds_cstr(&reply);
> >>> +        unixctl_command_reply(conn, reply_str);
> >>> +        VLOG_INFO("%s", reply_str);
> >>> +        ds_destroy(&reply);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    /* argv[2] is optional datapath instance. If no datapath name is provided
> >>> +     * and only one datapath exists, the one existing datapath is reprobed.
> >>> +     */
> >>> +    ovs_mutex_lock(&dp_netdev_mutex);
> >>> +    struct dp_netdev *dp = NULL;
> >>> +
> >>> +    if (argc == 3) {
> >>> +        dp = shash_find_data(&dp_netdevs, argv[2]);
> >>> +    } else if (shash_count(&dp_netdevs) == 1) {
> >>> +        dp = shash_first(&dp_netdevs)->data;
> >>> +    }
> >>> +
> >>> +    if (!dp) {
> >>> +        ovs_mutex_unlock(&dp_netdev_mutex);
> >>> +        unixctl_command_reply_error(conn,
> >>> +                                    "please specify an existing datapath");
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    /* Get PMD threads list. */
> >>> +    size_t n;
> >>> +    struct dp_netdev_pmd_thread **pmd_list;
> >>> +    sorted_poll_thread_list(dp, &pmd_list, &n);
> >>> +
> >>> +    for (size_t i = 0; i < n; i++) {
> >>> +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> >>> +        if (pmd->core_id == NON_PMD_CORE_ID) {
> >>> +            continue;
> >>> +        }
> >>> +
> >>> +        /* Set PMD threads DPIF implementation to requested one. */
> >>> +        pmd->netdev_input_func = *new_func;
> >>> +    };
> >>> +
> >>> +    ovs_mutex_unlock(&dp_netdev_mutex);
> >>> +
> >>> +    /* Set the default implementation for PMD threads created in the future. */
> >>> +    dp_netdev_impl_set_default(*new_func);
> >>
> >> I checked other patches and it seems this interface could be simplified
> >> and would fix the set_default() above to be more robust.
> >> What do you think of:
> >>
> >>    1) lock dp_netdev_mutex
> >>    2) Check if the DP argument is valid.
> >>    3) Use a new dp_netdev_impl_set_default_by_name()
> >>       That fails if the name is not available or set the default input
> >>       hiding the function pointer from outside.
> >>    4) Loop on each pmd doing the same as in dp_netdev_configure_pmd():
> >>       pmd->netdev_input_func = dp_netdev_impl_get_default();
> >>    5) unlock dp_netdev_mutex
> >>
> >> It will hold the lock a bit more time but we don't expect to have
> >> several inputs and no frequent input changes, so we should be fine.
> >>
> >
> > Good idea. Hiding the function pointer from here is a nice improvement. I'll rework it to do this.
> 
> 
> Do we also assume that setting the function pointer happens atomically on all supported architectures?
> I would assume this requires an atomic set?
> 

This is a good point. We want an atomic set here. I will push the below fix with the next version of the patchset.

-            pmd->netdev_input_func = dp_netdev_impl_get_default();
+            dp_netdev_input_func default_func = dp_netdev_impl_get_default();
+            atomic_uintptr_t *pmd_func = (void *)&pmd->netdev_input_func;
+            atomic_store_relaxed(pmd_func, (uintptr_t)default_func);

> //Eelco
diff mbox series

Patch

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 526d5c959..fafa8c821 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -214,3 +214,37 @@  implementation ::
 
 Compile OVS in debug mode to have `ovs_assert` statements error out if
 there is a mis-match in the DPCLS lookup implementation.
+
+Datapath Interface Performance
+------------------------------
+
+The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
+packets through the major components of the userspace datapath; such as
+miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
+stats associated with the datapath.
+
+Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
+improve performance.
+
+By default, dpif_scalar is used. The DPIF implementation can be selected by
+name ::
+
+    $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
+    DPIF implementation set to dpif_avx512.
+
+    $ ovs-appctl dpif-netdev/dpif-set dpif_scalar
+    DPIF implementation set to dpif_scalar.
+
+Running Unit Tests with AVX512 DPIF
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the AVX512 DPIF is disabled by default, a compile time option is
+available in order to test it with the OVS unit test suite. When building with
+a CPU that supports AVX512, use the following configure option ::
+
+    $ ./configure --enable-dpif-default-avx512
+
+The following line should be seen in the configure output when the above option
+is used ::
+
+    checking whether DPIF AVX512 is default implementation... yes
diff --git a/acinclude.m4 b/acinclude.m4
index 15a54d636..5fbcd9872 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -30,6 +30,21 @@  AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
   fi
 ])
 
+dnl Set OVS DPIF default implementation at configure time for running the unit
+dnl tests on the whole codebase without modifying tests per DPIF impl
+AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
+  AC_ARG_ENABLE([dpif-default-avx512],
+                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
+                [dpifavx512=yes],[dpifavx512=no])
+  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
+  if test "$dpifavx512" != yes; then
+    AC_MSG_RESULT([no])
+  else
+    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
+    AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl OVS_ENABLE_WERROR
 AC_DEFUN([OVS_ENABLE_WERROR],
   [AC_ARG_ENABLE(
diff --git a/configure.ac b/configure.ac
index c077034d4..e45685a6c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -185,6 +185,7 @@  OVS_ENABLE_WERROR
 OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
+OVS_CHECK_DPIF_AVX512_DEFAULT
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/automake.mk b/lib/automake.mk
index 660cd07f0..49f42c2a3 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -116,6 +116,7 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/dpif-netdev-private-dfc.c \
 	lib/dpif-netdev-private-dfc.h \
 	lib/dpif-netdev-private-dpcls.h \
+	lib/dpif-netdev-private-dpif.c \
 	lib/dpif-netdev-private-dpif.h \
 	lib/dpif-netdev-private-flow.h \
 	lib/dpif-netdev-private-hwol.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 0e55b0be2..f3f66fc60 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -19,6 +19,7 @@ 
 #if !defined(__CHECKER__)
 
 #include <config.h>
+#include <errno.h>
 
 #include "dpif-netdev.h"
 #include "dpif-netdev-perf.h"
@@ -61,6 +62,19 @@  struct dpif_userdata {
         struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
 };
 
+int32_t
+dp_netdev_input_outer_avx512_probe(void)
+{
+    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
+    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
+
+    if (!avx512f_available || !bmi2_available) {
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
 int32_t
 dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
                              struct dp_packet_batch *packets,
diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
new file mode 100644
index 000000000..d829a7ee5
--- /dev/null
+++ b/lib/dpif-netdev-private-dpif.c
@@ -0,0 +1,103 @@ 
+/*
+ * Copyright (c) 2021 Intel Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <errno.h>
+#include <string.h>
+
+#include "dpif-netdev-private-dpif.h"
+#include "util.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
+
+/* Actual list of implementations goes here. */
+static struct dpif_netdev_impl_info_t dpif_impls[] = {
+    /* The default scalar C code implementation. */
+    { .func = dp_netdev_input,
+      .probe = NULL,
+      .name = "dpif_scalar", },
+
+#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
+    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
+    { .func = dp_netdev_input_outer_avx512,
+      .probe = dp_netdev_input_outer_avx512_probe,
+      .name = "dpif_avx512", },
+#endif
+};
+
+static dp_netdev_input_func default_dpif_func;
+
+dp_netdev_input_func
+dp_netdev_impl_get_default(void)
+{
+    /* For the first call, this will be NULL. Compute the compile time default.
+     */
+    if (!default_dpif_func) {
+        int dpif_idx = 0;
+
+/* Configure-time overriding to run test suite on all implementations. */
+#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
+#ifdef DPIF_AVX512_DEFAULT
+        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
+        if (!dp_netdev_input_outer_avx512_probe()) {
+            dpif_idx = 1;
+        };
+#endif
+#endif
+
+        VLOG_INFO("Default DPIF implementation is %s.\n",
+                  dpif_impls[dpif_idx].name);
+        default_dpif_func = dpif_impls[dpif_idx].func;
+    }
+
+    return default_dpif_func;
+}
+
+void
+dp_netdev_impl_set_default(dp_netdev_input_func func)
+{
+    default_dpif_func = func;
+}
+
+/* This function checks all available DPIF implementations, and selects the
+ * returns the function pointer to the one requested by "name".
+ */
+int32_t
+dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
+{
+    ovs_assert(name);
+    ovs_assert(out_func);
+
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
+        if (strcmp(dpif_impls[i].name, name) == 0) {
+            /* Probe function is optional - so check it is set before exec. */
+            if (dpif_impls[i].probe) {
+                int probe_err = dpif_impls[i].probe();
+                if (probe_err) {
+                  *out_func = NULL;
+                   return probe_err;
+                }
+            }
+            *out_func = dpif_impls[i].func;
+            return 0;
+        }
+    }
+
+    return -EINVAL;
+}
diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
index 2fd7cc400..a6db3c7f2 100644
--- a/lib/dpif-netdev-private-dpif.h
+++ b/lib/dpif-netdev-private-dpif.h
@@ -23,7 +23,54 @@ 
 struct dp_netdev_pmd_thread;
 struct dp_packet_batch;
 
-/* Available implementations for dpif work. */
+/* Typedef for DPIF functions.
+ * Returns a bitmask of packets to handle, possibly including upcall/misses.
+ */
+typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
+                                        struct dp_packet_batch *packets,
+                                        odp_port_t port_no);
+
+/* Probe a DPIF implementation. This allows the implementation to validate CPU
+ * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
+ * use.
+ */
+typedef int32_t (*dp_netdev_input_func_probe)(void);
+
+/* Structure describing each available DPIF implementation. */
+struct dpif_netdev_impl_info_t {
+    /* Function pointer to execute to have this DPIF implementation run. */
+    dp_netdev_input_func func;
+    /* Function pointer to execute to check the CPU ISA is available to run.
+     * May be NULL, which implies that it is always valid to use.
+     */
+    dp_netdev_input_func_probe probe;
+    /* Name used to select this DPIF implementation. */
+    const char *name;
+};
+
+/* This function checks all available DPIF implementations, and selects the
+ * returns the function pointer to the one requested by "name".
+ */
+int32_t
+dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
+
+/* Returns the default DPIF which is first ./configure selected, but can be
+ * overridden at runtime. */
+dp_netdev_input_func dp_netdev_impl_get_default(void);
+
+/* Overrides the default DPIF with the user set DPIF. */
+void dp_netdev_impl_set_default(dp_netdev_input_func func);
+
+/* Available implementations of DPIF below. */
+int32_t
+dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
+                struct dp_packet_batch *packets,
+                odp_port_t in_port);
+
+/* AVX512 enabled DPIF implementation and probe functions. */
+int32_t
+dp_netdev_input_outer_avx512_probe(void);
+
 int32_t
 dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
                              struct dp_packet_batch *packets,
diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index 17356d5e2..f89b1ddaa 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -25,6 +25,7 @@ 
 #include "cmap.h"
 
 #include "dpif-netdev-private-dfc.h"
+#include "dpif-netdev-private-dpif.h"
 #include "dpif-netdev-perf.h"
 #include "openvswitch/thread.h"
 
@@ -49,16 +50,6 @@  struct dp_netdev_pmd_thread_ctx {
     bool smc_enable_db;
 };
 
-/* Forward declaration for typedef. */
-struct dp_netdev_pmd_thread;
-
-/* Typedef for DPIF functions.
- * Returns a bitmask of packets to handle, possibly including upcall/misses.
- */
-typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
-                                        struct dp_packet_batch *packets,
-                                        odp_port_t port_no);
-
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 858d491df..b348940b0 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -226,3 +226,6 @@  recirculation (only in balance-tcp mode).
 When this is the case, the above command prints the load-balancing information
 of the bonds configured in datapath \fIdp\fR showing the interface associated
 with each bucket (hash).
+.
+.IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
+Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1f15af882..9c234ef3d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -479,8 +479,8 @@  static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
                                       const struct flow *flow,
                                       const struct nlattr *actions,
                                       size_t actions_len);
-static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
-                            struct dp_packet_batch *, odp_port_t port_no);
+int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
+                        struct dp_packet_batch *, odp_port_t port_no);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
                                   struct dp_packet_batch *);
 
@@ -991,6 +991,81 @@  dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
     ds_destroy(&reply);
 }
 
+static void
+dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
+                     const char *argv[], void *aux OVS_UNUSED)
+{
+    /* This function requires just one parameter, the DPIF name.
+     * A second optional parameter can identify the datapath instance.
+     */
+    const char *dpif_name = argv[1];
+
+    static const char *error_description[2] = {
+        "Unknown DPIF implementation",
+        "CPU doesn't support the required instruction for",
+    };
+
+    dp_netdev_input_func new_func;
+    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
+    if (err) {
+        struct ds reply = DS_EMPTY_INITIALIZER;
+        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
+                      error_description[ (err == -ENOTSUP) ], dpif_name);
+        const char *reply_str = ds_cstr(&reply);
+        unixctl_command_reply(conn, reply_str);
+        VLOG_INFO("%s", reply_str);
+        ds_destroy(&reply);
+        return;
+    }
+
+    /* argv[2] is optional datapath instance. If no datapath name is provided
+     * and only one datapath exists, the one existing datapath is reprobed.
+     */
+    ovs_mutex_lock(&dp_netdev_mutex);
+    struct dp_netdev *dp = NULL;
+
+    if (argc == 3) {
+        dp = shash_find_data(&dp_netdevs, argv[2]);
+    } else if (shash_count(&dp_netdevs) == 1) {
+        dp = shash_first(&dp_netdevs)->data;
+    }
+
+    if (!dp) {
+        ovs_mutex_unlock(&dp_netdev_mutex);
+        unixctl_command_reply_error(conn,
+                                    "please specify an existing datapath");
+        return;
+    }
+
+    /* Get PMD threads list. */
+    size_t n;
+    struct dp_netdev_pmd_thread **pmd_list;
+    sorted_poll_thread_list(dp, &pmd_list, &n);
+
+    for (size_t i = 0; i < n; i++) {
+        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
+        if (pmd->core_id == NON_PMD_CORE_ID) {
+            continue;
+        }
+
+        /* Set PMD threads DPIF implementation to requested one. */
+        pmd->netdev_input_func = *new_func;
+    };
+
+    ovs_mutex_unlock(&dp_netdev_mutex);
+
+    /* Set the default implementation for PMD threads created in the future. */
+    dp_netdev_impl_set_default(*new_func);
+
+    /* Reply with success to command. */
+    struct ds reply = DS_EMPTY_INITIALIZER;
+    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
+    const char *reply_str = ds_cstr(&reply);
+    unixctl_command_reply(conn, reply_str);
+    VLOG_INFO("%s", reply_str);
+    ds_destroy(&reply);
+}
+
 static void
 dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
                           const char *argv[], void *aux OVS_UNUSED)
@@ -1213,6 +1288,10 @@  dpif_netdev_init(void)
     unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
                              0, 0, dpif_netdev_subtable_lookup_get,
                              NULL);
+    unixctl_command_register("dpif-netdev/dpif-set",
+                             "dpif_implementation_name [dp]",
+                             1, 2, dpif_netdev_impl_set,
+                             NULL);
     return 0;
 }
 
@@ -6067,8 +6146,8 @@  dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
     hmap_init(&pmd->send_port_cache);
     cmap_init(&pmd->tx_bonds);
 
-    /* Initialize the DPIF function pointer to the default scalar version. */
-    pmd->netdev_input_func = dp_netdev_input;
+    /* Initialize DPIF function pointer to the default configured version. */
+    pmd->netdev_input_func = dp_netdev_impl_get_default();
 
     /* init the 'flow_cache' since there is no
      * actual thread created for NON_PMD_CORE_ID. */
@@ -7002,7 +7081,7 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
     }
 }
 
-static int32_t
+int32_t
 dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
                 struct dp_packet_batch *packets,
                 odp_port_t port_no)