diff mbox series

[ovs-dev,v12,06/16] dpif-netdev: Add command to switch dpif implementation.

Message ID 20210517140434.59555-7-cian.ferriter@intel.com
State Changes Requested
Headers show
Series DPIF Framework + Optimizations | expand

Commit Message

Ferriter, Cian May 17, 2021, 2:04 p.m. UTC
From: Harry van Haaren <harry.van.haaren@intel.com>

This commit adds a new command to allow the user to switch
the active DPIF implementation at runtime. A probe function
is executed before switching the DPIF implementation, to ensure
the CPU is capable of running the ISA required. For example, the
below code will switch to the AVX512 enabled DPIF assuming
that the runtime CPU is capable of running AVX512 instructions:

 $ ovs-appctl dpif-netdev/dpif-set dpif_avx512

A new configuration flag is added to allow selection of the
default DPIF. This is useful for running the unit-tests against
the available DPIF implementations, without modifying each unit test.

The design of the testing & validation for ISA optimized DPIF
implementations is based around the work already upstream for DPCLS.
Note however that a DPCLS lookup has no state or side-effects, allowing
the auto-validator implementation to perform multiple lookups and
provide consistent statistic counters.

The DPIF component does have state, so running two implementations in
parallel and comparing output is not a valid testing method, as there
are changes in DPIF statistic counters (side effects). As a result, the
DPIF is tested directly against the unit-tests.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>

---

v11:
- Improve the dp_netdev_impl_get_default() function so PMD threads created
  after running "dpif-set" command will use the DPIF implementation that was
  set.
---
 acinclude.m4                     |  15 +++++
 configure.ac                     |   1 +
 lib/automake.mk                  |   1 +
 lib/dpif-netdev-avx512.c         |  14 +++++
 lib/dpif-netdev-private-dpif.c   | 103 +++++++++++++++++++++++++++++++
 lib/dpif-netdev-private-dpif.h   |  47 +++++++++++++-
 lib/dpif-netdev-private-thread.h |  12 +---
 lib/dpif-netdev.c                |  89 ++++++++++++++++++++++++--
 8 files changed, 266 insertions(+), 16 deletions(-)
 create mode 100644 lib/dpif-netdev-private-dpif.c

Comments

Stokes, Ian June 2, 2021, 5:15 p.m. UTC | #1
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
> 
>  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> 
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
> 
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
> 
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
> 
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>

Thanks for the patch Harry/Cian, comments inline below.

> 
> ---
> 
> v11:
> - Improve the dp_netdev_impl_get_default() function so PMD threads created
>   after running "dpif-set" command will use the DPIF implementation that was
>   set.
> ---
>  acinclude.m4                     |  15 +++++
>  configure.ac                     |   1 +
>  lib/automake.mk                  |   1 +
>  lib/dpif-netdev-avx512.c         |  14 +++++
>  lib/dpif-netdev-private-dpif.c   | 103 +++++++++++++++++++++++++++++++
>  lib/dpif-netdev-private-dpif.h   |  47 +++++++++++++-
>  lib/dpif-netdev-private-thread.h |  12 +---
>  lib/dpif-netdev.c                |  89 ++++++++++++++++++++++++--
>  8 files changed, 266 insertions(+), 16 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-dpif.c
> 
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
>    fi
>  ])
> 
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl

A minor nit I have here is the use of the word default. For me default will always be scalar dpif.
Any thoughts on using a term such as MODE or IMP (implementation)?

> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> +  AC_ARG_ENABLE([dpif-default-avx512],
> +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF
> AVX512 implementation as default.])],
> +                [dpifavx512=yes],[dpifavx512=no])

Are both the arguments above "--enable-dpif-default-avx512" and "dpifavx512=yes" required?

I would have assumed that "--enable-dpif-default-avx512" would have been enough at configuration?

> +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> +  if test "$dpifavx512" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> +    AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl OVS_ENABLE_WERROR
>  AC_DEFUN([OVS_ENABLE_WERROR],
>    [AC_ARG_ENABLE(
> diff --git a/configure.ac b/configure.ac
> index c077034d4..e45685a6c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
>  OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
> +OVS_CHECK_DPIF_AVX512_DEFAULT
>  OVS_CHECK_BINUTILS_AVX512
> 
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 5fab8ba4f..6279662f8 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/dpif-netdev.h \
>  	lib/dpif-netdev-private-dfc.h \
>  	lib/dpif-netdev-private-dpcls.h \
> +	lib/dpif-netdev-private-dpif.c \
>  	lib/dpif-netdev-private-dpif.h \
>  	lib/dpif-netdev-private-flow.h \
>  	lib/dpif-netdev-private-hwol.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 98638de15..e255c9d17 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -19,6 +19,7 @@
>  #if !defined(__CHECKER__)
> 
>  #include <config.h>
> +#include <errno.h>
> 
>  #include "dpif-netdev.h"
>  #include "dpif-netdev-perf.h"
> @@ -54,6 +55,19 @@ struct dpif_userdata {
>          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
>  };
> 
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void)
> +{
> +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");

I know in the past there has been concern with DPDK functions being called in areas outside of netdev-dpdk.

But as this functionality is specific to the AVX512 DPIF implementation which is only available via DPDK I think the calls above are ok.

> +
> +    if (!avx512f_available || !bmi2_available) {
> +        return -ENOTSUP;
> +    }
> +
> +    return 0;
> +}
> +
>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> new file mode 100644
> index 000000000..e417fa86d
> --- /dev/null
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -0,0 +1,103 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include "dpif-netdev-private-dpif.h"
> +#include "util.h"
> +#include "openvswitch/vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> +
> +/* Actual list of implementations goes here. */
> +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> +    /* The default scalar C code implementation. */
> +    { .func = dp_netdev_input,
> +      .probe = NULL,
> +      .name = "dpif_scalar", },
> +
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> __SSE4_2__)
> +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> +    { .func = dp_netdev_input_outer_avx512,
> +      .probe = dp_netdev_input_outer_avx512_probe,
> +      .name = "dpif_avx512", },
> +#endif
> +};
> +
> +static dp_netdev_input_func default_dpif_func;
> +
> +dp_netdev_input_func
> +dp_netdev_impl_get_default(void)
> +{
> +    /* For the first call, this will be NULL. Compute the compile time default.
> +     */
> +    if (!default_dpif_func) {
> +        int dpif_idx = 0;
> +
> +/* Configure time overriding to run test suite on all implementations. */

To confirm is this code block specifically to speed up dpif selection if it has been set as avx512 as default during configuration of OVS?

Can you expand on what is meant by time overriding?

> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> __SSE4_2__)
> +#ifdef DPIF_AVX512_DEFAULT
> +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> +        if (!dp_netdev_input_outer_avx512_probe()) {
> +            dpif_idx = 1;
> +        };
> +#endif
> +#endif
> +
> +        VLOG_INFO("Default DPIF implementation is %s.\n",
> +                  dpif_impls[dpif_idx].name);
> +        default_dpif_func = dpif_impls[dpif_idx].func;
> +    }
> +
> +    return default_dpif_func;
> +}
> +
> +void
> +dp_netdev_impl_set_default(dp_netdev_input_func func)
> +{
> +    default_dpif_func = func;
> +}
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> *out_func)
> +{
> +    ovs_assert(name);
> +    ovs_assert(out_func);
> +
> +    uint32_t i;
> +
> +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> +        if (strcmp(dpif_impls[i].name, name) == 0) {
> +            /* Probe function is optional - so check it is set before exec. */
> +            if (dpif_impls[i].probe) {
> +                int probe_ok = dpif_impls[i].probe();
The name probe_ok and the logic threw me here at first.

So if the probe function works and the function can be used then probe_ok will be set to 0.

However if the probe fails and the function cannot be used, probe_ok would be set to - ENOTSUP.

But the if statement below reads as if the probe function has executed and everything is OK, i.e. nor issues or error getting the function.

Then set out_func to NULL which just seems wrong upon first reading. Would suggest change variable to probe_error instead?

> +                if (probe_ok) {
> +                  *out_func = NULL;
> +                   return probe_ok;
> +                }
> +            }
> +            *out_func = dpif_impls[i].func;
> +            return 0;
> +        }
> +    }
> +
> +    return -EINVAL;
> +}
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index 2fd7cc400..fb5380d2c 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -23,7 +23,52 @@
>  struct dp_netdev_pmd_thread;
>  struct dp_packet_batch;
> 
> -/* Available implementations for dpif work. */
> +/* Typedef for DPIF functions.
> + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> + */
> +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> +                                        struct dp_packet_batch *packets,
> +                                        odp_port_t port_no);
> +
> +/* Probe a DPIF implementation. This allows the implementation to validate
> CPU
> + * ISA availability. Returns 0 if not available, returns 1 is valid to use.
> + */

Not sure if the comment above is correct in terms of the implemented behavior for the probe and what it returns,  or at least it's the wrong way around.

Looking at the dp_netdev_input_outer_avx512_probe, it returns -ENOTSUP if there is a problem supporting the required flags (BMI AVX512 etc) and returns 0 if there is no issue.

Can you confirm?

> +typedef int32_t (*dp_netdev_input_func_probe)(void);
> +
> +/* Structure describing each available DPIF implmeentation. */

Typo above for implementation.

> +struct dpif_netdev_impl_info_t {
> +    /* Function pointer to execute to have this DPIF implementation run. */
> +    dp_netdev_input_func func;
> +    /* Function pointer to execute to check the CPU ISA is available to run.
> +     * May be NULL, which implies that it is always valid to use.
> +     */
> +    dp_netdev_input_func_probe probe;
> +    /* Name used to select this DPIF implementation. */
> +    const char *name;
> +};
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> *out_func);
> +
> +/* Returns the default DPIF which is first ./configure selected, but can be
> + * overridden at runtime. */
> +dp_netdev_input_func dp_netdev_impl_get_default(void);
> +
> +/* Overrides the default DPIF with the user set DPIF. */
> +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> +
> +/* Available implementations of DPIF below. */
> +int32_t
> +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> +                struct dp_packet_batch *packets,
> +                odp_port_t in_port);
> +
> +/* AVX512 enabled DPIF implementation and probe functions. */
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void);

Can you add a new line just after the declaration above to split it from existing function below.

>  int32_t
>  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
>                               struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> index c0c94c566..19ce99f4e 100644
> --- a/lib/dpif-netdev-private-thread.h
> +++ b/lib/dpif-netdev-private-thread.h
> @@ -28,6 +28,8 @@
>  #include "dpif-netdev-perf.h"
>  #include "openvswitch/thread.h"
> 
> +#include "dpif-netdev-private-dpif.h"
Should this not be added underneath the existing dpif-netdev header files above?

> +
>  #ifdef  __cplusplus
>  extern "C" {
>  #endif
> @@ -49,16 +51,6 @@ struct dp_netdev_pmd_thread_ctx {
>      bool smc_enable_db;
>  };
> 
> -/* Forward declaration for typedef. */
> -struct dp_netdev_pmd_thread;
> -
> -/* Typedef for DPIF functions.
> - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> - */
> -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> -                                        struct dp_packet_batch *packets,
> -                                        odp_port_t port_no);
> -
>  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
>   * the performance overhead of interrupt processing.  Therefore netdev can
>   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 5ed61d08b..2dfe9003e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -481,8 +481,8 @@ static void dp_netdev_execute_actions(struct
> dp_netdev_pmd_thread *pmd,
>                                        const struct flow *flow,
>                                        const struct nlattr *actions,
>                                        size_t actions_len);
> -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> -                            struct dp_packet_batch *, odp_port_t port_no);
> +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> +                        struct dp_packet_batch *, odp_port_t port_no);
>  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
>                                    struct dp_packet_batch *);
> 
> @@ -993,6 +993,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn
> *conn, int argc,
>      ds_destroy(&reply);
>  }
> 
> +static void
> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> +                     const char *argv[], void *aux OVS_UNUSED)
> +{
> +    /* This function requires just one parameter, the DPIF name.
> +     * A second optional parameter can identify the datapath instance.
> +     */
> +    const char *dpif_name = argv[1];
> +
> +    static const char *error_description[2] = {
> +        "Unknown DPIF implementation",
> +        "CPU doesn't support the required instruction for",
> +    };
> +
> +    dp_netdev_input_func new_func;
> +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> +    if (err) {
> +        struct ds reply = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> +        const char *reply_str = ds_cstr(&reply);
> +        unixctl_command_reply(conn, reply_str);
> +        VLOG_INFO("%s", reply_str);
> +        ds_destroy(&reply);
> +        return;
> +    }
> +
> +    /* argv[2] is optional datapath instance. If no datapath name is provided
> +     * and only one datapath exists, the one existing datapath is reprobed.
> +     */
> +    ovs_mutex_lock(&dp_netdev_mutex);
> +    struct dp_netdev *dp = NULL;
> +
> +    if (argc == 3) {
> +        dp = shash_find_data(&dp_netdevs, argv[2]);
> +    } else if (shash_count(&dp_netdevs) == 1) {
> +        dp = shash_first(&dp_netdevs)->data;
> +    }
> +
> +    if (!dp) {
> +        ovs_mutex_unlock(&dp_netdev_mutex);
> +        unixctl_command_reply_error(conn,
> +                                    "please specify an existing datapath");
> +        return;
> +    }
> +
> +    /* Get PMD threads list. */
> +    size_t n;
> +    struct dp_netdev_pmd_thread **pmd_list;
> +    sorted_poll_thread_list(dp, &pmd_list, &n);
> +
> +    for (size_t i = 0; i < n; i++) {
> +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> +        if (pmd->core_id == NON_PMD_CORE_ID) {
> +            continue;
> +        }
> +
> +        /* Set PMD threads DPIF implementation to requested one. */
> +        pmd->netdev_input_func = *new_func;
> +    };
> +
> +    ovs_mutex_unlock(&dp_netdev_mutex);
> +
> +    /* Set the default implementation for PMD threads created in the future. */
> +    dp_netdev_impl_set_default(*new_func);
> +
> +    /* Reply with success to command. */
> +    struct ds reply = DS_EMPTY_INITIALIZER;
> +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> +    const char *reply_str = ds_cstr(&reply);
> +    unixctl_command_reply(conn, reply_str);
> +    VLOG_INFO("%s", reply_str);
> +    ds_destroy(&reply);
> +}
> +
>  static void
>  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
>                            const char *argv[], void *aux OVS_UNUSED)
> @@ -1215,6 +1290,10 @@ dpif_netdev_init(void)
>      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
>                               0, 0, dpif_netdev_subtable_lookup_get,
>                               NULL);
> +    unixctl_command_register("dpif-netdev/dpif-set",
> +                             "dpif_implementation_name [dp]",
> +                             1, 2, dpif_netdev_impl_set,
> +                             NULL);
>      return 0;
>  }
> 
> @@ -6038,8 +6117,8 @@ dp_netdev_configure_pmd(struct
> dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>      hmap_init(&pmd->send_port_cache);
>      cmap_init(&pmd->tx_bonds);
> 
> -    /* Initialize the DPIF function pointer to the default scalar version. */
> -    pmd->netdev_input_func = dp_netdev_input;
> +    /* Initialize DPIF function pointer to the default configured version. */
> +    pmd->netdev_input_func = dp_netdev_impl_get_default();
> 
>      /* init the 'flow_cache' since there is no
>       * actual thread created for NON_PMD_CORE_ID. */
> @@ -6973,7 +7052,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread
> *pmd,
>      }
>  }
> 
> -static int32_t
> +int32_t
>  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
>                  struct dp_packet_batch *packets,
>                  odp_port_t port_no)

Will continue with some further testing while comments are addressed.

Thanks
Ian

> --
> 2.31.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian June 10, 2021, 2:44 p.m. UTC | #2
Hi Ian,

Thanks for the review. My responses are inline.

> -----Original Message-----
> From: Stokes, Ian <ian.stokes@intel.com>
> Sent: Wednesday 2 June 2021 18:16
> To: Ferriter, Cian <cian.ferriter@intel.com>; ovs-dev@openvswitch.org; Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: i.maximets@ovn.org
> Subject: RE: [ovs-dev] [v12 06/16] dpif-netdev: Add command to switch dpif implementation.
> 
> > This commit adds a new command to allow the user to switch
> > the active DPIF implementation at runtime. A probe function
> > is executed before switching the DPIF implementation, to ensure
> > the CPU is capable of running the ISA required. For example, the
> > below code will switch to the AVX512 enabled DPIF assuming
> > that the runtime CPU is capable of running AVX512 instructions:
> >
> >  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> >
> > A new configuration flag is added to allow selection of the
> > default DPIF. This is useful for running the unit-tests against
> > the available DPIF implementations, without modifying each unit test.
> >
> > The design of the testing & validation for ISA optimized DPIF
> > implementations is based around the work already upstream for DPCLS.
> > Note however that a DPCLS lookup has no state or side-effects, allowing
> > the auto-validator implementation to perform multiple lookups and
> > provide consistent statistic counters.
> >
> > The DPIF component does have state, so running two implementations in
> > parallel and comparing output is not a valid testing method, as there
> > are changes in DPIF statistic counters (side effects). As a result, the
> > DPIF is tested directly against the unit-tests.
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
> 
> Thanks for the patch Harry/Cian, comments inline below.
> 
> >
> > ---
> >
> > v11:
> > - Improve the dp_netdev_impl_get_default() function so PMD threads created
> >   after running "dpif-set" command will use the DPIF implementation that was
> >   set.
> > ---
> >  acinclude.m4                     |  15 +++++
> >  configure.ac                     |   1 +
> >  lib/automake.mk                  |   1 +
> >  lib/dpif-netdev-avx512.c         |  14 +++++
> >  lib/dpif-netdev-private-dpif.c   | 103 +++++++++++++++++++++++++++++++
> >  lib/dpif-netdev-private-dpif.h   |  47 +++++++++++++-
> >  lib/dpif-netdev-private-thread.h |  12 +---
> >  lib/dpif-netdev.c                |  89 ++++++++++++++++++++++++--
> >  8 files changed, 266 insertions(+), 16 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-dpif.c
> >
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 15a54d636..5fbcd9872 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
> >    fi
> >  ])
> >
> > +dnl Set OVS DPIF default implementation at configure time for running the unit
> > +dnl tests on the whole codebase without modifying tests per DPIF impl
> 
> A minor nit I have here is the use of the word default. For me default will always be scalar dpif.
> Any thoughts on using a term such as MODE or IMP (implementation)?
> 

I see what your saying. The default for a user that doesn't change any DPIF settings is the scalar DPIF. We want to use the term default because using this compile time flag passed to the OVS "./configure" script changes the default DPIF implementation when OVS is launched.

This is different to changing the DPIF implementation on the fly while running OVS with the "ovs-appctl dpif-netdev/dpif-set" CMD. This is where we would us terms like MODE or IMP for implementation. 

By using the " --enable-dpif-default-avx512" flag, we change the default.

Hopefully that makes sense.

> > +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> > +  AC_ARG_ENABLE([dpif-default-avx512],
> > +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF
> > AVX512 implementation as default.])],
> > +                [dpifavx512=yes],[dpifavx512=no])
> 
> Are both the arguments above "--enable-dpif-default-avx512" and "dpifavx512=yes" required?
> 
> I would have assumed that "--enable-dpif-default-avx512" would have been enough at configuration?
> 

Only " --enable-dpif-default-avx512" is required. " dpifavx512=yes" isn't an argument. It's used internally just in the "acinclude.m4" file.

> > +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> > +  if test "$dpifavx512" != yes; then
> > +    AC_MSG_RESULT([no])
> > +  else
> > +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> > +    AC_MSG_RESULT([yes])
> > +  fi
> > +])
> > +
> >  dnl OVS_ENABLE_WERROR
> >  AC_DEFUN([OVS_ENABLE_WERROR],
> >    [AC_ARG_ENABLE(
> > diff --git a/configure.ac b/configure.ac
> > index c077034d4..e45685a6c 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
> >  OVS_ENABLE_SPARSE
> >  OVS_CTAGS_IDENTIFIERS
> >  OVS_CHECK_DPCLS_AUTOVALIDATOR
> > +OVS_CHECK_DPIF_AVX512_DEFAULT
> >  OVS_CHECK_BINUTILS_AVX512
> >
> >  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 5fab8ba4f..6279662f8 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
> >  lib/dpif-netdev.h \
> >  lib/dpif-netdev-private-dfc.h \
> >  lib/dpif-netdev-private-dpcls.h \
> > +lib/dpif-netdev-private-dpif.c \
> >  lib/dpif-netdev-private-dpif.h \
> >  lib/dpif-netdev-private-flow.h \
> >  lib/dpif-netdev-private-hwol.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > index 98638de15..e255c9d17 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -19,6 +19,7 @@
> >  #if !defined(__CHECKER__)
> >
> >  #include <config.h>
> > +#include <errno.h>
> >
> >  #include "dpif-netdev.h"
> >  #include "dpif-netdev-perf.h"
> > @@ -54,6 +55,19 @@ struct dpif_userdata {
> >          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
> >  };
> >
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void)
> > +{
> > +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> > +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> 
> I know in the past there has been concern with DPDK functions being called in areas outside of netdev-dpdk.
> 
> But as this functionality is specific to the AVX512 DPIF implementation which is only available via DPDK I think the calls above are ok.
> 

Agreed. I guess the list of acceptable files where DPDK function are called has been expanded, but this is a file used only with DPDK builds of OVS.

> > +
> > +    if (!avx512f_available || !bmi2_available) {
> > +        return -ENOTSUP;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> > new file mode 100644
> > index 000000000..e417fa86d
> > --- /dev/null
> > +++ b/lib/dpif-netdev-private-dpif.c
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Copyright (c) 2021 Intel Corporation.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include "dpif-netdev-private-dpif.h"
> > +#include "util.h"
> > +#include "openvswitch/vlog.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > +
> > +/* Actual list of implementations goes here. */
> > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > +    /* The default scalar C code implementation. */
> > +    { .func = dp_netdev_input,
> > +      .probe = NULL,
> > +      .name = "dpif_scalar", },
> > +
> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> > __SSE4_2__)
> > +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> > +    { .func = dp_netdev_input_outer_avx512,
> > +      .probe = dp_netdev_input_outer_avx512_probe,
> > +      .name = "dpif_avx512", },
> > +#endif
> > +};
> > +
> > +static dp_netdev_input_func default_dpif_func;
> > +
> > +dp_netdev_input_func
> > +dp_netdev_impl_get_default(void)
> > +{
> > +    /* For the first call, this will be NULL. Compute the compile time default.
> > +     */
> > +    if (!default_dpif_func) {
> > +        int dpif_idx = 0;
> > +
> > +/* Configure time overriding to run test suite on all implementations. */
> 
> To confirm is this code block specifically to speed up dpif selection if it has been set as avx512 as default during configuration of OVS?
> 

The function gets the default DPIF set for OVS. The very first time it's called, "default_dpif_func" won't be set, so we compute the compile time default, which is the scalar DPIF, unless the " --enable-dpif-default-avx512" ./configure option has been used.

One of the reasons to set AVX512 DPIF as the default during configuration is so unit tests can be run using the AVX512 DPIF without having to call "ovs-appctl dpif-netdev/dpif-set dpif_avx512" everytime.

> Can you expand on what is meant by time overriding?
> 

I think it would be clearer if we wrote "Configure-time overriding". I'll add that in the next version.

> > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> > __SSE4_2__)
> > +#ifdef DPIF_AVX512_DEFAULT
> > +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> > +        if (!dp_netdev_input_outer_avx512_probe()) {
> > +            dpif_idx = 1;
> > +        };
> > +#endif
> > +#endif
> > +
> > +        VLOG_INFO("Default DPIF implementation is %s.\n",
> > +                  dpif_impls[dpif_idx].name);
> > +        default_dpif_func = dpif_impls[dpif_idx].func;
> > +    }
> > +
> > +    return default_dpif_func;
> > +}
> > +
> > +void
> > +dp_netdev_impl_set_default(dp_netdev_input_func func)
> > +{
> > +    default_dpif_func = func;
> > +}
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> > *out_func)
> > +{
> > +    ovs_assert(name);
> > +    ovs_assert(out_func);
> > +
> > +    uint32_t i;
> > +
> > +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> > +        if (strcmp(dpif_impls[i].name, name) == 0) {
> > +            /* Probe function is optional - so check it is set before exec. */
> > +            if (dpif_impls[i].probe) {
> > +                int probe_ok = dpif_impls[i].probe();
> The name probe_ok and the logic threw me here at first.
> 
> So if the probe function works and the function can be used then probe_ok will be set to 0.
> 
> However if the probe fails and the function cannot be used, probe_ok would be set to - ENOTSUP.
> 
> But the if statement below reads as if the probe function has executed and everything is OK, i.e. nor issues or error getting the
> function.
> 
> Then set out_func to NULL which just seems wrong upon first reading. Would suggest change variable to probe_error instead?
> 

Very good point. This looks counterintuitive. I'll change the variable name to probe_err.

> > +                if (probe_ok) {
> > +                  *out_func = NULL;
> > +                   return probe_ok;
> > +                }
> > +            }
> > +            *out_func = dpif_impls[i].func;
> > +            return 0;
> > +        }
> > +    }
> > +
> > +    return -EINVAL;
> > +}
> > diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> > index 2fd7cc400..fb5380d2c 100644
> > --- a/lib/dpif-netdev-private-dpif.h
> > +++ b/lib/dpif-netdev-private-dpif.h
> > @@ -23,7 +23,52 @@
> >  struct dp_netdev_pmd_thread;
> >  struct dp_packet_batch;
> >
> > -/* Available implementations for dpif work. */
> > +/* Typedef for DPIF functions.
> > + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > + */
> > +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > +                                        struct dp_packet_batch *packets,
> > +                                        odp_port_t port_no);
> > +
> > +/* Probe a DPIF implementation. This allows the implementation to validate
> > CPU
> > + * ISA availability. Returns 0 if not available, returns 1 is valid to use.
> > + */
> 
> Not sure if the comment above is correct in terms of the implemented behavior for the probe and what it returns,  or at least it's the
> wrong way around.
> 
> Looking at the dp_netdev_input_outer_avx512_probe, it returns -ENOTSUP if there is a problem supporting the required flags (BMI
> AVX512 etc) and returns 0 if there is no issue.
> 
> Can you confirm?
> 

Good catch, the comment is wrong here. I'll update in the next version.

> > +typedef int32_t (*dp_netdev_input_func_probe)(void);
> > +
> > +/* Structure describing each available DPIF implmeentation. */
> 
> Typo above for implementation.
> 

I'll fix in the next version.

> > +struct dpif_netdev_impl_info_t {
> > +    /* Function pointer to execute to have this DPIF implementation run. */
> > +    dp_netdev_input_func func;
> > +    /* Function pointer to execute to check the CPU ISA is available to run.
> > +     * May be NULL, which implies that it is always valid to use.
> > +     */
> > +    dp_netdev_input_func_probe probe;
> > +    /* Name used to select this DPIF implementation. */
> > +    const char *name;
> > +};
> > +
> > +/* This function checks all available DPIF implementations, and selects the
> > + * returns the function pointer to the one requested by "name".
> > + */
> > +int32_t
> > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> > *out_func);
> > +
> > +/* Returns the default DPIF which is first ./configure selected, but can be
> > + * overridden at runtime. */
> > +dp_netdev_input_func dp_netdev_impl_get_default(void);
> > +
> > +/* Overrides the default DPIF with the user set DPIF. */
> > +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> > +
> > +/* Available implementations of DPIF below. */
> > +int32_t
> > +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> > +                struct dp_packet_batch *packets,
> > +                odp_port_t in_port);
> > +
> > +/* AVX512 enabled DPIF implementation and probe functions. */
> > +int32_t
> > +dp_netdev_input_outer_avx512_probe(void);
> 
> Can you add a new line just after the declaration above to split it from existing function below.
> 

I'll fix in the next version.

> >  int32_t
> >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> >                               struct dp_packet_batch *packets,
> > diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> > index c0c94c566..19ce99f4e 100644
> > --- a/lib/dpif-netdev-private-thread.h
> > +++ b/lib/dpif-netdev-private-thread.h
> > @@ -28,6 +28,8 @@
> >  #include "dpif-netdev-perf.h"
> >  #include "openvswitch/thread.h"
> >
> > +#include "dpif-netdev-private-dpif.h"
> Should this not be added underneath the existing dpif-netdev header files above?
> 

Yes, agreed. I'll fix this in the next version.

> > +
> >  #ifdef  __cplusplus
> >  extern "C" {
> >  #endif
> > @@ -49,16 +51,6 @@ struct dp_netdev_pmd_thread_ctx {
> >      bool smc_enable_db;
> >  };
> >
> > -/* Forward declaration for typedef. */
> > -struct dp_netdev_pmd_thread;
> > -
> > -/* Typedef for DPIF functions.
> > - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > - */
> > -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> > -                                        struct dp_packet_batch *packets,
> > -                                        odp_port_t port_no);
> > -
> >  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
> >   * the performance overhead of interrupt processing.  Therefore netdev can
> >   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 5ed61d08b..2dfe9003e 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -481,8 +481,8 @@ static void dp_netdev_execute_actions(struct
> > dp_netdev_pmd_thread *pmd,
> >                                        const struct flow *flow,
> >                                        const struct nlattr *actions,
> >                                        size_t actions_len);
> > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > -                            struct dp_packet_batch *, odp_port_t port_no);
> > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > +                        struct dp_packet_batch *, odp_port_t port_no);
> >  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> >                                    struct dp_packet_batch *);
> >
> > @@ -993,6 +993,81 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn
> > *conn, int argc,
> >      ds_destroy(&reply);
> >  }
> >
> > +static void
> > +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> > +                     const char *argv[], void *aux OVS_UNUSED)
> > +{
> > +    /* This function requires just one parameter, the DPIF name.
> > +     * A second optional parameter can identify the datapath instance.
> > +     */
> > +    const char *dpif_name = argv[1];
> > +
> > +    static const char *error_description[2] = {
> > +        "Unknown DPIF implementation",
> > +        "CPU doesn't support the required instruction for",
> > +    };
> > +
> > +    dp_netdev_input_func new_func;
> > +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> > +    if (err) {
> > +        struct ds reply = DS_EMPTY_INITIALIZER;
> > +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> > +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> > +        const char *reply_str = ds_cstr(&reply);
> > +        unixctl_command_reply(conn, reply_str);
> > +        VLOG_INFO("%s", reply_str);
> > +        ds_destroy(&reply);
> > +        return;
> > +    }
> > +
> > +    /* argv[2] is optional datapath instance. If no datapath name is provided
> > +     * and only one datapath exists, the one existing datapath is reprobed.
> > +     */
> > +    ovs_mutex_lock(&dp_netdev_mutex);
> > +    struct dp_netdev *dp = NULL;
> > +
> > +    if (argc == 3) {
> > +        dp = shash_find_data(&dp_netdevs, argv[2]);
> > +    } else if (shash_count(&dp_netdevs) == 1) {
> > +        dp = shash_first(&dp_netdevs)->data;
> > +    }
> > +
> > +    if (!dp) {
> > +        ovs_mutex_unlock(&dp_netdev_mutex);
> > +        unixctl_command_reply_error(conn,
> > +                                    "please specify an existing datapath");
> > +        return;
> > +    }
> > +
> > +    /* Get PMD threads list. */
> > +    size_t n;
> > +    struct dp_netdev_pmd_thread **pmd_list;
> > +    sorted_poll_thread_list(dp, &pmd_list, &n);
> > +
> > +    for (size_t i = 0; i < n; i++) {
> > +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> > +        if (pmd->core_id == NON_PMD_CORE_ID) {
> > +            continue;
> > +        }
> > +
> > +        /* Set PMD threads DPIF implementation to requested one. */
> > +        pmd->netdev_input_func = *new_func;
> > +    };
> > +
> > +    ovs_mutex_unlock(&dp_netdev_mutex);
> > +
> > +    /* Set the default implementation for PMD threads created in the future. */
> > +    dp_netdev_impl_set_default(*new_func);
> > +
> > +    /* Reply with success to command. */
> > +    struct ds reply = DS_EMPTY_INITIALIZER;
> > +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> > +    const char *reply_str = ds_cstr(&reply);
> > +    unixctl_command_reply(conn, reply_str);
> > +    VLOG_INFO("%s", reply_str);
> > +    ds_destroy(&reply);
> > +}
> > +
> >  static void
> >  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
> >                            const char *argv[], void *aux OVS_UNUSED)
> > @@ -1215,6 +1290,10 @@ dpif_netdev_init(void)
> >      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
> >                               0, 0, dpif_netdev_subtable_lookup_get,
> >                               NULL);
> > +    unixctl_command_register("dpif-netdev/dpif-set",
> > +                             "dpif_implementation_name [dp]",
> > +                             1, 2, dpif_netdev_impl_set,
> > +                             NULL);
> >      return 0;
> >  }
> >
> > @@ -6038,8 +6117,8 @@ dp_netdev_configure_pmd(struct
> > dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> >      hmap_init(&pmd->send_port_cache);
> >      cmap_init(&pmd->tx_bonds);
> >
> > -    /* Initialize the DPIF function pointer to the default scalar version. */
> > -    pmd->netdev_input_func = dp_netdev_input;
> > +    /* Initialize DPIF function pointer to the default configured version. */
> > +    pmd->netdev_input_func = dp_netdev_impl_get_default();
> >
> >      /* init the 'flow_cache' since there is no
> >       * actual thread created for NON_PMD_CORE_ID. */
> > @@ -6973,7 +7052,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread
> > *pmd,
> >      }
> >  }
> >
> > -static int32_t
> > +int32_t
> >  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> >                  struct dp_packet_batch *packets,
> >                  odp_port_t port_no)
> 
> Will continue with some further testing while comments are addressed.
> 
> Thanks
> Ian
> 
> > --
> > 2.31.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Stokes, Ian June 16, 2021, 11:21 a.m. UTC | #3
> Hi Ian,
> 
> Thanks for the review. My responses are inline.
> 
> > -----Original Message-----
> > From: Stokes, Ian <ian.stokes@intel.com>
> > Sent: Wednesday 2 June 2021 18:16
> > To: Ferriter, Cian <cian.ferriter@intel.com>; ovs-dev@openvswitch.org; Van
> Haaren, Harry <harry.van.haaren@intel.com>
> > Cc: i.maximets@ovn.org
> > Subject: RE: [ovs-dev] [v12 06/16] dpif-netdev: Add command to switch dpif
> implementation.
> >
> > > This commit adds a new command to allow the user to switch
> > > the active DPIF implementation at runtime. A probe function
> > > is executed before switching the DPIF implementation, to ensure
> > > the CPU is capable of running the ISA required. For example, the
> > > below code will switch to the AVX512 enabled DPIF assuming
> > > that the runtime CPU is capable of running AVX512 instructions:
> > >
> > >  $ ovs-appctl dpif-netdev/dpif-set dpif_avx512
> > >
> > > A new configuration flag is added to allow selection of the
> > > default DPIF. This is useful for running the unit-tests against
> > > the available DPIF implementations, without modifying each unit test.
> > >
> > > The design of the testing & validation for ISA optimized DPIF
> > > implementations is based around the work already upstream for DPCLS.
> > > Note however that a DPCLS lookup has no state or side-effects, allowing
> > > the auto-validator implementation to perform multiple lookups and
> > > provide consistent statistic counters.
> > >
> > > The DPIF component does have state, so running two implementations in
> > > parallel and comparing output is not a valid testing method, as there
> > > are changes in DPIF statistic counters (side effects). As a result, the
> > > DPIF is tested directly against the unit-tests.
> > >
> > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> > > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
> > > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
> >
> > Thanks for the patch Harry/Cian, comments inline below.
> >
> > >
> > > ---
> > >
> > > v11:
> > > - Improve the dp_netdev_impl_get_default() function so PMD threads
> created
> > >   after running "dpif-set" command will use the DPIF implementation that
> was
> > >   set.
> > > ---
> > >  acinclude.m4                     |  15 +++++
> > >  configure.ac                     |   1 +
> > >  lib/automake.mk                  |   1 +
> > >  lib/dpif-netdev-avx512.c         |  14 +++++
> > >  lib/dpif-netdev-private-dpif.c   | 103 +++++++++++++++++++++++++++++++
> > >  lib/dpif-netdev-private-dpif.h   |  47 +++++++++++++-
> > >  lib/dpif-netdev-private-thread.h |  12 +---
> > >  lib/dpif-netdev.c                |  89 ++++++++++++++++++++++++--
> > >  8 files changed, 266 insertions(+), 16 deletions(-)
> > >  create mode 100644 lib/dpif-netdev-private-dpif.c
> > >
> > > diff --git a/acinclude.m4 b/acinclude.m4
> > > index 15a54d636..5fbcd9872 100644
> > > --- a/acinclude.m4
> > > +++ b/acinclude.m4
> > > @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
> > >    fi
> > >  ])
> > >
> > > +dnl Set OVS DPIF default implementation at configure time for running the
> unit
> > > +dnl tests on the whole codebase without modifying tests per DPIF impl
> >
> > A minor nit I have here is the use of the word default. For me default will
> always be scalar dpif.
> > Any thoughts on using a term such as MODE or IMP (implementation)?
> >
> 
> I see what your saying. The default for a user that doesn't change any DPIF
> settings is the scalar DPIF. We want to use the term default because using this
> compile time flag passed to the OVS "./configure" script changes the default
> DPIF implementation when OVS is launched.
> 
> This is different to changing the DPIF implementation on the fly while running
> OVS with the "ovs-appctl dpif-netdev/dpif-set" CMD. This is where we would us
> terms like MODE or IMP for implementation.
> 
> By using the " --enable-dpif-default-avx512" flag, we change the default.
> 
> Hopefully that makes sense.

Yes it does, I can see your point here. This is OK with me.

> 
> > > +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> > > +  AC_ARG_ENABLE([dpif-default-avx512],
> > > +                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF
> > > AVX512 implementation as default.])],
> > > +                [dpifavx512=yes],[dpifavx512=no])
> >
> > Are both the arguments above "--enable-dpif-default-avx512" and
> "dpifavx512=yes" required?
> >
> > I would have assumed that "--enable-dpif-default-avx512" would have been
> enough at configuration?
> >
> 
> Only " --enable-dpif-default-avx512" is required. " dpifavx512=yes" isn't an
> argument. It's used internally just in the "acinclude.m4" file.

Understood.

> 
> > > +  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> > > +  if test "$dpifavx512" != yes; then
> > > +    AC_MSG_RESULT([no])
> > > +  else
> > > +    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> > > +    AC_MSG_RESULT([yes])
> > > +  fi
> > > +])
> > > +
> > >  dnl OVS_ENABLE_WERROR
> > >  AC_DEFUN([OVS_ENABLE_WERROR],
> > >    [AC_ARG_ENABLE(
> > > diff --git a/configure.ac b/configure.ac
> > > index c077034d4..e45685a6c 100644
> > > --- a/configure.ac
> > > +++ b/configure.ac
> > > @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
> > >  OVS_ENABLE_SPARSE
> > >  OVS_CTAGS_IDENTIFIERS
> > >  OVS_CHECK_DPCLS_AUTOVALIDATOR
> > > +OVS_CHECK_DPIF_AVX512_DEFAULT
> > >  OVS_CHECK_BINUTILS_AVX512
> > >
> > >  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> > > diff --git a/lib/automake.mk b/lib/automake.mk
> > > index 5fab8ba4f..6279662f8 100644
> > > --- a/lib/automake.mk
> > > +++ b/lib/automake.mk
> > > @@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
> > >  lib/dpif-netdev.h \
> > >  lib/dpif-netdev-private-dfc.h \
> > >  lib/dpif-netdev-private-dpcls.h \
> > > +lib/dpif-netdev-private-dpif.c \
> > >  lib/dpif-netdev-private-dpif.h \
> > >  lib/dpif-netdev-private-flow.h \
> > >  lib/dpif-netdev-private-hwol.h \
> > > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > > index 98638de15..e255c9d17 100644
> > > --- a/lib/dpif-netdev-avx512.c
> > > +++ b/lib/dpif-netdev-avx512.c
> > > @@ -19,6 +19,7 @@
> > >  #if !defined(__CHECKER__)
> > >
> > >  #include <config.h>
> > > +#include <errno.h>
> > >
> > >  #include "dpif-netdev.h"
> > >  #include "dpif-netdev-perf.h"
> > > @@ -54,6 +55,19 @@ struct dpif_userdata {
> > >          struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
> > >  };
> > >
> > > +int32_t
> > > +dp_netdev_input_outer_avx512_probe(void)
> > > +{
> > > +    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> > > +    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> >
> > I know in the past there has been concern with DPDK functions being called in
> areas outside of netdev-dpdk.
> >
> > But as this functionality is specific to the AVX512 DPIF implementation which is
> only available via DPDK I think the calls above are ok.
> >
> 
> Agreed. I guess the list of acceptable files where DPDK function are called has
> been expanded, but this is a file used only with DPDK builds of OVS.
> 

Thanks.

> > > +
> > > +    if (!avx512f_available || !bmi2_available) {
> > > +        return -ENOTSUP;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >  int32_t
> > >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> > >                               struct dp_packet_batch *packets,
> > > diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> > > new file mode 100644
> > > index 000000000..e417fa86d
> > > --- /dev/null
> > > +++ b/lib/dpif-netdev-private-dpif.c
> > > @@ -0,0 +1,103 @@
> > > +/*
> > > + * Copyright (c) 2021 Intel Corporation.
> > > + *
> > > + * Licensed under the Apache License, Version 2.0 (the "License");
> > > + * you may not use this file except in compliance with the License.
> > > + * You may obtain a copy of the License at:
> > > + *
> > > + *     http://www.apache.org/licenses/LICENSE-2.0
> > > + *
> > > + * Unless required by applicable law or agreed to in writing, software
> > > + * distributed under the License is distributed on an "AS IS" BASIS,
> > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > > implied.
> > > + * See the License for the specific language governing permissions and
> > > + * limitations under the License.
> > > + */
> > > +
> > > +#include <config.h>
> > > +#include <errno.h>
> > > +#include <string.h>
> > > +
> > > +#include "dpif-netdev-private-dpif.h"
> > > +#include "util.h"
> > > +#include "openvswitch/vlog.h"
> > > +
> > > +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> > > +
> > > +/* Actual list of implementations goes here. */
> > > +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> > > +    /* The default scalar C code implementation. */
> > > +    { .func = dp_netdev_input,
> > > +      .probe = NULL,
> > > +      .name = "dpif_scalar", },
> > > +
> > > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> > > __SSE4_2__)
> > > +    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> > > +    { .func = dp_netdev_input_outer_avx512,
> > > +      .probe = dp_netdev_input_outer_avx512_probe,
> > > +      .name = "dpif_avx512", },
> > > +#endif
> > > +};
> > > +
> > > +static dp_netdev_input_func default_dpif_func;
> > > +
> > > +dp_netdev_input_func
> > > +dp_netdev_impl_get_default(void)
> > > +{
> > > +    /* For the first call, this will be NULL. Compute the compile time default.
> > > +     */
> > > +    if (!default_dpif_func) {
> > > +        int dpif_idx = 0;
> > > +
> > > +/* Configure time overriding to run test suite on all implementations. */
> >
> > To confirm is this code block specifically to speed up dpif selection if it has
> been set as avx512 as default during configuration of OVS?
> >
> 
> The function gets the default DPIF set for OVS. The very first time it's called,
> "default_dpif_func" won't be set, so we compute the compile time default,
> which is the scalar DPIF, unless the " --enable-dpif-default-avx512" ./configure
> option has been used.
>

Understood.
 
> One of the reasons to set AVX512 DPIF as the default during configuration is so
> unit tests can be run using the AVX512 DPIF without having to call "ovs-appctl
> dpif-netdev/dpif-set dpif_avx512" everytime.
> 
> > Can you expand on what is meant by time overriding?
> >
> 
> I think it would be clearer if we wrote "Configure-time overriding". I'll add that in
> the next version.
> 

Thanks

> > > +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD &&
> > > __SSE4_2__)
> > > +#ifdef DPIF_AVX512_DEFAULT
> > > +        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
> > > +        if (!dp_netdev_input_outer_avx512_probe()) {
> > > +            dpif_idx = 1;
> > > +        };
> > > +#endif
> > > +#endif
> > > +
> > > +        VLOG_INFO("Default DPIF implementation is %s.\n",
> > > +                  dpif_impls[dpif_idx].name);
> > > +        default_dpif_func = dpif_impls[dpif_idx].func;
> > > +    }
> > > +
> > > +    return default_dpif_func;
> > > +}
> > > +
> > > +void
> > > +dp_netdev_impl_set_default(dp_netdev_input_func func)
> > > +{
> > > +    default_dpif_func = func;
> > > +}
> > > +
> > > +/* This function checks all available DPIF implementations, and selects the
> > > + * returns the function pointer to the one requested by "name".
> > > + */
> > > +int32_t
> > > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> > > *out_func)
> > > +{
> > > +    ovs_assert(name);
> > > +    ovs_assert(out_func);
> > > +
> > > +    uint32_t i;
> > > +
> > > +    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> > > +        if (strcmp(dpif_impls[i].name, name) == 0) {
> > > +            /* Probe function is optional - so check it is set before exec. */
> > > +            if (dpif_impls[i].probe) {
> > > +                int probe_ok = dpif_impls[i].probe();
> > The name probe_ok and the logic threw me here at first.
> >
> > So if the probe function works and the function can be used then probe_ok will
> be set to 0.
> >
> > However if the probe fails and the function cannot be used, probe_ok would
> be set to - ENOTSUP.
> >
> > But the if statement below reads as if the probe function has executed and
> everything is OK, i.e. nor issues or error getting the
> > function.
> >
> > Then set out_func to NULL which just seems wrong upon first reading. Would
> suggest change variable to probe_error instead?
> >
> 
> Very good point. This looks counterintuitive. I'll change the variable name to
> probe_err.

That would be great.

> 
> > > +                if (probe_ok) {
> > > +                  *out_func = NULL;
> > > +                   return probe_ok;
> > > +                }
> > > +            }
> > > +            *out_func = dpif_impls[i].func;
> > > +            return 0;
> > > +        }
> > > +    }
> > > +
> > > +    return -EINVAL;
> > > +}
> > > diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> > > index 2fd7cc400..fb5380d2c 100644
> > > --- a/lib/dpif-netdev-private-dpif.h
> > > +++ b/lib/dpif-netdev-private-dpif.h
> > > @@ -23,7 +23,52 @@
> > >  struct dp_netdev_pmd_thread;
> > >  struct dp_packet_batch;
> > >
> > > -/* Available implementations for dpif work. */
> > > +/* Typedef for DPIF functions.
> > > + * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > > + */
> > > +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread
> *pmd,
> > > +                                        struct dp_packet_batch *packets,
> > > +                                        odp_port_t port_no);
> > > +
> > > +/* Probe a DPIF implementation. This allows the implementation to validate
> > > CPU
> > > + * ISA availability. Returns 0 if not available, returns 1 is valid to use.
> > > + */
> >
> > Not sure if the comment above is correct in terms of the implemented
> behavior for the probe and what it returns,  or at least it's the
> > wrong way around.
> >
> > Looking at the dp_netdev_input_outer_avx512_probe, it returns -ENOTSUP if
> there is a problem supporting the required flags (BMI
> > AVX512 etc) and returns 0 if there is no issue.
> >
> > Can you confirm?
> >
> 
> Good catch, the comment is wrong here. I'll update in the next version.
> 
> > > +typedef int32_t (*dp_netdev_input_func_probe)(void);
> > > +
> > > +/* Structure describing each available DPIF implmeentation. */
> >
> > Typo above for implementation.
> >
> 
> I'll fix in the next version.
> 
> > > +struct dpif_netdev_impl_info_t {
> > > +    /* Function pointer to execute to have this DPIF implementation run. */
> > > +    dp_netdev_input_func func;
> > > +    /* Function pointer to execute to check the CPU ISA is available to run.
> > > +     * May be NULL, which implies that it is always valid to use.
> > > +     */
> > > +    dp_netdev_input_func_probe probe;
> > > +    /* Name used to select this DPIF implementation. */
> > > +    const char *name;
> > > +};
> > > +
> > > +/* This function checks all available DPIF implementations, and selects the
> > > + * returns the function pointer to the one requested by "name".
> > > + */
> > > +int32_t
> > > +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func
> > > *out_func);
> > > +
> > > +/* Returns the default DPIF which is first ./configure selected, but can be
> > > + * overridden at runtime. */
> > > +dp_netdev_input_func dp_netdev_impl_get_default(void);
> > > +
> > > +/* Overrides the default DPIF with the user set DPIF. */
> > > +void dp_netdev_impl_set_default(dp_netdev_input_func func);
> > > +
> > > +/* Available implementations of DPIF below. */
> > > +int32_t
> > > +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> > > +                struct dp_packet_batch *packets,
> > > +                odp_port_t in_port);
> > > +
> > > +/* AVX512 enabled DPIF implementation and probe functions. */
> > > +int32_t
> > > +dp_netdev_input_outer_avx512_probe(void);
> >
> > Can you add a new line just after the declaration above to split it from existing
> function below.
> >
> 
> I'll fix in the next version.
> 
> > >  int32_t
> > >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> > >                               struct dp_packet_batch *packets,
> > > diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-
> thread.h
> > > index c0c94c566..19ce99f4e 100644
> > > --- a/lib/dpif-netdev-private-thread.h
> > > +++ b/lib/dpif-netdev-private-thread.h
> > > @@ -28,6 +28,8 @@
> > >  #include "dpif-netdev-perf.h"
> > >  #include "openvswitch/thread.h"
> > >
> > > +#include "dpif-netdev-private-dpif.h"
> > Should this not be added underneath the existing dpif-netdev header files
> above?
> >
> 
> Yes, agreed. I'll fix this in the next version.
> 
> > > +
> > >  #ifdef  __cplusplus
> > >  extern "C" {
> > >  #endif
> > > @@ -49,16 +51,6 @@ struct dp_netdev_pmd_thread_ctx {
> > >      bool smc_enable_db;
> > >  };
> > >
> > > -/* Forward declaration for typedef. */
> > > -struct dp_netdev_pmd_thread;
> > > -
> > > -/* Typedef for DPIF functions.
> > > - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> > > - */
> > > -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread
> *pmd,
> > > -                                        struct dp_packet_batch *packets,
> > > -                                        odp_port_t port_no);
> > > -
> > >  /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
> > >   * the performance overhead of interrupt processing.  Therefore netdev can
> > >   * not implement rx-wait for these devices.  dpif-netdev needs to poll
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 5ed61d08b..2dfe9003e 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -481,8 +481,8 @@ static void dp_netdev_execute_actions(struct
> > > dp_netdev_pmd_thread *pmd,
> > >                                        const struct flow *flow,
> > >                                        const struct nlattr *actions,
> > >                                        size_t actions_len);
> > > -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > -                            struct dp_packet_batch *, odp_port_t port_no);
> > > +int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> > > +                        struct dp_packet_batch *, odp_port_t port_no);
> > >  static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> > >                                    struct dp_packet_batch *);
> > >
> > > @@ -993,6 +993,81 @@ dpif_netdev_subtable_lookup_set(struct
> unixctl_conn
> > > *conn, int argc,
> > >      ds_destroy(&reply);
> > >  }
> > >
> > > +static void
> > > +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
> > > +                     const char *argv[], void *aux OVS_UNUSED)
> > > +{
> > > +    /* This function requires just one parameter, the DPIF name.
> > > +     * A second optional parameter can identify the datapath instance.
> > > +     */
> > > +    const char *dpif_name = argv[1];
> > > +
> > > +    static const char *error_description[2] = {
> > > +        "Unknown DPIF implementation",
> > > +        "CPU doesn't support the required instruction for",
> > > +    };
> > > +
> > > +    dp_netdev_input_func new_func;
> > > +    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
> > > +    if (err) {
> > > +        struct ds reply = DS_EMPTY_INITIALIZER;
> > > +        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> > > +                      error_description[ (err == -ENOTSUP) ], dpif_name);
> > > +        const char *reply_str = ds_cstr(&reply);
> > > +        unixctl_command_reply(conn, reply_str);
> > > +        VLOG_INFO("%s", reply_str);
> > > +        ds_destroy(&reply);
> > > +        return;
> > > +    }
> > > +
> > > +    /* argv[2] is optional datapath instance. If no datapath name is provided
> > > +     * and only one datapath exists, the one existing datapath is reprobed.
> > > +     */
> > > +    ovs_mutex_lock(&dp_netdev_mutex);
> > > +    struct dp_netdev *dp = NULL;
> > > +
> > > +    if (argc == 3) {
> > > +        dp = shash_find_data(&dp_netdevs, argv[2]);
> > > +    } else if (shash_count(&dp_netdevs) == 1) {
> > > +        dp = shash_first(&dp_netdevs)->data;
> > > +    }
> > > +
> > > +    if (!dp) {
> > > +        ovs_mutex_unlock(&dp_netdev_mutex);
> > > +        unixctl_command_reply_error(conn,
> > > +                                    "please specify an existing datapath");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Get PMD threads list. */
> > > +    size_t n;
> > > +    struct dp_netdev_pmd_thread **pmd_list;
> > > +    sorted_poll_thread_list(dp, &pmd_list, &n);
> > > +
> > > +    for (size_t i = 0; i < n; i++) {
> > > +        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> > > +        if (pmd->core_id == NON_PMD_CORE_ID) {
> > > +            continue;
> > > +        }
> > > +
> > > +        /* Set PMD threads DPIF implementation to requested one. */
> > > +        pmd->netdev_input_func = *new_func;
> > > +    };
> > > +
> > > +    ovs_mutex_unlock(&dp_netdev_mutex);
> > > +
> > > +    /* Set the default implementation for PMD threads created in the future.
> */
> > > +    dp_netdev_impl_set_default(*new_func);
> > > +
> > > +    /* Reply with success to command. */
> > > +    struct ds reply = DS_EMPTY_INITIALIZER;
> > > +    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> > > +    const char *reply_str = ds_cstr(&reply);
> > > +    unixctl_command_reply(conn, reply_str);
> > > +    VLOG_INFO("%s", reply_str);
> > > +    ds_destroy(&reply);
> > > +}
> > > +
> > >  static void
> > >  dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
> > >                            const char *argv[], void *aux OVS_UNUSED)
> > > @@ -1215,6 +1290,10 @@ dpif_netdev_init(void)
> > >      unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
> > >                               0, 0, dpif_netdev_subtable_lookup_get,
> > >                               NULL);
> > > +    unixctl_command_register("dpif-netdev/dpif-set",
> > > +                             "dpif_implementation_name [dp]",
> > > +                             1, 2, dpif_netdev_impl_set,
> > > +                             NULL);
> > >      return 0;
> > >  }
> > >
> > > @@ -6038,8 +6117,8 @@ dp_netdev_configure_pmd(struct
> > > dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> > >      hmap_init(&pmd->send_port_cache);
> > >      cmap_init(&pmd->tx_bonds);
> > >
> > > -    /* Initialize the DPIF function pointer to the default scalar version. */
> > > -    pmd->netdev_input_func = dp_netdev_input;
> > > +    /* Initialize DPIF function pointer to the default configured version. */
> > > +    pmd->netdev_input_func = dp_netdev_impl_get_default();
> > >
> > >      /* init the 'flow_cache' since there is no
> > >       * actual thread created for NON_PMD_CORE_ID. */
> > > @@ -6973,7 +7052,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread
> > > *pmd,
> > >      }
> > >  }
> > >
> > > -static int32_t
> > > +int32_t
> > >  dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> > >                  struct dp_packet_batch *packets,
> > >                  odp_port_t port_no)
> >
> > Will continue with some further testing while comments are addressed.
> >
> > Thanks
> > Ian

Thanks for the feedback above and the changes, looking forward to the next revision.

BR
Ian
> >
> > > --
> > > 2.31.1
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/acinclude.m4 b/acinclude.m4
index 15a54d636..5fbcd9872 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -30,6 +30,21 @@  AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
   fi
 ])
 
+dnl Set OVS DPIF default implementation at configure time for running the unit
+dnl tests on the whole codebase without modifying tests per DPIF impl
+AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
+  AC_ARG_ENABLE([dpif-default-avx512],
+                [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
+                [dpifavx512=yes],[dpifavx512=no])
+  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
+  if test "$dpifavx512" != yes; then
+    AC_MSG_RESULT([no])
+  else
+    OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
+    AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl OVS_ENABLE_WERROR
 AC_DEFUN([OVS_ENABLE_WERROR],
   [AC_ARG_ENABLE(
diff --git a/configure.ac b/configure.ac
index c077034d4..e45685a6c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -185,6 +185,7 @@  OVS_ENABLE_WERROR
 OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
+OVS_CHECK_DPIF_AVX512_DEFAULT
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/automake.mk b/lib/automake.mk
index 5fab8ba4f..6279662f8 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -115,6 +115,7 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/dpif-netdev.h \
 	lib/dpif-netdev-private-dfc.h \
 	lib/dpif-netdev-private-dpcls.h \
+	lib/dpif-netdev-private-dpif.c \
 	lib/dpif-netdev-private-dpif.h \
 	lib/dpif-netdev-private-flow.h \
 	lib/dpif-netdev-private-hwol.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 98638de15..e255c9d17 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -19,6 +19,7 @@ 
 #if !defined(__CHECKER__)
 
 #include <config.h>
+#include <errno.h>
 
 #include "dpif-netdev.h"
 #include "dpif-netdev-perf.h"
@@ -54,6 +55,19 @@  struct dpif_userdata {
         struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
 };
 
+int32_t
+dp_netdev_input_outer_avx512_probe(void)
+{
+    int avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
+    int bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
+
+    if (!avx512f_available || !bmi2_available) {
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
 int32_t
 dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
                              struct dp_packet_batch *packets,
diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
new file mode 100644
index 000000000..e417fa86d
--- /dev/null
+++ b/lib/dpif-netdev-private-dpif.c
@@ -0,0 +1,103 @@ 
+/*
+ * Copyright (c) 2021 Intel Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <errno.h>
+#include <string.h>
+
+#include "dpif-netdev-private-dpif.h"
+#include "util.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
+
+/* Actual list of implementations goes here. */
+static struct dpif_netdev_impl_info_t dpif_impls[] = {
+    /* The default scalar C code implementation. */
+    { .func = dp_netdev_input,
+      .probe = NULL,
+      .name = "dpif_scalar", },
+
+#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
+    /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
+    { .func = dp_netdev_input_outer_avx512,
+      .probe = dp_netdev_input_outer_avx512_probe,
+      .name = "dpif_avx512", },
+#endif
+};
+
+static dp_netdev_input_func default_dpif_func;
+
+dp_netdev_input_func
+dp_netdev_impl_get_default(void)
+{
+    /* For the first call, this will be NULL. Compute the compile time default.
+     */
+    if (!default_dpif_func) {
+        int dpif_idx = 0;
+
+/* Configure time overriding to run test suite on all implementations. */
+#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
+#ifdef DPIF_AVX512_DEFAULT
+        ovs_assert(dpif_impls[1].func == dp_netdev_input_outer_avx512);
+        if (!dp_netdev_input_outer_avx512_probe()) {
+            dpif_idx = 1;
+        };
+#endif
+#endif
+
+        VLOG_INFO("Default DPIF implementation is %s.\n",
+                  dpif_impls[dpif_idx].name);
+        default_dpif_func = dpif_impls[dpif_idx].func;
+    }
+
+    return default_dpif_func;
+}
+
+void
+dp_netdev_impl_set_default(dp_netdev_input_func func)
+{
+    default_dpif_func = func;
+}
+
+/* This function checks all available DPIF implementations, and selects the
+ * returns the function pointer to the one requested by "name".
+ */
+int32_t
+dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
+{
+    ovs_assert(name);
+    ovs_assert(out_func);
+
+    uint32_t i;
+
+    for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
+        if (strcmp(dpif_impls[i].name, name) == 0) {
+            /* Probe function is optional - so check it is set before exec. */
+            if (dpif_impls[i].probe) {
+                int probe_ok = dpif_impls[i].probe();
+                if (probe_ok) {
+                  *out_func = NULL;
+                   return probe_ok;
+                }
+            }
+            *out_func = dpif_impls[i].func;
+            return 0;
+        }
+    }
+
+    return -EINVAL;
+}
diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
index 2fd7cc400..fb5380d2c 100644
--- a/lib/dpif-netdev-private-dpif.h
+++ b/lib/dpif-netdev-private-dpif.h
@@ -23,7 +23,52 @@ 
 struct dp_netdev_pmd_thread;
 struct dp_packet_batch;
 
-/* Available implementations for dpif work. */
+/* Typedef for DPIF functions.
+ * Returns a bitmask of packets to handle, possibly including upcall/misses.
+ */
+typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
+                                        struct dp_packet_batch *packets,
+                                        odp_port_t port_no);
+
+/* Probe a DPIF implementation. This allows the implementation to validate CPU
+ * ISA availability. Returns 0 if not available, returns 1 is valid to use.
+ */
+typedef int32_t (*dp_netdev_input_func_probe)(void);
+
+/* Structure describing each available DPIF implmeentation. */
+struct dpif_netdev_impl_info_t {
+    /* Function pointer to execute to have this DPIF implementation run. */
+    dp_netdev_input_func func;
+    /* Function pointer to execute to check the CPU ISA is available to run.
+     * May be NULL, which implies that it is always valid to use.
+     */
+    dp_netdev_input_func_probe probe;
+    /* Name used to select this DPIF implementation. */
+    const char *name;
+};
+
+/* This function checks all available DPIF implementations, and selects the
+ * returns the function pointer to the one requested by "name".
+ */
+int32_t
+dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
+
+/* Returns the default DPIF which is first ./configure selected, but can be
+ * overridden at runtime. */
+dp_netdev_input_func dp_netdev_impl_get_default(void);
+
+/* Overrides the default DPIF with the user set DPIF. */
+void dp_netdev_impl_set_default(dp_netdev_input_func func);
+
+/* Available implementations of DPIF below. */
+int32_t
+dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
+                struct dp_packet_batch *packets,
+                odp_port_t in_port);
+
+/* AVX512 enabled DPIF implementation and probe functions. */
+int32_t
+dp_netdev_input_outer_avx512_probe(void);
 int32_t
 dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
                              struct dp_packet_batch *packets,
diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index c0c94c566..19ce99f4e 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -28,6 +28,8 @@ 
 #include "dpif-netdev-perf.h"
 #include "openvswitch/thread.h"
 
+#include "dpif-netdev-private-dpif.h"
+
 #ifdef  __cplusplus
 extern "C" {
 #endif
@@ -49,16 +51,6 @@  struct dp_netdev_pmd_thread_ctx {
     bool smc_enable_db;
 };
 
-/* Forward declaration for typedef. */
-struct dp_netdev_pmd_thread;
-
-/* Typedef for DPIF functions.
- * Returns a bitmask of packets to handle, possibly including upcall/misses.
- */
-typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
-                                        struct dp_packet_batch *packets,
-                                        odp_port_t port_no);
-
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5ed61d08b..2dfe9003e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -481,8 +481,8 @@  static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
                                       const struct flow *flow,
                                       const struct nlattr *actions,
                                       size_t actions_len);
-static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
-                            struct dp_packet_batch *, odp_port_t port_no);
+int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
+                        struct dp_packet_batch *, odp_port_t port_no);
 static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
                                   struct dp_packet_batch *);
 
@@ -993,6 +993,81 @@  dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
     ds_destroy(&reply);
 }
 
+static void
+dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
+                     const char *argv[], void *aux OVS_UNUSED)
+{
+    /* This function requires just one parameter, the DPIF name.
+     * A second optional parameter can identify the datapath instance.
+     */
+    const char *dpif_name = argv[1];
+
+    static const char *error_description[2] = {
+        "Unknown DPIF implementation",
+        "CPU doesn't support the required instruction for",
+    };
+
+    dp_netdev_input_func new_func;
+    int32_t err = dp_netdev_impl_get_by_name(dpif_name, &new_func);
+    if (err) {
+        struct ds reply = DS_EMPTY_INITIALIZER;
+        ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
+                      error_description[ (err == -ENOTSUP) ], dpif_name);
+        const char *reply_str = ds_cstr(&reply);
+        unixctl_command_reply(conn, reply_str);
+        VLOG_INFO("%s", reply_str);
+        ds_destroy(&reply);
+        return;
+    }
+
+    /* argv[2] is optional datapath instance. If no datapath name is provided
+     * and only one datapath exists, the one existing datapath is reprobed.
+     */
+    ovs_mutex_lock(&dp_netdev_mutex);
+    struct dp_netdev *dp = NULL;
+
+    if (argc == 3) {
+        dp = shash_find_data(&dp_netdevs, argv[2]);
+    } else if (shash_count(&dp_netdevs) == 1) {
+        dp = shash_first(&dp_netdevs)->data;
+    }
+
+    if (!dp) {
+        ovs_mutex_unlock(&dp_netdev_mutex);
+        unixctl_command_reply_error(conn,
+                                    "please specify an existing datapath");
+        return;
+    }
+
+    /* Get PMD threads list. */
+    size_t n;
+    struct dp_netdev_pmd_thread **pmd_list;
+    sorted_poll_thread_list(dp, &pmd_list, &n);
+
+    for (size_t i = 0; i < n; i++) {
+        struct dp_netdev_pmd_thread *pmd = pmd_list[i];
+        if (pmd->core_id == NON_PMD_CORE_ID) {
+            continue;
+        }
+
+        /* Set PMD threads DPIF implementation to requested one. */
+        pmd->netdev_input_func = *new_func;
+    };
+
+    ovs_mutex_unlock(&dp_netdev_mutex);
+
+    /* Set the default implementation for PMD threads created in the future. */
+    dp_netdev_impl_set_default(*new_func);
+
+    /* Reply with success to command. */
+    struct ds reply = DS_EMPTY_INITIALIZER;
+    ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
+    const char *reply_str = ds_cstr(&reply);
+    unixctl_command_reply(conn, reply_str);
+    VLOG_INFO("%s", reply_str);
+    ds_destroy(&reply);
+}
+
 static void
 dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
                           const char *argv[], void *aux OVS_UNUSED)
@@ -1215,6 +1290,10 @@  dpif_netdev_init(void)
     unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
                              0, 0, dpif_netdev_subtable_lookup_get,
                              NULL);
+    unixctl_command_register("dpif-netdev/dpif-set",
+                             "dpif_implementation_name [dp]",
+                             1, 2, dpif_netdev_impl_set,
+                             NULL);
     return 0;
 }
 
@@ -6038,8 +6117,8 @@  dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
     hmap_init(&pmd->send_port_cache);
     cmap_init(&pmd->tx_bonds);
 
-    /* Initialize the DPIF function pointer to the default scalar version. */
-    pmd->netdev_input_func = dp_netdev_input;
+    /* Initialize DPIF function pointer to the default configured version. */
+    pmd->netdev_input_func = dp_netdev_impl_get_default();
 
     /* init the 'flow_cache' since there is no
      * actual thread created for NON_PMD_CORE_ID. */
@@ -6973,7 +7052,7 @@  dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
     }
 }
 
-static int32_t
+int32_t
 dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
                 struct dp_packet_batch *packets,
                 odp_port_t port_no)