[ovs-dev,dpdk-hwol,v6,1/2] netdev-dpdk: Upgrade to dpdk v18.08

Message ID 1540484676-8288-1-git-send-email-ophirmu@mellanox.com
State New
Delegated to: Ian Stokes
Headers show
Series
  • [ovs-dev,dpdk-hwol,v6,1/2] netdev-dpdk: Upgrade to dpdk v18.08
Related show

Commit Message

Ophir Munk Oct. 25, 2018, 4:24 p.m.
1. Enable compilation and linkage with dpdk 18.08.0
The following dpdk commits which were introduced after dpdk 17.11.x
require OVS updates to accommodate to the dpdk changes.
- ce17edde ("ethdev: introduce Rx queue offloads API")
- ab3ce1e0 ("ethdev: remove old offload API")
- c06ddf96 ("meter: add configuration profile")
- e58638c3 ("ethdev: fix TPID handling in flow API")
- cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
- ac8d22de ("ethdev: flatten RSS configuration in flow API")

2. Limit configured rss hash functions to only those supported
by the eth device.

3. Set default RSS key in struct action_rss_data, required by OVS commit
- e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow")
when configured with "other_config:hw-offload=true"
Remark: calling RSS with 0 length (default) key is rejected
in DPDK 18.08 and will be enabled in DPDK 18.11. It has no effect
when running in a "hw-offload=false" configuration.

4. Update references to DPDK version 18.08 in Documentation and in
travis linux-build script

5. There are currently warnings on DPDK deprecated functions calls:
- rte_eth_dev_attach
- rte_eth_dev_detach
- rte_eth_devargs_parse
The deprecated functions calls replacements will be added to
DPDK 18.11.

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
v1:
First version

v2:
Avoid seg faults cases as described in 
https://patchwork.ozlabs.org/patch/965451/
by using the patch in:
https://github.com/kevintraynor/ovs-dpdk-
master/commit/88f46cc5ab338eb4f3ca5db1eacd0effefe4fa0c

v3:
- Rebase on latest dpdk-hwol branch
- Updates based on latest reviews to versions v1 & v2

v4:
This patch got lost in mailing list server due to administrative issues and
is now obsolete

v5:
- updated commit message
- Address all reviews (some skipped by mistake) from recent versions
- it is suggested to ignore deprecated functions warnings as the functions 
replacements are missing in DPDK 18.08 and will be added to DPDK 18.11 

v6:
- Rebase on latest dpdk-hwol branch
- Updates based on patch reviews
https://patchwork.ozlabs.org/project/openvswitch/list/?series=70067

 .travis/linux-build.sh                   |   2 +-
 Documentation/intro/install/dpdk.rst     |  14 +--
 Documentation/topics/dpdk/vhost-user.rst |   6 +-
 lib/netdev-dpdk.c                        | 142 ++++++++++++++++++++-----------
 4 files changed, 105 insertions(+), 59 deletions(-)

Comments

Stokes, Ian Nov. 6, 2018, 11:14 a.m. | #1
> 1. Enable compilation and linkage with dpdk 18.08.0 The following dpdk
> commits which were introduced after dpdk 17.11.x require OVS updates to
> accommodate to the dpdk changes.
> - ce17edde ("ethdev: introduce Rx queue offloads API")
> - ab3ce1e0 ("ethdev: remove old offload API")
> - c06ddf96 ("meter: add configuration profile")
> - e58638c3 ("ethdev: fix TPID handling in flow API")
> - cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
> - ac8d22de ("ethdev: flatten RSS configuration in flow API")
> 
> 2. Limit configured rss hash functions to only those supported by the eth
> device.
> 
> 3. Set default RSS key in struct action_rss_data, required by OVS commit
> - e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow") when
> configured with "other_config:hw-offload=true"
> Remark: calling RSS with 0 length (default) key is rejected in DPDK 18.08
> and will be enabled in DPDK 18.11. It has no effect when running in a "hw-
> offload=false" configuration.
> 
> 4. Update references to DPDK version 18.08 in Documentation and in travis
> linux-build script
> 
> 5. There are currently warnings on DPDK deprecated functions calls:
> - rte_eth_dev_attach
> - rte_eth_dev_detach
> - rte_eth_devargs_parse
> The deprecated functions calls replacements will be added to DPDK 18.11.
> 

Thanks for this Ophir,

I believe with the exception of 1 comment from Ilya regarding a comment, all queries have been addressed in the v6.

I've completed a validation of all major features and everything seems ok.

I don't see any acks on the patchset to date. Is there any outstanding issue blocking people from signing off?

I'm aware that DPDK 18.11 RC2 has been released today. With that in mind I propose we merge these patches to dpdk_latest and the dpdk_hwol branches in the OVS repo.

That will allow us to begin testing with 18.11 RC2 and start work on any patches required OVS DPDK.

Any objections?

Thanks
Ian

> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> v1:
> First version
> 
> v2:
> Avoid seg faults cases as described in
> https://patchwork.ozlabs.org/patch/965451/
> by using the patch in:
> https://github.com/kevintraynor/ovs-dpdk-
> master/commit/88f46cc5ab338eb4f3ca5db1eacd0effefe4fa0c
> 
> v3:
> - Rebase on latest dpdk-hwol branch
> - Updates based on latest reviews to versions v1 & v2
> 
> v4:
> This patch got lost in mailing list server due to administrative issues
> and is now obsolete
> 
> v5:
> - updated commit message
> - Address all reviews (some skipped by mistake) from recent versions
> - it is suggested to ignore deprecated functions warnings as the functions
> replacements are missing in DPDK 18.08 and will be added to DPDK 18.11
> 
> v6:
> - Rebase on latest dpdk-hwol branch
> - Updates based on patch reviews
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=70067
> 
>  .travis/linux-build.sh                   |   2 +-
>  Documentation/intro/install/dpdk.rst     |  14 +--
>  Documentation/topics/dpdk/vhost-user.rst |   6 +-
>  lib/netdev-dpdk.c                        | 142 ++++++++++++++++++++------
> -----
>  4 files changed, 105 insertions(+), 59 deletions(-)
> 
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh index
> 1fe5bbf..4c9e952 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -83,7 +83,7 @@ fi
> 
>  if [ "$DPDK" ]; then
>      if [ -z "$DPDK_VER" ]; then
> -        DPDK_VER="17.11.4"
> +        DPDK_VER="18.08"
>      fi
>      install_dpdk $DPDK_VER
>      if [ "$CC" = "clang" ]; then
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index 312b1a8..73610ef 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -42,7 +42,7 @@ Build requirements
>  In addition to the requirements described in :doc:`general`, building
> Open  vSwitch with DPDK will require the following:
> 
> -- DPDK 17.11.4
> +- DPDK 18.08.0
> 
>  - A `DPDK supported NIC`_
> 
> @@ -71,9 +71,9 @@ Install DPDK
>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
> 
>         $ cd /usr/src/
> -       $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
> -       $ tar xf dpdk-17.11.4.tar.xz
> -       $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.4
> +       $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
> +       $ tar xf dpdk-18.08.tar.xz
> +       $ export DPDK_DIR=/usr/src/dpdk-stable-18.08
>         $ cd $DPDK_DIR
> 
>  #. (Optional) Configure DPDK as a shared library @@ -283,9 +283,9 @@ with
> either the ovs-vswitchd logs, or by running either of the commands::
> 
>    $ ovs-vswitchd --version
>    ovs-vswitchd (Open vSwitch) 2.9.0
> -  DPDK 17.11.0
> +  DPDK 18.08.0
>    $ ovs-vsctl get Open_vSwitch . dpdk_version
> -  "DPDK 17.11.0"
> +  "DPDK 18.08.0"
> 
>  At this point you can use ovs-vsctl to set up bridges and other Open
> vSwitch  features. Seeing as we've configured the DPDK datapath, we will
> use DPDK-type @@ -673,7 +673,7 @@ Limitations
>    The latest list of validated firmware versions can be found in the
> `DPDK
>    release notes`_.
> 
> -.. _DPDK release notes:
> http://dpdk.org/doc/guides/rel_notes/release_17_11.html
> +.. _DPDK release notes:
> +http://dpdk.org/doc/guides/rel_notes/release_18_08.html
> 
>  - Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
>    given MTU value is calculated e.g. i40e driver includes 2 x vlan
> headers in diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index 176bf17..56f58ba 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -320,9 +320,9 @@ To begin, instantiate a guest as described in
> :ref:`dpdk-vhost-user` or  DPDK sources to VM and build DPDK::
> 
>      $ cd /root/dpdk/
> -    $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
> -    $ tar xf dpdk-17.11.4.tar.xz
> -    $ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.4
> +    $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
> +    $ tar xf dpdk-18.08.tar.xz
> +    $ export DPDK_DIR=/root/dpdk/dpdk-stable-18.08
>      $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
>      $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>      $ cd $DPDK_DIR
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f91aa27..adfad29
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -168,11 +168,7 @@ static const struct rte_eth_conf port_conf = {
>      .rxmode = {
>          .mq_mode = ETH_MQ_RX_RSS,
>          .split_hdr_size = 0,
> -        .header_split   = 0, /* Header Split disabled */
> -        .hw_ip_checksum = 0, /* IP checksum offload disabled */
> -        .hw_vlan_filter = 0, /* VLAN filtering disabled */
> -        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
> -        .hw_strip_crc   = 0,
> +        .offloads = 0,
>      },
>      .rx_adv_conf = {
>          .rss_conf = {
> @@ -364,6 +360,7 @@ struct dpdk_ring {
>  struct ingress_policer {
>      struct rte_meter_srtcm_params app_srtcm_params;
>      struct rte_meter_srtcm in_policer;
> +    struct rte_meter_srtcm_profile in_prof;
>      rte_spinlock_t policer_lock;
>  };
> 
> @@ -894,6 +891,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int
> n_rxq, int n_txq)
>      struct rte_eth_dev_info info;
>      uint16_t conf_mtu;
> 
> +    rte_eth_dev_info_get(dev->port_id, &info);
> +
>      /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
>       * scatter to support jumbo RX. Checking the offload capabilities
>       * is not an option as PMDs are not required yet to report @@ -901,20
> +900,25 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq,
> int n_txq)
>       * (testing or code review). Listing all such PMDs feels harder
>       * than highlighting the one known not to need scatter */
>      if (dev->mtu > ETHER_MTU) {
> -        rte_eth_dev_info_get(dev->port_id, &info);
>          if (strncmp(info.driver_name, "net_nfp", 7)) {
> -            conf.rxmode.enable_scatter = 1;
> +            conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
>          }
>      }
> 
>      conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> -    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
> -                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> +
> +    if (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) {
> +        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CHECKSUM;
> +    }
> 
>      if (dev->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) {
> -        conf.rxmode.hw_strip_crc = 1;
> +        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CRC_STRIP;
>      }
> 
> +    /* Limit configured rss hash functions to only those supported
> +     * by the eth device. */
> +    conf.rx_adv_conf.rss_conf.rss_hf &= info.flow_type_rss_offloads;
> +
>      /* A device may report more queues than it makes available (this has
>       * been observed for Intel xl710, which reserves some of them for
>       * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not @@ -
> 1932,16 +1936,18 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int
> qid,
> 
>  static inline bool
>  netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
> +                               struct rte_meter_srtcm_profile *profile,
>                                 struct rte_mbuf *pkt, uint64_t time)  {
>      uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> ether_hdr);
> 
> -    return rte_meter_srtcm_color_blind_check(meter, time, pkt_len) ==
> -                                                e_RTE_METER_GREEN;
> +    return rte_meter_srtcm_color_blind_check(meter, profile, time,
> pkt_len) ==
> +                                             e_RTE_METER_GREEN;
>  }
> 
>  static int
>  netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
> +                        struct rte_meter_srtcm_profile *profile,
>                          struct rte_mbuf **pkts, int pkt_cnt,
>                          bool should_steal)  { @@ -1953,7 +1959,8 @@
> netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
>      for (i = 0; i < pkt_cnt; i++) {
>          pkt = pkts[i];
>          /* Handle current packet */
> -        if (netdev_dpdk_policer_pkt_handle(meter, pkt, current_time)) {
> +        if (netdev_dpdk_policer_pkt_handle(meter, profile,
> +                pkt, current_time)) {
>              if (cnt != i) {
>                  pkts[cnt] = pkt;
>              }
> @@ -1975,8 +1982,8 @@ ingress_policer_run(struct ingress_policer *policer,
> struct rte_mbuf **pkts,
>      int cnt = 0;
> 
>      rte_spinlock_lock(&policer->policer_lock);
> -    cnt = netdev_dpdk_policer_run(&policer->in_policer, pkts,
> -                                  pkt_cnt, should_steal);
> +    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer-
> >in_prof,
> +                                  pkts, pkt_cnt, should_steal);
>      rte_spinlock_unlock(&policer->policer_lock);
> 
>      return cnt;
> @@ -2767,8 +2774,12 @@ netdev_dpdk_policer_construct(uint32_t rate,
> uint32_t burst)
>      policer->app_srtcm_params.cir = rate_bytes;
>      policer->app_srtcm_params.cbs = burst_bytes;
>      policer->app_srtcm_params.ebs = 0;
> -    err = rte_meter_srtcm_config(&policer->in_policer,
> -                                    &policer->app_srtcm_params);
> +    err = rte_meter_srtcm_profile_config(&policer->in_prof,
> +                                         &policer->app_srtcm_params);
> +    if (!err) {
> +        err = rte_meter_srtcm_config(&policer->in_policer,
> +                                     &policer->in_prof);
> +    }
>      if (err) {
>          VLOG_ERR("Could not create rte meter for ingress policer");
>          free(policer);
> @@ -3017,9 +3028,23 @@ netdev_dpdk_get_status(const struct netdev *netdev,
> struct smap *args)
>          return ENODEV;
>      }
> 
> +    ovs_mutex_lock(&dpdk_mutex);
>      ovs_mutex_lock(&dev->mutex);
>      rte_eth_dev_info_get(dev->port_id, &dev_info);
>      ovs_mutex_unlock(&dev->mutex);
> +    const struct rte_bus *bus;
> +    const struct rte_pci_device *pci_dev;
> +    uint16_t vendor_id = PCI_ANY_ID;
> +    uint16_t device_id = PCI_ANY_ID;
> +    bus = rte_bus_find_by_device(dev_info.device);
> +    if (bus && !strcmp(bus->name, "pci")) {
> +        pci_dev = RTE_DEV_TO_PCI(dev_info.device);
> +        if (pci_dev) {
> +            vendor_id = pci_dev->id.vendor_id;
> +            device_id = pci_dev->id.device_id;
> +        }
> +    }
> +    ovs_mutex_unlock(&dpdk_mutex);
> 
>      smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
>      smap_add_format(args, "numa_id", "%d", @@ -3042,14 +3067,8 @@
> netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
>      smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD);
>      smap_add_format(args, "if_descr", "%s %s", rte_version(),
>                                                 dev_info.driver_name);
> -
> -    if (dev_info.pci_dev) {
> -        smap_add_format(args, "pci-vendor_id", "0x%x",
> -                        dev_info.pci_dev->id.vendor_id);
> -        smap_add_format(args, "pci-device_id", "0x%x",
> -                        dev_info.pci_dev->id.device_id);
> -    }
> -
> +    smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id);
> +    smap_add_format(args, "pci-device_id", "0x%x", device_id);
>      return 0;
>  }
> 
> @@ -3727,6 +3746,7 @@ struct egress_policer {
>      struct qos_conf qos_conf;
>      struct rte_meter_srtcm_params app_srtcm_params;
>      struct rte_meter_srtcm egress_meter;
> +    struct rte_meter_srtcm_profile egress_prof;
>  };
> 
>  static void
> @@ -3749,11 +3769,17 @@ egress_policer_qos_construct(const struct smap
> *details,
>      policer = xmalloc(sizeof *policer);
>      qos_conf_init(&policer->qos_conf, &egress_policer_ops);
>      egress_policer_details_to_param(details, &policer->app_srtcm_params);
> -    err = rte_meter_srtcm_config(&policer->egress_meter,
> -                                 &policer->app_srtcm_params);
> +    err = rte_meter_srtcm_profile_config(&policer->egress_prof,
> +                                         &policer->app_srtcm_params);
> +    if (!err) {
> +        err = rte_meter_srtcm_config(&policer->egress_meter,
> +                                     &policer->egress_prof);
> +    }
> +
>      if (!err) {
>          *conf = &policer->qos_conf;
>      } else {
> +        VLOG_ERR("Could not create rte meter for egress policer");
>          free(policer);
>          *conf = NULL;
>          err = -err;
> @@ -3803,7 +3829,8 @@ egress_policer_run(struct qos_conf *conf, struct
> rte_mbuf **pkts, int pkt_cnt,
>      struct egress_policer *policer =
>          CONTAINER_OF(conf, struct egress_policer, qos_conf);
> 
> -    cnt = netdev_dpdk_policer_run(&policer->egress_meter, pkts,
> +    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
> +                                  &policer->egress_prof, pkts,
>                                    pkt_cnt, should_steal);
> 
>      return cnt;
> @@ -3888,7 +3915,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk
> *dev)
>      if (!err) {
>          /* A new mempool was created or re-used. */
>          netdev_change_seq_changed(&dev->up);
> -    } else if (err != EEXIST){
> +    } else if (err != EEXIST) {
>          return err;
>      }
>      if (netdev_dpdk_get_vid(dev) >= 0) { @@ -4103,15 +4130,15 @@
> dump_flow_pattern(struct rte_flow_item *item)
> 
>          VLOG_DBG("rte flow vlan pattern:\n");
>          if (vlan_spec) {
> -            VLOG_DBG("  Spec: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
> -                     ntohs(vlan_spec->tpid), ntohs(vlan_spec->tci));
> +            VLOG_DBG("  Spec: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
> +                     ntohs(vlan_spec->inner_type),
> + ntohs(vlan_spec->tci));
>          } else {
>              VLOG_DBG("  Spec = null\n");
>          }
> 
>          if (vlan_mask) {
> -            VLOG_DBG("  Mask: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
> -                     vlan_mask->tpid, vlan_mask->tci);
> +            VLOG_DBG("  Mask: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
> +                     ntohs(vlan_mask->inner_type),
> + ntohs(vlan_mask->tci));
>          } else {
>              VLOG_DBG("  Mask = null\n");
>          }
> @@ -4281,27 +4308,39 @@ add_flow_action(struct flow_actions *actions, enum
> rte_flow_action_type type,
>      actions->cnt++;
>  }
> 
> +struct action_rss_data {
> +    struct rte_flow_action_rss conf;
> +    uint16_t queue[0];
> +};
> +
>  static struct rte_flow_action_rss *
>  add_flow_rss_action(struct flow_actions *actions,
>                      struct netdev *netdev) {
>      int i;
> -    struct rte_flow_action_rss *rss;
> -
> -    rss = xmalloc(sizeof(*rss) + sizeof(uint16_t) * netdev->n_rxq);
> -    /*
> -     * Setting it to NULL will let the driver use the default RSS
> -     * configuration we have set: &port_conf.rx_adv_conf.rss_conf.
> -     */
> -    rss->rss_conf = NULL;
> -    rss->num = netdev->n_rxq;
> +    struct action_rss_data *rss_data;
> +
> +    rss_data = xmalloc(sizeof(struct action_rss_data) +
> +                       sizeof(uint16_t) * netdev->n_rxq);
> +    *rss_data = (struct action_rss_data) {
> +        .conf = (struct rte_flow_action_rss) {
> +            .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
> +            .level = 0,
> +            .types = 0,
> +            .queue_num = netdev->n_rxq,
> +            .queue = rss_data->queue,
> +            .key_len = 0,
> +            .key  = NULL
> +        },
> +    };
> 
> -    for (i = 0; i < rss->num; i++) {
> -        rss->queue[i] = i;
> +    /* Override queue array with default */
> +    for (i = 0; i < netdev->n_rxq; i++) {
> +       rss_data->queue[i] = i;
>      }
> 
> -    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, rss);
> +    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS,
> + &rss_data->conf);
> 
> -    return rss;
> +    return &rss_data->conf;
>  }
> 
>  static int
> @@ -4365,7 +4404,7 @@ netdev_dpdk_add_rte_flow_offload(struct netdev
> *netdev,
>          vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
> 
>          /* match any protocols */
> -        vlan_mask.tpid = 0;
> +        vlan_mask.inner_type = 0;
> 
>          add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN,
>                           &vlan_spec, &vlan_mask); @@ -4520,7 +4559,14 @@
> end_proto_check:
> 
>      flow = rte_flow_create(dev->port_id, &flow_attr, patterns.items,
>                             actions.actions, &error);
> -    free(rss);
> +    /*
> +     * 'rss' points to a memory (struct rte_flow_action_rss) that is
> contained
> +     * in a bigger memory (struct action_rss_data) that was allocated in
> +     * function add_flow_rss_actions(). The bigger memory holds
> additional
> +     * space for the RSS queues. Given the 'rss' pointer we are freeing
> the
> +     * bigger memory by using the CONTAINER_OF() macro.
> +     */
> +    free(CONTAINER_OF(rss, struct action_rss_data, conf));
>      if (!flow) {
>          VLOG_ERR("rte flow creat error: %u : message : %s\n",
>                   error.type, error.message);
> --
> 1.8.3.1
Stokes, Ian Nov. 7, 2018, 5:43 p.m. | #2
> > 1. Enable compilation and linkage with dpdk 18.08.0 The following dpdk
> > commits which were introduced after dpdk 17.11.x require OVS updates
> > to accommodate to the dpdk changes.
> > - ce17edde ("ethdev: introduce Rx queue offloads API")
> > - ab3ce1e0 ("ethdev: remove old offload API")
> > - c06ddf96 ("meter: add configuration profile")
> > - e58638c3 ("ethdev: fix TPID handling in flow API")
> > - cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
> > - ac8d22de ("ethdev: flatten RSS configuration in flow API")
> >
> > 2. Limit configured rss hash functions to only those supported by the
> > eth device.
> >
> > 3. Set default RSS key in struct action_rss_data, required by OVS
> > commit
> > - e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow") when
> > configured with "other_config:hw-offload=true"
> > Remark: calling RSS with 0 length (default) key is rejected in DPDK
> > 18.08 and will be enabled in DPDK 18.11. It has no effect when running
> > in a "hw- offload=false" configuration.
> >
> > 4. Update references to DPDK version 18.08 in Documentation and in
> > travis linux-build script
> >
> > 5. There are currently warnings on DPDK deprecated functions calls:
> > - rte_eth_dev_attach
> > - rte_eth_dev_detach
> > - rte_eth_devargs_parse
> > The deprecated functions calls replacements will be added to DPDK 18.11.
> >
> 
> Thanks for this Ophir,
> 
> I believe with the exception of 1 comment from Ilya regarding a comment,
> all queries have been addressed in the v6.
> 
> I've completed a validation of all major features and everything seems ok.
> 
> I don't see any acks on the patchset to date. Is there any outstanding
> issue blocking people from signing off?
> 
> I'm aware that DPDK 18.11 RC2 has been released today. With that in mind I
> propose we merge these patches to dpdk_latest and the dpdk_hwol branches
> in the OVS repo.
> 
> That will allow us to begin testing with 18.11 RC2 and start work on any
> patches required OVS DPDK.
> 
> Any objections?
> 
> Thanks
> Ian

There's been no other feedback on this series so I'm assuming its ok to go. Thanks to all for the reviews to date.

I've rebased this to the head of master and pushed it to dpdk-hwol and dpdk-latest. 

Thanks Ian

Patch

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 1fe5bbf..4c9e952 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -83,7 +83,7 @@  fi
 
 if [ "$DPDK" ]; then
     if [ -z "$DPDK_VER" ]; then
-        DPDK_VER="17.11.4"
+        DPDK_VER="18.08"
     fi
     install_dpdk $DPDK_VER
     if [ "$CC" = "clang" ]; then
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 312b1a8..73610ef 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@  Build requirements
 In addition to the requirements described in :doc:`general`, building Open
 vSwitch with DPDK will require the following:
 
-- DPDK 17.11.4
+- DPDK 18.08.0
 
 - A `DPDK supported NIC`_
 
@@ -71,9 +71,9 @@  Install DPDK
 #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
 
        $ cd /usr/src/
-       $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
-       $ tar xf dpdk-17.11.4.tar.xz
-       $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.4
+       $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
+       $ tar xf dpdk-18.08.tar.xz
+       $ export DPDK_DIR=/usr/src/dpdk-stable-18.08
        $ cd $DPDK_DIR
 
 #. (Optional) Configure DPDK as a shared library
@@ -283,9 +283,9 @@  with either the ovs-vswitchd logs, or by running either of the commands::
 
   $ ovs-vswitchd --version
   ovs-vswitchd (Open vSwitch) 2.9.0
-  DPDK 17.11.0
+  DPDK 18.08.0
   $ ovs-vsctl get Open_vSwitch . dpdk_version
-  "DPDK 17.11.0"
+  "DPDK 18.08.0"
 
 At this point you can use ovs-vsctl to set up bridges and other Open vSwitch
 features. Seeing as we've configured the DPDK datapath, we will use DPDK-type
@@ -673,7 +673,7 @@  Limitations
   The latest list of validated firmware versions can be found in the `DPDK
   release notes`_.
 
-.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html
+.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_18_08.html
 
 - Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
   given MTU value is calculated e.g. i40e driver includes 2 x vlan headers in
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 176bf17..56f58ba 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -320,9 +320,9 @@  To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
 DPDK sources to VM and build DPDK::
 
     $ cd /root/dpdk/
-    $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
-    $ tar xf dpdk-17.11.4.tar.xz
-    $ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.4
+    $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
+    $ tar xf dpdk-18.08.tar.xz
+    $ export DPDK_DIR=/root/dpdk/dpdk-stable-18.08
     $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
     $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
     $ cd $DPDK_DIR
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f91aa27..adfad29 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -168,11 +168,7 @@  static const struct rte_eth_conf port_conf = {
     .rxmode = {
         .mq_mode = ETH_MQ_RX_RSS,
         .split_hdr_size = 0,
-        .header_split   = 0, /* Header Split disabled */
-        .hw_ip_checksum = 0, /* IP checksum offload disabled */
-        .hw_vlan_filter = 0, /* VLAN filtering disabled */
-        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
-        .hw_strip_crc   = 0,
+        .offloads = 0,
     },
     .rx_adv_conf = {
         .rss_conf = {
@@ -364,6 +360,7 @@  struct dpdk_ring {
 struct ingress_policer {
     struct rte_meter_srtcm_params app_srtcm_params;
     struct rte_meter_srtcm in_policer;
+    struct rte_meter_srtcm_profile in_prof;
     rte_spinlock_t policer_lock;
 };
 
@@ -894,6 +891,8 @@  dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     struct rte_eth_dev_info info;
     uint16_t conf_mtu;
 
+    rte_eth_dev_info_get(dev->port_id, &info);
+
     /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
      * scatter to support jumbo RX. Checking the offload capabilities
      * is not an option as PMDs are not required yet to report
@@ -901,20 +900,25 @@  dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
      * (testing or code review). Listing all such PMDs feels harder
      * than highlighting the one known not to need scatter */
     if (dev->mtu > ETHER_MTU) {
-        rte_eth_dev_info_get(dev->port_id, &info);
         if (strncmp(info.driver_name, "net_nfp", 7)) {
-            conf.rxmode.enable_scatter = 1;
+            conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
         }
     }
 
     conf.intr_conf.lsc = dev->lsc_interrupt_mode;
-    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
-                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
+
+    if (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) {
+        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CHECKSUM;
+    }
 
     if (dev->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) {
-        conf.rxmode.hw_strip_crc = 1;
+        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CRC_STRIP;
     }
 
+    /* Limit configured rss hash functions to only those supported
+     * by the eth device. */
+    conf.rx_adv_conf.rss_conf.rss_hf &= info.flow_type_rss_offloads;
+
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -1932,16 +1936,18 @@  netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
 
 static inline bool
 netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
+                               struct rte_meter_srtcm_profile *profile,
                                struct rte_mbuf *pkt, uint64_t time)
 {
     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct ether_hdr);
 
-    return rte_meter_srtcm_color_blind_check(meter, time, pkt_len) ==
-                                                e_RTE_METER_GREEN;
+    return rte_meter_srtcm_color_blind_check(meter, profile, time, pkt_len) ==
+                                             e_RTE_METER_GREEN;
 }
 
 static int
 netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
+                        struct rte_meter_srtcm_profile *profile,
                         struct rte_mbuf **pkts, int pkt_cnt,
                         bool should_steal)
 {
@@ -1953,7 +1959,8 @@  netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
         /* Handle current packet */
-        if (netdev_dpdk_policer_pkt_handle(meter, pkt, current_time)) {
+        if (netdev_dpdk_policer_pkt_handle(meter, profile,
+                pkt, current_time)) {
             if (cnt != i) {
                 pkts[cnt] = pkt;
             }
@@ -1975,8 +1982,8 @@  ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
     int cnt = 0;
 
     rte_spinlock_lock(&policer->policer_lock);
-    cnt = netdev_dpdk_policer_run(&policer->in_policer, pkts,
-                                  pkt_cnt, should_steal);
+    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer->in_prof,
+                                  pkts, pkt_cnt, should_steal);
     rte_spinlock_unlock(&policer->policer_lock);
 
     return cnt;
@@ -2767,8 +2774,12 @@  netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst)
     policer->app_srtcm_params.cir = rate_bytes;
     policer->app_srtcm_params.cbs = burst_bytes;
     policer->app_srtcm_params.ebs = 0;
-    err = rte_meter_srtcm_config(&policer->in_policer,
-                                    &policer->app_srtcm_params);
+    err = rte_meter_srtcm_profile_config(&policer->in_prof,
+                                         &policer->app_srtcm_params);
+    if (!err) {
+        err = rte_meter_srtcm_config(&policer->in_policer,
+                                     &policer->in_prof);
+    }
     if (err) {
         VLOG_ERR("Could not create rte meter for ingress policer");
         free(policer);
@@ -3017,9 +3028,23 @@  netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
         return ENODEV;
     }
 
+    ovs_mutex_lock(&dpdk_mutex);
     ovs_mutex_lock(&dev->mutex);
     rte_eth_dev_info_get(dev->port_id, &dev_info);
     ovs_mutex_unlock(&dev->mutex);
+    const struct rte_bus *bus;
+    const struct rte_pci_device *pci_dev;
+    uint16_t vendor_id = PCI_ANY_ID;
+    uint16_t device_id = PCI_ANY_ID;
+    bus = rte_bus_find_by_device(dev_info.device);
+    if (bus && !strcmp(bus->name, "pci")) {
+        pci_dev = RTE_DEV_TO_PCI(dev_info.device);
+        if (pci_dev) {
+            vendor_id = pci_dev->id.vendor_id;
+            device_id = pci_dev->id.device_id;
+        }
+    }
+    ovs_mutex_unlock(&dpdk_mutex);
 
     smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
     smap_add_format(args, "numa_id", "%d",
@@ -3042,14 +3067,8 @@  netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD);
     smap_add_format(args, "if_descr", "%s %s", rte_version(),
                                                dev_info.driver_name);
-
-    if (dev_info.pci_dev) {
-        smap_add_format(args, "pci-vendor_id", "0x%x",
-                        dev_info.pci_dev->id.vendor_id);
-        smap_add_format(args, "pci-device_id", "0x%x",
-                        dev_info.pci_dev->id.device_id);
-    }
-
+    smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id);
+    smap_add_format(args, "pci-device_id", "0x%x", device_id);
     return 0;
 }
 
@@ -3727,6 +3746,7 @@  struct egress_policer {
     struct qos_conf qos_conf;
     struct rte_meter_srtcm_params app_srtcm_params;
     struct rte_meter_srtcm egress_meter;
+    struct rte_meter_srtcm_profile egress_prof;
 };
 
 static void
@@ -3749,11 +3769,17 @@  egress_policer_qos_construct(const struct smap *details,
     policer = xmalloc(sizeof *policer);
     qos_conf_init(&policer->qos_conf, &egress_policer_ops);
     egress_policer_details_to_param(details, &policer->app_srtcm_params);
-    err = rte_meter_srtcm_config(&policer->egress_meter,
-                                 &policer->app_srtcm_params);
+    err = rte_meter_srtcm_profile_config(&policer->egress_prof,
+                                         &policer->app_srtcm_params);
+    if (!err) {
+        err = rte_meter_srtcm_config(&policer->egress_meter,
+                                     &policer->egress_prof);
+    }
+
     if (!err) {
         *conf = &policer->qos_conf;
     } else {
+        VLOG_ERR("Could not create rte meter for egress policer");
         free(policer);
         *conf = NULL;
         err = -err;
@@ -3803,7 +3829,8 @@  egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
     struct egress_policer *policer =
         CONTAINER_OF(conf, struct egress_policer, qos_conf);
 
-    cnt = netdev_dpdk_policer_run(&policer->egress_meter, pkts,
+    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
+                                  &policer->egress_prof, pkts,
                                   pkt_cnt, should_steal);
 
     return cnt;
@@ -3888,7 +3915,7 @@  dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
     if (!err) {
         /* A new mempool was created or re-used. */
         netdev_change_seq_changed(&dev->up);
-    } else if (err != EEXIST){
+    } else if (err != EEXIST) {
         return err;
     }
     if (netdev_dpdk_get_vid(dev) >= 0) {
@@ -4103,15 +4130,15 @@  dump_flow_pattern(struct rte_flow_item *item)
 
         VLOG_DBG("rte flow vlan pattern:\n");
         if (vlan_spec) {
-            VLOG_DBG("  Spec: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
-                     ntohs(vlan_spec->tpid), ntohs(vlan_spec->tci));
+            VLOG_DBG("  Spec: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
+                     ntohs(vlan_spec->inner_type), ntohs(vlan_spec->tci));
         } else {
             VLOG_DBG("  Spec = null\n");
         }
 
         if (vlan_mask) {
-            VLOG_DBG("  Mask: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
-                     vlan_mask->tpid, vlan_mask->tci);
+            VLOG_DBG("  Mask: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
+                     ntohs(vlan_mask->inner_type), ntohs(vlan_mask->tci));
         } else {
             VLOG_DBG("  Mask = null\n");
         }
@@ -4281,27 +4308,39 @@  add_flow_action(struct flow_actions *actions, enum rte_flow_action_type type,
     actions->cnt++;
 }
 
+struct action_rss_data {
+    struct rte_flow_action_rss conf;
+    uint16_t queue[0];
+};
+
 static struct rte_flow_action_rss *
 add_flow_rss_action(struct flow_actions *actions,
                     struct netdev *netdev) {
     int i;
-    struct rte_flow_action_rss *rss;
-
-    rss = xmalloc(sizeof(*rss) + sizeof(uint16_t) * netdev->n_rxq);
-    /*
-     * Setting it to NULL will let the driver use the default RSS
-     * configuration we have set: &port_conf.rx_adv_conf.rss_conf.
-     */
-    rss->rss_conf = NULL;
-    rss->num = netdev->n_rxq;
+    struct action_rss_data *rss_data;
+
+    rss_data = xmalloc(sizeof(struct action_rss_data) +
+                       sizeof(uint16_t) * netdev->n_rxq);
+    *rss_data = (struct action_rss_data) {
+        .conf = (struct rte_flow_action_rss) {
+            .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+            .level = 0,
+            .types = 0,
+            .queue_num = netdev->n_rxq,
+            .queue = rss_data->queue,
+            .key_len = 0,
+            .key  = NULL
+        },
+    };
 
-    for (i = 0; i < rss->num; i++) {
-        rss->queue[i] = i;
+    /* Override queue array with default */
+    for (i = 0; i < netdev->n_rxq; i++) {
+       rss_data->queue[i] = i;
     }
 
-    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, rss);
+    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
 
-    return rss;
+    return &rss_data->conf;
 }
 
 static int
@@ -4365,7 +4404,7 @@  netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
         vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
         /* match any protocols */
-        vlan_mask.tpid = 0;
+        vlan_mask.inner_type = 0;
 
         add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN,
                          &vlan_spec, &vlan_mask);
@@ -4520,7 +4559,14 @@  end_proto_check:
 
     flow = rte_flow_create(dev->port_id, &flow_attr, patterns.items,
                            actions.actions, &error);
-    free(rss);
+    /*
+     * 'rss' points to a memory (struct rte_flow_action_rss) that is contained
+     * in a bigger memory (struct action_rss_data) that was allocated in
+     * function add_flow_rss_actions(). The bigger memory holds additional
+     * space for the RSS queues. Given the 'rss' pointer we are freeing the
+     * bigger memory by using the CONTAINER_OF() macro.
+     */
+    free(CONTAINER_OF(rss, struct action_rss_data, conf));
     if (!flow) {
         VLOG_ERR("rte flow creat error: %u : message : %s\n",
                  error.type, error.message);