diff mbox

[ovs-dev,V5,2/2] netdev-dpdk: add jumbo frame support

Message ID 1455881112-131651-3-git-send-email-mark.b.kavanagh@intel.com
State Changes Requested
Headers show

Commit Message

Mark Kavanagh Feb. 19, 2016, 11:25 a.m. UTC
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated for each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame can be carried
in a single mbuf, as opposed to partitioning it across multiple mbuf
segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user when adding a DPDK port to a bridge.
If an MTU value is not supplied, or the user-supplied value is invalid,
the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
 INSTALL.DPDK.md   |  60 ++++++++++++-
 NEWS              |   3 +-
 lib/netdev-dpdk.c | 248 +++++++++++++++++++++++++++++++++++++-----------------
 3 files changed, 233 insertions(+), 78 deletions(-)

Comments

Flavio Leitner Feb. 22, 2016, 10:59 p.m. UTC | #1
On Fri, 19 Feb 2016 11:25:12 +0000
Mark Kavanagh <mark.b.kavanagh@intel.com> wrote:

> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated for each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame can be carried
> in a single mbuf, as opposed to partitioning it across multiple mbuf
> segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user when adding a DPDK port to a bridge.
> If an MTU value is not supplied, or the user-supplied value is invalid,
> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
> 
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> ---
>  INSTALL.DPDK.md   |  60 ++++++++++++-
>  NEWS              |   3 +-
>  lib/netdev-dpdk.c | 248 +++++++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 233 insertions(+), 78 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index d892788..4ca98cb 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -878,10 +878,63 @@ by adding the following string:
>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>  determines how many queues can be used by the guest.
>  
> +Jumbo Frames
> +------------
> +
> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
> +
> +To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
> +'add-port' command-line, along with the required MTU for the port.
> +e.g.
> +
> +     ```
> +     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
> +     ```
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame may be accommodated inside a single
> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
> +
> +Note that from an OVSDB perspective, the `mtu_request` option for a specific
> +port may be disregarded once initially set, as subsequent modifications to this
> +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
> +ports is reported by the `Interface` table's 'mtu' field.
> +
> +Jumbo frame support has been validated against 13312B frames, using the
> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
> +theoretically be supported. Supported port types excludes vHost-Cuse ports, as
> +that feature is pending deprecation.
> +
> +vHost Ports and Jumbo Frames
> +----------------------------
> +Jumbo frame support is available for DPDK vHost-User ports only. Some additional
> +configuration is needed to take advantage of this feature:
> +
> +  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> +      the QEMU command line snippet below:
> +
> +      ```
> +      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +      '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +      ```
> +
> +  2. Where virtio devices are bound to the Linux kernel driver in a guest
> +     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
> +     MTU of those logical network interfaces must also be increased. This
> +     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
> +     to the length of the IP packet only, and not that of the entire frame.
> +
> +     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> +     header and CRC lengths (i.e. 18B) from the max supported frame size.
> +     So, to set the MTU for a 13312B Jumbo Frame:
> +
> +      ```
> +      ifconfig eth1 mtu 13294
> +      ```
> +
>  Restrictions:
>  -------------
>  
> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>    - Currently DPDK port does not make use any offload functionality.
>    - DPDK-vHost support works with 1G huge pages.
>  
> @@ -922,6 +975,11 @@ Restrictions:
>      the next release of DPDK (which includes the above patch) is available and
>      integrated into OVS.
>  
> +  Jumbo Frames:
> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
> +     issue that is currently being investigated.
> +  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
> +
>  Bug Reporting:
>  --------------
>  
> diff --git a/NEWS b/NEWS
> index 3e33871..43127f9 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,10 +10,11 @@ Post-v2.5.0
>     - DPDK:
>       * New option "n_rxq" for PMD interfaces.
>         Old 'other_config:n-dpdk-rxqs' is no longer supported.
> +     * Support Jumbo Frames
> +
>     - ovs-benchmark: This utility has been removed due to lack of use and
>       bitrot.
>  
> -
>  v2.5.0 - xx xxx xxxx
>  ---------------------
>     - Dropped support for Python older than version 2.7.  As a consequence,
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 2a06bb5..ac89ee6 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>                                      + sizeof(struct dp_packet)    \
>                                      + RTE_PKTMBUF_HEADROOM)
>  #define NETDEV_DPDK_MBUF_ALIGN      1024
> +#define NETDEV_DPDK_MAX_FRAME_LEN   13312
> +#define MTU_NOT_SET                 0
>  
>  /* Max and min number of packets in the mempool.  OVS tries to allocate a
>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
> @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  {
>      int diag = 0;
>      int i;
> +    struct rte_eth_conf conf = port_conf;
>  
>      /* A device may report more queues than it makes available (this has
>       * been observed for Intel xl710, which reserves some of them for
> @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>              VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>          }
>  
> -        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
> +        if (dev->mtu > ETHER_MTU) {
> +            conf.rxmode.jumbo_frame = 1;
> +            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
> +        } else {
> +            conf.rxmode.jumbo_frame = 0;
> +            conf.rxmode.max_rx_pkt_len = 0;
> +        }
> +
> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>          if (diag) {
>              break;
>          }
> @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>  {
>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>      int sid;
> -    int err = 0;
> -    uint32_t buf_size;
>  
>      ovs_mutex_init(&netdev->mutex);
>      ovs_mutex_lock(&netdev->mutex);
> @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>      netdev->port_id = port_no;
>      netdev->type = type;
>      netdev->flags = 0;
> -    netdev->mtu = ETHER_MTU;
> -    netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
> -
> -    buf_size = dpdk_buf_size(netdev->mtu);
> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
> -    if (!netdev->dpdk_mp) {
> -        err = ENOMEM;
> -        goto unlock;
> -    }
> +    netdev->mtu = MTU_NOT_SET;
>  
>      netdev_->n_txq = NR_QUEUE;
>      netdev_->n_rxq = NR_QUEUE;
> @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>  
>      if (type == DPDK_DEV_ETH) {
>          netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
> -        err = dpdk_eth_dev_init(netdev);
> -        if (err) {
> -            goto unlock;
> -        }
>      }
>  
>      list_push_back(&dpdk_list, &netdev->list_node);
>  
> -unlock:
> -    if (err) {
> -        rte_free(netdev->tx_q);
> -    }
>      ovs_mutex_unlock(&netdev->mutex);
> -    return err;
> +    return 0;
>  }
>  
>  static int
> @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>      return 0;
>  }
>  
> +static void
> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
> +{
> +    const char *mtu_str = smap_get(args, "mtu_request");
> +    char *end_ptr = NULL;
> +    int local_mtu;
> +
> +    if (!mtu_str) {
> +        local_mtu = ETHER_MTU;
> +    } else {
> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
> +        if (local_mtu < ETHER_MTU ||
> +            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
> +            *end_ptr != '\0') {
> +            local_mtu = ETHER_MTU;
> +            VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
> +                    local_mtu);
> +        } else {
> +            VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);

That message could be VLOG_DBG because it only tells about a parameter being
detected and not if the mtu request succeed.  Other than that the patch looks
good to me.

If no one else has comments to address, maybe a commiter can fix the message
level to avoid re-spinning the patchset.

Thanks for the patch!

Acked-by: Flavio Leitner <fbl@sysclose.org>

--
fbl



> +        }
> +    }
> +
> +    *mtu = local_mtu;
> +}
> +
>  static int
>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>  {
> @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>      smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
>      smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
>      smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>      ovs_mutex_unlock(&dev->mutex);
>  
>      return 0;
>  }
>  
> +/* Set the mtu of DPDK_DEV_ETH ports */
> +static int
> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err, dpdk_mtu;
> +    uint32_t buf_size;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    buf_size = dpdk_buf_size(mtu);
> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> +
> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    rte_eth_dev_stop(dev->port_id);
> +
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    err = dpdk_eth_dev_init(dev);
> +    if (err) {
> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
> +                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
> +        dpdk_mp_put(mp);
> +        dev->mtu = ETHER_MTU;
> +        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> +        if (!mp) {
> +            err = ENOMEM;
> +            goto out;
> +        }
> +        dev->dpdk_mp = mp;
> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +        dpdk_eth_dev_init(dev);
> +        goto out;
> +    } else {
> +        netdev_change_seq_changed(netdev);
> +    }
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
>  static int
>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int mtu;
>  
>      ovs_mutex_lock(&dev->mutex);
>      netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
> @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>      netdev_change_seq_changed(netdev);
>      ovs_mutex_unlock(&dev->mutex);
>  
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    if (!dev->mtu) {
> +        return netdev_dpdk_set_mtu(netdev, mtu);
> +    } else if (mtu != dev->mtu) {
> +        VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
> +                  "%d\n", mtu, dev->port_id, dev->mtu);
> +    }
>      return 0;
>  }
>  
> @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>  }
>  
>  static int
> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> -{
> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    int old_mtu, err, dpdk_mtu;
> -    struct dpdk_mp *old_mp;
> -    struct dpdk_mp *mp;
> -    uint32_t buf_size;
> -
> -    ovs_mutex_lock(&dpdk_mutex);
> -    ovs_mutex_lock(&dev->mutex);
> -    if (dev->mtu == mtu) {
> -        err = 0;
> -        goto out;
> -    }
> -
> -    buf_size = dpdk_buf_size(mtu);
> -    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> -
> -    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> -    if (!mp) {
> -        err = ENOMEM;
> -        goto out;
> -    }
> -
> -    rte_eth_dev_stop(dev->port_id);
> -
> -    old_mtu = dev->mtu;
> -    old_mp = dev->dpdk_mp;
> -    dev->dpdk_mp = mp;
> -    dev->mtu = mtu;
> -    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> -
> -    err = dpdk_eth_dev_init(dev);
> -    if (err) {
> -        dpdk_mp_put(mp);
> -        dev->mtu = old_mtu;
> -        dev->dpdk_mp = old_mp;
> -        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> -        dpdk_eth_dev_init(dev);
> -        goto out;
> -    }
> -
> -    dpdk_mp_put(old_mp);
> -    netdev_change_seq_changed(netdev);
> -out:
> -    ovs_mutex_unlock(&dev->mutex);
> -    ovs_mutex_unlock(&dpdk_mutex);
> -    return err;
> -}
> -
> -static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>  
>  static int
> @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void)
>      return 0;
>  }
>  
> +/* Set the mtu of DPDK_DEV_VHOST ports */
> +static int
> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err = 0;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    mp = dpdk_mp_get(dev->socket_id, mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    netdev_change_seq_changed(netdev);
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int mtu;
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
> +                                               netdev->requested_n_rxq), 1);
> +    netdev_change_seq_changed(netdev);
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    if (!dev->mtu) {
> +        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
> +    } else if (mtu != dev->mtu) {
> +        VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
> +                  "%d\n", mtu, dev->mtu);
> +    }
> +    return 0;
> +}
> +
>  static void
>  dpdk_common_init(void)
>  {
> @@ -2136,8 +2223,9 @@ unlock_dpdk:
>      return err;
>  }
>  
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
> -    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
> +        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,   \
> +        GET_STATUS, RXQ_RECV)                                          \
>  {                                                             \
>      NAME,                                                     \
>      INIT,                       /* init */                    \
> @@ -2149,7 +2237,7 @@ unlock_dpdk:
>      DESTRUCT,                                                 \
>      netdev_dpdk_dealloc,                                      \
>      netdev_dpdk_get_config,                                   \
> -    netdev_dpdk_set_config,                                   \
> +    SET_CONFIG            ,                                   \
>      NULL,                       /* get_tunnel_config */       \
>      NULL,                       /* build header */            \
>      NULL,                       /* push header */             \
> @@ -2163,7 +2251,7 @@ unlock_dpdk:
>      netdev_dpdk_set_etheraddr,                                \
>      netdev_dpdk_get_etheraddr,                                \
>      netdev_dpdk_get_mtu,                                      \
> -    netdev_dpdk_set_mtu,                                      \
> +    SET_MTU,                                                  \
>      netdev_dpdk_get_ifindex,                                  \
>      GET_CARRIER,                                              \
>      netdev_dpdk_get_carrier_resets,                           \
> @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class =
>          NULL,
>          netdev_dpdk_construct,
>          netdev_dpdk_destruct,
> +        netdev_dpdk_set_config,
>          netdev_dpdk_set_multiq,
>          netdev_dpdk_eth_send,
> +        netdev_dpdk_set_mtu,
>          netdev_dpdk_get_carrier,
>          netdev_dpdk_get_stats,
>          netdev_dpdk_get_features,
> @@ -2323,8 +2413,10 @@ static const struct netdev_class dpdk_ring_class =
>          NULL,
>          netdev_dpdk_ring_construct,
>          netdev_dpdk_destruct,
> +        netdev_dpdk_set_config,
>          netdev_dpdk_set_multiq,
>          netdev_dpdk_ring_send,
> +        netdev_dpdk_set_mtu,
>          netdev_dpdk_get_carrier,
>          netdev_dpdk_get_stats,
>          netdev_dpdk_get_features,
> @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>          dpdk_vhost_cuse_class_init,
>          netdev_dpdk_vhost_cuse_construct,
>          netdev_dpdk_vhost_destruct,
> +        netdev_dpdk_set_config,
>          netdev_dpdk_vhost_cuse_set_multiq,
>          netdev_dpdk_vhost_send,
> +        NULL,
>          netdev_dpdk_vhost_get_carrier,
>          netdev_dpdk_vhost_get_stats,
>          NULL,
> @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>          dpdk_vhost_user_class_init,
>          netdev_dpdk_vhost_user_construct,
>          netdev_dpdk_vhost_destruct,
> +        netdev_dpdk_vhost_set_config,
>          netdev_dpdk_vhost_set_multiq,
>          netdev_dpdk_vhost_send,
> +        netdev_dpdk_vhost_set_mtu,
>          netdev_dpdk_vhost_get_carrier,
>          netdev_dpdk_vhost_get_stats,
>          NULL,
Mark Kavanagh Feb. 23, 2016, 2:56 p.m. UTC | #2
>
>On Fri, 19 Feb 2016 11:25:12 +0000
>Mark Kavanagh <mark.b.kavanagh@intel.com> wrote:
>
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated for each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame can be carried
>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>> segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user when adding a DPDK port to a bridge.
>> If an MTU value is not supplied, or the user-supplied value is invalid,
>> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> ---
>>  INSTALL.DPDK.md   |  60 ++++++++++++-
>>  NEWS              |   3 +-
>>  lib/netdev-dpdk.c | 248 +++++++++++++++++++++++++++++++++++++-----------------
>>  3 files changed, 233 insertions(+), 78 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index d892788..4ca98cb 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -878,10 +878,63 @@ by adding the following string:
>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>  determines how many queues can be used by the guest.
>>
>> +Jumbo Frames
>> +------------
>> +
>> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
>> +
>> +To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
>> +'add-port' command-line, along with the required MTU for the port.
>> +e.g.
>> +
>> +     ```
>> +     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>options:mtu_request=9000
>> +     ```
>> +
>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>> +increased, such that a full Jumbo Frame may be accommodated inside a single
>> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
>> +
>> +Note that from an OVSDB perspective, the `mtu_request` option for a specific
>> +port may be disregarded once initially set, as subsequent modifications to this
>> +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
>> +ports is reported by the `Interface` table's 'mtu' field.
>> +
>> +Jumbo frame support has been validated against 13312B frames, using the
>> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
>> +theoretically be supported. Supported port types excludes vHost-Cuse ports, as
>> +that feature is pending deprecation.
>> +
>> +vHost Ports and Jumbo Frames
>> +----------------------------
>> +Jumbo frame support is available for DPDK vHost-User ports only. Some additional
>> +configuration is needed to take advantage of this feature:
>> +
>> +  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>> +      the QEMU command line snippet below:
>> +
>> +      ```
>> +      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>> +      '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>> +      ```
>> +
>> +  2. Where virtio devices are bound to the Linux kernel driver in a guest
>> +     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
>> +     MTU of those logical network interfaces must also be increased. This
>> +     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
>> +     to the length of the IP packet only, and not that of the entire frame.
>> +
>> +     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>> +     header and CRC lengths (i.e. 18B) from the max supported frame size.
>> +     So, to set the MTU for a 13312B Jumbo Frame:
>> +
>> +      ```
>> +      ifconfig eth1 mtu 13294
>> +      ```
>> +
>>  Restrictions:
>>  -------------
>>
>> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>>    - Currently DPDK port does not make use any offload functionality.
>>    - DPDK-vHost support works with 1G huge pages.
>>
>> @@ -922,6 +975,11 @@ Restrictions:
>>      the next release of DPDK (which includes the above patch) is available and
>>      integrated into OVS.
>>
>> +  Jumbo Frames:
>> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
>> +     issue that is currently being investigated.
>> +  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
>> +
>>  Bug Reporting:
>>  --------------
>>
>> diff --git a/NEWS b/NEWS
>> index 3e33871..43127f9 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -10,10 +10,11 @@ Post-v2.5.0
>>     - DPDK:
>>       * New option "n_rxq" for PMD interfaces.
>>         Old 'other_config:n-dpdk-rxqs' is no longer supported.
>> +     * Support Jumbo Frames
>> +
>>     - ovs-benchmark: This utility has been removed due to lack of use and
>>       bitrot.
>>
>> -
>>  v2.5.0 - xx xxx xxxx
>>  ---------------------
>>     - Dropped support for Python older than version 2.7.  As a consequence,
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 2a06bb5..ac89ee6 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>                                      + sizeof(struct dp_packet)    \
>>                                      + RTE_PKTMBUF_HEADROOM)
>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>> +#define NETDEV_DPDK_MAX_FRAME_LEN   13312
>> +#define MTU_NOT_SET                 0
>>
>>  /* Max and min number of packets in the mempool.  OVS tries to allocate a
>>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>> @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>  {
>>      int diag = 0;
>>      int i;
>> +    struct rte_eth_conf conf = port_conf;
>>
>>      /* A device may report more queues than it makes available (this has
>>       * been observed for Intel xl710, which reserves some of them for
>> @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
>n_txq)
>>              VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>>          }
>>
>> -        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
>> +        if (dev->mtu > ETHER_MTU) {
>> +            conf.rxmode.jumbo_frame = 1;
>> +            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
>> +        } else {
>> +            conf.rxmode.jumbo_frame = 0;
>> +            conf.rxmode.max_rx_pkt_len = 0;
>> +        }
>> +
>> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>>          if (diag) {
>>              break;
>>          }
>> @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>  {
>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>      int sid;
>> -    int err = 0;
>> -    uint32_t buf_size;
>>
>>      ovs_mutex_init(&netdev->mutex);
>>      ovs_mutex_lock(&netdev->mutex);
>> @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>      netdev->port_id = port_no;
>>      netdev->type = type;
>>      netdev->flags = 0;
>> -    netdev->mtu = ETHER_MTU;
>> -    netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
>> -
>> -    buf_size = dpdk_buf_size(netdev->mtu);
>> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>> -    if (!netdev->dpdk_mp) {
>> -        err = ENOMEM;
>> -        goto unlock;
>> -    }
>> +    netdev->mtu = MTU_NOT_SET;
>>
>>      netdev_->n_txq = NR_QUEUE;
>>      netdev_->n_rxq = NR_QUEUE;
>> @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>
>>      if (type == DPDK_DEV_ETH) {
>>          netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
>> -        err = dpdk_eth_dev_init(netdev);
>> -        if (err) {
>> -            goto unlock;
>> -        }
>>      }
>>
>>      list_push_back(&dpdk_list, &netdev->list_node);
>>
>> -unlock:
>> -    if (err) {
>> -        rte_free(netdev->tx_q);
>> -    }
>>      ovs_mutex_unlock(&netdev->mutex);
>> -    return err;
>> +    return 0;
>>  }
>>
>>  static int
>> @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>>      return 0;
>>  }
>>
>> +static void
>> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>> +{
>> +    const char *mtu_str = smap_get(args, "mtu_request");
>> +    char *end_ptr = NULL;
>> +    int local_mtu;
>> +
>> +    if (!mtu_str) {
>> +        local_mtu = ETHER_MTU;
>> +    } else {
>> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
>> +        if (local_mtu < ETHER_MTU ||
>> +            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>> +            *end_ptr != '\0') {
>> +            local_mtu = ETHER_MTU;
>> +            VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
>> +                    local_mtu);
>> +        } else {
>> +            VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
>
>That message could be VLOG_DBG because it only tells about a parameter being
>detected and not if the mtu request succeed.  Other than that the patch looks
>good to me.

Great!

>
>If no one else has comments to address, maybe a commiter can fix the message
>level to avoid re-spinning the patchset.

Daniele - are you happy with this, or would you prefer that I spin a new version?

>
>Thanks for the patch!

Thanks for the feedback! :)

>
>Acked-by: Flavio Leitner <fbl@sysclose.org>
>
>--
>fbl
>
>
>
>> +        }
>> +    }
>> +
>> +    *mtu = local_mtu;
>> +}
>> +
>>  static int
>>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>>  {
>> @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap
>*args)
>>      smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
>>      smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
>>      smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
>> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>>      ovs_mutex_unlock(&dev->mutex);
>>
>>      return 0;
>>  }
>>
>> +/* Set the mtu of DPDK_DEV_ETH ports */
>> +static int
>> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int err, dpdk_mtu;
>> +    uint32_t buf_size;
>> +    struct dpdk_mp *mp;
>> +
>> +    ovs_mutex_lock(&dpdk_mutex);
>> +    ovs_mutex_lock(&dev->mutex);
>> +    if (dev->mtu == mtu) {
>> +        err = 0;
>> +        goto out;
>> +    }
>> +
>> +    buf_size = dpdk_buf_size(mtu);
>> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>> +
>> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>> +    if (!mp) {
>> +        err = ENOMEM;
>> +        goto out;
>> +    }
>> +
>> +    rte_eth_dev_stop(dev->port_id);
>> +
>> +    dev->dpdk_mp = mp;
>> +    dev->mtu = mtu;
>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> +    err = dpdk_eth_dev_init(dev);
>> +    if (err) {
>> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
>> +                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
>> +        dpdk_mp_put(mp);
>> +        dev->mtu = ETHER_MTU;
>> +        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>> +        if (!mp) {
>> +            err = ENOMEM;
>> +            goto out;
>> +        }
>> +        dev->dpdk_mp = mp;
>> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +        dpdk_eth_dev_init(dev);
>> +        goto out;
>> +    } else {
>> +        netdev_change_seq_changed(netdev);
>> +    }
>> +out:
>> +    ovs_mutex_unlock(&dev->mutex);
>> +    ovs_mutex_unlock(&dpdk_mutex);
>> +    return err;
>> +}
>> +
>>  static int
>>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>  {
>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int mtu;
>>
>>      ovs_mutex_lock(&dev->mutex);
>>      netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
>> @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>      netdev_change_seq_changed(netdev);
>>      ovs_mutex_unlock(&dev->mutex);
>>
>> +    dpdk_dev_parse_mtu(args, &mtu);
>> +
>> +    if (!dev->mtu) {
>> +        return netdev_dpdk_set_mtu(netdev, mtu);
>> +    } else if (mtu != dev->mtu) {
>> +        VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
>> +                  "%d\n", mtu, dev->port_id, dev->mtu);
>> +    }
>>      return 0;
>>  }
>>
>> @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>>  }
>>
>>  static int
>> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> -{
>> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> -    int old_mtu, err, dpdk_mtu;
>> -    struct dpdk_mp *old_mp;
>> -    struct dpdk_mp *mp;
>> -    uint32_t buf_size;
>> -
>> -    ovs_mutex_lock(&dpdk_mutex);
>> -    ovs_mutex_lock(&dev->mutex);
>> -    if (dev->mtu == mtu) {
>> -        err = 0;
>> -        goto out;
>> -    }
>> -
>> -    buf_size = dpdk_buf_size(mtu);
>> -    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>> -
>> -    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>> -    if (!mp) {
>> -        err = ENOMEM;
>> -        goto out;
>> -    }
>> -
>> -    rte_eth_dev_stop(dev->port_id);
>> -
>> -    old_mtu = dev->mtu;
>> -    old_mp = dev->dpdk_mp;
>> -    dev->dpdk_mp = mp;
>> -    dev->mtu = mtu;
>> -    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> -
>> -    err = dpdk_eth_dev_init(dev);
>> -    if (err) {
>> -        dpdk_mp_put(mp);
>> -        dev->mtu = old_mtu;
>> -        dev->dpdk_mp = old_mp;
>> -        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> -        dpdk_eth_dev_init(dev);
>> -        goto out;
>> -    }
>> -
>> -    dpdk_mp_put(old_mp);
>> -    netdev_change_seq_changed(netdev);
>> -out:
>> -    ovs_mutex_unlock(&dev->mutex);
>> -    ovs_mutex_unlock(&dpdk_mutex);
>> -    return err;
>> -}
>> -
>> -static int
>>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>
>>  static int
>> @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void)
>>      return 0;
>>  }
>>
>> +/* Set the mtu of DPDK_DEV_VHOST ports */
>> +static int
>> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int err = 0;
>> +    struct dpdk_mp *mp;
>> +
>> +    ovs_mutex_lock(&dpdk_mutex);
>> +    ovs_mutex_lock(&dev->mutex);
>> +    if (dev->mtu == mtu) {
>> +        err = 0;
>> +        goto out;
>> +    }
>> +
>> +    mp = dpdk_mp_get(dev->socket_id, mtu);
>> +    if (!mp) {
>> +        err = ENOMEM;
>> +        goto out;
>> +    }
>> +
>> +    dev->dpdk_mp = mp;
>> +    dev->mtu = mtu;
>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> +    netdev_change_seq_changed(netdev);
>> +out:
>> +    ovs_mutex_unlock(&dev->mutex);
>> +    ovs_mutex_unlock(&dpdk_mutex);
>> +    return err;
>> +}
>> +
>> +static int
>> +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int mtu;
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
>> +                                               netdev->requested_n_rxq), 1);
>> +    netdev_change_seq_changed(netdev);
>> +    ovs_mutex_unlock(&dev->mutex);
>> +
>> +    dpdk_dev_parse_mtu(args, &mtu);
>> +
>> +    if (!dev->mtu) {
>> +        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
>> +    } else if (mtu != dev->mtu) {
>> +        VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
>> +                  "%d\n", mtu, dev->mtu);
>> +    }
>> +    return 0;
>> +}
>> +
>>  static void
>>  dpdk_common_init(void)
>>  {
>> @@ -2136,8 +2223,9 @@ unlock_dpdk:
>>      return err;
>>  }
>>
>> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
>> -    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
>> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>> +        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,   \
>> +        GET_STATUS, RXQ_RECV)                                          \
>>  {                                                             \
>>      NAME,                                                     \
>>      INIT,                       /* init */                    \
>> @@ -2149,7 +2237,7 @@ unlock_dpdk:
>>      DESTRUCT,                                                 \
>>      netdev_dpdk_dealloc,                                      \
>>      netdev_dpdk_get_config,                                   \
>> -    netdev_dpdk_set_config,                                   \
>> +    SET_CONFIG            ,                                   \
>>      NULL,                       /* get_tunnel_config */       \
>>      NULL,                       /* build header */            \
>>      NULL,                       /* push header */             \
>> @@ -2163,7 +2251,7 @@ unlock_dpdk:
>>      netdev_dpdk_set_etheraddr,                                \
>>      netdev_dpdk_get_etheraddr,                                \
>>      netdev_dpdk_get_mtu,                                      \
>> -    netdev_dpdk_set_mtu,                                      \
>> +    SET_MTU,                                                  \
>>      netdev_dpdk_get_ifindex,                                  \
>>      GET_CARRIER,                                              \
>>      netdev_dpdk_get_carrier_resets,                           \
>> @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class =
>>          NULL,
>>          netdev_dpdk_construct,
>>          netdev_dpdk_destruct,
>> +        netdev_dpdk_set_config,
>>          netdev_dpdk_set_multiq,
>>          netdev_dpdk_eth_send,
>> +        netdev_dpdk_set_mtu,
>>          netdev_dpdk_get_carrier,
>>          netdev_dpdk_get_stats,
>>          netdev_dpdk_get_features,
>> @@ -2323,8 +2413,10 @@ static const struct netdev_class dpdk_ring_class =
>>          NULL,
>>          netdev_dpdk_ring_construct,
>>          netdev_dpdk_destruct,
>> +        netdev_dpdk_set_config,
>>          netdev_dpdk_set_multiq,
>>          netdev_dpdk_ring_send,
>> +        netdev_dpdk_set_mtu,
>>          netdev_dpdk_get_carrier,
>>          netdev_dpdk_get_stats,
>>          netdev_dpdk_get_features,
>> @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>>          dpdk_vhost_cuse_class_init,
>>          netdev_dpdk_vhost_cuse_construct,
>>          netdev_dpdk_vhost_destruct,
>> +        netdev_dpdk_set_config,
>>          netdev_dpdk_vhost_cuse_set_multiq,
>>          netdev_dpdk_vhost_send,
>> +        NULL,
>>          netdev_dpdk_vhost_get_carrier,
>>          netdev_dpdk_vhost_get_stats,
>>          NULL,
>> @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>>          dpdk_vhost_user_class_init,
>>          netdev_dpdk_vhost_user_construct,
>>          netdev_dpdk_vhost_destruct,
>> +        netdev_dpdk_vhost_set_config,
>>          netdev_dpdk_vhost_set_multiq,
>>          netdev_dpdk_vhost_send,
>> +        netdev_dpdk_vhost_set_mtu,
>>          netdev_dpdk_vhost_get_carrier,
>>          netdev_dpdk_vhost_get_stats,
>>          NULL,
Daniele Di Proietto Feb. 28, 2016, 8:33 p.m. UTC | #3
Hi Mark,

I thought about how to make this dynamically reconfigurable and I've come
up with a series that generalizes the mechanism used for rx/tx queues
reconfiguration:

http://openvswitch.org/pipermail/dev/2016-February/066927.html

I also think that it should be better to introduce 'mtu_request' as a
separate
column in the Interface table, instead of putting it in 'options'.  This
way
'mtu_request' won't be netdev specific, but could be used for every netdev
type.

In this scenario vswitch/bridge.c would call netdev_set_mtu() when it
detects a
change in 'mtu_request' and netdev_dpdk_set_mtu() could be implemented
using
netdev_request_reconfigure() from the series that I posted.

What do you think?

Thanks,

Daniele

On 23/02/2016 06:56, "Kavanagh, Mark B" <mark.b.kavanagh@intel.com> wrote:

>>
>>On Fri, 19 Feb 2016 11:25:12 +0000
>>Mark Kavanagh <mark.b.kavanagh@intel.com> wrote:
>>
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated for each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length). The increased space
>>> available in the mbuf means that an entire Jumbo Frame can be carried
>>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>>> segments.
>>>
>>> The amount of space allocated to each mbuf to hold frame data is
>>> defined dynamically by the user when adding a DPDK port to a bridge.
>>> If an MTU value is not supplied, or the user-supplied value is invalid,
>>> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>>>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>> ---
>>>  INSTALL.DPDK.md   |  60 ++++++++++++-
>>>  NEWS              |   3 +-
>>>  lib/netdev-dpdk.c | 248
>>>+++++++++++++++++++++++++++++++++++++-----------------
>>>  3 files changed, 233 insertions(+), 78 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index d892788..4ca98cb 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -878,10 +878,63 @@ by adding the following string:
>>>  to <interface> sections of all network devices used by DPDK.
>>>Parameter 'N'
>>>  determines how many queues can be used by the guest.
>>>
>>> +Jumbo Frames
>>> +------------
>>> +
>>> +Support for Jumbo Frames may be enabled at run-time for DPDK-type
>>>ports.
>>> +
>>> +To avail of Jumbo Frame support, add the 'mtu_request' option to the
>>>ovs-vsctl
>>> +'add-port' command-line, along with the required MTU for the port.
>>> +e.g.
>>> +
>>> +     ```
>>> +     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>>options:mtu_request=9000
>>> +     ```
>>> +
>>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf
>>>segments are
>>> +increased, such that a full Jumbo Frame may be accommodated inside a
>>>single
>>> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
>>> +
>>> +Note that from an OVSDB perspective, the `mtu_request` option for a
>>>specific
>>> +port may be disregarded once initially set, as subsequent
>>>modifications to this
>>> +field are disregarded by the DPDK port. As with non-DPDK ports, the
>>>MTU of DPDK
>>> +ports is reported by the `Interface` table's 'mtu' field.
>>> +
>>> +Jumbo frame support has been validated against 13312B frames, using
>>>the
>>> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers
>>>may
>>> +theoretically be supported. Supported port types excludes vHost-Cuse
>>>ports, as
>>> +that feature is pending deprecation.
>>> +
>>> +vHost Ports and Jumbo Frames
>>> +----------------------------
>>> +Jumbo frame support is available for DPDK vHost-User ports only. Some
>>>additional
>>> +configuration is needed to take advantage of this feature:
>>> +
>>> +  1. `mergeable buffers` must be enabled for vHost ports, as
>>>demonstrated in
>>> +      the QEMU command line snippet below:
>>> +
>>> +      ```
>>> +      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>>> +      '-device
>>>virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>>> +      ```
>>> +
>>> +  2. Where virtio devices are bound to the Linux kernel driver in a
>>>guest
>>> +     environment (i.e. interfaces are not bound to an in-guest DPDK
>>>driver), the
>>> +     MTU of those logical network interfaces must also be increased.
>>>This
>>> +     avoids segmentation of Jumbo Frames in the guest. Note that
>>>'MTU' refers
>>> +     to the length of the IP packet only, and not that of the entire
>>>frame.
>>> +
>>> +     e.g. To calculate the exact MTU of a standard IPv4 frame,
>>>subtract the L2
>>> +     header and CRC lengths (i.e. 18B) from the max supported frame
>>>size.
>>> +     So, to set the MTU for a 13312B Jumbo Frame:
>>> +
>>> +      ```
>>> +      ifconfig eth1 mtu 13294
>>> +      ```
>>> +
>>>  Restrictions:
>>>  -------------
>>>
>>> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this
>>>issue.
>>>    - Currently DPDK port does not make use any offload functionality.
>>>    - DPDK-vHost support works with 1G huge pages.
>>>
>>> @@ -922,6 +975,11 @@ Restrictions:
>>>      the next release of DPDK (which includes the above patch) is
>>>available and
>>>      integrated into OVS.
>>>
>>> +  Jumbo Frames:
>>> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This
>>>is a DPDK
>>> +     issue that is currently being investigated.
>>> +  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse
>>>ports.
>>> +
>>>  Bug Reporting:
>>>  --------------
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 3e33871..43127f9 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -10,10 +10,11 @@ Post-v2.5.0
>>>     - DPDK:
>>>       * New option "n_rxq" for PMD interfaces.
>>>         Old 'other_config:n-dpdk-rxqs' is no longer supported.
>>> +     * Support Jumbo Frames
>>> +
>>>     - ovs-benchmark: This utility has been removed due to lack of use
>>>and
>>>       bitrot.
>>>
>>> -
>>>  v2.5.0 - xx xxx xxxx
>>>  ---------------------
>>>     - Dropped support for Python older than version 2.7.  As a
>>>consequence,
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 2a06bb5..ac89ee6 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl =
>>>VLOG_RATE_LIMIT_INIT(5, 20);
>>>                                      + sizeof(struct dp_packet)    \
>>>                                      + RTE_PKTMBUF_HEADROOM)
>>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>>> +#define NETDEV_DPDK_MAX_FRAME_LEN   13312
>>> +#define MTU_NOT_SET                 0
>>>
>>>  /* Max and min number of packets in the mempool.  OVS tries to
>>>allocate a
>>>   * mempool with MAX_NB_MBUF: if this fails (because the system
>>>doesn't have
>>> @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev,
>>>int n_rxq, int n_txq)
>>>  {
>>>      int diag = 0;
>>>      int i;
>>> +    struct rte_eth_conf conf = port_conf;
>>>
>>>      /* A device may report more queues than it makes available (this
>>>has
>>>       * been observed for Intel xl710, which reserves some of them for
>>> @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev,
>>>int n_rxq, int
>>n_txq)
>>>              VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq,
>>>n_txq);
>>>          }
>>>
>>> -        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq,
>>>&port_conf);
>>> +        if (dev->mtu > ETHER_MTU) {
>>> +            conf.rxmode.jumbo_frame = 1;
>>> +            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
>>> +        } else {
>>> +            conf.rxmode.jumbo_frame = 0;
>>> +            conf.rxmode.max_rx_pkt_len = 0;
>>> +        }
>>> +
>>> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq,
>>>&conf);
>>>          if (diag) {
>>>              break;
>>>          }
>>> @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
>>>int port_no,
>>>  {
>>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>>      int sid;
>>> -    int err = 0;
>>> -    uint32_t buf_size;
>>>
>>>      ovs_mutex_init(&netdev->mutex);
>>>      ovs_mutex_lock(&netdev->mutex);
>>> @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
>>>int port_no,
>>>      netdev->port_id = port_no;
>>>      netdev->type = type;
>>>      netdev->flags = 0;
>>> -    netdev->mtu = ETHER_MTU;
>>> -    netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
>>> -
>>> -    buf_size = dpdk_buf_size(netdev->mtu);
>>> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
>>>FRAME_LEN_TO_MTU(buf_size));
>>> -    if (!netdev->dpdk_mp) {
>>> -        err = ENOMEM;
>>> -        goto unlock;
>>> -    }
>>> +    netdev->mtu = MTU_NOT_SET;
>>>
>>>      netdev_->n_txq = NR_QUEUE;
>>>      netdev_->n_rxq = NR_QUEUE;
>>> @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_,
>>>unsigned int port_no,
>>>
>>>      if (type == DPDK_DEV_ETH) {
>>>          netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
>>> -        err = dpdk_eth_dev_init(netdev);
>>> -        if (err) {
>>> -            goto unlock;
>>> -        }
>>>      }
>>>
>>>      list_push_back(&dpdk_list, &netdev->list_node);
>>>
>>> -unlock:
>>> -    if (err) {
>>> -        rte_free(netdev->tx_q);
>>> -    }
>>>      ovs_mutex_unlock(&netdev->mutex);
>>> -    return err;
>>> +    return 0;
>>>  }
>>>
>>>  static int
>>> @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const
>>>char prefix[],
>>>      return 0;
>>>  }
>>>
>>> +static void
>>> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>>> +{
>>> +    const char *mtu_str = smap_get(args, "mtu_request");
>>> +    char *end_ptr = NULL;
>>> +    int local_mtu;
>>> +
>>> +    if (!mtu_str) {
>>> +        local_mtu = ETHER_MTU;
>>> +    } else {
>>> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
>>> +        if (local_mtu < ETHER_MTU ||
>>> +            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>>> +            *end_ptr != '\0') {
>>> +            local_mtu = ETHER_MTU;
>>> +            VLOG_WARN("Invalid mtu_request parameter - defaulting to
>>>%d.\n",
>>> +                    local_mtu);
>>> +        } else {
>>> +            VLOG_INFO("mtu_request parameter %d detected.\n",
>>>local_mtu);
>>
>>That message could be VLOG_DBG because it only tells about a parameter
>>being
>>detected and not if the mtu request succeed.  Other than that the patch
>>looks
>>good to me.
>
>Great!
>
>>
>>If no one else has comments to address, maybe a commiter can fix the
>>message
>>level to avoid re-spinning the patchset.
>
>Daniele - are you happy with this, or would you prefer that I spin a new
>version?
>
>>
>>Thanks for the patch!
>
>Thanks for the feedback! :)
>
>>
>>Acked-by: Flavio Leitner <fbl@sysclose.org>
>>
>>--
>>fbl
>>
>>
>>
>>> +        }
>>> +    }
>>> +
>>> +    *mtu = local_mtu;
>>> +}
>>> +
>>>  static int
>>>  vhost_construct_helper(struct netdev *netdev_)
>>>OVS_REQUIRES(dpdk_mutex)
>>>  {
>>> @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev
>>>*netdev, struct smap
>>*args)
>>>      smap_add_format(args, "configured_rx_queues", "%d",
>>>netdev->n_rxq);
>>>      smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
>>>      smap_add_format(args, "configured_tx_queues", "%d",
>>>dev->real_n_txq);
>>> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>>>      ovs_mutex_unlock(&dev->mutex);
>>>
>>>      return 0;
>>>  }
>>>
>>> +/* Set the mtu of DPDK_DEV_ETH ports */
>>> +static int
>>> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>> +{
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    int err, dpdk_mtu;
>>> +    uint32_t buf_size;
>>> +    struct dpdk_mp *mp;
>>> +
>>> +    ovs_mutex_lock(&dpdk_mutex);
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    if (dev->mtu == mtu) {
>>> +        err = 0;
>>> +        goto out;
>>> +    }
>>> +
>>> +    buf_size = dpdk_buf_size(mtu);
>>> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>>> +
>>> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>>> +    if (!mp) {
>>> +        err = ENOMEM;
>>> +        goto out;
>>> +    }
>>> +
>>> +    rte_eth_dev_stop(dev->port_id);
>>> +
>>> +    dev->dpdk_mp = mp;
>>> +    dev->mtu = mtu;
>>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> +
>>> +    err = dpdk_eth_dev_init(dev);
>>> +    if (err) {
>>> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to
>>>default "
>>> +                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
>>> +        dpdk_mp_put(mp);
>>> +        dev->mtu = ETHER_MTU;
>>> +        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>>> +        if (!mp) {
>>> +            err = ENOMEM;
>>> +            goto out;
>>> +        }
>>> +        dev->dpdk_mp = mp;
>>> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> +        dpdk_eth_dev_init(dev);
>>> +        goto out;
>>> +    } else {
>>> +        netdev_change_seq_changed(netdev);
>>> +    }
>>> +out:
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>> +    return err;
>>> +}
>>> +
>>>  static int
>>>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>>  {
>>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    int mtu;
>>>
>>>      ovs_mutex_lock(&dev->mutex);
>>>      netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
>>> @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev,
>>>const struct smap *args)
>>>      netdev_change_seq_changed(netdev);
>>>      ovs_mutex_unlock(&dev->mutex);
>>>
>>> +    dpdk_dev_parse_mtu(args, &mtu);
>>> +
>>> +    if (!dev->mtu) {
>>> +        return netdev_dpdk_set_mtu(netdev, mtu);
>>> +    } else if (mtu != dev->mtu) {
>>> +        VLOG_WARN("Unable to set MTU %d for port %d; this port has
>>>immutable MTU "
>>> +                  "%d\n", mtu, dev->port_id, dev->mtu);
>>> +    }
>>>      return 0;
>>>  }
>>>
>>> @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev
>>>*netdev, int *mtup)
>>>  }
>>>
>>>  static int
>>> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>> -{
>>> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> -    int old_mtu, err, dpdk_mtu;
>>> -    struct dpdk_mp *old_mp;
>>> -    struct dpdk_mp *mp;
>>> -    uint32_t buf_size;
>>> -
>>> -    ovs_mutex_lock(&dpdk_mutex);
>>> -    ovs_mutex_lock(&dev->mutex);
>>> -    if (dev->mtu == mtu) {
>>> -        err = 0;
>>> -        goto out;
>>> -    }
>>> -
>>> -    buf_size = dpdk_buf_size(mtu);
>>> -    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>>> -
>>> -    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>>> -    if (!mp) {
>>> -        err = ENOMEM;
>>> -        goto out;
>>> -    }
>>> -
>>> -    rte_eth_dev_stop(dev->port_id);
>>> -
>>> -    old_mtu = dev->mtu;
>>> -    old_mp = dev->dpdk_mp;
>>> -    dev->dpdk_mp = mp;
>>> -    dev->mtu = mtu;
>>> -    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> -
>>> -    err = dpdk_eth_dev_init(dev);
>>> -    if (err) {
>>> -        dpdk_mp_put(mp);
>>> -        dev->mtu = old_mtu;
>>> -        dev->dpdk_mp = old_mp;
>>> -        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> -        dpdk_eth_dev_init(dev);
>>> -        goto out;
>>> -    }
>>> -
>>> -    dpdk_mp_put(old_mp);
>>> -    netdev_change_seq_changed(netdev);
>>> -out:
>>> -    ovs_mutex_unlock(&dev->mutex);
>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>> -    return err;
>>> -}
>>> -
>>> -static int
>>>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>>
>>>  static int
>>> @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void)
>>>      return 0;
>>>  }
>>>
>>> +/* Set the mtu of DPDK_DEV_VHOST ports */
>>> +static int
>>> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>>> +{
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    int err = 0;
>>> +    struct dpdk_mp *mp;
>>> +
>>> +    ovs_mutex_lock(&dpdk_mutex);
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    if (dev->mtu == mtu) {
>>> +        err = 0;
>>> +        goto out;
>>> +    }
>>> +
>>> +    mp = dpdk_mp_get(dev->socket_id, mtu);
>>> +    if (!mp) {
>>> +        err = ENOMEM;
>>> +        goto out;
>>> +    }
>>> +
>>> +    dev->dpdk_mp = mp;
>>> +    dev->mtu = mtu;
>>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> +
>>> +    netdev_change_seq_changed(netdev);
>>> +out:
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>> +    return err;
>>> +}
>>> +
>>> +static int
>>> +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap
>>>*args)
>>> +{
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    int mtu;
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
>>> +              
>>>netdev->requested_n_rxq), 1);
>>> +    netdev_change_seq_changed(netdev);
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    dpdk_dev_parse_mtu(args, &mtu);
>>> +
>>> +    if (!dev->mtu) {
>>> +        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
>>> +    } else if (mtu != dev->mtu) {
>>> +        VLOG_WARN("Unable to set MTU %d for vhost port; this port has
>>>immutable MTU "
>>> +                  "%d\n", mtu, dev->mtu);
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>>  static void
>>>  dpdk_common_init(void)
>>>  {
>>> @@ -2136,8 +2223,9 @@ unlock_dpdk:
>>>      return err;
>>>  }
>>>
>>> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ,
>>>SEND, \
>>> -    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)
>>>   \
>>> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,
>>>SET_CONFIG, \
>>> +        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,
>>> \
>>> +        GET_STATUS, RXQ_RECV)
>>> \
>>>  {                                                             \
>>>      NAME,                                                     \
>>>      INIT,                       /* init */                    \
>>> @@ -2149,7 +2237,7 @@ unlock_dpdk:
>>>      DESTRUCT,                                                 \
>>>      netdev_dpdk_dealloc,                                      \
>>>      netdev_dpdk_get_config,                                   \
>>> -    netdev_dpdk_set_config,                                   \
>>> +    SET_CONFIG            ,                                   \
>>>      NULL,                       /* get_tunnel_config */       \
>>>      NULL,                       /* build header */            \
>>>      NULL,                       /* push header */             \
>>> @@ -2163,7 +2251,7 @@ unlock_dpdk:
>>>      netdev_dpdk_set_etheraddr,                                \
>>>      netdev_dpdk_get_etheraddr,                                \
>>>      netdev_dpdk_get_mtu,                                      \
>>> -    netdev_dpdk_set_mtu,                                      \
>>> +    SET_MTU,                                                  \
>>>      netdev_dpdk_get_ifindex,                                  \
>>>      GET_CARRIER,                                              \
>>>      netdev_dpdk_get_carrier_resets,                           \
>>> @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class =
>>>          NULL,
>>>          netdev_dpdk_construct,
>>>          netdev_dpdk_destruct,
>>> +        netdev_dpdk_set_config,
>>>          netdev_dpdk_set_multiq,
>>>          netdev_dpdk_eth_send,
>>> +        netdev_dpdk_set_mtu,
>>>          netdev_dpdk_get_carrier,
>>>          netdev_dpdk_get_stats,
>>>          netdev_dpdk_get_features,
>>> @@ -2323,8 +2413,10 @@ static const struct netdev_class
>>>dpdk_ring_class =
>>>          NULL,
>>>          netdev_dpdk_ring_construct,
>>>          netdev_dpdk_destruct,
>>> +        netdev_dpdk_set_config,
>>>          netdev_dpdk_set_multiq,
>>>          netdev_dpdk_ring_send,
>>> +        netdev_dpdk_set_mtu,
>>>          netdev_dpdk_get_carrier,
>>>          netdev_dpdk_get_stats,
>>>          netdev_dpdk_get_features,
>>> @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED
>>>dpdk_vhost_cuse_class =
>>>          dpdk_vhost_cuse_class_init,
>>>          netdev_dpdk_vhost_cuse_construct,
>>>          netdev_dpdk_vhost_destruct,
>>> +        netdev_dpdk_set_config,
>>>          netdev_dpdk_vhost_cuse_set_multiq,
>>>          netdev_dpdk_vhost_send,
>>> +        NULL,
>>>          netdev_dpdk_vhost_get_carrier,
>>>          netdev_dpdk_vhost_get_stats,
>>>          NULL,
>>> @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED
>>>dpdk_vhost_user_class =
>>>          dpdk_vhost_user_class_init,
>>>          netdev_dpdk_vhost_user_construct,
>>>          netdev_dpdk_vhost_destruct,
>>> +        netdev_dpdk_vhost_set_config,
>>>          netdev_dpdk_vhost_set_multiq,
>>>          netdev_dpdk_vhost_send,
>>> +        netdev_dpdk_vhost_set_mtu,
>>>          netdev_dpdk_vhost_get_carrier,
>>>          netdev_dpdk_vhost_get_stats,
>>>          NULL,
>
diff mbox

Patch

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index d892788..4ca98cb 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -878,10 +878,63 @@  by adding the following string:
 to <interface> sections of all network devices used by DPDK. Parameter 'N'
 determines how many queues can be used by the guest.
 
+Jumbo Frames
+------------
+
+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
+
+To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
+'add-port' command-line, along with the required MTU for the port.
+e.g.
+
+     ```
+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
+     ```
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
+increased, such that a full Jumbo Frame may be accommodated inside a single
+mbuf segment. Once set, the MTU for a DPDK port is immutable.
+
+Note that from an OVSDB perspective, the `mtu_request` option for a specific
+port may be disregarded once initially set, as subsequent modifications to this
+field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
+ports is reported by the `Interface` table's 'mtu' field.
+
+Jumbo frame support has been validated against 13312B frames, using the
+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
+theoretically be supported. Supported port types excludes vHost-Cuse ports, as
+that feature is pending deprecation.
+
+vHost Ports and Jumbo Frames
+----------------------------
+Jumbo frame support is available for DPDK vHost-User ports only. Some additional
+configuration is needed to take advantage of this feature:
+
+  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+      the QEMU command line snippet below:
+
+      ```
+      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+      '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+      ```
+
+  2. Where virtio devices are bound to the Linux kernel driver in a guest
+     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
+     MTU of those logical network interfaces must also be increased. This
+     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+     to the length of the IP packet only, and not that of the entire frame.
+
+     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+     header and CRC lengths (i.e. 18B) from the max supported frame size.
+     So, to set the MTU for a 13312B Jumbo Frame:
+
+      ```
+      ifconfig eth1 mtu 13294
+      ```
+
 Restrictions:
 -------------
 
-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
   - Currently DPDK port does not make use any offload functionality.
   - DPDK-vHost support works with 1G huge pages.
 
@@ -922,6 +975,11 @@  Restrictions:
     the next release of DPDK (which includes the above patch) is available and
     integrated into OVS.
 
+  Jumbo Frames:
+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
+     issue that is currently being investigated.
+  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
+
 Bug Reporting:
 --------------
 
diff --git a/NEWS b/NEWS
index 3e33871..43127f9 100644
--- a/NEWS
+++ b/NEWS
@@ -10,10 +10,11 @@  Post-v2.5.0
    - DPDK:
      * New option "n_rxq" for PMD interfaces.
        Old 'other_config:n-dpdk-rxqs' is no longer supported.
+     * Support Jumbo Frames
+
    - ovs-benchmark: This utility has been removed due to lack of use and
      bitrot.
 
-
 v2.5.0 - xx xxx xxxx
 ---------------------
    - Dropped support for Python older than version 2.7.  As a consequence,
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2a06bb5..ac89ee6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -77,6 +77,8 @@  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                     + sizeof(struct dp_packet)    \
                                     + RTE_PKTMBUF_HEADROOM)
 #define NETDEV_DPDK_MBUF_ALIGN      1024
+#define NETDEV_DPDK_MAX_FRAME_LEN   13312
+#define MTU_NOT_SET                 0
 
 /* Max and min number of packets in the mempool.  OVS tries to allocate a
  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -422,6 +424,7 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
+    struct rte_eth_conf conf = port_conf;
 
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
@@ -433,7 +436,15 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
         }
 
-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+        if (dev->mtu > ETHER_MTU) {
+            conf.rxmode.jumbo_frame = 1;
+            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+        } else {
+            conf.rxmode.jumbo_frame = 0;
+            conf.rxmode.max_rx_pkt_len = 0;
+        }
+
+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
         if (diag) {
             break;
         }
@@ -574,8 +585,6 @@  netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
 {
     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
     int sid;
-    int err = 0;
-    uint32_t buf_size;
 
     ovs_mutex_init(&netdev->mutex);
     ovs_mutex_lock(&netdev->mutex);
@@ -595,15 +604,7 @@  netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     netdev->port_id = port_no;
     netdev->type = type;
     netdev->flags = 0;
-    netdev->mtu = ETHER_MTU;
-    netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
-
-    buf_size = dpdk_buf_size(netdev->mtu);
-    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
-    if (!netdev->dpdk_mp) {
-        err = ENOMEM;
-        goto unlock;
-    }
+    netdev->mtu = MTU_NOT_SET;
 
     netdev_->n_txq = NR_QUEUE;
     netdev_->n_rxq = NR_QUEUE;
@@ -612,20 +613,12 @@  netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
 
     if (type == DPDK_DEV_ETH) {
         netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
-        err = dpdk_eth_dev_init(netdev);
-        if (err) {
-            goto unlock;
-        }
     }
 
     list_push_back(&dpdk_list, &netdev->list_node);
 
-unlock:
-    if (err) {
-        rte_free(netdev->tx_q);
-    }
     ovs_mutex_unlock(&netdev->mutex);
-    return err;
+    return 0;
 }
 
 static int
@@ -643,6 +636,31 @@  dpdk_dev_parse_name(const char dev_name[], const char prefix[],
     return 0;
 }
 
+static void
+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
+{
+    const char *mtu_str = smap_get(args, "mtu_request");
+    char *end_ptr = NULL;
+    int local_mtu;
+
+    if (!mtu_str) {
+        local_mtu = ETHER_MTU;
+    } else {
+        local_mtu = strtoul(mtu_str, &end_ptr, 0);
+        if (local_mtu < ETHER_MTU ||
+            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
+            *end_ptr != '\0') {
+            local_mtu = ETHER_MTU;
+            VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
+                    local_mtu);
+        } else {
+            VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
+        }
+    }
+
+    *mtu = local_mtu;
+}
+
 static int
 vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
 {
@@ -773,15 +791,72 @@  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
     smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
     smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
+    smap_add_format(args, "mtu", "%d", dev->mtu);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_ETH ports */
+static int
+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err, dpdk_mtu;
+    uint32_t buf_size;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    buf_size = dpdk_buf_size(mtu);
+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
+
+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    rte_eth_dev_stop(dev->port_id);
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    err = dpdk_eth_dev_init(dev);
+    if (err) {
+        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
+                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
+        dpdk_mp_put(mp);
+        dev->mtu = ETHER_MTU;
+        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
+        if (!mp) {
+            err = ENOMEM;
+            goto out;
+        }
+        dev->dpdk_mp = mp;
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+        dpdk_eth_dev_init(dev);
+        goto out;
+    } else {
+        netdev_change_seq_changed(netdev);
+    }
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
 static int
 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
 
     ovs_mutex_lock(&dev->mutex);
     netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
@@ -789,6 +864,14 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
     netdev_change_seq_changed(netdev);
     ovs_mutex_unlock(&dev->mutex);
 
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
+                  "%d\n", mtu, dev->port_id, dev->mtu);
+    }
     return 0;
 }
 
@@ -1407,57 +1490,6 @@  netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    int old_mtu, err, dpdk_mtu;
-    struct dpdk_mp *old_mp;
-    struct dpdk_mp *mp;
-    uint32_t buf_size;
-
-    ovs_mutex_lock(&dpdk_mutex);
-    ovs_mutex_lock(&dev->mutex);
-    if (dev->mtu == mtu) {
-        err = 0;
-        goto out;
-    }
-
-    buf_size = dpdk_buf_size(mtu);
-    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-    if (!mp) {
-        err = ENOMEM;
-        goto out;
-    }
-
-    rte_eth_dev_stop(dev->port_id);
-
-    old_mtu = dev->mtu;
-    old_mp = dev->dpdk_mp;
-    dev->dpdk_mp = mp;
-    dev->mtu = mtu;
-    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-    err = dpdk_eth_dev_init(dev);
-    if (err) {
-        dpdk_mp_put(mp);
-        dev->mtu = old_mtu;
-        dev->dpdk_mp = old_mp;
-        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-        dpdk_eth_dev_init(dev);
-        goto out;
-    }
-
-    dpdk_mp_put(old_mp);
-    netdev_change_seq_changed(netdev);
-out:
-    ovs_mutex_unlock(&dev->mutex);
-    ovs_mutex_unlock(&dpdk_mutex);
-    return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
 
 static int
@@ -2000,6 +2032,61 @@  dpdk_vhost_user_class_init(void)
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_VHOST ports */
+static int
+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    mp = dpdk_mp_get(dev->socket_id, mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    netdev_change_seq_changed(netdev);
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
+
+    ovs_mutex_lock(&dev->mutex);
+    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
+                                               netdev->requested_n_rxq), 1);
+    netdev_change_seq_changed(netdev);
+    ovs_mutex_unlock(&dev->mutex);
+
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
+                  "%d\n", mtu, dev->mtu);
+    }
+    return 0;
+}
+
 static void
 dpdk_common_init(void)
 {
@@ -2136,8 +2223,9 @@  unlock_dpdk:
     return err;
 }
 
-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
-    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
+        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,   \
+        GET_STATUS, RXQ_RECV)                                          \
 {                                                             \
     NAME,                                                     \
     INIT,                       /* init */                    \
@@ -2149,7 +2237,7 @@  unlock_dpdk:
     DESTRUCT,                                                 \
     netdev_dpdk_dealloc,                                      \
     netdev_dpdk_get_config,                                   \
-    netdev_dpdk_set_config,                                   \
+    SET_CONFIG            ,                                   \
     NULL,                       /* get_tunnel_config */       \
     NULL,                       /* build header */            \
     NULL,                       /* push header */             \
@@ -2163,7 +2251,7 @@  unlock_dpdk:
     netdev_dpdk_set_etheraddr,                                \
     netdev_dpdk_get_etheraddr,                                \
     netdev_dpdk_get_mtu,                                      \
-    netdev_dpdk_set_mtu,                                      \
+    SET_MTU,                                                  \
     netdev_dpdk_get_ifindex,                                  \
     GET_CARRIER,                                              \
     netdev_dpdk_get_carrier_resets,                           \
@@ -2309,8 +2397,10 @@  static const struct netdev_class dpdk_class =
         NULL,
         netdev_dpdk_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_eth_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2323,8 +2413,10 @@  static const struct netdev_class dpdk_ring_class =
         NULL,
         netdev_dpdk_ring_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_ring_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2337,8 +2429,10 @@  static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
         dpdk_vhost_cuse_class_init,
         netdev_dpdk_vhost_cuse_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_vhost_cuse_set_multiq,
         netdev_dpdk_vhost_send,
+        NULL,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,
@@ -2351,8 +2445,10 @@  static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
         dpdk_vhost_user_class_init,
         netdev_dpdk_vhost_user_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_vhost_set_config,
         netdev_dpdk_vhost_set_multiq,
         netdev_dpdk_vhost_send,
+        netdev_dpdk_vhost_set_mtu,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,