
[ovs-dev,RFC,V4,9/9] netdev-dpdk: support multi-segment jumbo frames

Message ID 1512734518-103757-10-git-send-email-mark.b.kavanagh@intel.com
State Changes Requested
Delegated to: Ian Stokes
Series netdev-dpdk: support multi-segment mbufs

Commit Message

Mark Kavanagh Dec. 8, 2017, 12:01 p.m. UTC
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
 NEWS                 |  1 +
 lib/dpdk.c           |  7 +++++++
 lib/netdev-dpdk.c    | 52 +++++++++++++++++++++++++++++++++++++++++++++-------
 lib/netdev-dpdk.h    |  1 +
 vswitchd/vswitch.xml | 20 ++++++++++++++++++++
 5 files changed, 74 insertions(+), 7 deletions(-)

Comments

Chandran, Sugesh Dec. 8, 2017, 6:04 p.m. UTC | #1
Hi Mark,

For some reason, I could not apply this patch cleanly. 
I couldn't do much of testing on the feature as such. 
Can you please send a proper Patch after rebase.

Regards
_Sugesh


> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Friday, December 8, 2017 12:02 PM
> To: dev@openvswitch.org; qiudayu@chinac.com
> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>;
> santosh.shukla@caviumnetworks.com; Chandran, Sugesh
> <sugesh.chandran@intel.com>; Kavanagh, Mark B
> <mark.b.kavanagh@intel.com>
> Subject: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
> jumbo frames
> 

[Snip] 
> 1.9.3
Mark Kavanagh Dec. 11, 2017, 11:49 a.m. UTC | #2
>From: Chandran, Sugesh
>Sent: Friday, December 8, 2017 6:05 PM
>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org;
>qiudayu@chinac.com
>Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
><ciara.loftus@intel.com>; santosh.shukla@caviumnetworks.com
>Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
>jumbo frames
>
>Hi Mark,
>
>For some reason, I could not apply this patch cleanly.

Apologies for this Sugesh, I'll send an updated version soon.
-Mark

>I couldn't do much of testing on the feature as such.
>Can you please send a proper Patch after rebase.
>
>Regards
>_Sugesh
>
>
>> -----Original Message-----
>> From: Kavanagh, Mark B
>> Sent: Friday, December 8, 2017 12:02 PM
>> To: dev@openvswitch.org; qiudayu@chinac.com
>> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
><ciara.loftus@intel.com>;
>> santosh.shukla@caviumnetworks.com; Chandran, Sugesh
>> <sugesh.chandran@intel.com>; Kavanagh, Mark B
>> <mark.b.kavanagh@intel.com>
>> Subject: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
>> jumbo frames
>>
>
>[Snip]
>> 1.9.3
Mark Kavanagh Dec. 11, 2017, 11:58 a.m. UTC | #3
>From: Kavanagh, Mark B
>Sent: Monday, December 11, 2017 11:49 AM
>To: Chandran, Sugesh <sugesh.chandran@intel.com>; dev@openvswitch.org;
>qiudayu@chinac.com
>Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
><ciara.loftus@intel.com>; santosh.shukla@caviumnetworks.com
>Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
>jumbo frames
>
>>From: Chandran, Sugesh
>>Sent: Friday, December 8, 2017 6:05 PM
>>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; dev@openvswitch.org;
>>qiudayu@chinac.com
>>Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
>><ciara.loftus@intel.com>; santosh.shukla@caviumnetworks.com
>>Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
>>jumbo frames
>>
>>Hi Mark,
>>
>>For some reason, I could not apply this patch cleanly.
>
>Apologies for this Sugesh, I'll send an updated version soon.
>-Mark

Addendum: I think I know what happened Sugesh.

This patchset was built on the DPDK v17.11 upgrade patchset (this was mentioned in the cover letter!) - did you apply the former in advance of applying the multi-segment patchset?

Thanks again,
Mark

>
>>I couldn't do much of testing on the feature as such.
>>Can you please send a proper Patch after rebase.
>>
>>Regards
>>_Sugesh
>>
>>
>>> -----Original Message-----
>>> From: Kavanagh, Mark B
>>> Sent: Friday, December 8, 2017 12:02 PM
>>> To: dev@openvswitch.org; qiudayu@chinac.com
>>> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
>><ciara.loftus@intel.com>;
>>> santosh.shukla@caviumnetworks.com; Chandran, Sugesh
>>> <sugesh.chandran@intel.com>; Kavanagh, Mark B
>>> <mark.b.kavanagh@intel.com>
>>> Subject: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
>>> jumbo frames
>>>
>>
>>[Snip]
>>> 1.9.3
Chandran, Sugesh Dec. 12, 2017, 12:23 p.m. UTC | #4
Regards
_Sugesh


> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Monday, December 11, 2017 11:59 AM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>; dev@openvswitch.org;
> qiudayu@chinac.com
> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara <ciara.loftus@intel.com>;
> santosh.shukla@caviumnetworks.com
> Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support multi-segment
> jumbo frames
> 
> >From: Kavanagh, Mark B
> >Sent: Monday, December 11, 2017 11:49 AM
> >To: Chandran, Sugesh <sugesh.chandran@intel.com>; dev@openvswitch.org;
> >qiudayu@chinac.com
> >Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
> ><ciara.loftus@intel.com>; santosh.shukla@caviumnetworks.com
> >Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support
> >multi-segment jumbo frames
> >
> >>From: Chandran, Sugesh
> >>Sent: Friday, December 8, 2017 6:05 PM
> >>To: Kavanagh, Mark B <mark.b.kavanagh@intel.com>;
> dev@openvswitch.org;
> >>qiudayu@chinac.com
> >>Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
> >><ciara.loftus@intel.com>; santosh.shukla@caviumnetworks.com
> >>Subject: RE: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support
> >>multi-segment jumbo frames
> >>
> >>Hi Mark,
> >>
> >>For some reason, I could not apply this patch cleanly.
> >
> >Apologies for this Sugesh, I'll send an updated version soon.
> >-Mark
> 
> Addendum: I think I know what happened Sugesh.
> 
> This patchset was built on the DPDK v17.11 upgrade patchset (this was
> mentioned in the cover letter!) - did you apply the former in advance of applying
> the multi-segment patchset?
[Sugesh] Apologies Mark, I do have the DPDK 17.11 upgrade in my repo,
but it's from the previous version. That caused the conflict.
My bad :(
> 
> Thanks again,
> Mark
> 
> >
> >>I couldn't do much of testing on the feature as such.
> >>Can you please send a proper Patch after rebase.
> >>
> >>Regards
> >>_Sugesh
> >>
> >>
> >>> -----Original Message-----
> >>> From: Kavanagh, Mark B
> >>> Sent: Friday, December 8, 2017 12:02 PM
> >>> To: dev@openvswitch.org; qiudayu@chinac.com
> >>> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
> >><ciara.loftus@intel.com>;
> >>> santosh.shukla@caviumnetworks.com; Chandran, Sugesh
> >>> <sugesh.chandran@intel.com>; Kavanagh, Mark B
> >>> <mark.b.kavanagh@intel.com>
> >>> Subject: [ovs-dev][RFC PATCH V4 9/9] netdev-dpdk: support
> >>> multi-segment jumbo frames
> >>>
> >>
> >>[Snip]
> >>> 1.9.3

Patch

diff --git a/NEWS b/NEWS
index d45904e..74a8910 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,7 @@  Post-v2.8.0
    - DPDK:
      * Add support for DPDK v17.11
      * Add support for vHost IOMMU
+     * Add support for multi-segment mbufs
 
 v2.8.0 - 31 Aug 2017
 --------------------
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 6710d10..5023d1a 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -456,6 +456,13 @@  dpdk_init__(const struct smap *ovs_other_config)
 
     /* Finally, register the dpdk classes */
     netdev_dpdk_register();
+
+    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+            "dpdk-multi-seg-mbufs", false);
+    if (multi_seg_mbufs_enable) {
+        VLOG_INFO("DPDK multi-segment mbufs enabled");
+        netdev_dpdk_multi_segment_mbufs_enable();
+    }
 }
 
 void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f83bb9e..a819a8f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -65,6 +65,7 @@  enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
 
 #define DPDK_PORT_WATCHDOG_INTERVAL 5
 
@@ -501,6 +502,7 @@  dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
               + dev->requested_n_txq * dev->requested_txq_size
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
               + MIN_NB_MBUF;
+    /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
 
     ovs_mutex_lock(&dpdk_mp_mutex);
     do {
@@ -588,7 +590,13 @@  dpdk_mp_free(struct rte_mempool *mp)
 
 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ *   fully accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -596,10 +604,18 @@  static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = 0;
     struct rte_mempool *mp;
     int ret = 0;
 
+    /* Contiguous mbufs in use - permit oversized mbufs */
+    if (!dpdk_multi_segment_mbufs) {
+        buf_size = dpdk_buf_size(dev->requested_mtu);
+    } else {
+        /* multi-segment mbufs - use standard mbuf size */
+        buf_size = dpdk_buf_size(ETHER_MTU);
+    }
+
     mp = dpdk_mp_create(dev, buf_size);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
@@ -677,11 +693,25 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;
+    struct rte_eth_txconf txconf;
+
+    /* Multi-segment-mbuf-specific setup. */
+    if (dpdk_multi_segment_mbufs) {
+        struct rte_eth_dev_info dev_info;
+
+        /* DPDK PMDs typically attempt to use simple or vectorized
+         * transmit functions, neither of which is compatible with
+         * multi-segment mbufs. Ensure that these are disabled when
+         * multi-segment mbufs are enabled.
+         */
+        rte_eth_dev_info_get(dev->port_id, &dev_info);
+        txconf = dev_info.default_txconf;
+        txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
 
-    /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
-     * enabled. */
-    if (dev->mtu > ETHER_MTU) {
-        conf.rxmode.enable_scatter = 1;
+        /* For some NICs (e.g. Niantic), scattered_rx mode (required for
+         * ingress jumbo frames when multi-segments are enabled) needs to
+         * be explicitly enabled. */
+        conf.rxmode.enable_scatter = 1;
     }
 
     conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
@@ -712,7 +742,9 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 
         for (i = 0; i < n_txq; i++) {
             diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
-                                          dev->socket_id, NULL);
+                                          dev->socket_id,
+                                          dpdk_multi_segment_mbufs ? &txconf
+                                                                   : NULL);
             if (diag) {
                 VLOG_INFO("Interface %s txq(%d) setup error: %s",
                           dev->up.name, i, rte_strerror(-diag));
@@ -3384,6 +3416,12 @@  unlock:
     return err;
 }
 
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+    dpdk_multi_segment_mbufs = true;
+}
+
 #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,    \
                           SET_CONFIG, SET_TX_MULTIQ, SEND,    \
                           GET_CARRIER, GET_STATS,             \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..a3339fe 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,7 @@  struct dp_packet;
 
 #ifdef DPDK_NETDEV
 
+void netdev_dpdk_multi_segment_mbufs_enable(void);
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 4c317d0..ccce944 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@ 
         </p>
       </column>
 
+      <column name="other_config" key="dpdk-multi-seg-mbufs"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
+        </p>
+        <p>
+            If true, DPDK allocates a single mempool per port, irrespective
+            of the ports' requested MTU sizes. The elements of this mempool are
+            'standard'-sized mbufs (typically ~2k bytes), which may be chained
+            together to accommodate jumbo frames. In this approach, each mbuf
+            typically stores a fragment of the overall jumbo frame.
+        </p>
+        <p>
+            If not specified, defaults to <code>false</code>, in which case,
+            the size of each mbuf within a DPDK port's mempool will be grown to
+            accommodate jumbo frames within a single mbuf.
+        </p>
+      </column>
+
+
       <column name="other_config" key="vhost-sock-dir"
               type='{"type": "string"}'>
         <p>