[ovs-dev,v4,01/15] netdev-dpdk: Differentiate between mtu/mbuf size.

Message ID 1531220774-219768-2-git-send-email-tiago.lam@intel.com
State Changes Requested
Delegated to: Ian Stokes
Series: Support multi-segment mbufs

Commit Message

Lam, Tiago July 10, 2018, 11:06 a.m. UTC
When configuring a mempool, in netdev_dpdk_mempool_configure(), the
result of a call to dpdk_buf_size() is being used as the MTU. This,
however, is not the MTU but a ROUND_UP-aligned number based on the MTU,
which could lead to the reuse of mempools even when the real MTUs
between different ports differ but happen to round up to the same mbuf
size.

To support multi-segment mbufs, which are coming in later commits, this
distinction is important. For multi-segment mbufs the mbuf size is
always the same (multiple mbufs are chained together to hold the
packet's data), which means that using the mbuf size as the mtu would
always lead to reusing the same mempool, independently of the MTU set.

To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
the mbuf_size, thus creating a distinction between the two. The latter
is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
while the former is used for tracking purposes in struct dpdk_mp and as
an input to create a unique hash for the mempool's name.

Signed-off-by: Tiago Lam <tiago.lam@intel.com>
---
 lib/netdev-dpdk.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
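
To see the rounding concretely, here is a minimal, self-contained
sketch of the behaviour described above. The macros are simplified
stand-ins for the real definitions in lib/netdev-dpdk.c (the exact
constants and formulas there may differ); it only shows how two
distinct MTUs can land in the same aligned bucket:

    #include <stdio.h>

    /* Simplified stand-ins for the OVS macros involved. */
    #define ROUND_UP(x, y) (((x) + (y) - 1) / (y) * (y))
    #define NETDEV_DPDK_MBUF_ALIGN 1024
    #define ETHER_HDR_MAX_LEN 22   /* assumed: L2 header + CRC + VLANs */
    #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)

    static unsigned
    buf_size_for_mtu(int mtu)
    {
        return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN);
    }

    int
    main(void)
    {
        /* MTUs 1500 and 1800 both round up to a 2048-byte buffer, so
         * a mempool keyed on the rounded size cannot tell the two
         * ports apart. */
        printf("mtu 1500 -> buf %u\n", buf_size_for_mtu(1500)); /* 2048 */
        printf("mtu 1800 -> buf %u\n", buf_size_for_mtu(1800)); /* 2048 */
        return 0;
    }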

Comments

Lam, Tiago July 10, 2018, 11:09 a.m. UTC | #1
On 10/07/2018 12:06, Tiago Lam wrote:
> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> result of a call to dpdk_buf_size() is being used as the MTU. This,
> however, is not the MTU but a ROUND_UP-aligned number based on the MTU,
> which could lead to the reuse of mempools even when the real MTUs
> between different ports differ but happen to round up to the same mbuf
> size.
> 
> To support multi-segment mbufs, which are coming in later commits, this
> distinction is important. For multi-segment mbufs the mbuf size is
> always the same (multiple mbufs are chained together to hold the
> packet's data), which means that using the mbuf size as the mtu would
> always lead to reusing the same mempool, independently of the MTU set.
> 
> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
> the mbuf_size, thus creating a distinction between the two. The latter
> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
> while the former is used for tracking purposes in struct dpdk_mp and as
> an input to create a unique hash for the mempool's name.
> 
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> ---
>  lib/netdev-dpdk.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 

Hi Ian,

This patch changes the way the dpdk_mp struct tracks the "mtu", a
mechanism introduced in "Support both shared and per port mempools".
Instead of assigning the "buf_size" to the "mtu" member in the dpdk_mp
struct, we now assign the "requested_mtu".

This was needed because for multi-segment mbufs the "mbuf_size" may end
up always being the same, even for different MTUs (as the mbufs are
chained together to hold all the data), which would mean the same
mempool always ends up being reused.

I don't expect this to change the functionality (I ran a few tests
myself to confirm the correct behavior), but does this sound OK to you?
Any thoughts here?

Thanks,
Tiago.
Eelco Chaudron July 10, 2018, 12:54 p.m. UTC | #2
On 10 Jul 2018, at 13:06, Tiago Lam wrote:

> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> result of a call to dpdk_buf_size() is being used as the MTU. This,
> however, is not the MTU but a ROUND_UP-aligned number based on the MTU,
> which could lead to the reuse of mempools even when the real MTUs
> between different ports differ but happen to round up to the same mbuf
> size.
>
> To support multi-segment mbufs, which are coming in later commits, this
> distinction is important. For multi-segment mbufs the mbuf size is
> always the same (multiple mbufs are chained together to hold the
> packet's data), which means that using the mbuf size as the mtu would
> always lead to reusing the same mempool, independently of the MTU set.
>
> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
> the mbuf_size, thus creating a distinction between the two. The latter
> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
> while the former is used for tracking purposes in struct dpdk_mp and as
> an input to create a unique hash for the mempool's name.
>
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> ---

Changes look good to me.

Acked-by: Eelco Chaudron <echaudro@redhat.com>


<SNIP>
Stokes, Ian July 11, 2018, 11:40 a.m. UTC | #3
On 7/10/2018 12:06 PM, Tiago Lam wrote:
> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> result of a call to dpdk_buf_size() is being used as the MTU. This,
> however, is not the MTU but a ROUND_UP-aligned number based on the MTU,
> which could lead to the reuse of mempools even when the real MTUs
> between different ports differ but happen to round up to the same mbuf
> size.
> 
> To support multi-segment mbufs, which are coming in later commits, this
> distinction is important. For multi-segment mbufs the mbuf size is
> always the same (multiple mbufs are chained together to hold the
> packet's data), which means that using the mbuf size as the mtu would
> always lead to reusing the same mempool, independently of the MTU set.
> 
> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
> the mbuf_size, thus creating a distinction between the two. The latter
> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
> while the former is used for tracking purposes in struct dpdk_mp and as
> an input to create a unique hash for the mempool's name.
> 
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>

Hi Tiago,

I don't think these changes are acceptable as they break from the 
expected behavior of the shared mempool model.

The reason we introduced the shared mempool model was that it was the
model used from OVS 2.6 -> 2.9.

A user with the same deployment moving between releases would not have 
to re-calculate the memory they dimension for OVS DPDK.

With these changes however they may have to.

Consider the following example:

A user adds a port with mtu 1500 and a second port with mtu 1800. 
Assuming both ports are on the same socket, the same mempool would be 
re-used for both ports in the current shared mempool model.

However, with this change separate mempools would have to be created,
as the MTUs are different, potentially requiring an increase in the
memory provisioned by the user. It may not be possible for users to
upgrade from OVS 2.9 to 2.10 without re-dimensioning their memory.
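
As a back-of-the-envelope sketch of that cost (a minimal example with
assumed figures: the shared model historically requests up to 4096 * 64
mbufs per pool, and an mbuf for a ~1500-byte MTU occupies roughly 2 KiB
of data room; the exact numbers depend on the build):

    #include <stdio.h>

    int
    main(void)
    {
        const unsigned n_mbufs = 4096 * 64;  /* assumed per-pool count */
        const unsigned mbuf_bytes = 2048;    /* assumed per-mbuf size */

        /* One extra pool for the mtu-1800 port that previously shared
         * the mtu-1500 pool: roughly half a gigabyte of hugepages. */
        printf("extra memory: ~%u MiB\n",
               n_mbufs / 1024 * mbuf_bytes / 1024);
        return 0;
    }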

We had a similar issue to this in the OVS 2.9 release with the per-port
memory model; it was flagged as a blocking factor for Red Hat and
Ericsson upgrades, which led to the removal of the per-port model in 2.9.

One possible solution here is that multiseg is supported for the
per-port memory model only?

Ian

> ---
>   lib/netdev-dpdk.c | 22 +++++++++++++---------
>   1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bb4d60f..be8e8b9 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -632,7 +632,8 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>   }
>   
>   static struct dpdk_mp *
> -dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> +dpdk_mp_create(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
> +               bool per_port_mp)
>   {
>       char mp_name[RTE_MEMPOOL_NAMESIZE];
>       const char *netdev_name = netdev_get_name(&dev->up);
> @@ -666,21 +667,23 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>           if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>               VLOG_DBG("snprintf returned %d. "
>                        "Failed to generate a mempool name for \"%s\". "
> -                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
> -                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
> +                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u, "
> +                     "mbuf size: %u",
> +                     ret, netdev_name, hash, socket_id, mtu, n_mbufs,
> +                     mbuf_size);
>               break;
>           }
>   
> -        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
> +        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
>                     "on socket %d for %d Rx and %d Tx queues.",
> -                  netdev_name, n_mbufs, socket_id,
> +                  netdev_name, n_mbufs, mbuf_size, socket_id,
>                     dev->requested_n_rxq, dev->requested_n_txq);
>   
>           dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
>                                             MP_CACHE_SZ,
>                                             sizeof (struct dp_packet)
>                                             - sizeof (struct rte_mbuf),
> -                                          MBUF_SIZE(mtu)
> +                                          MBUF_SIZE(mbuf_size)
>                                             - sizeof(struct dp_packet),
>                                             socket_id);
>   
> @@ -718,7 +721,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>   }
>   
>   static struct dpdk_mp *
> -dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> +dpdk_mp_get(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
> +            bool per_port_mp)
>   {
>       struct dpdk_mp *dmp, *next;
>       bool reuse = false;
> @@ -741,7 +745,7 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>       dpdk_mp_sweep();
>   
>       if (!reuse) {
> -        dmp = dpdk_mp_create(dev, mtu, per_port_mp);
> +        dmp = dpdk_mp_create(dev, mtu, mbuf_size, per_port_mp);
>           if (dmp) {
>               /* Shared memory will hit the reuse case above so will not
>                * request a mempool that already exists but we need to check
> @@ -807,7 +811,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>           return ret;
>       }
>   
> -    dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
> +    dmp = dpdk_mp_get(dev, dev->requested_mtu, buf_size, per_port_mp);
>       if (!dmp) {
>           VLOG_ERR("Failed to create memory pool for netdev "
>                    "%s, with MTU %d on socket %d: %s\n",
>
Lam, Tiago July 11, 2018, 1:45 p.m. UTC | #4
On 11/07/2018 12:40, Ian Stokes wrote:
> On 7/10/2018 12:06 PM, Tiago Lam wrote:
>> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
>> result of a call to dpdk_buf_size() is being used as the MTU. This,
>> however, is not the MTU but a ROUND_UP-aligned number based on the MTU,
>> which could lead to the reuse of mempools even when the real MTUs
>> between different ports differ but happen to round up to the same mbuf
>> size.
>>
>> To support multi-segment mbufs, which are coming in later commits, this
>> distinction is important. For multi-segment mbufs the mbuf size is
>> always the same (multiple mbufs are chained together to hold the
>> packet's data), which means that using the mbuf size as the mtu would
>> always lead to reusing the same mempool, independently of the MTU set.
>>
>> To fix this, two fields are now passed to dpdk_mp_create(), the mtu and
>> the mbuf_size, thus creating a distinction between the two. The latter
>> is used for telling rte_pktmbuf_pool_create() the size of each mbuf,
>> while the former is used for tracking purposes in struct dpdk_mp and as
>> an input to create a unique hash for the mempool's name.
>>
>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> 
> Hi Tiago,
> 
> I don't think these changes are acceptable as they break from the 
> expected behavior of the shared mempool model.
> 
> The reason we introduced the shared mempool model was that it was the
> model used from OVS 2.6 -> 2.9.
> 
> A user with the same deployment moving between releases would not have 
> to re-calculate the memory they dimension for OVS DPDK.
> 
> With these changes however they may have to.
> 
> Consider the following example:
> 
> A user adds a port with mtu 1500 and a second port with mtu 1800. 
> Assuming both ports are on the same socket, the same mempool would be 
> re-used for both ports in the current shared mempool model.

> However, with this change separate mempools would have to be created,
> as the MTUs are different, potentially requiring an increase in the
> memory provisioned by the user. It may not be possible for users to
> upgrade from OVS 2.9 to 2.10 without re-dimensioning their memory.
> 

Hi Ian,

Thanks for the context here.

It's a valid point. I did consider this, but wasn't aware that users /
deployments were also taking this round up into account. In that case,
this could certainly cause problems during upgrade.

> We had a similar issue to this in the OVS 2.9 release with the
> per-port memory model; it was flagged as a blocking factor for Red Hat
> and Ericsson upgrades, which led to the removal of the per-port model
> in 2.9.
> 
> One possible solution here is that multiseg is supported for the
> per-port memory model only?
> 

It seems we can adapt this in a different way. I was trying to reuse the
"mbuf_size" for both cases: in the per-port and shared mempool case,
where it is used as the MTU, and in the multi-segment case, where it is
used to tell rte_pktmbuf_pool_create() the size of each mbuf. That is
why I thought it would be better to differentiate between the real MTU
and the "mbuf_size". Instead, we can calculate two "mbuf_sizes" - the
one the shared and per-port mempool models use for tracking MTU changes,
and, in case multi-segment mbufs are in use, the "mbuf_size" to use when
creating the mempool. This should suffice.

I'll update the series with this; patch 01/15 will probably be dropped
entirely, and patch 12/15 will likely be the one taking the changes
mentioned above.

I'll also update patches 05/15 and 11/15 to fix the Travis CI.

Thanks,
Tiago.
Stokes, Ian July 11, 2018, 1:47 p.m. UTC | #5
> On 11/07/2018 12:40, Ian Stokes wrote:
> > On 7/10/2018 12:06 PM, Tiago Lam wrote:
> >> When configuring a mempool, in netdev_dpdk_mempool_configure(), the
> >> result of a call to dpdk_buf_size() is being used as the MTU. This,
> >> however, is not the MTU but a ROUND_UP-aligned number based on the
> >> MTU, which could lead to the reuse of mempools even when the real
> >> MTUs between different ports differ but happen to round up to the
> >> same mbuf size.
> >>
> >> To support multi-segment mbufs, which are coming in later commits,
> >> this distinction is important. For multi-segment mbufs the mbuf size
> >> is always the same (multiple mbufs are chained together to hold the
> >> packet's data), which means that using the mbuf size as the mtu
> >> would always lead to reusing the same mempool, independently of the
> >> MTU set.
> >>
> >> To fix this, two fields are now passed to dpdk_mp_create(), the mtu
> >> and the mbuf_size, thus creating a distinction between the two. The
> >> latter is used for telling rte_pktmbuf_pool_create() the size of each
> >> mbuf, while the former is used for tracking purposes in struct
> >> dpdk_mp and as an input to create a unique hash for the mempool's name.
> >>
> >> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> >
> > Hi Tiago,
> >
> > I don't think these changes are acceptable as they break from the
> > expected behavior of the shared mempool model.
> >
> > The reason we introduced the shared mempool model was that it was
> > the model used from OVS 2.6 -> 2.9.
> >
> > A user with the same deployment moving between releases would not have
> > to re-calculate the memory they dimension for OVS DPDK.
> >
> > With these changes however they may have to.
> >
> > Consider the following example:
> >
> > A user adds a port with mtu 1500 and a second port with mtu 1800.
> > Assuming both ports are on the same socket, the same mempool would be
> > re-used for both ports in the current shared mempool model.
> 
> > However, with this change separate mempools would have to be
> > created, as the MTUs are different, potentially requiring an
> > increase in the memory provisioned by the user. It may not be
> > possible for users to upgrade from OVS 2.9 to 2.10 without
> > re-dimensioning their memory.
> >
> 
> Hi Ian,
> 
> Thanks for the context here.
> 
> It's a valid point. I did consider this, but wasn't aware that users /
> deployments were also taking this round up into account. In that case,
> this could certainly cause problems during upgrade.
> 
> > We had a similar issue to this in the OVS 2.9 release with the
> > per-port memory model; it was flagged as a blocking factor for Red
> > Hat and Ericsson upgrades, which led to the removal of the per-port
> > model in 2.9.
> >
> > One possible solution here is that multiseg is supported for the
> > per-port memory model only?
> >
> 
> It seems we can adapt this in a different way. I was trying to reuse
> the "mbuf_size" for both cases: in the per-port and shared mempool
> case, where it is used as the MTU, and in the multi-segment case,
> where it is used to tell rte_pktmbuf_pool_create() the size of each
> mbuf. That is why I thought it would be better to differentiate
> between the real MTU and the "mbuf_size". Instead, we can calculate
> two "mbuf_sizes" - the one the shared and per-port mempool models use
> for tracking MTU changes, and, in case multi-segment mbufs are in use,
> the "mbuf_size" to use when creating the mempool. This should suffice.
> 
> I'll update the series with this; patch 01/15 will probably be dropped
> entirely, and patch 12/15 will likely be the one taking the changes
> mentioned above.
> 
> I'll also update patches 05/15 and 11/15 to fix the Travis CI.
> 
> Thanks,
> Tiago.

That sounds ok, will review in the next revision.

Thanks
Ian

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bb4d60f..be8e8b9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -632,7 +632,8 @@  dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
 }
 
 static struct dpdk_mp *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
+               bool per_port_mp)
 {
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     const char *netdev_name = netdev_get_name(&dev->up);
@@ -666,21 +667,23 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
         if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
             VLOG_DBG("snprintf returned %d. "
                      "Failed to generate a mempool name for \"%s\". "
-                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
-                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u, "
+                     "mbuf size: %u",
+                     ret, netdev_name, hash, socket_id, mtu, n_mbufs,
+                     mbuf_size);
             break;
         }
 
-        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
+        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
                   "on socket %d for %d Rx and %d Tx queues.",
-                  netdev_name, n_mbufs, socket_id,
+                  netdev_name, n_mbufs, mbuf_size, socket_id,
                   dev->requested_n_rxq, dev->requested_n_txq);
 
         dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
                                           MP_CACHE_SZ,
                                           sizeof (struct dp_packet)
                                           - sizeof (struct rte_mbuf),
-                                          MBUF_SIZE(mtu)
+                                          MBUF_SIZE(mbuf_size)
                                           - sizeof(struct dp_packet),
                                           socket_id);
 
@@ -718,7 +721,8 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
 }
 
 static struct dpdk_mp *
-dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
+            bool per_port_mp)
 {
     struct dpdk_mp *dmp, *next;
     bool reuse = false;
@@ -741,7 +745,7 @@  dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
     dpdk_mp_sweep();
 
     if (!reuse) {
-        dmp = dpdk_mp_create(dev, mtu, per_port_mp);
+        dmp = dpdk_mp_create(dev, mtu, mbuf_size, per_port_mp);
         if (dmp) {
             /* Shared memory will hit the reuse case above so will not
              * request a mempool that already exists but we need to check
@@ -807,7 +811,7 @@  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
         return ret;
     }
 
-    dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
+    dmp = dpdk_mp_get(dev, dev->requested_mtu, buf_size, per_port_mp);
     if (!dmp) {
         VLOG_ERR("Failed to create memory pool for netdev "
                  "%s, with MTU %d on socket %d: %s\n",