diff mbox series

[ovs-dev] netdev-dpdk: Add mbuf HEADROOM after alignment.

Message ID 1542382363-251246-1-git-send-email-tiago.lam@intel.com
State Changes Requested
Delegated to: Ian Stokes
Headers show
Series [ovs-dev] netdev-dpdk: Add mbuf HEADROOM after alignment. | expand

Commit Message

Lam, Tiago Nov. 16, 2018, 3:32 p.m. UTC
Commit dfaf00e started using the result of dpdk_buf_size() to calculate
the available size on each mbuf, as opposed to using the previous
MBUF_SIZE macro. However, this was calculating the mbuf size by adding
up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to
NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.

Before alignment:
ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048

After aligment:
ROUNDUP(MTU(1500), 1024) + 128 = 2176

This might seem insignificant, however, it might have performance
implications in DPDK, where each mbuf is expected to have 2k +
RTE_PKTMBUF_HEADROOM of available space. This is because not only some
NICs have course grained alignments of 1k, they will also take
RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf
when setting up their Rx requirements. Thus, only the "After alignment"
case above would guarantee a 2k of available room, as the "Before
alignment" would report only 1920B.

Some extra information can be found at:
https://mails.dpdk.org/archives/dev/2018-November/119219.html

Note: This has been found by Ian Stokes while going through some
af_packet checks.

Reported-by: Ian Stokes <ian.stokes@intel.com>
Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
---
 Documentation/topics/dpdk/memory.rst | 20 ++++++++++----------
 lib/netdev-dpdk.c                    |  6 ++++--
 2 files changed, 14 insertions(+), 12 deletions(-)

Comments

Stokes, Ian Nov. 27, 2018, 1:57 p.m. UTC | #1
> Commit dfaf00e started using the result of dpdk_buf_size() to calculate
> the available size on each mbuf, as opposed to using the previous
> MBUF_SIZE macro. However, this was calculating the mbuf size by adding up
> the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to
> NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
> RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.
> 
> Before alignment:
> ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048
> 
> After aligment:
> ROUNDUP(MTU(1500), 1024) + 128 = 2176
> 
> This might seem insignificant, however, it might have performance
> implications in DPDK, where each mbuf is expected to have 2k +
> RTE_PKTMBUF_HEADROOM of available space. This is because not only some
> NICs have course grained alignments of 1k, they will also take
> RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf
> when setting up their Rx requirements. Thus, only the "After alignment"
> case above would guarantee a 2k of available room, as the "Before
> alignment" would report only 1920B.
> 
> Some extra information can be found at:
> https://mails.dpdk.org/archives/dev/2018-November/119219.html
> 
> Note: This has been found by Ian Stokes while going through some af_packet
> checks.
> 

Hi Tiago,, thanks for this, just a query below.

> Reported-by: Ian Stokes <ian.stokes@intel.com>
> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> ---
>  Documentation/topics/dpdk/memory.rst | 20 ++++++++++----------
>  lib/netdev-dpdk.c                    |  6 ++++--
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/memory.rst
> b/Documentation/topics/dpdk/memory.rst
> index c9b739f..3c4ee17 100644
> --- a/Documentation/topics/dpdk/memory.rst
> +++ b/Documentation/topics/dpdk/memory.rst
> @@ -107,8 +107,8 @@ Example 1
> 
>   MTU = 1500 Bytes
>   Number of mbufs = 262144
> - Mbuf size = 2752 Bytes
> - Memory required = 262144 * 2752 = 721 MB
> + Mbuf size = 2880 Bytes
> + Memory required = 262144 * 2880 = 755 MB

These measurements don't deem to take the header and trailer into account for total size when calculating the memory requirements? Is there any particular reason? 

total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;

I would expect above to be 262144 * 3008 = 788 MB.

Ian
> 
>  Example 2
>  +++++++++
> @@ -116,8 +116,8 @@ Example 2
> 
>   MTU = 1800 Bytes
>   Number of mbufs = 262144
> - Mbuf size = 2752 Bytes
> - Memory required = 262144 * 2752 = 721 MB
> + Mbuf size = 2880 Bytes
> + Memory required = 262144 * 2880 = 755 MB
> 
>  .. note::
> 
> @@ -130,8 +130,8 @@ Example 3
> 
>   MTU = 6000 Bytes
>   Number of mbufs = 262144
> - Mbuf size = 8000 Bytes
> - Memory required = 262144 * 8000 = 2097 MB
> + Mbuf size = 6976 Bytes
> + Memory required = 262144 * 6976 = 1829 MB
> 
>  Example 4
>  +++++++++
> @@ -194,8 +194,8 @@ Example 1: (1 rxq, 1 PMD, 1500 MTU)
> 
>   MTU = 1500
>   Number of mbufs = (1 * 2048) + (2 * 2048) + (1 * 32) + (16384) = 22560
> - Mbuf size = 2752 Bytes
> - Memory required = 22560 * 2752 = 62 MB
> + Mbuf size = 2880 Bytes
> + Memory required = 22560 * 2880 = 65 MB
> 
>  Example 2: (1 rxq, 2 PMD, 6000 MTU)
>  +++++++++++++++++++++++++++++++++++
> @@ -203,8 +203,8 @@ Example 2: (1 rxq, 2 PMD, 6000 MTU)
> 
>   MTU = 6000
>   Number of mbufs = (1 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 24608
> - Mbuf size = 8000 Bytes
> - Memory required = 24608 * 8000 = 196 MB
> + Mbuf size = 6976 Bytes
> + Memory required = 24608 * 6976 = 171 MB
> 
>  Example 3: (2 rxq, 2 PMD, 9000 MTU)
>  +++++++++++++++++++++++++++++++++++
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e8618a6..a871743
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -521,8 +521,8 @@ is_dpdk_class(const struct netdev_class *class)
> static uint32_t  dpdk_buf_size(int mtu)  {
> -    return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
> -                     NETDEV_DPDK_MBUF_ALIGN);
> +    return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN)
> +            + RTE_PKTMBUF_HEADROOM;
>  }
> 
>  /* Allocates an area of 'sz' bytes from DPDK.  The memory is zero'ed.
> @@ -681,6 +681,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool
> per_port_mp)
>                    dev->requested_n_rxq, dev->requested_n_txq,
>                    RTE_CACHE_LINE_SIZE);
> 
> +        /* The size of the mbuf's private area (i.e. area that holds OvS'
> +         * dp_packet data)*/
>          mbuf_priv_data_len = sizeof(struct dp_packet) -
>                                   sizeof(struct rte_mbuf);
>          /* The size of the entire dp_packet. */
> --
> 2.7.4
Lam, Tiago Nov. 27, 2018, 3:32 p.m. UTC | #2
On 27/11/2018 13:57, Stokes, Ian wrote:
>> Commit dfaf00e started using the result of dpdk_buf_size() to calculate
>> the available size on each mbuf, as opposed to using the previous
>> MBUF_SIZE macro. However, this was calculating the mbuf size by adding up
>> the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to
>> NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
>> RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.
>>
>> Before alignment:
>> ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048
>>
>> After aligment:
>> ROUNDUP(MTU(1500), 1024) + 128 = 2176
>>
>> This might seem insignificant, however, it might have performance
>> implications in DPDK, where each mbuf is expected to have 2k +
>> RTE_PKTMBUF_HEADROOM of available space. This is because not only some
>> NICs have course grained alignments of 1k, they will also take
>> RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf
>> when setting up their Rx requirements. Thus, only the "After alignment"
>> case above would guarantee a 2k of available room, as the "Before
>> alignment" would report only 1920B.
>>
>> Some extra information can be found at:
>> https://mails.dpdk.org/archives/dev/2018-November/119219.html
>>
>> Note: This has been found by Ian Stokes while going through some af_packet
>> checks.
>>
> 
> Hi Tiago,, thanks for this, just a query below.
> 
>> Reported-by: Ian Stokes <ian.stokes@intel.com>
>> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
>> ---
>>  Documentation/topics/dpdk/memory.rst | 20 ++++++++++----------
>>  lib/netdev-dpdk.c                    |  6 ++++--
>>  2 files changed, 14 insertions(+), 12 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/memory.rst
>> b/Documentation/topics/dpdk/memory.rst
>> index c9b739f..3c4ee17 100644
>> --- a/Documentation/topics/dpdk/memory.rst
>> +++ b/Documentation/topics/dpdk/memory.rst
>> @@ -107,8 +107,8 @@ Example 1
>>
>>   MTU = 1500 Bytes
>>   Number of mbufs = 262144
>> - Mbuf size = 2752 Bytes
>> - Memory required = 262144 * 2752 = 721 MB
>> + Mbuf size = 2880 Bytes
>> + Memory required = 262144 * 2880 = 755 MB
> 
> These measurements don't deem to take the header and trailer into account for total size when calculating the memory requirements? Is there any particular reason? 
> 
> total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> 
> I would expect above to be 262144 * 3008 = 788 MB.

Thanks for looking into it, Ian.

I hadn't considered that.

But I guess you're right, technically, since that will end up being the
total size of each mempool element. However, the size of each mbuf is
still the 2880B above.

Looking into commit 31154f95 (before the first change to this file was
applied), I can see that the size of each mbuf would be 2944B (for a
1500B MTU), where each mempool element would have a size of 3008B (the
extra 64B is the mp->header_size above). And we are using the 3008B
instead of the 2944B as the mbuf size. Was this on purpose to reflect
the total memory required?

Just wondering though, I can add the same logic anyway, which would
change the values to 2880B + 64B = 2944B.

Tiago.
Stokes, Ian Nov. 27, 2018, 4:10 p.m. UTC | #3
> On 27/11/2018 13:57, Stokes, Ian wrote:
> >> Commit dfaf00e started using the result of dpdk_buf_size() to
> >> calculate the available size on each mbuf, as opposed to using the
> >> previous MBUF_SIZE macro. However, this was calculating the mbuf size
> >> by adding up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning
> >> to NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
> >> RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.
> >>
> >> Before alignment:
> >> ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048
> >>
> >> After aligment:
> >> ROUNDUP(MTU(1500), 1024) + 128 = 2176
> >>
> >> This might seem insignificant, however, it might have performance
> >> implications in DPDK, where each mbuf is expected to have 2k +
> >> RTE_PKTMBUF_HEADROOM of available space. This is because not only
> >> some NICs have course grained alignments of 1k, they will also take
> >> RTE_PKTMBUF_HEADROOM bytes from the overall available space in an
> >> mbuf when setting up their Rx requirements. Thus, only the "After
> alignment"
> >> case above would guarantee a 2k of available room, as the "Before
> >> alignment" would report only 1920B.
> >>
> >> Some extra information can be found at:
> >> https://mails.dpdk.org/archives/dev/2018-November/119219.html
> >>
> >> Note: This has been found by Ian Stokes while going through some
> >> af_packet checks.
> >>
> >
> > Hi Tiago,, thanks for this, just a query below.
> >
> >> Reported-by: Ian Stokes <ian.stokes@intel.com>
> >> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
> >> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> >> ---
> >>  Documentation/topics/dpdk/memory.rst | 20 ++++++++++----------
> >>  lib/netdev-dpdk.c                    |  6 ++++--
> >>  2 files changed, 14 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/Documentation/topics/dpdk/memory.rst
> >> b/Documentation/topics/dpdk/memory.rst
> >> index c9b739f..3c4ee17 100644
> >> --- a/Documentation/topics/dpdk/memory.rst
> >> +++ b/Documentation/topics/dpdk/memory.rst
> >> @@ -107,8 +107,8 @@ Example 1
> >>
> >>   MTU = 1500 Bytes
> >>   Number of mbufs = 262144
> >> - Mbuf size = 2752 Bytes
> >> - Memory required = 262144 * 2752 = 721 MB
> >> + Mbuf size = 2880 Bytes
> >> + Memory required = 262144 * 2880 = 755 MB
> >
> > These measurements don't deem to take the header and trailer into
> account for total size when calculating the memory requirements? Is there
> any particular reason?
> >
> > total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> >
> > I would expect above to be 262144 * 3008 = 788 MB.
> 
> Thanks for looking into it, Ian.
> 
> I hadn't considered that.
> 
> But I guess you're right, technically, since that will end up being the
> total size of each mempool element. However, the size of each mbuf is
> still the 2880B above.
> 
> Looking into commit 31154f95 (before the first change to this file was
> applied), I can see that the size of each mbuf would be 2944B (for a 1500B
> MTU), where each mempool element would have a size of 3008B (the extra 64B
> is the mp->header_size above). And we are using the 3008B instead of the
> 2944B as the mbuf size. Was this on purpose to reflect the total memory
> required?

Yes, this is on purpose to account for the total memory requirements for a given deployment as memory for the header would also have to be provisioned for.

> 
> Just wondering though, I can add the same logic anyway, which would change
> the values to 2880B + 64B = 2944B.

I think above should be 3008 again, looking at the v1 of this patch it would break down as follows

total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
3008 = 64 + 2880 + 64

I would think the trailer should be accounted for as well as the header.

Ian
> 
> Tiago.
Lam, Tiago Nov. 27, 2018, 4:36 p.m. UTC | #4
On 27/11/2018 16:10, Stokes, Ian wrote:
>> On 27/11/2018 13:57, Stokes, Ian wrote:
>>>> Commit dfaf00e started using the result of dpdk_buf_size() to
>>>> calculate the available size on each mbuf, as opposed to using the
>>>> previous MBUF_SIZE macro. However, this was calculating the mbuf size
>>>> by adding up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning
>>>> to NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
>>>> RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.
>>>>
>>>> Before alignment:
>>>> ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048
>>>>
>>>> After aligment:
>>>> ROUNDUP(MTU(1500), 1024) + 128 = 2176
>>>>
>>>> This might seem insignificant, however, it might have performance
>>>> implications in DPDK, where each mbuf is expected to have 2k +
>>>> RTE_PKTMBUF_HEADROOM of available space. This is because not only
>>>> some NICs have course grained alignments of 1k, they will also take
>>>> RTE_PKTMBUF_HEADROOM bytes from the overall available space in an
>>>> mbuf when setting up their Rx requirements. Thus, only the "After
>> alignment"
>>>> case above would guarantee a 2k of available room, as the "Before
>>>> alignment" would report only 1920B.
>>>>
>>>> Some extra information can be found at:
>>>> https://mails.dpdk.org/archives/dev/2018-November/119219.html
>>>>
>>>> Note: This has been found by Ian Stokes while going through some
>>>> af_packet checks.
>>>>
>>>
>>> Hi Tiago,, thanks for this, just a query below.
>>>
>>>> Reported-by: Ian Stokes <ian.stokes@intel.com>
>>>> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
>>>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
>>>> ---
>>>>  Documentation/topics/dpdk/memory.rst | 20 ++++++++++----------
>>>>  lib/netdev-dpdk.c                    |  6 ++++--
>>>>  2 files changed, 14 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/Documentation/topics/dpdk/memory.rst
>>>> b/Documentation/topics/dpdk/memory.rst
>>>> index c9b739f..3c4ee17 100644
>>>> --- a/Documentation/topics/dpdk/memory.rst
>>>> +++ b/Documentation/topics/dpdk/memory.rst
>>>> @@ -107,8 +107,8 @@ Example 1
>>>>
>>>>   MTU = 1500 Bytes
>>>>   Number of mbufs = 262144
>>>> - Mbuf size = 2752 Bytes
>>>> - Memory required = 262144 * 2752 = 721 MB
>>>> + Mbuf size = 2880 Bytes
>>>> + Memory required = 262144 * 2880 = 755 MB
>>>
>>> These measurements don't deem to take the header and trailer into
>> account for total size when calculating the memory requirements? Is there
>> any particular reason?
>>>
>>> total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>>>
>>> I would expect above to be 262144 * 3008 = 788 MB.
>>
>> Thanks for looking into it, Ian.
>>
>> I hadn't considered that.
>>
>> But I guess you're right, technically, since that will end up being the
>> total size of each mempool element. However, the size of each mbuf is
>> still the 2880B above.
>>
>> Looking into commit 31154f95 (before the first change to this file was
>> applied), I can see that the size of each mbuf would be 2944B (for a 1500B
>> MTU), where each mempool element would have a size of 3008B (the extra 64B
>> is the mp->header_size above). And we are using the 3008B instead of the
>> 2944B as the mbuf size. Was this on purpose to reflect the total memory
>> required?
> 
> Yes, this is on purpose to account for the total memory requirements for a given deployment as memory for the header would also have to be provisioned for.
> 
>>
>> Just wondering though, I can add the same logic anyway, which would change
>> the values to 2880B + 64B = 2944B.
> 
> I think above should be 3008 again, looking at the v1 of this patch it would break down as follows
> 
> total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> 3008 = 64 + 2880 + 64
> 
> I would think the trailer should be accounted for as well as the header.
> 
> Ian

Fair enough, I'll send v3.
diff mbox series

Patch

diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst
index c9b739f..3c4ee17 100644
--- a/Documentation/topics/dpdk/memory.rst
+++ b/Documentation/topics/dpdk/memory.rst
@@ -107,8 +107,8 @@  Example 1
 
  MTU = 1500 Bytes
  Number of mbufs = 262144
- Mbuf size = 2752 Bytes
- Memory required = 262144 * 2752 = 721 MB
+ Mbuf size = 2880 Bytes
+ Memory required = 262144 * 2880 = 755 MB
 
 Example 2
 +++++++++
@@ -116,8 +116,8 @@  Example 2
 
  MTU = 1800 Bytes
  Number of mbufs = 262144
- Mbuf size = 2752 Bytes
- Memory required = 262144 * 2752 = 721 MB
+ Mbuf size = 2880 Bytes
+ Memory required = 262144 * 2880 = 755 MB
 
 .. note::
 
@@ -130,8 +130,8 @@  Example 3
 
  MTU = 6000 Bytes
  Number of mbufs = 262144
- Mbuf size = 8000 Bytes
- Memory required = 262144 * 8000 = 2097 MB
+ Mbuf size = 6976 Bytes
+ Memory required = 262144 * 6976 = 1829 MB
 
 Example 4
 +++++++++
@@ -194,8 +194,8 @@  Example 1: (1 rxq, 1 PMD, 1500 MTU)
 
  MTU = 1500
  Number of mbufs = (1 * 2048) + (2 * 2048) + (1 * 32) + (16384) = 22560
- Mbuf size = 2752 Bytes
- Memory required = 22560 * 2752 = 62 MB
+ Mbuf size = 2880 Bytes
+ Memory required = 22560 * 2880 = 65 MB
 
 Example 2: (1 rxq, 2 PMD, 6000 MTU)
 +++++++++++++++++++++++++++++++++++
@@ -203,8 +203,8 @@  Example 2: (1 rxq, 2 PMD, 6000 MTU)
 
  MTU = 6000
  Number of mbufs = (1 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 24608
- Mbuf size = 8000 Bytes
- Memory required = 24608 * 8000 = 196 MB
+ Mbuf size = 6976 Bytes
+ Memory required = 24608 * 6976 = 171 MB
 
 Example 3: (2 rxq, 2 PMD, 9000 MTU)
 +++++++++++++++++++++++++++++++++++
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e8618a6..a871743 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -521,8 +521,8 @@  is_dpdk_class(const struct netdev_class *class)
 static uint32_t
 dpdk_buf_size(int mtu)
 {
-    return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
-                     NETDEV_DPDK_MBUF_ALIGN);
+    return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN)
+            + RTE_PKTMBUF_HEADROOM;
 }
 
 /* Allocates an area of 'sz' bytes from DPDK.  The memory is zero'ed.
@@ -681,6 +681,8 @@  dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
                   dev->requested_n_rxq, dev->requested_n_txq,
                   RTE_CACHE_LINE_SIZE);
 
+        /* The size of the mbuf's private area (i.e. area that holds OvS'
+         * dp_packet data)*/
         mbuf_priv_data_len = sizeof(struct dp_packet) -
                                  sizeof(struct rte_mbuf);
         /* The size of the entire dp_packet. */