Message ID | 1543337663-160716-1-git-send-email-tiago.lam@intel.com |
---|---|
State | Accepted |
Delegated to: | Ian Stokes |
Headers | show |
Series | [ovs-dev,v3] netdev-dpdk: Add mbuf HEADROOM after alignment. | expand |
> Commit dfaf00e started using the result of dpdk_buf_size() to calculate > the available size on each mbuf, as opposed to using the previous > MBUF_SIZE macro. However, this was calculating the mbuf size by adding up > the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to > NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the > RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below. > > Before alignment: > ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048 > > After aligment: > ROUNDUP(MTU(1500), 1024) + 128 = 2176 > > This might seem insignificant, however, it might have performance > implications in DPDK, where each mbuf is expected to have 2k + > RTE_PKTMBUF_HEADROOM of available space. This is because not only some > NICs have course grained alignments of 1k, they will also take > RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf > when setting up their Rx requirements. Thus, only the "After alignment" > case above would guarantee a 2k of available room, as the "Before > alignment" would report only 1920B. > > Some extra information can be found at: > https://mails.dpdk.org/archives/dev/2018-November/119219.html > > Note: This has been found by Ian Stokes while going through some af_packet > checks. > > Reported-by: Ian Stokes <ian.stokes@intel.com> > Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing") > Signed-off-by: Tiago Lam <tiago.lam@intel.com> Thanks Tiago, I've applied this to master. If I'm not mistaken I believe it should be back ported also, previous releases were using the incorrect calculation for dpdk_buf_size Thoughts? Ian
On 28/11/2018 15:56, Stokes, Ian wrote: >> Commit dfaf00e started using the result of dpdk_buf_size() to calculate >> the available size on each mbuf, as opposed to using the previous >> MBUF_SIZE macro. However, this was calculating the mbuf size by adding up >> the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to >> NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the >> RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below. >> >> Before alignment: >> ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048 >> >> After aligment: >> ROUNDUP(MTU(1500), 1024) + 128 = 2176 >> >> This might seem insignificant, however, it might have performance >> implications in DPDK, where each mbuf is expected to have 2k + >> RTE_PKTMBUF_HEADROOM of available space. This is because not only some >> NICs have course grained alignments of 1k, they will also take >> RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf >> when setting up their Rx requirements. Thus, only the "After alignment" >> case above would guarantee a 2k of available room, as the "Before >> alignment" would report only 1920B. >> >> Some extra information can be found at: >> https://mails.dpdk.org/archives/dev/2018-November/119219.html >> >> Note: This has been found by Ian Stokes while going through some af_packet >> checks. >> >> Reported-by: Ian Stokes <ian.stokes@intel.com> >> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing") >> Signed-off-by: Tiago Lam <tiago.lam@intel.com> > > Thanks Tiago, I've applied this to master. If I'm not mistaken I believe it should be back ported also, previous releases were using the incorrect calculation for dpdk_buf_size Thoughts? Thanks for that, Ian. On back porting, that's a good point. I'm not too sure on this one here, though. From what I can gather, on previous versions until 2.6.x (so, 2.6.x, 2.7.x, 2.8.x, 2.9.x and 2.10.x) we won't have the same problem. This is because we are over allocating the size of each mbuf and the size being passed to `rte_pktmbuf_pool_create()` is adding RTE_PKTMBUF_HEADROOM for the second time, since it had already been added in dpdk_buf_size() and will be added again by the MBUF_SIZE macro. So, here we can potentially save some space by back porting, but I don't think we ever pass less size than we should into DPDK. So, I think the TL;DR here is that this incorrect behavior was introduced in commit dfaf00e ("netdev-dpdk: fix mbuf sizing") only, where the MBUF_SIZE header was removed. As a result, we started relying on dpdk_buf_size() result only, which adds RTE_PKTMBUF_HEADROOM first and aligns to MBUF_ALIGN afterwards, meaning we can end up passing only 2k when calling DPDK's rte_pktmbuf_pool_create(), which may not be enough. Back porting could make this up a bit, though. Let me know if you think it would still make sense. Regards, Tiago. (As an aside, from 2.7.x below, OvS-DPDK is not using `rte_pktmbuf_pool_create` and is using `rte_mempool_create()` directly (likely because the former wasn't available in DPDK yet). And there I think passing RTE_PKTMBUF_HEADROOM twice is correct behavior since the mempool API doesn't add `sizeof(struct rte_mbuf)` as `rte_pktmbuf_pool_create()` does. So, this may very well be a leftover.)
diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst index c9b739f..9ebfd11 100644 --- a/Documentation/topics/dpdk/memory.rst +++ b/Documentation/topics/dpdk/memory.rst @@ -107,8 +107,8 @@ Example 1 MTU = 1500 Bytes Number of mbufs = 262144 - Mbuf size = 2752 Bytes - Memory required = 262144 * 2752 = 721 MB + Mbuf size = 3008 Bytes + Memory required = 262144 * 3008 = 788 MB Example 2 +++++++++ @@ -116,8 +116,8 @@ Example 2 MTU = 1800 Bytes Number of mbufs = 262144 - Mbuf size = 2752 Bytes - Memory required = 262144 * 2752 = 721 MB + Mbuf size = 3008 Bytes + Memory required = 262144 * 3008 = 788 MB .. note:: @@ -130,8 +130,8 @@ Example 3 MTU = 6000 Bytes Number of mbufs = 262144 - Mbuf size = 8000 Bytes - Memory required = 262144 * 8000 = 2097 MB + Mbuf size = 7104 Bytes + Memory required = 262144 * 7104 = 1862 MB Example 4 +++++++++ @@ -139,8 +139,8 @@ Example 4 MTU = 9000 Bytes Number of mbufs = 262144 - Mbuf size = 10048 Bytes - Memory required = 262144 * 10048 = 2634 MB + Mbuf size = 10176 Bytes + Memory required = 262144 * 10176 = 2667 MB Per Port Memory Calculations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -194,8 +194,8 @@ Example 1: (1 rxq, 1 PMD, 1500 MTU) MTU = 1500 Number of mbufs = (1 * 2048) + (2 * 2048) + (1 * 32) + (16384) = 22560 - Mbuf size = 2752 Bytes - Memory required = 22560 * 2752 = 62 MB + Mbuf size = 3008 Bytes + Memory required = 22560 * 3008 = 67 MB Example 2: (1 rxq, 2 PMD, 6000 MTU) +++++++++++++++++++++++++++++++++++ @@ -203,8 +203,8 @@ Example 2: (1 rxq, 2 PMD, 6000 MTU) MTU = 6000 Number of mbufs = (1 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 24608 - Mbuf size = 8000 Bytes - Memory required = 24608 * 8000 = 196 MB + Mbuf size = 7104 Bytes + Memory required = 24608 * 7104 = 175 MB Example 3: (2 rxq, 2 PMD, 9000 MTU) +++++++++++++++++++++++++++++++++++ @@ -212,5 +212,5 @@ Example 3: (2 rxq, 2 PMD, 9000 MTU) MTU = 9000 Number of mbufs = (2 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 26656 - Mbuf size = 10048 Bytes - Memory required = 26656 * 10048 = 267 MB + Mbuf size = 10176 Bytes + Memory required = 26656 * 10176 = 271 MB diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e8618a6..a871743 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -521,8 +521,8 @@ is_dpdk_class(const struct netdev_class *class) static uint32_t dpdk_buf_size(int mtu) { - return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM), - NETDEV_DPDK_MBUF_ALIGN); + return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN) + + RTE_PKTMBUF_HEADROOM; } /* Allocates an area of 'sz' bytes from DPDK. The memory is zero'ed. @@ -681,6 +681,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp) dev->requested_n_rxq, dev->requested_n_txq, RTE_CACHE_LINE_SIZE); + /* The size of the mbuf's private area (i.e. area that holds OvS' + * dp_packet data)*/ mbuf_priv_data_len = sizeof(struct dp_packet) - sizeof(struct rte_mbuf); /* The size of the entire dp_packet. */
Commit dfaf00e started using the result of dpdk_buf_size() to calculate the available size on each mbuf, as opposed to using the previous MBUF_SIZE macro. However, this was calculating the mbuf size by adding up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below. Before alignment: ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048 After aligment: ROUNDUP(MTU(1500), 1024) + 128 = 2176 This might seem insignificant, however, it might have performance implications in DPDK, where each mbuf is expected to have 2k + RTE_PKTMBUF_HEADROOM of available space. This is because not only some NICs have course grained alignments of 1k, they will also take RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf when setting up their Rx requirements. Thus, only the "After alignment" case above would guarantee a 2k of available room, as the "Before alignment" would report only 1920B. Some extra information can be found at: https://mails.dpdk.org/archives/dev/2018-November/119219.html Note: This has been found by Ian Stokes while going through some af_packet checks. Reported-by: Ian Stokes <ian.stokes@intel.com> Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing") Signed-off-by: Tiago Lam <tiago.lam@intel.com> --- v3: - Take trailer_size into account when calculating mbuf size - Ian. v2: - Rebase to master 85706c3 ("ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis"). - Fix mbuf size calculations under Documentation/topics/dpdk/memory.rst to take into account the header_size added to each mepool element (64 bytes) - Ian. --- Documentation/topics/dpdk/memory.rst | 28 ++++++++++++++-------------- lib/netdev-dpdk.c | 6 ++++-- 2 files changed, 18 insertions(+), 16 deletions(-)