diff mbox series

[ovs-dev,v11,05/14] dp-packet: Fix data_len handling multi-seg mbufs.

Message ID 1539188552-129083-6-git-send-email-tiago.lam@intel.com
State Changes Requested
Delegated to: Ian Stokes
Headers show
Series Support multi-segment mbufs | expand

Commit Message

Lam, Tiago Oct. 10, 2018, 4:22 p.m. UTC
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceeding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.

__dp_packet_set_data() now also updates an mbufs' data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.

Co-authored-by: Michael Qiu <qiudayu@chinac.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>

Signed-off-by: Michael Qiu <qiudayu@chinac.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dp-packet.h | 83 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 71 insertions(+), 12 deletions(-)

Comments

Stokes, Ian Oct. 22, 2018, 10:31 a.m. UTC | #1
> When a dp_packet is from a DPDK source, and it contains multi-segment
> mbufs, the data_len is not equal to the packet size, pkt_len. Instead, the
> data_len of each mbuf in the chain should be considered while distributing
> the new (provided) size.
> 
> To account for the above dp_packet_set_size() has been changed so that, in
> the multi-segment mbufs case, only the data_len on the last mbuf of the
> chain and the total size of the packet, pkt_len, are changed. The data_len
> on the intermediate mbufs preceeding the last mbuf is not changed by
> dp_packet_set_size(). Furthermore, in some cases
> dp_packet_set_size() may be used to set a smaller size than the current
> packet size, thus effectively trimming the end of the packet. In the
> multi-segment mbufs case this may lead to lingering mbufs that may need
> freeing.
> 
> __dp_packet_set_data() now also updates an mbufs' data_len after setting
> the data offset. This is so that both fields are always in sync for each
> mbuf in a chain.
> 
> Co-authored-by: Michael Qiu <qiudayu@chinac.com>
> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
> Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
> Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
> 
> Signed-off-by: Michael Qiu <qiudayu@chinac.com>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
> Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>  lib/dp-packet.h | 83 ++++++++++++++++++++++++++++++++++++++++++++++++----
> -----
>  1 file changed, 71 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 6376039..223efe2
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -426,20 +426,60 @@ dp_packet_size(const struct dp_packet *b)
>      return b->mbuf.pkt_len;
>  }
> 
> +/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this
> +only means
> + * setting b->size_, but if used in a DPDK packet it means adjusting
> +the first
> + * mbuf pkt_len and last mbuf data_len, to reflect the real size, which
> +can
> + * lead to free'ing tail mbufs that are no longer used.
> + *
> + * This function should be used for setting the size only, and if
> +there's an
> + * assumption that the tail end of 'b' will be trimmed. For adjustng

Minor typo above 'adjustng' -> 'adjusting'.

> +the head
> + * 'end' of 'b', dp_packet_pull() should be used instead. */
>  static inline void
>  dp_packet_set_size(struct dp_packet *b, uint32_t v)  {
> -    /* netdev-dpdk does not currently support segmentation; consequently,
> for
> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32
> bit) may
> -     * be used interchangably.
> -     *
> -     * On the datapath, it is expected that the size of packets
> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there
> is no
> -     * loss of accuracy in assigning 'v' to 'data_len'.
> -     */
> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> -    b->mbuf.pkt_len = v;             /* Total length of all segments
> linked to
> -                                      * this segment. */
> +    if (b->source == DPBUF_DPDK) {
> +        struct rte_mbuf *mbuf = &b->mbuf;
> +        uint16_t new_len = v;
> +        uint16_t data_len;
> +        uint16_t nb_segs = 0;
> +        uint16_t pkt_len = 0;
> +
> +        /* Trim 'v' length bytes from the end of the chained buffers,
> freeing

A general comment on the code block below, there is a number of conditions at play here. In reviewing I had to go pen to paper to keep track of the cases and see what would be the outcome in each case.

It might be worth outlining these cases in a comment first before the block i.e. what happens when v is larger/smaller than mbuf->data len. What happens when mbuf->next is already null. The relationship between mbuf->buf_len and mbuf->data_len would be useful also to give context when assigning more mbufs.

> +           any buffers that may be left floating */
> +        while (mbuf) {
> +            data_len = MIN(new_len, mbuf->data_len);
> +            mbuf->data_len = data_len;
> +
> +            if (new_len - data_len <= 0) {
> +                /* Free the rest of chained mbufs */
> +                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
> +                                           mbuf));
> +                mbuf->next = NULL;
> +            } else if (!mbuf->next) {
> +                /* Don't assign more than what we have available */
> +                mbuf->data_len = MIN(new_len,
> +                                     mbuf->buf_len - mbuf->data_off);
> +            }
> +
> +            new_len -= data_len;
> +            nb_segs += 1;
> +            pkt_len += mbuf->data_len;
> +            mbuf = mbuf->next;
> +        }
> +
> +        /* pkt_len != v would effectively mean that pkt_len < than 'v'
> (as
> +         * being bigger is logically impossible). Being < than 'v' would
> mean
> +         * the 'v' provided was bigger than the available room, which is
> the
> +         * responsibility of the caller to make sure there is enough room
> */
> +        ovs_assert(pkt_len == v);
> +
> +        b->mbuf.nb_segs = nb_segs;
> +        b->mbuf.pkt_len = pkt_len;
> +    } else {
> +        b->mbuf.data_len = v;
> +        /* Total length of all segments linked to this segment. */
> +        b->mbuf.pkt_len = v;
> +    }
>  }
> 
>  static inline uint16_t
> @@ -451,7 +491,26 @@ __packet_data(const struct dp_packet *b)  static
> inline void  __packet_set_data(struct dp_packet *b, uint16_t v)  {
> -    b->mbuf.data_off = v;
> +    if (b->source == DPBUF_DPDK) {
> +        /* Moving data_off away from the first mbuf in the chain is not a
> +         * possibility using DPBUF_DPDK dp_packets */

Can you clarify the comment above? On first reading I though it meant the data_off cannot be changed in the first mbuf but  we change it below then. I think it means it can't be changed beyond the first mbuf?

Ian
> +        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
> +
> +        uint16_t prev_ofs = b->mbuf.data_off;
> +        b->mbuf.data_off = v;
> +        int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
> +
> +        /* When dealing with DPDK mbufs, keep data_off and data_len in
> sync.
> +         * Thus, update data_len if the length changes with the move of
> +         * data_off. However, if data_len is 0, there's no data to move
> and
> +         * data_len should remain 0. */
> +
> +        if (b->mbuf.data_len != 0) {
> +            b->mbuf.data_len += ofs_diff;
> +        }
> +    } else {
> +        b->mbuf.data_off = v;
> +    }
>  }
> 
>  static inline uint16_t
> --
> 2.7.4
Lam, Tiago Oct. 23, 2018, 10:58 a.m. UTC | #2
On 22/10/2018 11:31, Stokes, Ian wrote:
>> When a dp_packet is from a DPDK source, and it contains multi-segment
>> mbufs, the data_len is not equal to the packet size, pkt_len. Instead, the
>> data_len of each mbuf in the chain should be considered while distributing
>> the new (provided) size.
>>
>> To account for the above dp_packet_set_size() has been changed so that, in
>> the multi-segment mbufs case, only the data_len on the last mbuf of the
>> chain and the total size of the packet, pkt_len, are changed. The data_len
>> on the intermediate mbufs preceeding the last mbuf is not changed by
>> dp_packet_set_size(). Furthermore, in some cases
>> dp_packet_set_size() may be used to set a smaller size than the current
>> packet size, thus effectively trimming the end of the packet. In the
>> multi-segment mbufs case this may lead to lingering mbufs that may need
>> freeing.
>>
>> __dp_packet_set_data() now also updates an mbufs' data_len after setting
>> the data offset. This is so that both fields are always in sync for each
>> mbuf in a chain.
>>
>> Co-authored-by: Michael Qiu <qiudayu@chinac.com>
>> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
>> Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
>> Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
>>
>> Signed-off-by: Michael Qiu <qiudayu@chinac.com>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
>> Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
>> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>>  lib/dp-packet.h | 83 ++++++++++++++++++++++++++++++++++++++++++++++++----
>> -----
>>  1 file changed, 71 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 6376039..223efe2
>> 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -426,20 +426,60 @@ dp_packet_size(const struct dp_packet *b)
>>      return b->mbuf.pkt_len;
>>  }
>>
>> +/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this
>> +only means
>> + * setting b->size_, but if used in a DPDK packet it means adjusting
>> +the first
>> + * mbuf pkt_len and last mbuf data_len, to reflect the real size, which
>> +can
>> + * lead to free'ing tail mbufs that are no longer used.
>> + *
>> + * This function should be used for setting the size only, and if
>> +there's an
>> + * assumption that the tail end of 'b' will be trimmed. For adjustng
> 
> Minor typo above 'adjustng' -> 'adjusting'.
> 
>> +the head
>> + * 'end' of 'b', dp_packet_pull() should be used instead. */
>>  static inline void
>>  dp_packet_set_size(struct dp_packet *b, uint32_t v)  {
>> -    /* netdev-dpdk does not currently support segmentation; consequently,
>> for
>> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32
>> bit) may
>> -     * be used interchangably.
>> -     *
>> -     * On the datapath, it is expected that the size of packets
>> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there
>> is no
>> -     * loss of accuracy in assigning 'v' to 'data_len'.
>> -     */
>> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
>> -    b->mbuf.pkt_len = v;             /* Total length of all segments
>> linked to
>> -                                      * this segment. */
>> +    if (b->source == DPBUF_DPDK) {
>> +        struct rte_mbuf *mbuf = &b->mbuf;
>> +        uint16_t new_len = v;
>> +        uint16_t data_len;
>> +        uint16_t nb_segs = 0;
>> +        uint16_t pkt_len = 0;
>> +
>> +        /* Trim 'v' length bytes from the end of the chained buffers,
>> freeing
> 
> A general comment on the code block below, there is a number of conditions at play here. In reviewing I had to go pen to paper to keep track of the cases and see what would be the outcome in each case.
> 
> It might be worth outlining these cases in a comment first before the block i.e. what happens when v is larger/smaller than mbuf->data len. What happens when mbuf->next is already null. The relationship between mbuf->buf_len and mbuf->data_len would be useful also to give context when assigning more mbufs.

Thanks, Ian. That's good feedback. Some subtleties here are coming from
the DPDK side, maybe some overall comments (maybe where struct rte_mbuf
is introduced in dp_packet struct) like the following could help:

> /* pkt_len = sum of data_len's */
> /* data_len = data room in user */
> /* buf_len - data_off = total data room available */
> struct rte_mbuf mbuf;       /* DPDK mbuf */


But then again, would could easily argue about the same information
being in struct rte_mbuf already, so I'm not sure how useful this would be.

But to clarify this piece in specific I'll add the following to this
comment block:
> Trim 'v' length bytes from the end of the chained buffers, freeing
> any buffers that may be left floating.
>
> For that traverse over the entire mbuf chain and, for each mbuf,
> subtract its 'data_len' from 'new_len' (initially set to 'v'), which
> essentially spreads 'new_len' between all existing mbufs in the
> chain. While traversing the mbuf chain, we end the traversal if:
> - 'new_size' reaches 0, meaning the passed 'v' has been appropriately
>    spread over the mbuf chain. The remaining mbufs are freed;
> - We reach the last mbuf in the chain, in which case we set the last
>   mbuf's 'data_len' to the minimum value between the current
>   'new_len' (what's leftover from 'v') size and the maximum data the
>   mbuf can hold (mbuf->buf_len - mbuf->data_off).
>
> The above formula will thus make sure that when a 'v' is smaller
> than the overall 'pkt_len' (sum of all 'data_len'), it sets the new
> size and frees the leftover mbufs. In the other hand, if 'v' is
> bigger, it sets the size to the maximum available space, but no more
> than that.

Hopefully this will help.

Thanks,
Tiago.
diff mbox series

Patch

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6376039..223efe2 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -426,20 +426,60 @@  dp_packet_size(const struct dp_packet *b)
     return b->mbuf.pkt_len;
 }
 
+/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this only means
+ * setting b->size_, but if used in a DPDK packet it means adjusting the first
+ * mbuf pkt_len and last mbuf data_len, to reflect the real size, which can
+ * lead to free'ing tail mbufs that are no longer used.
+ *
+ * This function should be used for setting the size only, and if there's an
+ * assumption that the tail end of 'b' will be trimmed. For adjustng the head
+ * 'end' of 'b', dp_packet_pull() should be used instead. */
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    /* netdev-dpdk does not currently support segmentation; consequently, for
-     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
-     * be used interchangably.
-     *
-     * On the datapath, it is expected that the size of packets
-     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
-     * loss of accuracy in assigning 'v' to 'data_len'.
-     */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
-                                      * this segment. */
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *mbuf = &b->mbuf;
+        uint16_t new_len = v;
+        uint16_t data_len;
+        uint16_t nb_segs = 0;
+        uint16_t pkt_len = 0;
+
+        /* Trim 'v' length bytes from the end of the chained buffers, freeing
+           any buffers that may be left floating */
+        while (mbuf) {
+            data_len = MIN(new_len, mbuf->data_len);
+            mbuf->data_len = data_len;
+
+            if (new_len - data_len <= 0) {
+                /* Free the rest of chained mbufs */
+                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
+                                           mbuf));
+                mbuf->next = NULL;
+            } else if (!mbuf->next) {
+                /* Don't assign more than what we have available */
+                mbuf->data_len = MIN(new_len,
+                                     mbuf->buf_len - mbuf->data_off);
+            }
+
+            new_len -= data_len;
+            nb_segs += 1;
+            pkt_len += mbuf->data_len;
+            mbuf = mbuf->next;
+        }
+
+        /* pkt_len != v would effectively mean that pkt_len < than 'v' (as
+         * being bigger is logically impossible). Being < than 'v' would mean
+         * the 'v' provided was bigger than the available room, which is the
+         * responsibility of the caller to make sure there is enough room */
+        ovs_assert(pkt_len == v);
+
+        b->mbuf.nb_segs = nb_segs;
+        b->mbuf.pkt_len = pkt_len;
+    } else {
+        b->mbuf.data_len = v;
+        /* Total length of all segments linked to this segment. */
+        b->mbuf.pkt_len = v;
+    }
 }
 
 static inline uint16_t
@@ -451,7 +491,26 @@  __packet_data(const struct dp_packet *b)
 static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
-    b->mbuf.data_off = v;
+    if (b->source == DPBUF_DPDK) {
+        /* Moving data_off away from the first mbuf in the chain is not a
+         * possibility using DPBUF_DPDK dp_packets */
+        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
+
+        uint16_t prev_ofs = b->mbuf.data_off;
+        b->mbuf.data_off = v;
+        int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
+
+        /* When dealing with DPDK mbufs, keep data_off and data_len in sync.
+         * Thus, update data_len if the length changes with the move of
+         * data_off. However, if data_len is 0, there's no data to move and
+         * data_len should remain 0. */
+
+        if (b->mbuf.data_len != 0) {
+            b->mbuf.data_len += ofs_diff;
+        }
+    } else {
+        b->mbuf.data_off = v;
+    }
 }
 
 static inline uint16_t