[ovs-dev,v8,05/13] dp-packet: Fix data_len handling multi-seg mbufs.

Message ID 1528734090-220990-6-git-send-email-tiago.lam@intel.com
State Superseded
Series
  • Support multi-segment mbufs

Commit Message

Lam, Tiago June 11, 2018, 4:21 p.m.
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for this, dp_packet_set_size() has been changed so that, in
the multi-segment mbufs case, only the data_len on the last mbuf of the
chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may leave lingering mbufs that need
freeing.

__packet_set_data() now also updates an mbuf's data_len after setting
the data offset, so that both fields are always in sync for each mbuf
in a chain.

Co-authored-by: Michael Qiu <qiudayu@chinac.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>

Signed-off-by: Michael Qiu <qiudayu@chinac.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
---
 lib/dp-packet.h | 56 +++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 11 deletions(-)
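
The trimming behaviour described in the commit message can be modelled
outside of OVS and DPDK. The sketch below is only an illustration under
assumptions: 'struct seg_model', model_chain_len() and model_trim() are
hypothetical names, not part of dp-packet.h or rte_mbuf, and the caller
is assumed to pass a size no larger than the current chain length. It
shows that the packet length is the sum of the per-segment data_len
values, and that trimming can leave a detached tail that still needs
freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_mbuf fields that matter here. */
struct seg_model {
    uint16_t data_len;          /* Bytes of data in this segment. */
    struct seg_model *next;     /* Next segment in the chain, or NULL. */
};

/* Sum of data_len over the chain; this is what pkt_len on the first
 * segment is expected to track. */
static uint32_t
model_chain_len(const struct seg_model *seg)
{
    uint32_t total = 0;

    for (; seg; seg = seg->next) {
        total += seg->data_len;
    }
    return total;
}

/* Trims the chain to 'v' bytes.  Only the segment where the cut falls
 * has its data_len changed.  Returns the detached tail, which the caller
 * would have to free, or NULL if nothing was detached. */
static struct seg_model *
model_trim(struct seg_model *head, uint32_t v)
{
    struct seg_model *seg = head;
    uint32_t left = v;

    assert(head && v <= model_chain_len(head));
    for (;;) {
        if (left <= seg->data_len) {
            struct seg_model *rest = seg->next;

            seg->data_len = (uint16_t) left;
            seg->next = NULL;
            return rest;
        }
        left -= seg->data_len;
        seg = seg->next;
    }
}
```

With three 100-byte segments, trimming to 150 bytes keeps the first
segment untouched, shortens the second to 50 bytes and hands back the
third as the lingering tail.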

Comments

Eelco Chaudron June 18, 2018, 11:39 a.m. | #1
On 11 Jun 2018, at 18:21, Tiago Lam wrote:

> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 4c104b6..c301ed5 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -429,17 +429,39 @@ dp_packet_size(const struct dp_packet *b)
>  static inline void
>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>  {
> -    /* netdev-dpdk does not currently support segmentation; consequently, for
> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
> -     * be used interchangably.
> -     *
> -     * On the datapath, it is expected that the size of packets
> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
> -     * loss of accuracy in assigning 'v' to 'data_len'.
> -     */
> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> -    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
> -                                      * this segment. */
> +    if (b->source == DPBUF_DPDK) {

This function does not have any check to guarantee that the size being
set is valid. Is this not meant to be a trim-only function? If we
increase the size, no additional mbufs get allocated (see below).

> +        struct rte_mbuf *seg = &b->mbuf;
> +        struct rte_mbuf *fmbuf = seg;
> +        uint16_t pkt_len = v;
> +        uint16_t seg_len;
> +        uint16_t nb_segs = 0;
> +
> +        /* Trim 'v' length bytes from the end of the chained buffers, freeing
> +           any buffers that may be left floating */
> +        while (seg) {
> +            seg_len = MIN(pkt_len, seg->data_len);
> +            seg->data_len = seg_len;
> +
> +            pkt_len -= seg_len;
> +            if (pkt_len <= 0) {
> +                /* Free the rest of chained mbufs */
> +                free_dpdk_buf((struct dp_packet *) seg->next);
> +                seg->next = NULL;
> +            } else if (!seg->next) {
> +                seg->data_len = pkt_len;

Here we could potentially set the size of the segment larger than the
allocated/available buffer.

Also, if this is fine, we set it to the wrong value. Assume we have a
single buffer holding 100 bytes and we call with v=400. seg_len is then
100 and pkt_len drops to 300, so data_len ends up being 300 and not 400.
It should probably be = pkt_len + seg_len.

> +            }
> +
> +            nb_segs += 1;
> +            seg = seg->next;
> +        }
> +
> +        fmbuf->nb_segs = nb_segs;
> +    } else {
> +        b->mbuf.data_len = v;
> +    }
> +
> +    /* Total length of all segments linked to this segment. */
> +    b->mbuf.pkt_len = v;
>  }
>
>  static inline uint16_t
> @@ -451,7 +473,19 @@ __packet_data(const struct dp_packet *b)
>  static inline void
>  __packet_set_data(struct dp_packet *b, uint16_t v)
>  {
> +    uint16_t prev_ofs = b->mbuf.data_off;
>      b->mbuf.data_off = v;
> +    int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
> +
> +    /* When dealing with DPDK mbufs, keep data_off and data_len in sync. Thus,
> +     * update data_len if the length changes with the move of data_off.
> +     * However, if data_len is 0, there's no data to move and data_Len should
> +     * remain 0. */
> +
> +    if (b->mbuf.data_len != 0) {
> +        b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
> +                               b->mbuf.buf_len - b->mbuf.data_off);
> +    }
>  }
>
>  static inline uint16_t
> -- 
> 2.7.4

Lam, Tiago June 22, 2018, 7:04 p.m. | #2
On 18/06/2018 12:39, Eelco Chaudron wrote:
> 
> 
> On 11 Jun 2018, at 18:21, Tiago Lam wrote:
> 
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 4c104b6..c301ed5 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -429,17 +429,39 @@ dp_packet_size(const struct dp_packet *b)
>>  static inline void
>>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>>  {
>> -    /* netdev-dpdk does not currently support segmentation; consequently, for
>> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
>> -     * be used interchangably.
>> -     *
>> -     * On the datapath, it is expected that the size of packets
>> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
>> -     * loss of accuracy in assigning 'v' to 'data_len'.
>> -     */
>> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
>> -    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
>> -                                      * this segment. */
>> +    if (b->source == DPBUF_DPDK) {
> 
> This function does not have any check to guarantee that the size set is 
> valid? i.e. is this not a trim only function? If we increase the size, 
> no additional mbufs get allocated (see below)

Indeed, it should be used as a trim function. If the size is increased
then the expectation is that enough space has been allocated. This is
the current behavior when using `DPBUF_MALLOC` dp_packets as well, and
is kept here for `DPBUF_DPDK` dp_packets.

> 
>> +        struct rte_mbuf *seg = &b->mbuf;
>> +        struct rte_mbuf *fmbuf = seg;
>> +        uint16_t pkt_len = v;
>> +        uint16_t seg_len;
>> +        uint16_t nb_segs = 0;
>> +
>> +        /* Trim 'v' length bytes from the end of the chained buffers, freeing
>> +           any buffers that may be left floating */
>> +        while (seg) {
>> +            seg_len = MIN(pkt_len, seg->data_len);
>> +            seg->data_len = seg_len;
>> +
>> +            pkt_len -= seg_len;
>> +            if (pkt_len <= 0) {
>> +                /* Free the rest of chained mbufs */
>> +                free_dpdk_buf((struct dp_packet *) seg->next);
>> +                seg->next = NULL;
>> +            } else if (!seg->next) {
>> +                seg->data_len = pkt_len;
> Here we could potentially set the size of the segment larger than the 
> allocated/available buffer.
> 
> Also if this is fine, we set it to the wrong value... Assume we have a 
> single buffer of 100 bytes and we call with v=400. This will result in 
> data_len being 300, and not 400, should probably be =pkt_len + seg_len
> 

Thanks for pointing that out. I overlooked this; together with the
leftover in patch 07/13, it happened to work correctly.

Tiago.

Eelco Chaudron June 26, 2018, 9:24 a.m. | #3
On 22 Jun 2018, at 21:04, Lam, Tiago wrote:

> On 18/06/2018 12:39, Eelco Chaudron wrote:
>>
>>
>> On 11 Jun 2018, at 18:21, Tiago Lam wrote:
>>
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index 4c104b6..c301ed5 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -429,17 +429,39 @@ dp_packet_size(const struct dp_packet *b)
>>>  static inline void
>>>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>>>  {
>>> -    /* netdev-dpdk does not currently support segmentation; consequently, for
>>> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
>>> -     * be used interchangably.
>>> -     *
>>> -     * On the datapath, it is expected that the size of packets
>>> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
>>> -     * loss of accuracy in assigning 'v' to 'data_len'.
>>> -     */
>>> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
>>> -    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
>>> -                                      * this segment. */
>>> +    if (b->source == DPBUF_DPDK) {
>>
>> This function does not have any check to guarantee that the size set is
>> valid? i.e. is this not a trim only function? If we increase the size,
>> no additional mbufs get allocated (see below)
>
> Indeed, it should be used as a trim function. If the size is increased
> then the expectation is that enough space has been allocated. This is
> the current behavior when using `DPBUF_MALLOC` dp_packets as well, and
> is kept here for `DPBUF_DPDK` dp_packets.

I think a check is still needed, as it is not only used as a trim
function. For example, dp_packet_put_uninit() calls it to grow the
packet.
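
One possible shape for such a check is sketched below. It is an
assumption on my side and not something the patch contains:
'struct seg_model', model_max_size() and model_size_fits() are
hypothetical names, and the model assumes data_off never exceeds
buf_len. The idea is that, without allocating new mbufs, the chain can
only grow into the room left after data_off in its last segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_mbuf fields involved. */
struct seg_model {
    uint16_t buf_len;           /* Size of the segment's buffer. */
    uint16_t data_off;          /* Offset of the data within the buffer. */
    uint16_t data_len;          /* Bytes of data in this segment. */
    struct seg_model *next;     /* Next segment in the chain, or NULL. */
};

/* Largest total size the chain can take without allocating: the data
 * already held by the leading segments, plus everything from data_off
 * to the end of the buffer in the last segment. */
static uint32_t
model_max_size(const struct seg_model *seg)
{
    uint32_t max = 0;

    assert(seg);
    for (; seg->next; seg = seg->next) {
        max += seg->data_len;
    }
    return max + (uint32_t) (seg->buf_len - seg->data_off);
}

/* True if a requested size 'v' fits in the existing chain. */
static bool
model_size_fits(const struct seg_model *head, uint32_t v)
{
    return v <= model_max_size(head);
}
```

A caller that grows the packet could test model_size_fits() first and
treat a failure as a bug, while trimming always passes the check.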


>>
>>> +        struct rte_mbuf *seg = &b->mbuf;
>>> +        struct rte_mbuf *fmbuf = seg;
>>> +        uint16_t pkt_len = v;
>>> +        uint16_t seg_len;
>>> +        uint16_t nb_segs = 0;
>>> +
>>> +        /* Trim 'v' length bytes from the end of the chained buffers, freeing
>>> +           any buffers that may be left floating */
>>> +        while (seg) {
>>> +            seg_len = MIN(pkt_len, seg->data_len);
>>> +            seg->data_len = seg_len;
>>> +
>>> +            pkt_len -= seg_len;
>>> +            if (pkt_len <= 0) {
>>> +                /* Free the rest of chained mbufs */
>>> +                free_dpdk_buf((struct dp_packet *) seg->next);
>>> +                seg->next = NULL;
>>> +            } else if (!seg->next) {
>>> +                seg->data_len = pkt_len;
>> Here we could potentially set the size of the segment larger than the
>> allocated/available buffer.
>>
>> Also if this is fine, we set it to the wrong value... Assume we have a
>> single buffer of 100 bytes and we call with v=400. This will result in
>> data_len being 300, and not 400, should probably be =pkt_len + seg_len
>>
>
> Thanks for pointing that out. I've overlooked this, which together with
> the leftover in patch 07/13 made it work correctly.

Patch

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 4c104b6..c301ed5 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -429,17 +429,39 @@  dp_packet_size(const struct dp_packet *b)
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    /* netdev-dpdk does not currently support segmentation; consequently, for
-     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
-     * be used interchangably.
-     *
-     * On the datapath, it is expected that the size of packets
-     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
-     * loss of accuracy in assigning 'v' to 'data_len'.
-     */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
-                                      * this segment. */
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *seg = &b->mbuf;
+        struct rte_mbuf *fmbuf = seg;
+        uint16_t pkt_len = v;
+        uint16_t seg_len;
+        uint16_t nb_segs = 0;
+
+        /* Trim 'v' length bytes from the end of the chained buffers, freeing
+           any buffers that may be left floating */
+        while (seg) {
+            seg_len = MIN(pkt_len, seg->data_len);
+            seg->data_len = seg_len;
+
+            pkt_len -= seg_len;
+            if (pkt_len <= 0) {
+                /* Free the rest of chained mbufs */
+                free_dpdk_buf((struct dp_packet *) seg->next);
+                seg->next = NULL;
+            } else if (!seg->next) {
+                seg->data_len = pkt_len;
+            }
+
+            nb_segs += 1;
+            seg = seg->next;
+        }
+
+        fmbuf->nb_segs = nb_segs;
+    } else {
+        b->mbuf.data_len = v;
+    }
+
+    /* Total length of all segments linked to this segment. */
+    b->mbuf.pkt_len = v;
 }
 
 static inline uint16_t
@@ -451,7 +473,19 @@  __packet_data(const struct dp_packet *b)
 static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
+    uint16_t prev_ofs = b->mbuf.data_off;
     b->mbuf.data_off = v;
+    int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
+
+    /* When dealing with DPDK mbufs, keep data_off and data_len in sync. Thus,
+     * update data_len if the length changes with the move of data_off.
+     * However, if data_len is 0, there's no data to move and data_Len should
+     * remain 0. */
+
+    if (b->mbuf.data_len != 0) {
+        b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
+                               b->mbuf.buf_len - b->mbuf.data_off);
+    }
 }
 
 static inline uint16_t
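
The data_off/data_len adjustment in the __packet_set_data() hunk can be
followed with a standalone model of the same arithmetic. This is a
sketch under assumptions: model_resync_len() is a hypothetical name, it
takes the fields as plain arguments instead of an mbuf, and, like the
hunk, it does not guard against the offset moving past the end of the
data.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Returns the data_len that results from moving data_off from 'old_off'
 * to 'new_off'.  Moving the offset forward shrinks the data, moving it
 * back grows it, and the result is capped by the room left in the
 * buffer.  A data_len of 0 is left at 0. */
static uint16_t
model_resync_len(uint16_t buf_len, uint16_t old_off, uint16_t new_off,
                 uint16_t data_len)
{
    int diff = (int) old_off - (int) new_off;
    int room = (int) buf_len - (int) new_off;

    assert(new_off <= buf_len);
    if (data_len == 0) {
        return 0;
    }
    return (uint16_t) MODEL_MIN((int) data_len + diff, room);
}
```

For a 2048-byte buffer holding 100 bytes at offset 128, moving the
offset to 142 leaves 86 bytes, and moving it to 114 gives 114 bytes.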