[ovs-dev,v10,05/14] dp-packet: Fix data_len handling multi-seg mbufs.

Message ID 1538151315-117132-6-git-send-email-tiago.lam@intel.com
State Superseded
Headers show
Series
  • Support multi-segment mbufs
Related show

Commit Message

Lam, Tiago Sept. 28, 2018, 4:15 p.m.
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceeding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.

__dp_packet_set_data() now also updates an mbufs' data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.

Co-authored-by: Michael Qiu <qiudayu@chinac.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>

Signed-off-by: Michael Qiu <qiudayu@chinac.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dp-packet.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 72 insertions(+), 12 deletions(-)

Comments

Flavio Leitner Oct. 3, 2018, 6:26 p.m. | #1
On Fri, Sep 28, 2018 at 05:15:06PM +0100, Tiago Lam wrote:
> When a dp_packet is from a DPDK source, and it contains multi-segment
> mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
> the data_len of each mbuf in the chain should be considered while
> distributing the new (provided) size.
> 
> To account for the above dp_packet_set_size() has been changed so that,
> in the multi-segment mbufs case, only the data_len on the last mbuf of
> the chain and the total size of the packet, pkt_len, are changed. The
> data_len on the intermediate mbufs preceeding the last mbuf is not
> changed by dp_packet_set_size(). Furthermore, in some cases
> dp_packet_set_size() may be used to set a smaller size than the current
> packet size, thus effectively trimming the end of the packet. In the
> multi-segment mbufs case this may lead to lingering mbufs that may need
> freeing.
> 
> __dp_packet_set_data() now also updates an mbufs' data_len after setting
> the data offset. This is so that both fields are always in sync for each
> mbuf in a chain.

I think we need an assert(source) to make sure dp_packet_shift() will
never be called for a multi-seg.
 
> Co-authored-by: Michael Qiu <qiudayu@chinac.com>
> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
> Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
> Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
> 
> Signed-off-by: Michael Qiu <qiudayu@chinac.com>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
> Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>  lib/dp-packet.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 72 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 6376039..4545041 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -426,20 +426,60 @@ dp_packet_size(const struct dp_packet *b)
>      return b->mbuf.pkt_len;
>  }
>  
> +/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this only means
> + * setting b->size_, but if used in a DPDK packet it means adjusting the first
> + * mbuf pkt_len and last mbuf data_len, to reflect the real size, which can
> + * lead to free'ing tail mbufs that are no longer used.
> + *
> + * This function should be used for setting the size only, and if there's an
> + * assumption that the tail end of 'b' will be trimmed. For adjustng the head
> + * 'end' of 'b', dp_packet_pull() should be used instead. */
>  static inline void
>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>  {
> -    /* netdev-dpdk does not currently support segmentation; consequently, for
> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
> -     * be used interchangably.
> -     *
> -     * On the datapath, it is expected that the size of packets
> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
> -     * loss of accuracy in assigning 'v' to 'data_len'.
> -     */
> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
> -    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
> -                                      * this segment. */
> +    if (b->source == DPBUF_DPDK) {
> +        struct rte_mbuf *mbuf = &b->mbuf;
> +        uint16_t new_len = v;
> +        uint16_t data_len;
> +        uint16_t nb_segs = 0;
> +        uint16_t pkt_len = 0;
> +
> +        /* Trim 'v' length bytes from the end of the chained buffers, freeing
> +           any buffers that may be left floating */
> +        while (mbuf) {
> +            data_len = MIN(new_len, mbuf->data_len);
> +            mbuf->data_len = data_len;
> +
> +            if (new_len - data_len <= 0) {
> +                /* Free the rest of chained mbufs */
> +                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
> +                                           mbuf));
> +                mbuf->next = NULL;
> +            } else if (!mbuf->next) {
> +                /* Don't assign more than what we have available */
> +                mbuf->data_len = MIN(new_len,
> +                                     mbuf->buf_len - mbuf->data_off);
> +            }
> +
> +            new_len -= data_len;
> +            nb_segs += 1;
> +            pkt_len += mbuf->data_len;
> +            mbuf = mbuf->next;
> +        }
> +
> +        /* pkt_len != v would effectively mean that pkt_len < than 'v' (as
> +         * being bigger is logically impossible). Being < than 'v' would mean
> +         * the 'v' provided was bigger than the available room, which is the
> +         * responsibility of the caller to make sure there is enough room */
> +        ovs_assert(pkt_len == v);
> +
> +        b->mbuf.nb_segs = nb_segs;
> +        b->mbuf.pkt_len = pkt_len;
> +    } else {
> +        b->mbuf.data_len = v;
> +        /* Total length of all segments linked to this segment. */
> +        b->mbuf.pkt_len = v;
> +    }
>  }
>  
>  static inline uint16_t
> @@ -451,7 +491,27 @@ __packet_data(const struct dp_packet *b)
>  static inline void
>  __packet_set_data(struct dp_packet *b, uint16_t v)
>  {
> -    b->mbuf.data_off = v;
> +    if (b->source == DPBUF_DPDK) {
> +        /* Moving data_off away from the first mbuf in the chain is not a
> +         * possibility using DPBUF_DPDK dp_packets */
> +        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
> +
> +        uint16_t prev_ofs = b->mbuf.data_off;
> +        b->mbuf.data_off = v;
> +        int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
> +
> +        /* When dealing with DPDK mbufs, keep data_off and data_len in sync.
> +         * Thus, update data_len if the length changes with the move of
> +         * data_off. However, if data_len is 0, there's no data to move and
> +         * data_len should remain 0. */
> +
> +        if (b->mbuf.data_len != 0) {
> +            b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
> +                                   b->mbuf.buf_len - b->mbuf.data_off);

if 'v' is <= mbuf.buf_len, then why we need the above?
Why not b->mbuf.data_len += ofs_diff ?

fbl 


> +        }
> +    } else {
> +        b->mbuf.data_off = v;
> +    }
>  }
>  
>  static inline uint16_t
> -- 
> 2.7.4
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Lam, Tiago Oct. 5, 2018, 2:50 p.m. | #2
On 03/10/2018 19:26, Flavio Leitner wrote:
> On Fri, Sep 28, 2018 at 05:15:06PM +0100, Tiago Lam wrote:
>> When a dp_packet is from a DPDK source, and it contains multi-segment
>> mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
>> the data_len of each mbuf in the chain should be considered while
>> distributing the new (provided) size.
>>
>> To account for the above dp_packet_set_size() has been changed so that,
>> in the multi-segment mbufs case, only the data_len on the last mbuf of
>> the chain and the total size of the packet, pkt_len, are changed. The
>> data_len on the intermediate mbufs preceeding the last mbuf is not
>> changed by dp_packet_set_size(). Furthermore, in some cases
>> dp_packet_set_size() may be used to set a smaller size than the current
>> packet size, thus effectively trimming the end of the packet. In the
>> multi-segment mbufs case this may lead to lingering mbufs that may need
>> freeing.
>>
>> __dp_packet_set_data() now also updates an mbufs' data_len after setting
>> the data offset. This is so that both fields are always in sync for each
>> mbuf in a chain.
> 
> I think we need an assert(source) to make sure dp_packet_shift() will
> never be called for a multi-seg.

I guess you made this comment before seeing patch 07/14 where a
dp_packet_shift() implementation is introduced for DPDK packets? But
nonetheless, it has been mentioned before by Darrell that it might not
be worth implementing. Given that the only users for now are in
lib/pcap-file.c and lib/ovs-ofctl.c, where packets are always of
DPBUF_MALLOC type, I guess it could make sense to drop the
implementation. It would certainly help reduce the footprint of this
patchset for now. If a use case comes for that later on we can always
introduce it then.

>  
>> Co-authored-by: Michael Qiu <qiudayu@chinac.com>
>> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> Co-authored-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
>> Co-authored-by: Marcin Ksiadz <mksiadz@gmail.com>
>> Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
>>
>> Signed-off-by: Michael Qiu <qiudayu@chinac.com>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
>> Signed-off-by: Marcin Ksiadz <mksiadz@gmail.com>
>> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>>  lib/dp-packet.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 72 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 6376039..4545041 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -426,20 +426,60 @@ dp_packet_size(const struct dp_packet *b)
>>      return b->mbuf.pkt_len;
>>  }
>>  
>> +/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this only means
>> + * setting b->size_, but if used in a DPDK packet it means adjusting the first
>> + * mbuf pkt_len and last mbuf data_len, to reflect the real size, which can
>> + * lead to free'ing tail mbufs that are no longer used.
>> + *
>> + * This function should be used for setting the size only, and if there's an
>> + * assumption that the tail end of 'b' will be trimmed. For adjustng the head
>> + * 'end' of 'b', dp_packet_pull() should be used instead. */
>>  static inline void
>>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>>  {
>> -    /* netdev-dpdk does not currently support segmentation; consequently, for
>> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
>> -     * be used interchangably.
>> -     *
>> -     * On the datapath, it is expected that the size of packets
>> -     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
>> -     * loss of accuracy in assigning 'v' to 'data_len'.
>> -     */
>> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
>> -    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
>> -                                      * this segment. */
>> +    if (b->source == DPBUF_DPDK) {
>> +        struct rte_mbuf *mbuf = &b->mbuf;
>> +        uint16_t new_len = v;
>> +        uint16_t data_len;
>> +        uint16_t nb_segs = 0;
>> +        uint16_t pkt_len = 0;
>> +
>> +        /* Trim 'v' length bytes from the end of the chained buffers, freeing
>> +           any buffers that may be left floating */
>> +        while (mbuf) {
>> +            data_len = MIN(new_len, mbuf->data_len);
>> +            mbuf->data_len = data_len;
>> +
>> +            if (new_len - data_len <= 0) {
>> +                /* Free the rest of chained mbufs */
>> +                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
>> +                                           mbuf));
>> +                mbuf->next = NULL;
>> +            } else if (!mbuf->next) {
>> +                /* Don't assign more than what we have available */
>> +                mbuf->data_len = MIN(new_len,
>> +                                     mbuf->buf_len - mbuf->data_off);
>> +            }
>> +
>> +            new_len -= data_len;
>> +            nb_segs += 1;
>> +            pkt_len += mbuf->data_len;
>> +            mbuf = mbuf->next;
>> +        }
>> +
>> +        /* pkt_len != v would effectively mean that pkt_len < than 'v' (as
>> +         * being bigger is logically impossible). Being < than 'v' would mean
>> +         * the 'v' provided was bigger than the available room, which is the
>> +         * responsibility of the caller to make sure there is enough room */
>> +        ovs_assert(pkt_len == v);
>> +
>> +        b->mbuf.nb_segs = nb_segs;
>> +        b->mbuf.pkt_len = pkt_len;
>> +    } else {
>> +        b->mbuf.data_len = v;
>> +        /* Total length of all segments linked to this segment. */
>> +        b->mbuf.pkt_len = v;
>> +    }
>>  }
>>  
>>  static inline uint16_t
>> @@ -451,7 +491,27 @@ __packet_data(const struct dp_packet *b)
>>  static inline void
>>  __packet_set_data(struct dp_packet *b, uint16_t v)
>>  {
>> -    b->mbuf.data_off = v;
>> +    if (b->source == DPBUF_DPDK) {
>> +        /* Moving data_off away from the first mbuf in the chain is not a
>> +         * possibility using DPBUF_DPDK dp_packets */
>> +        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
>> +
>> +        uint16_t prev_ofs = b->mbuf.data_off;
>> +        b->mbuf.data_off = v;
>> +        int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
>> +
>> +        /* When dealing with DPDK mbufs, keep data_off and data_len in sync.
>> +         * Thus, update data_len if the length changes with the move of
>> +         * data_off. However, if data_len is 0, there's no data to move and
>> +         * data_len should remain 0. */
>> +
>> +        if (b->mbuf.data_len != 0) {
>> +            b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
>> +                                   b->mbuf.buf_len - b->mbuf.data_off);
> 
> if 'v' is <= mbuf.buf_len, then why we need the above?
> Why not b->mbuf.data_len += ofs_diff ?

This is an oversight; Indeed, the first condition inside the MIN()
should always be hit. I'll use the above, thanks.

Tiago.

> 
> fbl 
> 
> 
>> +        }
>> +    } else {
>> +        b->mbuf.data_off = v;
>> +    }
>>  }
>>  
>>  static inline uint16_t
>> -- 
>> 2.7.4
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Flavio Leitner Oct. 5, 2018, 4:49 p.m. | #3
On Fri, Oct 05, 2018 at 03:50:46PM +0100, Lam, Tiago wrote:
> On 03/10/2018 19:26, Flavio Leitner wrote:
> > On Fri, Sep 28, 2018 at 05:15:06PM +0100, Tiago Lam wrote:
> >> When a dp_packet is from a DPDK source, and it contains multi-segment
> >> mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
> >> the data_len of each mbuf in the chain should be considered while
> >> distributing the new (provided) size.
> >>
> >> To account for the above dp_packet_set_size() has been changed so that,
> >> in the multi-segment mbufs case, only the data_len on the last mbuf of
> >> the chain and the total size of the packet, pkt_len, are changed. The
> >> data_len on the intermediate mbufs preceeding the last mbuf is not
> >> changed by dp_packet_set_size(). Furthermore, in some cases
> >> dp_packet_set_size() may be used to set a smaller size than the current
> >> packet size, thus effectively trimming the end of the packet. In the
> >> multi-segment mbufs case this may lead to lingering mbufs that may need
> >> freeing.
> >>
> >> __dp_packet_set_data() now also updates an mbufs' data_len after setting
> >> the data offset. This is so that both fields are always in sync for each
> >> mbuf in a chain.
> > 
> > I think we need an assert(source) to make sure dp_packet_shift() will
> > never be called for a multi-seg.
> 
> I guess you made this comment before seeing patch 07/14 where a
> dp_packet_shift() implementation is introduced for DPDK packets? But
> nonetheless, it has been mentioned before by Darrell that it might not
> be worth implementing. Given that the only users for now are in
> lib/pcap-file.c and lib/ovs-ofctl.c, where packets are always of
> DPBUF_MALLOC type, I guess it could make sense to drop the
> implementation. It would certainly help reduce the footprint of this
> patchset for now. If a use case comes for that later on we can always
> introduce it then.

Sure, my point is that it doesn't expect anything other than
DPBUF_MALLOC type, so adding an assert() would be helpful to
catch bugs, that's all.

Thanks
fbl

Patch

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6376039..4545041 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -426,20 +426,60 @@  dp_packet_size(const struct dp_packet *b)
     return b->mbuf.pkt_len;
 }
 
+/* Sets the size of the packet 'b' to 'v'. For non-DPDK packets this only means
+ * setting b->size_, but if used in a DPDK packet it means adjusting the first
+ * mbuf pkt_len and last mbuf data_len, to reflect the real size, which can
+ * lead to free'ing tail mbufs that are no longer used.
+ *
+ * This function should be used for setting the size only, and if there's an
+ * assumption that the tail end of 'b' will be trimmed. For adjustng the head
+ * 'end' of 'b', dp_packet_pull() should be used instead. */
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    /* netdev-dpdk does not currently support segmentation; consequently, for
-     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
-     * be used interchangably.
-     *
-     * On the datapath, it is expected that the size of packets
-     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
-     * loss of accuracy in assigning 'v' to 'data_len'.
-     */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
-                                      * this segment. */
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *mbuf = &b->mbuf;
+        uint16_t new_len = v;
+        uint16_t data_len;
+        uint16_t nb_segs = 0;
+        uint16_t pkt_len = 0;
+
+        /* Trim 'v' length bytes from the end of the chained buffers, freeing
+           any buffers that may be left floating */
+        while (mbuf) {
+            data_len = MIN(new_len, mbuf->data_len);
+            mbuf->data_len = data_len;
+
+            if (new_len - data_len <= 0) {
+                /* Free the rest of chained mbufs */
+                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
+                                           mbuf));
+                mbuf->next = NULL;
+            } else if (!mbuf->next) {
+                /* Don't assign more than what we have available */
+                mbuf->data_len = MIN(new_len,
+                                     mbuf->buf_len - mbuf->data_off);
+            }
+
+            new_len -= data_len;
+            nb_segs += 1;
+            pkt_len += mbuf->data_len;
+            mbuf = mbuf->next;
+        }
+
+        /* pkt_len != v would effectively mean that pkt_len < than 'v' (as
+         * being bigger is logically impossible). Being < than 'v' would mean
+         * the 'v' provided was bigger than the available room, which is the
+         * responsibility of the caller to make sure there is enough room */
+        ovs_assert(pkt_len == v);
+
+        b->mbuf.nb_segs = nb_segs;
+        b->mbuf.pkt_len = pkt_len;
+    } else {
+        b->mbuf.data_len = v;
+        /* Total length of all segments linked to this segment. */
+        b->mbuf.pkt_len = v;
+    }
 }
 
 static inline uint16_t
@@ -451,7 +491,27 @@  __packet_data(const struct dp_packet *b)
 static inline void
 __packet_set_data(struct dp_packet *b, uint16_t v)
 {
-    b->mbuf.data_off = v;
+    if (b->source == DPBUF_DPDK) {
+        /* Moving data_off away from the first mbuf in the chain is not a
+         * possibility using DPBUF_DPDK dp_packets */
+        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
+
+        uint16_t prev_ofs = b->mbuf.data_off;
+        b->mbuf.data_off = v;
+        int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
+
+        /* When dealing with DPDK mbufs, keep data_off and data_len in sync.
+         * Thus, update data_len if the length changes with the move of
+         * data_off. However, if data_len is 0, there's no data to move and
+         * data_len should remain 0. */
+
+        if (b->mbuf.data_len != 0) {
+            b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
+                                   b->mbuf.buf_len - b->mbuf.data_off);
+        }
+    } else {
+        b->mbuf.data_off = v;
+    }
 }
 
 static inline uint16_t