diff mbox series

[ovs-dev,v5,08/14] dp-packet: Handle multi-seg mbufs in resize__().

Message ID 1531333421-235225-9-git-send-email-tiago.lam@intel.com
State Superseded
Headers show
Series Support multi-segment mbufs | expand

Commit Message

Lam, Tiago July 11, 2018, 6:23 p.m. UTC
When enabled with DPDK OvS relies on mbufs allocated by mempools to
receive and output data on DPDK ports. Until now, each OvS dp_packet has
had only one mbuf associated, which is allocated with the maximum
possible size, taking the MTU into account. This approach, however,
doesn't allow us to increase the allocated size in an mbuf, if needed,
since an mbuf is allocated and initialised upon mempool creation. Thus,
in the current implementatin this is dealt with by calling
OVS_NOT_REACHED() and terminating OvS.

To avoid this, and allow the (already) allocated space to be better
used, dp_packet_resize__() now tries to use the available room, both the
tailroom and the headroom, to make enough space for the new data. Since
this happens for packets of source DPBUF_DPDK, the single-segment mbuf
case mentioned above is also covered by this new aproach in resize__().

Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

Comments

Darrell Ball July 13, 2018, 5:54 p.m. UTC | #1
Thanks for the patch.

A few queries inline.

On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com> wrote:

> When enabled with DPDK OvS relies on mbufs allocated by mempools to
> receive and output data on DPDK ports. Until now, each OvS dp_packet has
> had only one mbuf associated, which is allocated with the maximum
> possible size, taking the MTU into account. This approach, however,
> doesn't allow us to increase the allocated size in an mbuf, if needed,
> since an mbuf is allocated and initialised upon mempool creation. Thus,
> in the current implementatin this is dealt with by calling
> OVS_NOT_REACHED() and terminating OvS.
>
> To avoid this, and allow the (already) allocated space to be better
> used, dp_packet_resize__() now tries to use the available room, both the
> tailroom and the headroom, to make enough space for the new data. Since
> this happens for packets of source DPBUF_DPDK, the single-segment mbuf
> case mentioned above is also covered by this new aproach in resize__().
>
> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>  lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index d6e19eb..87af459 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
> new_headroom, size_t new_tailroom
>      new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>
>      switch (b->source) {
> +    /* When resizing mbufs, both a single mbuf and multi-segment mbufs
> (where
> +     * data is not contigously held in memory), both the headroom and the
> +     * tailroom available will be used to make more space for where data
> needs
> +     * to be inserted. I.e if there's not enough headroom, data may be
> shifted
> +     * right if there's enough tailroom.
> +     * However, this is not bulletproof and in some cases the space
> available
> +     * won't be enough - in those cases, an error should be returned and
> the
> +     * packet dropped. */
>      case DPBUF_DPDK:
> -        OVS_NOT_REACHED();
>

Previously, it was a coding error to call this function for a DPDK mbuf
case, which is pretty
clear. But with this patch, presumably that is not longer the case and the
calling the API is
now ok for DPDK mbufs.



> +    {
> +        size_t miss_len;
> +
> +        if (new_headroom == dp_packet_headroom(b)) {
> +            /* This is a tailroom adjustment. Since there's no tailroom
> space
> +             * left, try and shift data towards the head to free up tail
> space,
> +             * if there's enough headroom */
> +
> +            miss_len = new_tailroom - dp_packet_tailroom(b);
> +
> +            if (miss_len <= new_headroom) {
> +                dp_packet_shift(b, -miss_len);
> +            } else {
> +                /* XXX: Handle error case and report error to caller */
> +                OVS_NOT_REACHED();
>


This will not just drop the packet, it will fail the daemon, because a
packet cannot be resized.
If the system is completely depleted of memory, that may be ok, but in the
case, no such
assumption is implied.

Also, why is XXX still left in the patch?

Also, the preexisting API handles two cases:
1/ Tailroom only adjustment
2/ headroom and/or tailroom adjustment

meaning it handles all cases.

The new DPDK addition (part of the same API) defines 2 cases

1/ tailroom only adjustment
2/ headroom only adjustment

So, it looks like a different API, that also does not handle all cases.



> +            }
> +        } else {
> +            /* Otherwise, this is a headroom adjustment. Try to shift data
> +             * towards the tail to free up head space, if there's enough
> +             * tailroom */
> +
> +            miss_len = new_headroom - dp_packet_headroom(b);
>
> +
> +            if (miss_len <= new_tailroom) {
> +                dp_packet_shift(b, miss_len);
> +            } else {
> +                /* XXX: Handle error case and report error to caller */
> +                OVS_NOT_REACHED();
>


same comments as above.



> +            }
> +        }
> +
> +        new_base = dp_packet_base(b);
> +
> +        break;
> +    }
>      case DPBUF_MALLOC:
>          if (new_headroom == dp_packet_headroom(b)) {
>              new_base = xrealloc(dp_packet_base(b), new_allocated);
> @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
> new_headroom, size_t new_tailroom
>          OVS_NOT_REACHED();
>      }
>
> -    dp_packet_set_allocated(b, new_allocated);
> +    if (b->source != DPBUF_DPDK) {
> +        dp_packet_set_allocated(b, new_allocated);
> +    }
>      dp_packet_set_base(b, new_base);
>
>      new_data = (char *) new_base + new_headroom;
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Lam, Tiago July 16, 2018, 8:37 a.m. UTC | #2
On 13/07/2018 18:54, Darrell Ball wrote:
> Thanks for the patch.
> 
> A few queries inline.
> 

Hi Darrell,

Thanks for your inputs. I've replied in-line as well.

> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>> wrote:
> 
>     When enabled with DPDK OvS relies on mbufs allocated by mempools to
>     receive and output data on DPDK ports. Until now, each OvS dp_packet has
>     had only one mbuf associated, which is allocated with the maximum
>     possible size, taking the MTU into account. This approach, however,
>     doesn't allow us to increase the allocated size in an mbuf, if needed,
>     since an mbuf is allocated and initialised upon mempool creation. Thus,
>     in the current implementatin this is dealt with by calling
>     OVS_NOT_REACHED() and terminating OvS.
> 
>     To avoid this, and allow the (already) allocated space to be better
>     used, dp_packet_resize__() now tries to use the available room, both the
>     tailroom and the headroom, to make enough space for the new data. Since
>     this happens for packets of source DPBUF_DPDK, the single-segment mbuf
>     case mentioned above is also covered by this new aproach in resize__().
> 
>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com>>
>     Acked-by: Eelco Chaudron <echaudro@redhat.com
>     <mailto:echaudro@redhat.com>>
>     ---
>      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>      1 file changed, 46 insertions(+), 2 deletions(-)
> 
>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     index d6e19eb..87af459 100644
>     --- a/lib/dp-packet.c
>     +++ b/lib/dp-packet.c
>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
>     new_headroom, size_t new_tailroom
>          new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
> 
>          switch (b->source) {
>     +    /* When resizing mbufs, both a single mbuf and multi-segment
>     mbufs (where
>     +     * data is not contigously held in memory), both the headroom
>     and the
>     +     * tailroom available will be used to make more space for where
>     data needs
>     +     * to be inserted. I.e if there's not enough headroom, data may
>     be shifted
>     +     * right if there's enough tailroom.
>     +     * However, this is not bulletproof and in some cases the space
>     available
>     +     * won't be enough - in those cases, an error should be
>     returned and the
>     +     * packet dropped. */
>          case DPBUF_DPDK:
>     -        OVS_NOT_REACHED();
> 
> 
> Previously, it was a coding error to call this function for a DPDK mbuf
> case, which is pretty
> clear. But with this patch, presumably that is not longer the case and
> the calling the API is
> now ok for DPDK mbufs.
> 

As it stands, it will still be an error to call dp_packet_resize__() for
any DPDK packet, or by extension any of the other functions that call
it, such as dp_packet_prealloc_tailroom() and
dp_packet_prealloc_headroom(). This patch only tries to alleviate that
by accommodating space from the headroom or tailroom, if possible, and
create just enough space for the new data. My preferred approach would
be to return an error if not possible, but since the API doesn't deal
with errors as is, the previous behavior of manually asserting was left
as is. As reported in [1] (I comment more on that below), the behavior
of manually asserting can lead to undesired behavior in some use cases.

>  
> 
>     +    {
>     +        size_t miss_len;
>     +
>     +        if (new_headroom == dp_packet_headroom(b)) {
>     +            /* This is a tailroom adjustment. Since there's no
>     tailroom space
>     +             * left, try and shift data towards the head to free up
>     tail space,
>     +             * if there's enough headroom */
>     +
>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
>     +
>     +            if (miss_len <= new_headroom) {
>     +                dp_packet_shift(b, -miss_len);
>     +            } else {
>     +                /* XXX: Handle error case and report error to caller */
>     +                OVS_NOT_REACHED();
> 
> 
> 
> This will not just drop the packet, it will fail the daemon, because a
> packet cannot be resized.
> If the system is completely depleted of memory, that may be ok, but in
> the case, no such
> assumption is implied.
> 
> Also, why is XXX still left in the patch?
> 

Because there's still work to do in that regard. The whole process
shouldn't be brought down if there's not enough space to put some data
in one single packet. However, this was intentionally left out of this
series or otherwise it would increase its complexity considerably.

As others have pointed out in [1], this is not a simple change, which
would have to be propagated to higher levels in other parts of the code
base. I've proposed an alternative (vs refactoring the whole dp_packet
API to handle and return errors) in [2], but that seems to have gone
stale. Going forward I see that approach merging with this new piece in
dp_packet_resize__(), where an error can be returned to the caller if
there's not enough space.

> Also, the preexisting API handles two cases:
> 1/ Tailroom only adjustment
> 2/ headroom and/or tailroom adjustment
> 
> meaning it handles all cases.
>
> The new DPDK addition (part of the same API) defines 2 cases
> 
> 1/ tailroom only adjustment
> 2/ headroom only adjustment
> 
> So, it looks like a different API, that also does not handle all cases.
> 
>  

You have a point there, support for point 2/ "headroom and tailroom
adjustment" is missed. It doesn't seem to be used anywhere at the
moment, the only callers being dp_packet_prealloc_tailroom() and
dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
deal with this. Thanks for pointing it out.

Tiago.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html

> 
>     +            }
>     +        } else {
>     +            /* Otherwise, this is a headroom adjustment. Try to
>     shift data
>     +             * towards the tail to free up head space, if there's
>     enough
>     +             * tailroom */
>     +
>     +            miss_len = new_headroom - dp_packet_headroom(b);
> 
>     +
>     +            if (miss_len <= new_tailroom) {
>     +                dp_packet_shift(b, miss_len);
>     +            } else {
>     +                /* XXX: Handle error case and report error to caller */
>     +                OVS_NOT_REACHED();
> 
> 
> 
> same comments as above.
> 
>  
> 
>     +            }
>     +        }
>     +
>     +        new_base = dp_packet_base(b);
>     +
>     +        break;
>     +    }
>          case DPBUF_MALLOC:
>              if (new_headroom == dp_packet_headroom(b)) {
>                  new_base = xrealloc(dp_packet_base(b), new_allocated);
>     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
>     new_headroom, size_t new_tailroom
>              OVS_NOT_REACHED();
>          }
> 
>     -    dp_packet_set_allocated(b, new_allocated);
>     +    if (b->source != DPBUF_DPDK) {
>     +        dp_packet_set_allocated(b, new_allocated);
>     +    }
>          dp_packet_set_base(b, new_base);
> 
>          new_data = (char *) new_base + new_headroom;
>     -- 
>     2.7.4
> 
>     _______________________________________________
>     dev mailing list
>     dev@openvswitch.org <mailto:dev@openvswitch.org>
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> 
>
Lam, Tiago July 17, 2018, 7:59 a.m. UTC | #3
On 16/07/2018 09:37, Lam, Tiago wrote:
> On 13/07/2018 18:54, Darrell Ball wrote:
>> Thanks for the patch.
>>
>> A few queries inline.
>>
> 
> Hi Darrell,
> 
> Thanks for your inputs. I've replied in-line as well.
> 
>> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
>> <mailto:tiago.lam@intel.com>> wrote:
>>
>>     When enabled with DPDK OvS relies on mbufs allocated by mempools to
>>     receive and output data on DPDK ports. Until now, each OvS dp_packet has
>>     had only one mbuf associated, which is allocated with the maximum
>>     possible size, taking the MTU into account. This approach, however,
>>     doesn't allow us to increase the allocated size in an mbuf, if needed,
>>     since an mbuf is allocated and initialised upon mempool creation. Thus,
>>     in the current implementatin this is dealt with by calling
>>     OVS_NOT_REACHED() and terminating OvS.
>>
>>     To avoid this, and allow the (already) allocated space to be better
>>     used, dp_packet_resize__() now tries to use the available room, both the
>>     tailroom and the headroom, to make enough space for the new data. Since
>>     this happens for packets of source DPBUF_DPDK, the single-segment mbuf
>>     case mentioned above is also covered by this new aproach in resize__().
>>
>>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
>>     <mailto:tiago.lam@intel.com>>
>>     Acked-by: Eelco Chaudron <echaudro@redhat.com
>>     <mailto:echaudro@redhat.com>>
>>     ---
>>      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>>      1 file changed, 46 insertions(+), 2 deletions(-)
>>
>>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>     index d6e19eb..87af459 100644
>>     --- a/lib/dp-packet.c
>>     +++ b/lib/dp-packet.c
>>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
>>     new_headroom, size_t new_tailroom
>>          new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
>>
>>          switch (b->source) {
>>     +    /* When resizing mbufs, both a single mbuf and multi-segment
>>     mbufs (where
>>     +     * data is not contigously held in memory), both the headroom
>>     and the
>>     +     * tailroom available will be used to make more space for where
>>     data needs
>>     +     * to be inserted. I.e if there's not enough headroom, data may
>>     be shifted
>>     +     * right if there's enough tailroom.
>>     +     * However, this is not bulletproof and in some cases the space
>>     available
>>     +     * won't be enough - in those cases, an error should be
>>     returned and the
>>     +     * packet dropped. */
>>          case DPBUF_DPDK:
>>     -        OVS_NOT_REACHED();
>>
>>
>> Previously, it was a coding error to call this function for a DPDK mbuf
>> case, which is pretty
>> clear. But with this patch, presumably that is not longer the case and
>> the calling the API is
>> now ok for DPDK mbufs.
>>
> 
> As it stands, it will still be an error to call dp_packet_resize__() for
> any DPDK packet, or by extension any of the other functions that call
> it, such as dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom(). This patch only tries to alleviate that
> by accommodating space from the headroom or tailroom, if possible, and
> create just enough space for the new data. My preferred approach would
> be to return an error if not possible, but since the API doesn't deal
> with errors as is, the previous behavior of manually asserting was left
> as is. As reported in [1] (I comment more on that below), the behavior
> of manually asserting can lead to undesired behavior in some use cases.
> 
>>  
>>
>>     +    {
>>     +        size_t miss_len;
>>     +
>>     +        if (new_headroom == dp_packet_headroom(b)) {
>>     +            /* This is a tailroom adjustment. Since there's no
>>     tailroom space
>>     +             * left, try and shift data towards the head to free up
>>     tail space,
>>     +             * if there's enough headroom */
>>     +
>>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
>>     +
>>     +            if (miss_len <= new_headroom) {
>>     +                dp_packet_shift(b, -miss_len);
>>     +            } else {
>>     +                /* XXX: Handle error case and report error to caller */
>>     +                OVS_NOT_REACHED();
>>
>>
>>
>> This will not just drop the packet, it will fail the daemon, because a
>> packet cannot be resized.
>> If the system is completely depleted of memory, that may be ok, but in
>> the case, no such
>> assumption is implied.
>>
>> Also, why is XXX still left in the patch?
>>
> 
> Because there's still work to do in that regard. The whole process
> shouldn't be brought down if there's not enough space to put some data
> in one single packet. However, this was intentionally left out of this
> series or otherwise it would increase its complexity considerably.
> 
> As others have pointed out in [1], this is not a simple change, which
> would have to be propagated to higher levels in other parts of the code
> base. I've proposed an alternative (vs refactoring the whole dp_packet
> API to handle and return errors) in [2], but that seems to have gone
> stale. Going forward I see that approach merging with this new piece in
> dp_packet_resize__(), where an error can be returned to the caller if
> there's not enough space.
> 
>> Also, the preexisting API handles two cases:
>> 1/ Tailroom only adjustment
>> 2/ headroom and/or tailroom adjustment
>>
>> meaning it handles all cases.
>>
>> The new DPDK addition (part of the same API) defines 2 cases
>>
>> 1/ tailroom only adjustment
>> 2/ headroom only adjustment
>>
>> So, it looks like a different API, that also does not handle all cases.
>>
>>  
> 
> You have a point there, support for point 2/ "headroom and tailroom
> adjustment" is missed. It doesn't seem to be used anywhere at the
> moment, the only callers being dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
> deal with this. Thanks for pointing it out.
> 
I've had a look into this and it doesn't seem that case number 2/
"headroom and tailroom adjustment" above would make sense for the
DPBUF_DPDK case. The reason being that if both `new_tailroom` and
`new_headroom` are being increased (in respect to dp_packet_tailroom()
and dp_packet_headroom()), there won't be enough space as the tail and
head would be competing for the same available space. So it makes sense
to make it an exclusive operation for the DPBUF_DPDK case.

The work I mentioned before in [1] should help here as we would then be
able to return an error (suach as `EINVAL`) if both "new_tailroom" and
"new_headroom" are incremented for the DPBUF_DPDK case.

What do you think?

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
Darrell Ball July 18, 2018, 10:53 p.m. UTC | #4
sorry, several distractions delayed response.

On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 13/07/2018 18:54, Darrell Ball wrote:
> > Thanks for the patch.
> >
> > A few queries inline.
> >
>
> Hi Darrell,
>
> Thanks for your inputs. I've replied in-line as well.
>
> > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
> > <mailto:tiago.lam@intel.com>> wrote:
> >
> >     When enabled with DPDK OvS relies on mbufs allocated by mempools to
> >     receive and output data on DPDK ports. Until now, each OvS dp_packet
> has
> >     had only one mbuf associated, which is allocated with the maximum
> >     possible size, taking the MTU into account. This approach, however,
> >     doesn't allow us to increase the allocated size in an mbuf, if
> needed,
> >     since an mbuf is allocated and initialised upon mempool creation.
> Thus,
> >     in the current implementatin this is dealt with by calling
> >     OVS_NOT_REACHED() and terminating OvS.
> >
> >     To avoid this, and allow the (already) allocated space to be better
> >     used, dp_packet_resize__() now tries to use the available room, both
> the
> >     tailroom and the headroom, to make enough space for the new data.
> Since
> >     this happens for packets of source DPBUF_DPDK, the single-segment
> mbuf
> >     case mentioned above is also covered by this new aproach in
> resize__().
> >
> >     Signed-off-by: Tiago Lam <tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com>>
> >     Acked-by: Eelco Chaudron <echaudro@redhat.com
> >     <mailto:echaudro@redhat.com>>
> >     ---
> >      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++
> ++++++++++++++++--
> >      1 file changed, 46 insertions(+), 2 deletions(-)
> >
> >     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     index d6e19eb..87af459 100644
> >     --- a/lib/dp-packet.c
> >     +++ b/lib/dp-packet.c
> >     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
> >     new_headroom, size_t new_tailroom
> >          new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
> >
> >          switch (b->source) {
> >     +    /* When resizing mbufs, both a single mbuf and multi-segment
> >     mbufs (where
> >     +     * data is not contigously held in memory), both the headroom
> >     and the
> >     +     * tailroom available will be used to make more space for where
> >     data needs
> >     +     * to be inserted. I.e if there's not enough headroom, data may
> >     be shifted
> >     +     * right if there's enough tailroom.
> >     +     * However, this is not bulletproof and in some cases the space
> >     available
> >     +     * won't be enough - in those cases, an error should be
> >     returned and the
> >     +     * packet dropped. */
> >          case DPBUF_DPDK:
> >     -        OVS_NOT_REACHED();
> >
> >
> > Previously, it was a coding error to call this function for a DPDK mbuf
> > case, which is pretty
> > clear. But with this patch, presumably that is not longer the case and
> > the calling the API is
> > now ok for DPDK mbufs.
> >
>
> As it stands, it will still be an error to call dp_packet_resize__() for
> any DPDK packet, or by extension any of the other functions that call
> it, such as dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom().



yep, the existing code fails in a very clear way; i.e. whenever it is
called for a dpdk packet.
So the user would need to handle in some other way, which is not being done
today, I know.



> This patch only tries to alleviate that
> by accommodating space from the headroom or tailroom, if possible, and
> create just enough space for the new data.


The new code will fail is some yet undefined way, occasionally working and
failing
in the "other" cases.



> My preferred approach would
> be to return an error if not possible, but since the API doesn't deal
> with errors as is, the previous behavior of manually asserting was left
> as is. As reported in [1] (I comment more on that below), the behavior

of manually asserting can lead to undesired behavior in some use cases.
>


I am familiar with the issue.
As part of the change,  dp_packet_put_uninit() and dp_packet_push_uninit()
could be modified to return NULL
and that could be percolated and checked for.

Those APIs could simply check (by calling a helper API) if they would fail
a priori to trigger returning
NULL for dpdk buf cases.



>
> >
> >
> >     +    {
> >     +        size_t miss_len;
> >     +
> >     +        if (new_headroom == dp_packet_headroom(b)) {
> >     +            /* This is a tailroom adjustment. Since there's no
> >     tailroom space
> >     +             * left, try and shift data towards the head to free up
> >     tail space,
> >     +             * if there's enough headroom */
> >     +
> >     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >     +
> >     +            if (miss_len <= new_headroom) {
> >     +                dp_packet_shift(b, -miss_len);
> >     +            } else {
> >     +                /* XXX: Handle error case and report error to
> caller */
> >     +                OVS_NOT_REACHED();
> >
> >
> >
> > This will not just drop the packet, it will fail the daemon, because a
> > packet cannot be resized.
> > If the system is completely depleted of memory, that may be ok, but in
> > the case, no such
> > assumption is implied.
> >
> > Also, why is XXX still left in the patch?
> >
>
> Because there's still work to do in that regard. The whole process
> shouldn't be brought down if there's not enough space to put some data
> in one single packet. However, this was intentionally left out of this
> series or otherwise it would increase its complexity considerably.


It seems unnecessary to add a bunch of code to a series that tries to handle
'resize', but handles it partially in practical cases. It also seems
undefined when
it works and when it does not from a API caller POV.
I think patch 7 is also there to only support this patch 8.

Ideally, It would seem like a modified patch 7 and patch 8 would belong
with the rest of the
fix for the dpdk packet memory preallocation constraint issue.

Also, ideally, 'XXX' is removed from patches.



>
> As others have pointed out in [1], this is not a simple change, which
> would have to be propagated to higher levels in other parts of the code
> base. I've proposed an alternative (vs refactoring the whole dp_packet
> API to handle and return errors) in [2], but that seems to have gone
> stale. Going forward I see that approach merging with this new piece in
> dp_packet_resize__(), where an error can be returned to the caller if
> there's not enough space.
>

The full change is outside the scope of this series.



>
> > Also, the preexisting API handles two cases:
> > 1/ Tailroom only adjustment
> > 2/ headroom and/or tailroom adjustment
> >
> > meaning it handles all cases.
> >
> > The new DPDK addition (part of the same API) defines 2 cases
> >
> > 1/ tailroom only adjustment
> > 2/ headroom only adjustment
> >
> > So, it looks like a different API, that also does not handle all cases.
> >
> >
>
> You have a point there, support for point 2/ "headroom and tailroom
> adjustment" is missed. It doesn't seem to be used anywhere at the
> moment, the only callers being dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
> deal with this. Thanks for pointing it out.
>
> Tiago.
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> [2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>
> >
> >     +            }
> >     +        } else {
> >     +            /* Otherwise, this is a headroom adjustment. Try to
> >     shift data
> >     +             * towards the tail to free up head space, if there's
> >     enough
> >     +             * tailroom */
> >     +
> >     +            miss_len = new_headroom - dp_packet_headroom(b);
> >
> >     +
> >     +            if (miss_len <= new_tailroom) {
> >     +                dp_packet_shift(b, miss_len);
> >     +            } else {
> >     +                /* XXX: Handle error case and report error to
> caller */
> >     +                OVS_NOT_REACHED();
> >
> >
> >
> > same comments as above.
> >
> >
> >
> >     +            }
> >     +        }
> >     +
> >     +        new_base = dp_packet_base(b);
> >     +
> >     +        break;
> >     +    }
> >          case DPBUF_MALLOC:
> >              if (new_headroom == dp_packet_headroom(b)) {
> >                  new_base = xrealloc(dp_packet_base(b), new_allocated);
> >     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
> >     new_headroom, size_t new_tailroom
> >              OVS_NOT_REACHED();
> >          }
> >
> >     -    dp_packet_set_allocated(b, new_allocated);
> >     +    if (b->source != DPBUF_DPDK) {
> >     +        dp_packet_set_allocated(b, new_allocated);
> >     +    }
> >          dp_packet_set_base(b, new_base);
> >
> >          new_data = (char *) new_base + new_headroom;
> >     --
> >     2.7.4
> >
> >     _______________________________________________
> >     dev mailing list
> >     dev@openvswitch.org <mailto:dev@openvswitch.org>
> >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >
> >
>
Darrell Ball July 18, 2018, 11:02 p.m. UTC | #5
On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 16/07/2018 09:37, Lam, Tiago wrote:
> > On 13/07/2018 18:54, Darrell Ball wrote:
> >> Thanks for the patch.
> >>
> >> A few queries inline.
> >>
> >
> > Hi Darrell,
> >
> > Thanks for your inputs. I've replied in-line as well.
> >
> >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
> >> <mailto:tiago.lam@intel.com>> wrote:
> >>
> >>     When enabled with DPDK OvS relies on mbufs allocated by mempools to
> >>     receive and output data on DPDK ports. Until now, each OvS
> dp_packet has
> >>     had only one mbuf associated, which is allocated with the maximum
> >>     possible size, taking the MTU into account. This approach, however,
> >>     doesn't allow us to increase the allocated size in an mbuf, if
> needed,
> >>     since an mbuf is allocated and initialised upon mempool creation.
> Thus,
> >>     in the current implementatin this is dealt with by calling
> >>     OVS_NOT_REACHED() and terminating OvS.
> >>
> >>     To avoid this, and allow the (already) allocated space to be better
> >>     used, dp_packet_resize__() now tries to use the available room,
> both the
> >>     tailroom and the headroom, to make enough space for the new data.
> Since
> >>     this happens for packets of source DPBUF_DPDK, the single-segment
> mbuf
> >>     case mentioned above is also covered by this new aproach in
> resize__().
> >>
> >>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
> >>     <mailto:tiago.lam@intel.com>>
> >>     Acked-by: Eelco Chaudron <echaudro@redhat.com
> >>     <mailto:echaudro@redhat.com>>
> >>     ---
> >>      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++
> ++++++++++++++++--
> >>      1 file changed, 46 insertions(+), 2 deletions(-)
> >>
> >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >>     index d6e19eb..87af459 100644
> >>     --- a/lib/dp-packet.c
> >>     +++ b/lib/dp-packet.c
> >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
> >>     new_headroom, size_t new_tailroom
> >>          new_allocated = new_headroom + dp_packet_size(b) +
> new_tailroom;
> >>
> >>          switch (b->source) {
> >>     +    /* When resizing mbufs, both a single mbuf and multi-segment
> >>     mbufs (where
> >>     +     * data is not contigously held in memory), both the headroom
> >>     and the
> >>     +     * tailroom available will be used to make more space for where
> >>     data needs
> >>     +     * to be inserted. I.e if there's not enough headroom, data may
> >>     be shifted
> >>     +     * right if there's enough tailroom.
> >>     +     * However, this is not bulletproof and in some cases the space
> >>     available
> >>     +     * won't be enough - in those cases, an error should be
> >>     returned and the
> >>     +     * packet dropped. */
> >>          case DPBUF_DPDK:
> >>     -        OVS_NOT_REACHED();
> >>
> >>
> >> Previously, it was a coding error to call this function for a DPDK mbuf
> >> case, which is pretty
> >> clear. But with this patch, presumably that is not longer the case and
> >> the calling the API is
> >> now ok for DPDK mbufs.
> >>
> >
> > As it stands, it will still be an error to call dp_packet_resize__() for
> > any DPDK packet, or by extension any of the other functions that call
> > it, such as dp_packet_prealloc_tailroom() and
> > dp_packet_prealloc_headroom(). This patch only tries to alleviate that
> > by accommodating space from the headroom or tailroom, if possible, and
> > create just enough space for the new data. My preferred approach would
> > be to return an error if not possible, but since the API doesn't deal
> > with errors as is, the previous behavior of manually asserting was left
> > as is. As reported in [1] (I comment more on that below), the behavior
> > of manually asserting can lead to undesired behavior in some use cases.
> >
> >>
> >>
> >>     +    {
> >>     +        size_t miss_len;
> >>     +
> >>     +        if (new_headroom == dp_packet_headroom(b)) {
> >>     +            /* This is a tailroom adjustment. Since there's no
> >>     tailroom space
> >>     +             * left, try and shift data towards the head to free up
> >>     tail space,
> >>     +             * if there's enough headroom */
> >>     +
> >>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >>     +
> >>     +            if (miss_len <= new_headroom) {
> >>     +                dp_packet_shift(b, -miss_len);
> >>     +            } else {
> >>     +                /* XXX: Handle error case and report error to
> caller */
> >>     +                OVS_NOT_REACHED();
> >>
> >>
> >>
> >> This will not just drop the packet, it will fail the daemon, because a
> >> packet cannot be resized.
> >> If the system is completely depleted of memory, that may be ok, but in
> >> the case, no such
> >> assumption is implied.
> >>
> >> Also, why is XXX still left in the patch?
> >>
> >
> > Because there's still work to do in that regard. The whole process
> > shouldn't be brought down if there's not enough space to put some data
> > in one single packet. However, this was intentionally left out of this
> > series or otherwise it would increase its complexity considerably.
> >
> > As others have pointed out in [1], this is not a simple change, which
> > would have to be propagated to higher levels in other parts of the code
> > base. I've proposed an alternative (vs refactoring the whole dp_packet
> > API to handle and return errors) in [2], but that seems to have gone
> > stale. Going forward I see that approach merging with this new piece in
> > dp_packet_resize__(), where an error can be returned to the caller if
> > there's not enough space.
> >
> >> Also, the preexisting API handles two cases:
> >> 1/ Tailroom only adjustment
> >> 2/ headroom and/or tailroom adjustment
> >>
> >> meaning it handles all cases.
> >>
> >> The new DPDK addition (part of the same API) defines 2 cases
> >>
> >> 1/ tailroom only adjustment
> >> 2/ headroom only adjustment
> >>
> >> So, it looks like a different API, that also does not handle all cases.
> >>
> >>
> >
> > You have a point there, support for point 2/ "headroom and tailroom
> > adjustment" is missed. It doesn't seem to be used anywhere at the
> > moment, the only callers being dp_packet_prealloc_tailroom() and
> > dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
> > deal with this. Thanks for pointing it out.
> >
> I've had a look into this and it doesn't seem that case number 2/
> "headroom and tailroom adjustment" above would make sense for the
> DPBUF_DPDK case. The reason being that if both `new_tailroom` and
> `new_headroom` are being increased (in respect to dp_packet_tailroom()
> and dp_packet_headroom()), there won't be enough space as the tail and
> head would be competing for the same available space. So it makes sense
> to make it an exclusive operation for the DPBUF_DPDK case.
>

Maybe you can explain why that would be the case, in general?

However, if that is really the case, then that is fine; an "else if" case
would
need to be added and a modified 'return error else case'.



>
> The work I mentioned before in [1] should help here as we would then be
> able to return an error (suach as `EINVAL`) if both "new_tailroom" and
> "new_headroom" are incremented for the DPBUF_DPDK case.
>
> What do you think?
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>
Lam, Tiago July 20, 2018, 5:09 p.m. UTC | #6
On 18/07/2018 23:53, Darrell Ball wrote:
> sorry, several distractions delayed response.
> 
> On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>> wrote:
> 
>     On 13/07/2018 18:54, Darrell Ball wrote:
>     > Thanks for the patch.
>     > 
>     > A few queries inline.
>     > 
> 
>     Hi Darrell,
> 
>     Thanks for your inputs. I've replied in-line as well.
> 
>     > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
>     > 
>     >     When enabled with DPDK OvS relies on mbufs allocated by mempools to
>     >     receive and output data on DPDK ports. Until now, each OvS dp_packet has
>     >     had only one mbuf associated, which is allocated with the maximum
>     >     possible size, taking the MTU into account. This approach, however,
>     >     doesn't allow us to increase the allocated size in an mbuf, if needed,
>     >     since an mbuf is allocated and initialised upon mempool creation. Thus,
>     >     in the current implementatin this is dealt with by calling
>     >     OVS_NOT_REACHED() and terminating OvS.
>     > 
>     >     To avoid this, and allow the (already) allocated space to be better
>     >     used, dp_packet_resize__() now tries to use the available room, both the
>     >     tailroom and the headroom, to make enough space for the new data. Since
>     >     this happens for packets of source DPBUF_DPDK, the single-segment mbuf
>     >     case mentioned above is also covered by this new aproach in resize__().
>     > 
>     >     Signed-off-by: Tiago Lam <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>
>     >     Acked-by: Eelco Chaudron <echaudro@redhat.com
>     <mailto:echaudro@redhat.com>
>     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>
>     >     ---
>     >      lib/dp-packet.c | 48
>     ++++++++++++++++++++++++++++++++++++++++++++++--
>     >      1 file changed, 46 insertions(+), 2 deletions(-)
>     >
>     >     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     >     index d6e19eb..87af459 100644
>     >     --- a/lib/dp-packet.c
>     >     +++ b/lib/dp-packet.c
>     >     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
>     size_t
>     >     new_headroom, size_t new_tailroom
>     >          new_allocated = new_headroom + dp_packet_size(b) +
>     new_tailroom;
>     >
>     >          switch (b->source) {
>     >     +    /* When resizing mbufs, both a single mbuf and multi-segment
>     >     mbufs (where
>     >     +     * data is not contigously held in memory), both the headroom
>     >     and the
>     >     +     * tailroom available will be used to make more space for
>     where
>     >     data needs
>     >     +     * to be inserted. I.e if there's not enough headroom,
>     data may
>     >     be shifted
>     >     +     * right if there's enough tailroom.
>     >     +     * However, this is not bulletproof and in some cases the
>     space
>     >     available
>     >     +     * won't be enough - in those cases, an error should be
>     >     returned and the
>     >     +     * packet dropped. */
>     >          case DPBUF_DPDK:
>     >     -        OVS_NOT_REACHED();
>     >
>     >
>     > Previously, it was a coding error to call this function for a DPDK
>     mbuf
>     > case, which is pretty
>     > clear. But with this patch, presumably that is not longer the case and
>     > the calling the API is
>     > now ok for DPDK mbufs.
>     >
> 
>     As it stands, it will still be an error to call dp_packet_resize__() for
>     any DPDK packet, or by extension any of the other functions that call
>     it, such as dp_packet_prealloc_tailroom() and
>     dp_packet_prealloc_headroom(). 
> 
> 
> 
> yep, the existing code fails in a very clear way; i.e. whenever it is
> called for a dpdk packet.
> So the user would need to handle in some other way, which is not being
> done today, I know.
> 
>  
> 
>     This patch only tries to alleviate that
>     by accommodating space from the headroom or tailroom, if possible, and
>     create just enough space for the new data. 
> 
> 
> The new code will fail is some yet undefined way, occasionally working
> and failing
> in the "other" cases.
> 
>  
> 
>     My preferred approach would
>     be to return an error if not possible, but since the API doesn't deal
>     with errors as is, the previous behavior of manually asserting was left
>     as is. As reported in [1] (I comment more on that below), the behavior 
> 
>     of manually asserting can lead to undesired behavior in some use cases.
> 
> 
> 
> I am familiar with the issue.
> As part of the change,  dp_packet_put_uninit()
> and dp_packet_push_uninit() could be modified to return NULL
> and that could be percolated and checked for.
> 
> Those APIs could simply check (by calling a helper API) if they would
> fail a priori to trigger returning 
> NULL for dpdk buf cases.
> 
>  

Ok, I'm glad we're on the same page here. That's what [2] is doing. I'm
planning to bring that forward. If you could have a look as well, that
would be great.

> 
> 
>     >  
>     > 
>     >     +    {
>     >     +        size_t miss_len;
>     >     +
>     >     +        if (new_headroom == dp_packet_headroom(b)) {
>     >     +            /* This is a tailroom adjustment. Since there's no
>     >     tailroom space
>     >     +             * left, try and shift data towards the head to free up
>     >     tail space,
>     >     +             * if there's enough headroom */
>     >     +
>     >     +            miss_len = new_tailroom - dp_packet_tailroom(b);
>     >     +
>     >     +            if (miss_len <= new_headroom) {
>     >     +                dp_packet_shift(b, -miss_len);
>     >     +            } else {
>     >     +                /* XXX: Handle error case and report error to caller */
>     >     +                OVS_NOT_REACHED();
>     > 
>     > 
>     > 
>     > This will not just drop the packet, it will fail the daemon, because a
>     > packet cannot be resized.
>     > If the system is completely depleted of memory, that may be ok, but in
>     > the case, no such
>     > assumption is implied.
>     > 
>     > Also, why is XXX still left in the patch?
>     > 
> 
>     Because there's still work to do in that regard. The whole process
>     shouldn't be brought down if there's not enough space to put some data
>     in one single packet. However, this was intentionally left out of this
>     series or otherwise it would increase its complexity considerably. 
> 
> 
> It seems unnecessary to add a bunch of code to a series that tries to handle
> 'resize', but handles it partially in practical cases. It also seems
> undefined when
> it works and when it does not from a API caller POV. 

Granted that the dp_packet_resize__() supports more operations than what
this implementation provides, but dp_packet API callers won't notice a
difference since the only callers are
dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And the
implementation covers for those functions, which only either modify the
tailroom or the headroom (and not both).

> I think patch 7 is also there to only support this patch 8.
> 

That's a wrong assumption. Patch 7/14 was added to support the
dp_packet_shift() operation.

In this patch 8/14, because of the noncontiguous nature of chained
mbufs, there's no possibility of using general functions like `memmove`,
and re-using the shift operations was just easier.

> Ideally, It would seem like a modified patch 7 and patch 8 would belong
> with the rest of the 
> fix for the dpdk packet memory preallocation constraint issue.
> 
> Also, ideally, 'XXX' is removed from patches.
> 
>  
> 
> 
>     As others have pointed out in [1], this is not a simple change, which
>     would have to be propagated to higher levels in other parts of the code
>     base. I've proposed an alternative (vs refactoring the whole dp_packet
>     API to handle and return errors) in [2], but that seems to have gone
>     stale. Going forward I see that approach merging with this new piece in
>     dp_packet_resize__(), where an error can be returned to the caller if
>     there's not enough space.
> 
> 
> The full change is outside the scope of this series.
> 
>  
> 
> 
>     > Also, the preexisting API handles two cases:
>     > 1/ Tailroom only adjustment
>     > 2/ headroom and/or tailroom adjustment
>     > 
>     > meaning it handles all cases.
>     >
>     > The new DPDK addition (part of the same API) defines 2 cases
>     > 
>     > 1/ tailroom only adjustment
>     > 2/ headroom only adjustment
>     > 
>     > So, it looks like a different API, that also does not handle all cases.
>     > 
>     >  
> 
>     You have a point there, support for point 2/ "headroom and tailroom
>     adjustment" is missed. It doesn't seem to be used anywhere at the
>     moment, the only callers being dp_packet_prealloc_tailroom() and
>     dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
>     deal with this. Thanks for pointing it out.
> 
>     Tiago.
> 
>     [1]
>     https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html>
>     [2]
>     https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>
> 
>     >
>     >     +            }
>     >     +        } else {
>     >     +            /* Otherwise, this is a headroom adjustment. Try to
>     >     shift data
>     >     +             * towards the tail to free up head space, if there's
>     >     enough
>     >     +             * tailroom */
>     >     +
>     >     +            miss_len = new_headroom - dp_packet_headroom(b);
>     >
>     >     +
>     >     +            if (miss_len <= new_tailroom) {
>     >     +                dp_packet_shift(b, miss_len);
>     >     +            } else {
>     >     +                /* XXX: Handle error case and report error to
>     caller */
>     >     +                OVS_NOT_REACHED();
>     >
>     >
>     >
>     > same comments as above.
>     >
>     >  
>     >
>     >     +            }
>     >     +        }
>     >     +
>     >     +        new_base = dp_packet_base(b);
>     >     +
>     >     +        break;
>     >     +    }
>     >          case DPBUF_MALLOC:
>     >              if (new_headroom == dp_packet_headroom(b)) {
>     >                  new_base = xrealloc(dp_packet_base(b),
>     new_allocated);
>     >     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
>     >     new_headroom, size_t new_tailroom
>     >              OVS_NOT_REACHED();
>     >          }
>     >
>     >     -    dp_packet_set_allocated(b, new_allocated);
>     >     +    if (b->source != DPBUF_DPDK) {
>     >     +        dp_packet_set_allocated(b, new_allocated);
>     >     +    }
>     >          dp_packet_set_base(b, new_base);
>     >
>     >          new_data = (char *) new_base + new_headroom;
>     >     --
>     >     2.7.4
>     >
>     >     _______________________________________________
>     >     dev mailing list
>     >     dev@openvswitch.org <mailto:dev@openvswitch.org>
>     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>
>     >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>
>     >
>     >
> 
>
Lam, Tiago July 20, 2018, 5:10 p.m. UTC | #7
On 19/07/2018 00:02, Darrell Ball wrote:
> 
> 
> On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>> wrote:
> 
>     On 16/07/2018 09:37, Lam, Tiago wrote:
>     > On 13/07/2018 18:54, Darrell Ball wrote:
>     >> Thanks for the patch.
>     >>
>     >> A few queries inline.
>     >>
>     >
>     > Hi Darrell,
>     >
>     > Thanks for your inputs. I've replied in-line as well.
>     >
>     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com>
>     >> <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
>     >>
>     >>     When enabled with DPDK OvS relies on mbufs allocated by
>     mempools to
>     >>     receive and output data on DPDK ports. Until now, each OvS
>     dp_packet has
>     >>     had only one mbuf associated, which is allocated with the maximum
>     >>     possible size, taking the MTU into account. This approach,
>     however,
>     >>     doesn't allow us to increase the allocated size in an mbuf,
>     if needed,
>     >>     since an mbuf is allocated and initialised upon mempool
>     creation. Thus,
>     >>     in the current implementatin this is dealt with by calling
>     >>     OVS_NOT_REACHED() and terminating OvS.
>     >>
>     >>     To avoid this, and allow the (already) allocated space to be
>     better
>     >>     used, dp_packet_resize__() now tries to use the available
>     room, both the
>     >>     tailroom and the headroom, to make enough space for the new
>     data. Since
>     >>     this happens for packets of source DPBUF_DPDK, the
>     single-segment mbuf
>     >>     case mentioned above is also covered by this new aproach in
>     resize__().
>     >>
>     >>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com>
>     >>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>
>     >>     Acked-by: Eelco Chaudron <echaudro@redhat.com
>     <mailto:echaudro@redhat.com>
>     >>     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>
>     >>     ---
>     >>      lib/dp-packet.c | 48
>     ++++++++++++++++++++++++++++++++++++++++++++++--
>     >>      1 file changed, 46 insertions(+), 2 deletions(-)
>     >>
>     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     >>     index d6e19eb..87af459 100644
>     >>     --- a/lib/dp-packet.c
>     >>     +++ b/lib/dp-packet.c
>     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
>     size_t
>     >>     new_headroom, size_t new_tailroom
>     >>          new_allocated = new_headroom + dp_packet_size(b) +
>     new_tailroom;
>     >>
>     >>          switch (b->source) {
>     >>     +    /* When resizing mbufs, both a single mbuf and multi-segment
>     >>     mbufs (where
>     >>     +     * data is not contigously held in memory), both the
>     headroom
>     >>     and the
>     >>     +     * tailroom available will be used to make more space
>     for where
>     >>     data needs
>     >>     +     * to be inserted. I.e if there's not enough headroom,
>     data may
>     >>     be shifted
>     >>     +     * right if there's enough tailroom.
>     >>     +     * However, this is not bulletproof and in some cases
>     the space
>     >>     available
>     >>     +     * won't be enough - in those cases, an error should be
>     >>     returned and the
>     >>     +     * packet dropped. */
>     >>          case DPBUF_DPDK:
>     >>     -        OVS_NOT_REACHED();
>     >>
>     >>
>     >> Previously, it was a coding error to call this function for a
>     DPDK mbuf
>     >> case, which is pretty
>     >> clear. But with this patch, presumably that is not longer the
>     case and
>     >> the calling the API is
>     >> now ok for DPDK mbufs.
>     >>
>     >
>     > As it stands, it will still be an error to call
>     dp_packet_resize__() for
>     > any DPDK packet, or by extension any of the other functions that call
>     > it, such as dp_packet_prealloc_tailroom() and
>     > dp_packet_prealloc_headroom(). This patch only tries to alleviate that
>     > by accommodating space from the headroom or tailroom, if possible, and
>     > create just enough space for the new data. My preferred approach would
>     > be to return an error if not possible, but since the API doesn't deal
>     > with errors as is, the previous behavior of manually asserting was
>     left
>     > as is. As reported in [1] (I comment more on that below), the behavior
>     > of manually asserting can lead to undesired behavior in some use
>     cases.
>     >
>     >>  
>     >>
>     >>     +    {
>     >>     +        size_t miss_len;
>     >>     +
>     >>     +        if (new_headroom == dp_packet_headroom(b)) {
>     >>     +            /* This is a tailroom adjustment. Since there's no
>     >>     tailroom space
>     >>     +             * left, try and shift data towards the head to
>     free up
>     >>     tail space,
>     >>     +             * if there's enough headroom */
>     >>     +
>     >>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
>     >>     +
>     >>     +            if (miss_len <= new_headroom) {
>     >>     +                dp_packet_shift(b, -miss_len);
>     >>     +            } else {
>     >>     +                /* XXX: Handle error case and report error
>     to caller */
>     >>     +                OVS_NOT_REACHED();
>     >>
>     >>
>     >>
>     >> This will not just drop the packet, it will fail the daemon,
>     because a
>     >> packet cannot be resized.
>     >> If the system is completely depleted of memory, that may be ok,
>     but in
>     >> the case, no such
>     >> assumption is implied.
>     >>
>     >> Also, why is XXX still left in the patch?
>     >>
>     >
>     > Because there's still work to do in that regard. The whole process
>     > shouldn't be brought down if there's not enough space to put some data
>     > in one single packet. However, this was intentionally left out of this
>     > series or otherwise it would increase its complexity considerably.
>     >
>     > As others have pointed out in [1], this is not a simple change, which
>     > would have to be propagated to higher levels in other parts of the
>     code
>     > base. I've proposed an alternative (vs refactoring the whole dp_packet
>     > API to handle and return errors) in [2], but that seems to have gone
>     > stale. Going forward I see that approach merging with this new
>     piece in
>     > dp_packet_resize__(), where an error can be returned to the caller if
>     > there's not enough space.
>     >
>     >> Also, the preexisting API handles two cases:
>     >> 1/ Tailroom only adjustment
>     >> 2/ headroom and/or tailroom adjustment
>     >>
>     >> meaning it handles all cases.
>     >>
>     >> The new DPDK addition (part of the same API) defines 2 cases
>     >>
>     >> 1/ tailroom only adjustment
>     >> 2/ headroom only adjustment
>     >>
>     >> So, it looks like a different API, that also does not handle all
>     cases.
>     >>
>     >>  
>     >
>     > You have a point there, support for point 2/ "headroom and tailroom
>     > adjustment" is missed. It doesn't seem to be used anywhere at the
>     > moment, the only callers being dp_packet_prealloc_tailroom() and
>     > dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
>     > deal with this. Thanks for pointing it out.
>     >
>     I've had a look into this and it doesn't seem that case number 2/
>     "headroom and tailroom adjustment" above would make sense for the
>     DPBUF_DPDK case. The reason being that if both `new_tailroom` and
>     `new_headroom` are being increased (in respect to dp_packet_tailroom()
>     and dp_packet_headroom()), there won't be enough space as the tail and
>     head would be competing for the same available space. So it makes sense
>     to make it an exclusive operation for the DPBUF_DPDK case.
> 
> 
> Maybe you can explain why that would be the case, in general?
>

DPDK mbufs have a fixed layout. That means we can't really resize the
mbufs per say, as `buf_len` are fixed. For example, while in a normal
DPBUF_MALLOC packet we can reallocate data and copy everything to a
smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are
stuck with the same `buf_len` (meaning we can't change the tailroom
without affecting the `data_len`, thus modifying data). What we can do
is shift data to try and create more or less tailroom, taking or adding
space from the headroom.

I've gave it some thought and I'm now convinced that reusing resize__()
here is probably not the best approach. Given its implementation and
description ("Reallocates 'b' so that it has exactly 'new_headroom' and
'new_tailroom'"), that doesn't sound like it would be a good fit for a
DPDK packet. Maybe a completely separate function would be the best
approach here, setting up different expectations.

However, as I said before, this was implemented to try to alleviate the
call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding
more complications than what it's actually solving (at least at this
point in time), I'm going to drop this patch (I'll send a revert patch
with the needed changes) in favor of dealing properly with
dp_packet_resize__() for DPDK packets, and stop calling it for DPDK packets.

> However, if that is really the case, then that is fine; an "else if"
> case would
> need to be added and a modified 'return error else case'.
> 
>  
> 
> 
>     The work I mentioned before in [1] should help here as we would then be
>     able to return an error (suach as `EINVAL`) if both "new_tailroom" and
>     "new_headroom" are incremented for the DPBUF_DPDK case.
> 
>     What do you think?
> 
>     [1]
>     https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>
> 
>
Darrell Ball July 23, 2018, 10:48 p.m. UTC | #8
On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 19/07/2018 00:02, Darrell Ball wrote:
> >
> >
> > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam@intel.com
> > <mailto:tiago.lam@intel.com>> wrote:
> >
> >     On 16/07/2018 09:37, Lam, Tiago wrote:
> >     > On 13/07/2018 18:54, Darrell Ball wrote:
> >     >> Thanks for the patch.
> >     >>
> >     >> A few queries inline.
> >     >>
> >     >
> >     > Hi Darrell,
> >     >
> >     > Thanks for your inputs. I've replied in-line as well.
> >     >
> >     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com>
> >     >> <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
> >     >>
> >     >>     When enabled with DPDK OvS relies on mbufs allocated by
> >     mempools to
> >     >>     receive and output data on DPDK ports. Until now, each OvS
> >     dp_packet has
> >     >>     had only one mbuf associated, which is allocated with the
> maximum
> >     >>     possible size, taking the MTU into account. This approach,
> >     however,
> >     >>     doesn't allow us to increase the allocated size in an mbuf,
> >     if needed,
> >     >>     since an mbuf is allocated and initialised upon mempool
> >     creation. Thus,
> >     >>     in the current implementatin this is dealt with by calling
> >     >>     OVS_NOT_REACHED() and terminating OvS.
> >     >>
> >     >>     To avoid this, and allow the (already) allocated space to be
> >     better
> >     >>     used, dp_packet_resize__() now tries to use the available
> >     room, both the
> >     >>     tailroom and the headroom, to make enough space for the new
> >     data. Since
> >     >>     this happens for packets of source DPBUF_DPDK, the
> >     single-segment mbuf
> >     >>     case mentioned above is also covered by this new aproach in
> >     resize__().
> >     >>
> >     >>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com>
> >     >>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>
> >     >>     Acked-by: Eelco Chaudron <echaudro@redhat.com
> >     <mailto:echaudro@redhat.com>
> >     >>     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>
> >     >>     ---
> >     >>      lib/dp-packet.c | 48
> >     ++++++++++++++++++++++++++++++++++++++++++++++--
> >     >>      1 file changed, 46 insertions(+), 2 deletions(-)
> >     >>
> >     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     >>     index d6e19eb..87af459 100644
> >     >>     --- a/lib/dp-packet.c
> >     >>     +++ b/lib/dp-packet.c
> >     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
> >     size_t
> >     >>     new_headroom, size_t new_tailroom
> >     >>          new_allocated = new_headroom + dp_packet_size(b) +
> >     new_tailroom;
> >     >>
> >     >>          switch (b->source) {
> >     >>     +    /* When resizing mbufs, both a single mbuf and
> multi-segment
> >     >>     mbufs (where
> >     >>     +     * data is not contigously held in memory), both the
> >     headroom
> >     >>     and the
> >     >>     +     * tailroom available will be used to make more space
> >     for where
> >     >>     data needs
> >     >>     +     * to be inserted. I.e if there's not enough headroom,
> >     data may
> >     >>     be shifted
> >     >>     +     * right if there's enough tailroom.
> >     >>     +     * However, this is not bulletproof and in some cases
> >     the space
> >     >>     available
> >     >>     +     * won't be enough - in those cases, an error should be
> >     >>     returned and the
> >     >>     +     * packet dropped. */
> >     >>          case DPBUF_DPDK:
> >     >>     -        OVS_NOT_REACHED();
> >     >>
> >     >>
> >     >> Previously, it was a coding error to call this function for a
> >     DPDK mbuf
> >     >> case, which is pretty
> >     >> clear. But with this patch, presumably that is not longer the
> >     case and
> >     >> the calling the API is
> >     >> now ok for DPDK mbufs.
> >     >>
> >     >
> >     > As it stands, it will still be an error to call
> >     dp_packet_resize__() for
> >     > any DPDK packet, or by extension any of the other functions that
> call
> >     > it, such as dp_packet_prealloc_tailroom() and
> >     > dp_packet_prealloc_headroom(). This patch only tries to alleviate
> that
> >     > by accommodating space from the headroom or tailroom, if possible,
> and
> >     > create just enough space for the new data. My preferred approach
> would
> >     > be to return an error if not possible, but since the API doesn't
> deal
> >     > with errors as is, the previous behavior of manually asserting was
> >     left
> >     > as is. As reported in [1] (I comment more on that below), the
> behavior
> >     > of manually asserting can lead to undesired behavior in some use
> >     cases.
> >     >
> >     >>
> >     >>
> >     >>     +    {
> >     >>     +        size_t miss_len;
> >     >>     +
> >     >>     +        if (new_headroom == dp_packet_headroom(b)) {
> >     >>     +            /* This is a tailroom adjustment. Since there's
> no
> >     >>     tailroom space
> >     >>     +             * left, try and shift data towards the head to
> >     free up
> >     >>     tail space,
> >     >>     +             * if there's enough headroom */
> >     >>     +
> >     >>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >     >>     +
> >     >>     +            if (miss_len <= new_headroom) {
> >     >>     +                dp_packet_shift(b, -miss_len);
> >     >>     +            } else {
> >     >>     +                /* XXX: Handle error case and report error
> >     to caller */
> >     >>     +                OVS_NOT_REACHED();
> >     >>
> >     >>
> >     >>
> >     >> This will not just drop the packet, it will fail the daemon,
> >     because a
> >     >> packet cannot be resized.
> >     >> If the system is completely depleted of memory, that may be ok,
> >     but in
> >     >> the case, no such
> >     >> assumption is implied.
> >     >>
> >     >> Also, why is XXX still left in the patch?
> >     >>
> >     >
> >     > Because there's still work to do in that regard. The whole process
> >     > shouldn't be brought down if there's not enough space to put some
> data
> >     > in one single packet. However, this was intentionally left out of
> this
> >     > series or otherwise it would increase its complexity considerably.
> >     >
> >     > As others have pointed out in [1], this is not a simple change,
> which
> >     > would have to be propagated to higher levels in other parts of the
> >     code
> >     > base. I've proposed an alternative (vs refactoring the whole
> dp_packet
> >     > API to handle and return errors) in [2], but that seems to have
> gone
> >     > stale. Going forward I see that approach merging with this new
> >     piece in
> >     > dp_packet_resize__(), where an error can be returned to the caller
> if
> >     > there's not enough space.
> >     >
> >     >> Also, the preexisting API handles two cases:
> >     >> 1/ Tailroom only adjustment
> >     >> 2/ headroom and/or tailroom adjustment
> >     >>
> >     >> meaning it handles all cases.
> >     >>
> >     >> The new DPDK addition (part of the same API) defines 2 cases
> >     >>
> >     >> 1/ tailroom only adjustment
> >     >> 2/ headroom only adjustment
> >     >>
> >     >> So, it looks like a different API, that also does not handle all
> >     cases.
> >     >>
> >     >>
> >     >
> >     > You have a point there, support for point 2/ "headroom and tailroom
> >     > adjustment" is missed. It doesn't seem to be used anywhere at the
> >     > moment, the only callers being dp_packet_prealloc_tailroom() and
> >     > dp_packet_prealloc_headroom(), but I'll submit an incremental
> patch to
> >     > deal with this. Thanks for pointing it out.
> >     >
> >     I've had a look into this and it doesn't seem that case number 2/
> >     "headroom and tailroom adjustment" above would make sense for the
> >     DPBUF_DPDK case. The reason being that if both `new_tailroom` and
> >     `new_headroom` are being increased (in respect to
> dp_packet_tailroom()
> >     and dp_packet_headroom()), there won't be enough space as the tail
> and
> >     head would be competing for the same available space. So it makes
> sense
> >     to make it an exclusive operation for the DPBUF_DPDK case.
> >
> >
> > Maybe you can explain why that would be the case, in general?
> >
>
> DPDK mbufs have a fixed layout. That means we can't really resize the
> mbufs per say, as `buf_len` are fixed. For example, while in a normal
> DPBUF_MALLOC packet we can reallocate data and copy everything to a
> smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are
> stuck with the same `buf_len` (meaning we can't change the tailroom
> without affecting the `data_len`, thus modifying data). What we can do
> is shift data to try and create more or less tailroom, taking or adding
> space from the headroom.
>

The above is fine, albeit obvious; but my question was something different.
I was asking why you thought a packet could not adjust both headroom and
tailroom
for dpdk mbuf packets. data_len may change will headroom and/or tailroom
changes.



>
> I've gave it some thought and I'm now convinced that reusing resize__()
> here is probably not the best approach. Given its implementation and
> description ("Reallocates 'b' so that it has exactly 'new_headroom' and
> 'new_tailroom'"), that doesn't sound like it would be a good fit for a
> DPDK packet. Maybe a completely separate function would be the best
> approach here, setting up different expectations.
>

That is part of the issue.




>
> However, as I said before, this was implemented to try to alleviate the
> call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding
> more complications than what it's actually solving (at least at this
> point in time), I'm going to drop this patch (I'll send a revert patch
> with the needed changes) in favor of dealing properly with
> dp_packet_resize__() for DPDK packets, and stop calling it for DPDK
> packets.
>


I think it would be better to modify the original patchset, rather than try
to add code with
the multisegment mbuf patchset and then immediately revert it.
Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used
without the resize__() in
this patch 8, I think it would be best to remove the 'shift' api from patch
7.



>
> > However, if that is really the case, then that is fine; an "else if"
> > case would
> > need to be added and a modified 'return error else case'.
> >
> >
> >
> >
> >     The work I mentioned before in [1] should help here as we would then
> be
> >     able to return an error (suach as `EINVAL`) if both "new_tailroom"
> and
> >     "new_headroom" are incremented for the DPBUF_DPDK case.
> >
> >     What do you think?
> >
> >     [1]
> >     https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html>
> >
> >
>
Darrell Ball July 23, 2018, 10:55 p.m. UTC | #9
On Fri, Jul 20, 2018 at 10:09 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 18/07/2018 23:53, Darrell Ball wrote:
> > sorry, several distractions delayed response.
> >
> > On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam@intel.com
> > <mailto:tiago.lam@intel.com>> wrote:
> >
> >     On 13/07/2018 18:54, Darrell Ball wrote:
> >     > Thanks for the patch.
> >     >
> >     > A few queries inline.
> >     >
> >
> >     Hi Darrell,
> >
> >     Thanks for your inputs. I've replied in-line as well.
> >
> >     > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>
> >     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
> >     >
> >     >     When enabled with DPDK OvS relies on mbufs allocated by
> mempools to
> >     >     receive and output data on DPDK ports. Until now, each OvS
> dp_packet has
> >     >     had only one mbuf associated, which is allocated with the
> maximum
> >     >     possible size, taking the MTU into account. This approach,
> however,
> >     >     doesn't allow us to increase the allocated size in an mbuf, if
> needed,
> >     >     since an mbuf is allocated and initialised upon mempool
> creation. Thus,
> >     >     in the current implementatin this is dealt with by calling
> >     >     OVS_NOT_REACHED() and terminating OvS.
> >     >
> >     >     To avoid this, and allow the (already) allocated space to be
> better
> >     >     used, dp_packet_resize__() now tries to use the available
> room, both the
> >     >     tailroom and the headroom, to make enough space for the new
> data. Since
> >     >     this happens for packets of source DPBUF_DPDK, the
> single-segment mbuf
> >     >     case mentioned above is also covered by this new aproach in
> resize__().
> >     >
> >     >     Signed-off-by: Tiago Lam <tiago.lam@intel.com <mailto:
> tiago.lam@intel.com>
> >     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>
> >     >     Acked-by: Eelco Chaudron <echaudro@redhat.com
> >     <mailto:echaudro@redhat.com>
> >     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>
> >     >     ---
> >     >      lib/dp-packet.c | 48
> >     ++++++++++++++++++++++++++++++++++++++++++++++--
> >     >      1 file changed, 46 insertions(+), 2 deletions(-)
> >     >
> >     >     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     >     index d6e19eb..87af459 100644
> >     >     --- a/lib/dp-packet.c
> >     >     +++ b/lib/dp-packet.c
> >     >     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
> >     size_t
> >     >     new_headroom, size_t new_tailroom
> >     >          new_allocated = new_headroom + dp_packet_size(b) +
> >     new_tailroom;
> >     >
> >     >          switch (b->source) {
> >     >     +    /* When resizing mbufs, both a single mbuf and
> multi-segment
> >     >     mbufs (where
> >     >     +     * data is not contigously held in memory), both the
> headroom
> >     >     and the
> >     >     +     * tailroom available will be used to make more space for
> >     where
> >     >     data needs
> >     >     +     * to be inserted. I.e if there's not enough headroom,
> >     data may
> >     >     be shifted
> >     >     +     * right if there's enough tailroom.
> >     >     +     * However, this is not bulletproof and in some cases the
> >     space
> >     >     available
> >     >     +     * won't be enough - in those cases, an error should be
> >     >     returned and the
> >     >     +     * packet dropped. */
> >     >          case DPBUF_DPDK:
> >     >     -        OVS_NOT_REACHED();
> >     >
> >     >
> >     > Previously, it was a coding error to call this function for a DPDK
> >     mbuf
> >     > case, which is pretty
> >     > clear. But with this patch, presumably that is not longer the case
> and
> >     > the calling the API is
> >     > now ok for DPDK mbufs.
> >     >
> >
> >     As it stands, it will still be an error to call dp_packet_resize__()
> for
> >     any DPDK packet, or by extension any of the other functions that call
> >     it, such as dp_packet_prealloc_tailroom() and
> >     dp_packet_prealloc_headroom().
> >
> >
> >
> > yep, the existing code fails in a very clear way; i.e. whenever it is
> > called for a dpdk packet.
> > So the user would need to handle in some other way, which is not being
> > done today, I know.
> >
> >
> >
> >     This patch only tries to alleviate that
> >     by accommodating space from the headroom or tailroom, if possible,
> and
> >     create just enough space for the new data.
> >
> >
> > The new code will fail is some yet undefined way, occasionally working
> > and failing
> > in the "other" cases.
> >
> >
> >
> >     My preferred approach would
> >     be to return an error if not possible, but since the API doesn't deal
> >     with errors as is, the previous behavior of manually asserting was
> left
> >     as is. As reported in [1] (I comment more on that below), the
> behavior
> >
> >     of manually asserting can lead to undesired behavior in some use
> cases.
> >
> >
> >
> > I am familiar with the issue.
> > As part of the change,  dp_packet_put_uninit()
> > and dp_packet_push_uninit() could be modified to return NULL
> > and that could be percolated and checked for.
> >
> > Those APIs could simply check (by calling a helper API) if they would
> > fail a priori to trigger returning
> > NULL for dpdk buf cases.
> >
> >
>
> Ok, I'm glad we're on the same page here. That's what [2] is doing. I'm
> planning to bring that forward. If you could have a look as well, that
> would be great.
>
> >
> >
> >     >
> >     >
> >     >     +    {
> >     >     +        size_t miss_len;
> >     >     +
> >     >     +        if (new_headroom == dp_packet_headroom(b)) {
> >     >     +            /* This is a tailroom adjustment. Since there's no
> >     >     tailroom space
> >     >     +             * left, try and shift data towards the head to
> free up
> >     >     tail space,
> >     >     +             * if there's enough headroom */
> >     >     +
> >     >     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >     >     +
> >     >     +            if (miss_len <= new_headroom) {
> >     >     +                dp_packet_shift(b, -miss_len);
> >     >     +            } else {
> >     >     +                /* XXX: Handle error case and report error to
> caller */
> >     >     +                OVS_NOT_REACHED();
> >     >
> >     >
> >     >
> >     > This will not just drop the packet, it will fail the daemon,
> because a
> >     > packet cannot be resized.
> >     > If the system is completely depleted of memory, that may be ok,
> but in
> >     > the case, no such
> >     > assumption is implied.
> >     >
> >     > Also, why is XXX still left in the patch?
> >     >
> >
> >     Because there's still work to do in that regard. The whole process
> >     shouldn't be brought down if there's not enough space to put some
> data
> >     in one single packet. However, this was intentionally left out of
> this
> >     series or otherwise it would increase its complexity considerably.
> >
> >
> > It seems unnecessary to add a bunch of code to a series that tries to
> handle
> > 'resize', but handles it partially in practical cases. It also seems
> > undefined when
> > it works and when it does not from a API caller POV.
>
> Granted that the dp_packet_resize__() supports more operations than what
> this implementation provides, but dp_packet API callers won't notice a
> difference since the only callers are
> dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And the
> implementation covers for those functions, which only either modify the
> tailroom or the headroom (and not both).
>
> > I think patch 7 is also there to only support this patch 8.
> >
>
> That's a wrong assumption. Patch 7/14 was added to support the
> dp_packet_shift() operation.
>
> In this patch 8/14, because of the noncontiguous nature of chained
> mbufs, there's no possibility of using general functions like `memmove`,
> and re-using the shift operations was just easier.
>

I guess we may be talking about 2 different things.
My point is simply that greping for dp_packet_mbuf_shift() does not show
any users with resize__()
removed; I think it would be best to remove unused code from the patchset
dp_packet_mbuf_shift() is defined in patch 7.



> > Ideally, It would seem like a modified patch 7 and patch 8 would belong
> > with the rest of the
> > fix for the dpdk packet memory preallocation constraint issue.
> >
> > Also, ideally, 'XXX' is removed from patches.
> >
> >
> >
> >
> >     As others have pointed out in [1], this is not a simple change, which
> >     would have to be propagated to higher levels in other parts of the
> code
> >     base. I've proposed an alternative (vs refactoring the whole
> dp_packet
> >     API to handle and return errors) in [2], but that seems to have gone
> >     stale. Going forward I see that approach merging with this new piece
> in
> >     dp_packet_resize__(), where an error can be returned to the caller if
> >     there's not enough space.
> >
> >
> > The full change is outside the scope of this series.
> >
> >
> >
> >
> >     > Also, the preexisting API handles two cases:
> >     > 1/ Tailroom only adjustment
> >     > 2/ headroom and/or tailroom adjustment
> >     >
> >     > meaning it handles all cases.
> >     >
> >     > The new DPDK addition (part of the same API) defines 2 cases
> >     >
> >     > 1/ tailroom only adjustment
> >     > 2/ headroom only adjustment
> >     >
> >     > So, it looks like a different API, that also does not handle all
> cases.
> >     >
> >     >
> >
> >     You have a point there, support for point 2/ "headroom and tailroom
> >     adjustment" is missed. It doesn't seem to be used anywhere at the
> >     moment, the only callers being dp_packet_prealloc_tailroom() and
> >     dp_packet_prealloc_headroom(), but I'll submit an incremental patch
> to
> >     deal with this. Thanks for pointing it out.
> >
> >     Tiago.
> >
> >     [1]
> >     https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> >
> >     [2]
> >     https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html>
> >
> >     >
> >     >     +            }
> >     >     +        } else {
> >     >     +            /* Otherwise, this is a headroom adjustment. Try
> to
> >     >     shift data
> >     >     +             * towards the tail to free up head space, if
> there's
> >     >     enough
> >     >     +             * tailroom */
> >     >     +
> >     >     +            miss_len = new_headroom - dp_packet_headroom(b);
> >     >
> >     >     +
> >     >     +            if (miss_len <= new_tailroom) {
> >     >     +                dp_packet_shift(b, miss_len);
> >     >     +            } else {
> >     >     +                /* XXX: Handle error case and report error to
> >     caller */
> >     >     +                OVS_NOT_REACHED();
> >     >
> >     >
> >     >
> >     > same comments as above.
> >     >
> >     >
> >     >
> >     >     +            }
> >     >     +        }
> >     >     +
> >     >     +        new_base = dp_packet_base(b);
> >     >     +
> >     >     +        break;
> >     >     +    }
> >     >          case DPBUF_MALLOC:
> >     >              if (new_headroom == dp_packet_headroom(b)) {
> >     >                  new_base = xrealloc(dp_packet_base(b),
> >     new_allocated);
> >     >     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b,
> size_t
> >     >     new_headroom, size_t new_tailroom
> >     >              OVS_NOT_REACHED();
> >     >          }
> >     >
> >     >     -    dp_packet_set_allocated(b, new_allocated);
> >     >     +    if (b->source != DPBUF_DPDK) {
> >     >     +        dp_packet_set_allocated(b, new_allocated);
> >     >     +    }
> >     >          dp_packet_set_base(b, new_base);
> >     >
> >     >          new_data = (char *) new_base + new_headroom;
> >     >     --
> >     >     2.7.4
> >     >
> >     >     _______________________________________________
> >     >     dev mailing list
> >     >     dev@openvswitch.org <mailto:dev@openvswitch.org>
> >     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>
> >     >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>
> >     >
> >     >
> >
> >
>
Lam, Tiago July 24, 2018, 7:06 a.m. UTC | #10
On 23/07/2018 23:55, Darrell Ball wrote:
> 
> 
> On Fri, Jul 20, 2018 at 10:09 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>> wrote:
> 
>     On 18/07/2018 23:53, Darrell Ball wrote:
>     > sorry, several distractions delayed response.
>     > 
>     > On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
>     > 
>     >     On 13/07/2018 18:54, Darrell Ball wrote:
>     >     > Thanks for the patch.
>     >     > 
>     >     > A few queries inline.
>     >     > 
>     > 
>     >     Hi Darrell,
>     > 
>     >     Thanks for your inputs. I've replied in-line as well.
>     > 
>     >     > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
>     >     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>> wrote:
>     >     > 
>     >     >     When enabled with DPDK OvS relies on mbufs allocated by mempools to
>     >     >     receive and output data on DPDK ports. Until now, each OvS dp_packet has
>     >     >     had only one mbuf associated, which is allocated with the maximum
>     >     >     possible size, taking the MTU into account. This approach, however,
>     >     >     doesn't allow us to increase the allocated size in an mbuf, if needed,
>     >     >     since an mbuf is allocated and initialised upon mempool creation. Thus,
>     >     >     in the current implementatin this is dealt with by calling
>     >     >     OVS_NOT_REACHED() and terminating OvS.
>     >     > 
>     >     >     To avoid this, and allow the (already) allocated space to be better
>     >     >     used, dp_packet_resize__() now tries to use the available room, both the
>     >     >     tailroom and the headroom, to make enough space for the new data. Since
>     >     >     this happens for packets of source DPBUF_DPDK, the single-segment mbuf
>     >     >     case mentioned above is also covered by this new aproach in resize__().
>     >     > 
>     >     >     Signed-off-by: Tiago Lam <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
>     >     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>>
>     >     >     Acked-by: Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>
>     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>
>     >     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>
>     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>>
>     >     >     ---
>     >     >      lib/dp-packet.c | 48
>     >     ++++++++++++++++++++++++++++++++++++++++++++++--
>     >     >      1 file changed, 46 insertions(+), 2 deletions(-)
>     >     >
>     >     >     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     >     >     index d6e19eb..87af459 100644
>     >     >     --- a/lib/dp-packet.c
>     >     >     +++ b/lib/dp-packet.c
>     >     >     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
>     >     size_t
>     >     >     new_headroom, size_t new_tailroom
>     >     >          new_allocated = new_headroom + dp_packet_size(b) +
>     >     new_tailroom;
>     >     >
>     >     >          switch (b->source) {
>     >     >     +    /* When resizing mbufs, both a single mbuf and
>     multi-segment
>     >     >     mbufs (where
>     >     >     +     * data is not contigously held in memory), both
>     the headroom
>     >     >     and the
>     >     >     +     * tailroom available will be used to make more
>     space for
>     >     where
>     >     >     data needs
>     >     >     +     * to be inserted. I.e if there's not enough headroom,
>     >     data may
>     >     >     be shifted
>     >     >     +     * right if there's enough tailroom.
>     >     >     +     * However, this is not bulletproof and in some
>     cases the
>     >     space
>     >     >     available
>     >     >     +     * won't be enough - in those cases, an error should be
>     >     >     returned and the
>     >     >     +     * packet dropped. */
>     >     >          case DPBUF_DPDK:
>     >     >     -        OVS_NOT_REACHED();
>     >     >
>     >     >
>     >     > Previously, it was a coding error to call this function for
>     a DPDK
>     >     mbuf
>     >     > case, which is pretty
>     >     > clear. But with this patch, presumably that is not longer
>     the case and
>     >     > the calling the API is
>     >     > now ok for DPDK mbufs.
>     >     >
>     >
>     >     As it stands, it will still be an error to call
>     dp_packet_resize__() for
>     >     any DPDK packet, or by extension any of the other functions
>     that call
>     >     it, such as dp_packet_prealloc_tailroom() and
>     >     dp_packet_prealloc_headroom().
>     >
>     >
>     >
>     > yep, the existing code fails in a very clear way; i.e. whenever it is
>     > called for a dpdk packet.
>     > So the user would need to handle in some other way, which is not being
>     > done today, I know.
>     >
>     >  
>     >
>     >     This patch only tries to alleviate that
>     >     by accommodating space from the headroom or tailroom, if
>     possible, and
>     >     create just enough space for the new data.
>     >
>     >
>     > The new code will fail is some yet undefined way, occasionally working
>     > and failing
>     > in the "other" cases.
>     >
>     >  
>     >
>     >     My preferred approach would
>     >     be to return an error if not possible, but since the API
>     doesn't deal
>     >     with errors as is, the previous behavior of manually asserting
>     was left
>     >     as is. As reported in [1] (I comment more on that below), the
>     behavior 
>     >
>     >     of manually asserting can lead to undesired behavior in some
>     use cases.
>     >
>     >
>     >
>     > I am familiar with the issue.
>     > As part of the change,  dp_packet_put_uninit()
>     > and dp_packet_push_uninit() could be modified to return NULL
>     > and that could be percolated and checked for.
>     >
>     > Those APIs could simply check (by calling a helper API) if they would
>     > fail a priori to trigger returning 
>     > NULL for dpdk buf cases.
>     >
>     >  
> 
>     Ok, I'm glad we're on the same page here. That's what [2] is doing. I'm
>     planning to bring that forward. If you could have a look as well, that
>     would be great.
> 
>     >
>     >
>     >     >  
>     >     >
>     >     >     +    {
>     >     >     +        size_t miss_len;
>     >     >     +
>     >     >     +        if (new_headroom == dp_packet_headroom(b)) {
>     >     >     +            /* This is a tailroom adjustment. Since
>     there's no
>     >     >     tailroom space
>     >     >     +             * left, try and shift data towards the
>     head to free up
>     >     >     tail space,
>     >     >     +             * if there's enough headroom */
>     >     >     +
>     >     >     +            miss_len = new_tailroom -
>     dp_packet_tailroom(b);
>     >     >     +
>     >     >     +            if (miss_len <= new_headroom) {
>     >     >     +                dp_packet_shift(b, -miss_len);
>     >     >     +            } else {
>     >     >     +                /* XXX: Handle error case and report
>     error to caller */
>     >     >     +                OVS_NOT_REACHED();
>     >     >
>     >     >
>     >     >
>     >     > This will not just drop the packet, it will fail the daemon,
>     because a
>     >     > packet cannot be resized.
>     >     > If the system is completely depleted of memory, that may be
>     ok, but in
>     >     > the case, no such
>     >     > assumption is implied.
>     >     >
>     >     > Also, why is XXX still left in the patch?
>     >     >
>     >
>     >     Because there's still work to do in that regard. The whole process
>     >     shouldn't be brought down if there's not enough space to put
>     some data
>     >     in one single packet. However, this was intentionally left out
>     of this
>     >     series or otherwise it would increase its complexity
>     considerably. 
>     >
>     >
>     > It seems unnecessary to add a bunch of code to a series that tries
>     to handle
>     > 'resize', but handles it partially in practical cases. It also seems
>     > undefined when
>     > it works and when it does not from a API caller POV.
> 
>     Granted that the dp_packet_resize__() supports more operations than what
>     this implementation provides, but dp_packet API callers won't notice a
>     difference since the only callers are
>     dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And the
>     implementation covers for those functions, which only either modify the
>     tailroom or the headroom (and not both).
> 
>     > I think patch 7 is also there to only support this patch 8.
>     > 
> 
>     That's a wrong assumption. Patch 7/14 was added to support the
>     dp_packet_shift() operation.
> 
>     In this patch 8/14, because of the noncontiguous nature of chained
>     mbufs, there's no possibility of using general functions like `memmove`,
>     and re-using the shift operations was just easier.
> 
> 
> I guess we may be talking about 2 different things.
> My point is simply that greping for dp_packet_mbuf_shift() does not show
> any users with resize__()
> removed; I think it would be best to remove unused code from the patchset
> dp_packet_mbuf_shift() is defined in patch 7.
> 
> 

Again, it is used in `dp_packet_shift()` for DPDK packets.

> 
>     > Ideally, It would seem like a modified patch 7 and patch 8 would
>     belong
>     > with the rest of the 
>     > fix for the dpdk packet memory preallocation constraint issue.
>     >
>     > Also, ideally, 'XXX' is removed from patches.
>     >
>     >  
>     >
>     >
>     >     As others have pointed out in [1], this is not a simple
>     change, which
>     >     would have to be propagated to higher levels in other parts of
>     the code
>     >     base. I've proposed an alternative (vs refactoring the whole
>     dp_packet
>     >     API to handle and return errors) in [2], but that seems to
>     have gone
>     >     stale. Going forward I see that approach merging with this new
>     piece in
>     >     dp_packet_resize__(), where an error can be returned to the
>     caller if
>     >     there's not enough space.
>     >
>     >
>     > The full change is outside the scope of this series.
>     >
>     >  
>     >
>     >
>     >     > Also, the preexisting API handles two cases:
>     >     > 1/ Tailroom only adjustment
>     >     > 2/ headroom and/or tailroom adjustment
>     >     >
>     >     > meaning it handles all cases.
>     >     >
>     >     > The new DPDK addition (part of the same API) defines 2 cases
>     >     >
>     >     > 1/ tailroom only adjustment
>     >     > 2/ headroom only adjustment
>     >     >
>     >     > So, it looks like a different API, that also does not handle
>     all cases.
>     >     >
>     >     >  
>     >
>     >     You have a point there, support for point 2/ "headroom and
>     tailroom
>     >     adjustment" is missed. It doesn't seem to be used anywhere at the
>     >     moment, the only callers being dp_packet_prealloc_tailroom() and
>     >     dp_packet_prealloc_headroom(), but I'll submit an incremental
>     patch to
>     >     deal with this. Thanks for pointing it out.
>     >
>     >     Tiago.
>     >
>     >     [1]
>     >   
>      https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html>
>     >   
>      <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html>>
>     >     [2]
>     >   
>      https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>
>     >   
>      <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>>
>     >
>     >     >
>     >     >     +            }
>     >     >     +        } else {
>     >     >     +            /* Otherwise, this is a headroom
>     adjustment. Try to
>     >     >     shift data
>     >     >     +             * towards the tail to free up head space,
>     if there's
>     >     >     enough
>     >     >     +             * tailroom */
>     >     >     +
>     >     >     +            miss_len = new_headroom -
>     dp_packet_headroom(b);
>     >     >
>     >     >     +
>     >     >     +            if (miss_len <= new_tailroom) {
>     >     >     +                dp_packet_shift(b, miss_len);
>     >     >     +            } else {
>     >     >     +                /* XXX: Handle error case and report
>     error to
>     >     caller */
>     >     >     +                OVS_NOT_REACHED();
>     >     >
>     >     >
>     >     >
>     >     > same comments as above.
>     >     >
>     >     >  
>     >     >
>     >     >     +            }
>     >     >     +        }
>     >     >     +
>     >     >     +        new_base = dp_packet_base(b);
>     >     >     +
>     >     >     +        break;
>     >     >     +    }
>     >     >          case DPBUF_MALLOC:
>     >     >              if (new_headroom == dp_packet_headroom(b)) {
>     >     >                  new_base = xrealloc(dp_packet_base(b),
>     >     new_allocated);
>     >     >     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet
>     *b, size_t
>     >     >     new_headroom, size_t new_tailroom
>     >     >              OVS_NOT_REACHED();
>     >     >          }
>     >     >
>     >     >     -    dp_packet_set_allocated(b, new_allocated);
>     >     >     +    if (b->source != DPBUF_DPDK) {
>     >     >     +        dp_packet_set_allocated(b, new_allocated);
>     >     >     +    }
>     >     >          dp_packet_set_base(b, new_base);
>     >     >
>     >     >          new_data = (char *) new_base + new_headroom;
>     >     >     --
>     >     >     2.7.4
>     >     >
>     >     >     _______________________________________________
>     >     >     dev mailing list
>     >     >     dev@openvswitch.org <mailto:dev@openvswitch.org>
>     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>
>     >     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>
>     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>>
>     >     >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>
>     >     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>>
>     >     >
>     >     >
>     >
>     >
> 
>
Lam, Tiago July 24, 2018, 7:36 a.m. UTC | #11
On 23/07/2018 23:48, Darrell Ball wrote:
> 
> 
> On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>> wrote:
> 
>     On 19/07/2018 00:02, Darrell Ball wrote:
>     > 
>     > 
>     > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
>     > 
>     >     On 16/07/2018 09:37, Lam, Tiago wrote:
>     >     > On 13/07/2018 18:54, Darrell Ball wrote:
>     >     >> Thanks for the patch.
>     >     >>
>     >     >> A few queries inline.
>     >     >>
>     >     >
>     >     > Hi Darrell,
>     >     >
>     >     > Thanks for your inputs. I've replied in-line as well.
>     >     >
>     >     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
>     >     >> <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
>     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>> wrote:
>     >     >>
>     >     >>     When enabled with DPDK OvS relies on mbufs allocated by
>     >     mempools to
>     >     >>     receive and output data on DPDK ports. Until now, each OvS
>     >     dp_packet has
>     >     >>     had only one mbuf associated, which is allocated with
>     the maximum
>     >     >>     possible size, taking the MTU into account. This approach,
>     >     however,
>     >     >>     doesn't allow us to increase the allocated size in an mbuf,
>     >     if needed,
>     >     >>     since an mbuf is allocated and initialised upon mempool
>     >     creation. Thus,
>     >     >>     in the current implementatin this is dealt with by calling
>     >     >>     OVS_NOT_REACHED() and terminating OvS.
>     >     >>
>     >     >>     To avoid this, and allow the (already) allocated space
>     to be
>     >     better
>     >     >>     used, dp_packet_resize__() now tries to use the available
>     >     room, both the
>     >     >>     tailroom and the headroom, to make enough space for the new
>     >     data. Since
>     >     >>     this happens for packets of source DPBUF_DPDK, the
>     >     single-segment mbuf
>     >     >>     case mentioned above is also covered by this new aproach in
>     >     resize__().
>     >     >>
>     >     >>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com>
>     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
>     >     >>     <mailto:tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com> <mailto:tiago.lam@intel.com
>     <mailto:tiago.lam@intel.com>>>>
>     >     >>     Acked-by: Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>
>     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>
>     >     >>     <mailto:echaudro@redhat.com
>     <mailto:echaudro@redhat.com> <mailto:echaudro@redhat.com
>     <mailto:echaudro@redhat.com>>>>
>     >     >>     ---
>     >     >>      lib/dp-packet.c | 48
>     >     ++++++++++++++++++++++++++++++++++++++++++++++--
>     >     >>      1 file changed, 46 insertions(+), 2 deletions(-)
>     >     >>
>     >     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     >     >>     index d6e19eb..87af459 100644
>     >     >>     --- a/lib/dp-packet.c
>     >     >>     +++ b/lib/dp-packet.c
>     >     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct
>     dp_packet *b,
>     >     size_t
>     >     >>     new_headroom, size_t new_tailroom
>     >     >>          new_allocated = new_headroom + dp_packet_size(b) +
>     >     new_tailroom;
>     >     >>
>     >     >>          switch (b->source) {
>     >     >>     +    /* When resizing mbufs, both a single mbuf and
>     multi-segment
>     >     >>     mbufs (where
>     >     >>     +     * data is not contigously held in memory), both the
>     >     headroom
>     >     >>     and the
>     >     >>     +     * tailroom available will be used to make more space
>     >     for where
>     >     >>     data needs
>     >     >>     +     * to be inserted. I.e if there's not enough headroom,
>     >     data may
>     >     >>     be shifted
>     >     >>     +     * right if there's enough tailroom.
>     >     >>     +     * However, this is not bulletproof and in some cases
>     >     the space
>     >     >>     available
>     >     >>     +     * won't be enough - in those cases, an error
>     should be
>     >     >>     returned and the
>     >     >>     +     * packet dropped. */
>     >     >>          case DPBUF_DPDK:
>     >     >>     -        OVS_NOT_REACHED();
>     >     >>
>     >     >>
>     >     >> Previously, it was a coding error to call this function for a
>     >     DPDK mbuf
>     >     >> case, which is pretty
>     >     >> clear. But with this patch, presumably that is not longer the
>     >     case and
>     >     >> the calling the API is
>     >     >> now ok for DPDK mbufs.
>     >     >>
>     >     >
>     >     > As it stands, it will still be an error to call
>     >     dp_packet_resize__() for
>     >     > any DPDK packet, or by extension any of the other functions
>     that call
>     >     > it, such as dp_packet_prealloc_tailroom() and
>     >     > dp_packet_prealloc_headroom(). This patch only tries to
>     alleviate that
>     >     > by accommodating space from the headroom or tailroom, if
>     possible, and
>     >     > create just enough space for the new data. My preferred
>     approach would
>     >     > be to return an error if not possible, but since the API
>     doesn't deal
>     >     > with errors as is, the previous behavior of manually
>     asserting was
>     >     left
>     >     > as is. As reported in [1] (I comment more on that below),
>     the behavior
>     >     > of manually asserting can lead to undesired behavior in some use
>     >     cases.
>     >     >
>     >     >>  
>     >     >>
>     >     >>     +    {
>     >     >>     +        size_t miss_len;
>     >     >>     +
>     >     >>     +        if (new_headroom == dp_packet_headroom(b)) {
>     >     >>     +            /* This is a tailroom adjustment. Since
>     there's no
>     >     >>     tailroom space
>     >     >>     +             * left, try and shift data towards the
>     head to
>     >     free up
>     >     >>     tail space,
>     >     >>     +             * if there's enough headroom */
>     >     >>     +
>     >     >>     +            miss_len = new_tailroom -
>     dp_packet_tailroom(b);
>     >     >>     +
>     >     >>     +            if (miss_len <= new_headroom) {
>     >     >>     +                dp_packet_shift(b, -miss_len);
>     >     >>     +            } else {
>     >     >>     +                /* XXX: Handle error case and report error
>     >     to caller */
>     >     >>     +                OVS_NOT_REACHED();
>     >     >>
>     >     >>
>     >     >>
>     >     >> This will not just drop the packet, it will fail the daemon,
>     >     because a
>     >     >> packet cannot be resized.
>     >     >> If the system is completely depleted of memory, that may be ok,
>     >     but in
>     >     >> the case, no such
>     >     >> assumption is implied.
>     >     >>
>     >     >> Also, why is XXX still left in the patch?
>     >     >>
>     >     >
>     >     > Because there's still work to do in that regard. The whole
>     process
>     >     > shouldn't be brought down if there's not enough space to put
>     some data
>     >     > in one single packet. However, this was intentionally left
>     out of this
>     >     > series or otherwise it would increase its complexity
>     considerably.
>     >     >
>     >     > As others have pointed out in [1], this is not a simple
>     change, which
>     >     > would have to be propagated to higher levels in other parts
>     of the
>     >     code
>     >     > base. I've proposed an alternative (vs refactoring the whole
>     dp_packet
>     >     > API to handle and return errors) in [2], but that seems to
>     have gone
>     >     > stale. Going forward I see that approach merging with this new
>     >     piece in
>     >     > dp_packet_resize__(), where an error can be returned to the
>     caller if
>     >     > there's not enough space.
>     >     >
>     >     >> Also, the preexisting API handles two cases:
>     >     >> 1/ Tailroom only adjustment
>     >     >> 2/ headroom and/or tailroom adjustment
>     >     >>
>     >     >> meaning it handles all cases.
>     >     >>
>     >     >> The new DPDK addition (part of the same API) defines 2 cases
>     >     >>
>     >     >> 1/ tailroom only adjustment
>     >     >> 2/ headroom only adjustment
>     >     >>
>     >     >> So, it looks like a different API, that also does not
>     handle all
>     >     cases.
>     >     >>
>     >     >>  
>     >     >
>     >     > You have a point there, support for point 2/ "headroom and
>     tailroom
>     >     > adjustment" is missed. It doesn't seem to be used anywhere
>     at the
>     >     > moment, the only callers being dp_packet_prealloc_tailroom() and
>     >     > dp_packet_prealloc_headroom(), but I'll submit an
>     incremental patch to
>     >     > deal with this. Thanks for pointing it out.
>     >     >
>     >     I've had a look into this and it doesn't seem that case number 2/
>     >     "headroom and tailroom adjustment" above would make sense for the
>     >     DPBUF_DPDK case. The reason being that if both `new_tailroom` and
>     >     `new_headroom` are being increased (in respect to
>     dp_packet_tailroom()
>     >     and dp_packet_headroom()), there won't be enough space as the
>     tail and
>     >     head would be competing for the same available space. So it
>     makes sense
>     >     to make it an exclusive operation for the DPBUF_DPDK case.
>     >
>     >
>     > Maybe you can explain why that would be the case, in general?
>     >
> 
>     DPDK mbufs have a fixed layout. That means we can't really resize the
>     mbufs per say, as `buf_len` are fixed. For example, while in a normal
>     DPBUF_MALLOC packet we can reallocate data and copy everything to a
>     smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are
>     stuck with the same `buf_len` (meaning we can't change the tailroom
>     without affecting the `data_len`, thus modifying data). What we can do
>     is shift data to try and create more or less tailroom, taking or adding
>     space from the headroom.
> 
> 
> The above is fine, albeit obvious; but my question was something different.
> I was asking why you thought a packet could not adjust both headroom and
> tailroom
> for dpdk mbuf packets. data_len may change will headroom and/or tailroom
> changes.
> 
>  

My understanding is that `data_len` needs to remain fixed (as `size_` in
non-DPDK packets remains the same in DPBUF_MALLOC). If you change
`data_len` then you're effectively affecting the data. Either adding
random data to the packet, by increasing `data_len`, or trunking the
packet by decreasing `data_len`.

Given the above, if you want to increase the tailroom then you can only
shift the data left (if there's enough headroom to do so), and that will
affect the headroom available space.

> 
> 
>     I've gave it some thought and I'm now convinced that reusing resize__()
>     here is probably not the best approach. Given its implementation and
>     description ("Reallocates 'b' so that it has exactly 'new_headroom' and
>     'new_tailroom'"), that doesn't sound like it would be a good fit for a
>     DPDK packet. Maybe a completely separate function would be the best
>     approach here, setting up different expectations.
> 
> 
> That is part of the issue.
> 
> 
>  
> 
> 
>     However, as I said before, this was implemented to try to alleviate the
>     call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding
>     more complications than what it's actually solving (at least at this
>     point in time), I'm going to drop this patch (I'll send a revert patch
>     with the needed changes) in favor of dealing properly with
>     dp_packet_resize__() for DPDK packets, and stop calling it for DPDK
>     packets.
> 
> 
> 
> I think it would be better to modify the original patchset, rather than
> try to add code with
> the multisegment mbuf patchset and then immediately revert it.
> Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used
> without the resize__() in 
> this patch 8, I think it would be best to remove the 'shift' api from
> patch 7.
> 

I would be fine with modifying the original patchset, except your
comments came after the merge to the `dpdk_merge` branch. And it becomes
somewhat even more delicate because it is very close to the deadline for
accepting new features.

Now, we could modify the `dpdk_merge` branch, since it hasn't been
merged to master yet, but at expenses of Ian's time... I'm not sure what
the policy is here, so I'm CC'ing Ian to clarify. Should I submit v6
here, or leave it as is (with a revert patch queued to be applied)?
Darrell Ball Aug. 7, 2018, 4:39 p.m. UTC | #12
On Tue, Jul 24, 2018 at 12:06 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 23/07/2018 23:55, Darrell Ball wrote:
> >
> >
> > On Fri, Jul 20, 2018 at 10:09 AM, Lam, Tiago <tiago.lam@intel.com
> > <mailto:tiago.lam@intel.com>> wrote:
> >
> >     On 18/07/2018 23:53, Darrell Ball wrote:
> >     > sorry, several distractions delayed response.
> >     >
> >     > On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>
> >     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
> >     >
> >     >     On 13/07/2018 18:54, Darrell Ball wrote:
> >     >     > Thanks for the patch.
> >     >     >
> >     >     > A few queries inline.
> >     >     >
> >     >
> >     >     Hi Darrell,
> >     >
> >     >     Thanks for your inputs. I've replied in-line as well.
> >     >
> >     >     > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <
> tiago.lam@intel.com <mailto:tiago.lam@intel.com>
> >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
> >     >     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
> >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>> wrote:
> >     >     >
> >     >     >     When enabled with DPDK OvS relies on mbufs allocated by
> mempools to
> >     >     >     receive and output data on DPDK ports. Until now, each
> OvS dp_packet has
> >     >     >     had only one mbuf associated, which is allocated with
> the maximum
> >     >     >     possible size, taking the MTU into account. This
> approach, however,
> >     >     >     doesn't allow us to increase the allocated size in an
> mbuf, if needed,
> >     >     >     since an mbuf is allocated and initialised upon mempool
> creation. Thus,
> >     >     >     in the current implementatin this is dealt with by
> calling
> >     >     >     OVS_NOT_REACHED() and terminating OvS.
> >     >     >
> >     >     >     To avoid this, and allow the (already) allocated space
> to be better
> >     >     >     used, dp_packet_resize__() now tries to use the
> available room, both the
> >     >     >     tailroom and the headroom, to make enough space for the
> new data. Since
> >     >     >     this happens for packets of source DPBUF_DPDK, the
> single-segment mbuf
> >     >     >     case mentioned above is also covered by this new aproach
> in resize__().
> >     >     >
> >     >     >     Signed-off-by: Tiago Lam <tiago.lam@intel.com <mailto:
> tiago.lam@intel.com>
> >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
> >     >     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
> >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>>
> >     >     >     Acked-by: Eelco Chaudron <echaudro@redhat.com <mailto:
> echaudro@redhat.com>
> >     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>
> >     >     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>
> >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>>>
> >     >     >     ---
> >     >     >      lib/dp-packet.c | 48
> >     >     ++++++++++++++++++++++++++++++++++++++++++++++--
> >     >     >      1 file changed, 46 insertions(+), 2 deletions(-)
> >     >     >
> >     >     >     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     >     >     index d6e19eb..87af459 100644
> >     >     >     --- a/lib/dp-packet.c
> >     >     >     +++ b/lib/dp-packet.c
> >     >     >     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet
> *b,
> >     >     size_t
> >     >     >     new_headroom, size_t new_tailroom
> >     >     >          new_allocated = new_headroom + dp_packet_size(b) +
> >     >     new_tailroom;
> >     >     >
> >     >     >          switch (b->source) {
> >     >     >     +    /* When resizing mbufs, both a single mbuf and
> >     multi-segment
> >     >     >     mbufs (where
> >     >     >     +     * data is not contigously held in memory), both
> >     the headroom
> >     >     >     and the
> >     >     >     +     * tailroom available will be used to make more
> >     space for
> >     >     where
> >     >     >     data needs
> >     >     >     +     * to be inserted. I.e if there's not enough
> headroom,
> >     >     data may
> >     >     >     be shifted
> >     >     >     +     * right if there's enough tailroom.
> >     >     >     +     * However, this is not bulletproof and in some
> >     cases the
> >     >     space
> >     >     >     available
> >     >     >     +     * won't be enough - in those cases, an error
> should be
> >     >     >     returned and the
> >     >     >     +     * packet dropped. */
> >     >     >          case DPBUF_DPDK:
> >     >     >     -        OVS_NOT_REACHED();
> >     >     >
> >     >     >
> >     >     > Previously, it was a coding error to call this function for
> >     a DPDK
> >     >     mbuf
> >     >     > case, which is pretty
> >     >     > clear. But with this patch, presumably that is not longer
> >     the case and
> >     >     > the calling the API is
> >     >     > now ok for DPDK mbufs.
> >     >     >
> >     >
> >     >     As it stands, it will still be an error to call
> >     dp_packet_resize__() for
> >     >     any DPDK packet, or by extension any of the other functions
> >     that call
> >     >     it, such as dp_packet_prealloc_tailroom() and
> >     >     dp_packet_prealloc_headroom().
> >     >
> >     >
> >     >
> >     > yep, the existing code fails in a very clear way; i.e. whenever it
> is
> >     > called for a dpdk packet.
> >     > So the user would need to handle in some other way, which is not
> being
> >     > done today, I know.
> >     >
> >     >
> >     >
> >     >     This patch only tries to alleviate that
> >     >     by accommodating space from the headroom or tailroom, if
> >     possible, and
> >     >     create just enough space for the new data.
> >     >
> >     >
> >     > The new code will fail is some yet undefined way, occasionally
> working
> >     > and failing
> >     > in the "other" cases.
> >     >
> >     >
> >     >
> >     >     My preferred approach would
> >     >     be to return an error if not possible, but since the API
> >     doesn't deal
> >     >     with errors as is, the previous behavior of manually asserting
> >     was left
> >     >     as is. As reported in [1] (I comment more on that below), the
> >     behavior
> >     >
> >     >     of manually asserting can lead to undesired behavior in some
> >     use cases.
> >     >
> >     >
> >     >
> >     > I am familiar with the issue.
> >     > As part of the change,  dp_packet_put_uninit()
> >     > and dp_packet_push_uninit() could be modified to return NULL
> >     > and that could be percolated and checked for.
> >     >
> >     > Those APIs could simply check (by calling a helper API) if they
> would
> >     > fail a priori to trigger returning
> >     > NULL for dpdk buf cases.
> >     >
> >     >
> >
> >     Ok, I'm glad we're on the same page here. That's what [2] is doing.
> I'm
> >     planning to bring that forward. If you could have a look as well,
> that
> >     would be great.
> >
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >     +    {
> >     >     >     +        size_t miss_len;
> >     >     >     +
> >     >     >     +        if (new_headroom == dp_packet_headroom(b)) {
> >     >     >     +            /* This is a tailroom adjustment. Since
> >     there's no
> >     >     >     tailroom space
> >     >     >     +             * left, try and shift data towards the
> >     head to free up
> >     >     >     tail space,
> >     >     >     +             * if there's enough headroom */
> >     >     >     +
> >     >     >     +            miss_len = new_tailroom -
> >     dp_packet_tailroom(b);
> >     >     >     +
> >     >     >     +            if (miss_len <= new_headroom) {
> >     >     >     +                dp_packet_shift(b, -miss_len);
> >     >     >     +            } else {
> >     >     >     +                /* XXX: Handle error case and report
> >     error to caller */
> >     >     >     +                OVS_NOT_REACHED();
> >     >     >
> >     >     >
> >     >     >
> >     >     > This will not just drop the packet, it will fail the daemon,
> >     because a
> >     >     > packet cannot be resized.
> >     >     > If the system is completely depleted of memory, that may be
> >     ok, but in
> >     >     > the case, no such
> >     >     > assumption is implied.
> >     >     >
> >     >     > Also, why is XXX still left in the patch?
> >     >     >
> >     >
> >     >     Because there's still work to do in that regard. The whole
> process
> >     >     shouldn't be brought down if there's not enough space to put
> >     some data
> >     >     in one single packet. However, this was intentionally left out
> >     of this
> >     >     series or otherwise it would increase its complexity
> >     considerably.
> >     >
> >     >
> >     > It seems unnecessary to add a bunch of code to a series that tries
> >     to handle
> >     > 'resize', but handles it partially in practical cases. It also
> seems
> >     > undefined when
> >     > it works and when it does not from a API caller POV.
> >
> >     Granted that the dp_packet_resize__() supports more operations than
> what
> >     this implementation provides, but dp_packet API callers won't notice
> a
> >     difference since the only callers are
> >     dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And
> the
> >     implementation covers for those functions, which only either modify
> the
> >     tailroom or the headroom (and not both).
> >
> >     > I think patch 7 is also there to only support this patch 8.
> >     >
> >
> >     That's a wrong assumption. Patch 7/14 was added to support the
> >     dp_packet_shift() operation.
> >
> >     In this patch 8/14, because of the noncontiguous nature of chained
> >     mbufs, there's no possibility of using general functions like
> `memmove`,
> >     and re-using the shift operations was just easier.
> >
> >
> > I guess we may be talking about 2 different things.
> > My point is simply that greping for dp_packet_mbuf_shift() does not show
> > any users with resize__()
> > removed; I think it would be best to remove unused code from the patchset
> > dp_packet_mbuf_shift() is defined in patch 7.
> >
> >
>
> Again, it is used in `dp_packet_shift()` for DPDK packets.
>


sorry, I missed these responses.

Presently, the usage of the dp_packet_shift() api is limited to pcap usage.
I did not check carefully, but possibly there is no valid dpdk use at the
moment.
But, on the other hand, I suppose dp_packet_shift() is a general api, of
sorts.




>
> >
> >     > Ideally, It would seem like a modified patch 7 and patch 8 would
> >     belong
> >     > with the rest of the
> >     > fix for the dpdk packet memory preallocation constraint issue.
> >     >
> >     > Also, ideally, 'XXX' is removed from patches.
> >     >
> >     >
> >     >
> >     >
> >     >     As others have pointed out in [1], this is not a simple
> >     change, which
> >     >     would have to be propagated to higher levels in other parts of
> >     the code
> >     >     base. I've proposed an alternative (vs refactoring the whole
> >     dp_packet
> >     >     API to handle and return errors) in [2], but that seems to
> >     have gone
> >     >     stale. Going forward I see that approach merging with this new
> >     piece in
> >     >     dp_packet_resize__(), where an error can be returned to the
> >     caller if
> >     >     there's not enough space.
> >     >
> >     >
> >     > The full change is outside the scope of this series.
> >     >
> >     >
> >     >
> >     >
> >     >     > Also, the preexisting API handles two cases:
> >     >     > 1/ Tailroom only adjustment
> >     >     > 2/ headroom and/or tailroom adjustment
> >     >     >
> >     >     > meaning it handles all cases.
> >     >     >
> >     >     > The new DPDK addition (part of the same API) defines 2 cases
> >     >     >
> >     >     > 1/ tailroom only adjustment
> >     >     > 2/ headroom only adjustment
> >     >     >
> >     >     > So, it looks like a different API, that also does not handle
> >     all cases.
> >     >     >
> >     >     >
> >     >
> >     >     You have a point there, support for point 2/ "headroom and
> >     tailroom
> >     >     adjustment" is missed. It doesn't seem to be used anywhere at
> the
> >     >     moment, the only callers being dp_packet_prealloc_tailroom()
> and
> >     >     dp_packet_prealloc_headroom(), but I'll submit an incremental
> >     patch to
> >     >     deal with this. Thanks for pointing it out.
> >     >
> >     >     Tiago.
> >     >
> >     >     [1]
> >     >
> >      https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> >
> >     >
> >      <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
> May/346649.html <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/
> 346649.html>>
> >     >     [2]
> >     >
> >      https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html>
> >     >
> >      <https://mail.openvswitch.org/pipermail/ovs-dev/2018-
> July/348908.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html>>
> >     >
> >     >     >
> >     >     >     +            }
> >     >     >     +        } else {
> >     >     >     +            /* Otherwise, this is a headroom
> >     adjustment. Try to
> >     >     >     shift data
> >     >     >     +             * towards the tail to free up head space,
> >     if there's
> >     >     >     enough
> >     >     >     +             * tailroom */
> >     >     >     +
> >     >     >     +            miss_len = new_headroom -
> >     dp_packet_headroom(b);
> >     >     >
> >     >     >     +
> >     >     >     +            if (miss_len <= new_tailroom) {
> >     >     >     +                dp_packet_shift(b, miss_len);
> >     >     >     +            } else {
> >     >     >     +                /* XXX: Handle error case and report
> >     error to
> >     >     caller */
> >     >     >     +                OVS_NOT_REACHED();
> >     >     >
> >     >     >
> >     >     >
> >     >     > same comments as above.
> >     >     >
> >     >     >
> >     >     >
> >     >     >     +            }
> >     >     >     +        }
> >     >     >     +
> >     >     >     +        new_base = dp_packet_base(b);
> >     >     >     +
> >     >     >     +        break;
> >     >     >     +    }
> >     >     >          case DPBUF_MALLOC:
> >     >     >              if (new_headroom == dp_packet_headroom(b)) {
> >     >     >                  new_base = xrealloc(dp_packet_base(b),
> >     >     new_allocated);
> >     >     >     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet
> >     *b, size_t
> >     >     >     new_headroom, size_t new_tailroom
> >     >     >              OVS_NOT_REACHED();
> >     >     >          }
> >     >     >
> >     >     >     -    dp_packet_set_allocated(b, new_allocated);
> >     >     >     +    if (b->source != DPBUF_DPDK) {
> >     >     >     +        dp_packet_set_allocated(b, new_allocated);
> >     >     >     +    }
> >     >     >          dp_packet_set_base(b, new_base);
> >     >     >
> >     >     >          new_data = (char *) new_base + new_headroom;
> >     >     >     --
> >     >     >     2.7.4
> >     >     >
> >     >     >     _______________________________________________
> >     >     >     dev mailing list
> >     >     >     dev@openvswitch.org <mailto:dev@openvswitch.org>
> >     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>
> >     >     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>
> >     <mailto:dev@openvswitch.org <mailto:dev@openvswitch.org>>>
> >     >     >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>
> >     >     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >     >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>>
> >     >     >
> >     >     >
> >     >
> >     >
> >
> >
>
Darrell Ball Aug. 7, 2018, 4:56 p.m. UTC | #13
On Tue, Jul 24, 2018 at 12:36 AM, Lam, Tiago <tiago.lam@intel.com> wrote:

> On 23/07/2018 23:48, Darrell Ball wrote:
> >
> >
> > On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <tiago.lam@intel.com
> > <mailto:tiago.lam@intel.com>> wrote:
> >
> >     On 19/07/2018 00:02, Darrell Ball wrote:
> >     >
> >     >
> >     > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam@intel.com
> <mailto:tiago.lam@intel.com>
> >     > <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>> wrote:
> >     >
> >     >     On 16/07/2018 09:37, Lam, Tiago wrote:
> >     >     > On 13/07/2018 18:54, Darrell Ball wrote:
> >     >     >> Thanks for the patch.
> >     >     >>
> >     >     >> A few queries inline.
> >     >     >>
> >     >     >
> >     >     > Hi Darrell,
> >     >     >
> >     >     > Thanks for your inputs. I've replied in-line as well.
> >     >     >
> >     >     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <
> tiago.lam@intel.com <mailto:tiago.lam@intel.com>
> >     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
> >     >     >> <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>
> >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>>> wrote:
> >     >     >>
> >     >     >>     When enabled with DPDK OvS relies on mbufs allocated by
> >     >     mempools to
> >     >     >>     receive and output data on DPDK ports. Until now, each
> OvS
> >     >     dp_packet has
> >     >     >>     had only one mbuf associated, which is allocated with
> >     the maximum
> >     >     >>     possible size, taking the MTU into account. This
> approach,
> >     >     however,
> >     >     >>     doesn't allow us to increase the allocated size in an
> mbuf,
> >     >     if needed,
> >     >     >>     since an mbuf is allocated and initialised upon mempool
> >     >     creation. Thus,
> >     >     >>     in the current implementatin this is dealt with by
> calling
> >     >     >>     OVS_NOT_REACHED() and terminating OvS.
> >     >     >>
> >     >     >>     To avoid this, and allow the (already) allocated space
> >     to be
> >     >     better
> >     >     >>     used, dp_packet_resize__() now tries to use the
> available
> >     >     room, both the
> >     >     >>     tailroom and the headroom, to make enough space for the
> new
> >     >     data. Since
> >     >     >>     this happens for packets of source DPBUF_DPDK, the
> >     >     single-segment mbuf
> >     >     >>     case mentioned above is also covered by this new
> aproach in
> >     >     resize__().
> >     >     >>
> >     >     >>     Signed-off-by: Tiago Lam <tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com>
> >     >     <mailto:tiago.lam@intel.com <mailto:tiago.lam@intel.com>>
> >     >     >>     <mailto:tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com> <mailto:tiago.lam@intel.com
> >     <mailto:tiago.lam@intel.com>>>>
> >     >     >>     Acked-by: Eelco Chaudron <echaudro@redhat.com <mailto:
> echaudro@redhat.com>
> >     >     <mailto:echaudro@redhat.com <mailto:echaudro@redhat.com>>
> >     >     >>     <mailto:echaudro@redhat.com
> >     <mailto:echaudro@redhat.com> <mailto:echaudro@redhat.com
> >     <mailto:echaudro@redhat.com>>>>
> >     >     >>     ---
> >     >     >>      lib/dp-packet.c | 48
> >     >     ++++++++++++++++++++++++++++++++++++++++++++++--
> >     >     >>      1 file changed, 46 insertions(+), 2 deletions(-)
> >     >     >>
> >     >     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     >     >>     index d6e19eb..87af459 100644
> >     >     >>     --- a/lib/dp-packet.c
> >     >     >>     +++ b/lib/dp-packet.c
> >     >     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct
> >     dp_packet *b,
> >     >     size_t
> >     >     >>     new_headroom, size_t new_tailroom
> >     >     >>          new_allocated = new_headroom + dp_packet_size(b) +
> >     >     new_tailroom;
> >     >     >>
> >     >     >>          switch (b->source) {
> >     >     >>     +    /* When resizing mbufs, both a single mbuf and
> >     multi-segment
> >     >     >>     mbufs (where
> >     >     >>     +     * data is not contigously held in memory), both
> the
> >     >     headroom
> >     >     >>     and the
> >     >     >>     +     * tailroom available will be used to make more
> space
> >     >     for where
> >     >     >>     data needs
> >     >     >>     +     * to be inserted. I.e if there's not enough
> headroom,
> >     >     data may
> >     >     >>     be shifted
> >     >     >>     +     * right if there's enough tailroom.
> >     >     >>     +     * However, this is not bulletproof and in some
> cases
> >     >     the space
> >     >     >>     available
> >     >     >>     +     * won't be enough - in those cases, an error
> >     should be
> >     >     >>     returned and the
> >     >     >>     +     * packet dropped. */
> >     >     >>          case DPBUF_DPDK:
> >     >     >>     -        OVS_NOT_REACHED();
> >     >     >>
> >     >     >>
> >     >     >> Previously, it was a coding error to call this function for
> a
> >     >     DPDK mbuf
> >     >     >> case, which is pretty
> >     >     >> clear. But with this patch, presumably that is not longer
> the
> >     >     case and
> >     >     >> the calling the API is
> >     >     >> now ok for DPDK mbufs.
> >     >     >>
> >     >     >
> >     >     > As it stands, it will still be an error to call
> >     >     dp_packet_resize__() for
> >     >     > any DPDK packet, or by extension any of the other functions
> >     that call
> >     >     > it, such as dp_packet_prealloc_tailroom() and
> >     >     > dp_packet_prealloc_headroom(). This patch only tries to
> >     alleviate that
> >     >     > by accommodating space from the headroom or tailroom, if
> >     possible, and
> >     >     > create just enough space for the new data. My preferred
> >     approach would
> >     >     > be to return an error if not possible, but since the API
> >     doesn't deal
> >     >     > with errors as is, the previous behavior of manually
> >     asserting was
> >     >     left
> >     >     > as is. As reported in [1] (I comment more on that below),
> >     the behavior
> >     >     > of manually asserting can lead to undesired behavior in some
> use
> >     >     cases.
> >     >     >
> >     >     >>
> >     >     >>
> >     >     >>     +    {
> >     >     >>     +        size_t miss_len;
> >     >     >>     +
> >     >     >>     +        if (new_headroom == dp_packet_headroom(b)) {
> >     >     >>     +            /* This is a tailroom adjustment. Since
> >     there's no
> >     >     >>     tailroom space
> >     >     >>     +             * left, try and shift data towards the
> >     head to
> >     >     free up
> >     >     >>     tail space,
> >     >     >>     +             * if there's enough headroom */
> >     >     >>     +
> >     >     >>     +            miss_len = new_tailroom -
> >     dp_packet_tailroom(b);
> >     >     >>     +
> >     >     >>     +            if (miss_len <= new_headroom) {
> >     >     >>     +                dp_packet_shift(b, -miss_len);
> >     >     >>     +            } else {
> >     >     >>     +                /* XXX: Handle error case and report
> error
> >     >     to caller */
> >     >     >>     +                OVS_NOT_REACHED();
> >     >     >>
> >     >     >>
> >     >     >>
> >     >     >> This will not just drop the packet, it will fail the daemon,
> >     >     because a
> >     >     >> packet cannot be resized.
> >     >     >> If the system is completely depleted of memory, that may be
> ok,
> >     >     but in
> >     >     >> the case, no such
> >     >     >> assumption is implied.
> >     >     >>
> >     >     >> Also, why is XXX still left in the patch?
> >     >     >>
> >     >     >
> >     >     > Because there's still work to do in that regard. The whole
> >     process
> >     >     > shouldn't be brought down if there's not enough space to put
> >     some data
> >     >     > in one single packet. However, this was intentionally left
> >     out of this
> >     >     > series or otherwise it would increase its complexity
> >     considerably.
> >     >     >
> >     >     > As others have pointed out in [1], this is not a simple
> >     change, which
> >     >     > would have to be propagated to higher levels in other parts
> >     of the
> >     >     code
> >     >     > base. I've proposed an alternative (vs refactoring the whole
> >     dp_packet
> >     >     > API to handle and return errors) in [2], but that seems to
> >     have gone
> >     >     > stale. Going forward I see that approach merging with this
> new
> >     >     piece in
> >     >     > dp_packet_resize__(), where an error can be returned to the
> >     caller if
> >     >     > there's not enough space.
> >     >     >
> >     >     >> Also, the preexisting API handles two cases:
> >     >     >> 1/ Tailroom only adjustment
> >     >     >> 2/ headroom and/or tailroom adjustment
> >     >     >>
> >     >     >> meaning it handles all cases.
> >     >     >>
> >     >     >> The new DPDK addition (part of the same API) defines 2 cases
> >     >     >>
> >     >     >> 1/ tailroom only adjustment
> >     >     >> 2/ headroom only adjustment
> >     >     >>
> >     >     >> So, it looks like a different API, that also does not
> >     handle all
> >     >     cases.
> >     >     >>
> >     >     >>
> >     >     >
> >     >     > You have a point there, support for point 2/ "headroom and
> >     tailroom
> >     >     > adjustment" is missed. It doesn't seem to be used anywhere
> >     at the
> >     >     > moment, the only callers being dp_packet_prealloc_tailroom()
> and
> >     >     > dp_packet_prealloc_headroom(), but I'll submit an
> >     incremental patch to
> >     >     > deal with this. Thanks for pointing it out.
> >     >     >
> >     >     I've had a look into this and it doesn't seem that case number
> 2/
> >     >     "headroom and tailroom adjustment" above would make sense for
> the
> >     >     DPBUF_DPDK case. The reason being that if both `new_tailroom`
> and
> >     >     `new_headroom` are being increased (in respect to
> >     dp_packet_tailroom()
> >     >     and dp_packet_headroom()), there won't be enough space as the
> >     tail and
> >     >     head would be competing for the same available space. So it
> >     makes sense
> >     >     to make it an exclusive operation for the DPBUF_DPDK case.
> >     >
> >     >
> >     > Maybe you can explain why that would be the case, in general?
> >     >
> >
> >     DPDK mbufs have a fixed layout. That means we can't really resize the
> >     mbufs per say, as `buf_len` are fixed. For example, while in a normal
> >     DPBUF_MALLOC packet we can reallocate data and copy everything to a
> >     smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we
> are
> >     stuck with the same `buf_len` (meaning we can't change the tailroom
> >     without affecting the `data_len`, thus modifying data). What we can
> do
> >     is shift data to try and create more or less tailroom, taking or
> adding
> >     space from the headroom.
> >
> >
> > The above is fine, albeit obvious; but my question was something
> different.
> > I was asking why you thought a packet could not adjust both headroom and
> > tailroom
> > for dpdk mbuf packets. data_len may change will headroom and/or tailroom
> > changes.
> >
> >
>
> My understanding is that `data_len` needs to remain fixed (as `size_` in
> non-DPDK packets remains the same in DPBUF_MALLOC). If you change
> `data_len` then you're effectively affecting the data. Either adding
> random data to the packet, by increasing `data_len`, or trunking the
> packet by decreasing `data_len`.
>
> Given the above, if you want to increase the tailroom then you can only
> shift the data left (if there's enough headroom to do so), and that will
> affect the headroom available space.
>

I think it is possible to change both headroom and tailroom in a packet in
general.

Also from your own code

    /* Update mbufs' properties, and if using multi-segment mbufs, first and
     * last mbuf's data_len also needs to be adjusted */
    mbuf->data_off = mbuf->data_off + dst_ofs;




>
> >
> >
> >     I've gave it some thought and I'm now convinced that reusing
> resize__()
> >     here is probably not the best approach. Given its implementation and
> >     description ("Reallocates 'b' so that it has exactly 'new_headroom'
> and
> >     'new_tailroom'"), that doesn't sound like it would be a good fit for
> a
> >     DPDK packet. Maybe a completely separate function would be the best
> >     approach here, setting up different expectations.
> >
> >
> > That is part of the issue.
> >
> >
> >
> >
> >
> >     However, as I said before, this was implemented to try to alleviate
> the
> >     call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's
> adding
> >     more complications than what it's actually solving (at least at this
> >     point in time), I'm going to drop this patch (I'll send a revert
> patch
> >     with the needed changes) in favor of dealing properly with
> >     dp_packet_resize__() for DPDK packets, and stop calling it for DPDK
> >     packets.
> >
> >
> >
> > I think it would be better to modify the original patchset, rather than
> > try to add code with
> > the multisegment mbuf patchset and then immediately revert it.
> > Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used
> > without the resize__() in
> > this patch 8, I think it would be best to remove the 'shift' api from
> > patch 7.
> >
>
> I would be fine with modifying the original patchset, except your
> comments came after the merge to the `dpdk_merge` branch. And it becomes
> somewhat even more delicate because it is very close to the deadline for
> accepting new features.
>
> Now, we could modify the `dpdk_merge` branch, since it hasn't been
> merged to master yet, but at expenses of Ian's time... I'm not sure what
> the policy is here, so I'm CC'ing Ian to clarify. Should I submit v6
> here, or leave it as is (with a revert patch queued to be applied)?
>
diff mbox series

Patch

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index d6e19eb..87af459 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -237,9 +237,51 @@  dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
     new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
 
     switch (b->source) {
+    /* When resizing mbufs, both a single mbuf and multi-segment mbufs (where
+     * data is not contigously held in memory), both the headroom and the
+     * tailroom available will be used to make more space for where data needs
+     * to be inserted. I.e if there's not enough headroom, data may be shifted
+     * right if there's enough tailroom.
+     * However, this is not bulletproof and in some cases the space available
+     * won't be enough - in those cases, an error should be returned and the
+     * packet dropped. */
     case DPBUF_DPDK:
-        OVS_NOT_REACHED();
+    {
+        size_t miss_len;
+
+        if (new_headroom == dp_packet_headroom(b)) {
+            /* This is a tailroom adjustment. Since there's no tailroom space
+             * left, try and shift data towards the head to free up tail space,
+             * if there's enough headroom */
+
+            miss_len = new_tailroom - dp_packet_tailroom(b);
+
+            if (miss_len <= new_headroom) {
+                dp_packet_shift(b, -miss_len);
+            } else {
+                /* XXX: Handle error case and report error to caller */
+                OVS_NOT_REACHED();
+            }
+        } else {
+            /* Otherwise, this is a headroom adjustment. Try to shift data
+             * towards the tail to free up head space, if there's enough
+             * tailroom */
+
+            miss_len = new_headroom - dp_packet_headroom(b);
 
+
+            if (miss_len <= new_tailroom) {
+                dp_packet_shift(b, miss_len);
+            } else {
+                /* XXX: Handle error case and report error to caller */
+                OVS_NOT_REACHED();
+            }
+        }
+
+        new_base = dp_packet_base(b);
+
+        break;
+    }
     case DPBUF_MALLOC:
         if (new_headroom == dp_packet_headroom(b)) {
             new_base = xrealloc(dp_packet_base(b), new_allocated);
@@ -263,7 +305,9 @@  dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
         OVS_NOT_REACHED();
     }
 
-    dp_packet_set_allocated(b, new_allocated);
+    if (b->source != DPBUF_DPDK) {
+        dp_packet_set_allocated(b, new_allocated);
+    }
     dp_packet_set_base(b, new_base);
 
     new_data = (char *) new_base + new_headroom;