[ovs-dev,v2,1/3] netdev-dpdk: Validate packets burst before Tx.

Message ID 1547139522-31154-2-git-send-email-tiago.lam@intel.com
State Changes Requested
Delegated to: Ian Stokes
Headers show
Series
  • dpdk: Add support for TSO
Related show

Commit Message

Lam, Tiago Jan. 10, 2019, 4:58 p.m.
Given that multi-segment mbufs might be sent between interfaces that
support different capabilities, and may even support different layouts
of mbufs, outgoing packets should be validated before sent on the egress
interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
order to validate the following (taken from the DPDK documentation), on
a device specific manner:
- Check if packet meets devices requirements for tx offloads.
- Check limitations about number of segments.
- Check additional requirements when debug is enabled.
- Update and/or reset required checksums when tx offload is set for
packet.

Signed-off-by: Tiago Lam <tiago.lam@intel.com>
---
 lib/netdev-dpdk.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Comments

Ian Stokes Jan. 11, 2019, 10:50 a.m. | #1
On 1/10/2019 4:58 PM, Tiago Lam wrote:
> Given that multi-segment mbufs might be sent between interfaces that
> support different capabilities, and may even support different layouts
> of mbufs, outgoing packets should be validated before sent on the egress
> interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
> rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
> order to validate the following (taken from the DPDK documentation), on
> a device specific manner:
> - Check if packet meets devices requirements for tx offloads.
> - Check limitations about number of segments.
> - Check additional requirements when debug is enabled.
> - Update and/or reset required checksums when tx offload is set for
> packet.
> 

Thanks Tiago comments below.

> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
> ---
>   lib/netdev-dpdk.c | 21 +++++++++++++++++++--
>   1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index d6114ee..77d04fc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>   
>   /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
>    * 'pkts', even in case of failure.
> + * In case multi-segment mbufs / TSO is being used, it also prepares. In such
> + * cases, only the prepared packets will be sent to Tx burst, meaning that if
> + * an invalid packet appears in 'pkts'[3] only the validated packets in indices
> + * 0, 1 and 2 will be sent.
>    *
>    * Returns the number of packets that weren't transmitted. */
>   static inline int
> @@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>                            struct rte_mbuf **pkts, int cnt)
>   {
>       uint32_t nb_tx = 0;
> +    uint16_t nb_prep = cnt;
>   
> -    while (nb_tx != cnt) {
> +    if (dpdk_multi_segment_mbufs) {
As this feature would be experimental and not enabled, by default this 
would be false.

Would it make sense to deal with the default non multiseg case first?

Extra checks in the datapath can be costly and I'd like to minimize any 
impact for the non multi seg traffic which is the default use case 
currently. Possibly checking for !dpdk_multi_segment_mbufs first? Or 
possibly the use of OVS_LIKELY/OVS_UNLIKELY. Have you given thought to this?

> +        /* Validate the burst of packets for Tx. */
> +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);

So one of the gotchas here is that rte_eth_tx_prepare is experimental 
although I didn't see any compilation issues (warnings etc.) and it 
builds OK with travis.

Possibly a comment above it to explain it's currently experimental and 
subject to change in DPDK would be useful, can be removed once the api 
is non experimental in DPDK. It would be one of the conditions to remove 
the experimental tag for TSO in OVS DPDK I suspect in the future.

Ian
> +        if (nb_prep != cnt) {
> +            VLOG_WARN_RL(&rl, "%s: Preparing packet tx burst failed (%u/%u "
> +                         "packets valid): %s", dev->up.name, nb_prep, cnt,
> +                         rte_strerror(rte_errno));
> +        }
> +    }
> +
> +    /* Tx the validated burst of packets only. */
> +    while (nb_tx != nb_prep) {
>           uint32_t ret;
>   
> -        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
> +        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx,
> +                               nb_prep - nb_tx);
>           if (!ret) {
>               break;
>           }
>
David Marchand Jan. 11, 2019, 11:11 a.m. | #2
On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes <ian.stokes@intel.com> wrote:

> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>
> > +        /* Validate the burst of packets for Tx. */
> > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>
> So one of the gotchas here is that rte_eth_tx_prepare is experimental
> although I didn't see any compilation issues (warnings etc.) and it
> builds OK with travis.
>

Indeed, the documentation tells it is experimental, but the experimental
tag has always been missing.
On the other hand, this api has been there since 17.02 and its form did not
change afaics.
I am not sure it qualifies as experimental anymore.

CC Ferruh, Andrew and Thomas.
Andrew Rybchenko Jan. 11, 2019, 11:20 a.m. | #3
On 1/11/19 2:11 PM, David Marchand wrote:

On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes <ian.stokes@intel.com<mailto:ian.stokes@intel.com>> wrote:
On 1/10/2019 4:58 PM, Tiago Lam wrote:

> +        /* Validate the burst of packets for Tx. */
> +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);

So one of the gotchas here is that rte_eth_tx_prepare is experimental
although I didn't see any compilation issues (warnings etc.) and it
builds OK with travis.

Indeed, the documentation tells it is experimental, but the experimental tag has always been missing.
On the other hand, this api has been there since 17.02 and its form did not change afaics.
I am not sure it qualifies as experimental anymore.

I think that we should fix the documentation and remove experimental tag.

Andrew.
The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited.
Ferruh Yigit Jan. 11, 2019, 12:29 p.m. | #4
On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> On 1/11/19 2:11 PM, David Marchand wrote:
>>
>> On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes <ian.stokes@intel.com
>> <mailto:ian.stokes@intel.com>> wrote:
>>
>>     On 1/10/2019 4:58 PM, Tiago Lam wrote:
>>
>>     > +        /* Validate the burst of packets for Tx. */
>>     > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>
>>     So one of the gotchas here is that rte_eth_tx_prepare is experimental
>>     although I didn't see any compilation issues (warnings etc.) and it
>>     builds OK with travis.
>>
>>
>> Indeed, the documentation tells it is experimental, but the experimental tag
>> has always been missing.
>> On the other hand, this api has been there since 17.02 and its form did not
>> change afaics.
>> I am not sure it qualifies as experimental anymore.
> 
> I think that we should fix the documentation and remove experimental tag.

Agreed to remove experimental from rte_eth_tx_prepare().

Most probably rte_eth_tx_prepare() developed before we started to use
__rte_experimental tag, that is why it doesn't have it.
Ferruh Yigit Jan. 11, 2019, 12:29 p.m. | #5
On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> On 1/11/19 2:11 PM, David Marchand wrote:
>>
>> On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes <ian.stokes@intel.com
>> <mailto:ian.stokes@intel.com>> wrote:
>>
>>     On 1/10/2019 4:58 PM, Tiago Lam wrote:
>>
>>     > +        /* Validate the burst of packets for Tx. */
>>     > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>
>>     So one of the gotchas here is that rte_eth_tx_prepare is experimental
>>     although I didn't see any compilation issues (warnings etc.) and it
>>     builds OK with travis.
>>
>>
>> Indeed, the documentation tells it is experimental, but the experimental tag
>> has always been missing.
>> On the other hand, this api has been there since 17.02 and its form did not
>> change afaics.
>> I am not sure it qualifies as experimental anymore.
> 
> I think that we should fix the documentation and remove experimental tag.

Agreed to remove experimental from rte_eth_tx_prepare().

Most probably rte_eth_tx_prepare() developed before we started to use
__rte_experimental tag, that is why it doesn't have it.
Ian Stokes Jan. 11, 2019, 12:49 p.m. | #6
On 1/11/2019 12:29 PM, Ferruh Yigit wrote:
> On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
>> On 1/11/19 2:11 PM, David Marchand wrote:
>>>
>>> On Fri, Jan 11, 2019 at 11:56 AM Ian Stokes <ian.stokes@intel.com
>>> <mailto:ian.stokes@intel.com>> wrote:
>>>
>>>      On 1/10/2019 4:58 PM, Tiago Lam wrote:
>>>
>>>      > +        /* Validate the burst of packets for Tx. */
>>>      > +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>>
>>>      So one of the gotchas here is that rte_eth_tx_prepare is experimental
>>>      although I didn't see any compilation issues (warnings etc.) and it
>>>      builds OK with travis.
>>>
>>>
>>> Indeed, the documentation tells it is experimental, but the experimental tag
>>> has always been missing.
>>> On the other hand, this api has been there since 17.02 and its form did not
>>> change afaics.
>>> I am not sure it qualifies as experimental anymore.
>>
>> I think that we should fix the documentation and remove experimental tag.
> 
> Agreed to remove experimental from rte_eth_tx_prepare().
> 
> Most probably rte_eth_tx_prepare() developed before we started to use
> __rte_experimental tag, that is why it doesn't have it.
> 
Sounds good, thanks Ferruh.

Ian
David Marchand Jan. 11, 2019, 12:57 p.m. | #7
On Fri, Jan 11, 2019 at 1:49 PM Ian Stokes <ian.stokes@intel.com> wrote:

> On 1/11/2019 12:29 PM, Ferruh Yigit wrote:
> > On 1/11/2019 11:20 AM, Andrew Rybchenko wrote:
> >> On 1/11/19 2:11 PM, David Marchand wrote:
> >>> Indeed, the documentation tells it is experimental, but the
> experimental tag
> >>> has always been missing.
> >>> On the other hand, this api has been there since 17.02 and its form
> did not
> >>> change afaics.
> >>> I am not sure it qualifies as experimental anymore.
> >>
> >> I think that we should fix the documentation and remove experimental
> tag.
> >
> > Agreed to remove experimental from rte_eth_tx_prepare().
> >
> > Most probably rte_eth_tx_prepare() developed before we started to use
> > __rte_experimental tag, that is why it doesn't have it.
> >
> Sounds good, thanks Ferruh.
>

Ok, let me send a patch for dpdk.
Lam, Tiago Jan. 11, 2019, 3:45 p.m. | #8
Hi Ian,

Thanks, comments in-line.

On 11/01/2019 10:50, Ian Stokes wrote:
> On 1/10/2019 4:58 PM, Tiago Lam wrote:
>> Given that multi-segment mbufs might be sent between interfaces that
>> support different capabilities, and may even support different layouts
>> of mbufs, outgoing packets should be validated before sent on the egress
>> interface. Thus, netdev_dpdk_eth_tx_burst() now calls DPDK's
>> rte_eth_tx_prepare() function, if and only multi-segments is enbaled, in
>> order to validate the following (taken from the DPDK documentation), on
>> a device specific manner:
>> - Check if packet meets devices requirements for tx offloads.
>> - Check limitations about number of segments.
>> - Check additional requirements when debug is enabled.
>> - Update and/or reset required checksums when tx offload is set for
>> packet.
>>
> 
> Thanks Tiago comments below.
> 
>> Signed-off-by: Tiago Lam <tiago.lam@intel.com>
>> ---
>>   lib/netdev-dpdk.c | 21 +++++++++++++++++++--
>>   1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index d6114ee..77d04fc 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -2029,6 +2029,10 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>>   
>>   /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
>>    * 'pkts', even in case of failure.
>> + * In case multi-segment mbufs / TSO is being used, it also prepares. In such
>> + * cases, only the prepared packets will be sent to Tx burst, meaning that if
>> + * an invalid packet appears in 'pkts'[3] only the validated packets in indices
>> + * 0, 1 and 2 will be sent.
>>    *
>>    * Returns the number of packets that weren't transmitted. */
>>   static inline int
>> @@ -2036,11 +2040,24 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>>                            struct rte_mbuf **pkts, int cnt)
>>   {
>>       uint32_t nb_tx = 0;
>> +    uint16_t nb_prep = cnt;
>>   
>> -    while (nb_tx != cnt) {
>> +    if (dpdk_multi_segment_mbufs) {
> As this feature would be experimental and not enabled, by default this 
> would be false.
> 
> Would it make sense to deal with the default non multiseg case first?
> 
> Extra checks in the datapath can be costly and I'd like to minimize any 
> impact for the non multi seg traffic which is the default use case 
> currently. Possibly checking for !dpdk_multi_segment_mbufs first? Or 
> possibly the use of OVS_LIKELY/OVS_UNLIKELY. Have you given thought to this?

Sounds reasonable to me, since this won't be the default. I'll use
`OVS_UNLIKELY` for the next iteration.

> 
>> +        /* Validate the burst of packets for Tx. */
>> +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> 
> So one of the gotchas here is that rte_eth_tx_prepare is experimental 
> although I didn't see any compilation issues (warnings etc.) and it 
> builds OK with travis.
> 
> Possibly a comment above it to explain it's currently experimental and 
> subject to change in DPDK would be useful, can be removed once the api 
> is non experimental in DPDK. It would be one of the conditions to remove 
> the experimental tag for TSO in OVS DPDK I suspect in the future.

Thanks for bringing this up. I can see from the thread that this is
pretty much clarified by now, but do we still want to include a comment?
Since the docs are also versioned, and David's patch (thanks!) will be
for master, I assume we still want a comment here, since people might be
looking into DPDK's v18.11 docs specifically.

Tiago.
Ian Stokes Jan. 11, 2019, 4:07 p.m. | #9
On 1/11/2019 3:45 PM, Lam, Tiago wrote:
> Hi Ian,
> 
> Thanks, comments in-line.
> 

[snip]

>>> +        /* Validate the burst of packets for Tx. */
>>> +        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>
>> So one of the gotchas here is that rte_eth_tx_prepare is experimental
>> although I didn't see any compilation issues (warnings etc.) and it
>> builds OK with travis.
>>
>> Possibly a comment above it to explain it's currently experimental and
>> subject to change in DPDK would be useful, can be removed once the api
>> is non experimental in DPDK. It would be one of the conditions to remove
>> the experimental tag for TSO in OVS DPDK I suspect in the future.
> 
> Thanks for bringing this up. I can see from the thread that this is
> pretty much clarified by now, but do we still want to include a comment?
> Since the docs are also versioned, and David's patch (thanks!) will be
> for master, I assume we still want a comment here, since people might be
> looking into DPDK's v18.11 docs specifically.

Good point, I would think Davids doc patch would be backported to DPDK 
but I can confirm, if it is backported to 18.11, I don't think the 
comment is needed.

Ian
> 
> Tiago.
>
David Marchand Jan. 11, 2019, 4:11 p.m. | #10
On Fri, Jan 11, 2019 at 5:07 PM Ian Stokes <ian.stokes@intel.com> wrote:

> On 1/11/2019 3:45 PM, Lam, Tiago wrote:
> > Thanks for bringing this up. I can see from the thread that this is
> > pretty much clarified by now, but do we still want to include a comment?
> > Since the docs are also versioned, and David's patch (thanks!) will be
> > for master, I assume we still want a comment here, since people might be
> > looking into DPDK's v18.11 docs specifically.
>
> Good point, I would think Davids doc patch would be backported to DPDK
> but I can confirm, if it is backported to 18.11, I don't think the
> comment is needed.
>

I was going to reply that I intended to ask for a backport in 18.11 at
least :-)
Ian Stokes Jan. 11, 2019, 4:27 p.m. | #11
On 1/11/2019 4:11 PM, David Marchand wrote:
> 
> On Fri, Jan 11, 2019 at 5:07 PM Ian Stokes <ian.stokes@intel.com 
> <mailto:ian.stokes@intel.com>> wrote:
> 
>     On 1/11/2019 3:45 PM, Lam, Tiago wrote:
>      > Thanks for bringing this up. I can see from the thread that this is
>      > pretty much clarified by now, but do we still want to include a
>     comment?
>      > Since the docs are also versioned, and David's patch (thanks!)
>     will be
>      > for master, I assume we still want a comment here, since people
>     might be
>      > looking into DPDK's v18.11 docs specifically.
> 
>     Good point, I would think Davids doc patch would be backported to DPDK
>     but I can confirm, if it is backported to 18.11, I don't think the
>     comment is needed.
> 
> 
> I was going to reply that I intended to ask for a backport in 18.11 at 
> least :-)

That's perfect, thanks David, much appreciated.
> 
> -- 
> David Marchand
>

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6114ee..77d04fc 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2029,6 +2029,10 @@  netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 
 /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
  * 'pkts', even in case of failure.
+ * In case multi-segment mbufs / TSO is being used, it also prepares. In such
+ * cases, only the prepared packets will be sent to Tx burst, meaning that if
+ * an invalid packet appears in 'pkts'[3] only the validated packets in indices
+ * 0, 1 and 2 will be sent.
  *
  * Returns the number of packets that weren't transmitted. */
 static inline int
@@ -2036,11 +2040,24 @@  netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
                          struct rte_mbuf **pkts, int cnt)
 {
     uint32_t nb_tx = 0;
+    uint16_t nb_prep = cnt;
 
-    while (nb_tx != cnt) {
+    if (dpdk_multi_segment_mbufs) {
+        /* Validate the burst of packets for Tx. */
+        nb_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
+        if (nb_prep != cnt) {
+            VLOG_WARN_RL(&rl, "%s: Preparing packet tx burst failed (%u/%u "
+                         "packets valid): %s", dev->up.name, nb_prep, cnt,
+                         rte_strerror(rte_errno));
+        }
+    }
+
+    /* Tx the validated burst of packets only. */
+    while (nb_tx != nb_prep) {
         uint32_t ret;
 
-        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
+        ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx,
+                               nb_prep - nb_tx);
         if (!ret) {
             break;
         }