diff mbox

[ovs-dev,v7] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.

Message ID 1481729424-49438-1-git-send-email-sugesh.chandran@intel.com
State Superseded
Delegated to: Daniele Di Proietto
Headers show

Commit Message

Chandran, Sugesh Dec. 14, 2016, 3:30 p.m. UTC
Add Rx checksum offloading feature support on DPDK physical ports. By default,
the Rx checksum offloading is enabled if NIC supports. However,
the checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.

The rx checksum offloading can be turned off by setting the parameter to
'false'. For eg: To disable the rx checksum offloading when adding a port,

     'ovs-vsctl add-port br0 dpdk0 -- \
      set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'

OR (to disable at run time after port is being added to OVS)

    'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'

Similarly to turn ON rx checksum offloading at run time,
    'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'

The Tx checksum offloading support is not implemented due to the following
reasons.

1) Checksum offloading and vectorization are mutually exclusive in DPDK poll
mode driver. Vector packet processing is turned OFF when checksum offloading
is enabled which causes significant performance drop at Tx side.

2) Normally, OVS generates checksum for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However
enabling Tx checksum offloading involves,

*) Mark every packets for tx checksum offloading at 'tunnel_push' and
recirculate.
*) At the time of xmit, validate the same flag and instruct the NIC to do the
checksum calculation.  In case NIC doesnt support Tx checksum offloading,
the checksum calculation has to be done in software before sending out the
packets.

No significant performance improvement noticed with Tx checksum offloading
due to the e overhead of additional validations + non vector packet processing.
In some test scenarios, it introduces performance drop too.

Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling
decapsulation even though the SSE vector Rx function is disabled in DPDK poll
mode driver.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Jesse Gross <jesse@kernel.org>

---
v7
- Update documentation with details of performance impact of having DPDK
vectorization in disabled state.
- Merge to latest OVS master.

v6
- Avoid unnecessary reconfiguration of DPDK ports when Rx checksum offload
support is unavailable.
- Limit the 'Rx checksum offload is unsupported' message in logging.
- Merge to latest OVS for DPDK 16.11 support. The DPDK checksum flags that used
- in the patch are only introduced in 16.11.

v5
- Reset the checksum flag in common tunnel pop function than in
'udp_extract_tnl_md' function.

v4
- Unconditonally clear off the checksum flag one time in pop operation than
doing separately in IP and UDP layers.

v3
- Reset the checksum offload flags in tunnel pop operation after the validation.
- Reconfigure the dpdk port with rx checksum offload only if new configuration
is different than current one.

v2
- Set Rx checksum enabled by default.
- Modified commit message, explaining the tradeoff with tx checksum offloading.
- Use dpdk mbuf checksum offload flags  instead of defining new metadata field
in OVS dp_packet.
- validate udp checksum mbuf flag only if the checksum present in the packet.
- Doc update with Rx checksum offloading feature.
---
---
 Documentation/intro/install/dpdk-advanced.rst | 25 ++++++++++++++
 lib/dp-packet.h                               | 29 +++++++++++++++++
 lib/netdev-dpdk.c                             | 47 +++++++++++++++++++++++++++
 lib/netdev-native-tnl.c                       | 35 +++++++++++---------
 lib/netdev.c                                  |  4 +++
 vswitchd/vswitch.xml                          | 14 ++++++++
 6 files changed, 139 insertions(+), 15 deletions(-)

Comments

Pravin Shelar Dec. 14, 2016, 9:42 p.m. UTC | #1
Thanks for the patch. I have couple of questions.

On Wed, Dec 14, 2016 at 7:30 AM, Sugesh Chandran
<sugesh.chandran@intel.com> wrote:
> Add Rx checksum offloading feature support on DPDK physical ports. By default,
> the Rx checksum offloading is enabled if NIC supports. However,
> the checksum offloading can be turned OFF either while adding a new DPDK
> physical port to OVS or at runtime.
>
> The rx checksum offloading can be turned off by setting the parameter to
> 'false'. For eg: To disable the rx checksum offloading when adding a port,
>
>      'ovs-vsctl add-port br0 dpdk0 -- \
>       set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
>
> OR (to disable at run time after port is being added to OVS)
>
>     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
>
> Similarly to turn ON rx checksum offloading at run time,
>     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
>
> The Tx checksum offloading support is not implemented due to the following
> reasons.
>
> 1) Checksum offloading and vectorization are mutually exclusive in DPDK poll
> mode driver. Vector packet processing is turned OFF when checksum offloading
> is enabled which causes significant performance drop at Tx side.
>
> 2) Normally, OVS generates checksum for tunnel packets in software at the
> 'tunnel push' operation, where the tunnel headers are created. However
> enabling Tx checksum offloading involves,
>
> *) Mark every packets for tx checksum offloading at 'tunnel_push' and
> recirculate.
> *) At the time of xmit, validate the same flag and instruct the NIC to do the
> checksum calculation.  In case NIC doesnt support Tx checksum offloading,
> the checksum calculation has to be done in software before sending out the
> packets.
>
> No significant performance improvement noticed with Tx checksum offloading
> due to the e overhead of additional validations + non vector packet processing.
> In some test scenarios, it introduces performance drop too.
>
> Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling
> decapsulation even though the SSE vector Rx function is disabled in DPDK poll
> mode driver.
>
> Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
> Acked-by: Jesse Gross <jesse@kernel.org>
>
...
...
>  static void
> +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> +    OVS_REQUIRES(dev->mutex)
> +{
> +    struct rte_eth_dev_info info;
> +    bool rx_csum_ol_flag = false;
> +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
> +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
> +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
> +    rte_eth_dev_info_get(dev->port_id, &info);
> +    rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> +
> +    if (rx_csum_ol_flag &&
> +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
> +         rx_chksm_offload_capa) {
> +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on device %d",
> +                   dev->port_id);
> +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> +        return;
> +    }
> +    netdev_request_reconfigure(&dev->up);
> +}
> +
I am not sure about need for netdev reconfigure here.

> +static void
>  dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
>  {
>      if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
> @@ -851,6 +884,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
>
>      /* Initialize the flow control to NULL */
>      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> +
> +    /* Initilize the hardware offload flags to 0 */
> +    dev->hw_ol_features = 0;
>      if (type == DPDK_DEV_ETH) {
>          err = dpdk_eth_dev_init(dev);
>          if (err) {
> @@ -1118,6 +1154,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>          {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>      };
> +    bool rx_chksm_ofld;
> +    bool temp_flag;
>
>      ovs_mutex_lock(&dev->mutex);
>
> @@ -1141,6 +1179,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>          dpdk_eth_flow_ctrl_setup(dev);
>      }
>
> +    /* Rx checksum offload configuration */
> +    /* By default the Rx checksum offload is ON */
> +    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
> +    temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD)
> +                        != 0;
> +    if (temp_flag != rx_chksm_ofld) {
> +        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
> +        dpdk_eth_checksum_offload_configure(dev);
> +    }
>      ovs_mutex_unlock(&dev->mutex);
>
Can you also reflect this feature back to user via get-status.

>      return 0;
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index ce2582f..c730e72 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>
>          ovs_be32 ip_src, ip_dst;
>
> -        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> -            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> -            return NULL;
> +        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
> +            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> +                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> +                return NULL;
> +            }
>          }
>
>          if (ntohs(ip->ip_tot_len) > l3_size) {
> @@ -179,18 +181,21 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>      }
>
>      if (udp->udp_csum) {
> -        uint32_t csum;
> -        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> -            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> -        } else {
> -            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> -        }
> -
> -        csum = csum_continue(csum, udp, dp_packet_size(packet) -
> -                             ((const unsigned char *)udp -
> -                              (const unsigned char *)dp_packet_l2(packet)));
> -        if (csum_finish(csum)) {
> -            return NULL;
> +        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
> +            uint32_t csum;
> +            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> +                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> +            } else {
> +                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> +            }
> +
> +            csum = csum_continue(csum, udp, dp_packet_size(packet) -
> +                                 ((const unsigned char *)udp -
> +                                  (const unsigned char *)dp_packet_l2(packet)
> +                                 ));
> +            if (csum_finish(csum)) {
> +                return NULL;
> +            }

Can we make use of RX checksum offload for userspace conntrack lib? I
think there is already code duplication in checksum verification, we
could refactor code around checksum API with hardware offload to avoid
it and likely improve performance.
Chandran, Sugesh Dec. 15, 2016, 12:07 p.m. UTC | #2
Regards
_Sugesh

> -----Original Message-----
> From: Pravin Shelar [mailto:pshelar@ovn.org]
> Sent: Wednesday, December 14, 2016 9:42 PM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>; Ben
> Pfaff <blp@ovn.org>
> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
> offloading feature on DPDK physical ports.
> 
> Thanks for the patch. I have couple of questions.
> 
> On Wed, Dec 14, 2016 at 7:30 AM, Sugesh Chandran
> <sugesh.chandran@intel.com> wrote:
> > Add Rx checksum offloading feature support on DPDK physical ports. By
> > default, the Rx checksum offloading is enabled if NIC supports.
> > However, the checksum offloading can be turned OFF either while adding
> > a new DPDK physical port to OVS or at runtime.
> >
> > The rx checksum offloading can be turned off by setting the parameter
> > to 'false'. For eg: To disable the rx checksum offloading when adding
> > a port,
> >
> >      'ovs-vsctl add-port br0 dpdk0 -- \
> >       set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
> >
> > OR (to disable at run time after port is being added to OVS)
> >
> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
> >
> > Similarly to turn ON rx checksum offloading at run time,
> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
> >
> > The Tx checksum offloading support is not implemented due to the
> > following reasons.
> >
> > 1) Checksum offloading and vectorization are mutually exclusive in
> > DPDK poll mode driver. Vector packet processing is turned OFF when
> > checksum offloading is enabled which causes significant performance drop
> at Tx side.
> >
> > 2) Normally, OVS generates checksum for tunnel packets in software at
> > the 'tunnel push' operation, where the tunnel headers are created.
> > However enabling Tx checksum offloading involves,
> >
> > *) Mark every packets for tx checksum offloading at 'tunnel_push' and
> > recirculate.
> > *) At the time of xmit, validate the same flag and instruct the NIC to
> > do the checksum calculation.  In case NIC doesnt support Tx checksum
> > offloading, the checksum calculation has to be done in software before
> > sending out the packets.
> >
> > No significant performance improvement noticed with Tx checksum
> > offloading due to the e overhead of additional validations + non vector
> packet processing.
> > In some test scenarios, it introduces performance drop too.
> >
> > Rx checksum offloading still offers 8-9% of improvement on VxLAN
> > tunneling decapsulation even though the SSE vector Rx function is
> > disabled in DPDK poll mode driver.
> >
> > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
> > Acked-by: Jesse Gross <jesse@kernel.org>
> >
> ...
> ...
> >  static void
> > +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> > +    OVS_REQUIRES(dev->mutex)
> > +{
> > +    struct rte_eth_dev_info info;
> > +    bool rx_csum_ol_flag = false;
> > +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
> > +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
> > +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
> > +    rte_eth_dev_info_get(dev->port_id, &info);
> > +    rx_csum_ol_flag = (dev->hw_ol_features &
> > +NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> > +
> > +    if (rx_csum_ol_flag &&
> > +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
> > +         rx_chksm_offload_capa) {
> > +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on device
> %d",
> > +                   dev->port_id);
> > +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> > +        return;
> > +    }
> > +    netdev_request_reconfigure(&dev->up);
> > +}
> > +
> I am not sure about need for netdev reconfigure here.
[Sugesh] Reconfigure is called here to enable/disable the checksum
at run time. Do you think the checksum will be updated on a user request
without having this call?
> 
> > +static void
> >  dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev)
> > OVS_REQUIRES(dev->mutex)  {
> >      if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) { @@
> > -851,6 +884,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int
> > port_no,
> >
> >      /* Initialize the flow control to NULL */
> >      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> > +
> > +    /* Initilize the hardware offload flags to 0 */
> > +    dev->hw_ol_features = 0;
> >      if (type == DPDK_DEV_ETH) {
> >          err = dpdk_eth_dev_init(dev);
> >          if (err) {
> > @@ -1118,6 +1154,8 @@ netdev_dpdk_set_config(struct netdev *netdev,
> const struct smap *args)
> >          {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
> >          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
> >      };
> > +    bool rx_chksm_ofld;
> > +    bool temp_flag;
> >
> >      ovs_mutex_lock(&dev->mutex);
> >
> > @@ -1141,6 +1179,15 @@ netdev_dpdk_set_config(struct netdev
> *netdev, const struct smap *args)
> >          dpdk_eth_flow_ctrl_setup(dev);
> >      }
> >
> > +    /* Rx checksum offload configuration */
> > +    /* By default the Rx checksum offload is ON */
> > +    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
> > +    temp_flag = (dev->hw_ol_features &
> NETDEV_RX_CHECKSUM_OFFLOAD)
> > +                        != 0;
> > +    if (temp_flag != rx_chksm_ofld) {
> > +        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
> > +        dpdk_eth_checksum_offload_configure(dev);
> > +    }
> >      ovs_mutex_unlock(&dev->mutex);
> >
> Can you also reflect this feature back to user via get-status.
> 
> >      return 0;
> > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index
> > ce2582f..c730e72 100644
> > --- a/lib/netdev-native-tnl.c
> > +++ b/lib/netdev-native-tnl.c
> > @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet
> > *packet, struct flow_tnl *tnl,
> >
> >          ovs_be32 ip_src, ip_dst;
> >
> > -        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> > -            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> > -            return NULL;
> > +        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
> > +            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> > +                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> > +                return NULL;
> > +            }
> >          }
> >
> >          if (ntohs(ip->ip_tot_len) > l3_size) { @@ -179,18 +181,21 @@
> > udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
> >      }
> >
> >      if (udp->udp_csum) {
> > -        uint32_t csum;
> > -        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> > -            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> > -        } else {
> > -            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> > -        }
> > -
> > -        csum = csum_continue(csum, udp, dp_packet_size(packet) -
> > -                             ((const unsigned char *)udp -
> > -                              (const unsigned char *)dp_packet_l2(packet)));
> > -        if (csum_finish(csum)) {
> > -            return NULL;
> > +        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
> > +            uint32_t csum;
> > +            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> > +                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> > +            } else {
> > +                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> > +            }
> > +
> > +            csum = csum_continue(csum, udp, dp_packet_size(packet) -
> > +                                 ((const unsigned char *)udp -
> > +                                  (const unsigned char *)dp_packet_l2(packet)
> > +                                 ));
> > +            if (csum_finish(csum)) {
> > +                return NULL;
> > +            }
> 
> Can we make use of RX checksum offload for userspace conntrack lib? I think
> there is already code duplication in checksum verification, we could refactor
> code around checksum API with hardware offload to avoid it and likely
> improve performance.
[Sugesh] This is also a good place where we can think of using the Rx checksum offload.
As I mentioned in the commit message, the checksum offload disables vectorization
which impacts the performance.
We must need to test the checksum offload in different conn-track use cases to make sure
the checksum offload really offers the performance improvement before refactoring the code.

I just had a glance on conntrack lib, it seems only L4 checksum is validated when compared to the tunneling case where both IP and L4 checksum validated for every packets.

I can modify the conntrack lib and test to see if any performance improvement can be achieved with
Rx checksum offload. If it offers acceptable performance boost, I will send out another patch to enable
Rx checksum offload in conntrack module + refactoring the checksum offload around checksum API.
What do you think?
Pravin Shelar Dec. 16, 2016, 7:50 a.m. UTC | #3
On Thu, Dec 15, 2016 at 4:07 AM, Chandran, Sugesh
<sugesh.chandran@intel.com> wrote:
>
>
> Regards
> _Sugesh
>
>> -----Original Message-----
>> From: Pravin Shelar [mailto:pshelar@ovn.org]
>> Sent: Wednesday, December 14, 2016 9:42 PM
>> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>; Ben
>> Pfaff <blp@ovn.org>
>> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
>> offloading feature on DPDK physical ports.
>>
>> Thanks for the patch. I have couple of questions.
>>
>> On Wed, Dec 14, 2016 at 7:30 AM, Sugesh Chandran
>> <sugesh.chandran@intel.com> wrote:
>> > Add Rx checksum offloading feature support on DPDK physical ports. By
>> > default, the Rx checksum offloading is enabled if NIC supports.
>> > However, the checksum offloading can be turned OFF either while adding
>> > a new DPDK physical port to OVS or at runtime.
>> >
>> > The rx checksum offloading can be turned off by setting the parameter
>> > to 'false'. For eg: To disable the rx checksum offloading when adding
>> > a port,
>> >
>> >      'ovs-vsctl add-port br0 dpdk0 -- \
>> >       set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
>> >
>> > OR (to disable at run time after port is being added to OVS)
>> >
>> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
>> >
>> > Similarly to turn ON rx checksum offloading at run time,
>> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
>> >
>> > The Tx checksum offloading support is not implemented due to the
>> > following reasons.
>> >
>> > 1) Checksum offloading and vectorization are mutually exclusive in
>> > DPDK poll mode driver. Vector packet processing is turned OFF when
>> > checksum offloading is enabled which causes significant performance drop
>> at Tx side.
>> >
>> > 2) Normally, OVS generates checksum for tunnel packets in software at
>> > the 'tunnel push' operation, where the tunnel headers are created.
>> > However enabling Tx checksum offloading involves,
>> >
>> > *) Mark every packets for tx checksum offloading at 'tunnel_push' and
>> > recirculate.
>> > *) At the time of xmit, validate the same flag and instruct the NIC to
>> > do the checksum calculation.  In case NIC doesnt support Tx checksum
>> > offloading, the checksum calculation has to be done in software before
>> > sending out the packets.
>> >
>> > No significant performance improvement noticed with Tx checksum
>> > offloading due to the e overhead of additional validations + non vector
>> packet processing.
>> > In some test scenarios, it introduces performance drop too.
>> >
>> > Rx checksum offloading still offers 8-9% of improvement on VxLAN
>> > tunneling decapsulation even though the SSE vector Rx function is
>> > disabled in DPDK poll mode driver.
>> >
>> > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
>> > Acked-by: Jesse Gross <jesse@kernel.org>
>> >
>> ...
>> ...
>> >  static void
>> > +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
>> > +    OVS_REQUIRES(dev->mutex)
>> > +{
>> > +    struct rte_eth_dev_info info;
>> > +    bool rx_csum_ol_flag = false;
>> > +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
>> > +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
>> > +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
>> > +    rte_eth_dev_info_get(dev->port_id, &info);
>> > +    rx_csum_ol_flag = (dev->hw_ol_features &
>> > +NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
>> > +
>> > +    if (rx_csum_ol_flag &&
>> > +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
>> > +         rx_chksm_offload_capa) {
>> > +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on device
>> %d",
>> > +                   dev->port_id);
>> > +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
>> > +        return;
>> > +    }
>> > +    netdev_request_reconfigure(&dev->up);
>> > +}
>> > +
>> I am not sure about need for netdev reconfigure here.
> [Sugesh] Reconfigure is called here to enable/disable the checksum
> at run time. Do you think the checksum will be updated on a user request
> without having this call?

I do not see need to reconfigure netdev here.
>>
>> > +static void
>> >  dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev)
>> > OVS_REQUIRES(dev->mutex)  {
>> >      if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) { @@
>> > -851,6 +884,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int
>> > port_no,
>> >
>> >      /* Initialize the flow control to NULL */
>> >      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
>> > +
>> > +    /* Initilize the hardware offload flags to 0 */
>> > +    dev->hw_ol_features = 0;
>> >      if (type == DPDK_DEV_ETH) {
>> >          err = dpdk_eth_dev_init(dev);
>> >          if (err) {
>> > @@ -1118,6 +1154,8 @@ netdev_dpdk_set_config(struct netdev *netdev,
>> const struct smap *args)
>> >          {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
>> >          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>> >      };
>> > +    bool rx_chksm_ofld;
>> > +    bool temp_flag;
>> >
>> >      ovs_mutex_lock(&dev->mutex);
>> >
>> > @@ -1141,6 +1179,15 @@ netdev_dpdk_set_config(struct netdev
>> *netdev, const struct smap *args)
>> >          dpdk_eth_flow_ctrl_setup(dev);
>> >      }
>> >
>> > +    /* Rx checksum offload configuration */
>> > +    /* By default the Rx checksum offload is ON */
>> > +    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
>> > +    temp_flag = (dev->hw_ol_features &
>> NETDEV_RX_CHECKSUM_OFFLOAD)
>> > +                        != 0;
>> > +    if (temp_flag != rx_chksm_ofld) {
>> > +        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
>> > +        dpdk_eth_checksum_offload_configure(dev);
>> > +    }
>> >      ovs_mutex_unlock(&dev->mutex);
>> >
>> Can you also reflect this feature back to user via get-status.
>>
>> >      return 0;
>> > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index
>> > ce2582f..c730e72 100644
>> > --- a/lib/netdev-native-tnl.c
>> > +++ b/lib/netdev-native-tnl.c
>> > @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet
>> > *packet, struct flow_tnl *tnl,
>> >
>> >          ovs_be32 ip_src, ip_dst;
>> >
>> > -        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
>> > -            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
>> > -            return NULL;
>> > +        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
>> > +            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
>> > +                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
>> > +                return NULL;
>> > +            }
>> >          }
>> >
>> >          if (ntohs(ip->ip_tot_len) > l3_size) { @@ -179,18 +181,21 @@
>> > udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>> >      }
>> >
>> >      if (udp->udp_csum) {
>> > -        uint32_t csum;
>> > -        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
>> > -            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
>> > -        } else {
>> > -            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
>> > -        }
>> > -
>> > -        csum = csum_continue(csum, udp, dp_packet_size(packet) -
>> > -                             ((const unsigned char *)udp -
>> > -                              (const unsigned char *)dp_packet_l2(packet)));
>> > -        if (csum_finish(csum)) {
>> > -            return NULL;
>> > +        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
>> > +            uint32_t csum;
>> > +            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
>> > +                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
>> > +            } else {
>> > +                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
>> > +            }
>> > +
>> > +            csum = csum_continue(csum, udp, dp_packet_size(packet) -
>> > +                                 ((const unsigned char *)udp -
>> > +                                  (const unsigned char *)dp_packet_l2(packet)
>> > +                                 ));
>> > +            if (csum_finish(csum)) {
>> > +                return NULL;
>> > +            }
>>
>> Can we make use of RX checksum offload for userspace conntrack lib? I think
>> there is already code duplication in checksum verification, we could refactor
>> code around checksum API with hardware offload to avoid it and likely
>> improve performance.
> [Sugesh] This is also a good place where we can think of using the Rx checksum offload.
> As I mentioned in the commit message, the checksum offload disables vectorization
> which impacts the performance.
> We must need to test the checksum offload in different conn-track use cases to make sure
> the checksum offload really offers the performance improvement before refactoring the code.
>
> I just had a glance on conntrack lib, it seems only L4 checksum is validated when compared to the tunneling case where both IP and L4 checksum validated for every packets.
>
> I can modify the conntrack lib and test to see if any performance improvement can be achieved with
> Rx checksum offload. If it offers acceptable performance boost, I will send out another patch to enable
> Rx checksum offload in conntrack module + refactoring the checksum offload around checksum API.
> What do you think?
>

I think it should improve performance even in this case, But I am fine
with working on this later as separate patch series.
Chandran, Sugesh Dec. 16, 2016, 10:33 a.m. UTC | #4
Regards
_Sugesh

> -----Original Message-----
> From: Pravin Shelar [mailto:pshelar@ovn.org]
> Sent: Friday, December 16, 2016 7:50 AM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>; Ben
> Pfaff <blp@ovn.org>
> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
> offloading feature on DPDK physical ports.
> 
> On Thu, Dec 15, 2016 at 4:07 AM, Chandran, Sugesh
> <sugesh.chandran@intel.com> wrote:
> >
> >
> > Regards
> > _Sugesh
> >
> >> -----Original Message-----
> >> From: Pravin Shelar [mailto:pshelar@ovn.org]
> >> Sent: Wednesday, December 14, 2016 9:42 PM
> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> >> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>;
> >> Ben Pfaff <blp@ovn.org>
> >> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
> >> offloading feature on DPDK physical ports.
> >>
> >> Thanks for the patch. I have couple of questions.
> >>
> >> On Wed, Dec 14, 2016 at 7:30 AM, Sugesh Chandran
> >> <sugesh.chandran@intel.com> wrote:
> >> > Add Rx checksum offloading feature support on DPDK physical ports.
> >> > By default, the Rx checksum offloading is enabled if NIC supports.
> >> > However, the checksum offloading can be turned OFF either while
> >> > adding a new DPDK physical port to OVS or at runtime.
> >> >
> >> > The rx checksum offloading can be turned off by setting the
> >> > parameter to 'false'. For eg: To disable the rx checksum offloading
> >> > when adding a port,
> >> >
> >> >      'ovs-vsctl add-port br0 dpdk0 -- \
> >> >       set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
> >> >
> >> > OR (to disable at run time after port is being added to OVS)
> >> >
> >> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
> >> >
> >> > Similarly to turn ON rx checksum offloading at run time,
> >> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
> >> >
> >> > The Tx checksum offloading support is not implemented due to the
> >> > following reasons.
> >> >
> >> > 1) Checksum offloading and vectorization are mutually exclusive in
> >> > DPDK poll mode driver. Vector packet processing is turned OFF when
> >> > checksum offloading is enabled which causes significant performance
> >> > drop
> >> at Tx side.
> >> >
> >> > 2) Normally, OVS generates checksum for tunnel packets in software
> >> > at the 'tunnel push' operation, where the tunnel headers are created.
> >> > However enabling Tx checksum offloading involves,
> >> >
> >> > *) Mark every packets for tx checksum offloading at 'tunnel_push'
> >> > and recirculate.
> >> > *) At the time of xmit, validate the same flag and instruct the NIC
> >> > to do the checksum calculation.  In case NIC doesnt support Tx
> >> > checksum offloading, the checksum calculation has to be done in
> >> > software before sending out the packets.
> >> >
> >> > No significant performance improvement noticed with Tx checksum
> >> > offloading due to the e overhead of additional validations + non
> >> > vector
> >> packet processing.
> >> > In some test scenarios, it introduces performance drop too.
> >> >
> >> > Rx checksum offloading still offers 8-9% of improvement on VxLAN
> >> > tunneling decapsulation even though the SSE vector Rx function is
> >> > disabled in DPDK poll mode driver.
> >> >
> >> > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
> >> > Acked-by: Jesse Gross <jesse@kernel.org>
> >> >
> >> ...
> >> ...
> >> >  static void
> >> > +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> >> > +    OVS_REQUIRES(dev->mutex)
> >> > +{
> >> > +    struct rte_eth_dev_info info;
> >> > +    bool rx_csum_ol_flag = false;
> >> > +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM
> |
> >> > +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
> >> > +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
> >> > +    rte_eth_dev_info_get(dev->port_id, &info);
> >> > +    rx_csum_ol_flag = (dev->hw_ol_features &
> >> > +NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> >> > +
> >> > +    if (rx_csum_ol_flag &&
> >> > +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
> >> > +         rx_chksm_offload_capa) {
> >> > +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on
> >> > + device
> >> %d",
> >> > +                   dev->port_id);
> >> > +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> >> > +        return;
> >> > +    }
> >> > +    netdev_request_reconfigure(&dev->up);
> >> > +}
> >> > +
> >> I am not sure about need for netdev reconfigure here.
> > [Sugesh] Reconfigure is called here to enable/disable the checksum at
> > run time. Do you think the checksum will be updated on a user request
> > without having this call?
> 
> I do not see need to reconfigure netdev here.
[Sugesh] Sorry Pravin, Looks like I am missing something here. Can you please
elaborate why you feel the reconfigure is not needed here?
> >>
> >> > +static void
> >> >  dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev)
> >> > OVS_REQUIRES(dev->mutex)  {
> >> >      if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
> >> > @@
> >> > -851,6 +884,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned
> >> > int port_no,
> >> >
> >> >      /* Initialize the flow control to NULL */
> >> >      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> >> > +
> >> > +    /* Initilize the hardware offload flags to 0 */
> >> > +    dev->hw_ol_features = 0;
> >> >      if (type == DPDK_DEV_ETH) {
> >> >          err = dpdk_eth_dev_init(dev);
> >> >          if (err) {
> >> > @@ -1118,6 +1154,8 @@ netdev_dpdk_set_config(struct netdev
> *netdev,
> >> const struct smap *args)
> >> >          {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
> >> >          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
> >> >      };
> >> > +    bool rx_chksm_ofld;
> >> > +    bool temp_flag;
> >> >
> >> >      ovs_mutex_lock(&dev->mutex);
> >> >
> >> > @@ -1141,6 +1179,15 @@ netdev_dpdk_set_config(struct netdev
> >> *netdev, const struct smap *args)
> >> >          dpdk_eth_flow_ctrl_setup(dev);
> >> >      }
> >> >
> >> > +    /* Rx checksum offload configuration */
> >> > +    /* By default the Rx checksum offload is ON */
> >> > +    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
> >> > +    temp_flag = (dev->hw_ol_features &
> >> NETDEV_RX_CHECKSUM_OFFLOAD)
> >> > +                        != 0;
> >> > +    if (temp_flag != rx_chksm_ofld) {
> >> > +        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
> >> > +        dpdk_eth_checksum_offload_configure(dev);
> >> > +    }
> >> >      ovs_mutex_unlock(&dev->mutex);
> >> >
> >> Can you also reflect this feature back to user via get-status.
> >>
> >> >      return 0;
> >> > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> >> > index
> >> > ce2582f..c730e72 100644
> >> > --- a/lib/netdev-native-tnl.c
> >> > +++ b/lib/netdev-native-tnl.c
> >> > @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet
> >> > *packet, struct flow_tnl *tnl,
> >> >
> >> >          ovs_be32 ip_src, ip_dst;
> >> >
> >> > -        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> >> > -            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> >> > -            return NULL;
> >> > +        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
> >> > +            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> >> > +                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> >> > +                return NULL;
> >> > +            }
> >> >          }
> >> >
> >> >          if (ntohs(ip->ip_tot_len) > l3_size) { @@ -179,18 +181,21
> >> > @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl
> *tnl,
> >> >      }
> >> >
> >> >      if (udp->udp_csum) {
> >> > -        uint32_t csum;
> >> > -        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> >> > -            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> >> > -        } else {
> >> > -            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> >> > -        }
> >> > -
> >> > -        csum = csum_continue(csum, udp, dp_packet_size(packet) -
> >> > -                             ((const unsigned char *)udp -
> >> > -                              (const unsigned char *)dp_packet_l2(packet)));
> >> > -        if (csum_finish(csum)) {
> >> > -            return NULL;
> >> > +        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
> >> > +            uint32_t csum;
> >> > +            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> >> > +                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> >> > +            } else {
> >> > +                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> >> > +            }
> >> > +
> >> > +            csum = csum_continue(csum, udp, dp_packet_size(packet) -
> >> > +                                 ((const unsigned char *)udp -
> >> > +                                  (const unsigned char *)dp_packet_l2(packet)
> >> > +                                 ));
> >> > +            if (csum_finish(csum)) {
> >> > +                return NULL;
> >> > +            }
> >>
> >> Can we make use of RX checksum offload for userspace conntrack lib? I
> >> think there is already code duplication in checksum verification, we
> >> could refactor code around checksum API with hardware offload to
> >> avoid it and likely improve performance.
> > [Sugesh] This is also a good place where we can think of using the Rx
> checksum offload.
> > As I mentioned in the commit message, the checksum offload disables
> > vectorization which impacts the performance.
> > We must need to test the checksum offload in different conn-track use
> > cases to make sure the checksum offload really offers the performance
> improvement before refactoring the code.
> >
> > I just had a glance on conntrack lib, it seems only L4 checksum is validated
> when compared to the tunneling case where both IP and L4 checksum
> validated for every packets.
> >
> > I can modify the conntrack lib and test to see if any performance
> > improvement can be achieved with Rx checksum offload. If it offers
> > acceptable performance boost, I will send out another patch to enable Rx
> checksum offload in conntrack module + refactoring the checksum offload
> around checksum API.
> > What do you think?
> >
> 
> I think it should improve performance even in this case, But I am fine with
> working on this later as separate patch series.
[Sugesh] Great, Thanks Pravin, I will work on the follow up patches once this get
upstreamed.
Pravin Shelar Dec. 22, 2016, 10:10 p.m. UTC | #5
On Fri, Dec 16, 2016 at 4:03 PM, Chandran, Sugesh
<sugesh.chandran@intel.com> wrote:
>
>
> Regards
> _Sugesh
>
>> -----Original Message-----
>> From: Pravin Shelar [mailto:pshelar@ovn.org]
>> Sent: Friday, December 16, 2016 7:50 AM
>> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>; Ben
>> Pfaff <blp@ovn.org>
>> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
>> offloading feature on DPDK physical ports.
>>
>> On Thu, Dec 15, 2016 at 4:07 AM, Chandran, Sugesh
>> <sugesh.chandran@intel.com> wrote:
>> >
>> >
>> > Regards
>> > _Sugesh
>> >
>> >> -----Original Message-----
>> >> From: Pravin Shelar [mailto:pshelar@ovn.org]
>> >> Sent: Wednesday, December 14, 2016 9:42 PM
>> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> >> Cc: ovs dev <dev@openvswitch.org>; Jesse Gross <jesse@kernel.org>;
>> >> Ben Pfaff <blp@ovn.org>
>> >> Subject: Re: [ovs-dev] [PATCH v7] netdev-dpdk: Enable Rx checksum
>> >> offloading feature on DPDK physical ports.
>> >>
>> >> Thanks for the patch. I have couple of questions.
>> >>
>> >> On Wed, Dec 14, 2016 at 7:30 AM, Sugesh Chandran
>> >> <sugesh.chandran@intel.com> wrote:
>> >> > Add Rx checksum offloading feature support on DPDK physical ports.
>> >> > By default, the Rx checksum offloading is enabled if NIC supports.
>> >> > However, the checksum offloading can be turned OFF either while
>> >> > adding a new DPDK physical port to OVS or at runtime.
>> >> >
>> >> > The rx checksum offloading can be turned off by setting the
>> >> > parameter to 'false'. For eg: To disable the rx checksum offloading
>> >> > when adding a port,
>> >> >
>> >> >      'ovs-vsctl add-port br0 dpdk0 -- \
>> >> >       set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
>> >> >
>> >> > OR (to disable at run time after port is being added to OVS)
>> >> >
>> >> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
>> >> >
>> >> > Similarly to turn ON rx checksum offloading at run time,
>> >> >     'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
>> >> >
>> >> > The Tx checksum offloading support is not implemented due to the
>> >> > following reasons.
>> >> >
>> >> > 1) Checksum offloading and vectorization are mutually exclusive in
>> >> > DPDK poll mode driver. Vector packet processing is turned OFF when
>> >> > checksum offloading is enabled which causes significant performance
>> >> > drop
>> >> at Tx side.
>> >> >
>> >> > 2) Normally, OVS generates checksum for tunnel packets in software
>> >> > at the 'tunnel push' operation, where the tunnel headers are created.
>> >> > However enabling Tx checksum offloading involves,
>> >> >
>> >> > *) Mark every packets for tx checksum offloading at 'tunnel_push'
>> >> > and recirculate.
>> >> > *) At the time of xmit, validate the same flag and instruct the NIC
>> >> > to do the checksum calculation.  In case NIC doesnt support Tx
>> >> > checksum offloading, the checksum calculation has to be done in
>> >> > software before sending out the packets.
>> >> >
>> >> > No significant performance improvement noticed with Tx checksum
>> >> > offloading due to the e overhead of additional validations + non
>> >> > vector
>> >> packet processing.
>> >> > In some test scenarios, it introduces performance drop too.
>> >> >
>> >> > Rx checksum offloading still offers 8-9% of improvement on VxLAN
>> >> > tunneling decapsulation even though the SSE vector Rx function is
>> >> > disabled in DPDK poll mode driver.
>> >> >
>> >> > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
>> >> > Acked-by: Jesse Gross <jesse@kernel.org>
>> >> >
>> >> ...
>> >> ...
>> >> >  static void
>> >> > +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
>> >> > +    OVS_REQUIRES(dev->mutex)
>> >> > +{
>> >> > +    struct rte_eth_dev_info info;
>> >> > +    bool rx_csum_ol_flag = false;
>> >> > +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM
>> |
>> >> > +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
>> >> > +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
>> >> > +    rte_eth_dev_info_get(dev->port_id, &info);
>> >> > +    rx_csum_ol_flag = (dev->hw_ol_features &
>> >> > +NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
>> >> > +
>> >> > +    if (rx_csum_ol_flag &&
>> >> > +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
>> >> > +         rx_chksm_offload_capa) {
>> >> > +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on
>> >> > + device
>> >> %d",
>> >> > +                   dev->port_id);
This is not clear error msg. In this case device does not support
requested offload feature. So the msg should mention that.

>> >> > +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
>> >> > +        return;
>> >> > +    }
>> >> > +    netdev_request_reconfigure(&dev->up);
>> >> > +}
>> >> > +
>> >> I am not sure about need for netdev reconfigure here.
>> > [Sugesh] Reconfigure is called here to enable/disable the checksum at
>> > run time. Do you think the checksum will be updated on a user request
>> > without having this call?
>>
>> I do not see need to reconfigure netdev here.
> [Sugesh] Sorry Pravin, Looks like I am missing something here. Can you please
> elaborate why you feel the reconfigure is not needed here?

I thought we can enable rx checksum offload synchronously, but the
device API does not allow it, so I agree we have to reconfigure netdev
here.
Chandran, Sugesh Dec. 23, 2016, 9:16 a.m. UTC | #6
Regards
_Sugesh

....
> >> >> > +    rx_csum_ol_flag = (dev->hw_ol_features &

> >> >> > +NETDEV_RX_CHECKSUM_OFFLOAD) != 0;

> >> >> > +

> >> >> > +    if (rx_csum_ol_flag &&

> >> >> > +        (info.rx_offload_capa & rx_chksm_offload_capa) !=

> >> >> > +         rx_chksm_offload_capa) {

> >> >> > +        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on

> >> >> > + device

> >> >> %d",

> >> >> > +                   dev->port_id);

> This is not clear error msg. In this case device does not support requested

> offload feature. So the msg should mention that.

[Sugesh] Ok, Will change the error log message stating device doesn’t support the feature.
> 

> >> >> > +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;

> >> >> > +        return;

> >> >> > +    }

> >> >> > +    netdev_request_reconfigure(&dev->up);

> >> >> > +}

> >> >> > +

> >> >> I am not sure about need for netdev reconfigure here.

> >> > [Sugesh] Reconfigure is called here to enable/disable the checksum

> >> > at run time. Do you think the checksum will be updated on a user

> >> > request without having this call?

> >>

> >> I do not see need to reconfigure netdev here.

> > [Sugesh] Sorry Pravin, Looks like I am missing something here. Can you

> > please elaborate why you feel the reconfigure is not needed here?

> 

> I thought we can enable rx checksum offload synchronously, but the device

> API does not allow it, so I agree we have to reconfigure netdev here.

[Sugesh] Ok
diff mbox

Patch

diff --git a/Documentation/intro/install/dpdk-advanced.rst b/Documentation/intro/install/dpdk-advanced.rst
index 44d1cd7..f0b28e5 100644
--- a/Documentation/intro/install/dpdk-advanced.rst
+++ b/Documentation/intro/install/dpdk-advanced.rst
@@ -924,6 +924,31 @@  largest frame size supported by Fortville NIC using the DPDK i40e driver, but
 larger frames and other DPDK NIC drivers may be supported. These cases are
 common for use cases involving East-West traffic only.
 
+Rx Checksum Offload
+-------------------
+
+By default, DPDK physical ports are enabled with Rx checksum offload. Rx
+checksum offload can be configured on a DPDK physical port either when adding
+or at run time.
+
+To disable Rx checksum offload when adding a DPDK port dpdk0::
+
+    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
+      options:rx-checksum-offload=false
+
+Similarly to disable the Rx checksum offloading on a existing DPDK port dpdk0::
+
+    $ ovs-vsctl set Interface dpdk0 type=dpdk options:rx-checksum-offload=false
+
+OVS-DPDK is validating the checksum only for tunnel packets in software, hence
+Rx checksum offload can offer performance improvement only for tunneling
+traffic in OVS-DPDK. Also enabling Rx checksum may slightly reduce the
+performance of non-tunnel traffic, specifically for smaller size packet.
+DPDK vectorization is disabled when checksum offloading is configured on DPDK
+physical ports which in turn effects the non-tunnel traffic performance.
+So it is advised to turn off the Rx checksum offload for non-tunnel traffic use
+cases to achieve the best performance.
+
 vsperf
 ------
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 1469864..d61ec83 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -598,6 +598,35 @@  dp_packet_rss_invalidate(struct dp_packet *p)
 #endif
 }
 
+static inline bool
+dp_packet_ip_checksum_valid(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    return p->mbuf.ol_flags & PKT_RX_IP_CKSUM_GOOD;
+#else
+    return 0;
+#endif
+}
+
+static inline bool
+dp_packet_l4_checksum_valid(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    return p->mbuf.ol_flags & PKT_RX_L4_CKSUM_GOOD;
+#else
+    return 0;
+#endif
+}
+
+static inline void
+reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    p->mbuf.ol_flags &= ~(PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
+                          PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD);
+#endif
+}
+
 enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
 
 struct dp_packet_batch {
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 625f425..32f9daa 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -319,6 +319,10 @@  struct ingress_policer {
     rte_spinlock_t policer_lock;
 };
 
+enum dpdk_hw_ol_features {
+    NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
+};
+
 struct netdev_dpdk {
     struct netdev up;
     int port_id;
@@ -384,6 +388,10 @@  struct netdev_dpdk {
 
     /* DPDK-ETH Flow control */
     struct rte_eth_fc_conf fc_conf;
+
+    /* DPDK-ETH hardware offload features,
+     * from the enum set 'dpdk_hw_ol_features' */
+    uint32_t hw_ol_features;
 };
 
 struct netdev_rxq_dpdk {
@@ -633,6 +641,8 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
         conf.rxmode.jumbo_frame = 0;
         conf.rxmode.max_rx_pkt_len = 0;
     }
+    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
+                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -693,6 +703,29 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 }
 
 static void
+dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dev->mutex)
+{
+    struct rte_eth_dev_info info;
+    bool rx_csum_ol_flag = false;
+    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
+                                     DEV_RX_OFFLOAD_TCP_CKSUM |
+                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
+    rte_eth_dev_info_get(dev->port_id, &info);
+    rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
+
+    if (rx_csum_ol_flag &&
+        (info.rx_offload_capa & rx_chksm_offload_capa) !=
+         rx_chksm_offload_capa) {
+        VLOG_WARN_ONCE("Failed to enable Rx checksum offload on device %d",
+                   dev->port_id);
+        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
+        return;
+    }
+    netdev_request_reconfigure(&dev->up);
+}
+
+static void
 dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
 {
     if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
@@ -851,6 +884,9 @@  netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
     /* Initialize the flow control to NULL */
     memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
+
+    /* Initilize the hardware offload flags to 0 */
+    dev->hw_ol_features = 0;
     if (type == DPDK_DEV_ETH) {
         err = dpdk_eth_dev_init(dev);
         if (err) {
@@ -1118,6 +1154,8 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
         {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
         {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
     };
+    bool rx_chksm_ofld;
+    bool temp_flag;
 
     ovs_mutex_lock(&dev->mutex);
 
@@ -1141,6 +1179,15 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
         dpdk_eth_flow_ctrl_setup(dev);
     }
 
+    /* Rx checksum offload configuration */
+    /* By default the Rx checksum offload is ON */
+    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
+    temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD)
+                        != 0;
+    if (temp_flag != rx_chksm_ofld) {
+        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
+        dpdk_eth_checksum_offload_configure(dev);
+    }
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index ce2582f..c730e72 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -85,9 +85,11 @@  netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 
         ovs_be32 ip_src, ip_dst;
 
-        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
-            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
-            return NULL;
+        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
+            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
+                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
+                return NULL;
+            }
         }
 
         if (ntohs(ip->ip_tot_len) > l3_size) {
@@ -179,18 +181,21 @@  udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
     }
 
     if (udp->udp_csum) {
-        uint32_t csum;
-        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
-            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
-        } else {
-            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
-        }
-
-        csum = csum_continue(csum, udp, dp_packet_size(packet) -
-                             ((const unsigned char *)udp -
-                              (const unsigned char *)dp_packet_l2(packet)));
-        if (csum_finish(csum)) {
-            return NULL;
+        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
+            uint32_t csum;
+            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
+                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
+            } else {
+                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
+            }
+
+            csum = csum_continue(csum, udp, dp_packet_size(packet) -
+                                 ((const unsigned char *)udp -
+                                  (const unsigned char *)dp_packet_l2(packet)
+                                 ));
+            if (csum_finish(csum)) {
+                return NULL;
+            }
         }
         tnl->flags |= FLOW_TNL_F_CSUM;
     }
diff --git a/lib/netdev.c b/lib/netdev.c
index 860781d..f7a1001 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -741,6 +741,10 @@  netdev_pop_header(struct netdev *netdev, struct dp_packet_batch *batch)
     for (i = 0; i < batch->count; i++) {
         buffers[i] = netdev->netdev_class->pop_header(buffers[i]);
         if (buffers[i]) {
+            /* Reset the checksum offload flags if present, to avoid wrong
+             * interpretation in the further packet processing when
+             * recirculated.*/
+            reset_dp_packet_checksum_ol_flags(buffers[i]);
             buffers[n_cnt++] = buffers[i];
         }
     }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index b4af5a5..e73f184 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3186,6 +3186,20 @@ 
       </column>
     </group>
 
+    <group title="Rx Checksum Offload Configuration">
+      <p>
+        The checksum validation on the incoming packets are performed on NIC
+        using Rx checksum offload feature. Implemented only for <code>dpdk
+        </code>physical interfaces.
+      </p>
+
+      <column name="options" key="rx-checksum-offload"
+              type='{"type": "boolean"}'>
+        Set to <code>false</code> to disble Rx checksum offloading on <code>
+        dpdk</code>physical ports. By default, Rx checksum offload is enabled.
+      </column>
+    </group>
+
     <group title="Common Columns">
       The overall purpose of these columns is described under <code>Common
       Columns</code> at the beginning of this document.