diff mbox series

[ovs-dev] netdev-linux: do not send packets to down tap ifaces.

Message ID 20180110160643.3273-1-fbl@sysclose.org
State Superseded
Headers show
Series [ovs-dev] netdev-linux: do not send packets to down tap ifaces. | expand

Commit Message

Flavio Leitner Jan. 10, 2018, 4:06 p.m. UTC
Today OVS pushes packets to the TAP interface ignoring its
current state. That works because the kernel will return -EIO
when it's not UP and OVS will just ignore that as it is not
an OVS issue.

However, it causes a huge impact when broadcasts happen when
using userspace datapath accelerated with DPDK (e.g.: action
NORMAL).  This patch improves the situation by checking the
TAP's interface state before issueing any syscall.

However, there might be use-cases moving interfaces to other
networking namespaces and in that case, OVS can't retrieve
the iface state (sets it to DOWN). That would stop the traffic
breaking the use-case. This patch relies on netlink notifications
to find out if the device is local or not. When it's local, the
device state is checked otherwise it will behave as before.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/netdev-linux.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Ben Pfaff Jan. 10, 2018, 11:09 p.m. UTC | #1
On Wed, Jan 10, 2018 at 02:06:43PM -0200, Flavio Leitner wrote:
> Today OVS pushes packets to the TAP interface ignoring its
> current state. That works because the kernel will return -EIO
> when it's not UP and OVS will just ignore that as it is not
> an OVS issue.
> 
> However, it causes a huge impact when broadcasts happen when
> using userspace datapath accelerated with DPDK (e.g.: action
> NORMAL).  This patch improves the situation by checking the
> TAP's interface state before issueing any syscall.
> 
> However, there might be use-cases moving interfaces to other
> networking namespaces and in that case, OVS can't retrieve
> the iface state (sets it to DOWN). That would stop the traffic
> breaking the use-case. This patch relies on netlink notifications
> to find out if the device is local or not. When it's local, the
> device state is checked otherwise it will behave as before.
> 
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>

Can you help me understand the "huge impact"?  What kind of impact?

Thanks,

Ben.
Flavio Leitner Jan. 11, 2018, 12:49 p.m. UTC | #2
On Wed, Jan 10, 2018 at 03:09:30PM -0800, Ben Pfaff wrote:
> On Wed, Jan 10, 2018 at 02:06:43PM -0200, Flavio Leitner wrote:
> > Today OVS pushes packets to the TAP interface ignoring its
> > current state. That works because the kernel will return -EIO
> > when it's not UP and OVS will just ignore that as it is not
> > an OVS issue.
> > 
> > However, it causes a huge impact when broadcasts happen when
> > using userspace datapath accelerated with DPDK (e.g.: action
> > NORMAL).  This patch improves the situation by checking the
> > TAP's interface state before issueing any syscall.
> > 
> > However, there might be use-cases moving interfaces to other
> > networking namespaces and in that case, OVS can't retrieve
> > the iface state (sets it to DOWN). That would stop the traffic
> > breaking the use-case. This patch relies on netlink notifications
> > to find out if the device is local or not. When it's local, the
> > device state is checked otherwise it will behave as before.
> > 
> > Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> 
> Can you help me understand the "huge impact"?  What kind of impact?

The PMD thread needs to be all time processing packets in order to
provide maximum performance (no loss, less jitter and stable tput).
However, it calls the write() syscall when sending packets to tap
devices and then we lost control over the CPU.

One exampe is that we need to warm up the fdb before starting the
performance tests because the ARP resolution uses the syscall and
causes issues right at the beginning.

So, basically every time a broadcast happen for whatever reason,
there is a high chance of stalling the PMD thread enough to cause
packet loss, jitter or tput degradation.

A real world example is OpenStack running with OVS-DPDK and
serving multiple VMs through vhost-user ports.  If any of the
VMs triggers a broadcast, other VMs will notice the penalty.
Vishal Deep Ajmera Jan. 15, 2018, 10:27 a.m. UTC | #3
Hi Flavio,

Thank you for looking into this issue. It certainly improves the situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state.

However, I see an issue with the patch. It does not handle device statistics. Earlier even when tap-port was DOWN, we would make a write system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ?

Warm Regards,
Vishal Ajmera

-----Original Message-----
From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner
Sent: Wednesday, January 10, 2018 9:37 PM
To: dev@openvswitch.org
Cc: Flavio Leitner <fbl@sysclose.org>
Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.

Today OVS pushes packets to the TAP interface ignoring its current state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue.

However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL).  This patch improves the situation by checking the TAP's interface state before issueing any syscall.

However, there might be use-cases moving interfaces to other networking namespaces and in that case, OVS can't retrieve the iface state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 lib/netdev-linux.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index ccb6def6c..53b08bebd 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -502,6 +502,7 @@ struct netdev_linux {
 
     /* For devices of class netdev_tap_class only. */
     int tap_fd;
+    bool present;            /* If the device is present in the namespace */
 };
 
 struct netdev_rxq_linux {
@@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev,
             dev->ifindex = change->if_index;
             dev->cache_valid |= VALID_IFINDEX;
             dev->get_ifindex_error = 0;
+            dev->present = true;
         } else {
             netdev_linux_changed(dev, change->ifi_flags, 0);
+            dev->present = false;
         }
     } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
         /* Invalidates in4, in6. */
@@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,  {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
     struct dp_packet *packet;
+
+    /* The Linux tap driver returns EIO if the device is not up,
+     * so if the device is not up, don't waste time sending it.
+     * However, if the device is in another network namespace
+     * then OVS can't retrieve the state. In that case, send the
+     * packets anyway. */
+    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
+        return 0;
+    }
+
     DP_PACKET_BATCH_FOR_EACH (packet, batch) {
         size_t size = dp_packet_size(packet);
         ssize_t retval;
--
2.14.3
Jan Scheurich Jan. 15, 2018, 12:25 p.m. UTC | #4
Hi Vishal,

I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right?

BR, Jan

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep Ajmera
> Sent: Monday, 15 January, 2018 11:27
> To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> 
> Hi Flavio,
> 
> Thank you for looking into this issue. It certainly improves the situation for broadcast frames when we have multiple tap ports on the
> bridge which are in DOWN state.
> 
> However, I see an issue with the patch. It does not handle device statistics. Earlier even when tap-port was DOWN, we would make a write
> system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets
> to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ?
> 
> Warm Regards,
> Vishal Ajmera
> 
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner
> Sent: Wednesday, January 10, 2018 9:37 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner <fbl@sysclose.org>
> Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> 
> Today OVS pushes packets to the TAP interface ignoring its current state. That works because the kernel will return -EIO when it's not UP
> and OVS will just ignore that as it is not an OVS issue.
> 
> However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL).
> This patch improves the situation by checking the TAP's interface state before issueing any syscall.
> 
> However, there might be use-cases moving interfaces to other networking namespaces and in that case, OVS can't retrieve the iface state
> (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is
> local or not. When it's local, the device state is checked otherwise it will behave as before.
> 
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  lib/netdev-linux.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index ccb6def6c..53b08bebd 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -502,6 +502,7 @@ struct netdev_linux {
> 
>      /* For devices of class netdev_tap_class only. */
>      int tap_fd;
> +    bool present;            /* If the device is present in the namespace */
>  };
> 
>  struct netdev_rxq_linux {
> @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev,
>              dev->ifindex = change->if_index;
>              dev->cache_valid |= VALID_IFINDEX;
>              dev->get_ifindex_error = 0;
> +            dev->present = true;
>          } else {
>              netdev_linux_changed(dev, change->ifi_flags, 0);
> +            dev->present = false;
>          }
>      } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
>          /* Invalidates in4, in6. */
> @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,  {
>      struct netdev_linux *netdev = netdev_linux_cast(netdev_);
>      struct dp_packet *packet;
> +
> +    /* The Linux tap driver returns EIO if the device is not up,
> +     * so if the device is not up, don't waste time sending it.
> +     * However, if the device is in another network namespace
> +     * then OVS can't retrieve the state. In that case, send the
> +     * packets anyway. */
> +    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
> +        return 0;
> +    }
> +
>      DP_PACKET_BATCH_FOR_EACH (packet, batch) {
>          size_t size = dp_packet_size(packet);
>          ssize_t retval;
> --
> 2.14.3
> 
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Vishal Deep Ajmera Jan. 15, 2018, 3:12 p.m. UTC | #5
Yes Jan. I believe it should be counted as tx-dropped on the tap interface.

Warm Regards,
Vishal Ajmera

-----Original Message-----
From: Jan Scheurich 
Sent: Monday, January 15, 2018 5:56 PM
To: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>; Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org
Cc: Rohith Basavaraja <rohith.basavaraja@ericsson.com>
Subject: RE: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.

Hi Vishal,

I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right?

BR, Jan

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org 
> [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep 
> Ajmera
> Sent: Monday, 15 January, 2018 11:27
> To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> 
> Hi Flavio,
> 
> Thank you for looking into this issue. It certainly improves the 
> situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state.
> 
> However, I see an issue with the patch. It does not handle device 
> statistics. Earlier even when tap-port was DOWN, we would make a write 
> system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ?
> 
> Warm Regards,
> Vishal Ajmera
> 
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org 
> [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner
> Sent: Wednesday, January 10, 2018 9:37 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner <fbl@sysclose.org>
> Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> 
> Today OVS pushes packets to the TAP interface ignoring its current 
> state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue.
> 
> However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL).
> This patch improves the situation by checking the TAP's interface state before issueing any syscall.
> 
> However, there might be use-cases moving interfaces to other 
> networking namespaces and in that case, OVS can't retrieve the iface 
> state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before.
> 
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  lib/netdev-linux.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
> ccb6def6c..53b08bebd 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -502,6 +502,7 @@ struct netdev_linux {
> 
>      /* For devices of class netdev_tap_class only. */
>      int tap_fd;
> +    bool present;            /* If the device is present in the namespace */
>  };
> 
>  struct netdev_rxq_linux {
> @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev,
>              dev->ifindex = change->if_index;
>              dev->cache_valid |= VALID_IFINDEX;
>              dev->get_ifindex_error = 0;
> +            dev->present = true;
>          } else {
>              netdev_linux_changed(dev, change->ifi_flags, 0);
> +            dev->present = false;
>          }
>      } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
>          /* Invalidates in4, in6. */
> @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,  {
>      struct netdev_linux *netdev = netdev_linux_cast(netdev_);
>      struct dp_packet *packet;
> +
> +    /* The Linux tap driver returns EIO if the device is not up,
> +     * so if the device is not up, don't waste time sending it.
> +     * However, if the device is in another network namespace
> +     * then OVS can't retrieve the state. In that case, send the
> +     * packets anyway. */
> +    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
> +        return 0;
> +    }
> +
>      DP_PACKET_BATCH_FOR_EACH (packet, batch) {
>          size_t size = dp_packet_size(packet);
>          ssize_t retval;
> --
> 2.14.3
> 
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Flavio Leitner Jan. 16, 2018, 3:07 a.m. UTC | #6
On Mon, Jan 15, 2018 at 03:12:49PM +0000, Vishal Deep Ajmera wrote:
> Yes Jan. I believe it should be counted as tx-dropped on the tap interface.

Sounds reasonable. I will have a look.
fbl


> 
> Warm Regards,
> Vishal Ajmera
> 
> -----Original Message-----
> From: Jan Scheurich 
> Sent: Monday, January 15, 2018 5:56 PM
> To: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>; Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org
> Cc: Rohith Basavaraja <rohith.basavaraja@ericsson.com>
> Subject: RE: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> 
> Hi Vishal,
> 
> I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right?
> 
> BR, Jan
> 
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org 
> > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep 
> > Ajmera
> > Sent: Monday, 15 January, 2018 11:27
> > To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> > 
> > Hi Flavio,
> > 
> > Thank you for looking into this issue. It certainly improves the 
> > situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state.
> > 
> > However, I see an issue with the patch. It does not handle device 
> > statistics. Earlier even when tap-port was DOWN, we would make a write 
> > system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ?
> > 
> > Warm Regards,
> > Vishal Ajmera
> > 
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org 
> > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner
> > Sent: Wednesday, January 10, 2018 9:37 PM
> > To: dev@openvswitch.org
> > Cc: Flavio Leitner <fbl@sysclose.org>
> > Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces.
> > 
> > Today OVS pushes packets to the TAP interface ignoring its current 
> > state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue.
> > 
> > However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL).
> > This patch improves the situation by checking the TAP's interface state before issueing any syscall.
> > 
> > However, there might be use-cases moving interfaces to other 
> > networking namespaces and in that case, OVS can't retrieve the iface 
> > state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before.
> > 
> > Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> > ---
> >  lib/netdev-linux.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
> > ccb6def6c..53b08bebd 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -502,6 +502,7 @@ struct netdev_linux {
> > 
> >      /* For devices of class netdev_tap_class only. */
> >      int tap_fd;
> > +    bool present;            /* If the device is present in the namespace */
> >  };
> > 
> >  struct netdev_rxq_linux {
> > @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev,
> >              dev->ifindex = change->if_index;
> >              dev->cache_valid |= VALID_IFINDEX;
> >              dev->get_ifindex_error = 0;
> > +            dev->present = true;
> >          } else {
> >              netdev_linux_changed(dev, change->ifi_flags, 0);
> > +            dev->present = false;
> >          }
> >      } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
> >          /* Invalidates in4, in6. */
> > @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,  {
> >      struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> >      struct dp_packet *packet;
> > +
> > +    /* The Linux tap driver returns EIO if the device is not up,
> > +     * so if the device is not up, don't waste time sending it.
> > +     * However, if the device is in another network namespace
> > +     * then OVS can't retrieve the state. In that case, send the
> > +     * packets anyway. */
> > +    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
> > +        return 0;
> > +    }
> > +
> >      DP_PACKET_BATCH_FOR_EACH (packet, batch) {
> >          size_t size = dp_packet_size(packet);
> >          ssize_t retval;
> > --
> > 2.14.3
> > 
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index ccb6def6c..53b08bebd 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -502,6 +502,7 @@  struct netdev_linux {
 
     /* For devices of class netdev_tap_class only. */
     int tap_fd;
+    bool present;            /* If the device is present in the namespace */
 };
 
 struct netdev_rxq_linux {
@@ -750,8 +751,10 @@  netdev_linux_update(struct netdev_linux *dev,
             dev->ifindex = change->if_index;
             dev->cache_valid |= VALID_IFINDEX;
             dev->get_ifindex_error = 0;
+            dev->present = true;
         } else {
             netdev_linux_changed(dev, change->ifi_flags, 0);
+            dev->present = false;
         }
     } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
         /* Invalidates in4, in6. */
@@ -1234,6 +1237,16 @@  netdev_linux_tap_batch_send(struct netdev *netdev_,
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
     struct dp_packet *packet;
+
+    /* The Linux tap driver returns EIO if the device is not up,
+     * so if the device is not up, don't waste time sending it.
+     * However, if the device is in another network namespace
+     * then OVS can't retrieve the state. In that case, send the
+     * packets anyway. */
+    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
+        return 0;
+    }
+
     DP_PACKET_BATCH_FOR_EACH (packet, batch) {
         size_t size = dp_packet_size(packet);
         ssize_t retval;