diff mbox series

[ovs-dev,v3] netdev-linux: do not send packets to down tap ifaces.

Message ID 20180116051753.9639-1-fbl@sysclose.org
State Superseded
Headers show
Series [ovs-dev,v3] netdev-linux: do not send packets to down tap ifaces. | expand

Commit Message

Flavio Leitner Jan. 16, 2018, 5:17 a.m. UTC
Today OVS pushes packets to the TAP interface ignoring its
current state. That works because the kernel will return -EIO
when it's not UP and OVS will just ignore that as it is not
an OVS issue.

However, it causes a huge impact when broadcasts happen when
using userspace datapath accelerated with DPDK (e.g.: action
NORMAL).  This patch improves the situation by checking the
TAP's interface state before issueing any syscall.

However, there might be use-cases moving interfaces to other
networking namespaces and in that case, OVS can't retrieve
the iface state (sets it to DOWN). That would stop the traffic
breaking the use-case. This patch relies on netlink notifications
to find out if the device is local or not. When it's local, the
device state is checked otherwise it will behave as before.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 NEWS               |  2 ++
 lib/netdev-linux.c | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

ChangeLog:
v3 - mentioned in the NEWS file.
v2 - drops are accounted.

Comments

Vishal Deep Ajmera Jan. 17, 2018, 12:43 p.m. UTC | #1
Hi Flavio,

I was testing your patch and comparing the stats of tap port for both ovs-master 
and your patch. The drop stats are now matching with master.

However I still see one more difference, in earlier case the "tx_packets" were 
also incremented along with "tx_dropped" when the tap port is DOWN. But with 
your patch "tx_packets" does not increment. Though I believe the new stats seem 
to be correct. Does it need to be documented ?

Otherwise the patch look ok to me.

Tested-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>

Warm Regards,
Vishal Ajmera
Flavio Leitner Jan. 17, 2018, 1:50 p.m. UTC | #2
On Wed, Jan 17, 2018 at 12:43:08PM +0000, Vishal Deep Ajmera wrote:
> Hi Flavio,

Hi Vishal,

 
> I was testing your patch and comparing the stats of tap port for both ovs-master 
> and your patch. The drop stats are now matching with master.

Thanks for doing that work, I appreciate it!

 
> However I still see one more difference, in earlier case the "tx_packets" were 
> also incremented along with "tx_dropped" when the tap port is DOWN. But with 
> your patch "tx_packets" does not increment. Though I believe the new stats seem 
> to be correct. Does it need to be documented ?

I wonder how to interpret that combination because either the packet
is transmitted or dropped, so I agree that now it seems to be correct.

You mean add more details to NEWS file explaining that change?


> Otherwise the patch look ok to me.
> 
> Tested-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>

Thanks!
Vishal Deep Ajmera Jan. 17, 2018, 4:51 p.m. UTC | #3
> You mean add more details to NEWS file explaining that change?

I am not much familiar with documentation part of ovs. It will help if
others can comment on where this change should be documented.
Ben Pfaff Jan. 17, 2018, 6:10 p.m. UTC | #4
On Wed, Jan 17, 2018 at 04:51:20PM +0000, Vishal Deep Ajmera wrote:
> > You mean add more details to NEWS file explaining that change?
> 
> I am not much familiar with documentation part of ovs. It will help if
> others can comment on where this change should be documented.

vswitchd/vswitch.xml has some type-specific documentation on most
interface types.  It doesn't currently have any for tap devices, but
that might be the most appropriate place.
Flavio Leitner Jan. 18, 2018, 12:11 a.m. UTC | #5
On Wed, Jan 17, 2018 at 10:10:17AM -0800, Ben Pfaff wrote:
> On Wed, Jan 17, 2018 at 04:51:20PM +0000, Vishal Deep Ajmera wrote:
> > > You mean add more details to NEWS file explaining that change?
> > 
> > I am not much familiar with documentation part of ovs. It will help if
> > others can comment on where this change should be documented.
> 
> vswitchd/vswitch.xml has some type-specific documentation on most
> interface types.  It doesn't currently have any for tap devices, but
> that might be the most appropriate place.

Good idea, I send out another patch updating vswitch.xml.
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343364.html
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index cb020d00d..545c64e63 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,8 @@  Post-v2.8.0
        connection tracking entry.
      * New "ct-set-maxconns", "ct-get-maxconns", and "ct-get-nconns" commands
        for userspace datapath.
+   - No longer send packets to the Linux TAP device if it's DOWN unless it is
+     in another networking namespace.
    - DPDK:
      * Add support for DPDK v17.11
      * Add support for vHost IOMMU
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 37143b8df..6dae7964a 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -502,6 +502,8 @@  struct netdev_linux {
 
     /* For devices of class netdev_tap_class only. */
     int tap_fd;
+    bool present;               /* If the device is present in the namespace */
+    uint64_t tx_dropped;        /* tap device can drop if the iface is down */
 };
 
 struct netdev_rxq_linux {
@@ -750,8 +752,10 @@  netdev_linux_update(struct netdev_linux *dev,
             dev->ifindex = change->if_index;
             dev->cache_valid |= VALID_IFINDEX;
             dev->get_ifindex_error = 0;
+            dev->present = true;
         } else {
             netdev_linux_changed(dev, change->ifi_flags, 0);
+            dev->present = false;
         }
     } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) {
         /* Invalidates in4, in6. */
@@ -1234,6 +1238,17 @@  netdev_linux_tap_batch_send(struct netdev *netdev_,
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
     struct dp_packet *packet;
+
+    /* The Linux tap driver returns EIO if the device is not up,
+     * so if the device is not up, don't waste time sending it.
+     * However, if the device is in another network namespace
+     * then OVS can't retrieve the state. In that case, send the
+     * packets anyway. */
+    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
+        netdev->tx_dropped += dp_packet_batch_size(batch);
+        return 0;
+    }
+
     DP_PACKET_BATCH_FOR_EACH (packet, batch) {
         size_t size = dp_packet_size(packet);
         ssize_t retval;
@@ -1825,6 +1840,7 @@  netdev_tap_get_stats(const struct netdev *netdev_, struct netdev_stats *stats)
         stats->multicast           += dev_stats.multicast;
         stats->collisions          += dev_stats.collisions;
     }
+    stats->tx_dropped += netdev->tx_dropped;
     ovs_mutex_unlock(&netdev->mutex);
 
     return error;