Message ID | 20180110160643.3273-1-fbl@sysclose.org |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] netdev-linux: do not send packets to down tap ifaces. | expand |
On Wed, Jan 10, 2018 at 02:06:43PM -0200, Flavio Leitner wrote: > Today OVS pushes packets to the TAP interface ignoring its > current state. That works because the kernel will return -EIO > when it's not UP and OVS will just ignore that as it is not > an OVS issue. > > However, it causes a huge impact when broadcasts happen when > using userspace datapath accelerated with DPDK (e.g.: action > NORMAL). This patch improves the situation by checking the > TAP's interface state before issueing any syscall. > > However, there might be use-cases moving interfaces to other > networking namespaces and in that case, OVS can't retrieve > the iface state (sets it to DOWN). That would stop the traffic > breaking the use-case. This patch relies on netlink notifications > to find out if the device is local or not. When it's local, the > device state is checked otherwise it will behave as before. > > Signed-off-by: Flavio Leitner <fbl@sysclose.org> Can you help me understand the "huge impact"? What kind of impact? Thanks, Ben.
On Wed, Jan 10, 2018 at 03:09:30PM -0800, Ben Pfaff wrote: > On Wed, Jan 10, 2018 at 02:06:43PM -0200, Flavio Leitner wrote: > > Today OVS pushes packets to the TAP interface ignoring its > > current state. That works because the kernel will return -EIO > > when it's not UP and OVS will just ignore that as it is not > > an OVS issue. > > > > However, it causes a huge impact when broadcasts happen when > > using userspace datapath accelerated with DPDK (e.g.: action > > NORMAL). This patch improves the situation by checking the > > TAP's interface state before issueing any syscall. > > > > However, there might be use-cases moving interfaces to other > > networking namespaces and in that case, OVS can't retrieve > > the iface state (sets it to DOWN). That would stop the traffic > > breaking the use-case. This patch relies on netlink notifications > > to find out if the device is local or not. When it's local, the > > device state is checked otherwise it will behave as before. > > > > Signed-off-by: Flavio Leitner <fbl@sysclose.org> > > Can you help me understand the "huge impact"? What kind of impact? The PMD thread needs to be all time processing packets in order to provide maximum performance (no loss, less jitter and stable tput). However, it calls the write() syscall when sending packets to tap devices and then we lost control over the CPU. One exampe is that we need to warm up the fdb before starting the performance tests because the ARP resolution uses the syscall and causes issues right at the beginning. So, basically every time a broadcast happen for whatever reason, there is a high chance of stalling the PMD thread enough to cause packet loss, jitter or tput degradation. A real world example is OpenStack running with OVS-DPDK and serving multiple VMs through vhost-user ports. If any of the VMs triggers a broadcast, other VMs will notice the penalty.
Hi Flavio, Thank you for looking into this issue. It certainly improves the situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state. However, I see an issue with the patch. It does not handle device statistics. Earlier even when tap-port was DOWN, we would make a write system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ? Warm Regards, Vishal Ajmera -----Original Message----- From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner Sent: Wednesday, January 10, 2018 9:37 PM To: dev@openvswitch.org Cc: Flavio Leitner <fbl@sysclose.org> Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. Today OVS pushes packets to the TAP interface ignoring its current state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue. However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL). This patch improves the situation by checking the TAP's interface state before issueing any syscall. However, there might be use-cases moving interfaces to other networking namespaces and in that case, OVS can't retrieve the iface state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before. Signed-off-by: Flavio Leitner <fbl@sysclose.org> --- lib/netdev-linux.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index ccb6def6c..53b08bebd 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -502,6 +502,7 @@ struct netdev_linux { /* For devices of class netdev_tap_class only. */ int tap_fd; + bool present; /* If the device is present in the namespace */ }; struct netdev_rxq_linux { @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev, dev->ifindex = change->if_index; dev->cache_valid |= VALID_IFINDEX; dev->get_ifindex_error = 0; + dev->present = true; } else { netdev_linux_changed(dev, change->ifi_flags, 0); + dev->present = false; } } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) { /* Invalidates in4, in6. */ @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, { struct netdev_linux *netdev = netdev_linux_cast(netdev_); struct dp_packet *packet; + + /* The Linux tap driver returns EIO if the device is not up, + * so if the device is not up, don't waste time sending it. + * However, if the device is in another network namespace + * then OVS can't retrieve the state. In that case, send the + * packets anyway. */ + if (netdev->present && !(netdev->ifi_flags & IFF_UP)) { + return 0; + } + DP_PACKET_BATCH_FOR_EACH (packet, batch) { size_t size = dp_packet_size(packet); ssize_t retval; -- 2.14.3
Hi Vishal, I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right? BR, Jan > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep Ajmera > Sent: Monday, 15 January, 2018 11:27 > To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > Hi Flavio, > > Thank you for looking into this issue. It certainly improves the situation for broadcast frames when we have multiple tap ports on the > bridge which are in DOWN state. > > However, I see an issue with the patch. It does not handle device statistics. Earlier even when tap-port was DOWN, we would make a write > system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets > to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ? > > Warm Regards, > Vishal Ajmera > > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner > Sent: Wednesday, January 10, 2018 9:37 PM > To: dev@openvswitch.org > Cc: Flavio Leitner <fbl@sysclose.org> > Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > Today OVS pushes packets to the TAP interface ignoring its current state. That works because the kernel will return -EIO when it's not UP > and OVS will just ignore that as it is not an OVS issue. > > However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL). > This patch improves the situation by checking the TAP's interface state before issueing any syscall. > > However, there might be use-cases moving interfaces to other networking namespaces and in that case, OVS can't retrieve the iface state > (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is > local or not. When it's local, the device state is checked otherwise it will behave as before. > > Signed-off-by: Flavio Leitner <fbl@sysclose.org> > --- > lib/netdev-linux.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index ccb6def6c..53b08bebd 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -502,6 +502,7 @@ struct netdev_linux { > > /* For devices of class netdev_tap_class only. */ > int tap_fd; > + bool present; /* If the device is present in the namespace */ > }; > > struct netdev_rxq_linux { > @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev, > dev->ifindex = change->if_index; > dev->cache_valid |= VALID_IFINDEX; > dev->get_ifindex_error = 0; > + dev->present = true; > } else { > netdev_linux_changed(dev, change->ifi_flags, 0); > + dev->present = false; > } > } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) { > /* Invalidates in4, in6. */ > @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, { > struct netdev_linux *netdev = netdev_linux_cast(netdev_); > struct dp_packet *packet; > + > + /* The Linux tap driver returns EIO if the device is not up, > + * so if the device is not up, don't waste time sending it. > + * However, if the device is in another network namespace > + * then OVS can't retrieve the state. In that case, send the > + * packets anyway. */ > + if (netdev->present && !(netdev->ifi_flags & IFF_UP)) { > + return 0; > + } > + > DP_PACKET_BATCH_FOR_EACH (packet, batch) { > size_t size = dp_packet_size(packet); > ssize_t retval; > -- > 2.14.3 > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Yes Jan. I believe it should be counted as tx-dropped on the tap interface. Warm Regards, Vishal Ajmera -----Original Message----- From: Jan Scheurich Sent: Monday, January 15, 2018 5:56 PM To: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>; Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org Cc: Rohith Basavaraja <rohith.basavaraja@ericsson.com> Subject: RE: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. Hi Vishal, I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right? BR, Jan > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep > Ajmera > Sent: Monday, 15 January, 2018 11:27 > To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > Hi Flavio, > > Thank you for looking into this issue. It certainly improves the > situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state. > > However, I see an issue with the patch. It does not handle device > statistics. Earlier even when tap-port was DOWN, we would make a write > system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ? > > Warm Regards, > Vishal Ajmera > > -----Original Message----- > From: ovs-dev-bounces@openvswitch.org > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner > Sent: Wednesday, January 10, 2018 9:37 PM > To: dev@openvswitch.org > Cc: Flavio Leitner <fbl@sysclose.org> > Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > Today OVS pushes packets to the TAP interface ignoring its current > state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue. > > However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL). > This patch improves the situation by checking the TAP's interface state before issueing any syscall. > > However, there might be use-cases moving interfaces to other > networking namespaces and in that case, OVS can't retrieve the iface > state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before. > > Signed-off-by: Flavio Leitner <fbl@sysclose.org> > --- > lib/netdev-linux.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index > ccb6def6c..53b08bebd 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -502,6 +502,7 @@ struct netdev_linux { > > /* For devices of class netdev_tap_class only. */ > int tap_fd; > + bool present; /* If the device is present in the namespace */ > }; > > struct netdev_rxq_linux { > @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev, > dev->ifindex = change->if_index; > dev->cache_valid |= VALID_IFINDEX; > dev->get_ifindex_error = 0; > + dev->present = true; > } else { > netdev_linux_changed(dev, change->ifi_flags, 0); > + dev->present = false; > } > } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) { > /* Invalidates in4, in6. */ > @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, { > struct netdev_linux *netdev = netdev_linux_cast(netdev_); > struct dp_packet *packet; > + > + /* The Linux tap driver returns EIO if the device is not up, > + * so if the device is not up, don't waste time sending it. > + * However, if the device is in another network namespace > + * then OVS can't retrieve the state. In that case, send the > + * packets anyway. */ > + if (netdev->present && !(netdev->ifi_flags & IFF_UP)) { > + return 0; > + } > + > DP_PACKET_BATCH_FOR_EACH (packet, batch) { > size_t size = dp_packet_size(packet); > ssize_t retval; > -- > 2.14.3 > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Mon, Jan 15, 2018 at 03:12:49PM +0000, Vishal Deep Ajmera wrote: > Yes Jan. I believe it should be counted as tx-dropped on the tap interface. Sounds reasonable. I will have a look. fbl > > Warm Regards, > Vishal Ajmera > > -----Original Message----- > From: Jan Scheurich > Sent: Monday, January 15, 2018 5:56 PM > To: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>; Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org > Cc: Rohith Basavaraja <rohith.basavaraja@ericsson.com> > Subject: RE: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > Hi Vishal, > > I assume you want to count these packets as "tx dropped" on the tap interface ports in DOWN state, right? > > BR, Jan > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org > > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Vishal Deep > > Ajmera > > Sent: Monday, 15 January, 2018 11:27 > > To: Flavio Leitner <fbl@sysclose.org>; dev@openvswitch.org > > Subject: Re: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > > > Hi Flavio, > > > > Thank you for looking into this issue. It certainly improves the > > situation for broadcast frames when we have multiple tap ports on the bridge which are in DOWN state. > > > > However, I see an issue with the patch. It does not handle device > > statistics. Earlier even when tap-port was DOWN, we would make a write > > system-call and kernel drops the packets which also accounts it in device statistics (drop counter). But now, since we skip sending packets to kernel then we will have to adjust it in netdev-linux. Can you modify the patch accordingly ? > > > > Warm Regards, > > Vishal Ajmera > > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org > > [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Flavio Leitner > > Sent: Wednesday, January 10, 2018 9:37 PM > > To: dev@openvswitch.org > > Cc: Flavio Leitner <fbl@sysclose.org> > > Subject: [ovs-dev] [PATCH] netdev-linux: do not send packets to down tap ifaces. > > > > Today OVS pushes packets to the TAP interface ignoring its current > > state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue. > > > > However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL). > > This patch improves the situation by checking the TAP's interface state before issueing any syscall. > > > > However, there might be use-cases moving interfaces to other > > networking namespaces and in that case, OVS can't retrieve the iface > > state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before. > > > > Signed-off-by: Flavio Leitner <fbl@sysclose.org> > > --- > > lib/netdev-linux.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index > > ccb6def6c..53b08bebd 100644 > > --- a/lib/netdev-linux.c > > +++ b/lib/netdev-linux.c > > @@ -502,6 +502,7 @@ struct netdev_linux { > > > > /* For devices of class netdev_tap_class only. */ > > int tap_fd; > > + bool present; /* If the device is present in the namespace */ > > }; > > > > struct netdev_rxq_linux { > > @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev, > > dev->ifindex = change->if_index; > > dev->cache_valid |= VALID_IFINDEX; > > dev->get_ifindex_error = 0; > > + dev->present = true; > > } else { > > netdev_linux_changed(dev, change->ifi_flags, 0); > > + dev->present = false; > > } > > } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) { > > /* Invalidates in4, in6. */ > > @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, { > > struct netdev_linux *netdev = netdev_linux_cast(netdev_); > > struct dp_packet *packet; > > + > > + /* The Linux tap driver returns EIO if the device is not up, > > + * so if the device is not up, don't waste time sending it. > > + * However, if the device is in another network namespace > > + * then OVS can't retrieve the state. In that case, send the > > + * packets anyway. */ > > + if (netdev->present && !(netdev->ifi_flags & IFF_UP)) { > > + return 0; > > + } > > + > > DP_PACKET_BATCH_FOR_EACH (packet, batch) { > > size_t size = dp_packet_size(packet); > > ssize_t retval; > > -- > > 2.14.3 > > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index ccb6def6c..53b08bebd 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -502,6 +502,7 @@ struct netdev_linux { /* For devices of class netdev_tap_class only. */ int tap_fd; + bool present; /* If the device is present in the namespace */ }; struct netdev_rxq_linux { @@ -750,8 +751,10 @@ netdev_linux_update(struct netdev_linux *dev, dev->ifindex = change->if_index; dev->cache_valid |= VALID_IFINDEX; dev->get_ifindex_error = 0; + dev->present = true; } else { netdev_linux_changed(dev, change->ifi_flags, 0); + dev->present = false; } } else if (rtnetlink_type_is_rtnlgrp_addr(change->nlmsg_type)) { /* Invalidates in4, in6. */ @@ -1234,6 +1237,16 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, { struct netdev_linux *netdev = netdev_linux_cast(netdev_); struct dp_packet *packet; + + /* The Linux tap driver returns EIO if the device is not up, + * so if the device is not up, don't waste time sending it. + * However, if the device is in another network namespace + * then OVS can't retrieve the state. In that case, send the + * packets anyway. */ + if (netdev->present && !(netdev->ifi_flags & IFF_UP)) { + return 0; + } + DP_PACKET_BATCH_FOR_EACH (packet, batch) { size_t size = dp_packet_size(packet); ssize_t retval;
Today OVS pushes packets to the TAP interface ignoring its current state. That works because the kernel will return -EIO when it's not UP and OVS will just ignore that as it is not an OVS issue. However, it causes a huge impact when broadcasts happen when using userspace datapath accelerated with DPDK (e.g.: action NORMAL). This patch improves the situation by checking the TAP's interface state before issueing any syscall. However, there might be use-cases moving interfaces to other networking namespaces and in that case, OVS can't retrieve the iface state (sets it to DOWN). That would stop the traffic breaking the use-case. This patch relies on netlink notifications to find out if the device is local or not. When it's local, the device state is checked otherwise it will behave as before. Signed-off-by: Flavio Leitner <fbl@sysclose.org> --- lib/netdev-linux.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)