mbox series

[ovs-dev,v3,0/6] Add minimum network namespace support.

Message ID 20171207022224.25215-1-fbl@redhat.com
Headers show
Series Add minimum network namespace support. | expand

Message

Flavio Leitner Dec. 7, 2017, 2:22 a.m. UTC
Today Open vSwitch doesn't know about network namespaces (netns), but
users are moving internal ports to other namespaces.  Although packets
are still flowing, the daemon fails to find out basic port information,
like if it is UP or DOWN, for instance.

This patchset rely on a new kernel vport API recently accepted to find
out the new network namespace ID of a bridge's port. This information
along with the port's name recorded in the database is used to match the
corresponding netlink messages.

This patchset also leverages another kernel API that allows the daemon
to listen to all netlink messages from all netns which has an ID assigned
into it.  This and the previous change allows the userspace to track ports
in other network namespaces.

If any of the APIs aren't available, it falls back to the older APIs to
not break backwards compatibility.


ChangeLog:

* V3:
  - Fixed long line (Greg)
  - Rewrote assuming that the kernel will not send negative
    numbers as valid network namespace id. (Ben, Flavio, Jiri)

* V2:
  - report and close unexpected file descriptors (Ben)

Flavio Leitner (6):
  netlink: provide network namespace id from a msg.
  netnsid: update device only if netnsid matches.
  netdev-linux: use netlink to update netdev.
  netlink linux: enable listening to all nsids
  nlmon: added netns support.
  netdev-linux: fail ops not supporting remote netns.

 configure.ac                                      |   3 +-
 datapath/linux/compat/include/linux/openvswitch.h |   2 +
 lib/automake.mk                                   |   1 +
 lib/daemon-unix.c                                 |   3 +-
 lib/daemon.man                                    |   6 +-
 lib/daemon.xml                                    |   8 +-
 lib/dpif-netlink.c                                |  14 +-
 lib/dpif-netlink.h                                |   1 +
 lib/netdev-linux.c                                | 312 ++++++++++++++++++++--
 lib/netlink-notifier.c                            |   2 +-
 lib/netlink-protocol.h                            |   6 +
 lib/netlink-socket.c                              |  80 +++++-
 lib/netlink-socket.h                              |   4 +-
 lib/netnsid.h                                     | 139 ++++++++++
 tests/ofproto-macros.at                           |   1 +
 tests/ovn-controller-vtep.at                      |   1 +
 utilities/nlmon.c                                 |  10 +-
 17 files changed, 549 insertions(+), 44 deletions(-)
 create mode 100644 lib/netnsid.h

Comments

Flavio Leitner Jan. 5, 2018, 1:05 p.m. UTC | #1
Hi,

I know this change is not easy to review, but at the same time
we are getting close to the release date.  I am afraid that
delaying the acceptance will reduce its exposure and testing.

Please let me know the questions you might have to help you
reviewing this patchset.

Thanks,
fbl



On Thu, Dec 07, 2017 at 12:22:18AM -0200, Flavio Leitner wrote:
> Today Open vSwitch doesn't know about network namespaces (netns), but
> users are moving internal ports to other namespaces.  Although packets
> are still flowing, the daemon fails to find out basic port information,
> like if it is UP or DOWN, for instance.
> 
> This patchset rely on a new kernel vport API recently accepted to find
> out the new network namespace ID of a bridge's port. This information
> along with the port's name recorded in the database is used to match the
> corresponding netlink messages.
> 
> This patchset also leverages another kernel API that allows the daemon
> to listen to all netlink messages from all netns which has an ID assigned
> into it.  This and the previous change allows the userspace to track ports
> in other network namespaces.
> 
> If any of the APIs aren't available, it falls back to the older APIs to
> not break backwards compatibility.
> 
> 
> ChangeLog:
> 
> * V3:
>   - Fixed long line (Greg)
>   - Rewrote assuming that the kernel will not send negative
>     numbers as valid network namespace id. (Ben, Flavio, Jiri)
> 
> * V2:
>   - report and close unexpected file descriptors (Ben)
> 
> Flavio Leitner (6):
>   netlink: provide network namespace id from a msg.
>   netnsid: update device only if netnsid matches.
>   netdev-linux: use netlink to update netdev.
>   netlink linux: enable listening to all nsids
>   nlmon: added netns support.
>   netdev-linux: fail ops not supporting remote netns.
> 
>  configure.ac                                      |   3 +-
>  datapath/linux/compat/include/linux/openvswitch.h |   2 +
>  lib/automake.mk                                   |   1 +
>  lib/daemon-unix.c                                 |   3 +-
>  lib/daemon.man                                    |   6 +-
>  lib/daemon.xml                                    |   8 +-
>  lib/dpif-netlink.c                                |  14 +-
>  lib/dpif-netlink.h                                |   1 +
>  lib/netdev-linux.c                                | 312 ++++++++++++++++++++--
>  lib/netlink-notifier.c                            |   2 +-
>  lib/netlink-protocol.h                            |   6 +
>  lib/netlink-socket.c                              |  80 +++++-
>  lib/netlink-socket.h                              |   4 +-
>  lib/netnsid.h                                     | 139 ++++++++++
>  tests/ofproto-macros.at                           |   1 +
>  tests/ovn-controller-vtep.at                      |   1 +
>  utilities/nlmon.c                                 |  10 +-
>  17 files changed, 549 insertions(+), 44 deletions(-)
>  create mode 100644 lib/netnsid.h
> 
> -- 
> 2.14.3
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ben Pfaff Jan. 11, 2018, 12:07 a.m. UTC | #2
Thanks for the series.  I actually think that it's pretty close.  For
me, this series falls into the category of "obviously the right
direction but impossible to fully validate before applying it".  It
builds fine and I was about to push it when I realized that it breaks
most of the unit tests with failures like:

    --- /dev/null   2017-07-26 15:46:07.674034656 -0700
    +++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/3/stdout  2018-01-10 16:03:35.309361001 -0800
    @@ -0,0 +1 @@
    +2018-01-11T00:03:35Z|00007|dpif_netlink|WARN|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.

The unit tests are supposed to work without the kernel module loaded (I
run them on a system that doesn't even have the module).  Previously
they didn't even try to interact with the kernel module and now clearly
they do.  Can you figure out how that changed?  It would be best to fix
it.

Probably, this series should add something to NEWS and possibly to other
documentation that describes the new feature.

Thanks,

Ben.

On Fri, Jan 05, 2018 at 11:05:05AM -0200, Flavio Leitner wrote:
> 
> Hi,
> 
> I know this change is not easy to review, but at the same time
> we are getting close to the release date.  I am afraid that
> delaying the acceptance will reduce its exposure and testing.
> 
> Please let me know the questions you might have to help you
> reviewing this patchset.
> 
> Thanks,
> fbl
> 
> 
> 
> On Thu, Dec 07, 2017 at 12:22:18AM -0200, Flavio Leitner wrote:
> > Today Open vSwitch doesn't know about network namespaces (netns), but
> > users are moving internal ports to other namespaces.  Although packets
> > are still flowing, the daemon fails to find out basic port information,
> > like if it is UP or DOWN, for instance.
> > 
> > This patchset rely on a new kernel vport API recently accepted to find
> > out the new network namespace ID of a bridge's port. This information
> > along with the port's name recorded in the database is used to match the
> > corresponding netlink messages.
> > 
> > This patchset also leverages another kernel API that allows the daemon
> > to listen to all netlink messages from all netns which has an ID assigned
> > into it.  This and the previous change allows the userspace to track ports
> > in other network namespaces.
> > 
> > If any of the APIs aren't available, it falls back to the older APIs to
> > not break backwards compatibility.
> > 
> > 
> > ChangeLog:
> > 
> > * V3:
> >   - Fixed long line (Greg)
> >   - Rewrote assuming that the kernel will not send negative
> >     numbers as valid network namespace id. (Ben, Flavio, Jiri)
> > 
> > * V2:
> >   - report and close unexpected file descriptors (Ben)
> > 
> > Flavio Leitner (6):
> >   netlink: provide network namespace id from a msg.
> >   netnsid: update device only if netnsid matches.
> >   netdev-linux: use netlink to update netdev.
> >   netlink linux: enable listening to all nsids
> >   nlmon: added netns support.
> >   netdev-linux: fail ops not supporting remote netns.
> > 
> >  configure.ac                                      |   3 +-
> >  datapath/linux/compat/include/linux/openvswitch.h |   2 +
> >  lib/automake.mk                                   |   1 +
> >  lib/daemon-unix.c                                 |   3 +-
> >  lib/daemon.man                                    |   6 +-
> >  lib/daemon.xml                                    |   8 +-
> >  lib/dpif-netlink.c                                |  14 +-
> >  lib/dpif-netlink.h                                |   1 +
> >  lib/netdev-linux.c                                | 312 ++++++++++++++++++++--
> >  lib/netlink-notifier.c                            |   2 +-
> >  lib/netlink-protocol.h                            |   6 +
> >  lib/netlink-socket.c                              |  80 +++++-
> >  lib/netlink-socket.h                              |   4 +-
> >  lib/netnsid.h                                     | 139 ++++++++++
> >  tests/ofproto-macros.at                           |   1 +
> >  tests/ovn-controller-vtep.at                      |   1 +
> >  utilities/nlmon.c                                 |  10 +-
> >  17 files changed, 549 insertions(+), 44 deletions(-)
> >  create mode 100644 lib/netnsid.h
> > 
> > -- 
> > 2.14.3
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> -- 
> Flavio
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Flavio Leitner Jan. 11, 2018, 1:01 p.m. UTC | #3
On Wed, Jan 10, 2018 at 04:07:38PM -0800, Ben Pfaff wrote:
> Thanks for the series.  I actually think that it's pretty close.  For
> me, this series falls into the category of "obviously the right
> direction but impossible to fully validate before applying it".  It

I have the same feeling and that's why I sent the other email
asking to avoid leaving this to the last minute, so thanks :-)


> builds fine and I was about to push it when I realized that it breaks
> most of the unit tests with failures like:
> 
>     --- /dev/null   2017-07-26 15:46:07.674034656 -0700
>     +++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/3/stdout  2018-01-10 16:03:35.309361001 -0800
>     @@ -0,0 +1 @@
>     +2018-01-11T00:03:35Z|00007|dpif_netlink|WARN|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.
> 
> The unit tests are supposed to work without the kernel module loaded (I
> run them on a system that doesn't even have the module).  Previously
> they didn't even try to interact with the kernel module and now clearly
> they do.  Can you figure out how that changed?  It would be best to fix
> it.

Hm, I guess I always had the openvswitch module loaded because all
unit tests were passing.  I will have a look without the module.


> Probably, this series should add something to NEWS and possibly to other
> documentation that describes the new feature.

Sounds good.
Thanks,
fbl

> 
> Thanks,
> 
> Ben.
> 
> On Fri, Jan 05, 2018 at 11:05:05AM -0200, Flavio Leitner wrote:
> > 
> > Hi,
> > 
> > I know this change is not easy to review, but at the same time
> > we are getting close to the release date.  I am afraid that
> > delaying the acceptance will reduce its exposure and testing.
> > 
> > Please let me know the questions you might have to help you
> > reviewing this patchset.
> > 
> > Thanks,
> > fbl
> > 
> > 
> > 
> > On Thu, Dec 07, 2017 at 12:22:18AM -0200, Flavio Leitner wrote:
> > > Today Open vSwitch doesn't know about network namespaces (netns), but
> > > users are moving internal ports to other namespaces.  Although packets
> > > are still flowing, the daemon fails to find out basic port information,
> > > like if it is UP or DOWN, for instance.
> > > 
> > > This patchset rely on a new kernel vport API recently accepted to find
> > > out the new network namespace ID of a bridge's port. This information
> > > along with the port's name recorded in the database is used to match the
> > > corresponding netlink messages.
> > > 
> > > This patchset also leverages another kernel API that allows the daemon
> > > to listen to all netlink messages from all netns which has an ID assigned
> > > into it.  This and the previous change allows the userspace to track ports
> > > in other network namespaces.
> > > 
> > > If any of the APIs aren't available, it falls back to the older APIs to
> > > not break backwards compatibility.
> > > 
> > > 
> > > ChangeLog:
> > > 
> > > * V3:
> > >   - Fixed long line (Greg)
> > >   - Rewrote assuming that the kernel will not send negative
> > >     numbers as valid network namespace id. (Ben, Flavio, Jiri)
> > > 
> > > * V2:
> > >   - report and close unexpected file descriptors (Ben)
> > > 
> > > Flavio Leitner (6):
> > >   netlink: provide network namespace id from a msg.
> > >   netnsid: update device only if netnsid matches.
> > >   netdev-linux: use netlink to update netdev.
> > >   netlink linux: enable listening to all nsids
> > >   nlmon: added netns support.
> > >   netdev-linux: fail ops not supporting remote netns.
> > > 
> > >  configure.ac                                      |   3 +-
> > >  datapath/linux/compat/include/linux/openvswitch.h |   2 +
> > >  lib/automake.mk                                   |   1 +
> > >  lib/daemon-unix.c                                 |   3 +-
> > >  lib/daemon.man                                    |   6 +-
> > >  lib/daemon.xml                                    |   8 +-
> > >  lib/dpif-netlink.c                                |  14 +-
> > >  lib/dpif-netlink.h                                |   1 +
> > >  lib/netdev-linux.c                                | 312 ++++++++++++++++++++--
> > >  lib/netlink-notifier.c                            |   2 +-
> > >  lib/netlink-protocol.h                            |   6 +
> > >  lib/netlink-socket.c                              |  80 +++++-
> > >  lib/netlink-socket.h                              |   4 +-
> > >  lib/netnsid.h                                     | 139 ++++++++++
> > >  tests/ofproto-macros.at                           |   1 +
> > >  tests/ovn-controller-vtep.at                      |   1 +
> > >  utilities/nlmon.c                                 |  10 +-
> > >  17 files changed, 549 insertions(+), 44 deletions(-)
> > >  create mode 100644 lib/netnsid.h
> > > 
> > > -- 
> > > 2.14.3
> > > 
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > 
> > -- 
> > Flavio
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Flavio Leitner Jan. 26, 2018, 6:23 p.m. UTC | #4
Hi Ben,


On Wed, Jan 10, 2018 at 04:07:38PM -0800, Ben Pfaff wrote:
> Thanks for the series.  I actually think that it's pretty close.  For
> me, this series falls into the category of "obviously the right
> direction but impossible to fully validate before applying it".  It
> builds fine and I was about to push it when I realized that it breaks
> most of the unit tests with failures like:
> 
>     --- /dev/null   2017-07-26 15:46:07.674034656 -0700
>     +++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/3/stdout  2018-01-10 16:03:35.309361001 -0800
>     @@ -0,0 +1 @@
>     +2018-01-11T00:03:35Z|00007|dpif_netlink|WARN|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.
> 
> The unit tests are supposed to work without the kernel module loaded (I
> run them on a system that doesn't even have the module).  Previously
> they didn't even try to interact with the kernel module and now clearly
> they do.  Can you figure out how that changed?  It would be best to fix
> it.


When bridge is running, it will try to initialize the routing table
and for that it will try get interface's flags (lo, NICs), or the addr
list.  Both interfaces are now using netlink messages to find out if the
interface is in the local network namespace or not[1] and that's when the
netlink is initialized. If you don't have the module, the warn message
is printed.

This patchset relies on netlink messages anyways to keep track of the
interfaces, so it is preferred. It falls back to the old interfaces
only if the kernel is not recent enough though.

One possible way to fix this is to try the old interface first and fall
back to netlink only if that fails.  That's not ideal because we would
need to keep track of return codes indicating the device is not in the
local network namespace, or blindly try it again.  Another thing to
consider is that we might be able to remove the old interface and only
use netlink in the future to simplify the code.

It doesn't sound like a good idea to remove that message as it has value.

I can fix the testsuite vswitchd initialization to ignore that message,
but after each test is completed, the testsuite checks for WARN messages
and fails if there is one in the logs.

Ben, any thoughts? I don't know how bad it is to add a dependency to
openvswitch module to decide whether the patchset must address that or
if it's okay to fix the testsuite.


[1] 
netdev_linux_netnsid_is_remote
 +- netdev_linux_netnsid_update__
  +- dpif_netlink_vport_get
    (this uses openvswitch's  vport get cmd)

fbl
Ben Pfaff Feb. 1, 2018, 10:42 p.m. UTC | #5
On Fri, Jan 26, 2018 at 04:23:44PM -0200, Flavio Leitner wrote:
> Hi Ben,
> 
> 
> On Wed, Jan 10, 2018 at 04:07:38PM -0800, Ben Pfaff wrote:
> > Thanks for the series.  I actually think that it's pretty close.  For
> > me, this series falls into the category of "obviously the right
> > direction but impossible to fully validate before applying it".  It
> > builds fine and I was about to push it when I realized that it breaks
> > most of the unit tests with failures like:
> > 
> >     --- /dev/null   2017-07-26 15:46:07.674034656 -0700
> >     +++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/3/stdout  2018-01-10 16:03:35.309361001 -0800
> >     @@ -0,0 +1 @@
> >     +2018-01-11T00:03:35Z|00007|dpif_netlink|WARN|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.
> > 
> > The unit tests are supposed to work without the kernel module loaded (I
> > run them on a system that doesn't even have the module).  Previously
> > they didn't even try to interact with the kernel module and now clearly
> > they do.  Can you figure out how that changed?  It would be best to fix
> > it.
> 
> 
> When bridge is running, it will try to initialize the routing table
> and for that it will try get interface's flags (lo, NICs), or the addr
> list.  Both interfaces are now using netlink messages to find out if the
> interface is in the local network namespace or not[1] and that's when the
> netlink is initialized. If you don't have the module, the warn message
> is printed.
> 
> This patchset relies on netlink messages anyways to keep track of the
> interfaces, so it is preferred. It falls back to the old interfaces
> only if the kernel is not recent enough though.
> 
> One possible way to fix this is to try the old interface first and fall
> back to netlink only if that fails.  That's not ideal because we would
> need to keep track of return codes indicating the device is not in the
> local network namespace, or blindly try it again.  Another thing to
> consider is that we might be able to remove the old interface and only
> use netlink in the future to simplify the code.
> 
> It doesn't sound like a good idea to remove that message as it has value.
> 
> I can fix the testsuite vswitchd initialization to ignore that message,
> but after each test is completed, the testsuite checks for WARN messages
> and fails if there is one in the logs.
> 
> Ben, any thoughts? I don't know how bad it is to add a dependency to
> openvswitch module to decide whether the patchset must address that or
> if it's okay to fix the testsuite.
> 
> 
> [1] 
> netdev_linux_netnsid_is_remote
>  +- netdev_linux_netnsid_update__
>   +- dpif_netlink_vport_get
>     (this uses openvswitch's  vport get cmd)

I see the following possibilities here.

* Do something to keep the code from trying to load the module.  It
  sounds like this is difficult or undesirable.

* Remove the message.  It has value since when it shows up in logs I can
  tell the user that's the problem.

* Downgrade the message to INFO level.  Maybe that is the right solution
  since now it's going to appear even if the user doesn't really need
  it, making it not anything to really warn about anymore.

* Ignore this message, too, in check_logs in tests/ofproto-macros.at.

* Downgrade the message to DBG level at the current point it's issued,
  but then issue it at WARN level later in dpif_netlink_open() since
  that's the place where it's clear that the user is trying to use
  something that's unavailable.