diff mbox

[ovs-dev,RFC] create vxlan device using rtnetlink interface

Message ID 1460755859-4282-1-git-send-email-cascardo@redhat.com
State RFC
Headers show

Commit Message

Thadeu Lima de Souza Cascardo April 15, 2016, 9:30 p.m. UTC
Hi, this is an RFC patch (could probably be 2 patches - more below), that
creates VXLAN devices in userspace and adds them as netdev vports, instead of
using the vxlan vport code.

The reason for this change is that it has been mentioned in the past that some
of the vport code should be considered compat code, that is, it won't receive
new features or flags.

That means tunnel devices should be created in userspace and added as netdev
vports. Some problems are noticeable in this patch, and I have gone through some
iterations back and forth.

First is the creation of the device under a switch/case. I tried to make this a
netdev_class function, overriding the netdev class by dpif_port_open_type. That
required exporting a lot of the vport functions. I can still do it that way, if
that's preferrable, but this seems more simple.

The other problem is during port_del, that since we don't have a netdev, the
dpif_port type would not be vxlan, and deciding whether to remove it or not
could not be made. In fact, this is one of the main reasons why this is an RFC.

The solution here is to decide on the type by the device's name, and there is
even a function for this, though it looks up for the netdev's already created,
those that probably come from the database. However, when OVS starts, it removes
ports from the datapath, and that information is not available, and may not even
be. In fact, since the dpif_port has a different name because it's a vport, it
won't match any netdev at all. So, I did the opposite of getting the type from
the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
make this more strict and use the port, we could still be in conflict with a
vxlan device created by the user with such name. This should have been one patch
by itself.

Since we already have similar problems, like failing to port_add when there is a
device named vxlan_sys_<dstport> created by the user, one could argue that this
patch does not introduce a new problem.

Jiri has suggested that we require users to create the interfaces themselves, by
using whatever method their OS has, and add them as netdev devices. That would
still require fetching some of the configuration from the device in order to
make it properly work with flow-based tunnels. In fact, if we set the remote or
local IP addresses on those devices, this would require multiple interfaces
instead of only one just to be able to specify the same level of configuration
as ovsdb allows us to. And that still requires some proper changes in order to
support flow-based tunnels (calling tnl_port_add/tnl_port_del with proper
configuration, for example).

Another option instead of relying on the device names is to use ifalias. It's a
user settable name that is opaque to the kernel and tools like iproute and
net-tools.

Any opinions on these options or other comments on the patch?
---
 lib/dpif-netlink.c | 285 +++++++++++++++++++++++++++++++++++++++++++----------
 lib/netdev-vport.c |  47 ++++++---
 lib/netdev-vport.h |   1 +
 lib/netdev.c       |   7 ++
 4 files changed, 273 insertions(+), 67 deletions(-)

Comments

Jesse Gross April 16, 2016, 3:36 a.m. UTC | #1
On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
<cascardo@redhat.com> wrote:
> Hi, this is an RFC patch (could probably be 2 patches - more below), that
> creates VXLAN devices in userspace and adds them as netdev vports, instead of
> using the vxlan vport code.
>
> The reason for this change is that it has been mentioned in the past that some
> of the vport code should be considered compat code, that is, it won't receive
> new features or flags.

Thanks for looking at this. I agree that this is definitely a
necessary direction.

> First is the creation of the device under a switch/case. I tried to make this a
> netdev_class function, overriding the netdev class by dpif_port_open_type. That
> required exporting a lot of the vport functions. I can still do it that way, if
> that's preferrable, but this seems more simple.

Can you give some examples of what would need to be exported? It would
be nice to just set the open function for each type but if it is
really horrible we can live with the switch statement.

> The other problem is during port_del, that since we don't have a netdev, the
> dpif_port type would not be vxlan, and deciding whether to remove it or not
> could not be made. In fact, this is one of the main reasons why this is an RFC.
>
> The solution here is to decide on the type by the device's name, and there is
> even a function for this, though it looks up for the netdev's already created,
> those that probably come from the database. However, when OVS starts, it removes
> ports from the datapath, and that information is not available, and may not even
> be. In fact, since the dpif_port has a different name because it's a vport, it
> won't match any netdev at all. So, I did the opposite of getting the type from
> the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
> make this more strict and use the port, we could still be in conflict with a
> vxlan device created by the user with such name. This should have been one patch
> by itself.

What about using the driver name exposed through ethtool? I believe
that all of the tunnel and internal devices expose this in a
consistent way - i.e. a VXLAN device can be queried and get back the
string "vxlan". Any driver other than the ones that we recognize can
be assumed to be OVS_VPORT_TYPE_NETDEV.

> Jiri has suggested that we require users to create the interfaces themselves, by
> using whatever method their OS has, and add them as netdev devices. That would
> still require fetching some of the configuration from the device in order to
> make it properly work with flow-based tunnels. In fact, if we set the remote or
> local IP addresses on those devices, this would require multiple interfaces
> instead of only one just to be able to specify the same level of configuration
> as ovsdb allows us to. And that still requires some proper changes in order to
> support flow-based tunnels (calling tnl_port_add/tnl_port_del with proper
> configuration, for example).

I'm not too excited about this. It seems like it would be a regression
- currently OVSDB allows remote creation of tunnels, so it seems like
this would break existing setups if it also requires users to
explicitly create tunnel devices on the host ahead of time.

One comment on the patch itself - it's possible that the device that
is being created might not support all of the necessary options that
we are passing to it. For example, the original VXLAN driver as merged
into the kernel didn't support COLLECT_METADATA. We'll need to check
that the device that was created supports what we need and fallback to
the old model otherwise.

I presume that you will convert all of the vport types over once we
have settled on the right model?
Thadeu Lima de Souza Cascardo April 18, 2016, 9:57 a.m. UTC | #2
On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
> <cascardo@redhat.com> wrote:
> > Hi, this is an RFC patch (could probably be 2 patches - more below), that
> > creates VXLAN devices in userspace and adds them as netdev vports, instead of
> > using the vxlan vport code.
> >
> > The reason for this change is that it has been mentioned in the past that some
> > of the vport code should be considered compat code, that is, it won't receive
> > new features or flags.
> 
> Thanks for looking at this. I agree that this is definitely a
> necessary direction.
> 
> > First is the creation of the device under a switch/case. I tried to make this a
> > netdev_class function, overriding the netdev class by dpif_port_open_type. That
> > required exporting a lot of the vport functions. I can still do it that way, if
> > that's preferrable, but this seems more simple.
> 
> Can you give some examples of what would need to be exported? It would
> be nice to just set the open function for each type but if it is
> really horrible we can live with the switch statement.
> 

netdev_vport_{alloc, construct, destruct, dealloc}, get_tunnel_config,
set_tunnel_config, get_netdev_tunnel_config.

We could also do it the other way around, write this code in netdev-vport.c and
export one or two functions from netdev-linux or dpif-netlink, and the create
function per tunnel type.

> > The other problem is during port_del, that since we don't have a netdev, the
> > dpif_port type would not be vxlan, and deciding whether to remove it or not
> > could not be made. In fact, this is one of the main reasons why this is an RFC.
> >
> > The solution here is to decide on the type by the device's name, and there is
> > even a function for this, though it looks up for the netdev's already created,
> > those that probably come from the database. However, when OVS starts, it removes
> > ports from the datapath, and that information is not available, and may not even
> > be. In fact, since the dpif_port has a different name because it's a vport, it
> > won't match any netdev at all. So, I did the opposite of getting the type from
> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
> > make this more strict and use the port, we could still be in conflict with a
> > vxlan device created by the user with such name. This should have been one patch
> > by itself.
> 
> What about using the driver name exposed through ethtool? I believe
> that all of the tunnel and internal devices expose this in a
> consistent way - i.e. a VXLAN device can be queried and get back the
> string "vxlan". Any driver other than the ones that we recognize can
> be assumed to be OVS_VPORT_TYPE_NETDEV.
> 

I don't see how this address the case when the user adds a vxlan interface
created by the system. Like:

ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
ovs-vsctl add-port br0 vxlan_sys_4789

Its driver would be vxlan. That goes to vxlan0 too.

But... wait a minute. We don't support adding devices name as vxlan_sys*. Such
names are reserved. I think that means we could probably rely on the names.

> > Jiri has suggested that we require users to create the interfaces themselves, by
> > using whatever method their OS has, and add them as netdev devices. That would
> > still require fetching some of the configuration from the device in order to
> > make it properly work with flow-based tunnels. In fact, if we set the remote or
> > local IP addresses on those devices, this would require multiple interfaces
> > instead of only one just to be able to specify the same level of configuration
> > as ovsdb allows us to. And that still requires some proper changes in order to
> > support flow-based tunnels (calling tnl_port_add/tnl_port_del with proper
> > configuration, for example).
> 
> I'm not too excited about this. It seems like it would be a regression
> - currently OVSDB allows remote creation of tunnels, so it seems like
> this would break existing setups if it also requires users to
> explicitly create tunnel devices on the host ahead of time.

Agreed. That's a very good reason not to go this path. And realizing we already
reserve the names, I am more comfortable relying on only that.

> 
> One comment on the patch itself - it's possible that the device that
> is being created might not support all of the necessary options that
> we are passing to it. For example, the original VXLAN driver as merged
> into the kernel didn't support COLLECT_METADATA. We'll need to check
> that the device that was created supports what we need and fallback to
> the old model otherwise.
> 

That should be taken care of. We would get EINVAL, and that's why I return
EOPNOTSUPP if that's the case. Then, the code falls back to the compat mode.

> I presume that you will convert all of the vport types over once we
> have settled on the right model?

Sure.

Thanks for the review and help.
Cascardo.

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
Jiri Benc April 18, 2016, 10:15 a.m. UTC | #3
On Fri, 15 Apr 2016 18:30:59 -0300, Thadeu Lima de Souza Cascardo wrote:
> Jiri has suggested that we require users to create the interfaces themselves, by
> using whatever method their OS has, and add them as netdev devices. That would
> still require fetching some of the configuration from the device in order to
> make it properly work with flow-based tunnels. In fact, if we set the remote or
> local IP addresses on those devices, this would require multiple interfaces
> instead of only one just to be able to specify the same level of configuration
> as ovsdb allows us to.

No need for multiple interfaces. We have 'external' parameter for 'ip
link add' that will create the interface in metada mode.

The reason I'm suggesting this is it solves the problem with cleanup in
unexpected conditions (What happens with the interface automatically
created by ovs when ovs crashes? What happens if user creates an
interface named vxlan_sys_*? Relaying on a particular interface name
doesn't look like a good design to me.) and it allows easy support of
future vxlan features without modifying ovs.

This would require querying the kernel whether the interface is in
metadata mode on vport addition, sure. But that's very easy to do and
I think it's the only configuration parameter that we need to know in
ovs.

The existing code that allows adding the ports through the compat
genetlink interface would stay, of course, for backwards compatibility.
To use new vxlan feature, the new way of setup would be needed.

 Jiri
Jiri Benc April 18, 2016, 10:27 a.m. UTC | #4
On Fri, 15 Apr 2016 20:36:51 -0700, Jesse Gross wrote:
> What about using the driver name exposed through ethtool? I believe
> that all of the tunnel and internal devices expose this in a
> consistent way - i.e. a VXLAN device can be queried and get back the
> string "vxlan". Any driver other than the ones that we recognize can
> be assumed to be OVS_VPORT_TYPE_NETDEV.

Or netlink. Though ethtool is more generic in this case, as with
netlink, every interface type would need to be queried differently.

> I'm not too excited about this. It seems like it would be a regression
> - currently OVSDB allows remote creation of tunnels, so it seems like
> this would break existing setups if it also requires users to
> explicitly create tunnel devices on the host ahead of time.

It wouldn't break the existing setups, the current code is not going
away.

However, it wouldn't allow remote creation of tunnels with new features
(like VXLAN-GPE). I don't think it's that bad - we don't support remote
creation of e.g. veth pairs, yet it's common to have them.

> One comment on the patch itself - it's possible that the device that
> is being created might not support all of the necessary options that
> we are passing to it. For example, the original VXLAN driver as merged
> into the kernel didn't support COLLECT_METADATA. We'll need to check
> that the device that was created supports what we need and fallback to
> the old model otherwise.

Yes. Unfortunately, the only way to find out is to fetch back the
config from the created interface and if it doesn't match, delete the
interface. And fall back to the compat code but *only if* there are no
advanced features specified. What the "advanced features" are will need
to be hardcoded in ovs.

What I'm proposing makes this much simpler. No need for verifying the
interface was created with all the wanted flags, no need for knowing
what features are supported by the compat code, etc. Just take whatever
was given to ovs by the user and use it.

 Jiri
Jiri Benc April 18, 2016, 10:36 a.m. UTC | #5
On Mon, 18 Apr 2016 06:57:22 -0300, Thadeu Lima de Souza Cascardo wrote:
> I don't see how this address the case when the user adds a vxlan interface
> created by the system. Like:
> 
> ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
> ovs-vsctl add-port br0 vxlan_sys_4789
> 
> Its driver would be vxlan. That goes to vxlan0 too.

Yes but that's a fundamental problem that cannot be solved. There's no
way an interface created by the user can be *reliably* distinguished
from an interface automatically created by ovs, whatever approach you
choose. Anything that ovs does with the interface, the user can do,
too.

The only exception is adding the interface to ovs bridge - that can't be
done with 'ip'. Thus, if you disallow adding of vxlan interfaces (i.e.
'ovs-vsctl add-port br0 iface' where 'iface' is an already created
interface that is of vxlan type), then and only then you can be sure
that the interfaces connected to the kernel datapath were automatically
created by ovs.

> But... wait a minute. We don't support adding devices name as vxlan_sys*. Such
> names are reserved. I think that means we could probably rely on the names.

They are not. You can easily create an interface named vxlan_sys_4789
by 'ip link add'.

> That should be taken care of. We would get EINVAL, and that's why I return
> EOPNOTSUPP if that's the case. Then, the code falls back to the compat mode.

We would not get EINVAL. The interface will be created, netlink will
return success but the interface won't be in the metadata mode.

The same applies to all other configuration, e.g. VXLAN-GPE. The
netlink command will succeed, the interface will be created but it
won't be GPE.

It sucks completely, it's a big gap in netlink design but all my
attempts to solve this were rejected upstream.

 Jiri
Thadeu Lima de Souza Cascardo April 18, 2016, 12:08 p.m. UTC | #6
On Mon, Apr 18, 2016 at 12:36:31PM +0200, Jiri Benc wrote:
> On Mon, 18 Apr 2016 06:57:22 -0300, Thadeu Lima de Souza Cascardo wrote:
> > But... wait a minute. We don't support adding devices name as vxlan_sys*. Such
> > names are reserved. I think that means we could probably rely on the names.
> 
> They are not. You can easily create an interface named vxlan_sys_4789
> by 'ip link add'.
> 

I mean by using ovs-vsctl add-port. At vswitchd/bridge.c:iface_do_create,

    if (netdev_is_reserved_name(iface_cfg->name)) {
        VLOG_WARN("could not create interface %s, name is reserved",
                  iface_cfg->name);
        error = EINVAL;
        goto error;
    }

netdev_is_reserved_name checks for names starting with the dpif_port prefix from
vports. That includes vxlan_sys.

[root@devconf2 ~]# ovs-vsctl add-port br0 vxlan_sys_8888
ovs-vsctl: Error detected while setting up 'vxlan_sys_8888'.  See ovs-vswitchd log for details.
[root@devconf2 ~]# tail /var/log/openvswitch/ovs-vswitchd.log
2016-04-18T07:34:01.422Z|00032|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2016-04-18T09:30:34.142Z|00033|bridge|WARN|could not create interface vxlan_sys_8888, name is reserved

> > That should be taken care of. We would get EINVAL, and that's why I return
> > EOPNOTSUPP if that's the case. Then, the code falls back to the compat mode.
> 
> We would not get EINVAL. The interface will be created, netlink will
> return success but the interface won't be in the metadata mode.
> 
> The same applies to all other configuration, e.g. VXLAN-GPE. The
> netlink command will succeed, the interface will be created but it
> won't be GPE.
> 
> It sucks completely, it's a big gap in netlink design but all my
> attempts to solve this were rejected upstream.

I will test that and add the verification code.

Thanks.
Cascardo.

> 
>  Jiri
Jesse Gross April 18, 2016, 6:30 p.m. UTC | #7
On Mon, Apr 18, 2016 at 3:27 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Fri, 15 Apr 2016 20:36:51 -0700, Jesse Gross wrote:
>> I'm not too excited about this. It seems like it would be a regression
>> - currently OVSDB allows remote creation of tunnels, so it seems like
>> this would break existing setups if it also requires users to
>> explicitly create tunnel devices on the host ahead of time.
>
> It wouldn't break the existing setups, the current code is not going
> away.
>
> However, it wouldn't allow remote creation of tunnels with new features
> (like VXLAN-GPE). I don't think it's that bad - we don't support remote
> creation of e.g. veth pairs, yet it's common to have them.

I don't really want to have two separate interfaces to create tunnels.
I think this would mean that you would need to use OVSDB to configure
tunnels on older kernels (which would then use the kernel compat code)
and need a script to create tunnel interfaces and add them to OVS on
new kernels. (Even on new kernels I don't know how this would interact
with OVS tunnel ports that are not of type 'flow'. Would they also use
the compat code?) This seems really hard to use in practice and
exports many implementation details across the network.

Even if we didn't have to worry about any legacy code, I'm not
convinced that taking tunnel configuration out of OVSDB is the right
thing from the user's perspective. Much of the value of OVS is around
programmatic and central control so we should allow that where
possible. For example, if a controller starts using a different
encapsulation type in a new version or for a different situation, it
seems undesirable to require the user to change things on each
machine. With regards to veths, in many cases they are used to stitch
together different combinations of OVS and bridging, which is
something that is itself a pain point and something that I hope will
be less necessary in the future.
Jiri Benc April 19, 2016, 5 p.m. UTC | #8
On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> (Even on new kernels I don't know how this would interact
> with OVS tunnel ports that are not of type 'flow'. Would they also use
> the compat code?)

This is a very good point. Although, on the other hand, how much common
are tunnel ports with fixed parameters nowadays?

> Even if we didn't have to worry about any legacy code, I'm not
> convinced that taking tunnel configuration out of OVSDB is the right
> thing from the user's perspective. Much of the value of OVS is around
> programmatic and central control so we should allow that where
> possible. For example, if a controller starts using a different
> encapsulation type in a new version or for a different situation, it
> seems undesirable to require the user to change things on each
> machine.

But tunnels can't be completely set up from a controller even now. At
minimum, an (outer) IP address has to be assigned on each machine. Of
course, there are cases in which an existing, already set up network is
used as the underlay. If the main target is such use cases, then indeed,
this could be perceived as usability regression.

> With regards to veths, in many cases they are used to stitch
> together different combinations of OVS and bridging, which is
> something that is itself a pain point and something that I hope will
> be less necessary in the future.

Fair enough.

 Jiri
Ben Pfaff April 19, 2016, 5:05 p.m. UTC | #9
On Tue, Apr 19, 2016 at 07:00:00PM +0200, Jiri Benc wrote:
> On Mon, 18 Apr 2016 11:30:43 -0700, Jesse Gross wrote:
> > (Even on new kernels I don't know how this would interact
> > with OVS tunnel ports that are not of type 'flow'. Would they also use
> > the compat code?)
> 
> This is a very good point. Although, on the other hand, how much common
> are tunnel ports with fixed parameters nowadays?

OVN and NSX use them.
Jesse Gross April 19, 2016, 11:25 p.m. UTC | #10
On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo
<cascardo@redhat.com> wrote:
> On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
>> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
>> <cascardo@redhat.com> wrote:
>> > Hi, this is an RFC patch (could probably be 2 patches - more below), that
>> > creates VXLAN devices in userspace and adds them as netdev vports, instead of
>> > using the vxlan vport code.
>> >
>> > The reason for this change is that it has been mentioned in the past that some
>> > of the vport code should be considered compat code, that is, it won't receive
>> > new features or flags.
>>
>> Thanks for looking at this. I agree that this is definitely a
>> necessary direction.
>>
>> > First is the creation of the device under a switch/case. I tried to make this a
>> > netdev_class function, overriding the netdev class by dpif_port_open_type. That
>> > required exporting a lot of the vport functions. I can still do it that way, if
>> > that's preferrable, but this seems more simple.
>>
>> Can you give some examples of what would need to be exported? It would
>> be nice to just set the open function for each type but if it is
>> really horrible we can live with the switch statement.
>>
>
> netdev_vport_{alloc, construct, destruct, dealloc}, get_tunnel_config,
> set_tunnel_config, get_netdev_tunnel_config.
>
> We could also do it the other way around, write this code in netdev-vport.c and
> export one or two functions from netdev-linux or dpif-netlink, and the create
> function per tunnel type.

OK, thanks, I understand now. I think what you have currently is
probably the best solution for the time being.

>> > The other problem is during port_del, that since we don't have a netdev, the
>> > dpif_port type would not be vxlan, and deciding whether to remove it or not
>> > could not be made. In fact, this is one of the main reasons why this is an RFC.
>> >
>> > The solution here is to decide on the type by the device's name, and there is
>> > even a function for this, though it looks up for the netdev's already created,
>> > those that probably come from the database. However, when OVS starts, it removes
>> > ports from the datapath, and that information is not available, and may not even
>> > be. In fact, since the dpif_port has a different name because it's a vport, it
>> > won't match any netdev at all. So, I did the opposite of getting the type from
>> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
>> > make this more strict and use the port, we could still be in conflict with a
>> > vxlan device created by the user with such name. This should have been one patch
>> > by itself.
>>
>> What about using the driver name exposed through ethtool? I believe
>> that all of the tunnel and internal devices expose this in a
>> consistent way - i.e. a VXLAN device can be queried and get back the
>> string "vxlan". Any driver other than the ones that we recognize can
>> be assumed to be OVS_VPORT_TYPE_NETDEV.
>>
>
> I don't see how this address the case when the user adds a vxlan interface
> created by the system. Like:
>
> ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
> ovs-vsctl add-port br0 vxlan_sys_4789
>
> Its driver would be vxlan. That goes to vxlan0 too.

I think we can check the combination of device type and the options
that are set on it (the same as what we need to do after we create the
device). If all of those match then it doesn't matter whether we added
it previously or the user added it - the device will work exactly the
same either way. If there isn't a match then we should bail out
without adding the port. This is pretty similar to the behavior of the
current compat code in the kernel (in that case if a port already
exists with the same name, it just aborts).

Two unresolved issues in my mind:
 * How do we handle cleaning up ports (including in cases where
userspace crashes)? In the kernel case, the port will stick around
until either the module is removed or the port is explicitly deleted.
 * How do we handle upgrade where the new version of OVS needs options
that are different from the previous version? This is basically a
special version of the port being created by someone else but we
should be able to handle the differences. Depending on how good a job
we do at cleaning up the port, this might not be an issue.
Thadeu Lima de Souza Cascardo April 20, 2016, 11 a.m. UTC | #11
On Tue, Apr 19, 2016 at 04:25:32PM -0700, Jesse Gross wrote:
> On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo
> <cascardo@redhat.com> wrote:
> > On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
> >> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
> >> <cascardo@redhat.com> wrote:
> >> > Hi, this is an RFC patch (could probably be 2 patches - more below), that
> >> > creates VXLAN devices in userspace and adds them as netdev vports, instead of
> >> > using the vxlan vport code.
> >> >
> >> > The reason for this change is that it has been mentioned in the past that some
> >> > of the vport code should be considered compat code, that is, it won't receive
> >> > new features or flags.
> >>
> >> Thanks for looking at this. I agree that this is definitely a
> >> necessary direction.
> >>
> >> > First is the creation of the device under a switch/case. I tried to make this a
> >> > netdev_class function, overriding the netdev class by dpif_port_open_type. That
> >> > required exporting a lot of the vport functions. I can still do it that way, if
> >> > that's preferrable, but this seems more simple.
> >>
> >> Can you give some examples of what would need to be exported? It would
> >> be nice to just set the open function for each type but if it is
> >> really horrible we can live with the switch statement.
> >>
> >
> > netdev_vport_{alloc, construct, destruct, dealloc}, get_tunnel_config,
> > set_tunnel_config, get_netdev_tunnel_config.
> >
> > We could also do it the other way around, write this code in netdev-vport.c and
> > export one or two functions from netdev-linux or dpif-netlink, and the create
> > function per tunnel type.
> 
> OK, thanks, I understand now. I think what you have currently is
> probably the best solution for the time being.
> 
> >> > The other problem is during port_del, that since we don't have a netdev, the
> >> > dpif_port type would not be vxlan, and deciding whether to remove it or not
> >> > could not be made. In fact, this is one of the main reasons why this is an RFC.
> >> >
> >> > The solution here is to decide on the type by the device's name, and there is
> >> > even a function for this, though it looks up for the netdev's already created,
> >> > those that probably come from the database. However, when OVS starts, it removes
> >> > ports from the datapath, and that information is not available, and may not even
> >> > be. In fact, since the dpif_port has a different name because it's a vport, it
> >> > won't match any netdev at all. So, I did the opposite of getting the type from
> >> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
> >> > make this more strict and use the port, we could still be in conflict with a
> >> > vxlan device created by the user with such name. This should have been one patch
> >> > by itself.
> >>
> >> What about using the driver name exposed through ethtool? I believe
> >> that all of the tunnel and internal devices expose this in a
> >> consistent way - i.e. a VXLAN device can be queried and get back the
> >> string "vxlan". Any driver other than the ones that we recognize can
> >> be assumed to be OVS_VPORT_TYPE_NETDEV.
> >>
> >
> > I don't see how this address the case when the user adds a vxlan interface
> > created by the system. Like:
> >
> > ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
> > ovs-vsctl add-port br0 vxlan_sys_4789
> >
> > Its driver would be vxlan. That goes to vxlan0 too.
> 
> I think we can check the combination of device type and the options
> that are set on it (the same as what we need to do after we create the
> device). If all of those match then it doesn't matter whether we added
> it previously or the user added it - the device will work exactly the
> same either way. If there isn't a match then we should bail out
> without adding the port. This is pretty similar to the behavior of the
> current compat code in the kernel (in that case if a port already
> exists with the same name, it just aborts).
> 

As I pointed out in another message, vxlan_sys* is a reserved name, if
any port like that is added to the database, it's going to be rejected
by vswitchd. The same goes for other vports. So, I think it's sufficient
to check this name, then set the type appropriately, and depending on
the type, try to remove the interface. So vxlan0 is not going to be
removed, but vxlan_sys_4789 will.

> Two unresolved issues in my mind:
>  * How do we handle cleaning up ports (including in cases where
> userspace crashes)? In the kernel case, the port will stick around
> until either the module is removed or the port is explicitly deleted.
>  * How do we handle upgrade where the new version of OVS needs options
> that are different from the previous version? This is basically a
> special version of the port being created by someone else but we
> should be able to handle the differences. Depending on how good a job
> we do at cleaning up the port, this might not be an issue.

OVS already does a good job at cleaning up ports when it starts. In
open_dpif_backer, any port not belonging to init_ofp_ports will be
removed from the datapath. As I implemented the code, those vxlan_sys
devices are going to be deleted as well after removal from the datapath,
and only if they were present in the datapath in the first place. That
means the port won't stick around in the case of a crash, and neither in
the case of upgrades.

Cascardo.

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
Jesse Gross April 20, 2016, 6:38 p.m. UTC | #12
On Wed, Apr 20, 2016 at 4:00 AM, Thadeu Lima de Souza Cascardo
<cascardo@redhat.com> wrote:
> On Tue, Apr 19, 2016 at 04:25:32PM -0700, Jesse Gross wrote:
>> On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo
>> <cascardo@redhat.com> wrote:
>> > On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
>> >> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
>> >> <cascardo@redhat.com> wrote:
>> >> > The other problem is during port_del, that since we don't have a netdev, the
>> >> > dpif_port type would not be vxlan, and deciding whether to remove it or not
>> >> > could not be made. In fact, this is one of the main reasons why this is an RFC.
>> >> >
>> >> > The solution here is to decide on the type by the device's name, and there is
>> >> > even a function for this, though it looks up for the netdev's already created,
>> >> > those that probably come from the database. However, when OVS starts, it removes
>> >> > ports from the datapath, and that information is not available, and may not even
>> >> > be. In fact, since the dpif_port has a different name because it's a vport, it
>> >> > won't match any netdev at all. So, I did the opposite of getting the type from
>> >> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
>> >> > make this more strict and use the port, we could still be in conflict with a
>> >> > vxlan device created by the user with such name. This should have been one patch
>> >> > by itself.
>> >>
>> >> What about using the driver name exposed through ethtool? I believe
>> >> that all of the tunnel and internal devices expose this in a
>> >> consistent way - i.e. a VXLAN device can be queried and get back the
>> >> string "vxlan". Any driver other than the ones that we recognize can
>> >> be assumed to be OVS_VPORT_TYPE_NETDEV.
>> >>
>> >
>> > I don't see how this address the case when the user adds a vxlan interface
>> > created by the system. Like:
>> >
>> > ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
>> > ovs-vsctl add-port br0 vxlan_sys_4789
>> >
>> > Its driver would be vxlan. That goes to vxlan0 too.
>>
>> I think we can check the combination of device type and the options
>> that are set on it (the same as what we need to do after we create the
>> device). If all of those match then it doesn't matter whether we added
>> it previously or the user added it - the device will work exactly the
>> same either way. If there isn't a match then we should bail out
>> without adding the port. This is pretty similar to the behavior of the
>> current compat code in the kernel (in that case if a port already
>> exists with the same name, it just aborts).
>>
>
> As I pointed out in another message, vxlan_sys* is a reserved name, if
> any port like that is added to the database, it's going to be rejected
> by vswitchd. The same goes for other vports. So, I think it's sufficient
> to check this name, then set the type appropriately, and depending on
> the type, try to remove the interface. So vxlan0 is not going to be
> removed, but vxlan_sys_4789 will.
>
>> Two unresolved issues in my mind:
>>  * How do we handle cleaning up ports (including in cases where
>> userspace crashes)? In the kernel case, the port will stick around
>> until either the module is removed or the port is explicitly deleted.
>>  * How do we handle upgrade where the new version of OVS needs options
>> that are different from the previous version? This is basically a
>> special version of the port being created by someone else but we
>> should be able to handle the differences. Depending on how good a job
>> we do at cleaning up the port, this might not be an issue.
>
> OVS already does a good job at cleaning up ports when it starts. In
> open_dpif_backer, any port not belonging to init_ofp_ports will be
> removed from the datapath. As I implemented the code, those vxlan_sys
> devices are going to be deleted as well after removal from the datapath,
> and only if they were present in the datapath in the first place. That
> means the port won't stick around in the case of a crash, and neither in
> the case of upgrades.

Thanks for that analysis. I agree with you that this looks safe. I'm
glad - there are fewer corner cases than I was expecting.

One minor comment that I noticed on the patch itself - I don't know if
the port mapping functions are handling IPsec variants of tunnels
correctly in all situations (or maybe are handling them by accident).
Since both IPsec and non-IPsec ports share the same dpif_port name, we
might get the wrong class back and then it seems like we don't handle
the IPsec class types when converting between names and types, which
could itself be an independent bug. Can you take a look?
Thadeu Lima de Souza Cascardo April 21, 2016, 4:43 p.m. UTC | #13
On Wed, Apr 20, 2016 at 11:38:31AM -0700, Jesse Gross wrote:
> Thanks for that analysis. I agree with you that this looks safe. I'm
> glad - there are fewer corner cases than I was expecting.
> 
> One minor comment that I noticed on the patch itself - I don't know if
> the port mapping functions are handling IPsec variants of tunnels
> correctly in all situations (or maybe are handling them by accident).
> Since both IPsec and non-IPsec ports share the same dpif_port name, we
> might get the wrong class back and then it seems like we don't handle
> the IPsec class types when converting between names and types, which
> could itself be an independent bug. Can you take a look?

Sure, I am not too familiar with the ipsec code. Looking at it, I don't see
anything really special about it. It just requires that some configuration is
available and that the ovs-monitor-ipsec daemon is running. When adding and
removing devices, the same procedure would still work. So, I don't see why we
would need to handle it specially, do you?

On the other hand, I noticed that VXLAN-GBP tunnels will have the problem of
sharing the same dpif_port name with non GBP tunnels. That means that tunnels
using different remote IPs with the same port all must be GBP or not GBP. If we
used extensions for GPE, that would also apply there. Maybe we should just add
an infix gbp and gpe, resulting in vxlan_sys_gbp_4789 and vxlan_sys_gpe_4789.
How about that?

Regards.
Cascardo.
Jesse Gross April 21, 2016, 5:43 p.m. UTC | #14
On Thu, Apr 21, 2016 at 9:43 AM, Thadeu Lima de Souza Cascardo
<cascardo@redhat.com> wrote:
> On Wed, Apr 20, 2016 at 11:38:31AM -0700, Jesse Gross wrote:
>> One minor comment that I noticed on the patch itself - I don't know if
>> the port mapping functions are handling IPsec variants of tunnels
>> correctly in all situations (or maybe are handling them by accident).
>> Since both IPsec and non-IPsec ports share the same dpif_port name, we
>> might get the wrong class back and then it seems like we don't handle
>> the IPsec class types when converting between names and types, which
>> could itself be an independent bug. Can you take a look?
>
> Sure, I am not too familiar with the ipsec code. Looking at it, I don't see
> anything really special about it. It just requires that some configuration is
> available and that the ovs-monitor-ipsec daemon is running. When adding and
> removing devices, the same procedure would still work. So, I don't see why we
> would need to handle it specially, do you?

Nevermind, I just noticed that netdev_to_ovs_vport_type() handles GRE
differently from the other tunnel types to account for the "ipsec_"
prefix in the class name. I was concerned that we wouldn't properly
translate IPsec netdev types into the Linux OVS vport type but that
doesn't appear to be the case.

> On the other hand, I noticed that VXLAN-GBP tunnels will have the problem of
> sharing the same dpif_port name with non GBP tunnels. That means that tunnels
> using different remote IPs with the same port all must be GBP or not GBP. If we
> used extensions for GPE, that would also apply there. Maybe we should just add
> an infix gbp and gpe, resulting in vxlan_sys_gbp_4789 and vxlan_sys_gpe_4789.
> How about that?

I agree that this is a problem. Different extensions will need to run
on different UDP port numbers since there's no way to otherwise know
how to interpret the bits when the packet is first received. I think
we need to block differing extensions on the same UDP port when the
VXLAN ports are added to OVS, otherwise they'll silently be combined
onto the same backer. Your idea of adding a tag indicating the
extension type is nice for debugging (though you'll need to find a way
to abbreviate the name so that it fits within the interface name
length) but it's not strictly necessary: if extensions are running on
different UDP ports then the backers will already be distinguished and
if they are running on the same port then attempting to add the second
one will silently fail at the kernel level.
diff mbox

Patch

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 015ba20..6be6f52 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -25,6 +25,7 @@ 
 #include <net/if.h>
 #include <linux/types.h>
 #include <linux/pkt_sched.h>
+#include <linux/rtnetlink.h>
 #include <poll.h>
 #include <stdlib.h>
 #include <strings.h>
@@ -780,10 +781,8 @@  get_vport_type(const struct dpif_netlink_vport *vport)
 }
 
 static enum ovs_vport_type
-netdev_to_ovs_vport_type(const struct netdev *netdev)
+netdev_to_ovs_vport_type(const char *type)
 {
-    const char *type = netdev_get_type(netdev);
-
     if (!strcmp(type, "tap") || !strcmp(type, "system")) {
         return OVS_VPORT_TYPE_NETDEV;
     } else if (!strcmp(type, "internal")) {
@@ -804,19 +803,14 @@  netdev_to_ovs_vport_type(const struct netdev *netdev)
 }
 
 static int
-dpif_netlink_port_add__(struct dpif_netlink *dpif, struct netdev *netdev,
+dpif_netlink_port_add__(struct dpif_netlink *dpif, const char *name,
+                        enum ovs_vport_type type,
+                        struct ofpbuf *options,
                         odp_port_t *port_nop)
     OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
-    const struct netdev_tunnel_config *tnl_cfg;
-    char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
-    const char *name = netdev_vport_get_dpif_port(netdev,
-                                                  namebuf, sizeof namebuf);
-    const char *type = netdev_get_type(netdev);
     struct dpif_netlink_vport request, reply;
     struct ofpbuf *buf;
-    uint64_t options_stub[64 / 8];
-    struct ofpbuf options;
     struct nl_sock **socksp = NULL;
     uint32_t *upcall_pids;
     int error = 0;
@@ -831,52 +825,19 @@  dpif_netlink_port_add__(struct dpif_netlink *dpif, struct netdev *netdev,
     dpif_netlink_vport_init(&request);
     request.cmd = OVS_VPORT_CMD_NEW;
     request.dp_ifindex = dpif->dp_ifindex;
-    request.type = netdev_to_ovs_vport_type(netdev);
-    if (request.type == OVS_VPORT_TYPE_UNSPEC) {
-        VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has "
-                     "unsupported type `%s'",
-                     dpif_name(&dpif->dpif), name, type);
-        vport_del_socksp(dpif, socksp);
-        return EINVAL;
-    }
+    request.type = type;
     request.name = name;
 
-    if (request.type == OVS_VPORT_TYPE_NETDEV) {
-#ifdef _WIN32
-        /* XXX : Map appropiate Windows handle */
-#else
-        netdev_linux_ethtool_set_flag(netdev, ETH_FLAG_LRO, "LRO", false);
-#endif
-    }
-
-    tnl_cfg = netdev_get_tunnel_config(netdev);
-    if (tnl_cfg && (tnl_cfg->dst_port != 0 || tnl_cfg->exts)) {
-        ofpbuf_use_stack(&options, options_stub, sizeof options_stub);
-        if (tnl_cfg->dst_port) {
-            nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_DST_PORT,
-                           ntohs(tnl_cfg->dst_port));
-        }
-        if (tnl_cfg->exts) {
-            size_t ext_ofs;
-            int i;
-
-            ext_ofs = nl_msg_start_nested(&options, OVS_TUNNEL_ATTR_EXTENSION);
-            for (i = 0; i < 32; i++) {
-                if (tnl_cfg->exts & (1 << i)) {
-                    nl_msg_put_flag(&options, i);
-                }
-            }
-            nl_msg_end_nested(&options, ext_ofs);
-        }
-        request.options = options.data;
-        request.options_len = options.size;
-    }
-
     request.port_no = *port_nop;
     upcall_pids = vport_socksp_to_pids(socksp, dpif->n_handlers);
     request.n_upcall_pids = socksp ? dpif->n_handlers : 1;
     request.upcall_pids = upcall_pids;
 
+    if (options) {
+        request.options = options->data;
+        request.options_len = options->size;
+    }
+
     error = dpif_netlink_vport_transact(&request, &reply, &buf);
     if (!error) {
         *port_nop = reply.port_no;
@@ -916,6 +877,215 @@  exit:
 }
 
 static int
+dpif_netlink_port_add_compat(struct dpif_netlink *dpif, struct netdev *netdev,
+                             odp_port_t *port_nop)
+    OVS_REQ_WRLOCK(dpif->upcall_lock)
+{
+    const struct netdev_tunnel_config *tnl_cfg;
+    char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
+    const char *name = netdev_vport_get_dpif_port(netdev,
+                                                  namebuf, sizeof namebuf);
+    const char *type = netdev_get_type(netdev);
+    uint64_t options_stub[64 / 8];
+    struct ofpbuf options;
+    enum ovs_vport_type ovs_type;
+
+    ovs_type = netdev_to_ovs_vport_type(netdev_get_type(netdev));
+    if (ovs_type == OVS_VPORT_TYPE_UNSPEC) {
+        VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has "
+                     "unsupported type `%s'",
+                     dpif_name(&dpif->dpif), name, type);
+        return EINVAL;
+    }
+
+    if (ovs_type == OVS_VPORT_TYPE_NETDEV) {
+#ifdef _WIN32
+        /* XXX : Map appropiate Windows handle */
+#else
+        netdev_linux_ethtool_set_flag(netdev, ETH_FLAG_LRO, "LRO", false);
+#endif
+    }
+
+    tnl_cfg = netdev_get_tunnel_config(netdev);
+    if (tnl_cfg && (tnl_cfg->dst_port != 0 || tnl_cfg->exts)) {
+        ofpbuf_use_stack(&options, options_stub, sizeof options_stub);
+        if (tnl_cfg->dst_port) {
+            nl_msg_put_u16(&options, OVS_TUNNEL_ATTR_DST_PORT,
+                           ntohs(tnl_cfg->dst_port));
+        }
+        if (tnl_cfg->exts) {
+            size_t ext_ofs;
+            int i;
+
+            ext_ofs = nl_msg_start_nested(&options, OVS_TUNNEL_ATTR_EXTENSION);
+            for (i = 0; i < 32; i++) {
+                if (tnl_cfg->exts & (1 << i)) {
+                    nl_msg_put_flag(&options, i);
+                }
+            }
+            nl_msg_end_nested(&options, ext_ofs);
+        }
+        return dpif_netlink_port_add__(dpif, name, ovs_type, &options, port_nop);
+    } else {
+        return dpif_netlink_port_add__(dpif, name, ovs_type, NULL, port_nop);
+    }
+
+}
+
+#ifdef __linux__
+
+static int
+netdev_vxlan_create(struct netdev *netdev)
+{
+    int err;
+    struct ofpbuf request, *reply;
+    size_t linkinfo_off, infodata_off;
+    char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
+    const char *name = netdev_vport_get_dpif_port(netdev,
+                                                  namebuf, sizeof namebuf);
+    const struct netdev_tunnel_config *tnl_cfg;
+    tnl_cfg = netdev_get_tunnel_config(netdev);
+    if (!tnl_cfg) { /* or assert? */
+        return EINVAL;
+    }
+
+    ofpbuf_init(&request, 0);
+    nl_msg_put_nlmsghdr(&request, 0, RTM_NEWLINK,
+                        NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE);
+    ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
+    nl_msg_put_string(&request, IFLA_IFNAME, name);
+    linkinfo_off = nl_msg_start_nested(&request, IFLA_LINKINFO);
+        nl_msg_put_string(&request, IFLA_INFO_KIND, "vxlan");
+        infodata_off = nl_msg_start_nested(&request, IFLA_INFO_DATA);
+            nl_msg_put_u8(&request, IFLA_VXLAN_COLLECT_METADATA, 1);
+            nl_msg_put_u8(&request, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1);
+            if (tnl_cfg->exts & (1 << OVS_VXLAN_EXT_GBP)) {
+                nl_msg_put_flag(&request, IFLA_VXLAN_GBP);
+            }
+            nl_msg_put_be16(&request, IFLA_VXLAN_PORT, tnl_cfg->dst_port);
+        nl_msg_end_nested(&request, infodata_off);
+    nl_msg_end_nested(&request, linkinfo_off);
+
+    err = nl_transact(NETLINK_ROUTE, &request, &reply);
+
+    if (!err) {
+        ofpbuf_uninit(reply);
+    }
+
+    if (err == EINVAL) {
+        err = EOPNOTSUPP;
+    }
+
+    ofpbuf_uninit(&request);
+    return err;
+}
+
+static int
+netdev_vxlan_destroy(const char *name)
+{
+    int err;
+    struct ofpbuf request, *reply;
+
+    ofpbuf_init(&request, 0);
+    nl_msg_put_nlmsghdr(&request, 0, RTM_DELLINK,
+                        NLM_F_REQUEST | NLM_F_ACK);
+    ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
+    nl_msg_put_string(&request, IFLA_IFNAME, name);
+
+    err = nl_transact(NETLINK_ROUTE, &request, &reply);
+
+    if (!err) {
+        ofpbuf_uninit(reply);
+    }
+
+    ofpbuf_uninit(&request);
+    return err;
+}
+
+#else
+
+static int
+netdev_vxlan_create(struct netdev *netdev OVS_UNUSED)
+{
+    return EOPNOTSUPP;
+}
+
+static int
+netdev_vxlan_destroy(const char *name OVS_UNUSED)
+{
+    return EOPNOTSUPP;
+}
+
+#endif
+
+static int
+dpif_netlink_port_query__(const struct dpif_netlink *dpif, odp_port_t port_no,
+                          const char *port_name, struct dpif_port *dpif_port);
+
+static int
+dpif_netlink_port_create(struct netdev *netdev)
+{
+    switch (netdev_to_ovs_vport_type(netdev_get_type(netdev))) {
+    case OVS_VPORT_TYPE_VXLAN:
+        return netdev_vxlan_create(netdev);
+    case OVS_VPORT_TYPE_NETDEV:
+    case OVS_VPORT_TYPE_INTERNAL:
+    case OVS_VPORT_TYPE_GENEVE:
+    case OVS_VPORT_TYPE_GRE:
+    case OVS_VPORT_TYPE_LISP:
+    case OVS_VPORT_TYPE_STT:
+    case OVS_VPORT_TYPE_UNSPEC:
+    case __OVS_VPORT_TYPE_MAX:
+    default:
+        return EOPNOTSUPP;
+    }
+    return 0;
+}
+
+static int
+dpif_netlink_port_destroy(const char *name, const char *type)
+{
+    switch (netdev_to_ovs_vport_type(type)) {
+    case OVS_VPORT_TYPE_VXLAN:
+        return netdev_vxlan_destroy(name);
+    case OVS_VPORT_TYPE_NETDEV:
+    case OVS_VPORT_TYPE_INTERNAL:
+    case OVS_VPORT_TYPE_GENEVE:
+    case OVS_VPORT_TYPE_GRE:
+    case OVS_VPORT_TYPE_LISP:
+    case OVS_VPORT_TYPE_STT:
+    case OVS_VPORT_TYPE_UNSPEC:
+    case __OVS_VPORT_TYPE_MAX:
+    default:
+        return EOPNOTSUPP;
+    }
+    return 0;
+}
+
+static int
+dpif_netlink_port_create_and_add(struct dpif_netlink *dpif, struct netdev *netdev,
+                           odp_port_t *port_nop)
+    OVS_REQ_WRLOCK(dpif->upcall_lock)
+{
+    int error;
+    char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
+    const char *name = netdev_vport_get_dpif_port(netdev,
+                                                  namebuf, sizeof namebuf);
+
+    error = dpif_netlink_port_create(netdev);
+    if (error) {
+        return error;
+    }
+
+    error = dpif_netlink_port_add__(dpif, name, OVS_VPORT_TYPE_NETDEV, NULL, port_nop);
+    if (error) {
+        VLOG_DBG("failed to add port, destroying: %d", error);
+        dpif_netlink_port_destroy(name, netdev_get_type(netdev));
+    }
+    return error;
+}
+
+static int
 dpif_netlink_port_add(struct dpif *dpif_, struct netdev *netdev,
                       odp_port_t *port_nop)
 {
@@ -923,7 +1093,10 @@  dpif_netlink_port_add(struct dpif *dpif_, struct netdev *netdev,
     int error;
 
     fat_rwlock_wrlock(&dpif->upcall_lock);
-    error = dpif_netlink_port_add__(dpif, netdev, port_nop);
+    error = dpif_netlink_port_create_and_add(dpif, netdev, port_nop);
+    if (error == EOPNOTSUPP) {
+        error = dpif_netlink_port_add_compat(dpif, netdev, port_nop);
+    }
     fat_rwlock_unlock(&dpif->upcall_lock);
 
     return error;
@@ -935,6 +1108,12 @@  dpif_netlink_port_del__(struct dpif_netlink *dpif, odp_port_t port_no)
 {
     struct dpif_netlink_vport vport;
     int error;
+    struct dpif_port dpif_port;
+
+    error = dpif_netlink_port_query__(dpif, port_no, NULL, &dpif_port);
+    if (error) {
+        return error;
+    }
 
     dpif_netlink_vport_init(&vport);
     vport.cmd = OVS_VPORT_CMD_DEL;
@@ -944,6 +1123,9 @@  dpif_netlink_port_del__(struct dpif_netlink *dpif, odp_port_t port_no)
 
     vport_del_channels(dpif, port_no);
 
+    dpif_netlink_port_destroy(dpif_port.name, dpif_port.type);
+    dpif_port_destroy(&dpif_port);
+
     return error;
 }
 
@@ -991,6 +1173,7 @@  dpif_netlink_port_query__(const struct dpif_netlink *dpif, odp_port_t port_no,
     return error;
 }
 
+
 static int
 dpif_netlink_port_query_by_number(const struct dpif *dpif_, odp_port_t port_no,
                                   struct dpif_port *dpif_port)
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index e398562..57578c3 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -90,6 +90,8 @@  struct vport_class {
     struct netdev_class netdev_class;
 };
 
+static const struct vport_class vport_classes[];
+
 /* Last read of the route-table's change number. */
 static uint64_t rt_change_seqno;
 
@@ -1547,25 +1549,38 @@  netdev_vport_range(struct unixctl_conn *conn, int argc,
                           tunnel_get_status,                                   \
                           BUILD_HEADER, PUSH_HEADER, POP_HEADER) }}
 
+/* The name of the dpif_port should be short enough to accomodate adding
+ * a port number to the end if one is necessary. */
+static const struct vport_class vport_classes[] = {
+    TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header,
+                                        push_udp_header,
+                                        netdev_geneve_pop_header),
+    TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header,
+                                   netdev_gre_push_header,
+                                   netdev_gre_pop_header),
+    TUNNEL_CLASS("ipsec_gre", "gre_sys", NULL, NULL, NULL),
+    TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
+                                       push_udp_header,
+                                       netdev_vxlan_pop_header),
+    TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL),
+    TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL),
+};
+
+const struct netdev_class *
+netdev_dpif_port_get_vport_class(const char *dpif_port)
+{
+    int i;
+    for (i = 0; i < ARRAY_SIZE(vport_classes); i++) {
+        if (strstr(dpif_port, vport_classes[i].dpif_port) == dpif_port) {
+            return &vport_classes[i].netdev_class;
+        }
+    }
+    return NULL;
+}
+
 void
 netdev_vport_tunnel_register(void)
 {
-    /* The name of the dpif_port should be short enough to accomodate adding
-     * a port number to the end if one is necessary. */
-    static const struct vport_class vport_classes[] = {
-        TUNNEL_CLASS("geneve", "genev_sys", netdev_geneve_build_header,
-                                            push_udp_header,
-                                            netdev_geneve_pop_header),
-        TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header,
-                                       netdev_gre_push_header,
-                                       netdev_gre_pop_header),
-        TUNNEL_CLASS("ipsec_gre", "gre_sys", NULL, NULL, NULL),
-        TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header,
-                                           push_udp_header,
-                                           netdev_vxlan_pop_header),
-        TUNNEL_CLASS("lisp", "lisp_sys", NULL, NULL, NULL),
-        TUNNEL_CLASS("stt", "stt_sys", NULL, NULL, NULL),
-    };
     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
     if (ovsthread_once_start(&once)) {
diff --git a/lib/netdev-vport.h b/lib/netdev-vport.h
index be02cb5..7e32bbd 100644
--- a/lib/netdev-vport.h
+++ b/lib/netdev-vport.h
@@ -42,6 +42,7 @@  void netdev_vport_inc_tx(const struct netdev *,
 
 bool netdev_vport_is_vport_class(const struct netdev_class *);
 const char *netdev_vport_class_get_dpif_port(const struct netdev_class *);
+const struct netdev_class *netdev_dpif_port_get_vport_class(const char *);
 
 #ifndef _WIN32
 enum { NETDEV_VPORT_NAME_BUFSIZE = 16 };
diff --git a/lib/netdev.c b/lib/netdev.c
index 3e50694..0bc4d04 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -1789,6 +1789,13 @@  netdev_get_type_from_name(const char *name)
     struct netdev *dev = netdev_from_name(name);
     const char *type = dev ? netdev_get_type(dev) : NULL;
     netdev_close(dev);
+    if (!type) {
+        const struct netdev_class * netdev_class;
+        netdev_class = netdev_dpif_port_get_vport_class(name);
+        if (netdev_class) {
+            type = netdev_class->type;
+        }
+    }
     return type;
 }