diff mbox

virtif: initial interface extensions

Message ID 201005090120.19958.arnd@arndb.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Arnd Bergmann May 8, 2010, 11:20 p.m. UTC
Building on the work of Scott Feldman, this extends the netlink interface
to deal with not only port profiles but also CDCP multichannel and VDP
VSI registration.

The protocols are split apart into separate netlink attributes for
a cleaner separation. A device can have multiple IFLA_VIRTIF attributes,
each pointing to one of the slaves that get registered using one of
the protocols. In case of VDP, each IFLA_VIRTIF attribute needs
both an IFLA_VIRTIF_VSI and one or more IFLA_VIRTIF_VSI_MAC_VLAN
attributes.

The VF number is split out because it only makes sense when the
implementation is in the device driver or firmware, but not for
software-only implementations like we do for the initial VDP code
using LLDPAD. If a IFLA_VIRTIF_VF attribute is given, the kernel
takes care of the association through the device driver for that
VF, otherwise we do it in user space. This should work for each
of the three protocols.

This code is a first rough prototype, completely untested and
very likely buggy. This is also the first time I'm trying
to deal with netlink in the kernel, so chances are that I've
misunderstood something in a major way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/if_link.h   |   81 ++++++++++++++++++++++++++++------
 include/linux/netdevice.h |    8 ++--
 net/core/rtnetlink.c      |  106 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 152 insertions(+), 43 deletions(-)

Comments

Stefan Berger May 10, 2010, 3:37 p.m. UTC | #1
Arnd Bergmann <arnd <at> arndb.de> writes:

[...]

> +	if (tb[IFLA_VIRTIF]) {
> +		struct ifla_virtif_port_profile *ivp;
> +		struct nlattr *virtif[IFLA_VIRTIF_MAX+1];
> +		u32 vf;
> +
> +		err = nla_parse_nested(virtif, IFLA_VIRTIF_MAX,
> +				       tb[IFLA_VIRTIF], ifla_virtif_policy);
> +		if (err < 0)
> +			return err;
> +
> +		if (!virtif[IFLA_VIRTIF_VF] || !virtif[IFLA_VIRTIF_PORT_PROFILE])
> +			goto novirtif; /* IFLA_VIRTIF may be directed at user space */


In what case would the IFLA_VIRTIF_PORT_PROFILE be provided? Would libvirt for
example need to be aware of whether the Ethernet device can handle the setup
protocol via its firmware and in this case provide the port profile parameter
and in other cases provide other parameters? Certainly the user or upper layer
management software would have to know it when creating the domain XML and in
fact different types of parameters were needed. Obviously we should have one
common set of (XML) parameters that go into the netlink message and that can be
handled by the kernel driver if the firmware knows how to handle it or by
LLDPAD. Libvirt would send the parameters via netlink message to trigger the
setup protocol and the message may be received by kernel and LLDPAD. From what I
can see LLDPAD also may need a way to probe the kernel driver whether it handled
the setup protocol via firmware on a given interface, which may or may not be
true for all interfaces, but may be necessary to avoid triggering the setup
protocol twice.

   Stefan



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman May 10, 2010, 6:56 p.m. UTC | #2
On 5/10/10 8:37 AM, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> Arnd Bergmann <arnd <at> arndb.de> writes:
> 
> [...]
> 
>> + if (tb[IFLA_VIRTIF]) {
>> +  struct ifla_virtif_port_profile *ivp;
>> +  struct nlattr *virtif[IFLA_VIRTIF_MAX+1];
>> +  u32 vf;
>> +
>> +  err = nla_parse_nested(virtif, IFLA_VIRTIF_MAX,
>> +           tb[IFLA_VIRTIF], ifla_virtif_policy);
>> +  if (err < 0)
>> +   return err;
>> +
>> +  if (!virtif[IFLA_VIRTIF_VF] || !virtif[IFLA_VIRTIF_PORT_PROFILE])
>> +   goto novirtif; /* IFLA_VIRTIF may be directed at user space */
> 
> 
> In what case would the IFLA_VIRTIF_PORT_PROFILE be provided? Would libvirt for
> example need to be aware of whether the Ethernet device can handle the setup
> protocol via its firmware and in this case provide the port profile parameter
> and in other cases provide other parameters? Certainly the user or upper layer
> management software would have to know it when creating the domain XML and in
> fact different types of parameters were needed.

> Obviously we should have one
> common set of (XML) parameters that go into the netlink message and that can
> be handled by the kernel driver if the firmware knows how to handle it or by
> LLDPAD. 

With Arnd's latest additions, we have a single netlink msg, but the
parameter sets are disjoint between VDP/CDCP and what we need for the kernel
driver.  So that means the sender (libvirt in this case) needs to know about
both setups to send a single netlink msg.  An alternative is a have two
netlink msgs, one for each setup.  That still requires the sender to know
about two setups.  

> Libvirt would send the parameters via netlink message to trigger the
> setup protocol and the message may be received by kernel and LLDPAD.
 
That was the original idea by having libvirt send the netlink msg using
multicast.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 10, 2010, 9:46 p.m. UTC | #3
On Monday 10 May 2010 20:56:39 Scott Feldman wrote:
> On 5/10/10 8:37 AM, "Stefan Berger" <stefanb@us.ibm.com> wrote:
> > In what case would the IFLA_VIRTIF_PORT_PROFILE be provided? Would libvirt for
> > example need to be aware of whether the Ethernet device can handle the setup
> > protocol via its firmware and in this case provide the port profile parameter
> > and in other cases provide other parameters? Certainly the user or upper layer
> > management software would have to know it when creating the domain XML and in
> > fact different types of parameters were needed.
> 
> > Obviously we should have one
> > common set of (XML) parameters that go into the netlink message and that can
> > be handled by the kernel driver if the firmware knows how to handle it or by
> > LLDPAD. 
> 
> With Arnd's latest additions, we have a single netlink msg, but the
> parameter sets are disjoint between VDP/CDCP and what we need for the kernel
> driver.  So that means the sender (libvirt in this case) needs to know about
> both setups to send a single netlink msg.  An alternative is a have two
> netlink msgs, one for each setup.  That still requires the sender to know
> about two setups.

There are two separate issues here. The first one is whether we're doing
the association in the device driver or in user space. The assumption here
is that if it's in the device driver, there will be a VF number to identify
the channel, while in user space that is not needed.

The other question is which protocol we're using. There are as far as I
can tell five options:

1. enic device driver
2. VDP
3. CDCP
4. CDCP + VDP
5. enic + VDP

The first two ones are the most interesting for now, since Linux cannot do
S-VLANs yet, and they are required for CDCP. However, each of these options
could theoreticall be done in the kernel (plus firmware) or in user space.

If it's done in user space, the VF number is meaningless, because the setup
of the software device is also done from software, but instead you need to
take care of creating the software device with the correct parameters, e.g.
a macvtap device connected to a VLAN interface using the numbers you pass
in the VDP protocol.

Right now, we're not planning to do the protocol that enic uses in LLDPAD,
because it's not publically released. Similarly, there are no adapters that
do VDP in firmware, but both these cases should be covered by the protocol
and it would be good if libvirt could handle them.

Stefan, can you just define the XML in a way that matches the netlink
definition? What you need is something like

1. VF number (optional, signifies that 2/3 are done in firmware)
2. Lower-level protocol
  2.1. CDCP
     2.1.1 SVID
     2.1.2 SCID
  2.2. enic
     2.2.1 port profile name
     2.2.2 ...
3. VDP
  3.1 VSI type/version/provider
  3.2 UUID
  3.3 MAC/VLAN

You need to have 2. or 3. or both, and 2.1/2.2 are mutually exclusive.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Berger May 10, 2010, 11:51 p.m. UTC | #4
Arnd Bergmann <arnd@arndb.de> wrote on 05/10/2010 05:46:37 PM:


> The other question is which protocol we're using. There are as far as I
> can tell five options:
> 
> 1. enic device driver
> 2. VDP
> 3. CDCP
> 4. CDCP + VDP
> 5. enic + VDP
> 
> The first two ones are the most interesting for now, since Linux cannot do
> S-VLANs yet, and they are required for CDCP. However, each of these options
> could theoreticall be done in the kernel (plus firmware) or in user space.
> 
> If it's done in user space, the VF number is meaningless, because the setup
> of the software device is also done from software, but instead you need to
> take care of creating the software device with the correct parameters, e.g.
> a macvtap device connected to a VLAN interface using the numbers you pass
> in the VDP protocol.
> 
> Right now, we're not planning to do the protocol that enic uses in LLDPAD,
> because it's not publically released. Similarly, there are no adapters that
> do VDP in firmware, but both these cases should be covered by the protocol
> and it would be good if libvirt could handle them.
> 
> Stefan, can you just define the XML in a way that matches the netlink
> definition? What you need is something like
> 
> 1. VF number (optional, signifies that 2/3 are done in firmware) 

Shouldn't we be able to query that number via netlink starting with the 
macvtap device and the following the trail to the root and trying to find 
a VF number on the way? 

> 2. Lower-level protocol
>   2.1. CDCP
>      2.1.1 SVID
>      2.1.2 SCID 

Will the later on be qeueryable via netlink as well but not today??? 
Vivek tells me svid is vlan, so that could be found out from the kernel. 

So if we want to only support 1 and 2 for now, I'd rather skip them for now. 

>   2.2. enic
>      2.2.1 port profile name 

as proposed on libvirt mailing list 

>      2.2.2 ...
> 3. VDP
>   3.1 VSI type/version/provider 

as proposed on libvirt mailing list 

>   3.2 UUID 

we have a couple of UUIDs, which one? 


>   3.3 MAC/VLAN 

MAC: available from libvirt 
VLAN: can be found out by querying for every interface for VLAN ID while
following the path towards the root device. 


    Stefan 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Feldman May 11, 2010, 12:25 a.m. UTC | #5
On 5/10/10 2:46 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Monday 10 May 2010 20:56:39 Scott Feldman wrote:
>> With Arnd's latest additions, we have a single netlink msg, but the
>> parameter sets are disjoint between VDP/CDCP and what we need for the kernel
>> driver.  So that means the sender (libvirt in this case) needs to know about
>> both setups to send a single netlink msg.  An alternative is a have two
>> netlink msgs, one for each setup.  That still requires the sender to know
>> about two setups.
> 
> There are two separate issues here. The first one is whether we're doing
> the association in the device driver or in user space. The assumption here
> is that if it's in the device driver, there will be a VF number to identify
> the channel, while in user space that is not needed.
> 
> The other question is which protocol we're using. There are as far as I
> can tell five options:
> 
> 1. enic device driver
> 2. VDP
> 3. CDCP
> 4. CDCP + VDP
> 5. enic + VDP
> 
> The first two ones are the most interesting for now, since Linux cannot do
> S-VLANs yet, and they are required for CDCP. However, each of these options
> could theoreticall be done in the kernel (plus firmware) or in user space.
> 
> If it's done in user space, the VF number is meaningless, because the setup
> of the software device is also done from software, but instead you need to
> take care of creating the software device with the correct parameters, e.g.
> a macvtap device connected to a VLAN interface using the numbers you pass
> in the VDP protocol.
> 
> Right now, we're not planning to do the protocol that enic uses in LLDPAD,
> because it's not publically released. Similarly, there are no adapters that
> do VDP in firmware, but both these cases should be covered by the protocol
> and it would be good if libvirt could handle them.
> 
> Stefan, can you just define the XML in a way that matches the netlink
> definition? What you need is something like
> 
> 1. VF number (optional, signifies that 2/3 are done in firmware)
> 2. Lower-level protocol
>   2.1. CDCP
>      2.1.1 SVID
>      2.1.2 SCID
>   2.2. enic
>      2.2.1 port profile name
>      2.2.2 ...
> 3. VDP
>   3.1 VSI type/version/provider
>   3.2 UUID
>   3.3 MAC/VLAN
> 
> You need to have 2. or 3. or both, and 2.1/2.2 are mutually exclusive.

I'm don't think this is going in the right direction.  We're talking a
pretty simple concept of a port-profile used to configure the virtual port
backing a VM i/f to something that's trying to munge disjoint protocols
based on pre-standard work into a single API.  It's forcing all those
pre-standard protocol details into the API, into the XML, and into the mgmt
software (libvirt), and into the admin's lap.

I want the API to pass a port-profile name plus other information associated
with the VM i/f to some mgmt object which can setup the virtual port backing
the VM i/f.  (And unset it).  Using netlink for API let's that object be in
user- or kernel-space software, hardware or firmware.  How the object sets
up the virtual port based on the passed port-profile is beyond the scope of
the API.

My last port-profile patch is this API.  It gives us this:

    1) single netlink msg for kernel- and user-space
    2) single parameter set from sender's perspective (libvirt)
    3) single XML representation of parameters
    4) single code path in kernel and libvirt
    5) (potential) cross-vendor-switch VM migration
    6) admin-friendly port-profile names
    7) allows pre-standard (802.1Qbg/bh) details to change
       without bogging down the API

What I proposed on the libvirt list is to maintain a mapping database from
port-profile to vendor-specific or protocol-specific parameters.  Using VDP
as an example, a port-profile would resolve the VDP tuple:

     port-profile: "joes-garage"   --->  VSI Manager ID: 15
                                         VSI Type ID: 12345
                                         VSI Type ID Ver: 1
                                         other VSI settings (preassociate)

How the mapping database is maintained is beyond the scope of the API.

The port-profile string is the unifying concept.  This is the common ground
and the only way to be protocol-independent in the API.

-scott

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 11, 2010, 12:25 p.m. UTC | #6
On Tuesday 11 May 2010, Stefan Berger wrote:
> Arnd Bergmann <arnd@arndb.de> wrote on 05/10/2010 05:46:37 PM:
> 
> > Stefan, can you just define the XML in a way that matches the netlink
> > definition? What you need is something like
> > 
> > 1. VF number (optional, signifies that 2/3 are done in firmware)
> 
> Shouldn't we be able to query that number via netlink starting with the
> macvtap device and the following the trail to the root and trying to find
> a VF number on the way?

No. If we have a macvtap device, there is no VF number. The VF number
should be known to libvirt in those cases where instead of creating a
macvtap device, it assigns a VF of an SR-IOV adapter to the guest.

> > 2. Lower-level protocol
> >   2.1. CDCP
> >      2.1.1 SVID
> >      2.1.2 SCID
> 
> Will the later on be qeueryable via netlink as well but not today???
> Vivek tells me svid is vlan, so that could be found out from the kernel.
> 
> So if we want to only support 1 and 2 for now, I'd rather skip them for 
> now.

svid is almost vlan (hence S-VLAN), but slightly different and is not
currently supported by the kernel. Again, if the implementation is done in
firmware, libvirt needs to set the same S-VLAN ID when setting up the
VF and when associating it to the switch.

When it's done in software, we need to create the device (or have
it created in advance), so you either know it or can query it as
you describe.

You don't need to support it yet in libvirt, but the definition should
be done in a way that leaves the option open to add it later.

> >      2.2.2 ...
> > 3. VDP
> >   3.1 VSI type/version/provider
> 
> as proposed on libvirt mailing list
> 
> >   3.2 UUID
> 
> we have a couple of UUIDs, which one?

This is a UUID that describes the VSI to the switch. It needs to be
unique in the migration domain. For a guest that has multiple
macvtap interfaces, you either need to have a single UUID and
put all MAC/VLAN pairs into the same netlink message with this
UUID, or have one UUID per device. 
 
> >   3.3 MAC/VLAN
> 
> MAC: available from libvirt
> VLAN: can be found out by querying for every interface for VLAN ID while 
> following the path towards the root device. 

Yes, in case of macvtap.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 11, 2010, 12:59 p.m. UTC | #7
On Tuesday 11 May 2010, Scott Feldman wrote:
> On 5/10/10 2:46 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>
> > Stefan, can you just define the XML in a way that matches the netlink
> > definition? What you need is something like
> > 
> > 1. VF number (optional, signifies that 2/3 are done in firmware)
> > 2. Lower-level protocol
> >   2.1. CDCP
> >      2.1.1 SVID
> >      2.1.2 SCID
> >   2.2. enic
> >      2.2.1 port profile name
> >      2.2.2 ...
> > 3. VDP
> >   3.1 VSI type/version/provider
> >   3.2 UUID
> >   3.3 MAC/VLAN
> > 
> > You need to have 2. or 3. or both, and 2.1/2.2 are mutually exclusive.
> 
> I'm don't think this is going in the right direction.  We're talking a
> pretty simple concept of a port-profile used to configure the virtual port
> backing a VM i/f to something that's trying to munge disjoint protocols
> based on pre-standard work into a single API.  It's forcing all those
> pre-standard protocol details into the API, into the XML, and into the mgmt
> software (libvirt), and into the admin's lap.

No. I agree that the port-profile concept is simple and that the complexity
comes from trying to merge the Linux interface with what we do for VDP.
It would be much more sensible IMHO to just unify port-profiles and CDCP
in the kernel interface, because they are conceptually more similar and
both rather simple (note that there is no S-VLAN implementation in Linux,
so CDCP is not there yet either).

If we layer VDP on top of the two and do it always in user space (LLDPAD
with lldptool), things are much simpler on the interface side.

The focus of VDP is to manage migration, which is something that
enic doesn't do (or doesn't need to do) and most of the complexity
in there comes from this.

> I want the API to pass a port-profile name plus other information associated
> with the VM i/f to some mgmt object which can setup the virtual port backing
> the VM i/f.  (And unset it).  Using netlink for API let's that object be in
> user- or kernel-space software, hardware or firmware.  How the object sets
> up the virtual port based on the passed port-profile is beyond the scope of
> the API.
> 
> My last port-profile patch is this API.  It gives us this:
> 
>     1) single netlink msg for kernel- and user-space

Except for the VF argument, which is kernel- only, and it doesn't work
if user space needs additional information that only the firmware has
in your case (something like a virtual channel ID).

>     2) single parameter set from sender's perspective (libvirt)
>     3) single XML representation of parameters
>     4) single code path in kernel and libvirt

>     5) (potential) cross-vendor-switch VM migration

You cannot do live migration between Cisco switches and those implementing
802.1Qbg, because of the differences between port profiles and VSI types,
e.g. how they handle VLANs or handover during migration.
Migration between 802.1Qbg capable switches is covered by the standard.

>     6) admin-friendly port-profile names

>     7) allows pre-standard (802.1Qbg/bh) details to change
>        without bogging down the API

We're basically nailing down the API right here. As soon as this is
supported in Linux, it will be the standard, so the details won't
be able to change any more.

> What I proposed on the libvirt list is to maintain a mapping database from
> port-profile to vendor-specific or protocol-specific parameters.  Using VDP
> as an example, a port-profile would resolve the VDP tuple:
> 
>      port-profile: "joes-garage"   --->  VSI Manager ID: 15
>                                          VSI Type ID: 12345
>                                          VSI Type ID Ver: 1
>                                          other VSI settings (preassociate)

I agree that the port profile name matches the VSI manager/type/version ID
to a large degree and that it would be nice to unify these, but I don't think
there is a point given all the other differences.

There is no way around the preassociate/associate/disassociate messages
being part of the API, because otherwise you cannot do seamless migration
across multiple switches. 

Also, the primary key on VDP is the VSI UUID, which needs to be the same
on the target host after migration, while in your case the switch never
identifies the guest itself, only the port profile.

If I have understood you correctly, the primary key identifying a port
on enic is something that is only visible to the switch and nic firmware,
but never to software, so you identify the guest by its VF number on the
user API.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann May 11, 2010, 2:22 p.m. UTC | #8
On Tuesday 11 May 2010, Stefan Berger wrote:
> Arnd Bergmann <arnd@arndb.de> wrote on 05/11/2010 08:25:27 AM:
> > netdev, Scott Feldman
> > On Tuesday 11 May 2010, Stefan Berger wrote:
> > > Arnd Bergmann <arnd@arndb.de> wrote on 05/10/2010 05:46:37 PM:
> > No. If we have a macvtap device, there is no VF number. The VF number
> > should be known to libvirt in those cases where instead of creating a
> > macvtap device, it assigns a VF of an SR-IOV adapter to the guest.
> 
> The only interface type that currently supports the vsi parameters is the
> 'direct' type of interface which directly maps into macvtap. That's the 
> only one that would currently let you run the setup protocol with the
> switch. Regular tap devices created through other interface types 
> (bridge, network) do not support these parameters and hence you cannot
> run the protocol with the switch. I never tried passthrough but I 
> believe libvirt is not aware of what it is passing through nor do we
> currently support the parameters for passthrough devices.

Ok. I believe we will at least have to add the same kind of setup
to bridged devices as well, not just macvtap.

For SR-IOV with device assignment, I'm not sure. This will be more
important when adapters show up that actually support VEPA in hardware
and don't have their own switch, but even for those with an integrated
switch, it would be nice if we could use VDP correctly.

> > svid is almost vlan (hence S-VLAN), but slightly different and is not
> > currently supported by the kernel. Again, if the implementation is done  in
> > firmware, libvirt needs to set the same S-VLAN ID when setting up the
> > VF and when associating it to the switch.
> 
> The netlink messages go into the kernel and I suppose the driver should be
> able to find out what the S-VLAN ID is that it needs to use, no?

Possibly yes, but that will depend on how the firmware does this. It
may also be possible that adapters implement this similar to what enic
does, which does not expose the S-VLAN ID at all and uses the VF
number as the identifier.

Maybe we should leave out the CDCP stuff for now, until we start seeing
hardware for it.

> > This is a UUID that describes the VSI to the switch. It needs to be
> > unique in the migration domain. For a guest that has multiple
> > macvtap interfaces, you either need to have a single UUID and
> > put all MAC/VLAN pairs into the same netlink message with this
> > UUID, or have one UUID per device. 
> 
> In that case it's the instanceID as proposed in this XML here:
> 
>    <interface type='direct'>
>       <source dev='static' mode='vepa'/>
>       <model type='virtio'/>
>       <vsi managerid='12' typeid='0x123456' typeidversion='1'
>            instanceid='fa9b7fff-b0a0-4893-8e0e-beef4ff18f8f' />
>       <filterref filter='clean-traffic'/>
>    </interface>

Yes.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Kashyap May 11, 2010, 5:15 p.m. UTC | #9
>
> I'm don't think this is going in the right direction.  We're talking a
> pretty simple concept of a port-profile used to configure the virtual port
> backing a VM i/f to something that's trying to munge disjoint protocols
> based on pre-standard work into a single API.  It's forcing all those

The multi-channel setup and VDP can live, and are designed to live
together:
  - multiple channels per link (setup by the bridge) -- CDCP
  - association of a vsi (virtual interface) to bridge ports -- VDP

However, their implementation can be pursued separately.

> pre-standard protocol details into the API, into the XML, and into the mgmt
> software (libvirt), and into the admin's lap.
>
> I want the API to pass a port-profile name plus other information associated

The problem with defining a 'name' as the lookup key to another database is 
that one then requires additional management mechanisms to describe, 
maintain, and map to the real values needed. It lands in admin's lap 
anyway by a different path.


> with the VM i/f to some mgmt object which can setup the virtual port backing
> the VM i/f.  (And unset it).  Using netlink for API let's that object be in
> user- or kernel-space software, hardware or firmware.  How the object sets
> up the virtual port based on the passed port-profile is beyond the scope of
> the API.

Agree.  See Stefan's last patch in libvirt - it converges 802.1Qbh/portprofile
and 802.1Qbg quite well. The netlink message can be deciphered by the
recipient and meets the advantages below.  (Or, one could use a direct 
unix domain socket to communicate with a daemon as well).


>
> My last port-profile patch is this API.  It gives us this:
>
>    1) single netlink msg for kernel- and user-space
>    2) single parameter set from sender's perspective (libvirt)
>    3) single XML representation of parameters
>    4) single code path in kernel and libvirt
>    5) (potential) cross-vendor-switch VM migration
>    6) admin-friendly port-profile names
>    7) allows pre-standard (802.1Qbg/bh) details to change
>       without bogging down the API
>
> What I proposed on the libvirt list is to maintain a mapping database from
> port-profile to vendor-specific or protocol-specific parameters.  Using VDP
> as an example, a port-profile would resolve the VDP tuple:
>
>     port-profile: "joes-garage"   --->  VSI Manager ID: 15
>                                         VSI Type ID: 12345
>                                         VSI Type ID Ver: 1
>                                         other VSI settings (preassociate)
>
> How the mapping database is maintained is beyond the scope of the API.

Yes, but that will add another mechanism to set the mappings. The 
xml posted (on libvirt) defines the vsi values and avoids this 
additional management. It also allows for port-profile name for 802.1Qbh.

thanks
 	Vivek


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index d763358..675d190 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,7 +116,7 @@  enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
-	IFLA_VF_PORT_PROFILE,
+	IFLA_VIRTIF,
 	__IFLA_MAX
 };
 
@@ -262,26 +262,79 @@  struct ifla_vf_info {
 };
 
 enum {
-	IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN,
-	IFLA_VF_PORT_PROFILE_STATUS_SUCCESS,
-	IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS,
-	IFLA_VF_PORT_PROFILE_STATUS_ERROR,
+	IFLA_VIRTIF_UNSPEC,
+	IFLA_VIRTIF_VF,
+	IFLA_VIRTIF_PORT_PROFILE, /* Cisco enic */
+	IFLA_VIRTIF_CHANNEL,	  /* 802.1Qbg CDCP */
+	IFLA_VIRTIF_VSI,	  /* 802.1Qbg VDP */
+	IFLA_VIRTIF_VSI_MAC_VLAN,
+	__IFLA_VIRTIF_MAX,
 };
 
-#define IFLA_VF_PORT_PROFILE_MAX	40
-#define IFLA_VF_UUID_MAX		40
-#define IFLA_VF_CLIENT_NAME_MAX		40
+#define IFLA_VIRTIF_MAX (__IFLA_VIRTIF_MAX - 1)
 
-struct ifla_vf_port_profile {
-	__u32 vf;
+enum {
+	VIRTIF_PORT_PROFILE_STATUS_UNKNOWN,
+	VIRTIF_PORT_PROFILE_STATUS_SUCCESS,
+	VIRTIF_PORT_PROFILE_STATUS_INPROGRESS,
+	VIRTIF_PORT_PROFILE_STATUS_ERROR,
+};
+
+#define VIRTIF_PORT_PROFILE_MAX	40
+#define VIRTIF_UUID_MAX		40
+#define VIRTIF_CLIENT_NAME_MAX	40
+
+struct ifla_virtif_port_profile {
 	__u32 flags;
 	__u32 status;
-	__u8 port_profile[IFLA_VF_PORT_PROFILE_MAX];
+	__u8 port_profile[VIRTIF_PORT_PROFILE_MAX];
 	__u8 mac[32];					/* MAX_ADDR_LEN */
 	/* UUID e.g. "CEEFD3B1-9E11-11DE-BDFD-000BAB01C0FB" */
-	__u8 host_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_name[IFLA_VF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+	__u8 host_uuid[VIRTIF_UUID_MAX];
+	__u8 client_uuid[VIRTIF_UUID_MAX];
+	__u8 client_name[VIRTIF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+};
+
+/*
+ * CDCP and VDP come from the 802.1Qbg standard, these definitions are taken
+ * from the respective TLV definitions in there. Adapters implementing CDCP
+ * or VDP can encapsulate the structures in LLDP/ECP headers and send them
+ * to the switch.
+ */
+struct ifla_virtif_channel {
+	__u16 scid;	/* S-Channel ID */
+	__u16 svid;	/* S-VLAN ID */
+};
+
+enum {
+	VIRTIF_VDP_REQUEST_PREASSOCIATE = 0,
+	VIRVIF_VDP_REQUEST_PREASSOCIATE_RR,
+	VIRVIF_VDP_REQUEST_ASSOCIATE,
+	VIRVIF_VDP_REQUEST_DISASSOCIATE,
+};
+
+enum {
+	VIRTIF_VDP_RESPONSE_SUCCESS = 0,
+	VIRTIF_VDP_RESPONSE_INVALID_FORMAT,
+	VIRTIF_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+	VIRTIF_VDP_RESPONSE_UNUSED_VTID,
+	VIRTIF_VDP_RESPONSE_VTID_VIOLATION,
+	VIRTIF_VDP_RESPONSE_VTID_VERSION_VIOLATION,
+	VIRTIF_VDP_RESPONSE_OUT_OF_SYNC,
+};
+
+struct ifla_virtif_vsi {
+	__u8 mode_request;
+	__u8 mode_response;
+	__u8 vsi_mgr_id;	/* these three define the policy */
+	__u8 vsi_type_id[3]; 	/*   for the guest */
+	__u8 vsi_type_version;
+	__u8 vsi_instance[16];	/* identifies the guest */
+};
+
+struct ifla_virtif_vsi_mac_vlan {
+	__u8 mac[6];
+	__u8 vlan_id[2];
 };
 
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f5b0be5..d549a3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -687,9 +687,9 @@  struct netdev_rx_queue {
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  * int (*ndo_get_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -741,10 +741,10 @@  struct net_device_ops {
 						     struct ifla_vf_info *ivf);
 	int			(*ndo_set_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 	int			(*ndo_get_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2ef45b..cf5e328 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,6 +653,22 @@  static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		return 0;
 }
 
+static size_t rtnl_virtif_get_size(const struct net_device *dev)
+{
+	size_t total = 0;
+
+	if (dev->netdev_ops->ndo_get_vf_port_profile) {
+		total = dev_num_vf(dev->dev.parent) *
+			(nla_total_size(sizeof(struct ifla_virtif_port_profile))
+			 + nla_total_size(4)); /* IFLA_VIRTIF_VF */
+	}
+
+	/* fill in IFLA_VIRTIF_CHANNEL and IFLA_VIRTIF_VSI when needed */
+		
+	return total;
+}
+
+
 static inline size_t if_nlmsg_size(const struct net_device *dev)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -674,6 +690,38 @@  static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(4) /* IFLA_NUM_VF */
 	       + nla_total_size(rtnl_vfinfo_size(dev)) /* IFLA_VFINFO */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
+	       + rtnl_virtif_get_size(dev); /* IFLA_VIRTIF */
+}
+
+static int rtnl_virtif_fill_ifinfo(struct sk_buff *skb, struct net_device *dev)
+{
+	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
+		struct ifla_virtif_port_profile ivp;
+
+		if (dev_num_vf(dev->dev.parent)) {
+			int i;
+
+			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
+				if (dev->netdev_ops->ndo_get_vf_port_profile(
+					dev, i, &ivp))
+					break;
+				NLA_PUT_U32(skb, IFLA_VIRTIF_VF, i);
+				NLA_PUT(skb, IFLA_VIRTIF_PORT_PROFILE,
+					sizeof(ivp), &ivp);
+			}
+		}
+#if 0 /* Not sure how we should do this -arnd */
+		 else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
+			0, &ivp)) {
+			NLA_PUT(skb, VIRTIF_PORT_PROFILE,
+				sizeof(ivp), &ivp);
+		}
+#endif
+	}
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
 }
 
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
@@ -761,25 +809,8 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
-		struct ifla_vf_port_profile ivp;
-
-		if (dev_num_vf(dev->dev.parent)) {
-			int i;
-
-			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
-				if (dev->netdev_ops->ndo_get_vf_port_profile(
-					dev, i, &ivp))
-					break;
-				NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-					sizeof(ivp), &ivp);
-			}
-		} else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
-			0, &ivp)) {
-			NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-				sizeof(ivp), &ivp);
-		}
-	}
+	if (rtnl_virtif_fill_ifinfo(skb, dev))
+		goto nla_put_failure;
 
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
@@ -847,8 +878,7 @@  const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
-	[IFLA_VF_PORT_PROFILE]	= { .type = NLA_BINARY,
-				    .len = sizeof(struct ifla_vf_port_profile)},
+	[IFLA_VIRTIF]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -857,6 +887,18 @@  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
 };
 
+static const struct nla_policy ifla_virtif_policy[IFLA_VIRTIF_MAX+1] = {
+	[IFLA_VIRTIF_VF]	   = { .type = NLA_U32 },
+	[IFLA_VIRTIF_PORT_PROFILE] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_port_profile) },
+	[IFLA_VIRTIF_CHANNEL]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_channel) },
+	[IFLA_VIRTIF_VSI]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi) },
+	[IFLA_VIRTIF_VSI_MAC_VLAN] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi_mac_vlan) },
+};
+				    
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -1053,16 +1095,30 @@  static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
-	if (tb[IFLA_VF_PORT_PROFILE]) {
-		struct ifla_vf_port_profile *ivp;
-		ivp = nla_data(tb[IFLA_VF_PORT_PROFILE]);
+	if (tb[IFLA_VIRTIF]) {
+		struct ifla_virtif_port_profile *ivp;
+		struct nlattr *virtif[IFLA_VIRTIF_MAX+1];
+		u32 vf;
+
+		err = nla_parse_nested(virtif, IFLA_VIRTIF_MAX,
+				       tb[IFLA_VIRTIF], ifla_virtif_policy);
+		if (err < 0)
+			return err;
+
+		if (!virtif[IFLA_VIRTIF_VF] || !virtif[IFLA_VIRTIF_PORT_PROFILE])
+			goto novirtif; /* IFLA_VIRTIF may be directed at user space */
+
+		vf = nla_get_u32(virtif[IFLA_VIRTIF_VF]);
+		ivp = nla_data(virtif[IFLA_VIRTIF_PORT_PROFILE]);
+
 		err = -EOPNOTSUPP;
 		if (ops->ndo_set_vf_port_profile)
-			err = ops->ndo_set_vf_port_profile(dev, ivp->vf, ivp);
+			err = ops->ndo_set_vf_port_profile(dev, vf, ivp);
 		if (err < 0)
 			goto errout;
 		modified = 1;
 	}
+novirtif:
 	err = 0;
 
 errout: