Message ID: 20210509140305.1910796-1-frode.nordahl@canonical.com
State:      Superseded
Series:     [ovs-dev,RFC] Introduce representor port plugging support
On 5/9/21 4:03 PM, Frode Nordahl wrote:
> Introduce plugging module that adds and removes ports on the
> integration bridge, as directed by Port_Binding options.
>
> Traditionally it has been the CMS's responsibility to create Virtual
> Interfaces (VIFs) as part of instance (container, pod, virtual
> machine, etc.) life cycle, and subsequently manage plug/unplug
> operations on the Open vSwitch integration bridge.
>
> With the advent of NICs connected to multiple distinct CPUs we can
> have a topology where the instance runs on one host and Open
> vSwitch and OVN run on a different host, the SmartNIC CPU.
>
> The act of plugging and unplugging the representor port in Open
> vSwitch running on the SmartNIC host CPU would be the same for
> every SmartNIC variant (thanks to the devlink-port[0][1]
> infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.).
> As such it is natural to extend OVN to provide this common
> functionality through its CMS-facing API.

Hi, Frode.  Thanks for putting this together, but it doesn't look
natural to me.  OVN, AFAIK, has never touched physical devices or
interacted with the kernel directly.  This change introduces completely
new functionality inside OVN.  To the same effect we could run a fully
separate service on these SmartNIC CPUs that does the plugging and
configuration job for the CMS.  You could even make it independent of
any particular CMS by giving it a REST API or whatever.  That would
additionally allow the same service to be used in non-OVN setups.

Interaction with physical devices also makes OVN Linux-dependent, at
least for this use case, IIUC.

Maybe others have different opinions.

Another thought: there is, obviously, a network connection between the
host and the SmartNIC system.  Maybe it's possible to just add an extra
remote to the local ovsdb-server, so that a CMS daemon on the host
system could add interfaces over the network connection?

Best regards, Ilya Maximets.
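For context, the plug/unplug operations the commit message describes
are today performed by a CMS agent on the hypervisor: the VIF is added
to the integration bridge and tagged with its logical port name so that
ovn-controller can bind it.  A minimal sketch of that traditional flow
(bridge, interface and logical port names illustrative):

    # Plug: add the VIF to the integration bridge and tag it with the
    # OVN logical switch port name; ovn-controller then claims the port.
    $ ovs-vsctl add-port br-int vnet0 -- \
          set Interface vnet0 external_ids:iface-id=lsp-vm1

    # Unplug: remove the port again when the instance goes away.
    $ ovs-vsctl del-port br-int vnet0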
On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 5/9/21 4:03 PM, Frode Nordahl wrote:
> > Introduce plugging module that adds and removes ports on the
> > integration bridge, as directed by Port_Binding options.
[...]
> Hi, Frode.  Thanks for putting this together, but it doesn't look
> natural to me.  OVN, AFAIK, has never touched physical devices or
> interacted with the kernel directly.  This change introduces completely
> new functionality inside OVN.  To the same effect we could run a fully
> separate service on these SmartNIC CPUs that does the plugging and
> configuration job for the CMS.  You could even make it independent of
> any particular CMS by giving it a REST API or whatever.  That would
> additionally allow the same service to be used in non-OVN setups.

Ilya,

Thank you for taking the time to comment, much appreciated.

Yes, this is new functionality; NICs with separate control plane CPUs
and isolation from the host are also new, so this is one proposal for
how we could go about enabling their use.

The OVN controller does today get pretty close to the physical realm
by maintaining patch ports in Open vSwitch based on bridge mapping
configuration and the presence of bridges to physical interfaces.  It
also reacts to events of physical interfaces being plugged into the
Open vSwitch instance it manages, albeit to date some other entity has
been doing the act of adding the port into the bridge.

The rationale for proposing to use the OVN database for coordinating
this is that the information about which ports to bind, and where to
bind them, is already there.  The timing of the information flow from
the CMS is also suitable for the task.

OVN relies on OVS library code, and all the necessary libraries for
interfacing with the kernel through netlink and friends are there or
would be easy to add.  The rationale for using the netlink-devlink
interface is that it provides a generic infrastructure for these types
of NICs, so by using this interface we should be able to support most
if not all of the variants of these cards.

Providing a separate OVN service to do the task could work, but would
have the cost of an extra SB DB connection, IDL and monitors.

I fear it would be quite hard to build a whole separate project with
its own API; it feels like a lot of duplicated effort when the flow of
data and APIs in OVN already align so well with the CMSs interested in
using this.

> Interaction with physical devices also makes OVN Linux-dependent, at
> least for this use case, IIUC.

This specific bit would be Linux-specific in the first iteration, yes.
But the vendors manufacturing and distributing the hardware often do
have drivers for other platforms; I am sure the necessary
infrastructure will become available there too over time, if it is not
there already.

We do currently have platform-specific macros in the OVN build system,
so we could enable the functionality when built on a compatible
platform.

> Maybe others have different opinions.

I appreciate your opinion, and enjoy discussing this topic.

> Another thought: there is, obviously, a network connection between the
> host and the SmartNIC system.  Maybe it's possible to just add an extra
> remote to the local ovsdb-server, so that a CMS daemon on the host
> system could add interfaces over the network connection?

There are a few issues with such an approach.  One of the main goals of
providing and using a NIC with control plane CPUs is having an extra
layer of security and isolation, separate from the hypervisor host the
card happens to share a PCI complex with and draw power from.
Requiring a connection between the two for operation would defeat this
purpose.

In addition to that, this class of cards provides visibility into
kernel interfaces, enumeration of representor ports, etc. only from the
NIC control plane CPU side of the PCI complex; this information is not
provided to the host.  So if a hypervisor host CMS agent were to do the
plugging through a remote ovsdb connection, it would have to
communicate with something else running on the NIC control plane CPU to
retrieve the information it needs before it could know what to relay
back over the ovsdb connection.

--
Frode Nordahl
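The devlink-port infrastructure referred to above is what makes
representor enumeration uniform across vendors, and it is only visible
from the SmartNIC side of the PCI complex.  On the SmartNIC CPU the
representors can be listed along these lines (device addresses, port
indices and netdev names illustrative):

    $ devlink port show
    pci/0000:03:00.0/0: type eth netdev p0 flavour physical port 0
    pci/0000:03:00.0/1: type eth netdev pf0vf0 flavour pcivf pfnum 0 vfnum 0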
On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <frode.nordahl@canonical.com> wrote:
>
> The rationale for proposing to use the OVN database for coordinating
> this is that the information about which ports to bind, and where to
> bind them, is already there.  The timing of the information flow from
> the CMS is also suitable for the task.
[...]
> I fear it would be quite hard to build a whole separate project with
> its own API; it feels like a lot of duplicated effort when the flow of
> data and APIs in OVN already align so well with the CMSs interested in
> using this.

Here are my 2 cents.

Initially I had similar concerns to Ilya, and it seemed OVN should stay
away from physical interface plugging.  As a reference, here is how
ovn-kubernetes is doing it without adding anything to OVN:
https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing

However, thinking more about it, the approach proposed in this patch
just expands the way OVN can bind ports, utilizing OVN's own
communication channel (the OVSDB connections).  If all the information
regarding port binding can be specified by the CMS from the NB, then it
is not unnatural for ovn-controller to perform interface binding
directly (instead of passively accepting what is attached by the CMS).
This kind of information already exists to some extent - the
"requested-chassis" option used by OpenStack.  This idea just expands
it down to a specific interface.  The difference is that
"requested-chassis" is used for validation only, whereas now we want to
apply it directly.  So at least I don't have a strong opinion against
the idea.

There are some benefits:
1) The mechanism can be reused by different CMSes, which may simplify
   CMS implementations.
2) Compared with the ovn-k8s approach, it reuses OVN's communication
   channel, which avoids an extra CMS communication channel on the
   SmartNIC side.  (Of course this can also be achieved by a connection
   between the BM and the SmartNIC with a *restricted* API just to
   convey the necessary information.)

On the negative side, it would increase OVN's complexity and, as
mentioned by Ilya, potentially break OVN's platform independence.  To
avoid this, I think the *plugging* module itself needs to be
independent and pluggable, extensible through independent plugins.  A
plugin would define what information is needed in the LSP's "options"
and then implement the corresponding driver.  With this approach, even
regular VIFs could be attached by ovn-controller if the CMS can tell it
the interface name.

Anyway, this is just my brief thinking.

Thanks,
Han
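To make the pluggable-module idea concrete: under such a model the CMS
would express everything a plugging driver needs as options on the
logical switch port in the NB.  A hypothetical sketch follows;
"requested-chassis" is an existing NB option, but the "plug-*" keys are
purely illustrative placeholders, not names defined by this RFC:

    # Hypothetical: CMS asks for the port to be bound on a given
    # chassis and plugged by a representor-port driver.
    $ ovn-nbctl lsp-add ls0 lsp-vm1
    $ ovn-nbctl lsp-set-options lsp-vm1 \
          requested-chassis=smartnic-chassis \
          plug-type=representor \
          plug-pf-mac=00:53:00:00:00:42 plug-vf-num=0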
On 6/10/21 8:36 AM, Han Zhou wrote:
>
> On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <frode.nordahl@canonical.com> wrote:
>> Providing a separate OVN service to do the task could work, but would
>> have the cost of an extra SB DB connection, IDL and monitors.

IMHO, a CMS should never connect to the Southbound DB.  The Southbound
DB is not meant to be a public interface; it just happens to be
available for connections.  I know that OpenStack has metadata agents
that connect to the Sb DB, but if that is really required for them, I
think there should be a different way to get/set the required
information without a connection to the Southbound.

> Initially I had similar concerns to Ilya, and it seemed OVN should
> stay away from physical interface plugging.  As a reference, here is
> how ovn-kubernetes is doing it without adding anything to OVN:
> https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing

AFAICT, a big part of the work is already done on the ovn-k8s side:
https://github.com/ovn-org/ovn-kubernetes/pull/2005
https://github.com/ovn-org/ovn-kubernetes/pull/2042

> However, thinking more about it, the approach proposed in this patch
> just expands the way OVN can bind ports, utilizing OVN's own
> communication channel (the OVSDB connections).
[...]
> So at least I don't have a strong opinion against the idea.

While it's probably OK for OVN to add a port to the OVSDB, in many
cases these ports will require a lot of extra configuration, which is
typically done by os-vif or CNI/device plugins.  Imagine that OVS is
running with the userspace datapath and you need to plug in some DPDK
ports, where you have to specify the port type, the DPDK port config,
probably also the number of rx/tx queues and the number of descriptors
in those queues.  You may also want to configure affinity for these
queues per PMD thread in OVS.  For kernel interfaces it might be
easier, but they also might require some extra configuration that OVN
would now have to think about.  This is a typical job for the CMS, and
that is why projects like os-vif and the large variety of CNI/device
plugins exist.

> There are some benefits:
> 1) The mechanism can be reused by different CMSes, which may simplify
>    CMS implementations.
> 2) Compared with the ovn-k8s approach, it reuses OVN's communication
>    channel, which avoids an extra CMS communication channel on the
>    SmartNIC side.

The problem I see is that at least ovn-k8s, AFAIU, will require a
daemon on the SmartNIC anyway to monitor and configure OVN components,
e.g. to configure the ovn-remote for ovn-controller or to run
management appctl commands if required.  So I don't see why it can't do
the plugging if it's already there.

> On the negative side, it would increase OVN's complexity and, as
> mentioned by Ilya, potentially break OVN's platform independence.  To
> avoid this, I think the *plugging* module itself needs to be
> independent and pluggable, extensible through independent plugins.

Aside from plugging, I also don't see the reason to have devlink code
in OVN just because it runs once at the init stage, IIUC.  So I don't
understand why this information about the hardware cannot be retrieved
during host provisioning and stored somewhere (Nova config?).  Isn't
hardware inventory a job for TripleO or something?

Best regards, Ilya Maximets.
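As an example of the per-port tuning described above, this is roughly
what a CMS (or os-vif) applies when plugging a userspace-datapath port
today; the options shown are standard OVS interface settings, with all
values illustrative:

    # DPDK port with explicit rx queue and descriptor counts, plus PMD
    # affinity pinning rx queues 0 and 1 to cores 3 and 7.
    $ ovs-vsctl add-port br-int dpdk-p0 -- \
          set Interface dpdk-p0 type=dpdk \
              options:dpdk-devargs=0000:03:00.0 \
              options:n_rxq=2 options:n_rxq_desc=1024 \
              other_config:pmd-rxq-affinity="0:3,1:7"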
On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> IMHO, a CMS should never connect to the Southbound DB.  The Southbound
> DB is not meant to be a public interface; it just happens to be
> available for connections.
[...]

The CMS-facing API is the Northbound DB; I was not suggesting direct
use of the Southbound DB by services external to OVN.  My suggestion
was to have a separate OVN process do this, if your objection was to
handling it as part of the ovn-controller process.

> AFAICT, a big part of the work is already done on the ovn-k8s side:
> https://github.com/ovn-org/ovn-kubernetes/pull/2005
> https://github.com/ovn-org/ovn-kubernetes/pull/2042

I am aware of the ongoing effort to implement support for this in
ovn-kubernetes directly.  What we have identified is that other CMSs
want this functionality too, and with that we have an opportunity to
generalise and provide an abstraction in a common place that all the
consuming CMSes can benefit from.

> While it's probably OK for OVN to add a port to the OVSDB, in many
> cases these ports will require a lot of extra configuration, which is
> typically done by os-vif or CNI/device plugins.
[...]
> This is a typical job for the CMS, and that is why projects like
> os-vif and the large variety of CNI/device plugins exist.

CNI is Kubernetes-specific, os-vif is OpenStack-specific, and both of
them get their information from the CMS.  Providing support for the
userspace datapath and DPDK would require more information, some of
which is available through devlink and some of which fits well in
key/value options.  Our initial target would be to support the kernel
representor port workflow.

> The problem I see is that at least ovn-k8s, AFAIU, will require a
> daemon on the SmartNIC anyway to monitor and configure OVN components,
> e.g. to configure the ovn-remote for ovn-controller or to run
> management appctl commands if required.  So I don't see why it can't
> do the plugging if it's already there.

This pattern is Kubernetes-specific and is not the case for other
CMSes.  The current proposal for enabling SmartNICs with control plane
CPUs in ovn-kubernetes could be simplified if the networking platform
provided the means for coordinating more of the network-related bits.

> Aside from plugging, I also don't see the reason to have devlink code
> in OVN just because it runs once at the init stage, IIUC.  So I don't
> understand why this information about the hardware cannot be retrieved
> during host provisioning and stored somewhere (Nova config?).  Isn't
> hardware inventory a job for TripleO or something?

As noted in one of the TODOs in the commit message of the RFC, one of
the work items for developing this further is to monitor devlink for
changes; the end product will not do one-time initialization.

While I agree with you that the current SR-IOV acceleration workflow
configuration is pretty static and can be done at deploy time, this
proposal prepares for the next-generation subfunction workflow, where
you will have much higher density and run-time configuration and
discovery of representor ports.  There is a paper about it from netdev
conf[2], and all of this is part of the devlink infrastructure (a
Linux kernel ABI).

2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf

--
Frode Nordahl
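For reference, the run-time subfunction workflow the netdev paper
describes is itself driven through devlink; on the SmartNIC CPU it
looks roughly as follows (device address, port index and sfnum
illustrative, following the kernel's devlink-port documentation):

    # Create a subfunction port, give its function a MAC, activate it;
    # a representor netdev appears and can then be plugged into OVS.
    $ devlink port add pci/0000:03:00.0 flavour pcisf pfnum 0 sfnum 88
    $ devlink port function set pci/0000:03:00.0/32768 \
          hw_addr 00:53:00:00:88:88
    $ devlink port function set pci/0000:03:00.0/32768 state active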
On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl <frode.nordahl@canonical.com> wrote:
>
> While I agree with you that the current SR-IOV acceleration workflow
> configuration is pretty static and can be done at deploy time, this
> proposal prepares for the next-generation subfunction workflow, where
> you will have much higher density and run-time configuration and
> discovery of representor ports.
[...]

My 2 cents.

I agree with Ilya.  It doesn't seem natural to me for OVN to create OVS
ports and also to use the devlink infrastructure.  I think an external
entity can do all of this.

The other downside I see with this RFC is that it will now be
monitoring all the Port_Binding rows without any condition, because
when a port binding is created the chassis column will be NULL.
Correct me if I'm wrong here.

Perhaps there can be a separate project called ovn-vif which does this?

Thanks
Numan
On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique <numans@ovn.org> wrote: > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl > <frode.nordahl@canonical.com> wrote: > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maximets@ovn.org> wrote: > > > > > > On 6/10/21 8:36 AM, Han Zhou wrote: > > > > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <frode.nordahl@canonical.com <mailto:frode.nordahl@canonical.com>> wrote: > > > >> > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote: > > > >> > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote: > > > >> > > Introduce plugging module that adds and removes ports on the > > > >> > > integration bridge, as directed by Port_Binding options. > > > >> > > > > > >> > > Traditionally it has been the CMSs responsibility to create Virtual > > > >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual > > > >> > > Machine etc.) life cycle, and subsequently manage plug/unplug > > > >> > > operations on the Open vSwitch integration bridge. > > > >> > > > > > >> > > With the advent of NICs connected to multiple distinct CPUs we can > > > >> > > have a topology where the instance runs on one host and Open > > > >> > > vSwitch and OVN runs on a different host, the smartnic CPU. > > > >> > > > > > >> > > The act of plugging and unplugging the representor port in Open > > > >> > > vSwitch running on the smartnic host CPU would be the same for > > > >> > > every smartnic variant (thanks to the devlink-port[0][1] > > > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.). > > > >> > > As such it is natural to extend OVN to provide this common > > > >> > > functionality through its CMS facing API. > > > >> > > > > >> > Hi, Frode. Thanks for putting this together, but it doesn't look > > > >> > natural to me. OVN, AFAIK, never touched physical devices or > > > >> > interacted with the kernel directly. This change introduces completely > > > >> > new functionality inside OVN. With the same effect we can run a fully > > > >> > separate service on these smartnic CPUs that will do plugging > > > >> > and configuration job for CMS. You may even make it independent > > > >> > from a particular CMS by creating a REST API for it or whatever. > > > >> > This will additionally allow using same service for non-OVN setups. > > > >> > > > >> Ilya, > > > >> > > > >> Thank you for taking the time to comment, much appreciated. > > > >> > > > >> Yes, this is new functionality, NICs with separate control plane CPUs > > > >> and isolation from the host are also new, so this is one proposal for > > > >> how we could go about to enable the use of them. > > > >> > > > >> The OVN controller does today get pretty close to the physical realm > > > >> by maintaining patch ports in Open vSwitch based on bridge mapping > > > >> configuration and presence of bridges to physical interfaces. It also > > > >> does react to events of physical interfaces being plugged into the > > > >> Open vSwitch instance it manages, albeit to date some other entity has > > > >> been doing the act of adding the port into the bridge. > > > >> > > > >> The rationale for proposing to use the OVN database for coordinating > > > >> this is that the information about which ports to bind, and where to > > > >> bind them is already there. The timing of the information flow from > > > >> the CMS is also suitable for the task. 
> > > >> > > > >> OVN relies on OVS library code, and all the necessary libraries for > > > >> interfacing with the kernel through netlink and friends are there or > > > >> would be easy to add. The rationale for using the netlink-devlink > > > >> interface is that it provides a generic infrastructure for these types > > > >> of NICs. So by using this interface we should be able to support most > > > >> if not all of the variants of these cards. > > > >> > > > >> > > > >> Providing a separate OVN service to do the task could work, but would > > > >> have the cost of an extra SB DB connection, IDL and monitors. > > > > > > IMHO, CMS should never connect to Southbound DB. It's just because the > > > Southbound DB is not meant to be a public interface, it just happened > > > to be available for connections. I know that OpenStack has metadata > > > agents that connects to Sb DB, but if it's really required for them, I > > > think, there should be a different way to get/set required information > > > without connection to the Southbound. > > > > The CMS-facing API is the Northbound DB, I was not suggesting direct > > use of the Southbound DB by external to OVN services. My suggestion > > was to have a separate OVN process do this if your objection was to > > handle it as part of the ovn-controller process. > > > > > >> > > > >> I fear it would be quite hard to build a whole separate project with > > > >> its own API, feels like a lot of duplicated effort when the flow of > > > >> data and APIs in OVN already align so well with CMSs interested in > > > >> using this? > > > >> > > > >> > Interactions with physical devices also makes OVN linux-dependent > > > >> > at least for this use case, IIUC. > > > >> > > > >> This specific bit would be linux-specific in the first iteration, yes. > > > >> But the vendors manufacturing and distributing the hardware do often > > > >> have drivers for other platforms, I am sure the necessary > > > >> infrastructure will become available there too over time, if it is not > > > >> there already. > > > >> > > > >> We do currently have platform specific macros in the OVN build system, > > > >> so we could enable the functionality when built on a compatible > > > >> platform. > > > >> > > > >> > Maybe, others has different opinions. > > > >> > > > >> I appreciate your opinion, and enjoy discussing this topic. > > > >> > > > >> > Another though is that there is, obviously, a network connection > > > >> > between the host and smartnic system. Maybe it's possible to just > > > >> > add an extra remote to the local ovsdb-server so CMS daemon on the > > > >> > host system could just add interfaces over the network connection? > > > >> > > > >> There are a few issues with such an approach. One of the main goals > > > >> with providing and using a NIC with control plane CPUs is having an > > > >> extra layer of security and isolation which is separate from the > > > >> hypervisor host the card happens to share a PCI complex with and draw > > > >> power from. Requiring a connection between the two for operation would > > > >> defy this purpose. > > > >> > > > >> In addition to that, this class of cards provide visibility into > > > >> kernel interfaces, enumeration of representor ports etc. only from the > > > >> NIC control plane CPU side of the PCI complex, this information is not > > > >> provided to the host. 
So if a hypervisor host CMS agent were to do the > > > >> plugging through a remote ovsdb connection, it would have to > > > >> communicate with something else running on the NIC control plane CPU > > > >> to retrieve the information it needs before it can know what to relay > > > >> back over the ovsdb connection. > > > >> > > > >> -- > > > >> Frode Nordahl > > > >> > > > >> > Best regards, Ilya Maximets. > > > > > > > > Here are my 2 cents. > > > > > > > > Initially I had similar concerns to Ilya, and it seems OVN should stay away from the physical interface plugging. As a reference, here is how ovn-kubernetes is doing it without adding anything to OVN: https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing <https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing> > > > > > > AFAICT, a big part of the work is already done on the ovn-k8s side: > > > https://github.com/ovn-org/ovn-kubernetes/pull/2005 > > > https://github.com/ovn-org/ovn-kubernetes/pull/2042 > > > > I am aware of the on-going effort to implement support for this in > > ovn-kubernetes directly. What we have identified is that there are > > other CMSs that want this functionality, and with that we have an > > opportunity to generalise and provide an abstraction in a common place > > that all the consuming CMSes can benefit from. > > > > > > > > > > However, thinking more about it, the proposed approach in this patch just expands the way how OVN can bind ports, utilizing the communication channel of OVN (OVSDB connections). If all the information regarding port binding can be specified by the CMS from NB, then it is not unnatural for ovn-controller to perform interface binding directly (instead of passively accepting what is attached by CMS). This kind of information already existed to some extent - the "requested_chassis" option in OpenStack. Now it seems this idea is just expanding it to a specific interface. The difference is that "requested_chassis" is used for validation only, but now we want to directly apply it. So I think at least I don't have a strong opinion on the idea. > > > > > > While it's, probably, OK for OVN to add port to the OVSDB, in many > > > cases these ports will require a lot of extra configuration which > > > is typically done by os-vif or CNI/device plugins. Imagine that OVS > > > is running with userspace datapath and you need to plug-in some DPDK > > > ports, where you have to specify the port type, DPDK port config, > > > porbably also number of rx/tx queues, number of descriptors in these > > > queues. You may also want to configure affinity for these queues per > > > PMD thread in OVS. For kernel interfaces it might be easier, but they > > > also might require some extra configuration that OVN will have to > > > think about now. This is a typical job for CMS to configure this > > > kind of stuff, and that is why projects like os-vif or large variety > > > of CNI/device plugins exists. > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. And both of > > them get their information from the CMS. Providing support for > > userspace datapath and DPDK would require more information, some of > > which is available through devlink, some fit well in key/value > > options. Our initial target would be to support the kernel representor > > port workflow. > > > > > > > > > > There are some benefits: > > > > 1) The mechanism can be reused by different CMSes, which may simplify CMS implementation. 
> > > > 2) Compared with the ovn-k8s approach, it reuses OVN's communication
> > > > channel, which avoids an extra CMS communication channel on the smart
> > > > NIC side. (of course this can be achieved by a connection between the
> > > > BM and the smart NIC with a *restricted* API just to convey the
> > > > necessary information)
> > >
> > > The problem I see is that at least ovn-k8s, AFAIU, will require
> > > the daemon on the Smart NIC anyway to monitor and configure
> > > OVN components, e.g. to configure ovn-remote for ovn-controller
> > > or run management appctl commands if required.
> > > So, I don't see why it can't do plugging if it's already there.
> >
> > This pattern is Kubernetes specific and is not the case for other
> > CMSes. The current proposal for enabling SmartNICs with control plane
> > CPUs for ovn-kubernetes could be simplified if the networking platform
> > provided means for coordinating more of the network related bits.
> >
> > > > As to the negative side, it would increase OVN's complexity, and as
> > > > mentioned by Ilya potentially break OVN's platform independence. To
> > > > avoid this, I think the *plugging* module itself needs to be
> > > > independent and pluggable. It can be extended as independent plugins.
> > > > The plugin would need to define what information is needed in the
> > > > LSP's "options", and then implement corresponding drivers. With this
> > > > approach, even regular VIFs can be attached by ovn-controller if the
> > > > CMS can tell it the interface name. Anyway, this is just my brief
> > > > thinking.
> > >
> > > Aside from plugging, I also don't see the reason to have devlink code
> > > in OVN just because it runs once on the init stage, IIUC. So, I don't
> > > understand why this information about the hardware cannot be
> > > retrieved during the host provisioning and stored somewhere
> > > (Nova config?). Isn't hardware inventory a job for tripleo
> > > or something?
> >
> > As noted in one of the TODOs in the commit message of the RFC, one of
> > the work items to further develop this is to monitor devlink for
> > changes; the end product will not do one-time initialization.
> >
> > While I agree with you that the current SR-IOV acceleration workflow
> > configuration is pretty static and can be done at deploy time, this
> > proposal prepares for the next generation subfunction workflow where
> > you will have a much higher density, and run-time configuration and
> > discovery of representor ports. There is a paper about it from netdev
> > conf[2], and all of this is part of the devlink infrastructure (Linux
> > kernel ABI).
> >
> > 2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
>
> My 2 cents.

Thank you for weighing in with your opinion, Numan.

> I agree with Ilya. It doesn't seem natural to me for OVN to create
> OVS ports and also to use the devlink infrastructure. I think an
> external entity can do all these.

No doubt an external entity could do this; the challenge is that, due
to how SmartNICs with separate control plane CPUs are laid out,
existing CMS tooling cannot do it, and we are working out a way to
efficiently and securely coordinate how and when this external entity
does the plugging. At the end of the day the data being passed around
is networking related, and I believe it is possible to create a
solution that would be consumable by all OVN-enabled CMSes. We'd be
happy to prove this out outside of ovn-controller first if that is
more palatable, see below.
> The other downside I see with this RFC is that it will now be
> monitoring all the port binding rows without any condition, because
> when a port binding is created the chassis column will be NULL.
> Correct me if I'm wrong here.

The intention of the condition is to consider unbound ports for
plugging; as soon as they have been plugged and bound there will be a
value in the chassis column, and at that point only the respective
chassis will monitor them. I guess this could be a problem if there is
a large number of unbound ports in a deployment; from your experience,
would that be the case? (When I think of it, I do remember the load
balancer VIP perhaps being a consumer of unbound ports.)

We considered using a separate LSP type, but apart from the plugging
bit these LSPs will behave just like regular VIFs from the OVN Port
Binding code's POV, and if all your hypervisors have SmartNICs with
control plane CPUs there would be just as many of those ports.

> Perhaps there can be a separate project called - ovn-vif which does this ?

Creating a separate ovn-org project called ovn-vif would allow us to
move forward; we would of course have to document the optional CMS API
in core OVN pointing to an optional external dependency for their use.
Would this be an acceptable compromise?

--
Frode Nordahl
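For reference, the monitor condition being discussed here maps onto the
OVSDB IDL roughly as below. This is a minimal sketch modelled on the
update_sb_monitors() pattern in ovn-controller.c; the function name is
made up, and it assumes the idlc-generated clause helper accepts NULL
to mean the empty set (i.e. "chassis == []", an unbound port):

    #include "ovsdb-idl.h"
    #include "lib/ovn-sb-idl.h"   /* idlc-generated Southbound bindings. */

    /* Scope the Port_Binding monitor to rows that are either unbound
     * (plugging candidates) or already bound to this chassis. */
    static void
    update_port_binding_monitor(struct ovsdb_idl *ovnsb_idl,
                                const struct sbrec_chassis *chassis)
    {
        struct ovsdb_idl_condition pb = OVSDB_IDL_CONDITION_INIT(&pb);

        if (chassis) {
            /* Rows already bound to this chassis. */
            sbrec_port_binding_add_clause_chassis(&pb, OVSDB_F_EQ,
                                                  &chassis->header_.uuid);
        }

        /* Unbound rows; this is the clause in question above, since every
         * not-yet-bound row in the deployment matches it on every
         * chassis. */
        sbrec_port_binding_add_clause_chassis(&pb, OVSDB_F_EQ, NULL);

        ovsdb_idl_set_condition(ovnsb_idl, &sbrec_table_port_binding, &pb);
        ovsdb_idl_condition_destroy(&pb);
    }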
On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl
<frode.nordahl@canonical.com> wrote:

[snip]
> No doubt an external entity could do this; the challenge is that, due
> to how SmartNICs with separate control plane CPUs are laid out,
> existing CMS tooling cannot do it, and we are working out a way to
> efficiently and securely coordinate how and when this external entity
> does the plugging. At the end of the day the data being passed around
> is networking related, and I believe it is possible to create a
> solution that would be consumable by all OVN-enabled CMSes. We'd be
> happy to prove this out outside of ovn-controller first if that is
> more palatable, see below.
>
> > The other downside I see with this RFC is that it will now be
> > monitoring all the port binding rows without any condition, because
> > when a port binding is created the chassis column will be NULL.
> > Correct me if I'm wrong here.
>
> The intention of the condition is to consider unbound ports for
> plugging; as soon as they have been plugged and bound there will be a
> value in the chassis column, and at that point only the respective
> chassis will monitor them. I guess this could be a problem if there is
> a large number of unbound ports in a deployment; from your experience,
> would that be the case? (When I think of it, I do remember the load
> balancer VIP perhaps being a consumer of unbound ports.)
>
> We considered using a separate LSP type, but apart from the plugging
> bit these LSPs will behave just like regular VIFs from the OVN Port
> Binding code's POV, and if all your hypervisors have SmartNICs with
> control plane CPUs there would be just as many of those ports.

In my opinion having a separate type would be better. At the least it
will not impact normal or existing deployments.

Let's say we call the type "representor" (just picking a name for
now); then each ovn-controller can add a conditional monitor to get
updates for these types (just like how we do now for patch or external
ports).

What do you think of the below approach:

1. A new LSP type - representor - is added. The CMS, when creating
this lport, also sets the requested-chassis option.

2. ovn-controller on the requested chassis sees this lport type,
creates an OVS port/interface and possibly sets some options or hints
in the external_ids column of the ovs interface. At this point, it
will not do anything more.

3. An external entity (similar to your proposed pluggable.c) running
on the chassis would have built a shash of "devlink_ports" using the
devlink sockets.

4. This entity will also be monitoring the ovs interfaces. When it
sees a new interface with the proper hints, it will set
external_ids:iface-id=<lport_name>.

5. ovn-controller, when it sees the external_ids:iface-id, will bind
the port and add flows.

With (2) we now add support in ovn-controller to create OVS ports,
but at the same time OVN will not bother about the internal details
of SmartNICs or about the devlink sockets.

I'm not sure if the above approach seems hacky. The external entity
could be called "ovn-vif".

What do you think ?

> > Perhaps there can be a separate project called - ovn-vif which does this ?
>
> Creating a separate ovn-org project called ovn-vif would allow us to
> move forward; we would of course have to document the optional CMS API
> in core OVN pointing to an optional external dependency for their use.
> Would this be an acceptable compromise?

I personally do not have any objections to that. Maybe others have
different opinions.
Thanks
Numan
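Numan's steps (2) and (4) would map onto the OVS IDL roughly as in the
sketch below. Both function names and the "ovn-plug-hint" key are made
up for illustration, and the idlc-generated partial map/set update
helpers are assumed to be available. The sketch simply takes the
interface name as a parameter, which glosses over the issue Frode
raises below: in this model ovn-controller does not yet know the
representor's name. Note also that the Port and Interface rows are
created in a single transaction:

    #include "lib/vswitch-idl.h"  /* idlc-generated Open_vSwitch bindings. */
    #include "smap.h"

    /* Step 2: ovn-controller inserts the port and interface, leaving a
     * hint for the external plugging entity. */
    static void
    create_port_with_hint(struct ovsdb_idl_txn *txn,
                          const struct ovsrec_bridge *br_int,
                          const char *lport_name, const char *ifname)
    {
        struct ovsrec_interface *iface = ovsrec_interface_insert(txn);
        ovsrec_interface_set_name(iface, ifname);

        struct smap ids = SMAP_INITIALIZER(&ids);
        smap_add(&ids, "ovn-plug-hint", lport_name);  /* hypothetical key */
        ovsrec_interface_set_external_ids(iface, &ids);
        smap_destroy(&ids);

        struct ovsrec_port *port = ovsrec_port_insert(txn);
        ovsrec_port_set_name(port, ifname);
        ovsrec_port_set_interfaces(port, &iface, 1);
        ovsrec_bridge_update_ports_addvalue(br_int, port);
    }

    /* Step 4: the external entity, having matched the interface to a
     * devlink port, stamps iface-id so that ovn-controller binds it. */
    static void
    stamp_iface_id(const struct ovsrec_interface *iface,
                   const char *lport_name)
    {
        ovsrec_interface_update_external_ids_setkey(iface, "iface-id",
                                                    lport_name);
    }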
On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <numans@ovn.org> wrote:

[snip]
> In my opinion having a separate type would be better. At the least it
> will not impact normal or existing deployments.
>
> Let's say we call the type "representor" (just picking a name for
> now); then each ovn-controller can add a conditional monitor to get
> updates for these types (just like how we do now for patch or
> external ports).
>
> What do you think of the below approach:
>
> 1. A new LSP type - representor - is added. The CMS, when creating
> this lport, also sets the requested-chassis option.

Works for me.

> 2. ovn-controller on the requested chassis sees this lport type,
> creates an OVS port/interface and possibly sets some options or hints
> in the external_ids column of the ovs interface. At this point, it
> will not do anything more.

This could potentially help us avoid the issue of adding more
processes and IDLs talking to the SB DB.

However, at this point in time ovn-controller does not know the
details of the representor port, such as the name/ifindex of the
kernel representor port. The interface it creates in OVS has to match
the kernel representor port, and AFAICT it is not possible to change
these details after port creation. This information is provided by the
devlink lookup in the current proposal.

It might work if it was possible to create the port and interface
separately, but that would cause a referential integrity violation.
Another alternative could be for ovn-vif to create a new Interface
record in step 4) and attach it to the Port record created by
ovn-controller, replacing a throw-away Interface record, but then we
would be approaching hacky territory.

Could we do some sort of local IPC, perhaps through a Unix domain
socket?

> 3. An external entity (similar to your proposed pluggable.c) running
> on the chassis would have built a shash of "devlink_ports" using the
> devlink sockets.
>
> 4. This entity will also be monitoring the ovs interfaces. When it
> sees a new interface with the proper hints, it will set
> external_ids:iface-id=<lport_name>.
>
> 5. ovn-controller, when it sees the external_ids:iface-id, will bind
> the port and add flows.
>
> With (2) we now add support in ovn-controller to create OVS ports,
> but at the same time OVN will not bother about the internal details
> of SmartNICs or about the devlink sockets.
>
> I'm not sure if the above approach seems hacky. The external entity
> could be called "ovn-vif".
>
> What do you think ?

This is an interesting idea and it could work if we figure out a way
for ovn-controller and ovn-vif to exchange information before the
port/interface is created.

> > > Perhaps there can be a separate project called - ovn-vif which does this ?
> >
> > Creating a separate ovn-org project called ovn-vif would allow us to
> > move forward; we would of course have to document the optional CMS API
> > in core OVN pointing to an optional external dependency for their use.
> > Would this be an acceptable compromise?
>
> I personally do not have any objections to that. Maybe others have
> different opinions.

Great.

Cheers!

--
Frode Nordahl
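The "shash of devlink_ports" from step 3 could be as simple as the
sketch below; the entry layout is assumed, and the netlink/devlink dump
that would populate it is omitted:

    #include <stdint.h>
    #include "openvswitch/shash.h"

    /* Hypothetical record describing one representor port, as learned
     * from a devlink port dump. */
    struct devlink_port_entry {
        char *netdev_name;      /* Kernel name of the representor. */
        uint32_t ifindex;
        uint16_t pf_num;        /* Physical function number. */
        uint32_t vf_or_sf_num;  /* VF number, or subfunction number. */
    };

    static struct shash devlink_ports =
        SHASH_INITIALIZER(&devlink_ports);

    /* Key the table by something the Port_Binding options (or the hints
     * on the OVS interface) can reproduce, e.g. "pf0vf4", so that the
     * lookup at plugging time is a single shash_find(). */
    static void
    devlink_port_add(const char *key, struct devlink_port_entry *entry)
    {
        shash_add(&devlink_ports, key, entry);
    }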
On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl
<frode.nordahl@canonical.com> wrote:

[snip]
> > 1. A new LSP type - representor - is added. The CMS, when creating
> > this lport, also sets the requested-chassis option.
>
> Works for me.

I think it is better not to add a new LSP type for this purpose. It
would be better to make this "interface plugging" feature generic, so
that it can work for other types as well. It could be supported in the
future for a regular VIF, or for a new LSP type, so this behavior is
better not tied to an LSP type. We could add a new column in the
Port_Binding table: plugged_by.
ovn-controller can monitor all port_bindings that have "plugged_by" equal to the current chassis. This way it should be backward compatible, not affecting the use cases that don't need "plugged_by". The details of the plugging information can be stored in the options column with new options defined, such as "plug_type". For representor ports the "plug_type" can be defined as "representor". For each plug_type there can be type-specific information defined and also stored in the options column. (Or we can define a new column, "plug_options", just for plugging information.)

> > 2. ovn-controller on the requested chassis sees this lport type, creates an OVS port/interface, and possibly sets some options or hints in the external_ids column of the OVS interface. At this point, it will not do anything more.

> This could potentially help us avoid the issue of adding more processes and IDLs talking to the SB DB.

> However, at this point in time ovn-controller does not know the details of the representor port, such as the name/ifindex of the kernel representor port; the interface it creates in OVS has to match the kernel representor port, and AFAICT it is not possible to change these details after port creation. This information is provided by the devlink lookup in the current proposal.

> It might work if it were possible to create the port and interface separately, but that would cause a referential integrity violation. Another alternative could be that ovn-vif created a new interface record in step 4), attaching it to the port record created by ovn-controller and replacing a throw-away interface record, but we would be approaching hacky territory.

> Could we do some sort of local IPC, perhaps through a Unix domain socket?

I think it would be good to have separate *modules* to handle interface plugging (one module per plug_type), but there is no need to have a separate *process* for it, which would increase complexity and IPC cost. The modules can be invoked as callbacks registered for each plug_type, each responsible for interpreting the options defined for its type, retrieved directly from the Port_Binding record in the SB DB. This way the core ovn-controller components are kept untouched (except for calling the plugging modules at some point, without worrying about the details), and no separate process is introduced. (A separate process is needed only if there are performance concerns or blocking I/Os.) It is similar to how different dpif providers (or different dev types) are supported in OVS.

> > 3. An external entity (similar to your proposed pluggable.c) running on the chassis would have built an shash of "devlink_ports" using the devlink sockets.

> > 4. This entity will also be monitoring the OVS interfaces. When it sees a new interface with the proper hints, it will set external_ids:iface-id=<lport_name>.

I think setting iface-id (if still necessary) should be done together with the port creation, by the plugging module (through a callback).

> > 5. ovn-controller, when it sees external_ids:iface-id, will bind the port and add flows.

> > With (2) we now add support in ovn-controller to create OVS ports, but at the same time OVN will not have to bother with the internal details of smart NICs or with the devlink sockets.

> > I'm not sure if the above approach seems hacky. The external entity could be called "ovn-vif".

> > What do you think?
> This is an interesting idea, and it could work if we figure out a way for ovn-controller and ovn-vif to exchange information before the port/interface is created.

I'd really prefer not to introduce such extra IPC. Would the plug_type mechanism be generic enough to avoid such external entities?

Thanks,
Han

> > > > Perhaps there can be a separate project called ovn-vif which does this?

> > > Creating a separate ovn-org project called ovn-vif would allow us to move forward; we would of course have to document the optional CMS API in core OVN, pointing to an optional external dependency for its use. Would this be an acceptable compromise?

> > I personally do not have any objections to that. Maybe others have different opinions.

> Great.

> Cheers!

> --
> Frode Nordahl
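As an illustration of the conditional monitoring discussed above: ovn-controller already restricts its Port_Binding monitor with IDL conditions, and the proposed plugged_by column would slot into that pattern. A minimal sketch, assuming the column existed and hence an IDL-generated sbrec_port_binding_add_clause_plugged_by helper (which does not exist today):

/* Hypothetical: restrict the Port_Binding monitor to rows whose
 * proposed "plugged_by" column references this chassis, alongside
 * the existing clause for patch ports. */
struct ovsdb_idl_condition pb = OVSDB_IDL_CONDITION_INIT(&pb);

sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "patch");
if (chassis) {
    /* Would be generated by the IDL if plugged_by were added. */
    sbrec_port_binding_add_clause_plugged_by(&pb, OVSDB_F_EQ,
                                             &chassis->header_.uuid);
}
ovsdb_idl_set_condition(ovnsb_idl, &sbrec_table_port_binding, &pb);
ovsdb_idl_condition_destroy(&pb);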
On Tue, Jul 6, 2021 at 8:11 PM Han Zhou <zhouhan@gmail.com> wrote:

> [...]
> I think it is better not to add a new LSP type for this purpose. [...] We could add a new column in the Port_Binding table: plugged_by. ovn-controller can monitor all port_bindings that have "plugged_by" equal to the current chassis. This way it should be backward compatible, not affecting the use cases that don't need "plugged_by".

That is a good idea; during development of the RFC I did make an attempt at creating an IDL index on options:requested-chassis, but that does not work because it would only match rows with requested-chassis as the *only* option.

Creating a new column would resolve this issue!

> The details of the plugging information can be stored in the options column with new options defined, such as "plug_type". [...] (Or we can define a new column, "plug_options", just for plugging information.)

A convention of prefixing all plugging-related options the same way, let's say with 'plug_', would work as well as a separate column, but I have nothing against a separate column if there is appetite for that.

> > Could we do some sort of local IPC, perhaps through a Unix domain socket?

> I think it would be good to have separate *modules* to handle interface plugging (one module per plug_type), but there is no need to have a separate *process* for it, which would increase complexity and IPC cost.
> The modules can be invoked as callbacks registered for each plug_type, each responsible for interpreting the options defined for its type, retrieved directly from the Port_Binding record in the SB DB. This way the core ovn-controller components are kept untouched (except for calling the plugging modules at some point, without worrying about the details), and no separate process is introduced. (A separate process is needed only if there are performance concerns or blocking I/Os.) It is similar to how different dpif providers (or different dev types) are supported in OVS.

Yes, I like this. The currently proposed devlink-informed plugging would have no issues with blocking I/O, and we could document a "charter" for the ovn-vif project stating that anything going into it would have to operate as if it were native OVN asynchronous code.

> I think setting iface-id (if still necessary) should be done together with the port creation, by the plugging module (through a callback).

+1

> I'd really prefer not to introduce such extra IPC. Would the plug_type mechanism be generic enough to avoid such external entities?

I agree, optionally pulling the code in at compile time would make things much simpler and more effective.
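To give the callback-module idea a concrete shape, here is a rough sketch of a per-plug_type provider interface, loosely modeled on the OVS dpif-provider pattern. Every name below is made up for illustration; none of this is an existing OVN API.

#include "smap.h"   /* struct smap from the OVS tree, for options. */

struct plug_class {
    /* Matched against the Port_Binding options:plug_type value,
     * e.g. "representor". */
    const char *type;

    /* Open/close any backing infrastructure, e.g. a devlink socket
     * for the representor plug type. */
    int (*init)(void);
    void (*destroy)(void);

    /* Resolve the Port_Binding options to the name of the local
     * interface to add to the integration bridge.  Must not block. */
    int (*port_prepare)(const struct smap *pb_options, char **ifname);
};

void plug_register_provider(const struct plug_class *);
const struct plug_class *plug_get_provider(const char *type);

Under this sketch, ovn-controller would look up the provider by options:plug_type and call port_prepare before creating the OVS Port and Interface records in a single transaction, which would also sidestep the referential integrity concern raised earlier.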
On Wed, Jul 7, 2021 at 8:30 AM Frode Nordahl <frode.nordahl@canonical.com> wrote:

> [...]
> > We could add a new column in the Port_Binding table: plugged_by. ovn-controller can monitor all port_bindings that have "plugged_by" equal to the current chassis. This way it should be backward compatible, not affecting the use cases that don't need "plugged_by".

> That is a good idea; during development of the RFC I did make an attempt at creating an IDL index on options:requested-chassis, but that does not work because it would only match rows with requested-chassis as the *only* option.

> Creating a new column would resolve this issue!

Looking into how this bit could be implemented, I see that northd already performs lookup of chassis by name in several places, so I'm thinking we could accomplish this without any change to the NB schema. With the following change to the SB schema, we could have northd check for the presence of Logical_Switch_Port options:plug_type and options:requested-chassis; if both are there, look up the chassis UUID by name from options:requested-chassis and insert it into the SB Port_Binding plugged_by column.

diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index bbf60781d..0dee8e155 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -229,7 +229,11 @@
         "external_ids": {"type": {"key": "string",
                                   "value": "string",
                                   "min": 0,
-                                  "max": "unlimited"}}},
+                                  "max": "unlimited"}},
+        "plugged_by": {"type": {"key": {"type": "uuid",
+                                        "refTable": "Chassis",
+                                        "refType": "weak"},
+                                "min": 0, "max": 1}}},
         "indexes": [["datapath", "tunnel_key"], ["logical_port"]],
         "isRoot": true},
     "MAC_Binding": {

What do you think?
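For illustration, the northd side of this could be quite small. A sketch under the assumption that the schema change above is applied (sbrec_port_binding_set_plugged_by would then be IDL-generated; op and nbsp stand in for northd's per-port context and are assumptions here, not quoted code):

/* If the CMS set both options on the LSP, resolve the chassis name
 * and fill the proposed plugged_by column on the Port_Binding. */
const char *plug_type = smap_get(&op->nbsp->options, "plug_type");
const char *requested = smap_get(&op->nbsp->options, "requested-chassis");
if (plug_type && requested) {
    const struct sbrec_chassis *chassis =
        chassis_lookup_by_name(sbrec_chassis_by_name, requested);
    if (chassis) {
        sbrec_port_binding_set_plugged_by(op->sb, chassis);
    }
}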
--
Frode Nordahl
On Thu, Jul 8, 2021 at 11:58 AM Frode Nordahl <frode.nordahl@canonical.com> wrote:

> [...]
What we have identified is that there are > > > > > > > > other CMSs that want this functionality, and with that we have an > > > > > > > > opportunity to generalise and provide an abstraction in a common place > > > > > > > > that all the consuming CMSes can benefit from. > > > > > > > > > > > > > > > > > > > > However, thinking more about it, the proposed approach in this patch just expands the way how OVN can bind ports, utilizing the communication channel of OVN (OVSDB connections). If all the information regarding port binding can be specified by the CMS from NB, then it is not unnatural for ovn-controller to perform interface binding directly (instead of passively accepting what is attached by CMS). This kind of information already existed to some extent - the "requested_chassis" option in OpenStack. Now it seems this idea is just expanding it to a specific interface. The difference is that "requested_chassis" is used for validation only, but now we want to directly apply it. So I think at least I don't have a strong opinion on the idea. > > > > > > > > > > > > > > > > > > While it's, probably, OK for OVN to add port to the OVSDB, in many > > > > > > > > > cases these ports will require a lot of extra configuration which > > > > > > > > > is typically done by os-vif or CNI/device plugins. Imagine that OVS > > > > > > > > > is running with userspace datapath and you need to plug-in some DPDK > > > > > > > > > ports, where you have to specify the port type, DPDK port config, > > > > > > > > > probably also number of rx/tx queues, number of descriptors in these > > > > > > > > > queues. You may also want to configure affinity for these queues per > > > > > > > > > PMD thread in OVS. For kernel interfaces it might be easier, but they > > > > > > > > > also might require some extra configuration that OVN will have to > > > > > > > > > think about now. This is a typical job for CMS to configure this > > > > > > > > > kind of stuff, and that is why projects like os-vif or large variety > > > > > > > > > of CNI/device plugins exists. > > > > > > > > > > > > > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. And both of > > > > > > > > them get their information from the CMS. Providing support for > > > > > > > > userspace datapath and DPDK would require more information, some of > > > > > > > > which is available through devlink, some fit well in key/value > > > > > > > > options. Our initial target would be to support the kernel representor > > > > > > > > port workflow. > > > > > > > > > > > > > > > > > > > > > > > > > > > > There are some benefits: > > > > > > > > > > 1) The mechanism can be reused by different CMSes, which may simplify CMS implementation. > > > > > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's communication channel, which avoids an extra CMS communication channel on the smart NIC side. (of course this can be achieved by a connection between the BM and smart NIC with *restricted* API just to convey the necessary information) > > > > > > > > > > > > > > > > > > The problem I see is that at least ovn-k8s, AFAIU, will require > > > > > > > > > the daemon on the Smart NIC anyways to monitor and configure > > > > > > > > > OVN components, e.g. to configure ovn-remote for ovn-controller > > > > > > > > > or run management appctl commands if required. > > > > > > > > > So, I don't see the point why it can't do plugging if it's already > > > > > > > > > there.
> > > > > > > > > > > > > > > > This pattern is Kubernetes specific and is not the case for other > > > > > > > > CMSes. The current proposal for enabling Smart NIC with control plane > > > > > > > > CPUs for ovn-kubernetes could be simplified if the networking platform > > > > > > > > provided means for coordinating more of the network related bits. > > > > > > > > > > > > > > > > > > > > > > > > > > > > As to the negative side, it would increase OVN's complexity, and as mentioned by Ilya potentially breaks OVN's platform independence. To avoid this, I think the *plugging* module itself needs to be independent and pluggable. It can be extended as independent plugins. The plugin would need to define what information is needed in LSP's "options", and then implement corresponding drivers. With this approach, even the regular VIFs can be attached by ovn-controller if CMS can tell the interface name. Anyway, this is just my brief thinking. > > > > > > > > > > > > > > > > > > Aside from plugging, > > > > > > > > > I also don't see the reason to have devlink code in OVN just > > > > > > > > > because it runs once on the init stage, IIUC. So, I don't > > > > > > > > > understand why this information about the hardware cannot be > > > > > > > > > retrieved during the host provisioning and stored somewhere > > > > > > > > > (Nova config?). Isn't hardware inventory a job for tripleo > > > > > > > > > or something? > > > > > > > > > > > > > > > > As noted in one of the TODOs in the commit message of the RFC one of > > > > > > > > the work items to further develop this is to monitor devlink for > > > > > > > > changes, the end product will not do one-time initialization. > > > > > > > > > > > > > > > > While I agree with you that the current SR-IOV acceleration workflow > > > > > > > > configuration is pretty static and can be done at deploy time, this > > > > > > > > proposal prepares for the next generation subfunction workflow where > > > > > > > > you will have a much higher density, and run-time configuration and > > > > > > > > discovery of representor ports. There is a paper about it from netdev > > > > > > > > conf[2], and all of this is part of the devlink infrastructure (Linux > > > > > > > > kernel ABI). > > > > > > > > > > > > > > > > 2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf > > > > > > > > > > > > > My 2 cents. > > > > > > > > > > > > Thank you for weighing in with your opinion, Numan. > > > > > > > > > > > > > I agree with Ilya. It doesn't seem natural to me for OVN to create > > > > > > > OVS ports and also > > > > > > > to use devlink infrastructure. I think an external entity can do all these. > > > > > > > > > > > > No doubt an external entity could do this, the challenge is that due > > > > > > to how the SmartNIC with separate control plane CPUs are laid out > > > > > > existing CMS tooling cannot do it, and we are working out a way to > > > > > > efficiently and securely coordinate how and when this external entity > > > > > > does the plugging. At the end of the day the data being passed around > > > > > > is networking related, and I believe it is possible to create a > > > > > > solution that would be consumable by all OVN-enabled CMS's. We'd be > > > > > > happy to prove this out outside of the ovn-controller first if that is > > > > > > more palatable, see below. 
> > > > > > > > > > > > > The other downside I see with this RFC is that it will be now > > > > > > > monitoring all the port binding > > > > > > > rows without any condition because when a port binding is created the > > > > > > > chassis column will > > > > > > > be NULL. Correct me If I'm wrong here. > > > > > > > > > > > > The intention of the condition is to consider unbound ports for > > > > > > plugging, as soon as they have been plugged and bound there will be a > > > > > > value in the chassis column and at that point only the respective > > > > > > chassis will monitor them. I guess this could be a problem if there is > > > > > > a large number of unbound ports in a deployment, from your experience > > > > > > would that be the case? (When I think of it I do remember the load > > > > > > balancer vip perhaps being a consumer of unbound ports). > > > > > > > > > > > > We considered using a separate LSP type, but apart from the plugging > > > > > > bit these LSP's will behave just like regular VIFs from OVN Port > > > > > > Binding code POV, and if all your hypervisors have SmartNICs with > > > > > > control plane CPUs there would be just as many of those ports. > > > > > > > > > In my opinion having a separate type would be better. At the least it > > > > > will not impact > > > > > normal or existing deployments. > > > > > > > > > > Let's say we call a type as "representor" (just picking a name for > > > > > now), then each ovn-controller > > > > > can add a conditional monitor to get updates for these types (just > > > > > like how we do now for patch or external ports). > > > > > > > > > > > > What do you think of the below approach > > > > > > > > > > 1. A new LSP type - representor is added. CMS when creating this > > > > > lport also sets the requested-chassis option. > > > > > > > > Works for me. > > > > > > > I think it is better not to add a new LSP type for this purpose. It would be better to make this "interface plugging" feature generic, so that it can work for other types as well. It could be supported in the future for a regular VIF, or for a new LSP type, so I think this behavior is better not to be tied to a LSP type. > > > > > > We could add a new column in the Port_Binding table: plugged_by. ovn-controller can monitor all port_bindings that have "plugged_by" equals to current chassis. This way it should be backward compatible and not affecting the use cases that doesn't need "plugged_by". > > > > That is a good idea, during development of the RFC I did make an > > attempt at creating an IDL index for the options:requested-chassis, > > but that does not work because it would only match rows with > > requested-chassis as the *only* option. > > > > Creating a new column would resolve this issue! > > Looking into how this bit could be implemented, I see that northd is > already performing lookup of chassis by name in several places, so I'm > thinking we could accomplish this without any change to NB schema. > > With the following change to SB schema, we could have northd check for > presence of Logical_Switch_Port options:plug_type and > options:requested-chassis, if both are there look up chassis UUID by > name in options:requested-chassis and insert into SB Port_Binding > plugged_by column. 
>
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index bbf60781d..0dee8e155 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -229,7 +229,11 @@
>          "external_ids": {"type": {"key": "string",
>                                    "value": "string",
>                                    "min": 0,
> -                                  "max": "unlimited"}}},
> +                                  "max": "unlimited"}},
> +        "plugged_by": {"type": {"key": {"type": "uuid",
> +                                        "refTable": "Chassis",
> +                                        "refType": "weak"},
> +                                "min": 0, "max": 1}}},
>          "indexes": [["datapath", "tunnel_key"], ["logical_port"]],
>          "isRoot": true},
>      "MAC_Binding": {
> > What do you think? > > -- > Frode Nordahl > > > > The details of plugging information can be stored in the options column with new options defined, such as: "plug_type". For representor port the "plug_type" can be defined as "representor". For each plug_type there can be type-specific information defined and also stored in the options column. (Or we can define a new column "plug_options" just for plugging information). > > > > A convention to prefix all plugging related options the same way, > > let's say: 'plug_', would work as well as a separate column, but I > > have nothing against a separate column if there is appetite for that. > > > > > > > 2. ovn-controller on the requested chassis sees this lport type, > > > > > creates an OVS port/interface and possibly sets some options or hints > > > > > in the external_ids column > > > > > of ovs interface. At this point, it will not do anything. > > > > > > > > This could potentially help us avoid the issue of adding more > > > > processes and IDLs talking to the SB DB. > > > > > > > > However, at this point in time, the ovn-controller does not know the > > > > details of the representor port, such as name/ifindex for the kernel > > > > representor port, and the interface it creates in OVS has to match the > > > > kernel representor port, and AFAICT it is not possible to change these > > > > details after port creation. This information is provided by the > > > > devlink lookup in the current proposal. > > > > > > > > It might work if it was possible to create the port and interface > > > > separately, but that would cause a referential integrity violation. > > > > Another alternative could be that ovn-vif created a new interface > > > > record in 4), attaching that to the port record created by > > > > ovn-controller replacing a throw-away interface record, but we would > > > > be approaching hacky territory. > > > > > > > > Could we do some sort of local IPC perhaps through a Unix domain socket? > > > I think it would be good to have separate *modules* to handle interface plugging (one module per plug_type), but there is no need to have a separate *process* for it, which increases the complexity and the IPC cost. The modules can be invoked as callbacks registered for each plug_type, and responsible for explaining the options defined for its type, which is directly retrieved from the port_binding record from SB DB. This way the core ovn-controller components are kept untouched (except calling the plugging modules at some point, but not worrying about the details), without introducing a separate process. (A separate process is needed only if there are performance concerns or blocking I/Os). > > > > > > It is similar to how different dpif providers (or different dev types) are supported in OVS. > > Yes, I like this.
The currently proposed devlink informed plugging > > would have no issues with blocking I/O, and we could document a > > "charter" for the ovn-vif project that anything going into it would > > have to operate as if it was native OVN asynchronous code. > >

This bit could possibly be implemented with an interface like this (note that this is draft/pseudocode at this point):

diff --git a/controller/plug-provider.h b/controller/plug-provider.h
new file mode 100644
index 000000000..9b7a2e722
--- /dev/null
+++ b/controller/plug-provider.h
@@ -0,0 +1,45 @@
+struct plug {
+    const struct plug_class *plug_class;
+};
+
+struct plug_iface {
+    char *iface_name;
+    struct smap iface_external_ids;
+    struct smap iface_options;
+    struct smap iface_other_config;
+};
+
+struct plug_class {
+    /* Type of plugger in this class. */
+    const char *type;
+
+    /* Called when the plug provider is registered, typically at program
+     * startup. */
+    int (*init)(void);
+
+    /* Performs periodic work needed by plugger, if any is necessary. */
+    bool (*run)(struct plug *plug);
+
+    /* Called when the plug provider is removed, typically at program exit. */
+    int (*destroy)(struct plug *plug);
+
+    /* Pass lport name and options to plugger to prepare for port creation.
+     *
+     * The plug implementation can perform lookup or any per port
+     * initialization and should fill plug_iface with data required for
+     * port/interface creation.
+     *
+     * OVN owns the contents of plug_iface struct and is responsible to free
+     * any data allocated when done with it. */
+    bool (*plug_prepare)(const char *lport_name,
+                         const struct smap *lport_options,
+                         struct plug_iface *);
+
+    /* Notify plugger that port was created. */
+    bool (*plug_port_created)(const char *lport_name,
+                              const struct smap *lport_options);
+
+    /* Notify plugger that port was removed. */
+    bool (*plug_port_removed)(const char *lport_name,
+                              const struct smap *lport_options);
+};

With that the callbacks could be enabled and the infrastructure tested in core OVN, and we could host the platform dependent bits externally, which would be enabled by the plug implementation calling a register function like with the OVS dpif framework.
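A minimal sketch of what such a register function and the plug_type lookup could look like, modelled on the OVS dpif provider registry; plug_register_provider and plug_class_find are assumed names, and only struct plug_class above comes from the draft:

#include <errno.h>

#include "openvswitch/shash.h"
#include "openvswitch/vlog.h"
#include "util.h"

VLOG_DEFINE_THIS_MODULE(plug_provider);

/* Registered plug classes, indexed by type name. */
static struct shash plug_classes = SHASH_INITIALIZER(&plug_classes);

int
plug_register_provider(const struct plug_class *class)
{
    if (shash_find(&plug_classes, class->type)) {
        VLOG_WARN("attempted to register duplicate plug provider: %s",
                  class->type);
        return EEXIST;
    }
    if (class->init) {
        int error = class->init();
        if (error) {
            VLOG_WARN("failed to initialize %s plug provider: %s",
                      class->type, ovs_strerror(error));
            return error;
        }
    }
    shash_add(&plug_classes, class->type,
              CONST_CAST(struct plug_class *, class));
    return 0;
}

/* Look up the provider for a Port_Binding options:plug_type value;
 * returns NULL if no provider of that type has been registered. */
const struct plug_class *
plug_class_find(const char *type)
{
    return shash_find_data(&plug_classes, type);
}

ovn-controller would then resolve options:plug_type through plug_class_find() when considering a port binding, and skip the binding if no provider is present.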
On Fri, Jul 9, 2021 at 9:04 AM Frode Nordahl <frode.nordahl@canonical.com> wrote: > > On Thu, Jul 8, 2021 at 11:58 AM Frode Nordahl > <frode.nordahl@canonical.com> wrote: > > > > On Wed, Jul 7, 2021 at 8:30 AM Frode Nordahl > > <frode.nordahl@canonical.com> wrote: > > > > > > On Tue, Jul 6, 2021 at 8:11 PM Han Zhou <zhouhan@gmail.com> wrote: > > > > > > > > > > > > > > > > On Tue, Jul 6, 2021 at 1:19 AM Frode Nordahl <frode.nordahl@canonical.com> wrote: > > > > > > > > > > On Mon, Jul 5, 2021 at 6:57 PM Numan Siddique <numans@ovn.org> wrote: > > > > > > > > > > > > On Mon, Jul 5, 2021 at 12:12 PM Frode Nordahl > > > > > > <frode.nordahl@canonical.com> wrote: > > > > > > > > > > > > > > On Wed, Jun 30, 2021 at 12:32 AM Numan Siddique <numans@ovn.org> wrote: > > > > > > > > > > > > > > > > On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl > > > > > > > > <frode.nordahl@canonical.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets <i.maximets@ovn.org> wrote: > > > > > > > > > > > > > > > > > > > > On 6/10/21 8:36 AM, Han Zhou wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl <frode.nordahl@canonical.com <mailto:frode.nordahl@canonical.com>> wrote: > > > > > > > > > > >> > > > > > > > > > > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote: > > > > > > > > > > >> > > > > > > > > > > > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote: > > > > > > > > > > >> > > Introduce plugging module that adds and removes ports on the > > > > > > > > > > >> > > integration bridge, as directed by Port_Binding options. > > > > > > > > > > >> > > > > > > > > > > > > >> > > Traditionally it has been the CMSs responsibility to create Virtual > > > > > > > > > > >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual > > > > > > > > > > >> > > Machine etc.) life cycle, and subsequently manage plug/unplug > > > > > > > > > > >> > > operations on the Open vSwitch integration bridge. > > > > > > > > > > >> > > > > > > > > > > > > >> > > With the advent of NICs connected to multiple distinct CPUs we can > > > > > > > > > > >> > > have a topology where the instance runs on one host and Open > > > > > > > > > > >> > > vSwitch and OVN runs on a different host, the smartnic CPU. > > > > > > > > > > >> > > > > > > > > > > > > >> > > The act of plugging and unplugging the representor port in Open > > > > > > > > > > >> > > vSwitch running on the smartnic host CPU would be the same for > > > > > > > > > > >> > > every smartnic variant (thanks to the devlink-port[0][1] > > > > > > > > > > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.). > > > > > > > > > > >> > > As such it is natural to extend OVN to provide this common > > > > > > > > > > >> > > functionality through its CMS facing API. > > > > > > > > > > >> > > > > > > > > > > > >> > Hi, Frode. Thanks for putting this together, but it doesn't look > > > > > > > > > > >> > natural to me. OVN, AFAIK, never touched physical devices or > > > > > > > > > > >> > interacted with the kernel directly. This change introduces completely > > > > > > > > > > >> > new functionality inside OVN. With the same effect we can run a fully > > > > > > > > > > >> > separate service on these smartnic CPUs that will do plugging > > > > > > > > > > >> > and configuration job for CMS. You may even make it independent > > > > > > > > > > >> > from a particular CMS by creating a REST API for it or whatever. 
> > > > > > > > > > >> > This will additionally allow using same service for non-OVN setups. > > > > > > > > > > >> > > > > > > > > > > >> Ilya, > > > > > > > > > > >> > > > > > > > > > > >> Thank you for taking the time to comment, much appreciated. > > > > > > > > > > >> > > > > > > > > > > >> Yes, this is new functionality, NICs with separate control plane CPUs > > > > > > > > > > >> and isolation from the host are also new, so this is one proposal for > > > > > > > > > > >> how we could go about to enable the use of them. > > > > > > > > > > >> > > > > > > > > > > >> The OVN controller does today get pretty close to the physical realm > > > > > > > > > > >> by maintaining patch ports in Open vSwitch based on bridge mapping > > > > > > > > > > >> configuration and presence of bridges to physical interfaces. It also > > > > > > > > > > >> does react to events of physical interfaces being plugged into the > > > > > > > > > > >> Open vSwitch instance it manages, albeit to date some other entity has > > > > > > > > > > >> been doing the act of adding the port into the bridge. > > > > > > > > > > >> > > > > > > > > > > >> The rationale for proposing to use the OVN database for coordinating > > > > > > > > > > >> this is that the information about which ports to bind, and where to > > > > > > > > > > >> bind them is already there. The timing of the information flow from > > > > > > > > > > >> the CMS is also suitable for the task. > > > > > > > > > > >> > > > > > > > > > > >> OVN relies on OVS library code, and all the necessary libraries for > > > > > > > > > > >> interfacing with the kernel through netlink and friends are there or > > > > > > > > > > >> would be easy to add. The rationale for using the netlink-devlink > > > > > > > > > > >> interface is that it provides a generic infrastructure for these types > > > > > > > > > > >> of NICs. So by using this interface we should be able to support most > > > > > > > > > > >> if not all of the variants of these cards. > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> Providing a separate OVN service to do the task could work, but would > > > > > > > > > > >> have the cost of an extra SB DB connection, IDL and monitors. > > > > > > > > > > > > > > > > > > > > IMHO, CMS should never connect to Southbound DB. It's just because the > > > > > > > > > > Southbound DB is not meant to be a public interface, it just happened > > > > > > > > > > to be available for connections. I know that OpenStack has metadata > > > > > > > > > > agents that connects to Sb DB, but if it's really required for them, I > > > > > > > > > > think, there should be a different way to get/set required information > > > > > > > > > > without connection to the Southbound. > > > > > > > > > > > > > > > > > > The CMS-facing API is the Northbound DB, I was not suggesting direct > > > > > > > > > use of the Southbound DB by external to OVN services. My suggestion > > > > > > > > > was to have a separate OVN process do this if your objection was to > > > > > > > > > handle it as part of the ovn-controller process. > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > >> I fear it would be quite hard to build a whole separate project with > > > > > > > > > > >> its own API, feels like a lot of duplicated effort when the flow of > > > > > > > > > > >> data and APIs in OVN already align so well with CMSs interested in > > > > > > > > > > >> using this? 
> > > > > > > > > > >> > > > > > > > > > > >> > Interactions with physical devices also makes OVN linux-dependent > > > > > > > > > > >> > at least for this use case, IIUC. > > > > > > > > > > >> > > > > > > > > > > >> This specific bit would be linux-specific in the first iteration, yes. > > > > > > > > > > >> But the vendors manufacturing and distributing the hardware do often > > > > > > > > > > >> have drivers for other platforms, I am sure the necessary > > > > > > > > > > >> infrastructure will become available there too over time, if it is not > > > > > > > > > > >> there already. > > > > > > > > > > >> > > > > > > > > > > >> We do currently have platform specific macros in the OVN build system, > > > > > > > > > > >> so we could enable the functionality when built on a compatible > > > > > > > > > > >> platform. > > > > > > > > > > >> > > > > > > > > > > >> > Maybe, others has different opinions. > > > > > > > > > > >> > > > > > > > > > > >> I appreciate your opinion, and enjoy discussing this topic. > > > > > > > > > > >> > > > > > > > > > > >> > Another though is that there is, obviously, a network connection > > > > > > > > > > >> > between the host and smartnic system. Maybe it's possible to just > > > > > > > > > > >> > add an extra remote to the local ovsdb-server so CMS daemon on the > > > > > > > > > > >> > host system could just add interfaces over the network connection? > > > > > > > > > > >> > > > > > > > > > > >> There are a few issues with such an approach. One of the main goals > > > > > > > > > > >> with providing and using a NIC with control plane CPUs is having an > > > > > > > > > > >> extra layer of security and isolation which is separate from the > > > > > > > > > > >> hypervisor host the card happens to share a PCI complex with and draw > > > > > > > > > > >> power from. Requiring a connection between the two for operation would > > > > > > > > > > >> defy this purpose. > > > > > > > > > > >> > > > > > > > > > > >> In addition to that, this class of cards provide visibility into > > > > > > > > > > >> kernel interfaces, enumeration of representor ports etc. only from the > > > > > > > > > > >> NIC control plane CPU side of the PCI complex, this information is not > > > > > > > > > > >> provided to the host. So if a hypervisor host CMS agent were to do the > > > > > > > > > > >> plugging through a remote ovsdb connection, it would have to > > > > > > > > > > >> communicate with something else running on the NIC control plane CPU > > > > > > > > > > >> to retrieve the information it needs before it can know what to relay > > > > > > > > > > >> back over the ovsdb connection. > > > > > > > > > > >> > > > > > > > > > > >> -- > > > > > > > > > > >> Frode Nordahl > > > > > > > > > > >> > > > > > > > > > > >> > Best regards, Ilya Maximets. > > > > > > > > > > > > > > > > > > > > > > Here are my 2 cents. > > > > > > > > > > > > > > > > > > > > > > Initially I had similar concerns to Ilya, and it seems OVN should stay away from the physical interface plugging. 
As a reference, here is how ovn-kubernetes is doing it without adding anything to OVN: https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing <https://docs.google.com/document/d/11IoMKiohK7hIyIE36FJmwJv46DEBx52a4fqvrpCBBcg/edit?usp=sharing> > > > > > > > > > > > > > > > > > > > > AFAICT, a big part of the work is already done on the ovn-k8s side: > > > > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2005 > > > > > > > > > > https://github.com/ovn-org/ovn-kubernetes/pull/2042 > > > > > > > > > > > > > > > > > > I am aware of the on-going effort to implement support for this in > > > > > > > > > ovn-kubernetes directly. What we have identified is that there are > > > > > > > > > other CMSs that want this functionality, and with that we have an > > > > > > > > > opportunity to generalise and provide an abstraction in a common place > > > > > > > > > that all the consuming CMSes can benefit from. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, thinking more about it, the proposed approach in this patch just expands the way how OVN can bind ports, utilizing the communication channel of OVN (OVSDB connections). If all the information regarding port binding can be specified by the CMS from NB, then it is not unnatural for ovn-controller to perform interface binding directly (instead of passively accepting what is attached by CMS). This kind of information already existed to some extent - the "requested_chassis" option in OpenStack. Now it seems this idea is just expanding it to a specific interface. The difference is that "requested_chassis" is used for validation only, but now we want to directly apply it. So I think at least I don't have a strong opinion on the idea. > > > > > > > > > > > > > > > > > > > > While it's, probably, OK for OVN to add port to the OVSDB, in many > > > > > > > > > > cases these ports will require a lot of extra configuration which > > > > > > > > > > is typically done by os-vif or CNI/device plugins. Imagine that OVS > > > > > > > > > > is running with userspace datapath and you need to plug-in some DPDK > > > > > > > > > > ports, where you have to specify the port type, DPDK port config, > > > > > > > > > > probably also number of rx/tx queues, number of descriptors in these > > > > > > > > > > queues. You may also want to configure affinity for these queues per > > > > > > > > > > PMD thread in OVS. For kernel interfaces it might be easier, but they > > > > > > > > > > also might require some extra configuration that OVN will have to > > > > > > > > > > think about now. This is a typical job for CMS to configure this > > > > > > > > > > kind of stuff, and that is why projects like os-vif or large variety > > > > > > > > > > of CNI/device plugins exists. > > > > > > > > > > > > > > > > > > CNI is Kubernetes specific, os-vif is OpenStack specific. And both of > > > > > > > > > them get their information from the CMS. Providing support for > > > > > > > > > userspace datapath and DPDK would require more information, some of > > > > > > > > > which is available through devlink, some fit well in key/value > > > > > > > > > options. Our initial target would be to support the kernel representor > > > > > > > > > port workflow. Circling back to the DPDK argument, I had a look at what this would look like in practice. For the DPDK representor port workflow, the information required from the CMS would be which PF to use and a logical VF number. 
In the current proposal the CMS provides the MAC address of the PF and a VF number; with that information the plugging library can look up the PCI address and fill in `options:dpdk-devargs`, for example '0000:03:00.0,representor=[0]' (a sketch of this follows at the end of this message). When OVN is running on a SmartNIC, there will be no vHost sockets, as the workload would be running on a different host, at the other end of the host facing PF+VF pair. As for other tuning parameters such as queues, descriptors and queue affinity, it appears to me that CMS systems are generally unaware of this tuning today and that blanket configuration is applied through other means. When OVN is running on a SmartNIC, any queue affinity options would be relative to control plane CPU cores and have no relation to which host CPU cores the workload is running on at the other end of the host facing PF+VF pair. We could choose to replicate this behavior by having the plugging library apply its own defaults, possibly influenced by local per-chassis configuration, and keep this out of the CMS API. Does this address your concerns wrt. DPDK support, or do you have other examples of configuration we would need to consider? > > > > > > > > > > > > > > > > > > > > > > There are some benefits: > > > > > > > > > > > 1) The mechanism can be reused by different CMSes, which may simplify CMS implementation. > > > > > > > > > > > 2) Compared with the ovn-k8s approach, it reuses OVN's communication channel, which avoids an extra CMS communication channel on the smart NIC side. (of course this can be achieved by a connection between the BM and smart NIC with *restricted* API just to convey the necessary information) > > > > > > > > > > > > > > > > > > > > The problem I see is that at least ovn-k8s, AFAIU, will require > > > > > > > > > > the daemon on the Smart NIC anyways to monitor and configure > > > > > > > > > > OVN components, e.g. to configure ovn-remote for ovn-controller > > > > > > > > > > or run management appctl commands if required. > > > > > > > > > > So, I don't see the point why it can't do plugging if it's already > > > > > > > > > > there. > > > > > > > > > > > > > > > > > > This pattern is Kubernetes specific and is not the case for other > > > > > > > > > CMSes. The current proposal for enabling Smart NIC with control plane > > > > > > > > > CPUs for ovn-kubernetes could be simplified if the networking platform > > > > > > > > > provided means for coordinating more of the network related bits. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As to the negative side, it would increase OVN's complexity, and as mentioned by Ilya potentially breaks OVN's platform independence. To avoid this, I think the *plugging* module itself needs to be independent and pluggable. It can be extended as independent plugins. The plugin would need to define what information is needed in LSP's "options", and then implement corresponding drivers. With this approach, even the regular VIFs can be attached by ovn-controller if CMS can tell the interface name. Anyway, this is just my brief thinking. > > > > > > > > > > > > > > > > > > > > Aside from plugging, > > > > > > > > > > I also don't see the reason to have devlink code in OVN just > > > > > > > > > > because it runs once on the init stage, IIUC. So, I don't > > > > > > > > > > understand why this information about the hardware cannot be > > > > > > > > > > retrieved during the host provisioning and stored somewhere > > > > > > > > > > (Nova config?). Isn't hardware inventory a job for tripleo > > > > > > > > > > or something?
> > > > > > > > > > > > > > > > > > As noted in one of the TODOs in the commit message of the RFC one of > > > > > > > > > the work items to further develop this is to monitor devlink for > > > > > > > > > changes, the end product will not do one-time initialization. > > > > > > > > > > > > > > > > > > While I agree with you that the current SR-IOV acceleration workflow > > > > > > > > > configuration is pretty static and can be done at deploy time, this > > > > > > > > > proposal prepares for the next generation subfunction workflow where > > > > > > > > > you will have a much higher density, and run-time configuration and > > > > > > > > > discovery of representor ports. There is a paper about it from netdev > > > > > > > > > conf[2], and all of this is part of the devlink infrastructure (Linux > > > > > > > > > kernel ABI). > > > > > > > > > > > > > > > > > > 2: https://netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf > > > > > > > > > > > > > > > My 2 cents. > > > > > > > > > > > > > > Thank you for weighing in with your opinion, Numan. > > > > > > > > > > > > > > > I agree with Ilya. It doesn't seem natural to me for OVN to create > > > > > > > > OVS ports and also > > > > > > > > to use devlink infrastructure. I think an external entity can do all these. > > > > > > > > > > > > > > No doubt an external entity could do this, the challenge is that due > > > > > > > to how the SmartNIC with separate control plane CPUs are laid out > > > > > > > existing CMS tooling cannot do it, and we are working out a way to > > > > > > > efficiently and securely coordinate how and when this external entity > > > > > > > does the plugging. At the end of the day the data being passed around > > > > > > > is networking related, and I believe it is possible to create a > > > > > > > solution that would be consumable by all OVN-enabled CMS's. We'd be > > > > > > > happy to prove this out outside of the ovn-controller first if that is > > > > > > > more palatable, see below. > > > > > > > > > > > > > > > The other downside I see with this RFC is that it will be now > > > > > > > > monitoring all the port binding > > > > > > > > rows without any condition because when a port binding is created the > > > > > > > > chassis column will > > > > > > > > be NULL. Correct me If I'm wrong here. > > > > > > > > > > > > > > The intention of the condition is to consider unbound ports for > > > > > > > plugging, as soon as they have been plugged and bound there will be a > > > > > > > value in the chassis column and at that point only the respective > > > > > > > chassis will monitor them. I guess this could be a problem if there is > > > > > > > a large number of unbound ports in a deployment, from your experience > > > > > > > would that be the case? (When I think of it I do remember the load > > > > > > > balancer vip perhaps being a consumer of unbound ports). > > > > > > > > > > > > > > We considered using a separate LSP type, but apart from the plugging > > > > > > > bit these LSP's will behave just like regular VIFs from OVN Port > > > > > > > Binding code POV, and if all your hypervisors have SmartNICs with > > > > > > > control plane CPUs there would be just as many of those ports. > > > > > > > > > > > > In my opinion having a separate type would be better. At the least it > > > > > > will not impact > > > > > > normal or existing deployments. 
> > > > > > > > > > > > Let's say we call a type as "representor" (just picking a name for > > > > > > now), then each ovn-controller > > > > > > can add a conditional monitor to get updates for these types (just > > > > > > like how we do now for patch or external ports). > > > > > > > > > > > > > > > > What do you think of the below approach > > > > > > > > > > > > 1. A new LSP type - representor is added. CMS when creating this > > > > > > lport also sets the requested-chassis option. > > > > > > > > > > Works for me. > > > > > > > > > I think it is better not to add a new LSP type for this purpose. It would be better to make this "interface plugging" feature generic, so that it can work for other types as well. It could be supported in the future for a regular VIF, or for a new LSP type, so I think this behavior is better not to be tied to a LSP type. > > > > > > > > We could add a new column in the Port_Binding table: plugged_by. ovn-controller can monitor all port_bindings that have "plugged_by" equals to current chassis. This way it should be backward compatible and not affecting the use cases that doesn't need "plugged_by". > > > > > > That is a good idea, during development of the RFC I did make an > > > attempt at creating an IDL index for the options:requested-chassis, > > > but that does not work because it would only match rows with > > > requested-chassis as the *only* option. > > > > > > Creating a new column would resolve this issue! > > > > Looking into how this bit could be implemented, I see that northd is > > already performing lookup of chassis by name in several places, so I'm > > thinking we could accomplish this without any change to NB schema. > > > > With the following change to SB schema, we could have northd check for > > presence of Logical_Switch_Port options:plug_type and > > options:requested-chassis, if both are there look up chassis UUID by > > name in options:requested-chassis and insert into SB Port_Binding > > plugged_by column. > > > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > > index bbf60781d..0dee8e155 100644 > > --- a/ovn-sb.ovsschema > > +++ b/ovn-sb.ovsschema > > @@ -229,7 +229,11 @@ > > "external_ids": {"type": {"key": "string", > > "value": "string", > > "min": 0, > > - "max": "unlimited"}}}, > > + "max": "unlimited"}}, > > + "plugged_by": {"type": {"key": {"type": "uuid", > > + "refTable": "Chassis", > > + "refType": "weak"}, > > + "min": 0, "max": 1}}}, > > "indexes": [["datapath", "tunnel_key"], ["logical_port"]], > > "isRoot": true}, > > "MAC_Binding": { > > > > What do you think? > > > > -- > > Frode Nordahl > > > > > > The details of plugging information can be stored in the options column with new options defined, such as: "plug_type". For representor port the "plug_type" can be defined as "representor". For each plug_type there can be type-specific information defined and also stored in the options column. (Or we can define a new column "plug_options" just for plugging information). > > > > > > A convention to prefix all plugging related options the same way, > > > let's say: 'plug_', would work as well as a separate column, but I > > > have nothing against a separate column if there is appetite for that. > > > > > > > > > 2. ovn-controller on the requested chassis sees this lport type, > > > > > > creates an OVS port/interface and possibly sets some options or hints > > > > > > in the external_ids column > > > > > > of ovs interface. At this point, it will not do anything. 
> > > > > > > > > > This could potentially help us avoid the issue of adding more > > > > > processes and IDLs talking to the SB DB. > > > > > > > > > > However, at this point in time, the ovn-controller does not know the > > > > > details of the representor port, such as name/ifindex for the kernel > > > > > representor port, and the interface it creates in OVS has to match the > > > > > kernel representor port, and AFAICT it is not possible to change these > > > > > details after port creation. This information is provided by the > > > > > devlink lookup in the current proposal. > > > > > > > > > > It might work if it was possible to create the port and interface > > > > > separately, but that would cause a referential integrity violation. > > > > > Another alternative could be that ovn-vif created a new interface > > > > > record in 4), attaching that to the port record created by > > > > > ovn-controller replacing a throw-away interface record, but we would > > > > > be approaching hacky territory. > > > > > > > > > > Could we do some sort of local IPC perhaps through a Unix domain socket? > > > > I think it would be good to have separate *modules* to handle interface plugging (one module per plug_type), but there is no need to have a separate *process* for it, which increases the complexity and the IPC cost. The modules can be invoked as callbacks registered for each plug_type, and responsible for explaining the options defined for its type, which is directly retrieved from the port_binding record from SB DB. This way the core ovn-controller components are kept untouched (except calling the plugging modules at some point, but not worrying about the details), without introducing a separate process. (A separate process is needed only if there are performance concerns or blocking I/Os). > > > > > > > > It is similar to how different dpif providers (or different dev types) are supported in OVS. > > > Yes, I like this. The currently proposed devlink informed plugging > > > would have no issues with blocking I/O, and we could document a > > > "charter" for the ovn-vif project that anything going into it would > > > have to operate as if it was native OVN asynchronous code. > > > > > This bit could possibly be implemented with an interface like this > (note that this is draft/pseudocode at this point): > diff --git a/controller/plug-provider.h b/controller/plug-provider.h > new file mode 100644 > index 000000000..9b7a2e722 > --- /dev/null > +++ b/controller/plug-provider.h > @@ -0,0 +1,45 @@ > +struct plug { > + const struct plug_class *plug_class; > +}; > + > +struct plug_iface { > + char *iface_name; > + struct smap iface_external_ids; > + struct smap iface_options; > + struct smap iface_other_config; > +}; > + > +struct plug_class { > + /* Type of plugger in this class. */ > + const char *type; > + > + /* Called when the plug provider is registered, typically at program > + * startup. */ > + int (*init)(void); > + > + /* Performs periodic work needed by plugger, if any is necessary. */ > + bool (*run)(struct plug *plug); > + > + /* Called when the plug provider is removed, typically at program exit. */ > + int (*destroy)(struct plug *plug); > + > + /* Pass lport name and options to plugger to prepare for port creation. > + * > + * The plug implementation can perform lookup or any per port > + * initialization and should fill plug_iface with data required for > + * port/interface creation. 
> + * > + * OVN owns the contents of plug_iface struct and is responsible to free > + * any data allocated when done with it. */ > + bool (*plug_prepare)(const char *lport_name, > + const struct smap *lport_options, > + struct plug_iface *); > + > + /* Notify plugger that port was created. */ > + bool (*plug_port_created)(const char *lport_name, > + const struct smap *lport_options); > + > + /* Notify plugger that port was removed. */ > + bool (*plug_port_removed)(const char *lport_name, > + const struct smap *lport_options); > +};

We should probably have a struct to keep the values the plugging library needs to do its processing instead of multiple arguments, something like:

struct plug_port_ctx {
    bool dpdk_initialized;        /* from local Open_vSwitch table */
    const char *datapath_type;    /* from local integration bridge */
    /* Alternatively the above two could be collapsed into a single
     * dpdk_enabled / use_dpdk bool. */
    const char *lport_name;
    const struct smap *lport_options;
};
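Tying this back to the DPDK representor discussion, a rough sketch of how a plug implementation's plug_prepare could fill a plug_iface from such a ctx. pf_pci_from_mac() and the plug_pf-mac/plug_vf-num option names are assumptions for illustration, not part of any proposal yet:

/* Assumed helper that would resolve the PF PCI address from the PF MAC
 * provided by the CMS, e.g. based on the devlink port dump. */
static bool pf_pci_from_mac(const char *mac, char *pci, size_t len);

static bool
plug_dpdk_prepare(const struct plug_port_ctx *ctx, struct plug_iface *iface)
{
    const char *pf_mac = smap_get(ctx->lport_options, "plug_pf-mac");
    const char *vf_num = smap_get(ctx->lport_options, "plug_vf-num");
    char pci[32];

    if (!ctx->dpdk_initialized || !pf_mac || !vf_num
        || !pf_pci_from_mac(pf_mac, pci, sizeof pci)) {
        return false;
    }

    iface->iface_name = xasprintf("dpdk-rep-%s", ctx->lport_name);
    smap_init(&iface->iface_external_ids);
    smap_init(&iface->iface_other_config);
    smap_init(&iface->iface_options);
    /* Yields e.g. dpdk-devargs="0000:03:00.0,representor=[0]". */
    smap_add_format(&iface->iface_options, "dpdk-devargs",
                    "%s,representor=[%s]", pci, vf_num);
    return true;
}

One thing this highlights is that plug_iface as drafted has no field to convey the Interface type itself (type=dpdk in this case), so the struct would probably need to grow a type member as well.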
As we discussed in the IRC meeting on Thu, July 15th I'm working on a new RFC patch series which incorporates the discussion so far and I'll propose that to the list when done. For anyone following along I'll be staging it over here: https://github.com/fnordahl/ovn/commits/pluggingv2
diff --git a/controller/automake.mk b/controller/automake.mk
index e664f1980..04e5708ec 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -26,6 +26,8 @@ controller_ovn_controller_SOURCES = \
 	controller/pinctrl.h \
 	controller/patch.c \
 	controller/patch.h \
+	controller/plugging.c \
+	controller/plugging.h \
 	controller/ovn-controller.c \
 	controller/ovn-controller.h \
 	controller/physical.c \
diff --git a/controller/binding.c b/controller/binding.c
index 4ca2b4d9a..66fa65091 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1081,15 +1081,6 @@ is_binding_lport_this_chassis(struct binding_lport *b_lport,
             b_lport->pb->chassis == chassis);
 }
 
-static bool
-can_bind_on_this_chassis(const struct sbrec_chassis *chassis_rec,
-                         const char *requested_chassis)
-{
-    return !requested_chassis || !requested_chassis[0]
-           || !strcmp(requested_chassis, chassis_rec->name)
-           || !strcmp(requested_chassis, chassis_rec->hostname);
-}
-
 /* Returns 'true' if the 'lbinding' has binding lports of type LP_CONTAINER,
  * 'false' otherwise. */
 static bool
@@ -1186,8 +1177,8 @@ consider_vif_lport(const struct sbrec_port_binding *pb,
                    struct hmap *qos_map)
 {
     const char *vif_chassis = smap_get(&pb->options, "requested-chassis");
-    bool can_bind = can_bind_on_this_chassis(b_ctx_in->chassis_rec,
-                                             vif_chassis);
+    bool can_bind = lport_can_bind_on_this_chassis(b_ctx_in->chassis_rec,
+                                                   vif_chassis);
 
     if (!lbinding) {
         lbinding = local_binding_find(&b_ctx_out->lbinding_data->bindings,
@@ -1278,8 +1269,8 @@ consider_container_lport(const struct sbrec_port_binding *pb,
     ovs_assert(parent_b_lport && parent_b_lport->pb);
     const char *vif_chassis = smap_get(&parent_b_lport->pb->options,
                                        "requested-chassis");
-    bool can_bind = can_bind_on_this_chassis(b_ctx_in->chassis_rec,
-                                             vif_chassis);
+    bool can_bind = lport_can_bind_on_this_chassis(b_ctx_in->chassis_rec,
+                                                   vif_chassis);
 
     return consider_vif_lport_(pb, can_bind, vif_chassis, b_ctx_in, b_ctx_out,
                                container_b_lport, qos_map);
diff --git a/controller/lport.c b/controller/lport.c
index 478fcfd82..e97cc17c5 100644
--- a/controller/lport.c
+++ b/controller/lport.c
@@ -120,3 +120,12 @@ mcgroup_lookup_by_dp_name(
 
     return retval;
 }
+
+bool
+lport_can_bind_on_this_chassis(const struct sbrec_chassis *chassis_rec,
+                               const char *requested_chassis)
+{
+    return !requested_chassis || !requested_chassis[0]
+           || !strcmp(requested_chassis, chassis_rec->name)
+           || !strcmp(requested_chassis, chassis_rec->hostname);
+}
diff --git a/controller/lport.h b/controller/lport.h
index 345efc184..4d9b6891d 100644
--- a/controller/lport.h
+++ b/controller/lport.h
@@ -54,5 +54,7 @@ lport_is_chassis_resident(struct ovsdb_idl_index *sbrec_port_binding_by_name,
                           const struct sbrec_chassis *chassis,
                           const struct sset *active_tunnels,
                           const char *port_name);
+bool lport_can_bind_on_this_chassis(const struct sbrec_chassis *,
+                                    const char *);
 
 #endif /* controller/lport.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 67c51a86f..93b6e3e11 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -54,6 +54,7 @@
 #include "patch.h"
 #include "physical.h"
 #include "pinctrl.h"
+#include "plugging.h"
 #include "openvswitch/poll-loop.h"
 #include "lib/bitmap.h"
 #include "lib/hash.h"
@@ -269,6 +270,11 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
          */
         sbrec_logical_flow_add_clause_logical_dp_group(&lf, OVSDB_F_NE, NULL);
     }
+    /* Monitor unbound LP_VIF ports to consider representor port plugging. */
+    struct uuid zero_uuid;
+    memset(&zero_uuid, 0, sizeof(zero_uuid));
+    sbrec_port_binding_add_clause_chassis(&pb, OVSDB_F_EQ, &zero_uuid);
+    sbrec_port_binding_add_clause_type(&pb, OVSDB_F_EQ, "\"\"");
 
 out:;
     unsigned int cond_seqnos[] = {
@@ -2572,6 +2578,7 @@ main(int argc, char *argv[])
     binding_init();
     patch_init();
     pinctrl_init();
+    plugging_init();
     lflow_init();
 
     /* Connect to OVS OVSDB instance. */
@@ -3050,6 +3057,10 @@ main(int argc, char *argv[])
                           ovsrec_open_vswitch_table_get(ovs_idl_loop.idl),
                           ovsrec_port_table_get(ovs_idl_loop.idl),
                           br_int, chassis, &runtime_data->local_datapaths);
+                plugging_run(ovs_idl_txn,
+                             sbrec_port_binding_table_get(ovnsb_idl_loop.idl),
+                             ovsrec_port_table_get(ovs_idl_loop.idl),
+                             br_int, chassis);
                 pinctrl_run(ovnsb_idl_txn,
                             sbrec_datapath_binding_by_key,
                             sbrec_port_binding_by_datapath,
@@ -3268,6 +3279,7 @@ loop_done:
     ofctrl_destroy();
     pinctrl_destroy();
     patch_destroy();
+    plugging_destroy();
     ovsdb_idl_loop_destroy(&ovs_idl_loop);
     ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
diff --git a/controller/plugging.c b/controller/plugging.c
new file mode 100644
index 000000000..8e62dfcb9
--- /dev/null
+++ b/controller/plugging.c
@@ -0,0 +1,422 @@
+/* Copyright (c) 2021 Canonical
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/devlink.h>
+#include <net/if.h>
+
+#include "plugging.h"
+
+#include "hash.h"
+#include "lflow.h"
+#include "lib/vswitch-idl.h"
+#include "lport.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/vlog.h"
+#include "lib/ovn-sb-idl.h"
+#include "netlink-devlink.h"
+#include "ovn-controller.h"
+#include "openvswitch/shash.h"
+#include "packets.h"
+
+VLOG_DEFINE_THIS_MODULE(plugging);
+
+/* Contains netdev name of ports known to devlink indexed by PF MAC
+ * address and logical function number (if applicable).
+ *
+ * Examples:
+ *     SR-IOV Physical Function: key "00:53:00:00:00:42"    value "pf0hpf"
+ *     SR-IOV Virtual Function:  key "00:53:00:00:00:42-42" value "pf0vf42"
+ */
+static struct shash devlink_ports;
+
+/* Max number of physical ports connected to a single NIC SoC. */
+#define MAX_NIC_PHY_PORTS 64
+/* string repr of eth MAC, '-', logical function number (uint32_t) */
+#define MAX_KEY_LEN (17 + 1 + 10 + 1)
+
+
+static bool compat_get_host_pf_mac(const char *, struct eth_addr *);
+
+static bool
+fill_devlink_ports_key_from_strs(char *buf, size_t bufsiz,
+                                 const char *host_pf_mac,
+                                 const char *function)
+{
+    return snprintf(buf, bufsiz,
+                    function != NULL ? "%s-%s" : "%s",
+                    host_pf_mac, function) < bufsiz;
+}
+
+/* We deliberately pass the struct eth_addr by value as we would have to copy
+ * the data either way to make use of the ETH_ADDR_ARGS macro. */
+static bool
+fill_devlink_ports_key_from_typed(char *buf, size_t bufsiz,
+                                  struct eth_addr host_pf_mac,
+                                  uint32_t function)
+{
+    return snprintf(
+        buf, bufsiz,
+        function < UINT32_MAX ? ETH_ADDR_FMT"-%"PRIu32 : ETH_ADDR_FMT,
+        ETH_ADDR_ARGS(host_pf_mac), function) < bufsiz;
+}
+
+static void
+devlink_port_add_function(struct dl_port *port_entry,
+                          struct eth_addr *host_pf_mac)
+{
+    char keybuf[MAX_KEY_LEN];
+    uint32_t function_number;
+
+    switch (port_entry->flavour) {
+    case DEVLINK_PORT_FLAVOUR_PCI_PF:
+        /* For Physical Function representor ports we only add the MAC
+         * address and no logical function number. */
+        function_number = -1;
+        break;
+    case DEVLINK_PORT_FLAVOUR_PCI_VF:
+        function_number = port_entry->pci_vf_number;
+        break;
+    default:
+        VLOG_WARN("Unsupported flavour for port '%s': %s",
+                  port_entry->netdev_name,
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_PHYSICAL ? "PHYSICAL" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_CPU ? "CPU" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_DSA ? "DSA" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_PCI_PF ? "PCI_PF" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_PCI_VF ? "PCI_VF" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_VIRTUAL ? "VIRTUAL" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_UNUSED ? "UNUSED" :
+                  port_entry->flavour == DEVLINK_PORT_FLAVOUR_PCI_SF ? "PCI_SF" :
+                  "UNKNOWN");
+        return;
+    };
+    /* Failure to fill key from typed values means calculation of the max key
+     * length is wrong, i.e. a bug. */
+    ovs_assert(fill_devlink_ports_key_from_typed(
+                   keybuf, sizeof(keybuf),
+                   *host_pf_mac, function_number));
+    shash_add(&devlink_ports, keybuf, xstrdup(port_entry->netdev_name));
+}
+
+
+void
+plugging_init(void)
+{
+    struct nl_dl_dump_state *port_dump;
+    struct dl_port port_entry;
+    int error;
+    struct eth_addr host_pf_macs[MAX_NIC_PHY_PORTS];
+
+    shash_init(&devlink_ports);
+
+    port_dump = nl_dl_dump_init();
+    if ((error = nl_dl_dump_init_error(port_dump))) {
+        VLOG_WARN(
+            "unable to start dump of ports from devlink-port interface");
+        return;
+    }
+    /* The core devlink infrastructure in the kernel keeps a linked list of
+     * the devices and each of those has a linked list of ports.  These are
+     * populated by each device driver as devices are enumerated, and as such
+     * we can rely on ports being dumped in a consistent order on a device
+     * by device basis with logical numbering for each port flavour starting
+     * on 0 for each new device.
+     */
+    nl_dl_dump_start(DEVLINK_CMD_PORT_GET, port_dump);
+    while (nl_dl_port_dump_next(port_dump, &port_entry)) {
+        switch (port_entry.flavour) {
+        case DEVLINK_PORT_FLAVOUR_PHYSICAL:
+            /* The PHYSICAL flavoured port represents a network facing port
+             * on the NIC.
+             *
+             * For kernel versions where the devlink-port infrastructure does
+             * not provide a MAC address for PCI_PF flavoured ports, there
+             * exists an interface in sysfs relative to the PHYSICAL port
+             * netdev name.
+             *
+             * Since we at this point in the dump do not know if the MAC will
+             * be provided for the PCI_PF or not, proactively store the MAC
+             * address by looking up through the sysfs interface.
+             *
+             * If a MAC address is available once we get to the PCI_PF, we
+             * will overwrite the stored value.
+             */
+            if (port_entry.number >= MAX_NIC_PHY_PORTS) {
+                VLOG_WARN("physical port number out of range for port '%s': "
+                          "%"PRIu32,
+                          port_entry.netdev_name, port_entry.number);
+                continue;
+            }
+            compat_get_host_pf_mac(port_entry.netdev_name,
+                                   &host_pf_macs[port_entry.number]);
+            break;
+        case DEVLINK_PORT_FLAVOUR_PCI_PF: /* FALL THROUGH */
+            /* The PCI_PF flavoured port represents a host facing port.
+             *
+             * For function flavours other than PHYSICAL, pci_pf_number will
+             * be set to the logical number of the physical port to which the
+             * function belongs.
+             */
+            if (port_entry.pci_pf_number < MAX_NIC_PHY_PORTS
+                && !eth_addr_is_zero(port_entry.function.eth_addr)) {
+                host_pf_macs[port_entry.pci_pf_number] =
+                    port_entry.function.eth_addr;
+            }
+            /* FALL THROUGH */
+        case DEVLINK_PORT_FLAVOUR_PCI_VF:
+            /* The PCI_VF flavoured port represents a host facing
+             * PCI Virtual Function.
+             *
+             * For function flavours other than PHYSICAL, pci_pf_number will
+             * be set to the logical number of the physical port to which the
+             * function belongs.
+             */
+            if (port_entry.pci_pf_number >= MAX_NIC_PHY_PORTS) {
+                VLOG_WARN("physical port number out of range for port '%s': "
+                          "%"PRIu32,
+                          port_entry.netdev_name, port_entry.pci_pf_number);
+                continue;
+            }
+            devlink_port_add_function(&port_entry,
+                                      &host_pf_macs[port_entry.pci_pf_number]);
+            break;
+        };
+    }
+    nl_dl_dump_finish(port_dump);
+    nl_dl_dump_destroy(port_dump);
+
+    struct shash_node *node;
+    SHASH_FOR_EACH (node, &devlink_ports) {
+        VLOG_DBG("devlink port %s -> %s", node->name, (char *) node->data);
+    }
+}
+
+void
+plugging_destroy(void)
+{
+    shash_destroy_free_data(&devlink_ports);
+}
+
+static bool
+match_port(const struct ovsrec_port *port, const char *name)
+{
+    return !name || !name[0]
+           || !strcmp(port->name, name);
+}
+
+/* Creates a port in bridge 'br_int' named 'name'.
+ *
+ * If such a port already exists, removes it from 'existing_ports'. */
+static void
+create_port(struct ovsdb_idl_txn *ovs_idl_txn,
+            const char *iface_id,
+            const struct ovsrec_bridge *br_int, const char *name,
+            struct shash *existing_ports)
+{
+    for (size_t i = 0; i < br_int->n_ports; i++) {
+        if (match_port(br_int->ports[i], name)) {
+            VLOG_INFO("port already created: %s %s", iface_id, name);
+            shash_find_and_delete(existing_ports, br_int->ports[i]->name);
+            return;
+        }
+    }
+
+    ovsdb_idl_txn_add_comment(ovs_idl_txn,
+                              "ovn-controller: plugging port '%s' into '%s'",
+                              name, br_int->name);
+
+    struct ovsrec_interface *iface;
+    iface = ovsrec_interface_insert(ovs_idl_txn);
+    ovsrec_interface_set_name(iface, name);
+    const struct smap ids = SMAP_CONST2(
+        &ids,
+        "iface-id", iface_id,
+        "ovn-plugged", "true");
+    ovsrec_interface_set_external_ids(iface, &ids);
+
+    struct ovsrec_port *port;
+    port = ovsrec_port_insert(ovs_idl_txn);
+    ovsrec_port_set_name(port, name);
+    ovsrec_port_set_interfaces(port, &iface, 1);
+
+    struct ovsrec_port **ports;
+    ports = xmalloc(sizeof *ports * (br_int->n_ports + 1));
+    memcpy(ports, br_int->ports, sizeof *ports * br_int->n_ports);
+    ports[br_int->n_ports] = port;
+    ovsrec_bridge_verify_ports(br_int);
+    ovsrec_bridge_set_ports(br_int, ports, br_int->n_ports + 1);
+
+    free(ports);
+}
+
+static void
+remove_port(const struct ovsrec_bridge *br_int,
+            const struct ovsrec_port *port)
+{
+    for (size_t i = 0; i < br_int->n_ports; i++) {
+        if (br_int->ports[i] != port) {
+            continue;
+        }
+        struct ovsrec_port **new_ports;
+        new_ports = xmemdup(br_int->ports,
+                            sizeof *new_ports * (br_int->n_ports - 1));
+        if (i != br_int->n_ports - 1) {
+            /* Removed port was not last. */
+            new_ports[i] = br_int->ports[br_int->n_ports - 1];
+        }
+        ovsrec_bridge_verify_ports(br_int);
+        ovsrec_bridge_set_ports(br_int, new_ports, br_int->n_ports - 1);
+        free(new_ports);
+        ovsrec_port_delete(port);
+        return;
+    }
+}
+
+static bool
+can_plug(const char *vif_plugging)
+{
+    return !vif_plugging || !vif_plugging[0]
+           || !strcmp(vif_plugging, "true");
+}
+
+void
+plugging_run(struct ovsdb_idl_txn *ovs_idl_txn,
+             const struct 
sbrec_port_binding_table *port_binding_table, + const struct ovsrec_port_table *port_table, + const struct ovsrec_bridge *br_int, + const struct sbrec_chassis *chassis) +{ + if (!ovs_idl_txn) { + return; + } + + /* Figure out what ports managed by OVN already exist. */ + struct shash existing_ports = SHASH_INITIALIZER(&existing_ports); + const struct ovsrec_port *port; + OVSREC_PORT_TABLE_FOR_EACH (port, port_table) { + for (size_t i = 0; i < port->n_interfaces; i++) { + struct ovsrec_interface *iface = port->interfaces[i]; + const char *port_iface_id; + if (can_plug(smap_get(&iface->external_ids, "ovn-plugged")) + && (port_iface_id = smap_get(&iface->external_ids, + "iface-id"))) { + shash_add(&existing_ports, port_iface_id, port); + } + } + } + + /* Iterate over currently unbound ports destined for this chassis or ports + * already bound to this chassis and check if OVN management is requested. + * Remove ports from 'existing_ports' that do exist in the database and + * should be there. */ + const struct sbrec_port_binding *port_binding; + SBREC_PORT_BINDING_TABLE_FOR_EACH (port_binding, + port_binding_table) + { + VLOG_INFO("HELLO %s", port_binding->logical_port); + const char *vif_chassis = smap_get(&port_binding->options, + "requested-chassis"); + const char *vif_plugging = smap_get_def(&port_binding->options, + "ovn-plugging", + "false"); + VLOG_INFO("HELLO %s", port_binding->logical_port); + if (lport_can_bind_on_this_chassis(chassis, vif_chassis) + && can_plug(vif_plugging)) + { + char keybuf[MAX_KEY_LEN]; + const char *rep_port; + const char *pf_mac; + const char *vf_num; + + if (!fill_devlink_ports_key_from_strs( + keybuf, sizeof(keybuf), + (pf_mac = smap_get( + &port_binding->options, "pf-mac")), + (vf_num = smap_get( + &port_binding->options, "vf-num")))) + { + /* Overflow, most likely incorrect input data from database */ + VLOG_WARN("Southbound DB port plugging options out of range: " + "pf-mac: '%s' vf-num: '%s'", pf_mac, vf_num); + continue; + } + + shash_find_and_delete(&existing_ports, port_binding->logical_port); + + rep_port = shash_find_data(&devlink_ports, keybuf); + VLOG_INFO("plug %s (%s) -> %s", + port_binding->logical_port, rep_port, br_int->name); + create_port(ovs_idl_txn, port_binding->logical_port, + br_int, rep_port, &existing_ports); + } + } + + /* Now 'existing_ports' only contains ports that exist in the + * database but shouldn't. Delete them from the database. */ + struct shash_node *port_node, *port_next_node; + SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) { + port = port_node->data; + shash_delete(&existing_ports, port_node); + VLOG_INFO("remove port %s", port->name); + remove_port(br_int, port); + } + shash_destroy(&existing_ports); +} + +/* The kernel devlink-port interface provides a vendor neutral and standard way + * of discovering host visible resources such as MAC address of interfaces from + * a program running on the NIC SoC side. + * + * However a fairly recent kernel version is required for it to work, so until + * this is widely available we provide this helper to retrieve the same + * information from the interim sysfs solution. 
*/ +static bool +compat_get_host_pf_mac(const char *netdev_name, struct eth_addr *ea) +{ + char file_name[IFNAMSIZ+35+1]; + FILE *stream; + char line[128]; + bool retval = false; + + snprintf(file_name, sizeof(file_name), + "/sys/class/net/%s/smart_nic/pf/config", netdev_name); + stream = fopen(file_name, "r"); + if (!stream) { + VLOG_WARN("%s: open failed (%s)", + file_name, ovs_strerror(errno)); + *ea = eth_addr_zero; + return false; + } + while (fgets(line, sizeof(line), stream)) { + char key[16]; + char *cp; + if (ovs_scan(line, "%15[^:]: ", key) + && key[0] == 'M' && key[1] == 'A' && key[2] == 'C') + { + /* strip any newline character */ + if ((cp = strchr(line, '\n')) != NULL) { + *cp = '\0'; + } + /* point cp at end of key + ': ', i.e. start of MAC address */ + cp = line + strnlen(key, sizeof(key)) + 2; + retval = eth_addr_from_string(cp, ea); + break; + } + } + fclose(stream); + return retval; +} diff --git a/controller/plugging.h b/controller/plugging.h new file mode 100644 index 000000000..324e5bdc0 --- /dev/null +++ b/controller/plugging.h @@ -0,0 +1,81 @@ +/* Copyright (c) 2021 Canonical + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef OVN_PLUGGING_H +#define OVN_PLUGGING_H 1 + +/* Interface Plugging + * ================== + * + * This module adds and removes ports on the integration bridge, as directed by + * Port_Binding options. + * + * Traditionally it has been the CMSs responsibility to create Virtual + * Interfaces as part of instance (Container, Pod, Virtual Machine etc.) life + * cycle, and subsequently manage plug/unplug operations on the Open vSwitch + * integration bridge. + * + * With the advent of NICs connected to multiple distinct CPUs we can have a + * topology where the instance runs on one host and Open vSwitch and OVN runs + * on a different host, the smartnic CPU. + * + * The act of plugging and unplugging the representor port in Open vSwitch + * running on the smartnic host CPU would be the same for every smartnic + * variant (thanks to the devlink-port infrastructure), and every CMS. As + * such it is natural to extend OVN to provide this common functionality + * through its CMS facing API. + * + * The instance will be connected to a SR-IOV Virtual Function or a RDMA + * Mediated Device on the host sytem (the latter not currently addressed in + * this implementation). The NIC driver will maintain a representor port for + * each of the host visible devices on the smartnic side. + * + * It is the CMSs responsibility to maintain a mapping between instance host + * and smartnic host, OVN can help by optionally providing details such as + * board serial number of the smartnic system as part of Chassis registration. + * + * The CMS will use it's knowledge of instance host <-> smartnic host mapping + * to add appropriate `requested-chassis` along with the information OVN needs + * to identify the representor port as options when creating Logical Switch + * Ports for instances. 
These options will be copied over to the Port_Binding + * table by ovn-northd. + * + * OVN will use the devlink interface to look up which representor port + * corresponds to the host visible resource and add this representor port to + * the integration bridge. + * + * Options API: + * ovn-plugged: true + * pf-mac: "00:53:00:00:00:42" // To distinguish between ports on NIC SoC + * vf-num: 42 (optional) // Refers to a logical PCI VF number + * // not specifying vf-num means plug PF + * // representor. + */ + +struct ovsdb_idl_txn; +struct sbrec_port_binding_table; +struct ovsrec_port_table; +struct ovsrec_bridge; +struct sbrec_chassis; + +void plugging_run(struct ovsdb_idl_txn *, + const struct sbrec_port_binding_table *, + const struct ovsrec_port_table *, + const struct ovsrec_bridge *, + const struct sbrec_chassis *); +void plugging_init(void); +void plugging_destroy(void); + +#endif /* controller/plugging.h */
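As an illustration of how the Options API above ties into the devlink_ports
table in plugging.c, here is a minimal, self-contained sketch (not part of
the patch; the MAC and VF number are made-up example values) of the key
construction that fill_devlink_ports_key_from_strs() performs:

#include <stdio.h>

int
main(void)
{
    /* MAC string (17), '-' (1), up to 10 digits of a uint32_t, NUL. */
    char key[17 + 1 + 10 + 1];

    /* VF representor: key is "<pf-mac>-<vf-num>". */
    snprintf(key, sizeof key, "%s-%s", "00:53:00:00:00:42", "42");
    printf("VF key: %s\n", key);  /* -> 00:53:00:00:00:42-42 */

    /* PF representor: vf-num omitted, key is the pf-mac alone. */
    snprintf(key, sizeof key, "%s", "00:53:00:00:00:42");
    printf("PF key: %s\n", key);  /* -> 00:53:00:00:00:42 */

    return 0;
}

The same key format is produced from typed devlink data by
fill_devlink_ports_key_from_typed() at init time, which is what makes the
string-based lookup from Port_Binding options possible.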
Introduce plugging module that adds and removes ports on the
integration bridge, as directed by Port_Binding options.

Traditionally it has been the CMS's responsibility to create Virtual
Interfaces (VIFs) as part of instance (Container, Pod, Virtual
Machine etc.) life cycle, and subsequently manage plug/unplug
operations on the Open vSwitch integration bridge.

With the advent of NICs connected to multiple distinct CPUs we can
have a topology where the instance runs on one host and Open
vSwitch and OVN run on a different host, the smartnic CPU.

The act of plugging and unplugging the representor port in Open
vSwitch running on the smartnic host CPU would be the same for
every smartnic variant (thanks to the devlink-port[0][1]
infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.).
As such it is natural to extend OVN to provide this common
functionality through its CMS facing API.

The instance will be connected to an SR-IOV Virtual Function or an
RDMA Mediated Device on the host system (the latter not currently
addressed in this implementation). The smartnic driver will maintain
a representor port for each of the host visible devices on the
smartnic CPU side.

It is the CMS's responsibility to maintain a mapping between instance
host and smartnic host; OVN can help by optionally providing details
such as the board serial number of the smartnic system as part of
Chassis registration.

The CMS will use its knowledge of the instance host <-> smartnic host
mapping to add an appropriate `requested-chassis`, along with the
information OVN needs to identify the representor port, as options
when creating Logical Switch Ports for instances. These options will
be copied over to the Port_Binding table by ovn-northd.

OVN will use the devlink-port[0][1] interface to look up which
representor port corresponds to the host visible resource and
maintain the presence of this representor port on the integration
bridge.

0: https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html
1: https://mail.openvswitch.org/pipermail/ovs-dev/2021-May/382701.html

TODO:
* Implement run-time refresh of, or incremental updates to, the
  devlink_ports hash by opening a netlink socket and monitoring
  devlink changes.
* Make use of an index for unbound ports.
* Make use of the existing Incremental Processing support for
  Port_Bindings and call into the plugging module from the binding
  module instead of having a separate run loop.
* Implement functionality to convey smartnic identifying
  characteristics as external-ids on Chassis registration. The
  intention is to help the CMS map relationships between hosts. The
  source can be a board serial number obtained from devlink-info,
  PCI VPD, or a value stored in the local Open vSwitch database
  maintained by the operator.
* Write tests.
* Update documentation.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
---
 controller/automake.mk      |   2 +
 controller/binding.c        |  17 +-
 controller/lport.c          |   9 +
 controller/lport.h          |   2 +
 controller/ovn-controller.c |  12 +
 controller/plugging.c       | 422 ++++++++++++++++++++++++++++++++++++
 controller/plugging.h       |  81 +++++++
 7 files changed, 532 insertions(+), 13 deletions(-)
 create mode 100644 controller/plugging.c
 create mode 100644 controller/plugging.h
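For reviewers, a rough sketch of the per-Port_Binding decision described
above. This is illustration only, not part of the patch: should_plug() is a
hypothetical helper that condenses lport_can_bind_on_this_chassis() and
can_plug() into one predicate, and the chassis/option values are made up:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Plug when the port is not pinned to a different chassis and plugging is
 * explicitly requested via the ovn-plugged option. */
static bool
should_plug(const char *chassis_name, const char *chassis_hostname,
            const char *requested_chassis, const char *ovn_plugged)
{
    bool resident = !requested_chassis || !requested_chassis[0]
                    || !strcmp(requested_chassis, chassis_name)
                    || !strcmp(requested_chassis, chassis_hostname);

    return resident && ovn_plugged && !strcmp(ovn_plugged, "true");
}

int
main(void)
{
    /* requested-chassis matches our hostname, plugging requested -> 1. */
    printf("%d\n", should_plug("ch1", "smartnic-1", "smartnic-1", "true"));
    /* Port is destined for another chassis -> 0. */
    printf("%d\n", should_plug("ch1", "smartnic-1", "other-host", "true"));
    return 0;
}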