[ovs-dev,RFC] Design for OVN Kubernetes integration.

Message ID 1445464402-21268-1-git-send-email-gshetty@nicira.com
State RFC

Commit Message

Gurucharan Shetty Oct. 21, 2015, 9:53 p.m. UTC
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
---
 ovn/automake.mk   |    1 +
 ovn/kubernetes.md |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)
 create mode 100644 ovn/kubernetes.md

Comments

Gurucharan Shetty Oct. 30, 2015, 3:47 p.m. UTC | #1
I have added a couple of example integrations that do basic
connectivity here (for anyone that wants to try it out):

Without OpenStack:
https://github.com/shettyg/ovn-kubernetes/blob/master/Example_overlay.md

With OpenStack:
https://github.com/shettyg/ovn-kubernetes/blob/master/Example_underlay.md

With OpenStack, since the OVN Neutron plugin of OpenStack does not yet
create OVN L3 routers, one has to create them manually. Also, since the
OpenStack OVN driver does not yet set the IP address for lports, that
has to be set manually too.

Please note that the k8 network plugin architecture is still an "alpha"
feature. It is likely that Kubernetes will introduce a new plugin
architecture next month, so we will likely have to rewrite a few
things. As a result, the code is mostly a proof of concept.

On Wed, Oct 21, 2015 at 2:53 PM, Gurucharan Shetty <shettyg@nicira.com> wrote:
> Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
> ---
>  ovn/automake.mk   |    1 +
>  ovn/kubernetes.md |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 127 insertions(+)
>  create mode 100644 ovn/kubernetes.md
>
> diff --git a/ovn/automake.mk b/ovn/automake.mk
> index f3f40e5..7edaa29 100644
> --- a/ovn/automake.mk
> +++ b/ovn/automake.mk
> @@ -73,6 +73,7 @@ DISTCLEANFILES += ovn/ovn-architecture.7
>  EXTRA_DIST += \
>         ovn/TODO \
>         ovn/CONTAINERS.OpenStack.md \
> +       ovn/kubernetes.md \
>         ovn/OVN-GW-HA.md
>
>  # Version checking for ovn-nb.ovsschema.
> diff --git a/ovn/kubernetes.md b/ovn/kubernetes.md
> new file mode 100644
> index 0000000..47bbc9b
> --- /dev/null
> +++ b/ovn/kubernetes.md
> @@ -0,0 +1,126 @@
> +Integration of OVN with Kubernetes.
> +----------------------------------
> +
> +OVN's integration with k8 is a work in progress.
> +
> +OVN provides network virtualization to containers.  OVN's integration with
> +Kubernetes works in two modes - the "underlay" mode or the "overlay" mode.
> +
> +In the "underlay" mode, OVN requires a OpenStack setup to provide container
> +networking. In this mode, one can create logical networks and can have
> +k8 pods running inside VMs, independent VMs and physical machines running some
> +stateful services connected to the same logical network. (For this mode to
> +work completely, we need distributed load-balancer suppport in OVN, which
> +is yet to be implemented, but is in the roadmap.)
> +
> +In the "overlay" mode, OVN can create virtual networks amongst k8 pods
> +running on multiple hosts.  In this mode, you do not need a pre-created
> +OpenStack setup. (This mode needs NAT support. Open vSwitch as of version
> +2.4 does not support NAT. It is likely that Open vSwitch 2.5 or 2.6 will
> +support NAT)
> +
> +For both the modes to work, a user has to install Open vSwitch in each VM/host
> +that he plans to run his containers.
> +
> +The "underlay" mode
> +-------------------
> +
> +This mode requires that you have a OpenStack setup pre-installed with OVN
> +providing the underlay networking.  It is out of scope of this documentation
> +to describe how to create a OpenStack setup with OVN. Instead, please refer
> +http://docs.openstack.org/developer/networking-ovn/ for that.
> +
> +Cluster Setup
> +=============
> +
> +Once you have the OpenStack setup, you can have a tenant create a bunch
> +of VMs (with one mgmt network interface or more) that form the k8 cluster.
> +
> +From the master node, we make a call to OpenStack Neutron to create a
> +logical router.  The returned UUID is henceforth referred as '$ROUTER_ID'.
> +
> +With OVN, each k8 worker node is a logical switch. So they get a /x address.
> +From each worker node, we create a logical switch in OVN via Neutron. We then
> +make that logical switch as a port of the logical router $ROUTER_ID.
> +
> +We then create 2^(32-x) logical ports for that logical switch (with the parent
> +port being the VIF_ID of the hosting VM).  On the worker node, for each
> +logical port we write data into the local Open vSwitch database to
> +act as a cache of ip address, its associated mac address and port uuid.
> +
> +The value 'x' chosen depends on the number of CPUs and memory available
> +in the VM.
> +
> +K8 kubelet in each worker node is started with a OVN network plugin to setup
> +the pod network namespace.
> +
> +Since one of the k8 requirements is that each pod in a cluster is able to
> +talk to every other pod in the cluster via IP address, the above architecture
> +with interconnected logical switches via a logical router acts as the
> +foundation. In addition, this lets other VMs and physical machines (outside
> +the k8 cluster) reach the k8 pods via the same IP address.  With a OVN l3
> +gateway, one could also access each pod from the external world using direct
> +ip addresses or via using floating IPs.
> +
> +Pod Creation
> +============
> +
> +When a k8 pod is created, it lands on one of the worker nodes. The OVN
> +plugin is called to setup the network namespace. The plugin looks at the
> +local cache of IP addresses.  It picks an unused IP address and sets up the
> +pod with that IP address and MAC address and then marks that IP address as
> +'used' in the cache.
> +
> +The above design prevents one from making calls to Neutron for every single
> +pod creation.
> +
> +Security
> +========
> +
> +The network admin creates a few security profiles in OVN via Neutron.
> +For each pod spec, if he wants to associate a firewall profile, he adds the
> +UUIDs from Neutron in the pod spec either as labels or as an annotation.
> +
> +When the pod is created, the network plugin will make a call to the k8 API
> +server to fetch the security profile UUIDs for the pod. The network plugin
> +chooses a unused logical port and makes a call to Neutron to associate the
> +security profile with the logical port. It also updates the local cache to
> +indicate the associated security profile with the logical port.
> +
> +An optimization is possible here. If the amount of distinct security profiles
> +is limited in a k8 cluster, one can pre-associate the security profiles with
> +the logical ports. This would mean that one would create a lot more logical
> +ports per worker node (inspite of limited CPU and Memory).
> +
> +
> +Loadbalancers
> +=============
> +
> +We need to maintain a cache of all the logical port uuids and their ip
> +addresses in the k8 master node.  This cache can be built during cluster
> +bringup by providing the $ROUTER_ID as an input to a script.
> +
> +On k8 master node, we will need to write a new daemon that is equivalent to
> +k8's kube-proxy. Unlike default k8 setup, where each node has a kube-proxy,
> +we will need our daemon run only on the master node.
> +
> +When a k8 service is created, the new daemon will create a load-balancer object
> +in OVN (via Neutron). Whenever new endpoints get created (via pods) for that
> +service, the daemon will look at the IP addresses of the endpoints, figure
> +out the logical port uuid associated with that IP address and then make
> +a call to OVN (via Neutron) to update the logical port uuids associated with
> +the load balancer object.
> +
> +North-South traffic
> +===================
> +
> +For external connectivity to k8 services, we will need l3 gateway in OVN.
> +For every service ip address, the L3 gateway should choose one of the pod
> +endpoints as the destination.
> +
> +
> +Overlay mode
> +------------
> +
> +TBA.
> +
> --
> 1.7.9.5
>
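
The pod-creation step described in the quoted patch (look at the local cache,
pick an unused IP address, set up the pod, mark the address as 'used') can be
sketched roughly as below. This is only an illustration under assumptions: the
in-memory dictionary stands in for the cache kept in the local Open vSwitch
database, and the names `CachedPort` and `PortCache` are hypothetical, not
taken from the example integrations.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedPort:
    """One pre-created logical port as cached on a worker node (hypothetical)."""
    port_uuid: str
    ip: str
    mac: str
    used: bool = False


class PortCache:
    """In-memory stand-in for the per-node cache of logical ports."""

    def __init__(self) -> None:
        self._ports: Dict[str, CachedPort] = {}

    def add(self, port: CachedPort) -> None:
        self._ports[port.port_uuid] = port

    def allocate(self) -> Optional[CachedPort]:
        """Pick the first unused port and mark it used; None if none is left."""
        for port in self._ports.values():
            if not port.used:
                port.used = True
                return port
        return None


cache = PortCache()
cache.add(CachedPort("uuid-1", "192.168.1.2", "02:00:00:00:00:02"))
cache.add(CachedPort("uuid-2", "192.168.1.3", "02:00:00:00:00:03"))

first = cache.allocate()   # no call to Neutron is needed here
second = cache.allocate()
third = cache.allocate()   # cache exhausted
```

Because allocation only touches the local cache, no Neutron call is made per
pod, which is the point of pre-creating the logical ports.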
Ben Pfaff Nov. 10, 2015, 5:56 a.m. UTC | #2
On Wed, Oct 21, 2015 at 02:53:22PM -0700, Gurucharan Shetty wrote:
> Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>

This looks pretty valuable to me.  I am glad that you spent the time to
figure all of this out.  Thank you!

I feel like I don't know any of this well enough to review it properly.
I guess that since it is an RFC, you aren't really looking for a
detailed review at this point.  I hope that someone who knows Kubernetes
will take a look at some point.
Han Zhou Nov. 10, 2015, 7:55 a.m. UTC | #3
Hi Gurucharan,

Thanks for your work!

On Wed, Oct 21, 2015 at 2:53 PM, Gurucharan Shetty <shettyg@nicira.com>
wrote:

>
> +
> +OVN provides network virtualization to containers.  OVN's integration with
> +Kubernetes works in two modes - the "underlay" mode or the "overlay" mode.
> +
>

Could you briefly describe the scenarios, pros, and cons of each
mode?


>
> +We then create 2^(32-x) logical ports for that logical switch (with the
> parent
> +port being the VIF_ID of the hosting VM).  On the worker node, for each
> +logical port we write data into the local Open vSwitch database to
> +act as a cache of ip address, its associated mac address and port uuid.
> +
> +The value 'x' chosen depends on the number of CPUs and memory available
> +in the VM.
>

Well, this 'x' might be hard to pre-define. We might end up reserving
subnets large enough for a host that runs many small pods, but that would
waste IP space.
Of course, it is not an issue in deployments where IP space is adequate.


> +Since one of the k8 requirements is that each pod in a cluster is able to
> +talk to every other pod in the cluster via IP address, the above
> architecture
> +with interconnected logical switches via a logical router acts as the
>

This idea sounds good. But I have a concern about the scalability. For
example, 1000 logical switches (for 1000 hosts in a cluster) connect to a
single logical router. Would this scale?

Thanks,
Han
Gurucharan Shetty Nov. 10, 2015, 4:19 p.m. UTC | #4
On Mon, Nov 9, 2015 at 11:55 PM, Han Zhou <zhouhan@gmail.com> wrote:
> Hi Guruchanran,
>
> Thanks for your work!
>
> On Wed, Oct 21, 2015 at 2:53 PM, Gurucharan Shetty <shettyg@nicira.com>
> wrote:
>>
>>
>> +
>> +OVN provides network virtualization to containers.  OVN's integration
>> with
>> +Kubernetes works in two modes - the "underlay" mode or the "overlay"
>> mode.
>> +
>
>
> Could you help briefly describe what are the scenario, pros & cons of each
> mode?

OVN is a pure networking story and the idea is to integrate with as
many use cases as possible.

Some use cases for the underlay mode.

1. One big use case (after speaking to many potential and current
deployers of containers) is the ability to have seamless connectivity
between your existing services (in VMs and physical machines) and
your new applications running in containers in a k8 cluster. This
means that you need a way for containers running in a k8 cluster on
top of VMs to access your physical machines and other VMs. Doing
something like this in a secure way needs the support for "underlay"
mode.

2. If you take GCE, they are able to provide a pool of IP addresses
to your VMs through support in their underlay, i.e. they create
tunnels in their underlay, but don't create tunnels inside their VMs.
This lets them do seamless integration with external load balancers
for north-south traffic as well as east-west traffic for other
services. With OVN, we provide the same richness.

3. k8s is not inherently multi-tenant (yet). If you have an enterprise
OpenStack cloud, it already provides multi-tenancy. If you use that as
your IaaS layer, then you can have multiple k8 clusters for different
tenants and not worry about overlapping IP addresses inside a single
tenant. In this mode, you don't have to worry about multiple container
schedulers overstepping on your compute resources. So in the same
cloud, you can have k8, Mesos, Swarm etc. running in parallel.

4. If you consider containers to be inherently insecure (many people
currently do), it makes sense to only run them inside VMs and not on
bare metal. This is because even if a container app breaks out, it
does not have access to your entire datacenter.


Use case for the overlay mode.

If you just want to run your cluster in a public cloud, "underlay"
mode is out of the question. OVN still has a good role to play, as it
can provide network connectivity and lightweight security, and you can
enforce policies for clean separation between dev/QA workloads etc.


>
>>
>>
>> +We then create 2^(32-x) logical ports for that logical switch (with the
>> parent
>> +port being the VIF_ID of the hosting VM).  On the worker node, for each
>> +logical port we write data into the local Open vSwitch database to
>> +act as a cache of ip address, its associated mac address and port uuid.
>> +
>> +The value 'x' chosen depends on the number of CPUs and memory available
>> +in the VM.
>
>
> Well, this 'x' might be hard to pre-define. We might have to end up with
> reserving big enough subnets for a host to be able to host small pods. But
> that would mean waste of IP space.
> Of course it is not an issue in deployments where IP space is adequate.


Writing a network plugin is quite easy for k8, so people with specific
deployment models will write their own network plugins.
For this case, the thinking is that OVN IP addresses are virtual, so
IP addresses are not really in short supply.
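
To make the sizing trade-off concrete, here is a minimal sketch of the
2^(32-x) arithmetic from the design. The function name and the sample prefix
lengths are hypothetical, and the sketch follows the document's formula
literally; a real deployment would probably reserve some addresses (for
example for the router port), which is not modelled here.

```python
import ipaddress


def ports_for_prefix(prefix_len: int) -> int:
    """Logical ports pre-created for an IPv4 /prefix_len subnet: 2^(32-x)."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_len)


# Cross-check the formula against the standard library on an example subnet
# (the address itself is illustrative only).
subnet = ipaddress.ip_network("192.168.1.0/26")
assert subnet.num_addresses == ports_for_prefix(26)

for x in (24, 26, 28):
    print(f"/{x}: {ports_for_prefix(x)} logical ports per worker node")
```

Each step down in 'x' doubles the number of logical ports pre-created per
worker node, so the cost of a generous subnet is in ports and cache entries
rather than in scarce addresses.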



>
>>
>> +Since one of the k8 requirements is that each pod in a cluster is able to
>> +talk to every other pod in the cluster via IP address, the above
>> architecture
>> +with interconnected logical switches via a logical router acts as the
>
>
> This ideal sounds good. But I have a concern about the scalability. For
> example, 1000 logical switches (for 1000 hosts in a cluster) connects to a
> single logical router. Would this scale?

I don't know the scale implications as we are just getting started. k8
talks about a 100-node cluster as the supportable scale (this was
reiterated by k8s developers yesterday at KubeCon; they intend to
increase the scale goals, but they don't want to promise the moon).
In yesterday's talk at KubeCon, eBay mentioned that they too have a
single router connected to multiple logical switches in a large
cluster (1000ish nodes). They did talk about scale implications, but
it is not really clear where the bottleneck is.


>
> Thanks,
> Han
>
Han Zhou Nov. 10, 2015, 6:50 p.m. UTC | #5
On Tue, Nov 10, 2015 at 8:19 AM, Gurucharan Shetty <shettyg@nicira.com>
wrote:

> On Mon, Nov 9, 2015 at 11:55 PM, Han Zhou <zhouhan@gmail.com> wrote:
> > Hi Guruchanran,
> >
> > Thanks for your work!
> >
> > On Wed, Oct 21, 2015 at 2:53 PM, Gurucharan Shetty <shettyg@nicira.com>
> > wrote:
> >>
> >>
> >> +
> >> +OVN provides network virtualization to containers.  OVN's integration
> >> with
> >> +Kubernetes works in two modes - the "underlay" mode or the "overlay"
> >> mode.
> >> +
> >
> >
> > Could you help briefly describe what are the scenario, pros & cons of
> each
> > mode?
>
> OVN is a pure networking story and the idea is to integrate with as
> many use cases as possible.
>
> Some use cases for the underlay mode.
>
> 1. One big use case (after speaking to many potential and current
> deployers of containers) is the ability to have seamless connectivity
> between your existing services (in VMs and Physical machines) with
> your new applications running in your containers in a k8 cluster. This
> means that you need a way to connect your containers running in a k8
> cluster on top of VMs access your physical machines and other VMs.
> Doing something like this in a secure way needs the support for
> "underlay" mode.
>
> 2. If you take GCE, the way they are able to provide a pool of IP
> addresses to your VMs, is via the support in their underlay. i.e. they
> create tunnels in their underlay, but don't create tunnels inside
> their VMs. This lets them do seamless integration with external
> loadbalancers for north-south traffic as well as east-west traffic for
> other services. WIth OVN, we provide the same richness.
>
> 3. k8s is not inherently multi-tenant (yet). If you have a enterprise
> OpenStack cloud, it already provides multi-tenancy. If you use that as
> your IAAS layer, then you can have multiple k8 clusters for different
> tenants and not worry about overlapping ip addresses inside a single
> tenant. In this mode, you don't have to worry about overstepping on
> your compute resources by multiple container schedulers. So in the
> same cloud, you can have k8, Mesos, Swarm etc running in parallel.
>
> 4. If you consider containers to be inherently insecure (many people
> currently do), it makes sense to only run them inside VMs and not on
> baremetal. This is because even if a container app breaks out, they
> don't have access to your entire datacenter.
>
>
> Use case for the overlay mode.
>
> If you just want to run your cluster in a public cloud, "underlay"
> mode is out of question. OVN still has a good role to play as it can
> provide network connectivity, light weight security, you can enforce
> policies for clean separation between dev/qa workloads etc.
>
>
This makes it much clearer. Thanks, and I would appreciate it if this is
captured in the document.

In addition, I would suggest clearly explaining the terms "underlay" and
"overlay" in the document. "Underlay" mode in this context actually means
that k8s ports are running at the same logical layer as the nodes that host
the containers. The host nodes themselves can run in "overlay" networks
provisioned on OVN. Without a specific explanation, it may confuse
first-time readers.


>
> >
> >>
> >>
> >> +We then create 2^(32-x) logical ports for that logical switch (with the
> >> parent
> >> +port being the VIF_ID of the hosting VM).  On the worker node, for each
> >> +logical port we write data into the local Open vSwitch database to
> >> +act as a cache of ip address, its associated mac address and port uuid.
> >> +
> >> +The value 'x' chosen depends on the number of CPUs and memory available
> >> +in the VM.
> >
> >
> > Well, this 'x' might be hard to pre-define. We might have to end up with
> > reserving big enough subnets for a host to be able to host small pods.
> But
> > that would mean waste of IP space.
> > Of course it is not an issue in deployments where IP space is adequate.
>
>
> Writing a network plugin is quite easy for k8. So people with specific
> deployment models will write their own network plugins.
> For this case, the thought process is that OVN IP addresses are
> virtual. So IP addresses is not really in short supply.
>
This suggests the best practice is to run the host nodes themselves in
overlay mode (with virtual IPs).
That makes sense, because if they ran in bridged mode, there might be no
need to deploy OVN or any other overlay-based SDN in the first place.

>


> >
> >>
> >> +Since one of the k8 requirements is that each pod in a cluster is able
> to
> >> +talk to every other pod in the cluster via IP address, the above
> >> architecture
> >> +with interconnected logical switches via a logical router acts as the
> >
> >
> > This ideal sounds good. But I have a concern about the scalability. For
> > example, 1000 logical switches (for 1000 hosts in a cluster) connects to
> a
> > single logical router. Would this scale?
>
> I don't know the scale implications as we are just getting started. k8
> talks about a 100 node cluster as supportable scale (it was
> re-iterated by k8s developers yesterday in kubecon. They intend to
> increase the scale goals, but they don't want to promise the moon) .
> In yesterday's talk in kubecon ebay mentioned that they too have a
> single router connected to multiple logical switches in a large
> cluster (1000ish nodes). They did talk about scale implications, but
> it is not really clear where the bottleneck is.
>
>
Let's keep this in mind and see what scale we can achieve with OVN :)

Acked-by: Han Zhou <zhouhan@gmail.com>

Patch

diff --git a/ovn/automake.mk b/ovn/automake.mk
index f3f40e5..7edaa29 100644
--- a/ovn/automake.mk
+++ b/ovn/automake.mk
@@ -73,6 +73,7 @@  DISTCLEANFILES += ovn/ovn-architecture.7
 EXTRA_DIST += \
 	ovn/TODO \
 	ovn/CONTAINERS.OpenStack.md \
+	ovn/kubernetes.md \
 	ovn/OVN-GW-HA.md
 
 # Version checking for ovn-nb.ovsschema.
diff --git a/ovn/kubernetes.md b/ovn/kubernetes.md
new file mode 100644
index 0000000..47bbc9b
--- /dev/null
+++ b/ovn/kubernetes.md
@@ -0,0 +1,126 @@ 
+Integration of OVN with Kubernetes.
+----------------------------------
+
+OVN's integration with k8 is a work in progress.
+
+OVN provides network virtualization to containers.  OVN's integration with
+Kubernetes works in two modes: the "underlay" mode and the "overlay" mode.
+
+In the "underlay" mode, OVN requires an OpenStack setup to provide container
+networking. In this mode, one can create logical networks and can have
+k8 pods running inside VMs, independent VMs and physical machines running some
+stateful services connected to the same logical network. (For this mode to
+work completely, we need distributed load-balancer support in OVN, which
+is yet to be implemented, but is on the roadmap.)
+
+In the "overlay" mode, OVN can create virtual networks amongst k8 pods
+running on multiple hosts.  In this mode, you do not need a pre-created
+OpenStack setup. (This mode needs NAT support. Open vSwitch as of version
+2.4 does not support NAT. It is likely that Open vSwitch 2.5 or 2.6 will
+support NAT.)
+
+For both modes to work, a user has to install Open vSwitch on each VM/host
+on which they plan to run containers.
+
+The "underlay" mode
+-------------------
+
+This mode requires that you have an OpenStack setup pre-installed with OVN
+providing the underlay networking.  It is out of scope of this documentation
+to describe how to create an OpenStack setup with OVN. Instead, please refer
+to http://docs.openstack.org/developer/networking-ovn/ for that.
+
+Cluster Setup
+=============
+
+Once you have the OpenStack setup, you can have a tenant create a bunch
+of VMs (with one mgmt network interface or more) that form the k8 cluster.
+
+From the master node, we make a call to OpenStack Neutron to create a
+logical router.  The returned UUID is henceforth referred to as '$ROUTER_ID'.
+
+With OVN, each k8 worker node gets its own logical switch with a /x subnet.
+From each worker node, we create a logical switch in OVN via Neutron. We then
+attach that logical switch to the logical router $ROUTER_ID.
+
+We then create 2^(32-x) logical ports for that logical switch (with the parent
+port being the VIF_ID of the hosting VM).  On the worker node, for each
+logical port we write data into the local Open vSwitch database to
+act as a cache of its IP address, associated MAC address and port UUID.
+
+The value 'x' chosen depends on the number of CPUs and memory available
+in the VM.
+
+The k8 kubelet on each worker node is started with an OVN network plugin to set up
+the pod network namespace.
+
+Since one of the k8 requirements is that each pod in a cluster is able to
+talk to every other pod in the cluster via IP address, the above architecture
+with interconnected logical switches via a logical router acts as the
+foundation. In addition, this lets other VMs and physical machines (outside
+the k8 cluster) reach the k8 pods via the same IP address.  With an OVN L3
+gateway, one could also access each pod from the external world using direct
+IP addresses or via floating IPs.
+
+Pod Creation
+============
+
+When a k8 pod is created, it lands on one of the worker nodes. The OVN
+plugin is called to set up the network namespace. The plugin looks at the
+local cache of IP addresses.  It picks an unused IP address and sets up the
+pod with that IP address and MAC address and then marks that IP address as
+'used' in the cache.
+
+The above design avoids having to make a call to Neutron for every single
+pod creation.
+
+Security
+========
+
+The network admin creates a few security profiles in OVN via Neutron.
+To associate a firewall profile with a pod, the admin adds the corresponding
+UUIDs from Neutron to the pod spec, either as labels or as an annotation.
+
+When the pod is created, the network plugin will make a call to the k8 API
+server to fetch the security profile UUIDs for the pod. The network plugin
+chooses an unused logical port and makes a call to Neutron to associate the
+security profile with the logical port. It also updates the local cache to
+record the security profile associated with the logical port.
+
+An optimization is possible here. If the number of distinct security profiles
+is limited in a k8 cluster, one can pre-associate the security profiles with
+the logical ports. This would mean that one would create a lot more logical
+ports per worker node (in spite of limited CPU and memory).
+
+
+Loadbalancers
+=============
+
+We need to maintain a cache of all the logical port UUIDs and their IP
+addresses in the k8 master node.  This cache can be built during cluster
+bringup by providing the $ROUTER_ID as an input to a script.
+
+On the k8 master node, we will need to write a new daemon that is equivalent to
+k8's kube-proxy. Unlike the default k8 setup, where each node has a kube-proxy,
+our daemon needs to run only on the master node.
+
+When a k8 service is created, the new daemon will create a load-balancer object
+in OVN (via Neutron). Whenever new endpoints get created (via pods) for that
+service, the daemon will look at the IP addresses of the endpoints, figure
+out the logical port uuid associated with that IP address and then make
+a call to OVN (via Neutron) to update the logical port uuids associated with
+the load balancer object.
+
+North-South traffic
+===================
+
+For external connectivity to k8 services, we will need an L3 gateway in OVN.
+For every service IP address, the L3 gateway should choose one of the pod
+endpoints as the destination.
+
+
+Overlay mode
+------------
+
+TBA.
+
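
The endpoint lookup that the "Loadbalancers" section of the patch assigns to
the master-node daemon (translate each endpoint IP address into its logical
port UUID before updating the load-balancer object) might look roughly like
the sketch below. It is an illustration under assumptions: the cache is a
plain dictionary, the function name is hypothetical, and the actual call to
OVN via Neutron is deliberately left out because the patch does not define
that API. How to treat endpoint IPs missing from the cache is also not
specified; here they are simply reported back to the caller.

```python
from typing import Dict, Iterable, List, Tuple


def resolve_endpoint_ports(
    ip_to_port_uuid: Dict[str, str],
    endpoint_ips: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Map service endpoint IPs to logical port UUIDs.

    Returns (port_uuids, unknown_ips). Unknown IPs are returned rather than
    dropped silently, since the design does not say how to handle them.
    """
    port_uuids: List[str] = []
    unknown_ips: List[str] = []
    for ip in endpoint_ips:
        uuid = ip_to_port_uuid.get(ip)
        if uuid is None:
            unknown_ips.append(ip)
        else:
            port_uuids.append(uuid)
    return port_uuids, unknown_ips


# Hypothetical master-node cache: IP address -> logical port UUID.
cache = {"192.168.1.2": "port-a", "192.168.2.2": "port-b"}
ports, unknown = resolve_endpoint_ports(cache, ["192.168.2.2", "192.168.9.9"])
print(ports, unknown)
```

The resulting list of port UUIDs is what the daemon would pass along when it
updates the load-balancer object for the service.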