From patchwork Sun Oct 30 13:30:06 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Stephen Finucane
X-Patchwork-Id: 688944
X-Patchwork-Delegate: rbryant@redhat.com
From: Stephen Finucane
To: dev@openvswitch.org
Date: Sun, 30 Oct 2016 13:30:06 +0000
Message-Id: <1477834209-11414-21-git-send-email-stephen@that.guru>
In-Reply-To: <1477834209-11414-1-git-send-email-stephen@that.guru>
References: <1477834209-11414-1-git-send-email-stephen@that.guru>
Subject: [ovs-dev] [PATCH 20/23] doc: Convert datapath-windows/DESIGN to rST

Signed-off-by: Stephen Finucane
---
 datapath-windows/{DESIGN => DESIGN.rst} | 503 ++++++++++++++++++--------------
 datapath-windows/automake.mk | 2 +-
 2 files changed, 279 insertions(+), 226 deletions(-)
 rename datapath-windows/{DESIGN => DESIGN.rst} (54%)

diff --git a/datapath-windows/DESIGN b/datapath-windows/DESIGN.rst
similarity index 54%
rename from datapath-windows/DESIGN
rename to datapath-windows/DESIGN.rst
index 
6d30adc..81c1da5 100644 --- a/datapath-windows/DESIGN +++ b/datapath-windows/DESIGN.rst @@ -1,38 +1,57 @@ - OVS-on-Hyper-V Design Document - ============================== -There has been a community effort to develop Open vSwitch on Microsoft Hyper-V. -In this document, we provide details of the development effort. We believe this -document should give enough information to understand the overall design. - -The userspace portion of the OVS has been ported to Hyper-V in a separate -effort, and committed to the openvswitch repo. So, this document will mostly -emphasize on the kernel driver, though we touch upon some of the aspects of -userspace as well. - -We cover the following topics: -1. Background into relevant Hyper-V architecture -2. Design of the OVS Windows implementation - a. Kernel module (datapath) - b. Userspace components - c. Kernel-Userspace interface - d. Flow of a packet -3. Build/Deployment environment - -For more questions, please contact dev@openvswitch.org - -1) Background into relevant Hyper-V architecture ------------------------------------------------- -Microsoft’s hypervisor solution - Hyper-V[1] implements a virtual switch that -is extensible and provides opportunities for other vendors to implement -functional extensions[2]. The extensions need to be implemented as NDIS drivers -that bind within the extensible switch driver stack provided. The extensions -can broadly provide the functionality of monitoring, modifying and forwarding -packets to destination ports on the Hyper-V extensible switch. Correspondingly, -the extensions can be categorized into the following types and provide the -functionality noted: - * Capturing extensions: monitoring packets - * Filtering extensions: monitoring, modifying packets - * Forwarding extensions: monitoring, modifying, forwarding packets +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. 
You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +===================== +OVS-on-Hyper-V Design +===================== + +This document provides details of the effort to develop Open vSwitch on +Microsoft Hyper-V. This document should give enough information to understand +the overall design. + +.. note:: + The userspace portion of the OVS has been ported to Hyper-V in a separate + effort, and committed to the openvswitch repo. This document will mostly + focus on the kernel driver, though we touch upon some of the aspects of + userspace as well. + +Background Info +--------------- + +Microsoft's hypervisor solution, Hyper-V [1]_, implements a virtual switch +that is extensible and provides opportunities for other vendors to implement +functional extensions [2]_. The extensions need to be implemented as NDIS +drivers that bind within the extensible switch driver stack provided. The +extensions can broadly provide the functionality of monitoring, modifying and +forwarding packets to destination ports on the Hyper-V extensible switch. 
+Correspondingly, the extensions can be categorized into the following types and +provide the functionality noted: + +* Capturing extensions: monitoring packets + +* Filtering extensions: monitoring, modifying packets + +* Forwarding extensions: monitoring, modifying, forwarding packets As can be expected, the kernel portion (datapath) of OVS on Hyper-V solution will be implemented as a forwarding extension. @@ -44,68 +63,69 @@ is used for packets being sent out of a port, and egress is used for packet being received on a port. By design, NDIS provides a layered interface. In this layered interface, higher level layers call into lower level layers, in the ingress path. In the egress path, it is the other way round. In addition, there -is a object identifier (OID) interface for control operations Eg. addition of -a port. The workflow for the calls is similar in nature to the packets, where +is an object identifier (OID) interface for control operations, e.g. addition of a +port. The workflow for the calls is similar in nature to the packets, where higher level layers call into the lower level layers. A good representational -diagram of this architecture is in [4]. +diagram of this architecture is in [4]_. -Windows Filtering Platform (WFP)[5] is a platform implemented on Hyper-V that +Windows Filtering Platform (WFP) [5]_ is a platform implemented on Hyper-V that provides APIs and services for filtering packets. WFP has been utilized to filter on some of the packets that OVS is not equipped to handle directly. More details in later sections. -IP Helper [6] is a set of API available on Hyper-V to retrieve information +IP Helper [6]_ is a set of APIs available on Hyper-V to retrieve information related to the network configuration information on the host machine. IP Helper has been used to retrieve some of the configuration information that OVS needs. 
- -2) Design of the OVS Windows implementation -------------------------------------------- - - +-------------------------------+ - | | - | CHILD PARTITION | - | | - +------+ +--------------+ | +-----------+ +------------+ | - | | | | | | | | | | - | ovs- | | OVS- | | | Virtual | | Virtual | | - | *ctl | | USERSPACE | | | Machine #1| | Machine #2 | | - | | | DAEMON | | | | | | | - +------+-++---+---------+ | +--+------+-+ +----+------++ | +--------+ - | dpif- | | netdev- | | |VIF #1| |VIF #2| | |Physical| - | netlink | | windows | | +------+ +------+ | | NIC | - +---------+ +---------+ | || /\ | +--------+ -User /\ /\ | || *#1* *#4* || | /\ -=========||=========||============+------||-------------------||--+ || -Kernel || || \/ || ||=====/ - \/ \/ +-----+ +-----+ *#5* - +-------------------------------+ | | | | - | +----------------------+ | | | | | - | | OVS Pseudo Device | | | | | | - | +----------------------+ | | | | | - | | Netlink Impl. | | | | | | - | ----------------- | | I | | | - | +------------+ | | N | | E | - | | Flowtable | +------------+ | | G | | G | - | +------------+ | Packet | |*#2*| R | | R | - | +--------+ | Processing | |<=> | E | | E | - | | WFP | | | | | S | | S | - | | Driver | +------------+ | | S | | S | - | +--------+ | | | | | - | | | | | | - | OVS FORWARDING EXTENSION | | | | | - +-------------------------------+ +-----+-----------------+-----+ - |HYPER-V Extensible Switch *#3| - +-----------------------------+ - NDIS STACK - - Fig 2. Various blocks of the OVS Windows implementation - -Figure 2 shows the various blocks involved in the OVS Windows implementation, -along with some of the components available in the NDIS stack, and also the -virtual machines. The workflow of a packet being transmitted from a VIF out and -into another VIF and to a physical NIC is also shown. Later on in this section, -we will discuss the flow of a packet at a high level. 
+Design +------ + +:: + + Various blocks of the OVS Windows implementation + + +-------------------------------+ + | | + | CHILD PARTITION | + | | + +------+ +--------------+ | +-----------+ +------------+ | + | | | | | | | | | | + | ovs- | | OVS- | | | Virtual | | Virtual | | + | *ctl | | USERSPACE | | | Machine #1| | Machine #2 | | + | | | DAEMON | | | | | | | + +------+-++---+---------+ | +--+------+-+ +----+------++ | +--------+ + | dpif- | | netdev- | | |VIF #1| |VIF #2| | |Physical| + | netlink | | windows | | +------+ +------+ | | NIC | + +---------+ +---------+ | || /\ | +--------+ + User /\ /\ | || *#1* *#4* || | /\ + =========||=========||============+------||-------------------||--+ || + Kernel || || \/ || ||=====/ + \/ \/ +-----+ +-----+ *#5* + +-------------------------------+ | | | | + | +----------------------+ | | | | | + | | OVS Pseudo Device | | | | | | + | +----------------------+ | | | | | + | | Netlink Impl. | | | | | | + | ----------------- | | I | | | + | +------------+ | | N | | E | + | | Flowtable | +------------+ | | G | | G | + | +------------+ | Packet | |*#2*| R | | R | + | +--------+ | Processing | |<=> | E | | E | + | | WFP | | | | | S | | S | + | | Driver | +------------+ | | S | | S | + | +--------+ | | | | | + | | | | | | + | OVS FORWARDING EXTENSION | | | | | + +-------------------------------+ +-----+-----------------+-----+ + |HYPER-V Extensible Switch *#3| + +-----------------------------+ + NDIS STACK + +This diagram shows the various blocks involved in the OVS Windows +implementation, along with some of the components available in the NDIS stack, +and also the virtual machines. The workflow of a packet being transmitted from +a VIF out and into another VIF and to a physical NIC is also shown. Later on in +this section, we will discuss the flow of a packet at a high level. The figure gives a general idea of where the OVS userspace and the kernel components fit in, and how they interface with each other. 
@@ -114,63 +134,79 @@ The kernel portion (datapath) of OVS on Hyper-V solution has be implemented as a forwarding extension roughly implementing the following sub-modules/functionality. Details of each of these sub-components in the kernel are contained in later sections: - * Interfacing with the NDIS stack - * Netlink message parser - * Netlink sockets - * Switch/Datapath management - * Interfacing with userspace portion of the OVS solution to implement the - necessary functionality that userspace needs - * Port management - * Flowtable/Actions/packet forwarding - * Tunneling - * Event notifications + +* Interfacing with the NDIS stack + +* Netlink message parser + +* Netlink sockets + +* Switch/Datapath management + +* Interfacing with userspace portion of the OVS solution to implement the + necessary functionality that userspace needs + +* Port management + +* Flowtable/Actions/packet forwarding + +* Tunneling + +* Event notifications The datapath for the OVS on Linux is a kernel module, and cannot be directly ported since there are significant differences in architecture even though the end functionality provided would be similar. Some examples of the differences are: - * Interfacing with the NDIS stack to hook into the NDIS callbacks for - functionality such as receiving and sending packets, packet completions, - OIDs used for events such as a new port appearing on the virtual switch. - * Interface between the userspace and the kernel module. - * Event notifications are significantly different. - * The communication interface between DPIF and the kernel module need not be - implemented in the way OVS on Linux does. That said, it would be - advantageous to have a similar interface to the kernel module for reasons of - readability and maintainability. - * Any licensing issues of using Linux kernel code directly. 
+ +* Interfacing with the NDIS stack to hook into the NDIS callbacks for + functionality such as receiving and sending packets, packet completions, OIDs + used for events such as a new port appearing on the virtual switch. + +* Interface between the userspace and the kernel module. + +* Event notifications are significantly different. + +* The communication interface between DPIF and the kernel module need not be + implemented in the way OVS on Linux does. That said, it would be advantageous + to have a similar interface to the kernel module for reasons of readability + and maintainability. + +* Any licensing issues of using Linux kernel code directly. Due to these differences, it was a straightforward decision to develop the datapath for OVS on Hyper-V from scratch rather than porting the one on Linux. A re-development focused on the following goals: - * Adhere to the existing requirements of userspace portion of OVS (such as - ovs-vswitchd), to minimize changes in the userspace workflow. - * Fit well into the typical workflow of a Hyper-V extensible switch forwarding - extension. + +* Adhere to the existing requirements of userspace portion of OVS (such as + ovs-vswitchd), to minimize changes in the userspace workflow. + +* Fit well into the typical workflow of a Hyper-V extensible switch forwarding + extension. The userspace portion of the OVS solution is mostly POSIX code, and not very Linux specific. Majority of the userspace code does not interface directly with -the kernel datapath and was ported independently of the kernel datapath -effort. +the kernel datapath and was ported independently of the kernel datapath effort. -As explained in the OVS porting design document [7], DPIF is the portion of +As explained in the OVS porting design document [7]_, DPIF is the portion of userspace that interfaces with the kernel portion of the OVS. The interface -that each DPIF provider has to implement is defined in dpif-provider.h [3]. 
-Though each platform is allowed to have its own implementation of the DPIF -provider, it was found, via community feedback, that it is desired to +that each DPIF provider has to implement is defined in ``dpif-provider.h`` +[3]_. Though each platform is allowed to have its own implementation of the +DPIF provider, it was found, via community feedback, that it is desired to share code whenever possible. Thus, the DPIF provider for OVS on Hyper-V shares code with the DPIF provider on Linux. This interface is implemented in -dpif-netlink.c, formerly dpif-linux.c. +``dpif-netlink.c``. We'll elaborate more on kernel-userspace interface in a dedicated section below. Here it suffices to say that the DPIF provider implementation for Windows is netlink-based and shares code with the Linux one. -2.a) Kernel module (datapath) ------------------------------ +Kernel Module (Datapath) +------------------------ + +Interfacing with the NDIS Stack +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Interfacing with the NDIS stack -------------------------------- For each virtual switch on Hyper-V, the OVS extensible switch extension can be enabled/disabled. We support enabling the OVS extension on only one switch. This is consistent with using a single datapath in the kernel on Linux. All the @@ -190,38 +226,45 @@ As shown in the figures, an extensible switch extension gets to see a packet sent by the VM (VIF) twice - once on the ingress path and once on the egress path. Forwarding decisions are to be made on the ingress path. Correspondingly, we will be hooking onto the following interfaces: - * Ingress send indication: intercept packets for performing flow based - forwarding.This includes straight forwarding to output ports. Any packet - modifications needed to be performed are done here either inline or by - creating a new packet. A forwarding action is performed as the flow actions - dictate. 
- * Ingress completion indication: cleanup and free packets that we generated on - the ingress send path, pass-through for packets that we did not generate. - * Egress receive indication: pass-through. - * Egress completion indication: pass-through. - -Interfacing with OVS userspace ------------------------------- + +* Ingress send indication: intercept packets for performing flow based + forwarding. This includes straight forwarding to output ports. Any packet + modifications needed to be performed are done here either inline or by + creating a new packet. A forwarding action is performed as the flow actions + dictate. + +* Ingress completion indication: cleanup and free packets that we generated on + the ingress send path, pass-through for packets that we did not generate. + +* Egress receive indication: pass-through. + +* Egress completion indication: pass-through. + +Interfacing with OVS Userspace +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + We have implemented a pseudo device interface for letting OVS userspace talk to the OVS kernel module. This is equivalent to the typical character device interface on POSIX platforms where we can register custom functions for read, write and ioctl functionality. The pseudo device supports a whole bunch of ioctls that netdev and DPIF on OVS userspace make use of. -Netlink message parser ----------------------- +Netlink Message Parser +~~~~~~~~~~~~~~~~~~~~~~ + The communication between OVS userspace and OVS kernel datapath is in the form -of Netlink messages [1]. More details about this are provided in #2.c section, -kernel-userspace interface. In the kernel, a full fledged netlink message -parser has been implemented along the lines of the netlink message parser in -OVS userspace. In fact, a lot of the code is ported code. +of Netlink messages [8]_. More details about this are provided below. In the +kernel, a full fledged netlink message parser has been implemented along the +lines of the netlink message parser in OVS userspace. 
In fact, a lot of the +code is ported code. -On the lines of 'struct ofpbuf' in OVS userspace, a managed buffer has been +On the lines of ``struct ofpbuf`` in OVS userspace, a managed buffer has been implemented in the kernel datapath to make it easier to parse and construct netlink messages. -Netlink sockets ---------------- +Netlink Sockets +~~~~~~~~~~~~~~~ + On Linux, OVS userspace utilizes netlink sockets to pass back and forth netlink messages. Since much of userspace code including DPIF provider in dpif-netlink.c (formerly dpif-linux.c) has been reused, pseudo-netlink sockets @@ -237,15 +280,17 @@ Typical netlink semantics of read message, write message, dump, and transaction have been implemented so that higher level layers are not affected by the netlink implementation not being native. -Switch/Datapath management --------------------------- +Switch/Datapath Management +~~~~~~~~~~~~~~~~~~~~~~~~~~ + As explained above, we hook onto the management callback functions in the NDIS interface for when to initialize the OVS data structures, flow tables etc. Some of this code is also driven by OVS userspace code which sends down ioctls for operations like creating a tunnel port etc. -Port management ---------------- +Port Management +~~~~~~~~~~~~~~~ + As explained above, we hook onto the management callback functions in the NDIS interface to know when a port is added/connected to the Hyper-V switch. We use these callbacks to initialize the port related data structures in OVS. Also, @@ -263,8 +308,9 @@ We maintain separate hash tables, and separate counters for ports that have been added from the Hyper-V switch, and for ports that have been added from OVS userspace. -Flowtable/Actions/packet forwarding ------------------------------------ +Flowtable/Actions/Packet Forwarding +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + The flowtable and flow actions based packet forwarding is the core of the OVS datapath functionality. 
For each packet on the ingress path, we consult the flowtable and execute the corresponding actions. The actions can be limited to @@ -274,7 +320,8 @@ thereafter forwarding to the external port to send the packet to a destination host. Tunneling ---------- +~~~~~~~~~ + We make use of the Internal Port on a Hyper-V switch for implementing tunneling. The Internal Port is a virtual adapter that is exposed on the Hyper- V host, and connected to the Hyper-V switch. Basically, it is an interface @@ -296,21 +343,22 @@ OVS right away. Currently, fragmented IP packets fall into that category, and we leverage the code in the host IP stack to reassemble the packet, and performing decapsulation on the reassembled packet. -We’ll also be using the IP helper library to provide us IP address and other +We'll also be using the IP helper library to provide us IP address and other information corresponding to the Internal port. -Event notifications -------------------- +Event Notifications +~~~~~~~~~~~~~~~~~~~ + The pseudo device interface described above is also used for providing event notifications back to OVS userspace. A shared memory/overlapped IO model is used. -2.b) Userspace components -------------------------- +Userspace Components +-------------------- + The userspace portion of the OVS solution is mostly POSIX code, and not very Linux specific. Majority of the userspace code does not interface directly with -the kernel datapath and was ported independently of the kernel datapath -effort. +the kernel datapath and was ported independently of the kernel datapath effort. In this section, we cover the userspace components that interface with the kernel datapath. @@ -319,109 +367,124 @@ As explained earlier, OVS on Hyper-V shares the DPIF provider implementation with Linux. The DPIF provider on Linux uses netlink sockets and netlink messages. Netlink sockets and messages are extensively used on Linux to exchange information between userspace and kernel. 
In order to satisfy these -dependencies, netlink socket (pseudo and non-native) and netlink messages -are implemented on Hyper-V. +dependencies, netlink socket (pseudo and non-native) and netlink messages are +implemented on Hyper-V. The following are the major advantages of sharing DPIF provider code: + 1. Maintenance is simpler: + Any change made to the interface defined in dpif-provider.h need not be propagated to multiple implementations. Also, developers familiar with the Linux implementation of the DPIF provider can easily ramp on the Hyper-V implementation as well. + 2. Netlink messages provides inherent advantages: + Netlink messages are known for their extensibility. Each message is versioned, so the provided data structures offer a mechanism to perform - version checking and forward/backward compatibility with the kernel - module. + version checking and forward/backward compatibility with the kernel module. + +Netlink Sockets +~~~~~~~~~~~~~~~ -Netlink sockets ---------------- As explained in other sections, an emulation of netlink sockets has been -implemented in lib/netlink-socket.c for Windows. The implementation creates a -handle to the OVS pseudo device, and emulates netlink socket semantics of -receive message, send message, dump, and transact. Most of the nl_* functions -are supported. +implemented in ``lib/netlink-socket.c`` for Windows. The implementation creates +a handle to the OVS pseudo device, and emulates netlink socket semantics of +receive message, send message, dump, and transact. Most of the ``nl_*`` +functions are supported. -The fact that the implementation is non-native manifests in various ways. -One example is that PID for the netlink socket is not automatically assigned in +The fact that the implementation is non-native manifests in various ways. One +example is that PID for the netlink socket is not automatically assigned in userspace when a handle is created to the OVS pseudo device. 
There's an extra -command (defined in OvsDpInterfaceExt.h) that is used to grab the PID generated -in the kernel. +command (defined in ``OvsDpInterfaceExt.h``) that is used to grab the PID +generated in the kernel. + +DPIF Provider +~~~~~~~~~~~~~ -DPIF provider --------------- As has been mentioned in earlier sections, the netlink socket and netlink message based DPIF provider on Linux has been ported to Windows. -Correspondingly, the file is called lib/dpif-netlink.c now from its former -name of lib/dpif-linux.c. -Most of the code is common. Some divergence is in the code to receive -packets. The Linux implementation uses epoll() which is not natively supported -on Windows. +Most of the code is common. Some divergence is in the code to receive packets. +The Linux implementation uses epoll() which is not natively supported on +Windows. + +netdev-windows +~~~~~~~~~~~~~~ -Netdev-Windows --------------- We have a Windows implementation of the interface defined in -lib/netdev-provider.h. The implementation provides functionality to get +``lib/netdev-provider.h``. The implementation provides functionality to get extended information about an interface. It is limited in functionality compared to the Linux implementation of the netdev provider and cannot be used to add any interfaces in the kernel such as a tap interface or to send/receive packets. The netdev-windows implementation uses the datapath interface -extensions defined in: -datapath-windows/include/OvsDpInterfaceExt.h +extensions defined in ``datapath-windows/include/OvsDpInterfaceExt.h``. + +Powershell Extensions to Set ``OVS-port-name`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Powershell extensions to set "OVS-port-name" --------------------------------------------- As explained in the section on "Port management", each Hyper-V port has a 'FriendlyName' field, which we call as the "OVS-port-name" field. 
We have implemented powershell command extensions to be able to set the "OVS-port-name" of a Hyper-V port. -2.c) Kernel-Userspace interface -------------------------------- +Kernel-Userspace Interface +-------------------------- + openvswitch.h and OvsDpInterfaceExt.h -------------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Since the DPIF provider is shared with Linux, the kernel datapath provides the same interface as the Linux datapath. The interface is defined in -datapath/linux/compat/include/linux/openvswitch.h. Derivatives of this +``datapath/linux/compat/include/linux/openvswitch.h``. Derivatives of this interface file are created during OVS userspace compilation. The derivative for -the kernel datapath on Hyper-V is provided in the following location: -datapath-windows/include/OvsDpInterface.h +the kernel datapath on Hyper-V is provided in +``datapath-windows/include/OvsDpInterface.h``. That said, there are Windows specific extensions that are defined in the -interface file: -datapath-windows/include/OvsDpInterfaceExt.h +interface file ``datapath-windows/include/OvsDpInterfaceExt.h``. + +Flow of a Packet +---------------- -2.d) Flow of a packet ---------------------- Figure 2 shows the numbered steps in which a packets gets sent out of a VIF and is forwarded to another VIF or a physical NIC. As mentioned earlier, each VIF is attached to the switch via a port, and each port is both on the ingress and egress path of the switch, and depending on whether a packet is being transmitted or received, one of the paths gets used. In the figure, each step n -is annotated as *#n* +is annotated as ``*#n*`` The steps are as follows: + 1. When a packet is sent out of a VIF or an physical NIC or an internal port, -the packet is part of the ingress path. + the packet is part of the ingress path. + 2. The OVS kernel driver gets to intercept this packet. + a. OVS looks up the flows in the flowtable for this packet, and executes the corresponding action. 
+
    b. If there is not action, the packet is sent up to OVS userspace to examine
       the packet and figure out the actions.
-   v. Userspace executes the packet by specifying the actions, and might also
+
+   c. Userspace executes the packet by specifying the actions, and might also
       insert a flow for such a packet in the future.
+
    d. The destination ports are added to the packet and sent down to the Hyper-
       V switch.
+
 3. The Hyper-V forwards the packet to the destination ports specified in the
-packet, and sends it out on the egress path.
+   packet, and sends it out on the egress path.
+
 4. The packet gets forwarded to the destination VIF.
+
 5. It might also get forwarded to a physical NIC as well, if the physical NIC
-has been added as a destination port by OVS.
+   has been added as a destination port by OVS.
+Build/Deployment
+----------------
-3) Build/Deployment:
---------------------
 The userspace components added as part of OVS Windows implementation have been
 integrated with autoconf, and can be built using the steps mentioned in the
 BUILD.Windows file. Additional targets need to be specified to make.
@@ -433,25 +496,15 @@ such that we can compile it without an IDE as well.
 Once compiled, we have an install script that can be used to load the kernel
 driver.
-
-Reference list:
-===============
-1. Hyper-V Extensible Switch
-http://msdn.microsoft.com/en-us/library/windows/hardware/hh598161(v=vs.85).aspx
-2. Hyper-V Extensible Switch Extensions
-http://msdn.microsoft.com/en-us/library/windows/hardware/hh598169(v=vs.85).aspx
-3. DPIF Provider
-http://openvswitch.sourcearchive.com/documentation/1.1.0-1/dpif-
-provider_8h_source.html
-4. Hyper-V Extensible Switch Components
-http://msdn.microsoft.com/en-us/library/windows/hardware/hh598163(v=vs.85).aspx
-5. Windows Filtering Platform
-http://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
-6. IP Helper
-http://msdn.microsoft.com/en-us/library/windows/hardware/ff557015(v=vs.85).aspx
-7. 
How to Port Open vSwitch to New Software or Hardware
-http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=PORTING
-8. Netlink
-http://en.wikipedia.org/wiki/Netlink
-9. epoll
-http://en.wikipedia.org/wiki/Epoll
+References
+----------
+
+.. [1] Hyper-V Extensible Switch http://msdn.microsoft.com/en-us/library/windows/hardware/hh598161(v=vs.85).aspx
+.. [2] Hyper-V Extensible Switch Extensions http://msdn.microsoft.com/en-us/library/windows/hardware/hh598169(v=vs.85).aspx
+.. [3] DPIF Provider http://openvswitch.sourcearchive.com/documentation/1.1.0-1/dpif-provider_8h_source.html
+.. [4] Hyper-V Extensible Switch Components http://msdn.microsoft.com/en-us/library/windows/hardware/hh598163(v=vs.85).aspx
+.. [5] Windows Filtering Platform http://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
+.. [6] IP Helper http://msdn.microsoft.com/en-us/library/windows/hardware/ff557015(v=vs.85).aspx
+.. [7] How to Port Open vSwitch to New Software or Hardware http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=PORTING
+.. [8] Netlink http://en.wikipedia.org/wiki/Netlink
+.. [9] epoll http://en.wikipedia.org/wiki/Epoll
diff --git a/datapath-windows/automake.mk b/datapath-windows/automake.mk
index 096575b..db83962 100644
--- a/datapath-windows/automake.mk
+++ b/datapath-windows/automake.mk
@@ -1,6 +1,6 @@
 EXTRA_DIST += \
 	datapath-windows/CodingStyle.rst \
-	datapath-windows/DESIGN \
+	datapath-windows/DESIGN.rst \
 	datapath-windows/Package/package.VcxProj \
 	datapath-windows/Package/package.VcxProj.user \
 	datapath-windows/include/OvsDpInterfaceExt.h \