From patchwork Thu Sep 17 17:11:23 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Pfaff
X-Patchwork-Id: 518973
From: Ben Pfaff
To: dev@openvswitch.org
Date: Thu, 17 Sep 2015 10:11:23 -0700
Message-Id: <1442509883-3992-3-git-send-email-blp@nicira.com>
X-Mailer: git-send-email 2.1.3
In-Reply-To: <1442509883-3992-1-git-send-email-blp@nicira.com>
References: <1442509883-3992-1-git-send-email-blp@nicira.com>
Cc: Ben Pfaff
Subject: [ovs-dev] [PATCH 3/3] ovn: Update TODO, ovn-northd flow table design,
ovn-architecture for L3.
X-BeenThere: dev@openvswitch.org
X-Mailman-Version: 2.1.16
Precedence: list
Errors-To: dev-bounces@openvswitch.org
Sender: "dev"
This is a proposed plan for logical L3 in OVN. It is not entirely
complete but it includes many important details and I believe that it moves
planning forward.
Signed-off-by: Ben Pfaff
---
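Note on the bit-slice matches used in the ovn-northd.8.xml hunk
(eth.dst[40] == 1 and ip.src[28..31] == 0xe): the sketch below shows
one way to read them, assuming bit 0 is the least significant bit of
the field. The helper names (bits, mac_to_int, is_eth_multicast,
is_ip_multicast) are hypothetical and exist only for illustration;
they are not part of OVN.

```python
import ipaddress


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, bit 0 = least significant."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def is_eth_multicast(mac: str) -> bool:
    """Mirror of the match eth.dst[40] == 1 (group bit of the first octet)."""
    return bits(mac_to_int(mac), 40, 40) == 1


def is_ip_multicast(ip: str) -> bool:
    """Mirror of the match ip.src[28..31] == 0xe (224.0.0.0/4)."""
    return bits(int(ipaddress.IPv4Address(ip)), 28, 31) == 0xE
```

Under that assumption, eth.dst[40] selects the low-order bit of the
first octet of the Ethernet address, which is why broadcast
(ff:ff:ff:ff:ff:ff) also passes the L2 admission flow.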
ovn/TODO | 264 +++++++++++++++++++++++++++++++++-
ovn/northd/ovn-northd.8.xml | 342 +++++++++++++++++++++++++++++++++++++++++++-
ovn/ovn-architecture.7.xml | 2 +-
ovn/ovn-sb.xml | 109 ++++++++++++--
4 files changed, 697 insertions(+), 20 deletions(-)
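Note on the "Routing table" flows in the ovn-northd.8.xml hunk: the
priority of each route flow is the number of 1-bits in its netmask,
so the highest-priority matching flow is the longest-prefix match.
The following is a minimal sketch of that selection, not the
ovn-northd implementation. The function names and the
(network, gateway) tuple layout are assumptions made for this example.

```python
import ipaddress


def route_priority(network: str) -> int:
    """Priority of a route flow: the number of 1-bits in its netmask."""
    net = ipaddress.ip_network(network)
    return bin(int(net.netmask)).count("1")


def next_hop(routes, dst: str):
    """Pick the value that the winning route flow would store in reg0.

    routes is a list of (network, gateway) pairs; gateway is None for a
    directly connected route, in which case reg0 is the destination itself.
    Returns None when no route matches (the "destination unreachable" case).
    """
    addr = ipaddress.ip_address(dst)
    best = None  # (priority, reg0 value)
    for network, gateway in routes:
        if addr in ipaddress.ip_network(network):
            prio = route_priority(network)
            if best is None or prio > best[0]:
                best = (prio, gateway if gateway is not None else dst)
    return None if best is None else best[1]
```

For example, with routes 10.0.0.0/8 via 10.0.0.1 and a connected
10.1.0.0/16, a packet to 10.1.2.3 takes the /16 flow (priority 16)
and reg0 becomes 10.1.2.3 itself.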
diff --git a/ovn/TODO b/ovn/TODO
index 6f625ce..a0f5385 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -1,3 +1,265 @@
+-*- outline -*-
+
+* L3 support
+
+** OVN_Northbound schema
+
+*** Needs to support interconnected routers
+
+It should be possible to connect one router to another, e.g. to
+represent a provider/tenant router relationship. This requires
+an OVN_Northbound schema change.
+
+*** Needs to support extra routes
+
+Currently a router port has a single route associated with it, but
+presumably we should support multiple routes. For connections from
+one router to another, this doesn't seem to matter (just put more than
+one connection between them), but for connections between a router and
+a switch it might matter because a switch has only one router port.
+
+** OVN_SB schema
+
+*** Logical datapath interconnection
+
+There needs to be a way in the OVN_Southbound database to express
+connections between logical datapaths, so that packets can pass from a
+logical switch to its logical router (and vice versa) and from one
+logical router to another.
+
+One way to do this would be to introduce logical patch ports, closely
+modeled on the "physical" patch ports that OVS has had for ages. Each
+logical patch port would consist of two rows in the Port_Binding table
+(one in each logical datapath), with type "patch" and an option "peer"
+that names the other logical port in the pair.
+
+If we do it this way then we'll need to figure out one odd special
+case. Currently the ACL table documents that the logical router port
+is always named "ROUTER". This can't be done directly with this patch
+port technique, because every row in the Logical_Port table must have
+a unique name. This probably means that we should change the
+convention for the ACL table so that the logical router port name is
+unique; for example, we could change the Logical_Router_Port table to
+require the 'name' column to be unique, and then use that name in the
+ACL table.
+
+*** Allow output to ingress port
+
+Sometimes when a packet ingresses into a router, it has to egress the
+same port. One example is a "one-armed" router that has multiple
+routes on a single port (or in which a host is (mis)configured to send
+every IP packet to the router, e.g. due to a bad netmask). Another is
+when a router needs to send an ICMP reply to an ingressing packet.
+
+To some degree this problem is layered, because there are two
+different notions of "ingress port". The first is the OpenFlow
+ingress port, essentially a physical port identifier. This is
+implemented as part of ovs-vswitchd's OpenFlow implementation. It
+prevents a reply from being sent across the tunnel on which it
+arrived. It is questionable whether this OpenFlow feature is useful
+to OVN. (OVN already has to override it to allow a packet from one
+nested container to be forwarded to a different nested container.)
+OVS makes it possible to disable this feature of OpenFlow by setting
+the OpenFlow input port field to 0. (If one does this too early, of
+course, it means that there's no way to actually match on the input
+port in the OpenFlow flow tables, but one can work around that by
+instead setting the input port just before the output action, possibly
+wrapping these actions in push/pop pairs to preserve the input port
+for later.)
+
+The second is the OVN logical ingress port, which is implemented in
+ovn-controller as part of the logical abstraction, using an OVS
+register. Dropping packets directed to the logical ingress port is
+implemented through an OpenFlow table not directly visible to the
+logical flow table. Currently this behavior can't be disabled, but
+various ways to ensure it could be implemented, e.g. the same as for
+OpenFlow by allowing the logical inport to be zeroed, or by
+introducing a new action that ignores the inport.
+
+** ovn-northd
+
+*** What flows should it generate?
+
+See description in ovn-northd(8).
+
+** New OVN logical actions
+
+*** arp
+
+Generates an ARP packet based on the current IPv4 packet and allows it
+to be processed as part of the current pipeline (and then pop back to
+processing the original IPv4 packet).
+
+TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
+one per second for a given target. We might need to do this too.
+
+*** icmp4 { action... }
+
+Generates an ICMPv4 packet based on the current IPv4 packet and
+processes it according to each nested action (and then pops back to
+processing the original IPv4 packet). The intended use case is for
+generating "time exceeded" and "destination unreachable" errors.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+Tentatively, the icmp4 action sets a default icmp_type and icmp_code
+and lets the nested actions override it. This means that we'd have to
+make icmp_type and icmp_code writable. Because changing icmp_type and
+icmp_code can change the interpretation of the rest of the data in the
+ICMP packet, we would want to think this through carefully. If it
+seems like a bad idea then we could instead make the type and code a
+parameter to the action: icmp4(type, code) { action... }
+
+It is worth considering what should be considered the ingress port for
+the ICMPv4 packet. It's quite likely that the ICMPv4 packet is going
+to go back out the ingress port. Maybe the icmp4 action, therefore,
+should clear the inport, so that output to the original inport won't
+be discarded.
+
+*** tcp_reset
+
+Transforms the current TCP packet into a RST reply.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+*** Other actions for IPv6.
+
+IPv6 will probably need an action or actions for ND that is similar to
+the "arp" action, and an action for generating
+
+*** Other actions.
+
+Possibly we'll need to implement "field1 = field2;" for copying
+between fields and "field1 <-> field2;" for swapping fields.
+
+*** ovn-controller translation to OpenFlow
+
+The following two translation strategies come to mind. Some of the
+new actions we might want to implement one way, some of them the
+other, depending on the details.
+
+*** Implementation strategies
+
+One way to do this is to define new actions as Open vSwitch extensions
+to OpenFlow, emit those actions in ovn-controller, and implement them
+in ovs-vswitchd (possibly pushing the implementations into the Linux
+and DPDK datapaths as well). This is the only acceptable way for
+actions that need high performance. None of these actions obviously
+need high performance, but it might be necessary to have fairness in
+handling e.g. a flood of incoming packets that require these actions.
+The main disadvantage of this approach is that it ties ovs-vswitchd
+(and the Linux kernel module) to supporting these actions essentially
+forever, which means that we'd want to make sure that they are
+general-purpose, well designed, maintainable, and supportable.
+
+The other way to do this is to send the packets across an OpenFlow
+channel to ovn-controller and have ovn-controller process them. This
+is acceptable for actions that don't need high performance, and it
+means that we don't add anything permanently to ovs-vswitchd or the
+kernel (so we can be more casual about the design). The big
+disadvantage is that it becomes necessary to add a way to resume the
+OpenFlow pipeline when it is interrupted in the middle by sending a
+packet to the controller. This is not as simple as doing a new flow
+table lookup and resuming from that point. Instead, it is equivalent
+to the (very complicated) recirculation logic in ofproto-dpif-xlate.c.
+Much of this logic can be translated into OpenFlow actions (e.g. the
+call stack and data stack), but some of it is entirely outside
+OpenFlow (e.g. the state of mirrors). To implement it properly, it
+seems that we'll have to introduce a new Open vSwitch extension to
+OpenFlow, a "send-to-controller" action that causes extra data to be
+sent to the controller, where the extra data packages up the state
+necessary to resume the pipeline. Maybe the bits of the state that
+can be represented in OpenFlow can be embedded in this extra data in a
+controller-readable form, but other bits we might want to be opaque.
+It's also likely that we'll want to change and extend the form of this
+opaque data over time, so this should be allowed for, e.g. by
+including a nonce in the extra data that is newly generated every time
+ovs-vswitchd starts.
+
+*** OpenFlow action definitions
+
+Define OpenFlow wire structures for each new OpenFlow action and
+implement them in lib/ofp-actions.[ch].
+
+*** OVS implementation
+
+Add code for action translation. Possibly add datapath code for
+action implementation. However, none of these new actions should
+require high-bandwidth processing so we could at least start with them
+implemented in userspace only. (ARP field modification is already
+userspace-only and no one has complained yet.)
+
+** IPv6
+
+*** ND versus ARP
+
+*** IPv6 routing
+
+*** ICMPv6
+
+** IP to MAC binding
+
+Somehow it has to be possible for an L3 logical router to map from an
+IP address to an Ethernet address. This can happen statically or
+dynamically. Probably both cases need to be supported eventually.
+
+*** Static IP to MAC binding
+
+Commonly, for a VM, the binding of an IP address to a MAC is known
+statically. The Logical_Port table in the OVN_Northbound schema can
+be revised to make these bindings known. Then ovn-northd can
+integrate the bindings into the logical router flow table.
+(ovn-northd can also integrate them into the logical switch flow table
+to terminate ARP requests from VIFs.)
+
+*** Dynamic IP to MAC bindings
+
+Some bindings from IP address to MAC will undoubtedly need to be
+discovered dynamically through ARP requests. It's straightforward
+enough for a logical L3 router to generate ARP requests and forward
+them to the appropriate switch.
+
+It's more difficult to figure out where the reply should be processed
+and stored. It might seem at first that a first-cut implementation
+could just keep track of the binding on the hypervisor that needs to
+know, but that can't happen easily because the VM that sends the reply
+might not be on the same HV as the VM that needs the answer (that is,
+the VM that sent the packet that needs the binding to be resolved) and
+there isn't an easy way for it to know which HV needs the answer.
+
+Thus, the HV that processes the ARP reply (which is unknown when the
+ARP is sent) has to tell all the HVs the binding. The most obvious
+place for this is the OVN_Southbound database.
+
+Details need to be worked out, including:
+
+**** OVN_Southbound schema changes.
+
+Possibly bindings could be added to the Port_Binding table by adding
+or modifying columns. Another possibility is that another table
+should be added.
+
+**** Logical_Flow representation
+
+It would be really nice to maintain the general-purpose nature of
+logical flows, but these bindings might have to include some
+hard-coded special cases, especially when it comes to the relationship
+with populating the bindings into the OVN_Southbound table.
+
+**** Tracking queries
+
+It's probably best to only record in the database responses to queries
+actually issued by an L3 logical router, so somehow they have to be
+tracked, probably by putting a tentative binding without a MAC address
+into the database.
+
+**** Renewal and expiration.
+
+Something needs to make sure that bindings remain valid and expire
+those that become stale.
+
+*** MTU handling (fragmentation on output)
+
* ovn-controller
** ovn-controller parameters and configuration.
@@ -100,4 +362,4 @@
Both ovn-controller and ovn-contorller-vtep should use BFD to
monitor the tunnel liveness. Both ovs-vswitchd schema and
- VTEP schema supports BFD.
\ No newline at end of file
+ VTEP schema supports BFD.
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 1655958..9d35d9f 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -106,10 +106,12 @@
One of the main purposes of ovn-northd
is to populate the
Logical_Flow
table in the OVN_Southbound
database. This section describes how ovn-northd
does this
- for logical datapaths.
+ for switch and router logical datapaths.
- Ingress Table 0: Admission Control and Ingress Port Security
+ Logical Switch Datapaths
+
+ Ingress Table 0: Admission Control and Ingress Port Security
Ingress table 0 contains these logical flows:
@@ -137,7 +139,7 @@
be dropped.
- Ingress table 1: from-lport
ACLs
+ Ingress table 1: from-lport
ACLs
Logical flows in this table closely reproduce those in the
@@ -154,7 +156,7 @@
next;
, so that ACLs allow packets by default.
- Ingress Table 2: Destination Lookup
+ Ingress Table 2: Destination Lookup
This table implements switching behavior. It contains these logical
@@ -185,13 +187,13 @@
-
Egress Table 0: to-lport
ACLs
+ Egress Table 0: to-lport
ACLs
This is similar to ingress table 1 except for to-lport
ACLs.
- Egress Table 1: Egress Port Security
+ Egress Table 1: Egress Port Security
This is similar to the ingress port security logic in ingress table 0,
@@ -206,4 +208,332 @@
disabled logical outport
overrides the priority-100 flow
with a drop;
action.
+
+ Logical Router Datapaths
+
+ Ingress Table 0: L2 Admission Control
+
+
+ This table drops packets that the router shouldn't see at all based on
+ their Ethernet headers. It contains the following flows, all with
+ priority 100:
+
+
+
+ -
+ One flow that matches on
eth.dst[40] == 1
with action
+ next;
.
+
+
+ -
+ For each router port P with Ethernet address E, a
+ flow that matches
inport == P && eth.dst ==
+ E
, with action next;
.
+
+
+
+
+ Other packets are implicitly dropped.
+
+
+ Ingress Table 1: IP Routing
+
+
+ This table is the core of the logical router datapath functionality. It
+ contains the following flows to implement very basic IP host
+ functionality:
+
+
+
+ -
+
+ L3 admission control: A priority-220 flow drops packets that match
+ any of the following:
+
+
+
+ -
+
ip.src[28..31] == 0xe
(multicast source)
+
+ -
+
ip.src == 255.255.255.255
(broadcast source)
+
+ -
+
ip.src == 127.0.0.0/8 || ip.dst == 127.0.0.0/8
+ (localhost source or destination)
+
+ -
+
ip.src == 0.0.0.0/8 || ip.dst == 0.0.0.0/8
(zero
+ network source or destination)
+
+ -
+
ip.src
is any IP address owned by the router.
+
+ -
+
ip.src
is the broadcast address of any IP network
+ known to the router.
+
+
+
+
+ -
+
+ ICMP echo reply. These flows reply to ICMP echo requests received
+ for the router's IP address. Let A be an IP address owned
+ by the router or the broadcast address for one of these IP address's
+ networks. Then, for each A, a priority-210 flow matches
+ on ip.dst == A
and icmp4.type == 8
+ && icmp4.code == 0
(ICMP echo request). These flows
+ use the following actions where, if A is unicast, then
+ S is A, and if A is broadcast,
+ S is the router's IP address in A's network:
+
+
+
+ip4.dst = ip4.src;
+ip4.src = S;
+ip4.ttl = 255;
+icmp4.type = 0;
+reg0 = ip4.dst;
+next;
+
+
+
+ Similar flows match on ip.dst == 255.255.255.255
and
+ each individual inport
, and use the same actions in
+ which S is a function of inport
.
+
+
+
+ -
+
+ ARP reply. These flows reply to ARP requests for the router's own IP
+ address. For each router port P that owns IP address
+ A and Ethernet address E, a priority-210 flow
+ matches inport == P && arp.tpa ==
+ A && arp.op == 1
(ARP request) with the
+ following actions:
+
+
+
+eth.dst = eth.src;
+eth.src = E;
+arp.op = 2; // ARP reply
+arp.tha = arp.sha;
+arp.sha = E;
+arp.tpa = arp.spa;
+arp.spa = A;
+outport = P;
+inport = 0; // allow sending out inport
+output;
+
+
+
+ -
+
+ UDP port unreachable. These flows generate ICMP port unreachable
+ messages in reply to UDP datagrams directed to the router's IP
+ address. The logical router doesn't accept any UDP traffic so it
+ always generates such a reply.
+
+
+
+ These flows should not match IP fragments with nonzero offset.
+
+
+
+ Details TBD.
+
+
+
+ -
+
+ TCP reset. These flows generate TCP reset messages in reply to TCP
+ datagrams directed to the router's IP address. The logical router
+ doesn't accept any TCP traffic so it always generates such a reply.
+
+
+
+ Details TBD.
+
+
+
+ -
+
+ Protocol unreachable. These flows generate ICMP protocol unreachable
+ messages in reply to packets directed to the router's IP address on
+ IP protocols other than UDP, TCP, and ICMP.
+
+
+
+ These flows should not match IP fragments with nonzero offset.
+
+
+
+ Details TBD.
+
+
+
+ -
+ Drop other IP traffic to this router. These flows drop any other
+ traffic destined to an IP address of this router that is not already
+ handled by one of the flows above. For each IP address A
+ owned by the router, a priority-200 flow matches
ip.dst ==
+ A
and drops the traffic.
+
+
+
+
+ The flows above handle all of the traffic that might be directed to the
+ router itself. The following flows (with lower priorities) handle the
+ remaining traffic, potentially for forwarding:
+
+
+
+ -
+ Ethernet local broadcast. A priority-190 flow with match
eth.dst
+ == ff:ff:ff:ff:ff:ff
drops traffic destined to the local
+ Ethernet broadcast address. By definition this traffic should not be
+ forwarded.
+
+
+ -
+ Drop IP multicast. A priority-190 flow with match
ip.dst[28..31]
+ == 0xe
drops IP multicast traffic.
+
+
+ -
+
+ TTL check. For each router port P, whose IP address is
+ A, a priority-180 flow with match inport ==
+ P && ip.ttl < 2 &&
+ !ip.later_frag
matches packets whose TTL has expired, with the
+ following actions to send an ICMP time exceeded reply:
+
+
+
+icmp4 {
+ icmp4.type = 11; // Time exceeded
+ icmp4.code = 0; // TTL exceeded in transit
+ ip4.dst = ip4.src;
+ ip4.src = A;
+ ip4.ttl = 255;
+ reg0 = ip4.dst;
+ next;
+};
+
+
+
+ -
+
+ Routing table. For each route to IPv4 network N with
+ netmask M, a logical flow with match ip.dst ==
+ N/M
, whose priority is the number of
+ 1-bits in M, has the following actions:
+
+
+
+ip4.ttl--;
+reg0 = G;
+next;
+
+
+
+ If the route has a gateway, G is the gateway IP address,
+ otherwise it is ip.dst
.
+
+
+
+ -
+
+ Destination unreachable. For each router port P, which
+ owns IP address A, a priority-0 logical flow with match
+ inport == P && !ip.later_frag
has
+ the following actions:
+
+
+
+icmp4 {
+ icmp4.type = 3; // Destination unreachable
+ icmp4.code = 0; // Network unreachable
+ ip4.dst = ip4.src;
+ ip4.src = A;
+ ip4.ttl = 255;
+ reg0 = ip4.dst;
+ next;
+};
+
+
+
+ These flows are omitted if the logical router has a default route,
+ that is, a route with netmask 0.0.0.0.
+
+
+
+
+ Ingress Table 2: ARP Resolution
+
+
+ Any packet that reaches this table is an IP packet whose next-hop IP
+ address is in reg0
. (ip.dst
is the final
+ destination.) This table resolves the IP address in reg0
+ into an Ethernet address in eth.dst
, using the following
+ flows:
+
+
+
+ -
+
+ Known MAC bindings. For each IP address A whose host is
+ known to have Ethernet address E and reside on router port
+ P, a priority-200 flow with match reg0 ==
+ A
has the following actions:
+
+
+
+eth.dst = E;
+outport = P;
+output;
+
+
+
+ -
+
+ Unknown MAC bindings. For each non-gateway route to IPv4 network
+ N with netmask M on router port P
+ that owns IP address A and Ethernet address E,
+ a logical flow with match ip.dst ==
+ N/M
, whose priority is the number of
+ 1-bits in M, has the following actions:
+
+
+
+ratelimit;
+arp {
+ eth.dst = ff:ff:ff:ff:ff:ff;
+ eth.src = E;
+ arp.sha = E;
+ arp.tha = 00:00:00:00:00:00;
+ arp.spa = A;
+ arp.tpa = ip.dst;
+ outport = P;
+ output;
+};
+
+
+
+ TBD: How to install MAC bindings when an ARP response comes back.
+ (Implement a "learn" action?)
+
+
+
+
+ Egress Table 0: ARP Details
+
+
+ Packets that reach this table are ready for delivery. It contains a
+ single priority-0 logical flow that matches all packets with action
+ output;
.
+
+
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 47dfc2a..a7ff674 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -596,7 +596,7 @@
- Life Cycle of a Packet
+ Architectural Life Cycle of a Packet
This section describes how a packet travels from one virtual machine or
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index c1932ad..0aaf7ca 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -240,12 +240,12 @@
The default action when no flow matches is to drop packets.
- Logical Life Cycle of a Packet
+ Architectural Logical Life Cycle of a Packet
This following description focuses on the life cycle of a packet through
a logical datapath, ignoring physical details of the implementation.
- Please refer to Life Cycle of a Packet in
+ Please refer to Architectural Life Cycle of a Packet in
ovn-architecture
(7) for the physical information.
@@ -810,24 +810,109 @@
field1 = field2;
-
- Extends the assignment action to allow copying between fields.
+
+ Extends the assignment action to allow copying between fields.
+
+
+
+ An assignment adds prerequisites from the source and the
+ destination fields.
+
+
+
+ ip4.ttl--;
+ -
+
+ Decrements the IPv4 TTL. If this would make the TTL zero or
+ negative, then processing of the packet halts; no further actions
+ are processed. (To properly handle such cases, a higher-priority
+ flow should match on ip.ttl < 2
.)
+
+
+ Prerequisite: ip4
- learn
+ arp { action;
... };
+ -
+
+ Temporarily replaces the IPv4 packet being processed by an ARP
+ packet and executes each nested action on the ARP
+ packet. Actions following the arp action, if any, apply
+ to the original, unmodified packet.
+
- conntrack
+
+ The ARP packet that this action operates on is initialized based on
+ the IPv4 packet being processed, as follows:
+
+
+
+ eth.src
unchanged
+ eth.dst
unchanged
+ eth.type = 0x0806
+ arp.op = 1
(ARP request)
+ arp.sha
copied from eth.src
+ arp.spa
copied from ip4.src
+ arp.tha = 00:00:00:00:00:00
+ arp.tpa
copied from ip4.dst
+
+
+ Prerequisite: ip4
+
- dec_ttl { action,
... } { action;
...};
+ icmp4 { action;
... };
-
- decrement TTL; execute first set of actions if
- successful, second set if TTL decrement fails
+
+ Temporarily replaces the IPv4 packet being processed by an ICMPv4
+ packet and executes each nested action on the ICMPv4
+ packet. Actions following the icmp4 action, if any,
+ apply to the original, unmodified packet.
+
+
+
+ The ICMPv4 packet that this action operates on is initialized based
+ on the IPv4 packet being processed, as follows. Ethernet and IPv4
+ fields not listed here are not changed:
+
+
+
+ ip.proto = 1
(ICMPv4)
+ ip.frag = 0
(not a fragment)
+ icmp4.type = 3
(destination unreachable)
+ icmp4.code = 1
(host unreachable)
+
+
+
+ XXX need to explain exactly how the ICMP packet is constructed
+
+
+ Prerequisite: ip4
- icmp_reply { action,
... };
- - generate ICMP reply from packet, execute actions
+ tcp_reset;
+ -
+
+ This action transforms the current TCP packet according to the
+ following pseudocode:
+
+
+
+if (tcp.ack) {
+ tcp.seq = tcp.ack;
+} else {
+ tcp.ack = tcp.seq + length(tcp.payload);
+ tcp.seq = 0;
+}
+tcp.flags = RST;
+
- arp { action,
... }
- - generate ARP from packet, execute actions
+
+ Then, the action drops all TCP options and payload data, and
+ updates the TCP checksum.
+
+
+ Prerequisite: tcp
+
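Note on the tcp_reset pseudocode in the ovn-sb.xml hunk above: here
is a minimal sketch of the same transformation on a simplified
segment model. It assumes that the condition "if (tcp.ack)" tests the
ACK flag, and that sequence arithmetic wraps modulo 2^32; neither is
stated in the pseudocode. The TcpSegment class and the flag constants
are hypothetical names for this example only, and checksum and option
handling are left out.

```python
from dataclasses import dataclass

ACK = 0x10  # TCP ACK flag bit
RST = 0x04  # TCP RST flag bit


@dataclass(frozen=True)
class TcpSegment:
    seq: int            # sequence number
    ack: int            # acknowledgment number
    flags: int          # TCP flag bits
    payload: bytes = b""


def tcp_reset(seg: TcpSegment) -> TcpSegment:
    """Apply the tcp_reset pseudocode and drop the payload."""
    if seg.flags & ACK:
        # Assumption: "if (tcp.ack)" means the ACK flag is set.
        seq, ack = seg.ack, seg.ack
    else:
        # Assumption: 32-bit wraparound for seq + payload length.
        seq, ack = 0, (seg.seq + len(seg.payload)) & 0xFFFFFFFF
    return TcpSegment(seq=seq, ack=ack, flags=RST, payload=b"")
```

For a segment with the ACK flag set, seq=100 and ack=500, the result
has seq=500 and only RST set; for a segment without ACK, seq=100 and
a 4-byte payload, the result has seq=0 and ack=104.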