From: Ben Pfaff
To: dev@openvswitch.org
Cc: Ben Pfaff
Date: Thu, 17 Sep 2015 10:11:23 -0700
X-Patchwork-Id: 518973
Subject: [ovs-dev] [PATCH 3/3] ovn: Update TODO, ovn-northd flow table design, ovn-architecture for L3.

This is a proposed plan for logical L3 in OVN. It is not entirely
complete but it includes many important details and I believe that it
moves planning forward.

Signed-off-by: Ben Pfaff
---
 ovn/TODO                    | 264 +++++++++++++++++++++++++++++++++-
 ovn/northd/ovn-northd.8.xml | 342 +++++++++++++++++++++++++++++++++++++++++++-
 ovn/ovn-architecture.7.xml  |   2 +-
 ovn/ovn-sb.xml              | 109 ++++++++++++--
 4 files changed, 697 insertions(+), 20 deletions(-)

diff --git a/ovn/TODO b/ovn/TODO
index 6f625ce..a0f5385 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -1,3 +1,265 @@
+-*- outline -*-
+
+* L3 support
+
+** OVN_Northbound schema
+
+*** Needs to support interconnected routers
+
+It should be possible to connect one router to another, e.g. to
+represent a provider/tenant router relationship. This requires
+an OVN_Northbound schema change.
+
+*** Needs to support extra routes
+
+Currently a router port has a single route associated with it, but
+presumably we should support multiple routes. For connections from
+one router to another, this doesn't seem to matter (just put more than
+one connection between them), but for connections between a router and
+a switch it might matter because a switch has only one router port.
+
+** OVN_SB schema
+
+*** Logical datapath interconnection
+
+There needs to be a way in the OVN_Southbound database to express
+connections between logical datapaths, so that packets can pass from a
+logical switch to its logical router (and vice versa) and from one
+logical router to another.
+
+One way to do this would be to introduce logical patch ports, closely
+modeled on the "physical" patch ports that OVS has had for ages.
+Each logical patch port would consist of two rows in the Port_Binding table
+(one in each logical datapath), with type "patch" and an option "peer"
+that names the other logical port in the pair.
+
+If we do it this way then we'll need to figure out one odd special
+case. Currently the ACL table documents that the logical router port
+is always named "ROUTER". This can't be done directly with this patch
+port technique, because every row in the Logical_Port table must have
+a unique name. This probably means that we should change the
+convention for the ACL table so that the logical router port name is
+unique; for example, we could change the Logical_Router_Port table to
+require the 'name' column to be unique, and then use that name in the
+ACL table.
+
+*** Allow output to ingress port
+
+Sometimes when a packet ingresses into a router, it has to egress the
+same port. One example is a "one-armed" router that has multiple
+routes on a single port (or in which a host is (mis)configured to send
+every IP packet to the router, e.g. due to a bad netmask). Another is
+when a router needs to send an ICMP reply to an ingressing packet.
+
+To some degree this problem is layered, because there are two
+different notions of "ingress port". The first is the OpenFlow
+ingress port, essentially a physical port identifier. This is
+implemented as part of ovs-vswitchd's OpenFlow implementation. It
+prevents a reply from being sent across the tunnel on which it
+arrived. It is questionable whether this OpenFlow feature is useful
+to OVN. (OVN already has to override it to allow a packet from one
+nested container to be forwarded to a different nested container.)
+OVS makes it possible to disable this feature of OpenFlow by setting
+the OpenFlow input port field to 0.
+(If one does this too early, of course, it means that there's no way to
+actually match on the input port in the OpenFlow flow tables, but one can
+work around that by instead setting the input port just before the output
+action, possibly wrapping these actions in push/pop pairs to preserve the
+input port for later.)
+
+The second is the OVN logical ingress port, which is implemented in
+ovn-controller as part of the logical abstraction, using an OVS
+register. Dropping packets directed to the logical ingress port is
+implemented through an OpenFlow table not directly visible to the
+logical flow table. Currently this behavior can't be disabled, but
+various ways to ensure it could be implemented, e.g. the same as for
+OpenFlow by allowing the logical inport to be zeroed, or by
+introducing a new action that ignores the inport.
+
+** ovn-northd
+
+*** What flows should it generate?
+
+See description in ovn-northd(8).
+
+** New OVN logical actions
+
+*** arp
+
+Generates an ARP packet based on the current IPv4 packet and allows it
+to be processed as part of the current pipeline (and then pop back to
+processing the original IPv4 packet).
+
+TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
+one per second for a given target. We might need to do this too.
+
+*** icmp4 { action... }
+
+Generates an ICMPv4 packet based on the current IPv4 packet and
+processes it according to each nested action (and then pops back to
+processing the original IPv4 packet). The intended use case is for
+generating "time exceeded" and "destination unreachable" errors.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+Tentatively, the icmp4 action sets a default icmp_type and icmp_code
+and lets the nested actions override it. This means that we'd have to
+make icmp_type and icmp_code writable.
+Because changing icmp_type and icmp_code can change the interpretation of
+the rest of the data in the ICMP packet, we would want to think this
+through carefully. If it seems like a bad idea then we could instead make
+the type and code a parameter to the action: icmp4(type, code) { action... }
+
+It is worth considering what should be considered the ingress port for
+the ICMPv4 packet. It's quite likely that the ICMPv4 packet is going
+to go back out the ingress port. Maybe the icmp4 action, therefore,
+should clear the inport, so that output to the original inport won't
+be discarded.
+
+*** tcp_reset
+
+Transforms the current TCP packet into a RST reply.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+*** Other actions for IPv6.
+
+IPv6 will probably need an action or actions for ND that is similar to
+the "arp" action, and an action for generating
+
+*** Other actions.
+
+Possibly we'll need to implement "field1 = field2;" for copying
+between fields and "field1 <-> field2;" for swapping fields.
+
+*** ovn-controller translation to OpenFlow
+
+The following two translation strategies come to mind. Some of the
+new actions we might want to implement one way, some of them the
+other, depending on the details.
+
+*** Implementation strategies
+
+One way to do this is to define new actions as Open vSwitch extensions
+to OpenFlow, emit those actions in ovn-controller, and implement them
+in ovs-vswitchd (possibly pushing the implementations into the Linux
+and DPDK datapaths as well). This is the only acceptable way for
+actions that need high performance. None of these actions obviously
+need high performance, but it might be necessary to have fairness in
+handling e.g. a flood of incoming packets that require these actions.
+The main disadvantage of this approach is that it ties ovs-vswitchd
+(and the Linux kernel module) to supporting these actions essentially
+forever, which means that we'd want to make sure that they are
+general-purpose, well designed, maintainable, and supportable.
+
+The other way to do this is to send the packets across an OpenFlow
+channel to ovn-controller and have ovn-controller process them. This
+is acceptable for actions that don't need high performance, and it
+means that we don't add anything permanently to ovs-vswitchd or the
+kernel (so we can be more casual about the design). The big
+disadvantage is that it becomes necessary to add a way to resume the
+OpenFlow pipeline when it is interrupted in the middle by sending a
+packet to the controller. This is not as simple as doing a new flow
+table lookup and resuming from that point. Instead, it is equivalent
+to the (very complicated) recirculation logic in ofproto-dpif-xlate.c.
+Much of this logic can be translated into OpenFlow actions (e.g. the
+call stack and data stack), but some of it is entirely outside
+OpenFlow (e.g. the state of mirrors). To implement it properly, it
+seems that we'll have to introduce a new Open vSwitch extension to
+OpenFlow, a "send-to-controller" action that causes extra data to be
+sent to the controller, where the extra data packages up the state
+necessary to resume the pipeline. Maybe the bits of the state that
+can be represented in OpenFlow can be embedded in this extra data in a
+controller-readable form, but other bits we might want to be opaque.
+It's also likely that we'll want to change and extend the form of this
+opaque data over time, so this should be allowed for, e.g. by
+including a nonce in the extra data that is newly generated every time
+ovs-vswitchd starts.
+
+*** OpenFlow action definitions
+
+Define OpenFlow wire structures for each new OpenFlow action and
+implement them in lib/ofp-actions.[ch].
+
+*** OVS implementation
+
+Add code for action translation. Possibly add datapath code for
+action implementation. However, none of these new actions should
+require high-bandwidth processing so we could at least start with them
+implemented in userspace only. (ARP field modification is already
+userspace-only and no one has complained yet.)
+
+** IPv6
+
+*** ND versus ARP
+
+*** IPv6 routing
+
+*** ICMPv6
+
+** IP to MAC binding
+
+Somehow it has to be possible for an L3 logical router to map from an
+IP address to an Ethernet address. This can happen statically or
+dynamically. Probably both cases need to be supported eventually.
+
+*** Static IP to MAC binding
+
+Commonly, for a VM, the binding of an IP address to a MAC is known
+statically. The Logical_Port table in the OVN_Northbound schema can
+be revised to make these bindings known. Then ovn-northd can
+integrate the bindings into the logical router flow table.
+(ovn-northd can also integrate them into the logical switch flow table
+to terminate ARP requests from VIFs.)
+
+*** Dynamic IP to MAC bindings
+
+Some bindings from IP address to MAC will undoubtedly need to be
+discovered dynamically through ARP requests. It's straightforward
+enough for a logical L3 router to generate ARP requests and forward
+them to the appropriate switch.
+
+It's more difficult to figure out where the reply should be processed
+and stored. It might seem at first that a first-cut implementation
+could just keep track of the binding on the hypervisor that needs to
+know, but that can't happen easily because the VM that sends the reply
+might not be on the same HV as the VM that needs the answer (that is,
+the VM that sent the packet that needs the binding to be resolved) and
+there isn't an easy way for it to know which HV needs the answer.
+
+Thus, the HV that processes the ARP reply (which is unknown when the
+ARP is sent) has to tell all the HVs the binding. The most obvious
+place for this is the OVN_Southbound database.
+
+Details need to be worked out, including:
+
+**** OVN_Southbound schema changes.
+
+Possibly bindings could be added to the Port_Binding table by adding
+or modifying columns. Another possibility is that another table
+should be added.
+
+**** Logical_Flow representation
+
+It would be really nice to maintain the general-purpose nature of
+logical flows, but these bindings might have to include some
+hard-coded special cases, especially when it comes to the relationship
+with populating the bindings into the OVN_Southbound table.
+
+**** Tracking queries
+
+It's probably best to only record in the database responses to queries
+actually issued by an L3 logical router, so somehow they have to be
+tracked, probably by putting a tentative binding without a MAC address
+into the database.
+
+**** Renewal and expiration.
+
+Something needs to make sure that bindings remain valid and expire
+those that become stale.
+
+*** MTU handling (fragmentation on output)
+
 * ovn-controller
 
 ** ovn-controller parameters and configuration.
@@ -100,4 +362,4 @@
 Both ovn-controller and ovn-contorller-vtep should use BFD to monitor the tunnel liveness. Both ovs-vswitchd schema and
- VTEP schema supports BFD.
\ No newline at end of file
+ VTEP schema supports BFD.
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 1655958..9d35d9f 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -106,10 +106,12 @@
 One of the main purposes of ovn-northd is to populate the Logical_Flow table in the OVN_Southbound database. This section describes how ovn-northd does this
- for logical datapaths.
+ for switch and router logical datapaths.

-

Ingress Table 0: Admission Control and Ingress Port Security

+

Logical Switch Datapaths

+ +

Ingress Table 0: Admission Control and Ingress Port Security

Ingress table 0 contains these logical flows: @@ -137,7 +139,7 @@ be dropped.

-

Ingress table 1: from-lport ACLs

+

Ingress table 1: from-lport ACLs

Logical flows in this table closely reproduce those in the @@ -154,7 +156,7 @@ next;, so that ACLs allow packets by default.

-

Ingress Table 2: Destination Lookup

+

Ingress Table 2: Destination Lookup

This table implements switching behavior. It contains these logical @@ -185,13 +187,13 @@ -

Egress Table 0: to-lport ACLs

+

Egress Table 0: to-lport ACLs

This is similar to ingress table 1 except for to-lport ACLs.

-

Egress Table 1: Egress Port Security

+

Egress Table 1: Egress Port Security

This is similar to the ingress port security logic in ingress table 0, @@ -206,4 +208,332 @@ disabled logical outport overrides the priority-100 flow with a drop; action.

+ +

Logical Router Datapaths

+ +

Ingress Table 0: L2 Admission Control

+ +

+ This table drops packets that the router shouldn't see at all based on + their Ethernet headers. It contains the following flows, all with + priority 100: +

+ +
    +
  • + One flow that matches on eth.dst[40] == 1 with action + next;. +
  • + +
  • + For each router port P with Ethernet address E, a + flow that matches inport == P && eth.dst == + E, with action next;. +
  • +
+ +

+ Other packets are implicitly dropped. +
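The two admission flows above can be sketched as a small predicate. This is only an illustration in Python, not ovn-northd code; the function names and the `router_ports` mapping (port name to Ethernet address) are hypothetical. It assumes that `eth.dst[40]` refers to bit 40 of the 48-bit address, which is the low-order bit of the first octet (the multicast/broadcast bit).

```python
def mac_to_int(mac):
    """Parse 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def l2_admit(inport, eth_dst, router_ports):
    """Return True if the router datapath should keep processing the packet.

    router_ports is a hypothetical mapping {port name P: Ethernet address E}.
    """
    dst = mac_to_int(eth_dst)
    # Flow 1: eth.dst[40] == 1, i.e. multicast or broadcast destination.
    if (dst >> 40) & 1:
        return True
    # Flow 2: inport == P && eth.dst == E for the port the packet arrived on.
    port_mac = router_ports.get(inport)
    return port_mac is not None and mac_to_int(port_mac) == dst
    # Anything else falls through and is implicitly dropped.
```

For example, with `router_ports = {"lrp1": "00:00:00:00:01:01"}`, a unicast frame addressed to `00:00:00:00:01:01` is admitted only when it arrives on `lrp1`.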

+ +

Ingress Table 1: IP Routing

+ +

+ This table is the core of the logical router datapath functionality. It + contains the following flows to implement very basic IP host + functionality: +

+ +
    +
  • +

    + L3 admission control: A priority-220 flow drops packets that match + any of the following: +

    + +
      +
    • + ip.src[28..31] == 0xe (multicast source) +
    • +
    • + ip.src == 255.255.255.255 (broadcast source) +
    • +
    • + ip.src == 127.0.0.0/8 || ip.dst == 127.0.0.0/8 + (localhost source or destination) +
    • +
    • + ip.src == 0.0.0.0/8 || ip.dst == 0.0.0.0/8 (zero + network source or destination) +
    • +
    • + ip.src is any IP address owned by the router. +
    • +
    • + ip.src is the broadcast address of any IP network + known to the router. +
    • +
    +
  • + +
  • +

    + ICMP echo reply. These flows reply to ICMP echo requests received + for the router's IP address. Let A be an IP address owned + by the router or the broadcast address for one of these IP addresses' + networks. Then, for each A, a priority-210 flow matches + on ip.dst == A and icmp4.type == 8 + && icmp4.code == 0 (ICMP echo request). These flows + use the following actions where, if A is unicast, then + S is A, and if A is broadcast, + S is the router's IP address in A's network: +

    + +
    +ip4.dst = ip4.src;
    +ip4.src = S;
    +ip4.ttl = 255;
    +icmp4.type = 0;
    +reg0 = ip4.dst;
    +next;
    +
    + +

    + Similar flows match on ip.dst == 255.255.255.255 and + each individual inport, and use the same actions in + which S is a function of inport. +

    +
  • + +
  • +

    + ARP reply. These flows reply to ARP requests for the router's own IP + address. For each router port P that owns IP address + A and Ethernet address E, a priority-210 flow + matches inport == P && arp.tpa == + A && arp.op == 1 (ARP request) with the + following actions: +

    + +
    +eth.dst = eth.src;
    +eth.src = E;
    +arp.op = 2; // ARP reply
    +arp.tha = arp.sha;
    +arp.sha = E;
    +arp.tpa = arp.spa;
    +arp.spa = A;
    +outport = P;
    +inport = 0; // allow sending out inport
    +output;
    +
    +
  • + +
  • +

    + UDP port unreachable. These flows generate ICMP port unreachable + messages in reply to UDP datagrams directed to the router's IP + address. The logical router doesn't accept any UDP traffic so it + always generates such a reply. +

    + +

    + These flows should not match IP fragments with nonzero offset. +

    + +

    + Details TBD. +

    +
  • + +
  • +

    + TCP reset. These flows generate TCP reset messages in reply to TCP + datagrams directed to the router's IP address. The logical router + doesn't accept any TCP traffic so it always generates such a reply. +

    + +

    + Details TBD. +

    +
  • + +
  • +

    + Protocol unreachable. These flows generate ICMP protocol unreachable + messages in reply to packets directed to the router's IP address on + IP protocols other than UDP, TCP, and ICMP. +

    + +

    + These flows should not match IP fragments with nonzero offset. +

    + +

    + Details TBD. +

    +
  • + +
  • + Drop other IP traffic to this router. These flows drop any other + traffic destined to an IP address of this router that is not already + handled by one of the flows above. For each IP address A + owned by the router, a priority-200 flow matches ip.dst == + A and drops the traffic. +
  • +
+ +
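As a rough illustration of the priority-220 L3 admission control conditions listed above, the following Python sketch evaluates the same six checks with the standard `ipaddress` module. The function name and the `router_ifaces` argument (the router's addresses with their prefix lengths) are assumptions made for this example; the real flows would be generated by ovn-northd as logical flow matches.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")
ZERO_NET = ipaddress.ip_network("0.0.0.0/8")
BROADCAST = ipaddress.ip_address("255.255.255.255")


def l3_should_drop(src, dst, router_ifaces):
    """Return True if the priority-220 admission flow would drop the packet.

    router_ifaces: hypothetical list of strings such as "192.168.1.1/24".
    """
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    if (int(src) >> 28) == 0xE:              # ip.src[28..31] == 0xe
        return True
    if src == BROADCAST:                     # broadcast source
        return True
    if src in LOOPBACK or dst in LOOPBACK:   # 127.0.0.0/8
        return True
    if src in ZERO_NET or dst in ZERO_NET:   # 0.0.0.0/8
        return True
    for text in router_ifaces:
        iface = ipaddress.ip_interface(text)
        if src == iface.ip:                  # address owned by the router
            return True
        if src == iface.network.broadcast_address:
            return True
    return False
```

With `router_ifaces = ["192.168.1.1/24"]`, a packet from 10.0.0.5 to 192.168.1.7 passes, while one whose source is 192.168.1.255 is dropped.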

+ The flows above handle all of the traffic that might be directed to the + router itself. The following flows (with lower priorities) handle the + remaining traffic, potentially for forwarding: +

+ +
    +
  • + Ethernet local broadcast. A priority-190 flow with match eth.dst + == ff:ff:ff:ff:ff:ff drops traffic destined to the local + Ethernet broadcast address. By definition this traffic should not be + forwarded. +
  • + +
  • + Drop IP multicast. A priority-190 flow with match ip.dst[28..31] + == 0xe drops IP multicast traffic. +
  • + +
  • +

    + TTL check. For each router port P, whose IP address is + A, a priority-180 flow with match inport == + P && ip.ttl < 2 && + !ip.later_frag matches packets whose TTL has expired, with the + following actions to send an ICMP time exceeded reply: +

    + +
    +icmp4 {
    +    icmp4.type = 11; // Time exceeded
    +    icmp4.code = 0;  // TTL exceeded in transit
    +    ip4.dst = ip4.src;
    +    ip4.src = A;
    +    ip4.ttl = 255;
    +    reg0 = ip4.dst;
    +    next;
    +};
    +
    +
  • + +
  • +

    + Routing table. For each route to IPv4 network N with + netmask M, a logical flow with match ip.dst == + N/M, whose priority is the number of + 1-bits in M, has the following actions: +

    + +
    +ip4.ttl--;
    +reg0 = G;
    +next;
    +
    + +

    + If the route has a gateway, G is the gateway IP address, + otherwise it is ip.dst. +

    +
  • + +
  • +

    + Destination unreachable. For each router port P, which + owns IP address A, a priority-0 logical flow with match + inport == P && !ip.later_frag has + the following actions: +

    + +
    +icmp4 {
    +    icmp4.type = 3; // Destination unreachable
    +    icmp4.code = 0; // Network unreachable
    +    ip4.dst = ip4.src;
    +    ip4.src = A;
    +    ip4.ttl = 255;
    +    reg0 = ip4.dst;
    +    next;
    +};
    +
    + +

    + These flows are omitted if the logical router has a default route, + that is, a route with netmask 0.0.0.0. +

    +
  • +
+ +
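Because each route's priority is the number of 1-bits in its netmask, the routing flows above amount to a longest-prefix match, with the priority-0 "destination unreachable" flows acting as the fallback when there is no default route. The Python sketch below illustrates that selection; the function names and the `(network, gateway)` tuple layout are hypothetical, and netmasks are assumed to be contiguous so that the bit count equals the prefix length.

```python
import ipaddress


def route_priority(network):
    """Priority of a route flow: number of 1-bits in the netmask."""
    return ipaddress.ip_network(network).prefixlen


def lookup_next_hop(dst, routes):
    """Return the value the winning flow would store in reg0, or None.

    routes: hypothetical list of (network, gateway) pairs, where gateway
    is None for directly connected networks.
    """
    dst = ipaddress.ip_address(dst)
    best = None
    for network, gateway in routes:
        net = ipaddress.ip_network(network)
        if dst in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, gateway)
    if best is None:
        return None  # only the priority-0 "destination unreachable" flow matches
    gateway = best[1]
    # reg0 = G: the gateway if the route has one, otherwise ip.dst itself.
    return ipaddress.ip_address(gateway) if gateway else dst
```

A route for 0.0.0.0/0 has priority 0 and matches every destination, which is consistent with omitting the destination unreachable flows when a default route exists.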

Ingress Table 2: ARP Resolution

+ +

+ Any packet that reaches this table is an IP packet whose next-hop IP + address is in reg0. (ip.dst is the final + destination.) This table resolves the IP address in reg0 + into an Ethernet address in eth.dst, using the following + flows: +

+ +
    +
  • +

    + Known MAC bindings. For each IP address A whose host is + known to have Ethernet address E and reside on router port + P, a priority-200 flow with match reg0 == + A has the following actions: +

    + +
    +eth.dst = E;
    +outport = P;
    +output;
    +
    +
  • + +
  • +

    + Unknown MAC bindings. For each non-gateway route to IPv4 network + N with netmask M on router port P + that owns IP address A and Ethernet address E, + a logical flow with match ip.dst == + N/M, whose priority is the number of + 1-bits in M, has the following actions: +

    + +
    +ratelimit;
    +arp {
    +    eth.dst = ff:ff:ff:ff:ff:ff;
    +    eth.src = E;
    +    arp.sha = E;
    +    arp.tha = 00:00:00:00:00:00;
    +    arp.spa = A;
    +    arp.tpa = ip.dst;
    +    outport = P;
    +    output;
    +};
    +
    + +

    + TBD: How to install MAC bindings when an ARP response comes back. + (Implement a "learn" action?) +

    +
  • +
+ +
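The two kinds of flows in this table can be read as a lookup with an ARP fallback. The following Python sketch models that decision with plain dictionaries; `bindings`, `route_port`, and the returned dictionary layout are illustrative assumptions, not an OVN data structure. Rate limiting and installing the binding when the reply arrives (both open questions in the text) are not modeled.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
ZERO_MAC = "00:00:00:00:00:00"


def arp_resolution(reg0, ip_dst, bindings, route_port):
    """Decide what Ingress Table 2 would do for one packet.

    bindings:   hypothetical {next-hop IP A: (Ethernet address E, port P)}.
    route_port: hypothetical dict with the "name", "mac", and "ip" of the
                router port that owns the matching non-gateway route.
    """
    known = bindings.get(reg0)
    if known is not None:
        mac, port = known
        # Known MAC binding: eth.dst = E; outport = P; output;
        return {"action": "output", "eth.dst": mac, "outport": port}
    # Unknown MAC binding: emit an ARP request out of the route's port.
    return {
        "action": "arp",
        "eth.dst": BROADCAST_MAC,
        "eth.src": route_port["mac"],
        "arp.sha": route_port["mac"],
        "arp.tha": ZERO_MAC,
        "arp.spa": route_port["ip"],
        "arp.tpa": ip_dst,
        "outport": route_port["name"],
    }
```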

Egress Table 0: Delivery

+ +

+ Packets that reach this table are ready for delivery. It contains a + single priority-0 logical flow that matches all packets, with action + output;. +

+ diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index 47dfc2a..a7ff674 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -596,7 +596,7 @@ -

Life Cycle of a Packet

+

Architectural Life Cycle of a Packet

This section describes how a packet travels from one virtual machine or diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml index c1932ad..0aaf7ca 100644 --- a/ovn/ovn-sb.xml +++ b/ovn/ovn-sb.xml @@ -240,12 +240,12 @@ The default action when no flow matches is to drop packets.

-

Logical Life Cycle of a Packet

+

Architectural Logical Life Cycle of a Packet

This following description focuses on the life cycle of a packet through a logical datapath, ignoring physical details of the implementation. - Please refer to Life Cycle of a Packet in + Please refer to Architectural Life Cycle of a Packet in ovn-architecture(7) for the physical information.

@@ -810,24 +810,109 @@
field1 = field2;
- Extends the assignment action to allow copying between fields. +

+ Extends the assignment action to allow copying between fields. +

+ +

+ An assignment adds prerequisites from the source and the + destination fields. +

+
+ +
ip4.ttl--;
+
+

+ Decrements the IPv4 TTL. If this would make the TTL zero or + negative, then processing of the packet halts; no further actions + are processed. (To properly handle such cases, a higher-priority + flow should match on ip.ttl < 2.) +

+ +

Prerequisite: ip4
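The halting rule for ip4.ttl-- can be stated compactly. The sketch below is a Python illustration of the semantics described above, with a hypothetical function name; `None` stands for "processing of the packet halts".

```python
def dec_ttl(ttl):
    """Return the decremented TTL, or None if processing must halt.

    Processing halts when the decrement would make the TTL zero or
    negative, which is why a higher-priority flow matching ip.ttl < 2
    is expected to handle those packets first.
    """
    new_ttl = ttl - 1
    if new_ttl <= 0:
        return None
    return new_ttl
```

So a packet with TTL 2 is forwarded with TTL 1, while TTL values of 1 and 0 never reach a successful decrement.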

-
learn
+
arp { action; ... };
+
+

+ Temporarily replaces the IPv4 packet being processed by an ARP + packet and executes each nested action on the ARP + packet. Actions following the arp action, if any, apply + to the original, unmodified packet. +

-
conntrack
+

+ The ARP packet that this action operates on is initialized based on + the IPv4 packet being processed, as follows: +

+ +
    +
  • eth.src unchanged
  • +
  • eth.dst unchanged
  • +
  • eth.type = 0x0806
  • +
  • arp.op = 1 (ARP request)
  • +
  • arp.sha copied from eth.src
  • +
  • arp.spa copied from ip4.src
  • +
  • arp.tha = 00:00:00:00:00:00
  • +
  • arp.tpa copied from ip4.dst
  • +
+ +

Prerequisite: ip4
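The initialization list for the arp action maps directly onto a field-copy function. The Python sketch below represents packets as dictionaries keyed by the field names used in this document; that representation and the function name are assumptions for illustration only.

```python
def init_arp_from_ipv4(pkt):
    """Build the initial ARP packet fields from an IPv4 packet's fields."""
    return {
        "eth.src": pkt["eth.src"],       # unchanged
        "eth.dst": pkt["eth.dst"],       # unchanged
        "eth.type": 0x0806,              # ARP
        "arp.op": 1,                     # ARP request
        "arp.sha": pkt["eth.src"],       # copied from eth.src
        "arp.spa": pkt["ip4.src"],       # copied from ip4.src
        "arp.tha": "00:00:00:00:00:00",
        "arp.tpa": pkt["ip4.dst"],       # copied from ip4.dst
    }
```

Nested actions then override these defaults on the new dictionary, and the input packet is left untouched, matching the statement that later actions apply to the original, unmodified packet.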

+ -
dec_ttl { action, ... } { action; ...};
+
icmp4 { action; ... };
- decrement TTL; execute first set of actions if - successful, second set if TTL decrement fails +

+ Temporarily replaces the IPv4 packet being processed by an ICMPv4 + packet and executes each nested action on the ICMPv4 + packet. Actions following the icmp4 action, if any, + apply to the original, unmodified packet. +

+ +

+ The ICMPv4 packet that this action operates on is initialized based + on the IPv4 packet being processed, as follows. Ethernet and IPv4 + fields not listed here are not changed: +

+ +
    +
  • ip.proto = 1 (ICMPv4)
  • +
  • ip.frag = 0 (not a fragment)
  • +
  • icmp4.type = 3 (destination unreachable)
  • +
  • icmp4.code = 1 (host unreachable)
  • +
+ +

+ XXX need to explain exactly how the ICMP packet is constructed +

+ +

Prerequisite: ip4

-
icmp_reply { action, ... };
-
generate ICMP reply from packet, execute actions
+
tcp_reset;
+
+

+ This action transforms the current TCP packet according to the + following pseudocode: +

+ +
+if (tcp.ack) {
+        tcp.seq = tcp.ack;
+} else {
+        tcp.ack = tcp.seq + length(tcp.payload);
+        tcp.seq = 0;
+}
+tcp.flags = RST;
+
-
arp { action, ... }
-
generate ARP from packet, execute actions
+

+ Then, the action drops all TCP options and payload data, and + updates the TCP checksum. +

+ +

Prerequisite: tcp
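For readers who want to experiment with the tcp_reset pseudocode, here is a runnable Python rendering. It is a sketch under two assumptions that the pseudocode does not settle: the condition `tcp.ack` is read as "the ACK flag is set", and sequence arithmetic wraps modulo 2^32. The `TcpSegment` class is hypothetical, and the checksum update is not modeled.

```python
from dataclasses import dataclass

TCP_RST = 0x04
TCP_ACK = 0x10
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits wide


@dataclass
class TcpSegment:
    seq: int
    ack: int
    flags: int
    options: bytes = b""
    payload: bytes = b""


def tcp_reset(seg):
    """Return the RST reply for seg, following the pseudocode above."""
    if seg.flags & TCP_ACK:        # assumption: "tcp.ack" means the ACK flag
        seq = seg.ack              # tcp.seq = tcp.ack
        ack = seg.ack              # the ack field is left as it was
    else:
        ack = (seg.seq + len(seg.payload)) % SEQ_MOD
        seq = 0
    # tcp.flags = RST; options and payload are dropped.
    return TcpSegment(seq=seq, ack=ack, flags=TCP_RST)
```

For instance, a SYN with sequence number 1000 and a 3-byte payload yields a reset with `seq == 0` and `ack == 1003`.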

+