From patchwork Tue Sep 29 22:19:47 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Pfaff X-Patchwork-Id: 524046 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from archives.nicira.com (unknown [IPv6:2600:3c00::f03c:91ff:fe6e:bdf7]) by ozlabs.org (Postfix) with ESMTP id B7D9014090A for ; Wed, 30 Sep 2015 08:19:55 +1000 (AEST) Received: from archives.nicira.com (localhost [127.0.0.1]) by archives.nicira.com (Postfix) with ESMTP id ED6EC108BA; Tue, 29 Sep 2015 15:19:54 -0700 (PDT) X-Original-To: dev@openvswitch.org Delivered-To: dev@openvswitch.org Received: from mx3v1.cudamail.com (mx3.cudamail.com [64.34.241.5]) by archives.nicira.com (Postfix) with ESMTPS id E6516108B9 for ; Tue, 29 Sep 2015 15:19:53 -0700 (PDT) Received: from bar4.cudamail.com (bar2 [192.168.15.2]) by mx3v1.cudamail.com (Postfix) with ESMTP id 35CD5618E3A for ; Tue, 29 Sep 2015 16:19:53 -0600 (MDT) X-ASG-Debug-ID: 1443565192-03dc2142474ff70001-byXFYA Received: from mx3-pf1.cudamail.com ([192.168.14.2]) by bar4.cudamail.com with ESMTP id YfEMOrx56t7oJRF8 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 29 Sep 2015 16:19:52 -0600 (MDT) X-Barracuda-Envelope-From: blp@nicira.com X-Barracuda-RBL-Trusted-Forwarder: 192.168.14.2 Received: from unknown (HELO mail-pa0-f49.google.com) (209.85.220.49) by mx3-pf1.cudamail.com with ESMTPS (RC4-SHA encrypted); 29 Sep 2015 22:19:51 -0000 Received-SPF: unknown (mx3-pf1.cudamail.com: Multiple SPF records returned) X-Barracuda-RBL-Trusted-Forwarder: 209.85.220.49 Received: by padhy16 with SMTP id hy16so17906003pad.1 for ; Tue, 29 Sep 2015 15:19:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=vHYLdAgTrg3hQ/91m47iKyPtDfdHsshMlUivZX8CNgM=; b=ipsdNpEn9bnp7p/gv1gVR7sZFxT4GNQ1silFA0gvh+NyRs1RPobZUrOhVG3ynmmEnw IgqRkhSBM7TJrcJhIExiJpmf5PXUrtmVAmhjijoXk4SnHNCZjqNCTmPivcCAmFM8bX5Z gt7wopI7VyKNR2DospgONVmmH7XSk7H0PGsvxUlDCOdsghc0lXptP7XDKAHt52/SupCs 2aijUqUuahgODjCdReGZi0xoyK7xGzDBAd8LqX/U0zaRlEJ3yzA7kcUoe78Ox9eleo7w Bs4TmcH51qWv8xYuuXCh2d24VDtb+sxo6zabE098bnE09aJapqP9byzz0kJ/VipaEG4f rszQ== X-Gm-Message-State: ALoCoQnzUduIS9b9an0FYzrcrtxthmCfFIE3RukCgcCzXHlZ9mkTL7TGIlsulF5JE93HoQH/v62S X-Received: by 10.66.221.170 with SMTP id qf10mr455676pac.134.1443565191478; Tue, 29 Sep 2015 15:19:51 -0700 (PDT) Received: from nicira.com ([208.91.2.4]) by smtp.gmail.com with ESMTPSA id by1sm27615464pab.6.2015.09.29.15.19.48 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 29 Sep 2015 15:19:49 -0700 (PDT) Date: Tue, 29 Sep 2015 15:19:47 -0700 X-Barracuda-Apparent-Source-IP: 208.91.2.4 X-CudaMail-Envelope-Sender: blp@nicira.com From: Ben Pfaff To: Justin Pettit X-CudaMail-Whitelist-To: dev@openvswitch.org X-CudaMail-MID: CM-V1-928077135 X-CudaMail-DTE: 092915 X-CudaMail-Originating-IP: 209.85.220.49 Message-ID: <20150929221947.GP30384@nicira.com> X-ASG-Orig-Subj: [##CM-V1-928077135##]Re: [ovs-dev] [PATCH 3/3] ovn: Update TODO, ovn-northd flow table design, ovn-architecture for L3. References: <1442509883-3992-1-git-send-email-blp@nicira.com> <1442509883-3992-3-git-send-email-blp@nicira.com> <78E16A38-E19E-4484-9CA3-E9013F4990F3@nicira.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <78E16A38-E19E-4484-9CA3-E9013F4990F3@nicira.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Barracuda-Connect: UNKNOWN[192.168.14.2] X-Barracuda-Start-Time: 1443565192 X-Barracuda-Encrypted: DHE-RSA-AES256-SHA X-Barracuda-URL: https://web.cudamail.com:443/cgi-mod/mark.cgi X-ASG-Whitelist: Header =?UTF-8?B?eFwtY3VkYW1haWxcLXdoaXRlbGlzdFwtdG8=?= X-Virus-Scanned: by bsmtpd at cudamail.com X-Barracuda-BRTS-Status: 1 Cc: dev@openvswitch.org Subject: Re: [ovs-dev] [PATCH 3/3] ovn: Update TODO, ovn-northd flow table design, ovn-architecture for L3. X-BeenThere: dev@openvswitch.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@openvswitch.org Sender: "dev" On Mon, Sep 21, 2015 at 06:06:16PM -0700, Justin Pettit wrote: > > > On Sep 17, 2015, at 10:11 AM, Ben Pfaff wrote: > > +
  • > > + ip.src[28..31] == 0xe (multicast source) > > +
  • > > +
  • > > + ip.src == 255.255.255.255 (broadcast source) > > +
  • > > +
  • > > + ip.src == 127.0.0.0/8 || ip.dst == 127.0.0.0/8 > > + (localhost source or destination) > > +
  • > > +
  • > > + ip.src == 0.0.0.0/8 || ip.dst == 0.0.0.0/8 (zero > > + network source or destination) > > +
  • > > I assume these should all be "ip4.src". Yes, thanks, I've now done a general search-and-replace. > > +

    > > + ICMP echo reply. These flows reply to ICMP echo requests received > > + for the router's IP address. Let A be an IP address owned > > + by the router or the broadcast address for one of these IP address's > > + networks. Then, for each A, a priority-210 flow matches > > + on ip.dst == A and icmp4.type == 8 > > + && icmp4.code == 0 (ICMP echo request). These flows > > + use the following actions where, if A is unicast, then > > + S is A, and if A is broadcast, > > + S is the router's IP address in A's network: > > +

    > > + > > +
    > > +ip4.dst = ip4.src;
    > > +ip4.src = S;
    > > +ip4.ttl = 255;
    > > +icmp4.type = 0;
    > > +reg0 = ip4.dst;
    > 
    > Later on, it becomes clear why reg0 is being used, but it would be
    > nice to be explicit about it earlier.
    
    Good point, I folded this in:
    
    
    > > +      
  • > > + Ethernet local broadcast. A priority-190 flow with match eth.dst > > + == ff:ff:ff:ff:ff:ff drops traffic destined to the local > > + Ethernet broadcast address. By definition this traffic should not be > > + forwarded. > > +
  • > > + > > +
  • > > + Drop IP multicast. A priority-190 flow with match ip.dst[28..31] > > + == 0xe drops IP multicast traffic. > > +
  • > > We may want to drop traffic sent to an IP broadcast address to prevent > things like Smurf attacks. It's an option, sure. It might not be viable to launch a Smurf attack from within a logical network when port security is turned on. > > +
    > > +ratelimit;
    > 
    > I don't think "ratelimit" is defined anywhere.
    
    I was thinking about ratelimiting, anyway.
    
    I'll drop that for now.
    
    > > +arp {
    > > +    eth.dst = ff:ff:ff:ff:ff:ff;
    > > +    eth.src = E;
    > > +    arp.sha = E;
    > > +    arp.tha = 00:00:00:00:00:00;
    > > +    arp.spa = A;
    > > +    arp.tpa = ip.dst;
    > > +    outport = P;
    > > +    output;
    > 
    > Should you set the ARP opcode?
    
    arp.op = 1 (request) is the default according to the definition in
    ovn-sb.xml.
    
    It doesn't hurt to make it explicit though so I added an assignment.
    
    > > +        
    ip4.ttl--;
    > > +
    > > +

    > > + Decrements the IPv4 TTL. If this would make the TTL zero or > > + negative, then processing of the packet halts; no further actions > > + are processed. (To properly handle such cases, a higher-priority > > + flow should match on ip.ttl < 2.) > > +

    > > + > > +

    Prerequisite: ip4

    > > Is IPv6 that different? Not here, thanks for pointing that out. I changed this to be generic IPv4/IPv6. > >
    > > > > -
    learn
    > > +
    arp { action; ... };
    > > +
    > > +

    > > + Temporarily replaces the IPv4 packet being processed by an ARP > > + packet and executes each nested action on the ARP > > + packet. Actions following the arp action, if any, apply > > + to the original, unmodified packet. > > +

    > > So what would we normally do with the original packet? Do we just > drop it? That seems kind of unfortunate. I agree. I've added a comment to the TODO list. > > > > -
    dec_ttl { action, ... } { action; ...};
    > > +
    icmp4 { action; ... };
    > >
    > > - decrement TTL; execute first set of actions if > > - successful, second set if TTL decrement fails > > +

    > > + Temporarily replaces the IPv4 packet being processed by an ICMPv4 > > + packet and executes each nested action on the ARP > > Do you mean IPv4 instead of ARP? Yes, thanks, fixed. > > + packet. Actions following the icmp4 action, if any, > > + apply to the original, unmodified packet. > > +

    > > + > > +

    > > + The ICMPv4 packet that this action operates on is initialized based > > + on the IPv4 packet being processed, as follows. Ethernet and IPv4 > > + fields not listed here are not changed: > > +

    > > + > > +
      > > +
    • ip.proto = 1 (ICMPv4)
    • > > +
    • ip.frag = 0 (not a fragment)
    • > > +
    • icmp4.type = 3 (destination unreachable)
    • > > +
    • icmp4.code = 1 (host unreachable)
    • > > I assume this is just an example since we support other types and > codes, so may want to mention that in the description. There have to be some defaults so these are the ones I was planning to use. I added a comment to that effect here and earlier: @@ -843,7 +843,8 @@

      The ARP packet that this action operates on is initialized based on - the IPv4 packet being processed, as follows: + the IPv4 packet being processed, as follows. These are default + values that the nested actions will probably want to change:

        @@ -864,15 +865,16 @@

        The ICMPv4 packet that this action operates on is initialized based - on the IPv4 packet being processed, as follows. Ethernet and IPv4 - fields not listed here are not changed: + on the IPv4 packet being processed, as follows. These are default + values that the nested actions will probably want to change. + Ethernet and IPv4 fields not listed here are not changed:

          Thanks for the review! diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 9d35d9f..3381f33 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -241,7 +241,13 @@

          This table is the core of the logical router datapath functionality. It contains the following flows to implement very basic IP host - functionality: + functionality. +

          + +

          + Logical flows in this table that advance to the next table set + reg0 to the next-hop IP address. (ip4.dst, the + final destination, is left unchanged.)

            > Maybe it's obvious to others, but my first thought was that this was > the original "ip.dst", but now that I look at it more carefully, my > guess is that it is the original "ip.src". It might be good to add a > comment to clarify. I guess that's about this line: reg0 = ip4.dst; so I changed it to: reg0 = ip4.dst; // Route back to original IP source. > Finally, should it always be the original source IP? If we have > multiple routers in between, shouldn't it be the gateway's address? > (This question holds for all the places where reg0 is being set.) XXXXXXXXXXXXx > > +next; > > +
    > > Is it possible in these examples to have "
    " and "
    " line up? I thought that the spaces would result in an extra blank line but it doesn't seem to make a difference. I must misunderstand some subtlety of XML or nroff. Anyway, I made the change, it does look better. > > +
  • > > +

    > > + TCP reset. These flows generate TCP reset messages in reply to TCP > > + datagrams directed to the router's IP address. The logical router > > + doesn't accept any TCP traffic so it always generates such a reply. > > +

    > > Do you want to add the comment about not matching on IP fragments with > nonzero offset like you did for UDP and protocol unreachable? (TCP > shouldn't generate IP fragments, but it can happen.) OK. > > +

    > > + Protocol unreachable. These flows generate ICMP protocol unreachable > > + messages in reply to packets directed to the router's IP address on > > + IP protocols other than UDP, TCP, and ICMP. > > +

    > > I think for all of these error generators, we should consider some > sort of rate-limiting. Obviously, this is a little complicated if we > want to do it in ovs-vswitchd--especially in the fast path. I agree. I've thought about that, but I don't have a detailed design for it. For now I've added a TODO item: diff --git a/ovn/TODO b/ovn/TODO index dc3ca10..e8d1a39 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -260,6 +260,15 @@ those that become stale. *** MTU handling (fragmentation on output) +** Ratelimiting. + +*** ARP. + +*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ... + +As a point of comparison, Linux doesn't ratelimit TCP resets but I +think it does everything else. + * ovn-controller ** ovn-controller parameters and configuration.