diff mbox

[ovs-dev,3/3] ovn: Update TODO, ovn-northd flow table design, ovn-architecture for L3.

Message ID 20150929221947.GP30384@nicira.com
State Superseded
Headers show

Commit Message

Ben Pfaff Sept. 29, 2015, 10:19 p.m. UTC
On Mon, Sep 21, 2015 at 06:06:16PM -0700, Justin Pettit wrote:
> 
> > On Sep 17, 2015, at 10:11 AM, Ben Pfaff <blp@nicira.com> wrote:
> > +          <li>
> > +            <code>ip.src[28..31] == 0xe</code> (multicast source)
> > +          </li>
> > +          <li>
> > +            <code>ip.src == 255.255.255.255</code> (broadcast source)
> > +          </li>
> > +          <li>
> > +            <code>ip.src == 127.0.0.0/8 || ip.dst == 127.0.0.0/8</code>
> > +            (localhost source or destination)
> > +          </li>
> > +          <li>
> > +            <code>ip.src == 0.0.0.0/8 || ip.dst == 0.0.0.0/8</code> (zero
> > +            network source or destination)
> > +          </li>
> 
> I assume these should all be "ip4.src".

Yes, thanks, I've now done a general search-and-replace.

> > +        <p>
> > +          ICMP echo reply.  These flows reply to ICMP echo requests received
> > +          for the router's IP address.  Let <var>A</var> be an IP address owned
> > +          by the router or the broadcast address for one of these IP address's
> > +          networks.  Then, for each <var>A</var>, a priority-210 flow matches
> > +          on <code>ip.dst == <var>A</var></code> and <code>icmp4.type == 8
> > +          &amp;&amp; icmp4.code == 0</code> (ICMP echo request).  These flows
> > +          use the following actions where, if <var>A</var> is unicast, then
> > +          <var>S</var> is <var>A</var>, and if <var>A</var> is broadcast,
> > +          <var>S</var> is the router's IP address in <var>A</var>'s network:
> > +        </p>
> > +
> > +        <pre>
> > +ip4.dst = ip4.src;
> > +ip4.src = <var>S</var>;
> > +ip4.ttl = 255;
> > +icmp4.type = 0;
> > +reg0 = ip4.dst;
> 
> Later on, it becomes clear why reg0 is being used, but it would be
> nice to be explicit about it earlier.

Good point, I folded this in:


> > +      <li>
> > +        Ethernet local broadcast.  A priority-190 flow with match <code>eth.dst
> > +        == ff:ff:ff:ff:ff:ff</code> drops traffic destined to the local
> > +        Ethernet broadcast address.  By definition this traffic should not be
> > +        forwarded.
> > +      </li>
> > +
> > +      <li>
> > +        Drop IP multicast.  A priority-190 flow with match <code>ip.dst[28..31]
> > +        == 0xe</code> drops IP multicast traffic.
> > +      </li>
> 
> We may want to drop traffic sent to an IP broadcast address to prevent
> things like Smurf attacks.

It's an option, sure.

It might not be viable to launch a Smurf attack from within a logical
network when port security is turned on.

> > +        <pre>
> > +ratelimit;
> 
> I don't think "ratelimit" is defined anywhere.

I was thinking about ratelimiting, anyway.

I'll drop that for now.

> > +arp {
> > +    eth.dst = ff:ff:ff:ff:ff:ff;
> > +    eth.src = <var>E</var>;
> > +    arp.sha = <var>E</var>;
> > +    arp.tha = 00:00:00:00:00:00;
> > +    arp.spa = <var>A</var>;
> > +    arp.tpa = ip.dst;
> > +    outport = <var>P</var>;
> > +    output;
> 
> Should you set the ARP opcode?

arp.op = 1 (request) is the default according to the definition in
ovn-sb.xml.

It doesn't hurt to make it explicit though so I added an assignment.

> > +        <dt><code>ip4.ttl--;</code></dt>
> > +        <dd>
> > +          <p>
> > +            Decrements the IPv4 TTL.  If this would make the TTL zero or
> > +            negative, then processing of the packet halts; no further actions
> > +            are processed.  (To properly handle such cases, a higher-priority
> > +            flow should match on <code>ip.ttl &lt; 2</code>.)
> > +          </p>
> > +
> > +          <p><b>Prerequisite:</b> <code>ip4</code></p>
> 
> Is IPv6 that different?

Not here, thanks for pointing that out.

I changed this to be generic IPv4/IPv6.

> >         </dd>
> > 
> > -        <dt><code>learn</code></dt>
> > +        <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
> > +        <dd>
> > +          <p>
> > +            Temporarily replaces the IPv4 packet being processed by an ARP
> > +            packet and executes each nested <var>action</var> on the ARP
> > +            packet.  Actions following the <var>arp</var> action, if any, apply
> > +            to the original, unmodified packet.
> > +          </p>
> 
> So what would we normally do with the original packet?  Do we just
> drop it?  That seems kind of unfortunate.

I agree.  I've added a comment to the TODO list.

> > 
> > -        <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt>
> > +        <dt><code>icmp4 { <var>action</var>; </code>...<code> };</code></dt>
> >         <dd>
> > -          decrement TTL; execute first set of actions if
> > -          successful, second set if TTL decrement fails
> > +          <p>
> > +            Temporarily replaces the IPv4 packet being processed by an ICMPv4
> > +            packet and executes each nested <var>action</var> on the ARP
> 
> Do you mean IPv4 instead of ARP?

Yes, thanks, fixed.

> > +            packet.  Actions following the <var>icmp4</var> action, if any,
> > +            apply to the original, unmodified packet.
> > +          </p>
> > +
> > +          <p>
> > +            The ICMPv4 packet that this action operates on is initialized based
> > +            on the IPv4 packet being processed, as follows.  Ethernet and IPv4
> > +            fields not listed here are not changed:
> > +          </p>
> > +
> > +          <ul>
> > +            <li><code>ip.proto = 1</code> (ICMPv4)</li>
> > +            <li><code>ip.frag = 0</code> (not a fragment)</li>
> > +            <li><code>icmp4.type = 3</code> (destination unreachable)</li>
> > +            <li><code>icmp4.code = 1</code> (host unreachable)</li>
> 
> I assume this is just an example since we support other types and
> codes, so may want to mention that in the description.

There have to be some defaults so these are the ones I was planning to
use.

I added a comment to that effect here and earlier:

@@ -843,7 +843,8 @@
 
           <p>
             The ARP packet that this action operates on is initialized based on
-            the IPv4 packet being processed, as follows:
+            the IPv4 packet being processed, as follows.  These are default
+            values that the nested actions will probably want to change:
           </p>
 
           <ul>
@@ -864,15 +865,16 @@
           <p>
             The ICMPv4 packet that this action operates on is initialized based
-            on the IPv4 packet being processed, as follows.  Ethernet and IPv4
-            fields not listed here are not changed:
+            on the IPv4 packet being processed, as follows.  These are default
+            values that the nested actions will probably want to change.
+            Ethernet and IPv4 fields not listed here are not changed:
           </p>
 
           <ul>

Thanks for the review!

Comments

Ben Pfaff Sept. 29, 2015, 10:21 p.m. UTC | #1
On Tue, Sep 29, 2015 at 03:19:47PM -0700, Ben Pfaff wrote:
> On Mon, Sep 21, 2015 at 06:06:16PM -0700, Justin Pettit wrote:
> > > +        <pre>
> > > +ip4.dst = ip4.src;
> > > +ip4.src = <var>S</var>;
> > > +ip4.ttl = 255;
> > > +icmp4.type = 0;
> > > +reg0 = ip4.dst;
> > 
> > Finally, should it always be the original source IP?  If we have
> > multiple routers in between, shouldn't it be the gateway's address?
> > (This question holds for all the places where reg0 is being set.)
> 
> XXXXXXXXXXXXx

Um, that was meant to be an actual reply.  I'll get to that.
Justin Pettit Sept. 29, 2015, 11:30 p.m. UTC | #2
> On Sep 29, 2015, at 3:19 PM, Ben Pfaff <blp@nicira.com> wrote:
> 
> On Mon, Sep 21, 2015 at 06:06:16PM -0700, Justin Pettit wrote:
>> 
>> 
>> We may want to drop traffic sent to an IP broadcast address to prevent
>> things like Smurf attacks.
> 
> It's an option, sure.
> 
> It might not be viable to launch a Smurf attack from within a logical
> network when port security is turned on.

Couldn't it be an issue with traffic coming from a gateway?

--Justin
Ben Pfaff Oct. 6, 2015, 9:21 p.m. UTC | #3
On Tue, Sep 29, 2015 at 04:30:30PM -0700, Justin Pettit wrote:
> 
> > On Sep 29, 2015, at 3:19 PM, Ben Pfaff <blp@nicira.com> wrote:
> > 
> > On Mon, Sep 21, 2015 at 06:06:16PM -0700, Justin Pettit wrote:
> >> 
> >> 
> >> We may want to drop traffic sent to an IP broadcast address to prevent
> >> things like Smurf attacks.
> > 
> > It's an option, sure.
> > 
> > It might not be viable to launch a Smurf attack from within a logical
> > network when port security is turned on.
> 
> Couldn't it be an issue with traffic coming from a gateway?

Yes.
diff mbox

Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 9d35d9f..3381f33 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -241,7 +241,13 @@ 
     <p>
       This table is the core of the logical router datapath functionality.  It
       contains the following flows to implement very basic IP host
-      functionality:
+      functionality.
+    </p>
+
+    <p>
+      Logical flows in this table that advance to the next table set
+      <code>reg0</code> to the next-hop IP address.  (<code>ip4.dst</code>, the
+      final destination, is left unchanged.)
     </p>
 
     <ul>

> Maybe it's obvious to others, but my first thought was that this was
> the original "ip.dst", but now that I look at it more carefully, my
> guess is that it is the original "ip.src".  It might be good to add a
> comment to clarify.

I guess that's about this line:
        reg0 = ip4.dst;
so I changed it to:
        reg0 = ip4.dst;   // Route back to original IP source.

> Finally, should it always be the original source IP?  If we have
> multiple routers in between, shouldn't it be the gateway's address?
> (This question holds for all the places where reg0 is being set.)

XXXXXXXXXXXXx

> > +next;
> > +</pre>
> 
> Is it possible in these examples to have "<pre>" and "</pre>"  line up?

I thought that the spaces would result in an extra blank line but it
doesn't seem to make a difference.  I must misunderstand some subtlety
of XML or nroff.  Anyway, I made the change, it does look better.

> > +      <li>
> > +        <p>
> > +          TCP reset.  These flows generate TCP reset messages in reply to TCP
> > +          datagrams directed to the router's IP address.  The logical router
> > +          doesn't accept any TCP traffic so it always generates such a reply.
> > +        </p>
> 
> Do you want to add the comment about not matching on IP fragments with
> nonzero offset like you did for UDP and protocol unreachable?  (TCP
> shouldn't generate IP fragments, but it can happen.)

OK.

> > +        <p>
> > +          Protocol unreachable.  These flows generate ICMP protocol unreachable
> > +          messages in reply to packets directed to the router's IP address on
> > +          IP protocols other than UDP, TCP, and ICMP.
> > +        </p>
> 
> I think for all of these error generators, we should consider some
> sort of rate-limiting.  Obviously, this is a little complicated if we
> want to do it in ovs-vswitchd--especially in the fast path.

I agree.  I've thought about that, but I don't have a detailed design
for it.

For now I've added a TODO item:

diff --git a/ovn/TODO b/ovn/TODO
index dc3ca10..e8d1a39 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -260,6 +260,15 @@  those that become stale.
 
 *** MTU handling (fragmentation on output)
 
+** Ratelimiting.
+
+*** ARP.
+
+*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ...
+
+As a point of comparison, Linux doesn't ratelimit TCP resets but I
+think it does everything else.
+
 * ovn-controller
 
 ** ovn-controller parameters and configuration.