[ovs-dev,7/7] ovn-northd: Logical flows for load balancers.

Message ID CAM_3v9+-5V3UD-6g==ahE6kH82xEoofnhBrHcm+n-1CNfdo5+Q@mail.gmail.com
State Not Applicable

Commit Message

Gurucharan Shetty July 4, 2016, 3:17 a.m. UTC
On 3 July 2016 at 10:24, Ben Pfaff <blp@ovn.org> wrote:

> On Wed, Jun 29, 2016 at 01:17:11AM -0700, Gurucharan Shetty wrote:
> > This commit adds a 'pre_lb' table that sits before 'pre_stateful' table.
> > For packets that need to be load balanced, this table sets reg0[0]
> > to act as a hint for the pre-stateful table to send the packet to
> > the conntrack table for defragmentation.
> >
> > This commit also adds a 'lb' table that sits before 'stateful' table.
> > For packets from established connections, this table sets reg0[2] to
> > indicate to the 'stateful' table that the packet needs to be sent to
> > connection tracking table to just do NAT.
> >
> > In stateful table, packet for a new connection that needs to be load balanced
> > is given a ct_lb("$IP_LIST") action.
> >
> > Signed-off-by: Gurucharan Shetty <guru@ovn.org>
>
> This will require a change to the generated flow syntax if you accept my
> suggestion for patch 6.
>

I made that change and also had the schema change (that was reviewed
separately) squashed into this patch.

>
> Is there a way to test this?
>

The load balancing itself, via group actions, is tested here:
https://github.com/openvswitch/ovs/blob/master/tests/system-traffic.at#L2529

But end-to-end OVN tests do not exist, as the unit test framework does not
integrate with conntrack NAT.
One option is to wait for Daniele's userspace NAT work to get merged and
then add the tests.

One way that we could test it is to look for specific group flows that get
generated with 'ovs-ofctl dump-groups' without any traffic sent. I can
create a unit test for that as a separate patch.
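
A minimal sketch of what such a check could look like, assuming the
dump-groups output has already been captured as text. The sample text
below is illustrative only (real 'ovs-ofctl dump-groups' output may differ
in detail), and the helper name select_groups is hypothetical:

```python
import re

# Illustrative sample only; not captured from a real switch.
SAMPLE = """\
group_id=1,type=select,bucket=actions=ct(nat(dst=10.0.0.2:80)),bucket=actions=ct(nat(dst=10.0.0.3:80))
group_id=2,type=all,bucket=actions=output:1
"""


def select_groups(dump_text):
    """Map group_id -> bucket count for every type=select group."""
    groups = {}
    for line in dump_text.splitlines():
        match = re.search(r"group_id=(\d+),type=select", line)
        if match:
            # Each load-balancing endpoint is expected to show up as one bucket.
            groups[int(match.group(1))] = line.count("bucket=")
    return groups


if __name__ == "__main__":
    print(select_groups(SAMPLE))
```

A test could then compare the bucket count per group against the number
of endpoints configured for each VIP, without sending any traffic.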


>
> Acked-by: Ben Pfaff <blp@ovn.org>
>

Thank you, I applied it with the following incremental around documentation
(which I had completely forgotten about, sorry about that). I am happy to
send any incremental documentation fixes if you have any comments. I
thought about sending another version, but I would likely again have
conflicts with Numan's DHCP patches.

+    </p>
+
+    <h3>Ingress Table 5: Pre-stateful</h3>

     <p>
       This table prepares flows for all possible stateful processing
@@ -263,7 +279,7 @@
       <code>ct_next;</code> action.
     </p>

-    <h3>Ingress table 5: <code>from-lport</code> ACLs</h3>
+    <h3>Ingress table 6: <code>from-lport</code> ACLs</h3>

     <p>
       Logical flows in this table closely reproduce those in the
@@ -312,16 +328,57 @@
       </li>
     </ul>

-    <h3>Ingress Table 6: Stateful</h3>
+    <h3>Ingress Table 7: LB</h3>

     <p>
       It contains a priority-0 flow that simply moves traffic to the next
-      table.  A priority-100 flow commits packets to connection tracker using
-      <code>ct_commit; next;</code> action based on a hint provided by
-      the previous tables (with a match for <code>reg0[1] == 1</code>).
+      table.  For established connections a priority 100 flow matches on
+      <code>ct.est &amp;&amp; !ct.rel &amp;&amp; !ct.new &amp;&amp;
+      !ct.inv</code> and sets an action <code>reg0[2] = 1; next;</code> to act
+      as a hint for table <code>Stateful</code> to send packets through
+      connection tracker to NAT the packets.  (The packet will automatically
+      get DNATed to the same IP address as the first packet in that
+      connection.)
     </p>

-    <h3>Ingress Table 7: ARP responder</h3>
+    <h3>Ingress Table 8: Stateful</h3>
+
+    <ul>
+      <li>
+        For all the configured load balancing rules in
+        <code>OVN_Northbound</code> database that includes a L4 port
+        <var>PORT</var> of protocol <var>P</var> and IPv4 address
+        <var>VIP</var>, a priority-120 flow that matches on
+        <code>ct.new &amp;&amp; ip &amp;&amp; ip4.dst == <var>VIP
+        </var>&amp;&amp; <var>P</var> &amp;&amp; <var>P</var>.dst == <var>PORT
+        </var></code> with an action of <code>ct_lb(<var>args</var>)</code>,
+        where <var>args</var> contains comma separated IPv4 addresses (and
+        optional port numbers) to load balance to.
+      </li>
+      <li>
+        For all the configured load balancing rules in
+        <code>OVN_Northbound</code> database that includes just an IP address
+        <var>VIP</var> to match on, a priority-110 flow that matches on
+        <code>ct.new &amp;&amp; ip &amp;&amp; ip4.dst == <var>VIP</var></code>
+        with an action of <code>ct_lb(<var>args</var>)</code>, where
+        <var>args</var> contains comma separated IPv4 addresses.
+      </li>
+      <li>
+        A priority-100 flow commits packets to connection tracker using
+        <code>ct_commit; next;</code> action based on a hint provided by
+        the previous tables (with a match for <code>reg0[1] == 1</code>).
+      </li>
+      <li>
+        A priority-100 flow sends the packets to connection tracker using
+        <code>ct_lb;</code> as the action based on a hint provided by the
+        previous tables (with a match for <code>reg0[2] == 1</code>).
+      </li>
+      <li>
+        A priority-0 flow that simply moves traffic to the next table.
+      </li>
+    </ul>
+
+    <h3>Ingress Table 9: ARP responder</h3>
     <p>
       This table implements ARP responder for known IPs.  It contains these
@@ -366,7 +423,7 @@ output;
       </li>
     </ul>

-    <h3>Ingress Table 8: Destination Lookup</h3>
+    <h3>Ingress Table 10: Destination Lookup</h3>

     <p>
       This table implements switching behavior.  It contains these logical
@@ -397,33 +454,50 @@ output;
       </li>
     </ul>

-    <h3>Egress Table 0: <code>to-lport</code> Pre-ACLs</h3>
+    <h3>Egress Table 0: Pre-LB</h3>
+
+    <p>
+      This table is similar to ingress table <code>Pre-LB</code>.  It
+      contains a priority-0 flow that simply moves traffic to the next table.
+      If any load balancing rules exist for the datapath, a priority-100 flow
+      is added with a match of <code>ip</code> and action of <code>reg0[0] = 1;
+       next;</code> to act as a hint for table <code>Pre-stateful</code> to
+      send IP packets to the connection tracker for packet de-fragmentation.
+    </p>
+
+    <h3>Egress Table 1: <code>to-lport</code> Pre-ACLs</h3>

     <p>
       This is similar to ingress table <code>Pre-ACLs</code> except for
      <code>to-lport</code> traffic.
     </p>

-    <h3>Egress Table 1: Pre-stateful</h3>
+    <h3>Egress Table 2: Pre-stateful</h3>

     <p>
       This is similar to ingress table <code>Pre-stateful</code>.
     </p>

-    <h3>Egress Table 2: <code>to-lport</code> ACLs</h3>
+    <h3>Egress Table 3: LB</h3>
+    <p>
+      This is similar to ingress table <code>LB</code>.
+    </p>
+
+    <h3>Egress Table 4: <code>to-lport</code> ACLs</h3>

     <p>
       This is similar to ingress table <code>ACLs</code> except for
       <code>to-lport</code> ACLs.
     </p>

-    <h3>Egress Table 3: Stateful</h3>
+    <h3>Egress Table 5: Stateful</h3>

     <p>
-      This is similar to ingress table <code>Stateful</code>.
+      This is similar to ingress table <code>Stateful</code> except that
+      there are no rules added for load balancing new connections.
     </p>

-    <h3>Egress Table 4: Egress Port Security - IP</h3>
+    <h3>Egress Table 6: Egress Port Security - IP</h3>

     <p>
       This is similar to the port security logic in table
@@ -433,7 +507,7 @@ output;
       <code>ip4.src</code> and <code>ip6.src</code>
     </p>

-    <h3>Egress Table 5: Egress Port Security - L2</h3>
+    <h3>Egress Table 7: Egress Port Security - L2</h3>


>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>

Comments

Ben Pfaff July 4, 2016, 4:52 a.m. UTC | #1
On Sun, Jul 03, 2016 at 08:17:32PM -0700, Guru Shetty wrote:
> On 3 July 2016 at 10:24, Ben Pfaff <blp@ovn.org> wrote:
> 
> > On Wed, Jun 29, 2016 at 01:17:11AM -0700, Gurucharan Shetty wrote:
> > > This commit adds a 'pre_lb' table that sits before 'pre_stateful' table.
> > > For packets that need to be load balanced, this table sets reg0[0]
> > > to act as a hint for the pre-stateful table to send the packet to
> > > the conntrack table for defragmentation.

...

> > Is there a way to test this?
> 
> The load balancing itself, via group actions, is tested here:
> https://github.com/openvswitch/ovs/blob/master/tests/system-traffic.at#L2529
> 
> But end-to-end OVN tests do not exist, as the unit test framework does not
> integrate with conntrack NAT.
> One option is to wait for Daniele's userspace NAT work to get merged and
> then add the tests.
>
> One way that we could test it is to look for specific group flows that get
> generated with 'ovs-ofctl dump-groups' without any traffic sent. I can
> create a unit test for that as a separate patch.

That might be useful; I'll leave it to you to decide.

> Thank you, I applied it with the following incremental around documentation
> (which I had completely forgotten about, sorry about that). I am happy to
> send any incremental documentation fixes if you have any comments. I
> thought about sending another version, but I would likely again have
> conflicts with Numan's DHCP patches.

Thanks for adding the documentation.
Zong Kai LI July 5, 2016, 2:31 p.m. UTC | #2
Hi, Ben and Guru. I tried to test the lb feature on my OpenStack env, but
it failed.
The simplest topology: three VMs (cirros) and the VIP are on the same switch.
VM2 and VM3 are endpoints for the VIP.
I tried to use ping and ssh to test the VIP, but things don't work.

I think it is an ARP issue.
First, in table ls_in_arp_rsp, there is no flow entry to respond for the VIP.
Second, table ls_in_l2_lkup determines which port to output to based on the
packet's eth.dst. I'm not familiar with conntrack, but judging by
"conntrack -L" it does not seem to touch the packet's L2 address. So I
suppose this is another place that will cause load balancing to fail.

Thanks.
Zong Kai, LI
Gurucharan Shetty July 5, 2016, 2:46 p.m. UTC | #3
On 5 July 2016 at 07:31, Zong Kai LI <zealokii@gmail.com> wrote:

> Hi, Ben and Guru. I tried to test lb feature on my OpenStack env, but
> failed.
> The simplest topology, three VMs(cirros) and VIP are on the same switch.
> VM2 and VM3 are endpoints for the VIP.
> I tried to use ping and ssh to test VIP, but things don't work.
>
Yeah, the current feature works when the destination endpoints (servers)
are in a different subnet than the client, i.e. there has to be a router
in between. The documentation should have clarified that, sorry!

I have a patch in my tree here that works for your use case too.
https://github.com/shettyg/ovs/commit/f961026fc0dd4645e5bcf1e819b8de99a7c3f95a

I will clean it up (i.e. add documentation) and send it out for review.



>
> I think it should be arp issue.
> First, in table ls_in_arp_rsp, there is no flow entry to response for VIP.
> Second, in table ls_in_l2_lkup, it will determine which port to output per
> packet eth.dst. I'm not familiar with conntrack, but it seems have
> nothing to process on packet L2 address, when I run "conntrack -L". So I
> suppose this is another place will cause load balance failure.
>
> Thanks.
> Zong Kai, LI
Gurucharan Shetty July 5, 2016, 3:09 p.m. UTC | #4
On 5 July 2016 at 07:31, Zong Kai LI <zealokii@gmail.com> wrote:

> Hi, Ben and Guru. I tried to test lb feature on my OpenStack env, but
> failed.
> The simplest topology, three VMs(cirros) and VIP are on the same switch.
> VM2 and VM3 are endpoints for the VIP.
> I tried to use ping and ssh to test VIP, but things don't work.
>
> I think it should be arp issue.
> First, in table ls_in_arp_rsp, there is no flow entry to response for VIP.
> Second, in table ls_in_l2_lkup, it will determine which port to output per
> packet eth.dst. I'm not familiar with conntrack, but it seems have
> nothing to process on packet L2 address, when I run "conntrack -L". So I
> suppose this is another place will cause load balance failure.
>

On second thought, your use case would still not work without a router
connected to your switch, i.e. the VIP itself does not have a MAC address
associated with it.  When a VIP is in a different subnet, the logical port
has to send the packet to the router port. I think that would be the case
in a non-virtualized world too.
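
To illustrate the point with a small sketch (plain host routing logic,
nothing OVN-specific; the function name and the addresses are made up):
a client ARPs directly for a destination in its own subnet, and only
sends to the router's MAC when the destination is in another subnet.

```python
import ipaddress


def arp_target(client_if, dst_ip, gateway):
    """Return the IP address the client resolves via ARP to reach dst_ip."""
    iface = ipaddress.ip_interface(client_if)
    dst = ipaddress.ip_address(dst_ip)
    if dst in iface.network:
        # On-link destination: the client ARPs for it directly. A VIP with
        # no MAC address behind it would never answer.
        return str(dst)
    # Off-link destination: the client ARPs for its gateway instead.
    return gateway


if __name__ == "__main__":
    print(arp_target("192.168.1.10/24", "192.168.1.100", "192.168.1.1"))
    print(arp_target("192.168.1.10/24", "30.0.0.1", "192.168.1.1"))
```

With the VIP in the client's own subnet the first case applies, which is
why the packet never reaches a point where it can be load balanced.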



> Thanks.
> Zong Kai, LI

Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index b8ee106..6bc83ea 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -252,7 +252,23 @@ 
       before eventually advancing to ingress table <code>ACLs</code>.
     </p>

-    <h3>Ingress Table 4: Pre-stateful</h3>
+    <h3>Ingress Table 4: Pre-LB</h3>
+
+    <p>
+      This table prepares flows for possible stateful load balancing processing
+      in ingress table <code>LB</code> and <code>Stateful</code>.  It contains
+      a priority-0 flow that simply moves traffic to the next table.  If load
+      balancing rules with virtual IP addresses (and ports) are configured in
+      <code>OVN_Northbound</code> database for a logical datapath, a
+      priority-100 flow is added for each configured virtual IP address
+      <var>VIP</var> with a match <code>ip &amp;&amp; ip4.dst == <var>VIP</var>
+      </code> that sets an action <code>reg0[0] = 1; next;</code> to act as a
+      hint for table <code>Pre-stateful</code> to send IP packets to the
+      connection tracker for packet de-fragmentation before eventually
+      advancing to ingress table <code>LB</code>.