diff mbox

[ovs-dev] ovn-controller: add restart test

Message ID 1457384636-22052-1-git-send-email-ramu.ramamurthy@us.ibm.com
State Superseded
Headers show

Commit Message

Ramu Ramamurthy March 7, 2016, 9:03 p.m. UTC
Add a test to validate that a restart of the ovn-controller
will program flows in table 0 consistently. Flows before
restart are compared against flows after restart to detect
problems with ofports or zone-ids.

Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
---
 tests/ovn-controller.at | 127 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Comments

Russell Bryant March 7, 2016, 9:15 p.m. UTC | #1
On Mon, Mar 7, 2016 at 4:03 PM, Ramu Ramamurthy <ramu.ramamurthy@gmail.com>
wrote:

> Add a test to validate that a restart of the ovn-controller
> will program flows in table 0 consistently. Flows before
> restart are compared against flows after restart to detect
> problems with ofports or zone-ids.
>
> Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>
>
>
Does this depend on your zone-ids fix?  Do you have another rev of that
coming?
Ramu Ramamurthy March 7, 2016, 9:27 p.m. UTC | #2
>
> Does this depend on your zone-ids fix?  Do you have another rev of that
> coming?
>

Russell, This test does not depend on the zone-id fix, because it replaces
zone-ids on flows
with a dummy string "xyz". It passes on the master as is.

I will send the next revision of the zone-id fix soon, and as part of that
fix, this test
will also be updated to compare zone-ids on flows.
Russell Bryant March 7, 2016, 9:32 p.m. UTC | #3
On Mon, Mar 7, 2016 at 4:27 PM, ramu <ramu.ramamurthy@gmail.com> wrote:

>
>
>> Does this depend on your zone-ids fix?  Do you have another rev of that
>> coming?
>>
>
> Russell, This test does not depend on the zone-id fix, because it replaces
> zone-ids on flows
> with a dummy string "xyz". It passes on the master as is.
>
> I will send the next revision of the zone-id fix soon, and as part of that
> fix, this test
> will also be updated to compare zone-ids on flows.
>

I see now.  You even wrote:

+# For now, zone-ids in flows are masked out because, they are not assigned
+# consistently. This test is added to prepare for that code-change.

What I would probably do is submit them together: the zone-ids fix + this
test case that helps test it.  If you take out your "xyz" change, will this
test fail currently?
Ramu Ramamurthy March 7, 2016, 9:50 p.m. UTC | #4
> What I would probably do is submit them together: the zone-ids fix + this
> test case that helps test it.  If you take out your "xyz" change, will this
> test fail currently?
>
> If i take out the change which masks the zone-ids, this test fails on
master because, zone-ids dont match before and after restart.
Ben Pfaff March 22, 2016, 5:47 p.m. UTC | #5
On Mon, Mar 07, 2016 at 04:03:56PM -0500, Ramu Ramamurthy wrote:
> Add a test to validate that a restart of the ovn-controller
> will program flows in table 0 consistently. Flows before
> restart are compared against flows after restart to detect
> problems with ofports or zone-ids.
> 
> Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com>

This adds a test for this invariant for a specific test case, which is
useful.  However, I believe that this should be a universal invariant
for ovn-controller, so that at any time if ovn-controller is restarted
gracefully after it has converged, it should preserve these flows.  Is
that true?  If so, then it would be more useful to add such a check to
*every* test case for OVN.

What do you think?

Thanks,

Ben.
Ramu Ramamurthy March 22, 2016, 11:37 p.m. UTC | #6
> This adds a test for this invariant for a specific test case, which is
> useful.  However, I believe that this should be a universal invariant
> for ovn-controller, so that at any time if ovn-controller is restarted
> gracefully after it has converged, it should preserve these flows.  Is
> that true?  If so, then it would be more useful to add such a check to
> *every* test case for OVN.
>
> What do you think?
>
> Thanks,
>
> Ben.

I am finding that, after a graceful exit of ovn-controller
using "ovs-appctl -t ovn-controller exit", and then a restart of the
ovn-controller,
the flows before restart and after restart dont match on table 0 in 1 out of
2 runs (or 1 out of 3 runs).

This is because, localnet ports in the test
are being deleted and recreated in such failing runs - and therefore, their
ofports are mismatching in the flow-table (before and after restart).

Localnet ports are being deleted and recreated in the failing runs
because, ovn-controller is cleaning up the chassis row during graceful
exit, and upon startup there still seems
to be a race between patch_run and binding_run when the chassis does
not exist, I am still
chasing this down.

Thats why the test as currently coded is killing the ovn-controller
ungracefully (kill -9) and restarting
it. As such the restart test and expected invariance of flows seems useful.
Ben Pfaff March 23, 2016, 11:23 p.m. UTC | #7
On Tue, Mar 22, 2016 at 04:37:16PM -0700, Ramu Ramamurthy wrote:
> > This adds a test for this invariant for a specific test case, which is
> > useful.  However, I believe that this should be a universal invariant
> > for ovn-controller, so that at any time if ovn-controller is restarted
> > gracefully after it has converged, it should preserve these flows.  Is
> > that true?  If so, then it would be more useful to add such a check to
> > *every* test case for OVN.
> >
> > What do you think?
> >
> > Thanks,
> >
> > Ben.
> 
> I am finding that, after a graceful exit of ovn-controller
> using "ovs-appctl -t ovn-controller exit", and then a restart of the
> ovn-controller,
> the flows before restart and after restart dont match on table 0 in 1 out of
> 2 runs (or 1 out of 3 runs).
> 
> This is because, localnet ports in the test
> are being deleted and recreated in such failing runs - and therefore, their
> ofports are mismatching in the flow-table (before and after restart).
> 
> Localnet ports are being deleted and recreated in the failing runs
> because, ovn-controller is cleaning up the chassis row during graceful
> exit, and upon startup there still seems
> to be a race between patch_run and binding_run when the chassis does
> not exist, I am still
> chasing this down.

Should that behavior of ovn-controller on graceful restart be considered
a bug, though?  If so, then we should probably fix it and add a test (or
modify multiple tests, as I suggested above) to make sure that it does
not recur.

> Thats why the test as currently coded is killing the ovn-controller
> ungracefully (kill -9) and restarting
> it. As such the restart test and expected invariance of flows seems useful.

I guess that kill -9 and restart can be a useful test too.
Ramu Ramamurthy March 23, 2016, 11:54 p.m. UTC | #8
On Wed, Mar 23, 2016 at 4:23 PM, Ben Pfaff <blp@ovn.org> wrote:
> On Tue, Mar 22, 2016 at 04:37:16PM -0700, Ramu Ramamurthy wrote:
>> > This adds a test for this invariant for a specific test case, which is
>> > useful.  However, I believe that this should be a universal invariant
>> > for ovn-controller, so that at any time if ovn-controller is restarted
>> > gracefully after it has converged, it should preserve these flows.  Is
>> > that true?  If so, then it would be more useful to add such a check to
>> > *every* test case for OVN.
>> >
>> > What do you think?
>> >
>> > Thanks,
>> >
>> > Ben.
>>
>> I am finding that, after a graceful exit of ovn-controller
>> using "ovs-appctl -t ovn-controller exit", and then a restart of the
>> ovn-controller,
>> the flows before restart and after restart dont match on table 0 in 1 out of
>> 2 runs (or 1 out of 3 runs).
>>
>> This is because, localnet ports in the test
>> are being deleted and recreated in such failing runs - and therefore, their
>> ofports are mismatching in the flow-table (before and after restart).
>>
>> Localnet ports are being deleted and recreated in the failing runs
>> because, ovn-controller is cleaning up the chassis row during graceful
>> exit, and upon startup there still seems
>> to be a race between patch_run and binding_run when the chassis does
>> not exist, I am still
>> chasing this down.
>
> Should that behavior of ovn-controller on graceful restart be considered
> a bug, though?  If so, then we should probably fix it and add a test (or
> modify multiple tests, as I suggested above) to make sure that it does
> not recur.
>
Indeed, its a bug in code, and I will update with patch(es) as needed
to fix such
restart bugs, along with test modifications.

>> Thats why the test as currently coded is killing the ovn-controller
>> ungracefully (kill -9) and restarting
>> it. As such the restart test and expected invariance of flows seems useful.
>
> I guess that kill -9 and restart can be a useful test too.
diff mbox

Patch

diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 2ff56af..68b78ff 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -117,3 +117,130 @@  check_patches
 # Gracefully terminate ovn-controller
 ovs-appctl -t ovn-controller exit
 AT_CLEANUP
+
+AT_SETUP([ovn-controller - restart])
+AT_KEYWORDS([ovn])
+
+# The purpose of this test is to validate flows programmed
+# (in table 0) after ovn-controller restart.
+# This test confirms that:
+# a) ofports remain the same, so, ports were not deleted/readded
+# b) zone-ids remain the same on ports
+#
+# For now, zone-ids in flows are masked out because, they are not assigned
+# consistently. This test is added to prepare for that code-change.
+#
+
+ovn_init_db ovn-sb
+net_add n1
+sim_add hv
+as hv
+ovs-vsctl \
+    -- add-br br-phys \
+    -- add-br br-eth0 \
+    -- add-br br-eth1 \
+    -- add-br br-eth2
+
+ovn_attach n1 br-phys 192.168.0.1
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1,physnet3:br-eth2])
+
+#
+# - Create 2 localnet ports
+# - Create 4 VIFs
+# - Create 5 CIFs
+#
+
+#
+# Create a localnet port, along with a vif
+#
+ovn-sbctl \
+    -- --id=@dp101 create Datapath_Binding tunnel_key=101 \
+    -- create Port_Binding datapath=@dp101 logical_port=localnet1 tunnel_key=1 \
+        type=localnet options:network_name=physnet1 \
+    -- create Port_Binding datapath=@dp101 logical_port=localvif1 tunnel_key=2
+ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1
+#
+# Create another localnet port, along with a vif
+#
+ovn-sbctl \
+    -- --id=@dp201 create Datapath_Binding tunnel_key=201 \
+    -- create Port_Binding datapath=@dp201 logical_port=localnet201 tunnel_key=1 \
+        type=localnet options:network_name=physnet2 \
+    -- create Port_Binding datapath=@dp201 logical_port=localvif201 tunnel_key=2
+
+ovs-vsctl add-port br-int localvif201 -- set Interface localvif201 external_ids:iface-id=localvif201
+
+#
+# Create a vif port
+#
+ovn-sbctl \
+    -- --id=@dp102 create Datapath_Binding tunnel_key=102 \
+    -- create Port_Binding datapath=@dp102 logical_port=localvif2 tunnel_key=3
+ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2
+
+#
+# Create a vif port and add 5 cifs on it
+#
+ovn-sbctl \
+    -- --id=@dp103 create Datapath_Binding tunnel_key=103 \
+    -- create Port_Binding datapath=@dp103 logical_port=localvif3 tunnel_key=4 \
+    -- create Port_Binding datapath=@dp103 logical_port=localcif1 tunnel_key=5 parent_port=localvif3 tag=1 \
+    -- create Port_Binding datapath=@dp103 logical_port=localcif2 tunnel_key=6 parent_port=localvif3 tag=2 \
+    -- create Port_Binding datapath=@dp103 logical_port=localcif3 tunnel_key=7 parent_port=localvif3 tag=3 \
+    -- create Port_Binding datapath=@dp103 logical_port=localcif4 tunnel_key=8 parent_port=localvif3 tag=4 \
+    -- create Port_Binding datapath=@dp103 logical_port=localcif5 tunnel_key=9 parent_port=localvif3 tag=5
+ovs-vsctl add-port br-int localvif3 -- set Interface localvif3 external_ids:iface-id=localvif3
+
+ovs-vsctl show
+
+# wait for flows in table 0 to get programmed
+OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13])
+
+# save the table0 flows for a later comparison after restart
+ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > expout
+cat expout
+
+
+#kill the ovn controller ungracefully
+kill -9 `cat "$OVS_RUNDIR"/ovn-controller.pid`
+
+echo "killed the ovn-controller"
+
+#clean the flows
+ovs-ofctl del-flows br-int
+
+#restart the ovn controller
+start_daemon ovn-controller
+
+#wait for the table 0 flows to get reprogrammed
+OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13])
+
+# save flows for a comparison
+ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > afterkill
+cat afterkill
+
+#compare the table 0 flows before and after restart
+AT_CHECK([ovs-ofctl diff-flows afterkill expout])
+
+
+ovs-appctl -t ovn-controller exit
+echo "shutdown the ovn-controller gracefully"
+
+#clean the flows
+ovs-ofctl del-flows br-int
+
+start_daemon ovn-controller
+echo "restarted the ovn-controller"
+
+#wait for the table 0 flows to get reprogrammed
+OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13])
+
+ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > afterrestart
+cat afterrestart
+
+#compare the table 0 flows before and after restart
+AT_CHECK([ovs-ofctl diff-flows afterrestart expout])
+
+
+AT_CLEANUP