Message ID | 1457384636-22052-1-git-send-email-ramu.ramamurthy@us.ibm.com |
---|---|
State | Superseded |
Headers | show |
On Mon, Mar 7, 2016 at 4:03 PM, Ramu Ramamurthy <ramu.ramamurthy@gmail.com> wrote: > Add a test to validate that a restart of the ovn-controller > will program flows in table 0 consistently. Flows before > restart are compared against flows after restart to detect > problems with ofports or zone-ids. > > Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> > > Does this depend on your zone-ids fix? Do you have another rev of that coming?
> > Does this depend on your zone-ids fix? Do you have another rev of that > coming? > Russell, This test does not depend on the zone-id fix, because it replaces zone-ids on flows with a dummy string "xyz". It passes on the master as is. I will send the next revision of the zone-id fix soon, and as part of that fix, this test will also be updated to compare zone-ids on flows.
On Mon, Mar 7, 2016 at 4:27 PM, ramu <ramu.ramamurthy@gmail.com> wrote: > > >> Does this depend on your zone-ids fix? Do you have another rev of that >> coming? >> > > Russell, This test does not depend on the zone-id fix, because it replaces > zone-ids on flows > with a dummy string "xyz". It passes on the master as is. > > I will send the next revision of the zone-id fix soon, and as part of that > fix, this test > will also be updated to compare zone-ids on flows. > I see now. You even wrote: +# For now, zone-ids in flows are masked out because, they are not assigned +# consistently. This test is added to prepare for that code-change. What I would probably do is submit them together: the zone-ids fix + this test case that helps test it. If you take out your "xyz" change, will this test fail currently?
> What I would probably do is submit them together: the zone-ids fix + this > test case that helps test it. If you take out your "xyz" change, will this > test fail currently? > > If i take out the change which masks the zone-ids, this test fails on master because, zone-ids dont match before and after restart.
On Mon, Mar 07, 2016 at 04:03:56PM -0500, Ramu Ramamurthy wrote: > Add a test to validate that a restart of the ovn-controller > will program flows in table 0 consistently. Flows before > restart are compared against flows after restart to detect > problems with ofports or zone-ids. > > Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> This adds a test for this invariant for a specific test case, which is useful. However, I believe that this should be a universal invariant for ovn-controller, so that at any time if ovn-controller is restarted gracefully after it has converged, it should preserve these flows. Is that true? If so, then it would be more useful to add such a check to *every* test case for OVN. What do you think? Thanks, Ben.
> This adds a test for this invariant for a specific test case, which is > useful. However, I believe that this should be a universal invariant > for ovn-controller, so that at any time if ovn-controller is restarted > gracefully after it has converged, it should preserve these flows. Is > that true? If so, then it would be more useful to add such a check to > *every* test case for OVN. > > What do you think? > > Thanks, > > Ben. I am finding that, after a graceful exit of ovn-controller using "ovs-appctl -t ovn-controller exit", and then a restart of the ovn-controller, the flows before restart and after restart dont match on table 0 in 1 out of 2 runs (or 1 out of 3 runs). This is because, localnet ports in the test are being deleted and recreated in such failing runs - and therefore, their ofports are mismatching in the flow-table (before and after restart). Localnet ports are being deleted and recreated in the failing runs because, ovn-controller is cleaning up the chassis row during graceful exit, and upon startup there still seems to be a race between patch_run and binding_run when the chassis does not exist, I am still chasing this down. Thats why the test as currently coded is killing the ovn-controller ungracefully (kill -9) and restarting it. As such the restart test and expected invariance of flows seems useful.
On Tue, Mar 22, 2016 at 04:37:16PM -0700, Ramu Ramamurthy wrote: > > This adds a test for this invariant for a specific test case, which is > > useful. However, I believe that this should be a universal invariant > > for ovn-controller, so that at any time if ovn-controller is restarted > > gracefully after it has converged, it should preserve these flows. Is > > that true? If so, then it would be more useful to add such a check to > > *every* test case for OVN. > > > > What do you think? > > > > Thanks, > > > > Ben. > > I am finding that, after a graceful exit of ovn-controller > using "ovs-appctl -t ovn-controller exit", and then a restart of the > ovn-controller, > the flows before restart and after restart dont match on table 0 in 1 out of > 2 runs (or 1 out of 3 runs). > > This is because, localnet ports in the test > are being deleted and recreated in such failing runs - and therefore, their > ofports are mismatching in the flow-table (before and after restart). > > Localnet ports are being deleted and recreated in the failing runs > because, ovn-controller is cleaning up the chassis row during graceful > exit, and upon startup there still seems > to be a race between patch_run and binding_run when the chassis does > not exist, I am still > chasing this down. Should that behavior of ovn-controller on graceful restart be considered a bug, though? If so, then we should probably fix it and add a test (or modify multiple tests, as I suggested above) to make sure that it does not recur. > Thats why the test as currently coded is killing the ovn-controller > ungracefully (kill -9) and restarting > it. As such the restart test and expected invariance of flows seems useful. I guess that kill -9 and restart can be a useful test too.
On Wed, Mar 23, 2016 at 4:23 PM, Ben Pfaff <blp@ovn.org> wrote: > On Tue, Mar 22, 2016 at 04:37:16PM -0700, Ramu Ramamurthy wrote: >> > This adds a test for this invariant for a specific test case, which is >> > useful. However, I believe that this should be a universal invariant >> > for ovn-controller, so that at any time if ovn-controller is restarted >> > gracefully after it has converged, it should preserve these flows. Is >> > that true? If so, then it would be more useful to add such a check to >> > *every* test case for OVN. >> > >> > What do you think? >> > >> > Thanks, >> > >> > Ben. >> >> I am finding that, after a graceful exit of ovn-controller >> using "ovs-appctl -t ovn-controller exit", and then a restart of the >> ovn-controller, >> the flows before restart and after restart dont match on table 0 in 1 out of >> 2 runs (or 1 out of 3 runs). >> >> This is because, localnet ports in the test >> are being deleted and recreated in such failing runs - and therefore, their >> ofports are mismatching in the flow-table (before and after restart). >> >> Localnet ports are being deleted and recreated in the failing runs >> because, ovn-controller is cleaning up the chassis row during graceful >> exit, and upon startup there still seems >> to be a race between patch_run and binding_run when the chassis does >> not exist, I am still >> chasing this down. > > Should that behavior of ovn-controller on graceful restart be considered > a bug, though? If so, then we should probably fix it and add a test (or > modify multiple tests, as I suggested above) to make sure that it does > not recur. > Indeed, its a bug in code, and I will update with patch(es) as needed to fix such restart bugs, along with test modifications. >> Thats why the test as currently coded is killing the ovn-controller >> ungracefully (kill -9) and restarting >> it. As such the restart test and expected invariance of flows seems useful. > > I guess that kill -9 and restart can be a useful test too.
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 2ff56af..68b78ff 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -117,3 +117,130 @@ check_patches # Gracefully terminate ovn-controller ovs-appctl -t ovn-controller exit AT_CLEANUP + +AT_SETUP([ovn-controller - restart]) +AT_KEYWORDS([ovn]) + +# The purpose of this test is to validate flows programmed +# (in table 0) after ovn-controller restart. +# This test confirms that: +# a) ofports remain the same, so, ports were not deleted/readded +# b) zone-ids remain the same on ports +# +# For now, zone-ids in flows are masked out because, they are not assigned +# consistently. This test is added to prepare for that code-change. +# + +ovn_init_db ovn-sb +net_add n1 +sim_add hv +as hv +ovs-vsctl \ + -- add-br br-phys \ + -- add-br br-eth0 \ + -- add-br br-eth1 \ + -- add-br br-eth2 + +ovn_attach n1 br-phys 192.168.0.1 + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1,physnet3:br-eth2]) + +# +# - Create 2 localnet ports +# - Create 4 VIFs +# - Create 5 CIFs +# + +# +# Create a localnet port, along with a vif +# +ovn-sbctl \ + -- --id=@dp101 create Datapath_Binding tunnel_key=101 \ + -- create Port_Binding datapath=@dp101 logical_port=localnet1 tunnel_key=1 \ + type=localnet options:network_name=physnet1 \ + -- create Port_Binding datapath=@dp101 logical_port=localvif1 tunnel_key=2 +ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1 +# +# Create another localnet port, along with a vif +# +ovn-sbctl \ + -- --id=@dp201 create Datapath_Binding tunnel_key=201 \ + -- create Port_Binding datapath=@dp201 logical_port=localnet201 tunnel_key=1 \ + type=localnet options:network_name=physnet2 \ + -- create Port_Binding datapath=@dp201 logical_port=localvif201 tunnel_key=2 + +ovs-vsctl add-port br-int localvif201 -- set Interface localvif201 external_ids:iface-id=localvif201 + +# +# Create a vif port +# +ovn-sbctl \ + -- --id=@dp102 create Datapath_Binding tunnel_key=102 \ + -- create Port_Binding datapath=@dp102 logical_port=localvif2 tunnel_key=3 +ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2 + +# +# Create a vif port and add 5 cifs on it +# +ovn-sbctl \ + -- --id=@dp103 create Datapath_Binding tunnel_key=103 \ + -- create Port_Binding datapath=@dp103 logical_port=localvif3 tunnel_key=4 \ + -- create Port_Binding datapath=@dp103 logical_port=localcif1 tunnel_key=5 parent_port=localvif3 tag=1 \ + -- create Port_Binding datapath=@dp103 logical_port=localcif2 tunnel_key=6 parent_port=localvif3 tag=2 \ + -- create Port_Binding datapath=@dp103 logical_port=localcif3 tunnel_key=7 parent_port=localvif3 tag=3 \ + -- create Port_Binding datapath=@dp103 logical_port=localcif4 tunnel_key=8 parent_port=localvif3 tag=4 \ + -- create Port_Binding datapath=@dp103 logical_port=localcif5 tunnel_key=9 parent_port=localvif3 tag=5 +ovs-vsctl add-port br-int localvif3 -- set Interface localvif3 external_ids:iface-id=localvif3 + +ovs-vsctl show + +# wait for flows in table 0 to get programmed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +# save the table0 flows for a later comparison after restart +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > expout +cat expout + + +#kill the ovn controller ungracefully +kill -9 `cat "$OVS_RUNDIR"/ovn-controller.pid` + +echo "killed the ovn-controller" + +#clean the flows +ovs-ofctl del-flows br-int + +#restart the ovn controller +start_daemon ovn-controller + +#wait for the table 0 flows to get reprogrammed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +# save flows for a comparison +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > afterkill +cat afterkill + +#compare the table 0 flows before and after restart +AT_CHECK([ovs-ofctl diff-flows afterkill expout]) + + +ovs-appctl -t ovn-controller exit +echo "shutdown the ovn-controller gracefully" + +#clean the flows +ovs-ofctl del-flows br-int + +start_daemon ovn-controller +echo "restarted the ovn-controller" + +#wait for the table 0 flows to get reprogrammed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > afterrestart +cat afterrestart + +#compare the table 0 flows before and after restart +AT_CHECK([ovs-ofctl diff-flows afterrestart expout]) + + +AT_CLEANUP
Add a test to validate that a restart of the ovn-controller will program flows in table 0 consistently. Flows before restart are compared against flows after restart to detect problems with ofports or zone-ids. Signed-off-by: Ramu Ramamurthy <ramu.ramamurthy@us.ibm.com> --- tests/ovn-controller.at | 127 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 127 insertions(+)