diff mbox series

[ovs-dev] ovn-controller: Delay ovn-installed if sb db is readonly

Message ID 20220208121536.3802003-1-xsimonar@redhat.com
State Superseded
Headers show
Series [ovs-dev] ovn-controller: Delay ovn-installed if sb db is readonly | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Xavier Simonart Feb. 8, 2022, 12:15 p.m. UTC
Before this patch, when all flows for a port have been installed in OVS,
ovn-installed was immediately written to the OVS DB; the notification of
this change to the OVN controller (handled by OVS interface handler)
might cause a full recompute if the SBDB is readonly / still handling
the chassis update in PortBinding.
With this patch, ovn-installed is not written to the OVS DB while the
SBDB is readonly.  This reduces the number of (un-necessary) full
recomputes.
Note that this was causing some spurious full recompute in test "ACL
with Port Group conjunction flow efficiency", causing the test to fail
from time to time. The test case was also updated as part of this patch
to check that ovn-installed was always properly set.

Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
---
 controller/binding.c |  2 +-
 tests/ovn.at         | 10 +++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Dumitru Ceara Feb. 23, 2022, 2:05 p.m. UTC | #1
On 2/8/22 13:15, Xavier Simonart wrote:
> Before this patch, when all flows for a port have been installed in OVS,
> ovn-installed was immediately written to the OVS DB; the notification of
> this change to the OVN controller (handled by OVS interface handler)
> might cause a full recompute if the SBDB is readonly / still handling
> the chassis update in PortBinding.
> With this patch, ovn-installed is not written to the OVS DB while the
> SBDB is readonly.  This reduces the number of (un-necessary) full
> recomputes.
> Note that this was causing some spurious full recompute in test "ACL
> with Port Group conjunction flow efficiency", causing the test to fail
> from time to time. The test case was also updated as part of this patch
> to check that ovn-installed was always properly set.
> 
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> ---

Hi Xavier,

>  controller/binding.c |  2 +-
>  tests/ovn.at         | 10 +++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 4d62b0858..536cbc513 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -688,7 +688,7 @@ local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>          local_binding_find(local_bindings, pb_name);
>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>  
> -    if (!ovs_readonly && lbinding && lbinding->iface
> +    if (!ovs_readonly && !sb_readonly && lbinding && lbinding->iface
>              && !smap_get_bool(&lbinding->iface->external_ids,
>                                OVN_INSTALLED_EXT_ID, false)) {

This makes sense but we're adding a dependency on the SB.  In some
cases, e.g., at scale when the SB ovn-controller is connected to is
snapshotting, it can take a while (a few seconds) until a locally
initiated transaction commits and ovn-controller is notified.

I'm guessing those are corner cases, but it would be nice if we could
test this change at scale with a CMS that uses the "ovn-installed" OVS
external-id.

AFAIK the only CMS doing that is ovn-kubernetes.  Pending some scale
test results with OpenShift/Kubernetes with ovn-k8s, and with a few
minor comments below:

Acked-by: Dumitru Ceara <dceara@redhat.com>

>          VLOG_INFO("Setting lport %s ovn-installed in OVS", pb_name);
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 957eb7850..1a009561b 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -28325,6 +28325,8 @@ sim_add hv1
>  as hv1
>  ovs-vsctl add-br br-phys
>  ovn_attach n1 br-phys 192.168.0.1
> +#ovn-appctl -t ovn-controller vlog/set dbg
> +ovn-appctl -t ovn-sb vlog/set dbg
>  

I'm guessing these were debug leftovers.

>  ovn-nbctl lr-add lr0
>  
> @@ -28353,6 +28355,8 @@ ovs-vsctl add-port br-int lsp0-1 -- set interface lsp0-1 external_ids:iface-id=l
>  
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 22])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-1 external_ids:ovn-installed` = '"true"'])
>  
>  # Save the current lflow_run counter
>  lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> @@ -28377,11 +28381,13 @@ AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep 192.168 | wc -l) ==
>  ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
>  
>  # 4. Bind a lsp (lsp9-0) that doesn't belong to pg1, should not see any change.
>  ovs-vsctl add-port br-int lsp9-0 -- set interface lsp9-0 external_ids:iface-id=lsp9-0
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp9-0 external_ids:ovn-installed` = '"true"'])
>  
>  # 5. Bind another 2 lsps (lsp1-0 lsp1-1) that belong to pg1 and on a different
>  #    LS (ls1), should see conjunction flows doubled (12 x 2 = 24)
> @@ -28389,6 +28395,8 @@ ovs-vsctl add-port br-int lsp1-0 -- set interface lsp1-0 external_ids:iface-id=l
>  ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=lsp1-1
>  check ovn-nbctl --wait=hv sync
>  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-0 external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-1 external_ids:ovn-installed` = '"true"'])
>  
>  # 6. Simulate a SB port-group "del and add" notification to ovn-controller in the
>  #    same IDL iteration. ovn-controller should still program the same flows. In
> @@ -28414,11 +28422,11 @@ for i in $(seq 1 10); do
>  
>      # Finally check flow count is the same as before.
>      AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
> +    OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
>  done
>  
>  # Make sure all the above was performed with I-P (no recompute)
>  AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run])
> -

Accidental?  I would've missed it but it actually generates a conflict
when applying on top of current main branch.

>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])

Thanks,
Dumitru
diff mbox series

Patch

diff --git a/controller/binding.c b/controller/binding.c
index 4d62b0858..536cbc513 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -688,7 +688,7 @@  local_binding_set_up(struct shash *local_bindings, const char *pb_name,
         local_binding_find(local_bindings, pb_name);
     struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
 
-    if (!ovs_readonly && lbinding && lbinding->iface
+    if (!ovs_readonly && !sb_readonly && lbinding && lbinding->iface
             && !smap_get_bool(&lbinding->iface->external_ids,
                               OVN_INSTALLED_EXT_ID, false)) {
         VLOG_INFO("Setting lport %s ovn-installed in OVS", pb_name);
diff --git a/tests/ovn.at b/tests/ovn.at
index 957eb7850..1a009561b 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -28325,6 +28325,8 @@  sim_add hv1
 as hv1
 ovs-vsctl add-br br-phys
 ovn_attach n1 br-phys 192.168.0.1
+#ovn-appctl -t ovn-controller vlog/set dbg
+ovn-appctl -t ovn-sb vlog/set dbg
 
 ovn-nbctl lr-add lr0
 
@@ -28353,6 +28355,8 @@  ovs-vsctl add-port br-int lsp0-1 -- set interface lsp0-1 external_ids:iface-id=l
 
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 22])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-1 external_ids:ovn-installed` = '"true"'])
 
 # Save the current lflow_run counter
 lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
@@ -28377,11 +28381,13 @@  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep 192.168 | wc -l) ==
 ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
 
 # 4. Bind a lsp (lsp9-0) that doesn't belong to pg1, should not see any change.
 ovs-vsctl add-port br-int lsp9-0 -- set interface lsp9-0 external_ids:iface-id=lsp9-0
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp9-0 external_ids:ovn-installed` = '"true"'])
 
 # 5. Bind another 2 lsps (lsp1-0 lsp1-1) that belong to pg1 and on a different
 #    LS (ls1), should see conjunction flows doubled (12 x 2 = 24)
@@ -28389,6 +28395,8 @@  ovs-vsctl add-port br-int lsp1-0 -- set interface lsp1-0 external_ids:iface-id=l
 ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=lsp1-1
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-1 external_ids:ovn-installed` = '"true"'])
 
 # 6. Simulate a SB port-group "del and add" notification to ovn-controller in the
 #    same IDL iteration. ovn-controller should still program the same flows. In
@@ -28414,11 +28422,11 @@  for i in $(seq 1 10); do
 
     # Finally check flow count is the same as before.
     AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
+    OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
 done
 
 # Make sure all the above was performed with I-P (no recompute)
 AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run])
-
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])