diff mbox series

[ovs-dev,v2] ovn-controller: Delay ovn-installed if sb db is readonly

Message ID 20220401101640.1139426-1-xsimonar@redhat.com
State Superseded
Headers show
Series [ovs-dev,v2] ovn-controller: Delay ovn-installed if sb db is readonly | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Xavier Simonart April 1, 2022, 10:16 a.m. UTC
Before this patch, when all flows for a port have been installed in OVS,
ovn-installed was immediately written to the OVS DB; the notification of
this change to the OVN controller (handled by OVS interface handler)
might cause a full recompute if the SBDB is readonly / still handling
the chassis update in PortBinding.
With this patch, ovn-installed is not written to the OVS DB while the
SBDB is readonly.  This reduces the number of (un-necessary) full
recomputes.
Note that this was causing some spurious full recompute in test "ACL
with Port Group conjunction flow efficiency", causing the test to fail
from time to time. The test case was also updated as part of this patch
to check that ovn-installed was always properly set.

Signed-off-by: Xavier Simonart <xsimonar@redhat.com>

---
v2: Remove debugging leftover, add back empty line and rebase.
---
 controller/binding.c | 2 +-
 tests/ovn.at         | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Dumitru Ceara April 14, 2022, 11:06 a.m. UTC | #1
On 4/1/22 12:16, Xavier Simonart wrote:
> Before this patch, when all flows for a port have been installed in OVS,
> ovn-installed was immediately written to the OVS DB; the notification of
> this change to the OVN controller (handled by OVS interface handler)
> might cause a full recompute if the SBDB is readonly / still handling
> the chassis update in PortBinding.
> With this patch, ovn-installed is not written to the OVS DB while the
> SBDB is readonly.  This reduces the number of (un-necessary) full
> recomputes.
> Note that this was causing some spurious full recompute in test "ACL
> with Port Group conjunction flow efficiency", causing the test to fail
> from time to time. The test case was also updated as part of this patch
> to check that ovn-installed was always properly set.
> 
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> 
> ---

It seems that delaying the "ovn-installed" causes measurable latency
spikes when scale testing with OpenShift/ovn-kubernetes.  On a 120 nodes
cluster we saw "ovn-installed" being delayed occasionally for up to
500ms.  All this time the OVS flows were already properly installed but
the SB transaction was still in progress.

We might have to figure out a different way to address the issue this
patch is fixing.

Thanks,
Dumitru
Han Zhou May 3, 2022, 6:58 p.m. UTC | #2
On Thu, Apr 14, 2022 at 4:06 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 4/1/22 12:16, Xavier Simonart wrote:
> > Before this patch, when all flows for a port have been installed in OVS,
> > ovn-installed was immediately written to the OVS DB; the notification of
> > this change to the OVN controller (handled by OVS interface handler)
> > might cause a full recompute if the SBDB is readonly / still handling
> > the chassis update in PortBinding.
> > With this patch, ovn-installed is not written to the OVS DB while the
> > SBDB is readonly.  This reduces the number of (un-necessary) full
> > recomputes.
> > Note that this was causing some spurious full recompute in test "ACL
> > with Port Group conjunction flow efficiency", causing the test to fail
> > from time to time. The test case was also updated as part of this patch
> > to check that ovn-installed was always properly set.
> >
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >
> > ---
>
> It seems that delaying the "ovn-installed" causes measurable latency
> spikes when scale testing with OpenShift/ovn-kubernetes.  On a 120 nodes
> cluster we saw "ovn-installed" being delayed occasionally for up to
> 500ms.  All this time the OVS flows were already properly installed but
> the SB transaction was still in progress.
>
> We might have to figure out a different way to address the issue this
> patch is fixing.
>
> Thanks,
> Dumitru
>
Hi Dumitru, Xavier, Lorenzo,

I used a different approach to fix the problem. Please take a look:
https://patchwork.ozlabs.org/project/ovn/patch/20220503185454.1110829-1-hzhou@ovn.org/

Thanks,
Han
diff mbox series

Patch

diff --git a/controller/binding.c b/controller/binding.c
index 4d62b0858..536cbc513 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -688,7 +688,7 @@  local_binding_set_up(struct shash *local_bindings, const char *pb_name,
         local_binding_find(local_bindings, pb_name);
     struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
 
-    if (!ovs_readonly && lbinding && lbinding->iface
+    if (!ovs_readonly && !sb_readonly && lbinding && lbinding->iface
             && !smap_get_bool(&lbinding->iface->external_ids,
                               OVN_INSTALLED_EXT_ID, false)) {
         VLOG_INFO("Setting lport %s ovn-installed in OVS", pb_name);
diff --git a/tests/ovn.at b/tests/ovn.at
index 0c2fe9f97..b7bf69e18 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -28799,6 +28799,8 @@  ovs-vsctl add-port br-int lsp0-1 -- set interface lsp0-1 external_ids:iface-id=l
 
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 22])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-1 external_ids:ovn-installed` = '"true"'])
 
 # Save the current lflow_run counter
 lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
@@ -28823,11 +28825,13 @@  AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep 192.168 | wc -l) ==
 ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
 
 # 4. Bind a lsp (lsp9-0) that doesn't belong to pg1, should not see any change.
 ovs-vsctl add-port br-int lsp9-0 -- set interface lsp9-0 external_ids:iface-id=lsp9-0
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp9-0 external_ids:ovn-installed` = '"true"'])
 
 # 5. Bind another 2 lsps (lsp1-0 lsp1-1) that belong to pg1 and on a different
 #    LS (ls1), should see conjunction flows doubled (12 x 2 = 24)
@@ -28835,6 +28839,8 @@  ovs-vsctl add-port br-int lsp1-0 -- set interface lsp1-0 external_ids:iface-id=l
 ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=lsp1-1
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-1 external_ids:ovn-installed` = '"true"'])
 
 # 6. Simulate a SB port-group "del and add" notification to ovn-controller in the
 #    same IDL iteration. ovn-controller should still program the same flows. In
@@ -28860,6 +28866,7 @@  for i in $(seq 1 10); do
 
     # Finally check flow count is the same as before.
     AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
+    OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"'])
 done
 
 # Make sure all the above was performed with I-P (no recompute)