Message ID | 20220401101640.1139426-1-xsimonar@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,v2] ovn-controller: Delay ovn-installed if sb db is readonly | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
On 4/1/22 12:16, Xavier Simonart wrote: > Before this patch, when all flows for a port have been installed in OVS, > ovn-installed was immediately written to the OVS DB; the notification of > this change to the OVN controller (handled by OVS interface handler) > might cause a full recompute if the SBDB is readonly / still handling > the chassis update in PortBinding. > With this patch, ovn-installed is not written to the OVS DB while the > SBDB is readonly. This reduces the number of (un-necessary) full > recomputes. > Note that this was causing some spurious full recompute in test "ACL > with Port Group conjunction flow efficiency", causing the test to fail > from time to time. The test case was also updated as part of this patch > to check that ovn-installed was always properly set. > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- It seems that delaying the "ovn-installed" causes measurable latency spikes when scale testing with OpenShift/ovn-kubernetes. On a 120 nodes cluster we saw "ovn-installed" being delayed occasionally for up to 500ms. All this time the OVS flows were already properly installed but the SB transaction was still in progress. We might have to figure out a different way to address the issue this patch is fixing. Thanks, Dumitru
On Thu, Apr 14, 2022 at 4:06 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/1/22 12:16, Xavier Simonart wrote: > > Before this patch, when all flows for a port have been installed in OVS, > > ovn-installed was immediately written to the OVS DB; the notification of > > this change to the OVN controller (handled by OVS interface handler) > > might cause a full recompute if the SBDB is readonly / still handling > > the chassis update in PortBinding. > > With this patch, ovn-installed is not written to the OVS DB while the > > SBDB is readonly. This reduces the number of (un-necessary) full > > recomputes. > > Note that this was causing some spurious full recompute in test "ACL > > with Port Group conjunction flow efficiency", causing the test to fail > > from time to time. The test case was also updated as part of this patch > > to check that ovn-installed was always properly set. > > > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > > > --- > > It seems that delaying the "ovn-installed" causes measurable latency > spikes when scale testing with OpenShift/ovn-kubernetes. On a 120 nodes > cluster we saw "ovn-installed" being delayed occasionally for up to > 500ms. All this time the OVS flows were already properly installed but > the SB transaction was still in progress. > > We might have to figure out a different way to address the issue this > patch is fixing. > > Thanks, > Dumitru > Hi Dumitru, Xavier, Lorenzo, I used a different approach to fix the problem. Please take a look: https://patchwork.ozlabs.org/project/ovn/patch/20220503185454.1110829-1-hzhou@ovn.org/ Thanks, Han
diff --git a/controller/binding.c b/controller/binding.c index 4d62b0858..536cbc513 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -688,7 +688,7 @@ local_binding_set_up(struct shash *local_bindings, const char *pb_name, local_binding_find(local_bindings, pb_name); struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding); - if (!ovs_readonly && lbinding && lbinding->iface + if (!ovs_readonly && !sb_readonly && lbinding && lbinding->iface && !smap_get_bool(&lbinding->iface->external_ids, OVN_INSTALLED_EXT_ID, false)) { VLOG_INFO("Setting lport %s ovn-installed in OVS", pb_name); diff --git a/tests/ovn.at b/tests/ovn.at index 0c2fe9f97..b7bf69e18 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -28799,6 +28799,8 @@ ovs-vsctl add-port br-int lsp0-1 -- set interface lsp0-1 external_ids:iface-id=l check ovn-nbctl --wait=hv sync AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 22]) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"']) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-1 external_ids:ovn-installed` = '"true"']) # Save the current lflow_run counter lflow_run=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) @@ -28823,11 +28825,13 @@ AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep 192.168 | wc -l) == ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0 check ovn-nbctl --wait=hv sync AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12]) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"']) # 4. Bind a lsp (lsp9-0) that doesn't belong to pg1, should not see any change. ovs-vsctl add-port br-int lsp9-0 -- set interface lsp9-0 external_ids:iface-id=lsp9-0 check ovn-nbctl --wait=hv sync AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 12]) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp9-0 external_ids:ovn-installed` = '"true"']) # 5. Bind another 2 lsps (lsp1-0 lsp1-1) that belong to pg1 and on a different # LS (ls1), should see conjunction flows doubled (12 x 2 = 24) @@ -28835,6 +28839,8 @@ ovs-vsctl add-port br-int lsp1-0 -- set interface lsp1-0 external_ids:iface-id=l ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=lsp1-1 check ovn-nbctl --wait=hv sync AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24]) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-0 external_ids:ovn-installed` = '"true"']) +OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp1-1 external_ids:ovn-installed` = '"true"']) # 6. Simulate a SB port-group "del and add" notification to ovn-controller in the # same IDL iteration. ovn-controller should still program the same flows. In @@ -28860,6 +28866,7 @@ for i in $(seq 1 10); do # Finally check flow count is the same as before. AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24]) + OVS_WAIT_UNTIL([test `ovs-vsctl get Interface lsp0-0 external_ids:ovn-installed` = '"true"']) done # Make sure all the above was performed with I-P (no recompute)
Before this patch, when all flows for a port have been installed in OVS, ovn-installed was immediately written to the OVS DB; the notification of this change to the OVN controller (handled by OVS interface handler) might cause a full recompute if the SBDB is readonly / still handling the chassis update in PortBinding. With this patch, ovn-installed is not written to the OVS DB while the SBDB is readonly. This reduces the number of (un-necessary) full recomputes. Note that this was causing some spurious full recompute in test "ACL with Port Group conjunction flow efficiency", causing the test to fail from time to time. The test case was also updated as part of this patch to check that ovn-installed was always properly set. Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- v2: Remove debugging leftover, add back empty line and rebase. --- controller/binding.c | 2 +- tests/ovn.at | 7 +++++++ 2 files changed, 8 insertions(+), 1 deletion(-)