diff mbox series

[ovs-dev] ovn-controller: Fix port-group incremental processing.

Message ID 20210602070731.3736171-1-hzhou@ovn.org
State New
Headers show
Series [ovs-dev] ovn-controller: Fix port-group incremental processing. | expand

Commit Message

Han Zhou June 2, 2021, 7:07 a.m. UTC
When SB port-group (per DP) is deleted and added back and the
notification to ovn-controller include both operations in a reversed
order, the current change handler in ovn-controller would end up
deleting the SB port-group in the local cache, which in turn result in
missing flows, and even worse it can result in crash if runtime data
handler is triggered later because we asserts the SB PG should be
equivalent to the local cache.

This patch fixes it by always handling deletion for tracked PG changes,
just like how we are handling other tracked changes in I-P (e.g.
logical_flow). A test case is crafted to cover such scenario.

Fixes: 0cfeba6b55 ("ovn-controller: Fix port group conjunction flow explosion problem.")
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 controller/ovn-controller.c |  6 +++++-
 tests/ovn.at                | 28 +++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

Comments

Mark Michelson June 10, 2021, 7:11 p.m. UTC | #1
Looks good to me, thanks Han!

Acked-by: Mark Michelson <mmichels@redhat.com>

On 6/2/21 3:07 AM, Han Zhou wrote:
> When SB port-group (per DP) is deleted and added back and the
> notification to ovn-controller include both operations in a reversed
> order, the current change handler in ovn-controller would end up
> deleting the SB port-group in the local cache, which in turn result in
> missing flows, and even worse it can result in crash if runtime data
> handler is triggered later because we asserts the SB PG should be
> equivalent to the local cache.
> 
> This patch fixes it by always handling deletion for tracked PG changes,
> just like how we are handling other tracked changes in I-P (e.g.
> logical_flow). A test case is crafted to cover such scenario.
> 
> Fixes: 0cfeba6b55 ("ovn-controller: Fix port group conjunction flow explosion problem.")
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---
>   controller/ovn-controller.c |  6 +++++-
>   tests/ovn.at                | 28 +++++++++++++++++++++++++++-
>   2 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index d48ddc7a2..07c6fcfd1 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1556,7 +1556,11 @@ port_groups_update(const struct sbrec_port_group_table *port_group_table,
>               expr_const_sets_remove(port_groups_cs_local, pg->name);
>               port_group_ssets_delete(port_group_ssets, pg->name);
>               sset_add(deleted, pg->name);
> -        } else {
> +        }
> +    }
> +
> +    SBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (pg, port_group_table) {
> +        if (!sbrec_port_group_is_deleted(pg)) {
>               port_group_ssets_add_or_update(port_group_ssets, pg);
>               expr_const_sets_add_strings(port_groups_cs_local, pg->name,
>                                           (const char *const *) pg->ports,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index a21dbadac..7be494644 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -26562,7 +26562,7 @@ AT_CLEANUP
>   ])
>   
>   # Tests the efficiency of conjunction flow generation when port groups are used
> -# by ACLs. Make sure there is no open flow explosion in table 45 for an ACL
> +# by ACLs. Make sure there is no open flow explosion in table 44 for an ACL
>   # with self-referencing match condition of a port group:
>   #
>   # "outport == @pg1 && ip4.src == $pg1_ip4"
> @@ -26648,6 +26648,32 @@ ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=l
>   check ovn-nbctl --wait=hv sync
>   AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
>   
> +# 6. Simulate a SB port-group "del and add" notification to ovn-controller in the
> +#    same IDL iteration. ovn-controller should still program the same flows. In
> +#    reality it can happen when PG memembership change triggers a SB port-group
> +#    deletion and creation with the same SB port-group name, while the
> +#    notification of the changes can come to ovn-controller in one shot and the
> +#    order of the "del" and "add" in the notification is undefined. This test
> +#    runs the scenario ten times to make sure the unpleasant order is tested.
> +
> +for i in $(seq 1 10); do
> +    # Delete and recreate the SB PG with same name and content.
> +    sb_pg_name=2_pg1 # dp key + NB pg name
> +    sb_pg_uuid=$(fetch_column port_group _uuid name=$sb_pg_name)
> +    ports_=$(fetch_column port_group ports name=2_pg1)
> +    ports=${ports_/ /,}
> +    AT_CHECK([ovn-sbctl destroy port_group $sb_pg_uuid -- create port_group name=$sb_pg_name ports=$ports], [0], [ignore])
> +
> +    # Unbind and bind lsp0-0 to trigger runtime data change as well.
> +    ovs-vsctl del-port br-int lsp0-0
> +    check ovn-nbctl --wait=hv sync
> +    ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0
> +    check ovn-nbctl --wait=hv sync
> +
> +    # Finally check flow count is the same as before.
> +    AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
> +done
> +
>   # Make sure all the above was performed with I-P (no recompute)
>   AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run])
>   
>
Han Zhou June 10, 2021, 11:52 p.m. UTC | #2
On Thu, Jun 10, 2021 at 12:12 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Looks good to me, thanks Han!
>
> Acked-by: Mark Michelson <mmichels@redhat.com>
>

Thanks Mark! I applied it to master, branch-21.06 and 21.03.

> On 6/2/21 3:07 AM, Han Zhou wrote:
> > When SB port-group (per DP) is deleted and added back and the
> > notification to ovn-controller include both operations in a reversed
> > order, the current change handler in ovn-controller would end up
> > deleting the SB port-group in the local cache, which in turn result in
> > missing flows, and even worse it can result in crash if runtime data
> > handler is triggered later because we asserts the SB PG should be
> > equivalent to the local cache.
> >
> > This patch fixes it by always handling deletion for tracked PG changes,
> > just like how we are handling other tracked changes in I-P (e.g.
> > logical_flow). A test case is crafted to cover such scenario.
> >
> > Fixes: 0cfeba6b55 ("ovn-controller: Fix port group conjunction flow
explosion problem.")
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
> >   controller/ovn-controller.c |  6 +++++-
> >   tests/ovn.at                | 28 +++++++++++++++++++++++++++-
> >   2 files changed, 32 insertions(+), 2 deletions(-)
> >
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index d48ddc7a2..07c6fcfd1 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -1556,7 +1556,11 @@ port_groups_update(const struct
sbrec_port_group_table *port_group_table,
> >               expr_const_sets_remove(port_groups_cs_local, pg->name);
> >               port_group_ssets_delete(port_group_ssets, pg->name);
> >               sset_add(deleted, pg->name);
> > -        } else {
> > +        }
> > +    }
> > +
> > +    SBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (pg, port_group_table) {
> > +        if (!sbrec_port_group_is_deleted(pg)) {
> >               port_group_ssets_add_or_update(port_group_ssets, pg);
> >               expr_const_sets_add_strings(port_groups_cs_local,
pg->name,
> >                                           (const char *const *)
pg->ports,
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index a21dbadac..7be494644 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -26562,7 +26562,7 @@ AT_CLEANUP
> >   ])
> >
> >   # Tests the efficiency of conjunction flow generation when port
groups are used
> > -# by ACLs. Make sure there is no open flow explosion in table 45 for
an ACL
> > +# by ACLs. Make sure there is no open flow explosion in table 44 for
an ACL
> >   # with self-referencing match condition of a port group:
> >   #
> >   # "outport == @pg1 && ip4.src == $pg1_ip4"
> > @@ -26648,6 +26648,32 @@ ovs-vsctl add-port br-int lsp1-1 -- set
interface lsp1-1 external_ids:iface-id=l
> >   check ovn-nbctl --wait=hv sync
> >   AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep
conjunction | wc -l) == 24])
> >
> > +# 6. Simulate a SB port-group "del and add" notification to
ovn-controller in the
> > +#    same IDL iteration. ovn-controller should still program the same
flows. In
> > +#    reality it can happen when PG memembership change triggers a SB
port-group
> > +#    deletion and creation with the same SB port-group name, while the
> > +#    notification of the changes can come to ovn-controller in one
shot and the
> > +#    order of the "del" and "add" in the notification is undefined.
This test
> > +#    runs the scenario ten times to make sure the unpleasant order is
tested.
> > +
> > +for i in $(seq 1 10); do
> > +    # Delete and recreate the SB PG with same name and content.
> > +    sb_pg_name=2_pg1 # dp key + NB pg name
> > +    sb_pg_uuid=$(fetch_column port_group _uuid name=$sb_pg_name)
> > +    ports_=$(fetch_column port_group ports name=2_pg1)
> > +    ports=${ports_/ /,}
> > +    AT_CHECK([ovn-sbctl destroy port_group $sb_pg_uuid -- create
port_group name=$sb_pg_name ports=$ports], [0], [ignore])
> > +
> > +    # Unbind and bind lsp0-0 to trigger runtime data change as well.
> > +    ovs-vsctl del-port br-int lsp0-0
> > +    check ovn-nbctl --wait=hv sync
> > +    ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0
external_ids:iface-id=lsp0-0
> > +    check ovn-nbctl --wait=hv sync
> > +
> > +    # Finally check flow count is the same as before.
> > +    AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep
conjunction | wc -l) == 24])
> > +done
> > +
> >   # Make sure all the above was performed with I-P (no recompute)
> >   AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter
lflow_run) == $lflow_run])
> >
> >
>
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index d48ddc7a2..07c6fcfd1 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1556,7 +1556,11 @@  port_groups_update(const struct sbrec_port_group_table *port_group_table,
             expr_const_sets_remove(port_groups_cs_local, pg->name);
             port_group_ssets_delete(port_group_ssets, pg->name);
             sset_add(deleted, pg->name);
-        } else {
+        }
+    }
+
+    SBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (pg, port_group_table) {
+        if (!sbrec_port_group_is_deleted(pg)) {
             port_group_ssets_add_or_update(port_group_ssets, pg);
             expr_const_sets_add_strings(port_groups_cs_local, pg->name,
                                         (const char *const *) pg->ports,
diff --git a/tests/ovn.at b/tests/ovn.at
index a21dbadac..7be494644 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26562,7 +26562,7 @@  AT_CLEANUP
 ])
 
 # Tests the efficiency of conjunction flow generation when port groups are used
-# by ACLs. Make sure there is no open flow explosion in table 45 for an ACL
+# by ACLs. Make sure there is no open flow explosion in table 44 for an ACL
 # with self-referencing match condition of a port group:
 #
 # "outport == @pg1 && ip4.src == $pg1_ip4"
@@ -26648,6 +26648,32 @@  ovs-vsctl add-port br-int lsp1-1 -- set interface lsp1-1 external_ids:iface-id=l
 check ovn-nbctl --wait=hv sync
 AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
 
+# 6. Simulate a SB port-group "del and add" notification to ovn-controller in the
+#    same IDL iteration. ovn-controller should still program the same flows. In
+#    reality it can happen when PG memembership change triggers a SB port-group
+#    deletion and creation with the same SB port-group name, while the
+#    notification of the changes can come to ovn-controller in one shot and the
+#    order of the "del" and "add" in the notification is undefined. This test
+#    runs the scenario ten times to make sure the unpleasant order is tested.
+
+for i in $(seq 1 10); do
+    # Delete and recreate the SB PG with same name and content.
+    sb_pg_name=2_pg1 # dp key + NB pg name
+    sb_pg_uuid=$(fetch_column port_group _uuid name=$sb_pg_name)
+    ports_=$(fetch_column port_group ports name=2_pg1)
+    ports=${ports_/ /,}
+    AT_CHECK([ovn-sbctl destroy port_group $sb_pg_uuid -- create port_group name=$sb_pg_name ports=$ports], [0], [ignore])
+
+    # Unbind and bind lsp0-0 to trigger runtime data change as well.
+    ovs-vsctl del-port br-int lsp0-0
+    check ovn-nbctl --wait=hv sync
+    ovs-vsctl add-port br-int lsp0-0 -- set interface lsp0-0 external_ids:iface-id=lsp0-0
+    check ovn-nbctl --wait=hv sync
+
+    # Finally check flow count is the same as before.
+    AT_CHECK([test $(ovs-ofctl dump-flows br-int table=44 | grep conjunction | wc -l) == 24])
+done
+
 # Make sure all the above was performed with I-P (no recompute)
 AT_CHECK([test $(ovn-appctl -t ovn-controller coverage/read-counter lflow_run) == $lflow_run])