diff mbox series

[ovs-dev] controller: fix group_table and meter_table allocation

Message ID 20231129154521.313487-1-xsimonar@redhat.com
State Superseded
Delegated to: Dumitru Ceara
Headers show
Series [ovs-dev] controller: fix group_table and meter_table allocation | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Xavier Simonart Nov. 29, 2023, 3:45 p.m. UTC
The group_table and meter_table are initialized in ovn-controller, with n_ids = 0.
Then they are re-initialized in ofctrl, with proper number of n_ids, in state S_CLEAR_FLOWS.
However, nothing prevented to start adding flows (by adding logical ports) using groups
before ofctrl reaches this state. This was causing some wrong flows (missing group in the flow).

With this patch, as soon as the feature rconn is available, i.e. before adding flows, those table
are properly re-initialized.

This issue is usually not visible in main and recent branches ci, since
"Wait for new ovn-controllers to connect to Southbound." as this was slowing down the moment when
the test started to add ports.
This was causing the following test to fail:
"ECMP static routes -- ovn-northd -- parallelization=yes -- ovn_monitor_all=yes".

Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.")
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
---
 controller/ovn-controller.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Dumitru Ceara Nov. 29, 2023, 10:23 p.m. UTC | #1
On 11/29/23 16:45, Xavier Simonart wrote:
> The group_table and meter_table are initialized in ovn-controller, with n_ids = 0.
> Then they are re-initialized in ofctrl, with proper number of n_ids, in state S_CLEAR_FLOWS.
> However, nothing prevented to start adding flows (by adding logical ports) using groups
> before ofctrl reaches this state. This was causing some wrong flows (missing group in the flow).
> 
> With this patch, as soon as the feature rconn is available, i.e. before adding flows, those table
> are properly re-initialized.
> 
> This issue is usually not visible in main and recent branches ci, since
> "Wait for new ovn-controllers to connect to Southbound." as this was slowing down the moment when
> the test started to add ports.
> This was causing the following test to fail:
> "ECMP static routes -- ovn-northd -- parallelization=yes -- ovn_monitor_all=yes".
> 
> Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.")
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> ---

Thanks for the patch, Xavier!  And sorry for introducing the issue in
the first place.

>  controller/ovn-controller.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 44605eb4e..98d95b1f3 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -5654,6 +5654,14 @@ main(int argc, char *argv[])
>                                             br_int ? br_int->name : NULL)) {
>                  VLOG_INFO("OVS feature set changed, force recompute.");
>                  engine_set_force_recompute(true);
> +                if (ovs_feature_set_discovered()) {
> +                    lflow_output_data =
> +                        engine_get_internal_data(&en_lflow_output);
> +                    ovn_extend_table_reinit(&lflow_output_data->group_table,
> +                                        ovs_feature_max_select_groups_get());
> +                    ovn_extend_table_reinit(&lflow_output_data->meter_table,
> +                                            ovs_feature_max_meters_get());

I don't think this is correct.  What if the supported number of
groups/meters actually changed, from X to Y, with X != 0?  In that case
ovs_feature_support_run() would also return true and we'll reinitialize
the "extend_tables" (call ovn_extend_table_reinit()).  That currently
just recreates the id_pool but doesn't touch the
"ovn_extend_table->existing" map.

Doesn't that mean that we'd potentially end up in an inconsistent state?

Funny thing, in the v3 of the original patch that introduced the problem
I was calling ovn_extend_table_clear() from ovn_extend_table_reinit()
but we decided to remove it as it was already done by the caller. :)

https://patchwork.ozlabs.org/project/ovn/patch/20231026113713.1718954-1-dceara@redhat.com/#3209437

Thinking more about it, I wonder, isn't it enough to change the way this
works and just call ovn_extend_table_reinit() here instead of in
run_S_CLEAR_FLOWS()?  We should also clear the "existing" map inside
ovn_extend_table_reinit() if we're changing the size of the id_pool.

Wpuldn't that properly cover all scenarios?

> +                }
>              }
>  
>              if (br_int && ovs_feature_set_discovered()) {

Regards,
Dumitru
Xavier Simonart Nov. 30, 2023, 4:44 p.m. UTC | #2
Hi Dumitru

Thanks for the review.

On Wed, Nov 29, 2023 at 11:23 PM Dumitru Ceara <dceara@redhat.com> wrote:

> On 11/29/23 16:45, Xavier Simonart wrote:
> > The group_table and meter_table are initialized in ovn-controller, with
> n_ids = 0.
> > Then they are re-initialized in ofctrl, with proper number of n_ids, in
> state S_CLEAR_FLOWS.
> > However, nothing prevented to start adding flows (by adding logical
> ports) using groups
> > before ofctrl reaches this state. This was causing some wrong flows
> (missing group in the flow).
> >
> > With this patch, as soon as the feature rconn is available, i.e. before
> adding flows, those table
> > are properly re-initialized.
> >
> > This issue is usually not visible in main and recent branches ci, since
> > "Wait for new ovn-controllers to connect to Southbound." as this was
> slowing down the moment when
> > the test started to add ports.
> > This was causing the following test to fail:
> > "ECMP static routes -- ovn-northd -- parallelization=yes --
> ovn_monitor_all=yes".
> >
> > Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and
> meter IDs to 16bit.")
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> > ---
>
> Thanks for the patch, Xavier!  And sorry for introducing the issue in
> the first place.
>
> >  controller/ovn-controller.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 44605eb4e..98d95b1f3 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -5654,6 +5654,14 @@ main(int argc, char *argv[])
> >                                             br_int ? br_int->name :
> NULL)) {
> >                  VLOG_INFO("OVS feature set changed, force recompute.");
> >                  engine_set_force_recompute(true);
> > +                if (ovs_feature_set_discovered()) {
> > +                    lflow_output_data =
> > +                        engine_get_internal_data(&en_lflow_output);
> > +
> ovn_extend_table_reinit(&lflow_output_data->group_table,
> > +
> ovs_feature_max_select_groups_get());
> > +
> ovn_extend_table_reinit(&lflow_output_data->meter_table,
> > +
> ovs_feature_max_meters_get());
>
> I don't think this is correct.  What if the supported number of
> groups/meters actually changed, from X to Y, with X != 0?  In that case
> ovs_feature_support_run() would also return true and we'll reinitialize
> the "extend_tables" (call ovn_extend_table_reinit()).  That currently
> just recreates the id_pool but doesn't touch the
> "ovn_extend_table->existing" map.
>
> Doesn't that mean that we'd potentially end up in an inconsistent state?
>
> Funny thing, in the v3 of the original patch that introduced the problem
> I was calling ovn_extend_table_clear() from ovn_extend_table_reinit()
> but we decided to remove it as it was already done by the caller. :)
>
>
> https://patchwork.ozlabs.org/project/ovn/patch/20231026113713.1718954-1-dceara@redhat.com/#3209437

I agree  that we should call ovn_extend_table_clear() before (or within)
the ovn_extend_table_reinit()
As a side note, if the number of group decreased (which seems quite
theoretical), some flows might become inconsistent, but
as the feature set changed, we are recomputing, so it should be fine.
I'll send a v2

>
>
> Thinking more about it, I wonder, isn't it enough to change the way this
> works and just call ovn_extend_table_reinit() here instead of in
> run_S_CLEAR_FLOWS()?  We should also clear the "existing" map inside
> ovn_extend_table_reinit() if we're changing the size of the id_pool.
>
> Wpuldn't that properly cover all scenarios?
>
I think so too.

>
> > +                }
> >              }
> >
> >              if (br_int && ovs_feature_set_discovered()) {
>
> Regards,
> Dumitru
>
Thanks
Xavier
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 44605eb4e..98d95b1f3 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5654,6 +5654,14 @@  main(int argc, char *argv[])
                                            br_int ? br_int->name : NULL)) {
                 VLOG_INFO("OVS feature set changed, force recompute.");
                 engine_set_force_recompute(true);
+                if (ovs_feature_set_discovered()) {
+                    lflow_output_data =
+                        engine_get_internal_data(&en_lflow_output);
+                    ovn_extend_table_reinit(&lflow_output_data->group_table,
+                                        ovs_feature_max_select_groups_get());
+                    ovn_extend_table_reinit(&lflow_output_data->meter_table,
+                                            ovs_feature_max_meters_get());
+                }
             }
 
             if (br_int && ovs_feature_set_discovered()) {