diff mbox series

[ovs-dev] ovn-controller: Fix some issues with CT zone assignment.

Message ID 20221021183759.4192249-1-mmichels@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] ovn-controller: Fix some issues with CT zone assignment. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Mark Michelson Oct. 21, 2022, 6:37 p.m. UTC
An issue was filed where CT zone 0 was assigned to both a logical router
SNAT and to a logical port. CT zone 0 is typically "reserved" and not
assigned by ovn-controller; however, since SNAT zones are configurable,
it is possible for ovn-controller to assign this zone at the CMS's
request. This accounts for how CT zone 0 can be assigned for SNAT. There
was also a small bug in the incremental processing that could result in
a logical port being assigned zone 0.

In the specific issue report, CT zones were restored from the OVSDB when
ovn-controller started, and the conflicting CT zones were already
present. ovn-controller dutifully loaded these zones up. But then there
was nothing that would allow for the conflict to be resolved afterwards.

It is unknown how these conflicts entered into the OVSDB in the first
place. This change does not purport to prevent conflicts from entering
the OVSDB. However, it does make the following changes that should
further safeguard against unwanted behavior:

* ct_zones_runtime_data_handler() now assigns zones starting at
  1 instead of 0. This makes it use the same range as update_ct_zones().
* update_ct_zones() now keeps a new simap for zones that are assigned
  but not due to an SNAT zone request. This allows for us to guarantee
  that when there is a conflict between a previously auto-assigned CT
  zone and a newly-requested zone, we are guaranteed to remove the
  auto-assigned CT zone.
* The removal of conflicting auto-assigned CT zones is now performed
  before dealing with the newly requested zone. This makes it so that if
  we load conflicting zones from the OVSDB or if there is some issue
  that results in conflicting zones being assigned, we will correct the
  issue.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
Signed-off-by: Mark Michelson <mmichels@redhat.com>
---
 controller/ovn-controller.c | 53 ++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

Comments

Ales Musil Oct. 24, 2022, 6:53 a.m. UTC | #1
Hi Mark,

please see my comment below.


On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <mmichels@redhat.com> wrote:

> An issue was filed where CT zone 0 was assigned to both a logical router
> SNAT and to a logical port. CT zone 0 is typically "reserved" and not
> assigned by ovn-controller; however, since SNAT zones are configurable,
> it is possible for ovn-controller to assign this zone at the CMS's
> request. This accounts for how CT zone 0 can be assigned for SNAT. There
> was also a small bug in the incremental processing that could result in
> a logical port being assigned zone 0.
>
> In the specific issue report, CT zones were restored from the OVSDB when
> ovn-controller started, and the conflicting CT zones were already
> present. ovn-controller dutifully loaded these zones up. But then there
> was nothing that would allow for the conflict to be resolved afterwards.
>
> It is unknown how these conflicts entered into the OVSDB in the first
> place. This change does not purport to prevent conflicts from entering
> the OVSDB. However, it does make the following changes that should
> further safeguard against unwanted behavior:
>
> * ct_zones_runtime_data_handler() now assigns zones starting at
>   1 instead of 0. This makes it use the same range as update_ct_zones().
> * update_ct_zones() now keeps a new simap for zones that are assigned
>   but not due to an SNAT zone request. This allows for us to guarantee
>   that when there is a conflict between a previously auto-assigned CT
>   zone and a newly-requested zone, we are guaranteed to remove the
>   auto-assigned CT zone.
> * The removal of conflicting auto-assigned CT zones is now performed
>   before dealing with the newly requested zone. This makes it so that if
>   we load conflicting zones from the OVSDB or if there is some issue
>   that results in conflicting zones being assigned, we will correct the
>   issue.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> ---
>  controller/ovn-controller.c | 53 ++++++++++++++++++-------------------
>  1 file changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 8895c7a2b..5fd9bbc40 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -659,7 +659,8 @@ update_ct_zones(const struct shash *binding_lports,
>      const char *user;
>      struct sset all_users = SSET_INITIALIZER(&all_users);
>      struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);
> -    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
> +    unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
> +    struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones);
>
>      struct shash_node *shash_node;
>      SHASH_FOR_EACH (shash_node, binding_lports) {
> @@ -696,49 +697,46 @@ update_ct_zones(const struct shash *binding_lports,
>              bitmap_set0(ct_zone_bitmap, ct_zone->data);
>              simap_delete(ct_zones, ct_zone);
>          } else if (!simap_find(&req_snat_zones, ct_zone->name)) {
> -            bitmap_set1(unreq_snat_zones, ct_zone->data);
> +            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
> +            simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data);
>          }
>      }
>
>      /* Prioritize requested CT zones */
>      struct simap_node *snat_req_node;
>      SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
> -        struct simap_node *node = simap_find(ct_zones,
> snat_req_node->name);
> -        if (node) {
> -            if (node->data == snat_req_node->data) {
> -                /* No change to this request, so no action needed */
> -                continue;
> -            } else {
> -                /* Zone request has changed for this node. delete old
> entry */
> -                bitmap_set0(ct_zone_bitmap, node->data);
> -                simap_delete(ct_zones, node);
> -            }
> -        }
> -
>          /* Determine if someone already had this zone auto-assigned.
>           * If so, then they need to give up their assignment since
>           * that zone is being explicitly requested now.
>           */
> -        if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
> -            struct simap_node *dup;
> -            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
> -                if (dup != snat_req_node && dup->data ==
> snat_req_node->data) {
> -                    simap_delete(ct_zones, dup);
> +        if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) {
> +            struct simap_node *unreq_node;
> +            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
> +                if (unreq_node->data == snat_req_node->data) {
>                      break;
>                  }
>              }
>

This does not seem right, considering the scenario when we have two zones
A=0 and B=0 and at the same time requesting SNAT=0,
the B wouldn't be cleared and there would be conflict. Which makes me
wonder if it's not the scenario of how we got into the same zone scenario
in the first place?
IMO it would make more sense to run the simap_find_and_delete(ct_zones,
unreq_node->name); in the loop without break, to really remove anything
that has the same id.
Having this patch applied the could just run recompute without worrying how
it got into this situation in the first place and it should fix it. WDYT?


> +            ovs_assert(unreq_node);
> +
> +            simap_find_and_delete(ct_zones, unreq_node->name);
> +
>              /* Set this bit to 0 so that if multiple datapaths have
> requested
>               * this zone, we don't needlessly double-detect this
> condition.
>               */
> -            bitmap_set0(unreq_snat_zones, snat_req_node->data);
> +            bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
>          }
>
> -        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
> -                                  snat_req_node->data, true,
> -                                  snat_req_node->name);
> -
> -        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
> -        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
> +        struct simap_node *node = simap_find(ct_zones,
> snat_req_node->name);
> +        if (node && node->data != snat_req_node->data) {
> +            /* Zone request has changed for this node. delete old entry
> and
> +             * create new one*/
> +            add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
> +                                      snat_req_node->data, true,
> +                                      snat_req_node->name);
> +            bitmap_set0(ct_zone_bitmap, node->data);
> +            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
> +            node->data = snat_req_node->data;
> +        }
>      }
>
>      /* xxx This is wasteful to assign a zone to each port--even if no
> @@ -756,6 +754,7 @@ update_ct_zones(const struct shash *binding_lports,
>      }
>
>      simap_destroy(&req_snat_zones);
> +    simap_destroy(&unreq_snat_zones);
>      sset_destroy(&all_users);
>  }
>
> @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct engine_node
> *node, void *data)
>
>      struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings;
>      struct tracked_datapath *tdp;
> -    int scan_start = 0;
> +    int scan_start = 1;
>
>      bool updated = false;
>
> --
> 2.37.3
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Thanks,
Ales
Ales Musil Oct. 24, 2022, 8:43 a.m. UTC | #2
On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <amusil@redhat.com> wrote:

> Hi Mark,
>
> please see my comment below.
>
>
> On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <mmichels@redhat.com>
> wrote:
>
>> An issue was filed where CT zone 0 was assigned to both a logical router
>> SNAT and to a logical port. CT zone 0 is typically "reserved" and not
>> assigned by ovn-controller; however, since SNAT zones are configurable,
>> it is possible for ovn-controller to assign this zone at the CMS's
>> request. This accounts for how CT zone 0 can be assigned for SNAT. There
>> was also a small bug in the incremental processing that could result in
>> a logical port being assigned zone 0.
>>
>> In the specific issue report, CT zones were restored from the OVSDB when
>> ovn-controller started, and the conflicting CT zones were already
>> present. ovn-controller dutifully loaded these zones up. But then there
>> was nothing that would allow for the conflict to be resolved afterwards.
>>
>> It is unknown how these conflicts entered into the OVSDB in the first
>> place. This change does not purport to prevent conflicts from entering
>> the OVSDB. However, it does make the following changes that should
>> further safeguard against unwanted behavior:
>>
>> * ct_zones_runtime_data_handler() now assigns zones starting at
>>   1 instead of 0. This makes it use the same range as update_ct_zones().
>> * update_ct_zones() now keeps a new simap for zones that are assigned
>>   but not due to an SNAT zone request. This allows for us to guarantee
>>   that when there is a conflict between a previously auto-assigned CT
>>   zone and a newly-requested zone, we are guaranteed to remove the
>>   auto-assigned CT zone.
>> * The removal of conflicting auto-assigned CT zones is now performed
>>   before dealing with the newly requested zone. This makes it so that if
>>   we load conflicting zones from the OVSDB or if there is some issue
>>   that results in conflicting zones being assigned, we will correct the
>>   issue.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
>> Signed-off-by: Mark Michelson <mmichels@redhat.com>
>> ---
>>  controller/ovn-controller.c | 53 ++++++++++++++++++-------------------
>>  1 file changed, 26 insertions(+), 27 deletions(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 8895c7a2b..5fd9bbc40 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -659,7 +659,8 @@ update_ct_zones(const struct shash *binding_lports,
>>      const char *user;
>>      struct sset all_users = SSET_INITIALIZER(&all_users);
>>      struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);
>> -    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>> +    unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
>> +    struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones);
>>
>>      struct shash_node *shash_node;
>>      SHASH_FOR_EACH (shash_node, binding_lports) {
>> @@ -696,49 +697,46 @@ update_ct_zones(const struct shash *binding_lports,
>>              bitmap_set0(ct_zone_bitmap, ct_zone->data);
>>              simap_delete(ct_zones, ct_zone);
>>          } else if (!simap_find(&req_snat_zones, ct_zone->name)) {
>> -            bitmap_set1(unreq_snat_zones, ct_zone->data);
>> +            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
>> +            simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data);
>>          }
>>      }
>>
>>      /* Prioritize requested CT zones */
>>      struct simap_node *snat_req_node;
>>      SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
>> -        struct simap_node *node = simap_find(ct_zones,
>> snat_req_node->name);
>> -        if (node) {
>> -            if (node->data == snat_req_node->data) {
>> -                /* No change to this request, so no action needed */
>> -                continue;
>> -            } else {
>> -                /* Zone request has changed for this node. delete old
>> entry */
>> -                bitmap_set0(ct_zone_bitmap, node->data);
>> -                simap_delete(ct_zones, node);
>> -            }
>> -        }
>> -
>>          /* Determine if someone already had this zone auto-assigned.
>>           * If so, then they need to give up their assignment since
>>           * that zone is being explicitly requested now.
>>           */
>> -        if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
>> -            struct simap_node *dup;
>> -            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
>> -                if (dup != snat_req_node && dup->data ==
>> snat_req_node->data) {
>> -                    simap_delete(ct_zones, dup);
>> +        if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) {
>> +            struct simap_node *unreq_node;
>> +            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
>> +                if (unreq_node->data == snat_req_node->data) {
>>                      break;
>>                  }
>>              }
>>
>
> This does not seem right, considering the scenario when we have two zones
> A=0 and B=0 and at the same time requesting SNAT=0,
> the B wouldn't be cleared and there would be conflict. Which makes me
> wonder if it's not the scenario of how we got into the same zone scenario
> in the first place?
> IMO it would make more sense to run the simap_find_and_delete(ct_zones,
> unreq_node->name); in the loop without break, to really remove anything
> that has the same id.
> Having this patch applied the could just run recompute without worrying
> how it got into this situation in the first place and it should fix it.
> WDYT?
>
>
>> +            ovs_assert(unreq_node);
>> +
>> +            simap_find_and_delete(ct_zones, unreq_node->name);
>> +
>>              /* Set this bit to 0 so that if multiple datapaths have
>> requested
>>               * this zone, we don't needlessly double-detect this
>> condition.
>>               */
>> -            bitmap_set0(unreq_snat_zones, snat_req_node->data);
>> +            bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
>>          }
>>
>> -        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
>> -                                  snat_req_node->data, true,
>> -                                  snat_req_node->name);
>> -
>> -        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>> -        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
>> +        struct simap_node *node = simap_find(ct_zones,
>> snat_req_node->name);
>> +        if (node && node->data != snat_req_node->data) {
>> +            /* Zone request has changed for this node. delete old entry
>> and
>> +             * create new one*/
>> +            add_pending_ct_zone_entry(pending_ct_zones,
>> CT_ZONE_OF_QUEUED,
>> +                                      snat_req_node->data, true,
>> +                                      snat_req_node->name);
>> +            bitmap_set0(ct_zone_bitmap, node->data);
>> +            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>> +            node->data = snat_req_node->data;
>> +        }
>>      }
>>
>>      /* xxx This is wasteful to assign a zone to each port--even if no
>> @@ -756,6 +754,7 @@ update_ct_zones(const struct shash *binding_lports,
>>      }
>>
>>      simap_destroy(&req_snat_zones);
>> +    simap_destroy(&unreq_snat_zones);
>>      sset_destroy(&all_users);
>>  }
>>
>> @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct engine_node
>> *node, void *data)
>>
>>      struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings;
>>      struct tracked_datapath *tdp;
>> -    int scan_start = 0;
>> +    int scan_start = 1;
>>
>>      bool updated = false;
>>
>> --
>> 2.37.3
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thanks,
> Ales
>
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA <https://www.redhat.com>
>
> amusil@redhat.com    IM: amusil
> <https://red.ht/sig>
>


And I have also found a reproducer.
If you run "snat-ct-zone with common NAT zone" with the below diff applied
you should see
multiple zones having zone id=0. It can be also reproduced with your patch
so we probably need to adjust that loop.

diff --git a/tests/ovn.at b/tests/ovn.at
index f8b8db4df..6cf7d93a3 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -32366,6 +32366,9 @@ as hv1
 check ovs-vsctl add-br br-phys
 ovn_attach n1 br-phys 192.168.0.1
 check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv
external-ids:iface-id=ls0-hv
+check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1
external-ids:iface-id=ls0-hv1
+check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2
external-ids:iface-id=ls0-hv2
+check ovs-vsctl set open . external_ids:ovn-monitor-all=true

 check ovn-nbctl lr-add lr0

@@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0
00:00:00:00:00:01 10.0.0.1
 check ovn-nbctl lsp-add ls0 ls0-hv
 check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2"

+check ovn-nbctl lsp-add ls0 ls0-hv1
+check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3"
+
+check ovn-nbctl lsp-add ls0 ls0-hv2
+check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4"
+
 check ovn-nbctl ls-add ext
 check ovn-nbctl lsp-add ext ext-lr0
 check ovn-nbctl lsp-set-type ext-lr0 router
@@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1}
 check test "$dnat_zone" = "$DNAT_ZONE_REG"
 check test "$snat_zone" = "$DNAT_ZONE_REG"

-check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=666
+check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0

 dnat_zone=$(ovs-ofctl dump-flows br-int
table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d '='
-f 2)
 dnat_zone=${dnat_zone::-1}
@@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1}
 check test "$dnat_zone" = "$SNAT_ZONE_REG"
 check test "$snat_zone" = "$SNAT_ZONE_REG"

+ovs-appctl -t ovn-controller exit --restart
+
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0"
+ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0"
+
+start_daemon ovn-controller
+sleep 2
+
+ovn-appctl ct-zone-list
+
 OVN_CLEANUP([hv1])
 AT_CLEANUP

Thanks,
Ales
Mark Michelson Oct. 24, 2022, 7:18 p.m. UTC | #3
On 10/24/22 04:43, Ales Musil wrote:
> 
> 
> On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <amusil@redhat.com 
> <mailto:amusil@redhat.com>> wrote:
> 
>     Hi Mark,
> 
>     please see my comment below.
> 

Thanks for the review.

> 
>     On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <mmichels@redhat.com
>     <mailto:mmichels@redhat.com>> wrote:
> 
>         An issue was filed where CT zone 0 was assigned to both a
>         logical router
>         SNAT and to a logical port. CT zone 0 is typically "reserved"
>         and not
>         assigned by ovn-controller; however, since SNAT zones are
>         configurable,
>         it is possible for ovn-controller to assign this zone at the CMS's
>         request. This accounts for how CT zone 0 can be assigned for
>         SNAT. There
>         was also a small bug in the incremental processing that could
>         result in
>         a logical port being assigned zone 0.
> 
>         In the specific issue report, CT zones were restored from the
>         OVSDB when
>         ovn-controller started, and the conflicting CT zones were already
>         present. ovn-controller dutifully loaded these zones up. But
>         then there
>         was nothing that would allow for the conflict to be resolved
>         afterwards.
> 
>         It is unknown how these conflicts entered into the OVSDB in the
>         first
>         place. This change does not purport to prevent conflicts from
>         entering
>         the OVSDB. However, it does make the following changes that should
>         further safeguard against unwanted behavior:
> 
>         * ct_zones_runtime_data_handler() now assigns zones starting at
>            1 instead of 0. This makes it use the same range as
>         update_ct_zones().
>         * update_ct_zones() now keeps a new simap for zones that are
>         assigned
>            but not due to an SNAT zone request. This allows for us to
>         guarantee
>            that when there is a conflict between a previously
>         auto-assigned CT
>            zone and a newly-requested zone, we are guaranteed to remove the
>            auto-assigned CT zone.
>         * The removal of conflicting auto-assigned CT zones is now performed
>            before dealing with the newly requested zone. This makes it
>         so that if
>            we load conflicting zones from the OVSDB or if there is some
>         issue
>            that results in conflicting zones being assigned, we will
>         correct the
>            issue.
> 
>         Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
>         <https://bugzilla.redhat.com/show_bug.cgi?id=2126406>
>         Signed-off-by: Mark Michelson <mmichels@redhat.com
>         <mailto:mmichels@redhat.com>>
>         ---
>           controller/ovn-controller.c | 53
>         ++++++++++++++++++-------------------
>           1 file changed, 26 insertions(+), 27 deletions(-)
> 
>         diff --git a/controller/ovn-controller.c
>         b/controller/ovn-controller.c
>         index 8895c7a2b..5fd9bbc40 100644
>         --- a/controller/ovn-controller.c
>         +++ b/controller/ovn-controller.c
>         @@ -659,7 +659,8 @@ update_ct_zones(const struct shash
>         *binding_lports,
>               const char *user;
>               struct sset all_users = SSET_INITIALIZER(&all_users);
>               struct simap req_snat_zones =
>         SIMAP_INITIALIZER(&req_snat_zones);
>         -    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>         +    unsigned long
>         unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
>         +    struct simap unreq_snat_zones =
>         SIMAP_INITIALIZER(&unreq_snat_zones);
> 
>               struct shash_node *shash_node;
>               SHASH_FOR_EACH (shash_node, binding_lports) {
>         @@ -696,49 +697,46 @@ update_ct_zones(const struct shash
>         *binding_lports,
>                       bitmap_set0(ct_zone_bitmap, ct_zone->data);
>                       simap_delete(ct_zones, ct_zone);
>                   } else if (!simap_find(&req_snat_zones, ct_zone->name)) {
>         -            bitmap_set1(unreq_snat_zones, ct_zone->data);
>         +            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
>         +            simap_put(&unreq_snat_zones, ct_zone->name,
>         ct_zone->data);
>                   }
>               }
> 
>               /* Prioritize requested CT zones */
>               struct simap_node *snat_req_node;
>               SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
>         -        struct simap_node *node = simap_find(ct_zones,
>         snat_req_node->name);
>         -        if (node) {
>         -            if (node->data == snat_req_node->data) {
>         -                /* No change to this request, so no action
>         needed */
>         -                continue;
>         -            } else {
>         -                /* Zone request has changed for this node.
>         delete old entry */
>         -                bitmap_set0(ct_zone_bitmap, node->data);
>         -                simap_delete(ct_zones, node);
>         -            }
>         -        }
>         -
>                   /* Determine if someone already had this zone
>         auto-assigned.
>                    * If so, then they need to give up their assignment since
>                    * that zone is being explicitly requested now.
>                    */
>         -        if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
>         -            struct simap_node *dup;
>         -            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
>         -                if (dup != snat_req_node && dup->data ==
>         snat_req_node->data) {
>         -                    simap_delete(ct_zones, dup);
>         +        if (bitmap_is_set(unreq_snat_zones_map,
>         snat_req_node->data)) {
>         +            struct simap_node *unreq_node;
>         +            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
>         +                if (unreq_node->data == snat_req_node->data) {
>                               break;
>                           }
>                       }
> 
> 
>     This does not seem right, considering the scenario when we have two
>     zones A=0 and B=0 and at the same time requesting SNAT=0,
>     the B wouldn't be cleared and there would be conflict. Which makes
>     me wonder if it's not the scenario of how we got into the same zone
>     scenario in the first place?

Right now, the only way I know how to make that happen is by 
purposefully setting those zones in the OVSDB. You're correct that if 
the OVSDB already had A=0 and B=0, then neither the original code nor 
the patched code  will correct the issue.

>     IMO it would make more sense to run the
>     simap_find_and_delete(ct_zones, unreq_node->name); in the loop
>     without break, to really remove anything that has the same id.
>     Having this patch applied the could just run recompute without
>     worrying how it got into this situation in the first place and it
>     should fix it. WDYT?

That's a good idea, since this will address the scenario you have 
outlined above. I had a bit of concern about "wasted" cycles, but it's 
probably better to be safe here, especially since you have demonstrated 
a scenario where this patched code will not resolve the issue.

> 
>         +            ovs_assert(unreq_node);
>         +
>         +            simap_find_and_delete(ct_zones, unreq_node->name);
>         +
>                       /* Set this bit to 0 so that if multiple datapaths
>         have requested
>                        * this zone, we don't needlessly double-detect
>         this condition.
>                        */
>         -            bitmap_set0(unreq_snat_zones, snat_req_node->data);
>         +            bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
>                   }
> 
>         -        add_pending_ct_zone_entry(pending_ct_zones,
>         CT_ZONE_OF_QUEUED,
>         -                                  snat_req_node->data, true,
>         -                                  snat_req_node->name);
>         -
>         -        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>         -        simap_put(ct_zones, snat_req_node->name,
>         snat_req_node->data);
>         +        struct simap_node *node = simap_find(ct_zones,
>         snat_req_node->name);
>         +        if (node && node->data != snat_req_node->data) {
>         +            /* Zone request has changed for this node. delete
>         old entry and
>         +             * create new one*/
>         +            add_pending_ct_zone_entry(pending_ct_zones,
>         CT_ZONE_OF_QUEUED,
>         +                                      snat_req_node->data, true,
>         +                                      snat_req_node->name);
>         +            bitmap_set0(ct_zone_bitmap, node->data);
>         +            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
>         +            node->data = snat_req_node->data;
>         +        }
>               }
> 
>               /* xxx This is wasteful to assign a zone to each
>         port--even if no
>         @@ -756,6 +754,7 @@ update_ct_zones(const struct shash
>         *binding_lports,
>               }
> 
>               simap_destroy(&req_snat_zones);
>         +    simap_destroy(&unreq_snat_zones);
>               sset_destroy(&all_users);
>           }
> 
>         @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct
>         engine_node *node, void *data)
> 
>               struct hmap *tracked_dp_bindings =
>         &rt_data->tracked_dp_bindings;
>               struct tracked_datapath *tdp;
>         -    int scan_start = 0;
>         +    int scan_start = 1;
> 
>               bool updated = false;
> 
>         -- 
>         2.37.3
> 
>         _______________________________________________
>         dev mailing list
>         dev@openvswitch.org <mailto:dev@openvswitch.org>
>         https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>         <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> 
> 
>     Thanks,
>     Ales
> 
>     -- 
> 
>     Ales Musil
> 
>     Senior Software Engineer - OVN Core
> 
>     Red Hat EMEA <https://www.redhat.com>
> 
>     amusil@redhat.com <mailto:amusil@redhat.com> IM: amusil
> 
>     <https://red.ht/sig>
> 
> 
> 
> And I have also found a reproducer.
> If you run "snat-ct-zone with common NAT zone" with the below diff 
> applied you should see
> multiple zones having zone id=0. It can be also reproduced with your 
> patch so we probably need to adjust that loop.
> 
> diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
> index f8b8db4df..6cf7d93a3 100644
> --- a/tests/ovn.at <http://ovn.at>
> +++ b/tests/ovn.at <http://ovn.at>
> @@ -32366,6 +32366,9 @@ as hv1
>   check ovs-vsctl add-br br-phys
>   ovn_attach n1 br-phys 192.168.0.1
>   check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv 
> external-ids:iface-id=ls0-hv
> +check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1 
> external-ids:iface-id=ls0-hv1
> +check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2 
> external-ids:iface-id=ls0-hv2
> +check ovs-vsctl set open . external_ids:ovn-monitor-all=true
> 
>   check ovn-nbctl lr-add lr0
> 
> @@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0 
> 00:00:00:00:00:01 10.0.0.1
>   check ovn-nbctl lsp-add ls0 ls0-hv
>   check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2"
> 
> +check ovn-nbctl lsp-add ls0 ls0-hv1
> +check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3"
> +
> +check ovn-nbctl lsp-add ls0 ls0-hv2
> +check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4"
> +
>   check ovn-nbctl ls-add ext
>   check ovn-nbctl lsp-add ext ext-lr0
>   check ovn-nbctl lsp-set-type ext-lr0 router
> @@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1}
>   check test "$dnat_zone" = "$DNAT_ZONE_REG"
>   check test "$snat_zone" = "$DNAT_ZONE_REG"
> 
> -check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=666
> +check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0
> 
>   dnat_zone=$(ovs-ofctl dump-flows br-int 
> table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d 
> '=' -f 2)
>   dnat_zone=${dnat_zone::-1}
> @@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1}
>   check test "$dnat_zone" = "$SNAT_ZONE_REG"
>   check test "$snat_zone" = "$SNAT_ZONE_REG"
> 
> +ovs-appctl -t ovn-controller exit --restart
> +
> +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0"
> +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0"
> +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0"
> +
> +start_daemon ovn-controller
> +sleep 2
> +
> +ovn-appctl ct-zone-list
> +

Thanks for the reproducer. I will add this to v2 and add a co-author 
credit to you.

>   OVN_CLEANUP([hv1])
>   AT_CLEANUP
> Thanks,
> Ales
> 
> -- 
> 
> Ales Musil
> 
> Senior Software Engineer - OVN Core
> 
> Red Hat EMEA <https://www.redhat.com>
> 
> amusil@redhat.com <mailto:amusil@redhat.com> IM: amusil
> 
> <https://red.ht/sig>
>
Ales Musil Oct. 25, 2022, 5:30 a.m. UTC | #4
On Mon, Oct 24, 2022 at 9:18 PM Mark Michelson <mmichels@redhat.com> wrote:

> On 10/24/22 04:43, Ales Musil wrote:
> >
> >
> > On Mon, Oct 24, 2022 at 8:53 AM Ales Musil <amusil@redhat.com
> > <mailto:amusil@redhat.com>> wrote:
> >
> >     Hi Mark,
> >
> >     please see my comment below.
> >
>
> Thanks for the review.
>
> >
> >     On Fri, Oct 21, 2022 at 8:38 PM Mark Michelson <mmichels@redhat.com
> >     <mailto:mmichels@redhat.com>> wrote:
> >
> >         An issue was filed where CT zone 0 was assigned to both a
> >         logical router
> >         SNAT and to a logical port. CT zone 0 is typically "reserved"
> >         and not
> >         assigned by ovn-controller; however, since SNAT zones are
> >         configurable,
> >         it is possible for ovn-controller to assign this zone at the
> CMS's
> >         request. This accounts for how CT zone 0 can be assigned for
> >         SNAT. There
> >         was also a small bug in the incremental processing that could
> >         result in
> >         a logical port being assigned zone 0.
> >
> >         In the specific issue report, CT zones were restored from the
> >         OVSDB when
> >         ovn-controller started, and the conflicting CT zones were already
> >         present. ovn-controller dutifully loaded these zones up. But
> >         then there
> >         was nothing that would allow for the conflict to be resolved
> >         afterwards.
> >
> >         It is unknown how these conflicts entered into the OVSDB in the
> >         first
> >         place. This change does not purport to prevent conflicts from
> >         entering
> >         the OVSDB. However, it does make the following changes that
> should
> >         further safeguard against unwanted behavior:
> >
> >         * ct_zones_runtime_data_handler() now assigns zones starting at
> >            1 instead of 0. This makes it use the same range as
> >         update_ct_zones().
> >         * update_ct_zones() now keeps a new simap for zones that are
> >         assigned
> >            but not due to an SNAT zone request. This allows for us to
> >         guarantee
> >            that when there is a conflict between a previously
> >         auto-assigned CT
> >            zone and a newly-requested zone, we are guaranteed to remove
> the
> >            auto-assigned CT zone.
> >         * The removal of conflicting auto-assigned CT zones is now
> performed
> >            before dealing with the newly requested zone. This makes it
> >         so that if
> >            we load conflicting zones from the OVSDB or if there is some
> >         issue
> >            that results in conflicting zones being assigned, we will
> >         correct the
> >            issue.
> >
> >         Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2126406
> >         <https://bugzilla.redhat.com/show_bug.cgi?id=2126406>
> >         Signed-off-by: Mark Michelson <mmichels@redhat.com
> >         <mailto:mmichels@redhat.com>>
> >         ---
> >           controller/ovn-controller.c | 53
> >         ++++++++++++++++++-------------------
> >           1 file changed, 26 insertions(+), 27 deletions(-)
> >
> >         diff --git a/controller/ovn-controller.c
> >         b/controller/ovn-controller.c
> >         index 8895c7a2b..5fd9bbc40 100644
> >         --- a/controller/ovn-controller.c
> >         +++ b/controller/ovn-controller.c
> >         @@ -659,7 +659,8 @@ update_ct_zones(const struct shash
> >         *binding_lports,
> >               const char *user;
> >               struct sset all_users = SSET_INITIALIZER(&all_users);
> >               struct simap req_snat_zones =
> >         SIMAP_INITIALIZER(&req_snat_zones);
> >         -    unsigned long
> unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
> >         +    unsigned long
> >         unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
> >         +    struct simap unreq_snat_zones =
> >         SIMAP_INITIALIZER(&unreq_snat_zones);
> >
> >               struct shash_node *shash_node;
> >               SHASH_FOR_EACH (shash_node, binding_lports) {
> >         @@ -696,49 +697,46 @@ update_ct_zones(const struct shash
> >         *binding_lports,
> >                       bitmap_set0(ct_zone_bitmap, ct_zone->data);
> >                       simap_delete(ct_zones, ct_zone);
> >                   } else if (!simap_find(&req_snat_zones,
> ct_zone->name)) {
> >         -            bitmap_set1(unreq_snat_zones, ct_zone->data);
> >         +            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
> >         +            simap_put(&unreq_snat_zones, ct_zone->name,
> >         ct_zone->data);
> >                   }
> >               }
> >
> >               /* Prioritize requested CT zones */
> >               struct simap_node *snat_req_node;
> >               SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
> >         -        struct simap_node *node = simap_find(ct_zones,
> >         snat_req_node->name);
> >         -        if (node) {
> >         -            if (node->data == snat_req_node->data) {
> >         -                /* No change to this request, so no action
> >         needed */
> >         -                continue;
> >         -            } else {
> >         -                /* Zone request has changed for this node.
> >         delete old entry */
> >         -                bitmap_set0(ct_zone_bitmap, node->data);
> >         -                simap_delete(ct_zones, node);
> >         -            }
> >         -        }
> >         -
> >                   /* Determine if someone already had this zone
> >         auto-assigned.
> >                    * If so, then they need to give up their assignment
> since
> >                    * that zone is being explicitly requested now.
> >                    */
> >         -        if (bitmap_is_set(unreq_snat_zones,
> snat_req_node->data)) {
> >         -            struct simap_node *dup;
> >         -            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
> >         -                if (dup != snat_req_node && dup->data ==
> >         snat_req_node->data) {
> >         -                    simap_delete(ct_zones, dup);
> >         +        if (bitmap_is_set(unreq_snat_zones_map,
> >         snat_req_node->data)) {
> >         +            struct simap_node *unreq_node;
> >         +            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
> >         +                if (unreq_node->data == snat_req_node->data) {
> >                               break;
> >                           }
> >                       }
> >
> >
> >     This does not seem right, considering the scenario when we have two
> >     zones A=0 and B=0 and at the same time requesting SNAT=0,
> >     the B wouldn't be cleared and there would be conflict. Which makes
> >     me wonder if it's not the scenario of how we got into the same zone
> >     scenario in the first place?
>
> Right now, the only way I know how to make that happen is by
> purposefully setting those zones in the OVSDB. You're correct that if
> the OVSDB already had A=0 and B=0, then neither the original code nor
> the patched code  will correct the issue.
>

Right, but that's still something that can happen as we do directly control
the OVSDB.


>
> >     IMO it would make more sense to run the
> >     simap_find_and_delete(ct_zones, unreq_node->name); in the loop
> >     without break, to really remove anything that has the same id.
> >     Having this patch applied the could just run recompute without
> >     worrying how it got into this situation in the first place and it
> >     should fix it. WDYT?
>
> That's a good idea, since this will address the scenario you have
> outlined above. I had a bit of concern about "wasted" cycles, but it's
> probably better to be safe here, especially since you have demonstrated
> a scenario where this patched code will not resolve the issue.
>

Yeah it's not the best solution, on the other hand the loop only happens
when there is
conflict so that should be fine.


>
> >
> >         +            ovs_assert(unreq_node);
> >         +
> >         +            simap_find_and_delete(ct_zones, unreq_node->name);
> >         +
> >                       /* Set this bit to 0 so that if multiple datapaths
> >         have requested
> >                        * this zone, we don't needlessly double-detect
> >         this condition.
> >                        */
> >         -            bitmap_set0(unreq_snat_zones, snat_req_node->data);
> >         +            bitmap_set0(unreq_snat_zones_map,
> snat_req_node->data);
> >                   }
> >
> >         -        add_pending_ct_zone_entry(pending_ct_zones,
> >         CT_ZONE_OF_QUEUED,
> >         -                                  snat_req_node->data, true,
> >         -                                  snat_req_node->name);
> >         -
> >         -        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
> >         -        simap_put(ct_zones, snat_req_node->name,
> >         snat_req_node->data);
> >         +        struct simap_node *node = simap_find(ct_zones,
> >         snat_req_node->name);
> >         +        if (node && node->data != snat_req_node->data) {
> >         +            /* Zone request has changed for this node. delete
> >         old entry and
> >         +             * create new one*/
> >         +            add_pending_ct_zone_entry(pending_ct_zones,
> >         CT_ZONE_OF_QUEUED,
> >         +                                      snat_req_node->data, true,
> >         +                                      snat_req_node->name);
> >         +            bitmap_set0(ct_zone_bitmap, node->data);
> >         +            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
> >         +            node->data = snat_req_node->data;
> >         +        }
> >               }
> >
> >               /* xxx This is wasteful to assign a zone to each
> >         port--even if no
> >         @@ -756,6 +754,7 @@ update_ct_zones(const struct shash
> >         *binding_lports,
> >               }
> >
> >               simap_destroy(&req_snat_zones);
> >         +    simap_destroy(&unreq_snat_zones);
> >               sset_destroy(&all_users);
> >           }
> >
> >         @@ -2178,7 +2177,7 @@ ct_zones_runtime_data_handler(struct
> >         engine_node *node, void *data)
> >
> >               struct hmap *tracked_dp_bindings =
> >         &rt_data->tracked_dp_bindings;
> >               struct tracked_datapath *tdp;
> >         -    int scan_start = 0;
> >         +    int scan_start = 1;
> >
> >               bool updated = false;
> >
> >         --
> >         2.37.3
> >
> >         _______________________________________________
> >         dev mailing list
> >         dev@openvswitch.org <mailto:dev@openvswitch.org>
> >         https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >         <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> >
> >
> >     Thanks,
> >     Ales
> >
> >     --
> >
> >     Ales Musil
> >
> >     Senior Software Engineer - OVN Core
> >
> >     Red Hat EMEA <https://www.redhat.com>
> >
> >     amusil@redhat.com <mailto:amusil@redhat.com> IM: amusil
> >
> >     <https://red.ht/sig>
> >
> >
> >
> > And I have also found a reproducer.
> > If you run "snat-ct-zone with common NAT zone" with the below diff
> > applied you should see
> > multiple zones having zone id=0. It can be also reproduced with your
> > patch so we probably need to adjust that loop.
> >
> > diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at <http://ovn.at>
> > index f8b8db4df..6cf7d93a3 100644
> > --- a/tests/ovn.at <http://ovn.at>
> > +++ b/tests/ovn.at <http://ovn.at>
> > @@ -32366,6 +32366,9 @@ as hv1
> >   check ovs-vsctl add-br br-phys
> >   ovn_attach n1 br-phys 192.168.0.1
> >   check ovs-vsctl add-port br-int ls0-hv -- set Interface ls0-hv
> > external-ids:iface-id=ls0-hv
> > +check ovs-vsctl add-port br-int ls0-hv1 -- set Interface ls0-hv1
> > external-ids:iface-id=ls0-hv1
> > +check ovs-vsctl add-port br-int ls0-hv2 -- set Interface ls0-hv2
> > external-ids:iface-id=ls0-hv2
> > +check ovs-vsctl set open . external_ids:ovn-monitor-all=true
> >
> >   check ovn-nbctl lr-add lr0
> >
> > @@ -32378,6 +32381,12 @@ check ovn-nbctl lrp-add lr0 lr0-ls0
> > 00:00:00:00:00:01 10.0.0.1
> >   check ovn-nbctl lsp-add ls0 ls0-hv
> >   check ovn-nbctl lsp-set-addresses ls0-hv "00:00:00:00:00:02 10.0.0.2"
> >
> > +check ovn-nbctl lsp-add ls0 ls0-hv1
> > +check ovn-nbctl lsp-set-addresses ls0-hv1 "00:00:00:00:00:03 10.0.0.3"
> > +
> > +check ovn-nbctl lsp-add ls0 ls0-hv2
> > +check ovn-nbctl lsp-set-addresses ls0-hv2 "00:00:00:00:00:04 10.0.0.4"
> > +
> >   check ovn-nbctl ls-add ext
> >   check ovn-nbctl lsp-add ext ext-lr0
> >   check ovn-nbctl lsp-set-type ext-lr0 router
> > @@ -32410,7 +32419,7 @@ snat_zone=${snat_zone::-1}
> >   check test "$dnat_zone" = "$DNAT_ZONE_REG"
> >   check test "$snat_zone" = "$DNAT_ZONE_REG"
> >
> > -check ovn-nbctl --wait=hv set logical_router lr0
> options:snat-ct-zone=666
> > +check ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0
> >
> >   dnat_zone=$(ovs-ofctl dump-flows br-int
> > table=$DNAT_TABLE,metadata=0x${lr0_dp_key} | grep -o zone=.*, | cut -d
> > '=' -f 2)
> >   dnat_zone=${dnat_zone::-1}
> > @@ -32422,6 +32431,19 @@ snat_zone=${snat_zone::-1}
> >   check test "$dnat_zone" = "$SNAT_ZONE_REG"
> >   check test "$snat_zone" = "$SNAT_ZONE_REG"
> >
> > +ovs-appctl -t ovn-controller exit --restart
> > +
> > +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv="0"
> > +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1="0"
> > +ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2="0"
> > +
> > +start_daemon ovn-controller
> > +sleep 2
> > +
> > +ovn-appctl ct-zone-list
> > +
>
> Thanks for the reproducer. I will add this to v2 and add a co-author
> credit to you.
>

Ack, thanks.


>
> >   OVN_CLEANUP([hv1])
> >   AT_CLEANUP
> > Thanks,
> > Ales
> >
> > --
> >
> > Ales Musil
> >
> > Senior Software Engineer - OVN Core
> >
> > Red Hat EMEA <https://www.redhat.com>
> >
> > amusil@redhat.com <mailto:amusil@redhat.com> IM: amusil
> >
> > <https://red.ht/sig>
> >
>
>
Thanks,
Ales
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 8895c7a2b..5fd9bbc40 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -659,7 +659,8 @@  update_ct_zones(const struct shash *binding_lports,
     const char *user;
     struct sset all_users = SSET_INITIALIZER(&all_users);
     struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);
-    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
+    unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)];
+    struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones);
 
     struct shash_node *shash_node;
     SHASH_FOR_EACH (shash_node, binding_lports) {
@@ -696,49 +697,46 @@  update_ct_zones(const struct shash *binding_lports,
             bitmap_set0(ct_zone_bitmap, ct_zone->data);
             simap_delete(ct_zones, ct_zone);
         } else if (!simap_find(&req_snat_zones, ct_zone->name)) {
-            bitmap_set1(unreq_snat_zones, ct_zone->data);
+            bitmap_set1(unreq_snat_zones_map, ct_zone->data);
+            simap_put(&unreq_snat_zones, ct_zone->name, ct_zone->data);
         }
     }
 
     /* Prioritize requested CT zones */
     struct simap_node *snat_req_node;
     SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
-        struct simap_node *node = simap_find(ct_zones, snat_req_node->name);
-        if (node) {
-            if (node->data == snat_req_node->data) {
-                /* No change to this request, so no action needed */
-                continue;
-            } else {
-                /* Zone request has changed for this node. delete old entry */
-                bitmap_set0(ct_zone_bitmap, node->data);
-                simap_delete(ct_zones, node);
-            }
-        }
-
         /* Determine if someone already had this zone auto-assigned.
          * If so, then they need to give up their assignment since
          * that zone is being explicitly requested now.
          */
-        if (bitmap_is_set(unreq_snat_zones, snat_req_node->data)) {
-            struct simap_node *dup;
-            SIMAP_FOR_EACH_SAFE (dup, ct_zones) {
-                if (dup != snat_req_node && dup->data == snat_req_node->data) {
-                    simap_delete(ct_zones, dup);
+        if (bitmap_is_set(unreq_snat_zones_map, snat_req_node->data)) {
+            struct simap_node *unreq_node;
+            SIMAP_FOR_EACH (unreq_node, &unreq_snat_zones) {
+                if (unreq_node->data == snat_req_node->data) {
                     break;
                 }
             }
+            ovs_assert(unreq_node);
+
+            simap_find_and_delete(ct_zones, unreq_node->name);
+
             /* Set this bit to 0 so that if multiple datapaths have requested
              * this zone, we don't needlessly double-detect this condition.
              */
-            bitmap_set0(unreq_snat_zones, snat_req_node->data);
+            bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
         }
 
-        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
-                                  snat_req_node->data, true,
-                                  snat_req_node->name);
-
-        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
-        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
+        struct simap_node *node = simap_find(ct_zones, snat_req_node->name);
+        if (node && node->data != snat_req_node->data) {
+            /* Zone request has changed for this node. delete old entry and
+             * create new one*/
+            add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
+                                      snat_req_node->data, true,
+                                      snat_req_node->name);
+            bitmap_set0(ct_zone_bitmap, node->data);
+            bitmap_set1(ct_zone_bitmap, snat_req_node->data);
+            node->data = snat_req_node->data;
+        }
     }
 
     /* xxx This is wasteful to assign a zone to each port--even if no
@@ -756,6 +754,7 @@  update_ct_zones(const struct shash *binding_lports,
     }
 
     simap_destroy(&req_snat_zones);
+    simap_destroy(&unreq_snat_zones);
     sset_destroy(&all_users);
 }
 
@@ -2178,7 +2177,7 @@  ct_zones_runtime_data_handler(struct engine_node *node, void *data)
 
     struct hmap *tracked_dp_bindings = &rt_data->tracked_dp_bindings;
     struct tracked_datapath *tdp;
-    int scan_start = 0;
+    int scan_start = 1;
 
     bool updated = false;