diff mbox series

[ovs-dev,v4,5/7] northd: Add n_nat_entries field to 'struct ovn_datapath'

Message ID 20210903122148.826196-6-mark.d.gray@redhat.com
State Superseded
Headers show
Series northd: Split northd and northd incremental processing framework | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Mark Gray Sept. 3, 2021, 12:21 p.m. UTC
destroy_nat_entries() iterates over nat_entries using the
number of NAT entries from the NB database. When using I-P,
this behaviour is incorrect as the number of NAT entries in
the NBDB may differ from that stored in 'struct ovn_datapath'
causing unexpected behaviour.

Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
---
 northd/northd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Mark Michelson Sept. 13, 2021, 8:41 p.m. UTC | #1
Hi Mark,

Of all the patches in this series, this is the one I am least sure about.

It seems simple enough, but it's not clear to me

1) Why would the number of nat entries on an ovn_datapath differ from 
what's in the database? Since this series doesn't introduce any "real" 
incremental processing, we should be recreating the ovn_datapath on 
every iteration. And on each iteration, we should have a fresh view of 
the database, so why would the values differ?
2) Why is this only an issue in destroy_nat_entries()? There are several 
other places in the code where we iterate over the number of nat entries 
using od->nbr->n_nat. Why are those not affected?
3) How does this patch address that issue? The patch just sets the 
ovn_datapath's n_nat_entries to be the same as the database's count, 
which according to the commit message, is not always accurate.

Can you provide some clarity here? Maybe tell of a situation where 
od->nbr->n_nat is incorrect?

On 9/3/21 8:21 AM, Mark Gray wrote:
> destroy_nat_entries() iterates over nat_entries using the
> number of NAT entries from the NB database. When using I-P,
> this behaviour is incorrect as the number of NAT entries in
> the NBDB may differ from that stored in 'struct ovn_datapath'
> causing unexpected behaviour.
> 
> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
> ---
>   northd/northd.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 724baa3994d5..829c4479f14b 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -605,6 +605,8 @@ struct ovn_datapath {
>   
>       /* NAT entries configured on the router. */
>       struct ovn_nat *nat_entries;
> +    size_t n_nat_entries;
> +
>       bool has_distributed_nat;
>   
>       /* Set of nat external ips on the router. */
> @@ -789,6 +791,7 @@ init_nat_entries(struct ovn_datapath *od)
>               od->has_distributed_nat = true;
>           }
>       }
> +    od->n_nat_entries = od->nbr->n_nat;
>   }
>   
>   static void
> @@ -802,7 +805,7 @@ destroy_nat_entries(struct ovn_datapath *od)
>       destroy_lport_addresses(&od->dnat_force_snat_addrs);
>       destroy_lport_addresses(&od->lb_force_snat_addrs);
>   
> -    for (size_t i = 0; i < od->nbr->n_nat; i++) {
> +    for (size_t i = 0; i < od->n_nat_entries; i++) {
>           destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
>       }
>   }
>
Mark Gray Sept. 14, 2021, 5:27 p.m. UTC | #2
On 13/09/2021 21:41, Mark Michelson wrote:
> Hi Mark,
> 
> Of all the patches in this series, this is the one I am least sure about.
Thanks for looking over these.

> 
> It seems simple enough, but it's not clear to me
> 
> 1) Why would the number of nat entries on an ovn_datapath differ from 
> what's in the database? Since this series doesn't introduce any "real" 
> incremental processing, we should be recreating the ovn_datapath on 
> every iteration. And on each iteration, we should have a fresh view of 
> the database, so why would the values differ?
> 2) Why is this only an issue in destroy_nat_entries()? There are several 
> other places in the code where we iterate over the number of nat entries 
> using od->nbr->n_nat. Why are those not affected?
> 3) How does this patch address that issue? The patch just sets the 
> ovn_datapath's n_nat_entries to be the same as the database's count, 
> which according to the commit message, is not always accurate.
> 
> Can you provide some clarity here? Maybe tell of a situation where 
> od->nbr->n_nat is incorrect?

You are absolutely correct as the details in the commit message are not
sufficient to understand. In fact, I had a hard time remembering the
reason myself!

IIRC, on each iteration, we need to call en_runtime_run(). This needs to
initialize the runtime data .. but .. it also needs to destroy the
runtime data from the previous run. This may not be the best way to do
it, but the I-P framework doesn't have a convenient way to clear data
between runs.

Consider an example in which we have completed iteration x, and started
iteration x+1. If we add a NAT entry between iteration x and iteration
x+1, od->nbr->n_nat will contain the updated value of n_nat as the
northbound record will have been updated. However, as we have not yet
(re)initialized the nat entries in od, od->n_nat_entries will be equal
to the previous value. These two values could then be different which
makes this loop invalid. Does this make sense? If so, shall I update the
commit message? How would you like me to proceed?

> 
> On 9/3/21 8:21 AM, Mark Gray wrote:
>> destroy_nat_entries() iterates over nat_entries using the
>> number of NAT entries from the NB database. When using I-P,
>> this behaviour is incorrect as the number of NAT entries in
>> the NBDB may differ from that stored in 'struct ovn_datapath'
>> causing unexpected behaviour.
>>
>> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
>> ---
>>   northd/northd.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 724baa3994d5..829c4479f14b 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -605,6 +605,8 @@ struct ovn_datapath {
>>   
>>       /* NAT entries configured on the router. */
>>       struct ovn_nat *nat_entries;
>> +    size_t n_nat_entries;
>> +
>>       bool has_distributed_nat;
>>   
>>       /* Set of nat external ips on the router. */
>> @@ -789,6 +791,7 @@ init_nat_entries(struct ovn_datapath *od)
>>               od->has_distributed_nat = true;
>>           }
>>       }
>> +    od->n_nat_entries = od->nbr->n_nat;
>>   }
>>   
>>   static void
>> @@ -802,7 +805,7 @@ destroy_nat_entries(struct ovn_datapath *od)
>>       destroy_lport_addresses(&od->dnat_force_snat_addrs);
>>       destroy_lport_addresses(&od->lb_force_snat_addrs);
>>   
>> -    for (size_t i = 0; i < od->nbr->n_nat; i++) {
>> +    for (size_t i = 0; i < od->n_nat_entries; i++) {
>>           destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
>>       }
>>   }
>>
>
Mark Michelson Sept. 15, 2021, 5:58 p.m. UTC | #3
On 9/14/21 1:27 PM, Mark Gray wrote:
> On 13/09/2021 21:41, Mark Michelson wrote:
>> Hi Mark,
>>
>> Of all the patches in this series, this is the one I am least sure about.
> Thanks for looking over these.
> 
>>
>> It seems simple enough, but it's not clear to me
>>
>> 1) Why would the number of nat entries on an ovn_datapath differ from
>> what's in the database? Since this series doesn't introduce any "real"
>> incremental processing, we should be recreating the ovn_datapath on
>> every iteration. And on each iteration, we should have a fresh view of
>> the database, so why would the values differ?
>> 2) Why is this only an issue in destroy_nat_entries()? There are several
>> other places in the code where we iterate over the number of nat entries
>> using od->nbr->n_nat. Why are those not affected?
>> 3) How does this patch address that issue? The patch just sets the
>> ovn_datapath's n_nat_entries to be the same as the database's count,
>> which according to the commit message, is not always accurate.
>>
>> Can you provide some clarity here? Maybe tell of a situation where
>> od->nbr->n_nat is incorrect?
> 
> You are absolutely correct as the details in the commit message are not
> sufficient to understand. In fact, I had a hard time remembering the
> reason myself!
> 
> IIRC, on each iteration, we need to call en_runtime_run(). This needs to
> initialize the runtime data .. but .. it also needs to destroy the
> runtime data from the previous run. This may not be the best way to do
> it, but the I-P framework doesn't have a convenient way to clear data
> between runs.
> 
> Consider an example in which we have completed iteration x, and started
> iteration x+1. If we add a NAT entry between iteration x and iteration
> x+1, od->nbr->n_nat will contain the updated value of n_nat as the
> northbound record will have been updated. However, as we have not yet
> (re)initialized the nat entries in od, od->n_nat_entries will be equal
> to the previous value. These two values could then be different which
> makes this loop invalid. Does this make sense? If so, shall I update the
> commit message? How would you like me to proceed?

OK, this makes a lot more sense now. It makes sense that the issue 
specifically would be seen when destroying datapaths and not at any 
other point.

Here is my suggestion for how to proceed:
1) Update the commit message to give an explanation similar to what 
you've given here.

2) Correct the finding that I just found right now and that I mention 
down below.

> 
>>
>> On 9/3/21 8:21 AM, Mark Gray wrote:
>>> destroy_nat_entries() iterates over nat_entries using the
>>> number of NAT entries from the NB database. When using I-P,
>>> this behaviour is incorrect as the number of NAT entries in
>>> the NBDB may differ from that stored in 'struct ovn_datapath'
>>> causing unexpected behaviour.
>>>
>>> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
>>> ---
>>>    northd/northd.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 724baa3994d5..829c4479f14b 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -605,6 +605,8 @@ struct ovn_datapath {
>>>    
>>>        /* NAT entries configured on the router. */
>>>        struct ovn_nat *nat_entries;
>>> +    size_t n_nat_entries;
>>> +

n_nat_entries is declared and never used in this function.

>>>        bool has_distributed_nat;
>>>    
>>>        /* Set of nat external ips on the router. */
>>> @@ -789,6 +791,7 @@ init_nat_entries(struct ovn_datapath *od)
>>>                od->has_distributed_nat = true;
>>>            }
>>>        }
>>> +    od->n_nat_entries = od->nbr->n_nat;
>>>    }
>>>    
>>>    static void
>>> @@ -802,7 +805,7 @@ destroy_nat_entries(struct ovn_datapath *od)
>>>        destroy_lport_addresses(&od->dnat_force_snat_addrs);
>>>        destroy_lport_addresses(&od->lb_force_snat_addrs);
>>>    
>>> -    for (size_t i = 0; i < od->nbr->n_nat; i++) {
>>> +    for (size_t i = 0; i < od->n_nat_entries; i++) {
>>>            destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
>>>        }
>>>    }
>>>
>>
>
Mark Gray Sept. 29, 2021, 5:26 p.m. UTC | #4
On 15/09/2021 18:58, Mark Michelson wrote:
> On 9/14/21 1:27 PM, Mark Gray wrote:
>> On 13/09/2021 21:41, Mark Michelson wrote:
>>> Hi Mark,
>>>
>>> Of all the patches in this series, this is the one I am least sure about.
>> Thanks for looking over these.
>>
>>>
>>> It seems simple enough, but it's not clear to me
>>>
>>> 1) Why would the number of nat entries on an ovn_datapath differ from
>>> what's in the database? Since this series doesn't introduce any "real"
>>> incremental processing, we should be recreating the ovn_datapath on
>>> every iteration. And on each iteration, we should have a fresh view of
>>> the database, so why would the values differ?
>>> 2) Why is this only an issue in destroy_nat_entries()? There are several
>>> other places in the code where we iterate over the number of nat entries
>>> using od->nbr->n_nat. Why are those not affected?
>>> 3) How does this patch address that issue? The patch just sets the
>>> ovn_datapath's n_nat_entries to be the same as the database's count,
>>> which according to the commit message, is not always accurate.
>>>
>>> Can you provide some clarity here? Maybe tell of a situation where
>>> od->nbr->n_nat is incorrect?
>>
>> You are absolutely correct as the details in the commit message are not
>> sufficient to understand. In fact, I had a hard time remembering the
>> reason myself!
>>
>> IIRC, on each iteration, we need to call en_runtime_run(). This needs to
>> initialize the runtime data .. but .. it also needs to destroy the
>> runtime data from the previous run. This may not be the best way to do
>> it, but the I-P framework doesn't have a convenient way to clear data
>> between runs.
>>
>> Consider an example in which we have completed iteration x, and started
>> iteration x+1. If we add a NAT entry between iteration x and iteration
>> x+1, od->nbr->n_nat will contain the updated value of n_nat as the
>> northbound record will have been updated. However, as we have not yet
>> (re)initialized the nat entries in od, od->n_nat_entries will be equal
>> to the previous value. These two values could then be different which
>> makes this loop invalid. Does this make sense? If so, shall I update the
>> commit message? How would you like me to proceed?
> 
> OK, this makes a lot more sense now. It makes sense that the issue 
> specifically would be seen when destroying datapaths and not at any 
> other point.
> 
> Here is my suggestion for how to proceed:
> 1) Update the commit message to give an explanation similar to what 
> you've given here.

I posted a new series and added the comment.
> 2) Correct the finding that I just found right now and that I mention 
> down below.

I forgot to do this, I will fix in v2. Also, I am surprised I didn't see
this as I though I compiled with --enable-Werror.
> 
>>
>>>
>>> On 9/3/21 8:21 AM, Mark Gray wrote:
>>>> destroy_nat_entries() iterates over nat_entries using the
>>>> number of NAT entries from the NB database. When using I-P,
>>>> this behaviour is incorrect as the number of NAT entries in
>>>> the NBDB may differ from that stored in 'struct ovn_datapath'
>>>> causing unexpected behaviour.
>>>>
>>>> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
>>>> ---
>>>>    northd/northd.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 724baa3994d5..829c4479f14b 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -605,6 +605,8 @@ struct ovn_datapath {
>>>>    
>>>>        /* NAT entries configured on the router. */
>>>>        struct ovn_nat *nat_entries;
>>>> +    size_t n_nat_entries;
>>>> +
> 
> n_nat_entries is declared and never used in this function.
> 
>>>>        bool has_distributed_nat;
>>>>    
>>>>        /* Set of nat external ips on the router. */
>>>> @@ -789,6 +791,7 @@ init_nat_entries(struct ovn_datapath *od)
>>>>                od->has_distributed_nat = true;
>>>>            }
>>>>        }
>>>> +    od->n_nat_entries = od->nbr->n_nat;
>>>>    }
>>>>    
>>>>    static void
>>>> @@ -802,7 +805,7 @@ destroy_nat_entries(struct ovn_datapath *od)
>>>>        destroy_lport_addresses(&od->dnat_force_snat_addrs);
>>>>        destroy_lport_addresses(&od->lb_force_snat_addrs);
>>>>    
>>>> -    for (size_t i = 0; i < od->nbr->n_nat; i++) {
>>>> +    for (size_t i = 0; i < od->n_nat_entries; i++) {
>>>>            destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
>>>>        }
>>>>    }
>>>>
>>>
>>
>
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 724baa3994d5..829c4479f14b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -605,6 +605,8 @@  struct ovn_datapath {
 
     /* NAT entries configured on the router. */
     struct ovn_nat *nat_entries;
+    size_t n_nat_entries;
+
     bool has_distributed_nat;
 
     /* Set of nat external ips on the router. */
@@ -789,6 +791,7 @@  init_nat_entries(struct ovn_datapath *od)
             od->has_distributed_nat = true;
         }
     }
+    od->n_nat_entries = od->nbr->n_nat;
 }
 
 static void
@@ -802,7 +805,7 @@  destroy_nat_entries(struct ovn_datapath *od)
     destroy_lport_addresses(&od->dnat_force_snat_addrs);
     destroy_lport_addresses(&od->lb_force_snat_addrs);
 
-    for (size_t i = 0; i < od->nbr->n_nat; i++) {
+    for (size_t i = 0; i < od->n_nat_entries; i++) {
         destroy_lport_addresses(&od->nat_entries[i].ext_addrs);
     }
 }