
[ovs-dev] northd: Don't get stuck in the STATE_INIT_HASH_SIZES state.

Message ID 20220823144245.124676-1-dceara@redhat.com
State Accepted
Headers show
Series [ovs-dev] northd: Don't get stuck in the STATE_INIT_HASH_SIZES state.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Dumitru Ceara Aug. 23, 2022, 2:42 p.m. UTC
After removing the possibility of disabling logical datapath groups, the
above-mentioned state should only be relevant when ovn-northd is started
with more than one lflow processing thread (--n-threads > 1), to avoid
a very inefficient first lflow processing run due to a suboptimally
sized hmap.

There's no need to ever go back to this state now that we don't allow
users to disable logical datapath groups.  Moreover, the previous code
was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
iteration, essentially disabling parallelization.

Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 northd/northd.c | 10 ----------
 1 file changed, 10 deletions(-)

Comments

Ilya Maximets Aug. 24, 2022, 3:43 p.m. UTC | #1
On 8/23/22 16:42, Dumitru Ceara wrote:
> After removing the possibility of disabling logical datapath groups the
> above mentioned state should only be relevant when ovn-northd is started
> with more than one lflow processing threads (--n-threads > 1), to avoid
> a very inefficient first lflow processing run due to a suboptimally
> sized hmap.
> 
> There's no need to ever go back to this state now that we don't allow
> users to disable logical datapath groups.  Moreover, the previous code
> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
> iteration, essentially disabling parallelization.
> 
> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---

Hi, Dumitru.  Thanks for the patch!
I've been using it in multiple scale tests since it was posted,
and it does re-enable parallel lflow build.

One issue, though.  I'm not sure how to fix it or what the desired
behavior is, but here's what I noticed:

 1. A cluster with 3 northd instances starts up.
 2. After a while, all of them have been active at least once, due
    to leadership transfers or other things.
 3. If one instance stays active for long, a lot of database changes
    can accumulate.
 4. Once the leadership flips, the new active northd is not prepared
    for the number of logical flows it suddenly needs to manage.  So
    it starts parallel processing with an hmap significantly smaller
    than it actually needs.  That iteration takes noticeably more
    time than a usual one.  After that first iteration, it can
    process lflows at normal speed.

So, I wonder if we can fall into a catastrophic scenario where
the current hash map size is very small and the number of flows after
re-connection is very high, so the threads just fight over locks
all the time.

Do you have some thoughts on this?

Should we disable parallelism when northd goes into stand-up state?

Best regards, Ilya Maximets.

>  northd/northd.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 7e2681865..33943bfaf 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>      }
>  }
>  
> -static void init_worker_pool(void)
> -{
> -    /* If parallelization is enabled, make sure locks are initialized. */
> -    if (parallelization_state != STATE_NULL) {
> -        lflow_hash_lock_init();
> -        parallelization_state = STATE_INIT_HASH_SIZES;
> -    }
> -}
> -
>  static void
>  build_mcast_groups(struct lflow_input *data,
>                     const struct hmap *datapaths,
> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>                  struct ovsdb_idl_txn *ovnsb_txn)
>  {
>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
> -    init_worker_pool();
>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>                   input_data->sbrec_chassis_by_name,
>                   input_data->sbrec_chassis_by_hostname);
Ilya Maximets Aug. 24, 2022, 3:49 p.m. UTC | #2
On 8/24/22 17:43, Ilya Maximets wrote:
> On 8/23/22 16:42, Dumitru Ceara wrote:
>> After removing the possibility of disabling logical datapath groups the
>> above mentioned state should only be relevant when ovn-northd is started
>> with more than one lflow processing threads (--n-threads > 1), to avoid
>> a very inefficient first lflow processing run due to a suboptimally
>> sized hmap.
>>
>> There's no need to ever go back to this state now that we don't allow
>> users to disable logical datapath groups.  Moreover, the previous code
>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>> iteration, essentially disabling parallelization.
>>
>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>> ---
> 
> Hi, Dumitru.  Thanks for the patch!
> I've been using it in multiple scale tests since it was posted,
> and it does re-enable parallel lflow build.
> 
> One issue though.  Not sure how to fix that or what is the desired
> behavior, but what I noticed is:
> 
>  1. Cluster with 3 northd instances starts up.
>  2. After a while all of them has been active at least once, due
>     to leadership transfers or other things.
>  3. If one instance stays active for long, a lot of database changes
>     can be accumulated.
>  4. Once the leadership flips, new active northd is not prepared for
>     the amount of logical flows it suddenly needs to manage.  So,
>     it starts parallel processing with hmap significantly smaller
>     than it actually needs right now.  That iteration takes noticeably
>     more time than a usual one.  After the first iteration after
>     becoming active, it can process lflows with a normal speed.
> 
> So, I wonder if we can fall into a catastrophic scenario where
> the current hash map size is very small and amount of flows after
> re-connection is very high, so threads are only fighting for locks
> all the time.
> 
> Do you have some thoughts on this?
> 
> Should we disable parallelism when northd goes into stand-up state?

* I meant 'standby', of course.  :/

> 
> Best regards, Ilya Maximets.
> 
>>  northd/northd.c | 10 ----------
>>  1 file changed, 10 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 7e2681865..33943bfaf 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>>      }
>>  }
>>  
>> -static void init_worker_pool(void)
>> -{
>> -    /* If parallelization is enabled, make sure locks are initialized. */
>> -    if (parallelization_state != STATE_NULL) {
>> -        lflow_hash_lock_init();
>> -        parallelization_state = STATE_INIT_HASH_SIZES;
>> -    }
>> -}
>> -
>>  static void
>>  build_mcast_groups(struct lflow_input *data,
>>                     const struct hmap *datapaths,
>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>>                  struct ovsdb_idl_txn *ovnsb_txn)
>>  {
>>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>> -    init_worker_pool();
>>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>>                   input_data->sbrec_chassis_by_name,
>>                   input_data->sbrec_chassis_by_hostname);
>
Dumitru Ceara Aug. 24, 2022, 3:57 p.m. UTC | #3
On 8/24/22 17:49, Ilya Maximets wrote:
> On 8/24/22 17:43, Ilya Maximets wrote:
>> On 8/23/22 16:42, Dumitru Ceara wrote:
>>> After removing the possibility of disabling logical datapath groups the
>>> above mentioned state should only be relevant when ovn-northd is started
>>> with more than one lflow processing threads (--n-threads > 1), to avoid
>>> a very inefficient first lflow processing run due to a suboptimally
>>> sized hmap.
>>>
>>> There's no need to ever go back to this state now that we don't allow
>>> users to disable logical datapath groups.  Moreover, the previous code
>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>>> iteration, essentially disabling parallelization.
>>>
>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>> ---
>>
>> Hi, Dumitru.  Thanks for the patch!

Hi Ilya,

>> I've been using it in multiple scale tests since it was posted,
>> and it does re-enable parallel lflow build.
>>

Thanks for trying it out!

>> One issue though.  Not sure how to fix that or what is the desired
>> behavior, but what I noticed is:
>>
>>  1. Cluster with 3 northd instances starts up.
>>  2. After a while all of them has been active at least once, due
>>     to leadership transfers or other things.
>>  3. If one instance stays active for long, a lot of database changes
>>     can be accumulated.
>>  4. Once the leadership flips, new active northd is not prepared for
>>     the amount of logical flows it suddenly needs to manage.  So,
>>     it starts parallel processing with hmap significantly smaller
>>     than it actually needs right now.  That iteration takes noticeably
>>     more time than a usual one.  After the first iteration after
>>     becoming active, it can process lflows with a normal speed.
>>
>> So, I wonder if we can fall into a catastrophic scenario where
>> the current hash map size is very small and amount of flows after
>> re-connection is very high, so threads are only fighting for locks
>> all the time.
>>
>> Do you have some thoughts on this?
>>

This seems like a possible scenario (although I wonder how often we'd
hit it in practice).

But this was a possible scenario before 9dea0c090e91 ("nb: Remove
possibility of disabling logical datapath groups.") too, right?

>> Should we disable parallelism when northd goes into stand-up state?
> 
> * I meant 'standby', of course.  :/
> 

We could, but that means every time northd becomes active it would do
one inefficient first run.  Can we instead store the max_seen_lflow_size
value in the SB database and use the one we read from the DB?

>>
>> Best regards, Ilya Maximets.
>>

Thanks,
Dumitru

>>>  northd/northd.c | 10 ----------
>>>  1 file changed, 10 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 7e2681865..33943bfaf 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>>>      }
>>>  }
>>>  
>>> -static void init_worker_pool(void)
>>> -{
>>> -    /* If parallelization is enabled, make sure locks are initialized. */
>>> -    if (parallelization_state != STATE_NULL) {
>>> -        lflow_hash_lock_init();
>>> -        parallelization_state = STATE_INIT_HASH_SIZES;
>>> -    }
>>> -}
>>> -
>>>  static void
>>>  build_mcast_groups(struct lflow_input *data,
>>>                     const struct hmap *datapaths,
>>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>>>                  struct ovsdb_idl_txn *ovnsb_txn)
>>>  {
>>>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>>> -    init_worker_pool();
>>>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>>>                   input_data->sbrec_chassis_by_name,
>>>                   input_data->sbrec_chassis_by_hostname);
>>
>
Ilya Maximets Aug. 24, 2022, 4:32 p.m. UTC | #4
On 8/24/22 17:57, Dumitru Ceara wrote:
> On 8/24/22 17:49, Ilya Maximets wrote:
>> On 8/24/22 17:43, Ilya Maximets wrote:
>>> On 8/23/22 16:42, Dumitru Ceara wrote:
>>>> After removing the possibility of disabling logical datapath groups the
>>>> above mentioned state should only be relevant when ovn-northd is started
>>>> with more than one lflow processing threads (--n-threads > 1), to avoid
>>>> a very inefficient first lflow processing run due to a suboptimally
>>>> sized hmap.
>>>>
>>>> There's no need to ever go back to this state now that we don't allow
>>>> users to disable logical datapath groups.  Moreover, the previous code
>>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>>>> iteration, essentially disabling parallelization.
>>>>
>>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>> ---
>>>
>>> Hi, Dumitru.  Thanks for the patch!
> 
> Hi Ilya,
> 
>>> I've been using it in multiple scale tests since it was posted,
>>> and it does re-enable parallel lflow build.
>>>
> 
> Thanks for trying it out!
> 
>>> One issue though.  Not sure how to fix that or what is the desired
>>> behavior, but what I noticed is:
>>>
>>>  1. Cluster with 3 northd instances starts up.
>>>  2. After a while all of them has been active at least once, due
>>>     to leadership transfers or other things.
>>>  3. If one instance stays active for long, a lot of database changes
>>>     can be accumulated.
>>>  4. Once the leadership flips, new active northd is not prepared for
>>>     the amount of logical flows it suddenly needs to manage.  So,
>>>     it starts parallel processing with hmap significantly smaller
>>>     than it actually needs right now.  That iteration takes noticeably
>>>     more time than a usual one.  After the first iteration after
>>>     becoming active, it can process lflows with a normal speed.
>>>
>>> So, I wonder if we can fall into a catastrophic scenario where
>>> the current hash map size is very small and amount of flows after
>>> re-connection is very high, so threads are only fighting for locks
>>> all the time.
>>>
>>> Do you have some thoughts on this?
>>>
> 
> This seems like a possible scenario (although I wonder how often we'd
> hit it in practice).
> 
> But this was a possible scenario before 9dea0c090e91 ("nb: Remove
> possibility of disabling logical datapath groups.") too, right?

Probably, yes.  There's no need to block this particular patch, as
it fixes a separate bug.  So,

Acked-by: Ilya Maximets <i.maximets@ovn.org>

> 
>>> Should we disable parallelism when northd goes into stand-up state?
>>
>> * I meant 'standby', of course.  :/
>>
> 
> We could, but that means every time northd becomes active it will do one
> first inefficient run.  Can we instead store the max_seen_lflow_size
> value in the SB database and use the one we read from the DB?

It's an interesting idea, if it can be implemented efficiently.  I'm not
sure how much extra traffic or how many northd wakeups it would generate.
It might make more sense to store it in the NB, just to avoid spamming all
the controllers with monitor-all.  Also, the NB DB generally has more room
performance-wise.

> 
>>>
>>> Best regards, Ilya Maximets.
>>>
> 
> Thanks,
> Dumitru
> 
>>>>  northd/northd.c | 10 ----------
>>>>  1 file changed, 10 deletions(-)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 7e2681865..33943bfaf 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -14164,15 +14164,6 @@ void run_update_worker_pool(int n_threads)
>>>>      }
>>>>  }
>>>>  
>>>> -static void init_worker_pool(void)
>>>> -{
>>>> -    /* If parallelization is enabled, make sure locks are initialized. */
>>>> -    if (parallelization_state != STATE_NULL) {
>>>> -        lflow_hash_lock_init();
>>>> -        parallelization_state = STATE_INIT_HASH_SIZES;
>>>> -    }
>>>> -}
>>>> -
>>>>  static void
>>>>  build_mcast_groups(struct lflow_input *data,
>>>>                     const struct hmap *datapaths,
>>>> @@ -15762,7 +15753,6 @@ void northd_run(struct northd_input *input_data,
>>>>                  struct ovsdb_idl_txn *ovnsb_txn)
>>>>  {
>>>>      stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
>>>> -    init_worker_pool();
>>>>      ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
>>>>                   input_data->sbrec_chassis_by_name,
>>>>                   input_data->sbrec_chassis_by_hostname);
>>>
>>
>
Dumitru Ceara Aug. 25, 2022, 9:03 a.m. UTC | #5
On 8/24/22 18:32, Ilya Maximets wrote:
> On 8/24/22 17:57, Dumitru Ceara wrote:
>> On 8/24/22 17:49, Ilya Maximets wrote:
>>> On 8/24/22 17:43, Ilya Maximets wrote:
>>>> On 8/23/22 16:42, Dumitru Ceara wrote:
>>>>> After removing the possibility of disabling logical datapath groups the
>>>>> above mentioned state should only be relevant when ovn-northd is started
>>>>> with more than one lflow processing threads (--n-threads > 1), to avoid
>>>>> a very inefficient first lflow processing run due to a suboptimally
>>>>> sized hmap.
>>>>>
>>>>> There's no need to ever go back to this state now that we don't allow
>>>>> users to disable logical datapath groups.  Moreover, the previous code
>>>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>>>>> iteration, essentially disabling parallelization.
>>>>>
>>>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>>> ---
>>>>
>>>> Hi, Dumitru.  Thanks for the patch!
>>
>> Hi Ilya,
>>
>>>> I've been using it in multiple scale tests since it was posted,
>>>> and it does re-enable parallel lflow build.
>>>>
>>
>> Thanks for trying it out!
>>
>>>> One issue though.  Not sure how to fix that or what is the desired
>>>> behavior, but what I noticed is:
>>>>
>>>>  1. Cluster with 3 northd instances starts up.
>>>>  2. After a while all of them has been active at least once, due
>>>>     to leadership transfers or other things.
>>>>  3. If one instance stays active for long, a lot of database changes
>>>>     can be accumulated.
>>>>  4. Once the leadership flips, new active northd is not prepared for
>>>>     the amount of logical flows it suddenly needs to manage.  So,
>>>>     it starts parallel processing with hmap significantly smaller
>>>>     than it actually needs right now.  That iteration takes noticeably
>>>>     more time than a usual one.  After the first iteration after
>>>>     becoming active, it can process lflows with a normal speed.
>>>>
>>>> So, I wonder if we can fall into a catastrophic scenario where
>>>> the current hash map size is very small and amount of flows after
>>>> re-connection is very high, so threads are only fighting for locks
>>>> all the time.
>>>>
>>>> Do you have some thoughts on this?
>>>>
>>
>> This seems like a possible scenario (although I wonder how often we'd
>> hit it in practice).
>>
>> But this was a possible scenario before 9dea0c090e91 ("nb: Remove
>> possibility of disabling logical datapath groups.") too, right?
> 
> Probably, yes.  There is no need to block this particular patch as
> it is fixing the separate bug. So,
> 
> Acked-by: Ilya Maximets <i.maximets@ovn.org>
> 

Thanks!

>>
>>>> Should we disable parallelism when northd goes into stand-up state?
>>>
>>> * I meant 'standby', of course.  :/
>>>
>>
>> We could, but that means every time northd becomes active it will do one
>> first inefficient run.  Can we instead store the max_seen_lflow_size
>> value in the SB database and use the one we read from the DB?
> 
> It's an interesting idea, if it can be implemented efficiently.  I'm not
> sure how much extra traffic or northd wakeups that will generate.
> It might make more sense to store it in NB, just to avoid spamming all
> the controllers with monitor-all.  Also, NB DB generally has more room
> performance-wise.
> 

I'll try to work on a patch to store it in NB; we already write to NB
for other stuff like Logical_Switch_Port.up so it should be acceptable.

I'll run some benchmarks to see if there's any measurable negative
impact because of this and report back.

Regards,
Dumitru
Ilya Maximets Sept. 9, 2022, 7:59 a.m. UTC | #6
On 8/25/22 11:03, Dumitru Ceara wrote:
> On 8/24/22 18:32, Ilya Maximets wrote:
>> On 8/24/22 17:57, Dumitru Ceara wrote:
>>> On 8/24/22 17:49, Ilya Maximets wrote:
>>>> On 8/24/22 17:43, Ilya Maximets wrote:
>>>>> On 8/23/22 16:42, Dumitru Ceara wrote:
>>>>>> After removing the possibility of disabling logical datapath groups the
>>>>>> above mentioned state should only be relevant when ovn-northd is started
>>>>>> with more than one lflow processing threads (--n-threads > 1), to avoid
>>>>>> a very inefficient first lflow processing run due to a suboptimally
>>>>>> sized hmap.
>>>>>>
>>>>>> There's no need to ever go back to this state now that we don't allow
>>>>>> users to disable logical datapath groups.  Moreover, the previous code
>>>>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
>>>>>> iteration, essentially disabling parallelization.
>>>>>>
>>>>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
>>>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>>>> ---
>>>>>
>>>>> Hi, Dumitru.  Thanks for the patch!
>>>
>>> Hi Ilya,
>>>
>>>>> I've been using it in multiple scale tests since it was posted,
>>>>> and it does re-enable parallel lflow build.
>>>>>
>>>
>>> Thanks for trying it out!
>>>
>>>>> One issue though.  Not sure how to fix that or what is the desired
>>>>> behavior, but what I noticed is:
>>>>>
>>>>>  1. Cluster with 3 northd instances starts up.
>>>>>  2. After a while all of them has been active at least once, due
>>>>>     to leadership transfers or other things.
>>>>>  3. If one instance stays active for long, a lot of database changes
>>>>>     can be accumulated.
>>>>>  4. Once the leadership flips, new active northd is not prepared for
>>>>>     the amount of logical flows it suddenly needs to manage.  So,
>>>>>     it starts parallel processing with hmap significantly smaller
>>>>>     than it actually needs right now.  That iteration takes noticeably
>>>>>     more time than a usual one.  After the first iteration after
>>>>>     becoming active, it can process lflows with a normal speed.
>>>>>
>>>>> So, I wonder if we can fall into a catastrophic scenario where
>>>>> the current hash map size is very small and amount of flows after
>>>>> re-connection is very high, so threads are only fighting for locks
>>>>> all the time.
>>>>>
>>>>> Do you have some thoughts on this?
>>>>>
>>>
>>> This seems like a possible scenario (although I wonder how often we'd
>>> hit it in practice).
>>>
>>> But this was a possible scenario before 9dea0c090e91 ("nb: Remove
>>> possibility of disabling logical datapath groups.") too, right?
>>
>> Probably, yes.  There is no need to block this particular patch as
>> it is fixing the separate bug. So,
>>
>> Acked-by: Ilya Maximets <i.maximets@ovn.org>
>>
> 
> Thanks!
> 
>>>
>>>>> Should we disable parallelism when northd goes into stand-up state?
>>>>
>>>> * I meant 'standby', of course.  :/
>>>>
>>>
>>> We could, but that means every time northd becomes active it will do one
>>> first inefficient run.  Can we instead store the max_seen_lflow_size
>>> value in the SB database and use the one we read from the DB?
>>
>> It's an interesting idea, if it can be implemented efficiently.  I'm not
>> sure how much extra traffic or northd wakeups that will generate.
>> It might make more sense to store it in NB, just to avoid spamming all
>> the controllers with monitor-all.  Also, NB DB generally has more room
>> performance-wise.
>>
> 
> I'll try to work on a patch to store it in NB; we already write to NB
> for other stuff like Logical_Switch_Port.up so it should be acceptable.
> 
> I'll run some benchmarks to see if there's any measurable negative
> impact because of this and report back.

Sounds reasonable.

Meanwhile, Han, Mark, Numan, can we please accept the current fix?

The parallelization is completely broken without it on both the main
branch and branch-22.09 as well.

Thanks!
Best regards, Ilya Maximets.
Numan Siddique Sept. 15, 2022, 2:52 p.m. UTC | #7
On Fri, Sep 9, 2022 at 4:00 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 8/25/22 11:03, Dumitru Ceara wrote:
> > On 8/24/22 18:32, Ilya Maximets wrote:
> >> On 8/24/22 17:57, Dumitru Ceara wrote:
> >>> On 8/24/22 17:49, Ilya Maximets wrote:
> >>>> On 8/24/22 17:43, Ilya Maximets wrote:
> >>>>> On 8/23/22 16:42, Dumitru Ceara wrote:
> >>>>>> After removing the possibility of disabling logical datapath groups the
> >>>>>> above mentioned state should only be relevant when ovn-northd is started
> >>>>>> with more than one lflow processing threads (--n-threads > 1), to avoid
> >>>>>> a very inefficient first lflow processing run due to a suboptimally
> >>>>>> sized hmap.
> >>>>>>
> >>>>>> There's no need to ever go back to this state now that we don't allow
> >>>>>> users to disable logical datapath groups.  Moreover, the previous code
> >>>>>> was wrongfully setting the state to STATE_INIT_HASH_SIZES at every
> >>>>>> iteration, essentially disabling parallelization.
> >>>>>>
> >>>>>> Fixes: 9dea0c090e91 ("nb: Remove possibility of disabling logical datapath groups.")
> >>>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> >>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >>>>>> ---
> >>>>>
> >>>>> Hi, Dumitru.  Thanks for the patch!
> >>>
> >>> Hi Ilya,
> >>>
> >>>>> I've been using it in multiple scale tests since it was posted,
> >>>>> and it does re-enable parallel lflow build.
> >>>>>
> >>>
> >>> Thanks for trying it out!
> >>>
> >>>>> One issue though.  Not sure how to fix that or what is the desired
> >>>>> behavior, but what I noticed is:
> >>>>>
> >>>>>  1. Cluster with 3 northd instances starts up.
> >>>>>  2. After a while all of them has been active at least once, due
> >>>>>     to leadership transfers or other things.
> >>>>>  3. If one instance stays active for long, a lot of database changes
> >>>>>     can be accumulated.
> >>>>>  4. Once the leadership flips, new active northd is not prepared for
> >>>>>     the amount of logical flows it suddenly needs to manage.  So,
> >>>>>     it starts parallel processing with hmap significantly smaller
> >>>>>     than it actually needs right now.  That iteration takes noticeably
> >>>>>     more time than a usual one.  After the first iteration after
> >>>>>     becoming active, it can process lflows with a normal speed.
> >>>>>
> >>>>> So, I wonder if we can fall into a catastrophic scenario where
> >>>>> the current hash map size is very small and amount of flows after
> >>>>> re-connection is very high, so threads are only fighting for locks
> >>>>> all the time.
> >>>>>
> >>>>> Do you have some thoughts on this?
> >>>>>
> >>>
> >>> This seems like a possible scenario (although I wonder how often we'd
> >>> hit it in practice).
> >>>
> >>> But this was a possible scenario before 9dea0c090e91 ("nb: Remove
> >>> possibility of disabling logical datapath groups.") too, right?
> >>
> >> Probably, yes.  There is no need to block this particular patch as
> >> it is fixing the separate bug. So,
> >>
> >> Acked-by: Ilya Maximets <i.maximets@ovn.org>
> >>
> >
> > Thanks!
> >
> >>>
> >>>>> Should we disable parallelism when northd goes into stand-up state?
> >>>>
> >>>> * I meant 'standby', of course.  :/
> >>>>
> >>>
> >>> We could, but that means every time northd becomes active it will do one
> >>> first inefficient run.  Can we instead store the max_seen_lflow_size
> >>> value in the SB database and use the one we read from the DB?
> >>
> >> It's an interesting idea, if it can be implemented efficiently.  I'm not
> >> sure how much extra traffic or northd wakeups that will generate.
> >> It might make more sense to store it in NB, just to avoid spamming all
> >> the controllers with monitor-all.  Also, NB DB generally has more room
> >> performance-wise.
> >>
> >
> > I'll try to work on a patch to store it in NB; we already write to NB
> > for other stuff like Logical_Switch_Port.up so it should be acceptable.
> >
> > I'll run some benchmarks to see if there's any measurable negative
> > impact because of this and report back.
>
> Sounds reasonable.
>
> Meanwhile, Han, Mark, Numan, can we, please, accept the current fix?

Thanks.  I applied this patch to main and to branch-22.09.

Numan

>
> The parallelization is completely broken without it on both the main
> branch and branch-22.09 as well.
>
> Thanks!
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 7e2681865..33943bfaf 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14164,15 +14164,6 @@  void run_update_worker_pool(int n_threads)
     }
 }
 
-static void init_worker_pool(void)
-{
-    /* If parallelization is enabled, make sure locks are initialized. */
-    if (parallelization_state != STATE_NULL) {
-        lflow_hash_lock_init();
-        parallelization_state = STATE_INIT_HASH_SIZES;
-    }
-}
-
 static void
 build_mcast_groups(struct lflow_input *data,
                    const struct hmap *datapaths,
@@ -15762,7 +15753,6 @@  void northd_run(struct northd_input *input_data,
                 struct ovsdb_idl_txn *ovnsb_txn)
 {
     stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
-    init_worker_pool();
     ovnnb_db_run(input_data, data, ovnnb_txn, ovnsb_txn,
                  input_data->sbrec_chassis_by_name,
                  input_data->sbrec_chassis_by_hostname);