diff mbox series

[ovs-dev,v8,3/3] northd: Restore parallel build with dp_groups

Message ID 20210915124340.1765-3-anton.ivanov@cambridgegreys.com
State Accepted
Headers show
Series [ovs-dev,v8,1/3] northd: Disable parallel processing for logical_dp_groups | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Anton Ivanov Sept. 15, 2021, 12:43 p.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Restore parallel build with dp groups using an rwlock instead
of per-row locking as the underlying mechanism.

This provides an improvement of ~10% end-to-end on ovn-heater
under virtualization, despite awakening some qemu gremlin
which makes qemu climb to silly CPU usage. The gain on
bare metal is likely to be higher.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 northd/ovn-northd.c | 150 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 127 insertions(+), 23 deletions(-)
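
As a rough, self-contained sketch of the locking pattern the commit message
refers to (not the actual patch: it uses a plain pthread rwlock and a toy
string table instead of northd's datapath-group structures and OVS's own lock
wrappers), lookups take the lock for reading so worker threads can search
concurrently, and only an insertion takes the write lock, re-checking after
acquiring it:

    /* Toy illustration: a read-mostly lookup table guarded by a rwlock. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <string.h>

    #define MAX_GROUPS 1024

    static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
    static const char *groups[MAX_GROUPS];
    static int n_groups;

    static bool
    group_exists(const char *key)
    {
        for (int i = 0; i < n_groups; i++) {
            if (!strcmp(groups[i], key)) {
                return true;
            }
        }
        return false;
    }

    /* Ensures 'key' is present; returns false only if the table is full. */
    bool
    lookup_or_insert_group(const char *key)
    {
        /* Fast path: many worker threads may search concurrently. */
        pthread_rwlock_rdlock(&table_lock);
        bool found = group_exists(key);
        pthread_rwlock_unlock(&table_lock);
        if (found) {
            return true;
        }

        /* Slow path: take the write lock for insertion; re-check first,
         * since another thread may have inserted the same key between the
         * unlock above and the wrlock here. */
        pthread_rwlock_wrlock(&table_lock);
        if (!group_exists(key) && n_groups < MAX_GROUPS) {
            groups[n_groups++] = key;
        }
        found = group_exists(key);
        pthread_rwlock_unlock(&table_lock);
        return found;
    }

Compared with per-row locking, concurrent lookups never serialize against each
other; only the comparatively rare insertions do.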

Comments

Han Zhou Sept. 29, 2021, 11:56 p.m. UTC | #1
On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
>
> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>
> Restore parallel build with dp groups using rwlock instead
> of per row locking as an underlying mechanism.
>
> This provides improvement ~ 10% end-to-end on ovn-heater
> under virtualization despite awakening some qemu gremlin
> which makes qemu climb to silly CPU usage. The gain on
> bare metal is likely to be higher.
>
Hi Anton,

I am trying to see the benefit of parallel_build, but I encountered an
unexpected performance result when running the perf tests with the command:
     make check-perf TESTSUITEFLAGS="--rebuild"

It shows significantly worse performance than without parallel_build. For
the dp_group = no cases it is better, but still ~30% slower than without
parallel_build. I have 24 cores, but no thread is consuming much CPU
except the main thread. I also tried hardcoding the number of threads to
just 4, which ended up with slightly better results, but still far behind
"without parallel_build".

                                        no parallel | parallel (24 pool threads) | parallel (4 pool threads)

    1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
    ---
    Maximum (NB in msec):                      1058 |                       4269 |                      4097
    Average (NB in msec):                836.941167 |                3697.253931 |               3498.311525
    Maximum (SB in msec):                        30 |                         30 |                        28
    Average (SB in msec):                 25.934011 |                  26.001840 |                 25.685091
    Maximum (northd-loop in msec):             1204 |                       4379 |                      4251
    Average (northd-loop in msec):      1005.330078 |                4233.871504 |               4022.774208

    2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
    ---
    Maximum (NB in msec):                      1124 |                       1480 |                      1331
    Average (NB in msec):                892.403405 |                1206.189287 |               1089.378455
    Maximum (SB in msec):                        29 |                         31 |                        30
    Average (SB in msec):                 26.922632 |                  26.636706 |                 25.657484
    Maximum (northd-loop in msec):             1275 |                       1639 |                      1495
    Average (northd-loop in msec):      1074.917873 |                1458.152327 |               1301.057201

    5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
    ---
    Maximum (NB in msec):                       768 |                       3086 |                      2876
    Average (NB in msec):                614.491938 |                2681.688365 |               2531.255444
    Maximum (SB in msec):                        18 |                         17 |                        18
    Average (SB in msec):                 16.347526 |                  15.955263 |                 16.278075
    Maximum (northd-loop in msec):              889 |                       3247 |                      3031
    Average (northd-loop in msec):       772.083572 |                3117.504297 |               2833.182361

    6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
    ---
    Maximum (NB in msec):                      1046 |                       1371 |                      1262
    Average (NB in msec):                827.735852 |                1135.514228 |                970.544792
    Maximum (SB in msec):                        19 |                         18 |                        19
    Average (SB in msec):                 16.828127 |                  16.083914 |                 15.602525
    Maximum (northd-loop in msec):             1163 |                       1545 |                      1411
    Average (northd-loop in msec):       972.567407 |                1328.617583 |               1207.667100

I haven't debugged it yet, but do you have any clue what the reason could be?
I am using the upstream commit 9242f27f63, which already includes this patch.
Below is my change to the perf-northd.at file, just to enable parallel_build:

diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..9328c2e21 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
 PERF_RECORD_START()

 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true

 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))

@@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
 PERF_RECORD_START()

 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true

 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))

Thanks,
Han
Anton Ivanov Sept. 30, 2021, 5:54 a.m. UTC | #2
I need to have a look.

I use the ovn-heater end-to-end test, which was showing a substantial 
improvement.

What are you running this on?

A.

On 30/09/2021 00:56, Han Zhou wrote:
>
>
> On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
> >
> > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> >
> > Restore parallel build with dp groups using rwlock instead
> > of per row locking as an underlying mechanism.
> >
> > This provides improvement ~ 10% end-to-end on ovn-heater
> > under virutalization despite awakening some qemu gremlin
> > which makes qemu climb to silly CPU usage. The gain on
> > bare metal is likely to be higher.
> >
> Hi Anton,
>
> I am trying to see the benefit of parallel_build, but encountered 
> unexpected performance result when running the perf tests with command:
>      make check-perf TESTSUITEFLAGS="--rebuild"
>
> It shows significantly worse performance than without parallel_build. 
> For dp_group = no cases, it is better, but still ~30% slower than 
> without parallel_build. I have 24 cores, but each thread is not 
> consuming much CPU except the main thread. I also tried hardcode the 
> number of thread to just 4, which end up with slightly better results, 
> but still far behind "without parallel_build".
Anton Ivanov Sept. 30, 2021, 6:16 a.m. UTC | #3
Results on a Ryzen 5 3600 - 6 cores 12 threads

Without


   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 1256
   Average (NB in msec): 679.463785
   Maximum (SB in msec): 25
   Average (SB in msec): 22.489798
   Maximum (northd-loop in msec): 1347
   Average (northd-loop in msec): 799.944878

   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=no
   ---
   Maximum (NB in msec): 1956
   Average (NB in msec): 809.387285
   Maximum (SB in msec): 24
   Average (SB in msec): 21.649258
   Maximum (northd-loop in msec): 2011
   Average (northd-loop in msec): 961.718686

   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 557
   Average (NB in msec): 474.010337
   Maximum (SB in msec): 15
   Average (SB in msec): 13.927192
   Maximum (northd-loop in msec): 1261
   Average (northd-loop in msec): 580.999122

   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=no
   ---
   Maximum (NB in msec): 756
   Average (NB in msec): 625.614724
   Maximum (SB in msec): 15
   Average (SB in msec): 14.181048
   Maximum (northd-loop in msec): 1649
   Average (northd-loop in msec): 746.208332


With

   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 1140
   Average (NB in msec): 631.125000
   Maximum (SB in msec): 24
   Average (SB in msec): 21.453609
   Maximum (northd-loop in msec): 6080
   Average (northd-loop in msec): 759.718815

   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=no
   ---
   Maximum (NB in msec): 1210
   Average (NB in msec): 673.000000
   Maximum (SB in msec): 27
   Average (SB in msec): 22.453125
   Maximum (northd-loop in msec): 6514
   Average (northd-loop in msec): 808.596842

   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 798
   Average (NB in msec): 429.750000
   Maximum (SB in msec): 15
   Average (SB in msec): 12.998533
   Maximum (northd-loop in msec): 3835
   Average (northd-loop in msec): 564.875986

   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical 
Ports/Hypervisor -- ovn-northd -- dp-groups=no
   ---
   Maximum (NB in msec): 1074
   Average (NB in msec): 593.875000
   Maximum (SB in msec): 14
   Average (SB in msec): 13.655273
   Maximum (northd-loop in msec): 4973
   Average (northd-loop in msec): 771.102605

The only one slower is test 6, which I will look into.

The rest are > 5% faster.

A.

On 30/09/2021 00:56, Han Zhou wrote:
>
>
> On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
> >
> > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> >
> > Restore parallel build with dp groups using rwlock instead
> > of per row locking as an underlying mechanism.
> >
> > This provides improvement ~ 10% end-to-end on ovn-heater
> > under virutalization despite awakening some qemu gremlin
> > which makes qemu climb to silly CPU usage. The gain on
> > bare metal is likely to be higher.
> >
> Hi Anton,
>
> I am trying to see the benefit of parallel_build, but encountered 
> unexpected performance result when running the perf tests with command:
>      make check-perf TESTSUITEFLAGS="--rebuild"
>
> It shows significantly worse performance than without parallel_build. 
> For dp_group = no cases, it is better, but still ~30% slower than 
> without parallel_build. I have 24 cores, but each thread is not 
> consuming much CPU except the main thread. I also tried hardcode the 
> number of thread to just 4, which end up with slightly better results, 
> but still far behind "without parallel_build".
Anton Ivanov Sept. 30, 2021, 6:31 a.m. UTC | #4
On 30/09/2021 07:16, Anton Ivanov wrote:
> Results on a Ryzen 5 3600 - 6 cores 12 threads

I will also have a look into the "maximum" measurement for multi-thread.

It does not tie up with the drop in average across the board.

A.

Han Zhou Sept. 30, 2021, 6:52 a.m. UTC | #5
Thanks Anton for checking. I am using an Intel(R) Core(TM) i9-7920X CPU @
2.90GHz, 24 cores.
It is strange that my result is so different. I also verified with a scale
test script that creates a large-scale NB/SB with 800 nodes of a simulated
k8s setup, and then just ran:
    ovn-nbctl --print-wait-time --wait=sb sync

Without parallel:
ovn-northd completion: 7807ms

With parallel:
ovn-northd completion: 41267ms

I suspected the hmap sizing problem, but changing the initial size to
64k buckets didn't help. I will find some time to check the "perf"
reports.
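
For context on the hmap sizing experiment, here is a rough sketch of what
pre-sizing the table amounts to, assuming OVS's hmap helpers such as
hmap_init()/hmap_reserve()/hmap_insert(); the struct name, include paths and
sizes below are made up for illustration and are not the code that was
actually changed:

    #include "openvswitch/hmap.h"
    #include "hash.h"

    struct flow_entry {
        struct hmap_node node;   /* illustrative element type */
        char *match;
    };

    /* Reserve bucket space up front so a large build does not keep
     * rehashing while rows are inserted, instead of growing from the
     * default tiny size. */
    static void
    flow_map_init(struct hmap *flows, size_t expected_rows)
    {
        hmap_init(flows);
        hmap_reserve(flows, expected_rows);   /* e.g. 64 * 1024 */
    }

    static void
    flow_map_add(struct hmap *flows, struct flow_entry *e)
    {
        hmap_insert(flows, &e->node, hash_string(e->match, 0));
    }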

Thanks,
Han

On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:

> On 30/09/2021 07:16, Anton Ivanov wrote:
>
> Results on a Ryzen 5 3600 - 6 cores 12 threads
>
> I will also have a look into the "maximum" measurement for multi-thread.
>
> It does not tie up with the drop in average across the board.
>
> A.
Anton Ivanov Sept. 30, 2021, 7:08 a.m. UTC | #6
After quickly adding some more prints into the testsuite:

Test 1:

Without

   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 1130
   Average (NB in msec): 620.375000
   Maximum (SB in msec): 23
   Average (SB in msec): 21.468759
   Maximum (northd-loop in msec): 6002
   Minimum (northd-loop in msec): 0
   Average (northd-loop in msec): 914.760417
   Long term average (northd-loop in msec): 104.799340

With

   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   ---
   Maximum (NB in msec): 1148
   Average (NB in msec): 630.250000
   Maximum (SB in msec): 24
   Average (SB in msec): 21.468744
   Maximum (northd-loop in msec): 6090
   Minimum (northd-loop in msec): 0
   Average (northd-loop in msec): 762.101565
   Long term average (northd-loop in msec): 80.735192

The metric which actually matters and which SHOULD be measured - the long term average - is better by 20%. Using the short term average instead of the long term one in the test suite is actually a BUG.
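
(Toy arithmetic only, to make the distinction concrete - this is not the ovn
stopwatch code, and the sample values are invented: averaging over just the
busy iterations versus over every main-loop pass, idle ones included, gives
very different figures.)

    #include <stdio.h>

    int
    main(void)
    {
        /* Made-up loop times in msec: a burst of heavy recomputation
         * surrounded by cheap, mostly idle iterations. */
        double samples[] = { 0, 1, 0, 2, 6090, 1148, 630, 580, 1, 0, 2, 1 };
        int n = sizeof samples / sizeof *samples;
        double busy_sum = 0, all_sum = 0;
        int busy_n = 0;

        for (int i = 0; i < n; i++) {
            all_sum += samples[i];
            if (samples[i] >= 100) {          /* arbitrary "busy" cut-off */
                busy_sum += samples[i];
                busy_n++;
            }
        }
        printf("average over busy iterations only: %.1f msec\n",
               busy_sum / busy_n);
        printf("average over the whole run:        %.1f msec\n",
               all_sum / n);
        return 0;
    }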

Are you running yours under some sort of virtualization?

A.

On 30/09/2021 07:52, Han Zhou wrote:
> Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz, 24 cores.
> It is weird why my result is so different. I also verified with a scale test script that creates a large scale NB/SB with 800 nodes of simulated k8s setup. And then just run:
>     ovn-nbctl --print-wait-time --wait=sb sync
>
> Without parallel:
> ovn-northd completion: 7807ms
>
> With parallel:
> ovn-northd completion: 41267ms
>
> I suspected the hmap size problem but I tried changing the initial size to 64k buckets and it didn't help. I will find some time to check the "perf" reports.
>
> Thanks,
> Han
>
> On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <anton.ivanov@cambridgegreys.com <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>
>     On 30/09/2021 07:16, Anton Ivanov wrote:
>>     Results on a Ryzen 5 3600 - 6 cores 12 threads
>
>     I will also have a look into the "maximum" measurement for multi-thread.
>
>     It does not tie up with the drop in average across the board.
>
>     A.
>
>>
>>     Without
>>
>>
>>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>       ---
>>       Maximum (NB in msec): 1256
>>       Average (NB in msec): 679.463785
>>       Maximum (SB in msec): 25
>>       Average (SB in msec): 22.489798
>>       Maximum (northd-loop in msec): 1347
>>       Average (northd-loop in msec): 799.944878
>>
>>       2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>       ---
>>       Maximum (NB in msec): 1956
>>       Average (NB in msec): 809.387285
>>       Maximum (SB in msec): 24
>>       Average (SB in msec): 21.649258
>>       Maximum (northd-loop in msec): 2011
>>       Average (northd-loop in msec): 961.718686
>>
>>       5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>       ---
>>       Maximum (NB in msec): 557
>>       Average (NB in msec): 474.010337
>>       Maximum (SB in msec): 15
>>       Average (SB in msec): 13.927192
>>       Maximum (northd-loop in msec): 1261
>>       Average (northd-loop in msec): 580.999122
>>
>>       6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>       ---
>>       Maximum (NB in msec): 756
>>       Average (NB in msec): 625.614724
>>       Maximum (SB in msec): 15
>>       Average (SB in msec): 14.181048
>>       Maximum (northd-loop in msec): 1649
>>       Average (northd-loop in msec): 746.208332
>>
>>
>>     With
>>
>>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>       ---
>>       Maximum (NB in msec): 1140
>>       Average (NB in msec): 631.125000
>>       Maximum (SB in msec): 24
>>       Average (SB in msec): 21.453609
>>       Maximum (northd-loop in msec): 6080
>>       Average (northd-loop in msec): 759.718815
>>
>>       2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>       ---
>>       Maximum (NB in msec): 1210
>>       Average (NB in msec): 673.000000
>>       Maximum (SB in msec): 27
>>       Average (SB in msec): 22.453125
>>       Maximum (northd-loop in msec): 6514
>>       Average (northd-loop in msec): 808.596842
>>
>>       5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>       ---
>>       Maximum (NB in msec): 798
>>       Average (NB in msec): 429.750000
>>       Maximum (SB in msec): 15
>>       Average (SB in msec): 12.998533
>>       Maximum (northd-loop in msec): 3835
>>       Average (northd-loop in msec): 564.875986
>>
>>       6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>       ---
>>       Maximum (NB in msec): 1074
>>       Average (NB in msec): 593.875000
>>       Maximum (SB in msec): 14
>>       Average (SB in msec): 13.655273
>>       Maximum (northd-loop in msec): 4973
>>       Average (northd-loop in msec): 771.102605
>>
>>     The only one slower is test 6 which I will look into.
>>
>>     The rest are > 5% faster.
>>
>>     A.
>>
>>     On 30/09/2021 00:56, Han Zhou wrote:
>>>
>>>
>>>     On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>>>     >
>>>     > From: Anton Ivanov <anton.ivanov@cambridgegreys.com <mailto:anton.ivanov@cambridgegreys.com>>
>>>     >
>>>     > Restore parallel build with dp groups using rwlock instead
>>>     > of per row locking as an underlying mechanism.
>>>     >
>>>     > This provides improvement ~ 10% end-to-end on ovn-heater
>>>     > under virutalization despite awakening some qemu gremlin
>>>     > which makes qemu climb to silly CPU usage. The gain on
>>>     > bare metal is likely to be higher.
>>>     >
>>>     Hi Anton,
>>>
>>>     I am trying to see the benefit of parallel_build, but encountered unexpected performance result when running the perf tests with command:
>>>          make check-perf TESTSUITEFLAGS="--rebuild"
>>>
>>>     It shows significantly worse performance than without parallel_build. For dp_group = no cases, it is better, but still ~30% slower than without parallel_build. I have 24 cores, but each thread is not consuming much CPU except the main thread. I also tried hardcode the number of thread to just 4, which end up with slightly better results, but still far behind "without parallel_build".
>>>
>>>     Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)
>>>
>>>     1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>     ---
>>>     Maximum (NB in msec):  1058 | 4269 | 4097
>>>     Average (NB in msec):  836.941167 | 3697.253931 | 3498.311525
>>>     Maximum (SB in msec):  30 | 30 | 28
>>>     Average (SB in msec):  25.934011 | 26.001840 | 25.685091
>>>     Maximum (northd-loop in msec):  1204 | 4379 | 4251
>>>     Average (northd-loop in msec):  1005.330078 | 4233.871504 | 4022.774208
>>>
>>>     2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>     ---
>>>     Maximum (NB in msec):  1124 | 1480 | 1331
>>>     Average (NB in msec):  892.403405 | 1206.189287 | 1089.378455
>>>     Maximum (SB in msec):  29 | 31 | 30
>>>     Average (SB in msec):  26.922632 | 26.636706 | 25.657484
>>>     Maximum (northd-loop in msec):  1275 | 1639 | 1495
>>>     Average (northd-loop in msec):  1074.917873 | 1458.152327 | 1301.057201
>>>
>>>     5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>     ---
>>>     Maximum (NB in msec):  768 | 3086 | 2876
>>>     Average (NB in msec):  614.491938 | 2681.688365 | 2531.255444
>>>     Maximum (SB in msec):  18 | 17 | 18
>>>     Average (SB in msec):  16.347526 | 15.955263 | 16.278075
>>>     Maximum (northd-loop in msec):  889 | 3247 | 3031
>>>     Average (northd-loop in msec):  772.083572 | 3117.504297 | 2833.182361
>>>
>>>     6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>     ---
>>>     Maximum (NB in msec):  1046 | 1371 | 1262
>>>     Average (NB in msec):  827.735852 | 1135.514228 | 970.544792
>>>     Maximum (SB in msec):  19 | 18 | 19
>>>     Average (SB in msec):  16.828127 | 16.083914 | 15.602525
>>>     Maximum (northd-loop in msec):  1163 | 1545 | 1411
>>>     Average (northd-loop in msec):  972.567407 | 1328.617583 | 1207.667100
>>>
>>>     I didn't debug yet, but do you have any clue what could be the reason? I am using the upstream commit 9242f27f63 which already included this patch.
>>>     Below is my change to the perf-northd.at file just to enable parallel_build:
>>>
>>>     diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>     index 74b69e9d4..9328c2e21 100644
>>>     --- a/tests/perf-northd.at
>>>     +++ b/tests/perf-northd.at
>>>     @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
>>>      PERF_RECORD_START()
>>>
>>>      ovn_start
>>>     +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>
>>>      BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>>
>>>     @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
>>>      PERF_RECORD_START()
>>>
>>>      ovn_start
>>>     +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>
>>>      BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>>
>>>     Thanks,
>>>     Han
>>
>>
>>     -- 
>>     Anton R. Ivanov
>>     Cambridgegreys Limited. Registered in England. Company Number 10273661
>>     https://www.cambridgegreys.com/
>
>
>     -- 
>     Anton R. Ivanov
>     Cambridgegreys Limited. Registered in England. Company Number 10273661
>     https://www.cambridgegreys.com/
>
Han Zhou Sept. 30, 2021, 7:26 a.m. UTC | #7
On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <
anton.ivanov@cambridgegreys.com> wrote:

> After quickly adding some more prints into the testsuite.
>
> Test 1:
>
> Without
>
>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>   ---
>   Maximum (NB in msec): 1130
>   Average (NB in msec): 620.375000
>   Maximum (SB in msec): 23
>   Average (SB in msec): 21.468759
>   Maximum (northd-loop in msec): 6002
>   Minimum (northd-loop in msec): 0
>   Average (northd-loop in msec): 914.760417
>   Long term average (northd-loop in msec): 104.799340
>
> With
>
>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>   ---
>   Maximum (NB in msec): 1148
>   Average (NB in msec): 630.250000
>   Maximum (SB in msec): 24
>   Average (SB in msec): 21.468744
>   Maximum (northd-loop in msec): 6090
>   Minimum (northd-loop in msec): 0
>   Average (northd-loop in msec): 762.101565
>   Long term average (northd-loop in msec): 80.735192
>
> The metric which actually matters and which SHOULD be measured - the
> long-term average - is better by 20%. Using the short-term average instead
> of the long-term average in the test suite is actually a BUG.
>
Good catch!


> Are you running yours under some sort of virtualization?
>

No, I am testing on a bare-metal.


> A.
> On 30/09/2021 07:52, Han Zhou wrote:
>
> Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @
> 2.90GHz, 24 cores.
> It is weird why my result is so different. I also verified with a scale
> test script that creates a large scale NB/SB with 800 nodes of simulated
> k8s setup. And then just run:
>     ovn-nbctl --print-wait-time --wait=sb sync
>
> Without parallel:
> ovn-northd completion: 7807ms
>
> With parallel:
> ovn-northd completion: 41267ms
>
> I suspected the hmap size problem but I tried changing the initial size to
> 64k buckets and it didn't help. I will find some time to check the "perf"
> reports.
>
> Thanks,
> Han
>
> On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <
> anton.ivanov@cambridgegreys.com> wrote:
>
>> On 30/09/2021 07:16, Anton Ivanov wrote:
>>
>> Results on a Ryzen 5 3600 - 6 cores 12 threads
>>
>> I will also have a look into the "maximum" measurement for multi-thread.
>>
>> It does not tie up with the drop in average across the board.
>>
>> A.
>>
>>
>> Without
>>
>>
>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 1256
>>   Average (NB in msec): 679.463785
>>   Maximum (SB in msec): 25
>>   Average (SB in msec): 22.489798
>>   Maximum (northd-loop in msec): 1347
>>   Average (northd-loop in msec): 799.944878
>>
>>   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>   ---
>>   Maximum (NB in msec): 1956
>>   Average (NB in msec): 809.387285
>>   Maximum (SB in msec): 24
>>   Average (SB in msec): 21.649258
>>   Maximum (northd-loop in msec): 2011
>>   Average (northd-loop in msec): 961.718686
>>
>>   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 557
>>   Average (NB in msec): 474.010337
>>   Maximum (SB in msec): 15
>>   Average (SB in msec): 13.927192
>>   Maximum (northd-loop in msec): 1261
>>   Average (northd-loop in msec): 580.999122
>>
>>   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>   ---
>>   Maximum (NB in msec): 756
>>   Average (NB in msec): 625.614724
>>   Maximum (SB in msec): 15
>>   Average (SB in msec): 14.181048
>>   Maximum (northd-loop in msec): 1649
>>   Average (northd-loop in msec): 746.208332
>>
>>
>> With
>>
>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 1140
>>   Average (NB in msec): 631.125000
>>   Maximum (SB in msec): 24
>>   Average (SB in msec): 21.453609
>>   Maximum (northd-loop in msec): 6080
>>   Average (northd-loop in msec): 759.718815
>>
>>   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>   ---
>>   Maximum (NB in msec): 1210
>>   Average (NB in msec): 673.000000
>>   Maximum (SB in msec): 27
>>   Average (SB in msec): 22.453125
>>   Maximum (northd-loop in msec): 6514
>>   Average (northd-loop in msec): 808.596842
>>
>>   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 798
>>   Average (NB in msec): 429.750000
>>   Maximum (SB in msec): 15
>>   Average (SB in msec): 12.998533
>>   Maximum (northd-loop in msec): 3835
>>   Average (northd-loop in msec): 564.875986
>>
>>   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>   ---
>>   Maximum (NB in msec): 1074
>>   Average (NB in msec): 593.875000
>>   Maximum (SB in msec): 14
>>   Average (SB in msec): 13.655273
>>   Maximum (northd-loop in msec): 4973
>>   Average (northd-loop in msec): 771.102605
>>
>> The only one slower is test 6 which I will look into.
>>
>> The rest are > 5% faster.
>>
>> A.
>>
>> On 30/09/2021 00:56, Han Zhou wrote:
>>
>>
>>
>> On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
>> >
>> > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>> >
>> > Restore parallel build with dp groups using rwlock instead
>> > of per row locking as an underlying mechanism.
>> >
>> > This provides improvement ~ 10% end-to-end on ovn-heater
>> > under virutalization despite awakening some qemu gremlin
>> > which makes qemu climb to silly CPU usage. The gain on
>> > bare metal is likely to be higher.
>> >
>> Hi Anton,
>>
>> I am trying to see the benefit of parallel_build, but encountered
>> unexpected performance result when running the perf tests with command:
>>      make check-perf TESTSUITEFLAGS="--rebuild"
>>
>> It shows significantly worse performance than without parallel_build. For
>> dp_group = no cases, it is better, but still ~30% slower than without
>> parallel_build. I have 24 cores, but each thread is not consuming much CPU
>> except the main thread. I also tried hardcode the number of thread to just
>> 4, which end up with slightly better results, but still far behind "without
>> parallel_build".
>>
>> Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)
>>
>> 1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>> ---
>> Maximum (NB in msec):  1058 | 4269 | 4097
>> Average (NB in msec):  836.941167 | 3697.253931 | 3498.311525
>> Maximum (SB in msec):  30 | 30 | 28
>> Average (SB in msec):  25.934011 | 26.001840 | 25.685091
>> Maximum (northd-loop in msec):  1204 | 4379 | 4251
>> Average (northd-loop in msec):  1005.330078 | 4233.871504 | 4022.774208
>>
>> 2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>> ---
>> Maximum (NB in msec):  1124 | 1480 | 1331
>> Average (NB in msec):  892.403405 | 1206.189287 | 1089.378455
>> Maximum (SB in msec):  29 | 31 | 30
>> Average (SB in msec):  26.922632 | 26.636706 | 25.657484
>> Maximum (northd-loop in msec):  1275 | 1639 | 1495
>> Average (northd-loop in msec):  1074.917873 | 1458.152327 | 1301.057201
>>
>> 5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>> ---
>> Maximum (NB in msec):  768 | 3086 | 2876
>> Average (NB in msec):  614.491938 | 2681.688365 | 2531.255444
>> Maximum (SB in msec):  18 | 17 | 18
>> Average (SB in msec):  16.347526 | 15.955263 | 16.278075
>> Maximum (northd-loop in msec):  889 | 3247 | 3031
>> Average (northd-loop in msec):  772.083572 | 3117.504297 | 2833.182361
>>
>> 6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>> ---
>> Maximum (NB in msec):  1046 | 1371 | 1262
>> Average (NB in msec):  827.735852 | 1135.514228 | 970.544792
>> Maximum (SB in msec):  19 | 18 | 19
>> Average (SB in msec):  16.828127 | 16.083914 | 15.602525
>> Maximum (northd-loop in msec):  1163 | 1545 | 1411
>> Average (northd-loop in msec):  972.567407 | 1328.617583 | 1207.667100
>>
>> I didn't debug yet, but do you have any clue what could be the reason? I
>> am using the upstream commit 9242f27f63 which already included this patch.
>> Below is my change to the perf-northd.at file just to enable
>> parallel_build:
>>
>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>> index 74b69e9d4..9328c2e21 100644
>> --- a/tests/perf-northd.at
>> +++ b/tests/perf-northd.at
>> @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200
>> Hypervisors, 200 Logical Ports/Hype
>>  PERF_RECORD_START()
>>
>>  ovn_start
>> +ovn-nbctl set nb_global . options:use_parallel_build=true
>>
>>  BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>
>> @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500
>> Hypervisors, 50 Logical Ports/Hyper
>>  PERF_RECORD_START()
>>
>>  ovn_start
>> +ovn-nbctl set nb_global . options:use_parallel_build=true
>>
>>  BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>
>> Thanks,
>> Han
>>
>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>>
>>
>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>>
>> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
>
>
Anton Ivanov Sept. 30, 2021, 8:04 a.m. UTC | #8
OK,

I can dig into this later this afternoon.

There is quite a bit of dispersion in the tests without parallelization on my system, which should not be there.

I want to get to the bottom of where it is coming from and why we are getting different results compared to ovn-heater.

I did all the original tests with ovn-heater and they were consistently 5-10% better end-to-end with parallelization enabled.

As for the worker threads never reaching 100% while the northd thread is regularly at 100%, that is unfortunately how it is. Large sections of northd cannot be parallelized at present; the only part which can be run in parallel is the lflow compute.

Generation of datapaths, ports and groups - everything before the lflows - cannot be parallelized, and it is compute heavy.

Post-processing of the flows once they have been generated - hash recompute, reconciliation of databases, etc. - cannot be parallelized at present. Some of it could be run in parallel if there were parallel macros in the OVS source, but that would likely give only a marginal performance gain - 1-2% at most.
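
To make that concrete, below is a minimal, self-contained toy sketch (all names such as lflow_add and table_lock are invented here; this is not the actual ovn-northd code) of the pattern this series describes for the one phase that does run in parallel: worker threads generate lflows independently and serialize only on a shared flow table protected by a single rwlock - a read lock for the common "this row already exists" lookup and a write lock only when a new row has to be inserted. With dp-groups, many flows repeat across datapaths, so most lookups presumably stay on the read-lock fast path.

/* Illustrative sketch only, not the actual ovn-northd code.  Worker threads
 * compute "logical flows" for disjoint slices of datapaths and insert them
 * into one shared table guarded by a single pthread rwlock: shared (read)
 * access for the common "row already exists" case, exclusive (write) access
 * only when a new row must be inserted.  All names are invented. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_THREADS   4
#define N_DATAPATHS 64
#define TABLE_SIZE  1024

struct lflow { char match[64]; struct lflow *next; };

static struct lflow *table[TABLE_SIZE];
static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

static unsigned int hash_str(const char *s) {
    unsigned int h = 5381;
    for (; *s; s++) { h = h * 33 + (unsigned char) *s; }
    return h % TABLE_SIZE;
}

static struct lflow *find__(unsigned int bucket, const char *match) {
    for (struct lflow *f = table[bucket]; f; f = f->next) {
        if (!strcmp(f->match, match)) { return f; }
    }
    return NULL;
}

/* Add 'match' unless an identical flow exists (think of the existing row as
 * the one a datapath group would simply be extended on). */
static void lflow_add(const char *match) {
    unsigned int bucket = hash_str(match);

    pthread_rwlock_rdlock(&table_lock);      /* concurrent lookups */
    struct lflow *existing = find__(bucket, match);
    pthread_rwlock_unlock(&table_lock);
    if (existing) { return; }

    pthread_rwlock_wrlock(&table_lock);      /* exclusive only to insert */
    if (!find__(bucket, match)) {            /* re-check after lock switch */
        struct lflow *f = malloc(sizeof *f);
        snprintf(f->match, sizeof f->match, "%s", match);
        f->next = table[bucket];
        table[bucket] = f;
    }
    pthread_rwlock_unlock(&table_lock);
}

static void *worker(void *arg) {
    char match[64];
    for (long dp = (long) arg; dp < N_DATAPATHS; dp += N_THREADS) {
        /* This flow repeats across datapaths, so it usually needs only the
         * read lock; the per-datapath flow below always needs the write lock. */
        snprintf(match, sizeof match, "ip4.mcast");
        lflow_add(match);
        snprintf(match, sizeof match, "inport == \"lp-%ld\"", dp);
        lflow_add(match);
    }
    return NULL;
}

int main(void) {
    pthread_t t[N_THREADS];
    for (long i = 0; i < N_THREADS; i++) {
        pthread_create(&t[i], NULL, worker, (void *) i);
    }
    for (int i = 0; i < N_THREADS; i++) {
        pthread_join(t[i], NULL);
    }
    int n = 0;
    for (int b = 0; b < TABLE_SIZE; b++) {
        for (struct lflow *f = table[b]; f; f = f->next) { n++; }
    }
    printf("unique lflows: %d\n", n);        /* expect N_DATAPATHS + 1 */
    return 0;
}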

Best Regards,

A.

On 30/09/2021 08:26, Han Zhou wrote:
>
>
> On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:
>
>     After quickly adding some more prints into the testsuite.
>
>     Test 1:
>
>     Without
>
>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>       ---
>       Maximum (NB in msec): 1130
>       Average (NB in msec): 620.375000
>       Maximum (SB in msec): 23
>       Average (SB in msec): 21.468759
>       Maximum (northd-loop in msec): 6002
>       Minimum (northd-loop in msec): 0
>       Average (northd-loop in msec): 914.760417
>       Long term average (northd-loop in msec): 104.799340
>
>     With
>
>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>       ---
>       Maximum (NB in msec): 1148
>       Average (NB in msec): 630.250000
>       Maximum (SB in msec): 24
>       Average (SB in msec): 21.468744
>       Maximum (northd-loop in msec): 6090
>       Minimum (northd-loop in msec): 0
>       Average (northd-loop in msec): 762.101565
>       Long term average (northd-loop in msec): 80.735192
>
>     The metric which actually matters and which SHOULD be measured - the long-term average - is better by 20%. Using the short-term average instead of the long-term average in the test suite is actually a BUG.
>
> Good catch!
>
>     Are you running yours under some sort of virtualization?
>
>
> No, I am testing on a bare-metal.
>
>     A.
>
>     On 30/09/2021 07:52, Han Zhou wrote:
>>     Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz, 24 cores.
>>     It is weird why my result is so different. I also verified with a scale test script that creates a large scale NB/SB with 800 nodes of simulated k8s setup. And then just run:
>>         ovn-nbctl --print-wait-time --wait=sb sync
>>
>>     Without parallel:
>>     ovn-northd completion: 7807ms
>>
>>     With parallel:
>>     ovn-northd completion: 41267ms
>>
>>     I suspected the hmap size problem but I tried changing the initial size to 64k buckets and it didn't help. I will find some time to check the "perf" reports.
>>
>>     Thanks,
>>     Han
>>
>>     On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:
>>
>>         On 30/09/2021 07:16, Anton Ivanov wrote:
>>>         Results on a Ryzen 5 3600 - 6 cores 12 threads
>>
>>         I will also have a look into the "maximum" measurement for multi-thread.
>>
>>         It does not tie up with the drop in average across the board.
>>
>>         A.
>>
>>>
>>>         Without
>>>
>>>
>>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 1256
>>>           Average (NB in msec): 679.463785
>>>           Maximum (SB in msec): 25
>>>           Average (SB in msec): 22.489798
>>>           Maximum (northd-loop in msec): 1347
>>>           Average (northd-loop in msec): 799.944878
>>>
>>>           2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1956
>>>           Average (NB in msec): 809.387285
>>>           Maximum (SB in msec): 24
>>>           Average (SB in msec): 21.649258
>>>           Maximum (northd-loop in msec): 2011
>>>           Average (northd-loop in msec): 961.718686
>>>
>>>           5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 557
>>>           Average (NB in msec): 474.010337
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 13.927192
>>>           Maximum (northd-loop in msec): 1261
>>>           Average (northd-loop in msec): 580.999122
>>>
>>>           6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 756
>>>           Average (NB in msec): 625.614724
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 14.181048
>>>           Maximum (northd-loop in msec): 1649
>>>           Average (northd-loop in msec): 746.208332
>>>
>>>
>>>         With
>>>
>>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 1140
>>>           Average (NB in msec): 631.125000
>>>           Maximum (SB in msec): 24
>>>           Average (SB in msec): 21.453609
>>>           Maximum (northd-loop in msec): 6080
>>>           Average (northd-loop in msec): 759.718815
>>>
>>>           2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1210
>>>           Average (NB in msec): 673.000000
>>>           Maximum (SB in msec): 27
>>>           Average (SB in msec): 22.453125
>>>           Maximum (northd-loop in msec): 6514
>>>           Average (northd-loop in msec): 808.596842
>>>
>>>           5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 798
>>>           Average (NB in msec): 429.750000
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 12.998533
>>>           Maximum (northd-loop in msec): 3835
>>>           Average (northd-loop in msec): 564.875986
>>>
>>>           6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1074
>>>           Average (NB in msec): 593.875000
>>>           Maximum (SB in msec): 14
>>>           Average (SB in msec): 13.655273
>>>           Maximum (northd-loop in msec): 4973
>>>           Average (northd-loop in msec): 771.102605
>>>
>>>         The only one slower is test 6 which I will look into.
>>>
>>>         The rest are > 5% faster.
>>>
>>>         A.
>>>
>>>         On 30/09/2021 00:56, Han Zhou wrote:
>>>>
>>>>
>>>>         On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
>>>>         >
>>>>         > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>         >
>>>>         > Restore parallel build with dp groups using rwlock instead
>>>>         > of per row locking as an underlying mechanism.
>>>>         >
>>>>         > This provides improvement ~ 10% end-to-end on ovn-heater
>>>>         > under virutalization despite awakening some qemu gremlin
>>>>         > which makes qemu climb to silly CPU usage. The gain on
>>>>         > bare metal is likely to be higher.
>>>>         >
>>>>         Hi Anton,
>>>>
>>>>         I am trying to see the benefit of parallel_build, but encountered unexpected performance result when running the perf tests with command:
>>>>              make check-perf TESTSUITEFLAGS="--rebuild"
>>>>
>>>>         It shows significantly worse performance than without parallel_build. For dp_group = no cases, it is better, but still ~30% slower than without parallel_build. I have 24 cores, but each thread is not consuming much CPU except the main thread. I also tried hardcode the number of thread to just 4, which end up with slightly better results, but still far behind "without parallel_build".
>>>>
>>>>         Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)
>>>>
>>>>         1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>         ---
>>>>         Maximum (NB in msec):  1058 | 4269 | 4097
>>>>         Average (NB in msec):  836.941167 | 3697.253931 | 3498.311525
>>>>         Maximum (SB in msec):  30 | 30 | 28
>>>>         Average (SB in msec):  25.934011 | 26.001840 | 25.685091
>>>>         Maximum (northd-loop in msec):  1204 | 4379 | 4251
>>>>         Average (northd-loop in msec):  1005.330078 | 4233.871504 | 4022.774208
>>>>
>>>>         2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>         ---
>>>>         Maximum (NB in msec):  1124 | 1480 | 1331
>>>>         Average (NB in msec):  892.403405 | 1206.189287 | 1089.378455
>>>>         Maximum (SB in msec):  29 | 31 | 30
>>>>         Average (SB in msec):  26.922632 | 26.636706 | 25.657484
>>>>         Maximum (northd-loop in msec):  1275 | 1639 | 1495
>>>>         Average (northd-loop in msec):  1074.917873 | 1458.152327 | 1301.057201
>>>>
>>>>         5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>         ---
>>>>         Maximum (NB in msec):  768 | 3086 | 2876
>>>>         Average (NB in msec):  614.491938 | 2681.688365 | 2531.255444
>>>>         Maximum (SB in msec):  18 | 17 | 18
>>>>         Average (SB in msec):  16.347526 | 15.955263 | 16.278075
>>>>         Maximum (northd-loop in msec):  889 | 3247 | 3031
>>>>         Average (northd-loop in msec):  772.083572 | 3117.504297 | 2833.182361
>>>>
>>>>         6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>         ---
>>>>         Maximum (NB in msec):  1046 | 1371 | 1262
>>>>         Average (NB in msec):  827.735852 | 1135.514228 | 970.544792
>>>>         Maximum (SB in msec):  19 | 18 | 19
>>>>         Average (SB in msec):  16.828127 | 16.083914 | 15.602525
>>>>         Maximum (northd-loop in msec):  1163 | 1545 | 1411
>>>>         Average (northd-loop in msec):  972.567407 | 1328.617583 | 1207.667100
>>>>
>>>>         I didn't debug yet, but do you have any clue what could be the reason? I am using the upstream commit 9242f27f63 which already included this patch.
>>>>         Below is my change to the perf-northd.at file just to enable parallel_build:
>>>>
>>>>         diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>         index 74b69e9d4..9328c2e21 100644
>>>>         --- a/tests/perf-northd.at
>>>>         +++ b/tests/perf-northd.at
>>>>         @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
>>>>          PERF_RECORD_START()
>>>>
>>>>          ovn_start
>>>>         +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>
>>>>          BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>>>
>>>>         @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
>>>>          PERF_RECORD_START()
>>>>
>>>>          ovn_start
>>>>         +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>
>>>>          BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>>>
>>>>         Thanks,
>>>>         Han
>>>
>>>
>>>         -- 
>>>         Anton R. Ivanov
>>>         Cambridgegreys Limited. Registered in England. Company Number 10273661
>>>         https://www.cambridgegreys.com/
>>
>>
>>         -- 
>>         Anton R. Ivanov
>>         Cambridgegreys Limited. Registered in England. Company Number 10273661
>>         https://www.cambridgegreys.com/
>>
>     -- 
>     Anton R. Ivanov
>     Cambridgegreys Limited. Registered in England. Company Number 10273661
>     https://www.cambridgegreys.com/
>
Anton Ivanov Sept. 30, 2021, 2:34 p.m. UTC | #9
Summary of findings.

1. The numbers from the perf test do not align with ovn-heater, which is much closer to a realistic load. On some tests where ovn-heater gives a 5-10% end-to-end improvement with parallelization, we get worse results with the perf test. You spotted this one correctly.

Example of the northd average pulled out of the test report via grep and sed.

    127.489353
    131.509458
    116.088205
    94.721911
    119.629756
    114.896258
    124.811069
    129.679160
    106.699905
    134.490338
    112.106713
    135.957658
    132.471111
    94.106849
    117.431450
    115.861592
    106.830657
    132.396905
    107.092542
    128.945760
    94.298464
    120.455510
    136.910426
    134.311765
    115.881292
    116.918458

These values are all over the place - this is not a reproducible test.

2. In its present state you need to re-run it 30+ times and take an average; the standard deviation of the northd-loop values is > 10% (a quick check is sketched after this list). Compared to that, the reproducibility of ovn-heater is significantly better: I usually get less than 0.5% difference between runs if there were no iteration failures. I would suggest using that instead for performance comparisons until we have figured out what affects the perf-test.

3. The report uses the short-term running average, which is probably wrong because it is heavily skewed by the last several values.

I will look into all of these.
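
For reference, a quick stand-alone check of that dispersion (illustrative only, not part of the test suite; the 26 values are copied from the list in point 1) gives a mean of roughly 119 msec and a sample standard deviation of about 13 msec, i.e. around 11% of the mean:

/* Quick check of the dispersion in the northd-loop averages listed above:
 * compute their mean and sample standard deviation.  Build with -lm. */
#include <math.h>
#include <stdio.h>

int main(void) {
    static const double v[] = {
        127.489353, 131.509458, 116.088205,  94.721911, 119.629756,
        114.896258, 124.811069, 129.679160, 106.699905, 134.490338,
        112.106713, 135.957658, 132.471111,  94.106849, 117.431450,
        115.861592, 106.830657, 132.396905, 107.092542, 128.945760,
         94.298464, 120.455510, 136.910426, 134.311765, 115.881292,
        116.918458,
    };
    const int n = sizeof v / sizeof v[0];

    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += v[i];
    }
    double mean = sum / n;

    double sq = 0.0;
    for (int i = 0; i < n; i++) {
        sq += (v[i] - mean) * (v[i] - mean);
    }
    double stddev = sqrt(sq / (n - 1));

    printf("n=%d mean=%.2f stddev=%.2f (%.1f%% of mean)\n",
           n, mean, stddev, 100.0 * stddev / mean);
    return 0;
}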

Brgds,

On 30/09/2021 08:26, Han Zhou wrote:
>
>
> On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:
>
>     After quickly adding some more prints into the testsuite.
>
>     Test 1:
>
>     Without
>
>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>       ---
>       Maximum (NB in msec): 1130
>       Average (NB in msec): 620.375000
>       Maximum (SB in msec): 23
>       Average (SB in msec): 21.468759
>       Maximum (northd-loop in msec): 6002
>       Minimum (northd-loop in msec): 0
>       Average (northd-loop in msec): 914.760417
>       Long term average (northd-loop in msec): 104.799340
>
>     With
>
>       1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>       ---
>       Maximum (NB in msec): 1148
>       Average (NB in msec): 630.250000
>       Maximum (SB in msec): 24
>       Average (SB in msec): 21.468744
>       Maximum (northd-loop in msec): 6090
>       Minimum (northd-loop in msec): 0
>       Average (northd-loop in msec): 762.101565
>       Long term average (northd-loop in msec): 80.735192
>
>     The metric which actually matters and which SHOULD be measured - the long-term average - is better by 20%. Using the short-term average instead of the long-term average in the test suite is actually a BUG.
>
> Good catch!
>
>     Are you running yours under some sort of virtualization?
>
>
> No, I am testing on a bare-metal.
>
>     A.
>
>     On 30/09/2021 07:52, Han Zhou wrote:
>>     Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz, 24 cores.
>>     It is weird why my result is so different. I also verified with a scale test script that creates a large scale NB/SB with 800 nodes of simulated k8s setup. And then just run:
>>         ovn-nbctl --print-wait-time --wait=sb sync
>>
>>     Without parallel:
>>     ovn-northd completion: 7807ms
>>
>>     With parallel:
>>     ovn-northd completion: 41267ms
>>
>>     I suspected the hmap size problem but I tried changing the initial size to 64k buckets and it didn't help. I will find some time to check the "perf" reports.
>>
>>     Thanks,
>>     Han
>>
>>     On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:
>>
>>         On 30/09/2021 07:16, Anton Ivanov wrote:
>>>         Results on a Ryzen 5 3600 - 6 cores 12 threads
>>
>>         I will also have a look into the "maximum" measurement for multi-thread.
>>
>>         It does not tie up with the drop in average across the board.
>>
>>         A.
>>
>>>
>>>         Without
>>>
>>>
>>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 1256
>>>           Average (NB in msec): 679.463785
>>>           Maximum (SB in msec): 25
>>>           Average (SB in msec): 22.489798
>>>           Maximum (northd-loop in msec): 1347
>>>           Average (northd-loop in msec): 799.944878
>>>
>>>           2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1956
>>>           Average (NB in msec): 809.387285
>>>           Maximum (SB in msec): 24
>>>           Average (SB in msec): 21.649258
>>>           Maximum (northd-loop in msec): 2011
>>>           Average (northd-loop in msec): 961.718686
>>>
>>>           5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 557
>>>           Average (NB in msec): 474.010337
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 13.927192
>>>           Maximum (northd-loop in msec): 1261
>>>           Average (northd-loop in msec): 580.999122
>>>
>>>           6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 756
>>>           Average (NB in msec): 625.614724
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 14.181048
>>>           Maximum (northd-loop in msec): 1649
>>>           Average (northd-loop in msec): 746.208332
>>>
>>>
>>>         With
>>>
>>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 1140
>>>           Average (NB in msec): 631.125000
>>>           Maximum (SB in msec): 24
>>>           Average (SB in msec): 21.453609
>>>           Maximum (northd-loop in msec): 6080
>>>           Average (northd-loop in msec): 759.718815
>>>
>>>           2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1210
>>>           Average (NB in msec): 673.000000
>>>           Maximum (SB in msec): 27
>>>           Average (SB in msec): 22.453125
>>>           Maximum (northd-loop in msec): 6514
>>>           Average (northd-loop in msec): 808.596842
>>>
>>>           5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>           ---
>>>           Maximum (NB in msec): 798
>>>           Average (NB in msec): 429.750000
>>>           Maximum (SB in msec): 15
>>>           Average (SB in msec): 12.998533
>>>           Maximum (northd-loop in msec): 3835
>>>           Average (northd-loop in msec): 564.875986
>>>
>>>           6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>           ---
>>>           Maximum (NB in msec): 1074
>>>           Average (NB in msec): 593.875000
>>>           Maximum (SB in msec): 14
>>>           Average (SB in msec): 13.655273
>>>           Maximum (northd-loop in msec): 4973
>>>           Average (northd-loop in msec): 771.102605
>>>
>>>         The only one slower is test 6 which I will look into.
>>>
>>>         The rest are > 5% faster.
>>>
>>>         A.
>>>
>>>         On 30/09/2021 00:56, Han Zhou wrote:
>>>>
>>>>
>>>>         On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
>>>>         >
>>>>         > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>>         >
>>>>         > Restore parallel build with dp groups using rwlock instead
>>>>         > of per row locking as an underlying mechanism.
>>>>         >
>>>>         > This provides improvement ~ 10% end-to-end on ovn-heater
>>>>         > under virutalization despite awakening some qemu gremlin
>>>>         > which makes qemu climb to silly CPU usage. The gain on
>>>>         > bare metal is likely to be higher.
>>>>         >
>>>>         Hi Anton,
>>>>
>>>>         I am trying to see the benefit of parallel_build, but encountered unexpected performance result when running the perf tests with command:
>>>>              make check-perf TESTSUITEFLAGS="--rebuild"
>>>>
>>>>         It shows significantly worse performance than without parallel_build. For dp_group = no cases, it is better, but still ~30% slower than without parallel_build. I have 24 cores, but each thread is not consuming much CPU except the main thread. I also tried hardcode the number of thread to just 4, which end up with slightly better results, but still far behind "without parallel_build".
>>>>
>>>>         Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)
>>>>
>>>>         1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>         ---
>>>>         Maximum (NB in msec):  1058 | 4269 | 4097
>>>>         Average (NB in msec):  836.941167 | 3697.253931 | 3498.311525
>>>>         Maximum (SB in msec):  30 | 30 | 28
>>>>         Average (SB in msec):  25.934011 | 26.001840 | 25.685091
>>>>         Maximum (northd-loop in msec):  1204 | 4379 | 4251
>>>>         Average (northd-loop in msec):  1005.330078 | 4233.871504 | 4022.774208
>>>>
>>>>         2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>         ---
>>>>         Maximum (NB in msec):  1124 | 1480 | 1331
>>>>         Average (NB in msec):  892.403405 | 1206.189287 | 1089.378455
>>>>         Maximum (SB in msec):  29 | 31 | 30
>>>>         Average (SB in msec):  26.922632 | 26.636706 | 25.657484
>>>>         Maximum (northd-loop in msec):  1275 | 1639 | 1495
>>>>         Average (northd-loop in msec):  1074.917873 | 1458.152327 | 1301.057201
>>>>
>>>>         5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>         ---
>>>>         Maximum (NB in msec):  768 | 3086 | 2876
>>>>         Average (NB in msec):  614.491938 | 2681.688365 | 2531.255444
>>>>         Maximum (SB in msec):  18 | 17 | 18
>>>>         Average (SB in msec):  16.347526 | 15.955263 | 16.278075
>>>>         Maximum (northd-loop in msec):  889 | 3247 | 3031
>>>>         Average (northd-loop in msec):  772.083572 | 3117.504297 | 2833.182361
>>>>
>>>>         6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>         ---
>>>>         Maximum (NB in msec):  1046 | 1371 | 1262
>>>>         Average (NB in msec):  827.735852 | 1135.514228 | 970.544792
>>>>         Maximum (SB in msec):  19 | 18 | 19
>>>>         Average (SB in msec):  16.828127 | 16.083914 | 15.602525
>>>>         Maximum (northd-loop in msec):  1163 | 1545 | 1411
>>>>         Average (northd-loop in msec):  972.567407 | 1328.617583 | 1207.667100
>>>>
>>>>         I didn't debug yet, but do you have any clue what could be the reason? I am using the upstream commit 9242f27f63 which already included this patch.
>>>>         Below is my change to the perf-northd.at file just to enable parallel_build:
>>>>
>>>>         diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>         index 74b69e9d4..9328c2e21 100644
>>>>         --- a/tests/perf-northd.at
>>>>         +++ b/tests/perf-northd.at
>>>>         @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
>>>>          PERF_RECORD_START()
>>>>
>>>>          ovn_start
>>>>         +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>
>>>>          BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>>>
>>>>         @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
>>>>          PERF_RECORD_START()
>>>>
>>>>          ovn_start
>>>>         +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>
>>>>          BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>>>
>>>>         Thanks,
>>>>         Han
>>>
>>>
>>>         -- 
>>>         Anton R. Ivanov
>>>         Cambridgegreys Limited. Registered in England. Company Number 10273661
>>>         https://www.cambridgegreys.com/
>>
>>
>>         -- 
>>         Anton R. Ivanov
>>         Cambridgegreys Limited. Registered in England. Company Number 10273661
>>         https://www.cambridgegreys.com/
>>
>     -- 
>     Anton R. Ivanov
>     Cambridgegreys Limited. Registered in England. Company Number 10273661
>     https://www.cambridgegreys.com/
>
Han Zhou Sept. 30, 2021, 7:48 p.m. UTC | #10
On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov <
anton.ivanov@cambridgegreys.com> wrote:

> Summary of findings.
>
> 1. The numbers on the perf test do not align with heater which is much
> closer to a realistic load. On some tests where heater gives 5-10%
> end-to-end improvement with parallelization we get worse results with the
> perf-test. You spotted this one correctly.
>
> Example of the northd average pulled out of the test report via grep and
> sed.
>
>    127.489353
>    131.509458
>    116.088205
>    94.721911
>    119.629756
>    114.896258
>    124.811069
>    129.679160
>    106.699905
>    134.490338
>    112.106713
>    135.957658
>    132.471111
>    94.106849
>    117.431450
>    115.861592
>    106.830657
>    132.396905
>    107.092542
>    128.945760
>    94.298464
>    120.455510
>    136.910426
>    134.311765
>    115.881292
>    116.918458
>
> These values are all over the place - this is not a reproducible test.
>
> 2. In the present state you need to re-run it > 30+ times and take an
> average. The standard deviation for the values for the northd loop is >
> 10%. Compared to that the reproducibility of ovn-heater is significantly
> better. I usually get less than 0.5% difference between runs if there was
> no iteration failures. I would suggest using that instead if you want to do
> performance comparisons until we have figured out what affects the
> perf-test.
>
> 3. It is using the short term running average value in reports which is
> probably wrong because you have very significant skew from the last several
> values.
>
> I will look into all of these.
>
Thanks for the summary! However, I think there is a bigger problem
(probably related to my environment) than the stability of the test (make
check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in an earlier
email I observed even worse results with a large scale topology closer to a
real world deployment of ovn-k8s just testing with the command:
    ovn-nbctl --print-wait-time --wait=sb sync

This command simply triggers a change in the NB_Global table and waits for
northd to complete the recompute and update the SB. It doesn't have to be
the "sync" command; any change to the NB DB produces a similar result (e.g.
ovn-nbctl --print-wait-time --wait=sb ls-add ls1).

Without parallel:
ovn-northd completion: 7807ms

With parallel:
ovn-northd completion: 41267ms

This result is stable and consistent when repeating the command on my
machine. Would you try it on your machine as well? I understand that only
the lflow generation part can be parallelized and that it doesn't remove
all the bottlenecks, but I did expect it to be faster rather than slower.
If your result always shows that parallel is better, then I will have to
dig into it myself on my test machine.

Thanks,
Han

> Brgds,
> On 30/09/2021 08:26, Han Zhou wrote:
>
>
>
> On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <
> anton.ivanov@cambridgegreys.com> wrote:
>
>> After quickly adding some more prints into the testsuite.
>>
>> Test 1:
>>
>> Without
>>
>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 1130
>>   Average (NB in msec): 620.375000
>>   Maximum (SB in msec): 23
>>   Average (SB in msec): 21.468759
>>   Maximum (northd-loop in msec): 6002
>>   Minimum (northd-loop in msec): 0
>>   Average (northd-loop in msec): 914.760417
>>   Long term average (northd-loop in msec): 104.799340
>>
>> With
>>
>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>   ---
>>   Maximum (NB in msec): 1148
>>   Average (NB in msec): 630.250000
>>   Maximum (SB in msec): 24
>>   Average (SB in msec): 21.468744
>>   Maximum (northd-loop in msec): 6090
>>   Minimum (northd-loop in msec): 0
>>   Average (northd-loop in msec): 762.101565
>>   Long term average (northd-loop in msec): 80.735192
>>
>> The metric which actually matters and which SHOULD me measured - long
>> term average is better by 20%. Using short term average instead of long
>> term in the test suite is actually a BUG.
>>
> Good catch!
>
>
>> Are you running yours under some sort of virtualization?
>>
>
> No, I am testing on a bare-metal.
>
>
>> A.
>> On 30/09/2021 07:52, Han Zhou wrote:
>>
>> Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @
>> 2.90GHz, 24 cores.
>> It is weird why my result is so different. I also verified with a scale
>> test script that creates a large scale NB/SB with 800 nodes of simulated
>> k8s setup. And then just run:
>>     ovn-nbctl --print-wait-time --wait=sb sync
>>
>> Without parallel:
>> ovn-northd completion: 7807ms
>>
>> With parallel:
>> ovn-northd completion: 41267ms
>>
>> I suspected the hmap size problem but I tried changing the initial size
>> to 64k buckets and it didn't help. I will find some time to check the
>> "perf" reports.
>>
>> Thanks,
>> Han
>>
>> On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <
>> anton.ivanov@cambridgegreys.com> wrote:
>>
>>> On 30/09/2021 07:16, Anton Ivanov wrote:
>>>
>>> Results on a Ryzen 5 3600 - 6 cores 12 threads
>>>
>>> I will also have a look into the "maximum" measurement for multi-thread.
>>>
>>> It does not tie up with the drop in average across the board.
>>>
>>> A.
>>>
>>>
>>> Without
>>>
>>>
>>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>   ---
>>>   Maximum (NB in msec): 1256
>>>   Average (NB in msec): 679.463785
>>>   Maximum (SB in msec): 25
>>>   Average (SB in msec): 22.489798
>>>   Maximum (northd-loop in msec): 1347
>>>   Average (northd-loop in msec): 799.944878
>>>
>>>   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>   ---
>>>   Maximum (NB in msec): 1956
>>>   Average (NB in msec): 809.387285
>>>   Maximum (SB in msec): 24
>>>   Average (SB in msec): 21.649258
>>>   Maximum (northd-loop in msec): 2011
>>>   Average (northd-loop in msec): 961.718686
>>>
>>>   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>   ---
>>>   Maximum (NB in msec): 557
>>>   Average (NB in msec): 474.010337
>>>   Maximum (SB in msec): 15
>>>   Average (SB in msec): 13.927192
>>>   Maximum (northd-loop in msec): 1261
>>>   Average (northd-loop in msec): 580.999122
>>>
>>>   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>   ---
>>>   Maximum (NB in msec): 756
>>>   Average (NB in msec): 625.614724
>>>   Maximum (SB in msec): 15
>>>   Average (SB in msec): 14.181048
>>>   Maximum (northd-loop in msec): 1649
>>>   Average (northd-loop in msec): 746.208332
>>>
>>>
>>> With
>>>
>>>   1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>   ---
>>>   Maximum (NB in msec): 1140
>>>   Average (NB in msec): 631.125000
>>>   Maximum (SB in msec): 24
>>>   Average (SB in msec): 21.453609
>>>   Maximum (northd-loop in msec): 6080
>>>   Average (northd-loop in msec): 759.718815
>>>
>>>   2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>   ---
>>>   Maximum (NB in msec): 1210
>>>   Average (NB in msec): 673.000000
>>>   Maximum (SB in msec): 27
>>>   Average (SB in msec): 22.453125
>>>   Maximum (northd-loop in msec): 6514
>>>   Average (northd-loop in msec): 808.596842
>>>
>>>   5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>   ---
>>>   Maximum (NB in msec): 798
>>>   Average (NB in msec): 429.750000
>>>   Maximum (SB in msec): 15
>>>   Average (SB in msec): 12.998533
>>>   Maximum (northd-loop in msec): 3835
>>>   Average (northd-loop in msec): 564.875986
>>>
>>>   6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
>>> Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>   ---
>>>   Maximum (NB in msec): 1074
>>>   Average (NB in msec): 593.875000
>>>   Maximum (SB in msec): 14
>>>   Average (SB in msec): 13.655273
>>>   Maximum (northd-loop in msec): 4973
>>>   Average (northd-loop in msec): 771.102605
>>>
>>> The only one slower is test 6 which I will look into.
>>>
>>> The rest are > 5% faster.
>>>
>>> A.
>>>
>>> On 30/09/2021 00:56, Han Zhou wrote:
>>>
>>>
>>>
>>> On Wed, Sep 15, 2021 at 5:45 AM <anton.ivanov@cambridgegreys.com> wrote:
>>> >
>>> > From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>> >
>>> > Restore parallel build with dp groups using rwlock instead
>>> > of per row locking as an underlying mechanism.
>>> >
>>> > This provides improvement ~ 10% end-to-end on ovn-heater
>>> > under virutalization despite awakening some qemu gremlin
>>> > which makes qemu climb to silly CPU usage. The gain on
>>> > bare metal is likely to be higher.
>>> >
>>> Hi Anton,
>>>
>>> I am trying to see the benefit of parallel_build, but encountered
>>> unexpected performance result when running the perf tests with command:
>>>      make check-perf TESTSUITEFLAGS="--rebuild"
>>>
>>> It shows significantly worse performance than without parallel_build.
>>> For dp_group = no cases, it is better, but still ~30% slower than without
>>> parallel_build. I have 24 cores, but each thread is not consuming much CPU
>>> except the main thread. I also tried hardcode the number of thread to just
>>> 4, which end up with slightly better results, but still far behind "without
>>> parallel_build".
>>>
>>>              no parallel                                   |
>>>    parallel  (24 pool threads)                  |        parallel with (4
>>> pool threads)
>>>                                                            |
>>>                                               |
>>>     1: ovn-northd basic scale test -- 200 Hypervisors, 200 |    1:
>>> ovn-northd basic scale test -- 200 Hypervisors, 200 |    1: ovn-northd
>>> basic scale test -- 200 Hypervisors, 200
>>>     ---                                                    |    ---
>>>                                                |    ---
>>>     Maximum (NB in msec): 1058                             |    Maximum
>>> (NB in msec): 4269                             |    Maximum (NB in msec):
>>> 4097
>>>     Average (NB in msec): 836.941167                       |    Average
>>> (NB in msec): 3697.253931                      |    Average (NB in msec):
>>> 3498.311525
>>>     Maximum (SB in msec): 30                               |    Maximum
>>> (SB in msec): 30                               |    Maximum (SB in msec):
>>> 28
>>>     Average (SB in msec): 25.934011                        |    Average
>>> (SB in msec): 26.001840                        |    Average (SB in msec):
>>> 25.685091
>>>     Maximum (northd-loop in msec): 1204                    |    Maximum
>>> (northd-loop in msec): 4379                    |    Maximum (northd-loop in
>>> msec): 4251
>>>     Average (northd-loop in msec): 1005.330078             |    Average
>>> (northd-loop in msec): 4233.871504             |    Average (northd-loop in
>>> msec): 4022.774208
>>>                                                            |
>>>                                               |
>>>     2: ovn-northd basic scale test -- 200 Hypervisors, 200 |    2:
>>> ovn-northd basic scale test -- 200 Hypervisors, 200 |    2: ovn-northd
>>> basic scale test -- 200 Hypervisors, 200
>>>     ---                                                    |    ---
>>>                                                |    ---
>>>     Maximum (NB in msec): 1124                             |    Maximum
>>> (NB in msec): 1480                             |    Maximum (NB in msec):
>>> 1331
>>>     Average (NB in msec): 892.403405                       |    Average
>>> (NB in msec): 1206.189287                      |    Average (NB in msec):
>>> 1089.378455
>>>     Maximum (SB in msec): 29                               |    Maximum
>>> (SB in msec): 31                               |    Maximum (SB in msec):
>>> 30
>>>     Average (SB in msec): 26.922632                        |    Average
>>> (SB in msec): 26.636706                        |    Average (SB in msec):
>>> 25.657484
>>>     Maximum (northd-loop in msec): 1275                    |    Maximum
>>> (northd-loop in msec): 1639                    |    Maximum (northd-loop in
>>> msec): 1495
>>>     Average (northd-loop in msec): 1074.917873             |    Average
>>> (northd-loop in msec): 1458.152327             |    Average (northd-loop in
>>> msec): 1301.057201
>>>                                                            |
>>>                                               |
>>>     5: ovn-northd basic scale test -- 500 Hypervisors, 50 L|    5:
>>> ovn-northd basic scale test -- 500 Hypervisors, 50 L|    5: ovn-northd
>>> basic scale test -- 500 Hypervisors, 50
>>>     ---                                                    |    ---
>>>                                                |    ---
>>>     Maximum (NB in msec): 768                              |    Maximum
>>> (NB in msec): 3086                             |    Maximum (NB in msec):
>>> 2876
>>>     Average (NB in msec): 614.491938                       |    Average
>>> (NB in msec): 2681.688365                      |    Average (NB in msec):
>>> 2531.255444
>>>     Maximum (SB in msec): 18                               |    Maximum
>>> (SB in msec): 17                               |    Maximum (SB in msec):
>>> 18
>>>     Average (SB in msec): 16.347526                        |    Average
>>> (SB in msec): 15.955263                        |    Average (SB in msec):
>>> 16.278075
>>>     Maximum (northd-loop in msec): 889                     |    Maximum
>>> (northd-loop in msec): 3247                    |    Maximum (northd-loop in
>>> msec): 3031
>>>     Average (northd-loop in msec): 772.083572              |    Average
>>> (northd-loop in msec): 3117.504297             |    Average (northd-loop in
>>> msec): 2833.182361
>>>                                                            |
>>>                                               |
>>>     6: ovn-northd basic scale test -- 500 Hypervisors, 50 L|    6:
>>> ovn-northd basic scale test -- 500 Hypervisors, 50 L|    6: ovn-northd
>>> basic scale test -- 500 Hypervisors, 50
>>>     ---                                                    |    ---
>>>                                                |    ---
>>>     Maximum (NB in msec): 1046                             |    Maximum
>>> (NB in msec): 1371                             |    Maximum (NB in msec):
>>> 1262
>>>     Average (NB in msec): 827.735852                       |    Average
>>> (NB in msec): 1135.514228                      |    Average (NB in msec):
>>> 970.544792
>>>     Maximum (SB in msec): 19                               |    Maximum
>>> (SB in msec): 18                               |    Maximum (SB in msec):
>>> 19
>>>     Average (SB in msec): 16.828127                        |    Average
>>> (SB in msec): 16.083914                        |    Average (SB in msec):
>>> 15.602525
>>>     Maximum (northd-loop in msec): 1163                    |    Maximum
>>> (northd-loop in msec): 1545                    |    Maximum (northd-loop in
>>> msec): 1411
>>>     Average (northd-loop in msec): 972.567407              |    Average
>>> (northd-loop in msec): 1328.617583             |    Average (northd-loop in
>>> msec): 1207.667100
>>>
>>> I didn't debug yet, but do you have any clue what could be the reason? I
>>> am using the upstream commit 9242f27f63 which already included this patch.
>>> Below is my change to the perf-northd.at file just to enable
>>> parallel_build:
>>>
>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>> index 74b69e9d4..9328c2e21 100644
>>> --- a/tests/perf-northd.at
>>> +++ b/tests/perf-northd.at
>>> @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200
>>> Hypervisors, 200 Logical Ports/Hype
>>>  PERF_RECORD_START()
>>>
>>>  ovn_start
>>> +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>
>>>  BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>>
>>> @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500
>>> Hypervisors, 50 Logical Ports/Hyper
>>>  PERF_RECORD_START()
>>>
>>>  ovn_start
>>> +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>
>>>  BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>>
>>> Thanks,
>>> Han
>>>
>>>
>>> --
>>> Anton R. Ivanov
>>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>>> https://www.cambridgegreys.com/
>>>
>>>
>>> --
>>> Anton R. Ivanov
>>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>>> https://www.cambridgegreys.com/
>>>
>>> --
>> Anton R. Ivanov
>> Cambridgegreys Limited. Registered in England. Company Number 10273661
>> https://www.cambridgegreys.com/
>>
>> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
>
>
Anton Ivanov Sept. 30, 2021, 9:03 p.m. UTC | #11
On 30/09/2021 20:48, Han Zhou wrote:
>
>
> On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov 
> <anton.ivanov@cambridgegreys.com 
> <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>
>     Summary of findings.
>
>     1. The numbers on the perf test do not align with heater which is
>     much closer to a realistic load. On some tests where heater gives
>     5-10% end-to-end improvement with parallelization we get worse
>     results with the perf-test. You spotted this one correctly.
>
>     Example of the northd average pulled out of the test report via
>     grep and sed.
>
>        127.489353
>        131.509458
>        116.088205
>        94.721911
>        119.629756
>        114.896258
>        124.811069
>        129.679160
>        106.699905
>        134.490338
>        112.106713
>        135.957658
>        132.471111
>        94.106849
>        117.431450
>        115.861592
>        106.830657
>        132.396905
>        107.092542
>        128.945760
>        94.298464
>        120.455510
>        136.910426
>        134.311765
>        115.881292
>        116.918458
>
>     These values are all over the place - this is not a reproducible test.
>
>     2. In the present state you need to re-run it > 30+ times and take
>     an average. The standard deviation for the values for the northd
>     loop is > 10%. Compared to that the reproducibility of ovn-heater
>     is significantly better. I usually get less than 0.5% difference
>     between runs if there was no iteration failures. I would suggest
>     using that instead if you want to do performance comparisons until
>     we have figured out what affects the perf-test.
>
>     3. It is using the short term running average value in reports
>     which is probably wrong because you have very significant skew
>     from the last several values.
>
>     I will look into all of these.
>
> Thanks for the summary! However, I think there is a bigger problem 
> (probably related to my environment) than the stability of the test 
> (make check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in 
> an earlier email I observed even worse results with a large scale 
> topology closer to a real world deployment of ovn-k8s just testing 
> with the command:
>     ovn-nbctl --print-wait-time --wait=sb sync
>
> This command simply triggers a change in NB_Global table and wait for 
> northd to complete all the recompute and update SB. It doesn't have to 
> use "sync" command but any change to the NB DB produces similar result 
> (e.g.: ovn-nbctl --print-wait-time --wait=sb ls-add ls1)
>
> Without parallel:
> ovn-northd completion: 7807ms
>
> With parallel:
> ovn-northd completion: 41267ms

Is this with current master or prior to these patches?

1. There was an issue prior to these patches where the hash was not sized
correctly on the first iteration with an existing database (i.e. when loading
a large database for the first time). These numbers sound about right for
when that bug was around.

2. With dp groups there should at present be NO DIFFERENCE in a single
compute cycle on an existing database between a run with parallel and one
without. This is because the first cycle does not use parallel compute: it is
disabled in order to achieve the correct hash sizing for future cycles by
auto-scaling the hash.
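
Roughly, the mechanism is something like the sketch below (illustrative only,
with made-up names - not the actual northd code): the first cycle runs
single-threaded so the hash can grow normally, the resulting flow count is
recorded, and only later cycles take the parallel path against a table
pre-sized from that count.

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the auto-scaling described above; how the
 * real code tracks this differs in detail. */
struct lflow_sizing {
    size_t last_n_flows;   /* Flow count produced by the previous cycle. */
    bool known;            /* True once at least one full cycle has run. */
};

static bool
use_parallel_this_cycle(const struct lflow_sizing *s, bool parallel_enabled)
{
    /* The first cycle always runs serially so that its flow count can
     * seed the hash size for the cycles that follow. */
    return parallel_enabled && s->known;
}

static void
finish_cycle(struct lflow_sizing *s, size_t flows_built)
{
    s->last_n_flows = flows_built;
    s->known = true;
}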

So which exact tag/commit are you running this with, and which options are
on/off?

A.

>
> This result is stable and consistent when repeating the command on my 
> machine. Would you try it on your machine as well? I understand that 
> only the lflow generation part can be parallelized and it doesn't 
> solve all the bottleneck, but I did expect it to be faster instead of 
> slower. If your result always shows that parallel is better, then I 
> will have to dig it out myself on my test machine.
>
> Thanks,
> Han
>
>     Brgds,
>
>     On 30/09/2021 08:26, Han Zhou wrote:
>>
>>
>>     On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov
>>     <anton.ivanov@cambridgegreys.com
>>     <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>>
>>         After quickly adding some more prints into the testsuite.
>>
>>         Test 1:
>>
>>         Without
>>
>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200
>>         Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>           ---
>>           Maximum (NB in msec): 1130
>>           Average (NB in msec): 620.375000
>>           Maximum (SB in msec): 23
>>           Average (SB in msec): 21.468759
>>           Maximum (northd-loop in msec): 6002
>>           Minimum (northd-loop in msec): 0
>>           Average (northd-loop in msec): 914.760417
>>           Long term average (northd-loop in msec): 104.799340
>>
>>         With
>>
>>           1: ovn-northd basic scale test -- 200 Hypervisors, 200
>>         Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>           ---
>>           Maximum (NB in msec): 1148
>>           Average (NB in msec): 630.250000
>>           Maximum (SB in msec): 24
>>           Average (SB in msec): 21.468744
>>           Maximum (northd-loop in msec): 6090
>>           Minimum (northd-loop in msec): 0
>>           Average (northd-loop in msec): 762.101565
>>           Long term average (northd-loop in msec): 80.735192
>>
>>         The metric which actually matters and which SHOULD be
>>         measured - long term average is better by 20%. Using short
>>         term average instead of long term in the test suite is
>>         actually a BUG.
>>
>>     Good catch!
>>
>>         Are you running yours under some sort of virtualization?
>>
>>
>>     No, I am testing on a bare-metal.
>>
>>         A.
>>
>>         On 30/09/2021 07:52, Han Zhou wrote:
>>>         Thanks Anton for checking. I am using: Intel(R) Core(TM)
>>>         i9-7920X CPU @ 2.90GHz, 24 cores.
>>>         It is weird why my result is so different. I also verified
>>>         with a scale test script that creates a large scale NB/SB
>>>         with 800 nodes of simulated k8s setup. And then just run:
>>>             ovn-nbctl --print-wait-time --wait=sb sync
>>>
>>>         Without parallel:
>>>         ovn-northd completion: 7807ms
>>>
>>>         With parallel:
>>>         ovn-northd completion: 41267ms
>>>
>>>         I suspected the hmap size problem but I tried changing the
>>>         initial size to 64k buckets and it didn't help. I will find
>>>         some time to check the "perf" reports.
>>>
>>>         Thanks,
>>>         Han
>>>
>>>         On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov
>>>         <anton.ivanov@cambridgegreys.com
>>>         <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>>>
>>>             On 30/09/2021 07:16, Anton Ivanov wrote:
>>>>             Results on a Ryzen 5 3600 - 6 cores 12 threads
>>>
>>>             I will also have a look into the "maximum" measurement
>>>             for multi-thread.
>>>
>>>             It does not tie up with the drop in average across the
>>>             board.
>>>
>>>             A.
>>>
>>>>
>>>>             Without
>>>>
>>>>
>>>>               1: ovn-northd basic scale test -- 200 Hypervisors,
>>>>             200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>               ---
>>>>               Maximum (NB in msec): 1256
>>>>               Average (NB in msec): 679.463785
>>>>               Maximum (SB in msec): 25
>>>>               Average (SB in msec): 22.489798
>>>>               Maximum (northd-loop in msec): 1347
>>>>               Average (northd-loop in msec): 799.944878
>>>>
>>>>               2: ovn-northd basic scale test -- 200 Hypervisors,
>>>>             200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>               ---
>>>>               Maximum (NB in msec): 1956
>>>>               Average (NB in msec): 809.387285
>>>>               Maximum (SB in msec): 24
>>>>               Average (SB in msec): 21.649258
>>>>               Maximum (northd-loop in msec): 2011
>>>>               Average (northd-loop in msec): 961.718686
>>>>
>>>>               5: ovn-northd basic scale test -- 500 Hypervisors, 50
>>>>             Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>               ---
>>>>               Maximum (NB in msec): 557
>>>>               Average (NB in msec): 474.010337
>>>>               Maximum (SB in msec): 15
>>>>               Average (SB in msec): 13.927192
>>>>               Maximum (northd-loop in msec): 1261
>>>>               Average (northd-loop in msec): 580.999122
>>>>
>>>>               6: ovn-northd basic scale test -- 500 Hypervisors, 50
>>>>             Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>               ---
>>>>               Maximum (NB in msec): 756
>>>>               Average (NB in msec): 625.614724
>>>>               Maximum (SB in msec): 15
>>>>               Average (SB in msec): 14.181048
>>>>               Maximum (northd-loop in msec): 1649
>>>>               Average (northd-loop in msec): 746.208332
>>>>
>>>>
>>>>             With
>>>>
>>>>               1: ovn-northd basic scale test -- 200 Hypervisors,
>>>>             200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>               ---
>>>>               Maximum (NB in msec): 1140
>>>>               Average (NB in msec): 631.125000
>>>>               Maximum (SB in msec): 24
>>>>               Average (SB in msec): 21.453609
>>>>               Maximum (northd-loop in msec): 6080
>>>>               Average (northd-loop in msec): 759.718815
>>>>
>>>>               2: ovn-northd basic scale test -- 200 Hypervisors,
>>>>             200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>               ---
>>>>               Maximum (NB in msec): 1210
>>>>               Average (NB in msec): 673.000000
>>>>               Maximum (SB in msec): 27
>>>>               Average (SB in msec): 22.453125
>>>>               Maximum (northd-loop in msec): 6514
>>>>               Average (northd-loop in msec): 808.596842
>>>>
>>>>               5: ovn-northd basic scale test -- 500 Hypervisors, 50
>>>>             Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
>>>>               ---
>>>>               Maximum (NB in msec): 798
>>>>               Average (NB in msec): 429.750000
>>>>               Maximum (SB in msec): 15
>>>>               Average (SB in msec): 12.998533
>>>>               Maximum (northd-loop in msec): 3835
>>>>               Average (northd-loop in msec): 564.875986
>>>>
>>>>               6: ovn-northd basic scale test -- 500 Hypervisors, 50
>>>>             Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
>>>>               ---
>>>>               Maximum (NB in msec): 1074
>>>>               Average (NB in msec): 593.875000
>>>>               Maximum (SB in msec): 14
>>>>               Average (SB in msec): 13.655273
>>>>               Maximum (northd-loop in msec): 4973
>>>>               Average (northd-loop in msec): 771.102605
>>>>
>>>>             The only one slower is test 6 which I will look into.
>>>>
>>>>             The rest are > 5% faster.
>>>>
>>>>             A.
>>>>
>>>>             On 30/09/2021 00:56, Han Zhou wrote:
>>>>>
>>>>>
>>>>>             On Wed, Sep 15, 2021 at 5:45 AM
>>>>>             <anton.ivanov@cambridgegreys.com
>>>>>             <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>>>>>             >
>>>>>             > From: Anton Ivanov <anton.ivanov@cambridgegreys.com
>>>>>             <mailto:anton.ivanov@cambridgegreys.com>>
>>>>>             >
>>>>>             > Restore parallel build with dp groups using rwlock
>>>>>             instead
>>>>>             > of per row locking as an underlying mechanism.
>>>>>             >
>>>>>             > This provides improvement ~ 10% end-to-end on ovn-heater
>>>>>             > under virutalization despite awakening some qemu gremlin
>>>>>             > which makes qemu climb to silly CPU usage. The gain on
>>>>>             > bare metal is likely to be higher.
>>>>>             >
>>>>>             Hi Anton,
>>>>>
>>>>>             I am trying to see the benefit of parallel_build, but
>>>>>             encountered unexpected performance result when running
>>>>>             the perf tests with command:
>>>>>                  make check-perf TESTSUITEFLAGS="--rebuild"
>>>>>
>>>>>             It shows significantly worse performance than without
>>>>>             parallel_build. For dp_group = no cases, it is better,
>>>>>             but still ~30% slower than without parallel_build. I
>>>>>             have 24 cores, but each thread is not consuming much
>>>>>             CPU except the main thread. I also tried hardcode the
>>>>>             number of thread to just 4, which end up with slightly
>>>>>             better results, but still far behind "without
>>>>>             parallel_build".
>>>>>
>>>>>                        no parallel    |              parallel (24
>>>>>             pool threads)       |        parallel with (4 pool
>>>>>             threads)
>>>>>              |     |
>>>>>                 1: ovn-northd basic scale test -- 200 Hypervisors,
>>>>>             200 |    1: ovn-northd basic scale test -- 200
>>>>>             Hypervisors, 200 |  1: ovn-northd basic scale test --
>>>>>             200 Hypervisors, 200
>>>>>                 ---    |    ---        |    ---
>>>>>                 Maximum (NB in msec): 1058     |    Maximum (NB in
>>>>>             msec): 4269             |    Maximum (NB in msec): 4097
>>>>>                 Average (NB in msec): 836.941167     |    Average
>>>>>             (NB in msec): 3697.253931            |    Average (NB
>>>>>             in msec): 3498.311525
>>>>>                 Maximum (SB in msec): 30   |    Maximum (SB in
>>>>>             msec): 30     |    Maximum (SB in msec): 28
>>>>>                 Average (SB in msec): 25.934011      |    Average
>>>>>             (SB in msec): 26.001840            |    Average (SB in
>>>>>             msec): 25.685091
>>>>>                 Maximum (northd-loop in msec): 1204    |  
>>>>>              Maximum (northd-loop in msec): 4379      |    Maximum
>>>>>             (northd-loop in msec): 4251
>>>>>                 Average (northd-loop in msec): 1005.330078   |  
>>>>>              Average (northd-loop in msec): 4233.871504       |  
>>>>>              Average (northd-loop in msec): 4022.774208
>>>>>                |       |
>>>>>                 2: ovn-northd basic scale test -- 200 Hypervisors,
>>>>>             200 |    2: ovn-northd basic scale test -- 200
>>>>>             Hypervisors, 200 |  2: ovn-northd basic scale test --
>>>>>             200 Hypervisors, 200
>>>>>                 ---    |    ---        |    ---
>>>>>                 Maximum (NB in msec): 1124     |    Maximum (NB in
>>>>>             msec): 1480             |    Maximum (NB in msec): 1331
>>>>>                 Average (NB in msec): 892.403405     |    Average
>>>>>             (NB in msec): 1206.189287            |    Average (NB
>>>>>             in msec): 1089.378455
>>>>>                 Maximum (SB in msec): 29   |    Maximum (SB in
>>>>>             msec): 31     |    Maximum (SB in msec): 30
>>>>>                 Average (SB in msec): 26.922632      |    Average
>>>>>             (SB in msec): 26.636706            |    Average (SB in
>>>>>             msec): 25.657484
>>>>>                 Maximum (northd-loop in msec): 1275    |  
>>>>>              Maximum (northd-loop in msec): 1639      |    Maximum
>>>>>             (northd-loop in msec): 1495
>>>>>                 Average (northd-loop in msec): 1074.917873   |  
>>>>>              Average (northd-loop in msec): 1458.152327       |  
>>>>>              Average (northd-loop in msec): 1301.057201
>>>>>                |       |
>>>>>                 5: ovn-northd basic scale test -- 500 Hypervisors,
>>>>>             50 L|    5: ovn-northd basic scale test -- 500
>>>>>             Hypervisors, 50 L|  5: ovn-northd basic scale test --
>>>>>             500 Hypervisors, 50
>>>>>                 ---    |    ---        |    ---
>>>>>                 Maximum (NB in msec): 768      |    Maximum (NB in
>>>>>             msec): 3086             |    Maximum (NB in msec): 2876
>>>>>                 Average (NB in msec): 614.491938     |    Average
>>>>>             (NB in msec): 2681.688365            |    Average (NB
>>>>>             in msec): 2531.255444
>>>>>                 Maximum (SB in msec): 18   |    Maximum (SB in
>>>>>             msec): 17     |    Maximum (SB in msec): 18
>>>>>                 Average (SB in msec): 16.347526      |    Average
>>>>>             (SB in msec): 15.955263            |    Average (SB in
>>>>>             msec): 16.278075
>>>>>                 Maximum (northd-loop in msec): 889   |    Maximum
>>>>>             (northd-loop in msec): 3247      |    Maximum
>>>>>             (northd-loop in msec): 3031
>>>>>                 Average (northd-loop in msec): 772.083572    |  
>>>>>              Average (northd-loop in msec): 3117.504297       |  
>>>>>              Average (northd-loop in msec): 2833.182361
>>>>>                |       |
>>>>>                 6: ovn-northd basic scale test -- 500 Hypervisors,
>>>>>             50 L|    6: ovn-northd basic scale test -- 500
>>>>>             Hypervisors, 50 L|  6: ovn-northd basic scale test --
>>>>>             500 Hypervisors, 50
>>>>>                 ---    |    ---        |    ---
>>>>>                 Maximum (NB in msec): 1046     |    Maximum (NB in
>>>>>             msec): 1371             |    Maximum (NB in msec): 1262
>>>>>                 Average (NB in msec): 827.735852     |    Average
>>>>>             (NB in msec): 1135.514228            |    Average (NB
>>>>>             in msec): 970.544792
>>>>>                 Maximum (SB in msec): 19   |    Maximum (SB in
>>>>>             msec): 18     |    Maximum (SB in msec): 19
>>>>>                 Average (SB in msec): 16.828127      |    Average
>>>>>             (SB in msec): 16.083914            |    Average (SB in
>>>>>             msec): 15.602525
>>>>>                 Maximum (northd-loop in msec): 1163    |  
>>>>>              Maximum (northd-loop in msec): 1545      |    Maximum
>>>>>             (northd-loop in msec): 1411
>>>>>                 Average (northd-loop in msec): 972.567407    |  
>>>>>              Average (northd-loop in msec): 1328.617583       |  
>>>>>              Average (northd-loop in msec): 1207.667100
>>>>>
>>>>>             I didn't debug yet, but do you have any clue what
>>>>>             could be the reason? I am using the upstream commit
>>>>>             9242f27f63 which already included this patch.
>>>>>             Below is my change to the perf-northd.at
>>>>>             <http://perf-northd.at> file just to enable
>>>>>             parallel_build:
>>>>>
>>>>>             diff --git a/tests/perf-northd.at
>>>>>             <http://perf-northd.at> b/tests/perf-northd.at
>>>>>             <http://perf-northd.at>
>>>>>             index 74b69e9d4..9328c2e21 100644
>>>>>             --- a/tests/perf-northd.at <http://perf-northd.at>
>>>>>             +++ b/tests/perf-northd.at <http://perf-northd.at>
>>>>>             @@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale
>>>>>             test -- 200 Hypervisors, 200 Logical Ports/Hype
>>>>>              PERF_RECORD_START()
>>>>>
>>>>>              ovn_start
>>>>>             +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>>
>>>>>              BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
>>>>>
>>>>>             @@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale
>>>>>             test -- 500 Hypervisors, 50 Logical Ports/Hyper
>>>>>              PERF_RECORD_START()
>>>>>
>>>>>              ovn_start
>>>>>             +ovn-nbctl set nb_global . options:use_parallel_build=true
>>>>>
>>>>>              BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
>>>>>
>>>>>             Thanks,
>>>>>             Han
>>>>
>>>>
>>>>             -- 
>>>>             Anton R. Ivanov
>>>>             Cambridgegreys Limited. Registered in England. Company Number 10273661
>>>>             https://www.cambridgegreys.com/  <https://www.cambridgegreys.com/>
>>>
>>>
>>>             -- 
>>>             Anton R. Ivanov
>>>             Cambridgegreys Limited. Registered in England. Company Number 10273661
>>>             https://www.cambridgegreys.com/  <https://www.cambridgegreys.com/>
>>>
>>         -- 
>>         Anton R. Ivanov
>>         Cambridgegreys Limited. Registered in England. Company Number 10273661
>>         https://www.cambridgegreys.com/  <https://www.cambridgegreys.com/>
>>
>     -- 
>     Anton R. Ivanov
>     Cambridgegreys Limited. Registered in England. Company Number 10273661
>     https://www.cambridgegreys.com/  <https://www.cambridgegreys.com/>
>
Han Zhou Oct. 1, 2021, 12:32 a.m. UTC | #12
On Thu, Sep 30, 2021 at 2:03 PM Anton Ivanov <
anton.ivanov@cambridgegreys.com> wrote:

> On 30/09/2021 20:48, Han Zhou wrote:
>
>
>
> On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov <
> anton.ivanov@cambridgegreys.com> wrote:
>
>> Summary of findings.
>>
>> 1. The numbers on the perf test do not align with heater which is much
>> closer to a realistic load. On some tests where heater gives 5-10%
>> end-to-end improvement with parallelization we get worse results with the
>> perf-test. You spotted this one correctly.
>>
>> Example of the northd average pulled out of the test report via grep and
>> sed.
>>
>>    127.489353
>>    131.509458
>>    116.088205
>>    94.721911
>>    119.629756
>>    114.896258
>>    124.811069
>>    129.679160
>>    106.699905
>>    134.490338
>>    112.106713
>>    135.957658
>>    132.471111
>>    94.106849
>>    117.431450
>>    115.861592
>>    106.830657
>>    132.396905
>>    107.092542
>>    128.945760
>>    94.298464
>>    120.455510
>>    136.910426
>>    134.311765
>>    115.881292
>>    116.918458
>>
>> These values are all over the place - this is not a reproducible test.
>>
>> 2. In the present state you need to re-run it > 30+ times and take an
>> average. The standard deviation for the values for the northd loop is >
>> 10%. Compared to that the reproducibility of ovn-heater is significantly
>> better. I usually get less than 0.5% difference between runs if there was
>> no iteration failures. I would suggest using that instead if you want to do
>> performance comparisons until we have figured out what affects the
>> perf-test.
>>
>> 3. It is using the short term running average value in reports which is
>> probably wrong because you have very significant skew from the last several
>> values.
>>
>> I will look into all of these.
>>
> Thanks for the summary! However, I think there is a bigger problem
> (probably related to my environment) than the stability of the test (make
> check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in an earlier
> email I observed even worse results with a large scale topology closer to a
> real world deployment of ovn-k8s just testing with the command:
>     ovn-nbctl --print-wait-time --wait=sb sync
>
> This command simply triggers a change in NB_Global table and wait for
> northd to complete all the recompute and update SB. It doesn't have to use
> "sync" command but any change to the NB DB produces similar result (e.g.:
> ovn-nbctl --print-wait-time --wait=sb ls-add ls1)
>
> Without parallel:
> ovn-northd completion: 7807ms
>
> With parallel:
> ovn-northd completion: 41267ms
>
> Is this with current master or prior to these patches?
>
> 1. There was an issue prior to these where the hash on first iteration with
> an existing database when loading a large database for the first time was
> not sized correctly. These numbers sound about right when this bug was
> around.
>
The patches are included. The commit id is 9242f27f63 as mentioned in my
first email.

> 2. There should be NO DIFFERENCE in a single compute cycle with an
> existing database between a run with parallel and without with dp groups at
> present. This is because the first cycle does not use parallel compute. It
> is disabled in order to achieve the correct hash sizings for future cycle
> by auto-scaling the hash.
>
Yes, I understand this and I did enable dp-group for the above "ovn-nbctl
sync" test, so the number I showed above for "with parallel" was for the
2nd run and onwards. For the first round the result is exactly the same as
without parallel.

I just tried disabling dp-group for the large scale "ovn-nbctl sync" test
(after taking some effort to squeeze out memory space on my desktop), and
the result shows that parallel build performs slightly better (although it
is 3x slower than with dp-group and without parallel, which is expected).
The results are summarized together below:

Without parallel, with dp-group:
ovn-northd completion: 7807ms

With parallel, with dp-group:
ovn-northd completion: 41267ms

without parallel, without dp-group:
ovn-northd completion: 27996ms

with parallel, without dp-group:
ovn-northd completion: 26584ms

Now the interesting part:
I implemented a POC of a hash-based mutex array that replaces the rwlock
in the function do_ovn_lflow_add_pd(), and the performance is greatly
improved for the dp-group test:

with parallel, with dp-group (hash based mutex):
ovn-northd completion: 5081ms

This is 8x faster than the current parallel implementation and 30% faster
than without parallel. This result looks much more reasonable to me. My
theory is that when using parallel with dp-group, the rwlock contention is
causing the low CPU utilization of the threads and the overall slowness on
my machine. I will refine the POC into a formal patch and send it for
review, hopefully by tomorrow.
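
For reference, the kind of hash-indexed mutex array meant here looks roughly
like the sketch below (illustrative only - plain pthreads and made-up names,
not the actual POC). A fixed, power-of-two pool of mutexes is indexed by the
flow hash, so two writers only contend when their flows hash to the same slot
instead of all of them serializing on a single table-wide lock:

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LFLOW_LOCKS 1024                     /* Must be a power of two. */

static pthread_mutex_t lflow_locks[LFLOW_LOCKS];

static void
lflow_locks_init(void)
{
    for (size_t i = 0; i < LFLOW_LOCKS; i++) {
        pthread_mutex_init(&lflow_locks[i], NULL);
    }
}

static pthread_mutex_t *
lflow_lock_for_hash(uint32_t hash)
{
    return &lflow_locks[hash & (LFLOW_LOCKS - 1)];
}

/* The add path would then bracket its lookup/insert with:
 *
 *     pthread_mutex_lock(lflow_lock_for_hash(hash));
 *     ... find or insert the flow, update its od_group ...
 *     pthread_mutex_unlock(lflow_lock_for_hash(hash));
 */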

Thanks,
Han
Anton Ivanov Oct. 1, 2021, 6:04 a.m. UTC | #13
On 01/10/2021 01:32, Han Zhou wrote:
>
>
> On Thu, Sep 30, 2021 at 2:03 PM Anton Ivanov 
> <anton.ivanov@cambridgegreys.com 
> <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>
>     On 30/09/2021 20:48, Han Zhou wrote:
>>
>>
>>     On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov
>>     <anton.ivanov@cambridgegreys.com
>>     <mailto:anton.ivanov@cambridgegreys.com>> wrote:
>>
>>         Summary of findings.
>>
>>         1. The numbers on the perf test do not align with heater
>>         which is much closer to a realistic load. On some tests where
>>         heater gives 5-10% end-to-end improvement with
>>         parallelization we get worse results with the perf-test. You
>>         spotted this one correctly.
>>
>>         Example of the northd average pulled out of the test report
>>         via grep and sed.
>>
>>            127.489353
>>            131.509458
>>            116.088205
>>            94.721911
>>            119.629756
>>            114.896258
>>            124.811069
>>            129.679160
>>            106.699905
>>            134.490338
>>            112.106713
>>            135.957658
>>            132.471111
>>            94.106849
>>            117.431450
>>            115.861592
>>            106.830657
>>            132.396905
>>            107.092542
>>            128.945760
>>            94.298464
>>            120.455510
>>            136.910426
>>            134.311765
>>            115.881292
>>            116.918458
>>
>>         These values are all over the place - this is not a
>>         reproducible test.
>>
>>         2. In the present state you need to re-run it > 30+ times and
>>         take an average. The standard deviation for the values for
>>         the northd loop is > 10%. Compared to that the
>>         reproducibility of ovn-heater is significantly better. I
>>         usually get less than 0.5% difference between runs if there
>>         was no iteration failures. I would suggest using that instead
>>         if you want to do performance comparisons until we have
>>         figured out what affects the perf-test.
>>
>>         3. It is using the short term running average value in
>>         reports which is probably wrong because you have very
>>         significant skew from the last several values.
>>
>>         I will look into all of these.
>>
>>     Thanks for the summary! However, I think there is a bigger
>>     problem (probably related to my environment) than the stability
>>     of the test (make check-perf TESTSUITEFLAGS="--rebuild") itself.
>>     As I mentioned in an earlier email I observed even worse results
>>     with a large scale topology closer to a real world deployment of
>>     ovn-k8s just testing with the command:
>>         ovn-nbctl --print-wait-time --wait=sb sync
>>
>>     This command simply triggers a change in NB_Global table and wait
>>     for northd to complete all the recompute and update SB. It
>>     doesn't have to use "sync" command but any change to the NB DB
>>     produces similar result (e.g.: ovn-nbctl --print-wait-time
>>     --wait=sb ls-add ls1)
>>
>>     Without parallel:
>>     ovn-northd completion: 7807ms
>>
>>     With parallel:
>>     ovn-northd completion: 41267ms
>
>     Is this with current master or prior to these patches?
>
>     1. There was an issue prior to these where the hash on first
>     iteration with an existing database when loading a large database
>     for the first time was not sized correctly. These numbers sound
>     about right when this bug was around.
>
> The patches are included. The commit id is 9242f27f63 as mentioned in 
> my first email.
>
>     2. There should be NO DIFFERENCE in a single compute cycle with an
>     existing database between a run with parallel and without with dp
>     groups at present. This is because the first cycle does not use
>     parallel compute. It is disabled in order to achieve the correct
>     hash sizings for future cycle by auto-scaling the hash.
>
> Yes, I understand this and I did enable dp-group for the above 
> "ovn-nbctl sync" test, so the number I showed above for "with 
> parallel" was for the 2nd run and onwards. For the first round the 
> result is exactly the same as without parallel.
>
> I just tried disabling DP group for the large scale "ovn-nbctl sync" 
> test (after taking some effort squeezing out memory spaces on my 
> desktop), and the result shows that parallel build performs slightly 
> better (although it is 3x slower than with dp-group & without 
> parallel, which is expected). Summarize the result together below:
>
> Without parallel, with dp-group:
> ovn-northd completion: 7807ms
>
> With parallel, with dp-group:
> ovn-northd completion: 41267ms
>
> without parallel, without dp-group:
> ovn-northd completion: 27996ms
>
> with parallel, without dp-group:
> ovn-northd completion: 26584ms
>
> Now the interesting part:
> I implemented a POC of a hash based mutex array that replaces the rw 
> lock in the function do_ovn_lflow_add_pd(), and the performance is 
> greatly improved for the dp-group test:
>
> with parallel, with dp-group (hash based mutex):
> ovn-northd completion: 5081ms
>
> This is 8x faster than the current parallel one and 30% faster than 
> without parallel. This result looks much more reasonable to me. My 
> theory is that when using parallel with dp-group, the rwlock 
> contention is causing the low CPU utilization of the threads and the 
> overall slowness on my machine. I will refine the POC to a formal 
> patch and send it for review, hopefully by tomorrow.

Cool. The older implementation prior to going to rwlock was based on that.

I found a couple of issues with it which is why I switched to RWlock

Namely - the access to the lflow hash size is not controlled and the 
hash size ends up corrupt because different threads modify it without a 
lock. In a worst case scenario you end up with a dog's breakfast in this 
entire cache line.

So you need a couple of extra macros to insert fast without touching the
hash's size field.

This in turn leaves you with a hash you cannot resize to the correct size
for searching, post-processing lflows and reconciliation. You will
probably need the post-processing optimization patch which I submitted a
couple of weeks back: instead of using an HMAPX to hold the single flows
and modifying the lflow hash in place, it rebuilds the hash completely and
replaces the original one. At that point you have the right element count
and the hash is resized to its optimum size.
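
For illustration, an "insert without touching the size" helper could look
like the sketch below (hypothetical code, not an existing OVS/OVN macro):
only the bucket chain is modified, so writers that are already serialized
per bucket never race on the shared element count, and the count/resize is
fixed up once, single-threaded, by the rebuild step described above:

#include <stddef.h>

struct node {
    struct node *next;
    size_t hash;
};

struct table {
    struct node **buckets;
    size_t mask;    /* Number of buckets minus one (a power of two). */
    size_t n;       /* Element count - NOT touched by the raw insert. */
};

static void
table_insert_raw(struct table *t, struct node *node, size_t hash)
{
    struct node **bucket = &t->buckets[hash & t->mask];

    node->hash = hash;
    node->next = *bucket;
    *bucket = node;
    /* t->n is deliberately left alone; the thread that later rebuilds or
     * merges the table recomputes it (and resizes) exactly once. */
}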

By the way, there is one more option - you may want to try switching to a
fat rwlock, which is supposed to decrease contention and make things
faster. Though that probably will not be enough.

I still do not get why your results are so different from the ovn-heater
tests, but that is something I will look into separately.

Brgds,

>
> Thanks,
> Han
Han Zhou Oct. 1, 2021, 10:34 p.m. UTC | #14
On Thu, Sep 30, 2021 at 11:04 PM Anton Ivanov <
anton.ivanov@cambridgegreys.com> wrote:

> On 01/10/2021 01:32, Han Zhou wrote:
>
>
>
> On Thu, Sep 30, 2021 at 2:03 PM Anton Ivanov <
> anton.ivanov@cambridgegreys.com> wrote:
>
>> On 30/09/2021 20:48, Han Zhou wrote:
>>
>>
>>
>> On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov <
>> anton.ivanov@cambridgegreys.com> wrote:
>>
>>> Summary of findings.
>>>
>>> 1. The numbers on the perf test do not align with heater which is much
>>> closer to a realistic load. On some tests where heater gives 5-10%
>>> end-to-end improvement with parallelization we get worse results with the
>>> perf-test. You spotted this one correctly.
>>>
>>> Example of the northd average pulled out of the test report via grep and
>>> sed.
>>>
>>>    127.489353
>>>    131.509458
>>>    116.088205
>>>    94.721911
>>>    119.629756
>>>    114.896258
>>>    124.811069
>>>    129.679160
>>>    106.699905
>>>    134.490338
>>>    112.106713
>>>    135.957658
>>>    132.471111
>>>    94.106849
>>>    117.431450
>>>    115.861592
>>>    106.830657
>>>    132.396905
>>>    107.092542
>>>    128.945760
>>>    94.298464
>>>    120.455510
>>>    136.910426
>>>    134.311765
>>>    115.881292
>>>    116.918458
>>>
>>> These values are all over the place - this is not a reproducible test.
>>>
>>> 2. In the present state you need to re-run it > 30+ times and take an
>>> average. The standard deviation for the values for the northd loop is >
>>> 10%. Compared to that the reproducibility of ovn-heater is significantly
>>> better. I usually get less than 0.5% difference between runs if there was
>>> no iteration failures. I would suggest using that instead if you want to do
>>> performance comparisons until we have figured out what affects the
>>> perf-test.
>>>
>>> 3. It is using the short term running average value in reports which is
>>> probably wrong because you have very significant skew from the last several
>>> values.
>>>
>>> I will look into all of these.
>>>
>> Thanks for the summary! However, I think there is a bigger problem
>> (probably related to my environment) than the stability of the test (make
>> check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in an earlier
>> email I observed even worse results with a large scale topology closer to a
>> real world deployment of ovn-k8s just testing with the command:
>>     ovn-nbctl --print-wait-time --wait=sb sync
>>
>> This command simply triggers a change in NB_Global table and wait for
>> northd to complete all the recompute and update SB. It doesn't have to use
>> "sync" command but any change to the NB DB produces similar result (e.g.:
>> ovn-nbctl --print-wait-time --wait=sb ls-add ls1)
>>
>> Without parallel:
>> ovn-northd completion: 7807ms
>>
>> With parallel:
>> ovn-northd completion: 41267ms
>>
>> Is this with current master or prior to these patches?
>>
> 1. There was an issue prior to these where the hash on first iteration
>> with an existing database when loading a large database for the first time
>> was not sized correctly. These numbers sound about right when this bug was
>> around.
>>
> The patches are included. The commit id is 9242f27f63 as mentioned in my
> first email.
>
>> 2. There should be NO DIFFERENCE in a single compute cycle with an
>> existing database between a run with parallel and without with dp groups at
>> present. This is because the first cycle does not use parallel compute. It
>> is disabled in order to achieve the correct hash sizings for future cycle
>> by auto-scaling the hash.
>>
> Yes, I understand this and I did enable dp-group for the above "ovn-nbctl
> sync" test, so the number I showed above for "with parallel" was for the
> 2nd run and onwards. For the first round the result is exactly the same as
> without parallel.
>
> I just tried disabling DP group for the large scale "ovn-nbctl sync" test
> (after taking some effort squeezing out memory spaces on my desktop), and
> the result shows that parallel build performs slightly better (although it
> is 3x slower than with dp-group & without parallel, which is expected).
> Summarize the result together below:
>
> Without parallel, with dp-group:
> ovn-northd completion: 7807ms
>
> With parallel, with dp-group:
> ovn-northd completion: 41267ms
>
> without parallel, without dp-group:
> ovn-northd completion: 27996ms
>
> with parallel, without dp-group:
> ovn-northd completion: 26584ms
>
> Now the interesting part:
> I implemented a POC of a hash based mutex array that replaces the rw lock
> in the function do_ovn_lflow_add_pd(), and the performance is greatly
> improved for the dp-group test:
>
> with parallel, with dp-group (hash based mutex):
> ovn-northd completion: 5081ms
>
> This is 8x faster than the current parallel one and 30% faster than
> without parallel. This result looks much more reasonable to me. My theory
> is that when using parallel with dp-group, the rwlock contention is causing
> the low CPU utilization of the threads and the overall slowness on my
> machine. I will refine the POC to a formal patch and send it for review,
> hopefully by tomorrow.
>
> Cool. The older implementation prior to going to rwlock was based on that.
>
> I found a couple of issues with it which is why I switched to RWlock
>
> Namely - the access to the lflow hash size is not controlled and the hash
> size ends up corrupt because different threads modify it without a lock. In
> a worst case scenario you end up with a dog's breakfast in this entire
> cache line.
>
> So you need a couple of extra macros to insert fast without touching the
> cache size.
>
> This in turn leaves you with a hash you cannot resize to correct size for
> searching for post-processing lflows and reconciliation. You will probably
> need the post-processing optimization patch which I submitted a couple of
> weeks back. Instead of using a HMAPX to hold the single flows and modifying
> in-place the lflow hash it rebuilds it completely and replaces the original
> one. At that point you have the right size and the hash is resized to
> optimum size.
>
> By the way, there is one more option - you may want to try switching to
> fatrwlock - that is supposed to decrease contention and make things faster.
> Though that probably will not be enough.
>
> I still do not get it why your results are so different from ovn-heater
> tests, but that is something I will look into separately.
>
> Brgds,
>
Hi Anton, thanks for bringing up the tricky points. I ended up with a
relatively simple solution:
https://patchwork.ozlabs.org/project/ovn/patch/20211001222944.2353351-1-hzhou@ovn.org/

The considerations are described in the comments. Please take a look and
see if it solves the concerns.

Thanks,
Han

>
> Thanks,
> Han
>
>
> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
>
>
diff mbox series

Patch

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index ed231510e..34e6ad1a9 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -59,6 +59,7 @@ 
 #include "unixctl.h"
 #include "util.h"
 #include "uuid.h"
+#include "ovs-thread.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
@@ -4294,6 +4295,7 @@  struct ovn_lflow {
     struct hmap_node hmap_node;
 
     struct ovn_datapath *od;     /* 'logical_datapath' in SB schema.  */
+    struct ovs_mutex odg_lock;   /* Lock guarding access to od_group */
     struct hmapx od_group;       /* Hash map of 'struct ovn_datapath *'. */
     enum ovn_stage stage;
     uint16_t priority;
@@ -4335,6 +4337,11 @@  ovn_lflow_equal(const struct ovn_lflow *a, const struct ovn_datapath *od,
             && !strcmp(a->actions, actions)
             && nullable_string_is_equal(a->ctrl_meter, ctrl_meter));
 }
+/* If this option is 'true' northd will combine logical flows that differ by
+ * logical datapath only by creating a datapath group. */
+static bool use_logical_dp_groups = false;
+static bool use_parallel_build = true;
+
 
 static void
 ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
@@ -4353,24 +4360,56 @@  ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
     lflow->ctrl_meter = ctrl_meter;
     lflow->dpg = NULL;
     lflow->where = where;
+    if (use_parallel_build && use_logical_dp_groups) {
+        ovs_mutex_init(&lflow->odg_lock);
+    }
 }
 
-/* If this option is 'true' northd will combine logical flows that differ by
- * logical datapath only by creating a datapath group. */
-static bool use_logical_dp_groups = false;
-static bool use_parallel_build = true;
+/* Adds a row with the specified contents to the Logical_Flow table.
+ * Version to use with dp_groups + parallel - when locking is required.
+ *
+ * Assumptions:
+ *
+ * 1. A large proportion of the operations are lookups (reads).
+ * 2. RW operations are a small proportion of overall adds.
+ * 3. Most RW ops are not flow adds, but changes to the
+ * od groups.
+ *
+ * Principles of operation:
+ * 1. All accesses to the flow table are protected by a rwlock.
+ * 2. By default, everyone grabs a rd lock so that multiple threads
+ * can do lookups simultaneously.
+ * 3. If a change to the lflow is needed, the rd lock is released and
+ * a wr lock is acquired instead (the fact that POSIX does not have an
+ * "upgrade" on locks is a major pain, but there is nothing we can do
+ * - it's not available).
+ * 4. WR lock operations in rd/wr locking have LOWER priority than RD.
+ * That is by design and spec. So the code after a request for WR lock
+ * may wait for a considerable amount of time until it is given a
+ * chance to run. That means that another thread may get there in the
+ * meantime and change the data. Hence all wr operations MUST be coded
+ * to ensure that they are not vulnerable to "someone pulled this from
+ * under my feet": re-reads, checks for presence, etc.
+ * 5. Operations on the actual od_group hash map are protected by
+ * per-flow locks. There is no need for these to be rw locks; a mutex is
+ * more appropriate. They are low contention as each protects only its
+ * flow and only during modifications, which happen while holding a rd
+ * lock on the flow table.
+ */
 
-static struct hashrow_locks lflow_locks;
+static struct ovs_rwlock flowtable_lock;
 
 /* Adds a row with the specified contents to the Logical_Flow table.
- * Version to use when locking is required.
+ * Version to use when locking is NOT required.
  */
+
 static struct ovn_lflow *
 do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
                  uint32_t hash, enum ovn_stage stage, uint16_t priority,
                  const char *match, const char *actions, const char *io_port,
                  const struct ovsdb_idl_row *stage_hint,
                  const char *where, const char *ctrl_meter)
+                 OVS_NO_THREAD_SAFETY_ANALYSIS
 {
 
     struct ovn_lflow *old_lflow;
@@ -4403,6 +4442,59 @@  do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
     return lflow;
 }
 
+/* Adds a row with the specified contents to the Logical_Flow table.
+ * Version to use when locking IS required.
+ */
+
+static struct ovn_lflow *
+do_ovn_lflow_add_pd(struct hmap *lflow_map, struct ovn_datapath *od,
+                    uint32_t hash, enum ovn_stage stage, uint16_t priority,
+                    const char *match, const char *actions,
+                    const char *io_port,
+                    const struct ovsdb_idl_row *stage_hint,
+                    const char *where, const char *ctrl_meter)
+                    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+
+    struct ovn_lflow *old_lflow;
+    struct ovn_lflow *lflow;
+
+    /* Fast Path - try to amend an existing flow without asking
+     * for WR access to the whole flow table. Locking the actual
+     * hmapx for the particular flow's odg is low overhead as its
+     * contention is much lower.
+     */
+
+    ovs_rwlock_rdlock(&flowtable_lock);
+    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
+                               actions, ctrl_meter, hash);
+    if (old_lflow) {
+        ovs_mutex_lock(&old_lflow->odg_lock);
+        hmapx_add(&old_lflow->od_group, od);
+        ovs_mutex_unlock(&old_lflow->odg_lock);
+    }
+    ovs_rwlock_unlock(&flowtable_lock);
+
+    if (old_lflow) {
+        return old_lflow;
+    }
+
+    ovs_rwlock_wrlock(&flowtable_lock);
+
+    /* We need to rerun the "if in flowtable" steps, because someone
+     * could have inserted it while we were waiting to acquire a
+     * wr lock. As we are now holding a wr lock on it, nobody else is
+     * in the "fast" portion of the code, which is protected by the
+     * rwlock.
+     */
+    lflow = do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
+                             actions, io_port, stage_hint, where,
+                             ctrl_meter);
+    ovs_rwlock_unlock(&flowtable_lock);
+    return lflow;
+}
+
+
 static struct ovn_lflow *
 ovn_lflow_add_at_with_hash(struct hmap *lflow_map, struct ovn_datapath *od,
                            enum ovn_stage stage, uint16_t priority,
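
Point 5 of the comment relies on many threads holding the rd lock at the same time,
so each flow carries its own mutex to serialize changes to its od_group. A condensed
sketch of that composition follows; the struct and field names are invented
stand-ins for ovn_lflow, odg_lock and od_group, not the patch's code.

    #include <pthread.h>
    #include <stdbool.h>

    /* Toy stand-in for an lflow: the entry lives in a table guarded by
     * the rwlock, while its membership set has a per-entry mutex. */
    struct entry {
        pthread_mutex_t member_lock;    /* plays the role of odg_lock  */
        bool members[64];               /* plays the role of od_group  */
    };

    static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
    static struct entry e = { .member_lock = PTHREAD_MUTEX_INITIALIZER };

    /* May be called from many threads at once.  The shared rd lock keeps
     * the entry alive and the table stable; the per-entry mutex
     * serializes concurrent updates to this one entry's membership set. */
    static void
    add_member(struct entry *entry, int id)
    {
        pthread_rwlock_rdlock(&table_lock);
        pthread_mutex_lock(&entry->member_lock);
        entry->members[id] = true;
        pthread_mutex_unlock(&entry->member_lock);
        pthread_rwlock_unlock(&table_lock);
    }

    int
    main(void)
    {
        add_member(&e, 3);
        return 0;
    }

A plain mutex (rather than a second rwlock) is enough here because, as the comment
says, contention is per flow and only occurs during modification.
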
@@ -4415,11 +4507,9 @@  ovn_lflow_add_at_with_hash(struct hmap *lflow_map, struct ovn_datapath *od,
 
     ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
     if (use_logical_dp_groups && use_parallel_build) {
-        lock_hash_row(&lflow_locks, hash);
-        lflow = do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
-                                 actions, io_port, stage_hint, where,
-                                 ctrl_meter);
-        unlock_hash_row(&lflow_locks, hash);
+        lflow = do_ovn_lflow_add_pd(lflow_map, od, hash, stage, priority,
+                                    match, actions, io_port, stage_hint, where,
+                                    ctrl_meter);
     } else {
         lflow = do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
                          actions, io_port, stage_hint, where, ctrl_meter);
@@ -4447,17 +4537,17 @@  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
 
 static bool
 ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
-                                struct ovn_datapath *od,
-                                uint32_t hash)
+                                struct ovn_datapath *od)
+                                OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     if (!use_logical_dp_groups || !lflow_ref) {
         return false;
     }
 
-    if (use_parallel_build) {
-        lock_hash_row(&lflow_locks, hash);
+    if (use_parallel_build && use_logical_dp_groups) {
+        ovs_mutex_lock(&lflow_ref->odg_lock);
         hmapx_add(&lflow_ref->od_group, od);
-        unlock_hash_row(&lflow_locks, hash);
+        ovs_mutex_unlock(&lflow_ref->odg_lock);
     } else {
         hmapx_add(&lflow_ref->od_group, od);
     }
@@ -6423,7 +6513,7 @@  build_lb_rules(struct hmap *lflows, struct ovn_northd_lb *lb,
             if (reject) {
                 meter = copp_meter_get(COPP_REJECT, od->nbs->copp,
                                        meter_groups);
-            } else if (ovn_dp_group_add_with_reference(lflow_ref, od, hash)) {
+            } else if (ovn_dp_group_add_with_reference(lflow_ref, od)) {
                 continue;
             }
             lflow_ref = ovn_lflow_add_at_with_hash(lflows, od,
@@ -9476,7 +9566,7 @@  build_lrouter_defrag_flows_for_lb(struct ovn_northd_lb *lb,
                 ds_cstr(match), ds_cstr(&defrag_actions));
         for (size_t j = 0; j < lb->n_nb_lr; j++) {
             struct ovn_datapath *od = lb->nb_lr[j];
-            if (ovn_dp_group_add_with_reference(lflow_ref, od, hash)) {
+            if (ovn_dp_group_add_with_reference(lflow_ref, od)) {
                 continue;
             }
             lflow_ref = ovn_lflow_add_at_with_hash(lflows, od,
@@ -9540,7 +9630,7 @@  build_lflows_for_unreachable_vips(struct ovn_northd_lb *lb,
                 continue;
             }
 
-            if (ovn_dp_group_add_with_reference(lflow_ref, peer->od, hash)) {
+            if (ovn_dp_group_add_with_reference(lflow_ref, peer->od)) {
                 continue;
             }
             lflow_ref = ovn_lflow_add_at_with_hash(lflows, peer->od,
@@ -12974,7 +13064,7 @@  build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }
 
-    if (use_parallel_build && (!use_logical_dp_groups)) {
+    if (use_parallel_build) {
         struct hmap *lflow_segs;
         struct lswitch_flow_build_info *lsiv;
         int index;
@@ -13156,6 +13246,8 @@  ovn_sb_set_lflow_logical_dp_group(
 }
 
 static ssize_t max_seen_lflow_size = 128;
+static bool needs_parallel_init = true;
+static bool reset_parallel = false;
 
 /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database,
  * constructing their contents based on the OVN_NB database. */
@@ -13169,9 +13261,22 @@  build_lflows(struct northd_context *ctx, struct hmap *datapaths,
 {
     struct hmap lflows;
 
+    if (reset_parallel) {
+        /* Parallel build was disabled earlier; we need to
+         * re-enable it. */
+        use_parallel_build = true;
+        reset_parallel = false;
+    }
+
     fast_hmap_size_for(&lflows, max_seen_lflow_size);
-    if (use_parallel_build) {
-        update_hashrow_locks(&lflows, &lflow_locks);
+    if (use_parallel_build && use_logical_dp_groups &&
+        needs_parallel_init) {
+        ovs_rwlock_init(&flowtable_lock);
+        needs_parallel_init = false;
+        /* Disable parallel build on first run with dp_groups
+         * to determine the correct sizing of hashes. */
+        use_parallel_build = false;
+        reset_parallel = true;
     }
     build_lswitch_and_lrouter_flows(datapaths, ports,
                                     port_groups, &lflows, mcgroups,
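
The needs_parallel_init / reset_parallel handling above amounts to a small two-flag
state machine: the first dp_groups run is forced single-threaded so that realistic
hash sizing (max_seen_lflow_size) is learned, and every later run is parallel again.
A condensed, standalone sketch of just that control flow; build_once() and the flag
names here are placeholders, not the patch's code.

    #include <stdbool.h>
    #include <stdio.h>

    static bool use_parallel = true;    /* like use_parallel_build   */
    static bool needs_init = true;      /* like needs_parallel_init  */
    static bool reset_parallel = false; /* like reset_parallel       */

    static void
    build_once(int run)
    {
        if (reset_parallel) {
            /* Parallelism was turned off for the sizing run; restore it. */
            use_parallel = true;
            reset_parallel = false;
        }

        if (use_parallel && needs_init) {
            /* First run: learn realistic table sizes single-threaded,
             * then re-enable parallelism from the next run onwards. */
            needs_init = false;
            use_parallel = false;
            reset_parallel = true;
        }

        printf("run %d: parallel=%s\n", run, use_parallel ? "yes" : "no");
    }

    int
    main(void)
    {
        for (int i = 1; i <= 3; i++) {
            build_once(i);
        }
        return 0;   /* Prints "parallel=no" for run 1, "yes" afterwards. */
    }
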
@@ -15167,7 +15272,6 @@  main(int argc, char *argv[])
 
     daemonize_complete();
 
-    init_hash_row_locks(&lflow_locks);
     use_parallel_build = can_parallelize_hashes(false);
 
     /* We want to detect (almost) all changes to the ovn-nb db. */