[ovs-dev,RFC,0/4] Avoid parsing non-local lflows with the help of tags in SB.

Message ID 20210701054522.162291-1-hzhou@ovn.org

Message

Han Zhou July 1, 2021, 5:45 a.m. UTC
With the help of a new column in the Logical_Flow table that stores
ingress/egress lport information, ovn-controller can avoid parsing a large
portion of the logical flows in the SB DB, which significantly improves
ovn-controller's performance whenever a full recompute is required.
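
To illustrate the idea, here is a minimal sketch in Python (not the actual C
code in controller/lflow.c). It assumes that each logical flow carries a tags
map with an optional "in_out_port" key, as the patch titles in this series
suggest. The LogicalFlow class and the helper names are hypothetical and exist
only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalFlow:
    """Hypothetical stand-in for a Logical_Flow row in the SB DB."""
    match: str
    tags: dict = field(default_factory=dict)


def is_possibly_local(lflow, local_lports):
    """Return False only when the flow is tagged with a non-local port."""
    port = lflow.tags.get("in_out_port")
    if port is None:
        # No hint available: the flow still has to be parsed.
        return True
    return port in local_lports


def flows_to_parse(lflows, local_lports):
    """Keep only the flows that may be relevant to this chassis."""
    return [f for f in lflows if is_possibly_local(f, local_lports)]


lflows = [
    LogicalFlow('inport == "lsp1" && ip4', {"in_out_port": "lsp1"}),
    LogicalFlow('outport == "lsp2"', {"in_out_port": "lsp2"}),
    LogicalFlow("ip4.dst == 10.0.0.0/24"),  # untagged flow
]
local_lports = {"lsp1"}  # ports bound on this (hypothetical) chassis
selected = flows_to_parse(lflows, local_lports)
```

In this sketch the flow tagged with "lsp2" is dropped before any match parsing
happens, while the untagged flow is kept because nothing is known about it.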

With a scale test topology of 1000 chassis, 20 LSPs per chassis, and 20k
lports in total spread across 200 logical switches connected by a
logical router, the test results before and after this change are:

Before:
- lflow-cache disabled:
    - ovn-controller recompute: 2.7 sec
- lflow-cache enabled:
    - ovn-controller recompute: 2.1 sec
    - lflow cache memory: 622103 KB

After:
- lflow-cache disabled:
    - ovn-controller recompute: 0.83 sec
- lflow-cache enabled:
    - ovn-controller recompute: 0.71 sec
    - lflow cache memory: 123641 KB

(Note: DP groups were enabled in both runs.)

So for this test scenario, with the lflow cache disabled, recompute latency
dropped by ~70%; with the lflow cache enabled, recompute latency dropped by
~65% and lflow cache memory dropped by ~80%.
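
As a quick check of these percentages, the reductions can be recomputed from
the numbers reported above. This is only an arithmetic sketch; the function
and variable names are made up for the illustration.

```python
def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures taken from the Before/After results in this message.
latency_no_cache = reduction_pct(2.7, 0.83)     # seconds, cache disabled
latency_cache = reduction_pct(2.1, 0.71)        # seconds, cache enabled
cache_memory = reduction_pct(622103, 123641)    # KB of lflow cache

print(f"latency (cache disabled): {latency_no_cache:.1f}%")
print(f"latency (cache enabled):  {latency_cache:.1f}%")
print(f"lflow cache memory:       {cache_memory:.1f}%")
```

The results come out at roughly 69%, 66% and 80%, in line with the rounded
figures quoted above.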

TODO: DDlog change for ovn-northd.

Note that this series applies on top of a pending patch:
https://patchwork.ozlabs.org/project/ovn/patch/20210629192257.1699504-1-hzhou@ovn.org/

Han Zhou (4):
  ovn-northd.at: Minor improvement for the dp group test case.
  ovn-sb: Add tags column to logical_flow table of the SB DB.
  ovn-northd: Populate in_out_port in logical_flow table's tags.
  ovn-controller: Skip non-local lflows in ovn-controller before
    parsing.

 controller/lflow.c          |  21 +++
 controller/lflow.h          |   1 +
 controller/ovn-controller.c |   1 +
 northd/ovn-northd.c         | 272 ++++++++++++++++++++----------------
 ovn-sb.ovsschema            |   7 +-
 ovn-sb.xml                  |  23 +++
 tests/ovn-northd.at         |   2 +-
 7 files changed, 207 insertions(+), 120 deletions(-)

Comments

Mark Michelson July 15, 2021, 8:33 p.m. UTC | #1
Hi Han,

I finally got around to having a look at this, and honestly I'm really
happy with how simple the series is. For now, I'm not giving individual
notes on patches, but I'll comment on the series as a whole.

It seems that this is targeted at deployments where logical switches
have their ports distributed across multiple HVs. Something like the
OpenShift/ovn-kubernetes model of having one logical switch per node is
not going to see much benefit from this series. However, it isn't
likely to add any extra overhead to that sort of deployment either.

I think this is a good basis for an optimization. The biggest 
improvement I can think of is to be able to apply the port hint to more 
flows than just the ones that explicitly reference the inport or 
outport. But I think that could be an incremental improvement over this 
initial patch series.

The only other criticism is the lack of DDlog support, but as you noted in
the description, that's a known shortcoming.
