[ovs-dev,v4,0/5] Fast OVSDB resync after restart or fail-over.

Message ID 1551374120-44287-1-git-send-email-hzhou8@ebay.com

Message

Han Zhou Feb. 28, 2019, 5:15 p.m. UTC
In scalability tests with ovn-scale-test, the SB ovsdb-server load is not a
problem, at least with 1k HVs. However, if we restart the ovsdb-server, then
depending on the number of HVs and the scale of logical objects, e.g. the
number of logical ports, the SB ovsdb-server becomes an obvious bottleneck.

In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches,
connected by a single logical router), restarting the SB ovsdb-server kept it
at 100% CPU for more than 1 hour, because all HVs (and northd) reconnect and
resync the large amount of data at the same time.

A similar problem happens in failover scenarios. With an active-active
cluster the problem is alleviated slightly, because only 1/3 of the HVs
(assuming a 3-node cluster) need to resync data from new servers, but it is
still a serious problem.

For a detailed discussion of the problem and solutions, see:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

The patches implement the proposal in that discussion. They introduce a new
method, monitor_cond_since, which lets a client request only the changes that
happened after a specific point, so that data already cached in the client is
not re-transferred. Scalability tests show a dramatic improvement: all HVs
finish syncing as soon as they reconnect, since there is no new data to be
transferred.
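
For illustration, here is a rough sketch of the wire exchange; the exact
message layout is documented in the ovsdb-server(7) update in this series,
so treat the fields below as approximate. The client passes the last
transaction id it has seen, and the server replies with a "found" flag that
tells whether it could serve just the changes since that point:

  client -> server:
    {"method": "monitor_cond_since",
     "params": [<db-name>, <json-value>, <monitor-cond-requests>, <last-txn-id>],
     "id": <nonnull-json-value>}

  server -> client:
    {"result": [<found>, <last-txn-id>, <table-updates2>],
     "error": null, "id": <same-id>}

If <found> is false, e.g. because the server's transaction history does not
reach back to <last-txn-id>, <table-updates2> carries the full monitored
contents and the client effectively does a complete resync. With the C IDL
changes in patches 4/5 and 5/5 this request is issued automatically on
reconnection, so IDL-based clients need no changes.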

The current patches support all 3 modes of ovsdb-server, but only clustered
mode can benefit, since it is the only mode that supports transaction ids
out of the box.

---
v1 -> v2: Fix the unused variable in patch 6/7 reported by 0-day.
v2 -> v3: The first 2 of the 7 patches were merged. Revised 1/5 to address
Ben's comment on the json cache hmap.
v3 -> v4: Fix the use-after-free in 2/5 found by AddressSanitizer.

Han Zhou (5):
  ovsdb-monitor: Refactor ovsdb monitor implementation.
  ovsdb-server: Transaction history tracking.
  ovsdb-monitor: Support monitor_cond_since.
  ovsdb-idl.c: Support monitor_cond_since method in C IDL.
  ovsdb-idl.c: Fast resync from server when connection reset.

 Documentation/ref/ovsdb-server.7.rst |  78 +++++-
 lib/ovsdb-idl.c                      | 229 ++++++++++++----
 ovsdb/jsonrpc-server.c               | 101 +++++--
 ovsdb/monitor.c                      | 516 ++++++++++++++++++++++-------------
 ovsdb/monitor.h                      |  16 +-
 ovsdb/ovsdb-client.1.in              |  28 +-
 ovsdb/ovsdb-client.c                 | 104 ++++++-
 ovsdb/ovsdb-server.c                 |   9 +
 ovsdb/ovsdb.c                        |   6 +
 ovsdb/ovsdb.h                        |  10 +
 ovsdb/transaction.c                  | 127 ++++++++-
 ovsdb/transaction.h                  |   5 +
 tests/ovsdb-monitor.at               | 301 +++++++++++++++++++-
 13 files changed, 1244 insertions(+), 286 deletions(-)

Comments

Ben Pfaff Feb. 28, 2019, 7:15 p.m. UTC | #1
On Thu, Feb 28, 2019 at 09:15:15AM -0800, Han Zhou wrote:
> In scalability tests with ovn-scale-test, the SB ovsdb-server load is not a
> problem, at least with 1k HVs. However, if we restart the ovsdb-server, then
> depending on the number of HVs and the scale of logical objects, e.g. the
> number of logical ports, the SB ovsdb-server becomes an obvious bottleneck.
> 
> In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches,
> connected by a single logical router), restarting the SB ovsdb-server kept it
> at 100% CPU for more than 1 hour, because all HVs (and northd) reconnect and
> resync the large amount of data at the same time.
> 
> A similar problem happens in failover scenarios. With an active-active
> cluster the problem is alleviated slightly, because only 1/3 of the HVs
> (assuming a 3-node cluster) need to resync data from new servers, but it is
> still a serious problem.
> 
> For a detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
> 
> The patches implement the proposal in that discussion. They introduce a new
> method, monitor_cond_since, which lets a client request only the changes that
> happened after a specific point, so that data already cached in the client is
> not re-transferred. Scalability tests show a dramatic improvement: all HVs
> finish syncing as soon as they reconnect, since there is no new data to be
> transferred.

Thanks a lot.  I applied this to master.

I want to encourage you to send another patch adding a NEWS item.