[ovs-dev,v3,0/5] Fast OVSDB resync after restart or failover.

Message ID 1551115508-12996-1-git-send-email-hzhou8@ebay.com

Message

Han Zhou Feb. 25, 2019, 5:25 p.m. UTC
In scalability tests with ovn-scale-test, the SB ovsdb-server load is not a
problem, at least with 1k HVs. However, if we restart the ovsdb-server, then
depending on the number of HVs and the scale of logical objects, e.g. the
number of logical ports, the SB ovsdb-server becomes an obvious bottleneck.

In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
connected by a single logical router), restarting the SB ovsdb-server
resulted in 100% CPU usage on ovsdb-server for more than 1 hour: all HVs
(and northd) were reconnecting and resyncing the large amount of data at the
same time.

A similar problem happens in a failover scenario. With an active-active
cluster, the problem is alleviated slightly, because only 1/3 of the HVs
(assuming a 3-node cluster) need to resync data from new servers, but it is
still a serious problem.

For a detailed discussion of the problem and solutions, see:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

The patches implement the proposal in that discussion. They introduce a new
method, monitor_cond_since, that enables a client to request only the
changes that happened after a specific point, so that data already cached in
the client is not re-transferred. Scalability tests show dramatic
improvement: all HVs finish syncing as soon as they reconnect, since there
is no new data to be transferred.
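
To illustrate the idea, here is a minimal, self-contained C sketch (not code
from this series; all names are hypothetical) of the decision a reconnecting
client makes when it receives the monitor_cond_since reply, which carries a
"found" flag, the latest transaction id, and the updates. The real C IDL
logic is in patches 4/5 and 5/5 (lib/ovsdb-idl.c).

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of a monitor_cond_since reply. */
struct mcs_reply {
    bool found;              /* True if the server still has the txn id we
                              * sent in its history, so 'updates' is only a
                              * delta relative to that point. */
    const char *last_txn_id; /* Txn id to remember for the next reconnect. */
    const char *updates;     /* Update notifications (delta or full state). */
};

static void
handle_mcs_reply(const struct mcs_reply *r, char *cached_txn_id, size_t size)
{
    if (!r->found) {
        /* The server cannot resume from our txn id (e.g. standalone db or
         * truncated history): 'updates' is the complete state, so flush the
         * local cache before applying it. */
        printf("flush local cache, apply full state\n");
    } else {
        /* Fast resync: 'updates' only has changes after our txn id, so the
         * existing cache is kept and only the delta is applied. */
        printf("apply incremental updates\n");
    }
    snprintf(cached_txn_id, size, "%s", r->last_txn_id);
}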

The current patches support all 3 modes of ovsdb-server, but only clustered
mode benefits from the change, since it is the only mode that supports
transaction ids out of the box.

---
v1 -> v2: fixed the unused variable in patch 6/7 reported by 0-day.
v2 -> v3: the first 2 of the 7 patches were merged. Revised 1/5 to address
Ben's comment on the json cache hmap.

Han Zhou (5):
  ovsdb-monitor: Refactor ovsdb monitor implementation.
  ovsdb-server: Transaction history tracking.
  ovsdb-monitor: Support monitor_cond_since.
  ovsdb-idl.c: Support monitor_cond_since method in C IDL.
  ovsdb-idl.c: Fast resync from server when connection reset.

 Documentation/ref/ovsdb-server.7.rst |  78 +++++-
 lib/ovsdb-idl.c                      | 229 ++++++++++++----
 ovsdb/jsonrpc-server.c               | 101 +++++--
 ovsdb/monitor.c                      | 516 ++++++++++++++++++++++-------------
 ovsdb/monitor.h                      |  16 +-
 ovsdb/ovsdb-client.1.in              |  28 +-
 ovsdb/ovsdb-client.c                 | 104 ++++++-
 ovsdb/ovsdb-server.c                 |  11 +
 ovsdb/ovsdb.c                        |   3 +
 ovsdb/ovsdb.h                        |  10 +
 ovsdb/transaction.c                  | 117 +++++++-
 ovsdb/transaction.h                  |   4 +
 tests/ovsdb-monitor.at               | 301 +++++++++++++++++++-
 13 files changed, 1232 insertions(+), 286 deletions(-)

Comments

Ben Pfaff Feb. 25, 2019, 11:59 p.m. UTC | #1
It got caught in the filters for a different -dev list that I happen to
also moderate.  Doh!  Anyway, I released it now.

On Mon, Feb 25, 2019 at 11:24:21AM -0800, Ben Pfaff wrote:
> It didn't get caught in any filters this time.
> 
> I see all the patches, although maybe I was CCed.
> 
> On Mon, Feb 25, 2019 at 11:08:40AM -0800, Han Zhou wrote:
> > This does not seem to be showing up in the mailing list. Resending with
> > different words in the title.
> > On Mon, Feb 25, 2019 at 9:25 AM Han Zhou <zhouhan@gmail.com> wrote:
> > >
> > > In scalability tests with ovn-scale-test, the SB ovsdb-server load is
> > > not a problem, at least with 1k HVs. However, if we restart the
> > > ovsdb-server, then depending on the number of HVs and the scale of
> > > logical objects, e.g. the number of logical ports, the SB ovsdb-server
> > > becomes an obvious bottleneck.
> > >
> > > In our test with 1k HVs and 20k logical ports (200 lports * 100
> > > lswitches connected by a single logical router), restarting the SB
> > > ovsdb-server resulted in 100% CPU usage on ovsdb-server for more than
> > > 1 hour: all HVs (and northd) were reconnecting and resyncing the large
> > > amount of data at the same time.
> > >
> > > A similar problem happens in a failover scenario. With an
> > > active-active cluster, the problem is alleviated slightly, because
> > > only 1/3 of the HVs (assuming a 3-node cluster) need to resync data
> > > from new servers, but it is still a serious problem.
> > >
> > > For a detailed discussion of the problem and solutions, see:
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
> > >
> > > The patches implement the proposal in that discussion. They introduce
> > > a new method, monitor_cond_since, that enables a client to request
> > > only the changes that happened after a specific point, so that data
> > > already cached in the client is not re-transferred. Scalability tests
> > > show dramatic improvement: all HVs finish syncing as soon as they
> > > reconnect, since there is no new data to be transferred.
> > >
> > > The current patches support all 3 modes of ovsdb-server, but only
> > > clustered mode benefits from the change, since it is the only mode
> > > that supports transaction ids out of the box.
> > >
> > > ---
> > > v1 -> v2: fixed the unused variable in patch 6/7 reported by 0-day.
> > > v2 -> v3: the first 2 of the 7 patches were merged. Revised 1/5 to
> > > address Ben's comment on the json cache hmap.
> > >
> > > Han Zhou (5):
> > >   ovsdb-monitor: Refactor ovsdb monitor implementation.
> > >   ovsdb-server: Transaction history tracking.
> > >   ovsdb-monitor: Support monitor_cond_since.
> > >   ovsdb-idl.c: Support monitor_cond_since method in C IDL.
> > >   ovsdb-idl.c: Fast resync from server when connection reset.
> > >
> > >  Documentation/ref/ovsdb-server.7.rst |  78 +++++-
> > >  lib/ovsdb-idl.c                      | 229 ++++++++++++----
> > >  ovsdb/jsonrpc-server.c               | 101 +++++--
> > >  ovsdb/monitor.c                      | 516 ++++++++++++++++++++++-------------
> > >  ovsdb/monitor.h                      |  16 +-
> > >  ovsdb/ovsdb-client.1.in              |  28 +-
> > >  ovsdb/ovsdb-client.c                 | 104 ++++++-
> > >  ovsdb/ovsdb-server.c                 |  11 +
> > >  ovsdb/ovsdb.c                        |   3 +
> > >  ovsdb/ovsdb.h                        |  10 +
> > >  ovsdb/transaction.c                  | 117 +++++++-
> > >  ovsdb/transaction.h                  |   4 +
> > >  tests/ovsdb-monitor.at               | 301 +++++++++++++++++++-
> > >  13 files changed, 1232 insertions(+), 286 deletions(-)
> > >
> > > --
> > > 2.1.0
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ben Pfaff Feb. 28, 2019, 1:26 a.m. UTC | #2
On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> In scalability tests with ovn-scale-test, the SB ovsdb-server load is not
> a problem, at least with 1k HVs. However, if we restart the ovsdb-server,
> then depending on the number of HVs and the scale of logical objects, e.g.
> the number of logical ports, the SB ovsdb-server becomes an obvious
> bottleneck.
> 
> In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> connected by a single logical router), restarting the SB ovsdb-server
> resulted in 100% CPU usage on ovsdb-server for more than 1 hour: all HVs
> (and northd) were reconnecting and resyncing the large amount of data at
> the same time.
> 
> A similar problem happens in a failover scenario. With an active-active
> cluster, the problem is alleviated slightly, because only 1/3 of the HVs
> (assuming a 3-node cluster) need to resync data from new servers, but it
> is still a serious problem.
> 
> For a detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

Thanks.

When I apply this series, I get a reproducible test failure in test
1920 "schema conversion online - clustered".  It's an error from Address
Sanitizer.  I'm appending the testsuite.log.

Obviously this needs to be fixed but I'll take a look at the patches in
more detail now too.

#                             -*- compilation -*-
1920. ovsdb-server.at:1126: testing schema conversion online - clustered ...
../../tests/ovsdb-server.at:869: if test $model = standalone; then
                  ovsdb-tool create db schema
              else
                  ovsdb-tool create-cluster db schema unix:s1.raft
              fi
../../tests/ovsdb-server.at:876: ovsdb-server -vfile -vvlog:off -vconsole:off --detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:880: for pair in 'zero 0' 'one 1' 'two 2' 'three 3' 'four 4' 'five 5'; do
          set -- $pair
          ovsdb-client transact '
            ["ordinals",
             {"op": "insert",
              "table": "ordinals",
              "row": {"name": "'$1'", "number": '$2'}},
             {"op": "comment",
              "comment": "add row for '"$pair"'"}]'
        done | uuidfilt
stderr:
../../tests/ovsdb-server.at:900: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:902: ovsdb-client needs-conversion new-schema
../../tests/ovsdb-server.at:907: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-ordinals-aware.pid --log-file=monitor-ordinals-aware.log --db-change-aware --no-headings monitor ordinals ordinals number name > monitor-ordinals-aware.stdout 2> monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:912: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-ordinals-unaware.pid --log-file=monitor-ordinals-unaware.log --no-db-change-aware --no-headings monitor ordinals ordinals number name > monitor-ordinals-unaware.stdout 2> monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:919: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-server-aware.pid --log-file=monitor-server-aware.log --db-change-aware --no-headings monitor _Server Database name > monitor-server-aware.stdout 2> monitor-server-aware.stderr
../../tests/ovsdb-server.at:924: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-server-unaware.pid --log-file=monitor-server-unaware.log --no-db-change-aware --no-headings monitor _Server Database name > monitor-server-unaware.stdout 2> monitor-server-unaware.stderr
../../tests/ovsdb-server.at:938: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-ordinals-aware.pid --log-file=trigger-ordinals-aware.log --db-change-aware transact "$ordinals_txn"  > trigger-ordinals-aware.stdout 2> trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:943: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-ordinals-unaware.pid --log-file=trigger-ordinals-unaware.log --no-db-change-aware transact  "$ordinals_txn" > trigger-ordinals-unaware.stdout 2> trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:957: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-server-aware.pid --log-file=trigger-server-aware.log --db-change-aware transact "$server_txn"  > trigger-server-aware.stdout 2> trigger-server-aware.stderr
../../tests/ovsdb-server.at:962: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-server-unaware.pid --log-file=trigger-server-unaware.log --no-db-change-aware transact  "$server_txn" > trigger-server-unaware.stdout 2> trigger-server-unaware.stderr
../../tests/ovsdb-server.at:968: ovsdb-client dump unix:db.sock ordinals
stdout:
ordinals table
_uuid                                name  number
------------------------------------ ----- ------
c495692b-f991-440b-865b-ed7c61fe6412 five  5
d731216a-16c9-4b78-86da-31c65b0b8358 four  4
3e3e6998-2633-4bd0-bb97-3c3c4cbd7227 one   1
7fbcde1d-30ee-47ea-bf24-a40f16f05b15 three 3
6a7387b0-8115-48e9-a202-e2898a8f1772 two   2
79dcb8d0-7738-417c-b779-40d530c87c90 zero  0
../../tests/ovsdb-server.at:969: uuidfilt stdout
../../tests/ovsdb-server.at:982: ovsdb-client convert new-schema
../../tests/ovsdb-server.at:985: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:987: ovsdb-client needs-conversion new-schema
aware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
unaware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1004: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1006: sed 's/.*: //' monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
ovsdb-server.at:1017: waiting while test -e monitor-server-unaware.pid...
ovsdb-server.at:1017: wait succeeded immediately
../../tests/ovsdb-server.at:1018: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1020: test -e monitor-server-aware.pid
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
../../tests/ovsdb-server.at:1028: cat trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1030: cat trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
ovsdb-server.at:1038: waiting while test -e trigger-server-unaware.pid...
ovsdb-server.at:1038: wait succeeded immediately
../../tests/ovsdb-server.at:1039: sed 's/.*: //' trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1041: test -e trigger-server-aware.pid
../../tests/ovsdb-server.at:1051: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1065: ovsdb-client transact '
         ["ordinals",
          {"op": "insert",
           "table": "ordinals",
           "row": {"number": 6}},
          {"op": "comment",
           "comment": "add row for 6"}]' | uuidfilt
../../tests/ovsdb-server.at:1078: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1092: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1092: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1092: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1092: wait succeeded after 1 seconds
../../tests/ovsdb-server.at:1093: ovsdb-server -vfile -vvlog:off -vconsole:off --detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:1095: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1111: test -h db
../../tests/ovsdb-server.at:1112: test -h .db.~lock~
../../tests/ovsdb-server.at:1113: test -f dir/db
../../tests/ovsdb-server.at:1114: test -f dir/.db.~lock~
../../tests/ovsdb-server.at:1117: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1117: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1117: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1117: wait succeeded after 1 seconds
Address Sanitizer reported errors in: asan.32224 asan.32375
=================================================================
==32224==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000005c0 at pc 0x560f3a96859f bp 0x7ffee562c110 sp 0x7ffee562c108
READ of size 8 at 0x6060000005c0 thread T0
    #0 0x560f3a96859e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x560f3a973520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x560f3a97a234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x560f3a9599de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x560f3a94600e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x560f3a941cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
    #7 0x560f3a9438f9 in _start (/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000005c0 is located 0 bytes inside of 56-byte region [0x6060000005c0,0x6060000005f8)
freed by thread T0 here:
    #0 0x7f21536fafd0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x560f3a945117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x560f3a945117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x560f3a9401f9 in main_loop ../ovsdb/ovsdb-server.c:232
    #6 0x560f3a9401f9 in main ../ovsdb/ovsdb-server.c:458
    #7 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f21536fb350 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x560f3a9fd794 in xmalloc ../lib/util.c:123
    #2 0x560f3a96f46d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x560f3a959603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x560f3a9450f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x560f3a9450f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x560f3a946b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x560f3a9419f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff8060: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8070: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff8080: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8090: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff80a0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff80b0: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
  0x0c0c7fff8100: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==32224==ABORTING
=================================================================
==32375==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000008c0 at pc 0x5629da28a59f bp 0x7ffc98cc14e0 sp 0x7ffc98cc14d8
READ of size 8 at 0x6060000008c0 thread T0
    #0 0x5629da28a59e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x5629da295520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x5629da29c234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x5629da27b9de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x5629da26800e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x5629da263cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
    #7 0x5629da2658f9 in _start (/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000008c0 is located 0 bytes inside of 56-byte region [0x6060000008c0,0x6060000008f8)
freed by thread T0 here:
    #0 0x7f4f4a69ffd0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x5629da267117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x5629da267117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #6 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #7 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f4f4a6a0350 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x5629da31f794 in xmalloc ../lib/util.c:123
    #2 0x5629da29146d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x5629da27b603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x5629da2670f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x5629da2670f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff8100: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8110: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff8120: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8130: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8140: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fff8150: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8160: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==32375==ABORTING
../../tests/ovs-macros.at:194: hard failure
/home/blp/nicira/ovs/_build/tests/testsuite.dir/1920/cleanup: line 1: kill: (32375) - No such process
1920. schema conversion online - clustered (ovsdb-server.at:1126): FAILED (ovs-macros.at:194)
Ben Pfaff Feb. 28, 2019, 2:01 a.m. UTC | #3
I spent a bunch of time looking through this series.  The actual support
for fast resync is really simple and clean.

The first patch is the one that I have the most trouble with.  I think
it's probably because I don't understand the existing code all that
well.  I wrote the original monitor implementation, then other people
added the caching layer, and I think I reviewed that caching layer
without ever properly understanding it.  Do you understand it?  Are you
confident that your revision is high quality, and are you confident that
you can maintain it?

Thanks,

Ben.
Han Zhou Feb. 28, 2019, 4:04 a.m. UTC | #4
On Wed, Feb 27, 2019 at 6:01 PM Ben Pfaff <blp@ovn.org> wrote:
>
> I spent a bunch of time looking through this series.  The actual support
> for fast resync is really simple and clean.
>
> The first patch is the one that I have the most trouble with.  I think
> it's probably because I don't understand the existing code all that
> well.  I wrote the original monitor implementation, then other people
> added the caching layer, and I think I reviewed that caching layer
> without ever properly understanding it.  Do you understand it?  Are you
> confident that your revision is high quality, and are you confident that
> you can maintain it?
>
> Thanks,
>
> Ben.

Hi Ben,

Thanks a lot for spending time reviewing this. For the first patch,
"ovsdb-monitor: Refactor ovsdb monitor implementation.", I am pretty
confident about it. I see room for further improvement, but at least I am
sure the revised version is functionally equivalent to the original one.
The monitoring implementation was hard to understand for me, too. After
spending a lot of time on it I understand the structure and most of the
details now, and I believe it is slightly easier to understand after the
refactoring, and I think I can maintain it. Consider my previous bug fix
for the crash caused by condition changes as evidence :o) I can add some
documentation on it (so that I will not forget after some time).

And thanks for finding the AddressSanitizer test failure. I think I got the
root cause, which is related to the order in which the rows tracked in the
txn history and the table schema they use are freed. I will figure out how
to properly fix it.

Thanks,
Han
Han Zhou Feb. 28, 2019, 4:13 a.m. UTC | #5
On Wed, Feb 27, 2019 at 8:04 PM Han Zhou <zhouhan@gmail.com> wrote:
>
> On Wed, Feb 27, 2019 at 6:01 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > I spent a bunch of time looking through this series.  The actual support
> > for fast resync is really simple and clean.
> >
> > The first patch is the one that I have the most trouble with.  I think
> > it's probably because I don't understand the existing code all that
> > well.  I wrote the original monitor implementation, then other people
> > added the caching layer, and I think I reviewed that caching layer
> > without ever properly understanding it.  Do you understand it?  Are you
> > confident that your revision is high quality, and are you confident that
> > you can maintain it?
> >
> > Thanks,
> >
> > Ben.
>
> Hi Ben,
>
> Thanks a lot for spending time reviewing this. For the first patch,
> "ovsdb-monitor: Refactor ovsdb monitor implementation.", I am pretty
> confident about it. I see room for further improvement, but at least I am
> sure the revised version is functionally equivalent to the original one.
> The monitoring implementation was hard to understand for me, too. After
> spending a lot of time on it I understand the structure and most of the
> details now, and I believe it is slightly easier to understand after the
> refactoring, and I think I can maintain it. Consider my previous bug fix
> for the crash caused by condition changes as evidence :o) I can add some
> documentation on it (so that I will not forget after some time).
>
> And thanks for finding the AddressSanitizer test failure. I think I got
> the root cause, which is related to the order in which the rows tracked
> in the txn history and the table schema they use are freed. I will figure
> out how to properly fix it.
>
> Thanks,
> Han

I realized that you might be talking specifically about the json cache
part. It is pretty clear to me now and I can briefly explain it. It simply
avoids the repeated work of translating the same set of changes into
JSON-RPC. When many clients are monitoring the same set of tables and
columns without filter conditions, the cached JSON-RPC message is used to
send the notifications, which can save a lot of cost. When clients have
different conditions, the cache is not used. The cache is keyed on a change
set and a monitor version, because different monitor versions construct
different JSON-RPC messages. Whenever there are changes to the change set,
the cache is invalidated (flushed).
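
To make that concrete, below is a rough sketch of such a cache keyed on
(change set, monitor version); the struct and function names here are
hypothetical and this is not the actual ovsdb/monitor.c code, just an
illustration of the lookup and the invalidation rule described above.

#include <stdint.h>
#include "hash.h"
#include "openvswitch/hmap.h"
#include "openvswitch/json.h"

/* One cached JSON-RPC update message per (change set, monitor version). */
struct json_cache_node {
    struct hmap_node hmap_node;
    const void *change_set;   /* Key part 1: which set of changes. */
    int monitor_version;      /* Key part 2: different monitor versions
                               * build different messages. */
    struct json *json;        /* Cached update message, shared by clients
                               * monitoring the same columns without
                               * conditions. */
};

static struct json *
json_cache_lookup(struct hmap *cache, const void *change_set,
                  int monitor_version)
{
    struct json_cache_node *node;
    uint32_t hash = hash_pointer(change_set, monitor_version);

    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, hash, cache) {
        if (node->change_set == change_set
            && node->monitor_version == monitor_version) {
            return node->json;   /* Hit: reuse the composed message. */
        }
    }
    return NULL;   /* Miss: the caller composes the message and caches it. */
}

/* Whenever rows are added to (or removed from) a change set, any message
 * cached for it is stale, so the cache entries for that change set must be
 * flushed before the next notification round. */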
Han Zhou Feb. 28, 2019, 7:37 a.m. UTC | #6
On Wed, Feb 27, 2019 at 5:26 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> > In scalability tests with ovn-scale-test, the SB ovsdb-server load is
> > not a problem, at least with 1k HVs. However, if we restart the
> > ovsdb-server, then depending on the number of HVs and the scale of
> > logical objects, e.g. the number of logical ports, the SB ovsdb-server
> > becomes an obvious bottleneck.
> >
> > In our test with 1k HVs and 20k logical ports (200 lports * 100
> > lswitches connected by a single logical router), restarting the SB
> > ovsdb-server resulted in 100% CPU usage on ovsdb-server for more than
> > 1 hour: all HVs (and northd) were reconnecting and resyncing the large
> > amount of data at the same time.
> >
> > A similar problem happens in a failover scenario. With an active-active
> > cluster, the problem is alleviated slightly, because only 1/3 of the
> > HVs (assuming a 3-node cluster) need to resync data from new servers,
> > but it is still a serious problem.
> >
> > For a detailed discussion of the problem and solutions, see:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> Thanks.
>
> When I apply this series, I get a reproducible test failure in test
> 1920 "schema conversion online - clustered".  It's an error from Address
> Sanitizer.  I'm appending the testsuite.log.
>

Thanks, Ben, for catching this. I should enable AddressSanitizer for my
regression tests. Please find below the patch that fixes this bug. I will
send v4 with this fix folded into patch 2/5 ("ovsdb-server: Transaction
history tracking").

----8><----------------------------------------------------><8----
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index ea7dd23..cfc96b3 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -538,6 +538,9 @@ ovsdb_replace(struct ovsdb *dst, struct ovsdb *src)
         ovsdb_trigger_prereplace_db(trigger);
     }

+    /* Destroy txn history. */
+    ovsdb_txn_history_destroy(dst);
+
     struct ovsdb_schema *tmp_schema = dst->schema;
     dst->schema = src->schema;
     src->schema = tmp_schema;
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 0081840..b3f4946 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -1415,7 +1415,9 @@ ovsdb_txn_history_destroy(struct ovsdb *db)

     struct ovsdb_txn_history_node *txn_h_node, *next;
     LIST_FOR_EACH_SAFE (txn_h_node, next, node, &db->txn_history) {
+        ovs_list_remove(&txn_h_node->node);
         ovsdb_txn_destroy_cloned(txn_h_node->txn);
         free(txn_h_node);
     }
+    db->n_txn_history = 0;
 }