Message ID: 1551115508-12996-1-git-send-email-hzhou8@ebay.com
Series: Fast OVSDB resync after restart or failover.
It didn't get caught in any filters for a different -dev list that I
happen to also moderate.  Doh!  Anyway, I released it now.

On Mon, Feb 25, 2019 at 11:24:21AM -0800, Ben Pfaff wrote:
> It didn't get caught in any filters this time.
>
> I see all the patches, although maybe I was CCed.
>
> On Mon, Feb 25, 2019 at 11:08:40AM -0800, Han Zhou wrote:
> > This doesn't seem to be showing up in the mailing list.  Retrying
> > with different words in the title.
> >
> > On Mon, Feb 25, 2019 at 9:25 AM Han Zhou <zhouhan@gmail.com> wrote:
> > >
> > > In scalability tests with ovn-scale-test, ovsdb-server SB load is not
> > > a problem, at least with 1k HVs.  However, if we restart the
> > > ovsdb-server, then depending on the number of HVs and the scale of
> > > logical objects, e.g. the number of logical ports, the SB
> > > ovsdb-server becomes an obvious bottleneck.
> > >
> > > In our test with 1k HVs and 20k logical ports (200 lports * 100
> > > lswitches connected by a single logical router), restarting the SB
> > > ovsdb-server resulted in 100% CPU on ovsdb-server for more than an
> > > hour, as all HVs (and northd) reconnected and resynced the large
> > > amount of data at the same time.
> > >
> > > A similar problem happens in failover scenarios.  With an
> > > active-active cluster the problem is alleviated slightly, because
> > > only 1/3 of the HVs (assuming a 3-node cluster) need to resync data
> > > from new servers, but it is still a serious problem.
> > >
> > > For detailed discussion of the problem and solutions, see:
> > > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
> > >
> > > These patches implement the proposal in that discussion.  They
> > > introduce a new method, monitor_cond_since, which lets a client
> > > request only the changes that happened after a specific point, so
> > > that data already cached in the client is not re-transferred.
> > > Scalability tests show dramatic improvement: all HVs finish syncing
> > > as soon as they reconnect, since there is no new data to be
> > > transferred.
> > >
> > > The current patches support all 3 modes of ovsdb-server, but only
> > > clustered mode benefits, since it is the only one that supports
> > > transaction ids out of the box.
> > >
> > > ---
> > > v1 -> v2: fix the unused variable in patch 6/7 reported by 0-day.
> > > v2 -> v3: first 2 of the 7 were merged.  Revised 1/5 addressing Ben's
> > >           comment on the json cache hmap.
> > >
> > > Han Zhou (5):
> > >   ovsdb-monitor: Refactor ovsdb monitor implementation.
> > >   ovsdb-server: Transaction history tracking.
> > >   ovsdb-monitor: Support monitor_cond_since.
> > >   ovsdb-idl.c: Support monitor_cond_since method in C IDL.
> > >   ovsdb-idl.c: Fast resync from server when connection reset.
> > >
> > >  Documentation/ref/ovsdb-server.7.rst |  78 +++++-
> > >  lib/ovsdb-idl.c                      | 229 ++++++++++++----
> > >  ovsdb/jsonrpc-server.c               | 101 +++++--
> > >  ovsdb/monitor.c                      | 516 ++++++++++++++++++++++-------------
> > >  ovsdb/monitor.h                      |  16 +-
> > >  ovsdb/ovsdb-client.1.in              |  28 +-
> > >  ovsdb/ovsdb-client.c                 | 104 ++++++-
> > >  ovsdb/ovsdb-server.c                 |  11 +
> > >  ovsdb/ovsdb.c                        |   3 +
> > >  ovsdb/ovsdb.h                        |  10 +
> > >  ovsdb/transaction.c                  | 117 +++++++-
> > >  ovsdb/transaction.h                  |   4 +
> > >  tests/ovsdb-monitor.at               | 301 +++++++++++++++++++-
> > >  13 files changed, 1232 insertions(+), 286 deletions(-)
> > >
> > > --
> > > 2.1.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
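[Editor's note: for readers unfamiliar with the method the cover letter
introduces, a rough Python sketch of the monitor_cond_since exchange as the
series documents it in ovsdb-server(7).  The database name, column set, and
transaction ids below are made-up illustrations, not actual IDL code.]

```python
import uuid


def monitor_cond_since_request(db, monitor_requests, last_txn_id):
    # Build a "monitor_cond_since" JSON-RPC request.  Its params are
    # [<db-name>, <json-value> (monitor id), <monitor-cond-requests>,
    # <last-txn-id>]: like "monitor_cond", plus the id of the last
    # transaction the client has already cached.
    return {
        "method": "monitor_cond_since",
        "params": [db, str(uuid.uuid4()), monitor_requests, last_txn_id],
        "id": 0,
    }


def interpret_reply(result):
    # The reply's result is [<found>, <last-txn-id>, <table-updates2>].
    # <found> == true means the server still has the client's transaction
    # id in its history, so <table-updates2> carries only the changes
    # after that point; false forces a conventional full resync.
    found, last_txn_id, updates = result
    return ("incremental" if found else "full-resync", last_txn_id, updates)


# Hypothetical usage: a client that cached state up to some txn id.
req = monitor_cond_since_request(
    "OVN_Southbound",
    {"Chassis": [{"columns": ["name", "hostname"]}]},
    "3d3a1010-ffc5-49a3-a343-7ee06dcf6b14")
mode, _, _ = interpret_reply([True, "next-txn-id", {}])
```

This is why only clustered mode benefits out of the box: the fast path
depends on the server having durable transaction ids to compare against.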
On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> problem, at least with 1k HVs.  However, if we restart the ovsdb-server,
> then depending on the number of HVs and the scale of logical objects,
> e.g. the number of logical ports, the SB ovsdb-server becomes an obvious
> bottleneck.
>
> In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> connected by a single logical router), restarting the SB ovsdb-server
> resulted in 100% CPU on ovsdb-server for more than an hour, as all HVs
> (and northd) reconnected and resynced the large amount of data at the
> same time.
>
> A similar problem happens in failover scenarios.  With an active-active
> cluster the problem is alleviated slightly, because only 1/3 of the HVs
> (assuming a 3-node cluster) need to resync data from new servers, but it
> is still a serious problem.
>
> For detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

Thanks.

When I apply this series, I get a reproducible test failure in test 1920
"schema conversion online - clustered".  It's an error from Address
Sanitizer.  I'm appending the testsuite.log.  Obviously this needs to be
fixed, but I'll take a look at the patches in more detail now too.

# -*- compilation -*-
1920. ovsdb-server.at:1126: testing schema conversion online - clustered ...
../../tests/ovsdb-server.at:869: if test $model = standalone; then
                                   ovsdb-tool create db schema
                                 else
                                   ovsdb-tool create-cluster db schema unix:s1.raft
                                 fi
../../tests/ovsdb-server.at:876: ovsdb-server -vfile -vvlog:off -vconsole:off --detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:880: for pair in 'zero 0' 'one 1' 'two 2' 'three 3' 'four 4' 'five 5'; do
                                   set -- $pair
                                   ovsdb-client transact '
                                     ["ordinals",
                                      {"op": "insert",
                                       "table": "ordinals",
                                       "row": {"name": "'$1'", "number": '$2'}},
                                      {"op": "comment",
                                       "comment": "add row for '"$pair"'"}]'
                                 done | uuidfilt
stderr:
../../tests/ovsdb-server.at:900: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:902: ovsdb-client needs-conversion new-schema
../../tests/ovsdb-server.at:907: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-ordinals-aware.pid --log-file=monitor-ordinals-aware.log --db-change-aware --no-headings monitor ordinals ordinals number name > monitor-ordinals-aware.stdout 2> monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:912: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-ordinals-unaware.pid --log-file=monitor-ordinals-unaware.log --no-db-change-aware --no-headings monitor ordinals ordinals number name > monitor-ordinals-unaware.stdout 2> monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:919: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-server-aware.pid --log-file=monitor-server-aware.log --db-change-aware --no-headings monitor _Server Database name > monitor-server-aware.stdout 2> monitor-server-aware.stderr
../../tests/ovsdb-server.at:924: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=monitor-server-unaware.pid --log-file=monitor-server-unaware.log --no-db-change-aware --no-headings monitor _Server Database name > monitor-server-unaware.stdout 2> monitor-server-unaware.stderr
../../tests/ovsdb-server.at:938: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-ordinals-aware.pid --log-file=trigger-ordinals-aware.log --db-change-aware transact "$ordinals_txn" > trigger-ordinals-aware.stdout 2> trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:943: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-ordinals-unaware.pid --log-file=trigger-ordinals-unaware.log --no-db-change-aware transact "$ordinals_txn" > trigger-ordinals-unaware.stdout 2> trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:957: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-server-aware.pid --log-file=trigger-server-aware.log --db-change-aware transact "$server_txn" > trigger-server-aware.stdout 2> trigger-server-aware.stderr
../../tests/ovsdb-server.at:962: ovsdb-client -vfile -vvlog:off --detach --no-chdir --pidfile=trigger-server-unaware.pid --log-file=trigger-server-unaware.log --no-db-change-aware transact "$server_txn" > trigger-server-unaware.stdout 2> trigger-server-unaware.stderr
../../tests/ovsdb-server.at:968: ovsdb-client dump unix:db.sock ordinals
stdout:
ordinals table
_uuid                                name  number
------------------------------------ ----- ------
c495692b-f991-440b-865b-ed7c61fe6412 five  5
d731216a-16c9-4b78-86da-31c65b0b8358 four  4
3e3e6998-2633-4bd0-bb97-3c3c4cbd7227 one   1
7fbcde1d-30ee-47ea-bf24-a40f16f05b15 three 3
6a7387b0-8115-48e9-a202-e2898a8f1772 two   2
79dcb8d0-7738-417c-b779-40d530c87c90 zero  0
../../tests/ovsdb-server.at:969: uuidfilt stdout
../../tests/ovsdb-server.at:982: ovsdb-client convert new-schema
../../tests/ovsdb-server.at:985: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:987: ovsdb-client needs-conversion new-schema
aware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
unaware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1004: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1006: sed 's/.*: //' monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
ovsdb-server.at:1017: waiting while test -e monitor-server-unaware.pid...
ovsdb-server.at:1017: wait succeeded immediately
../../tests/ovsdb-server.at:1018: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1020: test -e monitor-server-aware.pid
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
../../tests/ovsdb-server.at:1028: cat trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1030: cat trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
ovsdb-server.at:1038: waiting while test -e trigger-server-unaware.pid...
ovsdb-server.at:1038: wait succeeded immediately
../../tests/ovsdb-server.at:1039: sed 's/.*: //' trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1041: test -e trigger-server-aware.pid
../../tests/ovsdb-server.at:1051: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1065: ovsdb-client transact '
    ["ordinals",
     {"op": "insert",
      "table": "ordinals",
      "row": {"number": 6}},
     {"op": "comment",
      "comment": "add row for 6"}]' | uuidfilt
../../tests/ovsdb-server.at:1078: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1092: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1092: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1092: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1092: wait succeeded after 1 seconds
../../tests/ovsdb-server.at:1093: ovsdb-server -vfile -vvlog:off -vconsole:off --detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:1095: ovsdb-client dump unix:db.sock ordinals | uuidfilt
../../tests/ovsdb-server.at:1111: test -h db
../../tests/ovsdb-server.at:1112: test -h .db.~lock~
../../tests/ovsdb-server.at:1113: test -f dir/db
../../tests/ovsdb-server.at:1114: test -f dir/.db.~lock~
../../tests/ovsdb-server.at:1117: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1117: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1117: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1117: wait succeeded after 1 seconds
Address Sanitizer reported errors in: asan.32224 asan.32375
=================================================================
==32224==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000005c0 at pc 0x560f3a96859f bp 0x7ffee562c110 sp 0x7ffee562c108
READ of size 8 at 0x6060000005c0 thread T0
    #0 0x560f3a96859e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x560f3a973520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x560f3a97a234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x560f3a9599de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x560f3a94600e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x560f3a941cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
    #7 0x560f3a9438f9 in _start (/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000005c0 is located 0 bytes inside of 56-byte region [0x6060000005c0,0x6060000005f8)
freed by thread T0 here:
    #0 0x7f21536fafd0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x560f3a945117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x560f3a945117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x560f3a9401f9 in main_loop ../ovsdb/ovsdb-server.c:232
    #6 0x560f3a9401f9 in main ../ovsdb/ovsdb-server.c:458
    #7 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f21536fb350 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x560f3a9fd794 in xmalloc ../lib/util.c:123
    #2 0x560f3a96f46d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x560f3a959603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x560f3a9450f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x560f3a9450f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x560f3a946b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x560f3a9419f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff8060: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8070: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff8080: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8090: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff80a0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff80b0: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
  0x0c0c7fff8100: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  Container overflow:    fc
  Array cookie:          ac
  Intra object redzone:  bb
  ASan internal:         fe
  Left alloca redzone:   ca
  Right alloca redzone:  cb
==32224==ABORTING
=================================================================
==32375==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000008c0 at pc 0x5629da28a59f bp 0x7ffc98cc14e0 sp 0x7ffc98cc14d8
READ of size 8 at 0x6060000008c0 thread T0
    #0 0x5629da28a59e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x5629da295520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x5629da29c234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x5629da27b9de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x5629da26800e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x5629da263cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
    #7 0x5629da2658f9 in _start (/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000008c0 is located 0 bytes inside of 56-byte region [0x6060000008c0,0x6060000008f8)
freed by thread T0 here:
    #0 0x7f4f4a69ffd0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x5629da267117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x5629da267117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #6 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #7 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f4f4a6a0350 in __interceptor_malloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x5629da31f794 in xmalloc ../lib/util.c:123
    #2 0x5629da29146d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x5629da27b603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x5629da2670f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x5629da2670f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff8100: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8110: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff8120: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8130: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8140: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fff8150: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8160: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  Container overflow:    fc
  Array cookie:          ac
  Intra object redzone:  bb
  ASan internal:         fe
  Left alloca redzone:   ca
  Right alloca redzone:  cb
==32375==ABORTING
../../tests/ovs-macros.at:194: hard failure
/home/blp/nicira/ovs/_build/tests/testsuite.dir/1920/cleanup: line 1: kill: (32375) - No such process
1920. ovsdb-server.at:1126:
1920. schema conversion online - clustered (ovsdb-server.at:1126): FAILED (ovs-macros.at:194)
I spent a bunch of time looking through this series.  The actual support
for fast resync is really simple and clean.

The first patch is the one that I have the most trouble with.  I think
it's probably because I don't understand the existing code all that well.
I wrote the original monitor implementation, then other people added the
caching layer, and I think I reviewed that caching layer without ever
properly understanding it.  Do you understand it?  Are you confident that
your revision is high quality, and are you confident that you can
maintain it?

Thanks,

Ben.
On Wed, Feb 27, 2019 at 6:01 PM Ben Pfaff <blp@ovn.org> wrote:
>
> I spent a bunch of time looking through this series.  The actual support
> for fast resync is really simple and clean.
>
> The first patch is the one that I have the most trouble with.  I think
> it's probably because I don't understand the existing code all that
> well.  I wrote the original monitor implementation, then other people
> added the caching layer, and I think I reviewed that caching layer
> without ever properly understanding it.  Do you understand it?  Are you
> confident that your revision is high quality, and are you confident that
> you can maintain it?
>
> Thanks,
>
> Ben.

Hi Ben,

Thanks a lot for spending time reviewing this.  For the first patch,
"ovsdb-monitor: Refactor ovsdb monitor implementation.", I am pretty
confident in it.  I see room for further improvement, but at least I am
sure the revised version is functionally equivalent to the original one.
The monitoring implementation was hard to understand for me, too.  After
spending a lot of time on it I understand the structure and most of the
details now, and I believe it is slightly easier to understand after the
refactoring, and I think I can maintain it.  Consider my previous bug fix
for the crash caused by condition changes as evidence :o)  I think I can
add some documentation on it (so that I will not forget after some time).

And thanks for finding the Address Sanitizer test failure.  I think I
have found the root cause, which is related to the order of freeing the
rows tracked in the txn history and the table schema they use.  I will
figure out how to properly fix it.

Thanks,
Han
On Wed, Feb 27, 2019 at 8:04 PM Han Zhou <zhouhan@gmail.com> wrote:
>
> On Wed, Feb 27, 2019 at 6:01 PM Ben Pfaff <blp@ovn.org> wrote:
> >
> > I spent a bunch of time looking through this series.  The actual support
> > for fast resync is really simple and clean.
> >
> > The first patch is the one that I have the most trouble with.  I think
> > it's probably because I don't understand the existing code all that
> > well.  I wrote the original monitor implementation, then other people
> > added the caching layer, and I think I reviewed that caching layer
> > without ever properly understanding it.  Do you understand it?  Are you
> > confident that your revision is high quality, and are you confident that
> > you can maintain it?
> >
> > Thanks,
> >
> > Ben.
>
> Hi Ben,
>
> Thanks a lot for spending time reviewing this.  For the first patch,
> "ovsdb-monitor: Refactor ovsdb monitor implementation.", I am pretty
> confident in it.  I see room for further improvement, but at least I am
> sure the revised version is functionally equivalent to the original one.
> The monitoring implementation was hard to understand for me, too.  After
> spending a lot of time on it I understand the structure and most of the
> details now, and I believe it is slightly easier to understand after the
> refactoring, and I think I can maintain it.  Consider my previous bug fix
> for the crash caused by condition changes as evidence :o)  I think I can
> add some documentation on it (so that I will not forget after some time).
>
> And thanks for finding the Address Sanitizer test failure.  I think I
> have found the root cause, which is related to the order of freeing the
> rows tracked in the txn history and the table schema they use.  I will
> figure out how to properly fix it.
>
> Thanks,
> Han

I realized that you might be talking specifically about the json cache
part.  It is pretty clear to me now, and I can briefly explain.

It just tries to avoid the repeated work of translating the same set of
changes into JSON-RPC messages.  When many clients are monitoring the
same set of tables and columns without filter conditions, the cached
JSON-RPC message is used to send the notifications, which can save a lot
of cost.  When clients have different conditions, the cache is not used.
The cache is keyed on a change set and a monitor version, because
different monitor versions construct different JSON-RPC messages.
Whenever there are changes to the change set, the cache is invalidated
(flushed).
On Wed, Feb 27, 2019 at 5:26 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> > In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> > problem, at least with 1k HVs.  However, if we restart the ovsdb-server,
> > then depending on the number of HVs and the scale of logical objects,
> > e.g. the number of logical ports, the SB ovsdb-server becomes an obvious
> > bottleneck.
> >
> > In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> > connected by a single logical router), restarting the SB ovsdb-server
> > resulted in 100% CPU on ovsdb-server for more than an hour, as all HVs
> > (and northd) reconnected and resynced the large amount of data at the
> > same time.
> >
> > A similar problem happens in failover scenarios.  With an active-active
> > cluster the problem is alleviated slightly, because only 1/3 of the HVs
> > (assuming a 3-node cluster) need to resync data from new servers, but it
> > is still a serious problem.
> >
> > For detailed discussion of the problem and solutions, see:
> > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> Thanks.
>
> When I apply this series, I get a reproducible test failure in test
> 1920 "schema conversion online - clustered".  It's an error from Address
> Sanitizer.  I'm appending the testsuite.log.

Thanks, Ben, for catching this.  I should enable AddressSanitizer for my
regression tests.  Please find below the patch that fixes this bug.  I
will send v4 with this folded into patch 2/5: "ovsdb-server: Transaction
history tracking."

----8><----------------------------------------------------><8----
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index ea7dd23..cfc96b3 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -538,6 +538,9 @@ ovsdb_replace(struct ovsdb *dst, struct ovsdb *src)
         ovsdb_trigger_prereplace_db(trigger);
     }
 
+    /* Destroy txn history. */
+    ovsdb_txn_history_destroy(dst);
+
     struct ovsdb_schema *tmp_schema = dst->schema;
     dst->schema = src->schema;
     src->schema = tmp_schema;
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 0081840..b3f4946 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -1415,7 +1415,9 @@ ovsdb_txn_history_destroy(struct ovsdb *db)
     struct ovsdb_txn_history_node *txn_h_node, *next;
 
     LIST_FOR_EACH_SAFE (txn_h_node, next, node, &db->txn_history) {
+        ovs_list_remove(&txn_h_node->node);
         ovsdb_txn_destroy_cloned(txn_h_node->txn);
         free(txn_h_node);
     }
+    db->n_txn_history = 0;
 }
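[Editor's note: the ASan report shows the shape of the bug the patch above
addresses: cloned rows in the transaction history were destroyed after the
tables they point at were already freed.  A toy Python model of that
ordering constraint, with hypothetical names, only to illustrate why the
replace path must destroy the history before swapping out the old
database contents.]

```python
class Table:
    # Stand-in for an ovsdb table whose memory can be freed.
    def __init__(self):
        self.freed = False

class Row:
    # Mimics ovsdb_row_destroy(), which dereferences row->table; doing
    # that after the table is freed is the heap-use-after-free ASan saw.
    def __init__(self, table):
        self.table = table
    def destroy(self):
        assert not self.table.freed, "use-after-free: row outlives table"

class Db:
    def __init__(self):
        self.table = Table()
        self.txn_history = [Row(self.table)]   # cloned rows kept for resync
    def destroy_txn_history(self):
        # Like the patched ovsdb_txn_history_destroy(): unlink and
        # destroy every history node.
        while self.txn_history:
            self.txn_history.pop().destroy()

def replace_fixed(dst, src):
    # The fix: drop the history, whose cloned rows still point into the
    # old tables, *before* the old database contents go away.
    dst.destroy_txn_history()
    dst.table.freed = True        # old tables freed during the swap
    dst.table = src.table

def replace_buggy(dst, src):
    # Pre-fix order: the old tables are torn down first, and the stale
    # history rows are only walked later (e.g. at shutdown) -- too late.
    dst.table.freed = True
    dst.destroy_txn_history()     # raises: rows dereference a freed table
    dst.table = src.table
```

The `ovs_list_remove()` and `db->n_txn_history = 0` parts of the patch
serve a related purpose: after `ovsdb_replace()` empties the history, the
list and its counter must be left consistent so the server can keep
appending to (and later re-destroying) the history safely.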