Message ID: 1551374120-44287-1-git-send-email-hzhou8@ebay.com
Series: Fast OVSDB resync after restart or fail-over.
On Thu, Feb 28, 2019 at 09:15:15AM -0800, Han Zhou wrote:
> In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> problem, at least with 1k HVs. However, if we restart the ovsdb-server,
> then depending on the number of HVs and the scale of logical objects, e.g.
> the number of logical ports, the SB ovsdb-server becomes an obvious
> bottleneck.
>
> In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> connected by one single logical router), restarting the SB ovsdb-server
> resulted in 100% CPU usage on ovsdb-server for more than 1 hour, because
> all HVs (and northd) were reconnecting and resyncing the large amount of
> data at the same time.
>
> A similar problem occurs in failover scenarios. With an active-active
> cluster the problem is alleviated slightly, because only 1/3 (assuming
> a 3-node cluster) of the HVs need to resync data from the new servers,
> but it is still a serious problem.
>
> For a detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> These patches implement the proposal from that discussion. They introduce
> a new method, monitor_cond_since, which lets a client request only the
> changes that happened after a specific point, so that data already cached
> in the client is not re-transferred. Scalability testing shows a dramatic
> improvement: all HVs finish syncing as soon as they reconnect, since there
> is no new data to be transferred.

Thanks a lot. I applied this to master. I want to encourage you to send another patch adding a NEWS item.
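To make the mechanism concrete: per the OVSDB protocol extension, a monitor_cond_since request carries the database name, a monitor id, the conditional monitor requests, and the last transaction id the client saw; the server's reply indicates whether that transaction was found, so the client can keep its cache and receive only the delta. Below is a minimal Python sketch of building such a request; the table and column names (Port_Binding, chassis) are illustrative and the all-zero transaction id stands for "no prior state".

```python
import json
import uuid

def build_monitor_cond_since(db_name, monitor_requests, last_txn_id):
    """Build a JSON-RPC monitor_cond_since request (sketch).

    Per the OVSDB protocol extension, the params array is:
    [<db-name>, <monitor-id>, <monitor-cond-requests>, <last-txn-id>].
    The server replies with [<found>, <last-txn-id>, <updates>]; when
    <found> is true, the client's cached contents are still valid and
    only changes after <last-txn-id> are sent, avoiding a full resync.
    """
    return {
        "method": "monitor_cond_since",
        "params": [db_name, str(uuid.uuid4()), monitor_requests, last_txn_id],
        "id": 0,
    }

# Illustrative monitor request: watch the chassis column of Port_Binding.
req = build_monitor_cond_since(
    "OVN_Southbound",
    {"Port_Binding": [{"columns": ["chassis"]}]},
    "00000000-0000-0000-0000-000000000000",  # all-zero UUID: no prior state
)
print(json.dumps(req))
```

A client that reconnects after a brief outage would instead pass the last transaction id it recorded, and on a `<found> == true` reply it keeps its cache and applies only the incremental updates.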