
[ovs-dev] scale test testing requests (was: raft ovsdb clustering with scale test)

Message ID 20180518203545.GW7948@ovn.org
State Not Applicable
Headers show
Series [ovs-dev] scale test testing requests (was: raft ovsdb clustering with scale test)

Commit Message

Ben Pfaff May 18, 2018, 8:35 p.m. UTC
I've spent some time stressing the database yesterday and today.  So
far, I can't reproduce these particular problems.  I do see various ways
to improve OVS and OVN and their tests.

Here are some suggestions I have for further testing:

1. You mentioned that programs were segfaulting.  They should not be
   doing that (obviously) but I wasn't able to get them to do so in my
   own testing.  It would be very helpful to have backtraces.  Would you
   mind trying to get them?

2. You mentioned that "perf" shows that lots of time is being spent
   writing snapshots.  It would be helpful to know whether this is a
   contributing factor in the failures.  (If it is, then I will work on
   making snapshots faster.)  One way to figure that out would be to
   disable snapshots entirely for testing.  That isn't acceptable in
   production because it will use up all the disk space eventually, but
   for testing one could apply the following patch:


3. This isn't really a testing note but I do see that the way that OVSDB
   is proxying writes from a Raft follower to the leader is needlessly
   inefficient and I should rework it for better write performance.
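
For item 1, a minimal sketch of how the backtraces could be captured (the
binary and core file paths below are only examples and will differ per
install):

# Allow core dumps in the shell or service unit that launches the daemons.
ulimit -c unlimited

# After a crash, pull a full backtrace out of the core file with gdb.
# Adjust the executable path and core file name to match the actual system.
gdb -batch -ex "thread apply all bt full" \
    /usr/sbin/ovsdb-server /path/to/core.ovsdb-server > ovsdb-server-bt.txt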

Comments

Han Zhou May 18, 2018, 9:30 p.m. UTC | #1
On Fri, May 18, 2018 at 1:35 PM, Ben Pfaff <blp@ovn.org> wrote:
>
> I've spent some time stressing the database yesterday and today.  So
> far, I can't reproduce these particular problems.  I do see various ways
> to improve OVS and OVN and their tests.
>
> Here are some suggestions I have for further testing:
>
> 1. You mentioned that programs were segfaulting.  They should not be
>    doing that (obviously) but I wasn't able to get them to do so in my
>    own testing.  It would be very helpful to have backtraces.  Would you
>    mind trying to get them?
>
> 2. You mentioned that "perf" shows that lots of time is being spent
>    writing snapshots.  It would be helpful to know whether this is a
>    contributing factor in the failures.  (If it is, then I will work on
>    making snapshots faster.)  One way to figure that out would be to
>    disable snapshots entirely for testing.  That isn't acceptable in
>    production because it will use up all the disk space eventually, but
>    for testing one could apply the following patch:
>
> diff --git a/ovsdb/storage.c b/ovsdb/storage.c
> index 446cae0861ec..9fa9954b6d35 100644
> --- a/ovsdb/storage.c
> +++ b/ovsdb/storage.c
> @@ -490,38 +490,8 @@ schedule_next_snapshot(struct ovsdb_storage *storage, bool quick)
>  }
>
>  bool
> -ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage)
> +ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage OVS_UNUSED)
>  {
> -    if (storage->raft || storage->log) {
> -        /* If we haven't reached the minimum snapshot time, don't snapshot. */
> -        long long int now = time_msec();
> -        if (now < storage->next_snapshot_min) {
> -            return false;
> -        }
> -
> -        /* If we can't snapshot right now, don't. */
> -        if (storage->raft && !raft_may_snapshot(storage->raft)) {
> -            return false;
> -        }
> -
> -        uint64_t log_len = (storage->raft
> -                            ? raft_get_log_length(storage->raft)
> -                            : storage->n_read + storage->n_written);
> -        if (now < storage->next_snapshot_max) {
> -            /* Maximum snapshot time not yet reached.  Take a snapshot if there
> -             * have been at least 100 log entries and the log file size has
> -             * grown a lot. */
> -            bool grew_lots = (storage->raft
> -                              ? raft_grew_lots(storage->raft)
> -                              : ovsdb_log_grew_lots(storage->log));
> -            return log_len >= 100 && grew_lots;
> -        } else {
> -            /* We have reached the maximum snapshot time.  Take a snapshot if
> -             * there have been any log entries at all. */
> -            return log_len > 0;
> -        }
> -    }
> -
>      return false;
>  }
>
>
> 3. This isn't really a testing note but I do see that the way that OVSDB
>    is proxying writes from a Raft follower to the leader is needlessly
>    inefficient and I should rework it for better write performance.

Thanks Ben for taking the effort on this. We'll do more tests the way you
suggested. Here is one more detail of our test scenario, which might (or
might not) help with reproducing the problem: in our test we create lports
in batches of 100 with a single ovn-nbctl command (i.e. one transaction),
and then bind each of the new lports on a random HV. After all 100 lports
are bound, we run ovn-nbctl wait-until logical_switch_port <port> up=true
for each lport. After all the lports are up in the NB, we continue with the
next batch. Not sure if this has anything to do with the problems, but it
is how we tested :)
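
A rough shell sketch of that loop (hypothetical names: switch ls0, ports
lport-N, and a placeholder bind_port_on_random_hv standing in for the
rally-side binding step; this is not our actual test code):

# Create a batch of 100 lports on switch ls0 in a single ovn-nbctl
# transaction by chaining lsp-add commands with "--".
cmd=""
for i in $(seq 1 100); do
    cmd="$cmd -- lsp-add ls0 lport-$i"
done
ovn-nbctl $cmd

# Bind each new lport on a randomly chosen hypervisor.
# bind_port_on_random_hv is a placeholder for what rally does on the HVs.
for i in $(seq 1 100); do
    bind_port_on_random_hv "lport-$i"
done

# Wait until the NB db reports every lport in the batch as up before
# starting the next batch.
for i in $(seq 1 100); do
    ovn-nbctl wait-until logical_switch_port "lport-$i" up=true
done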
Ben Pfaff May 18, 2018, 9:46 p.m. UTC | #2
On Fri, May 18, 2018 at 02:30:35PM -0700, Han Zhou wrote:
> Thanks Ben for taking the effort on this. We'll do more tests the way you
> suggested. Here is one more detail of our test scenario, which might (or
> might not) help with reproducing the problem: in our test we create lports
> in batches of 100 with a single ovn-nbctl command (i.e. one transaction),
> and then bind each of the new lports on a random HV. After all 100 lports
> are bound, we run ovn-nbctl wait-until logical_switch_port <port> up=true
> for each lport. After all the lports are up in the NB, we continue with the
> next batch. Not sure if this has anything to do with the problems, but it
> is how we tested :)

It seems like a reasonable way to test, to me.  My own testing also uses
batches of 50.
aginwala May 31, 2018, 4:50 p.m. UTC | #3
On Fri, May 18, 2018 at 1:35 PM, Ben Pfaff <blp@ovn.org> wrote:

> I've spent some time stressing the database yesterday and today.  So
> far, I can't reproduce these particular problems.  I do see various ways
> to improve OVS and OVN and their tests.
>
> Here are some suggestions I have for further testing:
>
> 1. You mentioned that programs were segfaulting.  They should not be
>    doing that (obviously) but I wasn't able to get them to do so in my
>    own testing.  It would be very helpful to have backtraces.  Would you
>    mind trying to get them?
>
>>> Seems this is no longer an issue. I suspect my very first run was short
on disk, RAM, and vCPUs; after I increased them, I don't see it anymore.


> 2. You mentioned that "perf" shows that lots of time is being spent
>    writing snapshots.  It would be helpful to know whether this is a
>    contributing factor in the failures.  (If it is, then I will work on
>    making snapshots faster.)  One way to figure that out would be to
>    disable snapshots entirely for testing.  That isn't acceptable in
>    production because it will use up all the disk space eventually, but
>    for testing one could apply the following patch:
>
> diff --git a/ovsdb/storage.c b/ovsdb/storage.c
> index 446cae0861ec..9fa9954b6d35 100644
> --- a/ovsdb/storage.c
> +++ b/ovsdb/storage.c
> @@ -490,38 +490,8 @@ schedule_next_snapshot(struct ovsdb_storage *storage, bool quick)
>  }
>
>  bool
> -ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage)
> +ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage OVS_UNUSED)
>  {
> -    if (storage->raft || storage->log) {
> -        /* If we haven't reached the minimum snapshot time, don't snapshot. */
> -        long long int now = time_msec();
> -        if (now < storage->next_snapshot_min) {
> -            return false;
> -        }
> -
> -        /* If we can't snapshot right now, don't. */
> -        if (storage->raft && !raft_may_snapshot(storage->raft)) {
> -            return false;
> -        }
> -
> -        uint64_t log_len = (storage->raft
> -                            ? raft_get_log_length(storage->raft)
> -                            : storage->n_read + storage->n_written);
> -        if (now < storage->next_snapshot_max) {
> -            /* Maximum snapshot time not yet reached.  Take a snapshot if there
> -             * have been at least 100 log entries and the log file size has
> -             * grown a lot. */
> -            bool grew_lots = (storage->raft
> -                              ? raft_grew_lots(storage->raft)
> -                              : ovsdb_log_grew_lots(storage->log));
> -            return log_len >= 100 && grew_lots;
> -        } else {
> -            /* We have reached the maximum snapshot time.  Take a snapshot if
> -             * there have been any log entries at all. */
> -            return log_len > 0;
> -        }
> -    }
> -
>      return false;
>  }
>

>> I applied this patch but the issue is still the same: the rally run still
hangs after 1k+ lports. So I think that even though perf was showing a lot
of time spent in writing/saving/creating snapshots, it should not be the
bottleneck.

>
> 3. This isn't really a testing note but I do see that the way that OVSDB
>    is proxying writes from a Raft follower to the leader is needlessly
>    inefficient and I should rework it for better write performance.
>
>>> Maybe we should consider this too.


Further analysis indicates that, in my current run, rally just stalls and
then hits a db timeout on a new batch of 250 lports a few minutes after the
1k+ lports have been created (e.g. the payload is attached). However, during
the stall, if I run ovn-nbctl manually with a few lport commands, it works
fine. Also, CPU usage is constant and pretty much idle.

I only see the logs below in the NB db logs on a raft node when this happens:
2018-05-31T06:21:30.631Z|00216|reconnect|WARN|tcp:192.168.10.10:40970:
connection dropped (Connection reset by peer)
2018-05-31T06:21:30.968Z|00217|reconnect|WARN|tcp:192.168.10.10:40982:
connection dropped (Connection reset by peer)
2018-05-31T06:21:31.661Z|00218|reconnect|WARN|tcp:192.168.10.10:40998:
connection dropped (Connection reset by peer)
2018-05-31T06:21:31.806Z|00219|reconnect|WARN|tcp:192.168.10.10:41004:
connection dropped (Connection reset by peer)
2018-05-31T06:21:32.330Z|00220|reconnect|WARN|tcp:192.168.10.10:41008:
connection dropped (Connection reset by peer)
2018-05-31T06:21:33.702Z|00221|reconnect|WARN|tcp:192.168.10.10:41034:
connection dropped (Connection reset by peer)
2018-05-31T06:21:35.337Z|00222|reconnect|WARN|tcp:192.168.10.10:41064:
connection dropped (Connection reset by peer)
2018-05-31T06:21:36.043Z|00223|reconnect|WARN|tcp:192.168.10.10:41072:
connection dropped (Connection reset by peer)
2018-05-31T06:21:36.483Z|00224|reconnect|WARN|tcp:192.168.10.10:41078:
connection dropped (Broken pipe)
2018-05-31T06:21:36.786Z|00225|reconnect|WARN|tcp:192.168.10.10:41080:
connection dropped (Broken pipe)

where 192.168.10.10 is the host (BM) that runs the ovn-nbctl commands with
--db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641",
and the 3 raft nodes are running on a VM backed by a Linux bridge on the
same host, 192.168.10.10.

Does this give you any hints? Do you think that bulk lport creation with
raft would stall as the db grows, or is it something else?


Also, the logs below appear constantly on the master node, showing entry eid mismatches:
2018-05-30T22:53:13.903Z|00549|raft|INFO|current entry eid
667cefa2-3c31-41a8-9798-1c379dcbea26 does not match prerequisite
549881bb-7105-4f1a-8f45-98443d97d14c
2018-05-30T22:53:13.903Z|00550|raft|INFO|current entry eid
667cefa2-3c31-41a8-9798-1c379dcbea26 does not match prerequisite
549881bb-7105-4f1a-8f45-98443d97d14c
2018-05-30T22:53:13.910Z|00551|raft|INFO|current entry eid
9ca22ca3-fed6-41f5-8f58-25b0d7bfc8c4 does not match prerequisite
cbb82ac1-1206-4d0e-9849-66dd1dff7110
2018-05-30T22:53:13.911Z|00552|raft|INFO|current entry eid
9ca22ca3-fed6-41f5-8f58-25b0d7bfc8c4 does not match prerequisite
cbb82ac1-1206-4d0e-9849-66dd1dff7110
2018-05-30T22:53:13.925Z|00553|raft|INFO|current entry eid
e8db2339-3419-4601-ba9e-21a2e67ea014 does not match prerequisite
7dfeb122-107f-4695-8491-217a178c42d9


I will continue to see if I can find anything else. But for now it is clear
that running the attached payload stalls as well, and it is consistently
reproducible.


Regards,
Aliasgar

Patch

diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index 446cae0861ec..9fa9954b6d35 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -490,38 +490,8 @@  schedule_next_snapshot(struct ovsdb_storage *storage, bool quick)
 }
 
 bool
-ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage)
+ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage OVS_UNUSED)
 {
-    if (storage->raft || storage->log) {
-        /* If we haven't reached the minimum snapshot time, don't snapshot. */
-        long long int now = time_msec();
-        if (now < storage->next_snapshot_min) {
-            return false;
-        }
-
-        /* If we can't snapshot right now, don't. */
-        if (storage->raft && !raft_may_snapshot(storage->raft)) {
-            return false;
-        }
-
-        uint64_t log_len = (storage->raft
-                            ? raft_get_log_length(storage->raft)
-                            : storage->n_read + storage->n_written);
-        if (now < storage->next_snapshot_max) {
-            /* Maximum snapshot time not yet reached.  Take a snapshot if there
-             * have been at least 100 log entries and the log file size has
-             * grown a lot. */
-            bool grew_lots = (storage->raft
-                              ? raft_grew_lots(storage->raft)
-                              : ovsdb_log_grew_lots(storage->log));
-            return log_len >= 100 && grew_lots;
-        } else {
-            /* We have reached the maximum snapshot time.  Take a snapshot if
-             * there have been any log entries at all. */
-            return log_len > 0;
-        }
-    }
-
     return false;
 }
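
For anyone who wants to try this test-only change, one possible way to apply
it (a sketch; the patch file name is arbitrary, and this assumes building OVS
from a source checkout that already contains the raft ovsdb-server code):

# Apply the diff above on top of the OVS source tree and rebuild.
# Snapshotting is disabled entirely, so the raft log grows without bound:
# use this only for testing, never in production.
cd ovs
git apply disable-snapshots.patch   # hypothetical name for the diff above
./boot.sh && ./configure && make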