diff mbox series

[ovs-dev] ovsdb-server: Replace in-memory DB contents at raft install_snapshot.

Message ID 1596641273-5748-1-git-send-email-dceara@redhat.com
State Superseded
Headers show
Series [ovs-dev] ovsdb-server: Replace in-memory DB contents at raft install_snapshot. | expand

Commit Message

Dumitru Ceara Aug. 5, 2020, 3:27 p.m. UTC
Every time a follower has to install a snapshot received from the
leader, it should also replace the data in memory. Right now this only
happens when snapshots are installed that also change the schema.

This can lead to inconsistent DB data on follower nodes and the snapshot
may fail to get applied.

CC: Han Zhou <hzhou@ovn.org>
Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 ovsdb/ovsdb-server.c    | 21 +++++++++++++--------
 tests/idltest.ovsschema |  9 +++++++++
 tests/ovsdb-cluster.at  | 28 +++++++++++++++++++++++++---
 tests/ovsdb-idl.at      |  1 +
 4 files changed, 48 insertions(+), 11 deletions(-)

Comments

Han Zhou Aug. 5, 2020, 5:48 p.m. UTC | #1
On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> Every time a follower has to install a snapshot received from the
> leader, it should also replace the data in memory. Right now this only
> happens when snapshots are installed that also change the schema.
>
> This can lead to inconsistent DB data on follower nodes and the snapshot
> may fail to get applied.
>
> CC: Han Zhou <hzhou@ovn.org>
> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Thanks Dumitru! This is a great finding, and sorry for my mistake.
This patch looks good to me. Just one minor comment below on the test case.
Otherwise:

Acked-by: Han Zhou <hzhou@ovn.org>

> ---
>  ovsdb/ovsdb-server.c    | 21 +++++++++++++--------
>  tests/idltest.ovsschema |  9 +++++++++
>  tests/ovsdb-cluster.at  | 28 +++++++++++++++++++++++++---
>  tests/ovsdb-idl.at      |  1 +
>  4 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index ef4e996..fd7891a 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -543,13 +543,14 @@ parse_txn(struct server_config *config, struct db *db,
>            const struct ovsdb_schema *schema, const struct json *txn_json,
>            const struct uuid *txnid)
>  {
> -    if (schema && (!db->db->schema || strcmp(schema->version,
> -                                             db->db->schema->version))) {
> +    if (schema) {
>          /* We're replacing the schema (and the data).  Destroy the database
>           * (first grabbing its storage), then replace it with the new schema.
>           * The transaction must also include the replacement data.
>           *
> -         * Only clustered database schema changes go through this path. */
> +         * Only clustered database schema changes and snapshot installs
> +         * go through this path.
> +         */
>          ovs_assert(txn_json);
>          ovs_assert(ovsdb_storage_is_clustered(db->db->storage));
>
> @@ -559,11 +560,15 @@ parse_txn(struct server_config *config, struct db *db,
>              return error;
>          }
>
> -        ovsdb_jsonrpc_server_reconnect(
> -            config->jsonrpc, false,
> -            (db->db->schema
> -             ? xasprintf("database %s schema changed", db->db->name)
> -             : xasprintf("database %s connected to storage", db->db->name)));
> +        if (!db->db->schema ||
> +            strcmp(schema->version, db->db->schema->version)) {
> +            ovsdb_jsonrpc_server_reconnect(
> +                config->jsonrpc, false,
> +                (db->db->schema
> +                ? xasprintf("database %s schema changed", db->db->name)
> +                : xasprintf("database %s connected to storage",
> +                            db->db->name)));
> +        }
>
>          ovsdb_replace(db->db, ovsdb_create(ovsdb_schema_clone(schema), NULL));
>
> diff --git a/tests/idltest.ovsschema b/tests/idltest.ovsschema
> index e02b975..e04755e 100644
> --- a/tests/idltest.ovsschema
> +++ b/tests/idltest.ovsschema
> @@ -54,6 +54,15 @@
>        },
>        "isRoot" : true
>      },
> +    "indexed": {
> +      "columns": {
> +        "i": {
> +          "type": "integer"
> +        }
> +      },
> +      "indexes": [["i"]],
> +      "isRoot" : true
> +    },
>      "simple": {
>        "columns": {
>          "b": {
> diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> index 9714545..06d27c9 100644
> --- a/tests/ovsdb-cluster.at
> +++ b/tests/ovsdb-cluster.at
> @@ -332,15 +332,31 @@ for i in `seq $n`; do
>      AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
>  done
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "insert",
> +       "table": "indexed",
> +       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> +
> # Kill one follower (s2) and write some data to cluster, so that the follower is falling behind
>  printf "\ns2: stopping\n"
>  OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s2], [s2.pid])
>
> # Delete "i":1 and re-add it to get a different UUID for it.
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "delete",
> +       "table": "indexed",
> +       "where": [["i", "==", 1]]}]]'], [0], [ignore], [ignore])
> +
>  AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
>        {"op": "insert",
> -       "table": "simple",
> +       "table": "indexed",
>         "row": {"i": 1}}]]'], [0], [ignore], [ignore])
>
> +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
> +      {"op": "insert",
> +       "table": "indexed",
> +       "row": {"i": 2}}]]'], [0], [ignore], [ignore])
> +
>  # Compact leader online to generate snapshot
>  AT_CHECK([ovs-appctl -t "`pwd`"/s1 ovsdb-server/compact])
>
> @@ -355,8 +371,14 @@ AT_CHECK([ovsdb_client_wait unix:s2.ovsdb $schema_name connected])
>  # succeed.
>  AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest",
>        {"op": "insert",
> -       "table": "simple",
> -       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
> +       "table": "indexed",
> +       "row": {"i": 3}}]]'], [0], [ignore], [ignore])
> +
> +# The snapshot should overwrite the in-memory contents of the DB on S2
> +# without generating any constraint violations.
> +AT_CHECK([grep 'Transaction causes multiple rows in "indexed" table to have identical values (1) for index on column "i"' -c s2.log], [1], [dnl
> +0
> +])

Besides checking the error log, it may be better to simply check that the i=2
row is present on S2. Without this fix, i=2 shouldn't get inserted. This
ensures correctness more reliably, in case the log text changes and the grep
no longer detects the problem.
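A sketch of that suggestion (hypothetical wording; the actual v2 check may
differ): run a "select" operation against s2 for the i=2 row and assert that
exactly one row comes back, instead of grepping the log.

```m4
# Hypothetical check: the row inserted on s1 while s2 was down must be
# present on s2 once the snapshot is installed.
AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest",
      {"op": "select",
       "table": "indexed",
       "where": [["i", "==", 2]]}]]'], [0], [stdout], [ignore])
AT_CHECK([grep -c '"i":2' stdout], [0], [1
])
```

Matching on the select result avoids depending on the exact error-log wording,
which is the fragility the comment points out.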

>
>  for i in `seq $n`; do
>      OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 4efed88..789ae23 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -954,6 +954,7 @@ AT_CHECK([sort stdout | uuidfilt], [0],
>
> # Check that ovsdb-idl figured out that table link2 and column l2 are missing.
>  AT_CHECK([grep ovsdb_idl stderr | sort], [0], [dnl
> +test-ovsdb|ovsdb_idl|idltest database lacks indexed table (database needs upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks link2 table (database needs upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks simple5 table (database needs upgrade?)
>  test-ovsdb|ovsdb_idl|idltest database lacks singleton table (database needs upgrade?)
> --
> 1.8.3.1
>
Dumitru Ceara Aug. 5, 2020, 7:47 p.m. UTC | #2
On 8/5/20 7:48 PM, Han Zhou wrote:
> 
> 
> On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> Every time a follower has to install a snapshot received from the
>> leader, it should also replace the data in memory. Right now this only
>> happens when snapshots are installed that also change the schema.
>>
>> This can lead to inconsistent DB data on follower nodes and the snapshot
>> may fail to get applied.
>>
>> CC: Han Zhou <hzhou@ovn.org>
>> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> 
> Thanks Dumitru! This is a great finding, and sorry for my mistake.
> This patch looks good to me. Just one minor comment below on the test
> case. Otherwise:
> 
> Acked-by: Han Zhou <hzhou@ovn.org>
> 

Thanks Han for the review! I fixed the test case as you suggested and
sent v2.

I was wondering if this is also the root cause for the issue you
reported a while back during the OVN meeting. In my scenario, if a
follower ends up in this situation, and if the DB gets compacted online
afterwards, the DB file also becomes inconsistent and in some cases
(after the DB server is restarted) all write transactions from clients
are rejected with "ovsdb-error: inconsistent data".
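For illustration only (a hypothetical transcript; the socket name and the
transaction contents are assumed), the failure mode described above would
surface on the restarted follower roughly as:

```
$ ovsdb-client transact unix:s2.ovsdb \
    '[["idltest", {"op": "insert", "table": "indexed", "row": {"i": 4}}]]'
ovsdb-error: inconsistent data
```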

Related to that I also sent the following patch to make the ovsdb-server
storage state available via appctl commands:

https://patchwork.ozlabs.org/project/openvswitch/patch/1596467128-13004-1-git-send-email-dceara@redhat.com/

Regards,
Dumitru

Han Zhou Aug. 5, 2020, 9:58 p.m. UTC | #3
On Wed, Aug 5, 2020 at 12:48 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 8/5/20 7:48 PM, Han Zhou wrote:
> >
> > On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> Every time a follower has to install a snapshot received from the
> >> leader, it should also replace the data in memory. Right now this only
> >> happens when snapshots are installed that also change the schema.
> >>
> >> This can lead to inconsistent DB data on follower nodes and the snapshot
> >> may fail to get applied.
> >>
> >> CC: Han Zhou <hzhou@ovn.org>
> >> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >
> > Thanks Dumitru! This is a great finding, and sorry for my mistake.
> > This patch looks good to me. Just one minor comment below on the test
> > case. Otherwise:
> >
> > Acked-by: Han Zhou <hzhou@ovn.org>
> >
>
> Thanks Han for the review! I fixed the test case as you suggested and
> sent v2.
>
> I was wondering if this is also the root cause for the issue you
> reported a while back during the OVN meeting. In my scenario, if a
> follower ends up in this situation, and if the DB gets compacted online
> afterwards, the DB file also becomes inconsistent and in some cases
> (after the DB server is restarted) all write transactions from clients
> are rejected with "ovsdb-error: inconsistent data".
>
Yes, I believe it is the root cause. I thought this patch was exactly for
that issue. Is it also for something else?

> Related to that I also sent the following patch to make the ovsdb-server
> storage state available via appctl commands:
>
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/1596467128-13004-1-git-send-email-dceara@redhat.com/
>

I will take a look.

Thanks,
Han

> Regards,
> Dumitru
>
Dumitru Ceara Aug. 5, 2020, 10:04 p.m. UTC | #4
On 8/5/20 11:58 PM, Han Zhou wrote:
> 
> 
> On Wed, Aug 5, 2020 at 12:48 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 8/5/20 7:48 PM, Han Zhou wrote:
>> >
>> > On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <dceara@redhat.com> wrote:
>> >>
>> >> Every time a follower has to install a snapshot received from the
>> >> leader, it should also replace the data in memory. Right now this only
>> >> happens when snapshots are installed that also change the schema.
>> >>
>> >> This can lead to inconsistent DB data on follower nodes and the snapshot
>> >> may fail to get applied.
>> >>
>> >> CC: Han Zhou <hzhou@ovn.org>
>> >> Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.")
>> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>> >
>> > Thanks Dumitru! This is a great finding, and sorry for my mistake.
>> > This patch looks good to me. Just one minor comment below on the test
>> > case. Otherwise:
>> >
>> > Acked-by: Han Zhou <hzhou@ovn.org>
>> >
>>
>> Thanks Han for the review! I fixed the test case as you suggested and
>> sent v2.
>>
>> I was wondering if this is also the root cause for the issue you
>> reported a while back during the OVN meeting. In my scenario, if a
>> follower ends up in this situation, and if the DB gets compacted online
>> afterwards, the DB file also becomes inconsistent and in some cases
>> (after the DB server is restarted) all write transactions from clients
>> are rejected with "ovsdb-error: inconsistent data".
>>
> Yes, I believe it is the root cause. I thought this patch was exactly
> for that issue. Is it also for something else?
> 

This patch is for the issue I described above: inconsistent DB on a
follower, followed by online compacting of the DB, which corrupts the DB
file too. I wasn't sure if this was also what you were hitting in your
deployment; I just wanted to check if there are any other known
potential issues we need to investigate.

>> Related to that I also sent the following patch to make the ovsdb-server
>> storage state available via appctl commands:
>>
>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/1596467128-13004-1-git-send-email-dceara@redhat.com/
>>
> 
> I will take a look.
> 
> Thanks,
> Han
> 

Thanks!
Dumitru
Han Zhou Aug. 5, 2020, 10:09 p.m. UTC | #5
On Wed, Aug 5, 2020 at 3:04 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 8/5/20 11:58 PM, Han Zhou wrote:
> >
> >
> > On Wed, Aug 5, 2020 at 12:48 PM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >>
> >> Thanks Han for the review! I fixed the test case as you suggested and
> >> sent v2.
> >>
> >> I was wondering if this is also the root cause for the issue you
> >> reported a while back during the OVN meeting. In my scenario, if a
> >> follower ends up in this situation, and if the DB gets compacted online
> >> afterwards, the DB file also becomes inconsistent and in some cases
> >> (after the DB server is restarted) all write transactions from clients
> >> are rejected with "ovsdb-error: inconsistent data".
> >>
> > Yes, I believe it is the root cause. I thought this patch was exactly
> > for that issue. Is it also for something else?
> >
>
> This patch is for the issue I described above: inconsistent DB on
> follower followed by online compacting of the DB which corrupts the DB
> file too. I wasn't sure if this was also what you were hitting in your
> deployment, I just wanted to check if there are any other known
> potential issues we need to investigate.
>

OK, I think it should be the same issue. There are no other potential
issues related to "inconsistent data" discovered so far.

> >> Related to that I also sent the following patch to make the ovsdb-server
> >> storage state available via appctl commands:
> >>
> >>
> >
> >> https://patchwork.ozlabs.org/project/openvswitch/patch/1596467128-13004-1-git-send-email-dceara@redhat.com/
> >>
> >
> > I will take a look.
> >
> > Thanks,
> > Han
> >
>
> Thanks!
> Dumitru
>

Patch

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index ef4e996..fd7891a 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -543,13 +543,14 @@  parse_txn(struct server_config *config, struct db *db,
           const struct ovsdb_schema *schema, const struct json *txn_json,
           const struct uuid *txnid)
 {
-    if (schema && (!db->db->schema || strcmp(schema->version,
-                                             db->db->schema->version))) {
+    if (schema) {
         /* We're replacing the schema (and the data).  Destroy the database
          * (first grabbing its storage), then replace it with the new schema.
          * The transaction must also include the replacement data.
          *
-         * Only clustered database schema changes go through this path. */
+         * Only clustered database schema changes and snapshot installs
+         * go through this path.
+         */
         ovs_assert(txn_json);
         ovs_assert(ovsdb_storage_is_clustered(db->db->storage));
 
@@ -559,11 +560,15 @@  parse_txn(struct server_config *config, struct db *db,
             return error;
         }
 
-        ovsdb_jsonrpc_server_reconnect(
-            config->jsonrpc, false,
-            (db->db->schema
-             ? xasprintf("database %s schema changed", db->db->name)
-             : xasprintf("database %s connected to storage", db->db->name)));
+        if (!db->db->schema ||
+            strcmp(schema->version, db->db->schema->version)) {
+            ovsdb_jsonrpc_server_reconnect(
+                config->jsonrpc, false,
+                (db->db->schema
+                ? xasprintf("database %s schema changed", db->db->name)
+                : xasprintf("database %s connected to storage",
+                            db->db->name)));
+        }
 
         ovsdb_replace(db->db, ovsdb_create(ovsdb_schema_clone(schema), NULL));
 
diff --git a/tests/idltest.ovsschema b/tests/idltest.ovsschema
index e02b975..e04755e 100644
--- a/tests/idltest.ovsschema
+++ b/tests/idltest.ovsschema
@@ -54,6 +54,15 @@ 
       },
       "isRoot" : true
     },
+    "indexed": {
+      "columns": {
+        "i": {
+          "type": "integer"
+        }
+      },
+      "indexes": [["i"]],
+      "isRoot" : true
+    },
     "simple": {
       "columns": {
         "b": {
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 9714545..06d27c9 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -332,15 +332,31 @@  for i in `seq $n`; do
     AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
 done
 
+AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
+      {"op": "insert",
+       "table": "indexed",
+       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
+
 # Kill one follower (s2) and write some data to cluster, so that the follower is falling behind
 printf "\ns2: stopping\n"
 OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s2], [s2.pid])
 
+# Delete "i":1 and re-add it to get a different UUID for it.
+AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
+      {"op": "delete",
+       "table": "indexed",
+       "where": [["i", "==", 1]]}]]'], [0], [ignore], [ignore])
+
 AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
       {"op": "insert",
-       "table": "simple",
+       "table": "indexed",
        "row": {"i": 1}}]]'], [0], [ignore], [ignore])
 
+AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest",
+      {"op": "insert",
+       "table": "indexed",
+       "row": {"i": 2}}]]'], [0], [ignore], [ignore])
+
 # Compact leader online to generate snapshot
 AT_CHECK([ovs-appctl -t "`pwd`"/s1 ovsdb-server/compact])
 
@@ -355,8 +371,14 @@  AT_CHECK([ovsdb_client_wait unix:s2.ovsdb $schema_name connected])
 # succeed.
 AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest",
       {"op": "insert",
-       "table": "simple",
-       "row": {"i": 1}}]]'], [0], [ignore], [ignore])
+       "table": "indexed",
+       "row": {"i": 3}}]]'], [0], [ignore], [ignore])
+
+# The snapshot should overwrite the in-memory contents of the DB on S2
+# without generating any constraint violations.
+AT_CHECK([grep 'Transaction causes multiple rows in "indexed" table to have identical values (1) for index on column "i"' -c s2.log], [1], [dnl
+0
+])
 
 for i in `seq $n`; do
     OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid])
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 4efed88..789ae23 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -954,6 +954,7 @@  AT_CHECK([sort stdout | uuidfilt], [0],
 
 # Check that ovsdb-idl figured out that table link2 and column l2 are missing.
 AT_CHECK([grep ovsdb_idl stderr | sort], [0], [dnl
+test-ovsdb|ovsdb_idl|idltest database lacks indexed table (database needs upgrade?)
 test-ovsdb|ovsdb_idl|idltest database lacks link2 table (database needs upgrade?)
 test-ovsdb|ovsdb_idl|idltest database lacks simple5 table (database needs upgrade?)
 test-ovsdb|ovsdb_idl|idltest database lacks singleton table (database needs upgrade?)