[ovs-dev,v4] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command

Message ID CAAFK5zxZ2PPKbq9ythtRrGyWhXA47RyHktyGGC20UvVCLT9WRw@mail.gmail.com
State Accepted

Commit Message

Federico Paolinelli July 30, 2020, 10:41 a.m. UTC
In some cases the database ends up in an inconsistent state. This
happened in ovn-k8s and is described in [0].
This patch adds a supported way to check that a given db is consistent,
which is less error-prone than checking the logs.

Tested against both a valid db and the corrupted db attached to the
above bug [1]. Also tested with a fresh db that has not taken a snapshot.

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
[1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Suggested-by: Dumitru Ceara <dceara@redhat.com>
---
 ovsdb/ovsdb-tool.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

Comments

Dumitru Ceara July 31, 2020, 2:53 p.m. UTC | #1
On 7/30/20 12:41 PM, Federico Paolinelli wrote:
> There are some occurrences where the database ends up in an inconsistent
> state. This happened in ovn-k8s and is described in [0].
> Here we are adding a supported way to check that a given db is consistent,
> which is less error prone than checking the logs.
> 
> Tested against both a valid db and a corrupted db attached to the
> above bug [1]. Also, tested  with a fresh db that did not do a snapshot.
> 
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
> [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
> 
> Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
> Suggested-by: Dumitru Ceara <dceara@redhat.com>

Looks good to me, thanks!

Acked-by: Dumitru Ceara <dceara@redhat.com>

Federico Paolinelli Sept. 15, 2020, 12:12 p.m. UTC | #2
On Fri, Jul 31, 2020 at 4:53 PM Dumitru Ceara <dceara@redhat.com> wrote:

> On 7/30/20 12:41 PM, Federico Paolinelli wrote:
> > There are some occurrences where the database ends up in an inconsistent
> > state. This happened in ovn-k8s and is described in [0].
> > Here we are adding a supported way to check that a given db is consistent,
> > which is less error prone than checking the logs.
> >
> > Tested against both a valid db and a corrupted db attached to the
> > above bug [1]. Also, tested  with a fresh db that did not do a snapshot.
> >
> > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
> > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
> >
> > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
> > Suggested-by: Dumitru Ceara <dceara@redhat.com>
>
> Looks good to me, thanks!
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>
>
>
Not sure how to move this forward, pinging here :-)

Ilya Maximets Sept. 16, 2020, 1:58 p.m. UTC | #3
On 7/31/20 4:53 PM, Dumitru Ceara wrote:
> On 7/30/20 12:41 PM, Federico Paolinelli wrote:
>> There are some occurrences where the database ends up in an inconsistent
>> state. This happened in ovn-k8s and is described in [0].
>> Here we are adding a supported way to check that a given db is consistent,
>> which is less error prone than checking the logs.
>>
>> Tested against both a valid db and a corrupted db attached to the
>> above bug [1]. Also, tested  with a fresh db that did not do a snapshot.
>>
>> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
>> [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
>>
>> Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
>> Suggested-by: Dumitru Ceara <dceara@redhat.com>
> 
> Looks good to me, thanks!
> 
> Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks!

Applied to master.

Best regards, Ilya Maximets.
Patch

diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 91662cab8..30d0472b2 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -1497,6 +1497,44 @@  do_check_cluster(struct ovs_cmdl_context *ctx)
         }
     }

+    /* Check for db consistency:
+     * The serverid must be in the servers list.
+     */
+
+    for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) {
+        struct shash *servers_obj = json_object(s->snap->servers);
+        char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
+        bool found = false;
+        const struct shash_node *node;
+
+        SHASH_FOR_EACH (node, servers_obj) {
+            if (!strncmp(server_id, node->name, SID_LEN)) {
+                found = true;
+            }
+        }
+
+        if (!found) {
+            for (struct raft_entry *e = s->entries;
+                 e < &s->entries[s->log_end - s->log_start]; e++) {
+                if (e->servers == NULL) {
+                    continue;
+                }
+                struct shash *log_servers_obj = json_object(e->servers);
+                SHASH_FOR_EACH (node, log_servers_obj) {
+                    if (!strncmp(server_id, node->name, SID_LEN)) {
+                        found = true;
+                    }
+                }
+            }
+        }
+
+        if (!found) {
+            ovs_fatal(0, "%s: server %s not found in server list",
+                      s->filename, server_id);
+        }
+        free(server_id);
+    }
+
     /* Clean up. */

     for (size_t i = 0; i < c.n_servers; i++) {
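
The consistency rule the hunk above adds can be sketched outside the OVS tree as follows. This is a simplified model, not the code from the patch: `struct server_list`, `list_contains()` and `server_is_listed()` are hypothetical names introduced for illustration, server lists are modeled as plain string arrays instead of JSON objects, and IDs are compared in full with `strcmp()` where the patch compares a `SID_LEN` prefix with `strncmp()`. The order of the lookup follows the patch: the snapshot's server list first, then the server list of each log entry, skipping entries that carry none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one server list (from a snapshot or a log
 * entry).  'ids' is NULL when the entry carries no server list. */
struct server_list {
    const char *const *ids;
    size_t n_ids;
};

/* Returns true if 'sid' is one of the IDs in 'list'. */
bool
list_contains(const struct server_list *list, const char *sid)
{
    if (!list || !list->ids) {
        return false;           /* Nothing to search in. */
    }
    for (size_t i = 0; i < list->n_ids; i++) {
        if (!strcmp(list->ids[i], sid)) {
            return true;
        }
    }
    return false;
}

/* Returns true if 'sid' appears in the snapshot's server list or,
 * failing that, in the server list of any of the 'n_entries' log
 * entries. */
bool
server_is_listed(const char *sid, const struct server_list *snapshot,
                 const struct server_list *entries, size_t n_entries)
{
    assert(sid != NULL);

    if (list_contains(snapshot, sid)) {
        return true;
    }
    for (size_t i = 0; i < n_entries; i++) {
        if (list_contains(&entries[i], sid)) {
            return true;
        }
    }
    return false;
}
```

A caller would treat a `false` result the way the patch treats `!found`: report the file name and server ID and exit with an error.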