diff mbox series

[ovs-dev,v3] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command

Message ID CAAFK5zwuJ1GjfNgNObmsqHF3bQ9tTe2phVwbCHjHpkeyf0pmVA@mail.gmail.com
State Superseded
Headers show
Series [ovs-dev,v3] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command | expand

Commit Message

Federico Paolinelli July 10, 2020, 8:18 a.m. UTC
There are some occurrences where the database ends up in an inconsistent
state. This happened in ovn-k8s and is described in [0].
Here we are adding a supported way to check that a given db is consistent,
which is less error prone than checking the logs.

Tested against both a valid db and a corrupted db attached to the
above bug [1].

[0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
[1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Suggested-by: Dumitru Ceara <dceara@redhat.com>
---
 ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Dumitru Ceara July 14, 2020, 8:22 a.m. UTC | #1
On 7/10/20 10:18 AM, Federico Paolinelli wrote:
> There are some occurrences where the database ends up in an inconsistent
> state. This happened in ovn-k8s and is described in [0].
> Here we are adding a supported way to check that a given db is consistent,
> which is less error prone than checking the logs.
> 
> Tested against both a valid db and a corrupted db attached to the
> above bug [1].
> 
> [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
> [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
> 
> Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
> Suggested-by: Dumitru Ceara <dceara@redhat.com>
> ---
>  ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
> index 91662cab8..016a3ba28 100644
> --- a/ovsdb/ovsdb-tool.c
> +++ b/ovsdb/ovsdb-tool.c
> @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
>          }
>      }
> 
> +    /* Check for db consistency:
> +     * The serverid must be in the servers list.
> +     */
> +
> +    for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) {
> +        struct shash *servers_obj = json_object(s->snap->servers);
> +        char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
> +        bool found = false;
> +        const struct shash_node *node;
> +
> +        SHASH_FOR_EACH (node, servers_obj) {
> +            if (!strncmp(server_id, node->name, SID_LEN)) {
> +                found = true;
> +            }
> +        }
> +        if (!found) {
> +            ovs_fatal(0, "%s: server %s not found in server list",
> +                      s->filename, server_id);
> +        }
> +        free(server_id);
> +    }
> +
>      /* Clean up. */
> 
>      for (size_t i = 0; i < c.n_servers; i++) {
> 

Looks good to me, thanks!

Acked-by: Dumitru Ceara <dceara@redhat.com>
Federico Paolinelli July 15, 2020, 2:22 p.m. UTC | #2
On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com> wrote:

> On 7/10/20 10:18 AM, Federico Paolinelli wrote:
> > There are some occurrences where the database ends up in an inconsistent
> > state. This happened in ovn-k8s and is described in [0].
> > Here we are adding a supported way to check that a given db is
> consistent,
> > which is less error prone than checking the logs.
> >
> > Tested against both a valid db and a corrupted db attached to the
> > above bug [1].
> >
> > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
> > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
> >
> > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
> > Suggested-by: Dumitru Ceara <dceara@redhat.com>
> > ---
> >  ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
> > index 91662cab8..016a3ba28 100644
> > --- a/ovsdb/ovsdb-tool.c
> > +++ b/ovsdb/ovsdb-tool.c
> > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
> >          }
> >      }
> >
> > +    /* Check for db consistency:
> > +     * The serverid must be in the servers list.
> > +     */
> > +
> > +    for (struct server *s = c.servers; s < &c.servers[c.n_servers];
> s++) {
> > +        struct shash *servers_obj = json_object(s->snap->servers);
> > +        char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
> > +        bool found = false;
> > +        const struct shash_node *node;
> > +
> > +        SHASH_FOR_EACH (node, servers_obj) {
> > +            if (!strncmp(server_id, node->name, SID_LEN)) {
> > +                found = true;
> > +            }
> > +        }
> > +        if (!found) {
> > +            ovs_fatal(0, "%s: server %s not found in server list",
> > +                      s->filename, server_id);
> > +        }
> > +        free(server_id);
> > +    }
> > +
> >      /* Clean up. */
> >
> >      for (size_t i = 0; i < c.n_servers; i++) {
> >
>
> Looks good to me, thanks!
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>
>
>
Question: does it make sense to add a unit test that validates the
behaviour
of this new function? And if so, what would be the best place to add it?

Thanks!
Dumitru Ceara July 15, 2020, 2:51 p.m. UTC | #3
On 7/15/20 4:22 PM, Federico Paolinelli wrote:
> 
> 
> On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com
> <mailto:dceara@redhat.com>> wrote:
> 
>     On 7/10/20 10:18 AM, Federico Paolinelli wrote:
>     > There are some occurrences where the database ends up in an
>     inconsistent
>     > state. This happened in ovn-k8s and is described in [0].
>     > Here we are adding a supported way to check that a given db is
>     consistent,
>     > which is less error prone than checking the logs.
>     >
>     > Tested against both a valid db and a corrupted db attached to the
>     > above bug [1].
>     >
>     > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
>     > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
>     >
>     > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com
>     <mailto:fpaoline@redhat.com>>
>     > Suggested-by: Dumitru Ceara <dceara@redhat.com
>     <mailto:dceara@redhat.com>>
>     > ---
>     >  ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++
>     >  1 file changed, 22 insertions(+)
>     >
>     > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
>     > index 91662cab8..016a3ba28 100644
>     > --- a/ovsdb/ovsdb-tool.c
>     > +++ b/ovsdb/ovsdb-tool.c
>     > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
>     >          }
>     >      }
>     >
>     > +    /* Check for db consistency:
>     > +     * The serverid must be in the servers list.
>     > +     */
>     > +
>     > +    for (struct server *s = c.servers; s <
>     &c.servers[c.n_servers]; s++) {
>     > +        struct shash *servers_obj = json_object(s->snap->servers);
>     > +        char *server_id = xasprintf(SID_FMT,
>     SID_ARGS(&s->header.sid));
>     > +        bool found = false;
>     > +        const struct shash_node *node;
>     > +
>     > +        SHASH_FOR_EACH (node, servers_obj) {
>     > +            if (!strncmp(server_id, node->name, SID_LEN)) {
>     > +                found = true;
>     > +            }
>     > +        }
>     > +        if (!found) {
>     > +            ovs_fatal(0, "%s: server %s not found in server list",
>     > +                      s->filename, server_id);
>     > +        }
>     > +        free(server_id);
>     > +    }
>     > +
>     >      /* Clean up. */
>     >
>     >      for (size_t i = 0; i < c.n_servers; i++) {
>     >
> 
>     Looks good to me, thanks!
> 
>     Acked-by: Dumitru Ceara <dceara@redhat.com <mailto:dceara@redhat.com>>
> 
> 
> Question: does it make sense to add a unit test that validates the
> behaviour
> of this new function? And if so, what would be the best place to add it?
> 
> Thanks!
>  

I might be wrong but I don't see an easy way to generate a corrupted
database to run "ovsdb-tool check-cluster" against.

Thanks,
Dumitru
Dumitru Ceara July 28, 2020, 2:46 p.m. UTC | #4
On 7/15/20 4:51 PM, Dumitru Ceara wrote:
> On 7/15/20 4:22 PM, Federico Paolinelli wrote:
>>
>>
>> On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com
>> <mailto:dceara@redhat.com>> wrote:
>>
>>     On 7/10/20 10:18 AM, Federico Paolinelli wrote:
>>     > There are some occurrences where the database ends up in an
>>     inconsistent
>>     > state. This happened in ovn-k8s and is described in [0].
>>     > Here we are adding a supported way to check that a given db is
>>     consistent,
>>     > which is less error prone than checking the logs.
>>     >
>>     > Tested against both a valid db and a corrupted db attached to the
>>     > above bug [1].
>>     >
>>     > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23
>>     > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595
>>     >
>>     > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com
>>     <mailto:fpaoline@redhat.com>>
>>     > Suggested-by: Dumitru Ceara <dceara@redhat.com
>>     <mailto:dceara@redhat.com>>
>>     > ---
>>     >  ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++
>>     >  1 file changed, 22 insertions(+)
>>     >
>>     > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
>>     > index 91662cab8..016a3ba28 100644
>>     > --- a/ovsdb/ovsdb-tool.c
>>     > +++ b/ovsdb/ovsdb-tool.c
>>     > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx)
>>     >          }
>>     >      }
>>     >
>>     > +    /* Check for db consistency:
>>     > +     * The serverid must be in the servers list.
>>     > +     */
>>     > +
>>     > +    for (struct server *s = c.servers; s <
>>     &c.servers[c.n_servers]; s++) {
>>     > +        struct shash *servers_obj = json_object(s->snap->servers);
>>     > +        char *server_id = xasprintf(SID_FMT,
>>     SID_ARGS(&s->header.sid));
>>     > +        bool found = false;
>>     > +        const struct shash_node *node;
>>     > +
>>     > +        SHASH_FOR_EACH (node, servers_obj) {
>>     > +            if (!strncmp(server_id, node->name, SID_LEN)) {
>>     > +                found = true;
>>     > +            }
>>     > +        }
>>     > +        if (!found) {
>>     > +            ovs_fatal(0, "%s: server %s not found in server list",
>>     > +                      s->filename, server_id);
>>     > +        }
>>     > +        free(server_id);
>>     > +    }
>>     > +
>>     >      /* Clean up. */
>>     >
>>     >      for (size_t i = 0; i < c.n_servers; i++) {
>>     >
>>
>>     Looks good to me, thanks!
>>
>>     Acked-by: Dumitru Ceara <dceara@redhat.com <mailto:dceara@redhat.com>>
>>
>>
>> Question: does it make sense to add a unit test that validates the
>> behaviour
>> of this new function? And if so, what would be the best place to add it?
>>
>> Thanks!
>>  
> 
> I might be wrong but I don't see an easy way to generate a corrupted
> database to run "ovsdb-tool check-cluster" against.
> 
> Thanks,
> Dumitru
> 

Hi Federico,

As discussed offline, we probably need a new revision of this patch to
check the s->entries[*].servers too and fail the check only when the
server is not in the raft header and also not in the log entries.

Regards,
Dumitru
diff mbox series

Patch

diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 91662cab8..016a3ba28 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -1497,6 +1497,28 @@  do_check_cluster(struct ovs_cmdl_context *ctx)
         }
     }

+    /* Check for db consistency:
+     * The serverid must be in the servers list.
+     */
+
+    for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) {
+        struct shash *servers_obj = json_object(s->snap->servers);
+        char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid));
+        bool found = false;
+        const struct shash_node *node;
+
+        SHASH_FOR_EACH (node, servers_obj) {
+            if (!strncmp(server_id, node->name, SID_LEN)) {
+                found = true;
+            }
+        }
+        if (!found) {
+            ovs_fatal(0, "%s: server %s not found in server list",
+                      s->filename, server_id);
+        }
+        free(server_id);
+    }
+
     /* Clean up. */

     for (size_t i = 0; i < c.n_servers; i++) {