Message ID | CAAFK5zwuJ1GjfNgNObmsqHF3bQ9tTe2phVwbCHjHpkeyf0pmVA@mail.gmail.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,v3] ovsdb-tool: Add a db consistency check to the ovsdb-tool check-cluster command | expand |
On 7/10/20 10:18 AM, Federico Paolinelli wrote: > There are some occurrences where the database ends up in an inconsistent > state. This happened in ovn-k8s and is described in [0]. > Here we are adding a supported way to check that a given db is consistent, > which is less error prone than checking the logs. > > Tested against both a valid db and a corrupted db attached to the > above bug [1]. > > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 > > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> > Suggested-by: Dumitru Ceara <dceara@redhat.com> > --- > ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c > index 91662cab8..016a3ba28 100644 > --- a/ovsdb/ovsdb-tool.c > +++ b/ovsdb/ovsdb-tool.c > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) > } > } > > + /* Check for db consistency: > + * The serverid must be in the servers list. > + */ > + > + for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) { > + struct shash *servers_obj = json_object(s->snap->servers); > + char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid)); > + bool found = false; > + const struct shash_node *node; > + > + SHASH_FOR_EACH (node, servers_obj) { > + if (!strncmp(server_id, node->name, SID_LEN)) { > + found = true; > + } > + } > + if (!found) { > + ovs_fatal(0, "%s: server %s not found in server list", > + s->filename, server_id); > + } > + free(server_id); > + } > + > /* Clean up. */ > > for (size_t i = 0; i < c.n_servers; i++) { > Looks good to me, thanks! Acked-by: Dumitru Ceara <dceara@redhat.com>
On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com> wrote: > On 7/10/20 10:18 AM, Federico Paolinelli wrote: > > There are some occurrences where the database ends up in an inconsistent > > state. This happened in ovn-k8s and is described in [0]. > > Here we are adding a supported way to check that a given db is > consistent, > > which is less error prone than checking the logs. > > > > Tested against both a valid db and a corrupted db attached to the > > above bug [1]. > > > > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 > > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 > > > > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> > > Suggested-by: Dumitru Ceara <dceara@redhat.com> > > --- > > ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c > > index 91662cab8..016a3ba28 100644 > > --- a/ovsdb/ovsdb-tool.c > > +++ b/ovsdb/ovsdb-tool.c > > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) > > } > > } > > > > + /* Check for db consistency: > > + * The serverid must be in the servers list. > > + */ > > + > > + for (struct server *s = c.servers; s < &c.servers[c.n_servers]; > s++) { > > + struct shash *servers_obj = json_object(s->snap->servers); > > + char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid)); > > + bool found = false; > > + const struct shash_node *node; > > + > > + SHASH_FOR_EACH (node, servers_obj) { > > + if (!strncmp(server_id, node->name, SID_LEN)) { > > + found = true; > > + } > > + } > > + if (!found) { > > + ovs_fatal(0, "%s: server %s not found in server list", > > + s->filename, server_id); > > + } > > + free(server_id); > > + } > > + > > /* Clean up. */ > > > > for (size_t i = 0; i < c.n_servers; i++) { > > > > Looks good to me, thanks! > > Acked-by: Dumitru Ceara <dceara@redhat.com> > > Question: does it make sense to add a unit test that validates the behaviour of this new function? And if so, what would be the best place to add it? Thanks!
On 7/15/20 4:22 PM, Federico Paolinelli wrote: > > > On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com > <mailto:dceara@redhat.com>> wrote: > > On 7/10/20 10:18 AM, Federico Paolinelli wrote: > > There are some occurrences where the database ends up in an > inconsistent > > state. This happened in ovn-k8s and is described in [0]. > > Here we are adding a supported way to check that a given db is > consistent, > > which is less error prone than checking the logs. > > > > Tested against both a valid db and a corrupted db attached to the > > above bug [1]. > > > > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 > > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 > > > > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com > <mailto:fpaoline@redhat.com>> > > Suggested-by: Dumitru Ceara <dceara@redhat.com > <mailto:dceara@redhat.com>> > > --- > > ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c > > index 91662cab8..016a3ba28 100644 > > --- a/ovsdb/ovsdb-tool.c > > +++ b/ovsdb/ovsdb-tool.c > > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) > > } > > } > > > > + /* Check for db consistency: > > + * The serverid must be in the servers list. > > + */ > > + > > + for (struct server *s = c.servers; s < > &c.servers[c.n_servers]; s++) { > > + struct shash *servers_obj = json_object(s->snap->servers); > > + char *server_id = xasprintf(SID_FMT, > SID_ARGS(&s->header.sid)); > > + bool found = false; > > + const struct shash_node *node; > > + > > + SHASH_FOR_EACH (node, servers_obj) { > > + if (!strncmp(server_id, node->name, SID_LEN)) { > > + found = true; > > + } > > + } > > + if (!found) { > > + ovs_fatal(0, "%s: server %s not found in server list", > > + s->filename, server_id); > > + } > > + free(server_id); > > + } > > + > > /* Clean up. */ > > > > for (size_t i = 0; i < c.n_servers; i++) { > > > > Looks good to me, thanks! > > Acked-by: Dumitru Ceara <dceara@redhat.com <mailto:dceara@redhat.com>> > > > Question: does it make sense to add a unit test that validates the > behaviour > of this new function? And if so, what would be the best place to add it? > > Thanks! > I might be wrong but I don't see an easy way to generate a corrupted database to run "ovsdb-tool check-cluster" against. Thanks, Dumitru
On 7/15/20 4:51 PM, Dumitru Ceara wrote: > On 7/15/20 4:22 PM, Federico Paolinelli wrote: >> >> >> On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dceara@redhat.com >> <mailto:dceara@redhat.com>> wrote: >> >> On 7/10/20 10:18 AM, Federico Paolinelli wrote: >> > There are some occurrences where the database ends up in an >> inconsistent >> > state. This happened in ovn-k8s and is described in [0]. >> > Here we are adding a supported way to check that a given db is >> consistent, >> > which is less error prone than checking the logs. >> > >> > Tested against both a valid db and a corrupted db attached to the >> > above bug [1]. >> > >> > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 >> > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 >> > >> > Signed-off-by: Federico Paolinelli <fpaoline@redhat.com >> <mailto:fpaoline@redhat.com>> >> > Suggested-by: Dumitru Ceara <dceara@redhat.com >> <mailto:dceara@redhat.com>> >> > --- >> > ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ >> > 1 file changed, 22 insertions(+) >> > >> > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c >> > index 91662cab8..016a3ba28 100644 >> > --- a/ovsdb/ovsdb-tool.c >> > +++ b/ovsdb/ovsdb-tool.c >> > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) >> > } >> > } >> > >> > + /* Check for db consistency: >> > + * The serverid must be in the servers list. >> > + */ >> > + >> > + for (struct server *s = c.servers; s < >> &c.servers[c.n_servers]; s++) { >> > + struct shash *servers_obj = json_object(s->snap->servers); >> > + char *server_id = xasprintf(SID_FMT, >> SID_ARGS(&s->header.sid)); >> > + bool found = false; >> > + const struct shash_node *node; >> > + >> > + SHASH_FOR_EACH (node, servers_obj) { >> > + if (!strncmp(server_id, node->name, SID_LEN)) { >> > + found = true; >> > + } >> > + } >> > + if (!found) { >> > + ovs_fatal(0, "%s: server %s not found in server list", >> > + s->filename, server_id); >> > + } >> > + free(server_id); >> > + } >> > + >> > /* Clean up. */ >> > >> > for (size_t i = 0; i < c.n_servers; i++) { >> > >> >> Looks good to me, thanks! >> >> Acked-by: Dumitru Ceara <dceara@redhat.com <mailto:dceara@redhat.com>> >> >> >> Question: does it make sense to add a unit test that validates the >> behaviour >> of this new function? And if so, what would be the best place to add it? >> >> Thanks! >> > > I might be wrong but I don't see an easy way to generate a corrupted > database to run "ovsdb-tool check-cluster" against. > > Thanks, > Dumitru > Hi Federico, As discussed offline, we probably need a new revision of this patch to check the s->entries[*].servers too and fail the check only when the server is not in the raft header and also not in the log entries. Regards, Dumitru
diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c index 91662cab8..016a3ba28 100644 --- a/ovsdb/ovsdb-tool.c +++ b/ovsdb/ovsdb-tool.c @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) } } + /* Check for db consistency: + * The serverid must be in the servers list. + */ + + for (struct server *s = c.servers; s < &c.servers[c.n_servers]; s++) { + struct shash *servers_obj = json_object(s->snap->servers); + char *server_id = xasprintf(SID_FMT, SID_ARGS(&s->header.sid)); + bool found = false; + const struct shash_node *node; + + SHASH_FOR_EACH (node, servers_obj) { + if (!strncmp(server_id, node->name, SID_LEN)) { + found = true; + } + } + if (!found) { + ovs_fatal(0, "%s: server %s not found in server list", + s->filename, server_id); + } + free(server_id); + } + /* Clean up. */ for (size_t i = 0; i < c.n_servers; i++) {
There are some occurrences where the database ends up in an inconsistent state. This happened in ovn-k8s and is described in [0]. Here we are adding a supported way to check that a given db is consistent, which is less error prone than checking the logs. Tested against both a valid db and a corrupted db attached to the above bug [1]. [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 Signed-off-by: Federico Paolinelli <fpaoline@redhat.com> Suggested-by: Dumitru Ceara <dceara@redhat.com> --- ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)