diff mbox series

[ovs-dev,RFC] ovsdb-server: Fix the possible data loss in an active/standby setup - APPROACH 1

Message ID 20180903054840.6212-1-nusiddiq@redhat.com
State Superseded
Headers show
Series [ovs-dev,RFC] ovsdb-server: Fix the possible data loss in an active/standby setup - APPROACH 1 | expand

Commit Message

Numan Siddique Sept. 3, 2018, 5:48 a.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

The present code resets the database when it is in the state -
'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it
receives the monitor reply when it is in the state -
'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode
before it processes the monitor reply, the whole data is lost.

This patch fixes the issue by resetting the database when it
receives the monitor reply (before processing it). so that
reset and repopulation of the db happens in the same state.

This approach is simpler, but it has a small window for data loss
if the function process_notification() which processes the monitor
reply fails for some reason.

An alternate approach to solve this issue is to store the result of
the process_notification() in memory (using unbacked 'struct ovsdb *'
object), reset the database and then repopulate the db from the
memory.

Reported-by: Han Zhou <zhouhan@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---
 ovsdb/replication.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)
diff mbox series

Patch

diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index 2b9ae2f83..3c37ce3d8 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -299,19 +299,7 @@  replication_run(void)
                 /* After receiving schemas, reset the local databases that
                  * will be monitored and send out monitor requests for them. */
                 if (hmap_is_empty(&request_ids)) {
-                    struct shash_node *node, *next;
-
-                    SHASH_FOR_EACH_SAFE (node, next, replication_dbs) {
-                        db = node->data;
-                        error = reset_database(db);
-                        if (error) {
-                            const char *db_name = db->schema->name;
-                            shash_find_and_delete(replication_dbs, db_name);
-                            ovsdb_error_assert(error);
-                            VLOG_WARN("Failed to reset database, "
-                                      "%s not replicated.", db_name);
-                        }
-                    }
+                    struct shash_node *node;
 
                     if (shash_is_empty(replication_dbs)) {
                         VLOG_WARN("Nothing to replicate.");
@@ -335,7 +323,10 @@  replication_run(void)
             case RPL_S_MONITOR_REQUESTED: {
                 /* Reply to monitor requests. */
                 struct ovsdb_error *error;
-                error = process_notification(msg->result, db);
+                error = reset_database(db);
+                if (!error) {
+                    error = process_notification(msg->result, db);
+                }
                 if (error) {
                     ovsdb_error_assert(error);
                     state = RPL_S_ERR;