[ovs-dev,1/4] docs: OVSDB replication design document

Message ID 1466811845-32387-1-git-send-email-blp@ovn.org
State Accepted

Commit Message

Ben Pfaff June 24, 2016, 11:44 p.m. UTC
From: Mario Cabrera <mario.cabrera@hpe.com>

The database replication functionality is designed to provide "fail
over" characteristics. There are two participating databases, one of
which is the "active" database and the other is the "stand by" database.
Replication happens exclusively from the active to the stand by
database.

This document explains how the replication functionality is implemented.

Signed-off-by: Mario Cabrera <mario.cabrera@hpe.com>
---
 Documentation/OVSDB-replication.md | 123 +++++++++++++++++++++++++++++++++++++
 Documentation/automake.mk          |   3 +-
 2 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/OVSDB-replication.md

Patch

diff --git a/Documentation/OVSDB-replication.md b/Documentation/OVSDB-replication.md
new file mode 100644
index 0000000..4a4eb5e
--- /dev/null
+++ b/Documentation/OVSDB-replication.md
@@ -0,0 +1,123 @@ 
+OVSDB replication implementation
+--------------------------------
+
+Overview
+========
+Given two Open vSwitch databases that have the same schema, OVSDB replication
+consists of keeping these databases in the same state as one another, i.e.,
+both databases have the same contents at any given time, even if they are not
+running on the same host. This document elaborates on the implementation
+details that provide this functionality.
+
+Terminology
+===========
+-   Source of truth database: database whose content will be replicated to
+another database.
+-   Active server: ovsdb-server providing the RPC interface to the source of
+truth database.
+-   Standby server: ovsdb-server providing the RPC interface to the database
+that is not the source of truth.
+
+Design
+======
+The overall design of replication consists of one ovsdb-server (the active
+server) communicating the state of its databases to another ovsdb-server (the
+standby server) so that the latter keeps its own databases in that same state.
+To achieve this, the standby server acts as a client of the active server, in
+the sense that it sends a monitor request to keep up to date with the changes
+in the active server's databases. When a notification from the active server
+arrives, the standby server executes the necessary set of operations so that
+its databases reach the same state as the active server's databases. Below is
+the design represented as a diagram.
+
+    +--------------+    replication     +--------------+
+    |    Active    |<-------------------|   Standby    |
+    | OVSDB-server |                    | OVSDB-server |
+    +--------------+                    +--------------+
+            |                                  |
+            |                                  |
+        +-------+                          +-------+
+        |  SoT  |                          |       |
+        | OVSDB |                          | OVSDB |
+        +-------+                          +-------+
+
+Setting up the replication
+==========================
+To initiate the replication process, the standby server must be started with
+the location of the active server given via the command line option
+"--sync-from=server", where server can take any form described in the
+ovsdb-client manpage and must specify an active connection type (tcp, unix,
+ssl). This option causes the standby server to attempt to send a monitor
+request to the active server in every main loop iteration, until the active
+server responds.
+
+When sending a monitor request, the standby server does the following:
+
+1. Erase the content of the databases for which it is providing an RPC interface.
+2. Open the jsonrpc channel to communicate with the active server.
+3. Fetch all the databases located in the active server.
+4. For each database with the same schema in both the active and standby
+servers: construct and send a monitor request message specifying the tables
+that will be monitored (i.e., all the tables in the database except the ones
+blacklisted*).
+5. Set the standby database to the current state of the active database.
+
+Once the monitor request message is sent, the standby server will continuously
+receive notifications of changes occurring to the tables specified in the
+request. The process of handling these notifications is detailed in the next
+section.
+
+*A set of tables to exclude from replication can be configured as a blacklist
+via the command line option "--sync-exclude-tables=db:table[,db:table]...",
+where db corresponds to the database where the table resides.
+
+Replication process
+===================
+The replication process consists of handling the update notifications that the
+standby server receives as a result of the monitor request previously sent to
+the active server. In every loop iteration, the standby server attempts to
+receive a message from the active server, which can be an error, an echo
+message (used to keep the connection alive), or an update notification. If
+the message is a fatal error, the standby server disconnects from the active
+server without dropping the replicated data. If it is an echo message, the
+standby server replies with an echo message as well. If the message is an
+update notification, the following process occurs:
+
+1. Create a new transaction.
+2. Get the \<table-updates\> object from the "params" member of the
+   notification.
+3. For each \<table-update\> in the \<table-updates\> object do:
+    1. For each \<row-update\> in \<table-update\>, check what kind of
+       operation should be executed according to the following criteria about
+       the presence of the object members:
+        - If the "old" member is not present, execute an insert operation
+          using \<row\> from the "new" member.
+        - If the "old" member is present and the "new" member is not present,
+          execute a delete operation using \<row\> from the "old" member.
+        - If both the "old" and "new" members are present, execute an update
+          operation using \<row\> from the "new" member.
+4. Commit the transaction.
+
+If an error occurs during the replication process, replication is restarted
+from scratch by sending a new monitor request, as described in the section
+"Setting up the replication".
+
+Runtime management commands
+===========================
+Runtime management commands can be sent to a running standby server via
+ovs-appctl in order to configure the replication functionality. The available
+commands are the following.
+
+-   ovsdb-server/set-remote-ovsdb-server {server}: sets the name of the active
+    server.
+-   ovsdb-server/get-remote-ovsdb-server: gets the name of the active server.
+-   ovsdb-server/connect-remote-ovsdb-server: causes the server to attempt to
+    send a monitor request in every main loop iteration.
+-   ovsdb-server/disconnect-remote-ovsdb-server: closes the jsonrpc channel
+    to the active server and frees the memory used for the replication
+    configuration.
+-   ovsdb-server/set-sync-excluded-tables {db:table,...}: sets the list of
+    tables that will be excluded from being replicated.
+-   ovsdb-server/get-sync-excluded-tables: gets the list of tables that is
+    currently excluded from replication.
+
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 5903c22..aae41d2 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -1,4 +1,5 @@ 
 docs += \
 	Documentation/committer-responsibilities.md \
 	Documentation/committer-grant-revocation.md \
-	Documentation/group-selection-method-property.txt
+	Documentation/group-selection-method-property.txt \
+	Documentation/OVSDB-replication.md
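
The "Replication process" section of OVSDB-replication.md in the patch above picks an operation for each row-update based on which of "old" and "new" is present. The following is a minimal Python sketch of that rule only, not the ovsdb-server implementation (which is written in C). The names classify_row_update and apply_table_updates are hypothetical, the database is modeled as a plain dict of table name to {uuid: row}, and rejecting a row-update that has neither member is an assumption the document does not state.

```python
def classify_row_update(row_update):
    """Return 'insert', 'delete' or 'update' from the members present."""
    has_old = "old" in row_update
    has_new = "new" in row_update
    if not has_old:
        if not has_new:
            # Assumption: the design document does not cover this case.
            raise ValueError("row-update has neither 'old' nor 'new'")
        return "insert"
    if not has_new:
        return "delete"
    return "update"


def apply_table_updates(db, table_updates):
    """Apply one <table-updates> object and return the new database state.

    Changes are staged on a copy, so an exception leaves `db` untouched.
    This only loosely imitates the create/commit transaction steps.
    """
    staged = {table: dict(rows) for table, rows in db.items()}
    for table, table_update in table_updates.items():
        rows = staged.setdefault(table, {})
        for uuid, row_update in table_update.items():
            operation = classify_row_update(row_update)
            if operation == "delete":
                rows.pop(uuid, None)
            else:
                # Insert and update both take <row> from the "new" member.
                rows[uuid] = dict(row_update["new"])
    return staged


# Illustrative data only; real row UUIDs and columns look different.
db = {"Bridge": {"u1": {"name": "br0"}, "u2": {"name": "br1"}}}
table_updates = {
    "Bridge": {
        "u1": {"old": {"name": "br0"}, "new": {"name": "br-int"}},
        "u2": {"old": {"name": "br1"}},
        "u3": {"new": {"name": "br2"}},
    }
}
replicated = apply_table_updates(db, table_updates)
```

Staging the changes on a copy is one simple way to get all-or-nothing behaviour in a sketch; the document itself only says that a transaction is created before the updates are processed and committed afterwards.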