
[ovs-dev] ovsdb: fix data loss when OVSDB replicates from itself

Message ID 20170103030950.8656-1-ligs@dtdream.com
State Superseded

Commit Message

Guoshuai Li Jan. 3, 2017, 3:09 a.m. UTC
Delete the local database only after receiving the master's data;
this is safer for the data.
This patch helps HA clusters that have no way to control the order
of resources, such as Kubernetes.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
---
 ovsdb/replication.c | 73 +++++++++++++++++++++++++----------------------------
 1 file changed, 35 insertions(+), 38 deletions(-)

Comments

Andy Zhou Jan. 31, 2017, 11:11 p.m. UTC | #1
On Mon, Jan 2, 2017 at 7:09 PM, Guoshuai Li <ligs@dtdream.com> wrote:
> Delete the local database only after receiving the master's data;
> this is safer for the data.
> This patch helps HA clusters that have no way to control the order
> of resources, such as Kubernetes.
>
> Signed-off-by: Guoshuai Li <ligs@dtdream.com>

Sorry for the delay in review.

Replication of OVSDB onto itself seems to be a configuration issue.
I don't see why such a configuration is ever useful in practice; it
should probably be blocked at a higher level.

Is there something special about K8 where OVSDB is expected to
replicate itself?
If so, please explain.
Guoshuai Li Feb. 1, 2017, 3:27 a.m. UTC | #2
This patch removes the timing dependency between IP migration and
OVSDB service promotion, which arises at master/slave switchover time
and may not be controllable by the user.
If this dependency must be handled at an upper level, not every
cluster manager can do so easily. For example, an OVSDB cluster built
using K8S has difficulty handling this dependency.
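
For illustration, a rough sketch of that window, with a placeholder
VIP of 10.0.0.100 and the standard ovsdb-server replication appctl
commands (the appctl target name is assumed to be the default):

    # Every backup replicates from the VIP rather than from a fixed peer:
    ovs-appctl -t ovsdb-server \
        ovsdb-server/set-active-ovsdb-server tcp:10.0.0.100:6641

    # On failover the VIP moves to the surviving node immediately, but
    # its promotion (stopping replication) only happens once the cluster
    # manager reacts:
    ovs-appctl -t ovsdb-server \
        ovsdb-server/disconnect-active-ovsdb-server

    # In the window between those two events, the new master replicates
    # from the VIP, that is, from itself.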
> Replication of OVSDB onto itself seems to be a configuration issue.
> I don't see why such a configuration is ever useful in practice; it
> should probably be blocked at a higher level.
>
> Is there something special about K8 where OVSDB is expected to
> replicate itself?
> If so, please explain.
Gurucharan Shetty Feb. 2, 2017, 4:50 p.m. UTC | #3
On 31 January 2017 at 19:27, Guoshuai Li <ligs@dtdream.com> wrote:

>
> This patch removes the timing dependency between IP migration and
> OVSDB service promotion, which arises at master/slave switchover time
> and may not be controllable by the user.
> If this dependency must be handled at an upper level, not every
> cluster manager can do so easily. For example, an OVSDB cluster built
> using K8S has difficulty handling this dependency.


Assume that I don't know anything about Kubernetes and OVSDB replication.
Can you clearly explain how you plan to use it with Kubernetes and what
it is that does not work right now?


>
>> Replication of OVSDB onto itself seems to be a configuration issue.
>> I don't see why such a configuration is ever useful in practice; it
>> should probably be blocked at a higher level.
>>
>> Is there something special about K8 where OVSDB is expected to
>> replicate itself?
>> If so, please explain.
克赛号0181 Feb. 7, 2017, 1:31 a.m. UTC | #4
In K8S, Pacemaker is not usually used; we use keepalived to implement
the VIP (the OVN service endpoint). When the master node fails, the VIP
switches before we notice the event, so the new master node connects to
itself for a short time (about 3-10 seconds).
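
A minimal keepalived sketch of such a VIP (interface, router id, and
addresses are illustrative, not taken from our real deployment):

    vrrp_instance ovn_vip {
        state BACKUP                 # every node starts as a backup
        interface eth0               # NIC that carries the OVN endpoint
        virtual_router_id 51
        priority 100                 # highest priority holds the VIP
        advert_int 1                 # advertisement interval in seconds
        virtual_ipaddress {
            10.0.0.100/24            # the OVN service endpoint VIP
        }
    }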




克赛号0181 Feb. 7, 2017, 1:35 a.m. UTC | #5
Additionally, we have 3 OVN nodes, which have 3 independent IPs and
1 VIP. All OVN nodes set the VIP as the active-ovsdb-server IP.
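
Concretely, something like the following runs on every node (addresses
and port are placeholders; the appctl target is assumed to be the
default):

    # node1 10.0.0.1, node2 10.0.0.2, node3 10.0.0.3, VIP 10.0.0.100
    ovs-appctl -t ovsdb-server \
        ovsdb-server/set-active-ovsdb-server tcp:10.0.0.100:6641
    ovs-appctl -t ovsdb-server \
        ovsdb-server/connect-active-ovsdb-server

Whichever node loses the VIP keeps replicating from whoever holds it.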


Andy Zhou Feb. 8, 2017, 5:31 a.m. UTC | #6
On Mon, Feb 6, 2017 at 5:31 PM, 克赛号0181 <dupf@dtdream.com> wrote:
> In K8S, Pacemaker is not usually used; we use keepalived to implement
> the VIP (the OVN service endpoint). When the master node fails, the
> VIP switches before we notice the event, so the new master node
> connects to itself for a short time (about 3-10 seconds).
>
Thanks for explaining. This is the first time we have heard of a use
case without Pacemaker.

Regardless of how it happens, I agree it is desirable to prevent data
loss when ovsdb-server tries to replicate itself.

The implementation of the patch is pretty clever, in that the reply to
the monitor request is used to preserve the database content in case
the server tries to replicate itself.  However, I see two potential
drawbacks to this approach.

1. In case the replication setup excludes tables or columns, the
   excluded data will be lost (see the sketch after this list).
2. In general, to guard against undesirable configurations, it is
   usually more desirable to reject them (and log errors) than to
   silently 'fix' the problem.
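
(For reference, table exclusion is configured at runtime with the
existing appctl command below; the OVN_Southbound:Encap pair is only an
illustration.  Rows in an excluded table exist only locally, so a reset
driven by the monitor reply cannot restore them.)

    ovs-appctl -t ovsdb-server \
        ovsdb-server/set-sync-exclude-tables OVN_Southbound:Encap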

I realize those drawbacks are not critical, and may not even be
relevant in some deployments, but it would be nice to avoid them.

Ben suggested a method by which an OVSDB server can detect
self-replication. I just posted a 3-patch series that tries to
implement this idea as an alternative solution.

https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328587.html

If interested, please take a look and provide any feedback you may have.

Thanks,

--andy

Patch

diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index 1c77b18..109ae98 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -252,56 +252,53 @@  replication_run(void)
                 }
                 ovsdb_schema_destroy(schema);
 
-                /* After receiving schemas, reset the local databases that
-                 * will be monitored and send out monitor requests for them. */
+                /* After receiving schemas, send out monitor requests for DB. */
                 if (hmap_is_empty(&request_ids)) {
-                    struct shash_node *node, *next;
-
-                    SHASH_FOR_EACH_SAFE (node, next, replication_dbs) {
+                    struct shash_node *node;
+                    SHASH_FOR_EACH (node, replication_dbs) {
                         db = node->data;
-                        struct ovsdb_error *error = reset_database(db);
-                        if (error) {
-                            const char *db_name = db->schema->name;
-                            shash_find_and_delete(replication_dbs, db_name);
-                            ovsdb_error_assert(error);
-                            VLOG_WARN("Failed to reset database, "
-                                      "%s not replicated.", db_name);
-                        }
-                    }
-
-                    if (shash_is_empty(replication_dbs)) {
-                        VLOG_WARN("Nothing to replicate.");
-                        state = RPL_S_ERR;
-                    } else {
-                        SHASH_FOR_EACH (node, replication_dbs) {
-                            db = node->data;
-                            struct ovsdb *db = node->data;
-                            struct jsonrpc_msg *request =
-                                create_monitor_request(db);
-
-                            request_ids_add(request->id, db);
-                            jsonrpc_session_send(session, request);
-                            VLOG_DBG("Send monitor requests");
-                            state = RPL_S_MONITOR_REQUESTED;
-                        }
+                        struct ovsdb *db = node->data;
+                        struct jsonrpc_msg *request =
+                            create_monitor_request(db);
+
+                        request_ids_add(request->id, db);
+                        jsonrpc_session_send(session, request);
+                        VLOG_DBG("Send monitor requests");
+                        state = RPL_S_MONITOR_REQUESTED;
                     }
                 }
                 break;
             }
 
             case RPL_S_MONITOR_REQUESTED: {
-                /* Reply to monitor requests. */
-                struct ovsdb_error *error;
-                error = process_notification(msg->result, db);
+                /* After receiving monitor response, reset the local database
+                 * that will be monitored and process message. */
+                struct ovsdb_error *error = reset_database(db);
                 if (error) {
+                    const char *db_name = db->schema->name;
+                    shash_find_and_delete(replication_dbs, db_name);
                     ovsdb_error_assert(error);
+                    VLOG_WARN("Failed to reset database, "
+                              "%s not replicated.", db_name);
+                }
+
+                if (shash_is_empty(replication_dbs)) {
+                    VLOG_WARN("Nothing to replicate.");
                     state = RPL_S_ERR;
                 } else {
-                    /* Transition to replicating state after receiving
-                     * all replies of "monitor" requests. */
-                    if (hmap_is_empty(&request_ids)) {
-                        VLOG_DBG("Listening to monitor updates");
-                        state = RPL_S_REPLICATING;
+                    /* Reply to monitor requests. */
+                    struct ovsdb_error *error;
+                    error = process_notification(msg->result, db);
+                    if (error) {
+                        ovsdb_error_assert(error);
+                        state = RPL_S_ERR;
+                    } else {
+                        /* Transition to replicating state after receiving
+                         * all replies of "monitor" requests. */
+                        if (hmap_is_empty(&request_ids)) {
+                            VLOG_DBG("Listening to monitor updates");
+                            state = RPL_S_REPLICATING;
+                        }
                     }
                 }
                 break;