
[ovs-dev] ovn: fix OVNDB process staying stopped when pacemaker demotes the master node to slave

Message ID 20161207054106.20828-1-ligs@dtdream.com
State Changes Requested

Commit Message

Guoshuai Li Dec. 7, 2016, 5:41 a.m. UTC
When the master node's OVNDB process fails, pacemaker demotes the local node to slave.
The failure is caused by the OVNDB process having stopped, so the process needs to be started again.
If demote returns $OCF_NOT_RUNNING, the node is not demoted to slave.

Signed-off-by: Guoshuai Li <ligs@dtdream.com>
---
 ovn/utilities/ovndb-servers.ocf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andy Zhou Dec. 7, 2016, 9:36 p.m. UTC | #1
On Tue, Dec 6, 2016 at 9:41 PM, Guoshuai Li <ligs@dtdream.com> wrote:

> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index 1cf6f20..8a64e88 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -283,7 +283,7 @@ ovsdb_server_promote() {
>  ovsdb_server_demote() {
>      ovsdb_server_check_status
>      if [ $? = $OCF_NOT_RUNNING ]; then
> -        return $OCF_NOT_RUNNING
> +        ovsdb_server_start
>

The logic here looks odd to me. The demote() operation should be performed
against running OVNDBs.

Why is OVNDB stopped in the first place? If it was stopped by an admin, it
would be odd for the OCF script to restart it.
Guoshuai Li Dec. 8, 2016, 7:42 a.m. UTC | #2
On 2016/12/8 5:36, Andy Zhou wrote:
> The logic here looks odd to me. demote() operation should be done 
> against running OVNDBs.
>
> Why is OVNDB stopped in the first place?  If they are stopped by 
> admin, it would be odd that ocf script
> would restart them.

I agree that demote() should not start OVN-DB.
But when the OVN-DB process crashes, what is supposed to restart it?

I killed the master node's OVSDB process with 'kill -9'. It did not migrate,
because of its dependency on the VIP, but it was not started again even after
a long time, leaving the cluster with no master node.

Full list of resources:
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     ovndb_servers      (ocf::ovn:ovndb-servers):       Started ovn2
     ovndb_servers      (ocf::ovn:ovndb-servers):       Started ovn3
     ovndb_servers      (ocf::ovn:ovndb-servers):       Stopped
     Slaves: [ ovn2 ovn3 ]
     Stopped: [ ovn1 ]
 VirtualIP      (ocf::heartbeat:IPaddr2):       Started ovn1

Failed Actions:
* ovndb_servers_demote_0 on ovn1 'not running' (7): call=21, status=complete, exitreason='none',
    last-rc-change='Thu Dec  8 13:41:14 2016', queued=0ms, exec=69ms

By debugging, I found that pacemaker did not call ovsdb_server_start(); it
called ovsdb_server_demote() and ovsdb_server_stop().
Who should start it: ovsdb_server_monitor(), or is this a pacemaker bug?
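
The failed action in the status output reports `'not running' (7)`. As a reading aid, here is a small shell helper that maps the standard OCF resource agent exit codes (as defined by the OCF resource agent API and `ocf-shellfuncs`) to their symbolic names. The function name `ocf_rc_name` is hypothetical and is not part of ovndb-servers.ocf.

```shell
#!/bin/sh
# Hypothetical helper: translate an OCF resource agent exit code into
# its symbolic name, e.g. to decode "'not running' (7)" in crm_mon output.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;      # the code the failed demote on ovn1 returned
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

ocf_rc_name 7    # prints OCF_NOT_RUNNING
```

Code 7 is exactly what the unpatched ovsdb_server_demote() returns when ovsdb_server_check_status reports that the server is not running.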
Andy Zhou Dec. 8, 2016, 9:29 p.m. UTC | #3
On Wed, Dec 7, 2016 at 11:42 PM, Guoshuai Li <ligs@dtdream.com> wrote:

> I agree that demote () should not start OVN-DB.
> But when the OVN-DB process crashes, who might restart it?
>

If OVN-DB crashes (usually with a SIGSEGV segmentation fault), it will be
restarted by the monitor process that the --monitor option sets up.
The ovsdb-server daemon does not consider kill -9 (SIGKILL) a crash; it is
treated as an intentional stop.
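
The distinction above can be illustrated from the shell: a parent sees a child killed by signal N as exit status 128 + N, so a supervisor can tell a segmentation fault from a `kill -9`. This is a simplified model of that decision, not the actual Open vSwitch monitor code (which is written in C). The function name `would_restart` is hypothetical, and `CRASH_SIGNALS` is an assumption modeled on the signal list documented for the OVS `--monitor` option; check it against the man page of your version.

```shell
#!/bin/sh
# Simplified model of a supervisor's restart decision; NOT the real OVS
# monitor. CRASH_SIGNALS is an assumption based on the signals documented
# for --monitor; verify it against your Open vSwitch version.
CRASH_SIGNALS="ABRT ALRM BUS FPE ILL PIPE SEGV XCPU XFSZ"

# would_restart STATUS: succeed if STATUS means the child died from a
# crash signal; fail for normal exits and for signals such as KILL.
would_restart() {
    status=$1
    # The shell reports "killed by signal N" as exit status 128 + N.
    if [ "$status" -le 128 ]; then
        return 1
    fi
    name=$(kill -l $((status - 128)))   # signal name without "SIG" prefix
    for sig in $CRASH_SIGNALS; do
        if [ "$name" = "$sig" ]; then
            return 0
        fi
    done
    return 1
}

# Simulate a crash (SIGSEGV) and an administrator's kill -9 (SIGKILL).
segv_status=0; sh -c 'kill -s SEGV $$' || segv_status=$?
kill_status=0; sh -c 'kill -s KILL $$' || kill_status=$?

would_restart "$segv_status" && echo "SEGV: restart"
would_restart "$kill_status" || echo "KILL: leave stopped"
```

Under this model, the `kill -9` test in the previous message exercises the "intentional stop" path, which is why nothing brought the master back.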


Patch

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 1cf6f20..8a64e88 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -283,7 +283,7 @@  ovsdb_server_promote() {
 ovsdb_server_demote() {
     ovsdb_server_check_status
     if [ $? = $OCF_NOT_RUNNING ]; then
-        return $OCF_NOT_RUNNING
+        ovsdb_server_start
     fi
 
     local present_master=$(ovsdb_server_find_active_master)
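
To make the behavioral change concrete, here is a self-contained model of the demote branch before and after the patch. `check_status_stub` and `start_stub` are hypothetical stand-ins for the real ovsdb_server_check_status and ovsdb_server_start, and the sketch stops right after the changed branch (the real function goes on to look up the active master). OCF_SUCCESS (0) and OCF_NOT_RUNNING (7) are the standard OCF codes.

```shell
#!/bin/sh
# Model of the branch changed by this patch in ovsdb_server_demote().
# check_status_stub and start_stub are hypothetical stand-ins for the
# real helpers in ovn/utilities/ovndb-servers.ocf.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

SERVER_RUNNING=no       # simulate a master whose ovsdb-server was killed

check_status_stub() {
    if [ "$SERVER_RUNNING" = yes ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

start_stub() {
    SERVER_RUNNING=yes
    return $OCF_SUCCESS
}

# Before the patch: demote fails with 7, which pacemaker records as a
# failed 'not running' demote.
demote_before() {
    check_status_stub
    if [ $? = $OCF_NOT_RUNNING ]; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS     # the real code continues with the demotion here
}

# After the patch: demote starts the server and carries on.
demote_after() {
    check_status_stub
    if [ $? = $OCF_NOT_RUNNING ]; then
        start_stub
    fi
    return $OCF_SUCCESS     # the real code continues with the demotion here
}
```

With `SERVER_RUNNING=no`, `demote_before` returns 7 and leaves the server stopped, while `demote_after` returns 0 with the server running. The model also shows the reviewer's concern: the restart happens inside demote regardless of why the server stopped, including when an administrator stopped it on purpose.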