[ovs-dev,1/2] ovn pacemaker: Fix return code errors in start/stop action

Message ID 20170522013512.27817-1-nusiddiq@redhat.com
State Accepted

Commit Message

Numan Siddique May 22, 2017, 1:35 a.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

The start action returns OCF_RUNNING_MASTER in certain scenarios.
But as per the OCF guidelines, the status code OCF_RUNNING_MASTER should
be returned only by the monitor action [1].

Testing shows that whenever the start action returns OCF_RUNNING_MASTER,
pacemaker stops the ovsdb-server OCF resource on that node. This patch
fixes the issue by returning OCF_SUCCESS in such cases.

The stop action returns OCF_RUNNING_MASTER if the ovsdb-servers are
running as master. But as per the OCF guidelines [2], the stop action
should only return OCF_SUCCESS. If any other code is returned, the
pacemaker cluster blocks that resource on that node.

This patch fixes the issue by stopping the ovsdb-servers when they
are running as masters (which is the expected case) and returning
OCF_SUCCESS.

[1] - http://www.linux-ha.org/doc/dev-guides/_literal_ocf_running_master_literal_8.html
[2] - http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html

CC: Andy Zhou <azhou@ovn.org>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---
 ovn/utilities/ovndb-servers.ocf | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
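
The start-action change described in the commit message can be sketched as a small status-to-return-code mapping. This is only an illustration, not the code in ovndb-servers.ocf: the helper name start_rc_for_status is hypothetical, and returning OCF_NOT_RUNNING for "keep waiting" is an assumption made so the sketch is self-contained (the real script loops instead). The numeric values are the standard OCF return codes.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

# Hypothetical helper (not part of ovndb-servers.ocf): map the status code
# observed during a start action to the code that start should return.
start_rc_for_status() {
    case $1 in
        "$OCF_SUCCESS")        return "$OCF_SUCCESS";;
        # Servers came up active (e.g. this node owns the master IP).
        # Report plain success and leave the promotion to pacemaker.
        "$OCF_RUNNING_MASTER") return "$OCF_SUCCESS";;
        "$OCF_ERR_GENERIC")    return "$OCF_ERR_GENERIC";;
        # Assumption for this sketch: any other status means "not up yet",
        # so the caller would keep polling.
        *)                     return "$OCF_NOT_RUNNING";;
    esac
}
```

With this mapping, a start that finds the servers already active reports OCF_SUCCESS, so only the monitor action is left to report OCF_RUNNING_MASTER.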

Comments

Andy Zhou May 22, 2017, 11:51 p.m. UTC | #1
On Sun, May 21, 2017 at 6:35 PM,  <nusiddiq@redhat.com> wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
>
> start action returns OCF_RUNNING_MASTER in certain scenarios.
> But as per the OCF guidelines, status code OCF_RUNNING_MASTER should
> be returned only in monitor action [1].
>
> Whenever the start action returns OCF_RUNNING_MASTER, it is observed
> in the testing that, pacemaker stops the ovsdb-server ocf resource
> in that node. This patch fixes this issue by returning OCF_SUCCESS in
> such cases.
>
> stop action returns OCF_RUNNING_MASTER if the ovsdb-servers are
> running as master. But as per the OCF guidelines [2], stop action
> should only return OCF_SUCCESS. If any other code is returned,
> pacemaker cluster would block that resource in that node.
>
> This patch fixes this issue by stopping the ovsdb-servers when they
> are running as masters (which is the expected case) and returns
> OCF_SUCCESS.
>
> [1] - http://www.linux-ha.org/doc/dev-guides/_literal_ocf_running_master_literal_8.html
> [2] - http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html
>
> CC: Andy Zhou <azhou@ovn.org>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>

Thanks for the fixes!  Both patches look reasonable to me. I pushed
them to master.
Numan Siddique May 23, 2017, 4:12 a.m. UTC | #2
On Tue, May 23, 2017 at 5:21 AM, Andy Zhou <azhou@ovn.org> wrote:

> Thanks for the fixes!  Both patches look reasonable to me. I pushed
> them to master.
>

Thanks Andy. Can these patches be backported to branch 2.7? It would be
great, since the tripleo patches for OVN need these fixes.

Numan
Andy Zhou May 23, 2017, 6:39 p.m. UTC | #3
On Mon, May 22, 2017 at 9:12 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
>
>
> Thanks Andy. Can these patches be backported to branch 2.7? It would be
> great, since the tripleo patches for OVN need these fixes.
>
> Numan

Done. Thanks for the reminder.

Patch

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 908cb3c..ad03cc3 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -275,7 +275,18 @@  ovsdb_server_start() {
         rc=$?
         case $rc in
             $OCF_SUCCESS)        return $rc;;
-            $OCF_RUNNING_MASTER) return $rc;;
+            $OCF_RUNNING_MASTER)
+                # When a slave node is promoted as master, the action would be
+                # STOP -> START -> PROMOTE.
+                # When the start action is called, it is possible for the
+                # ovsdb-servers to be started as active. This could happen
+                # if the node owns the $MASTER_IP. At this point, pacemaker
+                # has not promoted this node yet. So return OCF_SUCCESS.
+                # Let pacemaker promote it in subsequent actions.
+                # As per the OCF guidelines, only monitor action should return
+                # OCF_RUNNING_MASTER.
+                # http://www.linux-ha.org/doc/dev-guides/_literal_ocf_running_master_literal_8.html
+                return $OCF_SUCCESS;;
             $OCF_ERR_GENERIC)    return $rc;;
             # Otherwise loop, waiting for the service to start, until
             # the cluster times the operation out
@@ -288,7 +299,6 @@  ovsdb_server_stop() {
     ovsdb_server_check_status
     case $? in
         $OCF_NOT_RUNNING)    return ${OCF_SUCCESS};;
-        $OCF_RUNNING_MASTER) return ${OCF_RUNNING_MASTER};;
     esac
 
     ${OVN_CTL} stop_ovsdb