Message ID | 20170522013512.27817-1-nusiddiq@redhat.com |
---|---|
State | Accepted |

On Sun, May 21, 2017 at 6:35 PM, <nusiddiq@redhat.com> wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
>
> start action returns OCF_RUNNING_MASTER in certain scenarios.
> But as per the OCF guidelines, status code OCF_RUNNING_MASTER should
> be returned only in monitor action [1].
>
> Whenever the start action returns OCF_RUNNING_MASTER, it is observed
> in the testing that, pacemaker stops the ovsdb-server ocf resource
> in that node. This patch fixes this issue by returning OCF_SUCCESS in
> such cases.
>
> stop action returns OCF_RUNNING_MASTER if the ovsdb-servers are
> running as master. But as per the OCF guidelines [2], stop action
> should only return OCF_SUCCESS. If any other code is returned,
> pacemaker cluster would block that resource in that node.
>
> This patch fixes this issue by stopping the ovsdb-servers when they
> are running as masters (which is the expected case) and returns
> OCF_SUCCESS.
>
> [1] - http://www.linux-ha.org/doc/dev-guides/_literal_ocf_running_master_literal_8.html
> [2] - http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html
>
> CC: Andy Zhou <azhou@ovn.org>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>

Thanks for the fixes! Both patches look reasonable to me. I pushed
them to master.

On Tue, May 23, 2017 at 5:21 AM, Andy Zhou <azhou@ovn.org> wrote:
> On Sun, May 21, 2017 at 6:35 PM, <nusiddiq@redhat.com> wrote:
> > [...]
>
> Thanks for the fixes! Both patches look reasonable to me. I pushed
> them to master.

Thanks Andy. Can these patches be backported to branch 2.7? It would be
great, since the TripleO patches for OVN need these fixes.

Numan

On Mon, May 22, 2017 at 9:12 PM, Numan Siddique <nusiddiq@redhat.com> wrote:
> On Tue, May 23, 2017 at 5:21 AM, Andy Zhou <azhou@ovn.org> wrote:
>> [...]
>> Thanks for the fixes! Both patches look reasonable to me. I pushed
>> them to master.
>
> Thanks Andy. Can these patches be backported to branch 2.7? It would be
> great, since the TripleO patches for OVN need these fixes.
>
> Numan

Done. Thanks for the reminder.

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 908cb3c..ad03cc3 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -275,7 +275,18 @@ ovsdb_server_start() {
         rc=$?
         case $rc in
             $OCF_SUCCESS)        return $rc;;
-            $OCF_RUNNING_MASTER) return $rc;;
+            $OCF_RUNNING_MASTER)
+                # When a slave node is promoted as master, the action would be
+                # STOP -> START -> PROMOTE.
+                # When the start action is called, it is possible for the
+                # ovsdb-server's to be started as active. This could happen
+                # if the node owns the $MASTER_IP. At this point, pacemaker
+                # has not promoted this node yet. So return OCF_SUCCESS.
+                # Let pacemaker promote it in subsequent actions.
+                # As per the OCF guidelines, only monitor action should return
+                # OCF_RUNNING_MASTER.
+                # http://www.linux-ha.org/doc/dev-guides/_literal_ocf_running_master_literal_8.html
+                return $OCF_SUCCESS;;
             $OCF_ERR_GENERIC)    return $rc;;
             # Otherwise loop, waiting for the service to start, until
             # the cluster times the operation out
@@ -288,7 +299,6 @@ ovsdb_server_stop() {
     ovsdb_server_check_status
     case $? in
         $OCF_NOT_RUNNING)    return ${OCF_SUCCESS};;
-        $OCF_RUNNING_MASTER) return ${OCF_RUNNING_MASTER};;
     esac
 
     ${OVN_CTL} stop_ovsdb
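
The return-code mapping described in the commit message can be illustrated with a small standalone sketch. It is not part of the patch: the helper names `start_result` and `stop_plan` are hypothetical, and the numeric OCF codes (normally supplied by the OCF shell function library) are defined locally only so the snippet runs on its own. It assumes the status passed in is whatever the status check reported.

```sh
#!/bin/sh
# Standard OCF exit codes. In a real resource agent these come from the
# OCF shell function library; they are defined here only so that this
# sketch is self-contained.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

# Hypothetical helper (not in the patch): given the status observed during
# the start action, print the code start should return, or "retry" if the
# start action should keep waiting for the service.
start_result() {
    case $1 in
        "$OCF_SUCCESS")        echo "$OCF_SUCCESS";;
        # Server came up active but pacemaker has not promoted this node
        # yet: report plain success and leave promotion to pacemaker.
        "$OCF_RUNNING_MASTER") echo "$OCF_SUCCESS";;
        "$OCF_ERR_GENERIC")    echo "$OCF_ERR_GENERIC";;
        *)                     echo retry;;
    esac
}

# Hypothetical helper (not in the patch): given the status observed during
# the stop action, print what stop should do. Only "not running" skips the
# stop; a master is stopped like any other running server.
stop_plan() {
    case $1 in
        "$OCF_NOT_RUNNING") echo already-stopped;;
        *)                  echo stop-then-success;;
    esac
}

# Show the mapping for each status code of interest.
for status in "$OCF_SUCCESS" "$OCF_ERR_GENERIC" "$OCF_NOT_RUNNING" "$OCF_RUNNING_MASTER"; do
    echo "status=$status start=$(start_result "$status") stop=$(stop_plan "$status")"
done
```

Keeping OCF_RUNNING_MASTER out of both mappings mirrors the intent stated in the commit message: only the monitor action reports the master role, while start and stop report success or failure.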