[ovs-dev,v2,ovn] Make pid_exists() more robust against empty pid argument

Message ID 20190814154857.15884-1-michele@acksyn.org
State Not Applicable

Commit Message

Michele Baldessari Aug. 14, 2019, 3:48 p.m. UTC
In some of our destructive testing of ovn-dbs inside containers managed
by pacemaker we reached a situation where /var/run/openvswitch had
empty .pid files. The current code does not handle them: with an empty
pid, pid_exists() tests /proc/ itself, which always exists, so
pidfile_is_running() returns true and this confuses the OCF resource
agent.

- Before this change:
Inside a container run:
  killall ovsdb-server;
  echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

The cluster is then never able to recover, because it believes the ovn
processes are running when they really aren't, and the resource
eventually just fails:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
   ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Stopped controller-1
   ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2

Let's make sure pid_exists() returns false when the pid is an empty
string.
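
The difference can be reproduced outside of OVS with a minimal sketch.
The names pid_exists_old and pid_exists_new are hypothetical and only
used here to show the two variants side by side; the function bodies
are the removed and added lines of this patch. It assumes a Linux
system with /proc mounted.

```sh
#!/bin/sh
# Hypothetical names, for illustration only; not part of ovs-lib.

# Body before this patch: with an empty argument the path collapses
# to "/proc/", which is always a directory.
pid_exists_old () {
    test -d /proc/"$1"
}

# Body after this patch: reject the empty string before probing /proc.
pid_exists_new () {
    test -n "$1" && test -d /proc/"$1"
}

# An empty .pid file yields an empty pid string.
pid=""

if pid_exists_old "$pid"; then
    echo "old check: empty pid reported as running"
else
    echo "old check: empty pid reported as not running"
fi

if pid_exists_new "$pid"; then
    echo "new check: empty pid reported as running"
else
    echo "new check: empty pid reported as not running"
fi

# A real pid (the current shell) should still be found by the new check.
if pid_exists_new "$$"; then
    echo "new check: pid $$ reported as running"
fi
```

With /proc available, the old check should report the empty pid as
running and the new check should report it as not running, while a
real pid is still detected.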

- After this change the cluster is able to recover from this state and
correctly start the resource:
 podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
   ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Slave controller-1
   ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2

Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")

Signed-off-by: Michele Baldessari <michele@acksyn.org>
---
v1 -> v2
========
- Implemented Ilya's suggestion and moved the check from
  pidfile_is_running() to pid_exists() and re-ran my tests
---
 utilities/ovs-lib.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniel Alvarez Sanchez Aug. 20, 2019, 8:34 a.m. UTC | #1
On Wed, Aug 14, 2019 at 5:49 PM Michele Baldessari <michele@acksyn.org> wrote:
>
> In some of our destructive testing of ovn-dbs inside containers managed
> by pacemaker we reached a situation where /var/run/openvswitch had
> empty .pid files. The current code does not deal well with them
> and pidfile_is_running() returns true in such a case and this confuses
> the OCF resource agent.
>
> - Before this change:
> Inside a container run:
>   killall ovsdb-server;
>   echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid
>
> We will observe that the cluster is unable to ever recover because
> it believes the ovn processes to be running when they really aren't and
> eventually just fails:
>  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>    ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
>    ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Stopped controller-1
>    ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2
>
> Let's make sure pid_exists() returns false when the pid is an empty
> string.
>
> - After this change the cluster is able to recover from this state and
> correctly start the resource:
>  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>    ovn-dbs-bundle-0     (ocf::ovn:ovndb-servers):       Master controller-0
>    ovn-dbs-bundle-1     (ocf::ovn:ovndb-servers):       Slave controller-1
>    ovn-dbs-bundle-2     (ocf::ovn:ovndb-servers):       Slave controller-2
>
> Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")
>
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
> ---
> v1 -> v2
> ========
> - Implemented Ilya's suggestion and moved the check from
>   pidfile_is_running() to pid_exists() and re-run my tests
> ---
>  utilities/ovs-lib.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index fa840ec637f5..dc485413ef0c 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -127,7 +127,7 @@ fi
>  pid_exists () {
>      # This is better than "kill -0" because it doesn't require permission to
>      # send a signal (so daemon_status in particular works as non-root).
> -    test -d /proc/"$1"
> +    test -n "$1" && test -d /proc/"$1"
>  }
>
>  pid_comm_check () {
> --
> 2.21.0

Acked-by: Daniel Alvarez <dalvarez@redhat.com>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Patch

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index fa840ec637f5..dc485413ef0c 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -127,7 +127,7 @@  fi
 pid_exists () {
     # This is better than "kill -0" because it doesn't require permission to
     # send a signal (so daemon_status in particular works as non-root).
-    test -d /proc/"$1"
+    test -n "$1" && test -d /proc/"$1"
 }
 
 pid_comm_check () {