Message ID | 20190814154707.15023-1-michele@acksyn.org |
---|---|
State | Accepted |
Series | [ovs-dev,v2] Make pid_exists() more robust against empty pid argument |
On 14.08.2019 18:47, Michele Baldessari wrote:
> In some of our destructive testing of ovn-dbs inside containers managed
> by pacemaker we reached a situation where /var/run/openvswitch had
> empty .pid files. The current code does not deal well with them
> and pidfile_is_running() returns true in such a case and this confuses
> the OCF resource agent.
>
> - Before this change:
> Inside a container run:
>   killall ovsdb-server;
>   echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

What about whitespaces?
I mean, if you'll write ' ' instead of '', the check 'test -n "$1"'
will succeed and the test will fail.
To handle this case we need to trim off whitespaces by the 'tr' utility
or change the proc checker to something like 'test -f /proc/"$1"/status'.

What do you think?

>
> We will observe that the cluster is unable to ever recover because
> it believes the ovn processes to be running when they really aren't and
> eventually just fails:
>   podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>    ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
>    ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
>    ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> Let's make sure pid_exists() returns false when the pid is an empty
> string.
>
> - After this change the cluster is able to recover from this state and
> correctly start the resource:
>   podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>    ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
>    ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
>    ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")
>
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
> ---
> v1 -> v2
> ========
> - Implemented Ilya's suggestion and moved the check from
>   pidfile_is_running() to pid_exists() and re-run my tests
> ---
>  utilities/ovs-lib.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index fa840ec637f5..dc485413ef0c 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -127,7 +127,7 @@ fi
>  pid_exists () {
>      # This is better than "kill -0" because it doesn't require permission to
>      # send a signal (so daemon_status in particular works as non-root).
> -    test -d /proc/"$1"
> +    test -n "$1" && test -d /proc/"$1"
>  }
>
>  pid_comm_check () {
>
On Tue, Aug 27, 2019 at 02:43:21PM +0300, Ilya Maximets wrote:
> On 14.08.2019 18:47, Michele Baldessari wrote:
> > In some of our destructive testing of ovn-dbs inside containers managed
> > by pacemaker we reached a situation where /var/run/openvswitch had
> > empty .pid files. The current code does not deal well with them
> > and pidfile_is_running() returns true in such a case and this confuses
> > the OCF resource agent.
> >
> > - Before this change:
> > Inside a container run:
> >   killall ovsdb-server;
> >   echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid
>
> What about whitespaces?
> I mean, if you'll write ' ' instead of '', the check 'test -n "$1"'
> will succeed and the test will fail.

I think that is OK, because test -d "/proc/ " will also fail.

> To handle this case we need to trim off whitespaces by the 'tr' utility
> or change the proc checker to something like 'test -f /proc/"$1"/status'.

I guess to be absolutely certain we'd need something like

    case $1 in
        '') false ;;            # Reject empty string
        *[!0-9]*) false ;;      # Reject anything with non-digits
        *) test -d /proc/$1 ;;
    esac

Anyway, I think that this is an improvement, so I applied it to master
and backported it.
On 29.08.2019 17:46, Ben Pfaff wrote:
> On Tue, Aug 27, 2019 at 02:43:21PM +0300, Ilya Maximets wrote:
>> On 14.08.2019 18:47, Michele Baldessari wrote:
>>> In some of our destructive testing of ovn-dbs inside containers managed
>>> by pacemaker we reached a situation where /var/run/openvswitch had
>>> empty .pid files. The current code does not deal well with them
>>> and pidfile_is_running() returns true in such a case and this confuses
>>> the OCF resource agent.
>>>
>>> - Before this change:
>>> Inside a container run:
>>>   killall ovsdb-server;
>>>   echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid
>>
>> What about whitespaces?
>> I mean, if you'll write ' ' instead of '', the check 'test -n "$1"'
>> will succeed and the test will fail.
>
> I think that is OK, because test -d "/proc/ " will also fail.

Oh, I see. Good point.

>
>> To handle this case we need to trim off whitespaces by the 'tr' utility
>> or change the proc checker to something like 'test -f /proc/"$1"/status'.
>
> I guess to be absolutely certain we'd need something like
>
>     case $1 in
>         '') false ;;            # Reject empty string
>         *[!0-9]*) false ;;      # Reject anything with non-digits
>         *) test -d /proc/$1 ;;
>     esac
>
> Anyway, I think that this is an improvement, so I applied it to master
> and backported it.

Agree. Thanks!

Best regards, Ilya Maximets.
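For reference, the stricter check floated in the thread could be wrapped into a function roughly as follows. This is only a sketch: the name pid_exists_strict is hypothetical and not part of ovs-lib, and the applied patch uses the simpler 'test -n "$1"' form instead. It assumes a Linux-style /proc.

```shell
# Hypothetical stricter variant of pid_exists(), following the case
# statement sketched in the thread. The name pid_exists_strict is an
# assumption for illustration; it is not part of ovs-lib.
pid_exists_strict () {
    case $1 in
        '') return 1 ;;           # Reject empty string
        *[!0-9]*) return 1 ;;     # Reject anything with non-digits
        *) test -d /proc/"$1" ;;  # Digits only: look the pid up in /proc
    esac
}

# Try an empty value, whitespace, trailing whitespace, garbage, and the
# pid of the current shell.
for candidate in '' ' ' '12 ' 'abc' "$$"; do
    if pid_exists_strict "$candidate"; then
        echo "'$candidate': exists"
    else
        echo "'$candidate': rejected or not running"
    fi
done
```

The non-digit pattern also rejects values containing '/' or '.', so a malformed pidfile cannot make the check resolve to some other directory under /proc.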
diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index fa840ec637f5..dc485413ef0c 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -127,7 +127,7 @@ fi
 pid_exists () {
     # This is better than "kill -0" because it doesn't require permission to
     # send a signal (so daemon_status in particular works as non-root).
-    test -d /proc/"$1"
+    test -n "$1" && test -d /proc/"$1"
 }

 pid_comm_check () {
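To see why the one-line change matters, the two versions of the check can be compared side by side. In the sketch below the function names pid_exists_old and pid_exists_new are made up for the comparison (ovs-lib only has pid_exists), and a Linux system with /proc mounted is assumed. With an empty argument the old path collapses to "/proc/", which is itself a directory, so the old check reports a running process.

```shell
# Pre-patch behavior: an empty argument turns the path into "/proc/",
# which is a directory wherever procfs is mounted, so the check succeeds.
pid_exists_old () {
    test -d /proc/"$1"
}

# Post-patch behavior: reject the empty string before looking at /proc.
pid_exists_new () {
    test -n "$1" && test -d /proc/"$1"
}

pid=""   # what reading an empty pidfile yields

if pid_exists_old "$pid"; then
    echo "old check: process considered running"
else
    echo "old check: process considered not running"
fi

if pid_exists_new "$pid"; then
    echo "new check: process considered running"
else
    echo "new check: process considered not running"
fi
```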
In some of our destructive testing of ovn-dbs inside containers managed
by pacemaker we reached a situation where /var/run/openvswitch had
empty .pid files. The current code does not deal well with them
and pidfile_is_running() returns true in such a case and this confuses
the OCF resource agent.

- Before this change:
Inside a container run:
  killall ovsdb-server;
  echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

We will observe that the cluster is unable to ever recover because
it believes the ovn processes to be running when they really aren't and
eventually just fails:
  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

Let's make sure pid_exists() returns false when the pid is an empty
string.

- After this change the cluster is able to recover from this state and
correctly start the resource:
  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")

Signed-off-by: Michele Baldessari <michele@acksyn.org>
---
v1 -> v2
========
- Implemented Ilya's suggestion and moved the check from
  pidfile_is_running() to pid_exists() and re-run my tests
---
 utilities/ovs-lib.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)