Message ID | 20190814083913.9273-1-michele@acksyn.org |
---|---|
State | Superseded |
Series | [ovs-dev] Make pidfile_is_running more robust against empty pidfiles |
Bleep bloop. Greetings Michele Baldessari, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting. See the details below.

checkpatch:
WARNING: Line is 88 characters long (recommended limit is 79)
#46 FILE: ovn/utilities/ovn-ctl:38:
    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"

Lines checked: 52, Warnings: 1, Errors: 0

Please check this out. If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
On 14.08.2019 11:39, Michele Baldessari wrote:
> In some of our destructive testing of ovn-dbs inside containers managed
> by pacemaker we reached a situation where /var/run/openvswitch had
> empty .pid files. The current code does not deal well with them
> and pidfile_is_running() returns true in such a case and this confuses
> the OCF resource agent.
>
> - Before this change:
> Inside a container run:
>   killall ovsdb-server;
>   echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid
>
> We will observe that the cluster is unable to ever recover because
> it believes the ovn processes to be running when they really aren't and
> eventually just fails:
>   podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>     ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
>     ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
>     ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> - After this change the cluster is able to recover from this state and
> correctly start the resource:
>   podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
>     ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
>     ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
>     ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2
>
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
> ---
>  ovn/utilities/ovn-ctl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ovn/utilities/ovn-ctl b/ovn/utilities/ovn-ctl
> index 7e5cd469c83c..65f03e28ddba 100755
> --- a/ovn/utilities/ovn-ctl
> +++ b/ovn/utilities/ovn-ctl
> @@ -35,7 +35,7 @@ ovn_northd_db_conf_file="$etcdir/ovn-northd-db-params.conf"
>
>  pidfile_is_running () {
>      pidfile=$1
> -    test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
> +    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"

Hi. Thanks for the fix!

Maybe it's better to add additional check for an empty argument to
'pid_exists' function instead? This will cover more cases like invocations
from the utilities/ovs-lib.in.

I think, you may also add following tag to commit-message in this case:
Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")

This patch also will be needed in ovn-org/ovn repository too.
(Use 'PATCH ovn' subject prefix while sending patches targeted for ovn repo.)

Best regards, Ilya Maximets.
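The suggestion above (guard 'pid_exists' itself against an empty argument) could look roughly like the sketch below. It assumes that pid_exists tests for a /proc/<pid> directory, which this thread does not show; the names pid_exists_unguarded and pid_exists_guarded are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for an unguarded check (assumed /proc-based).
# With an empty argument it ends up testing /proc/ itself.
pid_exists_unguarded () {
    test -d /proc/"$1"
}

# Guarded variant: reject an empty argument before looking at /proc.
pid_exists_guarded () {
    test -n "$1" && test -d /proc/"$1"
}

if pid_exists_unguarded ""; then
    unguarded_empty=running
else
    unguarded_empty=not-running
fi
if pid_exists_guarded ""; then
    guarded_empty=running
else
    guarded_empty=not-running
fi

echo "unguarded, empty pid: $unguarded_empty"
echo "guarded, empty pid:   $guarded_empty"
```

Placing the guard in the helper means every caller benefits, not only pidfile_is_running in ovn-ctl, which is the point made in the review comment.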
On Wed, Aug 14, 2019 at 02:28:13PM +0300, Ilya Maximets wrote:
> Hi. Thanks for the fix!
>
> Maybe it's better to add additional check for an empty argument to
> 'pid_exists' function instead? This will cover more cases like invocations
> from the utilities/ovs-lib.in.
>
> I think, you may also add following tag to commit-message in this case:
> Fixes: 3028ce2595c8 ("ovs-lib: Allow "status" command to work as non-root.")
>
> This patch also will be needed in ovn-org/ovn repository too.
> (Use 'PATCH ovn' subject prefix while sending patches targeted for ovn repo.)
>
> Best regards, Ilya Maximets.

Thanks for the feedback Ilya, I have amended things (hopefully correctly) in
http://patchwork.ozlabs.org/patch/1147111/ (I could not figure out how
to update an existing patch in patchwork, I hope this is okay)

Kind regards,
Michele
On Wed, Aug 14, 2019 at 9:21 PM Michele Baldessari <michele@acksyn.org> wrote:
> Thanks for the feedback Ilya, I have amended things (hopefully correctly) in
> http://patchwork.ozlabs.org/patch/1147111/ (I could not figure out how
> to update an existing patch in patchwork, I hope this is okay)

Hi Michele and Ilya,

I applied this fix to the OVN repo. It's possible that the fix to address this issue
in ovs-lib.in could be missing in some deployments if older ovs version is used.
I thought its no harm in having the fixes in both ovn-ctl and ovs-lib.in.

Thanks
Numan

> Kind regards,
> Michele
> --
> Michele Baldessari <michele@acksyn.org>
> C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D
On 20.08.2019 11:48, Numan Siddique wrote:
> Hi Michele and Ilya,
>
> I applied this fix to the OVN repo. It's possible that the fix to address this issue in ovs-lib.in could
> be missing in some deployments if older ovs version is used. I thought its no harm in having
> the fixes in both ovn-ctl and ovs-lib.in.

Hi Numan,

There was already v2 for this patch (a bit renamed):
OVS: https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361678.html
OVN: https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361679.html

Best regards, Ilya Maximets.
On 20.08.2019 12:16, Ilya Maximets wrote:
> Hi Numan,
>
> There was already v2 for this patch (a bit renamed):
> OVS: https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361678.html
> OVN: https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361679.html

Sorry, maybe I misunderstood what you wanted to do.
Do you suggest to apply v1 to OVN repo and v2 to OVS repo?
What about applying v2 to OVN repo?

> Best regards, Ilya Maximets.
On Tue, Aug 20, 2019 at 2:52 PM Ilya Maximets <i.maximets@samsung.com> wrote:
> Sorry, maybe I misunderstood what you wanted to do.
> Do you suggest to apply v1 to OVN repo and v2 to OVS repo?
> What about applying v2 to OVN repo?

Yes. v1 to OVN repo and v2 to OVS repo. OVN repo doesn't have the ovs-lib.in file.
So v2 won't apply there.

We could apply v1 to OVS repo too, but that wouldn't make sense as we may delete
the ovn folder once this patch is accepted - https://patchwork.ozlabs.org/patch/1147617/

Thanks
Numan
diff --git a/ovn/utilities/ovn-ctl b/ovn/utilities/ovn-ctl
index 7e5cd469c83c..65f03e28ddba 100755
--- a/ovn/utilities/ovn-ctl
+++ b/ovn/utilities/ovn-ctl
@@ -35,7 +35,7 @@ ovn_northd_db_conf_file="$etcdir/ovn-northd-db-params.conf"
 
 pidfile_is_running () {
     pidfile=$1
-    test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
+    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"
 } >/dev/null 2>&1
 
 stop_nb_ovsdb() {
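The effect of the added [ -s "$pidfile" ] test can be tried outside a cluster with a sketch like the following. The pidfile_is_running body is copied from the diff above; the pid_exists stand-in is an assumption (a /proc-based check, not shown in this thread), and the scratch directory and file names are made up for the illustration.

```shell
#!/bin/sh
# Assumed stand-in for the real helper: checks for a /proc/<pid> directory.
pid_exists () {
    test -d /proc/"$1"
}

# Patched function, as in the diff above.
pidfile_is_running () {
    pidfile=$1
    test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid"
} >/dev/null 2>&1

tmpdir=$(mktemp -d)
: > "$tmpdir/empty.pid"            # zero-length pidfile, as in the report

if pidfile_is_running "$tmpdir/empty.pid"; then
    empty_result=running
else
    empty_result=not-running
fi
if pidfile_is_running "$tmpdir/missing.pid"; then
    missing_result=running
else
    missing_result=not-running
fi
rm -rf "$tmpdir"

echo "empty pidfile:   $empty_result"
echo "missing pidfile: $missing_result"
```

Since -s is true only when the file exists and has a size greater than zero, the chain stops before pid_exists is ever called with an empty string.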
In some of our destructive testing of ovn-dbs inside containers managed
by pacemaker we reached a situation where /var/run/openvswitch had
empty .pid files. The current code does not deal well with them
and pidfile_is_running() returns true in such a case and this confuses
the OCF resource agent.

- Before this change:
Inside a container run:
  killall ovsdb-server;
  echo -n '' > /var/run/openvswitch/ovnnb_db.pid; echo -n '' > /var/run/openvswitch/ovnsb_db.pid

We will observe that the cluster is unable to ever recover because
it believes the ovn processes to be running when they really aren't and
eventually just fails:
  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
    ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
    ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
    ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

- After this change the cluster is able to recover from this state and
correctly start the resource:
  podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
    ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master controller-0
    ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave controller-1
    ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave controller-2

Signed-off-by: Michele Baldessari <michele@acksyn.org>
---
 ovn/utilities/ovn-ctl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
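To see why the pre-patch chain got as far as pid_exists with an empty pidfile, the first two steps can be run by hand. This is a minimal sketch that uses a scratch directory instead of /var/run/openvswitch (an assumption made so it is safe to run anywhere); the variable names are hypothetical.

```shell
#!/bin/sh
# Scratch directory standing in for /var/run/openvswitch.
rundir=$(mktemp -d)
pidfile="$rundir/ovnnb_db.pid"
printf '' > "$pidfile"             # zero-length file, like echo -n '' > pidfile

# First two links of the pre-patch chain, one at a time.
test -e "$pidfile"; exists_status=$?
pid=`cat "$pidfile"`; cat_status=$?

echo "test -e status: $exists_status"
echo "cat status:     $cat_status"
echo "pid value:      '$pid'"
rm -rf "$rundir"
```

Both commands succeed and pid ends up as the empty string, so without the -s test the final link of the chain is called with an empty argument.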