diff mbox series

[ovs-dev] ovs-lib: Handle daemon segfaults during exit.

Message ID 1600468401-3300-1-git-send-email-guru@ovn.org
State Accepted
Headers show
Series [ovs-dev] ovs-lib: Handle daemon segfaults during exit. | expand

Commit Message

Gurucharan Shetty Sept. 18, 2020, 10:33 p.m. UTC
Currently, we terminate a daemon by trying
"ovs-appctl exit", "SIGTERM" and finally "SIGKILL".
But the logic fails if during "ovs-appctl exit", the
daemon crashes (segfaults). The monitor will automatically
restart the daemon with a new pid. The current logic of
checking the non-existance of old pid succeeds and we proceed
with the assumption that the daemon is dead.

This is a problem during OVS upgrades as we will continue
to run the older version of OVS.

With this commit, we take care of this situation. If there
is a segfault, the pidfile is not deleted. So, we wait a
little to give time for the monitor to restart the daemon
(which is usually instantaneous) and then re-read the pidfile.

VMware-BZ: #2633995
Signed-off-by: Gurucharan Shetty <guru@ovn.org>
---
 utilities/ovs-lib.in | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Comments

Yi-Hung Wei Sept. 21, 2020, 5:53 p.m. UTC | #1
On Fri, Sep 18, 2020 at 9:54 AM Gurucharan Shetty <guru@ovn.org> wrote:
>
> Currently, we terminate a daemon by trying
> "ovs-appctl exit", "SIGTERM" and finally "SIGKILL".
> But the logic fails if during "ovs-appctl exit", the
> daemon crashes (segfaults). The monitor will automatically
> restart the daemon with a new pid. The current logic of
> checking the non-existance of old pid succeeds and we proceed
> with the assumption that the daemon is dead.
>
> This is a problem during OVS upgrades as we will continue
> to run the older version of OVS.
>
> With this commit, we take care of this situation. If there
> is a segfault, the pidfile is not deleted. So, we wait a
> little to give time for the monitor to restart the daemon
> (which is usually instantaneous) and then re-read the pidfile.
>
> VMware-BZ: #2633995
> Signed-off-by: Gurucharan Shetty <guru@ovn.org>
> ---

Thanks for the patch. I think it does fix a case when --monitor is
specified and ovs crash during "ovs-appctl exit".

Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Guru Shetty Sept. 21, 2020, 7:57 p.m. UTC | #2
On Mon, 21 Sep 2020 at 10:53, Yi-Hung Wei <yihung.wei@gmail.com> wrote:

> On Fri, Sep 18, 2020 at 9:54 AM Gurucharan Shetty <guru@ovn.org> wrote:
> >
> > Currently, we terminate a daemon by trying
> > "ovs-appctl exit", "SIGTERM" and finally "SIGKILL".
> > But the logic fails if during "ovs-appctl exit", the
> > daemon crashes (segfaults). The monitor will automatically
> > restart the daemon with a new pid. The current logic of
> > checking the non-existance of old pid succeeds and we proceed
> > with the assumption that the daemon is dead.
> >
> > This is a problem during OVS upgrades as we will continue
> > to run the older version of OVS.
> >
> > With this commit, we take care of this situation. If there
> > is a segfault, the pidfile is not deleted. So, we wait a
> > little to give time for the monitor to restart the daemon
> > (which is usually instantaneous) and then re-read the pidfile.
> >
> > VMware-BZ: #2633995
> > Signed-off-by: Gurucharan Shetty <guru@ovn.org>
> > ---
>
> Thanks for the patch. I think it does fix a case when --monitor is
> specified and ovs crash during "ovs-appctl exit".
>
> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
>

Thank you. I applied this to master.
diff mbox series

Patch

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index d646b44..f7e9756 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -255,20 +255,36 @@  stop_daemon () {
             if version_geq "$version" "2.5.90"; then
                 actions="$graceful $actions"
             fi
+            actiontype=""
             for action in $actions; do
                 if pid_exists "$pid" >/dev/null 2>&1; then :; else
-                    return 0
+                    # pid does not exist.
+                    if [ -n "$actiontype" ]; then
+                        return 0
+                    fi
+                    # But, does the file exist? We may have had a daemon
+                    # segfault with `ovs-appctl exit`. Check one more time
+                    # before deciding that the daemon is dead.
+                    [ -e "$rundir/$1.pid" ] && sleep 2 && pid=`cat "$rundir/$1.pid"` 2>/dev/null
+                    if pid_exists "$pid" >/dev/null 2>&1; then :; else
+                        return 0
+                    fi
                 fi
                 case $action in
                     EXIT)
                         action "Exiting $1 ($pid)" \
                             ${bindir}/ovs-appctl -T 1 -t $rundir/$1.$pid.ctl exit $2
+                        # The above command could have resulted in delayed
+                        # daemon segfault. And if a monitor is running, it
+                        # would restart the daemon giving it a new pid.
                         ;;
                     TERM)
                         action "Killing $1 ($pid)" kill $pid
+                        actiontype="force"
                         ;;
                     KILL)
                         action "Killing $1 ($pid) with SIGKILL" kill -9 $pid
+                        actiontype="force"
                         ;;
                     FAIL)
                         log_failure_msg "Killing $1 ($pid) failed"