Message ID | 1600468401-3300-1-git-send-email-guru@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] ovs-lib: Handle daemon segfaults during exit. | expand |
On Fri, Sep 18, 2020 at 9:54 AM Gurucharan Shetty <guru@ovn.org> wrote: > > Currently, we terminate a daemon by trying > "ovs-appctl exit", "SIGTERM" and finally "SIGKILL". > But the logic fails if during "ovs-appctl exit", the > daemon crashes (segfaults). The monitor will automatically > restart the daemon with a new pid. The current logic of > checking the non-existance of old pid succeeds and we proceed > with the assumption that the daemon is dead. > > This is a problem during OVS upgrades as we will continue > to run the older version of OVS. > > With this commit, we take care of this situation. If there > is a segfault, the pidfile is not deleted. So, we wait a > little to give time for the monitor to restart the daemon > (which is usually instantaneous) and then re-read the pidfile. > > VMware-BZ: #2633995 > Signed-off-by: Gurucharan Shetty <guru@ovn.org> > --- Thanks for the patch. I think it does fix a case when --monitor is specified and ovs crash during "ovs-appctl exit". Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
On Mon, 21 Sep 2020 at 10:53, Yi-Hung Wei <yihung.wei@gmail.com> wrote: > On Fri, Sep 18, 2020 at 9:54 AM Gurucharan Shetty <guru@ovn.org> wrote: > > > > Currently, we terminate a daemon by trying > > "ovs-appctl exit", "SIGTERM" and finally "SIGKILL". > > But the logic fails if during "ovs-appctl exit", the > > daemon crashes (segfaults). The monitor will automatically > > restart the daemon with a new pid. The current logic of > > checking the non-existance of old pid succeeds and we proceed > > with the assumption that the daemon is dead. > > > > This is a problem during OVS upgrades as we will continue > > to run the older version of OVS. > > > > With this commit, we take care of this situation. If there > > is a segfault, the pidfile is not deleted. So, we wait a > > little to give time for the monitor to restart the daemon > > (which is usually instantaneous) and then re-read the pidfile. > > > > VMware-BZ: #2633995 > > Signed-off-by: Gurucharan Shetty <guru@ovn.org> > > --- > > Thanks for the patch. I think it does fix a case when --monitor is > specified and ovs crash during "ovs-appctl exit". > > Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> > Thank you. I applied this to master.
diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in index d646b44..f7e9756 100644 --- a/utilities/ovs-lib.in +++ b/utilities/ovs-lib.in @@ -255,20 +255,36 @@ stop_daemon () { if version_geq "$version" "2.5.90"; then actions="$graceful $actions" fi + actiontype="" for action in $actions; do if pid_exists "$pid" >/dev/null 2>&1; then :; else - return 0 + # pid does not exist. + if [ -n "$actiontype" ]; then + return 0 + fi + # But, does the file exist? We may have had a daemon + # segfault with `ovs-appctl exit`. Check one more time + # before deciding that the daemon is dead. + [ -e "$rundir/$1.pid" ] && sleep 2 && pid=`cat "$rundir/$1.pid"` 2>/dev/null + if pid_exists "$pid" >/dev/null 2>&1; then :; else + return 0 + fi fi case $action in EXIT) action "Exiting $1 ($pid)" \ ${bindir}/ovs-appctl -T 1 -t $rundir/$1.$pid.ctl exit $2 + # The above command could have resulted in delayed + # daemon segfault. And if a monitor is running, it + # would restart the daemon giving it a new pid. ;; TERM) action "Killing $1 ($pid)" kill $pid + actiontype="force" ;; KILL) action "Killing $1 ($pid) with SIGKILL" kill -9 $pid + actiontype="force" ;; FAIL) log_failure_msg "Killing $1 ($pid) failed"
Currently, we terminate a daemon by trying "ovs-appctl exit", "SIGTERM" and finally "SIGKILL". But the logic fails if during "ovs-appctl exit", the daemon crashes (segfaults). The monitor will automatically restart the daemon with a new pid. The current logic of checking the non-existance of old pid succeeds and we proceed with the assumption that the daemon is dead. This is a problem during OVS upgrades as we will continue to run the older version of OVS. With this commit, we take care of this situation. If there is a segfault, the pidfile is not deleted. So, we wait a little to give time for the monitor to restart the daemon (which is usually instantaneous) and then re-read the pidfile. VMware-BZ: #2633995 Signed-off-by: Gurucharan Shetty <guru@ovn.org> --- utilities/ovs-lib.in | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)