diff mbox series

[ovs-dev] fatal-signal: Catch SIGSEGV and print backtrace.

Message ID 1568762004-113081-1-git-send-email-u9012063@gmail.com
State Superseded
Headers show
Series [ovs-dev] fatal-signal: Catch SIGSEGV and print backtrace. | expand

Commit Message

William Tu Sept. 17, 2019, 11:13 p.m. UTC
The patch catches the SIGSEGV signal and prints the backtrace
using libunwind, hopefully makes it easier to debug.

Signed-off-by: William Tu <u9012063@gmail.com>
---
 .travis.yml        |  1 +
 configure.ac       |  1 +
 lib/fatal-signal.c | 40 +++++++++++++++++++++++++++++++++++++++-
 m4/openvswitch.m4  | 10 ++++++++++
 4 files changed, 51 insertions(+), 1 deletion(-)

Comments

Ben Pfaff Sept. 18, 2019, 4:11 p.m. UTC | #1
On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote:
> The patch catches the SIGSEGV signal and prints the backtrace
> using libunwind, hopefully makes it easier to debug.
> 
> Signed-off-by: William Tu <u9012063@gmail.com>

I guess my experience is that this sort of thing is sometimes useful but
ultimately causes problems because it tries to call a lot of functions
that a signal handler is not supposed to call.

It might be a good idea to put some of this in lib/backtrace.[ch] and
provide some way to enable it at runtime.
William Tu Sept. 18, 2019, 8:16 p.m. UTC | #2
Thanks for the feedback.

On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote:
>
> On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote:
> > The patch catches the SIGSEGV signal and prints the backtrace
> > using libunwind, hopefully makes it easier to debug.
> >
> > Signed-off-by: William Tu <u9012063@gmail.com>
>
> I guess my experience is that this sort of thing is sometimes useful but
> ultimately causes problems because it tries to call a lot of functions
> that a signal handler is not supposed to call.

Do you mean the show_backtrace() tries to call too many functions?
What's the concern about calling lots of functions in a signal handler?

>
> It might be a good idea to put some of this in lib/backtrace.[ch] and
> provide some way to enable it at runtime.

I'm hoping this can be enabled by default, so debugging on systems
without ovs debug symbol package or gdb is easier.

Regards,
William
Ben Pfaff Sept. 18, 2019, 11:27 p.m. UTC | #3
On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote:
> Thanks for the feedback.
> 
> On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote:
> >
> > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote:
> > > The patch catches the SIGSEGV signal and prints the backtrace
> > > using libunwind, hopefully makes it easier to debug.
> > >
> > > Signed-off-by: William Tu <u9012063@gmail.com>
> >
> > I guess my experience is that this sort of thing is sometimes useful but
> > ultimately causes problems because it tries to call a lot of functions
> > that a signal handler is not supposed to call.
> 
> Do you mean the show_backtrace() tries to call too many functions?
> What's the concern about calling lots of functions in a signal handler?

There is no concern about the number of functions but about the specific
ones.  Most C library functions are not async-signal-safe, meaning that
it is not safe to call them from a signal handler.  There is a list of
functions guaranteed to be async-signal-safe in section 2.4.3 "Signal
Actions" of the System Interfaces volume of POSIX.  You can find it
here: https://pubs.opengroup.org/onlinepubs/9699919799/.  Notably, it
doesn't contain the stdio functions like printf() and fflush(), or
syslog, and those functions are likely to malfunction if the signal
handler interrupts some ongoing operation on a stream.  There's a risk
of being caught in some kind of SIGSEGV loop.

I think there needs to be a lot of care if we're going to do this by
default.
Ben Pfaff Sept. 20, 2019, 2:39 p.m. UTC | #4
On Fri, Sep 20, 2019 at 10:28:36AM -0700, William Tu wrote:
> On Wed, Sep 18, 2019 at 04:27:46PM -0700, Ben Pfaff wrote:
> > On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote:
> > > Thanks for the feedback.
> > > 
> > > On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote:
> > > >
> > > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote:
> > > > > The patch catches the SIGSEGV signal and prints the backtrace
> > > > > using libunwind, hopefully makes it easier to debug.
> > > > >
> > > > > Signed-off-by: William Tu <u9012063@gmail.com>
> > > >
> > > > I guess my experience is that this sort of thing is sometimes useful but
> > > > ultimately causes problems because it tries to call a lot of functions
> > > > that a signal handler is not supposed to call.
> > > 
> > > Do you mean the show_backtrace() tries to call too many functions?
> > > What's the concern about calling lots of functions in a signal handler?
> > 
> > There is no concern about the number of functions but about the specific
> > ones.  Most C library functions are not async-signal-safe, meaning that
> > it is not safe to call them from a signal handler.  There is a list of
> > functions guaranteed to be async-signal-safe in section 2.4.3 "Signal
> > Actions" of the System Interfaces volume of POSIX.  You can find it
> > here: https://pubs.opengroup.org/onlinepubs/9699919799/.  Notably, it
> > doesn't contain the stdio functions like printf() and fflush(), or
> > syslog, and those functions are likely to malfunction if the signal
> > handler interrupts some ongoing operation on a stream.  There's a risk
> > of being caught in some kind of SIGSEGV loop.
> > 
> > I think there needs to be a lot of care if we're going to do this by
> > default.
> 
> Thanks a lot! I wasn't aware of these issue.
> 
> In lib/backtrace.c, it uses GNU backtrace(3) and it's not signal safe.
> Fortunately, the libunwind in this patch with local unwinding is signal
> safe, 
> https://www.nongnu.org/libunwind/man/libunwind(3).html#section_5
> So for v2, I'm thinking about removing the use of printf and fllush.

That sounds like a good idea.  It is still possible to write to stderr
using write().  It might be possible to define some signal-safe vlog
interface for logging to a file, too.
William Tu Sept. 20, 2019, 5:28 p.m. UTC | #5
On Wed, Sep 18, 2019 at 04:27:46PM -0700, Ben Pfaff wrote:
> On Wed, Sep 18, 2019 at 01:16:37PM -0700, William Tu wrote:
> > Thanks for the feedback.
> > 
> > On Wed, Sep 18, 2019 at 11:30 AM Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > On Tue, Sep 17, 2019 at 04:13:24PM -0700, William Tu wrote:
> > > > The patch catches the SIGSEGV signal and prints the backtrace
> > > > using libunwind, hopefully makes it easier to debug.
> > > >
> > > > Signed-off-by: William Tu <u9012063@gmail.com>
> > >
> > > I guess my experience is that this sort of thing is sometimes useful but
> > > ultimately causes problems because it tries to call a lot of functions
> > > that a signal handler is not supposed to call.
> > 
> > Do you mean the show_backtrace() tries to call too many functions?
> > What's the concern about calling lots of functions in a signal handler?
> 
> There is no concern about the number of functions but about the specific
> ones.  Most C library functions are not async-signal-safe, meaning that
> it is not safe to call them from a signal handler.  There is a list of
> functions guaranteed to be async-signal-safe in section 2.4.3 "Signal
> Actions" of the System Interfaces volume of POSIX.  You can find it
> here: https://pubs.opengroup.org/onlinepubs/9699919799/.  Notably, it
> doesn't contain the stdio functions like printf() and fflush(), or
> syslog, and those functions are likely to malfunction if the signal
> handler interrupts some ongoing operation on a stream.  There's a risk
> of being caught in some kind of SIGSEGV loop.
> 
> I think there needs to be a lot of care if we're going to do this by
> default.

Thanks a lot! I wasn't aware of these issue.

In lib/backtrace.c, it uses GNU backtrace(3) and it's not signal safe.
Fortunately, the libunwind in this patch with local unwinding is signal
safe, 
https://www.nongnu.org/libunwind/man/libunwind(3).html#section_5
So for v2, I'm thinking about removing the use of printf and fllush.

Regards,
William
diff mbox series

Patch

diff --git a/.travis.yml b/.travis.yml
index 370b3d0a6c98..f5d62387c89b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,6 +25,7 @@  addons:
       - selinux-policy-dev
       - libunbound-dev
       - libunbound-dev:i386
+      - libunwind-dev
 
 before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh
 
diff --git a/configure.ac b/configure.ac
index 1d45c4fdd153..15922418062b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -139,6 +139,7 @@  OVS_LIBTOOL_VERSIONS
 OVS_CHECK_CXX
 AX_FUNC_POSIX_MEMALIGN
 OVS_CHECK_UNBOUND
+OVS_CHECK_UNWIND
 
 OVS_CHECK_INCLUDE_NEXT([stdio.h string.h])
 AC_CONFIG_FILES([
diff --git a/lib/fatal-signal.c b/lib/fatal-signal.c
index 3b905b6de766..a7325c1ba37e 100644
--- a/lib/fatal-signal.c
+++ b/lib/fatal-signal.c
@@ -34,6 +34,11 @@ 
 
 #include "openvswitch/type-props.h"
 
+#ifdef HAVE_UNWIND
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+#endif
+
 #ifndef SIG_ATOMIC_MAX
 #define SIG_ATOMIC_MAX TYPE_MAXIMUM(sig_atomic_t)
 #endif
@@ -42,7 +47,8 @@  VLOG_DEFINE_THIS_MODULE(fatal_signal);
 
 /* Signals to catch. */
 #ifndef _WIN32
-static const int fatal_signals[] = { SIGTERM, SIGINT, SIGHUP, SIGALRM };
+static const int fatal_signals[] = { SIGTERM, SIGINT, SIGHUP, SIGALRM,
+                                     SIGSEGV };
 #else
 static const int fatal_signals[] = { SIGTERM };
 #endif
@@ -151,6 +157,32 @@  fatal_signal_add_hook(void (*hook_cb)(void *aux), void (*cancel_cb)(void *aux),
     ovs_mutex_unlock(&mutex);
 }
 
+#ifdef HAVE_UNWIND
+static void
+show_backtrace(void) {
+    char buf[4096];
+    unw_cursor_t cursor;
+    unw_context_t uc;
+    unw_word_t ip, sp;
+    unw_word_t offset;
+
+    unw_getcontext(&uc);
+    unw_init_local(&cursor, &uc);
+
+    while (unw_step(&cursor) > 0) {
+        unw_get_reg(&cursor, UNW_REG_IP, &ip);
+        unw_get_reg(&cursor, UNW_REG_SP, &sp);
+        unw_get_proc_name(&cursor, buf, 4095, &offset);
+        VLOG_WARN("0x%016lx <%s+0x%lx>\n", ip, buf, offset);
+    }
+}
+#else
+static void
+show_backtrace(void) {
+    /* Nothing. */
+}
+#endif
+
 /* Handles fatal signal number 'sig_nr'.
  *
  * Ordinarily this is the actual signal handler.  When other code needs to
@@ -164,6 +196,12 @@  void
 fatal_signal_handler(int sig_nr)
 {
 #ifndef _WIN32
+    if (sig_nr == SIGSEGV) {
+        show_backtrace();
+        fflush(stderr);
+        signal(sig_nr, SIG_DFL);
+        raise(sig_nr);
+    }
     ignore(write(signal_fds[1], "", 1));
 #else
     SetEvent(wevent);
diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4
index cd6b51d86c16..f8bb069e80c9 100644
--- a/m4/openvswitch.m4
+++ b/m4/openvswitch.m4
@@ -705,3 +705,13 @@  AC_DEFUN([OVS_CHECK_UNBOUND],
    fi
    AM_CONDITIONAL([HAVE_UNBOUND], [test "$HAVE_UNBOUND" = yes])
    AC_SUBST([HAVE_UNBOUND])])
+
+dnl Checks for libunwind.
+AC_DEFUN([OVS_CHECK_UNWIND],
+  [AC_CHECK_LIB(unwind, unw_backtrace, [HAVE_UNWIND=yes], [HAVE_UNWIND=no])
+   if test "$HAVE_UNWIND" = yes; then
+     AC_DEFINE([HAVE_UNWIND], [1], [Define to 1 if unwind is detected.])
+     LIBS="$LIBS -lunwind"
+   fi
+   AM_CONDITIONAL([HAVE_UNWIND], [test "$HAVE_UNWIND" = yes])
+   AC_SUBST([HAVE_UNWIND])])