malloc: Fix attached thread reference count handling [BZ #19243]

Message ID 564B1B91.30607@redhat.com
State New

Commit Message

Florian Weimer Nov. 17, 2015, 12:20 p.m. UTC
This patch does not address the deadlock scenario around list_lock and
the fork handler.

I tried various different approaches, and there is an immediate
temptation to introduce proper synchronization into the arena selection
logic.  This is why I went with the minimal change for now.

The test, while racy, is extremely reliable in practice.  It usually
crashes within one or two seconds.  I have never seen it succeeding
without the fix.

Florian

Comments

Florian Weimer Dec. 6, 2015, 12:03 p.m. UTC | #1
On 11/17/2015 01:20 PM, Florian Weimer wrote:
> This patch does not address the deadlock scenario around list_lock and
> the fork handler.
> 
> I tried various different approaches, and there is an immediate
> temptation to introduce proper synchronization into the arena selection
> logic.  This is why I went with the minimal change for now.
> 
> The test, while racy, is extremely reliable in practice.  It usually
> crashes within one or two seconds.  I have never seen it succeeding
> without the fix.

Ping?

<https://sourceware.org/ml/libc-alpha/2015-11/msg00351.html>

Florian
Carlos O'Donell Dec. 16, 2015, 4:51 a.m. UTC | #2
On 11/17/2015 07:20 AM, Florian Weimer wrote:
> The test, while racy, is extremely reliable in practice.  It usually
> crashes within one or two seconds.  I have never seen it succeeding
> without the fix.

Thank you for the test.
 
> malloc: Fix attached thread reference count handling [BZ #19243]
> 
> reused_arena can increase the attached thread count of arenas on the
> free list.  This means that the assertion that the reference count is
> zero is incorrect.  In this case, the reference count initialization
> is incorrect as well and could cause arenas to be put on the free
> list too early (while they still have attached threads).
> 
> 2015-11-17  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #19243]
> 	* malloc/arena.c (get_free_list): Remove assert and adjust
> 	reference count handling.
> 	* malloc/tst-malloc-thread-exit.c: New file.

I have reviewed the concurrency aspects of this patch, and it looks
good to me. The review is made significantly easier by the
list_lock mutex, which serializes access to the various data elements
in question.

OK to checkin if you:

(a) Add a comment in reused_arena that says that it may attach
    threads to arenas that were on the free_list without removing
    such arenas from the free_list.

(b) One more comment in the test case. See below.

> diff --git a/malloc/Makefile b/malloc/Makefile
> index 67ed293..aa0579c 100644
> --- a/malloc/Makefile
> +++ b/malloc/Makefile
> @@ -28,7 +28,7 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
>  	 tst-mallocstate tst-mcheck tst-mallocfork tst-trim1 \
>  	 tst-malloc-usable tst-realloc tst-posix_memalign \
>  	 tst-pvalloc tst-memalign tst-mallopt tst-scratch_buffer \
> -	 tst-malloc-backtrace
> +	 tst-malloc-backtrace tst-malloc-thread-exit

OK.

>  test-srcs = tst-mtrace
>  
>  routines = malloc morecore mcheck mtrace obstack \
> @@ -47,6 +47,8 @@ libmemusage-inhibit-o = $(filter-out .os,$(object-suffixes))
>  
>  $(objpfx)tst-malloc-backtrace: $(common-objpfx)nptl/libpthread.so \
>  			       $(common-objpfx)nptl/libpthread_nonshared.a
> +$(objpfx)tst-malloc-thread-exit: $(common-objpfx)nptl/libpthread.so \
> +			       $(common-objpfx)nptl/libpthread_nonshared.a

OK.

>  
>  # These should be removed by `make clean'.
>  extra-objs = mcheck-init.o libmcheck.a
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 0f00afa..c8e043a 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -805,6 +805,8 @@ _int_new_arena (size_t size)
>  }
>  
>  
> +/* Remove an arena from free_list.  The arena may be in use because it
> +   was attached concurrently to a thread by reused_arena below.  */

Agreed.

The code in reused_arena walks all arenas starting with the main arena
and therefore may pick an arena that is in the sequence of arenas listed
via the free_list next_free pointer chain.

This can happen concurrently with get_free_list, and there is nothing
to stop reused_arena from attaching a thread to an arena that is still
on the free list.
>  static mstate
>  get_free_list (void)
>  {
> @@ -818,10 +820,8 @@ get_free_list (void)
>  	{
>  	  free_list = result->next_free;
>  
> -	  /* Arenas on the free list are not attached to any thread.  */
> -	  assert (result->attached_threads == 0);
> -	  /* But the arena will now be attached to this thread.  */
> -	  result->attached_threads = 1;
> +	  /* The arena will be attached to this thread.  */
> +	  ++result->attached_threads;

In principle this means that reused_arena may increment attached_threads
for arenas in the free_list, and that isn't very clean from a design
perspective.

The alternative, removing arenas selected by reused_arena from the
free_list, would avoid the changes in get_free_list, but it would
require traversing the list to find and unlink the item, at the cost
of extra work, pointer chasing, and TLB pressure.

Logically it seems like reused_arena is a superset of the work being done
by get_free_list, since it can pick any arena, including one on the free
list. However, the expectation is that get_free_list was called first and
returned NULL, and only then do we call reused_arena. As seen here,
though, a thread might have exited in the meantime, adding an arena to
the free list.

Without holding the list_lock the whole time there is always a race as you
go to look at the list of usable arenas, while other threads exit and add
more arenas to the free_list chain. Therefore I think this fix as you wrote
it is the best way forward. You make get_free_list robust against
reused_arena attaching threads to the items on the free list.

The only question I have is this:

If only get_free_list removes an arena from the free_list chain, then what
prevents reused_arena from attaching threads to all of those arenas and
rendering the free_list useless?

Is it simply the expectation that arena_get2 calls get_free_list *first*
and therefore you are more likely to get an unattached arena? Probably.

This is some really painful logic. One really wants a thread-to-thread
handoff of a free arena, but that's quite a rewrite.

>  
>  	  detach_arena (replaced_arena);
>  	}
> diff --git a/malloc/tst-malloc-thread-exit.c b/malloc/tst-malloc-thread-exit.c
> new file mode 100644
> index 0000000..8977f77
> --- /dev/null
> +++ b/malloc/tst-malloc-thread-exit.c
> @@ -0,0 +1,215 @@
> +/* Test malloc with concurrent thread termination.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* This thread spawns a number of outer threads, equal to the arena
> +   limit.  The outer threads run a loop which start and join two
> +   different kinds of threads: the first kind allocates (attaching an
> +   arena to the thread) and waits, the second kind waits and
> +   allocates.  The hope is that this will exhibit races in thread
> +   termination and arena management.  */

You should expand a bit and describe what things we are looking for
in this test.

Particularly that we want a thread to exit, while another thread
is starting to allocate, and we want to test the handoff of an
unused arena from the exiting thread to the newly allocating thread.

We don't need any more detail than that. In truth what we want to hit
is the scenario where the second thread has called get_free_list and
failed to get an arena, then the first thread exits, and then the second
thread calls reused_arena and marks the freed arena as attached, and then
a third thread calls get_free_list and fails with the assertion we had
originally?

> +
> +#include <errno.h>
> +#include <pthread.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +
> +#define TIMEOUT 7
> +
> +static bool termination_requested;
> +static int inner_thread_count = 4;
> +static size_t malloc_size = 32;
> +
> +static void
> +__attribute__ ((noinline, noclone))
> +unoptimized_free (void *ptr)
> +{
> +  free (ptr);
> +}
> +
> +static void *
> +malloc_first_thread (void * closure)
> +{
> +  pthread_barrier_t *barrier = closure;
> +  void *ptr = malloc (malloc_size);
> +  if (ptr == NULL)
> +    {
> +      printf ("error: malloc: %m\n");
> +      abort ();
> +    }
> +  int ret = pthread_barrier_wait (barrier);
> +  if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
> +    {
> +      errno = ret;
> +      printf ("error: pthread_barrier_wait: %m\n");
> +      abort ();
> +    }
> +  unoptimized_free (ptr);
> +  return NULL;
> +}

OK.

> +
> +static void *
> +wait_first_thread (void * closure)
> +{
> +  pthread_barrier_t *barrier = closure;
> +  int ret = pthread_barrier_wait (barrier);
> +  if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
> +    {
> +      errno = ret;
> +      printf ("error: pthread_barrier_wait: %m\n");
> +      abort ();
> +    }
> +  void *ptr = malloc (malloc_size);
> +  if (ptr == NULL)
> +    {
> +      printf ("error: malloc: %m\n");
> +      abort ();
> +    }
> +  unoptimized_free (ptr);
> +  return NULL;

OK.

> +}
> +
> +static void *
> +outer_thread (void *closure)
> +{
> +  pthread_t *threads = calloc (sizeof (*threads), inner_thread_count);
> +  if (threads == NULL)
> +    {
> +      printf ("error: calloc: %m\n");
> +      abort ();
> +    }
> +
> +  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
> +    {
> +      pthread_barrier_t barrier;
> +      int ret = pthread_barrier_init (&barrier, NULL, inner_thread_count + 1);
> +      if (ret != 0)
> +        {
> +          errno = ret;
> +          printf ("pthread_barrier_init: %m\n");
> +          abort ();
> +        }
> +      for (int i = 0; i < inner_thread_count; ++i)
> +        {
> +          void *(*func) (void *);
> +          if ((i  % 2) == 0)
> +            func = malloc_first_thread;
> +          else
> +            func = wait_first_thread;
> +          ret = pthread_create (threads + i, NULL, func, &barrier);
> +          if (ret != 0)
> +            {
> +              errno = ret;
> +              printf ("error: pthread_create: %m\n");
> +              abort ();
> +            }
> +        }
> +      ret = pthread_barrier_wait (&barrier);
> +      if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
> +        {
> +          errno = ret;
> +          printf ("pthread_barrier_wait: %m\n");
> +          abort ();
> +        }
> +      for (int i = 0; i < inner_thread_count; ++i)
> +        {
> +          ret = pthread_join (threads[i], NULL);
> +          if (ret != 0)
> +            {
> +              ret = errno;
> +              printf ("error: pthread_join: %m\n");
> +              abort ();
> +            }
> +        }
> +      ret = pthread_barrier_destroy (&barrier);
> +      if (ret != 0)
> +        {
> +          ret = errno;
> +          printf ("pthread_barrier_destroy: %m\n");
> +          abort ();
> +        }
> +    }
> +
> +  free (threads);
> +
> +  return NULL;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  /* The number of top-level threads should be equal to the number of
> +     arenas.  See arena_get2.  */
> +  long outer_thread_count = sysconf (_SC_NPROCESSORS_ONLN);
> +  if (outer_thread_count >= 1)
> +    {
> +      /* See NARENAS_FROM_NCORES in malloc.c.  */
> +      if (sizeof (long) == 4)
> +        outer_thread_count *= 2;
> +      else
> +        outer_thread_count *= 8;
> +    }
> +

OK.

> +  /* Leave some room for shutting down all threads gracefully.  */
> +  int timeout = TIMEOUT - 2;
> +
> +  pthread_t *threads = calloc (sizeof (*threads), outer_thread_count);
> +  if (threads == NULL)
> +    {
> +      printf ("error: calloc: %m\n");
> +      abort ();
> +    }
> +
> +  for (long i = 0; i < outer_thread_count; ++i)
> +    {
> +      int ret = pthread_create (threads + i, NULL, outer_thread, NULL);
> +      if (ret != 0)
> +        {
> +          errno = ret;
> +          printf ("error: pthread_create: %m\n");
> +          abort ();
> +        }
> +    }
> +
> +  struct timespec ts = {timeout, 0};
> +  if (nanosleep (&ts, NULL))
> +    {
> +      printf ("error: nanosleep: %m\n");
> +      abort ();
> +    }
> +
> +  __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED);
> +
> +  for (long i = 0; i < outer_thread_count; ++i)
> +    {
> +      int ret = pthread_join (threads[i], NULL);
> +      if (ret != 0)
> +        {
> +          errno = ret;
> +          printf ("error: pthread_join: %m\n");
> +          abort ();
> +        }
> +    }
> +  free (threads);

OK.

> +
> +  return 0;
> +}
> +
> +#define TEST_FUNCTION do_test ()
> +#include "../test-skeleton.c"
> -- 2.4.3

Cheers,
Carlos.

Patch

malloc: Fix attached thread reference count handling [BZ #19243]

reused_arena can increase the attached thread count of arenas on the
free list.  This means that the assertion that the reference count is
zero is incorrect.  In this case, the reference count initialization
is incorrect as well and could cause arenas to be put on the free
list too early (while they still have attached threads).

2015-11-17  Florian Weimer  <fweimer@redhat.com>

	[BZ #19243]
	* malloc/arena.c (get_free_list): Remove assert and adjust
	reference count handling.
	* malloc/tst-malloc-thread-exit.c: New file.

diff --git a/malloc/Makefile b/malloc/Makefile
index 67ed293..aa0579c 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -28,7 +28,7 @@  tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \
 	 tst-mallocstate tst-mcheck tst-mallocfork tst-trim1 \
 	 tst-malloc-usable tst-realloc tst-posix_memalign \
 	 tst-pvalloc tst-memalign tst-mallopt tst-scratch_buffer \
-	 tst-malloc-backtrace
+	 tst-malloc-backtrace tst-malloc-thread-exit
 test-srcs = tst-mtrace
 
 routines = malloc morecore mcheck mtrace obstack \
@@ -47,6 +47,8 @@  libmemusage-inhibit-o = $(filter-out .os,$(object-suffixes))
 
 $(objpfx)tst-malloc-backtrace: $(common-objpfx)nptl/libpthread.so \
 			       $(common-objpfx)nptl/libpthread_nonshared.a
+$(objpfx)tst-malloc-thread-exit: $(common-objpfx)nptl/libpthread.so \
+			       $(common-objpfx)nptl/libpthread_nonshared.a
 
 # These should be removed by `make clean'.
 extra-objs = mcheck-init.o libmcheck.a
diff --git a/malloc/arena.c b/malloc/arena.c
index 0f00afa..c8e043a 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -805,6 +805,8 @@  _int_new_arena (size_t size)
 }
 
 
+/* Remove an arena from free_list.  The arena may be in use because it
+   was attached concurrently to a thread by reused_arena below.  */
 static mstate
 get_free_list (void)
 {
@@ -818,10 +820,8 @@  get_free_list (void)
 	{
 	  free_list = result->next_free;
 
-	  /* Arenas on the free list are not attached to any thread.  */
-	  assert (result->attached_threads == 0);
-	  /* But the arena will now be attached to this thread.  */
-	  result->attached_threads = 1;
+	  /* The arena will be attached to this thread.  */
+	  ++result->attached_threads;
 
 	  detach_arena (replaced_arena);
 	}
diff --git a/malloc/tst-malloc-thread-exit.c b/malloc/tst-malloc-thread-exit.c
new file mode 100644
index 0000000..8977f77
--- /dev/null
+++ b/malloc/tst-malloc-thread-exit.c
@@ -0,0 +1,215 @@ 
+/* Test malloc with concurrent thread termination.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This thread spawns a number of outer threads, equal to the arena
+   limit.  The outer threads run a loop which start and join two
+   different kinds of threads: the first kind allocates (attaching an
+   arena to the thread) and waits, the second kind waits and
+   allocates.  The hope is that this will exhibit races in thread
+   termination and arena management.  */
+
+#include <errno.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#define TIMEOUT 7
+
+static bool termination_requested;
+static int inner_thread_count = 4;
+static size_t malloc_size = 32;
+
+static void
+__attribute__ ((noinline, noclone))
+unoptimized_free (void *ptr)
+{
+  free (ptr);
+}
+
+static void *
+malloc_first_thread (void * closure)
+{
+  pthread_barrier_t *barrier = closure;
+  void *ptr = malloc (malloc_size);
+  if (ptr == NULL)
+    {
+      printf ("error: malloc: %m\n");
+      abort ();
+    }
+  int ret = pthread_barrier_wait (barrier);
+  if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
+    {
+      errno = ret;
+      printf ("error: pthread_barrier_wait: %m\n");
+      abort ();
+    }
+  unoptimized_free (ptr);
+  return NULL;
+}
+
+static void *
+wait_first_thread (void * closure)
+{
+  pthread_barrier_t *barrier = closure;
+  int ret = pthread_barrier_wait (barrier);
+  if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
+    {
+      errno = ret;
+      printf ("error: pthread_barrier_wait: %m\n");
+      abort ();
+    }
+  void *ptr = malloc (malloc_size);
+  if (ptr == NULL)
+    {
+      printf ("error: malloc: %m\n");
+      abort ();
+    }
+  unoptimized_free (ptr);
+  return NULL;
+}
+
+static void *
+outer_thread (void *closure)
+{
+  pthread_t *threads = calloc (sizeof (*threads), inner_thread_count);
+  if (threads == NULL)
+    {
+      printf ("error: calloc: %m\n");
+      abort ();
+    }
+
+  while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED))
+    {
+      pthread_barrier_t barrier;
+      int ret = pthread_barrier_init (&barrier, NULL, inner_thread_count + 1);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("pthread_barrier_init: %m\n");
+          abort ();
+        }
+      for (int i = 0; i < inner_thread_count; ++i)
+        {
+          void *(*func) (void *);
+          if ((i  % 2) == 0)
+            func = malloc_first_thread;
+          else
+            func = wait_first_thread;
+          ret = pthread_create (threads + i, NULL, func, &barrier);
+          if (ret != 0)
+            {
+              errno = ret;
+              printf ("error: pthread_create: %m\n");
+              abort ();
+            }
+        }
+      ret = pthread_barrier_wait (&barrier);
+      if (ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD)
+        {
+          errno = ret;
+          printf ("pthread_barrier_wait: %m\n");
+          abort ();
+        }
+      for (int i = 0; i < inner_thread_count; ++i)
+        {
+          ret = pthread_join (threads[i], NULL);
+          if (ret != 0)
+            {
+              ret = errno;
+              printf ("error: pthread_join: %m\n");
+              abort ();
+            }
+        }
+      ret = pthread_barrier_destroy (&barrier);
+      if (ret != 0)
+        {
+          ret = errno;
+          printf ("pthread_barrier_destroy: %m\n");
+          abort ();
+        }
+    }
+
+  free (threads);
+
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  /* The number of top-level threads should be equal to the number of
+     arenas.  See arena_get2.  */
+  long outer_thread_count = sysconf (_SC_NPROCESSORS_ONLN);
+  if (outer_thread_count >= 1)
+    {
+      /* See NARENAS_FROM_NCORES in malloc.c.  */
+      if (sizeof (long) == 4)
+        outer_thread_count *= 2;
+      else
+        outer_thread_count *= 8;
+    }
+
+  /* Leave some room for shutting down all threads gracefully.  */
+  int timeout = TIMEOUT - 2;
+
+  pthread_t *threads = calloc (sizeof (*threads), outer_thread_count);
+  if (threads == NULL)
+    {
+      printf ("error: calloc: %m\n");
+      abort ();
+    }
+
+  for (long i = 0; i < outer_thread_count; ++i)
+    {
+      int ret = pthread_create (threads + i, NULL, outer_thread, NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_create: %m\n");
+          abort ();
+        }
+    }
+
+  struct timespec ts = {timeout, 0};
+  if (nanosleep (&ts, NULL))
+    {
+      printf ("error: nanosleep: %m\n");
+      abort ();
+    }
+
+  __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED);
+
+  for (long i = 0; i < outer_thread_count; ++i)
+    {
+      int ret = pthread_join (threads[i], NULL);
+      if (ret != 0)
+        {
+          errno = ret;
+          printf ("error: pthread_join: %m\n");
+          abort ();
+        }
+    }
+  free (threads);
+
+  return 0;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
-- 
2.4.3