Fix OpenACC shutdown and PTX image unloading (PR65904)
diff mbox

Message ID 20150501104719.1355dd4c@octopus
State New
Headers show

Commit Message

Julian Brown May 1, 2015, 9:47 a.m. UTC
Hi,

This patch fixes PR65904, a double-free error that started occurring
after recent libgomp changes to the way offload images are registered
with the runtime.

Offload images now map all functions/data using just two malloc'ed
blocks, but the function gomp_free_memmap did not take that into
account, and treated all mappings as if they had their own blocks (as
they do if created by gomp_map_vars): so attempting to free the whole
map at once failed when it hit mappings for an offload image.

The fix is to split offload-image freeing out of GOMP_offload_unregister
into a separate function, and call that from gomp_free_memmap for the
given device before freeing the rest of the memory map.

The patch also fixes a thinko that was revealed in image unloading in
the NVPTX backend. Tested for libgomp with PTX offloading.

OK for trunk?

Thanks,

Julian

ChangeLog

    libgomp/
    * libgomp.h (gomp_free_memmap): Update prototype.
    * oacc-init.c (acc_shutdown_1): Pass device descriptor to
    gomp_free_memmap. Don't lock device around call.
    * target.c (gomp_map_vars): Initialise tgt->array to NULL before
    early exit.
    (GOMP_offload_unregister): Split out and call...
    (gomp_deoffload_image_from_device): This new function.
    (gomp_free_memmap): Call gomp_deoffload_image_from_device.
    
    * plugin/nvptx.c (struct ptx_image_data): Add ord, fn_descs fields.
    (GOMP_OFFLOAD_load_image): Populate above fields.
    (GOMP_OFFLOAD_unload_image): Switch to ORD'th device before freeing
    images, and use fn_descs field from ptx_image_data instead of
    incorrectly using a pointer derived from target_data.

Comments

Thomas Schwinge May 6, 2015, 8:32 a.m. UTC | #1
Hi!

On Fri, 1 May 2015 10:47:19 +0100, Julian Brown <julian@codesourcery.com> wrote:
> This patch fixes PR65904, a double-free error that started occurring
> after recent libgomp changes to the way offload images are registered
> with the runtime.
> 
> Offload images now map all functions/data using just two malloc'ed
> blocks, but the function gomp_free_memmap did not take that into
> account, and treated all mappings as if they had their own blocks (as
> they do if created by gomp_map_vars): so attempting to free the whole
> map at once failed when it hit mappings for an offload image.
> 
> The fix is to split offload-image freeing out of GOMP_offload_unregister
> into a separate function, and call that from gomp_free_memmap for the
> given device before freeing the rest of the memory map.
> 
> The patch also fixes a thinko that was revealed in image unloading in
> the NVPTX backend. Tested for libgomp with PTX offloading.

Confirming that both nvptx (PR65904 fixed) and (emulated) intelmic (no
changes) testing look fine.

> OK for trunk?

> ChangeLog
> 
>     libgomp/

Don't forget to add the PR marker here.

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1687,23 +1695,22 @@ GOMP_OFFLOAD_load_image (int ord, void *target_data,
>  }
>  
>  void
> -GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
> +GOMP_OFFLOAD_unload_image (int ord, void *target_data)
>  {
> -  void **img_header = (void **) target_data;
> -  struct targ_fn_descriptor *targ_fns
> -    = (struct targ_fn_descriptor *) img_header[0];
>    struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
>  
> -  free (targ_fns);
> +  nvptx_attach_host_thread_to_device (ord);

By the way, do we need to lock ptx_devices in
libgomp/plugin/plugin-nvptx.c:nvptx_attach_host_thread_to_device and
libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_openacc_create_thread_data?
(ptx_dev_lock?  If yes, its definition as well as instantiated_devices
should be moved to a more appropriate place, probably?)  Also, several
accesses to instantiated_devices are not locked by ptx_dev_lock but
should be, from my cursory review.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -797,32 +797,79 @@ GOMP_offload_register (void *host_table, enum offload_target_type target_type,
>    gomp_mutex_unlock (&register_lock);
>  }
>  
> -/* This function should be called from every offload image while unloading.
> -   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> -   the target, and TARGET_DATA needed by target plugin.  */
> +/* DEVICEP should be locked on entry, and remains locked on exit.  */

(I'm not a native speaker, but would use what I consider to be more
explicit/precise language: »must be locked« instead of »should be«.  I'll
be happy to learn should they mean the same thing?)

>  
> -void
> -GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
> -			 void *target_data)
> +static void
> +gomp_deoffload_image_from_device (struct gomp_device_descr *devicep,
> +				  void *host_table, void *target_data)
>  {

> +/* This function should be called from every offload image while unloading.

s%from%for%, I think?  (And, s%should%must%, again?)

> +   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
> +   the target, and TARGET_DATA needed by target plugin.  */
> +
> +void
> +GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
> +			 void *target_data)
> +{

> -/* Free address mapping tables.  MM must be locked on entry, and remains locked
> -   on return.  */
> +/* Free address mapping tables for an active device DEVICEP.  This includes
> +   both mapped offload functions/variables, and mapped user data regions.
> +   To be used before shutting a device down: subsequently reinitialising the
> +   device will repopulate the offload image mappings.  */
>  
>  attribute_hidden void
> -gomp_free_memmap (struct splay_tree_s *mem_map)
> +gomp_free_memmap (struct gomp_device_descr *devicep)
>  {
> +  int i;
> +  struct splay_tree_s *mem_map = &devicep->mem_map;
> +
> +  assert (devicep->is_initialized);
> +
> +  gomp_mutex_lock (&devicep->lock);

Need to lock before first access to *devicep?

> +  
> +  /* Unmap offload images that are registered to this device.  */
> +  for (i = 0; i < num_offload_images; i++)
> +    {
> +      struct offload_image_descr *image = &offload_images[i];

Need to take register_lock when accessing offload_images?

> +      if (image->type == devicep->type)
> +        {
> +	  void *host_table = image->host_table;
> +	  void *target_data = image->target_data;
> +
> +	  gomp_deoffload_image_from_device (devicep, host_table, target_data);
> +	}
> +    }
> +
> +  /* Clear other mappings.  */
>    while (mem_map->root)
>      {
>        struct target_mem_desc *tgt = mem_map->root->key.tgt;
>  
>        splay_tree_remove (mem_map, &mem_map->root->key);
> -      free (tgt->array);
> +
> +      if (tgt->list_count == 0)
> +	assert (!tgt->array);
> +      else
> +	free (tgt->array);
> +
>        free (tgt);
>      }
> +
> +  gomp_mutex_unlock (&devicep->lock);
>  }


Grüße,
 Thomas

Patch
diff mbox

commit 14e8e35a494a5a8231ab1a3cad38a2157bca7e4a
Author: Julian Brown <julian@codesourcery.com>
Date:   Thu Apr 30 10:19:58 2015 -0700

    Fix freeing of memory maps during acc shutdown.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..5e0e09c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -777,7 +777,7 @@  extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
+extern void gomp_free_memmap (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 503f8b8..f2c60ec 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -245,9 +245,7 @@  acc_shutdown_1 (acc_device_t d)
 
       if (walk->dev)
 	{
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
+	  gomp_free_memmap (walk->dev);
 
 	  walk->dev = NULL;
 	  walk->base_dev = NULL;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 583ec87..2cc0ae0 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -334,8 +334,10 @@  struct ptx_event
 
 struct ptx_image_data
 {
+  int ord;
   void *target_data;
   CUmodule module;
+  struct targ_fn_descriptor *fn_descs;
   struct ptx_image_data *next;
 };
 
@@ -1625,13 +1627,6 @@  GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   link_ptx (&module, img_header[0]);
 
-  pthread_mutex_lock (&ptx_image_lock);
-  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
-  new_image->target_data = target_data;
-  new_image->module = module;
-  new_image->next = ptx_images;
-  ptx_images = new_image;
-  pthread_mutex_unlock (&ptx_image_lock);
 
   /* The mkoffload utility emits a table of pointers/integers at the start of
      each offload image:
@@ -1652,8 +1647,21 @@  GOMP_OFFLOAD_load_image (int ord, void *target_data,
 
   *target_table = GOMP_PLUGIN_malloc (sizeof (struct addr_pair)
 				      * (fn_entries + var_entries));
-  targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
-				 * fn_entries);
+  if (fn_entries > 0)
+    targ_fns = GOMP_PLUGIN_malloc (sizeof (struct targ_fn_descriptor)
+				   * fn_entries);
+  else
+    targ_fns = NULL;
+
+  pthread_mutex_lock (&ptx_image_lock);
+  new_image = GOMP_PLUGIN_malloc (sizeof (struct ptx_image_data));
+  new_image->ord = ord;
+  new_image->target_data = target_data;
+  new_image->module = module;
+  new_image->fn_descs = targ_fns;
+  new_image->next = ptx_images;
+  ptx_images = new_image;
+  pthread_mutex_unlock (&ptx_image_lock);
 
   for (i = 0; i < fn_entries; i++)
     {
@@ -1687,23 +1695,22 @@  GOMP_OFFLOAD_load_image (int ord, void *target_data,
 }
 
 void
-GOMP_OFFLOAD_unload_image (int tid __attribute__((unused)), void *target_data)
+GOMP_OFFLOAD_unload_image (int ord, void *target_data)
 {
-  void **img_header = (void **) target_data;
-  struct targ_fn_descriptor *targ_fns
-    = (struct targ_fn_descriptor *) img_header[0];
   struct ptx_image_data *image, *prev = NULL, *newhd = NULL;
 
-  free (targ_fns);
+  nvptx_attach_host_thread_to_device (ord);
 
   pthread_mutex_lock (&ptx_image_lock);
   for (image = ptx_images; image != NULL;)
     {
       struct ptx_image_data *next = image->next;
 
-      if (image->target_data == target_data)
+      if (image->ord == ord && image->target_data == target_data)
 	{
 	  cuModuleUnload (image->module);
+	  if (image->fn_descs)
+	    free (image->fn_descs);
 	  free (image);
 	  if (prev)
 	    prev->next = next;
diff --git a/libgomp/target.c b/libgomp/target.c
index d8da783..3ac674c 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -178,6 +178,7 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   tgt->list_count = mapnum;
   tgt->refcount = 1;
   tgt->device_descr = devicep;
+  tgt->array = NULL;
 
   if (mapnum == 0)
     return tgt;
@@ -275,7 +276,6 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   if (is_target)
     tgt_size = mapnum * sizeof (void *);
 
-  tgt->array = NULL;
   if (not_found_cnt)
     {
       tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
@@ -797,32 +797,79 @@  GOMP_offload_register (void *host_table, enum offload_target_type target_type,
   gomp_mutex_unlock (&register_lock);
 }
 
-/* This function should be called from every offload image while unloading.
-   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
-   the target, and TARGET_DATA needed by target plugin.  */
+/* DEVICEP should be locked on entry, and remains locked on exit.  */
 
-void
-GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
-			 void *target_data)
+static void
+gomp_deoffload_image_from_device (struct gomp_device_descr *devicep,
+				  void *host_table, void *target_data)
 {
   void **host_func_table = ((void ***) host_table)[0];
   void **host_funcs_end  = ((void ***) host_table)[1];
   void **host_var_table  = ((void ***) host_table)[2];
   void **host_vars_end   = ((void ***) host_table)[3];
-  int i;
 
   /* The func table contains only addresses, the var table contains addresses
      and corresponding sizes.  */
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
+  int j;
+
+  devicep->unload_image_func (devicep->target_id, target_data);
+
+  /* Remove mapping from splay tree.  */
+  struct splay_tree_key_s k;
+  splay_tree_key node = NULL;
+  if (num_funcs > 0)
+    {
+      k.host_start = (uintptr_t) host_func_table[0];
+      k.host_end = k.host_start + 1;
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+  else if (num_vars > 0)
+    {
+      k.host_start = (uintptr_t) host_var_table[0];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[1];
+      node = splay_tree_lookup (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_funcs; j++)
+    {
+      k.host_start = (uintptr_t) host_func_table[j];
+      k.host_end = k.host_start + 1;
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  for (j = 0; j < num_vars; j++)
+    {
+      k.host_start = (uintptr_t) host_var_table[j * 2];
+      k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+      splay_tree_remove (&devicep->mem_map, &k);
+    }
+
+  if (node)
+    {
+      free (node->tgt);
+      free (node);
+    }
+}
+
+/* This function should be called from every offload image while unloading.
+   It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
+   the target, and TARGET_DATA needed by target plugin.  */
+
+void
+GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
+			 void *target_data)
+{
+  int i;
 
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
   for (i = 0; i < num_devices; i++)
     {
-      int j;
       struct gomp_device_descr *devicep = &devices[i];
+      
       gomp_mutex_lock (&devicep->lock);
       if (devicep->type != target_type || !devicep->is_initialized)
 	{
@@ -830,44 +877,7 @@  GOMP_offload_unregister (void *host_table, enum offload_target_type target_type,
 	  continue;
 	}
 
-      devicep->unload_image_func (devicep->target_id, target_data);
-
-      /* Remove mapping from splay tree.  */
-      struct splay_tree_key_s k;
-      splay_tree_key node = NULL;
-      if (num_funcs > 0)
-	{
-	  k.host_start = (uintptr_t) host_func_table[0];
-	  k.host_end = k.host_start + 1;
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-      else if (num_vars > 0)
-	{
-	  k.host_start = (uintptr_t) host_var_table[0];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[1];
-	  node = splay_tree_lookup (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_funcs; j++)
-	{
-	  k.host_start = (uintptr_t) host_func_table[j];
-	  k.host_end = k.host_start + 1;
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      for (j = 0; j < num_vars; j++)
-	{
-	  k.host_start = (uintptr_t) host_var_table[j * 2];
-	  k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
-	  splay_tree_remove (&devicep->mem_map, &k);
-	}
-
-      if (node)
-	{
-	  free (node->tgt);
-	  free (node);
-	}
-
+      gomp_deoffload_image_from_device (devicep, host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
     }
 
@@ -903,20 +913,50 @@  gomp_init_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = true;
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
+/* Free address mapping tables for an active device DEVICEP.  This includes
+   both mapped offload functions/variables, and mapped user data regions.
+   To be used before shutting a device down: subsequently reinitialising the
+   device will repopulate the offload image mappings.  */
 
 attribute_hidden void
-gomp_free_memmap (struct splay_tree_s *mem_map)
+gomp_free_memmap (struct gomp_device_descr *devicep)
 {
+  int i;
+  struct splay_tree_s *mem_map = &devicep->mem_map;
+
+  assert (devicep->is_initialized);
+
+  gomp_mutex_lock (&devicep->lock);
+  
+  /* Unmap offload images that are registered to this device.  */
+  for (i = 0; i < num_offload_images; i++)
+    {
+      struct offload_image_descr *image = &offload_images[i];
+      if (image->type == devicep->type)
+        {
+	  void *host_table = image->host_table;
+	  void *target_data = image->target_data;
+
+	  gomp_deoffload_image_from_device (devicep, host_table, target_data);
+	}
+    }
+
+  /* Clear other mappings.  */
   while (mem_map->root)
     {
       struct target_mem_desc *tgt = mem_map->root->key.tgt;
 
       splay_tree_remove (mem_map, &mem_map->root->key);
-      free (tgt->array);
+
+      if (tgt->list_count == 0)
+	assert (!tgt->array);
+      else
+	free (tgt->array);
+
       free (tgt);
     }
+
+  gomp_mutex_unlock (&devicep->lock);
 }
 
 /* This function de-initializes the target device, specified by DEVICEP.