diff mbox series

[og12] '-foffload-memory=pinned' using offloading device interfaces (was: -foffload-memory=pinned)

Message ID 878rf9xbd0.fsf@euler.schwinge.homeip.net
State New
Headers show
Series [og12] '-foffload-memory=pinned' using offloading device interfaces (was: -foffload-memory=pinned) | expand

Commit Message

Thomas Schwinge April 3, 2023, 2:56 p.m. UTC
Hi!

On 2023-02-13T15:20:07+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 13/02/2023 14:38, Thomas Schwinge wrote:
>> On 2022-03-08T11:30:55+0000, Hafiz Abid Qadeer <abidh@codesourcery.com> wrote:
>>> From: Andrew Stubbs <ams@codesourcery.com>
>>>
>>> Add a new option.  It will be used in follow-up patches.
>>
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>
>>> +@option{-foffload-memory=pinned} forces all host memory to be pinned (this
>>> +mode may require the user to increase the ulimit setting for locked memory).
>>
>> So, this is currently implemented via 'mlockall', which, as discussed,
>> (a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it
>> meant to achieve (because it doesn't register the page-locked memory with
>> the GPU driver).
>> [...]
>> As '-foffload-memory=pinned', per the name
>> of the option, concerns itself with memory used in offloading but not
>> host execution generally, why are we actually attempting to "[force] all
>> host memory to be pinned" -- why not just the memory that's being used
>> with offloading?  That is, if '-foffload-memory=pinned' is set, register
>> as page-locked with the GPU driver all memory that appears in OMP
>> offloading data regions, such as OpenMP 'target' 'map' clauses etc.  That
>> way, this is directed at the offloading data transfers, as itended, but
>> at the same time we don't "waste" page-locked memory for generic host
>> memory allocations.  What do you think -- you, who've spent a lot more
>> time on this topic than I have, so it's likely possible that I fail to
>> realize some "details"?
>
> The main reason it is the way it is is because in general it's not
> possible to know what memory is going to be offloaded at the time it is
> allocated (and stack/static memory is never allocated that way).
>
> If there's a way to pin it after the fact then maybe that's not a
> terrible idea?  [...]

I've now pushed to devel/omp/gcc-12 branch my take on this in
commit 43095690ea519205bf56fc148b346edaa43e0f0f
"'-foffload-memory=pinned' using offloading device interfaces", and for
changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches":
commit 694bbd399c1323975b4a6735646e46c6914de63d
"'-foffload-memory=pinned' using offloading device interfaces for non-contiguous array support",
see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

From 694bbd399c1323975b4a6735646e46c6914de63d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 30 Mar 2023 10:08:12 +0200
Subject: [PATCH 2/2] '-foffload-memory=pinned' using offloading device
 interfaces for non-contiguous array support

Changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".

	libgomp/
	* target.c (gomp_map_vars_internal)
	<non-contiguous array support>: Handle 'always_pinned_mode'.
---
 libgomp/ChangeLog.omp |  3 +++
 libgomp/target.c      | 55 +++++++++++++++++++++++++++++++++++++++----
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 1b02c057562..09cf9c6f3c1 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@ 
 2023-04-03  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* target.c (gomp_map_vars_internal)
+	<non-contiguous array support>: Handle 'always_pinned_mode'.
+
 	* libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add
 	'struct goacc_asyncqueue *' formal parameter.
 	(GOMP_OFFLOAD_page_locked_host_register)
diff --git a/libgomp/target.c b/libgomp/target.c
index ed2fc09cf44..38eb5d1aa5b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2087,6 +2087,23 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      k->dynamic_refcount = 0;
 		      k->aux = NULL;
 		      k->tgt_offset = tgt_size;
+		      k->page_locked_host_p = false;
+		      if (always_pinned_mode)
+			{
+			  void *ptr = (void *) k->host_start;
+			  size_t size = k->host_end - k->host_start;
+			  int page_locked_host_p = 0;
+			  if (size != 0)
+			    page_locked_host_p = gomp_page_locked_host_register_dev
+			      (devicep, ptr, size, kind & typemask);
+			  if (page_locked_host_p < 0)
+			    {
+			      gomp_mutex_unlock (&devicep->lock);
+			      exit (EXIT_FAILURE);
+			    }
+			  if (page_locked_host_p)
+			    k->page_locked_host_p = true;
+			}
 
 		      tgt_size += nca->data_row_size;
 
@@ -2120,16 +2137,44 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		 accelerator side ptrblock and copy it in.  */
 	      if (nca->ptrblock_size)
 		{
-		  void *ptrblock = gomp_malloc (nca->ptrblock_size);
+		  void *ptrblock;
+		  if (always_pinned_mode)
+		    {
+		      ptrblock
+			= gomp_page_locked_host_alloc_dev (devicep,
+							   nca->ptrblock_size,
+							   false);
+		      if (!ptrblock)
+			{
+			  gomp_mutex_unlock (&devicep->lock);
+			  exit (EXIT_FAILURE);
+			}
+		    }
+		  else
+		    ptrblock = gomp_malloc (nca->ptrblock_size);
 		  goacc_noncontig_array_create_ptrblock
 		    (nca, ptrblock, target_ptrblock);
 		  gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
 				      nca->ptrblock_size, false, cbufp);
-		  if (aq)
-		    /* Free once the transfer has completed.  */
-		    devicep->openacc.async.queue_callback_func (aq, free, ptrblock);
+		  if (always_pinned_mode)
+		    {
+		      if (!gomp_page_locked_host_free_dev (devicep,
+							   ptrblock,
+							   aq))
+			{
+			  gomp_mutex_unlock (&devicep->lock);
+			  exit (EXIT_FAILURE);
+			}
+		    }
 		  else
-		    free (ptrblock);
+		    {
+		      if (aq)
+			/* Free once the transfer has completed.  */
+			devicep->openacc.async.queue_callback_func
+			  (aq, free, ptrblock);
+		      else
+			free (ptrblock);
+		    }
 		}
 	    }
 	}
-- 
2.25.1