[og7,libgomp,openacc,nvptx,committed] Don't select too many workers

Message ID 7e59359d-1c02-17be-1bd3-961260861890@mentor.com
State New

Commit Message

Tom de Vries May 4, 2018, 12:32 p.m. UTC
Hi,

On the og7 branch, for Titan V, the polybench testsuite test cases
covariance and lu fail with this error message:
...
libgomp: The Nvidia accelerator has insufficient resources to launch 
'x$_omp_fn$0' with num_workers = 27 and vector_length = 32; recompile 
the program with 'num_workers = x and vector_length = y' on that 
offloaded region or '-fopenacc-dim=-:x:y' where x * y <= 768.
...

The problem is that num_workers is chosen by libgomp here, so instead
of reporting an error, libgomp should reduce num_workers to a value the
device can launch.
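
As a rough illustration of the intended behaviour, here is a standalone
sketch. It is not the libgomp code: choose_workers is a hypothetical
helper, and 864 is an assumed default thread budget, picked only because
it yields 27 workers at vector_length = 32. The 768 limit is taken from
the error message above.

```c
#include <assert.h>

/* Same definition as in gcc/system.h.  */
#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))

/* Hypothetical helper, not part of libgomp: pick a worker count such
   that workers * vectors does not exceed the per-function limit.
   threads_per_block is the default budget; max_threads_per_block is
   the limit reported for the offloaded function.  */
static int
choose_workers (int threads_per_block, int max_threads_per_block,
		int vectors)
{
  assert (vectors > 0);
  return MIN (threads_per_block, max_threads_per_block) / vectors;
}
```

With the assumed inputs, choose_workers (864, 768, 32) gives 24 workers
instead of 27, and 24 * 32 = 768 satisfies the x * y <= 768 constraint.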

Fixed by this patch.

Built on x86_64 with nvptx accelerator and tested libgomp.

Committed to og7 branch.

Thanks,
- Tom

Patch

[libgomp, openacc, nvptx] Don't select too many workers

2018-05-04  Tom de Vries  <tom@codesourcery.com>

	PR libgomp/85649
	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Choose num_workers such that device has sufficient
	resources.

---
 libgomp/plugin/plugin-nvptx.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3c00555..e4d87f5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -189,6 +189,12 @@  cuda_error (CUresult r)
   return desc;
 }
 
+/* From gcc/system.h.  */
+#undef MIN
+#undef MAX
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
+
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -802,7 +808,8 @@  nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
     {
       int vectors = dims[GOMP_DIM_VECTOR] > 0
 	? dims[GOMP_DIM_VECTOR] : warp_size;
-      int workers = threads_per_block / vectors;
+      int workers
+	= MIN (threads_per_block, targ_fn->max_threads_per_block) / vectors;
 
       for (i = 0; i != GOMP_DIM_MAX; i++)
 	if (!dims[i])