diff mbox series

[committed,libgomp,nvptx] Remove hard-coded const in nvptx_open_device

Message ID 20180808142814.GA22476@delia
State New
Headers show
Series [committed,libgomp,nvptx] Remove hard-coded const in nvptx_open_device | expand

Commit Message

Tom de Vries Aug. 8, 2018, 2:28 p.m. UTC
Hi,

CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver
api version 6.0 and higher.

Currently nvptx_open_device uses a hard-coded constant instead.

This patch fixes that by:
- defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded
  constant at toplevel, if not present in cuda.h, and
- using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

Committed to trunk.

Thanks,
- Tom

[libgomp, nvptx] Remove hard-coded const in nvptx_open_device

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c
	(CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define.
	(nvptx_open_device): Use
	CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR.

---
 libgomp/plugin/plugin-nvptx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff mbox series

Patch

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index e2ea542049b0..589d6596cc2f 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -51,6 +51,7 @@ 
 
 #if CUDA_VERSION < 6000
 extern CUresult cuGetErrorString (CUresult, const char **);
+#define CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR 82
 #endif
 
 #define DO_PRAGMA(x) _Pragma (#x)
@@ -741,9 +742,11 @@  nvptx_open_device (int n)
 		  &pi, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK, dev);
   ptx_dev->regs_per_block = pi;
 
-  /* CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82 is defined only
+  /* CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined only
      in CUDA 6.0 and newer.  */
-  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi, 82, dev);
+  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
+			 CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR,
+			 dev);
   /* Fallback: use limit of registers per block, which is usually equal.  */
   if (r == CUDA_ERROR_INVALID_VALUE)
     pi = ptx_dev->regs_per_block;