
[nvptx] Add -mptx=3.1/6.3

Message ID 20210512141046.GA8844@delia
State New

Commit Message

Tom de Vries May 12, 2021, 2:10 p.m. UTC
Hi,

Add nvptx option -mptx that sets the ptx ISA version.  This is currently
hardcoded to 3.1.
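
As a rough illustration of the intended behaviour (not the patch itself), the
standalone C sketch below maps a -mptx= argument to the new enum and picks the
matching .version line.  The helpers parse_ptx_version and
ptx_version_directive are hypothetical names used only here; in GCC the option
parsing is generated from nvptx.opt.

```c
#include <string.h>

/* Mirrors the enum added to nvptx-opts.h.  */
enum ptx_version
{
  PTX_VERSION_3_1,
  PTX_VERSION_6_3
};

/* Hypothetical helper: translate the argument of -mptx= into the enum.
   Returns 0 on success and -1 for an unknown version string.  */
static int
parse_ptx_version (const char *arg, enum ptx_version *out)
{
  if (strcmp (arg, "3.1") == 0)
    *out = PTX_VERSION_3_1;
  else if (strcmp (arg, "6.3") == 0)
    *out = PTX_VERSION_6_3;
  else
    return -1;
  return 0;
}

/* Hypothetical helper: the .version line that nvptx_file_start would
   print for version V.  */
static const char *
ptx_version_directive (enum ptx_version v)
{
  return v >= PTX_VERSION_6_3 ? "\t.version\t6.3\n" : "\t.version\t3.1\n";
}
```

The >= comparison follows the TARGET_PTX_6_3 macro, so a later, higher enum
value would still select the 6.3 path unless it gets its own branch.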

Tested libgomp on x86_64-linux with nvptx accelerator, both with default set to
3.1 and 6.3.

Any comments?

Thanks,
- Tom

[nvptx] Add -mptx=3.1/6.3

gcc/ChangeLog:

2021-05-12  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx-opts.h (enum ptx_version): New enum.
	* config/nvptx/nvptx.c (nvptx_file_start): Print .version according
	to ptx_version_option.
	* config/nvptx/nvptx.h (TARGET_PTX_6_3): Define.
	* config/nvptx/nvptx.md (define_insn "nvptx_shuffle<mode>")
	(define_insn "nvptx_vote_ballot"): Use sync variant for
	TARGET_PTX_6_3.
	* config/nvptx/nvptx.opt (ptx_version): Add enum.
	(mptx): Add option.
	* doc/invoke.texi (Nvidia PTX Options): Add mptx item.

---
 gcc/config/nvptx/nvptx-opts.h |  6 ++++++
 gcc/config/nvptx/nvptx.c      |  5 ++++-
 gcc/config/nvptx/nvptx.h      |  2 ++
 gcc/config/nvptx/nvptx.md     | 14 ++++++++++++--
 gcc/config/nvptx/nvptx.opt    | 14 ++++++++++++++
 gcc/doc/invoke.texi           |  6 ++++++
 6 files changed, 44 insertions(+), 3 deletions(-)

Comments

Tobias Burnus May 12, 2021, 3:50 p.m. UTC | #1
Hi,

On 12.05.21 16:10, Tom de Vries wrote:
> Add nvptx option -mptx that sets the ptx ISA version.  This is currently
> hardcoded to 3.1.
> Tested libgomp on x86_64-linux with nvptx accelerator, both with default set to
> 3.1 and 6.3.
> Any comments?

:-)

ISA 3.1 = CUDA 5 (supporting sm_10 to sm_{30,35})
ISA 6.3 = CUDA 10.0 (supporting sm_10 to sm_{70,72,75})

I think it is useful, not least for moving to newer -misa values (beyond
sm_30 and sm_35), which require a newer PTX ISA version for some
features. But a lot of the new features (like .alias) are
generic and also very useful.

It also permits using the .alias feature for
PR 97102 (see attached patch).

There is one typo in the doc:
'The default PTX version is sm_3.1.'
There is a spurious 'sm_'.

Should there be a fixme/missed optimization
comment regarding the lane mask for the
.sync variants? Or is 0xffffffff fine for the
foreseeable future and the comment is not needed?

  * * *

The other question is how to move forward from there,
i.e. when to move to requiring CUDA 10+ (PTX 6.3) by default,
permitting -mptx=3.1 only as a legacy mode?

And how best to test this in the testsuite? Namely,
should we iterate through both ISA modes? Or specify
manually in some tests? Just test the default regularly?

How to handle sm_xx which are not supported by the
default/specified -misa=sm_...? (Error out?)
And when/whether to move to a higher sm_... value by default?

(I have not checked, but it seems that sm_70+ is the largest
step, while sm_70 is not yet widely used; hence, sticking to
sm_35 for a while is probably fine.)

Tobias

Tom de Vries May 12, 2021, 4:33 p.m. UTC | #2
On 5/12/21 5:50 PM, Tobias Burnus wrote:
> Hi,
> 
> On 12.05.21 16:10, Tom de Vries wrote:
>> Add nvptx option -mptx that sets the ptx ISA version.  This is currently
>> hardcoded to 3.1.
>> Tested libgomp on x86_64-linux with nvptx accelerator, both with
>> default set to
>> 3.1 and 6.3.
>> Any comments?
> 
> :-)
> 
> ISA 3.1 = CUDA 5 (supporting sm_10 to sm_{30,35})
> ISA 6.3 = CUDA 10.0 (supporting sm_10 to sm_{70,72,75})
> 
> I think it is useful – both to move to new -misa (beyond
> sm_30 and sm_35) which require a newer ISA for some
> features.

Yes, I expect that we want to add sm_70 to take advantage of cas.b16.

> But a lot of new features (like .alias) are
> generic and also very useful.
> 
> It also permits using the .alias feature for
> PR 97102 (see attached patch).
> 

Ack.

> There is one typo in the doc:
> 'The default PTX version is sm_3.1.'
> There is a spurious 'sm_'.
> 

Fixed, thanks.

> Should there be a fixme/missed optimization
> comment regarding the lane mask for the
> .sync variants? Or is 0xffffffff fine for the
> foreseeable future and the comment is not needed?
> 

I think it's fine like this.
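
For reference, the last operand of the .sync variants is a 32-bit membermask
with one bit per lane, so 0xffffffff names all 32 lanes of a warp.  If a
narrower mask were ever wanted, it could be computed along these lines (a
sketch only; lane_mask_for_first_lanes is a hypothetical helper and not part
of the patch):

```c
#include <stdint.h>

/* Hypothetical helper: membermask covering the first NLANES lanes of a
   32-lane warp.  Values of NLANES above 32 are treated as 32.  */
static uint32_t
lane_mask_for_first_lanes (unsigned nlanes)
{
  if (nlanes >= 32)
    return UINT32_C (0xffffffff);	/* Avoid an undefined 32-bit shift.  */
  return (UINT32_C (1) << nlanes) - 1;
}
```

With nlanes equal to 32 this yields the constant the patch hardcodes.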

> * * *
> 
> The other question is how to move forward from there,
> i.e. when to move requiring CUDA 10+ (6.3) by default,
> permitting -mptx=3.1 only as legacy mode?
> 

Today I filed PR 100565 - "[nvptx] Need configure options for misa
default". So, my thinking is that once we can set the -misa and -mptx
defaults using configure options, changing the default in the source
code should have less of an impact.

Anyway, I think a reasonable way of dealing with this is to follow the
latest stable CUDA release: if that stops supporting something, the
default should move on to accommodate that.

> And how best to test this in the testsuite? Namely,
> should we iterate through both ISA modes? Or specify
> manually in some tests? Just test the default regularly?
> 

I would probably go for some default config that works well for my
hardware and driver and test that, and then once in a while test other
configurations.

> How to handle sm_xx which are not supported by the
> default/specified -misa=sm_...? (Error out?)

I think we're ok for the current matrix.  I guess with sm_70 that'll be
different.  I'd say the solution there will be dictated by what the
error mode will look like otherwise.
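
One possible shape for such a check, should it become necessary, is sketched
below.  It assumes the pairing quoted earlier in this thread (PTX 3.1 up to
sm_35, PTX 6.3 up to sm_75); both function names are hypothetical, and the
check only looks at the upper bound, not at whether a given sm_XX number
exists at all.

```c
#include <stdbool.h>

/* Mirrors the enum added to nvptx-opts.h.  */
enum ptx_version
{
  PTX_VERSION_3_1,
  PTX_VERSION_6_3
};

/* Hypothetical helper: highest sm_XX number accepted by PTX version V,
   using the figures quoted in this thread.  */
static int
max_sm_for_ptx_version (enum ptx_version v)
{
  return v >= PTX_VERSION_6_3 ? 75 : 35;
}

/* Hypothetical check: true if -misa=sm_<SM> can be combined with V.  */
static bool
sm_allowed_for_ptx_version (int sm, enum ptx_version v)
{
  return sm <= max_sm_for_ptx_version (v);
}
```

A caller could error out when the check fails, or raise the PTX version
automatically; which of the two is preferable is the open question here.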

> And when/whether to move to a higher sm_... value by default?
> 
> (I have not checked, but it seems that sm_70+ is the largest
> step, while sm_70 is not yet widely used; hence, sticking to
> sm_35 for a while is probably fine.)
> 

sm_35 is still supported by CUDA 11.3; I'm hoping the same holds for the
next one (11.4 IIUC).

Thanks,
- Tom

Patch

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index ce88245955b..bfa926ef0f7 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -26,5 +26,11 @@  enum ptx_isa
   PTX_ISA_SM35
 };
 
+enum ptx_version
+{
+  PTX_VERSION_3_1,
+  PTX_VERSION_6_3
+};
+
 #endif
 
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 7a7a9130e84..ebbfa921589 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5309,7 +5309,10 @@  static void
 nvptx_file_start (void)
 {
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
-  fputs ("\t.version\t3.1\n", asm_out_file);
+  if (TARGET_PTX_6_3)
+    fputs ("\t.version\t6.3\n", asm_out_file);
+  else
+    fputs ("\t.version\t3.1\n", asm_out_file);
   if (TARGET_SM35)
     fputs ("\t.target\tsm_35\n", asm_out_file);
   else
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 2451703e77f..fdaacdd72d8 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -98,6 +98,8 @@ 
 
 #define TARGET_SM35 (ptx_isa_option >= PTX_ISA_SM35)
 
+#define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
+
 /* Registers.  Since ptx is a virtual target, we just define a few
    hard registers for special purposes and leave pseudos unallocated.
    We have to have some available hard registers, to keep gcc setup
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 0f15609ee4b..00bb8fea821 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1452,14 +1452,24 @@ 
 		 (match_operand:SI 3 "const_int_operand" "n")]
 		  UNSPEC_SHUFFLE))]
   ""
-  "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;")
+  {
+    if (TARGET_PTX_6_3)
+      return "%.\\tshfl.sync%S3.b32\\t%0, %1, %2, 31, 0xffffffff;";
+    else
+      return "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;";
+  })
 
 (define_insn "nvptx_vote_ballot"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
 	(unspec:SI [(match_operand:BI 1 "nvptx_register_operand" "R")]
 		   UNSPEC_VOTE_BALLOT))]
   ""
-  "%.\\tvote.ballot.b32\\t%0, %1;")
+  {
+    if (TARGET_PTX_6_3)
+      return "%.\\tvote.sync.ballot.b32\\t%0, %1, 0xffffffff;";
+    else
+      return "%.\\tvote.ballot.b32\\t%0, %1;";
+  })
 
 ;; Patterns for OpenMP SIMD-via-SIMT lowering
 
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 51363e4e276..468c6cafd57 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -65,3 +65,17 @@  Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 misa=
 Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.
+
+Enum
+Name(ptx_version) Type(int)
+Known PTX versions (for use with the -mptx= option):
+
+EnumValue
+Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
+
+EnumValue
+Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
+
+mptx=
+Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
+Specify the ptx version to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40cacc6f8e7..61e879d0067 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26288,6 +26288,12 @@  Generate code for given the specified PTX ISA (e.g.@: @samp{sm_35}).  ISA
 strings must be lower-case.  Valid ISA strings include @samp{sm_30} and
 @samp{sm_35}.  The default ISA is sm_35.
 
+@item -mptx=@var{version-string}
+@opindex mptx
+Generate code for given the specified PTX version (e.g.@: @samp{6.3}).
+Valid version strings include @samp{3.1} and @samp{6.3}.  The default PTX
+version is sm_3.1.
+
 @item -mmainkernel
 @opindex mmainkernel
 Link in code for a __main kernel.  This is for stand-alone instead of