Message ID | 20210512141046.GA8844@delia |
---|---|
State | New |
Series | [nvptx] Add -mptx=3.1/6.3 |
Hi,

On 12.05.21 16:10, Tom de Vries wrote:
> Add nvptx option -mptx that sets the ptx ISA version. This is currently
> hardcoded to 3.1.
> Tested libgomp on x86_64-linux with nvptx accelerator, both with default set to
> 3.1 and 6.3.
> Any comments?

:-)

ISA 3.1 = CUDA 5 (supporting sm_10 to sm_{30,35})
ISA 6.3 = CUDA 10.0 (supporting sm_10 to sm_{70,72,75})

I think it is useful – both to move to new -misa (beyond
sm_30 and sm_35) which require a newer ISA for some
features. But a lot of new features (like .alias) are
generic and also very useful.

It also permits using the .alias feature for
PR 97102 (see attached patch).

There is one typo in the doc:
'The default PTX version is sm_3.1.'
There is a spurious 'sm_'.

Should there be a fixme/missed optimization
comment regarding the lane mask for the
.sync variants? Or is 0xffffffff fine for the
foreseeable future and the comment is not needed?

* * *

The other question is how to move forward from there,
i.e. when to move to requiring CUDA 10+ (6.3) by default,
permitting -mptx=3.1 only as legacy mode?

And how to test this best in the testsuite? Namely,
should we iterate through both ISA modes? Or specify
manually in some tests? Just test the default regularly?

How to handle sm_xx which are not supported by the
default/specified -misa=sm_...? (Error out?)

And when/whether to move to a higher sm_... value by default?

(I have not checked but it seems that sm_70+ is the largest
step but sm_70 is not yet widely used; hence, sticking to
sm_35 for a while is probably fine.)

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
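As background for the lane-mask question above: the .sync variants take a member mask with one bit per lane of a 32-lane warp, so 0xffffffff names every lane. The following is a minimal C sketch of how a narrower mask could be computed. `lane_mask_first_n` is a hypothetical helper written for illustration only; it is not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in GCC): member mask covering the first N
   lanes of a 32-lane warp.  N == 32 yields the 0xffffffff mask that
   the patch hardcodes for the .sync instructions.  */
static uint32_t
lane_mask_first_n (unsigned n)
{
  assert (n <= 32);
  /* Shifting a 32-bit value by 32 is undefined in C, so the full-warp
     case is handled separately.  */
  return n == 32 ? UINT32_C (0xffffffff) : (UINT32_C (1) << n) - 1;
}
```

A compiler that could prove only part of a warp is active might pass a narrower mask; whether that is worth a fixme comment is the question raised above.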
On 5/12/21 5:50 PM, Tobias Burnus wrote:
> Hi,
>
> On 12.05.21 16:10, Tom de Vries wrote:
>> Add nvptx option -mptx that sets the ptx ISA version. This is currently
>> hardcoded to 3.1.
>> Tested libgomp on x86_64-linux with nvptx accelerator, both with
>> default set to
>> 3.1 and 6.3.
>> Any comments?
>
> :-)
>
> ISA 3.1 = CUDA 5 (supporting sm_10 to sm_{30,35})
> ISA 6.3 = CUDA 10.0 (supporting sm_10 to sm_{70,72,75})
>
> I think it is useful – both to move to new -misa (beyond
> sm_30 and sm_35) which require a newer ISA for some
> features.

Yes, I expect that we want to add sm_70 to take advantage of cas.b16.

> But a lot of new features (like .alias) are
> generic and also very useful.
>
> It also permits using the .alias feature for
> PR 97102 (see attached patch).
>

Ack.

> There is one typo in the doc:
> 'The default PTX version is sm_3.1.'
> There is a spurious 'sm_'.
>

Fixed, thanks.

> Should there be a fixme/missed optimization
> comment regarding the lane mask for the
> .sync variants? Or is 0xffffffff fine for the
> foreseeable future and the comment is not needed?
>

I think it's fine like this.

> * * *
>
> The other question is how to move forward from there,
> i.e. when to move to requiring CUDA 10+ (6.3) by default,
> permitting -mptx=3.1 only as legacy mode?
>

Today I filed PR 100565 - "[nvptx] Need configure options for misa default".

So, my thinking is that once we can set the -misa and -mptx defaults using
configure options, changing the default in the source code should have less
of an impact.

Anyway, I think a reasonable way of dealing with this is to follow the
latest stable CUDA release: if that stops supporting something, the default
should move on to accommodate that.

> And how to test this best in the testsuite? Namely,
> should we iterate through both ISA modes? Or specify
> manually in some tests? Just test the default regularly?
>

I would probably go for some default config that works well for my hardware
and driver and test that, and then once in a while test other
configurations.

> How to handle sm_xx which are not supported by the
> default/specified -misa=sm_...? (Error out?)

I think we're ok for the current matrix. I guess with sm_70 that'll be
different. I'd say the solution there will be dictated by what the error
mode will look like otherwise.

> And when/whether to move to a higher sm_... value by default?
>
> (I have not checked but it seems that sm_70+ is the largest
> step but sm_70 is not yet widely used; hence, sticking to
> sm_35 for a while is probably fine.)
>

sm_35 is still supported by CUDA 11.3; I'm hoping the same holds for the
next one (11.4, IIUC).

Thanks,
- Tom
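One possible shape for the "error out?" case discussed above is a small compatibility check between the selected PTX version and the requested sm_NN. The sketch below is an assumption for illustration, not code from the patch: all names are hypothetical, and the upper bounds (sm_35 for PTX 3.1, sm_75 for PTX 6.3) are the ones quoted earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the patch itself defines enum ptx_version.  */
enum ptx_version_sketch { SKETCH_PTX_3_1, SKETCH_PTX_6_3 };

/* Highest sm_NN expressible with a given PTX version, using the
   bounds quoted in the thread (3.1: sm_35, 6.3: sm_75).  */
static int
max_sm_for_ptx (enum ptx_version_sketch v)
{
  assert (v == SKETCH_PTX_3_1 || v == SKETCH_PTX_6_3);
  return v == SKETCH_PTX_6_3 ? 75 : 35;
}

/* Simplification: treats sm_NN as a numeric range starting at sm_10
   and does not check that NN names a real architecture.  */
static bool
sm_supported_p (enum ptx_version_sketch v, int sm)
{
  return sm >= 10 && sm <= max_sm_for_ptx (v);
}
```

With such a predicate, option handling could diagnose a combination like -mptx=3.1 with sm_70 up front instead of leaving the failure to the PTX assembler; as noted above, which behavior is preferable depends on what the error looks like otherwise.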
diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index ce88245955b..bfa926ef0f7 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -26,5 +26,11 @@ enum ptx_isa
   PTX_ISA_SM35
 };
 
+enum ptx_version
+{
+  PTX_VERSION_3_1,
+  PTX_VERSION_6_3
+};
+
 #endif
 
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 7a7a9130e84..ebbfa921589 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5309,7 +5309,10 @@ static void
 nvptx_file_start (void)
 {
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
-  fputs ("\t.version\t3.1\n", asm_out_file);
+  if (TARGET_PTX_6_3)
+    fputs ("\t.version\t6.3\n", asm_out_file);
+  else
+    fputs ("\t.version\t3.1\n", asm_out_file);
   if (TARGET_SM35)
     fputs ("\t.target\tsm_35\n", asm_out_file);
   else
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 2451703e77f..fdaacdd72d8 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -98,6 +98,8 @@
 
 #define TARGET_SM35 (ptx_isa_option >= PTX_ISA_SM35)
 
+#define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
+
 /* Registers. Since ptx is a virtual target, we just define a few
    hard registers for special purposes and leave pseudos unallocated.
    We have to have some available hard registers, to keep gcc setup
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 0f15609ee4b..00bb8fea821 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1452,14 +1452,24 @@
                     (match_operand:SI 3 "const_int_operand" "n")]
                    UNSPEC_SHUFFLE))]
   ""
-  "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;")
+  {
+    if (TARGET_PTX_6_3)
+      return "%.\\tshfl.sync%S3.b32\\t%0, %1, %2, 31, 0xffffffff;";
+    else
+      return "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;";
+  })
 
 (define_insn "nvptx_vote_ballot"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
         (unspec:SI [(match_operand:BI 1 "nvptx_register_operand" "R")]
                    UNSPEC_VOTE_BALLOT))]
   ""
-  "%.\\tvote.ballot.b32\\t%0, %1;")
+  {
+    if (TARGET_PTX_6_3)
+      return "%.\\tvote.sync.ballot.b32\\t%0, %1, 0xffffffff;";
+    else
+      return "%.\\tvote.ballot.b32\\t%0, %1;";
+  })
 
 ;; Patterns for OpenMP SIMD-via-SIMT lowering
 
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 51363e4e276..468c6cafd57 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -65,3 +65,17 @@ Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 misa=
 Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.
+
+Enum
+Name(ptx_version) Type(int)
+Known PTX versions (for use with the -mptx= option):
+
+EnumValue
+Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
+
+EnumValue
+Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
+
+mptx=
+Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
+Specify the version of the ptx version to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40cacc6f8e7..61e879d0067 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26288,6 +26288,12 @@ Generate code for given the specified PTX ISA (e.g.@: @samp{sm_35}). ISA
 strings must be lower-case. Valid ISA strings include @samp{sm_30} and
 @samp{sm_35}. The default ISA is sm_35.
 
+@item -mptx=@var{version-string}
+@opindex mptx
+Generate code for given the specified PTX version (e.g.@: @samp{6.3}).
+Valid version strings include @samp{3.1} and @samp{6.3}. The default PTX
+version is sm_3.1.
+
 @item -mmainkernel
 @opindex mmainkernel
 Link in code for a __main kernel. This is for stand-alone instead of
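
For readers who want to try the version gating outside of GCC, here is a stand-alone C sketch of the pattern the patch uses in nvptx.h and nvptx.md. The enum mirrors the one added to nvptx-opts.h; everything else (`target_ptx_6_3_p`, `vote_ballot_mnemonic`, `needs_member_mask_p`) is a hypothetical stand-in written for illustration, since the real code reads `ptx_version_option` through the `TARGET_PTX_6_3` macro and returns full output templates.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors the enum added to nvptx-opts.h by the patch.  */
enum ptx_version { PTX_VERSION_3_1, PTX_VERSION_6_3 };

/* Stand-in for the TARGET_PTX_6_3 macro; here the version is passed
   explicitly instead of being read from ptx_version_option.  */
static bool
target_ptx_6_3_p (enum ptx_version v)
{
  return v >= PTX_VERSION_6_3;
}

/* Hypothetical helper: mnemonic for the warp ballot at a PTX version.  */
static const char *
vote_ballot_mnemonic (enum ptx_version v)
{
  return target_ptx_6_3_p (v) ? "vote.sync.ballot.b32" : "vote.ballot.b32";
}

/* Hypothetical helper: true if the mnemonic is a .sync variant, which
   takes the extra member-mask operand (0xffffffff in the patch).  */
static bool
needs_member_mask_p (const char *mnemonic)
{
  assert (mnemonic != NULL);
  return strstr (mnemonic, ".sync") != NULL;
}
```

Because the macro uses `>=` rather than `==`, a later version added to the end of the enum would still select the .sync forms without touching the instruction patterns.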