Message ID | 87wpnq0zbg.fsf@hertz.schwinge.homeip.net |
---|---|
State | New |
On 04/22/2016 03:26 AM, Thomas Schwinge wrote:
>
> Thanks for the review; OK to commit as follows? And then, should
> something be added to the "News" section on <https://gcc.gnu.org/>
> itself, too? (I don't know the policy for that. We didn't suggest that
> for GCC 5, because at that time we described the support as a
> "preliminary implementation of the OpenACC 2.0a specification"; now it's
> much more complete and usable.)

I think the new patch is acceptable for release notes, but TBH I don't
know what the policy is for updating "News", either. :-S

-Sandra
On Fri, Apr 22, 2016 at 11:26:11AM +0200, Thomas Schwinge wrote:
> Index: htdocs/gcc-6/changes.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.75
> diff -u -p -r1.75 changes.html

LGTM.

Jakub
[ Old e-mail alert ]

On Fri, 22 Apr 2016, Thomas Schwinge wrote:
> Thanks for the review; OK to commit as follows? And then, should
> something be added to the "News" section on <https://gcc.gnu.org/>
> itself, too? (I don't know the policy for that. We didn't suggest that
> for GCC 5, because at that time we described the support as a
> "preliminary implementation of the OpenACC 2.0a specification"; now
> it's much more complete and usable.)

Yes, definitely. Sorry for not picking this up earlier, but this
definitely is a strong News item.

If you feel that particular item is not suitable any longer, perhaps
there is another one (or there are other ones, plural)?

As a rule of thumb, when in doubt, propose a News item. As a group
we have been way too conservative on that front.

Gerald
Index: htdocs/gcc-6/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.75
diff -u -p -r1.75 changes.html
--- htdocs/gcc-6/changes.html	21 Apr 2016 15:57:43 -0000	1.75
+++ htdocs/gcc-6/changes.html	22 Apr 2016 09:22:19 -0000
@@ -124,6 +124,52 @@ For more information, see the
 <!-- .................................................................. -->
 <h2 id="languages">New Languages and Language specific improvements</h2>
 
+<!-- <ul>
+  <li> -->Compared to GCC 5, the GCC 6 release series includes a much improved
+    implementation of the <a href="http://www.openacc.org/">OpenACC 2.0a
+    specification</a>. Highlights are:
+    <ul>
+      <li>In addition to single-threaded host-fallback execution, offloading is
+        supported for nvptx (Nvidia GPUs) on x86_64 and PowerPC 64-bit
+        little-endian GNU/Linux host systems. For nvptx offloading, with the
+        OpenACC parallel construct, the execution model allows for an arbitrary
+        number of gangs, up to 32 workers, and 32 vectors.</li>
+      <li>Initial support for parallelized execution of OpenACC kernels
+        constructs:
+        <ul>
+          <li>Parallelization of a kernels region is switched on
+            by <code>-fopenacc</code> combined with <code>-O2</code> or
+            higher.</li>
+          <li>Code is offloaded onto multiple gangs, but executes with just one
+            worker, and a vector length of 1.</li>
+          <li>Directives inside a kernels region are not supported.</li>
+          <li>Loops with reductions can be parallelized.</li>
+          <li>Only kernels regions with one loop nest are parallelized.</li>
+          <li>Only the outer-most loop of a loop nest can be parallelized.</li>
+          <li>Loop nests containing sibling loops are not parallelized.</li>
+        </ul>
+        Typically, using the OpenACC parallel construct gives much better
+        performance, compared to the initial support of the OpenACC kernels
+        construct.
+      <li>The <code>device_type</code> clause is not supported.
+        The <code>bind</code> and <code>nohost</code> clauses are not
+        supported. The <code>host_data</code> directive is not supported in
+        Fortran.</li>
+      <li>Nested parallelism (cf. CUDA dynamic parallelism) is not
+        supported.</li>
+      <li>Usage of OpenACC constructs inside multithreaded contexts (such as
+        created by OpenMP, or pthread programming) is not supported.</li>
+      <li>If a call to the <code>acc_on_device</code> function has a
+        compile-time constant argument, the function call evaluates to a
+        compile-time constant value only for C and C++ but not for
+        Fortran.</li>
+    </ul>
+    See the <a href="https://gcc.gnu.org/wiki/OpenACC">OpenACC</a>
+    and <a href="https://gcc.gnu.org/wiki/Offloading">Offloading</a> wiki pages
+    for further information.
+  <!-- </li>
+</ul> -->
+
 <!-- <h3 id="ada">Ada</h3> -->
 
 <h3 id="c-family">C family</h3>
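
For readers unfamiliar with the two constructs the release notes contrast, here is a minimal C sketch. It is an illustration only, not part of the patch: the function names dot_parallel and scale_kernels are hypothetical, and num_gangs(64) is an arbitrary value chosen for the example. The worker and vector clauses stay within the nvptx limits the notes quote (up to 32 workers, vector length 32), and the kernels region is kept to a single loop nest with no directives inside it, which is the shape the notes say is parallelized.

```c
/* Dot product using the OpenACC parallel construct.
   num_workers(32) and vector_length(32) match the nvptx limits quoted in
   the release notes; num_gangs(64) is an arbitrary illustrative value.  */
double dot_parallel(const double *a, const double *b, int n)
{
    double sum = 0.0;
#pragma acc parallel loop num_gangs(64) num_workers(32) vector_length(32) \
    reduction(+:sum) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* In-place scaling using the OpenACC kernels construct.
   The region holds exactly one loop nest and no nested directives,
   which is the case the release notes describe as parallelized.  */
void scale_kernels(float *restrict x, float factor, int n)
{
#pragma acc kernels copy(x[0:n])
    {
        for (int i = 0; i < n; ++i)
            x[i] *= factor;
    }
}
```

Built with `gcc -fopenacc -O2`, the pragmas take effect; whether the code is actually offloaded depends on the compiler having been configured with nvptx offloading support, otherwise it runs through host fallback. Without `-fopenacc` the pragmas are ignored and both functions run as ordinary sequential C, producing the same results.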