[committed] Add oacc_kernels_p argument to pass_parallelize_loops

[was: Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels ]

On 14/12/15 16:22, Richard Biener wrote:
> On Sun, Dec 13, 2015 at 5:58 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 24/11/15 13:24, Tom de Vries wrote:
>>>
>>> On 16/11/15 12:59, Tom de Vries wrote:
>>>>
>>>> On 09/11/15 20:52, Tom de Vries wrote:
>>>>>
>>>>> On 09/11/15 16:35, Tom de Vries wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> this patch series for stage1 trunk adds support to:
>>>>>> - parallelize oacc kernels regions using parloops, and
>>>>>> - map the loops onto the oacc gang dimension.
>>>>>>
>>>>>> The patch series contains these patches:
>>>>>>
>>>>>>        1    Insert new exit block only when needed in
>>>>>>           transform_to_exit_first_loop_alt
>>>>>>        2    Make create_parallel_loop return void
>>>>>>        3    Ignore reduction clause on kernels directive
>>>>>>        4    Implement -foffload-alias
>>>>>>        5    Add in_oacc_kernels_region in struct loop
>>>>>>        6    Add pass_oacc_kernels
>>>>>>        7    Add pass_dominator_oacc_kernels
>>>>>>        8    Add pass_ch_oacc_kernels
>>>>>>        9    Add pass_parallelize_loops_oacc_kernels
>>>>>>       10    Add pass_oacc_kernels pass group in passes.def
>>>>>>       11    Update testcases after adding kernels pass group
>>>>>>       12    Handle acc loop directive
>>>>>>       13    Add c-c++-common/goacc/kernels-*.c
>>>>>>       14    Add gfortran.dg/goacc/kernels-*.f95
>>>>>>       15    Add libgomp.oacc-c-c++-common/kernels-*.c
>>>>>>       16    Add libgomp.oacc-fortran/kernels-*.f95
>>>>>>
>>>>>> The first 9 patches are more or less independent, but patches 10-16 are
>>>>>> intended to be committed at the same time.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> Build and reg-tested with nvidia accelerator, in combination with a
>>>>>> patch that enables accelerator testing (which is submitted at
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>>>>>>
>>>>>> I'll post the individual patches in reply to this message.
>>>>>
>>>>>
>>>>> This patch adds pass_parallelize_loops_oacc_kernels.
>>>>>
>>>>> There's a number of things we do differently in parloops for oacc
>>>>> kernels:
>>>>> - in normal parloops, we generate code to choose between a parallel
>>>>>     version of the loop, and a sequential (low iteration count) version.
>>>>>     Since the code in oacc kernels region is supposed to run on the
>>>>>     accelerator anyway, we skip this check, and don't add a low iteration
>>>>>     count loop.
>>>>> - in normal parloops, we generate an #pragma omp parallel /
>>>>>     GIMPLE_OMP_RETURN pair to delimit the region which will we split off
>>>>>     into a thread function. Since the oacc kernels region is already
>>>>>     split off, we don't add this pair.
>>>>> - we indicate the parallelization factor by setting the oacc function
>>>>>     attributes
>>>>> - we generate an #pragma oacc loop instead of an #pragma omp for, and
>>>>>     we add the gang clause
>>>>> - in normal parloops, we rewrite the variable accesses in the loop in
>>>>>     terms into accesses relative to a thread function parameter. For the
>>>>>     oacc kernels region, that rewrite has already been done at omp-lower,
>>>>>     so we skip this.
>>>>> - we need to ensure that the entire kernels region can be run in
>>>>>     parallel. The loop independence check is already present, so for oacc
>>>>>     kernels we add a check between blocks outside the loop and the entire
>>>>>     region.
>>>>> - we guard stores in the blocks outside the loop with gang_pos == 0.
>>>>>     There's no need for each gang to write to a single location, we can
>>>>>     do this in just one gang. (Typically this is the write of the final
>>>>>     value of the iteration variable if that one is copied back to the
>>>>>     host).
>>>>>
>>>>
>>>> Reposting with loop optimizer init added in
>>>> pass_parallelize_loops_oacc_kernels::execute.
>>>>
>>>
>>> Reposting with loop_optimizer_finalize,scev_initialize and scev_finalize
>>>    added in pass_parallelize_loops_oacc_kernels::execute.
>>>
>>
>> Ping.
>>
>> Anything I can do to facilitate the review?
>
> Document new functions.

Done.

avoid if (1).

Done.

> Ideally some refactoring would avoid some of the if (!oacc_kernels_p) spaghetti

Ack. For now, i've tried to minimize the number of oacc_kernels_p tests 
in the code.

Further suggestions on how to improve here are much appreciated.

> but I'm considering tree-parloops.c (and its bugs) yours.

Ack.

> Can the pass not just use a pass parameter to switch between oacc/non-oacc?
>

This patch introduces the pass parameter oacc_kernels_p (but does not 
instantiate an oacc_kernels_p == true pass version yet).

Bootstrapped and reg-tested on x86_64.

Committed to trunk.

Thanks,
- Tom

[committed] Add oacc_kernels_p argument to pass_parallelize_loops

Commit Message

Comments

Patch