diff mbox

[vec-tails] Support loop epilogue vectorization

Message ID CAEoMCqS2O3v0gf9ST00Kma9uFAT=FPJJGwZVLuefMKFh1uuKRA@mail.gmail.com
State New
Headers show

Commit Message

Yuri Rumyantsev Nov. 11, 2016, 11:15 a.m. UTC
Richard,

I prepare updated 3 patch with passing additional argument to
vect_analyze_loop as you proposed (untested).

You wrote:
tw, I wonder if you can produce a single patch containing just
epilogue vectorization, that is combine patches 1-3 but rip out
changes only needed by later patches?

Did you mean that I exclude all support for vectorization epilogues,
i.e. exclude from 2-nd patch all non-related changes
like


Could you please look at updated patch?

Thanks.
Yuri.

2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> On Thu, 10 Nov 2016, Richard Biener wrote:
>
>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>
>> > Richard,
>> >
>> > Here is updated 3 patch.
>> >
>> > I checked that all new tests related to epilogue vectorization passed with it.
>> >
>> > Your comments will be appreciated.
>>
>> A lot better now.  Instead of the ->aux dance I now prefer to
>> pass the original loops loop_vinfo to vect_analyze_loop as
>> optional argument (if non-NULL we analyze the epilogue of that
>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> original vectorization factor?  So we can pass down an (optional)
>> forced vectorization factor as well?
>
> Btw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Thanks,
> Richard.
>
>> Richard.
>>
>> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> > >
>> > >> Hi Richard,
>> > >>
>> > >> I did not understand your last remark:
>> > >>
>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> > >> >
>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > >> >           && dump_enabled_p ())
>> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > >> >                            "loop vectorized\n");
>> > >> > -       vect_transform_loop (loop_vinfo);
>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> > >> >         num_vectorized_loops++;
>> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
>> > >> >           etc.  */
>> > >> >      loop->force_vectorize = false;
>> > >> >
>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> > >> > +          to match loop and its epilogue vectorization in dumps
>> > >> > +          put new loop as the next loop to process.  */
>> > >> > +       if (new_loop)
>> > >> > +         {
>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> > >> > +           vect_loops_num = number_of_loops (cfun);
>> > >> > +         }
>> > >> >
>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > >> f> unction which will set up stuff properly (and also perform
>> > >> > the if-conversion of the epilogue there).
>> > >> >
>> > >> > That said, if we can get in non-masked epilogue vectorization
>> > >> > separately that would be great.
>> > >>
>> > >> Could you please clarify your proposal.
>> > >
>> > > When a loop was vectorized set things up to immediately vectorize
>> > > its epilogue, avoiding changing the loop iteration and avoiding
>> > > the re-use of ->aux.
>> > >
>> > > Richard.
>> > >
>> > >> Thanks.
>> > >> Yuri.
>> > >>
>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> > >> >
>> > >> >> Hi All,
>> > >> >>
>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> > >> >> approved by Jeff.
>> > >> >>
>> > >> >> I did re-base of all patches and performed bootstrapping and
>> > >> >> regression testing that did not show any new failures. Also all
>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>> > >> >> accordingly.
>> > >> >>
>> > >> >> Is it OK for trunk?
>> > >> >
>> > >> > I would have prefered that the series up to -03-nomask-tails would
>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>> > >> > the patchset is oddly separated.
>> > >> >
>> > >> > I have a comment on that part nevertheless:
>> > >> >
>> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > >> > loop_vinfo)
>> > >> >    /* Check if we can possibly peel the loop.  */
>> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>> > >> > -      || loop->inner)
>> > >> > +      || loop->inner
>> > >> > +      /* Required peeling was performed in prologue and
>> > >> > +        is not required for epilogue.  */
>> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> > >> >      do_peeling = false;
>> > >> >
>> > >> >    if (do_peeling
>> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> > >> > loop_vinfo)
>> > >> >
>> > >> >    do_versioning =
>> > >> >         optimize_loop_nest_for_speed_p (loop)
>> > >> > -       && (!loop->inner); /* FORNOW */
>> > >> > +       && (!loop->inner) /* FORNOW */
>> > >> > +        /* Required versioning was performed for the
>> > >> > +          original loop and is not required for epilogue.  */
>> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> > >> >
>> > >> >    if (do_versioning)
>> > >> >      {
>> > >> >
>> > >> > please do that check in the single caller of this function.
>> > >> >
>> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
>> > >> > passing down info from the processed parent would be _much_ cleaner.
>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> > >> >
>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> > >> >             && dump_enabled_p ())
>> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> > >> >                             "loop vectorized\n");
>> > >> > -       vect_transform_loop (loop_vinfo);
>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> > >> >         num_vectorized_loops++;
>> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
>> > >> >            etc.  */
>> > >> >         loop->force_vectorize = false;
>> > >> >
>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> > >> > +          to match loop and its epilogue vectorization in dumps
>> > >> > +          put new loop as the next loop to process.  */
>> > >> > +       if (new_loop)
>> > >> > +         {
>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> > >> > +           vect_loops_num = number_of_loops (cfun);
>> > >> > +         }
>> > >> >
>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> > >> > function which will set up stuff properly (and also perform
>> > >> > the if-conversion of the epilogue there).
>> > >> >
>> > >> > That said, if we can get in non-masked epilogue vectorization
>> > >> > separately that would be great.
>> > >> >
>> > >> > I'm still torn about all the rest of the stuff and question its
>> > >> > usability (esp. merging the epilogue with the main vector loop).
>> > >> > But it has already been approved ... oh well.
>> > >> >
>> > >> > Thanks,
>> > >> > Richard.
>> > >>
>> > >>
>> > >
>> > > --
>> > > Richard Biener <rguenther@suse.de>
>> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Comments

Yuri Rumyantsev Nov. 11, 2016, 2:15 p.m. UTC | #1
Richard,

Sorry for confusion but my updated patch  does not work properly, so I
need to fix it.

Yuri.

2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> I prepare updated 3 patch with passing additional argument to
> vect_analyze_loop as you proposed (untested).
>
> You wrote:
> tw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
>
> Did you mean that I exclude all support for vectorization epilogues,
> i.e. exclude from 2-nd patch all non-related changes
> like
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 11863af..32011c1 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> +  LOOP_VINFO_NEED_MASKING (res) = false;
> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>
> Did you mean also that new combined patch must be working patch, i.e.
> can be integrated without other patches?
>
> Could you please look at updated patch?
>
> Thanks.
> Yuri.
>
> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>
>>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>
>>> > Richard,
>>> >
>>> > Here is updated 3 patch.
>>> >
>>> > I checked that all new tests related to epilogue vectorization passed with it.
>>> >
>>> > Your comments will be appreciated.
>>>
>>> A lot better now.  Instead of the ->aux dance I now prefer to
>>> pass the original loops loop_vinfo to vect_analyze_loop as
>>> optional argument (if non-NULL we analyze the epilogue of that
>>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>> original vectorization factor?  So we can pass down an (optional)
>>> forced vectorization factor as well?
>>
>> Btw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Thanks,
>> Richard.
>>
>>> Richard.
>>>
>>> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>> > >
>>> > >> Hi Richard,
>>> > >>
>>> > >> I did not understand your last remark:
>>> > >>
>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> > >> >
>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> > >> >           && dump_enabled_p ())
>>> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>> > >> >                            "loop vectorized\n");
>>> > >> > -       vect_transform_loop (loop_vinfo);
>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>> > >> >         num_vectorized_loops++;
>>> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
>>> > >> >           etc.  */
>>> > >> >      loop->force_vectorize = false;
>>> > >> >
>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>>> > >> > +          to match loop and its epilogue vectorization in dumps
>>> > >> > +          put new loop as the next loop to process.  */
>>> > >> > +       if (new_loop)
>>> > >> > +         {
>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>> > >> > +           vect_loops_num = number_of_loops (cfun);
>>> > >> > +         }
>>> > >> >
>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>> > >> f> unction which will set up stuff properly (and also perform
>>> > >> > the if-conversion of the epilogue there).
>>> > >> >
>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>> > >> > separately that would be great.
>>> > >>
>>> > >> Could you please clarify your proposal.
>>> > >
>>> > > When a loop was vectorized set things up to immediately vectorize
>>> > > its epilogue, avoiding changing the loop iteration and avoiding
>>> > > the re-use of ->aux.
>>> > >
>>> > > Richard.
>>> > >
>>> > >> Thanks.
>>> > >> Yuri.
>>> > >>
>>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>> > >> >
>>> > >> >> Hi All,
>>> > >> >>
>>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>> > >> >> approved by Jeff.
>>> > >> >>
>>> > >> >> I did re-base of all patches and performed bootstrapping and
>>> > >> >> regression testing that did not show any new failures. Also all
>>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>>> > >> >> accordingly.
>>> > >> >>
>>> > >> >> Is it OK for trunk?
>>> > >> >
>>> > >> > I would have prefered that the series up to -03-nomask-tails would
>>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>>> > >> > the patchset is oddly separated.
>>> > >> >
>>> > >> > I have a comment on that part nevertheless:
>>> > >> >
>>> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>>> > >> > loop_vinfo)
>>> > >> >    /* Check if we can possibly peel the loop.  */
>>> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>>> > >> > -      || loop->inner)
>>> > >> > +      || loop->inner
>>> > >> > +      /* Required peeling was performed in prologue and
>>> > >> > +        is not required for epilogue.  */
>>> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>> > >> >      do_peeling = false;
>>> > >> >
>>> > >> >    if (do_peeling
>>> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>>> > >> > loop_vinfo)
>>> > >> >
>>> > >> >    do_versioning =
>>> > >> >         optimize_loop_nest_for_speed_p (loop)
>>> > >> > -       && (!loop->inner); /* FORNOW */
>>> > >> > +       && (!loop->inner) /* FORNOW */
>>> > >> > +        /* Required versioning was performed for the
>>> > >> > +          original loop and is not required for epilogue.  */
>>> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>> > >> >
>>> > >> >    if (do_versioning)
>>> > >> >      {
>>> > >> >
>>> > >> > please do that check in the single caller of this function.
>>> > >> >
>>> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
>>> > >> > passing down info from the processed parent would be _much_ cleaner.
>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> > >> >
>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> > >> >             && dump_enabled_p ())
>>> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>> > >> >                             "loop vectorized\n");
>>> > >> > -       vect_transform_loop (loop_vinfo);
>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>> > >> >         num_vectorized_loops++;
>>> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
>>> > >> >            etc.  */
>>> > >> >         loop->force_vectorize = false;
>>> > >> >
>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>>> > >> > +          to match loop and its epilogue vectorization in dumps
>>> > >> > +          put new loop as the next loop to process.  */
>>> > >> > +       if (new_loop)
>>> > >> > +         {
>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>> > >> > +           vect_loops_num = number_of_loops (cfun);
>>> > >> > +         }
>>> > >> >
>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>> > >> > function which will set up stuff properly (and also perform
>>> > >> > the if-conversion of the epilogue there).
>>> > >> >
>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>> > >> > separately that would be great.
>>> > >> >
>>> > >> > I'm still torn about all the rest of the stuff and question its
>>> > >> > usability (esp. merging the epilogue with the main vector loop).
>>> > >> > But it has already been approved ... oh well.
>>> > >> >
>>> > >> > Thanks,
>>> > >> > Richard.
>>> > >>
>>> > >>
>>> > >
>>> > > --
>>> > > Richard Biener <rguenther@suse.de>
>>> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>> >
>>>
>>>
>>
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Yuri Rumyantsev Nov. 11, 2016, 2:43 p.m. UTC | #2
Richard,

Here is fixed version of updated patch 3.

Any comments will be appreciated.

Thanks.
Yuri.

2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> Sorry for confusion but my updated patch  does not work properly, so I
> need to fix it.
>
> Yuri.
>
> 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Richard,
>>
>> I prepare updated 3 patch with passing additional argument to
>> vect_analyze_loop as you proposed (untested).
>>
>> You wrote:
>> tw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Did you mean that I exclude all support for vectorization epilogues,
>> i.e. exclude from 2-nd patch all non-related changes
>> like
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 11863af..32011c1 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>
>> Did you mean also that new combined patch must be working patch, i.e.
>> can be integrated without other patches?
>>
>> Could you please look at updated patch?
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> On Thu, 10 Nov 2016, Richard Biener wrote:
>>>
>>>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>
>>>> > Richard,
>>>> >
>>>> > Here is updated 3 patch.
>>>> >
>>>> > I checked that all new tests related to epilogue vectorization passed with it.
>>>> >
>>>> > Your comments will be appreciated.
>>>>
>>>> A lot better now.  Instead of the ->aux dance I now prefer to
>>>> pass the original loops loop_vinfo to vect_analyze_loop as
>>>> optional argument (if non-NULL we analyze the epilogue of that
>>>> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>> original vectorization factor?  So we can pass down an (optional)
>>>> forced vectorization factor as well?
>>>
>>> Btw, I wonder if you can produce a single patch containing just
>>> epilogue vectorization, that is combine patches 1-3 but rip out
>>> changes only needed by later patches?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>> > >
>>>> > >> Hi Richard,
>>>> > >>
>>>> > >> I did not understand your last remark:
>>>> > >>
>>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>> > >> >
>>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>> > >> >           && dump_enabled_p ())
>>>> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>>> > >> >                            "loop vectorized\n");
>>>> > >> > -       vect_transform_loop (loop_vinfo);
>>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>> > >> >         num_vectorized_loops++;
>>>> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
>>>> > >> >           etc.  */
>>>> > >> >      loop->force_vectorize = false;
>>>> > >> >
>>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>>>> > >> > +          to match loop and its epilogue vectorization in dumps
>>>> > >> > +          put new loop as the next loop to process.  */
>>>> > >> > +       if (new_loop)
>>>> > >> > +         {
>>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>> > >> > +         }
>>>> > >> >
>>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>>> > >> f> unction which will set up stuff properly (and also perform
>>>> > >> > the if-conversion of the epilogue there).
>>>> > >> >
>>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>>> > >> > separately that would be great.
>>>> > >>
>>>> > >> Could you please clarify your proposal.
>>>> > >
>>>> > > When a loop was vectorized set things up to immediately vectorize
>>>> > > its epilogue, avoiding changing the loop iteration and avoiding
>>>> > > the re-use of ->aux.
>>>> > >
>>>> > > Richard.
>>>> > >
>>>> > >> Thanks.
>>>> > >> Yuri.
>>>> > >>
>>>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>> > >> >
>>>> > >> >> Hi All,
>>>> > >> >>
>>>> > >> >> I re-send all patches sent by Ilya earlier for review which support
>>>> > >> >> vectorization of loop epilogues and loops with low trip count. We
>>>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>>> > >> >> approved by Jeff.
>>>> > >> >>
>>>> > >> >> I did re-base of all patches and performed bootstrapping and
>>>> > >> >> regression testing that did not show any new failures. Also all
>>>> > >> >> changes related to new vect_do_peeling algorithm have been changed
>>>> > >> >> accordingly.
>>>> > >> >>
>>>> > >> >> Is it OK for trunk?
>>>> > >> >
>>>> > >> > I would have prefered that the series up to -03-nomask-tails would
>>>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>>>> > >> > the patchset is oddly separated.
>>>> > >> >
>>>> > >> > I have a comment on that part nevertheless:
>>>> > >> >
>>>> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>>>> > >> > loop_vinfo)
>>>> > >> >    /* Check if we can possibly peel the loop.  */
>>>> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>>>> > >> > -      || loop->inner)
>>>> > >> > +      || loop->inner
>>>> > >> > +      /* Required peeling was performed in prologue and
>>>> > >> > +        is not required for epilogue.  */
>>>> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>> > >> >      do_peeling = false;
>>>> > >> >
>>>> > >> >    if (do_peeling
>>>> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>>>> > >> > loop_vinfo)
>>>> > >> >
>>>> > >> >    do_versioning =
>>>> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>> > >> > -       && (!loop->inner); /* FORNOW */
>>>> > >> > +       && (!loop->inner) /* FORNOW */
>>>> > >> > +        /* Required versioning was performed for the
>>>> > >> > +          original loop and is not required for epilogue.  */
>>>> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>> > >> >
>>>> > >> >    if (do_versioning)
>>>> > >> >      {
>>>> > >> >
>>>> > >> > please do that check in the single caller of this function.
>>>> > >> >
>>>> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
>>>> > >> > passing down info from the processed parent would be _much_ cleaner.
>>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>> > >> >
>>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>> > >> >             && dump_enabled_p ())
>>>> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>>>> > >> >                             "loop vectorized\n");
>>>> > >> > -       vect_transform_loop (loop_vinfo);
>>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>> > >> >         num_vectorized_loops++;
>>>> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
>>>> > >> >            etc.  */
>>>> > >> >         loop->force_vectorize = false;
>>>> > >> >
>>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
>>>> > >> > +          to match loop and its epilogue vectorization in dumps
>>>> > >> > +          put new loop as the next loop to process.  */
>>>> > >> > +       if (new_loop)
>>>> > >> > +         {
>>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>> > >> > +         }
>>>> > >> >
>>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>>>> > >> > function which will set up stuff properly (and also perform
>>>> > >> > the if-conversion of the epilogue there).
>>>> > >> >
>>>> > >> > That said, if we can get in non-masked epilogue vectorization
>>>> > >> > separately that would be great.
>>>> > >> >
>>>> > >> > I'm still torn about all the rest of the stuff and question its
>>>> > >> > usability (esp. merging the epilogue with the main vector loop).
>>>> > >> > But it has already been approved ... oh well.
>>>> > >> >
>>>> > >> > Thanks,
>>>> > >> > Richard.
>>>> > >>
>>>> > >>
>>>> > >
>>>> > > --
>>>> > > Richard Biener <rguenther@suse.de>
>>>> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>>> >
>>>>
>>>>
>>>
>>> --
>>> Richard Biener <rguenther@suse.de>
>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 14, 2016, 12:51 p.m. UTC | #3
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> I prepare updated 3 patch with passing additional argument to
> vect_analyze_loop as you proposed (untested).
> 
> You wrote:
> tw, I wonder if you can produce a single patch containing just
> epilogue vectorization, that is combine patches 1-3 but rip out
> changes only needed by later patches?
> 
> Did you mean that I exclude all support for vectorization epilogues,
> i.e. exclude from 2-nd patch all non-related changes
> like
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 11863af..32011c1 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> +  LOOP_VINFO_NEED_MASKING (res) = false;
> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;

Yes.
 
> Did you mean also that new combined patch must be working patch, i.e.
> can be integrated without other patches?

Yes.

> Could you please look at updated patch?

Will do.

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >
> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>
> >> > Richard,
> >> >
> >> > Here is updated 3 patch.
> >> >
> >> > I checked that all new tests related to epilogue vectorization passed with it.
> >> >
> >> > Your comments will be appreciated.
> >>
> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >> optional argument (if non-NULL we analyze the epilogue of that
> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >> original vectorization factor?  So we can pass down an (optional)
> >> forced vectorization factor as well?
> >
> > Btw, I wonder if you can produce a single patch containing just
> > epilogue vectorization, that is combine patches 1-3 but rip out
> > changes only needed by later patches?
> >
> > Thanks,
> > Richard.
> >
> >> Richard.
> >>
> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >> > >
> >> > >> Hi Richard,
> >> > >>
> >> > >> I did not understand your last remark:
> >> > >>
> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> > >> >
> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > >> >           && dump_enabled_p ())
> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > >> >                            "loop vectorized\n");
> >> > >> > -       vect_transform_loop (loop_vinfo);
> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >> > >> >         num_vectorized_loops++;
> >> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
> >> > >> >           etc.  */
> >> > >> >      loop->force_vectorize = false;
> >> > >> >
> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >> > >> > +          to match loop and its epilogue vectorization in dumps
> >> > >> > +          put new loop as the next loop to process.  */
> >> > >> > +       if (new_loop)
> >> > >> > +         {
> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >> > >> > +         }
> >> > >> >
> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> > >> f> unction which will set up stuff properly (and also perform
> >> > >> > the if-conversion of the epilogue there).
> >> > >> >
> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> > >> > separately that would be great.
> >> > >>
> >> > >> Could you please clarify your proposal.
> >> > >
> >> > > When a loop was vectorized set things up to immediately vectorize
> >> > > its epilogue, avoiding changing the loop iteration and avoiding
> >> > > the re-use of ->aux.
> >> > >
> >> > > Richard.
> >> > >
> >> > >> Thanks.
> >> > >> Yuri.
> >> > >>
> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> > >> >
> >> > >> >> Hi All,
> >> > >> >>
> >> > >> >> I re-send all patches sent by Ilya earlier for review which support
> >> > >> >> vectorization of loop epilogues and loops with low trip count. We
> >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> > >> >> approved by Jeff.
> >> > >> >>
> >> > >> >> I did re-base of all patches and performed bootstrapping and
> >> > >> >> regression testing that did not show any new failures. Also all
> >> > >> >> changes related to new vect_do_peeling algorithm have been changed
> >> > >> >> accordingly.
> >> > >> >>
> >> > >> >> Is it OK for trunk?
> >> > >> >
> >> > >> > I would have prefered that the series up to -03-nomask-tails would
> >> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >> > >> > the patchset is oddly separated.
> >> > >> >
> >> > >> > I have a comment on that part nevertheless:
> >> > >> >
> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > >> > loop_vinfo)
> >> > >> >    /* Check if we can possibly peel the loop.  */
> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >> > >> > -      || loop->inner)
> >> > >> > +      || loop->inner
> >> > >> > +      /* Required peeling was performed in prologue and
> >> > >> > +        is not required for epilogue.  */
> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >> > >> >      do_peeling = false;
> >> > >> >
> >> > >> >    if (do_peeling
> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> > >> > loop_vinfo)
> >> > >> >
> >> > >> >    do_versioning =
> >> > >> >         optimize_loop_nest_for_speed_p (loop)
> >> > >> > -       && (!loop->inner); /* FORNOW */
> >> > >> > +       && (!loop->inner) /* FORNOW */
> >> > >> > +        /* Required versioning was performed for the
> >> > >> > +          original loop and is not required for epilogue.  */
> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >> > >> >
> >> > >> >    if (do_versioning)
> >> > >> >      {
> >> > >> >
> >> > >> > please do that check in the single caller of this function.
> >> > >> >
> >> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >> > >> > passing down info from the processed parent would be _much_ cleaner.
> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> > >> >
> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> > >> >             && dump_enabled_p ())
> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> > >> >                             "loop vectorized\n");
> >> > >> > -       vect_transform_loop (loop_vinfo);
> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >> > >> >         num_vectorized_loops++;
> >> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
> >> > >> >            etc.  */
> >> > >> >         loop->force_vectorize = false;
> >> > >> >
> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >> > >> > +          to match loop and its epilogue vectorization in dumps
> >> > >> > +          put new loop as the next loop to process.  */
> >> > >> > +       if (new_loop)
> >> > >> > +         {
> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >> > >> > +         }
> >> > >> >
> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> > >> > function which will set up stuff properly (and also perform
> >> > >> > the if-conversion of the epilogue there).
> >> > >> >
> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> > >> > separately that would be great.
> >> > >> >
> >> > >> > I'm still torn about all the rest of the stuff and question its
> >> > >> > usability (esp. merging the epilogue with the main vector loop).
> >> > >> > But it has already been approved ... oh well.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Richard.
> >> > >>
> >> > >>
> >> > >
> >> > > --
> >> > > Richard Biener <rguenther@suse.de>
> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >> >
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>
Richard Biener Nov. 14, 2016, 12:56 p.m. UTC | #4
On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> Here is fixed version of updated patch 3.
> 
> Any comments will be appreciated.

Looks good apart from

+  if (epilogue)
+    {
+      epilogue->force_vectorize = loop->force_vectorize;
+      epilogue->safelen = loop->safelen;
+      epilogue->dont_vectorize = false;
+
+      /* We may need to if-convert epilogue to vectorize it.  */
+      if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
+       tree_if_conversion (epilogue);
+
+      gcc_assert (!epilogue->aux);
+      epilogue->aux = loop_vinfo;

where the last two lines should now no longer be necessary?

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-11 17:15 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> > Richard,
> >
> > Sorry for confusion but my updated patch  does not work properly, so I
> > need to fix it.
> >
> > Yuri.
> >
> > 2016-11-11 14:15 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >>
> >> Did you mean also that new combined patch must be working patch, i.e.
> >> can be integrated without other patches?
> >>
> >> Could you please look at updated patch?
> >>
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>> On Thu, 10 Nov 2016, Richard Biener wrote:
> >>>
> >>>> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>>>
> >>>> > Richard,
> >>>> >
> >>>> > Here is updated 3 patch.
> >>>> >
> >>>> > I checked that all new tests related to epilogue vectorization passed with it.
> >>>> >
> >>>> > Your comments will be appreciated.
> >>>>
> >>>> A lot better now.  Instead of the ->aux dance I now prefer to
> >>>> pass the original loops loop_vinfo to vect_analyze_loop as
> >>>> optional argument (if non-NULL we analyze the epilogue of that
> >>>> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >>>> original vectorization factor?  So we can pass down an (optional)
> >>>> forced vectorization factor as well?
> >>>
> >>> Btw, I wonder if you can produce a single patch containing just
> >>> epilogue vectorization, that is combine patches 1-3 but rip out
> >>> changes only needed by later patches?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>> Richard.
> >>>>
> >>>> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >>>> > >
> >>>> > >> Hi Richard,
> >>>> > >>
> >>>> > >> I did not understand your last remark:
> >>>> > >>
> >>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> > >> >
> >>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> > >> >           && dump_enabled_p ())
> >>>> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >>>> > >> >                            "loop vectorized\n");
> >>>> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> > >> >         num_vectorized_loops++;
> >>>> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
> >>>> > >> >           etc.  */
> >>>> > >> >      loop->force_vectorize = false;
> >>>> > >> >
> >>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >>>> > >> > +          to match loop and its epilogue vectorization in dumps
> >>>> > >> > +          put new loop as the next loop to process.  */
> >>>> > >> > +       if (new_loop)
> >>>> > >> > +         {
> >>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> > >> > +         }
> >>>> > >> >
> >>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >>>> > >> f> unction which will set up stuff properly (and also perform
> >>>> > >> > the if-conversion of the epilogue there).
> >>>> > >> >
> >>>> > >> > That said, if we can get in non-masked epilogue vectorization
> >>>> > >> > separately that would be great.
> >>>> > >>
> >>>> > >> Could you please clarify your proposal.
> >>>> > >
> >>>> > > When a loop was vectorized set things up to immediately vectorize
> >>>> > > its epilogue, avoiding changing the loop iteration and avoiding
> >>>> > > the re-use of ->aux.
> >>>> > >
> >>>> > > Richard.
> >>>> > >
> >>>> > >> Thanks.
> >>>> > >> Yuri.
> >>>> > >>
> >>>> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >>>> > >> >
> >>>> > >> >> Hi All,
> >>>> > >> >>
> >>>> > >> >> I re-send all patches sent by Ilya earlier for review which support
> >>>> > >> >> vectorization of loop epilogues and loops with low trip count. We
> >>>> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >>>> > >> >> approved by Jeff.
> >>>> > >> >>
> >>>> > >> >> I did re-base of all patches and performed bootstrapping and
> >>>> > >> >> regression testing that did not show any new failures. Also all
> >>>> > >> >> changes related to new vect_do_peeling algorithm have been changed
> >>>> > >> >> accordingly.
> >>>> > >> >>
> >>>> > >> >> Is it OK for trunk?
> >>>> > >> >
> >>>> > >> > I would have prefered that the series up to -03-nomask-tails would
> >>>> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >>>> > >> > the patchset is oddly separated.
> >>>> > >> >
> >>>> > >> > I have a comment on that part nevertheless:
> >>>> > >> >
> >>>> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >>>> > >> > loop_vinfo)
> >>>> > >> >    /* Check if we can possibly peel the loop.  */
> >>>> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >>>> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >>>> > >> > -      || loop->inner)
> >>>> > >> > +      || loop->inner
> >>>> > >> > +      /* Required peeling was performed in prologue and
> >>>> > >> > +        is not required for epilogue.  */
> >>>> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >>>> > >> >      do_peeling = false;
> >>>> > >> >
> >>>> > >> >    if (do_peeling
> >>>> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >>>> > >> > loop_vinfo)
> >>>> > >> >
> >>>> > >> >    do_versioning =
> >>>> > >> >         optimize_loop_nest_for_speed_p (loop)
> >>>> > >> > -       && (!loop->inner); /* FORNOW */
> >>>> > >> > +       && (!loop->inner) /* FORNOW */
> >>>> > >> > +        /* Required versioning was performed for the
> >>>> > >> > +          original loop and is not required for epilogue.  */
> >>>> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >>>> > >> >
> >>>> > >> >    if (do_versioning)
> >>>> > >> >      {
> >>>> > >> >
> >>>> > >> > please do that check in the single caller of this function.
> >>>> > >> >
> >>>> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >>>> > >> > passing down info from the processed parent would be _much_ cleaner.
> >>>> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> > >> >
> >>>> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> > >> >             && dump_enabled_p ())
> >>>> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >>>> > >> >                             "loop vectorized\n");
> >>>> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> > >> >         num_vectorized_loops++;
> >>>> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
> >>>> > >> >            etc.  */
> >>>> > >> >         loop->force_vectorize = false;
> >>>> > >> >
> >>>> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >>>> > >> > +          to match loop and its epilogue vectorization in dumps
> >>>> > >> > +          put new loop as the next loop to process.  */
> >>>> > >> > +       if (new_loop)
> >>>> > >> > +         {
> >>>> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> > >> > +         }
> >>>> > >> >
> >>>> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >>>> > >> > function which will set up stuff properly (and also perform
> >>>> > >> > the if-conversion of the epilogue there).
> >>>> > >> >
> >>>> > >> > That said, if we can get in non-masked epilogue vectorization
> >>>> > >> > separately that would be great.
> >>>> > >> >
> >>>> > >> > I'm still torn about all the rest of the stuff and question its
> >>>> > >> > usability (esp. merging the epilogue with the main vector loop).
> >>>> > >> > But it has already been approved ... oh well.
> >>>> > >> >
> >>>> > >> > Thanks,
> >>>> > >> > Richard.
> >>>> > >>
> >>>> > >>
> >>>> > >
> >>>> > > --
> >>>> > > Richard Biener <rguenther@suse.de>
> >>>> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>>> >
> >>>>
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguenther@suse.de>
> >>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>
Yuri Rumyantsev Nov. 14, 2016, 1:29 p.m. UTC | #5
Richard,

In my previous patch I forgot to remove couple lines related to aux field.
Here is the correct updated patch.

Thanks.
Yuri.

2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
> On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> I prepare updated 3 patch with passing additional argument to
>> vect_analyze_loop as you proposed (untested).
>>
>> You wrote:
>> tw, I wonder if you can produce a single patch containing just
>> epilogue vectorization, that is combine patches 1-3 but rip out
>> changes only needed by later patches?
>>
>> Did you mean that I exclude all support for vectorization epilogues,
>> i.e. exclude from 2-nd patch all non-related changes
>> like
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 11863af..32011c1 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>
> Yes.
>
>> Did you mean also that new combined patch must be working patch, i.e.
>> can be integrated without other patches?
>
> Yes.
>
>> Could you please look at updated patch?
>
> Will do.
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> > On Thu, 10 Nov 2016, Richard Biener wrote:
>> >
>> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>> >>
>> >> > Richard,
>> >> >
>> >> > Here is updated 3 patch.
>> >> >
>> >> > I checked that all new tests related to epilogue vectorization passed with it.
>> >> >
>> >> > Your comments will be appreciated.
>> >>
>> >> A lot better now.  Instead of the ->aux dance I now prefer to
>> >> pass the original loops loop_vinfo to vect_analyze_loop as
>> >> optional argument (if non-NULL we analyze the epilogue of that
>> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> >> original vectorization factor?  So we can pass down an (optional)
>> >> forced vectorization factor as well?
>> >
>> > Btw, I wonder if you can produce a single patch containing just
>> > epilogue vectorization, that is combine patches 1-3 but rip out
>> > changes only needed by later patches?
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> Richard.
>> >>
>> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> >> > >
>> >> > >> Hi Richard,
>> >> > >>
>> >> > >> I did not understand your last remark:
>> >> > >>
>> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> > >> >
>> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> > >> >           && dump_enabled_p ())
>> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >> > >> >                            "loop vectorized\n");
>> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >> > >> >         num_vectorized_loops++;
>> >> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
>> >> > >> >           etc.  */
>> >> > >> >      loop->force_vectorize = false;
>> >> > >> >
>> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> >> > >> > +          to match loop and its epilogue vectorization in dumps
>> >> > >> > +          put new loop as the next loop to process.  */
>> >> > >> > +       if (new_loop)
>> >> > >> > +         {
>> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >> > >> > +         }
>> >> > >> >
>> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> >> > >> f> unction which will set up stuff properly (and also perform
>> >> > >> > the if-conversion of the epilogue there).
>> >> > >> >
>> >> > >> > That said, if we can get in non-masked epilogue vectorization
>> >> > >> > separately that would be great.
>> >> > >>
>> >> > >> Could you please clarify your proposal.
>> >> > >
>> >> > > When a loop was vectorized set things up to immediately vectorize
>> >> > > its epilogue, avoiding changing the loop iteration and avoiding
>> >> > > the re-use of ->aux.
>> >> > >
>> >> > > Richard.
>> >> > >
>> >> > >> Thanks.
>> >> > >> Yuri.
>> >> > >>
>> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >> > >> >
>> >> > >> >> Hi All,
>> >> > >> >>
>> >> > >> >> I re-send all patches sent by Ilya earlier for review which support
>> >> > >> >> vectorization of loop epilogues and loops with low trip count. We
>> >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> >> > >> >> approved by Jeff.
>> >> > >> >>
>> >> > >> >> I did re-base of all patches and performed bootstrapping and
>> >> > >> >> regression testing that did not show any new failures. Also all
>> >> > >> >> changes related to new vect_do_peeling algorithm have been changed
>> >> > >> >> accordingly.
>> >> > >> >>
>> >> > >> >> Is it OK for trunk?
>> >> > >> >
>> >> > >> > I would have prefered that the series up to -03-nomask-tails would
>> >> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>> >> > >> > the patchset is oddly separated.
>> >> > >> >
>> >> > >> > I have a comment on that part nevertheless:
>> >> > >> >
>> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> >> > >> > loop_vinfo)
>> >> > >> >    /* Check if we can possibly peel the loop.  */
>> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>> >> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>> >> > >> > -      || loop->inner)
>> >> > >> > +      || loop->inner
>> >> > >> > +      /* Required peeling was performed in prologue and
>> >> > >> > +        is not required for epilogue.  */
>> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> >> > >> >      do_peeling = false;
>> >> > >> >
>> >> > >> >    if (do_peeling
>> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> >> > >> > loop_vinfo)
>> >> > >> >
>> >> > >> >    do_versioning =
>> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>> >> > >> > -       && (!loop->inner); /* FORNOW */
>> >> > >> > +       && (!loop->inner) /* FORNOW */
>> >> > >> > +        /* Required versioning was performed for the
>> >> > >> > +          original loop and is not required for epilogue.  */
>> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> >> > >> >
>> >> > >> >    if (do_versioning)
>> >> > >> >      {
>> >> > >> >
>> >> > >> > please do that check in the single caller of this function.
>> >> > >> >
>> >> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
>> >> > >> > passing down info from the processed parent would be _much_ cleaner.
>> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> > >> >
>> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> > >> >             && dump_enabled_p ())
>> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >> > >> >                             "loop vectorized\n");
>> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >> > >> >         num_vectorized_loops++;
>> >> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
>> >> > >> >            etc.  */
>> >> > >> >         loop->force_vectorize = false;
>> >> > >> >
>> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> >> > >> > +          to match loop and its epilogue vectorization in dumps
>> >> > >> > +          put new loop as the next loop to process.  */
>> >> > >> > +       if (new_loop)
>> >> > >> > +         {
>> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >> > >> > +         }
>> >> > >> >
>> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> >> > >> > function which will set up stuff properly (and also perform
>> >> > >> > the if-conversion of the epilogue there).
>> >> > >> >
>> >> > >> > That said, if we can get in non-masked epilogue vectorization
>> >> > >> > separately that would be great.
>> >> > >> >
>> >> > >> > I'm still torn about all the rest of the stuff and question its
>> >> > >> > usability (esp. merging the epilogue with the main vector loop).
>> >> > >> > But it has already been approved ... oh well.
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Richard.
>> >> > >>
>> >> > >>
>> >> > >
>> >> > > --
>> >> > > Richard Biener <rguenther@suse.de>
>> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >> >
>> >>
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 14, 2016, 1:40 p.m. UTC | #6
On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:

> Richard,
> 
> In my previous patch I forgot to remove couple lines related to aux field.
> Here is the correct updated patch.

Yeah, I noticed.  This patch would be ok for trunk (together with
necessary parts from 1 and 2) if all not required parts are removed
(and you'd add the testcases covering non-masked tail vect).

Thus, can you please produce a single complete patch containing only
non-masked epilogue vectoriziation?

Thanks,
Richard.

> Thanks.
> Yuri.
> 
> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard,
> >>
> >> I prepare updated 3 patch with passing additional argument to
> >> vect_analyze_loop as you proposed (untested).
> >>
> >> You wrote:
> >> tw, I wonder if you can produce a single patch containing just
> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >> changes only needed by later patches?
> >>
> >> Did you mean that I exclude all support for vectorization epilogues,
> >> i.e. exclude from 2-nd patch all non-related changes
> >> like
> >>
> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >> index 11863af..32011c1 100644
> >> --- a/gcc/tree-vect-loop.c
> >> +++ b/gcc/tree-vect-loop.c
> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >
> > Yes.
> >
> >> Did you mean also that new combined patch must be working patch, i.e.
> >> can be integrated without other patches?
> >
> > Yes.
> >
> >> Could you please look at updated patch?
> >
> > Will do.
> >
> > Thanks,
> > Richard.
> >
> >> Thanks.
> >> Yuri.
> >>
> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >> >
> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >> >>
> >> >> > Richard,
> >> >> >
> >> >> > Here is updated 3 patch.
> >> >> >
> >> >> > I checked that all new tests related to epilogue vectorization passed with it.
> >> >> >
> >> >> > Your comments will be appreciated.
> >> >>
> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >> >> optional argument (if non-NULL we analyze the epilogue of that
> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >> >> original vectorization factor?  So we can pass down an (optional)
> >> >> forced vectorization factor as well?
> >> >
> >> > Btw, I wonder if you can produce a single patch containing just
> >> > epilogue vectorization, that is combine patches 1-3 but rip out
> >> > changes only needed by later patches?
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >> Richard.
> >> >>
> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >> >> > >
> >> >> > >> Hi Richard,
> >> >> > >>
> >> >> > >> I did not understand your last remark:
> >> >> > >>
> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >> > >> >
> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >> > >> >           && dump_enabled_p ())
> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> >> > >> >                            "loop vectorized\n");
> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >> >> > >> >         num_vectorized_loops++;
> >> >> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
> >> >> > >> >           etc.  */
> >> >> > >> >      loop->force_vectorize = false;
> >> >> > >> >
> >> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >> >> > >> > +          to match loop and its epilogue vectorization in dumps
> >> >> > >> > +          put new loop as the next loop to process.  */
> >> >> > >> > +       if (new_loop)
> >> >> > >> > +         {
> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >> >> > >> > +         }
> >> >> > >> >
> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> >> > >> f> unction which will set up stuff properly (and also perform
> >> >> > >> > the if-conversion of the epilogue there).
> >> >> > >> >
> >> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> >> > >> > separately that would be great.
> >> >> > >>
> >> >> > >> Could you please clarify your proposal.
> >> >> > >
> >> >> > > When a loop was vectorized set things up to immediately vectorize
> >> >> > > its epilogue, avoiding changing the loop iteration and avoiding
> >> >> > > the re-use of ->aux.
> >> >> > >
> >> >> > > Richard.
> >> >> > >
> >> >> > >> Thanks.
> >> >> > >> Yuri.
> >> >> > >>
> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >> >> > >> >
> >> >> > >> >> Hi All,
> >> >> > >> >>
> >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support
> >> >> > >> >> vectorization of loop epilogues and loops with low trip count. We
> >> >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> >> > >> >> approved by Jeff.
> >> >> > >> >>
> >> >> > >> >> I did re-base of all patches and performed bootstrapping and
> >> >> > >> >> regression testing that did not show any new failures. Also all
> >> >> > >> >> changes related to new vect_do_peeling algorithm have been changed
> >> >> > >> >> accordingly.
> >> >> > >> >>
> >> >> > >> >> Is it OK for trunk?
> >> >> > >> >
> >> >> > >> > I would have prefered that the series up to -03-nomask-tails would
> >> >> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
> >> >> > >> > the patchset is oddly separated.
> >> >> > >> >
> >> >> > >> > I have a comment on that part nevertheless:
> >> >> > >> >
> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> >> > >> > loop_vinfo)
> >> >> > >> >    /* Check if we can possibly peel the loop.  */
> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> >> >> > >> > -      || loop->inner)
> >> >> > >> > +      || loop->inner
> >> >> > >> > +      /* Required peeling was performed in prologue and
> >> >> > >> > +        is not required for epilogue.  */
> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >> >> > >> >      do_peeling = false;
> >> >> > >> >
> >> >> > >> >    if (do_peeling
> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> >> >> > >> > loop_vinfo)
> >> >> > >> >
> >> >> > >> >    do_versioning =
> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
> >> >> > >> > -       && (!loop->inner); /* FORNOW */
> >> >> > >> > +       && (!loop->inner) /* FORNOW */
> >> >> > >> > +        /* Required versioning was performed for the
> >> >> > >> > +          original loop and is not required for epilogue.  */
> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >> >> > >> >
> >> >> > >> >    if (do_versioning)
> >> >> > >> >      {
> >> >> > >> >
> >> >> > >> > please do that check in the single caller of this function.
> >> >> > >> >
> >> >> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
> >> >> > >> > passing down info from the processed parent would be _much_ cleaner.
> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >> >> > >> >
> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >> >> > >> >             && dump_enabled_p ())
> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> >> >> > >> >                             "loop vectorized\n");
> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >> >> > >> >         num_vectorized_loops++;
> >> >> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
> >> >> > >> >            etc.  */
> >> >> > >> >         loop->force_vectorize = false;
> >> >> > >> >
> >> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
> >> >> > >> > +          to match loop and its epilogue vectorization in dumps
> >> >> > >> > +          put new loop as the next loop to process.  */
> >> >> > >> > +       if (new_loop)
> >> >> > >> > +         {
> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >> >> > >> > +         }
> >> >> > >> >
> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> >> >> > >> > function which will set up stuff properly (and also perform
> >> >> > >> > the if-conversion of the epilogue there).
> >> >> > >> >
> >> >> > >> > That said, if we can get in non-masked epilogue vectorization
> >> >> > >> > separately that would be great.
> >> >> > >> >
> >> >> > >> > I'm still torn about all the rest of the stuff and question its
> >> >> > >> > usability (esp. merging the epilogue with the main vector loop).
> >> >> > >> > But it has already been approved ... oh well.
> >> >> > >> >
> >> >> > >> > Thanks,
> >> >> > >> > Richard.
> >> >> > >>
> >> >> > >>
> >> >> > >
> >> >> > > --
> >> >> > > Richard Biener <rguenther@suse.de>
> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >> >> >
> >> >>
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>
Yuri Rumyantsev Nov. 14, 2016, 3:39 p.m. UTC | #7
Richard,

I checked one of the tests designed for epilogue vectorization using
patches 1 - 3 and found out that build compiler performs vectorization
of epilogues with --param vect-epilogues-nomask=1 passed:

$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
t1.new-nomask.s -fdump-tree-vect-details
$ grep VECTORIZED -c t1.c.156t.vect
4
 Without param only 2 loops are vectorized.

Should I simply add a part of tests related to this feature or I must
delete all not necessary changes also?

Thanks.
Yuri.

2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard,
>>
>> In my previous patch I forgot to remove couple lines related to aux field.
>> Here is the correct updated patch.
>
> Yeah, I noticed.  This patch would be ok for trunk (together with
> necessary parts from 1 and 2) if all not required parts are removed
> (and you'd add the testcases covering non-masked tail vect).
>
> Thus, can you please produce a single complete patch containing only
> non-masked epilogue vectoriziation?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>> >
>> >> Richard,
>> >>
>> >> I prepare updated 3 patch with passing additional argument to
>> >> vect_analyze_loop as you proposed (untested).
>> >>
>> >> You wrote:
>> >> tw, I wonder if you can produce a single patch containing just
>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>> >> changes only needed by later patches?
>> >>
>> >> Did you mean that I exclude all support for vectorization epilogues,
>> >> i.e. exclude from 2-nd patch all non-related changes
>> >> like
>> >>
>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> >> index 11863af..32011c1 100644
>> >> --- a/gcc/tree-vect-loop.c
>> >> +++ b/gcc/tree-vect-loop.c
>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>> >
>> > Yes.
>> >
>> >> Did you mean also that new combined patch must be working patch, i.e.
>> >> can be integrated without other patches?
>> >
>> > Yes.
>> >
>> >> Could you please look at updated patch?
>> >
>> > Will do.
>> >
>> > Thanks,
>> > Richard.
>> >
>> >> Thanks.
>> >> Yuri.
>> >>
>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>> >> >
>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>> >> >>
>> >> >> > Richard,
>> >> >> >
>> >> >> > Here is updated 3 patch.
>> >> >> >
>> >> >> > I checked that all new tests related to epilogue vectorization passed with it.
>> >> >> >
>> >> >> > Your comments will be appreciated.
>> >> >>
>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> >> >> original vectorization factor?  So we can pass down an (optional)
>> >> >> forced vectorization factor as well?
>> >> >
>> >> > Btw, I wonder if you can produce a single patch containing just
>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>> >> > changes only needed by later patches?
>> >> >
>> >> > Thanks,
>> >> > Richard.
>> >> >
>> >> >> Richard.
>> >> >>
>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> >> >> > >
>> >> >> > >> Hi Richard,
>> >> >> > >>
>> >> >> > >> I did not understand your last remark:
>> >> >> > >>
>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> >> > >> >
>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> >> > >> >           && dump_enabled_p ())
>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >> >> > >> >                            "loop vectorized\n");
>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >> >> > >> >         num_vectorized_loops++;
>> >> >> > >> >        /* Now that the loop has been vectorized, allow it to be unrolled
>> >> >> > >> >           etc.  */
>> >> >> > >> >      loop->force_vectorize = false;
>> >> >> > >> >
>> >> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> >> >> > >> > +          to match loop and its epilogue vectorization in dumps
>> >> >> > >> > +          put new loop as the next loop to process.  */
>> >> >> > >> > +       if (new_loop)
>> >> >> > >> > +         {
>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >> >> > >> > +         }
>> >> >> > >> >
>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> >> >> > >> f> unction which will set up stuff properly (and also perform
>> >> >> > >> > the if-conversion of the epilogue there).
>> >> >> > >> >
>> >> >> > >> > That said, if we can get in non-masked epilogue vectorization
>> >> >> > >> > separately that would be great.
>> >> >> > >>
>> >> >> > >> Could you please clarify your proposal.
>> >> >> > >
>> >> >> > > When a loop was vectorized set things up to immediately vectorize
>> >> >> > > its epilogue, avoiding changing the loop iteration and avoiding
>> >> >> > > the re-use of ->aux.
>> >> >> > >
>> >> >> > > Richard.
>> >> >> > >
>> >> >> > >> Thanks.
>> >> >> > >> Yuri.
>> >> >> > >>
>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >> >> > >> >
>> >> >> > >> >> Hi All,
>> >> >> > >> >>
>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review which support
>> >> >> > >> >> vectorization of loop epilogues and loops with low trip count. We
>> >> >> > >> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> >> >> > >> >> approved by Jeff.
>> >> >> > >> >>
>> >> >> > >> >> I did re-base of all patches and performed bootstrapping and
>> >> >> > >> >> regression testing that did not show any new failures. Also all
>> >> >> > >> >> changes related to new vect_do_peeling algorithm have been changed
>> >> >> > >> >> accordingly.
>> >> >> > >> >>
>> >> >> > >> >> Is it OK for trunk?
>> >> >> > >> >
>> >> >> > >> > I would have prefered that the series up to -03-nomask-tails would
>> >> >> > >> > _only_ contain epilogue loop vectorization changes but unfortunately
>> >> >> > >> > the patchset is oddly separated.
>> >> >> > >> >
>> >> >> > >> > I have a comment on that part nevertheless:
>> >> >> > >> >
>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> >> >> > >> > loop_vinfo)
>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
>> >> >> > >> > -      || loop->inner)
>> >> >> > >> > +      || loop->inner
>> >> >> > >> > +      /* Required peeling was performed in prologue and
>> >> >> > >> > +        is not required for epilogue.  */
>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> >> >> > >> >      do_peeling = false;
>> >> >> > >> >
>> >> >> > >> >    if (do_peeling
>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
>> >> >> > >> > loop_vinfo)
>> >> >> > >> >
>> >> >> > >> >    do_versioning =
>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>> >> >> > >> > +        /* Required versioning was performed for the
>> >> >> > >> > +          original loop and is not required for epilogue.  */
>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> >> >> > >> >
>> >> >> > >> >    if (do_versioning)
>> >> >> > >> >      {
>> >> >> > >> >
>> >> >> > >> > please do that check in the single caller of this function.
>> >> >> > >> >
>> >> >> > >> > Otherwise I still dislike the new ->aux use and I believe that simply
>> >> >> > >> > passing down info from the processed parent would be _much_ cleaner.
>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >> >> > >> >
>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >> >> > >> >             && dump_enabled_p ())
>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>> >> >> > >> >                             "loop vectorized\n");
>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >> >> > >> >         num_vectorized_loops++;
>> >> >> > >> >         /* Now that the loop has been vectorized, allow it to be unrolled
>> >> >> > >> >            etc.  */
>> >> >> > >> >         loop->force_vectorize = false;
>> >> >> > >> >
>> >> >> > >> > +       /* Add new loop to a processing queue.  To make it easier
>> >> >> > >> > +          to match loop and its epilogue vectorization in dumps
>> >> >> > >> > +          put new loop as the next loop to process.  */
>> >> >> > >> > +       if (new_loop)
>> >> >> > >> > +         {
>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >> >> > >> > +         }
>> >> >> > >> >
>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
>> >> >> > >> > function which will set up stuff properly (and also perform
>> >> >> > >> > the if-conversion of the epilogue there).
>> >> >> > >> >
>> >> >> > >> > That said, if we can get in non-masked epilogue vectorization
>> >> >> > >> > separately that would be great.
>> >> >> > >> >
>> >> >> > >> > I'm still torn about all the rest of the stuff and question its
>> >> >> > >> > usability (esp. merging the epilogue with the main vector loop).
>> >> >> > >> > But it has already been approved ... oh well.
>> >> >> > >> >
>> >> >> > >> > Thanks,
>> >> >> > >> > Richard.
>> >> >> > >>
>> >> >> > >>
>> >> >> > >
>> >> >> > > --
>> >> >> > > Richard Biener <rguenther@suse.de>
>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >> >> >
>> >> >>
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguenther@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 14, 2016, 5:04 p.m. UTC | #8
On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>Richard,
>
>I checked one of the tests designed for epilogue vectorization using
>patches 1 - 3 and found out that build compiler performs vectorization
>of epilogues with --param vect-epilogues-nomask=1 passed:
>
>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>t1.new-nomask.s -fdump-tree-vect-details
>$ grep VECTORIZED -c t1.c.156t.vect
>4
> Without param only 2 loops are vectorized.
>
>Should I simply add a part of tests related to this feature or I must
>delete all not necessary changes also?

Please remove all not necessary changes.

Richard.

>Thanks.
>Yuri.
>
>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>
>>> Richard,
>>>
>>> In my previous patch I forgot to remove couple lines related to aux
>field.
>>> Here is the correct updated patch.
>>
>> Yeah, I noticed.  This patch would be ok for trunk (together with
>> necessary parts from 1 and 2) if all not required parts are removed
>> (and you'd add the testcases covering non-masked tail vect).
>>
>> Thus, can you please produce a single complete patch containing only
>> non-masked epilogue vectoriziation?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>> >
>>> >> Richard,
>>> >>
>>> >> I prepare updated 3 patch with passing additional argument to
>>> >> vect_analyze_loop as you proposed (untested).
>>> >>
>>> >> You wrote:
>>> >> tw, I wonder if you can produce a single patch containing just
>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>> >> changes only needed by later patches?
>>> >>
>>> >> Did you mean that I exclude all support for vectorization
>epilogues,
>>> >> i.e. exclude from 2-nd patch all non-related changes
>>> >> like
>>> >>
>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> >> index 11863af..32011c1 100644
>>> >> --- a/gcc/tree-vect-loop.c
>>> >> +++ b/gcc/tree-vect-loop.c
>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>> >
>>> > Yes.
>>> >
>>> >> Did you mean also that new combined patch must be working patch,
>i.e.
>>> >> can be integrated without other patches?
>>> >
>>> > Yes.
>>> >
>>> >> Could you please look at updated patch?
>>> >
>>> > Will do.
>>> >
>>> > Thanks,
>>> > Richard.
>>> >
>>> >> Thanks.
>>> >> Yuri.
>>> >>
>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>> >> >
>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>> >> >>
>>> >> >> > Richard,
>>> >> >> >
>>> >> >> > Here is updated 3 patch.
>>> >> >> >
>>> >> >> > I checked that all new tests related to epilogue
>vectorization passed with it.
>>> >> >> >
>>> >> >> > Your comments will be appreciated.
>>> >> >>
>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>> >> >> original vectorization factor?  So we can pass down an
>(optional)
>>> >> >> forced vectorization factor as well?
>>> >> >
>>> >> > Btw, I wonder if you can produce a single patch containing just
>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>> >> > changes only needed by later patches?
>>> >> >
>>> >> > Thanks,
>>> >> > Richard.
>>> >> >
>>> >> >> Richard.
>>> >> >>
>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
><rguenther@suse.de>:
>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>> >> >> > >
>>> >> >> > >> Hi Richard,
>>> >> >> > >>
>>> >> >> > >> I did not understand your last remark:
>>> >> >> > >>
>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> >> >> > >> >
>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> >> >> > >> >           && dump_enabled_p ())
>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>vect_location,
>>> >> >> > >> >                            "loop vectorized\n");
>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>> >> >> > >> >         num_vectorized_loops++;
>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>it to be unrolled
>>> >> >> > >> >           etc.  */
>>> >> >> > >> >      loop->force_vectorize = false;
>>> >> >> > >> >
>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>it easier
>>> >> >> > >> > +          to match loop and its epilogue vectorization
>in dumps
>>> >> >> > >> > +          put new loop as the next loop to process. 
>*/
>>> >> >> > >> > +       if (new_loop)
>>> >> >> > >> > +         {
>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>> >> >> > >> > +         }
>>> >> >> > >> >
>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>new_loop)
>>> >> >> > >> f> unction which will set up stuff properly (and also
>perform
>>> >> >> > >> > the if-conversion of the epilogue there).
>>> >> >> > >> >
>>> >> >> > >> > That said, if we can get in non-masked epilogue
>vectorization
>>> >> >> > >> > separately that would be great.
>>> >> >> > >>
>>> >> >> > >> Could you please clarify your proposal.
>>> >> >> > >
>>> >> >> > > When a loop was vectorized set things up to immediately
>vectorize
>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>avoiding
>>> >> >> > > the re-use of ->aux.
>>> >> >> > >
>>> >> >> > > Richard.
>>> >> >> > >
>>> >> >> > >> Thanks.
>>> >> >> > >> Yuri.
>>> >> >> > >>
>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
><rguenther@suse.de>:
>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>> >> >> > >> >
>>> >> >> > >> >> Hi All,
>>> >> >> > >> >>
>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>which support
>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>trip count. We
>>> >> >> > >> >> assume that the only patch -
>vec-tails-07-combine-tail.patch - was not
>>> >> >> > >> >> approved by Jeff.
>>> >> >> > >> >>
>>> >> >> > >> >> I did re-base of all patches and performed
>bootstrapping and
>>> >> >> > >> >> regression testing that did not show any new failures.
>Also all
>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>been changed
>>> >> >> > >> >> accordingly.
>>> >> >> > >> >>
>>> >> >> > >> >> Is it OK for trunk?
>>> >> >> > >> >
>>> >> >> > >> > I would have prefered that the series up to
>-03-nomask-tails would
>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>unfortunately
>>> >> >> > >> > the patchset is oddly separated.
>>> >> >> > >> >
>>> >> >> > >> > I have a comment on that part nevertheless:
>>> >> >> > >> >
>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>(loop_vec_info
>>> >> >> > >> > loop_vinfo)
>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>single_exit (loop))
>>> >> >> > >> > -      || loop->inner)
>>> >> >> > >> > +      || loop->inner
>>> >> >> > >> > +      /* Required peeling was performed in prologue
>and
>>> >> >> > >> > +        is not required for epilogue.  */
>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>> >> >> > >> >      do_peeling = false;
>>> >> >> > >> >
>>> >> >> > >> >    if (do_peeling
>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>(loop_vec_info
>>> >> >> > >> > loop_vinfo)
>>> >> >> > >> >
>>> >> >> > >> >    do_versioning =
>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>> >> >> > >> > +        /* Required versioning was performed for the
>>> >> >> > >> > +          original loop and is not required for
>epilogue.  */
>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>> >> >> > >> >
>>> >> >> > >> >    if (do_versioning)
>>> >> >> > >> >      {
>>> >> >> > >> >
>>> >> >> > >> > please do that check in the single caller of this
>function.
>>> >> >> > >> >
>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>believe that simply
>>> >> >> > >> > passing down info from the processed parent would be
>_much_ cleaner.
>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>> >> >> > >> >
>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>> >> >> > >> >             && dump_enabled_p ())
>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>vect_location,
>>> >> >> > >> >                             "loop vectorized\n");
>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>> >> >> > >> >         num_vectorized_loops++;
>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>it to be unrolled
>>> >> >> > >> >            etc.  */
>>> >> >> > >> >         loop->force_vectorize = false;
>>> >> >> > >> >
>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>it easier
>>> >> >> > >> > +          to match loop and its epilogue vectorization
>in dumps
>>> >> >> > >> > +          put new loop as the next loop to process. 
>*/
>>> >> >> > >> > +       if (new_loop)
>>> >> >> > >> > +         {
>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>> >> >> > >> > +         }
>>> >> >> > >> >
>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>new_loop)
>>> >> >> > >> > function which will set up stuff properly (and also
>perform
>>> >> >> > >> > the if-conversion of the epilogue there).
>>> >> >> > >> >
>>> >> >> > >> > That said, if we can get in non-masked epilogue
>vectorization
>>> >> >> > >> > separately that would be great.
>>> >> >> > >> >
>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>question its
>>> >> >> > >> > usability (esp. merging the epilogue with the main
>vector loop).
>>> >> >> > >> > But it has already been approved ... oh well.
>>> >> >> > >> >
>>> >> >> > >> > Thanks,
>>> >> >> > >> > Richard.
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >
>>> >> >> > > --
>>> >> >> > > Richard Biener <rguenther@suse.de>
>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>Graham Norton, HRB 21284 (AG Nuernberg)
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > Richard Biener <rguenther@suse.de>
>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>Norton, HRB 21284 (AG Nuernberg)
>>> >>
>>> >
>>> > --
>>> > Richard Biener <rguenther@suse.de>
>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>Norton, HRB 21284 (AG Nuernberg)
>>>
>>
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>Norton, HRB 21284 (AG Nuernberg)
Yuri Rumyantsev Nov. 15, 2016, 2:41 p.m. UTC | #9
Hi All,

Here is patch for non-masked epilogue vectoriziation.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

Thanks.
Changelog:

2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>

* params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
* tree-if-conv.c (tree_if_conversion): Make public.
* * tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
dynamic alias checks for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
* tree-vect-loop.c: include tree-if-conv.h.
(new_loop_vec_info): Add zeroing orig_loop_info field.
(vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
(vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
using passed argument.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization with less vf.  If-convert epilogue if
required. Print vectorization success for epilogue.
* tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
if it is required, pass loop_vinfo produced during vectorization of
loop body to vect_analyze_loop.
* tree-vectorizer.h (struct _loop_vec_info): Add new field
orig_loop_info.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
(vect_do_peeling): Change prototype to return epilogue.
(vect_analyze_loop): Add argument of loop_vec_info type.
(vect_transform_loop): Return created loop.

gcc/testsuite/

* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.


2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>Richard,
>>
>>I checked one of the tests designed for epilogue vectorization using
>>patches 1 - 3 and found out that build compiler performs vectorization
>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>
>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>t1.new-nomask.s -fdump-tree-vect-details
>>$ grep VECTORIZED -c t1.c.156t.vect
>>4
>> Without param only 2 loops are vectorized.
>>
>>Should I simply add a part of tests related to this feature or I must
>>delete all not necessary changes also?
>
> Please remove all not necessary changes.
>
> Richard.
>
>>Thanks.
>>Yuri.
>>
>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>
>>>> Richard,
>>>>
>>>> In my previous patch I forgot to remove couple lines related to aux
>>field.
>>>> Here is the correct updated patch.
>>>
>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>> necessary parts from 1 and 2) if all not required parts are removed
>>> (and you'd add the testcases covering non-masked tail vect).
>>>
>>> Thus, can you please produce a single complete patch containing only
>>> non-masked epilogue vectoriziation?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>> >
>>>> >> Richard,
>>>> >>
>>>> >> I prepare updated 3 patch with passing additional argument to
>>>> >> vect_analyze_loop as you proposed (untested).
>>>> >>
>>>> >> You wrote:
>>>> >> tw, I wonder if you can produce a single patch containing just
>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>> >> changes only needed by later patches?
>>>> >>
>>>> >> Did you mean that I exclude all support for vectorization
>>epilogues,
>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>> >> like
>>>> >>
>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>> >> index 11863af..32011c1 100644
>>>> >> --- a/gcc/tree-vect-loop.c
>>>> >> +++ b/gcc/tree-vect-loop.c
>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>> >
>>>> > Yes.
>>>> >
>>>> >> Did you mean also that new combined patch must be working patch,
>>i.e.
>>>> >> can be integrated without other patches?
>>>> >
>>>> > Yes.
>>>> >
>>>> >> Could you please look at updated patch?
>>>> >
>>>> > Will do.
>>>> >
>>>> > Thanks,
>>>> > Richard.
>>>> >
>>>> >> Thanks.
>>>> >> Yuri.
>>>> >>
>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>> >> >
>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>> >> >>
>>>> >> >> > Richard,
>>>> >> >> >
>>>> >> >> > Here is updated 3 patch.
>>>> >> >> >
>>>> >> >> > I checked that all new tests related to epilogue
>>vectorization passed with it.
>>>> >> >> >
>>>> >> >> > Your comments will be appreciated.
>>>> >> >>
>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>> >> >> original vectorization factor?  So we can pass down an
>>(optional)
>>>> >> >> forced vectorization factor as well?
>>>> >> >
>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>> >> > changes only needed by later patches?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Richard.
>>>> >> >
>>>> >> >> Richard.
>>>> >> >>
>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>><rguenther@suse.de>:
>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>> >> >> > >
>>>> >> >> > >> Hi Richard,
>>>> >> >> > >>
>>>> >> >> > >> I did not understand your last remark:
>>>> >> >> > >>
>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>> >> >> > >> >
>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>> >> >> > >> >           && dump_enabled_p ())
>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>vect_location,
>>>> >> >> > >> >                            "loop vectorized\n");
>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>> >> >> > >> >         num_vectorized_loops++;
>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>it to be unrolled
>>>> >> >> > >> >           etc.  */
>>>> >> >> > >> >      loop->force_vectorize = false;
>>>> >> >> > >> >
>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>it easier
>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>in dumps
>>>> >> >> > >> > +          put new loop as the next loop to process.
>>*/
>>>> >> >> > >> > +       if (new_loop)
>>>> >> >> > >> > +         {
>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>> >> >> > >> > +         }
>>>> >> >> > >> >
>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>new_loop)
>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>perform
>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>> >> >> > >> >
>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>vectorization
>>>> >> >> > >> > separately that would be great.
>>>> >> >> > >>
>>>> >> >> > >> Could you please clarify your proposal.
>>>> >> >> > >
>>>> >> >> > > When a loop was vectorized set things up to immediately
>>vectorize
>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>avoiding
>>>> >> >> > > the re-use of ->aux.
>>>> >> >> > >
>>>> >> >> > > Richard.
>>>> >> >> > >
>>>> >> >> > >> Thanks.
>>>> >> >> > >> Yuri.
>>>> >> >> > >>
>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>><rguenther@suse.de>:
>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>> >> >> > >> >
>>>> >> >> > >> >> Hi All,
>>>> >> >> > >> >>
>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>which support
>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>trip count. We
>>>> >> >> > >> >> assume that the only patch -
>>vec-tails-07-combine-tail.patch - was not
>>>> >> >> > >> >> approved by Jeff.
>>>> >> >> > >> >>
>>>> >> >> > >> >> I did re-base of all patches and performed
>>bootstrapping and
>>>> >> >> > >> >> regression testing that did not show any new failures.
>>Also all
>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>been changed
>>>> >> >> > >> >> accordingly.
>>>> >> >> > >> >>
>>>> >> >> > >> >> Is it OK for trunk?
>>>> >> >> > >> >
>>>> >> >> > >> > I would have prefered that the series up to
>>-03-nomask-tails would
>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>unfortunately
>>>> >> >> > >> > the patchset is oddly separated.
>>>> >> >> > >> >
>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>> >> >> > >> >
>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>(loop_vec_info
>>>> >> >> > >> > loop_vinfo)
>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>single_exit (loop))
>>>> >> >> > >> > -      || loop->inner)
>>>> >> >> > >> > +      || loop->inner
>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>and
>>>> >> >> > >> > +        is not required for epilogue.  */
>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>> >> >> > >> >      do_peeling = false;
>>>> >> >> > >> >
>>>> >> >> > >> >    if (do_peeling
>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>(loop_vec_info
>>>> >> >> > >> > loop_vinfo)
>>>> >> >> > >> >
>>>> >> >> > >> >    do_versioning =
>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>> >> >> > >> > +          original loop and is not required for
>>epilogue.  */
>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>> >> >> > >> >
>>>> >> >> > >> >    if (do_versioning)
>>>> >> >> > >> >      {
>>>> >> >> > >> >
>>>> >> >> > >> > please do that check in the single caller of this
>>function.
>>>> >> >> > >> >
>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>believe that simply
>>>> >> >> > >> > passing down info from the processed parent would be
>>_much_ cleaner.
>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>> >> >> > >> >
>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>> >> >> > >> >             && dump_enabled_p ())
>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>vect_location,
>>>> >> >> > >> >                             "loop vectorized\n");
>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>> >> >> > >> >         num_vectorized_loops++;
>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>it to be unrolled
>>>> >> >> > >> >            etc.  */
>>>> >> >> > >> >         loop->force_vectorize = false;
>>>> >> >> > >> >
>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>it easier
>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>in dumps
>>>> >> >> > >> > +          put new loop as the next loop to process.
>>*/
>>>> >> >> > >> > +       if (new_loop)
>>>> >> >> > >> > +         {
>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>> >> >> > >> > +         }
>>>> >> >> > >> >
>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>new_loop)
>>>> >> >> > >> > function which will set up stuff properly (and also
>>perform
>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>> >> >> > >> >
>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>vectorization
>>>> >> >> > >> > separately that would be great.
>>>> >> >> > >> >
>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>question its
>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>vector loop).
>>>> >> >> > >> > But it has already been approved ... oh well.
>>>> >> >> > >> >
>>>> >> >> > >> > Thanks,
>>>> >> >> > >> > Richard.
>>>> >> >> > >>
>>>> >> >> > >>
>>>> >> >> > >
>>>> >> >> > > --
>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >> > --
>>>> >> > Richard Biener <rguenther@suse.de>
>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>Norton, HRB 21284 (AG Nuernberg)
>>>> >>
>>>> >
>>>> > --
>>>> > Richard Biener <rguenther@suse.de>
>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>Norton, HRB 21284 (AG Nuernberg)
>>>>
>>>
>>> --
>>> Richard Biener <rguenther@suse.de>
>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>Norton, HRB 21284 (AG Nuernberg)
>
>
Richard Biener Nov. 16, 2016, 9:56 a.m. UTC | #10
On Tue, 15 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is patch for non-masked epilogue vectoriziation.
> 
> Bootstrap and regression testing did not show any new failures.
> 
> Is it OK for trunk?

Ok for trunk.

I believe we ultimatively want to remove the new
--param and enable this by default with a better cost model.
What immediately comes to my mind when seeing

+  if (epilogue)
+    {
+       unsigned int vector_sizes
+         = targetm.vectorize.autovectorize_vector_sizes ();
+       vector_sizes &= current_vector_size - 1;
+
+       if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+         epilogue = NULL;
+       else if (!vector_sizes)
+         epilogue = NULL;
+       else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+                && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+         {
+           int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
+           int ratio = current_vector_size / smallest_vec_size;
+           int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+             - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+           eiters = eiters % vf;
+
+           epilogue->nb_iterations_upper_bound = eiters - 1;
+
+           if (eiters < vf / ratio)
+             epilogue = NULL;
+           }

is that we have targetm.vectorize.preferred_simd_mode which
for example with -mprefer-avx128 will first try with 128bit
vectorization.  So if a ! prefered vector size ends up
creating the epilogue we know vectorizing with the prefered
size will fail.  The above also does not take into account
peeling for gaps in which case we know the epilogue runs at
least VF times (but the vectorized epilogue might also need
an epilogue itself if not masked).

The natural next step is the masked epilogue support, after
that the merged masked epilogue one.

Thanks,
Richard.

> Thanks.
> Changelog:
> 
> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
> 
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
> 
> gcc/testsuite/
> 
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
> 
> 
> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
> > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> >>Richard,
> >>
> >>I checked one of the tests designed for epilogue vectorization using
> >>patches 1 - 3 and found out that build compiler performs vectorization
> >>of epilogues with --param vect-epilogues-nomask=1 passed:
> >>
> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
> >>t1.new-nomask.s -fdump-tree-vect-details
> >>$ grep VECTORIZED -c t1.c.156t.vect
> >>4
> >> Without param only 2 loops are vectorized.
> >>
> >>Should I simply add a part of tests related to this feature or I must
> >>delete all not necessary changes also?
> >
> > Please remove all not necessary changes.
> >
> > Richard.
> >
> >>Thanks.
> >>Yuri.
> >>
> >>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
> >>>
> >>>> Richard,
> >>>>
> >>>> In my previous patch I forgot to remove couple lines related to aux
> >>field.
> >>>> Here is the correct updated patch.
> >>>
> >>> Yeah, I noticed.  This patch would be ok for trunk (together with
> >>> necessary parts from 1 and 2) if all not required parts are removed
> >>> (and you'd add the testcases covering non-masked tail vect).
> >>>
> >>> Thus, can you please produce a single complete patch containing only
> >>> non-masked epilogue vectoriziation?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>> Thanks.
> >>>> Yuri.
> >>>>
> >>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >
> >>>> >> Richard,
> >>>> >>
> >>>> >> I prepare updated 3 patch with passing additional argument to
> >>>> >> vect_analyze_loop as you proposed (untested).
> >>>> >>
> >>>> >> You wrote:
> >>>> >> tw, I wonder if you can produce a single patch containing just
> >>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >>>> >> changes only needed by later patches?
> >>>> >>
> >>>> >> Did you mean that I exclude all support for vectorization
> >>epilogues,
> >>>> >> i.e. exclude from 2-nd patch all non-related changes
> >>>> >> like
> >>>> >>
> >>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >>>> >> index 11863af..32011c1 100644
> >>>> >> --- a/gcc/tree-vect-loop.c
> >>>> >> +++ b/gcc/tree-vect-loop.c
> >>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >>>> >
> >>>> > Yes.
> >>>> >
> >>>> >> Did you mean also that new combined patch must be working patch,
> >>i.e.
> >>>> >> can be integrated without other patches?
> >>>> >
> >>>> > Yes.
> >>>> >
> >>>> >> Could you please look at updated patch?
> >>>> >
> >>>> > Will do.
> >>>> >
> >>>> > Thanks,
> >>>> > Richard.
> >>>> >
> >>>> >> Thanks.
> >>>> >> Yuri.
> >>>> >>
> >>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >>>> >> >
> >>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >>
> >>>> >> >> > Richard,
> >>>> >> >> >
> >>>> >> >> > Here is updated 3 patch.
> >>>> >> >> >
> >>>> >> >> > I checked that all new tests related to epilogue
> >>vectorization passed with it.
> >>>> >> >> >
> >>>> >> >> > Your comments will be appreciated.
> >>>> >> >>
> >>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
> >>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >>>> >> >> original vectorization factor?  So we can pass down an
> >>(optional)
> >>>> >> >> forced vectorization factor as well?
> >>>> >> >
> >>>> >> > Btw, I wonder if you can produce a single patch containing just
> >>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
> >>>> >> > changes only needed by later patches?
> >>>> >> >
> >>>> >> > Thanks,
> >>>> >> > Richard.
> >>>> >> >
> >>>> >> >> Richard.
> >>>> >> >>
> >>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
> >><rguenther@suse.de>:
> >>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >> > >
> >>>> >> >> > >> Hi Richard,
> >>>> >> >> > >>
> >>>> >> >> > >> I did not understand your last remark:
> >>>> >> >> > >>
> >>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> >> >> > >> >           && dump_enabled_p ())
> >>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>vect_location,
> >>>> >> >> > >> >                            "loop vectorized\n");
> >>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> >         num_vectorized_loops++;
> >>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
> >>it to be unrolled
> >>>> >> >> > >> >           etc.  */
> >>>> >> >> > >> >      loop->force_vectorize = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>it easier
> >>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>in dumps
> >>>> >> >> > >> > +          put new loop as the next loop to process.
> >>*/
> >>>> >> >> > >> > +       if (new_loop)
> >>>> >> >> > >> > +         {
> >>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> >> >> > >> > +         }
> >>>> >> >> > >> >
> >>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>new_loop)
> >>>> >> >> > >> f> unction which will set up stuff properly (and also
> >>perform
> >>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>> >> >> > >> >
> >>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>vectorization
> >>>> >> >> > >> > separately that would be great.
> >>>> >> >> > >>
> >>>> >> >> > >> Could you please clarify your proposal.
> >>>> >> >> > >
> >>>> >> >> > > When a loop was vectorized set things up to immediately
> >>vectorize
> >>>> >> >> > > its epilogue, avoiding changing the loop iteration and
> >>avoiding
> >>>> >> >> > > the re-use of ->aux.
> >>>> >> >> > >
> >>>> >> >> > > Richard.
> >>>> >> >> > >
> >>>> >> >> > >> Thanks.
> >>>> >> >> > >> Yuri.
> >>>> >> >> > >>
> >>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
> >><rguenther@suse.de>:
> >>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >> > >> >
> >>>> >> >> > >> >> Hi All,
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
> >>which support
> >>>> >> >> > >> >> vectorization of loop epilogues and loops with low
> >>trip count. We
> >>>> >> >> > >> >> assume that the only patch -
> >>vec-tails-07-combine-tail.patch - was not
> >>>> >> >> > >> >> approved by Jeff.
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> I did re-base of all patches and performed
> >>bootstrapping and
> >>>> >> >> > >> >> regression testing that did not show any new failures.
> >>Also all
> >>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
> >>been changed
> >>>> >> >> > >> >> accordingly.
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> Is it OK for trunk?
> >>>> >> >> > >> >
> >>>> >> >> > >> > I would have prefered that the series up to
> >>-03-nomask-tails would
> >>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
> >>unfortunately
> >>>> >> >> > >> > the patchset is oddly separated.
> >>>> >> >> > >> >
> >>>> >> >> > >> > I have a comment on that part nevertheless:
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
> >>(loop_vec_info
> >>>> >> >> > >> > loop_vinfo)
> >>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
> >>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
> >>single_exit (loop))
> >>>> >> >> > >> > -      || loop->inner)
> >>>> >> >> > >> > +      || loop->inner
> >>>> >> >> > >> > +      /* Required peeling was performed in prologue
> >>and
> >>>> >> >> > >> > +        is not required for epilogue.  */
> >>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >>>> >> >> > >> >      do_peeling = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> >    if (do_peeling
> >>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
> >>(loop_vec_info
> >>>> >> >> > >> > loop_vinfo)
> >>>> >> >> > >> >
> >>>> >> >> > >> >    do_versioning =
> >>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
> >>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
> >>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
> >>>> >> >> > >> > +        /* Required versioning was performed for the
> >>>> >> >> > >> > +          original loop and is not required for
> >>epilogue.  */
> >>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >>>> >> >> > >> >
> >>>> >> >> > >> >    if (do_versioning)
> >>>> >> >> > >> >      {
> >>>> >> >> > >> >
> >>>> >> >> > >> > please do that check in the single caller of this
> >>function.
> >>>> >> >> > >> >
> >>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
> >>believe that simply
> >>>> >> >> > >> > passing down info from the processed parent would be
> >>_much_ cleaner.
> >>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> >> >> > >> >             && dump_enabled_p ())
> >>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>vect_location,
> >>>> >> >> > >> >                             "loop vectorized\n");
> >>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> >         num_vectorized_loops++;
> >>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
> >>it to be unrolled
> >>>> >> >> > >> >            etc.  */
> >>>> >> >> > >> >         loop->force_vectorize = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>it easier
> >>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>in dumps
> >>>> >> >> > >> > +          put new loop as the next loop to process.
> >>*/
> >>>> >> >> > >> > +       if (new_loop)
> >>>> >> >> > >> > +         {
> >>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> >> >> > >> > +         }
> >>>> >> >> > >> >
> >>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>new_loop)
> >>>> >> >> > >> > function which will set up stuff properly (and also
> >>perform
> >>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>> >> >> > >> >
> >>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>vectorization
> >>>> >> >> > >> > separately that would be great.
> >>>> >> >> > >> >
> >>>> >> >> > >> > I'm still torn about all the rest of the stuff and
> >>question its
> >>>> >> >> > >> > usability (esp. merging the epilogue with the main
> >>vector loop).
> >>>> >> >> > >> > But it has already been approved ... oh well.
> >>>> >> >> > >> >
> >>>> >> >> > >> > Thanks,
> >>>> >> >> > >> > Richard.
> >>>> >> >> > >>
> >>>> >> >> > >>
> >>>> >> >> > >
> >>>> >> >> > > --
> >>>> >> >> > > Richard Biener <rguenther@suse.de>
> >>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
> >>Graham Norton, HRB 21284 (AG Nuernberg)
> >>>> >> >> >
> >>>> >> >>
> >>>> >> >>
> >>>> >> >
> >>>> >> > --
> >>>> >> > Richard Biener <rguenther@suse.de>
> >>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >>>> >>
> >>>> >
> >>>> > --
> >>>> > Richard Biener <rguenther@suse.de>
> >>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguenther@suse.de>
> >>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >
> >
>
Christophe Lyon Nov. 18, 2016, 1:20 p.m. UTC | #11
On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> Here is patch for non-masked epilogue vectoriziation.
>
> Bootstrap and regression testing did not show any new failures.
>
> Is it OK for trunk?
>
> Thanks.
> Changelog:
>
> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
>
> gcc/testsuite/
>
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>

Hi,

This new test fails on arm-none-eabi (using default cpu/fpu/mode):
  gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-tail-nomask-1.c execution test

It does pass on the same target if configured --with-cpu=cortex-a9.

Christophe



>
> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>Richard,
>>>
>>>I checked one of the tests designed for epilogue vectorization using
>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>
>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>t1.new-nomask.s -fdump-tree-vect-details
>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>4
>>> Without param only 2 loops are vectorized.
>>>
>>>Should I simply add a part of tests related to this feature or I must
>>>delete all not necessary changes also?
>>
>> Please remove all not necessary changes.
>>
>> Richard.
>>
>>>Thanks.
>>>Yuri.
>>>
>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>>
>>>>> Richard,
>>>>>
>>>>> In my previous patch I forgot to remove couple lines related to aux
>>>field.
>>>>> Here is the correct updated patch.
>>>>
>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>>> necessary parts from 1 and 2) if all not required parts are removed
>>>> (and you'd add the testcases covering non-masked tail vect).
>>>>
>>>> Thus, can you please produce a single complete patch containing only
>>>> non-masked epilogue vectoriziation?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>>> >
>>>>> >> Richard,
>>>>> >>
>>>>> >> I prepare updated 3 patch with passing additional argument to
>>>>> >> vect_analyze_loop as you proposed (untested).
>>>>> >>
>>>>> >> You wrote:
>>>>> >> tw, I wonder if you can produce a single patch containing just
>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>>> >> changes only needed by later patches?
>>>>> >>
>>>>> >> Did you mean that I exclude all support for vectorization
>>>epilogues,
>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>>> >> like
>>>>> >>
>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>>> >> index 11863af..32011c1 100644
>>>>> >> --- a/gcc/tree-vect-loop.c
>>>>> >> +++ b/gcc/tree-vect-loop.c
>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>>> >
>>>>> > Yes.
>>>>> >
>>>>> >> Did you mean also that new combined patch must be working patch,
>>>i.e.
>>>>> >> can be integrated without other patches?
>>>>> >
>>>>> > Yes.
>>>>> >
>>>>> >> Could you please look at updated patch?
>>>>> >
>>>>> > Will do.
>>>>> >
>>>>> > Thanks,
>>>>> > Richard.
>>>>> >
>>>>> >> Thanks.
>>>>> >> Yuri.
>>>>> >>
>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>>> >> >
>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>> >> >>
>>>>> >> >> > Richard,
>>>>> >> >> >
>>>>> >> >> > Here is updated 3 patch.
>>>>> >> >> >
>>>>> >> >> > I checked that all new tests related to epilogue
>>>vectorization passed with it.
>>>>> >> >> >
>>>>> >> >> > Your comments will be appreciated.
>>>>> >> >>
>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>>> >> >> original vectorization factor?  So we can pass down an
>>>(optional)
>>>>> >> >> forced vectorization factor as well?
>>>>> >> >
>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>>> >> > changes only needed by later patches?
>>>>> >> >
>>>>> >> > Thanks,
>>>>> >> > Richard.
>>>>> >> >
>>>>> >> >> Richard.
>>>>> >> >>
>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>>><rguenther@suse.de>:
>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>>> >> >> > >
>>>>> >> >> > >> Hi Richard,
>>>>> >> >> > >>
>>>>> >> >> > >> I did not understand your last remark:
>>>>> >> >> > >>
>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>> >> >> > >> >
>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>> >> >> > >> >           && dump_enabled_p ())
>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>vect_location,
>>>>> >> >> > >> >                            "loop vectorized\n");
>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>>it to be unrolled
>>>>> >> >> > >> >           etc.  */
>>>>> >> >> > >> >      loop->force_vectorize = false;
>>>>> >> >> > >> >
>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>it easier
>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>in dumps
>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>*/
>>>>> >> >> > >> > +       if (new_loop)
>>>>> >> >> > >> > +         {
>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>> >> >> > >> > +         }
>>>>> >> >> > >> >
>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>new_loop)
>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>>perform
>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>> >> >> > >> >
>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>vectorization
>>>>> >> >> > >> > separately that would be great.
>>>>> >> >> > >>
>>>>> >> >> > >> Could you please clarify your proposal.
>>>>> >> >> > >
>>>>> >> >> > > When a loop was vectorized set things up to immediately
>>>vectorize
>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>>avoiding
>>>>> >> >> > > the re-use of ->aux.
>>>>> >> >> > >
>>>>> >> >> > > Richard.
>>>>> >> >> > >
>>>>> >> >> > >> Thanks.
>>>>> >> >> > >> Yuri.
>>>>> >> >> > >>
>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>>><rguenther@suse.de>:
>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>>> >> >> > >> >
>>>>> >> >> > >> >> Hi All,
>>>>> >> >> > >> >>
>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>>which support
>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>>trip count. We
>>>>> >> >> > >> >> assume that the only patch -
>>>vec-tails-07-combine-tail.patch - was not
>>>>> >> >> > >> >> approved by Jeff.
>>>>> >> >> > >> >>
>>>>> >> >> > >> >> I did re-base of all patches and performed
>>>bootstrapping and
>>>>> >> >> > >> >> regression testing that did not show any new failures.
>>>Also all
>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>>been changed
>>>>> >> >> > >> >> accordingly.
>>>>> >> >> > >> >>
>>>>> >> >> > >> >> Is it OK for trunk?
>>>>> >> >> > >> >
>>>>> >> >> > >> > I would have prefered that the series up to
>>>-03-nomask-tails would
>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>>unfortunately
>>>>> >> >> > >> > the patchset is oddly separated.
>>>>> >> >> > >> >
>>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>>> >> >> > >> >
>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>>(loop_vec_info
>>>>> >> >> > >> > loop_vinfo)
>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>>single_exit (loop))
>>>>> >> >> > >> > -      || loop->inner)
>>>>> >> >> > >> > +      || loop->inner
>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>>and
>>>>> >> >> > >> > +        is not required for epilogue.  */
>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>>> >> >> > >> >      do_peeling = false;
>>>>> >> >> > >> >
>>>>> >> >> > >> >    if (do_peeling
>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>>(loop_vec_info
>>>>> >> >> > >> > loop_vinfo)
>>>>> >> >> > >> >
>>>>> >> >> > >> >    do_versioning =
>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>>> >> >> > >> > +          original loop and is not required for
>>>epilogue.  */
>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>>> >> >> > >> >
>>>>> >> >> > >> >    if (do_versioning)
>>>>> >> >> > >> >      {
>>>>> >> >> > >> >
>>>>> >> >> > >> > please do that check in the single caller of this
>>>function.
>>>>> >> >> > >> >
>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>>believe that simply
>>>>> >> >> > >> > passing down info from the processed parent would be
>>>_much_ cleaner.
>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>> >> >> > >> >
>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>> >> >> > >> >             && dump_enabled_p ())
>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>vect_location,
>>>>> >> >> > >> >                             "loop vectorized\n");
>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>>it to be unrolled
>>>>> >> >> > >> >            etc.  */
>>>>> >> >> > >> >         loop->force_vectorize = false;
>>>>> >> >> > >> >
>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>it easier
>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>in dumps
>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>*/
>>>>> >> >> > >> > +       if (new_loop)
>>>>> >> >> > >> > +         {
>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>> >> >> > >> > +         }
>>>>> >> >> > >> >
>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>new_loop)
>>>>> >> >> > >> > function which will set up stuff properly (and also
>>>perform
>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>> >> >> > >> >
>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>vectorization
>>>>> >> >> > >> > separately that would be great.
>>>>> >> >> > >> >
>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>>question its
>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>>vector loop).
>>>>> >> >> > >> > But it has already been approved ... oh well.
>>>>> >> >> > >> >
>>>>> >> >> > >> > Thanks,
>>>>> >> >> > >> > Richard.
>>>>> >> >> > >>
>>>>> >> >> > >>
>>>>> >> >> > >
>>>>> >> >> > > --
>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >> > --
>>>>> >> > Richard Biener <rguenther@suse.de>
>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>Norton, HRB 21284 (AG Nuernberg)
>>>>> >>
>>>>> >
>>>>> > --
>>>>> > Richard Biener <rguenther@suse.de>
>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>
>>>>
>>>> --
>>>> Richard Biener <rguenther@suse.de>
>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>Norton, HRB 21284 (AG Nuernberg)
>>
>>
Yuri Rumyantsev Nov. 18, 2016, 3:46 p.m. UTC | #12
It is very strange that this test failed on arm, since it requires
target avx2 to check vectorizer dumps:

/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
target avx2_runtime } } } */
/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
\\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */

Could you please clarify what is the reason of the failure?

Thanks.

2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi All,
>>
>> Here is patch for non-masked epilogue vectoriziation.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>>
>> Thanks.
>> Changelog:
>>
>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>> * tree-if-conv.c (tree_if_conversion): Make public.
>> * * tree-if-conv.h: New file.
>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>> dynamic alias checks for epilogues.
>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>> * tree-vect-loop.c: include tree-if-conv.h.
>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>> using passed argument.
>> (vect_transform_loop): Check if created epilogue should be returned
>> for further vectorization with less vf.  If-convert epilogue if
>> required. Print vectorization success for epilogue.
>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>> if it is required, pass loop_vinfo produced during vectorization of
>> loop body to vect_analyze_loop.
>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>> orig_loop_info.
>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>> (LOOP_VINFO_EPILOGUE_P): New.
>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>> (vect_do_peeling): Change prototype to return epilogue.
>> (vect_analyze_loop): Add argument of loop_vec_info type.
>> (vect_transform_loop): Return created loop.
>>
>> gcc/testsuite/
>>
>> * lib/target-supports.exp (check_avx2_hw_available): New.
>> (check_effective_target_avx2_runtime): New.
>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>
>
> Hi,
>
> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>
> It does pass on the same target if configured --with-cpu=cortex-a9.
>
> Christophe
>
>
>
>>
>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>Richard,
>>>>
>>>>I checked one of the tests designed for epilogue vectorization using
>>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>>
>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>>t1.new-nomask.s -fdump-tree-vect-details
>>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>>4
>>>> Without param only 2 loops are vectorized.
>>>>
>>>>Should I simply add a part of tests related to this feature or I must
>>>>delete all not necessary changes also?
>>>
>>> Please remove all not necessary changes.
>>>
>>> Richard.
>>>
>>>>Thanks.
>>>>Yuri.
>>>>
>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>>>
>>>>>> Richard,
>>>>>>
>>>>>> In my previous patch I forgot to remove couple lines related to aux
>>>>field.
>>>>>> Here is the correct updated patch.
>>>>>
>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>>>> necessary parts from 1 and 2) if all not required parts are removed
>>>>> (and you'd add the testcases covering non-masked tail vect).
>>>>>
>>>>> Thus, can you please produce a single complete patch containing only
>>>>> non-masked epilogue vectoriziation?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>>>> >
>>>>>> >> Richard,
>>>>>> >>
>>>>>> >> I prepare updated 3 patch with passing additional argument to
>>>>>> >> vect_analyze_loop as you proposed (untested).
>>>>>> >>
>>>>>> >> You wrote:
>>>>>> >> tw, I wonder if you can produce a single patch containing just
>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>> >> changes only needed by later patches?
>>>>>> >>
>>>>>> >> Did you mean that I exclude all support for vectorization
>>>>epilogues,
>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>>>> >> like
>>>>>> >>
>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>>>> >> index 11863af..32011c1 100644
>>>>>> >> --- a/gcc/tree-vect-loop.c
>>>>>> >> +++ b/gcc/tree-vect-loop.c
>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>>>> >
>>>>>> > Yes.
>>>>>> >
>>>>>> >> Did you mean also that new combined patch must be working patch,
>>>>i.e.
>>>>>> >> can be integrated without other patches?
>>>>>> >
>>>>>> > Yes.
>>>>>> >
>>>>>> >> Could you please look at updated patch?
>>>>>> >
>>>>>> > Will do.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Richard.
>>>>>> >
>>>>>> >> Thanks.
>>>>>> >> Yuri.
>>>>>> >>
>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>>>> >> >
>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>>> >> >>
>>>>>> >> >> > Richard,
>>>>>> >> >> >
>>>>>> >> >> > Here is updated 3 patch.
>>>>>> >> >> >
>>>>>> >> >> > I checked that all new tests related to epilogue
>>>>vectorization passed with it.
>>>>>> >> >> >
>>>>>> >> >> > Your comments will be appreciated.
>>>>>> >> >>
>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>>>> >> >> original vectorization factor?  So we can pass down an
>>>>(optional)
>>>>>> >> >> forced vectorization factor as well?
>>>>>> >> >
>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>> >> > changes only needed by later patches?
>>>>>> >> >
>>>>>> >> > Thanks,
>>>>>> >> > Richard.
>>>>>> >> >
>>>>>> >> >> Richard.
>>>>>> >> >>
>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>>>><rguenther@suse.de>:
>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>>>> >> >> > >
>>>>>> >> >> > >> Hi Richard,
>>>>>> >> >> > >>
>>>>>> >> >> > >> I did not understand your last remark:
>>>>>> >> >> > >>
>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>> >> >> > >> >           && dump_enabled_p ())
>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>vect_location,
>>>>>> >> >> > >> >                            "loop vectorized\n");
>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>>>it to be unrolled
>>>>>> >> >> > >> >           etc.  */
>>>>>> >> >> > >> >      loop->force_vectorize = false;
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>it easier
>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>in dumps
>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>*/
>>>>>> >> >> > >> > +       if (new_loop)
>>>>>> >> >> > >> > +         {
>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>> >> >> > >> > +         }
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>new_loop)
>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>>>perform
>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>vectorization
>>>>>> >> >> > >> > separately that would be great.
>>>>>> >> >> > >>
>>>>>> >> >> > >> Could you please clarify your proposal.
>>>>>> >> >> > >
>>>>>> >> >> > > When a loop was vectorized set things up to immediately
>>>>vectorize
>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>>>avoiding
>>>>>> >> >> > > the re-use of ->aux.
>>>>>> >> >> > >
>>>>>> >> >> > > Richard.
>>>>>> >> >> > >
>>>>>> >> >> > >> Thanks.
>>>>>> >> >> > >> Yuri.
>>>>>> >> >> > >>
>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>>>><rguenther@suse.de>:
>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>>>> >> >> > >> >
>>>>>> >> >> > >> >> Hi All,
>>>>>> >> >> > >> >>
>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>>>which support
>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>>>trip count. We
>>>>>> >> >> > >> >> assume that the only patch -
>>>>vec-tails-07-combine-tail.patch - was not
>>>>>> >> >> > >> >> approved by Jeff.
>>>>>> >> >> > >> >>
>>>>>> >> >> > >> >> I did re-base of all patches and performed
>>>>bootstrapping and
>>>>>> >> >> > >> >> regression testing that did not show any new failures.
>>>>Also all
>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>>>been changed
>>>>>> >> >> > >> >> accordingly.
>>>>>> >> >> > >> >>
>>>>>> >> >> > >> >> Is it OK for trunk?
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > I would have prefered that the series up to
>>>>-03-nomask-tails would
>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>>>unfortunately
>>>>>> >> >> > >> > the patchset is oddly separated.
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>>>(loop_vec_info
>>>>>> >> >> > >> > loop_vinfo)
>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>>>single_exit (loop))
>>>>>> >> >> > >> > -      || loop->inner)
>>>>>> >> >> > >> > +      || loop->inner
>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>>>and
>>>>>> >> >> > >> > +        is not required for epilogue.  */
>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>>>> >> >> > >> >      do_peeling = false;
>>>>>> >> >> > >> >
>>>>>> >> >> > >> >    if (do_peeling
>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>>>(loop_vec_info
>>>>>> >> >> > >> > loop_vinfo)
>>>>>> >> >> > >> >
>>>>>> >> >> > >> >    do_versioning =
>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>>>> >> >> > >> > +          original loop and is not required for
>>>>epilogue.  */
>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>>>> >> >> > >> >
>>>>>> >> >> > >> >    if (do_versioning)
>>>>>> >> >> > >> >      {
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > please do that check in the single caller of this
>>>>function.
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>>>believe that simply
>>>>>> >> >> > >> > passing down info from the processed parent would be
>>>>_much_ cleaner.
>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>> >> >> > >> >             && dump_enabled_p ())
>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>vect_location,
>>>>>> >> >> > >> >                             "loop vectorized\n");
>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>>>it to be unrolled
>>>>>> >> >> > >> >            etc.  */
>>>>>> >> >> > >> >         loop->force_vectorize = false;
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>it easier
>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>in dumps
>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>*/
>>>>>> >> >> > >> > +       if (new_loop)
>>>>>> >> >> > >> > +         {
>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>> >> >> > >> > +         }
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>new_loop)
>>>>>> >> >> > >> > function which will set up stuff properly (and also
>>>>perform
>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>vectorization
>>>>>> >> >> > >> > separately that would be great.
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>>>question its
>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>>>vector loop).
>>>>>> >> >> > >> > But it has already been approved ... oh well.
>>>>>> >> >> > >> >
>>>>>> >> >> > >> > Thanks,
>>>>>> >> >> > >> > Richard.
>>>>>> >> >> > >>
>>>>>> >> >> > >>
>>>>>> >> >> > >
>>>>>> >> >> > > --
>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >> > --
>>>>>> >> > Richard Biener <rguenther@suse.de>
>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>> >>
>>>>>> >
>>>>>> > --
>>>>>> > Richard Biener <rguenther@suse.de>
>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>
>>>>>
>>>>> --
>>>>> Richard Biener <rguenther@suse.de>
>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>Norton, HRB 21284 (AG Nuernberg)
>>>
>>>
Christophe Lyon Nov. 18, 2016, 3:54 p.m. UTC | #13
On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> It is very strange that this test failed on arm, since it requires
> target avx2 to check vectorizer dumps:
>
> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
> target avx2_runtime } } } */
> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>
> Could you please clarify what is the reason of the failure?

It's not the scan-dumps that fail, but the execution.
The test calls abort() for some reason.

It will take me a while to rebuild the test manually in the right
debug environment to provide you with more traces.



>
> Thanks.
>
> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is patch for non-masked epilogue vectoriziation.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>>
>>> Thanks.
>>> Changelog:
>>>
>>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>>> dynamic alias checks for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>>> * tree-vect-loop.c: include tree-if-conv.h.
>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>>> using passed argument.
>>> (vect_transform_loop): Check if created epilogue should be returned
>>> for further vectorization with less vf.  If-convert epilogue if
>>> required. Print vectorization success for epilogue.
>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>>> if it is required, pass loop_vinfo produced during vectorization of
>>> loop body to vect_analyze_loop.
>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>>> orig_loop_info.
>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>>> (LOOP_VINFO_EPILOGUE_P): New.
>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>>> (vect_do_peeling): Change prototype to return epilogue.
>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>>> (vect_transform_loop): Return created loop.
>>>
>>> gcc/testsuite/
>>>
>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>>> (check_effective_target_avx2_runtime): New.
>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>>
>>
>> Hi,
>>
>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>
>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>
>> Christophe
>>
>>
>>
>>>
>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>Richard,
>>>>>
>>>>>I checked one of the tests designed for epilogue vectorization using
>>>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>>>
>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>>>t1.new-nomask.s -fdump-tree-vect-details
>>>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>>>4
>>>>> Without param only 2 loops are vectorized.
>>>>>
>>>>>Should I simply add a part of tests related to this feature or I must
>>>>>delete all not necessary changes also?
>>>>
>>>> Please remove all not necessary changes.
>>>>
>>>> Richard.
>>>>
>>>>>Thanks.
>>>>>Yuri.
>>>>>
>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>
>>>>>>> Richard,
>>>>>>>
>>>>>>> In my previous patch I forgot to remove couple lines related to aux
>>>>>field.
>>>>>>> Here is the correct updated patch.
>>>>>>
>>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>>>>> necessary parts from 1 and 2) if all not required parts are removed
>>>>>> (and you'd add the testcases covering non-masked tail vect).
>>>>>>
>>>>>> Thus, can you please produce a single complete patch containing only
>>>>>> non-masked epilogue vectoriziation?
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>> >
>>>>>>> >> Richard,
>>>>>>> >>
>>>>>>> >> I prepare updated 3 patch with passing additional argument to
>>>>>>> >> vect_analyze_loop as you proposed (untested).
>>>>>>> >>
>>>>>>> >> You wrote:
>>>>>>> >> tw, I wonder if you can produce a single patch containing just
>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>> >> changes only needed by later patches?
>>>>>>> >>
>>>>>>> >> Did you mean that I exclude all support for vectorization
>>>>>epilogues,
>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>>>>> >> like
>>>>>>> >>
>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>>>>> >> index 11863af..32011c1 100644
>>>>>>> >> --- a/gcc/tree-vect-loop.c
>>>>>>> >> +++ b/gcc/tree-vect-loop.c
>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>>>>> >
>>>>>>> > Yes.
>>>>>>> >
>>>>>>> >> Did you mean also that new combined patch must be working patch,
>>>>>i.e.
>>>>>>> >> can be integrated without other patches?
>>>>>>> >
>>>>>>> > Yes.
>>>>>>> >
>>>>>>> >> Could you please look at updated patch?
>>>>>>> >
>>>>>>> > Will do.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Richard.
>>>>>>> >
>>>>>>> >> Thanks.
>>>>>>> >> Yuri.
>>>>>>> >>
>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>>>>> >> >
>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>> >> >>
>>>>>>> >> >> > Richard,
>>>>>>> >> >> >
>>>>>>> >> >> > Here is updated 3 patch.
>>>>>>> >> >> >
>>>>>>> >> >> > I checked that all new tests related to epilogue
>>>>>vectorization passed with it.
>>>>>>> >> >> >
>>>>>>> >> >> > Your comments will be appreciated.
>>>>>>> >> >>
>>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>>>>> >> >> original vectorization factor?  So we can pass down an
>>>>>(optional)
>>>>>>> >> >> forced vectorization factor as well?
>>>>>>> >> >
>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>> >> > changes only needed by later patches?
>>>>>>> >> >
>>>>>>> >> > Thanks,
>>>>>>> >> > Richard.
>>>>>>> >> >
>>>>>>> >> >> Richard.
>>>>>>> >> >>
>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>>>>><rguenther@suse.de>:
>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>> >> >> > >
>>>>>>> >> >> > >> Hi Richard,
>>>>>>> >> >> > >>
>>>>>>> >> >> > >> I did not understand your last remark:
>>>>>>> >> >> > >>
>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>> >> >> > >> >           && dump_enabled_p ())
>>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>vect_location,
>>>>>>> >> >> > >> >                            "loop vectorized\n");
>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>>>>it to be unrolled
>>>>>>> >> >> > >> >           etc.  */
>>>>>>> >> >> > >> >      loop->force_vectorize = false;
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>it easier
>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>in dumps
>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>*/
>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>> >> >> > >> > +         {
>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>> >> >> > >> > +         }
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>new_loop)
>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>>>>perform
>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>vectorization
>>>>>>> >> >> > >> > separately that would be great.
>>>>>>> >> >> > >>
>>>>>>> >> >> > >> Could you please clarify your proposal.
>>>>>>> >> >> > >
>>>>>>> >> >> > > When a loop was vectorized set things up to immediately
>>>>>vectorize
>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>>>>avoiding
>>>>>>> >> >> > > the re-use of ->aux.
>>>>>>> >> >> > >
>>>>>>> >> >> > > Richard.
>>>>>>> >> >> > >
>>>>>>> >> >> > >> Thanks.
>>>>>>> >> >> > >> Yuri.
>>>>>>> >> >> > >>
>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>>>>><rguenther@suse.de>:
>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> >> Hi All,
>>>>>>> >> >> > >> >>
>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>>>>which support
>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>>>>trip count. We
>>>>>>> >> >> > >> >> assume that the only patch -
>>>>>vec-tails-07-combine-tail.patch - was not
>>>>>>> >> >> > >> >> approved by Jeff.
>>>>>>> >> >> > >> >>
>>>>>>> >> >> > >> >> I did re-base of all patches and performed
>>>>>bootstrapping and
>>>>>>> >> >> > >> >> regression testing that did not show any new failures.
>>>>>Also all
>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>>>>been changed
>>>>>>> >> >> > >> >> accordingly.
>>>>>>> >> >> > >> >>
>>>>>>> >> >> > >> >> Is it OK for trunk?
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > I would have prefered that the series up to
>>>>>-03-nomask-tails would
>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>>>>unfortunately
>>>>>>> >> >> > >> > the patchset is oddly separated.
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>>>>(loop_vec_info
>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>>>>single_exit (loop))
>>>>>>> >> >> > >> > -      || loop->inner)
>>>>>>> >> >> > >> > +      || loop->inner
>>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>>>>and
>>>>>>> >> >> > >> > +        is not required for epilogue.  */
>>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>>>>> >> >> > >> >      do_peeling = false;
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> >    if (do_peeling
>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>>>>(loop_vec_info
>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> >    do_versioning =
>>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>>>>> >> >> > >> > +          original loop and is not required for
>>>>>epilogue.  */
>>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> >    if (do_versioning)
>>>>>>> >> >> > >> >      {
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > please do that check in the single caller of this
>>>>>function.
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>>>>believe that simply
>>>>>>> >> >> > >> > passing down info from the processed parent would be
>>>>>_much_ cleaner.
>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>> >> >> > >> >             && dump_enabled_p ())
>>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>vect_location,
>>>>>>> >> >> > >> >                             "loop vectorized\n");
>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>>>>it to be unrolled
>>>>>>> >> >> > >> >            etc.  */
>>>>>>> >> >> > >> >         loop->force_vectorize = false;
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>it easier
>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>in dumps
>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>*/
>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>> >> >> > >> > +         {
>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>> >> >> > >> > +         }
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>new_loop)
>>>>>>> >> >> > >> > function which will set up stuff properly (and also
>>>>>perform
>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>vectorization
>>>>>>> >> >> > >> > separately that would be great.
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>>>>question its
>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>>>>vector loop).
>>>>>>> >> >> > >> > But it has already been approved ... oh well.
>>>>>>> >> >> > >> >
>>>>>>> >> >> > >> > Thanks,
>>>>>>> >> >> > >> > Richard.
>>>>>>> >> >> > >>
>>>>>>> >> >> > >>
>>>>>>> >> >> > >
>>>>>>> >> >> > > --
>>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>>>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>>>>> >> >> >
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >> > --
>>>>>>> >> > Richard Biener <rguenther@suse.de>
>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>> >>
>>>>>>> >
>>>>>>> > --
>>>>>>> > Richard Biener <rguenther@suse.de>
>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Richard Biener <rguenther@suse.de>
>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>
>>>>
Yuri Rumyantsev Nov. 24, 2016, 1:42 p.m. UTC | #14
Hi All,

Here is the second patch which supports epilogue vectorization using
masking without cost model. Currently it is possible
only with passing parameter "--param vect-epilogues-mask=1".

Bootstrapping and regression testing did not show any new regression.

Any comments will be appreciated.

ChangeLog:
2016-11-24  Yuri Rumyantsev  <ysrumyan@gmail.com>

* params.def (PARAM_VECT_EPILOGUES_MASK): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
(new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
required_mask fields.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Call vect_check_required_masks if all
statements can be masked.
(vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
Add check that epilogue can be masked with the same vf with issue
fail notes.  Allow epilogue vectorization through masking of low trip
loops. Set to true can_be_masked field before loop operation analysis.
Do not set-up min_scalar_loop_bound for epilogue vectorization through
masking.  Do not peeling for epilogue masking.  Reset can_be_masked
field before repeat analysis.
(vect_estimate_min_profitable_iters): Do not compute profitability
for epilogue masking.  Set up mask_loop filed to true if parameter
PARAM_VECT_EPILOGUES_MASK is non-zero.
(vectorizable_reduction): Add check that statement can be masked.
(vectorizable_induction): Do not support masking for induction.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_create_narrowed_masks): New.
(vect_create_widened_masks): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_mask_loop): New.
(vect_transform_loop): Invoke vect_mask_loop if required.
Use div_ceil to recompute upper bounds for masked loops.  Issue
statistics for epilogue vectorization through masking. Do not reduce
vf for masking epilogue.
* tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
(can_mask_load_store): New.
(vectorizable_mask_load_store): Check that mask conjuction is
supported.  Set-up first_copy_p field of stmt_vinfo.
(vectorizable_simd_clone_call): Check that simd clone can not be
masked.
(vectorizable_store): Check that store can be masked. Mark the first
copy of generated vector stores and provide it with vectype and the
original data reference.
(vectorizable_load): Check that load can be masked.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Add check on unsupported statements for masking
with printing message.
* tree-vectorizer.h (struct _loop_vec_info): Add new fields
can_be_maske, required_masks, masl_loop.
(LOOP_VINFO_CAN_BE_MASKED): New.
(LOOP_VINFO_REQUIRED_MASKS): New.
(LOOP_VINFO_MASK_LOOP): New.
(struct _stmt_vec_info): Add first_copy_p field.
(STMT_VINFO_FIRST_COPY_P): New.

gcc/testsuite/

* gcc.dg/vect/vect-tail-mask-1.c: New test.

2016-11-18 18:54 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
> On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> It is very strange that this test failed on arm, since it requires
>> target avx2 to check vectorizer dumps:
>>
>> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> target avx2_runtime } } } */
>> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>>
>> Could you please clarify what is the reason of the failure?
>
> It's not the scan-dumps that fail, but the execution.
> The test calls abort() for some reason.
>
> It will take me a while to rebuild the test manually in the right
> debug environment to provide you with more traces.
>
>
>
>>
>> Thanks.
>>
>> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
>>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is patch for non-masked epilogue vectoriziation.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> Is it OK for trunk?
>>>>
>>>> Thanks.
>>>> Changelog:
>>>>
>>>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>>> * * tree-if-conv.h: New file.
>>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>>>> dynamic alias checks for epilogues.
>>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>>>> * tree-vect-loop.c: include tree-if-conv.h.
>>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>>>> using passed argument.
>>>> (vect_transform_loop): Check if created epilogue should be returned
>>>> for further vectorization with less vf.  If-convert epilogue if
>>>> required. Print vectorization success for epilogue.
>>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>>>> if it is required, pass loop_vinfo produced during vectorization of
>>>> loop body to vect_analyze_loop.
>>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>>>> orig_loop_info.
>>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>>>> (LOOP_VINFO_EPILOGUE_P): New.
>>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>>>> (vect_do_peeling): Change prototype to return epilogue.
>>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>>>> (vect_transform_loop): Return created loop.
>>>>
>>>> gcc/testsuite/
>>>>
>>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>>>> (check_effective_target_avx2_runtime): New.
>>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>>>
>>>
>>> Hi,
>>>
>>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>>
>>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>>
>>> Christophe
>>>
>>>
>>>
>>>>
>>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>Richard,
>>>>>>
>>>>>>I checked one of the tests designed for epilogue vectorization using
>>>>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>>>>
>>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>>>>t1.new-nomask.s -fdump-tree-vect-details
>>>>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>>>>4
>>>>>> Without param only 2 loops are vectorized.
>>>>>>
>>>>>>Should I simply add a part of tests related to this feature or I must
>>>>>>delete all not necessary changes also?
>>>>>
>>>>> Please remove all not necessary changes.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>Thanks.
>>>>>>Yuri.
>>>>>>
>>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> In my previous patch I forgot to remove couple lines related to aux
>>>>>>field.
>>>>>>>> Here is the correct updated patch.
>>>>>>>
>>>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>>>>>> necessary parts from 1 and 2) if all not required parts are removed
>>>>>>> (and you'd add the testcases covering non-masked tail vect).
>>>>>>>
>>>>>>> Thus, can you please produce a single complete patch containing only
>>>>>>> non-masked epilogue vectoriziation?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >
>>>>>>>> >> Richard,
>>>>>>>> >>
>>>>>>>> >> I prepare updated 3 patch with passing additional argument to
>>>>>>>> >> vect_analyze_loop as you proposed (untested).
>>>>>>>> >>
>>>>>>>> >> You wrote:
>>>>>>>> >> tw, I wonder if you can produce a single patch containing just
>>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>>> >> changes only needed by later patches?
>>>>>>>> >>
>>>>>>>> >> Did you mean that I exclude all support for vectorization
>>>>>>epilogues,
>>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>>>>>> >> like
>>>>>>>> >>
>>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>>>>>> >> index 11863af..32011c1 100644
>>>>>>>> >> --- a/gcc/tree-vect-loop.c
>>>>>>>> >> +++ b/gcc/tree-vect-loop.c
>>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>>>>>> >
>>>>>>>> > Yes.
>>>>>>>> >
>>>>>>>> >> Did you mean also that new combined patch must be working patch,
>>>>>>i.e.
>>>>>>>> >> can be integrated without other patches?
>>>>>>>> >
>>>>>>>> > Yes.
>>>>>>>> >
>>>>>>>> >> Could you please look at updated patch?
>>>>>>>> >
>>>>>>>> > Will do.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Richard.
>>>>>>>> >
>>>>>>>> >> Thanks.
>>>>>>>> >> Yuri.
>>>>>>>> >>
>>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>>>>>> >> >
>>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> > Richard,
>>>>>>>> >> >> >
>>>>>>>> >> >> > Here is updated 3 patch.
>>>>>>>> >> >> >
>>>>>>>> >> >> > I checked that all new tests related to epilogue
>>>>>>vectorization passed with it.
>>>>>>>> >> >> >
>>>>>>>> >> >> > Your comments will be appreciated.
>>>>>>>> >> >>
>>>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>>>>>> >> >> original vectorization factor?  So we can pass down an
>>>>>>(optional)
>>>>>>>> >> >> forced vectorization factor as well?
>>>>>>>> >> >
>>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>>> >> > changes only needed by later patches?
>>>>>>>> >> >
>>>>>>>> >> > Thanks,
>>>>>>>> >> > Richard.
>>>>>>>> >> >
>>>>>>>> >> >> Richard.
>>>>>>>> >> >>
>>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>>>>>><rguenther@suse.de>:
>>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >> > >
>>>>>>>> >> >> > >> Hi Richard,
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> I did not understand your last remark:
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>>> >> >> > >> >           && dump_enabled_p ())
>>>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>>vect_location,
>>>>>>>> >> >> > >> >                            "loop vectorized\n");
>>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>>>>>it to be unrolled
>>>>>>>> >> >> > >> >           etc.  */
>>>>>>>> >> >> > >> >      loop->force_vectorize = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>>it easier
>>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>>in dumps
>>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>>*/
>>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>>> >> >> > >> > +         {
>>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>>> >> >> > >> > +         }
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>>new_loop)
>>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>>>>>perform
>>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>>vectorization
>>>>>>>> >> >> > >> > separately that would be great.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> Could you please clarify your proposal.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > When a loop was vectorized set things up to immediately
>>>>>>vectorize
>>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>>>>>avoiding
>>>>>>>> >> >> > > the re-use of ->aux.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > Richard.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > >> Thanks.
>>>>>>>> >> >> > >> Yuri.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>>>>>><rguenther@suse.de>:
>>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >> Hi All,
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>>>>>which support
>>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>>>>>trip count. We
>>>>>>>> >> >> > >> >> assume that the only patch -
>>>>>>vec-tails-07-combine-tail.patch - was not
>>>>>>>> >> >> > >> >> approved by Jeff.
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> I did re-base of all patches and performed
>>>>>>bootstrapping and
>>>>>>>> >> >> > >> >> regression testing that did not show any new failures.
>>>>>>Also all
>>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>>>>>been changed
>>>>>>>> >> >> > >> >> accordingly.
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> Is it OK for trunk?
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I would have prefered that the series up to
>>>>>>-03-nomask-tails would
>>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>>>>>unfortunately
>>>>>>>> >> >> > >> > the patchset is oddly separated.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>>>>>(loop_vec_info
>>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>>>>>single_exit (loop))
>>>>>>>> >> >> > >> > -      || loop->inner)
>>>>>>>> >> >> > >> > +      || loop->inner
>>>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>>>>>and
>>>>>>>> >> >> > >> > +        is not required for epilogue.  */
>>>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>>>>>> >> >> > >> >      do_peeling = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    if (do_peeling
>>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>>>>>(loop_vec_info
>>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    do_versioning =
>>>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>>>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>>>>>> >> >> > >> > +          original loop and is not required for
>>>>>>epilogue.  */
>>>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    if (do_versioning)
>>>>>>>> >> >> > >> >      {
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > please do that check in the single caller of this
>>>>>>function.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>>>>>believe that simply
>>>>>>>> >> >> > >> > passing down info from the processed parent would be
>>>>>>_much_ cleaner.
>>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>>> >> >> > >> >             && dump_enabled_p ())
>>>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>>vect_location,
>>>>>>>> >> >> > >> >                             "loop vectorized\n");
>>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>>>>>it to be unrolled
>>>>>>>> >> >> > >> >            etc.  */
>>>>>>>> >> >> > >> >         loop->force_vectorize = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>>it easier
>>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>>in dumps
>>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>>*/
>>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>>> >> >> > >> > +         {
>>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>>> >> >> > >> > +         }
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>>new_loop)
>>>>>>>> >> >> > >> > function which will set up stuff properly (and also
>>>>>>perform
>>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>>vectorization
>>>>>>>> >> >> > >> > separately that would be great.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>>>>>question its
>>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>>>>>vector loop).
>>>>>>>> >> >> > >> > But it has already been approved ... oh well.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > Thanks,
>>>>>>>> >> >> > >> > Richard.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > --
>>>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>>>>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>>>>>> >> >> >
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> > --
>>>>>>>> >> > Richard Biener <rguenther@suse.de>
>>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Richard Biener <rguenther@suse.de>
>>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Richard Biener <rguenther@suse.de>
>>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>
>>>>>
Richard Biener Nov. 28, 2016, 2:39 p.m. UTC | #15
On Thu, 24 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is the second patch which supports epilogue vectorization using
> masking without cost model. Currently it is possible
> only with passing parameter "--param vect-epilogues-mask=1".
> 
> Bootstrapping and regression testing did not show any new regression.
> 
> Any comments will be appreciated.

Going over the patch the main question is one how it works -- it looks
like the decision whether to vectorize & mask the epilogue is made
when vectorizing the loop that generates the epilogue rather than
in the epilogue vectorization path?

That is, I'd have expected to see this handling low-trip count loops
by masking?  And thus masking the epilogue simply by it being
low-trip count?

Richard.

> ChangeLog:
> 2016-11-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
> 
> * params.def (PARAM_VECT_EPILOGUES_MASK): New.
> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
> required_mask fields.
> (vect_check_required_masks_widening): New.
> (vect_check_required_masks_narrowing): New.
> (vect_get_masking_iv_elems): New.
> (vect_get_masking_iv_type): New.
> (vect_get_extreme_masks): New.
> (vect_check_required_masks): New.
> (vect_analyze_loop_operations): Call vect_check_required_masks if all
> statements can be masked.
> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
> Add check that epilogue can be masked with the same vf with issue
> fail notes.  Allow epilogue vectorization through masking of low trip
> loops. Set to true can_be_masked field before loop operation analysis.
> Do not set-up min_scalar_loop_bound for epilogue vectorization through
> masking.  Do not peeling for epilogue masking.  Reset can_be_masked
> field before repeat analysis.
> (vect_estimate_min_profitable_iters): Do not compute profitability
> for epilogue masking.  Set up mask_loop filed to true if parameter
> PARAM_VECT_EPILOGUES_MASK is non-zero.
> (vectorizable_reduction): Add check that statement can be masked.
> (vectorizable_induction): Do not support masking for induction.
> (vect_gen_ivs_for_masking): New.
> (vect_get_mask_index_for_elems): New.
> (vect_get_mask_index_for_type): New.
> (vect_create_narrowed_masks): New.
> (vect_create_widened_masks): New.
> (vect_gen_loop_masks): New.
> (vect_mask_reduction_stmt): New.
> (vect_mask_mask_load_store_stmt): New.
> (vect_mask_load_store_stmt): New.
> (vect_mask_loop): New.
> (vect_transform_loop): Invoke vect_mask_loop if required.
> Use div_ceil to recompute upper bounds for masked loops.  Issue
> statistics for epilogue vectorization through masking. Do not reduce
> vf for masking epilogue.
> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
> (can_mask_load_store): New.
> (vectorizable_mask_load_store): Check that mask conjuction is
> supported.  Set-up first_copy_p field of stmt_vinfo.
> (vectorizable_simd_clone_call): Check that simd clone can not be
> masked.
> (vectorizable_store): Check that store can be masked. Mark the first
> copy of generated vector stores and provide it with vectype and the
> original data reference.
> (vectorizable_load): Check that load can be masked.
> (vect_stmt_should_be_masked_for_epilogue): New.
> (vect_add_required_mask_for_stmt): New.
> (vect_analyze_stmt): Add check on unsupported statements for masking
> with printing message.
> * tree-vectorizer.h (struct _loop_vec_info): Add new fields
> can_be_maske, required_masks, masl_loop.
> (LOOP_VINFO_CAN_BE_MASKED): New.
> (LOOP_VINFO_REQUIRED_MASKS): New.
> (LOOP_VINFO_MASK_LOOP): New.
> (struct _stmt_vec_info): Add first_copy_p field.
> (STMT_VINFO_FIRST_COPY_P): New.
> 
> gcc/testsuite/
> 
> * gcc.dg/vect/vect-tail-mask-1.c: New test.
> 
> 2016-11-18 18:54 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
> > On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> >> It is very strange that this test failed on arm, since it requires
> >> target avx2 to check vectorizer dumps:
> >>
> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
> >> target avx2_runtime } } } */
> >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
> >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
> >>
> >> Could you please clarify what is the reason of the failure?
> >
> > It's not the scan-dumps that fail, but the execution.
> > The test calls abort() for some reason.
> >
> > It will take me a while to rebuild the test manually in the right
> > debug environment to provide you with more traces.
> >
> >
> >
> >>
> >> Thanks.
> >>
> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
> >>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> >>>> Hi All,
> >>>>
> >>>> Here is patch for non-masked epilogue vectoriziation.
> >>>>
> >>>> Bootstrap and regression testing did not show any new failures.
> >>>>
> >>>> Is it OK for trunk?
> >>>>
> >>>> Thanks.
> >>>> Changelog:
> >>>>
> >>>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
> >>>>
> >>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> >>>> * tree-if-conv.c (tree_if_conversion): Make public.
> >>>> * * tree-if-conv.h: New file.
> >>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> >>>> dynamic alias checks for epilogues.
> >>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> >>>> * tree-vect-loop.c: include tree-if-conv.h.
> >>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
> >>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> >>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> >>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> >>>> using passed argument.
> >>>> (vect_transform_loop): Check if created epilogue should be returned
> >>>> for further vectorization with less vf.  If-convert epilogue if
> >>>> required. Print vectorization success for epilogue.
> >>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> >>>> if it is required, pass loop_vinfo produced during vectorization of
> >>>> loop body to vect_analyze_loop.
> >>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> >>>> orig_loop_info.
> >>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> >>>> (LOOP_VINFO_EPILOGUE_P): New.
> >>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> >>>> (vect_do_peeling): Change prototype to return epilogue.
> >>>> (vect_analyze_loop): Add argument of loop_vec_info type.
> >>>> (vect_transform_loop): Return created loop.
> >>>>
> >>>> gcc/testsuite/
> >>>>
> >>>> * lib/target-supports.exp (check_avx2_hw_available): New.
> >>>> (check_effective_target_avx2_runtime): New.
> >>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
> >>>>
> >>>
> >>> Hi,
> >>>
> >>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
> >>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
> >>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
> >>>
> >>> It does pass on the same target if configured --with-cpu=cortex-a9.
> >>>
> >>> Christophe
> >>>
> >>>
> >>>
> >>>>
> >>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> >>>>>>Richard,
> >>>>>>
> >>>>>>I checked one of the tests designed for epilogue vectorization using
> >>>>>>patches 1 - 3 and found out that build compiler performs vectorization
> >>>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
> >>>>>>
> >>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
> >>>>>>t1.new-nomask.s -fdump-tree-vect-details
> >>>>>>$ grep VECTORIZED -c t1.c.156t.vect
> >>>>>>4
> >>>>>> Without param only 2 loops are vectorized.
> >>>>>>
> >>>>>>Should I simply add a part of tests related to this feature or I must
> >>>>>>delete all not necessary changes also?
> >>>>>
> >>>>> Please remove all not necessary changes.
> >>>>>
> >>>>> Richard.
> >>>>>
> >>>>>>Thanks.
> >>>>>>Yuri.
> >>>>>>
> >>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
> >>>>>>>
> >>>>>>>> Richard,
> >>>>>>>>
> >>>>>>>> In my previous patch I forgot to remove couple lines related to aux
> >>>>>>field.
> >>>>>>>> Here is the correct updated patch.
> >>>>>>>
> >>>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
> >>>>>>> necessary parts from 1 and 2) if all not required parts are removed
> >>>>>>> (and you'd add the testcases covering non-masked tail vect).
> >>>>>>>
> >>>>>>> Thus, can you please produce a single complete patch containing only
> >>>>>>> non-masked epilogue vectoriziation?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> Yuri.
> >>>>>>>>
> >>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >>>>>>>> >
> >>>>>>>> >> Richard,
> >>>>>>>> >>
> >>>>>>>> >> I prepare updated 3 patch with passing additional argument to
> >>>>>>>> >> vect_analyze_loop as you proposed (untested).
> >>>>>>>> >>
> >>>>>>>> >> You wrote:
> >>>>>>>> >> tw, I wonder if you can produce a single patch containing just
> >>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >>>>>>>> >> changes only needed by later patches?
> >>>>>>>> >>
> >>>>>>>> >> Did you mean that I exclude all support for vectorization
> >>>>>>epilogues,
> >>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
> >>>>>>>> >> like
> >>>>>>>> >>
> >>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >>>>>>>> >> index 11863af..32011c1 100644
> >>>>>>>> >> --- a/gcc/tree-vect-loop.c
> >>>>>>>> >> +++ b/gcc/tree-vect-loop.c
> >>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >>>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >>>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >>>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >>>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >>>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >>>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >>>>>>>> >
> >>>>>>>> > Yes.
> >>>>>>>> >
> >>>>>>>> >> Did you mean also that new combined patch must be working patch,
> >>>>>>i.e.
> >>>>>>>> >> can be integrated without other patches?
> >>>>>>>> >
> >>>>>>>> > Yes.
> >>>>>>>> >
> >>>>>>>> >> Could you please look at updated patch?
> >>>>>>>> >
> >>>>>>>> > Will do.
> >>>>>>>> >
> >>>>>>>> > Thanks,
> >>>>>>>> > Richard.
> >>>>>>>> >
> >>>>>>>> >> Thanks.
> >>>>>>>> >> Yuri.
> >>>>>>>> >>
> >>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
> >>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >>>>>>>> >> >
> >>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>>>>>>> >> >>
> >>>>>>>> >> >> > Richard,
> >>>>>>>> >> >> >
> >>>>>>>> >> >> > Here is updated 3 patch.
> >>>>>>>> >> >> >
> >>>>>>>> >> >> > I checked that all new tests related to epilogue
> >>>>>>vectorization passed with it.
> >>>>>>>> >> >> >
> >>>>>>>> >> >> > Your comments will be appreciated.
> >>>>>>>> >> >>
> >>>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
> >>>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >>>>>>>> >> >> original vectorization factor?  So we can pass down an
> >>>>>>(optional)
> >>>>>>>> >> >> forced vectorization factor as well?
> >>>>>>>> >> >
> >>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
> >>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
> >>>>>>>> >> > changes only needed by later patches?
> >>>>>>>> >> >
> >>>>>>>> >> > Thanks,
> >>>>>>>> >> > Richard.
> >>>>>>>> >> >
> >>>>>>>> >> >> Richard.
> >>>>>>>> >> >>
> >>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
> >>>>>><rguenther@suse.de>:
> >>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >>>>>>>> >> >> > >
> >>>>>>>> >> >> > >> Hi Richard,
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >> I did not understand your last remark:
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>>>>>> >> >> > >> >           && dump_enabled_p ())
> >>>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>>>>>vect_location,
> >>>>>>>> >> >> > >> >                            "loop vectorized\n");
> >>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>>>>>> >> >> > >> >         num_vectorized_loops++;
> >>>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
> >>>>>>it to be unrolled
> >>>>>>>> >> >> > >> >           etc.  */
> >>>>>>>> >> >> > >> >      loop->force_vectorize = false;
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>>>>>it easier
> >>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>>>>>in dumps
> >>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
> >>>>>>*/
> >>>>>>>> >> >> > >> > +       if (new_loop)
> >>>>>>>> >> >> > >> > +         {
> >>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>>>>>> >> >> > >> > +         }
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>>>>>new_loop)
> >>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
> >>>>>>perform
> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>>>>>vectorization
> >>>>>>>> >> >> > >> > separately that would be great.
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >> Could you please clarify your proposal.
> >>>>>>>> >> >> > >
> >>>>>>>> >> >> > > When a loop was vectorized set things up to immediately
> >>>>>>vectorize
> >>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
> >>>>>>avoiding
> >>>>>>>> >> >> > > the re-use of ->aux.
> >>>>>>>> >> >> > >
> >>>>>>>> >> >> > > Richard.
> >>>>>>>> >> >> > >
> >>>>>>>> >> >> > >> Thanks.
> >>>>>>>> >> >> > >> Yuri.
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
> >>>>>><rguenther@suse.de>:
> >>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> >> Hi All,
> >>>>>>>> >> >> > >> >>
> >>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
> >>>>>>which support
> >>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
> >>>>>>trip count. We
> >>>>>>>> >> >> > >> >> assume that the only patch -
> >>>>>>vec-tails-07-combine-tail.patch - was not
> >>>>>>>> >> >> > >> >> approved by Jeff.
> >>>>>>>> >> >> > >> >>
> >>>>>>>> >> >> > >> >> I did re-base of all patches and performed
> >>>>>>bootstrapping and
> >>>>>>>> >> >> > >> >> regression testing that did not show any new failures.
> >>>>>>Also all
> >>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
> >>>>>>been changed
> >>>>>>>> >> >> > >> >> accordingly.
> >>>>>>>> >> >> > >> >>
> >>>>>>>> >> >> > >> >> Is it OK for trunk?
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > I would have prefered that the series up to
> >>>>>>-03-nomask-tails would
> >>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
> >>>>>>unfortunately
> >>>>>>>> >> >> > >> > the patchset is oddly separated.
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > I have a comment on that part nevertheless:
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
> >>>>>>(loop_vec_info
> >>>>>>>> >> >> > >> > loop_vinfo)
> >>>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
> >>>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >>>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
> >>>>>>single_exit (loop))
> >>>>>>>> >> >> > >> > -      || loop->inner)
> >>>>>>>> >> >> > >> > +      || loop->inner
> >>>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
> >>>>>>and
> >>>>>>>> >> >> > >> > +        is not required for epilogue.  */
> >>>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >>>>>>>> >> >> > >> >      do_peeling = false;
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> >    if (do_peeling
> >>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
> >>>>>>(loop_vec_info
> >>>>>>>> >> >> > >> > loop_vinfo)
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> >    do_versioning =
> >>>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
> >>>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
> >>>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
> >>>>>>>> >> >> > >> > +        /* Required versioning was performed for the
> >>>>>>>> >> >> > >> > +          original loop and is not required for
> >>>>>>epilogue.  */
> >>>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> >    if (do_versioning)
> >>>>>>>> >> >> > >> >      {
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > please do that check in the single caller of this
> >>>>>>function.
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
> >>>>>>believe that simply
> >>>>>>>> >> >> > >> > passing down info from the processed parent would be
> >>>>>>_much_ cleaner.
> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>>>>>> >> >> > >> >             && dump_enabled_p ())
> >>>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>>>>>vect_location,
> >>>>>>>> >> >> > >> >                             "loop vectorized\n");
> >>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>>>>>> >> >> > >> >         num_vectorized_loops++;
> >>>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
> >>>>>>it to be unrolled
> >>>>>>>> >> >> > >> >            etc.  */
> >>>>>>>> >> >> > >> >         loop->force_vectorize = false;
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>>>>>it easier
> >>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>>>>>in dumps
> >>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
> >>>>>>*/
> >>>>>>>> >> >> > >> > +       if (new_loop)
> >>>>>>>> >> >> > >> > +         {
> >>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>>>>>> >> >> > >> > +         }
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>>>>>new_loop)
> >>>>>>>> >> >> > >> > function which will set up stuff properly (and also
> >>>>>>perform
> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>>>>>vectorization
> >>>>>>>> >> >> > >> > separately that would be great.
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
> >>>>>>question its
> >>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
> >>>>>>vector loop).
> >>>>>>>> >> >> > >> > But it has already been approved ... oh well.
> >>>>>>>> >> >> > >> >
> >>>>>>>> >> >> > >> > Thanks,
> >>>>>>>> >> >> > >> > Richard.
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >>
> >>>>>>>> >> >> > >
> >>>>>>>> >> >> > > --
> >>>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
> >>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
> >>>>>>Graham Norton, HRB 21284 (AG Nuernberg)
> >>>>>>>> >> >> >
> >>>>>>>> >> >>
> >>>>>>>> >> >>
> >>>>>>>> >> >
> >>>>>>>> >> > --
> >>>>>>>> >> > Richard Biener <rguenther@suse.de>
> >>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>>>>>Norton, HRB 21284 (AG Nuernberg)
> >>>>>>>> >>
> >>>>>>>> >
> >>>>>>>> > --
> >>>>>>>> > Richard Biener <rguenther@suse.de>
> >>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>>>>>Norton, HRB 21284 (AG Nuernberg)
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Richard Biener <rguenther@suse.de>
> >>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>>>>>Norton, HRB 21284 (AG Nuernberg)
> >>>>>
> >>>>>
>
Yuri Rumyantsev Nov. 28, 2016, 4:57 p.m. UTC | #16
Richard!

I attached vect dump for hte part of attached test-case which
illustrated how vectorization of epilogues works through masking:
#define SIZE 1023
#define ALIGN 64

extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
__SIZE_TYPE__ size) __attribute__((weak));
extern void free (void *);

void __attribute__((noinline))
test_citer (int * __restrict__ a,
   int * __restrict__ b,
   int * __restrict__ c)
{
  int i;

  a = (int *)__builtin_assume_aligned (a, ALIGN);
  b = (int *)__builtin_assume_aligned (b, ALIGN);
  c = (int *)__builtin_assume_aligned (c, ALIGN);

  for (i = 0; i < SIZE; i++)
    c[i] = a[i] + b[i];
}

It was compiled with -mavx2 --param vect-epilogues-mask=1 options.

I did not include in this patch vectorization of low trip-count loops
since in the original patch additional parameter was introduced:
+DEFPARAM (PARAM_VECT_SHORT_LOOPS,
+  "vect-short-loops",
+  "Enable vectorization of low trip count loops using masking.",
+  0, 0, 1)

I assume that this ability can be included very quickly but it
requires cost model enhancements also.

Best regards.
Yuri.


2016-11-28 17:39 GMT+03:00 Richard Biener <rguenther@suse.de>:
> On Thu, 24 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi All,
>>
>> Here is the second patch which supports epilogue vectorization using
>> masking without cost model. Currently it is possible
>> only with passing parameter "--param vect-epilogues-mask=1".
>>
>> Bootstrapping and regression testing did not show any new regression.
>>
>> Any comments will be appreciated.
>
> Going over the patch the main question is one how it works -- it looks
> like the decision whether to vectorize & mask the epilogue is made
> when vectorizing the loop that generates the epilogue rather than
> in the epilogue vectorization path?
>
> That is, I'd have expected to see this handling low-trip count loops
> by masking?  And thus masking the epilogue simply by it being
> low-trip count?
>
> Richard.
>
>> ChangeLog:
>> 2016-11-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * params.def (PARAM_VECT_EPILOGUES_MASK): New.
>> * tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
>> * tree-vect-loop.c: Include insn-config.h, recog.h and alias.h.
>> (new_loop_vec_info): Add zeroing can_be_masked, mask_loop and
>> required_mask fields.
>> (vect_check_required_masks_widening): New.
>> (vect_check_required_masks_narrowing): New.
>> (vect_get_masking_iv_elems): New.
>> (vect_get_masking_iv_type): New.
>> (vect_get_extreme_masks): New.
>> (vect_check_required_masks): New.
>> (vect_analyze_loop_operations): Call vect_check_required_masks if all
>> statements can be masked.
>> (vect_analyze_loop_2): Inititalize to zero min_scalar_loop_bound.
>> Add check that epilogue can be masked with the same vf with issue
>> fail notes.  Allow epilogue vectorization through masking of low trip
>> loops. Set to true can_be_masked field before loop operation analysis.
>> Do not set-up min_scalar_loop_bound for epilogue vectorization through
>> masking.  Do not peeling for epilogue masking.  Reset can_be_masked
>> field before repeat analysis.
>> (vect_estimate_min_profitable_iters): Do not compute profitability
>> for epilogue masking.  Set up mask_loop filed to true if parameter
>> PARAM_VECT_EPILOGUES_MASK is non-zero.
>> (vectorizable_reduction): Add check that statement can be masked.
>> (vectorizable_induction): Do not support masking for induction.
>> (vect_gen_ivs_for_masking): New.
>> (vect_get_mask_index_for_elems): New.
>> (vect_get_mask_index_for_type): New.
>> (vect_create_narrowed_masks): New.
>> (vect_create_widened_masks): New.
>> (vect_gen_loop_masks): New.
>> (vect_mask_reduction_stmt): New.
>> (vect_mask_mask_load_store_stmt): New.
>> (vect_mask_load_store_stmt): New.
>> (vect_mask_loop): New.
>> (vect_transform_loop): Invoke vect_mask_loop if required.
>> Use div_ceil to recompute upper bounds for masked loops.  Issue
>> statistics for epilogue vectorization through masking. Do not reduce
>> vf for masking epilogue.
>> * tree-vect-stmts.c: Include tree-ssa-loop-ivopts.h.
>> (can_mask_load_store): New.
>> (vectorizable_mask_load_store): Check that mask conjuction is
>> supported.  Set-up first_copy_p field of stmt_vinfo.
>> (vectorizable_simd_clone_call): Check that simd clone can not be
>> masked.
>> (vectorizable_store): Check that store can be masked. Mark the first
>> copy of generated vector stores and provide it with vectype and the
>> original data reference.
>> (vectorizable_load): Check that load can be masked.
>> (vect_stmt_should_be_masked_for_epilogue): New.
>> (vect_add_required_mask_for_stmt): New.
>> (vect_analyze_stmt): Add check on unsupported statements for masking
>> with printing message.
>> * tree-vectorizer.h (struct _loop_vec_info): Add new fields
>> can_be_maske, required_masks, masl_loop.
>> (LOOP_VINFO_CAN_BE_MASKED): New.
>> (LOOP_VINFO_REQUIRED_MASKS): New.
>> (LOOP_VINFO_MASK_LOOP): New.
>> (struct _stmt_vec_info): Add first_copy_p field.
>> (STMT_VINFO_FIRST_COPY_P): New.
>>
>> gcc/testsuite/
>>
>> * gcc.dg/vect/vect-tail-mask-1.c: New test.
>>
>> 2016-11-18 18:54 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
>> > On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> >> It is very strange that this test failed on arm, since it requires
>> >> target avx2 to check vectorizer dumps:
>> >>
>> >> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> >> target avx2_runtime } } } */
>> >> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> >> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>> >>
>> >> Could you please clarify what is the reason of the failure?
>> >
>> > It's not the scan-dumps that fail, but the execution.
>> > The test calls abort() for some reason.
>> >
>> > It will take me a while to rebuild the test manually in the right
>> > debug environment to provide you with more traces.
>> >
>> >
>> >
>> >>
>> >> Thanks.
>> >>
>> >> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
>> >>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> >>>> Hi All,
>> >>>>
>> >>>> Here is patch for non-masked epilogue vectoriziation.
>> >>>>
>> >>>> Bootstrap and regression testing did not show any new failures.
>> >>>>
>> >>>> Is it OK for trunk?
>> >>>>
>> >>>> Thanks.
>> >>>> Changelog:
>> >>>>
>> >>>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>> >>>>
>> >>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>> >>>> * tree-if-conv.c (tree_if_conversion): Make public.
>> >>>> * * tree-if-conv.h: New file.
>> >>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>> >>>> dynamic alias checks for epilogues.
>> >>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>> >>>> * tree-vect-loop.c: include tree-if-conv.h.
>> >>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>> >>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>> >>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>> >>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>> >>>> using passed argument.
>> >>>> (vect_transform_loop): Check if created epilogue should be returned
>> >>>> for further vectorization with less vf.  If-convert epilogue if
>> >>>> required. Print vectorization success for epilogue.
>> >>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>> >>>> if it is required, pass loop_vinfo produced during vectorization of
>> >>>> loop body to vect_analyze_loop.
>> >>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>> >>>> orig_loop_info.
>> >>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>> >>>> (LOOP_VINFO_EPILOGUE_P): New.
>> >>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>> >>>> (vect_do_peeling): Change prototype to return epilogue.
>> >>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>> >>>> (vect_transform_loop): Return created loop.
>> >>>>
>> >>>> gcc/testsuite/
>> >>>>
>> >>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>> >>>> (check_effective_target_avx2_runtime): New.
>> >>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>> >>>>
>> >>>
>> >>> Hi,
>> >>>
>> >>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>> >>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>> >>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>> >>>
>> >>> It does pass on the same target if configured --with-cpu=cortex-a9.
>> >>>
>> >>> Christophe
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> >>>>>>Richard,
>> >>>>>>
>> >>>>>>I checked one of the tests designed for epilogue vectorization using
>> >>>>>>patches 1 - 3 and found out that build compiler performs vectorization
>> >>>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>> >>>>>>
>> >>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>> >>>>>>t1.new-nomask.s -fdump-tree-vect-details
>> >>>>>>$ grep VECTORIZED -c t1.c.156t.vect
>> >>>>>>4
>> >>>>>> Without param only 2 loops are vectorized.
>> >>>>>>
>> >>>>>>Should I simply add a part of tests related to this feature or I must
>> >>>>>>delete all not necessary changes also?
>> >>>>>
>> >>>>> Please remove all not necessary changes.
>> >>>>>
>> >>>>> Richard.
>> >>>>>
>> >>>>>>Thanks.
>> >>>>>>Yuri.
>> >>>>>>
>> >>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>> >>>>>>>
>> >>>>>>>> Richard,
>> >>>>>>>>
>> >>>>>>>> In my previous patch I forgot to remove couple lines related to aux
>> >>>>>>field.
>> >>>>>>>> Here is the correct updated patch.
>> >>>>>>>
>> >>>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>> >>>>>>> necessary parts from 1 and 2) if all not required parts are removed
>> >>>>>>> (and you'd add the testcases covering non-masked tail vect).
>> >>>>>>>
>> >>>>>>> Thus, can you please produce a single complete patch containing only
>> >>>>>>> non-masked epilogue vectoriziation?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Richard.
>> >>>>>>>
>> >>>>>>>> Thanks.
>> >>>>>>>> Yuri.
>> >>>>>>>>
>> >>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>> >>>>>>>> >
>> >>>>>>>> >> Richard,
>> >>>>>>>> >>
>> >>>>>>>> >> I prepare updated 3 patch with passing additional argument to
>> >>>>>>>> >> vect_analyze_loop as you proposed (untested).
>> >>>>>>>> >>
>> >>>>>>>> >> You wrote:
>> >>>>>>>> >> tw, I wonder if you can produce a single patch containing just
>> >>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>> >>>>>>>> >> changes only needed by later patches?
>> >>>>>>>> >>
>> >>>>>>>> >> Did you mean that I exclude all support for vectorization
>> >>>>>>epilogues,
>> >>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>> >>>>>>>> >> like
>> >>>>>>>> >>
>> >>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> >>>>>>>> >> index 11863af..32011c1 100644
>> >>>>>>>> >> --- a/gcc/tree-vect-loop.c
>> >>>>>>>> >> +++ b/gcc/tree-vect-loop.c
>> >>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>> >>>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>> >>>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>> >>>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>> >>>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>> >>>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>> >>>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>> >>>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>> >>>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>> >>>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>> >>>>>>>> >
>> >>>>>>>> > Yes.
>> >>>>>>>> >
>> >>>>>>>> >> Did you mean also that new combined patch must be working patch,
>> >>>>>>i.e.
>> >>>>>>>> >> can be integrated without other patches?
>> >>>>>>>> >
>> >>>>>>>> > Yes.
>> >>>>>>>> >
>> >>>>>>>> >> Could you please look at updated patch?
>> >>>>>>>> >
>> >>>>>>>> > Will do.
>> >>>>>>>> >
>> >>>>>>>> > Thanks,
>> >>>>>>>> > Richard.
>> >>>>>>>> >
>> >>>>>>>> >> Thanks.
>> >>>>>>>> >> Yuri.
>> >>>>>>>> >>
>> >>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>> >>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>> >>>>>>>> >> >
>> >>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>> >>>>>>>> >> >>
>> >>>>>>>> >> >> > Richard,
>> >>>>>>>> >> >> >
>> >>>>>>>> >> >> > Here is updated 3 patch.
>> >>>>>>>> >> >> >
>> >>>>>>>> >> >> > I checked that all new tests related to epilogue
>> >>>>>>vectorization passed with it.
>> >>>>>>>> >> >> >
>> >>>>>>>> >> >> > Your comments will be appreciated.
>> >>>>>>>> >> >>
>> >>>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>> >>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>> >>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>> >>>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>> >>>>>>>> >> >> original vectorization factor?  So we can pass down an
>> >>>>>>(optional)
>> >>>>>>>> >> >> forced vectorization factor as well?
>> >>>>>>>> >> >
>> >>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>> >>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>> >>>>>>>> >> > changes only needed by later patches?
>> >>>>>>>> >> >
>> >>>>>>>> >> > Thanks,
>> >>>>>>>> >> > Richard.
>> >>>>>>>> >> >
>> >>>>>>>> >> >> Richard.
>> >>>>>>>> >> >>
>> >>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>> >>>>>><rguenther@suse.de>:
>> >>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>> >>>>>>>> >> >> > >
>> >>>>>>>> >> >> > >> Hi Richard,
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >> I did not understand your last remark:
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >>>>>>>> >> >> > >> >           && dump_enabled_p ())
>> >>>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> >>>>>>vect_location,
>> >>>>>>>> >> >> > >> >                            "loop vectorized\n");
>> >>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >>>>>>>> >> >> > >> >         num_vectorized_loops++;
>> >>>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>> >>>>>>it to be unrolled
>> >>>>>>>> >> >> > >> >           etc.  */
>> >>>>>>>> >> >> > >> >      loop->force_vectorize = false;
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>> >>>>>>it easier
>> >>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>> >>>>>>in dumps
>> >>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>> >>>>>>*/
>> >>>>>>>> >> >> > >> > +       if (new_loop)
>> >>>>>>>> >> >> > >> > +         {
>> >>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >>>>>>>> >> >> > >> > +         }
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>> >>>>>>new_loop)
>> >>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>> >>>>>>perform
>> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>> >>>>>>vectorization
>> >>>>>>>> >> >> > >> > separately that would be great.
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >> Could you please clarify your proposal.
>> >>>>>>>> >> >> > >
>> >>>>>>>> >> >> > > When a loop was vectorized set things up to immediately
>> >>>>>>vectorize
>> >>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>> >>>>>>avoiding
>> >>>>>>>> >> >> > > the re-use of ->aux.
>> >>>>>>>> >> >> > >
>> >>>>>>>> >> >> > > Richard.
>> >>>>>>>> >> >> > >
>> >>>>>>>> >> >> > >> Thanks.
>> >>>>>>>> >> >> > >> Yuri.
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>> >>>>>><rguenther@suse.de>:
>> >>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> >> Hi All,
>> >>>>>>>> >> >> > >> >>
>> >>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>> >>>>>>which support
>> >>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>> >>>>>>trip count. We
>> >>>>>>>> >> >> > >> >> assume that the only patch -
>> >>>>>>vec-tails-07-combine-tail.patch - was not
>> >>>>>>>> >> >> > >> >> approved by Jeff.
>> >>>>>>>> >> >> > >> >>
>> >>>>>>>> >> >> > >> >> I did re-base of all patches and performed
>> >>>>>>bootstrapping and
>> >>>>>>>> >> >> > >> >> regression testing that did not show any new failures.
>> >>>>>>Also all
>> >>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>> >>>>>>been changed
>> >>>>>>>> >> >> > >> >> accordingly.
>> >>>>>>>> >> >> > >> >>
>> >>>>>>>> >> >> > >> >> Is it OK for trunk?
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > I would have prefered that the series up to
>> >>>>>>-03-nomask-tails would
>> >>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>> >>>>>>unfortunately
>> >>>>>>>> >> >> > >> > the patchset is oddly separated.
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > I have a comment on that part nevertheless:
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>> >>>>>>(loop_vec_info
>> >>>>>>>> >> >> > >> > loop_vinfo)
>> >>>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>> >>>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>> >>>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>> >>>>>>single_exit (loop))
>> >>>>>>>> >> >> > >> > -      || loop->inner)
>> >>>>>>>> >> >> > >> > +      || loop->inner
>> >>>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>> >>>>>>and
>> >>>>>>>> >> >> > >> > +        is not required for epilogue.  */
>> >>>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>> >>>>>>>> >> >> > >> >      do_peeling = false;
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> >    if (do_peeling
>> >>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>> >>>>>>(loop_vec_info
>> >>>>>>>> >> >> > >> > loop_vinfo)
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> >    do_versioning =
>> >>>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>> >>>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>> >>>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>> >>>>>>>> >> >> > >> > +        /* Required versioning was performed for the
>> >>>>>>>> >> >> > >> > +          original loop and is not required for
>> >>>>>>epilogue.  */
>> >>>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> >    if (do_versioning)
>> >>>>>>>> >> >> > >> >      {
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > please do that check in the single caller of this
>> >>>>>>function.
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>> >>>>>>believe that simply
>> >>>>>>>> >> >> > >> > passing down info from the processed parent would be
>> >>>>>>_much_ cleaner.
>> >>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>> >>>>>>>> >> >> > >> >             && dump_enabled_p ())
>> >>>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> >>>>>>vect_location,
>> >>>>>>>> >> >> > >> >                             "loop vectorized\n");
>> >>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>> >>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>> >>>>>>>> >> >> > >> >         num_vectorized_loops++;
>> >>>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>> >>>>>>it to be unrolled
>> >>>>>>>> >> >> > >> >            etc.  */
>> >>>>>>>> >> >> > >> >         loop->force_vectorize = false;
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>> >>>>>>it easier
>> >>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>> >>>>>>in dumps
>> >>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>> >>>>>>*/
>> >>>>>>>> >> >> > >> > +       if (new_loop)
>> >>>>>>>> >> >> > >> > +         {
>> >>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>> >>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>> >>>>>>>> >> >> > >> > +         }
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>> >>>>>>new_loop)
>> >>>>>>>> >> >> > >> > function which will set up stuff properly (and also
>> >>>>>>perform
>> >>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>> >>>>>>vectorization
>> >>>>>>>> >> >> > >> > separately that would be great.
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>> >>>>>>question its
>> >>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>> >>>>>>vector loop).
>> >>>>>>>> >> >> > >> > But it has already been approved ... oh well.
>> >>>>>>>> >> >> > >> >
>> >>>>>>>> >> >> > >> > Thanks,
>> >>>>>>>> >> >> > >> > Richard.
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >>
>> >>>>>>>> >> >> > >
>> >>>>>>>> >> >> > > --
>> >>>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>> >>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>> >>>>>>Graham Norton, HRB 21284 (AG Nuernberg)
>> >>>>>>>> >> >> >
>> >>>>>>>> >> >>
>> >>>>>>>> >> >>
>> >>>>>>>> >> >
>> >>>>>>>> >> > --
>> >>>>>>>> >> > Richard Biener <rguenther@suse.de>
>> >>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>> >>>>>>Norton, HRB 21284 (AG Nuernberg)
>> >>>>>>>> >>
>> >>>>>>>> >
>> >>>>>>>> > --
>> >>>>>>>> > Richard Biener <rguenther@suse.de>
>> >>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>> >>>>>>Norton, HRB 21284 (AG Nuernberg)
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Richard Biener <rguenther@suse.de>
>> >>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>> >>>>>>Norton, HRB 21284 (AG Nuernberg)
>> >>>>>
>> >>>>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Christophe Lyon Nov. 29, 2016, 4:22 p.m. UTC | #17
On 18 November 2016 at 16:54, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> It is very strange that this test failed on arm, since it requires
>> target avx2 to check vectorizer dumps:
>>
>> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> target avx2_runtime } } } */
>> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>>
>> Could you please clarify what is the reason of the failure?
>
> It's not the scan-dumps that fail, but the execution.
> The test calls abort() for some reason.
>
> It will take me a while to rebuild the test manually in the right
> debug environment to provide you with more traces.
>
>
Sorry for the delay... This problem is not directly related to your patch.

The tests in gcc.dg/vect are compiled with -mfpu=neon
-mfloat-abi=softfp -march=armv7-a
and thus cannot be executed on older versions of the architecture.

This is another instance of what I discussed with Jakub several months ago:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00666.html
but the thread died.

Basically, check_vect_support_and_set_flags sets set
dg-do-what-default compile, but
some tests in gcc.dg/vect have dg-do run hardcoded.

Jakub was not happy with my patch that was removing all these dg-do
run directives :-)

Christophe


>
>>
>> Thanks.
>>
>> 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.lyon@linaro.org>:
>>> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is patch for non-masked epilogue vectoriziation.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> Is it OK for trunk?
>>>>
>>>> Thanks.
>>>> Changelog:
>>>>
>>>> 2016-11-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
>>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>>> * * tree-if-conv.h: New file.
>>>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
>>>> dynamic alias checks for epilogues.
>>>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
>>>> * tree-vect-loop.c: include tree-if-conv.h.
>>>> (new_loop_vec_info): Add zeroing orig_loop_info field.
>>>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
>>>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
>>>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
>>>> using passed argument.
>>>> (vect_transform_loop): Check if created epilogue should be returned
>>>> for further vectorization with less vf.  If-convert epilogue if
>>>> required. Print vectorization success for epilogue.
>>>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
>>>> if it is required, pass loop_vinfo produced during vectorization of
>>>> loop body to vect_analyze_loop.
>>>> * tree-vectorizer.h (struct _loop_vec_info): Add new field
>>>> orig_loop_info.
>>>> (LOOP_VINFO_ORIG_LOOP_INFO): New.
>>>> (LOOP_VINFO_EPILOGUE_P): New.
>>>> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
>>>> (vect_do_peeling): Change prototype to return epilogue.
>>>> (vect_analyze_loop): Add argument of loop_vec_info type.
>>>> (vect_transform_loop): Return created loop.
>>>>
>>>> gcc/testsuite/
>>>>
>>>> * lib/target-supports.exp (check_avx2_hw_available): New.
>>>> (check_effective_target_avx2_runtime): New.
>>>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>>>>
>>>
>>> Hi,
>>>
>>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>>
>>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>>
>>> Christophe
>>>
>>>
>>>
>>>>
>>>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>Richard,
>>>>>>
>>>>>>I checked one of the tests designed for epilogue vectorization using
>>>>>>patches 1 - 3 and found out that build compiler performs vectorization
>>>>>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>>>>>
>>>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>>>>>t1.new-nomask.s -fdump-tree-vect-details
>>>>>>$ grep VECTORIZED -c t1.c.156t.vect
>>>>>>4
>>>>>> Without param only 2 loops are vectorized.
>>>>>>
>>>>>>Should I simply add a part of tests related to this feature or I must
>>>>>>delete all not necessary changes also?
>>>>>
>>>>> Please remove all not necessary changes.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>Thanks.
>>>>>>Yuri.
>>>>>>
>>>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> In my previous patch I forgot to remove couple lines related to aux
>>>>>>field.
>>>>>>>> Here is the correct updated patch.
>>>>>>>
>>>>>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>>>>>> necessary parts from 1 and 2) if all not required parts are removed
>>>>>>> (and you'd add the testcases covering non-masked tail vect).
>>>>>>>
>>>>>>> Thus, can you please produce a single complete patch containing only
>>>>>>> non-masked epilogue vectoriziation?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >
>>>>>>>> >> Richard,
>>>>>>>> >>
>>>>>>>> >> I prepare updated 3 patch with passing additional argument to
>>>>>>>> >> vect_analyze_loop as you proposed (untested).
>>>>>>>> >>
>>>>>>>> >> You wrote:
>>>>>>>> >> tw, I wonder if you can produce a single patch containing just
>>>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>>> >> changes only needed by later patches?
>>>>>>>> >>
>>>>>>>> >> Did you mean that I exclude all support for vectorization
>>>>>>epilogues,
>>>>>>>> >> i.e. exclude from 2-nd patch all non-related changes
>>>>>>>> >> like
>>>>>>>> >>
>>>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>>>>>>> >> index 11863af..32011c1 100644
>>>>>>>> >> --- a/gcc/tree-vect-loop.c
>>>>>>>> >> +++ b/gcc/tree-vect-loop.c
>>>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
>>>>>>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
>>>>>>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
>>>>>>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
>>>>>>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
>>>>>>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
>>>>>>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
>>>>>>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
>>>>>>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
>>>>>>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
>>>>>>>> >
>>>>>>>> > Yes.
>>>>>>>> >
>>>>>>>> >> Did you mean also that new combined patch must be working patch,
>>>>>>i.e.
>>>>>>>> >> can be integrated without other patches?
>>>>>>>> >
>>>>>>>> > Yes.
>>>>>>>> >
>>>>>>>> >> Could you please look at updated patch?
>>>>>>>> >
>>>>>>>> > Will do.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Richard.
>>>>>>>> >
>>>>>>>> >> Thanks.
>>>>>>>> >> Yuri.
>>>>>>>> >>
>>>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguenther@suse.de>:
>>>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
>>>>>>>> >> >
>>>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> > Richard,
>>>>>>>> >> >> >
>>>>>>>> >> >> > Here is updated 3 patch.
>>>>>>>> >> >> >
>>>>>>>> >> >> > I checked that all new tests related to epilogue
>>>>>>vectorization passed with it.
>>>>>>>> >> >> >
>>>>>>>> >> >> > Your comments will be appreciated.
>>>>>>>> >> >>
>>>>>>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
>>>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
>>>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
>>>>>>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
>>>>>>>> >> >> original vectorization factor?  So we can pass down an
>>>>>>(optional)
>>>>>>>> >> >> forced vectorization factor as well?
>>>>>>>> >> >
>>>>>>>> >> > Btw, I wonder if you can produce a single patch containing just
>>>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
>>>>>>>> >> > changes only needed by later patches?
>>>>>>>> >> >
>>>>>>>> >> > Thanks,
>>>>>>>> >> > Richard.
>>>>>>>> >> >
>>>>>>>> >> >> Richard.
>>>>>>>> >> >>
>>>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
>>>>>><rguenther@suse.de>:
>>>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >> > >
>>>>>>>> >> >> > >> Hi Richard,
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> I did not understand your last remark:
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>>> >> >> > >> >           && dump_enabled_p ())
>>>>>>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>>vect_location,
>>>>>>>> >> >> > >> >                            "loop vectorized\n");
>>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
>>>>>>it to be unrolled
>>>>>>>> >> >> > >> >           etc.  */
>>>>>>>> >> >> > >> >      loop->force_vectorize = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>>it easier
>>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>>in dumps
>>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>>*/
>>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>>> >> >> > >> > +         {
>>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>>> >> >> > >> > +         }
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>>new_loop)
>>>>>>>> >> >> > >> f> unction which will set up stuff properly (and also
>>>>>>perform
>>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>>vectorization
>>>>>>>> >> >> > >> > separately that would be great.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> Could you please clarify your proposal.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > When a loop was vectorized set things up to immediately
>>>>>>vectorize
>>>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and
>>>>>>avoiding
>>>>>>>> >> >> > > the re-use of ->aux.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > Richard.
>>>>>>>> >> >> > >
>>>>>>>> >> >> > >> Thanks.
>>>>>>>> >> >> > >> Yuri.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
>>>>>><rguenther@suse.de>:
>>>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >> Hi All,
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
>>>>>>which support
>>>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low
>>>>>>trip count. We
>>>>>>>> >> >> > >> >> assume that the only patch -
>>>>>>vec-tails-07-combine-tail.patch - was not
>>>>>>>> >> >> > >> >> approved by Jeff.
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> I did re-base of all patches and performed
>>>>>>bootstrapping and
>>>>>>>> >> >> > >> >> regression testing that did not show any new failures.
>>>>>>Also all
>>>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
>>>>>>been changed
>>>>>>>> >> >> > >> >> accordingly.
>>>>>>>> >> >> > >> >>
>>>>>>>> >> >> > >> >> Is it OK for trunk?
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I would have prefered that the series up to
>>>>>>-03-nomask-tails would
>>>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
>>>>>>unfortunately
>>>>>>>> >> >> > >> > the patchset is oddly separated.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I have a comment on that part nevertheless:
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
>>>>>>(loop_vec_info
>>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
>>>>>>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>>>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
>>>>>>single_exit (loop))
>>>>>>>> >> >> > >> > -      || loop->inner)
>>>>>>>> >> >> > >> > +      || loop->inner
>>>>>>>> >> >> > >> > +      /* Required peeling was performed in prologue
>>>>>>and
>>>>>>>> >> >> > >> > +        is not required for epilogue.  */
>>>>>>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>>>>>>>> >> >> > >> >      do_peeling = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    if (do_peeling
>>>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
>>>>>>(loop_vec_info
>>>>>>>> >> >> > >> > loop_vinfo)
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    do_versioning =
>>>>>>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
>>>>>>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
>>>>>>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
>>>>>>>> >> >> > >> > +        /* Required versioning was performed for the
>>>>>>>> >> >> > >> > +          original loop and is not required for
>>>>>>epilogue.  */
>>>>>>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> >    if (do_versioning)
>>>>>>>> >> >> > >> >      {
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > please do that check in the single caller of this
>>>>>>function.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
>>>>>>believe that simply
>>>>>>>> >> >> > >> > passing down info from the processed parent would be
>>>>>>_much_ cleaner.
>>>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
>>>>>>>> >> >> > >> >             && dump_enabled_p ())
>>>>>>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>>>>>>vect_location,
>>>>>>>> >> >> > >> >                             "loop vectorized\n");
>>>>>>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
>>>>>>>> >> >> > >> >         num_vectorized_loops++;
>>>>>>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
>>>>>>it to be unrolled
>>>>>>>> >> >> > >> >            etc.  */
>>>>>>>> >> >> > >> >         loop->force_vectorize = false;
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
>>>>>>it easier
>>>>>>>> >> >> > >> > +          to match loop and its epilogue vectorization
>>>>>>in dumps
>>>>>>>> >> >> > >> > +          put new loop as the next loop to process.
>>>>>>*/
>>>>>>>> >> >> > >> > +       if (new_loop)
>>>>>>>> >> >> > >> > +         {
>>>>>>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
>>>>>>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
>>>>>>>> >> >> > >> > +         }
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
>>>>>>new_loop)
>>>>>>>> >> >> > >> > function which will set up stuff properly (and also
>>>>>>perform
>>>>>>>> >> >> > >> > the if-conversion of the epilogue there).
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue
>>>>>>vectorization
>>>>>>>> >> >> > >> > separately that would be great.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and
>>>>>>question its
>>>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main
>>>>>>vector loop).
>>>>>>>> >> >> > >> > But it has already been approved ... oh well.
>>>>>>>> >> >> > >> >
>>>>>>>> >> >> > >> > Thanks,
>>>>>>>> >> >> > >> > Richard.
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >>
>>>>>>>> >> >> > >
>>>>>>>> >> >> > > --
>>>>>>>> >> >> > > Richard Biener <rguenther@suse.de>
>>>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
>>>>>>Graham Norton, HRB 21284 (AG Nuernberg)
>>>>>>>> >> >> >
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> > --
>>>>>>>> >> > Richard Biener <rguenther@suse.de>
>>>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Richard Biener <rguenther@suse.de>
>>>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Richard Biener <rguenther@suse.de>
>>>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
>>>>>>Norton, HRB 21284 (AG Nuernberg)
>>>>>
>>>>>
diff mbox

Patch

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 11863af..32011c1 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1120,6 +1120,12 @@  new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
   LOOP_VINFO_PEELING_FOR_NITER (res) = false;
   LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
+  LOOP_VINFO_CAN_BE_MASKED (res) = false;
+  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
+  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
+  LOOP_VINFO_MASK_EPILOGUE (res) = false;
+  LOOP_VINFO_NEED_MASKING (res) = false;
+  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;

Did you mean also that new combined patch must be working patch, i.e.
can be integrated without other patches?