Patchwork Fix PR60505

login
register
mail settings
Submitter Cong Hou
Date March 11, 2014, 11:16 p.m.
Message ID <CAK=A3=1eySj3DAFV+3=GSD6y9KxnMQ6SKUW9s43cybRN7PeBiw@mail.gmail.com>
Download mbox | patch
Permalink /patch/329262/
State New
Headers show

Comments

Cong Hou - March 11, 2014, 11:16 p.m.
This patch is fixing PR60505 in which the vectorizer may produce
unnecessary epilogues.

Bootstrapped and tested on a x86_64 machine.

OK for trunk?


thanks,
Cong
Jakub Jelinek - March 12, 2014, 8:24 a.m.
On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> This patch is fixing PR60505 in which the vectorizer may produce
> unnecessary epilogues.
> 
> Bootstrapped and tested on a x86_64 machine.
> 
> OK for trunk?

That looks wrong.  Consider the case where the loop isn't versioned,
if you disable generation of the epilogue loop, you end up only with
a vector loop.

Say:
unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
void
foo (char *__restrict in, char *__restrict out, int num)
{
  int i;

  in = __builtin_assume_aligned (in, 16);
  out = __builtin_assume_aligned (out, 16);
  for (i = 0; i < num; ++i)
    out[i] = (ovec[i] = in[i]);
  out[num] = ovec[num / 2];
}
-O2 -ftree-vectorize.  Now, consider if this function is called
with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
valid and your patch will cause a wrong-code in this case).

	Jakub
Richard Guenther - March 12, 2014, 9:05 a.m.
On Wed, 12 Mar 2014, Jakub Jelinek wrote:

> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> > This patch is fixing PR60505 in which the vectorizer may produce
> > unnecessary epilogues.
> > 
> > Bootstrapped and tested on a x86_64 machine.
> > 
> > OK for trunk?
> 
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
> 
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
> 
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
>     out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause a wrong-code in this case)

Indeed - we also "share" the epilogue loop for the cost model
check.  See where we set its maximum number of iterations.

Richard.
Cong Hou - March 12, 2014, 5:51 p.m.
Thank you for pointing it out. I didn't realized that alias analysis
has influences on this issue.

The current problem is that the epilogue may be unnecessary if the
loop bound cannot be larger than the number of iterations of the
vectorized loop multiplied by VF when the vectorized loop is supposed
to be executed. My method is incorrect because I assume the vectorized
loop will be executed which is actually guaranteed by loop bound check
(and also alias checks). So if the alias checks exist, my method is
fine as both conditions are met. If there is no alias checks, I must
consider the possibility that the vectorized loop may not be executed
at runtime and then the epilogue should not be eliminated. The warning
appears on epilogue, and with loop bound checks (and without alias
checks) the warning will be gone. So I think the key is alias checks:
my method only works if there is no alias checks.

How about adding one more condition that checks if alias checks are
needed, as the code shown below?

  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
      < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
      && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
                   || (unsigned HOST_WIDE_INT)max_stmt_executions_int
       (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;


thanks,
Cong


On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> This patch is fixing PR60505 in which the vectorizer may produce
>> unnecessary epilogues.
>>
>> Bootstrapped and tested on a x86_64 machine.
>>
>> OK for trunk?
>
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
>
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
>
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
>     out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause a wrong-code in this case).
>
>         Jakub
Richard Guenther - March 13, 2014, 9:27 a.m.
On Wed, 12 Mar 2014, Cong Hou wrote:

> Thank you for pointing it out. I didn't realized that alias analysis
> has influences on this issue.
> 
> The current problem is that the epilogue may be unnecessary if the
> loop bound cannot be larger than the number of iterations of the
> vectorized loop multiplied by VF when the vectorized loop is supposed
> to be executed. My method is incorrect because I assume the vectorized
> loop will be executed which is actually guaranteed by loop bound check
> (and also alias checks). So if the alias checks exist, my method is
> fine as both conditions are met.

But there is still the loop bound check which, if it fails, uses
the epilogue loop as fallback, not the scalar versioned loop.

> If there is no alias checks, I must
> consider the possibility that the vectorized loop may not be executed
> at runtime and then the epilogue should not be eliminated. The warning
> appears on epilogue, and with loop bound checks (and without alias
> checks) the warning will be gone. So I think the key is alias checks:
> my method only works if there is no alias checks.
> 
> How about adding one more condition that checks if alias checks are
> needed, as the code shown below?
> 
>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> 
> 
> thanks,
> Cong
> 
> 
> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> >> This patch is fixing PR60505 in which the vectorizer may produce
> >> unnecessary epilogues.
> >>
> >> Bootstrapped and tested on a x86_64 machine.
> >>
> >> OK for trunk?
> >
> > That looks wrong.  Consider the case where the loop isn't versioned,
> > if you disable generation of the epilogue loop, you end up only with
> > a vector loop.
> >
> > Say:
> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> > void
> > foo (char *__restrict in, char *__restrict out, int num)
> > {
> >   int i;
> >
> >   in = __builtin_assume_aligned (in, 16);
> >   out = __builtin_assume_aligned (out, 16);
> >   for (i = 0; i < num; ++i)
> >     out[i] = (ovec[i] = in[i]);
> >   out[num] = ovec[num / 2];
> > }
> > -O2 -ftree-vectorize.  Now, consider if this function is called
> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> > valid and your patch will cause a wrong-code in this case).
> >
> >         Jakub
> 
>
Cong Hou - March 14, 2014, 12:41 a.m.
On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 12 Mar 2014, Cong Hou wrote:
>
>> Thank you for pointing it out. I didn't realized that alias analysis
>> has influences on this issue.
>>
>> The current problem is that the epilogue may be unnecessary if the
>> loop bound cannot be larger than the number of iterations of the
>> vectorized loop multiplied by VF when the vectorized loop is supposed
>> to be executed. My method is incorrect because I assume the vectorized
>> loop will be executed which is actually guaranteed by loop bound check
>> (and also alias checks). So if the alias checks exist, my method is
>> fine as both conditions are met.
>
> But there is still the loop bound check which, if it fails, uses
> the epilogue loop as fallback, not the scalar versioned loop.

The loop bound check is already performed together with alias checks
(assume we need alias checks). Actually, I did observe that the loop
bound check in the true body of alias checks may be unnecessary. For
example, for the following loop

  for(i=0; i < num ; ++i)
    out[i] = (ovec[i] = in[i]);

GCC now generates the following GIMPLE code after vectorization:



  <bb 3>: // loop bound check (with cost model) and alias checks
  _29 = (unsigned int) num_5(D);
  _28 = _29 > 15;
  _24 = in_9(D) + 16;
  _23 = out_7(D) >= _24;
  _2 = out_7(D) + 16;
  _1 = _2 <= in_9(D);
  _32 = _1 | _23;
  _31 = _28 & _32;
  if (_31 != 0)
    goto <bb 4>;
  else
    goto <bb 12>;

  <bb 4>:
  niters.3_44 = (unsigned int) num_5(D);
  _46 = niters.3_44 + 4294967280;
  _47 = _46 >> 4;
  bnd.4_45 = _47 + 1;
  ratio_mult_vf.5_48 = bnd.4_45 << 4;
  _59 = (unsigned int) num_5(D);
  _60 = _59 + 4294967295;
  if (_60 <= 14)                      <================ is this necessary?
    goto <bb 10>;
  else
    goto <bb 5>;


The check _60<=14 should be unnecessary because it is implied by the
fact _29 > 15 in bb3.

Consider this fact and if there are alias checks, we can safely remove
the epilogue if the maximum trip count of the loop is less than or
equal to the calculated threshold.


Cong



>
>> If there is no alias checks, I must
>> consider the possibility that the vectorized loop may not be executed
>> at runtime and then the epilogue should not be eliminated. The warning
>> appears on epilogue, and with loop bound checks (and without alias
>> checks) the warning will be gone. So I think the key is alias checks:
>> my method only works if there is no alias checks.
>>
>> How about adding one more condition that checks if alias checks are
>> needed, as the code shown below?
>>
>>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
>>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
>>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
>>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
>>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
>>
>>
>> thanks,
>> Cong
>>
>>
>> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> >> This patch is fixing PR60505 in which the vectorizer may produce
>> >> unnecessary epilogues.
>> >>
>> >> Bootstrapped and tested on a x86_64 machine.
>> >>
>> >> OK for trunk?
>> >
>> > That looks wrong.  Consider the case where the loop isn't versioned,
>> > if you disable generation of the epilogue loop, you end up only with
>> > a vector loop.
>> >
>> > Say:
>> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
>> > void
>> > foo (char *__restrict in, char *__restrict out, int num)
>> > {
>> >   int i;
>> >
>> >   in = __builtin_assume_aligned (in, 16);
>> >   out = __builtin_assume_aligned (out, 16);
>> >   for (i = 0; i < num; ++i)
>> >     out[i] = (ovec[i] = in[i]);
>> >   out[num] = ovec[num / 2];
>> > }
>> > -O2 -ftree-vectorize.  Now, consider if this function is called
>> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
>> > valid and your patch will cause a wrong-code in this case).
>> >
>> >         Jakub
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
Richard Guenther - March 14, 2014, 7:52 a.m.
On Thu, 13 Mar 2014, Cong Hou wrote:

> On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 12 Mar 2014, Cong Hou wrote:
> >
> >> Thank you for pointing it out. I didn't realized that alias analysis
> >> has influences on this issue.
> >>
> >> The current problem is that the epilogue may be unnecessary if the
> >> loop bound cannot be larger than the number of iterations of the
> >> vectorized loop multiplied by VF when the vectorized loop is supposed
> >> to be executed. My method is incorrect because I assume the vectorized
> >> loop will be executed which is actually guaranteed by loop bound check
> >> (and also alias checks). So if the alias checks exist, my method is
> >> fine as both conditions are met.
> >
> > But there is still the loop bound check which, if it fails, uses
> > the epilogue loop as fallback, not the scalar versioned loop.
> 
> The loop bound check is already performed together with alias checks
> (assume we need alias checks). Actually, I did observe that the loop
> bound check in the true body of alias checks may be unnecessary. For
> example, for the following loop
> 
>   for(i=0; i < num ; ++i)
>     out[i] = (ovec[i] = in[i]);
> 
> GCC now generates the following GIMPLE code after vectorization:
> 
> 
> 
>   <bb 3>: // loop bound check (with cost model) and alias checks
>   _29 = (unsigned int) num_5(D);
>   _28 = _29 > 15;
>   _24 = in_9(D) + 16;
>   _23 = out_7(D) >= _24;
>   _2 = out_7(D) + 16;
>   _1 = _2 <= in_9(D);
>   _32 = _1 | _23;
>   _31 = _28 & _32;
>   if (_31 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 12>;
> 
>   <bb 4>:
>   niters.3_44 = (unsigned int) num_5(D);
>   _46 = niters.3_44 + 4294967280;
>   _47 = _46 >> 4;
>   bnd.4_45 = _47 + 1;
>   ratio_mult_vf.5_48 = bnd.4_45 << 4;
>   _59 = (unsigned int) num_5(D);
>   _60 = _59 + 4294967295;
>   if (_60 <= 14)                      <================ is this necessary?
>     goto <bb 10>;
>   else
>     goto <bb 5>;
> 
> 
> The check _60<=14 should be unnecessary because it is implied by the
> fact _29 > 15 in bb3.

I agree that the code the vectorizer generates is less than optimal
in some cases (I filed a PR about this some time ago and fixed some
cases).

> Consider this fact and if there are alias checks, we can safely remove
> the epilogue if the maximum trip count of the loop is less than or
> equal to the calculated threshold.

You have to consider n % vf != 0, so an argument on only maximum
trip count or threshold cannot work.

Richard.

> 
> Cong
> 
> 
> 
> >
> >> If there is no alias checks, I must
> >> consider the possibility that the vectorized loop may not be executed
> >> at runtime and then the epilogue should not be eliminated. The warning
> >> appears on epilogue, and with loop bound checks (and without alias
> >> checks) the warning will be gone. So I think the key is alias checks:
> >> my method only works if there is no alias checks.
> >>
> >> How about adding one more condition that checks if alias checks are
> >> needed, as the code shown below?
> >>
> >>   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> >>   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> >>       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> >>       && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
> >>                    || (unsigned HOST_WIDE_INT)max_stmt_executions_int
> >>        (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
> >>     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
> >>
> >>
> >> thanks,
> >> Cong
> >>
> >>
> >> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> >> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> >> >> This patch is fixing PR60505 in which the vectorizer may produce
> >> >> unnecessary epilogues.
> >> >>
> >> >> Bootstrapped and tested on a x86_64 machine.
> >> >>
> >> >> OK for trunk?
> >> >
> >> > That looks wrong.  Consider the case where the loop isn't versioned,
> >> > if you disable generation of the epilogue loop, you end up only with
> >> > a vector loop.
> >> >
> >> > Say:
> >> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> >> > void
> >> > foo (char *__restrict in, char *__restrict out, int num)
> >> > {
> >> >   int i;
> >> >
> >> >   in = __builtin_assume_aligned (in, 16);
> >> >   out = __builtin_assume_aligned (out, 16);
> >> >   for (i = 0; i < num; ++i)
> >> >     out[i] = (ovec[i] = in[i]);
> >> >   out[num] = ovec[num / 2];
> >> > }
> >> > -O2 -ftree-vectorize.  Now, consider if this function is called
> >> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> >> > valid and your patch will cause a wrong-code in this case).
> >> >
> >> >         Jakub
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE / SUSE Labs
> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
> 
>
Jakub Jelinek - March 14, 2014, 7:57 a.m.
On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
> > Consider this fact and if there are alias checks, we can safely remove
> > the epilogue if the maximum trip count of the loop is less than or
> > equal to the calculated threshold.
> 
> You have to consider n % vf != 0, so an argument on only maximum
> trip count or threshold cannot work.

Well, if you only check if maximum trip count is <= vf and you know
that for n < vf the vectorized loop + it's epilogue path will not be taken,
then perhaps you could, but it is a very special case.
Now, the question is when we are guaranteed we enter the scalar versioned
loop instead for n < vf, is that in case of versioning for alias or
versioning for alignment?

	Jakub
Richard Guenther - March 14, 2014, 7:58 a.m.
On Fri, 14 Mar 2014, Jakub Jelinek wrote:

> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
> > > Consider this fact and if there are alias checks, we can safely remove
> > > the epilogue if the maximum trip count of the loop is less than or
> > > equal to the calculated threshold.
> > 
> > You have to consider n % vf != 0, so an argument on only maximum
> > trip count or threshold cannot work.
> 
> Well, if you only check if maximum trip count is <= vf and you know
> that for n < vf the vectorized loop + it's epilogue path will not be taken,
> then perhaps you could, but it is a very special case.
> Now, the question is when we are guaranteed we enter the scalar versioned
> loop instead for n < vf, is that in case of versioning for alias or
> versioning for alignment?

I think neither - I have plans to do the cost model check together
with the versioning condition but didn't get around to implement that.
That would allow stronger max bounds for the epilogue loop.

Richard.
Cong Hou - March 15, 2014, 12:27 a.m.
On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 14 Mar 2014, Jakub Jelinek wrote:
>
>> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
>> > > Consider this fact and if there are alias checks, we can safely remove
>> > > the epilogue if the maximum trip count of the loop is less than or
>> > > equal to the calculated threshold.
>> >
>> > You have to consider n % vf != 0, so an argument on only maximum
>> > trip count or threshold cannot work.
>>
>> Well, if you only check if maximum trip count is <= vf and you know
>> that for n < vf the vectorized loop + it's epilogue path will not be taken,
>> then perhaps you could, but it is a very special case.
>> Now, the question is when we are guaranteed we enter the scalar versioned
>> loop instead for n < vf, is that in case of versioning for alias or
>> versioning for alignment?
>
> I think neither - I have plans to do the cost model check together
> with the versioning condition but didn't get around to implement that.
> That would allow stronger max bounds for the epilogue loop.

In vect_transform_loop(), check_profitability will be set to true if
th >= VF-1 and the number of iteration is unknown (we only consider
unknown trip count here), where th is calculated based on the
parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum
value VF-1. If the loop needs to be versioned, then
check_profitability with true value will be passed to
vect_loop_versioning(), in which an enhanced loop bound check
(considering cost) will be built. So I think if the loop is versioned
and n < VF, then we must enter the scalar version, and in this case
removing epilogue should be safe when the maximum trip count <= th+1.


thanks,
Cong


>
> Richard.
Richard Guenther - March 17, 2014, 1:44 p.m.
On Fri, 14 Mar 2014, Cong Hou wrote:

> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 14 Mar 2014, Jakub Jelinek wrote:
> >
> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
> >> > > Consider this fact and if there are alias checks, we can safely remove
> >> > > the epilogue if the maximum trip count of the loop is less than or
> >> > > equal to the calculated threshold.
> >> >
> >> > You have to consider n % vf != 0, so an argument on only maximum
> >> > trip count or threshold cannot work.
> >>
> >> Well, if you only check if maximum trip count is <= vf and you know
> >> that for n < vf the vectorized loop + it's epilogue path will not be taken,
> >> then perhaps you could, but it is a very special case.
> >> Now, the question is when we are guaranteed we enter the scalar versioned
> >> loop instead for n < vf, is that in case of versioning for alias or
> >> versioning for alignment?
> >
> > I think neither - I have plans to do the cost model check together
> > with the versioning condition but didn't get around to implement that.
> > That would allow stronger max bounds for the epilogue loop.
> 
> In vect_transform_loop(), check_profitability will be set to true if
> th >= VF-1 and the number of iteration is unknown (we only consider
> unknown trip count here), where th is calculated based on the
> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum
> value VF-1. If the loop needs to be versioned, then
> check_profitability with true value will be passed to
> vect_loop_versioning(), in which an enhanced loop bound check
> (considering cost) will be built. So I think if the loop is versioned
> and n < VF, then we must enter the scalar version, and in this case
> removing epilogue should be safe when the maximum trip count <= th+1.

You mean exactly in the case where the profitability check ensures
that n % vf == 0?  Thus effectively if n == maximum trip count?
That's quite a special case, no?

Richard.
Jakub Jelinek - March 17, 2014, 1:49 p.m.
On Mon, Mar 17, 2014 at 02:44:29PM +0100, Richard Biener wrote:
> You mean exactly in the case where the profitability check ensures
> that n % vf == 0?  Thus effectively if n == maximum trip count?
> That's quite a special case, no?

Indeed it is.  But I guess that is pretty much the only case where
the following optimizers can fold the array accesses in the (unneeded)
epilogue loop from some non-constant indexes to constant ones (because, it
knows that the vector loop will iterate in that case exactly once).

	Jakub
Cong Hou - March 17, 2014, 4:46 p.m.
On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 14 Mar 2014, Cong Hou wrote:
>
>> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote:
>> > On Fri, 14 Mar 2014, Jakub Jelinek wrote:
>> >
>> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
>> >> > > Consider this fact and if there are alias checks, we can safely remove
>> >> > > the epilogue if the maximum trip count of the loop is less than or
>> >> > > equal to the calculated threshold.
>> >> >
>> >> > You have to consider n % vf != 0, so an argument on only maximum
>> >> > trip count or threshold cannot work.
>> >>
>> >> Well, if you only check if maximum trip count is <= vf and you know
>> >> that for n < vf the vectorized loop + it's epilogue path will not be taken,
>> >> then perhaps you could, but it is a very special case.
>> >> Now, the question is when we are guaranteed we enter the scalar versioned
>> >> loop instead for n < vf, is that in case of versioning for alias or
>> >> versioning for alignment?
>> >
>> > I think neither - I have plans to do the cost model check together
>> > with the versioning condition but didn't get around to implement that.
>> > That would allow stronger max bounds for the epilogue loop.
>>
>> In vect_transform_loop(), check_profitability will be set to true if
>> th >= VF-1 and the number of iteration is unknown (we only consider
>> unknown trip count here), where th is calculated based on the
>> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum
>> value VF-1. If the loop needs to be versioned, then
>> check_profitability with true value will be passed to
>> vect_loop_versioning(), in which an enhanced loop bound check
>> (considering cost) will be built. So I think if the loop is versioned
>> and n < VF, then we must enter the scalar version, and in this case
>> removing epilogue should be safe when the maximum trip count <= th+1.
>
> You mean exactly in the case where the profitability check ensures
> that n % vf == 0?  Thus effectively if n == maximum trip count?
> That's quite a special case, no?


Yes, it is a special case. But it is in this special case that those
warnings are thrown out. Also, I think declaring an array with VF*N as
length is not unusual.


thanks,
Cong


>
> Richard.
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
Richard Guenther - March 18, 2014, 11:43 a.m.
On Mon, 17 Mar 2014, Cong Hou wrote:

> On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 14 Mar 2014, Cong Hou wrote:
> >
> >> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote:
> >> > On Fri, 14 Mar 2014, Jakub Jelinek wrote:
> >> >
> >> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote:
> >> >> > > Consider this fact and if there are alias checks, we can safely remove
> >> >> > > the epilogue if the maximum trip count of the loop is less than or
> >> >> > > equal to the calculated threshold.
> >> >> >
> >> >> > You have to consider n % vf != 0, so an argument on only maximum
> >> >> > trip count or threshold cannot work.
> >> >>
> >> >> Well, if you only check if maximum trip count is <= vf and you know
> >> >> that for n < vf the vectorized loop + it's epilogue path will not be taken,
> >> >> then perhaps you could, but it is a very special case.
> >> >> Now, the question is when we are guaranteed we enter the scalar versioned
> >> >> loop instead for n < vf, is that in case of versioning for alias or
> >> >> versioning for alignment?
> >> >
> >> > I think neither - I have plans to do the cost model check together
> >> > with the versioning condition but didn't get around to implement that.
> >> > That would allow stronger max bounds for the epilogue loop.
> >>
> >> In vect_transform_loop(), check_profitability will be set to true if
> >> th >= VF-1 and the number of iteration is unknown (we only consider
> >> unknown trip count here), where th is calculated based on the
> >> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum
> >> value VF-1. If the loop needs to be versioned, then
> >> check_profitability with true value will be passed to
> >> vect_loop_versioning(), in which an enhanced loop bound check
> >> (considering cost) will be built. So I think if the loop is versioned
> >> and n < VF, then we must enter the scalar version, and in this case
> >> removing epilogue should be safe when the maximum trip count <= th+1.
> >
> > You mean exactly in the case where the profitability check ensures
> > that n % vf == 0?  Thus effectively if n == maximum trip count?
> > That's quite a special case, no?
> 
> 
> Yes, it is a special case. But it is in this special case that those
> warnings are thrown out. Also, I think declaring an array with VF*N as
> length is not unusual.

Ok, but then for the patch compute the cost model threshold once
in vect_analyze_loop_2 and store it in a new
LOOP_VINFO_COST_MODEL_THRESHOLD.  Also you have to check
the return value from max_stmt_executions_int as that may return
-1 if the number cannot be computed (or isn't representable in
a HOST_WIDE_INT).  You also should check for
LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT which should have the
same effect on the cost model check.

The existing condition is already complicated enough - adding new
stuff warrants comments before the (sub-)checks.

Richard.

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e1d8666..f98e628 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@ 
+2014-03-11  Cong Hou  <congh@google.com>
+
+ PR tree-optimization/60505
+ * tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number
+ of iterations of the loop and see if we should build the epilogue.
+
 2014-03-10  Jakub Jelinek  <jakub@redhat.com>

  PR ipa/60457
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 41b6875..09ec1c0 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2014-03-11  Cong Hou  <congh@google.com>
+
+ PR tree-optimization/60505
+ * gcc.dg/vect/pr60505.c: New test.
+
 2014-03-10  Jakub Jelinek  <jakub@redhat.com>

  PR ipa/60457
diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c
b/gcc/testsuite/gcc.dg/vect/pr60505.c
new file mode 100644
index 0000000..6940513
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr60505.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall -Werror" } */
+
+void foo(char *in, char *out, int num)
+{
+  int i;
+  char ovec[16] = {0};
+
+  for(i = 0; i < num ; ++i)
+    out[i] = (ovec[i] = in[i]);
+  out[num] = ovec[num/2];
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index df6ab6f..2156d5f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1625,6 +1625,7 @@  vect_analyze_loop_2 (loop_vec_info loop_vinfo)
   bool ok, slp = false;
   int max_vf = MAX_VECTORIZATION_FACTOR;
   int min_vf = 2;
+  int th;

   /* Find all data references in the loop (which correspond to vdefs/vuses)
      and analyze their evolution in the loop.  Also adjust the minimal
@@ -1769,6 +1770,12 @@  vect_analyze_loop_2 (loop_vec_info loop_vinfo)

   /* Decide whether we need to create an epilogue loop to handle
      remaining scalar iterations.  */
+  th = MAX (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND), 1)
+       * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
+  th = MAX (th, LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo)) + 1;
+  th = (th / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
     {
@@ -1779,7 +1786,9 @@  vect_analyze_loop_2 (loop_vec_info loop_vinfo)
     }
   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
    || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
-       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+       && (unsigned HOST_WIDE_INT)max_stmt_executions_int
+    (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))
     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;

   /* If an epilogue loop is required make sure we can create one.  */