Message ID | CAK=A3=1eySj3DAFV+3=GSD6y9KxnMQ6SKUW9s43cybRN7PeBiw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > This patch is fixing PR60505 in which the vectorizer may produce > unnecessary epilogues. > > Bootstrapped and tested on a x86_64 machine. > > OK for trunk? That looks wrong. Consider the case where the loop isn't versioned, if you disable generation of the epilogue loop, you end up only with a vector loop. Say: unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; void foo (char *__restrict in, char *__restrict out, int num) { int i; in = __builtin_assume_aligned (in, 16); out = __builtin_assume_aligned (out, 16); for (i = 0; i < num; ++i) out[i] = (ovec[i] = in[i]); out[num] = ovec[num / 2]; } -O2 -ftree-vectorize. Now, consider if this function is called with num != 16 (num > 16 is of course invalid, but num 0 to 15 is valid and your patch will cause a wrong-code in this case). Jakub
On Wed, 12 Mar 2014, Jakub Jelinek wrote: > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > > This patch is fixing PR60505 in which the vectorizer may produce > > unnecessary epilogues. > > > > Bootstrapped and tested on a x86_64 machine. > > > > OK for trunk? > > That looks wrong. Consider the case where the loop isn't versioned, > if you disable generation of the epilogue loop, you end up only with > a vector loop. > > Say: > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > void > foo (char *__restrict in, char *__restrict out, int num) > { > int i; > > in = __builtin_assume_aligned (in, 16); > out = __builtin_assume_aligned (out, 16); > for (i = 0; i < num; ++i) > out[i] = (ovec[i] = in[i]); > out[num] = ovec[num / 2]; > } > -O2 -ftree-vectorize. Now, consider if this function is called > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > valid and your patch will cause a wrong-code in this case) Indeed - we also "share" the epilogue loop for the cost model check. See where we set its maximum number of iterations. Richard.
Thank you for pointing it out. I didn't realized that alias analysis has influences on this issue. The current problem is that the epilogue may be unnecessary if the loop bound cannot be larger than the number of iterations of the vectorized loop multiplied by VF when the vectorized loop is supposed to be executed. My method is incorrect because I assume the vectorized loop will be executed which is actually guaranteed by loop bound check (and also alias checks). So if the alias checks exist, my method is fine as both conditions are met. If there is no alias checks, I must consider the possibility that the vectorized loop may not be executed at runtime and then the epilogue should not be eliminated. The warning appears on epilogue, and with loop bound checks (and without alias checks) the warning will be gone. So I think the key is alias checks: my method only works if there is no alias checks. How about adding one more condition that checks if alias checks are needed, as the code shown below? else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) || (unsigned HOST_WIDE_INT)max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; thanks, Cong On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: >> This patch is fixing PR60505 in which the vectorizer may produce >> unnecessary epilogues. >> >> Bootstrapped and tested on a x86_64 machine. >> >> OK for trunk? > > That looks wrong. Consider the case where the loop isn't versioned, > if you disable generation of the epilogue loop, you end up only with > a vector loop. > > Say: > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > void > foo (char *__restrict in, char *__restrict out, int num) > { > int i; > > in = __builtin_assume_aligned (in, 16); > out = __builtin_assume_aligned (out, 16); > for (i = 0; i < num; ++i) > out[i] = (ovec[i] = in[i]); > out[num] = ovec[num / 2]; > } > -O2 -ftree-vectorize. Now, consider if this function is called > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > valid and your patch will cause a wrong-code in this case). > > Jakub
On Wed, 12 Mar 2014, Cong Hou wrote: > Thank you for pointing it out. I didn't realized that alias analysis > has influences on this issue. > > The current problem is that the epilogue may be unnecessary if the > loop bound cannot be larger than the number of iterations of the > vectorized loop multiplied by VF when the vectorized loop is supposed > to be executed. My method is incorrect because I assume the vectorized > loop will be executed which is actually guaranteed by loop bound check > (and also alias checks). So if the alias checks exist, my method is > fine as both conditions are met. But there is still the loop bound check which, if it fails, uses the epilogue loop as fallback, not the scalar versioned loop. > If there is no alias checks, I must > consider the possibility that the vectorized loop may not be executed > at runtime and then the epilogue should not be eliminated. The warning > appears on epilogue, and with loop bound checks (and without alias > checks) the warning will be gone. So I think the key is alias checks: > my method only works if there is no alias checks. > > How about adding one more condition that checks if alias checks are > needed, as the code shown below? > > else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) > || (unsigned HOST_WIDE_INT)max_stmt_executions_int > (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) > LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; > > > thanks, > Cong > > > On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > >> This patch is fixing PR60505 in which the vectorizer may produce > >> unnecessary epilogues. > >> > >> Bootstrapped and tested on a x86_64 machine. > >> > >> OK for trunk? > > > > That looks wrong. Consider the case where the loop isn't versioned, > > if you disable generation of the epilogue loop, you end up only with > > a vector loop. > > > > Say: > > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > > void > > foo (char *__restrict in, char *__restrict out, int num) > > { > > int i; > > > > in = __builtin_assume_aligned (in, 16); > > out = __builtin_assume_aligned (out, 16); > > for (i = 0; i < num; ++i) > > out[i] = (ovec[i] = in[i]); > > out[num] = ovec[num / 2]; > > } > > -O2 -ftree-vectorize. Now, consider if this function is called > > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > > valid and your patch will cause a wrong-code in this case). > > > > Jakub > >
On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguenther@suse.de> wrote: > On Wed, 12 Mar 2014, Cong Hou wrote: > >> Thank you for pointing it out. I didn't realized that alias analysis >> has influences on this issue. >> >> The current problem is that the epilogue may be unnecessary if the >> loop bound cannot be larger than the number of iterations of the >> vectorized loop multiplied by VF when the vectorized loop is supposed >> to be executed. My method is incorrect because I assume the vectorized >> loop will be executed which is actually guaranteed by loop bound check >> (and also alias checks). So if the alias checks exist, my method is >> fine as both conditions are met. > > But there is still the loop bound check which, if it fails, uses > the epilogue loop as fallback, not the scalar versioned loop. The loop bound check is already performed together with alias checks (assume we need alias checks). Actually, I did observe that the loop bound check in the true body of alias checks may be unnecessary. For example, for the following loop for(i=0; i < num ; ++i) out[i] = (ovec[i] = in[i]); GCC now generates the following GIMPLE code after vectorization: <bb 3>: // loop bound check (with cost model) and alias checks _29 = (unsigned int) num_5(D); _28 = _29 > 15; _24 = in_9(D) + 16; _23 = out_7(D) >= _24; _2 = out_7(D) + 16; _1 = _2 <= in_9(D); _32 = _1 | _23; _31 = _28 & _32; if (_31 != 0) goto <bb 4>; else goto <bb 12>; <bb 4>: niters.3_44 = (unsigned int) num_5(D); _46 = niters.3_44 + 4294967280; _47 = _46 >> 4; bnd.4_45 = _47 + 1; ratio_mult_vf.5_48 = bnd.4_45 << 4; _59 = (unsigned int) num_5(D); _60 = _59 + 4294967295; if (_60 <= 14) <================ is this necessary? goto <bb 10>; else goto <bb 5>; The check _60<=14 should be unnecessary because it is implied by the fact _29 > 15 in bb3. Consider this fact and if there are alias checks, we can safely remove the epilogue if the maximum trip count of the loop is less than or equal to the calculated threshold. Cong > >> If there is no alias checks, I must >> consider the possibility that the vectorized loop may not be executed >> at runtime and then the epilogue should not be eliminated. The warning >> appears on epilogue, and with loop bound checks (and without alias >> checks) the warning will be gone. So I think the key is alias checks: >> my method only works if there is no alias checks. >> >> How about adding one more condition that checks if alias checks are >> needed, as the code shown below? >> >> else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >> || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) >> < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) >> && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) >> || (unsigned HOST_WIDE_INT)max_stmt_executions_int >> (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) >> LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; >> >> >> thanks, >> Cong >> >> >> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: >> >> This patch is fixing PR60505 in which the vectorizer may produce >> >> unnecessary epilogues. >> >> >> >> Bootstrapped and tested on a x86_64 machine. >> >> >> >> OK for trunk? >> > >> > That looks wrong. Consider the case where the loop isn't versioned, >> > if you disable generation of the epilogue loop, you end up only with >> > a vector loop. >> > >> > Say: >> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; >> > void >> > foo (char *__restrict in, char *__restrict out, int num) >> > { >> > int i; >> > >> > in = __builtin_assume_aligned (in, 16); >> > out = __builtin_assume_aligned (out, 16); >> > for (i = 0; i < num; ++i) >> > out[i] = (ovec[i] = in[i]); >> > out[num] = ovec[num / 2]; >> > } >> > -O2 -ftree-vectorize. Now, consider if this function is called >> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is >> > valid and your patch will cause a wrong-code in this case). >> > >> > Jakub >> >> > > -- > Richard Biener <rguenther@suse.de> > SUSE / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
On Thu, 13 Mar 2014, Cong Hou wrote: > On Thu, Mar 13, 2014 at 2:27 AM, Richard Biener <rguenther@suse.de> wrote: > > On Wed, 12 Mar 2014, Cong Hou wrote: > > > >> Thank you for pointing it out. I didn't realized that alias analysis > >> has influences on this issue. > >> > >> The current problem is that the epilogue may be unnecessary if the > >> loop bound cannot be larger than the number of iterations of the > >> vectorized loop multiplied by VF when the vectorized loop is supposed > >> to be executed. My method is incorrect because I assume the vectorized > >> loop will be executed which is actually guaranteed by loop bound check > >> (and also alias checks). So if the alias checks exist, my method is > >> fine as both conditions are met. > > > > But there is still the loop bound check which, if it fails, uses > > the epilogue loop as fallback, not the scalar versioned loop. > > The loop bound check is already performed together with alias checks > (assume we need alias checks). Actually, I did observe that the loop > bound check in the true body of alias checks may be unnecessary. For > example, for the following loop > > for(i=0; i < num ; ++i) > out[i] = (ovec[i] = in[i]); > > GCC now generates the following GIMPLE code after vectorization: > > > > <bb 3>: // loop bound check (with cost model) and alias checks > _29 = (unsigned int) num_5(D); > _28 = _29 > 15; > _24 = in_9(D) + 16; > _23 = out_7(D) >= _24; > _2 = out_7(D) + 16; > _1 = _2 <= in_9(D); > _32 = _1 | _23; > _31 = _28 & _32; > if (_31 != 0) > goto <bb 4>; > else > goto <bb 12>; > > <bb 4>: > niters.3_44 = (unsigned int) num_5(D); > _46 = niters.3_44 + 4294967280; > _47 = _46 >> 4; > bnd.4_45 = _47 + 1; > ratio_mult_vf.5_48 = bnd.4_45 << 4; > _59 = (unsigned int) num_5(D); > _60 = _59 + 4294967295; > if (_60 <= 14) <================ is this necessary? > goto <bb 10>; > else > goto <bb 5>; > > > The check _60<=14 should be unnecessary because it is implied by the > fact _29 > 15 in bb3. I agree that the code the vectorizer generates is less than optimal in some cases (I filed a PR about this some time ago and fixed some cases). > Consider this fact and if there are alias checks, we can safely remove > the epilogue if the maximum trip count of the loop is less than or > equal to the calculated threshold. You have to consider n % vf != 0, so an argument on only maximum trip count or threshold cannot work. Richard. > > Cong > > > > > > >> If there is no alias checks, I must > >> consider the possibility that the vectorized loop may not be executed > >> at runtime and then the epilogue should not be eliminated. The warning > >> appears on epilogue, and with loop bound checks (and without alias > >> checks) the warning will be gone. So I think the key is alias checks: > >> my method only works if there is no alias checks. > >> > >> How about adding one more condition that checks if alias checks are > >> needed, as the code shown below? > >> > >> else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > >> || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > >> < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > >> && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) > >> || (unsigned HOST_WIDE_INT)max_stmt_executions_int > >> (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) > >> LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; > >> > >> > >> thanks, > >> Cong > >> > >> > >> On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <jakub@redhat.com> wrote: > >> > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > >> >> This patch is fixing PR60505 in which the vectorizer may produce > >> >> unnecessary epilogues. > >> >> > >> >> Bootstrapped and tested on a x86_64 machine. > >> >> > >> >> OK for trunk? > >> > > >> > That looks wrong. Consider the case where the loop isn't versioned, > >> > if you disable generation of the epilogue loop, you end up only with > >> > a vector loop. > >> > > >> > Say: > >> > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > >> > void > >> > foo (char *__restrict in, char *__restrict out, int num) > >> > { > >> > int i; > >> > > >> > in = __builtin_assume_aligned (in, 16); > >> > out = __builtin_assume_aligned (out, 16); > >> > for (i = 0; i < num; ++i) > >> > out[i] = (ovec[i] = in[i]); > >> > out[num] = ovec[num / 2]; > >> > } > >> > -O2 -ftree-vectorize. Now, consider if this function is called > >> > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > >> > valid and your patch will cause a wrong-code in this case). > >> > > >> > Jakub > >> > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE / SUSE Labs > > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer > >
On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: > > Consider this fact and if there are alias checks, we can safely remove > > the epilogue if the maximum trip count of the loop is less than or > > equal to the calculated threshold. > > You have to consider n % vf != 0, so an argument on only maximum > trip count or threshold cannot work. Well, if you only check if maximum trip count is <= vf and you know that for n < vf the vectorized loop + it's epilogue path will not be taken, then perhaps you could, but it is a very special case. Now, the question is when we are guaranteed we enter the scalar versioned loop instead for n < vf, is that in case of versioning for alias or versioning for alignment? Jakub
On Fri, 14 Mar 2014, Jakub Jelinek wrote: > On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: > > > Consider this fact and if there are alias checks, we can safely remove > > > the epilogue if the maximum trip count of the loop is less than or > > > equal to the calculated threshold. > > > > You have to consider n % vf != 0, so an argument on only maximum > > trip count or threshold cannot work. > > Well, if you only check if maximum trip count is <= vf and you know > that for n < vf the vectorized loop + it's epilogue path will not be taken, > then perhaps you could, but it is a very special case. > Now, the question is when we are guaranteed we enter the scalar versioned > loop instead for n < vf, is that in case of versioning for alias or > versioning for alignment? I think neither - I have plans to do the cost model check together with the versioning condition but didn't get around to implement that. That would allow stronger max bounds for the epilogue loop. Richard.
On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote: > On Fri, 14 Mar 2014, Jakub Jelinek wrote: > >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: >> > > Consider this fact and if there are alias checks, we can safely remove >> > > the epilogue if the maximum trip count of the loop is less than or >> > > equal to the calculated threshold. >> > >> > You have to consider n % vf != 0, so an argument on only maximum >> > trip count or threshold cannot work. >> >> Well, if you only check if maximum trip count is <= vf and you know >> that for n < vf the vectorized loop + it's epilogue path will not be taken, >> then perhaps you could, but it is a very special case. >> Now, the question is when we are guaranteed we enter the scalar versioned >> loop instead for n < vf, is that in case of versioning for alias or >> versioning for alignment? > > I think neither - I have plans to do the cost model check together > with the versioning condition but didn't get around to implement that. > That would allow stronger max bounds for the epilogue loop. In vect_transform_loop(), check_profitability will be set to true if th >= VF-1 and the number of iteration is unknown (we only consider unknown trip count here), where th is calculated based on the parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum value VF-1. If the loop needs to be versioned, then check_profitability with true value will be passed to vect_loop_versioning(), in which an enhanced loop bound check (considering cost) will be built. So I think if the loop is versioned and n < VF, then we must enter the scalar version, and in this case removing epilogue should be safe when the maximum trip count <= th+1. thanks, Cong > > Richard.
On Fri, 14 Mar 2014, Cong Hou wrote: > On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote: > > On Fri, 14 Mar 2014, Jakub Jelinek wrote: > > > >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: > >> > > Consider this fact and if there are alias checks, we can safely remove > >> > > the epilogue if the maximum trip count of the loop is less than or > >> > > equal to the calculated threshold. > >> > > >> > You have to consider n % vf != 0, so an argument on only maximum > >> > trip count or threshold cannot work. > >> > >> Well, if you only check if maximum trip count is <= vf and you know > >> that for n < vf the vectorized loop + it's epilogue path will not be taken, > >> then perhaps you could, but it is a very special case. > >> Now, the question is when we are guaranteed we enter the scalar versioned > >> loop instead for n < vf, is that in case of versioning for alias or > >> versioning for alignment? > > > > I think neither - I have plans to do the cost model check together > > with the versioning condition but didn't get around to implement that. > > That would allow stronger max bounds for the epilogue loop. > > In vect_transform_loop(), check_profitability will be set to true if > th >= VF-1 and the number of iteration is unknown (we only consider > unknown trip count here), where th is calculated based on the > parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum > value VF-1. If the loop needs to be versioned, then > check_profitability with true value will be passed to > vect_loop_versioning(), in which an enhanced loop bound check > (considering cost) will be built. So I think if the loop is versioned > and n < VF, then we must enter the scalar version, and in this case > removing epilogue should be safe when the maximum trip count <= th+1. You mean exactly in the case where the profitability check ensures that n % vf == 0? Thus effectively if n == maximum trip count? That's quite a special case, no? Richard.
On Mon, Mar 17, 2014 at 02:44:29PM +0100, Richard Biener wrote: > You mean exactly in the case where the profitability check ensures > that n % vf == 0? Thus effectively if n == maximum trip count? > That's quite a special case, no? Indeed it is. But I guess that is pretty much the only case where the following optimizers can fold the array accesses in the (unneeded) epilogue loop from some non-constant indexes to constant ones (because, it knows that the vector loop will iterate in that case exactly once). Jakub
On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener <rguenther@suse.de> wrote: > On Fri, 14 Mar 2014, Cong Hou wrote: > >> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote: >> > On Fri, 14 Mar 2014, Jakub Jelinek wrote: >> > >> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: >> >> > > Consider this fact and if there are alias checks, we can safely remove >> >> > > the epilogue if the maximum trip count of the loop is less than or >> >> > > equal to the calculated threshold. >> >> > >> >> > You have to consider n % vf != 0, so an argument on only maximum >> >> > trip count or threshold cannot work. >> >> >> >> Well, if you only check if maximum trip count is <= vf and you know >> >> that for n < vf the vectorized loop + it's epilogue path will not be taken, >> >> then perhaps you could, but it is a very special case. >> >> Now, the question is when we are guaranteed we enter the scalar versioned >> >> loop instead for n < vf, is that in case of versioning for alias or >> >> versioning for alignment? >> > >> > I think neither - I have plans to do the cost model check together >> > with the versioning condition but didn't get around to implement that. >> > That would allow stronger max bounds for the epilogue loop. >> >> In vect_transform_loop(), check_profitability will be set to true if >> th >= VF-1 and the number of iteration is unknown (we only consider >> unknown trip count here), where th is calculated based on the >> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum >> value VF-1. If the loop needs to be versioned, then >> check_profitability with true value will be passed to >> vect_loop_versioning(), in which an enhanced loop bound check >> (considering cost) will be built. So I think if the loop is versioned >> and n < VF, then we must enter the scalar version, and in this case >> removing epilogue should be safe when the maximum trip count <= th+1. > > You mean exactly in the case where the profitability check ensures > that n % vf == 0? Thus effectively if n == maximum trip count? > That's quite a special case, no? Yes, it is a special case. But it is in this special case that those warnings are thrown out. Also, I think declaring an array with VF*N as length is not unusual. thanks, Cong > > Richard. > > -- > Richard Biener <rguenther@suse.de> > SUSE / SUSE Labs > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
On Mon, 17 Mar 2014, Cong Hou wrote: > On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener <rguenther@suse.de> wrote: > > On Fri, 14 Mar 2014, Cong Hou wrote: > > > >> On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener <rguenther@suse.de> wrote: > >> > On Fri, 14 Mar 2014, Jakub Jelinek wrote: > >> > > >> >> On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: > >> >> > > Consider this fact and if there are alias checks, we can safely remove > >> >> > > the epilogue if the maximum trip count of the loop is less than or > >> >> > > equal to the calculated threshold. > >> >> > > >> >> > You have to consider n % vf != 0, so an argument on only maximum > >> >> > trip count or threshold cannot work. > >> >> > >> >> Well, if you only check if maximum trip count is <= vf and you know > >> >> that for n < vf the vectorized loop + it's epilogue path will not be taken, > >> >> then perhaps you could, but it is a very special case. > >> >> Now, the question is when we are guaranteed we enter the scalar versioned > >> >> loop instead for n < vf, is that in case of versioning for alias or > >> >> versioning for alignment? > >> > > >> > I think neither - I have plans to do the cost model check together > >> > with the versioning condition but didn't get around to implement that. > >> > That would allow stronger max bounds for the epilogue loop. > >> > >> In vect_transform_loop(), check_profitability will be set to true if > >> th >= VF-1 and the number of iteration is unknown (we only consider > >> unknown trip count here), where th is calculated based on the > >> parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum > >> value VF-1. If the loop needs to be versioned, then > >> check_profitability with true value will be passed to > >> vect_loop_versioning(), in which an enhanced loop bound check > >> (considering cost) will be built. So I think if the loop is versioned > >> and n < VF, then we must enter the scalar version, and in this case > >> removing epilogue should be safe when the maximum trip count <= th+1. > > > > You mean exactly in the case where the profitability check ensures > > that n % vf == 0? Thus effectively if n == maximum trip count? > > That's quite a special case, no? > > > Yes, it is a special case. But it is in this special case that those > warnings are thrown out. Also, I think declaring an array with VF*N as > length is not unusual. Ok, but then for the patch compute the cost model threshold once in vect_analyze_loop_2 and store it in a new LOOP_VINFO_COST_MODEL_THRESHOLD. Also you have to check the return value from max_stmt_executions_int as that may return -1 if the number cannot be computed (or isn't representable in a HOST_WIDE_INT). You also should check for LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT which should have the same effect on the cost model check. The existing condition is already complicated enough - adding new stuff warrants comments before the (sub-)checks. Richard.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e1d8666..f98e628 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2014-03-11 Cong Hou <congh@google.com> + + PR tree-optimization/60505 + * tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number + of iterations of the loop and see if we should build the epilogue. + 2014-03-10 Jakub Jelinek <jakub@redhat.com> PR ipa/60457 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 41b6875..09ec1c0 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-03-11 Cong Hou <congh@google.com> + + PR tree-optimization/60505 + * gcc.dg/vect/pr60505.c: New test. + 2014-03-10 Jakub Jelinek <jakub@redhat.com> PR ipa/60457 diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c b/gcc/testsuite/gcc.dg/vect/pr60505.c new file mode 100644 index 0000000..6940513 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr60505.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Wall -Werror" } */ + +void foo(char *in, char *out, int num) +{ + int i; + char ovec[16] = {0}; + + for(i = 0; i < num ; ++i) + out[i] = (ovec[i] = in[i]); + out[num] = ovec[num/2]; +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index df6ab6f..2156d5f 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -1625,6 +1625,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo) bool ok, slp = false; int max_vf = MAX_VECTORIZATION_FACTOR; int min_vf = 2; + int th; /* Find all data references in the loop (which correspond to vdefs/vuses) and analyze their evolution in the loop. Also adjust the minimal @@ -1769,6 +1770,12 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo) /* Decide whether we need to create an epilogue loop to handle remaining scalar iterations. */ + th = MAX (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND), 1) + * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1; + th = MAX (th, LOOP_VINFO_COST_MODEL_MIN_ITERS (loop_vinfo)) + 1; + th = (th / LOOP_VINFO_VECT_FACTOR (loop_vinfo)) + * LOOP_VINFO_VECT_FACTOR (loop_vinfo); + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0) { @@ -1779,7 +1786,9 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo) } else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) - < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)))) + < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) + && (unsigned HOST_WIDE_INT)max_stmt_executions_int + (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)) LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; /* If an epilogue loop is required make sure we can create one. */