[GCC8,32/33] Save niter check for vect peeling if loop versioning is required

Submitted by Bin Cheng on April 18, 2017, 10:54 a.m.

Details

Message ID VI1PR0802MB21767511F259C08E5E23F7A2E7190@VI1PR0802MB2176.eurprd08.prod.outlook.com
State New

Commit Message

Bin Cheng April 18, 2017, 10:54 a.m.
Hi,
When vectorization requires loop versioning, we can merge the niter check for vect
peeling into the check for loop versioning, thus saving one check/branch on the
vectorized loop path.
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  <bin.cheng@arm.com>

	* tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop
	if versioning is required.
	* tree-vect-loop.c (vect_analyze_loop_2): Merge niter check for loop
	peeling with the check for versioning.
From bd54e2524a4047328ba4847ad013db2bbe5850fe Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Thu, 16 Mar 2017 16:40:50 +0000
Subject: [PATCH 32/33] save-vect_peeling-niters-check-20170225.txt

---
 gcc/tree-vect-loop-manip.c |  8 +++++---
 gcc/tree-vect-loop.c       | 30 ++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 3 deletions(-)
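
The effect of merging the two checks can be illustrated outside GCC with a hand-versioned loop. The following is only a sketch under stated assumptions, not code from the patch: sum_versioned, sum_scalar, the fixed VF of 4 and the runtime_ok flag (standing in for whatever condition versioning tests at run time) are all hypothetical names. The iteration-count test is folded into the versioning guard, so the vector path needs no second "is n large enough" branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VF = 4 };  /* hypothetical vectorization factor */

/* Scalar fallback: the unvectorized version of the loop.  */
static long sum_scalar(const int *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-written model of a versioned loop with a peeled prolog.
   `prolog` is the number of peeled iterations, `runtime_ok` is a
   hypothetical stand-in for the versioning condition.  */
long sum_versioned(const int *a, size_t n, size_t prolog, bool runtime_ok)
{
  assert(prolog < VF);

  /* Merged guard: versioning condition plus the minimum iteration
     count needed for the prolog and one full vector iteration.  */
  if (!runtime_ok || n < prolog + VF)
    return sum_scalar(a, n);

  long s = 0;
  size_t i = 0;

  /* Peeled prolog loop.  */
  for (; i < prolog; i++)
    s += a[i];

  /* "Vector" loop.  No separate niter check is needed before it,
     because the merged guard already ensured i + VF <= n here.  */
  do
    {
      s += (long) a[i] + a[i + 1] + a[i + 2] + a[i + 3];
      i += VF;
    }
  while (i + VF <= n);

  /* Epilog loop for the remaining iterations.  */
  for (; i < n; i++)
    s += a[i];
  return s;
}
```

Calling sum_versioned on a 10-element array with prolog set to 3 takes the vector path, while a call with n smaller than prolog + VF falls back to sum_scalar through the single merged guard.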

Comments

Richard Guenther May 11, 2017, 11:06 a.m.
On Tue, Apr 18, 2017 at 12:54 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> When vectorization requires loop versioning, we can merge the niter check for vect
> peeling into the check for loop versioning, thus saving one check/branch on the
> vectorized loop path.
> Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop
>         if versioning is required.
>         * tree-vect-loop.c (vect_analyze_loop_2): Merge niter check for loop
>         peeling with the check for versioning.
Bin.Cheng June 7, 2017, 11:04 a.m.
On Thu, May 11, 2017 at 12:06 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 12:54 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> When vectorization requires loop versioning, we can merge the niter check for vect
>> peeling into the check for loop versioning, thus saving one check/branch on the
>> vectorized loop path.
>> Is it OK?
>
> Ok.
Applied @r248959.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-vect-loop-manip.c (vect_do_peeling): Don't skip vector loop
>>         if versioning is required.
>>         * tree-vect-loop.c (vect_analyze_loop_2): Merge niter check for loop
>>         peeling with the check for versioning.
Richard Sandiford June 10, 2017, 10:06 a.m.
Another one sorry, but:

Bin Cheng <Bin.Cheng@arm.com> writes:
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index af874e7..98caa5e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2214,6 +2214,36 @@ start_over:
>          }
>      }
>  
> +  /* During peeling, we need to check if number of loop iterations is
> +     enough for both peeled prolog loop and vector loop.  This check
> +     can be merged along with threshold check of loop versioning, so
> +     increase threshold for this case if necessary.  */
> +  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +      && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +	  || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
> +    {
> +      unsigned niters_th;
> +
> +      /* Niters for peeled prolog loop.  */
> +      if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
> +	{
> +	  struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
> +	  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
> +
> +	  niters_th = TYPE_VECTOR_SUBPARTS (vectype) - 1;
> +	}
> +      else
> +	niters_th = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
> +
> +      /* Niters for at least one iteration of vectorized loop.  */
> +      niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +      /* One additional iteration because of peeling for gap.  */
> +      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +	niters_th++;

Is the ! intentional here?  It looks like it should be adding 1 when
peeling for gaps _is_ needed.

> +      if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
> +	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
> +    }
> +
>    gcc_assert (vectorization_factor
>  	      == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
Bin.Cheng June 12, 2017, 8:03 a.m.
On Sat, Jun 10, 2017 at 11:06 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Another one sorry, but:
>
> Bin Cheng <Bin.Cheng@arm.com> writes:
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index af874e7..98caa5e 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -2214,6 +2214,36 @@ start_over:
>>          }
>>      }
>>
>> +  /* During peeling, we need to check if number of loop iterations is
>> +     enough for both peeled prolog loop and vector loop.  This check
>> +     can be merged along with threshold check of loop versioning, so
>> +     increase threshold for this case if necessary.  */
>> +  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
>> +      && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>> +       || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
>> +    {
>> +      unsigned niters_th;
>> +
>> +      /* Niters for peeled prolog loop.  */
>> +      if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
>> +     {
>> +       struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
>> +       tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
>> +
>> +       niters_th = TYPE_VECTOR_SUBPARTS (vectype) - 1;
>> +     }
>> +      else
>> +     niters_th = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
>> +
>> +      /* Niters for at least one iteration of vectorized loop.  */
>> +      niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +      /* One additional iteration because of peeling for gap.  */
>> +      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>> +     niters_th++;
>
> Is the ! intentional here?  It looks like it should be adding 1 when
> peeling for gaps _is_ needed.
Hi Richard,
Thanks for spotting this.  This one is more of a typo on my part.  The
comment says one additional iteration for peeling for gaps, but the code
does the opposite.  How to fix this depends on the answer to the previous
question.  If th stands for the minimum niters of the vector loop, we need:
>> +      /* One additional iteration because of peeling for gap.  */
>> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>> +     niters_th++;

If it stands for the maximum niters of the scalar loop, we need:
>> +      /* One additional iteration because of peeling for gap.  */
>> +      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>> +     niters_th--;


Thanks,
bin

>
>> +      if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
>> +     LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
>> +    }
>> +
>>    gcc_assert (vectorization_factor
>>             == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
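
The two candidate fixes in the reply above can be compared with a small standalone model. This is a sketch only: th_min_vector and th_max_scalar are hypothetical helper names, and prolog and vf stand in for the prolog niters bound and the vectorization factor. Under these assumptions the two readings differ by exactly one for every input, which is the usual relation between "smallest count that takes the vector loop" and "largest count that stays in the scalar loop".

```c
#include <assert.h>
#include <stdbool.h>

/* Reading 1: th is the minimum niters for which the vector loop runs.
   One extra iteration is added when peeling for gaps is needed.  */
unsigned th_min_vector(unsigned prolog, unsigned vf, bool peeling_for_gaps)
{
  assert(vf >= 1);
  unsigned th = prolog + vf;
  if (peeling_for_gaps)
    th++;
  return th;
}

/* Reading 2: th is the maximum niters for which the scalar loop runs.
   One iteration is subtracted when peeling for gaps is NOT needed.  */
unsigned th_max_scalar(unsigned prolog, unsigned vf, bool peeling_for_gaps)
{
  assert(vf >= 1);
  unsigned th = prolog + vf;
  if (!peeling_for_gaps)
    th--;
  return th;
}
```

For example, with prolog 3 and vf 4, the first reading gives 8 with gaps and 7 without, while the second gives 7 and 6. Which one belongs in the patch depends on how the threshold is compared against niters later, which the thread leaves to the earlier question.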

Patch

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 0fc8cd3..0ff474d 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1686,9 +1686,11 @@  vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
-  /* Skip to epilog if scalar loop may be preferred.  It's only used when
-     we peel for epilog loop.  */
-  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+  /* Skip to epilog if scalar loop may be preferred.  It's only needed
+     when we peel for epilog loop and when it hasn't been checked with
+     loop versioning.  */
+  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		      && !LOOP_REQUIRES_VERSIONING (loop_vinfo));
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index af874e7..98caa5e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2214,6 +2214,36 @@  start_over:
         }
     }
 
+  /* During peeling, we need to check if number of loop iterations is
+     enough for both peeled prolog loop and vector loop.  This check
+     can be merged along with threshold check of loop versioning, so
+     increase threshold for this case if necessary.  */
+  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
+      && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+	  || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
+    {
+      unsigned niters_th;
+
+      /* Niters for peeled prolog loop.  */
+      if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
+	{
+	  struct data_reference *dr = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
+	  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
+
+	  niters_th = TYPE_VECTOR_SUBPARTS (vectype) - 1;
+	}
+      else
+	niters_th = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+
+      /* Niters for at least one iteration of vectorized loop.  */
+      niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      /* One additional iteration because of peeling for gap.  */
+      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+	niters_th++;
+      if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
+	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
+    }
+
   gcc_assert (vectorization_factor
 	      == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
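
The tree-vect-loop.c hunk can be modelled outside the compiler to make the arithmetic easier to follow. The sketch below is not GCC code: struct loop_model and its fields are hypothetical stand-ins for the LOOP_VINFO_* accessors used in the patch, and the gap adjustment follows the patch comment (add one when peeling for gaps is needed) rather than the condition as posted, which is an assumption on my part given the review discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info fields the hunk reads.  */
struct loop_model
{
  bool requires_versioning;
  bool peeling_for_gaps;
  bool peeling_for_niter;
  int peeling_for_alignment;     /* < 0: prolog count only known at run time */
  unsigned vector_subparts;      /* elements per vector of the unaligned ref */
  unsigned vect_factor;
  unsigned cost_model_threshold; /* threshold before the hunk runs */
};

/* Upper bound on the iterations executed by the peeled prolog loop.  */
unsigned prolog_niters_bound(const struct loop_model *l)
{
  if (l->peeling_for_alignment < 0)
    return l->vector_subparts - 1;
  return (unsigned) l->peeling_for_alignment;
}

/* Threshold the versioning check would use after the hunk.  */
unsigned raise_threshold(const struct loop_model *l)
{
  assert(l->vect_factor >= 1 && l->vector_subparts >= 1);

  /* The hunk only applies when versioning and some peeling are needed.  */
  if (!l->requires_versioning
      || !(l->peeling_for_gaps || l->peeling_for_niter))
    return l->cost_model_threshold;

  /* Prolog iterations plus one full iteration of the vector loop.  */
  unsigned niters_th = prolog_niters_bound(l) + l->vect_factor;

  /* Assumption: follows the patch comment, not the posted condition.  */
  if (l->peeling_for_gaps)
    niters_th++;

  /* Never lower an existing cost-model threshold.  */
  return niters_th > l->cost_model_threshold
	 ? niters_th : l->cost_model_threshold;
}
```

With unknown alignment peeling, 4 subparts and a vectorization factor of 4, the model yields 7 unless the existing cost-model threshold is already larger, in which case that value is kept.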