diff mbox

New GCC options for loop vectorization

Message ID CAAkRFZLa+ENGNhBhpmwTWKTFbxAkqhs2KnNBdzZ0qmn39Ocihw@mail.gmail.com
State New
Headers show

Commit Message

Xinliang David Li Sept. 23, 2013, 8:37 p.m. UTC
Thanks. I modified the patch so that the max allowed peel iterations
can be specified via a parameter. Testing on going. Ok for trunk ?

David

On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>>
>>>>>>>> In this patch, two new options are introduced:
>>>>>>>>
>>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>>
>>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>>> -ftree-slp-vectorize.
>>>>>>>>
>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>>
>>>>>>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>>
>>>>>>>> With the new change it becomes:
>>>>>>>>
>>>>>>>>     -O3 -fno-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>>
>>>>>>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>>
>>>>>>>> The new way is
>>>>>>>>
>>>>>>>>     -O2 -ftree-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>>
>>>>>>>> This option is used to turn on/off loop peeling for alignment.  In the
>>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>>> Richard.  This option is also useful in scenarios where peeling can
>>>>>>>> introduce runtime problems:
>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>>>>>>> common in practice.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>>
>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>>>>>>
>>>>>> Ok. Can you also comment on 2) ?
>>>>>
>>>>> I think we want to decide how granular we want to control the vectorizer
>>>>> and using which mechanism.  My cost-model re-org makes
>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>>> a step backwards in this context.
>>>>
>>>> Using cost model to do a coarse grain control/configuration is
>>>> certainly something we want, but having a fine grain control is still
>>>> useful.
>>>>
>>>>>
>>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>>> you'd want to be able to disable separately?
>>>>
>>>> Loop peeling seems to be the main one. There is also a correctness
>>>> issue related. For instance, the following code is common in practice,
>>>> but loop peeling wrongly assumes initial base-alignment and generates
>>>> aligned mov instruction after peeling, leading to SEGV.  Peeling is
>>>> not something we can blindly turned on -- even when it is on, there
>>>> should be a way to turn it off explicitly:
>>>>
>>>> char a[10000];
>>>>
>>>> void foo(int n)
>>>> {
>>>>   int* b = (int*)(a+n);
>>>>   int i = 0;
>>>>   for (; i < 1000; ++i)
>>>>     b[i] = 1;
>>>> }
>>>>
>>>> int main(int argn, char** argv)
>>>> {
>>>>   foo(argn);
>>>> }
>>>
>>> But that's just a bug that should be fixed (looking into it).
>>>
>>>>>  Just disabling peeling for
>>>>> alignment may get you into the versioning for alignment path (and thus
>>>>> an unvectorized loop at runtime).
>>>>
>>>> This is not true for target supporting mis-aligned access. I have not
>>>> seen a case where alignment driver loop version happens on x86.
>>>>
>>>>>Also it's know that the alignment peeling
>>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>>> between unaligned load and store costs).
>>>>
>>>> Yet another reason to turn it off as it is not effective anyways?
>>>
>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost model
>>> patch because it wasn't guarding versioning for aliasing but only versioning
>>> for alignment.
>>>
>>> We have to be consistent here - if we add a way to disable peeling for
>>> alignment then we certainly don't want to remove the ability to disable
>>> versioning for alignment, no?
>>
>> We already have the ability to turn off versioning -- via --param. It
>> is a more natural way to fine tune a pass instead of introducing a -f
>> option. For this reason, your planned deprecation of the option is a
>> good thing to do.
>>
>> For consistency, I think we should introduce a new parameter to turn
>> on/off peeling. This can also be tied closely with the cost model
>> change.
>>
>> The proposed patch attached. Does this one look ok?
>
> I'd principally support adding a param but rather would for example make
> it limit the number of peeled iterations (zero turning off the feature).  As
> you implemented it it will give false messages for
>
>       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>       do_peeling = vector_alignment_reachable_p (dr);
>       if (do_peeling)
>         {
> ...
>         }
>       else
>         {
>           if (!aligned_access_p (dr))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "vector alignment may not be reachable\n");
>               break;
>             }
>         }
>
> the else path.  I'd restructure this as
>
>      if (!vector_alignment_reachable_p (dr))
>        {
>           if (!aligned_access_p (dr))
>          ....
>           continue;
>        }
>
>      if (known_alignment_for_access_p (dr))
>       {
> ... check param against the number of iterations to peel for this DR ....
>       }
>      else
>       {
> ... check param against max iterations ...
>       }
>
> an alternative is a tri-state param that would either allow no peeling at all,
> just peeling with known number of iterations or all kind of peelings.
>
> Thanks,
> Richard.
>
>
>> thanks,
>>
>> David
>>
>>
>>
>>>
>>> Richard.
>>>
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>>
>>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>>
>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>>          opts->x_flag_ipa_reference = false;
>>>>>>>        break;
>>>>>>>
>>>>>>> +    case OPT_ftree_vectorize:
>>>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>>> +      break;
>>>>>>>
>>>>>>> doesn't look obviously correct.  Does that handle
>>>>>>>
>>>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> properly?  Currently at least
>>>>>>>
>>>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> doesn't "work".
>>>>>>
>>>>>>
>>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>>
>>>>>>>
>>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>>> repeated options.
>>>>>>>
>>>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>>>
>>>>>> We should probably just document the behavior. Even better, we should
>>>>>> deprecate the old option.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> David

Comments

Richard Biener Sept. 24, 2013, 11:09 a.m. UTC | #1
On Mon, Sep 23, 2013 at 10:37 PM, Xinliang David Li <davidxl@google.com> wrote:
> Thanks. I modified the patch so that the max allowed peel iterations
> can be specified via a parameter. Testing on going. Ok for trunk ?

+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+         "vect-max-peeling-for-alignment",
+         "Max number of loop peels to enhancement alignment of data
references in a loop",
+         -1, 0, 0)

I believe this doesn't allow --param vect-max-peeling-for-alignment=-1 as you
specify 0 as the minimum.

So, ok with changing minimum and maxmum to -1. (double check that this works)

Thanks,
Richard.



> David
>
> On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>>>
>>>>>>>>> In this patch, two new options are introduced:
>>>>>>>>>
>>>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>>>
>>>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>>>> -ftree-slp-vectorize.
>>>>>>>>>
>>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>>>
>>>>>>>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>>>
>>>>>>>>> With the new change it becomes:
>>>>>>>>>
>>>>>>>>>     -O3 -fno-loop-vectorize
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>>>
>>>>>>>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>>>
>>>>>>>>> The new way is
>>>>>>>>>
>>>>>>>>>     -O2 -ftree-loop-vectorize
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>>>
>>>>>>>>> This option is used to turn on/off loop peeling for alignment.  In the
>>>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>>>> Richard.  This option is also useful in scenarios where peeling can
>>>>>>>>> introduce runtime problems:
>>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>>>>>>>> common in practice.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>>>
>>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>>>>>>>
>>>>>>> Ok. Can you also comment on 2) ?
>>>>>>
>>>>>> I think we want to decide how granular we want to control the vectorizer
>>>>>> and using which mechanism.  My cost-model re-org makes
>>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>>>> a step backwards in this context.
>>>>>
>>>>> Using cost model to do a coarse grain control/configuration is
>>>>> certainly something we want, but having a fine grain control is still
>>>>> useful.
>>>>>
>>>>>>
>>>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>>>> you'd want to be able to disable separately?
>>>>>
>>>>> Loop peeling seems to be the main one. There is also a correctness
>>>>> issue related. For instance, the following code is common in practice,
>>>>> but loop peeling wrongly assumes initial base-alignment and generates
>>>>> aligned mov instruction after peeling, leading to SEGV.  Peeling is
>>>>> not something we can blindly turned on -- even when it is on, there
>>>>> should be a way to turn it off explicitly:
>>>>>
>>>>> char a[10000];
>>>>>
>>>>> void foo(int n)
>>>>> {
>>>>>   int* b = (int*)(a+n);
>>>>>   int i = 0;
>>>>>   for (; i < 1000; ++i)
>>>>>     b[i] = 1;
>>>>> }
>>>>>
>>>>> int main(int argn, char** argv)
>>>>> {
>>>>>   foo(argn);
>>>>> }
>>>>
>>>> But that's just a bug that should be fixed (looking into it).
>>>>
>>>>>>  Just disabling peeling for
>>>>>> alignment may get you into the versioning for alignment path (and thus
>>>>>> an unvectorized loop at runtime).
>>>>>
>>>>> This is not true for target supporting mis-aligned access. I have not
>>>>> seen a case where alignment driver loop version happens on x86.
>>>>>
>>>>>>Also it's know that the alignment peeling
>>>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>>>> between unaligned load and store costs).
>>>>>
>>>>> Yet another reason to turn it off as it is not effective anyways?
>>>>
>>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost model
>>>> patch because it wasn't guarding versioning for aliasing but only versioning
>>>> for alignment.
>>>>
>>>> We have to be consistent here - if we add a way to disable peeling for
>>>> alignment then we certainly don't want to remove the ability to disable
>>>> versioning for alignment, no?
>>>
>>> We already have the ability to turn off versioning -- via --param. It
>>> is a more natural way to fine tune a pass instead of introducing a -f
>>> option. For this reason, your planned deprecation of the option is a
>>> good thing to do.
>>>
>>> For consistency, I think we should introduce a new parameter to turn
>>> on/off peeling. This can also be tied closely with the cost model
>>> change.
>>>
>>> The proposed patch attached. Does this one look ok?
>>
>> I'd principally support adding a param but rather would for example make
>> it limit the number of peeled iterations (zero turning off the feature).  As
>> you implemented it it will give false messages for
>>
>>       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>>       do_peeling = vector_alignment_reachable_p (dr);
>>       if (do_peeling)
>>         {
>> ...
>>         }
>>       else
>>         {
>>           if (!aligned_access_p (dr))
>>             {
>>               if (dump_enabled_p ())
>>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                                  "vector alignment may not be reachable\n");
>>               break;
>>             }
>>         }
>>
>> the else path.  I'd restructure this as
>>
>>      if (!vector_alignment_reachable_p (dr))
>>        {
>>           if (!aligned_access_p (dr))
>>          ....
>>           continue;
>>        }
>>
>>      if (known_alignment_for_access_p (dr))
>>       {
>> ... check param against the number of iterations to peel for this DR ....
>>       }
>>      else
>>       {
>> ... check param against max iterations ...
>>       }
>>
>> an alternative is a tri-state param that would either allow no peeling at all,
>> just peeling with known number of iterations or all kind of peelings.
>>
>> Thanks,
>> Richard.
>>
>>
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> thanks,
>>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>>
>>>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>>>
>>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>>>          opts->x_flag_ipa_reference = false;
>>>>>>>>        break;
>>>>>>>>
>>>>>>>> +    case OPT_ftree_vectorize:
>>>>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>>>> +      break;
>>>>>>>>
>>>>>>>> doesn't look obviously correct.  Does that handle
>>>>>>>>
>>>>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>>>
>>>>>>>> properly?  Currently at least
>>>>>>>>
>>>>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>>>
>>>>>>>> doesn't "work".
>>>>>>>
>>>>>>>
>>>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>>>
>>>>>>>>
>>>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>>>> repeated options.
>>>>>>>>
>>>>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>>>>
>>>>>>> We should probably just document the behavior. Even better, we should
>>>>>>> deprecate the old option.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> David
diff mbox

Patch

Index: params.def
===================================================================
--- params.def	(revision 202834)
+++ params.def	(working copy)
@@ -544,6 +544,11 @@  DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
          "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check",
          10, 0, 0)
 
+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+         "vect-max-peeling-for-alignment",
+         "Max number of loop peels to enhancement alignment of data references in a loop",
+         -1, 0, 0)
+
 DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
 	 "max-cselib-memory-locations",
 	 "The maximum memory locations recorded by cselib",
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c	(revision 202834)
+++ tree-vect-data-refs.c	(working copy)
@@ -1718,6 +1718,30 @@  vect_enhance_data_refs_alignment (loop_v
 
       if (do_peeling)
         {
+          unsigned max_allowed_peel
+            = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT);
+          if (max_allowed_peel != (unsigned)-1)
+            {
+              unsigned max_peel = npeel;
+              if (max_peel == 0)
+                {
+                  gimple dr_stmt = DR_STMT (dr0);
+                  stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt);
+                  tree vtype = STMT_VINFO_VECTYPE (vinfo);
+                  max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1;
+                }
+              if (max_peel > max_allowed_peel)
+                {
+                  do_peeling = false;
+                  if (dump_enabled_p ())
+                    dump_printf_loc (MSG_NOTE, vect_location,
+                        "Disable peeling, max peels reached: %d\n", max_peel);
+                }
+            }
+        }
+
+      if (do_peeling)
+        {
 	  stmt_info_for_cost *si;
 	  void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202834)
+++ doc/invoke.texi	(working copy)
@@ -9451,6 +9451,10 @@  The maximum number of run-time checks th
 doing loop versioning for alias in the vectorizer.  See option
 @option{-ftree-vect-loop-version} for more information.
 
+@item vect-max-peeling-for-alignment
+The maximum number of loop peels to enhance access alignment
+for vectorizer. Value -1 means 'no limit'.
+
 @item max-iterations-to-track
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.