Patchwork New GCC options for loop vectorization

Submitter Xinliang David Li
Date Sept. 12, 2013, 8:31 p.m.
Message ID <CAAkRFZJzPkDJe=y2RqDwXsegN1So-u8mvUM2Cc2=4yZm29ip5g@mail.gmail.com>
Permalink /patch/274606/
State New

Comments

Xinliang David Li - Sept. 12, 2013, 8:31 p.m.
Currently -ftree-vectorize turns on both loop and slp vectorizations,
but there is no simple way to turn on loop vectorization alone. The
logic for default O3 setting is also complicated.

In this patch, two new options are introduced:

1) -ftree-loop-vectorize

This option turns on loop vectorization only. Option
-ftree-slp-vectorize also becomes a first-class citizen, and the funny
business of Init(2) is no longer needed.  With this change,
-ftree-vectorize becomes a simple alias for -ftree-loop-vectorize +
-ftree-slp-vectorize.

For instance, to turn on only SLP vectorization at -O3, the old way is:

     -O3 -fno-tree-vectorize -ftree-slp-vectorize

With the new change it becomes:

    -O3 -fno-tree-loop-vectorize


To turn on only loop vectorization at -O2, the old way is:

    -O2 -ftree-vectorize -fno-tree-slp-vectorize

The new way is:

    -O2 -ftree-loop-vectorize



2) -ftree-vect-loop-peeling

This option turns loop peeling for alignment on or off.  In the
long run, this should be folded into the cheap cost model proposed by
Richard.  This option is also useful in scenarios where peeling can
introduce runtime problems
(http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html), which happen to be
common in practice.
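
For readers unfamiliar with the transformation, here is a hand-written sketch of what peeling for alignment amounts to. It is not the vectorizer's code: the function names, the 16-byte vector width, and the 4-byte int are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of scalar iterations needed before `p` reaches a 16-byte
   boundary, or -1 if no whole number of int-sized steps can get there
   (that is, `p` is not even aligned for int).  Hypothetical helper. */
int peel_count(const void *p)
{
    uintptr_t misalign = (uintptr_t)p % 16;
    if (misalign % sizeof(int) != 0)
        return -1;
    return (int)(((16 - misalign) % 16) / sizeof(int));
}

/* Hand-written analogue of a peeled loop: a scalar prologue that runs
   until the pointer is 16-byte aligned, a main loop working in blocks
   of four ints (where a vectorizer could use aligned stores), and a
   scalar epilogue for the leftover elements.  `b` is assumed to be
   properly aligned for int, as the C standard requires. */
void fill_ones(int *b, size_t n)
{
    size_t i = 0;
    int peel = peel_count(b);
    /* If alignment cannot be reached, do everything in the scalar loop. */
    size_t prologue = peel < 0 ? n : (size_t)peel;
    if (prologue > n)
        prologue = n;

    for (; i < prologue; ++i)          /* peeled iterations */
        b[i] = 1;
    for (; i + 4 <= n; i += 4) {       /* "vector" body, 16-byte aligned */
        b[i] = 1; b[i + 1] = 1; b[i + 2] = 1; b[i + 3] = 1;
    }
    for (; i < n; ++i)                 /* epilogue */
        b[i] = 1;
}
```

Note that peel_count has no answer when the base address is not a multiple of sizeof(int); that is the situation behind the runtime problems mentioned above.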



Patch attached. Compiler bootstrapped. OK after testing?


thanks,

David
Richard Guenther - Sept. 13, 2013, 8:30 a.m.
I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

I've stopped a quick try doing 1) myself because

@@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
         opts->x_flag_ipa_reference = false;
       break;

+    case OPT_ftree_vectorize:
+      if (!opts_set->x_flag_tree_loop_vectorize)
+        opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+        opts->x_flag_tree_slp_vectorize = value;
+      break;

doesn't look obviously correct.  Does that handle

  -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

or

  -ftree-loop-vectorize -fno-tree-vectorize

properly?  Currently at least

  -ftree-slp-vectorize -fno-tree-vectorize

doesn't "work".

That said, the option machinery doesn't handle an option being an alias
for two other options, so its mechanism to contract positives/negatives
doesn't work here, and the override hooks do not work reliably for
repeated options.

Or am I wrong here?  Should we care at all?  Joseph?

Thanks,
Richard.

Xinliang David Li - Sept. 13, 2013, 3:16 p.m.
On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).

Ok. Can you also comment on 2) ?

>
> I've stopped a quick try doing 1) myself because
>
> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>          opts->x_flag_ipa_reference = false;
>        break;
>
> +    case OPT_ftree_vectorize:
> +      if (!opts_set->x_flag_tree_loop_vectorize)
> +        opts->x_flag_tree_loop_vectorize = value;
> +      if (!opts_set->x_flag_tree_slp_vectorize)
> +        opts->x_flag_tree_slp_vectorize = value;
> +      break;
>
> doesn't look obviously correct.  Does that handle
>
>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>
> or
>
>   -ftree-loop-vectorize -fno-tree-vectorize
>
> properly?  Currently at least
>
>   -ftree-slp-vectorize -fno-tree-vectorize
>
> doesn't "work".


Right -- the same is true for the -fprofile-use option. FDO enables some
passes, but cannot re-enable them if they were turned off earlier.

>
> That said, the option machinery doesn't handle an option being an alias
> for two other options, so its mechanism to contract positives/negatives
> doesn't work here, and the override hooks do not work reliably for
> repeated options.
>
> Or am I wrong here?  Should we care at all?  Joseph?

We should probably just document the behavior. Even better, we should
deprecate the old option.

thanks,

David

Joseph S. Myers - Sept. 13, 2013, 4:45 p.m.
On Fri, 13 Sep 2013, Richard Biener wrote:

> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>          opts->x_flag_ipa_reference = false;
>        break;
> 
> +    case OPT_ftree_vectorize:
> +      if (!opts_set->x_flag_tree_loop_vectorize)
> +        opts->x_flag_tree_loop_vectorize = value;
> +      if (!opts_set->x_flag_tree_slp_vectorize)
> +        opts->x_flag_tree_slp_vectorize = value;
> +      break;
> 
> doesn't look obviously correct.  Does that handle

It looks right to me.  The general principle is that the more specific 
option takes precedence over the less specific one, whatever the order on 
the command line.

>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize

Should mean -ftree-slp-vectorize.

>   -ftree-loop-vectorize -fno-tree-vectorize

Should mean -ftree-loop-vectorize.

>   -ftree-slp-vectorize -fno-tree-vectorize

Should mean -ftree-slp-vectorize.
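
The precedence rule above can be checked with a small stand-alone model. This is only a sketch: struct vect_flags, apply_option and parse_options are hypothetical names, and the *_set members merely imitate the role opts_set plays in the quoted hunk. It is not GCC's option machinery.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of the two vectorizer flags plus "explicitly set" markers. */
struct vect_flags {
    bool loop, slp;          /* effective values */
    bool loop_set, slp_set;  /* true once the specific option was seen */
};

/* Apply one option in command-line order. */
void apply_option(struct vect_flags *f, const char *opt)
{
    bool value = true;

    if (strncmp(opt, "-f", 2) != 0)
        return;
    opt += 2;
    if (strncmp(opt, "no-", 3) == 0) {   /* negated form */
        value = false;
        opt += 3;
    }

    if (strcmp(opt, "tree-loop-vectorize") == 0) {
        f->loop = value;
        f->loop_set = true;
    } else if (strcmp(opt, "tree-slp-vectorize") == 0) {
        f->slp = value;
        f->slp_set = true;
    } else if (strcmp(opt, "tree-vectorize") == 0) {
        /* The umbrella option never overrides an explicit specific one. */
        if (!f->loop_set)
            f->loop = value;
        if (!f->slp_set)
            f->slp = value;
    }
}

/* Process a whole option list, starting with both flags off. */
struct vect_flags parse_options(const char *const *opts, int n)
{
    struct vect_flags f = { false, false, false, false };
    for (int i = 0; i < n; ++i)
        apply_option(&f, opts[i]);
    return f;
}
```

In this model the three command lines discussed above end up with SLP only, loop only, and SLP only, respectively.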
Xinliang David Li - Sept. 13, 2013, 4:48 p.m.
OK -- then my updated patch is wrong. The implementation in the
first version matches the requirement.

thanks,

David


Richard Guenther - Sept. 16, 2013, 9:56 a.m.
Thanks for clarifying.

Richard.

Richard Guenther - Sept. 16, 2013, 10:13 a.m.
On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>
> Ok. Can you also comment on 2) ?

I think we need to decide how granular our control over the vectorizer
should be, and which mechanism to use.  My cost-model re-org makes
-ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
a step backwards in this context.

So, can you summarize which pieces (including versioning) of the vectorizer
you'd want to be able to disable separately?  Just disabling peeling for
alignment may get you into the versioning-for-alignment path (and thus
an unvectorized loop at runtime).  Also, it's known that the alignment
peeling code needs some serious TLC (its outcome depends on the order of
DRs, and the cost model it uses leaves much to be desired, as we cannot
distinguish between unaligned load and store costs).
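
To make the versioning-for-alignment path concrete, here is a hand-written sketch of the idea. It is not what the vectorizer emits: the names, the 16-byte boundary, and the returned enum (used only to show which copy of the loop ran) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Which copy of the loop was executed; returned for illustration only. */
enum loop_version { LOOP_ALIGNED, LOOP_FALLBACK };

/* Two copies of the same loop guarded by a runtime alignment check.
   The aligned copy is where a vectorizer could use aligned vector
   stores; the fallback copy stays scalar, which is why a misaligned
   pointer ends up running an unvectorized loop at runtime. */
enum loop_version fill_ones_versioned(int *b, size_t n)
{
    size_t i = 0;

    if ((uintptr_t)b % 16 == 0) {
        for (; i + 4 <= n; i += 4) {   /* blocks of four ints */
            b[i] = 1; b[i + 1] = 1; b[i + 2] = 1; b[i + 3] = 1;
        }
        for (; i < n; ++i)             /* leftover elements */
            b[i] = 1;
        return LOOP_ALIGNED;
    }

    for (; i < n; ++i)                 /* scalar fallback version */
        b[i] = 1;
    return LOOP_FALLBACK;
}
```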

Richard.

Xinliang David Li - Sept. 16, 2013, 8:24 p.m.
On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2).
>>
>> Ok. Can you also comment on 2) ?
>
> I think we need to decide how granular our control over the vectorizer
> should be, and which mechanism to use.  My cost-model re-org makes
> -ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
> a step backwards in this context.

Using the cost model for coarse-grained control/configuration is
certainly something we want, but having fine-grained control is still
useful.

>
> So, can you summarize what pieces (including versioning) of the vectorizer
> you'd want to be able to disable separately?

Loop peeling seems to be the main one. There is also a related
correctness issue. For instance, the following code is common in practice,
but loop peeling wrongly assumes the initial base alignment and generates
aligned mov instructions after peeling, leading to a SEGV.  Peeling is
not something we can blindly turn on -- even when it is on, there
should be a way to turn it off explicitly:

char a[10000];

void foo(int n)
{
  int* b = (int*)(a+n);
  int i = 0;
  for (; i < 1000; ++i)
    b[i] = 1;
}

int main(int argn, char** argv)
{
  foo(argn);
}



>  Just disabling peeling for
> alignment may get you into the versioning for alignment path (and thus
> an unvectorized loop at runtime).

This is not true for targets supporting misaligned access. I have not
seen a case where alignment-driven loop versioning happens on x86.

> Also, it's known that the alignment peeling
> code needs some serious TLC (its outcome depends on the order of DRs,
> and the cost model it uses leaves much to be desired, as we cannot
> distinguish between unaligned load and store costs).

Yet another reason to turn it off as it is not effective anyways?


thanks,

David

Richard Guenther - Sept. 17, 2013, 8:20 a.m.
On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
> char a[10000];
>
> void foo(int n)
> {
>   int* b = (int*)(a+n);
>   int i = 0;
>   for (; i < 1000; ++i)
>     b[i] = 1;
> }
>
> int main(int argn, char** argv)
> {
>   foo(argn);
> }

But that's just a bug that should be fixed (looking into it).

>>  Just disabling peeling for
>> alignment may get you into the versioning for alignment path (and thus
>> an unvectorized loop at runtime).
>
> This is not true for targets supporting misaligned access. I have not
> seen a case where alignment-driven loop versioning happens on x86.
>
>> Also, it's known that the alignment peeling
>> code needs some serious TLC (its outcome depends on the order of DRs,
>> and the cost model it uses leaves much to be desired, as we cannot
>> distinguish between unaligned load and store costs).
>
> Yet another reason to turn it off as it is not effective anyways?

As I said, I'll disable what remains of -ftree-vect-loop-version with the
cost model patch, because it never guarded versioning for aliasing, only
versioning for alignment.

We have to be consistent here - if we add a way to disable peeling for
alignment then we certainly don't want to remove the ability to disable
versioning for alignment, no?

Richard.

Richard Guenther - Sept. 17, 2013, 8:38 a.m.
On Tue, Sep 17, 2013 at 10:20 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>> char a[10000];
>>
>> void foo(int n)
>> {
>>   int* b = (int*)(a+n);
>>   int i = 0;
>>   for (; i < 1000; ++i)
>>     b[i] = 1;
>> }
>>
>> int main(int argn, char** argv)
>> {
>>   foo(argn);
>> }
>
> But that's just a bug that should be fixed (looking into it).

Bug in the testcase.  b[i] asserts that b is aligned to 'int', so this invokes
undefined behavior if peeling cannot reach an alignment of 16.
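
If the intent of such code is only to store int values at an arbitrary byte offset, one commonly used way to avoid the misaligned int access is to go through memcpy. The following is a sketch under that assumption; the function names are hypothetical and it does not claim to match what the original applications do.

```c
#include <stddef.h>
#include <string.h>

/* Store `count` ints with value 1 starting at byte address `dst`,
   which may have any alignment.  memcpy makes no alignment assumption,
   so no misaligned int lvalue is ever formed. */
void fill_ones_unaligned(char *dst, size_t count)
{
    const int one = 1;
    for (size_t i = 0; i < count; ++i)
        memcpy(dst + i * sizeof one, &one, sizeof one);
}

/* Read back one int from a possibly misaligned byte address. */
int load_int_unaligned(const char *src)
{
    int v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

For example, fill_ones_unaligned(a + n, 1000) would cover the same bytes as the loop in foo, given a buffer large enough to hold n + 1000 * sizeof(int) bytes.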

Richard.
Xinliang David Li - Sept. 17, 2013, 3:37 p.m.
On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote:
>> char a[10000];
>>
>> void foo(int n)
>> {
>>   int* b = (int*)(a+n);
>>   int i = 0;
>>   for (; i < 1000; ++i)
>>     b[i] = 1;
>> }
>>
>> int main(int argn, char** argv)
>> {
>>   foo(argn);
>> }
>
> But that's just a bug that should be fixed (looking into it).

This kind of code is not uncommon in certain applications (e.g.,
group varint decoding).  Besides, code like this may be built with
-fno-strict-aliasing.
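
A minimal sketch of why peeling cannot rescue the foo() example quoted
above. It assumes, purely for illustration, that the peel count is
computed in whole elements from the address modulo the vector size;
this is not GCC's actual code, and the names peel_count and
misalign_after_peel are hypothetical. Only integer arithmetic on
addresses is done, so nothing here performs an unaligned access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations to peel so that an
   access becomes vec_bytes-aligned, ASSUMING the address is already a
   multiple of elem_bytes (the assumption that breaks in foo()).  */
static unsigned peel_count(uintptr_t addr, unsigned elem_bytes,
                           unsigned vec_bytes)
{
  assert(elem_bytes != 0 && vec_bytes % elem_bytes == 0);
  unsigned vf = vec_bytes / elem_bytes;   /* elements per vector */
  unsigned elem_off = (unsigned)((addr % vec_bytes) / elem_bytes);
  return (vf - elem_off) % vf;
}

/* Hypothetical helper: misalignment in bytes that remains after
   peeling peel_count() whole elements.  */
static unsigned misalign_after_peel(uintptr_t addr, unsigned elem_bytes,
                                    unsigned vec_bytes)
{
  uintptr_t after = addr
    + (uintptr_t)peel_count(addr, elem_bytes, vec_bytes) * elem_bytes;
  return (unsigned)(after % vec_bytes);
}
```

With 4-byte elements and 16-byte vectors, an address that is a
multiple of 4 always ends up with zero remaining misalignment. An
address such as base+1 keeps its odd offset however many whole ints
are peeled, so an aligned vector store issued after the peeled
iterations would still hit a misaligned address.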


>
>>>  Just disabling peeling for
>>> alignment may get you into the versioning for alignment path (and thus
>>> an unvectorized loop at runtime).
>>
>> This is not true for targets supporting misaligned access. I have not
>> seen a case where alignment-driven loop versioning happens on x86.
>>
>>>Also it's known that the alignment peeling
>>> code needs some serious TLC (its outcome depends on the order of DRs,
>>> the cost model it uses leaves much to be desired as we cannot distinguish
>>> between unaligned load and store costs).
>>
>> Yet another reason to turn it off as it is not effective anyways?
>
> As said I'll disable all remains of -ftree-vect-loop-version with the cost model
> patch because it wasn't guarding versioning for aliasing but only versioning
> for alignment.
>
> We have to be consistent here - if we add a way to disable peeling for
> alignment then we certainly don't want to remove the ability to disable
> versioning for alignment, no?

Yes, for consistency, it may also be useful to keep the versioning control flag.

David

>
> Richard.
>
>>
>> thanks,
>>
>> David
>>
>>>
>>> Richard.
>>>
>>>>>
>>>>> I've stopped a quick try doing 1) myself because
>>>>>
>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>          opts->x_flag_ipa_reference = false;
>>>>>        break;
>>>>>
>>>>> +    case OPT_ftree_vectorize:
>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>> +      break;
>>>>>
>>>>> doesn't look obviously correct.  Does that handle
>>>>>
>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>
>>>>> or
>>>>>
>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>
>>>>> properly?  Currently at least
>>>>>
>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>
>>>>> doesn't "work".
>>>>
>>>>
>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>> passes, but can not re-enable them if they are flipped off before.
>>>>
>>>>>
>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>> for two other options, so its mechanism to contract positives/negatives
>>>>> doesn't work here and the override hooks do not work reliably for
>>>>> repeated options.
>>>>>
>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>
>>>> We should probably just document the behavior. Even better, we should
>>>> deprecate the old option.
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> David
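
The repeated-option question raised above can be explored with a small
model. This is a simplified sketch, not the GCC option machinery: it
assumes that an explicitly given sub-option records itself as "set"
(the role opts_set plays in the patch) and that -ftree-vectorize
behaves like the proposed OPT_ftree_vectorize case. The struct, enum
and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the two flags plus the record of whether the
   user gave each one explicitly (opts_set in the patch).  */
struct vect_flags {
  int loop, slp;            /* current flag values */
  bool loop_set, slp_set;   /* given explicitly on the command line? */
};

enum vect_opt { OPT_VECT, OPT_LOOP_VECT, OPT_SLP_VECT };

/* Apply one option in command-line order; value is 1 for -f... and
   0 for -fno-...  */
static void apply(struct vect_flags *f, enum vect_opt opt, int value)
{
  switch (opt) {
  case OPT_LOOP_VECT:
    f->loop = value;
    f->loop_set = true;
    break;
  case OPT_SLP_VECT:
    f->slp = value;
    f->slp_set = true;
    break;
  case OPT_VECT:
    /* Mirrors the proposed handler: only touch a sub-flag that the
       user has not set explicitly.  */
    if (!f->loop_set)
      f->loop = value;
    if (!f->slp_set)
      f->slp = value;
    break;
  }
}
```

Under this model, "-ftree-vectorize -fno-tree-loop-vectorize
-ftree-vectorize" ends with loop vectorization off and SLP on, while
"-ftree-loop-vectorize -fno-tree-vectorize" leaves loop vectorization
on, because the explicit sub-option wins regardless of order. Whether
that is the intended behavior is the question being asked in the
thread.
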
Jakub Jelinek - Sept. 17, 2013, 3:45 p.m.
On Tue, Sep 17, 2013 at 08:37:57AM -0700, Xinliang David Li wrote:
> >> char a[10000];
> >>
> >> void foo(int n)
> >> {
> >>   int* b = (int*)(a+n);
> >>   int i = 0;
> >>   for (; i < 1000; ++i)
> >>     b[i] = 1;
> >> }
> >>
> >> int main(int argn, char** argv)
> >> {
> >>   foo(argn);
> >> }
> >
> > But that's just a bug that should be fixed (looking into it).
> 
> This kind of code is not uncommon for certain applications (e.g, group
> varint decoding).  Besides, the code like this may be built with

That is irrelevant to the fact that it is invalid.

> -fno-strict-aliasing.

It isn't invalid because of aliasing violations, but because of unaligned
access without saying that it is unaligned (say accessing through
aligned(1) type, or packed struct or similar, or doing memcpy).
On various architectures unaligned accesses don't cause faults, so it
may appear to work, and even on i?86/x86_64 often appears to work, as
long as you aren't trying to vectorize code (which doesn't change anything
about the fact that it is undefined behavior).
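
As an illustration of the memcpy route mentioned above, here is a
minimal sketch of the store loop from foo() written so that no
misaligned int access is ever expressed in the source. The function
names are hypothetical, and this is one possible rewrite, not a
recommendation from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Store the int value 1 into count consecutive int-sized slots
   starting at dst, which may have any alignment.  memcpy tells the
   compiler nothing about alignment, so it may not assume any.  */
static void fill_ones_unaligned(unsigned char *dst, size_t count)
{
  const int one = 1;
  for (size_t i = 0; i < count; ++i)
    memcpy(dst + i * sizeof one, &one, sizeof one);
}

/* Read back one int from a possibly misaligned address.  */
static int load_int_unaligned(const unsigned char *src)
{
  int v;
  memcpy(&v, src, sizeof v);
  return v;
}
```

A call such as fill_ones_unaligned(buf + 1, 4) on a byte buffer is
well defined for any offset, as long as the buffer is large enough.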

	Jakub
Xinliang David Li - Sept. 17, 2013, 4:39 p.m.
On Tue, Sep 17, 2013 at 8:45 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Sep 17, 2013 at 08:37:57AM -0700, Xinliang David Li wrote:
>> >> char a[10000];
>> >>
>> >> void foo(int n)
>> >> {
>> >>   int* b = (int*)(a+n);
>> >>   int i = 0;
>> >>   for (; i < 1000; ++i)
>> >>     b[i] = 1;
>> >> }
>> >>
>> >> int main(int argn, char** argv)
>> >> {
>> >>   foo(argn);
>> >> }
>> >
>> > But that's just a bug that should be fixed (looking into it).
>>
>> This kind of code is not uncommon for certain applications (e.g, group
>> varint decoding).  Besides, the code like this may be built with
>
> That is irrelevant to the fact that it is invalid.
>
>> -fno-strict-aliasing.
>
> It isn't invalid because of aliasing violations, but because of unaligned
> access without saying that it is unaligned (say accessing through
> aligned(1) type, or packed struct or similar, or doing memcpy).
> On various architectures unaligned accesses don't cause faults, so it
> may appear to work, and even on i?86/x86_64 often appears to work, as
> long as you aren't trying to vectorize code (which doesn't change anything
> on the fact that it is undefined behavior).

OK, undefined behavior it is.  By the way, ICC does loop versioning in
this case and therefore has no problem. Clang/LLVM vectorizes it with
neither peeling nor versioning, and it works fine too. For legacy code
like this, GCC is less tolerant.
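
For readers unfamiliar with the term, versioning for alignment can be
sketched by hand roughly as follows. This is an illustrative
source-level analogue, not the code any of the compilers named above
actually emits; the names are hypothetical and the 16-byte boundary is
an assumed vector width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum path { PATH_ALIGNED, PATH_FALLBACK };

/* Two versions of the same loop, selected by a run-time alignment
   check.  The caller must pass storage that may legally be accessed
   as int.  Returns which version ran.  */
static enum path fill_ones_versioned(void *dst, size_t count)
{
  const int one = 1;

  if ((uintptr_t)dst % 16 == 0) {
    /* Version 1: alignment was checked at run time, so aligned
       vector stores would be safe in this copy of the loop.  */
    int *p = dst;
    for (size_t i = 0; i < count; ++i)
      p[i] = one;
    return PATH_ALIGNED;
  }

  /* Version 2: alignment unknown, so make no alignment assumption.  */
  unsigned char *q = dst;
  for (size_t i = 0; i < count; ++i)
    memcpy(q + i * sizeof one, &one, sizeof one);
  return PATH_FALLBACK;
}
```

The cost is code size plus one check per loop entry; the benefit is
that neither copy of the loop relies on an alignment that was never
established.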

thanks,

David

>
>         Jakub

Patch

Index: omp-low.c
===================================================================
--- omp-low.c	(revision 202481)
+++ omp-low.c	(working copy)
@@ -2305,8 +2305,8 @@  omp_max_vf (void)
 {
   if (!optimize
       || optimize_debug
-      || (!flag_tree_vectorize
-	  && global_options_set.x_flag_tree_vectorize))
+      || (!flag_tree_loop_vectorize
+	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
   int vs = targetm.vectorize.autovectorize_vector_sizes ();
@@ -5684,10 +5684,10 @@  expand_omp_simd (struct omp_region *regi
 	  loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
 	  cfun->has_simduid_loops = true;
 	}
-      /* If not -fno-tree-vectorize, hint that we want to vectorize
+      /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
 	 the loop.  */
-      if ((flag_tree_vectorize
-	   || !global_options_set.x_flag_tree_vectorize)
+      if ((flag_tree_loop_vectorize
+	   || !global_options_set.x_flag_tree_loop_vectorize)
 	  && loop->safelen > 1)
 	{
 	  loop->force_vect = true;
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 202481)
+++ ChangeLog	(working copy)
@@ -1,3 +1,24 @@ 
+2013-09-12  Xinliang David Li  <davidxl@google.com>
+
+	* tree-if-conv.c (main_tree_if_conversion): Check new flag.
+	* omp-low.c (omp_max_vf): Ditto.
+	(expand_omp_simd): Ditto.
+	* tree-vectorizer.c (vectorize_loops): Ditto.
+	(gate_vect_slp): Ditto.
+	(gate_increase_alignment): Ditto.
+	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Ditto.
+	* tree-ssa-pre.c (inhibit_phi_insertion): Ditto.
+	* tree-ssa-loop.c (gate_tree_vectorize): Ditto.
+	(gate_tree_vectorize): Name change.
+	(tree_vectorize): Ditto.
+	(pass_vectorize::gate): Call new function.
+	(pass_vectorize::execute): Ditto.
+	* opts.c: O3 default setting change.
+	(finish_options): Check new flag.
+	* doc/invoke.texi: Document new flags.
+	* common.opt: New flags.
+
+
 2013-09-10  Richard Earnshaw  <rearnsha@arm.com>
 
 	PR target/58361
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 202481)
+++ doc/invoke.texi	(working copy)
@@ -419,10 +419,12 @@  Objective-C and Objective-C++ Dialects}.
 -ftree-loop-if-convert-stores -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
+-ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge @gol
--ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
+-ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
+-ftree-vect-loop-version -ftree-vect-loop-peeling -ftree-vectorize @gol
+-ftree-vrp @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
@@ -6748,8 +6750,8 @@  invoking @option{-O2} on programs that u
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-vectorize},
-@option{-fvect-cost-model},
+@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
+@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
 @option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
 
 @item -O0
@@ -6766,7 +6768,7 @@  optimizations designed to reduce code si
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
 -falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
--fprefetch-loop-arrays  -ftree-vect-loop-version}
+-fprefetch-loop-arrays  -ftree-vect-loop-version -ftree-vect-loop-peeling}
 
 @item -Ofast
 @opindex Ofast
@@ -8008,14 +8010,29 @@  higher.
 
 @item -ftree-vectorize
 @opindex ftree-vectorize
+Perform vectorization on trees. This flag enables @option{-ftree-loop-vectorize}
+and @option{-ftree-slp-vectorize} if neither option is explicitly specified.
+
+@item -ftree-loop-vectorize
+@opindex ftree-loop-vectorize
 Perform loop vectorization on trees. This flag is enabled by default at
-@option{-O3}.
+@option{-O3} and when @option{-ftree-vectorize} is enabled.
 
 @item -ftree-slp-vectorize
 @opindex ftree-slp-vectorize
 Perform basic block vectorization on trees. This flag is enabled by default at
 @option{-O3} and when @option{-ftree-vectorize} is enabled.
 
+@item -ftree-vect-loop-peeling
+@opindex ftree-vect-loop-peeling
+Perform loop peeling when doing loop vectorization on trees.  When a loop
+appears to be vectorizable except that data alignment cannot be determined
+at compile time, the loop is peeled to enhance alignment for one or more
+data accesses determined by the compiler.  After loop peeling, those accesses
+become well aligned, so that more efficient SIMD instructions can be used
+for them.  This option is enabled by default except at level @option{-Os},
+where it is disabled.
+
 @item -ftree-vect-loop-version
 @opindex ftree-vect-loop-version
 Perform loop versioning when doing loop vectorization on trees.  When a loop
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c	(revision 202481)
+++ tree-if-conv.c	(working copy)
@@ -1789,7 +1789,7 @@  main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     if (flag_tree_loop_if_convert == 1
 	|| flag_tree_loop_if_convert_stores == 1
-	|| flag_tree_vectorize
+	|| flag_tree_loop_vectorize
 	|| loop->force_vect)
     changed |= tree_if_conversion (loop);
 
@@ -1815,7 +1815,7 @@  main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return (((flag_tree_vectorize || cfun->has_force_vect_loops)
+  return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
 	   && flag_tree_loop_if_convert != 0)
 	  || flag_tree_loop_if_convert == 1
 	  || flag_tree_loop_if_convert_stores == 1);
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c	(revision 202481)
+++ tree-vect-data-refs.c	(working copy)
@@ -1404,7 +1404,9 @@  vect_enhance_data_refs_alignment (loop_v
 	continue;
 
       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
-      do_peeling = vector_alignment_reachable_p (dr);
+      do_peeling = (flag_tree_vect_loop_peeling
+	            && optimize_loop_nest_for_speed_p (loop)
+                    && vector_alignment_reachable_p (dr));
       if (do_peeling)
         {
           if (known_alignment_for_access_p (dr))
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 202481)
+++ tree-ssa-pre.c	(working copy)
@@ -3026,7 +3026,7 @@  inhibit_phi_insertion (basic_block bb, p
   unsigned i;
 
   /* If we aren't going to vectorize we don't inhibit anything.  */
-  if (!flag_tree_vectorize)
+  if (!flag_tree_loop_vectorize)
     return false;
 
   /* Otherwise we inhibit the insertion when the address of the
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 202481)
+++ tree-vectorizer.c	(working copy)
@@ -341,7 +341,7 @@  vectorize_loops (void)
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
-    if ((flag_tree_vectorize && optimize_loop_nest_for_speed_p (loop))
+    if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
 	|| loop->force_vect)
       {
 	loop_vec_info loop_vinfo;
@@ -486,10 +486,7 @@  execute_vect_slp (void)
 static bool
 gate_vect_slp (void)
 {
-  /* Apply SLP either if the vectorizer is on and the user didn't specify
-     whether to run SLP or not, or if the SLP flag was set by the user.  */
-  return ((flag_tree_vectorize != 0 && flag_tree_slp_vectorize != 0)
-          || flag_tree_slp_vectorize == 1);
+  return flag_tree_slp_vectorize != 0;
 }
 
 namespace {
@@ -579,7 +576,7 @@  increase_alignment (void)
 static bool
 gate_increase_alignment (void)
 {
-  return flag_section_anchors && flag_tree_vectorize;
+  return flag_section_anchors && flag_tree_loop_vectorize;
 }
 
 
Index: tree-ssa-loop.c
===================================================================
--- tree-ssa-loop.c	(revision 202481)
+++ tree-ssa-loop.c	(working copy)
@@ -303,7 +303,7 @@  make_pass_predcom (gcc::context *ctxt)
 /* Loop autovectorization.  */
 
 static unsigned int
-tree_vectorize (void)
+tree_loop_vectorize (void)
 {
   if (number_of_loops (cfun) <= 1)
     return 0;
@@ -312,9 +312,9 @@  tree_vectorize (void)
 }
 
 static bool
-gate_tree_vectorize (void)
+gate_tree_loop_vectorize (void)
 {
-  return flag_tree_vectorize || cfun->has_force_vect_loops;
+  return flag_tree_loop_vectorize || cfun->has_force_vect_loops;
 }
 
 namespace {
@@ -342,8 +342,8 @@  public:
   {}
 
   /* opt_pass methods: */
-  bool gate () { return gate_tree_vectorize (); }
-  unsigned int execute () { return tree_vectorize (); }
+  bool gate () { return gate_tree_loop_vectorize (); }
+  unsigned int execute () { return tree_loop_vectorize (); }
 
 }; // class pass_vectorize
 
Index: opts.c
===================================================================
--- opts.c	(revision 202481)
+++ opts.c	(working copy)
@@ -498,7 +498,8 @@  static const struct default_options defa
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
@@ -826,7 +827,8 @@  finish_options (struct gcc_options *opts
 
   /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
      is disabled.  */
-  if (!opts->x_flag_tree_vectorize || !opts->x_flag_tree_loop_if_convert)
+  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
+       || !opts->x_flag_tree_loop_if_convert)
     maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
                            opts->x_param_values, opts_set->x_param_values);
 
@@ -1660,8 +1662,10 @@  common_handle_option (struct gcc_options
 	opts->x_flag_unswitch_loops = value;
       if (!opts_set->x_flag_gcse_after_reload)
 	opts->x_flag_gcse_after_reload = value;
-      if (!opts_set->x_flag_tree_vectorize)
-	opts->x_flag_tree_vectorize = value;
+      if (!opts_set->x_flag_tree_loop_vectorize)
+	opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+	opts->x_flag_tree_slp_vectorize = value;
       if (!opts_set->x_flag_vect_cost_model)
 	opts->x_flag_vect_cost_model = value;
       if (!opts_set->x_flag_tree_loop_distribute_patterns)
@@ -1691,6 +1695,12 @@  common_handle_option (struct gcc_options
         opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_ftree_vectorize:
+      if (!opts_set->x_flag_tree_loop_vectorize)
+	opts->x_flag_tree_loop_vectorize = value;
+      if (!opts_set->x_flag_tree_slp_vectorize)
+	opts->x_flag_tree_slp_vectorize = value;
+      break;
     case OPT_fshow_column:
       dc->show_column = value;
       break;
Index: common.opt
===================================================================
--- common.opt	(revision 202481)
+++ common.opt	(working copy)
@@ -2263,15 +2263,19 @@  Common Report Var(flag_var_tracking_unin
 Perform variable tracking and also tag variables that are uninitialized
 
 ftree-vectorize
-Common Report Var(flag_tree_vectorize) Optimization
-Enable loop vectorization on trees
+Common Report Optimization
+Enable vectorization on trees
 
 ftree-vectorizer-verbose=
 Common RejectNegative Joined UInteger Var(common_deferred_options) Defer
 -ftree-vectorizer-verbose=<number>	This switch is deprecated. Use -fopt-info instead.
 
+ftree-loop-vectorize
+Common Report Var(flag_tree_loop_vectorize) Optimization
+Enable loop vectorization on trees
+
 ftree-slp-vectorize
-Common Report Var(flag_tree_slp_vectorize) Init(2) Optimization
+Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees
 
 fvect-cost-model
@@ -2282,6 +2286,10 @@  ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
 
+ftree-vect-loop-peeling
+Common Report Var(flag_tree_vect_loop_peeling) Init(1) Optimization
+Enable loop peeling to enhance alignment when doing loop vectorization on trees
+
 ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.