Vector Comparison patch

Message ID CABYV9SUWj2Fbp4OYtexdFz_oFrfTEjU5mn-rTAgo652gy1DfOw@mail.gmail.com
State New

Commit Message

Artem Shinkarov Aug. 22, 2011, 9:11 p.m. UTC
I'll just send you my current version. I'll be a little bit more specific.

The problem starts when you try to lower the following expression:

x = a > b;
x1 = vcond <x != 0, -1, 0>
vcond <x1, c, d>

Now, you go from the beginning to the end of the block, and you cannot
leave a > b, because only vconds are valid expressions to expand.

Now, you meet a > b first. You try to transform it into vcond <a > b,
-1, 0>, you build this expression, then you try to gimplify it, and
you see that you have something like:

x' = a > b;
x = vcond <x', -1, 0>
x1 = vcond <x != 0, -1, 0>
vcond <x1, c, d>

and your gsi stands at the x1 now, so the gimplification created a
comparison that optab would not understand. And I am not really sure
that you would be able to solve this problem easily.

It would help if you could create vcond <x, op, y, op0, op1>, but you
can't: x op y is a single tree that must be gimplified, and I am not
sure that you can persuade the gimplifier to leave this expression
untouched.
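
For reference, the intended meaning of the statements above can be written
down as a scalar model. This is only an illustrative sketch in plain C, not
code from the patch: the names model_cmp_gt, model_vcond_mask and
model_vcond_gt are made up here, and the mask form is assumed to use the
bitwise-selection semantics discussed in this thread. Under those
assumptions, selecting through the materialized mask gives the same lanes as
a vcond with the comparison embedded.

```c
#include <stddef.h>

/* Hypothetical scalar model of x = a > b on vectors:
   all-ones (-1) in lanes where the comparison holds, 0 elsewhere.  */
void
model_cmp_gt (const int *a, const int *b, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] > b[i] ? -1 : 0;
}

/* Hypothetical scalar model of vcond <mask, c, d> with bitwise
   selection: each result bit comes from c where the mask bit is
   set and from d where it is clear.  */
void
model_vcond_mask (const int *mask, const int *c, const int *d,
                  int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (mask[i] & c[i]) | (~mask[i] & d[i]);
}

/* Hypothetical scalar model of vcond <a > b, c, d> with the
   comparison embedded in the first operand.  */
void
model_vcond_gt (const int *a, const int *b, const int *c, const int *d,
                int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] > b[i] ? c[i] : d[i];
}
```

Feeding the result of model_cmp_gt into model_vcond_mask should give the
same lanes as model_vcond_gt, because a comparison result is already all
ones or all zeros in every lane.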

The current version of the patch is attached.


Thanks,
Artem.


On Mon, Aug 22, 2011 at 9:58 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>> Richard
>>>>>>>>>>>>
>>>>>>>>>>>> I formalized an approach a little-bit, now it works without target
>>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> So how does it work.
>>>>>>>>>>>> 1) All the vector comparisons at the level of the type-checker are
>>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>>>> comparison, but use the mask as if it were the result of some comparison.
>>>>>>>>>>>>
>>>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>>>>
>>>>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>>>>> mask != {0}.
>>>>>>>>>>>>
>>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>>>>> But first, is it conceptually fine?
>>>>>>>>>>>>
>>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>>> complicated.
>>>>>>>>>>>
>>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>>
>>>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>>>> duplication.
>>>>>>>>>>
>>>>>>>>>> Like:
>>>>>>>>>> mask = a > b;
>>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>>
>>>>>>>>>> Which in this case would be different from
>>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>>
>>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>>
>>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>>
>>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>>> I really need a way to pass the information that the expression
>>>>>>>>>> in the first operand of vcond is an appropriate mask. I thought
>>>>>>>>>> for quite a while and this hack is the only answer I found. Is there a
>>>>>>>>>> better way to do that? I could for example introduce another
>>>>>>>>>> tree-node, but it would be overkill as well.
>>>>>>>>>>
>>>>>>>>>> Now why do I need it so much:
>>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>>>> tree-vect-generic.
>>>>>>>>>>
>>>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>>>> mask.
>>>>>>>>>>
>>>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>>>> apart. Because every time the comparison is taken out of VEC_COND_EXPR, I
>>>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>>
>>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>>
>>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>>> vec1 : vec2.
>>>>>>>>
>>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>>
>>>>>>>>> This comparison can be eliminated by optimization passes that
>>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>>> propagate the information that this mask is already {-1,...} / {0,....}
>>>>>>>>> by simply dropping the comparison against zero.
>>>>>>>>
>>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>>> so no optimization is needed in this part.
>>>>>>>
>>>>>>> I mean for
>>>>>>>
>>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>>
>>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>>> it by v1 < v2.
>>>>>>
>>>>>> Yes, sure.
>>>>>>
>>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>>> do not have to agree).
>>>>>>>>
>>>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>>>>> understand right?
>>>>>>>
>>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>>
>>>>>>> But I have the feeling we are talking past each other ...?
>>>>>>
>>>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>>>> something that I rely on at the moment. What I don't want to have is
>>>>>> the same behaviour in the frontend. So if we can guarantee that we
>>>>>> add != 0, when we don't know the "nature" of the mask, then I am
>>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>>
>>>>> Well, the C frontend would simply always add that != 0 (because it
>>>>> doesn't know).
>>>>>
>>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>>
>>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>>
>>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>>
>>>>>> I didn't quite get that, could you give an example?
>>>>>
>>>>> It was a larger variant of "no, apart from what is obvious".
>>>>
>>>> Ha, joking again. :)
>>>>
>>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>>
>>>>>>>> Well, take simpler example
>>>>>>>>
>>>>>>>> a = {0};
>>>>>>>> for ( ; *p; p += 16)
>>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>>
>>>>>>>> res = a ? v0 : v1;
>>>>>>>>
>>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>>
>>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>>> vector contents though).
>>>>>>
>>>>>> Yeah, sure. My point is that we must be able to pass this information
>>>>>> to the backend: we checked everything, and we are sure that a is
>>>>>> a correct mask, please don't add any != 0 to it.
>>>>>
>>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>>> that needs to be careful to impose its stricter semantics.
>>>>>
>>>>> Richard.
>>>>>
>>>>
>>>> Ok, I see the last difference in the approaches we envision.
>>>> I am assuming that the frontend does not put != 0, but the later
>>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>>>> and provide the same functionality as you describe. So the philosophical
>>>> question is why it is better to first add and then remove, rather than
>>>> just add if needed?
>>>
>>> Well, it's "better be right than sorry".  Thus, default to the
>>> conservatively correct way and let optimizers "optimize" it.
>>
>> How can we get sorry, it is impossible to skip the vcond during the
>> optimisation, but whatever, it is not really so important when to add.
>> Currently I have a bigger problem, see below.
>>>
>>>> In all the rest I think we agreed.
>>>
>>> Fine.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Artem.
>>>>
>>>
>>
>> I found out that I cannot really gimplify correctly the vcond <a > b,
>> c, d> expression when a > b is vcond <a > b, -1, 0>. The problem is
>> that the gimplifier always pulls a > b out as a separate expression during
>> gimplification, and I don't think that we can avoid it. So what
>> happens is:
>>
>> vcond <a > b , c , d>
>> transformed to
>> x = a > b;
>> x1 = vcond <x , -1, 0>
>> vcond <x1, c, d>
>>
>> and so on, infinitely long.
>
> Sounds like a bug that is possible to fix.
>
>> In order to fix the problem we need either to introduce a new code
>> like VEC_COMP_LT, VEC_COMP_GT, and so on,
>> or a builtin function which we would lower,
>> or to go back to the idea of a hook.
>>
>> Anyway, representing a > b using vcond does not work.
>
> Well, sure it will work, it just needs some work apparently.
>
>> What would be your thinking here?
>
> Do you have a patch that exposes this problem?  I can have a look
> tomorrow.
>
> Richard.
>
>>
>> Thanks,
>> Artem.
>>
>

Comments

Richard Biener Aug. 23, 2011, 8:17 a.m. UTC | #1
On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> I'll just send you my current version. I'll be a little bit more specific.
>
> The problem starts when you try to lower the following expression:
>
> x = a > b;
> x1 = vcond <x != 0, -1, 0>
> vcond <x1, c, d>
>
> Now, you go from the beginning to the end of the block, and you cannot
> leave a > b, because only vconds are valid expressions to expand.
>
> Now, you meet a > b first. You try to transform it into vcond <a > b,
> -1, 0>, you build this expression, then you try to gimplify it, and
> you see that you have something like:
>
> x' = a > b;
> x = vcond <x', -1, 0>
> x1 = vcond <x != 0, -1, 0>
> vcond <x1, c, d>
>
> and your gsi stands at the x1 now, so the gimplification created a
> comparison that optab would not understand. And I am not really sure
> that you would be able to solve this problem easily.
>
> It would help if you could create vcond <x, op, y, op0, op1>, but you
> can't: x op y is a single tree that must be gimplified, and I am not
> sure that you can persuade the gimplifier to leave this expression
> untouched.
>
> The current version of the patch is attached.

I can't reproduce it with your patch.  For

#define vector(elcount, type)  \
    __attribute__((vector_size((elcount)*sizeof(type)))) type

vector (4, float) x, y;
vector (4, int) a,b;
int
main (int argc, char *argv[])
{
  vector (4, int) i0 = x < y;
  vector (4, int) i1 = i0 ? a : b;
  return 0;
}

I get from the C frontend:

  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> ,
                      { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;
  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
                      SAVE_EXPR <b> > ;

but I would have expected i0 != 0 in the second VEC_COND_EXPR.
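
To illustrate why the != 0 matters, here is a minimal sketch using GCC's
generic vector extensions. It is not code from the patch: the helper names
normalize_mask and bitwise_select are hypothetical, a 4-element int vector
is assumed, and the compiler is assumed to support vector comparisons in C
(the feature this patch series is adding). With bitwise selection, an
arbitrary mask such as {1, 5, 0, -1} picks individual bits, while the same
mask compared against zero selects whole elements.

```c
/* Assumed vector type for this sketch: four ints.  */
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Hypothetical helper: turn an arbitrary mask into a proper
   comparison result, -1 in every non-zero lane and 0 elsewhere.
   This is what mask != {0} computes.  */
v4si
normalize_mask (v4si mask)
{
  const v4si zero = { 0, 0, 0, 0 };
  return mask != zero;
}

/* Hypothetical helper: bitwise selection, taking bits of v0 where
   the mask bit is set and bits of v1 where it is clear.  */
v4si
bitwise_select (v4si mask, v4si v0, v4si v1)
{
  return (mask & v0) | (~mask & v1);
}
```

With mask = {1, 5, 0, -1}, v0 = {10, 20, 30, 40} and
v1 = {-10, -20, -30, -40}, selecting through normalize_mask (mask) gives
{10, 20, -30, 40}, whereas selecting through the raw mask gives -10 in the
first lane, because only bit 0 of that lane is taken from v0.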

I do see that the gimplifier pulls away the condition for the first
VEC_COND_EXPR though:

  x.0 = x;
  y.1 = y;
  D.2735 = x.0 < y.1;
  D.2734 = D.2735;
  D.2736 = D.2734;
  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
                        { 0, 0, 0, 0 } > ;

which is, I believe, because of the SAVE_EXPR wrapped around the
comparison.  Why do you bother wrapping all operands in save-exprs?

With that the

  /* Currently the expansion of VEC_COND_EXPR does not allow
     expressions where the type of vectors you compare differs
     from the type of vectors you select from. For the time
     being we insert implicit conversions.  */
  if ((COMPARISON_CLASS_P (ifexp)
       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
      || TREE_TYPE (ifexp) != TREE_TYPE (op1))

checks will fail (because ifexp is a SAVE_EXPR).

I run into errors when not adding the SAVE_EXPR around the ifexp;
the transform into x < y ? {-1,...} : {0,...} is not happening.

Artem Shinkarov Aug. 23, 2011, 9:44 a.m. UTC | #2
On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> I'll just send you my current version. I'll be a little bit more specific.
>>
>> The problem starts when you try to lower the following expression:
>>
>> x = a > b;
>> x1 = vcond <x != 0, -1, 0>
>> vcond <x1, c, d>
>>
>> Now, you go from the beginning to the end of the block, and you cannot
>> leave a > b, because only vconds are valid expressions to expand.
>>
>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>> -1, 0>, you build this expression, then you try to gimplify it, and
>> you see that you have something like:
>>
>> x' = a > b;
>> x = vcond <x', -1, 0>
>> x1 = vcond <x != 0, -1, 0>
>> vcond <x1, c, d>
>>
>> and your gsi stands at the x1 now, so the gimplification created a
>> comparison that optab would not understand. And I am not really sure
>> that you would be able to solve this problem easily.
>>
>> It would help if you could create vcond <x, op, y, op0, op1>, but you
>> can't: x op y is a single tree that must be gimplified, and I am not
>> sure that you can persuade the gimplifier to leave this expression
>> untouched.
>>
>> In the attachment the current version of the patch.
>
> I can't reproduce it with your patch.  For
>
> #define vector(elcount, type)  \
>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> vector (4, float) x, y;
> vector (4, int) a,b;
> int
> main (int argc, char *argv[])
> {
>  vector (4, int) i0 = x < y;
>  vector (4, int) i1 = i0 ? a : b;
>  return 0;
> }
>
> I get from the C frontend:
>
>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
> -1, -1 } , { 0, 0, 0, 0 } > ;
>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
> SAVE_EXPR <b> > ;
>
> but I have expected i0 != 0 in the second VEC_COND_EXPR.

I don't put it there. This patch adds != 0, rather than removing it. But
this could be changed.

> I do see that the gimplifier pulls away the condition for the first
> VEC_COND_EXPR though:
>
>  x.0 = x;
>  y.1 = y;
>  D.2735 = x.0 < y.1;
>  D.2734 = D.2735;
>  D.2736 = D.2734;
>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
> { 0, 0, 0, 0 } > ;
>
> which is, I believe because of the SAVE_EXPR wrapped around the
> comparison.  Why do you bother wrapping all operands in save-exprs?

I bother because they could be MAYBE_CONST which breaks the
gimplifier. But I don't really know if you can do it better. I can
always do this checking on operands of constructed vcond...

You are right that if you just put a comparison of variables there
then we are fine. My point is that whenever the gimplifier pulls the
comparison out of the first operand and replaces it with a variable,
we are screwed, because there is no chance to put it back. That is
exactly what happens in expand_vector_comparison if you uncomment the
replacement -- the comparison is always represented as x = a > b.

> With that the
>
>  /* Currently the expansion of VEC_COND_EXPR does not allow
>     expressions where the type of vectors you compare differs
>     from the type of vectors you select from. For the time
>     being we insert implicit conversions.  */
>  if ((COMPARISON_CLASS_P (ifexp)
>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>
> checks will fail (because ifexp is a SAVE_EXPR).
>
> I'll run into errors when not adding the SAVE_EXPR around the ifexp,
> the transform into x < y ? {-1,...} : {0,...} is not happening.
>>
>> Thanks,
>> Artem.
>>
>>
>> On Mon, Aug 22, 2011 at 9:58 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>>>> Richard
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I formalized an approach a little-bit, now it works without target
>>>>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So how does it work.
>>>>>>>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>>>>>>> mask != {0}.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>>>>>>> But first is it conceptually fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>>>>> complicated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>>>>>> duplication.
>>>>>>>>>>>>
>>>>>>>>>>>> Like:
>>>>>>>>>>>> mask = a > b;
>>>>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>>>>
>>>>>>>>>>>> Which in this case would be different from
>>>>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>>>>
>>>>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>>>>
>>>>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>>>>> I really need the way, to pass the information, that the expression
>>>>>>>>>>>> that is in the first operand of vcond is an appropriate mask. I thought
>>>>>>>>>>>> for quite a while and this hack is the only answer I found. Is there a
>>>>>>>>>>>> better way to do that? I could for example introduce another
>>>>>>>>>>>> tree-node, but it would be overkill as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Now why do I need it so much:
>>>>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>>>>>> tree-vect-generic.
>>>>>>>>>>>>
>>>>>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>>>>>> mask.
>>>>>>>>>>>>
>>>>>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
>>>>>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>>>>
>>>>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>>>>
>>>>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>>>>> vec1 : vec2.
>>>>>>>>>>
>>>>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>>>>
>>>>>>>>>>> This comparison can be eliminated by optimization passes
>>>>>>>>>>> that
>>>>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>>>>>>>> dropping the comparison against zero.
>>>>>>>>>>
>>>>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>>>>> so no optimization is needed in this part.
>>>>>>>>>
>>>>>>>>> I mean for
>>>>>>>>>
>>>>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>>>>
>>>>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>>>>> it by v1 < v2.
>>>>>>>>
>>>>>>>> Yes, sure.
>>>>>>>>
>>>>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>>>>> do not have to agree).
>>>>>>>>>>
>>>>>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>>>>>>> understand right?
>>>>>>>>>
>>>>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>>>>
>>>>>>>>> But I have the feeling we are talking past each other ...?
>>>>>>>>
>>>>>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>>>>>> something that I rely on at the moment. What I don't want to have is
>>>>>>>> the same behaviour in the frontend. So If we can guarantee, that we
>>>>>>>> add != 0, when we don't know the "nature" of the mask, then I am
>>>>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>>>>
>>>>>>> Well, the C frontend would simply always add that != 0 (because it
>>>>>>> doesn't know).
>>>>>>>
>>>>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>>>>
>>>>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>>>>
>>>>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>>>>
>>>>>>>> I didn't quite get that, could you give an example?
>>>>>>>
>>>>>>> It was a larger variant of "no, apart from what is obvious".
>>>>>>
>>>>>> Ha, joking again. :)
>>>>>>
>>>>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>>>>
>>>>>>>>>> Well, take simpler example
>>>>>>>>>>
>>>>>>>>>> a = {0};
>>>>>>>>>> for ( ; *p; p += 16)
>>>>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>>>>
>>>>>>>>>> res = a ? v0 : v1;
>>>>>>>>>>
>>>>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>>>>
>>>>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>>>>> vector contents though).
>>>>>>>>
>>>>>>>> Yeah, sure. My point is, that we must be able to pass this information
>>>>>>>> in the backend, that we checked everything, and we are sure that a is
>>>>>>>> a correct mask, please don't add any != 0 to it.
>>>>>>>
>>>>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>>>>> that needs to be careful to impose its stricter semantics.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>
>>>>>> Ok, I see the last difference in the approaches we envision.
>>>>>> I am assuming, that frontend does not put != 0, but the later
>>>>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>>>>>> and does the same functionality as you describe. So the philosophical
>>>>>> question why it is better to first add and then remove, rather than
>>>>>> just add if needed?
>>>>>
>>>>> Well, it's "better be right than sorry".  Thus, default to the
>>>>> conservatively correct
>>>>> way and let optimizers "optimize" it.
>>>>
>>>> How can we get sorry, it is impossible to skip the vcond during the
>>>> optimisation, but whatever, it is not really so important when to add.
>>>> Currently I have a bigger problem, see below.
>>>>>
>>>>>> In all the rest I think we agreed.
>>>>>
>>>>> Fine.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Artem.
>>>>>>
>>>>>
>>>>
>>>> I found out that I cannot really gimplify correctly the vcond<a >b ,
>>>> c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is
>>>> that gimplifier pulls a > b always as a separate expression during the
>>>> gimplification, and I don't think that we can avoid it. So what
>>>> happens is:
>>>>
>>>> vcond <a > b , c , d>
>>>> transformed to
>>>> x = a > b;
>>>> x1 = vcond <x , -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> and so on, infinitely long.
>>>
>>> Sounds like a bug that is possible to fix.
>>>
>>>> In order to fix the problem we need either to introduce new codes
>>>> like VEC_COMP_LT, VEC_COMP_GT, and so on,
>>>> or a builtin function which we would lower,
>>>> or to go back to the idea of a hook.
>>>>
>>>> Anyway, representing a >b using vcond does not work.
>>>
>>> Well, sure it will work, it just needs some work apparently.
>>>
>>>> What would be your thinking here?
>>>
>>> Do you have a patch that exposes this problem?  I can have a look
>>> tomorrow.
>>>
>>> Richard.
>>>
>>>>
>>>> Thanks,
>>>> Artem.
>>>>
>>>
>>
>
Richard Biener Aug. 23, 2011, 10:08 a.m. UTC | #3
On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> I'll just send you my current version. I'll be a little bit more specific.
>>>
>>> The problem starts when you try to lower the following expression:
>>>
>>> x = a > b;
>>> x1 = vcond <x != 0, -1, 0>
>>> vcond <x1, c, d>
>>>
>>> Now, you go from the beginning to the end of the block, and you cannot
>>> leave a > b, because only vconds are valid expressions to expand.
>>>
>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>> you see that you have something like:
>>>
>>> x' = a >b;
>>> x = vcond <x', -1, 0>
>>> x1 = vcond <x != 0, -1, 0>
>>> vcond <x1, c, d>
>>>
>>> and your gsi stands at the x1 now, so the gimplification created a
>>> comparison that optab would not understand. And I am not really sure
>>> that you would be able to solve this problem easily.
>>>
>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>> can't, and x op y is a single tree that must be gimplified, and I am not
>>> sure that you can persuade gimplifier to leave this expression
>>> untouched.
>>>
>>> In the attachment the current version of the patch.
>>
>> I can't reproduce it with your patch.  For
>>
>> #define vector(elcount, type)  \
>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>
>> vector (4, float) x, y;
>> vector (4, int) a,b;
>> int
>> main (int argc, char *argv[])
>> {
>>  vector (4, int) i0 = x < y;
>>  vector (4, int) i1 = i0 ? a : b;
>>  return 0;
>> }
>>
>> I get from the C frontend:
>>
>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>> SAVE_EXPR <b> > ;
>>
>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>
> I don't put it there. This patch adds != 0, rather removing. But this
> could be changed.

?

>> I do see that the gimplifier pulls away the condition for the first
>> VEC_COND_EXPR though:
>>
>>  x.0 = x;
>>  y.1 = y;
>>  D.2735 = x.0 < y.1;
>>  D.2734 = D.2735;
>>  D.2736 = D.2734;
>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>> { 0, 0, 0, 0 } > ;
>>
>> which is, I believe because of the SAVE_EXPR wrapped around the
>> comparison.  Why do you bother wrapping all operands in save-exprs?
>
> I bother because they could be MAYBE_CONST which breaks the
> gimplifier. But I don't really know if you can do it better. I can
> always do this checking on operands of constructed vcond...

Err, the patch does

+  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+  tmp = c_fully_fold (ifexp, false, &maybe_const);
+  ifexp = save_expr (tmp);
+  wrap &= maybe_const;

why is

  ifexp = save_expr (tmp);

necessary here?  SAVE_EXPR is if you need to protect side-effects
from being evaluated twice if you use an operand twice.  But all
operands are just used a single time.
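
As a rough C-level analogy (not GCC internals; next_value, SQUARE_TWICE
and square_saved are made-up names for illustration), this is the kind
of double evaluation a SAVE_EXPR guards against when an operand is used
twice:

```c
/* Counts how many times the operand was evaluated. */
int calls = 0;

/* An operand with a side effect. */
int next_value(void)
{
    return ++calls;
}

/* Uses its operand twice, so the side effect happens twice. */
#define SQUARE_TWICE(e) ((e) * (e))

/* Evaluates the operand once into a parameter and reuses the value;
   this is roughly the job SAVE_EXPR does for a tree operand. */
int square_saved(int e)
{
    return e * e;
}
```

SQUARE_TWICE (next_value ()) bumps calls by two, while
square_saved (next_value ()) bumps it by one. When every operand is
used only once, there is nothing of this kind to protect.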

And I expected, instead of

+  if ((COMPARISON_CLASS_P (ifexp)
+       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
+      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
+    {
+      tree comp_type = COMPARISON_CLASS_P (ifexp)
+                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+                      : TREE_TYPE (ifexp);
+
+      op1 = convert (comp_type, op1);
+      op2 = convert (comp_type, op2);
+      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+      vcond = convert (TREE_TYPE (op1), vcond);
+    }
+  else
+    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);

  if (!COMPARISON_CLASS_P (ifexp))
    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
                         build_vector_from_val (TREE_TYPE (ifexp), 0));

  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
    {
...

> You are right, that if you just put a comparison of variables there
> then we are fine. My point is that whenever gimplifier is pulling out
> the comparison from the first operand, replacing it with the variable,
> then we are screwed, because there is no chance to put it back, and
> that is exactly what happens in expand_vector_comparison, if you
> uncomment the replacement -- comparison is always represented as
> x = a > b.
>
>> With that the
>>
>>  /* Currently the expansion of VEC_COND_EXPR does not allow
>>     expressions where the type of vectors you compare differs
>>     from the type of vectors you select from. For the time
>>     being we insert implicit conversions.  */
>>  if ((COMPARISON_CLASS_P (ifexp)
>>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>
>> checks will fail (because ifexp is a SAVE_EXPR).
>>
>> I'll run into errors when not adding the SAVE_EXPR around the ifexp,
>> the transform into x < y ? {-1,...} : {0,...} is not happening.
Artem Shinkarov Aug. 23, 2011, 10:24 a.m. UTC | #4
On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>
>>>> The problem starts when you try to lower the following expression:
>>>>
>>>> x = a > b;
>>>> x1 = vcond <x != 0, -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>> leave a > b, because only vconds are valid expressions to expand.
>>>>
>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>> you see that you have something like:
>>>>
>>>> x' = a >b;
>>>> x = vcond <x', -1, 0>
>>>> x1 = vcond <x != 0, -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> and your gsi stands at the x1 now, so the gimplification created a
>>>> comparison that optab would not understand. And I am not really sure
>>>> that you would be able to solve this problem easily.
>>>>
>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>> can't, and x op y is a single tree that must be gimplified, and I am not
>>>> sure that you can persuade gimplifier to leave this expression
>>>> untouched.
>>>>
>>>> In the attachment the current version of the patch.
>>>
>>> I can't reproduce it with your patch.  For
>>>
>>> #define vector(elcount, type)  \
>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>
>>> vector (4, float) x, y;
>>> vector (4, int) a,b;
>>> int
>>> main (int argc, char *argv[])
>>> {
>>>  vector (4, int) i0 = x < y;
>>>  vector (4, int) i1 = i0 ? a : b;
>>>  return 0;
>>> }
>>>
>>> I get from the C frontend:
>>>
>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>> SAVE_EXPR <b> > ;
>>>
>>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>>
>> I don't put it there. This patch adds != 0, rather removing. But this
>> could be changed.
>
> ?
>
>>> I do see that the gimplifier pulls away the condition for the first
>>> VEC_COND_EXPR though:
>>>
>>>  x.0 = x;
>>>  y.1 = y;
>>>  D.2735 = x.0 < y.1;
>>>  D.2734 = D.2735;
>>>  D.2736 = D.2734;
>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>> { 0, 0, 0, 0 } > ;
>>>
>>> which is, I believe because of the SAVE_EXPR wrapped around the
>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>
>> I bother because they could be MAYBE_CONST which breaks the
>> gimplifier. But I don't really know if you can do it better. I can
>> always do this checking on operands of constructed vcond...
>
> Err, the patch does
>
> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
> +  ifexp = save_expr (tmp);
> +  wrap &= maybe_const;
>
> why is
>
>  ifexp = save_expr (tmp);
>
> necessary here?  SAVE_EXPR is if you need to protect side-effects
> from being evaluated twice if you use an operand twice.  But all
> operands are just used a single time.

Again, the only reason save_expr is there is to keep MAYBE_CONST
nodes from breaking the gimplification. Maybe it is the wrong way of
doing it, but it does the job.

> And I expected, instead of
>
> +  if ((COMPARISON_CLASS_P (ifexp)
> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
> +    {
> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
> +                      : TREE_TYPE (ifexp);
> +
> +      op1 = convert (comp_type, op1);
> +      op2 = convert (comp_type, op2);
> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
> +      vcond = convert (TREE_TYPE (op1), vcond);
> +    }
> +  else
> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>
>  if (!COMPARISON_CLASS_P (ifexp))
>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>
>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>    {
> ...
>
Why?
This is a function to construct any vcond. The result of ifexp is
always a signed integer vector if it is a comparison, but we need to
make sure that all the elements of vcond have the same type.
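
A small sketch of the type mismatch at the source level, assuming a
compiler that implements comparisons in the GCC vector extension (the
typedef and function names are only for illustration):

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Comparing two float vectors gives a signed integer vector of the
   same shape: -1 in lanes where the comparison holds, 0 elsewhere.
   So the mask type (int) differs from the compared type (float). */
v4si less_than(v4sf x, v4sf y)
{
    return x < y;
}
```

Here the comparison operands are float vectors while the result, and
anything selected with it in the earlier test case, are int vectors.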

And I didn't really understand whether we can guarantee that a vector
comparison will not be lifted out by the gimplifier. It happens when I
add this save_expr, and it could possibly happen in some other cases.
How can we prevent that?


Artem.

>> You are right, that if you just put a comparison of variables there
>> then we are fine. My point is that whenever gimplifier is pulling out
>> the comparison from the first operand, replacing it with the variable,
>> then we are screwed, because there is no chance to put it back, and
>> that is exactly what happens in expand_vector_comparison, if you
>> uncomment the replacement -- comparison is always represented as
>> x = a > b.
>>
>>> With that the
>>>
>>>  /* Currently the expansion of VEC_COND_EXPR does not allow
>>>     expressions where the type of vectors you compare differs
>>>     from the type of vectors you select from. For the time
>>>     being we insert implicit conversions.  */
>>>  if ((COMPARISON_CLASS_P (ifexp)
>>>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>
>>> checks will fail (because ifexp is a SAVE_EXPR).
>>>
>>> I'll run into errors when not adding the SAVE_EXPR around the ifexp,
>>> the transform into x < y ? {-1,...} : {0,...} is not happening.
>
Richard Biener Aug. 23, 2011, 10:33 a.m. UTC | #5
On Tue, Aug 23, 2011 at 12:24 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>>
>>>>> The problem starts when you try to lower the following expression:
>>>>>
>>>>> x = a > b;
>>>>> x1 = vcond <x != 0, -1, 0>
>>>>> vcond <x1, c, d>
>>>>>
>>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>>> leave a > b, because only vconds are valid expressions to expand.
>>>>>
>>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>>> you see that you have something like:
>>>>>
>>>>> x' = a >b;
>>>>> x = vcond <x', -1, 0>
>>>>> x1 = vcond <x != 0, -1, 0>
>>>>> vcond <x1, c, d>
>>>>>
>>>>> and your gsi stands at the x1 now, so the gimplification created a
>>>>> comparison that optab would not understand. And I am not really sure
>>>>> that you would be able to solve this problem easily.
>>>>>
>>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>>> can't, and x op y is a single tree that must be gimplified, and I am not
>>>>> sure that you can persuade gimplifier to leave this expression
>>>>> untouched.
>>>>>
>>>>> In the attachment the current version of the patch.
>>>>
>>>> I can't reproduce it with your patch.  For
>>>>
>>>> #define vector(elcount, type)  \
>>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>
>>>> vector (4, float) x, y;
>>>> vector (4, int) a,b;
>>>> int
>>>> main (int argc, char *argv[])
>>>> {
>>>>  vector (4, int) i0 = x < y;
>>>>  vector (4, int) i1 = i0 ? a : b;
>>>>  return 0;
>>>> }
>>>>
>>>> I get from the C frontend:
>>>>
>>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>>> SAVE_EXPR <b> > ;
>>>>
>>>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>>>
>>> I don't put it there. This patch adds != 0, rather removing. But this
>>> could be changed.
>>
>> ?
>>
>>>> I do see that the gimplifier pulls away the condition for the first
>>>> VEC_COND_EXPR though:
>>>>
>>>>  x.0 = x;
>>>>  y.1 = y;
>>>>  D.2735 = x.0 < y.1;
>>>>  D.2734 = D.2735;
>>>>  D.2736 = D.2734;
>>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>>> { 0, 0, 0, 0 } > ;
>>>>
>>>> which is, I believe because of the SAVE_EXPR wrapped around the
>>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>>
>>> I bother because they could be MAYBE_CONST which breaks the
>>> gimplifier. But I don't really know if you can do it better. I can
>>> always do this checking on operands of constructed vcond...
>>
>> Err, the patch does
>>
>> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
>> +  ifexp = save_expr (tmp);
>> +  wrap &= maybe_const;
>>
>> why is
>>
>>  ifexp = save_expr (tmp);
>>
>> necessary here?  SAVE_EXPR is if you need to protect side-effects
>> from being evaluated twice if you use an operand twice.  But all
>> operands are just used a single time.
>
> Again, the only reason why save_expr is there is to avoid MAYBE_CONST
> nodes to break the gimplification. But may be it is a wrong way of
> doing it, but it does the job.
>
>> And I expected, instead of
>>
>> +  if ((COMPARISON_CLASS_P (ifexp)
>> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>> +    {
>> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
>> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>> +                      : TREE_TYPE (ifexp);
>> +
>> +      op1 = convert (comp_type, op1);
>> +      op2 = convert (comp_type, op2);
>> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>> +      vcond = convert (TREE_TYPE (op1), vcond);
>> +    }
>> +  else
>> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>>
>>  if (!COMPARISON_CLASS_P (ifexp))
>>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>>
>>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>    {
>> ...
>>
> Why?
> This is a function to constuct any vcond. The result of ifexp is
> always signed integer vector if it is a comparison, but we need to
> make sure that all the elements of vcond have the same type.
>
> And I didn't really understand if we can guarantee that vector
> comparison would not be lifted out by the gimplifier. It happens in
> case I put this save_expr, it could possibly happen in some other
> cases. How can we prevent that?

We don't need to prevent it.  If the C frontend makes sure that the
mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
the expansion can do the obvious thing with a non-comparison mask
(have another md pattern for this case to handle AMD XOP vcond
or simply emit bitwise mask operations).
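
To make the two semantics concrete, here is a minimal sketch using the
GCC vector extension. It is not the middle-end expansion code; the
function names are hypothetical, and it assumes a compiler that
supports vector comparisons:

```c
typedef int v4si __attribute__((vector_size(16)));

/* Bitwise selection: take the bits of v1 where the mask bit is set,
   and the bits of v2 elsewhere.  */
v4si bitwise_select(v4si mask, v4si v1, v4si v2)
{
    return (mask & v1) | (~mask & v2);
}

/* What "mask != 0" does: every non-zero lane becomes -1 (all bits
   set) and every zero lane stays 0.  */
v4si normalise_mask(v4si mask)
{
    v4si zero = {0, 0, 0, 0};
    return mask != zero;
}

/* C frontend semantics for "mask ? v1 : v2": normalise first, then the
   bitwise selection picks whole lanes.  */
v4si c_select(v4si mask, v4si v1, v4si v2)
{
    return bitwise_select(normalise_mask(mask), v1, v2);
}
```

With mask = {1, 5, 0, -1}, v1 = {10, 20, 30, 40} and v2 = {1, 2, 3, 4},
c_select gives {10, 20, 3, 40}, whereas bitwise_select on the raw mask
mixes bits within a lane (lane 0 comes out as 0). That difference is
the reason the frontend has to add the != 0.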

The gimplifier shouldn't unnecessarily pull out the comparison, but
you instructed it to - by means of wrapping it inside a SAVE_EXPR.

Richard.
Artem Shinkarov Aug. 23, 2011, 10:45 a.m. UTC | #6
On Tue, Aug 23, 2011 at 11:33 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> We don't need to prevent it.  If the C frontend makes sure that the
> mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
> mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
> the expansion can do the obvious thing with a non-comparison mask
> (have another md pattern for this case to handle AMD XOP vcond
> or simply emit bitwise mask operations).
>
> The gimplifier shouldn't unnecessarily pull out the comparison, but
> you instructed it to - by means of wrapping it inside a SAVE_EXPR.
>
> Richard.
>

I'm confused.
There is a set of tightly connected problems, and you address only one
of them.

I need to do something with the C_MAYBE_CONST_EXPR nodes to allow the
gimplification of the expression. To achieve that, I am wrapping any
expression that can contain a C_MAYBE_CONST_EXPR node in a SAVE_EXPR.
This works, but the vector condition is then lifted out. So the
question is how to get rid of C_MAYBE_CONST_EXPR nodes while making
sure that the expression stays inside the VEC_COND_EXPR?

All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
integer type, and when we use it we can add != 0 to the mask, no
problem. The problem is to make sure that the vector expression is not
lifted out of the VEC_COND_EXPR and that the C_MAYBE_CONST_EXPRs are
gone at the same time.

Artem.
Richard Biener Aug. 23, 2011, 10:56 a.m. UTC | #7
On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> I'm confused.
> There is a set of tightly connected problems, and you address only one
> of them.
>
> I need to do something with the C_MAYBE_CONST_EXPR nodes to allow the
> gimplification of the expression. To achieve that, I am wrapping any
> expression that can contain a C_MAYBE_CONST_EXPR node in a SAVE_EXPR.
> This works, but the vector condition is then lifted out. So the
> question is how to get rid of C_MAYBE_CONST_EXPR nodes while making
> sure that the expression stays inside the VEC_COND_EXPR?

I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
until gimplification.  I thought c_fully_fold is exactly used (instead
of c_save_expr) because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
nodes.  Instead you delay that (well, commented out in your patch).

> All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
> integer type, and when we use it we can add != 0 to the mask, no
> problem. The problem is to make sure that the vector expression is not
> lifted out of the VEC_COND_EXPR and that the C_MAYBE_CONST_EXPRs are
> gone at the same time.

Well, for example for floating-point comparisons and -fnon-call-exceptions
you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
that shouldn't be an issue because C semantics are ensured for
the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
the VEC_COND_EXPR semantic for a non-comparison mask operand
is (v0 & mask) | (v1 & ~mask).  Which means that we have to be able to
expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.
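
The bitwise semantics above can be sketched at the source level as
follows. This is only an illustration under stated assumptions, not
code from the patch: select_bitwise and select_c_semantics are
hypothetical names, and the sketch assumes a compiler that accepts
vector comparisons and bitwise operators on vector types in C. It shows
why the mask has to be normalized with != 0 before the
(v0 & mask) | (v1 & ~mask) form gives C semantics:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: bitwise select.  Lanes where MASK is -1 come
   from V0, lanes where MASK is 0 come from V1.  The result is only
   meaningful when every lane of MASK is 0 or -1.  */
static v4si
select_bitwise (v4si mask, v4si v0, v4si v1)
{
  return (v0 & mask) | (v1 & ~mask);
}

/* Hypothetical helper: model of the source form "mask ? v0 : v1".
   The mask is first normalized with != 0, then used bitwise.  */
static v4si
select_c_semantics (v4si mask, v4si v0, v4si v1)
{
  const v4si zero = { 0, 0, 0, 0 };
  return select_bitwise (mask != zero, v0, v1);
}

/* Small self-check: a comparison result is already a valid mask.  */
static void
check_select (void)
{
  v4si a = { 1, 5, 3, 7 };
  v4si b = { 4, 2, 6, 0 };
  v4si min = select_bitwise (a < b, a, b);
  assert (min[0] == 1 && min[1] == 2 && min[2] == 3 && min[3] == 0);
}
```

With a raw mask such as {1, 0, 2, 0}, select_bitwise mixes bits of both
operands in the non-zero lanes, while select_c_semantics picks whole
lanes.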

So, I don't really see any problems for the C frontend or gimplification side.
We do have to make expansion handle more cases, but they can be all
dispatched to make use of the vcond named expander and handling
the mask ? v1 : v2 case with bitwise operations (to be optimized later
by introducing another named expander to match XOP vcond).

Richard.

> Artem.
>
Artem Shinkarov Aug. 23, 2011, 11:11 a.m. UTC | #8
On Tue, Aug 23, 2011 at 11:56 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> I'm confused.
>> There is a set of tightly connected problems, and you address only one
>> of them.
>>
>> I need to do something with the C_MAYBE_CONST_EXPR nodes to allow the
>> gimplification of the expression. To achieve that, I am wrapping any
>> expression that can contain a C_MAYBE_CONST_EXPR node in a SAVE_EXPR.
>> This works, but the vector condition is then lifted out. So the
>> question is how to get rid of C_MAYBE_CONST_EXPR nodes while making
>> sure that the expression stays inside the VEC_COND_EXPR?
>
> I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
> until gimplification.  I thought c_fully_fold is exactly used (instead
> of c_save_expr) because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
> nodes.  Instead you delay that (well, commented out in your patch).

Ok. So for the time being save_expr is the only way that we know to
avoid C_MAYBE_CONST_EXPR nodes.

>> All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
>> integer type, and when we use it we can add != 0 to the mask, no
>> problem. The problem is to make sure that the vector expression is not
>> lifted out of the VEC_COND_EXPR and that the C_MAYBE_CONST_EXPRs are
>> gone at the same time.
>
> Well, for example for floating-point comparisons and -fnon-call-exceptions
> you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
> that shouldn't be an issue because C semantics are ensured for
> the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
> the VEC_COND_EXPR semantic for a non-comparison mask operand
> is (v0 & mask) | (v1 & ~mask).  Which means that we have to be able to
> expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
> VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.

Richard, I think you almost get it, but there is one small thing you
have missed. Let's assume that, for some reason, the comparison was
lifted out when we gimplified a > b. So we have the following situation:

D.1 = a > b;
comp = vcond<D.1, v0, v1>
...

Ok?
Now, I fully agree that we want to treat the lifted a > b as a VCOND.
What I am doing in veclower is this: when I meet a vector comparison
a > b, I wrap it in a VCOND, because otherwise it would not be
recognized by optabs. Literally I am doing:

rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>

And here is where the devil hides. For some reason, when this
expression is gimplified, a > b is lifted again and is left outside the
VEC_COND_EXPR, and that is the problem I am trying to fight. Do you
have any ideas what could be done here?


Artem.
Artem Shinkarov Aug. 23, 2011, 11:13 a.m. UTC | #9
Sorry, not
rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>

but rather

rhs = gimplify_build3 (gsi, VEC_COND_EXPR, build2 (GT_EXPR, type, a,
b), {-1}, {0}>


Artem.
Richard Biener Aug. 23, 2011, 11:23 a.m. UTC | #10
On Tue, Aug 23, 2011 at 1:11 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Richard, I think you almost get it, but there is one small thing you
> have missed. Let's assume that, for some reason, the comparison was
> lifted out when we gimplified a > b. So we have the following situation:
>
> D.1 = a > b;
> comp = vcond<D.1, v0, v1>
> ...
>
> Ok?
> Now, I fully agree that we want to treat the lifted a > b as a VCOND.
> What I am doing in veclower is this: when I meet a vector comparison
> a > b, I wrap it in a VCOND, because otherwise it would not be
> recognized by optabs. Literally I am doing:
>
> rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>
>
> And here is where the devil hides. For some reason, when this
> expression is gimplified, a > b is lifted again and is left outside the
> VEC_COND_EXPR, and that is the problem I am trying to fight. Do you
> have any ideas what could be done here?

Well, don't do it.  Check if the target can expand

 D.1 = a > b;

via feeding it vcond <a > b, {-1,...}, {0,...} > and if not, expand it piecewise
in veclower.  If it can handle it - leave it alone!

In expand_expr_real_2 add to the EQ_EXPR (etc.) case the case
of a vector-typed comparison and use the vcond optab for it, again
via vcond <a > b, {-1,...}, {0,...} >.  If you look at the EQ_EXPR case
it dispatches to do_store_flag - that's the best place to handle
vector-typed compares.
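
As a rough source-level model of the two outcomes (an illustration
only, not GCC internals and not part of the patch), the piecewise
fallback and the whole-vector form should compute the same mask. The
function names greater_piecewise and greater_vector are hypothetical,
and the sketch assumes a compiler that accepts vector comparisons and
vector subscripting in C:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical model of the piecewise expansion: compare lane by
   lane and produce -1 for true, 0 for false.  */
static v4si
greater_piecewise (v4si a, v4si b)
{
  v4si r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

/* Whole-vector comparison, left for the compiler to expand as it
   sees fit for the target.  */
static v4si
greater_vector (v4si a, v4si b)
{
  return a > b;
}

/* Small self-check: both forms agree lane by lane.  */
static void
check_greater (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 3, 2, 1, 4 };
  v4si p = greater_piecewise (a, b);
  v4si v = greater_vector (a, b);
  for (int i = 0; i < 4; i++)
    assert (p[i] == v[i]);
}
```

Whichever path is taken, the value assigned to D.1 is the same
{0, -1} mask, which is why the comparison can be left alone when the
target can handle it.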

Richard.
Artem Shinkarov Aug. 23, 2011, 11:57 a.m. UTC | #11
On Tue, Aug 23, 2011 at 12:23 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Well, don't do it.  Check if the target can expand
>
>  D.1 = a > b;
>
> via feeding it vcond <a > b, {-1,...}, {0,...} > and if not, expand it piecewise
> in veclower.  If it can handle it - leave it alone!
>
> In expand_expr_real_2 add to the EQ_EXPR (etc.) case the case
> of a vector-typed comparison and use the vcond optab for it, again
> via vcond <a > b, {-1,...}, {0,...} >.  If you look at the EQ_EXPR case
> it dispatches to do_store_flag - that's the best place to handle
> vector-typed compares.
>
> Richard.
>
That sounds like a plan. I'll investigate whether it can be done.
Also, if we can handle a > b directly, then we don't need to construct
vcond <a > b, {-1}, {0}> ourselves; we will know that it will be
constructed correctly when expanding.


Thanks for your help,
Artem.

Patch

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@  invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators:
+@code{==, !=, <, <=, >, >=}. Both integer-type and real-type vectors
+can be compared, but only with vectors of the same type. The result of
+the comparison is a signed integer-type vector whose elements have the
+same size as the elements of the compared vectors.
+Comparison is performed element by element. The false value is 0, the
+true value is -1 (a constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+where the condition is a vector of signed integers. In that case the
+condition is used as a mask to select each element either from the
+first operand or from the second. Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The number of elements in the condition must be the same as the number
+of elements in both operands, and the element sizes must match as well.
+The type of the vector conditional is determined by the types of the
+operands, which must be the same. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can be just a
+vector of signed integer type. In that case the vector is implicitly
+compared with a vector of zeroes. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional whose operands are vectors and whose
+condition is a scalar integer works in the standard way: it returns the
+first operand if the condition is true, the second otherwise. For example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;  
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a; 
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@  misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express the vector comparison using hardware-specific instructions and, if so, return the resulting tree. The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@  misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@  default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default implementation of the builtin_vec_compare hook: no
+   target-specific expansion is available, so return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@  extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@  DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express the vector \
+comparison using hardware-specific instructions and, if so, return the \
+resulting tree. The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6572,16 +6572,37 @@  expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
+  
+  if (COMPARISON_CLASS_P (op0))
+    {
+      comparison = vector_compare_rtx (op0, unsignedp, icode);
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_fixed_operand (&ops[3], comparison);
+      create_fixed_operand (&ops[4], XEXP (comparison, 0));
+      create_fixed_operand (&ops[5], XEXP (comparison, 1));
+
+    }
+  else
+    {
+      rtx rtx_op0;
+      rtx vec; 
+    
+      rtx_op0 = expand_normal (op0);
+      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); 
+      vec = CONST0_RTX (mode);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);
+    }
 
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
   return ops[0].value;
 }
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@ 
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@  extract_muldiv_1 (tree t, tree c, enum t
 }
 
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars, and either {-1,-1,...} or {0,0,...} for vectors),
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -9073,26 +9082,28 @@  fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      tree arg0_type = TREE_TYPE (arg0);
+      
       switch (code)
 	{
 	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  break;
 
 	case GE_EXPR:
 	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
 	case NE_EXPR:
 	  /* For NE, we can only do this simplification if integer
 	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (FLOAT_TYPE_P (arg0_type)
+	      && HONOR_NANS (TYPE_MODE (arg0_type)))
 	    break;
 	  /* ... fall through ...  */
 	case GT_EXPR:
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,78 @@ 
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != (" fmt1 " " #op " " fmt1 " ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%i", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%i");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){0,0}), d0, d1, !=, "%f", "%i");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){0,0}), l0, l1, !=, "%i", "%i");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){0,0,0,0}), 
+		 f0, f1, !=, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){0,0,0,0}), 
+		 i0, i1, !=, "%i", "%i");
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@ 
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@ 
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   //test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@ 
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  
+  /* Do not trigger that  */
+  q4 ? p4 : r4;	    /* { "vector comparison must be of signed integer vector type" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */   
+
+/* Test if C_MAYBE_CONST are folded correctly when 
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4009,6 +4009,52 @@  ep_convert_and_check (tree type, tree ex
   return convert (type, expr);
 }
 
+static tree
+fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
+{
+  bool wrap = true;
+  bool maybe_const = false;
+  tree vcond, tmp;
+
+  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+  tmp = c_fully_fold (ifexp, false, &maybe_const);
+  ifexp = save_expr (tmp);
+  wrap &= maybe_const;
+  
+  tmp = c_fully_fold (op1, false, &maybe_const);
+  op1 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  tmp = c_fully_fold (op2, false, &maybe_const);
+  op2 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  /* Currently the expansion of VEC_COND_EXPR does not allow
+     expressions where the type of vectors you compare differs
+     from the type of vectors you select from. For the time
+     being we insert implicit conversions.  */
+  if ((COMPARISON_CLASS_P (ifexp)
+       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
+      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
+    {
+      tree comp_type = COMPARISON_CLASS_P (ifexp)
+		       ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+		       : TREE_TYPE (ifexp);
+      tree res_type = TREE_TYPE (op1);  /* Type before conversion.  */
+      op1 = convert (comp_type, op1);
+      op2 = convert (comp_type, op2);
+      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+      vcond = convert (res_type, vcond);
+    }
+  else
+    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
+
+  /*if (!wrap)
+    vcond = c_wrap_maybe_const (vcond, true);*/
+
+  return vcond;
+}
+
 /* Build and return a conditional expression IFEXP ? OP1 : OP2.  If
    IFEXP_BCP then the condition is a call to __builtin_constant_p, and
    if folded to an integer constant then the unselected half may
@@ -4058,6 +4104,49 @@  build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      return fold_build_vec_cond_expr (ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9995,37 @@  build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /*break;  */
+
+	  ret = fold_build_vec_cond_expr 
+		       (build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10138,37 @@  build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /* break; */
+	  ret = fold_build_vec_cond_expr 
+		       (build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10576,10 @@  c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@  gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,36 @@  gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* A vector comparison is a valid gimple expression
+		     that can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    {
+		      goto expr_2;
+		      /* XXX my humble attempt to avoid comparisons.
+		      enum gimplify_status r0, r1;
+		      tree t, f;
+
+		      debug_tree (*expr_p);
+
+		      r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					  post_p, is_gimple_condexpr, fb_rvalue);
+		      r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+					  post_p, is_gimple_val, fb_rvalue);
+		      
+		      t = build_vector_from_val (TREE_TYPE (*expr_p),
+				    build_int_cst (TREE_TYPE (TREE_TYPE (*expr_p)), -1));
+		      f = build_vector_from_val (TREE_TYPE (*expr_p),
+				    build_int_cst (TREE_TYPE (TREE_TYPE (*expr_p)), 0));
+
+		      recalculate_side_effects (*expr_p);  
+		      t = build3 (VEC_COND_EXPR, TREE_TYPE (*expr_p), *expr_p, t, f);
+		      *expr_p = t;
+										
+		      ret = MIN (r0, r1);
+		      break;*/
+		    }
+
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@  DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  In a vector of comparison
+   results, a true element has all bits set and a false element is zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 177665)
+++ gcc/emit-rtl.c	(working copy)
@@ -5474,6 +5474,11 @@  gen_const_vector (enum machine_mode mode
   return tem;
 }
 
+rtx
+gen_const_vector1 (enum machine_mode mode, int constant)
+{
+  return gen_const_vector (mode, constant);
+}
 /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when
    all elements are zero, and the one vector when all elements are one.  */
 rtx
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	(revision 177665)
+++ gcc/tree-ssa-forwprop.c	(working copy)
@@ -585,6 +585,128 @@  forward_propagate_into_cond (gimple_stmt
   return 0;
 }
 
+
+static tree
+combine_vec_cond_expr_cond (location_t loc, enum tree_code code, 
+			    tree type, tree op0, tree op1)
+{
+  tree t;
+
+  if (op0 == NULL_TREE && op1 == NULL_TREE)
+    return NULL_TREE;
+
+  if (op0 == NULL_TREE)
+    return op1;
+
+  if (op1 == NULL_TREE)
+    return op0;
+
+  gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
+
+  t = fold_binary_loc (loc, code, type, op0, op1);
+  if (!t)
+    return NULL_TREE;
+
+  /* Require that we got a boolean type out if we put one in.  */
+  gcc_assert (TREE_CODE (TREE_TYPE (t)) == TREE_CODE (type));
+
+  /* Canonicalize the combined condition for use in a COND_EXPR.  */
+  /* t = canonicalize_cond_expr_cond (t); */
+
+  /* Bail out if we required an invariant but didn't get one.  */
+  if (!t)
+    return NULL_TREE;
+
+  return t;
+}
+
+
+
+static tree
+forward_propagate_into_vec_comp (location_t loc, tree expr)
+{
+  tree tmp = NULL_TREE;
+  tree rhs0 = NULL_TREE, rhs1 = NULL_TREE;
+  bool single_use0_p = false, single_use1_p = false;
+
+  /* For comparisons use the first operand, that is likely to
+     simplify comparisons against constants.  */
+  /* debug_tree (expr);  */
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    {
+      tree type = TREE_TYPE (expr);
+      tree lhs = forward_propagate_into_vec_comp (loc, TREE_OPERAND (expr, 0));
+      tree rhs = forward_propagate_into_vec_comp (loc, TREE_OPERAND (expr, 1));
+
+      return combine_vec_cond_expr_cond (loc, TREE_CODE (expr), 
+					type, lhs, rhs);
+    }
+  else if (TREE_CODE (expr) == SSA_NAME)
+    {
+      gimple def_stmt = get_prop_source_stmt (expr, false, &single_use0_p);
+      if (def_stmt && can_propagate_from (def_stmt))
+	{
+	  expr = rhs_to_tree (TREE_TYPE (expr), def_stmt);
+	  return forward_propagate_into_vec_comp (loc, expr);
+	}
+      else
+	return tmp;
+    }
+
+  return tmp;
+}
+
+
+
+
+/* The same as forward_propagate_into_cond, but for vector conditions.  */
+static int
+forward_propagate_into_vec_cond (gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  location_t loc = gimple_location (stmt);
+  tree tmp = NULL_TREE;
+  tree cond = gimple_assign_rhs1 (stmt);
+
+  /* We can do tree combining on SSA_NAME and comparison expressions.  */
+  if (TREE_CODE (cond) == VEC_COND_EXPR)
+    tmp = forward_propagate_into_vec_comp (loc, cond);
+  else if (TREE_CODE (cond) == SSA_NAME)
+    {
+      tree name = cond, rhs0;
+      gimple def_stmt = get_prop_source_stmt (name, true, NULL);
+      if (!def_stmt || !can_propagate_from (def_stmt))
+	return 0;
+
+      rhs0 = gimple_assign_rhs1 (def_stmt);
+      tmp = forward_propagate_into_vec_comp (loc, rhs0);
+    }
+
+  /* XXX Don't change anything for the time being.  */
+  tmp = NULL_TREE;
+
+  if (tmp)
+    {
+      if (dump_file && tmp)
+	{
+	  fprintf (dump_file, "  Replaced '");
+	  print_generic_expr (dump_file, cond, 0);
+	  fprintf (dump_file, "' with '");
+	  print_generic_expr (dump_file, tmp, 0);
+	  fprintf (dump_file, "'\n");
+	}
+
+      gimple_assign_set_rhs_from_tree (gsi_p, unshare_expr (tmp));
+      stmt = gsi_stmt (*gsi_p);
+      update_stmt (stmt);
+
+      return is_gimple_min_invariant (tmp) ? 2 : 1;
+    }
+
+  return 0;
+}
+
 /* We've just substituted an ADDR_EXPR into stmt.  Update all the
    relevant data structures to match.  */
 
@@ -2445,6 +2567,20 @@  ssa_forward_propagate_and_combine (void)
 		    stmt = gsi_stmt (gsi);
 		    if (did_something == 2)
 		      cfg_changed = true;
+		    fold_undefer_overflow_warnings
+		      (!TREE_NO_WARNING (rhs1) && did_something, stmt,
+		       WARN_STRICT_OVERFLOW_CONDITIONAL);
+		    changed = did_something != 0;
+		  }
+		else if (code == VEC_COND_EXPR)
+		  {
+		    /* In this case the entire VEC_COND_EXPR is in rhs1.  */
+		    int did_something;
+		    fold_defer_overflow_warnings ();
+		    did_something = forward_propagate_into_vec_cond (&gsi);
+		    stmt = gsi_stmt (gsi);
+		    if (did_something == 2)
+		      cfg_changed = true;
 		    fold_undefer_overflow_warnings
 		      (!TREE_NO_WARNING (rhs1) && did_something, stmt,
 		       WARN_STRICT_OVERFLOW_CONDITIONAL);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,11 +30,16 @@  along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +130,31 @@  do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0.
+
+   INNER_TYPE is the type of the elements of A and B.
+
+   The returned expression is of signed integer type with the
+   size equal to the size of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
+
+  cond = gimplify_build2 (gsi, code, comp_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, 
+                    build_int_cst (comp_type, -1),
+                    build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +363,49 @@  uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand the vector comparison expression OP0 CODE OP1 using
+   the builtin_vec_compare target hook.  If the target does not
+   support comparison of type TYPE, expand the comparison piecewise.
+   GSI is used inside the target hook to create the code needed
+   for the given comparison.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  /*if (expand_vec_cond_expr_p (type, TYPE_MODE (type)))
+    {
+      tree arg_type = TREE_TYPE (op0);
+      tree if_true, if_false, ifexp;
+      tree el_type = TREE_TYPE (type);
+      
+      //el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
+
+      if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+      if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+      ifexp = gimplify_build2 (gsi, code, type, op0, op1);
+
+      debug_tree (ifexp);
+      debug_tree (if_true);
+      debug_tree (if_false);
+
+      if (arg_type != type)
+	{
+	  if_true = convert (arg_type, if_true);
+	  if_false = convert (arg_type, if_false);
+	  t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR,  type, t);
+	}
+      else
+	t = gimplify_build3 (gsi, VEC_COND_EXPR, type, ifexp, if_true, if_false);
+    }
+  else
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);*/
+  return gimplify_build2 (gsi, code, type, op0, op1);
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +448,27 @@  expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -432,6 +524,126 @@  type_for_widest_vector_mode (enum machin
     }
 }
 
+
+
+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   where i ranges from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)) - 1.  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+  gimple_stmt_iterator gsi_tmp;
+  tree t;
+
+  
+  if (COMPARISON_CLASS_P (cond))
+    {
+      /* Expand vector condition inside of VEC_COND_EXPR.  */
+      op = optab_for_tree_code (TREE_CODE (cond), type, optab_default);
+      if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+	{
+	  tree op0 = TREE_OPERAND (cond, 0);
+	  tree op1 = TREE_OPERAND (cond, 1);
+
+	  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+	  new_rhs = expand_vector_piecewise (gsi, do_compare, 
+					     TREE_TYPE (cond),
+					     TREE_TYPE (TREE_TYPE (op1)),
+					     op0, op1, TREE_CODE (cond));
+
+	  new_stmt = gimple_build_assign (var, new_rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (gsi_stmt (*gsi));
+	}
+      else
+	var = cond;
+    }
+  else
+    var = cond;
+  
+  gsi_tmp = *gsi;
+  gsi_prev (&gsi_tmp);
+
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+
+  /* Run veclower on the expressions we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);
+  
+  return t;
+}
+
+static bool
+is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
+{
+  tree type = TREE_TYPE (expr);
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    return true;
+    
+  if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE)
+    return true;
+
+  if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR
+      || TREE_CODE (expr) == BIT_XOR_EXPR)
+    return is_vector_comparison (gsi, TREE_OPERAND (expr, 0))
+	   && is_vector_comparison (gsi, TREE_OPERAND (expr, 1));
+
+  if (TREE_CODE (expr) == VAR_DECL)
+    { 
+      gimple_stmt_iterator gsi_tmp;
+      tree name = DECL_NAME (expr);
+      tree var = NULL_TREE;
+      
+      gsi_tmp = *gsi;
+
+      for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp))
+	{
+	  gimple stmt = gsi_stmt (gsi_tmp);
+
+	  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	    continue;
+
+	  if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	      && DECL_NAME (gimple_assign_lhs (stmt)) == name)
+	    return is_vector_comparison (&gsi_tmp, 
+					 gimple_assign_rhs_to_tree (stmt));
+	}
+    } 
+  
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      enum tree_code code;
+      gimple exprdef = SSA_NAME_DEF_STMT (expr);
+
+      if (gimple_code (exprdef) != GIMPLE_ASSIGN)
+	return false;
+
+      if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE)
+	return false;
+
+      
+      return is_vector_comparison (gsi, 
+				   gimple_assign_rhs_to_tree (exprdef));
+    }
+
+  return false;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -450,11 +662,34 @@  expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
+
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      if (!is_vector_comparison (gsi, cond))
+	TREE_OPERAND (exp, 0) = 
+		    build2 (NE_EXPR, TREE_TYPE (cond), cond,
+			    build_vector_from_val (TREE_TYPE (cond),
+			    build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
+      
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@  EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@  TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@  tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@  verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@  c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@  c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@  along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18402,27 +18403,55 @@  ix86_expand_sse_fp_minmax (rtx dest, enu
   return true;
 }
 
+rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT);
+
+/* Returns a vector of mode MODE where all the elements are ARG.  */
+rtx
+rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg)
+{
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+  
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg);
+  
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
 /* Expand an sse vector comparison.  Return the register with the result.  */
 
 static rtx
 ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
-		     rtx op_true, rtx op_false)
+		     rtx op_true, rtx op_false, bool no_comparison)
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx x;
 
-  cmp_op0 = force_reg (mode, cmp_op0);
-  if (!nonimmediate_operand (cmp_op1, mode))
-    cmp_op1 = force_reg (mode, cmp_op1);
+  /* Avoid useless comparison.  */
+  if (no_comparison)
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      x = cmp_op0;
+    }
+  else
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      if (!nonimmediate_operand (cmp_op1, mode))
+	cmp_op1 = force_reg (mode, cmp_op1);
+
+      x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
+    }
 
   if (optimize
       || reg_overlap_mentioned_p (dest, op_true)
       || reg_overlap_mentioned_p (dest, op_false))
     dest = gen_reg_rtx (mode);
 
-  x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
-
   return dest;
 }
 
@@ -18434,8 +18463,14 @@  ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  rtx mask_true;
+  
+  if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
@@ -18512,7 +18547,7 @@  ix86_expand_fp_movcc (rtx operands[])
 	return true;
 
       tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1,
-				 operands[2], operands[3]);
+				 operands[2], operands[3], false);
       ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]);
       return true;
     }
@@ -18555,7 +18590,7 @@  ix86_expand_fp_vcond (rtx operands[])
     return true;
 
   cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
+			     operands[1], operands[2], false);
   ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
@@ -18569,7 +18604,9 @@  ix86_expand_int_vcond (rtx operands[])
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
+  rtx comp;
 
+  comp = operands[3];
   cop0 = operands[4];
   cop1 = operands[5];
 
@@ -18681,8 +18718,18 @@  ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			   operands[1+negate], operands[2-negate]);
+  if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX 
+      && XEXP (comp, 1) == NULL_RTX)
+    {
+      rtx vec =  CONST0_RTX (mode);
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, vec,
+			       operands[1+negate], operands[2-negate], true);
+    }
+  else
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1+negate], operands[2-negate], false);
+    }
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
@@ -18774,7 +18821,7 @@  ix86_expand_sse_unpack (rtx operands[2],
 	tmp = force_reg (imode, CONST0_RTX (imode));
       else
 	tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
-				   operands[1], pc_rtx, pc_rtx);
+				   operands[1], pc_rtx, pc_rtx, false);
 
       emit_insn (unpack (dest, operands[1], tmp));
     }
@@ -32827,6 +32874,276 @@  ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for the comparison of
+   real-type vectors V0 and V1.  Return the variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for the comparison of
+   integer-type vectors V0 and V1.  Return the variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* tmp2 = (v0 == v0) is all ones, so t = tmp1 ^ tmp2 = ~tmp1.  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a variable
+   that holds the result of the comparison.  Return NULL_TREE when
+   no target-specific sequence is available.  */
+static tree
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype,
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* The PCMPGT builtins compare signed elements, so packed unsigned
+     integers support only EQ_EXPR and NE_EXPR.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35587,11 @@  ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function