Patchwork Vector comparison

Submitter Artem Shinkarov
Date Aug. 15, 2010, 5:09 p.m.
Message ID <AANLkTimDve4hPRY9NEHprfa9SDjUGRykgEM=mdDFKNQf@mail.gmail.com>
Permalink /patch/61753/
State New

Comments

Artem Shinkarov - Aug. 15, 2010, 5:09 p.m.
This patch implements vector comparison according to OpenCL standard.
The patch tries to dispatch vector comparison to hardware-specific
instructions. In cases where this is impossible, the vector comparison
is expanded piecewise.
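
A minimal scalar sketch of what the piecewise fallback computes for each
lane, written in portable C rather than with the vector extension. The
function name and the fixed int element type are assumptions made for
illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: element-wise "greater than"
   in the OpenCL convention.  Each result lane is -1 (all bits set) when
   the comparison holds and 0 otherwise.  */
void
piecewise_gt (const int *a, const int *b, int *res, size_t n)
{
  size_t i;

  assert (a != NULL && b != NULL && res != NULL);
  for (i = 0; i < n; i++)
    res[i] = (a[i] > b[i]) ? -1 : 0;
}
```

With a = {1,2,3,4} and b = {3,2,1,4} this fills res with {0,0,-1,0},
which matches the example added to extend.texi.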

Currently, compilation with this patch applied slows down
dramatically if you compile with -On. I can't find the reason. Any help
would be appreciated.

Changelog:

2010-08-15 Artem Shinkarov <artyom.shinkaroff@gmail.com>

        gcc/
        * targhooks.c (default_builtin_vec_compare): New hook.
        * targhooks.h (default_builtin_vec_compare): New definition.
        * target.def (builtin_vec_compare): New hook.
        * tree.c (build_vector_from_val): New function.
        * tree.h (build_vector_from_val): New definition.
        * target.h: New include (gimple.h).
        * c-typeck.c (build_binary_op): Allow vector comparison.
        (c_objc_common_truthvalue_conversion): Deny vector comparison
        inside of if statement.
        * tree-vect-generic.c (do_compare): Helper function.
        (expand_vector_comparison): Check if hardware comparison
        is available, if not expand comparison piecewise.
        (expand_vector_operation): Handle vector comparison
        expressions separately.
        * Makefile.in: New dependencies.
        * tree-cfg.c (verify_gimple_comparison): Allow vector
        comparison operations in gimple.
        * config/i386/i386.c (vector_fp_compare): Build hardware
        specific code for floating point vector comparison.
        (vector_int_compare): Build hardware specific code for
        integer vector comparison.
        (ix86_vectorize_builtin_vec_compare): Implementation of
        builtin_vec_compare hook.

        gcc/testsuite/
        * gcc.c-torture/execute/vector-compare-1.c: New test.

        gcc/doc
        * extend.texi: Adjust.
        * tm.texi: Adjust.
        * tm.texi.in: Adjust.


Bootstrapped and tested on x86_64-unknown-linux.


Thank you,
Artem Shinkarov.
Andrew Pinski - Aug. 15, 2010, 5:30 p.m.
On Sun, Aug 15, 2010 at 10:09 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> This patch implements vector comparison according to OpenCL standard.
> The patch tries to dispatch vector comparison to hardware-specific
> instructions. In cases where this is impossible, the vector comparison
> is expanded piecewise.

I had posted a patch which had implemented them using the standard
tree codes and expansion and tree-lower-vect took care of the rest.
This was for reduction of vector comparison into a single scalar.
In fact we had agreed that they should implicitly turn a vector int
into a scalar when used in the context of a bool.
See http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01912.html .


Please consider using tree codes all the way through the gimple IR if
the target supports expansion and doing the expansion only at expand
time.  Building a call expression is expensive and really you could
use opcodes and not worry about a target hook.  I think opcodes are
a much easier way for targets to add support than
adding more and more target hooks.

Thanks,
Andrew Pinski
Artem Shinkarov - Aug. 15, 2010, 6:38 p.m.
On Sun, Aug 15, 2010 at 6:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Sun, Aug 15, 2010 at 10:09 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> This patch implements vector comparison according to OpenCL standard.
>> The patch tries to dispatch vector comparison to hardware-specific
>> instructions. In cases where this is impossible, the vector comparison
>> is expanded piecewise.
>
> I had posted a patch which had implemented them using the standard
> tree codes and expansion and tree-lower-vect took care of the rest.
> This was for reduction of vector comparison into a single scalar.
> In fact we had agreed that they should implicitly turn a vector int
> into a scalar when used in the context of a bool.
> See http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01912.html .

Yes, I looked over your patch. The thing is that OpenCL defines vector
comparison exactly the way it is currently implemented. I can't see
any problem with reducing the result with a max or min function over the
vector elements so that it can be used inside an if statement.

>
> Please consider using tree codes all the way through the gimple IR if
> the target supports expansion and doing the expansion only at expand
> time.  Building a call expression is expensive and really you could
> use opcodes and not worry about a target hook.  I think opcodes are
> a much easier way for targets to add support than
> adding more and more target hooks.

The hook approach was approved by Richard, so let's wait to hear what
he says.


Thanks,
Artem.
Joseph S. Myers - Aug. 15, 2010, 6:45 p.m.
On Sun, 15 Aug 2010, Artem Shinkarov wrote:

> This patch implements vector comparison according to OpenCL standard.

Suppose your target's vector comparison instructions encode the result 
some way other than a vector of 0 and -1 values.  How effectively can such 
instructions be used in the context of your patch?  In the typical use 
cases of comparisons that you expect in real code, will it be possible to 
optimize code written for the 0 and -1 values so that it uses instructions 
generating some other encoding without a lot of conversions between the 
two conventions?  (For example, if code does several vector comparisons 
and boolean operations on the results of those comparisons, it should be 
possible to convert convention at most once at the end of those 
operations.)  Is there a clear route, once your patch is applied, to add 
support incrementally for targets using such other conventions and for 
such optimizations for such targets?
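
A rough sketch of the "convert at most once" case described above, in
portable scalar C. Convention A keeps a 0 / -1 mask per lane; convention
B stands for a hypothetical target whose compares return one bit per lane
in a scalar. All names here are made up for illustration and nothing in
it comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Convention A: every comparison yields a 0 / -1 mask per lane, and
   boolean operations are bitwise operations on those masks.  */
void
in_range_masks (const int *x, const int *lo, const int *hi, int *res)
{
  size_t i;

  assert (x != NULL && lo != NULL && hi != NULL && res != NULL);
  for (i = 0; i < LANES; i++)
    {
      int m1 = (x[i] > lo[i]) ? -1 : 0;
      int m2 = (x[i] < hi[i]) ? -1 : 0;
      res[i] = m1 & m2;
    }
}

/* Convention B: a comparison yields one bit per lane, packed into the
   low bits of a scalar.  */
static unsigned
packed_gt (const int *a, const int *b)
{
  unsigned bits = 0;
  size_t i;

  for (i = 0; i < LANES; i++)
    if (a[i] > b[i])
      bits |= 1u << i;
  return bits;
}

/* Same computation as in_range_masks, but the boolean operation is done
   on the packed bits and the conversion to 0 / -1 lanes happens only
   once, at the end.  */
void
in_range_packed (const int *x, const int *lo, const int *hi, int *res)
{
  unsigned bits;
  size_t i;

  assert (x != NULL && lo != NULL && hi != NULL && res != NULL);
  bits = packed_gt (x, lo) & packed_gt (hi, x);
  for (i = 0; i < LANES; i++)
    res[i] = ((bits >> i) & 1u) ? -1 : 0;
}
```

Both functions produce the same lanes; the second one pays for a single
expansion however many comparisons feed the boolean expression.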
Steven Bosscher - Aug. 15, 2010, 6:56 p.m.
On Sun, Aug 15, 2010 at 8:38 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
>> Please consider using tree codes all the way through the gimple IR if
>> the target supports expansion and doing the expansion only at expand
>> time.  Building a call expression is expensive and really you could
>> use opcodes and not worry about a target hook.  I think opcodes are
>> a much easier way for targets to add support than
>> adding more and more target hooks.
>
> The hook approach was approved by Richard,

Hmm, it's unfortunate that this wasn't brought up earlier on the gcc
list. I agree with pinski that using new codes here seems like the
better approach.

> so let's wait to hear what
> he says.

Right. Perhaps Richi had good reasons to not use new codes.

Ciao!
Steven
Artem Shinkarov - Aug. 15, 2010, 7:15 p.m.
On Sun, Aug 15, 2010 at 7:45 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Sun, 15 Aug 2010, Artem Shinkarov wrote:
>
>> This patch implements vector comparison according to OpenCL standard.
>
> Suppose your target's vector comparison instructions encode the result
> some way other than a vector of 0 and -1 values.  How effectively can such
> instructions be used in the context of your patch?

SSE and AltiVec at least define vector comparison result as a vector
of 0 and -1. So I think that this would be an exotic architecture (but
I'm not sure). Anyway, if the target returns boolean value, then the
implementation of the hook should return constant vector of zeros for
false and constant vector of -1-s for true.

The other thing that is unclear is the definition of a boolean
comparison with the test ">". It could be treated very differently
depending on the situation: all the elements of the first vector are
bigger, most of the elements are bigger, at least one element is
bigger... I don't know the right answer; I'm not even sure there is
one.

Again, both approaches (vector return and boolean return) have their
own disadvantages. The question is which approach is common in
hardware. I think it is the vector return.

Also, it is not a big deal to support both approaches; the question is
which one should be expressed using the comparison operators.

> In the typical use
> cases of comparisons that you expect in real code, will it be possible to
> optimize code written for the 0 and -1 values so that it uses instructions
> generating some other encoding without a lot of conversions between the
> two conventions?  (For example, if code does several vector comparisons
> and boolean operations on the results of those comparisons, it should be
> possible to convert convention at most once at the end of those
> operations.)  Is there a clear route, once your patch is applied, to add
> support incrementally for targets using such other conventions and for
> such optimizations for such targets?

I think so, if we agree on the rules for conversion between the
boolean and vector approaches. In other words, how many -1 values are
enough to make the value true.

All the rest is solvable. We would just replace a boolean with a
vector and vice versa. At least I can't see any huge problems there,
but maybe I'm wrong.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
>
Joseph S. Myers - Aug. 15, 2010, 8:03 p.m.
On Sun, 15 Aug 2010, Artem Shinkarov wrote:

> On Sun, Aug 15, 2010 at 7:45 PM, Joseph S. Myers
> <joseph@codesourcery.com> wrote:
> > On Sun, 15 Aug 2010, Artem Shinkarov wrote:
> >
> >> This patch implements vector comparison according to OpenCL standard.
> >
> > Suppose your target's vector comparison instructions encode the result
> > some way other than a vector of 0 and -1 values.  How effectively can such
> > instructions be used in the context of your patch?
> 
> SSE and AltiVec at least define vector comparison result as a vector
> of 0 and -1. So I think that this would be an exotic architecture (but
> I'm not sure). Anyway, if the target returns boolean value, then the

I'm thinking in particular of TI C64X (which can probably be considered an 
exotic architecture in lots of ways), where vector comparisons set the 
low-order two or four bits of the target register and then you need a 
separate XPND2 or XPND4 instruction to convert this to a vector with 0 and 
-1 values.  The comparison instructions are single-cycle while XPND2 and 
XPND4 are two-cycle instructions; it would be nice to avoid excess 
instructions where possible.  (The GCC port for C6X exists based on 4.4 
but isn't yet ready for upstream submission; I sent the binutils port 
upstream in March.)
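
To make the two encodings concrete, here is a rough C model of an
XPND4-style expansion as described above: the low four bits of a compare
result become four byte lanes of 0x00 or 0xFF. The function name is made
up, and this models only the idea, not the exact instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i of BITS selects whether byte i of the
   result is 0xFF (true) or 0x00 (false).  This model accepts only a
   four-bit compare result.  */
uint32_t
expand4_model (uint32_t bits)
{
  uint32_t result = 0;
  int i;

  assert ((bits & ~UINT32_C (0xF)) == 0);
  for (i = 0; i < 4; i++)
    if (bits & (UINT32_C (1) << i))
      result |= UINT32_C (0xFF) << (8 * i);
  return result;
}
```

A chain of comparisons combined with boolean operations could stay in
the four-bit form and go through this expansion only when a 0 / -1
vector is really needed.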
Richard Guenther - Aug. 15, 2010, 10 p.m.
On Sun, Aug 15, 2010 at 10:03 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Sun, 15 Aug 2010, Artem Shinkarov wrote:
>
>> On Sun, Aug 15, 2010 at 7:45 PM, Joseph S. Myers
>> <joseph@codesourcery.com> wrote:
>> > On Sun, 15 Aug 2010, Artem Shinkarov wrote:
>> >
>> >> This patch implements vector comparison according to OpenCL standard.
>> >
>> > Suppose your target's vector comparison instructions encode the result
>> > some way other than a vector of 0 and -1 values.  How effectively can such
>> > instructions be used in the context of your patch?
>>
>> SSE and AltiVec at least define vector comparison result as a vector
>> of 0 and -1. So I think that this would be an exotic architecture (but
>> I'm not sure). Anyway, if the target returns boolean value, then the
>
> I'm thinking in particular of TI C64X (which can probably be considered an
> exotic architecture in lots of ways), where vector comparisons set the
> low-order two or four bits of the target register and then you need a
> separate XPND2 or XPND4 instruction to convert this to a vector with 0 and
> -1 values.  The comparison instructions are single-cycle while XPND2 and
> XPND4 are two-cycle instructions; it would be nice to avoid excess
> instructions where possible.  (The GCC port for C6X exists based on 4.4
> but isn't yet ready for upstream submission; I sent the binutils port
> upstream in March.)

The current patch would support this by always emitting builtins for
the separate XPND2/4 instructions.  Depending on the use (I expect
that OpenCL code will mostly use the comparison result as a mask,
not reduce it to a single bool), the unneeded result could be optimized
away by combine.

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com
Richard Guenther - Aug. 15, 2010, 10:05 p.m.
On Sun, Aug 15, 2010 at 8:56 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, Aug 15, 2010 at 8:38 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>>> Please consider using tree codes all the way through the gimple IR if
>>> the target supports expansion and doing the expansion only at expand
>>> time.  Building a call expression is expensive and really you could
>>> use opcodes and not worry about a target hook.  I think opcodes are
>>> a much easier way for targets to add support than
>>> adding more and more target hooks.
>>
>> The hook approach was approved by Richard,
>
> Hmm, it's unfortunate that this wasn't brought up earlier on the gcc
> list. I agree with pinski that using new codes here seems like the
> better approach.

New codes for what?  If you want to delay expansion to expand instead
of vector lowering you could do that by just keeping the comparison as it is.

>> so let's wait to hear what
>> he says.
>
> Right. Perhaps Richi had good reasons to not use new codes.

Lowering vector comparisons to either piecewise operations (as veclower
usually does) or to target specific code is easier on gimple.  It's easier
to develop and grok to have it in one place (instead of lowering only what
we can't handle directly).

But well - maybe you can clarify what new tree codes you are thinking of?
Certainly not a new tree code for every possible target builtin we have?

Richard.

> Ciao!
> Steven
>
Richard Henderson - Aug. 16, 2010, 7:24 p.m.
On 08/15/2010 12:15 PM, Artem Shinkarov wrote:
> SSE and AltiVec at least define vector comparison result as a vector
> of 0 and -1. So I think that this would be an exotic architecture (but
> I'm not sure). Anyway, if the target returns boolean value, then the
> implementation of the hook should return constant vector of zeros for
> false and constant vector of -1-s for true.

MIPS isn't so exotic.  The result of a float-pair comparison is to
set a pair of condition codes.  One can then use a branch that examines
both CC results, or a special move instruction for the SELECT operation.


r~
Richard Henderson - Aug. 16, 2010, 7:27 p.m.
On 08/16/2010 12:24 PM, Richard Henderson wrote:
> On 08/15/2010 12:15 PM, Artem Shinkarov wrote:
>> SSE and AltiVec at least define vector comparison result as a vector
>> of 0 and -1. So I think that this would be an exotic architecture (but
>> I'm not sure). Anyway, if the target returns boolean value, then the
>> implementation of the hook should return constant vector of zeros for
>> false and constant vector of -1-s for true.
> 
> MIPS isn't so exotic.  The result of a float-pair comparison is to
> set a pair of condition codes.  One can then use a branch that examines
> both CC results, or a special move instruction for the SELECT operation.

All that said, as long as we're clever enough to fold

  t1 = (v1 == v2);
  t2 = t1 ? v3 : v4;

where ?: is also as defined by OpenCL into

  t2 = VCOND< v1 == v2 ? v3 : v4 >

where VCOND is as defined by the existing gcc tree code,
then we're likely to produce optimal results for MIPS.
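
A scalar sketch of why that fold is safe, assuming the 0 / -1 mask
convention: the two-step form blends through the mask, the fused form
never materializes it, and both give the same lanes. The function names
are invented for the example and do not correspond to anything in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Two-step form: t1 = (v1 == v2); t2 = t1 ? v3 : v4;
   With t1 a 0 / -1 mask, the select is a bitwise blend.  */
void
select_two_step (const int *v1, const int *v2, const int *v3,
                 const int *v4, int *t2, size_t n)
{
  size_t i;

  assert (v1 != NULL && v2 != NULL && v3 != NULL && v4 != NULL);
  assert (t2 != NULL);
  for (i = 0; i < n; i++)
    {
      int t1 = (v1[i] == v2[i]) ? -1 : 0;
      t2[i] = (t1 & v3[i]) | (~t1 & v4[i]);
    }
}

/* Fused form: compare and select in one step, so a target that sets
   condition codes never has to build the mask vector.  */
void
select_fused (const int *v1, const int *v2, const int *v3,
              const int *v4, int *t2, size_t n)
{
  size_t i;

  assert (v1 != NULL && v2 != NULL && v3 != NULL && v4 != NULL);
  assert (t2 != NULL);
  for (i = 0; i < n; i++)
    t2[i] = (v1[i] == v2[i]) ? v3[i] : v4[i];
}
```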


r~
Nathan Froyd - Aug. 16, 2010, 7:33 p.m.
On Mon, Aug 16, 2010 at 12:24:28PM -0700, Richard Henderson wrote:
> On 08/15/2010 12:15 PM, Artem Shinkarov wrote:
> > SSE and AltiVec at least define vector comparison result as a vector
> > of 0 and -1. So I think that this would be an exotic architecture (but
> > I'm not sure). Anyway, if the target returns boolean value, then the
> > implementation of the hook should return constant vector of zeros for
> > false and constant vector of -1-s for true.
> 
> MIPS isn't so exotic.  The result of a float-pair comparison is to
> set a pair of condition codes.  One can then use a branch that examines
> both CC results, or a special move instruction for the SELECT operation.

Vector comparisons on PPC E500 work roughly the same way.  I suppose
your clarification of producing VCOND should also work for E500, though.

-Nathan
Richard Henderson - Aug. 16, 2010, 7:38 p.m.
On 08/15/2010 11:56 AM, Steven Bosscher wrote:
> On Sun, Aug 15, 2010 at 8:38 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>>> Please consider using tree codes all the way through the gimple IR if
>>> the target supports expansion and doing the expansion only at expand
>>> time.  Building a call expression is expensive and really you could
>>> use opcodes and not worry about a target hook.  I think opcodes are
>>> a much easier way for targets to add support than
>>> adding more and more target hooks.
>>
>> The hook approach was approved by Richard,
> 
> Hmm, it's unfortunate that this wasn't brought up earlier on the gcc
> list. I agree with pinski that using new codes here seems like the
> better approach.

I don't think we need new codes.

The difference between Cell and OpenCL is obviously a problem at the
language level (to be sorted by command-line flags no doubt), but at
the GIMPLE level it's easy to have both

  bool t = <eq <type boolean> v1 v2>;
  vect v = <eq <type vect> v1 v2>;

be valid.


r~
Artem Shinkarov - Aug. 20, 2010, 1:15 p.m.
Summing up the discussion.

Richard, do I understand correctly that the semantics of the new VCOND
should be as follows:

res = VCOND <v1 ? v2 : v3> means:
foreach (i in length (v1)) res[i] = v1[i] == 0 ? v3[i] : v2[i]?


And does anyone still have thoughts about the boolean return type of
vector comparison? Do we need it, and if so, how exactly do we
want to implement it?


Thank you,
Artem.
Richard Henderson - Aug. 20, 2010, 2:44 p.m.
On 08/20/2010 06:15 AM, Artem Shinkarov wrote:
> Summing up the discussion.
> 
> Richard, do I understand correctly that the semantics of the new VCOND
> should be as follows:
> 
> res = VCOND <v1 ? v2 : v3> means:
> foreach (i in length (v1)) res[i] = v1[i] == 0 ? v3[i] : v2[i]?

Yes.

> And does anyone still have thoughts about the boolean return type of
> vector comparison? Do we need it, and if so, how exactly do we
> want to implement it?

Apparently we need it at the language level, at least for Cell.

Whether we implement that via normal comparison operators with
a boolean result, or whether we define additional operators
that operate on the vector compare result to produce a boolean
is still something for debate.

The issue I have with implementing it with the normal comparison
operators is that we hard-code a single definition -- either all
elements must match or at least one element must match.  It would
be better for gimple to define something like

  VEC_TEST_ANY < (cmp v1 v2) >
  VEC_TEST_ALL < (cmp v1 v2) >

I can see how these could be easily expanded for sse, altivec,
and mips-ps.
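
As a scalar sketch of the two reductions, assuming the input is the
0 / -1 mask produced by a vector compare. The lower-case function names
only mirror the proposed codes; they are illustrative and not existing
GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if at least one lane of the compare result is set.  */
bool
vec_test_any (const int *mask, size_t n)
{
  size_t i;

  assert (mask != NULL);
  for (i = 0; i < n; i++)
    if (mask[i] != 0)
      return true;
  return false;
}

/* True if every lane of the compare result is set.  */
bool
vec_test_all (const int *mask, size_t n)
{
  size_t i;

  assert (mask != NULL);
  for (i = 0; i < n; i++)
    if (mask[i] == 0)
      return false;
  return true;
}
```

Keeping the two as separate operations leaves the choice between "any"
and "all" to the language front end instead of hard-coding it into the
comparison operator.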

Comments, Andrew?


r~

Patch

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 163244)
+++ gcc/doc/extend.texi	(working copy)
@@ -6141,6 +6141,26 @@  minus or complement operators on a vecto
 elements are the negative or complemented values of the corresponding
 elements in the operand.
 
+Vector comparison is supported with the standard comparison operators:
+@code{==, !=, <, <=, >, >=}.  Both integer-type and real-type vectors
+can be compared, but only with vectors of the same type.  The result of
+the comparison is a signed integer-type vector in which each element
+has the same size as an element of the compared vectors.  The
+comparison is performed element by element.  False is 0 and true is -1
+(a constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 163244)
+++ gcc/doc/tm.texi	(working copy)
@@ -5710,6 +5710,10 @@  misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using the hardware-specific instructions and return the result tree.  The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 163244)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5710,6 +5710,8 @@  misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 163244)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@  default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Replaces vector comparison with the target-specific instructions 
+   and returns the resulting variable or NULL_TREE otherwise.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 163244)
+++ gcc/targhooks.h	(working copy)
@@ -82,6 +82,11 @@  extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 163244)
+++ gcc/target.def	(working copy)
@@ -836,6 +836,15 @@  DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using the hardware-specific instructions and return the result \
+tree.  The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 163244)
+++ gcc/tree.c	(working copy)
@@ -1358,6 +1358,28 @@  build_vector_from_ctor (tree type, VEC(c
   return build_vector (type, nreverse (list));
 }
 
+/* Build a vector of type VECTYPE where all the elements are SCs.  */
+tree
+build_vector_from_val (const tree sc, const tree vectype) 
+{
+  tree t = NULL_TREE;
+  int i, nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (sc == error_mark_node)
+    return sc;
+
+  gcc_assert (TREE_TYPE (sc) == TREE_TYPE (vectype));
+
+  for (i = 0; i < nunits; ++i)
+    t = tree_cons (NULL_TREE, sc, t);
+
+  if (CONSTANT_CLASS_P (sc))
+    return build_vector (vectype, t);
+  else 
+    return build_constructor_from_list (vectype, t);
+}
+
+
 /* Return a new CONSTRUCTOR node whose type is TYPE and whose values
    are in the VEC pointed to by VALS.  */
 tree
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 163244)
+++ gcc/tree.h	(working copy)
@@ -4027,6 +4027,7 @@  extern tree build_int_cst_type (tree, HO
 extern tree build_int_cst_wide (tree, unsigned HOST_WIDE_INT, HOST_WIDE_INT);
 extern tree build_vector (tree, tree);
 extern tree build_vector_from_ctor (tree, VEC(constructor_elt,gc) *);
+extern tree build_vector_from_val (const tree, const tree);
 extern tree build_constructor (tree, VEC(constructor_elt,gc) *);
 extern tree build_constructor_single (tree, tree, tree);
 extern tree build_constructor_from_list (tree, tree);
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 163244)
+++ gcc/target.h	(working copy)
@@ -51,7 +51,7 @@ 
 
 #include "tm.h"
 #include "insn-modes.h"
-
+#include "gimple.h"
 /* Types used by the record_gcc_switches() target function.  */
 typedef enum
 {
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,109 @@ 
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, op) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i)) ? (type)-1 : 0)) { \
+            __builtin_printf ("%i != ((%i " #op " %i) ? -1 : 0) ", vidx (type, res, __i), \
+                              vidx (type, i0, __i), vidx (type, i1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, res) \
+do { \
+    res = (v0 > v1); \
+    check_compare (type, count, res, v0, v1, >); \
+    res = (v0 < v1); \
+    check_compare (type, count, res, v0, v1, <); \
+    res = (v0 >= v1); \
+    check_compare (type, count, res, v0, v1, >=); \
+    res = (v0 <= v1); \
+    check_compare (type, count, res, v0, v1, <=); \
+    res = (v0 == v1); \
+    check_compare (type, count, res, v0, v1, ==); \
+    res = (v0 != v1); \
+    check_compare (type, count, res, v0, v1, !=); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (INT, 4, i0, i1, ires);
+#undef INT
+
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (INT, 4, u0, u1, ures);
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (SHORT, 8, s0, s1, sres);
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (SHORT, 8, us0, us1, usres);
+#undef SHORT
+
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (CHAR, 16, c0, c1, cres);
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (CHAR, 16, uc0, uc1, ucres);
+#undef CHAR
+
+
+    return 0;
+}
+
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 163244)
+++ gcc/c-typeck.c	(working copy)
@@ -9606,6 +9606,29 @@  build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -9718,6 +9741,29 @@  build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10113,6 +10159,10 @@  c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 163244)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@  along with GCC; see the file COPYING3.  
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -125,6 +126,21 @@  do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, boolean_type_node, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -284,6 +300,21 @@  expand_vector_addition (gimple_stmt_iter
 				    a, b, code);
 }
 
+/* Try the target hook for vector comparison; if it fails,
+   expand the comparison piecewise.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -326,8 +357,24 @@  expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 163244)
+++ gcc/Makefile.in	(working copy)
@@ -864,7 +864,7 @@  endif
 VEC_H = vec.h statistics.h
 EXCEPT_H = except.h $(HASHTAB_H) vecprim.h vecir.h
 TOPLEV_H = toplev.h $(INPUT_H) bversion.h $(DIAGNOSTIC_CORE_H)
-TARGET_H = $(TM_H) target.h target.def insn-modes.h
+TGT = $(TM_H) target.h target.def insn-modes.h
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
 HOOKS_H = hooks.h $(MACHMODE_H)
 HOSTHOOKS_DEF_H = hosthooks-def.h $(HOOKS_H)
@@ -886,8 +886,9 @@  TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	$(GGC_H) $(BASIC_BLOCK_H) $(TM_H) $(TARGET_H) tree-ssa-operands.h \
+	$(GGC_H) $(BASIC_BLOCK_H) $(TM_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h vecir.h
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3156,7 +3157,7 @@  tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 163244)
+++ gcc/tree-cfg.c	(working copy)
@@ -3144,6 +3144,38 @@  verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (op0_type) == VECTOR_TYPE 
+      && TREE_CODE (op1_type) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TYPE_VECTOR_SUBPARTS (op0_type) != TYPE_VECTOR_SUBPARTS (op1_type))
+        {
+          error ("invalid vector comparison, number of elements does not match");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
+        {
+          error ("invalid vector comparison, vector element type mismatch");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 163244)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@  along with GCC; see the file COPYING3.  
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -30222,6 +30223,276 @@  ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for comparing the real-type
+   vectors V0 and V1.  Return a variable containing the result of
+   the comparison, or NULL_TREE if no such sequence is available.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for comparing the integer-type
+   vectors V0 and V1.  Return a variable containing the result of
+   the comparison, or NULL_TREE if no such sequence is available.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* tmp2 = (v0 == v0) is all ones, so t = tmp1 ^ tmp2 = ~tmp1.  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of comparison. Returns NULL_TREE
+   when it is impossible to find a target specific sequence.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* The hardware compares packed integers as signed, so unsigned
+     vectors are supported only for the EQ and NE operations.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -31715,6 +31986,11 @@  ix86_enum_va_list (int idx, const char *
 #define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \
   ix86_vectorize_builtin_vec_perm_ok
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function