Patchwork [ia64,rfa] vector pattern improvements

login
register
mail settings
Submitter Steve Ellcey
Date Jan. 6, 2011, 5:25 p.m.
Message ID <201101061725.p06HPCX21825@lucas.cup.hp.com>
Download mbox | patch
Permalink /patch/77746/
State New
Headers show

Comments

Steve Ellcey - Jan. 6, 2011, 5:25 p.m.
Richard,

I am trying to fix the new IA64 vector instructions you added so they
work on HP-UX (big-endian) and would like to get some feedback on
whether or not I am going about it in the correct way.  The attached
patch fixes some (but not most) of the gcc.dg/vect failures I am seeing
and I would like to verify that the vect_extract* instructions I am 
changing are the best place to address the big-endian/little-endian
Richard Henderson - Jan. 6, 2011, 6:53 p.m.
On 01/06/2011 09:25 AM, Steve Ellcey wrote:
> -  emit_insn (gen_mix1_r (temp, operands[1], operands[2]));
> +  if (TARGET_BIG_ENDIAN)
> +    emit_insn (gen_mix1_r (temp, operands[2], operands[1]));
> +  else
> +    emit_insn (gen_mix1_r (temp, operands[1], operands[2]));

Oh, I understand the problem now.

The Root Cause is that gcc numbers vector elements in their
memory ordering.  Thus the numbering of the elements as seen
in the register change based on TARGET_BIG_ENDIAN.  Thus just
about all of the (VEC_SELECT * (PARALLEL)) patterns are wrong
for big endian.

Something like the patch you propose may be required, yes.

I'm trying to think of a way to fix the other problems without
doubling the number of instruction patterns though.

One possibility is to do

(define_special_predicate "select_mix1_r_parallel"
  (match_code "parallel")
{
  static const int order[2][8] = {
    { 0, 8, 2, 10, 4, 12, 6, 14 }, /* le */
    { 8, 0, 10, 2, 12, 4, 14, 6 }  /* be */
  };
  return match_select_parallel(op, 8, order[TARGET_BIG_ENDIAN]);
})

(define_insn "mix1_r"
  [(set (match_operand:V8QI 0 "gr_register_operand" "=r")
        (vec_select:V8QI
          (vec_concat:V16QI
            (match_operand:V8QI 1 "gr_reg_or_0_operand" "rU")
            (match_operand:V8QI 2 "gr_reg_or_0_operand" "rU"))
	  (match_operand 3 "select_mix1_r_parallel" "")))]
  ""
  "mix1.r %0 = %r2, %r1"
  [(set_attr "itanium_class" "mmshf")])

bool
match_select_parallel (rtx par, int nelt, const int *order)
{
  int i;

  if (XVECLEN (par, 0) != nelt)
    return false;

  for (i = 0; i < nelt; ++i)
    {
      rtx e = XVECEXP (par, 0, i);
      if (GET_CODE (e) != CONST_INT)
        return false;
      if (INTVAL (e) != order[i])
	return false;
    }

  return true;
}

Although for this specific case it would probably be better
to simply manually swap the operands in the output template:

  {
    if (TARGET_BIG_ENDIAN)
      return "mix1.r %0 = %r1, %r2";
    else
      return "mix1.r %0 = %r2, %r1";
  }

but this would not be true of all of the patterns.

Another possibility is to simply give up on representing these
instructions exactly and instead use UNSPECs.

I think the proper representation is very much preferable,
since then we can do something akin to the i386 port with the
expand_vec_perm scheme, where we search for combinations of
permutations that we can support.

It's somewhat unfortunate that there's no ia64-hpux available
in the compile farm.  I could do the bulk conversion and test
on linux, but I'd still have to rely on you to test hpux.


r~
Steve Ellcey - Jan. 6, 2011, 8:22 p.m.
On Thu, 2011-01-06 at 10:53 -0800, Richard Henderson wrote:

> 
> Although for this specific case it would probably be better
> to simply manually swap the operands in the output template:
> 
>   {
>     if (TARGET_BIG_ENDIAN)
>       return "mix1.r %0 = %r1, %r2";
>     else
>       return "mix1.r %0 = %r2, %r1";
>   }
> 
> but this would not be true of all of the patterns.

If swapping arguments doesn't work for all patterns then I think I would
rather use the predicate method in all cases instead of having some use
the select*parallel predicate and others swapping arguments.

> Another possibility is to simply give up on representing these
> instructions exactly and instead use UNSPECs.
> 
> I think the proper representation is very much preferable,
> since then we can do something akin to the i386 port with the
> expand_vec_perm scheme, where we search for combinations of
> permutations that we can support.

Yes, I think using a proper representation instead of UNSPEC's is
preferable too.

> It's somewhat unfortunate that there's no ia64-hpux available
> in the compile farm.  I could do the bulk conversion and test
> on linux, but I'd still have to rely on you to test hpux.

I can certainly do any HP-UX testing that is needed if you create
a patch.

Steve Ellcey
sje@cup.hp.com

Patch

differences.

At first I tried changing the vec_pack to call gen_vec_extract* with
different options but now I think leaving the vec_pack* instructions
alone and changing vec_extract* is better, but I am not sure if one
or the other of these locations is considered the 'right' place to
address endian issues.

Can you offer any advise?

Steve Ellcey
sje@cup.hp.com


Index: config/ia64/vect.md
===================================================================
--- config/ia64/vect.md	(revision 168534)
+++ config/ia64/vect.md	(working copy)
@@ -1,5 +1,5 @@ 
 ;; IA-64 machine description for vector operations.
-;; Copyright (C) 2004, 2005, 2007, 2010 Free Software Foundation, Inc.
+;; Copyright (C) 2004, 2005, 2007 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
 ;;
@@ -857,7 +857,10 @@  (define_expand "vec_extract_evenv8qi"
   ""
 {
   rtx temp = gen_reg_rtx (V8QImode);
-  emit_insn (gen_mix1_r (temp, operands[1], operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_mix1_r (temp, operands[2], operands[1]));
+  else
+    emit_insn (gen_mix1_r (temp, operands[1], operands[2]));
   emit_insn (gen_mux1_alt (operands[0], temp));
   DONE;
 })
@@ -869,7 +872,10 @@  (define_expand "vec_extract_oddv8qi"
   ""
 {
   rtx temp = gen_reg_rtx (V8QImode);
-  emit_insn (gen_mix1_l (temp, operands[1], operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_mix1_l (temp, operands[2], operands[1]));
+  else
+    emit_insn (gen_mix1_l (temp, operands[1], operands[2]));
   emit_insn (gen_mux1_alt (operands[0], temp));
   DONE;
 })
@@ -967,7 +973,10 @@  (define_expand "vec_extract_evenv4hi"
   ""
 {
   rtx temp = gen_reg_rtx (V4HImode);
-  emit_insn (gen_mix2_r (temp, operands[1], operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_mix2_r (temp, operands[2], operands[1]));
+  else
+    emit_insn (gen_mix2_r (temp, operands[1], operands[2]));
   emit_insn (gen_vec_extract_evenodd_helper (operands[0], temp));
   DONE;
 })
@@ -979,7 +988,10 @@  (define_expand "vec_extract_oddv4hi"
   ""
 {
   rtx temp = gen_reg_rtx (V4HImode);
-  emit_insn (gen_mix2_l (temp, operands[1], operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_mix2_l (temp, operands[2], operands[1]));
+  else
+    emit_insn (gen_mix2_l (temp, operands[1], operands[2]));
   emit_insn (gen_vec_extract_evenodd_helper (operands[0], temp));
   DONE;
 })
@@ -1024,8 +1036,12 @@  (define_expand "vec_extract_evenv2si"
    (match_operand:V2SI 2 "gr_register_operand" "")]
   ""
 {
-  emit_insn (gen_vec_interleave_lowv2si (operands[0], operands[1],
-					 operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_vec_interleave_lowv2si (operands[0], operands[2],
+					   operands[1]));
+  else
+    emit_insn (gen_vec_interleave_lowv2si (operands[0], operands[1],
+					   operands[2]));
   DONE;
 })
 
@@ -1035,8 +1051,12 @@  (define_expand "vec_extract_oddv2si"
    (match_operand:V2SI 2 "gr_register_operand" "")]
   ""
 {
-  emit_insn (gen_vec_interleave_highv2si (operands[0], operands[1],
-					  operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_vec_interleave_lowv2si (operands[0], operands[2],
+					   operands[1]));
+  else
+    emit_insn (gen_vec_interleave_highv2si (operands[0], operands[1],
+					    operands[2]));
   DONE;
 })
 
@@ -1399,8 +1419,12 @@  (define_expand "vec_extract_evenv2sf"
    (match_operand:V2SF 2 "gr_register_operand" "")]
   ""
 {
-  emit_insn (gen_vec_interleave_lowv2sf (operands[0], operands[1],
-					 operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_vec_interleave_highv2sf (operands[0], operands[2],
+					   operands[1]));
+  else
+    emit_insn (gen_vec_interleave_lowv2sf (operands[0], operands[1],
+					   operands[2]));
   DONE;
 })
 
@@ -1410,8 +1434,12 @@  (define_expand "vec_extract_oddv2sf"
    (match_operand:V2SF 2 "gr_register_operand" "")]
   ""
 {
-  emit_insn (gen_vec_interleave_highv2sf (operands[0], operands[1],
-					  operands[2]));
+  if (TARGET_BIG_ENDIAN)
+    emit_insn (gen_vec_interleave_lowv2sf (operands[0], operands[2],
+					    operands[1]));
+  else
+    emit_insn (gen_vec_interleave_highv2sf (operands[0], operands[1],
+					    operands[2]));
   DONE;
 })