Patchwork [rs6000] Correct handling of multiply high-part for little endian

login
register
mail settings
Submitter William J. Schmidt
Date Oct. 30, 2013, 10:55 p.m.
Message ID <1383173749.6275.231.camel@gnopaine>
Download mbox | patch
Permalink /patch/287362/
State New
Headers show

Comments

William J. Schmidt - Oct. 30, 2013, 10:55 p.m.
Hi,

When working around the peculiar little-endian semantics of the vperm
instruction, our usual fix is to complement the permute control vector
and swap the order of the two vector input operands, so that we get a
double-wide vector in the proper order.  We don't want to swap the
operands when we are expanding a mult_highpart operation, however, as
the two input operands are not to be interpreted as a double-wide
vector.  Instead they represent odd and even elements, and swapping the
operands gets the odd and even elements reversed in the final result.

The permute for this case is generated by target-neutral code in
optabs.c: expand_mult_highpart ().  We obviously can't change that code
directly.  However, we can redirect the logic from the "case 2" method
to target-specific code by implementing expansions for the
umul<mode>3_highpart and smul<mode>3_highpart operations.  I've done
this, with the expansions acting exactly as expand_mult_highpart does
today, with the exception that it swaps the input operands to the call
to expand_vec_perm when we are generating little-endian code.  We will
later swap them back to their original position in the code in rs6000.c:
altivec_expand_vec_perm_const_le ().

The change has no intended effect when generating big-endian code.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
regressions.  This fixes the gcc.dg/vect/pr51581-4.c test failure for
little endian.  Ok for trunk?

Thanks,
Bill


2013-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
	* config/rs6000/altivec.md (umul<mode>3_highpart): New.
	(smul_<mode>3_highpart): New.
David Edelsohn - Oct. 31, 2013, 12:06 a.m.
On Wed, Oct 30, 2013 at 6:55 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> When working around the peculiar little-endian semantics of the vperm
> instruction, our usual fix is to complement the permute control vector
> and swap the order of the two vector input operands, so that we get a
> double-wide vector in the proper order.  We don't want to swap the
> operands when we are expanding a mult_highpart operation, however, as
> the two input operands are not to be interpreted as a double-wide
> vector.  Instead they represent odd and even elements, and swapping the
> operands gets the odd and even elements reversed in the final result.
>
> The permute for this case is generated by target-neutral code in
> optabs.c: expand_mult_highpart ().  We obviously can't change that code
> directly.  However, we can redirect the logic from the "case 2" method
> to target-specific code by implementing expansions for the
> umul<mode>3_highpart and smul<mode>3_highpart operations.  I've done
> this, with the expansions acting exactly as expand_mult_highpart does
> today, with the exception that it swaps the input operands to the call
> to expand_vec_perm when we are generating little-endian code.  We will
> later swap them back to their original position in the code in rs6000.c:
> altivec_expand_vec_perm_const_le ().
>
> The change has no intended effect when generating big-endian code.
>
> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new
> regressions.  This fixes the gcc.dg/vect/pr51581-4.c test failure for
> little endian.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> 2013-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-protos.h (altivec_expand_mul_highpart): New
>         prototype.
>         * config/rs6000/rs6000.c (altivec_expand_mul_highpart): New.
>         * config/rs6000/altivec.md (umul<mode>3_highpart): New.
>         (smul_<mode>3_highpart): New.

I really do not like duplicating this code.  I think that you need to
explore with the community the possibility of including a hook in the
general code to handle the strangeness of PPC LE vector semantics.

This is asking for problems if the generic code is updated / modified / fixed.

- David

Patch

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 204192)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -58,6 +58,7 @@  extern void rs6000_expand_vector_extract (rtx, rtx
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
+extern bool altivec_expand_mul_highpart (rtx op[3], bool);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
 extern void build_mask64_2_operands (rtx, rtx *);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 204192)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -29249,6 +29249,58 @@  rs6000_do_expand_vec_perm (rtx target, rtx op0, rt
     emit_move_insn (target, x);
 }
 
+/* Expand an Altivec multiply high-part.  The logic matches the 
+   general logic in optabs.c:expand_mult_highpart, but swaps the
+   inputs for little endian.  Note that we will swap them again
+   during the permute; this is the one case where we don't want
+   the operands swapped, as if we do we get the even and odd values
+   reversed.  */
+
+bool
+altivec_expand_mul_highpart (rtx operands[3], bool uns_p)
+{
+  struct expand_operand eops[3];
+  rtx m1, m2, perm, tmp;
+  int i;
+
+  optab tab1 = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab;
+  optab tab2 = uns_p ? vec_widen_umult_odd_optab : vec_widen_umult_odd_optab;
+  enum machine_mode mode = GET_MODE (operands[0]);
+  enum insn_code icode = optab_handler (tab1, mode);
+  int nunits = GET_MODE_NUNITS (mode);
+  enum machine_mode wmode = insn_data[icode].operand[0].mode;
+  rtvec v = rtvec_alloc (nunits);
+
+  create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
+  create_input_operand (&eops[1], operands[1], mode);
+  create_input_operand (&eops[2], operands[2], mode);
+  expand_insn (icode, 3, eops);
+  m1 = gen_lowpart (mode, eops[0].value);
+
+  create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
+  create_input_operand (&eops[1], operands[1], mode);
+  create_input_operand (&eops[2], operands[2], mode);
+  expand_insn (optab_handler (tab2, mode), 3, eops);
+  m2 = gen_lowpart (mode, eops[0].value);
+
+  for (i = 0; i < nunits; ++i)
+    RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
+				+ ((i & 1) ? nunits : 0));
+
+  perm = gen_rtx_CONST_VECTOR (mode, v);
+
+  if (!BYTES_BIG_ENDIAN) {
+    tmp = m1;
+    m1 = m2;
+    m2 = tmp;
+  }
+
+  perm = expand_vec_perm (mode, m1, m2, perm, NULL_RTX);
+  emit_move_insn (operands[0], perm);
+
+  return true;
+}
+
 /* Expand an extract even operation.  */
 
 void
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204192)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1401,6 +1401,30 @@ 
     FAIL;
 })
 
+(define_expand "umul<mode>3_highpart"
+  [(match_operand:VIshort 0 "register_operand" "")
+   (match_operand:VIshort 1 "register_operand" "")
+   (match_operand:VIshort 2 "register_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  if (altivec_expand_mul_highpart (operands, /*uns_p*/true))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_expand "smul<mode>3_highpart"
+  [(match_operand:VIshort 0 "register_operand" "")
+   (match_operand:VIshort 1 "register_operand" "")
+   (match_operand:VIshort 2 "register_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  if (altivec_expand_mul_highpart (operands, /*uns_p*/false))
+    DONE;
+  else
+    FAIL;
+})
+
 (define_insn "altivec_vrfip"		; ceil
   [(set (match_operand:V4SF 0 "register_operand" "=v")
         (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")]