
[rs6000] (3/3) Fix mulv4si3 and mulv8hi3 patterns for little endian

Message ID 1383543684.6275.312.camel@gnopaine
State New

Commit Message

Bill Schmidt Nov. 4, 2013, 5:41 a.m. UTC
Hi,

This patch contains two more fixes related to the multiply even/odd
problem.  First, it changes the mulv4si3 pattern so that it always uses
the vmulouh (vector multiply odd halfword) instruction regardless of
endianness.  The reason is that we are not really multiplying
halfwords, but words that have been truncated to halfword length;
the operands therefore always sit in the low-order half of each
word, which is the odd-numbered halfword in big-endian element
numbering.

The fix for mulv8hi3 is another case where reversing the meanings of the
multiply even/odd instructions requires us to reverse the order of
inputs on the merge high/low instructions to compensate.
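The merge-operand swap can be sketched with a Python model of the BE
semantics of vmrghw/vmrglw (interleave the two high or two low words of the
inputs).  The helper names are illustrative, not GCC APIs; the point is only
that the interleave order follows the operand order, so when the even/odd
multiplies are reversed, swapping the merge operands restores lane order.

```python
# BE semantics of the AltiVec word merges on four-word vectors.

def vmrghw(a, b):
    """Interleave the two high words of each input (BE element order)."""
    return [a[0], b[0], a[1], b[1]]

def vmrglw(a, b):
    """Interleave the two low words of each input (BE element order)."""
    return [a[2], b[2], a[3], b[3]]

e = [1, 2, 3, 4]      # conceptually, products of even-numbered lanes
o = [10, 20, 30, 40]  # conceptually, products of odd-numbered lanes

# Big endian: merging (even, odd) interleaves the products back into
# their original lane order.
print(vmrghw(e, o) + vmrglw(e, o))  # [1, 10, 2, 20, 3, 30, 4, 40]

# Little endian: the even/odd expanders produce the opposite lanes, so
# the merge operands must be swapped to get the same interleaving.
print(vmrghw(o, e) + vmrglw(o, e))  # [10, 1, 20, 2, 30, 3, 40, 4]
```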

Bootstrapped and tested with the rest of the patch set on
powerpc64{,le}-unknown-linux-gnu with no regressions.  Ok for trunk?

Thanks,
Bill


2013-11-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
	for both big and little endian.
	(mulv8hi3): Swap input operands for merge high and merge low
	instructions.

Comments

Richard Sandiford Nov. 4, 2013, 3:48 p.m. UTC | #1
Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
> +   /* We need this to be vmulouh for both big and little endian,
> +      but for little endian we would swap this, so avoid that.  */
> +   if (BYTES_BIG_ENDIAN)
> +     emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> +   else
> +     emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));

FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
define_expands and have define_insns for the underlying instructions.
E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
depending on endianness.

Then the unspec name would always match the instruction, and you could
also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.

It probably works out as more code overall, but maybe it means jumping
through fewer mental hoops...

Thanks,
Richard
Bill Schmidt Nov. 4, 2013, 4:29 p.m. UTC | #2
On Mon, 2013-11-04 at 15:48 +0000, Richard Sandiford wrote:
> Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
> > +   /* We need this to be vmulouh for both big and little endian,
> > +      but for little endian we would swap this, so avoid that.  */
> > +   if (BYTES_BIG_ENDIAN)
> > +     emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> > +   else
> > +     emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));
> 
> FWIW, an alternative would be to turn vec_widen_smult_{even,odd}_* into
> define_expands and have define_insns for the underlying instructions.
> E.g. vec_widen_umult_even_v16qi could call gen_vmuleub or gen_vmuloub
> depending on endianness.
> 
> Then the unspec name would always match the instruction, and you could
> also use gen_vmulouh rather than gen_vec_widen_umult_*_v8hi above.
> 
> It probably works out as more code overall, but maybe it means jumping
> through fewer mental hoops...

Good idea.  I'll have a look and produce a new patch set shortly.

Thanks,
Bill

> 
> Thanks,
> Richard

Patch

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 204192)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -651,7 +651,12 @@ 
    convert_move (small_swap, swap, 0);
  
    low_product = gen_reg_rtx (V4SImode);
-   emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
+   /* We need this to be vmulouh for both big and little endian,
+      but for little endian we would swap this, so avoid that.  */
+   if (BYTES_BIG_ENDIAN)
+     emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
+   else
+     emit_insn (gen_vec_widen_umult_even_v8hi (low_product, one, two));
  
    high_product = gen_reg_rtx (V4SImode);
    emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
@@ -678,13 +683,18 @@ 
    emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
    emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
 
-   emit_insn (gen_altivec_vmrghw (high, even, odd));
-   emit_insn (gen_altivec_vmrglw (low, even, odd));
-
    if (BYTES_BIG_ENDIAN)
-     emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+     {
+       emit_insn (gen_altivec_vmrghw (high, even, odd));
+       emit_insn (gen_altivec_vmrglw (low, even, odd));
+       emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+     }
    else
-     emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+     {
+       emit_insn (gen_altivec_vmrghw (high, odd, even));
+       emit_insn (gen_altivec_vmrglw (low, odd, even));
+       emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+     } 
 
    DONE;
 }")