Patchwork [combine] Fix 16-bit -> 64-bit multiply and accumulate

Submitter Andrew Stubbs
Date May 18, 2011, 11:55 a.m.
Message ID <4DD3B39A.4090905@codesourcery.com>
Permalink /patch/96156/
State New

Comments

Andrew Stubbs - May 18, 2011, 11:55 a.m.
On 03/05/11 10:07, Bernd Schmidt wrote:
> On 04/15/2011 12:54 PM, Andrew Stubbs wrote:
>> Ping.
>
> Trying to unblock this...
>
> I think the point is that both examples:
>
> long long foolong (long long x, short *a, short *b)
>     {
>         return x + (long long)*a * (long long)*b;
>     }
>
> and
>
>
>    long long foolong (long long x, short *a, short *b)
>    {
>        return x + *a * *b;
>    }
>
> should produce the same code using the maddhidi pattern, and for that to
> happen, simplify_rtx must produce a canonical form of the 16->64 bit
> widening multiply. The form that already exists in arm.md looks better
> so we should pick that.
>
> I tried to fix it with the patch below, which unfortunately doesn't work
> since during combine we don't see the SIGN_EXTEND operations inside the
> MULT, but two shift operations instead. Maybe you can complete it from here?

This patch fixes my original testcase (the second one above), and so can 
be a complete replacement for that patch, if preferred? I did originally 
want to do it without exporting the function from combine.c (since 
nothing else appears to be exported like that), but it required moving 
too much code between files.

The patch does not fix the first test case above because combine does 
not recognise the testcase as a HI -> DI multiply-and-add. It can't do 
it because it has already combined the HI load with a HI->DI 
sign-extend, and there's neither a multiply that supports mem arguments, 
nor a multiply with DI mode inputs. I could add define_insn_and_split 
patterns to catch these cases, but that seems the wrong way to do it ...

The real problem is that the widening-multiply tree optimization pass 
does not support multiplies that widen by more than one mode. As far as 
I can tell, ARM is the only backend that tries to do this, although cris 
has a comment to the effect that it would use it if it worked. I am 
investigating a way to fix this by introducing an optab matrix akin to 
the existing one for conversions, but that's another patch.

Is this patch OK? (Assuming my tests pass.)

Andrew
Bernd Schmidt - May 19, 2011, 12:31 p.m.
On 05/18/2011 11:55 AM, Andrew Stubbs wrote:
>  
>      case SIGN_EXTEND:
> +      op = make_compound_operation (op, SET);
> +

I would have expected that bit to go into the MULT case (doing it for
both operands), so that the effect is limited to just making widening
multiplies more obvious.

Also, using make_compound_operation unchanged is probably not safe - it
calls SUBST which I think is only valid during combine. I think you may
have to break out the non-recursive part into a
make_compound_operation_1 (probably to be placed into simplify-rtx.c).


Bernd

Patch

2011-05-18  Bernd Schmidt  <bernds@codesourcery.com>
	    Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* combine.c (make_compound_operation): Remove 'static'.
	* rtl.h (make_compound_operation): New prototype.
	* simplify-rtx.c (simplify_unary_operation_1): Create a new
	canonical form for widening multiplies.

--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -427,7 +427,6 @@  static const_rtx expand_field_assignment (const_rtx);
 static rtx make_extraction (enum machine_mode, rtx, HOST_WIDE_INT,
 			    rtx, unsigned HOST_WIDE_INT, int, int, int);
 static rtx extract_left_shift (rtx, int);
-static rtx make_compound_operation (rtx, enum rtx_code);
 static int get_pos_from_mask (unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT *);
 static rtx canon_reg_for_combine (rtx, rtx);
@@ -7516,7 +7515,7 @@  extract_left_shift (rtx x, int count)
    being kludges), it is MEM.  When processing the arguments of a comparison
    or a COMPARE against zero, it is COMPARE.  */
 
-static rtx
+rtx
 make_compound_operation (rtx x, enum rtx_code in_code)
 {
   enum rtx_code code = GET_CODE (x);
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2372,6 +2372,7 @@  extern unsigned int extended_count (const_rtx, enum machine_mode, int);
 extern rtx remove_death (unsigned int, rtx);
 extern void dump_combine_stats (FILE *);
 extern void dump_combine_total_stats (FILE *);
+extern rtx make_compound_operation (rtx, enum rtx_code);
 
 /* In cfgcleanup.c  */
 extern void delete_dead_jumptables (void);
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -989,6 +989,8 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
       break;
 
     case SIGN_EXTEND:
+      op = make_compound_operation (op, SET);
+
       /* (sign_extend (truncate (minus (label_ref L1) (label_ref L2))))
 	 becomes just the MINUS if its mode is MODE.  This allows
 	 folding switch statements on machines using casesi (such as
@@ -1000,6 +1002,23 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
 	return XEXP (op, 0);
 
+      /* Extending a widening multiplication should be canonicalized to
+	 a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT
+	  && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+	  && GET_CODE (XEXP (op, 1)) == SIGN_EXTEND)
+	{
+	  rtx op0 = XEXP (XEXP (op, 0), 0);
+	  rtx op1 = XEXP (XEXP (op, 1), 0);
+	  enum machine_mode op0_mode = GET_MODE (op0);
+	  enum machine_mode op1_mode = GET_MODE (op1);
+	  return simplify_gen_binary (MULT, mode,
+				      simplify_gen_unary (SIGN_EXTEND, mode,
+							  op0, op0_mode),
+				      simplify_gen_unary (SIGN_EXTEND, mode,
+							  op1, op1_mode));
+	}
+
       /* Check for a sign extension of a subreg of a promoted
 	 variable, where the promotion is sign-extended, and the
 	 target mode is the same as the variable's promotion.  */
@@ -1071,6 +1090,23 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+      /* Extending a widening multiplication should be canonicalized to
+	 a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT
+	  && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+	  && GET_CODE (XEXP (op, 1)) == ZERO_EXTEND)
+	{
+	  rtx op0 = XEXP (XEXP (op, 0), 0);
+	  rtx op1 = XEXP (XEXP (op, 1), 0);
+	  enum machine_mode op0_mode = GET_MODE (op0);
+	  enum machine_mode op1_mode = GET_MODE (op1);
+	  return simplify_gen_binary (MULT, mode,
+				      simplify_gen_unary (ZERO_EXTEND, mode,
+							  op0, op0_mode),
+				      simplify_gen_unary (ZERO_EXTEND, mode,
+							  op1, op1_mode));
+	}
+
       /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
       if (GET_CODE (op) == ZERO_EXTEND)
 	return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),