diff mbox

[Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding

Message ID n99oai9epez.fsf@arm.com
State New
Headers show

Commit Message

Jiong Wang Aug. 14, 2015, 5:40 p.m. UTC
Jeff Law writes:

> On 04/29/2015 03:36 PM, Jiong Wang wrote:
>>
>> Jeff Law writes:
>>
>>> On 04/27/2015 02:21 PM, Jiong Wang wrote:
>>>
>>>> Jeff,
>>>>
>>>>     Sorry, I can't understand the meaning of "overlap between t_low and low",
>>>>     assume "right" in "right value" means the opposite of "left" not
>>>>     "correct".
>>>>
>>>>     So what you mean is t_low and low share the same pseudo regiser?
>>> My concern is sharing the same pseudo or memory location.  But thinking
>>> more about it, the shifted value has to have range information, so it
>>> must have been an SSA_NAME, right?  If so, then it can't overlap with
>>> the destination, so this is a non-issue.  Sorry for the confusion.
>>
>> Thanks for the light. By looking at related code, looks like even it's
>> SSA_NAME, it's still possible to share the same pseudo given the
>> destination is in the same SSA map parition after ssa name coleascing?
> If they're the same size and have non-overlapping lifetimes, then yes, 
> they could be the same pseudo.  That ought to be easy to check. 
> Thankfully we don't have to worry about MEMs, which is a harder check.
>
>> OK. I will rework the patch, and I found there is a function named
>> "expand_doubleword_shift" which looks like a more natural place to do
>> this optimization, although it's hard to get range info there. I will do
>> further explore on this.
> Sounds good.
>
> jeff

Jeff,

  Sorry for the slow response.

  I found we can't integrate this transformation into
  "expand_doubleword_shift" as it's at quite deep layer in the call
  stack, when we reach there, we have lost some high level info.

  So I am keeping this transformaion still in expr.c, and attachment is
  the updated version of this patch. The main changes are:

  * Figuring out whether the shift source is coming from sign extension
    by checking SSA_NAME_DEF_STMT instead of deducing from tree range
    info. I fell checking the gimple statement is more reliable and
    straigtforward.

  * For the pseudo register overlaping issue, given:

      RTX target = TREE source << TREE amount

    I was trying to make sure those two SSA_NAME won't get the same
    pseudo register by comparing their assigned partition using
    var_to_partition, but I can't get the associated tree representation
    for "target" which is RTX.

    Then I just expand "source" and compare the register number with
    "target".

    But I found the simplest way maybe just reorder

      low_part (target) = low_part (source) << amount
      high_part (target) = low_part (source) >> amount1

    to:

      high_part (target) = low_part (source) >> amount1
      low_part (target) = low_part (source) << amount

    then, even target and source share the same pseudo register, we can
    still get what we want, as we are operating on their subreg.

  * Added checking on the result value of expand_variable_shift, if it's
    not the same as "target" then call emit_move_insn to do that.

  How is the re-worked patch looks to you?

  x86 bootstrap OK, regression OK. aarch64 bootstrap OK.

2015-08-14  Jiong.Wang  <jiong.wang@arm.com>

gcc/
  * expr.c (expand_expr_real_2): Check tree format to optimize
  LSHIFT_EXPR expand.

gcc/testsuite
  * gcc.dg/wide_shift_64_1.c: New testcase.
  * gcc.dg/wide_shift_128_1.c: Likewise.
  * gcc.target/aarch64/ashlti3_1.c: Likewise.
  * gcc.target/arm/ashldisi_1.c: Likewise.
--
Regards,
Jiong

Comments

Jeff Law Aug. 14, 2015, 8:24 p.m. UTC | #1
On 08/14/2015 11:40 AM, Jiong Wang wrote:
>
>    * Figuring out whether the shift source is coming from sign extension
>      by checking SSA_NAME_DEF_STMT instead of deducing from tree range
>      info. I fell checking the gimple statement is more reliable and
>      straigtforward.
I suspect it'll also apply more often if you're looking for the 
nop-conversion rather than looking at range information.

I keep thinking there ought to be a generalization here so that we're 
not so restrictive on the modes, but it's probably not worth doing.

In a perfect world we'd also integrate the other strategies for 
double-word shifts that exist in the various ports as special cases in 
expansion and remove the double-word shift patterns for ports that don't 
actually have them in hardware.  But that's probably a bit much to ask 
-- however, it probably couldn't hurt to see if there are any that are 
easily handled.


>
>    * For the pseudo register overlaping issue, given:
>
>        RTX target = TREE source << TREE amount
>
>      I was trying to make sure those two SSA_NAME won't get the same
>      pseudo register by comparing their assigned partition using
>      var_to_partition, but I can't get the associated tree representation
>      for "target" which is RTX.
>
>      Then I just expand "source" and compare the register number with
>      "target".
Right.  Which is what I would have suggested.  Given two pseudos, you 
can just compare them for equality.  If they're unequal, then its safe 
to assume they do not overlap.

>
>      But I found the simplest way maybe just reorder
>
>        low_part (target) = low_part (source) << amount
>        high_part (target) = low_part (source) >> amount1
>
>      to:
>
>        high_part (target) = low_part (source) >> amount1
>        low_part (target) = low_part (source) << amount
>
>      then, even target and source share the same pseudo register, we can
>      still get what we want, as we are operating on their subreg.
Yes.  This would work too and avoids the need to find source's pseudo.

>
>    * Added checking on the result value of expand_variable_shift, if it's
>      not the same as "target" then call emit_move_insn to do that.
Excellent.

>
>    How is the re-worked patch looks to you?
>
>    x86 bootstrap OK, regression OK. aarch64 bootstrap OK.
>
> 2015-08-14  Jiong.Wang<jiong.wang@arm.com>
>
> gcc/
>    * expr.c (expand_expr_real_2): Check tree format to optimize
>    LSHIFT_EXPR expand.
>
> gcc/testsuite
>    * gcc.dg/wide_shift_64_1.c: New testcase.
>    * gcc.dg/wide_shift_128_1.c: Likewise.
>    * gcc.target/aarch64/ashlti3_1.c: Likewise.
>    * gcc.target/arm/ashldisi_1.c: Likewise.



> +	/* Left shfit optimization:
s/shfit/shift/


> +
> +	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
> +	   native instruction to support this wide mode left shift.  Given below
> +	   example:
> +
> +	    Type A = (Type) B  << C
> +
> +            |<           T	    >|
> +            |   high     |   low     |
> +
> +                         |<- size  ->|
> +
> +	   By checking tree shape, if operand B is coming from signed extension,
> +	   then the left shift operand can be simplified into:
> +
> +	      1. high = low >> (size - C);
> +	      2. low = low << C;
You'll want to reorder those to match the code you generate.

Doesn't this require that C be less than the bitsize of a word?

If C is larger than the bitsize of a word, then you need some 
adjustments, something like:


	      1. high = low << (C - size)
               2. low = 0

Does this transformation require that the wider mode be exactly 2X the 
narrower mode?  If so, that needs to be verified.

> +		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
So we're assured we have a widening conversion.

> +		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
> +			>= GET_MODE_BITSIZE (word_mode)))
This test seems wrong.  I'm not sure what you're really trying to test 
here.  It seems you want to look at the shift count relative to the 
bitsize of word_mode.  If it's less, then generate the code you 
mentioned above in the comment.  If it's more, then generate the 
sequence I referenced?  Right?

I do think you need to be verifying that rmode == wordmode here.  If I 
understand everything correctly, the final value is "mode" which must be 
2X the size size of rmode/wordmode here, right?



The other question is are we endianness-safe in these transformations?




> +		  {
> +		    rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
> +		    rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0);
> +		    rtx thigh = simplify_gen_subreg (word_mode, target, mode,
> +						     UNITS_PER_WORD);
> +		    HOST_WIDE_INT ramount = (BITS_PER_WORD
> +					     - TREE_INT_CST_LOW (treeop1));
> +		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
> +
> +		    temp = expand_variable_shift (code, word_mode, low, treeop1,
> +						  tlow, unsignedp);
Why use "code" here right than just using LSHIFT_EXPR?  As noted earlier,

Jeff
Jiong Wang Aug. 14, 2015, 8:44 p.m. UTC | #2
Jeff Law writes:

> On 08/14/2015 11:40 AM, Jiong Wang wrote:
>>
>>    * Figuring out whether the shift source is coming from sign extension
>>      by checking SSA_NAME_DEF_STMT instead of deducing from tree range
>>      info. I fell checking the gimple statement is more reliable and
>>      straigtforward.
> I suspect it'll also apply more often if you're looking for the 
> nop-conversion rather than looking at range information.
>
> I keep thinking there ought to be a generalization here so that we're 
> not so restrictive on the modes, but it's probably not worth doing.
>
> In a perfect world we'd also integrate the other strategies for 
> double-word shifts that exist in the various ports as special cases in 
> expansion and remove the double-word shift patterns for ports that don't 
> actually have them in hardware.  But that's probably a bit much to ask 
> -- however, it probably couldn't hurt to see if there are any that are 
> easily handled.

Agree.

Doing these in early tree/rtl expand stage is more clean & safe, and
expose more details to later RTL optimization passes as I think if you
handle double-word by defining insn pattern, then you will try to split
it in RTL split pass which happens later after some important
optimizations.

>
>> +
>> +	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
>> +	   native instruction to support this wide mode left shift.  Given below
>> +	   example:
>> +
>> +	    Type A = (Type) B  << C
>> +
>> +            |<           T	    >|
>> +            |   high     |   low     |
>> +
>> +                         |<- size  ->|
>> +
>> +	   By checking tree shape, if operand B is coming from signed extension,
>> +	   then the left shift operand can be simplified into:
>> +
>> +	      1. high = low >> (size - C);
>> +	      2. low = low << C;
> You'll want to reorder those to match the code you generate.
>
> Doesn't this require that C be less than the bitsize of a word?

Yes.

Above transformation is to handle double word left shift which shift the
original source across the word boundary.

Part of the contents shifted into the high half of destination, and the
other remain in the low half.

So if C is bigger than bitsize of a word, the the whole source will be
shifted into high half, thus should generate what you have listed below
and that's what gcc is generating, looks like the existed double word
shift logic can recognize this special cases.

So, I should check the value of C,if it's bigger than bitsize of a word,
then just fall through to default expand logic.

> If C is larger than the bitsize of a word, then you need some 
> adjustments, something like:
>
>
> 	      1. high = low << (C - size)
>                2. low = 0
>
> Does this transformation require that the wider mode be exactly 2X the 
> narrower mode?  If so, that needs to be verified.

I think so, the wider mode should be exactly 2X the word_mode, will
add the check.

>
>> +		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
> So we're assured we have a widening conversion.
>
>> +		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
>> +			>= GET_MODE_BITSIZE (word_mode)))
> This test seems wrong.  I'm not sure what you're really trying to test 
> here.  It seems you want to look at the shift count relative to the 
> bitsize of word_mode.  If it's less, then generate the code you 
> mentioned above in the comment.  If it's more, then generate the 
> sequence I referenced?  Right?

As explained above, I am trying to check whether the left shift is
causing the original source across word boundary while I should make sure
TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode) at the same
time, otherwise fall through to default expand logic.

>
> I do think you need to be verifying that rmode == wordmode here.  If I 
> understand everything correctly, the final value is "mode" which must be 
> 2X the size size of rmode/wordmode here, right?

I think rmode don't need to equal wordmode. For example the
transformation is OK for the following, where rmode = SImode, and final
mode = TImode.

int b;
__128_int a = (__128_int) b;

the expand_expr (treeop0... in the start of those code will generate
necessary instruction sequences to extend whatever mode b is into TImode
register (op0 in the patch);

>
>
>
> The other question is are we endianness-safe in these transformations?

I think it is. As this transformation is done with register involved
only. I think endianness issue will only happen if there is operation on
memory object.

>> +		    temp = expand_variable_shift (code, word_mode, low, treeop1,
>> +						  tlow, unsignedp);
> Why use "code" here right than just using LSHIFT_EXPR?  As noted
> earlier,

Yes, better to use the constant LSHIFT_EXPR.

Thanks.
diff mbox

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 31b4573..8a25e98 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8836,23 +8836,90 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
-      /* If this is a fixed-point operation, then we cannot use the code
-	 below because "expand_shift" doesn't support sat/no-sat fixed-point
-         shifts.   */
-      if (ALL_FIXED_POINT_MODE_P (mode))
-	goto binop;
+      {
+	/* If this is a fixed-point operation, then we cannot use the code
+	   below because "expand_shift" doesn't support sat/no-sat fixed-point
+	   shifts.  */
+	if (ALL_FIXED_POINT_MODE_P (mode))
+	  goto binop;
+
+	if (! safe_from_p (subtarget, treeop1, 1))
+	  subtarget = 0;
+	if (modifier == EXPAND_STACK_PARM)
+	  target = 0;
+	op0 = expand_expr (treeop0, subtarget,
+			   VOIDmode, EXPAND_NORMAL);
+	/* Left shfit optimization:
+
+	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
+	   native instruction to support this wide mode left shift.  Given below
+	   example:
+
+	    Type A = (Type) B  << C
+
+            |<           T	    >|
+            |   high     |   low     |
+
+                         |<- size  ->|
+
+	   By checking tree shape, if operand B is coming from signed extension,
+	   then the left shift operand can be simplified into:
+
+	      1. high = low >> (size - C);
+	      2. low = low << C;
+	  */
+
+	temp = NULL_RTX;
+	if (code == LSHIFT_EXPR
+	    && target
+	    && REG_P (target)
+	    && ! unsignedp
+	    && mode == GET_MODE_WIDER_MODE (word_mode)
+	    && ! have_insn_for (ASHIFT, mode)
+	    && TREE_CONSTANT (treeop1)
+	    && TREE_CODE (treeop0) == SSA_NAME)
+	  {
+	    gimple def = SSA_NAME_DEF_STMT (treeop0);
+	    if (is_gimple_assign (def)
+		&& gimple_assign_rhs_code (def) == NOP_EXPR)
+	      {
+		machine_mode rmode = TYPE_MODE
+		  (TREE_TYPE (gimple_assign_rhs1 (def)));
 
-      if (! safe_from_p (subtarget, treeop1, 1))
-	subtarget = 0;
-      if (modifier == EXPAND_STACK_PARM)
-	target = 0;
-      op0 = expand_expr (treeop0, subtarget,
-			 VOIDmode, EXPAND_NORMAL);
-      temp = expand_variable_shift (code, mode, op0, treeop1, target,
-				    unsignedp);
-      if (code == LSHIFT_EXPR)
-	temp = REDUCE_BIT_FIELD (temp);
-      return temp;
+		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
+		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
+			>= GET_MODE_BITSIZE (word_mode)))
+		  {
+		    rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
+		    rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0);
+		    rtx thigh = simplify_gen_subreg (word_mode, target, mode,
+						     UNITS_PER_WORD);
+		    HOST_WIDE_INT ramount = (BITS_PER_WORD
+					     - TREE_INT_CST_LOW (treeop1));
+		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
+
+		    temp = expand_variable_shift (code, word_mode, low, treeop1,
+						  tlow, unsignedp);
+		    if (temp != tlow)
+		      emit_move_insn (tlow, temp);
+
+		    temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
+						  rshift, thigh, unsignedp);
+		    if (temp != thigh)
+		      emit_move_insn (thigh, temp);
+
+		    temp = target ;
+		  }
+	      }
+	  }
+
+	if (temp == NULL_RTX)
+	  temp = expand_variable_shift (code, mode, op0, treeop1, target,
+					unsignedp);
+	if (code == LSHIFT_EXPR)
+	  temp = REDUCE_BIT_FIELD (temp);
+	return temp;
+      }
 
       /* Could determine the answer when only additive constants differ.  Also,
 	 the addition of one can be handled by changing the condition.  */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-128.c b/gcc/testsuite/gcc.dg/wide-shift-128.c
new file mode 100644
index 0000000..9b62715
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-128.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+__int128_t
+load2 (int data)
+{
+    return (__int128_t) data << 50;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-64.c b/gcc/testsuite/gcc.dg/wide-shift-64.c
new file mode 100644
index 0000000..5bc278f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-64.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target arm*-*-* mips*-*-* sparc*-*-* } } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+long long
+load1 (int data)
+{
+    return (long long) data << 12;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ashltidisi.c b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
new file mode 100644
index 0000000..aeb2a24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
@@ -0,0 +1,48 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x, y, z)\
+__uint128_t __attribute__ ((noinline))\
+ushift_##x##_##z (unsigned y data)\
+{\
+  return (__uint128_t) data << x;\
+}\
+__int128_t __attribute__ ((noinline)) \
+shift_##x##_##z (y data) \
+{\
+  return (__int128_t) data << x;\
+}
+
+GEN_TEST_CASE (53, int, i)
+GEN_TEST_CASE (3, long long, ll)
+GEN_TEST_CASE (13, long long, ll)
+GEN_TEST_CASE (53, long long, ll)
+
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y, z, p) \
+	if (ushift_##y##_##p (x)\
+	    != ((__uint128_t) (unsigned z) x << y)) \
+	  abort ();\
+	if (shift_##y##_##p (x)\
+	    != ((__uint128_t) (signed z) x << y)) \
+	  abort ();
+
+  SHIFT_CHECK (0x12345678, 53, int, i)
+  SHIFT_CHECK (0xcafecafe, 53, int, i)
+
+  SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll)
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 4 } } */
diff --git a/gcc/testsuite/gcc.target/arm/ashldisi.c b/gcc/testsuite/gcc.target/arm/ashldisi.c
new file mode 100644
index 0000000..00dc06e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/ashldisi.c
@@ -0,0 +1,44 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x)\
+unsigned long long __attribute__ ((noinline))\
+ushift_ ## x (unsigned int data)\
+{\
+  return (unsigned long long) data << x;\
+}\
+long long __attribute__ ((noinline)) \
+shift_ ## x (int data) \
+{\
+  return (long long) data << x;\
+}
+
+GEN_TEST_CASE (3)
+GEN_TEST_CASE (23)
+GEN_TEST_CASE (30)
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y) \
+	if (ushift_ ## y (x)\
+	    != ((unsigned long long) (unsigned) x << y)) \
+	  abort (); \
+	if (shift_ ## y (x)\
+	    != ((long long) (signed) x << y)) \
+	  abort ();
+
+  SHIFT_CHECK (0x12345678, 3)
+  SHIFT_CHECK (0xcafecafe, 3)
+  SHIFT_CHECK (0x12345678, 23)
+  SHIFT_CHECK (0xcafecafe, 23)
+  SHIFT_CHECK (0x12345678, 30)
+  SHIFT_CHECK (0xcafecafe, 30)
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 3 } } */
+/* { dg-final { cleanup-saved-temps } } */