[RFC,mid-end] Optimize immediate choice in comparisons.

Message ID d9f9c0f4-62b2-7912-370c-c896dadd37c4@arm.com
State New
Headers show
Series
  • [RFC,mid-end] Optimize immediate choice in comparisons.
Related show

Commit Message

Vlad Lazar Aug. 7, 2018, 1:33 p.m.
Hi.

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded into a
register in fewer instructions. The cases are as follows:
   
i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent can be performed in less instructions.
This is done on a statement by statement basis, right before the GIMPLE statement is expanded
to RTL. Therefore, it requires a correct implementation of the TARGET_INSN_COST hook.
The change is general and it applies to any integer comparison, regardless of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
   return x > 0xfefffffe;
}

it generates:

mov	w1, -16777217
cmp	w0, w1
cset	w0, cs

instead of:

mov	w1, 65534
movk	w1, 0xfeff, lsl 16
cmp	w0, w1
cset	w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.

Thanks,
Vlad

gcc/testsuite/
Changelog for gcc/testsuite/Changelog
2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>

	* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
Changelog for gcc/Changelog
2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>

	* cfgexpand.c (optimize_immediate_choice): New.
	(can_optimize_immediate_p): Likewise.
	(validate_immediate_optimization): Likewise.
	(update_immediate): Likewise.
	(immediate_optimized_code): Likewise.

---

Comments

Richard Sandiford Aug. 7, 2018, 8:11 p.m. | #1
Hi Vlad,

Thanks for the patch.

Vlad Lazar <vlad.lazar@arm.com> writes:
> Hi.
>
> This patch optimises the choice of immediates in integer comparisons. Integer
> comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
> and there are cases where an incremented/decremented immediate can be loaded into a
> register in fewer instructions. The cases are as follows:
>    
> i)   a >  b or a >= b + 1
> ii)  a <= b or a <  b + 1
> iii) a >= b or a >  b - 1
> iv)  a <  b or a <= b - 1
>
> For each comparison we check whether the equivalent can be performed in less instructions.
> This is done on a statement by statement basis, right before the GIMPLE statement is expanded
> to RTL. Therefore, it requires a correct implementation of the TARGET_INSN_COST hook.
> The change is general and it applies to any integer comparison, regardless of types or location.
>
> For example, on AArch64 for the following code:
>
> int foo (int x)
> {
>    return x > 0xfefffffe;
> }
>
> it generates:
>
> mov	w1, -16777217
> cmp	w0, w1
> cset	w0, cs
>
> instead of:
>
> mov	w1, 65534
> movk	w1, 0xfeff, lsl 16
> cmp	w0, w1
> cset	w0, hi
>
> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.

Looks like a useful feature.  I'm playing devil's advocate to some
extent here, but did you consider instead doing this during the
expansion functions themselves?  In some ways that feels more
natural because we're already operating on rtxes at that point.
It also has the advantage that it would trap comparisons that didn't
exist at the gimple level, such as when computing the carry for a
doubleword add/sub.

I think the two main functions that would need to change are:

- prepare_cmp_insn
- emit_store_flag_1

We could have something like:

  void canonicalize_comparison_for_target (machine_mode, rtx_code *, rtx *);

which changes the comparison code and second operand as in your patch.
In the case of emit_store_flag_1, it should happen after:

  if (swap_commutative_operands_p (op0, op1))
    {
      std::swap (op0, op1);
      code = swap_condition (code);
    }

(In the case of prepare_cmp_insn, the callers have already done this.)

Note that the patch as its stands won't handle comparisons nested
inside the first operand of a COND_EXPR gassign (which unfortunately
are still a thing).  E.g.:

  _1 = _2 > 0xfefffffe ? _3 : _4;

FWIW, I agree this is a target-independent technique that should be
done in target-independent code, rather than something that should
be duplicated in each individual target that benefits from it.

Thanks,
Richard

>
> Thanks,
> Vlad
>
> gcc/testsuite/
> Changelog for gcc/testsuite/Changelog
> 2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>
>
> 	* gcc.target/aarch64/imm_choice_comparison.c: New.
>
> gcc/
> Changelog for gcc/Changelog
> 2018-07-30  Vlad Lazar  <vlad.lazar@arm.com>
>
> 	* cfgexpand.c (optimize_immediate_choice): New.
> 	(can_optimize_immediate_p): Likewise.
> 	(validate_immediate_optimization): Likewise.
> 	(update_immediate): Likewise.
> 	(immediate_optimized_code): Likewise.
>
> ---
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 9b91279282e1c6956c8b3699f13036c401ea1dcd..5b0a2e0cdb23f649336844506c8241428b3e6e7d 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5423,6 +5423,157 @@ reorder_operands (basic_block bb)
>     XDELETE (lattice);
>   }
>   
> +/* Helper function for update_immediate.  Returns the adjusted conditional
> +   code for the immediate choice optimization.  See optimize_immediate_choice
> +   for cases.  */
> +
> +static enum tree_code
> +immediate_optimized_code (enum tree_code code)
> +{
> +  switch (code)
> +    {
> +    case GT_EXPR:
> +      return GE_EXPR;
> +    case GE_EXPR:
> +      return GT_EXPR;
> +    case LT_EXPR:
> +      return LE_EXPR;
> +    case LE_EXPR:
> +      return LT_EXPR;
> +    default:
> +      return code;
> +    }
> +}
> +
> +/* Helper function for optimize_immediate_choice.  It updates the immediate
> +   and the subcode of the gimple statement.  At the point of calling
> +   the function we know that the optimization leads to fewer instructions.
> +   stmt points to the gimple statement containing the comparison we wish to
> +   optimize and to_add is the amount by which the immediate needs to be
> +   adjusted (1 or -1).  */
> +
> +static void
> +update_immediate (gimple *stmt, int to_add)
> +{
> +  tree inc_dec = to_add == 1 ? build_one_cst (integer_type_node) :
> +			       build_minus_one_cst (integer_type_node);
> +
> +  enum gimple_code code = gimple_code (stmt);
> +  if (code == GIMPLE_COND)
> +    {
> +      gcond *gc = GIMPLE_CHECK2<gcond *> (stmt);
> +      tree rhs = gimple_cond_rhs (gc);
> +
> +      /* Update the statement.  */
> +      tree new_rhs = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs), rhs, inc_dec);
> +      gimple_cond_set_rhs (gc, new_rhs);
> +      gc->subcode = immediate_optimized_code ((enum tree_code) gc->subcode);
> +    }
> +  if (code == GIMPLE_ASSIGN)
> +    {
> +      gassign *ga = GIMPLE_CHECK2<gassign *> (stmt);
> +      tree rhs2 = gimple_assign_rhs2 (ga);
> +
> +      tree new_rhs2 = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs2), rhs2, inc_dec);
> +      gimple_assign_set_rhs2 (ga, new_rhs2);
> +      ga->subcode = immediate_optimized_code ((enum tree_code) ga->subcode);
> +    }
> +}
> +
> +/* Helper function for optimize_immediate_choice.  It checks if the other
> +   possible immediate leads to a lower rtx cost.  to_add is the amount by
> +   which the immediate needs to be adjusted (1 or -1).  The function
> +   returns 0 if there is no improvement and the amount by which the immediate
> +   needs to be adjusted (1 or -1) otherwise.  */
> +
> +static int
> +validate_immediate_optimization (gimple *stmt, int to_add)
> +{
> +  tree tree_imm = gimple_code (stmt) == GIMPLE_COND ? gimple_cond_rhs (stmt)
> +						: gimple_assign_rhs2 (stmt);
> +  const_tree type = TREE_TYPE (tree_imm);
> +  widest_int imm = wi::to_widest (tree_imm);
> +  enum signop sgn = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
> +
> +  /* Check for overflow/underflow in the case of signed values and
> +     wrapping around in the case of unsigned values.  If any occur
> +     cancel the optimization.  */
> +
> +  widest_int max_type = wi::to_widest (TYPE_MAX_VALUE (type));
> +  widest_int min_type = wi::to_widest (TYPE_MIN_VALUE (type));
> +  if ((wi::cmp (imm, max_type, sgn) == 0 && to_add == 1)
> +      || (wi::cmp (imm, min_type, sgn) == 0 && to_add == -1))
> +    return 0;
> +
> +  widest_int imm_modif = wi::add (imm, to_add);
> +
> +  rtx reg = gen_reg_rtx (DImode);
> +  rtx old_imm = GEN_INT (imm.to_shwi ());
> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
> +
> +  rtx_insn *old_rtx = gen_move_insn (reg, old_imm);
> +  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
> +
> +  /* If the change is beneficial we can just propagate to_add further,
> +     otherwise return 0 to cancel the optimization.  */
> +  return insn_cost (old_rtx, true) > insn_cost (new_rtx, true) ? to_add : 0;
> +}
> +
> +/* Helper function for optimize_immediate_choice.  Checks if the gimple
> +   statement has the right shape for the optimization.  */
> +
> +static bool
> +can_optimize_immediate_p (const gimple *stmt)
> +{
> +  enum gimple_code code = gimple_code (stmt);
> +  if (code != GIMPLE_ASSIGN && code != GIMPLE_COND)
> +    return false;
> +
> +  if (code == GIMPLE_ASSIGN
> +      && (gimple_num_ops (stmt) != 3
> +	  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST))
> +    return false;
> +  if (code == GIMPLE_COND && TREE_CODE (gimple_cond_rhs (stmt)) != INTEGER_CST)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Entry point for immediate choice optimization.  It aims to choose
> +   the simpler immediate in integer comparisons.  The purpose of this
> +   is to end up with an immediate which can be loaded into a register
> +   in fewer moves, if possible.
> +
> +   For each integer comparison there exists an equivalent choice:
> +     i)   a >  b or a >= b + 1
> +     ii)  a <= b or a <  b + 1
> +     iii) a >= b or a >  b - 1
> +     iv)  a <  b or a <= b - 1
> +
> +   Comparisons can only appear on the rhs of a GIMPLE_ASSIGN
> +   or in a GIMPLE_COND.  */
> +
> +static void
> +optimize_immediate_choice (gimple *stmt)
> +{
> +  if (!can_optimize_immediate_p (stmt))
> +    return;
> +
> +  /* Check if the other immediate available is preferable.  */
> +  int to_add = 0;
> +  if (stmt->subcode == GT_EXPR || stmt->subcode == LE_EXPR)
> +    to_add = validate_immediate_optimization (stmt, 1);
> +
> +  if (stmt->subcode == GE_EXPR || stmt->subcode == LT_EXPR)
> +    to_add = validate_immediate_optimization (stmt, -1);
> +
> +  if (!to_add)
> +    return;
> +
> +  /* Update the current statement.  */
> +  update_immediate (stmt, to_add);
> +}
> +
>   /* Expand basic block BB from GIMPLE trees to RTL.  */
>   
>   static basic_block
> @@ -5515,6 +5666,7 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
>         basic_block new_bb;
>   
>         stmt = gsi_stmt (gsi);
> +      optimize_immediate_choice (stmt);
>   
>         /* If this statement is a non-debug one, and we generate debug
>   	 insns, then this one might be the last real use of a TERed
> diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b30dcca88f44ca73fcb8202ea488888b365400c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return x >= 0xfff01;
> +}
> +
> +int
> +GT (unsigned int x)
> +{
> +  return x > 0xfefffffe;
> +}
> +
> +/* Go from four moves to two.  */
> +
> +int
> +baz (long long x)
> +{
> +  return x <= 0x1999999999999998;
> +}
> +
> +int
> +LE (unsigned int x)
> +{
> +  return x <= 0xfefffffe;
> +}
> +
> +int
> +GE (long long x)
> +{
> +  return x >= 0xff000000;
> +}
> +
> +int
> +LT (int x)
> +{
> +  return x < 0xff000000;
> +}
> +
> +/* Optimize the immediate in conditionals.  */
> +
> +int check (int x, int y)
> +{
> +  if (x > y && GT (x))
> +    return 100;
> +
> +  return x;
> +}
> +
> +/* baz produces one movk instruction.  */
> +/* { dg-final { scan-assembler-times "movk" 1 } } */
Jeff Law Aug. 9, 2018, 5:48 a.m. | #2
On 08/07/2018 02:11 PM, Richard Sandiford wrote:
> Hi Vlad,
> 
> Thanks for the patch.
> 
> Vlad Lazar <vlad.lazar@arm.com> writes:
>> Hi.
>>
>> This patch optimises the choice of immediates in integer comparisons. Integer
>> comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
>> and there are cases where an incremented/decremented immediate can be loaded into a
>> register in fewer instructions. The cases are as follows:
>>    
>> i)   a >  b or a >= b + 1
>> ii)  a <= b or a <  b + 1
>> iii) a >= b or a >  b - 1
>> iv)  a <  b or a <= b - 1
>>
>> For each comparison we check whether the equivalent can be performed in less instructions.
>> This is done on a statement by statement basis, right before the GIMPLE statement is expanded
>> to RTL. Therefore, it requires a correct implementation of the TARGET_INSN_COST hook.
>> The change is general and it applies to any integer comparison, regardless of types or location.
>>
>> For example, on AArch64 for the following code:
>>
>> int foo (int x)
>> {
>>    return x > 0xfefffffe;
>> }
>>
>> it generates:
>>
>> mov	w1, -16777217
>> cmp	w0, w1
>> cset	w0, cs
>>
>> instead of:
>>
>> mov	w1, 65534
>> movk	w1, 0xfeff, lsl 16
>> cmp	w0, w1
>> cset	w0, hi
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.
> 
> Looks like a useful feature.  I'm playing devil's advocate to some
> extent here, but did you consider instead doing this during the
> expansion functions themselves?  In some ways that feels more
> natural because we're already operating on rtxes at that point.
> It also has the advantage that it would trap comparisons that didn't
> exist at the gimple level, such as when computing the carry for a
> doubleword add/sub.
I've got no strong opinions on doing it in cfgexpand vs the expansion
functions themselves.  I'm happy to have you setting overall direction
here Richard.

I do worry about the amount of RTL we generate and throw away during
cost computation.  Though it's just for comparisons, so it may not be
terrible.

I wouldn't be surprised if ports aren't particularly accurate in their
costing computations for this kind of use -- but that's nothing new.  We
run into it every time we use rtx costing in a new place.  I'm
comfortable having targets fault in improvements for this kind of use.

Jeff
Vlad Lazar Aug. 10, 2018, 3:54 p.m. | #3
On 09/08/18 06:48, Jeff Law wrote:
> On 08/07/2018 02:11 PM, Richard Sandiford wrote:
>> Hi Vlad,
>>
>> Thanks for the patch.
>>
>> Vlad Lazar <vlad.lazar@arm.com> writes:
>>> Hi.
>>>
>>> This patch optimises the choice of immediates in integer comparisons. Integer
>>> comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
>>> and there are cases where an incremented/decremented immediate can be loaded into a
>>> register in fewer instructions. The cases are as follows:
>>>     
>>> i)   a >  b or a >= b + 1
>>> ii)  a <= b or a <  b + 1
>>> iii) a >= b or a >  b - 1
>>> iv)  a <  b or a <= b - 1
>>>
>>> For each comparison we check whether the equivalent can be performed in less instructions.
>>> This is done on a statement by statement basis, right before the GIMPLE statement is expanded
>>> to RTL. Therefore, it requires a correct implementation of the TARGET_INSN_COST hook.
>>> The change is general and it applies to any integer comparison, regardless of types or location.
>>>
>>> For example, on AArch64 for the following code:
>>>
>>> int foo (int x)
>>> {
>>>     return x > 0xfefffffe;
>>> }
>>>
>>> it generates:
>>>
>>> mov	w1, -16777217
>>> cmp	w0, w1
>>> cset	w0, cs
>>>
>>> instead of:
>>>
>>> mov	w1, 65534
>>> movk	w1, 0xfeff, lsl 16
>>> cmp	w0, w1
>>> cset	w0, hi
>>>
>>> Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.
>>
>> Looks like a useful feature.  I'm playing devil's advocate to some
>> extent here, but did you consider instead doing this during the
>> expansion functions themselves?  In some ways that feels more
>> natural because we're already operating on rtxes at that point.
>> It also has the advantage that it would trap comparisons that didn't
>> exist at the gimple level, such as when computing the carry for a
>> doubleword add/sub.
> I've got no strong opinions on doing it in cfgexpand vs the expansion
> functions themselves.  I'm happy to have you setting overall direction
> here Richard.
> 
> I do worry about the amount of RTL we generate and throw away during
> cost computation.  Though it's just for comparisons, so it may not be
> terrible.
> 
> I wouldn't be surprised if ports aren't particularly accurate in their
> costing computations for this kind of use -- but that's nothing new.  We
> run into it every time we use rtx costing in a new place.  I'm
> comfortable having targets fault in improvements for this kind of use.
> 
> Jeff
> 
Thank you for the feedback.

I agree with Richard's opinion that the change feels more natural in the
actual expanders.  Therefore, I've reimplemented the feature in the expansion
functions.  It's worth mentioning that it now also applies the optimization
in ternary operators comparisons and I added a test which reflects such a case.

Regarding the amount of RTL generated, I have tried to keep it at a minimum,
by only generating the immediate value moves.

I've bootstrapped and regtested again and there are no regressions.
See the updated patch below.

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded into a
register in fewer instructions. The cases are as follows:
   
i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent can be performed in less instructions.
This is done in the gimple expanders. Therefore, it requires a correct implementation of the
TARGET_INSN_COST hook. The change is general and it applies to any integer comparison, regardless
of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
   return x > 0xfefffffe;
}

it generates:

mov	w1, -16777217
cmp	w0, w1
cset	w0, cs

instead of:

mov	w1, 65534
movk	w1, 0xfeff, lsl 16
cmp	w0, w1
cset	w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.

Thanks,
Vlad

gcc/testsuite/
Changelog for gcc/testsuite/Changelog
2018-08-10  Vlad Lazar  <vlad.lazar@arm.com>

	* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
Changelog for gcc/Changelog
2018-08-10  Vlad Lazar  <vlad.lazar@arm.com>
	* expmed.h (canonicalize_comparison, canonicalized_cmp_code): New declarations.
	* expmed.c (canonicalize_comparison, canonicalized_cmp_code): New implementations.
	* expmed.c (emit_store_flag_1): Add call to canonicalize_comparison.
	* optabs.c (prepare_cmp_insn): Likewise.

---

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
  extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
  				  machine_mode, int, int);
  
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
+
+extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
+
  /* Choose a minimal N + 1 bit approximation to 1/D that can be used to
     replace division by D, and put the least significant N bits of the result
     in *MULTIPLIER_PTR and return the most significant bit.  */
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 4c74e7dc2ea448fb1c1dd69d197b4bca23668b44..bdd19062cf0dda3a6b7ffe04698bf832e80fca75 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5538,6 +5538,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
    if (mode == VOIDmode)
      mode = GET_MODE (op0);
  
+  if (CONST_SCALAR_INT_P (op1))
+    canonicalize_comparison (mode, &code, &op1);
+
    /* For some comparisons with 1 and -1, we can convert this to
       comparisons with zero.  This will often produce more opportunities for
       store-flag insns.  */
@@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
  
    return target;
  }
+
+/* Choose the more appropiate immediate in comparisons.  The purpose of this
+   is to end up with an immediate which can be loaded into a register in fewer
+   moves, if possible.
+
+   For each integer comparison there exists an equivalent choice:
+     i)   a >  b or a >= b + 1
+     ii)  a <= b or a <  b + 1
+     iii) a >= b or a >  b - 1
+     iv)  a <  b or a <= b - 1
+
+   The function is called in the gimple expanders which handle GIMPLE_ASSIGN
+   and GIMPLE_COND.  It assumes that the first operand of the comparison is a
+   register and the second is a constant.
+
+   mode is the mode of the first operand.
+   code points to the comparison code.
+   imm points to the rtx containing the immediate.  */
+
+void canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
+{
+  int to_add = 0;
+
+  if (!SCALAR_INT_MODE_P (mode))
+    return;
+
+  /* Extract the immediate value from the rtx.  */
+  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
+
+  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
+    to_add = 1;
+
+  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
+    to_add = -1;
+
+  /* Check for overflow/underflow in the case of signed values and
+     wrapping around in the case of unsigned values.  If any occur
+     cancel the optimization.  */
+  wide_int max_val = wi::max_value (mode, SIGNED);
+  wide_int min_val = wi::min_value (mode, SIGNED);
+  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
+      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
+    return;
+
+  /* Generate a mov instruction for both cases and see whether the change
+     was profitable.  */
+  wide_int imm_modif = wi::add (imm_val, to_add);
+
+  rtx reg = gen_reg_rtx (mode);
+  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
+
+  rtx_insn *old_rtx = gen_move_insn (reg, *imm);
+  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
+
+  /* Update the immediate and the code.  */
+  if (insn_cost (old_rtx, true) > insn_cost (new_rtx, true))
+    {
+      *code = canonicalized_cmp_code (*code);
+      *imm = new_imm;
+    }
+}
+
+/* Helper function for canonicalize_cmp_for_target.  It returns the comparison
+   code of the equivalent comparison.  */
+
+enum rtx_code canonicalized_cmp_code (enum rtx_code code)
+{
+  switch (code)
+    {
+    case GT:
+      return GE;
+    case GE:
+      return GT;
+    case LT:
+      return LE;
+    case LE:
+      return LT;
+    case GTU:
+      return GEU;
+    case GEU:
+      return GTU;
+    case LTU:
+      return LEU;
+    case LEU:
+      return LTU;
+
+    default:
+      return code;
+    }
+}
+
  
  /* Perform possibly multi-word comparison and conditional jump to LABEL
     if ARG1 OP ARG2 true where ARG1 and ARG2 are of mode MODE.  This is
diff --git a/gcc/optabs.c b/gcc/optabs.c
index cadf4676c986c8430baafab8ef5282e890d36308..6052222c90ce559ae3b11a40fbd2ebbf4a6bc8dc 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3812,6 +3812,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
    gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
  	      || methods == OPTAB_LIB_WIDEN);
  
+  if (CONST_SCALAR_INT_P (y))
+    canonicalize_comparison (mode, &comparison, &y);
+
    /* If we are optimizing, force expensive constants into a register.  */
    if (CONSTANT_P (x) && optimize
        && (rtx_cost (x, mode, COMPARE, 0, optimize_insn_for_speed_p ())
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
new file mode 100644
index 0000000000000000000000000000000000000000..ebc44d6dbc7287d907603d77d7b54496de177c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Go from four moves to two.  */
+
+int
+foo (long long x)
+{
+  return x <= 0x1999999999999998;
+}
+
+int
+GT (unsigned int x)
+{
+  return x > 0xfefffffe;
+}
+
+int
+LE (unsigned int x)
+{
+  return x <= 0xfefffffe;
+}
+
+int
+GE (long long x)
+{
+  return x >= 0xff000000;
+}
+
+int
+LT (int x)
+{
+  return x < 0xff000000;
+}
+
+/* Optimize the immediate in conditionals.  */
+
+int
+check (int x, int y)
+{
+  if (x > y && GT (x))
+    return 100;
+
+  return x;
+}
+
+int
+tern (int x)
+{
+  return x >= 0xff000000 ? 5 : -3;
+}
+
+/* baz produces one movk instruction.  */
+/* { dg-final { scan-assembler-times "movk" 1 } } */
Richard Sandiford Aug. 13, 2018, 2 p.m. | #4
Vlad Lazar <vlad.lazar@arm.com> writes:
> diff --git a/gcc/expmed.h b/gcc/expmed.h
> index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
> --- a/gcc/expmed.h
> +++ b/gcc/expmed.h
> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
>   extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>   				  machine_mode, int, int);
>   
> +extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
> +
> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);

It would probably be better to make the second function static (local
to expmed.c), since it's only used internally by canonicalize_comparison.
Switching the order of the two functions in expmed.c would avoid the need
for a forward declaration.

> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
>   
>     return target;
>   }
> +
> +/* Choose the more appropiate immediate in comparisons.  The purpose of this

Maybe worth saying "scalar integer comparisons", since the function
doesn't handle vector or floating-point comparisons.

> +   is to end up with an immediate which can be loaded into a register in fewer
> +   moves, if possible.
> +
> +   For each integer comparison there exists an equivalent choice:
> +     i)   a >  b or a >= b + 1
> +     ii)  a <= b or a <  b + 1
> +     iii) a >= b or a >  b - 1
> +     iv)  a <  b or a <= b - 1
> +
> +   The function is called in the gimple expanders which handle GIMPLE_ASSIGN
> +   and GIMPLE_COND.  It assumes that the first operand of the comparison is a
> +   register and the second is a constant.

This last paragraph doesn't seem accurate any more.  Probably best to
drop it, since there's no real need to say who the callers are if we
describe the interface.

> +   mode is the mode of the first operand.
> +   code points to the comparison code.
> +   imm points to the rtx containing the immediate.  */

GCC's convention is to put parameter names in caps in comments,
so "MODE is...", etc.

On the IMM line, it might be worth adding "*IMM must satisfy
CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
on exit."

> +void canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
> +{
> +  int to_add = 0;
> +
> +  if (!SCALAR_INT_MODE_P (mode))
> +    return;
> +
> +  /* Extract the immediate value from the rtx.  */
> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);

This should be:

  rtx_mode_t imm_val (*imm, mode);

so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.

> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
> +    to_add = 1;
> +
> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
> +    to_add = -1;

Might be better to have:

  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
    to_add = 1;
  else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
    to_add = -1;
  else
    return;

so that we exit early for other comparisons, rather than doing wasted work.

> +  /* Check for overflow/underflow in the case of signed values and
> +     wrapping around in the case of unsigned values.  If any occur
> +     cancel the optimization.  */
> +  wide_int max_val = wi::max_value (mode, SIGNED);
> +  wide_int min_val = wi::min_value (mode, SIGNED);
> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
> +    return;

This needs to use the SIGNED/UNSIGNED choice appropriate for the
comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
seem to be an existing function that says whether an rtx comparison
operation is signed or not (bit of a surprise), but there is
unsigned_condition, which returns the unsigned form of an rtx
comparison operator.  It might be worth adding something like:

  /* Return true if integer comparison operator CODE interprets its operands
     as unsigned.  */

  inline bool
  unsigned_condition_p (rtx_code code)
  {
    return unsigned_condition (code) == code;
  }

and using that to choose between SIGNED and UNSIGNED.

Rather than using wi::cmp, a more direct way of checking for overflow
is to change this:

> +  /* Generate a mov instruction for both cases and see whether the change
> +     was profitable.  */
> +  wide_int imm_modif = wi::add (imm_val, to_add);

to use the overflow form of wi::add, i.e.:

  wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);

and return if "overflow" is set.

> +
> +  rtx reg = gen_reg_rtx (mode);

gen_reg_rtx creates a new pseudo register that essentially stays
around until we've finished compiling the function.  It's better to use:

  gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);

for cost calculations, so that we don't waste pseudo register numbers.

> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());

This should be:

  rtx new_imm = immed_wide_int_const (imm_modif, mode);

to cope with non-CONST_INT immediates, and to ensure that CONST_INT
immediates are properly sign-extended.

(The rtx interfaces are a bit clunky, sorry.)

Patch looks good to me with those changes, but someone else will
need to approve.

Thanks,
Richard
Vlad Lazar Aug. 14, 2018, 5:01 p.m. | #5
On 13/08/18 15:00, Richard Sandiford wrote:
> Vlad Lazar <vlad.lazar@arm.com> writes:
>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>> index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
>> --- a/gcc/expmed.h
>> +++ b/gcc/expmed.h
>> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
>>    extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>>    				  machine_mode, int, int);
>>    
>> +extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
>> +
>> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
> 
> It would probably be better to make the second function static (local
> to expmed.c), since it's only used internally by canonicalize_comparison.
> Switching the order of the two functions in expmed.c would avoid the need
> for a forward declaration.
> 
>> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
>>    
>>      return target;
>>    }
>> +
>> +/* Choose the more appropiate immediate in comparisons.  The purpose of this
> 
> Maybe worth saying "scalar integer comparisons", since the function
> doesn't handle vector or floating-point comparisons.
> 
>> +   is to end up with an immediate which can be loaded into a register in fewer
>> +   moves, if possible.
>> +
>> +   For each integer comparison there exists an equivalent choice:
>> +     i)   a >  b or a >= b + 1
>> +     ii)  a <= b or a <  b + 1
>> +     iii) a >= b or a >  b - 1
>> +     iv)  a <  b or a <= b - 1
>> +
>> +   The function is called in the gimple expanders which handle GIMPLE_ASSIGN
>> +   and GIMPLE_COND.  It assumes that the first operand of the comparison is a
>> +   register and the second is a constant.
> 
> This last paragraph doesn't seem accurate any more.  Probably best to
> drop it, since there's no real need to say who the callers are if we
> describe the interface.
> 
>> +   mode is the mode of the first operand.
>> +   code points to the comparison code.
>> +   imm points to the rtx containing the immediate.  */
> 
> GCC's convention is to put parameter names in caps in comments,
> so "MODE is...", etc.
> 
> On the IMM line, it might be worth adding "*IMM must satisfy
> CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
> on exit."
> 
>> +void canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
>> +{
>> +  int to_add = 0;
>> +
>> +  if (!SCALAR_INT_MODE_P (mode))
>> +    return;
>> +
>> +  /* Extract the immediate value from the rtx.  */
>> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
> 
> This should be:
> 
>    rtx_mode_t imm_val (*imm, mode);
> 
> so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.
> 
>> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>> +    to_add = 1;
>> +
>> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>> +    to_add = -1;
> 
> Might be better to have:
> 
>    if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>      to_add = 1;
>    else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>      to_add = -1;
>    else
>      return;
> 
> so that we exit early for other comparisons, rather than doing wasted work.
> 
>> +  /* Check for overflow/underflow in the case of signed values and
>> +     wrapping around in the case of unsigned values.  If any occur
>> +     cancel the optimization.  */
>> +  wide_int max_val = wi::max_value (mode, SIGNED);
>> +  wide_int min_val = wi::min_value (mode, SIGNED);
>> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
>> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
>> +    return;
> 
> This needs to use the SIGNED/UNSIGNED choice appropriate for the
> comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
> seem to be an existing function that says whether an rtx comparison
> operation is signed or not (bit of a surprise), but there is
> unsigned_condition, which returns the unsigned form of an rtx
> comparison operator.  It might be worth adding something like:
> 
>    /* Return true if integer comparison operator CODE interprets its operands
>       as unsigned.  */
> 
>    inline bool
>    unsigned_condition_p (rtx_code code)
>    {
>      return unsigned_condition (code) == code;
>    }
> 
> and using that to choose between SIGNED and UNSIGNED.
> 
> Rather than using wi::cmp, a more direct way of checking for overflow
> is to change this:
> 
>> +  /* Generate a mov instruction for both cases and see whether the change
>> +     was profitable.  */
>> +  wide_int imm_modif = wi::add (imm_val, to_add);
> 
> to use the overflow form of wi::add, i.e.:
> 
>    wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);
> 
> and return if "overflow" is set.
> 
>> +
>> +  rtx reg = gen_reg_rtx (mode);
> 
> gen_reg_rtx creates a new pseudo register that essentially stays
> around until we've finished compiling the function.  It's better to use:
> 
>    gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
> 
> for cost calculations, so that we don't waste pseudo register numbers.
> 
>> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
> 
> This should be:
> 
>    rtx new_imm = immed_wide_int_const (imm_modif, mode);
> 
> to cope with non-CONST_INT immediates, and to ensure that CONST_INT
> immediates are properly sign-extended.
> 
> (The rtx interfaces are a bit clunky, sorry.)
> 
> Patch looks good to me with those changes, but someone else will
> need to approve.
> 
> Thanks,
> Richard
> 

Thanks for taking the time to point everything out.

I've updated the patch according to Richard's comments.
See the updated patch below. OK for trunk?

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded into a
register in fewer instructions. The cases are as follows:

i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent can be performed in less instructions.
This is done in the gimple expanders. Therefore, it requires a correct implementation of the
TARGET_INSN_COST hook. The change is general and it applies to any integer comparison, regardless
of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
   return x > 0xfefffffe;
}

it generates:

mov	w1, -16777217
cmp	w0, w1
cset	w0, cs

instead of:

mov	w1, 65534
movk	w1, 0xfeff, lsl 16
cmp	w0, w1
cset	w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and there are no regressions.

Thanks,
Vlad

gcc/testsuite/
Changelog for gcc/testsuite/Changelog
2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>

	* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
Changelog for gcc/Changelog
2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>
	* expmed.h (canonicalize_comparison): New declaration.
	* expmed.c (canonicalize_comparison, equivalent_cmp_code): New.
	* expmed.c (emit_store_flag_1): Add call to canonicalize_comparison.
	* optabs.c (prepare_cmp_insn): Likewise.
	* rtl.h (unsigned_condition_p): New function which checks if a comparison operator is unsigned.

---

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..cc247c4379e3e4927f837421eb386dee2db98255 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,8 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
  extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
  				  machine_mode, int, int);
  
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
+
  /* Choose a minimal N + 1 bit approximation to 1/D that can be used to
     replace division by D, and put the least significant N bits of the result
     in *MULTIPLIER_PTR and return the most significant bit.  */
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 4c74e7dc2ea448fb1c1dd69d197b4bca23668b44..fbddb158f68e10f932a3473e392d3ea92f04790f 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5538,6 +5538,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
    if (mode == VOIDmode)
      mode = GET_MODE (op0);
  
+  if (CONST_SCALAR_INT_P (op1))
+    canonicalize_comparison (mode, &code, &op1);
+
    /* For some comparisons with 1 and -1, we can convert this to
       comparisons with zero.  This will often produce more opportunities for
       store-flag insns.  */
@@ -6153,6 +6156,95 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
  
    return target;
  }
+
+/* Helper function for canonicalize_cmp_for_target.  Swap between inclusive
+   and exclusive ranges in order to create an equivalent comparison.  See
+   canonicalize_cmp_for_target for the possible cases.  */
+
+static enum rtx_code
+equivalent_cmp_code (enum rtx_code code)
+{
+  switch (code)
+    {
+    case GT:
+      return GE;
+    case GE:
+      return GT;
+    case LT:
+      return LE;
+    case LE:
+      return LT;
+    case GTU:
+      return GEU;
+    case GEU:
+      return GTU;
+    case LTU:
+      return LEU;
+    case LEU:
+      return LTU;
+
+    default:
+      return code;
+    }
+}
+
+/* Choose the more appropiate immediate in scalar integer comparisons.  The
+   purpose of this is to end up with an immediate which can be loaded into a
+   register in fewer moves, if possible.
+
+   For each integer comparison there exists an equivalent choice:
+     i)   a >  b or a >= b + 1
+     ii)  a <= b or a <  b + 1
+     iii) a >= b or a >  b - 1
+     iv)  a <  b or a <= b - 1
+
+   MODE is the mode of the first operand.
+   CODE points to the comparison code.
+   IMM points to the rtx containing the immediate.  *IMM must satisfy
+   CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
+   on exit.  */
+
+void
+canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
+{
+  if (!SCALAR_INT_MODE_P (mode))
+    return;
+
+  int to_add = 0;
+  enum signop sgn = unsigned_condition_p (*code) ? UNSIGNED : SIGNED;
+
+  /* Extract the immediate value from the rtx.  */
+  wide_int imm_val = rtx_mode_t (*imm, mode);
+
+  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
+    to_add = 1;
+  else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
+    to_add = -1;
+  else
+    return;
+
+  /* Check for overflow/underflow in the case of signed values and
+     wrapping around in the case of unsigned values.  If any occur
+     cancel the optimization.  */
+  wi::overflow_type overflow = wi::OVF_NONE;
+  wide_int imm_modif = wi::add (imm_val, to_add, sgn, &overflow);
+  if (overflow)
+    return;
+
+  rtx reg = gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
+  rtx new_imm = immed_wide_int_const (imm_modif, mode);
+
+  rtx_insn *old_rtx = gen_move_insn (reg, *imm);
+  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
+
+  /* Update the immediate and the code.  */
+  if (insn_cost (old_rtx, true) > insn_cost (new_rtx, true))
+    {
+      *code = equivalent_cmp_code (*code);
+      *imm = new_imm;
+    }
+}
+
  
  /* Perform possibly multi-word comparison and conditional jump to LABEL
     if ARG1 OP ARG2 true where ARG1 and ARG2 are of mode MODE.  This is
diff --git a/gcc/optabs.c b/gcc/optabs.c
index cadf4676c986c8430baafab8ef5282e890d36308..6052222c90ce559ae3b11a40fbd2ebbf4a6bc8dc 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3812,6 +3812,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
    gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
  	      || methods == OPTAB_LIB_WIDEN);
  
+  if (CONST_SCALAR_INT_P (y))
+    canonicalize_comparison (mode, &comparison, &y);
+
    /* If we are optimizing, force expensive constants into a register.  */
    if (CONSTANT_P (x) && optimize
        && (rtx_cost (x, mode, COMPARE, 0, optimize_insn_for_speed_p ())
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 565ce3abbe4e199609a6915de648736b43aa838d..f92d6c2d6dcfb9721609b20b7ab49b35ee4ba03a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3296,6 +3296,15 @@ extern enum rtx_code unsigned_condition (enum rtx_code);
  extern enum rtx_code signed_condition (enum rtx_code);
  extern void mark_jump_label (rtx, rtx_insn *, int);
  
+/* Return true if integer comparison operator CODE interprets its operands
+   as unsigned.  */
+
+inline bool
+unsigned_condition_p (enum rtx_code code)
+{
+  return unsigned_condition (code) == code;
+}
+
  /* In jump.c */
  extern rtx_insn *delete_related_insns (rtx);
  
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
new file mode 100644
index 0000000000000000000000000000000000000000..ebc44d6dbc7287d907603d77d7b54496de177c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Go from four moves to two.  */
+
+int
+foo (long long x)
+{
+  return x <= 0x1999999999999998;
+}
+
+int
+GT (unsigned int x)
+{
+  return x > 0xfefffffe;
+}
+
+int
+LE (unsigned int x)
+{
+  return x <= 0xfefffffe;
+}
+
+int
+GE (long long x)
+{
+  return x >= 0xff000000;
+}
+
+int
+LT (int x)
+{
+  return x < 0xff000000;
+}
+
+/* Optimize the immediate in conditionals.  */
+
+int
+check (int x, int y)
+{
+  if (x > y && GT (x))
+    return 100;
+
+  return x;
+}
+
+int
+tern (int x)
+{
+  return x >= 0xff000000 ? 5 : -3;
+}
+
+/* baz produces one movk instruction.  */
+/* { dg-final { scan-assembler-times "movk" 1 } } */
Richard Sandiford Aug. 16, 2018, 10:28 a.m. | #6
Vlad Lazar <vlad.lazar@arm.com> writes:
> On 13/08/18 15:00, Richard Sandiford wrote:
>> Vlad Lazar <vlad.lazar@arm.com> writes:
>>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>>> index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
>>> --- a/gcc/expmed.h
>>> +++ b/gcc/expmed.h
>>> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
>>>    extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>>>    				  machine_mode, int, int);
>>>    
>>> +extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
>>> +
>>> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
>> 
>> It would probably be better to make the second function static (local
>> to expmed.c), since it's only used internally by canonicalize_comparison.
>> Switching the order of the two functions in expmed.c would avoid the need
>> for a forward declaration.
>> 
>>> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
>>>    
>>>      return target;
>>>    }
>>> +
>>> +/* Choose the more appropiate immediate in comparisons.  The purpose of this
>> 
>> Maybe worth saying "scalar integer comparisons", since the function
>> doesn't handle vector or floating-point comparisons.
>> 
>>> + is to end up with an immediate which can be loaded into a register
>>> in fewer
>>> +   moves, if possible.
>>> +
>>> +   For each integer comparison there exists an equivalent choice:
>>> +     i)   a >  b or a >= b + 1
>>> +     ii)  a <= b or a <  b + 1
>>> +     iii) a >= b or a >  b - 1
>>> +     iv)  a <  b or a <= b - 1
>>> +
>>> +   The function is called in the gimple expanders which handle GIMPLE_ASSIGN
>>> + and GIMPLE_COND.  It assumes that the first operand of the
>>> comparison is a
>>> +   register and the second is a constant.
>> 
>> This last paragraph doesn't seem accurate any more.  Probably best to
>> drop it, since there's no real need to say who the callers are if we
>> describe the interface.
>> 
>>> +   mode is the mode of the first operand.
>>> +   code points to the comparison code.
>>> +   imm points to the rtx containing the immediate.  */
>> 
>> GCC's convention is to put parameter names in caps in comments,
>> so "MODE is...", etc.
>> 
>> On the IMM line, it might be worth adding "*IMM must satisfy
>> CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
>> on exit."
>> 
>>> +void canonicalize_comparison (machine_mode mode, enum rtx_code
>>> *code, rtx *imm)
>>> +{
>>> +  int to_add = 0;
>>> +
>>> +  if (!SCALAR_INT_MODE_P (mode))
>>> +    return;
>>> +
>>> +  /* Extract the immediate value from the rtx.  */
>>> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
>> 
>> This should be:
>> 
>>    rtx_mode_t imm_val (*imm, mode);
>> 
>> so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.
>> 
>>> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>> +    to_add = 1;
>>> +
>>> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>> +    to_add = -1;
>> 
>> Might be better to have:
>> 
>>    if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>      to_add = 1;
>>    else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>      to_add = -1;
>>    else
>>      return;
>> 
>> so that we exit early for other comparisons, rather than doing wasted work.
>> 
>>> +  /* Check for overflow/underflow in the case of signed values and
>>> +     wrapping around in the case of unsigned values.  If any occur
>>> +     cancel the optimization.  */
>>> +  wide_int max_val = wi::max_value (mode, SIGNED);
>>> +  wide_int min_val = wi::min_value (mode, SIGNED);
>>> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
>>> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
>>> +    return;
>> 
>> This needs to use the SIGNED/UNSIGNED choice appropriate for the
>> comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
>> seem to be an existing function that says whether an rtx comparison
>> operation is signed or not (bit of a surprise), but there is
>> unsigned_condition, which returns the unsigned form of an rtx
>> comparison operator.  It might be worth adding something like:
>> 
>>    /* Return true if integer comparison operator CODE interprets its operands
>>       as unsigned.  */
>> 
>>    inline bool
>>    unsigned_condition_p (rtx_code code)
>>    {
>>      return unsigned_condition (code) == code;
>>    }
>> 
>> and using that to choose between SIGNED and UNSIGNED.
>> 
>> Rather than using wi::cmp, a more direct way of checking for overflow
>> is to change this:
>> 
>>> +  /* Generate a mov instruction for both cases and see whether the change
>>> +     was profitable.  */
>>> +  wide_int imm_modif = wi::add (imm_val, to_add);
>> 
>> to use the overflow form of wi::add, i.e.:
>> 
>>    wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);
>> 
>> and return if "overflow" is set.
>> 
>>> +
>>> +  rtx reg = gen_reg_rtx (mode);
>> 
>> gen_reg_rtx creates a new pseudo register that essentially stays
>> around until we've finished compiling the function.  It's better to use:
>> 
>>    gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
>> 
>> for cost calculations, so that we don't waste pseudo register numbers.
>> 
>>> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
>> 
>> This should be:
>> 
>>    rtx new_imm = immed_wide_int_const (imm_modif, mode);
>> 
>> to cope with non-CONST_INT immediates, and to ensure that CONST_INT
>> immediates are properly sign-extended.
>> 
>> (The rtx interfaces are a bit clunky, sorry.)
>> 
>> Patch looks good to me with those changes, but someone else will
>> need to approve.
>> 
>> Thanks,
>> Richard
>> 
>
> Thanks for taking the time to point everything out.
>
> I've updated the patch according to Richard's comments.
> See the updated patch below. OK for trunk?

Thanks for the updates.  The patch looks good to me FWIW.

Richard
Jeff Law Aug. 16, 2018, 4:35 p.m. | #7
On 08/14/2018 11:01 AM, Vlad Lazar wrote:
> On 13/08/18 15:00, Richard Sandiford wrote:
>> Vlad Lazar <vlad.lazar@arm.com> writes:
>>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>>> index
>>> 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16
>>> 100644
>>> --- a/gcc/expmed.h
>>> +++ b/gcc/expmed.h
>>> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code,
>>> rtx, rtx, machine_mode,
>>>    extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>>>                      machine_mode, int, int);
>>>    +extern void canonicalize_comparison (machine_mode, enum rtx_code
>>> *, rtx *);
>>> +
>>> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
>>
>> It would probably be better to make the second function static (local
>> to expmed.c), since it's only used internally by canonicalize_comparison.
>> Switching the order of the two functions in expmed.c would avoid the need
>> for a forward declaration.
>>
>>> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum
>>> rtx_code code, rtx op0, rtx op1,
>>>         return target;
>>>    }
>>> +
>>> +/* Choose the more appropiate immediate in comparisons.  The purpose
>>> of this
>>
>> Maybe worth saying "scalar integer comparisons", since the function
>> doesn't handle vector or floating-point comparisons.
>>
>>> +   is to end up with an immediate which can be loaded into a
>>> register in fewer
>>> +   moves, if possible.
>>> +
>>> +   For each integer comparison there exists an equivalent choice:
>>> +     i)   a >  b or a >= b + 1
>>> +     ii)  a <= b or a <  b + 1
>>> +     iii) a >= b or a >  b - 1
>>> +     iv)  a <  b or a <= b - 1
>>> +
>>> +   The function is called in the gimple expanders which handle
>>> GIMPLE_ASSIGN
>>> +   and GIMPLE_COND.  It assumes that the first operand of the
>>> comparison is a
>>> +   register and the second is a constant.
>>
>> This last paragraph doesn't seem accurate any more.  Probably best to
>> drop it, since there's no real need to say who the callers are if we
>> describe the interface.
>>
>>> +   mode is the mode of the first operand.
>>> +   code points to the comparison code.
>>> +   imm points to the rtx containing the immediate.  */
>>
>> GCC's convention is to put parameter names in caps in comments,
>> so "MODE is...", etc.
>>
>> On the IMM line, it might be worth adding "*IMM must satisfy
>> CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
>> on exit."
>>
>>> +void canonicalize_comparison (machine_mode mode, enum rtx_code
>>> *code, rtx *imm)
>>> +{
>>> +  int to_add = 0;
>>> +
>>> +  if (!SCALAR_INT_MODE_P (mode))
>>> +    return;
>>> +
>>> +  /* Extract the immediate value from the rtx.  */
>>> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
>>
>> This should be:
>>
>>    rtx_mode_t imm_val (*imm, mode);
>>
>> so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.
>>
>>> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>> +    to_add = 1;
>>> +
>>> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>> +    to_add = -1;
>>
>> Might be better to have:
>>
>>    if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>      to_add = 1;
>>    else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>      to_add = -1;
>>    else
>>      return;
>>
>> so that we exit early for other comparisons, rather than doing wasted
>> work.
>>
>>> +  /* Check for overflow/underflow in the case of signed values and
>>> +     wrapping around in the case of unsigned values.  If any occur
>>> +     cancel the optimization.  */
>>> +  wide_int max_val = wi::max_value (mode, SIGNED);
>>> +  wide_int min_val = wi::min_value (mode, SIGNED);
>>> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
>>> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
>>> +    return;
>>
>> This needs to use the SIGNED/UNSIGNED choice appropriate for the
>> comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
>> seem to be an existing function that says whether an rtx comparison
>> operation is signed or not (bit of a surprise), but there is
>> unsigned_condition, which returns the unsigned form of an rtx
>> comparison operator.  It might be worth adding something like:
>>
>>    /* Return true if integer comparison operator CODE interprets its
>> operands
>>       as unsigned.  */
>>
>>    inline bool
>>    unsigned_condition_p (rtx_code code)
>>    {
>>      return unsigned_condition (code) == code;
>>    }
>>
>> and using that to choose between SIGNED and UNSIGNED.
>>
>> Rather than using wi::cmp, a more direct way of checking for overflow
>> is to change this:
>>
>>> +  /* Generate a mov instruction for both cases and see whether the
>>> change
>>> +     was profitable.  */
>>> +  wide_int imm_modif = wi::add (imm_val, to_add);
>>
>> to use the overflow form of wi::add, i.e.:
>>
>>    wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);
>>
>> and return if "overflow" is set.
>>
>>> +
>>> +  rtx reg = gen_reg_rtx (mode);
>>
>> gen_reg_rtx creates a new pseudo register that essentially stays
>> around until we've finished compiling the function.  It's better to use:
>>
>>    gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
>>
>> for cost calculations, so that we don't waste pseudo register numbers.
>>
>>> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
>>
>> This should be:
>>
>>    rtx new_imm = immed_wide_int_const (imm_modif, mode);
>>
>> to cope with non-CONST_INT immediates, and to ensure that CONST_INT
>> immediates are properly sign-extended.
>>
>> (The rtx interfaces are a bit clunky, sorry.)
>>
>> Patch looks good to me with those changes, but someone else will
>> need to approve.
>>
>> Thanks,
>> Richard
>>
> 
> Thanks for taking the time to point everything out.
> 
> I've updated the patch according to Richard's comments.
> See the updated patch below. OK for trunk?
> 
> Thanks,
> Vlad
> 
> ---
> 
> This patch optimises the choice of immediates in integer comparisons.
> Integer
> comparisons allow for different choices (e.g. a > b is equivalent to a
>>= b+1)
> and there are cases where an incremented/decremented immediate can be
> loaded into a
> register in fewer instructions. The cases are as follows:
> 
> i)   a >  b or a >= b + 1
> ii)  a <= b or a <  b + 1
> iii) a >= b or a >  b - 1
> iv)  a <  b or a <= b - 1
> 
> For each comparison we check whether the equivalent can be performed in
> less instructions.
> This is done in the gimple expanders. Therefore, it requires a correct
> implementation of the
> TARGET_INSN_COST hook. The change is general and it applies to any
> integer comparison, regardless
> of types or location.
> 
> For example, on AArch64 for the following code:
> 
> int foo (int x)
> {
>   return x > 0xfefffffe;
> }
> 
> it generates:
> 
> mov    w1, -16777217
> cmp    w0, w1
> cset    w0, cs
> 
> instead of:
> 
> mov    w1, 65534
> movk    w1, 0xfeff, lsl 16
> cmp    w0, w1
> cset    w0, hi
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and there are no regressions.
> 
> Thanks,
> Vlad
> 
> gcc/testsuite/
> Changelog for gcc/testsuite/Changelog
> 2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>
> 
>     * gcc.target/aarch64/imm_choice_comparison.c: New.
> 
> gcc/
> Changelog for gcc/Changelog
> 2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>
>     * expmed.h (canonicalize_comparison): New declaration.
>     * expmed.c (canonicalize_comparison, equivalent_cmp_code): New.
>     * expmed.c (emit_store_flag_1): Add call to canonicalize_comparison.
>     * optabs.c (prepare_cmp_insn): Likewise.
>     * rtl.h (unsigned_condition_p): New function which checks if a
> comparison operator is unsigned.
Thanks.  I fixed up the ChangeLog entry and installed this on the trunk.

Richard S -- thanks for working with Vlad to push this forward.

jeff
Vlad Lazar Aug. 16, 2018, 4:46 p.m. | #8
On 16/08/18 17:35, Jeff Law wrote:
> On 08/14/2018 11:01 AM, Vlad Lazar wrote:
>> On 13/08/18 15:00, Richard Sandiford wrote:
>>> Vlad Lazar <vlad.lazar@arm.com> writes:
>>>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>>>> index
>>>> 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16
>>>> 100644
>>>> --- a/gcc/expmed.h
>>>> +++ b/gcc/expmed.h
>>>> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code,
>>>> rtx, rtx, machine_mode,
>>>>     extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>>>>                       machine_mode, int, int);
>>>>     +extern void canonicalize_comparison (machine_mode, enum rtx_code
>>>> *, rtx *);
>>>> +
>>>> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
>>>
>>> It would probably be better to make the second function static (local
>>> to expmed.c), since it's only used internally by canonicalize_comparison.
>>> Switching the order of the two functions in expmed.c would avoid the need
>>> for a forward declaration.
>>>
>>>> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum
>>>> rtx_code code, rtx op0, rtx op1,
>>>>          return target;
>>>>     }
>>>> +
>>>> +/* Choose the more appropiate immediate in comparisons.  The purpose
>>>> of this
>>>
>>> Maybe worth saying "scalar integer comparisons", since the function
>>> doesn't handle vector or floating-point comparisons.
>>>
>>>> +   is to end up with an immediate which can be loaded into a
>>>> register in fewer
>>>> +   moves, if possible.
>>>> +
>>>> +   For each integer comparison there exists an equivalent choice:
>>>> +     i)   a >  b or a >= b + 1
>>>> +     ii)  a <= b or a <  b + 1
>>>> +     iii) a >= b or a >  b - 1
>>>> +     iv)  a <  b or a <= b - 1
>>>> +
>>>> +   The function is called in the gimple expanders which handle
>>>> GIMPLE_ASSIGN
>>>> +   and GIMPLE_COND.  It assumes that the first operand of the
>>>> comparison is a
>>>> +   register and the second is a constant.
>>>
>>> This last paragraph doesn't seem accurate any more.  Probably best to
>>> drop it, since there's no real need to say who the callers are if we
>>> describe the interface.
>>>
>>>> +   mode is the mode of the first operand.
>>>> +   code points to the comparison code.
>>>> +   imm points to the rtx containing the immediate.  */
>>>
>>> GCC's convention is to put parameter names in caps in comments,
>>> so "MODE is...", etc.
>>>
>>> On the IMM line, it might be worth adding "*IMM must satisfy
>>> CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
>>> on exit."
>>>
>>>> +void canonicalize_comparison (machine_mode mode, enum rtx_code
>>>> *code, rtx *imm)
>>>> +{
>>>> +  int to_add = 0;
>>>> +
>>>> +  if (!SCALAR_INT_MODE_P (mode))
>>>> +    return;
>>>> +
>>>> +  /* Extract the immediate value from the rtx.  */
>>>> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
>>>
>>> This should be:
>>>
>>>     rtx_mode_t imm_val (*imm, mode);
>>>
>>> so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.
>>>
>>>> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>>> +    to_add = 1;
>>>> +
>>>> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>>> +    to_add = -1;
>>>
>>> Might be better to have:
>>>
>>>     if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>>       to_add = 1;
>>>     else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>>       to_add = -1;
>>>     else
>>>       return;
>>>
>>> so that we exit early for other comparisons, rather than doing wasted
>>> work.
>>>
>>>> +  /* Check for overflow/underflow in the case of signed values and
>>>> +     wrapping around in the case of unsigned values.  If any occur
>>>> +     cancel the optimization.  */
>>>> +  wide_int max_val = wi::max_value (mode, SIGNED);
>>>> +  wide_int min_val = wi::min_value (mode, SIGNED);
>>>> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
>>>> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
>>>> +    return;
>>>
>>> This needs to use the SIGNED/UNSIGNED choice appropriate for the
>>> comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
>>> seem to be an existing function that says whether an rtx comparison
>>> operation is signed or not (bit of a surprise), but there is
>>> unsigned_condition, which returns the unsigned form of an rtx
>>> comparison operator.  It might be worth adding something like:
>>>
>>>     /* Return true if integer comparison operator CODE interprets its
>>> operands
>>>        as unsigned.  */
>>>
>>>     inline bool
>>>     unsigned_condition_p (rtx_code code)
>>>     {
>>>       return unsigned_condition (code) == code;
>>>     }
>>>
>>> and using that to choose between SIGNED and UNSIGNED.
>>>
>>> Rather than using wi::cmp, a more direct way of checking for overflow
>>> is to change this:
>>>
>>>> +  /* Generate a mov instruction for both cases and see whether the
>>>> change
>>>> +     was profitable.  */
>>>> +  wide_int imm_modif = wi::add (imm_val, to_add);
>>>
>>> to use the overflow form of wi::add, i.e.:
>>>
>>>     wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);
>>>
>>> and return if "overflow" is set.
>>>
>>>> +
>>>> +  rtx reg = gen_reg_rtx (mode);
>>>
>>> gen_reg_rtx creates a new pseudo register that essentially stays
>>> around until we've finished compiling the function.  It's better to use:
>>>
>>>     gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
>>>
>>> for cost calculations, so that we don't waste pseudo register numbers.
>>>
>>>> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
>>>
>>> This should be:
>>>
>>>     rtx new_imm = immed_wide_int_const (imm_modif, mode);
>>>
>>> to cope with non-CONST_INT immediates, and to ensure that CONST_INT
>>> immediates are properly sign-extended.
>>>
>>> (The rtx interfaces are a bit clunky, sorry.)
>>>
>>> Patch looks good to me with those changes, but someone else will
>>> need to approve.
>>>
>>> Thanks,
>>> Richard
>>>
>>
>> Thanks for taking the time to point everything out.
>>
>> I've updated the patch according to Richard's comments.
>> See the updated patch below. OK for trunk?
>>
>> Thanks,
>> Vlad
>>
>> ---
>>
>> This patch optimises the choice of immediates in integer comparisons.
>> Integer
>> comparisons allow for different choices (e.g. a > b is equivalent to a
>>> = b+1)
>> and there are cases where an incremented/decremented immediate can be
>> loaded into a
>> register in fewer instructions. The cases are as follows:
>>
>> i)   a >  b or a >= b + 1
>> ii)  a <= b or a <  b + 1
>> iii) a >= b or a >  b - 1
>> iv)  a <  b or a <= b - 1
>>
>> For each comparison we check whether the equivalent can be performed in
>> less instructions.
>> This is done in the gimple expanders. Therefore, it requires a correct
>> implementation of the
>> TARGET_INSN_COST hook. The change is general and it applies to any
>> integer comparison, regardless
>> of types or location.
>>
>> For example, on AArch64 for the following code:
>>
>> int foo (int x)
>> {
>>    return x > 0xfefffffe;
>> }
>>
>> it generates:
>>
>> mov    w1, -16777217
>> cmp    w0, w1
>> cset    w0, cs
>>
>> instead of:
>>
>> mov    w1, 65534
>> movk    w1, 0xfeff, lsl 16
>> cmp    w0, w1
>> cset    w0, hi
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu,
>> x86_64-pc-linux-gnu and there are no regressions.
>>
>> Thanks,
>> Vlad
>>
>> gcc/testsuite/
>> Changelog for gcc/testsuite/Changelog
>> 2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>
>>
>>      * gcc.target/aarch64/imm_choice_comparison.c: New.
>>
>> gcc/
>> Changelog for gcc/Changelog
>> 2018-08-14  Vlad Lazar  <vlad.lazar@arm.com>
>>      * expmed.h (canonicalize_comparison): New declaration.
>>      * expmed.c (canonicalize_comparison, equivalent_cmp_code): New.
>>      * expmed.c (emit_store_flag_1): Add call to canonicalize_comparison.
>>      * optabs.c (prepare_cmp_insn): Likewise.
>>      * rtl.h (unsigned_condition_p): New function which checks if a
>> comparison operator is unsigned.
> Thanks.  I fixed up the ChangeLog entry and installed this on the trunk.
>
> Richard S -- thanks for working with Vlad to push this forward.
>
> jeff
>
Thanks for committing.  Sorry about the ChangeLog.

Vlad
Jeff Law Aug. 16, 2018, 4:48 p.m. | #9
On 08/16/2018 10:46 AM, Vlad Lazar wrote:
>> Thanks.  I fixed up the ChangeLog entry and installed this on the trunk.
>>
>> Richard S -- thanks for working with Vlad to push this forward.
>>
>> jeff
>>
> Thanks for committing.  Sorry about the ChangeLog.
No worries.  Just trivial stuff that we deal with all the time.

If you're going to be contributing regularly you probably want to get
write access to the gcc repository.

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

List me as the sponsor/approver.

jeff

Patch

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 9b91279282e1c6956c8b3699f13036c401ea1dcd..5b0a2e0cdb23f649336844506c8241428b3e6e7d 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5423,6 +5423,157 @@  reorder_operands (basic_block bb)
    XDELETE (lattice);
  }
  
+/* Helper function for update_immediate.  Returns the adjusted conditional
+   code for the immediate choice optimization.  See optimize_immediate_choice
+   for cases.  */
+
+static enum tree_code
+immediate_optimized_code (enum tree_code code)
+{
+  switch (code)
+    {
+    case GT_EXPR:
+      return GE_EXPR;
+    case GE_EXPR:
+      return GT_EXPR;
+    case LT_EXPR:
+      return LE_EXPR;
+    case LE_EXPR:
+      return LT_EXPR;
+    default:
+      return code;
+    }
+}
+
+/* Helper function for optimize_immediate_choice.  It updates the immediate
+   and the subcode of the gimple statement.  At the point of calling
+   the function we know that the optimization leads to fewer instructions.
+   stmt points to the gimple statement containing the comparison we wish to
+   optimize and to_add is the amount by which the immediate needs to be
+   adjusted (1 or -1).  */
+
+static void
+update_immediate (gimple *stmt, int to_add)
+{
+  tree inc_dec = to_add == 1 ? build_one_cst (integer_type_node) :
+			       build_minus_one_cst (integer_type_node);
+
+  enum gimple_code code = gimple_code (stmt);
+  if (code == GIMPLE_COND)
+    {
+      gcond *gc = GIMPLE_CHECK2<gcond *> (stmt);
+      tree rhs = gimple_cond_rhs (gc);
+
+      /* Update the statement.  */
+      tree new_rhs = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs), rhs, inc_dec);
+      gimple_cond_set_rhs (gc, new_rhs);
+      gc->subcode = immediate_optimized_code ((enum tree_code) gc->subcode);
+    }
+  if (code == GIMPLE_ASSIGN)
+    {
+      gassign *ga = GIMPLE_CHECK2<gassign *> (stmt);
+      tree rhs2 = gimple_assign_rhs2 (ga);
+
+      tree new_rhs2 = fold_build2 (PLUS_EXPR, TREE_TYPE (rhs2), rhs2, inc_dec);
+      gimple_assign_set_rhs2 (ga, new_rhs2);
+      ga->subcode = immediate_optimized_code ((enum tree_code) ga->subcode);
+    }
+}
+
+/* Helper function for optimize_immediate_choice.  It checks if the other
+   possible immediate leads to a lower rtx cost.  to_add is the amount by
+   which the immediate needs to be adjusted (1 or -1).  The function
+   returns 0 if there is no improvement and the amount by which the immediate
+   needs to be adjusted (1 or -1) otherwise.  */
+
+static int
+validate_immediate_optimization (gimple *stmt, int to_add)
+{
+  tree tree_imm = gimple_code (stmt) == GIMPLE_COND ? gimple_cond_rhs (stmt)
+						: gimple_assign_rhs2 (stmt);
+  const_tree type = TREE_TYPE (tree_imm);
+  widest_int imm = wi::to_widest (tree_imm);
+  enum signop sgn = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
+
+  /* Check for overflow/underflow in the case of signed values and
+     wrapping around in the case of unsigned values.  If any occur
+     cancel the optimization.  */
+
+  widest_int max_type = wi::to_widest (TYPE_MAX_VALUE (type));
+  widest_int min_type = wi::to_widest (TYPE_MIN_VALUE (type));
+  if ((wi::cmp (imm, max_type, sgn) == 0 && to_add == 1)
+      || (wi::cmp (imm, min_type, sgn) == 0 && to_add == -1))
+    return 0;
+
+  widest_int imm_modif = wi::add (imm, to_add);
+
+  rtx reg = gen_reg_rtx (DImode);
+  rtx old_imm = GEN_INT (imm.to_shwi ());
+  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
+
+  rtx_insn *old_rtx = gen_move_insn (reg, old_imm);
+  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
+
+  /* If the change is beneficial we can just propagate to_add further,
+     otherwise return 0 to cancel the optimization.  */
+  return insn_cost (old_rtx, true) > insn_cost (new_rtx, true) ? to_add : 0;
+}
+
+/* Helper function for optimize_immediate_choice.  Checks if the gimple
+   statement has the right shape for the optimization.  */
+
+static bool
+can_optimize_immediate_p (const gimple *stmt)
+{
+  enum gimple_code code = gimple_code (stmt);
+  if (code != GIMPLE_ASSIGN && code != GIMPLE_COND)
+    return false;
+
+  if (code == GIMPLE_ASSIGN
+      && (gimple_num_ops (stmt) != 3
+	  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST))
+    return false;
+  if (code == GIMPLE_COND && TREE_CODE (gimple_cond_rhs (stmt)) != INTEGER_CST)
+    return false;
+
+  return true;
+}
+
+/* Entry point for immediate choice optimization.  It aims to choose
+   the simpler immediate in integer comparisons.  The purpose of this
+   is to end up with an immediate which can be loaded into a register
+   in fewer moves, if possible.
+
+   For each integer comparison there exists an equivalent choice:
+     i)   a >  b or a >= b + 1
+     ii)  a <= b or a <  b + 1
+     iii) a >= b or a >  b - 1
+     iv)  a <  b or a <= b - 1
+
+   Comparisons can only appear on the rhs of a GIMPLE_ASSIGN
+   or in a GIMPLE_COND.  */
+
+static void
+optimize_immediate_choice (gimple *stmt)
+{
+  if (!can_optimize_immediate_p (stmt))
+    return;
+
+  /* Check if the other immediate available is preferable.  */
+  int to_add = 0;
+  if (stmt->subcode == GT_EXPR || stmt->subcode == LE_EXPR)
+    to_add = validate_immediate_optimization (stmt, 1);
+
+  if (stmt->subcode == GE_EXPR || stmt->subcode == LT_EXPR)
+    to_add = validate_immediate_optimization (stmt, -1);
+
+  if (!to_add)
+    return;
+
+  /* Update the current statement.  */
+  update_immediate (stmt, to_add);
+}
+
  /* Expand basic block BB from GIMPLE trees to RTL.  */
  
  static basic_block
@@ -5515,6 +5666,7 @@  expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
        basic_block new_bb;
  
        stmt = gsi_stmt (gsi);
+      optimize_immediate_choice (stmt);
  
        /* If this statement is a non-debug one, and we generate debug
  	 insns, then this one might be the last real use of a TERed
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
new file mode 100644
index 0000000000000000000000000000000000000000..b30dcca88f44ca73fcb8202ea488888b365400c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -0,0 +1,53 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return x >= 0xfff01;
+}
+
+int
+GT (unsigned int x)
+{
+  return x > 0xfefffffe;
+}
+
+/* Go from four moves to two.  */
+
+int
+baz (long long x)
+{
+  return x <= 0x1999999999999998;
+}
+
+int
+LE (unsigned int x)
+{
+  return x <= 0xfefffffe;
+}
+
+int
+GE (long long x)
+{
+  return x >= 0xff000000;
+}
+
+int
+LT (int x)
+{
+  return x < 0xff000000;
+}
+
+/* Optimize the immediate in conditionals.  */
+
+int check (int x, int y)
+{
+  if (x > y && GT (x))
+    return 100;
+
+  return x;
+}
+
+/* baz produces one movk instruction.  */
+/* { dg-final { scan-assembler-times "movk" 1 } } */