diff mbox

Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

Message ID 20131012073934.GB30970@tucnak.zalov.cz
State New
Headers show

Commit Message

Jakub Jelinek Oct. 12, 2013, 7:39 a.m. UTC
On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote:
> As you had mentioned, the transition in this patch does not reduce
> instructions. But the preexisting optimization does. So we prefer to do the
> preexisting optimization first. E.g.
> 
> X == 10 || X == 12 || X == 26

Ok, that makes sense.

@@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode,
 	}
     }
 
+  /* Optimize X == CST1 || X == CST2
+     if popcount (CST2 - CST1) == 1 into
+     ((X - CST1) & ~(CST2 - CST1)) == 0.  */

Mention here also similarly to the other comment that it works also
with ranges.

+  if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2)
+    for (i = first; i < length; i++)
+      {
+	tree lowi, highi, lowj, highj, type, tem1, tem2, mask;
+
+	if (ranges[i].exp == NULL_TREE || ranges[i].in_p)
+	  continue;
+	type = TREE_TYPE (ranges[i].exp);
+	if (!INTEGRAL_TYPE_P (type))
+	  continue;
+	lowi = ranges[i].low;
+	if (lowi == NULL_TREE)
+	  continue;

The other loop has here:
      if (lowi == NULL_TREE)
        lowi = TYPE_MIN_VALUE (type);
instead, which is better, can you please change it?

+	highi = ranges[i].high;
+	if (highi == NULL_TREE)
+	  continue;
+	for (j = i + 1; j < length && j < i + 64; j++)
+	  {
+	    lowj = ranges[j].low;
+	    if (lowj == NULL_TREE)
+	      continue;
+	    highj = ranges[j].high;
+	    if (highj == NULL_TREE)
+	      continue;

The other loop has
	if (highj == NULL_TREE)
	  highj = TYPE_MAX_VALUE (type);
here instead.

+	    if (ranges[j].exp == NULL_TREE || ranges[j].in_p
+		|| (ranges[i].exp != ranges[j].exp))
+	      continue;

Can you please move this test the lowj = assignment, and
remove the ranges[j].exp == NULL_TREE test (both here and
in the earlier loop)?  I mean, we've checked at the
beginning of for (i = first; i < length; i++) loop that
ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE,
it will be different than ranges[i].exp.
So
	if (ranges[i].exp != ranges[j].exp || ranges[j].in_p)
	  continue;
(and please also collapse the two checks in the first loop,
so that both look similar).

+	    /* Check lowj > highi.  */
+	    tem1 = fold_binary (GT_EXPR, boolean_type_node,
+			       lowj, highi);
+	    if (tem1 == NULL_TREE || !integer_onep (tem1))
+	      continue;
+	    /* Check highi - lowi == highj - lowj.  */
+	    tem1 = fold_binary (MINUS_EXPR, type, highi, lowi);
+	    if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
+	      continue;
+	    tem2 = fold_binary (MINUS_EXPR, type, highj, lowj);
+	    if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST)
+	      continue;
+	    if (!tree_int_cst_equal (tem1, tem2))
+	      continue;
+	    /* Check popcount (lowj - lowi) == 1.  */
+	    tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
+	    if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
+	      continue;
+	    if (tree_log2 (tem1) < 0)
+	      continue;
+	    mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
+	    tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi);
+	    tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
+	    lowi = build_int_cst (type, 0);

Please use lowj instead of lowi here.  Because, if update_range_test
fails, we continue looking for further matches in following ranges,
and while lowj will be computed again, lowi will be assumed to contain
the low bound of the first range.

+	    if (update_range_test (ranges + i, ranges + j, 1, opcode, ops, tem1,
+				   ranges[i].in_p, lowi, tem2,

And here too.

Or, alternatively to avoid duplication, you could put the whole
for (i = first; i < length; i++)
loop from the first optimization into another
int i;
  for (pass = 0; pass < (BRANCH_COST (optimize_function_for_speed_p (cfun),
				      false) >= 2) + 1; pass++)
  {
  }
loop, use whatever is common in the loop unconditionally, and just
conditionalize the rest on pass == 0 vs. pass == 1.
Or maybe even better just move this whole loop into a new function,
with ranges, opcode, first, length arguments plus another (bool?) argument
which of the two optimizations you want to perform, and return the bool
any_changes.  Then you wouldn't run into line length issues that you could
run into with the extra loop.


int
main ()
{
  volatile int n43, n47, n75, n79;
  n43 = 43; n47 = n43 + 4; n75 = 75; n79 = n75 + 4;
  int i;
  for (i = -10; i <= 100; i++)
    if (test (i, 20, 30) != 30 - ((i >= n43 && i < n47)
				  || (i >= n75 && i < n79)) * 10)
      __builtin_abort ();
  return 0;
}

?

	Jakub

Comments

Zhenqiang Chen Oct. 14, 2013, 7:10 a.m. UTC | #1
Patch and test cases are updated according to your comments.

Bootstrap and no make check regression on X86-64.

ChangeLog:
2013-10-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>

	* tree-ssa-reassoc.c (try_transfer_range_tests_1): New function,
	extracted from optimize_range_tests
	* (try_transfer_range_tests_2): New function.
	* (try_transfer_range_tests): New function, extracted from
	optimize_range_tests and calls try_transfer_range_tests_1 and
	try_transfer_range_tests_2,
	* (optimize_range_tests): Use try_transfer_range_tests.

testsuite/ChangeLog:
2013-10-14  Zhenqiang Chen  <zhenqiang.chen@arm.com>

	* gcc.dg/tree-ssa/reassoc-32.c: New test case.
	* gcc.dg/tree-ssa/reassoc-33.c: New test case.
	* gcc.dg/tree-ssa/reassoc-34.c: New test case.
	* gcc.dg/tree-ssa/reassoc-35.c: New test case.
	* gcc.dg/tree-ssa/reassoc-36.c: New test case.

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Saturday, October 12, 2013 3:40 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
> CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> 
> On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote:
> > As you had mentioned, the transition in this patch does not reduce
> > instructions. But the preexisting optimization does. So we prefer to
> > do the preexisting optimization first. E.g.
> >
> > X == 10 || X == 12 || X == 26
> 
> Ok, that makes sense.
> 
> @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode,
>  	}
>      }
> 
> +  /* Optimize X == CST1 || X == CST2
> +     if popcount (CST2 - CST1) == 1 into
> +     ((X - CST1) & ~(CST2 - CST1)) == 0.  */
> 
> Mention here also similarly to the other comment that it works also with
> ranges.
> 
> +  if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2)
> +    for (i = first; i < length; i++)
> +      {
> +	tree lowi, highi, lowj, highj, type, tem1, tem2, mask;
> +
> +	if (ranges[i].exp == NULL_TREE || ranges[i].in_p)
> +	  continue;
> +	type = TREE_TYPE (ranges[i].exp);
> +	if (!INTEGRAL_TYPE_P (type))
> +	  continue;
> +	lowi = ranges[i].low;
> +	if (lowi == NULL_TREE)
> +	  continue;
> 
> The other loop has here:
>       if (lowi == NULL_TREE)
>         lowi = TYPE_MIN_VALUE (type);
> instead, which is better, can you please change it?
> 
> +	highi = ranges[i].high;
> +	if (highi == NULL_TREE)
> +	  continue;
> +	for (j = i + 1; j < length && j < i + 64; j++)
> +	  {
> +	    lowj = ranges[j].low;
> +	    if (lowj == NULL_TREE)
> +	      continue;
> +	    highj = ranges[j].high;
> +	    if (highj == NULL_TREE)
> +	      continue;
> 
> The other loop has
> 	if (highj == NULL_TREE)
> 	  highj = TYPE_MAX_VALUE (type);
> here instead.
> 
> +	    if (ranges[j].exp == NULL_TREE || ranges[j].in_p
> +		|| (ranges[i].exp != ranges[j].exp))
> +	      continue;
> 
> Can you please move this test the lowj = assignment, and remove the
> ranges[j].exp == NULL_TREE test (both here and in the earlier loop)?  I
mean,
> we've checked at the beginning of for (i = first; i < length; i++) loop
that
> ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will
be
> different than ranges[i].exp.
> So
> 	if (ranges[i].exp != ranges[j].exp || ranges[j].in_p)
> 	  continue;
> (and please also collapse the two checks in the first loop, so that both
look
> similar).
> 
> +	    /* Check lowj > highi.  */
> +	    tem1 = fold_binary (GT_EXPR, boolean_type_node,
> +			       lowj, highi);
> +	    if (tem1 == NULL_TREE || !integer_onep (tem1))
> +	      continue;
> +	    /* Check highi - lowi == highj - lowj.  */
> +	    tem1 = fold_binary (MINUS_EXPR, type, highi, lowi);
> +	    if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
> +	      continue;
> +	    tem2 = fold_binary (MINUS_EXPR, type, highj, lowj);
> +	    if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST)
> +	      continue;
> +	    if (!tree_int_cst_equal (tem1, tem2))
> +	      continue;
> +	    /* Check popcount (lowj - lowi) == 1.  */
> +	    tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
> +	    if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
> +	      continue;
> +	    if (tree_log2 (tem1) < 0)
> +	      continue;
> +	    mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
> +	    tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi);
> +	    tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
> +	    lowi = build_int_cst (type, 0);
> 
> Please use lowj instead of lowi here.  Because, if update_range_test
fails, we
> continue looking for further matches in following ranges, and while lowj
will
> be computed again, lowi will be assumed to contain the low bound of the
> first range.
> 
> +	    if (update_range_test (ranges + i, ranges + j, 1, opcode, ops,
tem1,
> +				   ranges[i].in_p, lowi, tem2,
> 
> And here too.
> 
> Or, alternatively to avoid duplication, you could put the whole for (i =
first; i <
> length; i++) loop from the first optimization into another int i;
>   for (pass = 0; pass < (BRANCH_COST (optimize_function_for_speed_p
> (cfun),
> 				      false) >= 2) + 1; pass++)
>   {
>   }
> loop, use whatever is common in the loop unconditionally, and just
> conditionalize the rest on pass == 0 vs. pass == 1.
> Or maybe even better just move this whole loop into a new function, with
> ranges, opcode, first, length arguments plus another (bool?) argument
which
> of the two optimizations you want to perform, and return the bool
> any_changes.  Then you wouldn't run into line length issues that you could
> run into with the extra loop.
> 
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-*
> +v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-*
> +mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
> +
> +/* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
> +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
> +
> +int test (int a, int b, int c)
> +{
> +  if (a == 43 || a == 75 || a == 44 || a == 78 || a == 77 || a == 46 || a
== 76 ||
> a == 45)
> +    return b;
> +  else
> +    return c;
> +}
> +
> +int main ()
> +{
> +  if (test (43, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (44, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (45, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (46, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (75, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (76, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (77, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (78, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (30, 20, 30) != 30)
> +    __builtin_abort ();
> 
> Perhaps it would be better to test more than just one value outside of the
> range.  Say everything from -10 to 100 might be better, but you'd want to
> make sure the optimization can't be applied to the What about
> 
> int
> main ()
> {
>   volatile int n43, n47, n75, n79;
>   n43 = 43; n47 = n43 + 4; n75 = 75; n79 = n75 + 4;
>   int i;
>   for (i = -10; i <= 100; i++)
>     if (test (i, 20, 30) != 30 - ((i >= n43 && i < n47)
> 				  || (i >= n75 && i < n79)) * 10)
>       __builtin_abort ();
>   return 0;
> }
> 
> ?
> 
> 	Jakub
Jakub Jelinek Oct. 14, 2013, 8:49 a.m. UTC | #2
On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote:

@@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, struct range_entry *otherrange,
   return true;
 }
 
+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2 && X != 3 && X != 10 && X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) <= 1U && (X - 10U) <= 1U
+   and this loop can transform that into
+   ((X & ~8) - 2U) <= 1U.  */
+
+static bool
+try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type,
+			    tree lowi, tree lowj, tree highi, tree highj,
+			    vec<operand_entry_t> *ops,
+			    struct range_entry *ranges)

The function names are bad, you aren't transfering anything, but optimizing.
Please rename try_transfer_range_tests to optimize_range_tests_1 and
try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps
better yet to optimize_range_tests_{xor,diff}.

Also, perhaps instead of passing ranges and i and j to these two functions
you could pass struct range_entry *range1, struct range_entry *range2
(caller would pass ranges + i, ranges + j).

+/* It does some common checks for function try_transfer_range_tests_1 and
+   try_transfer_range_tests_2.

Please adjust the comment for the renaming.  Please change trans_option
to bool optimize_xor.

+	  if (trans_option == 1)
+	    {
+	      if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj,
+					      highi, highj, ops, ranges))
+		{
+		  any_changes = true;
+		  break;
+		}
+	    }
+	  else if (trans_option == 2)
+	    {
+	      if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj,
+					      highi, highj, ops, ranges))
+		{
+		  any_changes = true;
+		  break;
+		}
+	    }

I'd prefer
	if (optimize_xor)
	  any_changes
	    = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
					highj, ops, ranges + i, ranges + j);
	else
	  any_changes
	    = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
					highj, ops, ranges + i, ranges + j);
	if (any_changes)
	  break;

Ok with those changes.

	Jakub
Zhenqiang Chen Oct. 15, 2013, 7:57 a.m. UTC | #3
> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Monday, October 14, 2013 4:49 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
> CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> 
> On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote:
> 
> @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range,
> struct range_entry *otherrange,
>    return true;
>  }
> 
> +/* Optimize X == CST1 || X == CST2
> +   if popcount (CST1 ^ CST2) == 1 into
> +   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
> +   Similarly for ranges.  E.g.
> +   X != 2 && X != 3 && X != 10 && X != 11
> +   will be transformed by the previous optimization into
> +   (X - 2U) <= 1U && (X - 10U) <= 1U
> +   and this loop can transform that into
> +   ((X & ~8) - 2U) <= 1U.  */
> +
> +static bool
> +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree
type,
> +			    tree lowi, tree lowj, tree highi, tree highj,
> +			    vec<operand_entry_t> *ops,
> +			    struct range_entry *ranges)
> 
> The function names are bad, you aren't transfering anything, but
optimizing.
> Please rename try_transfer_range_tests to optimize_range_tests_1 and
> try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps
> better yet to optimize_range_tests_{xor,diff}.

try_transfer_range_tests is changed to optimize_range_tests_1
try_transfer_range_tests_1 is changed to optimize_range_tests_xor
try_transfer_range_tests_2 is changed to optimize_range_tests_diff

> Also, perhaps instead of passing ranges and i and j to these two functions
> you could pass struct range_entry *range1, struct range_entry *range2
> (caller would pass ranges + i, ranges + j).

Updated to rangei and rangej since we use i, j, lowi, lowj etc at other
places.
 
> +/* It does some common checks for function try_transfer_range_tests_1
> and
> +   try_transfer_range_tests_2.
> 
> Please adjust the comment for the renaming.  Please change trans_option to
> bool optimize_xor.

Updated.

> +	  if (trans_option == 1)
> +	    {
> +	      if (try_transfer_range_tests_1 (opcode, i, j, type, lowi,
lowj,
> +					      highi, highj, ops, ranges))
> +		{
> +		  any_changes = true;
> +		  break;
> +		}
> +	    }
> +	  else if (trans_option == 2)
> +	    {
> +	      if (try_transfer_range_tests_2 (opcode, i, j, type, lowi,
lowj,
> +					      highi, highj, ops, ranges))
> +		{
> +		  any_changes = true;
> +		  break;
> +		}
> +	    }
> 
> I'd prefer
> 	if (optimize_xor)
> 	  any_changes
> 	    = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
> 					highj, ops, ranges + i, ranges + j);
> 	else
> 	  any_changes
> 	    = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
> 					highj, ops, ranges + i, ranges + j);
> 	if (any_changes)
> 	  break;

This is incorrect. The "any_changes" should be "|=" of the return values.

In the updated patch, I changed the code segment as

	  bool changes;
	  ...
	  if (optimize_xor)
	    changes = optimize_range_tests_xor (opcode, type, lowi, lowj,
						highi, highj, ops,
						ranges + i, ranges + j);
	  else
	    changes = optimize_range_tests_diff (opcode, type, lowi, lowj,
						 highi, highj, ops,
						 ranges + i, ranges + j);
	  if (changes)
	    {
	      any_changes = true;
	      break;
	    }

Is it OK?

Thanks!
-Zhenqiang

> Ok with those changes.
> 
> 	Jakub
Jakub Jelinek Oct. 15, 2013, 8:12 a.m. UTC | #4
On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:
> Is it OK?

Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2 && X != 3 && X != 10 && X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) <= 1U && (X - 10U) <= 1U
+   and this loop can transform that into
+   ((X & ~8) - 2U) <= 1U.  */

Here the example is using != and &&, so it is transformed into:
   !((X - 2U) <= 1U || (X - 10U) <= 1U)
(everything is negated at the end, and note || instead of &&,
with && it could fold into !(0))
and finally into:
   !(((X & ~8) - 2U) <= 1U)
Or alternatively change the first expression into:
  X == 2 || X == 3 || X == 10 || X == 11
and the second to:
  (X - 2U) <= 1U || (X - 10U) <= 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1) & ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) <= 3U && (X - 75U) <= 3U
+   and this loop can transform that into
+   ((X - 43U) & ~(75U - 43U)) <= 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
   (X - 43U) <= 3U || (X - 75U) <= 3U

Ok with those changes.

	Jakub
Jeff Law Oct. 15, 2013, 4:50 p.m. UTC | #5
On 10/15/13 02:12, Jakub Jelinek wrote:
> On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:
>> Is it OK?
>
> Ok, except two comments apparently still need updating.
>
> +/* Optimize X == CST1 || X == CST2
> +   if popcount (CST1 ^ CST2) == 1 into
> +   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
> +   Similarly for ranges.  E.g.
> +   X != 2 && X != 3 && X != 10 && X != 11
> +   will be transformed by the previous optimization into
> +   (X - 2U) <= 1U && (X - 10U) <= 1U
> +   and this loop can transform that into
> +   ((X & ~8) - 2U) <= 1U.  */
>
> Here the example is using != and &&, so it is transformed into:
>     !((X - 2U) <= 1U || (X - 10U) <= 1U)
> (everything is negated at the end, and note || instead of &&,
> with && it could fold into !(0))
> and finally into:
>     !(((X & ~8) - 2U) <= 1U)
> Or alternatively change the first expression into:
>    X == 2 || X == 3 || X == 10 || X == 11
> and the second to:
>    (X - 2U) <= 1U || (X - 10U) <= 1U
> and the third keep as is.
>
> +/* Optimize X == CST1 || X == CST2
> +   if popcount (CST2 - CST1) == 1 into
> +   ((X - CST1) & ~(CST2 - CST1)) == 0.
> +   Similarly for ranges.  E.g.
> +   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
> +   || X == 75 || X == 45
> +   will be transformed by the previous optimization into
> +   (X - 43U) <= 3U && (X - 75U) <= 3U
> +   and this loop can transform that into
> +   ((X - 43U) & ~(75U - 43U)) <= 3U.  */
>
> Here the example is using == and ||, so the only wrong thing in there
> is that the second expression should be
>     (X - 43U) <= 3U || (X - 75U) <= 3U
>
> Ok with those changes.
I noticed that we're now including rtl.h and tm_p.h in 
tree-ssa-reassoc.c, which seems wrong.

Jeff
Jakub Jelinek Oct. 15, 2013, 4:53 p.m. UTC | #6
On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
> I noticed that we're now including rtl.h and tm_p.h in
> tree-ssa-reassoc.c, which seems wrong.

Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h and
tm_p.h and just call it from there.

	Jakub
Jeff Law Oct. 15, 2013, 4:55 p.m. UTC | #7
On 10/15/13 10:53, Jakub Jelinek wrote:
> On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
>> I noticed that we're now including rtl.h and tm_p.h in
>> tree-ssa-reassoc.c, which seems wrong.
>
> Isn't that required for BRANCH_COST use?
> Other option would be to add some dummy wrapper around
> BRANCH_COST, put that wrapper into some file that already includes rtl.h and
> tm_p.h and just call it from there.
Yea, looking at it now, I'm a bit surprised this hasn't been converted 
to a target hook which would avoid this problem entirely.

jeff
Jeff Law Oct. 15, 2013, 5:48 p.m. UTC | #8
On 10/15/13 02:12, Jakub Jelinek wrote:
> On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:
>> Is it OK?
>
> Ok, except two comments apparently still need updating.
>
> +/* Optimize X == CST1 || X == CST2
> +   if popcount (CST1 ^ CST2) == 1 into
> +   (X & ~(CST1 ^ CST2)) == (CST1 & ~(CST1 ^ CST2)).
> +   Similarly for ranges.  E.g.
> +   X != 2 && X != 3 && X != 10 && X != 11
> +   will be transformed by the previous optimization into
> +   (X - 2U) <= 1U && (X - 10U) <= 1U
> +   and this loop can transform that into
> +   ((X & ~8) - 2U) <= 1U.  */
>
> Here the example is using != and &&, so it is transformed into:
>     !((X - 2U) <= 1U || (X - 10U) <= 1U)
> (everything is negated at the end, and note || instead of &&,
> with && it could fold into !(0))
> and finally into:
>     !(((X & ~8) - 2U) <= 1U)
> Or alternatively change the first expression into:
>    X == 2 || X == 3 || X == 10 || X == 11
> and the second to:
>    (X - 2U) <= 1U || (X - 10U) <= 1U
> and the third keep as is.
>
> +/* Optimize X == CST1 || X == CST2
> +   if popcount (CST2 - CST1) == 1 into
> +   ((X - CST1) & ~(CST2 - CST1)) == 0.
> +   Similarly for ranges.  E.g.
> +   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
> +   || X == 75 || X == 45
> +   will be transformed by the previous optimization into
> +   (X - 43U) <= 3U && (X - 75U) <= 3U
> +   and this loop can transform that into
> +   ((X - 43U) & ~(75U - 43U)) <= 3U.  */
>
> Here the example is using == and ||, so the only wrong thing in there
> is that the second expression should be
>     (X - 43U) <= 3U || (X - 75U) <= 3U
>
> Ok with those changes.
I fixed up the comments & ChangeLog entry and committed on Zhenqiang's 
behalf.

jeff
Richard Biener Oct. 16, 2013, 8:21 a.m. UTC | #9
On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law <law@redhat.com> wrote:
> On 10/15/13 10:53, Jakub Jelinek wrote:
>>
>> On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
>>>
>>> I noticed that we're now including rtl.h and tm_p.h in
>>> tree-ssa-reassoc.c, which seems wrong.
>>
>>
>> Isn't that required for BRANCH_COST use?
>> Other option would be to add some dummy wrapper around
>> BRANCH_COST, put that wrapper into some file that already includes rtl.h
>> and
>> tm_p.h and just call it from there.
>
> Yea, looking at it now, I'm a bit surprised this hasn't been converted to a
> target hook which would avoid this problem entirely.

You got the job!

;)

Richard.

> jeff
Jeff Law Oct. 16, 2013, 6:02 p.m. UTC | #10
On 10/16/13 02:21, Richard Biener wrote:
> On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law <law@redhat.com> wrote:
>> On 10/15/13 10:53, Jakub Jelinek wrote:
>>>
>>> On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
>>>>
>>>> I noticed that we're now including rtl.h and tm_p.h in
>>>> tree-ssa-reassoc.c, which seems wrong.
>>>
>>>
>>> Isn't that required for BRANCH_COST use?
>>> Other option would be to add some dummy wrapper around
>>> BRANCH_COST, put that wrapper into some file that already includes rtl.h
>>> and
>>> tm_p.h and just call it from there.
>>
>> Yea, looking at it now, I'm a bit surprised this hasn't been converted to a
>> target hook which would avoid this problem entirely.
>
> You got the job!
Joys :-)

What's the policy on the GDFL stuff.  I've tried so hard to avoid having 
to worry about that rats nest that I have no idea what our policy is. 
Basically I just want to take the old docs for BRANCH_COST and re-use 
those.  Is that considered kosher with all the GDFL nonsense?

Jeff
Joseph Myers Oct. 16, 2013, 7:15 p.m. UTC | #11
On Wed, 16 Oct 2013, Jeff Law wrote:

> What's the policy on the GDFL stuff.  I've tried so hard to avoid having to
> worry about that rats nest that I have no idea what our policy is. Basically I
> just want to take the old docs for BRANCH_COST and re-use those.  Is that
> considered kosher with all the GDFL nonsense?

When moving documentation text from the manual into target.def (so it ends 
up in both target.def and tm.texi, with tm.texi.in just having an @hook 
line to show where the regeneration of tm.texi should put the text), CC 
one of the people listed as docstring relicensing maintainers and ask us 
to approve the GPL relicensing in that case.
diff mbox

Patch

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
@@ -0,0 +1,38 @@ 
+/* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
+
+/* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
+/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
+
+int test (int a, int b, int c)
+{
+  if (a == 43 || a == 75 || a == 44 || a == 78 || a == 77 || a == 46 || a == 76 || a == 45)
+    return b;
+  else
+    return c;
+}
+
+int main ()
+{
+  if (test (43, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (44, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (45, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (46, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (75, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (76, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (77, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (78, 20, 30) != 20)
+    __builtin_abort ();
+  if (test (30, 20, 30) != 30)
+    __builtin_abort ();

Perhaps it would be better to test more than just one
value outside of the range.  Say everything from -10 to 100 might be better,
but you'd want to make sure the optimization can't be applied to the
What about