[WIP] Re: Inefficient end-of-loop value computation - missed optimization somewhere?

Submitted by Ulrich Weigand on Feb. 28, 2012, 3:10 p.m.

Details

Message ID 201202281510.q1SFAMDY022388@d06av02.portsmouth.uk.ibm.com
State New
Headers show

Commit Message

Ulrich Weigand Feb. 28, 2012, 3:10 p.m.
Richard Guenther wrote:
> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > we've noticed that the loop optimizer sometimes inserts weirdly
> > inefficient code to compute the value of an induction variable
> > at the end of the loop.
[snip]

> The issue is that (start + 1) + 1 * (int) ... is computed using signed
> integer arithmetic which may invoke undefined behavior when it wraps.
> Thus we cannot re-associate it.  I suppose if you'd arrange the code
> to do the arithmetic in unsigned and only cast the result back to the
> desired type we might fold it ...

I'm not completely sure I've got the correct place for the conversions,
but using the patch below does indeed fix the inefficient code.

I'll still need to do proper testing and benchmarking, but I thought
I'd post the patch anyway just as a heads-up ...

Does this look reasonable to you?

Thanks,
Ulrich


ChangeLog:

	* tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
	Perform chrec_apply computation in an unsigned type.

Comments

Richard Guenther Feb. 28, 2012, 3:23 p.m.
On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > we've noticed that the loop optimizer sometimes inserts weirdly
>> > inefficient code to compute the value of an induction variable
>> > at the end of the loop.
> [snip]
>
>> The issue is that (start + 1) + 1 * (int) ... is computed using signed
>> integer arithmetic which may invoke undefined behavior when it wraps.
>> Thus we cannot re-associate it.  I suppose if you'd arrange the code
>> to do the arithmetic in unsigned and only cast the result back to the
>> desired type we might fold it ...
>
> I'm not completely sure I've got the correct place for the conversions,
> but using the patch below does indeed fix the inefficient code.
>
> I'll still need to do proper testing and benchmarking, but I thought
> I'd post the patch anyway just as a heads-up ...
>
> Does this look reasonable to you?

Yes, that looks reasonable.  Though instead of unsigned_type_for
please use build_nonstandard_integer_type instead (if the type
is not already unsigned).  unsigned_type_for does not necessarily
preserve type-precision and may even return NULL.

Richard.

> Thanks,
> Ulrich
>
>
> ChangeLog:
>
>        * tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
>        Perform chrec_apply computation in an unsigned type.
>
> Index: gcc/tree-scalar-evolution.c
> ===================================================================
> --- gcc/tree-scalar-evolution.c (revision 184439)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -474,11 +474,18 @@
>            return chrec_dont_know;
>          else
>            {
> +             tree type = TREE_TYPE (evolution_fn);
> +             tree utype = unsigned_type_for (type);
>              tree res;
>
>              /* evolution_fn is the evolution function in LOOP.  Get
> -                its value in the nb_iter-th iteration.  */
> -             res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
> +                its value in the nb_iter-th iteration.  Since the
> +                number of iterations is always unsigned, we perform
> +                the computation in an unsigned type to allow for
> +                improved simplification of subexpressions.  */
> +             res = chrec_convert (utype, evolution_fn, NULL);
> +             res = chrec_apply (inner_loop->num, res, nb_iter);
> +             res = chrec_convert (type, res, NULL);
>
>              if (chrec_contains_symbols_defined_in_loop (res, loop->num))
>                res = instantiate_parameters (loop, res);
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

Patch hide | download patch | download mbox

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 184439)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -474,11 +474,18 @@ 
 	    return chrec_dont_know;
 	  else
 	    {
+	      tree type = TREE_TYPE (evolution_fn);
+	      tree utype = unsigned_type_for (type);
 	      tree res;
 
 	      /* evolution_fn is the evolution function in LOOP.  Get
-		 its value in the nb_iter-th iteration.  */
-	      res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
+		 its value in the nb_iter-th iteration.  Since the
+		 number of iterations is always unsigned, we perform
+		 the computation in an unsigned type to allow for
+		 improved simplification of subexpressions.  */
+	      res = chrec_convert (utype, evolution_fn, NULL);
+	      res = chrec_apply (inner_loop->num, res, nb_iter);
+	      res = chrec_convert (type, res, NULL);
 
 	      if (chrec_contains_symbols_defined_in_loop (res, loop->num))
 		res = instantiate_parameters (loop, res);