Patchwork [WIP] Re: Inefficient end-of-loop value computation - missed optimization somewhere?

login
register
mail settings
Submitter Ulrich Weigand
Date Feb. 28, 2012, 3:10 p.m.
Message ID <201202281510.q1SFAMDY022388@d06av02.portsmouth.uk.ibm.com>
Download mbox | patch
Permalink /patch/143470/
State New
Headers show

Comments

Ulrich Weigand - Feb. 28, 2012, 3:10 p.m.
Richard Guenther wrote:
> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > we've noticed that the loop optimizer sometimes inserts weirdly
> > inefficient code to compute the value of an induction variable
> > at the end of the loop.
[snip]

> The issue is that (start + 1) + 1 * (int) ... is computed using signed
> integer arithmetic which may invoke undefined behavior when it wraps.
> Thus we cannot re-associate it.  I suppose if you'd arrange the code
> to do the arithmetic in unsigned and only cast the result back to the
> desired type we might fold it ...

I'm not completely sure I've got the correct place for the conversions,
but using the patch below does indeed fix the inefficient code.

I'll still need to do proper testing and benchmarking, but I thought
I'd post the patch anyway just as a heads-up ...

Does this look reasonable to you?

Thanks,
Ulrich


ChangeLog:

	* tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
	Perform chrec_apply computation in an unsigned type.
Richard Guenther - Feb. 28, 2012, 3:23 p.m.
On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > we've noticed that the loop optimizer sometimes inserts weirdly
>> > inefficient code to compute the value of an induction variable
>> > at the end of the loop.
> [snip]
>
>> The issue is that (start + 1) + 1 * (int) ... is computed using signed
>> integer arithmetic which may invoke undefined behavior when it wraps.
>> Thus we cannot re-associate it.  I suppose if you'd arrange the code
>> to do the arithmetic in unsigned and only cast the result back to the
>> desired type we might fold it ...
>
> I'm not completely sure I've got the correct place for the conversions,
> but using the patch below does indeed fix the inefficient code.
>
> I'll still need to do proper testing and benchmarking, but I thought
> I'd post the patch anyway just as a heads-up ...
>
> Does this look reasonable to you?

Yes, that looks reasonable.  Though instead of unsigned_type_for
please use build_nonstandard_integer_type instead (if the type
is not already unsigned).  unsigned_type_for does not necessarily
preserve type-precision and may even return NULL.

Richard.

> Thanks,
> Ulrich
>
>
> ChangeLog:
>
>        * tree-scalar-evolution.c (compute_overall_effect_of_inner_loop):
>        Perform chrec_apply computation in an unsigned type.
>
> Index: gcc/tree-scalar-evolution.c
> ===================================================================
> --- gcc/tree-scalar-evolution.c (revision 184439)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -474,11 +474,18 @@
>            return chrec_dont_know;
>          else
>            {
> +             tree type = TREE_TYPE (evolution_fn);
> +             tree utype = unsigned_type_for (type);
>              tree res;
>
>              /* evolution_fn is the evolution function in LOOP.  Get
> -                its value in the nb_iter-th iteration.  */
> -             res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
> +                its value in the nb_iter-th iteration.  Since the
> +                number of iterations is always unsigned, we perform
> +                the computation in an unsigned type to allow for
> +                improved simplification of subexpressions.  */
> +             res = chrec_convert (utype, evolution_fn, NULL);
> +             res = chrec_apply (inner_loop->num, res, nb_iter);
> +             res = chrec_convert (type, res, NULL);
>
>              if (chrec_contains_symbols_defined_in_loop (res, loop->num))
>                res = instantiate_parameters (loop, res);
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

Patch

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 184439)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -474,11 +474,18 @@ 
 	    return chrec_dont_know;
 	  else
 	    {
+	      tree type = TREE_TYPE (evolution_fn);
+	      tree utype = unsigned_type_for (type);
 	      tree res;
 
 	      /* evolution_fn is the evolution function in LOOP.  Get
-		 its value in the nb_iter-th iteration.  */
-	      res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
+		 its value in the nb_iter-th iteration.  Since the
+		 number of iterations is always unsigned, we perform
+		 the computation in an unsigned type to allow for
+		 improved simplification of subexpressions.  */
+	      res = chrec_convert (utype, evolution_fn, NULL);
+	      res = chrec_apply (inner_loop->num, res, nb_iter);
+	      res = chrec_convert (type, res, NULL);
 
 	      if (chrec_contains_symbols_defined_in_loop (res, loop->num))
 		res = instantiate_parameters (loop, res);