Patchwork [4.4/4.5/4.6?] check for constant base addresses in is_gimple_invariant_address

login
register
mail settings
Submitter Nathan Froyd
Date Sept. 5, 2010, 2:12 a.m.
Message ID <20100905021216.GJ16898@codesourcery.com>
Download mbox | patch
Permalink /patch/63818/
State New
Headers show

Comments

Nathan Froyd - Sept. 5, 2010, 2:12 a.m.
The functions in the following testcase (PPC-specific, but
generalizable) do not compile to the same code:

#define ASMSTR "stw%U0%X0 %1,%0"

struct Foo
{
  unsigned pad[10];
  unsigned cs0_bnds;
};

static inline
void Out (unsigned *addr)
{
  __asm__ __volatile__(ASMSTR : "=m" (*addr) : "r" (0x1f));
}

void AsmFunction (void)
{
 struct Foo *ddr = (struct Foo *)0xffe02000;

 Out (&ddr->cs0_bnds);
 Out (&ddr->pad[0]);
}

void AsmDirect (void)
{
  struct Foo *ddr = (struct Foo *)0xffe02000;

  __asm__ __volatile__(ASMSTR : "=m" (ddr->cs0_bnds) : "r" (0x1f));
  __asm__ __volatile__(ASMSTR : "=m" (ddr->pad[0]) : "r" (0x1f));
}

The results are (cleaned up to remove #APP bits):

AsmFunction:
	lis 9,0xffe0
	li 0,31
	ori 9,9,8232
	stw 0,0(9)
	lis 9,0xffe0
	ori 9,9,8192
	stw 0,0(9)
	blr

AsmDirect:
	lis 9,0xffe0
	li 0,31
	ori 9,9,8192
	stw 0,40(9)
	li 0,28
	stw 0,0(9)
	blr

Notice that in AsmDirect, we've CSE'd the address of `ddr', and we
haven't done so in AsmFunction.  (We failed to move the lowpart of the
address into the stores, but AFAICT that is a problem with the PPC
backend and is not relevant here.  Compiling the example above for MIPS
demonstrates the same problem without the distraction of bad code.)

We used to do this CSE'ing in 4.3.  When faced with this code:

  D.2017_2 = &4292878336B->cs0_bnds;
  __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m" *D.2017_2 : "r" 31);

in ccp, we'd propagate D.2017_2 into its use and wind up with identical
trees between the two functions heading into expand.  In 4.4 and 4.5,
however, we don't, so we've never given an opportunity to CSE the
address of `ddr'.

This affects applications that are size-conscious; if you're doing
several calls to 'Out' in the same function, your code size can increase
quite a bit in 4.4 and 4.5.

So far as I can tell, this is because is_gimple_invariant_address fails
to consider that we might have an INTEGER_CST base address.  The patch
below fixes the testcase; with it applied, AsmFunction and AsmDirect
compile to the same code.  What I'm unsure of is:

- Whether I should also check for decl_address_invariant_p, perhaps by:

  if (TREE_CODE (op) == INDIRECT_REF)
    op = TREE_OPERAND (op, 0);

  return CONSTANT_CLASS_P (op) || decl_address_invariant_p (op);

- Whether the same treatment should be applied to
  is_gimple_ip_invariant_address.

My most recent powerpc compiler is somewhat old, but it is post-MEM_REF
merge and it also does not do the desired propagation.  It looks to me
like the relevant case in handled in is_gimple_invariant_address, but
something is not working; the result &4292878336B->FOO is marked
VARYING.

Anyway, the below patch is against 4.5 and I'm asking for approval to
4.4 and 4.5 and possibly mainline (with modifications, obviously) once I
verify with an up-to-date compiler.  Answers to the questions above
would be helpful.

-Nathan

	* gimple.c (is_gimple_invariant_address): Check for a constant
	operand of an INDIRECT_REF.
Richard Guenther - Sept. 5, 2010, 8:36 a.m.
On Sun, Sep 5, 2010 at 4:12 AM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> The functions in the following testcase (PPC-specific, but
> generalizable) do not compile to the same code:
>
> #define ASMSTR "stw%U0%X0 %1,%0"
>
> struct Foo
> {
>  unsigned pad[10];
>  unsigned cs0_bnds;
> };
>
> static inline
> void Out (unsigned *addr)
> {
>  __asm__ __volatile__(ASMSTR : "=m" (*addr) : "r" (0x1f));
> }
>
> void AsmFunction (void)
> {
>  struct Foo *ddr = (struct Foo *)0xffe02000;
>
>  Out (&ddr->cs0_bnds);
>  Out (&ddr->pad[0]);
> }
>
> void AsmDirect (void)
> {
>  struct Foo *ddr = (struct Foo *)0xffe02000;
>
>  __asm__ __volatile__(ASMSTR : "=m" (ddr->cs0_bnds) : "r" (0x1f));
>  __asm__ __volatile__(ASMSTR : "=m" (ddr->pad[0]) : "r" (0x1f));
> }
>
> The results are (cleaned up to remove #APP bits):
>
> AsmFunction:
>        lis 9,0xffe0
>        li 0,31
>        ori 9,9,8232
>        stw 0,0(9)
>        lis 9,0xffe0
>        ori 9,9,8192
>        stw 0,0(9)
>        blr
>
> AsmDirect:
>        lis 9,0xffe0
>        li 0,31
>        ori 9,9,8192
>        stw 0,40(9)
>        li 0,28
>        stw 0,0(9)
>        blr
>
> Notice that in AsmDirect, we've CSE'd the address of `ddr', and we
> haven't done so in AsmFunction.  (We failed to move the lowpart of the
> address into the stores, but AFAICT that is a problem with the PPC
> backend and is not relevant here.  Compiling the example above for MIPS
> demonstrates the same problem without the distraction of bad code.)
>
> We used to do this CSE'ing in 4.3.  When faced with this code:
>
>  D.2017_2 = &4292878336B->cs0_bnds;
>  __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m" *D.2017_2 : "r" 31);
>
> in ccp, we'd propagate D.2017_2 into its use and wind up with identical
> trees between the two functions heading into expand.  In 4.4 and 4.5,
> however, we don't, so we've never given an opportunity to CSE the
> address of `ddr'.
>
> This affects applications that are size-conscious; if you're doing
> several calls to 'Out' in the same function, your code size can increase
> quite a bit in 4.4 and 4.5.
>
> So far as I can tell, this is because is_gimple_invariant_address fails
> to consider that we might have an INTEGER_CST base address.  The patch
> below fixes the testcase; with it applied, AsmFunction and AsmDirect
> compile to the same code.  What I'm unsure of is:
>
> - Whether I should also check for decl_address_invariant_p, perhaps by:
>
>  if (TREE_CODE (op) == INDIRECT_REF)
>    op = TREE_OPERAND (op, 0);
>
>  return CONSTANT_CLASS_P (op) || decl_address_invariant_p (op);
>
> - Whether the same treatment should be applied to
>  is_gimple_ip_invariant_address.
>
> My most recent powerpc compiler is somewhat old, but it is post-MEM_REF
> merge and it also does not do the desired propagation.  It looks to me
> like the relevant case in handled in is_gimple_invariant_address, but
> something is not working; the result &4292878336B->FOO is marked
> VARYING.
>
> Anyway, the below patch is against 4.5 and I'm asking for approval to
> 4.4 and 4.5 and possibly mainline (with modifications, obviously) once I
> verify with an up-to-date compiler.  Answers to the questions above
> would be helpful.
>
> -Nathan
>
>        * gimple.c (is_gimple_invariant_address): Check for a constant
>        operand of an INDIRECT_REF.
>
> Index: gcc/ChangeLog
> ===================================================================
> Index: gcc/gimple.c
> ===================================================================
> --- gcc/gimple.c        (revision 163868)
> +++ gcc/gimple.c        (working copy)
> @@ -2591,7 +2591,13 @@ is_gimple_invariant_address (const_tree
>
>   op = strip_invariant_refs (TREE_OPERAND (t, 0));
>
> -  return op && (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
> +  if (!op)
> +    return false;
> +
> +  if (TREE_CODE (op) == INDIRECT_REF)
> +    return CONSTANT_CLASS_P (TREE_OPERAND (op, 0));
> +
> +  return (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
>  }

This doesn't make sense.  If we are asking for &CST->x then we should have
folded it to a plain CST.

So - you have to look elsewhere for why that does not happen.

(and of course such a change wouldn't be appropriate for the release branches,
you basically change the GIMPLE IL)

Richard.

>  /* Return true if T is a gimple invariant address at IPA level
>
Nathan Froyd - Sept. 5, 2010, 7:52 p.m.
On Sun, Sep 05, 2010 at 10:36:16AM +0200, Richard Guenther wrote:
> > Index: gcc/ChangeLog
> > ===================================================================
> > Index: gcc/gimple.c
> > ===================================================================
> > --- gcc/gimple.c        (revision 163868)
> > +++ gcc/gimple.c        (working copy)
> > @@ -2591,7 +2591,13 @@ is_gimple_invariant_address (const_tree
> >
> >   op = strip_invariant_refs (TREE_OPERAND (t, 0));
> >
> > -  return op && (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
> > +  if (!op)
> > +    return false;
> > +
> > +  if (TREE_CODE (op) == INDIRECT_REF)
> > +    return CONSTANT_CLASS_P (TREE_OPERAND (op, 0));
> > +
> > +  return (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
> >  }
> 
> This doesn't make sense.  If we are asking for &CST->x then we should have
> folded it to a plain CST.

I don't understand what you mean by this.  Could you explain?

> (and of course such a change wouldn't be appropriate for the release branches,
> you basically change the GIMPLE IL)

I'm happy to fix it however you like on mainline, but I'd also like to
find a fix that works for release branches.

-Nathan
Richard Guenther - Sept. 5, 2010, 7:56 p.m.
On Sun, Sep 5, 2010 at 9:52 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On Sun, Sep 05, 2010 at 10:36:16AM +0200, Richard Guenther wrote:
>> > Index: gcc/ChangeLog
>> > ===================================================================
>> > Index: gcc/gimple.c
>> > ===================================================================
>> > --- gcc/gimple.c        (revision 163868)
>> > +++ gcc/gimple.c        (working copy)
>> > @@ -2591,7 +2591,13 @@ is_gimple_invariant_address (const_tree
>> >
>> >   op = strip_invariant_refs (TREE_OPERAND (t, 0));
>> >
>> > -  return op && (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
>> > +  if (!op)
>> > +    return false;
>> > +
>> > +  if (TREE_CODE (op) == INDIRECT_REF)
>> > +    return CONSTANT_CLASS_P (TREE_OPERAND (op, 0));
>> > +
>> > +  return (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
>> >  }
>>
>> This doesn't make sense.  If we are asking for &CST->x then we should have
>> folded it to a plain CST.
>
> I don't understand what you mean by this.  Could you explain?

&CST->x is a constant, we should fold it to it instead of keeping the address-of
expression (if it wouldn't be a constant it wouldn't and shouldn't be
is_gimple_min_invariant either).

>> (and of course such a change wouldn't be appropriate for the release branches,
>> you basically change the GIMPLE IL)
>
> I'm happy to fix it however you like on mainline, but I'd also like to
> find a fix that works for release branches.

A fix that folds &CST->x to a constant would be appropriate.  And I think
we do such folding somewhere, just maybe not in the right place(s).

Richard.

> -Nathan
>
Nathan Froyd - Sept. 5, 2010, 8:47 p.m.
On Sun, Sep 05, 2010 at 09:56:39PM +0200, Richard Guenther wrote:
> >> This doesn't make sense.  If we are asking for &CST->x then we should have
> >> folded it to a plain CST.
> >
> > I don't understand what you mean by this.  Could you explain?
> 
> &CST->x is a constant, we should fold it to it instead of keeping the address-of
> expression (if it wouldn't be a constant it wouldn't and shouldn't be
> is_gimple_min_invariant either).

Hm, I don't know.  I can see this being worse.  Consider doing that
folding to:

  *(&CST->x) = foo;
  *(&CST->y) = bar;

we'll have:

  *(CST1) = foo;
  *(CST2) = bar;

We might expand that to:

  t1 = CST1
  [t1] = foo
  t2 = CST2
  [t2] = bar

Here we can't CSE CST1 and CST2.  But if we'd gone with the original,
we'd have something like:

  t1 = CST
  [t1+offsetof(x)] = foo
  t2 = CST
  [t2+offsetof(y)] = bar

and here we can CSE CST easily.  Of course, some backends might expand
the folded example to:

  t1 = high(CST1)
  [t1+low(CST1)] = foo
  t2 = high(CST2)
  [t2+low(CST2)] = bar

and high(CST1) is almost certainly equal to high(CST2), so we can CSE
those.  But they'd also expand the unfolded example to that same code,
so I don't see the folding as winning you anything.

> >> (and of course such a change wouldn't be appropriate for the release branches,
> >> you basically change the GIMPLE IL)
> >
> > I'm happy to fix it however you like on mainline, but I'd also like to
> > find a fix that works for release branches.
> 
> A fix that folds &CST->x to a constant would be appropriate.  And I think
> we do such folding somewhere, just maybe not in the right place(s).

I don't see an obvious case in fold-const.c.  Part of the reason we
don't do it, even if the code existed, is due to this bit in
tree-ssa-ccp:

			  tree ret, save = *base;
			  tree new_base;
			  new_base = fold_build2 (MEM_REF, TREE_TYPE (*base),
						  unshare_expr (val),
						  TREE_OPERAND (*base, 1));
			  /* We need to return a new tree, not modify the IL
			     or share parts of it.  So play some tricks to
			     avoid manually building it.  */
			  *base = new_base;
			  ret = unshare_expr (rhs);
			  recompute_tree_invariant_for_addr_expr (ret);
			  *base = save;
			  return ret;

so when we propagate CST into ADDR_EXPRs, we're just copying things
rather than folding.

4.3 marked the whole ADDR_EXPR as TREE_INVARIANT, and TREE_INVARIANT
disappeared in 4.4.  Trying to tweak is_gimple_min_invariant to
something closer to 4.3's behavior seems like a more reasonable thing to
do--at least for release branches.

-Nathan
Richard Guenther - Sept. 5, 2010, 9:26 p.m.
On Sun, Sep 5, 2010 at 10:47 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> On Sun, Sep 05, 2010 at 09:56:39PM +0200, Richard Guenther wrote:
>> >> This doesn't make sense.  If we are asking for &CST->x then we should have
>> >> folded it to a plain CST.
>> >
>> > I don't understand what you mean by this.  Could you explain?
>>
>> &CST->x is a constant, we should fold it to it instead of keeping the address-of
>> expression (if it wouldn't be a constant it wouldn't and shouldn't be
>> is_gimple_min_invariant either).
>
> Hm, I don't know.  I can see this being worse.  Consider doing that
> folding to:
>
>  *(&CST->x) = foo;
>  *(&CST->y) = bar;
>
> we'll have:
>
>  *(CST1) = foo;
>  *(CST2) = bar;
>
> We might expand that to:
>
>  t1 = CST1
>  [t1] = foo
>  t2 = CST2
>  [t2] = bar
>
> Here we can't CSE CST1 and CST2.  But if we'd gone with the original,
> we'd have something like:
>
>  t1 = CST
>  [t1+offsetof(x)] = foo
>  t2 = CST
>  [t2+offsetof(y)] = bar
>
> and here we can CSE CST easily.  Of course, some backends might expand
> the folded example to:
>
>  t1 = high(CST1)
>  [t1+low(CST1)] = foo
>  t2 = high(CST2)
>  [t2+low(CST2)] = bar
>
> and high(CST1) is almost certainly equal to high(CST2), so we can CSE
> those.  But they'd also expand the unfolded example to that same code,
> so I don't see the folding as winning you anything.

It's IL consistency.  It's not GIMPLEs business to create target
friendly constants.

>> >> (and of course such a change wouldn't be appropriate for the release branches,
>> >> you basically change the GIMPLE IL)
>> >
>> > I'm happy to fix it however you like on mainline, but I'd also like to
>> > find a fix that works for release branches.
>>
>> A fix that folds &CST->x to a constant would be appropriate.  And I think
>> we do such folding somewhere, just maybe not in the right place(s).
>
> I don't see an obvious case in fold-const.c.  Part of the reason we
> don't do it, even if the code existed, is due to this bit in
> tree-ssa-ccp:
>
>                          tree ret, save = *base;
>                          tree new_base;
>                          new_base = fold_build2 (MEM_REF, TREE_TYPE (*base),
>                                                  unshare_expr (val),
>                                                  TREE_OPERAND (*base, 1));
>                          /* We need to return a new tree, not modify the IL
>                             or share parts of it.  So play some tricks to
>                             avoid manually building it.  */
>                          *base = new_base;
>                          ret = unshare_expr (rhs);
>                          recompute_tree_invariant_for_addr_expr (ret);
>                          *base = save;
>                          return ret;
>
> so when we propagate CST into ADDR_EXPRs, we're just copying things
> rather than folding.
>
> 4.3 marked the whole ADDR_EXPR as TREE_INVARIANT, and TREE_INVARIANT
> disappeared in 4.4.  Trying to tweak is_gimple_min_invariant to
> something closer to 4.3's behavior seems like a more reasonable thing to
> do--at least for release branches.

No.

I can look at the folding - is there a PR for this?

Thanks,
Richard.

> -Nathan
>
Nathan Froyd - Sept. 5, 2010, 9:46 p.m.
On Sun, Sep 05, 2010 at 11:26:56PM +0200, Richard Guenther wrote:
> It's IL consistency.  It's not GIMPLEs business to create target
> friendly constants.

*shrug*  It's not about creating "friendly" constants, it's about
creating something that the RTL optimizers can deal with appropriately.

> I can look at the folding - is there a PR for this?

Not one that I'm aware of.  I can file one if you like.

-Nathan

Patch

Index: gcc/ChangeLog
===================================================================
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 163868)
+++ gcc/gimple.c	(working copy)
@@ -2591,7 +2591,13 @@  is_gimple_invariant_address (const_tree 
 
   op = strip_invariant_refs (TREE_OPERAND (t, 0));
 
-  return op && (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
+  if (!op)
+    return false;
+
+  if (TREE_CODE (op) == INDIRECT_REF)
+    return CONSTANT_CLASS_P (TREE_OPERAND (op, 0));
+
+  return (CONSTANT_CLASS_P (op) || decl_address_invariant_p (op));
 }
 
 /* Return true if T is a gimple invariant address at IPA level