Patchwork [RFC] Don't completely scalarize a record if it contains bit-field (PR tree-optimization/45144)

login
register
mail settings
Submitter Jie Zhang
Date July 30, 2010, 4:09 p.m.
Message ID <4C52F946.6010404@codesourcery.com>
Download mbox | patch
Permalink /patch/60367/
State New
Headers show

Comments

Jie Zhang - July 30, 2010, 4:09 p.m.
PR tree-optimization/45144 shows an issue that SRA causes. I used 
arm-none-eabi target as an example in PR tree-optimization/45144. But 
the same issue can also been seen on x86_64-linux-gnu target using the 
same test case in the PR.

SRA completely scalarizes a small record. But when the record is used 
later as a whole, GCC has to make the record out of the scalar parts. 
When the record contains bit-fields, GCC generates ugly code to assemble 
the scalar parts into a record.

Until the aggregates copy propagation is implemented, I think it would 
better to disable full scalarization for such records. The patch is 
attached. It's bootstrapped on x86_64-linux-gnu and regression tested.

Is it OK for now? We can remove it after aggregates copy propagation is 
implemented.

Will it be better to add bit-field check in type_consists_of_records_p 
instead of using a new function "type_contains_bit_field_p"?


Regards,
Richard Guenther - July 30, 2010, 4:22 p.m.
On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang <jie@codesourcery.com> wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used
> arm-none-eabi target as an example in PR tree-optimization/45144. But the
> same issue can also been seen on x86_64-linux-gnu target using the same test
> case in the PR.
>
> SRA completely scalarizes a small record. But when the record is used later
> as a whole, GCC has to make the record out of the scalar parts. When the
> record contains bit-fields, GCC generates ugly code to assemble the scalar
> parts into a record.
>
> Until the aggregates copy propagation is implemented, I think it would
> better to disable full scalarization for such records. The patch is
> attached. It's bootstrapped on x86_64-linux-gnu and regression tested.
>
> Is it OK for now? We can remove it after aggregates copy propagation is
> implemented.
>
> Will it be better to add bit-field check in type_consists_of_records_p
> instead of using a new function "type_contains_bit_field_p"?

Without looking at the patch I have two comments.

First, I heard talks about an aggregate copy propagation pass.
Instead of a new pass I would suggest to rewrite aggregate
variables (partly) into SSA form, extending the DECL_GIMPLE_REG_P
facility.  Thus, for each full definition aggr = ... you get
aggr_2 = ..., partial reads of that SSA name should be fine as well
as aggregate uses.

This works for non-aliased variables only of course and requires
some thinking as for DECL_GIMPLE_REG_P as that applies to
the underlying decl which would prohibit creating from

 a.a = x;
 a = b;

 a.a = x;
 a_2 = b;

which means that we need to replace the full defs with a new
temporary (with DECL_DEBUG_EXPR_FROM set appropriately),
thus

 a.a = x;
 tmp_2 = b; // was a

this rewriting should happen in update_address_taken.

Second comment.

SRA should be the place to lower bitfield accesses to loads of
the underlying scalar and the bit extraction via BIT_FIELD_REF.
Stores should be handled via a read-modify-write cycle,
borrowing BIT_FIELD_EXPR as done on the old mem-ref branch.

Richard.

> Regards,
> --
> Jie Zhang
> CodeSourcery
>
Mark Mitchell - July 30, 2010, 5:11 p.m.
Richard Guenther wrote:

>> Until the aggregates copy propagation is implemented, I think it would
>> better to disable full scalarization for such records. The patch is
>> attached. It's bootstrapped on x86_64-linux-gnu and regression tested.

> Without looking at the patch I have two comments.

From reading your comments, I'm not sure if you're saying that you don't
like Jie's idea, or if you think it's an OK idea, but there are some
other things that you would like to see done as well or instead.  It
would be helpful if you could make that clear.

To me, Jie's change seems like a plausible heuristic within the current
infrastructure.  I'm all for building new infrastructure when
possible/necessary, but I don't think we should prevent these kinds of
tweaks to heuristics just because we can think of another way of doing
things.  To me, the question ought to be "does this make the compiler
better for users?"

To answer that question, what I guess I'd like to know is what the
impact is on some benchmark that matters.  Here, I don't think SPEC,
EEMBC, etc. are probably the right places to look; they probably don't
have much of this kind of code.  Perhaps it would be interesting to know
how many SRA opportunities we lose in the Linux kernel because of this
change -- and then spot-check some of them to see whether those are
cases where we really lose by not doing SRA, or really win because we're
not doing the kind of ugly stuff that inspired this change.

Thanks,
Jakub Jelinek - July 30, 2010, 5:53 p.m.
On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
> To me, Jie's change seems like a plausible heuristic within the current
> infrastructure.  I'm all for building new infrastructure when
> possible/necessary, but I don't think we should prevent these kinds of
> tweaks to heuristics just because we can think of another way of doing
> things.  To me, the question ought to be "does this make the compiler
> better for users?"

I wouldn't call it tweak to heuristics, it looks to me like an ugly hack.
In many cases the SRA of bitfields is very desirable, especially for apps
that use bitfields heavily like Linux kernel.  I remember Alex spending
quite some time improving SRA to handle bitfields more efficiently,
and this hack just disables it altogether.

What we IMHO need is a pass late in the gimple pipeline which will
optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
or something similar, because bitfield ops are too hard to be handled
efficiently after the expansion.  Combiner helps sometimes, but for many
bit field ops the 3 insn limit is too limiting.
Such pass could help with many more things than just what has been created
for bitfields by SRA in some cases, consider say:
struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;

void
foo (void)
{
  a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
  b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
}

which on say i?86/x86_64 can be optimized as roughly:
  ((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
  ((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];
(of course in some alias friendly way).
See e.g. PR37135/PR22121/PR42172.

	Jakub
Mark Mitchell - July 30, 2010, 6:45 p.m.
Jakub Jelinek wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the current
>> infrastructure.  I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds of
>> tweaks to heuristics just because we can think of another way of doing
>> things.  To me, the question ought to be "does this make the compiler
>> better for users?"
> 
> I wouldn't call it tweak to heuristics, it looks to me like an ugly hack.

Well, call it what you like.  It's making a guess about when SRA is or
isn't profitable.  It might be a good guess or a bad guess, but it's a
guess, which is why I used the term "heuristic".

> In many cases the SRA of bitfields is very desirable, especially for apps
> that use bitfields heavily like Linux kernel.  I remember Alex spending
> quite some time improving SRA to handle bitfields more efficiently,
> and this hack just disables it altogether.

That's precisely why I wanted some data about that.  Maybe we already
know enough to know that this isn't a good heuristic because we already
know that SRA on bitfields is a big win in lots of cases.  That's fine;
in that case, the patch is not a good thing.

> What we IMHO need is a pass late in the gimple pipeline which will
> optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
> or something similar, because bitfield ops are too hard to be handled
> efficiently after the expansion.

That's an interesting idea.  Does anyone else have an idea as to the
best plan here?
Richard Kenner - July 30, 2010, 6:54 p.m.
> > What we IMHO need is a pass late in the gimple pipeline which will
> > optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
> > or something similar, because bitfield ops are too hard to be handled
> > efficiently after the expansion.
> 
> That's an interesting idea.  Does anyone else have an idea as to the
> best plan here?

Note that fold-const.c has done this in some cases for a very long time.
Andrew Pinski - July 30, 2010, 6:57 p.m.
On Jul 30, 2010, at 11:54 AM, kenner@vlsi1.ultra.nyu.edu (Richard  
Kenner) wrote:

>>> What we IMHO need is a pass late in the gimple pipeline which will
>>> optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
>>> or something similar, because bitfield ops are too hard to be  
>>> handled
>>> efficiently after the expansion.
>>
>> That's an interesting idea.  Does anyone else have an idea as to the
>> best plan here?
>
> Note that fold-const.c has done this in some cases for a very long  
> time.

Interesting because we have this extact pass here at cavium. I can  
post the patch set that adds it to 4.3.3 if anyone wants to look at  
them.
Mark Mitchell - July 30, 2010, 6:59 p.m.
Andrew Pinski wrote:

> Interesting because we have this extact pass here at cavium. I can post
> the patch set that adds it to 4.3.3 if anyone wants to look at them.

Certainly seems like it would be helpful, assuming Cavium will assign
copyright.
Richard Guenther - July 31, 2010, 9:38 a.m.
On Fri, Jul 30, 2010 at 8:54 PM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> > What we IMHO need is a pass late in the gimple pipeline which will
>> > optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
>> > or something similar, because bitfield ops are too hard to be handled
>> > efficiently after the expansion.
>>
>> That's an interesting idea.  Does anyone else have an idea as to the
>> best plan here?
>
> Note that fold-const.c has done this in some cases for a very long time.

Indeed - also in a very ugly way.

Richard.
Richard Guenther - July 31, 2010, 9:44 a.m.
On Fri, Jul 30, 2010 at 7:53 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the current
>> infrastructure.  I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds of
>> tweaks to heuristics just because we can think of another way of doing
>> things.  To me, the question ought to be "does this make the compiler
>> better for users?"
>
> I wouldn't call it tweak to heuristics, it looks to me like an ugly hack.
> In many cases the SRA of bitfields is very desirable, especially for apps
> that use bitfields heavily like Linux kernel.  I remember Alex spending
> quite some time improving SRA to handle bitfields more efficiently,
> and this hack just disables it altogether.
>
> What we IMHO need is a pass late in the gimple pipeline which will
> optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
> or something similar, because bitfield ops are too hard to be handled
> efficiently after the expansion.  Combiner helps sometimes, but for many
> bit field ops the 3 insn limit is too limiting.
> Such pass could help with many more things than just what has been created
> for bitfields by SRA in some cases, consider say:
> struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;
>
> void
> foo (void)
> {
>  a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
>  b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
> }
>
> which on say i?86/x86_64 can be optimized as roughly:
>  ((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
>  ((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];
> (of course in some alias friendly way).
> See e.g. PR37135/PR22121/PR42172.

Note that on the old mem-ref branch I lowered bit-field references very
early, but people complained that we eventually loose the targets
ability to directly perform bitfield stores.

Basically I lowered each bitfield access (with bitfield access I name
those using a COMPONENT_REF with a FIELD_DECL that has
DECL_BIT_FIELD set) to either a load of the underlying type
plus a BIT_FIELD_REF on the loaded scalar, or a load of the underlying
type plus a BIT_FIELD_EXPR (new code, which inserts bits into
a scalar) plus a store of the underlying type.  CSE and DSE handled
the redundant loads and stores very nicely and BIT_FIELD_EXPRs
and BIT_FIELD_EXPRs were easily combined.

Now instead of lowering very early or very late as Jakub suggests we
can drive (and perform) that lowering in SRA.  For example by
simply walking through its list of accesses and combine them.

Richard.

>        Jakub
>
Richard Guenther - July 31, 2010, 9:48 a.m.
On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang <jie@codesourcery.com> wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used
> arm-none-eabi target as an example in PR tree-optimization/45144. But the
> same issue can also been seen on x86_64-linux-gnu target using the same test
> case in the PR.
>
> SRA completely scalarizes a small record. But when the record is used later
> as a whole, GCC has to make the record out of the scalar parts. When the
> record contains bit-fields, GCC generates ugly code to assemble the scalar
> parts into a record.
>
> Until the aggregates copy propagation is implemented, I think it would
> better to disable full scalarization for such records. The patch is
> attached. It's bootstrapped on x86_64-linux-gnu and regression tested.
>
> Is it OK for now? We can remove it after aggregates copy propagation is
> implemented.
>
> Will it be better to add bit-field check in type_consists_of_records_p
> instead of using a new function "type_contains_bit_field_p"?

The patch looks like a hack.  Can you instead make SRA treat the
underlying type of bit-fields as the object for scalarization?
I'm not 100% familiar with the internals, but IIRC SRA builds an
access tree, so for each bitfield load/store the analysis phase should
record an access of the underlying field covering all bits and
a sub-access for the respective member.

Maybe Martin can weight in here.

Richard.

>
> Regards,
> --
> Jie Zhang
> CodeSourcery
>
Jie Zhang - Aug. 2, 2010, 3:27 a.m.
On 07/31/2010 01:11 AM, Mark Mitchell wrote:
> Richard Guenther wrote:
>
>>> Until the aggregates copy propagation is implemented, I think it would
>>> better to disable full scalarization for such records. The patch is
>>> attached. It's bootstrapped on x86_64-linux-gnu and regression tested.
>
>> Without looking at the patch I have two comments.
>
>  From reading your comments, I'm not sure if you're saying that you don't
> like Jie's idea, or if you think it's an OK idea, but there are some
> other things that you would like to see done as well or instead.  It
> would be helpful if you could make that clear.
>
> To me, Jie's change seems like a plausible heuristic within the current
> infrastructure.  I'm all for building new infrastructure when
> possible/necessary, but I don't think we should prevent these kinds of
> tweaks to heuristics just because we can think of another way of doing
> things.  To me, the question ought to be "does this make the compiler
> better for users?"
>
> To answer that question, what I guess I'd like to know is what the
> impact is on some benchmark that matters.  Here, I don't think SPEC,
> EEMBC, etc. are probably the right places to look; they probably don't
> have much of this kind of code.  Perhaps it would be interesting to know
> how many SRA opportunities we lose in the Linux kernel because of this
> change -- and then spot-check some of them to see whether those are
> cases where we really lose by not doing SRA, or really win because we're
> not doing the kind of ugly stuff that inspired this change.
>
My patch causes no changes on EEMBC for arm-none-eabi target.

My patch prevents several full scalarizations of records with bit-field 
when compiling Linux kernel for x86_64, but none of these causes 
differences in final assemblies. I use 2.6.34.1 and the default config 
for x86_64. I checked -O2 and -Os.

I also checked the effect of my patch on GCC itself. My patch prevents 
one full scalarization of records with bit-field when compiling files 
under gcc/ with -O2. But there is no difference in the final assemblies.


Regards,
Jie Zhang - Aug. 2, 2010, 4:01 a.m.
On 07/31/2010 01:53 AM, Jakub Jelinek wrote:
> On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
>> To me, Jie's change seems like a plausible heuristic within the current
>> infrastructure.  I'm all for building new infrastructure when
>> possible/necessary, but I don't think we should prevent these kinds of
>> tweaks to heuristics just because we can think of another way of doing
>> things.  To me, the question ought to be "does this make the compiler
>> better for users?"
>
> I wouldn't call it tweak to heuristics, it looks to me like an ugly hack.
> In many cases the SRA of bitfields is very desirable, especially for apps
> that use bitfields heavily like Linux kernel.  I remember Alex spending
> quite some time improving SRA to handle bitfields more efficiently,
> and this hack just disables it altogether.
>
My patch does not disable SRA of bit-fields. It's an adjust of the tweak 
for PR 42585. It's only disable total scalarization of small records 
containing bit-fields. As you said below, it's difficult for RTL 
combiner to combine the scalarized bit-fields again. So we'd better to 
avoid total scalarizing it. If the bit-fields are accessed explicitly in 
the IL, SRA still works as in usual way.

As my reply to Mark's email, my patch does not change the final assembly 
of linux kernel, at least for 2.6.34.1 with the default configuration of 
x86_64.


Regards,
Jie Zhang - Aug. 2, 2010, 4:10 a.m.
On 07/31/2010 02:59 AM, Mark Mitchell wrote:
> Andrew Pinski wrote:
>
>> Interesting because we have this extact pass here at cavium. I can post
>> the patch set that adds it to 4.3.3 if anyone wants to look at them.
>
> Certainly seems like it would be helpful, assuming Cavium will assign
> copyright.
>
This is very interesting! If Cavium will assign copyright and the patch 
set is posted, I can help to bring it to trunk.


Regards,
Jie Zhang - Aug. 2, 2010, 4:28 a.m.
On 07/31/2010 05:48 PM, Richard Guenther wrote:
> On Fri, Jul 30, 2010 at 6:09 PM, Jie Zhang<jie@codesourcery.com>  wrote:
>> PR tree-optimization/45144 shows an issue that SRA causes. I used
>> arm-none-eabi target as an example in PR tree-optimization/45144. But the
>> same issue can also been seen on x86_64-linux-gnu target using the same test
>> case in the PR.
>>
>> SRA completely scalarizes a small record. But when the record is used later
>> as a whole, GCC has to make the record out of the scalar parts. When the
>> record contains bit-fields, GCC generates ugly code to assemble the scalar
>> parts into a record.
>>
>> Until the aggregates copy propagation is implemented, I think it would
>> better to disable full scalarization for such records. The patch is
>> attached. It's bootstrapped on x86_64-linux-gnu and regression tested.
>>
>> Is it OK for now? We can remove it after aggregates copy propagation is
>> implemented.
>>
>> Will it be better to add bit-field check in type_consists_of_records_p
>> instead of using a new function "type_contains_bit_field_p"?
>
> The patch looks like a hack.  Can you instead make SRA treat the
> underlying type of bit-fields as the object for scalarization?
> I'm not 100% familiar with the internals, but IIRC SRA builds an
> access tree, so for each bitfield load/store the analysis phase should
> record an access of the underlying field covering all bits and
> a sub-access for the respective member.
>
Yes. It's a hack, but it's a hack to the hack for PR 42585. ;-) The fix 
for PR 42585 scalarize small records unconditionally, in hope that later 
passes can fix it up if the scalars are not used eventually. But for 
bit-fields, RTL combiner failed to do so. :-(

> Maybe Martin can weight in here.
>
Martin?


Regards,
Martin Jambor - Aug. 2, 2010, 1:01 p.m.
Hi,

On Sat, Jul 31, 2010 at 12:09:42AM +0800, Jie Zhang wrote:
> PR tree-optimization/45144 shows an issue that SRA causes. I used
> arm-none-eabi target as an example in PR tree-optimization/45144.
> But the same issue can also been seen on x86_64-linux-gnu target
> using the same test case in the PR.
> 
> SRA completely scalarizes a small record. But when the record is
> used later as a whole, GCC has to make the record out of the scalar
> parts. When the record contains bit-fields, GCC generates ugly code
> to assemble the scalar parts into a record.
> 
> Until the aggregates copy propagation is implemented, I think it
> would better to disable full scalarization for such records. The
> patch is attached. It's bootstrapped on x86_64-linux-gnu and
> regression tested.
> 
> Is it OK for now? We can remove it after aggregates copy propagation
> is implemented.
> 
> Will it be better to add bit-field check in
> type_consists_of_records_p instead of using a new function
> "type_contains_bit_field_p"?
> 

When I was implementing the total scalarization bit of SRA I thought
of disabling it for structures with bit-fields too.  I did not really
examine the effects in any way but I never expected this to result in
nice code at places where we use SRA to do poor-man's copy
propagation.  However, eventually I decided to keep the total
scalarization for these structures because doing so can save stack
space and it would be shame if adding one such field to a structure
would make us use the space again (in fact, total scalarization was
introduced as a fix to unnecessary stack-frame setup bugs like PR
42585).  But given your results with kernel and gcc, I don't object to
disabling it... people will scream if something slows down for them.

On the other hand, if we decide to go this way, we need to do the
check at a different place, going over the whole type whenever looking
at an assignment is not necessary and is wasteful.  The check would be
most appropriate as a part of type_consists_of_records_p where it
would be performed only once for each variable in question.

Thanks,

Martin

> 
> Regards,
> -- 
> Jie Zhang
> CodeSourcery

> 
> 	PR tree-optimization/45144
> 	* tree-sra.c (type_contains_bit_field_p): New.
> 	(build_accesses_from_assign): Don't completely scalarize
> 	a record if it contains bit-field.
> 
> 	testsuite/
> 	PR tree-optimization/45144
> 	* gcc.dg/tree-ssa/pr45144.c: New test.
> 
> Index: testsuite/gcc.dg/tree-ssa/pr45144.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/pr45144.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/pr45144.c	(revision 0)
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +void baz (unsigned);
> +
> +extern unsigned buf[];
> +
> +struct A
> +{
> +  unsigned a1:10;
> +  unsigned a2:3;
> +  unsigned:19;
> +};
> +
> +union TMP
> +{
> +  struct A a;
> +  unsigned int b;
> +};
> +
> +static unsigned
> +foo (struct A *p)
> +{
> +  union TMP t;
> +  struct A x;
> +  
> +  x = *p;
> +  t.a = x;
> +  return t.b;
> +}
> +
> +void
> +bar (unsigned orig, unsigned *new)
> +{
> +  struct A a;
> +  union TMP s;
> +
> +  s.b = orig;
> +  a = s.a;
> +  if (a.a1)
> +    baz (a.a2);
> +  *new = foo (&a);
> +}
> +
> +/* { dg-final { scan-tree-dump "x = a;" "optimized"} } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: tree-sra.c
> ===================================================================
> --- tree-sra.c	(revision 162704)
> +++ tree-sra.c	(working copy)
> @@ -998,6 +998,30 @@ disqualify_ops_if_throwing_stmt (gimple 
>    return false;
>  }
>  
> +static bool
> +type_contains_bit_field_p (const_tree type)
> +{
> +  tree fld;
> +
> +  if (TREE_CODE (type) != RECORD_TYPE)
> +    return false;
> +
> +  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +    if (TREE_CODE (fld) == FIELD_DECL)
> +      {
> +	tree ft = TREE_TYPE (fld);
> +
> +	if (DECL_BIT_FIELD (fld))
> +	  return true;
> +
> +	if (TREE_CODE (ft) == RECORD_TYPE
> +	    && type_contains_bit_field_p (ft))
> +	  return true;
> +      }
> +
> +  return false;
> +}
> +
>  /* Scan expressions occuring in STMT, create access structures for all accesses
>     to candidates for scalarization and remove those candidates which occur in
>     statements or expressions that prevent them from being split apart.  Return
> @@ -1025,7 +1049,8 @@ build_accesses_from_assign (gimple stmt)
>      {
>        racc->grp_assignment_read = 1;
>        if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
> -	  && !is_gimple_reg_type (racc->type))
> +	  && !is_gimple_reg_type (racc->type)
> +	  && !type_contains_bit_field_p (racc->type))
>  	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
>      }
>
Martin Jambor - Aug. 2, 2010, 1:24 p.m.
Hi,

On Sat, Jul 31, 2010 at 11:44:24AM +0200, Richard Guenther wrote:
> On Fri, Jul 30, 2010 at 7:53 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Jul 30, 2010 at 10:11:42AM -0700, Mark Mitchell wrote:
> >> To me, Jie's change seems like a plausible heuristic within the current
> >> infrastructure.  I'm all for building new infrastructure when
> >> possible/necessary, but I don't think we should prevent these kinds of
> >> tweaks to heuristics just because we can think of another way of doing
> >> things.  To me, the question ought to be "does this make the compiler
> >> better for users?"
> >
> > I wouldn't call it tweak to heuristics, it looks to me like an ugly hack.
> > In many cases the SRA of bitfields is very desirable, especially for apps
> > that use bitfields heavily like Linux kernel.  I remember Alex spending
> > quite some time improving SRA to handle bitfields more efficiently,
> > and this hack just disables it altogether.
> >
> > What we IMHO need is a pass late in the gimple pipeline which will
> > optimize adjacent bitfield operations and lower to BIT_FIELD_REF ops
> > or something similar, because bitfield ops are too hard to be handled
> > efficiently after the expansion.  Combiner helps sometimes, but for many
> > bit field ops the 3 insn limit is too limiting.
> > Such pass could help with many more things than just what has been created
> > for bitfields by SRA in some cases, consider say:
> > struct A { unsigned short h; int i : 4, j : 4, k : 4, l : 4; } a, b, c;
> >
> > void
> > foo (void)
> > {
> >  a.i |= 5; a.j = 3; a.k &= 2; a.l = 8;
> >  b.i = c.i; b.j = c.j; b.k = c.k; b.l = c.l;
> > }
> >
> > which on say i?86/x86_64 can be optimized as roughly:
> >  ((unsigned short *)&a)[1] = (((unsigned short *)&a)[1] & 0x20a) | 0x8035;
> >  ((unsigned short *)&b)[1] = ((unsigned short *)&c)[1];
> > (of course in some alias friendly way).
> > See e.g. PR37135/PR22121/PR42172.
> 
> Note that on the old mem-ref branch I lowered bit-field references very
> early, but people complained that we eventually loose the targets
> ability to directly perform bitfield stores.
> 
> Basically I lowered each bitfield access (with bitfield access I name
> those using a COMPONENT_REF with a FIELD_DECL that has
> DECL_BIT_FIELD set) to either a load of the underlying type
> plus a BIT_FIELD_REF on the loaded scalar, or a load of the underlying
> type plus a BIT_FIELD_EXPR (new code, which inserts bits into
> a scalar) plus a store of the underlying type.  CSE and DSE handled
> the redundant loads and stores very nicely and BIT_FIELD_EXPRs
> and BIT_FIELD_EXPRs were easily combined.
> 
> Now instead of lowering very early or very late as Jakub suggests we
> can drive (and perform) that lowering in SRA.  For example by
> simply walking through its list of accesses and combine them.
> 

With somehing like BIT_FIELD_EXPR... yes, this is probably the way to
go, also for other reasons like replacing the implementation of
buid_ref_for_offset with building of a MEM_REF.  But as you well know,
this is not particuarly easy.  But I hope I'll get there eventually :-)

Martin
Mark Mitchell - Aug. 2, 2010, 4:52 p.m.
Jie Zhang wrote:

> My patch prevents several full scalarizations of records with bit-field
> when compiling Linux kernel for x86_64, but none of these causes
> differences in final assemblies. I use 2.6.34.1 and the default config
> for x86_64. I checked -O2 and -Os.

That seems at odds with the statement made previously in this thread
that this optimization was essential for Linux kernel performance.

If Jie's statement is accurate, then, whether or not this is a "hack",
it seems like a win.  I don't see anything wrong with accepting a small,
local improvement that has no user-observable negative impact; we can
always rip it out and replace it with something better when something
better exists.
Richard Guenther - Aug. 3, 2010, 9 a.m.
On Mon, Aug 2, 2010 at 6:52 PM, Mark Mitchell <mark@codesourcery.com> wrote:
> Jie Zhang wrote:
>
>> My patch prevents several full scalarizations of records with bit-field
>> when compiling Linux kernel for x86_64, but none of these causes
>> differences in final assemblies. I use 2.6.34.1 and the default config
>> for x86_64. I checked -O2 and -Os.
>
> That seems at odds with the statement made previously in this thread
> that this optimization was essential for Linux kernel performance.
>
> If Jie's statement is accurate, then, whether or not this is a "hack",
> it seems like a win.  I don't see anything wrong with accepting a small,
> local improvement that has no user-observable negative impact; we can
> always rip it out and replace it with something better when something
> better exists.

OTOH no changes in code generation are also not in favor of this
patch.  Why didn't it improve anything?  Or was that expected?

Can you adjust the patch according to Martins suggestion?

Richard.
Jie Zhang - Aug. 3, 2010, 9:58 a.m.
On 08/03/2010 05:00 PM, Richard Guenther wrote:
> On Mon, Aug 2, 2010 at 6:52 PM, Mark Mitchell<mark@codesourcery.com>  wrote:
>> Jie Zhang wrote:
>>
>>> My patch prevents several full scalarizations of records with bit-field
>>> when compiling Linux kernel for x86_64, but none of these causes
>>> differences in final assemblies. I use 2.6.34.1 and the default config
>>> for x86_64. I checked -O2 and -Os.
>>
>> That seems at odds with the statement made previously in this thread
>> that this optimization was essential for Linux kernel performance.
>>
>> If Jie's statement is accurate, then, whether or not this is a "hack",
>> it seems like a win.  I don't see anything wrong with accepting a small,
>> local improvement that has no user-observable negative impact; we can
>> always rip it out and replace it with something better when something
>> better exists.
>
> OTOH no changes in code generation are also not in favor of this
> patch.  Why didn't it improve anything?  Or was that expected?
>
Only two total scalarizations are prevented by my patch when building 
the linux kernel for the default x86_64 config, which compiles 1646 C files.

One is in io_apic.c:ioapic_write_entry (), "struct IO_APIC_route_entry e".

The other is in fs-writeback.c:bdi_sync_writeback (), "struct 
wb_writeback_args args".

In both cases, those structs are used simply as a whole. GCC can already 
optimize away the parts generated by total scalarization. So there is no 
difference when it's disabled.

> Can you adjust the patch according to Martins suggestion?
>
OK.


Thanks,

Patch


	PR tree-optimization/45144
	* tree-sra.c (type_contains_bit_field_p): New.
	(build_accesses_from_assign): Don't completely scalarize
	a record if it contains bit-field.

	testsuite/
	PR tree-optimization/45144
	* gcc.dg/tree-ssa/pr45144.c: New test.

Index: testsuite/gcc.dg/tree-ssa/pr45144.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr45144.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr45144.c	(revision 0)
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void baz (unsigned);
+
+extern unsigned buf[];
+
+struct A
+{
+  unsigned a1:10;
+  unsigned a2:3;
+  unsigned:19;
+};
+
+union TMP
+{
+  struct A a;
+  unsigned int b;
+};
+
+static unsigned
+foo (struct A *p)
+{
+  union TMP t;
+  struct A x;
+  
+  x = *p;
+  t.a = x;
+  return t.b;
+}
+
+void
+bar (unsigned orig, unsigned *new)
+{
+  struct A a;
+  union TMP s;
+
+  s.b = orig;
+  a = s.a;
+  if (a.a1)
+    baz (a.a2);
+  *new = foo (&a);
+}
+
+/* { dg-final { scan-tree-dump "x = a;" "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: tree-sra.c
===================================================================
--- tree-sra.c	(revision 162704)
+++ tree-sra.c	(working copy)
@@ -998,6 +998,30 @@  disqualify_ops_if_throwing_stmt (gimple 
   return false;
 }
 
+static bool
+type_contains_bit_field_p (const_tree type)
+{
+  tree fld;
+
+  if (TREE_CODE (type) != RECORD_TYPE)
+    return false;
+
+  for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+    if (TREE_CODE (fld) == FIELD_DECL)
+      {
+	tree ft = TREE_TYPE (fld);
+
+	if (DECL_BIT_FIELD (fld))
+	  return true;
+
+	if (TREE_CODE (ft) == RECORD_TYPE
+	    && type_contains_bit_field_p (ft))
+	  return true;
+      }
+
+  return false;
+}
+
 /* Scan expressions occuring in STMT, create access structures for all accesses
    to candidates for scalarization and remove those candidates which occur in
    statements or expressions that prevent them from being split apart.  Return
@@ -1025,7 +1049,8 @@  build_accesses_from_assign (gimple stmt)
     {
       racc->grp_assignment_read = 1;
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
-	  && !is_gimple_reg_type (racc->type))
+	  && !is_gimple_reg_type (racc->type)
+	  && !type_contains_bit_field_p (racc->type))
 	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
     }