Patchwork [C++0x] contiguous bitfields race implementation

login
register
mail settings
Submitter Aldy Hernandez
Date Aug. 26, 2011, 6:54 p.m.
Message ID <4E57EBDC.9090900@redhat.com>
Download mbox | patch
Permalink /patch/111822/
State New
Headers show

Comments

Aldy Hernandez - Aug. 26, 2011, 6:54 p.m.
This is a "slight" update from the last revision, with your issues 
addressed as I explained in the last email.  However, everything turned 
out to be much tricker than I expected (variable length offsets with 
arrays, bit fields spanning multiple words, surprising padding 
gymnastics by GCC, etc etc).

It turns out that what we need is to know the precise bit region size at 
all times, and adjust it as we rearrange and cut things into pieces 
throughout the RTL bit field machinery.

I enabled the C++ memory model, and forced a boostrap and regression 
test with it.  This brought about many interesting cases, which I was 
able to distill and add to the testsuite.

Of particular interest was the struct-layout-1.exp tests.  Since many of 
the tests set a global bit field, only to later check it against a local 
variable containing the same value, it is the perfect stressor because, 
while globals are restricted under the memory model, locals are not.  So 
we can check that we can interoperate with the less restrictive model, 
and that the patch does not introduce ABI inconsistencies.  After much 
grief, we are now passing all the struct-layout-1.exp tests. 
Eventually, I'd like to force the struct-layout-1.exp tests to run for 
"--param allow-store-data-races=0" as well.  Unfortunately, this will 
increase testing time.

I have (unfortunately) introduced an additional call to 
get_inner_reference(), but only for the field itself (one time).  I 
can't remember the details, but it was something to effect of the bit 
position + padding being impossible to calculate in one variable array 
reference case.  I can dig up the case if you'd like.

I am currently tackling a reload miscompilation failure while building a 
32-bit library.  I am secretly hoping your review will uncover the flaw 
without me having to pick this up.  Otherwise, this is a much more 
comprehensive approach than what is currently in mainline, and we now 
pass all the bitfield tests the GCC testsuite could throw at it.

Fire away.
* machmode.h (get_best_mode): Remove 2 arguments.
	* fold-const.c (optimize_bit_field_compare): Same.
	(fold_truthop): Same.
	* expr.c (store_field): Change argument types in prototype.
	(emit_group_store): Change argument types to store_bit_field call.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Change argument types.
	Change arguments to get_best_mode.
	(get_bit_range): Rewrite.
	(expand_assignment): Adjust new call to get_bit_range.
	Adjust bitregion_offset when to_rtx is changed.
	Adjust calls to store_field with new argument types.
	(store_field): New argument types.
	Adjust calls to store_bit_field with new arguments.
	* expr.h (store_bit_field): Change argument types.
	* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
	* expmed.c (store_bit_field_1): Change argument types.
	Do not calculate maxbits.
	Adjust bitregion_maxbits if offset changes.
	(store_bit_field): Change argument types.
	Adjust address taking into account bitregion_offset.
	(store_fixed_bit_field): Change argument types.
	Do not calculate maxbits.
	(store_split_bit_field): Change argument types.
	(extract_bit_field_1): Adjust arguments to get_best_mode.
	(extract_fixed_bit_field): Same.
Richard Guenther - Aug. 29, 2011, 11:49 a.m.
On Fri, Aug 26, 2011 at 8:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> This is a "slight" update from the last revision, with your issues addressed
> as I explained in the last email.  However, everything turned out to be much
> tricker than I expected (variable length offsets with arrays, bit fields
> spanning multiple words, surprising padding gymnastics by GCC, etc etc).
>
> It turns out that what we need is to know the precise bit region size at all
> times, and adjust it as we rearrange and cut things into pieces throughout
> the RTL bit field machinery.
>
> I enabled the C++ memory model, and forced a boostrap and regression test
> with it.  This brought about many interesting cases, which I was able to
> distill and add to the testsuite.
>
> Of particular interest was the struct-layout-1.exp tests.  Since many of the
> tests set a global bit field, only to later check it against a local
> variable containing the same value, it is the perfect stressor because,
> while globals are restricted under the memory model, locals are not.  So we
> can check that we can interoperate with the less restrictive model, and that
> the patch does not introduce ABI inconsistencies.  After much grief, we are
> now passing all the struct-layout-1.exp tests. Eventually, I'd like to force
> the struct-layout-1.exp tests to run for "--param allow-store-data-races=0"
> as well.  Unfortunately, this will increase testing time.
>
> I have (unfortunately) introduced an additional call to
> get_inner_reference(), but only for the field itself (one time).  I can't
> remember the details, but it was something to effect of the bit position +
> padding being impossible to calculate in one variable array reference case.
>  I can dig up the case if you'd like.
>
> I am currently tackling a reload miscompilation failure while building a
> 32-bit library.  I am secretly hoping your review will uncover the flaw
> without me having to pick this up.  Otherwise, this is a much more
> comprehensive approach than what is currently in mainline, and we now pass
> all the bitfield tests the GCC testsuite could throw at it.
>
> Fire away.

+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
+    {
+      *byte_offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
+      return;
+    }

shouldn't this be at the very beginning of the function?  Because
you've set *bit_offset to an offset that was _not_ calculated relative
to TREE_OPERAND (exp, 2).  And you'll avoid ICEing

+	  /* bitoff = start_byte * 8 - (fld.byteoff * 8 + fld.bitoff) */
+	  t = fold_build2 (MINUS_EXPR, size_type_node,
+			   fold_build2 (PLUS_EXPR, size_type_node,
+					fold_build2 (MULT_EXPR, size_type_node,
+						     toffset, bits),
+					build_int_cst (integer_type_node,
+						       tbitpos)),
+			   fold_build2 (MULT_EXPR, size_type_node,
+					*byte_offset, bits));
+
+	  *bit_offset = tree_low_cst (t, 1);

here in case t isn't an INTEGER_CST.  The comment before the
tree formula above doesn't match it, please update it.  If
*bit_offset is supposed to be relative to *byte_offset then it should
be easy to calculate it without another get_inner_reference.

Btw, *byte_offset is still not relative to the containing object as
documented, but relative to the base object of the exp reference
tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
to be relative to a.i.j.k get_inner_reference would be not needed
either.  Can you clarify what "containing object" means in the
overall comment please?

If it is really relative to the innermost reference of exp you can
"CSE" the offset of TREE_OPERAND (exp, 0) and do relative
adjustments for all the other get_inner_reference calls.  For
example the

+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
...
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &tbitsize, &end_bitpos, &end_offset,
+			   &tmode, &tunsignedp, &tvolatilep, true);

case is not correct anyway, fld may have variable position
(non-INTEGER_CST DECL_FIELD_OFFSET), you can't
assume

+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);

this thus.

+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start));
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }

so you wasn't convinced about my worries about tail-padding re-use?
And you blindly assume a constant-size record_type ...
and you don't account for DECL_FIELD_OFFSET of bitregion_start
(shouldn't you simply use (and compute) a byte_offset relative to
the start of the record)?  Well, I still think you cannot incoude the
padding at the end of the structure (if TREE_OPERAND (exp, 0) is
a COMPONENT_REF as well then its DECL_SIZE can be different
than it's TYPE_SIZE).

+	      bitregion_byte_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_byte_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));

general remark - you should be using sizetype for byte offsets,
bitsizetype for bit offset trees and size_binop for computations, instead
of fold_build2 (applies everywhere).  And thus pass size_zero_node
to store_field bitregion_byte_offset.

Can you split out the get_best_mode two param removal pieces?  Consider
them pre-approved.

Why do you need to adjust store_bit_field with the extra param - can't
you simply pass an adjusted str_rtx from the single caller that can
have that non-zero?

Thanks,
Richard.
Aldy Hernandez - Aug. 30, 2011, 3:01 p.m.
[I'm going to respond to this piece-meal, to make sure I don't drop 
anything.  My apologies for the long thread, but I'm pretty sure it's in 
everybody's kill file by now.]

> +  /* Be as conservative as possible on variable offsets.  */
> +  if (TREE_OPERAND (exp, 2)
> +&&  !host_integerp (TREE_OPERAND (exp, 2), 1))
> +    {
> +      *byte_offset = TREE_OPERAND (exp, 2);
> +      *maxbits = BITS_PER_UNIT;
> +      return;
> +    }
>
> shouldn't this be at the very beginning of the function?  Because
> you've set *bit_offset to an offset that was _not_ calculated relative

Sure.  I assume in this case, *bit_offset would be 0, right?
Aldy Hernandez - Aug. 30, 2011, 4:15 p.m.
> *bit_offset is supposed to be relative to *byte_offset then it should
> be easy to calculate it without another get_inner_reference.

Since, as you suggested, we will terminate early on variable length 
offsets, we can assume both DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET 
will be constants by now.  So, I assume we can calculate the bit offset 
like this:

*bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
	       * BITS_PER_UNIT
	       + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
   - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
      * BITS_PER_UNIT
      + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));

(Yes, I know we can factor out the BITS_PER_UNIT and only do one 
multiplication, it's just easier to read this way.)

Is this what you had in mind?
Aldy Hernandez - Aug. 30, 2011, 6:13 p.m.
> Btw, *byte_offset is still not relative to the containing object as
> documented, but relative to the base object of the exp reference
> tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
> to be relative to a.i.j.k get_inner_reference would be not needed
> either.  Can you clarify what "containing object" means in the
> overall comment please?

I'm thoroughly confused here.  Originally I had "inner decl", then we 
changed the nomenclature to "containing object", and now there's this 
"innermost reference".

What I mean to say is the "a" in a.i.j.k.l.  How would you like me to 
call that?  The innermost reference?  The inner decl?  Would this 
comment be acceptable:

    Given a COMPONENT_REF, this function calculates the byte offset
    from the innermost reference ("a" in a.i.j.k.l) to the start of the
    contiguous bit region containing the field in question.

>
> If it is really relative to the innermost reference of exp you can
> "CSE" the offset of TREE_OPERAND (exp, 0) and do relative
> adjustments for all the other get_inner_reference calls.  For
> example the
>
> +  /* If we found the end of the bit field sequence, include the
> +     padding up to the next field...  */
>     if (fld)
>       {
> ...
> +      /* Calculate bitpos and offset of the next field.  */
> +      get_inner_reference (build3 (COMPONENT_REF,
> +				   TREE_TYPE (exp),
> +				   TREE_OPERAND (exp, 0),
> +				   fld, NULL_TREE),
> +			&tbitsize,&end_bitpos,&end_offset,
> +			&tmode,&tunsignedp,&tvolatilep, true);
>
> case is not correct anyway, fld may have variable position
> (non-INTEGER_CST DECL_FIELD_OFFSET), you can't
> assume

Innermost here means "a" in a.i.j.k.l?  If so, this is what we're 
currently doing, *byte_offset is the start of the bit region, and 
*bit_offset is the offset from that.

First, I thought we couldn't get a variable position here because we are 
now handling that case at the beginning of the function with:

   /* Be as conservative as possible on variable offsets.  */
   if (TREE_OPERAND (exp, 2)
       && !host_integerp (TREE_OPERAND (exp, 2), 1))
     {
       *byte_offset = TREE_OPERAND (exp, 2);
       *maxbits = BITS_PER_UNIT;
       *bit_offset = 0;
       return;
     }

And even if we do get a variable position, I have so far being able to 
get away with this...

>
> +      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
>
> this thus.

...because the call to fold_build2 immediately preceding this will fold 
away the variable offset.

Is what you want, that we call get_inner_reference once, and then use 
DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit 
offset?  I found this to be quite tricky with padding, and such, but am 
willing to give it a whirl again.

However, could I beg you to reconsider this, and get something working 
first, only later concentrating on removing the get_inner_reference() 
calls, and performing any other tweaks/optimizations?

Aldy
Richard Guenther - Aug. 31, 2011, 7:45 a.m.
On Tue, Aug 30, 2011 at 5:01 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> [I'm going to respond to this piece-meal, to make sure I don't drop
> anything.  My apologies for the long thread, but I'm pretty sure it's in
> everybody's kill file by now.]
>
>> +  /* Be as conservative as possible on variable offsets.  */
>> +  if (TREE_OPERAND (exp, 2)
>> +&&  !host_integerp (TREE_OPERAND (exp, 2), 1))
>> +    {
>> +      *byte_offset = TREE_OPERAND (exp, 2);
>> +      *maxbits = BITS_PER_UNIT;
>> +      return;
>> +    }
>>
>> shouldn't this be at the very beginning of the function?  Because
>> you've set *bit_offset to an offset that was _not_ calculated relative
>
> Sure.  I assume in this case, *bit_offset would be 0, right?

It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
*byte_offset would be

*byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
                                       size_int (DECL_OFFSET_ALIGN
(field) / BITS_PER_UNIT));

see expr.c:component_ref_field_offset () (which you conveniently
could use here).

Note that both TREE_OPERAND (exp, 2) and compoment_ref_field_offset
return offsets relative to the immediate containing struct type, not
relative to the base object like get_inner_reference does ...
(where it is still unclear to me what we are supposed to return from this
function ...)

Thus, conservative would be using get_inner_reference here, if the
offset is supposed to be relative to the base object.

Richard.
Richard Guenther - Aug. 31, 2011, 7:46 a.m.
On Tue, Aug 30, 2011 at 6:15 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> *bit_offset is supposed to be relative to *byte_offset then it should
>> be easy to calculate it without another get_inner_reference.
>
> Since, as you suggested, we will terminate early on variable length offsets,
> we can assume both DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET will be
> constants by now.

Yes.

>  So, I assume we can calculate the bit offset like this:
>
> *bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
>               * BITS_PER_UNIT
>               + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
>  - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
>     * BITS_PER_UNIT
>     + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));
>
> (Yes, I know we can factor out the BITS_PER_UNIT and only do one
> multiplication, it's just easier to read this way.)
>
> Is this what you had in mind?

Yes.  For convenience I'd simply use double_ints for the intermediate
calculations.

Richard.
Richard Guenther - Aug. 31, 2011, 7:53 a.m.
On Tue, Aug 30, 2011 at 8:13 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Btw, *byte_offset is still not relative to the containing object as
>> documented, but relative to the base object of the exp reference
>> tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
>> to be relative to a.i.j.k get_inner_reference would be not needed
>> either.  Can you clarify what "containing object" means in the
>> overall comment please?
>
> I'm thoroughly confused here.  Originally I had "inner decl", then we
> changed the nomenclature to "containing object", and now there's this
> "innermost reference".

Well, the nomenclature is not so important once the function only
computes one variant.  Only because it doesn't right now I am
confused with the nomenclature trying to figure out what it is supposed
to be relative to ...

The containing object of a component-ref is TREE_OPERAND (exp, 0)
to me.  The base object would be get_base_object (exp), which is
eventually what we want, right?

> What I mean to say is the "a" in a.i.j.k.l.  How would you like me to call
> that?  The innermost reference?  The inner decl?  Would this comment be
> acceptable:
>
>   Given a COMPONENT_REF, this function calculates the byte offset
>   from the innermost reference ("a" in a.i.j.k.l) to the start of the
>   contiguous bit region containing the field in question.

  from the base object ("a" in a.i.j.k.l) ...

would be fine with me.

>>
>> If it is really relative to the innermost reference of exp you can
>> "CSE" the offset of TREE_OPERAND (exp, 0) and do relative
>> adjustments for all the other get_inner_reference calls.  For
>> example the
>>
>> +  /* If we found the end of the bit field sequence, include the
>> +     padding up to the next field...  */
>>    if (fld)
>>      {
>> ...
>> +      /* Calculate bitpos and offset of the next field.  */
>> +      get_inner_reference (build3 (COMPONENT_REF,
>> +                                  TREE_TYPE (exp),
>> +                                  TREE_OPERAND (exp, 0),
>> +                                  fld, NULL_TREE),
>> +                       &tbitsize,&end_bitpos,&end_offset,
>> +                       &tmode,&tunsignedp,&tvolatilep, true);
>>
>> case is not correct anyway, fld may have variable position
>> (non-INTEGER_CST DECL_FIELD_OFFSET), you can't
>> assume
>
> Innermost here means "a" in a.i.j.k.l?  If so, this is what we're currently
> doing, *byte_offset is the start of the bit region, and *bit_offset is the
> offset from that.
>
> First, I thought we couldn't get a variable position here because we are now
> handling that case at the beginning of the function with:
>
>  /* Be as conservative as possible on variable offsets.  */
>  if (TREE_OPERAND (exp, 2)
>      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>    {
>      *byte_offset = TREE_OPERAND (exp, 2);
>      *maxbits = BITS_PER_UNIT;
>      *bit_offset = 0;
>      return;
>    }
>
> And even if we do get a variable position, I have so far being able to get
> away with this...

Did you test Ada and enable the C++ memory model? ;)

Btw, even if the bitfield we access (and thus the whole region) is at a
constant offset, the field _following_ the bitregion (the one you query
above with get_inner_reference) can be at variable offset.  I suggest
to simply not include any padding in that case (which would be,
TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).

>>
>> +      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
>>
>> this thus.
>
> ...because the call to fold_build2 immediately preceding this will fold away
> the variable offset.

You hope so ;)

> Is what you want, that we call get_inner_reference once, and then use
> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
> offset?  I found this to be quite tricky with padding, and such, but am
> willing to give it a whirl again.

Yes.

> However, could I beg you to reconsider this, and get something working
> first, only later concentrating on removing the get_inner_reference() calls,
> and performing any other tweaks/optimizations?

Sure, it's fine to tweak this in a followup.

Thanks,
Richard.

> Aldy
>
Richard Guenther - Aug. 31, 2011, 10:57 a.m.
On Wed, Aug 31, 2011 at 9:45 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 5:01 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> [I'm going to respond to this piece-meal, to make sure I don't drop
>> anything.  My apologies for the long thread, but I'm pretty sure it's in
>> everybody's kill file by now.]
>>
>>> +  /* Be as conservative as possible on variable offsets.  */
>>> +  if (TREE_OPERAND (exp, 2)
>>> +&&  !host_integerp (TREE_OPERAND (exp, 2), 1))
>>> +    {
>>> +      *byte_offset = TREE_OPERAND (exp, 2);
>>> +      *maxbits = BITS_PER_UNIT;
>>> +      return;
>>> +    }
>>>
>>> shouldn't this be at the very beginning of the function?  Because
>>> you've set *bit_offset to an offset that was _not_ calculated relative
>>
>> Sure.  I assume in this case, *bit_offset would be 0, right?
>
> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
> *byte_offset would be
>
> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>                                       size_int (DECL_OFFSET_ALIGN
> (field) / BITS_PER_UNIT));
>
> see expr.c:component_ref_field_offset () (which you conveniently
> could use here).
>
> Note that both TREE_OPERAND (exp, 2) and compoment_ref_field_offset
> return offsets relative to the immediate containing struct type, not
> relative to the base object like get_inner_reference does ...
> (where it is still unclear to me what we are supposed to return from this
> function ...)
>
> Thus, conservative would be using get_inner_reference here, if the
> offset is supposed to be relative to the base object.

That said, shouldn't *maxbits not at least make sure to cover the field itself?

> Richard.
>
Aldy Hernandez - Aug. 31, 2011, 3:17 p.m.
>> *bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
>>                * BITS_PER_UNIT
>>                + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
>>   - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
>>      * BITS_PER_UNIT
>>      + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));
>>
>> (Yes, I know we can factor out the BITS_PER_UNIT and only do one
>> multiplication, it's just easier to read this way.)
>>
>> Is this what you had in mind?
>
> Yes.  For convenience I'd simply use double_ints for the intermediate
> calculations.

Ok, let's leave it like this for now.  I have added a FIXME note, and we 
can optimize this after we get everything working.
Aldy Hernandez - Aug. 31, 2011, 4:53 p.m.
>>> Sure.  I assume in this case, *bit_offset would be 0, right?
>>
>> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
>> *byte_offset would be
>>
>> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>>                                        size_int (DECL_OFFSET_ALIGN
>> (field) / BITS_PER_UNIT));
>>
>> see expr.c:component_ref_field_offset () (which you conveniently
>> could use here).
>>
>> Note that both TREE_OPERAND (exp, 2) and compoment_ref_field_offset
>> return offsets relative to the immediate containing struct type, not
>> relative to the base object like get_inner_reference does ...
>> (where it is still unclear to me what we are supposed to return from this
>> function ...)

Ok, I see where your confusion lies.  The function is supposed to return 
a byte offset from the base object, none of this containing object or 
immediate struct, or whatever.  Base object, as in "a" in a.i.j.k, as in 
what you get back from get_base_address().

Originally everything was calculated with get_inner_reference(), which 
is relative to the base object, but now we have this hodge podge of 
get_inner_reference() calls with ad-hoc calculations and optimizations. 
  Gladly, we've agreed to use get_inner_reference() and optimize at a 
later time.

So... base object throughout, anything else is a mistake on my part.

BTW, this whole variable length offset I still can't trigger.  I know 
you want to cater to Ada, but does it even make sense to enable the C++ 
memory model in Ada?  Who would ever do this?  Be that as it may, I'll 
humor you and handle it.

>> Thus, conservative would be using get_inner_reference here, if the
>> offset is supposed to be relative to the base object.
>
> That said, shouldn't *maxbits not at least make sure to cover the field itself?

Is this what you want?

   /* Be as conservative as possible on variable offsets.  */
   if (TREE_OPERAND (exp, 2)
       && !host_integerp (TREE_OPERAND (exp, 2), 1))
     {
       get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   field, NULL_TREE),
			   &tbitsize, &start_bitpos, &start_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);

       *byte_offset = start_offset ? start_offset : size_zero_node;
       *bit_offset = start_bitpos;
       *maxbits = tbitsize;
       return;
     }
Aldy Hernandez - Aug. 31, 2011, 6:09 p.m.
> Did you test Ada and enable the C++ memory model? ;)

See my earlier comment on Ada.  Who would ever use the C++ memory model 
on Ada?

> Btw, even if the bitfield we access (and thus the whole region) is at a
> constant offset, the field _following_ the bitregion (the one you query
> above with get_inner_reference) can be at variable offset.  I suggest
> to simply not include any padding in that case (which would be,
> TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).

I still have not found a place where we get a variable offset here 
(after folding the computation).  How about we put a gcc_assert() along 
with a big fat comment with your above suggestion when we encounter 
this.  Or can you give me an example of this case?

>> Is what you want, that we call get_inner_reference once, and then use
>> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
>> offset?  I found this to be quite tricky with padding, and such, but am
>> willing to give it a whirl again.
>
> Yes.

I have added a comment to this effect, and will address it along with 
the get_inner_reference() removal you have suggested as a followup.
Richard Guenther - Sept. 1, 2011, 6:58 a.m.
On Wed, Aug 31, 2011 at 6:53 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>>> Sure.  I assume in this case, *bit_offset would be 0, right?
>>>
>>> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
>>> *byte_offset would be
>>>
>>> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>>>                                       size_int (DECL_OFFSET_ALIGN
>>> (field) / BITS_PER_UNIT));
>>>
>>> see expr.c:component_ref_field_offset () (which you conveniently
>>> could use here).
>>>
>>> Note that both TREE_OPERAND (exp, 2) and compoment_ref_field_offset
>>> return offsets relative to the immediate containing struct type, not
>>> relative to the base object like get_inner_reference does ...
>>> (where it is still unclear to me what we are supposed to return from this
>>> function ...)
>
> Ok, I see where your confusion lies.  The function is supposed to return a
> byte offset from the base object, none of this containing object or
> immediate struct, or whatever.  Base object, as in "a" in a.i.j.k, as in
> what you get back from get_base_address().
>
> Originally everything was calculated with get_inner_reference(), which is
> relative to the base object, but now we have this hodge podge of
> get_inner_reference() calls with ad-hoc calculations and optimizations.
>  Gladly, we've agreed to use get_inner_reference() and optimize at a later
> time.
>
> So... base object throughout, anything else is a mistake on my part.
>
> BTW, this whole variable length offset I still can't trigger.  I know you
> want to cater to Ada, but does it even make sense to enable the C++ memory
> model in Ada?  Who would ever do this?  Be that as it may, I'll humor you
> and handle it.
>
>>> Thus, conservative would be using get_inner_reference here, if the
>>> offset is supposed to be relative to the base object.
>>
>> That said, shouldn't *maxbits not at least make sure to cover the field
>> itself?
>
> Is this what you want?
>
>  /* Be as conservative as possible on variable offsets.  */
>  if (TREE_OPERAND (exp, 2)
>      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>    {
>      get_inner_reference (build3 (COMPONENT_REF,
>                                   TREE_TYPE (exp),
>                                   TREE_OPERAND (exp, 0),
>                                   field, NULL_TREE),
>                           &tbitsize, &start_bitpos, &start_offset,
>                           &tmode, &tunsignedp, &tvolatilep, true);
>
>      *byte_offset = start_offset ? start_offset : size_zero_node;
>      *bit_offset = start_bitpos;
>      *maxbits = tbitsize;
>      return;
>    }

Yes, exactly.

Richard.
Richard Guenther - Sept. 1, 2011, 7:02 a.m.
On Wed, Aug 31, 2011 at 8:09 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Did you test Ada and enable the C++ memory model? ;)
>
> See my earlier comment on Ada.  Who would ever use the C++ memory model on
> Ada?

People interoperating Ada with C++.  Our bug triager Zdenek who
figures out the --param?

>> Btw, even if the bitfield we access (and thus the whole region) is at a
>> constant offset, the field _following_ the bitregion (the one you query
>> above with get_inner_reference) can be at variable offset.  I suggest
>> to simply not include any padding in that case (which would be,
>> TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).
>
> I still have not found a place where we get a variable offset here (after
> folding the computation).  How about we put a gcc_assert() along with a big
> fat comment with your above suggestion when we encounter this.  Or can you
> give me an example of this case?

My point is, the middle-end infrastructure makes it possible for this
case to appear, and it seems to be easy to handle conservatively.
There isn't a need to wait for users to run into an ICE or an assert we put
there IMHO.  If I'd be fluent in Ada I'd write you a testcase, but I ain't.

>>> Is what you want, that we call get_inner_reference once, and then use
>>> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
>>> offset?  I found this to be quite tricky with padding, and such, but am
>>> willing to give it a whirl again.
>>
>> Yes.
>
> I have added a comment to this effect, and will address it along with the
> get_inner_reference() removal you have suggested as a followup.

Thanks,
Richard.
Arnaud Charlet - Sept. 1, 2011, 7:05 a.m.
> >> Did you test Ada and enable the C++ memory model? ;)
> >
> > See my earlier comment on Ada.  Who would ever use the C++ memory model on
> > Ada?
> 
> People interoperating Ada with C++.  Our bug triager Zdenek who
> figures out the --param?

Right, that's one example. There are also actually some similarities between
the C++ memory model and the Ada language, so it's not so unconcievable
that Ada would like to take advantage of some of these capabilities.

Arno
Aldy Hernandez - Sept. 1, 2011, 2:16 p.m.
> My point is, the middle-end infrastructure makes it possible for this
> case to appear, and it seems to be easy to handle conservatively.
> There isn't a need to wait for users to run into an ICE or an assert we put
> there IMHO.  If I'd be fluent in Ada I'd write you a testcase, but I ain't.

Ughh, this is getting messier.

Ok, I propose keeping track of the field prior (lastfld), calling 
get_inner_reference() and adding DECL_SIZE (or tbitsize if you prefer) 
to calculate maxbits without the padding.

Notice the comment at the top.  We can get rid of yet another call to 
get_inner_reference later.

Is this what you had in mind?

BTW, we don't need to round up to the next byte here, do we?

Thanks.
Aldy

   /* If we found the end of the bit field sequence, include the
      padding up to the next field...  */
   if (fld)
     {
       tree end_offset, t;
       HOST_WIDE_INT end_bitpos;

       /* FIXME: Only call get_inner_reference once (at the beginning
	 of the bit region), and use
	 DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET throughout to
	 calculate any subsequent bit offset.  */

       /* Even if the bitfield we access (and thus the whole region) is
	 at a constant offset, the field _following_ the bitregion can
	 be at variable offset.  In this case, do not include any
	 padding.  This is mostly for Ada.  */
       if (TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST)
	{
	  get_inner_reference (build3 (COMPONENT_REF,
				       TREE_TYPE (exp),
				       TREE_OPERAND (exp, 0),
				       lastfld, NULL_TREE),
			       &tbitsize, &end_bitpos, &end_offset,
			       &tmode, &tunsignedp, &tvolatilep, true);

	  /* Calculate the size of the bit region up the last
	     bitfield, excluding any subsequent padding.

	     t = (end_byte_off - start_byte_offset) * 8 + end_bit_off */
	  end_offset = end_offset ? end_offset : size_zero_node;
	  t = fold_build2 (PLUS_EXPR, size_type_node,
			   fold_build2 (MULT_EXPR, size_type_node,
					fold_build2 (MINUS_EXPR, size_type_node,
						     end_offset,
						     *byte_offset),
					build_int_cst (size_type_node,
						       BITS_PER_UNIT)),
			   build_int_cst (size_type_node,
					  end_bitpos));
	  /* Add the bitsize of the last field.  */
	  t = fold_build2 (PLUS_EXPR, size_type_node,
			   t, DECL_SIZE (lastfld));

	  *maxbits = tree_low_cst (t, 1);
	  return;
	}
...
...
...
Aldy Hernandez - Sept. 1, 2011, 2:52 p.m.
[Jason, can you pontificate on tail-padding and the upcoming C++ 
standard with regards to bitfields?]

> so you wasn't convinced about my worries about tail-padding re-use?

To answer your question, I believe we can't touch past the last field 
(into the padding) if the subsequent record will be packed into the 
first's padding.

struct A {
   int a : 17;
};
struct B : public A {
   char c;
};

So here, if <c> gets packed into the tail-padding of A, we can't touch 
the padding of A when storing into <a>.  These are different structures, 
and I assume would be treated as nested structures, which are distinct 
memory locations.

Is there a way of distinguishing this particular variant (possible 
tail-packing), or will we have to disallow storing into the record tail 
padding altogether?  That would seriously suck.

Aldy
Jason Merrill - Sept. 1, 2011, 3:01 p.m.
On 09/01/2011 10:52 AM, Aldy Hernandez wrote:
> To answer your question, I believe we can't touch past the last field
> (into the padding) if the subsequent record will be packed into the
> first's padding.

Right.

> struct A {
>   int a : 17;
> };
> struct B : public A {
>   char c;
> };
>
> So here, if <c> gets packed into the tail-padding of A, we can't touch
> the padding of A when storing into <a>.

But that doesn't apply to this testcase because A is a POD class, so we 
don't mess with its tail padding.

> Is there a way of distinguishing this particular variant (possible
> tail-packing), or will we have to disallow storing into the record tail
> padding altogether? That would seriously suck.

Basically you can only touch the size of the CLASSTYPE_AS_BASE variant. 
  For many classes this will be the same as the size of the class itself.

Jason
Aldy Hernandez - Sept. 1, 2011, 3:10 p.m.
>> Is there a way of distinguishing this particular variant (possible
>> tail-packing), or will we have to disallow storing into the record tail
>> padding altogether? That would seriously suck.
>
> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
> For many classes this will be the same as the size of the class itself.

All this code is in the middle end, so we're language agnostic.

What do we need here, a hook to query the front-end, or is it too late? 
  Or will we have to play it conservative and never touch the padding 
(regardless of language)?

Aldy
Jason Merrill - Sept. 1, 2011, 3:19 p.m.
On 09/01/2011 11:10 AM, Aldy Hernandez wrote:
>> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
>> For many classes this will be the same as the size of the class itself.
>
> All this code is in the middle end, so we're language agnostic.
>
> What do we need here, a hook to query the front-end, or is it too late?
> Or will we have to play it conservative and never touch the padding
> (regardless of language)?

I think it would make sense to expose this information to the back end 
somehow.  A hook would do the trick: call it type_data_size or 
type_min_size or some such, which in the C++ front end would return 
TYPE_SIZE (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for 
other types.

Jason
Richard Guenther - Sept. 2, 2011, 8:48 a.m.
On Thu, Sep 1, 2011 at 4:16 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> My point is, the middle-end infrastructure makes it possible for this
>> case to appear, and it seems to be easy to handle conservatively.
>> There isn't a need to wait for users to run into an ICE or an assert we
>> put
>> there IMHO.  If I'd be fluent in Ada I'd write you a testcase, but I
>> ain't.
>
> Ughh, this is getting messier.
>
> Ok, I propose keeping track of the field prior (lastfld), calling
> get_inner_reference() and adding DECL_SIZE (or tbitsize if you prefer) to
> calculate maxbits without the padding.
>
> Notice the comment at the top.  We can get rid of yet another call to
> get_inner_reference later.
>
> Is this what you had in mind?

That could work also for the tail-padding re-use case, yes.  Note that
DECL_SIZE of the field is just the last fieds bit-precision, so ..

> BTW, we don't need to round up to the next byte here, do we?

.. rounding up to the next byte cannot hurt (dependent on what the
caller will do with that value).

Note that with all this mess I'll re-iterate some of my initial thoughts.
1) why not do this C++ (or C) specific stuff in the frontends, maybe
at gimplifying/genericization time?  That way you wouldn't need to
worry about middle-end features but you could rely solely on what
C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
enforce here, in the middle-end, which looks out-of-place.
2) all this information we try to re-construct here is sort-of readily
available when we layout the record (thus, from layout_type and
friends).  We should really, really try to preserve it there, rather
than jumping through hoops here (ideally we'd have an
(unused?) FIELD_DECL that covers the whole "bitfield group"
followed by the individual FIELD_DECLS for the bits (yep, they'd
overlap that group FIELD_DECL), and they would refer back to
that group FIELD_DECL)

Is the C++ memory model stuff going to be "ready" for 4.7?

Thanks,
Richard.

> Thanks.
> Aldy
>
>  /* If we found the end of the bit field sequence, include the
>     padding up to the next field...  */
>  if (fld)
>    {
>      tree end_offset, t;
>      HOST_WIDE_INT end_bitpos;
>
>      /* FIXME: Only call get_inner_reference once (at the beginning
>         of the bit region), and use
>         DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET throughout to
>         calculate any subsequent bit offset.  */
>
>      /* Even if the bitfield we access (and thus the whole region) is
>         at a constant offset, the field _following_ the bitregion can
>         be at variable offset.  In this case, do not include any
>         padding.  This is mostly for Ada.  */
>      if (TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST)
>        {
>          get_inner_reference (build3 (COMPONENT_REF,
>                                       TREE_TYPE (exp),
>                                       TREE_OPERAND (exp, 0),
>                                       lastfld, NULL_TREE),
>                               &tbitsize, &end_bitpos, &end_offset,
>                               &tmode, &tunsignedp, &tvolatilep, true);
>
>          /* Calculate the size of the bit region up the last
>             bitfield, excluding any subsequent padding.
>
>             t = (end_byte_off - start_byte_offset) * 8 + end_bit_off */
>          end_offset = end_offset ? end_offset : size_zero_node;
>          t = fold_build2 (PLUS_EXPR, size_type_node,
>                           fold_build2 (MULT_EXPR, size_type_node,
>                                        fold_build2 (MINUS_EXPR,
> size_type_node,
>                                                     end_offset,
>                                                     *byte_offset),
>                                        build_int_cst (size_type_node,
>                                                       BITS_PER_UNIT)),
>                           build_int_cst (size_type_node,
>                                          end_bitpos));
>          /* Add the bitsize of the last field.  */
>          t = fold_build2 (PLUS_EXPR, size_type_node,
>                           t, DECL_SIZE (lastfld));
>
>          *maxbits = tree_low_cst (t, 1);
>          return;
>        }
> ...
> ...
> ...
>
Richard Guenther - Sept. 2, 2011, 8:53 a.m.
On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/01/2011 11:10 AM, Aldy Hernandez wrote:
>>>
>>> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
>>> For many classes this will be the same as the size of the class itself.
>>
>> All this code is in the middle end, so we're language agnostic.
>>
>> What do we need here, a hook to query the front-end, or is it too late?
>> Or will we have to play it conservative and never touch the padding
>> (regardless of language)?
>
> I think it would make sense to expose this information to the back end
> somehow.  A hook would do the trick: call it type_data_size or type_min_size
> or some such, which in the C++ front end would return TYPE_SIZE
> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.

That's too late to work with LTO, you'd need to store that information
permanently
somewhere.

Maybe move this whole C++ specific bitfield handling where it belongs,
namely to the C++ frontend?

I suggest to always not re-use tail padding for now (I believe if your
parent object is a COMPONENT_REF, thus, x.parent.bitfield,
you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepance
to decide about whether the tail-padding was reused, but please
double-check that ;)))

Richard.

> Jason
>
>
Aldy Hernandez - Sept. 2, 2011, 12:49 p.m.
> Note that with all this mess I'll re-iterate some of my initial thoughts.
> 1) why not do this C++ (or C) specific stuff in the frontends, maybe
> at gimplifying/genericization time?  That way you wouldn't need to
> worry about middle-end features but you could rely solely on what
> C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
> enforce here, in the middle-end, which looks out-of-place.

The front-end, really?  After all this going back and forth?  After you 
were all so worried about Ada, and now you're ditching it in favor of 
handling only C++?

> Is the C++ memory model stuff going to be "ready" for 4.7?

No, not if you expect me rewrite things every day.
Richard Guenther - Sept. 2, 2011, 1:04 p.m.
On Fri, Sep 2, 2011 at 2:49 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Note that with all this mess I'll re-iterate some of my initial thoughts.
>> 1) why not do this C++ (or C) specific stuff in the frontends, maybe
>> at gimplifying/genericization time?  That way you wouldn't need to
>> worry about middle-end features but you could rely solely on what
>> C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
>> enforce here, in the middle-end, which looks out-of-place.
>
> The front-end, really?  After all this going back and forth?

Well, I'm fine with handling it in the middle-end if it's correct there.

> After you were
> all so worried about Ada, and now you're ditching it in favor of handling
> only C++?

I'm just showing you a possible solution for where you'd not need to
worry ;)  Consider LTOing an Ada and a C++ module - you need to
enable the C++ memory model at link-time so it is in effect when we
process bit-fields.  That will automatically enable it for the Ada pieces, too.

>> Is the C++ memory model stuff going to be "ready" for 4.7?
>
> No, not if you expect me rewrite things every day.

I don't expect you to rewrite things every day.

Don't read every comment I make as a definite decision and order to
you.  I am a mere mortal, too, and the bitfield thing is, I must admit,
still partially a mystery to myself (which is why I keep asking questions
instead of simply providing you with definite answers).  After all I pushed
back my idea of lowering bitfield accesses somewhere on GIMPLE and
I'm not sure if I get back to it for 4.7.  And I definitely would consider
2) for that work.

Btw, it would be nice if I weren't the only one reading your updated
patches :/  I'm just punching holes where I see them and hope I and
you learn something in that process.

Richard.
Jason Merrill - Sept. 2, 2011, 2:10 p.m.
On 09/02/2011 04:53 AM, Richard Guenther wrote:
> On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill<jason@redhat.com>  wrote:
>> I think it would make sense to expose this information to the back end
>> somehow.  A hook would do the trick: call it type_data_size or type_min_size
>> or some such, which in the C++ front end would return TYPE_SIZE
>> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.
>
> That's too late to work with LTO, you'd need to store that information
> permanently somewhere.

OK.

> Maybe move this whole C++ specific bitfield handling where it belongs,
> namely to the C++ frontend?

I don't think that is the way to go; C is adopting the same memory 
model, and this is the only sane thing to do with bit-fields.

> I suggest to always not re-use tail padding for now (I believe if your
> parent object is a COMPONENT_REF, thus, x.parent.bitfield,
> you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepancy
> to decide about whether the tail-padding was reused, but please
> double-check that ;)))

But you don't always have a COMPONENT_REF; you still need to avoid 
touching the tail padding when you just have a pointer to the type 
because it might be a base sub-object.

I wonder what would break if C++ just set TYPE_SIZE to the as-base size?

Jason
Richard Guenther - Sept. 2, 2011, 2:38 p.m.
On Fri, Sep 2, 2011 at 4:10 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/02/2011 04:53 AM, Richard Guenther wrote:
>>
>> On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill<jason@redhat.com>  wrote:
>>>
>>> I think it would make sense to expose this information to the back end
>>> somehow.  A hook would do the trick: call it type_data_size or
>>> type_min_size
>>> or some such, which in the C++ front end would return TYPE_SIZE
>>> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.
>>
>> That's too late to work with LTO, you'd need to store that information
>> permanently somewhere.
>
> OK.
>
>> Maybe move this whole C++ specific bitfield handling where it belongs,
>> namely to the C++ frontend?
>
> I don't think that is the way to go; C is adopting the same memory model,
> and this is the only sane thing to do with bit-fields.
>
>> I suggest to always not re-use tail padding for now (I believe if your
>> parent object is a COMPONENT_REF, thus, x.parent.bitfield,
>> you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepancy
>> to decide about whether the tail-padding was reused, but please
>> double-check that ;)))
>
> But you don't always have a COMPONENT_REF; you still need to avoid touching
> the tail padding when you just have a pointer to the type because it might
> be a base sub-object.
>
> I wonder what would break if C++ just set TYPE_SIZE to the as-base size?

Good question.  Probably argument passing, as the as-base size wouldn't
get a proper mode assigned form layout_type then(?) for small structs?

Maybe worth a try ...

Richard.

> Jason
>
Jeff Law - Sept. 2, 2011, 8:33 p.m.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/02/11 02:48, Richard Guenther wrote:
> 
> Note that with all this mess I'll re-iterate some of my initial
> thoughts. 1) why not do this C++ (or C) specific stuff in the
> frontends, maybe at gimplifying/genericization time?  That way you
> wouldn't need to worry about middle-end features but you could rely
> solely on what C/C++ permit.  It is, after all, C++ _frontend_
> semantics that we enforce here, in the middle-end, which looks
> out-of-place.
Well, it's worth keeping in mind that fixing the way we handle bitfields
is just one piece of a larger project.  Furthermore, many of the ideas
in the C++ memory model are applicable to other languages.

However, I must admit, I'm somewhat at a loss, I thought we were doing
all this in stor-layout.c at the time we layout the structure's memory
form then just trying to keep the code generator and optimizers from
mucking things up by combining accesses and the like.

Clearly I'm going to need to sit down and review the code as well.
Which means learning about a part of GCC I've largely been able to
ignore...  Sigh...

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOYT2jAAoJEBRtltQi2kC7BF4IAJWwqzsPUdhsHaodlUfm1LRu
JpMojs04wPVfu12+8zcXaWfitjOhEPJDPcZ5c0AHb74NRPJINJmjDvSBZWfFazaE
CrZE/U9IBz7Ay8s/gw//uMIWDS8lNjjYFxpqn6VMUpY2F/4QSkDZtaTlsTfm8YfU
+IZnwq82Johh8MzGDRuYY0HBKRdAotGS2F+SycdOxBGW6hnW0WR/2pt0BpIxNYgl
ro0dOgSptWoEmOFzhlN9pVsFvImSjXVlbV9GnF4AsDrh9x9FIaFIpvhgZMUU8wc+
Akg2jgLy2hhysQ9JtES0rL9qrptPPcQVqCL8ct/sLB85vYw7oUxwSenj7w8mwmE=
=yjMx
-----END PGP SIGNATURE-----
Jason Merrill - Sept. 7, 2011, 6:08 p.m.
On 09/02/2011 10:38 AM, Richard Guenther wrote:
> On Fri, Sep 2, 2011 at 4:10 PM, Jason Merrill<jason@redhat.com>  wrote:
>> I wonder what would break if C++ just set TYPE_SIZE to the as-base size?
>
> Good question.  Probably argument passing, as the as-base size wouldn't
> get a proper mode assigned form layout_type then(?) for small structs?

Classes for which the as-base size is different are passed by invisible 
reference, so that wouldn't be an issue.

But layout_decl would get the wrong size for variables and fields of the 
type, so that won't work.

Perhaps it's time to get serious about the change I talked about in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22488#c42 ...

Jason

Patch

Index: machmode.h
===================================================================
--- machmode.h	(revision 176891)
+++ machmode.h	(working copy)
@@ -249,8 +249,6 @@  extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-					unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT,
 					unsigned int,
 					enum machine_mode, int);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176891)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@  optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+    nmode = get_best_mode (lbitsize, lbitpos,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@  fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5286,7 +5286,7 @@  fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: testsuite/c-c++-common/cxxbitfields-9.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-9.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-9.c	(revision 0)
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+
+enum bigenum
+{ bigee = 12345678901LL
+};
+
+struct objtype
+{
+  enum bigenum a;
+  int b:25;
+  int c:15;
+  signed char d;
+  unsigned int e[3] __attribute__ ((aligned));
+  int f;
+};
+
+struct objtype obj;
+
+void foo(){
+  obj.c = 33;
+}
Index: testsuite/c-c++-common/cxxbitfields-10.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-10.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-10.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+
+/* Variable length offsets with the bit field not ending the record.  */
+   
+typedef struct
+{
+  short f:3, g:3, h:10;
+  char xxx;
+} small;
+
+struct sometype
+{
+  int i;
+  small s[10];
+} x;
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10; i++)
+    x.s[i].f = 0;
+  return 0;
+}
Index: testsuite/c-c++-common/cxxbitfields-12.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-12.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-12.c	(revision 0)
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+
+struct stuff_type
+{
+  double a;
+  int b:27;
+  int c:9;
+  int d:9;
+  unsigned char e;
+} stuff;
+
+void foo(){
+stuff.d = 3;
+}
Index: testsuite/c-c++-common/cxxbitfields-14.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-14.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-14.c	(revision 0)
@@ -0,0 +1,25 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "--param allow-store-data-races=0" } */
+
+enum E0 { e0_0 };
+
+enum E2 { e2_m3 = -3, e2_m2, e2_m1, e2_0, e2_1, e2_2, e2_3 };
+
+struct S757
+{ 
+  enum E0 a;
+  enum E2 b:17;
+  enum E2 c:17;
+  unsigned char d;
+};
+
+struct S757 s757;
+
+int main()
+{   
+    s757.c = e2_m2;
+    return 0;
+}
+
+/* Make sure we don't load/store a full 32-bits.  */
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-6.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int :0;
+  volatile int c:7;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+    p -> c = val;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-8.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O --param allow-store-data-races=0" } */
+
+struct bits {
+  /* Make sure the bit position of the bitfield is larger than what
+     can be represented in an unsigned HOST_WIDE_INT, to force
+     get_inner_reference() to return something in POFFSET.  */
+      
+  struct {
+    int some_padding[1<<30];
+    char more_padding;
+  } pad[1<<29];
+
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+struct bits *p;
+
+/* Test that the store into <bitfield> is not done with something
+   wider than a byte move.  */
+void foo()
+{
+  p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-11.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-11.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-11.c	(revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+
+struct S1075
+{
+  unsigned short int a;
+  unsigned long long int b:29;
+  unsigned long long int c:35;
+  unsigned long long int d:31;
+  unsigned long long int e:50;
+  char *f;
+};
+
+struct S1075 blob;
+void foo(){
+blob.d=55;
+}
Index: testsuite/c-c++-common/cxxbitfields-13.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-13.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-13.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "--param allow-store-data-races=0" } */
+
+/* Test bit fields that are split word boundaries.  */
+
+struct footype
+{
+    int c:9;
+    int d:9;
+    char e;
+} foo;
+
+void funky()
+{
+    foo.d = 88;
+}
+
+/* Make sure we don't load/store a full 32-bits.  */
+/* { dg-final { scan-assembler-not "movl\[ \t\]foo" } } */
Index: testsuite/c-c++-common/cxxbitfields-7.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  int some_padding;
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+/* Store into <bitfield> should not clobber <b>.  */
+void update(struct bits *p)
+{
+    p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176891)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,8 @@  noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
+	      store_bit_field (op, size, start, integer_zero_node, 0, 0,
+			       GET_MODE (x), y);
 	      return;
 	    }
 
@@ -940,7 +941,7 @@  noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-		   0, 0, outmode, y);
+		   integer_zero_node, 0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -145,7 +145,7 @@  static void store_constructor_field (rtx
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			tree, HOST_WIDE_INT, HOST_WIDE_INT,
 			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
@@ -2077,7 +2077,7 @@  emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 integer_zero_node, 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2171,7 +2171,8 @@  copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD,
+		       integer_zero_node, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2808,7 +2809,8 @@  write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
+		   integer_zero_node, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3943,8 +3945,7 @@  get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+				 HOST_WIDE_INT bitregion_maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4005,8 +4006,9 @@  optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
+      if (bitregion_maxbits && bitregion_maxbits < GET_MODE_BITSIZE (str_mode))
+	str_mode = get_max_mode (bitregion_maxbits);
       str_mode = get_best_mode (bitsize, bitpos,
-				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4115,114 +4117,184 @@  optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
-/* In the C++ memory model, consecutive bit fields in a structure are
-   considered one memory location.
+/* In the C++ memory model, consecutive non-zero bit fields in a
+   structure are considered one memory location.
 
-   Given a COMPONENT_REF, this function returns the bit range of
-   consecutive bits in which this COMPONENT_REF belongs in.  The
-   values are returned in *BITSTART and *BITEND.  If either the C++
-   memory model is not activated, or this memory access is not thread
-   visible, 0 is returned in *BITSTART and *BITEND.
-
-   EXP is the COMPONENT_REF.
-   INNERDECL is the actual object being referenced.
-   BITPOS is the position in bits where the bit starts within the structure.
-   BITSIZE is size in bits of the field being referenced in EXP.
-
-   For example, while storing into FOO.A here...
-
-      struct {
-        BIT 0:
-          unsigned int a : 4;
-	  unsigned int b : 1;
-	BIT 8:
-	  unsigned char c;
-	  unsigned int d : 6;
-      } foo;
-
-   ...we are not allowed to store past <b>, so for the layout above, a
-   range of 0..7 (because no one cares if we store into the
-   padding).  */
+   Given a COMPONENT_REF, this function calculates the byte offset
+   from the containing object to the start of the contiguous bit
+   region containing the field in question.  This byte offset is
+   returned in *BYTE_OFFSET.
+
+   The bit offset from the start of the bit region to the bit field in
+   question is returned in *BIT_OFFSET.
+
+   The maximum number of bits that can be addressed while storing into
+   the COMPONENT_REF is returned in *MAXBITS.  This number is the
+   number of bits in the contiguous bit region, including any
+   padding.  */
 
 static void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp, tree innerdecl,
-	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+get_bit_range (tree exp, tree *byte_offset, HOST_WIDE_INT *bit_offset,
+	       HOST_WIDE_INT *maxbits)
 {
   tree field, record_type, fld;
-  bool found_field = false;
   bool prev_field_is_bitfield;
+  tree start_offset;
+  HOST_WIDE_INT start_bitpos;
+  /* First field of the bitfield group containing the bitfield we are
+     referencing.  */
+  tree bitregion_start;
 
-  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+  HOST_WIDE_INT tbitsize;
+  enum machine_mode tmode;
+  int tunsignedp, tvolatilep;
 
-  /* If other threads can't see this value, no need to restrict stores.  */
-  if (ALLOW_STORE_DATA_RACES
-      || ((TREE_CODE (innerdecl) == MEM_REF
-	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
-	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
-      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
-	      || !TREE_STATIC (innerdecl))))
-    {
-      *bitstart = *bitend = 0;
-      return;
-    }
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
 
   /* Bit field we're storing into.  */
   field = TREE_OPERAND (exp, 1);
   record_type = DECL_FIELD_CONTEXT (field);
 
-  /* Count the contiguous bitfields for the memory location that
-     contains FIELD.  */
-  *bitstart = 0;
-  prev_field_is_bitfield = true;
+  /* Find the bitfield group containing the field in question, and set
+     BITREGION_START to the start of the group.  */
+  prev_field_is_bitfield = false;
+  bitregion_start = NULL_TREE;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
-
-      if (field == fld)
-	found_field = true;
-
-      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+      /* If we have a non-zero bit-field.  */
+      if (DECL_BIT_FIELD_TYPE (fld)
+	  && !integer_zerop (DECL_SIZE (fld)))
 	{
-	  if (prev_field_is_bitfield == false)
+	  if (!prev_field_is_bitfield)
 	    {
-	      *bitstart = bitpos;
+	      bitregion_start = fld;
 	      prev_field_is_bitfield = true;
 	    }
 	}
       else
+	prev_field_is_bitfield = false;
+      if (fld == field)
+	break;
+    }
+  gcc_assert (bitregion_start);
+  gcc_assert (fld);
+
+  /* Save the starting position of the bitregion.  */
+  get_inner_reference (build3 (COMPONENT_REF,
+			       TREE_TYPE (exp),
+			       TREE_OPERAND (exp, 0),
+			       bitregion_start, NULL_TREE),
+		       &tbitsize, &start_bitpos, &start_offset,
+		       &tmode, &tunsignedp, &tvolatilep, true);
+
+  if (!start_offset)
+    start_offset = size_zero_node;
+  /* Calculate byte offset to the beginning of the bit region.  */
+  /* BYTE_OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
+  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+  *byte_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+			      start_offset,
+			      build_int_cst (integer_type_node,
+					     start_bitpos / BITS_PER_UNIT));
+
+  /* Calculate the starting bit offset and find the end of the bit
+     region.  */
+  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
+    {
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      if (!DECL_BIT_FIELD_TYPE (fld)
+	  || integer_zerop (DECL_SIZE (fld)))
+	break;
+
+      if (fld == field)
 	{
-	  prev_field_is_bitfield = false;
-	  if (found_field)
-	    break;
+	  tree t = DECL_FIELD_OFFSET (fld);
+	  tree bits = build_int_cst (integer_type_node, BITS_PER_UNIT);
+	  HOST_WIDE_INT tbitpos;
+	  tree toffset;
+
+	  get_inner_reference (build3 (COMPONENT_REF,
+				       TREE_TYPE (exp),
+				       TREE_OPERAND (exp, 0),
+				       fld, NULL_TREE),
+			       &tbitsize, &tbitpos, &toffset,
+			       &tmode, &tunsignedp, &tvolatilep, true);
+
+	  if (!toffset)
+	    toffset = size_zero_node;
+
+	  /* bitoff = start_byte * 8 - (fld.byteoff * 8 + fld.bitoff) */
+	  t = fold_build2 (MINUS_EXPR, size_type_node,
+			   fold_build2 (PLUS_EXPR, size_type_node,
+					fold_build2 (MULT_EXPR, size_type_node,
+						     toffset, bits),
+					build_int_cst (integer_type_node,
+						       tbitpos)),
+			   fold_build2 (MULT_EXPR, size_type_node,
+					*byte_offset, bits));
+
+	  *bit_offset = tree_low_cst (t, 1);
 	}
     }
-  gcc_assert (found_field);
 
+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
+    {
+      *byte_offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
+      return;
+    }
+
+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
-      /* We found the end of the bit field sequence.  Include the
-	 padding up to the next field and be done.  */
-      *bitend = bitpos - 1;
+      tree end_offset, maxbits_tree;
+      HOST_WIDE_INT end_bitpos;
+
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &tbitsize, &end_bitpos, &end_offset,
+			   &tmode, &tunsignedp, &tvolatilep, true);
+      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);
+
+      if (end_offset)
+	{
+	  tree type = TREE_TYPE (end_offset);
+
+	  maxbits_tree = fold_build2 (PLUS_EXPR, type,
+				      build2 (MULT_EXPR, type,
+					      build2 (MINUS_EXPR, type,
+						      end_offset,
+						      *byte_offset),
+					      build_int_cst (size_type_node,
+							     BITS_PER_UNIT)),
+				      build_int_cst (size_type_node,
+						     end_bitpos));
+	}
+      else
+	maxbits_tree = build_int_cst (integer_type_node,
+				      end_bitpos - start_bitpos);
+
+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
     }
+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start));
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }
 }
 
@@ -4324,12 +4396,15 @@  expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
       tree tem;
+      tree bitregion_byte_offset = size_zero_node;
+      HOST_WIDE_INT bitregion_bit_offset = 0;
+      /* Set to 0 for the special case where there is no restriction
+	 in play.  */
+      HOST_WIDE_INT bitregion_maxbits = 0;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
@@ -4337,8 +4412,30 @@  expand_assignment (tree to, tree from, b
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
-	get_bit_range (&bitregion_start, &bitregion_end,
-		       to, tem, bitpos, bitsize);
+	{
+	  /* If other threads can't see this value, no need to
+	     restrict stores.  */
+	  if (ALLOW_STORE_DATA_RACES
+	      || ((TREE_CODE (tem) == MEM_REF
+		   || TREE_CODE (tem) == TARGET_MEM_REF)
+		  && !ptr_deref_may_alias_global_p (TREE_OPERAND (tem, 0)))
+	      || TREE_CODE (tem) == RESULT_DECL
+	      || TREE_CODE (tem) == PARM_DECL
+	      || (DECL_P (tem)
+		  && ((TREE_CODE (tem) == VAR_DECL
+		       && DECL_THREAD_LOCAL_P (tem))
+		      || !TREE_STATIC (tem))))
+	    {
+	      bitregion_byte_offset = size_zero_node;
+	      bitregion_bit_offset = 0;
+	      /* Set to 0 for the special case where there is no
+		 restriction in play.  */
+	      bitregion_maxbits = 0;
+	    }
+	  else
+	    get_bit_range (to, &bitregion_byte_offset,
+			   &bitregion_bit_offset, &bitregion_maxbits);
+	}
 
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
@@ -4388,6 +4485,10 @@  expand_assignment (tree to, tree from, b
 	      && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
 	    {
 	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      bitregion_byte_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_byte_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));
 	      bitpos = 0;
 	    }
 
@@ -4421,13 +4522,15 @@  expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4450,7 +4553,8 @@  expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
-				    bitregion_start, bitregion_end,
+				    bitregion_byte_offset, bitregion_bit_offset,
+				    bitregion_maxbits,
 				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
@@ -4477,13 +4581,14 @@  expand_assignment (tree to, tree from, b
 	    }
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
-					       bitregion_start, bitregion_end,
+					       bitregion_maxbits,
 					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4877,7 +4982,7 @@  store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, 0, 0, GET_MODE (temp), temp);
+			     0, integer_zero_node, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5342,8 +5447,8 @@  store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
-		 false);
+    store_field (target, bitsize, bitpos, integer_zero_node, 0, 0, mode, exp,
+		 type, alias_set, false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5917,10 +6022,14 @@  store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_BYTE_OFFSET is the byte offset from the beginning of the
+   containing object to the start of the bit region.
+
+   BITREGION_BIT_OFFSET is the bit offset from the start of the bit
+   region.
+
+   BITREGION_MAXBITS is the size of the bit region containing the bit
+   field in question.
 
    Always return const0_rtx unless we have something particular to
    return.
@@ -5935,8 +6044,9 @@  store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     tree bitregion_byte_offset,
+	     HOST_WIDE_INT bitregion_bit_offset,
+	     HOST_WIDE_INT bitregion_maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5970,7 +6080,8 @@  store_field (rtx target, HOST_WIDE_INT b
 	emit_move_insn (object, target);
 
       store_field (blk_object, bitsize, bitpos,
-		   bitregion_start, bitregion_end,
+		   bitregion_byte_offset, bitregion_bit_offset,
+		   bitregion_maxbits,
 		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
@@ -6086,7 +6197,8 @@  store_field (rtx target, HOST_WIDE_INT b
 
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
-		       bitregion_start, bitregion_end,
+		       bitregion_byte_offset, bitregion_bit_offset,
+		       bitregion_maxbits,
 		       mode, temp);
 
       return const0_rtx;
@@ -7497,7 +7609,8 @@  expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, 0, 0, TYPE_MODE (valtype), treeop0,
+			   0, integer_zero_node, 0, 0,
+			   TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 176891)
+++ expr.h	(working copy)
@@ -664,10 +664,12 @@  enum extraction_pattern { EP_insv, EP_ex
 extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
+extern enum machine_mode get_max_mode (HOST_WIDE_INT);
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     tree,
+			     HOST_WIDE_INT,
+			     HOST_WIDE_INT,
 			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176891)
+++ stor-layout.c	(working copy)
@@ -2361,13 +2361,6 @@  fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
-   BITREGION_START is the bit position of the first bit in this
-   sequence of bit fields.  BITREGION_END is the last bit in this
-   sequence.  If these two fields are non-zero, we should restrict the
-   memory access to a maximum sized chunk of
-   BITREGION_END - BITREGION_START + 1.  Otherwise, we are allowed to touch
-   any adjacent non bit-fields.
-
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2386,20 +2379,11 @@  fixup_unsigned_type (tree type)
 
 enum machine_mode
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
 	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
-  unsigned HOST_WIDE_INT maxbits;
-
-  /* If unset, no restriction.  */
-  if (!bitregion_end)
-    maxbits = MAX_FIXED_MODE_SIZE;
-  else
-    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2436,7 +2420,6 @@  get_best_mode (int bitsize, int bitpos,
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
-	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 176891)
+++ calls.c	(working copy)
@@ -924,7 +924,8 @@  store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+	    store_bit_field (reg, bitsize, endian_correction,
+			     integer_zero_node, 0, 0,
 			     word_mode, word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 176891)
+++ expmed.c	(working copy)
@@ -48,13 +48,11 @@  struct target_expmed *this_target_expmed
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
@@ -340,8 +338,7 @@  mode_for_extraction (enum extraction_pat
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   HOST_WIDE_INT bitregion_maxbits,
 		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
@@ -558,7 +555,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
 				  bitnum + bit_offset,
-				  bitregion_start, bitregion_end,
+				  bitregion_maxbits,
 				  word_mode,
 				  value_word, fallback_p))
 	    {
@@ -722,10 +719,6 @@  store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -733,15 +726,20 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
-	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
+	  || (bitregion_maxbits
+	      && GET_MODE_BITSIZE (GET_MODE (op0)) > bitregion_maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode  (bitsize, bitnum,
-				  bitregion_start, bitregion_end,
-				  MEM_ALIGN (op0),
-				  (op_mode == MAX_MACHINE_MODE
-				   ? VOIDmode : op_mode),
-				  MEM_VOLATILE_P (op0));
+	{
+	  bestmode = (op_mode == MAX_MACHINE_MODE ? VOIDmode : op_mode);
+	  if (bitregion_maxbits
+	      && bitregion_maxbits < GET_MODE_BITSIZE (op_mode))
+	    bestmode = get_max_mode (bitregion_maxbits);
+	  bestmode = get_best_mode  (bitsize, bitnum,
+				     MEM_ALIGN (op0),
+				     bestmode,
+				     MEM_VOLATILE_P (op0));
+	}
       else
 	bestmode = GET_MODE (op0);
 
@@ -752,6 +750,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	{
 	  rtx last, tempreg, xop0;
 	  unsigned HOST_WIDE_INT xoffset, xbitpos;
+	  HOST_WIDE_INT xmaxbits = bitregion_maxbits;
 
 	  last = get_last_insn ();
 
@@ -762,13 +761,24 @@  store_bit_field_1 (rtx str_rtx, unsigned
 	  xoffset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
 	  xbitpos = bitnum % unit;
 	  xop0 = adjust_address (op0, bestmode, xoffset);
+	  if (xmaxbits)
+	    xmaxbits -= xoffset * BITS_PER_UNIT;
 
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
-				 bitregion_start, bitregion_end,
-				 fieldmode, orig_value, false))
+	  if (xmaxbits && unit > xmaxbits)
+	    {
+	      /* Do not allow reading past the bit region.
+		 Technically, you can read past the bitregion, because
+		 load data races are allowed.  You just can't write
+		 past the bit region.
+
+		 ?? Perhaps allow reading, and adjust everything else
+		 accordingly.  Ughh. */
+	    }
+	  else if (store_bit_field_1 (tempreg, bitsize, xbitpos, xmaxbits,
+				      fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
 	      return true;
@@ -781,7 +791,7 @@  store_bit_field_1 (rtx str_rtx, unsigned
     return false;
 
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_maxbits, value);
   return true;
 }
 
@@ -789,18 +799,22 @@  store_bit_field_1 (rtx str_rtx, unsigned
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_BYTE_OFFSET is the byte offset from STR_RTX to the start
+   of the bit region.
+
+   BITREGION_BIT_OFFSET is the field's bit offset from the start of
+   the bit region.
+
+   BITREGION_MAXBITS is the number of bits in the bit region.
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 tree bitregion_byte_offset,
+		 HOST_WIDE_INT bitregion_bit_offset,
+		 HOST_WIDE_INT bitregion_maxbits,
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
@@ -808,33 +822,51 @@  store_bit_field (rtx str_rtx, unsigned H
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
   if (MEM_P (str_rtx)
-      && bitregion_start > 0)
+      && bitregion_maxbits
+      && !integer_zerop (bitregion_byte_offset))
     {
-      enum machine_mode bestmode;
-      enum machine_mode op_mode;
-      unsigned HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset;
 
-      op_mode = mode_for_extraction (EP_insv, 3);
-      if (op_mode == MAX_MACHINE_MODE)
-	op_mode = VOIDmode;
-
-      offset = bitregion_start / BITS_PER_UNIT;
-      bitnum -= bitregion_start;
-      bitregion_end -= bitregion_start;
-      bitregion_start = 0;
-      bestmode = get_best_mode (bitsize, bitnum,
-				bitregion_start, bitregion_end,
-				MEM_ALIGN (str_rtx),
-				op_mode,
-				MEM_VOLATILE_P (str_rtx));
-      str_rtx = adjust_address (str_rtx, bestmode, offset);
+      if (host_integerp (bitregion_byte_offset, 1))
+	{
+	  /* Adjust the bit position accordingly.  */
+	  offset = tree_low_cst (bitregion_byte_offset, 1);
+	  /* Adjust the actual address.  */
+	  str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);
+	}
+      else
+	{
+	  /* Handle variable length offsets.  */
+	  str_rtx = offset_address (str_rtx,
+				    expand_normal (bitregion_byte_offset), 1);
+	}
+      bitregion_byte_offset = integer_zero_node;
+      bitnum = bitregion_bit_offset;
     }
 
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
-			  bitregion_start, bitregion_end,
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, bitregion_maxbits,
 			  fieldmode, value, true))
     gcc_unreachable ();
 }
+
+/* Return the largest mode that can be used to address a bit field of
+   size BITS.  This is basically a MODE whose bit size is <= BITS.  */
+enum machine_mode
+get_max_mode (HOST_WIDE_INT bits)
+{
+  enum machine_mode mode, prev;
+
+  for (prev = mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
+       mode = GET_MODE_WIDER_MODE (mode))
+    {
+      if (GET_MODE_BITSIZE (mode) > bits
+	  || GET_MODE_BITSIZE (mode) > MAX_FIXED_MODE_SIZE)
+	return prev;
+      prev = mode;
+    }
+  gcc_unreachable ();
+  return VOIDmode;
+}
 
 /* Use shifts and boolean operations to store VALUE
    into a bit field of width BITSIZE
@@ -843,14 +875,16 @@  store_bit_field (rtx str_rtx, unsigned H
    The field starts at position BITPOS within the byte.
     (If OP0 is a register, it may be a full word or a narrower mode,
      but BITPOS still counts within a full word,
-     which is significant on bigendian machines.)  */
+     which is significant on bigendian machines.)
+
+     BITREGION_MAXBITS is the number of bits in the bit region, which
+     starts at OP0.  */
 
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   enum machine_mode mode;
@@ -872,19 +906,12 @@  store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos,
-				 bitregion_start, bitregion_end,
-				 value);
+	  store_split_bit_field (op0, bitsize, bitpos, bitregion_maxbits, value);
 	  return;
 	}
     }
   else
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
-
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -897,20 +924,26 @@  store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
-	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
+	  && (!bitregion_maxbits
+	      || GET_MODE_BITSIZE (GET_MODE (op0)) <= bitregion_maxbits)
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
-			      bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	{
+	  if (bitregion_maxbits
+	      && (bitregion_maxbits - offset * BITS_PER_UNIT
+		  < GET_MODE_BITSIZE (mode)))
+	    mode = get_max_mode (bitregion_maxbits - offset * BITS_PER_UNIT);
+	  mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+				MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	}
 
       if (mode == VOIDmode)
 	{
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 bitregion_start, bitregion_end, value);
+				 bitregion_maxbits, value);
 	  return;
 	}
 
@@ -932,6 +965,14 @@  store_fixed_bit_field (rtx op0, unsigned
 	 Then alter OP0 to refer to that word.  */
       bitpos += (offset % (total_bits / BITS_PER_UNIT)) * BITS_PER_UNIT;
       offset -= (offset % (total_bits / BITS_PER_UNIT));
+      if (bitregion_maxbits)
+	{
+	  enum machine_mode tmode;
+	  bitregion_maxbits -= offset * BITS_PER_UNIT;
+	  tmode = get_max_mode (bitregion_maxbits);
+	  if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (tmode))
+	    mode = tmode;
+	}
       op0 = adjust_address (op0, mode, offset);
     }
 
@@ -1031,8 +1072,7 @@  store_fixed_bit_field (rtx op0, unsigned
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   unsigned int unit;
@@ -1043,7 +1083,14 @@  store_split_bit_field (rtx op0, unsigned
   if (REG_P (op0) || GET_CODE (op0) == SUBREG)
     unit = BITS_PER_WORD;
   else
-    unit = MIN (MEM_ALIGN (op0), BITS_PER_WORD);
+    {
+      unit = MIN (MEM_ALIGN (op0), BITS_PER_WORD);
+
+      /* ?? Ideally we should do as much as we can with the wider
+	 mode, and use BITS_PER_UNIT for the remaining bits.  */
+      if (bitregion_maxbits % unit)
+	unit = BITS_PER_UNIT;
+    }
 
   /* If VALUE is a constant other than a CONST_INT, get it into a register in
      WORD_MODE.  If we can do this using gen_lowpart_common, do so.  Note
@@ -1148,7 +1195,7 @@  store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, bitregion_start, bitregion_end, part);
+			       thispos, bitregion_maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1588,7 +1635,7 @@  extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1714,7 +1761,7 @@  extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: stmt.c
===================================================================
--- stmt.c	(revision 176891)
+++ stmt.c	(working copy)
@@ -1760,7 +1760,7 @@  expand_return (tree retval)
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
 	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
-			   0, 0, word_mode,
+			   integer_zero_node, 0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 176891)
+++ params.def	(working copy)
@@ -912,7 +912,9 @@  DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
 	  "allow-store-data-races",
 	  "Allow new data races on stores to be introduced",
-	  1, 0, 1)
+	  /* TESTING TESTING */
+	  /* TESTING: Enable the memory model by default.  */
+	  0, 0, 1)
 
 
 /*